A sorting algorithm i have written.
Pros
- Gives each element a position once.
- Few comparisons needed.
Cons
- For certain sets of data, large amount of extra storage is needed.
- Only works with integers.
The main reason i post the algorithm here is to learn if it already exists or not. From the research i have done i can't find anything similar. Though i am sceptical it would be very fun to know if it is new or not. What i mostly want to know however is if the algorithm could be useful and if my implementation of it is reasonable.
Algorithm:
Basic idea: When you divide the number you want to sort with the largest number of the set you get a percentage. This percentage gives a rough position where in the sorted list that number should be.
Because it is difficult to know how the data is distributed the mean/average is used. By dividing the mean-value with the amount of unique values in the set, you can decide how much of the data set is above and below the mean-value.
Then by using the above idea on all the elements, each element can get a position and be put in a new sorted list.
Some lists with a high amount of clustering or small number of elements with large values, can have the same position calculated from different elements. Therefore all the elements are first put in a larger array to lessen the amount of collisions and if a collision occur the element is moved sligthly up/down. Bacause the new array is larger, elements can be moved up and down without causing problems with insertion of new ones.
Duplicates are handled by counting the collisions of each element.
When all elements have been placed in the larger array it is then collapsed into an array of the correct size.
Exampel:
[4, 20, 6, 4, 4]
min = 4, max = 20, mean = 8
Calculating the possible amout of unique elements:
(max - min + 1) = (20 - 4 + 1) = 17
Calculating a percentage for the mean:
(1 - (mean-min)/unique) = (1 - (8-4)/17) = 0.76
This tells us roughly how large portion of the elements are below the mean. This makes sense because we have 4 out of 5 elements lower than 8.
If we for this example create a 20 times larger array (100 elements) we know that the 80 first elements are for the 4 elements under 8. Thus each element has 20 places to use for collisions.
First element:
4 => ((element - min)*places for each element) = (4-4)*20 = 0
6 => (6-4)*20 = 40
Here 4 gets position 0 and 6 gets position 40. Just as 6 is between 4 and 8,40 is between 0 and 80. After doing this for all elements and then removing all the empty space in the large array, an array have been sorted.
Testing:
I have timed a python vesion of this code by hard-coding different scenarios and
comparing time with quicksort and countingsort. Out of a lack of experience i
can't draw any real conclusions but these are my results:
With a set of data larger than roughly 500 elements spacesort took half the time quicksort did.
Generally, countingsort is faster but certain sets of data such as a completely homogeneous set can be done faster with spacesort after a few code changes. And of course sets with a large range of data.
Good to know:
The large empty array created is called "space" in the code.
Except for a data-array the function needs an argument "size", this konstant multiplied by the length of the data-array gives the length of the space-array. e.g the example above used size = 20.
How large the empty array created needs to be depends alot on the set of data. The smaller the size the faster the algorith and less memory usage. A completely homogeneous set needs only the same size as the original data set. The less homogeneous the larger it needs to be. A good starting point might be 10 times larger.
struct spaceElement { public int data; public bool set; // true if a value have been set to this element. } /* Takes two arguments, data = set of data to be sorted and size then returns the sorted list.*/ static int[] spaceSort (int[]data, int size) { int min, max, mean; minMaxMean(data, out min, out max, out mean); spaceElement[] space = insertElements(data, size, min, max, mean); return collapseSpace(space, data.Length); } // Arguments: list of data, size constant for space, min-value, max-value, mean-value. // Returns the space array with sorted elements. static spaceElement[] insertElements(int[] data, int size, int min, int max, int mean){ int lowMax, highMin; lowMaxhighMin(data, min, max, mean, out lowMax, out highMin); int numDiffElements = max - min + 1; // Largets possible amount of different elements. e.g 4,20,6,4,4 has 20-4+1 = 17 elements. double meanPercent = 1 - (mean - min) / (double)numDiffElements; // Determines what percentage of elements are below the mean. e.g 4,5,6,7,20 => 1 - (7 - 4)/17 = 0.82. // Creates a large array where the sorted elements can be inserted according to their size. int spaceLength = data.Length * size; spaceElement[] space = new spaceElement[spaceLength + size * 2 + 1]; // Determine how much space each element gets if it lies below or above the mean. double lowMult = meanPercent * data.Length * size / (lowMax - min + 1); double highMult = (1 - meanPercent) * data.Length * size / (max - highMin + 1); foreach ( int x in data) { bool done = false; int index = 0; int lastStep = size; // Determine the initial position of the element in the space array. if ( x <= mean) { index = (int)Math.Round( (x - min) * lowMult + size/2 ); lastStep = (int)Math.Round(lowMult); } else { index = (int)Math.Round( (x - highMin) * highMult + size + meanPercent*spaceLength ); lastStep = (int)Math.Round(highMult); } /* First checks if the position is empty, if it is put the element there. Else if value is the same as the already stored value increment count on that element. If the value is not equal to the already stored value jump up/down half of the lowMult/highMult. */ while (!done) { if (!space[index].set){ space[index].data = x; space[index].set = true; done = true; } else if (space[index].data == x) { space[index].count += 1; done = true; } else if ( space[index].data > x ) { lastStep = (int)Math.Round(lastStep / 2.0d); index -= lastStep; } else { lastStep = (int)Math.Round(lastStep / 2.0d); index += lastStep; } } } return space; } /* Arguments: list of data. Out: min-value, max-value and mean/average-value */ static void minMaxMean ( int[] data, out int min, out int max, out int mean ) { min = data[0]; max = data[0]; long sum = 0; foreach ( int x in data ) { sum += x; if ( x < min ) min = x; if ( x > max ) max = x; } mean = (int)Math.Round(sum / (double)data.Length); } /* Arguments : list of data, min-value, max-value, mean/average-value. Returns the highest value less than or equal to the mean-value and the lowest value greater than the mean-value. */ static void lowMaxhighMin( int[] data, int min, int max, int mean, out int lowMax, out int highMin) { lowMax = min; highMin = max; foreach ( int x in data ) { if ( x > lowMax && x <= mean ) lowMax = x; if ( x < highMin && x > mean ) highMin = x; } } // Creates an array of the same size as the original data array and populates it with the sorted values. static int[] collapseSpace( spaceElement[] space, int length ) { int[] newArray = new int[length]; int count = 0; foreach ( spaceElement x in space) { if ( x.set) { for (int y = 0; y <= x.count; y++) { newArray[count] = x.data; count++; } } } return newArray; }