
The challenge is to sort an integer array containing only ones and zeros with the lowest complexity possible.

As always, I am interested in any suggested improvements, but in this question I am most interested in whether the complexity I have assigned to the algorithm is correct (and whether it is the lowest possible). I have included two implementations with two different running times; I believe they are both the same complexity. The larger purpose in posting this is to improve or verify my understanding of complexity and big-O notation.

I assign a complexity of O(n) to this first algorithm, since it looks at every value in the array exactly once.

Algorithm I

    /// <param name="value">A list of values to sort where all values are either one or zero.</param>
    /// <returns>Sorted value array.</returns>
    public static int[] SortOnesZeros(int[] value)
    {
        int j = value.Length - 1;
        int i = 0;
        while (i < j)
        {
            if (value[i] < 0 || value[i] > 1 || value[j] < 0 || value[j] > 1)
            {
                throw new Exception("Value array can only contain 1's and 0's.");
            }

            while (value[i] == 0)
            {
                i++;
            }

            while (value[j] == 1)
            {
                j--;
            }

            if (j > i && value[i] > value[j])
            {
                int temp = value[j];
                value[j] = value[i];
                value[i] = temp;
                j--;
                i++;
            }
        }

        return value;
    }

I have a second implementation that I believe is also O(n). It takes longer because it runs through the n elements twice, but the passes are not nested. I believe it is still O(n) because O(2n) reduces to O(n): constant factors are dropped in big-O analysis. It is also slower because it writes every value on the second pass. There is no need to improve on this algorithm; I believe the first one is already better:

Algorithm II

    public static int[] SortOnesZerosAlternate(int[] values)
    {
        int countOfZeros = 0;
        foreach (int value in values)
        {
            if (value == 0)
            {
                countOfZeros++;
            }

            if (value < 0 || value > 1)
            {
                throw new Exception("Value array can only contain 1's and 0's.");
            }
        }

        for (int i = 0; i < countOfZeros; i++)
        {
            values[i] = 0;
        }

        for (int i = countOfZeros; i < values.Length; i++)
        {
            values[i] = 1;
        }

        return values;
    }

EDIT ***************************************

@superb rain is correct: the first algorithm has a bug where, under the right conditions, the inner skip loops run past the ends of the array and access indices that do not exist. This happens on any array where all the values are the same. I have fixed this in my own code by making the change below:

    while (i < j && value[i] == 0)
    {
        i++;
    }

    while (i < j && value[j] == 1)
    {
        j--;
    }
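A quick way to exercise the failing cases reported in the comments (this little helper is illustrative only and assumes SortOnesZeros is in scope):

    // Sketch of a regression check for the all-same-value inputs
    // (the test values and this helper are mine, not part of the original post):
    public static void CheckAllSameValues()
    {
        int[] allZeros = { 0, 0 };
        SortOnesZeros(allZeros);   // previously threw an IndexOutOfRangeException; now returns { 0, 0 }

        int[] allOnes = { 1, 1 };
        SortOnesZeros(allOnes);    // now returns { 1, 1 } instead of running j past the start of the array
    }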
  • Your first one crashes for {0, 0}. – Commented Aug 4, 2020 at 17:44
  • For the second algorithm, wouldn't the cost of running through n be compensated by the very predictable element access? Also, does anything like the likely attribute exist in C#? stackoverflow.com/questions/51797959/… – aki Commented Aug 5, 2020 at 7:50
  • @anki I think you are correct, and I think that saying the time complexity O(n) == O(2n) is saying the same thing. See the comments in spyr03's answer for details from actual speed test results. I don't think there is anything like likely in C#, but that is interesting, thanks. – Commented Aug 6, 2020 at 17:03
  • How fast is simply using C#'s sort method? – Commented Aug 6, 2020 at 22:00
  • Ha ha, that probably violates the spirit of the thing... – Commented Aug 7, 2020 at 16:48

2 Answers


I agree with your analysis that both algorithms have O(n) time complexity (where n is the number of elements in the array). An important factor to note is that both algorithms also use constant extra space.

However, you can include other metrics in your analysis too. Anything that may be expensive to do can be considered. Two common metrics are the number of comparisons and the number of exchanges.

The number of comparisons tells you how often an element of the array is accessed and then compared. Memory access can be quite expensive, depending on how far up the memory hierarchy (registers, cache levels, RAM, local storage, external storage) you need to traverse before getting the value.

The number of exchanges tells you how many times elements are swapped. This can be expensive whenever writing to memory is, for instance when writing to shared memory in a multi-process program.


Taking the above into account, ignoring the error checking case, and also ignoring any potential optimisations from the compiler, I would argue the second algorithm is better. For the first implementation I count 2 comparisons per element, and 1 exchange per element. For the second, I count 1 comparison per element, and then 1 assignment per element (an assignment of a constant is cheaper than exchanging two elements).
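As a rough illustration of how these counts could be tallied (this helper and its counters are my own sketch, not part of either implementation, and it restructures the loop slightly):

    // Sketch only: roughly counts element comparisons and exchanges for the
    // swap-based approach. The method name and counters are illustrative.
    public static (long comparisons, long exchanges) CountOperations(int[] value)
    {
        long comparisons = 0, exchanges = 0;
        int i = 0;
        int j = value.Length - 1;
        while (i < j)
        {
            while (i < j && value[i] == 0) { comparisons++; i++; }
            while (i < j && value[j] == 1) { comparisons++; j--; }

            if (i < j)
            {
                comparisons += 2;        // the value[i] > value[j] test reads both elements
                int temp = value[i];
                value[i] = value[j];
                value[j] = temp;
                exchanges++;
                i++;
                j--;
            }
        }

        return (comparisons, exchanges);
    }

The exact numbers depend on the input, but on a random 0/1 array this makes the work per element of each version easy to compare.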

You can replace the exchange with assignments in the first implementation, since the array only contains the values 0 and 1 and you've already checked that the two elements are different:

    if (j > i && value[i] > value[j])
    {
        value[j] = 1;
        value[i] = 0;
        j--;
        i++;
    }

Without profiling the code, I don't think it is possible to tell whether the extra comparisons in the first implementation cost more than the overhead of the second pass through the array in the second implementation. I would not be surprised if the numerous bounds checks in the first piece of code (i < j) are actually the largest performance hit.
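If you do want to profile, a minimal harness along these lines would give a first impression (the array size, seed and class name are arbitrary choices of mine, and it assumes the two sort methods are accessible; a dedicated tool such as BenchmarkDotNet would give more reliable numbers):

    using System;
    using System.Diagnostics;
    using System.Linq;

    public static class SortTiming
    {
        // Assumes SortOnesZeros and SortOnesZerosAlternate are accessible,
        // e.g. defined in this class or imported with "using static".
        public static void Run()
        {
            var random = new Random(42);                        // fixed seed so runs are repeatable
            int[] data = Enumerable.Range(0, 10_000_000)
                                   .Select(_ => random.Next(2)) // random mix of 0s and 1s
                                   .ToArray();

            int[] copy = (int[])data.Clone();                   // sort a copy so both versions see the same input
            var sw = Stopwatch.StartNew();
            SortOnesZeros(copy);
            sw.Stop();
            Console.WriteLine($"SortOnesZeros:          {sw.ElapsedMilliseconds} ms");

            copy = (int[])data.Clone();
            sw.Restart();
            SortOnesZerosAlternate(copy);
            sw.Stop();
            Console.WriteLine($"SortOnesZerosAlternate: {sw.ElapsedMilliseconds} ms");
        }
    }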


Third implementation

Based on the comment that the first implementation is still faster, here is another solution that tries to improve upon the second. I'm hoping that by writing it with well-known operations, the compiler can work some magic.

    public static int[] SortOnesZerosAlternate(int[] values)
    {
        int countOfZeros = values.Length - values.Sum();   // Sum() needs "using System.Linq;"

        // Maybe replace this loop with Array.Clear?
        for (int i = 0; i < countOfZeros; i++)
        {
            values[i] = 0;
        }

        for (int i = countOfZeros; i < values.Length; i++)
        {
            values[i] = 1;
        }

        return values;
    }
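To spell out the comment in the code: the zeroing loop can indeed be replaced with a single Array.Clear call. A sketch of that variant (the method name is mine, and I have not profiled it):

    public static int[] SortOnesZerosWithClear(int[] values)
    {
        int countOfZeros = values.Length - values.Sum();   // Sum() needs "using System.Linq;"

        // Zero out the first countOfZeros elements in one call instead of a loop.
        Array.Clear(values, 0, countOfZeros);

        for (int i = countOfZeros; i < values.Length; i++)
        {
            values[i] = 1;
        }

        return values;
    }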
  • I just ran a test of a randomly generated array with 100 million values. The first algorithm won, being roughly 95% faster. I believe that is because you are focusing on the worst case, where the list is sorted in descending order... I will test this for the fun of it. Your fix to the first algorithm improved the time by roughly 15%. Based on an unscientific observation, the results for the first algorithm can vary by about 10% on my system. – Commented Aug 6, 2020 at 16:41
  • Interesting. I will add a third solution as an edit, if you want to try profiling it. – spyr03 Commented Aug 6, 2020 at 16:51
  • @amalgamate It might be worth pointing out that with 100 million values and an even balance of 0s and 1s, countOfZeros could overflow. – spyr03 Commented Aug 6, 2020 at 17:03
  • I improved Algorithm III by assigning "values = new int[values.Length];" instead of assigning all of the zero values. This improved the results: Algorithm I ran 50% faster than Algorithm III for the average case, and 8% faster for the worst case. – Commented Aug 6, 2020 at 18:03
  • You might consider looking into whatever the C# equivalent of std::fill is. It might be better optimised. – spyr03 Commented Aug 6, 2020 at 18:05

I think you might be able to get a marginal boost by implementing this in two steps: move all the 1s to the end, then set everything before them to 0. Your current alternative is three steps: count the 0s, write the 0s, write the 1s.

    public static void SortOnesZeros(int[] input)
    {
        var oneCursor = input.Length - 1;
        for (var i = oneCursor; i >= 0; i--)
        {
            if (input[i] == 1)
            {
                input[oneCursor--] = 1;
            }
        }

        // If the cursor is still at the end we have an all-0 array, so there is nothing to clear.
        if (oneCursor != input.Length - 1)
        {
            Array.Clear(input, 0, oneCursor + 1);
        }
    }

Unfortunately, Array.Clear is O(N), although N here is the number of 0s rather than the total number of elements, so it only touches the whole array in the worst case. There are some interesting techniques to zero out the first part faster, though: https://stackoverflow.com/a/25808955/1402923
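Another option, which trades memory for the clearing work, is to allocate a fresh array instead of clearing in place, since newly allocated int arrays in C# are already zeroed. A sketch (the method name is mine, and unlike the in-place version it returns a new array):

    public static int[] SortOnesZerosNewArray(int[] input)
    {
        int countOfOnes = 0;
        foreach (int value in input)
        {
            countOfOnes += value;                  // values are only 0 or 1
        }

        int[] result = new int[input.Length];      // already all zeros
        for (int i = input.Length - countOfOnes; i < input.Length; i++)
        {
            result[i] = 1;
        }

        return result;
    }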

  • Using Array.Clear, which I admit @spyr03 suggested, shaved enough time that Algorithm III (@spyr03) now beats Algorithm I in the worst case. Algorithm I is still faster by 40%. – Commented Aug 6, 2020 at 18:17
  • Ah, I must have missed that suggestion, @amalgamate. Have you tested against creating a new array and returning that? Then it's only one pass through the input, but you use twice as much memory. – RobH Commented Aug 6, 2020 at 18:22
  • See the comment that starts with "I improved Algorithm III by assigning 'values = new int[values.Length];'" in @spyr03's answer. – Commented Aug 6, 2020 at 19:25
  • Kudos! I implemented your code and it wins the speed challenge; percentages will not do here. At an array size of 10 million, your algorithm, which I guess I should call IV, gets 40 ms for random values and 51 ms for the worst case. The first algorithm (I) gets 42 ms for random values and 63 ms for the worst case. – Commented Aug 6, 2020 at 20:30
