Counting sort for objects in Java

Question

The counting sort for integers counts the amount of each integer, then traverses the count entries in order, and for each entry inserts the integer representing the bucket into the array as many times as its entry denotes.

This is basically same idea, but for general objects. The algorithm makes a pass through the entire range and each array component is appended to a list of equal elements. After the first pass, the array containing distinct components is sorted by a conventional algorithm (here java.util.Arrays.sort). The next step is to simply iterate over each distinct component, and for each component dump the list it represents into the original array.

Whenever the input array consists of n elements and only k of which are distinct, the running time of object-counting sort is \$\mathcal{O}(n + k \log k),\$ and the space complexity is \$\mathcal{O}(n + k)\$.

The algorithm, however, is particularly picky about the API the array component is to provide: besides the compareTo, hashCode and equals are required. equals is true if and only if compareTo returns 0. Also, if equals is true, hashCode should agree (the other way around is not necessarily true: if two objects agree on hashCode, they might not be same from equals-point of view.

It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6.

So, what do you think?

Arrays.java:

package net.coderodde.util; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; public class Arrays { public static <E extends Comparable<? super E>> void sort(final E[] array, final int fromIndex, final int toIndex) { if (toIndex - fromIndex < 2) { return; } final Map<E, List<E>> map = new HashMap<>(); for (int i = fromIndex; i < toIndex; ++i) { final E current = array[i]; if (map.containsKey(current)) { map.get(current).add(current); } else { final List<E> list = new ArrayList<>(); list.add(current); map.put(current, list); } } final Object[] keys = map.keySet().toArray(); java.util.Arrays.sort(keys); int index = fromIndex; for (final Object key : keys) { for (final E element : map.get(key)) { array[index++] = element; } } } public static <E extends Comparable<? super E>> void sort(final E[] array) { sort(array, 0, array.length); } public static final <E> boolean arraysEqual(final E[]... arrays) { if (arrays.length < 2) { return true; } for (int i = 0; i < arrays.length - 1; ++i) { if (arrays[i].length != arrays[i + 1].length) { return false; } } for (int idx = 0; idx < arrays[0].length; ++idx) { for (int i = 0; i < arrays.length - 1; ++i) { if (arrays[i][idx] != arrays[i + 1][idx]) { return false; } } } return true; } }

Demo.java:

package net.coderodde.util; import java.util.Random; public class Demo { private static final int N = 1000000; /** * An ad-hoc class for profiling purposes. */ static class Person implements Comparable<Person> { enum SortKey { NAME, AGE } private final String name; private final int age; /** * Chooses the key used for sorting. In these case only two: sorting by * name or sorting by age. */ private SortKey sortKey = SortKey.NAME; Person(final String name, final int age) { this.name = name; this.age = age; this.sortKey = SortKey.NAME; } String getName() { return name; } int getAge() { return age; } void setSortKey(final SortKey sortKey) { this.sortKey = sortKey; } @Override public int compareTo(final Person other) { // Works according the current sort key. switch (sortKey) { case NAME: return this.name.compareTo(other.name); case AGE: return Integer.compare(this.age, other.age); default: throw new IllegalStateException("Unknown sorting key."); } } @Override public int hashCode() { switch (sortKey) { case NAME: return name.hashCode(); case AGE: return age; default: throw new IllegalStateException("Unknown sorting key."); } } @Override public boolean equals(final Object o) { switch (sortKey) { case NAME: return ((Person) o).name.equals(this.name); case AGE: return ((Person) o).age == this.age; default: throw new IllegalStateException("Unknown sorting key."); } } } public static void main(final String... args) { final long seed = System.currentTimeMillis(); System.out.println("Seed: " + seed); final Random rnd = new Random(seed); final Person[] array1 = getRandomPersonArray(N, 100, rnd, Person.SortKey.AGE); final Person[] array2 = array1.clone(); System.out.print("net.coderodde.util.Arrays.sort: "); long ta1 = System.currentTimeMillis(); net.coderodde.util.Arrays.sort(array1); long tb1 = System.currentTimeMillis(); System.out.println((tb1 - ta1) + " ms."); System.out.print("java.util.Arrays.sort : "); long ta2 = System.currentTimeMillis(); java.util.Arrays.sort(array2); long tb2 = System.currentTimeMillis(); System.out.println((tb2 - ta2) + " ms."); System.out.println("Sorted arrays equal: " + Arrays.arraysEqual(array1, array2)); } private static Person[] getRandomPersonArray(final int size, final int maxAge, final Random rnd, final Person.SortKey sortKey) { final Person[] array = new Person[size]; final String[] names = { "Jack", "Amanda", "Paul", "David", "Lisa" }; for (int i = 0; i < size; ++i) { array[i] = new Person(names[rnd.nextInt(names.length)], rnd.nextInt(maxAge + 1)); array[i].setSortKey(sortKey); } return array; } }

janos · Accepted Answer · 2021-10-31 12:59:02Z

Counting sort

The definition in wikipedia, emphasis mine:

counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence.

The posted code groups the items to distinct groups, then sorts the keys, and finally builds the sorted output. It does not count objects, and it does not use counts to produce the output.

It sorts based on distinct keys. I see the similarity with counting sort, but it's not really that.

Comparing with `Arrays.sort`

It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6.

I think proper benchmarking is hard in Java. To make such bold statement, I would expect stronger evidence, for example using jmh.

The measurement in the posted code gives unfair advantage to itself, by using a comparison on age, with at most 101 distinct values. That is, the 1e6 random objects will all fall into 101 distinct groups, and in this case it's not surprising that the posted code seems to perform better. Tweaking the posted code a bit, the scale seems to tip when the number of distinct objects is passes about one third of the total count. When all objects are distinct, java.util.Arrays.sort wins hands down.

The statement could become true by adding a relevant detail:

It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6 when the number of distinct items is relatively small (less than one fourth of the total count).

Simpler aggregation code

A bit simpler way to write the aggregation code:

for (int i = fromIndex; i < toIndex; ++i) { final E current = array[i]; List<E> list = map.get(current); if (list == null) { list = new ArrayList<>(); map.put(current, list); } list.add(current); }

Even simpler:

for (int i = fromIndex; i < toIndex; ++i) { final E current = array[i]; List<E> list = map.computeIfAbsent(current, k -> new ArrayList<>()); list.add(current); }

One-liner:

final Map<E, List<E>> map = java.util.Arrays.stream(array, fromIndex, toIndex) .collect(Collectors.groupingBy(Function.identity()));

Jin Kwon · Accepted Answer · 2021-10-31 12:02:47Z

In my humble opinion, the counting sort should be remained as a non-comparison sort. That's why the algorithm is suitable for positive integers of a very narrow range.

The code you introduced is a comparison sort just by specifying Comparable<? super E>>.

The aggregating code is basically same as

Map<E, List<E>> map = new TreeMap<>(); Arrays.stream(array).forEach(e -> { map.compute(e, (k, v) -> { if (v == null) { v = new ArrayList<>(); } v.add(v); return v; }); });

Event for a non-stable algorithm,

Map<E, Integer> map = new TreeMap<>(); Arrays.stream(array).forEach(e -> { map.compute(e, (k, v) -> { if (v == null) { v = 0; } return ++v; }); });

Now the problem is that the distribution of possibility of the unsorted array.

What if someone is trying to counting-sort following array?

1, 2, 2^31-1

Stack Exchange Network

Counting sort for objects in Java

2 Answers 2

Counting sort

Comparing with `Arrays.sort`

Simpler aggregation code

Hot Network Questions

Counting sort for objects in Java

2 Answers 2

Counting sort

Comparing with Arrays.sort

Simpler aggregation code

Related

Hot Network Questions

Comparing with `Arrays.sort`