6
\$\begingroup\$

I was trying out leetcode's Largest Number.
As per the challenge's description:

Given a list of non-negative integers nums, arrange them such that they form the largest number and return it. Since the result may be very large, so you need to return a string instead of an integer.

Example 1:
Input: nums = [10,2]
Output: "210"

Example 2:
Input: nums = [3,30,34,5,9]
Output: "9534330"

Constraints:
1 <= nums.length <= 100
0 <= nums[i] <= \$10^9\$

Here is my solution:

class Solution { // custom comparator: comparing integers as strings // to decide which to place before the other. static class SortByFirstDigit implements Comparator<Integer> { @Override public int compare(Integer i1, Integer i2) { String a = Integer.toString(i1); String b = Integer.toString(i2); //compare integers as strings to know which will be picked up ie 'ab' and 'ba' for example return Integer.compare(0, (a + b).compareTo(b + a)); } } // REF: Convert int[] to Integer[] // function toConvertInteger is taken from stackoverflow at: // https://stackoverflow.com/a/31967630/1063062 public static Integer[] toConvertInteger(int[] ids) { Integer[] newArray = new Integer[ids.length]; for (int i = 0; i < ids.length; i++) { newArray[i] = Integer.valueOf(ids[i]); } return newArray; } public String largestNumber(int[] nums) { Integer[] Nums = toConvertInteger(nums); Arrays.sort(Nums, new SortByFirstDigit()); StringBuilder builder = new StringBuilder(); for (int s : Nums) builder.append(s); //if we get a string of all zeros, strip to 1 zero. if (builder.toString().matches("^[0]+$")) return "0"; return builder.toString(); } } 

Question

  • This code passes all test but runs in around 9ms. What part(s) of the code are making it slow? and how to make it faster?
  • Or, is it better to solve this problem in other ways (ie more performant) like for example using a N-ary tree for example? While analyzing the problem I had at some point thought of a tree algorithm which I will explain graphically in this picture: enter image description here

It will traverse from leafs to parents following:

concatenate(parent key + greater integer nodes (ie greater than parent)) ---> parent integer value -->concatenate (parent key + smaller integer nodes (ie smaller than parent))

ie in this example 9 5 34 3 30

\$\endgroup\$
4
  • 2
    \$\begingroup\$You do not take into account that the input numbers are not necessarily unique. You are essentially submitting for review your data structure for the algorithm and the data stucture is not ready. Also, the question about the traversal algorithm is better suited at stackoverflow.com.\$\endgroup\$CommentedApr 4, 2024 at 11:09
  • 1
    \$\begingroup\$Wouldn't a simple sort of the input strings offer a way to immediately read out the solution? Short strings (e.g. "9") sorting before long ones (e.g. "91") helps us quickly find the answer. The example highlights "2" > "10".\$\endgroup\$
    – J_H
    CommentedApr 4, 2024 at 17:03
  • 3
    \$\begingroup\$@J_H no see example 2, you have 3 34 and 30. In the output 34 is before 3, and 30 is after 3 ie 34330.\$\endgroup\$
    – ccot
    CommentedApr 5, 2024 at 12:44
  • 1
    \$\begingroup\$@TorbenPutkonen Could you indicate an example where the calculation is incorrect? If not could you remove your comment?\$\endgroup\$CommentedApr 7, 2024 at 14:46

3 Answers 3

9
\$\begingroup\$

Let's pretend that the purpose of LeetCode is to train for the real world. In the real world,

  • you wouldn't call this class Solution;
  • you would want to write your own tests;
  • you would want to write your own benchmarks (I do not demonstrate this);
  • largestNumber should be static; and
  • once the performance is "good enough" for your application, move on rather than fussing with micro-optimisation.

To the last point: I (mostly) like your current approach, with some asterisks.

SortByFirstDigit is a lie, since it doesn't sort by the first digit only; I propose something like ByDigits with an explanatory comment.

Delete toConvertInteger and use a simple chain of stream operations. The stream will also obviate your use of a builder.

Don't toString() in the comparator; front-load this to a mapping operation that occurs prior to the sort.

Don't Integer.compare(0; you can use String.compare directly.

The comparator should be a final on the class, rather than being instantiated in the method. Same for the zero-trim regex.

The zero-trim regex should not put 0 in a character class; you can use the character directly.

Suggested

LexConcatenator.java

import java.util.Arrays; import java.util.Comparator; import java.util.regex.Pattern; import java.util.stream.Collectors; public class LexConcatenator { private static final Pattern allZero = Pattern.compile("^0+$"); private static final ByDigits comp = new ByDigits(); private static final class ByDigits implements Comparator<String> { // Compare string representations of integers to know which order produces the larger value when concatenated @Override public int compare(String i1, String i2) { // Use string-lexicographic comparison instead of parsing integers // and comparing those, because lex comparison can short-circuit return (i2 + i1).compareTo(i1 + i2); } } public static String largestNumber(int[] nums) { // nums is assumed positive. String s = Arrays.stream(nums) // Mark the stream as unordered to allow some optimisations // https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/stream/package-summary.html#Ordering .unordered() .parallel() .mapToObj(Integer::toString) .sorted(comp) .collect(Collectors.joining()); if (allZero.matcher(s).matches()) return "0"; return s; } } 

LexConcatenatorTests.java

import org.junit.jupiter.api.Test; import static org.junit.jupiter.api.Assertions.assertEquals; public class LexConcatenatorTests { private static void compare(String expected, int... values) { String actual = LexConcatenator.largestNumber(values); assertEquals(expected, actual); } @Test void basic() { compare("0", 0); compare("1", 1); compare("10", 0, 1); compare("10", 1, 0); compare("100", 1, 0, 0); compare("111", 1, 1, 1); } @Test void zeroTrim() { compare("0", 0, 0); compare("0", 0, 0, 0); compare("0", 0, 0, 0, 0); } @Test void orders() { compare("908070", 70, 80, 90); compare("900000", 9, 0, 0, 0, 0, 0); compare("9111", 9, 111); compare("991000", 99, 1000); compare("54321", 1, 2, 3, 4, 5); compare("43210", 0, 1, 2, 3, 4); } @Test void overflow() { // would overflow a uint64, though impl is just an int32 compare("18999001001001001001", 1_899_900_100, 1_001_001_001); } } 
\$\endgroup\$
13
  • \$\begingroup\$(Don't Integer.compare(0, ‹comparison›(a, b)): do ‹comparison›(b, a) instead.)\$\endgroup\$
    – greybeard
    CommentedApr 7, 2024 at 15:35
  • 1
    \$\begingroup\$That's... not hidden. I call it out in the front-load description, and it's a pretty clear part of the stream.\$\endgroup\$CommentedApr 7, 2024 at 16:19
  • 1
    \$\begingroup\$Maybe! I'm not very good at DP. I suggest that you file a theory-focused question on cs.stackexchange.com . My first wild guess is that any tree-based solution won't improve on the time complexity of a basic sort, but it might win if you take advantage of some structural knowledge i.e. the digit count and the radix.\$\endgroup\$CommentedApr 7, 2024 at 17:05
  • 1
    \$\begingroup\$Regarding performance, the (i2 + i1).compareTo(i1 + i2) bit is the Achilles' heel as it is creating two throwaway strings for every single "integer" comparison. Also the allZero regex is killing a fly with a sledgehammer as you only need to check if the first character in the string is 0.\$\endgroup\$CommentedApr 8, 2024 at 7:48
  • 1
    \$\begingroup\$@MaartenBodewes Converting the input elements into strings in not necessary.\$\endgroup\$CommentedDec 31, 2024 at 19:09
7
\$\begingroup\$

Observations

Your implementation of the edge case check whether all input elements are equal to zero is costly:

if (builder.toString().matches("^[0]+$")) return "0"; return builder.toString(); 

Firstly, the resulting String is being generated twice, which means allocating two new arrays and copying the internal byte array of the StringBuilder two times. Additionally, you're using the regex engine to process the string.

While it's sufficient to check only if the first character is '0', as the leftmost character corresponds to the first digit of the largest input element. The only scenario in which this character can be '0' is when the largest element itself is zero, which implies that all elements are zero.

if (builder.charAt(0) == '0') { return "0"; } return builder.toString(); 

It's worth noting that it's perfectly fine repeating cheap method calls which produce the same value (such as accessing a property via getter), but it is not prudent to do so when it comes to expensive calls.

And I want to make it clear that JIT-compiler is capable of determining through escape analysis that in this case an instance of StringBuilder is guaranteed not to be modified between these two calls buffer.toString() and it's safe to reuse the result of the first call. But before this can happen, the method should become "hot", i.e. the number of invocations should exceed the compilation threshold. On modern JVMs that make use of the tiered compilation it happens quicker, nonetheless, there would be a considerable amount of calls incurring the penalty of a repeated expensive call.

Designing an Alternative Solution

The solution shared by OP and its improved version posted by Reinderien are making use JDK's built-in Timsort incurring a heavy cost by constructing a pair of combined Strings during each comparison.

There can be two possible directions of improvement:

  • reduce the number of comparisons (we need a different algorithm);
  • reduce the cost of comparison (we need a less expensive comparison strategy).

Let's take a closer look at the process of comparing numbers and see if we can derive some useful properties that would allow us to compare elements directly with one another without constructing additional strings.

Analyzing the Comparison Process

Disclaimer: this analysis is not intended to be a rigorous mathematical study. Instead, the goal to provide an accessible overview of the properties of concatenated integer numbers

  • Case study 1:L2 == L1

In the most simple and the least interesting scenario, the numbers have the same length.

We can compare these numbers directly, without concatenation, to determine their ordering.

Conjecture 1:

If numbers to be compared have the same length, we can compare them directly.

Proof

If the numbers are equal, the results of their concatenations will also be equal. If the numbers differ, a concatenated version that starts from a greater number will also be greater. Hence, we can compare numbers of equal length directly.


But in general, a randomly chosen pair of numbers from the range [0, 109] is likely to contain numbers of different length.

  • Case study 2:L2 % L1 == 0; L1 x k = L2 where k > 1

Suppose we have a pair in which one number has a length L2 is exactly k times greater than the length L1 of the other number.

Let's denote the digits of the shorter number as AB. And let's consider for now that digits of the longer number are undefined and therefore denoted as ?. We'll explore how each ? relates to A or B them as we go (each ? can relate to A and B as either less, greater or equal depending on the scenario being considered).

Below are the abstract representations of the concatenated values. To find out which of them is greater, we should start at the leftmost (most significant) digit and compare the digits one by one. Nothing arcane.

AB:?????? ?? <- concatenated value 1st:2nd ?? ??????:AB <- concatenated value 2nd:1st -- <- digits to compare 

It's important to note that letters A and B are just a way of referring to a digit at a particular position in the shorter number (1st number). Our primary interest is to find out how each of the ? in the longer number (2nd number) relates to them. The fact that these letters are distinct doesn't mean that their actual digit values are necessarily distinct.

Column sign : separator is used only as an indication of the origin of the digits on its left and right. Whitespace characters are added at positions corresponding to : to avoid offsets

In the most trivial scenario, at least one of the two first digits of the 2nd number differs from AB, and we can immediately determine the greater number without considering other digits.

If the first two first digits of the concatenated values are equal, then we can replace the first pair of ? in the longer number with AB:

AB:AB???? ?? AB ??????:AB -- <- digits to compare 

Note: we should compare one digit at a time. Prior to the result above, we will have AB:A??????? and A???????:AB. I'm describing the process of comparison, considering the digit sequence with the length equal to the length of the shorter number only to reduce the number of the diagrams.

Now we have to compare the third and fourth digits of the 2nd number. And again, if one of these ? differs from AB - we got the result of comparison and the remaining ? will come into picture.

If they are equal to AB, we should repeat the same steps the next pair of digits.

AB:ABAB?? ?? AB:ABABAB ?? AB:ABABAB AB AB AB????:AB => AB ABAB??:AB => AB ABABAB:AB (identical digit patterns) -- -- 

Observation:

The process of constructing each concatenated value is an equivalent of repeating the digits of the initial number (on the left of the : separator) in a circular fashion.

The shorter number was repeated entirely k times: (AB):(AB)(AB)(AB)(AB).

And the longer number was only partially repeated, with the repeated segment truncated to the length of the shorter number: (ABABABAB):(AB).

Conjecture 2:

In the context of this problem, any number can be considered to be equal to a longer number repeating all its digits k times (i.e. it doesn't matter which number goes first 39 or 3939).

Proof

The example shown above can be generified to demonstrate that this conjunction holds true for any L1 and L2, where L2 is exactly k times greater than L1.

We can repeat the same steps to prove that a digit sequence S1 containing all digits of the shorter number will be observed in the longer number k times, and both versions of the concatenated value will consist of the digit sequence S1 repeated k+1 times. Hence, guaranteed to be equal regardless of the actual digit constituting this S1 sequence.

  • Case study 3:L2 % L1 != 0

Now, let's consider a case when the length L2 of the first number does not divide evenly by the length L1 of the second number.

And let's refer to the digits of the shorter number as AB. And let's denote the digits of the longer number with ? (as undefined).

AB:??? ?? AB:AB? ?? AB:ABA B? AB:ABA BA ?? ???:AB => AB ???:AB => AB AB?:AB => AB ABA:AB (varying digit patterns) -- -- - 

The digit patterns of concatenated values differ from each other. And they can be equal only if the actual digits that correspond to A and B are equal. However, that's not the most important observation.

The key detail here is that both concatenated values are formed by repeating the digits of the initial number on the left of the : separator.

The shorter number (AB) is repeated as (AB)(AB)(A) with the last segment truncated. And the longer number (ABABA) partially repeated as (AB).

This characteristic is also seen in the previously considered case. We can formulate a new conjecture based on the example shown above and the observation from the previous case study.

Conjecture 3:

While performing comparison of two numbers of length L1 and L2, where L1 != L2, we can replace these numbers with their repetitive versions of length L1 + L2 without effecting the result of comparison.

Proof

Let's say that digits of the shorter constitute a digit sequence S1.

And let's go through the same steps of constructing concatenated values as we did already a couple of time.

The first mismatch determines the result of comparison. If the prefix of length L1 at the beginning of the concatenated value that starts with the longer number doesn't match S1, then we know the winner. Otherwise, the equality of these segments creates a condition for the subsequent equality check, and we proceed with the same steps of constructing concatenated values as we did a couple of times before.

Here are the resulting concatenated values:

(S1) : (S1)x(L2/L1 times) + (S1)[0, L2%L1) <- 1st:2nd (S1) + (S1)x(L2/L1 - 1 times) + (S1)[0, L2%L1) : (S1) <- 2nd:1st 

I.e. the first value is formed by repeating the shorter number described as (S1). The repeated part consists of L2/L1 full (S1) sequences followed by a prefix of (S1) having a length of L2 % L1.

And the second concatenated value is constructed by being partially repeating the longer number. The repeated part consists of a single occurrence of (S1).

We can implement comparison based on this property instead of constructing dozens of strings.

Implementing repeating integer wrapper

Let me introduce a lighweight wrapper type that we'll use later to leverage the comparison properties we discovered.

Its key functionality:

  • method digitAt() retrieving a digit at arbitrary position that exceed the length of the wrapped int value;
  • method length() reporting the length of the wrapped int value.

It's a simple data carrier that doesn't need encapsulation, hence, a perfect candidate for being implemented as a Java 16 record.

public record RepeatingWrapper(int initial, byte[] digits) { private static final int MAX_INT_DIGIT_COUNT = 10; // the number of digit in Integer.MAX_VALUE public int digitAt(int position) { return digits[position % digits.length]; } public int length() { return digits.length; } public static RepeatingWrapper wrap(int initialValue) { var digits = new byte[positiveIntLength(initialValue)]; addDigits(digits, initialValue); return new RepeatingWrapper(initialValue, digits); } private static int positiveIntLength(int number) { int sentinel = 10; for (int count = 1; count < MAX_INT_DIGIT_COUNT; count++) { if (sentinel > number) { return count; } sentinel *= 10; // multiplication takes less CPU cycles than division } // hence sentinel value is multiplied instead of dividing initial value return MAX_INT_DIGIT_COUNT; } private static void addDigits(byte[] digits, int initialValue) { int remainder = initialValue; for (int i = digits.length - 1; i >= 0; i--) { digits[i] = (byte) (remainder % 10); remainder /= 10; } } // It's also a good idea to override equals() and hashCode() based on the initial int value (I'm sure you know how to do this) } 

Alternative Timsort-based Solution

Now, this task boils down to implementing Comparator that sorts the wrappers based on the first missmatching digit value in reversed order.

The following implementation makes use of Conjecture 1 (numbers of equal length can be compared directly) and Conjecture 3 (numbers can be replaced with their repetitive versions of length L1 + L2):

private static final Comparator<RepeatingWrapper> REVERSED_WRAPPER_COMPARATOR = new Comparator<>() { @Override public int compare(RepeatingWrapper left, RepeatingWrapper right) { if (left.length() == right.length()) { return right.initial() - left.initial(); } int totalLength = left.length() + right.length(); for (int i = 0; i < totalLength; i++) { int diff = right.digitAt(i) - left.digitAt(i); if (diff != 0) { return diff; } } return 0; } }; 

The remaining part is streightforward: wrap the input, sort, unwrap and join.

public static String largestNumber(int[] numbers) { var result = Arrays.stream(numbers) .mapToObj(RepeatingWrapper::wrap) .sorted(REVERSED_WRAPPER_COMPARATOR) .mapToInt(RepeatingWrapper::initial) .collect( StringBuilder::new, StringBuilder::append, StringBuilder::append ); return result.charAt(0) == '0' ? "0" : result.toString(); } 

Note that we are not generating resulting String if the first character is '0'

Unit tests:

class RepeatingTimsortTest { @ParameterizedTest @MethodSource("zeroArrays") void largestNumber_ShouldReturnZero_WhenAllElementsAreZero(int[] input) { assertThat(largestNumber(input)).isEqualTo("0"); } private static Stream<Arguments> zeroArrays() { return Stream.of( Arguments.of(new int[1]), Arguments.of(new int[100]) ); } @ParameterizedTest @MethodSource("sameLengthInput") void largestNumber_ShouldSortElementsOfSameLength(int[] input, String expectedResult) { assertThat(largestNumber(input)).isEqualTo(expectedResult); } public static Stream<Arguments> sameLengthInput() { return Stream.of( Arguments.of(new int[]{1, 0, 3, 2, 5, 6, 4, 7, 9, 8}, join("9, 8, 7, 6, 5, 4, 3, 2, 1, 0")), Arguments.of(new int[]{399, 126, 999, 328, 109}, join("999, 399, 328, 126, 109")), Arguments.of(new int[]{10999, 99775, 33333, 99775, 10999}, join("99775, 99775, 33333, 10999, 10999")) ); } @ParameterizedTest @MethodSource("prefixesInput") @DisplayName("[ largestNumber ] Should resolve elements that have common prefixes") void largestNumber_ShouldResolvePrefixedElements(int[] input, String expectedResult) { assertThat(largestNumber(input)).isEqualTo(expectedResult); } public static Stream<Arguments> prefixesInput() { return Stream.of( Arguments.of(new int[]{35, 31, 3, 39}, join("39, 35, 3, 31")), Arguments.of(new int[]{35, 31, 3, 356, 39}, join("39, 356, 35, 3, 31")), Arguments.of(new int[]{39535, 395, 39535, 350, 3503, 35035, 3505}, join("395, 39535, 39535, 3505, 35035, 350, 3503")), Arguments.of(new int[]{35935, 359, 35935}, join("359, 35935, 35935")), Arguments.of(new int[]{75, 7593, 7593, 75}, join("7593, 7593, 75, 75")), Arguments.of(new int[]{35, 353, 35, 353, 35}, join("35, 35, 35, 353, 353")), Arguments.of(new int[]{999985, 777736, 9999859, 9999859, 7777367, 999985}, join("9999859, 9999859, 999985, 999985, 7777367, 777736")), Arguments.of(new int[]{9999977, 9999983, 999, 9999953, 9999905, 9999950, 9999942, 9999945, 9999915}, join("999, 9999983, 9999977, 9999953, 9999950, 9999945, 9999942, 9999915, 9999905")) ); } private static String join(String initial) { return initial.replace(", ", ""); } @Test void largestNumber_ShouldSortLargeInput() { var input = RandomGenerator.of("L64X128MixRandom") .ints(0, 1_000_000_000) .limit(10_000_000) .toArray(); var expected = LexConcatenator.largestNumber(input); var actual = largestNumber(input); assertThat(actual).isEqualTo(expected); } } 

Test dependencies: JUnit 5 and AssertJ

The last test compares the result of sorting 10 million elements with the value produced by the implementation shared by Reinderien (since it's simple and straightforward enough to be trusted).

Fun Part - Measuring Performance

First of all it's worth to note, that the numbers you're getting from LeetCode (or any similar platform) are very rough estimates intended to give you a general idea of how your implementation performs. However, don't treat them as precise performance metrics.

LeetCode cannot afford to run your code long enough for JIT compilation to kick in, and then measure the performance of the optimized code. They also cannot execute your code with minimal interference from other processes. Doing so would be wasteful and would take an inordinate amount of time.

If you want more accurate measurements, you can create your own benchmarks using the Java Microbenchmark Harness (JMH), developed by the OpenJDK team, and run them on your local machine.

@OutputTimeUnit(value = TimeUnit.NANOSECONDS) @Warmup(iterations = 20) @Measurement(iterations = 10) @Fork(5) @BenchmarkMode(Mode.AverageTime) @State(Scope.Benchmark) public class LargestNumberPerformanceTest { private int valueUpperBound = 1_000_000_000; private int digitCount = 10; @Param({ "1000", "10000", "100000" }) private int inputSize; private int[] input; @Setup public void initializeInput() { input = RandomGeneratorFactory .of("L64X128MixRandom") .create(1919121) .ints(0, valueUpperBound) .limit(inputSize) .toArray(); } public static void main(String[] args) throws IOException { org.openjdk.jmh.Main.main(args); } @Benchmark public String timsortReinderien() { return LexConcatenator.largestNumber(input); } @Benchmark public String timsortOP() { return new Solution().largestNumber(input); } @Benchmark public String heapsortOP() { return new LargestNumber().largestNumber(input); } @Benchmark public String repeatitionBasedTimsort() { return RepeatingTimsort.largestNumber(input); // implementation I shared previously } } 

This benchmark also includes Heap sort implementation shared by OP on CS Stack Exchange

The problem description states that the given array will contain up to 100 elements. However, such a tiny input size in this case results in a lightning-fast running time of a method call, which are known to be more challenging to benchmark because they are more impacted by the internal activities in the JVM and at the system level. For that reason, I've used the smallest input size of 1,000.

Before we take a look at the results of the benchmarks presented above, let me introduce a problem related to the performance of Java 8 Streams.

Reinderien used a parallel Stream in their solution, expecting it would reduce the execution time. Let's run the following benchmarks to see if this decision will pay off.

@Benchmark public String timsortReinderien() { return LexConcatenator.largestNumber(input); } @Benchmark public String timsortParallelReinderien() { return LexConcatenatorParallel.largestNumber(input); } 

The only difference between the two versions: I commented out parrallel() operation in the first one. The numbers of warmup and measurement iterations, as well as the number of JVM fork are the same as in the configuration shown above

Benchmark (inputSize) Mode Cnt Score Error Units timsortParallelReinderien 1000 avgt 50 692409.299 ± 1275.764 ns/op timsortParallelReinderien 10000 avgt 50 3084455.051 ± 9519.656 ns/op timsortReinderien 1000 avgt 50 470691.091 ± 2077.516 ns/op timsortReinderien 10000 avgt 50 6527085.855 ± 30675.021 ns/op 

These results I got on a 16-core machine while it was idle with no junk running in the background (before executing benchmarks it highly addviseble to restart the system and make sure that there will be as fewer interfires with other processes as possible, see the manual in the linked JMH-repository)

Doesn't look impressive at all.

For 1,000 switching to parallel actually worsens the running time by 50%. While according to the problem description, there would be at most 100 elements, and the overhead of the parallel execution would be felt more sharply.

The original problem definitely doesn't warrant employing concurrency.

In case of 10,000 elements the parrallel stream runs twice faster, but it's probably not exactly what you might expect from 16 cores. If you'll try to increase the input size to 100,000 elements, you'll see the ratio parallel vs sequential improving further.

Despite this answer getting lengthy, believe it's worth taking the time to explain these parallel vs sequential metrics and to scrutinize the topic of parallel stream in general.

Myth of 100% Guaranteed Performance Boost with Parallel Streams

It's pretty common to see, especially on Stack Overflow, suggestions along the lines of "if you need to speed up the execution, and you have more than one core, Parallel Streams to the rescue!"

But this decision should not be taken lightly, the overhead of thread-management and the costs of splitting and merging concurrent tasks can outweigh the benefits of parallel execution.

Whether it's worth considering parallelizing a CPU-intensive piece of code isn't only contingent on the number of cores available, there are other factors such as:

  • The number of the input elements.
  • The cost of the operations performed on each element (sorting 1,000 elements is not a big deal, but when these elements are used as an input of a computationally-intensive function, let's say, with polynomial complexity, even 100 elements will be a lot of work).
  • And there environment-specific factors. Parallel Streams are using threads from the Common ForkJoinPool (that's an implementation detail which is not mentioned in the Stream API documentation, and we cannot configure it, although there's a hack with running a parallel stream inside your own fork-join pool that hinges on this implementation detail). In a multithreaded environment, cases when a parallel stream doesn't complete quickly enough might create performance bottlenecks.

The bullet points one and two are responsible for the unexpected figures of these measurements parallel vs sequential.

The primary cost running this stream pipeline is associated with the sorted() operation:

String s = Arrays.stream(nums).unordered().parallel() .mapToObj(Integer::toString) .sorted(comp) .collect(Collectors.joining()); 

Converting each element into a string and joining the result are way less expensive.

Internally sorted() operation collects all the elements into an array and then delegates the job to Arrays.parallelSort(). To observe how the cases input size 1,000 and 10,000 differ from each other I would advice to place a breakpoint inside the Arrays.parallelSort() method, and run the Reinderien's parallel stream with these input sizes (you can use one of my unit tests for that).

Alternatively, we can take a closer look at the implementation:

public static <T> void parallelSort(T[] a, Comparator<? super T> cmp) { if (cmp == null) cmp = NaturalOrder.INSTANCE; int n = a.length, p, g; if (n <= MIN_ARRAY_SORT_GRAN || (p = ForkJoinPool.getCommonPoolParallelism()) == 1) TimSort.sort(a, 0, n, cmp, null, 0, 0); else ... 

In the second if-statement we see that in case if parallelism of the common pool is set to one of if the input size is less or equal to the MIN_ARRAY_SORT_GRAN threshold of 8192 elements, the juice is not worth the squeese and sorting will be performed sequentially in the current thread.

I.e. effectively, for 1,000 input elements, only map() and collect() are executed in parallel. And we are paying the price of concurrency related overheads without reaping significant benefits.

This exemplifies the importance of carefully considering the decision of whether to use parallel streams.

And in practice, parallel streams are a niche case and are not used very often.

Now, let's proceed, with the analysis of the results of the benchmarks, and compare how the alternative implementation I've show differs from other solutions posted so far (all streams will be executed sequentially).

Analyzing the Benchmark Results

1,000 elements:

Benchmark (inputSize) Mode Cnt Score Error Units heapsortOP 1000 avgt 50 1333543.206 ± 3086.212 ns/op timsortOP 1000 avgt 50 686338.675 ± 1727.745 ns/op timsortReinderien 1000 avgt 50 471167.662 ± 1199.969 ns/op wrapperBasedTimsort 1000 avgt 50 290360.837 ± 2607.703 ns/op 

As we can in case, Heapsort-based implementation has the largest execution time. This algorithm has the same time complexity of O(n log n) as Timsort but tends to have higher constant factors, making it slower in practice.

The use case when Heapsort might be preferable than other sorting algorithm maybe a memory-constraint environment (since it allows to sort the given array in place).

Heapsort is not adaptable, its constant factors don't vary significantly. While, Timsort is highly adaptable, and it can even run in linear time if the input is almost sorted.

Since Heapsort's performance is always almost identical regardless of the input there's no chance it can outdo Timsort, and I excluded from other benchmarks to reduce the overall execution time (because it takes a lot of time, but you can generate results for Heapsort with other input sizes if you wish).

10,000 elements:

Benchmark (inputSize) Mode Cnt Score Error Units timsortOP 10000 avgt 50 9716024.613 ± 28462.212 ns/op timsortReinderien 10000 avgt 50 6676583.912 ± 14005.644 ns/op wrapperBasedTimsort 10000 avgt 50 4204352.438 ± 27464.796 ns/op 

100,000 elements:

Benchmark (inputSize) Mode Cnt Score Error Units timsortOP 100000 avgt 50 120442242.471 ± 589555.482 ns/op timsortReinderien 100000 avgt 50 85392935.668 ± 493892.337 ns/op wrapperBasedTimsort 100000 avgt 50 55345557.752 ± 327648.833 ns/op 

As we can observe, the following relationship between the results of the three Timsort-based solutions holds true for all input sizes:

  • Reinderien's solution is ~ 60% slower than the wrapper-based solution;
  • The original solution is twice slower than the wrapper-based solution.

Since the algorithm is the same, differs only the cost of each comparison (i.e. only constant factors are effected), we should expect the overall ratios remain the same regardless of the input size.

\$\endgroup\$
1
  • \$\begingroup\$Ultra trivial suggestion: Perhaps using caret ('^') to indicate value(s) (columns) being compared in the text description might be more recognisable. Me? I can't help "seeing" those hyphens as some sort of "underscore" that implies some sort addition (ie. not them as a marker/pointer). Excellent exposition in this answer, btw! Cheers!\$\endgroup\$
    – Fe2O3
    CommentedJan 1 at 22:55
3
\$\begingroup\$

(**Revised from earlier version after serious flaw reported by @A. Ivanchenko. Thank you, Alex.)


It's worrisome to see code for a problem involving an array of unsigned integers resorting to string-ification to find a solution.

Sorting will involve iterative comparison (and possible swapping) of pairs of values. The Leetcode problem states there may be up to 100 integer values involved, ranging from (single digit) 0 to the 10 digits of 109, each value comprised of any number of digits. The OP invokes transformations to strings in order to compare lots of pairs.

Even with conversions to strings (lots of looping and modulo arithmetic) done only once per execution ("up-front loading"), requisite iterative comparisons of array digit values, being 1 to 10 digits long, entails a lot of looping...

After correcting my too optimistic approach (thanks again, Alex), the problem can be solved without recourse to strings or arrays (other than the array of values used.)


NB: Consider the range of the values given in the problem statement. The maximum value is exactly 109 requiring 10 decimal digits ("1,000,000,000"). This suggests that the designer wanted to include the hazard that the maximum value that can be represented by a 32 bit unsigned integer: FFFFFFFF16 == 429496729510. Working with 32 bit unsigned integers, values greater than ~4.3billion are out of range! The ceiling would be much higher if 64 bit integers are used. One 'trade off' is that doubling the size of each value could significantly decrease cache performance. "One doesn't use a hammer to squash a fly."


Consider the first example provided: [10,2] to produce 210.

Trial-and-error reveals that it would be "convenient" if 10 could be temporarily amplified to a value equivalent to nine decimal digits 101010101, and 2 likewise to nine decimal digits: 222222222. (Recall: nine decimal digits is a value that fits snuggly in a 32 bit variable.) Then, sorting (descending) gives the sequence [(2,222222222),(10,101010101)] where those second values are the integer sort keys and 2 or 10 is the actual value to be used for output.

Try again with a few values up to 3 digits long: [47,8,203]
Again, "amplifying" each value to achieve uniformity (3 digits each) could be achieved. The trick is to reproduce the digit values, left-to-right as many times as necessary to achieve uniformity of the numbers.
203 becomes 203203203
47 becomes 474747474 and
8 becomes 888888888.

Now, sorting these "amplified" numbers (descending) gives
( 8,888888888),
( 47,474747474),
(203,203203203)
and extracting the base values in order gives 8_47_203, the largest value that can be formed by the concatenation of the original digits.

The observation is that any number of digits can be 'amplified' to a companion value (of a uniform number of digits) suitable for ranking by simple integer, not string, comparisons.

Turning to the single, ten-digit 'outlier' of 1,000,000,000, all those trailing zeros indicate that this value (or multiple instances thereof) belong on the right end of the finished result. Consider the case of only two values 1,000,000,000 and 7. It's readily apparent that the concatenation looking like 71,000,000,000 is much more than 10,000,000,007. In other words, the single 10 digit value can be handled specially when comparing pairs while sorting elements.

Below are fragments of a 'C' approach to this problem; not Java as I am unfamiliar with Java. I do beg your indulgence.

// define an object to store value both as given and 'augmented' typedef struct { uint32_t org; // original value uint32_t nrm; // 'augmented' } Arr; ... // initialise an array of 'original' values ('augmented' companions default to zero) Arr arr[] = { {0}, {1}, {12}, {0}, {1212}, {123}, {0}, {1234}, {12345}, {123456}, {1234567}, {12345678}, {123456789}, {1000000000}, {0}, }; ... // One-time (startup) derivation of augmented values from original values for( i = 0; i < sizeof arr/sizeof arr[0]; i++ ) { uint32_t n = arr[i].org; // duplicate the value if( n <= 9 ) n = n * 111111111; // 9 => 999999999 else if( n <= 99 ) n = n * 10101010 + n/10; // 98 => 989898989 else if( n <= 999 ) n = n * 1001001; // 987 => 987987987 else if( n <= 9999 ) n = n * 100010 + n/1000; // 9876 => 987698769 else if( n <= 99999 ) n = n * 10000 + n/10; // 98765 => 987659876 else if( n <= 999999 ) n = n * 1000 + n/1000; // 987654 => 987654987 else if( n <= 9999999 ) n = n * 100 + n/100000; // 9876543 => 987654398 else if( n <= 99999999 ) n = n * 10 + n/10000000; // 98765432 => 987654329 else n = ( n == 1000000000 ) ? 0 : n; // 987654321 => 987654321 // Treat 1 billion (10 digits) like zero, except to compare against zero. arr[i].nrm = n; // assign 'augmented' version } ... // Specialised integer comparison function returning -ve/0/+ve // enabling sorting function to swap-or-not two objects. // NB: Descending sequence is wanted. // 1 billion (with trailing zeros) is valued as infinitesimally greater than zero // achieving the desired sorting sequence. int cmp( const void *p0, const void *p1 ) { Arr *px = (Arr*)p0; Arr *py = (Arr*)p1; if( px->nrm == py->nrm ) { if( px->org == 1000000000 ) return -1; if( py->org == 1000000000 ) return 1; return 0; // optional. Could just fall thru to next return statement } return py->nrm - px->nrm; // NB: descending } 

The OP's code and pasted image demonstrate a serious attempt was made to solve this Leetcode challenge. Kudos for the effort.

The attempt to use a tree as the data store will always be fruitless until the 'quirk' of the data values (and how to represent them for useful comparison) is perceived and conquered.
In fact, the overheads of allocating and connecting a tree's nodes can be expensive in both time and space requirements.


Much can be done using strings as "array objects", but these abstractions come at a price ("overhead").

When you truly understand the desired outcome of sorting/comparison, and work through sufficient and varied examples, the solution (to this problem, and others) can be achieved within the realm of blindly fast, "bare metal" integer operations that computers are so good at.

Disagreeing with @Reinderien, using available time to optimise and improve on what you've already achieved is how you can hone your skills and develop new ones. You can learn to see (and produce) better and faster solutions to future challenges you may encounter.

\$\endgroup\$
3
  • 1
    \$\begingroup\$I have to point out that the conclusion that you drew in the first part of your answer is wrong. "Consider this "padding the integer on the right with sufficient copies of its lowest digit"" - appending copies of the copies of the least significant digit to equate the length of two numbers will give incorrect results.\$\endgroup\$CommentedJan 1 at 10:25
  • 2
    \$\begingroup\$Consider 19 and 1993. If you substitute 19 with 1999 (as you said "sufficient copies of its lowest digit"), then you'll get the ordering 191993. Which is wrong, it should be 199319. I've examined the process of comparison at length in my answer\$\endgroup\$CommentedJan 1 at 10:41
  • 1
    \$\begingroup\$@AlexanderIvanchenko You're absolutely right! I was too optimistic in my first approach (replicating 'units' digit only.) Posting corrected revision as I think it's important to stress using integer comparisons (how to?) vs. multiple array element comparisons... Thanks for the pointer! Cheers! :-)\$\endgroup\$
    – Fe2O3
    CommentedJan 1 at 22:01

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.