Observations
Your implementation of the edge case check (whether all input elements are equal to zero) is costly:

```java
if (builder.toString().matches("^[0]+$")) return "0";
return builder.toString();
```
Firstly, the resulting `String` is generated twice, which means allocating two new arrays and copying the internal byte array of the `StringBuilder` two times. Additionally, you're using the regex engine to process the string.
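As a minimal first step, the duplicate generation could be avoided by hoisting the result into a local variable (a sketch, still paying for the regex):

```java
String result = builder.toString();        // the String is generated only once
if (result.matches("^[0]+$")) return "0";  // the regex scan still remains
return result;
```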
However, neither the regex nor the double generation is needed at all: it's sufficient to check only whether the first character is `'0'`, as the leftmost character corresponds to the first digit of the largest input element. The only scenario in which this character can be `'0'` is when the largest element itself is zero, which implies that all elements are zero.
```java
if (builder.charAt(0) == '0') {
    return "0";
}
return builder.toString();
```
It's worth noting that it's perfectly fine to repeat cheap method calls that produce the same value (such as accessing a property via a getter), but it is not prudent to do so when it comes to expensive calls.
To be clear, the JIT compiler is capable of determining through escape analysis that in this case the `StringBuilder` instance is guaranteed not to be modified between the two `toString()` calls, and that it's safe to reuse the result of the first call. But before this can happen, the method has to become "hot", i.e. the number of invocations has to exceed the compilation threshold. On modern JVMs that use tiered compilation this happens sooner; nonetheless, a considerable number of calls will still incur the penalty of a repeated expensive call.
Designing an Alternative Solution
The solution shared by the OP and its improved version posted by Reinderien make use of the JDK's built-in Timsort, incurring a heavy cost by constructing a pair of combined `String`s during each comparison.
There are two possible directions of improvement:
- reduce the number of comparisons (we need a different algorithm);
- reduce the cost of comparison (we need a less expensive comparison strategy).
Let's take a closer look at the process of comparing numbers and see if we can derive some useful properties that would allow us to compare elements directly with one another without constructing additional strings.
Analyzing the Comparison Process
Disclaimer: this analysis is not intended to be a rigorous mathematical study. Instead, the goal is to provide an accessible overview of the properties of concatenated integer numbers.
In the simplest and least interesting scenario, the numbers have the same length.
We can compare these numbers directly, without concatenation, to determine their ordering.
Conjecture 1: if the numbers to be compared have the same length, we can compare them directly.
Proof: if the numbers are equal, the results of their concatenations will also be equal. If the numbers differ, the concatenated version that starts with the greater number will also be greater. Hence, we can compare numbers of equal length directly.
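For illustration, a quick check with an arbitrary pair of equal-length numbers (the values are mine):

```java
int a = 54, b = 39;                        // two numbers of equal length
String ab = "" + a + b;                    // "5439"
String ba = "" + b + a;                    // "3954"
// the concatenation starting with the greater number wins, so comparing
// the concatenations agrees with comparing the numbers directly
System.out.println(ab.compareTo(ba) > 0);  // true
System.out.println(a > b);                 // true
```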
But in general, a randomly chosen pair of numbers from the range `[0, 10^9]` is likely to contain numbers of different lengths.
- Case study 2: `L2 % L1 == 0`; `L1 x k = L2` where `k > 1`

Suppose we have a pair in which the length `L2` of one number is exactly `k` times greater than the length `L1` of the other number.
Let's denote the digits of the shorter number as `AB`. And let's consider for now that the digits of the longer number are undefined and therefore denoted as `?`. We'll explore how each `?` relates to `A` or `B` as we go (each `?` can relate to `A` and `B` as either less, greater or equal, depending on the scenario being considered).
Below are the abstract representations of the concatenated values. To find out which of them is greater, we should start at the leftmost (most significant) digit and compare the digits one by one. Nothing arcane.
```
AB:?????? ??   <- concatenated value 1st:2nd
?? ??????:AB   <- concatenated value 2nd:1st
--             <- digits to compare
```
It's important to note that the letters `A` and `B` are just a way of referring to a digit at a particular position in the shorter number (the 1st number). Our primary interest is to find out how each `?` in the longer number (the 2nd number) relates to them. The fact that these letters are distinct doesn't mean that their actual digit values are necessarily distinct.
The colon sign `:` is used only as an indication of the origin of the digits on its left and right. Whitespace characters are added at the positions corresponding to `:` to avoid offsets.
In the most trivial scenario, at least one of the first two digits of the 2nd number differs from `AB`, and we can immediately determine the greater number without considering the other digits. If the first two digits of the concatenated values are equal, then we can replace the first pair of `?` in the longer number with `AB`:
```
AB:AB???? ??
AB ??????:AB
   --          <- digits to compare
```
Note: we should compare one digit at a time. Prior to the result above, we would have `AB:A???????` and `A???????:AB`. I'm describing the process of comparison considering digit sequences with length equal to the length of the shorter number only to reduce the number of diagrams.
Now we have to compare the third and fourth digits of the 2nd number. And again, if one of these `?` differs from `AB`, we have the result of the comparison and the remaining `?` never come into the picture. If they are equal to `AB`, we repeat the same steps with the next pair of digits.
```
AB:ABAB?? ??      AB:ABABAB ??      AB:ABABAB AB
AB AB????:AB  =>  AB ABAB??:AB  =>  AB ABABAB:AB   (identical digit patterns)
     --                  --
```
Observation: the process of constructing each concatenated value is equivalent to repeating the digits of the initial number (on the left of the `:` separator) in a circular fashion.

The shorter number was repeated entirely `k` times: `(AB):(AB)(AB)(AB)(AB)`.

And the longer number was only partially repeated, with the repeated segment truncated to the length of the shorter number: `(ABABABAB):(AB)`.
Conjecture 2: in the context of this problem, any number can be considered equal to a longer number repeating all its digits `k` times (i.e. it doesn't matter which number goes first, `39` or `3939`).
Proof: the example shown above can be generalized to demonstrate that this conjecture holds true for any `L1` and `L2`, where `L2` is exactly `k` times greater than `L1`.

We can repeat the same steps to prove that a digit sequence `S1` containing all digits of the shorter number will be observed in the longer number `k` times, and both versions of the concatenated value will consist of the digit sequence `S1` repeated `k + 1` times. Hence, they are guaranteed to be equal regardless of the actual digits constituting this `S1` sequence.
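A quick check of the `39` vs `3939` example (a throwaway snippet, not part of the solution):

```java
String first = "39" + "3939";   // "393939"
String second = "3939" + "39";  // "393939"
// both orders of concatenation yield the same value, so for this problem
// the two numbers can be treated as equal
System.out.println(first.equals(second)); // true
```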
- Case study 3: `L2 % L1 != 0`

Now, let's consider a case when the length `L2` of the first number does not divide evenly by the length `L1` of the second number. As before, let's refer to the digits of the shorter number as `AB` and denote the digits of the longer number with `?` (as undefined).
```
AB:??? ??       AB:AB? ??       AB:ABA B?       AB:ABA BA
?? ???:AB   =>  AB ???:AB   =>  AB AB?:AB   =>  AB ABA:AB   (varying digit patterns)
--                 --                 -
```
The digit patterns of the concatenated values differ from each other. They can be equal only if the actual digits corresponding to `A` and `B` are equal. However, that's not the most important observation.

The key detail here is that both concatenated values are formed by repeating the digits of the initial number on the left of the `:` separator.

The shorter number `(AB)` is repeated as `(AB)(AB)(A)` with the last segment truncated. And the longer number `(ABABA)` is partially repeated as `(AB)`.
This characteristic is also seen in the previously considered case. We can formulate a new conjecture based on the example shown above and the observation from the previous case study.
Conjecture 3: while comparing two numbers of lengths `L1` and `L2`, where `L1 != L2`, we can replace these numbers with their repetitive versions of length `L1 + L2` without affecting the result of the comparison.
Proof: let's say that the digits of the shorter number constitute a digit sequence `S1`. And let's go through the same steps of constructing the concatenated values as we did already a couple of times.

The first mismatch determines the result of the comparison. If the prefix of length `L1` at the beginning of the concatenated value that starts with the longer number doesn't match `S1`, then we know the winner. Otherwise, the equality of these segments creates a condition for the subsequent equality check, and we proceed with the same steps of constructing concatenated values as before.
Here are the resulting concatenated values:
```
(S1) : (S1) x (L2/L1 times) + (S1)[0, L2 % L1)             <- 1st:2nd
(S1) + (S1) x (L2/L1 - 1 times) + (S1)[0, L2 % L1) : (S1)  <- 2nd:1st
```
I.e. the first value is formed by repeating the shorter number, denoted `(S1)`. The repeated part consists of `L2/L1` full `(S1)` sequences followed by a prefix of `(S1)` having a length of `L2 % L1`.

And the second concatenated value is constructed by partially repeating the longer number. The repeated part consists of a single occurrence of `(S1)`.
We can implement comparison based on this property instead of constructing dozens of strings.
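Before implementing the wrapper-based comparison, here's a quick string-based sanity check of this property (a throwaway helper of my own, not part of the final solution):

```java
// Compares two numbers (given as digit strings) by their repetitive versions
// of length L1 + L2, reading the digits of each argument in a circular fashion.
static int compareByRepetition(String a, String b) {
    int totalLength = a.length() + b.length();
    for (int i = 0; i < totalLength; i++) {
        int diff = a.charAt(i % a.length()) - b.charAt(i % b.length());
        if (diff != 0) {
            return diff;
        }
    }
    return 0;
}
```

For any pair of digit strings `a` and `b`, `Integer.signum(compareByRepetition(a, b))` should match `Integer.signum((a + b).compareTo(b + a))`, which is easy to verify with a randomized test.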
Implementing a Repeating Integer Wrapper
Let me introduce a lightweight wrapper type that we'll use later to leverage the comparison properties we discovered.
Its key functionality:

- method `digitAt()` retrieves a digit at an arbitrary position, which may exceed the length of the wrapped `int` value;
- method `length()` reports the length of the wrapped `int` value.
It's a simple data carrier that doesn't need encapsulation, hence a perfect candidate for being implemented as a Java 16 `record`.
```java
public record RepeatingWrapper(int initial, byte[] digits) {
    private static final int MAX_INT_DIGIT_COUNT = 10; // the number of digits in Integer.MAX_VALUE

    public int digitAt(int position) {
        return digits[position % digits.length];
    }

    public int length() {
        return digits.length;
    }

    public static RepeatingWrapper wrap(int initialValue) {
        var digits = new byte[positiveIntLength(initialValue)];
        addDigits(digits, initialValue);
        return new RepeatingWrapper(initialValue, digits);
    }

    private static int positiveIntLength(int number) {
        int sentinel = 10;
        for (int count = 1; count < MAX_INT_DIGIT_COUNT; count++) {
            if (sentinel > number) {
                return count;
            }
            sentinel *= 10; // multiplication takes fewer CPU cycles than division,
        }                   // hence the sentinel is multiplied instead of dividing the initial value
        return MAX_INT_DIGIT_COUNT;
    }

    private static void addDigits(byte[] digits, int initialValue) {
        int remainder = initialValue;
        for (int i = digits.length - 1; i >= 0; i--) {
            digits[i] = (byte) (remainder % 10);
            remainder /= 10;
        }
    }

    // It's also a good idea to override equals() and hashCode() based on
    // the initial int value (I'm sure you know how to do this)
}
```
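A quick illustration of the wrapper's behavior (the value is arbitrary):

```java
var wrapper = RepeatingWrapper.wrap(421);
System.out.println(wrapper.length());    // 3
System.out.println(wrapper.digitAt(0));  // 4
System.out.println(wrapper.digitAt(3));  // 4 - positions beyond the length wrap around
System.out.println(wrapper.digitAt(5));  // 1
```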
Alternative Timsort-based Solution
Now the task boils down to implementing a `Comparator` that sorts the wrappers in reversed order, based on the first mismatching digit value.

The following implementation makes use of Conjecture 1 (numbers of equal length can be compared directly) and Conjecture 3 (numbers can be replaced with their repetitive versions of length `L1 + L2`):
```java
private static final Comparator<RepeatingWrapper> REVERSED_WRAPPER_COMPARATOR = new Comparator<>() {
    @Override
    public int compare(RepeatingWrapper left, RepeatingWrapper right) {
        if (left.length() == right.length()) {
            return right.initial() - left.initial();
        }
        int totalLength = left.length() + right.length();
        for (int i = 0; i < totalLength; i++) {
            int diff = right.digitAt(i) - left.digitAt(i);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }
};
```
The remaining part is straightforward: wrap the input, sort, unwrap and join.
```java
public static String largestNumber(int[] numbers) {
    var result = Arrays.stream(numbers)
        .mapToObj(RepeatingWrapper::wrap)
        .sorted(REVERSED_WRAPPER_COMPARATOR)
        .mapToInt(RepeatingWrapper::initial)
        .collect(
            StringBuilder::new,
            StringBuilder::append,
            StringBuilder::append
        );
    return result.charAt(0) == '0' ? "0" : result.toString();
}
```
Note that we don't generate the resulting `String` if the first character is `'0'`.
Unit tests:
```java
class RepeatingTimsortTest {

    @ParameterizedTest
    @MethodSource("zeroArrays")
    void largestNumber_ShouldReturnZero_WhenAllElementsAreZero(int[] input) {
        assertThat(largestNumber(input)).isEqualTo("0");
    }

    private static Stream<Arguments> zeroArrays() {
        return Stream.of(
            Arguments.of(new int[1]),
            Arguments.of(new int[100])
        );
    }

    @ParameterizedTest
    @MethodSource("sameLengthInput")
    void largestNumber_ShouldSortElementsOfSameLength(int[] input, String expectedResult) {
        assertThat(largestNumber(input)).isEqualTo(expectedResult);
    }

    public static Stream<Arguments> sameLengthInput() {
        return Stream.of(
            Arguments.of(new int[]{1, 0, 3, 2, 5, 6, 4, 7, 9, 8}, join("9, 8, 7, 6, 5, 4, 3, 2, 1, 0")),
            Arguments.of(new int[]{399, 126, 999, 328, 109}, join("999, 399, 328, 126, 109")),
            Arguments.of(new int[]{10999, 99775, 33333, 99775, 10999}, join("99775, 99775, 33333, 10999, 10999"))
        );
    }

    @ParameterizedTest
    @MethodSource("prefixesInput")
    @DisplayName("[ largestNumber ] Should resolve elements that have common prefixes")
    void largestNumber_ShouldResolvePrefixedElements(int[] input, String expectedResult) {
        assertThat(largestNumber(input)).isEqualTo(expectedResult);
    }

    public static Stream<Arguments> prefixesInput() {
        return Stream.of(
            Arguments.of(new int[]{35, 31, 3, 39}, join("39, 35, 3, 31")),
            Arguments.of(new int[]{35, 31, 3, 356, 39}, join("39, 356, 35, 3, 31")),
            Arguments.of(new int[]{39535, 395, 39535, 350, 3503, 35035, 3505}, join("395, 39535, 39535, 3505, 35035, 350, 3503")),
            Arguments.of(new int[]{35935, 359, 35935}, join("359, 35935, 35935")),
            Arguments.of(new int[]{75, 7593, 7593, 75}, join("7593, 7593, 75, 75")),
            Arguments.of(new int[]{35, 353, 35, 353, 35}, join("35, 35, 35, 353, 353")),
            Arguments.of(new int[]{999985, 777736, 9999859, 9999859, 7777367, 999985}, join("9999859, 9999859, 999985, 999985, 7777367, 777736")),
            Arguments.of(new int[]{9999977, 9999983, 999, 9999953, 9999905, 9999950, 9999942, 9999945, 9999915}, join("999, 9999983, 9999977, 9999953, 9999950, 9999945, 9999942, 9999915, 9999905"))
        );
    }

    private static String join(String initial) {
        return initial.replace(", ", "");
    }

    @Test
    void largestNumber_ShouldSortLargeInput() {
        var input = RandomGenerator.of("L64X128MixRandom")
            .ints(0, 1_000_000_000)
            .limit(10_000_000)
            .toArray();

        var expected = LexConcatenator.largestNumber(input);
        var actual = largestNumber(input);

        assertThat(actual).isEqualTo(expected);
    }
}
```
Test dependencies: JUnit 5 and AssertJ
The last test compares the result of sorting 10 million elements with the value produced by the implementation shared by Reinderien (since it's simple and straightforward enough to be trusted).
Fun Part - Measuring Performance
First of all, it's worth noting that the numbers you're getting from LeetCode (or any similar platform) are very rough estimates intended to give you a general idea of how your implementation performs. Don't treat them as precise performance metrics.
LeetCode cannot afford to run your code long enough for JIT compilation to kick in, and then measure the performance of the optimized code. They also cannot execute your code with minimal interference from other processes. Doing so would be wasteful and would take an inordinate amount of time.
If you want more accurate measurements, you can create your own benchmarks using the Java Microbenchmark Harness (JMH), developed by the OpenJDK team, and run them on your local machine.
```java
@OutputTimeUnit(value = TimeUnit.NANOSECONDS)
@Warmup(iterations = 20)
@Measurement(iterations = 10)
@Fork(5)
@BenchmarkMode(Mode.AverageTime)
@State(Scope.Benchmark)
public class LargestNumberPerformanceTest {
    private int valueUpperBound = 1_000_000_000;
    private int digitCount = 10;

    @Param({
        "1000",
        "10000",
        "100000"
    })
    private int inputSize;
    private int[] input;

    @Setup
    public void initializeInput() {
        input = RandomGeneratorFactory
            .of("L64X128MixRandom")
            .create(1919121)
            .ints(0, valueUpperBound)
            .limit(inputSize)
            .toArray();
    }

    public static void main(String[] args) throws IOException {
        org.openjdk.jmh.Main.main(args);
    }

    @Benchmark
    public String timsortReinderien() {
        return LexConcatenator.largestNumber(input);
    }

    @Benchmark
    public String timsortOP() {
        return new Solution().largestNumber(input);
    }

    @Benchmark
    public String heapsortOP() {
        return new LargestNumber().largestNumber(input);
    }

    @Benchmark
    public String wrapperBasedTimsort() {
        return RepeatingTimsort.largestNumber(input); // implementation I shared previously
    }
}
```
This benchmark also includes the Heapsort implementation shared by the OP on CS Stack Exchange.
The problem description states that the given array will contain up to `100` elements. However, such a tiny input size results in a lightning-fast method call, and very fast calls are known to be more challenging to benchmark because they are more heavily impacted by internal activities in the JVM and at the system level. For that reason, I've used `1,000` as the smallest input size.
Before we take a look at the results of the benchmarks presented above, let me introduce a problem related to the performance of Java 8 Streams.
Reinderien used a parallel Stream in their solution, expecting it would reduce the execution time. Let's run the following benchmarks to see if this decision will pay off.
```java
@Benchmark
public String timsortReinderien() {
    return LexConcatenator.largestNumber(input);
}

@Benchmark
public String timsortParallelReinderien() {
    return LexConcatenatorParallel.largestNumber(input);
}
```
The only difference between the two versions is that I commented out the `parallel()` operation in the first one. The numbers of warmup and measurement iterations, as well as the number of JVM forks, are the same as in the configuration shown above.
```
Benchmark                  (inputSize)  Mode  Cnt        Score       Error  Units
timsortParallelReinderien         1000  avgt   50   692409.299 ±  1275.764  ns/op
timsortParallelReinderien        10000  avgt   50  3084455.051 ±  9519.656  ns/op
timsortReinderien                 1000  avgt   50   470691.091 ±  2077.516  ns/op
timsortReinderien                10000  avgt   50  6527085.855 ± 30675.021  ns/op
```
I got these results on a 16-core machine while it was idle, with nothing running in the background (before executing benchmarks, it's highly advisable to restart the system and make sure there is as little interference from other processes as possible; see the manual in the linked JMH repository).
Doesn't look impressive at all.
For `1,000` elements, switching to parallel actually worsens the running time by 50%. And according to the problem description there would be at most `100` elements, where the overhead of parallel execution would be felt even more sharply.

The original problem definitely doesn't warrant employing concurrency.
In the case of `10,000` elements the parallel stream runs twice as fast, but that's probably not exactly what you might expect from 16 cores. If you increase the input size to `100,000` elements, you'll see the parallel vs sequential ratio improve further.
Despite this answer getting lengthy, I believe it's worth taking the time to explain these parallel vs sequential metrics and to scrutinize the topic of parallel streams in general.
Myth of 100% Guaranteed Performance Boost with Parallel Streams
It's pretty common to see, especially on Stack Overflow, suggestions along the lines of "if you need to speed up the execution, and you have more than one core, Parallel Streams to the rescue!"
But this decision should not be taken lightly: the overhead of thread management and the costs of splitting and merging concurrent tasks can outweigh the benefits of parallel execution.
Whether it's worth parallelizing a CPU-intensive piece of code isn't contingent only on the number of cores available; there are other factors, such as:

- The number of input elements.
- The cost of the operations performed on each element (sorting `1,000` elements is not a big deal, but when these elements are used as input to a computationally intensive function, say, one with polynomial complexity, even `100` elements will be a lot of work).
- Environment-specific factors. Parallel streams use threads from the common `ForkJoinPool` (that's an implementation detail which is not mentioned in the Stream API documentation, and we cannot configure it, although there's a hack of running a parallel stream inside your own fork-join pool that hinges on this implementation detail, sketched right after this list). In a multithreaded environment, cases when a parallel stream doesn't complete quickly enough might create performance bottlenecks.
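Here is a sketch of that custom pool hack (the pool size and the pipeline are arbitrary; remember that it relies on undocumented behavior):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolParallelStream {
    public static void main(String[] args) throws Exception {
        // our own pool instead of the common one; parallelism chosen arbitrarily
        ForkJoinPool customPool = new ForkJoinPool(4);
        long sum = customPool.submit(
                // the terminal operation is invoked inside the custom pool,
                // so the stream's tasks are executed by its worker threads
                () -> IntStream.rangeClosed(1, 1_000_000).parallel().asLongStream().sum()
        ).get();
        customPool.shutdown();
        System.out.println(sum); // 500000500000
    }
}
```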
Bullet points one and two are responsible for the unexpected figures in the parallel vs sequential measurements above.
The primary cost of running this stream pipeline is associated with the `sorted()` operation:

```java
String s = Arrays.stream(nums).unordered().parallel()
    .mapToObj(Integer::toString)
    .sorted(comp)
    .collect(Collectors.joining());
```
Converting each element into a string and joining the result are way less expensive.
Internally, the `sorted()` operation collects all the elements into an array and then delegates the job to `Arrays.parallelSort()`. To observe how the input sizes of `1,000` and `10,000` differ from each other, I would advise placing a breakpoint inside the `Arrays.parallelSort()` method and running Reinderien's parallel stream with these input sizes (you can use one of my unit tests for that).
Alternatively, we can take a closer look at the implementation:
```java
public static <T> void parallelSort(T[] a, Comparator<? super T> cmp) {
    if (cmp == null)
        cmp = NaturalOrder.INSTANCE;
    int n = a.length, p, g;
    if (n <= MIN_ARRAY_SORT_GRAN ||
        (p = ForkJoinPool.getCommonPoolParallelism()) == 1)
        TimSort.sort(a, 0, n, cmp, null, 0, 0);
    else
        ...
```
In the second `if` statement we see that if the parallelism of the common pool is set to one, or if the input size is less than or equal to the `MIN_ARRAY_SORT_GRAN` threshold of `8192` elements, the juice is not worth the squeeze and sorting is performed sequentially in the current thread.
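`MIN_ARRAY_SORT_GRAN` is `1 << 13 = 8192` in the JDK sources. A tiny snippet of mine to check where your machine stands:

```java
// reports the common pool's target parallelism (usually cores - 1)
System.out.println(ForkJoinPool.getCommonPoolParallelism());
// it can be pinned to 1 (forcing parallelSort() into the sequential branch above)
// with the system property:
//   -Djava.util.concurrent.ForkJoinPool.common.parallelism=1
```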
I.e. effectively, for `1,000` input elements, only `map()` and `collect()` are executed in parallel. And we pay the price of concurrency-related overheads without reaping significant benefits.
This exemplifies the importance of carefully considering the decision of whether to use parallel streams.
And in practice, parallel streams are a niche case and are not used very often.
Now let's proceed with the analysis of the benchmark results and compare how the alternative implementation I've shown differs from the other solutions posted so far (all streams will be executed sequentially).
Analyzing the Benchmark Results
`1,000` elements:

```
Benchmark            (inputSize)  Mode  Cnt        Score      Error  Units
heapsortOP                  1000  avgt   50  1333543.206 ± 3086.212  ns/op
timsortOP                   1000  avgt   50   686338.675 ± 1727.745  ns/op
timsortReinderien           1000  avgt   50   471167.662 ± 1199.969  ns/op
wrapperBasedTimsort         1000  avgt   50   290360.837 ± 2607.703  ns/op
```
As we can see, the Heapsort-based implementation has the largest execution time. This algorithm has the same time complexity of O(n log n) as Timsort but tends to have higher constant factors, making it slower in practice.
A use case where Heapsort might be preferable to other sorting algorithms is a memory-constrained environment (since it sorts the given array in place).
Heapsort is not adaptive: its constant factors don't vary significantly. Timsort, on the other hand, is highly adaptive and can even run in linear time if the input is almost sorted.
Since Heapsort's performance is almost identical regardless of the input, there's no chance it can outdo Timsort, and I excluded it from the other benchmarks to reduce the overall execution time (running them takes a lot of time, but you can generate results for Heapsort with other input sizes if you wish).
`10,000` elements:

```
Benchmark            (inputSize)  Mode  Cnt        Score       Error  Units
timsortOP                  10000  avgt   50  9716024.613 ± 28462.212  ns/op
timsortReinderien          10000  avgt   50  6676583.912 ± 14005.644  ns/op
wrapperBasedTimsort        10000  avgt   50  4204352.438 ± 27464.796  ns/op
```
`100,000` elements:

```
Benchmark            (inputSize)  Mode  Cnt          Score        Error  Units
timsortOP                 100000  avgt   50  120442242.471 ± 589555.482  ns/op
timsortReinderien         100000  avgt   50   85392935.668 ± 493892.337  ns/op
wrapperBasedTimsort       100000  avgt   50   55345557.752 ± 327648.833  ns/op
```
As we can observe, the following relationships between the results of the three Timsort-based solutions hold true for all input sizes:

- Reinderien's solution is ~60% slower than the wrapper-based solution;
- the original solution is twice as slow as the wrapper-based solution.
Since the algorithm is the same and only the cost of each comparison differs (i.e. only constant factors are affected), we should expect these ratios to remain the same regardless of the input size.
"9"
) sorting before long ones (e.g."91"
) helps us quickly find the answer. The example highlights"2" > "10"
.\$\endgroup\$