Reader's Note: this is NOT a duplicate of the similar question Why is processing a sorted array faster than processing an unsorted array?; the two questions focus on different premises and thus have different explanations.
I have a list of 500000 randomly generated Tuple<long,long,string>
objects on which I am performing a simple "between" search:
var data = new List<Tuple<long,long,string>>(500000); ... var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
When I generate my random array and run my search for 100 randomly generated values of x
, the searches complete in about four seconds. Knowing of the great wonders that sorting does to searching, however, I decided to sort my data - first by Item1
, then by Item2
, and finally by Item3
- before running my 100 searches. I expected the sorted version to perform a little faster because of branch prediction: my thinking has been that once we get to the point where Item1 == x
, all further checks of t.Item1 <= x
would predict the branch correctly as "no take", speeding up the tail portion of the search. Much to my surprise, the searches took twice as long on a sorted array!
I tried switching around the order in which I ran my experiments, and used different seed for the random number generator, but the effect has been the same: searches in an unsorted array ran nearly twice as fast as the searches in the same array, but sorted!
Does anyone have a good explanation of this strange effect? The source code of my tests follows; I am using .NET 4.0.
private const int TotalCount = 500000; private const int TotalQueries = 100; private static long NextLong(Random r) { var data = new byte[8]; r.NextBytes(data); return BitConverter.ToInt64(data, 0); } private class TupleComparer : IComparer<Tuple<long,long,string>> { public int Compare(Tuple<long,long,string> x, Tuple<long,long,string> y) { var res = x.Item1.CompareTo(y.Item1); if (res != 0) return res; res = x.Item2.CompareTo(y.Item2); return (res != 0) ? res : String.CompareOrdinal(x.Item3, y.Item3); } } static void Test(bool doSort) { var data = new List<Tuple<long,long,string>>(TotalCount); var random = new Random(1000000007); var sw = new Stopwatch(); sw.Start(); for (var i = 0 ; i != TotalCount ; i++) { var a = NextLong(random); var b = NextLong(random); if (a > b) { var tmp = a; a = b; b = tmp; } var s = string.Format("{0}-{1}", a, b); data.Add(Tuple.Create(a, b, s)); } sw.Stop(); if (doSort) { data.Sort(new TupleComparer()); } Console.WriteLine("Populated in {0}", sw.Elapsed); sw.Reset(); var total = 0L; sw.Start(); for (var i = 0 ; i != TotalQueries ; i++) { var x = NextLong(random); var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x); total += cnt; } sw.Stop(); Console.WriteLine("Found {0} matches in {1} ({2})", total, sw.Elapsed, doSort ? "Sorted" : "Unsorted"); } static void Main() { Test(false); Test(true); Test(false); Test(true); }
Populated in 00:00:01.3176257 Found 15614281 matches in 00:00:04.2463478 (Unsorted) Populated in 00:00:01.3345087 Found 15614281 matches in 00:00:08.5393730 (Sorted) Populated in 00:00:01.3665681 Found 15614281 matches in 00:00:04.1796578 (Unsorted) Populated in 00:00:01.3326378 Found 15614281 matches in 00:00:08.6027886 (Sorted)
Item1 == x
, all further checks oft.Item1 <= x
would predict the branch correctly as "no take", speeding up the tail portion of the search. Obviously, that line of thinking has been proven wrong by the harsh reality :)TotalCount
around10,000
or less, the sorted version does perform faster (of course, trivially faster at those small numbers) (FYI, your code might want to have the initial size ofvar data List = new List<Tuple<long, long, string>>(500000)
bound againstTotalCount
instead of hard-coding the capacity)data.Where()
shows the same slowdown, as does anything else that iterates over the sorted list. Operating on the sorted and unsorted lists without any filter takes the same time.TupleComparer
but that is entirely unnecessary sinceComparer<Tuple<long, long, string>>.Default
already has this behavior (from theIComparable
implementation ofTuple<,,>
). So you can just usedata.Sort()
with no arguments.