6
\$\begingroup\$

https://leetcode.com/explore/interview/card/top-interview-questions-medium/110/sorting-and-searching/799

Given a non-empty array of integers, return the k most frequent elements.

Example 1:

Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:

Input: nums = [1], k = 1
Output: [1] Note:

You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

using System.Collections.Generic; using System.Linq; using Microsoft.VisualStudio.TestTools.UnitTesting; //good example for bucket sort namespace SortingQuestions { /// <summary> /// https://leetcode.com/explore/interview/card/top-interview-questions-medium/110/sorting-and-searching/799 /// </summary> [TestClass] public class TopKFrequentTest { [TestMethod] public void SimpleUseCaseTest() { int[] nums = { 1, 1, 1, 2, 2, 3 }; int k = 2; int[] expected = { 1, 2 }; CollectionAssert.AreEqual(expected, TopKFrequentClass.TopKFrequent(nums, k).ToArray()); } } public class TopKFrequentClass { public static IList<int> TopKFrequent(int[] nums, int k) { Dictionary<int, int> number2Count = new Dictionary<int, int>(); //we count how many times each number appears foreach (var num in nums) { number2Count.TryGetValue(num, out var temp); number2Count[num] = temp + 1; } List<int>[] bucket = new List<int>[nums.Length + 1]; //we allocate an array in the size of the original list of numbers //we iterate all of the numbers and for add each number to the index in the array // the index represents how many times that number appeared // // 0 times -> none // 1 times -> number 3 // 2 times -> number 2 // 3 times -> number 1 // 4 times -> none // 5 times -> none foreach (var key in number2Count.Keys) { int frequency = number2Count[key]; if (bucket[frequency] == null) { bucket[frequency] = new List<int>(); } bucket[frequency].Add(key); } List<int> result = new List<int>(); // we iterate the list bucket in reverse until the number of items in the result // list equals k, because we iterate in reserve we get the biggest numbers for (int pos = bucket.Length - 1; pos >= 0 && result.Count < k; pos--) { if (bucket[pos] != null) { result.AddRange(bucket[pos]); } } return result; } } } 

I really like this question and it took me some time to understand how to implement the bucket sorting.

I know I can also use MaxHeap. Since C# doesn't have a MaxHeap, do you think using a SortedList can help here somehow with the performance? (remember the question limitations)

I am mainly looking for performance optimizations.

\$\endgroup\$

    3 Answers 3

    9
    \$\begingroup\$

    I think that in terms of performance you'd be hard pressed to do better than a LINQ query. Not only fast but very compact and relatively easy to understand. It could look something like this:

    public IList<int> TopKFrequent(int[] nums, int k) { var answer = (from int n in nums group n by n into g orderby g.Count() descending select g.Key).Take((k)).ToList(); return answer; } 

    For those that prefer method syntax here's the equivalent:

    public IList<int> TopKFrequent(int[] nums, int k) { var answer = nums.GroupBy(n => n).OrderByDescending(g => g.Count()).Take(k).Select(g => g.Key).ToList(); return answer; } 
    \$\endgroup\$
    8
    • 6
      \$\begingroup\$Isn't this sorting nlogn?\$\endgroup\$
      – Gilad
      CommentedSep 16, 2019 at 4:28
    • 1
      \$\begingroup\$I wonder if it's even possible to avoid n log n worst case performance when k == n?\$\endgroup\$CommentedSep 16, 2019 at 9:45
    • 1
      \$\begingroup\$@Gilad it probably won't have to sort the entire list (I believe from memory and measurements that it uses a lazy quick-sort, so it won't sort (much) stuff that isn't used). You'd have to perform some measurements to make sure it's doing the 'right thing', but in my experience it is hard to beat consistently with a general purpose sort. (This is a big advantage of quick-sort over merge sort and log(n) insertion heap sorts).\$\endgroup\$CommentedSep 16, 2019 at 10:57
    • 1
      \$\begingroup\$@VisualMelon as far as I can tell OrderBy will always result in a full sort.\$\endgroup\$
      – Johnbot
      CommentedSep 16, 2019 at 12:59
    • 1
      \$\begingroup\$@Johnbot You're right: seems framework does just sort everything. Looks like the lazy version is new in .NET Core. (code is on corefx GitHub)\$\endgroup\$CommentedSep 16, 2019 at 13:13
    3
    \$\begingroup\$

    There is a bug in your implementation. During the building of result, you can overshoot the amount of items returned if bucket[pos].Count >= k - result.Count. It's probably better to use a separate variable that keeps track of the amount of added items.

    \$\endgroup\$
      2
      \$\begingroup\$

      Expanding on @tinstaafl's answer, you may write an extension method to retrieve the X most frequently occurring items like so:

      public static IEnumerable<T> GetMostFrequent<T>(this IEnumerable<T> inputs, int topXMostFrequent) { var uniqueGroups = inputs.GroupBy(i => i); if (uniqueGroups.Count() <= topXMostFrequent) { return uniqueGroups.Select(group => group.Key); } return uniqueGroups.OrderByDescending(i => i.Count()) .Take(topXMostFrequent) .Select(group => group.Key); } 

      I personally prefer the method syntax to the query syntax. (See this page for the differences between the two.)

      Notice that there is no need to call OrderByDescending() if all of the unique groups of items in the initial collection are to be included in the final collection.

      This generic method allows you to pass in collections of any type (and not just of the int type). E.g.:

      public class Program { public static void Main() { int[] numbers = { 1, 1, 1, 2, 2, 3 }; // Prints 1 and 2 Console.WriteLine("Two most frequent numbers: " + string.Join(", ", numbers.GetMostFrequent(2))); char[] letters = { 'a', 'a', 'a', 'b', 'b', 'c' }; // Prints 'a' and 'b' Console.WriteLine("Two most frequent letters: " + string.Join(", ", letters.GetMostFrequent(2))); string[] fruits = { "apple", "apple", "apple", "banana", "banana", "banana", "cherry", "cherry" }; // Prints "apple" and "banana" Console.WriteLine("Two most common fruits: " + string.Join(", ", fruits.GetMostFrequent(2))); } } 

      You may run my code here and play around with it.

      \$\endgroup\$
      6
      • \$\begingroup\$It seems counterintuitive that if the list is truncated (i.e. not all entries are returned); that you then get an ordered list, but when the list count exactly matches the "top X" counter you then get an unordered list. I get why you did it (saving on ordering performance) but the unusual behavior is worse than the minor performance hit. (You also seem to have forgotten about cases where the list is shorter than the "top X" counter, these will oddly return an ordered list as well)\$\endgroup\$
        – Flater
        CommentedSep 16, 2019 at 12:50
      • 3
        \$\begingroup\$"I invoke Take() before Select(). This is more efficient, because there is no need to perform Select() on elements that will not end up in the final collection anyway." that's not how Enumerable works. The Select is lazy and will only be evaluated Take(n)-number of times.\$\endgroup\$
        – Johnbot
        CommentedSep 16, 2019 at 12:50
      • \$\begingroup\$The ordering of Take and Select should not matter, right? In both cases only the neccessary elements will be processed since it's evaluated lazily.\$\endgroup\$
        – germi
        CommentedSep 16, 2019 at 12:50
      • \$\begingroup\$@Johnbot You are right; I had a brainfart. I have updated my answer.\$\endgroup\$
        – M.Y. Babt
        CommentedSep 16, 2019 at 12:55
      • \$\begingroup\$@Flater "You also seem to have forgotten about cases where the list is shorter than the "top X" counter" -- You are right; I had a brainfart when I wrote this. I have updated my answer.\$\endgroup\$
        – M.Y. Babt
        CommentedSep 16, 2019 at 12:56

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.