Speed up counting sort algorithm in Ruby

Question

I'm attempting this simple HackerRank counting sort problem but testcase #3, which has 100,000 test cases, is timing out.

Is there something particularly inefficient about this code?

input = [] $stdin.each do |line| input << line end input.shift arr = input.map { |line| line.split(' ')[0].to_i } answer = (0..99).map do |i| counter = 0 arr.uniq.each do |j| counter += arr.count(j) if j <= i end counter end puts answer.join ' '

Anonymous · Accepted Answer · 2014-05-14 06:06:02Z

arr.count(j) scans the array, so it takes linear time. Doing it once for each element of arr.uniq takes quadratic time. Doing it 100 times doesn't help either. :)

The arr.uniq.each loop is unnecessary, though (as is the call to uniq). The problem is to count the entries less than i, and you can express that directly:

counter = arr.count {|n| n <= i}

This takes linear time, so it should be much faster.

For another speedup, you can solve the whole problem with only one pass over the input, not 100. As the problem description hints, you'll need an auxiliary array to store the counts.

By the way, the first four lines can be replaced by one, using one of Ruby's convenient utilities:

input = $stdin.readlines

The whole program can be simplified to two lines.

tokland · Accepted Answer · 2014-05-14 13:30:59Z

The performance issues have already been discussed by other users, so I'll just show an alternative way of tackling the problem. When dealing with mathematical-related problems, a functional approach is usually the way to go. That means identifying the abstractions, implementing them and leaving the code as declarative as possible. Chances are all those abstractions will prove useful somewhere else. That's how I'd write it:

nlines = $stdin.readline.to_i numbers = $stdin.lines.take(nlines).map { |line| line.split.first.to_i } cumulative_sums = numbers.frequency.values_at(*(0..99)).accumulate(0, &:+) puts(cumulative_sums.join(" "))

But where did these abstractions come from? let's implement them:

module Enumerable def accumulate(initial_value) acc = initial_value map { |x| acc = yield(acc, x) } end def frequency Hash.new(0).tap { |f| each { |v| f[v] += 1 } } end end

Note that now, the non-generic code that is really doing something interesting is a one-liner (numbers.frequency.values_at(*(0..99)).accumulate(0, &:+)). Also, it's O(n) space/time.

200_success · Accepted Answer · 2014-12-05 11:15:46Z

You've written a straightforward solution, but botched the implementation. It takes O(m + n) space to store n items of input and distinct values up to m=100. It takes O(m n) time to do arr.uniq in a loop, and O(m n²) when including the count() operation. arr.uniq does not vary, and therefore should have been lifted out of the loop, in which case it would be O(n) to do arr.uniq plus O(m) to build the answer array. Moving arr.uniq out of the loop and doing a smarter count() might be enough to bring performance down to a reasonable level.

However, your performance challenge is for the case where \$n = 100000\$, and \$m = 100\$. It would be nice to be able to take advantage of the fact that \$n \gg m\$. While \$O(n)\$ time is the theoretical minimum, \$O(n)\$ space could be improved upon.

An alternate solution is to use a data structure known as a binary indexed tree, also known as a Fenwick tree. The tree can be represented as an array of \$m\$ elements. The tradeoff is that it takes \$\log m\$ time to process each data item. When \$n \gg m\$, you can pretend that \$m\$ is a largish constant.

$$ \newcommand{smallm}[0]{\overset{n\ \gg\ m}{\longrightarrow}} \begin{array}{|l|c|c|} \hline \\ & \textrm{Straightforward approach, done well} & \textrm{Fenwick tree} \\ \hline \\ \textrm{Space} & O(m + n) \smallm O(n) & O(m) \smallm O(1) \\ \hline \\ \textrm{Time} & O(m + n) \smallm O(n) & O((m + n) \log m) \smallm O(n) \\ \hline \end{array}$$

class FenwickTree def initialize(limit) @arr = [0] * limit end # O(log m) def incr(item, count=1) loop do break if item >= @arr.length @arr[item] += count item |= item + 1 end end # O(log m) def count_le(item) count = 0 while item >= 0 count += @arr[item] item = (item & (item + 1)) - 1 end count end # O(m log m) def report ([email protected]).map { |i| count_le(i) } end end f = FenwickTree.new(100) # O(m) gets.to_i.times do # O(n log m) f.incr(gets.to_i) end puts f.report.join(' ') # O(m log m)

This answer is really interesting. Thanks! I'm afraid I'm not the right person to evaluate it, so I've accepted @Anonymous's answer. I will definitely spend some time thinking about this though :) — niftygrifty, CommentedMay 14, 2014 at 11:43
The straightforward solution (counts = Array.new(100, 0); $stdin.each{|line| counts[line.to_i] += 1}; puts (0..99).map{|i| counts.take(i).reduce(:+)}.join(' ')) is \$O(m)\$ space. There's no need for a tree. — Anonymous, CommentedMay 14, 2014 at 16:12

Stack Exchange Network

Speed up counting sort algorithm in Ruby

3 Answers 3

Hot Network Questions

Speed up counting sort algorithm in Ruby

3 Answers 3

Related

Hot Network Questions