I (Ruby beginner) have a performance issue with the following Ruby code. The idea is to aggregate a CSV file into a smaller, easier-to-handle file and calculate the average of some values. Essentially the same code in Groovy runs about 30 times faster, and I can't believe that Ruby is that slow! ;-)
Some background: The file consists of lines like this (it is an output file of a JMeter performance test):
    timeStamp|elapsed|label|responseCode|responseMessage|threadName|dataType|success|failureMessage|bytes|Latency
    2013-05-17_16:30:11.261|4779|Session_Cookie|200|OK|Thread-Gruppe 1-1|text|true||21647|4739
All values of e.g. a certain minute are selected by looking at the first 16 characters of the timestamp:
2013-05-17_16:30:11.261 => 2013-05-17_16:30
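As a minimal sketch of that truncation (the helper name `time_bucket` is made up for illustration and is not part of my script), the precision is simply the number of leading characters that are kept:

```ruby
# Hypothetical helper: keep only the first `precision` characters of a timestamp.
def time_bucket(timestamp, precision)
  timestamp[0, precision]
end

time_bucket('2013-05-17_16:30:11.261', 16) # => "2013-05-17_16:30" (per minute)
time_bucket('2013-05-17_16:30:11.261', 13) # => "2013-05-17_16"    (per hour)
```

A smaller precision therefore yields coarser buckets.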
I want to collect the values in buckets (slices), each identified by the shortened timestamp and the label (third column). Within each bucket, the values of the second column ("elapsed") are summed and counted so that the average can be computed.
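To make the intended result concrete, here is a small sketch of the bucketing on a few made-up rows. It is only an illustration under assumptions: the sample data is invented, and it uses a plain hash keyed by `[time, label]` rather than the classes from my actual script.

```ruby
# Made-up sample rows, reduced to the relevant columns: [timeStamp, elapsed, label]
rows = [
  ['2013-05-17_16:30:11.261', '4779', 'Session_Cookie'],
  ['2013-05-17_16:30:45.002', '4221', 'Session_Cookie'],
  ['2013-05-17_16:31:02.118', '1000', 'Session_Cookie']
]

precision = 16 # number of timestamp characters that identify a bucket

# Each bucket is created on first access with an empty sum and count.
buckets = Hash.new { |hash, key| hash[key] = { sum: 0, count: 0 } }

rows.each do |timestamp, elapsed, label|
  bucket = buckets[[timestamp[0, precision], label]]
  bucket[:sum] += elapsed.to_i
  bucket[:count] += 1
end

# Integer average per bucket, in order of first appearance.
averages = buckets.map { |(time, label), b| [time, label, b[:sum] / b[:count]] }
# => [["2013-05-17_16:30", "Session_Cookie", 4500],
#     ["2013-05-17_16:31", "Session_Cookie", 1000]]
```

The first two rows fall into the same minute and label, so they are averaged together; the third row ends up in its own bucket.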
    require 'csv'
    require_relative 'slice'

    # Parameters: <precision> <input file> [<output file>]
    precision = ARGV[0].to_i
    input_file = ARGV[1]
    output_file = ARGV[2]

    # Maps shortened timestamp => { label => Slice }
    time_slices = Hash.new

    CSV.foreach(input_file, :col_sep => '|', :headers => :first_line) do |row|
      current_time_slice = row['timeStamp'][0, precision]

      if time_slices[current_time_slice] == nil
        time_slices[current_time_slice] = Hash.new
      end

      if time_slices[current_time_slice][row['label']]
        time_slices[current_time_slice][row['label']].put_line(row)
      else
        new_slice = Slice.new(current_time_slice, row['label'])
        new_slice.put_line(row)
        time_slices[current_time_slice][row['label']] = new_slice
      end
    end

    out = File.new(output_file, 'a')
    out.puts 'time|label|elapsed_average'

    time_slices.values.each do |time_slice|
      time_slice.values.each do |slice|
        out.puts slice.aggregated_row
      end
    end
The Slice class looks like this:
    class Slice
      attr_accessor :slice_timestamp, :slice_label, :lines, :sum, :count

      def initialize(slice_timestamp, slice_label)
        @slice_timestamp = slice_timestamp
        @slice_label = slice_label
        @count = 0
        @sum = 0
      end

      def put_line(line)
        @sum = @sum + line[1].to_i
        @count = @count + 1
      end

      def average
        @sum / @count
      end

      def aggregated_row
        @slice_timestamp + '|' + @slice_label + '|' + average.to_s
      end
    end
I think I chose a fairly straightforward, non-optimized approach, but still, the same approach is much faster in Groovy. What could be the reason for that?