
The following is my Ruby attempt at a (very) basic CSV file parser class, inspired by an exercise from the book Seven Languages in Seven Weeks. I'm a Ruby novice and will be grateful for any suggestions for improvement.

```ruby
#!/usr/local/bin/ruby -w

# CSV file parser.
#
# USAGE:
#   File.open('data.csv') { |f| CsvFile.new(f).each { |row| puts row.firstname } }
#
class CsvFile
  # Parses the given CSV file into a collection of rows.
  #
  def initialize file
    @rows = []
    headers = file.gets.chomp.split(", ")
    file.each do |line|
      values = {}
      headers.zip(line.chomp.split(", ")).each do |key, value|
        values[key] = value
      end
      @rows << CsvRow.new(values)
    end
  end

  # Iterates over all rows.
  #
  def each
    @rows.each { |row| yield(row) }
  end

  # Returns the index-th row, or nil if no such row exists.
  #
  def [] index
    @rows[index]
  end
end

# CSV row.
#
class CsvRow
  # Creates a new CSV row with the given values.
  #
  # * *Args* :
  #   - +values+ -> a hash containing the column -> value mapping
  #
  def initialize values
    @values = values
  end

  # Returns the value in the column given as method name, or nil if
  # no such value exists.
  #
  def method_missing name, *args
    @values[name.to_s]
  end
end

# TEST CASES
#
class AssertionError < RuntimeError
end

def assert &block
  raise AssertionError unless yield
end

require 'stringio'

file = StringIO.new(
"firstname, lastname, age, sex
Andrej, Beles, 25, male
Delia, Marin, 20, female
Henry, Prahanth, 33, male"
)
csvFile = CsvFile.new(file)

row = csvFile[0]
assert { row.firstname == "Andrej" }
assert { row.lastname == "Beles" }
assert { row.age == "25" }
assert { row.sex == "male" }

row = csvFile[1]
assert { row.firstname == "Delia" }
assert { row.lastname == "Marin" }
assert { row.age == "20" }
assert { row.sex == "female" }

row = csvFile[2]
assert { row.firstname == "Henry" }
assert { row.lastname == "Prahanth" }
assert { row.age == "33" }
assert { row.sex == "male" }

puts "DONE."
```
  • If you are aiming for a generic CSV parser, you need to handle more than just splitting a line on commas. What about commas embedded in quoted strings? How should strings be quoted? Can you handle multi-line quoted strings? Can you recover if something is not quoted correctly? You may want to at least handle the samples at en.wikipedia.org/wiki/Comma-separated_values
    – Darryl
    Commented Nov 21, 2014 at 21:05
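For comparison, the `CSV` library that ships with Ruby already handles the quoting cases raised in this comment. The short check below assumes `require 'csv'` works in your environment (it is a default gem in older Rubies and a bundled gem from Ruby 3.4 on); the sample rows are made up for illustration.

```ruby
require 'csv'

# A quoted field may contain the delimiter itself.
quoted_comma = CSV.parse_line('Andrej,"Beles, Jr.",25')

# Inside a quoted field, a doubled quote ("") stands for one literal quote.
escaped_quote = CSV.parse_line('say,"he said ""hi""",ok')

# A quoted field may also span several physical lines.
multi_line = CSV.parse("name,note\nDelia,\"line one\nline two\"\n")

p quoted_comma   # => ["Andrej", "Beles, Jr.", "25"]
p escaped_quote  # => ["say", "he said \"hi\"", "ok"]
p multi_line     # => [["name", "note"], ["Delia", "line one\nline two"]]
```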

1 Answer


You've got a bug here.

```ruby
headers = file.gets.chomp.split(", ")
```

There is no requirement in a *.csv file that there be a space after the comma between items. In fact, if there is a space, that space should be considered part of the item, not part of the delimiter.
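A quick check makes the bug concrete; the sample lines are invented for illustration, and the `-1` limit at the end is an extra `String#split` detail worth knowing once you split on a plain comma:

```ruby
line_without_spaces = "Andrej,Beles,25,male"
line_with_spaces    = "Andrej, Beles, 25, male"

# Splitting on ", " finds no delimiter at all, so the whole line is one field.
p line_without_spaces.split(", ")  # => ["Andrej,Beles,25,male"]

# Splitting on "," works for both lines; the spaces stay with the values.
p line_without_spaces.split(",")   # => ["Andrej", "Beles", "25", "male"]
p line_with_spaces.split(",")      # => ["Andrej", " Beles", " 25", " male"]

# By default split drops trailing empty fields; a limit of -1 keeps them.
p "Henry,,,".split(",")            # => ["Henry"]
p "Henry,,,".split(",", -1)        # => ["Henry", "", "", ""]
```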

The next thing I notice is that comma-separated files are not strictly separated by commas. Other delimiters are possible, and you are likely to encounter them; pipes (`|`), for example, are common. I would consider supporting other delimiters. Excel, for instance, supports tabs, semicolons, commas, and spaces, along with an option for a custom delimiter. You might not want to fuss with supporting user-defined delimiters, but your class certainly becomes more useful if you support the ones I've mentioned.

The process of implementing this should clean up the string literal duplication you have here.

```ruby
def initialize file
  @rows = []
  headers = file.gets.chomp.split(", ")
  file.each do |line|
    values = {}
    headers.zip(line.chomp.split(", ")).each do |key, value|
      values[key] = value
    end
    @rows << CsvRow.new(values)
  end
end
```
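One way to support several delimiters is to pass the delimiter in and keep it in a single place. This is a sketch under assumptions, not the asker's design: the `DELIMITERS` table, the `delimiter:` keyword, the `rows` reader, and the private `split_line` helper are hypothetical names, and rows are plain hashes instead of `CsvRow` objects so the block runs on its own. It still splits naively, so quoted fields are not handled.

```ruby
require 'stringio'

# Sketch of a delimiter-aware CSV reader. All names are hypothetical.
# Fields are split naively: quoted fields are NOT supported.
class CsvFile
  # Hypothetical shorthand names for common delimiters.
  DELIMITERS = {
    comma:     ",",
    semicolon: ";",
    tab:       "\t",
    pipe:      "|",
    # A Regexp, because String#split(" ") is special-cased to split on
    # runs of whitespace and ignore leading whitespace.
    space:     / /
  }.freeze

  attr_reader :rows

  # +delimiter+ may be a key of DELIMITERS or any custom String.
  def initialize(file, delimiter: :comma)
    @delimiter = DELIMITERS.fetch(delimiter, delimiter)
    headers = split_line(file.gets)
    # Each row becomes a Hash of header => value.
    @rows = file.each_line.map { |line| headers.zip(split_line(line)).to_h }
  end

  private

  # The only place that knows how a line is split.
  # A limit of -1 keeps trailing empty fields.
  def split_line(line)
    line.chomp.split(@delimiter, -1)
  end
end

semicolons = CsvFile.new(StringIO.new("firstname;lastname\nDelia;Marin\n"),
                         delimiter: :semicolon)
p semicolons.rows.first["lastname"]  # => "Marin"
```

Because `split_line` is the only code that splits, the duplicated `", "` literal disappears as a side effect.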

But minimally, you should replace `", "` with a constant so you never accidentally change the delimiter in one place but not the other.
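A minimal version of that change might look like this, with rows as plain hashes so the block is self-contained (`DELIMITER` and the `rows` reader are illustrative names, not from the original). It deliberately keeps `", "` so behavior is unchanged; the missing-space problem described above still applies, but fixing it is now a one-line edit.

```ruby
require 'stringio'

class CsvFile
  # Defined once, so both split calls below always agree.
  DELIMITER = ", "

  attr_reader :rows

  def initialize(file)
    @rows = []
    headers = file.gets.chomp.split(DELIMITER)
    file.each do |line|
      values = {}
      headers.zip(line.chomp.split(DELIMITER)).each do |key, value|
        values[key] = value
      end
      @rows << values
    end
  end
end

csv = CsvFile.new(StringIO.new("firstname, age\nAndrej, 25"))
p csv.rows.first["age"]  # => "25"
```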
