I have the following Ruby code, which parses an XML document sax style (it's a very simplified version):
require 'nokogiri' class ParsingError < RuntimeError; end class Parser def initialize mapper, collector, error_collector @error_collector = error_collector @sax_handler = Handler.new(collector, @error_collector, mapper) end def process io_stream parser = Nokogiri::XML::SAX::Parser.new(@sax_handler) parser.parse_io(io_stream, 'UTF-8') rescue ParsingError => e @error_collector << "ParsingError: #{e.message}" end end class Handler < Nokogiri::XML::SAX::Document def initialize collector, error_collector, mapper @mapper = mapper @collector = collector @error_collector = error_collector end def start_element name, attrs=[] return if name=='nodes' @current_element = @mapper.new(name) end def end_element name, attrs=[] return if @current_element.nil? data = @current_element.get_errors if data[:errors].empty? @collector << @current_element.result else @error_collector << data end @current_element = nil end def characters string @current_element.record(string) if @current_element end def error message raise ParsingError, message end end class Mapper def initialize name @data = "" @name = name end def record str @data << str end def result {name: @name, data: @data} # here are more properties end def get_errors arr = [] if @data.index("Error 123") arr << "Error Code 123" elsif @data.index("Error 456") arr << "Error Code 456" elsif @data.index("Error 789") arr << "Error Code 789" # and a lot more validations end {name: @name, errors: arr} end end # Example usage xml = %{<?xml version="1.0" encoding="UTF-8"?> <nodes> <node1>some more data</node1> <node2>data Error 123</node2> <malformed xml </nodes> } collector = [] error_collector = [] mapper = Mapper parser = Parser.new(Mapper, collector, error_collector) parser.process(StringIO.new(xml)) puts "Data:" puts collector puts puts "Errors:" puts error_collector
My problem is the data collection and error handling.
- I'm passing two arrays around,
collector
anderror_collector
, which doesn't seem right. I want to collect data and errors in one place, but the collector itself should simply implement a<<
method (like an Array). Any advice how to improve it? - Generally, the error handling feels weird. I need it in two places: Logging parsing errors and validations on items. To identify the items I also need the name of a node, except when it's a parsing error. So there are two kinds of errors: Parsing and node validation errors. How could I improve this code and remove the code smells?
- Any other advice how to improve the code above?