I needed to parse some positive integers from a CSV file with the following constraints:
If possible use C++ only (no C API)
Needs to be able to process parsed data after each newline
Should be reasonably fast
Memory usage should be kept low (i.e. don't read the entire file in one go)
CSV file is well formed so error checking can be mostly omitted
Unknown number of lines in file
Should support both Unix (\n) and Windows (\r\n) line endings
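To illustrate the last constraint: one common way to accept both line endings is to read with std::getline and then drop a trailing '\r' if one is present. The following is only a minimal sketch of that idea; the helper name strip_trailing_cr is made up for illustration and is not part of the code under review.

```cpp
#include <string>

// Hypothetical helper (not part of the reviewed code): remove a single
// trailing '\r' that remains when a "\r\n"-terminated line is read with
// std::getline and the stream does not translate line endings.
inline void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}
```

Lines that end in a plain '\n' pass through unchanged, so the same call can be made after every std::getline regardless of where the file came from.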
This code runs fine and is faster than the standard approach of streaming via >>. On average it takes 290 ms to parse 10,000,000 lines, whereas the version with >> takes 1.1 s, which I think is not a bad improvement.
However, I have some concerns about this:
Are there any obvious mistakes that make this slower than it should be?
Are there any side effects or undefined behavior?
Is the template part applied correctly or is there a better way to do it?
Does the naming make sense?
Please also point out anything else you notice.
Sample input:
1,22,333
4444,55555,666666
Code:
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

inline void parse_uints(
    const char* buffer_ptr,
    std::vector<uint_fast16_t>& numbers)
{
    bool is_eol = false;
    while (!is_eol) {
        uint_fast16_t parsed_number = 0;
        for (;;) {
            if (*buffer_ptr == ',') {
                break;
            }
            if (*buffer_ptr == '\r' || *buffer_ptr == '\0') {
                is_eol = true;
                break;
            }
            parsed_number = (parsed_number * 10) + (*buffer_ptr++ - '0');
        }
        // skip delimiter
        ++buffer_ptr;
        numbers.emplace_back(parsed_number);
    }
}

template<typename T>
void read_line(
    const std::string& filename,
    const uint_fast8_t& line_length,
    const uint_fast8_t& values_per_line,
    T callback)
{
    std::ifstream infile{filename};
    if (!infile.good()) {
        return;
    }
    std::vector<uint_fast16_t> numbers;
    numbers.reserve(values_per_line);
    std::string buffer;
    buffer.reserve(line_length);
    while (infile.good() && std::getline(infile, buffer)) {
        parse_uints(buffer.data(), numbers);
        callback(numbers);
        numbers.clear();
    }
}

int main(int argc, char** argv)
{
    constexpr uint_fast8_t line_length = 25;
    constexpr uint_fast8_t values_per_line = 3;
    read_line(argv[1], line_length, values_per_line, [](auto& values) {
        // do something with the values here, for example get the max
        auto max = std::max_element(std::begin(values), std::end(values));
    });
}
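For comparison only, here is a sketch of how the per-line parse could look with std::from_chars from C++17, which also stays within the "C++ only" constraint. This is not the code under review: the function name parse_uints_from_chars and the choice of std::uint_fast32_t are my own assumptions, and I have not benchmarked it against the version above.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical alternative: parse comma-separated unsigned integers from
// one line with std::from_chars (requires C++17).
inline void parse_uints_from_chars(std::string_view line,
                                   std::vector<std::uint_fast32_t>& numbers) {
    // Tolerate a trailing '\r' left over from Windows line endings.
    if (!line.empty() && line.back() == '\r') {
        line.remove_suffix(1);
    }
    const char* first = line.data();
    const char* const last = first + line.size();
    while (first < last) {
        std::uint_fast32_t value = 0;
        const auto result = std::from_chars(first, last, value);
        if (result.ec != std::errc{}) {
            break;  // malformed or out-of-range field: stop parsing this line
        }
        numbers.push_back(value);
        first = result.ptr;
        if (first < last && *first == ',') {
            ++first;  // skip the delimiter
        }
    }
}
```

Because the function receives an explicit end pointer through the std::string_view, it never reads past the buffer, and an out-of-range value is reported through result.ec instead of wrapping silently.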
[...] '\r' character portably. You might need to expect a single '\n' as well.
\r\n line endings are typical for Windows: cs.toronto.edu/~krueger/csc209h/tut/line-endings.html
[...] std::getline() to read the lines delimited with '\n' in 1st place. The way you have it your code is portable.