12
\$\begingroup\$

For a raytracer I’ve been writing with a classmate, we use a custom scene definition format that allows specifying shapes, composite shapes, materials, lights, cameras and transform and render commands.

The assets of a scene (i.e. shapes, composite shapes, materials, lights and cameras) are stored in the data transfer object Scene. The task of this parser is to take a text file with the scene definition, parse its rules and transform them into objects which populate a Scene object.

A few pieces of the code I left out of simplicity, because I want the review to focus on the parsing strategy. Here is a list of these things and what they do.

  • Logger: A simple logger class (a singleton) that prints messages when in debug mode. It’s also possible to generate a log file of the currently stored messages
  • load_file(file_path): Takes a path as a string, loads the file and returns its content as a string
  • sanitize_inputs(file_string): Removes leading/trailing spaces, consecutive spaces, comments (starting with #, anywhere in the line) and empty lines
  • split(file_string, delimiter): Takes a string and splits it by a character. Returns a vector of strings
  • order_rule_vector(rule_vector): Some rules in the scene definition format depend on other rules. To allow the user to write them out in any order, the vector of rules is partitioned first so that rules with dependencies come after their dependants

Also I left out the code for parsing lights, cameras, transform commands and render commands, because the strategy is identical to what’s shown in the code below.

Scene Definition Format

Our raytracer will implement the Phong reflection model. Currently there are regular spheres and axis-aligned boxes.

# name, ambient, diffuse, specular terms, shininess define material red 1 0 0 1 0 0 1 0 0 1 # name, min, max positions, material name define shape box b1 -100 -80 -20 1002 80 -100 red # name, center position, radius, material name define shape sphere s1 0 0 -100 50 blue # name, shape(s) define shape composite c1 b1 s1 

The Parser

std::shared_ptr<Scene> load_scene(std::string const& file_path) { std::shared_ptr<Scene> scene = std::make_shared<Scene>(); auto logger = Logger::instance(); logger->log("-- Start. Parsing scene: " + file_path); std::string file_string = load_file(file_path); sanitize_inputs(file_string); std::vector<std::string> rule_vector = split(file_string, '\n'); order_rule_vector(rule_vector); std::istringstream iss{ file_string }; for (auto&& rule : rule_vector) { std::istringstream rule_stream{ rule }; std::string command; rule_stream >> command; if (command == "define") { std::string define_target; rule_stream >> define_target; if (define_target == "material") { std::string mat_name; rule_stream >> mat_name; if (scene->materials.find(mat_name) != scene->materials.end()) { logger->log("Warning: Duplicate material '" + mat_name + "' was skipped."); continue; } Color ambient, diffuse, specular; double shininess; rule_stream >> ambient.r; rule_stream >> ambient.g; rule_stream >> ambient.b; rule_stream >> diffuse.r; rule_stream >> diffuse.g; rule_stream >> diffuse.b; rule_stream >> specular.r; rule_stream >> specular.g; rule_stream >> specular.b; rule_stream >> shininess; logger->log("--- Adding material: " + mat_name); auto material = std::make_shared<Material>( mat_name, ambient, diffuse, specular, shininess ); if (!scene->materials.insert({ mat_name, material }).second) { logger->log("Error: Material '" + mat_name + "' wasn't added to the scene object."); } } else if (define_target == "shape") { std::string shape_type, shape_name; rule_stream >> shape_type; rule_stream >> shape_name; std::shared_ptr<Shape> shape; if(shape_type == "sphere") { glm::vec3 center; double radius = 1.0; rule_stream >> center.x; rule_stream >> center.y; rule_stream >> center.z; rule_stream >> radius; std::string mat_name; rule_stream >> mat_name; std::shared_ptr<Material> material; if(scene->materials.find(mat_name) == scene->materials.end()) { logger->log("--- Warning: Material '" + mat_name + "' not found. Using default material instead."); material = std::make_shared<Material>(); } else { material = scene->materials[mat_name]; } shape = std::make_shared<Sphere>(shape_name, material, center, radius); } else if(shape_type == "box") { glm::vec3 min, max; rule_stream >> min.x; rule_stream >> min.y; rule_stream >> min.z; rule_stream >> max.x; rule_stream >> max.y; rule_stream >> max.z; std::string mat_name; rule_stream >> mat_name; std::shared_ptr<Material> material; if(scene->materials.find(mat_name) == scene->materials.end()) { logger->log("--- Warning: Material '" + mat_name + "' not found. Using default material instead."); material = std::make_shared<Material>(); } else { material = scene->materials[mat_name]; } shape = std::make_shared<Box>(shape_name, material, min, max); } else if (shape_type == "composite") { std::string child_token; Composite comp{ shape_name }; while (std::getline(rule_stream, child_token, ' ')) { // Leading/trailing spaces result in empty strings // being part of the tokens if (child_token == "") { continue; } auto search_it = scene->shapes.find(child_token); if (search_it == scene->shapes.end()) { logger->log("Error: Shape '" + child_token + "' for composite " + comp.name() + " not found."); continue; } if (!comp.add(search_it->second)) { logger->log("Error: Shape '" + child_token + "' was not added to composite " + comp.name()); continue; } } } else { logger->log("--- Warning: Skipping unknown shape type " + shape_type); continue; } logger->log("--- Adding shape " + shape_name); if (!scene->shapes.insert({ shape_name, shape }).second) { logger->log("Error: Shape " + shape_name + " wasn't added to the scene object."); } } else if (define_target == "light") { // Parse lights } else if (define_target == "camera") { // Parse cameras } else { logger->log("--- Warning: Skipping unknown define target " + define_target); continue; } } else if (command == "transform") { // Parse transform commands } else if (command == "render") { // Parse render commands } else { logger->log("--- Skipping unknown command " + command); continue; } } logger->log("-- Success. Scene parsed completely."); return scene; } 

I don’t like that code. It’s a branch nightmare. I write rule_stream >> a lot. Also there are especially redundant pieces in there which I have to refactor.

An example would be parsing the shapes, more precisely which materials they use. I check a map for the materials name and use the pointer that’s stored there. The same code appears twice (once for spheres, once for boxes). That’s bad. One idea to solve this is to put the material name right after the shapes name. Then I can check for existance of the material before branching into the sphere and box cases.

Are there better approaches to parse such simple, regular languages like our scene definition format? What else can I improve?

\$\endgroup\$
2
  • 2
    \$\begingroup\$boost-spirit.com/home might be helpful\$\endgroup\$CommentedJul 6, 2016 at 19:05
  • 1
    \$\begingroup\$Why not using OOP and create a class Parser with all needed parameters and create a series of class ParserMaterial ... class ParserShape inherited from class Parser ?\$\endgroup\$CommentedOct 26, 2016 at 20:59

1 Answer 1

3
\$\begingroup\$

Getting rid of the branches

I don’t like that code. It’s a branch nightmare.

Yes, let's get rid of the branches. Consider that all the branches are in the form of:

if (command == "foo") { // parse rule_stream further … } 

When you have a sequence of homogenous if/else statements like that, there are alternatives to consider. One would be a switch-statement, but those unfortunately only work with integers and enums. Another option would be to create a map from strings to std::functions:

using ParseFunction = std::function<void(std::istream&, Scene*)>; static const std::unordered_map<std::string, ParseFunction> commands = { {"define", parse_define}, {"transform", parse_transform}, {"render", parse_render}, }; 

Then inside load_scene(), you would write something like:

std::string command; rule_stream >> command; auto it = commands.find(command); if (it != commands.end()) { auto parse_function = it->second; parse_function(rule_stream, scene->get()); } else { logger->log("--- Skipping unknown command " + command); continue; } 

Then you can define a function for each command, for example:

void parse_define(std::istream& rule_stream, Scene* scene) { std::string define_target; rule_stream >> define_target; … scene->materials.insert(…); … } 

That shows how to get rid of the outer ifs. You can also do the same for targets and shapes.

Avoiding writing rule_stream >>

I write rule_stream >> a lot.

This is because you are reading each individual value. You can write your own overloads of operator>>() for the types you are using. For example:

std::istream& operator>>(std::istream& is, glm::vec3 &v) { return is >> v.x >> v.y >> v.z; } 

And then you can write:

glm::vec3 min, max; rule_stream >> min >> max; 

Split up large functions

Another reason your code looks bad is because everything is in one huge function. It is easy to lose overview. The techniques I've shown above already introduce new functions that in turn simplify your load_scene(). But even without those you could have split up the function yourself:

static void parse_define(std::istream& rule_stream, Scene* scene) { … } … static void parse_rule(const std::string& rule, Scene* scene) { std::istringstream rule_stream{ rule }; std::string command; rule_stream >> command; if (command == "define") { parse_define(rule_stream, scene): } else if (command == "transform") { … } … }; std::shared_ptr<Scene> load_scene(std::string const& file_path) { … for (auto&& rule : rule_vector) { parse_rule(rule, scene->get()); } return scene; } 

Are the std::shared_ptrs necessary?

I see you use std::shared_ptrs a lot. But are they really necessary? You could have load_scene() return a Scene directly; even without C++17's guaranteed copy elision, if Scene has a move constructor, it could move all its contents efficiently to the caller. If you do need a smart pointer because of polymorphism, consider whether you can just use astd::unique_ptr instead of the more expensive std::shared_ptr.

Let's have a look at scene->materials. This looks like a std::map or std::unordered_map. These containers will already do memory management for you, and pointers to values stored in the map are guaranteed to be stable. So if it was defined as:

struct Scene { … std::unordered_map<std::string, Material> materials; … }; 

The code in your parser could be simplified a lot:

std::string mat_name; Color ambient, diffuse, specular; float shininess; rule_stream >> mat_name >> ambient >> diffuse >> specular >> shininess; auto result = scene->materials.try_emplace( mat_name, mat_name, ambient, diffuse, specular, shininess ); if (!result.second) { logger->log("Error: Material '" + mat_name + "' wasn't added to the scene object."); } 

(I used C++17's try_emplace() here, you can achieve the same in other ways in earlier versions of C++, but less elegantly).

If you ensure any material used by shapes is in scene->materials before the shapes using them are added, then you can just store a reference to the Material in a Shape instead of using a std::shared_ptr.

If you have a default constructor and an operator>> overload for Material, then you could also consider writing:

Material mat; rule_stream >> mat; if (scene->materials.try_emplace(mat.name(), mat) { logger->log("Error: Material '" + mat.name() + "' wasn't added to the scene object."); } 

Do you really need to order the rules first?

You are ordering the rules before parsing them. That means you have to read the whole file in memory first. But do you really need that? I see you are already handling missing materials, by assigning a default constructed Material to a Shape. But why not add that default Material under the given name to scene->materials, and then when you later do encounter the rule defining that Material, just update it in place?

if (define_target == "shape") { if (shape_type == "sphere") { … auto& material = scene->materials[mat_name]; shape = std::make_unique<Sphere>(shape_name, material /* reference */, center, radius); } … } else if (define_target == "material") { … auto& material = scene->materials[mat_name]; material = {mat_name, ambient, diffuse, specular, shininess}; } … 

You can then do error checking afterwards: any material in scene->materials that is still just default constructed was added by a "shape" target, but there was never a matching "material" target.

About parser generators

Dan Oberlam suggested Boost Spirit. This allows defining a grammar for a language completely inside C++. There are also external tools that can generate a parser written in C++ for you, like GNU Bison. These can be very helpful, especially for more complex grammars. The one you are trying to parse looks simple enough though that it probably is easier to write your own parser in C++.

\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.