This code solves a Hackerrank problem called Attribute Parser. It's my first time using regex in C++, and I wonder whether the code is expressive enough for a reader to follow what it does.
The HRML explanation from Hackerrank:
We have defined our own markup language HRML. In HRML, each element consists of a starting and ending tag, and there are attributes associated with each tag. Only starting tags can have attributes. We can call an attribute by referencing the tag, followed by a tilde, '~' and the name of the attribute. The tags may also be nested.
The opening tags follow the format:
<tag-name attribute1-name = "value1" attribute2-name = "value2" ... >
The closing tags follow the format:
< /tag-name >
For example:
<tag1 value = "HelloWorld">
<tag2 name = "Name1">
</tag2>
</tag1>
The attributes are referenced as:
tag1~value
tag1.tag2~name
You are given the source code in HRML format consisting of n lines. You have to answer q queries. Each query asks you to print the value of the attribute specified. Print "Not Found!" if there isn't any such attribute.
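To make the query rule concrete, here is a minimal sketch of how a reference such as `tag1.tag2~name` could be split and looked up. The names `AttributeTable`, `split_reference` and `answer_query` are hypothetical and chosen only for illustration; the sketch assumes the attributes have already been collected into a map keyed by the dotted tag path.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical alias: maps a dotted tag path ("tag1.tag2") to its attributes.
using AttributeTable =
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>>;

// Splits "tag1.tag2~name" into {"tag1.tag2", "name"}.
// If the reference contains no '~', the attribute part is left empty.
std::pair<std::string, std::string> split_reference(const std::string& reference)
{
    const auto tilde = reference.find('~');
    if (tilde == std::string::npos) {
        return {reference, std::string{}};
    }
    return {reference.substr(0, tilde), reference.substr(tilde + 1)};
}

// Returns the attribute value, or "Not Found!" if the path or attribute is unknown.
// find() is used instead of operator[] so a failed lookup inserts nothing into the table.
std::string answer_query(const AttributeTable& table, const std::string& reference)
{
    const auto parts = split_reference(reference);
    const auto tag = table.find(parts.first);
    if (tag == table.end()) {
        return "Not Found!";
    }
    const auto value = tag->second.find(parts.second);
    if (value == tag->second.end()) {
        return "Not Found!";
    }
    return value->second;
}
```

With a table holding `tag1 -> {value: HelloWorld}` and `tag1.tag2 -> {name: Name1}`, `answer_query(table, "tag1.tag2~name")` gives `Name1`, while `answer_query(table, "tag2~name")` gives `Not Found!` because `tag2` is only reachable through `tag1`.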
I've decided to solve this using the C++ standard library's regex engine (`<regex>`).
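As a small, isolated illustration of the regex approach, the sketch below pulls the `name = "value"` pairs out of a single opening tag with `std::sregex_iterator`. The function name `parse_attributes` is hypothetical, and the pattern assumes that values are always wrapped in double quotes and contain no embedded quotes.

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Collects name = "value" pairs from one opening tag,
// e.g. <tag1 value = "HelloWorld">. Hypothetical helper for illustration.
std::unordered_map<std::string, std::string> parse_attributes(const std::string& tag)
{
    // Group 1: attribute name (anything except '=' and whitespace).
    // Group 2: the text between the double quotes.
    static const std::regex attribute_regex(R"re(([^=\s]+)\s*=\s*"([^"]*)")re");

    std::unordered_map<std::string, std::string> attributes;
    for (std::sregex_iterator it(tag.begin(), tag.end(), attribute_regex), end;
         it != end; ++it) {
        attributes[(*it)[1].str()] = (*it)[2].str();
    }
    return attributes;
}
```

Calling `parse_attributes` on `<f a1 = "1" a2 = "2" a3 = "3">` yields three entries, and on a closing tag such as `</f>` it yields an empty map, since no `=` is present.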
    //! Hackerrank HRML Attribute Parser
    //! This program reads from an input file that passes a HRML document as explained
    //! in the Hackerrank challenge "Attribute Parser".
    //! The first line of the input file includes n and q, where n is the number of lines
    //! of the HRML document that follows, and q is the number of queries that follow the
    //! HRML document

    #include <iostream>
    #include <fstream>
    #include <string>
    #include <regex>
    #include <unordered_map>

    int main()
    {
        std::ifstream ifile("input");
        std::smatch result;

        if (ifile.is_open()) {
            int n,q;
            ifile >> n >> q;
            ifile.ignore();

            std::string document;
            for (;n>0;--n) {
                std::string line;
                std::getline(ifile, line);
                document.append(line);
            }

            using Tag_name = std::string;
            using Attribute_name = std::string;
            std::unordered_map<Tag_name, std::unordered_map<Attribute_name, std::string>> tag_map;
            Tag_name tag_name{};

            std::regex tag_regex(R"(<[^>]*)");
            auto tag_matches_begin = std::sregex_iterator(document.begin(), document.end(), tag_regex);
            auto tag_matches_end = std::sregex_iterator();

            for (auto tag_it = tag_matches_begin; tag_it != tag_matches_end; ++tag_it) {
                std::smatch match = *tag_it;
                auto match_string = match.str();

                // if beginning of tag <tag ...
                if (std::regex_search(match_string, result, std::regex(R"(<\s*([^/]\w*))"))) {
                    std::string new_tag_name = result[1].str();
                    if (tag_name.empty()) {
                        tag_name = new_tag_name;
                    } else {
                        tag_name = tag_name + "." + new_tag_name;
                    }

                    std::string search_string = match_string;
                    while (std::regex_search(search_string, result, std::regex(R"re(([^=\s]*)\s*=\s*"([^"]*))re"))) {
                        std::string attribute_name = result[1].str();
                        std::string attribute_value = result[2].str();
                        tag_map[tag_name][attribute_name] = attribute_value;
                        search_string = result.suffix();
                    }
                }
                // if end of tag </tag>
                else if (std::regex_search(match_string, result, std::regex(R"(</\s*(\w*))"))) {
                    std::string end_tag_name = result[1].str();
                    tag_name = std::regex_replace(tag_name, std::regex(end_tag_name), "");
                    tag_name = std::regex_replace(tag_name, std::regex(R"(\.$)"), "");
                }
            }

            for (;q>0;--q) {
                std::string line;
                std::getline(ifile, line);
                std::regex_search(line, result, std::regex(R"((.*)~(.*))"));
                std::string tag_name = result[1].str();
                std::string attribute_name = result[2].str();
                if (tag_map[tag_name].count(attribute_name) > 0 ) {
                    std::cout << tag_map[tag_name][attribute_name] << "\n";
                } else {
                    std::cout << "Not Found!" << "\n";
                }
            }
            std::cout << std::flush;
        } else {
            std::cout << "Unable to open input file" << std::endl;
        }

        return 0;
    }
I find my regular expressions a little cryptic, and I wonder how they read to someone seeing the code for the first time. Any suggestions on style, or other tips?
The code works fine, and as an example, the following input:
    7 10
    <a value = "GoodVal">
    <b value = "BadVal" size = "10">
    <c height = "auto">
    <d size = "3">
    <e strength = "200%">
    <f a1 = "1" a2 = "2" a3 = "3">
    </f> </e> </d> </c> </b> </a>
    a.b.c.d.e.f~a1
    a.b.f~a1
    a.b~size
    a.b.c.d.e.f~a2
    a.b.c.d.e.f~a3
    a.c~height
    a.b.d.e~strength
    a.b.c.d.e~strength
    d~sze
    a.b.c.d~size
Produces the following output:
    1
    Not Found!
    10
    2
    3
    Not Found!
    Not Found!
    200%
    Not Found!
    3