3
\$\begingroup\$

I have solved a task which takes input in a unique markup language. In this language each element consists of a starting and closing tag. Only starting tags can have attributes. Each attribute has a corresponding name and value.

For example:

<tag-name attribute1-name = "Value" attribute2-name = "Value2" ... > 

A closing tag follows this format:

</tag-name> 

The tags may also be nested. For example:

<tag1 value = "HelloWorld"> <tag2 name = "Name1"> </tag2> </tag1> 

Attributes are referenced as:

tag1~value tag1.tag2~name 

The source code of the markup language is given in n lines of input, the number of attribute references or queries is q.

I am seeking any improvements to my code and any bad practices I may be using.

 #include <vector> #include <iostream> using namespace std; typedef vector < string > VS; typedef vector < VS > VVS; //Takes numberOfLines of input and splits the lines into strings VS Split(int numberOfLines) { VS myVec; string x; for (int i = 0; i < numberOfLines; i++) { getline(cin, x); string::size_type start = 0; string::size_type end = 0; while ((end = x.find(" ", start)) != string::npos) { myVec.push_back(x.substr(start, end - start)); start = end + 1; } myVec.push_back(x.substr(start)); } return myVec; } //removes unnecessary characters in this case they are: " , // > (if it is the final element of the tag), and " string Remove(string s) { string c = s; c.erase(c.begin()); auto x = c.end() - 1; if ( * x == '>') { c.erase(x); x = c.end() - 1; } c.erase(x); return c; } int main() { VS HRML, attributes, values, t, validTags; VVS Tags, ValidAttributes, ValidValues; int n, q; cin >> n >> q; HRML = Split(n); //does the heavy lifting for (int i = 0; i < HRML.size(); i++) { string x = HRML[i]; //checks if x contains the beginning of the starting tag if (x[0] == '<' && x[1] != '/') { //checks if x contains the end of the starting tag if (x[x.size() - 1] == '>' && x[1] != '/') { ValidAttributes.push_back(attributes); attributes.clear(); ValidValues.push_back(values); values.clear(); } auto c = x.end() - 1; if ( * c == '>') { x.erase(c); } x.erase(x.begin()); t.push_back(x); Tags.push_back(t); } // checks if x contains the end of the starting tag if (x[x.size() - 1] == '>' && x[1] != '/') { ValidAttributes.push_back(attributes); attributes.clear(); ValidValues.push_back(values); values.clear(); } //checks if x contains the ending tag else if (x[1] == '/') { x.erase(x.begin()); x.erase(x.begin()); x.erase(x.end() - 1); for (int i = 0; i < t.size(); i++) { if (x == t[i]) { t.erase(t.begin() + i); } } } //checks to see if an attribute has been assigned a value else if (x == "=") { attributes.push_back(HRML[i - 1]); values.push_back(Remove(HRML[i + 1])); } } string x = ""; //makes valid(user-usable) tags from all the tags // passed into vector<string> Tags for (int i = 0; i < Tags.size(); i++) { for (int j = 0; j < Tags[i].size(); j++) { if (Tags[i].size() > 1) { string begin = Tags[i][j] + '.'; x += begin; if (j == (Tags[i].size() - 1)) { x.erase(x.end() - 1); validTags.push_back(x + '~'); x = ""; } } else { validTags.push_back(Tags[i][0] + '~'); } } } //iterates through each query given by the user and checks if it is valid for (int i = 0; i < q + 1; i++) { string output = ""; if (i == 0) { string x; getline(cin, x); } else { string x; getline(cin, x); int c = 0; for (int j = 0; j < validTags.size(); j++) { for (int p = 0; p < ValidAttributes[c].size(); p++) { if (x == validTags[j] + ValidAttributes[c][p]) { output = ValidValues[c][p]; /*if a valid attribute reference has been found then there is no need to check the rest of validAttribute[c] */ goto endOfLoop; } else { output = "Not Found!"; } } if (c < (ValidAttributes.size() - 1)) { c++; } } endOfLoop: cout << output << endl; } } return 0; } 
\$\endgroup\$
6
  • \$\begingroup\$You could use Yacc to generate the parser, rather than use custom error-prone parsing.\$\endgroup\$
    – esote
    CommentedJan 6, 2019 at 21:29
  • 3
    \$\begingroup\$This was just a task I decided to do for fun.\$\endgroup\$CommentedJan 6, 2019 at 21:31
  • 1
    \$\begingroup\$You have a goto in your code. Most coding standards consider this bad just by itself. I would consider it bad only if you can't justify it. But there is no comment explaining why you need a goto.\$\endgroup\$CommentedJan 7, 2019 at 18:37
  • \$\begingroup\$Your indentation is horrible and makes the code nearly unreadable. I'll come back and review it if you fix the indentation. But otherwise I'll just ignore.\$\endgroup\$CommentedJan 7, 2019 at 18:38
  • 1
    \$\begingroup\$It looks like your input is well-formed XML, in which case you could use one of the many XML parsers that are available instead of creating your own from scratch.\$\endgroup\$CommentedJan 9, 2019 at 10:58

1 Answer 1

1
\$\begingroup\$

Here are some things that may help you improve your code.

Don't abuse using namespace std

Putting using namespace std at the top of every program is a bad habit that you'd do well to avoid.

Be careful with signed and unsigned

In multiple places in this code an inti with an unsigned size_t returned from size(). It would be better to declare i to also be size_t.

Use all of the required #includes

The type std::string is used but its declaration is in #include <string> which is not actually in the list of includes.

Prefer using over typedef

When you make more use of templates, you will likely encounter the reason many people prefer using over typedef in modern C++. So your declarations become:

using VS = std::vector<std::string>; using VVS = std::vector<VS>; 

See T.43 for details.

Use "range for" and simplify your code

The code currently uses this:

for (int i = 0; i < HRML.size(); i++) { string x = HRML[i]; 

There is a much simpler and more efficient way to do this:

for (auto x : HRML) { 

This won't work without further modification to the details of the loop, but for help on that, see the next suggestion.

Use a state machine

The logic of this code could be expressed as a state machine. If that were done, one could process the stream "on the fly" character at a time with little difficulty.

goto still considered harmful

Generally speaking, the use of goto is not recommended. Using it as you have, for breaking out nested loops is about the only accepted use. See ES.76 for details. In this case, however, you can avoid it entirely as shown in the next suggestion.

Use appropriate data structures

The use of vectors of vectors of strings is not a very efficient structure for this program. I would suggest that using an unordered_map would be a better choice and would allow you change the convoluted triple loop at the end of the program to this:

for(std::string query; q > 0 && std::getline(std::cin, query); --q) { auto search = tagValue.find(query); if (search == tagValue.end()) { std::cout << "Not Found!\n"; } else { std::cout << search->second << '\n'; } } 

Parse input carefully

On my machine, your posted code segfaulted and crashed when run with the sample input provided at the code challenge site. The reason is that this code is not reading input correctly. Specifically, after this line:

std::cin >> n >> q; 

there is still a newline character in the input stream. Get rid of it like this:

constexpr std::size_t maxlinelen{200}; std::cin.ignore(maxlinelen, '\n'); 

The value of maxlinelen is from the problem description, but generically one could use this:

std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 

Also, this line is not safe because it assumes that there are at least two characters in the string, which isn't guaranteed.

if (x[0] == '<' && x[1] != '/') { 
\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.