7
\$\begingroup\$

I tried to serialize a doubly linked list. Can you rate it, and what could be improved? I open the file with fopen(path, "wb") and write all data in binary mode.

#include <string> #include <unordered_map> #include <vector> #include <random> #include <iostream> struct ListNode { ListNode* prev = nullptr; ListNode* next = nullptr; ListNode* rand = nullptr; std::string data; }; class List { public: void Serialize(FILE* file) { std::unordered_map<ListNode*, int> uMap; auto cur = head; for (int i = 1; i <= count; i++) { uMap.insert(std::make_pair(cur, i)); cur = cur->next; } std::vector <std::pair<std::string, int>> vNode; vNode.reserve(count); cur = head; while (cur) { int randEl{ 0 }; if (cur->rand != nullptr) { auto search = uMap.find(cur->rand); randEl = search->second; } vNode.push_back(std::make_pair(cur->data, randEl)); cur = cur->next; } fwrite(&count, sizeof(count), 1, file); for (auto& a: vNode) { fwrite(&a.second, sizeof(a.second), 1, file); int size = a.first.size(); fwrite(&size, sizeof(size), 1, file); fwrite(a.first.c_str(), 1, size, file); } } void Deserialize(FILE* file) { std::unordered_map<int, ListNode*> uMap; std::vector<int> vRandNode; int rCount{ 0 }; fread(&rCount, sizeof(rCount), 1, file); vRandNode.reserve(rCount); rCount = 1; for (; rCount <= vRandNode.capacity(); rCount++) { int randNode{ 0 }; fread(&randNode, sizeof(randNode), 1, file); int len{ 0 }; fread(&len, sizeof(len), 1, file); std::string temp{ "" }; while (len > 0) { char ch ; fread(&ch, sizeof(ch), 1, file); temp += ch; len--; } Add(temp); vRandNode.push_back(randNode); uMap.insert(std::make_pair(rCount, tail)); } auto cur = head; for(auto a: vRandNode) { if (a != 0) cur->rand = uMap.find(a)->second; else cur->rand = nullptr; cur = cur->next; } } void Add(std::string str) { ListNode* node = new ListNode; node->data = std::move(str); count++; if (head == nullptr) { head = node; } else { tail->next = node; node->prev = tail; } tail = node; } List(){} List(std::string str) { Add(std::move(str)); } ~List() { while (head) { auto temp = head->next; delete head; head = temp; } } void ShufleRandom() { std::random_device rd; std::mt19937 gen(rd()); auto cur = head; while (cur) { auto randNode = head; int randNum = gen() % (count + 1); for (int i = 1; i < randNum; i++) { randNode = randNode->next; } if (randNum == 0) cur->rand = nullptr; else cur->rand = randNode; cur = cur->next; } } void PrintListAndRand() const { auto cur = head; while (cur) { std::cout << "===Data->" << cur->data; if (cur->rand) std::cout << " | RandData->" << cur->rand->data << "===" << std::endl; else std::cout << " | RandData->nullptr===" << std::endl; cur = cur->next; } std::cout << "_______________________________________________________________" << std::endl; } private: ListNode* head = nullptr; ListNode* tail = nullptr; int count = 0; }; 
\$\endgroup\$
1
  • \$\begingroup\$(There are 83 trailing spaces in the code as posted here.)\$\endgroup\$CommentedMar 6, 2023 at 17:33

2 Answers 2

9
\$\begingroup\$

Thanks for posting your code and not being afraid of feedback. The indentation of your code is nice and you're already using a lot of stuff present in the C++ standard library. That's great.

Unclear application

I don't understand how the structure would be beneficial for me. What would I use a linked list pointing to random nodes for?

And while I can add something to the list, it seems there's no way of accessing the elements of the list except saving and printing.

Your list does not act like a C++ container and thus its use is limited. We can't use it with any algorithms of the standard library.

Use of FILE and fopen()

These probably come from <cstdio>, which you should include, following the "include what you use" (IWYU) principle. And, as the name of the library suggests, they are C functions. For C++ you might want to #include <fstream> and use std::ifstream and std::ofstream instead.

Rule of 5

You have implemented a custom destructor, which means that you should think about a copy constructor, copy assignment operator, move constructor and move assignment operator as well.

See: The rule of three/five/zero

Long function Serialize

The function is 30 lines long and from its structure, it looks like this could be split into 3 smaller private functions:

  1. conversion of the list into a Hashmap.
  2. conversion of the Hashmap into a flat list
  3. writing the flat list

Use emplace_back

The line vNode.push_back(std::make_pair(cur->data, randEl)); can be replaced by the shorter and more efficient vNode.emplace_back(cur->data, randEl);.

Long method Deserialize

I expected to see the 3 operations of Serialize() in reverse order, but somehow they are interleaved. At the moment, I can't judge whether that makes your implementation more efficient, but at least it hurts readability.

Try 3 private methods in the reverse order of Serialize.

count not set when deserializing

In Deserialize, why is the variable called rCount instead of count? At first sight, it seemed to me that count is never set - until I find that Add() is called which increases the counter.

Wrong use of capacity

The loop for (; rCount <= vRandNode.capacity(); rCount++) is incorrect. After vRandNode.reserve(rCount);, the capacity of that vector may be larger than the original requested size. As a consequence, you may read more items than available in the file.

[...] to a value that's greater or equal to new_cap.

Source: https://en.cppreference.com/w/cpp/container/vector/reserve

This brings me to an important topic:

Missing tests

It seems that there are no unit tests for your class. Although I find unit tests quite hard to do in C++ (compared to other languages), I still think they are useful and you should look into that.

Using C++ streams (as suggested before) will actually help you with testing, since you can use a string stream instead of a file stream during the test.

Uniform initialization

std::string temp{ "" }; can just be std::string temp;.

int randNode{ 0 }; can be int randNode{}; and similar.

You can also do that for ListNode: ListNode* prev{}; will use a nullptr.

Missing error checks / basic exception guarantee

The return value of fread() is never checked. Your code assumes that the file always has correct content. If you implement a unit test, also implement one for partial / broken files.

Try to achieve at least Basic exception guarantee level.

Make the loop complete

Instead of

rCount = 1; for (; rCount <= vRandNode.capacity(); rCount++) 

write

for (rCount = 1; rCount <= vRandNode.capacity(); rCount++) 

so that the for loop looks like a normal for loop.

Except for capacity(), as discussed before.

Use auto

You already use auto a lot. This applies to the principle "almost always auto" (AAA). But you can use it more, e.g. auto node = new ListNode; instead of ListNode* node = new ListNode;

ShufleRandom

Shuffle is written with 2 Fs. ShuffleRandom() seems to be duplicate words for the same thing. Shuffle() is enough.

Random number generation

I don't know about your requirements on the distribution of the random numbers. Nor did I fully understand why that list has random pointers in the first place. The modulo approach typically doesn't give you a uniform distribution. Have a look at this Stack Overflow question to find out how to generate random numbers in a better way.

gen() gives you an unsigned integer, but randNum? is a signed integer. randNum should be the same type: auto const randNum = gen() % (count + 1);

I'd also like to see

if (randNum == 0) { cur->rand = nullptr; } else { cur->rand = nthNode(randNum); } 

Const correctness

The PrintListAndRand() function is const, so it seems that you're at least a bit familiar with const correctness. Seeing that, I would consider that Serialize() should also be const.

\$\endgroup\$
5
  • \$\begingroup\$Thank you very much for such an extended comment. I'll try to clarify. I was given the task of writing functions to serialize and deserialize a doubly linked list with a pointer to a random element as an additional complication. Therefore, I neglected some list functions (copy constructor, get value, etc.).\$\endgroup\$
    – Denisych
    CommentedMar 5, 2023 at 19:09
  • \$\begingroup\$@Denisych: yeah, in times of ChatGPT, the professors need to come up with some new ideas, I guess. To say "I don't want this thing", you can =delete them. Otherwise there's a table of 42 items what the compiler will do. That's so hard to remember\$\endgroup\$CommentedMar 5, 2023 at 19:54
  • \$\begingroup\$Stack Overflow, not StackOverflow.\$\endgroup\$CommentedMar 6, 2023 at 17:47
  • \$\begingroup\$Do you have a reference for IWYU? (There is a tool of the same name.)\$\endgroup\$CommentedMar 6, 2023 at 17:47
  • \$\begingroup\$@PeterMortensen: Thanks for your edits. I thought there was a principle with this name and the tool is supporting this principle. But maybe I was wrong and the tool was first. The best link I could find is include-what-you-use.org\$\endgroup\$CommentedMar 6, 2023 at 22:47
4
\$\begingroup\$

Thomas Weller has already made many good points in his answer. I'll just add this:

Keep responsibilities to a minimum

You've added a class List that has two responsibilities: being a container for std::strings, and a (de)serializer for that data. Maybe you want to do more with the list than just (de)serialize it, what other functionality are you going to add to this class? You want to avoid having it grow and grow until it becomes an unmaintainable behemoth.

There is an easy way to solve this: move Serialize() and Deserialize() out of class List. They could look like:

void Serialize(const List& list, FILE* file) { … auto cur = list.head; for (int i = 1; i <= list.count; i++) { … } … } void Deserialize(List& list, FILE* file) { … list.Add(temp); … } 

Of course, you need some help from List to be able to do this. While the deserializer is easy as it just needs to add items to a list, and Add() is a public member function, the serializer would need some way to iterate over the elements of the list. This could be done by making head and countpublic member variables, but that is not very nice. Better would be to add member functions which return these values, as that would prevent accidental modification.

Now that you've decoupled the (de)serializer from the List class, you might consider the following:

Make the (de)serializer work on STL containers

The standard library comes with many containers types, including a std::list. Consider using that instead. Now that you know you can add a (de)serializer without having to add that functionality to the list class itself, you can modify your (de)serializer to work with a STL container. Considering that your List is basically a std::list<std::string>, you could write:

using List = std::list<std::string>; void Serialize(const List& list, FILE* file) { … for (const std::string& item: list) { … } … } void Deserialize(List& list, FILE* file) { … list.push_back(temp); … } 

But you can go further than this:

Make the code more generic

The above only works for lists of strings. What if you have an array of strings, or a vector of ints? You could write different versions of Serialize() and Deserialize() that deal with all the cases you want to be able to handle, but that will lead to a lot of code duplication. By making your functions templates, you can write code once that will work on many different types. Consider:

template<typename Container> void Serialize(const Container& container, FILE* file) { … for (auto& element: container) { … } … } template<typename Container> void Serialize(const Container& container, FILE* file) { … container.push_back(temp); … } 

Of course, now it becomes a bit harder if you want to allow serializing containers that contain things other than std::strings. That's also solvable by adding functions that deal with serializing one element of a specific type, which you then call from the serialization function that handles containers. This may be a bit of work, which is wasted if you only ever want to serialize containers of std::strings, but if you want to serialize other things, it will be worth it.

Have a look at existing serialization libraries

There are existing libraries that can help you serialize your own datastructures, like for example Boost Serialization. Consider having a look at them; maybe they are a good fit for your own projects, so you don't have to reinvent the wheel. They might also serve as inspiration for your own serialization code.

\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.