6
\$\begingroup\$

This is my code and I would like to get it code reviewed. It is functional and behaves as expected.

I pass some basic types to the Serialize() function and then deserialize the output to get back the original value.

size_t returnSize(const char* s) { string string(s); return string.size(); }; template<typename T> size_t returnSize(const T& t) { return sizeof(t); }; template<typename T> string Serialize(const T& t) { T* pt = new T(t); vector<char> CasttoChar; for (int i =0 ;i<returnSize(t);i++) { CasttoChar.push_back(reinterpret_cast<const char*>(pt)[i]); } delete pt; return string( CasttoChar.begin(), CasttoChar.end() ); }; template<> string Serialize<string>(const string& t) { return Serialize(t.c_str()); }; template<typename T> T DeSerialize(const string& cstr) { const T* a = reinterpret_cast<const T*>(cstr.c_str()); return *a; } template<> string DeSerialize<string>(const string& cstr) { return DeSerialize<char*>(cstr); } int _tmain(int argc, _TCHAR* argv[]) { int x = 97; string c = Serialize<int>(x); cout << DeSerialize<int>(c) << endl; char g = 'g'; string c2 = Serialize<char>(g); cout << DeSerialize<char>(c2) << endl; string k = "blabla"; string c3 = Serialize<const char*>(k.c_str()); cout << DeSerialize<char*>(c3) << endl; string ka = "string"; string c4 = Serialize<string>(ka); cout << (DeSerialize<string>(c4)).c_str(); system("PAUSE"); return EXIT_SUCCESS; } 
\$\endgroup\$
1
  • \$\begingroup\$You're duplicating a string on the heap only to take its size?\$\endgroup\$
    – asveikau
    CommentedJan 13, 2013 at 8:33

2 Answers 2

8
\$\begingroup\$

First of all, fundamentally: std::string is a data type to hold text. Exclusively. It should not be used to hold binary data. Use a std::vector<(unsigned) char> for that.

Secondly, you are using heap allocation without needing to:

T* pt = new T(t); 

This makes no sense at all, and introduces the potential of a memory leak. You could simply make a copy – but you don’t even need to do that. Just work on the original data.

Next, about naming; it’s general convention in C++ to use underscore_separated_names rather than camelCase or PascalCase. This is only a convention but I’d follow it unless there’s a very compelling reason not to.

Also on the topic of naming: returnSize is redundant, return has no place in the name. Rename it to object_size or something along those lines. You are also inconsistent in your naming convention here.

To get the size of a zero-terminated string, use std::strlen, that’s more efficient than converting to a std::string.

Finally, the code simply doesn’t work. It creates a bit-by-bit shallow representation. That only works for simple composite objects, it no longer works for objects that use pointers or references internally – you’ve already noticed that, because otherwise you wouldn’t need to create a special chase for char* and std::string. Incidentally, for the case of char* you specialise returnSize but for the case of std::string you specialise Serialize and DeSerialize. That’s asymmetrical and means that you have several places that might require changing when new types are added. You should reduce this to a single point which requires specialisation.

Ignoring that, I’d rewrite the code as follows:

#include <algorithm> #include <iostream> #include <string> #include <vector> typedef unsigned char byte_t; typedef std::vector<byte_t> buffer; std::size_t object_size(const char* s) { return std::strlen(s); }; template<typename T> std::size_t object_size(T const& obj) { return sizeof(obj); }; template<typename T> buffer serialize(const T& obj) { std::size_t size = object_size(obj); buffer buf(size); byte_t const* obj_begin = reinterpret_cast<byte_t const*>(&obj); std::copy(obj_begin, obj_begin + size, buf.begin()); return buf; }; template<> buffer serialize<std::string>(std::string const& str) { return serialize(str.c_str()); }; template<typename T> T deserialize(buffer const& buf) { return *reinterpret_cast<const T*>(&buf[0]); } template<> std::string deserialize<std::string>(buffer const& buf) { return deserialize<char*>(buf); } int main() { using std::cout; int x = 97; buffer c = serialize(x); cout << deserialize<int>(c) << "\n"; char g = 'g'; buffer c2 = serialize(g); cout << deserialize<char>(c2) << "\n"; std::string k = "blabla"; buffer c3 = serialize(k.c_str()); cout << deserialize<char*>(c3) << "\n"; std::string ka = "string"; buffer c4 = serialize(ka); cout << deserialize<std::string>(c4) << "\n"; } 

… but like I said, this solution doesn’t generalise and is pretty useless in practice. Have a look at Boost.Serialization for a proper implementation. Unfortunately this is quite a complex problem.

\$\endgroup\$
7
  • 4
    \$\begingroup\$Sorry dont agree with this: It’s general convention in C++ to use underscore_separated_names. I tend to see more C++ code that is camel case. But I also see lots with '_'. Personal I prefer camel case (which is why I also tend to see it more). But you should generally follow the conventions of your current employer (or project).\$\endgroup\$CommentedJan 13, 2013 at 19:48
  • \$\begingroup\$@Loki for local variables? Ugh. But in general I think one should follow the practice established by the standard library of the respective language, otherwise you will have to mix styles. I’d go as far as claiming that underscore-separated will always be the only generally agreed-on convention in C++ because as soon as that’s no longer the case people are obviously no longer using the standard library pervasively in their code and the resulting code won’t be idiomatic C++. It may be better (or whatever) but it’s no longer idiomatic C++.\$\endgroup\$CommentedJan 13, 2013 at 19:59
  • \$\begingroup\$@konrad thank you for this detailed answer, unfortunately I can't use boost serialization since its not a header only library. As for your comment regarding this solution cannot be generalized to serialize objects that contain pointers or references, I agree I wanted to serialize basic types only and each class (to be serialized) will implement its own toString() function that will use the above serialize methods to serialize the basic types of that class. I know it's not cleanest but do I have an alternative?\$\endgroup\$
    – Kam
    CommentedJan 13, 2013 at 20:18
  • \$\begingroup\$@Kam Hmm. I don’t think that “not header only” is a compelling argument, to be honest. Similarly, “cannot use Boost” has no place in C++ nowadays. It’s akin to saying “I cannot use the standard library”. To paraphrase Stroustrup: doing anything in C++ without a library is hard. Use libraries. They’re your friends.\$\endgroup\$CommentedJan 13, 2013 at 20:21
  • \$\begingroup\$@konrad, it's a solution I am trying to develop for the company I work for, and they restrict the use of external binaries :$\$\endgroup\$
    – Kam
    CommentedJan 13, 2013 at 20:28
2
\$\begingroup\$

I'll expand on the portability aspect that wasn't quite explained in the other answer:

Be aware that _tmain(), _TCHAR, and system("PAUSE") are specific to Windows environments, while the first two are also Microsoft extensions that are not considered standard C++.

This can be fixed by changing

  • _tmain() to main()
  • _TCHAR to char
  • system("PAUSE") to something like std::cin.get()

    However, if you can get your IDE to pause automatically, then you won't need your own.

\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.