C++ custom string class implementation

Question

I recently had an interview where I was tasked with implementing a custom C++ string class without using the STL. The interviewer provided the function signatures and variable declarations, specifying that the APIs and parameters were fixed and could not be altered. My responsibility was to complete the function implementations within these constraints. I would greatly appreciate any advice or constructive criticism on my implementation.

class String { public: String() : data(new char[1]{'\0'}), size(0) {} String(const char* str); String(const String& other); String(String&& other) noexcept; ~String(); String& operator=(const String& other); String& operator=(String&& other) noexcept; String operator+(const String& other); String& operator+=(const String& other); char& operator[](std::size_t index); const char& operator[](std::size_t index) const; friend std::ostream& operator<<(std::ostream& os, const String& str); private: char* data; std::size_t size; }; String::String(const char* str) { if (str) { size = std::strlen(str); data = new char[size + 1]; std::strcpy(data, str); } else { data = new char[1]; data[0] = '\0'; size = 0; } } String::String(const String& other) { size = other.size; data = new char[size + 1]; std::strcpy(data, other.data); } String::String(String&& other) noexcept : data(other.data), size(other.size) { other.data = nullptr; other.size = 0; } String::~String() { delete[] data; } String& String::operator=(const String& other) { if (this == &other) { return *this; } delete[] data; size = other.size; data = new char[size + 1]; std::strcpy(data, other.data); return *this; } String& String::operator=(String&& other) noexcept { if (this == &other) { return *this; } delete[] data; data = other.data; size = other.size; other.data = nullptr; other.size = 0; return *this; } String String::operator+(const String& other) { String newStr; newStr.size = size + other.size; newStr.data = new char[newStr.size + 1]; std::strcpy(newStr.data, data); std::strcat(newStr.data, other.data); return newStr; } String& String::operator+=(const String& other) { char* newData = new char[size + other.size + 1]; std::strcpy(newData, data); std::strcat(newData, other.data); delete[] data; data = newData; size += other.size; return *this; } char& String::operator[](std::size_t index) { if (index >= size) { throw std::out_of_range("Index out of range"); } return data[index]; } const char& String::operator[](std::size_t index) const { if (index >= size) { throw std::out_of_range("Index out of range"); } return data[index]; } std::ostream& operator<<(std::ostream& os, const String& str) { os << str.data; return os; }

Looks promising, yet how about adding a method returning the length of the string? — coderodde, CommentedSep 4, 2023 at 8:59
Consider using small string optimization to impress the interviewer. The standard also allocates memory in multiples of two. This has a benefit of making concatination less prone to mallocs. — Ajinkya Kamat, CommentedDec 10, 2023 at 0:07

G. Sliepen · Accepted Answer · 2023-09-04 08:51:33Z

This looks quite good, I see you have thought about most of the corner cases. However, a few were missed:

`new` can throw

When you allocate memory with new, an exception can be thrown if the system is out of memory. In some cases it's fine to not handle this and let this propagate to the caller, but you should only do that if you left your object in a valid state. Consider these lines from the copy assignment operator:

delete[] data; size = other.size; data = new char[size + 1]; // could throw!

If the last line throws an exception, then you have already deleted data, but size is set to other.size. This leaves your String object in an invalid state, since you assume that, apart from right after moves, data is always a non-null pointer. So calling operator[]() will now crash with a segmentation fault due to a null-pointer dereference.

Make sure you handle this case correctly. Ideally, the state of the object is unchanged if an exception is thrown in any operation. So call newfirst, store the result in a temporary pointer, copy the data over, and only then replace data with the pointer to the new storage, and delete the old data.

In C++, strings can contain NUL-bytes

Contrary to C-strings, where a NUL-byte signifies the end of the string, in C++ a std::string can contain as many NUL-bytes as you want. But the C functions strcpy() and strcat() don't know about that, so this could result in less data being copied than needed. You can in fact create such a string by doing:

String hello("Hello, world!"); hello[5] = '\0';

Memory leak in `operator+()`

In operator+() you default-construct newStr. This causes it to allocate one byte. However, you then allocate a new array and assign it to newStr.data without deleting that one byte first.

Performance

Apart from the above correctness issues, there might be some ways to improve performance. First, instead of unconditionally delete[]ing old storage in the assignment operator, consider that you can check if the old storage is big enough to hold the new string. In that case, you don't have to do a new allocation.

Second, it is common in the standard library that operator[] does not do range checks, if you want those you can use the at() member function. However, if they have given you the function prototypes and added noexcept to the move constructor/assignment operator but not to operator[](), then maybe they did intend for you to do the range check.

An empty string optimization could have been done, where you don't allocate memory in the default constructor. You would need to add some checks here and there to handle this, but it would have avoided the unnecessary allocation in operator+() for newStr.

As to the NULL-bytes in C++ string, do I need to use memcpy instead of strcpy? @G. Sliepen — Jacob, CommentedSep 4, 2023 at 18:17
Get used to not using mem functions. All have better generic alternatives in std. — Red.Wave, CommentedDec 9, 2023 at 20:25

Davislor · Accepted Answer · 2023-09-06 15:52:44Z

You Can Add `private` Member Functions and `friend` Functions

At least, I think you can. That doesn’t change the the public API one whit, or the ABI (so long as none of them are virtual) so it ought to be kosher. I would start with constexpr String::swap(String& other) noexcept and constexpr String::String(char* const moved_data, const std::size_t other_size) noexcept.

The value of the swap is that it lets you use the copy-and-swap idiom, which solves many of your exception-safety bugs. This also makes it much easier to write a lock-free, thread-safe implementation.

The value of a private: constructor that sets data and size is that you frequently default-initialize a String object, which allocates one byte of memory, and then never delete[] its memory. You could insert a call to newstr::~String(); between the declaration and the initialization each and every time, but it would be much better to be able to just write:

String newstr(new_data, new_size);

Use the Correct String-Copy Functions

G. Sleipen already brought up that C++ strings can contain '\0' bytes. They didn’t mention that there are several different places where this causes a bug, such as:

String::String(const String& other) { size = other.size; data = new char[size + 1]; std::strcpy(data, other.data); }

This will fail to copy any data after a \0. Everywhere you use std::strcpy in this code, you should be using std::memcpy. This also will have the advantage of being faster.

Prefer Member-Initializer Lists in your Constructors

Code such as

constexpr String(char* const moved_data, const std::size new_size) noexcept : data(moved_data), size(new_size) {}

makes it easy to verify that each data member is initialized once and only once. Compilers are very good at optimizing constructors that have this form. It’s also, in my opinion, elegant. (The one gotcha is that the data members will be set in the order that they are declared within your class definition, not the order in which you write the commands.)

In some cases, this form of initialization would force you to write a helper (or “phi”) function, such as

String::String(const String& other) : data(new_copy(other.data, other.size)), size(other.size) {}

If new_copy is a useful function that you will re-use in several places and simplify your implementations, this is a big win.

Use Your Functions to Implement Each Other

The interviewer is testing you to see if you implement operators like = and + by delegating to simpler building blocks, such as the copy constructor.

Instead of std::memcpy you should recommend std::copy or std::copy_n. — Cris Luengo, CommentedDec 9, 2023 at 23:42

Cris Luengo · Accepted Answer · 2023-12-09 23:52:05Z

Overview:

I found one bug.
Your move constructor puts the object into an invalid state, where you can then potentially invoke undefined behavior.

You have a small memory leak in the operator+.

Couple of places where you are using non-standard techniques.

Check for self assignment.
Not using copy and swap
Missing swap method
Not using initializer lists.

None of it terribly bad. But as a result makes your code longer and thus harder to parse.

Other people have mentioned that you can't assume that a string does not include a null character. So using the C libraries strcpy() and strcat() are not safe. Prefer to use std::copy().

Also using throwing operation when you have put the object into an invalid state is not safe.

Standard technique is:

Do all throwing operations first
Modify state with only non throwing operations.
Release any resources.

Code Review:

Don't see why you need special code here. Use the ability to chain constructors to simplify this and make sure that you have one place that does the allocation:

 String() : data(new char[1]{'\0'}), size(0) {}

I would write like this:

 String() : String(""). // or even nullptr are you support the check // for null in that constructor. {}

Why only output?

 friend std::ostream& operator<<(std::ostream& os, const String& str);

Normally you want to support both input and output!

Sure. That will work.

 char* data; std::size_t size;

But any change in size is now going to result in a resize. Might be nice to be able to remove then add data without reallocating memory all the time.

Note: I say might. You will have to look at the use of your String and see if that is worth the extra variable.

I always prefer to use the initializer list to initialize the variables.

String::String(const char* str) { if (str) { size = std::strlen(str); data = new char[size + 1]; std::strcpy(data, str); } else { data = new char[1]; data[0] = '\0'; size = 0; } }

I would write like this:

String::String(char const* str) : size(str ? strlen(str), 0) , data(new char[size + 1]) // Note you will need to swap the order of the member // variables { std::copy(str, str + size, data); data[size] = '\0'; }

Again I prefer to always use the initializer list.

String::String(const String& other) { size = other.size; data = new char[size + 1]; std::strcpy(data, other.data); }

SO I would write like this:

String::String(const String& other) : size(other.size) , data(new char[size + 1]) { std::copy(other.data, other.data + size + 1, data); }

You can write this simpler using std::exchange.

String::String(String&& other) noexcept : data(other.data), size(other.size) { other.data = nullptr; other.size = 0; }

Write like this:

String::String(String&& other) noexcept : size(std::exchange(other.size, 0)) , data(std::exchange(other.data, nullptr)) {}

I would point out that you just put the other object into a state that has a nullptr. This is not done in any of your constructors. I would assume that this is not a valid state for a normal object. This looks like a bug to me!

If I did this:

String one("One"); String sizeOne = std::move(One); char last = one[0]; // Would you expect that to crash?

I would consider this bad practice:

 if (this == &other) { return *this; }

Yes; you do have to write in a way that works for self assignment. But this is self pessimizing code. The case that practically never happens is probably going to cost extra.

The standard technique has been, for many years now, a technique called copy and swap. This does not use a test for self assignment.

There is a standard technique for writing the copy and move assignment operator as a single function.

String& String::operator=(const String& other) { if (this == &other) { return *this; } delete[] data; size = other.size; data = new char[size + 1]; // Throwing operation. // done after modifying state. // object not in a consistent state. std::strcpy(data, other.data); return *this; } String& String::operator=(String&& other) noexcept { if (this == &other) { return *this; } delete[] data; // Not an issue with char // But you can't assume that // delete is always safe to call (think about this situation if you were building a vector rather than string). data = other.data; size = other.size; other.data = nullptr; other.size = 0; return *this; }

I would write like this (supports copy and move)

// Notice parameter is passed by value. // L-Value input is copied. // R-Value input is moved. String& String::operator=(String other) noexcept { swap(other); return *this; }

Note: If you use assignment with an L-Value it is copied into the parameter other then state is swapped. This is the same cost as you have above. If you do assignment with R-Value then it is moved into the parameter (no copy) then state is swapped. This is the same cost as you have above. You do have to write swap() but that is relatively trivial.

It is common to write operator+ in terms of operator+=:

String String::operator+(const String& other) { String newStr; // Note this allocated memory. // You are overwritting "data" // Without releasing this memory. // So you have a memory leak. newStr.size = size + other.size; newStr.data = new char[newStr.size + 1]; std::strcpy(newStr.data, data); std::strcat(newStr.data, other.data); return newStr; }

I would simplify this to:

String String::operator+(const String& other) { // Note: Your class is not implemented in a way // That makes this efficient. But with a tiny bit of work. // This can be made to be just as efficient. String result(*this); return result += other; }

In std::string the operator[] is not checked. It is assumed you have pre-checked. This is to optimize for speed. The std::string has a seprate method that has a checked bounds check at().

char& String::operator[](std::size_t index) { if (index >= size) { throw std::out_of_range("Index out of range"); } return data[index]; }

Consider the following:

1: You update your code to support either size() or begin() and end(). In this case you can loop over the code:

for (int loop = 0; loop < string.size(); ++loop) { std::cout << string[loop] << "n"; }

Here we know that loop is always inside the correct bounds and will never throw. So why am I paying the price of checking the loop bounds on every access to operator[] when I have already guaranteed that accesses will be within bounds.

Lets simplify this to a single line:

std::ostream& operator<<(std::ostream& os, const String& str) { // Note: You are ssuming that there are no null characters in this string. os << str.data; return os; }

I would write like this:

std::ostream& operator<<(std::ostream& os, String const& str) { return os << std::string_view(str.data, str.size); }

Red.Wave · Accepted Answer · 2023-12-11 20:53:21Z

The interface of your class is fine: you are following the rule 5; that means you have the 5 big special functions:

Destructor
Copy constructor
Copy assignment operator
Move constructor
Move assignment operator

The general rule of thumb is rule 035: either define none of the above, or define the 1st 3, or define all 5. In corner cases you can define #1 and either/both of #4 or/and #5, where the compiler implicitly sets #2 and #3 to =delete. If you diviate from rule 035, then your code eventually hits one of the major bugs:

Resource/memory leak
Double delete
Use after free

Now rule 5 which you have chosen is about performance optimization. If you just wanted correctness, you could stick to rule 3 and implement it with copy/swap idiom. Since you have chosen to have a performance optimized string class, you should have considered removing excessive alloc/dealloc in your assignment operators:

You should've introduced a private capacity member to store current available bytes and assert that always capacity>=size.
When assigning(in both add & copy), if the destination capacity is larger than source size, you can skip alloc/dealloc and copy bytes, followed by std::fill_n to zero.
In move assignment, you can select between source and destination buffer to keep, based on capacities: either swap buffer pointers, or std::copy_n destination from source.

Finally since you want to keep things simple, and custom allocations are not part of the picture, you can avoid new/delete and use std::free for deallocation. C function strdup creates a duplicate of its operand in ram and returns the start address. The safer C function strndup takes a maximum byte count as its second argument. Both of them are C API functions; they don't throw and their results can be checked against nullptr. I would also go ahead and define a char array accepting constructor too:

template<std::size_t N> String::String(char const (&arr)[N]); //I skipped the definition

This constructor prevents string literals inputs from decaying to pointers; it grabs them by reference and has to duplicate them(like the char pointer overload), but gets the size information instantly from N: if last character is not null N + 1, other was just N. I would also mark the pointer overload as explicit and give it a defaulted count argument:

explicit String::String(char const* begin, std::size_t count = ~0ull); //I skipped the definition

Now I have a more generic constructor that deals with char arrays created at runtime(literal strings are copied via the template constructor above).

Sorry for lack of code snippets; This is not about reviewing my implementation after all.

==================================

To anyone reading this:

I made a terrible mistake of introducing strdup and strndup as std C++ functions; they seem to be not. But I like them. If you plan on to work on POSIX compliant systems, they can be good choices.

When did std::strdup() and std::strndup() get added to C++? I thought those were C only, so far (added in C23). Certainly the latest draft standard doesn't mention either function. — Toby Speight, CommentedDec 11, 2023 at 9:56
@TobySpeight fair point. I have seen and used strdup long ago. It stroke me to find their not std. Thank you. I have to edit the post sometime later. — Red.Wave, CommentedDec 11, 2023 at 20:41

Stack Exchange Network

C++ custom string class implementation

4 Answers 4

`new` can throw

In C++, strings can contain NUL-bytes

Memory leak in `operator+()`

Performance

You Can Add `private` Member Functions and `friend` Functions

Use the Correct String-Copy Functions

Prefer Member-Initializer Lists in your Constructors

Use Your Functions to Implement Each Other

Overview:

Code Review:

Hot Network Questions

C++ custom string class implementation

4 Answers 4

new can throw

In C++, strings can contain NUL-bytes

Memory leak in operator+()

Performance

You Can Add private Member Functions and friend Functions

Use the Correct String-Copy Functions

Prefer Member-Initializer Lists in your Constructors

Use Your Functions to Implement Each Other

Overview:

Code Review:

Related

Hot Network Questions

`new` can throw

Memory leak in `operator+()`

You Can Add `private` Member Functions and `friend` Functions