Dynamic array implementation using C++

Question

I implemented a basic dynamic array using C++.

When deleting an element and shifting the others, the last element will be twice repeated (the original still exists after copying). I want to avoid memory leaking by deallocating the last element of the array. Currently, I do this by shrinking once after a number of deletions. Are there better solutions?

Finally, I am interested in some feedback on this implementation.

#include <iostream> #include <stdexcept> using namespace std; class DynamicArray { protected: int size; int* arr; int length; void grow() { copyToNewSize(size * 2); } void shrink() { copyToNewSize(size / 2); } void copyToNewSize(int newSize) { int* temp = new int[newSize]; if (newSize > size) for (int i = 0; i < size; i++) { temp[i] = arr[i]; } else for (int i = 0; i < newSize; i++) { temp[i] = arr[i]; } delete[] arr; size = newSize; arr = temp; } void checkIndex(int index) { if (index >= length || index < 0) throw "Index is out of the array range!"; } public: DynamicArray(int startingSize) { size = startingSize; arr = new int[size]; length = 0; } ~DynamicArray() { delete[] arr; } int push(int value) { length++; if (length == (size + 1)) grow(); arr[length - 1] = value; return length; } int pop() { length--; int value = arr[length]; // memory leaking , delete arr[length] if (length <= size / 2) shrink(); return value; } int insert(int index, int value) { checkIndex(index); length++; if (length == (size + 1)) grow(); for (int i = length - 1; i > index; i--) { arr[i] = arr[i - 1]; } arr[index] = value; return length; } int remove(int index) { checkIndex(index); int value = arr[index]; length--; for (int i = index; i < length; i++) arr[i] = arr[i + 1]; // memory leaking , delete arr[length] if (length <= (size / 2)) shrink(); return value; } int get(int index) { checkIndex(index); return arr[index]; } void set(int index, int value) { checkIndex(index); arr[index] = value; } int search(int value) { for (int i = 0; i < length; i++) { if (arr[i] == value) return i; } return -1; } int getLength() { return length; } }

Do you have any specific reason for not using std::vector<int> or one of the other standard containers? — jdt, CommentedSep 20, 2021 at 11:54

Serge Ballesta · Accepted Answer · 2021-09-20 08:33:17Z

First of all, there is a major problem with your class: it does not respect the rule of three/five/zero:

it has a non trivial destructor that destroys memory allocated in constructor

it neither defines nor deletes the copy constructor and assignment operator Let us look at the following code:

 DynamicArray dynarr[16]; // initialize values in dynarr if (...) { // opens a nested block DynamicArray dynarr2 = dynarr; // copy construct the new DynamicArray (1) // use it } // dynarr2 goes out of scope (2)

At (1), the compiler generated copy constructor will blindly copy the member, having dynarr and dynarr2 share their inner int array.

At (2), the dtor for dynarr2 will delete the shared inner array: ZAP!...dynarr now has a dangling pointer, Undefined Behaviour is ensured from there....

There is another possible error in size management. You consistently divide it by 2 when removing elements and multiply it by 2 when adding. Fine. Except that if you let the size reach 0 by removing the last element of a DynamicArray 0*2 is still 0 so the size will be definitively stuck at 0! You must handle that corner case by preventing the size to reach 0 in shrink.

What follow are only minor possible improvements.

Your code uses using namespace std;
This is not an error because it is allowed by the language, and is even common in many tutorials. It is a nice trick that allows to use all the goodies from the standard library without the std:: prefix. But, it does import a number of useless symbols into the current namespace, somehow defeating the whole namespace concepts. In production code, best practices recommend to never use using namespace std but only import the relevant symbols. Only a cosmetic improvement.
in checkIndex, you use throw "Index is out of the array range!";
This actually throws a pointer to a string litteral. Here again it is not an error since it is allowed by the language. But best practices recomment to only throw objects of (subclasses of) the std::exception class. Users of your class will probably not expect to have to catch a char pointer. You'd better use an out_of_range or a runtime_error exception if you think it is not worth declaring a custom exception
You compare size and newsize in copyToNewSize to know how many elements should be copied. But in fact you have only to copy length elements... This one is only a tiny optimization.

For your initial question, there is nothing bad in only shrinking after a number of removals, because a shrink involve reallocation and a copy of the elements, so it is an expensive operation. For that reason, you should even considere stop shrinking below a threshold (for example 8 elements). As a bonus, it will directly prevent size to reach 0, and you could even make that minimal size an optional parameter of your constructor.

and you could even make that minimal size an optional parameter of your constructor. - Then you'd have to store it somewhere. That sounds unlikely to be good, although if you're implementing this for real (rather than a learning exercise) then making it different from std::vector is a good idea, otherwise you'd just use std::vector. I'd be more inclined to make a tuning option like that an optional template parameter, or possibly a static member variable. — Peter Cordes, CommentedSep 21, 2021 at 3:15
@PeterCordes: I generally avoid integer templates, because they are a nightmare in a library: either you put the full definition in headers, or you have to instantiate a number of concrete classes and users will be restricted to them. I have never found a correct implementation of multidimensional arrays in C++ because of that (inner dimensions can only be constexpr). — Serge Ballesta, CommentedSep 21, 2021 at 7:29

jdt · Accepted Answer · 2021-09-20 18:39:48Z

6

Consider using size_t instead of int. The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to problems, particularly as 64-bit architectures become more prevalent.
It is generally bad practice to repeat code that does the same thing. Consider changing copyToNewSize to something like this:

void copyToNewSize(size_t newSize) { int* temp = new int[newSize]; size_t copySize = std::min(size, newSize); for (size_t i = 0; i < copySize; i++) { temp[i] = arr[i]; } delete[] arr; size = newSize; arr = temp; }

I would suggest that you rename size to allocatedSize to avoid confusion between size and length.

edited Sep 20, 2021 at 18:39

answered Sep 20, 2021 at 17:52

jdt

2,4175 silver badges21 bronze badges

\$\begingroup\$64-bit architectures are already widespread; what's still left to change is that real-world memory capacities continue to grow, so it's becoming more realistic (and maybe less rare in real-world usage) to actually have arrays that large.\$\endgroup\$
– Peter Cordes
CommentedSep 21, 2021 at 3:17
\$\begingroup\$Also, std::copy exists, can be a good idea to use it instead of open-coding a copy loop.\$\endgroup\$
– Peter Cordes
CommentedSep 21, 2021 at 3:19
\$\begingroup\$#pragma omp parallel for on a copy loop? Yeah, could be for very large copies (like maybe more than 4GiB), on systems like a big Xeon where a single core can't saturate memory-controller bandwidth. (Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?). If you have so many large copies to do that it's worth considering threads just for that, though, probably algorithmic optimizations to avoid copying so much will be even better. Or see my comments on mdfst13's answer re: Linux mremap to avoid physical copying.\$\endgroup\$
– Peter Cordes
CommentedSep 21, 2021 at 12:02
\$\begingroup\$(Or if other cores can be doing something useful while some are just copying, that might be good for overall throughput, unless multiple other threads are waiting for this copy to complete. On some CPUs there are diminishing returns for using even more threads to max out copy bandwidth, although on current Skylake-Xeon it's fairly linear for quite a few cores because the per-core memory bandwidth is so small (compared to a desktop and also as a fraction of the available aggregate memory-controller bandwidth))\$\endgroup\$
– Peter Cordes
CommentedSep 21, 2021 at 12:05
\$\begingroup\$You forgot to add any compiler flags on TIO, so that was gcc -O0. If you just meant to link the code but not try to run it on the server, godbolt.org/z/jW91snGdj lets you look at the asm. (using gcc -O3 -fopenmp. Without -fopenmp, #pragma omp does nothing.)\$\endgroup\$
– Peter Cordes
CommentedSep 21, 2021 at 16:20

| Show 6 more comments

mdfst13 · Accepted Answer · 2021-09-21 15:27:39Z

Don't talk like a pirate

I would prefer the name array (another reason not to using namespace std;) to arr. And I'd prefer a more descriptive name like data or numbers to either.

Don't waste work

 int push(int value) { length++; if (length == (size + 1)) grow(); data[length - 1] = value; return length; }

You add and subtract three times. Consider

 int push(int value) { if (length == size) { grow(); } data[length] = value; length++; return length; }

That adds once (the increment).

In general, I would avoid the statement form of control structures. That can lead to odd, hard to find bugs. Worse, you use it inconsistently, which makes the code harder to read.

Fragile code

 if (length <= size / 2) shrink();

and

 void shrink() { copyToNewSize(size / 2); }

Combined these are fragile. To work properly, you need size / 2 to be the same in both places. Consider

 size_t bound = size / 2; if (length <= bound) { copyToNewSize(bound); }

Now it has to be the same in both places. Another alternative

 void shrinkIfNecessary() { size_t bound = size / 2; if (length <= bound) { copyToNewSize(bound); } }

Then replace the if and call to shrink with shrinkIfNecessary.

In general, you should try to avoid parallel logic. If you need the same logic in two places, try to abstract it out into at least a variable (e.g. bound here) or a function (e.g. shrinkIfNecessary).

It's natural to think that grow should have a shrink. But here, the shrinking operation is triggered differently. They aren't mirror images of each other. You have to grow any time you exceed the bounds. You don't need to shrink every time. So it actually makes sense to have grow and shrinkIfNecessary instead.

You also might consider moving the 2 to a constant, e.g. GROWTH_FACTOR. But that will work fine even if inconsistent between growing and shrinking, so it's less important.

`std::copy`

 if (newSize > size) for (int i = 0; i < size; i++) { temp[i] = arr[i]; } else for (int i = 0; i < newSize; i++) { temp[i] = arr[i]; }

In this, you copy manually. But you don't need to do that. C++ has std::copy. So you could just say

 std::copy(arr, arr + std::min(size, newSize), temp);

The std::min was already posted here.

Note that std::copy requires #include <algorithm>.

This would get around the entire question of int vs. size_t vs. auto as regards to the loop index variables. No more loops means no more loop indexes.

The C++ way

C++ also has std::vector, which would get around the entire class.

The C way

If you prefer to reinvent the wheel, you might consider if you want to go all the way and return to the C memory management. Why C? Because in C, you have realloc, which can make arrays smaller or larger. Usually when making an array smaller, it would use the same memory and free just what is at the end. C++ doesn't have that capability (except in that C++ has access to the C routines). Of course, if you do that, you want to stop using new and delete in favor of calloc/malloc and free.

The realloc function also handles the copying for you if necessary. But it won't always be necessary, because realloc will usually reuse the existing array when making it smaller. And it will sometimes use the existing array when making it bigger as well.

Duplicates

When deleting an element and shifting the others, the last element will be twice repeated (the original still exists after copying).

In one sense, this is harmless. If you track the size correctly and initialize the memory before using, this might be true, but you'd never know.

If you are concerned about leaving traces in your program, note that realloc can make an array smaller efficiently enough that you could call it on every pop or remove. However, you'd still likely prefer to grow less frequently. Because at least some of the time, it would cause the array to need to be copied.

You might also consider manually clearing the entry. E.g. in remove:

 data[length] = 0;

That's the kind of thing you'd do if the data were sensitive for some reason.

std::vector unfortunately can't use realloc because it has to work on non-trivially-copyable types (i.e. whose object-representation may need to change when copied to a new address, or just whose copy-constructor has a side-effect such as debug logging.). But even more unfortunately, not even in a template specialization for trivially-copyable types is easy: C++ has replaceable new, and there's a guarantee that if std::vector allocates, it's via calls to new, so user-defined new will run. This replacement might be in a different compilation unit so you can't template on it. — Peter Cordes, CommentedSep 21, 2021 at 10:16
IDK why C++ has refused to add a try_realloc allocator interface that could avoid copying, but it makes std::vector pretty garbage for huge arrays, especially compared to what it could be doing on Linux where the mremap(2) system call does this at page granularity, trying to extend an existing mapping. And with MREMAP_MAYMOVE can always avoid copying by remapping the existing physical pages to the start of a larger extend of virtual address space if there were wasn't room to grow at the original spot. (Still TLB misses though.) — Peter Cordes, CommentedSep 21, 2021 at 10:22
So basically C++ requires std::vector to suck at clever reallocation optimizations, although if an implementation could detect that new wasn't replaced (or promise via a command line arg), the as-if rule would let std::vector use smarter realloc. Howard Hinnant said on SO) that his attempts to improve things for C++11 didn't get traction. IMO adding a try_realloc interface to the language would have been easy; implementations could always define it as new + copy + delete or return false. — Peter Cordes, CommentedSep 21, 2021 at 10:29
So yeah, writing your own container that used realloc or even mmap / mremap (intended for large allocations where bypassing a free-list makes sense) could be a case where it's actually interesting for real use, not just as a learning exercise. — Peter Cordes, CommentedSep 21, 2021 at 10:31
@PeterCordes std::vector is a template (and so is std::allocator), templates can have specialisations. std::vector<TriviablyCopyable>can use realloc — Caleth, CommentedSep 21, 2021 at 12:39

Stack Exchange Network

Dynamic array implementation using C++

3 Answers 3

Don't talk like a pirate

Don't waste work

Fragile code

`std::copy`

The C++ way

The C way

Duplicates

Hot Network Questions

Dynamic array implementation using C++

3 Answers 3

Don't talk like a pirate

Don't waste work

Fragile code

std::copy

The C++ way

The C way

Duplicates

Related

Hot Network Questions

`std::copy`