Computing the first n primes: an example from Clean Code revised for C++

Question

I'm trying to improve on examples from Clean Code* while re-implementing them in C++. This time it's the Sieve of Eratosthenes prime-computing example from pp. 71-74.

Below is the original adapted for C++, without improvements:

.h

class PrimeGenerator { public: PrimeGenerator() = default; ~PrimeGenerator() = default; static std::vector<unsigned> generatePrimes(unsigned maxValue); private: static void uncrossIntegersUpTo(unsigned maxValue); static void crossOutMultiples(); static unsigned determineIterationLimit(); static void crossOutMultiplesOf(unsigned i); static bool notCrossed(unsigned i); static void putUncrossedIntegersIntoResult(); static unsigned numberOfUncrossedIntegers(); static std::vector<bool> crossedOut; static std::vector<unsigned> result; };

.cpp

std::vector<bool> PrimeGenerator::crossedOut; std::vector<unsigned> PrimeGenerator::result; std::vector<unsigned> PrimeGenerator::generatePrimes(unsigned maxValue) { if (maxValue < 2) return {}; uncrossIntegersUpTo(maxValue); crossOutMultiples(); putUncrossedIntegersIntoResult(); return result; } void PrimeGenerator::uncrossIntegersUpTo(unsigned maxValue) { crossedOut = std::vector<bool>(maxValue + 1, false); crossedOut[0] = true; crossedOut[1] = true; } void PrimeGenerator::crossOutMultiples() { unsigned limit = determineIterationLimit(); for (size_t i = 2; i <= limit; ++i) { if (notCrossed(i)) crossOutMultiplesOf(i); } } unsigned PrimeGenerator::determineIterationLimit() { // Every multiple in the array has a prime factor that // is less than or equal to the root of the array size, // so we don't have to cross out multiples of numbers // larger than that root. double iterationLimit = std::sqrt(crossedOut.size()); return static_cast<unsigned>(iterationLimit); } void PrimeGenerator::crossOutMultiplesOf(unsigned i) { for (size_t multiple = 2 * i; multiple < crossedOut.size(); multiple += i) { crossedOut[multiple] = true; } } bool PrimeGenerator::notCrossed(unsigned i) { return !crossedOut[i]; } void PrimeGenerator::putUncrossedIntegersIntoResult() { result = std::vector<unsigned>(numberOfUncrossedIntegers()); size_t j = 0; for (size_t i = 2; i < crossedOut.size(); ++i) { if (notCrossed(i)) result[j++] = i; } } unsigned PrimeGenerator::numberOfUncrossedIntegers() { unsigned count = 0; for (size_t i = 2; i < crossedOut.size(); ++i) { if (notCrossed(i)) count++; } return count; }

What we see here is a static class with static functions and members. We don't like these in C++, so it seems this code could be better served with a namespace and some free functions. Let's try - my improvement attempt coming below.

.h

namespace PrimeGenerator { std::vector<unsigned> generatePrimes(unsigned maxValue); }

.cpp

namespace { std::vector<bool> uncrossIntegersUpTo(int maxValue) { std::vector<bool> crossedOut(maxValue + 1, false); crossedOut[0] = true; crossedOut[1] = true; return crossedOut; } unsigned determineIterationLimit(size_t size) { // Every multiple in the array has a prime factor that // is less than or equal to the root of the array size, // so we don't have to cross out multiples of numbers // larger than that root. double iterationLimit = std::sqrt(size); return static_cast<unsigned>(iterationLimit); } void crossOutMultiplesOf(unsigned i, std::vector<bool>& crossedOut) { for (size_t multiple = 2 * i; multiple < crossedOut.size(); multiple += i) { crossedOut[multiple] = true; } } void crossOutMultiples(std::vector<bool>& crossedOut) { unsigned limit = determineIterationLimit(crossedOut.size()); for (size_t i = 2; i <= limit; ++i) { if (!crossedOut[i]) crossOutMultiplesOf(i, crossedOut); } } std::vector<unsigned> putUncrossedIntegersIntoResult(const std::vector<bool>& crossedOut) { std::vector<unsigned> result; for (size_t i = 2; i < crossedOut.size(); ++i) { if (!crossedOut[i]) result.push_back(i); } return result; } } namespace PrimeGenerator { std::vector<unsigned> generatePrimes(unsigned maxValue) { if (maxValue < 2) return {}; auto crossedOut = uncrossIntegersUpTo(maxValue); crossOutMultiples(crossedOut); return putUncrossedIntegersIntoResult(crossedOut); } }

A quick summary of changes:
- I removed the class, leaving a single interface function in a PrimeGenerator namespace.
- The numberOfUncrossedIntegers() function didn't seem to make much sense, so I refactored putUncrossedIntegersIntoResult(...) to get rid of the former.
- notCrossed(...) would now need to have two parameters, so it stopped making sense either. It's gone now.

Now, I have two questions about my code. First of all, we now need to pass the crossedOut vector around, which is a downside compared to the previous design. Would you propose an alternative solution to mitigate this? Secondly, are there any extra places where I should have used size_t instead of unsigned?

Cheers!

EDIT:
I'd like to stress I care more about good software engineering and coding style here rather than making this algorithm as fast as possible. Though of course it should be correct.

* Clean Code: A Handbook of Agile Software Craftsmanship, Robert C. Martin

L. F. · Accepted Answer · 2019-08-21 09:27:34Z

In general, we try to keep functions small and nice, but in your case, using so many functions is just weird. For example, are you sure the uncrossedIntegersUpTo function is needed at all? How about determineIterationLimit? You can just handle them in the main function.

Also, std::vector<bool> is not like the normal vector. It is not a container because it (usually) packs the bools together to save space. This may result in a significant increase in runtime. (See Howard Hinnant's On vector<bool>) This is definitely a big mistake, but there is no trivial backwards compatible way to fix it at this stage. You may have to consider working around it with, say, std::vector<char> in this case.

Now let's talk about types. First, it is std::size_t, not size_t. Second, your use of unsigned everywhere is like a "magic type" (analogous to magic numbers) — it will help to define an alias like

using number_t = unsigned;

And I think something like std::uint_fast32_t will be better in this case. Also, std::size_t operates on sizes. Are you sure you want to use it for numbers?

std::sqrt operates on floating point numbers and may cause precision problems here. You may want to design some isqrt function for integers.

Putting these together, your code may be as simple as something like: (not tested)

// generate primes less than max std::vector<number_t> generate_primes(number_t max) { if (max < 2) return {}; // You may need to use char or something like that std::vector<bool> table(max, false); // crossed out numbers table[0] = table[1] = true; const number_t limit = isqrt(max); // like that for (number_t i = 2; i < limit; ++i) { if (!table[i]) { for (number_t j = i * 2; j < max; ++j) table[j] = true; } } std::vector<number_t> result; for (number_t i = 2; i < max; ++i) { if (!table[i]) result.push_back(i); } return result; }

Thanks for the valuable input! There are points I agree with, and some I disagree with, as in life :) Ad. Small functions. I do like small functions, and I don't think "it's just weird" is enough of a justification not to write them. Another thing I don't like are raw loops (S. Parent's talk on this is highly recommendable: youtu.be/W2tWOdzgXHA), so I'd insist on keeping crossOutMultiples along with putUncrossedIntegersIntoResult (but rename it). I do think you've got a point, though, and I'd include crossOutMultiplesOf in the former plus get rid of determineIterationLimit. — SG_90, CommentedAug 23, 2019 at 9:39
Ad. std::vector<bool>. I had a vague idea there was something wrong with it, but didn't know exactly what. Thanks! Ad. size_t vs std::size_t. Oh right, size_t actually comes from the C compatibility header. True, we should prefer the namespaced one, though they're essentially the same, I think. std::size_t is not as convenient, however ;) Still, you're right. Ad. using number_t. That's an interesting idea, thanks. Where would you use size_t, then? — SG_90, CommentedAug 23, 2019 at 9:40
Ad. std::uint_fast32_t. I didn't know about these types. Cool, thanks. Good for when you care about maximally optimising your algorithms for speed. Ad. isqrt. I'm wondering if this is really necessary. I might be missing something, though, so could you clarify at what point do you see this becoming a problem? Overall, thanks for the comments, I learned something. If I may give you some feedback, then I missed some arguments justifying your suggestions, such as with writing short functions. — SG_90, CommentedAug 23, 2019 at 9:43
I also have one comment to your own solution. Naming a vector or an array a 'table' doesn't really describe what it's for. crossedOut, which is not my idea, accomplishes this far better. Cheers! — SG_90, CommentedAug 23, 2019 at 9:43
@SG_90 There is nothing wrong with raw loops whatsoever. What’s wrong is loops that really aren’t logically loops (e.g., zeroing a range). std::size_t is used when you are dealing with size, e.g., as the size of an array. The fast family of types are not directly related to speed ... There is no objective way to judge which type is “fast,” and I just consider them as the “natural” type to use. The alternative is the least family, which may actually incur performance overhead. And std::sqrt may compute the square root of large numbers wrongly, and may be inefficient. — L. F., CommentedAug 23, 2019 at 14:50

vnp · Accepted Answer · 2019-08-21 16:09:55Z

Starting cross-out from size_t multiple = 2 * i is a serious pessimization. Every multiple of i with other factor less than i has already been crossed out during previous passes, which dealt with these smaller primes. You should start with size_t multiple = i * i.
Notice that computing i * i eliminates the need to compute sqrt. The outer loop written as
```
for (size_t i = 2; i * i < crossedOut.size(); ++i) 
```
terminates in a very natural way. It may also be beneficial to store the square in a variable:
```
for (size_t i = 2; (square = i * i) < crossedOut.size(); ++i) { crossOutMultiplesOf(square, i, crossedOut); } 
```
Now, passing both i and its square seems redundant. OTOH not passing the square forces the callee to recompute it. This is a very rare situation in which I'd advocate against factoring the loop into a function. Consider
```
for (size_t i = 2; (square = i * i) < crossedOut.size(); ++i) { for (size_t multiple = square; multiple < crossedOut.size(); multiple += i) { crossedOut[multiple] = true; } } 
```

Stack Exchange Network

Computing the first n primes: an example from Clean Code revised for C++

2 Answers 2

Hot Network Questions

Computing the first n primes: an example from Clean Code revised for C++

2 Answers 2

Related

Hot Network Questions