
This program tried to solve LeetCode problem 3366: Minimum Array Sum.
It fails for odd k when op1+op2 is high enough that there are more odd values to optionally subtract k from than subtractions to allot.

A Python iteration of the chosen approach got annoyingly verbose when using partition by value and rank - open-coded, because I wouldn't know which implementation to take or use.

I thought C++&STL provided "everything" I wanted to use with the introduction of ranges - it seems I ended up not using any.

I used an "AI" web service to get an idea of what equivalent "modern C++" code might look like, tinkering with that and web "IDE"s.
Pretty much none of that code remains, as I went from the circumstantial (see above) approach of sorting all the eligible values (and abusing sorting once more where order has potentially been messed up) to the partitioning as needed intended from the outset.

/** leetcode problem 3366: Minimum Array Sum
 * given a list nums of non-negative integers
 * and three non-negative integers k, op1, and op2
 * return the minimum total sum attainable using
 * (each at most once per index):
 * - up to op1 halvings rounding up
 * - up to op2 subtractions of k not resulting in a negative number
 */
/*
class Solution {
public:
    int minArraySum(vector<int>& nums, int k, int op1, int op2) {
        return minimum_array_sum(nums, k, op1, op2);
    }
};
*/
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>

static long int sum_halves(auto first, auto exclude)
{
    return std::transform_reduce(first, exclude, 0L, std::plus<long>(),
                                 [](auto v) { return v / 2L; });
}

// sum the top k halves from [first, exclude)
// side effect: permutes [first, exclude)
// XXX useful behaviour with too few values?
static long int sum_top_halves(auto first, auto exclude, int k)
{
    auto top = exclude - k;
    if (top <= first) {
        top = first;
    } else {
        std::nth_element(first, top, exclude);
    }
    return sum_halves(top, exclude);
}

/** return the minimum total sum attainable using (each at most once per index):
 * - up to op1 halvings rounding up
 * - up to op2 subtractions of k not resulting in a negative number
 */
static long minimum_array_sum(const std::vector<int>& nums, int k, int op1, int op2)
{
    long total = std::reduce(nums.cbegin(), nums.cend(), 0L);
    // no-brainers:
    if (k <= 0) { // don't subtract values not reducing sum
        op2 = 0;
        k = 0;
    }
    if (op1 <= 0 && op2 <= 0) // no operations with any effect
        return total;
    int min_reducible = (k == 1) ? 1 : (op1 > 0 ? 2 : k);
    std::vector<int> partitioned;
    std::ranges::copy_if(nums.cbegin(), nums.cend(), std::back_inserter(partitioned),
                         [min_reducible](auto v){ return min_reducible <= v; });
    long n = partitioned.size();
    const auto begin = partitioned.begin(), end = partitioned.end();
    op1 = std::min((long)op1, n);
    if (op2 <= 0) { // no subtractions, but halvings needs half a brain:
        // just halve up to op1 greatest values
        return total - sum_top_halves(begin, end, op1);
    }
    auto first_big = std::partition(begin, end, [k](auto v){ return v < k; });
    long big = first_big - begin;
    op2 = std::min((long)op2, n - big);
    total -= k * op2;
    if (op1 <= 0)
        return total;
    auto untouched = big - op2;
    if (op1 <= untouched)
        return total - sum_top_halves(first_big, end, op1);
    // more than untouched values huge enough to halve before subtraction?
    auto huge = end - std::partition(first_big, end,
                                     [k](auto v){ return v < 2 * k - 1; });
    long halvables = std::max(untouched, huge);
    auto halve_first = std::min(halvables, (long) op1);
    total -= sum_top_halves(first_big, end, halve_first);
    op1 -= halve_first;
    if (op1 <= 0)
        return total;
    for (int i = n - big, beyond = n - halve_first ; i < beyond ; i++)
        partitioned[i] -= k;
    return total - sum_top_halves(begin, end - halve_first, op1);
}

int main()
{
    auto t1 = std::vector<int>{2, 8, 3, 19, 3},
         t2 = std::vector<int>{2, 4, 3};
    std::cout << minimum_array_sum(t1, 3, 1, 1) << '\n';      // 23
    std::cout << minimum_array_sum(t2, 3, 2, 1) << std::endl; // 3
    return 0;
}

Would the benefits of more const correctness outweigh impaired(?) readability?
Would this approach benefit significantly from using ranges?
Should I convert the for-loop to an algorithm? for_each?
Should the first, exclude parameters to sum_halves() explicitly be references? (How?)

  • I haven't used C++ in earnest in decades - not even problem/challenge style&level: I don't trust my intuitions.
    – greybeard
    Commented Dec 4, 2024 at 15:03
  • The other browser remembers I visited CodeConvert.ai's Online Python to C++ Converter. What impressed me considerably was the differences natural language Additional instructions made.
    – greybeard
    Commented Dec 4, 2024 at 15:45
  • (Out of curiosity, I just exercised the generate code feature from the above site with my terse function specification. 1st attempt was slow&slightly wrong. Tweaking to up to op1 halvings, rounding the result up yielded a very aloof approach, if "combinatorially slow". It looks picture perfect, but has a gross error to the right of the 80th column. Each error looks trivial to fix)
    – greybeard
    Commented Dec 6, 2024 at 13:11
  • (Curious why these particular batteries wouldn't be included with Python, I had a search engine find me numpy.*partition (equivalent to C++ std::nth_element()). No such luck with std::partition().)
    – greybeard
    Commented Dec 9, 2024 at 8:14

2 Answers

Notes

I thought C++&STL provided "everything" I wanted to use with the introduction of ranges - it seems I ended up not using any.

It will almost never be the case that the standard library provides everything you need, for anything.

The C++ standard library (not the “STL”; that is something else) is intentionally very minimal, providing only the basic building blocks. It is possible, and not even all that hard, to raw-dog many projects using only the standard library as a dependency. But that’s not really the smart way to build a real, non-trivial application.

Would the benefits of more const correctness outweigh impaired(?) readability?

That is for you to decide, or for whoever it is that decides how the different benefits are weighed.

More const will almost never improve the execution speed or memory efficiency of a program. But it can vastly improve the safety of a codebase. If you don’t care about safety, then there probably won’t be much benefit from const for you.

But if you don’t care about safety, don’t expect anyone else to be keen to run your code.
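As one hedged illustration of that safety argument, not taken from the question's code: a helper that takes its input by reference-to-const both documents that the caller's data is untouched and turns an accidental write into a compile error (sum_by_const_ref is a made-up name for this sketch).

```cpp
#include <numeric>
#include <vector>

// Taking nums by reference-to-const documents that the function only
// reads the data; an accidental write becomes a compile-time error.
long sum_by_const_ref(const std::vector<int>& nums)
{
    // nums[0] = 0;  // would not compile: nums is const
    return std::reduce(nums.cbegin(), nums.cend(), 0L);
}
```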

Would this approach benefit significantly from using ranges?

Again, “benefit” is subjective.

Ranges generally won’t make (well-written) code any faster or more memory-efficient. They may not even improve the legibility, or the clarity of the algorithm. In fact, in many cases, ranges may actually be worse for some of those metrics.

The power of ranges is that it makes it easy to write code that is “correct by construction”. Yes, just as with const, ranges is about improving safety… not efficiency.

For example, let’s say you want to read all the words in a dictionary file, select every word that you have all the letter game tiles for, compute the scores for those words, reject any words with a score below a minimum threshold, and save the results in a vector. With ranges, that might look like:

auto words = std::views::istream<std::string>(dictionary_file)
    | std::views::filter([&letters_i_have](auto&& word) {
          return std::ranges::is_permutation(word, letters_i_have);
      })
    | std::views::transform([](auto&& word) {
          return std::tuple{word, compute_score(word)};
      })
    | std::views::filter([threshold](auto&& word_with_score) {
          return std::get<1>(word_with_score) >= threshold;
      })
    | std::views::transform([](auto&& word_with_score) {
          return std::get<0>(word_with_score);
      })
    | std::ranges::to<std::vector>();

Now, put aside opinions about whether this code is beautiful or not, and ignore whether it is efficient or not. The real magic here is that this code cannot be wrong. By that I mean: assuming it compiles, and assuming the logic is correct, it handles every edge case that you would normally have to write dozens of if cases and other spaghetti logic to handle. What happens if the input file is empty or unreadable? No problem, you’ll get an empty vector (or an exception). What happens if either filter filters everything out? No problem, still works. There are no branches; the logic is a straight, simple line. Reasoning about this code is easy.

That is the power of ranges: no more branches, looping, or circuitous paths through business logic. You get one straight line through a pipeline of operations. One entry point, one exit point (exceptions or other failure states notwithstanding).

Is that beneficial? Depends on your point of view.

Should I convert the for-loop to an algorithm? for_each?

  1. Yes.
  2. Usually no.

So, why bother with algorithms when you have a perfectly good for loop? There are several answers.

  1. Algorithms are self-documenting. Is that for loop doing a search? Is it transforming the data in a range? You won’t know until you read the code! On the other hand, there is no mystery if you see find() or transform(). You know what’s going on.
  2. Algorithms are contained. A for loop has access to all of the surrounding context, so a typo in the loop body could change the value of a variable many layers up. By contrast, an algorithm cannot touch any data you do not give it access to.
  3. Algorithms can be highly optimized. A decent compiler can unroll loops and vectorize operations fairly well. But algorithms make this very easy, so you can expect even better optimization. (Or at least, no worse.)
  4. Algorithms are tested. It is very easy to introduce bugs in raw loops, and very, very hard to catch those bugs before they bite you. Algorithms can be independently and rigorously tested, so you know they work.

… and so on.

Once again, you keep asking about benefits in the abstract, but there is no such thing. Everything in engineering is trade-offs. Algorithms will probably not make your code more efficient (though they might!). But they do have other properties you might find beneficial.

As for for_each() specifically, sometimes it is exactly the right tool for the job: that is, when you are literally just doing some operation with each element of a sequence. But that’s actually quite rare. Even when you are just stepping one element at a time through a sequence, in order, you’re usually actually doing something that’s more than just “an operation”… like maybe a find, or a fold, or you’re generating data, or transforming.

The most common case where you’re doing literally nothing but an operation on a sequence is when you are printing the elements. In that case you are not transforming or generating the sequence, you are not searching, moving or copying, and you are not doing any sort of fold. You’re literally just doing an operation—printing—for each element. In that case, for_each() is fine… but even then, it is now possible to print a sequence without a loop:

auto const v = std::vector{1, 2, 3, 4};

// Before C++23:
std::cout << '[';
auto first = true;
std::ranges::for_each(v, [&first](auto i) {
    if (not std::exchange(first, false))
        std::cout << ", ";
    std::cout << i;
});
std::cout << ']';

// Since C++23:
std::print("{}", v);

In your specific case, you are definitely not doing just an operation using each value of a sequence:

 for (int i = n - big, beyond = n - halve_first ; i < beyond ; i++)
     partitioned[i] -= k;

That is clearly a transform (specifically, a transform of a subrange of partitioned, using std::minus with the second operand bound to k: transform(sr, begin(sr), bind_back(minus<>{}, k)))… although it is probably more clearly written as auto const subtract_k = bind_back(minus<>{}, k); transform(sr, begin(sr), subtract_k);.

Should the first, exclude parameters to sum_halves() explicitly be references? (How?)

I don’t see why. But the how is easy:

static long int sum_halves(auto& first, auto& exclude) { 

Now they’re references.

Or even better:

static long int sum_halves(auto&& first, auto&& exclude) { 

Now they’re universal references.

However this all seems pointless given that the one and only thing you do with these variables is pass them to transform_reduce()… which immediately takes them by value anyway.

But back to the why… the intent isn’t clear, because you’ve created a template but given the types no names (equivalent to template<typename T, typename U> long int sum_halves(T first, U exclude)). But first and exclude are supposed to be iterators… or rather, an iterator and a sentinel. Iterators are meant to be passed around by value, not by reference (except for rare cases, like advance()).

I mean, if you want to be really obsessive and handle the wackiest corner cases with maximal efficiency, then you could take the arguments by value and then move them into the arguments for transform_reduce(). But realistically, you will never see a case where that actually matters.

Code review

#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>

Because the order of includes matters, it is generally wise to order them logically. Usually that means by library, then alphabetically.

Also, you are missing a ton of includes.

static long int sum_halves(auto first, auto exclude)
{
    return std::transform_reduce(first, exclude, 0L, std::plus<long>(),
                                 [](auto v) { return v / 2L; });
}

I know declaring every function static is a coding challenge thing, but it really doesn’t matter in real-world programming.

Now, making an unconstrained template like this is a terrible idea. Written this way, you seem to be saying that first and exclude can be literally anything, and can be completely unrelated to each other. But that’s obviously not true.

For starters, first is clearly meant to be an iterator, and exclude is meant to be another iterator or a sentinel for the same range. Let’s put a pin in that last bit for the moment. So first is an iterator, and there is no reason it could not be an input iterator. So, to start:

template<std::input_iterator I>
auto sum_halves(I first, I exclude)

Note that I am not using the abbreviated form anymore. That’s because there are two parameters, and I want their types to be related. You could write this in abbreviated form, but to be blunt, I have never used the abbreviated form because it is so limited to the point of uselessness.

Now, in modern C++, the type of an “end” iterator need not be the same as the type as the “begin” iterator. In other words, rather than iterator pairs, these days we use iterators and sentinels… where the sentinel could be (and usually is) the same type as the iterator, but doesn’t need to be. So you could do:

template<std::input_iterator I, std::sentinel_for<I> S>
auto sum_halves(I first, S exclude)

However

You’re passing the arguments on to std::transform_reduce(), which predates the iterator-sentinel paradigm, and so does not support sentinels. So you might as well not bother.

Okay, so first is an input iterator, but not just any input iterator. Its value type should be a type that is convertible to long, because that’s what you want to reduce it to.

template<std::input_iterator I>
    requires std::convertible_to<std::iter_reference_t<I>, long>
auto sum_halves(I first, I exclude)

Now let’s look inside the function:

template<std::input_iterator I>
    requires std::convertible_to<std::iter_reference_t<I>, long>
auto sum_halves(I first, I exclude)
{
    return std::transform_reduce(first, exclude, 0L, std::plus<long>(),
                                 [](auto v) { return v / 2L; });
}

Most of that is fine… but there’s one dangerous trap in there, and it is in the lambda. As written, the lambda takes “any type”, then does a division by a long, and then passes the result of that to std::plus<long>’s operator function (which takes two longs). If everything were long all the way down, there would be no problem… but if the “any type” happens to be a type that, when divided by a long does not produce a long… weird things can happen.

There are a number of ways to fix the problem. Since we’ve already constrained the “any type” to be convertible to long, we could just force that conversion:

[](auto v) { return static_cast<long>(v) / 2L; } 

Or we could have the conversion happen automatically at the start of the lambda call:

[](long v) { return v / 2; } 

Basically, don’t do the division until you are sure v is a long. Once it is, then you know the result of the division is also a long, which is what you want.

I want to get back to that pin from earlier. Right now the function looks like this: sum_halves(a, b). Now, we have constrained a and b to both be iterators, and both be iterators of the same type… but there is still no way to know that they are supposed to be iterators from the same range, or that a is supposed to be before b.

This is why writing algorithms with iterators pairs is old-fashioned. The modern way is to use a range:

template<std::ranges::input_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_halves(R&& r)
{
    return std::transform_reduce(
        std::ranges::begin(r),
        std::ranges::end(r),
        0L,
        std::plus<>{},
        [](long v) { return v / 2L; }
    );
}

Using a range rather than a pair of iterators completely eliminates all the problems with the iterators not being for the same range, or out of order, or anything else.

And of course, we add constexpr because we can.

Now, it’s not necessary, but you could upgrade the range type to a forward range, and add an execution policy to get vectorization or parallelization here. You’d have to give up the constexpr though.

static long int sum_top_halves(auto first, auto exclude, int k)
{
    auto top = exclude - k;
    if (top <= first) {
        top = first;
    } else {
        std::nth_element(first, top, exclude);
    }
    return sum_halves(top, exclude);
}

So let’s start by applying all the stuff from the previous function:

template<std::ranges::input_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_top_halves(R&& r, int k)

Now, a naked int in an interface is a code smell. What does it mean? int just means “a number”, but what kind of number? What does it represent?

Because k is supposed to be the distance to move the end iterator back, the right type here should probably be range_difference_t<R>:

template<std::ranges::input_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_top_halves(R&& r, std::ranges::range_difference_t<R> k)

Finally, k must also be zero or greater, but less than the size of the range. Unfortunately, while we can constrain types in C++23, we can’t yet constrain values. That might be coming in C++26, with contracts, and it might look like this:

template<std::ranges::input_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_top_halves(R&& r, std::ranges::range_difference_t<R> k)
    pre(k >= std::ranges::range_difference_t<R>{})
    // don't need to check that k is < the range size, because that's handled by advance()
{
    // ...
}

For now, we can at least note the precondition as a comment.

Now let’s dive into the body:

template<std::ranges::input_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_top_halves(R&& r, std::ranges::range_difference_t<R> k)
    // pre(k >= std::ranges::range_difference_t<R>{})
{
    // Need these so the rest of the code remains unchanged.
    auto const first = std::ranges::begin(r);
    auto const exclude = std::ranges::end(r);

    auto top = exclude - k;
    if (top <= first) {
        top = first;
    } else {
        std::nth_element(first, top, exclude);
    }
    return sum_halves(top, exclude);
}

Now, you have a bit of UB going on here. If k is larger than distance(first, exclude), then the operation exclude - k could explode. To understand why, imagine the iterators you are working with are just pointers and if somehow the range you are working with just so happened to be located at memory address 10, happened to have 20 byte-sized elements, and then k is 50. In that scenario, first will point to address 10, exclude will point to address 30, then you do 30 - 50 and compute address −20. What happens next with the if (top <= first) is anyone’s guess, but it’s kinda irrelevant because your computer is already on fire due to trying to helpfully prefetch data from a memory address located 2 microns off the left side of your RAM chip.

So you can’t just move exclude backwards arbitrarily and then check after the fact if you’ve blown past the beginning of the range. Let’s try to change the strategy. Let’s check the distance first, and then do the subtraction only if it’s safe. To do this, we can no longer use an input range, because doing the distance check would burn through the single pass, so it has to be at least a forward range. And since we’re using nth_element(), it needs to be a random-access range in any case, so let’s just go with that.

template<std::ranges::random_access_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_top_halves(R&& r, std::ranges::range_difference_t<R> k)
    // pre(k >= std::ranges::range_difference_t<R>{})
{
    auto const first = std::ranges::begin(r);
    auto const exclude = std::ranges::end(r);

    auto top = first;
    if (k < std::ranges::distance(r)) {
        top = exclude;
        std::ranges::advance(top, -k, first);
        std::nth_element(first, top, exclude);
    }
    return sum_halves(top, exclude);
}

Let’s range-ify that a bit:

template<std::ranges::random_access_range R>
    requires std::convertible_to<std::ranges::range_reference_t<R>, long>
constexpr auto sum_top_halves(R&& r, std::ranges::range_difference_t<R> k)
    // pre(k >= std::ranges::range_difference_t<R>{})
{
    auto v = std::ranges::subrange{r};
    if (k < std::ranges::distance(v)) {
        // nth_element() returns an iterator to the end of the range,
        // so compute the pivot position before the call
        auto const top = std::ranges::end(v) - k;
        std::ranges::nth_element(v, top);
        v = {top, std::ranges::end(v)};
    }
    return sum_halves(v);
}

That’s all I have time to review today. Hope it’s enough to help a bit.


1 comment
  • Yes, you could print with std::copy() or with a number of other algorithms (you could print with std::find_if() if you wanted), but they are just hacks, because printing is not copying. std::for_each() is the correct tool for printing.
    – indi
    Commented Dec 5, 2024 at 11:04


It would be better to target the most recent version of C++, which is C++23; at minimum, you should be using C++20 for new code rather than C++17.

The code is inconsistent when applying braces around compound statements or possible compound statements:

 if (k <= 0) { // don't subtract values not reducing sum
     op2 = 0;
     k = 0;
 }
 if (op1 <= 0 && op2 <= 0) // no operations with any effect
     return total;

The preferred way is to always use braces:

 if (k <= 0) { // don't subtract values not reducing sum
     op2 = 0;
     k = 0;
 }
 if (op1 <= 0 && op2 <= 0) { // no operations with any effect
     return total;
 }

Please note that it is difficult to maintain code where there are 2 statements on a line.

Old C Style Casts

There are a number of places where the code is casting a value to long. In C++, a compile-time cast should be static_cast<TYPE>(VALUE). There is also dynamic_cast for casts checked at run time.

 op2 = std::min((long)op2, n - big); 

Correct way:

 op2 = std::min(static_cast<long>(op2), n - big); 

Function Return Type

The type specified by the prototype class in the comments is int; returning a long through an int can cause a loss of data.
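A quick, hedged illustration of that loss (narrowing_loses is a made-up helper; it assumes long is wider than int, as on typical 64-bit Linux targets):

```cpp
#include <limits>

// true when the round trip long -> int -> long loses the value
bool narrowing_loses(long wide)
{
    int narrow = static_cast<int>(wide); // well-defined modular conversion since C++20
    return static_cast<long>(narrow) != wide;
}
```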

Variable Declarations

For readability and maintainability each variable should be declared and initialized on its own line.

 const auto begin = partitioned.begin(), end = partitioned.end(); 

Versus

 const auto begin = partitioned.begin();
 const auto end = partitioned.end();

Prefer std::size_t for Array Indexes and Size Values

The size member of all container classes returns std::size_t, an unsigned integer type that is guaranteed to be large enough to hold the size of any object in memory.

 std::size_t n = partitioned.size(); 

Note: this will cause n to need to be cast to the proper type in the calls to std::min() and std::max().

  • (Is there an actionable guideline for picking language version tags? I explicitly ask about using a "post-17" feature (and paged through many a function documentation). I tagged C++17 because I think that's all the code presented needs.)
    – greybeard
    Commented Dec 4, 2024 at 18:14
  • (accumulating ints in an int can lead to loss of information… An adaptor like the one commented out is what I prefer with external interface requirements I don't approve of. I even get to name business functions conforming to language conventions…)
    – greybeard
    Commented Dec 4, 2024 at 18:24
  • @greybeard: The text of your question doesn't contain the string "17". (Or 20, or 23 except as one of the sum results). So it's currently not at all clear that you're intending this as a "modern" C++ program using any useful language features up to and including C++23. At least in future, you should say so in the text of the question. Also, I think tagging [c++23] makes sense if you're going to compile it with gcc -std=gnu++23, so the language you're writing in is C++23. You can mention in the text that you only ended up using features from C++17 and earlier.
    Commented Dec 5, 2024 at 7:08
