3
\$\begingroup\$

I'm writing a little collection of sorting algorithms and some benchmarks (using Google benchmark). I wrote the following heap sort but I'm struggling in understanding why it is so slower than std::sort_heap implemented in the STLC++ Stardard Library. The code is inspired to the algorithm described in Cormen book "Introduction to algorithms"

Am I doing something silly?
I have also profiled the code using perf and it seems that the hottest instruction is the comparison function.

Here it goes my implementation:

template <typename Iterator> inline Iterator left_child(const Iterator begin, Iterator idx) { return idx + std::distance(begin, idx) + 1; } template <class Iterator> inline Iterator right_child(const Iterator begin, Iterator idx) { return left_child(begin, idx) + 1; } template <class Iterator> inline Iterator parent(Iterator begin, Iterator idx) { return begin + std::distance(begin + 1, idx) / 2; } template <typename Iterator, typename CMP_FN> inline void heapify(const Iterator begin, const Iterator end, Iterator idx, CMP_FN cmp) { bool go = true; while (go) { const Iterator left(left_child(begin, idx)); const Iterator right(right_child(begin, idx)); Iterator candidate(idx); // this if relies on if short circuiting cmp has to be guarded for segfault if (left < end && !cmp(*left, *idx)) candidate = left; //this if relies on if short circuiting cmp has to be guarded for segfault if (right < end && !cmp(*right, *candidate)) candidate = right; if (candidate != idx) { std::swap(*candidate, *idx); // go=true; idx = candidate; } else go = false; } } template <typename Iterator, typename CMP_FN> inline void build_heap(const Iterator begin, const Iterator end, CMP_FN cmp) { const auto d = distance(begin, end) / 2; for (Iterator s = begin + d; s >= begin; s--) heapify(begin, end, s, cmp); } ///////////////////////////////////////////////// /// @Brief Heap sort implementation (inspiration from Cormen et al Book) // CMP_FN has type: D -> D -> bool ///////////////////////////////////////////////// template <typename Iterator, typename CMP_FN> void heap_sort(Iterator s, Iterator e, CMP_FN cmp) { auto d = std::distance(s,e); if(d <= 1) return; build_heap(s, e, cmp); while(d--){ e--; std::swap(*s, *e); heapify(s, e, s, cmp); } } 

When benchmarked those are the timings I get:

---------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------------------------------------------------------- benchmark_random_values<quicksorter_hoare>/32768 2 ms 2 ms 154 benchmark_random_values<quicksorter_hoare>/65536 4 ms 4 ms 73 benchmark_random_values<quicksorter_hoare>/131072 8 ms 8 ms 34 benchmark_random_values<quicksorter_hoare>/262144 17 ms 17 ms 16 benchmark_random_values<quicksorter_hoare>/524288 36 ms 36 ms 8 benchmark_random_values<quicksorter_hoare>/1048576 77 ms 77 ms 4 benchmark_random_values<quicksorter_hoare>/2097152 159 ms 159 ms 2 benchmark_random_values<quicksorter_hoare>/4194304 336 ms 336 ms 1 benchmark_random_values<quicksorter_hoare>/8388608 701 ms 701 ms 1 benchmark_random_values<quicksorter_hoare>/16777216 1466 ms 1466 ms 1 benchmark_random_values<heap_sorter>/32768 2 ms 2 ms 111 benchmark_random_values<heap_sorter>/65536 5 ms 5 ms 53 benchmark_random_values<heap_sorter>/131072 12 ms 12 ms 24 benchmark_random_values<heap_sorter>/262144 25 ms 25 ms 11 benchmark_random_values<heap_sorter>/524288 55 ms 55 ms 5 benchmark_random_values<heap_sorter>/1048576 120 ms 120 ms 2 benchmark_random_values<heap_sorter>/2097152 264 ms 264 ms 1 benchmark_random_values<heap_sorter>/4194304 659 ms 659 ms 1 benchmark_random_values<heap_sorter>/8388608 1657 ms 1657 ms 1 benchmark_random_values<heap_sorter>/16777216 4038 ms 4038 ms 1 benchmark_random_values<sort_heap_std>/32768 2 ms 2 ms 108 benchmark_random_values<sort_heap_std>/65536 5 ms 5 ms 60 benchmark_random_values<sort_heap_std>/131072 9 ms 9 ms 30 benchmark_random_values<sort_heap_std>/262144 20 ms 20 ms 12 benchmark_random_values<sort_heap_std>/524288 40 ms 40 ms 7 benchmark_random_values<sort_heap_std>/1048576 87 ms 87 ms 3 benchmark_random_values<sort_heap_std>/2097152 175 ms 175 ms 2 benchmark_random_values<sort_heap_std>/4194304 365 ms 365 ms 1 benchmark_random_values<sort_heap_std>/8388608 757 ms 757 ms 1 benchmark_random_values<sort_heap_std>/16777216 1736 ms 1736 ms 1 

As you can see from the timing the heap sorter is much slower than the std::sort_heap and also than a quicksort that I wrote.

\$\endgroup\$
4
  • \$\begingroup\$As an aside, do you know What's the difference between “STL” and “C++ Standard Library”?\$\endgroup\$CommentedJul 4, 2017 at 16:13
  • \$\begingroup\$Apparently, I did not! XD\$\endgroup\$CommentedJul 4, 2017 at 16:17
  • \$\begingroup\$@Deduplicator Thanks for this link. Is it okay to simply say stdlib when refering to the standard library?\$\endgroup\$
    – yuri
    CommentedJul 4, 2017 at 21:29
  • \$\begingroup\$@yuri Yes, standard library is quite a mouthful ;-)\$\endgroup\$CommentedJul 4, 2017 at 21:34

1 Answer 1

3
\$\begingroup\$

Well, before we even consider efficiency, lets look at correctness.
Your code invokes Undefined Behavior:

Specifically, you create iterators which will refer far past the range you got in left_child() and right_child() after you swapped a leaf-node in heapify.

That probably also answers for your programs speed.

Or maybe you compared heap_sort() performance with std::sort_heap(), which is completely unfair as the former corresponds to the combination of std::make_heap()+std::sort_heap().


Why do you split heapsort() into two words like heap_sort()? That at best confuses.

You would normally call the function which builds the whole heap heapify(), though build_heap() isn't that uncommon. I would keep to make_heap() like the standard uses though.

The helper-function you unaccountably called heapify() even though it's just a small step in building a heap would normally be called sift_down(), down_heap() or a variant thereof.

I would expect a (preprocessor-) constant under CMP_FN instead of a template parameter. Why don't you use the more common Compare, Comp or Cmp, especially as it need not be a function, only a callable?

A qualified call to std::swap(a, b) cannot pick up specializations in other namespaces using ADL. Use using std::swap; swap(a, b); or std::iter_swap(pa, pb);.

Regarding "something silly": Avoid using flags for flow-control, dedicated control-structures are easier to follow.

\$\endgroup\$
3
  • \$\begingroup\$thanks for your answer. I fixed the left_child functions by checking if (distance(idx, end-2) < distance(begin, idx) and if that fails returning end(the right_child similarly). Is that what you were refereeing to as UB? Regarding the performance, the problem is still there. What do you mean that the comparison is unfair? shouldn't make_heap correspond to my build_heap and sort_heap to my heap_sort? Plus thanks for the stylistic comments. I fill fix them up right away.\$\endgroup\$CommentedJul 4, 2017 at 16:01
  • \$\begingroup\$Read it: std::sort_heap: Converts the max heap [first, last) into a sorted range in ascending order.\$\endgroup\$CommentedJul 4, 2017 at 16:07
  • \$\begingroup\$Obviously I am benchmarking make_heap+sort_heap vs my heap_sort\$\endgroup\$CommentedJul 4, 2017 at 16:18

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.