I'm writing a little collection of sorting algorithms and some benchmarks (using Google benchmark). I wrote the following heap sort
but I'm struggling in understanding why it is so slower than std::sort_heap implemented in the C++ Stardard Library. The code is inspired to the algorithm described in Cormen book "Introduction to algorithms"STL
Am I doing something silly?
I have also profiled the code using perf
and it seems that the hottest instruction is the comparison function.
Here it goes my implementation:
template <typename Iterator> inline Iterator left_child(const Iterator begin, Iterator idx) { return idx + std::distance(begin, idx) + 1; } template <class Iterator> inline Iterator right_child(const Iterator begin, Iterator idx) { return left_child(begin, idx) + 1; } template <class Iterator> inline Iterator parent(Iterator begin, Iterator idx) { return begin + std::distance(begin + 1, idx) / 2; } template <typename Iterator, typename CMP_FN> inline void heapify(const Iterator begin, const Iterator end, Iterator idx, CMP_FN cmp) { bool go = true; while (go) { const Iterator left(left_child(begin, idx)); const Iterator right(right_child(begin, idx)); Iterator candidate(idx); // this if relies on if short circuiting cmp has to be guarded for segfault if (left < end && !cmp(*left, *idx)) candidate = left; //this if relies on if short circuiting cmp has to be guarded for segfault if (right < end && !cmp(*right, *candidate)) candidate = right; if (candidate != idx) { std::swap(*candidate, *idx); // go=true; idx = candidate; } else go = false; } } template <typename Iterator, typename CMP_FN> inline void build_heap(const Iterator begin, const Iterator end, CMP_FN cmp) { const auto d = distance(begin, end) / 2; for (Iterator s = begin + d; s >= begin; s--) heapify(begin, end, s, cmp); } ///////////////////////////////////////////////// /// @Brief Heap sort implementation (inspiration from Cormen et al Book) // CMP_FN has type: D -> D -> bool ///////////////////////////////////////////////// template <typename Iterator, typename CMP_FN> void heap_sort(Iterator s, Iterator e, CMP_FN cmp) { auto d = std::distance(s,e); if(d <= 1) return; build_heap(s, e, cmp); while(d--){ e--; std::swap(*s, *e); heapify(s, e, s, cmp); } }
When benchmarked those are the timings I get:
---------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------------------------------------------------------- benchmark_random_values<quicksorter_hoare>/32768 2 ms 2 ms 154 benchmark_random_values<quicksorter_hoare>/65536 4 ms 4 ms 73 benchmark_random_values<quicksorter_hoare>/131072 8 ms 8 ms 34 benchmark_random_values<quicksorter_hoare>/262144 17 ms 17 ms 16 benchmark_random_values<quicksorter_hoare>/524288 36 ms 36 ms 8 benchmark_random_values<quicksorter_hoare>/1048576 77 ms 77 ms 4 benchmark_random_values<quicksorter_hoare>/2097152 159 ms 159 ms 2 benchmark_random_values<quicksorter_hoare>/4194304 336 ms 336 ms 1 benchmark_random_values<quicksorter_hoare>/8388608 701 ms 701 ms 1 benchmark_random_values<quicksorter_hoare>/16777216 1466 ms 1466 ms 1 benchmark_random_values<heap_sorter>/32768 2 ms 2 ms 111 benchmark_random_values<heap_sorter>/65536 5 ms 5 ms 53 benchmark_random_values<heap_sorter>/131072 12 ms 12 ms 24 benchmark_random_values<heap_sorter>/262144 25 ms 25 ms 11 benchmark_random_values<heap_sorter>/524288 55 ms 55 ms 5 benchmark_random_values<heap_sorter>/1048576 120 ms 120 ms 2 benchmark_random_values<heap_sorter>/2097152 264 ms 264 ms 1 benchmark_random_values<heap_sorter>/4194304 659 ms 659 ms 1 benchmark_random_values<heap_sorter>/8388608 1657 ms 1657 ms 1 benchmark_random_values<heap_sorter>/16777216 4038 ms 4038 ms 1 benchmark_random_values<sort_heap_std>/32768 2 ms 2 ms 108 benchmark_random_values<sort_heap_std>/65536 5 ms 5 ms 60 benchmark_random_values<sort_heap_std>/131072 9 ms 9 ms 30 benchmark_random_values<sort_heap_std>/262144 20 ms 20 ms 12 benchmark_random_values<sort_heap_std>/524288 40 ms 40 ms 7 benchmark_random_values<sort_heap_std>/1048576 87 ms 87 ms 3 benchmark_random_values<sort_heap_std>/2097152 175 ms 175 ms 2 benchmark_random_values<sort_heap_std>/4194304 365 ms 365 ms 1 benchmark_random_values<sort_heap_std>/8388608 757 ms 757 ms 1 benchmark_random_values<sort_heap_std>/16777216 1736 ms 1736 ms 1
As you can see from the timing the heap sorter is much slower than the std::sort_heap
and also than a quicksort that I wrote.
stdlib
when refering to the standard library?\$\endgroup\$