Streamlined iteration: exploring keys and values in C++20

In software, we often use key-value data structures, where each key is unique and maps to a specific value. Common examples include dictionaries in Python, hash maps in Java, and objects in JavaScript. If you combine arrays with key-value data structures, you can represent most data.

In C++, we have two standard key-value data structures: the std::map and the std::unordered_map. Typically, the std::map is implemented as a tree (e.g., a red-black tree) with sorted keys, providing logarithmic time complexity O(log n) for lookups, insertions, and deletions, and maintaining keys in sorted order. The std::unordered_map is typically implemented as a hash table with unsorted keys, offering average-case constant time complexity O(1) for lookups, insertions, and deletions. In the std::unordered_map, the hash table uses a hashing function to map keys to indices in an array of buckets. Each bucket is essentially a container that can hold multiple key-value pairs, typically implemented as a linked list. When a key is hashed, the hash function computes an index corresponding to a bucket. If multiple keys hash to the same bucket (a collision), they are stored in that bucket’s linked list.

Quite often, we only need to look at the keys, or look at the values. The C++20 standard makes this convenient through the introduction of ranges (std::ranges::views::keys and std::ranges::views::values). Let us consider two functions using the ‘modern’ functional style. The first function sums the values and the next function counts how many keys (assuming that they are strings) start with a given prefix.

template<typename map_type>auto sum_values(const map_type&map){auto values =map|std::ranges::views::values;returnstd::accumulate(values.begin(), values.end(),typename map_type::mapped_type{});}template<typename map_type>size_t count_keys_with_prefix(const map_type&map,std::string_view prefix){auto keys =map|std::ranges::views::keys;returnstd::ranges::count_if(keys,[prefix](std::string_view key){return key.starts_with(prefix);});}
The code defines two template functions for processing a map-like container. The sum_values function takes a map of any type and computes the sum of its values by extracting them using std::ranges::views::values, then applying std::accumulate to add them up, starting with a default-constructed value of the map’s value type. The count_keys_with_prefix function counts how many keys in the map start with a given prefix by extracting the keys with std::ranges::views::keys and using std::ranges::count_if with a lambda that checks if each key begins with the specified prefix via starts_with. Both functions leverage C++20’s ranges library for concise and expressive manipulation of the map’s keys and values.
If you do not crave the functional style, you can use loops… like so…
template<typename map_type>auto sum_values_daniel(map_type &map){typename map_type::mapped_type sum{};for(constauto&value :map|std::ranges::views::values){ sum += value;}return sum;}template<typename map_type>size_t count_keys_with_prefix_daniel(const map_type &map,std::string_view prefix){size_tcount=0;for(constauto&key :map|std::ranges::views::keys){if(key.starts_with(prefix)){++count;}}returncount;}
I personally find loops more readable than lambda functions, but it might be worth testing both to see which one provides the best performance.

When you do not have C++20, you can write the same functions using a more conventional style:

template<typename map_type>typename map_type::mapped_type sum_values_cpp11(const map_type&map){typename map_type::mapped_type sum =typename map_type::mapped_type();for(typename map_type::const_iterator it =map.begin(); it !=map.end();++it){ sum += it->second;}return sum;}template<typename map_type>size_t count_keys_with_prefix_cpp11(const map_type&map,conststd::string& prefix){size_tcount=0;for(typename map_type::const_iterator it =map.begin(); it !=map.end();++it){conststd::string& key = it->first;if(key.size()>= prefix.size()&& key.compare(0, prefix.size(), prefix)==0){++count;}}returncount;}
Prior to C++20, there was no dedicated starts_with function to directly check if a string was a prefix of another, requiring more verbose methods like compare. Of course, you can combine the old-style code with the ‘starts_with’  method. I think that the addition of functions like ‘starts_with’ is an underrated benefit of the recent C++ standards. Small details can be impactful.

Let us compare the performance. I have generated a large key-value map with a thousand keys (as strings) with corresponding integer values. All my keys begin with ‘key_’ and I ask my functions to count how many keys begin with ‘key_’ (that should be all of them). I use both an std::map and std::unordered_map data source. I record how many nanoseconds I need to spend per key checked. I use two machines, one with an Intel Ice Lake processor running at 3.2 GHz with GCC 12 and LLVM 16 as the compilers and one with an Apple M2 processor running at 3.5 GHz with LLVM 15 as the compiler. They do not use the same instruction set, so we cannot directly compare the number of instructions retired: we expect an ARM-based system like the Apple M2 to use a few more instructions.

Using an std::map I get the following timings and instruction counts:

functionApple M2/LLVM15Intel Ice Lake/GCC12Intel Ice Lake/LLVM16
C++20 functional2.2 ns5.5 ns5.8 ns
C++20 loop2.0 ns5.5 ns5.8 ns
C++114.0 ns5.5 ns5.5 ns
C++11 with starts_with2.0 ns5.5 ns5.8 ns
functionApple M2/LLVM15Intel Ice Lake/GCC12Intel Ice Lake/LLVM16
C++20 functional46 instructions29 instructions49 instructions
C++20 loop41 instructions29 instructions49 instructions
C++1158 instructions30 instructions30 instructions
C++11 with starts_with43 instructions30 instructions49 instructions

Using an std::unordered_map, I get the following timings and instruction counts:

functionApple M2/LLVM15Intel Ice Lake/GCC12Intel Ice Lake/LLVM16
C++20 functional1.1 ns3.8 ns3.8 ns
C++20 loop0.9 ns5.4 ns3.8 ns
C++110.9 ns5.4 ns3.8 ns
C++11 with starts_with0.9 ns5.4 ns3.8 ns
functionApple M2/LLVM15Intel Ice Lake/GCC12Intel Ice Lake/LLVM16
C++20 functional35 instructions10 instructions30 instructions
C++20 loop30 instructions13 instructions30 instructions
C++1126 instructions14 instructions12 instructions
C++11 with starts_with32 instructions14 instructions30 instructions

We find that iterating over an std::map instance is slower than iterating over an std::unordered_map instance.

Performance comparisons between Apple M2/LLVM15 and Intel Ice Lake/GCC12 systems reveal striking differences in C++ code execution. While functional approaches (e.g., C++20 ranges) can outperform traditional loops by up to 40% under GCC 12, using starts_with for string comparisons dramatically halves running time for std::map on Apple/LLVM15.

Despite small input sizes fitting in cache, Intel’s iteration performance is not compute-bound, achieving under 2 instructions per cycle (IPC), while Apple’s exceeds 4 IPC, highlighting its architectural advantage in optimizing data access patterns.

My source code is available.

Daniel Lemire, "Streamlined iteration: exploring keys and values in C++20," in Daniel Lemire's blog, April 20, 2025, https://lemire.me/blog/2025/04/20/iterating-through-keys-and-values-in-c-with-c20-code/.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “Streamlined iteration: exploring keys and values in C++20”

  1. The mixing of compilers and CPUs makes it hard to see how much of the performance difference is due to the compiler LLVM/GCC and how much is due to CPU arch. ARM/Intel.
    A more fair comparison would be between the same version of LLVM on both CPUs.

  2. The inner loop of the “C++11 starts_with” variant gets faster if you replace the std::string& parameter by a std::string_view (which is C++17). Apart from instruction order, the generated code is then equivalent to “C++20 functional”. IOW, the difference appears to be due to the aliasing rules.

Leave a Reply

You may subscribe to this blog by email.

close