Comparing C++ range libraries for filter+reverse case with non-trivial lambda

Claudio Ctin4 months ago14 mins

C++ ranges are awesome as they often make the code simpler and easier to reason about. But how they compare against each other? To check that I did a micro-benchmark of c++ ranges for one particular use case I encountered in practice.

The scenario I tested was as follows:

I have an input data set that needs to be filtered based on non-trivial lambda.
Only a subset of this data is needed.
Then an index is added to the elements.
The last step is to reverse the sequence.

The repo with source code is here https://github.com/serpent7776/cpp-ranges-bench

It’s basically filter+take+enumerate+reverse, but there are two tricky things here:

The non-triviality of the lambda being used for filtering. In the example code, such a lambda is often a trivial function. What happens when the cost of invoking such lambda is not not negligible?
Filter+reverse is a tricky combination. Filtering yields a result set of unknown size, which means that subsequent reverse cannot be done easily in the general case.

The ranges library needs to take both of these situations into account, otherwise it might introduce noticeable overhead.

Let’s check it out.

The code uses catch2 to test for correctness and performance.
Catch2 is a unit testing framework for C++, but it also provides basic micro-benchmarking features.
Unfortunately, it doesn’t provide any way to visualise the results, so I created a script and submitted PR adding plotting capabilities to Catch.

Tested implementations

I have reviewed the following implementations:

clike – a simple C-like implementation. This still uses c++ vector and its methods, but it’s just a for loop with if statements. It collects the result in a vector and when it’s done, it reverses it in-place.
algorithms – Manual implementation using STL algorithms. Instead of operating on the elements themselves, it allocates an extra vector with indices and operates on that. It only copies the elements in the last step.
boost_adaptors – Uses boost::adaptors. It provides very minimal feature set, but it’s surprisingly capable. Due to the limitations of adaptors library, it allocates an extra vector of filtered elements and operates on that.
rangesv3 – Uses ranges-v3 – range library for C++14/17/20. Pretty close to the implementation using std::ranges, but a bit more complex, because it distinguishes between actions and views. This method uses the reverse action with views and to_vector, which might allocate an extra vector.
stdranges – Uses C++23 ranges. This is the shortest and cleanest implementation. The compiler I was using didn’t have to implemented, so I had to roll my own. This may have affected the results.
fluxranges – Uses flux, a C++20 library for sequence-oriented programming. Quite noisy compared to other range implementations.

I used the following versions of the libraries:

ranges-v3 0.12
fluxranges faf2e0f8b9f616001890a87b613f47313443f8e5
g++ stdlib, g++ (GCC) 13.2.1 20230801
All compiled in C++23 mode with the -O2 flag.

Test cases

I have generated some random data using the faker python library. The data is a set of usernames with an id and a set of connected names. I will search for usernames that match some criteria which involves looking through the connected names. This should be enough to consider the filter criteria as non-trivial.
To do the comparisons I have generated 10 000 rows of data.

The test cases I will check are:

all ismiths – select all items that have “ismith” as connection. There are 15 such entries in total.
5 ismiths – select at most 5 items that have “ismith” as connection.
empty result set – searches for an item that doesn’t exist, yielding no results.
early single item – searches for all items that have “henry79” as connection. There’s only one such item, early in the sequence.
late single item – search for all items that have “emilymclaughlin” as connection. There’s only one such item, late in the sequence.
every other item – accept every other item, while also performing some operation on the input data.

Results

In all cases flux was noticeably slower than other methods.
Ranges-v3 was generally on par with other solutions. There are two cases where it’s slower: ‘all ismiths’ and ‘early single item’.
boost adaptors are overall quite good. In ‘all ismiths’, ’empty result set’ and ‘late single item’ are on par with fastest implementation and just slightly slower in other cases.
C-like solution was always on par or better than other solutions.

Some more details

Let’s try to see some more details. I will look at the ‘early single item’ test. This case is interesting, because there are big differences between the implementations.

First, let’s run the test case for a few more times. I’ll make the test cases run 1000 times by adding this code to each implementation section.

auto _ = GENERATE(range(1, 1000));

This uses Catch2 data generator to execute the section multiple times, each time with a different value. In this case I’m not interested in the value itself, I’m only doing this to run the section again.

Now let’s run perf stat on it. To collect the data I’ll will only run this single test case and only one implementation at a time, ignoring the benchmarks. I do this by invoking the compiled binary with the test-case name, the -c option with a section name that specifies which implementation to use and the –skip-benchmarks option so that no benchmarks are run. You can read more about Catch2 command line usage.
perf stat is invoked with -o stats –append, which appends the perf data to stats file. Read more about perf stat usage.

 : > stats # truncate stats file
 for t in clike algorithms boost_adaptors rangesv3 stdranges fluxranges; do
 perf stat -o stats –append ./read “early single item” -c $t –skip-benchmarks ;
 done

I will then convert the stats file with a perf-stat-to-sc script I wrote. It will create a sc-im spreadsheet, which will make it easier to compare the results.

implementation
context_switches
cpu_migrations
page_faults
cycles
instructions
insn_per_cycle
branches
branch_misses
backend_bound
bad_speculation
frontend_bound
retiring
real
user
sys

stdranges
0
0
50,796
11,279,370,035
29,524,803,591
29,524,803,591
7,633,968,112
106,521,026
5.9
19.5
28.8
45.8
3.646195512
3.502992000
0.136365000

clike
0
0
1,411
10,338,071,101
28,584,460,227
28,584,460,227
7,385,993,023
86,580,500
5.8
16.5
28.8
48.9
3.319227798
3.255089000
0.056539000

boost_adaptors
0
0
1,398
11,255,210,775
29,537,931,422
29,537,931,422
7,634,756,489
106,986,997
4.4
26.6
26.7
42.4
3.533653076
3.448212000
0.066610000

fluxranges
0
0
252,772
13,048,628,986
33,858,601,153
33,858,601,153
8,604,259,911
123,154,769
7.3
20.8
26.5
45.5
4.400705838
4.032191000
0.359219000

rangesv3
0
0
1,395
12,079,299,571
30,489,816,925
30,489,816,925
7,883,204,145
125,824,949
4.5
29.1
25.8
40.6
3.805896934
3.701721000
0.082913000

algorithms
0
0
7,620
11,329,769,168
29,617,205,033
29,617,205,033
7,644,573,504
106,472,532
6.1
18.8
28.8
46.3
3.656246734
3.548037000
0.099826000

The first thing to notice is the difference in the number of page faults. c-like, rangesv3 and boost adaptors had about 1400 faults. For custom algorithms it was ~7600. For std::ranges the number of faults was an order of magnitude larger (~50,000). For flux it was another order of magnitude larger (~250,000). The differences in cycles and instructions are quite small. It’s a bit bigger for flux though. The C-like implementation executed slightly fewer instructions than the other implementations. The number of branches is also similar, slightly higher for flux. There are bigger differences in branch misses. The C-like implementation has the least number of misses. Flux and rangesv3 have the most misses. Flux and std::ranges have spent more time in the kernel.

But this collects stats from the entire program run. The issue is that most of the time is spent on reading and parsing the input file, which is the same in all cases. I want to collect stats only for the single implementation function. For this I can use valgrind’s cachegrind with additional code instrumentation.
I will add CACHEGRIND_START_INSTRUMENTATION and CACHEGRIND_STOP_INSTRUMENTATION just before and after the implementation function I’m interested in and then run the binary with –instr-at-start=no which will disable instrumentation at startup and will wait for the first CACHEGRIND_START_INSTRUMENTATION to enable it.
Read more about cachegrind usage.

 SECTION(“clike”) {
 auto _ = GENERATE(range(1, 1000));
 CACHEGRIND_START_INSTRUMENTATION;
 const std::vector<Out> found = clike(data, accept, max_items);
 CACHEGRIND_STOP_INSTRUMENTATION;
 REQUIRE(found == expected);
 }

 valgrind –tool=cachegrind –instr-at-start=no ./read ‘early single item’ -c fluxranges

This is the data I got:

implementation
instructions executed

clike
246 384

algorithms
416 548 909

boost_adaptors
336 002 843

rangesv3
671 686 872

stdranges
336 037 633

fluxranges
3 241 270 519

This confirms that C-like executes far fever instructions than other implementations and flux ranges, by far, the most.

Conclusions

Ranges are really cool. They often make the code easier to read and reason about. But they do not always fit all the use cases. Sometimes, in some very specific cases, a for loop might give more control that is needed to get some better results. In general though, it’s usually better to start with a ranges, as they will be good enough in most cases.

Please follow and like us:

Stiri similare

Python vs Java: A Deep Dive into the Best Programming Language for You

Claudio Ctin1 week ago1 week ago

Hey everyone! How’s your week going? 😊 Whether you’re in the middle of a coding marathon, enjoying a well-deserved break, or just here to explore new tech ideas, we’re happy to have you. Today, we’re diving into a hot topic: Python vs. Java. 🚀 These two programming giants are often at the center of debates,…

Using TypeORM with TSX: A Smoother Development Experience

Claudio Ctin1 week ago1 week ago

Working with TypeScript in Node.js applications can sometimes be challenging, especially when integrating with tools like TypeORM. While ts-node has been a popular choice for running TypeScript directly, it often comes with configuration headaches and performance bottlenecks. In this article, we’ll explore using tsx as an alternative to ts-node for TypeORM, providing a smoother and…

Should We Allow to Appear as A Child of ?

Claudio Ctin1 week ago1 week ago

It actually works well. Please follow and like us:

Next Js Localisation

Claudio Ctin1 week ago1 week ago

Hello All, I am new to this community and thank you for your support and time. I am working on a bilingual site (English and Japanese). I am using next-intl package to set the locale. I am trying to implement the following use cases Set the default language to user browser language Set the default…

Scale Deployments Based on HTTP Requests with Keda

Claudio Ctin1 week ago1 week ago

I’m using CloudWatch as a trigger, which means I have installed the CloudWatch agent responsible for sending metrics to CloudWatch. My KEDA fetches metrics based on the collection interval defined in the configuration of ScaledObject. You can follow the configuration below to set up the ScaledObject based on the number of HTTP requests. apiVersion: keda.sh/v1alpha1…

Human readable time in Rust

Claudio Ctin1 week ago1 week ago

When it comes to formatting time into the human-readable format in Rust, there could be various possible solutions including doing some math or using Instance. However, I demonstrate the millisecond crate, the dedicated and specialized crate for the purpose. This crate converts nanoseconds, microseconds, milliseconds, seconds, etc into short and long formats; suitable for human…

Introduction to the MERN Stack: Building Full-Stack Applications with Ease 🚀

Claudio Ctin1 week ago1 week ago

In the world of web development, the MERN stack has emerged as a powerful solution for creating dynamic and interactive applications. Comprising four key technologies—MongoDB, Express.js, React, and Node.js—this stack allows developers to build full-stack applications using JavaScript throughout. In this blog post, we’ll explore how these technologies work together, their pros and cons, and…

NEXT-JS PROJECT

Claudio Ctin1 week ago1 week ago

REVIEW In the course of two weeks I have gained skills needed to build a full-stack Next.js applications.On this project we were expected to build a full-stack web application which should have a public home page, login page, dashboard pages and setting up database. REQUIREMENTS . Operating system; MacOS, Windows or Linus . Node.js 18.17.0…

Special Types of Interviews: How to Identify and Succeed in Them..

Claudio Ctin1 week ago1 week ago

When preparing for interviews, it’s essential to recognize that not all interviews are alike. Some interviewers ask questions that follow specific patterns, and failing to identify these can make the process more challenging. Understanding these distinct types of interviews can help you better prepare and succeed. Let’s dive into the three main types: 1) Behavioral…

Essential Tips for Aspiring Web Developers

Claudio Ctin1 week ago1 week ago

Web development is an ever-evolving field that offers endless opportunities for creativity and innovation. Whether you’re just starting or looking to refine your skills, these tips will help you confidently navigate the world of web development. 1. Practice Regularly: Consistent practice is key to mastering web development. Set aside time each day to code, experiment…

Enhancing Healthcare Outcomes with Oracle Cloud Infrastructure: A Path to Better Results

Claudio Ctin1 week ago1 week ago

In today’s rapidly changing healthcare environment Leveraging advanced technology is critical to improving patient outcomes and operational efficiency. Oracle Cloud Infrastructure (OCI) offers a comprehensive suite of solutions designed to transform healthcare delivery by Improved data management Strong security guarantees and advanced analytics This article dives into how OCI drives better healthcare outcomes and why…

Mastering React: A Comprehensive Learning Guide

Claudio Ctin1 week ago1 week ago

Welcome to a thorough journey through React and its ecosystem! This guide will provide you with a structured path for mastering React, from understanding basic concepts to exploring advanced topics in React and associated technologies. Getting Started with React React is a powerful JavaScript library for building user interfaces, mainly for single-page applications where you…