Categories
C++ R Rcpp

Insert and Remove Performance of Boost’s flat_set v.s. std::set

The standard way to represent an ordered set of numbers is with a binary tree. This offers a good mix of performance properties as the number of elements gets large. In particular it offers O(log(n)) operations insertion/deletion, O(log(n)) operations to find an element. Finding the ith element of the set takes more time, O(n) operations for random access compared to a vector, which has O(1) complexity for that task.

In C++ std::set implements an ordered set as a binary tree, hiding all the ugly details for you. But std::set is not always the right structure to use. Unless you are doing lots of inserts and removals in large sets, binary trees can be inefficient because they have to do things like pointer referencing, and the data is not stored in a continuous block leading to cache misses. This is why some have declared that you shouldn’t use set.

An alternative is to use a sorted std::vector as the set data structure. This improves the performance of random access to O(1), and worsens the performance of insertion/deletion to O(n). There is a handy drop in replacement for std::set in boost called flat_set. You can mostly take any code using set and switch it to flat_set with no changes to the logic.

Recently I was in a situation where I had performance critical code using many small sets (typically between 0 and 20 elements). This code does lots of insertions and deletions, so one would initially think that flat_set is not a good option with its O(n) complexity, but remember that complexity is an asymptotic statement as n grows, and I was relatively certain that my n was small. So which should be used? The only way to find out was to do some simulations.

For each n I generated a set<int> and flat_set<int> with equally spaced integers from 0 to 100,000,000, then I inserted and removed 500,000 random integers in the same range and recorded the timings. The compiler optimization was set to high ( -O3 ).

size  flat_set  std::set
2     0.02791   0.070053
4     0.031647  0.07919
8     0.038431  0.08474
16    0.0528    0.091744
32    0.0697    0.104424
64    0.085957  0.129225
128   0.1176    0.129537
256   0.15825   0.138755
512   0.253153  0.148642
1024  0.401831  0.156223
2048  0.718302  0.166177
4096  1.35207   0.176593
8192  2.5926    0.19331

Times are in seconds here, so for small sets (2-16 elements) flat_set is twice as fast, and beats std::set all the way up though 128 elements. By 4096 elements we are paying the asymptotic cost however and flat_set is >10x slower. flat_set is vector backed, so we know that it will be much faster at other operations like random access, iteration and find because it is in a contiguous block of memory. The surprising thing is that it is even faster at insertions and deletions provided the set size is modest.

If you are an R user, you can use flat_set very easily now with the new BH package. Simply add it as a linkingTo in your package and Bob is your uncle. Below is the code that I used for the simulations (it uses Rcpp but that can easily be taken out).

	
        #include <boost/container/flat_set.hpp>
        #include <set>
        #include <ctime>
        #include <Rcpp.h>
        GetRNGstate();
        std::clock_t start;
	double d1,d2;
	boost::container::flat_set<int> fs;
	std::set<int> ss;
	int n = 500000;
	int max = 100000000;
	for(int i = 1;i<14;i++){
		int s = round(pow(2.0,i));
		d1=0.0;
		d2=0.0;
		fs.clear();
		fs.reserve(s*2);
		ss.clear();
		for(int j=0;j<s;j++){
			int val = round(((j+1)/(double)s)*max);//floor(Rf_runif(0.0,max));
			fs.insert(val);
			ss.insert(val);
		}
		start = std::clock();
		for(int j=0;j<n;j++){
			int rand = floor(Rf_runif(0.0,max));
			bool ins = fs.insert(rand).second;
			if(ins)
				fs.erase(rand);
		}
		d1 += ( std::clock() - start ) / (double) CLOCKS_PER_SEC;

		start = std::clock();
		for(int j=0;j<n;j++){
			int rand = floor(Rf_runif(0.0,max));
			bool ins = ss.insert(rand).second;
			if(ins)
				ss.erase(rand);
		}
		d2 += ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
		std::cout << s << ", " << d1 << ", " << d2 << "," << std::endl;
	}
	PutRNGstate();