PSA: Don't use std::rand()

With the <random> header in C++11 onwards, there really is no reason to use std::rand() to generate random numbers. In fact, using std::rand() could be really harmful.

std::rand() generates random numbers using a linear congruential engine. This is a simple formula – $x_{n+1} = (a x_n + b) \mod c$. The trouble with such a simple engine is that the numbers generated have very low entropy.

But I really don’t care about entropy, it’s not like I’m writing a cryptography application, you say? Well, let me tell you a cautionary tale about how using std::rand() sent me on a wild goose chase for over a week trying to hunt down a bug miles from where it really was.

Here’s the story. I was writing a piece of code that was supposed to return events at a random time, somewhere between x and y cycles in the future. I did not really care that the random numbers generated were good, I just wanted something quick and dirty. So, I ended up using code that looked like this:

time_type new_event_time = get_current_time() + (std::rand() % (y - x)) + x;

Looks good, and behaved reasonably well for almost all the test cases I threw at the problem. However, one specific test case was really sensitive to the order in which events returned, but only if the number of pending events was larger than a certain number z. This test started to fail, as is to be expected. However, the failures happened a great deal into the future, which suggested some weird errors in bookkeeping that would manifest after a certain number of insertions and deletions. This forced me to go on a bug-hunting spree that revealed absolutely nothing. Then, I discovered a talk by STL, which is the inspiration for this post as well. In this talk, STL shows how bad the linear congruential engine in the std::rand() function really is.

So, I simply replaced the random number generation code. The std::srand() equivalent code is

std::random_device rd;
std::mt19937 mt(rd());

std::uniform_int_distribution dist(x, y);

and the equivalent code for

time_type new_event_time = get_current_time() + (std::rand() % (y - x)) + x;

simplifies to

time_type new_event_time = get_current_time() + dist(mt);

Simple, isn’t it? With this new piece of code, my test began to fail much earlier, which prompted me to discover the real source of the bug.

I was intimidated by the new C++11 random number generation library because it requires the creation of a random device, a pseudo-random engine and a distribution, which is three more than a call to std::rand(). However, the low quality of std::rand() makes it unsuitable for even the simplest tasks requiring random numbers. My advice is, don’t ever use std::rand() and think you can get away with it.

Related

comments powered by Disqus