Chaos Project

Game Development => Sea of Code => Topic started by: Blizzard on June 14, 2017, 02:06:33 pm

Title: Randomness and distribution
Post by: Blizzard on June 14, 2017, 02:06:33 pm
Those of you who worked with C or C++ probably know that on Win32 systems that the macro RAND_MAX is only 32,767 (0x7FFF). This causes problems if you want random values higher than that since due to the distribution, there are values that you will never get.

e.g. You want 1,000,000. 1,000,000 / 32,767 ~= 30.52. That means that you can get a 0 or a 31, but never anything between these two numbers.

Now, you could take two numbers and sum them up or multiply them, but that breaks the equal distribution of numbers and some values will happen more often than others. It's easiest illustrated with a few small values. Let's take a range from 0 to 2 inclusive. All possible combinations:

0+0, 0+1, 0+2
1+0, 1+1, 1+2
2+0, 2+1, 2+2

If you take a close look at all possible combinations, you will notice that the results are not equally probable. 0 and 4 appear only once, 1 and 3 appear both twice and 2 appears 3 times as a result.

The only way to really handle this issue is to combine the values in a way that preserves the equal distribution. If we simplify the entire thing and say it's only possible to generate 1 bit randomly, we can get the values 0 or 1. Now if we do this procedure N times, we can get a number that consists of N bits and all numbers generated in such a way will always be equally likely to appear and the distribution will be preserved. Since we already have 15 bits (32,767) at our disposal to be generated, we can just generate this value twice and then just combine them by shifting one of the numbers to have a 30 bit value.

I implemented this in our foundation library, you can take a look here at line 46 and 47 (or just find "#define HRAND()" if the code has changed since this post was made): https://github.com/AprilAndFriends/hltypes/blob/master/src/hltypesUtil.cpp
I do casting to int64_t, because I have multiplication and division with another 32-bit int so nothing breaks. Incidentally I think this is the same reason why Microsoft limited RAND_MAX to 32,767 in the first place.