
Math - Random in V8

The document summarizes how the author and their company realized that Math.random() in V8 JavaScript engine was flawed and led to collisions in randomly generated identifiers. They were generating random 22-character strings from a 64-character alphabet to use as identifiers, but received an email that two identical identifiers had been generated, which should be virtually impossible. Upon investigating pseudorandom number generators (PRNGs) more, they discovered that Math.random() in V8 has a small internal state and periodic behavior that limits the possible random values it can generate, bounding the number of distinct identifiers possible.

Uploaded by

Barrel Roll
TIFU by using Math.random()
By Betable CTO, Mike Malone

Like most good TIFUs, this didn't happen today. It actually happened about
two years ago. It is, however, still relevant. More significantly, it's not just
me who screwed up. Math.random() in the V8 JavaScript engine is screwed
up, too.
"Many random number generators in use today are not very good. There is a
tendency for people to avoid learning anything about such subroutines; quite
often we find that some old method that is comparatively unsatisfactory has
blindly been passed down from one programmer to another, and today's users
have no understanding of its limitations."

Donald Knuth, The Art of Computer Programming, Volume 2


By the end of this post I'm hoping we'll all agree on two things, one of which
may be slightly more controversial than the other:

- We were stupid not to understand the limitations of V8's PRNG before
  using it, and CSPRNGs are a safer option if you're feeling lazy.
- The implementation of Math.random() in V8 should be replaced. The
  current algorithm, which appears to have been passed down from one
  programmer to another, is comparatively unsatisfactory (and arguably
  completely broken) due to subtle, non-intuitive degenerate behavior that
  is likely to be encountered under realistic circumstances.

For the record, I do want to say that I think V8 is a very impressive piece of
software and the people who work on it are clearly very talented. This isn't an
indictment of any of them. Rather, it's an illustration of how subtle some
aspects of software development can be.

Now that we all know where we're headed, let's begin at the beginning.
.

Betable is built on random numbers. Aside from other more obvious uses,
we like using randomly generated identifiers. Our architecture is
distributed and microservice-y, and random identifiers are easier to
implement than sequential identifiers in this sort of system.

For example, we generate random request identifiers whenever we receive an
API request. We thread these identifiers through to sub-requests in headers,
log them, and use them to collate and correlate all of the things that
happened, across all of our services, as a result of a single request.

Generating random identifiers isn't rocket science. There's only one
requirement:

It must be really, really unlikely that the same identifier will ever be
generated twice, causing a collision.

And there are just two factors that impact the likelihood of a collision:

1. The size of the identifier space: the number of unique identifiers that are
   possible
2. The method of identifier generation: how an identifier is selected from
   the space of all possible identifiers

Ideally we want a big identifier space from which identifiers are selected at
random from a uniform distribution (henceforth, we'll assume that anything
done "at random" uses a uniform distribution).
We did the birthday paradox math and settled on making our request
identifiers 22-character words with each character drawn from a 64-character
alphabet. They look like EB5iGydiUL0h4bRu1ZyRIi or
HV2ZKGVJJklN0eH35IgNaB. Each character in the word has 64 possible
values and there are 22 characters, so there are 64^22 such words. That makes
the size of our identifier space 64^22, or ~2^132.

With 2^132 possible values, if identifiers were randomly generated at the rate
of one million per second for the next 300 years, the chance of a collision
would be roughly 1 in six billion.
So we've got a big enough identifier space, but how do we generate identifiers
at random? The answer is a decent pseudo-random number generator
(PRNG), a common feature of many standard libraries. The top of our API
stack is a Node.js service (we also use a lot of Go, but that's another blog
post). Node.js, in turn, uses the V8 JavaScript engine that Google built for its
Chrome web browser. All compliant ECMAScript (JavaScript)
implementations must implement Math.random(), which takes no arguments
and returns a random number between 0 and 1. Good start.

So, given a sequence of pseudo-random numbers between 0 and 1, we need to
generate a random word with characters from our 64-character alphabet. This
is a pretty common problem. Here's the pretty standard solution we chose:

    var ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-'; // source line is cut off after '_'; the final '-' is assumed

    var random_base64 = function random_base64(length) {
        var str = "";
        for (var i = 0; i < length; ++i) {
            var rand = Math.floor(Math.random() * ALPHABET.length); // scale [0, 1) to an index in 0..63
            str += ALPHABET.substring(rand, rand + 1);
        }
        return str;
    }

Before you start picking it apart, there's nothing wrong with this code: it
does exactly what it's supposed to do.

We're in business. A procedure for producing random identifiers that is
extremely unlikely to produce a collision, even if we're producing a million a
second for 300 years. Test, commit, push, test, deploy.

The above code hit production and we forgot about it until a rather alarming
email came through from a colleague, Nick Forte, telling us that the
impossible had happened.

"Anyone who considers arithmetical methods of producing random digits
is, of course, in a state of sin." John von Neumann was stating the
obvious here: it's a simple tautology that deterministic methods, like
arithmetic, cannot produce random bits. It's a contradiction in terms. So what
is a PRNG?

What is a PRNG?

Let's ground our discussion by looking at a simple PRNG and the random
numbers it produces.
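
As a stand-in for the kind of simple generator discussed here, the following is a minimal sketch of a 4-bit linear congruential generator. The multiplier 5 and increment 1 are assumptions, picked only because they make the generator visit all 16 states; the name makeLcg4 is hypothetical as well.

```javascript
// A 4-bit linear congruential generator (LCG).
// The constants 5 and 1 are assumed values, not taken from the article.
function makeLcg4(seed) {
  let state = seed & 0xF; // the entire internal state is 4 bits
  return function next() {
    state = (5 * state + 1) % 16; // next state is a fixed function of the current one
    return state;
  };
}

const next = makeLcg4(0);
const outputs = [];
for (let i = 0; i < 20; i++) {
  outputs.push(next());
}
console.log(outputs.join(' '));
// prints: 1 6 15 12 13 2 11 8 9 14 7 4 5 10 3 0 1 6 15 12
```

After 16 outputs the sequence starts over, which is easy to spot in the printed values.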

This should make von Neumann's point clear: the sequence of numbers
generated by this algorithm is obviously not random. For most purposes, this
non-randomness doesn't really matter. What we really need is an algorithm
that can generate a sequence of numbers that appear to be random
(technically, they should appear to be independent and identically distributed
random variables having a uniform distribution over the generator's range).
In layman's terms, we should be able to safely pretend that the pseudo-random
numbers are truly random numbers.

If it's hard to distinguish a generator's output from truly random sequences we
call it a high quality generator. If it's easy, we call it a low quality generator.
For the most part, quality is determined empirically by pulling a bunch of
numbers from the generator and running some statistical tests for
randomness (e.g., by checking that there are an equal number of 0 bits and 1
bits, figuring out how many collisions there are, doing a Monte Carlo
estimation of pi, etc.). Another, more pragmatic measure of PRNG quality is
how well the algorithm actually works in practice as a stand-in for truly
random numbers.
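
To make two of those empirical checks concrete, here is a rough sketch of a bit-balance check and a Monte Carlo estimate of pi. The function names and sample counts are illustrative assumptions; real test batteries are far more thorough, and passing these two checks says very little on its own.

```javascript
// Fraction of 1 bits in the top 32 bits of each output; ideally close to 0.5.
function onesFraction(random, samples) {
  let ones = 0;
  for (let i = 0; i < samples; i++) {
    let word = Math.floor(random() * 4294967296); // scale [0, 1) to a 32-bit integer
    while (word > 0) {
      ones += word % 2;
      word = Math.floor(word / 2);
    }
  }
  return ones / (32 * samples);
}

// Monte Carlo estimate of pi: the share of random points in the unit square
// that land inside the quarter circle of radius 1 approaches pi / 4.
function estimatePi(random, samples) {
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = random();
    const y = random();
    if (x * x + y * y < 1) inside++;
  }
  return 4 * inside / samples;
}

console.log(onesFraction(Math.random, 50000));
console.log(estimatePi(Math.random, 200000));
```

Any function returning numbers in [0, 1) can be passed in place of Math.random, so the same checks can be pointed at other generators.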
Apart from its non-randomness, our simple algorithm demonstrates another
important characteristic of all PRNGs. If you keep pulling numbers, it will
eventually start repeating the same sequence over again. This is called
periodicity, and all PRNGs are periodic.

The period or cycle length of a PRNG is the length of the sequence of numbers
that the PRNG generates before repeating.

You can think of a PRNG as a highly compressed codebook containing a
sequence of numbers, the kind a spy might use as a one-time pad. The seed is
the starting position in the book. Eventually, you'll loop around the end of the
book and get back to where you started. That's a cycle.

A long cycle length doesn't guarantee high quality, but it helps. Often, cycle
length can be guaranteed by mathematical proof. Even when we can't
calculate cycle length exactly, we can determine an upper bound. Since a
PRNG's next state and output are both deterministic functions of its current
state, the cycle length cannot be larger than the number of possible states. To
achieve this maximum cycle length the generator must enter every possible
state before returning, again, to its current state.

If a PRNG's state has a k-bit representation, the cycle length is less than or
equal to 2^k. A PRNG that actually achieves this maximum cycle length is called
a full-cycle generator.

Good PRNGs are designed so that their cycle length is close to this upper
bound. Otherwise you're wasting memory.

Let's go one step further and analyze the number of distinct random values
that a PRNG can produce through some deterministic transformation on its
output. For instance, consider the problem of generating triples of random
values between 0 and 15, like (2, 13, 4) or (5, 12, 15). There are 16^3 or 4096
such triples, but our simple PRNG can only produce 16 of them.

It turns out that this is another general characteristic of PRNGs:

The number of distinct values that can be generated from a pseudo-random
sequence is bounded by the sequence's cycle length.

The same holds regardless of the sort of value we're producing: we can only
produce 16 distinct tuples of four values (or any other length), 16 distinct
array shuffles, 16 random fuzz test values, etc. We will always only be able to
generate 16 distinct values of any type. Ever.
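
The bound is easy to check by brute force. The sketch below assumes a 4-bit LCG with multiplier 5 and increment 1 (assumed constants, chosen only because they give a full cycle of 16) and counts the distinct triples it can ever produce.

```javascript
// 4-bit LCG with assumed constants; it visits all 16 states before repeating.
function makeLcg4(seed) {
  let state = seed & 0xF;
  return function next() {
    state = (5 * state + 1) % 16;
    return state;
  };
}

const next = makeLcg4(0);
const seen = new Set();
const order = [];
for (let i = 0; i < 64; i++) {
  const triple = [next(), next(), next()].join(',');
  seen.add(triple);   // collect distinct triples
  order.push(triple); // remember the order they appeared in
}
console.log(seen.size);               // prints: 16
console.log(order[16] === order[0]);  // prints: true
```

Out of 4096 possible triples only 16 ever appear, and the 17th triple drawn is a copy of the first.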
Remember our algorithm for producing random identifiers? We randomly
generated words consisting of 22 characters drawn from a 64-character
alphabet. In other words, we generated tuples of 22 numbers between 0 and
63. It's the same problem, and the number of distinct identifiers we can
produce is bounded by the size of the PRNG's internal state, and its cycle
length, in the same way.

Math.random()

Back to our problem. In response to Nick's email about identifier collisions we
quickly reviewed our birthday paradox math and checked our scaling code.
We couldn't find anything wrong, so the problem had to be deeper. Armed
with our general knowledge of PRNGs, let's start digging.

The ECMAScript standard says the following about Math.random():

"Returns a Number value with positive sign, greater than or equal to 0 but less
than 1, chosen randomly or pseudo randomly with approximately uniform
distribution over that range, using an implementation-dependent algorithm or
strategy."

The specification leaves a lot to be desired. First, it doesn't mention anything
about precision. Since ECMAScript Numbers are IEEE 754 binary64
double-precision floats, we might expect 53-bit precision (i.e., random values
taking the form x/2^53 for all x in 0..2^53-1). Mozilla's SpiderMonkey engine
seems to agree but, as we'll soon find out, V8's Math.random() only has 32-bit
precision (i.e., values taking the form x/2^32 for all x in 0..2^32-1). Good to
know, but no matter, we only need six bits to produce a random letter from our
64-character alphabet.

What does matter for us is that the specification leaves the actual algorithm up
to implementors. It has no minimum cycle length requirement, and is
hand-wavy about the PRNG's quality: the distribution just needs to be
"approximately uniform." To do our analysis we need to know what algorithm
V8 uses. It's nowhere to be found in any documentation, so let's go to the
source.

The V8 PRNG

"I would have gone with mersenne twister since it is what everyone else
uses (python, ruby, etc)." This short critique, left by Dean McNamee, is the
only substantive feedback on the code review of V8's PRNG when it was first
committed on June 15, 2009. Dean's recommendation is the same one I'll
eventually get around to making in this post.

V8's PRNG code has been tweaked and moved around over the past six years.
It used to be native code, now it's in user-space, but the algorithm has
remained essentially the same. The actual implementation uses internal API
and is a bit obfuscated, so let's look at a more readable implementation of the
same algorithm:

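    var MAX_RAND = Math.pow(2, 32);
    var state = [seed(), seed()]; // seed() is not shown in the source; it supplies two 32-bit starting states

    var mwc1616 = function mwc1616() {
        var r0 = (18030 * (state[0] & 0xFFFF)) + (state[0] >>> 16) | 0; // first MWC sub-generator
        var r1 = (36969 * (state[1] & 0xFFFF)) + (state[1] >>> 16) | 0; // second MWC sub-generator
        state = [r0, r1];

        var x = ((r0 << 16) + (r1 & 0xFFFF)) | 0; // concatenate the low 16 bits of each
        if (x < 0) {
            x = x + MAX_RAND; // reinterpret the signed 32-bit value as unsigned
        }
        return x / MAX_RAND; // scale to [0, 1)
    }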

Well, that's still pretty opaque, but let's slog through.

There is one more clue. A comment in older versions of the V8 source stated
simply: "random number generator using George Marsaglia's MWC
algorithm." A few minutes with Google teaches the following:

- George Marsaglia was a mathematician who spent much of his career
  studying PRNGs. He also created the original Diehard battery of
  statistical tests for measuring the quality of a random number generator.
- MWC stands for multiply-with-carry, a class of PRNGs that Marsaglia
  invented. MWC generators are very similar to classic linear congruential
  generators (LCGs) like our simple example from earlier (in fact, there is a
  one-to-one correspondence between MWCs and LCGs, see section 3.6 of
  this paper for details). Their advantage over standard LCGs is that they
  can produce sequences with longer cycle lengths with about the same
  number of CPU cycles.

So if you're going to crib a PRNG off of someone, Marsaglia seems like a good
choice, and MWC seems like a reasonable algorithm.

The V8 algorithm doesn't look like a typical MWC generator though. It turns
out that's because the V8 algorithm is not an MWC generator. It's two MWC
sub-generators, one on line 5 (r0), the other on line 6 (r1), combined to
produce one random number on line 9 (x). I'll spare you the math, but each of
the sub-generators has a prime cycle length of about 2^30, making the combined
cycle length of the generated sequence about 2^60.

If you'll recall, we have 2^132 possible identifiers, but now we know that V8's
Math.random() can only produce 2^60 of them. Still, assuming a uniform
distribution, the probability of collision after randomly generating
100,000,000 identifiers should be less than 0.4%. We started seeing collisions
after generating far fewer identifiers than that. Something must be wrong
with our analysis. The cycle length estimate is provably correct, so we must
not have a uniform distribution: there must be some additional structure to
the sequence being generated.

A Tale of Two Generators

Before returning to the V8 PRNG, let's look one more time at our random
identifier generation code:

    var ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-'; // source line is cut off after '_'; the final '-' is assumed

    var random_base64 = function random_base64(length) {
        var str = "";
        for (var i = 0; i < length; ++i) {
            var rand = Math.floor(Math.random() * ALPHABET.length); // scale [0, 1) to an index in 0..63
            str += ALPHABET.substring(rand, rand + 1);
        }
        return str;
    }

The scaling method on line 6, Math.floor(Math.random() * ALPHABET.length), is
important. This is the method that MDN recommends for scaling a random number,
and it's in widespread use in the wild. It's called the multiply-and-floor
method, because that's what it does. It's also called the take-from-top method,
because the lower bits of the random number are truncated, leaving the
left-most or top bits as our scaled integer result. (Quick note: it's subtle,
but in general this method is slightly biased if your scaled range doesn't
evenly divide your PRNG's output range. A general solution should use rejection
sampling like this, which is part of the standard library in other languages.)
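
For readers who want to see what rejection sampling looks like, here is a minimal Node.js sketch. The names random32 and randomInt are hypothetical, the use of crypto.randomBytes as the 32-bit source is an assumption, and the reduction uses a modulo after rejection rather than the take-from-top scaling described above.

```javascript
const crypto = require('crypto');

// Default 32-bit source. Using crypto.randomBytes here is an assumption;
// any function returning uniform integers in [0, 2^32) would work.
function random32() {
  return crypto.randomBytes(4).readUInt32BE(0);
}

// Returns an unbiased integer in [0, n) for 1 <= n <= 2^32.
function randomInt(n, source = random32) {
  const range = 4294967296;          // 2^32 possible source values
  const limit = range - (range % n); // largest multiple of n that fits in the range
  let value;
  do {
    value = source();
  } while (value >= limit);          // reject the leftover values that would cause bias
  return value % n;
}

console.log(randomInt(64)); // an index into a 64-character alphabet
```

When n divides 2^32 evenly, as 64 does, nothing is ever rejected; for other values of n the loop occasionally draws again instead of favoring some results.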

Do you see the problem yet? What's weird about the V8 algorithm is how the
two generators are mixed. It doesn't xor the numbers from the two streams
together. Instead, it simply concatenates the lower 16 bits of output from the
two sub-generators. This turns out to be a critical flaw. When we multiply
Math.random() by 64 and floor it, we end up with the left-most, or top 6 bits of
the number. These top 6 bits come exclusively from the first of the two
MWC sub-generators.
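
A small sketch can confirm this. It reproduces only the output stage of the generator (the combine function is a hypothetical name, and the sample state values are arbitrary) and shows that the scaled index is the same whatever the second sub-generator contributes.

```javascript
// Output stage only: concatenate the low 16 bits of r0 and r1, scale to [0, 1).
function combine(r0, r1) {
  const x = ((r0 << 16) + (r1 & 0xFFFF)) >>> 0; // unsigned 32-bit result
  return x / 4294967296;
}

const r0 = 0x12345678; // arbitrary example state for the first sub-generator
for (const r1 of [0, 0xCAFEBABE, 0xFFFFFFFF]) {
  const index = Math.floor(combine(r0, r1) * 64);
  // Bits 10..15 of r0 become the top 6 bits of the output.
  console.log(index, (r0 >>> 10) & 0x3F); // prints "21 21" each time
}
```

Changing r1 never changes the index, so every character of an identifier is driven by the first sub-generator alone.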

Bits from PRNG #1 are red, bits from PRNG #2 are blue.

If we analyze the first sub-generator independently we see that it has 32 bits
of internal state. It's not a full-cycle generator: its actual cycle length is
about 590 million (18,030*2^15-1; the math is tricky but it's explained here and
here, or you can just trust me). So we can only produce a maximum of 590 million
distinct request identifiers with this generator. If they were randomly selected
there would be a 50% chance of collision after generating just 30,000
identifiers.
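
The 30,000 figure follows from the usual birthday approximation, which the sketch below evaluates. The function names are hypothetical and the formulas are the standard approximations, not exact probabilities.

```javascript
// Approximate probability of at least one collision among n uniform draws
// from a space of `size` values: 1 - exp(-n(n-1) / (2 * size)).
function collisionProbability(n, size) {
  return -Math.expm1(-n * (n - 1) / (2 * size));
}

// Approximate number of draws at which the collision probability reaches p.
function drawsForProbability(p, size) {
  return Math.sqrt(-2 * size * Math.log1p(-p));
}

const period = 18030 * Math.pow(2, 15) - 1; // 590,807,039
console.log(Math.round(drawsForProbability(0.5, period)));   // roughly 28,600
console.log(collisionProbability(30000, period).toFixed(2)); // roughly 0.53
```

So with only about 590 million possible identifiers, even-odds of a collision arrive after a few tens of thousands of uniformly random draws.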

If that were true, we should have started seeing collisions almost immediately.
To understand why we didn't, recall our simple example where we pulled
triples from a 4-bit LCG. Birthday paradox math doesn't apply for this
application: the sequence is nowhere near random, so we can't pretend it is.
It's clear that we won't produce a duplicate until the 17th triple. The same
thing is happening with the V8 PRNG and our random identifiers: under certain
conditions, the PRNG's lack of randomness is making it less likely that we'll
see a collision.

In this case the generator's determinism worked in our favor, but that's not
always true. The general lesson here is that, even for a high quality PRNG, you
can't assume a random distribution unless the generator's cycle length is much
larger than the number of random values you're generating. A good general
heuristic is:

If you need to use n random values you need a PRNG with a cycle length of at
least n^2.

The reason is that, within a PRNG's period, excessive regularity can cause poor
performance on some important statistical tests (in particular, collision tests).
To perform well, the sample size n must be proportional to the square root of
the period length. Page 22 of Pierre L'Ecuyer's excellent chapter on random
number generation has more detail.

For a use case like ours, where we're trying to generate unique values using
multiple independent sequences from the same generator, we're less
concerned about statistical randomness and more concerned that the
sequences not overlap. If we have n sequences of length l from a generator
with period p, the probability of an overlap is 1 - [1 - nl/(p-1)]^(n-1), or
approximately ln^2/p for a big enough p (see here and here for details). The
point is we need a long cycle length. Otherwise we're making a mistake
pretending our sequence is random.
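
The sketch below evaluates that overlap estimate, reading the expression as one minus the probability that none of the n sequences overlap, which is the reading that agrees with the ln^2/p approximation. The function names and the example figures (1,000 sequences of one million values each, period 2^60) are purely illustrative assumptions.

```javascript
// Probability that at least two of n sequences of length l overlap,
// for a generator with period p: 1 - (1 - n*l / (p - 1))^(n - 1).
// log1p and expm1 keep precision when the probability is tiny.
function overlapProbability(n, l, p) {
  return -Math.expm1((n - 1) * Math.log1p(-(n * l) / (p - 1)));
}

// The simpler approximation l * n^2 / p, valid for large p.
function overlapApprox(n, l, p) {
  return l * n * n / p;
}

const p = Math.pow(2, 60);
console.log(overlapProbability(1000, 1e6, p));
console.log(overlapApprox(1000, 1e6, p));
```

The two values agree closely for a period this large, and both grow with the square of the number of sequences, which is why a long cycle length matters.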

Long story short, if you're using Math.random() in V8 and you need a
sequence of random numbers that's reasonably high quality, you shouldn't use
more than about 24,000 numbers. If you're generating multiple streams of
any substantial size and don't want any overlap, you shouldn't use
Math.random() at all.

If the algorithm that V8's Math.random() uses is poor quality, you might be
wondering how it was chosen at all. Let's see if we can find out.

A Brief History of MWC1616

"The MWC generator concatenates two 16-bit multiply-with-carry
generators [...] has period about 2^60 and seems to pass all tests of
randomness. A favorite stand-alone generator, faster than KISS, which
contains it." That's the extent of Marsaglia's analysis of MWC1616, which is
the name of the algorithm that powers V8's Math.random(). If you take him at
his word, the algorithm ticks the box for most of the important criteria you'd
consider in choosing a PRNG.

MWC1616 was first introduced by Marsaglia in 1997 as a simple general
purpose generator that, in his words, "seems to pass all tests of randomness
put to it," a comment that betrays Marsaglia's largely empirical methodology.
He seems to have trusted an algorithm if it passed his Diehard tests.

Unfortunately, the Diehard tests he was using in the late 1990s weren't that
good, at least by today's standards. If you run MWC1616 through a more
modern empirical testing framework like TestU01's SmallCrush it fails
catastrophically (it does even worse than the MINSTD generator, which was
outdated even in the 1990s, but Marsaglia's Diehard tests probably didn't have
the granularity to tell him that).

    // January 12, 1999 / V8 PRNG: ((r0 << 16) + (r1 & 0xFFFF)) mod 2^32
    var x = ((r0 << 16) + (r1 & 0xFFFF)) | 0;

    // January 20, 1999: ((r0 << 16) + r1) mod 2^32
    var x = ((r0 << 16) + r1) | 0;

As far as I know, there's no mathematical basis for combining sub-generators
the way MWC1616 does, concatenating subsets of the generated bits. It's
more typical to see bits from sub-generators mixed using some form of modulo
arithmetic (e.g., addition modulo 2^32, or xor). It appears that Marsaglia,
himself, became aware of this deficiency shortly after posting MWC1616 on
Usenet as a component of one version of his KISS generator. On January 12,
1999, Marsaglia posted the version of MWC1616 used in V8. Eight days later,
on January 20, he posted a different version of the algorithm. It's subtle, but
in the updated version, the upper bits of the second generator are not masked
away, mixing bits from the two sequences more thoroughly.

Both versions of the algorithm appear in other places, adding to the
confusion. The January 20 version of MWC1616 (i.e., the better version) is in
Numerical Recipes, labeled "MWC with Base b = 2^16," under the heading "When
You Have Only 32-Bit Arithmetic," and only after first advising that, rather
than implementing one of the algorithms listed, you should "get a better
compiler!" Pretty discouraging words for an algorithm that's better than what
V8 has powering Math.random(). Rather inexplicably (because it's so obscure)
the January 20 version of MWC1616 is also given as an example computational
method in Wikipedia's article on random number generation.

Implementations of the older January 12 version are included in TestU01
twice, once labeled MWC1616 and a second time labeled MWC97R. It's also
one of the generators available in R (apparently it used to be the default).

So there are lots of places the algorithm can be found. It was obscure to me,
but given the bona fides listed above I guess it's not surprising it was chosen.
Hopefully this article will serve as a warning, strengthening and confirming
Knuth's observation that kicked off this post:

- In general, PRNGs are subtle and you should do your own analysis and
  understand the limitations of any algorithm you're implementing or
  using
- Specifically, don't use MWC1616, it's not very good

There are lots of better options. Let's look at a couple of them.


The CSPRNG Workaround

To fix our identifier code we needed a replacement for Math.random() and we
needed it fast. Lots of alternative PRNG implementations exist for JavaScript,
but we were looking for something that:

- Has a long enough period to generate all of our 2^132 identifiers
- Is well supported and battle tested

Luckily, the Node.js standard library has another PRNG that meets both
requirements: crypto.randomBytes(), a cryptographically secure PRNG
(CSPRNG) that calls OpenSSL's RAND_bytes (which, according to the docs,
produces a random number by generating the SHA-1 hash of 8184 bits of
internal state, which it regularly reseeds from various entropy sources). If
you're in a web browser, crypto.getRandomValues() should do the same job.
This isn't a perfect general solution for three reasons:

- CSPRNGs almost always use non-linear transformations and are
  generally slower than non-cryptographic alternatives
- Many CSPRNG systems are not seed-able, which makes it impossible to
  create a reproducible sequence of values (e.g., for testing)
- CSPRNGs emphasize unpredictability over all other measures of quality,
  some of which might be more important for your use case

However:

- Speed is relative, and CSPRNGs are fast enough for most use cases (I can
  get about 100MB/s of random data from crypto.getRandomValues() in
  Chrome on my machine)
- In the limit, unpredictability implies an inability to distinguish the
  generator's output from true randomness, which implies everything else
  we want out of a pseudo-random sequence
- It's likely that a generator advertised as "cryptographically secure" has
  been carefully code reviewed and subjected to many empirical tests of
  randomness

We're still making some assumptions, but they are evidence-based and
pragmatic. If you're unsure about the quality of your non-cryptographic
alternatives, and unless you need deterministic seeding or require rigorous
proofs of quality measures, using a CSPRNG is your best option. If you don't
trust your standard library's CSPRNG (and you shouldn't for cryptographic
purposes) the right solution is to use urandom, which is managed by the
kernel (Linux uses a scheme similar to OpenSSL's, OS X uses Bruce Schneier's
Yarrow generator).

I can't tell you the exact cycle length of crypto.randomBytes() because as far as
I know there's no closed form solution to that problem (i.e., no one knows).
All I can say is that with a large state space and a continuous stream of new
entropy coming in, it should be safe. If you trust OpenSSL to generate your
public/private key pairs then it doesn't make much sense not to trust it here.
Empirically, once we swapped our call to Math.random() with a call to
crypto.randomBytes() our collision problem went away.
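
The post doesn't show the replacement code, so the following is only a sketch of what such a swap might look like in Node.js. The function name randomBase64 and the exact alphabet (in particular its last two characters) are assumptions.

```javascript
const crypto = require('crypto');

// 64-character alphabet; the exact character set and order are assumptions.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-';

function randomBase64(length) {
  const bytes = crypto.randomBytes(length); // one CSPRNG byte per character
  let str = '';
  for (let i = 0; i < length; i++) {
    // 256 is a multiple of 64, so keeping the low 6 bits introduces no bias.
    str += ALPHABET[bytes[i] & 63];
  }
  return str;
}

console.log(randomBase64(22)); // a 22-character identifier
```

Each character consumes a full byte and throws away two bits, which wastes a little entropy but keeps the mapping simple and uniform.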

In fact, Chrome could just have Math.random() call the same CSPRNG they're
using for crypto.randomBytes(), which appears to be what WebKit is doing.

That said, there are lots of fast, high quality non-cryptographic alternatives,
too. Let's put a final nail in the MWC1616 coffin and take a look at some other
options.

V8's PRNG is Comparatively Unsatisfactory

My goal was to convince you that V8's Math.random() is broken, and should
be replaced. So far we've found obvious structural patterns in its output bits,
catastrophic failure on empirical tests, and poor performance in the real
world. If you still want more evidence, here are some pretty pictures that
might sway you.

Random noise from Safari (left) and V8 (right), generated in browser with this code.

Monte-Carlo Estimate of PI after iterations using this code.

Hopefully you'll agree at this point that V8's Math.random() is comparatively
unsatisfactory and should be fixed. The question is how should it be fixed? A
one line patch would improve the bit-mixing, but I can't see any reason to
keep MWC1616 at all. There are better options.

A detailed comparison of the myriad existing methods for producing
pseudo-random bits is going to have to wait for another post. Roughly, though,
the requirements we're looking for are:

- A large state space, and a large seed: ideally at least 1024 bits, since this
  will be an upper bound on other qualities of the generator. A state space
  of 2^1024 is enough for 99.9% of use cases, with a significant safety factor.
- Speed: let's make it at least as fast as the current implementation, which
  produces around 25 million numbers per second on my machine.
- Memory efficiency: we'll probably need at least 256 bytes for a
  generator with 1024 bits of state in user-space JavaScript (we can only
  use 32 bits per 64-bit Number). If that's infeasible there are workarounds,
  but I'm going to assume we can afford this.

- A very long period: a full-cycle generator is great but anything over 2^100
  should be sufficient to avoid cycling. Anything over 2^200 should let us
  safely pull 2^100 values while continuing to pretend we've got a random
  sequence.

- At a minimum, passing marks on empirical tests of randomness like
  TestU01's SmallCrush, with extra credit for passing marks on BigCrush and
  good equidistribution (which, unfortunately, I don't have room to
  explain). TestU01 is more rigorous than Dieharder, and I don't know
  much about the NIST tests or rngtest, but those might work too.

There are many PRNG algorithms that meet or exceed these requirements.
Xorshift generators (also discovered by Marsaglia) are fast and do very well
on statistical tests (much better than MWC1616). An xorshift variant called
xorgens4096 has been implemented in JavaScript by David Bau. It has a
4096-bit state space, a cycle length of ~2^4096, and it runs faster than
MWC1616 in Chrome on my machine. Moreover, it has no systematic failures on
BigCrush.

Recently it's been shown that taking the output of an xorshift generator and
multiplying by a constant is a sufficient non-linear transformation for the
generator to pass BigCrush. This class of generators, called xorshift*, is very
fast, easy to implement, and memory efficient. The xorshift1024* generator
meets or exceeds all of our requirements. If the memory premium turns out to
be a real problem, the xorshift64* generator has the same memory footprint,
a longer cycle length, and is faster than MWC1616, beating it on all counts.
Another new family of linear/non-linear hybrid generators called PCG claims
similar performance and quality characteristics.
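
To give a feel for how small an xorshift* generator is, here is a sketch of xorshift64* written with BigInt for clarity. The shift amounts (12, 25, 27) and the multiplier are the constants commonly published for this generator, but treat them as assumptions to verify; the function names are hypothetical, and a BigInt version like this illustrates the algorithm only, not the speed discussed above.

```javascript
const MASK64 = (1n << 64n) - 1n;        // keeps values within 64 bits
const MULTIPLIER = 2685821657736338717n; // assumed xorshift64* output multiplier

function makeXorshift64Star(seed) {
  let state = BigInt.asUintN(64, BigInt(seed));
  if (state === 0n) throw new RangeError('seed must be nonzero');
  return function next() {
    state ^= state >> 12n;
    state = (state ^ (state << 25n)) & MASK64;
    state ^= state >> 27n;
    return (state * MULTIPLIER) & MASK64; // the multiply is the non-linear step
  };
}

// Use the top 53 bits to build a double in [0, 1).
function toUnitDouble(value) {
  return Number(value >> 11n) / 9007199254740992; // divide by 2^53
}

const next64 = makeXorshift64Star(42);
console.log(toUnitDouble(next64()));
```

Unlike a CSPRNG, this generator is seedable, so the same seed always reproduces the same sequence.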

So there are lots of good algorithms to choose from. That said, the safest choice
is probably a standard Mersenne Twister. The most popular variant,
MT19937, was introduced in the late 90s. Since then it's become the standard
generator in dozens of software packages. It's not perfect, but it has been
battle tested and thoroughly analyzed. Its properties are well understood, and
it does well on empirical tests. With an ostentatiously long cycle length of
2^19937-1 it's hard to misuse, but it does have an imposing 2KB state space and
is criticized for its memory footprint and relatively poor performance. A quick
search uncovered an existing JavaScript implementation, by Sean
McCullough. Inexplicably, it's as fast as the existing Math.random()
implementation in Chrome on my machine.

So my advice is that V8 reconsider Dean McNamee's comment from six years
ago and use the Mersenne Twister algorithm. It's fast enough, and robust
enough to be safely used by developers who don't have a deep understanding
of how PRNGs work. A more exotic alternative is fine too. Just get rid of
MWC1616, please!

In Summary

This was a long post, so let me recap the important bits:

- The PRNG algorithm behind V8's Math.random() is called MWC1616. If
  you're only using the most significant 16 bits it has a very short effective
  cycle length (less than 2^30). In general, it does poorly on empirical tests of
  quality. For most non-trivial use cases it's not safe to pretend that its
  output is truly random. Be careful using it for anything you care about.
- Cryptographically secure PRNGs are a better option if you don't have
  time to do proper diligence on non-cryptographic alternatives. The safest
  option (and the right solution for crypto) is to use urandom. In the browser
  you can use crypto.getRandomValues().
- There are options for non-cryptographic PRNG algorithms that are faster
  and higher quality than MWC1616. V8 should replace its Math.random()
  implementation with one of them. There are no losers. Mersenne Twister
  (MT19937) is the most popular, and probably the safest choice.

I'll note, in passing, that Mozilla's use of the LCG from Java's util.Random
package isn't much better than MWC1616. So SpiderMonkey should probably
go ahead and upgrade too.

In the meantime, the browser continues to be a confusing and dangerous
place. Be safe out there!
.

Special thanks to Nick Forte, who discovered the collision bug in our production
code, Wade Simmons, who originally tracked the problem down to V8's
Math.random() implementation, and the entire Betable Engineering team, who
put up with me ranting about random numbers for two weeks while I wrote this
post.
