
DOC: refactor BitGenerator docs to allow comparison and reccomendations #13675


Closed

Conversation


@mattip mattip commented May 30, 2019

In order to ground the discussion in #13635, I refactored the BitGenerator documents to try to compare and contrast the alternatives. Some information is missing; help is needed to correct my misunderstandings and fill in the gaps.

.. _`PCG author's page`: http://www.pcg-random.org/
.. _`xorshift, xoroshiro and xoshiro authors' page`: http://xoroshiro.di.unimi.it/
.. _`Random123`: https://www.deshawresearch.com/resources_random123.html

mattip (Member Author)

Moved this information to bit_generators/index.rst and included the top section of that back in here verbatim via an .. include:: directive

.. rubric:: Footnotes

.. [1] More is better. As always with benchmarks, these are rough guides;
   your experience may vary. Win32 is relative to linux64, ``dSFMT``
mattip (Member Author)

This table, which is difficult to format in RST, needs to be filled in (or revamped or removed). The numbers in the speed column are made up. Should we include timings at all here? Should they be relative to a single baseline, or per-OS?

Contributor

I would do speed relative to MT19937, which is the current release default.
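A hypothetical sketch (not from the PR) of how such a relative-speed column could be generated, normalizing each bit generator against MT19937 as suggested above. The helper name `ns_per_draw` is illustrative, and the numbers will vary by machine:

```python
# Rough relative-timing harness for bit generators, normalized to MT19937.
import timeit
from numpy.random import Generator, MT19937, PCG64, Philox

def ns_per_draw(bitgen_cls, n=100_000, reps=5):
    """Approximate nanoseconds per uniform double for one bit generator."""
    g = Generator(bitgen_cls(0))
    t = timeit.timeit(lambda: g.random(n), number=reps)
    return t / (reps * n) * 1e9

base = ns_per_draw(MT19937)
for cls in (MT19937, PCG64, Philox):
    rel = base / ns_per_draw(cls)
    print(f"{cls.__name__}: {rel:.2f}x relative to MT19937")
```

A per-OS table would simply run this harness on each platform; the relative-to-MT19937 convention keeps the columns comparable across machines.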


mattip commented May 30, 2019

There are a number of separate issues here. GitHub does not support threaded discussions, so if the answer is not simple and definitive it should become a separate issue/PR. It is very difficult to follow an issue with tens of comments, especially when people respond by email:

  • Why not just use dSFMT everywhere?
  • When to prefer Philox over ThreeFry and vice versa? If they are indistinguishable, maybe we should include only one of them?
  • When to prefer PCG over Philox/ThreeFry?
  • Should we provide a bit more background on the differences between the various families?

More?


rkern commented May 30, 2019

  • Why not just use dSFMT everywhere?

MT algorithms fail statistical quality tests.

:class:`~.xoshiro512.Xoshiro512` where the `~.xoshiro512.Xoshiro512.jumped`
method is used to advance the state. For very large scale applications --
requiring 1,000+ independent streams -- :class:`~.pcg64.PCG64` or
:class:`~.philox.Philox` are the best choices.
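A minimal sketch of the two stream-creation patterns the snippet describes, using the API that eventually shipped in NumPy >= 1.17 (the class paths in the RST above may not match the final module layout):

```python
from numpy.random import Generator, PCG64, SeedSequence

# Pattern 1: .jumped() returns a bit generator advanced by large,
# non-overlapping strides, giving a handful of independent streams.
bg = PCG64(12345)
streams = [Generator(bg.jumped(i)) for i in range(4)]

# Pattern 2: SeedSequence.spawn() creates child seeds, which scales
# comfortably to 1,000+ independent streams.
children = SeedSequence(12345).spawn(4)
gens = [Generator(PCG64(s)) for s in children]
```

Either pattern avoids the classic mistake of seeding n workers with seed, seed+1, ..., seed+n-1.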
mattip (Member Author)

Should we mention here that MT algorithms fail statistical quality tests? What would be a good phrasing?


@bashtage bashtage Jun 10, 2019


This block is out of date given the recent discussions and should be updated. Ultimately it will depend on what generators are included.


mattip commented May 30, 2019

@bashtage ping; I don't know if you saw this PR.

@charris charris changed the title DOC: refactor BitGeneator docs to allow comparison and reccomendations DOC: refactor BitGenerator docs to allow comparison and reccomendations May 31, 2019

mattip commented Jun 10, 2019

MT algorithms fail statistical quality tests.

@rkern could you suggest a wording that would be appropriate for this summary, or a link for more information? Is this wikipedia page accurate?


rkern commented Jun 10, 2019

That list of advantages and disadvantages is not terribly applicable to us, and inexplicably puts passing "most, but not all, of the TestU01 tests" in the "advantages" category. It's very hard not to pass most of TestU01; the fact that there are smaller, faster PRNGs that do pass all of TestU01 has to count as a disadvantage in my mind. The original TestU01 paper from 2007 has the details about the failures that MT19937 encounters in testing. One can quibble about whether or not the specific tests that it does fail are "important" or "designed to make MT and other LFSR-based PRNGs fail", which is why I'm not running around telling everyone to retract their Monte Carlo papers that used MT. But it does mean that it's not what I'm recommending as the default in the future.

On the same note, this is also why I lean against Xoshiro as well. They are also LFSR-based and show some statistical oddities that a PRNG really shouldn't, despite using an output function to help mix things up. That said, it doesn't fail the tests out-of-box, yet. I doubt anyone will really run into problems. But since there are PRNGs that don't have these flaws, I won't recommend this one.

@bashtage

It would probably be best to replace the Philox implementation which is formally called Philox4x64 with Philox4x32. Philox4x32 is widely used since it is trivial to generate 1000s of independent streams by setting a key value and performs well on GPUs. It is included by default in TensorFlow, PyTorch and recent versions of MATLAB.


mattip commented Jun 10, 2019

Hmm. If we supply more than one built-in bit_generator, it would be good to be able to at least give users some idea what the tradeoffs are or some other criteria for preferring one over the other, even as a set of external links. The table now renders like this. What can we add to the last column to describe them?

@bashtage

As for recommendations, I don't think that this is necessary (and perhaps not good) aside from choosing the default bit generator. Most people will stick with the default, so the default is the recommendation that matters. If a user decides to use another generator for some reason, then it seems reasonable to assume they are making an informed decision.


mattip commented Jun 10, 2019

Then why include them in NumPy at all? Should they be in an external add-on package?


bashtage commented Jun 10, 2019

@mattip One could consider "why" bit generators are supplied. One is obvious, MT19937: legacy. For some of the others, my reasoning is:

  • dSFMT: Widely used across many software packages as the default generator. Probably behind as many papers as any other generator. Good performance on any CPU with SSE2 or Altivec. I personally prefer its cousin SFMT, which is faster in most applications now, since random doubles are less important than random 64-bit unsigned integers. That said, SFMT isn't common, and one of the two reasons to include dSFMT is the ubiquity of this bit generator.
  • ThreeFry and Philox?x??: modern counter-based RNGs that are trivial to safely scale to 1000s of instances. This type is widely used in machine learning. These are probably very safe to recommend.
  • Xoshiro: Fast and popular bit generator, despite some of the corner case reservations. Could safely cut the 512 version.
  • PCG: Lots of reasons. Could probably cut PCG32, although PCG64 has very poor performance on 32-bit Windows (and pretty bad on 32-bit Linux).

@mattip mattip force-pushed the refactor-bit_generator branch from f90779f to 5f631ef on June 10, 2019 10:14
@mattip mattip force-pushed the refactor-bit_generator branch from 5f631ef to cfa232b on June 10, 2019 10:43

mattip commented Jun 10, 2019

I incorporated the latest comments into the refactored page and updated the table with the performance data. Any thoughts?

@bashtage

Should probably cut the top table to have the same entries as the bottom one. The top seems a bit wide to me.

In between two tables `RandomState(MT19937())` should probably be ``RandomState(MT19937())``


rkern commented Jun 10, 2019

dSFMT: Widely used across many software packages as the default generator.

Is it? Which packages? The first package that comes up for me when googling "dSFMT" is randomgen itself (at least in my filter bubble).


rkern commented Jun 10, 2019

That list reads weird to me. Those were @bashtage's reasons for incorporating each one. That's not necessarily good advice to help a user select which one to use, should they want to stray from the default.

I'd help with wording, but I still maintain that we should only include the default (my preference still being PCG64 as the best all-rounder) and MT19937 for legacy and leave the rest to a third-party package. The maintainer of which would have much less responsibility to guide the user than numpy has if we were to include each of these.

@bashtage

Is it? Which packages? The first package that comes up for me when googling "dSFMT" is randomgen itself (at least in my filter bubble).

Julia. I was under the impression that MATLAB moved to it, but I think I was mistaken and it was only added in R2015a.

@bashtage

I'd help with wording, but I still maintain that we should only include the default (my preference still being PCG64 as the best all-rounder) and MT19937 for legacy and leave the rest to a third-party package. The maintainer of which would have much less responsibility to guide the user than numpy has if we were to include each of these.

While I'm happy if PCG is the chosen one, IMO PCG64 is too new and untested to be the only included modern generator. AFAICT it is not the default anywhere. I also feel that its performance gap on 32-bit systems is a problem (having worked for a while in a unit of the US government that only issues 32-bit Python on Windows, I'm sensitive to this). I also have some concerns that a subset of users might find the lack of a published paper problematic.

I also think there is a case for hiding MT19937. It is only used by RandomState and never needs to be directly initialized, since RandomState happily accepts a 32-bit uint or array of uints.
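A small sketch of the point being made: users can seed RandomState directly, so the explicit bit generator form is never required. (This is illustrative only; no claim is made that the two seeding paths produce identical streams, since the legacy int-seed path may differ from SeedSequence-based seeding.)

```python
from numpy.random import RandomState, MT19937

# Plain integer seed: the common, sufficient form.
rs_plain = RandomState(12345)

# Explicit bit generator: accepted, but never necessary for users.
rs_explicit = RandomState(MT19937(12345))
```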

As for the others, I stand by my reasons given above. Over time I've become more enamored with the Random123 (ThreeFry/Philox) family of generators for their simplicity in high-dimensional problems, despite their mediocre performance. This family also provides the best answer to any questions about random seeding of a default RNG. I also like the hardware-accelerated versions of aesctr, which are like these two but faster (not that I'd recommend these here, at least right now, despite the relatively widespread availability of AES hardware acceleration instructions on x86, AMD64 and ARMv8).
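The seeding property being praised here can be sketched with the Philox bit generator as it later shipped in numpy: counter-based generators let each worker key its own stream, so no coordination between processes is needed.

```python
from numpy.random import Generator, Philox

# Distinct keys yield independent streams; worker i can simply use key=i.
workers = [Generator(Philox(key=k)) for k in range(8)]
draws = [w.random(3) for w in workers]
```

This is the "trivial to safely scale to 1000s of instances" behavior described above, and the reason counter-based generators map well onto GPUs.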


rkern commented Jun 10, 2019

FWIW, PCG64 is now as old as MT19937 was when Python switched over to it, and is much more thoroughly tested (praise be to Moore's law), so I'm not terribly concerned about either of those. Someone has to be first-mover. If you want something more venerable, jsf64 might be a good option.

I guess our point of difference is not so much the specifics about each PRNG algorithm per se; your reasoning for each one is sound for raising those above the universe of all PRNGs. I agree that all of them should be available to users. But to me, the hard work of that was done by your BitGenerator design. We now can provide third-party packages with the non-default BitGenerators. And to me, that's the best way to provide users that option, given numpy's central role and conservative nature. I don't quite get your reservations about making PCG64 the only PRNG plus third-party options while still agreeing to making PCG64 the default (that you think most people are going to use without question) plus numpy-provided alternatives. I don't really understand that middle ground.

What do you think of providing the single default PRNG (plus legacy, hidden or not; separate issue) in numpy, putting the rest of these in your list into scipy as a distinguished second tier, and leaving the rest to the whims of the greater community? That's a very typical division of labor for these projects. We can always add PRNGs to either of these! We can't take them away, though. If our experience suggests that everyone is reaching for scipy.random.DSFMT for whatever reason, we can always promote it up to numpy.

@bashtage

The intellectual challenge I have with including a single, modern bit generator is that there isn't a clear best generator right now. There are the new, high-performance generators (PCG, Xoshiro, Wyrand), and there are new medium-security (Philox, ThreeFry, ChaCha8/12) and high-security (AES-Ctr (which is about as fast as PCG/Xoshiro with AES-NI and very slow without), ChaCha20, and others) generators. These are all making different tradeoffs in different dimensions. For example, the counter-based generators have some desirable characteristics around seeding (keying) that PCG and Xoshiro do not. Both PCG and Xoshiro can be pretty easily broken by simple user interventions (advance by a large round number, or multiply by 57, respectively). The CBRNGs also have trivial advance functions, and so all support the range of alternative uses. AFAIK good CBRNGs do not have these issues.

While I feel strongly that making a default choice for Generator is a necessary user convenience, I believe excluding generators that are plausibly in the set containing the "best" does not ultimately do a service to non-expert users, since it effectively discourages them from considering alternatives. I recently stumbled across rust-random's Guide, which I think provides a good model for how one can think about helping users make reasonable choices.

Finally, I think there is a case for integrating with the wider Python random number ecosystem. Some modern generators that are not PCG are in wide use. To me, keeping some of these as first-class citizens closes the distance between projects and makes it easier to move between different computational models.

Not sure if you have seen VecRNG from CERN, but they are considering a generation strategy similar to SeedSequence.


charris commented Jun 15, 2019

@bashtage I think we have different starting points in deciding what should be in NumPy. The way I see things is that NumPy provides a framework and basic functionality rather than a platform for experimenting with the advantages or downsides of different RNGs. I didn't always think that way; that is why there are a variety of 1D interval zero solvers in SciPy. At the time I wrote them I figured "Hey, I've played with these, let's get them all out there for others to play with", but in retrospect, only two, maybe even one, were actually needed; the rest were just baggage. I do think there is a place for a variety of RNGs, but in external packages. Indeed, I expect Intel among others to create such packages, both for speed optimizations and to provide different underlying algorithms. Leaving that work to downstream developers will decrease the maintenance burden on NumPy and encourage the natural evolution of algorithms free from the hassle of dealing with NumPy and its PR and decision processes.

@bashtage

@charris I am not suggesting that NumPy should contain a wide gamut of bit generators. There are many that are clearly inappropriate for NumPy (e.g. one that used RDRAND or AES-NI). There are others that are simply dominated and so have no added value.

If minimalism is the desired goal, then the only generator that has to be included is MT19937. If this was the preferred course, then I think this is easily explainable.

What seems stranger is to make a choice about including a modern generator where there doesn't seem to be a clear rationale for one over another. They are close enough that reasonable people with slightly different preferences over features (performance, independence) would end up at different choices. So what I am suggesting (or advocating for, I suppose) is that this minimal set of distinct generators is included.

@rgommers

What do you think of providing the single default PRNG (plus legacy, hidden or not; separate issue) in numpy, putting the rest of these in your list into scipy as a distinguished second tier, and leaving the rest to the whims of the greater community? That's a very typical division of labor for these projects.

We're talking about 6-7 generators here (assuming Xoshiro512 gets cut, with PCG32 to be decided on). If two get kept in numpy, that means finding a new home for 5-6 generators. Creating a new SciPy module for those is probably not justified. Reasons:

  • It's more work
  • The duplicate functionality is confusing rather than helpful (it's annoying for fft and linalg too)
  • SciPy's release cadence and backwards-compat policy are the same as for NumPy
  • SciPy supports ~4-5 versions of NumPy, so it'll be a couple of years before SciPy can require numpy >= 1.17.0. Hence they cannot be included now.

tl;dr SciPy isn't really an option.

So I think it's either a standalone package, or in NumPy. I have no clear preference there. For me, the main deciding factor there is maintainer interest/availability. @bashtage has done most of the hard work, with a lot of support from @rkern. We will need their domain expertise. So if there's more energy for maintaining it inside NumPy, then I think that's fine.

Another consideration: numpy.random will actually then be one of the few (the only?) submodules that is state-of-the-art. For linalg and fft, SciPy is better. And then we have a long list of less interesting modules: matlib, dual, ctypeslib, emath, distutils, ma, polynomial, rec, char, etc. Some of those are larger than even the maximally inclusive version of random would be, and none are as interesting as random.


rkern commented Jun 16, 2019

Yes, I think PCG32 and Xoshiro512 should be dropped as their respective companion PRNGs are probably recommended more.

  • DSFMT <dsfmt> - SSE2-enabled version of the MT19937 generator. Widely used
    across many software packages as the default generator. Probably behind as
    many papers as any other generator. Good performance on any CPU with SSE2 or
    Altivec. See the dSFMT authors' page_.

I think this wording was based on a false recollection that dSFMT was MATLAB's default PRNG. Since this is not true, I think it requires another justification (and certainly different wording if the algorithm remains). To me, it's just a fast member of the Mersenne Twister family; it shares all of MT19937's statistical failings while not actually being the same as MT19937. I just don't see the user base who wants to conservatively stay within the MT family despite having proven statistically-better options, but not so conservative that they won't stay with MT19937. That's a tough needle to thread.

  • ThreeFry <threefry> and Philox <philox> - counter-based generators
    capable of being advanced an arbitrary number of steps or generating
    independent streams. Very popular in machine learning. See the Random123_
    page for more details about this class of bit generators.

I think only one of these needs to be here. I think I would also dispute the "very popular in machine learning". I'm pretty sure neither (or both collectively) are more popular than MT19937 in that field or any other, just because defaults matter. What they do have is relevance to massively-parallel applications in GPUs (because you can trivially get independent streams with dumb seed allocation), which I think is the source of the claim. But that leads me to question whether numpy is the right place, if that's the justification. It seems to me that you would want to get the implementation from the package that's providing the GPU implementation as well to ensure that they are the same; I certainly would. Each of these has a couple of variants within the family, and if we're playing around with the seeding like SeedSequence, that's just another place where implementation differences can creep in. At the same time, if we can promote the use of @imneme's strong seeding algorithm through SeedSequence, then it's entirely possible that that would obviate the desirability of these algorithms on the GPU; one could pick faster GPUable algorithms that just need strong, but not perfect, seed allocation.

That said, having a counter-based medium-strength-crypto PRNG might be justification enough all by itself; I'm just wobbly on the stated "popular in machine learning" justification.

Finally, I think there is a case for integrating with the wider Python random number ecosystem. Some modern generators that are not PCG are in wide use. To me, keeping some of these as first-class citizens closes the distance between projects and makes it easier to move between different computational models.

Similarly, I'm not particularly moved by this. There's a case to be made for inter-language operation (GPU/CPU inter-operation is a kind of subset of this use case), but that would probably be best supported in the packages providing the inter-language communication. It's not like we can actually replicate the results in Julia just by using DSFMT like they use. Most non-trivial programs won't replicate the computations that we run on top of the raw bit-streams. We're not even doing that between versions of numpy, much less different languages altogether.

I am sympathetic to perhaps having a representative of a counter-based PRNG like Philox and Threefry because people do write code now with them differently than other PRNGs thanks to their seeding freedom. That means that there would be less translation work that needs to be done at the API level, even if you aren't expecting to reproduce exact results. But I don't think that Julia using DSFMT, for example, means that we have to have DSFMT in numpy; just about any PRNG would do at the API level (with .jumped(), but only if that's used on the Julia side). A Julia-interop package might provide such a thing, though, taking care to reproduce seeding algorithms and such.

  • Xoshiro256 <xoshiro256> and Xoshiro512 <xoshiro512> - The most recently
    introduced XOR, shift, and rotate generator. Fast and popular bit generator,
    despite some reservations in rare corner cases. More information about these
    bit generators is available at the xorshift, xoroshiro and xoshiro authors'
    page_.

Actually, I think this is the one that gets at why I've been pushing to have just one recommended PRNG in numpy. Possibly irrationally (I'll leave it to others to judge), I just don't want Xoshiro in numpy. I used to use the predecessors of this algorithm in C code where I needed a good, but short implementation, and the author's claims of statistical quality were enticing. But then other people ran the tests and found that they failed, for a couple of generations of this lineage of algorithms. Having been burnt by this family of algorithms once, I'm not eager to give it pride of place in numpy. While this particular member passes the current tests, analysis shows some weird behaviors that could easily become a future failure. To bastardize a saying from the related crypto field, the randomness tests only get better, never worse.

The sting of finding that MT19937 fails statistical tests only two years after making it numpy's only PRNG also makes me want to run away from every LFSR-based PRNG as well. numpy doesn't have to share my emotional irrationalities, though, so I'll leave that as my piece said.

Both PCG and Xoshiro can be pretty easily broken by simple user interventions (advance by a large round number, or multiply by 57, respectively).

I just want to be clear: these two are not in the same category of things. .advance() is an, ahem, advanced operation that doesn't come up often, and we make it clear that one shouldn't use it blindly. That's why you wrapped it up nicely in the .jumped() API, using author-recommended constants and removing choices from the users. With .jumped() around, you have to go out of your way to misuse .advance(). This is just like the way we prevent bad zeroland states in MT19937 and Xoshiro with good seeding algorithms. You could defeat it by setting the state directly to a weak state, but that's on you if you do.

Whereas multiplying by 256*n + 57 (any n is a problem) is something one can do unintentionally and with no workaround, as in @lemire's bounded-integer algorithm that we use for .integers(). If .integers(57) is detectably non-random because of the BitGenerator underneath it, I think that's a problem.
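For context, a minimal Python sketch of @lemire's bounded-integer technique referenced here (multiply-shift with a small rejection zone to remove bias). The function name is illustrative; this is not numpy's actual implementation:

```python
def bounded_uint32(next_u32, bound):
    """Uniform integer in [0, bound) from a 32-bit source, without bias."""
    m = next_u32() * bound          # 64-bit product of draw and bound
    low = m & 0xFFFFFFFF            # low 32 bits decide acceptance
    if low < bound:
        # Reject the small sliver of products that would bias the result.
        threshold = (1 << 32) % bound
        while low < threshold:
            m = next_u32() * bound
            low = m & 0xFFFFFFFF
    return m >> 32                  # high 32 bits are the bounded draw
```

Because the output depends on a multiplication of the raw 32-bit draw by the bound, a multiplicative weakness in the bit stream can surface directly in small bounds like 57, which is the concern raised above.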

If we want one other modern "fast", non-counter PRNG besides PCG64 (or even instead of PCG64), then we really should look at jsf64 or gjrand. Both are well-tested and well-analyzed, and should be fast even on Win32 (at least, not slow like PCG64). Neither supports .jumped(), which @bashtage won't like, but that doesn't bother me personally, as I like SeedSequence.spawn() better. Maybe the jumpability of PCG64 and the venerableness and Win32 performance of jsf64 are enough to have both around.


mattip commented Jun 16, 2019

Thanks for this. I will separate the proposals to remove each of dSFMT, Xoshiro*, and ThreeFry/Philox into separate PRs so we can debate each on its merits.


mattip commented Jun 24, 2019

I think this PR can be closed; we are going in the direction of including a minimal set of BitGenerators, so in any case it would need reworking for the final set.


charris commented Jun 27, 2019

Superseded by #13849, so closing.

@charris charris closed this Jun 27, 2019
@mattip mattip deleted the refactor-bit_generator branch October 11, 2020 14:47