
Response to Critique of Dream Investigation Results

Minecraft Speedrunning Team


December 2020

1 Introduction

Before going into the details of the flaws in Dream’s response paper, we would like to clarify a few important points.

First of all, the response paper attempts to estimate an entirely different probability from ours, and even then, does so invalidly. That is, its “1 in 10 million” calculation is both invalid and not directly comparable to the “1 in 7.5 trillion” number from the moderator report. Even if the analysis that produced their number were performed correctly, that would not in any way show our analysis to be incorrect. One would have to demonstrate that our statistical techniques are invalid, not just that asking a different question leads to a different answer.

Second, most of the direct criticisms of our analysis in the response paper are blatantly incorrect, disputing the accuracy of extremely standard statistical techniques firmly grounded in probability theory. The only criticism of our analysis which even arguably holds any water is the critique of our choice of 10 as the number of RNG factors to correct for. We strongly disagree that 37 is a suitable number, but even if, despite that, it were used, it would not change our conclusion.

2 The Binomial Distribution

Dream’s response paper suggested that per-run stopping has to be accounted for, as compared to a binomial distribution with an overall stopping rule. In this section we explain why this is incorrect. We argue that using a binomial distribution with a “worst-case scenario” stopping rule (having a binomial p-value less than or equal to Dream’s) fully accounts for all stopping rule issues.

The issue can be described as follows. Suppose we have a sequence of Bernoulli trials with success probability 0.1, and we stop after the first successful trial. The last trial that we have is necessarily a success, leading to biased results if we assumed a standard fixed-n sampling scheme. The author of Dream’s response alleges that Dream’s streams are more accurately modeled as the sum of variables with such a negative binomial stopping rule (where each variable corresponds to a run), rather than a single variable with an unknown stopping rule. However, the “stops” that are alleged to be a problem are not true stops. Dream continues speedrunning the next run, and hence the Bernoulli sequence continues. The division of the sequence into “runs” or “streams” is arbitrary, and the distribution can be modelled without taking it into account. The only way that having a data-dependent stopping rule per run influences the data is by influencing the stopping rule of the full data, which was accounted for, as admitted in the response paper. For example, a sequence of n negative binomial subsequences, each requiring x successes, is equivalent to a single negative binomial sequence requiring k = nx successes.

Analogously, if you keep flipping a coin until you get heads twice, you are likelier to observe more heads than tails as compared to a fixed number of tosses. However, if you simply take a break after getting two heads and return afterwards, it doesn’t affect the numbers whatsoever.

Figure 1: Distribution Comparison. (a) Chunked Negative Binomial; (b) Direct Negative Binomial.

2.1 Example Simulation

We can illustrate this point with a rather straightforward example (code in Appendix A.1). Suppose that we have a sequence of Bernoulli trials succeeding with probability 0.1 each. We stop after 200 successes, which is an overall stopping rule at “k = 200” — a negative binomial setup. We do this in chunks called “runs” that each have a stopping rule of “stop if x_run = 2”, where x_run is the number of successes in that particular run. Effectively, we will stop after successfully completing 100 runs. Here, simulation yields the distribution shown in Figure 1a for the number of trials. However, using the same seed in a simulation of a pure negative binomial setup without per-run stopping yields the exact same result, as shown in Figure 1b.

This example illustrates that when the same stopping rule is used overall, the stopping rules of the individual runs do not matter. Again, to reiterate, the “runs” are entirely arbitrary separations. The only way the per-run stopping rule matters is in how it influences the overall stopping rule.

3 Sampling Bias Corrections

The response paper alleges that our bias correction was incorrect. The paper proposes that our correction cannot properly handle “streaks” of successes, and gives some examples to illustrate. However, the numbers given by the paper’s author for their own examples are incorrect.

    At first this seems extremely unlikely as the probability of getting 20 heads in a row is 1/2^20, just less than 1 in a million. Applying the Bonferroni correction and saying that there are 80 choices for the starting position of the 20 successful coin tosses in the string of 100 cases gives 80/2^20 = 7.629 × 10^-5 or 1 in 13000... The actual odds come out to be about 1 in 6300, clearly better than the supposed “upper limit” calculated using the methodology in the MST Report. This is due to the facts mentioned above: 1) subsets with different p-values are harder to combine and 2) “lucky streaks” are not average randomly chosen samples, but samples that are specifically investigated because they are lucky.

Applying a Sidak correction, like we used, yields a probability of 7.63 × 10^-5, or one in about 13,000, as they noted. However, reading over the page that they linked [1], we can get the exact result of 3.91 × 10^-5, notably smaller than our Sidak correction value. Proceeding with a simple Monte Carlo simulation, just as the response paper does, we run a simulation with 500 million samples (Appendix A.2) and obtain a value of 3.86 × 10^-5, or about one in 25,900, again smaller than the value from our correction. It is unclear how the author of Dream’s response paper got their values.

The author proceeds to give another example, but it is unclear what they did. They state that they are finding the probability of three consecutive events with probability 0.01, but do not state how many trials these events come from. Equation 2 from the response paper was referenced, but this equation does not appear to be relevant here [2]. However, comparing against a simple Monte Carlo simulation with 500 million samples again (Appendix A.3), and considering the case of n = 100, we find an exact value of 9.70 × 10^-5 and a Monte Carlo value of 9.71 × 10^-5. In contrast, using the same correction as the original paper, we get the larger value of 9.8 × 10^-5. The author seemed to suggest that our correction is inaccurate because the p-values for various streams or runners differ. However, it is only Dream’s combined p-value that is relevant to the correction, and as has been illustrated above, the correction was not shown to be wrong.

[1] https://mathworld.wolfram.com/Run.html
[2] Equation 2 from the response paper is a formula for the probability density function of the product of n iid uniform variables.
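Both exact values above can also be reproduced without simulation by a short dynamic program over the length of the current success streak. The sketch below is ours, not part of Appendix A; the function name runprob is introduced purely for illustration.

# Exact probability of a run of at least r successes in n Bernoulli(p)
# trials, computed by dynamic programming on the current streak length.
function runprob(n, r, p)
    state = zeros(r)                    # state[k+1] = P(current streak is k, no run yet)
    state[1] = 1.0
    absorbed = 0.0                      # P(a run of length r has already occurred)
    for _ in 1:n
        next = zeros(r)
        next[1] = (1 - p) * sum(state)  # a failure resets the streak
        for k in 1:r-1
            next[k+1] = p * state[k]    # a success extends the streak by one
        end
        absorbed += p * state[r]        # a success at streak r-1 completes a run
        state = next
    end
    return absorbed
end

runprob(100, 20, 0.5)   # ≈ 3.91 × 10^-5, the exact value cited above
runprob(100, 3, 0.01)   # ≈ 9.70 × 10^-5, the exact value for the n = 100 case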

4 Including all 11 streams

Dream’s response paper notes that:

    However, as is discussed throughout this document, choosing to put a break point between the streams after seeing the probabilities would require including a correction for the bias of knowing this result.

This implies that we did not correct for this bias, but we did, as per section 8.2 in our initial paper. Dream’s response paper concludes that when including all 11 streams in the analysis, there is “no statistically significant evidence that Dream was modifying the probabilities”. This result is expected and meaningless, as Dream is only accused of using a modified game for the last 6 streams; including all streams dilutes the data, yielding inconsistent results.

5 Correction Across Runners

The rebuttal paper states:

    In Section 8.3, they claim that their calculation of p is for a runner within their entire speedrunning career. This is presumably based on the argument from Section 8.2 that they have already corrected for every possible subset of streams... Further, that correction was based on choosing 6 of 11 livestream events from Dream, suggesting that their definition of “career” is 11 multi-hour livestream events comprising about 50 runs.

This is incorrect. The p-value this process generates is the probability that results as extreme as Dream’s are obtained if one chooses the most extreme sequence of streams from a runner’s entire streaming career. The choice of 11 is only due to the fact that this happens to be the number of times Dream has streamed speedrun attempts — to calculate that value for a different runner, you would use the number of times they had streamed instead of 11.

The response paper suggests correcting across livestreams instead of individuals. This is redundant, as the p-value outputted, after correcting for the number of streams, is the p-value for Dream’s entire livestream history. Were it applied to someone else, it would also be applied to their entire livestream history. Moreover, their estimate of 300 livestreamed runs per day over the past year is highly implausible. Many runs are not livestreamed, and the estimate is based on current numbers, even though Minecraft speedrunning has grown massively in recent months.

At the time of Dream’s run, there were 487 runners who had times in 1.16 – far under 1000 – and the vast majority of these were unpopular or did not stream. Selection bias could only be induced by observed runners, so speedrunners who had no significant viewership watching their attempts should not be included. Frankly, there were probably fewer than 50 runners in any version who might’ve been examined like this, but we used 1000 as an upper bound.

Note that treating whether or not someone is “observed” as a binary value is a simplification: the less likely extreme luck would be noticed for someone, the less they contribute to sampling bias. We included people who have only a handful of viewers in the calculation even though the amount of sampling bias they introduce is likely negligible.

Additionally, note that this is one of the most important factors shifting the number upwards in the response paper. Severely overestimating the number of livestreamed attempts artificially inflates the final number to a massive degree.
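For reference, the across-runner bound behaves like a Sidak correction, the same family of correction used elsewhere in our report. A minimal sketch of the arithmetic, assuming the simple form 1 − (1 − p)^n (the function name sidak and the input values are ours, chosen only for illustration):

# Sidak-style correction: the probability that at least one of n
# independent observed runners yields a p-value at most p under the null.
sidak(p, n) = 1 - (1 - p)^n

sidak(1.0e-14, 1000)   # ≈ 1.0e-11; for small p the bound is effectively n * p
sidak(1.0e-14, 50)     # ≈ 5.0e-13; a smaller runner count shrinks it proportionally

For p-values this small the bound is essentially linear in the assumed number of observed runners, so choosing 1000 rather than a more realistic count like 50 only loosens the result by that ratio.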
6 The number of RNG types

Dream’s response paper corrects across 37 different random factors. It is worth noting that, even using this increased number of factors, the final p-value only changes by a factor of 15. If we accepted this list, it would not change our conclusion, but we still hold that the list is seriously flawed.

Dream suggests that eye breaking odds, various mob spawn rates, dragon perch time, triangulation ability, and various seed-based factors should be counted. However, these are more difficult to cheat than blaze rods and piglin bartering rates, and in some cases are entirely implausible for us to examine. The dominant theory is that Dream cheated by modifying the internal configuration files in his launcher jar file directly. Other methods are possible as well, but this is likely the most straightforward. Using this method, only entity drops and piglin barters can be modified.

Dream offers frequency of triangulation into the stronghold as one factor. However, this isn’t random at all, and is instead a skill-based factor [3]. Additionally, many of the factors proposed are seed-based. An extensive amount of time would be required to seedfind enough randomly generatable world seeds for a livestream, making it not a very plausible method for long-term cheating. Further, it is in principle possible to detect set seeds based on non-seed random factors. As a simplified example, if we know the LCG state at a fixed distance from seed generation, we can backstep to seed generation to find what seed should have been generated. Frankly, this would be rather difficult to do, but it would be attempted before statistical analysis.

Some suggested factors rely on strategies that were either defunct or nonexistent at the time of Dream’s runs. Monuments, and string from barters, are only important for so-called “hypermodern” strategies, which often skip villages and explore the ocean. These strategies did not exist at the time of Dream’s runs. Similarly, ender pearl trades are practically never used in 1.16 runs because it is more difficult and slower to get pearls via trades than via barters. As a result, no top runs in 1.16 utilize villager trading.

Finally, some factors occur too rarely to obtain a large enough sample for analysis. For instance, one only gets to the end portal on nearly completed runs, so there would be very few events to check.

Clearly, the 37 number is entirely unrealistic. It relies on the use of strategies that Dream could not have used, and the investigation of factors that we could not investigate. Again, though, even if we accept the full 37 number, it only changes our result by a factor of 15 – not enough to change our conclusion.

[3] How well a player can triangulate based on eye throws.

7 Paradigm Inconsistency

In section 4.2 of Dream’s response paper, the author explains that they use the Bayesian statistics paradigm instead of the hypothesis testing paradigm used in our report. That is, Dream’s response paper attempts to calculate the probability that Dream cheated given the bartering and blaze data; in contrast, our paper calculates the probability of obtaining bartering and blaze results at least as extreme as Dream’s under the assumption that the game is unmodified. These are entirely different probabilities, but Dream’s response paper confuses the two paradigms throughout, producing an uninterpretable result.
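The distinction can be made explicit with Bayes’ theorem. Writing $M$ for the event that the game was modified and $D$ for the observed data (our notation):

$$P(M \mid D) = \frac{P(D \mid M)\,P(M)}{P(D \mid M)\,P(M) + P(D \mid \neg M)\,P(\neg M)}.$$

The response paper targets the left-hand side, which requires both a prior $P(M)$ and a model $P(D \mid M)$ of how a cheater would behave; our report instead bounds the tail probability $P(\text{results at least as extreme as } D \mid \neg M)$, which requires neither.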
7.1 Unclear Corrections

Dream’s response paper mimics many of the bias corrections in our original paper, but because the starting value is the posterior probability of an unmodified game and not a p-value, some of these corrections are unjustified. Indeed, it is not trivially obvious that frequentist p-value corrections can be applied to such a probability.

Dream’s response paper attempts to correct for the stopping rule. This is perfectly fine under a frequentist paradigm like the one we used. However, it is inconsistent with the Bayesian paradigm used in the response paper. Bayesians follow the likelihood principle: multiplying the likelihood by a factor that does not depend on the parameter of interest does not change the results. A well-known consequence of the likelihood principle is that stopping rules are irrelevant to analyses that follow it. Hence, the author should not have accounted for stopping rules at all, including by dropping the last data point. Indeed, the response paper itself stated that one of the reasons a Bayesian approach was used was to avoid having to model the stopping rule of each run. However, despite this statement, the author goes on to drop the last data point in an attempt to address the stopping rule.
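The textbook illustration of this point matches the setting here exactly: binomial versus negative binomial sampling. If $k$ successes are observed in $n$ trials, the fixed-$n$ rule and the stop-at-the-$k$-th-success rule give likelihoods

$$L_{\mathrm{bin}}(p) = \binom{n}{k}\,p^{k}(1-p)^{n-k}, \qquad L_{\mathrm{nb}}(p) = \binom{n-1}{k-1}\,p^{k}(1-p)^{n-k},$$

which differ only by a constant factor free of $p$. Any method obeying the likelihood principle, including a Bayesian posterior for $p$, therefore yields identical results under either stopping rule.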
Similarly, the response paper attempts to correct for selection bias across runners. This is rather odd, as the goal of these corrections is to control error rates, a goal that is not shared by Bayesian methods [4]. The likelihoods across individuals are independent of one another, and therefore comparisons across other individuals are irrelevant to a Bayesian analysis.

[4] With the exception of matching priors, although such methods can hardly be considered Bayesian.

7.2 Invalid Comparison

The final conclusion of Dream’s response paper conflates the posterior probability with the p-value once more:

    In any case, the conclusion of the MST Report that there is, at best, a 1 in 7.5 trillion chance that Dream did not cheat is too extreme for multiple reasons that have been discussed in this document.

Again, the 1 in 7.5 trillion chance does not represent the probability that Dream did not cheat; it represents the probability of any Minecraft speedrunner getting results at least as extreme as Dream’s using an unmodified game while streaming. Widening the scope to any streaming speedrunner already artificially enlarges the p-value in Dream’s favor and was only done to prevent accusations of p-hacking and the like.

Even if Dream’s response calculation were done correctly, the 1 in 10 million posterior probability would not be directly comparable to the 1 in 7.5 trillion figure and would still imply a 99.99999% chance that Dream cheated.

8 Conclusion

The author of Dream’s response paper appears to mix frequentist and Bayesian methods, resulting in an uninterpretable final result. Further, these methods are applied incorrectly, preventing valid conclusions from being drawn. Despite these problems being in Dream’s favor, the author presents a probability that still suggests that Dream was using a modified game. Hence, our conclusion remains unchanged.

Relevant Links

By Moderators or Dream

1. Dream Investigation Results, original moderator paper.

2. Critique of Dream Investigation Results, Dream’s response paper by Photoexcitation.

3. Did Dream Fake His Speedruns - Official Moderator Analysis, moderator YouTube investigation report.

4. Did Dream Fake His Speedrun - RESPONSE, Dream’s response video.

By Others

5. Reddit r/statistics comment by mfb, a particle physicist with a PhD in physics.

6. The chances of “lucky streaks”, a Reddit post by particle physicist mfb.

7. Dream’s cheating scandal - explaining ALL the math simply, YouTube video by Mathemaniac.

8. Blog post by Professor Andrew Gelman.
A Julia Simulation Code

A.1 Stopping Rule Simulations

using Random
using Distributions
using Plots

# Chunked scheme: 100 "runs", each stopping after 2 successes.
Random.seed!(1234)
nbsplit = Int[]
for i ∈ 1:1000
    n = 0                      # total number of trials for this sample
    nseq = 0                   # number of completed runs
    while nseq != 100
        x = 0
        while x != 2           # per-run stopping rule: stop at 2 successes
            x += rand(Bernoulli(0.1))
            n += 1
        end
        nseq += 1
    end
    push!(nbsplit, n)
end

# Direct scheme: a single negative binomial stop at 200 successes,
# using the same seed and hence the same underlying Bernoulli stream.
Random.seed!(1234)
nb = Int[]
for i ∈ 1:1000
    x = 0
    n = 0
    while x != 200
        x += rand(Bernoulli(0.1))
        n += 1
    end
    push!(nb, n)
end

# nb: Direct negative binomial result
# nbsplit: Chunked negative binomial result
println(nb == nbsplit)         # prints true: the two schemes are identical

A.2 Coin Flip Simulation

# Monte Carlo estimate of the probability that 100 fair coin flips
# contain a run of at least 20 heads.
using Random
using Distributed

numruns = @distributed (+) for i ∈ 1:500000000
    x = rand(Bool, 100)

    res = false
    count = 0                  # length of the current streak of heads
    for j ∈ 1:length(x)
        if x[j]
            count += 1
        else
            count = 0
        end

        if count == 20
            res = true
            break
        end
    end
    res
end

# probability is numruns / 500000000

A.3 1% Event Simulation

# Monte Carlo estimate of the probability that 100 trials with success
# probability 0.01 contain a run of at least 3 successes.
using Random
using Distributed
using Distributions

numruns = @distributed (+) for i ∈ 1:500000000
    x = rand(Bernoulli(0.01), 100)

    res = false
    count = 0                  # length of the current success streak
    for j ∈ 1:length(x)
        if x[j]
            count += 1
        else
            count = 0
        end

        if count == 3
            res = true
            break
        end
    end
    res
end

# probability is numruns / 500000000