Understanding Probability Finally
https://towardsdatascience.com/understanding-probability-finally-576d54dccdb5 03-10-2019
Understanding probability. Finally! – Towards Data Science Page 2 of 15
Benjamin Franklin once said that two things in life are certain: death
and taxes. Many people might extend this claim even further by
affirming that these are the only things which are certain. The
remaining parts of life are less predictable. We could, however, argue
that even inevitable death might not be as certain as we think,
because we don’t always know when, where and how it will occur.
In the coin example, the sample space has two elements: head and tail.
This is a discrete sample space, in contrast to the continuous sample
space in the software example. In the latter case, the amount paid for
the software can be any positive real number. In both cases, we are
performing a probabilistic experiment with an unknown outcome X,
also called a random variable.
In the coin flip experiment, if the coin lands head, X takes the value
h; otherwise it takes the value t. If the experiment is repeated many
times, we usually define the probability that the random variable X
takes the value x (lowercase) as the fraction of times that x occurs.
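The frequency definition above can be tried out with a small simulation. This is only a sketch: the function name `estimate_probability`, the fixed seed, and the assumption of a fair coin are illustrative choices, not part of the original article.

```python
import random


def estimate_probability(x, n_trials, seed=0):
    """Estimate P(X = x) for a fair coin as the fraction of trials where x occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each trial is one coin flip with outcome "h" or "t" (fair coin assumed).
    outcomes = [rng.choice(["h", "t"]) for _ in range(n_trials)]
    return outcomes.count(x) / n_trials


if __name__ == "__main__":
    # The estimate should settle near 1/2 as the number of trials grows.
    for n in (10, 1_000, 100_000):
        print(n, estimate_probability("h", n))
```

With few trials the fraction can be far from 1/2; it only stabilizes as the number of repetitions becomes large.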
Probability as a function
In the case of a coin flip, if we repeat the experiment a very large
number of times with a fair coin, we expect the fractions of heads and
of tails to each approach 1/2. Both probabilities are non-negative and
sum to 1. Therefore we can consider probability as a function that
takes an outcome, i.e. an element of the sample space, and maps it to
a non-negative real number so that the sum of all these numbers
equals 1. We also call this a probability distribution function. When
we know the sample space and the probability distribution, we have
the full description of the experiment and we can reason about
uncertainty.
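As a minimal sketch of "probability as a function", a discrete distribution can be stored as a mapping from outcomes to numbers. The names `pmf`, `is_valid_distribution` and `event_probability` are hypothetical helpers introduced here for illustration, and the fair coin is an assumption.

```python
from fractions import Fraction

# Sample space {head, tail} with the probability assigned to each outcome
# (a fair coin is assumed for this illustration).
pmf = {"head": Fraction(1, 2), "tail": Fraction(1, 2)}


def is_valid_distribution(p):
    """Check the two conditions: non-negative values that sum to exactly 1."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1


def event_probability(p, event):
    """Probability of an event, i.e. a set of outcomes from the sample space."""
    return sum(p[outcome] for outcome in event)


if __name__ == "__main__":
    print(is_valid_distribution(pmf))
    print(event_probability(pmf, {"head"}))
```

Exact fractions are used instead of floats so that the "sums to 1" check is not affected by rounding.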
In all examples so far, order did matter. Sometimes the order does not
matter, and we just care about the collection. Instead of working with
tuples of possible outcomes, we deal with the set of possible outcomes.
For example, if we flip a coin twice, we can have the outcomes (head,
tail) or (tail, head). When order does not matter, both outcomes form
the event {(head, tail), (tail, head)}.
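The two-flip example can be enumerated directly. The sketch below assumes a fair coin and independent flips, so that each ordered pair has probability 1/4; the variable names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = ["head", "tail"]

# Ordered outcomes: the four tuples (first flip, second flip), each with 1/4.
ordered = {outcome: Fraction(1, 4) for outcome in product(faces, repeat=2)}

# When order does not matter, tuples with the same faces are merged.
unordered = defaultdict(Fraction)
for outcome, p in ordered.items():
    unordered[tuple(sorted(outcome))] += p

# The event "one head and one tail" collects both orderings.
mixed_event = {("head", "tail"), ("tail", "head")}
p_mixed = sum(ordered[o] for o in mixed_event)

if __name__ == "__main__":
    print(dict(unordered))
    print(p_mixed)
```

Merging the two orderings doubles the probability of the mixed result relative to (head, head) or (tail, tail), which is why ignoring order changes the distribution.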
Let’s repeat the same experiment, this time without replacement. Now,
the distribution is uniform, as shown below. Order and sampling
methods can have a significant impact on the probabilities.
The table and plot above describe the probability mass function
(PMF) of a discrete random variable, which can also be presented as a
histogram. When the sample space is countably infinite, the PMF cannot
be uniform and cannot be an increasing function, because the
probabilities could not sum to 1. It can, however, be a decreasing or
doubly infinite function.
Probability of intervals
Sometimes we are interested in computing the probability of a
random variable having a value less than or equal to some threshold.
The cumulative distribution function (CDF) is what allows us to
manipulate interval probabilities, as depicted in the example below.
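As a small sketch of how a CDF gives interval probabilities, the code below builds the CDF of a discrete random variable by accumulating its PMF. The fair six-sided die and the helper name `interval_probability` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of a fair six-sided die (illustrative assumption).
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# CDF: F(x) = P(X <= x), obtained as the running sum of the PMF.
cdf = dict(zip(values, accumulate(pmf)))


def interval_probability(a, b):
    """P(a < X <= b) = F(b) - F(a), for a and b taken from `values`."""
    return cdf[b] - cdf[a]


if __name__ == "__main__":
    print(cdf[4])                      # P(X <= 4)
    print(interval_probability(2, 4))  # P(2 < X <= 4)
```

Subtracting two CDF values avoids re-summing the PMF for every interval of interest.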
The simplest non-trivial random variable will take two possible values,
for example 1 and 0, representing success and failure, or head and tail.
The Bernoulli distribution defines the probability of success p, and
the probability of failure q=1-p. When performing several such
independent binary experiments, the probability of a given sequence
of outcomes is easily calculated as the product of individual
probabilities.
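The product rule for independent Bernoulli trials can be sketched as follows. The function name `sequence_probability` is hypothetical, and outcomes are encoded as 1 for success and 0 for failure, as in the text.

```python
from math import prod


def sequence_probability(outcomes, p):
    """Probability of one specific sequence of independent Bernoulli trials.

    Each 1 (success) contributes a factor p, each 0 (failure) a factor q = 1 - p.
    """
    q = 1 - p
    return prod(p if x == 1 else q for x in outcomes)


if __name__ == "__main__":
    # Two successes followed by one failure with success probability 0.5.
    print(sequence_probability([1, 1, 0], 0.5))
```

Only the number of successes and failures affects the result, not their positions, since multiplication is commutative.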
Conclusion
In this article, we offered an accessible refresher of the probabilistic
concepts that most data scientists want to master. We went beyond
rolling dice and intuitively illustrated how uncertainty is quantified
and analyzed using outcomes, events, sample space, distribution
functions, cumulative functions, random variables, expectation and
variance.