
Understanding probability. Finally!

Illustrated practical guide to probability concepts for data scientists

Michel Kana, Ph.D – Towards Data Science

Sep 23 · 12 min read

This article gently walks through the probability concepts underlying data science and machine learning. Each notion is carefully introduced and illustrated with real-world examples, while keeping mathematics and theorems to a minimum. As a data scientist, you will finally nail concepts such as outcome, event, random variable, probability distribution, expectation, mean and variance. Although you already solve real-world problems on a day-to-day basis using random forests, logistic regression, K-means clustering, support vector machines or even deep learning, by the end of this refresher you will be able to speak confidently about probability.

https://towardsdatascience.com/understanding-probability-finally-576d54dccdb5 03-10-2019

Benjamin Franklin once said that two things in life are certain: death and taxes. Many people might extend this claim even further by affirming that these are the only things which are certain; the rest of life is far less predictable. We could, however, argue that even inevitable death is not as certain as we think, because we don't always know when, where and how it will occur.

How do we approach random phenomena? By Faith or by Reason?


Faith is the unquestioning belief in the truth of something that does not require any evidence and is assumed to not be provable by any empirical or rational means. Reason is the faculty of the mind through which we can logically come to rational conclusions. While faith is a good tool for dealing with random phenomena calmly, reason gives us the means to learn the range, average or variability of what we observe. Reason allows us to infer structure, change and relations from what we have observed, on a bigger scale. We can also attempt to predict the future, estimate likelihoods, and quantify how certain we are about an outcome. The reasoning approach brings benefits such as understanding what underlies our observations, planning ahead to see how things could change, and building things that provide the most certain outcome.

Breaking down uncertainty


When we are observing the world around us and collecting data, we are performing an experiment. A simple example is a coin flip, observing whether it lands head or tail. A more complex example is trying a piece of software and observing how much we are ready to pay for its full version. All possible outcomes of such an experiment build the so-called sample space, denoted Ω.


In the coin example, the sample space has two elements: head and tail.
This is a discrete sample space, in contrast to the continuous sample
space in the software example. In the latter case, the amount paid for
the software can be any positive real number. In both cases, we are
performing a probabilistic experiment, with an unknown outcome X,
also called random variable.

In the coin flip experiment, if the coin lands head, X will get the value h, else it will get the value t. If the experiment is repeated many times, we usually define the probability that the random variable X takes a value x as the fraction of times that x occurs.
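This fraction-of-occurrences idea is easy to check empirically. A minimal Python sketch (the function name and seed are our own, not from the article) simulates fair coin flips and measures the fraction that land heads:

```python
import random

def estimate_prob(n_flips, seed=0):
    """Estimate P(heads) as the fraction of n_flips simulated fair-coin flips landing heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

p = estimate_prob(100_000)  # approaches 1/2 as n_flips grows
```

With more flips, the estimate settles ever closer to the true probability 1/2.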


Probability as a function
In the case of a coin flip, if we perform the experiment long enough, approaching an infinite number of repetitions with a fair coin, we shall expect the probability of landing head or tail to be 1/2 each. Both probabilities are non-negative and sum up to 1. Therefore we can consider probability as a function that takes an outcome, i.e. an element of the sample space, and maps it to a non-negative real number, such that the sum of all these numbers equals 1. We also call this a probability distribution function. When we know the sample space and the probability distribution, we have a full description of the experiment and we can reason about uncertainty.

In a uniform probability distribution, all outcomes in the sample space have the same probability of occurring. Our coin flip is such an example. Another example is a card game, where the probability of picking any given card from a standard deck of 52 cards is the same: 1/52.
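The card example can be sketched in a few lines of Python (the rank/suit encoding is our own choice):

```python
from fractions import Fraction

# Build a standard 52-card deck: 13 ranks x 4 suits.
deck = [(rank, suit)
        for rank in range(1, 14)
        for suit in ("hearts", "diamonds", "clubs", "spades")]

# Uniform distribution: every card is equally likely.
p_card = Fraction(1, len(deck))  # 1/52
```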


Non-uniform distributions map different probabilities to different outcomes or sets of outcomes. A good example is illustrated by the pie chart below, where each category represents an outcome and the proportions are probabilities.


When multiple outcomes occur


A set of outcomes is also called a probabilistic event. For example, when we throw a die, the set of all even numbers is an event. An event occurs if it contains the occurred outcome. Therefore if we throw the number 4, the Even event is said to have occurred. The probability of an event is the fraction of times the event occurs when the experiment is repeated many times. For example, if we throw the die 10 times and get the numbers 5, 3, 2, 3, 2, 1, 4, 6, 5, 2, then the empirical probability of the Odd event is 5/10 = 1/2. If we repeat the die throw an infinite number of times, the probability of the Even event will be the fraction of times we get an even number, which is the sum of the fractions of times 2, 4 or 6 occur: 1/6 + 1/6 + 1/6 = 3/6 = 1/2.
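The die-throw calculation above can be reproduced directly. This sketch uses the article's ten throws and exact fractions:

```python
from fractions import Fraction

throws = [5, 3, 2, 3, 2, 1, 4, 6, 5, 2]
odd_event = {1, 3, 5}

# Empirical probability: the fraction of throws that fall inside the event.
p_odd_empirical = Fraction(sum(t in odd_event for t in throws), len(throws))

# Theoretical probability of the Even event for a fair die: 1/6 + 1/6 + 1/6.
p_even_theoretical = 3 * Fraction(1, 6)
```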


Sampling with or without replacement


Sometimes we might want to repeat the same experiment multiple times, like flipping a coin. When repeating experiments, a common assumption is that the outcome of one experiment does not have any influence on the outcomes of the others. In other words, the experiments are independent.


In real-life situations, we often select things from a larger population and analyze them. There are two ways to do this: sampling with replacement and sampling without replacement. When we sequentially select with replacement, outcomes can repeat and the experiments are independent. In the example below, we sample two balls from a jar containing two balls, one yellow and one blue.


If we select without replacement from the jar, the outcomes cannot repeat and the experiments are dependent.
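Both sampling modes map onto Python's standard library (a minimal sketch; the seed and color labels are our own):

```python
import random

jar = ["yellow", "blue"]
rng = random.Random(42)

# With replacement: each draw sees the full jar, so outcomes can repeat.
with_replacement = [rng.choice(jar) for _ in range(2)]

# Without replacement: each ball is drawn at most once, so outcomes cannot repeat.
without_replacement = rng.sample(jar, 2)
```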


In all examples so far, order did matter. Sometimes the order does not
matter, and we just care about the collection. Instead of working with
tuples of possible outcomes, we deal with the set of possible outcomes.


For example, if we flip a coin twice, we can have the outcomes (head,
tail) or (tail, head). When order does not matter, both outcomes build
the event {(head, tail), (tail, head)}.

Let's now look at another situation, where we select 2 cards out of a deck of 6 cards with replacement. When order matters, we have a uniform distribution, meaning that all possible tuples (1,1), (1,2), …, (6,6) have the same probability of occurring: 1/36. However, when order does not matter, the distribution is no longer uniform, as illustrated below.


Let’s repeat the same experiment, this time without replacement. Now,
the distribution is uniform, as shown below. Order and sampling
methods can have a significant impact on the probabilities.
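The order effect for sampling with replacement can be verified by enumeration (a sketch using 6 die faces in place of the 6-card deck):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Ordered sampling with replacement: all 36 tuples are equally likely (uniform).
ordered = list(product(faces, faces))

# Ignoring order collapses (1, 2) and (2, 1) into one outcome, so the
# distribution over unordered pairs is no longer uniform.
unordered = Counter(tuple(sorted(t)) for t in ordered)
p_mixed = Fraction(unordered[(1, 2)], len(ordered))   # 2/36
p_double = Fraction(unordered[(1, 1)], len(ordered))  # 1/36
```

A mixed pair is twice as likely as a double, which is exactly the non-uniformity described above.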


When outcomes are numbers


A random variable is a random outcome whose values are numbers. Studying numbers (random variables) instead of raw outcomes brings several advantages. Instead of looking at heads and tails as occurrences, we can manipulate the count of heads, the count of tails or the sum of both. Moreover, the probabilities of a random variable can be displayed in a plot or expressed as functions with specific properties. Random variables take their values from a sample space. If the sample space is finite, such as the set of numbers {1, 2, 3}, then we are dealing with a discrete random variable. When the sample space is infinite, such as ℝ, then we have a continuous random variable.


Below you can see an example of a discrete random variable describing the count of heads in an experiment where three fair coins are flipped.


The table and plot above describe the probability mass function (PMF) of a discrete random variable, which can also be presented as a histogram. When the sample space is countably infinite, the PMF obviously cannot be uniform and cannot be an increasing function, since the probabilities must sum to 1. It can, however, be a decreasing function, or one whose support extends infinitely in both directions.
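The three-coin PMF shown in the table above can be computed exactly (a minimal sketch using binomial coefficients):

```python
from fractions import Fraction
from math import comb

# PMF of the number of heads in 3 fair coin flips: P(X = k) = C(3, k) / 2^3.
pmf = {k: Fraction(comb(3, k), 2**3) for k in range(4)}
```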

Probability of intervals
Sometimes we are interested in computing the probability that a random variable takes a value less than or equal to some threshold. The cumulative distribution function (CDF) is what allows us to manipulate such interval probabilities, as depicted in the example below.
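As a sketch of how a CDF relates to a PMF, here again with three fair coins (the function name is our own):

```python
from fractions import Fraction
from math import comb

# PMF of the number of heads in 3 fair coin flips.
pmf = {k: Fraction(comb(3, k), 2**3) for k in range(4)}

def cdf(x):
    """P(X <= x): cumulative sum of the PMF up to the threshold x."""
    return sum(p for k, p in pmf.items() if k <= x)

p_at_most_one_head = cdf(1)  # 1/8 + 3/8 = 1/2
```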


What we should expect


In order to study the uncertainty around a random variable, we are also interested in the range of possible values, the minimum and maximum values, the range average, the element average, and the sample mean. In particular, the sample mean is an estimate of what we might observe in the future on average. Its limit, the expected value, is an average of the values of a random variable, weighted by their probabilities. Below we have a temperature distribution, where 0 degrees is observed most of the time. In a sample of 10 temperature values, we obtain the sample mean of 19 degrees, which gives us an idea of the average temperature we will likely observe.


When the number of samples goes to infinity, the sample mean approaches a value called the expectation or mean of the random variable, as depicted below. It is also denoted μ.
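This convergence of the sample mean toward μ can be watched in a simulation (a minimal sketch using a fair die, whose expectation is 3.5; the seed and sample size are our own):

```python
import random

def sample_mean_of_die(n, seed=1):
    """Average of n simulated fair-die throws."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# The expectation of a fair die is (1 + 2 + ... + 6) / 6 = 3.5; the sample
# mean approaches it as the number of throws grows.
m = sample_mean_of_die(200_000)
```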

Life expectancy is actually a good illustration of expectation. Below we can see how this and related demographic figures changed over 50 years in developed countries.


The variance from expectation


Once we know what to expect, we can look at how the random variable differs from its expected value or mean. Two samples with the same mean can look very different. These variations are captured by the variance of the random variable. A natural measure would be the expectation of the absolute difference between the random variable X and its mean μ, but the absolute value is not easy to analyze (in particular, it is not differentiable at zero), so the variance is instead defined using the square of the differences. The standard deviation is defined as the square root of the variance.
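The definitions above translate directly into code (a sketch on a small made-up sample, cross-checked against Python's `statistics` module):

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # a small illustrative sample

mu = sum(data) / len(data)  # mean: 5.0
# Variance: the average of the squared differences from the mean.
var = sum((x - mu) ** 2 for x in data) / len(data)
# Standard deviation: the square root of the variance.
sd = var ** 0.5
```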

When everything happens at once


In real life, the sample space is a combination of multiple variables. If
we want to send ads to students, we might want to consider their
grade, study year and major in order to target each specific audience
with the right message.


Therefore, given 2 or more random variables, we consider the so-called joint distribution, which gives the probability of every possible tuple of values the variables can take in their sample spaces. The joint distribution is sufficient for calculating the marginal probabilities, i.e. the probability of a single variable taking a specific value.
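Marginalization is just summing the joint probabilities over the other variables. A sketch with hypothetical numbers (the grades, majors and probabilities below are invented for illustration, not taken from the article):

```python
from fractions import Fraction

# A hypothetical joint distribution over (grade, major) pairs.
joint = {
    ("A", "math"): Fraction(1, 4), ("A", "cs"): Fraction(1, 4),
    ("B", "math"): Fraction(1, 8), ("B", "cs"): Fraction(3, 8),
}

# Marginal probability of each grade: sum the joint over the other variable.
marginal_grade = {}
for (grade, _major), p in joint.items():
    marginal_grade[grade] = marginal_grade.get(grade, 0) + p
```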

What we should know


Random variables typically belong to popular families of distributions, which are very well studied and make our lives easier. Discrete distributions describe random variables with countable outcomes, in contrast to continuous distributions.


The simplest non-trivial random variable takes two possible values, for example 1 and 0, representing success and failure, or head and tail. The Bernoulli distribution defines the probability of success p and the probability of failure q = 1 - p. When performing several such independent binary experiments, the probability of a given sequence of outcomes is easily calculated as the product of the individual probabilities.
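The product rule for independent Bernoulli trials can be sketched as (function name ours):

```python
def sequence_probability(outcomes, p):
    """Probability of a given sequence of independent Bernoulli(p) outcomes,
    computed as the product of the individual probabilities."""
    prob = 1.0
    for success in outcomes:
        prob *= p if success else (1 - p)
    return prob

# e.g. (success, failure, success) with p = 0.5: 0.5 * 0.5 * 0.5 = 0.125
prob = sequence_probability([True, False, True], 0.5)
```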


The Binomial distribution defines the probability of a specific number of successes in n independent Bernoulli trials, where the probability of success is p. It can be used to study the number of positive respondents in a clinical trial or the number of marketing leads who will successfully be converted into paying customers. The Poisson binomial distribution is a variation of the binomial distribution in which the probability of success is not fixed across trials.
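The binomial PMF is short enough to write by hand (a sketch; the lead-conversion numbers are hypothetical):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical numbers: probability of converting exactly 2 of 10 leads
# when each lead converts independently with probability 0.1.
prob = binomial_pmf(2, 10, 0.1)
```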


The Poisson distribution approximates the binomial distribution for a small probability of success p and a very large number of trials n. For example, it is very useful for studying the number of people infected with a rare disease, the number of website visitors clicking on an ad, the number of responses to spam, the number of daily emergency calls, the number of people from a whole city's population visiting a store, or the number of gallery visitors who purchase a painting.
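The approximation can be checked numerically. A sketch with made-up parameters (n and p below are chosen for illustration, not taken from the article):

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

# With n large and p small, Poisson(lam = n * p) closely tracks Binomial(n, p).
n, p = 10_000, 0.0003
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
              for k in range(10))
```

The largest pointwise gap between the two PMFs is tiny, which is why the Poisson is the standard model for rare events in large populations.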


When we repeat independent Bernoulli trials with probability of success p, the number of trials n until we observe the first success follows a Geometric distribution. For example, the number of hacker attacks needed before breaching a computer system, or the number of startups founded before a successful business, will be distributed geometrically.
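A geometric variable is easy to simulate by counting trials until the first success (a sketch; the success probability, seed and sample size are our own):

```python
import random

def trials_until_first_success(p, rng):
    """Number of independent Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(7)
samples = [trials_until_first_success(0.25, rng) for _ in range(50_000)]
mean_trials = sum(samples) / len(samples)  # close to 1/p = 4
```

The sample mean hovers near 1/p, the known expectation of the geometric distribution.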


In many cases, random variables take values from an infinite space with an uncountable number of possible values. Examples include time variables such as flight duration or lifespan; space-related variables such as surface area and height; mass-related variables such as weight; and temperature. Even quantities which are almost countable can be treated this way, such as house prices, product prices, interest rates and unemployment rates. In all these cases, we no longer use a probability mass function. Instead we use the so-called probability density function (PDF).

The Exponential distribution extends the geometric distribution to


continuous values. For example, the duration of a phone call, the
waiting time when a customer calls a customer care hotline, and the
lifetime of a vehicle can all be described by an exponential distribution
with parameter λ.
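Python's standard library can sample from an exponential distribution directly. A sketch with a hypothetical rate (the λ value, seed and "call duration" framing are our own):

```python
import random

# Hypothetical rate: calls end at an average rate of one per 2 minutes,
# so lam = 0.5 and the mean duration is 1 / lam = 2 minutes.
lam = 0.5
rng = random.Random(3)
durations = [rng.expovariate(lam) for _ in range(100_000)]
mean_duration = sum(durations) / len(durations)  # close to 1 / lam
```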


The most popular of all is the Normal or Gaussian distribution, which commonly arises whenever many independent factors are added together. It describes the probability of values given their mean and standard deviation. Examples include people's heights and weights, as well as salaries.
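The "sum of many independent factors" intuition can be demonstrated with a simulation (a sketch; the choice of 12 uniform factors and the sample size are ours):

```python
import random

# Sums of many independent factors tend toward a normal shape; here each
# factor is Uniform(0, 1), so a sum of 12 has mean 12 * 0.5 = 6 and
# variance 12 * (1/12) = 1.
rng = random.Random(0)
sums = [sum(rng.random() for _ in range(12)) for _ in range(50_000)]

mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / len(sums)
```

A histogram of `sums` would show the familiar bell curve centered near 6.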


The probability density functions, means and variances of common continuous distributions are summarized in the table below.



Conclusion
In this article, we offered an accessible refresher on the probabilistic concepts most data scientists want to master. We went beyond rolling dice and intuitively illustrated how uncertainty is quantified and analyzed using outcomes, events, sample spaces, distribution functions, cumulative functions, random variables, expectation and variance.

We used pedagogic materials from Prof. Alon Orlitsky of UC San Diego in this article. He offered a free EdX course on probability and statistics using Python, together with Prof. Yoav Freund, a co-inventor of the AdaBoost algorithm. If you want to dive deeper into the concepts introduced in this article, we highly recommend taking the course.

Conditional probability was not covered; neither was the essential characteristic of Bayesian methods: their explicit use of probability to quantify uncertainty in inferences based on statistical data analysis.

Thanks for reading.

