05 Probability
Introduction to probability theory
Internal use
1. Random experiments, outcomes and events
Basic concepts I
– In statistics, we consider random experiments
Definition (Random experiment)
A random experiment is a process leading to at least two possible
outcomes with uncertainty as to which will occur.
• Examples: flip a coin, roll a die, daily change in a stock’s price, number of
customers entering a store.
• In each case, we can obtain different possible outcomes.
Definition (Basic Outcomes, Outcome space)
The potential outcomes of a random experiment are called basic
outcomes, and the set of all basic outcomes is called the sample space,
denoted by Ω.
1. Random experiments, outcomes and events
Basic concepts II
– Examples:
• Flip a coin: Ω = {Head, Tail}
• Roll a die: Ω = {1, 2, 3, 4, 5, 6}
• Daily change in a stock’s price:
Ω = {higher than yesterday, lower than yesterday}
• Number of customers entering a store: Ω = {0, 1, 2, 3, … }
– Sometimes the interest is not in the basic outcomes themselves but in some
subset of all outcomes in Ω.
Definition (Event)
An event is a set of basic outcomes from the outcome space, and it is said to
occur if the random experiment gives rise to one of its constituent basic
outcomes.
– Example: if a die is thrown, consider “Number resulting is even” and “Number
resulting is odd”.
1. Random experiments, outcomes and events
Relation of events
– Intersection of events:
• The intersection of two events is the set of elements that belong to
both sets.
A ∩ B ⇔ both A and B occur
• If two events have no common basic outcomes, we say they are
mutually exclusive: A ∩ B = ∅
1. Random experiments, outcomes and events
Relation of events
– Union of events:
• The union of two events is the set of all elements (common and
non-common) that belong to at least one of the two sets.
A ∪ B ⇔ either A or B (or both) occur
1. Random experiments, outcomes and events
Relation of events
– A case of special interest concerns a collection of several events whose
union is the whole outcome space Ω.
Definition (Collectively exhaustive events)
Let A1, A2, …, AK be K events in the outcome space Ω. If
A1 ∪ A2 ∪ ⋯ ∪ AK = Ω, these K events are said to be collectively
exhaustive.
Definition (Complement)
Let A be an event in the outcome space Ω. The set of basic outcomes
belonging to Ω but not to A is called the complement of A and it is
denoted by Ā.
2. What is probability?
What does Mathematics say about probability?
• Probability of event A: Pr(A)
• Kolmogorov axioms of probability:
1. 0 ≤ Pr(A) ≤ 1
2. Pr(Ω) = 1
3. Any countable sequence of pairwise disjoint events A1, A2, …, satisfies:
Pr(A1 ∪ A2 ∪ ⋯) = ∑i Pr(Ai)
Further properties:
• If A1, A2, …, AK are collectively exhaustive events, then:
Pr(A1 ∪ A2 ∪ ⋯ ∪ AK) = 1
• Pr(Ā) = 1 − Pr(A)
• Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
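The axioms and the derived properties can be checked numerically on a finite sample space. A minimal Python sketch, assuming the six equally likely outcomes of a fair die (the event choices are illustrative):

```python
from fractions import Fraction

# Sample space of a fair die; every basic outcome is equally likely (1/6 each).
omega = frozenset({1, 2, 3, 4, 5, 6})

def pr(event):
    """Pr(E) = |E| / |Omega| under the equally-likely assumption."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "even number"
B = {4, 5, 6}   # "greater than 3"

# Axioms 1 and 2: 0 <= Pr(A) <= 1 and Pr(Omega) = 1
assert 0 <= pr(A) <= 1 and pr(omega) == 1

# Complement rule: Pr(A-bar) = 1 - Pr(A)
assert pr(omega - A) == 1 - pr(A)

# Addition rule: Pr(A U B) = Pr(A) + Pr(B) - Pr(A n B)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
print(pr(A), pr(A | B))  # 1/2 2/3
```

Using `Fraction` keeps the probabilities exact, so the identities hold with equality rather than within floating-point tolerance.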
2. What is probability?
What is the intuition of probability?
• Probability is a measure of the uncertainty. It deals with the chance
(the likelihood) of an event occurring.
• There are two main approaches:
1. Frequentist (classical) approach: It is based on a concept we already
studied in Unit 2 of the course: the relative frequency.
2. Bayesian approach: Probability is interpreted as reasonable
expectation representing a state of knowledge or as quantification of
a personal belief (instead of frequency or propensity of some
phenomenon).
2. What is probability?
Frequentist approach
• Example: Flip a coin
• What is the meaning of “the probability of getting a head is 50%”?
• Karl Pearson (an English mathematician, 1857–1936) flipped a coin 24,000
times and got 12,012 heads ⟹ 12012/24000 = 0.5005 = 50.05%
• Under this approach, we repeat the experiment n times and count the
number of times, n(A), that the event A was observed.
• Relative frequency of event A: n(A)/n
• Therefore, probability is the limit of the relative frequency as n increases:
lim_{n→∞} n(A)/n = Pr(A)
But, what happens if we cannot repeat the experiment many times? ⟹ Subjective
probability
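The convergence of the relative frequency can be illustrated with a short simulation, in the spirit of Pearson's 24,000 flips. A Python sketch (the fair-coin value 0.5 is the probability being recovered):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def relative_frequency(n_flips):
    """Flip a fair coin n_flips times; return the relative frequency n(A)/n of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# As n grows, n(A)/n approaches Pr(A) = 0.5.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```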
2. What is probability?
Bayesian approach
• The Bayesian approach to probability allows us to quantitatively determine
probability values for statements whose truth or falsity is not
known with certainty.
• Conditional probability: It is a measure of the probability of an event
occurring, given that another event has already occurred:
Pr(A|B) = Pr(A ∩ B) / Pr(B), where Pr(B) ≠ 0
• Bayesian theory is completely based on Bayes' Theorem, which describes
how to obtain the probability of a hypothesis given an observation:
Pr(B|A) = Pr(A|B) · Pr(B) / Pr(A)
• Interpretation: Subjective probability. We are interested in the probability of
the event B ⟹ Pr(B) = a priori probability; Pr(B|A) = a posteriori probability
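Both definitions can be checked on a small joint distribution. A Python sketch with hypothetical joint probabilities (the numbers are illustrative, not from the slides):

```python
# Hypothetical joint distribution over two binary events A and B (sums to 1).
joint = {("A", "B"): 0.12, ("A", "notB"): 0.18,
         ("notA", "B"): 0.28, ("notA", "notB"): 0.42}

# Conditional probability: Pr(A|B) = Pr(A n B) / Pr(B)
pr_b = sum(p for (a, b), p in joint.items() if b == "B")
pr_a_given_b = joint[("A", "B")] / pr_b

# Bayes' Theorem recovers Pr(B|A) from Pr(A|B):
pr_a = sum(p for (a, b), p in joint.items() if a == "A")
pr_b_given_a = pr_a_given_b * pr_b / pr_a

print(round(pr_a_given_b, 2), round(pr_b_given_a, 2))  # 0.3 0.4
```

Note that `pr_b_given_a` computed via Bayes' Theorem agrees with the direct ratio `joint[("A", "B")] / pr_a`, as it must.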
2. What is probability?
Examples for the use of Bayesian methods
• Until the ’90s Bayesian techniques suffered from the lack of computing
power.
• With the availability of new knowledge and powerful computers, the
Bayesian approach has become more popular among scientists.
• Bayesian techniques have spread across many fields of science.
• The analysis starts from a mere intuition (a prior) and, after receiving
data, the probability is updated.
• A well-known example: Bayes' rule animates the perky paper-clip that
pops up on computer screens running Microsoft Office, making Bayesian
guesses about what advice the user might need.
2. What is probability?
Bayesian approach. Example
• Example
An insurance company believes that people can be divided into two classes:
those who are accident prone and those who are not. Their statistics show
that an accident-prone person will have an accident at some time within a
fixed 1-year period with probability 0.4, whereas this probability decreases to
0.2 for a non-accident-prone person. If we assume that 30% of the population
is accident prone:
a) What is the probability that a new policyholder will have an accident
within a year of purchasing a policy?
b) Suppose that a new policyholder has an accident within a year of
purchasing a policy. What is the probability that he/she is accident
prone?
2. What is probability?
Bayesian approach. Example
• Example: Decision tree and probabilities
Accident prone (P): 30% → Accident (A): 0.4 = 40%; No Accident (NA): 1 − 0.4 = 60%
Non-accident prone (NP): 70% → Accident (A): 20%; No Accident (NA): 80%
Pr(A) = Pr(A|P) · Pr(P) + Pr(A|NP) · Pr(NP) = 0.4 · 0.3 + 0.2 · 0.7 = 0.26 = 26%
A new policyholder will have an accident within a year of purchasing a policy with
a probability of 26%.
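The law-of-total-probability step above takes a few lines of Python (a sketch; the variable names are ours):

```python
# Insurance example, part a): Pr(A) = Pr(A|P)*Pr(P) + Pr(A|NP)*Pr(NP)
pr_p = 0.30              # accident prone
pr_np = 1 - pr_p         # not accident prone
pr_a_given_p = 0.40
pr_a_given_np = 0.20

pr_a = pr_a_given_p * pr_p + pr_a_given_np * pr_np
print(round(pr_a, 2))  # 0.26
```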
2. What is probability?
Bayesian approach. Example
• Example
b) Suppose that a new policyholder has an accident within a year of
purchasing a policy. What is the probability that he/she is accident
prone?
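The slide leaves part b) open; by Bayes' Theorem the posterior is Pr(P|A) = Pr(A|P) · Pr(P) / Pr(A) = 0.12/0.26 ≈ 0.46. A Python sketch of that computation:

```python
# Insurance example, part b): posterior probability of being accident prone
pr_p, pr_a_given_p = 0.30, 0.40
pr_a = 0.40 * 0.30 + 0.20 * 0.70   # total probability from part a) = 0.26

pr_p_given_a = pr_a_given_p * pr_p / pr_a
print(round(pr_p_given_a, 4))  # 0.4615
```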
2. What is probability?
Bayesian approach. Example
• Example
A marketing company is analyzing the impact of its marketing campaigns on Radio
(R), Television (TV) and Internet (I). To do so, it categorizes the results of
each campaign as Good (G), Medium (M) or Bad (B). The company
knows that a television campaign is never bad and is good with a probability of
70%. For radio campaigns, the probability of being medium is 30% and the
probability of being bad is 15%. Furthermore, all internet campaigns are medium.
If 30% of the company's campaigns are on radio and 25% on the internet,
calculate:
a) The probability that a campaign will be good.
b) If a campaign has had medium results, the probability that it came from radio.
c) The probability that a television campaign will have good results.
2. What is probability?
Bayesian approach. Example
• Example: Decision tree and probabilities
Radio (R): 30% → Good (G): 55%; Medium (M): 30%; Bad (B): 15%
Television (TV): 45% → Good (G): 70%; Medium (M): 30%; Bad (B): 0%
Internet (I): 25% → Good (G): 0%; Medium (M): 100%; Bad (B): 0%
2. What is probability?
Bayesian approach. Example
• Example
a) Probability that a campaign will be good.
R = Radio; TV = Television; I = Internet
G = Good; M = Medium; B = Bad
Law of total probability ⟹
Pr(G) = Pr(G|TV) · Pr(TV) + Pr(G|R) · Pr(R) + Pr(G|I) · Pr(I) = 0.70 · 0.45 + 0.55 · 0.30 + 0 · 0.25 = 0.48
Bayes' Theorem ⟹ Pr(R|M) = Pr(M|R) · Pr(R) / Pr(M)
The probability that a campaign is from television and has good results is
Pr(TV ∩ G) = Pr(G|TV) · Pr(TV) = 0.70 · 0.45 = 0.315 = 31.5%.
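All three answers to the marketing example follow from the shares and conditional probabilities in the statement (TV share = 1 − 0.30 − 0.25 = 0.45; radio good = 1 − 0.30 − 0.15 = 0.55). A Python sketch:

```python
share = {"R": 0.30, "TV": 0.45, "I": 0.25}   # Pr(channel)
result = {                                   # Pr(result | channel)
    "R":  {"G": 0.55, "M": 0.30, "B": 0.15},
    "TV": {"G": 0.70, "M": 0.30, "B": 0.00},
    "I":  {"G": 0.00, "M": 1.00, "B": 0.00},
}

# a) Law of total probability: Pr(G)
pr_g = sum(share[c] * result[c]["G"] for c in share)

# b) Bayes' Theorem: Pr(R|M) = Pr(M|R) * Pr(R) / Pr(M)
pr_m = sum(share[c] * result[c]["M"] for c in share)
pr_r_given_m = result["R"]["M"] * share["R"] / pr_m

# c) Pr(G|TV) is given directly in the statement
pr_g_given_tv = result["TV"]["G"]

print(round(pr_g, 2), round(pr_r_given_m, 4), pr_g_given_tv)  # 0.48 0.1895 0.7
```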
3. Random variable
Definitions
• Assume that a random experiment is to be carried out and that numerical
values can be attached to the possible outcomes.
• Before the experiment, there will be uncertainty as to the outcome, and this
can be quantified in terms of probabilistic statements.
• When the outcomes are numerical values, these probabilities can be
conveniently summarized through the notion of a random variable.
• Definition: A random variable is a variable that takes on numerical values
determined by the outcome of a random experiment.
– It is important to distinguish between a random variable and the possible values it
can take.
– We denote a random variable by X, and its possible values by x.
3. Random variable
Discrete and continuous random variables
• Random variables can be discrete or continuous:
• Discrete random variable: It is a variable that can take a finite or
countably infinite number of values.
• Examples:
• Number of patients in a doctor’s surgery in a day.
• Number of defective items in a box of ten.
• Continuous random variable: It is a variable that can take any value
within an interval (uncountably many possible values).
• Examples:
• Time required to run a kilometer.
• Household income in a year.
3. Random variable
Probability mass function and density function
• Discrete random variables → Probability mass function:
f(x) = Pr(X = x)
• Continuous random variables: this probability is equal to zero (!), and f(x) is
defined by its properties.
• Properties:
1. f(x) ≥ 0, ∀ x ∈ S
2. If X is discrete ⟹ ∑_{∀x∈S} f(x) = 1; if X is continuous ⟹ ∫_S f(x) dx = 1
3. Discrete ⟹ Pr(X ∈ A) = ∑_{x∈A} f(x); continuous ⟹ Pr(X ∈ A) = ∫_A f(x) dx,
where A ⊆ S
• Any function that satisfies these properties is a p.d.f. ⟹ there is an infinite
number of p.d.f.'s
3. Random variable
Cumulative distribution function
• Cumulative distribution function:
F(x) = Pr(X ≤ x)
• If X is discrete: F(x) = Pr(X ≤ x) = ∑_{xᵢ ≤ x} f(xᵢ)
• If X is continuous: F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(t) dt
• This function completely describes the probability distribution of the random
variable.
• Properties:
1. Monotone increasing (not necessarily strictly): F(x) ≤ F(y) if x < y.
2. Right-continuous: F(a) = lim_{x→a⁺} F(x) for any a.
3. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1
• All probability questions about X can be answered in terms of the c.d.f.
4. Expected value and variance
Expected value
• The expected value is the average value of a random variable over a large
number of experiments.
• For discrete random variables, the expected value is defined as:
E(X) = ∑_{∀xᵢ∈S} xᵢ · Pr(X = xᵢ)
• For continuous random variables, the expected value is:
E(X) = ∫_{−∞}^{+∞} x · f(x) dx
• In the frequentist approach, the expected value can be viewed as the long-run
average value that a random variable would take if the experiment was repeated
a large number of times.
• Do not confuse it with the sample mean, which is calculated from observed
data; the expected value is the mean of the theoretical/population distribution.
4. Expected value and variance
Properties of the expected value
1. E(c) = c, where c is a constant.
2. E(X1 ± X2 ± X3 ± ⋯ ± Xn) = E(X1) ± E(X2) ± E(X3) ± ⋯ ± E(Xn)
3. E(X1 · X2 · X3 · ⋯ · Xn) = E(X1) · E(X2) · E(X3) · ⋯ · E(Xn) ⇔ the random variables
are independent.
4. Denoting the mean by E(X) = μ, we have E(X − μ) = 0 ⟹ The mean is the
center of gravity of the distribution of X.
5. E(X + a) = E(X) + a, where a is a constant.
6. E(bX) = b · E(X), where b is a constant.
7. From 5 and 6, we have:
Y = a + bX ⟹ E(Y) = a + b · E(X)
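Properties 5–7 (linearity) can be checked empirically with simulated die rolls; a Python sketch (the constants a and b are arbitrary choices for the illustration):

```python
import random

random.seed(1)

# For Y = a + b*X, linearity says E(Y) = a + b*E(X).
a, b = 2.0, 3.0
xs = [random.randint(1, 6) for _ in range(100_000)]   # simulated fair-die rolls
ys = [a + b * x for x in xs]

mean = lambda v: sum(v) / len(v)
print(round(mean(ys), 3), round(a + b * mean(xs), 3))  # the two values agree
```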
4. Expected value and variance
Variance
• Consider the following random variables X, Y and Z:
X = 0, with probability 100%
Y = −1 with probability 50%, or 1 with probability 50%
Z = −100 with probability 50%, or 100 with probability 50%
• The expected value of all three variables X, Y and Z is equal to zero
[E(X) = E(Y) = E(Z) = 0], but there is obviously a much greater spread (or
dispersion) in Y and Z than in X.
• To measure the possible variation of each variable around its mean μ we
define the variance:
Var(X) = E[(X − μ)²]
4. Expected value and variance
Calculation of the variance
• The variance can also be calculated as:
Var(X) = E[(X − μ)²] = E(X²) − [E(X)]²
4. Expected value and variance
Properties of the variance
1. Var(c) = 0, where c is a constant.
2. Var(X + a) = Var(X), where a is a constant.
3. Var(bX) = b² · Var(X), where b is a constant.
4. From 2 and 3: Y = a + bX ⟹ Var(Y) = b² · Var(X)
4. Expected value and variance
Covariance
• The covariance is defined as:
Cov(X, Y) = E[(X − μ_X) · (Y − μ_Y)] = E(X · Y) − E(X) · E(Y)
• The sign of the covariance shows the direction of the relation between X and Y.
• Properties:
1. Cov(X, Y) = Cov(Y, X)
2. Cov(X, X) = Var(X)
3. Cov(aX, Y) = a · Cov(X, Y)
4. Var(X ± Y) = Var(X) + Var(Y) ± 2 · Cov(X, Y)
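The formula Cov(X, Y) = E(XY) − E(X)E(Y) can be computed directly from a joint p.m.f. A Python sketch with hypothetical joint probabilities (the numbers are illustrative):

```python
# Hypothetical joint p.m.f. over (x, y) pairs; probabilities sum to 1.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

# Cov(X, Y) = E(XY) - E(X)*E(Y); a positive sign means X and Y tend to move together.
cov = e_xy - e_x * e_y
print(round(cov, 2))  # 0.15
```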
5. Discrete random variables
Review of basic concepts
• Discrete random variables can take on either a finite or at most a countably
infinite set of discrete values.
• A probability mass function (p.m.f.) is a function that gives the probability that
a discrete random variable is exactly equal to some value: Pr(X = xᵢ) = pᵢ,
where all the values xᵢ that belong to the sample space have positive
probabilities ⟹ Pr(X = xᵢ) > 0.
• The sum of probabilities over all possible values in the sample space is equal to
one:
∑_{xᵢ∈S} Pr(X = xᵢ) = ∑_{xᵢ∈S} pᵢ = 1
• The cumulative distribution function (c.d.f.) is given by:
Fᵢ = Pr(X ≤ xᵢ) = ∑_{xⱼ ≤ xᵢ} Pr(X = xⱼ)
5. Discrete random variables
Example 1
• For an unfair die the possible outcomes have the following probability
distribution:
xᵢ 1 2 3 4 5 6
pᵢ 0.1 0.4 0.1 0.2 0.05 0.15
• Calculate the distribution function:
?
5. Discrete random variables
Example 1 (solution)
• For an unfair die the possible outcomes have the following probability
distribution:
xᵢ 1 2 3 4 5 6
pᵢ 0.1 0.4 0.1 0.2 0.05 0.15
• Calculate the distribution function:
F(x) =
0, x < 1
0.1, 1 ≤ x < 2
0.5, 2 ≤ x < 3
0.6, 3 ≤ x < 4
0.8, 4 ≤ x < 5
0.85, 5 ≤ x < 6
1, x ≥ 6
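The step function above is just the running sum of the p.m.f.; a Python sketch for the unfair die:

```python
from itertools import accumulate

# Unfair die: the running sums of the p.m.f. give the c.d.f. F(x) = Pr(X <= x).
values = [1, 2, 3, 4, 5, 6]
pmf = [0.1, 0.4, 0.1, 0.2, 0.05, 0.15]
cdf = list(zip(values, accumulate(pmf)))

def F(x):
    """Largest cumulative sum among values not exceeding x (0 below the support)."""
    return max((c for v, c in cdf if v <= x), default=0.0)

print(round(F(0.5), 2), round(F(3), 2), round(F(5.7), 2), round(F(10), 2))
# 0.0 0.6 0.85 1.0
```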
5. Discrete random variables
Example 2
• Consider a fair die. For the random variable X “number obtained after rolling
a die”:
a) Find the probability mass function:
?
b) Find the cumulative distribution function:
?
c) Calculate the following probabilities: Pr(X = 3), Pr(X ≥ 3), Pr(X < 3),
Pr(4 < X < 5), Pr(3 < X ≤ 5), Pr(1 ≤ X < 4) and Pr(1 ≤ X ≤ 4)
?
5. Discrete random variables
Example 2 (solution)
• Consider a fair die. For the random variable X “number obtained after rolling
a die”:
a) Find the probability mass function:
xᵢ 1 2 3 4 5 6
pᵢ 1/6 1/6 1/6 1/6 1/6 1/6
b) Find the cumulative distribution function:
F(x) =
0, x < 1
1/6, 1 ≤ x < 2
2/6, 2 ≤ x < 3
3/6, 3 ≤ x < 4
4/6, 4 ≤ x < 5
5/6, 5 ≤ x < 6
1, x ≥ 6
5. Discrete random variables
Example 2 (solution)
c) The following probabilities:
Pr(X = 3) = 1/6
Pr(X ≥ 3) = Pr(X = 3) + Pr(X = 4) + Pr(X = 5) + Pr(X = 6) = 4/6 = 2/3
Pr(X < 3) = Pr(X = 1) + Pr(X = 2) = 2/6 = 1/3
Pr(4 < X < 5) = 0
Pr(3 < X ≤ 5) = Pr(X = 4) + Pr(X = 5) = 2/6 = 1/3
Pr(1 ≤ X < 4) = Pr(X = 1) + Pr(X = 2) + Pr(X = 3) = 3/6 = 1/2
Pr(1 ≤ X ≤ 4) = Pr(X = 1) + Pr(X = 2) + Pr(X = 3) + Pr(X = 4) = 4/6 = 2/3
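The probabilities in part c) can be verified by counting favourable outcomes over the six equally likely ones; a Python sketch:

```python
from fractions import Fraction

# Fair die: Pr(X in A) = (number of favourable outcomes) / 6.
def pr(cond):
    return Fraction(sum(1 for x in range(1, 7) if cond(x)), 6)

assert pr(lambda x: x == 3) == Fraction(1, 6)
assert pr(lambda x: x >= 3) == Fraction(2, 3)
assert pr(lambda x: x < 3) == Fraction(1, 3)
assert pr(lambda x: 4 < x < 5) == 0
assert pr(lambda x: 3 < x <= 5) == Fraction(1, 3)
assert pr(lambda x: 1 <= x < 4) == Fraction(1, 2)
assert pr(lambda x: 1 <= x <= 4) == Fraction(2, 3)
print("all seven probabilities verified")
```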
5. Discrete random variables
Expected value
• The expected value is the theoretical average value of a numerical random
experiment over many repetitions of the experiment.
• It can be calculated by multiplying each of the possible outcomes of an
experiment by the likelihood that each outcome will occur and then summing all
of those values:
E(X) = μ = ∑_{i=1}^{n} xᵢ · pᵢ
• Example: A random variable can take the values 1, 2 and 3 with probabilities
0.2, 0.3 and 0.5, respectively. With this information, calculate the expected
value.
?
5. Discrete random variables
Expected value
• The expected value is the theoretical average value of a numerical random
experiment over many repetitions of the experiment.
• It can be calculated by multiplying each of the possible outcomes of an
experiment by the likelihood that each outcome will occur and then summing all
of those values:
E(X) = μ = ∑_{i=1}^{n} xᵢ · pᵢ
• Example: A random variable can take the values 1, 2 and 3 with probabilities
0.2, 0.3 and 0.5, respectively. With this information, calculate the expected
value of this experiment.
xᵢ 1 2 3
pᵢ 0.2 0.3 0.5
E(X) = ∑_{i=1}^{3} xᵢ · pᵢ = 1 · 0.2 + 2 · 0.3 + 3 · 0.5 = 2.3
5. Discrete random variables
Variance
• The variance of a random variable is a measure of spread for a distribution
that determines the degree to which the values of a random variable differ
from the expected value. It can be calculated as follows:
Var(X) = σ² = ∑_{i=1}^{n} (xᵢ − μ)² · pᵢ = ∑_{i=1}^{n} xᵢ² · pᵢ − μ²
• Example: Consider a fair die. Find the expected value and the variance for the
random variable X “number obtained after rolling a die”.
?
5. Discrete random variables
Variance
• The variance of a random variable is a measure of spread for a distribution
that determines the degree to which the values of a random variable differ
from the expected value. It can be calculated as follows:
Var(X) = σ² = ∑_{i=1}^{n} (xᵢ − μ)² · pᵢ = ∑_{i=1}^{n} xᵢ² · pᵢ − μ²
• Example: Consider a fair die. Find the expected value and the variance for the
random variable X “number obtained after rolling a die”.
xᵢ 1 2 3 4 5 6
pᵢ 1/6 1/6 1/6 1/6 1/6 1/6
E(X) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 3.5
Var(X) = 1² · 1/6 + 2² · 1/6 + 3² · 1/6 + 4² · 1/6 + 5² · 1/6 + 6² · 1/6 − 3.5² = 2.92
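The same numbers follow from a direct computation; a Python sketch using exact fractions:

```python
from fractions import Fraction

# Fair die: E(X) and Var(X) = E(X^2) - E(X)^2, computed exactly.
p = Fraction(1, 6)
xs = range(1, 7)

e_x = sum(x * p for x in xs)        # 7/2 = 3.5
e_x2 = sum(x * x * p for x in xs)   # 91/6
var_x = e_x2 - e_x ** 2             # 35/12, approximately 2.92

print(e_x, var_x, float(var_x))  # 7/2 35/12 2.9166666666666665
```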