
STATISTICAL FOUNDATIONS

SOST70151 – LECTURE 2

Prof. Natalie Shlomo and Dr. Kathrin Morosow


MyAttendance

• To log your attendance at these timetabled sessions, you will need to do the
following:
1. At the beginning of the session, on the My Manchester App, or My Manchester
Webpage, navigate to ‘My Check In’ (Formerly My Attendance). Please see the
attached user guide.
2. Click on the timetable and ‘Check In’ to the appropriate activity.
N.B. You may also be required to sign a physical register. If so, it is important that
you do this as well.
3. If you are unable to ‘Check In’, please make your session leader aware at the end
of the session.
• A video guide on how to use the system, a user guide and FAQs can be found
here: https://www.welcome.manchester.ac.uk/get-ready/become-a-student/guide-to-my-manchester/my-attendance/
Overview

Random variables and Probability Distributions

1. Random Variables
2. Probability Density Function
3. Probability Cumulative Distribution Function
4. Bernoulli Distribution
5. Binomial Distribution
6. Poisson Distribution
7. The Uniform Distribution
8. The Exponential Distribution
Random Variables
• Consider the experiment of asking a person a question about their gender
• The outcome will be `Male' or `Female'.
• But from a data analysis perspective we cannot work with the categories `Male' or
`Female' directly; we need a way of transforming these responses into numbers

Example 1: Toss a coin. The sample space is S={H,T}


• We define a random variable X that maps S = {H, T} into:

  X = 1 if Heads
      0 if Tails

• For an experiment of tossing the coin 20 times, we can sum the X_i, i = 1, …, 20 and
obtain the number of times we obtained Heads, Σ_{i=1}^{20} X_i, or the proportion of
Heads, Σ_{i=1}^{20} X_i / 20
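The coin-toss mapping above can be tried out in a few lines. This is a minimal sketch in Python (the seed is a made-up choice for reproducibility; in the course's R, rbinom(20, 1, 0.5) draws the same kind of sample):

```python
import random

random.seed(1)  # hypothetical seed, for reproducibility
# X_i = 1 if Heads, 0 if Tails, for 20 tosses of a fair coin
tosses = [random.randint(0, 1) for _ in range(20)]
heads = sum(tosses)        # sum of the X_i = number of Heads
proportion = heads / 20    # proportion of Heads
print(heads, proportion)
```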
Random Variables

Example 2: In a question on a government policy, the outcomes are:


S = {Strongly Approve, Approve, Indifferent, Disapprove, Strongly Disapprove}

Random variable: X maps S as follows:

  X = −2 if strongly disapproves
      −1 if disapproves
       0 if indifferent
       1 if approves
       2 if strongly approves

Example 3: In a question on income, random variable X maps S into ℝ
(any real number), where X(ω), ω ∈ S, is that person's income
Random Variables

• In general, X is a function X: S → A, where A is a subset of the real numbers ℝ

• We can distinguish between continuous and discrete random variables depending on the number of
elements in A:
  If A is uncountable then X is continuous
  If A is countable then X is discrete

• Once random variables are defined, we can translate questions about the likelihood of events into
questions about the likelihood of X

• The function that fully describes the likelihood of each value (or set of values) of a random variable is
called a probability density function
• A realisation is a possible outcome of the random variable (denoted by lower case letters)
Probability Density Function of a Discrete Random Variable

• Let X be a discrete random variable with values x_1, x_2, …

The function:

  f(x) = P(X = x)  if x = x_j, j = 1, 2, 3, …
         0         if x ≠ x_j

is the probability density function of X

• This density function f(x) characterises the likelihood of each value of X and satisfies 2 properties:
  (1) f(x) ≥ 0 (each value occurs with some likelihood or does not occur at all)
  (2) If x_1, x_2, … are all the values of a discrete random variable X, then Σ_j f(x_j) = 1 (the
probabilities of all values of X sum to 1)
Probability Density Function of a Discrete Random Variable

An example of the probability density function for a discrete random variable.
The underlying random variable can take on values 0, 1, …, 20 (as seen on the horizontal axis).
Each bar measures P(X = x) for x ∈ {0, 1, 2, …, 20}.
You can readily see which values are most likely (9, 10, 11) and which are very unlikely (0, 1, 19, 20).
Probability Density Function of a Discrete Random Variable

Assume 3 coin tosses with 2³ = 8 possible outcomes:

  TTT, TTH, THT, HTT, HHH, HTH, HHT, THH

Let X be the number of Heads: x = 0, 1, 2, 3

  x     0      1      2      3
  f(x)  0.125  0.375  0.375  0.125
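The four probabilities in this table can be checked with a short script. A sketch in Python (the lecture's R equivalent is dbinom(0:3, 3, 0.5)):

```python
from math import comb

# f(x) = C(3, x) * 0.5^x * 0.5^(3 - x) for x = 0, 1, 2, 3 Heads
f = {x: comb(3, x) * 0.5**x * 0.5**(3 - x) for x in range(4)}
print(f)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```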
Probability Cumulative Distribution Function

The Probability Cumulative Distribution Function of a random variable X, F_X(x) (or simply F(x)), measures:

  F(x) = P(X ≤ x) = Σ_{j: x_j ≤ x} f(x_j)

It is uniquely defined for each random variable:

  lim_{x→−∞} F(x) = 0
  lim_{x→∞} F(x) = 1
  Monotonic increasing: if x_1 ≤ x_2, then F(x_1) ≤ F(x_2)
  Right continuous

Then: f(x_j) = P(X ≤ x_j) − P(X ≤ x_{j−1}) = F(x_j) − F(x_{j−1})

and P(X > x) = 1 − F(x)
Probability Cumulative Distribution Function of a Discrete
Random Variable

An example of the distribution function for a discrete random variable.
The underlying random variable can take on values 0, 1, …, 20 (as seen on the horizontal axis).
Each step measures P(X ≤ x) for x ∈ {0, 1, 2, …, 20}.
Probability Cumulative Distribution Function of a Discrete
Random Variable

• Note that the vertical axes in the two graphs are different (the graph on the left goes up to 0.3; the graph
on the right goes up to 1). Each step in the right graph results from adding all the bars up to that point in
the left graph. That is, F(x) = P(X ≤ x) = Σ_{j: x_j ≤ x} f(x_j)
• Conversely, the height of each step corresponds to the height of the bar at the same point on the left
graph. That is, f(x_j) = P(X ≤ x_j) − P(X ≤ x_{j−1})
Probability Cumulative Distribution Function of a Discrete
Random Variable

Assume 3 coin tosses with 2³ = 8 possible outcomes:

  TTT, TTH, THT, HTT, HHH, HTH, HHT, THH

Let X be the number of Heads: x = 0, 1, 2, 3

  x     0      1      2      3
  F(x)  0.125  0.5    0.875  1
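The F(x) row is just the running sum of the f(x) row, and the pdf can be recovered from the steps of the cdf. A quick check (Python sketch):

```python
from itertools import accumulate

f = [0.125, 0.375, 0.375, 0.125]   # pdf of the number of Heads in 3 tosses
F = list(accumulate(f))            # cdf: running sum of f
print(F)  # [0.125, 0.5, 0.875, 1.0]

# Recover the pdf from the cdf: f(x_j) = F(x_j) - F(x_{j-1})
steps = [F[0]] + [F[j] - F[j - 1] for j in range(1, 4)]
```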
Modelling

• Having defined random variables and their distributions/densities allows us to carry out inference
• The point of probability theory is to characterise what happens in real life
• We can create models of random behaviour that try to replicate the behaviour of real random
phenomena
• These models are specific equations for, normally, a random variable's density function
• When observing a real random phenomenon, a researcher conjectures which of these models
might best explain the phenomenon
• Although real life is unlikely to follow exact formulae, imposing these `nice' models on real
phenomena is the focus of social science research
Bernoulli Distribution

• A Bernoulli experiment can result in only one of two possible outcomes:
  success with probability p or failure with probability 1 − p
  (Examples: getting heads on a flip of a coin; answering yes to a binary question such as
  employed vs not employed)
• A Bernoulli random variable assumes one of two possible values, 0 or 1, to indicate failure or success
respectively in a Bernoulli experiment.
• If X has a Bernoulli distribution, then

  f(x) = P(X = x) = p^x (1 − p)^(1−x),  x = 0, 1

  F(x) = 0      if x < 0
         1 − p  if 0 ≤ x < 1
         1      if x ≥ 1

Note that if x = 1, then f(x) = p and if x = 0, then f(x) = 1 − p
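These two formulas translate directly into code. A minimal sketch in Python (the function names are mine; in the lecture's R, dbinom and pbinom with size = 1 give the same values):

```python
def bernoulli_pdf(x, p):
    # f(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}
    return p**x * (1 - p)**(1 - x)

def bernoulli_cdf(x, p):
    # F(x): 0 below 0, 1 - p on [0, 1), 1 from 1 onwards
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p
    return 1.0

p = 0.3
print(bernoulli_pdf(1, p), bernoulli_pdf(0, p), bernoulli_cdf(0.5, p))
```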


Bernoulli Distribution

Bernoulli probability density function (pdf) and probability cumulative distribution function (cdf)
with p = 0.3
Binomial Distribution

A random variable X is said to have a binomial distribution if it is defined to be the total
number of successes in N independent Bernoulli experiments, each with probability of
success p. N denotes the number of trials and k the number of successes.

If X has a Binomial distribution, then

  f(x) = P(X = k) = (N choose k) p^k (1 − p)^(N−k)

  F(x) = Σ_{j=0}^{k} (N choose j) p^j (1 − p)^(N−j)

Recall that (N choose k) = N! / (k!(N−k)!) and N! = N × (N−1) × ⋯ × 1

Example: the number of times we get heads after flipping a coin N times (with probability of
seeing a head = p)
Binomial Distribution

For instance, if p = 0.4, N = 3:

  f(2) = (3 choose 2) 0.4² 0.6¹ = [(3×2×1)/((2×1)(1))] × 0.16 × 0.6 = 3 × 0.16 × 0.6 = 0.288

  F(2) = P(X ≤ 2) = f(0) + f(1) + f(2) = 0.216 + 0.432 + 0.288 = 0.936

Binomial probability density functions with
p = 0.5 and N = 20 (green), with
p = 0.5 and N = 40 (blue) and with
p = 0.05 and N = 20 (orange)
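The arithmetic in this example can be verified in a few lines. A Python sketch (the lecture's R equivalents are dbinom(2, 3, 0.4) and pbinom(2, 3, 0.4)):

```python
from math import comb

def binom_pdf(k, N, p):
    # f(k) = C(N, k) p^k (1 - p)^(N - k)
    return comb(N, k) * p**k * (1 - p)**(N - k)

f2 = binom_pdf(2, 3, 0.4)
F2 = sum(binom_pdf(k, 3, 0.4) for k in range(3))  # f(0) + f(1) + f(2)
print(round(f2, 3), round(F2, 3))  # 0.288 0.936
```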
Poisson Distribution

The Poisson distribution with parameter λ summarises the probability of a given number of occurrences of an
event in a fixed amount of time (occurrences are independent, e.g. one occurrence doesn't affect the
likelihood of any other occurrence)

If X has a Poisson distribution, then

  f(x) = P(X = x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, 3, …

Example: The number of leaflets posted through your letter box on a given day; the number of patients
arriving in an emergency room between 11pm and midnight
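The pmf is a direct transcription of the formula. A Python sketch (λ = 3 below is a made-up rate for the leaflet example; in the lecture's R, dpois(2, 3) gives the same value):

```python
from math import exp, factorial

def poisson_pdf(x, lam):
    # f(x) = e^{-lambda} * lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

lam = 3  # hypothetical: an average of 3 leaflets per day
print(round(poisson_pdf(2, lam), 4))  # probability of exactly 2 leaflets

# the probabilities over all x sum to 1 (here summed until they vanish)
total = sum(poisson_pdf(x, lam) for x in range(100))
```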
Poisson Distribution

Poisson probability density functions with 𝜆 = 1 (orange), 𝜆 = 5 (blue) and 𝜆 = 20 (green).


Other Distributions for Discrete Random Variables

• Negative Binomial
• Multinomial
• Zero-inflated Poisson
• Beta-Binomial
• Mixed Poisson (this is not one distribution, but a class of them which includes the
Negative Binomial and Beta-Binomial distributions)

Note that all these densities depend on `tuning' constants (p in the Bernoulli; N and p in the
Binomial; λ in the Poisson). We call these constants `parameters'.
Parameters matter because they define the shape of the probability density and
cumulative distribution functions (i.e. they determine which values are more/less likely)
Random Variables - Continuous

As a preview, we do not allocate probabilities to every possible outcome of a continuous variable

Let X be
• the lifespan of a bulb: x > 0
• the amount of juice in a carton: 0 ≤ x ≤ 1

Rather, we allocate probabilities to subsets of their ranges, i.e. intervals:

The probability that the bulb survives between 0.1 and 1 hour:
  P({X ≥ 0.1} ∩ {X ≤ 1}) = P(0.1 ≤ X ≤ 1)

The probability that the juice in the carton is more than half a litre but less than 1 litre:
  P({X ≥ 0.5} ∩ {X < 1}) = P(0.5 ≤ X < 1)
Random Variables - Continuous

In many cases it is not meaningful to allocate a probability to every outcome because
there are too many outcomes
Random Variables - Continuous
Example: Proportion of a bottle that is full (a number between 0 and 1); Income

Histograms represent probabilities in


intervals

We focus attention on the likelihood of intervals rather than single values
Random Variables - Continuous

The probability density function here measures the proportional change in F(x) when we
move x by a tiny amount Δx:

  f(x) = lim_{Δx→0} [F(x + Δx) − F(x)] / Δx

Or more formally, f(x) is the derivative of F(x)


Random Variables - Continuous

The probability density function here measures the proportional change in F(x) when we move x by a tiny
amount Δx.

Or more formally, f(x) is the derivative of F(x).

The density function at x tells us the proportional change in probability that F(x) would
experience if we moved a negligible amount away from x.
Random Variables - Continuous

Histograms for probabilities in intervals:

  P(a ≤ X < b) = area under f(x) between a and b

The number of observations = the area (the base × height)

This is the meaning of the derivative: each bar's area is its base times its height

  F(x) = ∫_{−∞}^{x} f(t) dt

and

  P(a ≤ X < b) = F(b) − F(a) = ∫_a^b f(x) dx
Random Variables - Continuous

Histograms for probabilities in intervals:

The number of observations = the area (the base × height)

The curve describes the probability density function
Random Variables - Continuous

With infinitely many bars we are left with the curve

def: The curve f(x) is called a probability density function (pdf)

The area under the curve is the probability of the interval, e.g. P(2 ≤ X < 4)
Random Variables - Continuous

EXAMPLE:

What is the probability that we get a value of X that is smaller than 2 or at least 4?

  P({X < 2} ∪ {X ≥ 4}) = P({X < 2}) + P({X ≥ 4})

The total area under the curve is the total probability:

  P(−∞ < X < ∞) = 1

So: P({X < 2} ∪ {X ≥ 4}) = P({X < 2}) + P({X ≥ 4})
                         = 1 − P(({X < 2} ∪ {X ≥ 4})^c) = 1 − P(2 ≤ X < 4)
The Uniform Distribution

We say that the continuous random variable X ~ uniform(0,1) if the pdf is

  f(x) = 1 if 0 ≤ x ≤ 1
         0 otherwise

The general pdf of X ~ uniform(a,b) is

  f(x) = 1/(b−a) if a ≤ x ≤ b
         0       otherwise

All intervals (of equal length) are equally likely

This is the most basic form of `random' in the continuous case
The Uniform Distribution

The probability cumulative distribution function (cdf) is F(x) = P(X ≤ x)

The cdf of a uniform(0,1) random variable is F(b) = P(0 ≤ X < b) = b

And more generally:

  F(x) = 0            if x < a
         (x−a)/(b−a)  if a ≤ x < b
         1            if x ≥ b
The Uniform Distribution

Example: The average amount of weight gained by a person over the
winter months is uniformly distributed from 0 to 30 lbs. Find the
probability a person will gain between 10 and 20 lbs during the winter
months.

Step 1: The pdf of Uniform(0,30) is 1/30
Step 2: The width of the "slice" of the distribution is 20 − 10 = 10
Step 3: Multiply the width (Step 2) by the height (Step 1) to get:
  Probability = 10 × 1/30 = 10/30 = 1/3
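The three steps translate directly into a difference of cdf values. A Python sketch (the lecture's R one-liner would be punif(20, 0, 30) - punif(10, 0, 30)):

```python
def uniform_cdf(x, a, b):
    # F(x) = (x - a) / (b - a) on [a, b), 0 below a, 1 from b onwards
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0

# P(10 <= X < 20) for X ~ uniform(0, 30): width 10 times height 1/30
prob = uniform_cdf(20, 0, 30) - uniform_cdf(10, 0, 30)
print(prob)
```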
The Exponential Distribution

Assume that we want to model something positive that has increasingly
less probability the larger it gets:

- The time till divorce
- The time spent unemployed

For example, the probability P(0.5 < X < 1.5)
is greater than P(2.5 < X < 3.5)

The exponential distribution has pdf

  f(x) = λe^(−λx),  x ≥ 0
The Exponential Distribution

We say that X ~ exponential(λ)

  f(x) = λe^(−λx)

A useful property is that we can write down the cumulative distribution function (cdf):

  F(x) = 1 − e^(−λx)
The Exponential Distribution

Let the time until Mark has his next tea be X ~ exponential(λ = 1):

  f(x) = e^(−x)

What is the probability that Mark has a cup of tea in the next 2 hours? (In R: pexp(2, 1))

  F(2) = P(X < 2) = 1 − e^(−λ×2)
       = 1 − e^(−2)
       = 1 − 0.135
       = 0.865
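Checking the arithmetic (Python sketch; the R call pexp(2, 1) on the slide returns the same value):

```python
from math import exp

lam = 1
F2 = 1 - exp(-lam * 2)  # F(2) = 1 - e^{-2}
print(round(F2, 4))  # 0.8647
```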
The Exponential Distribution

We say that X ~ exponential(λ):

  f(x) = λe^(−λx)    F(x) = P(X < x) = 1 − e^(−λx)

Memoryless property: P(T > s + t | T > s) = P(T > t)

Proof:
  P(T > s + t | T > s) = P(T > s + t ∩ T > s) / P(T > s)
                       = P(T > s + t) / P(T > s)
                       = e^(−λ(s+t)) / e^(−λs)
                       = e^(−λt) = P(T > t)

This function is used in the Cox survival model.
If the time between events is exponential, then the number of events is Poisson:
  P(Y = k) = e^(−λ) λ^k / k!
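The algebra of the memorylessness proof can be spot-checked numerically. A Python sketch (λ, s and t below are arbitrary made-up values):

```python
from math import exp, isclose

lam, s, t = 0.7, 2.0, 1.5           # hypothetical values
survival = lambda x: exp(-lam * x)  # P(T > x) = e^{-lambda x}

lhs = survival(s + t) / survival(s)  # P(T > s + t | T > s)
rhs = survival(t)                    # P(T > t)
print(isclose(lhs, rhs))  # True: the exponential is memoryless
```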
The Exponential Distribution
Example:

- If jobs arrive every 15 seconds on average, λ = 4 per minute. What is the probability of
  waiting less than or equal to 30 seconds, i.e. 0.5 min?
  The cdf is F(x) = P(X ≤ x) = 1 − e^(−λx), so
  F(0.5) = 1 − e^(−4×0.5) = 1 − e^(−2) = 0.86

- Accidents occur with a Poisson distribution at an average of 4 per week, i.e. λ = 4
  1. Calculate the probability of more than 5 accidents in any one week:
     Poisson: P(X > 5) = 1 − P(X ≤ 5)
     = 1 − [e^(−4) + 4e^(−4) + 4²e^(−4)/2! + 4³e^(−4)/3! + 4⁴e^(−4)/4! + 4⁵e^(−4)/5!] = 0.215
  2. What is the probability that at least two weeks will elapse between accidents?
     Exponential: P(X > 2) = e^(−4×2) = e^(−8) = 0.00034
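Both answers can be reproduced in a few lines. A Python sketch (in R, 1 - ppois(5, 4) and pexp(2, 4, lower.tail = FALSE) do the same):

```python
from math import exp, factorial

lam = 4

# 1. More than 5 accidents in a week: Poisson tail 1 - P(X <= 5)
p_le_5 = sum(exp(-lam) * lam**k / factorial(k) for k in range(6))
print(round(1 - p_le_5, 3))  # 0.215

# 2. At least two weeks between accidents: exponential tail P(X > 2)
gap = exp(-lam * 2)
print(round(gap, 5))  # 0.00034
```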
Simulation in R
Base R includes commands to calculate the value of a large set of probability density
functions and probability cumulative distribution functions:

dbinom(x, size=, prob=)
dpois(x, lambda=)
dunif(x, min=, max=)
dexp(x, rate=)

pbinom(x, size=, prob=)
ppois(x, lambda=)
punif(x, min=, max=)
pexp(x, rate=)

Example: dexp is the pdf of the exponential distribution, pexp is the cdf and rexp generates
random deviates. With λ = 1:

plot(c(0:10), dexp(c(0:10), 1), type='l', xlab='x', ylab='pdf', main='lambda: 1')
plot(c(0:10), pexp(c(0:10), 1), type='l', xlab='x', ylab='cdf', main='lambda: 1')

exponrand1000 <- rexp(1000, 1)
hist(exponrand1000)
Simulation in R
Generate observations simulating the behaviour of random variables:

rbinom(n, size=, prob=)
rpois(n, lambda=)
runif(n, min=, max=)
rexp(n, rate=)
Simulation in R
If we have lots of observations from a random variable, we can approximate its true density
(and cumulative distribution) function.

This also implies that we can approximate probabilities: given a set of simulated numbers
from a random variable, P(X ≤ x) can be approximated by the proportion of numbers
smaller than or equal to x:

realProbability <- pexp(1, rate=1)
simulatedData <- rexp(100000, rate=1)
numbersUnder1 <- simulatedData <= 1
print(realProbability)
## [1] 0.6321206
mean(numbersUnder1)
## [1] 0.6332
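The same Monte Carlo check, sketched in Python for comparison (random.expovariate draws exponential deviates; the seed is an arbitrary choice):

```python
import random
from math import exp

random.seed(42)  # hypothetical seed
real_prob = 1 - exp(-1)  # P(X <= 1) for exponential(rate = 1)

# proportion of 100,000 simulated values that are <= 1
simulated = [random.expovariate(1) for _ in range(100_000)]
approx = sum(x <= 1 for x in simulated) / len(simulated)
print(round(real_prob, 7), round(approx, 4))
```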
Reading

Discrete and Continuous Random Variables, probability distributions:

Crawshaw & Chambers (2014)


Chapters 4, 5 and 6

Agresti, A. (2018)
Chapter 4.2 and 4.3

Gill (2006)
Chapter 8
