Chapter 4 Data Analysis
Slide 1/67
Introduction Random variables Describing distributions Special case Independent events
Outline
Introduction
Randomness
Random variables
Types of Random Variables
Distributions of Random variables
Describing probability distributions
Measures of location
Measures of spread
Special case
Sums of i.i.d. random variables and the Central Limit Theorem
Independent events
Exercises
Slide 2/67
References
References:
Utts and Heckard (4th & 5th editions) – Chapter 8
DeVeaux, Velleman, Bock – Chapter 16
Utts and Heckard (3rd edition) – Chapters 7 & 8
Slide 3/67
Learning Outcomes:
At the end of this topic you should be able to:
▶ Understand notation for random variables and related
quantities
▶ Recognise discrete and continuous random variables
▶ Use the pmf, pdf and cdf to calculate probabilities and
quantiles
▶ Describe probability distributions using expected value,
variance and standard deviation (including calculating these in
some cases)
▶ Use properties of expected value and variance to calculate
these for derived (or re-scaled) variables
▶ State the Central Limit Theorem
▶ Understand independence and mutual exclusion for random
variables
Slide 4/67
Road Map
Slide 5/67
Introduction
Slide 6/67
▶ So, what values for our sample mean (for example) could we
expect to observe, if we took lots and lots (an infinite number)
of random samples and recorded their respective means?
The answers to questions like this come from understanding what
random phenomena (processes) are, and how they behave.
In this chapter we look at hypothetical (population) models
that enable us to model random phenomena, such as the variation
in outcomes from sample to sample.
Slide 7/67
. . . Visually
Slide 8/67
Randomness
Random phenomena
Example:
Flip a fair coin. It’s challenging to guess the outcome of just one
coin toss (one trial) because the outcome is random.
However, if you flip a fair coin many times over you would, in the
long run, expect the proportion of heads to be about 0.5.
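The long-run behaviour described above can be illustrated with a small simulation. This is only a sketch: the function name `proportion_of_heads` and the seed are illustrative choices, and a pseudo-random generator stands in for a real coin.

```python
import random

def proportion_of_heads(n_flips, seed=0):
    """Simulate n_flips tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The proportion is erratic for few flips and settles near 0.5 for many flips.
for n in (10, 100, 10_000, 1_000_000):
    print(n, proportion_of_heads(n))
```

A single toss stays unpredictable; only the proportion over many tosses stabilises.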
Slide 9/67
Randomness
Slide 10/67
Randomness
https://pollev.com/paulfijn
Slide 11/67
Randomness
https://www.youtube.com/watch?v=JC41M7RPSec
Slide 12/67
Randomness
Quantifying randomness
Slide 13/67
Randomness
Pr(X = x) = p(x)
Slide 14/67
Random Variables
Notation
▶ for random variables use capital letters (X, Y, Z, ...)
▶ for observed values use lower-case letters (x, y, z, ...)
Slide 15/67
Examples
Slide 16/67
Examples
Slide 17/67
Probability Distributions
Slide 18/67
Slide 19/67
For example:
Y = number of individuals in a group who own an iPhone;
Z = number of medical students with CVD (Colour Vision
Deficiency);
V = number of grapevines that are diseased
Slide 20/67
. . . and Continuous
For example:
W = weight;
X = age;
T = temperature (°C)
Slide 21/67
Slide 22/67
https://pollev.com/paulfijn
Slide 23/67
$p_X(x) = \Pr(X = x)$.
$p_X(x)$ may be displayed as a table, a graph or a formula.
Slide 24/67
Example — table
Slide 25/67
Example — graph
Slide 26/67
Properties of pmfs
Slide 27/67
$F(x) = \Pr(X \le x)$.
Slide 28/67
Illustrating. . .
x 1 2 3 4 5
p(x) 2c 3c c 4c 5c
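One way to work with this table, assuming the aim is to find the constant c and then the cdf F(x) = Pr(X ≤ x): the probabilities must sum to 1, which fixes c, and the cdf is the running total of the pmf. The variable names below are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5]
weights = [2, 3, 1, 4, 5]            # p(x) = weight * c, read from the table

c = Fraction(1, sum(weights))        # the probabilities sum to 1, so 15c = 1
pmf = {x: w * c for x, w in zip(values, weights)}

# The cdf at x is the sum of p(t) over all t <= x (a running total).
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

print("c =", c)
for x in values:
    print(x, pmf[x], cdf[x])
```

Using `Fraction` keeps the arithmetic exact, so the final cdf value is exactly 1.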
Slide 29/67
Slide 30/67
. . . Pr(X = x)?
Slide 31/67
Example: rounding
Then Pr(16.5 < T < 17.5) is taken as the probability that ‘the
student takes 17 minutes’ (rounded to the nearest minute).
Slide 32/67
Properties of pdfs
3. Finding probabilities:
$\Pr(a < X < b) = \int_a^b f_X(x)\,dx.$
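As a rough numerical illustration of finding a probability as an area under a pdf, the sketch below approximates the integral with the midpoint rule. The density f(x) = 2x on [0, 1] is a hypothetical example chosen for this sketch, not one taken from the slides, and `prob_between` is an illustrative helper name.

```python
def prob_between(pdf, a, b, steps=100_000):
    """Approximate Pr(a < X < b), the integral of pdf from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

def example_pdf(x):
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

# Exact value for comparison: 0.5**2 - 0.2**2 = 0.21
print(prob_between(example_pdf, 0.2, 0.5))
# The total area under any pdf is 1
print(prob_between(example_pdf, 0.0, 1.0))
```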
Slide 33/67
The cdf is defined in the same way for both discrete and
continuous random variables.
Slide 34/67
Slide 35/67
Measures of location
Percentiles of a distribution
Continuous distribution
the x boundary for a left area
Slide 36/67
Measures of location
Slide 37/67
Measures of location
Slide 38/67
Measures of location
Example:
x        0    1    2    3
pX(x)   1/8  3/8  3/8  1/8
$E(X) = 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} = \frac{3}{2} = 1.5$
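The same calculation can be written as a short sketch: the expected value of a discrete random variable is the sum of each value times its probability. The helper name `expected_value` is illustrative.

```python
from fractions import Fraction

# pmf from the example: values 0..3 with probabilities 1/8, 3/8, 3/8, 1/8
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expected_value(pmf):
    """E(X): sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

print(expected_value(pmf))          # exact fraction
print(float(expected_value(pmf)))   # decimal form
```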
Slide 39/67
Measures of location
Slide 40/67
Measures of location
If Y = aX + b for constants a and b, then
E(Y) = a × E(X) + b.
For any two random variables X and Y,
E(X + Y) = E(X) + E(Y).
Slide 41/67
Measures of spread
Slide 42/67
Measures of spread
$\mathrm{Var}(X) = \sum_x (x - \mu)^2 \, p_X(x) = E\left[(X - \mu)^2\right]$, where $\mu = E(X)$.
Slide 43/67
Measures of spread
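x        0    1    2    3
pX(x)   1/8  3/8  3/8  1/8
E(X) = 3/2 (found previously)
$\mathrm{Var}(X) = 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 4 \times \frac{3}{8} + 9 \times \frac{1}{8} - \left(\frac{3}{2}\right)^2 = \frac{24}{8} - \frac{9}{4} = \frac{3}{4}$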
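A minimal sketch of this variance calculation, done two ways for comparison: from the definition (the expected squared deviation from the mean) and from the form E(X²) minus the squared mean that the working above uses. Variable names are illustrative.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in pmf.items())                      # E(X)

# Definition: expected squared deviation from the mean
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Computational form: E(X^2) - mu^2
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

print(mu, var_definition, var_shortcut)
```

Both routes give the same exact fraction, which is a useful check when doing the arithmetic by hand.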
Slide 44/67
Measures of spread
1. x        0    1    2
   pX(x)   0.5  0.3  0.2
2. $p_X(x) = \binom{3}{x} (0.6)^x (0.4)^{3-x}$, x = 0, 1, 2, 3
3. $p_X(x) = kx$, x = 1, 2, 3, 4, 5.
Slide 45/67
Measures of spread
where µ = E (X ).
The standard deviation of X is again $\sqrt{\mathrm{Var}(X)}$.
Slide 46/67
Measures of spread
$\mathrm{Var}(X) = E(X^2) - \mu^2$.
Slide 47/67
Measures of spread
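If Y = aX + b, then
$\mathrm{Var}(Y) = a^2 \, \mathrm{Var}(X)$
$\mathrm{sd}(Y) = |a| \, \mathrm{sd}(X)$
Also, if X and Y are independent,
$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$,
and
$\mathrm{Var}(aX + bY) = a^2 \, \mathrm{Var}(X) + b^2 \, \mathrm{Var}(Y)$.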
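These variance rules can be checked exactly on small examples. The sketch below assumes X and Y are independent, so their joint pmf is the product of the individual pmfs. The second pmf and the constants a = 3, b = -2 are hypothetical values chosen only for illustration, and the helper names are not from the slides.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def combine(pmf_x, pmf_y, func):
    """pmf of func(X, Y) when X and Y are independent (joint pmf = product)."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        value = func(x, y)
        out[value] = out.get(value, 0) + px * py
    return out

pmf_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
pmf_y = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}  # hypothetical
a, b = 3, -2                                                        # hypothetical

rescaled = {a * x + b: p for x, p in pmf_x.items()}            # pmf of aX + b
difference = combine(pmf_x, pmf_y, lambda x, y: x - y)         # pmf of X - Y
weighted = combine(pmf_x, pmf_y, lambda x, y: a * x + b * y)   # pmf of aX + bY

print(variance(rescaled) == a ** 2 * variance(pmf_x))
print(variance(difference) == variance(pmf_x) + variance(pmf_y))
print(variance(weighted) == a ** 2 * variance(pmf_x) + b ** 2 * variance(pmf_y))
```

Note that the variances add even for a difference: subtracting an independent quantity still adds uncertainty.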
Slide 48/67
Measures of spread
8. Standardising
8.1 Let X be a random variable, discrete or continuous, with
expectation µ and standard deviation σ.
From (6) it follows that $X_S = \frac{X - \mu}{\sigma}$ has expectation 0 and
standard deviation 1.
8.2 Let Y be a random variable, discrete or continuous, with
expectation 0 and standard deviation 1. Again from (6) it
follows that W = σY + µ has expectation µ and standard
deviation σ.
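A quick numerical check of standardising, as a sketch only. It reuses the pmf with probabilities 1/8, 3/8, 3/8, 1/8 on the values 0 to 3 as an assumed example; any distribution with a positive standard deviation would do.

```python
import math

values = [0, 1, 2, 3]
probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]   # assumed example pmf

mu = sum(x * p for x, p in zip(values, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

# Standardise each possible value; the probabilities are unchanged.
standardised = [(x - mu) / sigma for x in values]

mean_s = sum(z * p for z, p in zip(standardised, probs))
sd_s = math.sqrt(sum((z - mean_s) ** 2 * p for z, p in zip(standardised, probs)))

print(mean_s, sd_s)   # should be 0 and 1, up to floating-point rounding
```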
Slide 49/67
Measures of spread
Rescaling — an example
Slide 50/67
Measures of spread
Slide 51/67
Measures of spread
E (T ) =
V (T ) =
stdev (T ) =
Slide 52/67
$E(S_n) = \mu + \mu + \cdots + \mu = n\mu$
$\mathrm{Var}(S_n) = \sigma^2 + \sigma^2 + \cdots + \sigma^2 = n\sigma^2$
and hence
$\mathrm{sd}(S_n) = \sqrt{\mathrm{Var}(S_n)} = \sqrt{n}\,\sigma$
Slide 53/67
The Central Limit Theorem says that if, in addition, n is large then:
$S_n \overset{d}{\approx} N(n\mu, \sqrt{n}\,\sigma)$
This is an amazing and important result! It is the fundamental
reason for the importance of the normal distribution.
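The theorem can be explored with a simulation. The sketch below uses sums of fair six-sided die rolls as an assumed example (a single roll has mean 3.5 and variance 35/12); the sample sizes and seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

def simulate_sums(n, reps, seed=1):
    """Return `reps` simulated values of S_n, the sum of n fair die rolls."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

mu, sigma = 3.5, math.sqrt(35 / 12)    # mean and sd of one fair die roll
n, reps = 30, 10_000
sums = simulate_sums(n, reps)

theory_mean = n * mu
theory_sd = math.sqrt(n) * sigma

print("mean:", statistics.mean(sums), "theory:", theory_mean)
print("sd:  ", statistics.pstdev(sums), "theory:", theory_sd)

# For a normal distribution roughly 68% of values lie within one sd of the mean.
within_one_sd = sum(abs(s - theory_mean) <= theory_sd for s in sums) / reps
print("share within one sd:", within_one_sd)
```

A histogram of `sums` would show the familiar bell shape even though a single die roll is uniform, not normal.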
Slide 55/67
Then
$E(\bar{X}) = \mu$
$\mathrm{Var}(\bar{X}) = \sigma^2 / n$
and
$\mathrm{sd}(\bar{X}) = \sigma / \sqrt{n}$
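These results follow from the rules for rescaled variables. The derivation below is a sketch that assumes the sample mean is defined as $\bar{X} = S_n / n$, where $S_n$ is the sum of n independent, identically distributed variables with mean $\mu$ and standard deviation $\sigma$.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\tfrac{1}{n} S_n\right) = \tfrac{1}{n}\, E(S_n) = \tfrac{1}{n} \cdot n\mu = \mu \\
\mathrm{Var}(\bar{X}) &= \mathrm{Var}\!\left(\tfrac{1}{n} S_n\right) = \tfrac{1}{n^2}\, \mathrm{Var}(S_n) = \tfrac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n} \\
\mathrm{sd}(\bar{X}) &= \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The constant 1/n comes out of the expectation unchanged but comes out of the variance squared, which is why the spread of the sample mean shrinks as n grows.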
Slide 56/67
Example
Slide 57/67
The mean gives the forecast. The standard deviation gives the
accuracy of the forecast.
Slide 58/67
When betting on red in Roulette, how do the results for 100 games
of $1 each compare to:
playing 50 games for $2 each?
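One way to approach this comparison is to compute the mean and standard deviation of the total winnings for each strategy. The sketch below rests on assumptions the slide does not state: a single-zero wheel with 18 red pockets out of 37, an even-money payout, and independent spins. The function name `total_winnings` is illustrative.

```python
import math

P_RED = 18 / 37   # assumption: single-zero wheel with 18 red pockets out of 37

def total_winnings(games, stake, p_win=P_RED):
    """Return (mean, sd) of total net winnings over independent even-money bets."""
    mean_one = stake * (2 * p_win - 1)       # +stake with prob p, -stake otherwise
    var_one = stake ** 2 - mean_one ** 2     # E(W^2) - E(W)^2, where W^2 = stake^2
    return games * mean_one, math.sqrt(games * var_one)

mean_a, sd_a = total_winnings(games=100, stake=1)
mean_b, sd_b = total_winnings(games=50, stake=2)

print(f"100 games of $1: mean {mean_a:.2f}, sd {sd_a:.2f}")
print(f" 50 games of $2: mean {mean_b:.2f}, sd {sd_b:.2f}")
```

Under these assumptions the two strategies have the same expected loss, but the 50 larger bets have a standard deviation larger by a factor of √2, so the outcome is less predictable.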
Slide 59/67
Slide 60/67
Slide 61/67
Definition of Independence
Slide 62/67
Slide 63/67
Slide 64/67
$\Pr(C) = \frac{160}{1000} = 0.16 \neq \Pr(C \mid H) = 0.4$
Furthermore, since Pr(C|H) > Pr(C), the two events are positively associated.
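The comparison above can be phrased as a small check: two events are independent when conditioning on one leaves the probability of the other unchanged. The function name and the label "negatively associated" for the opposite case are choices made for this sketch.

```python
from fractions import Fraction

def describe_association(p_event, p_event_given_other):
    """Compare Pr(C) with Pr(C|H) and describe how the two events are related."""
    if p_event_given_other == p_event:
        return "independent"
    if p_event_given_other > p_event:
        return "positively associated"
    return "negatively associated"

pr_c = Fraction(160, 1000)        # Pr(C) from the example
pr_c_given_h = Fraction(2, 5)     # Pr(C|H) = 0.4 from the example

print(describe_association(pr_c, pr_c_given_h))
```

With real data the two probabilities are estimates, so an exact equality test like this one is only appropriate for known population values.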
Slide 65/67
Slide 66/67
Exercises
Slide 67/67
Exercises
Slide 68/67