Stats2 Textbook Week4
for
STATISTICS FOR DATA SCIENCE - 2
Contents
1 Continuous random variable
1.1 Introduction
1.2 Cumulative Distribution function
1.2.1 CDF of a random variable
1.2.2 Properties of CDF
1.2.3 Examples
1.3 Continuous Random Variable: Approximation of CDF from Discrete to Continuous
1.4 Cumulative Distribution Functions
1.4.1 Examples of valid CDF’s
1.4.2 Probability of intervals using continuous CDF
1.5 General random variables and continuous random variables
1.5.0.1 CDFs and random variables
1.5.1 Properties of CDF
1.5.2 Continuous random variable
1.5.3 Some scenarios for continuous models
1.6 Probability Density Function
1.6.1 Properties of PDF
1.7 Common distributions
1.7.1 Uniform distribution
1.7.2 Exponential distribution
1.7.2.1 Memoryless property of Exponential
1.7.3 Normal distribution
1.7.3.1 PDF of Normal distribution
1.7.3.2 CDF of Normal distribution
1.7.3.3 Standard normal distribution
1.7.3.4 Probability computations with normal distribution
Chapter 4
– We have data for 45,000+ meteorites, and their weights range from 0.01 grams to 60 tons.
– If we want to do a statistical study of this data, we treat meteorite weight as the random variable. If we model it as a discrete random variable, with one probability per distinct weight, it becomes very difficult to gain any insight.
• Preprocessing: Take log2 of the data. The range is reduced from [0.01, 60000000] to [−6.6, 25.8]. But we still have 45,000+ data points.
• Main idea: Divide [−6.6, 25.8] into 100 intervals of the size 0.3. We will have
Histogram for the log of weight of meteorite data
The x-axis shows the bins of log2 (weight) and the y-axis gives the number of values that lie in each bin.
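The binning idea can be sketched in Python as follows. This is a minimal illustration on synthetic weights, not the NASA data. The function name `log2_histogram` and the random sample are assumptions made for the example.

```python
import math
import random

def log2_histogram(weights, n_bins=100):
    """Count how many log2(weight) values fall in each of n_bins equal-width bins."""
    logs = [math.log2(w) for w in weights]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in logs:
        # The largest value is placed in the last bin instead of a bin of its own.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, width, counts

# Synthetic stand-in for meteorite weights in grams (assumption, NOT real data):
# log2(weight) is drawn uniformly from [-6.6, 25.8].
random.seed(0)
weights = [2 ** random.uniform(-6.6, 25.8) for _ in range(45_000)]

lo, width, counts = log2_histogram(weights)
print(f"bin width is about {width:.3f}; counts sum to {sum(counts)}")
```

With 100 bins over a range of about 32.4, each bin is roughly 0.3 wide, so 45,000+ individual values are summarised by 100 counts.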
(ii) Binomial(n, p)
The pmf has a nice shape, but calculating with it is not easy. So we can give up the precision of individual values, focus on the shape, and come up with an alternative model.
Footnote 1: Meteorite data has been taken from NASA’s open data portal.
1.2 Cumulative Distribution function
1.2.1 CDF of a random variable
Definition: The Cumulative Distribution Function (CDF) of a random variable X, denoted
FX (x), is a function from R to [0, 1], defined as
FX (x) = P (X ≤ x)
CDF is a very important bridge between the discrete world and the continuous world.
iv) As x → ∞, FX goes to 1.
1.2.3 Examples
i) Bernoulli random variable
Consider a Bernoulli random variable X with X taking the values 0 and 1 with probabilities (1 − p) and p, respectively.
Solution:
ii) Throw a die
Consider a random variable X that represents the outcome of throwing a fair die. The possible outcomes are {1, 2, 3, 4, 5, 6}.
X ∼ Uniform{1, 2, 3, 4, 5, 6}
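Therefore, the CDF of X is given by
\[
F_X(x) = \begin{cases}
0, & x < 1 \\
1/6, & 1 \le x < 2 \\
2/6, & 2 \le x < 3 \\
3/6, & 3 \le x < 4 \\
4/6, & 4 \le x < 5 \\
5/6, & 5 \le x < 6 \\
1, & x \ge 6
\end{cases}
\]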
Compute: P (X = 4.5)
P (X = 4.5) = 0 (since there is no jump at x = 4.5)
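As a small illustration, the step CDF of the fair die and the "jump" rule for P (X = x) can be written in Python. The helper names `die_cdf` and `jump` are hypothetical, and the left limit is computed with a shortcut that works only for this integer-valued example.

```python
from fractions import Fraction
import math

def die_cdf(x):
    """F_X(x) = P(X <= x) for a fair six-sided die."""
    if x < 1:
        return Fraction(0)
    return Fraction(min(math.floor(x), 6), 6)

def jump(x):
    """P(X = x) = F_X(x) - F_X(x-), the size of the jump of the CDF at x."""
    # For this step function, the left limit at x equals the CDF at the
    # largest integer strictly below x.
    left_limit = die_cdf(math.ceil(x) - 1)
    return die_cdf(x) - left_limit

print(die_cdf(4.5))  # same value as die_cdf(4)
print(jump(4.5))     # no jump at 4.5
print(jump(4))       # jump of size 1/6 at 4
```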
x x1 x2 x3 x4 x5
fX (x) p1 p2 p3 p4 p5
iv) Computing probability of intervals using CDF
Compute: (a) P (3 < X ≤ 10), (b) P (3.2 < X ≤ 10.6), (c) P (X ≤ 17), (d) P (X ≤ 17.3),
(e) P (X > 87), (f) P (X > 87.4)
Solution:
(a) P (3 < X ≤ 10) = FX (10) − FX (3) = 10/100 − 3/100 = 7/100
(b) P (3.2 < X ≤ 10.6) = FX (10.6) − FX (3.2) = 10/100 − 3/100 = 7/100
(c) P (X ≤ 17) = FX (17) = 17/100
(d) P (X ≤ 17.3) = FX (17.3) = 17/100
(e) P (X > 87) = 1 − FX (87) = 1 − 87/100 = 13/100
(f) P (X > 87.4) = 1 − FX (87.4) = 1 − 87/100 = 13/100
1.3 Continuous Random Variable: Approximation of CDF from
Discrete to Continuous
Consider the plot of the CDF of a Binomial(n, 0.6) random variable. Keep the scale of the picture the same and increase the value of n.
Notice that the CDFs start to look like a continuous line with the increase in the value
of n.
2. As x → −∞, F goes to 0.
3. As x → ∞, F goes to 1.
The functions defined in the above manner mirror the properties of CDF of a random
variable. If we take any arbitrary CDF, it does not have to be a step like structure, it can
also be smooth and continuous.
1.4.1 Examples of valid CDF’s
• We can observe that all the CDF’s are non-decreasing, start at 0 and end at 1, so they are valid CDF’s.
• We can describe the continuous curves in many interesting ways. Also, the calculations
with probabilities of intervals become much simpler if we have a continuous model.
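Let FX (k) denote the exact CDF of X (its value depends only on the integer k with k ≤ x < k + 1) and let F (x) be the approximate CDF of X.
\[
F_X(x) = \begin{cases}
0, & x < 1 \\
k/100, & k \le x < k + 1,\ k = 1, 2, \ldots, 99 \\
1, & x \ge 100
\end{cases}
\]
\[
F(x) = \begin{cases}
0, & x < 0 \\
x/100, & 0 \le x \le 100 \\
1, & x > 100
\end{cases}
\]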
CDF of X
(a) P (3 < X ≤ 10) = F (10) − F (3) = 10/100 − 3/100 = 7/100
(b) P (3.2 < X ≤ 10.6) = F (10.6) − F (3.2) = 10.6/100 − 3.2/100 = 7.4/100
(c) P (X ≤ 17) = F (17) = 17/100
(d) P (X ≤ 17.3) = F (17.3) = 17.3/100
(e) P (X > 87) = 1 − F (87) = 1 − 87/100 = 13/100
(f) P (X > 87.4) = 1 − F (87.4) = 1 − 87.4/100 = 12.6/100
Observations:
• We get the same value for P (3 < X ≤ 10) whether we use the exact CDF FX (k) or the approximate CDF F (x).
• We get different values for P (3.2 < X ≤ 10.6) from FX (k) and from F (x) (7/100 versus 7.4/100), but the difference is small.
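The comparison between the exact step CDF and its continuous approximation can be reproduced with a short Python sketch. It assumes X ∼ Uniform{1, ..., 100}, which is inferred from the values k/100 of the exact CDF. The function names are illustrative.

```python
import math

def exact_cdf(x):
    """Step CDF of X ~ Uniform{1, ..., 100} (assumed from F_X(k) = k/100)."""
    if x < 1:
        return 0.0
    return min(math.floor(x), 100) / 100

def approx_cdf(x):
    """Continuous approximation F(x) = x/100, clipped to [0, 1]."""
    return min(max(x / 100, 0.0), 1.0)

def interval_prob(cdf, a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)

for a, b in [(3, 10), (3.2, 10.6)]:
    print(a, b, interval_prob(exact_cdf, a, b), interval_prob(approx_cdf, a, b))
```

At integer endpoints the two CDFs agree, so the interval probabilities coincide. At non-integer endpoints the step CDF rounds each endpoint down, which produces the small discrepancy.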
2. Binomial using continuous CDF
\[
F_X(k) = \sum_{j=0}^{k} \binom{100}{j} (0.6)^j (0.4)^{100-j}
\]
\[
F(x) = \frac{1}{1 + \exp\left( \dfrac{-1.65451\,(x - 60)}{\sqrt{24}} \right)}
\]
Here 24 is the variance of Binomial(100, 0.6) and 60 is its mean. F (x) is a good approximation for FX (k). To check, we can compute probabilities using both FX (k) and F (x); both give very close values, so we are not losing much.
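One way to check this claim numerically is the Python sketch below. It evaluates the exact Binomial(100, 0.6) CDF and the logistic-shaped approximation at every integer k and reports the largest gap. The variable names are illustrative, and no particular error bound is claimed beyond what the code prints.

```python
import math

N, P = 100, 0.6
MEAN = N * P            # 60
VAR = N * P * (1 - P)   # 24

def binom_cdf(k):
    """Exact CDF: sum of the Binomial(N, P) pmf from j = 0 to k."""
    return sum(math.comb(N, j) * P**j * (1 - P)**(N - j) for j in range(k + 1))

def logistic_cdf(x):
    """Continuous approximation F(x) from the text."""
    return 1.0 / (1.0 + math.exp(-1.65451 * (x - MEAN) / math.sqrt(VAR)))

max_gap = max(abs(binom_cdf(k) - logistic_cdf(k)) for k in range(N + 1))
print(f"largest |F_X(k) - F(k)| over k = 0..100: {max_gap:.4f}")

# Example interval probability computed both ways.
exact = binom_cdf(70) - binom_cdf(50)
approx = logistic_cdf(70) - logistic_cdf(50)
print(f"P(50 < X <= 70): exact {exact:.4f}, approx {approx:.4f}")
```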
1.5.0.1 CDFs and random variables
Theorem (Random variable with CDF F (x)) Given a valid CDF F (x), there exists a random
variable X taking values in R such that
P (X ≤ x) = F (x)
Remarks:
• This theorem allows us to start from a CDF: we may define a valid CDF in any way we want, and the theorem assures us that there is a random variable, on some probability space, with that CDF.
• The value of the CDF at a particular input x, F (x), is P (X ≤ x). This connection between the random variable and the CDF is very important, because it allows us to use the CDF directly to compute probabilities involving the random variable.
• For any event defined using the random variable X, for example X > a or X < a, we can use this connection to derive its probability.
Find:
(i) P (X = 0)
(iv) P (X = 2.00000 . . .)
Solution:
(i) P (X = 0) = 0.5
(iii) P (1.9999999 < X ≤ 2.0000001) = 0.00000002
We can observe that as the precision increases, probability decreases.
(iv) P (X = 2.00000 . . .) = 0
Here X is taking value with infinite precision and F (x) is continuous at x = 2, so the
probability is 0.
Remark:
• If F (x) jumps at a point, then X takes that value with non-zero probability (equal to the size of the jump).
• If there is no jump in F (x) at a point, that is, if F is continuous there, then X takes that value with probability 0.
Remarks:
• P (X = x) = 0 for all x.
Examples:
1. Given below are the plots of a few CDFs. Identify the kind of distribution in each case:
(a)
(b)
(c)
(d)
Solution:
(a) Since the CDF is continuous, it has a continuous distribution.
(b) The CDF has a step-like structure, so it is a discrete distribution.
(c) At 0, there is a jump in the CDF, and then it is a continuous curve, so it has a mixed (both discrete and continuous) distribution.
(d) Since the CDF is continuous, it has a continuous distribution.
CDF of X
i) Find P (X < −3), P (−3 < X < −1), P (−1 < X < 1), P (X ≤ 3), P (0 ≤ X < 3).
ii) Is there an x0 for which P (X = x0 ) > 0?
iii) Is X a continuous random variable?
Solution:
(ii) As we can observe from the figure, P (X = −5) = 0.2, so there is an x0 for which P (X = x0 ) > 0.
(iii) Since there is a jump in the CDF at x = −5, X is not a continuous random variable; it has a mixed distribution.
3. Consider a random variable X with CDF
\[
F(x) = \begin{cases}
0, & x < -5 \\
0.04x + 0.2, & -5 \le x < 0 \\
0.2 + 0.2x, & 0 \le x < 4 \\
1, & x \ge 4
\end{cases}
\]
CDF of X
i) Find P (X < −3), P (−3 < X < −1), P (−1 < X < 1), P (X ≤ −3), P (0 ≤ X < 3).
ii) Is there an x0 for which P (X = x0 ) > 0?
iii) Is X a continuous random variable?
Solution:
(i)
• P (X < −3) = F (−3) = (0.04 × −3) + 0.2 = 0.08
(ii) The CDF is continuous for all x, so there is no x0 for which P (X = x0 ) > 0.
(iii) Since the CDF is continuous, the random variable X is continuous.
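The interval probabilities asked for in part (i) can be evaluated directly from the piecewise CDF. The Python sketch below is one way to do this. Because F is continuous, strict and non-strict inequalities give the same probability, so each probability is a difference of CDF values. The function name `F` mirrors the text, and the tuple-keyed dictionary is only an illustrative way to store the results.

```python
def F(x):
    """Piecewise CDF of X from example 3."""
    if x < -5:
        return 0.0
    if x < 0:
        return 0.04 * x + 0.2
    if x < 4:
        return 0.2 + 0.2 * x
    return 1.0

probs = {
    "P(X < -3)": F(-3),
    "P(-3 < X < -1)": F(-1) - F(-3),
    "P(-1 < X < 1)": F(1) - F(-1),
    "P(X <= -3)": F(-3),
    "P(0 <= X < 3)": F(3) - F(0),
}
for name, value in probs.items():
    print(f"{name} = {value:.2f}")

# Continuity check at the break points: the pieces meet, so there is no jump.
for point in (-5, 0, 4):
    print(point, F(point - 1e-9), F(point))
```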
• Throw a dart onto a circular board - distance of the point of impact from the center
of the board.
• Price of a stock.
Definition: A continuous random variable X with CDF FX (x) is said to have a PDF fX (x) if, for all x0 ,
\[
F_X(x_0) = \int_{-\infty}^{x_0} f_X(x)\,dx
\]
2. CDF is a non-decreasing function. The CDF being higher at some x does not mean that X takes more values there. On the other hand, if the density is higher around some points, then X is more likely to fall near those points.
3. The density gives a clear picture of what the distribution looks like, whereas from the CDF we only see how the probability accumulates.
Examples:
The above figure is for the Uniform distribution. The image on the left hand side is
for Uniform[0, 5] and the image on the right hand side is for Uniform[0, 1/2].
1.6.1 Properties of PDF
A function f : R → R is said to be a density function if
1. f (x) ≥ 0
2. \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \)
Remark: Given a density function f , there is a continuous random variable X with PDF
as f .
Support of a random variable: It is defined as the set of points where the density function is strictly greater than 0. Mathematically, for any random variable X with density fX (x),
Supp(X) = {x : fX (x) > 0}
Note: Supp(X) contains intervals in which X can fall with positive probability.
For any event A defined using the random variable X, probability of event is computed as
\[
P(A) = \int_{A} f(x)\,dx
\]
Examples:
i) Consider the function
\[
f(x) = \begin{cases}
3x^2, & 0 < x < 1 \\
0, & \text{otherwise}
\end{cases}
\]
Solution
• P (X = 1/5) = 0 ; (Since X is continuous)
• P (X = 2/5) = 0 ; (Since X is continuous)
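• P (X ∈ [1/5 − ϵ, 1/5 + ϵ])
\[
\begin{aligned}
P\left(\tfrac{1}{5} - \epsilon < X < \tfrac{1}{5} + \epsilon\right) &= \int_{1/5-\epsilon}^{1/5+\epsilon} 3x^2\,dx \\
&= \left[x^3\right]_{1/5-\epsilon}^{1/5+\epsilon} \\
&= \left(\tfrac{1}{5} + \epsilon\right)^3 - \left(\tfrac{1}{5} - \epsilon\right)^3 \\
&= \tfrac{6}{25}\epsilon + 2\epsilon^3, \quad \text{where } 0 < \epsilon \ll 1
\end{aligned}
\]
• Similarly,
\[
P\left(X \in \left[\tfrac{2}{5} - \epsilon, \tfrac{2}{5} + \epsilon\right]\right) = \left(\tfrac{2}{5} + \epsilon\right)^3 - \left(\tfrac{2}{5} - \epsilon\right)^3 = \tfrac{24}{25}\epsilon + 2\epsilon^3, \quad \text{where } 0 < \epsilon \ll 1
\]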
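The two small-interval probabilities illustrate that, for small ϵ, the probability of falling near x0 is roughly 2ϵ times the density at x0. The Python sketch below checks this for the density 3x² using the closed-form CDF F(x) = x³ on [0, 1]. The helper names and the choice ϵ = 0.001 are illustrative.

```python
def pdf(x):
    """Density f(x) = 3x^2 on (0, 1), zero elsewhere."""
    return 3 * x * x if 0 < x < 1 else 0.0

def cdf(x):
    """CDF F(x) = x^3 on [0, 1], obtained by integrating the density."""
    x = min(max(x, 0.0), 1.0)
    return x ** 3

def window_prob(x0, eps):
    """P(x0 - eps < X < x0 + eps)."""
    return cdf(x0 + eps) - cdf(x0 - eps)

eps = 1e-3
for x0 in (1 / 5, 2 / 5):
    exact = window_prob(x0, eps)
    density_estimate = 2 * eps * pdf(x0)
    print(f"x0 = {x0}: exact {exact:.9f}, 2*eps*f(x0) {density_estimate:.9f}")

# The density at 2/5 is four times the density at 1/5, and so is the probability
# of a small window around it (up to terms of order eps^3).
ratio = window_prob(2 / 5, eps) / window_prob(1 / 5, eps)
print(f"ratio of window probabilities: {ratio:.5f}")
```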
ii) Consider a random variable X with density
\[
f(x) = \begin{cases}
2x, & 0 < x < 1 \\
0, & \text{otherwise}
\end{cases}
\]
Solution:
• P (X ∈ [0.1, 0.3])
\[
P(0.1 \le X \le 0.3) = \int_{0.1}^{0.3} 2x\,dx = \left[x^2\right]_{0.1}^{0.3} = (0.3)^2 - (0.1)^2 = 0.08
\]
• P (X ∈ (0.1, 0.3])
\[
P(0.1 < X \le 0.3) = \int_{0.1}^{0.3} 2x\,dx = \left[x^2\right]_{0.1}^{0.3} = (0.3)^2 - (0.1)^2 = 0.08
\]
• P (X ∈ [0.1, 0.3))
\[
P(0.1 \le X < 0.3) = \int_{0.1}^{0.3} 2x\,dx = \left[x^2\right]_{0.1}^{0.3} = (0.3)^2 - (0.1)^2 = 0.08
\]
• P (X ∈ (0.1, 0.3))
\[
P(0.1 < X < 0.3) = \int_{0.1}^{0.3} 2x\,dx = \left[x^2\right]_{0.1}^{0.3} = (0.3)^2 - (0.1)^2 = 0.08
\]
Solution:
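For f (x) to be a valid density function, \( \int_{-\infty}^{\infty} f(x)\,dx \) should be 1. Therefore,
\[
\begin{aligned}
& \int_{-\infty}^{\infty} f(x)\,dx = 1 \\
\implies\ & \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1/4} f(x)\,dx + \int_{1/4}^{3/4} f(x)\,dx + \int_{3/4}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx = 1 \\
\implies\ & \int_{-\infty}^{0} 0\,dx + \int_{0}^{1/4} k\,dx + \int_{1/4}^{3/4} 2k\,dx + \int_{3/4}^{1} 3k\,dx + \int_{1}^{\infty} f(x)\,dx = 1
\end{aligned}
\]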
1.7 Common distributions
1.7.1 Uniform distribution
A continuous random variable X is said to be uniform in [a, b], if it has a flat PDF in the
range [a, b].
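For a < x < b,
\[
F_X(x) = \int_{-\infty}^{a} 0\,du + \int_{a}^{x} \frac{1}{b-a}\,du = 0 + \frac{x-a}{b-a} = \frac{x-a}{b-a}
\]
Therefore,
\[
F_X(x) = \begin{cases}
0, & x \le a \\
\dfrac{x-a}{b-a}, & a < x < b \\
1, & x \ge b
\end{cases}
\]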
Find: P (−3 ≤ X ≤ 2), P (5 <| X |< 7), P (1 − ϵ < X < 1 + ϵ), P (9 − ϵ < X < 9 + ϵ),
P (X > 7|X > 3).
Now,
• \( P(-3 \le X \le 2) = \int_{-3}^{2} \frac{1}{20}\,dx = \frac{5}{20} = \frac{1}{4} \)
• P (5 <| X |< 7)
• P (1 − ϵ < X < 1 + ϵ)
\[
P(1 - \epsilon < X < 1 + \epsilon) = \int_{1-\epsilon}^{1+\epsilon} \frac{1}{20}\,dx = \frac{2\epsilon}{20}, \quad \text{where } 0 < \epsilon \ll 1
\]
For any x0 in [−1, 1], P (x0 − ϵ < X < x0 + ϵ) = 2ϵ/20.
• P (9 − ϵ < X < 9 + ϵ)
Similarly, for any x0 in [−9, 9], P (x0 − ϵ < X < x0 + ϵ) = 2ϵ/20.
• P (X > 7|X > 3)
\[
\begin{aligned}
P(X > 7 \mid X > 3) &= \frac{P(X > 7,\ X > 3)}{P(X > 3)} \\
&= \frac{P(X > 7)}{P(X > 3)} \\
&= \frac{\int_{7}^{10} \frac{1}{20}\,dx}{\int_{3}^{10} \frac{1}{20}\,dx} \\
&= \frac{3/20}{7/20} = \frac{3}{7}
\end{aligned}
\]
Solution: (Using the CDF)
Now,
• \( P(-3 \le X \le 2) = F_X(2) - F_X(-3) = \frac{2 + 10}{20} - \frac{-3 + 10}{20} = \frac{5}{20} \)
• P (5 <| X |< 7)
• P (1 − ϵ < X < 1 + ϵ)
For any x0 in [−1, 1], P (x0 − ϵ < X < x0 + ϵ) = 2ϵ/20.
• P (9 − ϵ < X < 9 + ϵ)
Similarly, for any x0 in [−9, 9], P (x0 − ϵ < X < x0 + ϵ) = 2ϵ/20.
• P (X > 7|X > 3)
\[
\begin{aligned}
P(X > 7 \mid X > 3) &= \frac{P(X > 7,\ X > 3)}{P(X > 3)} \\
&= \frac{P(X > 7)}{P(X > 3)} \\
&= \frac{1 - P(X \le 7)}{1 - P(X \le 3)} \\
&= \frac{1 - F_X(7)}{1 - F_X(3)} \\
&= \frac{1 - \frac{7 + 10}{20}}{1 - \frac{3 + 10}{20}} = \frac{3/20}{7/20} = \frac{3}{7}
\end{aligned}
\]
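The CDF-based computations can be mirrored in a few lines of Python. The sketch assumes X ∼ Uniform[−10, 10], which is inferred from the CDF (x + 10)/20 used in the solution. The event 5 < |X| < 7 is split into the two disjoint intervals (5, 7) and (−7, −5), which is a standard decomposition rather than something stated in the text.

```python
A, B = -10.0, 10.0  # assumed support, inferred from F_X(x) = (x + 10)/20

def cdf(x):
    """CDF of Uniform[A, B]."""
    if x <= A:
        return 0.0
    if x >= B:
        return 1.0
    return (x - A) / (B - A)

p_interval = cdf(2) - cdf(-3)                       # P(-3 <= X <= 2)
p_abs = (cdf(7) - cdf(5)) + (cdf(-5) - cdf(-7))     # P(5 < |X| < 7)
p_cond = (1 - cdf(7)) / (1 - cdf(3))                # P(X > 7 | X > 3)

eps = 0.01
p_window = cdf(9 + eps) - cdf(9 - eps)              # P(9 - eps < X < 9 + eps)

print(p_interval, p_abs, p_cond, p_window)
```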
1. Clearly fX (x) ≥ 0.
2. Support of X is {x : x > 0}
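\[
\begin{aligned}
\int_{-\infty}^{\infty} f_X(x)\,dx &= \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} \lambda e^{-\lambda x}\,dx \\
&= 0 + \lambda \left[\frac{e^{-\lambda x}}{-\lambda}\right]_{0}^{\infty} \\
&= \left[-e^{-\lambda x}\right]_{0}^{\infty} = 1
\end{aligned}
\]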
Now, for x > 0,
\[
P(X \le x) = \int_{-\infty}^{x} f_X(u)\,du = \int_{0}^{x} \lambda e^{-\lambda u}\,du = \left[-e^{-\lambda u}\right]_{0}^{x} = 1 - e^{-\lambda x}
\]
Find: P (5 < X < 7), P (X > 4), P (1−ϵ < X < 1+ϵ), P (9−ϵ < X < 9+ϵ), P (X > 7|X > 3).
Now,
• \( P(5 < X < 7) = \int_{5}^{7} 2e^{-2x}\,dx = 2\left[\dfrac{e^{-2x}}{-2}\right]_{5}^{7} = e^{-10} - e^{-14} \)
• P (1 − ϵ < X < 1 + ϵ)
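\[
\begin{aligned}
P(1 - \epsilon < X < 1 + \epsilon) &= \int_{1-\epsilon}^{1+\epsilon} 2e^{-2x}\,dx \\
&= 2\left[\frac{e^{-2x}}{-2}\right]_{1-\epsilon}^{1+\epsilon} \\
&= e^{-2(1-\epsilon)} - e^{-2(1+\epsilon)}, \quad \text{where } 0 < \epsilon \ll 1
\end{aligned}
\]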
• P (X > 7|X > 3)
\[
\begin{aligned}
P(X > 7 \mid X > 3) &= \frac{P(X > 7,\ X > 3)}{P(X > 3)} \\
&= \frac{P(X > 7)}{P(X > 3)} \\
&= \frac{1 - F_X(7)}{1 - F_X(3)} \\
&= \frac{1 - (1 - e^{-14})}{1 - (1 - e^{-6})} \\
&= e^{-8}
\end{aligned}
\]
Proof:
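\[
\begin{aligned}
P(X > s + t \mid X > s) &= \frac{P(X > s + t,\ X > s)}{P(X > s)} \\
&= \frac{P(X > s + t)}{P(X > s)} \\
&= \frac{1 - F_X(s + t)}{1 - F_X(s)} \\
&= \frac{1 - (1 - e^{-\lambda(s+t)})}{1 - (1 - e^{-\lambda s})} = e^{-\lambda t} = P(X > t)
\end{aligned}
\]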
Suppose you are waiting for a bus at a bus stop. The waiting time is random, and it is very commonly modeled as an exponential random variable. By the memoryless property, no matter how long you have already waited, the distribution of the remaining waiting time is the same. This makes the exponential model convenient in many real-life situations.
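The memoryless property can be checked numerically for the Exponential(λ = 2) example used in this section. The Python sketch below is a minimal illustration with hypothetical helper names. It uses the survival function P (X > x) = e^(−λx), which follows from the CDF 1 − e^(−λx).

```python
import math

LAM = 2.0  # rate of the exponential distribution, as in the worked example

def survival(x):
    """P(X > x) = exp(-LAM * x) for x > 0, and 1 otherwise."""
    return math.exp(-LAM * x) if x > 0 else 1.0

def conditional_tail(s, t):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t) / survival(s)

p_5_7 = survival(5) - survival(7)   # P(5 < X < 7) = e^-10 - e^-14
p_cond = conditional_tail(3, 4)     # P(X > 7 | X > 3)

print(p_5_7, p_cond, survival(4))

# Memorylessness: the conditional tail does not depend on the time already waited.
for s in (0.5, 3, 10):
    print(s, conditional_tail(s, 4))
```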
1.7.3.1 PDF of Normal distribution
\[
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad \text{where } \operatorname{Supp}(X) = \mathbb{R}
\]
µ ∈ R, σ is a positive real number.
Observations:
• As x becomes much larger than µ, fX (x) goes to 0. Similarly, as x becomes much smaller than µ, fX (x) goes to 0.
1. fX (x) is non-negative.
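2. \( \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx = 1 \).
Proof: Let \( \frac{x - \mu}{\sigma} = z \).
Now, x = σz + µ =⇒ dx = σ dz, so
\[
\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\,\sigma\,dz \tag{1}
\]
Now, substitute \( z^2/2 = y^2 \) (so that \( dz = \sqrt{2}\,dy \)) in (1); we will have
\[
\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\,\sigma\,dz = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\,dy
\]
We know that \( \int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi} \).
Therefore, \( \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx = 1 \).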
\[
F_X(x) = \int_{-\infty}^{x} f_X(u)\,du
\]
\[
Z = \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1).
\]
PDF:
\[
f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}
\]
CDF:
\[
F_Z(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du
\]
i) P (X < 5)
Solution:
\[
\begin{aligned}
P(X < 5) &= P\left(\frac{X - 2}{\sqrt{5}} < \frac{5 - 2}{\sqrt{5}}\right) \\
&= P(Z < 3/\sqrt{5}) \\
&= F_Z(3/\sqrt{5})
\end{aligned}
\]
iii) P (X > 5)
Solution:
\[
\begin{aligned}
P(X > 5) &= P\left(\frac{X - 2}{\sqrt{5}} > \frac{5 - 2}{\sqrt{5}}\right) \\
&= P(Z > 3/\sqrt{5}) \\
&= 1 - F_Z(3/\sqrt{5})
\end{aligned}
\]
i) P (5 < X < 7)
Solution:
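\[
\begin{aligned}
P(5 < X < 7) &= P\left(\frac{5 - 3}{\sqrt{1}} < \frac{X - 3}{\sqrt{1}} < \frac{7 - 3}{\sqrt{1}}\right) \\
&= P(2 < Z < 4) \\
&= F_Z(4) - F_Z(2)
\end{aligned}
\]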
\[
\begin{aligned}
P(X > 7 \mid X > 3) &= \frac{P(X > 7,\ X > 3)}{P(X > 3)} \\
&= \frac{P(X > 7)}{P(X > 3)} \\
&= \frac{P(X - 3 > 7 - 3)}{P(X - 3 > 3 - 3)} \\
&= \frac{P(Z > 4)}{P(Z > 0)} \\
&= \frac{1 - F_Z(4)}{1 - F_Z(0)}
\end{aligned}
\]
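The standard normal CDF has no closed form, but it can be evaluated through the error function as FZ (z) = (1 + erf(z/√2))/2. The Python sketch below uses this identity to put numbers on the expressions above. The parameters Normal(2, 5) and Normal(3, 1) are inferred from the standardisation steps in the worked solutions, and the function names are illustrative.

```python
import math

def std_normal_cdf(z):
    """F_Z(z) for Z ~ Normal(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, var):
    """P(X <= x) for X ~ Normal(mu, var), by standardising to Z."""
    return std_normal_cdf((x - mu) / math.sqrt(var))

# X ~ Normal(2, 5) (assumed): P(X < 5) = F_Z(3 / sqrt(5))
p_less_5 = normal_cdf(5, 2, 5)

# X ~ Normal(3, 1) (assumed): P(5 < X < 7) = F_Z(4) - F_Z(2)
p_5_7 = normal_cdf(7, 3, 1) - normal_cdf(5, 3, 1)

# X ~ Normal(3, 1) (assumed): P(X > 7 | X > 3) = (1 - F_Z(4)) / (1 - F_Z(0))
p_cond = (1 - std_normal_cdf(4)) / (1 - std_normal_cdf(0))

print(f"P(X < 5) = {p_less_5:.4f}")
print(f"P(5 < X < 7) = {p_5_7:.5f}")
print(f"P(X > 7 | X > 3) = {p_cond:.2e}")
```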
Similarly, try to find P (9 − ϵ < X < 9 + ϵ), P (−5 < X < 5), P (X > 4).