Continuous R.V. Student
• We require f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1 for a p.d.f. f.
The Distribution Function
1. For a continuous random variable X, the c.d.f. F(x) is given by
   F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy.
2. f(x) = F′(x).
3. F is non-decreasing since f ≥ 0.
[Figure: P(a ≤ X ≤ b) = shaded area under the density curve between a and b.]
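As a quick numerical illustration (the density here is chosen for the sketch, not taken from the slides): f(x) = 3x² on (0, 1) has c.d.f. F(x) = x³, so the shaded area P(a ≤ X ≤ b) equals F(b) − F(a), and it agrees with integrating f directly.

```python
def F(x):
    """C.d.f. F(x) = P(X <= x) for the density f(x) = 3x^2 on (0, 1)."""
    return min(max(x, 0.0), 1.0) ** 3

def prob_interval(a, b):
    """P(a <= X <= b) = F(b) - F(a), the shaded area."""
    return F(b) - F(a)

def area_under_f(a, b, n=100_000):
    """Midpoint Riemann sum of f(x) = 3x^2 over [a, b] -- the same area."""
    h = (b - a) / n
    return sum(3 * (a + (i + 0.5) * h) ** 2 * h for i in range(n))

print(round(prob_interval(0.2, 0.8), 3))  # 0.504
print(round(area_under_f(0.2, 0.8), 3))   # 0.504
```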
Theorem 1.2.1
Example 1.2.1
• Solution:
Mean and Variance of a
continuous random variable
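For reference, if X is continuous with density f, the mean and variance are given by the standard definitions:

```latex
E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx, \qquad
\operatorname{var}(X) = E\!\left[(X - E(X))^{2}\right] = E(X^{2}) - \big(E(X)\big)^{2}.
```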
Example 1.2.2
Uniform Distribution U(a,b)
• X is uniformly distributed on an
interval (a,b) if it has the density
function
f(x) = 1/(b−a) if a < x < b, and 0 otherwise.
• Notation: X~U(a,b).
Mean and Variance
If X ~ U(a, b), then
E(X) = (a + b)/2, and
V(X) = (b − a)²/12. (why?)
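A Monte Carlo sanity check of these two formulas (illustrative; the choice a = 2, b = 5 is arbitrary):

```python
import random

# Check E(X) = (a+b)/2 and V(X) = (b-a)^2/12 for X ~ U(a, b) by simulation.
random.seed(0)
a, b, n = 2.0, 5.0, 200_000
xs = [random.uniform(a, b) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(mean, (a + b) / 2)        # sample mean vs. exact 3.5
print(var, (b - a) ** 2 / 12)   # sample variance vs. exact 0.75
```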
Example 1.2.3
Example 1.2.3
Solution
Normal Distribution
• The p.d.f. of the normal distribution with parameters μ and σ is
f(x) = (1/(σ√(2π))) exp(−(1/2)((x − μ)/σ)²).
• The total area under the curve for
any probability distribution is 1
and this area represents the total
probability.
• If X is normal with mean μ and
variance σ2, we write X~N(μ,σ2).
• If μ=0 and σ2=1, then X has the
standard normal distribution.
• P.D.F. of standard normal: φ(.)
• CDF : Φ(.)
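Φ has no closed form, but it can be evaluated through the error function; a small sketch using the standard identity Φ(z) = (1 + erf(z/√2))/2:

```python
import math

def phi(z):
    """Standard normal p.d.f. φ(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal c.d.f. via Φ(z) = (1 + erf(z/√2)) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

print(Phi(0))            # 0.5
print(Phi(1.96))         # ≈ 0.975
print(Phi(-1) + Phi(1))  # 1.0 up to rounding, since Φ(-z) = 1 - Φ(z)
```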
• A little history
• de Moivre
• Adrain (1808) and Gauss (1809)
• Laplace
Question 1
Question 2
Theorem 1.2.2
1. If X~N(μ,σ2), then aX+b~N(aμ+b,a2σ2).
Hence, (X-μ)/σ ~N(0,1). We only need to
tabulate the standard normal
distribution values.
2. If X~N(μ,σ2), then -X~N(-μ,σ2).
3. If X~N(μ,σ2) and Y~N(ν,τ2) are
independent, then X+Y~N(μ+ν,σ2+ τ2).
4. Φ(-z) = 1 − Φ(z). Equivalently, Φ(z) + Φ(-z) = 1, so Φ(0) = 0.5.
5. φ(-z) = φ(z).
f(x) = (1/(σ√(2π))) exp(−(1/2)((x − μ)/σ)²)
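Parts 1 and 3 of the theorem can be checked by simulation (illustrative; the parameters below are chosen arbitrarily): for independent X ~ N(2, 9) and Y ~ N(1, 4), 3X + 1 ~ N(7, 81) and X + Y ~ N(3, 13).

```python
import random

random.seed(1)
n = 200_000
xs = [random.gauss(2, 3) for _ in range(n)]  # N(2, 9): sigma = 3
ys = [random.gauss(1, 2) for _ in range(n)]  # N(1, 4): sigma = 2

def mean_var(zs):
    m = sum(zs) / len(zs)
    return m, sum((z - m) ** 2 for z in zs) / len(zs)

m1, v1 = mean_var([3 * x + 1 for x in xs])          # expect mean 7, variance 81
m2, v2 = mean_var([x + y for x, y in zip(xs, ys)])  # expect mean 3, variance 13
print(m1, v1)
print(m2, v2)
```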
Examples
• Solution:
Examples
• Solution:
Examples
• X~N(5,16) and Y~N(-5,9), X and Y are
independent. What is the probability
P(X-Y>15)?
• Solution:
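By Theorem 1.2.2, X − Y ~ N(10, 25), so P(X − Y > 15) = 1 − Φ(1); a quick numerical check (Φ via the erf identity):

```python
import math

# X ~ N(5, 16), Y ~ N(-5, 9) independent, so X - Y ~ N(10, 25).
mu, var = 5 - (-5), 16 + 9
z = (15 - mu) / math.sqrt(var)                  # standardise: z = 1
p = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # 1 - Phi(1)
print(round(p, 4))  # 0.1587
```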
Example 1.2.6: Normal Approximation
• For many data sets, histograms assume
forms similar to normal curves. Here is
an example which utilizes that:
Suppose that the height of a group of
1000 students is distributed normally
with mean 66 inches and standard
deviation of 6 inches.
• Approximately how many of them have
height more than 78 inches?
• Approximately how many students have
height between 5 and 6 feet?
Example 1.2.6: Normal Approximation
Solution
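A sketch of the computation (Φ via erf; each count rounds the expected number 1000·P):

```python
import math

def Phi(z):
    """Standard normal c.d.f. via the erf identity."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 66, 6, 1000

# P(height > 78): z = (78 - 66)/6 = 2
tall = n * (1 - Phi(2.0))
# P(60 <= height <= 72), i.e. between 5 and 6 feet: z in [-1, 1]
mid = n * (Phi(1.0) - Phi(-1.0))

print(round(tall))  # 23 students
print(round(mid))   # 683 students
```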
Exponential Distribution
• We write X ~ Exponential(λ), or simply X ~ Exp(λ), when X has p.d.f. f(x) = λe^{−λx}, x > 0; note that ∫₀^∞ λe^{−λx} dx = 1.
• CDF: F(x) = ∫₀^x λe^{−λt} dt = 1 − e^{−λx}, x > 0.
If X ~ Exp(λ), then E(X) = 1/λ and var(X) = 1/λ².
EXAMPLE
• Suppose that School of Mathematics
found out that on average, there are
2 hits per minute on homepage of
the School.
[Timeline: inter-hit times Y1, Y2, Y3, … are i.i.d. Exp(λ);
X ~ Poisson(λt) = # of hits of the web page up to time t,
assuming the numbers of hits during disjoint intervals are independent.]
Example
What is the probability that we have
to wait at most 40 seconds to observe
the first hit?
Example
What is the probability that we have
to wait at most 40 seconds to observe
the first hit?
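With rate λ = 2 hits per minute, 40 seconds is 2/3 of a minute, so the answer is F(2/3) = 1 − e^{−4/3}; as a numerical check:

```python
import math

lam = 2.0        # hits per minute
t = 40 / 60      # 40 seconds in minutes
p = 1 - math.exp(-lam * t)  # P(first hit within 40 s) = 1 - e^(-4/3)
print(round(p, 4))  # 0.7364
```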
Example
• If there is no hit after the first minute,
what is the probability that we have
to wait another 40 seconds for the
first hit?
Memoryless property
P(X > s + t | X > s) = P(X > t), for all s, t ≥ 0.
Here
P(X > s) = 1 − P(X ≤ s) = e^{−λs}, so
P(X > s + t | X > s) = P(X > s + t, X > s) / P(X > s)
                     = e^{−λ(s+t)} / e^{−λs}
                     = e^{−λt} = P(X > t).
If we think of X as being the lifetime
of some instrument, then the
probability that the instrument
survives for at least s + t hours, given
that it has survived s hours, is the
same as the initial probability that it
survives for at least t hours.
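The memoryless property is easy to see by simulation (illustrative; λ, s, t chosen arbitrarily): among samples that already exceed s, the fraction exceeding s + t should match the unconditional P(X > t) = e^{−λt}.

```python
import random, math

random.seed(2)
lam, s, t, n = 2.0, 1.0, 2 / 3, 400_000
xs = [random.expovariate(lam) for _ in range(n)]

survived_s = [x for x in xs if x > s]
cond = sum(x > s + t for x in survived_s) / len(survived_s)  # P(X > s+t | X > s)
uncond = sum(x > t for x in xs) / n                          # P(X > t)

print(round(cond, 3), round(uncond, 3), round(math.exp(-lam * t), 3))
```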
Exponential Example
• The post office in the university is staffed by
two clerks. Suppose that when Mr. Smith
enters the system, he discovers that Ms. Jones
is being served by one of the clerks and Mr.
Brown by the other. Suppose that Mr. Smith is
told that his service will begin as soon as
either Jones or Brown leaves. What is the
probability that, of the three customers, Mr.
Smith is the last to leave?
A. 1/3
B. 1/4
C. 1/2
D. 1
[Timeline: inter-hit times Y1, Y2, Y3, … are i.i.d. Exp(λ);
X ~ Poisson(λt) = # of hits of the web page up to time t,
assuming the numbers of hits during disjoint intervals are independent.]
Erlang Distribution
➢ Recall we modelled the waiting
time until the first hit as Exp(2).
Then how long do we have to wait
for the second hit?
➢ To do that, we need to add the
waiting time until the first hit and
the time between the first and
second hits.
➢ Let Y1 = the waiting time until the
first hit; then Y1 ~ Exp(2), λ = 2.
➢ Let Y2 = the waiting time between
the first and second hits; by the
memoryless property, Y2 ~ Exp(2), λ = 2.
Erlang Distribution
➢ Suppose we want to model the time
to complete n operations in series,
where each operation requires an
exponential period of time to
complete.
➢ An Erlang distribution has the
density function:
f(x) = λ^k x^{k−1} e^{−λx} / (k − 1)!,  x ≥ 0, k ∈ ℕ.
Gamma Distribution
[Figure: the points Y = X! plotted for X = 0, 1, …, 6.]
• The gamma function is a solution to
the interpolation of y = (x − 1)! at the
positive integer values for x.
The gamma function, for any real k > 0,
is defined as:
Γ(k) = ∫₀^∞ x^{k−1} e^{−x} dx,
or
Γ(k) = ∫₀^∞ x^k e^{−x} dx/x.
Γ(n) = (n − 1)! for n a positive integer;
Γ(x + 1) = xΓ(x);
Γ(1) = 1, Γ(2) = 1, Γ(3) = 2;
Γ(1/2) = √π.
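These facts can all be checked directly with the standard-library gamma function:

```python
import math

# Quick numerical checks of the gamma-function facts above.
print(math.gamma(1), math.gamma(2), math.gamma(3))  # 1.0 1.0 2.0
print(math.gamma(5), math.factorial(4))             # 24.0 24  (Gamma(n) = (n-1)!)
print(math.gamma(0.5), math.sqrt(math.pi))          # both ≈ 1.7724538509

# Recurrence Gamma(x+1) = x * Gamma(x), at an arbitrary x:
x = 2.7
print(abs(math.gamma(x + 1) - x * math.gamma(x)) < 1e-10)  # True
```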
What is a gamma PDF?
If we want a valid PDF related to the gamma
function, it needs to be nonnegative and the
probability must add up to 1: normalise it.
1 = ∫₀^∞ x^{k−1} e^{−x} / Γ(k) dx — the Gamma(1, k) PDF.
Y~Gamma(λ,k) PDF
Let Y = X/λ, X ~ Gamma(1, k).
Then X = λY, dx/dy = λ.
Change of variables (Theorem 1.3.1):
f_Y(y) = f_X(x) |dx/dy|
       = (λy)^{k−1} e^{−λy} λ / Γ(k)
       = λ^k y^{k−1} e^{−λy} / Γ(k),  for y > 0.
Theorem 1.2.3
f(x) = λ^k x^{k−1} e^{−λx} / Γ(k),  x ≥ 0.
• Γ(k + 1) = kΓ(k), k > −1.
• Γ(k) = (k − 1)! for any positive integer k.
➢ An Erlang distribution is a gamma
distribution where k is a natural number.
[Timeline: inter-hit times Y1, Y2, Y3, … are i.i.d. Exp(λ);
X ~ Poisson(λt) = # of hits of the web page up to time t,
assuming the numbers of hits during disjoint intervals are independent.]
• If X~Gamma(λ,k1) and
Y~Gamma(λ,k2) are independent, then
X+Y~ Gamma(λ, k1+k2).
Theorem 1.2.4
• If Xi ~ Exponential(λ), i = 1, 2, …, n,
independently, then Σ_{i=1}^n Xi ~ Gamma(λ, n),
or Erlang(λ, n).
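A Monte Carlo check of Theorem 1.2.4 (illustrative; λ = 2, n = 5 chosen arbitrarily): the sum of five i.i.d. Exp(2) variables should have the Gamma(2, 5) mean 5/2 and variance 5/4.

```python
import random

random.seed(3)
lam, k, n = 2.0, 5, 200_000
# Each sample: sum of k i.i.d. Exponential(lam) waiting times.
sums = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(n)]
m = sum(sums) / n
v = sum((s - m) ** 2 for s in sums) / n

print(m, k / lam)         # sample mean vs. exact k/lam = 2.5
print(v, k / lam ** 2)    # sample variance vs. exact k/lam^2 = 1.25
```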
Expectation and Variance
E(X) = ∫₀^∞ x · λ^k x^{k−1} e^{−λx} / Γ(k) dx
     = (1 / (λΓ(k))) ∫₀^∞ (λx)^k e^{−λx} d(λx)
     = Γ(k + 1) / (λΓ(k))
     = kΓ(k) / (λΓ(k)) = k/λ.
var(X) = k/λ². (Find E(X²) first.)
Example 1.2.7 The χ² Distribution
• The Gamma(½, ½) distribution is
called the χ² (chi-squared)
distribution with 1 degree of freedom.
• Notation: χ²₁.
• The χ²₁ distribution also arises by
squaring a standard normal random
variable.
• The sum of n independent χ²₁ random
variables is therefore Gamma(½, n/2), and is
written χ²ₙ.
• It is referred to as the χ² distribution
with n degrees of freedom.
• Therefore, χ²₂ is also Exponential(½).
• The χ²ₙ distribution has mean n and
variance 2n.
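The "square of a standard normal" description can be checked by simulation (illustrative): Z² should have the χ²₁ mean 1 and variance 2.

```python
import random

random.seed(4)
n = 400_000
# Square a standard normal: the result is chi-squared with 1 d.f.
sq = [random.gauss(0, 1) ** 2 for _ in range(n)]
m = sum(sq) / n
v = sum((s - m) ** 2 for s in sq) / n

print(m)  # sample mean, exact value 1
print(v)  # sample variance, exact value 2
```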
Example 1.2.8
• Lifetime of a radio tube is
exponential with mean 1000 hours
(i.e. parameter λ=1/1000).
• Suppose a radio has five such
components, each of which
functions independently.
• Assume that the radio works if at
least four of the components are
working perfectly.
• What is the probability that the radio
stops working within first 100 hours?
Solution 1.2.8
Let Ej, j = 1, …, 5 be the event that the j-
th tube stops working within first 100
hours, and let Ij be the indicator of Ej.
That is, Ij=1 if Ej occurs, and 0
otherwise.
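A sketch of the rest of the computation (the slide's numeric answer did not survive, so this is a reconstruction from the setup): each Ej has probability p = 1 − e^{−100/1000}, and the radio stops iff at least two of the five indicators equal 1.

```python
import math

# Each tube's lifetime is Exp(1/1000), so P(Ej) = 1 - e^{-100/1000}.
p = 1 - math.exp(-100 / 1000)

# Radio stops iff I1 + ... + I5 >= 2, where Ij are i.i.d. Bernoulli(p).
# Complement: exactly 0 or exactly 1 failure among the five tubes.
prob_stop = 1 - (1 - p) ** 5 - 5 * p * (1 - p) ** 4
print(round(prob_stop, 4))  # 0.0745
```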
Beta Distribution
• How likely is it that the Labour
Party will win the next UK general
election in 2024?
Beta Distribution
• The beta distribution, supported on
the interval from 0 to 1, is a very
versatile way to represent random
probabilities.
• Beta distribution is a
generalisation of the uniform
distribution.
• The beta distribution has two
parameters, Beta(a, b): a family
of distributions with a > 0, b > 0.
f(x) = c x^{a−1} (1 − x)^{b−1} if 0 < x < 1.
• The normalising integral is a famous
object in mathematics: the beta function.
B(a, b) = ∫₀¹ x^{a−1} (1 − x)^{b−1} dx = Γ(a)Γ(b) / Γ(a + b).
Proof:
Γ(a)Γ(b) = ∫₀^∞ x^{a−1} e^{−x} dx ∫₀^∞ y^{b−1} e^{−y} dy
         = ∫₀^∞ ∫ₓ^∞ x^{a−1} (t − x)^{b−1} e^{−t} dt dx   (t = x + y)
         = ∫₀^∞ e^{−t} ∫₀^t x^{a−1} (t − x)^{b−1} dx dt
         = ∫₀^∞ e^{−t} ∫₀¹ θ^{a−1} (1 − θ)^{b−1} t^{a+b−1} dθ dt   (x = θt)
         = ∫₀^∞ e^{−t} t^{a+b−1} dt ∫₀¹ θ^{a−1} (1 − θ)^{b−1} dθ
         = Γ(a + b) ∫₀¹ θ^{a−1} (1 − θ)^{b−1} dθ,
so ∫₀¹ θ^{a−1} (1 − θ)^{b−1} dθ = Γ(a)Γ(b) / Γ(a + b) = B(a, b).
A random variable has a beta distribution
with parameters a and b if its pdf is:
f(x) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} if 0 < x < 1, and 0 otherwise,
where
B(a, b) = ∫₀¹ x^{a−1} (1 − x)^{b−1} dx.
• We focus on its statistical story.
f(x) = c x^{a−1} (1 − x)^{b−1} if 0 < x < 1.
f(x) = c x^{a−1} (1 − x)^{b−1} if 0 < x < 1.
• The beta distribution belongs to a
flexible family of continuous
distributions on (0, 1).
[Figure: beta densities for several choices of (a, b).]
• In particular, it is used as a "conjugate
prior to the binomial":
• X|p ~ Bin(n, p).
• We observe a binomial r.v. If p is known,
X is simply binomial; if p is unknown, we
give p a distribution: let p ~ Beta(a, b),
which is called "the prior". This reflects
the uncertainty about p, now a random variable.
• The prior exists before we observe
the data X; after we observe X, we
then update p using Bayes' rule.
• Find the posterior distribution:
p|X
[Bayes' rule, annotated: the numerator is the likelihood (depends on p,
a function of p) times the prior; the denominator integrates
over all p and does not depend on p.]
p ~ Beta(a, b)
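Carrying out the update above (a standard computation, stated here for completeness): if X|p ~ Bin(n, p) and p ~ Beta(a, b), then after observing X = k,

```latex
f(p \mid X = k) \;\propto\; \binom{n}{k} p^{k}(1-p)^{n-k}\cdot p^{a-1}(1-p)^{b-1}
\;\propto\; p^{a+k-1}(1-p)^{b+n-k-1},
```

so p | X = k ~ Beta(a + k, b + n − k): the posterior is again a beta distribution, which is what "conjugate" means.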
What are the expected value and
variance for the Beta distribution?
E(X) = ∫₀¹ x · x^{a−1} (1 − x)^{b−1} / B(a, b) dx
     = (1/B(a, b)) ∫₀¹ x^a (1 − x)^{b−1} dx
     = B(a + 1, b) / B(a, b)
     = [Γ(a + 1)Γ(b) / Γ(a + b + 1)] / [Γ(a)Γ(b) / Γ(a + b)]
     = a / (a + b).
V(X) = ab / [(a + b)² (a + b + 1)].
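A Monte Carlo check of both formulas (illustrative; a = 2, b = 5 chosen arbitrarily):

```python
import random

# Check E(X) = a/(a+b) and V(X) = ab/((a+b)^2 (a+b+1)) for X ~ Beta(a, b).
random.seed(5)
a, b, n = 2.0, 5.0, 200_000
xs = [random.betavariate(a, b) for _ in range(n)]
m = sum(xs) / n
v = sum((x - m) ** 2 for x in xs) / n

print(m, a / (a + b))                           # sample mean vs. 2/7
print(v, a * b / ((a + b) ** 2 * (a + b + 1)))  # sample variance vs. 10/392
```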
Exercise 1.2.9
• Let X~U(0,1).
• Take Y=Xn and find p.d.f. of Y.
Theorem 1.3.1
• Let X be a continuous random
variable having p.d.f. fX.
• Suppose that g is a strictly
monotone differentiable function.
• Then the random variable Y=g(X)
has a p.d.f. given by
f_Y(y) = f_X(g⁻¹(y)) |d/dy g⁻¹(y)|  if y = g(x) for some x,
f_Y(y) = 0 otherwise,
where g⁻¹(y) is defined
as the value of x such that g(x) = y.
Example 1.3.1
• Let X~N(μ,σ2), and take Y=aX+b, a>0.
Example
Verify that if X~U(0,1), and Y=-lnX,
then Y~Exp(1).
Example
Verify that if X~U(0,1), and Y=-lnX,
then Y~Exp(1).
Theorem 1.3.2
77
Example
• Let X ~ U(-1,1), and let Y = X2. Find
p.d.f of Y.
Example
• Let X ~ U(-1,1), and let Y = X2. Find
p.d.f of Y.
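A simulation sanity check (not the analytic derivation itself): for X ~ U(−1, 1) and Y = X², F_Y(y) = P(−√y ≤ X ≤ √y) = √y on (0, 1), so the empirical CDF of simulated Y values should track √y (and hence f_Y(y) = 1/(2√y)).

```python
import random, math

random.seed(6)
n = 200_000
ys = [random.uniform(-1, 1) ** 2 for _ in range(n)]

checks = []
for y in (0.09, 0.25, 0.64):
    emp = sum(v <= y for v in ys) / n  # empirical CDF of Y at y
    checks.append((emp, math.sqrt(y)))
    print(emp, math.sqrt(y))  # empirical vs. exact sqrt(y)
```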
Theorem 1.4.1
• If there is a function F(x) that
satisfies the 4 conditions:
This is a rather deep result and
is very hard to prove. However,
we can prove it easily in the
following two specific cases:
• a) F(x) is a step function;
• b) F(x) is a differentiable
function.
Example