
Continuous Random Variables

• A continuous response or random variable, X, is described in terms of a
  probability density function (p.d.f.) f(x):

  P(a < X < b) = ∫_a^b f(x) dx  for any a < b.

• We require f(x) ≥ 0 for all x, and ∫_{−∞}^{∞} f(x) dx = 1.

1
The Distribution Function
1. For a continuous random variable, the c.d.f. F(x) is given by

   F(x) = P(X ≤ x) = ∫_{−∞}^x f(y) dy.

2. f(x) = F'(x).
3. F is non-decreasing since f ≥ 0.

[Figure: P(a ≤ X ≤ b) is the shaded area under the density curve between a and b.]

2
Theorem 1.2.1

If X is a continuous random variable, then P(X = x) = 0 for any real
number x.

3
Example 1.2.1

• Consider a random variable X with

  F(x) = x²  for 0 ≤ x ≤ 1.

• What is its density function?

• Solution:

4
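The solution is left blank on the slide. A numerical sketch (not the worked solution): since f(x) = F'(x), a central difference applied to F(x) = x² should recover the density f(x) = 2x on (0, 1).

```python
# Numerical check of Example 1.2.1: differentiate the c.d.f. F(x) = x^2.

def F(x):
    return x ** 2  # c.d.f. from Example 1.2.1

def f_numeric(x, h=1e-6):
    # central-difference approximation to F'(x)
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (0.1, 0.5, 0.9):
    assert abs(f_numeric(x) - 2 * x) < 1e-6  # density is f(x) = 2x
```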
Mean and Variance of a
continuous random variable

E(X) = ∫_{−∞}^{∞} x f(x) dx,   V(X) = E(X²) − [E(X)]².

5
Example 1.2.2

• Suppose X has density function

  f(x) = a x^b  for 0 ≤ x ≤ 1,
       = 0      otherwise.

• Find the values of a and b if E(X) = 1/3. (Assume b > −1.)

6
7
Uniform Distribution U(a,b)
• X is uniformly distributed on an
interval (a,b) if it has the density
function
 1
 if a<x<b
f ( x ) =  b-a
0 otherwise

• Notation: X~U(a,b).

8
Mean and Variance
(why?)
If X ~ U(a, b), then

E(X) = (a + b)/2,  and  V(X) = (b − a)²/12.

9
Example 1.2.3

• Suppose that the lengths of ‘1” screws’ are actually uniformly
  distributed between 0.99” and 1.02”.
• In a pack of 120 screws, how many are expected to be longer than 1.01”?

10
Example 1.2.3
Solution

11
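The solution slide is blank in the student copy. A quick numerical sketch (not the official solution): for X ~ U(0.99, 1.02), P(X > 1.01) is the proportion of the interval above 1.01.

```python
# Example 1.2.3 sketch: P(X > 1.01) = (1.02 - 1.01) / (1.02 - 0.99) = 1/3,
# so out of 120 screws we expect 120 * 1/3 = 40 longer than 1.01".

a, b = 0.99, 1.02
p_longer = (b - 1.01) / (b - a)   # = 1/3
expected = 120 * p_longer          # expected count in a pack of 120
assert abs(p_longer - 1 / 3) < 1e-9
assert abs(expected - 40) < 1e-6
```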
Normal Distribution
• The p.d.f. of the normal distribution with parameters μ and σ is

  f(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}
• The total area under the curve for
any probability distribution is 1
and this area represents the total
probability.
• If X is normal with mean μ and
variance σ2, we write X~N(μ,σ2).
• If μ=0 and σ2=1, then X has the
standard normal distribution.
• p.d.f. of the standard normal: φ(·)
• c.d.f. of the standard normal: Φ(·)

12
• A little history
• de Moivre
• Adrain (1808) and Gauss (1809)
• Laplace

13
14
15
Question 1

Who was the 18th-century statistician and consultant to gamblers who
discovered the normal curve?

A. de Moivre
B. Laplace
C. Adrain
D. Gauss
E. Quételet

16
Question 2

Why is the discovery of the normal curve important?

A. It has a relatively simple formula.
B. Many natural phenomena are at least approximately normally
   distributed.
C. Many inferential statistics can only be computed with a normal
   distribution.
D. It is a popular method to model discrete random variables.

17
Theorem 1.2.2
1. If X~N(μ,σ2), then aX+b~N(aμ+b,a2σ2).
Hence, (X-μ)/σ ~N(0,1). We only need to
tabulate the standard normal
distribution values.
2. If X~N(μ,σ2), then -X~N(-μ,σ2).
3. If X~N(μ,σ2) and Y~N(ν,τ2) are
independent, then X+Y~N(μ+ν,σ2+ τ2).
4. Φ(−z) = 1 − Φ(z). We can re-write this as Φ(z) + Φ(−z) = 1, so
   Φ(0) = 0.5.
5. φ(−z) = φ(z).

18
Examples

• X~N(5,16). What is P(1<X<13)?

• Solution:

19
20
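The solution is left blank on the slide. A numerical sketch: standardising with Theorem 1.2.2 (μ = 5, σ = 4) gives P(1 < X < 13) = Φ(2) − Φ(−1), computed here with only the standard library's error function.

```python
from math import erf, sqrt

# Phi(z) = (1 + erf(z / sqrt(2))) / 2 is the standard normal c.d.f.
def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 5.0, 4.0
p = Phi((13 - mu) / sigma) - Phi((1 - mu) / sigma)  # Phi(2) - Phi(-1)
assert abs(p - 0.8186) < 5e-4   # approximately 0.8186
```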
Examples
• X~N(5,16) and Y~N(-5,9), X and Y are
independent. What is the probability
P(X-Y>15)?

Solution:

21
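Again the solution is blank; a sketch of the computation. By Theorem 1.2.2, −Y ~ N(5, 9), so X − Y ~ N(5 + 5, 16 + 9) = N(10, 25), and P(X − Y > 15) = P(Z > 1).

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# X - Y ~ N(10, 25): standardise 15 with mean 10, sd 5.
p = 1.0 - Phi((15.0 - 10.0) / 5.0)   # = 1 - Phi(1)
assert abs(p - 0.1587) < 5e-4
```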
Example 1.2.6: Normal Approximation
• For many data sets, histograms assume
forms similar to normal curves. Here is
an example which utilizes that:
Suppose that the height of a group of 1000 students is distributed
normally with mean 66 inches and standard deviation 6 inches.
• Approximately how many of them have
height more than 78 inches?
• Approximately how many students have
height between 5 and 6 feet?

22
23
Example 1.2.6: Normal Approximation
Solution

• Let X denote the height of a random student.
• Then X ~ N(66, 36).
• Then Z = (X − 66)/6 ~ N(0, 1).
• The probability of height being greater than 78 inches is
  P(X > 78) = P(Z > 2) = 1 − Φ(2) ≈ 0.0228.

So, the approximate number of students with height more than 78 inches
is 0.0228 × 1000 ≈ 23.
24
Exponential Distribution
• One parameter (rate parameter): λ

• Exponential with f(x) = λe^{−λx}, x > 0.

• We write X ~ Exponential(λ) or simply X ~ Exp(λ).

• This is a valid p.d.f.:

  ∫_0^∞ λe^{−λx} dx = 1.

• CDF:

  F(x) = ∫_0^x λe^{−λt} dt = 1 − e^{−λx},  x > 0.

25
If X ~ Exp(), E ( X ) = , var(X)= 2 .
1 1
 

• The exponential distribution can be


used to model times between
successive rare events, e.g. the
times between failures of light
bulbs.

26
EXAMPLE
• Suppose that the School of Mathematics found that, on average, there
  are 2 hits per minute on the homepage of the School.

• We start to observe the web page at a certain time point 0, and
  decide to model the waiting time until the first hit, Y (in minutes).
27
Y ~ Exp(λ)
= inter-hit times, which are i.i.d.

Y1 Y2 Y3 Y4 Y5

X ~ Poisson(λt)
= number of hits on the web page up to time t,
assuming the numbers of hits in disjoint intervals are independent.

28
• Suppose that the School of Mathematics found that, on average, there
  are 2 hits per minute on the homepage of the School.

• We start to observe the web page at a certain time point 0, and
  decide to model the waiting time until the first hit, Y (in minutes)
  – using the Exponential Distribution.

• To do that, we need an appropriate value for the rate parameter, λ.
• The average waiting time between hits is 0.5 minutes:
  E(Y) = 1/λ = 0.5, so λ = 2.
29
Example
What is the probability that we have
to wait at most 40 seconds to observe
the first hit?

30
31
Example
• If there is no hit after the first minute,
what is the probability that we have
to wait another 40 seconds for the
first hit?

32
Memoryless property
P{X > s + t | X > s} = P{X > t},  s, t ≥ 0.
Here
P(X > s) = 1 − P(X ≤ s) = e^{−λs}.

P{X > s + t | X > s} = P(X > s + t, X > s) / P(X > s)
                     = e^{−λ(s+t)} / e^{−λs}
                     = e^{−λt} = P(X > t).

If we think of X as being the lifetime of some instrument, then the
probability that the instrument survives for at least s + t hours,
given that it has survived s hours, is the same as the initial
probability that it survives for at least t hours.
33
Exponential Example
• The post office in the university is staffed by two clerks. Suppose
  that when Mr. Smith enters the system, he discovers that Ms. Jones is
  being served by one of the clerks and Mr. Brown by the other. Suppose
  that Mr. Smith is told that his service will begin as soon as either
  Jones or Brown leaves.

• If the amount of time that a clerk spends with a customer is
  exponentially distributed with rate λ, what is the probability that,
  of the three customers, Mr. Smith is the last to leave the post
  office?

A. 1/3
B. ¼
C. ½
D. 1

34
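A simulation sketch of this question: when Mr. Smith starts service, the memoryless property makes the remaining service time of the other customer a fresh Exp(λ), so P(Smith leaves last) = 1/2 (answer C). The rate λ = 1 below is arbitrary; the answer does not depend on it.

```python
import random

random.seed(0)
lam, trials, smith_last = 1.0, 200_000, 0
for _ in range(trials):
    other = random.expovariate(lam)   # remaining time of the other customer
    smith = random.expovariate(lam)   # Smith's own service time
    if smith > other:
        smith_last += 1
p_hat = smith_last / trials
assert abs(p_hat - 0.5) < 0.01        # close to 1/2
```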
Y ~ Exp(λ)
= inter-hit times, which are i.i.d.

Y1 Y2 Y3 Y4 Y5

X ~ Poisson(λt)
= number of hits on the web page up to time t,
assuming the numbers of hits in disjoint intervals are independent.

35
Erlang Distribution
➢ Recall we modelled the waiting
times until the first hit as exp(2).
Then how long do we have to wait
for the second hit?
➢ To do this we need to add the
waiting time until the first hit and
the time between the first and
second hits.

Y1 Y2 Y3 Y4 Y5

36
➢ Let Y1 = the waiting time until the first hit; then Y1 ~ Exp(2), λ = 2.
➢ Let Y2 = the time between the first and second hits; by the
  memoryless property, Y2 ~ Exp(2), λ = 2.

➢ The waiting time until the second hit is then the sum L = Y1 + Y2.

➢ The sum of independent exponential random variables follows the
  Erlang distribution.

Y1 Y2 Y3 Y4 Y5

37
Erlang Distribution
➢ Suppose we want to model the time
to complete n operations in series,
where each operation requires an
exponential period of time to
complete.
➢ An Erlang distribution has the density function:

  f(x) = λ^k x^{k−1} e^{−λx} / (k−1)!,   x ≥ 0, k ∈ ℕ.

λ is the rate parameter and k is the shape parameter.

38
Example of Erlang
Distribution
• You join a queue with three people
ahead of you. One is being served
and two are waiting. Their service
times S1, S2 and S3 are independent,
exponential random variables with
common mean 2 minutes.
• Thus the service rate is λ = 0.5 per minute.
• Your conditional time in the queue, given the system state k = 3 upon
  your arrival, is X = S1 + S2 + S3.

X is Erlang distributed with shape k = 3 and rate λ = 0.5. The
probability that you wait more than 5 minutes in the queue is
39
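The slide stops before the number. A sketch using the Erlang tail formula for integer shape k, P(X > t) = e^{−λt} Σ_{j<k} (λt)^j / j!:

```python
from math import exp, factorial

# Erlang(k = 3, lam = 0.5): probability of waiting more than 5 minutes.
lam, k, t = 0.5, 3, 5.0
p_wait = exp(-lam * t) * sum((lam * t) ** j / factorial(j) for j in range(k))
assert abs(p_wait - 0.5438) < 5e-4   # approximately 0.544
```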
40
Gamma Distribution

0!, 1!, 2!, …

41
Gamma Distribution

0! = 1, 1! = 1, 2! = 2, …, 6! = 720, …

42
Gamma Distribution

43
[Figure: the points Y = X! plotted for X = 0, 1, …, 6, motivating a
smooth interpolation of the factorial.]

44
45
• The gamma function is a solution to the interpolation problem
  y = (x − 1)! at the positive integer values of x.

The gamma function, for any real k > 0, is defined as:

  Γ(k) = ∫_0^∞ x^{k−1} e^{−x} dx

or

  Γ(k) = ∫_0^∞ x^k e^{−x} dx/x.

46
The gamma function, for any real k > 0, is defined as:

  Γ(k) = ∫_0^∞ x^{k−1} e^{−x} dx

or

  Γ(k) = ∫_0^∞ x^k e^{−x} dx/x.

Γ(n) = (n − 1)!  for n a positive integer,
Γ(x + 1) = x Γ(x),
Γ(1) = 1,  Γ(2) = 1,  Γ(3) = 2,
Γ(1/2) = √π.

47
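The listed properties can be sketched numerically with Python's built-in gamma function (`math.gamma` implements Γ):

```python
from math import gamma, factorial, sqrt, pi

# Gamma(n) = (n-1)! at positive integers.
for n in range(1, 8):
    assert abs(gamma(n) - factorial(n - 1)) < 1e-6

# Recursion Gamma(x+1) = x * Gamma(x) at an arbitrary test point.
x = 2.7
assert abs(gamma(x + 1) - x * gamma(x)) < 1e-9

# Gamma(1/2) = sqrt(pi).
assert abs(gamma(0.5) - sqrt(pi)) < 1e-12
```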
What is a gamma PDF?
If we want a valid p.d.f. related to the gamma function, it needs to be
non-negative, and the total probability must add up to 1:

NORMALISE IT.

  1 = ∫_0^∞ x^{k−1} e^{−x} / Γ(k) dx      — the Gamma(1, k) p.d.f.

Now consider the Gamma(λ, k) p.d.f.

48
Y ~ Gamma(λ, k) p.d.f.

Let Y = X/λ, where X ~ Gamma(1, k).
Then X = λY and dx/dy = λ.
Change of variables (Theorem 1.3.1):

  f_Y(y) = f_X(x) |dx/dy|
         = (λy)^{k−1} e^{−λy} λ / Γ(k)
         = λ^k y^{k−1} e^{−λy} / Γ(k)

for y > 0.

49
Theorem 1.2.3

  f(x) = λ^k x^{k−1} e^{−λx} / Γ(k),   x ≥ 0.

• Γ(k + 1) = kΓ(k), k > −1.
• Γ(k) = (k − 1)! for any positive integer k.
➢ An Erlang distribution is a gamma distribution where k is a natural
  number.

➢ The Exponential(λ) is Gamma(λ, 1).

50
Y ~ Exp(λ)
= inter-hit times, which are i.i.d.

Y1 Y2 Y3 Y4 Y5

X ~ Poisson(λt)
= number of hits on the web page up to time t,
assuming the numbers of hits in disjoint intervals are independent.

The sum of k intervals ~ Gamma(λ, k).

51
Additive property

• If X~Gamma(λ,k1) and
Y~Gamma(λ,k2) are independent, then
X+Y~ Gamma(λ, k1+k2).

Theorem 1.2.4
• If Xi ~ Exponential(λ), i = 1, 2, …, n, independently, then
  Σ_{i=1}^n Xi ~ Gamma(λ, n), or Erlang(λ, n).

52
Expectation and Variance

E(X) = ∫_0^∞ x · λ^k x^{k−1} e^{−λx} / Γ(k) dx

     = (1/λ) · (1/Γ(k)) ∫_0^∞ (λx)^k e^{−λx} d(λx)

     = (1/λ) · Γ(k + 1) / Γ(k)

     = (1/λ) · kΓ(k) / Γ(k) = k/λ.

var(X) = k/λ².  (Find E(X²) first.)

53
Example 1.2.7 The χ2 Distribution
• The Gamma(½,½) distribution is
called the χ2 (Chi-squared)
distribution with 1 degree of freedom.
• Notation: χ²₁.
• The χ²₁ distribution also arises by squaring a standard normal
  random variable.
• The sum of n independent χ²₁ random variables is therefore
  Gamma(½, n/2), written χ²_n.
• It is referred to as the χ² distribution with n degrees of freedom.
• Therefore, χ²₂ is also Exponential(½).
• The χ²_n distribution has mean n and variance 2n.

54
Example 1.2.8
• Lifetime of a radio tube is
exponential with mean 1000 hours
(i.e. parameter λ=1/1000).
• Suppose a radio has five of such
components, each of which
functions independently.
• Assume that the radio works if at
least four of the components are
working perfectly.
• What is the probability that the radio
stops working within first 100 hours?

55
Solution 1.2.8
Let Ej, j = 1, …, 5, be the event that the j-th tube stops working
within the first 100 hours, and let Ij be the indicator of Ej. That is,
Ij = 1 if Ej occurs, and 0 otherwise.

56
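The rest of the solution is left blank. A sketch completing it numerically: each tube independently fails within 100 hours with probability p = P(Ej) = 1 − e^{−100/1000}, and the radio stops working iff at least two of the five tubes fail (fewer than four work).

```python
from math import exp, comb

p = 1.0 - exp(-100 / 1000)   # P(one tube fails within 100 h), approx 0.0952

# P(at least 2 of 5 fail) = 1 - P(0 fail) - P(1 fails), a Binomial(5, p) tail.
p_stop = 1.0 - (1 - p) ** 5 - comb(5, 1) * p * (1 - p) ** 4
assert abs(p_stop - 0.0745) < 5e-4   # approximately 0.0745
```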
Beta Distribution
• How likely is it that the Labour Party will win the next UK
  elections in 2024?

• In my view, the probability is 0.42. You may think it is 0.35, your
  friend thinks the chances are 0.6, and your roommate thinks the
  chance is 0.2.

• We would like a way to summarise the chances that different people
  might state.

57
Beta Distribution
• The Beta distribution, supported on the range 0 to 1, is a very
  versatile way to represent outcomes that are probabilities.

58
• The Beta distribution is a generalisation of the uniform
  distribution.

59
• The Beta distribution has two parameters, Beta(a, b): a family of
  distributions with a > 0, b > 0.

  f(x) = c x^{a−1} (1 − x)^{b−1}   if 0 ≤ x ≤ 1.

• The normalising integral is a famous question in mathematics: the
  beta function.

60
B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx = Γ(a)Γ(b) / Γ(a + b)

Proof:
Γ(a)Γ(b) = ∫_0^∞ x^{a−1} e^{−x} dx · ∫_0^∞ y^{b−1} e^{−y} dy

         = ∫_0^∞ x^{a−1} ∫_x^∞ (t − x)^{b−1} e^{−t} dt dx    (t := x + y)

         = ∫_0^∞ e^{−t} ∫_0^t x^{a−1} (t − x)^{b−1} dx dt

         = ∫_0^∞ e^{−t} ∫_0^1 θ^{a−1} (1 − θ)^{b−1} t^{a+b−1} dθ dt    (x := tθ)

         = ∫_0^∞ e^{−t} t^{a+b−1} dt · ∫_0^1 θ^{a−1} (1 − θ)^{b−1} dθ

         = Γ(a + b) ∫_0^1 θ^{a−1} (1 − θ)^{b−1} dθ.

∴ B(a, b) = ∫_0^1 θ^{a−1} (1 − θ)^{b−1} dθ = Γ(a)Γ(b) / Γ(a + b).

61
A random variable has a beta distribution with parameters a and b if
its p.d.f. is:

  f(x) = (1 / B(a, b)) x^{a−1} (1 − x)^{b−1}   if 0 ≤ x ≤ 1,
       = 0                                      otherwise,

where

  B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.

62
• We focus on its statistical stories.

  f(x) = c x^{a−1} (1 − x)^{b−1}   if 0 ≤ x ≤ 1.

63
f(x) = c x^{a−1} (1 − x)^{b−1}   if 0 ≤ x ≤ 1
• The Beta distribution belongs to a flexible family of continuous
  distributions on (0, 1).

[Figure: four Beta density shapes — a=1, b=1 (flat); a=2, b=1
(increasing); a=1/2, b=1/2 (U-shaped); a=2, b=2 (bell-shaped).]

64
65
• Beta distribution belongs to a
flexible family of continuous
distributions on (0, 1).

• Used as a probability for probabilities, i.e. as a prior for a
  parameter in (0, 1). In particular, it is used as a “conjugate prior
  to the binomial.”

• It has various connections with other distributions.

66
• In particular, it is used as a “conjugate prior to the binomial”:
• X | p ~ Bin(n, p).
• We observe a binomial r.v. If p is known, this is just a binomial;
  if p is unknown, we give p a distribution: let p ~ Beta(a, b), which
  is called “the prior”. This reflects the uncertainty about p, now a
  random variable.
• The prior exists before we observe the data X; after we observe X,
  we update p using Bayes’ rule.
• Find the posterior distribution: p | X.

67
f(p | X = k) = P(X = k | p) f(p) / P(X = k)

• The numerator P(X = k | p) f(p) depends on p; it is a function of p,
  and f(p) is the prior.
• The denominator P(X = k) integrates over all p, so it does not
  depend on p.

68
p ~ Beta(a, b)

f(p | X = k) = P(X = k | p) f(p) / P(X = k)

             = C(n, k) p^k (1 − p)^{n−k} · c p^{a−1} (1 − p)^{b−1} / P(X = k)

             ∝ p^k (1 − p)^{n−k} p^{a−1} (1 − p)^{b−1}

             = p^{a+k−1} (1 − p)^{b+n−k−1}

∴ p | X ~ Beta(a + X, b + n − X).

69
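The conjugate update above can be sketched in code. The values a = 2, b = 2, n = 10, k = 7 are illustrative only (not from the slides):

```python
# Prior p ~ Beta(a, b); observe X = k successes in n Binomial trials;
# posterior p | X ~ Beta(a + k, b + n - k), as derived on the slide.

a, b, n, k = 2, 2, 10, 7
post_a, post_b = a + k, b + n - k        # posterior is Beta(9, 5)
post_mean = post_a / (post_a + post_b)   # posterior mean a'/(a'+b') = 9/14
assert (post_a, post_b) == (9, 5)
```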
What are the expected value and
variance for the Beta distribution?

70
E(X) = ∫_0^1 x · x^{a−1} (1 − x)^{b−1} / B(a, b) dx

     = (1 / B(a, b)) ∫_0^1 x^a (1 − x)^{b−1} dx

     = B(a + 1, b) / B(a, b)

     = [Γ(a + 1)Γ(b) / Γ(a + b + 1)] / [Γ(a)Γ(b) / Γ(a + b)]

     = a / (a + b).

V(X) = ab / ((a + b)² (a + b + 1))

71
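A numerical sketch of both moment formulas by midpoint-rule integration of the Beta density; a = 3, b = 5 are illustrative values:

```python
from math import gamma

a, b = 3.0, 5.0
B = gamma(a) * gamma(b) / gamma(a + b)   # beta function via Gamma

# Midpoint rule on (0, 1) for the first two moments of Beta(a, b).
N = 100_000
w = 1.0 / N
m1 = m2 = 0.0
for i in range(N):
    x = (i + 0.5) / N
    dens = x ** (a - 1) * (1 - x) ** (b - 1) / B
    m1 += x * dens * w
    m2 += x * x * dens * w

assert abs(m1 - a / (a + b)) < 1e-6                                # E(X) = a/(a+b)
assert abs(m2 - m1 ** 2 - a * b / ((a + b) ** 2 * (a + b + 1))) < 1e-6  # V(X)
```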
Exercise 1.2.9
• Let X~U(0,1).
• Take Y = Xⁿ and find the p.d.f. of Y.

72
Theorem 1.3.1
• Let X be a continuous random variable having p.d.f. fX.
• Suppose that g is a strictly monotone differentiable function.
• Then the random variable Y = g(X) has a p.d.f. given by

  fY(y) = fX(g⁻¹(y)) |d g⁻¹(y)/dy|   if y = g(x) for some x,
        = 0                           otherwise,

where g⁻¹(y) is defined as the value of x such that g(x) = y.

73
Example 1.3.1
• Let X~N(μ,σ2), and take Y=aX+b, a>0.

74
Example
Verify that if X ~ U(0,1) and Y = −ln X, then Y ~ Exp(1).

75
76
Theorem 1.3.2

• Let X be a continuous random variable having p.d.f. fX.
• Suppose g is a differentiable function.
• Then the random variable Y defined as Y = g(X) has a p.d.f. given by

  fY(y) = Σ_{x : y = g(x)} fX(x(y)) |d x(y)/dy|   if y = g(x) for some x,
        = 0                                        otherwise,

where x(y) is the expression of x in terms of y.

77
Example
• Let X ~ U(-1,1), and let Y = X2. Find
p.d.f of Y.

78
Example
• Let X ~ U(-1,1), and let Y = X2. Find
p.d.f of Y.

So here g(x) = x², and the function is no longer monotone. The range of
Y is still (0, 1), but each value of y now corresponds to two values of
x: x = √y and x = −√y.

79
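A simulation sketch of the two-branch formula: summing the contributions from x = ±√y (each fX(x) = 1/2 times |dx/dy| = 1/(2√y)) gives f_Y(y) = 1/(2√y), equivalently F_Y(y) = √y on (0, 1).

```python
import random
from math import sqrt

random.seed(2)
n = 100_000
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]  # Y = X^2, X ~ U(-1,1)

for y in (0.25, 0.5, 0.81):
    empirical = sum(v <= y for v in ys) / n
    assert abs(empirical - sqrt(y)) < 0.01   # F_Y(y) = sqrt(y)
```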
Theorem 1.4.1
• If there is a function F(x) that satisfies the 4 conditions:
  1. F is non-decreasing;
  2. F is right-continuous;
  3. lim_{x→−∞} F(x) = 0;
  4. lim_{x→+∞} F(x) = 1,

then it is possible to obtain a random variable X with its
distribution function given by F(x).

80
This is a rather deep result and is very hard to prove. However, we can
prove it easily in the following two specific cases:
• a) F(x) is a step function;
• b) F(x) is a differentiable function.

It is easy to see that the corresponding random variables are discrete
random variables in case a) and continuous random variables in case b).

81
Example

82
