SlidesCourse 14 Oct


UNIT 4 continued:

Properties of Expectation and Variance


E(a X + b) = a E(X) + b
E(g(X)+h(X)) = E(g(X)) + E(h(X))
V(X) = E(X^2) - E(X)^2
V(a X + b) = a^2 V(X)
for h(.) convex: E(h(X)) ≥ h(E(X)) (Jensen's inequality); for concave h the inequality is reversed
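The rules above can be checked numerically on any small pmf. A minimal sketch in R, assuming an arbitrary illustrative pmf (the values of `x`, `p`, `a` and `b` are made up for this check and are not part of the lecture):

```r
# Illustrative pmf (assumed values, chosen only for this check)
x <- c(0, 1, 2, 5)
p <- c(0.4, 0.3, 0.2, 0.1)
a <- 3; b <- -2

E <- function(g) sum(g(x) * p)          # expectation of g(X)
EX <- E(identity)
VX <- E(function(t) t^2) - EX^2         # V(X) = E(X^2) - E(X)^2

# E(aX + b) = a E(X) + b
lin_mean <- E(function(t) a * t + b)
# V(aX + b) = a^2 V(X)
lin_var <- E(function(t) (a * t + b)^2) - lin_mean^2

c(lin_mean = lin_mean, rule = a * EX + b)
c(lin_var = lin_var, rule = a^2 * VX)

# Jensen: exp() is convex, so E(exp(X)) >= exp(E(X))
c(E_exp = E(exp), exp_E = exp(EX))
```

Each pair of printed numbers should agree, and the last line should show E_exp at least as large as exp_E.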
Example C: We consider two urns with 3 balls numbered (0, 1, 2) and (1, 2, 3).
U1, U2 ... number obtained from randomly drawing from urn 1 and urn 2 respectively
a) Find the exact value of E(U1 . U2) and Var(U1 . U2).
b) Find the approximate E(U1 . U2) and Var(U1 . U2) using simulation with R.
Solution b) Y = U1 . U2
simEV <- function(n=1.e5){
U1 <- floor(runif(n)*3) # floor(runif(n,0,3))
U2 <- 1+floor(runif(n)*3)
Y <- U1*U2
c(E=mean(Y),V=var(Y)) # var(x) = sum((x_i - mean(x))^2)/(n-1); for large n the difference to /n is negligible
}
simEV(1.e6)
simEV1 <- function(n=1.e5){
U1 <- sample(0:2,size=n,replace=T)
U2 <- sample(1:3,size=n,replace=T)
Y <- U1*U2
c(E=mean(Y),V=var(Y)) # var(x) = sum((x_i - mean(x))^2)/(n-1); for large n the difference to /n is negligible
}
simEV1(1.e6)

# output of simEV1(1.e6)
#        E        V
# 2.002192 3.780069
# output of simEV(1.e6)
#        E        V
# 1.999655 3.777565

Solution a) Y = U1 . U2; we first find the pmf of Y (all 9 pairs (U1, U2) have probability 1/9):


f_Y(0) = P(Y=0) =P(U1=0) = 1/3;
f_Y(1) = P(U1=1 and U2=1)= 1/9
f_Y(2) = P((1, 2) or (2,1))= 2/9
f_Y(3) = P((1,3) )= 1/9
f_Y(4) = P((2, 2))= 1/9
f_Y(6) = P((2, 3))= 1/9
pmf <- c(3,1,2,1,1,0,1)/9
Y <- 0:6
sum(pmf*Y) # E(Y)
# [1] 2
sum(pmf*Y^2)- sum(pmf*Y)^2 # Var(Y)
# [1] 3.777778
E(Y) = sum(y . f(y)) = (0*3 + 1*1+2*2+3*1+4*1+6*1)/9 = 18/9 = 2
E(Y^2) = sum(y^2 . f(y)) = (0*3 + 1*1+4*2+9*1+16*1+36*1)/9 = 70/9 = 7.77778
Var(Y) = 70/9 - 2^2 = 34/9 = 3.77778
Comparing the simulation results with the exact values shows that they agree up to simulation error.
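The pmf of Y can also be obtained by brute-force enumeration instead of by hand. A small sketch, assuming only that all 9 pairs (U1, U2) are equally likely (the names `prod_table`, `pmf_Y`, `EY` and `VY` are chosen for this illustration):

```r
# All 9 equally likely pairs (U1, U2) and their products
prod_table <- outer(0:2, 1:3)            # prod_table[i, j] = U1 * U2
pmf_Y <- table(prod_table) / 9           # relative frequency of each product
pmf_Y

y  <- as.numeric(names(pmf_Y))           # support of Y: 0, 1, 2, 3, 4, 6
pr <- as.numeric(pmf_Y)
EY <- sum(y * pr)                        # exact E(Y)
VY <- sum(y^2 * pr) - EY^2               # exact Var(Y)
c(E = EY, V = VY)
```

This reproduces E(Y) = 2 and Var(Y) = 34/9 without typing the pmf manually, which helps to avoid counting mistakes.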

Example D)
X continuous with CDF
F(x) = 0 for x < 0
F(x) = x^2/2 for 0 <= x < 1
F(x) = x/2 for 1 <= x < 2
F(x) = 1 for 2 <= x
a) Show that F(x) is a CDF and that it is continuous.
b) Guess the value of the expectation of X. Is it larger or smaller than 1 or exactly 1?
c) Calculate the expectation of X.
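Solution a: F is non-decreasing: x^2/2 is increasing for x > 0, x/2 is increasing, 0 and 1 are constant, and the pieces join without a downward jump.
limit_(x -> -infinity) F(x) = 0 and limit_(x -> +infinity) F(x) = 1, as required.
F is right continuous.
We show that F is continuous: x^2/2 and x/2 are continuous ==> it remains to check x=0, x=1 and x=2:
F(0) = 0, F(1) = 1/2 and F(2) = 1 from both sides, all ok.
(Comment: F is continuous ==> P(X = c) = 0 for all c, i.e. X has no point masses.)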
b) pdf: f(x) = F'(x) = 2x/2 = x for 0 <= x < 1
f(x) = 1/2 for 1 <= x < 2
f(x) = 0 else
plot of the pdf:
Rcode: plot(xv <- seq(0,2,0.001), ifelse(xv<1,xv,1/2),type="l")
The plot shows: E(X) is larger than 1, as values far from 1 carry more probability to the right
of 1 than to the left.
c) "Classical mistake": E(X) = integral x F(x) dx. We NEED the pdf to calculate E(X) !!!!!!!!!
solution:
E(X) = integral_(0,2) x f(x) dx =
= integral_(0,1) x f(x) dx + integral_(1,2) x f(x) dx =
= integral_(0,1) x^2 dx + integral_(1,2) 0.5 x dx = [x^3/3]_(0,1) + 0.25 [x^2]_(1,2) =
= 1/3 + 0.25 (4-1) = 1/3 + 3/4 = 13/12 = 1.083333


F <- function(x){
ifelse(x<0,0,
ifelse(x<1,x^2/2,
ifelse(x<2,x/2,1))) }
F(1) #[1] 0.5
plot(xv<-seq(-0.5,2.5,0.001),F(xv),type="l")
dev.new();plot(xv <- seq(0,2,0.001),ifelse(xv<1,xv,1/2),type="l") # dev.new() opens a new graphics window on any platform (windows() exists only on Windows)
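The hand calculation of E(X) in Example D can be cross-checked with numeric integration. A sketch under the assumption that the pdf is the one derived above; the integrals are split at x = 1, where the pdf jumps, so that each piece is smooth (the names `f`, `total` and `EX` are chosen for this illustration):

```r
# pdf of Example D, obtained by differentiating the CDF
f <- function(x) ifelse(x >= 0 & x < 1, x, ifelse(x >= 1 & x < 2, 0.5, 0))

# Split at x = 1, where the pdf has a jump
total <- integrate(f, 0, 1)$value + integrate(f, 1, 2)$value
EX <- integrate(function(x) x * f(x), 0, 1)$value +
      integrate(function(x) x * f(x), 1, 2)$value
c(total = total, EX = EX)   # total should be 1, EX should be 13/12
```

Checking that the pdf integrates to 1 first is a cheap way to catch a wrong derivative before computing moments.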
Example E: (Elvan will explain this tomorrow!!)
f(x) = c/x^3 for x >= 2. Find expectation and variance!
First we need c:
integral_(2 to Inf) c/x^3 dx = c [-0.5/x^2]_(2 to Inf) = c (0 + 1/8) = c/8 = 1 ==> c = 8
f(x) = 8/x^3 for x >= 2
f(x) = 0 else
E(X) = integral_(2 to Inf) x 8/x^3 dx = integral_(2 to Inf) 8/x^2 dx =
= [-8/x]_(2 to Inf) = 0 + 4 = 4
For the variance we first need E(X^2):
E(X^2) = integral_(2 to Inf) x^2 8/x^3 dx = integral_(2 to Inf) 8/x dx =
= [8 log(x)]_(2 to Inf) = Inf (the second moment is not finite)
as limit_(x to Inf) log(x) = Inf
==> The variance does not exist
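A numeric illustration of Example E, as a sketch: the normalising constant and E(X) can be checked with `integrate()`, and the divergence of E(X^2) shows up when the upper limit of the integral is increased. The upper limits in `b` are arbitrary illustrative choices.

```r
f <- function(x) 8 / x^3                                  # pdf on x >= 2

p_total <- integrate(f, 2, Inf)$value                     # should be 1
EX      <- integrate(function(x) x * f(x), 2, Inf)$value  # should be 4
c(p_total = p_total, EX = EX)

# Truncated second moment: integral of x^2 f(x) from 2 to b equals 8*log(b/2),
# which keeps growing as b increases
b   <- c(10, 100, 1000)
num <- sapply(b, function(u) integrate(function(x) x^2 * f(x), 2, u)$value)
rbind(b = b, numeric = num, closed_form = 8 * log(b / 2))
```

The truncated values never settle down, which is the numeric counterpart of the statement that the second moment is infinite.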

UNIT 5 common distributions


Use the pdf file LN Common Distributions, Poisson Process on moodle !!

Necessary to remember name and Definition !!!!!!!!


The rest you can find in the table
PROBLEM: Which distribution should be used ? !!!!!!!!

DISCRETE DISTRIBUTIONS
For a sequence of biased coin tosses with prob(1) = prob(head in 1 experiment) = p
= "Bernoulli trials" (sequence of independent 0,1 experiments)
Binomial distribution (parameters: n, p)
X … total number of successes in n trials
Geometric distribution (parameter: p; Negative Binomial with r=1)
X … number of trials required till the 1st success
Negative Binomial (parameters: r, p)
X … number of trials required till the r-th success
Poisson (parameter: λ = E(X))
X … total number of successes when n is large, p small and unknown but E(X) is known
Hypergeometric distribution (parameters: N … number of balls in urn, n … number of trials, r … number of "marked" balls)
X … number of marked balls drawn in an urn experiment without replacement
for N >> n very similar to Binomial
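A practical point when moving from these definitions to R: `dgeom()` and `dnbinom()` count the number of failures before the success, not the number of trials, so the argument has to be shifted. The sketch below checks this through the expectations 1/p and r/p, and also compares hypergeometric and binomial probabilities for N >> n. All parameter values are illustrative assumptions.

```r
p <- 0.3; r <- 4

# Geometric: P(first success on trial k) = dgeom(k - 1, p)
k <- 1:2000                                     # truncated support
E_geom <- sum(k * dgeom(k - 1, prob = p))       # should be close to 1/p

# Negative binomial: P(r-th success on trial k) = dnbinom(k - r, r, p), k >= r
kr <- r:2000
E_nbin <- sum(kr * dnbinom(kr - r, size = r, prob = p))  # close to r/p
c(E_geom = E_geom, E_nbin = E_nbin)

# Hypergeometric vs. binomial when N >> n (N = 10000, n = 10, 30% marked)
N <- 10000; n <- 10; marked <- 3000
gap <- max(abs(dhyper(0:n, m = marked, n = N - marked, k = n) -
               dbinom(0:n, size = n, prob = marked / N)))
gap                                             # largest pmf difference
```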
#################
Example A0:
A policeman has the experience that about 10 percent of the car drivers have no license.
One day he stops cars and checks whether each driver has a valid driving license.
What is the probability that of 22 stopped cars more than 5 drivers have no license?
Proper notation to solve such probability examples using standard distributions:
1) Define a RV that helps to solve the example:

X … total number of drivers (among the 22) who have no license.

2) Find the distribution of X: X ~ Binomial(n=22, p=0.1)
3) Formulate the probability question with the help of the defined RV: P(X > 5)
4) Calculate the probability:
P(X > 5) = 1 - P(X <= 5) = 1 - F(5) = 1 - pbinom(5,size=22,prob=0.1) = 0.0182
= 1 - (f(0)+f(1)+f(2)+f(3)+f(4)+f(5))
= 1 - sum(dbinom(0:5,size=22,prob=0.1)) = 0.0182
!!!! pbinom(5,size=22,prob=0.1) equals sum(dbinom(0:5,size=22,prob=0.1))
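The four steps of Example A0 translate into a few lines of R. A minimal sketch showing equivalent ways to compute P(X > 5) for X ~ Binomial(22, 0.1); the variable names are chosen for this illustration:

```r
n <- 22; p <- 0.1

via_cdf  <- 1 - pbinom(5, size = n, prob = p)                  # 1 - F(5)
via_tail <- pbinom(5, size = n, prob = p, lower.tail = FALSE)  # upper tail directly
via_pmf  <- 1 - sum(dbinom(0:5, size = n, prob = p))           # 1 - (f(0)+...+f(5))
via_sum  <- sum(dbinom(6:n, size = n, prob = p))               # f(6)+...+f(22)

c(via_cdf, via_tail, via_pmf, via_sum)   # all approximately 0.0182
```

For very small tail probabilities `lower.tail = FALSE` is usually the safer choice, because it avoids subtracting a number close to 1 from 1.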

Example A1: A university has 2000 students, 900 of them male and 1100 female. 100 of
them are randomly selected as delegation of the university.
a) What is the probability that the number of male students in the delegation is at least 50?
b) What is the expectation and the standard deviation of the number of male students in
the delegation?
Let us guess the approximate results:

X ... number of male students among the 100 selected students

X ~ Hypergeometric(N=2000,n=100,r=900)
The table of standard distributions gives the R-command: dhyper(x, m=r, n=N-r, k=n)
a) P(X ≥ 50) = 1 - P(X ≤ 49) = 1 - F(49) = 1 - phyper(49,m=900,n=1100,k=100)
Note that 1 - F(50) is P(X > 50) and not P(X ≥ 50):
F(50) = phyper(50,m=900,n=1100,k=100) = 0.8715258 ==> P(X > 50) = 1 - 0.8715 = 0.1285
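Example A1 in R, as a sketch. For a discrete RV, "at least 50" is the complement of "at most 49", so the CDF has to be evaluated at 49. Part b) is computed once from the pmf and once from the standard hypergeometric formulas E(X) = n r/N and Var(X) = n (r/N)(1 - r/N)(N - n)/(N - 1), which are assumed here to match the table of the lecture notes.

```r
N <- 2000; r <- 900; n <- 100          # population, males, delegation size

# a) P(X >= 50) = 1 - F(49); for comparison also P(X > 50) = 1 - F(50)
p_at_least_50  <- 1 - phyper(49, m = r, n = N - r, k = n)
p_more_than_50 <- 1 - phyper(50, m = r, n = N - r, k = n)

# b) mean and standard deviation from the pmf and from the formulas
x  <- 0:n
f  <- dhyper(x, m = r, n = N - r, k = n)
EX <- sum(x * f)
SD <- sqrt(sum((x - EX)^2 * f))
EX_formula <- n * r / N
SD_formula <- sqrt(n * (r / N) * (1 - r / N) * (N - n) / (N - 1))

c(p_at_least_50 = p_at_least_50, p_more_than_50 = p_more_than_50)
c(EX = EX, EX_formula = EX_formula, SD = SD, SD_formula = SD_formula)
```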

Example 5.A: Use R to show “numerically” for λ= 2.5 that the formula
for E() and Var() of the Poisson distribution is correct.
E(X) = sum_(x from 0 to infinity) x f(x)
pmf of Poisson distribution: dpois(x, lambda=λ)
x <- 0:100
sum(x*dpois(x,lam=2.5)) # E(X) = lam = 2.5

Var(X) = E( (X-E(X))^2 ) = E(g(X)) with g(x) = (x-E(X))^2
= sum_(x from 0 to infinity) g(x) f(x)
x <- 0:50
sum((x-2.5)^2*dpois(x,lam=2.5)) # better code numerically

Var(X) = E(X^2) - E(X)^2 # numerically less stable
x <- 0:100
sum(x^2*dpois(x,lam=2.5))-(sum(x*dpois(x,lam=2.5)))^2

CONTINUOUS DISTRIBUTIONS
Normal: CDF of normal: N(0,1): F(x) = pnorm(x);
general mu and sigma F(x) = pnorm((x-mu)/sigma)
Uniform
Exponential
Gamma = sum of α independent Exponential (for α integer)

R-functions!!!!
CDF functions start with p
pmf (and pdf) functions start with d
quantile (i.e. inverse CDF) functions start with q
random variate generation functions start with r

Quantile(p) = F^(-1)(p)
m = quantile(0.5) … F(m) = 0.5, so 50% of the probability lies left and 50% right of m;
m is called the median
a = quantile(0.9) … F(a) = 0.9
RV X with quantile(0.9) = 20 ==> how many percent of X are larger than 20? 10 percent:
X exceeds (is larger than) 20 with probability 0.1
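The relation between the p- and q-functions can be tried out directly. A small sketch, assuming an exponential distribution with rate 0.1 purely as an illustrative example (any distribution with p-, q- and r-functions would do):

```r
rate <- 0.1                        # illustrative choice

m <- qexp(0.5, rate = rate)        # median: F(m) = 0.5
a <- qexp(0.9, rate = rate)        # 0.9 quantile: F(a) = 0.9

pexp(m, rate = rate)               # F(m), gives back 0.5
pexp(a, rate = rate)               # F(a), gives back 0.9
1 - pexp(a, rate = rate)           # P(X > a), the remaining 0.1

# Simulation check: about 10% of random variates exceed the 0.9 quantile
set.seed(1)
share <- mean(rexp(1e5, rate = rate) > a)
share
```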

Example A2: The height of 5-year-old fir trees is assumed to follow a normal distribution
with mean 4.5 meters and standard deviation 0.4 meters.
a) Find the probability that the height of a randomly selected tree exceeds 6 meters.
b) Find the probability that the height of a randomly selected tree does not exceed 4 meters.
c) Find the height that is exceeded with a probability of 2 percent.
X ... height, X ~ N(mu=4.5,sigma=0.4)
a) P(X>6) = 1-F(6) = 1-pnorm((6-4.5)/0.4) = 1-pnorm(3.75) = pnorm(-3.75) = 8.84e-5
b) P(X<=4) = F(4) = pnorm((4-4.5)/0.4) = pnorm(-1.25) = 0.1056
c) P(X > c) = 0.02 ==> P(X<c) = F(c) = 1 - 0.02 = 0.98; c = F^(-1)(0.98)
Z = (X - mu)/sigma E(Z) = E((X - mu)/sigma) = (E(X)-mu)/sigma = 0
V(Z) = V((X - mu)/sigma) = V(X)/sigma^2 = sigma^2/sigma^2 = 1
Z ~ N(mu=0,sigma=1)
X = mu + sigma Z ==> to get the quantile c of X we can simply use the quantile cZ of Z
and transform it to get c = mu + sigma cZ
cZ = qnorm(0.98) = 2.0537 ==> c = 4.5+0.4*qnorm(0.98) = 5.3215
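In R the standardisation step is optional, because `pnorm()` and `qnorm()` accept `mean` and `sd` directly. A sketch of Example A2 that computes all three parts and confirms that the standardised route gives the same numbers (variable names are chosen for this illustration):

```r
mu <- 4.5; sigma <- 0.4

p_a <- pnorm(6, mean = mu, sd = sigma, lower.tail = FALSE)  # a) P(X > 6)
p_b <- pnorm(4, mean = mu, sd = sigma)                      # b) P(X <= 4)
c_q <- qnorm(0.98, mean = mu, sd = sigma)                   # c) F(c) = 0.98

c(a = p_a, b = p_b, c = c_q)

# The standardised route via Z = (X - mu)/sigma gives the same results
all.equal(p_a, pnorm(-3.75))
all.equal(p_b, pnorm(-1.25))
all.equal(c_q, mu + sigma * qnorm(0.98))
```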

Example 5.B: For the gamma distribution:
Use R to show numerically that for α=5 and rate λ=2.5 the
formulas for E() and Var() are correct (E(X) = α/λ = 2, Var(X) = α/λ^2 = 0.8).
E(X) = integral_(x from 0 to infinity) x f(x) dx
We use the numeric integration function "integrate()":
integrate(f,lower,upper, ...,rel.tol=1.e-12)

# to find E(X)
integrate(function(x)x*dgamma(x,shape=5,rate=2.5),
0,Inf,rel.tol=1.e-12)
# 2 with absolute error < 3.3e-13

# to find the variance
integrate(function(x)(x-2)^2*dgamma(x,shape=5,rate=2.5),
0,Inf,rel.tol=1.e-12)
# 0.8 with absolute error < 6.6e-13
POISSON PROCESS
Use the pdf file "LN Common Distributions, Poisson Process" to see the definition:
A Poisson process with rate λ describes random events in continuous time.
It is important to know that for a Poisson process the numbers of arrivals in non-overlapping
intervals are independent and that it has the memoryless property, which means:
The distribution of the waiting time till the next event is always the same. It is not changed
by the current time and not by the past events that were observed.
To solve examples you mainly need the two properties:
1) The number of events in a time interval (a,b) is Poisson distributed with mean (b-a) λ
2) The waiting time till the next event, starting at any time point, follows the exponential
distribution with rate λ.
A Poisson process with high rate λ has many events in a time interval.
The waiting time till e.g. the 3rd event follows the gamma distribution with shape=3, rate=λ
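These properties can be made visible with a small simulation. The sketch below builds Poisson process paths from exponential interarrival times and then checks the count property and the gamma waiting time. The rate, the time horizon, the number of paths and the cap of 40 arrivals per path are illustrative assumptions, not values from the lecture.

```r
set.seed(42)
lambda <- 2          # events per unit time (illustrative value)
t_end  <- 5          # observe each path on (0, t_end)
n_sim  <- 20000      # number of simulated paths

# One path: cumulative sums of exponential interarrival times
# (40 arrivals are ample here, since only about 10 are expected before t_end)
sim_path <- function() cumsum(rexp(40, rate = lambda))

paths  <- replicate(n_sim, sim_path())    # 40 x n_sim matrix of arrival times
counts <- colSums(paths <= t_end)         # number of events in (0, t_end)
third  <- paths[3, ]                      # waiting time till the 3rd event

# Property 1: counts ~ Poisson(lambda * t_end), so mean and variance are about 10
c(mean = mean(counts), var = var(counts), theory = lambda * t_end)

# Waiting time till the 3rd event ~ Gamma(shape = 3, rate = lambda)
c(simulated = mean(third <= 1), gamma = pgamma(1, shape = 3, rate = lambda))
```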

Example A3: Arrivals to a shop follow a Poisson process with rate 20 per hour.
a) What is the probability that in the next 6 minutes no customer arrives?
b) What is the probability that the waiting time for the 30th customer is smaller than one
hour?
c) Find the waiting time for the arrival of the 30th customer that is exceeded with probability 99%.

Solution a) X ... number of customers in 6 minutes:
X ~ Poisson(lam=20*6/60=2) P(X=0) = f(0) = dpois(0,lam=2) = exp(-2) = 0.1353
b) Y ... waiting time (in hours) for the 30th customer, Y ~ Gamma(shape=30, rate=20)
P(Y < 1) = F(1) = pgamma(1,shape=30,rate=20) = 0.02181822
c) P(Y > c) = 0.99 ==> P(Y < c) = 0.01 ==> c = qgamma(0.01,shape=30,rate=20) = 0.9371213 hours = 56.23 minutes
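Example A3 as runnable R, with two cross-checks that use the link between counts and waiting times: no arrival in 0.1 hours means the first waiting time exceeds 0.1, and the 30th customer arriving within one hour means at least 30 arrivals in that hour. This is a sketch; the variable names are chosen for illustration.

```r
rate <- 20                                    # customers per hour

p_a <- dpois(0, lambda = rate * 6 / 60)       # a) no arrival in 6 minutes
p_b <- pgamma(1, shape = 30, rate = rate)     # b) 30th arrival within 1 hour
c_h <- qgamma(0.01, shape = 30, rate = rate)  # c) exceeded with probability 0.99

c(a = p_a, b = p_b, c_hours = c_h, c_minutes = 60 * c_h)

# Cross-checks using the other view of the same process
all.equal(p_a, pexp(0.1, rate = rate, lower.tail = FALSE))  # first wait > 0.1 h
all.equal(p_b, 1 - ppois(29, lambda = rate))                # at least 30 in 1 h
```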

Example A4: The interarrival times at a filling station are assumed to be independent
exponential random variates with expectation 20 minutes. What is the probability that in
one hour more than 5 customers arrive?
Solution: As the interarrival times are independent exponential variates we know that the
events form a Poisson process. So we first have to find the rate of that Poisson process.
It is equal to the rate of the interarrival time distribution. E(exponential) = 1/lam = 20 minutes
==> lam of the Poisson process is 1/20 = 0.05 per minute or 0.05*60 = 3 per hour.
Y ... number of customers in one hour, Y ~ Poisson(lam=3)
P(Y>5) = 1-ppois(5,lam=3) = 0.08391794
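The result of Example A4 can be confirmed in two further ways. "More than 5 customers in one hour" is the same event as "the 6th customer arrives within 60 minutes", so the gamma distribution gives the same probability, and a simulation from the interarrival times should come close. A sketch with an illustrative seed and number of repetitions:

```r
mean_gap <- 20                 # mean interarrival time in minutes
lam_hour <- 60 / mean_gap      # Poisson rate per hour (= 3)

p_count <- ppois(5, lambda = lam_hour, lower.tail = FALSE)   # P(Y > 5)
# Same event: the 6th arrival happens within 60 minutes
p_wait  <- pgamma(60, shape = 6, rate = 1 / mean_gap)
c(p_count = p_count, p_wait = p_wait)

# Simulation directly from the exponential interarrival times
set.seed(7)
sixth <- replicate(50000, sum(rexp(6, rate = 1 / mean_gap)))
p_sim <- mean(sixth <= 60)
p_sim
```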

Information Course:
Tuesday Oct 15 at 11.00: PS
Monday Oct 21 at 15.00 ZOOM course (video shared)
Tuesday Oct 22 at 11.00: PS Unit 5
Monday Nov 4 and Tuesday Nov 5 Lecture in class
Midterm exam mid of November
(Date will be announced)
