W8PS

Uploaded by

polar neckson
Copyright
© © All Rights Reserved
Statistics for Data Science - 2

Week 8 Practice assignment

1. Let $X_1, \ldots, X_n$ be $n$ i.i.d. samples from a random variable $X$ with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}^2$ be an estimator of $\mu^2$, where the sample mean $\bar{X}$ is an unbiased estimator of $\mu$. Is the estimator $\bar{X}^2$ always unbiased?

(a) Yes
(b) No

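Solution:
The sample mean is
\[
\bar{X} = \frac{X_1 + \cdots + X_n}{n}.
\]
It is given that $\bar{X}$ is an unbiased estimator of $\mu$ and that $\bar{X}^2$ is an estimator of $\mu^2$, so $E[\bar{X}] = \mu$. Now,
\[
E[\bar{X}^2] = \mathrm{Var}(\bar{X}) + (E[\bar{X}])^2 = \frac{\sigma^2}{n} + \mu^2 \neq \mu^2 \quad \text{whenever } \sigma^2 > 0.
\]
Therefore, $\bar{X}^2$ is not an unbiased estimator of $\mu^2$, and the answer is (b).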
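The identity $E[\bar{X}^2] = \sigma^2/n + \mu^2$ can be checked exactly on a small case. The sketch below assumes, purely for illustration, that $X$ is Bernoulli(1/2) and $n = 2$ (neither is part of the problem); the helper name `expected_sq_sample_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_sq_sample_mean(values, probs, n):
    """Exact E[(sample mean)^2] for n i.i.d. draws from a finite distribution."""
    total = Fraction(0)
    # Enumerate every possible sample of size n with its probability.
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        xbar = sum(Fraction(values[i]) for i in idx) / n
        total += p * xbar**2
    return total

# Illustrative assumption: X ~ Bernoulli(1/2), so mu = 1/2 and sigma^2 = 1/4.
values = [0, 1]
probs = [Fraction(1, 2), Fraction(1, 2)]
n = 2
mu, var = Fraction(1, 2), Fraction(1, 4)

lhs = expected_sq_sample_mean(values, probs, n)
rhs = var / n + mu**2
print(lhs, rhs, mu**2)  # lhs equals rhs and exceeds mu^2
```

Exact fractions are used instead of simulation so that the comparison does not depend on random noise.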

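2. Let $X_1, X_2, \ldots, X_n$ be $n$ i.i.d. samples from a distribution with PDF
\[
f_X(x) = \frac{1 + \theta x}{2}, \quad -1 < x < 1.
\]
Let $\hat{\theta} = 3\bar{X}$ be an estimator of $\theta$. Find the mean squared error of $\hat{\theta}$.

(a) $\dfrac{3 - \theta^2}{n}$
(b) $\dfrac{3 + \theta^2}{n}$
(c) $\dfrac{3 + \theta}{n}$
(d) $\dfrac{3 - \theta}{n}$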

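Solution:
Given that $\hat{\theta} = 3\bar{X}$ is an estimator of $\theta$. The expectation of $X$ is
\[
\begin{aligned}
E[X] &= \int_{-1}^{1} x f_X(x)\,dx
      = \int_{-1}^{1} x \left(\frac{1 + \theta x}{2}\right) dx \\
     &= \frac{1}{2}\int_{-1}^{1} (x + \theta x^2)\,dx
      = \left[\frac{x^2}{4} + \frac{\theta x^3}{6}\right]_{-1}^{1}
      = \frac{\theta}{3}.
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Bias}(\hat{\theta}, \theta) &= E[\hat{\theta} - \theta]
  = E\left[3\left(\frac{X_1 + \cdots + X_n}{n}\right) - \theta\right] \\
 &= 3\left(\frac{n\theta}{3n}\right) - \theta = 0.
\end{aligned}
\]
Therefore, the estimator $\hat{\theta}$ is unbiased.
\[
\begin{aligned}
E[X^2] &= \int_{-1}^{1} x^2 f_X(x)\,dx
        = \int_{-1}^{1} x^2 \left(\frac{1 + \theta x}{2}\right) dx \\
       &= \frac{1}{2}\int_{-1}^{1} (x^2 + \theta x^3)\,dx
        = \left[\frac{x^3}{6} + \frac{\theta x^4}{8}\right]_{-1}^{1}
        = \frac{1}{3}.
\end{aligned}
\]
Therefore, $\mathrm{Var}[X] = E[X^2] - (E[X])^2 = \dfrac{1}{3} - \dfrac{\theta^2}{9}$, and
\[
\begin{aligned}
\mathrm{Var}(\hat{\theta}) &= \mathrm{Var}\left(3\left(\frac{X_1 + \cdots + X_n}{n}\right)\right)
  = \frac{9}{n^2}\,\bigl(n\,\mathrm{Var}[X]\bigr) \\
 &= \frac{9}{n^2}\, n \left(\frac{1}{3} - \frac{\theta^2}{9}\right)
  = \frac{3 - \theta^2}{n}.
\end{aligned}
\]
\[
\mathrm{MSE}(\hat{\theta}) = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}[\hat{\theta}] = \frac{3 - \theta^2}{n},
\]
which is option (a).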
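As a cross-check of the moments and the MSE, the integrals can be evaluated exactly, since $\int_{-1}^{1} x^m\,dx$ is $2/(m+1)$ for even $m$ and $0$ for odd $m$. This is a minimal sketch; the function names and the values $\theta = 1/2$, $n = 10$ are assumptions chosen only for illustration.

```python
from fractions import Fraction

def moment(k, theta):
    """Exact E[X^k] for the density f(x) = (1 + theta*x)/2 on (-1, 1)."""
    def integral_of_power(m):
        # Integral of x^m over (-1, 1): 2/(m+1) for even m, 0 for odd m.
        return Fraction(2, m + 1) if m % 2 == 0 else Fraction(0)
    return (integral_of_power(k) + theta * integral_of_power(k + 1)) / 2

def mse_of_3xbar(theta, n):
    """Bias^2 + variance of theta_hat = 3 * (sample mean of n i.i.d. draws)."""
    mean = moment(1, theta)
    var = moment(2, theta) - mean**2
    bias = 3 * mean - theta
    return bias**2 + Fraction(9, n) * var

theta, n = Fraction(1, 2), 10  # illustrative values, not from the problem
mse = mse_of_3xbar(theta, n)
print(moment(1, theta), mse, (3 - theta**2) / n)
```

The last two printed values agree, matching the closed form $(3 - \theta^2)/n$.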

3. Consider 100 samples $X_1, X_2, \ldots, X_{100}$ from a random variable $X$ whose distribution has mean $\mu$ and variance $\sigma^2$. Let $\sum_{i=1}^{100} X_i = 150$ and $\sum_{i=1}^{100} X_i^2 = 1999$. Find an unbiased estimate for $\mathrm{Var}(X)$.

(a) 17.74
(b) 17.91
(c) 1.5
(d) 2.25

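Solution:
The distribution of $X$ has mean $\mu$ and variance $\sigma^2$. Also, $\sum_{i=1}^{100} X_i = 150$ and $\sum_{i=1}^{100} X_i^2 = 1999$.
We know that $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator of $\mathrm{Var}[X]$. Therefore,
\[
\begin{aligned}
S^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \\
    &= \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i^2 + \bar{X}^2 - 2X_i\bar{X}\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2\bar{X}\sum_{i=1}^{n} X_i\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2n\bar{X}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^2\right)
     = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right).
\end{aligned}
\]
Therefore,
\[
S^2 = \frac{1}{100 - 1}\left(1999 - \frac{150^2}{100}\right) = \frac{1774}{99} \approx 17.919,
\]
which corresponds to option (b), 17.91 (the value truncated to two decimals).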
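The final formula needs only $n$, $\sum X_i$ and $\sum X_i^2$, so it is easy to evaluate directly. A minimal sketch (the function name is hypothetical):

```python
def unbiased_variance_from_sums(n, sum_x, sum_x2):
    """S^2 = (sum of squares - (sum)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x**2 / n) / (n - 1)

s2 = unbiased_variance_from_sums(100, 150, 1999)
print(round(s2, 3))  # 17.919
```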
4. Let $X_1, X_2, \ldots, X_n \sim$ i.i.d. $X$ with $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Let $a_1, \ldots, a_n \geq 0$ be such that $\sum_{i=1}^{n} a_i = 1$. Define the estimator for the mean as $\bar{X} = \sum_{i=1}^{n} a_i X_i$ and the estimator for the variance as $S^2 = \sum_{i=1}^{n} a_i (X_i - \bar{X})^2$. Choose the correct option(s) from the following:

(a) $\bar{X}$ is an unbiased estimator of $\mu$.
(b) $E[S^2] = \left(\dfrac{n-1}{n}\right)\sigma^2$
(c) $E[S^2] = \left(1 - \sum_{i=1}^{n} a_i^2\right)\sigma^2$
(d) $E[S^2] = \sum_{i=1}^{n} a_i^2\,\sigma^2$
(e) $S^2$ is an unbiased estimator for $\mathrm{Var}(X)$.

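Solution:
Given $X_1, X_2, \ldots, X_n \sim$ i.i.d. $X$ with $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$, and $\bar{X} = \sum_{i=1}^{n} a_i X_i$ is an estimator of $\mu$, where $\sum_{i=1}^{n} a_i = 1$.

(a) $E[\bar{X}] = E[a_1 X_1 + \cdots + a_n X_n] = \sum_{i=1}^{n} a_i E[X] = \mu$ (since $\sum_{i=1}^{n} a_i = 1$).
$\mathrm{Bias}(\bar{X}) = E[\bar{X}] - \mu = \mu - \mu = 0$.
Therefore, $\bar{X}$ is an unbiased estimator of $\mu$.

(b) By independence, $\mathrm{Var}[\bar{X}] = \mathrm{Var}[a_1 X_1 + \cdots + a_n X_n] = \sum_{i=1}^{n} a_i^2\,\mathrm{Var}[X] = \sigma^2 \sum_{i=1}^{n} a_i^2$. So
\[
E[\bar{X}] = \mu \tag{1}
\]
\[
\mathrm{Var}[\bar{X}] = \sigma^2 \sum_{i=1}^{n} a_i^2 \tag{2}
\]
\[
\begin{aligned}
S^2 &= \sum_{i=1}^{n} a_i (X_i - \bar{X})^2 \\
    &= \sum_{i=1}^{n} \left(a_i X_i^2 + a_i \bar{X}^2 - 2 a_i X_i \bar{X}\right) \\
    &= \sum_{i=1}^{n} a_i X_i^2 + \bar{X}^2 \sum_{i=1}^{n} a_i - 2\bar{X} \sum_{i=1}^{n} a_i X_i \\
    &= \sum_{i=1}^{n} a_i X_i^2 + \bar{X}^2 - 2\bar{X}^2 = \sum_{i=1}^{n} a_i X_i^2 - \bar{X}^2.
\end{aligned}
\]
Now,
\[
\begin{aligned}
E[S^2] &= E\left(\sum_{i=1}^{n} a_i X_i^2 - \bar{X}^2\right) = \sum_{i=1}^{n} E[a_i X_i^2] - E[\bar{X}^2] \\
       &= \sum_{i=1}^{n} a_i E[X_i^2] - E[\bar{X}^2] \\
       &= \sum_{i=1}^{n} a_i (\sigma^2 + \mu^2) - \left(\mathrm{Var}[\bar{X}] + \mu^2\right) \\
       &= \sigma^2 + \mu^2 - \sigma^2 \sum_{i=1}^{n} a_i^2 - \mu^2 \quad [\text{from } (2)] \\
       &= \sigma^2 - \sigma^2 \sum_{i=1}^{n} a_i^2 \\
       &= \left(1 - \sum_{i=1}^{n} a_i^2\right)\sigma^2.
\end{aligned}
\]
Therefore, (b) is not true in general; it holds only in the special case $a_i = 1/n$ for all $i$.

(c) Since $E[S^2] = \left(1 - \sum_{i=1}^{n} a_i^2\right)\sigma^2$, (c) is true.
(d) The expression in (d) does not match $E[S^2]$ in general, so (d) is not a correct option.
(e) $\mathrm{Bias}(S^2) = E[S^2] - \sigma^2 = -\sigma^2 \sum_{i=1}^{n} a_i^2 \neq 0$ when $\sigma^2 > 0$.
Therefore, $S^2$ is not an unbiased estimator of $\mathrm{Var}[X]$.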
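The result $E[S^2] = \left(1 - \sum a_i^2\right)\sigma^2$ can be confirmed by exact enumeration on a toy case. The sketch below assumes $X$ is Bernoulli(1/2) and uses the weights $(1/2, 1/4, 1/4)$; both choices, and the function name, are illustrative and not part of the problem.

```python
from fractions import Fraction
from itertools import product

def expected_weighted_s2(values, probs, weights):
    """Exact E[sum_i a_i (X_i - Xbar)^2] with Xbar = sum_i a_i X_i."""
    total = Fraction(0)
    # One factor per weight: enumerate all samples of size n = len(weights).
    for idx in product(range(len(values)), repeat=len(weights)):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        xs = [Fraction(values[i]) for i in idx]
        xbar = sum(a * x for a, x in zip(weights, xs))
        total += p * sum(a * (x - xbar) ** 2 for a, x in zip(weights, xs))
    return total

# Illustrative assumptions: X ~ Bernoulli(1/2) (sigma^2 = 1/4), unequal weights.
values, probs = [0, 1], [Fraction(1, 2), Fraction(1, 2)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
sigma2 = Fraction(1, 4)

lhs = expected_weighted_s2(values, probs, weights)
rhs = (1 - sum(a * a for a in weights)) * sigma2
print(lhs, rhs)  # both sides agree
```

With unequal weights the value also differs from $\frac{n-1}{n}\sigma^2$, which illustrates why option (b) fails in general.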
5. Let $X_1, \ldots, X_n \sim$ i.i.d. Uniform$(-a, a)$. Find the ML estimator of $a$.
(a) $\hat{a}_{ML} = \max(|X_1|, \ldots, |X_n|)$
(b) $\hat{a}_{ML} = \max(X_1, \ldots, X_n)$
(c) $\hat{a}_{ML} = \min(X_1, \ldots, X_n)$
(d) $\hat{a}_{ML} = \dfrac{1}{2}\, n \min(X_1, \ldots, X_n)$
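Solution:
$X_1, \ldots, X_n \sim$ Uniform$(-a, a)$, so the PDF of each $X_i$ is
\[
f_{X_i}(x_i) =
\begin{cases}
\dfrac{1}{2a} & \text{for } -a < x_i < a, \\
0 & \text{otherwise.}
\end{cases}
\]
The likelihood function of $a$ is given by
\[
L(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i) = \left(\frac{1}{2a}\right)^n,
\]
provided $|x_i| < a$ for all $i$ (otherwise the likelihood is 0).
To maximise the likelihood function, we need to make $a$ as small as possible. Since $-a < x_i < a$, that is $|x_i| < a$, must hold for all $i$, $a$ cannot be smaller than $\max(|x_1|, \ldots, |x_n|)$.
Therefore, the ML estimator of $a$ is $\max(|X_1|, \ldots, |X_n|)$, which is option (a).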
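In code, the estimator is a one-liner. The sample below is made up for illustration, and the function name is hypothetical.

```python
def ml_uniform_half_width(samples):
    """ML estimate of a for Uniform(-a, a): the largest absolute observation."""
    return max(abs(x) for x in samples)

sample = [-0.3, 1.2, -2.5, 0.7]  # made-up data for illustration
a_hat = ml_uniform_half_width(sample)
print(a_hat)  # 2.5
```

Note that the largest observation alone (1.2 here) would leave $-2.5$ outside the estimated support, which is why the absolute values matter.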

6. Let $X_1, X_2, X_3 \sim$ i.i.d. Normal$(\mu, \sigma^2)$. Given a random sample $(-1, 0, 1)$, find the maximum likelihood estimate of $\sigma^2$.
(a) $\dfrac{2}{3}$
(b) $\dfrac{7}{12}$
(c) $\dfrac{1}{3}$
(d) $\dfrac{5}{12}$

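Solution:
The ML estimator of $\sigma^2$ is $\dfrac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu}_{ML})^2$, where $\hat{\mu}_{ML} = \bar{X}$.
For the given sample $-1, 0, 1$, $\bar{X} = \dfrac{-1 + 0 + 1}{3} = 0$.
Therefore, the ML estimate of $\sigma^2$ is $\dfrac{(-1)^2 + 0^2 + 1^2}{3} = \dfrac{2}{3}$, which is option (a).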
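The same arithmetic in code, using exact fractions. This is a small sketch with a hypothetical function name; it assumes integer-valued samples so that `Fraction` can represent the mean exactly.

```python
from fractions import Fraction

def ml_normal_variance(samples):
    """ML estimate of sigma^2: mean squared deviation from the sample mean (divisor n)."""
    n = len(samples)
    mean = Fraction(sum(samples), n)  # assumes integer samples
    return sum((x - mean) ** 2 for x in samples) / n

sigma2_hat = ml_normal_variance([-1, 0, 1])
print(sigma2_hat)  # 2/3
```

Dividing by $n - 1$ instead would give the unbiased estimate, which is 1 for this sample rather than $2/3$.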
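7. Let $X_1, \ldots, X_n$ be $n$ i.i.d. samples of a random variable $X$. Let $X$ have the PDF $f(x) = (\alpha + 1)x^{\alpha}$, where $0 < x < 1$.

(a) Find the ML estimator of $\alpha$.

i. $\hat{\alpha}_{ML} = 1 + \dfrac{n}{\sum_{i=1}^{n} \log X_i}$
ii. $\hat{\alpha}_{ML} = -1 - \dfrac{n}{\sum_{i=1}^{n} \log X_i}$
iii. $\hat{\alpha}_{ML} = 1 - \dfrac{n}{\sum_{i=1}^{n} \log X_i}$
iv. $\hat{\alpha}_{ML} = -1 + \dfrac{n}{\sum_{i=1}^{n} \log X_i}$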
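Solution:
Given,
\[
f(x) = (\alpha + 1)x^{\alpha}, \quad 0 < x < 1.
\]
The likelihood function of a sample $X_1, X_2, \ldots, X_n$ is given by
\[
\begin{aligned}
L(x_1, x_2, \ldots, x_n) &= \prod_{i=1}^{n} f_X(x_i) = (\alpha + 1)^n x_1^{\alpha} \cdots x_n^{\alpha} \\
\Rightarrow \log(L) &= n \log(\alpha + 1) + \alpha\left(\log(x_1) + \cdots + \log(x_n)\right).
\end{aligned}
\]
Therefore, the ML estimator for $\alpha$ is given by
\[
\hat{\alpha} = \arg\max_{\alpha}\left[n \log(\alpha + 1) + \alpha\left(\log(x_1) + \cdots + \log(x_n)\right)\right].
\]
Let $Y = n \log(\alpha + 1) + \alpha\left(\log(x_1) + \cdots + \log(x_n)\right)$. Now,
\[
\frac{dY}{d\alpha} = \frac{d}{d\alpha}\left[n \log(\alpha + 1) + \alpha\left(\log(x_1) + \cdots + \log(x_n)\right)\right]
= \frac{n}{\alpha + 1} + \log(x_1) + \cdots + \log(x_n).
\]
Setting $\dfrac{dY}{d\alpha} = 0$ gives
\[
\frac{n}{\alpha + 1} = -\left[\log(x_1) + \cdots + \log(x_n)\right]
\quad\Rightarrow\quad
\hat{\alpha}_{ML} = -1 - \frac{n}{\sum_{i=1}^{n} \log X_i},
\]
which is option ii.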
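A quick numerical check is to compute $\hat{\alpha}_{ML}$ and confirm that the derivative of the log-likelihood vanishes there. The sample values and function names below are made up for illustration.

```python
import math

def ml_alpha(samples):
    """alpha_hat = -1 - n / sum(log x_i) for f(x) = (alpha + 1) x^alpha on (0, 1)."""
    return -1 - len(samples) / sum(math.log(x) for x in samples)

def score(alpha, samples):
    """Derivative of the log-likelihood with respect to alpha."""
    return len(samples) / (alpha + 1) + sum(math.log(x) for x in samples)

sample = [0.9, 0.5, 0.7, 0.8]  # made-up observations in (0, 1)
alpha_hat = ml_alpha(sample)
print(alpha_hat, score(alpha_hat, sample))  # the score is ~0 at alpha_hat
```

Because every $\log X_i$ is negative on $(0, 1)$, the estimate always exceeds $-1$, which keeps the density valid.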

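(b) The mean of the random variable $X$ is $\dfrac{\alpha + 1}{\alpha + 2}$. Find the estimator of $\alpha$ using the method of moments.
i. $\hat{\alpha}_{MME} = \dfrac{1 + 2M_1}{M_1 - 1}$
ii. $\hat{\alpha}_{MME} = \dfrac{1 - M_1}{M_1 - 1}$
iii. $\hat{\alpha}_{MME} = \dfrac{1 + M_1}{M_1 - 1}$
iv. $\hat{\alpha}_{MME} = \dfrac{1 - 2M_1}{M_1 - 1}$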
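Solution:
The expected value of $X$ is given as $E(X) = \dfrac{\alpha + 1}{\alpha + 2}$.
Using the method of moments,
\[
\frac{\alpha + 1}{\alpha + 2} = m_1
\;\Rightarrow\; \alpha + 1 = m_1 \alpha + 2 m_1
\;\Rightarrow\; \alpha = \frac{1 - 2m_1}{m_1 - 1}.
\]
The estimator is
\[
\hat{\alpha}_{MME} = \frac{1 - 2M_1}{M_1 - 1},
\]
where $M_1$ is the first sample moment (the sample mean). This is option iv.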
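A round-trip check: plugging the true mean $(\alpha + 1)/(\alpha + 2)$ in place of $M_1$ should return $\alpha$. The function name below is hypothetical.

```python
from fractions import Fraction

def mme_alpha(m1):
    """alpha_hat = (1 - 2*M1) / (M1 - 1), from (alpha + 1)/(alpha + 2) = M1."""
    return (1 - 2 * m1) / (m1 - 1)

# Round trip: for alpha = 1 the true mean is (1 + 1)/(1 + 2) = 2/3.
recovered = mme_alpha(Fraction(2, 3))
print(recovered)  # 1
```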
8. Let $X$ be a discrete random variable taking the values $-1, 0, 1$ with probabilities $P(X = -1) = \dfrac{p}{2}$, $P(X = 0) = \dfrac{p}{2}$, $P(X = 1) = 1 - p$. Let $X_1, \ldots, X_n$ be i.i.d. samples of $X$. Find the estimator of $p$ using the method of moments.
(a) $\dfrac{2 - 2M_1}{3}$
(b) $\dfrac{2 + 2M_1}{3}$
(c) $\dfrac{1 + 2M_1}{3}$
(d) $\dfrac{2 + M_1}{3}$
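Solution:
The expected value of $X$ is given by
\[
E[X] = \sum_{x} x\,p_X(x) = \left(-1 \times \frac{p}{2}\right) + \left(0 \times \frac{p}{2}\right) + \bigl(1 \times (1 - p)\bigr) = \frac{2 - 3p}{2}.
\]
Using the method of moments,
\[
\frac{2 - 3p}{2} = m_1 \;\Rightarrow\; p = \frac{2 - 2m_1}{3}.
\]
The estimator is
\[
\hat{p} = \frac{2 - 2M_1}{3},
\]
which is option (a).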
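Applied to data, the estimator only needs the sample mean. The sample below is made up for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def mme_p(samples):
    """p_hat = (2 - 2*M1) / 3, where M1 is the sample mean."""
    m1 = Fraction(sum(samples), len(samples))  # assumes integer samples
    return (2 - 2 * m1) / 3

sample = [1, 0, -1, 1]  # made-up observations from {-1, 0, 1}
p_hat = mme_p(sample)
print(p_hat)  # 1/2
```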
9. Let $X$ be a random variable with PDF
\[
f_X(x) = (\lambda a)\,x^{\alpha - 1} e^{-\lambda x^{\alpha}}, \quad x > 0,
\]
where $\alpha$ and $a$ are constants. Find the maximum likelihood estimator of $\lambda$ for $n$ i.i.d. samples of $X$.

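(a) $\dfrac{\sum_{i=1}^{n} X_i^{\alpha}}{n}$
(b) $\dfrac{n}{\sum_{i=1}^{n} X_i^{\alpha}}$
(c) $\dfrac{n}{\alpha \sum_{i=1}^{n} X_i^{\alpha}}$
(d) $\sum_{i=1}^{n} X_i^{\alpha}$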

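Solution:
Given,
\[
f_X(x) = (\lambda a)\,x^{\alpha - 1} e^{-\lambda x^{\alpha}}, \quad x > 0.
\]
The likelihood function of a sample $X_1, X_2, \ldots, X_n$ is given by
\[
L(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i) = (\lambda a)^n (x_1 \cdots x_n)^{\alpha - 1} e^{-\lambda (x_1^{\alpha} + \cdots + x_n^{\alpha})}.
\]
The likelihood is maximised over the parameter $\lambda$, so factors that do not depend on $\lambda$ can be ignored. Therefore,
\[
\begin{aligned}
L &= \lambda^n e^{-\lambda (x_1^{\alpha} + \cdots + x_n^{\alpha})} \\
\Rightarrow \log(L) &= n \log(\lambda) - \lambda (x_1^{\alpha} + \cdots + x_n^{\alpha}).
\end{aligned}
\]
Therefore, the ML estimator for $\lambda$ is given by
\[
\hat{\lambda} = \arg\max_{\lambda}\left[n \log(\lambda) - \lambda (x_1^{\alpha} + \cdots + x_n^{\alpha})\right].
\]
Let $Y = n \log(\lambda) - \lambda (x_1^{\alpha} + \cdots + x_n^{\alpha})$. Now,
\[
\frac{dY}{d\lambda} = \frac{d}{d\lambda}\left[n \log(\lambda) - \lambda (x_1^{\alpha} + \cdots + x_n^{\alpha})\right] = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i^{\alpha}.
\]
Setting $\dfrac{dY}{d\lambda} = 0$ gives
\[
\frac{n}{\lambda} = \sum_{i=1}^{n} x_i^{\alpha}
\quad\Rightarrow\quad
\hat{\lambda}_{ML} = \frac{n}{\sum_{i=1}^{n} X_i^{\alpha}},
\]
which is option (b).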
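The closed form translates directly into code when $\alpha$ is known. The sample and the choice $\alpha = 2$ below are made up for illustration, and the function name is hypothetical.

```python
def ml_lambda(samples, alpha):
    """lambda_hat = n / sum(x_i ** alpha), for a known constant alpha."""
    return len(samples) / sum(x ** alpha for x in samples)

sample = [1.0, 2.0, 3.0]  # made-up positive observations
lam_hat = ml_lambda(sample, alpha=2)
print(lam_hat)  # equals 3 / (1 + 4 + 9)
```

For $\alpha = 1$ this reduces to the reciprocal of the sample mean.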

10. A random sample of 1000 television screens taken from the households of a city shows that the average running time of a television is 7 hours per day, with a standard deviation of 2 hours. Assume the distribution of measurements to be approximately normal. Calculate a 99% confidence interval for the daily average television running hours.
Hint: Use $P(-2.58 < Z < 2.58) = 0.99$.

(a) [6.02, 6.98]


(b) [7.02, 8.19]
(c) [6.12, 7.98]
(d) [6.83, 7.17]

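Solution:
Given $\beta = 0.99$, $n = 1000$, $\bar{X} = 7$ and $\sigma = 2$.
We need to find $\alpha$ such that $P(|\bar{X} - \mu| \leq \alpha) = 0.99$:
\[
\begin{aligned}
P\left(\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| \leq \frac{\alpha}{\sigma/\sqrt{n}}\right) &= 0.99 \\
\implies P\left(|Z| \leq \frac{\alpha}{\sigma/\sqrt{n}}\right) &= 0.99, \quad \text{where } Z \sim \text{Normal}(0, 1) \\
\implies P\left(-\frac{\alpha}{\sigma/\sqrt{n}} \leq Z \leq \frac{\alpha}{\sigma/\sqrt{n}}\right) &= 0.99.
\end{aligned}
\]
It is given that $P(-2.58 < Z < 2.58) = 0.99$, therefore,
\[
\frac{\alpha}{\sigma/\sqrt{n}} = 2.58 \implies \alpha = 2.58 \times \frac{\sigma}{\sqrt{n}} = 2.58 \times \frac{2}{\sqrt{1000}} \approx 0.163.
\]
The confidence interval for $\mu$ is $[\bar{X} - \alpha, \bar{X} + \alpha] \approx [6.837, 7.163]$.

Therefore, the 99% confidence interval for $\mu$ is $[6.83, 7.17]$, which is option (d).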
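The interval can be reproduced numerically. This is a minimal sketch with a hypothetical function name; the critical value 2.58 is taken from the hint.

```python
import math

def z_interval(sample_mean, sigma, n, z):
    """Two-sided interval sample_mean +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(7, 2, 1000, 2.58)
print(round(low, 3), round(high, 3))  # 6.837 7.163
```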

11. The diameter of screws produced by a certain machine is normally distributed with $\mu$ and $\sigma$ unknown. We observe a random sample
9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 (in cm).
Find a 95% confidence interval for the mean diameter of screws.
Hint: Use $P(-2.447 < t_6 < 2.447) = 0.95$ and $S$ (sample standard deviation) $= 0.283$.
(a) [10.74, 11.26]
(b) [9.74, 10.26]
(c) [7.47, 8.26]
(d) [7.98, 8.75]
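Solution:
Given that $S = 0.283$, $n = 7$, $\beta = 0.95$.
Now,
\[
\bar{X} = \frac{9.8 + 10.2 + 10.4 + 9.8 + 10.0 + 10.2 + 9.6}{7} = 10.
\]
Using the t-distribution, $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$, so
\[
\frac{\alpha}{S/\sqrt{n}} = 2.447 \implies \alpha = 2.447 \times \frac{0.283}{\sqrt{7}} \approx 0.26.
\]
\[
P(|\hat{\mu} - \mu| < 0.26) = 0.95
\]
So, the 95% confidence interval is $[10 - 0.26, 10 + 0.26] = [9.74, 10.26]$, which is option (b).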
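The same computation can be done from the raw data with Python's standard `statistics` module, which also confirms the sample standard deviation given in the hint. The function name is hypothetical, and the critical value 2.447 is taken from the hint rather than computed.

```python
import math
import statistics

def t_interval(samples, t_crit):
    """Two-sided interval mean +/- t_crit * S / sqrt(n), with S the sample std dev."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # divisor n - 1
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

diameters = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
low, high = t_interval(diameters, 2.447)  # critical value from the hint
print(round(statistics.stdev(diameters), 3))  # 0.283
print(round(low, 2), round(high, 2))  # 9.74 10.26
```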
12. A data scientist wishes to determine the average time it takes to run one epoch of a machine learning model on her machine. How large a sample will she need to be 95% confident that her sample mean will be within 15 seconds of the true mean? Assume that it is known from previous studies that $\sigma = 40$ seconds.
Hint: Use $P(-1.96 < Z < 1.96) = 0.95$.
Answer: 28
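Solution:
Let $X$ denote the time taken to run one epoch of the machine learning model.
Given that $\sigma = 40$, we need the value of $n$ such that $P(|\hat{\mu} - \mu| \leq 15) = 0.95$.
\[
\begin{aligned}
P(|\hat{\mu} - \mu| \leq 15) &= 0.95 \\
\Rightarrow P\left(\left|\frac{\hat{\mu} - \mu}{\sigma/\sqrt{n}}\right| \leq \frac{15}{\sigma/\sqrt{n}}\right) &= 0.95 \\
\Rightarrow P\left(|Z| \leq \frac{15}{\sigma/\sqrt{n}}\right) &= 0.95.
\end{aligned}
\]
Now,
\[
\frac{15}{\sigma/\sqrt{n}} = 1.96
\;\Rightarrow\; \sqrt{n} = 40 \times \frac{1.96}{15}
\;\Rightarrow\; n \approx 27.32.
\]
Since $n$ must be an integer at least this large, the sample size should be 28.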
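The rounding-up step is easy to get wrong by hand, so here is a small sketch of the calculation (the function name is hypothetical; 1.96 comes from the hint).

```python
import math

def required_sample_size(sigma, margin, z):
    """Smallest integer n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

n_required = required_sample_size(sigma=40, margin=15, z=1.96)
print(n_required)  # 28
```

Rounding down to 27 would leave the margin slightly above 15 seconds, so the ceiling is the right choice.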
