Statistics Exercise Solution
$f(y) = \dfrac{\lambda^{y} e^{-\lambda}}{y!}$, $y = 0, 1, 2, \ldots$, where $y! = 1 \times 2 \times 3 \times \cdots \times y$.
(a) Given that this density function is valid, what properties must it satisfy?
$\sum_{i=0}^{\infty} \dfrac{\lambda^{y_i} e^{-\lambda}}{y_i!} = 1$ and $\dfrac{\lambda^{y_i} e^{-\lambda}}{y_i!} \ge 0$, $i = 0, 1, 2, \ldots$
You might care to initiate a discussion about whether these hold in this case.
There is a well-known proof of the first, which students do not have to know; the
second is fairly obvious. Also $E(y) = \lambda$ and $Var(y) = \lambda$ for the Poisson.
(b) Assuming $\lambda = 2$, calculate the following:
(i) Prob{y = 2} = $\dfrac{2^{2} e^{-2}}{2!} = 0.27$
(ii) Prob{y < 2} = $\dfrac{2^{0} e^{-2}}{0!} + \dfrac{2^{1} e^{-2}}{1!} = 0.135 + 0.27 \approx 0.41$
(iii)
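(c) Sketch out the shape of the density function if $\lambda = 2$. Is it skewed or symmetric?
Explain.
f(0) = 0.135, f(1) = 0.27, f(2) = 0.27, f(3) = 0.18
It is skewed to the right: the probabilities peak at y = 1 and y = 2 and then tail off
gradually over larger values of y, while the distribution is cut off at 0 on the left.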
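The Poisson probabilities quoted above can be checked numerically. The following is a minimal sketch, assuming $\lambda = 2$ as in part (b); the function name `poisson_pmf` is chosen here for illustration only.

```python
from math import exp, factorial


def poisson_pmf(y: int, lam: float) -> float:
    """Return P(Y = y) for a Poisson variable with mean lam."""
    return lam ** y * exp(-lam) / factorial(y)


lam = 2.0
# f(0), f(1), f(2), f(3) as used in the sketch of the density
pmf = [poisson_pmf(y, lam) for y in range(4)]
# Prob{y < 2} = f(0) + f(1)
prob_less_than_2 = pmf[0] + pmf[1]

print([round(p, 3) for p in pmf])
print(round(prob_less_than_2, 3))
```

The first two probabilities rise and the later ones fall away slowly, which is the right skew described in part (c).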
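3.
(a)
f(x) = kx,        $0 \le x \le 1/2$
     = k(1 - x),  $1/2 \le x \le 1$
(i) Find k
As the density function has a kink at x = 1/2, we have to integrate in two steps. The
first step is from 0 to 1/2. The second from 1/2 to 1. Thus k must solve the following
equation:
$\int_0^{1/2} kx\,dx + \int_{1/2}^{1} k(1-x)\,dx = 1$
$\int_0^{1/2} kx\,dx = k[x^2/2]_0^{1/2} = k/8$
$\int_{1/2}^{1} k(1-x)\,dx = k[x - x^2/2]_{1/2}^{1} = k/2 - (k/2 - k/8) = k/8$
So $k/8 + k/8 = 1$ and $k/4 = 1$ or $k = 4$.
Thus
f(x) = 4x,        $0 \le x \le 1/2$
     = 4(1 - x),  $1/2 \le x \le 1$
$F(s) = \int_0^{s} 4x\,dx = 2s^2$,  $0 \le s \le 1/2$
$F(s) = F(1/2) + \int_{1/2}^{s} 4(1-x)\,dx = \tfrac{1}{2} + [4x - 2x^2]_{1/2}^{s} = \tfrac{1}{2} + 4s - 2s^2 - 2 + \tfrac{1}{2} = 2s(2-s) - 1$,  $1/2 \le s \le 1$
$\int_0^{1/2} 4x^2\,dx = [4x^3/3]_0^{1/2} = \tfrac{1}{6}$
$\int_{1/2}^{1} 4x(1-x)\,dx = [2x^2 - 4x^3/3]_{1/2}^{1} = 2 - 4/3 - (1/2 - 4/24) = \tfrac{1}{3}$
Thus $E(X) = \tfrac{1}{6} + \tfrac{1}{3} = \tfrac{1}{2}$
(v) Find Var(x)
$Var(x) = E(x^2) - [E(x)]^2$
$E(x^2) = \int_0^{1/2} 4x^3\,dx + \int_{1/2}^{1} 4x^2(1-x)\,dx = [x^4]_0^{1/2} + [4x^3/3 - x^4]_{1/2}^{1}$
$= 1/16 + (4/3 - 1) - (1/6 - 1/16) = 7/24$
so $Var(x) = 7/24 - 1/4 = 1/24$.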
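The constant, mean and variance for the kinked density can be cross-checked by numerical integration. This is only a sketch: it assumes the density with constant 4 found above, and the helper `integrate` is a plain midpoint rule written for this illustration, not a library routine.

```python
def f(x: float) -> float:
    """Density with a kink at x = 1/2: 4x below it, 4(1 - x) above it."""
    return 4 * x if x <= 0.5 else 4 * (1 - x)


def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


total = integrate(f, 0.0, 1.0)                         # should be close to 1
mean = integrate(lambda x: x * f(x), 0.0, 1.0)         # E(x)
second_moment = integrate(lambda x: x * x * f(x), 0.0, 1.0)  # E(x^2)
variance = second_moment - mean ** 2                   # Var(x)

print(round(total, 6), round(mean, 6), round(variance, 6))
```

With an even number of sub-intervals the kink at 1/2 falls on a cell boundary, so the midpoint rule treats each linear piece separately.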
4.
following :
(i)
X
9
4
(ii) Y
Y 5X 43
(iii) Y
5.
3X 17
X 12
(iv) Y
.
4
4
1
Var(Y) = 9
4
Var(Y) = 1
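$f(x) = \tfrac{3}{16}(4 - x^2)$,  $0 \le x \le 2$
     = 0 otherwise
(a) $\int_0^2 \tfrac{3}{16}(4 - x^2)\,dx = \left[\tfrac{12x}{16} - \tfrac{x^3}{16}\right]_0^2 = 1.5 - 0.5 = 1$, so it is proper
(b) $\int_1^2 \tfrac{3}{16}(4 - x^2)\,dx = \left[\tfrac{12x}{16} - \tfrac{x^3}{16}\right]_1^2 = 1 - 11/16 = 5/16$
$\int_{0.5}^{1.7} \tfrac{3}{16}(4 - x^2)\,dx = \left[\tfrac{12x}{16} - \tfrac{x^3}{16}\right]_{0.5}^{1.7} \approx 0.968 - 0.367 = 0.601$
(c) $E(x) = \int_0^2 \tfrac{3x}{16}(4 - x^2)\,dx = \left[\tfrac{24x^2}{64} - \tfrac{3x^4}{64}\right]_0^2 = 96/64 - 48/64 = 48/64 = 0.75$
$E(x^2) = \int_0^2 \tfrac{3x^2}{16}(4 - x^2)\,dx = \left[\tfrac{16x^3}{64} - \tfrac{3x^5}{80}\right]_0^2 = 2 - 96/80 = 0.8$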
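The interval probabilities for this density all come from one antiderivative, so they are easy to check in code. A small sketch, assuming the density (3/16)(4 - x^2) on [0, 2]; the names `antiderivative` and `prob` are illustrative.

```python
def antiderivative(x: float) -> float:
    """Antiderivative of (3/16)(4 - x^2), i.e. 12x/16 - x^3/16."""
    return (12 * x - x ** 3) / 16


def prob(a: float, b: float) -> float:
    """P(a <= X <= b) for 0 <= a <= b <= 2."""
    return antiderivative(b) - antiderivative(a)


total = prob(0.0, 2.0)        # part (a): the density integrates to 1
p_1_to_2 = prob(1.0, 2.0)     # part (b): 5/16
p_05_to_17 = prob(0.5, 1.7)   # second integral in part (b)

print(total, p_1_to_2, round(p_05_to_17, 4))
```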
6.
$\bar{x} = \dfrac{\sum x_i}{N}$,  i = 1, 2, 3, ..., N
$\dfrac{\sum (x_i - \bar{x})^2}{N}$
Derive a simpler expression for the sample variance which can be used in
calculations.
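Sample variance $= \dfrac{\sum (x_i - \bar{x})^2}{N} = \dfrac{1}{N}\sum [x_i^2 - 2\bar{x}x_i + (\bar{x})^2]$
$= \dfrac{\sum x_i^2}{N} - \dfrac{2\bar{x}\,(\sum x_i)}{N} + (\bar{x})^2$
$= \dfrac{\sum x_i^2}{N} - 2(\bar{x})^2 + (\bar{x})^2$
$= \dfrac{\sum x_i^2}{N} - (\bar{x})^2$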
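A quick numerical check of the shortcut, mean of squares minus square of the mean, against the definition. This is a sketch only; the data values are made up for illustration.

```python
# Hypothetical data, chosen only to illustrate the identity.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)
mean = sum(data) / n

# Definition: average squared deviation from the sample mean.
var_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: average of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in data) / n - mean ** 2

print(mean, var_definition, var_shortcut)
```

The shortcut needs only the running totals of x and x squared, which is why it is convenient in hand calculation.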
7.
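$\sigma_2^2$ and cov(x,y) = $\sigma_{12}$, a and b are constants, show that
the variance of z
$E[z - E(z)]^2 = a^2\sigma_1^2 + b^2\sigma_2^2 + 2ab\sigma_{12}$.
Writing z = ax + by with E(x) = $\mu_1$ and E(y) = $\mu_2$, expanding the square gives
$E[z - E(z)]^2 = a^2E(x^2) + b^2E(y^2) + a^2\mu_1^2 + b^2\mu_2^2 + 2abE(xy)$
$\qquad - 2a^2\mu_1^2 - 2ab\mu_2\mu_1 - 2ab\mu_1\mu_2 - 2b^2\mu_2^2 + 2ab\mu_1\mu_2$
$= a^2(E(x^2) - \mu_1^2) + b^2(E(y^2) - \mu_2^2) + 2ab(E(xy) - \mu_1\mu_2)$
$= a^2 var(x) + b^2 var(y) + 2ab\,cov(x,y)$
Alternatively,
$E[z - E(z)]^2 = a^2E(x - \mu_1)^2 + b^2E(y - \mu_2)^2 + 2abE(x - \mu_1)(y - \mu_2)$
$= a^2 var(x) + b^2 var(y) + 2ab\,cov(x,y)$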
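The identity for the variance of a linear combination can be verified exactly on a small discrete example. The sketch below uses a made-up joint distribution and arbitrary constants a and b; none of these numbers come from the exercise.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution p(x, y), chosen only for illustration.
joint = {(0, 0): Fr(1, 4), (0, 1): Fr(1, 4), (1, 0): Fr(1, 8), (1, 1): Fr(3, 8)}
a, b = 2, -3  # arbitrary constants


def expect(g):
    """Expectation of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


mu1 = expect(lambda x, y: x)
mu2 = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu1) ** 2)
var_y = expect(lambda x, y: (y - mu2) ** 2)
cov_xy = expect(lambda x, y: (x - mu1) * (y - mu2))

# Variance of z = a*x + b*y computed directly from its definition ...
mu_z = a * mu1 + b * mu2
var_z_direct = expect(lambda x, y: (a * x + b * y - mu_z) ** 2)
# ... and from the formula a^2 var(x) + b^2 var(y) + 2ab cov(x, y).
var_z_formula = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

print(var_z_direct, var_z_formula)
```

Using exact fractions means the two results agree exactly rather than to rounding error.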
10.
         X = 0    X = 1    X = 2    f(y)
Y = 0    0.05     0.10     0.03     0.18
Y = 1    0.21     0.11     0.19     0.51
Y = 2    0.08     0.15     0.08     0.31
f(x)     0.34     0.36     0.30     1.00
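$Var(X) = E(X^2) - [E(X)]^2$ where $E(X^2) = \sum_x x^2 f(x)$
$Var(Y) = E(Y^2) - [E(Y)]^2$ where $E(Y^2) = \sum_y y^2 f(y)$
$Cov(X, Y) = E(XY) - E(X)E(Y) = E(XY) - (0.96)(1.13)$
where $E(XY) = \sum_x \sum_y xy\,p(x, y)$.
See the table below; the values in the brackets are the values of xy.
         X = 0       X = 1       X = 2
Y = 0    0.05 (0)    0.10 (0)    0.03 (0)
Y = 1    0.21 (0)    0.11 (1)    0.19 (2)
Y = 2    0.08 (0)    0.15 (2)    0.08 (4)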
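Need to find $P(Y = y \mid X = x) = \dfrac{P(X = x, Y = y)}{P(X = x)}$, using the joint table and (b).

$P(Y = 0 \mid X = 0) = \dfrac{P(X = 0, Y = 0)}{P(X = 0)} = \dfrac{0.05}{0.34} = \dfrac{5}{34}$
$P(Y = 1 \mid X = 0) = \dfrac{P(X = 0, Y = 1)}{P(X = 0)} = \dfrac{0.21}{0.34} = \dfrac{21}{34}$
$P(Y = 2 \mid X = 0) = \dfrac{P(X = 0, Y = 2)}{P(X = 0)} = \dfrac{0.08}{0.34} = \dfrac{8}{34}$

y                           0       1       2       Total
f(y | X = 0) or f(y | 0)    5/34    21/34   8/34    1.0

$P(Y = 0 \mid X = 1) = \dfrac{P(X = 1, Y = 0)}{P(X = 1)} = \dfrac{0.10}{0.36} = \dfrac{5}{18}$
$P(Y = 1 \mid X = 1) = \dfrac{P(X = 1, Y = 1)}{P(X = 1)} = \dfrac{0.11}{0.36} = \dfrac{11}{36}$
$P(Y = 2 \mid X = 1) = \dfrac{P(X = 1, Y = 2)}{P(X = 1)} = \dfrac{0.15}{0.36} = \dfrac{5}{12}$

y                           0       1       2       Total
f(y | X = 1) or f(y | 1)    5/18    11/36   5/12    1.0

$P(Y = 0 \mid X = 2) = \dfrac{P(X = 2, Y = 0)}{P(X = 2)} = \dfrac{0.03}{0.30} = \dfrac{1}{10}$
$P(Y = 1 \mid X = 2) = \dfrac{P(X = 2, Y = 1)}{P(X = 2)} = \dfrac{0.19}{0.30} = \dfrac{19}{30}$
$P(Y = 2 \mid X = 2) = \dfrac{P(X = 2, Y = 2)}{P(X = 2)} = \dfrac{0.08}{0.30} = \dfrac{4}{15}$

y                           0       1       2       Total
f(y | X = 2) or f(y | 2)    1/10    19/30   4/15    1.0

You could put all 3 conditional distributions together in one table to give the conditional
distribution of Y given X = x as follows:
         X = 0    X = 1    X = 2
Y = 0    5/34     5/18     1/10
Y = 1    21/34    11/36    19/30
Y = 2    8/34     15/36    4/15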
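$E(Y \mid X = 1) = \sum_y y\,f(y \mid X = 1) = 0 \times \dfrac{5}{18} + 1 \times \dfrac{11}{36} + 2 \times \dfrac{15}{36} = \dfrac{41}{36}$ or 1.1389 to 4 dec. places.
$Var(Y \mid X = 1) = E(Y^2 \mid X = 1) - [E(Y \mid X = 1)]^2$
where $E(Y^2 \mid X = 1) = \sum_y y^2\,p(y \mid X = 1) = 0^2 \times \dfrac{5}{18} + 1^2 \times \dfrac{11}{36} + 2^2 \times \dfrac{15}{36} = \dfrac{71}{36}$. So
$Var(Y \mid X = 1) = \dfrac{71}{36} - \left(\dfrac{41}{36}\right)^2 = 0.6752$ to 4 dec. places.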
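The marginal, covariance and conditional calculations for this joint table can be reproduced with exact fractions. This is a minimal sketch; the dictionary layout (keys are `(x, y)` pairs) and variable names are choices made here, not part of the exercise.

```python
from fractions import Fraction as Fr

# Joint probabilities p(x, y) from the table; keys are (x, y).
joint = {
    (0, 0): Fr("0.05"), (1, 0): Fr("0.10"), (2, 0): Fr("0.03"),
    (0, 1): Fr("0.21"), (1, 1): Fr("0.11"), (2, 1): Fr("0.19"),
    (0, 2): Fr("0.08"), (1, 2): Fr("0.15"), (2, 2): Fr("0.08"),
}

# Marginal distribution of X (column totals of the table).
f_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in range(3)}

e_x = sum(x * p for (x, _), p in joint.items())
e_y = sum(y * p for (_, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y

# Conditional distribution of Y given X = 1, then its mean and variance.
cond = {y: joint[(1, y)] / f_x[1] for y in range(3)}
cond_mean = sum(y * p for y, p in cond.items())
cond_var = sum(y * y * p for y, p in cond.items()) - cond_mean ** 2

print(cond, cond_mean, round(float(cond_var), 4), float(cov))
```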
11.
Grosvenor Motors Ltd. is developing a marketing plan to better target advertising and
sales promotion to subgroups. As part of the market research they have prepared the
table given below which indicates the probabilities for subgroups defined by age of
car and owner age group. For example, X = 1 indicates that the car owners are aged
16-25 years and Y = 2 indicates that the car is between 2 and 4 years old.
                                 Age Group (X)
Age of car (Y)      1 (16-25 yrs)    2 (26-45 yrs)    3 (46-65 yrs)
1 ($\le$ 2 yrs)     0.05             0.17             0.06
2 (2-4 yrs)         0.15             0.20             0.07
3 ($\ge$ 5 yrs)     0.10             0.08             0.12
(a)
What age group would you concentrate your advertising and sales promotion
on if you were attempting to sell cars that are over 5 years old?
As Y is the age of the car, when Y = 3 the car is at least 5 years old. When Y = 3, the
X value with the highest associated probability is X = 3, since 0.12 > 0.10 > 0.08, so
the age group you would concentrate on is 46-65 years.
(b)
Calculate the following probabilities: P(X = 2, Y = 3), P(X = 3, Y = 1),
$P(X = 2 \mid Y = 3)$, $P(Y = 2 \mid X = 3)$, $P(X \le 2, Y \le 2)$ and define
(i) P(X = 2, Y = 3) = 0.08 = the probability that the car owner is aged 26-45 and the
car is at least 5 years old.
(ii) P(X = 3, Y = 1) = 0.06 = the probability that the car owner is aged 46-65 and the
car is at most 2 years old.
(iii) $P(X = 2 \mid Y = 3) = \dfrac{P(X = 2, Y = 3)}{P(Y = 3)} = \dfrac{0.08}{0.30} = \dfrac{4}{15}$ = the probability that the car owner
is aged 26-45 given that the car is at least 5 years old.
(iv) $P(Y = 2 \mid X = 3) = \dfrac{P(X = 3, Y = 2)}{P(X = 3)} = \dfrac{0.07}{0.25} = \dfrac{7}{25}$
(v) $P(X \le 2, Y \le 2)$ = P(X = 1 or 2, Y = 1 or 2) = P(X = 1, Y = 1) + P(X = 1, Y = 2) +
P(X = 2, Y = 1) + P(X = 2, Y = 2) = 0.05 + 0.15 + 0.17 + 0.20 = 0.57 = the probability
that the car owner is no more than 45 years old and his/her car is at most 4 years old.
(c) Find the marginal distributions of X and Y and hence calculate P X 2 and
PY 2 . See the joint table above with row and column totals.
The marginal distribution of X is given by
x           1       2       3       Total
$p_X(x)$    0.30    0.45    0.25    1.00
and the marginal distribution of Y is given by
y           1       2       3       Total
$p_Y(y)$    0.28    0.42    0.30    1.00
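$Cov(X, Y) = E(XY) - E(X)E(Y) = E(XY) - (1.95)(2.02)$
where $E(XY) = \sum_x \sum_y xy\,p(x, y)$.
See the table below; the values in the brackets are the values of xy.
         X = 1       X = 2       X = 3
Y = 1    0.05 (1)    0.17 (2)    0.06 (3)
Y = 2    0.15 (2)    0.20 (4)    0.07 (6)
Y = 3    0.10 (3)    0.08 (6)    0.12 (9)
E(XY) = 3.95
So $Cov(X, Y) = 3.95 - 3.939 = 0.011$.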
(e) Explain why the random variables X and Y are not independent.
If X and Y were independent, then Cov(X, Y) would be 0. As it is not 0, X and Y
cannot be independent.
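Independence can also be checked cell by cell: under independence every joint probability would equal the product of its marginals. The sketch below rebuilds the marginals and covariance from the Grosvenor Motors table; the dictionary layout (keys are `(x, y)` pairs) is an illustrative choice.

```python
from fractions import Fraction as Fr

# Joint probabilities p(x, y): x = owner age group, y = car age group.
joint = {
    (1, 1): Fr("0.05"), (2, 1): Fr("0.17"), (3, 1): Fr("0.06"),
    (1, 2): Fr("0.15"), (2, 2): Fr("0.20"), (3, 2): Fr("0.07"),
    (1, 3): Fr("0.10"), (2, 3): Fr("0.08"), (3, 3): Fr("0.12"),
}
groups = (1, 2, 3)

# Marginal distributions (column and row totals).
p_x = {x: sum(joint[(x, y)] for y in groups) for x in groups}
p_y = {y: sum(joint[(x, y)] for x in groups) for y in groups}

e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for y, p in p_y.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y

# Independence would require p(x, y) = p_X(x) * p_Y(y) in every cell.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in groups for y in groups)

# Conditional probability P(X = 2 | Y = 3).
p_x2_given_y3 = joint[(2, 3)] / p_y[3]

print(float(cov), independent, p_x2_given_y3)
```

Note that a zero covariance alone would not prove independence, whereas the cell-by-cell comparison settles the question in both directions.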
12.
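$\int_0^1\int_0^1 k(x^2 + y^2)\,dy\,dx = \int_0^1 \left[kx^2y + \dfrac{ky^3}{3}\right]_0^1 dx = \int_0^1 \left(kx^2 + \dfrac{k}{3}\right)dx = \left[\dfrac{kx^3}{3} + \dfrac{kx}{3}\right]_0^1 = 2k/3 = 1$
Thus k = 3/2
$f(x) = \int_0^1 \dfrac{3(x^2 + y^2)}{2}\,dy = \left[\dfrac{3x^2y}{2} + \dfrac{y^3}{2}\right]_0^1 = \dfrac{(3x^2 + 1)}{2}$
$E(x) = \int_0^1 \left(\dfrac{3x^3}{2} + \dfrac{x}{2}\right)dx = \left[\dfrac{3x^4}{8} + \dfrac{x^2}{4}\right]_0^1 = 5/8$
$E(x^2) = \int_0^1 \left(\dfrac{3x^4}{2} + \dfrac{x^2}{2}\right)dx = \left[\dfrac{3x^5}{10} + \dfrac{x^3}{6}\right]_0^1 = 3/10 + 1/6 = 14/30$
$Var(x) = 14/30 - 25/64 = 73/960 \approx 0.076$
$f(y \mid X) = \dfrac{f(x, y)}{f(x)} = \dfrac{6(X^2 + y^2)}{2(3X^2 + 1)}$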
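The normalising constant and the moments of x for this joint density can be sanity-checked with a midpoint grid over the unit square. This is a rough numerical sketch, assuming the density (3/2)(x^2 + y^2) on 0 to 1 in both variables; the grid size is arbitrary.

```python
def joint(x: float, y: float) -> float:
    """Joint density k(x^2 + y^2) with k = 3/2 on the unit square."""
    return 1.5 * (x * x + y * y)


n = 200                                    # grid cells per axis (arbitrary)
pts = [(i + 0.5) / n for i in range(n)]    # cell midpoints on [0, 1]
cell = 1.0 / (n * n)                       # area of one grid cell

total = cell * sum(joint(x, y) for x in pts for y in pts)
mean_x = cell * sum(x * joint(x, y) for x in pts for y in pts)
second_moment_x = cell * sum(x * x * joint(x, y) for x in pts for y in pts)
var_x = second_moment_x - mean_x ** 2

# Marginal density of x at x = 0.5, compared with (3x^2 + 1)/2 = 0.875.
marginal_at_half = sum(joint(0.5, y) for y in pts) / n

print(round(total, 4), round(mean_x, 4), round(var_x, 4), round(marginal_at_half, 4))
```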
13.
unknown
var( 2 ) = 22 . A
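$\hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2$
(b) $E(\hat{\theta}_3) = aE(\hat{\theta}_1) + (1 - a)E(\hat{\theta}_2) = a\theta + (1 - a)\theta = \theta$
$Var(\hat{\theta}_3) = a^2 var(\hat{\theta}_1) + (1 - a)^2 var(\hat{\theta}_2) + 2a(1 - a)cov(\hat{\theta}_1, \hat{\theta}_2)$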
12 +
22
22
12 22
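$\hat{\sigma}^2 = \dfrac{1}{n}\sum (x_i - \mu)^2$
(a) $E(\hat{\sigma}^2) = E\left(\dfrac{1}{n}\sum (x_i - \mu)^2\right) = \dfrac{1}{n}\sum E((x_i - \mu)^2) = \dfrac{1}{n}\,n\sigma^2 = \sigma^2$, thus unbiased, since
$E(x_i - \mu)^2 = E(x_i^2) - 2\mu E(x_i) + \mu^2 = E(x_i^2) - \mu^2 = Var(x_i)$
(b) $s^2 = \dfrac{1}{n}\sum (x_i - \bar{x})^2$
$s^2 = \dfrac{1}{n}\sum (x_i - \bar{x})^2 = \dfrac{1}{n}\sum x_i^2 - 2\bar{x}\,\dfrac{\sum x_i}{n} + \bar{x}^2 = \dfrac{1}{n}\sum x_i^2 - \bar{x}^2$
$E(s^2) = \dfrac{1}{n}\sum E(x_i^2) - E(\bar{x}^2) = E(x_i^2) - E(\bar{x}^2)$
$= var(x_i) + [E(x_i)]^2 - (var(\bar{x}) + [E(\bar{x})]^2)$
$= var(x_i) - var(\bar{x})$
Thus because $s^2$ contains $\bar{x}$ and not $\mu$, $E(s^2)$ depends on the variance of $\bar{x}$
as well as on $var(x_i)$. The variance of $\bar{x}$, unlike the variance of $\mu$, is non-zero.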
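The result that the expected value of the divide-by-n sample variance equals var(x_i) minus var(x-bar) can be confirmed by enumerating every possible sample from a tiny population. The sketch below is illustrative only: the population {0, 1, 2} with equal probabilities and the sample size of 3 are assumptions made for the example.

```python
from fractions import Fraction as Fr
from itertools import product

values = [0, 1, 2]   # hypothetical population, each value with probability 1/3
n = 3                # sample size (independent draws)

mu = Fr(sum(values), len(values))
var_xi = sum((Fr(v) - mu) ** 2 for v in values) / len(values)

# Every equally likely sample of n independent draws.
samples = list(product(values, repeat=n))


def sigma_hat2(sample):
    """Average squared deviation from the true mean mu."""
    return sum((x - mu) ** 2 for x in sample) / n


def s2(sample):
    """Average squared deviation from the sample mean (divisor n)."""
    xbar = Fr(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / n


expected_sigma_hat2 = sum(sigma_hat2(s) for s in samples) / len(samples)
expected_s2 = sum(s2(s) for s in samples) / len(samples)
var_xbar = sum((Fr(sum(s), n) - mu) ** 2 for s in samples) / len(samples)

print(var_xi, expected_sigma_hat2, expected_s2, var_xbar)
```

Here the estimator built around the true mean is unbiased, while the one built around the sample mean falls short by exactly the variance of the sample mean.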
17.
Construct confidence intervals for the population mean when the sample mean is 11.5
and the unbiased estimate of the population variance is 100.
(a) Use a level of confidence of 95 % and a sample size of 25
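The critical value of t(24) is 2.064. The standard deviation of the sample mean is
$\sqrt{100/25} = 2$, so the interval is $11.5 \pm (2.064 \times 2)$, i.e. $7.372 \le \mu \le 15.628$.
When the standard deviation of the sample mean is 0.01 (a sample size of 1,000,000), the
normal critical value 1.96 gives $11.5 \pm (1.96 \times 0.01)$, i.e.
$11.48 \le \mu \le 11.52$, where $\mu$ is the population mean.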
(b) Use a level of confidence of 90 %
(i) a sample size of 25
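The critical value of t(24) is 1.711. The standard deviation of the sample mean is
$\sqrt{100/25} = 2$, so the interval is $11.5 \pm (1.711 \times 2)$, i.e. $8.078 \le \mu \le 14.922$.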
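The interval arithmetic is the same in every part: sample mean plus or minus critical value times the standard deviation of the sample mean. A minimal sketch, taking the critical values quoted in the text (2.064 and 1.711 for t(24)) as given inputs rather than computing them; the function name is illustrative.

```python
from math import sqrt


def confidence_interval(mean: float, variance: float, n: int, critical: float):
    """Return (lower, upper) = mean -/+ critical * sqrt(variance / n)."""
    half_width = critical * sqrt(variance / n)
    return mean - half_width, mean + half_width


# 95% interval, n = 25, t(24) critical value 2.064 (taken from the text)
ci_95 = confidence_interval(11.5, 100.0, 25, 2.064)
# 90% interval, n = 25, t(24) critical value 1.711 (taken from the text)
ci_90 = confidence_interval(11.5, 100.0, 25, 1.711)

print(ci_95, ci_90)
```

Lowering the confidence level from 95% to 90% shrinks the critical value and therefore narrows the interval.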
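(a)
Calculate $P(180 \le X \le 220)$
$P(180 \le X \le 220) = P\left(\dfrac{180 - 200}{20} \le \dfrac{X - 200}{20} \le \dfrac{220 - 200}{20}\right) = P(-1 \le Z \le 1) \approx 0.6827$,
where Z is a standard normal variable.
(b)
Calculate $P(160 \le X \le 240)$
$P(160 \le X \le 240) = P\left(\dfrac{160 - 200}{20} \le \dfrac{X - 200}{20} \le \dfrac{240 - 200}{20}\right) = P(-2 \le Z \le 2) \approx 0.9545$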
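The standardisation step can be mirrored in code using the error function from the standard library. This sketch assumes X is normally distributed with mean 200 and standard deviation 20, as the standardised expressions imply; the function names are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_interval_prob(lower: float, upper: float, mean: float, sd: float) -> float:
    """P(lower <= X <= upper) for X normal with the given mean and sd."""
    return std_normal_cdf((upper - mean) / sd) - std_normal_cdf((lower - mean) / sd)


p_a = normal_interval_prob(180, 220, 200, 20)   # within one sd of the mean
p_b = normal_interval_prob(160, 240, 200, 20)   # within two sds of the mean

print(round(p_a, 4), round(p_b, 4))
```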
21. Test the null hypothesis that the population mean is 10 against a two-sided alternative if
the sample mean is 11.5 and an unbiased estimator of the population variance is 100 (as in
1 above).
(a) Set the probability of making a type 1 error at 5 per cent.
(i) if N = 25, the test statistic is $\dfrac{(11.5 - 10)}{10/5} = 0.75$
(ii) if N = 100, the test statistic is $\dfrac{(11.5 - 10)}{10/10} = 1.5$
(iii) if N = 1,000,000, the test statistic is $\dfrac{(11.5 - 10)}{10/1000} = 150.0$
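The three test statistics differ only in the sample size, so they can be produced by one small function. A sketch under the stated figures (sample mean 11.5, null mean 10, variance estimate 100); the function name `standardized_statistic` is made up for this illustration.

```python
from math import sqrt


def standardized_statistic(sample_mean: float, null_mean: float,
                           variance: float, n: int) -> float:
    """(sample mean - hypothesised mean) / sqrt(variance / n)."""
    return (sample_mean - null_mean) / sqrt(variance / n)


sample_sizes = (25, 100, 1_000_000)
stats = {n: standardized_statistic(11.5, 10.0, 100.0, n) for n in sample_sizes}

for n, value in stats.items():
    print(n, round(value, 2))
```

The same difference of 1.5 between sample mean and hypothesised mean becomes overwhelmingly significant once the sample is large enough, because the standard deviation of the sample mean shrinks with the square root of N.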
(ii)
(iii)
(c) Would your results in (a) be any different if the population variance were
known to be 100? Explain in detail.
It might make a difference for the two cases N = 4 and 25. Note that in all cases
where the sample size is small (below 30 approx.) the population must be
assumed to be normally distributed.
Critical values would then be 1.96, but the decision not to reject the null would be
the same.
22.
Suppose you wish to test a hypothesis about the mean growth rate of a certain kind
of company. A random sample of 51 companies gives you an estimator of the mean
growth rate of 8.7 per cent. The estimated standard error of this estimator is 0.9. You
know that the estimator has a normal distribution.
(a) Test the hypothesis that the mean growth rate is 10 per cent using a 5 per cent
level of significance.
$H_0$: population mean = 6
$H_1$: population mean $\ne$ 6
t = (8.7 - 6)/0.9 = 3; this has a t distribution with 50 degrees of freedom.
Critical value is 2.009. Thus reject the null.
(c) A new, larger sample of 200 companies becomes available. The estimator of the mean
growth is 8.5 and the standard error is 0.3. Test the null that the mean growth rate is
10 per cent.
$H_0$: population mean = 10
$H_1$: population mean $\ne$ 10
z = (8.5 - 10)/0.3 = -5, thus reject the null at 5% and at 1%
23.
(a) The mean of the annual average rate of return (in per cent) for the stock of
the public companies in the smallest decile (by size) in the US for 33 years
(1970-2002) is 13.30. The standard deviation of the sample is 24.91. If you were
to test the hypothesis that the mean annual rate of return is at least 10 per cent,
write down the null and alternative hypotheses and the test statistic.
$H_0$: population mean = 10
Test statistic is $\dfrac{13.3 - 10}{24.91/\sqrt{33}} = 0.7611$. This has a t(32) distribution if the null is true. The 5%
critical value for a one-sided test is 1.694 approx. Thus do not reject.
(b) The mean annual average rate of return for the largest decile for the same
period is 11.87 per cent. The sample standard deviation is 18.23. Test the
null hypothesis that the annual average rate of return is 11 per cent. Use a
significance level of 5 %.
$H_0$: population mean = 11
$H_1$: population mean $\ne$ 11
Test statistic is $\dfrac{11.87 - 11}{18.23/\sqrt{33}} \approx 0.274$
Note: the standard deviations given are the square root of an unbiased
estimator of the population variance.
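Both test statistics in this question follow the same pattern, with the sample standard deviation divided by the square root of the sample size. A short sketch using the figures given in parts (a) and (b); the function name `t_statistic` is illustrative.

```python
from math import sqrt


def t_statistic(sample_mean: float, null_mean: float, sample_sd: float, n: int) -> float:
    """(sample mean - hypothesised mean) / (sample sd / sqrt(n))."""
    return (sample_mean - null_mean) / (sample_sd / sqrt(n))


# (a) smallest decile: mean 13.30, sd 24.91, 33 years, hypothesised mean 10
t_small = t_statistic(13.30, 10.0, 24.91, 33)
# (b) largest decile: mean 11.87, sd 18.23, 33 years, hypothesised mean 11
t_large = t_statistic(11.87, 11.0, 18.23, 33)

print(round(t_small, 3), round(t_large, 3))
```

Each statistic is then compared with a critical value from the t distribution with 32 degrees of freedom.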