Statistics and Probability - Solved Assignments - Semester Spring 2010
Assignment 1
Question 1: (Marks: 2+2+2+4=10)
Given that
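Now we will find the median by using the empirical relationship among the three measures of central tendency, i.e.
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \quad\Rightarrow\quad \text{Median} = \frac{\text{Mode} + 2\,\text{Mean}}{3}$$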
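The empirical relation can be rearranged to give the median directly. The given values of the question are not reproduced above, so this minimal Python sketch uses hypothetical inputs; the function name `median_from_mean_mode` is illustrative only.

```python
def median_from_mean_mode(mean: float, mode: float) -> float:
    """Median implied by the empirical relation Mode = 3*Median - 2*Mean.

    The relation is only approximate and is meant for moderately skewed data.
    """
    return (mode + 2 * mean) / 3


# Hypothetical inputs, not the (missing) values of the question.
print(median_from_mean_mode(mean=10.0, mode=7.0))  # 9.0
```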
The main purpose of a statistical study is to make inferences about a population on the basis of sample data. To obtain descriptive information from a sample we need data, and the collection of numerical data provides the basis for analysing the data and carrying out the further steps of the study.
A paint retailer has had numerous complaints from customers about under-filled paint cans. As a
result, the retailer started to inspect incoming shipments. A recent shipment contained 2,440
gallon-size cans. The retailer sampled 50 cans and weighed each on a scale capable of
measuring weight to four decimal places; properly filled cans weigh 10 pounds.
1. Describe a population
2. Describe a variable of interest
3. Describe the data type of variable
4. Describe a sample
Sol:
a) The population is the set of units of interest to the retailer, which is the shipment of
2,440 cans of paint.
b) The weight of the paint cans is the variable the retailer wishes to evaluate.
c) The retailer has to measure weight, and weight is a continuous quantitative
variable.
d) The sample is a subset of the population. In this case, it is the 50 cans of paint selected by
the retailer.
Question 2: (Marks: 2+2+6=10)
Under this method, the information is gathered by employing trained enumerators who assist the
informants in making the entries in the schedules or questionnaires correctly. This method gives the
most reliable information if the enumerator is well-trained, experienced and tactful.
(b) Average height of the students in a school is 5.2 inches. A sample of 12 students showed the
following heights in inches.
5.0, 5.3, 5.2, 4.9, 4.11, 5.0, 5.5, 5.4, 5.1, 5.0, 5.2, 4.10
Sol:
As µ = 5.2, the sample mean of the data is
$$\bar{x} = \frac{\sum x}{n} = \frac{59.81}{12} = 4.98$$
$$\text{Sampling error} = \bar{x} - \mu = 4.98 - 5.2 = -0.22$$
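The arithmetic can be checked with a short Python sketch. The twelve heights are taken exactly as printed, so 4.11 and 4.10 are treated as ordinary decimals, as in the solution.

```python
heights = [5.0, 5.3, 5.2, 4.9, 4.11, 5.0, 5.5, 5.4, 5.1, 5.0, 5.2, 4.10]
mu = 5.2  # given population mean

x_bar = sum(heights) / len(heights)  # sample mean
sampling_error = x_bar - mu          # sampling error = x_bar - mu

print(round(x_bar, 2), round(sampling_error, 2))  # 4.98 -0.22
```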
(c) Find the missing frequencies and complete the following table.
x     f     c.f    Cumulative relative frequency
2     ?     ?      2/15
4     1     ?      ?
6     ?     7      ?
8     3     ?      ?
10    ?     15     1
Since the last cumulative frequency is 15, the total frequency is 15. The cumulative relative frequency of the first class is 2/15, so the first class has frequency 2, and its cumulative frequency is also 2.
The remaining entries follow from c.f = previous c.f + f: the c.f of x = 4 is 2 + 1 = 3; the frequency of x = 6 is 7 − 3 = 4; the c.f of x = 8 is 7 + 3 = 10; and the frequency of x = 10 is 15 − 10 = 5, which is the last class frequency.
Dividing each cumulative frequency by the total frequency, 15, gives the cumulative relative frequencies.
x     f     c.f    Cumulative relative frequency
2     2     2      2/15
4     1     3      3/15
6     4     7      7/15
8     3     10     10/15
10    5     15     15/15
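The fill-in rule (c.f = previous c.f + f, with the total taken from the last c.f) can be sketched in Python as below; the dictionaries `known_f` and `known_cf` simply encode the entries given in the question.

```python
from fractions import Fraction

xs = [2, 4, 6, 8, 10]
total = 15                   # last cumulative frequency = total frequency
first_crf = Fraction(2, 15)  # cumulative relative frequency of the first class
known_f = {4: 1, 8: 3}       # frequencies given in the question
known_cf = {6: 7, 10: 15}    # cumulative frequencies given in the question

f, cf = {}, {}
f[2] = cf[2] = int(first_crf * total)  # first class: f = c.f = 2
running = cf[2]
for x in xs[1:]:
    if x in known_f:            # frequency known: extend the running total
        f[x] = known_f[x]
        running += f[x]
    else:                       # c.f known: recover the frequency
        f[x] = known_cf[x] - running
        running = known_cf[x]
    cf[x] = running

for x in xs:
    print(x, f[x], cf[x], Fraction(cf[x], total))
```

Note that `Fraction` prints reduced forms (for example 3/15 appears as 1/5).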
Question 3: (Marks: 2+8=10)
a) Can we find out the Median from the following data? If yes, write the reason (No need to
calculate the median).
2000-2999/- 300
3000-3999/- 250
4000-4999/- 50
Sol:
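Yes, we can find the median from these data. The median is the most appropriate average
when the data are in open-ended class intervals, because it depends only on the class containing
the middle observation and not on the unknown limits of the end classes.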
(b) Compute Mean, Median and Mode from the following data.
No. of students 1 2 3 5 6
f 15 10 5 15 5
Sol:
X      f     fX    c.f
1      15    15    15
2      10    20    25
3      5     15    30
5      15    75    45
6      5     30    50
Total  50    155
$$\text{Mean} = \bar{X} = \frac{\sum fX}{\sum f} = \frac{155}{50} = 3.1$$
Since n = 50 is even (n/2 = 50/2 = 25 is an integer), the median is the average of the (n/2)th value and
the ((n+2)/2)th value:
$$\frac{n}{2} = \frac{50}{2} = 25\text{th value}, \qquad \frac{n+2}{2} = \frac{50+2}{2} = \frac{52}{2} = 26\text{th value}$$
Checking the 25th and 26th values against the cumulative frequency column, we find that they
correspond to X = 2 and X = 3 respectively. So
$$\text{Median} = \frac{2+3}{2} = 2.5$$
Mode
As the data are discrete, the mode is the value that occurs the maximum number of times in the
data set. Here there are two modes, 1 and 5, since both occur the same number of times (15 times),
so the distribution is bimodal.
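All three averages can be cross-checked by expanding the frequency table into raw values and using Python's standard `statistics` module (`multimode` needs Python 3.8 or later). This is a verification sketch, not the hand method used above.

```python
from statistics import mean, median, multimode

xs = [1, 2, 3, 5, 6]
fs = [15, 10, 5, 15, 5]
data = [x for x, f in zip(xs, fs) for _ in range(f)]  # expand the frequency table

print(mean(data))       # 3.1
print(median(data))     # 2.5  (average of the 25th and 26th values)
print(multimode(data))  # [1, 5]  (two modes)
```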
Assignment 2
Question 1: (Marks: 4x2=8)
Range is only the difference between the minimum and maximum values. It gives no information
about how the values are distributed between the two ends of the series, and it is affected by
outliers (extreme values). Hence it can give a misleading picture of the observations.
The quartile deviation is superior to range as it is not affected by extremely large or small
observations. It covers the central 50% of values. It is also used in situations where extreme
observations are thought to be unrepresentative.
Both are used to measure the dispersion of a data set, and both involve every data value in
their computation. But in the mean deviation we take absolute values, ignoring the fact that
some deviations are negative and some are positive. This introduces an artificiality into the mean
deviation, which makes its further theoretical development or application very difficult.
This problem is overcome by computing the standard deviation, in which we square the deviations
rather than taking their absolute values.
That is why the standard deviation is the preferred and most widely used measure of dispersion.
A limitation of Chebyshev's theorem is that it gives no information at all about the
probability of observing a value within one standard deviation of the mean, that is, when the
constant k equals one: the bound 1 − 1/k² is then 1 − 1 = 0. Although a large proportion of data
often falls within µ ± σ, this cannot be explained by the theorem.
d) If coefficient of skewness = 0, then what would you say about the skewness of the
distribution?
If the coefficient of skewness = 0, the distribution is symmetrical. That is, the mean, median and
mode of the distribution are equal.
Question 2: (Marks: 4+8=12)
a) Show that the range is greatly affected by the extreme values; interpret the result.
Solution:
Given that
Then
$$\text{Range} = X_m - X_0 = 1014 - 9 = 1005$$
Interpretation:
Observing the values closely, we find that the value 9 is much smaller than the rest of the
values in the data set. Since the range depends on this value, this single value has made the
range of the data set much wider, so the range presents a misleading picture of the whole data.
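b) The mean and the standard deviation of a set of values are 50 and 10 respectively. Compute
X̄ ± 2S and X̄ ± 3S. Interpret the results in the light of (i) the empirical rule (ii) Chebyshev's
inequality.
Solution:
X̄ ± 2S = 50 ± 2(10) = (30, 70)
X̄ ± 3S = 50 ± 3(10) = (20, 80)
(i) Empirical rule: if the distribution is approximately bell-shaped, about 95% of the values lie
between 30 and 70, and about 99.7% lie between 20 and 80.
(ii) Chebyshev's inequality: for any distribution, at least 1 − 1/2² = 75% of the values lie between
30 and 70, and at least 1 − 1/3² ≈ 88.9% lie between 20 and 80.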
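A minimal Python sketch of the two interpretations; the helper name `intervals` is illustrative. The empirical-rule percentages are hard-coded approximations valid only for bell-shaped data, whereas the Chebyshev bound 1 − 1/k² holds for any distribution.

```python
def intervals(mean, sd, k):
    """Return the interval mean ± k*sd and Chebyshev's lower bound 1 - 1/k**2."""
    lo, hi = mean - k * sd, mean + k * sd
    chebyshev_min = 1 - 1 / k**2  # at least this proportion, any distribution
    return (lo, hi), chebyshev_min


empirical = {2: 0.95, 3: 0.997}  # approximate, bell-shaped data only
for k in (2, 3):
    (lo, hi), cheb = intervals(50, 10, k)
    print(f"k={k}: ({lo}, {hi})  empirical ~{empirical[k]:.1%}  Chebyshev >= {cheb:.1%}")
```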
Question 3: (Marks: 5+5=10)
a. Find the first two moments about mean from the following data.
Solution:
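To find the moments about the mean we first find the mean of the data.
X      X − X̄    (X − X̄)²
34     −17      289
36     −15      225
38     −13      169
40     −11      121
42     −9       81
54     3        9
56     5        25
68     17       289
70     19       361
72     21       441
Total  510      0        2010
Mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{510}{10} = 51$$
First moment about the mean:
$$m_1 = \frac{\sum (X - \bar{X})}{n} = \frac{0}{10} = 0$$
Second moment about the mean:
$$m_2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{2010}{10} = 201$$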
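The table can be reproduced with a short Python sketch; `central_moment(r)` is an illustrative helper computing Σ(X − X̄)^r / n.

```python
xs = [34, 36, 38, 40, 42, 54, 56, 68, 70, 72]
n = len(xs)
x_bar = sum(xs) / n  # 51.0


def central_moment(r: int) -> float:
    """r-th moment about the mean: sum((x - x_bar)**r) / n."""
    return sum((x - x_bar) ** r for x in xs) / n


print(x_bar, central_moment(1), central_moment(2))  # 51.0 0.0 201.0
```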
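Q1 = 34.087156, Q3 = 44.962963, Median X̃ = 39.606382
Solution:
Using the quartile (Bowley's) coefficient of skewness,
$$S_k = \frac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{34.087156 + 44.962963 - 2(39.606382)}{44.962963 - 34.087156} = \frac{-0.162645}{10.875807} = -0.014954752$$
Since Sk is negative but very close to zero, the distribution is nearly symmetrical, with a very slight negative skew.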
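A quick Python check of the quartile coefficient; `bowley_skewness` is an illustrative name for the formula above.

```python
def bowley_skewness(q1: float, median: float, q3: float) -> float:
    """Quartile coefficient of skewness: (Q1 + Q3 - 2*Median) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * median) / (q3 - q1)


sk = bowley_skewness(34.087156, 39.606382, 44.962963)
print(round(sk, 6))  # -0.014955
```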
Assignment 3
Question 1: (Marks: 3+3+4=10)
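$\sum Y^2 = 26$, $\sum Y = 10$, $\sum XY = 37$
Solution:
With a = −1, b = 0.5 and n = 5 (the values substituted below),
$$s_{y.x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}} = \sqrt{\frac{26 - (-1)(10) - (0.5)(37)}{5-2}} = \sqrt{\frac{26 + 10 - 18.5}{3}} = \sqrt{\frac{17.5}{3}} = \sqrt{5.833} = 2.415$$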
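The same computation as a Python sketch; `std_error_estimate` is an illustrative name, and a, b, n are the values substituted in the solution.

```python
from math import sqrt


def std_error_estimate(sum_y2, sum_y, sum_xy, a, b, n):
    """Standard error of estimate s_y.x = sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - 2))."""
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


print(round(std_error_estimate(26, 10, 37, a=-1, b=0.5, n=5), 3))  # 2.415
```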
Y= 2.64 + 10.83 X
And
X= -1.91 + 6.18 Y
Are these lines possible for any data set? Explain your answer:
Solution:
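These lines are possible only if the correlation coefficient r, whose square equals the product of
the two regression slopes, lies between −1 and +1. The correlation coefficient r in this case is given below.
$$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{10.83 \times 6.18} = \sqrt{66.93} = 8.18 > 1$$
Since r can never exceed 1, these two lines are not possible for any data set.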
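A small Python check of the consistency condition; `lines_consistent` is a hypothetical helper. Because r² = b_yx · b_xy, the product must lie between 0 and 1 (a negative product would mean the slopes have different signs, which is also impossible).

```python
from math import sqrt


def lines_consistent(b_yx: float, b_xy: float):
    """Return (possible?, |r|) for two regression slopes, using r^2 = b_yx * b_xy."""
    prod = b_yx * b_xy
    return 0 <= prod <= 1, sqrt(abs(prod))


ok, r = lines_consistent(10.83, 6.18)
print(ok, round(r, 2))  # False 8.18
```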
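c) Two dice are rolled. Make a sample space and also find the probability that
The sample space S consists of the 36 equally likely ordered pairs (i, j), with i, j = 1, 2, ..., 6, so n(S) = 36.
Let A be the event that the sum of the outcomes is equal to 10.
A = {(4, 6), (5, 5), (6, 4)}
$$P(A) = \frac{n(A)}{n(S)} = \frac{3}{36} = 0.0833$$
Let B be the event that the sum of the outcomes is equal to 7.
B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
$$P(B) = \frac{n(B)}{n(S)} = \frac{6}{36} = 0.167$$
Let C be the event that the sum of the outcomes is equal to 1.
C = ∅ (no outcome has sum 1)
$$P(C) = \frac{n(C)}{n(S)} = \frac{0}{36} = 0$$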
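Enumerating the sample space in Python confirms the three probabilities; `p_sum` is an illustrative helper that counts favourable ordered pairs.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # 36 ordered pairs


def p_sum(target: int) -> Fraction:
    """Probability that the two dice add up to `target`."""
    favourable = [o for o in sample_space if sum(o) == target]
    return Fraction(len(favourable), len(sample_space))


for s in (10, 7, 1):
    print(s, p_sum(s))  # 10 1/12, then 7 1/6, then 1 0
```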
Question 2: (Marks: 4+6=10)
a) If S= {1, 2, 3, 4, 5, 6}, A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, then verify whether A and B
are independent?
Solution:
A and B are independent if P(A ∩ B) = P(A) × P(B).
A ∩ B = {3, 4}
P(A ∩ B) = 2/6 = 1/3
P(A) = 4/6
P(B) = 4/6
P(A) × P(B) = 4/6 × 4/6 = 4/9
Since P(A) × P(B) = 4/9 ≠ 1/3 = P(A ∩ B), A and B are not independent.
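The same check with Python sets and exact fractions, assuming (as the solution does) that the six outcomes are equally likely:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}


def P(event: set) -> Fraction:
    """Classical probability, assuming equally likely outcomes in S."""
    return Fraction(len(event), len(S))


print(P(A & B), P(A) * P(B), P(A & B) == P(A) * P(B))  # 1/3 4/9 False
```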
b) Indicate whether the following statement is true or false for three mutually exclusive
events A, B and C. Justify your answer.
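$$P(A) = \frac{1}{6}, \qquad \frac{2}{3}P(B) = \frac{1}{6} \qquad \text{and} \qquad \frac{1}{4}P(C) = \frac{1}{6}$$
Solution:
Given that
$$P(A) = \frac{1}{6}$$
And
$$\frac{2}{3}P(B) = \frac{1}{6} \;\Rightarrow\; P(B) = \frac{1}{6} \times \frac{3}{2} = \frac{3}{12}$$
Now
$$\frac{1}{4}P(C) = \frac{1}{6} \;\Rightarrow\; P(C) = \frac{1}{6} \times \frac{4}{1} = \frac{4}{6}$$
For mutually exclusive events, P(A ∪ B ∪ C) = P(A) + P(B) + P(C), and this sum can never exceed 1. Here
$$P(A) + P(B) + P(C) = \frac{1}{6} + \frac{3}{12} + \frac{4}{6} = \frac{13}{12} > 1$$
so these values cannot be the probabilities of three mutually exclusive events, and the statement is false.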
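The arithmetic can be verified with exact fractions in Python:

```python
from fractions import Fraction as F

p_a = F(1, 6)
p_b = F(1, 6) / F(2, 3)  # from (2/3) P(B) = 1/6
p_c = F(1, 6) / F(1, 4)  # from (1/4) P(C) = 1/6
total = p_a + p_b + p_c  # P(A ∪ B ∪ C) if the events were mutually exclusive

print(p_b, p_c, total, total <= 1)  # 1/4 2/3 13/12 False
```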
a) A card is drawn from an ordinary deck of 52 playing cards. Can "king" and "diamond" be
mutually exclusive events? Give a reason to support your answer.
Solution: The two events cannot be mutually exclusive, because a card drawn from an ordinary
deck of 52 playing cards can be both a king and a diamond (the king of diamonds). So they are not
mutually exclusive events.
b) A marble is drawn at random from a box containing 10 red, 30 white, 20 blue and 15
orange marbles.
Find the probability that the drawn marble is
i. orange or red
ii. not – ‘red or blue’
iii. not blue
iv. red, white or blue.
Solution:
Red marbles White marbles Blue marbles Orange marbles Total
10 30 20 15 75
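The solution above stops at the table of counts. Assuming each of the 75 marbles is equally likely to be drawn, a minimal Python sketch with exact fractions gives the four requested probabilities: 1/3, 3/5, 11/15 and 4/5.

```python
from fractions import Fraction

counts = {"red": 10, "white": 30, "blue": 20, "orange": 15}
total = sum(counts.values())  # 75 marbles


def prob(*colours: str) -> Fraction:
    """Probability that the drawn marble has one of the given colours."""
    return Fraction(sum(counts[c] for c in colours), total)


answers = {
    "i. orange or red": prob("orange", "red"),
    "ii. not (red or blue)": 1 - prob("red", "blue"),
    "iii. not blue": 1 - prob("blue"),
    "iv. red, white or blue": prob("red", "white", "blue"),
}
for part, p in answers.items():
    print(part, p)
```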
a)
No. of Petals (X)    P(X)
x1 = 3 0.05
x2 = 4 0.10
x3= 5 0.20
x4 = 6 0.30
x5 = 7 0.25
x6 = 8 0.075
x7 = 9 0.025
Total 1
Sol:
X P(X)
-2 0.1
-1 k
0 0.2
1 2k
2 0.3
3 3k
Find (i) k (ii) P(X < 2) (iii) P(X ≥ 2).
Sol
X      P(X)        P(X) with k = 0.0667
-2     0.1         0.1000
-1     k           0.0667
0      0.2         0.2000
1      2k          0.1333
2      0.3         0.3000
3      3k          0.2000
Total  0.6 + 6k    1.0000
$$\sum P(X) = 0.6 + 6k$$
As
$$\sum P(X) = 1 \;\Rightarrow\; 6k = 1 - 0.6 = 0.4 \;\Rightarrow\; k = \frac{0.4}{6} = 0.0667$$
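The solution above finds k but stops before parts (ii) and (iii). A minimal Python sketch with exact fractions solves for k and then sums the table; with k = 1/15 it gives P(X < 2) = 0.3 + 3k = 1/2 and P(X ≥ 2) = 0.3 + 3k = 1/2.

```python
from fractions import Fraction as F

# Each P(X = x) written as (constant part, coefficient of k), as in the table.
table = {
    -2: (F(1, 10), 0),
    -1: (F(0), 1),
    0: (F(2, 10), 0),
    1: (F(0), 2),
    2: (F(3, 10), 0),
    3: (F(0), 3),
}

const = sum(c for c, _ in table.values())   # 0.6
k_coef = sum(m for _, m in table.values())  # 6
k = (1 - const) / k_coef                    # from 0.6 + 6k = 1

p = {x: c + m * k for x, (c, m) in table.items()}
p_less_2 = sum(v for x, v in p.items() if x < 2)
p_at_least_2 = sum(v for x, v in p.items() if x >= 2)
print(k, p_less_2, p_at_least_2)  # 1/15 1/2 1/2
```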
Question 2: (Marks: 2+2+6=10)
Sol
= 2(4) + 5(1) = 8 + 5 = 13
$$h(0) = \frac{3}{28} + \frac{9}{28} + \frac{3}{28} = \frac{15}{28} = 0.5357$$
c) Let X and Y be two discrete r.v.'s with the following joint probability distribution:
y \ x    1       2
1        0.10    0.15
2        0.20    0.30
3        0.10    0.15
Sol
y \ x    1       2       h(y)
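The worked table above breaks off at its header. A minimal Python sketch that computes both marginal distributions from the joint table and checks independence cell by cell (f(x, y) = g(x)h(y)) is given below; `math.isclose` absorbs floating-point rounding. With these numbers g(1) = 0.4, g(2) = 0.6 and h(y) = 0.25, 0.5, 0.25, and every cell satisfies the product rule.

```python
from math import isclose

# Joint probabilities f(x, y) from the table above.
f = {(1, 1): 0.10, (2, 1): 0.15,
     (1, 2): 0.20, (2, 2): 0.30,
     (1, 3): 0.10, (2, 3): 0.15}
xs = sorted({x for x, _ in f})
ys = sorted({y for _, y in f})

g = {x: sum(f[(x, y)] for y in ys) for x in xs}  # marginal of X
h = {y: sum(f[(x, y)] for x in xs) for y in ys}  # marginal of Y
independent = all(isclose(f[(x, y)], g[x] * h[y]) for x in xs for y in ys)

print(g, h, independent)
```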
Question 3: (Marks: 10)
Find
(i) Joint Probability distribution table
(ii) Marginal probability function of X and Y,
(iii) Are X and Y independent?
Solution:
x 1 2 3
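(ii)
$$h(y) = \sum_x f(x,y) = \sum_x \frac{xy}{66} = \frac{2y}{66} + \frac{4y}{66} + \frac{5y}{66} = \frac{11y}{66} = \frac{y}{6}, \qquad y = 1, 2, 3$$
(iii)
Now
$$g(x)\,h(y) = \frac{x}{11} \times \frac{y}{6} = \frac{xy}{66} = f(x,y)$$
for all x and y, so X and Y are independent.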
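A Python check of the marginals and of independence. The value sets are an assumption: x ∈ {2, 4, 5} is read off the terms 2y, 4y, 5y in the solution, and y ∈ {1, 2, 3} is stated there.

```python
from fractions import Fraction

# Assumed supports: x from the terms 2y, 4y, 5y; y = 1, 2, 3 as stated.
X_VALUES = (2, 4, 5)
Y_VALUES = (1, 2, 3)

f = {(x, y): Fraction(x * y, 66) for x in X_VALUES for y in Y_VALUES}
g = {x: sum(f[(x, y)] for y in Y_VALUES) for x in X_VALUES}  # marginal of X
h = {y: sum(f[(x, y)] for x in X_VALUES) for y in Y_VALUES}  # marginal of Y

independent = all(f[(x, y)] == g[x] * h[y] for x in X_VALUES for y in Y_VALUES)
print(sum(f.values()), g, h, independent)
```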
a) When do we consider the Poisson distribution as the limiting form of the binomial distribution?
Solution:
The Poisson distribution is a limiting approximation to the binomial distribution when p, the probability of success, is
very small but n, the number of trials, is so large that the product np = µ is of moderate size.
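To illustrate the limit, the following Python sketch compares the two probability functions for a hypothetical case with large n and small p (n = 1000, p = 0.002, so np = 2); these values are chosen for illustration and are not from the assignment.

```python
from math import comb, exp, factorial


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    return exp(-mu) * mu**x / factorial(x)


n, p = 1000, 0.002  # hypothetical: n large, p small, np = 2 moderate
for x in range(5):
    print(x, round(binom_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))
```

The two columns agree closely, and the agreement improves as n grows with np held fixed.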
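b) The mean and standard deviation of a population are 30 and 5 respectively. The probability
distribution of the parent population is unknown. Find the mean and standard error of the
sampling distribution of X̄ when n = 50.
Solution:
Given µ = 30, σ = 5 and n = 50.
As we know,
$$\mu_{\bar{X}} = \mu = 30$$
and, assuming sampling with replacement (or from a very large population),
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{50}} = 0.7071$$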
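A short Python sketch of the two formulas, under the same assumption of sampling with replacement (or an effectively infinite population); for a finite population sampled without replacement the standard error would need the finite population correction.

```python
from math import sqrt

mu, sigma, n = 30, 5, 50

mean_of_xbar = mu       # mean of the sampling distribution of X-bar
se = sigma / sqrt(n)    # standard error, no finite population correction

print(mean_of_xbar, round(se, 4))  # 30 0.7071
```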
c) Ten vegetable cans, all of the same size, have lost their labels. It is known that 5 contain
tomatoes and 5 contain corn. If 5 are selected at random, what is the probability that all contain
tomatoes? What is the probability that 3 or more contain tomatoes?
Solution:
k = 5, N = 10, N − k = 5, n = 5
Let X denote the number of tomato cans; then X has the hypergeometric distribution
$$P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$
$$P(X = 5) = \frac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \frac{1}{252} = 0.00397$$
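The second part of the question, P(X ≥ 3), is not worked out above. A minimal Python sketch of the hypergeometric probability function gives both parts; `hypergeom_pmf` is an illustrative helper built on `math.comb`. It yields P(X = 5) = 1/252 and P(X ≥ 3) = (100 + 25 + 1)/252 = 1/2.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x: int, N: int, k: int, n: int) -> Fraction:
    """P(X = x) = C(k, x) * C(N - k, n - x) / C(N, n)."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


N, k, n = 10, 5, 5
p_all = hypergeom_pmf(5, N, k, n)
p_3_or_more = sum(hypergeom_pmf(x, N, k, n) for x in range(3, 6))

print(p_all, float(p_all))
print(p_3_or_more)  # 1/2
```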
Question 2: (Marks: 3+7=10)
Solution:
Sampling with replacement: Sampling is said to be with replacement when from a population a
sampling unit is drawn, observed and then returned to the population before another unit is
drawn.
In sampling with replacement, an element can be selected more than once.
Sampling without replacement: Sampling is said to be without replacement when from a
population a sampling unit is drawn and not returned to the population before another unit is
drawn.
b) A finite population consists of values 6, 6, 9, 15 and 18. Calculate the sample means for all
possible random samples of size n=3, that can be drawn from this population without
replacement. Make the sampling distribution of sample mean and find the mean and variance of
this distribution.
Solution:
Given data: N = 5, n = 3
No.   Sample        x̄ = Σx/n
1     6, 6, 9       7
2     6, 6, 15      9
3     6, 6, 18      10
4     6, 9, 15      10
5     6, 9, 18      11
6     6, 15, 18     13
7     6, 9, 15      10
8     6, 9, 18      11
9     6, 15, 18     13
10    9, 15, 18     14
x̄     f     f(x̄)     x̄ f(x̄)     x̄² f(x̄)
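The sampling distribution can be built in Python with `itertools.combinations`, which works on positions, so the two 6s are treated as distinct population units exactly as in the list of 10 samples. The sketch gives a mean of 54/5 = 10.8 (equal to the population mean) and a variance of 99/25 = 3.96, which agrees with σ²/n · (N − n)/(N − 1) for the population variance σ² = 23.76.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [6, 6, 9, 15, 18]
n = 3

# All C(5, 3) = 10 samples drawn without replacement, and their means.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
dist = Counter(means)  # x-bar -> frequency
total = len(means)

mean_xbar = sum(x * Fraction(c, total) for x, c in dist.items())
var_xbar = sum(x**2 * Fraction(c, total) for x, c in dist.items()) - mean_xbar**2

for x in sorted(dist):
    print(x, f"{dist[x]}/{total}")
print(mean_xbar, var_xbar)  # 54/5 99/25
```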
Question 3: (Marks: 2+2+6=10)
a) Find the value of maximum ordinate of the standard normal curve correct to four decimal
places.
Solution:
Since the standard normal probability density function is symmetric about zero, its maximum
ordinate is at Z=0
$$\varphi(0) = \frac{1}{\sqrt{2\pi}}\, e^{-(0)^2/2} = \frac{1}{2.507} = 0.3989$$
b) If Z is a standard normal variable with mean 0 and variance 1, then find the Lower quartile.
Solution:
$$P(Z < Q_1) = 0.25 \;\Rightarrow\; \Phi(Q_1) = 0.25 \;\Rightarrow\; Q_1 = \Phi^{-1}(0.25) = -0.6745$$
As we know that
$$Q_1 = \mu - 0.6745\sigma$$
Putting in the values,
$$Q_1 = 0 - 0.6745(1) = -0.6745$$
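Both results can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), whose `pdf` and `inv_cdf` methods give the density and the quantile function.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

max_ordinate = z.pdf(0)  # 1 / sqrt(2*pi), the height of the curve at Z = 0
q1 = z.inv_cdf(0.25)     # lower quartile of the standard normal

print(round(max_ordinate, 4), round(q1, 4))  # 0.3989 -0.6745
```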
c) Let X1, X2, X3 be a random sample of size 3 from a population with mean µ and variance σ², and consider
$$T_1 = \frac{X_1 + X_2 + X_3}{3}, \qquad T_2 = \frac{X_1 + 2X_2 + X_3}{4}$$
First we examine which of T1 and T2 is unbiased. If only one of them is unbiased,
we prefer it as the better estimator. If both are unbiased, we compare their
variances; the estimator with the smaller variance is preferred.
So let's first check unbiasedness. For T1
$$E(T_1) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{3}(\mu + \mu + \mu) = \frac{3\mu}{3} = \mu$$
so T1 is unbiased. And for T2
$$E(T_2) = E\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{4}(\mu + 2\mu + \mu) = \frac{4\mu}{4} = \mu$$
So T2 is also unbiased.
Since both estimators are unbiased, we now compare their variances. Because the Xi are independent,
$$\mathrm{Var}(T_1) = \frac{1}{9}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)\right] = \frac{1}{9}\left(\sigma^2 + \sigma^2 + \sigma^2\right) = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3}$$
$$\mathrm{Var}(T_2) = \frac{1}{16}\left[\mathrm{Var}(X_1) + 4\,\mathrm{Var}(X_2) + \mathrm{Var}(X_3)\right] = \frac{1}{16}\left(\sigma^2 + 4\sigma^2 + \sigma^2\right) = \frac{6\sigma^2}{16} = \frac{3\sigma^2}{8}$$
Since 1/3 < 3/8,
$$\mathrm{Var}(T_1) < \mathrm{Var}(T_2)$$
Hence, since T1 is unbiased and has the smaller variance, T1 is a better estimator than T2.
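For any linear estimator T = Σ wᵢXᵢ built from independent observations with common mean µ and variance σ², E(T) = µ·Σwᵢ and Var(T) = σ²·Σwᵢ². A minimal Python sketch of this check (the function name is illustrative) reproduces the comparison above.

```python
from fractions import Fraction as F


def mean_and_variance_coefficients(weights):
    """For T = sum(w_i * X_i) with iid X_i: E(T) = mu * sum(w), Var(T) = sigma^2 * sum(w^2)."""
    return sum(weights), sum(w * w for w in weights)


t1 = [F(1, 3), F(1, 3), F(1, 3)]
t2 = [F(1, 4), F(2, 4), F(1, 4)]

print(mean_and_variance_coefficients(t1))  # (Fraction(1, 1), Fraction(1, 3))
print(mean_and_variance_coefficients(t2))  # (Fraction(1, 1), Fraction(3, 8))
```

A weight sum of 1 means the estimator is unbiased; the smaller sum of squared weights identifies the more efficient estimator.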
Assignment 6
Question 1: (Marks: 5x2=10)
Solution:
(i) If H1: θ < θ0, the test is a left-tailed test, and the rejection region is located in the left tail
of the distribution.
(ii) If H1: θ > θ0, the test is a right-tailed test, and the rejection region is located in the right tail
of the distribution.
(iii) If H1: θ ≠ θ0, the test is a two-tailed test, and the rejection region is divided equally between
both tails of the distribution.
2) If α = 0.10, how many intervals would be expected to contain µ ?
Solution:
We would expect about 90% of all such confidence intervals to contain µ and 10% to miss µ in
repeated sampling.
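The repeated-sampling interpretation can be illustrated by simulation. This Python sketch assumes a normal population with known σ (the parameter values are hypothetical), builds many 90% z-intervals, and reports the fraction that contain µ, which should be close to 0.90.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=0.0, sigma=1.0, n=25, alpha=0.10, trials=2000, seed=1):
    """Fraction of simulated (1 - alpha) z-intervals that contain mu (sigma known)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for alpha = 0.10
    half_width = z * sigma / n**0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += (xbar - half_width <= mu <= xbar + half_width)
    return hits / trials


print(coverage())  # close to 0.90
```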
3) What role does the sample mean play in a two-sided confidence interval for µ based
on a random sample from a normal distribution?
Solution:
The sample mean is the mid point of the confidence interval but has no effect on the length of the
confidence interval.
Solution:
5) If an automobile is driven on the average no more than 16000 Km per year, then
formulate the null and alternative hypothesis.
Solution:
H 0 : µ ≤ 16000 km
H1 : µ > 16000km
Question 2: (Marks: 2+2+6=10)
a) The average yield of corn of variety A exceeds the average yield of variety B by at least
200 Kg per acre, formulate null and alternative hypothesis.
Solution:
H 0 : µ A − µ B ≥ 200 kg
H1 : µ A − µ B < 200 kg
Solution:
If the alternative hypothesis does not specify a direction (i.e. H1: µ ≠ µ0), we use a two-tailed test. If
the alternative hypothesis specifies a direction (H1: µ > µ0 or H1: µ < µ0), we use a one-tailed
test.
c) In a poll of college students in a large university, 300 of 400 students living in student
residences (hostels) approved a certain course of action, whereas 200 of 300 students not
living in student residences approved it. Compute the 90% confidence interval for the
difference of proportions.
Solution:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
or
$$0.08 \pm (1.645)\sqrt{\frac{(0.75)(0.25)}{400} + \frac{(0.67)(0.33)}{300}}$$
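The solution above stops before the final interval. A minimal Python sketch of the large-sample interval is given below; the helper name is illustrative. It uses the unrounded p̂2 = 2/3 and the exact z quantile, so its end points (roughly 0.026 to 0.141) can differ slightly in the third decimal from a hand calculation with 0.67 and 1.645.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_ci(x1, n1, x2, n2, conf=0.90):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.645 for 90%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


lo, hi = diff_prop_ci(300, 400, 200, 300)
print(round(lo, 3), round(hi, 3))
```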
Question 3: (Marks: 5+5=10)
a) The Punjab Highway Department is studying the traffic pattern on the G.T. Road near
Lahore. As part of the study, the department needs to estimate the average number of
vehicles that pass the Ravi Bridge each day. A random sample of 65 days gives x̄ = 5010
and s = 650. Find the 90 percent confidence interval estimate for µ, the average number
of vehicles per day.
Solution:
$$\alpha = 0.1, \qquad \frac{\alpha}{2} = 0.05, \qquad Z_{\alpha/2} = Z_{0.05} = 1.645$$
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 5010 \pm 1.645\,\frac{650}{\sqrt{65}}$$
or 5010 ± 132.62
or 4877.38 to 5142.62
or, rounding the above two figures to the nearest whole number, we have:
4877 to 5143
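The interval arithmetic, including the final rounding, can be checked with a short Python sketch (`z_interval` is an illustrative name; z = 1.645 as in the solution).

```python
from math import sqrt


def z_interval(x_bar: float, s: float, n: int, z: float):
    """Large-sample confidence interval x_bar ± z * s / sqrt(n)."""
    half = z * s / sqrt(n)
    return x_bar - half, x_bar + half


lo, hi = z_interval(5010, 650, 65, z=1.645)
print(round(lo, 2), round(hi, 2))  # 4877.38 5142.62
print(round(lo), round(hi))        # 4877 5143
```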
b) Mr. Ali wants to run in the election for City Government. After a strong election campaign, Mr.
Ali’s staff conducts their own poll over the weekend prior to the election. The results
show that for a random sample of 500 voters 290 will vote for Mr. Ali. Develop a 95
percent confidence interval for the population proportion who will vote for Mr. Ali
using α = 0.05 .
Solution:
With p̂ = 290/500 = 0.58,
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.58 \pm 1.96\sqrt{\frac{0.58(1-0.58)}{500}} = 0.58 \pm 0.043 = (0.537,\ 0.623)$$
The end points of the confidence interval are 0.537 and 0.623. The lower end point of the confidence
interval is greater than 0.50, so we conclude that the proportion of voters in the population
supporting Mr. Ali is greater than 50 percent. Based on the polling results, he is expected to win
the election.
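A minimal Python sketch of the large-sample interval for a single proportion, with z = 1.96 for 95% confidence as in the solution; the helper name is illustrative.

```python
from math import sqrt


def prop_interval(successes: int, n: int, z: float = 1.96):
    """Large-sample CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


lo, hi = prop_interval(290, 500)
print(round(lo, 3), round(hi, 3))  # 0.537 0.623
```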