STATISTICS FINAL EXAM (MPM) Answer Sheet
Instructor’s Name ✓
Hussen
2 You can work on the exam for 48 hrs. Please submit the exam answer file, typed in a Word document
or handwritten, to your course instructor at 2:30 AM local time on Saturday, January
28, 2023, at the CPU-3 instructor's office. Unequivocally no postponements: late submissions are totally
unacceptable. The only option for submission is hard copy; problems will not be accepted
for any case.
4 Once exam answer file is submitted for grading, no requests for amendments or supplements will be
permitted.
Answer sheet– Write your answers below
CASE ONE:
#1: State briefly the relative importance of sampling over complete enumeration. ( 5 pts)
Sampling theory provides the tools and techniques for data collection, keeping in mind the
objectives to be fulfilled and the nature of the population.
1. Sample surveys
Census: The complete count of the population is called a census. In a census, observations are
collected on all the sampling units in the population. For example, in India the census is
conducted every tenth year, and observations on all the persons staying in India are collected.
Sample: One or more sampling units are selected from the population according to some specified
procedure. Such a collection of units, which consists of only a portion of the population units,
is called the sample.
Sampling involves the collection of data on a smaller number of units in comparison to the
complete enumeration, so the cost involved in the collection of information is reduced. Further,
additional information can be obtained at little cost in comparison to conducting another separate
survey. For example, when an interviewer is collecting information on health conditions, then
he/she can also ask some questions on health practices. This will provide additional information
on health practices, and the cost involved will be much less than conducting an entirely new survey
on health practices.
2. Organization of work:
It is easier to manage the organization of a collection of a smaller number of units than all the units
in a census. For example, in order to draw a representative sample from a state, it is easier to
manage to draw small samples from every city than drawing the sample from the whole state at a
time. This ultimately results in more accuracy in the statistical inferences because the better
organization provides better data and in turn, improved statistical inferences are obtained.
3. Greater accuracy:
The persons involved in the collection of data are trained personnel. They can collect the data more
accurately if they have to cover a smaller number of units rather than a large number of units.
4. Speed:
The data from a sample can be quickly summarized. For example, crop production can be forecast
more quickly from a sample of data than by first collecting all the observations.
5. Feasibility:
Conducting the experiment on a smaller number of units, particularly when the units are destroyed,
is more feasible. For example, in determining the life of bulbs, it is more feasible to fuse a
minimum number of bulbs. Similarly, in any medical experiment, it is more feasible to use fewer
animals.
CASE TWO:
Briefly discuss the purpose and meaning of the different stages of Statistical investigation. (5 pts)
1. Collection of data: the process of measuring, gathering and assembling the raw data upon
which the statistical investigation is to be based. It covers the methods that are to be
employed for obtaining the required information from the units under investigation.
Importance
i. Low cost and universal
ii. Free from biases.
iii. Respondents have adequate time to respond
iv. Fairly approachable
2. Organization of data: Summarization of data in some meaningful way. Data organization
is the practice of categorizing and classifying data to make it more usable. Similar to a file
folder, where we keep important documents, you'll need to arrange your data in the most
logical and orderly fashion, so that you, and anyone else who accesses it, can easily find
what you are looking for.
4. Analysis of data: The process of extracting relevant information from the summarized
data, mainly using elementary mathematical operations. It is the process of systematically
applying statistical and/or logical techniques to describe and illustrate, condense and recap,
and evaluate data.
Data Analysis is essential as it helps businesses understand their customers better, improves sales,
improves customer targeting, reduces costs, and allows for the creation of better problem-solving
strategies.
5. Inference of data: The interpretation of the various statistical measures obtained from
the analysis of the data, using methods by which conclusions are formed and inferences
are made. Statistical inference is the process of drawing conclusions about an underlying
population based on a sample or subset of the data. In most cases, it is not practical to
obtain all the measurements in a population.
Inferential statistics is important for examining the data properly. To reach an accurate conclusion,
proper data analysis is needed to interpret the research results. It is mainly used to predict
future observations in different fields.
CASE THREE:
CPU College has registered 12,000 students for the last four years. The college administration
would like to know the number of students who have participated in co-curricular activities. For
the purpose of the study, the administrator collected the names of 400 students from the files by
taking proportional number of students from each of the years (batches) for interview. (5 pts)
Based on the above information, find
a. The variable of interest
b. The source of data (primary or secondary)
c. The population
d. The sample
e. The sampling technique used
#3:
a. The variable of interest
Participation in co-curricular activities (the number of students who have participated in
co-curricular activities).
b. The source of data
Secondary data, because the administrator of CPU College collected the names of the students from
the files and not directly from the students themselves.
c. The population
All the students registered at CPU College in the last four years, i.e. 12,000
students are the population.
d. The sample
The students whose names the college administrator collected from the files, taking a
proportional number of students from each of the years (batches) for interview,
i.e. 400 students are the sample.
e. The sampling technique used
The sampling technique is stratified sampling, because the population is heterogeneous and is first
divided into groups (strata) according to batches, and a proportional number of students is then
taken from each batch.
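The proportional allocation described in this answer can be sketched in a few lines of Python. This is only an illustration: the exam gives the population size (12,000) and the sample size (400) but not the size of each batch, so the batch sizes below are hypothetical values chosen to sum to 12,000, and the function name `proportional_allocation` is an assumption of this sketch.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Give each stratum a share of the sample proportional to its size."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

# Hypothetical batch sizes (NOT given in the exam); they only need to sum to 12,000.
batches = {"Year 1": 3600, "Year 2": 3300, "Year 3": 2700, "Year 4": 2400}

allocation = proportional_allocation(batches, 400)
print(allocation)                # students to interview from each batch
print(sum(allocation.values()))  # total sample size
```

Every batch is sampled at the same rate, 400/12,000 = 1/30, which is what "proportional" means here. With batch sizes that do not divide evenly, the rounded shares may need a small adjustment so that they still add up to 400.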
CASE FOUR:
Suppose one box contains 5 black and 3 white balls and a second box contains 4 black and 6 white
balls if one ball is drawn from each box, what is the probability that…………(3 pts)
a) Both are black.
The probability that both are black
P(both are black)= 5/8*4/10
=1/4
b) Both are white.
The probability that both are white
P(both are white)= 3/8*6/10
=9/40
c) One is white and one is black.
The first box contains 8 balls, so
P1(B)=5/8
P1(W)=3/8
The second box contains 10 balls, so
P2(B)=4/10=2/5
P2(W)=6/10=3/5
One white and one black ball can be drawn in two ways: a white ball from the first box and a
black ball from the second box, P(W1 ∩ B2), or a black ball from the first box and a white ball
from the second box, P(B1 ∩ W2).
The two draws are independent and the two cases are mutually exclusive, so
P(W ∩ B) = P(W1 ∩ B2) + P(B1 ∩ W2)
= P(W1)P(B2) + P(B1)P(W2)
=3/8*4/10+5/8*6/10
=3/20+3/8
=(6+15)/40
=21/40
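The arithmetic in this case can be checked with exact fractions. The sketch below assumes, as the working does, that the two draws are independent and that each ball in a box is equally likely to be drawn; the variable names are illustrative, and "both black" is taken to be the event behind the 1/4 result.

```python
from fractions import Fraction

# Box 1: 5 black, 3 white. Box 2: 4 black, 6 white.
p_b1, p_w1 = Fraction(5, 8), Fraction(3, 8)
p_b2, p_w2 = Fraction(4, 10), Fraction(6, 10)

both_black = p_b1 * p_b2
both_white = p_w1 * p_w2
# One of each colour: white from box 1 and black from box 2, or the reverse.
one_each = p_w1 * p_b2 + p_b1 * p_w2

print(both_black, both_white, one_each)    # 1/4 9/40 21/40
print(both_black + both_white + one_each)  # 1
```

The three events cover every possible pair of draws, so their probabilities add up to 1, which is a quick consistency check on the hand calculation.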
CASE FIVE:
#5: The frequency distribution of the hourly wage rate of 60 employees of a paper mill is
as follows: (5 pts)
Wage rate (Rs.) 54-56 56-58 58-60 60-62 62-64
Number of workers 10 10 20 10 10
Calculate the
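a. Mean
$$\bar{x} = \frac{\sum f x}{N}$$
Find the class mark of each class:
$$x = \frac{\text{upper limit} + \text{lower limit}}{2}$$
The class marks are 55, 57, 59, 61 and 63, so
$$\sum f x = 10(55) + 10(57) + 20(59) + 10(61) + 10(63) = 3540$$
$$\bar{x} = \frac{\sum f x}{N} = \frac{3540}{60} = 59$$
Mean=59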
b. Range
c. Variance and standard deviation
$$s^2 = \frac{\sum f (x - \bar{x})^2}{N - 1} = \frac{400}{60 - 1} = 6.78$$
$$s = \sqrt{s^2} = \sqrt{6.78} = 2.60$$
Variance = 6.78 and standard deviation = 2.60
d. Median
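N/2 = 60/2 = 30
The cumulative frequencies are 10, 20, 40, 50 and 60. The first cumulative frequency greater than
or equal to 30 is 40, so the median class is 58-60, with lower limit L = 58, frequency f = 20,
class size W = 2 and cumulative frequency before the class cf = 20.
$$\text{Median} = L + \frac{N/2 - cf}{f} \times W = 58 + \frac{30 - 20}{20} \times 2 = 59$$
Median = 59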
e. Mode
Solution
Wage rate (Rs.)    Number of workers (frequency f)
54-56              10
56-58              10
58-60              20
60-62              10
62-64              10
58-60 is the modal class (highest frequency).
$$\text{Mode} = L_m + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times W$$
Lm = lower limit of the modal class = 58
W = class size = 60-58 = 2
Δ1 = the difference between the frequency of the modal class and the frequency of the class before it
= 20-10 = 10
Δ2 = the difference between the frequency of the modal class and the frequency of the class after it
= 20-10 = 10
$$\text{Mode} = 58 + \frac{10}{10 + 10} \times 2 = 59$$
Mode = 59
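The grouped-data formulas used in this case can be checked with a short Python sketch. It assumes continuous classes with the class mark as the representative value of each class, the sample variance with divisor N - 1, and the usual interpolation formulas for the median and mode; all variable names are illustrative.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each wage class
classes = [(54, 56, 10), (56, 58, 10), (58, 60, 20), (60, 62, 10), (62, 64, 10)]

freqs = [f for _, _, f in classes]
marks = [(lo + hi) / 2 for lo, hi, _ in classes]  # class marks
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, marks)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
std_dev = sqrt(variance)

# Median: interpolate inside the first class whose cumulative frequency reaches n/2.
cum = 0
for lo, hi, f in classes:
    if cum + f >= n / 2:
        median = lo + (n / 2 - cum) / f * (hi - lo)
        break
    cum += f

# Mode: interpolate inside the class with the highest frequency.
i = freqs.index(max(freqs))
lo, hi, f = classes[i]
d1 = f - (freqs[i - 1] if i > 0 else 0)
d2 = f - (freqs[i + 1] if i < len(freqs) - 1 else 0)
mode = lo + d1 / (d1 + d2) * (hi - lo)

print(mean, round(variance, 2), round(std_dev, 2), median, mode)
```

With the table above this gives a mean of 59, a sample variance of about 6.78 and a standard deviation of about 2.60. Because the distribution is symmetric about 59, the median and mode formulas also return 59.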
CASE SIX:
Suppose that a couple will have three children. Let B denote a boy and G denote a girl. List
the sample space outcomes and probability that correspond to each of the following events. (8
pts)
#6:
i. Sample space
S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}
[Tree diagram: each of the three births branches into B or G, giving the 8 outcomes listed above]
ii. Of the 8 outcomes in the sample space, three have exactly two girls (GGB, GBG
and BGG).
P(exactly two girls) = 3/8
iii. There are three ways to have exactly one girl child (BBG, BGB, GBB).
P(exactly one girl child) = 3/8
iv. There is only one way to have no girl child (BBB).
P(no girl child) = 1/8
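The sample space and the three probabilities can be reproduced by enumeration. The sketch assumes, as the answer does, that all 8 outcomes are equally likely; the helper `event` is a hypothetical name used only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes for three children, e.g. "BGB" = boy, girl, boy.
sample_space = ["".join(outcome) for outcome in product("BG", repeat=3)]

def event(condition):
    """Return the outcomes satisfying `condition` and their probability."""
    favourable = [o for o in sample_space if condition(o)]
    return favourable, Fraction(len(favourable), len(sample_space))

two_girls = event(lambda o: o.count("G") == 2)
one_girl = event(lambda o: o.count("G") == 1)
no_girl = event(lambda o: o.count("G") == 0)

print(sample_space)
print(two_girls, one_girl, no_girl)
```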
CASE SEVEN:
Suppose that the CPU college dean of students has generally assumed that the average age of a
student is no more than 20 years. However, lately the students have appeared to be somewhat older
than before, and the office believes that the average age now might be older. Suppose that 50
students are chosen from enrolment records randomly and the means found to be 20.76. If the
population standard deviation for the ages of these university students is 3.6 years, perform a
hypothesis test at 𝛼 = 0.05………..(8 pts)
a. State the hypothesis
b. State the decision rule
c. Compute the value of the test statistic (in this case the Z-value)
d. Accept or reject H0
#7:
a.
i. Null hypothesis H0: µ = 20 years
ii. Alternative hypothesis H1: µ > 20 years
b. The test is right-tailed. Reject the null hypothesis if the calculated value of Z
(Zcal) is greater than the tabulated value of Z (Ztab) at α = 0.05; do not reject the null
hypothesis if Zcal is less than or equal to Ztab at α = 0.05.
Hence, at α = 0.05:
Ztab = 1.65 (for a right-tailed test only the positive critical value applies)
If Zcal > 1.65, reject H0
If Zcal ≤ 1.65, do not reject H0
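c. Test statistic
$$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{20.76 - 20}{3.6/\sqrt{50}} = \frac{0.76}{3.6/7.1} = \frac{0.76}{0.51} = 1.49$$
d. Accept or reject
In a right-tailed test, the null hypothesis is not rejected when Zcal is less than Ztab.
Here Zcal = 1.49 is less than Ztab = 1.65, so the null hypothesis is accepted: at α = 0.05 the
sample does not give sufficient evidence that the average age is more than 20 years.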
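The test can be reproduced with Python's standard library as a rough check. The sketch assumes a one-sample, right-tailed Z test with known population standard deviation; it takes the critical value from `statistics.NormalDist` (about 1.645) rather than the rounded table value 1.65, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 20       # hypothesised mean age under H0
x_bar = 20.76  # sample mean
sigma = 3.6    # population standard deviation
n = 50         # sample size
alpha = 0.05   # significance level

z_cal = (x_bar - mu0) / (sigma / sqrt(n))
z_tab = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value

reject_h0 = z_cal > z_tab
print(round(z_cal, 2), round(z_tab, 3), reject_h0)  # 1.49 1.645 False
```

Since the calculated value does not exceed the critical value, the sketch reaches the same decision as the hand calculation: H0 is not rejected.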
CASE EIGHT:
Write the difference between random and non-random sampling techniques and list different
sampling techniques under each category?
#8:
There are mainly two methods of sampling which are random and non-random sampling.
Random sampling is a sampling technique in which the probability of choosing
each sample is equal.
A sample that is chosen randomly is an unbiased representation of the total population. If
the sample chosen does not represent the population, the result is sampling error.
Non-random sampling is a sampling technique where the sample selection is based on factors
other than just random chance. In other words, non-random sampling is biased in nature.
Here, the sample will be selected based on the convenience, experience or judgment of the
researcher.
Following are some of the points of difference between random sampling and non-random
sampling.
| Random sampling | Non-random sampling |
|---|---|
| Random sampling is a sampling technique where each sample has an equal probability of getting selected | Non-random sampling is a sampling technique where the sample selected will be based on factors such as convenience, judgement and experience of the researcher and not on probability |
| Random sampling is unbiased in nature | Non-random sampling is biased in nature |
| Based on probability | Based on other factors such as convenience, judgement and experience of the researcher, but not on probability |
| Random sampling is representative of the entire population | Non-random sampling lacks representation of the entire population |
| Zero probability of selection never occurs | Zero probability of selection can occur |
| Random sampling is the simplest sampling technique | Non-random sampling is a somewhat complex sampling technique |
The commonly used probability (random) sampling methods include the following.
✓ Simple random sampling.
✓ Systematic sampling.
✓ Stratified sampling.
✓ Cluster sampling.
The commonly used non-probability sampling methods include the following.
✓ Convenience or haphazard sampling.
✓ Volunteer sampling.
✓ Judgement sampling.
✓ Quota sampling.
✓ Snowball or network sampling.
✓ Crowdsourcing.
✓ Web panels.
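To make the idea of selection by chance concrete, two of the probability sampling methods named in this answer can be sketched in Python. This is a minimal illustration, not a survey-grade implementation: the population of 100 numbered units is hypothetical, the function names are assumptions of this sketch, and the systematic version assumes the sample size divides the population size reasonably evenly.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Simple random sampling: every subset of size n is equally likely."""
    return random.Random(seed).sample(units, n)

def systematic_sample(units, n, seed=None):
    """Systematic sampling: random start, then every k-th unit (k = N // n)."""
    k = len(units) // n
    start = random.Random(seed).randrange(k)
    return [units[start + i * k] for i in range(n)]

# Hypothetical sampling frame of 100 numbered units (not from the exam).
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, seed=1)
print(srs)
print(sys_sample)
```

In both functions the researcher's convenience or judgement plays no part in which units are selected, which is the property that separates these methods from the non-random methods listed above.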
CASE NINE:
A researcher wishes to estimate the number of days it takes an automobile dealer to sell a
Chevrolet Aveo. A random sample of 50 cars had a mean time on the dealer’s lot of 54 days.
Assume the population standard deviation to be 6.0 days. Find the best point estimate of the
population mean and the 95% confidence interval of the population mean. (6 points)
#9:
Given
x̄ = mean of the sample = 54 days
σ = standard deviation of the population = 6 days
n = sample size = 50
Confidence level = 95%
Required
a) The best point estimate of µ, the mean of the population
b) The 95% confidence interval of the population mean
Solution:
a) The mean x̄ of a sample can be used to estimate the mean µ of the population.
The mean of a particular sample will not, in general, exactly equal the population mean, but the
sample mean is still the best point estimate of it. Hence the best point
estimate of the population mean µ is 54 days.
b) The 95% confidence interval of the population mean
Formula:
$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
or
$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
where x̄ = sample mean = 54
µ = population mean
z_{α/2} = critical value of the standard normal distribution
σ = standard deviation of the population = 6
n = sample size = 50
For a 95% confidence interval, α = 0.05 and z_{α/2} = 1.96.
Then
$$54 - 1.96\left(\frac{6}{\sqrt{50}}\right) < \mu < 54 + 1.96\left(\frac{6}{\sqrt{50}}\right)$$
$$54 - 1.96\left(\frac{6}{7.1}\right) < \mu < 54 + 1.96\left(\frac{6}{7.1}\right)$$
$$54 - 1.96(0.85) < \mu < 54 + 1.96(0.85)$$
$$54 - 1.67 < \mu < 54 + 1.67$$
$$52.33 < \mu < 55.67$$
Or 54±1.67
Hence we can be 95% confident that the interval between 52.33 and 55.67 days contains the
population mean.
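The interval can be recomputed without intermediate rounding as a check. The sketch below assumes a Z interval with known population standard deviation and takes the critical value from `statistics.NormalDist` instead of the table value 1.96; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 54    # sample mean (days); also the point estimate of the population mean
sigma = 6.0   # population standard deviation (days)
n = 50        # sample size
level = 0.95  # confidence level

z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value, about 1.96
margin = z * sigma / sqrt(n)                   # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```

Without rounding 6/√50 to 0.85, the margin of error comes out at about 1.66 rather than 1.67, so the limits differ from the hand calculation only in the second decimal place.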