Chapter 4.3 ZICA

This document discusses sampling methods and statistical inference. It describes different sampling techniques including simple random sampling, systematic random sampling, stratified random sampling, and cluster sampling. It also covers non-probability sampling methods like convenience sampling and judgment sampling. The document then discusses statistical inference, dividing it into estimation and hypothesis testing. It lists four key properties of good estimators: unbiasedness, consistency, efficiency, and sufficiency.

Uploaded by

Dixie Cheelo

4.3 Methods of Sampling and Survey Design

Sampling is the process of examining a representative number of items (people or
things) out of the whole population or universe. The reason for sampling is to
gain an understanding of some feature or attribute of the whole population,
based on the measures of the sample.

Finding a good sample is often not easy. For example, if we sample the fruit at the
top of a basket, we may have no idea whether there is bad fruit in the middle or at
the bottom of the basket. If we are studying college students’ attitudes and we
interview students on the steps of the School of Technology, we may never
encounter a business student. A sample only provides an estimate of a population
measure, and the accuracy of the estimate will depend on:

a) The size of the sample: the larger the sample size, the greater the
probability that the sample is representative of the population.

b) Selecting the right sampling method so that the sample represents the
population.

c) The extent of variability in the population.

Why do we sample?

1) Time and cost are probably the two most important reasons for sampling.
If one wants to determine if customers in a given market will buy a
product, one doesn’t usually have the time or funds to interview all
potential customers. For example, try interviewing everyone who uses
Protex in Zambia.

2) Testing may prove destructive. If you want to test the durability of your
product, stress tests on the entire output may leave you with no product to
sell.

3) Accuracy is another reason for sampling. One would think that a study of
the entire population would be more accurate than a study of a sample, but
this is not always so. If one has a very large population and wishes to take
a complete count, then one must hire a large number of inventory takers who
must work at a rapid rate. As more personnel are hired, they are likely to be
less efficient than the original employees. Thus, a limited number of skilled
workers, studying a well-defined sample, may provide more accurate
results than a survey of the entire population.

Sampling Designs

Probability Samples

Taking a sample to use for the study of a population means that one must make
sure that the sample really represents the population. The first problem is to
decide which population should be sampled. We need a sampling
frame. A sampling frame is a list of every item or member of the population to
be sampled. Individual items should be selected at random, i.e. a random
sampling procedure that yields a representative sample requires that everyone in
the population has an equal chance of being chosen.

Simple Random Sample

Choosing people from a list may be done using a table of random numbers. If we
were interested in sampling from a group of 10 people, for example, we could number
10 pieces of paper from 0 through 9 and put them in a container. Then we could
mix the pieces of paper and draw out a number. This is essentially what is
done when a random number table is constructed. Such a table uses numbers of
one, two, three, four or more digits, each picked at random by a computer.
These numbers are then recorded in a table such as Table 1 in the Appendix.

Table 1 shows a list of randomly selected numbers. Let us say that we wanted to
select a sample of 20 people from a list of 500. First we number everyone on the
list from 1 to 500. Next we decide which number to use as the start of
our sample.

Assume we close our eyes and put our pencil down upon a number, say 62463,
the sixth number in the third column. Now, assume that we continue reading down
this column from this point on and then start at the top of the next column. We
discard numbers that are too large for our list (that is, only three-digit numbers
less than or equal to 500 are retained). We could just as easily read across the
rows. Any procedure is acceptable as long as we follow it consistently.

Picking 20 numbers in this way, we obtain 022, 077, 065, 484, 140, 135, 239, 074,
037, 275, 474, 075, 145, 401, 264, 076, 449, 374, 230 and 041. Note that if a
number has already been selected, we discard it. We now look up these numbers
on our numbered list and interview the appropriate persons.

Nowadays, many hand calculators have random number programs that can be
used to generate a set of random numbers. The method described above is
called simple random sampling.
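
The table-based procedure above can also be sketched in code. The following is a minimal illustration, assuming Python's standard `random` module stands in for the random number table; the function name `simple_random_sample` and the seed value are hypothetical choices made only for this sketch.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return distinct list positions in 1..population_size, each equally likely."""
    rng = random.Random(seed)  # the seed only makes the sketch reproducible
    # random.sample draws without replacement, so a number that was
    # selected previously can never be drawn again.
    return rng.sample(range(1, population_size + 1), sample_size)

# Select 20 people from a numbered list of 500, as in the text.
chosen = simple_random_sample(500, 20, seed=1)
print(sorted(chosen))
```

Drawing without replacement plays the same role as discarding repeated numbers when reading the table.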

Systematic Random Sampling

This method consists of starting with a random number from the random number table
and counting down to this number on our list, then selecting additional units at evenly
spaced intervals (every kth population unit, k > 1) until the sample has been formed. If
the list follows some regular pattern, this method should not be used, because the
pattern might cause bias.
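
A minimal sketch of systematic selection follows. It assumes the interval k is taken as the population size divided by the sample size and that the random start falls within the first k units; both are common conventions rather than details stated in the text, and the function name is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start, then every k-th unit after it."""
    rng = random.Random(seed)
    k = population_size // sample_size   # sampling interval (assumed convention)
    start = rng.randint(1, k)            # random start within the first interval
    return [start + i * k for i in range(sample_size)]

# 20 units from a list of 500 gives an interval of k = 25.
positions = systematic_sample(500, 20, seed=3)
print(positions)
```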

Stratified Random Sampling

Another random sampling technique is called stratified random sampling, in which we
divide our population into groups and take a selected sample size in each of the groups.
This technique is used for two reasons:

i) it can lead to reduced sampling error; and

ii) it ensures a large enough sample in each stratum (class) for study of the particular
strata.

As an example, we may be planning a survey to determine the profitability of offering a
refuse collection service in a town. Recognizing that wealthy home owners may differ
from other home owners, we might stratify the population of homes on the basis of
value, forming three strata: homes valued at K30,000,000 and less, homes valued at
more than K30,000,000 but less than K100,000,000, and homes valued at K100,000,000
and more. From each stratum, we then take a random sample of homes.
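
The idea of sampling separately within each stratum can be sketched as follows. The stratum names, the home identifiers and the per-stratum sample sizes are all invented for illustration; none of them come from the text.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical frame: home IDs grouped into three value bands.
homes = {
    "low": list(range(0, 600)),
    "middle": list(range(600, 900)),
    "high": list(range(900, 1000)),
}
sizes = {"low": 30, "middle": 15, "high": 5}  # assumed sample size per stratum

sample = stratified_sample(homes, sizes, seed=7)
print({name: len(units) for name, units in sample.items()})
```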

Cluster Sampling

Clusters are identified in the population, a set of clusters is randomly sampled, and a
complete census is taken of each selected cluster to form the sample. Clustering tends
to decrease costs but increase sampling error for the same sample size, because people
who live close together are more likely to be similar to each other than to others. For
example, a few geographical areas (perhaps a township or a road in a town) are selected
at random and every single household in each selected area is included in the sample.

In general, population units may be stratified on any number of characteristics. Common
variables used for stratifying in ‘people’ surveys are age, income, and location of
residence.

Non Probability Samples

These samples do not rely on the laws of probability for selection, but depend on the
judgment of interviewers or their supervisors.

Convenience samples consist of studies of people who happen to be available or who call
in their results. Suppose we are interested in conducting a study of the attitudes of
shoppers at a new shopping center on the kinds of stores in the center, the attractiveness
of the center, parking difficulties, and so on. To collect sample information, we ask
persons who happen to walk past the central area of the center to participate in the
survey. The sample in this instance is a convenience sample: the people are not being
selected according to a probability plan and, presumably, judgment is not being used in
selecting those to participate in the survey. Convenience samples are prone to bias by
their very nature: selecting population elements that are convenient to choose almost
always makes them special or different from the rest of the elements in the population in
some way.

Judgment Sampling

Judgment sampling is done by an expert who is familiar with the population measures.
The expert selects the units from the population. The quality of a judgment sample
depends on the competence of the expert who selects the population units to be sampled.

Quota Sampling

Quota sampling attempts to ensure that the sample represents the characteristics of the
population. The interviewer is free to select anyone who meets the given specifications,
and may choose in a non-random fashion. We cannot make a good estimate of
sampling error because we have not used a random sampling procedure. The method is
cheap and reasonably effective and in consequence is widely used.

4.4 Statistical Inferences

Introduction

Statistical inference can be divided into two parts, namely estimation and
hypothesis testing. We deal first with estimation, which is the procedure,
rule or formula used to estimate a population characteristic (parameter).
Sample measures (or statistics) are used to estimate population measures, such as
the population mean $\mu$ (the Greek letter ‘mu’) and the population variance
$\sigma^2$ (the Greek letter ‘sigma’, squared). The corresponding sample measures
are the sample mean $\bar{x}$, pronounced ‘x-bar’, and the sample variance $S^2$
respectively.

Hypothesis testing is the process of establishing a theory or hypothesis about some
characteristic of the population and then drawing information from a sample to see
whether the hypothesis is supported or not.

Estimation

Properties of good Estimators

There are four properties of a good estimator:

a) Unbiasedness. An estimator is said to be unbiased if the mean of the sample
means $\bar{x}$ of all possible random samples of size n, drawn from a
population of size N, equals the population mean $\mu$. Therefore the
mean of the distribution of the sample means will be the same as the
population mean.

b) Consistency. An estimator is said to be consistent if, as the sample size
increases, the accuracy of the estimate of the population parameter also
increases.

c) Efficiency. An estimator is said to be more efficient than any other if it has
the smallest variance among all the estimators.

d) Sufficiency. An estimator is said to be sufficient if it utilizes all the
information in the sample about the population parameter being estimated.

In practical situations, it is often not possible to have all four qualities in one
estimator. The researcher chooses which qualities he/she wants the estimator to
have.

Distribution of Sample Means

Consider a population with mean $\mu$ and standard deviation $\sigma$. Samples of size n
are taken from this population and the sample means $\bar{x}$ are found. The distribution
of sample means has mean $\mu_{\bar{x}} = \mu$ and standard deviation (standard error)

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

If the population is normally distributed, or if the sample size is ‘large’ (i.e. n > 30),
then the distribution of sample means is approximately normal.

Example 1

A normal population has a mean of 500 and a standard deviation of 125. Find the
probability that a sample of 65 values has a mean greater than 538.

We have $\mu = 500$ and $\sigma = 125$. The distribution of sample means has mean
$\mu_{\bar{x}} = 500$ and standard error

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{125}{\sqrt{65}} = 15.50$$

[Figure: normal curve of sample means centred at $\mu_{\bar{x}} = 500$ with $\sigma_{\bar{x}} = 15.50$; the area to the right of 538 is shaded.]

$$Z = \frac{538 - 500}{15.50} = 2.45$$

Therefore the probability that a sample mean is greater than 538 is equal to the shaded
area = 0.5000 − 0.4929 = 0.0071 from the tables.
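
The calculation in Example 1 can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library in place of the printed normal tables; the variable names are chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 125, 65

standard_error = sigma / sqrt(n)        # sigma / sqrt(n)
z = (538 - mu) / standard_error         # standardised distance of 538 from the mean
p_greater = 1 - NormalDist().cdf(z)     # upper-tail area beyond z

print(round(standard_error, 2), round(z, 2), round(p_greater, 4))
```

The result agrees with the table-based answer to about four decimal places; small differences arise because the tables use Z rounded to two decimals.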

Example 2

The daily output from a production line has a mean of 7500 units with a standard
deviation of 500 units. What is the probability that during the next 125 days the
average output will be under 7400 units per day?

We have $\mu = 7500$ and $\sigma = 500$. The distribution of sample means is approximately
Normal (since n is large) with $\mu_{\bar{x}} = \mu = 7500$ and

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{500}{\sqrt{125}} = 44.72$$

[Figure: normal curve of sample means centred at $\mu_{\bar{x}} = 7500$ with $\sigma_{\bar{x}} = 44.72$; the area to the left of 7400 is shaded.]

$$Z = \frac{7400 - 7500}{44.72} = -2.24$$

P(sample mean < 7400) = shaded area
= 0.5000 − 0.4875
= 0.0125

Confidence Intervals

If we have chosen a good sample and have calculated the mean from the sample for the
effect we wish to study, we may offer this estimate to the public or a company as an
estimate for the population mean. This is called a point estimate. The only problem is
that we offer evidence from one sample about the nature of the population, and we have
no idea how reliable this estimate is.

Reliability depends on sample size n and the amount of variability in the original
population σ . Even more helpful would be a combination of the estimate, the standard
error of the estimate and some notion of the probability of being correct. This
information is contained in what is called a confidence interval.

Since sample means are approximately normally distributed, we can use normal curve
probabilities to describe our estimates.

The following confidence intervals hold for the normal curve:

$\bar{X} \pm 1.96\,\sigma_{\bar{x}}$ = 95% confidence interval

$\bar{X} \pm 1.65\,\sigma_{\bar{x}}$ = 90% confidence interval

In general, the $100(1 - \alpha)\%$ confidence interval is given by
$\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$, where

$\bar{X}$ is the sample mean,

$Z_{\alpha/2}$ is the critical point whose area to the right is $\alpha/2$, and

$\sigma_{\bar{x}}$ is the standard error.

Example 3

Find the 95% confidence limits for the average daily output over 125 days given in
Example 2.

We have $\mu = 7500$ and $\sigma = 500$. Also $\mu_{\bar{x}} = 7500$ and
$\sigma_{\bar{x}} = 44.72$. The 95% confidence limits for a sample mean are:

$$\bar{x} \pm 1.96\,\sigma_{\bar{x}} = 7500 \pm 1.96(44.72) = 7500 \pm 87.65$$

i.e. 7412.35 to 7587.65.

We are 95% sure that the average output over 125 days will be between 7412.35 and
7587.65 units per day.
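
As a rough check of Example 3, the limits can be computed directly. The helper `normal_interval` below is a hypothetical name introduced for this sketch; the Z value 1.96 is taken from the text rather than computed.

```python
from math import sqrt

def normal_interval(center, sigma, n, z):
    """Return (lower, upper) limits: center ± z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return center - margin, center + margin

# Example 3: mean 7500, sigma 500, n = 125, 95% limits (Z = 1.96).
low, high = normal_interval(7500, 500, 125, 1.96)
print(round(low, 2), round(high, 2))
```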

The principles involved in setting confidence limits can be used to determine what
sample size should be taken, if we wish to achieve a given level of precision.

Example 4

Suppose, as a variation on Example 2, that the daily output from a production line has
a mean of 8000 units with a standard deviation of 534 units. If the probability is to be
0.99 that the error of the estimate will be at most 116 units, how large should the
sample size be?

Let d be the error term, which is half the width of the confidence interval. Hence d = 116.

Therefore $d = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$, with $\alpha = 0.01$,
$\alpha/2 = 0.005$ and $Z_{0.005} = 2.58$ from the tables.

Hence

$$116 = \frac{2.58(534)}{\sqrt{n}}$$

$$n = \left(\frac{(2.58)(534)}{116}\right)^2 = 141.06$$

$$n \approx 142$$

Round up to the next whole number.
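
The sample-size rule can be sketched as a small function: solve d = Z σ/√n for n and round up. The function name is hypothetical, and the value Z = 2.58 is the two-tailed 1% critical value listed in this chapter's table of significant Z values, supplied here as an input rather than computed.

```python
from math import ceil

def sample_size_for_mean(z, sigma, d):
    """Smallest whole n with z * sigma / sqrt(n) <= d, i.e. n >= (z * sigma / d) ** 2."""
    return ceil((z * sigma / d) ** 2)

# sigma = 534, maximum error d = 116, 99% confidence (Z = 2.58 from the tables).
n_required = sample_size_for_mean(2.58, 534, 116)
print(n_required)
```

Rounding up rather than to the nearest integer guarantees that the error bound is not exceeded.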

Finite Population Correction Factor (FPC)

For random samples of size n taken from a finite population having the mean $\mu$ and
standard deviation $\sigma$, the sampling distribution of $\bar{x}$ has the mean
$\mu_{\bar{x}} = \mu$ and the standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

where N is the population size. The correction is applied when the sample size n is
greater than 5% of the population size.
Example 5

A sample of 120 is drawn from a population of 1200 with a sample standard deviation of
9 centimeters. What is the finite population correction factor? What is the standard error
of the mean?

$$\text{fpc} = \sqrt{\frac{N-n}{N-1}} \quad \text{if } n > 5\% \text{ of } N$$

5% of N = 0.05(1200) = 60, and n = 120 > 60. Therefore

$$\text{fpc} = \sqrt{\frac{1200-120}{1200-1}} = 0.949$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{9}{\sqrt{120}}(0.949) = 0.7797 \text{ cm}$$
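
Example 5 can be reproduced with a short sketch. The function names are hypothetical; the 5% threshold follows the rule stated in the text.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def standard_error(s, n, N=None):
    """Standard error of the mean, corrected when n exceeds 5% of N."""
    se = s / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= fpc(N, n)
    return se

print(round(fpc(1200, 120), 3), round(standard_error(9, 120, 1200), 4))
```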

Estimation of Population Proportions

So far the process of statistical inference has been applied to the arithmetic mean. The
mean and standard error of a sample proportion are given by

$$\mu_{\hat{p}} = P \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

where $\hat{p}$ is the sample proportion, P is the population proportion and q = 1 − p.

Using this value, we are able to set a confidence interval for the estimate of the
population proportion based on the sample proportion in exactly the same way as outlined
previously for the mean:

$$\mu_{\hat{p}} \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}}$$
Example 6

A manufacturing process produces approximately 5% defective items.

a) Find the mean and standard deviation of the proportion of defectives obtained in
samples of 500 items.

b) Find the 95% confidence limits for the proportion of defectives in a sample of 500
items.

a) We have P = 0.05 and n = 500.

∴ mean = $\mu_{\hat{p}} = P = 0.05$

$$\text{Standard deviation} = \sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}} = \sqrt{\frac{0.05(0.95)}{500}} = 0.00975$$

b) Assuming a Normal distribution for the proportion of defectives in a
sample, the 95% confidence limits are given by:

$$\mu_{\hat{p}} \pm 1.96\,\sigma_{\hat{p}} = 0.05 \pm 1.96(0.00975) = 0.05 \pm 0.01911$$

i.e. 0.03089 to 0.06911.

Thus the 95% confidence limits for the proportion of defectives in a
sample are 3.1% to 6.9%.

Example 7

In a random sample of 375 employees, 68% were found to be in favour of strike action.
Find the 99% confidence limits for the proportion of all employees in the company who
are in favour of such action.

We are given $\hat{p} = 0.68$ and n = 375.

∴ $\mu_{\hat{p}} = \hat{p} = 0.68$

$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(0.68)(0.32)}{375}} = 0.0241$$

Assuming a normal distribution, the 99% confidence limits are:

$$\mu_{\hat{p}} \pm 2.58\,\sigma_{\hat{p}} = 0.68 \pm 2.58(0.0241) = 0.68 \pm 0.0622$$

i.e. 0.6178 to 0.7422.

The 99% confidence limits are 61.78% to 74.22%.
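
The limits in Example 7 can be verified with a few lines of code. `proportion_interval` is a hypothetical helper name; the Z value 2.58 is the table value used in the text.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Return (lower, upper) limits: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Example 7: 68% of 375 employees in favour, 99% limits (Z = 2.58).
low, high = proportion_interval(0.68, 375, 2.58)
print(round(low, 4), round(high, 4))
```

The printed limits can differ from the worked answer in the fourth decimal place because the text rounds the standard error to 0.0241 before multiplying.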

Estimation From Small Samples.

When the sample size is less than 30 and the population variance is unknown (i.e. S,
the sample standard deviation, is used as an estimate of $\sigma$, the population
standard deviation), the confidence limits for the true population mean are given by

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$

where $t_{\alpha/2,\,n-1}$ is the Student’s t distribution value with $n - 1$ degrees of
freedom, $\bar{x}$ is the sample mean, and S is the sample standard deviation

$$S = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n-1}}$$

The t variable is defined by the following formula:

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$

where $\bar{x}$ is the mean of a random sample of n measurements, $\mu$ is the population
mean of the x distribution, and S is the sample standard deviation.

The characteristics of the t-distribution are:

i) It is an exact distribution which is symmetrical and mound-shaped.

ii) It is flatter than the Normal distribution, i.e. the areas in the tails are greater
than those of the Normal distribution.

iii) As the sample size becomes larger, the t distribution approaches the Normal
distribution. To use the t distribution, refer to the tables in Appendix 1.

Example 8

A random sample of 12 men is taken and is found to have a mean height of 1.67cm and a
standard deviation of 0.48cm. Find the

i) 99%

ii) 95%

confidence limits for the population mean height.

i) We have $\bar{x} = 1.67$, n = 12 and S = 0.48. Here $1 - \alpha = 0.99$, $\alpha = 0.01$
and $\alpha/2 = 0.005$. Hence $t_{\alpha/2,\,n-1} = t_{0.005,\,11} = 3.106$ from Table 2.
The 99% confidence limits are given by:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} = 1.67 \pm 3.106\,\frac{0.48}{\sqrt{12}} = 1.67 \pm 0.430$$

i.e. 1.24cm to 2.10cm.

Thus the 99% confidence limits for the population mean are 1.24cm and 2.10cm.

ii) Here $1 - \alpha = 0.95$, $\alpha = 0.05$ and $\alpha/2 = 0.025$. Therefore
$t_{\alpha/2,\,n-1} = t_{0.025,\,11} = 2.201$ from Table 2. The 95% confidence limits are
given by:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} = 1.67 \pm 2.201\,\frac{0.48}{\sqrt{12}} = 1.67 \pm 0.305$$

i.e. 1.365cm to 1.975cm.

Thus the 95% confidence limits for the population mean are 1.365cm and
1.975cm.
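
Both parts of Example 8 follow the same arithmetic, which can be sketched as below. Python's standard library has no t-distribution quantile function, so the critical values 3.106 and 2.201 are passed in as read from the t tables; the helper name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """Return (lower, upper) limits: mean ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin

# Critical values with 11 degrees of freedom, taken from the tables.
limits_99 = t_interval(1.67, 0.48, 12, 3.106)
limits_95 = t_interval(1.67, 0.48, 12, 2.201)
print([round(v, 3) for v in limits_99], [round(v, 3) for v in limits_95])
```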

Exercise 4

1) A study of a sample of 500 bank accounts is made to estimate the average size of
a bank account. The sample mean is calculated to be K500 000. From previous
studies of bank accounts, it is known that the standard deviation is K67 000.
Construct a 99% confidence interval for the mean size of bank accounts.

2) An ice cream factory wishes to know the average number of women per block of
houses in a given compound. A sample of 120 blocks of houses within the
compound indicates that the average number of women is 94. When the standard
deviation is estimated for those 120 blocks of houses, it is found to be S = 15.04.
Calculate a 95% confidence interval for the average number of women per block.

3) Assume that some college students want to find out what percent of the
population will vote for the MMD candidate. A sample of 125 voters reveals that
65 will vote MMD. Should we predict that the MMD candidate will win
(assuming that there are just two parties)? Construct a 98% confidence interval
for the proportion of voters who will vote MMD.

4) A study of a sample of 125 customers at the college bookstore indicated that
20% preferred new books while 80% wanted used ones.

a) Estimate the standard error of a proportion for those favouring new
books. What is the standard error of a proportion for those
favouring old books?

b) Construct a 99% confidence interval for the proportion of the
population favouring new books.

c) Construct a 95.5% confidence interval for the proportion of the
population favouring new books.

5) A group of college students is trying to determine the appropriate sample
size to use. They wish to be within 2% of the true proportion with 95%
confidence. Past records indicate that the proportion of defectives is 9 in
300. What sample size should they use?

6) Past experience has indicated that the salaries of factory workers in a
certain industry are approximately normally distributed with a standard
deviation of K225 000. How large a sample of factory workers would be
required if we wish to estimate the population mean salary $\mu$ to within
K27 000 with a confidence of 95%?

7) The length of time required for persons taking a Mathematics test is
assumed to be normally distributed. A random sample of 25 persons
taking the test is selected and their test times are recorded, yielding an
average test time of 120 minutes with a standard deviation of 24 minutes.
Find a 99% confidence interval for the population mean test time $\mu$.

8) To test the durability of a new paint for white center lines, the highway
department of a Local Authority has painted test strips across heavily
travelled roads in nine different locations, and automatic counters showed
that the strips disappeared after having been crossed by 200, 245, 235, 225, 220,
230, 235, 248 and 250 cars. Construct a 99% confidence interval for the
average number of crossings this paint can withstand before it disappears.

4.5 Hypothesis Testing

Hypothesis testing or significance testing is in many ways similar to the process
of estimation dealt with in the previous section. Random sampling is involved
and the properties of the distribution of sample means and proportions are still
used.

A hypothesis is some testable belief or opinion, and hypothesis testing is the
process by which the belief is tested by statistical means.
This belief about the population parameter is called the null hypothesis, denoted $H_0$.
The value specified in the null hypothesis is often a historical value, a claim or a
production specification. The opposite of the null hypothesis is the alternative
hypothesis, denoted by $H_a$ or $H_1$.

For example, if the average score of Mathematics students was 85 ten years ago, we might
use a null hypothesis $H_0: \mu = 85$ for a study involving the average score of this
year’s mathematics class. If television networks claim that the average length of time
devoted to commercials in a 45-minute program is 10 minutes, we would use
$H_0: \mu = 10$ minutes as our null hypothesis in a study regarding the length of time
devoted to commercials. The alternative hypothesis is accepted if the null hypothesis is
rejected. In the two examples above, if we believe the average score of mathematics
students is greater than it was 10 years ago, we could use an alternative hypothesis
$H_1: \mu > 85$, while in the commercial case, if we believe the length of time devoted
to commercials is not 10 minutes, we could use an alternative hypothesis
$H_1: \mu \neq 10$.

If we reject the null hypothesis when it is in fact true, we have made an error that is
called a Type I error. On the other hand, if we accept the null hypothesis when it is in
fact false, we have made an error that is called a Type II error. Table 4.1 summarizes
these results.

Table 4.1 Type I and Type II Errors

                        Our Decision
Truth of H0         Accept H0             Reject H0
H0 is true          Correct decision      Type I error
H0 is false         Type II error         Correct decision

To investigate the significance of a sample mean we evaluate

$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \quad \text{i.e.} \quad Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

A selection of ‘significant’ values of Z together with the significance level $\alpha$ is
given below.

α      0.05     0.025    0.01     0.005
Z      1.64     1.96     2.33     2.58

Example 9

Consider a Normal population with a standard deviation of 30. A random sample of 24
items is found to have a mean of 168. Test the assumption at the 5% significance level
that the population has a mean of 150.

Assumption: population mean $\mu = 150$. Alternative: $\mu \neq 150$. We write

$H_0: \mu = 150$
$H_a: \mu \neq 150$

This type of test is called a two-tailed test. If $\mu \neq 150$, it could either be less
than 150 or greater than 150, hence the term two-tailed test.

[Figure: standard normal curve with a central acceptance region of area $1 - \alpha = 0.95$ between −1.96 and 1.96, and a shaded rejection region of area $\alpha/2 = 0.025$ in each tail.]

We are given $\alpha = 0.05$. Since it is a two-tailed test, we share this area equally
between the two tails, i.e. $\alpha/2 = 0.05/2 = 0.025$. The shaded area is called the
rejection region and the unshaded area the acceptance region. The point separating the
rejection region from the acceptance region is called the critical point.

We are given that $\sigma = 30$, n = 24 and $\bar{x} = 168$. Now

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{168 - 150}{30/\sqrt{24}} = \frac{18}{6.12} = 2.94$$

A ‘significant’ value of Z at the 5% level is 1.96, i.e. the 95% confidence limits for Z
are −1.96 and +1.96 (see the diagram above). Therefore, our value of Z is significant
(i.e. it is outside the confidence limits). We reject $H_0$. We thus accept that the
population mean is not equal to 150.
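
The decision rule in Example 9 can be sketched in code. Here the critical value is computed with `statistics.NormalDist` instead of being read from the table; the function name `z_test_two_tailed` is a hypothetical choice for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha):
    """Return (Z, critical value, reject H0?) for a two-tailed test of the mean."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical point
    return z, z_crit, abs(z) > z_crit

# Example 9: sigma = 30, n = 24, sample mean 168, H0: mu = 150, alpha = 0.05.
z, z_crit, reject = z_test_two_tailed(168, 150, 30, 24, 0.05)
print(round(z, 2), round(z_crit, 2), reject)
```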

Example 10

A random sample of 12 family toy cars is found to have an average retail price of
K300 000. Assuming that toy car prices are Normally distributed with a standard
deviation of K50 000, test the assumption (at the 5% level) that the average price of a
family toy car is:

a) K350 000, and

b) more than K350 000.

We are given $\sigma = 50\,000$, n = 12 and $\bar{x} = 300\,000$.

a) $H_0: \mu = 350\,000$
$H_1: \mu \neq 350\,000$

This is a two-tailed test. We have

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{300\,000 - 350\,000}{50\,000/\sqrt{12}} = \frac{-50\,000}{50\,000/3.4641} = -3.46$$

[Figure: standard normal curve with a central acceptance region of area $1 - \alpha = 0.95$ between −1.96 and 1.96, and a rejection region of area 0.025 in each tail.]

Now a significant value of Z at the 5% level is 1.96, i.e. we reject $H_0$ if the Z value
is greater than +1.96 or less than −1.96. Therefore, our value of Z (−3.46) is
significant. We reject the assumption $H_0$. Thus the sample shows that the
average price of a family toy car is significantly different from K350 000.

b) $H_0: \mu = 350\,000$
$H_1: \mu > 350\,000$

This is a one-tailed test. We have

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{300\,000 - 350\,000}{50\,000/\sqrt{12}} = -3.46$$

[Figure: standard normal curve with a rejection region of area 5% in the upper tail, to the right of Z = 1.65.]

Since our value of Z (−3.46) does not exceed the critical value 1.65, it is not
significant. We accept $H_0$. We thus accept the assumption that the average price of a
family toy car is equal to K350 000.

Example 11

A random sample of 12 items is obtained from a Normal population and is found to have
a mean of 50 with a standard deviation of 7. Test the assumption, at the 5% significance
level, that the population mean is 40.

We are given S = 7, n = 12 and $\bar{x} = 50$.

$H_0: \mu = 40$
$H_1: \mu \neq 40$ (two-tailed test).

Now the sample size is small and hence the test statistic is no longer Z but t, given by

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{50 - 40}{7/\sqrt{12}} = \frac{10}{2.021} = 4.95$$

Now, a significant value of t at the 5% level (two-tailed) with n − 1 = 12 − 1 = 11
degrees of freedom is 2.201. Therefore, our value t = 4.95 is significant. We reject
$H_0$. We thus conclude that the population mean is significantly different from 40.

Example 12

An assessment test is given to all prospective employees in a company. Test scores are
known to be Normally distributed. A random sample of 7 participants obtained the
following results: 49, 58, 68, 66, 75, 85, 82.

Test the assumption that the mean test score is 65 using the 5% significance level.

$H_0: \mu = 65$
$H_1: \mu \neq 65$

This is a two-tailed test. We have x: 49, 58, 68, 66, 75, 85, 82. Now

$$\bar{x} = \frac{\sum x}{n} = \frac{483}{7} = 69$$

and

$$S = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{34319 - 7(69)^2}{6}} = 12.858$$

$$\therefore\ t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{69 - 65}{12.858/\sqrt{7}} = \frac{4}{4.86} = 0.823$$

From the tables, a significant value of t at the 5% level with n − 1 = 7 − 1 = 6 degrees
of freedom is 2.447. Therefore, our value of t = 0.823 is not significant. We can accept
$H_0$. We thus conclude that the average test score is not significantly different from 65.
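
The working in Example 12 can be checked from the raw scores. This sketch uses the seven scores that reproduce the totals in the working (a sum of 483 and a sum of squares of 34319), relies on `statistics.stdev` for the n − 1 sample standard deviation, and takes the critical value 2.447 from the t tables as an input.

```python
from math import sqrt
from statistics import mean, stdev

scores = [49, 58, 68, 66, 75, 85, 82]
mu0 = 65            # hypothesised mean under H0
t_crit = 2.447      # two-tailed 5% value with 6 degrees of freedom (from tables)

x_bar = mean(scores)
s = stdev(scores)                       # sample standard deviation (divisor n - 1)
t = (x_bar - mu0) / (s / sqrt(len(scores)))
reject = abs(t) > t_crit

print(x_bar, round(s, 3), round(t, 3), reject)
```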

Example 13

The amount of monthly income tax paid by employees is approximately Normally
distributed. A random sample of 25 employees paid an average of K350 000 per month
in income tax, with a standard deviation of K160 000. At the 5% significance level test
the assumption that the average amount of income tax paid is greater than K250 000 per
month per employee.

We are given S = 160 000, n = 25 and $\bar{x} = 350\,000$.

We write: $H_0: \mu = 250\,000$
$H_a: \mu > 250\,000$

This is a one-tailed test. Now

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{350\,000 - 250\,000}{160\,000/\sqrt{25}} = 3.125$$

From the tables, a significant value of t at the 5% level (one-tailed) with n − 1 = 24
degrees of freedom is 1.711. Therefore, our value of t = 3.125 is significant. We can
reject $H_0$ and accept the assumption that the average income tax paid by employees in
the company is significantly greater than K250 000 per month.

Hypothesis Testing of Proportion

A population contains a proportion P of ‘successes’. Random samples of size n are taken
from this population. The proportions of ‘successes’ in the samples are distributed with

$$\text{mean } \mu_{\hat{p}} = P \quad \text{and standard deviation } \sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$$

If n is ‘large’, the sample proportions are approximately Normally distributed. The
significance of a sample proportion $\hat{p}$ can be examined using the formula

$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$$

Example 14

It is assumed that over half of the employees in a large company are in favour of a
proposed new salary structure. A sample of 250 employees found that 42% were in
favour. Does this sample verify the assumption? (Use the 1% significance level.)

We have $\hat{p} = 0.42$ and n = 250.

$H_0: P = 0.50$
$H_1: P < 0.50$

Now,

$$\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}} = \sqrt{\frac{0.5(0.5)}{250}} = 0.0316$$

$$\therefore\ Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{0.42 - 0.50}{0.0316} = -2.53$$

A significant value of Z at the 1% level (one-tailed) is −2.33. Therefore, our value of
Z = −2.53 is significant. We reject $H_0$. We thus conclude that the population
does have a proportion less than 50%, i.e. less than half of the employees are in favour of
the proposed new salary structure.
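
A short sketch of the one-tailed proportion test in Example 14 is given below. The lower-tail critical value is computed with `statistics.NormalDist` rather than read from the table, and `proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p0, n):
    """Z statistic for a sample proportion, using the standard error under H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Example 14: 42% of 250 in favour, H0: P = 0.50, H1: P < 0.50, alpha = 0.01.
z = proportion_z(0.42, 0.50, 250)
z_crit = NormalDist().inv_cdf(0.01)   # lower-tail 1% critical point
reject = z < z_crit

print(round(z, 2), round(z_crit, 2), reject)
```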

Example 15

It is required to test the hypothesis that 56% of households have a television set. A
random sample of 500 households found that 75% of the sample had television sets. The
significance level is 1%.

We have P = 0.56, n = 500, $\hat{p} = 0.75$ and $\alpha = 1\%$.

This is a two-tailed test because we wish to test the hypothesis as it is and not against a
specific alternative hypothesis that the real proportion is either larger or smaller,

i.e. $H_0: P = 0.56$
$H_1: P \neq 0.56$

Now,

$$\sigma_{\hat{p}} = \sqrt{\frac{0.56(0.44)}{500}} = 0.0222$$

$$\therefore\ Z = \frac{0.75 - 0.56}{0.0222} = 8.56$$

At the 1% level of significance for a two-tailed test the appropriate Z value is 2.58.

Therefore, our value of Z = 8.56 is significant. We reject $H_0$. We thus accept the
assumption that the proportion of households that have a television set is not 56%.
Exercise 5

1) A Normal population has a standard deviation of 50. A random sample of 30


items is found to have a mean of 270. Using the 1% significance level examine
the assumption that the population has a mean of 280.

2) A machine makes twist-off caps for bottles. The machine is adjusted to make
caps of diameter 1.87 cm. Production records show that when the machine is so
adjusted, it will make caps with mean diameter 1.87 cm and with standard
deviation σ = 0.045 cm. During production, an inspector checks the diameters of caps to see
if the machine is not functioning properly, in which case the mean diameter is no longer
1.87 cm. A sample of 65 caps is taken and the mean diameter for this sample, $\bar{x}$, is
found to be 1.98 cm. Is the machine working properly, or is µ ≠ 1.87? (Use a 5%
level of significance.)

3) Monthly salaries in a company are normally distributed with a standard deviation
of K94 500. A sample of 20 employees is found to have a mean salary of K756 000
per month. Using the 1% level of significance, would you conclude that the
average salary in this company is significantly higher than K720 000 per month?

4) A random sample of seven bank accounts shows balances equal to: K270 000,
K120 000, K1 600 000, K620 000, K1 980 000, K3 200 000, K2 600 000. Test the
assumption that the mean bank balance is K1 250 000. (Use the 5% significance
level.)

5) A sample of 28 items from a normal population is found to have a mean of 550
with a standard deviation of 69. At the 5% level, test the assumption that the
population has a mean of 520.

6) From a random sample of 95 Zambian companies, it is found that 36 companies
had annual turnovers in excess of 30 million kwacha. Using a 1% significance
level, test the assumption that 45% of all Zambian companies have over 30
million kwacha annual turnover.

7) A team of eye surgeons has developed a new technique for a risky eye operation
to restore the sight of people blinded by a certain disease. Under the old
method, it is known that only 45% of the patients who undergo this operation
recover their eyesight. Suppose that surgeons in various hospitals have performed
a total of 230 operations using the new method and that 98 have been successful
(the patients fully recovered their sight). Can we justify the claim that the new
method is better than the old one? (Use a 5% level of significance.)

Hypothesis Testing of the Difference Between Two Means

For large samples, the distribution of differences between sample means is approximately
normal, whatever the distribution of the populations from which the samples are drawn.
When the samples are large (n > 30), the Normal area tables are used. When n < 30, the t
distribution is used.

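The standard error of the difference of means is

$$\hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

where:

$S_1$ = Standard deviation of sample 1, size $n_1$.

$S_2$ = Standard deviation of sample 2, size $n_2$,

and the Z score is calculated thus:

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)}}$$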

Example 1

A psychological study was conducted to compare the reaction times of men and women
to a certain stimulus. Independent random samples of 50 men and 50 women were
employed in the experiment. The results are shown in the table below. Do the data
present sufficient evidence to suggest a difference between the mean reaction times
for men and women? Use α = 0.05.

              Men                 Women
              $n_1 = 50$          $n_2 = 50$
              $\bar{x}_1 = 43$    $\bar{x}_2 = 37$
              $S_1^2 = 20$        $S_2^2 = 12$

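We have

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$ (two-tailed)

$$\hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{20}{50} + \frac{12}{50}} = 0.8$$

Now

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)}} = \frac{43 - 37}{0.8} = 7.5$$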

The critical value of Z at the 5% level (two-tailed) is 1.96. Therefore, our value of Z = 7.5 is
significant. We reject $H_0$. We thus conclude that there is a significant difference
between the mean reaction times of men and women.
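As a check on Example 1, the following minimal Python sketch computes the standard error and Z score from the sample variances. The function name `two_mean_z` is an illustrative assumption; it takes variances ($S^2$), as given in the table, rather than standard deviations.

```python
from math import sqrt

def two_mean_z(mean1, var1, n1, mean2, var2, n2):
    """Return (Z, standard error) for the difference of two sample means."""
    se = sqrt(var1 / n1 + var2 / n2)   # standard error of (x1-bar - x2-bar)
    return (mean1 - mean2) / se, se

# Example 1: men (mean 43, variance 20, n 50), women (mean 37, variance 12, n 50)
z, se = two_mean_z(43, 20, 50, 37, 12, 50)
print(round(se, 2), round(z, 2))

# Two-tailed test at the 5% level: reject H0 when |Z| exceeds 1.96
reject_h0 = abs(z) > 1.96
```

If standard deviations are given instead, as in the next example, they would need to be squared before being passed in.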

Example 2

A consumer group is testing camp stoves. To test the heating capacity of a stove, the
group measures the time required to bring 2 litres of water from 10 °C to boiling (at sea
level).

Two competing models are under consideration. Thirty-seven of each model are tested
and the following results are obtained.

Model 1: mean time $\bar{x}_1 = 12.5$ min; standard deviation $s_1 = 2.6$ min

Model 2: mean time $\bar{x}_2 = 10.1$ min; standard deviation $s_2 = 3.2$ min

Is there any difference between the performances of these two models (use a 1% level of
significance)?

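We have

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{12.5 - 10.1}{\sqrt{\frac{(2.6)^2}{37} + \frac{(3.2)^2}{37}}} = \frac{2.4}{0.678} = 3.54$$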

Z = 3.54 is greater than the critical value of Z = 2.58. Therefore, we reject $H_0$ and
accept the assumption that there is a significant difference between the performances of
these two models.
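The critical values quoted in these examples (1.96, 2.58, 2.33) come from the Normal area tables. As a hedged illustration, they can also be obtained from Python's standard library, here applied to the Example 2 figures. The helper name `critical_z` is an assumption made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()   # standard Normal: mean 0, standard deviation 1

def critical_z(alpha, two_tailed=True):
    """Positive critical Z value for significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

# Example 2: two camp stove models, 37 stoves of each, 1% level, two-tailed
se = sqrt(2.6**2 / 37 + 3.2**2 / 37)
z = (12.5 - 10.1) / se
z_crit = critical_z(0.01)
reject_h0 = abs(z) > z_crit

print(round(se, 3), round(z, 2), round(z_crit, 2))
```

The tables round these values: `critical_z(0.05)` is about 1.96, `critical_z(0.01)` about 2.58, and the one-tailed 1% value `critical_z(0.01, two_tailed=False)` about 2.33.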

Hypothesis Testing of the Difference Between Proportions

In a similar manner it may be required to test the difference between the proportions of a
given attribute found in two random samples.

The following symbols will be used.

                                       Sample 1       Sample 2

Sample proportion of successes         $\hat{P}_1$    $\hat{P}_2$
Population proportion of successes     $P_1$          $P_2$
Sample size                            $n_1$          $n_2$

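The assumption is that the two samples are from the same population. Hence we use the pooled
sample proportion

$$P = \frac{\hat{P}_1 n_1 + \hat{P}_2 n_2}{n_1 + n_2}$$

and the standard error is

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{\frac{PQ}{n_1} + \frac{PQ}{n_2}}, \quad \text{where } Q = 1 - P,$$

and

$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (P_1 - P_2)}{\hat{\sigma}_{(P_1 - P_2)}} \qquad (1)$$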

But under the null hypothesis $H_0: P_1 = P_2$, hence (1) is reduced to:

$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\hat{\sigma}_{(P_1 - P_2)}}$$

Example 3

The following results have been recorded from random samples of candidates taking two
Institute examinations.

Examination        No. of candidates sampled     No. of passes

Communication      50                            35
Mathematics        85                            42

Use the 1% level of significance to examine whether there is a significant difference in
the proportions of candidates passing the two examinations.

Let Communication be sample 1 and Mathematics be sample 2.

$n_1 = 50$, $n_2 = 85$

$\hat{P}_1 = \frac{35}{50} = 0.7$, $\hat{P}_2 = \frac{42}{85} = 0.49$

$H_0: P_1 = P_2$
$H_a: P_1 \neq P_2$

Now

$$P = \frac{0.7(50) + 0.49(85)}{50 + 85} = \frac{76.65}{135} = 0.568$$

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{\frac{(0.568)(0.432)}{50} + \frac{(0.568)(0.432)}{85}} = 0.0883$$

$$\therefore Z = \frac{0.7 - 0.49}{0.0883} = 2.38$$

The critical value of Z at the 1% level (two-tailed) is 2.58. Therefore, our value of Z = 2.38
is not significant. We accept $H_0$. There is no significant difference between the
proportions of candidates passing the two examinations.
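The pooled-proportion calculation in Example 3 can be sketched in Python as follows. The function name `two_proportion_z` is an illustrative assumption. Note that this sketch works from the raw counts (35 of 50 and 42 of 85) without intermediate rounding, so it gives a Z of about 2.33 rather than the 2.38 obtained above with $\hat{P}_2$ rounded to 0.49. The conclusion is the same, since both values are below 2.58.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return (Z, pooled proportion, standard error) under H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)    # same as (p1*n1 + p2*n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled, se

# Example 3: Communication (35 passes of 50), Mathematics (42 passes of 85)
z, pooled, se = two_proportion_z(35, 50, 42, 85)
print(round(pooled, 3), round(se, 4), round(z, 2))

# Two-tailed test at the 1% level: reject H0 only when |Z| exceeds 2.58
reject_h0 = abs(z) > 2.58
```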

Example 4

A college committee wishes to know if the proportion of students who received A grades
was decreasing as a result of the committee's recent report to the Principal, which showed that
grades had risen since 2002 and that Mampi College grades had risen faster than grades
in other colleges in the country. Samples of grades from 2002 and from 2003, after the
committee's report was given, were studied to see if the proportion of A grades had gone
down significantly. Use α = 0.05.

Year       Proportion of A grades     Number of students

2002       0.70                       120
2003       0.50                       110

Let 2002 results be sample 1, and 2003 results be sample 2

We have:

$H_0: P_2 \geq P_1$
$H_a: P_2 < P_1$

Now

$$P = \frac{0.70(120) + 0.50(110)}{120 + 110} = 0.604$$

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{(0.604)(0.396)\left(\frac{1}{120} + \frac{1}{110}\right)} = 0.065$$

This is a one-tailed test. The critical Z value is -1.65.

$$Z = \frac{0.50 - 0.70}{0.065} = -3.08$$

Since -3.08 is less than -1.65, we reject $H_0$. We therefore conclude that the proportion of
A grades has gone down since the committee's report was issued.
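Example 4 differs from Example 3 only in being one-tailed, so the rejection region lies entirely in the lower tail. A minimal Python sketch of that decision rule follows; the variable names are assumptions made for illustration. Without rounding the standard error to 0.065, Z comes out at about -3.10 instead of -3.08, which does not affect the conclusion.

```python
from math import sqrt
from statistics import NormalDist

# Example 4: sample 1 is 2002, sample 2 is 2003
p1, n1 = 0.70, 120
p2, n2 = 0.50, 110

pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se                    # negative if the proportion has fallen

# Lower-tail critical value at alpha = 0.05 (the tables give -1.65)
z_crit = NormalDist().inv_cdf(0.05)
reject_h0 = z < z_crit

print(round(pooled, 3), round(se, 3), round(z, 2), round(z_crit, 3))
```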

Exercise 6

1) A quality inspection of two production lines gave the following results.

Production line     Sample size     No. of defectives

X                   45              4
Y                   35              9

Use a 5% significance level to test the claim that line X is more reliable than line Y.

2) In a survey of voting intentions prior to a local government election, 45% from a
random sample of 400 voters said that they intended to vote for the MMD candidate.
In a second area in the same constituency there were 32% intending to vote
MMD in a sample of 370. Use a 1% level of significance to determine whether
there is a significant difference between voting intentions in the two areas.

3) The graduating classes of two prestigious business schools were surveyed about
their average starting salary with the following results.

School     Average starting salary     Standard deviation     Sample size
           (K' million)                (K' million)

X          80                          1.8                    150
Y          85                          1.44                   115

At the 0.05 significance level, do we have adequate reason to believe that graduates of
School X and School Y have equal starting salaries?

4) A calculator company was trying to decide between two brands of batteries to
recommend in its calculators. If the batteries were of equal life, the company
preferred brand 1 because of its better distribution network. Based on the
following data and using a 5% significance level, which battery should the quality
control engineer recommend?

Battery     Mean life (hrs)     Standard deviation (hrs)     Sample size

Brand 1     110                 15                           150
Brand 2     115                 20                           150

5) Consider the following null and alternative hypotheses.

$H_0: \mu_1 - \mu_2 \leq 0$
$H_a: \mu_1 - \mu_2 > 0$

Samples of size $n_1 = 30$ and $n_2 = 30$ are planned to test this null hypothesis.
Suppose it is known that $\sigma_1 = 12$, $\sigma_2 = 24$. Further, suppose that the two
samples are taken from each population independently, with means
$\bar{x}_1 = 30$ and $\bar{x}_2 = 20$ respectively. Should the null hypothesis be rejected or not
rejected? Explain. Use α = 0.05.

EXAMINATION QUESTIONS WITH ANSWERS

Multiple Choice Questions

1.1 A simple random sample of size 25 is drawn from a finite population consisting of
145 units. If the population standard deviation is 10.5, what is the standard error
of the sample mean when the sample is drawn with replacement?

A. 5.0 B. 2.1 C. 1.45 D. 1.92

(Natech, 1.2 Mathematics and Statistics, June 2003)

1.2 A Corkhill machine is set to fill a small bottle with 9.0 grams of medicine. It is
claimed that the mean weight is less than 9.0 grams. The hypothesis is to be
tested at the 0.01 level. A sample revealed these weights (in grams): 9.2, 8.7, 8.9,
8.6, 8.8, 8.5, 8.7 and 9.0. What are the null and alternative hypotheses?

A. $H_0: \mu = 9.0$, $H_1: \mu < 9.0$     B. $H_0: \mu \neq 9.0$, $H_1: \mu > 9.0$

C. $H_0: \mu > 9.0$, $H_1: \mu < 9.0$     D. $H_0: \mu \neq 9.0$, $H_1: \mu = 9.0$

(Natech, 1.2 Mathematics and Statistics, June 2001)

1.3 A study of Excelsior Furniture Limited regarding the payment of invoices revealed
that, on average, an invoice was paid 20 days after it was received. The standard
deviation equaled 5 days. What percentage of the invoices were paid within 15
days of receipt?

A. 20% B. 34.13% C. 15.87% D. 84.13%

(Natech, 1.2 Mathematics and Statistics, December 2002)

1.4 A procedure based on sample evidence and probability theory used to determine
whether a statement about the value of a population parameter is reasonable and
should not be rejected, or unreasonable and should be rejected is called:

A. Forecasting B. Hypothesis C. Inferential statistics

D. Hypothesis testing

(Natech, 1.2 Mathematics and Statistics, December 2003)

1.5 If, in a sample of 150, 60 respondents say they prefer product P to product Q,
then the standard error of the sample proportion is:

A. 0.0016 B. 0.04 C. 0.4 D. 0.6


(Natech, 1.2 Mathematics and Statistics, December 2001)

1.6 Which of the following does not require a sampling frame?

A. Cluster sampling B. Quota sampling

C. Stratified sampling D. Random sampling


(Natech, 1.2 Mathematics and Statistics, December 2003)

1.7 The finite correction factor is used in computing the standard error of the mean
when:

A. X is infinite. B. N is finite. C. n is finite

D. σ is finite.

1.8 In a “5 percent two-tail test” concerned with the value of the population mean,
the area in each tail (region of rejection) of the standard-normal distribution
model is:

A. 0.025 B. 0.05 C. 0.10 D. 0.95

1.9 Type I error refers to the error of:

A. Accepting a true null hypothesis
B. Rejecting a true null hypothesis
C. Accepting a false null hypothesis
D. Rejecting a false null hypothesis

1.10 In the general procedure of hypothesis testing, the “benefit of doubt” is given to the:

A. Sample statistics B. Test statistic

C. Null hypothesis D. Alternative hypothesis

SECTION B

QUESTION ONE

a) State the null and alternative hypotheses given the following information.

During the last year, the average quarterly charge on a current account held at a
bank was K25,000. The bank wishes to investigate whether the amount paid in
charges has increased or not. So they sample 50 accounts and obtain a mean of
K25,500.

b) The mean breaking strength of the cables supplied by a manufacturer is 1,800
with a standard deviation of 100. By a new technique in the manufacturing
process it is claimed that the breaking strength of the cables has increased. In
order to test this claim a sample of 50 cables is tested. It is found that the mean
breaking strength is 1,850. Can we support the claim at the 0.01 level of
significance?
(Natech, 1.2 Mathematics and Statistics, June 2001)

c) A company manufactures support struts which have a mean breaking strength of
125 kN with a standard deviation of 185 kN. As a result of trials
with more expensive raw material, a batch of 25 struts with a mean breaking
strength of 1310 kN is produced. Is this evidence that the new material is
producing stronger struts?
(Natech, 1.2 Mathematics and Statistics, June 2003)

QUESTION TWO

a) Distinguish between a point estimate and an interval estimate.

b) ZACB, a training institution, runs a number of promotional activities for its
programmes, one of which is called “ZACB hour”.

In order to ascertain the effectiveness of this promotional activity, a sample of 500
new students was randomly selected and asked how they came to know about the
institution’s programmes. 185 of the 500 students indicated that it was through
“ZACB hour”.

Required:

i) Compute a 95% confidence interval for the population proportion of
students who came to know about the institution’s programmes through
“ZACB hour”.

ii) Calculate the number of students to be interviewed if the sample
proportion is to lie within 1% of the true population proportion at the 95%
confidence level.
(Natech, 1.2 Mathematics and Statistics, December 2004)

QUESTION THREE

a) i) The mean and the standard deviation of the height of a random sample of
100 students are 168.75cm and 7.5cm respectively.

Required:

Calculate the 99% confidence interval for the mean height of all students
at the college.

ii) In measuring the reaction times of individuals, a psychologist estimates that the
standard deviation of all times is 0.05 seconds.

Required:

Calculate the smallest sample size necessary in order to be 95% confident that
the error in the estimate will not exceed 0.01 seconds.

(Natech, 1.2 Mathematics and Statistics, December 2003)

b) A sample of 10 measurements of the diameter of a marble gave a mean of 4.38 mm
and a standard deviation of 0.06 mm.

Construct a 99% confidence interval for the population mean diameter.

(Natech, 1.2 Mathematics and Statistics, June 2005)

c) The weekly turnover of a small retail company is Normally distributed with an
average of K42 000 per week.

Following an advertising campaign, a seven-week period produced an average
turnover of K49 000 per week, with a standard deviation of K4 500. At the 1%
level, test whether there has been a significant increase in the turnover.

(Natech, 1.2 Mathematics and Statistics, December 1998)

d) A random sample of six (6) bank accounts showed balances equal to

K232,200; K2,752,000; K843,200; K1,376,000; K421,600; K1,346,400. Test
the assumption that the mean bank balance is K1,032,000. (Use the 1%
significance level.)
(Natech, 1.2 Mathematics and Statistics, Nov/Dec 2000)

QUESTION FOUR

a) The Manager of a coach company wishes to estimate the average daily number of
kilometers covered by his coaches. He requires a confidence level of 80%, and the
error must be within plus or minus 20 kilometers of the true mean.

Previous investigations have indicated that a very good estimate
for the population standard deviation is 130 kilometers.

i) Find the required sample size.

ii) Suppose the sample size is greater than his fleet of coaches, what steps
should be taken?

(Natech, 1.2 Mathematics and Statistics, December 2001)

b) XYZ Ltd has developed a cleaning detergent for use by housewives in their
homes. A decision must now be made whether or not to market the detergent.
Marketing the product will be profitable if the mean number of units ordered per
household is greater than 3.0 and unprofitable if less than 3.0. The decision will
be based on the sales potential shown in the home demonstrations of the detergent
with a random sample of the housewives in the target market.

i) Specify the null and alternative hypotheses if the more costly error is to
market the detergent when it is not profitable.

ii) Specify the null and alternative hypotheses if the more costly error is to
fail to market the detergent when the mean number of units ordered per
household exceeds 3.0.

c) An instructor wishes to determine whether or not student performance has
changed over the duration of the course. Scores in two equivalent tests in
“mathematics proficiency”, one before and the other after the course, are
summarized in the table below.

Student    1   2   3   4   5   6   7   8   9  10  11
Before    29  61  73  51  49  71  33  48  44  75  55
After     55  67  73  35  48  93  59  47  42  60  46

Is there any statistical evidence that the course has produced some learning? Use
α = 0.05 .

(Natech, 1.2 Mathematics and Statistics, June 1977)

QUESTION FIVE

a) Formulate the appropriate null and alternative hypotheses in each of the following
situations.

i) A large mining company has a mean level of absenteeism of 98 workers
per 1,000 at any one time. To help reduce this rather high rate of
absenteeism, the company has introduced a new attendance bonus. The
company now wishes to determine whether absenteeism has declined.

ii) Suppose you own a bookshop that sells a variable number of books per
day, and that if the mean number of books sold is less than 20 per day, you
will eventually be bankrupt. If the mean number of books sold exceeds 20
per day, you are financially safe. You wish to determine whether your
sales (from the bookshop) are leading to a financial disaster.

iii) When it is operating correctly, a metal lathe produces machine bearings
with a mean diameter of 1.27 cm. Otherwise, the process is out of control.
A quality control inspector wants to check whether the process is out of
control.

b) A relief organization knows from previous studies that the average distance
that each family has to walk to fetch water is 5.6 km. A small capital investment
programme is initiated to sink boreholes to address this problem.

Two months after the completion of the programme, a sample of thirty families
revealed that the mean distance is now 5km with a standard deviation of 1.4km.
Is this a significant improvement? Conduct your test at 5% level of significance.
(Natech, 1.2 Mathematics and Statistics, December 1996)

QUESTION SIX

a) From a recent survey conducted on environmental issues, out of 200 persons
sampled, 1600 favoured more strict environmental protection measures. What is
the estimated population proportion?

b) It is known that the distribution of efficiency ratings for production employees at
Bearing Centre Ltd is normally distributed with a population mean of 200 and a
population standard deviation of 16. The research department is challenging this
mean, stating it is different from 200. As a result, the efficiency ratings of 100
production employees were analyzed and the mean of the sample was computed
to be 203.5.

With this information and using the 0.01 level of significance, test the hypothesis
that the population mean is 200.
(Natech, 1.2 Mathematics and Statistics, December 1997)
