Sampling and Sampling Distribution

GBS 541


CHAPTER 5

PROBABILITY DISTRIBUTION

Reading

Newbold Chapters 4 (not 4.4) and only 5.5 in Chapter 5

Wonnacott and Wonnacott Chapter 4

Tailoka Frank P Chapter 9

Introductory Comments

This Chapter introduces three useful standard distributions: two discrete probability distributions and one continuous probability distribution. These are so often used that everyone should be familiar with them. We need to know the mean, the variance and how to find simple probabilities for each.

5.0 Discrete Random Variables

A random variable may be defined roughly as a variable that takes on different numerical values because of chance. Random variables are classified as either discrete or continuous. A discrete random variable is one that can take on only a finite or countable number of distinct values. For example, the number of people entering a shop takes the values 0, 1, 2, etc., and the outcomes on one roll of a fair die are limited to 1, 2, 3, 4, 5 and 6.

A random variable is said to be continuous in a given range if the variable can assume any value in that range. A continuous variable can be measured to any degree of accuracy by using smaller and smaller units of measurement. Examples of continuous variables include weight, length, velocity, distance, time, and temperature. While discrete variables are counted, continuous variables are measured.

A probability distribution of a discrete random variable $X$ whose value at $x$ is $f(x)$ possesses the following properties:

1. $f(x) \ge 0$ for all real values of $x$

2. $\sum_x f(x) = 1$

Property 1 simply states that probabilities are greater than or equal to zero. The second property states that the sum of the probabilities in a probability distribution is equal to 1. The notation $\sum_x f(x)$ means 'the sum of the values of $f$ for all the values that $x$ takes on'. We will ordinarily use the term probability distribution to refer to both discrete and continuous variables; other terms are sometimes used to refer to probability distributions (also called probability functions).

Probability distributions of discrete random variables are often referred to as


probability mass functions or simply mass functions because the probabilities are
massed at distinct points, for example along the x axis.

Probability distributions of continuous random variables are referred to as


probability density functions or density functions.

5.1 Cumulative Distribution Functions

Given a random variable $X$, the value of the cumulative distribution function at $x$, denoted $F(x)$, is the probability that $X$ takes on values less than or equal to $x$. Hence

$F(x) = P(X \le x)$  (1)

In the case of a discrete random variable, it is clear that

$F(c) = \sum_{x \le c} f(x)$  (2)

The symbol $\sum_{x \le c} f(x)$ means 'the sum of the values of $f(x)$ for all values of $x$ less than or equal to $c$'.

Example 1

Shoprite is interested in diversifying its product line into the soft goods market. Mr Phiri, Vice President in charge of mergers and acquisitions, is negotiating the acquisition of Quicksave, a discount shop. To determine the price Shoprite would have to pay per share for Quicksave, he sets up the probability distribution for the stock price shown in the table below.

Probability distribution and cumulative distribution for the price of Quick


save common stock.

Price of Quicksave common stock (x)    Probability f(x)    Cumulative probability F(x)
K74 250                                0.08                0.08
76 500                                 0.15                0.23
78 750                                 0.53                0.76
81 000                                 0.20                0.96
83 250                                 0.04                1.00

The probability that the price would be K78 750 or less is

$P(X \le \text{K78 750}) = F(\text{K78 750}) = 0.08 + 0.15 + 0.53 = 0.76$

Similarly,

$P(X \le \text{K76 500}) = F(\text{K76 500}) = 0.23$
A graph of the cumulative distribution function is a step function; that is, the values change in discrete 'steps' at the indicated values of the random variable X.

[Figure: step-function graph of F(x), rising from 0.08 at K74 250 to 1.00 at K83 250, plotted against the price of stock.]

Graph of cumulative distribution of the price of Quicksave common stocks.
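For readers who want to verify such tables by computer, a minimal Python sketch (with the Quicksave prices and probabilities typed in as assumed inputs) accumulates f(x) into the cumulative distribution F(x):

import itertools

# Assumed inputs, copied from the Quicksave table above
prices = [74250, 76500, 78750, 81000, 83250]
probs  = [0.08, 0.15, 0.53, 0.20, 0.04]

# F(x) is the running sum of f(x) over all values up to and including x
for price, F in zip(prices, itertools.accumulate(probs)):
    print(f"P(X <= K{price}) = {F:.2f}")

# The line for K78 750 prints 0.76, matching the table.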

5.2 Probability Distribution of Discrete Random Variables

We will discuss the binomial and Poisson probability distributions of discrete random variables. First, recall that the mean (expected value) of a discrete random variable $X$ is

$\mu = E(X) = \sum_{\text{all } x} x\,P(x)$

and the variance of a discrete random variable $X$ is

$\sigma^2 = E(X - \mu)^2 = \sum_{\text{all } x} (x - \mu)^2 P(x)$
In general, if g(x) is any function of the discrete random variable x, then

$E[g(X)] = \sum_{\text{all } x} g(x)\,P(X = x)$

For example,

$E(20X) = \sum 20x\,P(X = x)$
$E(X^2) = \sum x^2\,P(X = x)$
$E(X + 5) = \sum (x + 5)\,P(X = x)$

Example 2

The random variable $X$ has the following distribution for $x = 1, 2, 3, 4$.

x          1      2      3      4
P(X = x)   0.02   0.35   0.53   0.10

Calculate:

a) $E(X)$
b) $E(5X - 3)$
c) $E(X^2)$
d) $6E(X) + 8$
e) $E(5X^2 + 2)$

Solution

a) $E(X) = \sum x\,P(X = x) = 1(0.02) + 2(0.35) + 3(0.53) + 4(0.10) = 0.02 + 0.70 + 1.59 + 0.40 = 2.71$

b) $E(5X - 3) = 5E(X) - 3 = 5[1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)] - 3 = 5(2.71) - 3 = 13.55 - 3 = 10.55$

c) $E(X^2) = \sum x^2\,P(X = x) = 1^2(0.02) + 2^2(0.35) + 3^2(0.53) + 4^2(0.10) = 0.02 + 1.40 + 4.77 + 1.60 = 7.79$

d) $6E(X) + 8 = 6\sum x\,P(X = x) + 8 = 6(2.71) + 8 = 16.26 + 8 = 24.26$

e) $E(5X^2 + 2) = 5E(X^2) + 2 = 5\sum x^2\,P(X = x) + 2 = 5(7.79) + 2 = 40.95$

In general, the following results hold when X is a discrete random variable.

1) $E(a) = a$, where $a$ is any constant.

2) $E(aX) = aE(X)$, where $a$ is any constant.

3) $E(aX + b) = aE(X) + b$, where $a$ and $b$ are any constants.

4) $E[f_1(X) + f_2(X)] = E[f_1(X)] + E[f_2(X)]$, where $f_1$ and $f_2$ are functions of $X$.
Variance, Var (x)

As for the variance, the following results are useful.

1) $\mathrm{Var}(a) = 0$, where $a$ is any constant.

2) $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$, where $a$ is any constant.

3) $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$, where $a$ and $b$ are any constants.

Example 3

For the data in Example 2, calculate the following:

a) $\mathrm{Var}(5X - 3)$
b) $\mathrm{Var}(4X)$
c) $\mathrm{Var}(3X + 2)$

Solution

a) $\mathrm{Var}(5X - 3) = 25\,\mathrm{Var}(X)$

We will need $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$.

$E(X) = \sum x\,P(X = x) = 2.71$

$E(X^2) = \sum x^2\,P(X = x) = 7.79$

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 7.79 - (2.71)^2 = 0.4459$

Therefore $\mathrm{Var}(5X - 3) = 25\,\mathrm{Var}(X) = 25(0.4459) = 11.1475$

b) $\mathrm{Var}(4X) = 16\,\mathrm{Var}(X) = 16(0.4459) = 7.1344$

c) $\mathrm{Var}(3X + 2) = 9\,\mathrm{Var}(X) = 9(0.4459) = 4.0131$
Example 4
A risky investment involves paying K300,000 that will return K2,700,000 (for a net profit of K2,400,000) with probability 0.3 or K0.00 (for a net loss of K300,000) with probability 0.7. What is your expected net profit from this investment?

Solution
x P(x)
2,400,000 0.3
-300,000 0.7

(Note that a loss is treated as a negative profit.)


Then $E(X) = \sum x\,P(x) = 2{,}400{,}000(0.3) + (-300{,}000)(0.7) = 720{,}000 - 210{,}000 = 510{,}000$

Your expected net profit on an investment of this kind is K510,000. If you were to make a very large number of investments, some would result in a net profit of K2,400,000 and others would result in a net loss of K300,000. However, in the long run, your average net profit per investment would be K510,000.

5.3 The Binomial Distribution

The Binomial distribution, in which there are two possible outcomes on each
experimental trial, is undoubtedly the most widely applied probability distribution
of a discrete random variable. It has been used to describe a large variety of
processes in business and the social sciences, as well as other areas. The Bernoulli process, named after James Bernoulli (1654–1705), gives rise to the Binomial distribution.

The Bernoulli process has the following characteristics.

a) On each trial, there are two mutually exclusive possible outcomes, which
are referred to as “success” and “failure”. In somewhat different language, the sample space of possible outcomes on each experimental trial is S = {failure, success}.

b) The probability of a success will be denoted by $p$, and $p$ remains constant from trial to trial. The probability of a failure will be denoted by $q$, where $q$ is always equal to $1 - p$.

c) The trials are independent. That is, the outcomes on any given trial or sequence of trials do not affect the outcomes on subsequent trials.

Suppose we toss a coin 3 times; then we may treat each toss as one Bernoulli trial. The possible outcomes on any particular trial are a head and a tail. Assume that the appearance of a head is a success. Any outcome may be labelled the 'success'. For example, we may choose to refer to the appearance of a defective item in a production process as a success, or, if a series of births is treated as a Bernoulli process, the appearance of a female (or male) may be classified as a success.

Consider the experiment of tossing a fair coin three times. The possible sequences of outcomes are

HTH, HHH, HHT, THH, TTT, THT, TTH, HTT

Since the probabilities of a success and a failure on a given trial are, respectively, $p$ and $q$, the probability of an outcome such as $\{HTH\}$ is $pqp = p^2 q$, where $p$ is the probability of observing a "head" and $q$ is the probability of observing a "tail".

Outcome    Probability
HTH        $pqp = p^2 q$
HHH        $ppp = p^3$
HHT        $ppq = p^2 q$
THH        $qpp = p^2 q$
THT        $qpq = q^2 p$
TTT        $qqq = q^3$
TTH        $qqp = q^2 p$
HTT        $pqq = pq^2$

We can obtain the number of such sequences from the formula for the number of combinations of $n$ objects taken $x$ at a time. Thus the number of possible sequences in which two heads can occur is $\binom{3}{2}$.

Thus

$C(n, x) = \dfrac{n!}{x!(n - x)!}$

$C(3, 2) = \dfrac{3!}{2!\,1!} = 3$

These are the events {HTH}, {HHT}, {THH}

Therefore the probability of exactly 2 heads is $P(X = 2) = C(3, 2)\,q\,p^2$.

In the case of the fair coin, we assign a probability of $\tfrac{1}{2}$ to $p$ and $\tfrac{1}{2}$ to $q$. Hence

$P(X = 2) = C(3, 2)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^2 = \tfrac{3}{8}$

This result may be generalized to obtain the probability of exactly $x$ successes in $n$ trials of a Bernoulli process. Let us assume $n - x$ failures occurred, followed by $x$ successes, in that order. We may then represent this sequence as

$\underbrace{qq \cdots q}_{n - x \text{ failures}}\ \underbrace{pp \cdots p}_{x \text{ successes}}$

The probability of this particular sequence is $q^{n-x} p^x$. The number of possible sequences of $n$ trials resulting in exactly $x$ successes is $\binom{n}{x}$.

Therefore, the probability of obtaining $x$ successes in $n$ trials of a Bernoulli process is given by

$f(x) = \binom{n}{x} q^{n-x} p^x \quad \text{for } x = 0, 1, 2, \ldots, n$

If we denote by $X$ the random variable "number of successes in these $n$ trials", then

$f(x) = P(X = x)$

The fact that this is a probability distribution is verified by noting the following conditions:

1) $f(x) \ge 0$ for all real values of $x$

2) $\sum_x f(x) = 1$

Therefore, the term binomial probability distribution, or simply binomial distribution, is


usually used to refer to the probability distribution resulting from a Bernoulli process.

In problems where the assumptions of a Bernoulli process are met, we can obtain the probabilities of zero, one, or more successes in $n$ trials from the respective terms of the binomial expansion $(q + p)^n$, where $q$ and $p$ denote the probabilities of failure and success on a single trial and $n$ is the number of trials.

Example 5

The tossing of a fair coin 3 times was used earlier as an example of a Bernoulli process.
Compute the probabilities of all possible numbers of heads; this establishes a particular binomial distribution.

Solution

This problem is an application of the binomial distribution with $p = \tfrac{1}{2}$ and $n = 3$. Letting $X$ represent the random variable "number of heads", the probability distribution is as follows:

X (number of heads)    P(x)
0                      $\binom{3}{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8}$
1                      $\binom{3}{1}\left(\tfrac{1}{2}\right)^1\left(\tfrac{1}{2}\right)^2 = \tfrac{3}{8}$
2                      $\binom{3}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^1 = \tfrac{3}{8}$
3                      $\binom{3}{3}\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right)^0 = \tfrac{1}{8}$

Example 6

A machine that produces stampings for car engines is not working properly and is producing 15% defectives. The defective and non-defective stampings proceed from the machine in a random manner. If 4 stampings are randomly collected, find the probability that 2 of them are defective.

Solution

Let $p = 0.15$ be the probability that a single stamping will be defective and let $X$ equal the number of defectives in $n = 4$ trials. Then

$q = 1 - p = 1 - 0.15 = 0.85$, and

$P(x) = \binom{n}{x} p^x q^{n-x} = \binom{4}{x}(0.15)^x (0.85)^{4-x} = \dfrac{4!}{x!(4 - x)!}(0.15)^x (0.85)^{4-x}, \qquad x = 0, 1, 2, 3, 4$

Therefore, to find the probability of $x = 2$ defectives in a sample of $n = 4$, substitute $x = 2$ into the formula for $P(x)$ to obtain

$P(2) = \dfrac{4!}{2!(4 - 2)!}(0.15)^2 (0.85)^2 = 6(0.01625625) = 0.0975375 \approx 0.0975$

The mean, variance and standard deviation of a Binomial random variable are given by:

Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$

To calculate the values of $\mu$ and $\sigma$ in Example 6, substitute $n = 4$ and $p = 0.15$ into these formulas:

$\mu = np = 4(0.15) = 0.60$
$\sigma = \sqrt{npq} = \sqrt{(4)(0.15)(0.85)} = \sqrt{0.51} \approx 0.714$

Example 7

Payani Serenje owns 5 stocks. The probability that each stock will rise in price is 0.6.
What is the probability that three out of the five stocks will rise in price?

Solution

n  5  0.6, q  1  P  0.4

Let x be the number of stocks, then

P ( X  3)  (5,3)(0.6)3 (0.4) 2
5!
 .(0.216)(0.16)
3!2!
(5)(4)
 (0.216)(0.16)
2
 0.3456
 0.346

From the tables n = 5, P = 0 .6

79
P(3)  P( X  3)  P( X  2)  .663  .317  0.34

5.4 The Poisson Distribution

The Poisson distribution is named after the French physicist and mathematician Siméon-Denis Poisson, who worked in the early 1800s. The Poisson distribution is a discrete probability distribution with the following formula:

$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \qquad \text{for } x = 0, 1, 2, \ldots$

where $P(x)$ is the probability that a variable with a Poisson distribution equals $x$, $\lambda$ is the mean or expected value of the Poisson distribution, and $e \approx 2.718$ is the base of the natural logarithms.

One reason why the Poisson distribution is important in statistics is that it can be used as an approximation to the binomial distribution. If $n$ (the number of trials) is large and $p$ (the probability of success) is small, the binomial probability can be approximated by the Poisson distribution with $\lambda = np$. Experience indicates that the approximation is adequate for most practical purposes if $n$ is at least 20 and $p$ is no greater than 0.05.

The Poisson distribution has been used to describe the probability function of situations such as:

1) Product demand
2) Demand for service
3) Number of telephone calls that come through a switchboard
4) Number of death claims per day received by an insurance company
5) Number of breakdowns of an electronic computer per month

All the preceding have two elements in common:

1) The given occurrence can be described in terms of a discrete random variable, which takes on the values 0, 1, 2, and so forth.

2) There is some rate that characterizes the process producing the outcome. The rate is the number of occurrences per interval of time or space.

For instance, product demand can be characterized by the number of units purchased in a
specified period. Product demand may be viewed as a process that produces random
occurrences in continuous time.

The characteristics of a Poisson distribution are as follows:

1) The experiment consists of counting the number of times a particular event occurs during a given unit of time, or in a given area or volume (or any other unit of measurement).

2) The probability that an event occurs in a given unit of time, area, or volume is independent of the number that occur in other units.

Note that the most important difference between the Binomial and the Poisson distributions is that in the Binomial distribution we find the probability of a number of successes in $n$ trials, whereas for the Poisson distribution we find the probability of a number of successes per unit of time (or space).
Example 8

Suppose the random variable $X$, the number of a company's absent employees on Tuesdays, has (approximately) a Poisson probability distribution. Assume that the average number of Tuesday absentees is 3.4.

a) Find the mean and standard deviation of x , the number of absent employees on
Tuesday.

b) Find the probability that exactly 3 employees are absent on a given Tuesday.

c) Find the probability that at least two employees are absent on a Tuesday.

Solution

a) The mean and variance of a Poisson distribution are both equal to $\lambda$. Thus for this example

$\mu = \lambda = 3.4, \qquad \sigma^2 = 3.4$

Therefore the standard deviation is

$\sigma = \sqrt{3.4} \approx 1.84$

b) We want the probability that exactly three employees are absent on Tuesday. The probability distribution for $x$ is

$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$

Then $\lambda = 3.4$, $x = 3$, and $e^{-3.4} = 0.033373$ (from Table 2). Thus

$P(3) = \dfrac{(3.4)^3 e^{-3.4}}{3!} = \dfrac{(3.4)^3 (0.033373)}{6} = 0.2186$

c) To find the probability that at least two employees are absent on Tuesday, we need to find

$P(X \ge 2) = P(2) + P(3) + \cdots = \sum_{x = 2}^{\infty} P(x)$

Alternatively, we could use the complementary event:

$P(X \ge 2) = 1 - P(X \le 1) = 1 - [P(0) + P(1)]$
$= 1 - \left[\dfrac{(3.4)^0 e^{-3.4}}{0!} + \dfrac{(3.4)^1 e^{-3.4}}{1!}\right]$
$= 1 - [0.033373 + (3.4)(0.033373)]$
$= 1 - 0.1468412 = 0.8531588 \approx 0.8532$

Example 9

On Saturdays at Southdown, a small airport in Kalulushi, airplanes arrive at an average of


3 for the one hour period 13 00 hours to 14 00 hours. If these arrivals are distributed
according to the Poisson probability distribution, what are the probabilities that:

a) Exactly zero airplanes will arrive between 13 00 hours to 14 00 hours next


Saturday?

b) Either one or two airplanes will arrive between 13.00 hours and 14 00 hours next
Saturday?

c) A total of exactly two airplanes will arrive between 13 00 hrs and 14 00 hrs
during the next three Saturdays?

Solution

a)  = 3 and we let X be the number of arrivals during the specified time period.

30 e. 3
P(0)   0.049787068
0!
 0.0498

(From the table, we have 0.049787).

b) P( X  1 or X  2) P( X  1)P( X  2)
31 e 3 32 e 3
 
1! 2!
9
 e  3 (3  )
2
15
 ( )(0.04978068)
2
 0.37340301
 0.3734.

c) A total of exactly two arrivals in three Saturdays during the period 13 00 hours to
14 00 hours can be obtained. For example by having two arrivals on the first day,
none on the second day, and none on the third day during the specified one-hour
period.

The total number of ways in which the event in question can occur is shown in the
table below.

Number of Arrivals
Saturday Day 1 Saturday Day 2 Saturday Day 3
2 0 0
0 2 0
0 0 2
1 1 0
1 0 1
0 1 1

The probability of obtaining a total of exactly 2 arrivals in 3 Saturdays is therefore

$P = 3[P(X = 2)][P(X = 0)]^2 + 3[P(X = 1)]^2[P(X = 0)]$

$= 3\left(\dfrac{3^2 e^{-3}}{2!}\right)\left(\dfrac{3^0 e^{-3}}{0!}\right)^2 + 3\left(\dfrac{3^1 e^{-3}}{1!}\right)^2\left(\dfrac{3^0 e^{-3}}{0!}\right)$

$= \dfrac{27}{2}e^{-9} + 27e^{-9} = \dfrac{81}{2}e^{-9} = \dfrac{81}{2}(0.000123)$

$\approx 0.0049815 \approx 0.005$

5.5 Continuous Random Variables

Probability distributions of continuous random variables are also important in statistical theory. They are theoretical representations of a continuous random variable such as the time taken in minutes to do some work, or the mass in grammes of a bag of salt.

A continuous random variable is specified by its probability density function, written $f(x)$, where $f(x) \ge 0$ throughout the range of values for which $x$ is defined. The probability density function (p.d.f.) can be represented by a curve, and probabilities are given by areas under the curve.

For a continuous random variable $X$ that assumes values in the interval $a \le x \le b$,

$P(a \le X \le b) = \int_a^b f(x)\,dx$

assuming the integral exists. Similar to the requirements for a discrete probability distribution, we require $f(x) \ge 0$ and $\int_a^b f(x)\,dx = 1$.

If $X$ is a continuous random variable with p.d.f. $f(x)$, then

$\mathrm{Var}(X) = \int_a^b x^2 f(x)\,dx - \mu^2, \quad \text{where } \mu = E(X) = \int_a^b x f(x)\,dx$

The standard deviation of $X$ is often written as $\sigma = \sqrt{\mathrm{Var}(X)}$.
5.6 The Normal Distribution

The normal distribution plays a central role in statistical theory and practice,
particularly in the area of statistical inference.

An important characteristic of the normal distribution is that we need to know only the mean and standard deviation to compute the entire distribution. The normal probability distribution is defined by the equation

$f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

The normal distribution is perfectly symmetric about its mean $\mu$. Computing the area over intervals under the normal probability distribution is a difficult task. As a result, we will use the computed areas listed in Table 3.

Example 1

Suppose you have a normal random variable $X$ with $\mu = 50$ and $\sigma = 15$. Find the probability that $X$ will fall within the interval $30 \le X \le 70$.

Solution

We compute the Z-score (or standard score) for the measurement $x$. The standard score is defined by

$Z = \dfrac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \dfrac{x - \mu}{\sigma}$

Thus $Z = \dfrac{30 - 50}{15} = -1.33$

Because $x = 30$ lies to the left of the mean, the corresponding Z-score is negative and of the same numerical value as the Z-score corresponding to $x = 70$:

$Z = \dfrac{70 - 50}{15} = \dfrac{20}{15} = 1.33$

[Figure: normal frequency function with $\mu = 50$ and $\sigma = 15$, showing the area between 30 and 70.]

To find the area corresponding to a Z-score of 1.33, we first locate the value 1.3 in the left-hand column. Since this column lists Z values to one decimal place only, we refer to the top row of the table to get the second decimal place, 0.03. Finally, we locate the number where the row labeled Z = 1.3 and the column labeled 0.03 meet. This number represents the area between the mean $\mu$ and the measurement that has a Z-score of 1.33:

A = 0.4082

That is, the probability that $x$ will fall between 50 and 70 is 0.4082. By symmetry, the area between 30 and 50 is also 0.4082, so the required probability is 2(0.4082) = 0.8164.
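Instead of a table lookup, the normal probability can be computed from the error function, which gives the normal CDF in closed form. A Python sketch for Example 1 (μ = 50 and σ = 15 taken from the example):

from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x) for a normal random variable
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 50, 15
print(normal_cdf(70, mu, sigma) - normal_cdf(30, mu, sigma))
# about 0.8176; the table answer 0.8164 comes from rounding Z to 1.33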

Example 2

Use Table 1 to determine the area to the right of the Z-score 1.64 for the standard normal distribution, i.e., find $P(Z > 1.64)$.

Solution

[Figure: standard normal distribution, $\mu = 0$, $\sigma = 1$, with the area to the right of Z = 1.64 shaded.]
The probability that a normal random variable will be more than 1.64 standard deviations to the right of its mean is indicated in the figure above. Because the normal distribution is symmetric, half of the total probability (0.5) lies to the right of the mean and half to the left. Therefore, the desired probability is

$P(Z > 1.64) = 0.5 - A$

where $A$ is the area between $\mu = 0$ and $Z = 1.64$, as shown in the figure.

Referring to Table 1, the area $A$ corresponding to $Z = 1.64$ is 0.4495, so

$P(Z > 1.64) = 0.5 - A = 0.5 - 0.4495 = 0.0505$

Example 3

Find the probability that the value of the standard normal variable will be between –1.23
and +1.14.

Solution

Table 1 shows that the area under the standard normal curve between 0 and 1.23 is 0.3907, so the area between 0 and −1.23 must also be 0.3907. Table 1 shows that the area between 0 and 1.14 is 0.3729. Thus, the area between −1.23 and +1.14 equals 0.3907 + 0.3729 = 0.7636, which means that the probability we want equals 0.7636.

[Figure: standard normal curve with the area between −1.23 and +1.14 shaded.]

Example 4

Find the probability that the value of the standard normal variable will be between 0.43
and 1.55.

Solution

[Figure: standard normal curve with the area between 0.43 and 1.55 shaded.]

From Table 1, the area between 0 and 1.55 is 0.4394 and that between 0 and 0.43 is 0.1664. Therefore the area between 0.43 and 1.55 is 0.4394 − 0.1664 = 0.2730.

The Normal Distribution As An Approximation To The Binomial Distribution

Normal Approximation to the Binomial Distribution. If $n$ (the number of trials) is large and $p$ (the probability of success) is not too close to 0 or 1, the probability distribution of the number of successes occurring in $n$ Bernoulli trials can be approximated by a normal distribution. Experience indicates that the approximation is fairly accurate as long as $np \ge 5$ when $p \le \tfrac{1}{2}$ and $n(1 - p) \ge 5$ when $p > \tfrac{1}{2}$.

Example 5

The probability that a machine will be down for repairs next week is $\tfrac{1}{2}$. A firm has 100 such machines, and whether one machine is down is statistically independent of whether another is down. What is the probability that at least 60 machines will be down?

Solution

The number of machines down for repair has a binomial distribution with mean equal to $100 \times \tfrac{1}{2}$, or 50, and standard deviation equal to $\sqrt{100 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 5$. Because of the continuity correction, the probability that the number down for repairs is 60 or more can be approximated by the probability that the value of a normal variable with mean 50 and standard deviation 5 exceeds 59.5. The value of the standard normal variable corresponding to 59.5 is $(59.5 - 50)/5$, or 1.9. Table 3 shows that the area under the standard normal curve between zero and 1.9 is 0.4713, so the area to the right of 1.9 must equal 0.5000 − 0.4713 = 0.0287. This is the probability that at least 60 machines will be down for repair.
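The sketch below (Python, standard library only) reproduces Example 5 by comparing the exact binomial tail P(X ≥ 60) for n = 100 and p = 1/2 with the normal approximation using the continuity correction described above:

from math import comb, erf, sqrt

n, p = 100, 0.5

# Exact binomial tail P(X >= 60)
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(60, n + 1))

# Normal approximation with continuity correction: P(Normal(50, 5) > 59.5)
mu, sigma = n * p, sqrt(n * p * (1 - p))
z = (59.5 - mu) / sigma
approx = 0.5 * (1 - erf(z / sqrt(2)))

print(exact, approx)   # about 0.0284 and 0.0287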

Learning Objectives

After working through this Chapter, you should be able to:

 Give the formal definition of a random variable, and distinguish between a


random variable and the values it takes.

 Explain the difference between continuous and discrete random variables.

 Discuss such distributions as Binomial, Poisson, and Normal and calculate


probabilities of events for such random variables.

 Find the mean and the variance of the binomial, Poisson and Normal distributions.

Sample Examination Questions

1. a) It is estimated that 75% of a grapefruit crop is good, the other 25% have
rotten centers that cannot be detected unless the grapefruit is cut open.
The grapefruit are sold in sacks of 6. Let r be the number of good
grapefruit in the sack.

i) Make a histogram of the probability distribution of r.

ii) What is the probability of getting no more than one bad grapefruit
in a sack?

iii) What is the probability of getting at least one good grapefruit in a sack?

iv) What is the expected number of good grapefruit in a sack?

v) What is the standard deviation of the r probability distribution?

b) Let x have a normal distribution with $\mu = 10$ and $\sigma = 2$. Find the probability that an x value selected at random from the distribution is between 11 and 14.

2. a) In a lottery, you pay K12,500 to choose a number (integer) between 0 and 9999, inclusive. If the number is drawn, you win K12,500,000. What is your expected gain (or loss) per play?

b) A large hotel knows that on average 2% of its customers require a special


diet for medical reasons. It is hosting a conference for 500 people.

i) Which probability distribution would you suggest for calculating


the exact probability that no customer at the conference will
require a special diet? Calculate this probability.

ii) Which probability distribution do you suggest is an approximation


to this and why? Calculate an approximate probability that no
customers require a special diet.

iii) Compare your answers to (i) and (ii).

iv) From past records the hotel knows that 0.2% of its customers will
require medical attention while staying in the hotel. Calculate the
exact and approximate probability that no customer out the 500
will require medical attention while attending the conference. Is
this approximation better or worse than the approximation used in
(ii)? Why?

3. a) The Table below shows the probabilities for the number of complaints
received each day by a newspaper agency from customers not receiving a
paper.

No. of complaints 8 9 10 11 12
Probability .35 .42 .18 .03 .02

i) Find the mean and standard deviation of the number of complaints.

ii) The agency states the cost (in kwachas) of daily complaints to be C
= 600 + 300x, where x is the number of complaints. Find the mean
and standard deviation of the cost of daily complaints.

b) A writer has prepared to submit six articles for publication. The probability
of any article being accepted is 0.20. Assuming independence, find the
probability that the writer will have

i) exactly one article accepted.


ii) At least two articles accepted
iii) No more than three articles accepted
iv) At most two articles accepted.

4. a) A Toyota dealer wishes to know how many citations to order for the
coming month. Estimated demand is normally distributed, with a standard
deviation of 20 and a mean of 120.

i) What is the probability that he will need more than 160?

ii) What is the probability that he will need less than 90?

b) A client wishes to know what price he might be able to get for a business
property. The realtor estimates that a sale price of K600 million for that property would be exceeded no more than 5% of the time. A price of at least K420 million should be obtained at least 90% of the time. Assuming the distribution of sale prices to be normal, answer the following questions:

i) What are  and  for this distribution?


ii) What is the probability of a sale price greater than K540 million, less than K640 million, and between K540 million and K600 million?

5. a) Which of the following are continuous variables, and which are discrete
variables.

i) Number of traffic fatalities per year in the town of Livingstone.

ii) Distance a ball travels after being kicked by a soccer player.

iii) Time required to drive from home to campus on any given day.

iv) Number of cars in Kitwe on any given day.

v) Your weight before breakfast each morning.

b) The ABCD Mother-in-law sociologists say that 80% of married women


claim that their husbands’ mothers are the biggest bones of contention in
their marriages (sex and money are lower-rated areas of contention).
Suppose that five married women are having lunch together one
afternoon, what is the probability that:

i) All of them dislike their mother-in-law


ii) None of them dislike her mother-in-law?
iii) At least four of them dislike their mother-in-law?
iv) No more than three of them dislike their mother-in – law.

c) The Mulenga Café has found that about 6% of the parties who make reservations don't show up. If 90 party reservations have been made, how many can be expected to show up? Find the standard deviation of this distribution.

6. a) The mean and standard deviation on an examination are 85 and 15


respectively. Find the scores in standard units of students receiving grades of:

i) 65

ii) 89

b) Determine the probabilities

i) $P(Z \ge 2.12)$
ii) $P(-1.6 \le Z \le 1.13)$

where Z is assumed to be normal with mean 0 and variance 1.

c) What is the probability of obtaining at least 1280 heads if a coin is tossed
2500 times and heads and tails are equally likely?

d) The side effects of a certain drug cause discomfort to only a few patients.
The probability that any individual will suffer from the side effects is
0.005. If the drug is given to 35 000 patients, what is the probability that
three (3) will suffer side effects.

7. a) The customer service center in a large Luksa department store has


determined that the amount of time spent with a customer with a
complaint is normally distributed with a mean of 9.3 minutes and a
standard deviation of 2.5 minutes. What is the probability that for a
randomly chosen customer with a complaint, the amount of time spent
resolving the complaint will be:

i) less than 10 minutes?

ii) more than 5 minutes

iii) between 8 and 15 minutes.

b) A car rental company has determined that the probability a car will need
service work in any given month is 0.25. The company has 850 cars.

i) What is the probability that more than 150 cars will require service
work in a particular month?

ii) What is the probability that fewer than 180 cars will need service work in a given month? (Give reasons for the method used to calculate the probabilities in (i) and (ii).)

c) A contractor estimates the probabilities for the number of days required to


complete a certain type of construction project as follows.

Time (days) 1 2 3 4 5
Probability .04 .21 .34 .31 .10

i) What is the probability that a randomly chosen project will take


less than 3 days to complete.

ii) Find the expected time to complete a project.

iii) Find the standard deviation of time required to complete a project.

iv) The Contractor’s project cost is made up of two parts – a fixed


cost of K100,000,000 plus K10,000,000 for each day taken to
complete the project. Find the standard deviation of total project
costs.

CHAPTER 6

SAMPLING AND SAMPLING DISTRIBUTION

Reading

Newbold Chapter 6

Wonnacott and Wonnacott Chapter 6

Tailoka Frank P Chapter 10

James T Mc Clave and P George Benson Chapter 7

Introductory Comments

We now start on the work that defines the subject Statistics as a different and unique
subject. The idea of sampling and sampling distribution for a statistic like the mean must
be clearly understood by all users of statistics. This is not an easy Chapter to understand.

6. Sampling Theory

Sampling and Sampling Distribution

6.1 Sampling

If we draw an object from a box, we have the choice of replacing or not replacing the object in the box before we draw again. In the first case a particular object can come up again and again, whereas in the second it can come up only once. Sampling where each member of a population may be chosen more than once is called sampling with replacement, while sampling where each member cannot be chosen more than once is called sampling without replacement.

Random Samples. Random Numbers

Clearly the reliability of conclusions drawn concerning a population depends on


whether the sample is properly chosen so as to represent the population
sufficiently well, and one of the important problems of statistical inference is just
how to choose a sample.

The way to do this for a finite population is to make sure that each member of the population has the same chance of being in the sample; such a sample is often called a random sample. Random sampling can be accomplished for relatively small populations by drawing lots or, equivalently, by using a table of random numbers specially constructed for such purposes.

Because inference from sample to population cannot be certain we must use the
language of probability in any statement of conclusions.

6.2 Sampling Distributions

As we have seen, a sample statistic that is computed from X 1 , . . . , X n is a


function of these random variables and is therefore itself a random variable. The
probability distribution of a sample statistic is often called the sampling
distribution of the statistic.

Alternatively, we can consider all possible sample of size n that can be drawn
from the population, and for each sample we compute the statistic. In this manner
we obtain the distribution of the statistic, which is its sampling distribution.

For a sampling distribution, we can of course compute a mean, variance, standard


deviation, etc. The standard deviation is sometimes also called the standard error.

The Sample Mean

Let $X_1, X_2, \ldots, X_n$ denote the independent, identically distributed random variables for a random sample of size $n$ as described above. Then the mean of the sample, or sample mean, is a random variable defined by

$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} \qquad (1)$

If $x_1, x_2, \ldots, x_n$ denote the values obtained in a particular sample of size $n$, then the mean for that sample is denoted by

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} \qquad (2)$

Sampling Distributions of Means

Let $f(x)$ be the probability distribution of some given population from which we draw a sample of size $n$. Then it is natural to look for the probability distribution of the sample statistic $\bar{x}$, which is called the sampling distribution for the sample mean, or the sampling distribution of the mean. The following theorems are important in this connection.

Theorem 6.1

The mean of the sampling distribution of means, denoted by $\mu_{\bar{x}}$, is given by

$\mu_{\bar{x}} = \mu \qquad (3)$

where $\mu$ is the mean of the population. Theorem 6.1 states that the expected value of the sample mean is the population mean.

Theorem 6.2

If a population is infinite and the sampling is random, or if the population is finite and sampling is with replacement, then the variance of the sampling distribution of means, denoted by $\sigma_{\bar{x}}^2$, is given by

$\sigma_{\bar{x}}^2 = E(\bar{x} - \mu)^2 = \dfrac{\sigma^2}{n} \qquad (4)$

Theorem 6.3

If the population is of size $N$, if sampling is without replacement, and if the sample size is $n \le N$, then the previous equation is replaced by

$\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}\left(\dfrac{N - n}{N - 1}\right) \qquad (5)$

while $\mu_{\bar{x}} = \mu$ as in Theorem 6.1.

Note that Theorem 6.3 reduces to Theorem 6.2 as $N \to \infty$.
Theorem 6.4

If the population from which samples are taken is normally distributed with mean $\mu$ and variance $\sigma^2$, then the sample mean is normally distributed with mean $\mu$ and variance $\sigma^2/n$.

Theorem 6.5

Suppose that the population from which samples are taken has a probability distribution with mean $\mu$ and variance $\sigma^2$ that is not necessarily a normal distribution. Then the standardized variable associated with $\bar{x}$, given by

$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad (6)$

is asymptotically normal, i.e.,

$\lim_{n \to \infty} P(Z \le z) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du \qquad (7)$

Theorem 6.5 is a consequence of the Central Limit Theorem. It is assumed here that the population is infinite or that sampling is with replacement. Otherwise, the above is correct if we replace $\sigma^2/n$ in Theorem 6.5 by $\sigma_{\bar{x}}^2$ as given in Theorem 6.3.

Example 1.0

Five hundred ball bearings have a mean weight of 5.02 kg and a standard deviation of 0.30 kg. Find the probability that a random sample of 100 ball bearings chosen from this group will have a combined weight of more than 510 kg.

For the sampling distribution of means, $\mu_{\bar{x}} = \mu = 5.02$ kg, and

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}} = \dfrac{0.30}{\sqrt{100}}\sqrt{\dfrac{500 - 100}{500 - 1}} = 0.027$

The combined weight will exceed 510 kg if the mean weight of the 100 bearings exceeds 5.10 kg.

In standard units, 5.10 corresponds to $Z = \dfrac{5.10 - 5.02}{0.027} = 2.96$

The required probability is the area to the right of $z = 2.96$, as shown in Figure 6.1.

[Figure 6.1: standard normal curve with the area to the right of z = 2.96 shaded.]

The probability is 0.5 − 0.4985 = 0.0015. Therefore, there are only 3 chances in 2000 of picking a sample of 100 ball bearings with a combined weight exceeding 510 kg.
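A Python sketch of Example 1.0 (all figures assumed from the example), applying the finite-population correction and the normal tail area:

from math import erf, sqrt

N, n = 500, 100
mu, sigma = 5.02, 0.30

# Standard error of the sample mean with the finite-population correction
se = (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))

z = (5.10 - mu) / se
tail = 0.5 * (1 - erf(z / sqrt(2)))   # P(sample mean > 5.10)
print(se, z, tail)
# about 0.0269, 2.98 and 0.0015 (the text rounds the standard error to 0.027, giving z = 2.96)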

Sampling Distribution of Proportions

Suppose that a population is infinite and binomially distributed, with $p$ and $q = 1 - p$ being the respective probabilities that any given member exhibits or does not exhibit a certain property. For example, the population may be all possible tosses of a fair coin, in which the probability of the event 'heads' is $p = \tfrac{1}{2}$.

Consider all possible samples of size $n$ drawn from this population, and for each sample determine the statistic $\hat{p}$, the proportion of successes. In the case of the coin, $\hat{p}$ would be the proportion of heads turning up in $n$ tosses. Then we obtain a sampling distribution whose mean $\mu_{\hat{p}}$ and standard deviation $\sigma_{\hat{p}}$ are given by

$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{p(1 - p)}{n}} \qquad (8)$

For large values of $n$ ($n \ge 30$) the sampling distribution is very nearly a normal distribution, as seen from Theorem 6.5. For finite populations in which sampling is without replacement, the expression for $\sigma_{\hat{p}}$ given above is replaced by the corresponding expression from Theorem 6.3 with $\sigma^2 = pq$, i.e. $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}\sqrt{\dfrac{N - n}{N - 1}}$.

Example 2.0

A simple random sample of size 64 is selected from a population with $p = 0.30$.

(a) What is the expected value of $\hat{p}$?
(b) What is the standard deviation of $\hat{p}$?
(c) Show the sampling distribution of $\hat{p}$.
(d) What does the sampling distribution of $\hat{p}$ show?

Solution

(a) The expected value of $\hat{p}$ is $E(\hat{p}) = p = 0.30$.
(b) The standard deviation of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{0.3(1 - 0.3)}{64}} = \sqrt{0.00328125} = 0.0573$.
(c) The sampling distribution is approximately normal with $E(\hat{p}) = 0.30$ and $\sigma_{\hat{p}} = 0.0573$.
(d) It is the probability distribution of $\hat{p}$.
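A brief Python illustration of Example 2.0 (p = 0.30 and n = 64 assumed from the example); the last line uses the normal form of the sampling distribution to find the chance that the sample proportion falls within 0.05 of p:

from math import erf, sqrt

p, n = 0.30, 64
se = sqrt(p * (1 - p) / n)     # standard deviation of p_hat, about 0.0573

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(se)
print(normal_cdf(0.35, p, se) - normal_cdf(0.25, p, se))   # about 0.62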

Sampling Distribution of Differences and Sums

Suppose that we are given two populations. For each sample of size $n_1$ drawn from the first population, let us compute a statistic $X_1$. This yields a sampling distribution for $X_1$ whose mean and standard deviation we denote by $\mu_{X_1}$ and $\sigma_{X_1}$, respectively. Similarly, for each sample of size $n_2$ drawn from the second population, let us compute a statistic $X_2$ whose mean and standard deviation are $\mu_{X_2}$ and $\sigma_{X_2}$, respectively.

Taking all possible combinations of these samples from the two populations, we can obtain a distribution of the differences $X_1 - X_2$, which is called the sampling distribution of differences of the statistics. The mean and standard deviation of this sampling distribution, denoted respectively by $\mu_{X_1 - X_2}$ and $\sigma_{X_1 - X_2}$, are given by

$\mu_{X_1 - X_2} = \mu_{X_1} - \mu_{X_2}, \qquad \sigma_{X_1 - X_2} = \sqrt{\sigma_{X_1}^2 + \sigma_{X_2}^2} \qquad (9)$

provided that the samples chosen do not in any way depend on each other, i.e., the samples are independent (in other words, the random variables $X_1$ and $X_2$ are independent).
Similarly, for the sample means from two populations, denoted by $\bar{x}_1$ and $\bar{x}_2$ respectively, the sampling distribution of the differences of means is given, for infinite populations with means and standard deviations $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$ respectively, by

$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2 \qquad (10)$

and

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \qquad (11)$

Using Theorems 6.1 and 6.2, this result also holds for finite populations if sampling is done with replacement. The standardized variable

$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

in that case is very nearly normally distributed if $n_1$ and $n_2$ are large ($n_1, n_2 \ge 30$). Similar results can be obtained for finite populations in which sampling is without replacement by using Theorems 6.1 and 6.3.

Example 3.0

In the age of rising housing costs, comparisons are often made between costs in different areas of the country. In order to compare the average cost $\mu_1$ of a 3-bedroom, 2-bath home in Kitwe to the average cost $\mu_2$ of a similar home in Lusaka, independent random samples were taken of 190 housing costs in Kitwe and 120 housing costs in Lusaka. Describe the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$, the difference in sample mean housing costs in the two cities.

Solution

The mean of the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is $E(\bar{x}_1 - \bar{x}_2) = E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2$.

The variance of $(\bar{x}_1 - \bar{x}_2)$ is the sum of the variances of $\bar{x}_1$ and $\bar{x}_2$; thus

$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} = \dfrac{\sigma_1^2}{190} + \dfrac{\sigma_2^2}{120}$

where $\sigma_1^2$ and $\sigma_2^2$ represent the population variances of the costs of 3-bedroom, 2-bath homes in Kitwe and Lusaka, respectively. The standard deviation of the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is therefore $\sqrt{\dfrac{\sigma_1^2}{190} + \dfrac{\sigma_2^2}{120}}$.
Corresponding results can be obtained for the sampling distributions of differences of proportions from two binomially distributed populations with parameters $p_1, q_1$ and $p_2, q_2$. The mean and standard deviation of the difference of sample proportions are given by

$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \qquad (13)$

$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}} \qquad (14)$

Example 4.0

It has been found that 2% of the tools produced by a certain machine are defective. What is the probability that in a shipment of 400 such tools, 3% or more will prove defective?

$\mu_{\hat{p}} = p = 0.02, \qquad \sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{0.02(0.98)}{400}} = \dfrac{0.14}{20} = 0.007$

$P(\hat{p} \ge 0.03) = P\left(Z \ge \dfrac{0.03 - 0.02}{0.007}\right) = P(Z \ge 1.43) = 0.5000 - 0.4236 = 0.0764$

[Figure: standard normal curve with the area to the right of z = 1.43 shaded.]
Learning Objectives

After working through this Chapter, you should be able to:

 Give the formal definition of a random variable, and distinguish between a random variable and the values it takes.

 Explain the difference between continuous and discrete random variables.

 Discuss such distributions as the Binomial, Poisson, and Normal, and calculate probabilities of events for such random variables.

 Find the mean and the variance of the Binomial, Poisson and Normal distributions.

 Define the sampling distribution of the sample mean, the sample proportion and
their differences.

CHAPTER 7

ESTIMATION

Reading

Newbold Chapter 7

Wonnacott and Wonnacott Chapter 7

Tailoka Frank P Chapter 10

Introductory Comments

We need to know how the mean of the population is related to the sample mean. What characteristics must the sample mean have? We need to know whether the sample is likely to give us an estimate close to the population value. To tell us this, we use confidence intervals.

7. Estimation Theory

7.1 Unbiased Estimates and Efficient Estimates

A statistic is called an unbiased estimator of a population parameter if the mean or expectation of the statistic is equal to the parameter. The corresponding value of the statistic is then called an unbiased estimate of the parameter.

If the sampling distributions of two statistics have the same mean, the statistic with the smaller variance is called a more efficient estimator of the mean. The corresponding value of the efficient statistic is then called an efficient estimate. Clearly one would in practice prefer to have estimators that are both efficient and unbiased, but this is not always possible.

7.2 Point estimates and Interval Estimates

An estimate of a population parameter given by a single number is called a point estimate of the parameter. An estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate of the parameter.
Example 1.0

If we say that a distance is 34.5 km, we are giving a point estimate. If, on the other hand, we say that the distance is 34.5 ± 0.04 km, i.e., the distance lies between 34.46 and 34.54 km, we are giving an interval estimate.

A statement of the error or precision of an estimate is often called reliability.

7.3 Confidence Interval Estimates of Population Parameters

Let $\mu_S$ and $\sigma_S$ be the mean and standard deviation (standard error) of the sampling distribution of a statistic $S$. Then if the sampling distribution of $S$ is approximately normal (which we have seen is true for many statistics if the sample size $n \ge 30$), we can expect to find $S$ lying in the intervals $\mu_S - \sigma_S$ to $\mu_S + \sigma_S$, $\mu_S - 2\sigma_S$ to $\mu_S + 2\sigma_S$, or $\mu_S - 3\sigma_S$ to $\mu_S + 3\sigma_S$ about 68%, 95% and 99.7% of the time, respectively.

Equivalently, we can expect to find, or we can be confident of finding, $\mu_S$ in the intervals $S - \sigma_S$ to $S + \sigma_S$, $S - 2\sigma_S$ to $S + 2\sigma_S$, or $S - 3\sigma_S$ to $S + 3\sigma_S$ about 68%, 95% and 99.7% of the time, respectively. Because of this, we call these respective intervals the 68%, 95% and 99.7% confidence intervals for estimating $\mu_S$ (i.e., for estimating the population parameter, in the case of an unbiased $S$). The end numbers of these intervals ($S \pm \sigma_S$, $S \pm 2\sigma_S$, $S \pm 3\sigma_S$) are then called the 68%, 95% and 99.7% confidence limits.

Similarly, $S \pm 1.96\sigma_S$ and $S \pm 2.58\sigma_S$ are the 95% and 99% confidence limits for $\mu_S$. The percentage confidence is often called the confidence level. The numbers 1.96, 2.58, etc., in the confidence limits are called critical values and are denoted by $Z_c$. From confidence levels we can find critical values.

7.4 Confidence Intervals for Means

We shall see how to create confidence intervals for the mean of a population in two different cases. The first case is when we have a large sample size ($n \ge 30$), and the second case is when we have a smaller sample ($n < 30$) and the underlying population is normal.

Large samples ($n \ge 30$)
If the statistic $S$ is the sample mean $\bar{x}$, then the 95% and 99% confidence limits for estimation of the population mean $\mu$ are given by $\bar{x} \pm 1.96\sigma_{\bar{x}}$ and $\bar{x} \pm 2.58\sigma_{\bar{x}}$, respectively.

More generally, the confidence limits are given by $\bar{x} \pm Z_c\sigma_{\bar{x}}$, where $Z_c$ depends on the particular level of confidence desired. The confidence limits for the population mean are given by

$\bar{x} \pm Z_c \dfrac{\sigma}{\sqrt{n}} \qquad (1)$

in the case of sampling from an infinite population, or if sampling is done with replacement from a finite population, and by

$\bar{x} \pm Z_c \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}} \qquad (2)$

if sampling is done without replacement from a population of finite size $N$.

In general, the population standard deviation $\sigma$ is unknown, so to obtain the above confidence limits we use the sample estimate $\hat{S}$ or $S$.

Example 2.0

Find a 95% confidence interval estimate of the mean height of the 1546 male students at XYZ University by taking a sample of size 100. (Assume the mean of the sample, $\bar{x}$, is 67.45 cm and that the standard deviation of the sample, $\hat{S}$, is 2.93 cm.)

The 95% confidence limits are $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$

Using $\bar{x} = 67.45$ cm and $\hat{S} = 2.93$ cm as an estimate of $\sigma$, the confidence limits are

$67.45 \pm 1.96\left(\dfrac{2.93}{\sqrt{100}}\right)$, or $67.45 \pm 0.57$

Then the 95% confidence interval for the population mean $\mu$ is 66.88 to 68.02 cm, which can be denoted by $66.88 < \mu < 68.02$.

We can therefore say that the probability that the population mean height lies between 66.88 and 68.02 cm is about 95%. In symbols, we write $P(66.88 < \mu < 68.02) = 0.95$. This is equivalent to saying that we are 95% confident that the population mean (true mean) lies between 66.88 and 68.02 cm.
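A minimal Python sketch of Example 2.0 (the sample figures are assumed from the example):

from math import sqrt

n = 100
xbar, s = 67.45, 2.93     # sample mean and sample standard deviation
z = 1.96                  # critical value for 95% confidence

margin = z * s / sqrt(n)
print(xbar - margin, xbar + margin)   # about 66.88 and 68.02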

7.5 Small Samples ($n < 30$) and Population Normal

In this case we use the $t$ distribution to obtain confidence limits. For example, if $-t_{0.025}$ and $t_{0.025}$ are the values of $T$ for which 2.5% of the area lies in each tail of the $t$ distribution, then a 95% confidence interval for $T$ is given by

$-t_{0.025} < \dfrac{(\bar{x} - \mu)\sqrt{n}}{S} < t_{0.025} \qquad (3)$

from which we can see that $\mu$ can be estimated to lie in the interval

$\bar{x} - t_{0.025}\dfrac{S}{\sqrt{n}} < \mu < \bar{x} + t_{0.025}\dfrac{S}{\sqrt{n}} \qquad (4)$

with 95% confidence. In general, the confidence limits for population means are given by

$\bar{x} \pm t_c\dfrac{S}{\sqrt{n}} \qquad (5)$

where the $t_c$ values can be read from Table 2.


Example 3.0
The following data have been collected from a sample of nine items from
a normal population: 12, 9, 16, 20, 16, 23, 7, 8, and 10.
(a) What is the point estimate of the population mean?
(b) What is the point estimate of the population standard
deviation?
(c) What is the 90% confidence interval for the population
mean?
Solution

(a) The point estimate of the population mean is $\bar{x} = \dfrac{\sum x}{n} = \dfrac{121}{9} = 13.444$

(b) The point estimate of the population standard deviation is

$s = \sqrt{\dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\dfrac{1879 - \dfrac{(121)^2}{9}}{8}} = 5.615$

(c) We have $\bar{x} \pm t_{0.05,8}\dfrac{s}{\sqrt{n}} = 13.444 \pm 1.860\left(\dfrac{5.615}{\sqrt{9}}\right) = 13.444 \pm 3.4813$

Thus, the 90% confidence interval estimate of the population mean is 9.9627 to 16.9253.
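The same calculation in Python, using the sample data of Example 3.0 and the tabulated critical value t(0.05, 8) = 1.860 as assumed inputs:

from math import sqrt
from statistics import mean, stdev

data = [12, 9, 16, 20, 16, 23, 7, 8, 10]   # sample from Example 3.0
n = len(data)

xbar = mean(data)     # 13.444...
s = stdev(data)       # sample standard deviation, about 5.615

t = 1.860             # t critical value for 90% confidence with 8 degrees of freedom
margin = t * s / sqrt(n)
print(xbar - margin, xbar + margin)   # about 9.96 and 16.93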

7.6 Confidence Intervals for Proportions

Suppose that the statistic $S$ is the proportion of 'successes' in a sample of size $n \ge 30$ drawn from a binomial population in which $p$ is the proportion of successes (i.e. the probability of success). Then the confidence limits for $p$ are given by $\hat{p} \pm Z_c\sigma_{\hat{p}}$, where $\hat{p}$ denotes the proportion of successes in the sample of size $n$. Using the value of $\sigma_{\hat{p}}$ obtained in Chapter 6, we see that the confidence limits for the population proportion are given by

$\hat{p} \pm Z_c\sqrt{\dfrac{pq}{n}} = \hat{p} \pm Z_c\sqrt{\dfrac{p(1 - p)}{n}} \qquad (6)$

in the case where sampling is from an infinite population, or where sampling is done with replacement from a finite population. Similarly, the confidence limits are

$\hat{p} \pm Z_c\sqrt{\dfrac{pq}{n}}\sqrt{\dfrac{N - n}{N - 1}} \qquad (7)$

when sampling is done without replacement from a finite population of size $N$.

Note that these results are obtained from (1) and (2) by replacing $\bar{x}$ with $\hat{p}$ and $\sigma$ with $\sqrt{pq}$. To compute the above confidence limits, we use the sample estimate $\hat{p}$ for $p$.
Example 4.0

A sample poll of 100 voters chosen at random from all voters in a given district indicated that 55% of them were in favour of a particular candidate. Find the 99% confidence limits for the proportion of all voters in favour of this candidate.

The 99% confidence limits for the population proportion $p$ are

$\hat{p} \pm 2.58\sigma_{\hat{p}} = \hat{p} \pm 2.58\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = 0.55 \pm 2.58\sqrt{\dfrac{0.55(0.45)}{100}} = 0.55 \pm 0.13$

7.7 Confidence Intervals for Differences and Sums

If $S_1$ and $S_2$ are two sample statistics with approximately normal sampling distributions, confidence limits for the difference of the corresponding population parameters are given by

$(S_1 - S_2) \pm Z_c\,\sigma_{S_1 - S_2} = (S_1 - S_2) \pm Z_c\sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (8)$

while confidence limits for the sum of the population parameters are given by

$(S_1 + S_2) \pm Z_c\,\sigma_{S_1 + S_2} = (S_1 + S_2) \pm Z_c\sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (9)$

provided the samples are independent.

For example, confidence limits for the difference of two population means, in the
case where the populations are infinite and have known standard deviations
 1 , 2 , are given by

 s2  s2
x  x   Z 
1 2 c x1  x 2
 
 x1  x 2  Z c
n1
1

n2
 (10)

109
where x1, n1 and x2 , n2 are the respective means and sizes of the two samples
drawn from the populations.

Similarly, confidence limits for the difference of two population proportions, where the populations are infinite, are given by

$(\hat{p}_1 - \hat{p}_2) \pm Z_c\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \qquad (11)$

where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions and $n_1$ and $n_2$ are the sizes of the two samples drawn from the populations.

Example 5.0

In a random sample of 400 adults and 600 teenagers who watched a certain television program, 100 adults and 300 teenagers indicated that they liked it. Construct the 99.7% confidence limits for the difference in the proportions of all adults and all teenagers who watched the program and liked it.

Confidence limits for the difference in proportions of the two groups are given by (11), where subscripts 1 and 2 refer to teenagers and adults, respectively, and $q_1 = 1 - p_1$, $q_2 = 1 - p_2$. Here $\hat{p}_1 = 300/600 = 0.50$ and $\hat{p}_2 = 100/400 = 0.25$ are, respectively, the proportions of teenagers and adults who liked the program.

The 99.7% confidence limits are given by

$(0.50 - 0.25) \pm 3\sqrt{\dfrac{(0.50)(0.50)}{600} + \dfrac{(0.25)(0.75)}{400}} = 0.25 \pm 0.09 \qquad (12)$

Therefore, we can be 99.7% confident that the true difference in proportions lies between
0.16 and 0.34.
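The computation in Example 5.0, sketched in Python with the sample counts assumed from the example:

from math import sqrt

# Group 1: teenagers, group 2: adults
p1, n1 = 300 / 600, 600
p2, n2 = 100 / 400, 400
z = 3                    # critical value for 99.7% confidence

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2
print(diff - z * se, diff + z * se)   # about 0.16 and 0.34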

7.8 Determining the Sample Size

From the previous work, there is a $1 - \alpha$ probability that the value of the sample mean will provide a sampling error of $Z_{\alpha/2}\,\sigma_{\bar{x}}$ or less. Because $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, we can rewrite this statement to read: there is a $1 - \alpha$ probability that the value of the sample mean will provide a sampling error of $Z_{\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)$ or less. Given values of $Z_{\alpha/2}$ and $\sigma$, we can determine the sample size $n$ needed to provide any desired sampling error. Letting $d$ be the maximum sampling error, we have

$n = \dfrac{Z_{\alpha/2}^2\,\sigma^2}{d^2}$

This is the sample size that will provide a probability statement of $1 - \alpha$ with sampling error $d$ or less.
In most cases, $\sigma$ will be unknown. In practice one of the following procedures can be used:

(a) Use a pilot study to select a preliminary sample. The sample standard deviation from the preliminary sample can be used as the planning value for $\sigma$.
(b) Use the sample standard deviation from a previous sample of the same or similar units.
(c) Use judgment or a best guess for the value of $\sigma$. This is where you apply the Empirical rule or Chebyshev's rule.

Example 6.0

How large a sample should one select to be 90% confident that the sampling error is 3 or less? Assume the population variance is 36.

Solution

We have $d = 3$, $Z_{0.05} = 1.65$ and $\sigma = 6$. Hence

$n = \dfrac{(1.65)^2(6)^2}{3^2} = 10.89$

In cases where the computed $n$ is a fraction, we round up to the next integer value; hence the recommended sample size here is 11.

As for a proportion, $n = \dfrac{Z_{\alpha/2}^2\,pq}{d^2}$. In practice, the planning value for the population proportion can be chosen in the same way as that for the population mean. However, if none of those procedures applies, use $p = 0.5$.
Example 7.0

In a survey, the planning value for the population proportion $p$ is given as 0.45. How large a sample should be taken to be 95% confident that the sample proportion is within $\pm 0.04$ of the population proportion?

Solution

We have $d = 0.04$, $Z_{0.025} = 1.96$, $p = 0.45$ and $q = 0.55$. Hence

$n = \dfrac{(1.96)^2(0.45)(0.55)}{(0.04)^2} = 594.2475$

Hence, a sample size of 595 is recommended.

Example 8.0

How large a sample should be taken to be 90% confident that the sampling error in estimating the population proportion is 0.02 or less? Assume past data are not available for developing a planning value for $p$.

Solution

We have $Z_{0.05} = 1.65$, and we assume $p = 0.5$, $q = 0.5$ and $d = 0.02$.

Therefore $n = \dfrac{(1.65)^2(0.5)(0.5)}{(0.02)^2} = 1701.5625$. The recommended sample size is 1702.

Learning Objectives

After working through this Chapter you should be able to:

 Explain a point estimate and a confidence interval.

 Find confidence intervals for means of normal populations, and for differences of means of two normal populations, both when the variance(s) are known and when they are unknown.

 Find confidence intervals for proportions and differences of proportions.
