4. Interval Estimation

Uploaded by

harshitkokra.p24
Copyright
© © All Rights Reserved
Estimation

Population Mean: σ Known
Population Mean: σ Unknown
Population Proportion
Determining the Sample Size

Dr. Shraddha Mishra


IMI, New Delhi
Why estimate?

• We make estimates without worrying about whether they are scientific, but
with the hope that the estimates bear a reasonable resemblance to the
outcome.
• Managers also use estimates to make rational decisions when dealing with
issues that involve incomplete information and a great deal of uncertainty.
• We use statistics to make more logical and useful estimates.
• Statistical inferences are based on estimations.
• In both estimation and hypothesis testing, inferences about
characteristics of the population are drawn from information contained in
samples.
Estimator and Estimates
• Any sample statistic that is used to estimate a population parameter is called an
estimator.

• An estimate is a specific observed value of a statistic.

E.g. Suppose we calculate the mean odometer reading from a sample of used taxis
and find it to be 98,000 miles. If we use this specific value to estimate the mileage
for a whole fleet of used taxis, the value 98,000 miles would be an estimate.
The most commonly used estimators of population parameters:

Sample Statistic                         Population Parameter

Mean (x̄)                 is the estimator of    Mean (μ)
Variance (s²)             is the estimator of    Variance (σ²)
Standard Deviation (s)    is the estimator of    Standard Deviation (σ)
Proportion (p̂)            is the estimator of    Proportion (p)
Types of Estimates

We can make two types of estimates about the population. They are
referred to as point estimates and interval estimates.
• A point estimate is the sample statistic that is used to estimate the
population parameter.
• Single number
• Often insufficient because it is either right or wrong
• Useful when accompanied by an estimate of the error that might be
involved.
Problem

The National Stadium is considering expanding its seating capacity and needs to
know both the average number of people who attend events there and the
variability in this number. The following are the attendances (in thousands) at nine
randomly selected sporting events. Find point estimates of the mean and the
variance of the population from which the sample was drawn.
8.8 14.0 21.3 7.9 12.5 20.6 16.3 14.1 13.0

Ans. Mean ≈ 14.3 (thousand); variance ≈ 21.1, i.e. a standard deviation of about 4.6 (thousand).
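
The point estimates for this problem can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, variance, stdev

# Attendance (in thousands) at the nine sampled events.
attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]

x_bar = mean(attendance)    # point estimate of the population mean
s2 = variance(attendance)   # sample variance (divides by n - 1)
s = stdev(attendance)       # sample standard deviation

print(f"mean = {x_bar:.1f}, variance = {s2:.1f}, std dev = {s:.1f}")
```

Note that `variance` and `stdev` divide by n − 1, which is the form used for point estimation of the population variance.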


Interval Estimates

• An interval estimate is the range of values within which a researcher or an
employee can say with some confidence that the population parameter falls. This
range is called a confidence interval.
• It provides additional information about the variability of the estimate.
• It indicates the error in two ways:
1. By the extent of its range
2. By the probability of the true population parameter lying within that range.
Confidence level vs confidence interval

Confidence level: the probability that the population parameter will fall in
this range.

Confidence interval: the range within which the population parameter will
fall; it includes a lower limit and an upper limit.
Criteria of a good estimator

Desirable properties of estimators include:

• Unbiasedness
• Efficiency
• Consistency
• Sufficiency
Unbiasedness
• An estimator is said to be unbiased if its expected
value is equal to the population parameter it estimates.

For example, E(x̄) = μ, so the sample mean is an unbiased
estimator of the population mean.

• Unbiasedness is an average or long-run property. The
mean of any single sample will probably not equal the
population mean, but the average of the means of
repeated independent samples from a population will
equal the population mean.

Any systematic deviation of the estimator from the
population parameter it estimates is called bias.


Unbiased and Biased Estimator

[Figure: an unbiased and a biased estimator side by side, with the bias marked.]
An unbiased estimator is on target on average.
A biased estimator is off target on average.
Efficiency

Given the choice of two unbiased estimators of
the same population parameter, we would
prefer to use the point estimator with the
smaller standard deviation, since it tends to
provide estimates closer to the population
parameter.
The point estimator with the smaller standard
deviation is said to have greater relative
efficiency than the other.
Efficiency

An estimator is efficient if it has a relatively
small variance (and standard deviation).

An efficient estimator is, on average, closer to the parameter being estimated.
An inefficient estimator is, on average, farther from the parameter being estimated.
Consistency

An estimator is said to be consistent if its
probability of being close to the parameter it
estimates increases as the sample size
increases.

[Figure: Consistency. Sampling distributions of the estimator for n = 10 and n = 100.]
Sufficiency

An estimator is said to be sufficient if it
contains all the information in the data about
the parameter it estimates.

The sample mean is a sufficient estimator, as
every observation in the sample is used in the
calculation of the sample mean. The sample
median is not a sufficient estimator, as it is the
middle point of the dataset, regardless of the
magnitudes of all other data elements.
Properties of Sample Mean
For a normal population, both the sample mean and sample
median are unbiased estimators of the population mean, but the
sample mean is both more efficient (because it has a smaller
variance), and sufficient. Every observation in the sample is used
in the calculation of the sample mean, but only the middle value is
used to find the sample median.

In general, the sample mean is the best estimator of the population
mean. The sample mean is the most efficient unbiased estimator
of the population mean. It is also a consistent estimator.
Properties of Sample Variance

The sample variance (the sum of the squared deviations from the sample
mean divided by n − 1) is an unbiased estimator of the population variance.
In contrast, the average squared deviation from the sample mean is a biased
(though consistent) estimator of the population variance.

\[ E(s^2) = E\left(\frac{\sum (x - \bar{x})^2}{n - 1}\right) = \sigma^2 \]

\[ E\left(\frac{\sum (x - \bar{x})^2}{n}\right) < \sigma^2 \]
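
The difference between dividing by n − 1 and dividing by n can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the slides: a normal population with σ = 2 (so σ² = 4), samples of size 5, and a fixed random seed.

```python
import random
from statistics import mean, variance, pvariance

random.seed(0)   # fixed seed so the run is repeatable
sigma = 2.0      # assumed population standard deviation, so sigma^2 = 4
n = 5            # a small sample size makes the bias easy to see

unbiased, biased = [], []
for _ in range(10000):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    unbiased.append(variance(sample))   # sum of squared deviations / (n - 1)
    biased.append(pvariance(sample))    # sum of squared deviations / n

avg_unbiased = mean(unbiased)   # should be close to sigma^2 = 4
avg_biased = mean(biased)       # should be close to (n - 1)/n * sigma^2 = 3.2
print(avg_unbiased, avg_biased)
```

Averaged over many samples, the n − 1 version settles near σ², while the n version settles below it, which is the inequality stated above.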
Margin of error and the Interval Estimate
• A point estimator cannot be expected to provide the exact value of the
population parameter.
• An interval estimate can be computed by adding and subtracting a margin of
error to the point estimate.

• The margin of error is a function of the standard deviation of the sample statistic and the
z-value.
• The purpose of an interval estimate is to provide information about how
close the point estimate is to the value of the parameter.
Problem:

From a population known to have a standard deviation of 1.4, a sample of 60
individuals is taken. The mean of this sample is found to be 6.2.
a) Find the standard error of the mean.
b) Establish an interval estimate around the sample mean, using one standard
error of the mean.
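
A minimal sketch of both parts of this problem, using the standard error formula σ/√n. The variable names are illustrative.

```python
from math import sqrt

sigma = 1.4   # known population standard deviation
n = 60        # sample size
x_bar = 6.2   # sample mean

# (a) standard error of the mean
se = sigma / sqrt(n)

# (b) interval of one standard error on either side of the sample mean
lower, upper = x_bar - se, x_bar + se

print(f"standard error = {se:.3f}")
print(f"interval = ({lower:.3f}, {upper:.3f})")
```

The standard error is about 0.181, so the interval runs from roughly 6.019 to 6.381.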
Interval Estimate of a Population Mean: σ Known

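• The general form of an interval estimate of a population mean is

\[ \bar{x} \pm \text{margin of error} \]

• In order to develop an interval estimate of a population mean, the margin of error
must be computed using either:
  - the population standard deviation σ, or
  - the sample standard deviation s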
Interval Estimate of a Population Mean: σ Known

Before Sampling

\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \]

Before sampling, there is a 0.95 probability that the interval

\[ \mu \pm 1.96\frac{\sigma}{\sqrt{n}} \]

will include the sample mean (and 5% that it will not).

After Sampling

\[ P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \]

Conversely, after sampling, approximately 95% of such intervals

\[ \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} \]

will include the population mean (and 5% of them will not).

That is, \( \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} \) is a 95% confidence interval for μ.
Interval Estimate of Population Mean: σ Known

[Figure: sampling distribution of \( \bar{x} \). A proportion 1 − α of all \( \bar{x} \) values lies within \( z_{\alpha/2}\sigma_{\bar{x}} \) of μ, with area α/2 in each tail. An interval built around an \( \bar{x} \) value in the central region includes μ; an interval built around an \( \bar{x} \) value in either tail does not include μ.]
A 95% Interval around the Population Mean

[Figure: sampling distribution of the mean, f(x) against \( \bar{x} \), with 95% of the area between \( \mu - 1.96\frac{\sigma}{\sqrt{n}} \) and \( \mu + 1.96\frac{\sigma}{\sqrt{n}} \) and 2.5% in each tail. Sample means marked below the curve: 95% fall within the interval, 2.5% fall above it and 2.5% fall below it.]

Approximately 95% of sample means can be expected to fall within the interval

\[ \left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right] \]

Conversely, about 2.5% can be expected to be above \( \mu + 1.96\frac{\sigma}{\sqrt{n}} \) and 2.5% can be expected to be below \( \mu - 1.96\frac{\sigma}{\sqrt{n}} \).

So 5% can be expected to fall outside the interval

\[ \left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right] \]
95% Intervals around the Sample Mean

[Figure: sampling distribution of the mean, f(x) against \( \bar{x} \), with 95% of the area between \( \mu - 1.96\frac{\sigma}{\sqrt{n}} \) and \( \mu + 1.96\frac{\sigma}{\sqrt{n}} \) and 2.5% in each tail. Intervals drawn around individual sample means are shown below the curve; those that miss μ are marked with *.]

Approximately 95% of the intervals \( \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} \) around the sample mean can be
expected to include the actual value of the population mean, μ. (This happens when the sample
mean falls within the 95% interval around the population mean.)

* 5% of such intervals around the sample mean can be expected not to include the
actual value of the population mean. (This happens when the sample mean falls outside the
95% interval around the population mean.)
A (1 − α)100% Confidence Interval for μ
We define \( z_{\alpha/2} \) as the z value that cuts off a right-tail area of α/2 under the standard
normal curve. (1 − α) is called the confidence coefficient, α is called the error
probability, and (1 − α)100% is called the confidence level.

\[ P\left(z > z_{\alpha/2}\right) = \alpha/2 \]

\[ P\left(z < -z_{\alpha/2}\right) = \alpha/2 \]

\[ P\left(-z_{\alpha/2} < z < z_{\alpha/2}\right) = 1 - \alpha \]

(1 − α)100% Confidence Interval:

\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

[Figure: standard normal distribution, f(z) against z, with central area (1 − α) between \( -z_{\alpha/2} \) and \( z_{\alpha/2} \) and area α/2 in each tail.]
Interval Estimate of a Population Mean: σ Known
Interval Estimate of μ:

\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

[Figure: z scores for confidence interval in relation to alpha.]

In estimation, any confidence level can be applied; however, the most
widely used levels are 90%, 95%, and 99%.

[Figure: distribution of sample means for 99% confidence interval.]
Interval Estimate of a Population Mean: σ Known

Confidence Level    α      α/2     Table Look-up Area    z_{α/2}
90%                 .10    .05     .9500                 1.645
95%                 .05    .025    .9750                 1.960
99%                 .01    .005    .9950                 2.576
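
The z values in this table can be reproduced without a printed table. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    # The look-up area is 1 - alpha/2, the cumulative area to the left of z.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The look-up area passed to `inv_cdf` is the same quantity as the "Table Look-up Area" column, 1 − α/2.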
The Level of Confidence and the Width of the Confidence Interval

When sampling from the same population,
using a fixed sample size, the higher the
confidence level, the wider the confidence
interval.

[Figure: two standard normal distributions, f(z) against z. Left: 80% confidence interval, \( \bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}} \). Right: 95% confidence interval, \( \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} \).]
The Sample Size and the Width of the Confidence Interval

When sampling from the same population,
using a fixed confidence level, the larger the
sample size, n, the narrower the confidence
interval.

[Figure: two sampling distributions of the mean, f(x) against \( \bar{x} \). Left: 95% confidence interval, n = 20. Right: 95% confidence interval, n = 40.]


Meaning of C% Confidence

• We say that this interval has been established at the 90% confidence level.
• The value 0.90 is referred to as the confidence coefficient.
Using the Z Statistic for Estimating Population Mean
• The z statistic can be used for estimating the population parameter
on the basis of the sample statistic.
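• Confidence interval for estimating population mean μ:

\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

• The confidence interval with the associated probability can be
calculated as below:

\[ P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]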
Interval Estimate of a Population Mean: σ Known
Example : Discount Sounds

Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a
potential location for a new outlet, based in part, on the mean annual income of the
individuals in the marketing area of the new location.

A sample of size n = 36 was taken; the sample mean income is $41,100. The population is not
believed to be highly skewed. The population standard deviation is estimated to be $4,500,
and the confidence coefficient to be used in the interval estimate is 0.95.
Interval Estimate of a Population Mean: σ Known
Example: Discount Sounds

The margin of error is

\[ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.96\left(\frac{4{,}500}{\sqrt{36}}\right) = 1{,}470 \]

Thus at 95% confidence, the margin of error is $1,470.

The interval estimate of μ is

$41,100 ± $1,470
or
$39,630 to $42,570

We are 95% confident that the interval contains the population mean.
Interval Estimate of a Population Mean: σ Known

Example: Discount Sounds

Confidence level    Margin of error    Interval estimate
90%                 1,234              39,866 to 42,334
95%                 1,470              39,630 to 42,570
99%                 1,932              39,168 to 43,032

In order to have a higher degree of confidence, the margin of error and
thus the width of the confidence interval must be larger.
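
The three rows of the Discount Sounds table can be reproduced as follows. This is a sketch that takes the z values from the table of confidence levels given earlier; the dictionary and variable names are illustrative.

```python
from math import sqrt

x_bar, sigma, n = 41_100, 4_500, 36              # sample mean, sigma, sample size
z_table = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}  # z_{alpha/2} per confidence level

results = {}
for level, z in z_table.items():
    margin = z * sigma / sqrt(n)                 # margin of error
    results[level] = (round(margin),
                      round(x_bar - margin),
                      round(x_bar + margin))
    print(f"{level:.0%}: margin = {results[level][0]:,}, "
          f"interval = {results[level][1]:,} to {results[level][2]:,}")
```

Only z changes between rows, which is why a higher confidence level widens the interval when σ and n are fixed.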
Example: A researcher has taken a random sample of size 70 from a
population with a sample mean of 35 and a population standard deviation
of 4.62. Construct a 90% confidence interval to estimate the population
mean.
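
The working for this example is not shown. A reconstruction, assuming the table value \( z_{0.05} = 1.645 \) for 90% confidence, would be:

\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 35 \pm 1.645\left(\frac{4.62}{\sqrt{70}}\right) \approx 35 \pm 0.908 \]

This agrees with the limits quoted for this example up to rounding in the last digit.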

This result implies that the researcher is 90% confident that the
population mean will lie between 34.091 and 35.909. The point estimate
is 35.
Example
Comcast, the computer services company, is planning to invest heavily
in online television services. As part of the decision, the company wants
to estimate the average number of online shows a family of four would
watch per day. A random sample of
n = 100 families is obtained, and in this sample the average number
of shows viewed per day is 6.5 and the population standard
deviation is known to be 3.2. Construct a 95% confidence interval
for the average number of online television shows watched by the entire
population of families of 4.

\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 6.5 \pm z_{0.025}\frac{3.2}{\sqrt{100}} = 6.5 \pm 1.96 \times 0.32 = 6.5 \pm 0.6272 \;\Rightarrow\; (5.87,\ 7.13) \]

OR,

\[ 5.87 \le \mu \le 7.13 \]
Example
A survey of small businesses with websites found that the average amount spent on
a site was INR 11,500 per year. Given a sample of 50 businesses and a population
standard deviation of σ = INR 600, what is the margin of error? Use 95% confidence.
What sample size would you recommend if the study required a margin of error of
INR 150?

\[ \text{Margin of error} = z_{0.025}\frac{\sigma}{\sqrt{n}}, \qquad z_{0.025} = 1.96,\quad \sigma = 600,\quad n = 50 \]

\[ \text{Margin of error} = 1.96\left(\frac{600}{\sqrt{50}}\right) = 166.31 \]

A larger sample size would be needed to reduce the margin of error to INR 150 or less.

\[ 1.96\left(\frac{600}{\sqrt{n}}\right) = 150 \]

Solving for n gives n = 61.47, which is rounded up to n = 62.


Interval Estimate of a Population Mean: σ unknown

• We’ll assume for now that the population is normally distributed.


t Distribution
• The t distribution, developed by William Gosset, is a continuous probability distribution
that is similar to the normal distribution with its bell shape, but has heavier tails.
• It is used for estimating population parameters when the sample size is small or the
population standard deviation is unknown.
• A specific t distribution depends on a parameter known as the degrees of freedom.
• Degrees of freedom refer to the number of independent pieces of information that go
into the computation of s.
• A t distribution with more degrees of freedom has less dispersion; fewer degrees of
freedom give heavier tails and higher dispersion.
• As the degrees of freedom increase, the difference between the t distribution and the
standard normal probability distribution becomes smaller and smaller. This is one reason why
many researchers use the z distribution for large samples.
t Distribution

• For more than 100 degrees of freedom, the standard normal z value
provides a good approximation to the t value.
• The standard normal z values can be found in the infinite degrees of freedom (∞) row of
the t distribution table.
Why n−1 (degrees of freedom) is used:

1. When estimating the population standard deviation (σ) using the sample
standard deviation (s), we calculate s based on the deviations of the sample
values from the sample mean (x̄).
2. The sample mean itself is computed from the same sample, introducing a
constraint: the sum of the deviations from the mean is always zero. This means
only n − 1 deviations are "free" to vary independently; the nth deviation is
determined by the others.
Interval Estimate of a Population Mean: σ Unknown
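A (1 − α)100% confidence interval for μ when σ is not known (assuming a normally
distributed population) is given by:

\[ \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \]

where \( t_{\alpha/2} \) is the value of the t distribution with n − 1 degrees of freedom that cuts off a right-tail area of α/2.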
Large Sample Confidence Intervals for the Population Mean

Whenever σ is not known (and the population is assumed
normal), the correct distribution to use is the t distribution
with n − 1 degrees of freedom. Note, however, that for large
degrees of freedom, the t distribution is approximated well by
the Z distribution.
A large-sample (1 − α)100% confidence interval for μ:

\[ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \]
Interval Estimate of a Population Mean: σ Unknown
Example: Apartment Rents

A reporter for a student newspaper is writing an article on the cost of off-campus housing. A
sample of 16 one-bedroom apartments within a half-mile of campus resulted in a sample
mean of $750 per month and a sample standard deviation of $55.

Let us provide a 95% confidence interval estimate of the mean rent per month for the
population of one-bedroom apartments within a half-mile of campus. We will assume this
population to be normally distributed.
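Interval Estimate of a Population Mean: σ Unknown
Interval Estimate

With n − 1 = 15 degrees of freedom, \( t_{0.025} = 2.131 \), so

\[ \bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 750 \pm 2.131\left(\frac{55}{\sqrt{16}}\right) = 750 \pm 29.30 \]

We are 95% confident that the mean rent per month for the population of
one-bedroom apartments within a half-mile of campus is between $720.70
and $779.30.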
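
The apartment-rent interval can be checked with a short helper. This is a sketch: the function name `t_interval` is illustrative, and the critical value 2.131 (t with 15 degrees of freedom, right-tail area 0.025) is taken from a t table and passed in by hand, because the Python standard library does not provide t quantiles.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Interval x_bar +/- t_crit * s / sqrt(n); t_crit is looked up for n - 1 df."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Apartment rents: n = 16, sample mean $750, sample standard deviation $55.
lower, upper = t_interval(x_bar=750, s=55, n=16, t_crit=2.131)
print(f"${lower:.2f} to ${upper:.2f}")
```

For other sample sizes the caller needs to look up the t value for the matching degrees of freedom; reusing 2.131 would be wrong.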
Example
A stock market analyst wants to estimate the average return on a
certain stock. A random sample of 15 days yields an average
(annualized) return of x̄ = 10.37% and a standard deviation of s =
3.5%. Assuming a normal population of returns, give a 95%
confidence interval for the average return on this stock.

SOLUTION
The critical value of t for df = (n − 1) = (15 − 1) = 14 and a right-tail
area of 0.025 is:

\[ t_{0.025} = 2.145 \]

The corresponding confidence interval or interval estimate is:

\[ \bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\left(\frac{3.5}{\sqrt{15}}\right) = 10.37 \pm 1.94 = [8.43,\ 12.31] \]
Example : In order to estimate the customer loyalty for a particular product, a researcher poses the
following question to a sample of 100 customers: How many years have you been continuously using
this product? This sample yielded a mean period of 8 years with a sample standard deviation of 2
years. Construct a 95% confidence interval for estimating the population mean.

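Because the sample is large (n = 100), the z value is used with s in place of σ:

\[ \bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 8 \pm 1.96\left(\frac{2}{\sqrt{100}}\right) = 8 \pm 0.392 \]

This result implies that the researcher is 95% confident that the
population mean (average years of continuous use in the population)
will lie between 7.608 years and 8.392 years.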
Summary of Interval Estimation Procedures for a Population Mean

Can the population standard deviation σ be assumed known?

• Yes (σ known case): use

\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

• No (σ unknown case): use the sample standard deviation s to estimate σ.
  • For a small sample, use

\[ \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \]

  • For a large sample, use

\[ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \]

Please note: a sample can be considered large when n is greater than or equal to 30.
Sample Size for an Interval Estimate of a Population Mean
Example: Discount Sounds
Recall that Discount Sounds is evaluating a potential location for a new retail outlet, based in
part, on the mean annual income of the individuals in the marketing area of the new location.

Suppose that Discount Sounds’ management team wants an estimate of the population mean
such that there is a 0.95 probability that the sampling error is $500 or less.

How large a sample size is needed to meet the required precision?
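
The working for this question is not shown. A sketch of the standard calculation, assuming the planning value σ = $4,500 from the earlier Discount Sounds example, \( z_{0.025} = 1.96 \), and a desired margin of error E = $500:

\[ n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} = \frac{(1.96)^2 (4{,}500)^2}{(500)^2} \approx 311.17 \]

Rounding up, a sample of 312 would meet the stated precision under these assumptions.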


Sample Size for an Interval Estimate of a Population Mean
Example : The personnel department of an organization wants to apply cost-cutting
measures for improving efficiency. As the first step, the personnel department wants
to curtail telephone expenses incurred by employees. For this, personnel
department has taken a random sample of 10 employees and gathered the following
data about telephone expenses (in thousand rupees) in the previous year:10, 12, 24,
23, 11, 14, 15, 34, 16, 23. Construct a 95% confidence interval to estimate the
average telephone expenses of the employees in the population.
Solution: (Here, the sample average and sample s.d. have to be calculated from the
given sample values.)
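
A sketch of how this solution could be carried out. It assumes the t critical value 2.262 (9 degrees of freedom, right-tail area 0.025) read from a t table, since σ is unknown and the sample is small; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Telephone expenses (in thousand rupees) for the 10 sampled employees.
expenses = [10, 12, 24, 23, 11, 14, 15, 34, 16, 23]

n = len(expenses)
x_bar = mean(expenses)   # sample mean
s = stdev(expenses)      # sample standard deviation (divides by n - 1)

t_crit = 2.262           # assumed table value: t_{0.025} with n - 1 = 9 df
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.1f}, s = {s:.3f}")
print(f"95% interval: {lower:.2f} to {upper:.2f} (thousand rupees)")
```

With these inputs the sample mean is 18.2 and the interval is roughly 12.76 to 23.64 thousand rupees.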
Interval Estimate of a Population Proportion
• The general form of an interval estimate of a population proportion
is:

\[ \hat{p} \pm \text{margin of error} \]
Interval Estimate of a Population Proportion
Large-Sample Confidence Intervals for the Population Proportion, p
The estimator of the population proportion, p, is the sample proportion, p̂. If the
sample size is large, p̂ has an approximately normal distribution, with E(p̂) = p and
V(p̂) = pq/n, where q = (1 − p). When the population proportion is unknown, use the
estimated value, p̂, to estimate the standard deviation of p̂.

For estimating p, a sample is considered large enough when both np and nq are greater
than 5.
Interval Estimate of a Population Proportion
Large-Sample Confidence Intervals for the Population Proportion, p

[Figure: sampling distribution of p̂, with standard deviation \( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \). A proportion 1 − α of all p̂ values lies within \( z_{\alpha/2}\sigma_{\hat{p}} \) of p, with area α/2 in each tail.]

A large-sample (1 − α)100% confidence interval for the population proportion, p:

\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \]

where the sample proportion, p̂, is equal to the number of successes in the sample, x,
divided by the number of trials (the sample size), n, and q̂ = 1 − p̂.
Interval Estimate of a Population Proportion
Example: Political Science, Inc.

Political Science Inc. (PSI) specializes in voter polls and surveys designed to keep political
office seekers informed of their position in a race.

Using telephone surveys, PSI interviewers ask registered voters who they would vote for if
the election were held that day.

In a current election campaign, PSI has just found that 220 registered voters, out of 500
contacted, favor a particular candidate. PSI wants to develop a 95% confidence interval
estimate for the proportion of the population of registered voters that favor the candidate.
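Interval Estimate of a Population Proportion

With p̂ = 220/500 = 0.44 and \( z_{0.025} = 1.96 \):

\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.44 \pm 1.96\sqrt{\frac{0.44(0.56)}{500}} = 0.44 \pm 0.0435 \]

PSI is 95% confident that the proportion of all voters that favor the
candidate is between 0.3965 and 0.4835.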
Example: A research company conducted a survey on 300 randomly selected
taxpayers. It found that out of 300 taxpayers, 180 taxpayers have filled the “SARAL”
form correctly. Construct a 95% confidence interval to estimate the percentage of
taxpayers who have filled the form correctly in the population.
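
No solution is given for this example. A minimal sketch of the large-sample interval for a proportion, applied to the taxpayer data and, as a check, to the PSI poll above; the function name and the default z = 1.96 (95% confidence) are illustrative choices.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

saral = proportion_interval(180, 300)   # taxpayers who filled the form correctly
psi = proportion_interval(220, 500)     # PSI voter poll, for comparison

print(f"SARAL: {saral[0]:.4f} to {saral[1]:.4f}")
print(f"PSI:   {psi[0]:.4f} to {psi[1]:.4f}")
```

For the taxpayer survey this gives roughly 54.5% to 65.5%; the PSI line reproduces the 0.3965 to 0.4835 interval stated earlier.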
Example
A marketing research firm wants to estimate the share that foreign
companies have in the American market for certain products. A
random sample of 100 consumers is obtained, and it is found that
34 people in the sample are users of foreign-made products; the
rest are users of domestic products. Give a 95% confidence
interval for the share of foreign products in this market.
SOLUTION

\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} \]

\[ = 0.34 \pm (1.96)(0.04737) \]

\[ = 0.34 \pm 0.0928 \]

\[ = [0.2472,\ 0.4328] \]

Thus, the firm may be 95% confident that foreign manufacturers control anywhere
from 24.72% to 43.28% of the market.
Sample Size for an Interval Estimate of a Population Proportion
Example: Political Science, Inc.

Suppose that PSI would like a 0.99 probability that the sample proportion is within 0.03 of
the population proportion.

How large a sample size is needed to meet the required precision? (A previous sample of
similar units yielded 0.44 for the sample proportion.)
Sample Size for an Interval Estimate of a Population Proportion

Note: We used 0.44 as the best estimate of p. If no information is available about p,
then 0.5 is often used because it provides the greatest possible sample size. If we had
used p = 0.5, the computed n would have been 1843.3, reported here as 1843 (rounding up gives 1844).
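
The calculation behind this note is not shown. A sketch, assuming the standard planning formula n = z² p*(1 − p*) / E² with the table value z = 2.576 for 99% confidence and E = 0.03; the function name and the choice to round up are illustrative.

```python
from math import ceil

def sample_size_for_proportion(z, p_star, margin):
    """Smallest n with z * sqrt(p_star * (1 - p_star) / n) <= margin."""
    return ceil(z ** 2 * p_star * (1 - p_star) / margin ** 2)

n_planning = sample_size_for_proportion(2.576, 0.44, 0.03)  # using the earlier sample
n_worst = sample_size_for_proportion(2.576, 0.50, 0.03)     # no prior information

print(n_planning, n_worst)
```

With the planning value 0.44 the helper returns 1817. With 0.5 it returns 1844, because it always rounds up.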
