
Confidence Interval

The document discusses confidence interval estimation for proportions and means. It provides examples of estimating population proportions and means from sample data using confidence intervals. It also discusses interpreting confidence intervals and their applications to problems involving estimating revenues, costs, losses, and differences in proportions or means.

Uploaded by

Ashutosh Kumar

Confidence Interval Estimation

Karthik Sriram
IIM Ahmedabad
Estimating Population Proportion

Suppose we want to estimate proportion ‘p’

p = proportion of people in the population who are willing to buy a product

We ask a random sample of n=100 people

Suppose 𝑝̂ = 35% say they are willing to buy (sample proportion)
Estimating Population Proportion
Will 𝑝̂ = 35% necessarily be the same as p?

Credit: Getty Images/iStockphoto


Estimating Population Proportion

If each student in this class had conducted an independent survey and calculated 𝑝̂ based on their own sample, would they each get 35% again?
One survey
[Figure: possible values of 𝑝̂; green = actual population value of p]

If somebody else had done the survey
[Figure: possible values of 𝑝̂; green = actual population value of p]

3 different surveys could have given
[Figure: possible values of 𝑝̂; green = actual population value of p]

4 different surveys could have given…
[Figure: possible values of 𝑝̂; green = actual population value of p]

10 different surveys (with n=100 each) could have potentially given different results: some to the right and some to the left of true p
[Figure: possible values of 𝑝̂; green = actual population value of p]

So, there is a probability distribution of possibilities of 𝑝̂ that can potentially result from any survey
[Figure: possible values of 𝑝̂; green = actual population value of p]


CLT gives the approximate distribution of 𝑝̂:
$\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$
In sampling, SD(𝑝̂) is referred to as the “Standard Error”, SE(𝑝̂)

[Figure: possible values of 𝑝̂; green = actual population value of p]
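
The idea of many surveys each producing a different 𝑝̂ can be checked with a small simulation. The sketch below is only an illustration: the true value `P_TRUE = 0.35`, the number of simulated surveys, and the function name `simulate_sample_proportions` are assumptions made for the example, since in a real survey the true p is unknown.

```python
import math
import random
import statistics


def simulate_sample_proportions(p_true, n, num_surveys, seed=0):
    """Run num_surveys independent surveys of size n; return each survey's p_hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_surveys):
        # Each respondent says "yes" with probability p_true (i.i.d. draws).
        yes_count = sum(rng.random() < p_true for _ in range(n))
        p_hats.append(yes_count / n)
    return p_hats


P_TRUE = 0.35  # hypothetical "green line"; assumed known only for the simulation
N_SAMPLE = 100

p_hats = simulate_sample_proportions(P_TRUE, N_SAMPLE, num_surveys=10_000)

# Standard error predicted by the CLT formula sqrt(p(1-p)/n).
theoretical_se = math.sqrt(P_TRUE * (1 - P_TRUE) / N_SAMPLE)

print("mean of simulated p_hat:", round(statistics.mean(p_hats), 4))
print("SD of simulated p_hat  :", round(statistics.pstdev(p_hats), 4))
print("SE from CLT formula    :", round(theoretical_se, 4))
```

If the approximation holds, the simulated mean should sit close to the assumed p and the simulated SD close to the formula's SE.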


95% Confidence Interval
Since $\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$

“With 95% confidence, p will be in $[\hat{p} \pm 1.96\, SE]$”

1.96 SE is called the Margin of Error for 95% confidence

Practically, we use $\widehat{SE} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ in place of SE, and say
“With 95% confidence, p will be in $[\hat{p} \pm 1.96\, \widehat{SE}]$”
Interpretation of 95% CI
$P\left(p \in [\hat{p} \pm 1.96\, \widehat{SE}]\right) \approx 0.95$

There is a 95% chance that the confidence interval constructed based on a random sample will capture the true population value of p.
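
This interpretation can be illustrated by repeating the survey many times and counting how often the interval captures p. The following is a rough sketch under assumed values (true p = 0.35, n = 100, 10,000 repetitions); the helper names `wald_interval` and `coverage` are made up for this example.

```python
import math
import random


def wald_interval(p_hat, n, z=1.96):
    """Interval p_hat +/- z * SE_hat, with SE_hat = sqrt(p_hat(1-p_hat)/n)."""
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se_hat, p_hat + z * se_hat


def coverage(p_true, n, num_surveys, seed=1):
    """Fraction of simulated surveys whose interval contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_surveys):
        p_hat = sum(rng.random() < p_true for _ in range(n)) / n
        low, high = wald_interval(p_hat, n)
        hits += low <= p_true <= high
    return hits / num_surveys


# p_true is assumed known here only so that coverage can be counted.
coverage_rate = coverage(p_true=0.35, n=100, num_surveys=10_000)
print("share of intervals that captured p:", coverage_rate)
```

The share should come out near 0.95, though not exactly, because the normal distribution is only an approximation for a finite n.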
Exercise
95% CI = $[\hat{p} - 1.96\, \widehat{SE},\ \hat{p} + 1.96\, \widehat{SE}]$

Compute 95% CI when 𝑝̂ = 0.35, n = 100

How will you compute 99% CI?

What will happen if the population size is small, e.g. N = 1000?
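
One possible way to work the first two parts of the exercise is sketched below. It assumes a large population and takes the multiple from the standard normal distribution via Python's `statistics.NormalDist`; the function name `proportion_ci` is chosen for illustration only.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Interval p_hat +/- z * sqrt(p_hat(1-p_hat)/n) for a large population."""
    # For 95% the multiple is the 97.5th percentile of N(0,1), about 1.96;
    # for 99% it is the 99.5th percentile, about 2.576.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se_hat
    return p_hat - margin, p_hat + margin


ci95 = proportion_ci(0.35, 100, confidence=0.95)
ci99 = proportion_ci(0.35, 100, confidence=0.99)

print("95% CI:", tuple(round(v, 4) for v in ci95))
print("99% CI:", tuple(round(v, 4) for v in ci99))
```

The 99% interval is wider than the 95% interval because only the multiple changes, while 𝑝̂ and the standard error stay the same.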
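Small or Finite population
For a small population, the i.i.d. assumption fails

Here, still 95% CI = $[\hat{p} - 1.96\, \widehat{SE},\ \hat{p} + 1.96\, \widehat{SE}]$

But $\widehat{SE} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \times \sqrt{\frac{N-n}{N-1}}$

If N = population size, n = sample size, then
finite population correction = $\sqrt{\frac{N-n}{N-1}}$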
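
As a sketch of how the correction changes the earlier exercise, the code below applies the standard finite population correction sqrt((N-n)/(N-1)) to 𝑝̂ = 0.35, n = 100, N = 1000. The function names are illustrative assumptions.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Standard correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def proportion_ci_finite(p_hat, sample_size, population_size, z=1.96):
    """95% CI for a proportion with the finite population correction applied to SE."""
    se_hat = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    se_hat *= finite_population_correction(population_size, sample_size)
    return p_hat - z * se_hat, p_hat + z * se_hat


fpc = finite_population_correction(1000, 100)
ci_finite = proportion_ci_finite(0.35, 100, 1000)

print("finite population correction:", round(fpc, 4))
print("95% CI with correction      :", tuple(round(v, 4) for v in ci_finite))
```

Because the correction factor is below 1, the interval is somewhat narrower than the one computed without it.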


Some Use Cases


Confidence Interval Estimation

for “Proportions”
Use case 1: Bread n’ Butter


Bread n’ Butter, a popular sandwich shop, is located in a small town where its potential total customer base is 1000. The shop introduced a new ‘Spicy Pineapple Sandwich’ (SPS). During the first week it was found that, out of a (random) sample of 150 customers, 35 bought SPS. The SPS is priced at Rs. 250. Assuming 200 customers will visit the shop next week, what can we say about the revenues from SPS next week?
o At 95% confidence, can we say that the revenues will be more than Rs. 8000? How about more than Rs. 9000?
o The owner is hoping to meet Rs. 18000 of his loan obligations next week based on the revenues from SPS. Is it reasonable for the owner to assume a revenue of Rs. 18000 from SPS?
o What if the loan obligation had been only Rs. 8000?
o How would we update our answers if the shop’s potential customer base is very large instead of 1000?
o If we wanted to estimate the proportion of customers who buy SPS within a Margin of Error of 0.04, then what should be the sample size?
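
One possible way to set up the calculations for this use case is sketched below. It makes simplifying assumptions that are not stated in the slides: next week's 200 visitors are assumed to buy SPS at the population rate p, so revenue = 200 × 250 × p; the multiple is 1.96; and the sample-size formula uses the observed 𝑝̂ as the planning value. The function names are illustrative.

```python
import math

Z_95 = 1.96      # multiple for 95% confidence
PRICE = 250      # Rs. per sandwich
VISITORS = 200   # assumed number of customers next week


def proportion_ci(successes, sample_size, population_size=None, z=Z_95):
    """95% CI for p; applies the finite population correction if N is given."""
    p_hat = successes / sample_size
    se_hat = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    if population_size is not None:
        se_hat *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return p_hat - z * se_hat, p_hat + z * se_hat


def sample_size_for_margin(p_guess, margin, population_size=None, z=Z_95):
    """Smallest n for which z * SE <= margin, using p_guess as the planning value."""
    n0 = z ** 2 * p_guess * (1 - p_guess) / margin ** 2
    if population_size is None:
        return math.ceil(n0)
    # Solving z^2 p(1-p)/n * (N-n)/(N-1) = margin^2 for n gives N*n0/(n0 + N - 1).
    return math.ceil(population_size * n0 / (n0 + population_size - 1))


def revenue_interval(p_interval):
    """Scale an interval for p to revenue, assuming revenue = VISITORS * PRICE * p."""
    return tuple(VISITORS * PRICE * bound for bound in p_interval)


revenue_small_town = revenue_interval(proportion_ci(35, 150, population_size=1000))
revenue_large_base = revenue_interval(proportion_ci(35, 150))
n_small_town = sample_size_for_margin(35 / 150, 0.04, population_size=1000)
n_large_base = sample_size_for_margin(35 / 150, 0.04)

print("Revenue 95% CI, N=1000    :", tuple(round(v) for v in revenue_small_town))
print("Revenue 95% CI, very large:", tuple(round(v) for v in revenue_large_base))
print("Sample size for ME 0.04   :", n_small_town, "(N=1000),", n_large_base, "(very large)")
```

Each loan-obligation question can then be answered by comparing the target amount with the two ends of the revenue interval: a target below the lower end is supported at 95% confidence, while a target above the upper end is not.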
Use case 2: In-store smartphone shopping
Google, along with M/A/R/C Research, conducted a survey in the U.S. to study the impact of smartphones on in-store shopping. They surveyed a random sample of 1,507 smartphone owners, who completed a 3-part survey for a shopping trip. Questions of interest would include, e.g., “When do they use it: before going to the store, to search for a store? Do they use it inside the store while shopping?” etc. Assume the number of smartphone owners is 200000.

1. What is a 99% interval estimate for the “% of smartphone owners that are smartphone shoppers”?
2. Suppose that for every in-store shopper, it is possible to increase revenue by 100 dollars in a year using some targeted marketing. Based on the survey results, what can we suggest as the resulting revenue increase at 99% confidence?
Estimating Population Mean

Suppose we want to estimate average ‘𝜇’

𝜇 =average income per household in a town

We ask a random sample of n=100 households

𝜇̂ = 𝑥̅, the ‘sample mean’ income per household


95% confidence interval
$\bar{x} \pm \text{multiple} \times SE(\bar{x})$

o $SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$

𝜎 = SD of x-variable in the population

o $\widehat{SE}(\bar{x}) = \frac{s}{\sqrt{n}}$

𝑠 = SD of x-variable in the sample
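95% confidence interval (small population)
$\bar{x} \pm \text{multiple} \times SE(\bar{x})$

o $SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$ (small population)

𝜎 = SD of x-variable in the population

o $\widehat{SE}(\bar{x}) = \frac{s}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$ (small population)

𝑠 = SD of x-variable in the sample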
95% confidence interval (choice of multiple)
$\bar{x} \pm \text{multiple} \times SE(\bar{x})$

o multiple = 1.96 = 97.5th percentile of N(0,1), if n > 50

o multiple = 97.5th percentile of the t-distribution with n−1 degrees of freedom, if n <= 50
Some Use Cases
Confidence Interval Estimation

for “Mean”
Use Case 3: Subsidies
A state government promised subsidies on electricity consumption, public
education, water consumption among other schemes, and believed based
on some calculations that it will achieve a cost saving of Rs 2500 per
month on an average per household.

After the scheme had been implemented for a year, an independent agency conducted a survey of 3000 randomly chosen households. For this sample, the average saving was found to be Rs. 2600 per month and the standard deviation in the sample was Rs. 1100.

Based on a 95% confidence interval, is there evidence to suggest that the state government has met its promise?
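
A minimal sketch of the interval for this use case follows. It assumes the number of households in the state is large enough to ignore the finite population correction, and it uses the multiple 1.96 because n = 3000 is well above 50. The function name `mean_ci` is illustrative.

```python
import math


def mean_ci(sample_mean, sample_sd, n, multiple=1.96):
    """Interval x_bar +/- multiple * s / sqrt(n) for a large population."""
    se_hat = sample_sd / math.sqrt(n)
    return sample_mean - multiple * se_hat, sample_mean + multiple * se_hat


PROMISED_SAVING = 2500  # Rs. per month per household, as promised
saving_ci = mean_ci(sample_mean=2600, sample_sd=1100, n=3000)

print("95% CI for average saving:", tuple(round(v, 1) for v in saving_ci))
print("Promised value below the whole interval:", PROMISED_SAVING < saving_ci[0])
```

If the promised Rs. 2500 lies below the entire interval, the sample is consistent with the average saving being at least the promised amount.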
Use case 4: Income tax audit
An income-tax audit is planned in a business district of a city consisting of 50 businesses. The auditor has a record of the income filed by each business. However, there is a suspicion of widespread under-reporting, which may be leading to an overall under-collection of taxes. The income tax department wants to estimate the total loss to the department from these businesses. However, the authorities cannot audit more than 5 businesses due to resource constraints. Therefore they took a random sample of 5 businesses and noted their Reported tax, Audited tax and, consequently, Loss = (Audited − Reported) values.
IDs for the Business units sampled : #45, #8, #33, #41, #4
Loss= Audited – Reported tax (in Rs Lakhs): 6.8, 3.7, 12.2, 6.8, 3.2

o Based on the random sample, what can we say about the Average Loss from the 50 businesses, at 95% confidence?
o What can we say about the Total Loss at 95% confidence?
o Suppose from prior knowledge of context and pilot studies we know that the population variance ($\sigma^2$) < 10, then to ensure a margin of error of 1 lakh in estimating the mean loss, what should have been the sample size?
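
The three questions can be approached as in the sketch below. Several choices here are assumptions rather than part of the slides: the t multiple 2.776 is the usual table value for 4 degrees of freedom; the finite population correction sqrt((N-n)/(N-1)) is applied because N = 50 is small; and the sample-size step uses the multiple 1.96 with the variance bound of 10, which is only a first approximation when the resulting n is below 50.

```python
import math
import statistics

T_975_DF4 = 2.776   # 97.5th percentile of t with 4 d.f. (standard table value)
Z_95 = 1.96
POPULATION_SIZE = 50

losses = [6.8, 3.7, 12.2, 6.8, 3.2]   # Rs. lakhs, from the sampled businesses
n = len(losses)

x_bar = statistics.mean(losses)
s = statistics.stdev(losses)          # sample SD (divides by n - 1)

# Standard error of the mean with the finite population correction.
fpc = math.sqrt((POPULATION_SIZE - n) / (POPULATION_SIZE - 1))
se_hat = s / math.sqrt(n) * fpc
margin = T_975_DF4 * se_hat

mean_loss_ci = (x_bar - margin, x_bar + margin)
# Total loss = N * average loss, so both ends of the interval scale by N.
total_loss_ci = tuple(POPULATION_SIZE * bound for bound in mean_loss_ci)

# Sample size for a margin of error of 1 lakh, assuming sigma^2 = 10 (the bound).
n0 = Z_95 ** 2 * 10 / 1 ** 2
n_required = math.ceil(POPULATION_SIZE * n0 / (n0 + POPULATION_SIZE - 1))

print("Average loss 95% CI (lakhs):", tuple(round(v, 2) for v in mean_loss_ci))
print("Total loss 95% CI (lakhs)  :", tuple(round(v, 1) for v in total_loss_ci))
print("Approximate sample size    :", n_required)
```

With only 5 observations the interval is wide, which is why the t multiple and the finite population correction both matter here.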

Further reference (not in syllabus)
Other estimation problems:

o Difference in proportions ($p_1 - p_2$): e.g. change in % buyers due to marketing
o Difference in means ($\mu_1 - \mu_2$): e.g. decrease in average issue-resolution time at an IT help desk, after a new training

Again, these involve point estimate +/- margin of error.

Can be implemented by referring to a textbook for the formulas.
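
As a pointer to what such formulas look like, here is a sketch using the common textbook versions for two independent, large samples: the standard error of a difference is the square root of the sum of the two squared standard errors. The input numbers are hypothetical and serve only to show the call; the function names are illustrative.

```python
import math


def diff_proportions_ci(p1_hat, n1, p2_hat, n2, z=1.96):
    """CI for p1 - p2 from two independent large samples."""
    se_hat = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se_hat, diff + z * se_hat


def diff_means_ci(x1_bar, s1, n1, x2_bar, s2, n2, multiple=1.96):
    """CI for mu1 - mu2 from two independent large samples."""
    se_hat = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1_bar - x2_bar
    return diff - multiple * se_hat, diff + multiple * se_hat


# Hypothetical data: 40% buyers after marketing vs 35% before, 200 people each.
buyers_ci = diff_proportions_ci(0.40, 200, 0.35, 200)
print("95% CI for change in proportion:", tuple(round(v, 4) for v in buyers_ci))
```

An interval that contains 0, as in this hypothetical case, would mean the sample does not clearly show a change.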
