Lecture 7 CIs A

This document covers the concept of confidence intervals for estimating population means using sample data, emphasizing the importance of sampling distributions and the Central Limit Theorem. It explains how to calculate confidence intervals, the factors affecting their width, and provides examples for practical application. Additionally, it discusses the margin of error and the significance of sample size and variability in determining the precision of estimates.

Uploaded by

kamssandaw

IMAT1703

DATA ANALYTICS AND STATISTICS

LECTURE 7

CONFIDENCE INTERVAL FOR THE MEAN

OVERVIEW
• Understand the concept of a sampling distribution.
• Be aware of the Central Limit Theorem.
• Appreciate the reasons for sampling error and the need for the standard error.
• Be familiar with the statistical background, concepts and terminology used in calculating confidence intervals (and hypothesis testing) using the Normal Distribution.

RECALL: THE NORMAL DISTRIBUTION

RECALL: STATISTICAL RESULTS BASED ON SAMPLES
• When you take a sample of data, the results will vary from sample to sample.
• So, statistical results based on samples should include a measure of how much the results are expected to vary from sample to sample.

THE SAMPLING DISTRIBUTION

THE MEAN OF THE SAMPLING DISTRIBUTION

THE STANDARD ERROR OF THE SAMPLING DISTRIBUTION

SAMPLE SIZE AND STANDARD ERROR

Example 7.1
• Consider a population of employees. Suppose X is the time taken (in minutes) for an employee to word process a business letter.
• The distribution of times (in minutes) to word process a business letter is modelled on the normal distribution: X is N(10, 4), i.e. mean 10 and variance 4, so the standard deviation is 2.
• Now take a random sample of 10 employees, measure the time it takes for each employee to word process the letter and find the mean of this sample of employees.
• Repeat this process with a different sample of 10 employees, then repeat over and over again, graphing the results for all of the samples.
• Then repeat the process above by taking random samples of 50 employees.
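
The repeated-sampling experiment in Example 7.1 can be tried out in a short simulation. The sketch below is only an illustration: the function name, the number of repeats and the random seed are assumptions made here, and the standard deviation of 2 comes from reading N(10, 4) as mean 10, variance 4.

```python
import random
import statistics

def simulate_sample_means(n, repeats=5000, mu=10.0, sigma=2.0, seed=1):
    """Draw `repeats` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]

for n in (10, 50):
    means = simulate_sample_means(n)
    print(
        f"n={n}: mean of sample means = {statistics.fmean(means):.2f}, "
        f"spread of sample means = {statistics.stdev(means):.2f}, "
        f"sigma/sqrt(n) = {2.0 / n ** 0.5:.2f}"
    )
```

In runs like this the sample means should centre near 10 for both sample sizes, while their spread should be noticeably smaller for samples of 50 than for samples of 10.
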
INTERPRETATION CONTINUED

WHEN THE SHAPE OF THE DISTRIBUTION IS UNKNOWN OR NOT NORMAL

THE CENTRAL LIMIT THEOREM

INTRODUCING CONFIDENCE INTERVALS
A confidence interval (CI) is used to estimate a population parameter (a single number that describes a population) by using statistics (numbers that describe a sample of data).
Instead of giving a single estimate of the population mean, we can give a range of values for the population mean, called a confidence interval.
Confidence intervals are determined using the sampling distribution of the mean.

INTRODUCING CONFIDENCE INTERVALS
• Example:
• You might estimate the mean household income (population parameter) based on the mean income from a random sample of 1000 homes (sample statistic).
• However, because sample results will vary, you need to add a measure of that variability to your estimate.
• This measure of variability is called the margin of error.
• A range of likely values for the parameter is given by:

sample statistic ± margin of error

• This is the basis of calculating a confidence interval.

MARGIN OF ERROR
• The margin of error measures the variation in the random samples due to chance.
Note: As you didn’t get to sample everyone in the population:
• you expect your sample results to be “out” by a certain amount ‘just by chance’ and
• you acknowledge that your results could change with subsequent samples and that they’re only accurate to within a certain range (which is the margin of error).
• The ultimate goal when making an estimate using a confidence interval is to have a small margin of error.
• The narrower the CI, the more precise the results.

SO HOW DO YOU ENSURE THAT YOUR CI WILL BE NARROW ENOUGH?

This has to be considered before data collection (as after the data are collected, the width of the CI has been set!)

Note: As you didn’t get to sample everybody in the population, there are 3 factors that affect the size of the margin of error:
• the confidence level
• the amount of variability in the population
• the sample size.
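
The effect of each of the three factors can be seen by changing one at a time. This is a minimal sketch, assuming the margin of error takes the form multiplier × standard deviation ÷ √n that the worked examples later in the lecture use (with multipliers 1.96 for 95% and 2.58 for 99%); the function name and the baseline values are chosen here purely for illustration.

```python
from math import sqrt

def margin_of_error(z, sd, n):
    """Margin of error for a mean: multiplier z times the standard error sd/sqrt(n)."""
    return z * sd / sqrt(n)

baseline = margin_of_error(z=1.96, sd=2.0, n=50)           # 95% level
higher_confidence = margin_of_error(z=2.58, sd=2.0, n=50)  # 99% level: wider
more_variability = margin_of_error(z=1.96, sd=4.0, n=50)   # double the sd: wider
larger_sample = margin_of_error(z=1.96, sd=2.0, n=200)     # four times the n: narrower

for label, value in [
    ("baseline", baseline),
    ("higher confidence level", higher_confidence),
    ("more population variability", more_variability),
    ("larger sample", larger_sample),
]:
    print(f"{label}: +/- {value:.2f}")
```

Note that doubling the standard deviation doubles the margin of error, whereas the sample size has to be quadrupled to halve it.
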
FACTOR: CHOOSING A CONFIDENCE LEVEL
Recall:
• The standard deviation measures the variation among all possible data values within the population itself,
whereas
• the standard error measures the variation among all the possible values of the sample means.
• The confidence level of a CI corresponds to the percentage of the time the interval would contain the true population parameter if numerous random samples were taken.

Note: the most usual confidence level used is 95%, although others such as 99% exist.
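
The "percentage of the time" reading of the confidence level can be checked by simulation. The sketch below is an illustration under stated assumptions: it reuses the N(10, 4) population from Example 7.1, treats the population standard deviation of 2 as known, and the function name, number of repeats and seed are made up for this example.

```python
import random
import statistics
from math import sqrt

def coverage(n=50, repeats=2000, mu=10.0, sigma=2.0, z=1.96, seed=7):
    """Fraction of simulated intervals (sample mean +/- z*sigma/sqrt(n)) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / repeats

print(f"Share of intervals with multiplier 1.96 that captured the mean: {coverage():.3f}")
print(f"Share of intervals with multiplier 2.58 that captured the mean: {coverage(z=2.58):.3f}")
```

The first share should come out close to 0.95 and the second close to 0.99, which is what the confidence level describes.
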


THE CONFIDENCE LEVEL

FACTOR: POPULATION VARIABILITY

FACTOR: CHOOSING THE SAMPLE SIZE

EXAMPLE 7.2

95% CI FOR THE POPULATION MEAN
(FOR LARGE SAMPLES N≥30)

99% CI FOR THE POPULATION MEAN
(FOR LARGE SAMPLES N≥30)

10 – 2.58 × 2 ÷ √50 to 10 + 2.58 × 2 ÷ √50
10 – 0.73 to 10 + 0.73
9.27 to 10.73
We are 99% confident that the true (population) mean time to word process the business letter lies between 9.27 and 10.73 minutes.
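
The multiplier 2.58 is a rounded value from the standard normal distribution. As a rough check, the sketch below recovers the multiplier with Python's standard-library `statistics.NormalDist` and recomputes the interval above; the helper names `z_multiplier` and `mean_ci` are illustrative, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a confidence level such as 0.95 or 0.99."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def mean_ci(xbar, sd, n, confidence):
    """Interval xbar +/- z * sd / sqrt(n) for the given confidence level."""
    margin = z_multiplier(confidence) * sd / sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_ci(xbar=10, sd=2, n=50, confidence=0.99)
print(f"99% multiplier: {z_multiplier(0.99):.4f}")
print(f"99% CI: {low:.2f} to {high:.2f}")
```

Using the unrounded multiplier changes the limits only in the third decimal place, so the interval still rounds to 9.27 to 10.73.
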
Example 7.3
A sample of 100 invoices is randomly selected from a large file and the sample mean is calculated to be £86. If the sample standard deviation is £6, calculate the 95% confidence interval for the population mean.

x̄ – 1.96 × s ÷ √n to x̄ + 1.96 × s ÷ √n
(x̄ = sample mean, s = sample standard deviation, n = sample size)

86 – 1.96 × 6 ÷ √100 to 86 + 1.96 × 6 ÷ √100
86 – 1.176 to 86 + 1.176
84.824 to 87.176

Example 7.4
150 people were asked what their weekly income is. The sample mean was calculated as £378 and the sample standard deviation as £111.80.

95% CI
x̄ – 1.96 × s ÷ √n to x̄ + 1.96 × s ÷ √n

378 – 1.96 × 111.8 ÷ √150 to 378 + 1.96 × 111.8 ÷ √150
378 – 17.89 to 378 + 17.89
360.11 to 395.89

Example 7.4
99% CI

x̄ – 2.58 × s ÷ √n to x̄ + 2.58 × s ÷ √n

378 – 2.58 × 111.8 ÷ √150 to 378 + 2.58 × 111.8 ÷ √150
378 – 23.55 to 378 + 23.55
354.45 to 401.55
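
The arithmetic in Examples 7.3 and 7.4 can be reproduced with one small function. This is a sketch only: the function name and the lookup table are illustrative, and it uses the rounded multipliers 1.96 and 2.58 from the slides with the sample standard deviation standing in for the population value, which is the usual large-sample (n ≥ 30) approximation.

```python
from math import sqrt

# Rounded multipliers as used on the slides.
Z = {95: 1.96, 99: 2.58}

def large_sample_ci(xbar, s, n, level):
    """Interval xbar +/- z * s / sqrt(n) for a confidence level of 95 or 99 (percent)."""
    margin = Z[level] * s / sqrt(n)
    return xbar - margin, xbar + margin

examples = [
    ("Example 7.3, 95%", 86, 6, 100, 95),
    ("Example 7.4, 95%", 378, 111.8, 150, 95),
    ("Example 7.4, 99%", 378, 111.8, 150, 99),
]
for label, xbar, s, n, level in examples:
    low, high = large_sample_ci(xbar, s, n, level)
    print(f"{label}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

For the same data, the 99% interval is wider than the 95% interval: more confidence is paid for with less precision.
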
EXERCISE
A company is collecting data on the performance of its employees. It takes a sample of 100 workers and finds the mean number of items produced per day is 105. The standard deviation is 19.

Calculate:

a) 95% confidence interval for the population mean

x̄ – 1.96 × s ÷ √n to x̄ + 1.96 × s ÷ √n

b) 99% confidence interval for the population mean

x̄ – 2.58 × s ÷ √n to x̄ + 2.58 × s ÷ √n

CI FOR THE DIFFERENCE OF 2 POPULATION MEANS
(WHEN N1, N2 ≥ 30)

INTERPRETING THE CI FOR THE DIFFERENCE OF 2 MEANS

Example 7.5
• Ace Delivery Service operates a fleet of delivery vans. Currently, the company has all of its drivers pay for diesel using the same brand of credit card, a Texgas credit card. However, the company’s senior management has now decided that Quik-Chek, a chain of convenience stores that also sells diesel (but does not accept credit cards), is worth investigating.

A random sample of diesel prices (per litre) at 35 Texgas petrol stations and 40 Quik-Chek petrol stations, nationwide, is summarised in the table.

                     Texgas   Quik-Chek
sample size          35       40
mean                 £1.48    £1.39
standard deviation   8p       6p

Example 7.5

                     Texgas   Quik-Chek
sample size          35       40
mean                 £1.48    £1.39
standard deviation   8p       6p

a) Calculate an appropriate confidence interval to compare the diesel prices.
b) What should the management of Ace Delivery Service do: stay with Texgas or move to Quik-Chek? Justify your answer.
c) When calculating the CI, does it matter if the distributions of diesel prices are not normally distributed? Justify your answer.

Solutions
b)
Zero is not contained within the 95% CI and both the lower and upper limits are positive, indicating that the price of Texgas diesel is significantly higher than that of Quik-Chek.
We are 95% confident that Texgas is between 6p and 12p per litre more expensive than Quik-Chek.
So the management of Ace Delivery Service should move to Quik-Chek as there is a significant difference between the prices.
c)
No, it doesn’t matter if the distributions of diesel prices (i.e. the data values) are not normally distributed, as the sample sizes are sufficiently large (n1, n2 ≥ 30) to assume that the sampling distribution (of sample means) will be approximately normally distributed.
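
The working for part a) is not shown in the text above. The range of roughly 6p to 12p quoted in b) is consistent with the standard large-sample interval for a difference of two means, (x̄1 – x̄2) ± 1.96 × √(s1²/n1 + s2²/n2). The sketch below assumes that formula; the function name is illustrative and prices are converted to pence.

```python
from math import sqrt

def diff_of_means_ci(x1, s1, n1, x2, s2, n2, z=1.96):
    """Large-sample CI for mu1 - mu2: (x1 - x2) +/- z * sqrt(s1^2/n1 + s2^2/n2)."""
    diff = x1 - x2
    margin = z * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - margin, diff + margin

# Prices in pence: Texgas 148p (sd 8p, n = 35), Quik-Chek 139p (sd 6p, n = 40).
low, high = diff_of_means_ci(148, 8, 35, 139, 6, 40)
print(f"95% CI for Texgas minus Quik-Chek: {low:.2f}p to {high:.2f}p")
```

Both limits are positive, which matches the conclusion in b) that zero lies outside the interval.
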
