
Chap8 STAT 2 Merged

Chapter 7 discusses sampling distributions, defining key concepts such as parameters, statistics, and the sampling distribution of sample means and proportions. It explains how to calculate and interpret these distributions, including examples of estimating population parameters and constructing confidence intervals. Chapter 8 continues with point estimation and confidence intervals, emphasizing the importance of unbiased estimators and the Central Limit Theorem in statistical analysis.

Uploaded by

Bảo Châu

Chapter 7: Sampling distributions

Nguyen Minh Tri

University of Information Technology

April 17, 2025

7.1 Parameters, statistics and sampling distribution


When we wonder about the social world, we are actually wondering about characteristics
of a population.
1. What is the mean age of people in prison in Vietnam?
2. How much time do UIT students spend playing games per day on average?
3. What is the average income of an information technology engineer in Vietnam?
Definition 7.1 Let $X_1, X_2, \dots, X_n$ be $n$ independent random variables, each having the
same probability distribution. We call $X_1, X_2, \dots, X_n$ a random sample of size $n$
from the population.
Definition 7.2 Any function of the random variables constituting a random sample is called
a statistic.
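Example 7.3 Let $X_1, X_2, \dots, X_n$ be a random sample from a population.
1. Sample mean:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
2. Sample variance:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]
3. Sample standard deviation: $S = \sqrt{S^2}$.
4. Assume that $X$ elements of the sample have a given property. Then the sample proportion is
\[ \hat{P} = \frac{X}{n}. \]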
Definition 7.4 A parameter is a numerical descriptive measure of a population.
Notation: $\mu$ population mean; $\sigma^2$ population variance; $\sigma$ population standard
deviation; $p$ population proportion.
Values of statistics: $\bar{x}$ sample mean; $s^2$ sample variance; $s$ sample standard deviation;
$\hat{p}$ sample proportion.
Example 7.5 Identify the number in each statement as the value of a population parameter or a
sample statistic.
1. In a recent survey of Americans, 52% of Republicans say global warming is
happening.
2. A spokesman for a large insurance agency reported that the proportion of all women
with some form of life insurance is 0.32.
3. The U.S. Department of Transportation recently reported that the mean age of all
highway bridges in the United States is 42 years.
Solution.
1. 52% is a sample statistic. This number describes a characteristic of a
sample of Republicans.
2. 0.32 is a population parameter. This number describes a characteristic of the
entire population of women.
3. 42 is a population parameter. This number is a characteristic of all the highway
bridges in the United States.
Any statistic is a random variable, because the value of the statistic differs from
sample to sample.

Definition 7.6 The sampling distribution of a statistic is the probability distribution of
the statistic.
Example 7.7 Consider a population consisting of 6 measurements 2, 4, 6, 6, 7, 8. A
random sample of n = 2 measurements is selected from the population. Find the sampling
distribution of the sample mean $\bar{X}$.
Solution. There are 15 possible samples of n = 2 measurements (the two 6s are treated as
distinct elements):
Sample   Mean     Sample   Mean     Sample   Mean
(2, 4)   3        (4, 6)   5        (6, 7)   6.5
(2, 6)   4        (4, 6)   5        (6, 8)   7
(2, 6)   4        (4, 7)   5.5      (6, 7)   6.5
(2, 7)   4.5      (4, 8)   6        (6, 8)   7
(2, 8)   5        (6, 6)   6        (7, 8)   7.5
Each of the 15 samples is equally likely. Counting how often each value of $\bar{X}$ occurs
and arranging the probabilities in a table, we obtain the probability distribution shown here:
$\bar{x}$                 3     4     4.5   5     5.5   6     6.5   7     7.5
$P(\bar{X} = \bar{x})$    1/15  2/15  1/15  3/15  1/15  2/15  2/15  2/15  1/15
This is the sampling distribution of $\bar{X}$.
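The expectation of $\bar{X}$:
\[ E(\bar{X}) = 3 \cdot \frac{1}{15} + 4 \cdot \frac{2}{15} + \dots + 7.5 \cdot \frac{1}{15} = 5.5 = \mu \quad \text{(population mean)} \]
The variance of $\bar{X}$:
\[ \mathrm{var}(\bar{X}) = (3 - 5.5)^2 \cdot \frac{1}{15} + (4 - 5.5)^2 \cdot \frac{2}{15} + \dots + (7.5 - 5.5)^2 \cdot \frac{1}{15} = \frac{47}{30} \]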
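The enumeration in Example 7.7 can be checked mechanically. The sketch below is one possible way to do it, assuming (as the table of 15 samples suggests) that samples are drawn without replacement and that the two 6s count as distinct elements; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 6, 7, 8]

# All unordered samples of size 2, drawn without replacement.
# combinations() works on positions, so the two 6s count as distinct elements.
samples = list(combinations(population, 2))

# Count how often each sample mean occurs; Fractions keep the arithmetic exact.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
distribution = {m: Fraction(k, len(samples)) for m, k in sorted(counts.items())}

expectation = sum(m * prob for m, prob in distribution.items())
variance = sum((m - expectation) ** 2 * prob for m, prob in distribution.items())

for m, prob in distribution.items():
    print(f"P(mean = {float(m)}) = {prob}")
print("E =", expectation, " var =", variance)
```

Exact fractions are used so that the results can be compared directly with the values 5.5 and 47/30 stated in the example.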

7.2 The sampling distribution of the sample mean


Let $\bar{X}$ be the mean of a random sample of size $n$ drawn from a population with mean $\mu$
and variance $\sigma^2$.
1. The mean of the sampling distribution equals the mean of the sampled population: $E(\bar{X}) = \mu$.
2. The standard deviation of the sampling distribution: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
3. When $n$ is sufficiently large ($n \ge 30$), the sampling distribution of $\bar{X}$ is
approximately normal with mean $E(\bar{X}) = \mu$ and standard deviation
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
4. If the population is normally distributed, then
\[ T := \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
has a t-distribution with $n - 1$ degrees of freedom.
7.3 The distribution of sample proportion
Consider a sample of n individuals or objects (or trials) and let X be the number of
successes in the sample. The sample proportion is defined to be
\[ \hat{P} = \frac{X}{n}. \]
Let P̂ be the sample proportion of successes in a sample of size n from a population with
true proportion of success p. The sampling distribution of P̂ is summarized below.

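1. The mean of $\hat{P}$: $E(\hat{P}) = p$.
2. The variance of $\hat{P}$: $\mathrm{var}(\hat{P}) := \sigma_{\hat{P}}^2 = \dfrac{p(1 - p)}{n}$.
3. The standard deviation of $\hat{P}$: $\sigma_{\hat{P}} = \sqrt{\dfrac{p(1 - p)}{n}}$.
4. If $n$ is large and both $np \ge 5$ and $n(1 - p) \ge 5$, then the distribution of $\hat{P}$ is
approximately normal: $\hat{P} \sim N(p;\, p(1 - p)/n)$.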
Example 7.8 In early 2013, the number of people looking to change auto insurance companies
reached a record low. However, for the population of those people who were looking
for a better deal, 45% did change companies. Suppose 110 people looking for a better deal
on auto insurance are selected at random and the number who actually switch policies is
determined.
a. Find the distribution of the sample proportion of people who switch policies, P̂ .
b. What is the probability that the sample proportion (for the 110 people selected) is
greater than 0.50?
c. Find the probability that the sample proportion will be between 0.37 and 0.47.
Solution.
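a. We have $n = 110$, $p = 0.45$ and
\[ np = 110(0.45) = 49.5 > 5; \quad n(1 - p) = 110(0.55) = 60.5 > 5. \]
The distribution of $\hat{P}$ is approximately normal with
\[ \mu_{\hat{P}} = p = 0.45; \quad \sigma_{\hat{P}}^2 = \frac{p(1 - p)}{n} = \frac{(0.45)(0.55)}{110} = 0.00225. \]
Hence $\hat{P} \sim N(0.45;\, 0.00225)$ and $\sigma_{\hat{P}} = \sqrt{0.00225} \approx 0.0474$.
b. Find $P(\hat{P} > 0.5)$.
\[
\begin{aligned}
P(\hat{P} > 0.5) &= P\left( \frac{\hat{P} - 0.45}{\sqrt{0.00225}} > \frac{0.5 - 0.45}{\sqrt{0.00225}} \right) \\
&= P(Z > 1.05) \\
&= 1 - P(Z \le 1.05) \\
&= 1 - 0.8531 = 0.1469
\end{aligned}
\]
c. Find $P(0.37 < \hat{P} < 0.47)$.
\[
\begin{aligned}
P(0.37 < \hat{P} < 0.47) &= P\left( \frac{0.37 - 0.45}{\sqrt{0.00225}} < \frac{\hat{P} - 0.45}{\sqrt{0.00225}} < \frac{0.47 - 0.45}{\sqrt{0.00225}} \right) \\
&= P(-1.69 < Z < 0.42) \\
&= \Phi(0.42) - \Phi(-1.69) = 0.6628 - 0.0455 = 0.6173
\end{aligned}
\]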
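The probabilities in Example 7.8 can also be obtained without a normal table. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; it assumes the normal approximation described in Section 7.3, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 110, 0.45

# The normal approximation is only used when both expected counts are at least 5.
assert n * p >= 5 and n * (1 - p) >= 5

sigma = sqrt(p * (1 - p) / n)           # standard deviation of the sample proportion
p_hat = NormalDist(mu=p, sigma=sigma)   # approximate sampling distribution of P-hat

prob_b = 1 - p_hat.cdf(0.50)                # P(P-hat > 0.50)
prob_c = p_hat.cdf(0.47) - p_hat.cdf(0.37)  # P(0.37 < P-hat < 0.47)

print(f"sigma = {sigma:.4f}")
print(f"P(P-hat > 0.50)        = {prob_b:.4f}")
print(f"P(0.37 < P-hat < 0.47) = {prob_c:.4f}")
```

Because the code does not round the z-values to two decimals before evaluating the normal CDF, its results may differ from the table-based answers in the third decimal place.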
Chapter 8: Estimations

8.1 Point estimation


Definition 8.1
1. A point estimate of a population parameter is a single number computed from a
sample, which serves as a guess for the parameter.
2. An estimator is a statistic of interest and is, therefore, a random variable.
3. An estimate is simply a specific value of an estimator.
Suppose we need to estimate a population parameter $\theta$ and there are many different
statistics (rules) available.
Example 8.2 The temperature at a location was recorded at 20 different times, giving
the following data:
24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94
27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88
We want to estimate the average temperature $\mu$ of this location.
Consider the following estimators and resulting estimates for $\mu$:
1. Estimator = $\bar{X}$, estimate = $\bar{x} = \sum x_i / n = 555.86/20 = 27.79$ (sample mean)
2. Estimator = $\tilde{X}$, estimate = $\tilde{x} = (27.94 + 27.98)/2 = 27.96$ (sample median)
3. Estimator = $(\min(X_i) + \max(X_i))/2$, estimate = $(24.46 + 30.88)/2 = 27.67$ (sample midrange)
Which estimator, when used on other samples of $X_i$'s, will tend to produce estimates
closest to the true value?
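The three estimates of Example 8.2 can be reproduced with a few lines of code. This is only a sketch of the arithmetic, using Python's standard `statistics` module; the dictionary keys are illustrative labels.

```python
from statistics import mean, median

temperatures = [
    24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
    27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88,
]

# Three different estimators applied to the same sample.
estimates = {
    "sample mean": mean(temperatures),
    "sample median": median(temperatures),
    "midrange": (min(temperatures) + max(temperatures)) / 2,
}

for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

The code gives one estimate per estimator for this particular sample; it says nothing yet about which estimator behaves best across repeated samples, which is the question the example raises.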
Suppose $\hat{\theta}$ is an estimator for the population parameter $\theta$. Then $\hat{\theta}$ is a
random variable, so it has a mean, a variance, and a distribution.
Definition 8.3 A statistic $\hat{\theta}$ is an unbiased estimator of a population parameter $\theta$ if
\[ E(\hat{\theta}) = \theta. \]
[Figure: Probability Distributions of Biased and Unbiased Estimators. Horizontal axis: estimate; vertical axis: probability density. Marked on the horizontal axis: the true value $\theta$, $E[\hat{\theta}_{\text{unbiased}}]$ and $E[\hat{\theta}_{\text{biased}}]$.]

Example 8.4
The sample mean is an unbiased estimator of the population mean ($E(\bar{X}) = \mu$).
The sample variance is an unbiased estimator of the population variance ($E(S^2) = \sigma^2$).
The sample proportion is an unbiased estimator of the population proportion
($E(\hat{P}) = p$).
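The claim $E(S^2) = \sigma^2$ can be verified exactly on a small case. The sketch below is an illustration under stated assumptions, not part of the original notes: it reuses the six-value population of Example 7.7, draws samples of size 3 with replacement (so the observations are independent and identically distributed), and averages the sample variance over all equally likely samples. The helper `sample_variance` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 6, 7, 8]   # illustrative population, reused from Example 7.7
N = len(population)
n = 3                              # sample size (an assumption for this sketch)

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

# Every ordered sample drawn with replacement is equally likely (i.i.d. draws).
samples = list(product(population, repeat=n))

mean_of_s2 = sum(sample_variance(s, n - 1) for s in samples) / len(samples)
mean_of_biased = sum(sample_variance(s, n) for s in samples) / len(samples)

print("population variance   :", sigma2)
print("E(S^2), divisor n - 1 :", mean_of_s2)
print("mean with divisor n   :", mean_of_biased)
```

Dividing by $n - 1$ reproduces the population variance exactly, while dividing by $n$ gives a systematically smaller value, which is why the sample variance is defined with $n - 1$.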

The second rule for choosing a statistic is that, of all unbiased statistics, the best statistic
to use is the one with the smallest variance.
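[Figure: Two Unbiased Estimators with Different Variances. Horizontal axis: estimate; vertical axis: probability density. Both densities are centered at the true value $\theta$, that is, $E[\hat{\theta}_1] = E[\hat{\theta}_2] = \theta$; one has low variance and the other has high variance.]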

The choice of an estimator is a difficult decision, and there is no definitive answer.


Suppose we estimate the population parameter ✓, and there are several unbiased
statistics from which to choose. If one of these statistics has the smallest possible
variance, it is called the MVUE (minimum-variance unbiased estimator).
If the population is normal, the sample mean is the MVUE for $\mu$.
8.2 A confidence interval for a population mean when $\sigma$ is known
If we say that the average monthly salary of an IT engineer with 5 years of experience is
26.2 million VND, then that number is not very reliable. You may be close, but chances
are that the true value isn’t really 26.2 but somewhere around it. When you think about
it, it’s much safer to say that the average monthly salary of an IT engineer with 5 years of
experience is somewhere between 22.2 and 30.4 million VND. In this way, you have created
a confidence interval around your point estimate of 26.2.
You may say that you are 95% confident that the average monthly salary (the population
parameter) lies between 22.2 and 30.4 million VND. The number 95% is called the confidence
level.
Definition 8.5
An interval estimate of a parameter is an interval or a range of values used to
estimate the parameter. This estimate may or may not contain the value of the
parameter being estimated.
The confidence level, denoted $1 - \alpha$, of an interval estimate of a parameter is the
probability that the interval estimate will contain the parameter, assuming that a
large number of samples are selected and that the estimation process on the same
parameter is repeated.
A confidence interval is a specific interval estimate of a parameter determined by
using data obtained from a sample and by using the specific confidence level of the
estimate.
If our confidence level is 95%, then in the long run 95% of our confidence intervals will
contain the population parameter and 5% will not.
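The long-run reading of the confidence level can be illustrated by simulation. The sketch below is a hypothetical experiment, not taken from the notes: it assumes a normal population with made-up values $\mu = 50$ and known $\sigma = 10$, repeatedly draws samples of size 25, builds the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ that this section develops, and counts how often the interval contains $\mu$.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(0)          # fixed seed so the run is reproducible

mu, sigma = 50.0, 10.0          # hypothetical population mean and known std. deviation
n = 25                          # sample size
confidence = 0.95
repetitions = 2000

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
margin = z * sigma / sqrt(n)

hits = 0
for _ in range(repetitions):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    # Does this sample's interval contain the true mean?
    if xbar - margin < mu < xbar + margin:
        hits += 1

coverage = hits / repetitions
print(f"{coverage:.1%} of the {repetitions} intervals contain mu")
```

The observed fraction should be close to, but generally not exactly, 95%, since it is itself a random quantity.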

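[Figure: normal curve centered at $\theta$. The central area between $\theta - z$ and $\theta + z$ is the confidence level $1 - \alpha$; each tail has area $\alpha/2$.]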

The sample mean $\bar{X}$ is a point estimator of the population mean $\mu$. According to the
Central Limit Theorem, the sampling distribution of the sample mean is approximately
normal for large samples ($n \ge 30$).
1. Suppose either (a) the population is normal, or (b) the sample size $n$ is large, or
both, and the population standard deviation $\sigma$ is known.
2. The sample mean $\bar{X}$ is (approximately) normal: $\bar{X} \sim N(\mu;\, \sigma^2/n)$.
3. A $100(1 - \alpha)\%$ confidence interval for $\mu$ is
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
where $z_{\alpha/2}$ can be found in the normal table.
Definition 8.6 The value $z_{\alpha/2}$ is called the critical value. It is the value on the measurement
axis of the standard normal distribution such that $P(Z \le z_{\alpha/2}) = 1 - \alpha/2$.
Example 8.7 ($\sigma$ is known) Fifteen vehicles were observed at random for their speeds (in
mph) on a highway with speed limit posted as 70 mph, and it was found that their average
speed was 73.3 mph. Suppose that from past experience we can assume that vehicle speeds
are normally distributed with $\sigma = 3.2$.
Construct a 90% confidence interval for the true mean speed $\mu$ of the vehicles on this
highway.
Solution.
We have $\bar{x} = 73.3$, $\sigma = 3.2$ and $n = 15$.
Confidence level $1 - \alpha = 0.90$. Hence $\alpha = 0.10$ and $z_{\alpha/2} = z_{0.05} = 1.645$.
A 90% confidence interval for the true mean speed $\mu$ is
\[ \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
or
\[ 71.941 < \mu < 74.659. \]
We are 90% confident that the true mean speed $\mu$ of the vehicles on this highway is
between 71.941 and 74.659 mph.
Example 8.8 Suppose the weight of a 185/60/14 tire filled with air is normally distributed
with standard deviation 1.25 pounds. In a random sample of 15 filled tires, the sample
mean weight was $\bar{x} = 18.75$ pounds. Find a 95% confidence interval for the true mean
weight of 185/60/14 tires.
Solution. ................
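One way to carry out the computation for an interval of the form $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is sketched below, applied to the data of Example 8.8. The function name `z_confidence_interval` is hypothetical, and the critical value is taken from `statistics.NormalDist` rather than from a printed table, so it is slightly more precise than the usual 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence):
    """Interval xbar +/- z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error
    return xbar - margin, xbar + margin

# Data of Example 8.8: sigma = 1.25, n = 15, sample mean 18.75, 95% confidence.
lower, upper = z_confidence_interval(xbar=18.75, sigma=1.25, n=15, confidence=0.95)
print(f"{lower:.3f} < mu < {upper:.3f}")
```

The same function applies to any example in this section where the population is normal (or $n$ is large) and $\sigma$ is known; only the four arguments change.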
Definition 8.9 The margin of error for the estimate of $\mu$ is $z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$.
Example 8.10 A manufacturing firm is interested in estimating the average distance traveled
to work by its employees. Past studies of this type indicate that the standard deviation
of these distances should be in the neighborhood of 2 miles. How many employees should
be sampled if the estimate is to be within 0.1 mile of the true average, with 95% confidence?
Solution. ................

8.3 A confidence interval for a population mean when $\sigma$ is unknown

Case 1: Assume that the population standard deviation $\sigma$ is unknown and the sample size
$n$ is large ($n \ge 30$). The confidence interval is
\[ \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} \]
where $s$ is the sample standard deviation.
Example 8.11 ($\sigma$ is unknown, large sample) Scholastic Aptitude Test (SAT) mathematics
scores of a random sample of 500 high school seniors in a city are collected, and the
sample mean and standard deviation are found to be 501 and 112, respectively. Find a
99% confidence interval on the mean SAT mathematics score for seniors in that city.
Solution.
Sample mean $\bar{x} = 501$, sample standard deviation $s = 112$, sample size $n = 500$.
Confidence level: $1 - \alpha = 0.99$. Hence $\alpha = 0.01$ and $z_{\alpha/2} = z_{0.005} = 2.575$.
A 99% confidence interval for $\mu$ is
\[ \bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}} \]
or
\[ 488.1 < \mu < 513.9. \]
We are 99% confident that the true mean SAT mathematics score of seniors is
between 488.1 and 513.9.
Example 8.12 Suppose a PC manufacturer wants to evaluate the performance of its hard
disk memory system. One measure of performance is the average time between failures of
the disk drive. To estimate this value, a quality control engineer recorded the time between
failures for a random sample of 45 disk-drive failures. The following sample statistics were
computed: $\bar{x} = 1762$ hours and $s = 215$ hours. Estimate the true mean time between
failures with a 90% confidence interval.
Solution. ................
Case 2: Assume that the population from which the sample is selected has an approximately
normal distribution, the population standard deviation $\sigma$ is unknown, and the sample size
$n$ is small ($n < 30$). Then the random variable
\[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
has a Student t-distribution with $n - 1$ degrees of freedom. Here $S$ is the sample standard
deviation.
If $\bar{x}$ and $s$ are the sample mean and the sample standard deviation of a random
sample of size $n$ from a normal population, then
\[ \bar{x} - t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \]
is a $(1 - \alpha)100\%$ CI for the population mean $\mu$.
Definition 8.13 $t_{\alpha,\,m}$ is a critical value of the t-distribution with $m$ degrees of freedom:
if $T$ has a t-distribution with $m$ degrees of freedom, then $P(T > t_{\alpha,\,m}) = \alpha$.
Example 8.14 The following is a random data set from a normal population:
7.2 5.7 4.9 6.2 8.5 2.8
Construct a 95% CI for the population mean $\mu$.
Solution.
Mean and standard deviation of the sample: $\bar{x} = 5.883$ and $s = 1.959$.
The population follows a normal distribution, $\sigma$ is unknown, and the sample size is $n = 6$,
so we use the t-distribution with 5 degrees of freedom.
A confidence level of 95% means that $\alpha = 1 - 0.95 = 0.05$; from the t-table,
$t_{\alpha/2,\,n-1} = t_{0.025,\,5} = 2.571$.
Hence, a 95% CI for $\mu$ is
\[ \left( \bar{x} - t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \right) \]
or
\[ (3.827;\ 7.939). \]
We are 95% confident that the true mean $\mu$ is between 3.827 and 7.939.
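The small-sample interval of Example 8.14 can be recomputed from the raw data. The sketch below assumes the critical value is read from a t-table, as in the example (Python's standard library has no t-distribution quantile function), and the function name `t_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(data, t_critical):
    """Interval xbar +/- t * s / sqrt(n); `t_critical` is read from a t-table."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                      # sample standard deviation (divisor n - 1)
    margin = t_critical * s / sqrt(n)
    return xbar - margin, xbar + margin

data = [7.2, 5.7, 4.9, 6.2, 8.5, 2.8]   # data of Example 8.14
# 2.571 is the table value quoted in the example (95% confidence, 5 degrees of freedom).
lower, upper = t_confidence_interval(data, t_critical=2.571)
print(f"({lower:.3f}; {upper:.3f})")
```

Since the code keeps full precision for $\bar{x}$ and $s$ instead of rounding them to three decimals first, the endpoints may differ from the hand computation in the last printed digit.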
Example 8.15 A random sample of size 26 is drawn from a population having a normal
distribution. The sample mean and the sample standard deviation from the data are given,
respectively, as $\bar{x} = 2.22$ and $s = 1.67$. Construct a 98% CI for the population mean $\mu$
and interpret.
Solution. ................
Example 8.16 The following data, expressed as an air pollution index, give the air quality
of a city for 10 randomly selected days:
57.3 ; 58.1 ; 58.7 ; 66.7 ; 58.6 ; 61.9 ; 59.0 ; 64.4 ; 62.6 ; 64.9
Assuming that the data may be looked upon as a random sample from a normal population,
construct a 95% CI for the actual average air pollution index for this city and interpret.
Solution. ................

8.4 Confidence interval for a population proportion


Consider a binomial distribution with parameter $p$. Let $X$ be the number of
successes in $n$ trials.
A point estimator of the proportion $p$ is given by the statistic $\hat{P} = X/n$.
By the Central Limit Theorem, for $n$ sufficiently large, $\hat{P}$ is approximately normally
distributed with mean $\mu_{\hat{P}} = p$ and variance $\sigma_{\hat{P}}^2 = \dfrac{p(1 - p)}{n}$.
If $\hat{p}$ is the proportion of successes in a random sample of size $n$, an approximate
$100(1 - \alpha)\%$ confidence interval for the binomial parameter $p$ is given by
\[ \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
One should require both $n\hat{p} > 5$ and $n(1 - \hat{p}) > 5$.
Example 8.17 An auto manufacturer gives a bumper-to-bumper warranty for 3 years or
36,000 miles for its new vehicles. In a random sample of 60 of its vehicles, 20 of them
needed five or more major warranty repairs within the warranty period. Estimate the true
proportion of vehicles from this manufacturer that need five or more major repairs during
the warranty period, with confidence level 0.95.
Solution.
Sample proportion: $\hat{p} = \dfrac{20}{60} = \dfrac{1}{3}$.
Confidence level: $1 - \alpha = 0.95$. Hence, $\alpha = 0.05$ and $z_{\alpha/2} = 1.96$.
Sample size $n = 60$, with $n\hat{p} = 20 > 5$ and $n(1 - \hat{p}) = 40 > 5$.
A 95% confidence interval for the population proportion $p$ is
\[ \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
or
\[ 0.21405 < p < 0.45262. \]
We are 95% confident that the true proportion of vehicles from this manufacturer
that need five or more major repairs during the warranty period lies in the
interval $(0.21405,\ 0.45262)$.
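The interval of Example 8.17 can be reproduced as follows. This is a sketch under the assumptions of this section: the critical value 1.96 is passed in as read from the normal table, and the function name `proportion_confidence_interval` is hypothetical.

```python
from math import sqrt

def proportion_confidence_interval(successes, n, z):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Rule of thumb from the text for using the normal approximation.
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("normal approximation not recommended for this sample")
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 8.17: 20 of 60 vehicles, 95% confidence, z = 1.96 from the normal table.
lower, upper = proportion_confidence_interval(successes=20, n=60, z=1.96)
print(f"{lower:.5f} < p < {upper:.5f}")
```

Raising an error when the rule of thumb fails is a design choice of this sketch; the notes only say that both conditions should be required.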
Example 8.18 The proportion of defective memory chips produced by a factory is $p$. Suppose
400 chips are tested and 10 of them are found to be defective. Compute a 95%
confidence interval for the proportion of defective chips.
Solution. ................

Example 8.19 A survey conducted of 1404 respondents found that 323 students paid for
their education by student loans. Find the 90% confidence interval of the true proportion
of students who paid for their education by student loans.
Solution. ................
8.5 Determining sample size
We show in this section that the appropriate sample size for making an inference
about a population mean or proportion depends on the desired reliability.
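The difference between $\bar{x}$ and $\mu$ is the error of estimation resulting from the
sampling process. Let $E = \bar{x} - \mu$ be the error of estimation. Note that
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{E}{\sigma / \sqrt{n}}. \]
The sample size is determined by the formula
\[ n = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^2. \]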
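The step from the standardized error to the sample-size formula is short and can be spelled out as follows, assuming the error bound $E$ is positive and $z$ is taken to be the critical value $z_{\alpha/2}$ for the chosen confidence level:

```latex
\begin{align*}
z_{\alpha/2} = \frac{E}{\sigma / \sqrt{n}}
  \;&\Longrightarrow\; \sqrt{n} = \frac{z_{\alpha/2}\, \sigma}{E} \\
  \;&\Longrightarrow\; n = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^{2}.
\end{align*}
```

Since $n$ must be an integer and a smaller $n$ would give a larger error, the computed value is rounded up, as the worked examples in this section do.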
The population variance $\sigma^2$ is known or can be determined from past studies.
If the population variance is unknown, then we can estimate
\[ \sigma \approx \frac{1}{4}\, \text{range}. \]

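[Figure: normal curve centered at $\mu$. The central area $1 - \alpha$ lies between $\mu - z$ and $\mu + z$, an interval labeled as the range; each tail has area $\alpha/2$.]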

Example 8.20 The estimation of a new operating system's mean response time to an
editing command should have an error bound of 5 milliseconds with 95% confidence. Experience
with other operating systems suggests that $\sigma = 25$ is a reasonable approximation
to the population standard deviation. What sample size $n$ should be used?
Solution.
For confidence level $1 - \alpha = 0.95$, we have $z_{\alpha/2} = 1.96$.
Error of estimation: $E = 5$
Standard deviation: $\sigma = 25$
The sample size is
\[ n = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^2 = \left( \frac{(1.96)(25)}{5} \right)^2 = 96.04. \]
We round up to $n = 97$.
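The sample-size rule for a mean can be wrapped in a small helper. The sketch below is illustrative: the function name `sample_size_for_mean` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of a table, which does not change the rounded-up answers here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, error, confidence):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / error) ** 2)

# Example 8.20: sigma = 25 ms, error bound 5 ms, 95% confidence.
n_response_time = sample_size_for_mean(sigma=25, error=5, confidence=0.95)

# When sigma is unknown, the rule of thumb sigma ~ range / 4 can be plugged in.
# Example 8.21: pressures range from 13.3 to 13.7, error bound 0.025, 99% confidence.
n_football = sample_size_for_mean(sigma=(13.7 - 13.3) / 4, error=0.025, confidence=0.99)

print(n_response_time, n_football)
```

Using `ceil` encodes the rounding-up convention: any fractional result is pushed to the next integer so that the error bound is still met.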
Example 8.21 The manufacturer of official NFL footballs uses a machine to inflate its
new balls to a pressure of 13.5 pounds. When the machine is properly calibrated, the
mean inflation pressure is 13.5 pounds, but uncontrollable factors cause the pressures of
individual footballs to vary randomly from about 13.3 to 13.7 pounds. For quality control
purposes, the manufacturer wishes to estimate the mean inflation pressure to within 0.025
pound of its true value with a 99% confidence interval. What sample size should be used?
Solution.
For confidence level $1 - \alpha = 0.99$, we have $z_{\alpha/2} = 2.575$.
Error of estimation: $E = 0.025$
The range of observations is $13.7 - 13.3 = 0.4$, hence we can estimate $\sigma \approx \frac{1}{4}(0.4) = 0.1$.
The sample size is
\[ n = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^2 = \left( \frac{(2.575)(0.1)}{0.025} \right)^2 = 106.09. \]
We round up to $n = 107$.
Example 8.22 A large manufacturing firm is interested in estimating the average distance
traveled to work by its employees. Past studies of this type indicate that the standard
deviation of these distances should be in the neighborhood of 2 km. How many employees
should be sampled if the estimate is to be within 0.1 km of the true average, with 95%
confidence?
Solution. ................

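In order to estimate the binomial probability $p$ to within an error of estimation $E$ with
$(1 - \alpha)100\%$ confidence, the required sample size is found by solving the following
equation for $n$:
\[ E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
The formula for determining the sample size is
\[ n = \frac{z_{\alpha/2}^2\, \hat{p}(1 - \hat{p})}{E^2}. \]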
Example 8.23 Suppose that a local TV station in a city wants to conduct a survey to
estimate support for the president’s policies on the economy within 3% error with 95%
confidence. Suppose they have an initial estimate that 70% of the people in the city support
the economic policies of the president. How many people should the station survey?
Solution.
For confidence level $1 - \alpha = 0.95$, we have $z_{\alpha/2} = 1.96$.
Error of estimation: $E = 0.03$
Sample proportion: $\hat{p} = 0.7$
The sample size is
\[ n = \frac{z_{\alpha/2}^2\, \hat{p}(1 - \hat{p})}{E^2} = \frac{(1.96)^2 (0.7)(0.3)}{(0.03)^2} \approx 896.37. \]
Thus, the TV station must survey at least 897 people.
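The sample-size formula for a proportion translates directly into code. In the sketch below the function name `sample_size_for_proportion` is hypothetical, and the critical value is passed in as read from the normal table, matching the hand computation of Example 8.23.

```python
from math import ceil

def sample_size_for_proportion(p_hat, error, z):
    """Smallest integer n with z * sqrt(p_hat * (1 - p_hat) / n) <= error."""
    return ceil(z ** 2 * p_hat * (1 - p_hat) / error ** 2)

# Example 8.23: initial estimate 0.7, error 0.03, z = 1.96 (95% confidence).
n_survey = sample_size_for_proportion(p_hat=0.7, error=0.03, z=1.96)
print(n_survey)
```

The result depends on the initial estimate `p_hat`; the product `p_hat * (1 - p_hat)` is largest at 0.5, so an estimate closer to 0.5 leads to a larger required sample.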
Example 8.24 A researcher wishes to estimate, with 95% confidence, the proportion of
people who did not have a tablet. A previous study shows that 40% of those interviewed did
not have a tablet. The researcher wishes to be accurate within 2% of the true proportion.
Find the minimum sample size necessary.
Solution. ................

Example 8.25 Suppose you want to estimate the average age of all Boeing 737-300 airplanes
now in active domestic U.S. service. You want to be 95% confident, and you want
your estimate to be within one year of the actual figure. The 737-300 was first placed in
service about 24 years ago, but you believe that no active 737-300s in the U.S. domestic
fleet are more than 20 years old. How large a sample should you take? (Answer: 97)
Solution. ................
