Module 6-1

- The document discusses the sampling distribution of the sample mean when random samples are taken from a population.
- If the population is normally distributed, the sampling distribution will be normally distributed regardless of sample size. If the sample size is large enough (usually n ≥ 30), the sampling distribution will be approximately normally distributed by the Central Limit Theorem.
- When the population standard deviation is unknown, it is estimated using the sample standard deviation. This is used to estimate the standard error of the mean to compute P-values and margins of error using Student's t-distributions.

Module 6 - 1: Inferences About Means

The Sampling Distribution of the Sample Mean

When random samples of size n are taken from a population with mean µ
and standard deviation σ, the sampling distribution of the sample mean Ȳ
with sample size n has mean $\mu_{\bar{y}} = \mu$ and standard deviation
$\sigma_{\bar{y}} = SD(\bar{Y}) = \sigma/\sqrt{n}$.

• If the population has a Normal distribution, then the sampling
distribution of Ȳ with sample size n will be exactly Normally distributed,
regardless of the sample size n.
• For any population, if the sample size is large enough (usually n ≥ 30),
then by the Central Limit Theorem (CLT), the sampling distribution
of Ȳ with sample size n will be approximately Normally distributed.

$$Z = \frac{\bar{Y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \mathrel{\dot\sim} N(0, 1)$$
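The facts above can be checked by simulation. The sketch below is an illustration only: the exponential population (mean 1, standard deviation 1), the sample size n = 36, the number of repetitions, and the helper name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the notes.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from a random sample of size n."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

# Hypothetical skewed population: exponential with mean 1 and SD 1.
mu, sigma, n = 1.0, 1.0, 36
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n, reps=20000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(f"mean of sample means: {mean_of_means:.3f} (theory: {mu:.3f})")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {sigma / n ** 0.5:.3f})")
```

Even though the population is strongly skewed, the simulated sample means should centre near µ with spread near σ/√n, as the formulas predict.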

When making inferences about µ by building confidence intervals and
conducting hypothesis tests using sample data, the population standard
deviation σ is typically unknown.

In this case, we estimate σ with the sample standard deviation s and then
use the standard error
$$SE(\bar{Y}) = \frac{s}{\sqrt{n}}$$
to estimate the standard deviation $\sigma_{\bar{y}} = SD(\bar{Y})$.

When we compute P-values and margins of error using $SE(\bar{Y})$, we require
a new collection of models called the Student’s t-Models.
Student’s t Distribution

[Figure: a t-curve; horizontal axis from −4 to 4]

A t-distribution has an associated t-curve, which has the following
properties:
• The total area under a t-curve is 1.
• A t-curve extends infinitely in both directions, getting very close to,
but never touching, the horizontal axis.
• A t-curve is bell-shaped and symmetric about t = 0.
• There are infinitely many t-distributions/t-curves. Each one is
identified by its number of degrees of freedom.
• We will denote the t-model with degrees of freedom df by $t_{df}$.

[Figure: t-curves for df = 1, df = 2, and df = 15; horizontal axis from −4 to 4]


• t-curves have more spread than the standard normal curve.
• t-curves have thicker tails than the Z-curve; that is, a t-curve does not
approach the horizontal axis as quickly as the Z-curve does.
• As the number of degrees of freedom increases, the t-curves look
increasingly like the standard normal curve.
• When df ≥ 30, a t-curve and the Z-curve are almost identical, and for
df ≥ 2001 they are essentially indistinguishable.
• To find critical values for confidence intervals or P-values for hypothesis
tests using a t-distribution, we can use software or a t-table.
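To see how the t-curves approach the Z-curve, one can compare 95% critical values across degrees of freedom. The sketch below uses only the Python standard library. The helper names `t_pdf`, `t_cdf`, and `t_critical` are illustrative, and the numerical method (Simpson's rule plus bisection) is just one simple way to do this; in practice a statistics package would normally be used.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the curve between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Critical value t* with central area `confidence`, found by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)
for df in (1, 2, 15, 30, 2001):
    print(f"df = {df:>4}: t* = {t_critical(0.95, df):.3f}   (z* = {z_star:.3f})")
```

The printed t* values shrink toward z* ≈ 1.960 as df grows, which is the numerical counterpart of the curves becoming indistinguishable.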
[Figure: the Z-curve and the t-curve with df = 2; horizontal axis from −4 to 4]

Using a t-Table

• The table provides either upper-tailed or two-tailed areas/probabilities
as the column headings.
• The table only gives certain degrees of freedom as the row headings. If
the degrees of freedom you require are not in the table, use the next
lower df value. (This also applies to the χ²-table.)
• The t-scores are given in the interior of the table.
• The final row gives the critical values from the Z-distribution and is
labelled ∞ df (t-model with infinite degrees of freedom).


Example: In each of the following, find the relevant t-score.

(a) df = 8

(b) df = 14

(c) Critical value for a 95% confidence interval with df = 29


(d) Critical value for a 99% confidence interval with df = 42


Example: In each of the following, use the t-table to find an interval in
which the probability lies and compute the probability using software.

(a) $P(t_5 > 2.483)$

(b) $P(t_{26} < -1.01)$

(c) $2P(t_{17} > 1.5)$
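For the "using software" part of this example, the three probabilities can be computed numerically. This is a minimal standard-library sketch, not a prescribed method: `t_pdf` and `t_cdf` are illustrative helper names, and the CDF is approximated with Simpson's rule. The brackets in the comments come from a standard t-table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the curve between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

p_a = 1 - t_cdf(2.483, 5)        # (a) upper tail; table bracket: 0.025 to 0.05
p_b = t_cdf(-1.01, 26)           # (b) lower tail; table bracket: 0.10 to 0.25
p_c = 2 * (1 - t_cdf(1.5, 17))   # (c) two tails;  table bracket: 0.10 to 0.20

for label, p in (("(a)", p_a), ("(b)", p_b), ("(c)", p_c)):
    print(f"{label} {p:.4f}")
```

Each computed probability should fall inside the interval obtained from the table, which is a useful consistency check on both answers.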


Assumptions and Conditions:
(a) Independence Assumption: values in the sample must be independent
of each other.
   (i) Randomization Condition: data obtained from a simple random
   sample from the population or from a properly randomized
   experiment.
   (ii) 10% Condition: sample size should be less than 10% of the
   population size (when drawn without replacement).

(b) Normal Population Assumption: we require either
   (i) a large sample size (n ≥ 30), or
   (ii) a Normally distributed population.
      * Nearly Normal Condition: The data come from an approximately
      Normal population (distribution is unimodal and reasonably
      symmetric). To check this, we can look at a histogram or
      Normal probability plot of the data.

Key Fact:
When these assumptions are made and conditions are met, the standardized
variable
$$t = \frac{\bar{Y} - \mu}{s/\sqrt{n}}$$
follows a Student’s t-model with n − 1 degrees of freedom, where n is the
sample size.
One-Sample t-Interval for the Mean when σ is Unknown

Recall: a confidence interval has the form


point estimate ± margin of error

= point estimate ± critical value × standard error of the estimate

When the population standard deviation σ is unknown and when the
relevant assumptions are made and conditions are met, a 100(1 − α)%
confidence interval for the population mean µ is
$$\bar{y} \pm t^*_{n-1} \frac{s}{\sqrt{n}}$$
where $t^*_{n-1}$ is the critical value corresponding to the 100(1 − α)% confidence
level based on n − 1 degrees of freedom.

• Increasing the sample size decreases the margin of error, which makes
the confidence interval narrower.
• Increasing the confidence level increases the margin of error, which
makes the confidence interval wider.

Note: When interpreting a confidence interval, keep in mind:

• The confidence interval is about the population mean, not the
individual values or the sample mean.
• Avoid making probability statements about a particular interval. A
particular interval either includes µ or it does not. The population
mean µ does not vary.
• The interval we have computed does not set the standard for the other
intervals.
• The confidence level tells us the percentage of all possible samples of
size n that result in an interval that contains µ.


Example: A random sample of 21 washing machines was obtained and
the length (in minutes) of the wash cycle of each one was recorded. The
sample had a mean of ȳ = 37.8 minutes and a sample standard deviation
of s = 5.9 minutes. Assume the population distribution of all wash cycle
times is Normal.

(a) Give a 99% confidence interval for the true mean wash cycle time.

∴ we are 99% confident that the true mean wash cycle time is between


(b) Give a 90% confidence interval for the true mean wash cycle time.

∴ we are 90% confident that the true mean wash cycle time is between

(c) Give a 90% confidence interval for the true mean wash cycle time, if
the sample size was actually 57.

∴ we are 90% confident that the true mean wash cycle time is between
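The three intervals in the washing-machine example can be checked with a short script. This is a sketch under stated assumptions: the function name `t_interval` is illustrative, and the critical values are read from a standard t-table rather than given in these notes. For part (c), df = 56 is assumed not to be tabulated, so the next lower row (df = 50) is used, following the table rule stated earlier.

```python
import math

def t_interval(ybar, s, n, t_star):
    """Return (lower, upper) for the interval ybar ± t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return ybar - margin, ybar + margin

ybar, s = 37.8, 5.9

# Critical values assumed from a standard t-table:
#   df = 20, 99% confidence -> 2.845
#   df = 20, 90% confidence -> 1.725
#   df = 50, 90% confidence -> 1.676 (next lower row for df = 56)
cases = {
    "(a) 99%, n = 21": (21, 2.845),
    "(b) 90%, n = 21": (21, 1.725),
    "(c) 90%, n = 57": (57, 1.676),
}
for label, (n, t_star) in cases.items():
    lower, upper = t_interval(ybar, s, n, t_star)
    print(f"{label}: ({lower:.2f}, {upper:.2f})")
```

With these table values the script gives roughly (34.14, 41.46), (35.58, 40.02), and (36.49, 39.11) minutes. The pattern matches the earlier remarks: lowering the confidence level or raising the sample size narrows the interval.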


Choosing your Sample Size

Sometimes we want to know what sample size to choose in order to have a
specific margin of error with a specific confidence level.

To do this, we take the formula for the margin of error
$$ME = t^*_{n-1} \frac{s}{\sqrt{n}}$$
and solve for n:
$$n = \left(\frac{t^*_{n-1}\, s}{ME}\right)^2$$

Problems: since we have not taken a sample, we do not have a value for
s or the number of degrees of freedom.

Solutions:
• guess the value of s, conduct a small pilot study, or use a previous
study.
• use the z* critical value (or t* with df = ∞) corresponding to the
given confidence level in place of $t^*_{n-1}$.

Therefore, to find the minimum sample size to construct a 100(1 − α)%
confidence interval with a margin of error of at most ME, we use a sample
of size
$$n = \left(\frac{z^* s}{ME}\right)^2$$

Note: When you use this formula, always round up (to an integer) at the
end to find n!
Example: Suppose that scientists studying infant health want to estimate
the mean head circumference of all infants. How large a sample should they
take if they want to be 95% confident that the estimate is within 0.5 cm of
the true population mean? Suppose that in a previous sample, the sample
standard deviation was s = 2.1 cm.

Example: Suppose that we want to estimate the mean cost of textbooks
per semester for U of A students to within $8 of the true population mean.
Also suppose that the amount spent on textbooks per semester by U of
A students is Normally distributed and that most students spend between
$70 and $370 per semester. What is the minimum sample size we should
use to be 90% confident of attaining the level of accuracy mentioned above?
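The sample-size formula can be applied to the two examples above with a short function. This is a sketch, with `min_sample_size` as an illustrative name. One assumption is needed for the textbook example and is not stated in the notes: "most students" is read as the middle 95% of a Normal population, which spans about four standard deviations, so s is guessed as (370 − 70)/4 = 75. A different reading of "most" would change the answer.

```python
import math
from statistics import NormalDist

def min_sample_size(confidence, s, margin):
    """Smallest integer n with z* * s / sqrt(n) <= margin (always rounds up)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z_star * s / margin) ** 2)

# Head circumference: 95% confidence, s = 2.1 cm, ME = 0.5 cm.
n_head = min_sample_size(0.95, 2.1, 0.5)

# Textbook costs: 90% confidence, ME = $8.
# ASSUMPTION: the range $70 to $370 covers about 4 standard deviations.
s_guess = (370 - 70) / 4
n_books = min_sample_size(0.90, s_guess, 8)

print(n_head, n_books)
```

Under these assumptions the script gives n = 68 infants and n = 238 students. Rounding up rather than to the nearest integer guarantees the margin of error is at most the target.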


Hypothesis Test for a Population Mean: One-Sample t-Test

A hypothesis test for a population mean µ, when σ is unknown, has five
steps:

1. Assumptions/Conditions:
• σ is unknown.
• We have a random sample.
• Individuals in the sample must be independent of each other.
• The population is normally distributed or the sample size is large
enough (n ≥ 30).

2. Hypotheses:
$$H_0: \mu = \mu_0$$
$$H_A: \begin{cases} \mu \neq \mu_0 & \text{(two-tailed test)} \\ \mu < \mu_0 & \text{(lower-tailed test)} \\ \mu > \mu_0 & \text{(upper-tailed test)} \end{cases}$$

3. Test Statistic:
$$t_0 = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$$

When the conditions are met, the test statistic follows a Student’s
t-distribution with n − 1 degrees of freedom, under the null hypothesis.
4. P-value: Report and interpret the P-value in context. Compute one
of the following using software or the t-table with df = n − 1:

Test                 P-value
Two-tailed Test      $2P(t_{n-1} > |t_0|)$
Lower-tailed Test    $P(t_{n-1} < t_0)$
Upper-tailed Test    $P(t_{n-1} > t_0)$

If using a t-table, we will likely only be able to find a range in which
the P-value lies.

5. Conclusion: Report and interpret the P-value in context. Given a
significance level α,
• if P-value ≤ α, we reject H0 at level α
• if P-value > α, we do not reject H0 at level α

Example: A manufacturer claims that, on average, a gallon of its paint
will cover 400 square feet of surface area. (Assume the coverage areas of
the cans of paint made by this manufacturer are normally distributed.) To
test this claim, a random sample of ten 1-gallon cans of white paint
was used to paint ten identical areas using the same type of equipment.
The areas (in square feet) covered by these ten cans were:
310 311 412 368 447
376 303 410 365 350
(a) Do the data present sufficient evidence to indicate that the average
coverage area differs from 400 square feet? Use α = 0.05.

Note: ȳ = 365.2 and s = 48.417


1. Assumptions/Conditions:

2. Hypotheses:

3. Test Statistic:


4. P -value:

5. Conclusion:

Since ______, we ______ at the
0.05 significance level, that is, there ______ statistical
evidence to conclude that the average coverage area differs from 400
square feet (at α = 0.05).

Note: The P-value is the smallest α level at which we can reject H0.
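The test statistic and P-value for the paint data in part (a) can be computed from the raw observations. The sketch below is one possible standard-library implementation, not the method prescribed by the notes: `t_pdf`, `t_cdf`, and `one_sample_t_test` are illustrative names, and the t CDF is approximated with Simpson's rule.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the curve between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t_test(data, mu0, alternative="two-sided"):
    """Return (t0, df, P-value) for H0: mu = mu0."""
    n = len(data)
    t0 = (statistics.fmean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    df = n - 1
    if alternative == "two-sided":
        p = 2 * (1 - t_cdf(abs(t0), df))
    elif alternative == "less":
        p = t_cdf(t0, df)
    else:  # "greater"
        p = 1 - t_cdf(t0, df)
    return t0, df, p

areas = [310, 311, 412, 368, 447, 376, 303, 410, 365, 350]
t0, df, p = one_sample_t_test(areas, 400)
print(f"t0 = {t0:.3f}, df = {df}, P-value = {p:.4f}")
print("reject H0" if p <= 0.05 else "do not reject H0")
```

For these data t0 is about −2.273 with df = 9, and the two-tailed P-value lands just under 0.05, so the decision at α = 0.05 is a close call. A t-table would only bracket the P-value between 0.02 and 0.05.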


(b) Construct a 95% confidence interval for the true mean coverage area
of all cans of paint made by this manufacturer.

∴ we are 95% confident that the true mean coverage area of all paint cans
made by this manufacturer is between
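As a check on part (b), the interval can be worked out by hand from the summary statistics given earlier (ȳ = 365.2, s = 48.417, n = 10). The critical value 2.262 for 95% confidence with df = 9 is taken from a standard t-table, which is an assumption about the table used in the course.

```latex
\bar{y} \pm t^*_{9}\,\frac{s}{\sqrt{n}}
  = 365.2 \pm 2.262 \times \frac{48.417}{\sqrt{10}}
  \approx 365.2 \pm 2.262 \times 15.311
  \approx 365.2 \pm 34.63
  \quad\Longrightarrow\quad (330.57,\ 399.83)
```

The interval falls just short of 400 square feet, which is consistent with a two-tailed test of H0: µ = 400 rejecting at α = 0.05.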


Example: Employers are often concerned about the amount of time
that employees spend each day making personal use of company technology.
Suppose that the CEO of a large company wants to determine if the average
amount of time per day spent on personal use of company devices for his
employees is more than 75 minutes. (Assume the amount of time per
day that employees spend on personal use of company devices is normally
distributed.) A random sample of 10 employees was selected and asked
about their daily personal use of company devices. Their responses were:
66 70 75 88 69
89 71 71 63 86
Do the data present sufficient evidence that the mean for this company is
more than 75 minutes? Use α = 0.05.

Note: ȳ = 74.8 and s = 9.45

1. Assumptions/Conditions:

2. Hypotheses:


3. Test Statistic:

4. P -value:

5. Conclusion:

Since ______, we ______ at the
0.05 significance level, that is, there ______ statistical
evidence to conclude that the mean time spent on personal use of
company technology is more than 75 minutes per day at this company
(at α = 0.05).
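For this example the decision can be reached without software. The working below is a sketch using the summary statistics given above (ȳ = 74.8, s = 9.45, n = 10) and the upper-tailed alternative µ > 75.

```latex
t_0 = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}
    = \frac{74.8 - 75}{9.45/\sqrt{10}}
    \approx \frac{-0.2}{2.988}
    \approx -0.067,
\qquad
\text{P-value} = P(t_9 > -0.067) > 0.5 > \alpha = 0.05
```

Because the t-curve is symmetric about 0 and t0 is negative, the upper-tail area must exceed 0.5. The sample mean is below 75, so the data cannot support the claim that the population mean is above 75, and H0 is not rejected.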