Inference About A Population: - Population Mean - Population Proportion

Inference About A Population
•Population mean
•Population proportion
Inference About A Population…
[Diagram: a sample is drawn from the population; the sample statistic is used to make an inference about the population parameter.]
We will develop techniques to estimate and test three population parameters:
•Population mean µ
•Population proportion p
•Population SD σ
Inference With Variance Unknown…
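Previously, we looked at estimating and testing the population mean when the population standard deviation (σ) was known or given:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
But how often do we know the actual population variance?
Instead, we use the Student t-statistic, given by:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
where s is the sample standard deviation, used in place of the unknown σ, and the statistic has ν = n − 1 degrees of freedom.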
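As a rough sketch of how the t-statistic is assembled from a sample, the R snippet below computes it by hand and cross-checks the result against t.test. The data vector and the hypothesized mean mu0 are made up for illustration; they are not the course data.

```r
# Hypothetical sample; these values are illustrative, not the example's data.
x <- c(2.1, 1.8, 2.6, 2.4, 1.9, 2.3, 2.7, 2.0)
mu0 <- 2                      # hypothesized population mean (assumption)

n <- length(x)
xbar <- mean(x)               # sample mean
s <- sd(x)                    # sample standard deviation (n - 1 denominator)

t_stat <- (xbar - mu0) / (s / sqrt(n))
df <- n - 1                   # degrees of freedom

# Cross-check against R's built-in one-sample t-test
fit <- t.test(x, mu = mu0)
print(c(by_hand = t_stat, t.test = unname(fit$statistic)))
```

The two printed values should agree, since t.test uses the same formula internally for the one-sample case.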

Example 1
It is likely that in the near future nations will have
to do more to save the environment.
Possible actions include reducing energy use and recycling.
Currently (2007) most products manufactured from recycled material are considerably more expensive than those manufactured from material found in the earth.
Example 1
Newspapers are an exception.
It can be profitable to recycle newspaper.
A major expense is the collection from homes. In recent years a number of companies have gone into the business of collecting used newspapers from households and recycling them.
A financial analyst for one such company has recently computed that the firm would make a profit if the mean weekly newspaper collection from each household exceeded 2.0 pounds.
Example 1
In a study to determine the feasibility of a recycling plant, a random sample of 148 households was drawn from a large community, and the weekly weight of newspapers discarded for recycling for each household was recorded.
Do these data provide sufficient evidence to allow the analyst to conclude that a recycling plant would be profitable?
IDENTIFY
Example 1
Our objective is to describe the population of the amount of
newspaper discarded per household, which is an interval
variable. Thus the parameter to be tested is the population
mean µ.
We want to know if there is enough evidence to conclude that the mean is greater than 2. Thus,
H1: µ > 2
Therefore we set our usual null hypothesis to:
H0: µ = 2
IDENTIFY
Example 1
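The test statistic is:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \nu = n - 1$$
Because the alternative hypothesis is:
H1: µ > 2
the rejection region (with α = .01 and ν = 148 − 1 = 147) becomes:
$$t > t_{\alpha,\nu} = t_{.01,147} \approx t_{.01,150} = 2.351$$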
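The critical value need not be read from a table. A minimal R sketch, assuming the 1% significance level and the sample size n = 148 from Example 1, uses qt to get the exact quantile and compares it with the table approximation at 150 degrees of freedom:

```r
alpha <- 0.01
n <- 148                      # sample size from Example 1
nu <- n - 1                   # degrees of freedom

t_crit  <- qt(1 - alpha, df = nu)     # exact critical value for nu = 147
t_table <- qt(1 - alpha, df = 150)    # nearest table entry used on the slide

print(round(c(t_crit = t_crit, t_table = t_table), 3))
```

The null hypothesis is rejected at this level only if the observed t exceeds t_crit.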
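                      Null true in reality         Null false in reality
Reject null           Type I error (α)             Correct decision (1 − β = power of the test)
Do not reject null    Correct decision (1 − α)     Type II error (β)

α = probability of rejecting a true null hypothesis (Type I error); α is the significance level of the test, and 1 − α is the confidence level.
β = probability of not rejecting a false null hypothesis (Type II error).
p-value = probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one calculated from the sample.
Decision rule: reject the null hypothesis if p-value < α.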
R: t-test
> data1 = read.csv("C:\\Users\\TOSHIBA\\Desktop\\ex1_week3_onepop_mean.csv")
> newspaper = data1[,1]
> t.test(newspaper, alternative="greater", mu=2)

        One Sample t-test

data:  newspaper
t = 2.2369, df = 147, p-value = 0.0134
alternative hypothesis: true mean is greater than 2
95 percent confidence interval:
 2.046905      Inf
sample estimates:
mean of x
 2.180405

Usage:
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
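The object returned by t.test can also be inspected piece by piece. The sketch below uses simulated data (an assumption, since the course CSV file is not available here) and a 99% confidence level to show the arguments from the usage line in action:

```r
set.seed(1)                               # reproducible illustrative data
x <- rnorm(30, mean = 2.2, sd = 0.5)      # hypothetical sample, not the course data

# One-sided test of H0: mu = 2 against H1: mu > 2, with a 99% confidence level
fit <- t.test(x, alternative = "greater", mu = 2, conf.level = 0.99)

print(fit$statistic)   # t value
print(fit$parameter)   # degrees of freedom (n - 1)
print(fit$p.value)     # p-value for the chosen alternative
print(fit$conf.int)    # one-sided interval: lower bound, Inf
print(fit$estimate)    # sample mean
```

With alternative = "greater" the interval is one-sided, which is why the output above for Example 1 shows Inf as the upper limit.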
INTERPRET
Example 1
The value of the test statistic is t = 2.24 and its p-value is .0134.
Our decision depends on the significance level α. If α = 0.01, there is not enough evidence to infer that the mean weight of discarded newspapers is greater than 2.0.
Note that there is some evidence; the p-value is .0134. However, because we wanted the probability of a Type I error to be small, we insisted on a 1% significance level. Thus, we cannot conclude that the recycling plant would be profitable.
If α = 0.05, we reject the null hypothesis in favor of the alternative and conclude that the recycling plant would be profitable.
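The p-value and both decisions can be reproduced from the reported test statistic alone. A small R sketch, taking t = 2.2369 and 147 degrees of freedom from the t.test output for Example 1:

```r
t_obs <- 2.2369               # test statistic reported by t.test for Example 1
nu <- 147                     # degrees of freedom, n - 1 with n = 148

# Upper-tail probability because the alternative is "greater"
p_value <- pt(t_obs, df = nu, lower.tail = FALSE)

for (alpha in c(0.05, 0.01)) {
  decision <- if (p_value < alpha) "reject H0" else "do not reject H0"
  cat(sprintf("alpha = %.2f: p-value = %.4f -> %s\n", alpha, p_value, decision))
}
```

The same data lead to opposite conclusions at the two levels, which is why α should be chosen before looking at the results.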
Identifying Factors
Factors that identify the t-test and estimator of µ:
•Problem objective: describe a population
•Data type: interval
Check Required Conditions
The t-test assumes that the population is normal. The test is robust, which means that if the population is nonnormal, the results of the t-test and confidence interval estimate are still valid provided that the population is “not extremely nonnormal”.
To check this requirement, draw a histogram of the data and see how “bell shaped” the resulting figure is. If the histogram is extremely skewed (say, as for an exponential distribution), the population could be considered “extremely nonnormal”, and hence the t-statistic would not be valid in this case.
> hist(newspaper)
[Figure: Histogram of newspaper. x-axis: newspaper (0 to 4); y-axis: Frequency (0 to 25).]
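One way to get a feel for robustness is a small simulation. The sketch below is an illustration under assumed settings (samples of size 10, 5000 replications, a two-sided test at α = 0.05): it estimates how often a true null hypothesis is rejected when the data are normal and when they are exponential.

```r
set.seed(42)
n <- 10          # deliberately small sample size (assumption for illustration)
reps <- 5000     # number of simulated samples
alpha <- 0.05

# Proportion of samples in which a TRUE null hypothesis is rejected
reject_rate <- function(draw, true_mean) {
  mean(replicate(reps, t.test(draw(n), mu = true_mean)$p.value < alpha))
}

rate_normal <- reject_rate(function(k) rnorm(k, mean = 1, sd = 1), true_mean = 1)
rate_expon  <- reject_rate(function(k) rexp(k, rate = 1), true_mean = 1)

print(c(normal = rate_normal, exponential = rate_expon))
```

If the test behaves as advertised, the rate for normal data should sit close to α, while the rate for the strongly skewed exponential data would be expected to drift away from it at this sample size.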
Inference: Population Proportion…
When data are nominal, we count the number of occurrences of
each value and calculate proportions. Thus, the parameter of
interest in describing a population of nominal data is the
population proportion p.
This parameter was based on the binomial experiment.
Recall the use of this statistic:
$$\hat{p} = \frac{x}{n}$$
where p-hat ($\hat{p}$) is the sample proportion: x successes in a sample of n items.
Inference: Population Proportion…
When np and n(1–p) are both greater than 5, the sampling distribution of $\hat{p}$ is approximately normal with
mean: $\mu = p$
standard deviation: $\sigma = \sqrt{p(1-p)/n}$
Hence:
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
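The mean and standard deviation of the sample proportion can be checked by simulation. This R sketch assumes p = 0.5 and n = 765 (illustrative choices) and draws many binomial samples:

```r
set.seed(123)
p <- 0.5         # assumed population proportion
n <- 765         # assumed sample size
reps <- 10000    # number of simulated samples

stopifnot(n * p > 5, n * (1 - p) > 5)     # normal-approximation condition

p_hat <- rbinom(reps, size = n, prob = p) / n   # simulated sample proportions

theory_sd <- sqrt(p * (1 - p) / n)
print(c(sim_mean = mean(p_hat), theory_mean = p,
        sim_sd = sd(p_hat), theory_sd = theory_sd))
```

A call to hist(p_hat) on the simulated values would show the roughly bell-shaped sampling distribution.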
Inference: Population Proportion
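Test statistic for p:
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
The confidence interval estimator for p is given by:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$
(both of which require that np>5 and n(1–p)>5)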
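The interval estimator translates directly into a few lines of R. The function name prop_ci and the counts used below are hypothetical, chosen only to illustrate the formula; this is a sketch, not a base R function:

```r
# Hypothetical helper (not part of base R): normal-approximation interval for p
prop_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf_level) / 2)            # z_{alpha/2}
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - half_width, upper = p_hat + half_width)
}

ci <- prop_ci(x = 120, n = 200)   # illustrative counts: 120 successes in 200 trials
print(round(ci, 4))
```

Note that prop.test reports a different (Wilson score) interval, so its limits will not match this formula exactly.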

Example 2
After the polls close on election day networks compete
to be the first to predict which candidate will win.
The predictions are based on counts in certain precincts and on exit polls.
Exit polls are conducted by asking random samples of voters who have just exited from the polling booth (hence the name) for which candidate they voted.
Example 2
In American presidential elections the candidate who
receives the most votes in a state receives the state’s
entire Electoral College vote.
In practice, this means that either the Democrat or the Republican candidate will win.
Suppose that the results of an exit poll in one state were recorded where 1 = Democrat and 2 = Republican.
Xm12-05*
Example 2
The polls close at 8:00 P.M.
Can the networks conclude from these data that the Republican candidate will win the state?
Should the network announce at 8:01 P.M. that the Republican candidate will win?
IDENTIFY
Example 2
The problem objective is to describe the population of votes in
the state. The data are nominal because the values are
“Democrat” and “Republican.” Thus the parameter to be
tested is the proportion of votes in the entire state that are for
the Republican candidate. Because we want to determine
whether the network can declare the Republican to be the
winner at 8:01 P.M., the alternative hypothesis is
H1: p > .50
And hence our null hypothesis becomes:
H0: p = .50
IDENTIFY
Example 2
The test statistic is
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
> data2 = read.csv("C:\\Users\\TOSHIBA\\Desktop\\eslasca\\ex2_week3_onepop_prop.csv")
> votes = data2[,1]
> x = 407    # Republican votes in the sample
> n = 765    # sample size
> prop.test(x, n, alternative="greater")

        1-sample proportions test with continuity correction

data:  x out of n, null probability 0.5
X-squared = 3.0118, df = 1, p-value = 0.04133
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
 0.5016378 1.0000000
sample estimates:
        p
0.5320261
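prop.test reports a chi-squared statistic with a continuity correction rather than the z statistic from the slides. As a hedged cross-check, the sketch below computes z by hand from x = 407 and n = 765 and compares it with prop.test run with correct = FALSE, where the chi-squared statistic should equal z squared:

```r
x <- 407; n <- 765; p0 <- 0.5            # counts and null value from the output above

p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_z <- pnorm(z, lower.tail = FALSE)   # upper tail for H1: p > .50

# Same test via prop.test with the continuity correction switched off
fit <- prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)

print(c(z = z, z_squared = z^2, X_squared = unname(fit$statistic)))
print(c(p_value_z = p_value_z, p_value_prop.test = fit$p.value))
```

The p-value without the correction is somewhat smaller than the 0.04133 shown above, because the continuity correction shrinks the statistic slightly.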
INTERPRET
Example 2
At the 5% significance level we reject the null
hypothesis and conclude that there is enough
evidence to infer that the Republican candidate
will win the state.
However, is this the right decision?
INTERPRET
Example 2
One of the key issues to consider here is the cost
of Type I and Type II errors.
A Type I error occurs if we conclude that the Republican will win when in fact he has lost.
INTERPRET
Example 2
Such an error would mean that a network would
announce at 8:01 P.M. that the Republican has
won and then later in the evening would have to
admit to a mistake.
If a particular network were the only one that made this error it would cast doubt on their integrity and possibly affect the number of viewers.
INTERPRET
Example 2
This is exactly what happened on the evening of the
U. S. presidential elections in November 2000.
Shortly after the polls closed at 8:00 P.M. all the networks declared that the Democratic candidate Albert Gore would win in the state of Florida.
A couple of hours later, the networks admitted that a mistake had been made and the Republican candidate George W. Bush had won.
INTERPRET
Example 2
Several hours later they again admitted a mistake and finally declared the race too close to call.
Fortunately for each network, all the networks made the same mistake.
However, if one network had not made this mistake, it would have developed a better track record, which could have been used in future advertisements for news shows and likely drawn more viewers.
Considering the costs of Type I and Type II errors, it would have been better to use a 1% significance level.
Identifying Factors
Factors that identify the z-test and interval estimator of p:
•Problem objective: describe a population
•Data type: nominal