
Introduction Statistics II

Simone Tonin

1/29
Estimation

2/29
Estimation

The values of population parameters are often unknown.

We use a representative sample of the population to estimate the
population parameters.
There are two types of estimation:
Point Estimation
Interval Estimation

3/29
Point estimation

A point estimate is a single numerical value used to estimate the
corresponding population parameter. A point estimate is obtained by
selecting a suitable statistic (a suitable function of the data) and
computing its value from the given sample data. The selected statistic
is called the point estimator.

The point estimator is a random variable, so it has a distribution,
mean, variance etc.

e.g. the sample mean X̄ = (1/n) Σⁿᵢ₌₁ Xᵢ is one possible point
estimator of the population mean µ.
The point estimate is x̄ = (1/n) Σⁿᵢ₌₁ xᵢ.

4/29
Point estimation: properties

Let θ be the unknown population parameter and θ̂ be its estimator. The
parameter space is denoted by Θ.

An estimator θ̂ is called an unbiased estimator of θ if E(θ̂) = θ.

The bias of the estimator θ̂ is defined as Bias(θ̂) = E(θ̂) − θ.

5/29
Point estimation: properties

The Mean Square Error (MSE) is a measure of how close θ̂ is, on
average, to the true θ:

MSE = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]²
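As a quick check of this identity, here is a minimal Python sketch (the
values µ, σ, n are illustrative, not from the slides) that estimates the
MSE of the sample mean by simulation and compares it with
Var(θ̂) + [Bias(θ̂)]².

```python
# Minimal Monte Carlo sketch: the empirical MSE of the sample mean X-bar,
# as an estimator of mu, should match Var(theta-hat) + Bias(theta-hat)^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 25, 100_000   # illustrative values

# Draw many samples and compute the estimator on each one.
estimates = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

mse = np.mean((estimates - mu) ** 2)
decomposition = np.var(estimates) + (np.mean(estimates) - mu) ** 2
print(mse, decomposition)   # both approximately sigma^2 / n = 0.16
```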

6/29
Interval estimation

An interval estimate (confidence interval) is an interval, or range of
values, used to estimate a population parameter.
This interval will contain the population parameter with probability
1 − α, i.e., P(A < θ < B) = 1 − α.
The level of confidence 100(1 − α)% is the probability that the
interval estimate contains the population parameter.
Interval estimate components:

point estimate ± (critical value × standard error)

7/29
Confidence intervals for the population mean

When sampling is from a normal distribution with known variance σ²,
a 100(1 − α)% confidence interval for the population mean µ is

x̄ ± zα/2 (σ/√n)

where zα/2 can be obtained from the standard normal distribution table.

100(1 − α)%    α      zα/2
90%            0.10   1.645
95%            0.05   1.96
99%            0.01   2.58

If σ is unknown and n ≥ 30, the sample standard deviation

s = √( Σⁿᵢ₌₁ (xᵢ − x̄)² / (n − 1) )

can be used in place of σ.
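A minimal Python sketch of this interval, assuming scipy is available;
the helper name ci_mean_z and the numbers passed to it are illustrative,
not part of the slides. The first print also reproduces the zα/2 values
in the table above.

```python
# Sketch of the z-interval x-bar ± z_{alpha/2} * sigma / sqrt(n).
from math import sqrt
from scipy.stats import norm

def ci_mean_z(xbar, sigma, n, conf=0.95):
    z = norm.ppf(1 - (1 - conf) / 2)   # z_{alpha/2}, e.g. 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# z_{alpha/2} for 90%, 95%, 99% confidence (compare with the table above)
print([round(norm.ppf(1 - a / 2), 3) for a in (0.10, 0.05, 0.01)])
print(ci_mean_z(xbar=50.0, sigma=8.0, n=36, conf=0.95))   # roughly (47.4, 52.6)
```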
8/29
Confidence intervals for the population mean

9/29
Confidence intervals for the population mean

If sampling is from a non-normal distribution and n ≥ 30, then the
sampling distribution of x̄ is approximately normal (central limit
theorem) and we can use the same formula, x̄ ± zα/2 (σ/√n), to
construct an approximate confidence interval for the population mean.

10/29
Confidence intervals for the population mean

When sampling is from a normal distribution whose standard deviation σ
is unknown and the sample size is small, the 100(1 − α)% confidence
interval for the population mean µ is

x̄ ± tα/2 (s/√n)

where tα/2 can be obtained from the t distribution table with
df = n − 1 and s is the sample standard deviation, given by

s = √( Σⁿᵢ₌₁ (xᵢ − x̄)² / (n − 1) )

If σ is unknown, and we have neither a normal population nor a large
sample, then we should use nonparametric statistics (not covered in
this course).
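A short Python sketch of the t-interval, assuming scipy is available;
the data set is made up for illustration.

```python
# Sketch of the small-sample t-interval x-bar ± t_{alpha/2} * s / sqrt(n).
import numpy as np
from scipy import stats

data = np.array([4.1, 5.2, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0])   # hypothetical sample
n = len(data)
xbar, s = data.mean(), data.std(ddof=1)          # s uses the n - 1 denominator
t_crit = stats.t.ppf(0.975, df=n - 1)            # t_{alpha/2} for a 95% interval
margin = t_crit * s / np.sqrt(n)
print(xbar - margin, xbar + margin)

# scipy can produce the same interval directly:
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=stats.sem(data)))
```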

11/29
Interpreting confidence intervals

Probabilistic interpretation: In repeated sampling from some
population, 100(1 − α)% of all the intervals we construct will, in the
long run, include the population parameter.

Practical interpretation: When sampling is from some population, we
have 100(1 − α)% confidence that the single computed interval contains
the population parameter.

12/29
Confidence interval for a population proportion

The 100(1 − α)% confidence interval for a population proportion π is
given by

π̂ ± zα/2 √( π̂(1 − π̂)/n )

where π̂ is the sample proportion.
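A short Python sketch of this proportion interval, assuming scipy is
available; the counts used are hypothetical.

```python
# Sketch of the large-sample interval pi-hat ± z_{alpha/2} * sqrt(pi-hat(1 - pi-hat)/n).
from math import sqrt
from scipy.stats import norm

successes, n, conf = 420, 1000, 0.95   # hypothetical survey counts
p_hat = successes / n                  # sample proportion
z = norm.ppf(1 - (1 - conf) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - margin, p_hat + margin)  # roughly 0.389 to 0.451
```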

13/29
Example 15

Suppose an Italian car rental firm wants to estimate the average
number of kilometres travelled per day by each of its cars rented in
Florence. A random sample of 20 cars rented in Florence reveals that
the sample mean travel distance per day is 85.5 kilometres, with a
population standard deviation of 19.3 kilometres. Compute a 99%
confidence interval to estimate µ.

For a 99% level of confidence, a z value of 2.575 is obtained (from
the standard normal table). Assume that the number of kilometres
travelled per day is normally distributed.

x̄ ± zα/2 (σ/√n)
85.5 ± 2.575 (19.3/√20)
85.5 ± 11.1

thus 74.4 ≤ µ ≤ 96.6
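For reference, a few lines of Python (assuming scipy) that reproduce
this calculation with the same numbers as on the slide:

```python
# Reproducing Example 15: 99% z-interval for the mean daily distance.
from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 85.5, 19.3, 20
z = norm.ppf(0.995)                   # about 2.576 for a 99% interval
margin = z * sigma / sqrt(n)          # about 11.1 km
print(xbar - margin, xbar + margin)   # roughly 74.4 to 96.6
```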
14/29
Hypothesis testing

15/29
Motivation

We often encounter statements or claims like these:

A newspaper claims that the average starting salary of MBA graduates
is over £50K. (one-sample test)

A claim about the effectiveness of a particular diet program: the
average weight after the program is less than the average weight
before the program. (two paired-samples test)

On average, female managers earn less than male managers, given that
they have the same qualifications and skills. (two independent-samples
test)

So we have claims about populations’ means (averages) and we would
like to verify or examine these claims.

This is the kind of problem that hypothesis testing is designed to
solve.
16/29
The nature of hypothesis testing

We often use inferential statistics to make decisions or judgments
about the value of a parameter, such as a population mean.
Typically, a hypothesis test involves two hypotheses:
1 Null hypothesis: a hypothesis to be tested, denoted by H0.
2 Alternative hypothesis (or research hypothesis): a hypothesis to be
considered as an alternative to the null hypothesis, denoted by H1 or Ha.
The problem in a hypothesis test is to decide whether or not the null
hypothesis should be rejected in favour of the alternative hypothesis.
The choice of the alternative hypothesis should reflect the purpose of
performing the hypothesis test.

17/29
The nature of hypothesis testing

How do we decide whether or not to reject the null hypothesis in
favour of the alternative hypothesis?
Very roughly, the procedure for deciding is the following:
Take a random sample from the population.
If the sample data are consistent with the null hypothesis, then do
not reject the null hypothesis; if the sample data are inconsistent
with the null hypothesis, then reject the null hypothesis and conclude
that the alternative hypothesis is true.
Test statistic: the statistic used as a basis for deciding whether the
null hypothesis should be rejected.
The test statistic is a random variable, which therefore has a
sampling distribution with a mean and a standard deviation (the
so-called standard error).

18/29
Type I and Type II Errors

Type I error: rejecting the null hypothesis when it is in fact true.
Type II error: not rejecting the null hypothesis when it is in fact
false.
The significance level, α, of a hypothesis test is defined as the
probability of making a Type I error, that is, the probability of
rejecting a true null hypothesis.
19/29
Type I and Type II Errors

Relation between Type I and Type II error probabilities:
For a fixed sample size, the smaller the Type I error probability, α,
of rejecting a true null hypothesis, the larger the Type II error
probability of not rejecting a false null hypothesis, and vice versa.
Possible conclusions for a hypothesis test:
We use the terms reject and fail to reject for the possible decisions
about a null hypothesis.
You should keep in mind that failing to reject the null hypothesis
leads to much greater uncertainty because we do not know the
probability of a Type II error. (It is better to say “do not reject”
than “accept”.)
When the null hypothesis is rejected in a hypothesis test performed at
the significance level α, we say that the results are statistically
significant at level α.
20/29
Hypothesis tests for one population mean

In order to test the hypothesis that the population mean µ is equal to
a particular value µ0, we are going to test the null hypothesis

H0 : µ = µ0

against one of the following alternatives:

H1 : µ ≠ µ0 (Two-tailed)
H1 : µ < µ0 (Left-tailed)
H1 : µ > µ0 (Right-tailed)

21/29
Hypothesis tests for one population mean

In order to test H0, we need to use one of the following test
statistics; we should choose the one that satisfies the assumptions.

If σ is known, and we have a normally distributed population or a
large sample (n ≥ 30), then the test statistic (the so-called z-test)
is

z = (x̄ − µ0) / (σ/√n)

where σ is the standard deviation of the population.

If σ is unknown, and we have a normally distributed population or a
large sample (n ≥ 30), then the test statistic (the so-called t-test)
is

t = (x̄ − µ0) / (s/√n)    with df = n − 1,

where s is the standard deviation of the sample.
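A minimal Python sketch of the two test statistics; the helper names
and the summary numbers passed to them are illustrative, not from the
slides.

```python
# Sketch of the z and t test statistics for H0: mu = mu0.
from math import sqrt

def z_stat(xbar, mu0, sigma, n):
    # z-test: sigma known, normal population or n >= 30
    return (xbar - mu0) / (sigma / sqrt(n))

def t_stat(xbar, mu0, s, n):
    # t-test: sigma unknown, estimated by the sample standard deviation s;
    # compare with the t table at df = n - 1
    return (xbar - mu0) / (s / sqrt(n))

print(z_stat(xbar=52.1, mu0=50, sigma=6.0, n=36))   # about 2.1
print(t_stat(xbar=52.1, mu0=50, s=6.0, n=16))       # about 1.4
```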
22/29
Critical-value approach to hypothesis testing

For any specific significance level α, one can obtain the critical
values ±zα/2 and ±zα from the standard normal table.

z0.10    z0.05    z0.025   z0.01    z0.005
1.282    1.645    1.960    2.326    2.576

If the value of the test statistic falls in the rejection region,
reject H0; otherwise do not reject H0.
23/29
Critical-value approach to hypothesis testing

For any specific significance level α, one can obtain the critical
values ±tα/2 and ±tα from the t distribution table. For example, for
df = 9 and α = .05, the critical values are ±t0.025 = ±2.262 and
±t0.05 = ±1.833.
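For reference, a short Python sketch (assuming scipy) reproducing the
critical values quoted on the last two slides:

```python
# Critical values via scipy's inverse-CDF (ppf) functions.
from scipy.stats import norm, t

# Standard normal: z_0.10, z_0.05, z_0.025, z_0.01, z_0.005
print([round(norm.ppf(1 - a), 3) for a in (0.10, 0.05, 0.025, 0.01, 0.005)])
# -> [1.282, 1.645, 1.96, 2.326, 2.576]

# t distribution with df = 9: t_0.025 and t_0.05
print(round(t.ppf(1 - 0.025, df=9), 3), round(t.ppf(1 - 0.05, df=9), 3))
# -> 2.262 1.833
```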

24/29
The p-value approach to hypothesis testing

The p-value is the smallest significance level at which the null
hypothesis would be rejected. The p-value is also known as the
observed significance level.

The p-value measures how well the observed sample agrees with the null
hypothesis. A small p-value (close to zero) indicates that the sample
is not consistent with the null hypothesis and the null hypothesis
should be rejected. On the other hand, a large p-value (larger than
.10) generally indicates a reasonable level of agreement between the
sample and the null hypothesis.

As a rule of thumb, if the p-value ≤ α then reject H0; otherwise do
not reject H0.
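A minimal Python sketch of this rule for a two-tailed t-test; the
observed statistic and degrees of freedom are illustrative.

```python
# p-value rule for a two-tailed t-test: the p-value is the probability,
# under H0, of a statistic at least as extreme as the one observed.
from scipy.stats import t

t_obs, df, alpha = 2.3, 24, 0.05          # illustrative values
p_value = 2 * t.sf(abs(t_obs), df)        # sf = 1 - cdf (upper tail)
print(p_value)                            # about 0.03
print("reject H0" if p_value <= alpha else "do not reject H0")
```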
25/29
Hypothesis testing and confidence intervals

Hypothesis tests and confidence intervals are closely related.
Consider, for instance, a two-tailed hypothesis test for a population
mean at the significance level α. It can be shown that the null
hypothesis will be rejected if and only if the value µ0 given for the
mean in the null hypothesis lies outside the 100(1 − α)% confidence
interval for µ.

Example:
At significance level α = 0.05, we want to test H0 : µ = 40 against
H1 : µ ≠ 40 (so here µ0 = 40).
Suppose that the 95% confidence interval for µ is 35 < µ < 38.
As µ0 = 40 lies outside this confidence interval, we reject H0.

26/29
Test of Normality

One of the assumptions for using the z-test or t-test is that the
population we sampled from is normally distributed. However, we have
not yet tested this assumption; to do so, we should perform a
so-called test of normality:
We can plot our sample data, e.g. a histogram or boxplot,
or use normality tests such as the Kolmogorov-Smirnov test or the
Shapiro-Wilk test (see the sketch below). The null and alternative
hypotheses are
H0 : the population being sampled is normally distributed.
H1 : the population being sampled is not normally distributed.
If σ is unknown, and we have neither a normal population nor a large
sample, then we should use nonparametric tests instead of the z-test
or t-test (not covered in this course).
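A minimal sketch of a Shapiro-Wilk test with scipy; the sample is
simulated here purely for illustration.

```python
# Shapiro-Wilk normality test: a large p-value gives no evidence against
# H0 (the population being sampled is normally distributed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=10, scale=2, size=40)   # hypothetical data

stat, p_value = stats.shapiro(sample)
print(stat, p_value)   # typically p > 0.05 for normal data: do not reject H0
```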

27/29
Example 16

Each year, manufacturers perform mileage tests on new car models and
submit the results to the Environmental Protection Agency (EPA). The
EPA then tests the vehicles to determine whether the manufacturers are
correct. In 1992 one company reported that a particular model equipped
with a four-speed manual transmission averaged 29 mpg on the highway.
Suppose the EPA tested 15 of the cars and obtained the following gas
mileages.

27.3  30.9  25.9  31.2  29.7
28.8  29.4  28.5  28.9  31.6
27.8  27.8  28.6  27.3  27.6

What decision would you make regarding the company’s report on the gas
mileage of the car? Perform the required hypothesis test at the 5%
significance level.
28/29
Example 16 (cont.)

The null and alternative hypotheses:

H0 : µ = 29 mpg vs. H1 : µ ≠ 29 mpg

The value of the test statistic is

t = (x̄ − µ0) / (s/√n) = (28.753 − 29) / (1.595/√15) = −0.599

Since the p-value = 0.559 > α = 0.05, we cannot reject H0.

At the 5% significance level, the data do not provide sufficient
evidence to conclude that the company’s report was incorrect.
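For reference, a few lines of Python (assuming scipy) reproducing this
test from the mileage data on the previous slide:

```python
# Reproducing Example 16 with scipy's one-sample t-test.
from scipy import stats

mpg = [27.3, 30.9, 25.9, 31.2, 29.7,
       28.8, 29.4, 28.5, 28.9, 31.6,
       27.8, 27.8, 28.6, 27.3, 27.6]

t_stat, p_value = stats.ttest_1samp(mpg, popmean=29)   # two-sided by default
print(round(t_stat, 3), round(p_value, 3))             # about -0.599 and 0.559
```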

29/29
