Hypothesis Testing


Key Terms

Population
In statistics, we generally want to study a population.
You can think of a population as a collection of persons,
things, or objects under study.
Population

Determine the population:


We want to know the average (mean) amount of money first year college
students spend at Christ University on school supplies (excluding books).
We randomly survey 100 first year students at the college. Three of those
students spent Rs 150, Rs 200, and Rs 225, respectively.

Ans:
The population is all first-year students attending Christ University this
term.

Sample
To study the population, we select a sample.
The idea of sampling is to select a portion (or subset) of the
larger population and study that portion (the sample) to gain
information about the population.
Sample
Determine the sample:

We want to know the average (mean) amount of money first year college
students spend at Christ University on school supplies (excluding books).
We randomly survey 100 first year students at the college. Three of those
students spent Rs 150, Rs 200, and Rs 225, respectively.

Ans:
The sample is the 100 first-year students who were randomly surveyed.
Parameter
A parameter is a number that describes a property of the population.
Because examining an entire population takes a lot of time and money,
we usually study a sample instead and use it to estimate the parameter.
Parameter
Determine the parameter:
We want to know the average (mean) amount of money first year college
students spend at Christ University on school supplies (excluding books).
We randomly survey 100 first year students at the college. Three of those
students spent Rs 150, Rs 200, and Rs 225, respectively.

Ans:
The parameter is the average (mean) amount of money spent (excluding
books) by first year college students at Christ University this term.
Statistic
A statistic is a number that represents a property of the
sample.
Statistic
Determine the statistic:
We want to know the average (mean) amount of money first year college
students spend at Christ University on school supplies (excluding books).
We randomly survey 100 first year students at the college. Three of those
students spent Rs 150, Rs 200, and Rs 225, respectively.

Ans:
The statistic is the average (mean) amount of money spent (excluding
books) by first year college students in the sample.
Relation Between a Population and its Samples

Population – Parameter

Sample – Statistic
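To make the parameter/statistic distinction concrete, here is a small sketch in Python. The population is entirely HYPOTHETICAL: 2,000 students whose spending is simulated from a normal distribution with mean Rs 200 and SD Rs 40 (assumed values, not survey data). The population mean plays the role of the parameter, and the mean of a random sample of 100 plays the role of the statistic.

```python
import random
from statistics import fmean

random.seed(0)  # reproducible illustration

# HYPOTHETICAL population: spending (Rs) of 2,000 first-year students,
# simulated from a normal distribution; not real survey data.
population = [random.gauss(200, 40) for _ in range(2000)]
parameter = fmean(population)            # property of the population (mu)

sample = random.sample(population, 100)  # random survey of 100 students
statistic = fmean(sample)                # property of the sample (x-bar)

print(f"Parameter (population mean): Rs {parameter:.2f}")
print(f"Statistic (sample mean):     Rs {statistic:.2f}")
```

In practice only the sample is observed, so the statistic is used to estimate the unknown parameter.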
Categories of Statistical Analysis

• Descriptive Statistics
• Inferential Statistics

Inferential statistics
Inferential statistics allows you to make predictions (“inferences”) from sample data.
With inferential statistics, you take data from samples and make generalizations
about a population.
Statistical Inference

• Estimation: Point, Interval, Bayesian
• Testing of Hypothesis: Parametric, Non-Parametric, Sequential, Bayesian
• Parametric tests: Large Sample, Small Sample (Normal, Sampling distributions)
• Sequential tests: the sample size is not fixed in advance. Instead, data are
evaluated as they are collected, and further sampling is stopped as soon as
enough evidence has been gathered to reach a decision.
Why Statistical Tests?
Statistical tests are intended to decide, on the basis of sample data, whether a
hypothesis about the distribution of one or more populations should be rejected or
accepted.
Statistical Hypothesis
▪ A hypothesis is a contention based on preliminary observation of what appear to be
facts, which may or may not be true.

Some examples:

▪ The average rate of inflation in the 1970s was greater than the average rate of inflation in
the 1990s.

▪ An increase in the proportion of workers belonging to labor unions increases the wage rate
in a state, ceteris paribus.

▪ An increase in the unemployment rate reduces the rate of inflation.


HYPOTHESIS AND HYPOTHESIS TESTING

A statistical hypothesis is a claim (assertion, statement, belief, or
assumption) about an unknown population parameter value.

The process that enables a decision maker to test the validity (or
significance) of a claim by analyzing the difference between the value of a
sample statistic and the corresponding hypothesized population
parameter value is called hypothesis testing.
Hypothesis Testing
• Is also called significance testing
• Tests a claim about a parameter using evidence (data in a sample)
• The technique is introduced by considering a one-sample z test
• The procedure is broken into five steps
• Each element of the procedure must be understood
Step 1: State the Null Hypothesis (H0) and Alternative Hypothesis (H1 or Ha)

✔ The null hypothesis (H0) is a claim of “no difference in the population”


✔ The alternative hypothesis (Ha) claims “H0 is false”

The problem: In the 1970s, 20–29-year-old men in India had a mean body weight μ
of 170 pounds. The standard deviation σ was 40 pounds. We test whether the mean body
weight in the population now differs.

Null hypothesis H0: μ = 170 (“no difference”)


The alternative hypothesis can be either Ha: μ > 170 (one-sided test) or
Ha: μ ≠ 170 (two-sided test)
Step 2: State the Level of Significance, α (alpha)
The level of significance, usually denoted by α (alpha), is the probability of
rejecting the null hypothesis when it is true;
✔ that is, it is the risk a decision maker takes of rejecting the null hypothesis when
it is really true.

For example, α = 0.05 means that, if H0 is true, the test has a 5% chance of wrongly
rejecting it and a 95% chance of correctly accepting it. Statistical software usually
reports this level as ".05". Note that α is not the probability that a particular
finding is true or false.

Confidence Level    Critical z-Value (two-tailed)    Level of Significance (α)
90%                 1.645                            10%
95%                 1.96                             5%
99%                 2.576                            1%
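The critical z-values in the table can be reproduced from the standard normal distribution. A minimal sketch, assuming Python 3.8+ (for `statistics.NormalDist`); the helper name `critical_z` is ours, not from the source:

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the critical z-value for significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  two-tailed z={critical_z(alpha):.3f}  "
          f"one-tailed z={critical_z(alpha, two_tailed=False):.3f}")
```

For a one-tailed test, such as the upper-tailed tests in the worked problems, the critical values at α = 10%, 5% and 1% are 1.282, 1.645 and 2.326 rather than the two-tailed values in the table.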
Step 3: Establish Critical or Rejection Region
The acceptance region shown in the figure is a range of values of the sample statistic spread around the hypothesized
population parameter. If the value of the sample statistic falls within the limits of the acceptance region, the null
hypothesis is accepted; otherwise it is rejected. The values outside the acceptance region form the critical (rejection)
region, whose total probability under H0 equals α.
Step 4: Construct the Test Statistic
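For the one-sample z test introduced above, the usual test statistic measures how far the sample mean x̄ lies from the hypothesized mean μ0, in units of the standard error. Here σ is the population standard deviation and n the sample size; for large samples with σ unknown, the sample standard deviation s is commonly used in its place:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```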
Step 5: Formulate a Decision Rule to Accept or Reject the Null Hypothesis

Compare the calculated value of the test statistic with the critical value (also called the standard table
value of the test statistic). The decision rules for the null hypothesis are as follows:

• Accept H0 if the test statistic value falls within the acceptance region.
• Reject H0 otherwise.
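Steps 4 and 5 can be combined into one small function. This is a minimal sketch for a one-sample z test, assuming Python 3.8+; the function name `z_test_decision` and its `tail` parameter are hypothetical, chosen for illustration:

```python
from math import sqrt
from statistics import NormalDist


def z_test_decision(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Compute the z statistic and apply the accept/reject rule.

    tail: "two"   for Ha: mu != mu0
          "upper" for Ha: mu > mu0
          "lower" for Ha: mu < mu0
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))  # Step 4: test statistic
    std_normal = NormalDist()
    if tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif tail == "upper":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    elif tail == "lower":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z < -z_crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    # Step 5: reject H0 if z falls in the critical region, otherwise accept it
    return z, z_crit, ("reject H0" if reject else "accept H0")
```

The `tail` argument mirrors the choice of alternative hypothesis made in Step 1; the critical region lies only in the tail(s) that Ha points to.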
HYPOTHESIS TESTING FOR POPULATION PARAMETERS WITH LARGE SAMPLES
Hypothesis Testing for Single Population Mean
A packaging device is set to fill detergent powder packets with a mean weight of 5 kg, with a standard
deviation of 0.21 kg. The weight of packets can be assumed to be normally distributed. The weight of
packets is known to drift upwards over a period of time due to machine fault, which is not tolerable. A
random sample of 100 packets is taken and weighed. This sample has a mean weight of 5.03 kg. Can we
conclude that the mean weight produced by the machine has increased? Use a 5 per cent level of
significance.
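A sketch of the calculation in Python, treating it as an upper-tailed test (H0: μ = 5 kg, H1: μ > 5 kg) because only an upward drift matters. Because the test is one-tailed, the 5% critical value is 1.645 rather than the two-tailed 1.96:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 5.0, 0.21   # H0: mu = 5 kg; known population SD (kg)
n, x_bar = 100, 5.03     # sample size and sample mean (kg)
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tailed test (H1: mu > 5)
decision = "reject H0" if z > z_crit else "accept H0"
print(f"z = {z:.3f}, critical z = {z_crit:.3f} -> {decision}")
```

Since z ≈ 1.43 is below 1.645, the sample does not give sufficient evidence at the 5% level that the mean weight has increased.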
The mean lifetime of a sample of 400 fluorescent light bulbs produced by a
company is found to be 1600 hours with a standard deviation of 150 hours. Test the
hypothesis that the mean lifetime of the bulbs produced in general is higher than
the mean life of 1570 hours at the α = 0.01 level of significance.
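A sketch of this upper-tailed test (H0: μ = 1570, H1: μ > 1570). Because the sample is large (n = 400), it assumes the sample standard deviation of 150 hours can stand in for the unknown σ:

```python
from math import sqrt
from statistics import NormalDist

mu0 = 1570                   # hypothesized mean lifetime (hours)
n, x_bar, s = 400, 1600, 150 # sample size, sample mean, sample SD (hours)
alpha = 0.01

z = (x_bar - mu0) / (s / sqrt(n))         # s used in place of sigma (large sample)
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tailed critical value
decision = "reject H0" if z > z_crit else "accept H0"
print(f"z = {z:.3f}, critical z = {z_crit:.3f} -> {decision}")
```

Here z = 30 / 7.5 = 4.0, well above the critical value of about 2.326, so H0 is rejected: the data support a mean lifetime above 1570 hours at the 1% level.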
An ambulance service claims that it takes, on average, 8.9 minutes to reach its destination in
emergency calls. To check this claim, the agency that licenses ambulance services timed the
service on 50 emergency calls, getting a mean of 9.3 minutes with a standard deviation of 1.8
minutes. Does this constitute evidence, at the 1 per cent significance level, that the claimed
figure is too low?
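A sketch of this test, assuming the alternative is H1: μ > 8.9 (the claimed figure is too low, so the true mean is higher) and, as in the previous problem, using the sample standard deviation in place of σ because n = 50 is treated as a large sample:

```python
from math import sqrt
from statistics import NormalDist

mu0 = 8.9                   # claimed mean response time (minutes)
n, x_bar, s = 50, 9.3, 1.8  # sample size, sample mean, sample SD (minutes)
alpha = 0.01

z = (x_bar - mu0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tailed critical value
decision = "reject H0" if z > z_crit else "accept H0"
print(f"z = {z:.3f}, critical z = {z_crit:.3f} -> {decision}")
```

With z ≈ 1.57 below the critical value of about 2.326, H0 is accepted: at the 1% level the data do not show that the claimed 8.9 minutes is too low.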
Type I Error (False Positive Error)

A type I error occurs when the null hypothesis is actually true, but is
rejected as false by the test.

Let’s use a shepherd and wolf example.


Let’s say that our null hypothesis is that there is “no
wolf present.”
That is, the actual condition was that there was no wolf
present; however, the shepherd wrongly indicated
there was a wolf present by calling “Wolf! Wolf!”
This is a type I error or false positive error.
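The level of significance α is exactly the probability of this kind of false alarm. A simulation sketch, using a HYPOTHETICAL setup based on the body-weight example (H0 true with μ = 170 and σ = 40, and an assumed sample size of 50), shows that a two-tailed z test at α = 0.05 rejects a true H0 in roughly 5% of repeated samples:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed for reproducibility
mu0, sigma, n, alpha = 170, 40, 50, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

trials = 10_000
false_positives = 0
for _ in range(trials):
    # Draw a sample from a population where H0 is TRUE (the mean really is 170)
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if abs(z) > z_crit:  # "crying wolf" when there is no wolf
        false_positives += 1

type_i_rate = false_positives / trials
print(f"Estimated type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

Lowering α (for example to 0.01) shrinks this false-alarm rate, at the price of making type II errors more likely.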
Type II Error (False Negative)

A type II error occurs when the null hypothesis is
actually false, but is accepted as true by the
test.

Again, our null hypothesis is that there is “no wolf present.”
That is, the actual situation was that there was a
wolf present; however, the shepherd wrongly
indicated there was no wolf present.
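Unlike α, the probability of a type II error (usually written β) depends on how far the true parameter is from the hypothesized value. A minimal sketch for an upper-tailed z test in the body-weight example, assuming (hypothetically) that the true mean is now 180 pounds and that n = 100 men are sampled; the function name `type_ii_error` is ours:

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of accepting H0: mu = mu0 in an upper-tailed z test
    when the true population mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # true mean in standard-error units
    return std_normal.cdf(z_crit - shift)


beta = type_ii_error(mu0=170, mu_true=180, sigma=40, n=100)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Under these assumptions β is about 0.20: roughly one time in five the test would miss a real 10-pound increase. A larger sample or a bigger true difference reduces β.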
Let’s start with our shepherd/wolf example:

Null hypothesis: Wolf is not present
• Type I error / false positive: Shepherd thinks a wolf is present (shepherd cries wolf) when no wolf is actually present
• Type II error / false negative: Shepherd thinks a wolf is NOT present (shepherd does nothing) when a wolf is actually present
Cost assessment
• Type I error: Costs (actual costs plus shepherd credibility) associated with scrambling the townsfolk to kill the non-existent wolf
• Type II error: Replacement cost for the sheep eaten by the wolf, and replacement cost for hiring a new shepherd

                                Null hypothesis is true            Null hypothesis is false
Reject null hypothesis          Type I error (false positive)      Correct outcome (true positive)
Fail to reject null hypothesis  Correct outcome (true negative)    Type II error (false negative)
A type I error can be thought of as “convicting an innocent person” and a type II error as
“letting a guilty person go free”.
Shawshank Redemption

Null hypothesis: Person is not guilty of the crime
• Type I error / false positive: Person is judged guilty when the person actually did not commit the crime (convicting an innocent person)
• Type II error / false negative: Person is judged not guilty when they actually did commit the crime (letting a guilty person go free)
Cost assessment
• Type I error: Social costs of sending an innocent person to prison and denying them their personal freedoms (which, in our society, is considered an almost unbearable cost)
• Type II error: Risks of letting a guilty criminal roam the streets and commit future crimes
Null hypothesis: Medicine A cures Disease B
• Type I error / false positive (H0 true, but rejected as false): Medicine A cures Disease B, but is rejected as false
• Type II error / false negative (H0 false, but accepted as true): Medicine A does not cure Disease B, but is accepted as true
Cost assessment
• Type I error: Lost opportunity cost for rejecting an effective drug that could cure Disease B
• Type II error: Unexpected side effects (maybe even death) from using a drug that is not effective
Hence, many textbooks and instructors say that a type I (false positive) error
is worse than a type II (false negative) error.
