Hypothesis Testing

Uploaded by Venkata Lokendra
Copyright © All Rights Reserved

Business Statistics

Agenda - Estimation and Hypothesis Testing - Week 2


1. Sampling and Inference
   a. Simple random samples
   b. Sampling distribution
   c. Central Limit Theorem
2. Estimation
   a. Point estimation
   b. Interval estimation
3. Hypothesis Testing
   a. Introduction
   b. Hypothesis Formulation
4. Basic concepts of Hypothesis Testing
   a. Importance of null
   b. Importance of test statistic
   c. Type I and Type II errors
   d. Hypothesis testing template
5. Performing a Hypothesis Test
   a. Some key ideas
   b. Assumptions
   c. Critical point
   d. Rejection region approach
   e. p-value approach
6. One-Tailed and Two-Tailed Tests
7. Confidence Interval and Hypothesis Test
Sampling and Inference
Revisiting the need for sampling

In many situations, all we have available is a sample of data.

The data we have is finite.

So far, the goal has been to describe, summarize and visualise the sample data only.

Moving ahead, we want to make inferences about the “entire” population using the sample data.
Sampling: Simple Random Sampling

A sampling technique where every item in the population has an equal chance of being selected.

Why are simple random samples important?
Because all the entities in the population have an equal chance of being selected, the sample is likely to be representative of the population.
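A minimal sketch of drawing a simple random sample in Python, assuming a hypothetical population of 1,000 order IDs. `random.sample` draws without replacement, so every item in the population has the same chance of ending up in the sample.

```python
import random

# Hypothetical population: order IDs 1..1000 (illustrative only)
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the example is reproducible
# Draw 10 items without replacement; each item is equally likely to be chosen
sample = rng.sample(population, k=10)
print(sample)
```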
Sampling Distribution
The sampling distribution of a statistic is the probability distribution of that statistic when we draw many samples of the same size n from the population.
For example, the sampling distribution of the mean, the sampling distribution of the variance, etc.
To a great extent, statistical inference techniques are based on the sampling distribution of a statistic.

[Figure: samples of size n are drawn from the population; the distribution of their means is the sampling distribution]
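The sampling distribution of the mean can be approximated by simulation. A minimal sketch, assuming a hypothetical uniform population: draw many samples of size n, record each sample mean, and compare the centre and spread of those means with the population mean 𝜇 and with σ/√n.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population: 10,000 values from a uniform distribution on [0, 10]
population = [rng.uniform(0, 10) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 25          # size of each sample
reps = 2_000    # number of samples drawn
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

# The sampling distribution of the mean is centred on mu, with spread close to sigma / sqrt(n)
print(f"population mean {mu:.3f}, mean of sample means {statistics.mean(sample_means):.3f}")
print(f"sigma/sqrt(n) {sigma / n ** 0.5:.3f}, sd of sample means {statistics.stdev(sample_means):.3f}")
```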
Central Limit Theorem

The sampling distribution of the sample means approaches a normal distribution as the sample size gets bigger, no matter what the shape of the population distribution is (provided the population has a finite variance).

Assumptions

● Data must be randomly sampled.
● Sample values must be independent of each other.
● Samples should come from the same distribution.
● Sample size must be sufficiently large (a common rule of thumb is n ≥ 30).
Central Limit Theorem

A larger sample size provides a better estimate of the population mean.

For sample size n = 5, the sample means pile up around the population mean.

For sample size n = 30, the sample means cluster much more closely around the population mean.
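The effect of sample size can be checked with a quick simulation. A minimal sketch, assuming a hypothetical, strongly right-skewed exponential population with mean 1: for n = 5 and n = 30 it records the spread and the skewness of the sample means. Skewness near 0 indicates a more symmetric, normal-like shape.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [statistics.mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(xs):
    """Skewness of the values in xs; 0 for a symmetric distribution such as the normal."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

rng = random.Random(1)
results = {}
for n in (5, 30):
    means = sample_means(n, 5_000, rng)
    results[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:2d}: sd of means={results[n][0]:.3f}, skewness={results[n][1]:.3f}")
```

Both the spread and the skewness of the sample means shrink as n grows, even though the population itself is far from normal.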
Estimation
Estimation: make inferences about a population parameter based on a sample statistic.

Point Estimation: a single-point estimate of the population parameter.
E.g. the population mean, as estimated from the sample mean, is $40.

Interval Estimation: a range of values within which the population parameter lies with some (x%) confidence.
E.g. the population mean should lie between $38 and $42, with 95% confidence (x = 95).
Point Estimation
A point estimate of a population parameter is a single value of a statistic computed from the sample, e.g. the sample mean as an estimate of the population mean 𝜇.

Point estimates vary from sample to sample. Hence an interval is often used to provide a range of values the parameter can take, instead of a single point estimate.
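A minimal sketch of point estimates varying from sample to sample, assuming a hypothetical population of order values centred near $40 (echoing the example above). Two independent samples of the same size give two different point estimates of the same population mean.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population of order values in dollars (illustrative only)
population = [rng.gauss(40, 8) for _ in range(50_000)]

estimates = []
for label in ("sample A", "sample B"):
    sample = rng.sample(population, 50)
    xbar, s = statistics.mean(sample), statistics.stdev(sample)  # point estimates of mu and sigma
    estimates.append(xbar)
    print(f"{label}: mean estimate {xbar:.2f}, sd estimate {s:.2f}")
```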
Interval estimation - Confidence interval

A confidence interval provides an interval, or a range of values, that is expected to cover the true, unknown parameter.

[Figure: an interval estimate bounded by lower and upper confidence limits around the true (unknown) value]

The upper and lower confidence limits of the interval are determined using the distribution of the sample mean and a multiplier that specifies the confidence level.
Confidence Interval for Mean 𝜇
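When the population standard deviation σ is known (the setting assumed for the z-test later in this deck), the interval is commonly computed as x̄ ± z_{α/2} · σ/√n, where z_{α/2} is the standard normal multiplier. A minimal sketch, reusing the delivery-time figures from the example later in this deck (x̄ = 5.25, σ = 1.3, n = 45); the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # multiplier, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lo, hi = z_confidence_interval(xbar=5.25, sigma=1.3, n=45)
print(f"95% CI for mu: ({lo:.3f}, {hi:.3f})")   # prints (4.870, 5.630)
```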

Interpretation of 95% Confidence Interval

- The interpretation of a 95% confidence interval is that, if the sampling process is repeated a large number of times, about 95% of the intervals so constructed will contain the true population parameter.
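This repeated-sampling interpretation can be checked by simulation. A minimal sketch, assuming a hypothetical normal population with known 𝜇 = 5 and σ = 1.3: it builds many 95% intervals from samples of size 45 and counts how often they contain 𝜇.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(3)
mu, sigma, n = 5.0, 1.3, 45          # hypothetical "true" population values
z = NormalDist().inv_cdf(0.975)      # 95% multiplier
margin = z * sigma / sqrt(n)

trials = 2_000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    if xbar - margin <= mu <= xbar + margin:   # did this interval capture mu?
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")  # close to 0.95
```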

Why not a 100% Confidence Interval?

- A 100% confidence interval would have to include all possible values of the parameter.
- Hence it gives no insight into the problem.
Hypothesis Testing
Why Hypothesis Testing?

Estimation: The problem of estimation is considered when there is no previous knowledge of the population parameter. The problem is simpler in that case: a random sample is taken, a sample statistic is computed, and an appropriate point and interval estimate is suggested.

Hypothesis Testing: Often the interest is not in the numerical value of the point estimate of the parameter, but in judging the plausibility of a hypothesis about the population parameter using sample data. Estimation alone is not enough to arrive at a conclusion in such cases.
What is a Hypothesis?

Often we are interested in population parameter(s).

A hypothesis is a conjecture about the population parameter(s).

For example, a bulb manufacturing company is interested in knowing whether the new manufacturing process improves the reliability of the bulbs.

The objective of hypothesis testing is to SET a value for the parameter(s) and perform a statistical TEST to see whether that value is tenable in the light of the evidence gathered from the sample.
Overview of Applications

Applications of Hypothesis Testing

● Testing research hypotheses: e.g. a new automobile system increases the mean mpg performance.
● Testing the validity of a claim: e.g. a manufacturer claims that 1L soft drink bottles are filled with an average of at least 0.99L.
● Testing business decisions: e.g. a new online ad has resulted in higher online conversion rates for an E-commerce website.
Stating the Hypothesis
Null and Alternative Hypotheses: two mutually exclusive statements about the population parameter(s).

Null Hypothesis (H0): the presumed current state of the matter, or status quo.
E.g. The new process for manufacturing bulbs does not improve reliability.

Alternative Hypothesis (Ha): the rival opinion, research hypothesis or improvement target.
E.g. The new process for manufacturing bulbs improves reliability.
Null & Alternative Formulation: Example

The mean length of lumber is specified to be 8.5m for a certain building project. A construction engineer wants to make sure that the shipments she received adhere to that specification.

The population parameter about which the hypothesis will be formed is the population mean 𝜇.

The hypotheses are

H0: 𝜇 = 8.5

Ha: 𝜇 ≠ 8.5
Tips to formulate Null & Alternative

Null Hypothesis (Am I testing a status quo that already exists?)
● Negation of the research question
● Always contains equality (=, >=, <=)

Alternate Hypothesis (Am I testing an assumption or claim that is beyond what I know?)
● Research question to be proven
● Doesn't contain equality (≠, >, <)
Basic Concepts of Hypothesis Testing
Importance of Null

The null hypothesis is assumed to be true unless reasonably strong evidence to the contrary is found.

Based on a random sample, a decision is made as to whether there exists reasonably strong evidence against the null hypothesis.

● Evidence is strong (satisfies the predetermined decision rule): reject the null hypothesis in favour of the alternative hypothesis.
● Evidence is not strong (does not satisfy the predetermined decision rule): fail to reject the null hypothesis.
Importance of Test Statistic
The test statistic is calculated from the sample data and compared against the predetermined decision rule.

Under the null hypothesis, the test statistic is a random variable that follows a standard distribution such as the Normal, t, F or chi-square distribution. Tests are often named after the test statistic (e.g. z-test, t-test).

Since hypothesis testing is based on the sampling distribution of the test statistic, the decisions made are probabilistic and can be wrong.

Hence, it is very important to understand the errors associated with hypothesis testing.
Type I and Type II Error
Type I and Type II Errors

Level of Power of
significance the test
H 0 is True H 0 is False

Type I Error Correct decision


Reject H 0 Prob = α Prob = 1 - β

Fail to reject Correct decision Type II Error


H0 Prob = 1 - α Prob = β
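To make α and β concrete, here is a minimal sketch for an upper-tailed z-test, reusing the delivery-time figures from the example later in this deck (𝜇0 = 5, σ = 1.3, n = 45) with α = 0.05. The true mean of 5.5 days under Ha is a hypothetical value chosen only for illustration; β, and hence the power, depends on it.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 5.0, 1.3, 45, 0.05
mu_true = 5.5                      # hypothetical true mean under Ha (assumption)
se = sigma / sqrt(n)               # standard error of the sample mean

z_crit = NormalDist().inv_cdf(1 - alpha)   # reject H0 when z > z_crit (upper-tailed test)
xbar_crit = mu0 + z_crit * se              # the same cut-off on the scale of the sample mean

# Type I error: sample mean exceeds the cut-off although mu = mu0
type1 = 1 - NormalDist(mu0, se).cdf(xbar_crit)
# Type II error: sample mean stays below the cut-off although mu = mu_true
beta = NormalDist(mu_true, se).cdf(xbar_crit)
power = 1 - beta
print(f"alpha={type1:.3f}, beta={beta:.3f}, power={power:.3f}")
```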
Type I and Type II Errors : Example

Null Hypothesis: The patient doesn’t have cancer.
Alternate Hypothesis: The patient has cancer.

Type I error (false positive): “The patient doesn’t have cancer, but the doctor says she does.”

Type II error (false negative): “The patient does have cancer, but the report says she doesn’t.”
Template for Hypothesis Testing
Hypothesis Testing Template

1. Identify the key question: What is the research question that you are trying to answer?

2. Establish the hypotheses: What is the metric of interest? Define the Null and Alternate Hypothesis.

3. Understand and prepare data: What data do you have? Do you understand what it means? Can it be used directly?

4. Identify the right test: Choose the method for testing based on the last three points.

5. Check the assumptions: Ensure that the data satisfies the assumptions of the test.

6. Perform the test: Reach a conclusion based on the results (p-value).


Performing a hypothesis test
Some key ideas first
Level of Significance (α)
● Probability of rejecting the null hypothesis when it is true.
● Fixed before the hypothesis test.

p-value
● Probability of observing a test statistic as extreme as, or more extreme than, the computed test statistic, under the null hypothesis.
● Depends on the sample data: alpha is pre-fixed, but the p-value depends on the value of the test statistic.

Acceptance or Rejection Region
● The total area under the distribution curve of the test statistic is partitioned into an acceptance region and a rejection region.
● Reject the null hypothesis when the test statistic lies in the rejection region; else, fail to reject it.
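The p-value approach and the rejection-region approach can both be sketched for a test statistic that is standard normal under H0. A minimal sketch; the function names `z_p_value` and `in_rejection_region` and the example z values are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """p-value of a z statistic under H0 (standard normal sampling distribution)."""
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(z)              # area to the right of z
    if alternative == "less":
        return phi(z)                  # area to the left of z
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))   # area in both tails beyond |z|
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

def in_rejection_region(z, alpha=0.05, alternative="two-sided"):
    """Rejection-region approach: compare z with the critical value(s) for alpha."""
    inv = NormalDist().inv_cdf
    if alternative == "greater":
        return z > inv(1 - alpha)
    if alternative == "less":
        return z < inv(alpha)
    if alternative == "two-sided":
        return abs(z) > inv(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Both approaches give the same decision: reject H0 when p-value < alpha
for z in (1.2, 2.1):
    p = z_p_value(z, "greater")
    print(z, round(p, 4), p < 0.05, in_rejection_region(z, 0.05, "greater"))
```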
Let’s start simple

Consider the following questions in hypothesis testing:

● What are the null and alternative hypotheses?
● What is an appropriate test statistic?
● What is the preset level of significance?
● How do we check whether the data gives significant evidence against the null hypothesis or not?

Let’s see an example and understand the significance of the above questions.

For simplicity, we will assume that the population standard deviation is known and the
sample size is more than 30.
Example
It is known from experience that for a certain E-commerce company the mean delivery time
of the products is 5 days with a standard deviation of 1.3 days.

The new customer service manager of the company is afraid that the company is slipping, and collects a random sample of 45 orders. The mean delivery time of these orders comes out to be 5.25 days.

Is there enough statistical evidence for the manager’s apprehension that the mean delivery time of products is greater than 5 days?

This is clearly a one-tailed test concerning the population mean 𝜇, the mean delivery time of products: H0: 𝜇 = 5 against Ha: 𝜇 > 5.
First test - z-test for One Mean

Significance of the test: test for the population mean, H0: 𝜇 = 𝜇0

Test statistic distribution: standard normal distribution

Assumptions:
● Continuous data
● Normally distributed population, or sample size > 30
● Known population standard deviation 𝜎
● Random sampling from the population
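For the delivery-time example, the z statistic is z = (x̄ − 𝜇0) / (σ/√n), which follows the standard normal distribution under H0. A minimal sketch of the test; the level of significance α = 0.05 is an assumed, conventional choice, since the example does not fix one.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the delivery-time example
mu0 = 5.0      # H0: mu = 5 days
sigma = 1.3    # known population standard deviation (days)
n = 45         # orders in the sample
xbar = 5.25    # sample mean delivery time (days)
alpha = 0.05   # level of significance (assumed)

z = (xbar - mu0) / (sigma / sqrt(n))        # test statistic, standard normal under H0
p_value = 1 - NormalDist().cdf(z)           # upper tail, since Ha: mu > 5
z_crit = NormalDist().inv_cdf(1 - alpha)    # about 1.645

print(f"z = {z:.3f}, p-value = {p_value:.4f}, critical value = {z_crit:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Under these assumptions z ≈ 1.29 is below the critical value 1.645 (equivalently, p ≈ 0.099 > 0.05), so the sample would not provide enough evidence at the 5% level to support the manager’s apprehension.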
One-tailed and Two-tailed Tests
One-tailed and Two-tailed Tests
Alternative Hypothesis
● One-tailed test, greater than type: Ha: 𝜇 > 𝜇0
● One-tailed test, less than type: Ha: 𝜇 < 𝜇0
● Two-tailed test, not equal type: Ha: 𝜇 ≠ 𝜇0

The choice of a one-tailed vs a two-tailed test depends on the nature of the problem, not on the sample data!
Difference between One-tailed and Two-tailed Tests

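The value of the test statistic is the same for a one-tailed and a two-tailed test.

Only the critical value(s) and the p-value associated with the test statistic change.

[Figure: at α = 0.05, the one-tailed (greater than) test has its critical value at 1.645; the two-tailed test has critical values at -1.96 and 1.96]

One-tailed test: a difference is not tested on one side, and the test has greater power on the other side.

Two-tailed test: a difference is tested on both sides.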
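A minimal sketch of how the critical values and p-values differ while the test statistic stays the same, at α = 0.05 and using z = 1.29 (roughly the delivery-time test statistic) as an illustrative value.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z = 1.29   # same test statistic for both tests

one_tailed_crit = std_normal.inv_cdf(1 - alpha)        # about 1.645 (greater-than test)
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96, used as +/-1.96
p_one = 1 - std_normal.cdf(z)                          # right tail only
p_two = 2 * (1 - std_normal.cdf(abs(z)))               # both tails: twice p_one here

print(f"one-tailed: critical value {one_tailed_crit:.3f}, p-value {p_one:.4f}")
print(f"two-tailed: critical values ±{two_tailed_crit:.3f}, p-value {p_two:.4f}")
```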