
Introduction To Data Analytics: Statistical Inference - II

This document discusses statistical inference and hypothesis testing. It defines Type I and Type II errors in hypothesis testing and provides examples of calculating the probabilities of these errors. The document also discusses one-tailed and two-tailed hypothesis tests, giving examples of defining the rejection regions for different tests. Finally, it presents two case studies and outlines the five steps for testing hypotheses.

Uploaded by

preethi

INTRODUCTION TO

DATA ANALYTICS

Class #11
Statistical Inference - II

Dr. Sreeja S R
Assistant Professor
Indian Institute of Information Technology
IIIT Sri City
IIITS: IDA - M2021 1


IN THIS PRESENTATION…

• Errors in hypothesis testing

• Case Study 1: Coffee Sale

• Case Study 2: Machine Testing

• Summary of Sampling Distributions in Hypothesis Testing



 
Calculating $\alpha$
• Assume that we have the results of a random sample. We can then use the characteristics of the sampling distribution to calculate the probabilities of making either a Type I or a Type II error.

Example 6.6:
Suppose the two hypotheses in a statistical test are:

$H_0: \mu = a$
$H_1: \mu \neq a$

Also, assume that the population follows a normal distribution. A threshold limit, say $a \pm \delta$, is used to decide whether the sample mean is significantly different from $a$.


 
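Calculating $\alpha$
[Figure: sampling distribution with the points $a-\delta$, $a$ and $a+\delta$ marked; the rejection area is shaded.]

• Here, the shaded region represents the probability that $\bar{x} < a-\delta$ or $\bar{x} > a+\delta$.

Thus the null hypothesis is to be rejected if the mean value is less than $a-\delta$ or greater than $a+\delta$.

If $\bar{x}$ denotes the sample mean, then the probability of a Type I error is

$$\alpha = P(\bar{x} < a-\delta \ \text{or}\ \bar{x} > a+\delta \mid \mu = a)$$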


THE REJECTION REGION
• The rejection region comprises the values of the test statistic for which
1. the probability of occurrence when the null hypothesis is true is less than or equal to the specified $\alpha$, and
2. the probability of occurrence when $H_1$ is true is greater than it is under $H_0$.

[Figure: rejection region for $H_0$ for a given value of $\alpha$. Reject $H_0$ ($\mu \neq a$) to the left of $a'$, do not reject $H_0$ ($\mu = a$) between $a'$ and $a''$, reject $H_0$ ($\mu \neq a$) to the right of $a''$.]


Two-Tailed Test
• For a two-tailed hypothesis test, the hypotheses take the form

$H_0: \mu = a$
$H_1: \mu \neq a$

In other words, the null hypothesis is rejected if the sample mean satisfies $\bar{x} < a'$ or $\bar{x} > a''$, where the critical values $a'$ and $a''$ are fixed by the given $\alpha$.

Thus, in a two-tailed test, there are two rejection regions (also known as critical regions), one on each tail of the sampling distribution curve.


Two-Tailed Test
[Figure: acceptance and rejection regions in a two-tailed test at the 5% significance level. The acceptance region is the central 95% of the area around $\mu_{H_0}$: accept $H_0$ if the sample mean falls in this region. The rejection regions are the two tails, each with 0.025 of the area: reject $H_0$ if the sample mean falls in either of them.]
One-Tailed Test
• A one-tailed test is used when we want to test, say, whether the population mean is either lower or higher than the hypothesized value.

Symbolically,

$H_0: \mu = a$
$H_1: \mu < a$ (left-tailed test) or $H_1: \mu > a$ (right-tailed test)

Here there is only one rejection region, on the left tail (or the right tail).

[Figure: a left-tailed test and a right-tailed test, each with a single rejection region containing 0.05 of the area; the remaining area is the acceptance region.]
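As a rough illustration of how the tail areas in these figures translate into critical values, the sketch below assumes the test statistic has been standardized to a standard normal $Z$; the slides themselves work with the sampling distribution of the mean. The function name `critical_values` and its `tail` argument are illustrative and do not come from the slides.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Return the standard-normal critical value(s) for significance level alpha."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)  # alpha/2 of the area in each tail
        return (-c, c)
    if tail == "left":
        return (z.inv_cdf(alpha),)  # all of alpha in the left tail
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)  # all of alpha in the right tail
    raise ValueError("tail must be 'two', 'left' or 'right'")


two_tailed = critical_values(0.05, "two")
right_tailed = critical_values(0.05, "right")
print("two-tailed :", two_tailed)
print("right-tailed:", right_tailed)
```

At $\alpha = 0.05$ this gives roughly $\pm 1.96$ for the two-tailed test and about 1.645 for the right-tailed test, matching the 0.025 and 0.05 tail areas shown in the figures.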
EXAMPLE 6.7: CALCULATING $\alpha$

• Consider the following two hypotheses.

The null hypothesis is $H_0: \mu = 8$

The alternative hypothesis is $H_1: \mu \neq 8$

Assume a sample of size 16 with standard deviation 0.2, drawn from a normal distribution.


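EXAMPLE 6.7: CALCULATING $\alpha$

• We can decide the rejection region as follows.

Suppose the null hypothesis is to be rejected if the mean value is less than 7.9 or greater than 8.1. If $\bar{x}$ is the sample mean, then the probability of a Type I error is

$$\alpha = P(\bar{x} < 7.9 \mid \mu = 8) + P(\bar{x} > 8.1 \mid \mu = 8)$$

Given that the standard deviation is 0.2, the sample size is 16 and the distribution is normal, the standard error of the mean is $0.2/\sqrt{16} = 0.05$.
Thus,

$$P(\bar{x} < 7.9 \mid \mu = 8) = P\left(Z < \frac{7.9 - 8}{0.2/\sqrt{16}}\right) = P(Z < -2.0) = 0.0228$$

and

$$P(\bar{x} > 8.1 \mid \mu = 8) = P\left(Z > \frac{8.1 - 8}{0.2/\sqrt{16}}\right) = P(Z > 2.0) = 0.0228$$

Hence,

$$\alpha = 0.0228 + 0.0228 = 0.0456$$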
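The calculation in Example 6.7 can be checked numerically. The sketch below is only an illustration: it assumes the standard deviation 0.2 is the population value and uses Python's standard-library `NormalDist` instead of a printed normal table. The variable names are not from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 8.0      # mean under the null hypothesis
sigma = 0.2    # standard deviation given in the example
n = 16         # sample size
lower, upper = 7.9, 8.1  # reject H0 outside this interval

# Sampling distribution of the mean under H0: Normal(mu0, sigma / sqrt(n))
sampling = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Type I error = probability that the sample mean lands in either tail
alpha = sampling.cdf(lower) + (1 - sampling.cdf(upper))
print(f"alpha = {alpha:.4f}")
```

The result is about 0.0455; a normal table that rounds each tail to 0.0228 gives 0.0456.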
 

Example 6.8: Calculating $\alpha$ and $\beta$


There are two identical-looking boxes of chocolates. Box A contains 60 red and 40 black chocolates, whereas Box B contains 40 red and 60 black chocolates. Neither box is labelled. One box is placed on the table. We are to test the hypothesis that “Box B is on the table”.

To test the hypothesis, the following experiment is planned:


• Draw five chocolates at random from the box.
• Replace each chocolate before selecting a new one.
• Take the number of red chocolates drawn as the sample statistic.

Note: Since the draws are independent of each other, we can assume that the sample statistic follows a binomial probability distribution.
Example 6.8: Calculating $\alpha$
• Let us express the population parameter as $p$, the proportion of red chocolates in the box on the table.
The hypotheses of the problem can be stated as:
$H_0: p = 0.4$ // Box B is on the table
$H_1: p = 0.6$ // Box A is on the table
Calculating $\alpha$
In this example, the null hypothesis $H_0$ specifies that the probability of drawing a red chocolate is 0.4. This means that a lower proportion of red chocolates in the observations favors the null hypothesis. Conversely, suppose we decide that drawing all red chocolates provides sufficient evidence to reject the null hypothesis. Then the probability of making a Type I error is the probability of getting five red chocolates in a sample of five from Box B. That is,

$$\alpha = P(X = 5 \mid p = 0.4)$$

Using the binomial distribution with $n = 5$ and $x = 5$,

$$\alpha = \binom{5}{5}(0.4)^5(0.6)^0 = 0.01024$$

Thus, the probability of rejecting a true null hypothesis is about 0.01. That is, there is approximately a 1% chance that Box B will be mislabeled as Box A.
 
Example 6.8: Calculating $\beta$
• A Type II error occurs if we fail to reject the null hypothesis when it is not true. In the current illustration, this happens if Box A is on the table but we do not get the five red chocolates required to reject the hypothesis that Box B is on the table.
The probability of a Type II error is then the probability of getting four or fewer red chocolates in a sample of five from Box A.
That is,

$$\beta = P(X \le 4 \mid p = 0.6)$$

Using the probability rule $P(X \le 4) = 1 - P(X = 5)$:

That is, $\beta = 1 - P(X = 5 \mid p = 0.6)$
Now, $P(X = 5 \mid p = 0.6) = (0.6)^5 = 0.07776$
Hence, $\beta = 1 - 0.07776 = 0.92224$

That is, the probability of making a Type II error is over 92%. This means that, if Box A is on the table, the probability that we will be unable to detect it is about 0.92.
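The two error probabilities of Example 6.8 follow directly from the binomial distribution. The sketch below is a minimal check, assuming the decision rule described in the example (reject "Box B is on the table" only when all five draws are red); the helper `binom_pmf` is an illustrative name, not something defined in the slides.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 5          # chocolates drawn, with replacement
p_box_b = 0.4  # P(red) if Box B is on the table (H0)
p_box_a = 0.6  # P(red) if Box A is on the table (H1)

# Type I error: five reds although Box B is on the table
alpha = binom_pmf(5, n, p_box_b)

# Type II error: four or fewer reds although Box A is on the table
beta = sum(binom_pmf(k, n, p_box_a) for k in range(5))

print(f"alpha = {alpha:.5f}")
print(f"beta  = {beta:.5f}")
```

With this rule $\alpha$ is very small but $\beta$ is very large, which illustrates the trade-off between the two kinds of error for a fixed sample size.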
CASE STUDY 1: COFFEE SALE

A coffee vendor near Kharagpur railway station has been having average
sales of 500 cups per day. Because of the development of a bus stand nearby, the
vendor expects sales to increase. During the first 12 days after the inauguration of
the bus stand, the daily sales were as follows:

550 570 490 615 505 580 570 460 600 580 530 526

On the basis of this sample information, can we conclude that the sales of coffee
have increased?

Consider a 5% level of significance.


HYPOTHESIS TESTING : 5 STEPS

• The following five steps are followed when testing a hypothesis.

1. Specify $H_0$ and $H_1$, the null and alternative hypotheses, and an acceptable level of $\alpha$.

2. Determine an appropriate sample-based test statistic and the rejection region for the specified $\alpha$.

3. Collect the sample data and calculate the test statistic.

4. Make a decision to either reject or fail to reject $H_0$.

5. Interpret the result in common language suitable for practitioners.


CASE STUDY 1: STEP 1

• Step 1: Specification of hypotheses and acceptable level of $\alpha$

Let us consider the hypotheses for the given problem as follows.

$H_0: \mu = 500$ cups per day
$H_1: \mu > 500$ cups per day

The null hypothesis is that sales average 500 cups per day and have not increased.

The alternative hypothesis is that the sales have increased.

The given level of significance is $\alpha = 0.05$.


CASE STUDY 1: STEP 2

• Step 2: Sample-based test statistic and the rejection region for the specified $\alpha$

Given the sample

550 570 490 615 505 580 570 460 600 580 530 526

Since the sample size is small and the population standard deviation is not known, we shall use the t-test, assuming a normal population. The test statistic is

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$

To find $\bar{x}$ and $S$, we make the following computations.

$$\bar{x} = \frac{\sum X_i}{n} = \frac{6576}{12} = 548$$




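Case Study 1: Step 2
• The sample standard deviation is

$$S = \sqrt{\frac{\sum (X_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{23978}{11}} \approx 46.69$$

Hence,

$$t = \frac{548 - 500}{46.69/\sqrt{12}} \approx 3.56$$

Note:
A statistical table for t-distributions gives the t-value for a given number of degrees of freedom and a given level of significance $\alpha$, and vice versa.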
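The computation table for this step did not survive in the slides. As a stand-in, the following sketch prints the deviations and squared deviations for the twelve daily sales figures and derives $\bar{x}$ and $S$ from them. It is an illustrative reconstruction from the data given in the case study, not the original table.

```python
from math import sqrt

# Daily coffee sales for the 12 days after the bus stand opened
sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]

n = len(sales)
mean = sum(sales) / n

# One row per observation: value, deviation from mean, squared deviation
print(f"{'X_i':>5} {'X_i - mean':>11} {'(X_i - mean)^2':>15}")
for x in sales:
    d = x - mean
    print(f"{x:>5} {d:>11.0f} {d * d:>15.0f}")

ss = sum((x - mean) ** 2 for x in sales)  # sum of squared deviations
s = sqrt(ss / (n - 1))                    # sample standard deviation

print(f"sum = {sum(sales)}, mean = {mean:.0f}")
print(f"sum of squared deviations = {ss:.0f}, S = {s:.2f}")
```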


Case Study 1: Step 3

Step 3: Collect the sample data and calculate the test statistic

As $H_1$ is one-tailed, we determine the rejection region by applying a one-tailed test in the right tail (because $H_1$ is of the "more than" type, $\mu > 500$) at the 5% level of significance.

Using the table of the t-distribution for 11 degrees of freedom at the 5% level of significance, the rejection region is

$$R: t > 1.796$$


Case Study 1: Step 4
• Step 4: Make a decision to either reject or fail to reject $H_0$

The observed value of $t \approx 3.56$ lies in the rejection region ($t > 1.796$), and thus $H_0$ is rejected at the 5% level of significance.


Case Study 1: Step 5
Step 5: Final comment and interpret the result

We can conclude that the sample data indicate that coffee sales have increased.
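The whole of Case Study 1 can be retraced in a few lines. This is a minimal sketch using only the Python standard library; the critical value 1.796 is typed in from a t-table (11 degrees of freedom, one-tailed, 5% level) rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]

mu0 = 500           # H0: average sales are 500 cups per day
t_critical = 1.796  # from a t-table: 11 d.f., right tail, alpha = 0.05

x_bar = mean(sales)  # sample mean
s = stdev(sales)     # sample standard deviation (divides by n - 1)
t_obs = (x_bar - mu0) / (s / sqrt(len(sales)))

# Right-tailed test: reject H0 when the observed t exceeds the critical value
reject_h0 = t_obs > t_critical

print(f"x_bar = {x_bar}, S = {s:.2f}, t = {t_obs:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```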



CASE STUDY 2: MACHINE TESTING
• A medicine production company packages medicine in tubes of 8 ml, with standard deviation $\sigma = 0.2$. The company uses a machine to control the amount of medicine in the tubes. To monitor this control, a sample of 16 tubes is taken from the production line at random time intervals and their contents are measured precisely. The mean amount of medicine in these 16 tubes is used to test the hypothesis that the machine is indeed working properly. The given sample has a mean of 7.89 and follows a normal distribution.


CASE STUDY 2: STEP 1

• Step 1: Specification of hypotheses and acceptable level of $\alpha$

The hypotheses are given in terms of the population mean $\mu$ of medicine per tube.

The null hypothesis is $H_0: \mu = 8$

The alternative hypothesis is $H_1: \mu \neq 8$

We take $\alpha$, the significance level in our hypothesis testing, to be 0.05.

(This means that the probability of concluding that the machine needs adjustment when it is in fact working properly is at most 5%.)


CASE STUDY 2: STEP 2

• Step 2: Sample-based test statistic and the rejection region for the specified $\alpha$

Rejection region: Given $\alpha = 0.05$, we reject $H_0$ if $|Z| > 1.96$ (obtained from the standard normal distribution for a two-tailed test).


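CASE STUDY 2: STEP 3

• Step 3: Collect the sample data and calculate the test statistic

Sample results: $n = 16$, $\bar{x} = 7.89$, $\sigma = 0.2$

With this sample, the test statistic is

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7.89 - 8}{0.2/\sqrt{16}} = -2.20$$

Hence, $|Z| = 2.20$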


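CASE STUDY 2: STEP 4
• Step 4: Make a decision to either reject or fail to reject $H_0$

[Figure: standard normal curve with the critical values -1.96 and 1.96 marked; the points -2.20 and 2.20 lie in the tails beyond them.]

Since $|Z| = 2.20 > 1.96$, we reject $H_0$.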


CASE STUDY 2: STEP 5
• Step 5: Final comment and interpret the result

We conclude and recommend that the machine be adjusted.


CASE STUDY 2: ALTERNATIVE TEST
• Suppose that in our initial setup of the hypothesis test we choose $\alpha = 0.01$ instead of 0.05. Then the test can be summarized as:

1. $H_0: \mu = 8$, $H_1: \mu \neq 8$, $\alpha = 0.01$

2. Reject $H_0$ if $|Z| > 2.576$

3. Sample result: $n = 16$, $\sigma = 0.2$, $\bar{x} = 7.89$, $Z = -2.20$, $|Z| = 2.20$

4. $|Z| = 2.20 < 2.576$, so we fail to reject $H_0: \mu = 8$

5. We do not recommend that the machine be readjusted.
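The effect of the significance level on the decision in Case Study 2 can be sketched as follows. The code assumes a known population standard deviation of 0.2 and compares the usual 0.05 level with a stricter level, taken here as 0.01 for illustration. The function `two_tailed_z_test` is a hypothetical helper, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (z, critical value, reject?) for a two-tailed Z-test."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    return z, z_crit, abs(z) > z_crit


results = {}
for alpha in (0.05, 0.01):
    z, z_crit, reject = two_tailed_z_test(7.89, 8.0, 0.2, 16, alpha)
    results[alpha] = (z, z_crit, reject)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha}: Z = {z:.2f}, critical = {z_crit:.3f} -> {decision}")
```

The same sample leads to opposite decisions at the two levels, because the observed $|Z| = 2.20$ lies between the critical values of about 1.96 and 2.576.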


Hypothesis Testing Strategies
• Hypothesis testing determines the validity of an assumption (technically described as the null hypothesis), with a view to choosing between two conflicting hypotheses about the value of a population parameter.

• There are two types of tests of hypotheses:

  • Parametric tests (also called standard tests of hypotheses)
  • Non-parametric tests (also called distribution-free tests of hypotheses)


Parametric Tests : Applications
• Usually assume certain properties of the population from which we draw samples:

• Observations come from a normal population

• Sample size is small

• Assumptions about population parameters such as mean, variance, etc. hold good.

• Require measurement equivalent to interval-scaled data.


Parametric Tests
• Important Parametric Tests
The widely used tests based on sampling distributions are

• Z-test
• t-test
• $\chi^2$-test
• F-test

Note:
All these tests are based on the assumption of normality (i.e., the source of data is considered to be normally distributed).


Parametric Tests : Z-test
• Z-test: This is the most frequently used test in statistical analysis.

• It is based on the normal probability distribution.

• It is used for judging the significance of several statistical measures, particularly the mean.

• It is used even when the binomial distribution or the t-distribution is applicable, on the condition that such a distribution tends to the normal distribution as n becomes large.

• Typically it is used for comparing the mean of a sample to some hypothesized mean for the population in the case of a large sample, or when the population variance is known.
Parametric Tests : t-test
• t-test: It is based on the t-distribution.

• It is considered an appropriate test for judging the significance of a sample mean, or for judging the significance of the difference between the means of two samples, in the case of

  • small sample(s)

  • population variance not known (in this case, we use the variance of the sample as an estimate of the population variance)


 

Parametric Tests : $\chi^2$-test

• $\chi^2$-test: It is based on the Chi-squared distribution.

• It is used for comparing a sample variance to a theoretical population variance.


 

Parametric Tests : F-test

• F-test: It is based on the F-distribution.

• It is used to compare the variances of two independent samples.

• This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means.
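The slides do not give the test statistics for the last two tests. For reference, the commonly used textbook forms are sketched below. The symbols are assumptions introduced here: $s^2$ is a sample variance, $\sigma_0^2$ the hypothesized population variance, and $s_1^2$, $s_2^2$ the variances of two independent samples of sizes $n_1$ and $n_2$.

```latex
% Chi-squared test: sample variance against a hypothesized population variance
\chi^2 = \frac{(n - 1)\, s^2}{\sigma_0^2},
\qquad \text{with } n - 1 \text{ degrees of freedom}

% F-test: ratio of two independent sample variances
F = \frac{s_1^2}{s_2^2},
\qquad \text{with } (n_1 - 1,\; n_2 - 1) \text{ degrees of freedom}
```

By convention the larger sample variance is usually placed in the numerator of $F$, so that the ratio is at least 1.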


Hypothesis Testing : Assumptions
• Case 1: Population normal, population infinite, sample size may be large or small, variance of the population is known.

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Case 2: Population normal, population finite, sample size may be large or small, variance of the population is known.

Case 3: Population normal, population infinite, sample size is small and variance of the population is unknown.

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} \quad \text{and} \quad S = \sqrt{\frac{\sum (X_i - \bar{x})^2}{n - 1}}$$


Hypothesis Testing
• Case 4: As in Case 3 (population normal, sample size small, variance of the population unknown), but with a finite population.

Note: If the variance of the population is known, replace $S$ by $\sigma$.
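The formulas for the finite-population cases are missing from the slides. A common textbook treatment, assumed here and not taken from the slides, multiplies the standard error by the finite population correction $\sqrt{(N-n)/(N-1)}$, where $N$ denotes the population size (a symbol introduced for this sketch).

```latex
% Case 2: finite population, population variance known
Z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}

% Case 4: finite population, small sample, population variance unknown
t = \frac{\bar{x} - \mu}{\dfrac{S}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}},
\qquad \text{with } n - 1 \text{ degrees of freedom}
```

When $N$ is much larger than $n$, the correction factor is close to 1 and these reduce to the infinite-population statistics.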


Hypothesis Testing : Non-Parametric Test

• Non-parametric tests
  • Do not rely on any assumption about the population distribution
  • Assume only nominal or ordinal data

Note: Non-parametric tests need the entire population (or a very large sample size).
Any question?
