Introduction To Data Analytics: Statistical Inference - II

This document discusses statistical inference and hypothesis testing. It defines Type I and Type II errors in hypothesis testing and provides examples of calculating the probabilities of these errors. The document also discusses one-tailed and two-tailed hypothesis tests, giving examples of defining the rejection regions for different tests. Finally, it presents two case studies and outlines the five steps for testing hypotheses.


INTRODUCTION TO

DATA ANALYTICS

Class #11
Statistical Inference - II

Dr. Sreeja S R
Assistant Professor
Indian Institute of Information Technology
IIIT Sri City
IIITS: IDA - M2021 1

IN THIS PRESENTATION…

• Errors in hypothesis testing

• Case Study 1: Coffee Sale

• Case Study 2: Machine Testing

• Summary of Sampling Distributions in Hypothesis Testing


Calculating α
• Assume that we have the results of a random sample. We can then use the
characteristics of the sampling distribution to calculate the probability of making
either a Type I or a Type II error.

Example 6.6:
Suppose the two hypotheses in a statistical test are:

$H_0: \mu = a$
$H_1: \mu \neq a$

Also, assume that, for the given sample, the population follows a normal distribution. A
threshold limit, say $a \pm \delta$, is used to decide whether the sample mean is significantly different from $a$.

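Calculating α

[Figure: normal sampling distribution centred at $a$, with the tails below $a - \delta$ and above $a + \delta$ shaded.]

Here, the shaded region represents the probability that $\bar{x} < a - \delta$ or $\bar{x} > a + \delta$.

Thus, the null hypothesis is to be rejected if the sample mean is less than $a - \delta$ or
greater than $a + \delta$.

If $\bar{x}$ denotes the sample mean, then the probability of a Type I error is

$\alpha = P(\bar{x} < a - \delta \ \text{or}\ \bar{x} > a + \delta \mid \mu = a)$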

THE REJECTION REGION
• The rejection region comprises the values of the test statistic for which
1. the probability when the null hypothesis is true is less than or equal to the specified $\alpha$, and
2. the probabilities when $H_1$ is true are greater than they are under $H_0$.

[Figure: rejection region for $H_0$ for a given value of $\alpha$. Reject $H_0$ ($\mu \neq a$) below $a'$; do not reject $H_0$ ($\mu = a$) between $a'$ and $a''$; reject $H_0$ ($\mu \neq a$) above $a''$.]

Two-Tailed Test
• For a two-tailed hypothesis test, the hypotheses take the form

$H_0: \mu = \mu_{H_0}$
$H_1: \mu \neq \mu_{H_0}$

In other words, the null hypothesis is rejected if the sample mean is significantly lower or significantly higher than $\mu_{H_0}$ at the given $\alpha$.

Thus, in a two-tailed test, there are two rejection regions (also known as critical
regions), one on each tail of the sampling distribution curve.

Two-Tailed Test

[Figure: acceptance and rejection regions in the case of a two-tailed test at the 5% significance level. Acceptance region: the central 95% of the area around $\mu_{H_0}$; accept $H_0$ if the sample mean falls in this region. Rejection regions: 0.025 of the area in each tail; reject $H_0$ if the sample mean falls in either of these regions.]

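The two-tailed regions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population standard deviation is taken as known, the function name `two_tailed_acceptance_limits` is hypothetical, and the numbers passed to it (a hypothesized mean of 100, a standard deviation of 15, a sample of 25) are made up for the example and do not come from the slides.

```python
from statistics import NormalDist

def two_tailed_acceptance_limits(mu0, sigma, n, alpha=0.05):
    """Return (lower, upper) limits of the acceptance region for the sample mean.

    Assumes a normal sampling distribution with known population sigma.
    """
    # Each tail holds alpha/2 of the area, so the cut-off is the 1 - alpha/2 quantile.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sigma / n ** 0.5
    return mu0 - z_crit * std_error, mu0 + z_crit * std_error

# Hypothetical inputs, chosen only to illustrate the calculation.
lower, upper = two_tailed_acceptance_limits(mu0=100.0, sigma=15.0, n=25)
print(f"Accept H0 if {lower:.2f} <= sample mean <= {upper:.2f}")
```

A sample mean outside these limits falls in one of the two rejection regions, each of which holds 0.025 of the area when $\alpha = 0.05$.
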
One-Tailed Test
• A one-tailed test is used when we want to test whether the population mean is
either lower than or higher than the hypothesized value.

Symbolically,

$H_0: \mu = \mu_{H_0}$
$H_1: \mu < \mu_{H_0}$ (left-tailed test) or $H_1: \mu > \mu_{H_0}$ (right-tailed test)

Here there is only one rejection region, on the left tail (or the right tail).

[Figure: acceptance and rejection regions for a left-tailed test and a right-tailed test. In each case the rejection region is 0.05 of the area in one tail.]

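For comparison with the two-tailed case, a one-tailed cut-off puts the whole of $\alpha$ in a single tail. The sketch below assumes a known population standard deviation; the helper name `one_tailed_cutoff` and the input values are hypothetical and serve only to show the arithmetic.

```python
from statistics import NormalDist

def one_tailed_cutoff(mu0, sigma, n, alpha=0.05, tail="right"):
    """Return the sample-mean cut-off for a one-tailed test (known sigma assumed)."""
    # All of alpha sits in one tail, so the quantile is 1 - alpha (about 1.645 at 5%).
    z_crit = NormalDist().inv_cdf(1 - alpha)
    std_error = sigma / n ** 0.5
    if tail == "right":
        return mu0 + z_crit * std_error   # reject H0 if the sample mean exceeds this
    if tail == "left":
        return mu0 - z_crit * std_error   # reject H0 if the sample mean is below this
    raise ValueError("tail must be 'left' or 'right'")

# Hypothetical inputs for illustration only.
right_cut = one_tailed_cutoff(100.0, 15.0, 25, tail="right")
left_cut = one_tailed_cutoff(100.0, 15.0, 25, tail="left")
print(f"Right-tailed: reject H0 if mean > {right_cut:.2f}")
print(f"Left-tailed:  reject H0 if mean < {left_cut:.2f}")
```
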
EXAMPLE 6.7: CALCULATING α

• Consider the following two hypotheses.

The null hypothesis is $H_0: \mu = 8$

The alternative hypothesis is $H_1: \mu \neq 8$

Assume a sample of size 16, a standard deviation of 0.2, and that the sample
follows a normal distribution.

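EXAMPLE 6.7: CALCULATING α

• We can decide the rejection region as follows.

Suppose the null hypothesis is to be rejected if the sample mean is less than 7.9 or greater than 8.1.
If $\bar{x}$ is the sample mean, then the probability of a Type I error is

$\alpha = P(\bar{x} < 7.9 \mid \mu = 8) + P(\bar{x} > 8.1 \mid \mu = 8)$

Given that the standard deviation is 0.2 and that the distribution is normal, the standard error of the mean is $\sigma / \sqrt{n} = 0.2 / \sqrt{16} = 0.05$.
Thus,

$P(\bar{x} < 7.9 \mid \mu = 8) = P\left(Z < \frac{7.9 - 8}{0.05}\right) = P(Z < -2) = 0.0228$

and

$P(\bar{x} > 8.1 \mid \mu = 8) = P\left(Z > \frac{8.1 - 8}{0.05}\right) = P(Z > 2) = 0.0228$

Hence, $\alpha = 0.0228 + 0.0228 = 0.0456$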
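The Type I error of Example 6.7 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `type_i_error` is an illustrative choice, and the inputs are the ones stated in the example (hypothesized mean 8, standard deviation 0.2, sample size 16, rejection limits 7.9 and 8.1).

```python
from statistics import NormalDist

def type_i_error(mu0, sigma, n, lower, upper):
    """P(sample mean < lower or > upper) when H0: mu = mu0 is true."""
    # Sampling distribution of the mean under H0: normal with std error sigma/sqrt(n).
    sampling_dist = NormalDist(mu=mu0, sigma=sigma / n ** 0.5)
    left_tail = sampling_dist.cdf(lower)
    right_tail = 1 - sampling_dist.cdf(upper)
    return left_tail + right_tail

alpha = type_i_error(mu0=8.0, sigma=0.2, n=16, lower=7.9, upper=8.1)
print(f"alpha = {alpha:.4f}")  # 0.0455
```

The computed value is 0.0455 rather than 0.0456 because the table-based calculation rounds each tail to 0.0228 before adding.
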

Example 6.8: Calculating α and β

There are two identical-looking boxes of chocolates. Box A contains 60 red and
40 black chocolates, whereas Box B contains 40 red and 60 black chocolates. There
is no label on either box. One box is placed on the table. We are to test the
hypothesis that “Box B is on the table”.

To test the hypothesis, an experiment is planned as follows:

• Draw five chocolates at random from the box.
• Replace each chocolate before selecting a new one.
• The number of red chocolates drawn in the experiment is taken as the sample
statistic.

Note: Since the draws are independent of each other, we can assume that the sampling distribution
follows a binomial probability distribution.

Example 6.8: Calculating α
• Let us express the population parameter as $p$ = the proportion of red chocolates in the box.
The hypotheses of the problem can be stated as:
$H_0: p = 0.4$ // Box B is on the table
$H_1: p = 0.6$ // Box A is on the table
Calculating α
In this example, the null hypothesis specifies that the probability of drawing a red chocolate is 0.4.
This means that a lower proportion of red chocolates in the observations favors the null hypothesis.
In other words, drawing all red chocolates provides sufficient evidence to reject the null
hypothesis. Then the probability of making a Type I error is the probability of getting five red
chocolates in a sample of five from Box B. That is,

$\alpha = P(X = 5 \mid p = 0.4)$

Using the binomial distribution,

$\alpha = \binom{5}{5} (0.4)^5 (0.6)^0 = 0.01024$

Thus, the probability of rejecting a true null hypothesis is approximately 0.01. That is, there is approximately
a 1% chance that Box B will be mislabeled as Box A.

Example 6.8: Calculating β
• The Type II error occurs if we fail to reject the null hypothesis when it is not true. For the current
illustration, such a situation occurs if Box A is on the table but we do not get the five red
chocolates required to reject the hypothesis that Box B is on the table.
The probability of a Type II error is then the probability of getting four or fewer red chocolates in a
sample of five from Box A.
That is,

$\beta = P(X \le 4 \mid p = 0.6)$

Using the probability rule: $P(X \le 4) + P(X = 5) = 1$

That is, $P(X \le 4) = 1 - P(X = 5)$
Now, $P(X = 5 \mid p = 0.6) = (0.6)^5 = 0.07776$
Hence, $\beta = 1 - 0.07776 = 0.92224$

That is, the probability of making a Type II error is over 92%. This means that, if Box A is on the table, the
probability that we will be unable to detect it is about 0.92.

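Both error probabilities of Example 6.8 follow directly from the binomial probability mass function. The sketch below is one way to reproduce them with the standard library; the helper `binom_pmf` is written here for illustration, and the decision rule (reject the null hypothesis only when all five draws are red) is the one stated in the example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_draws = 5
p_red_box_b = 0.4   # H0: Box B is on the table
p_red_box_a = 0.6   # H1: Box A is on the table

# Type I error: five reds drawn although Box B (H0) is on the table.
alpha = binom_pmf(5, n_draws, p_red_box_b)

# Type II error: four or fewer reds drawn although Box A (H1) is on the table.
beta = sum(binom_pmf(k, n_draws, p_red_box_a) for k in range(5))

print(f"alpha = {alpha:.5f}, beta = {beta:.5f}")
```

With this rule the test almost never rejects a true null hypothesis, but it also misses Box A more than nine times out of ten, which shows how a very small $\alpha$ can come at the price of a large $\beta$.
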
CASE STUDY 1: COFFEE SALE

A coffee vendor near Kharagpur railway station has been having average
sales of 500 cups per day. Because of the development of a bus stand nearby, the
vendor expects sales to increase. During the first 12 days after the inauguration of
the bus stand, the daily sales were as follows:

550 570 490 615 505 580 570 460 600 580 530 526

On the basis of this sample information, can we conclude that the sales of coffee
have increased?

Consider the 5% level of significance.

HYPOTHESIS TESTING : 5 STEPS

• The following five steps are followed when testing a hypothesis:

1. Specify $H_0$ and $H_1$, the null and alternative hypotheses, and an acceptable level of $\alpha$.

2. Determine an appropriate sample-based test statistic and the rejection region for
the specified $H_0$.

3. Collect the sample data and calculate the test statistic.

4. Make a decision to either reject or fail to reject $H_0$.

5. Interpret the result in common language suitable for practitioners.

CASE STUDY 1: STEP 1

• Step 1: Specification of hypotheses and acceptable level of $\alpha$

Let us consider the hypotheses for the given problem as follows.

$H_0: \mu = 500$ cups per day

The null hypothesis is that sales average 500 cups per day and have not
increased.

$H_1: \mu > 500$

The alternative hypothesis is that the sales have increased.

Given the significance level $\alpha = 0.05$.

CASE STUDY 1: STEP 2

• Step 2: Sample-based test statistic and the rejection region for specified $H_0$

Given the sample as

550 570 490 615 505 580 570 460 600 580 530 526

Since the sample size is small and the population standard deviation is not known, we shall
use the t-test, assuming a normal population. The test statistic is

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

To find $\bar{x}$ and $s$, we make the following computations.

$\bar{x} = \frac{\sum x_i}{n} = \frac{6576}{12} = 548$

CASE STUDY 1: STEP 2

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{23978}{11}} \approx 46.69$

Hence,

$t = \frac{548 - 500}{46.69 / \sqrt{12}} \approx 3.56$

Note:
A statistical table for the t-distribution gives the t-value for a given number of degrees of freedom and a given
level of significance $\alpha$, and vice versa.
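The computations of Step 2 can be reproduced from the twelve daily sales figures. This is a small sketch using the standard library, where `stdev` is the sample standard deviation with the $n - 1$ divisor; the variable names are illustrative.

```python
from statistics import mean, stdev

# Daily sales for the first 12 days after the bus stand opened (from the case study).
sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu0 = 500                      # hypothesized mean under H0

n = len(sales)
x_bar = mean(sales)            # sample mean
s = stdev(sales)               # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s / n ** 0.5)

print(f"n = {n}, mean = {x_bar}, s = {s:.2f}, t = {t_stat:.2f}")
```
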

Case Study 1: Step 3

• Step 3: Collect the sample data and calculate the test statistic

As $H_1$ is one-tailed, we shall determine the rejection region by applying a one-tailed test in the right
tail (because $H_1$ is of the "more than" type) at the 5% level of significance.

Using the table of the t-distribution for 11 degrees of freedom at the 5% level of significance, the rejection region is

$R: t > 1.796$

Case Study 1: Step 4
• Step 4: Make a decision to either reject or fail to reject $H_0$

The observed value of $t \approx 3.56$ is in the rejection region ($t > 1.796$), and thus $H_0$ is rejected at the 5% level of
significance.
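The decision in Step 4 can be sketched as a right-tailed t-test. The Python standard library has no t-distribution, so this sketch assumes the critical values are read from a printed t-table; the small lookup dictionary and the function name `right_tailed_t_test` are illustrative and cover only a few degrees of freedom.

```python
from statistics import mean, stdev

# One-tailed 5% critical values copied from a standard t-table (df -> t).
T_CRIT_ONE_TAILED_5PCT = {10: 1.812, 11: 1.796, 12: 1.782}

def right_tailed_t_test(sample, mu0, t_table=T_CRIT_ONE_TAILED_5PCT):
    """Return (t statistic, critical value, reject H0?) for H1: mu > mu0."""
    n = len(sample)
    t_stat = (mean(sample) - mu0) / (stdev(sample) / n ** 0.5)
    t_crit = t_table[n - 1]            # degrees of freedom = n - 1
    return t_stat, t_crit, t_stat > t_crit

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
t_stat, t_crit, reject_h0 = right_tailed_t_test(sales, mu0=500)
print(f"t = {t_stat:.2f}, critical value = {t_crit}, reject H0: {reject_h0}")
```
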

Case Study 1: Step 5
Step 5: Final comment and interpret the result

We can conclude that the sample data indicate that coffee sales have increased.


CASE STUDY 2: MACHINE TESTING
• A medicine production company packages medicine in tubes of 8 ml with $\sigma = 0.2$. To
maintain control of the amount of medicine in the tubes, the company uses a machine. To
monitor this control, a sample of 16 tubes is taken from the production line at
random time intervals and their contents are measured precisely. The mean amount of
medicine in these 16 tubes is used to test the hypothesis that the machine is
indeed working properly. The given sample has a sample mean of 7.89, and the sample
follows a normal distribution.

CASE STUDY 2: STEP 1

• Step 1: Specification of hypotheses and acceptable level of $\alpha$

The hypotheses are given in terms of the population mean of medicine per tube.

The null hypothesis is $H_0: \mu = 8$

The alternative hypothesis is $H_1: \mu \neq 8$

We assume $\alpha$, the significance level in our hypothesis testing, to be 0.05.

(This means that the probability of concluding that the machine needs to be adjusted when it is in fact working properly is at most 5%.)

CASE STUDY 2: STEP 2

• Step 2: Sample-based test statistic and the rejection region for specified $H_0$

Test statistic: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Rejection region: given $\alpha = 0.05$, which gives $|z| > 1.96$ (obtained from the standard normal table for a two-
tailed test).

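CASE STUDY 2: STEP 3

• Step 3: Collect the sample data and calculate the test statistic

Sample results: $n = 16$, $\bar{x} = 7.89$, $\sigma = 0.2$

With the sample, the test statistic is

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.89 - 8}{0.2 / \sqrt{16}}$

Hence, $z = -2.20$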

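CASE STUDY 2: STEP 4

• Step 4: Make a decision to either reject or fail to reject $H_0$

[Figure: standard normal curve with the critical values -1.96 and 1.96 marked, and the points -2.20 and 2.20 lying in the rejection regions.]

Since $|z| = 2.20 > 1.96$, we reject $H_0$.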
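Steps 2 to 4 of this case study amount to a two-tailed z-test with a known population standard deviation. The sketch below uses the values given in the case study; the function name `two_tailed_z_test` is an illustrative choice.

```python
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (z, critical value, reject H0?) for H1: mu != mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z, z_crit, abs(z) > z_crit

# Values from the case study: 16 tubes, sample mean 7.89, sigma 0.2, H0: mu = 8.
z, z_crit, reject_h0 = two_tailed_z_test(x_bar=7.89, mu0=8.0, sigma=0.2, n=16, alpha=0.05)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```
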

CASE STUDY 2: STEP 5

• Step 5: Final comment and interpret the result

We conclude that the machine is not working properly and recommend that it be adjusted.

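CASE STUDY 2: ALTERNATIVE TEST
• Suppose that, in our initial setup of the hypothesis test, we choose $\alpha = 0.01$ instead of 0.05. Then the
test can be summarized as:

1. $H_0: \mu = 8$, $H_1: \mu \neq 8$, $\alpha = 0.01$

2. Reject $H_0$ if $|z| > 2.576$

3. Sample result: $n = 16$, $\sigma = 0.2$, $\bar{x} = 7.89$, $z = \frac{7.89 - 8}{0.2 / \sqrt{16}} = -2.20$

4. Since $|z| = 2.20 < 2.576$, we fail to reject $H_0: \mu = 8$

5. We do not recommend that the machine be readjusted.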
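The effect of the chosen significance level can be seen by running the same test at both levels. This is a short sketch with the case-study values; the dictionary `decisions` is just an illustrative way to hold the two outcomes.

```python
from statistics import NormalDist

# Case-study values: n = 16, sigma = 0.2, sample mean 7.89, H0: mu = 8.
z = (7.89 - 8.0) / (0.2 / 16 ** 0.5)

decisions = {}
for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    decisions[alpha] = abs(z) > z_crit             # True means "reject H0"
    print(f"alpha = {alpha}: critical value = {z_crit:.3f}, reject H0: {decisions[alpha]}")
```

The same sample leads to opposite recommendations at the two levels, so the significance level has to be fixed in Step 1, before the data are examined.
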

Hypothesis Testing Strategies
• Hypothesis testing determines the validity of an assumption (technically
described as the null hypothesis), with a view to choosing between two conflicting
hypotheses about the value of a population parameter.

• There are two types of tests of hypotheses:

  • Non-parametric tests (also called distribution-free tests of hypotheses)
  • Parametric tests (also called standard tests of hypotheses)

Parametric Tests : Applications
• Usually assume certain properties of the population from
which we draw samples.

• Observations come from a normal population

• Sample size is small

• Assumptions about population parameters such as mean, variance, etc. hold good.

• Require measurement equivalent to interval-scaled data.

Parametric Tests
• Important Parametric Tests
The widely used sampling distributions for parametric tests are

• Z-test
• t-test
• $\chi^2$-test
• F-test

Note:
All these tests are based on the assumption of normality (i.e., the source of data is
considered to be normally distributed).

Parametric Tests : Z-test
• Z-test: This is the most frequently used test in statistical analysis.

• It is based on the normal probability distribution.

• Used for judging the significance of several statistical measures, particularly
the mean.

• It is used even when the binomial distribution or the t-distribution is applicable, on the condition that such a distribution
tends to the normal distribution when n becomes large.

• Typically, it is used for comparing the mean of a sample to some
hypothesized mean for the population in the case of a large sample, or when the
population variance is known.

Parametric Tests : t-test
• t-test: It is based on the t-distribution.

• It is considered an appropriate test for judging the significance of a sample
mean, or for judging the significance of the difference between the means of two
samples, in the case of

  • small sample(s)

  • unknown population variance (in this case, we use the variance of the sample as an
  estimate of the population variance)

Parametric Tests : $\chi^2$-test

• $\chi^2$-test: It is based on the Chi-squared distribution.

• It is used for comparing a sample variance to a theoretical population
variance.
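As an illustration of comparing a sample variance with a theoretical population variance, the sketch below computes the usual statistic $\chi^2 = (n - 1) s^2 / \sigma_0^2$, which has $n - 1$ degrees of freedom under a normal population. The formula is the standard one and is not spelled out on the slide; the five measurements and the value of `sigma0` are hypothetical. The resulting statistic would then be compared with a critical value read from a chi-squared table.

```python
from statistics import variance

def chi_square_variance_stat(sample, sigma0):
    """Chi-squared statistic for H0: population variance = sigma0 ** 2."""
    n = len(sample)
    s_squared = variance(sample)          # sample variance (divides by n - 1)
    return (n - 1) * s_squared / sigma0 ** 2

# Hypothetical tube contents (ml), made up for illustration only.
fills = [7.9, 8.1, 8.0, 7.8, 8.2]
chi2_stat = chi_square_variance_stat(fills, sigma0=0.2)
print(f"chi-squared = {chi2_stat:.2f} with {len(fills) - 1} degrees of freedom")
```
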

Parametric Tests : F-test

• F-test: It is based on the F-distribution.

• It is used to compare the variances of two independent samples.

• This test is also used in the context of analysis of variance (ANOVA) for
judging the significance of more than two sample means.
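For two independent samples, the F statistic is the ratio of the two sample variances. The sketch below follows the common convention of placing the larger variance in the numerator, which is an assumption rather than something stated on the slide, and both samples are hypothetical. The statistic would then be compared with a critical value from an F-table for the corresponding degrees of freedom.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Ratio of sample variances, larger variance in the numerator.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical measurements from two machines, made up for illustration only.
machine_1 = [7.9, 8.1, 8.0, 7.8, 8.2]
machine_2 = [8.0, 8.1, 7.9, 8.0, 8.0]
f_stat, df_num, df_den = f_statistic(machine_1, machine_2)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```
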

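Hypothesis Testing : Assumptions
• Case 1: Normal population, population infinite, sample size may be large or small, variance
of the population is known.

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Case 2: Population normal, population finite, sample size may be large or small, variance
is known.

$z = \frac{\bar{x} - \mu}{(\sigma / \sqrt{n}) \sqrt{(N - n)/(N - 1)}}$

Case 3: Population normal, population infinite, sample size is small and variance of the
population is unknown.

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ with $n - 1$ degrees of freedom

and

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$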

Hypothesis Testing
• Case 4: Population finite, population normal, sample size is small and variance of the
population is unknown.

$t = \frac{\bar{x} - \mu}{(s / \sqrt{n}) \sqrt{(N - n)/(N - 1)}}$ with $n - 1$ degrees of freedom

Note: If the variance of the population is known, replace $s$ by $\sigma$.

Hypothesis Testing : Non-Parametric Test

• Non-parametric tests
  • Do not depend on any assumption about the population
  • Assume only nominal or ordinal data

Note: Non-parametric tests need the entire population (or a very large sample size)

Any question?
