Chapter 5

Uploaded by

radhekrishna85411

Brief Overview of Probability

SAMPLING DISTRIBUTIONS
• An important application of statistics in machine learning is
how to draw a conclusion about a set or population based on
the probability model of random samples of the set.
• For example, based on the malignancy sample test results of
some random tumour cases we want to estimate the
proportion of all tumours which are malignant and thus advise
the doctors on the requirement or non-requirement of biopsy
on each tumour case.
• Different random samples may give different estimates.
• If we can get some knowledge about the variability of all
possible estimates derived from the random samples, then we
should be able to arrive at reasonable conclusions.
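The variability of estimates across random samples can be seen in a small simulation. This is a sketch under stated assumptions: the population size (10,000), the true malignancy rate (30%), the sample size (50) and the helper name `sample_proportion` are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 tumours, 30% malignant (1), 70% benign (0)
population = [1] * 3000 + [0] * 7000

def sample_proportion(pop, n):
    """Proportion of malignant cases in one random sample of size n."""
    return mean(random.sample(pop, n))

# Draw many random samples and record the estimate each one gives
estimates = [sample_proportion(population, 50) for _ in range(1000)]

centre = mean(estimates)   # should sit close to the true proportion 0.30
spread = stdev(estimates)  # how much the estimate varies between samples
print(round(centre, 3), round(spread, 3))
```

Each sample gives a slightly different estimate; `spread` summarises that variability, which is what the sampling distribution describes.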
Terminologies used in Sampling
• Population: a finite set of objects being investigated.
• Sampling: Many machine learning models do not perform
well when the data set is very large, because of the limits
of computer memory. To work around this, we pick a part
of the data set that represents the whole data set. This
process of picking a part of the data set is known as
sampling.
• Random sample: a sample of objects drawn from a population
in a way that every member of the population has the same
chance of being chosen.
• Sampling distribution refers to the probability distribution of a
random variable defined in a space of random samples.
Sampling with replacement
• Each object chosen is returned to the population
before the next object is chosen.
• In this case, repetitions are allowed.
• If a sample of size n is drawn from a population of
size N, the number of possible (ordered) samples is
$N \times N \times \cdots \times N = N^n$, because each
object can be repeated.
• The probability of each sample being chosen is $1/N^n$.
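The count of samples drawn with replacement can be checked by enumerating them. This is a minimal sketch; the three-letter population and the sample size are hypothetical.

```python
from itertools import product

population = ["a", "b", "c"]  # hypothetical population, N = 3
n = 2                         # sample size

# Every ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))

N = len(population)
num_samples = len(samples)    # equals N ** n
prob_each = 1 / num_samples   # each ordered sample is equally likely

print(num_samples, N ** n, prob_each)
```

Samples such as `("a", "a")` appear in the list because an object can be drawn more than once.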
Sampling without replacement
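• The object chosen is not returned to the
population before the next object is chosen,
so repetitions are not allowed.
• Choosing n elements out of N elements, the number of
possible samples is $\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$.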
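Counting unordered samples of n distinct objects out of N gives the binomial coefficient. The sketch below checks this by enumeration; the four-letter population is hypothetical.

```python
from itertools import combinations
from math import comb

population = ["a", "b", "c", "d"]  # hypothetical population, N = 4
n = 2                              # sample size

# Every unordered sample of n distinct objects (no object is drawn twice)
samples = list(combinations(population, n))

num_samples = len(samples)
expected = comb(len(population), n)  # N! / (n! * (N - n)!)

print(num_samples, expected)
```

Unlike sampling with replacement, no sample here contains the same object twice.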
Mean and variance of sample with replacement
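For reference, the standard results for the sample mean $\bar{X}$ of n draws with replacement are given below. The symbols μ and σ² (population mean and variance) are assumed here, matching the notation used later in this chapter.

```latex
E[\bar{X}] = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}
```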
Sampling Without Replacement
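For reference, when n objects are drawn without replacement from a finite population of size N (with population mean μ and variance σ², symbols assumed here), the standard results for the sample mean $\bar{X}$ include a finite population correction factor:

```latex
E[\bar{X}] = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}
```

The correction factor approaches 1 when N is much larger than n, so the two sampling schemes then behave almost identically.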
HYPOTHESIS TESTING
• A fundamental statistical method to make informed
decisions based on empirical evidence.
• It involves formulating assumptions about population
parameters using sample statistics and rigorously
evaluating these assumptions against collected data.
(For example, a judge assumes a person is innocent and
verifies this by reviewing evidence and hearing
testimony before reaching a verdict.)
• A systematic approach that allows researchers to assess
the validity of a statistical claim about an unknown
population parameter.
• A hypothesis is an assumption that we make about
a population parameter. Hypothesis testing
evaluates two mutually exclusive statements
about a population to determine which statement
is best supported by the sample data.
• To test the validity of the claim or assumption
about the population parameter:
– A sample is drawn from the population and analyzed.
– The results of the analysis are used to decide whether
the sample data support the claim.
Defining Hypotheses

• Null hypothesis (H0): In statistics, the null hypothesis is a general
statement or default position that there is no relationship between
two measured cases or no relationship among groups. In other words,
it is a baseline assumption made from knowledge of the problem.
Example: A company’s mean production is 50 units per day, i.e. H0: μ = 50.
• Alternative hypothesis (H1): The alternative hypothesis is the
hypothesis used in hypothesis testing that is contrary to the null
hypothesis.
Example: A company’s mean production is not equal to 50 units per day, i.e.
H1: μ ≠ 50.
• The null hypothesis and alternative hypothesis are complementary
statistical hypotheses that are used to test a claim or statement about
a population.
Null vs. Alternative Hypothesis
Key Terms of Hypothesis Testing

• Level of significance: The probability of rejecting
the null hypothesis when it is actually true.
Because 100% certainty is not possible when
testing a hypothesis, we select a level of
significance in advance, usually 5%.
• It is normally denoted by α and is generally
0.05 (5%), which means we accept a 5% risk of
wrongly rejecting a true null hypothesis, i.e.
we require 95% confidence before rejecting it.
• P-value: the probability, calculated assuming
the null hypothesis (H0) is true, of obtaining
results at least as extreme as those observed.
If the p-value is less than or equal to the
chosen significance level, you reject the null
hypothesis, i.e. conclude that the sample
supports the alternative hypothesis.
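As a minimal sketch of the p-value idea, the function below computes a two-tailed p-value for a z statistic using the standard normal distribution. The function name `two_sided_p_value` and the test statistic 2.5 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-tailed p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05                # chosen significance level
p = two_sided_p_value(2.5)  # 2.5 is an illustrative test statistic
reject = p <= alpha         # True means: reject the null hypothesis

print(round(p, 4), reject)
```

A statistic far from zero is unlikely under H0, so it gives a small p-value; a statistic of exactly zero gives a p-value of 1.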
• Test statistic: A numerical value calculated from sample data
during a hypothesis test, used to determine whether to reject
the null hypothesis. It is compared to a critical value, or
converted to a p-value, to decide whether the observed results
are statistically significant.
– Z-test: used when the population mean and standard deviation
are known.
– t-test: more appropriate when the population standard deviation
is unknown and the sample size is small.
– Chi-square test: used for categorical data or for
testing independence in contingency tables.
– F-test: often used in analysis of variance (ANOVA) to compare
variances or test the equality of means across multiple groups.
Calculating test statistic

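• Z-statistic: used when the population mean and standard
deviation are known.
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
• t-statistic: used when the population standard deviation is
unknown and the sample is small (n < 30).
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
x̄ = sample mean, μ = population mean, σ = population standard
deviation, s = standard deviation of the sample, n = sample size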
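The two formulas translate directly into code. This is a sketch with hypothetical function names and illustrative input values; the only difference between the two functions is whether the population (σ) or sample (s) standard deviation is used.

```python
from math import sqrt

def z_statistic(sample_mean, mu, sigma, n):
    """z = (x̄ - μ) / (σ / √n), with σ the known population std dev."""
    return (sample_mean - mu) / (sigma / sqrt(n))

def t_statistic(sample_mean, mu, s, n):
    """t = (x̄ - μ) / (s / √n), with s the sample std dev."""
    return (sample_mean - mu) / (s / sqrt(n))

# Illustrative numbers only
z = z_statistic(sample_mean=52, mu=50, sigma=4, n=16)
t = t_statistic(sample_mean=52, mu=50, s=5, n=25)
print(z, t)
```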
• Critical value: The critical value in statistics is a
threshold or cutoff point used to determine
whether to reject the null hypothesis in a
hypothesis test.
• Degrees of freedom: The variability or freedom one
has in estimating a parameter. The degrees of freedom
are related to the sample size and determine the shape
of the test statistic's distribution (for example, the
t-distribution).
• Degrees of freedom are the maximum number of
logically independent values that may vary in a data
sample. Degrees of freedom are calculated by
subtracting one from the number of items within the
data sample:
$df = N - 1$, where N is the number of items in the data sample.

Comparing Test Statistic

• There are two ways to decide whether to reject or
fail to reject the null hypothesis.
• Method A: Using the critical value
– If |Test Statistic| > Critical Value: Reject the null hypothesis.
– If |Test Statistic| ≤ Critical Value: Fail to reject the null
hypothesis.
(This form applies to a two-tailed test; for a one-tailed test,
the test statistic is compared with the critical value in the
relevant tail. Critical values are predetermined thresholds
used to make a decision in hypothesis testing. To determine the
critical value, we typically refer to a statistical distribution table.)
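Instead of a printed table, critical values for a z-test can be obtained from the inverse CDF of the standard normal distribution. This is a sketch using Python's standard library; the helper name `reject_two_tailed` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha is split equally between both tails
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
# Right-tailed test: all of alpha sits in the upper tail
z_right_tailed = std_normal.inv_cdf(1 - alpha)     # about 1.645

def reject_two_tailed(test_statistic, critical_value):
    """Method A for a two-tailed test: reject H0 if |statistic| > critical value."""
    return abs(test_statistic) > critical_value

print(round(z_two_tailed, 3), round(z_right_tailed, 3))
```

For a t-test the critical value depends on the degrees of freedom as well, so it would come from a t-distribution table or statistical software rather than from `NormalDist`.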
• Method B: Using the p-value
We can also reach a conclusion using the p-value:
• If the p-value is less than or equal to the
significance level, i.e. p ≤ α, you reject the null
hypothesis.
• If the p-value is greater than the significance
level, i.e. p > α, you fail to reject the null
hypothesis.
One-Tailed and Two-Tailed Test
One-Tailed

• A one-tailed test focuses on one direction, either greater than or less than a
specified value. We use a one-tailed test when there is a clear directional
expectation based on prior knowledge or theory. The critical region is located
on only one side of the distribution curve. If the test statistic falls into this critical
region, the null hypothesis is rejected in favor of the alternative hypothesis.
• There are two types of one-tailed test:
• Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true
parameter value is less than the value stated in the null hypothesis. Example: H0: μ ≥ 50 and H1: μ < 50
• Right-Tailed (Right-Sided) Test: The alternative hypothesis asserts that the true
parameter value is greater than the value stated in the null hypothesis. Example: H0: μ ≤ 50 and
H1: μ > 50
Two-Tailed Test

• Considers both directions, greater than and
less than a specified value.
• We use a two-tailed test when there is no
specific directional expectation and we want to
detect any significant difference.
• Example: H0: μ = 50 and H1: μ ≠ 50
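The choice of tail changes the p-value for the same test statistic. The sketch below assumes a z-test; the function name `p_value` and the statistic 1.8 are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z statistic for a 'left', 'right' or 'two-sided' test."""
    cdf = NormalDist().cdf(z)
    if tail == "left":
        return cdf                    # P(Z <= z)
    if tail == "right":
        return 1 - cdf                # P(Z >= z)
    if tail == "two-sided":
        return 2 * min(cdf, 1 - cdf)  # both tails combined
    raise ValueError("tail must be 'left', 'right' or 'two-sided'")

z = 1.8       # illustrative test statistic
alpha = 0.05
for tail in ("left", "right", "two-sided"):
    p = p_value(z, tail)
    print(tail, round(p, 4), "reject H0" if p <= alpha else "fail to reject H0")
```

With these numbers the right-tailed test rejects H0 at α = 0.05 while the two-tailed test does not, which is why the direction of the test should be fixed before looking at the data.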
Error in Hypothesis testing
• Type I error: We reject the null hypothesis
although it is true. The probability of a
Type I error is denoted by alpha (α).
• Type II error: We fail to reject the null
hypothesis although it is false. The probability
of a Type II error is denoted by beta (β).
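A simulation can illustrate what α means in practice: when H0 is true, a test at level α should wrongly reject it in roughly a fraction α of repeated experiments. This is a sketch under stated assumptions; the population parameters (mean 50, standard deviation 5), the sample size and the number of trials are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

mu0, sigma, n, alpha = 50.0, 5.0, 25, 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

trials = 2000
false_rejections = 0
for _ in range(trials):
    # H0 is true here: the data really do come from a population with mean mu0
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    if abs(z) > critical:
        false_rejections += 1  # a Type I error

type_i_rate = false_rejections / trials
print(type_i_rate)  # expected to land near alpha
```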
Real life Examples of Hypothesis Testing

• Example-1: Does a New Drug Affect Blood Pressure?
Imagine a pharmaceutical company has developed a new
drug that they believe can effectively lower blood pressure
in patients with hypertension. Before bringing the drug to
market, they need to conduct a study to assess its impact
on blood pressure.
Data: Before Treatment:
120, 122, 118, 130, 125, 128, 115, 121, 123, 119
After Treatment:
115, 120, 112, 128, 122, 125, 110, 117, 119, 114
Step 1: Define the Hypothesis
Null Hypothesis: (H0)The new drug has no effect
on blood pressure.
Alternate Hypothesis: (H1)The new drug has an
effect on blood pressure.
• Step 2: Define the Significance level
Let’s set the significance level at 0.05: we
reject the null hypothesis if the evidence
suggests less than a 5% chance of observing
results this extreme due to random variation
alone.
• Step 3: Compute the test statistic
• Using the paired t-test for the above problem, with
differences d = after − before:
m = -3.9, s ≈ 1.37 and n = 10
t-statistic = m/(s/√n) = -9, based on the formula for
the paired t-test
Paired T-Test – A Detailed Overview

• The t-test is a statistical method used to determine whether
there is a significant difference between the means of two
samples, or between a sample mean and a known value.
• The t-test is divided into 3 types, depending on
your data and the result you need.
– One sample t-test: the mean of a single population is
compared against a known mean.
– Independent sample t-test: the means of two different
populations are compared.
– Paired sample t-test: the means of the same group or
population are compared at separate times.
Paired t-test
• Each object is measured twice, consecutively, providing the pairs of
observations for the paired t-test.
• Used to test whether the mean of the dependent variable is the same in
two related groups.
• For example: measuring the weight of a person before and after
breakfast.
• The hypotheses can be represented as:
– Null hypothesis, H0: μ1 = μ2 or H0: μ1 – μ2 = 0
– Alternative hypothesis, H1: μ1 ≠ μ2 or H1: μ1 – μ2 ≠ 0
(μ1 is the mean of variable 1, μ2 is the mean of variable 2)
• $t = \dfrac{m}{s / \sqrt{n}}$, where
m = mean of the differences $d_i = X_{\text{after},i} - X_{\text{before},i}$,
s = standard deviation of the differences $d_i$,
n = sample size (number of pairs).
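Applied to the blood-pressure readings of Example-1, the paired t formula can be evaluated with the standard library alone. This is a sketch, not the slides' own code; the variable names are illustrative, and `stdev` is the sample standard deviation (dividing by n − 1).

```python
from math import sqrt
from statistics import mean, stdev

before = [120, 122, 118, 130, 125, 128, 115, 121, 123, 119]
after = [115, 120, 112, 128, 122, 125, 110, 117, 119, 114]

# Differences d_i = after - before for each patient
d = [a - b for a, b in zip(after, before)]

n = len(d)             # number of pairs
m = mean(d)            # mean difference: -3.9
s = stdev(d)           # sample std dev of the differences, about 1.37
t = m / (s / sqrt(n))  # paired t statistic, about -9.0
df = n - 1             # degrees of freedom

print(m, round(s, 2), round(t, 2), df)
```

If SciPy is available, `scipy.stats.ttest_rel(after, before)` should report the same statistic together with a p-value.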
• Step 4: Find the p-value
The calculated t-statistic is -9 and the degrees of
freedom are df = n − 1 = 9.
You can find the p-value using statistical
software or a t-distribution table.
Thus, p-value ≈ 8.54 × 10^-6.
• Step 5: Result
• If the p-value is less than or equal to 0.05, the
researchers reject the null hypothesis.
• If the p-value is greater than 0.05, they fail to reject the
null hypothesis.
• Conclusion: Since the p-value (≈ 8.54 × 10^-6)
is less than the significance level (0.05), the researchers
reject the null hypothesis. There is statistically significant
evidence that the average blood pressure before and
after treatment with the new drug is different.
Example-2: Cholesterol level in a population

• Data: A sample of 25 individuals is taken, and
their cholesterol levels are measured.
• Cholesterol Levels (mg/dL):
• 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202,
208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192,
205.
• Hypothesized population mean (μ): 200 mg/dL
• Population standard deviation (σ): 5 mg/dL
(given for this problem)
• Step 1: Define the Hypothesis
• Null Hypothesis (H0): The average cholesterol
level in a population is 200 mg/dL.
• Alternate Hypothesis (H1): The average
cholesterol level in a population is different
from 200 mg/dL.
• Step 2: Define the Significance level
• As the direction of deviation is not given, we
assume a two-tailed test. For a significance
level of 0.05 (two-tailed), the critical values
from the standard normal (z) table are
approximately -1.96 and 1.96.
• Step 3: Compute the test statistic
Z-statistic: used when the population mean and
standard deviation are known.
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
• The sample mean of the 25 values is x̄ = 202.04, so
$Z = \dfrac{202.04 - 200}{5 / \sqrt{25}}$
• We get Z = 2.04.
• Step 4: Result
• Since the absolute value of the test statistic
(2.04) is greater than the critical value (1.96),
we reject the null hypothesis and conclude
that there is statistically significant evidence
that the average cholesterol level in the
population is different from 200 mg/dL.
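The whole of Example-2 can be reproduced in a few lines, computing the sample mean directly from the 25 listed values. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

cholesterol = [205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202,
               208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192,
               205]

mu0 = 200.0    # hypothesized population mean (mg/dL)
sigma = 5.0    # population standard deviation, given in the problem
alpha = 0.05

n = len(cholesterol)
x_bar = mean(cholesterol)                       # 202.04
z = (x_bar - mu0) / (sigma / sqrt(n))           # about 2.04

critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value

reject = abs(z) > critical
print(x_bar, round(z, 2), round(critical, 2), reject)
```

Both decision methods agree here: |z| exceeds the critical value and the p-value falls below α.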
