MLS 2 - Statistics For Data Science

Uploaded by

golgothgolgoth039

Statistics for Data Science

Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Topics covered so far
1. Statistical Inference
a. Distributions - Binomial, Uniform, Normal
b. Sampling
c. Central Limit Theorem
d. Confidence Intervals
2. Hypothesis Testing
a. Hypothesis Formulation
b. One-Tailed Test vs Two-Tailed Test
c. Type I and Type II Errors

Gauge your Understanding
1. What is a random variable and how is it related to probability distribution?
2. What are some of the most commonly used distributions?
3. What is Central Limit Theorem (CLT) and when is it used?
4. What do you mean by estimations?

What is a Random Variable?
A random variable is a function that assigns a numerical value to each outcome of an experiment. It
assumes different values with different probabilities. It is usually denoted by a capital letter such as X, and the
probability associated with any particular value x of X is denoted by P(X=x).

Example: Suppose that a fair coin is tossed twice, so the possible outcomes are {HH, HT, TH, TT}. Let X be
the random variable representing the number of heads that come up. Then X can take values from the
set {0, 1, 2}.
The probability of two heads coming up is P(X=2) = ¼.
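The coin example can be reproduced by enumerating the sample space directly. The sketch below is only an illustration; the names `outcomes`, `X`, and `pmf` are chosen here and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT
outcomes = ["".join(toss) for toss in product("HT", repeat=2)]

def X(outcome):
    """The random variable: maps an outcome to its number of heads."""
    return outcome.count("H")

# All four outcomes are equally likely, so
# P(X = x) = (number of outcomes with x heads) / (number of outcomes)
counts = Counter(X(o) for o in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in counts.items()}

for x in sorted(pmf):
    print(f"P(X={x}) = {pmf[x]}")
```

The entry `pmf[2]` is the ¼ quoted in the example, and the three probabilities sum to 1.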

Random Variable

Discrete random variable: It can take only a countable number of values (finite or countably infinite). For example: Number of employees getting promoted in an organization.

Continuous random variable: It can take an uncountable number of values in a given range. For example: Speed of an aircraft.

What is a Probability Distribution?
The probability distribution of a random variable describes the values that the random variable can
take along with the probabilities of those values.

Discrete random variable: It has a discrete probability distribution, described by a probability mass function. The probability mass function provides the probability for each value of the random variable.

Continuous random variable: It has a continuous probability distribution, described by a probability density function. The probability density function determines the probability with which the continuous random variable lies in a given interval.

Distributions around us (commonly occurring)

Bernoulli: The outcome of tossing a fair coin

Binomial: The number of non-defective products in a production run

Uniform: The number of books sold weekly at a bookstore

Normal: IQ distribution of all the seven-year-old children in New York

Binomial Distribution
The binomial distribution is the probability distribution of the number of successes of an experiment that is
conducted multiple times and has only two possible outcomes.

Example: Suppose you have purchased 10 lottery tickets, and each ticket either wins or does not win. If every ticket
wins independently with the same probability, the binomial distribution answers questions such as: what is the
probability that exactly 6 of the 10 tickets win?

The assumptions of Binomial distribution are as follows:

1. There are only two possible outcomes (success or failure) for each trial.
2. The number of trials is fixed.
3. The outcome of each trial is independent. In other words, none of the trials have an effect on the probability
of the next trial.
4. The probability of success is exactly the same for each trial.

Note: In binomial distribution, if the number of trials for a given experiment is equal to 1, then it is called Bernoulli
distribution.
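As a rough illustration of the lottery question, the binomial probability P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be computed with the standard library. The win probability `p = 0.1` is an assumption made purely for this sketch (the slides do not give one), and `binomial_pmf` is a helper name introduced here.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10    # number of tickets purchased (from the example)
p = 0.1   # ASSUMED chance that a single ticket wins; not given in the source

prob_six = binomial_pmf(6, n, p)
print(f"P(exactly 6 of {n} tickets win) = {prob_six:.6f}")

# The probabilities over k = 0..n cover every case, so they sum to 1
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"Sum over all k = {total:.6f}")

# With a single trial (n = 1) the binomial reduces to the Bernoulli distribution
print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))
```

Changing `p` changes the answer substantially, which is why the fourth assumption (a constant success probability) matters.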

Uniform Distribution

The Uniform Distribution is the probability distribution where all outcomes are equally likely.

Discrete Uniform Distribution: Can take a finite number (m) of values and each value has equal probability of selection.
Example: Rolling a single die.

Continuous Uniform Distribution: Can take any value within a given range, with intervals of equal length being equally likely.
Example: Weight gained by a person over the next 2 months can be uniformly distributed between 2 and 5 kg.
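Both uniform cases can be sketched in a few lines. For the discrete case each of the m values has probability 1/m; for the continuous case the probability of an interval is its length divided by the length of the whole range. The 3 to 4 kg interval below is an arbitrary choice for illustration, and `continuous_uniform_prob` is a helper name introduced here.

```python
from fractions import Fraction

# Discrete uniform: a fair six-sided die, m = 6 equally likely values
faces = range(1, 7)
die_pmf = {face: Fraction(1, len(faces)) for face in faces}

def continuous_uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Uniform(a, b): overlap length / total length."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

# Weight gain uniformly distributed between 2 and 5 kg (from the example);
# the 3 to 4 kg interval is an arbitrary illustration
p_3_to_4 = continuous_uniform_prob(3, 4, a=2, b=5)

print(f"P(die shows 4) = {die_pmf[4]}")
print(f"P(3 kg <= gain <= 4 kg) = {p_3_to_4:.4f}")
```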

Normal Distribution
The normal distribution is a continuous probability distribution that is symmetric about the mean. It is also known
as the bell curve because the graph of its probability density function looks like a bell.

Example: The height of all adult males in a city

Properties:

● It has zero skewness
● Mean = Median = Mode
● If mean = 0 and standard deviation = 1, then it is called a standard normal distribution

Empirical Rule: About 68%, 95%, and 99.7% of the values lie within 1, 2, and 3 standard deviations of the mean, respectively.
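The empirical rule for a normal distribution (roughly 68%, 95%, and 99.7% of values within 1, 2, and 3 standard deviations of the mean) can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; `prob_within` is a helper name introduced here.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {prob_within(k):.4f}")
```

Because any normal distribution can be standardized, the same three probabilities hold for every choice of mean and standard deviation.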

Sampling Distributions
What is the need for sampling?

Given limited resources and time, it is not always possible to study the entire population. That’s why we choose a sample from the
population to make inferences about the population.

Example: Suppose a new drug is manufactured and needs to be tested for adverse side effects on a country’s population. It is
almost impossible to conduct a research study that involves everyone.

What are Sampling Distributions?

A sampling distribution is the distribution of a particular sample statistic (such as the sample mean) obtained from all possible samples of a given size drawn from a specific population.
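For a very small population, the sampling distribution of the sample mean can be built by listing every possible sample. The population values and the sample size below are hypothetical, chosen only to keep the enumeration short, and sampling is done without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population of five values
n = 2                            # sample size

# Every possible sample of size n, drawn without replacement
all_samples = list(combinations(population, n))
sample_means = [mean(sample) for sample in all_samples]

# Sampling distribution of the sample mean: each sample is equally likely
sampling_distribution = Counter(sample_means)
for value in sorted(sampling_distribution):
    share = sampling_distribution[value] / len(all_samples)
    print(f"sample mean = {value}: probability {share:.1f}")

# The mean of the sampling distribution equals the population mean
print(mean(sample_means), mean(population))
```

In practice the population is too large to enumerate, which is why the sampling distribution is usually approximated by theory or by simulation.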

Central Limit Theorem

The sampling distribution of the sample mean approaches a normal distribution as the sample size gets bigger, no matter
what the shape of the population distribution is (provided the population has a finite variance).

Assumptions

● Data must be randomly sampled
● Sample values must be independent of each other
● Samples should come from the same distribution
● Sample size must be sufficiently large (≥30 is a common rule of thumb)

Let’s see CLT in action by simulation - Link to external site
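A minimal local simulation of the theorem, separate from the linked one, is sketched below. It assumes a strongly skewed population (exponential with mean 1 and standard deviation 1); the population, the seed, and the number of repetitions are all choices made for this illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)        # fixed seed so the run is repeatable

n = 30                # size of each sample
num_samples = 5000    # number of samples drawn

# Population: exponential with rate 1 (mean 1, standard deviation 1, right-skewed)
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)    # should be close to the population mean, 1
spread = stdev(sample_means)   # should be close to sigma / sqrt(n)

print(f"Mean of sample means:   {center:.3f}")
print(f"Std. dev. of the means: {spread:.3f} (theory: {1 / sqrt(n):.3f})")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the individual observations are far from normal.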

Estimations

Estimation: Make inferences about a population parameter based on a sample statistic.

Point Estimation: Single value of a population parameter.
Ex. Population mean as estimated from the sample mean is $20.

Interval Estimation: The range of values within which the population parameter lies with some confidence.
Ex. Population mean should lie in the range $15-$25, with 95% confidence.
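The two kinds of estimate can be sketched with a normal-approximation (z) interval, mean ± z* · s / √n. The summary statistics below are hypothetical, picked so that the result lands near the $15-$25 range of the example; they are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary (not from the source)
n = 36               # sample size
sample_mean = 20.0   # dollars
sample_sd = 15.0     # dollars
confidence = 0.95

# Point estimate: a single value for the population mean
point_estimate = sample_mean

# Interval estimate: point estimate plus or minus a margin of error
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sample_sd / sqrt(n)
lower, upper = point_estimate - margin, point_estimate + margin

print(f"Point estimate: ${point_estimate:.2f}")
print(f"{confidence:.0%} interval: (${lower:.2f}, ${upper:.2f})")
```

With these inputs the interval is roughly ($15.10, $24.90). For small samples a t-based interval would normally be preferred over this z-based approximation.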

Case Study
Inferential Statistics

Gauge Your Understanding
1. What is hypothesis testing and what are different types of hypotheses?
2. What are some of the key terms involved in hypothesis testing?
3. What is the difference between one-tailed and two-tailed tests?
4. What are the steps to perform a hypothesis test?

Introduction to Hypothesis Testing

Question of Interest: e.g. Has the new online Ad increased the conversion rates for an E-commerce website?

Hypotheses about the population parameter(s):

Null Hypothesis (H0): The status quo.
e.g. The new Ad has not increased the conversion rate.

Alternative Hypothesis (Ha): The research hypothesis.
e.g. The new Ad has increased the conversion rate.

P-Value
● Probability of observing results equal to or more extreme than the computed test statistic, under the null hypothesis.
● The smaller the p-value, the stronger the evidence against the null hypothesis.

Level of Significance
● The significance level (denoted by α) is the probability of rejecting the null hypothesis when it is true.
● It is a measure of the strength of the evidence that must be present in the sample data to reject the null hypothesis.

Acceptance or Rejection Region
● The total area under the distribution curve of the test statistic is partitioned into acceptance and rejection regions.
● Reject the null hypothesis when the test statistic lies in the rejection region; otherwise, fail to reject it.

Types of Error
● There are two types of errors - Type I and Type II

Decision             H0 True              H0 False
Reject H0            Type I Error (α)     Correct decision
Fail to reject H0    Correct decision     Type II Error (β)

Level of significance = α
Confidence Level = (1 - α)
Power of the test = (1 - β)
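To make α, β, and power concrete, the sketch below computes them for an upper-tailed z-test with a known population standard deviation. Every number in it (null mean, true mean, standard deviation, sample size) is hypothetical and serves only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical setup (not from the source)
alpha = 0.05      # level of significance
mu0 = 50.0        # mean under H0
mu_true = 52.0    # assumed true mean when H0 is false
sigma = 6.0       # known population standard deviation
n = 36            # sample size

standard_error = sigma / sqrt(n)
z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit

# Type I error rate: chance of rejecting when the mean really is mu0
type_1_rate = 1 - std_normal.cdf(z_crit)

# Type II error rate: chance of failing to reject when the mean is mu_true
shift = (mu_true - mu0) / standard_error
beta = std_normal.cdf(z_crit - shift)
power = 1 - beta

print(f"alpha = {type_1_rate:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Lowering α pushes the critical value up, which raises β and lowers the power unless the sample size grows.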

Let’s go through an example
Problem Statement: The store manager believes that the average waiting time for customers
at checkouts has risen above 15 minutes. Formulate the Null and the Alternative
hypotheses.

Null Hypothesis (H0): The average waiting time at checkouts is less than or equal to 15 minutes.

Alternative Hypothesis (Ha): The average waiting time at checkouts is more than 15 minutes.

Type I error (False Positive): Reject the Null hypothesis when it is in fact true. “The fact is that the
average waiting time at checkout is less than or equal to 15 minutes, but the store manager has
concluded that it is more than 15 minutes”.

Type II error (False Negative): Fail to reject the Null hypothesis when it is in fact false. “The fact is
that the average waiting time at checkout is more than 15 minutes, but the store manager has
concluded that it is less than or equal to 15 minutes”.
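One way the waiting-time hypotheses might be tested is an upper-tailed z-test on the sample mean. The sample summary below (40 customers, mean 16.2 minutes, standard deviation 3.5 minutes) and the 5% significance level are invented for this sketch; only the 15-minute threshold comes from the problem statement.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 15.0          # threshold from the problem statement (minutes)
alpha = 0.05        # ASSUMED level of significance

# Hypothetical sample summary (not from the source)
n = 40
sample_mean = 16.2
sample_sd = 3.5

# Test statistic: how many standard errors the sample mean lies above mu0
standard_error = sample_sd / sqrt(n)
z_stat = (sample_mean - mu0) / standard_error

# Upper-tailed test (Ha: mean > 15), so the p-value is the upper-tail area
p_value = 1 - NormalDist().cdf(z_stat)
reject_h0 = p_value < alpha

print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Using the sample standard deviation in a z-test is a large-sample approximation; a one-sample t-test is the more common choice when the population standard deviation is unknown.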

Lower-tailed test: Reject H0 if the value of the test statistic is too small.

Two-tailed test: Reject H0 if the value of the test statistic is either too small or too large.

Upper-tailed test: Reject H0 if the value of the test statistic is too large.
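The difference between the tails shows up in how the p-value is computed from the same test statistic. The sketch below assumes a z statistic that follows the standard normal distribution under H0; the value 1.8 and the helper name `p_value` are chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

def p_value(z_stat, tail):
    """p-value of a z statistic for a 'lower', 'upper', or 'two' tailed test."""
    if tail == "lower":    # evidence against H0 when the statistic is too small
        return std_normal.cdf(z_stat)
    if tail == "upper":    # evidence against H0 when the statistic is too large
        return 1 - std_normal.cdf(z_stat)
    if tail == "two":      # evidence against H0 in either direction
        return 2 * (1 - std_normal.cdf(abs(z_stat)))
    raise ValueError(f"unknown tail: {tail!r}")

z_stat = 1.8   # hypothetical test statistic
for tail in ("lower", "upper", "two"):
    print(f"{tail:>5}-tailed p-value: {p_value(z_stat, tail):.4f}")
```

With this statistic the upper-tailed test would reject H0 at α = 0.05 while the two-tailed test would not, so the choice of tail has to be made before looking at the data.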

1. Select Appropriate Test
2. Set Level of Significance, α
3. Collect Data and Calculate Test Statistic
4. Either determine the p-value and compare it with α, or determine the critical value and compare it with the test statistic
5. Reject or Fail to Reject H0
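The p-value route and the critical-value route lead to the same decision. A minimal sketch for a two-tailed z-test with a known population standard deviation follows; the function name `two_tailed_z_test` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Run a two-tailed z-test and return the decision from both routes."""
    std_normal = NormalDist()

    # Calculate the test statistic from the collected data
    z_stat = (sample_mean - mu0) / (sigma / sqrt(n))

    # Route A: determine the p-value and compare it with alpha
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))
    reject_by_p_value = p_value < alpha

    # Route B: determine the critical value and compare it with the statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_critical_value = abs(z_stat) > z_crit

    return z_stat, p_value, z_crit, reject_by_p_value, reject_by_critical_value

# Hypothetical inputs: H0 says the population mean is 50
z_stat, p_val, z_crit, by_p, by_crit = two_tailed_z_test(52.5, mu0=50, sigma=6, n=36)
print(f"z = {z_stat:.2f}, p-value = {p_val:.4f}, critical value = {z_crit:.2f}")
print("Reject H0" if by_p else "Fail to reject H0")
```

Both flags agree for any input, because the p-value falls below α exactly when the test statistic lies beyond the critical value.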
