Unit 2

Uploaded by

msmakkar.chief19


Course: Data Science

Course Code: ITAC008

Chapter 2: Statistics of Data Science
Chapter Index
S.No.  Reference  Particulars
1      -          Learning Objectives
2      Topic 1    Measures of Central Tendency
3      Topic 2    Probability Theory
4      Topic 3    Statistical Inference
5      Topic 4    Sampling Theory
6      Topic 5    Hypothesis Testing
7      Topic 6    Regression Analysis
8      -          Let’s Sum Up
Learning Objectives
• Describe the concept of probability theory
• Explain the meaning and types of statistical inference
• Discuss the importance of sampling theory
• Elucidate the meaning and importance of hypothesis testing
• Describe the concept and types of regression analysis
1. Measures of Central Tendency

• Arithmetic Mean: The mean of a variable represents its average value. For data
given as a frequency distribution, it can be calculated using the following formula:

$\bar{X} = \frac{\sum_{i} f_i x_i}{\sum_{i} f_i}$

where $\bar{X}$ represents the sample mean, $x_i$ the ith observed value of the variable,
and $f_i$ the frequency of the ith observation. The mean is a computed value, so it may or
may not coincide with an observation that actually occurs in the dataset.
• Median: The median is the positional average of a variable. When the
observations of a variable are arranged in ascending or descending
order, the middle value of the series is the median (for an even number
of observations, conventionally the mean of the two middle values).
The median divides the observations into two equal halves: half of the
observations are at or below the median and the other half are at or
above it. Quartiles, deciles and percentiles are extensions of the median.
• Mode: The mode of a variable is the value with the highest frequency
or, for grouped data, the class with the highest concentration of
frequencies.
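
The three measures above can be sketched in Python using only the standard library. The frequency table `freq` is made-up illustrative data, not taken from the slides. The mean uses the frequency-weighted formula, while the median and mode are read off the expanded list of observations.

```python
from statistics import median, multimode

# Hypothetical frequency table: value -> frequency (illustrative data only)
freq = {2: 3, 4: 5, 6: 2}

# Frequency-weighted arithmetic mean: sum(f_i * x_i) / sum(f_i)
total = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / total

# Expand the table into raw observations to find the median and mode
observations = sorted(x for x, f in freq.items() for _ in range(f))
med = median(observations)       # middle value (mean of the two middle values here)
modes = multimode(observations)  # list of the most frequent value(s)

print(mean, med, modes)
```

In this sketch the mean works out to 3.8, a value that never occurs among the observations, which illustrates why the mean may not exist in the dataset.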
2. Probability Theory

• Probability theory is a branch of mathematics that is concerned with
chance or probability. It expresses its concepts as axioms, which are
formalized in terms of a probability space. The set of all possible
outcomes is called the sample space, and any subset of the sample
space is called an event. The probability space assigns each event a
value between 0 and 1 (inclusive), and the sample space as a whole
has probability 1.
• Probability theory involves the use of discrete and continuous random
variables and probability distributions. The distributions provide
mathematical abstractions of non-deterministic or uncertain processes
or measured quantities, which may occur as a single occurrence or
evolve over time.
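
As a small illustration of sample spaces and events, the sketch below assumes a fair six-sided die, so every outcome is equally likely. The die and the helper `probability` are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed equally likely outcomes)
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space: here, "the roll is even"
even = {x for x in sample_space if x % 2 == 0}

def probability(event, space):
    """Probability of an event when all outcomes in the space are equally likely."""
    return Fraction(len(event & space), len(space))

p_even = probability(even, sample_space)          # an ordinary event
p_all = probability(sample_space, sample_space)   # the certain event
p_none = probability(set(), sample_space)         # the impossible event

print(p_even, p_all, p_none)
```

The certain event gets probability 1 and the impossible event gets 0, so every other event falls between those bounds.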
Continuous Probability Distributions

• A continuous random variable is a random variable having an infinite
and uncountable range. If the random variable is continuous, its
probability distribution is called a continuous probability distribution.
• A continuous distribution describes how probability is spread over the
possible values of a continuous random variable.
• A continuous probability distribution can be described using an equation
called the Probability Density Function (PDF). The area under the PDF
curve over an interval gives the probability that the random variable
falls in that interval; the total area under the curve is 1.
• The probability that a continuous random variable takes any single exact
value is zero.
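
The area interpretation can be checked numerically. The sketch below uses a standard normal distribution as the example (an assumption for illustration) and the closed-form CDF based on `math.erf`; `normal_cdf` and `prob_between` are hypothetical helper names.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """Area under the PDF between a and b, i.e. P(a <= X <= b)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

p_interval = prob_between(-1.0, 1.0)  # roughly 0.68 for a standard normal
p_point = prob_between(0.5, 0.5)      # a single point has zero width, hence zero area

print(p_interval, p_point)
```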
Discrete Probability Distributions

• A discrete random variable takes values from a finite or countable set,
such as the number of times a random event occurs. Usually, a discrete
random variable is denoted as X and its probability distribution is
denoted as P(X).
• Some of the most common discrete probability distributions used in
statistics include the binomial, geometric, hypergeometric, multinomial,
negative binomial, and Poisson distributions.
• Discrete probability distributions can be described using probability
tables (analogous to frequency distribution tables), graphs or charts.
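
A discrete distribution can be written out as a table of values and probabilities. The sketch below builds such a table for a binomial distribution, with n = 4 trials and success probability p = 0.5 chosen purely for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.5  # illustrative parameters, not from the slides

# Probability table: number of successes -> probability
table = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in table.items():
    print(k, prob)

total_probability = sum(table.values())  # probabilities of all outcomes sum to 1
```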
Classical Probability Distributions

There are four types of Classical Probability Distributions:
• Bernoulli Distribution: A Bernoulli distribution has only one trial and
only two possible outcomes, namely 1 (success) and 0 (failure).
• Uniform Distribution: In a uniform distribution, there may be any
number of outcomes and every outcome is equally likely.
• Binomial Distribution: A binomial distribution describes the number of
successes in a fixed number of trials, where only two outcomes are
possible in each trial, the probability of success is the same in every
trial, and the trials are independent of each other.
• Normal Distribution: A normal distribution has a bell-shaped,
symmetrical curve. This distribution occurs naturally in many situations.
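
The four distributions can be compared by simulation. This is a rough sketch using Python's `random` module with a fixed seed. The parameters (p = 0.3, a six-sided die, 10 trials, a standard normal) are arbitrary illustrative choices, and `bernoulli` and `binomial` are hypothetical helpers.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 10_000              # number of simulated draws per distribution

def bernoulli(p):
    """One trial: 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial(n, p):
    """Number of successes in n independent Bernoulli trials."""
    return sum(bernoulli(p) for _ in range(n))

def average(xs):
    return sum(xs) / len(xs)

bern = [bernoulli(0.3) for _ in range(N)]
unif = [rng.randint(1, 6) for _ in range(N)]    # discrete uniform on 1..6
binom = [binomial(10, 0.3) for _ in range(N)]
norm = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Sample averages should land near the theoretical means 0.3, 3.5, 3.0 and 0.0
print(average(bern), average(unif), average(binom), average(norm))
```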
3. Statistical Inference

• Statistical inference refers to the process by which inferences about a
population are made on the basis of statistics calculated from a
sample of data drawn from that population. In other words, statistical
inference is the use of probability theory to make inferences
about a population from sample data.
• Assume that we want to estimate the average life expectancy of males
living in Tamil Nadu, India, or the percentage of the public that is
satisfied with the work done by the current government. We usually
cannot obtain data from every person in the population. Therefore, we
obtain data from a part of the population, called a sample. The sample
data is then analyzed to draw inferences about the population.
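
The idea of inferring a population value from a sample can be sketched as follows. The population here is synthetic (normally distributed values with mean 70 and standard deviation 8, chosen only for illustration), which lets the sample estimate be compared with the true value.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed for a repeatable sketch

# Hypothetical population of 100,000 values (synthetic, for illustration only)
population = [rng.gauss(70.0, 8.0) for _ in range(100_000)]

# Draw a simple random sample; in practice only the sample would be observed
sample = rng.sample(population, 500)

estimate = mean(sample)                         # point estimate of the population mean
std_error = stdev(sample) / len(sample) ** 0.5  # rough measure of the estimate's precision

print(estimate, std_error, mean(population))
```

The standard error indicates how far the sample mean is likely to be from the population mean; larger samples give smaller standard errors.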
In inferential statistics, the experimenter tries to achieve three goals as
follows:
• Parameter estimation: Parameters are the unknown constants in a
probability distribution that determine the properties of the distribution.
They are estimated from the sample data.
• Data prediction: After the parameters have been estimated for a
particular distribution, they can be used to predict future data.
• Model comparison: Given two or more candidate models, the
experimenter selects the model that best explains the observed data.
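
The three goals can be illustrated with a coin-flip example. The data (70 heads in 100 flips) is hypothetical, and comparing models by log-likelihood is one common approach, used here as an assumption rather than as the method prescribed by the slides.

```python
import math

heads, flips = 70, 100  # hypothetical observed data

# 1. Parameter estimation: maximum-likelihood estimate of the success probability
p_hat = heads / flips

# 2. Data prediction: expected number of heads in 50 future flips
predicted_heads = 50 * p_hat

# 3. Model comparison: which model gives the observed data a higher likelihood?
def log_likelihood(p, k, n):
    """Log-likelihood of k successes in n Bernoulli trials (constant term omitted)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

ll_fair = log_likelihood(0.5, heads, flips)      # model A: fair coin
ll_fitted = log_likelihood(p_hat, heads, flips)  # model B: coin with p = p_hat

print(p_hat, predicted_heads, ll_fair, ll_fitted)
```

A fitted model can never score worse than a fixed one on the data it was fitted to, so practical comparisons usually penalise extra parameters or evaluate on held-out data.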
4. Sampling Theory

The study of how samples are collected and analyzed to derive useful information
about a population is called sampling theory. Some important concepts related to
sampling theory are as follows:
• Data: Data refers to the entire set of observations that have been collected.
• Population: The entire group of subjects or objects that is to be studied and
analysed is called the population.
• Sample: A sample is a portion or sub-collection of elements that are examined in
order to estimate the characteristics of a population.
• Parameter: A parameter is a numerical characteristic of the population (for example,
the population mean). It is usually unknown and is estimated from the corresponding
characteristic of a sample, which is called a statistic.
• Statistics: It is a branch of mathematics that deals with planning and conducting
experiments, obtaining data, and organising, summarising, presenting, analysing,
interpreting and drawing conclusions based on data.
Sampling Frame
• A sampling frame is the complete list of all the items (everyone and everything)
from which the sample is actually drawn. At first, it would appear that a sampling
frame is the same as the population. However, the population is general, whereas the
sampling frame is specific.
• For example, we may define the population as all the Indian Americans living in
Texas, USA, whereas the sampling frame is a concrete list of Indian Americans living
in Texas, USA. The two can differ because not every member of the population is
necessarily included in the list.
• In statistical research, the experimenters require a list of items in order to draw a
sample from it. It must be ensured that the sampling frame is adequate for the needs
of the experimenter.
Sampling Methods
• In statistics, there are various sampling methods. Sampling methods are
divided into two categories, namely probability sampling and
non-probability sampling.
• In probability sampling, every unit in the population has a known,
non-zero probability of being selected. We can therefore determine the
probability that each possible sample will be selected, as well as which
sampling units belong to which sample.
• In non-probability sampling, the probability of a unit being selected is
not known.
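
The difference between the two categories can be sketched with a hypothetical sampling frame of 1,000 unit IDs. Simple random sampling is used as the example of probability sampling, and taking the first units on the list stands in for a convenience (non-probability) sample.

```python
import random

rng = random.Random(7)        # fixed seed for a repeatable sketch
frame = list(range(1, 1001))  # hypothetical sampling frame of 1,000 unit IDs
n = 50                        # desired sample size

# Probability sampling: simple random sample without replacement
srs = rng.sample(frame, n)

# Under simple random sampling every unit has the same known inclusion probability
inclusion_prob = n / len(frame)

# Non-probability sampling: a convenience sample of the first n units on the list;
# selection probabilities are not defined by any random mechanism
convenience = frame[:n]

print(sorted(srs)[:5], inclusion_prob, convenience[:5])
```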
Sampling Errors
• [Figure: errors involved in sampling]
5. Hypothesis Testing
• A hypothesis is a statement or a proposed explanation about
one or more populations. A hypothesis statement is usually
associated with the population parameters. A hypothesis can
be tested using a research method.
• In hypothesis testing, there are two types of hypotheses,
namely the null hypothesis and the alternative hypothesis. The
null hypothesis (H0) is the hypothesis to be tested. The
alternative hypothesis (HA) is the hypothesis that is accepted if
the sample data leads to rejection of H0.
• Hypothesis testing, also called significance testing, is a
method used to test a hypothesis regarding the
population parameters using the data collected from a
sample. Alternatively, we can say that hypothesis testing is a
method of evaluating samples to learn about the
characteristics of a given population.
Four Steps to Hypothesis Testing

• The process of hypothesis testing consists of four steps as follows:

Step 1: Identify the hypothesis to be tested.
Step 2: Set the criterion upon which the hypothesis would be tested.
Step 3: Select a random sample from the population and measure the sample mean (compute the test statistic).
Step 4: Make a decision: compare the observed value of the sample to what we expect to observe if the claim we are testing is true.
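
The four steps can be traced in a short sketch. A two-sided one-sample z-test is used as the example, which assumes the population standard deviation is known. All numbers (hypothesised mean 100, sigma 15, sample of 36 with mean 106, significance level 0.05) are hypothetical.

```python
from statistics import NormalDist

# Step 1: identify the hypothesis to be tested (H0: population mean equals mu0)
mu0 = 100.0

# Step 2: set the criterion (significance level for a two-sided test)
alpha = 0.05

# Step 3: take a random sample and compute the test statistic
sigma = 15.0         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 106.0  # hypothetical observed sample mean
z = (sample_mean - mu0) / (sigma / n ** 0.5)

# Step 4: compare the observation with what H0 predicts and decide
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(z, p_value, reject_h0)
```

Here z is 2.4 and the p-value is about 0.016, below the 0.05 criterion, so H0 would be rejected for this made-up sample.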
Analysis of Variance (ANOVA)

• The independent-sample t-test can be applied to situations where there
are only two independent samples. In other words, we can use
independent-sample t-tests for comparing the means of two populations
(such as males and females). When we have more than two independent
samples, the t-test is inappropriate. The Analysis of Variance (ANOVA)
has an advantage over the t-test when the researcher wants to compare
the means of a larger number of populations (i.e., three or more).
• ANOVA is a parametric test that is used to study the difference among
more than two groups in a dataset. It does so by partitioning the
variation in the dataset into variation between groups and variation
within groups.
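
A one-way ANOVA F statistic can be computed by hand, as sketched below for three small made-up groups. The sketch stops at the F statistic; turning it into a p-value needs the F distribution, which is not in the Python standard library.

```python
# Hypothetical measurements for three independent groups (illustrative data only)
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0],
]

def average(xs):
    return sum(xs) / len(xs)

all_values = [x for g in groups for x in g]
grand_mean = average(all_values)

# Between-group variation: how far each group mean lies from the grand mean
ss_between = sum(len(g) * (average(g) - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1

# Within-group variation: how far observations lie from their own group mean
ss_within = sum((x - average(g)) ** 2 for g in groups for x in g)
df_within = len(all_values) - len(groups)

# F statistic: ratio of the between-group to the within-group mean squares
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, f_stat)
```

A large F value means the group means differ by more than the within-group scatter would explain; the decision is made by comparing F with a critical value for (2, 6) degrees of freedom.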
6. Regression Analysis

• Regression analysis is a statistical method that is used to model a
relationship between two or more variables of interest.
• Regression analysis is usually used to model the relationship between a
response variable (dependent variable) and one or more predictor
(independent) variables.
• There are various types of regression. However, the basic function of
all these regression models is to examine the influence of one or more
independent variables on a dependent variable.
• Regression analysis helps in identifying which variables have an impact
on a variable of interest.
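
As a minimal sketch of modelling a response variable from one predictor, the code below fits a simple linear regression by ordinary least squares. The data points are made up for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariation of x and y divided by the variation of x
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: predictor x and response y (illustrative only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
prediction = intercept + slope * 6.0  # predicted response for a new x value

print(slope, intercept, prediction)
```

The fitted slope estimates how much the response changes per unit change in the predictor, which is the influence that regression analysis sets out to examine.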
Types of Regression Techniques

Seven important types of regression techniques include:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Ordinal Regression
• Ridge Regression
• Principal Components Regression (PCR)
• Partial Least Squares (PLS) Regression
Let’s Sum Up

• Probability theory is a branch of mathematics that is concerned with chance or
probability.
• Probability theory involves the use of discrete and continuous random variables and
probability distributions.
• Statistical inference describes the use of probability theory to make inferences about
a population from the sample data.
• The study of how samples are collected and analyzed to derive useful
information is called sampling theory.
• Hypothesis testing is a method of evaluating samples to learn about the
characteristics of a given population.
• Regression analysis is a statistical method which is used to model a relationship
between two or more variables of interest.
THANK YOU
