zekeLabs
Statistics for Data Science
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Overview of Statistics
● Introduction to Statistics
● Importance of Statistics
● Understanding Variable Types
● Descriptive vs Inferential Statistics
Introduction to Statistics
● Science of learning from data.
● Methodical data collection.
● Employ correct data analysis.
● Presenting analysis effectively.
● The opposite of statistics is “anecdotal evidence”.
Importance
● Avoid biased samples
● Prevent overgeneralization
● Avoid inferring wrong causality
● Avoid incorrect analysis
● Applicable to any domain
Variables
● Explanatory (predictor or independent)
● Response (outcome or dependent)
● A variable can serve as independent in one
study and dependent in another
Data Types of Variables - Quantitative versus Qualitative
● Quantitative - Numerical data. E.g. weight, temperature, number_project
● Qualitative - Non-numerical data. E.g. dept, salary (low, medium, high)
Types of Quantitative Variables
● Continuous - can take any numeric value within a range. E.g. sqft
● Discrete - a count of the presence of a characteristic, result, item, or activity. E.g. floor
Qualitative Data: Categorical, Binary, and Ordinal
● Categorical or Nominal. E.g. dept (sales, RD etc.)
● Binary. E.g. left (1 or 0)
● Ordinal. E.g. salary (low, medium, high)
Choosing Statistical Analysis Based on Data Type
Types of Statistical Analysis
● Descriptive Statistics - Describes data.
○ Common Tools - Central tendency, Data distribution, skewness
● Inferential Statistics - Draws conclusions from a sample and generalizes them to the entire population
○ Common Tools - Hypothesis Testing, Confidence Intervals, Regression Analysis
Summarizing Data
● Measure of Central Tendency
● Measure of Variability
● Visualizing Data
Measure of Central Tendency
● Mean - Average of the data; suited for continuous data with no outliers
● Median - Middle value of the ordered data; suited for continuous data with outliers
● Mode - Most frequently occurring value; suited for categorical data (both nominal and ordinal)
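The three measures can be compared on a small sample. The sketch below uses Python's standard `statistics` module; the salary and department values are made up for illustration and are not taken from the slides.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [40, 42, 45, 47, 50, 300]

sal_mean = mean(salaries)      # pulled upward by the outlier
sal_median = median(salaries)  # stays close to the typical salary
print(sal_mean, sal_median)

# The mode also works on categorical (non-numeric) data.
departments = ["sales", "RD", "sales", "hr", "sales", "RD"]
dept_mode = mode(departments)
print(dept_mode)
```

With the outlier present, the mean lands well above every typical salary while the median does not, which is why the median is preferred for data with outliers.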
Measure of Variability
● Range
● Interquartile Range
● Variance
● Standard Deviation
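A minimal sketch of the four measures, again with the standard `statistics` module. The data values are illustrative. Note two assumptions: `pvariance` and `pstdev` are the population versions (dividing by n), and `quantiles` uses Python's default "exclusive" method, so other tools may report slightly different quartiles.

```python
from statistics import pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

# Range: distance between the largest and smallest value.
data_range = max(data) - min(data)

# Interquartile range: spread of the middle 50% of the data.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Variance: mean squared deviation from the mean (population form).
variance = pvariance(data)

# Standard deviation: square root of the variance, in the data's own units.
std_dev = pstdev(data)

print(data_range, iqr, variance, std_dev)
```

For sample data, `statistics.variance` and `statistics.stdev` (dividing by n - 1) would be the usual choice instead.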
Visualizing Continuous Data
● Histogram
● ScatterPlot
Visualizing Continuous Data - 2
● Box-Plot
Visualizing Discrete Data
● Histogram
● Pie
Probability Distribution
● Basics of Probability
● Conditional Probability
● Discrete Probability Function
● Continuous Probability Function
● Central Limit Theorem
Probability of Single Event
Probability of Two Independent Events
P(A AND B) = P(A) * P(B)
Probability of heads on both tosses of two coins: P(A) * P(B) = ½ * ½ = ¼
P(A OR B) = P(A) + P(B) - P(A AND B)
Probability of heads on the 1st flip, on the 2nd flip, or on both: ½ + ½ - ¼ = ¾
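Both rules can be checked by listing every outcome of two coin flips and counting. This is a small sketch; the helper `prob` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event = favourable outcomes / all outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_a = prob(lambda o: o[0] == "H")                      # heads on 1st flip
p_b = prob(lambda o: o[1] == "H")                      # heads on 2nd flip
p_a_and_b = prob(lambda o: o[0] == "H" and o[1] == "H")
p_a_or_b = prob(lambda o: o[0] == "H" or o[1] == "H")

print(p_a_and_b, p_a * p_b)              # 1/4 1/4
print(p_a_or_b, p_a + p_b - p_a_and_b)   # 3/4 3/4
```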
Conditional Probability
Probability of an event given that another event has occurred.
P(B|A) - Probability of event B given that A has happened
P(A AND B) = P(A) * P(B|A)
Probability of drawing 2 aces = P(drawing an ace from the full deck) * P(drawing an ace given that one ace has already been removed)
Probability of drawing 2 aces = 4/52 * 3/51 = 1/221
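The two-aces calculation can be reproduced with exact fractions, assuming a standard 52-card deck and drawing without replacement. As a cross-check, the sketch also counts ace pairs among all possible two-card hands.

```python
from fractions import Fraction
from math import comb

# P(A): first card is an ace (4 aces among 52 cards).
p_first_ace = Fraction(4, 52)

# P(B|A): second card is an ace, given one ace is already gone
# (3 aces left among the remaining 51 cards).
p_second_ace_given_first = Fraction(3, 51)

# P(A AND B) = P(A) * P(B|A)
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221

# Cross-check by counting: pairs of aces / all pairs of cards.
p_by_counting = Fraction(comb(4, 2), comb(52, 2))
print(p_by_counting)  # 1/221
```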
Probability distribution
● A function describing the likelihood of each possible value that a random variable can assume.
● For example, from employee salary data we can build a distribution of salary.
● Such a distribution shows which outcomes are more likely.
● The probabilities of all outcomes sum to 1, and each outcome's probability lies between 0 and 1.
● Probability distributions are divided into two types based on the data - discrete and continuous.
Discrete Probability Distribution Function
● Probability mass functions for discrete data
● Binomial Distribution for Binary Data (Yes/No)
● Poisson Distribution for count data (No. of cars per family)
● Uniform Distribution for outcomes with equal probability (rolling a die)
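The three probability mass functions listed above can be written directly from their textbook formulas. This is a sketch using only the standard library; the function names and the example parameters (3 trials, a rate of 2 cars per family, a six-sided die) are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent yes/no trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(a count of exactly k) when the average count is lam."""
    return exp(-lam) * lam**k / factorial(k)

def uniform_pmf(n_outcomes):
    """Each of n equally likely outcomes has probability 1/n."""
    return Fraction(1, n_outcomes)

print(binomial_pmf(2, 3, 0.5))  # 2 heads in 3 fair coin flips: 0.375
print(poisson_pmf(0, 2.0))      # no cars when the average is 2 per family
print(uniform_pmf(6))           # any one face of a fair die: 1/6
```

In each case the probabilities over all possible values of k add up to 1, as required of any probability distribution.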
Binomial Distribution
Poisson Distribution
Uniform Distribution
Probability distribution for continuous data
● Probability density function (PDF) for continuous data
● Central tendency, variation & skewness are important parameters
● Normal Probability Distribution or Gaussian Distribution or Bell curve
● Lognormal Probability Distribution
Normal Distribution
● A probability function that describes how the values of a variable are distributed.
● Symmetric distribution
● Example (height distribution): Mean = 69, Std = 2.8
● Notation alert: mu (μ) and sigma (σ) denote the mean and standard deviation of the entire population
Normal Distribution - 2
● Empirical Rule of Normal Distribution: 68 - 95 - 99.7 (percent of values within 1, 2 and 3 standard deviations of the mean)
● Standard Normal Distribution: Mean = 0, Std = 1.0
● A z-score is a great way to understand where a specific observation falls with respect to the entire population. It is the number of standard deviations the observation lies from the mean.
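The empirical rule and z-scores can be checked with `statistics.NormalDist`. The sketch reuses the mean of 69 and standard deviation of 2.8 from the height example; the observation 74.6 and the helper names `share_within` and `z_score` are assumptions added for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 69, 2.8
heights = NormalDist(mu=MU, sigma=SIGMA)

def share_within(k):
    """Share of the population within k standard deviations of the mean."""
    return heights.cdf(MU + k * SIGMA) - heights.cdf(MU - k * SIGMA)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # roughly 0.68, 0.95, 0.997

def z_score(x):
    """Number of standard deviations x lies from the mean."""
    return (x - MU) / SIGMA

z = z_score(74.6)  # about 2 standard deviations above the mean
print(round(z, 2))

# Converting to z-scores maps any normal distribution onto the
# standard normal (mean 0, std 1), so the two CDFs agree.
standard = NormalDist(mu=0, sigma=1)
print(round(standard.cdf(z), 4), round(heights.cdf(74.6), 4))
```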
Lognormal Distribution
Descriptive Statistics
● Introduction
● Central Tendency
● Data Distribution
● Skewness
● Correlation
Inferential Statistics
● Introduction
● Hypothesis Testing
● Confidence Intervals
● Regression Analysis
Relationships between Variables
● Chi-square Test of Independence
● Correlation and Linear Regression
● Analysis of Variance or ANOVA
Thank You !!!
Visit : www.zekeLabs.com for more details
Let us know how we can help your organization upskill its employees to stay updated in the ever-evolving IT industry.
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
