3rd Session. Slides
Statistics Analytics
Graphical
EDA
Session #3
“The greatest value of a picture is when it
forces us to notice what we never expected
to see.”
John Tukey
CSV
Spreadsheet
A Quick Refresher On Statistics
Statistics
• Descriptive
  • Measures of Central Tendency: Mean, Median, Mode
  • Measures of Dispersion: Range, Standard Deviation, Variance, IQR
  • Measures of Distribution: Skewness, Kurtosis
• Inferential
  • Hypothesis Testing
  • Regression Analysis
56 , 1 , 1 , 7 , 10
Mean = ( 56 + 1 + 1 + 7 + 10 ) / 5 = 15
56 , 1 , 1 , 7 , 10
Median = 7 (the middle value of the sorted data 1 , 1 , 7 , 10 , 56)
56 , 1 , 1 , 7 , 10
Mode (value – count):
56 – 1
1 – 2
7 – 1
10 – 1
Mode = 1
56 , 1 , 1 , 7 , 10
Mean 15
Median 7
Mode 1
Which is the best measure of central tendency?
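The three measures on this slide can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the slide's own five values; the variable names are illustrative only.

```python
import statistics

# The five observations used throughout the slides
data = [56, 1, 1, 7, 10]

mean = statistics.mean(data)      # (56 + 1 + 1 + 7 + 10) / 5
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

One way to read the result: the single large value 56 pulls the mean up to 15, while the median stays at 7, so for data with extreme values the median is often the more representative choice.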
56 , 1 , 1 , 7 , 10
Range = 56 - 1 = 55
56 , 1 , 1 , 7 , 10
Variance
Squared deviations from the mean (15), in data order: 1681 , 196 , 196 , 64 , 25
Variance = Mean( 1681 , 196 , 196 , 64 , 25 ) = 432.4
56 , 1 , 1 , 7 , 10
Standard Deviation = √432.4 ≈ 20.8
• Measures how dispersed the data are in relation to the mean.
• A small standard deviation indicates that the data are clustered tightly around the mean.
• A large standard deviation indicates that the data are more spread out.
Standard
Deviation
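The range, variance and standard deviation worked through on the slides can be checked step by step. The sketch below assumes the population form of the variance (divide by n), since that is what yields the slide's 432.4; the variable names are illustrative.

```python
import math
import statistics

data = [56, 1, 1, 7, 10]

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Variance: mean of the squared deviations from the mean
mean = statistics.mean(data)
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / len(data)  # population variance (divide by n)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

# Cross-check against the standard library's population variance
assert math.isclose(variance, statistics.pvariance(data))

print(data_range, squared_deviations, variance, round(std_dev, 1))
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 (sample form) and would give larger values for the same data.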
IQR (Interquartile Range)
Median
Box Plot (Box & Whisker Plot)
Measure of the asymmetry of the distribution of a variable about its mean
Skewness
Right-Skewed Data:
• In business, right-skewed data could indicate a small number of high-value customers or transactions.
• Strategies might focus on retaining these high-value entities while trying to shift more customers towards higher value.
Left-Skewed Data:
• If the distribution of purchase amounts is left-skewed, it suggests that most customers are making relatively high-value purchases.
• This can indicate a strong, high-value customer base.
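To connect the definition of skewness with the right-skewed and left-skewed cases above, here is a minimal sketch that computes the moment-based (population) skewness coefficient. The slides do not say which skewness formula they use, so this formula and the `skewness` helper are assumptions for illustration.

```python
import statistics

def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

data = [56, 1, 1, 7, 10]  # the slide's example data
skew = skewness(data)

# Positive skew: long tail to the right; negative skew: long tail to the left
direction = "right-skewed" if skew > 0 else "left-skewed" if skew < 0 else "symmetric"
print(round(skew, 3), direction)
```

For the example data, the single large value 56 creates a long right tail, so the coefficient is positive.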
Kurtosis
Crucial process of performing initial investigations on data to discover patterns and to check assumptions with the help of summary statistics and graphical representations.
Exploratory Data Analysis
• Meta Data
• Uni-Variate Analysis
• Multivariate Analysis
• Detect Missing Values
• Detect Outliers
• Feature Engineering
• Business Insights
Analytics (Value vs Complexity, from lowest to highest)
• Descriptive: What happened
• Diagnostic: Why something happened
• Predictive: What will happen
• Prescriptive: What should be done
Analysis Vs Analytics
Analysis
• Past and Present Focus
• Looks at Historical Data
Analytics
• Future Oriented
• Anticipates Trends
• Explores each variable on its own in a data set
• Descriptive statistics describe and summarize data.
• Central tendency of the values.
• Check for the variability in the data
• Check the shape of data
• Check for missing values
• Check for outliers
• Check Skewness and Kurtosis
Univariate
Analysis
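The univariate checks listed above can be sketched for a single column using only the standard library. The `summarize` helper and the convention that `None` marks a missing value are assumptions for illustration; the column reuses the slides' five values plus one missing entry.

```python
import statistics

def summarize(column):
    """Summarize one variable on its own; None marks a missing value."""
    present = [v for v in column if v is not None]
    return {
        "count": len(present),
        "missing": len(column) - len(present),
        "mean": statistics.fmean(present),        # central tendency
        "median": statistics.median(present),
        "std_dev": statistics.pstdev(present),    # variability (population form)
        "min": min(present),
        "max": max(present),
    }

# Hypothetical column: the slides' example values with one missing entry added
column = [56, 1, None, 1, 7, 10]
summary = summarize(column)

for name, value in summary.items():
    print(f"{name}: {value}")
```

In practice the same summary would be run once per variable in the data set, before looking at relationships between variables.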
• Involves examining relationships between two or more variables simultaneously
• Correlation Analysis
• Inferential Analysis
• Interaction Analysis
• Visualize relationships using: Scatterplots, Heatmaps, Pair-Plots, etc.
• Checking Missing values and Outliers
Multivariate
Analysis
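As one example of the correlation analysis mentioned above, the sketch below computes the Pearson correlation coefficient between two variables. The `pearson` helper and the car age and price figures are made up for illustration; they are not taken from the session's dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: age of a car in years and its resale price
age_years = [1, 2, 3, 4, 5]
price = [9.0, 8.1, 6.9, 6.2, 5.0]

r = pearson(age_years, price)
print(round(r, 3))  # close to -1: price falls as age rises
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship; a scatterplot of the same two variables shows the pattern visually.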
Graphical/Visual Analysis Of Statistical Measures
✓ What are the central tendencies?
✓ Mean
✓ Median
✓ Mode(s)
✓ What are the dispersion measures?
✓ Range
✓ IQR
✓ Standard Deviation.
✓ Variance
✓ What are the shapes of the distributions?
✓ Uniform
✓ Symmetric or skewed?
• Are there any missing values? How do you treat them?
• Are there outliers or extreme values? How do you treat them?
Metadata
Filename Catpics#1.jpg
Owner Bella
Created 1st May 2024
Camera ………
……..
Missing
Data
Few Reasons for Missing Values
• Observations are not recorded for certain fields due to some reasons.
Deletion Imputation
Impute Missing Values
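The two treatments named above, deletion and imputation, can be compared on a small example. This is a sketch under assumptions: `None` marks a missing value, and mean and median fill-ins are shown as two common imputation choices; the slides do not prescribe a specific method.

```python
import statistics

# Hypothetical column with one missing observation (None)
values = [56, None, 1, 7, 10]

# Deletion: drop the observations that are missing
deleted = [v for v in values if v is not None]

# Imputation: replace missing entries with a statistic of the observed values
mean_fill = statistics.fmean(deleted)
median_fill = statistics.median(deleted)
imputed_mean = [mean_fill if v is None else v for v in values]
imputed_median = [median_fill if v is None else v for v in values]

print(deleted)
print(imputed_mean)
print(imputed_median)
```

Deletion shrinks the data set, while imputation keeps its size but introduces an estimated value; the median fill is less affected by the extreme value 56 than the mean fill.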
Outlier Analysis
Detection and Treatment
Outlier
What are Anomalies?
• They are abnormal observations that lie far away from other values.
• Errors
• Natural
• Intentional
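One common way to detect such observations is the box plot rule: flag anything more than 1.5 × IQR beyond the quartiles. The 1.5 multiplier, the `iqr_outliers` helper, and the "inclusive" quartile method are assumptions here; the slides do not specify a detection rule.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [56, 1, 1, 7, 10]  # the slides' example data
outliers = iqr_outliers(data)
print(outliers)
```

Whether a flagged value is an error, a natural extreme, or intentional still has to be judged from context before deciding how to treat it.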
[Chart: iPhone Sales - Website on 1st April. X-axis: TIME (2:00 AM to 12:00 PM the following day, in 2-hour steps); Y-axis: # OF UNITS SOLD (0 to 1000)]
Applications of Anomaly Detection
[Chart labels: ₹ 1,50,000, Anomaly, ₹ 10,000]
• Credit card fraud is a socially relevant problem and poses a great threat to businesses all around the world.
• Link: https://pair-code.github.io/facets/index.html
• Load the dataset from the LMS: “Session#3_Dataset _Used_Car.xlsx”
• Perform the following Univariate Analysis and Multivariate Analysis
Disclaimer
A few of the graphs and visualizations used in the presentation are not my own and have been taken from the following sources.
Sources: Google Images, The Economist, William S. Cleveland and Robert McGill (1984), Data-To-Viz, Oracle, Datavizcatalogue, blogs.sas.com, steema.com, Consumer Reports, The Visual Display of Quantitative Information by E. Tufte, Python Graph Gallery, Lucidchart, Storytelling with Data by Cole Nussbaumer Knaflic, Good Charts by Scott Berinato