BI-LEc 3

Uploaded by

Ayesha Asad


Exploratory Data Analysis (EDA)
Lecture 3 - Miss Ushna Tasleem
Introduction
• Exploratory Data Analysis (EDA) is the process of describing data with statistical and visualization techniques so that its important aspects come into focus for further analysis. It involves inspecting the dataset from many angles, describing and summarizing it without making any assumptions about its contents.
• EDA is an important step to take before diving into statistical modeling or machine learning: it helps confirm that the data is what it is claimed to be and that there are no obvious errors. It should be part of data science projects in every organization.
Exploratory Data Analysis (EDA)
• EDA is like exploring a new place. You look around, observe things, and try to understand what is going on. Similarly, in EDA you look at a dataset, check out its different parts, and try to figure out what is happening in the data.
• Data scientists use EDA to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods.
• EDA uses statistics and visual tools to understand and summarize data, helping data scientists and data analysts inspect the dataset from various angles without making assumptions about its contents.
• The main purpose of EDA is to look at the data before making any assumptions. It can help identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.
Process
• Look at the Data: Gather information about the data, such as the number of rows and columns, and the type of information each column contains. This includes understanding single variables and their distributions.
• Clean the Data: Fix issues like missing or incorrect values. Preprocessing is essential to ensure the data is ready for analysis and predictive modeling.
• Make Summaries: Summarize the data to get a general idea of its contents, such as average values, common values, or value distributions. Calculating quantiles and checking for skewness can provide insights into the data's distribution.
• Visualize the Data: Use interactive charts and graphs to spot trends, patterns, or anomalies. Bar plots, scatter plots, and other visualizations help in understanding relationships between variables. Python libraries like pandas, NumPy, Matplotlib, Seaborn, and Plotly are commonly used for this purpose.
• Ask Questions: Formulate questions based on your observations, such as why certain data points differ or whether there are relationships between different parts of the data.
• Find Answers: Dig deeper into the data to answer these questions, which may involve further analysis or creating models, including
Types of exploratory data analysis
• There are four primary types of EDA:
• Univariate non-graphical. This is the simplest form of data analysis, where the data being analyzed consists of just one variable. Since it is a single variable, it does not deal with causes or relationships. The main purpose of univariate analysis is to describe the data and find patterns that exist within it.
• Univariate graphical. Non-graphical methods do not provide a full picture of the data, so graphical methods are also required. Common types of univariate graphics include:
  • Stem-and-leaf plots, which show all data values and the shape of the distribution.
  • Histograms, bar plots in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values.
  • Box plots, which graphically depict the five-number summary of minimum, first quartile, median, third quartile, and maximum.
• Multivariate non-graphical. Multivariate data arises from more than one variable. Multivariate non-graphical EDA techniques generally show the relationship between two or more variables of the data through cross-tabulation or statistics.
• Multivariate graphical. Multivariate graphical EDA uses graphics to display relationships between two or more sets of data. The most used graphic is a grouped bar plot or bar chart, with each group representing one level of one of the variables and each bar within a group representing the levels of the other variable.
Other common types of multivariate graphics include:
  • Scatter plot, which plots data points on a horizontal and a vertical axis to show how one variable relates to another.
  • Multivariate chart, which is a graphical representation of the relationships between factors and a response.
  • Run chart, which is a line graph of data plotted over time.
  • Bubble chart, which is a data visualization that displays multiple circles (bubbles) in a two-dimensional plot.
  • Heat map, which is a graphical representation of data where values are depicted by color.
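As a rough illustration of the multivariate non-graphical case, the sketch below builds a cross-tabulation of two categorical variables. It uses only the Python standard library; the records and the `crosstab` helper are hypothetical and made up for this example (in practice a library such as pandas would usually do this).

```python
from collections import Counter

# Hypothetical records: (customer segment, product category)
records = [
    ("new", "books"), ("new", "toys"), ("returning", "books"),
    ("returning", "books"), ("new", "books"), ("returning", "toys"),
]

def crosstab(pairs):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```

Each row of the printed table is one level of the first variable, and each cell is the count for one level of the second variable, which is the same layout a grouped bar chart would draw.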
Why is Exploratory Data Analysis Important?
• Exploratory Data Analysis (EDA) is an essential step in the data analysis process. It involves analyzing and visualizing data to understand its main characteristics, uncover patterns, and identify relationships between variables.
• EDA is crucial because raw data is often skewed and may contain outliers or too many missing values. A model built on such data tends to perform sub-optimally. In the hurry to get to the machine learning stage, some data professionals either skip the EDA process entirely or do a very mediocre job. This is a mistake with many implications, including:
• Insight Generation: EDA helps uncover patterns, trends, and relationships in data that drive business decisions.
• Anomaly Detection: Identifying outliers and unusual patterns that may indicate data quality issues or business anomalies.
• Hypothesis Formation: Helps in formulating hypotheses about the data that can be tested further using more sophisticated analytical techniques.
• Improved Data Quality: By cleaning and transforming data, EDA ensures that subsequent analyses are based on accurate and relevant information.
• Decision-Making Support: Provides decision-makers with a deeper understanding of the data, enabling more informed and effective business strategies.
EDA Techniques
1. Descriptive Statistics
• Summary Statistics: Measures like mean, median, mode, standard deviation, and variance provide a quick understanding of the data distribution.
• Frequency Distribution: Analyzes how often values occur within a dataset, helping identify common patterns or outliers.
• Example: A retail company analyzes the average, median, and mode of daily sales figures to understand general sales trends. The standard deviation is also calculated to assess sales variability.
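The retail example above can be sketched with Python's built-in `statistics` module. The sales figures are invented for illustration, and the sample (not population) standard deviation and variance are an assumption about what the analyst wants.

```python
import statistics
from collections import Counter

# Hypothetical daily sales figures (units sold per day)
daily_sales = [120, 135, 120, 150, 160, 120, 135, 410]

summary = {
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "mode": statistics.mode(daily_sales),
    "stdev": statistics.stdev(daily_sales),        # sample standard deviation
    "variance": statistics.variance(daily_sales),  # sample variance
}

# Frequency distribution: how often each value occurs
frequency = Counter(daily_sales)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(frequency.most_common(3))
```

Here the mean (168.75) sits well above the median (135) because of the single 410 day, which is the kind of gap that hints at skew or an outlier worth investigating.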
2. Data Visualization
• Histograms: Used to visualize the distribution of a single variable, showing the frequency of data points within specified ranges.
• Box Plots: Display the distribution of data based on a five-number summary (minimum, first quartile, median, third quartile, and maximum) and help identify outliers.
• Scatter Plots: Reveal relationships or correlations between two numerical variables, helping to identify trends or patterns.
• Heatmaps: Use color to show the concentration or magnitude of values in a dataset; often used to represent correlation matrices.
• Bar Charts and Pie Charts: Visualize categorical data to compare different groups or categories.
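The numbers behind a histogram and a box plot can be computed without any plotting library. The following is a minimal sketch under stated assumptions: the order values, the bin width of 20, and the helper names are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles` (other quartile conventions give slightly different values).

```python
import statistics

# Hypothetical order values
values = [12, 15, 18, 21, 22, 25, 27, 31, 34, 38, 45, 95]
BIN_WIDTH = 20

def histogram_counts(data, bin_width):
    """Return {bin_start: count} for fixed-width bins."""
    counts = {}
    for x in data:
        start = (x // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def five_number_summary(data):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

bins = histogram_counts(values, BIN_WIDTH)
summary = five_number_summary(values)

# Text rendering of the histogram: one '#' per observation
for start, count in bins.items():
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * count}")
print("five-number summary:", summary)
```

A plotting library such as Matplotlib or Seaborn would draw the same bin counts as bars and the same five numbers as a box with whiskers.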
3. Data Cleaning
• Handling Missing Data: Identifying and addressing missing values through techniques such as imputation, deletion, or filling with a default value.
• Outlier Detection: Identifying and managing outliers that could skew the results. Techniques include the use of z-scores or the IQR (interquartile range).
Examples:
• Handling Missing Data: An organization notices that some customer demographic data is missing. They decide to impute missing age values using the median age of the entire customer base to maintain analysis accuracy.
• Outlier Detection: In analyzing website traffic data, an outlier detection method identifies a sudden spike in traffic on a particular day, which is traced back to a marketing campaign.
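Both examples above can be sketched in a few lines. This is only an illustration: the ages and visit counts are made up, `None` stands in for a missing value, and the 1.5 × IQR fence is the conventional rule of thumb rather than anything prescribed by the lecture.

```python
import statistics

# Hypothetical customer ages; None marks a missing value
ages = [34, None, 29, 41, None, 38, 25]

# Median imputation: fill gaps with the median of the known values
known = [a for a in ages if a is not None]
median_age = statistics.median(known)
imputed = [median_age if a is None else a for a in ages]

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical daily website visits with one campaign-day spike
daily_visits = [980, 1010, 995, 1020, 1005, 990, 2400, 1000]
spikes = iqr_outliers(daily_visits)

print("imputed ages:", imputed)
print("outlier days:", spikes)
```

Flagging a value as an outlier is only the first step; as in the traffic example, the analyst still has to decide whether it is an error to remove or a real event to explain.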
4. Data Transformation
• Normalization/Standardization: Adjusting data scales to ensure variables are comparable, especially when they are on different scales.
• Log Transformation: Used to reduce skewness in data distributions, making patterns more visible.
• Binning: Grouping continuous data into discrete bins to simplify analysis and reveal trends.
Examples:
• Normalization/Standardization: A company standardizes its sales and marketing data (e.g., scaling the data so that each variable has a mean of 0 and a standard deviation of 1) to compare the effects of different variables on revenue.
• Log Transformation: A BI team applies a log transformation to highly skewed revenue data to make the distribution more symmetric (closer to normal), facilitating better modeling and trend analysis.
• Binning: Customer ages are binned into ranges (e.g., 18-25, 26-35, etc.) to simplify the analysis of age-related purchasing behavior.
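The three transformations can be sketched as follows. All data are invented; the base-10 logarithm, the use of the population standard deviation for z-scores, and the third age range (36-50) are assumptions made for this example, since the lecture only lists the first two ranges.

```python
import math
import statistics

# Hypothetical, strongly right-skewed revenue figures
revenue = [100, 200, 400, 800, 12800]

def standardize(data):
    """Z-scores: subtract the mean, divide by the standard deviation."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return [(x - mu) / sigma for x in data]

z_scores = standardize(revenue)

# Log transformation compresses the long right tail
log_revenue = [math.log10(x) for x in revenue]

def age_bin(age):
    """Map an age to a labeled range; ranges beyond 26-35 are assumed."""
    for lo, hi in [(18, 25), (26, 35), (36, 50)]:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return "other"

age_groups = [age_bin(a) for a in [19, 26, 35, 47, 70]]

print([round(z, 2) for z in z_scores])
print([round(v, 2) for v in log_revenue])
print(age_groups)
```

After standardization the values have mean 0 and standard deviation 1, so variables measured in different units can be compared on one scale.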
5. Correlation Analysis
• Correlation Coefficients: Quantify the relationship between two variables (e.g., Pearson or Spearman correlation), helping to identify variables that are strongly related.
• Correlation Matrices: Tabular or visual representations of correlation coefficients for multiple variables, aiding in identifying potential predictors.
Examples:
• Correlation Coefficients: A business examines the Pearson correlation between customer satisfaction scores and customer retention rates, finding a strong positive correlation that suggests higher satisfaction is associated with better retention. Correlation alone does not show that one causes the other.
• Correlation Matrices: A BI analyst creates a correlation matrix to understand how different financial KPIs (e.g., revenue, profit margin, customer acquisition cost) are interrelated, guiding strategic decisions.
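To make the coefficients concrete, here is a minimal from-scratch sketch of Pearson and Spearman correlation plus a small correlation matrix. The satisfaction and retention figures are hypothetical, and the `ranks` helper assumes there are no tied values (a production implementation would average tied ranks).

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks from 1..n; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-segment figures
satisfaction = [3.1, 3.8, 4.0, 4.4, 4.9]
retention = [0.52, 0.60, 0.61, 0.70, 0.92]

r = pearson(satisfaction, retention)
rho = spearman(satisfaction, retention)

# Correlation matrix as a nested dict, rounded for display
kpis = {"satisfaction": satisfaction, "retention": retention}
matrix = {a: {b: round(pearson(kpis[a], kpis[b]), 2) for b in kpis} for a in kpis}

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
print(matrix)
```

Spearman comes out at 1 here because retention rises every time satisfaction rises, while Pearson is a little lower because the increase is not perfectly linear.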
6. Feature Engineering
• Derived Variables: Creating new variables from existing ones to capture more complex relationships (e.g., interaction terms, polynomial features).
• Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) reduce the number of variables while retaining most of the variance, making it easier to interpret data.
Examples:
• Derived Variables: A BI team creates a new feature called "Customer Lifetime Value (CLV)" by combining average purchase amount, purchase frequency, and customer retention rate, providing a powerful metric for marketing analysis.
• Dimensionality Reduction: To simplify a complex dataset with many variables, a company uses Principal Component Analysis (PCA) to reduce the number of features while retaining most of the variability in the data, making it easier to visualize and analyze.
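A derived variable such as CLV might be computed along the following lines. The lecture does not give a formula, so the one used here (average purchase × purchases per year × expected customer lifetime, with lifetime approximated as 1 / (1 − retention rate)) is an assumption chosen for illustration; the field names and customer rows are hypothetical too.

```python
# Hypothetical customer records
customers = [
    {"id": "A", "avg_purchase": 50.0, "purchases_per_year": 4, "retention_rate": 0.8},
    {"id": "B", "avg_purchase": 20.0, "purchases_per_year": 12, "retention_rate": 0.5},
]

def add_clv(row):
    """Return a copy of the row with an assumed, simplified CLV feature."""
    if not 0 <= row["retention_rate"] < 1:
        raise ValueError("retention_rate must be in [0, 1)")
    # Assumed lifetime model: expected years as a customer
    expected_years = 1 / (1 - row["retention_rate"])
    clv = row["avg_purchase"] * row["purchases_per_year"] * expected_years
    return {**row, "clv": round(clv, 2)}

enriched = [add_clv(c) for c in customers]
for c in enriched:
    print(c["id"], c["clv"])
```

The point of the sketch is the pattern rather than the formula: three existing columns are combined into one new column that is easier to rank and segment on.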
7. Hypothesis Testing
• T-tests and ANOVA: Used to compare means between groups and determine if observed differences are statistically significant.
• Chi-Square Test: Used to examine the relationship between categorical variables.
Examples:
• T-tests and ANOVA: A BI analyst conducts a t-test to compare the average sales before and after a new marketing campaign, determining if the observed increase is statistically significant.
• Chi-Square Test: An e-commerce company uses a Chi-Square test to examine the relationship between customer gender and product category preference, finding significant associations that inform targeted marketing strategies.
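The chi-square test of independence can be worked by hand for a small table. The counts below are invented, and the sketch compares the statistic with 3.841, the standard critical value for 1 degree of freedom at the 5% level, instead of computing a p-value (which would normally be done with a statistics library).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed_count - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = customer gender, columns = product category
observed = [
    [30, 10],
    [15, 25],
]

stat = chi_square_statistic(observed)

# A 2x2 table has (2-1)*(2-1) = 1 degree of freedom
CRITICAL_DF1_ALPHA_05 = 3.841
significant = stat > CRITICAL_DF1_ALPHA_05

print(f"chi-square = {stat:.3f}, significant at 5%: {significant}")
```

With these counts the statistic is about 11.43, well above 3.841, so the hypothesis that category preference is independent of gender would be rejected at the 5% level.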
8. Time Series Analysis
• Trend Analysis: Identifying trends over time, such as seasonal patterns or long-term shifts, using line plots or time series decomposition.
• Autocorrelation: Measures the correlation of a time series with its own past values, helping to identify patterns over time.
Examples:
• Trend Analysis: A BI team analyzes monthly sales data over several years to identify seasonal trends, such as increased sales during the holiday season, and uses this insight for inventory planning.
• Autocorrelation: An autocorrelation function is applied to daily website traffic data to detect any repeating patterns or cycles, such as weekly peaks in traffic.
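The weekly-peak example can be reproduced with a short autocorrelation function. The traffic series is synthetic (the same hypothetical week repeated four times), and the estimator shown is the common one that divides by the full-series sum of squares.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean)
        for t in range(lag, n)
    )
    return num / denom

# Hypothetical daily traffic: four identical weeks with a weekend peak
week = [100, 102, 98, 101, 99, 150, 160]
traffic = week * 4

acf = {lag: round(autocorrelation(traffic, lag), 2) for lag in range(1, 8)}
best_lag = max(acf, key=acf.get)

for lag, value in acf.items():
    print(f"lag {lag}: {value:+.2f}")
print("strongest lag:", best_lag)
```

The autocorrelation peaks at lag 7, which is how a weekly cycle shows up in daily data.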
9. Clustering and Segmentation
• K-Means Clustering: Grouping data points into clusters based on similarity, often used for customer segmentation in BI.
• Hierarchical Clustering: Creating a tree of clusters to understand data groupings at different levels of similarity.
Examples:
• K-Means Clustering: A company segments its customers into clusters based on purchasing behavior using K-means clustering, identifying distinct customer groups such as "bargain hunters" and "premium buyers."
• Hierarchical Clustering: A BI team uses hierarchical clustering to organize products into categories based on features such as price, brand, and customer ratings, enabling more effective product recommendations.
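The idea behind K-means can be shown on a single numeric feature. This is a deliberately small sketch, not a production implementation: the basket values and the starting centroids are hypothetical, the data are one-dimensional, and real projects would typically use a library implementation on several features.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on one numeric feature."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(p - centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical average basket value per customer
avg_basket = [12, 15, 14, 18, 95, 102, 110, 99]
centroids, clusters = kmeans_1d(avg_basket, centroids=[10.0, 50.0])

print("centroids:", centroids)
print("clusters:", clusters)
```

The two resulting groups correspond loosely to the "bargain hunters" and "premium buyers" of the example; naming the clusters is an interpretation step the analyst adds afterwards.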
10. Data Profiling
• Univariate Analysis: Examining each variable individually to understand its distribution, central tendency, and variability.
• Multivariate Analysis: Exploring relationships between multiple variables simultaneously to uncover complex interactions.
Examples:
• Univariate Analysis: A BI analyst examines each variable individually, such as the distribution of customer ages, to understand the basic characteristics of the customer base.
• Multivariate Analysis: A retail company performs multivariate analysis on sales data, examining how variables like product category, customer location, and purchase time interact to influence overall sales.
Summary
• Exploratory Data Analysis (EDA) is a critical process in data analysis that involves summarizing and visualizing data to uncover patterns, relationships, and anomalies. It helps businesses make informed decisions by providing insights into data distributions and correlations.
• EDA can be performed through univariate, bivariate, and multivariate analyses, using both graphical and non-graphical techniques. It also includes methods like dimensionality reduction and data transformation to enhance data quality and interpretability.
• Overall, EDA is essential for understanding and preparing data for more advanced analyses in Business Intelligence.
