Lesson 5 Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a critical step in Business Analytics that prepares data for modeling by identifying patterns, anomalies, and relationships among variables. It employs data visualization and statistical techniques to ensure data quality and appropriateness for analysis, ultimately aiding data scientists in achieving valid business outcomes. The document outlines practical steps for conducting EDA using Python, including loading data, identifying null values, and visualizing unique counts.

Uploaded by

Saadie Essie


BUSINESS INTELLIGENCE AND ANALYTICS

Lesson 5: Exploratory data analysis (EDA)

Setting the context
Before you start a Business Analytics project, it’s important to ensure that the
data is ready for modeling work.
o Exploratory Data Analysis (EDA) ensures the readiness of the data for
Business Analytics.
o In fact, EDA makes the data more usable. Without a proper EDA,
Machine Learning work suffers from accuracy issues, and many times the
algorithms won't work.

What is exploratory data analysis?

• Exploratory data analysis (EDA) is used by data scientists to analyze and
investigate data sets and summarize their main characteristics, often employing data
visualization methods.
o It helps determine how best to manipulate data sources to get the
answers you need, making it easier for data scientists to discover
patterns, spot anomalies, test a hypothesis, or check assumptions.

• EDA is primarily used to see what data can reveal beyond the formal modeling
or hypothesis testing task and provides a better understanding of data set
variables and the relationships between them.
• It can also help determine if the statistical techniques you are considering for data
analysis are appropriate.
• Originally developed by American mathematician John Tukey in the 1970s,
EDA techniques continue to be a widely used method in the data discovery
process today.

Why is exploratory data analysis important in Business Analytics?

The main purpose of EDA is to help look at data before making any assumptions. It can
help identify obvious errors, as well as better understand patterns within the data, detect
outliers or anomalous events, and find interesting relations among the variables.

Data scientists can use exploratory analysis to ensure the results they produce are
valid and applicable to any desired business outcomes and goals.
• EDA also helps stakeholders by confirming they are asking the right questions.
• EDA can help answer questions about standard deviations, categorical variables,
and confidence intervals.
• Once EDA is complete and insights are drawn, its features can then be used for
more sophisticated data analysis or modeling, including machine learning.

Programming Language Used

Python: an interpreted, object-oriented programming language with dynamic
semantics. Its high-level, built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for rapid application
development, as well as for use as a scripting or glue language to connect
existing components together.
Python and EDA can be used together to identify missing values in a data set, which
is important so you can decide how to handle missing values for machine learning.

Practical case study to illustrate how to conduct exploratory data analysis (EDA) using Python

Some steps used to investigate data

1. Exploratory Data Analysis - EDA.

2. Load the Data.
3. Basic information about data - EDA.
4. Duplicate values.
5. Summary statistics, i.e., mean, count, standard deviation, etc.
6. Unique values in the data.
7. Visualize the Unique counts.
8. Find the Null values.
9. Replace the Null values.

1. Load the Data

Well, first things first. We will load the Titanic dataset into Python to perform EDA.
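A minimal sketch of the loading step with pandas. The lesson's own code listing is not reproduced in this text, so the file name in the comment and the six hand-typed rows below are assumptions; the rows only imitate the layout of the Titanic data so that the sketch runs without a download.

```python
import io
import pandas as pd

# With a local copy of the dataset you would typically write:
#   df = pd.read_csv("titanic.csv")   # hypothetical file name
# Here a tiny hand-typed sample (not the real data) stands in for the file.
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.head())   # first five rows
print(df.shape)    # (number of rows, number of columns)
```

Empty fields in the CSV are read as missing values (NaN), which is what the later null-value steps look for.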

2. Basic information about data - EDA

The df.info() function gives us the basic information about the dataset. For any data,
it is good to start by knowing its information. Let’s see how it works with our data.

Using this function, you can see the number of non-null values in each column (and
therefore how many are missing), the datatypes, and the memory usage. Descriptive
statistics such as the mean and standard deviation come from df.describe().
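The two calls can be sketched as follows. The small hand-typed sample is an assumption standing in for the real Titanic file, so the numbers it prints are not those of the actual dataset.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Column names, non-null counts, dtypes and memory usage.
df.info()

# Summary statistics for the numeric columns:
# count, mean, std, min, quartiles and max.
summary = df.describe()
print(summary)
```

Note that the count row of describe() counts only non-missing values, so it is lower for a column such as Age that contains nulls.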

3. Duplicate values

You can use df.duplicated().sum() to count the duplicate rows, if any are present.
It returns the number of duplicated rows in the data.

Well, the function returned ‘0’. This means there is not a single duplicate row present
in our dataset, and that is a very good thing to know.
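A short sketch of the duplicate check. The sample rows are hand-typed stand-ins (an assumption), and the extra copied row is added only to show what a non-zero result looks like.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# duplicated() marks every row that repeats an earlier row;
# summing the boolean result counts those rows.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)

# For contrast: append a copy of the first row and count again.
with_copy = pd.concat([df, df.iloc[[0]]], ignore_index=True)
n_duplicates_with_copy = int(with_copy.duplicated().sum())
print(n_duplicates_with_copy)
```

If duplicates do turn up, df.drop_duplicates() returns a copy of the data without them.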

4. Unique values in the data

You can find the unique values in a particular column using the unique()
function in pandas.

array([3, 1, 2], dtype=int64)

array([0, 1], dtype=int64)

array(['male', 'female'], dtype=object)

The unique() function has returned the unique values which are present in the data, and
it is pretty cool!
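The outputs above are consistent with calls like the ones sketched here. The column names Pclass, Survived and Sex are assumptions based on the usual Titanic layout, and the sample rows are hand-typed stand-ins.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# unique() lists the distinct values in order of first appearance.
print(df["Pclass"].unique())
print(df["Survived"].unique())
print(df["Sex"].unique())

# nunique() returns only how many distinct values there are.
n_classes = df["Pclass"].nunique()
print(n_classes)
```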

5. Visualize the Unique counts

Yes, you can visualize the unique values present in the data. For this, we will be using
the seaborn library. You have to call the sns.countplot() function and specify the
variable to plot the count plot.
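A minimal count plot sketch. The choice of the Survived column, the hand-typed sample rows, and the off-screen "Agg" backend are assumptions made so the example runs anywhere; in a notebook you would normally drop the backend line.

```python
import io
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import pandas as pd
import seaborn as sns

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# One bar per distinct value of the column; bar height = number of rows.
ax = sns.countplot(x="Survived", data=df)
ax.set_title("Passengers by survival status")

# The same counts as numbers, for checking the plot.
counts = df["Survived"].value_counts()
print(counts)
```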

6. Find the Null values

Finding the null values is one of the most important steps in EDA, because ensuring
the quality of data is paramount.

We have some null values in the ‘Age’ and ‘Cabin’ variables.
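One common way to get per-column null counts is isnull().sum(), sketched below. The sample rows are hand-typed stand-ins (an assumption) that were given missing Age and Cabin entries to mirror the finding above; the counts are not those of the real dataset.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# isnull() gives True for each missing cell; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)

# Keep only the columns that actually have missing values.
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```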

7. Replace the Null values

Hey, we have a replace() function to replace all the null values with a specific value. It is
really handy!

It is very easy to find and replace the null values in the data as shown. I have used 0 to
replace null values. You can even opt for more meaningful methods such as mean or
median.
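A sketch of both options: replacing nulls with 0 via replace(), and filling with the median via fillna(). It is shown on the numeric Age column only, which is a choice made here (0 is not a meaningful cabin label); the sample rows are hand-typed stand-ins.

```python
import io
import numpy as np
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Option 1: replace every missing Age with the constant 0.
age_zero = df["Age"].replace(np.nan, 0)

# Option 2: fill missing Ages with the column median instead,
# which keeps the filled value inside the observed range.
age_median = df["Age"].fillna(df["Age"].median())

print(age_zero.tolist())
print(age_median.tolist())
```

Filling with 0 pulls the mean age down, while the median leaves the centre of the distribution roughly where it was, which is why it is often the more meaningful choice.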

8. Know the datatypes

Knowing the datatypes which you are exploring is very important and an easy process
too. Let’s see how it works.

Use the dtypes attribute of the DataFrame for this, and you will get the datatype of
each attribute.
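A minimal sketch of the datatype check, again on hand-typed stand-in rows (an assumption). The exact label printed for text columns can differ between pandas versions.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# dtypes is an attribute (no parentheses): one dtype per column.
column_types = df.dtypes
print(column_types)

# Helper checks are handy when later steps depend on the type.
pclass_is_int = pd.api.types.is_integer_dtype(df["Pclass"])
age_is_float = pd.api.types.is_float_dtype(df["Age"])
print(pclass_is_int, age_is_float)
```

Age is read as a floating-point column because an integer column cannot hold the missing value.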

9. Filter the Data

Yes, you can filter the data based on some logic.

The above code has returned only data values that belong to class 1.
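A filter of this kind is usually written as a boolean mask, as sketched below. The column name Pclass and the hand-typed sample rows are assumptions.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The comparison builds a True/False mask; indexing with it keeps the True rows.
first_class = df[df["Pclass"] == 1]
print(first_class)

# Conditions can be combined with & (and) and | (or), each in parentheses.
first_class_women = df[(df["Pclass"] == 1) & (df["Sex"] == "female")]
print(len(first_class_women))
```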

10. A quick box plot

You can create a box plot for any numerical column using a single line of code.
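One way to do this is the boxplot() method that pandas provides, sketched here for a Fare column. The column choice, the hand-typed sample rows, and the off-screen "Agg" backend are assumptions.

```python
import io
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The single line: box = quartiles, middle line = median,
# points beyond the whiskers = candidate outliers.
ax = df.boxplot(column="Fare")
```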

11. Correlation Plot - EDA

Finally, to find the correlation among the variables, we can make use of the corr()
function. This will give you a fair idea of the correlation strength between different
variables.

This is the correlation matrix, with values ranging from +1 to -1, where +1 is a perfect
positive correlation, -1 is a perfect negative correlation, and values near 0 indicate
little or no linear relationship.
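A sketch of computing the matrix with pandas. Restricting to numeric columns first is a choice made here so the call behaves the same across pandas versions; the sample rows are hand-typed stand-ins, so the coefficients are illustrative only.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Correlation is defined for numbers only, so drop the text columns first.
numeric = df.select_dtypes(include="number")

# Pearson correlation between every pair of numeric columns.
corr = numeric.corr()
print(corr.round(2))
```

The diagonal is always 1 because every variable is perfectly correlated with itself, and the matrix is symmetric.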

12. seaborn library

You can even visualize the correlation matrix using the seaborn library.
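The text does not say which seaborn function is meant; a heatmap is the usual choice, so the sketch below assumes sns.heatmap(). The sample rows, the colour map and the off-screen "Agg" backend are also assumptions.

```python
import io
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import pandas as pd
import seaborn as sns

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
corr = df.select_dtypes(include="number").corr()

# annot=True writes each coefficient in its cell;
# vmin/vmax pin the colour scale to the full -1..+1 range.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation matrix")
```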

Exploratory Data Analysis – EDA Summary

• EDA is applied to investigate the data and summarize the key
insights.
• It will give you a basic understanding of your data, its
distribution, null values and much more.
• You can either explore data using graphs or through some Python
functions.
• There are two types of analysis: univariate and bivariate. In the
univariate, you will be analyzing a single attribute. But in the
bivariate, you will be analyzing an attribute with the target attribute.
• In the non-graphical approach,
o you will be using functions such as shape, summary, describe,
isnull, info, dtypes and more.
• In the graphical approach,
o you will be using plots such as scatter, box, bar, density and
correlation plots.
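The univariate and bivariate distinction above can be sketched non-graphically as follows. Treating Survived as the target attribute is an assumption, and the hand-typed sample rows are stand-ins, so the rates printed say nothing about the real dataset.

```python
import io
import pandas as pd

# Hand-typed stand-in for the Titanic data (assumption, not the real file).
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare,Cabin
0,3,male,22,7.25,
1,1,female,38,71.28,C85
1,3,female,26,7.92,
1,1,female,35,53.10,C123
0,3,male,,8.05,
0,2,male,54,26.00,
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Univariate: one attribute on its own.
age_summary = df["Age"].describe()
print(age_summary)

# Bivariate: one attribute against the target.
# The mean of a 0/1 column is the share of 1s, here the survival rate.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```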

Revision Questions Exploratory Data Analysis (EDA)

1. What is the difference between univariate, bivariate, and multivariate
analysis in EDA?

2. During the data preprocessing step, how should one treat missing/null
values? How will you deal with them?

3. What is an outlier, and how do you identify one?
