1.3.1. Exploratory Data Analysis
Learning Goals
In this section, we will cover:
- Approaches to conducting Exploratory Data Analysis (EDA)
- EDA techniques
- Sampling from DataFrames
- Producing EDA visualizations
What is Exploratory Data Analysis?
Exploratory data analysis (EDA) is an approach for analyzing
data sets to summarize their main characteristics, often with visual
methods.
Why is EDA Useful?
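EDA gives us an initial feel for the data. It helps determine whether the data is usable as-is or needs further cleaning, and it surfaces patterns and trends that guide hypotheses.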
Techniques for EDA
Summary Statistics:
Average, Median, Min, Max, Correlations, etc.
Visualizations:
Histograms, Scatter Plots, Box Plots, etc.
Tools for EDA
Data Wrangling:
Pandas
Visualization:
Matplotlib, Seaborn
EDA: Job Applicant Summary Statistics
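As a minimal sketch of the summary statistics listed earlier (average, median, min, max, correlations), assume a small applicant table. The column names and values below are hypothetical, invented only for illustration, and are not the data behind the original slide.

```python
import pandas as pd

# Hypothetical applicant records, invented purely for illustration.
applicants = pd.DataFrame({
    "years_experience": [1, 3, 5, 7, 10],
    "interview_score": [62, 70, 75, 85, 90],
    "education": ["BS", "BS", "MS", "MS", "PhD"],
})

numeric = applicants[["years_experience", "interview_score"]]

# Mean, median, min, and max for each numeric column.
summary = numeric.agg(["mean", "median", "min", "max"])

# Pairwise Pearson correlations between the numeric columns.
correlations = numeric.corr()

# Per-group averages, here the mean interview score by education level.
by_education = applicants.groupby("education")["interview_score"].mean()

print(summary)
print(correlations)
print(by_education)
```

`DataFrame.describe()` is a quicker alternative when the default set of statistics (count, mean, standard deviation, quartiles, min, max) is enough.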
Sampling from DataFrames
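A short sketch of random sampling with the Pandas `DataFrame.sample` method. The DataFrame here is a hypothetical stand-in, and the sample sizes and the `random_state` value are arbitrary choices.

```python
import pandas as pd

# Hypothetical data: 100 rows with an id and a derived value.
df = pd.DataFrame({"id": range(100), "value": [i * 0.5 for i in range(100)]})

# Draw 5 rows at random without replacement.
# random_state fixes the seed so the same rows come back on every run.
sample_n = df.sample(n=5, random_state=42)

# Draw a fraction of the rows (10%) instead of a fixed count.
sample_frac = df.sample(frac=0.1, random_state=42)

# Sample with replacement, so the same row may appear more than once.
sample_boot = df.sample(n=100, replace=True, random_state=42)

print(sample_n)
```

Sampled rows keep their original index labels, which makes it easy to trace them back to the full DataFrame.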
Visualization Libraries
- Matplotlib
- Pandas (via Matplotlib)
- Seaborn
  ● Statistically focused plotting methods
  ● Global style preferences that Matplotlib plots also pick up
Basic Scatter Plots with Matplotlib
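A minimal sketch of a basic scatter plot, assuming a small hypothetical DataFrame (the column names `length` and `width` and their values are invented for illustration):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements, invented for illustration.
df = pd.DataFrame({
    "length": [1.4, 1.5, 4.7, 4.5, 6.0, 5.1],
    "width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Create a figure with a single set of axes.
fig, ax = plt.subplots()

# One marker per row: x position from "length", y position from "width".
points = ax.scatter(df["length"], df["width"])
```

Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` to write it to a file.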
Scatter Plots with Multiple Layers
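One way to layer several series on one plot is to call a plotting method more than once on the same axes. The sketch below assumes hypothetical columns `length`, `width_a`, and `width_b`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two measurements that share the same x values.
df = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width_a": [0.5, 0.9, 1.6, 2.1, 2.4],
    "width_b": [1.5, 1.4, 1.1, 0.8, 0.6],
})

fig, ax = plt.subplots()

# Each call adds another layer to the same axes.
# The label text is what the legend displays for that layer.
ax.scatter(df["length"], df["width_a"], label="width_a")
ax.scatter(df["length"], df["width_b"], label="width_b")

# Build the legend from the labels given above.
ax.legend()
```

Matplotlib assigns each layer the next color in its default color cycle, so the two series are distinguishable without extra styling.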
Histograms
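A histogram shows how the values of a single variable are distributed across bins. As a sketch, assuming synthetic normally distributed values and an arbitrary choice of 25 bins:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration: 500 draws from a normal distribution.
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()

# bins sets how many equal-width intervals the value range is split into.
# hist returns the count per bin, the bin edges, and the drawn bars.
counts, bin_edges, patches = ax.hist(values, bins=25)
```

Changing `bins` can noticeably alter the apparent shape of the distribution, so it is worth trying a few values.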
Customizing Plots
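Common customizations include figure size, marker styling, axis labels, and titles. The sketch below shows one possible combination on hypothetical data; the specific colors, sizes, and label text are illustrative choices only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 65, 70, 78, 85],
})

# figsize sets the figure width and height in inches.
fig, ax = plt.subplots(figsize=(8, 4))

# color, marker, s (marker area), and alpha (opacity) style the points.
ax.scatter(df["hours_studied"], df["exam_score"],
           color="tab:red", marker="^", s=60, alpha=0.7)

# Set both axis labels and the title in a single call.
ax.set(xlabel="Hours studied", ylabel="Exam score",
       title="Exam score vs. hours studied")

# Add a dotted background grid.
ax.grid(True, linestyle=":")
```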
Customizing Plots: by Group
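One way to style a plot by group is to loop over a Pandas `groupby` and draw each group as its own layer. This is a sketch under assumed names: the `group`, `x`, and `y` columns are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with a categorical "group" column.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "x": [1.0, 1.5, 3.0, 3.4, 5.0, 5.5],
    "y": [2.0, 2.2, 1.0, 1.3, 3.1, 2.9],
})

fig, ax = plt.subplots()

# groupby yields one (name, sub-DataFrame) pair per distinct group value.
# Drawing each group separately gives it its own color and legend entry.
for name, group in df.groupby("group"):
    ax.scatter(group["x"], group["y"], label=name)

ax.legend(title="group")
```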
Pair Plots for Features
Seaborn Example: Hexbin Plot
Seaborn Example: Facet Grid
Summary
● Exploratory Data Analysis
○ EDA is an approach to analyzing data sets that summarizes their main characteristics, often
using visual methods. It helps you determine if the data is usable as-is, or if it needs further
data cleaning.
○ EDA is also important in the process of identifying patterns, observing trends, and formulating
hypotheses.
○ Common techniques for EDA include computing summary statistics and producing
visualizations.
Learning Recap
In this section, we discussed:
- Approaches to conducting Exploratory Data Analysis (EDA)
- EDA techniques
- Sampling from DataFrames
- Producing EDA visualizations