Exploratory Data Analysis
Objectives
Exploratory Data Analysis
Implement descriptive statistics
Demonstrate the basics of grouping
Exploratory Data Analysis (EDA)
EDA is an approach to analyzing data in order to:
Summarize the main characteristics of the data
Gain a better understanding of the data set
Uncover relationships between different variables
Extract the important variables for the problem we're trying to solve
The main question for the car price prediction problem:
Which characteristics have the most impact on the car price?
Descriptive Statistics
It’s important to explore your data first, before you spend time building complicated models.
Descriptive statistical analysis helps to describe the basic features of a data set.
df.describe(): computes summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for each numeric column
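A minimal sketch of df.describe(), assuming a small hypothetical DataFrame with made-up values (not the course data set); the column names "engine-size" and "price" are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with made-up values; column names are assumptions.
df = pd.DataFrame({
    "engine-size": [100, 120, 140, 160, 180],
    "price": [10000, 12000, 15000, 18000, 25000],
})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max
# for every numeric column; missing values are excluded.
summary = df.describe()
print(summary)
```

By default only numeric columns are summarized; passing include="all" also summarizes non-numeric columns (count, unique, top, freq).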
Descriptive Statistics
value_counts(): summarizes categorical data by counting how many rows fall in each category
In the car data set:
118 cars are in the front-wheel drive category
75 cars are in the rear-wheel drive category
8 cars are in the four-wheel drive category
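A small sketch of value_counts() on a hypothetical column; the column name "drive-wheels" and the labels "fwd", "rwd", "4wd" are assumptions, and the six rows are made up rather than taken from the data set above.

```python
import pandas as pd

# Hypothetical drive-wheel labels; column name and labels are assumptions.
df = pd.DataFrame({"drive-wheels": ["fwd", "rwd", "fwd", "4wd", "fwd", "rwd"]})

# Count the rows in each category; the result is sorted from most to least frequent.
counts = df["drive-wheels"].value_counts()
print(counts)

# normalize=True returns the share of rows in each category instead of raw counts.
shares = df["drive-wheels"].value_counts(normalize=True)
print(shares)
```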
Descriptive Statistics
Box plots are a great way to visualize numeric data: they show the median, the upper and lower quartiles, and any outliers.
Give your opinion about this chart!
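The numbers behind a box plot can be computed directly. This sketch uses hypothetical prices and the common 1.5 × IQR whisker rule (matplotlib's default, whis=1.5); it computes the statistics only and does not draw the plot.

```python
import pandas as pd

# Hypothetical prices; the last value is deliberately large to act as an outlier.
price = pd.Series([9000, 10000, 11000, 12000, 13000, 14000, 40000])

q1 = price.quantile(0.25)      # bottom edge of the box
median = price.median()        # line inside the box
q3 = price.quantile(0.75)      # top edge of the box
iqr = q3 - q1                  # interquartile range: the height of the box

# Whiskers reach the most extreme points inside these limits;
# points beyond them are drawn individually as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = price[(price < lower_limit) | (price > upper_limit)]
print(q1, median, q3, outliers.tolist())
```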
Descriptive Statistics
Box plots make it easy to compare groups.
We can see the distribution of price for each category of the drive-wheels feature.
The price distribution for rear-wheel drive is distinct from those of the other categories.
The price distributions for front-wheel drive and four-wheel drive are almost indistinguishable.
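A grouped box plot draws one box per category, so comparing groups amounts to comparing per-group quartiles. This sketch computes them on hypothetical data; the column names "drive-wheels" and "price" and all values are assumptions.

```python
import pandas as pd

# Hypothetical data; column names, labels and prices are assumptions.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "price": [8000, 9000, 10000, 20000, 25000, 30000, 9000, 10000],
})

# Lower quartile, median and upper quartile of price for each drive-wheel type:
# the box edges and middle line of each box in a grouped box plot.
quartiles = df.groupby("drive-wheels")["price"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```

If seaborn is installed, sns.boxplot(x="drive-wheels", y="price", data=df) would draw the grouped box plot itself.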
Descriptive Statistics
What if we want to understand the relationship between engine size and price? Could engine size possibly predict the price of a car?
One good way to visualize this is a scatter plot.
Each observation in the scatter plot is represented as a point.
The plot shows the relationship between two variables.
The predictor variable is the variable you use to predict an outcome (here, engine size, conventionally plotted on the x-axis).
The target variable is the variable you are trying to predict (here, price, conventionally plotted on the y-axis).
Descriptive Statistics
The scatter plot suggests a linear relationship between these two variables.
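One way to put a number on how linear a scatter plot looks is the Pearson correlation, optionally with a fitted least-squares line. This is a sketch on hypothetical, roughly linear data; the column names are assumptions, and correlation is an added illustration rather than a step from these slides.

```python
import numpy as np
import pandas as pd

# Hypothetical engine sizes and prices, constructed to be roughly linear.
df = pd.DataFrame({
    "engine-size": [90, 110, 130, 150, 170, 190],
    "price": [7000, 9500, 12500, 15000, 18500, 21000],
})

# Pearson correlation: close to 1 for a strong increasing linear relationship,
# close to -1 for a strong decreasing one, near 0 for no linear relationship.
r = df["engine-size"].corr(df["price"])

# Least-squares line: price is approximately slope * engine-size + intercept.
slope, intercept = np.polyfit(df["engine-size"], df["price"], deg=1)
print(round(r, 3), round(slope, 1), round(intercept, 1))
```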
Group By in Python
Problem:
Is there a relationship between the type of drive system (front-wheel, rear-wheel, or four-wheel drive) and the price of the vehicle?
If so, which type of drive system adds the most value to a vehicle?
Students give their answers.
Solution: we could group the data by the type of drive wheels and compare the results for each type against each other.
df.groupby(): groups data
Can be applied to categorical variables
Groups data into categories
Can group by a single variable or by multiple variables
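A minimal sketch of grouping by one and by two variables, assuming hypothetical columns "drive-wheels", "body-style" and "price" with made-up values; the average (mean) is used here as the comparison statistic.

```python
import pandas as pd

# Hypothetical data; column names, labels and prices are assumptions.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd", "fwd"],
    "body-style": ["sedan", "hatchback", "sedan", "convertible", "wagon", "sedan"],
    "price": [10000, 8000, 20000, 30000, 12000, 12000],
})

# Single variable: average price for each drive-wheel type.
by_drive = df.groupby("drive-wheels")["price"].mean()
print(by_drive)

# Multiple variables: average price for each (drive-wheels, body-style) combination.
# as_index=False keeps the group keys as ordinary columns.
by_drive_body = df.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()
print(by_drive_body)
```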
Group By in Python
pivot(): reshapes grouped data into a table, with one variable displayed along the columns and the other variable displayed along the rows
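A sketch of pivoting a two-variable grouped result into a table, assuming a hypothetical grouped DataFrame with one row per (drive-wheels, body-style) pair; the names and values are assumptions.

```python
import pandas as pd

# Hypothetical grouped result: one row per (drive-wheels, body-style) pair.
grouped = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd"],
    "body-style": ["hatchback", "sedan", "convertible", "sedan"],
    "price": [8000, 11000, 30000, 20000],
})

# drive-wheels along the rows, body-style along the columns, average price in the cells.
table = grouped.pivot(index="drive-wheels", columns="body-style", values="price")

# Combinations with no data (e.g. fwd convertible here) appear as NaN.
print(table)
```

pivot() expects each (index, columns) pair to appear at most once, which is why it is applied to the already-grouped result.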
Group By in Python
A heat map takes a rectangular grid of data and assigns a color intensity based on the data value at each grid point.
It is a great way to plot the target variable over multiple variables and, through this, get visual clues about the relationship between these variables and the target.
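The value-to-color step can be sketched without plotting: with a linear color scale (matplotlib's default normalization), the smallest value maps to 0 and the largest to 1. The pivot table below is hypothetical; actually drawing it would use a plotting call such as matplotlib's pcolor or imshow with a colormap.

```python
import pandas as pd

# Hypothetical pivot table of average prices (rows: drive-wheels, columns: body-style).
table = pd.DataFrame(
    {"hatchback": [8000, 14000], "sedan": [11000, 20000]},
    index=["fwd", "rwd"],
)

# Linear normalization: each cell's color intensity is its position
# between the smallest and the largest value in the grid.
vmin = table.min().min()
vmax = table.max().max()
intensity = (table - vmin) / (vmax - vmin)
print(intensity)
```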
Group By in Python
Each body style is numbered along the x-axis and each drive-wheel type is numbered along the y-axis.
The average prices are plotted with colors that vary according to their values.
The top section of the heat map seems to have higher prices than the bottom section.
Summary
Exploratory Data Analysis
Implement descriptive statistics
Demonstrate the basics of grouping
Q&A