1.5 Data Analysis with Python- Exploratory Data Analysis 1

Uploaded by

namdudotran1

Exploratory Data Analysis

Objectives
• Exploratory Data Analysis
• Implement descriptive statistics
• Demonstrate the basics of grouping


Exploratory Data Analysis (EDA)
• EDA is an approach to analyzing data in order to:
  • Summarize the main characteristics of the data
  • Gain a better understanding of the data set
  • Uncover relationships between different variables
  • Extract the important variables for the problem we're trying to solve

• The main question for the car price prediction problem:
  • Which characteristics have the most impact on the car price?


Descriptive Statistics
• It's important to explore your data first, before you spend time building complicated models.
• Descriptive statistical analysis helps to describe the basic features of a data set.
• df.describe(): computes summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns.
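As a minimal sketch of df.describe(), assuming pandas is installed: the small table below is made up for illustration, and its column names ("drive-wheels", "engine-size", "price") are assumptions, not the actual car data set.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd", "fwd", "4wd", "rwd"],
    "engine-size": [100, 150, 110, 120, 200],
    "price": [8000.0, 16000.0, 9000.0, 10000.0, 30000.0],
})

# describe() reports count, mean, std, min, quartiles and max for each
# numeric column; non-numeric columns are skipped by default.
summary = df.describe()
print(summary)
```

The text column "drive-wheels" does not appear in the result, which is why categorical data needs a different summary.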


Descriptive Statistics
• value_counts(): summarizes categorical data by counting how often each category occurs.
• For the drive wheels feature, the counts are:
  • 118 cars in the front-wheel drive category
  • 75 cars in the rear-wheel drive category
  • 8 cars in the four-wheel drive category
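A short sketch of value_counts() on a categorical column. The six rows and the column name "drive-wheels" are assumptions chosen for illustration, so the counts here are not the ones quoted above.

```python
import pandas as pd

# Hypothetical toy column; real data would be loaded from a file.
df = pd.DataFrame({"drive-wheels": ["fwd", "rwd", "fwd", "4wd", "rwd", "fwd"]})

# value_counts() is a Series method: it counts the occurrences of each
# category and sorts them from most to least frequent.
counts = df["drive-wheels"].value_counts()

# Optional: turn the result into a small, clearly labeled table.
counts_df = counts.rename("value_counts").to_frame()
counts_df.index.name = "drive-wheels"
print(counts_df)
```

Note that value_counts() is called on a single column (a Series), not on the whole data frame.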


Descriptive Statistics
• Box plots are a great way to visualize numeric data: they show the median, the quartiles, and any outliers at a glance.
• Give your opinion about this chart!


Descriptive Statistics
• Box plots make it easy to compare between groups.
• We can see how price is distributed within each category of the drive wheels feature.
• The price distribution for rear-wheel drive is distinct from those of the other categories.
• The price distributions for front-wheel drive and four-wheel drive are almost indistinguishable.
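One possible way to draw such a grouped box plot is sketched below with pandas and matplotlib (a swapped-in choice; the slides do not say which plotting library was used). The data, the column names, and the output file name are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the prices are invented for illustration only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "price": [7000.0, 8000.0, 9500.0, 15000.0, 22000.0, 35000.0, 8000.0, 10000.0],
})

# One box per drive-wheels category, showing the spread of price in each.
fig, ax = plt.subplots()
df.boxplot(column="price", by="drive-wheels", ax=ax)
ax.set_ylabel("price")
fig.savefig("price_by_drive_wheels.png")

# The same comparison in numbers: the median price of each category.
medians = df.groupby("drive-wheels")["price"].median()
print(medians)
```

Comparing the medians alongside the plot is a quick check on what the boxes appear to show.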


Descriptive Statistics
• What if we want to understand the relationship between engine size and price? Could engine size possibly predict the price of a car?
• One good way to visualize this is with a scatter plot.
• Each observation is represented as a point, so the plot shows the relationship between two variables.
• The predictor variable is the variable you use to predict an outcome; here it is engine size, conventionally plotted on the x-axis.
• The target variable is the variable you are trying to predict; here it is price, conventionally plotted on the y-axis.
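A minimal scatter plot sketch using matplotlib, with the predictor on the x-axis and the target on the y-axis. The five data points, the column names, and the output file name are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "engine-size": [90, 100, 120, 150, 200],
    "price": [7000.0, 8500.0, 11000.0, 16000.0, 30000.0],
})

fig, ax = plt.subplots()
# Predictor (engine size) on the x-axis, target (price) on the y-axis.
ax.scatter(df["engine-size"], df["price"])
ax.set_title("Scatter plot of engine size vs. price")
ax.set_xlabel("engine-size")
ax.set_ylabel("price")
fig.savefig("engine_size_vs_price.png")
```

Labeling both axes matters here, because the plot is only readable if it is clear which variable is the predictor.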


Descriptive Statistics

• The scatter plot shows a linear relationship between these two variables.
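A visual impression of linearity can be checked numerically. The sketch below computes the Pearson correlation and fits a straight line with NumPy; this is one possible check, not necessarily the one used in the course, and the data and column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "engine-size": [90, 100, 120, 150, 200],
    "price": [7000.0, 8500.0, 11000.0, 16000.0, 30000.0],
})

# Pearson correlation: values near +1 or -1 indicate a strong linear
# relationship, values near 0 indicate a weak one.
r = df["engine-size"].corr(df["price"])

# Least-squares straight line: price is approximately slope * engine-size + intercept.
slope, intercept = np.polyfit(df["engine-size"], df["price"], deg=1)

print(f"correlation: {r:.3f}")
print(f"fitted line: price = {slope:.1f} * engine-size + {intercept:.1f}")
```

A positive slope together with a correlation close to 1 supports using engine size as a predictor of price in this toy example.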


Group By in Python
• Problem:
  • Is there any relationship between the different types of drive system (front-, rear-, and four-wheel drive) and the price of the vehicles?
  • If so, which type of drive system adds the most value to a vehicle?

• Students give their answers.


Group By in Python
• Problem:
  • Is there any relationship between the different types of drive system (front-, rear-, and four-wheel drive) and the price of the vehicles?
  • If so, which type of drive system adds the most value to a vehicle?

• Solution: we could group all the data by the different types of drive wheels and compare the results for these groups against each other.
• df.groupby(): groups data
  • Can be applied to categorical variables
  • Groups data into categories
  • Accepts a single variable or multiple variables
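The grouping described above can be sketched as follows, once with a single variable and once with two. The data and the column names ("drive-wheels", "body-style", "price") are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd", "4wd"],
    "body-style": ["sedan", "hatchback", "sedan", "hatchback", "sedan", "hatchback"],
    "price": [9000.0, 8000.0, 21000.0, 15000.0, 12000.0, 8000.0],
})

# Single grouping variable: average price per drive-wheels category.
by_drive = df.groupby("drive-wheels", as_index=False)["price"].mean()
print(by_drive)

# Multiple grouping variables: average price per
# (drive-wheels, body-style) combination.
by_drive_body = df.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()
print(by_drive_body)
```

Passing as_index=False keeps the grouping variables as ordinary columns, which makes the result easier to reshape afterwards.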



Group By in Python
• pivot(): reshapes the grouped data so that one variable is displayed along the columns and the other variable is displayed along the rows.
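A small sketch of pivot() applied to already-grouped data. The averaged prices and the column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical grouped result: one average price per
# (drive-wheels, body-style) pair. The values are invented.
grouped = pd.DataFrame({
    "drive-wheels": ["4wd", "4wd", "fwd", "fwd", "rwd", "rwd"],
    "body-style": ["hatchback", "sedan", "hatchback", "sedan", "hatchback", "sedan"],
    "price": [8000.0, 12000.0, 8000.0, 9000.0, 15000.0, 21000.0],
})

# drive-wheels goes along the rows, body-style along the columns,
# and each cell holds the corresponding average price.
pivoted = grouped.pivot(index="drive-wheels", columns="body-style", values="price")
print(pivoted)
```

pivot() requires each (row, column) pair to occur at most once, which is why it is applied after grouping rather than to the raw data.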


Group By in Python
• A heat map takes a rectangular grid of data and assigns a color intensity based on the data value at each grid point.
• It is a great way to plot the target variable against multiple variables and so get visual clues about the relationship between these variables and the target.
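One way to draw such a heat map is with matplotlib's pcolor, sketched below on a small pivot table. The table values, the "RdBu" color map, and the output file name are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivot table of average prices; the values are invented.
pivoted = pd.DataFrame(
    [[8000.0, 12000.0], [8000.0, 9000.0], [15000.0, 21000.0]],
    index=pd.Index(["4wd", "fwd", "rwd"], name="drive-wheels"),
    columns=pd.Index(["hatchback", "sedan"], name="body-style"),
)

fig, ax = plt.subplots()
# Each cell of the grid is colored according to its average price.
mesh = ax.pcolor(pivoted.to_numpy(), cmap="RdBu")
fig.colorbar(mesh, ax=ax, label="average price")

# Put the tick labels at the center of each cell.
ax.set_xticks([i + 0.5 for i in range(pivoted.shape[1])])
ax.set_xticklabels(pivoted.columns)
ax.set_yticks([i + 0.5 for i in range(pivoted.shape[0])])
ax.set_yticklabels(pivoted.index)
ax.set_xlabel("body-style")
ax.set_ylabel("drive-wheels")
fig.savefig("price_heatmap.png")
```

Replacing the default numeric ticks with category names makes it easier to see which combination each colored cell represents.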


Group By in Python
• Each type of body style is numbered along the x-axis and each type of drive wheels is numbered along the y-axis.
• The average prices are plotted in varying colors according to their values.
• The top section of the heat map seems to have higher prices than the bottom section.


Summary
• Exploratory Data Analysis
• Implement descriptive statistics
• Demonstrate the basics of grouping


Q&A
