Exploratory Data Analysis
Objectives
Exploratory Data Analysis
Implement descriptive statistics
Demonstrate the basics of grouping
Exploratory Data Analysis (EDA)
EDA is an approach to analyzing data in order to summarize the main characteristics of the data, gain a better understanding of the data set, uncover relationships between different variables, and extract the important variables for the problem we are trying to solve.
The main question for the car price prediction problem:
What are the characteristics that have the most impact on the car price?
Descriptive Statistics
It is important to explore your data first, before you spend time building complicated models. Descriptive statistical analysis helps to describe the basic features of a data set.
df.describe(): computes summary statistics for the columns of a DataFrame.
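As a minimal sketch of what df.describe() reports, the snippet below runs it on a small made-up table. The column names and all values are illustrative assumptions, not the car data set used in the slides.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "engine-size": [130, 152, 109, 136, 131],
    "price": [13495.0, 16500.0, 13950.0, 17450.0, 15250.0],
    "drive-wheels": ["rwd", "rwd", "fwd", "4wd", "fwd"],
})

# By default describe() covers only the numeric columns and reports
# count, mean, std, min, the three quartiles and max for each of them.
summary = df.describe()
print(summary)

# include="all" adds the non-numeric columns as well
# (count, number of unique values, most frequent value and its frequency).
full_summary = df.describe(include="all")
print(full_summary)
```

The default call silently skips the text column, which is why categorical columns are usually summarized separately.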
Descriptive Statistics
value_counts(): summarizes categorical data by counting how often each category occurs.
Summary of the categorical data for drive wheels:
118 cars are in the front-wheel drive category, 75 cars are in the rear-wheel drive category, and 8 cars are in the four-wheel drive category.
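The counts above can be reproduced with a short sketch. The column below is synthetic, rebuilt only from the three counts quoted on the slide; the column name "drive-wheels" and the labels "fwd", "rwd" and "4wd" are assumptions.

```python
import pandas as pd

# Synthetic column rebuilt from the quoted counts (118, 75, 8).
# The column name and the category labels are assumptions.
df = pd.DataFrame({"drive-wheels": ["fwd"] * 118 + ["rwd"] * 75 + ["4wd"] * 8})

# value_counts() is a Series method; it returns the number of rows
# per category, sorted from most to least frequent.
counts = df["drive-wheels"].value_counts()
print(counts)

# Optional: turn the result into a small table with readable column names.
counts_df = counts.rename_axis("drive-wheels").reset_index(name="value_counts")
print(counts_df)
```

Note that value_counts() is called on a single column (a Series), not on the whole DataFrame as describe() is.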
Descriptive Statistics
Box plots are a great way to visualize numeric data.
Give your opinion about this chart!
Descriptive Statistics
Box plots make it easy to compare groups. Here we can see the distribution of price for each category of the drive-wheels feature. The price distribution for rear-wheel drive is distinct from those of the other categories, while the price distributions for front-wheel drive and four-wheel drive are almost indistinguishable.
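A grouped box plot like the one described can be sketched with pandas and matplotlib. The prices below are invented to mimic the pattern in the text (rear-wheel drive higher, the other two similar); they are not taken from the real data set, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical prices, chosen only to illustrate the plot.
df = pd.DataFrame({
    "drive-wheels": ["fwd"] * 4 + ["rwd"] * 4 + ["4wd"] * 4,
    "price": [7000, 8500, 9200, 11000,
              15000, 18000, 22000, 30000,
              7600, 8900, 9800, 11500],
})

# One box per drive-wheels category, with price on the y-axis.
fig, ax = plt.subplots()
df.boxplot(column="price", by="drive-wheels", ax=ax)
ax.set_ylabel("price")
fig.savefig("price_by_drive_wheels.png")

# The line in the middle of each box is the group median.
medians = df.groupby("drive-wheels")["price"].median()
print(medians)
```

Printing the medians next to the plot gives a quick numeric check of what the boxes suggest visually.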
Descriptive Statistics
What if we want to understand the relationship between engine size and price? Could engine size possibly predict the price of a car? One good way to visualize this is a scatter plot. Each observation is represented as a point, so the plot shows the relationship between two variables. The predictor variable is the variable you use to predict an outcome. The target variable is the variable you are trying to predict. Here engine size is the predictor and price is the target.
Descriptive Statistics
The scatter plot shows a linear relationship between these two variables, engine size and price.
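A scatter plot of a predictor against a target can be sketched as follows. The six data points are made up for illustration, and the Pearson correlation at the end is an addition that the slides do not mention; it is shown only as one simple numeric check of how linear the relationship is.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical observations; values are illustrative only.
df = pd.DataFrame({
    "engine-size": [90, 110, 130, 150, 180, 210],
    "price": [7000, 9500, 13000, 16000, 22000, 30000],
})

# Predictor on the x-axis, target on the y-axis, one point per observation.
fig, ax = plt.subplots()
ax.scatter(df["engine-size"], df["price"])
ax.set_xlabel("engine-size")
ax.set_ylabel("price")
ax.set_title("Scatter plot of engine size vs. price")
fig.savefig("engine_size_vs_price.png")

# Pearson correlation: values near +1 or -1 indicate a strong linear relationship.
r = df["engine-size"].corr(df["price"])
print(r)
```

Labeling both axes matters here, because the plot is only readable if it is clear which variable is the predictor and which is the target.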
Group By in Python
Problem: Is there any relationship between the type of drive system (front-wheel, rear-wheel, or four-wheel drive) and the price of the vehicles? If so, which type of drive system adds the most value to a vehicle?
Student gives the answers
Group By in Python
Solution: We could group all the data by the type of drive wheels and compare the results for the different groups against each other.
df.groupby(): groups data. It can be applied to categorical variables, groups the data into categories, and accepts a single variable or multiple variables.
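The grouping described above can be sketched like this. The table is a hypothetical stand-in for the car data; the column names, labels and prices are assumptions, and the mean is used as the summary because the later slides speak of average prices.

```python
import pandas as pd

# Hypothetical stand-in for the car data; values are illustrative only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd", "4wd"],
    "body-style": ["sedan", "hatchback", "sedan", "hatchback", "sedan", "hatchback"],
    "price": [9000.0, 8000.0, 21000.0, 14000.0, 12000.0, 8000.0],
})

# Group by a single categorical variable and average the price per group.
by_drive = df.groupby("drive-wheels", as_index=False)["price"].mean()
print(by_drive)

# Group by two variables: one row per (drive-wheels, body-style) combination.
by_drive_body = df.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()
print(by_drive_body)
```

Passing as_index=False keeps the grouping variables as ordinary columns, which makes the result easier to reshape afterwards.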
Group By in Python
pivot(): reshapes a table so that one variable is displayed along the columns and the other variable is displayed along the rows.
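As a minimal sketch, the snippet below pivots a small table of group averages. The table is hypothetical: it stands in for the output of a two-variable groupby, and the column names and prices are assumptions.

```python
import pandas as pd

# Hypothetical long-format table of average prices,
# one row per (drive-wheels, body-style) combination.
grouped = pd.DataFrame({
    "drive-wheels": ["4wd", "4wd", "fwd", "fwd", "rwd", "rwd"],
    "body-style": ["hatchback", "sedan", "hatchback", "sedan", "hatchback", "sedan"],
    "price": [8000.0, 12000.0, 8000.0, 9000.0, 14000.0, 21000.0],
})

# drive-wheels goes along the rows, body-style along the columns,
# and each cell holds the corresponding price.
pivoted = grouped.pivot(index="drive-wheels", columns="body-style", values="price")
print(pivoted)
```

pivot() expects each row and column combination to occur only once, which holds here because the data was already aggregated per group.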
Group By in Python
A heat map takes a rectangular grid of data and assigns a color intensity based on the data value at each grid point. It is a great way to plot the target variable over multiple variables and, through this, get visual clues about the relationship between these variables and the target.
Group By in Python
Each type of body style is numbered along the x-axis and each type of drive wheels is numbered along the y-axis. The average prices are plotted in varying colors based on their values. The top section of the heat map seems to have higher prices than the bottom section.
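A heat map of a pivoted table can be sketched with matplotlib's pcolor. The grid below is hypothetical, the color map "RdBu" is an arbitrary choice, and the category names are written on the axes instead of numbers to make the sketch easier to read.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivot table of average prices; values are illustrative only.
pivoted = pd.DataFrame(
    [[8000.0, 12000.0], [8000.0, 9000.0], [14000.0, 21000.0]],
    index=pd.Index(["4wd", "fwd", "rwd"], name="drive-wheels"),
    columns=pd.Index(["hatchback", "sedan"], name="body-style"),
)

fig, ax = plt.subplots()
# Each cell of the grid is colored according to its value.
mesh = ax.pcolor(pivoted.to_numpy(), cmap="RdBu")
fig.colorbar(mesh, ax=ax, label="average price")

# Put one tick at the center of each cell and label it with the category.
ax.set_xticks([i + 0.5 for i in range(pivoted.shape[1])])
ax.set_xticklabels(list(pivoted.columns))
ax.set_yticks([i + 0.5 for i in range(pivoted.shape[0])])
ax.set_yticklabels(list(pivoted.index))
ax.set_xlabel("body-style")
ax.set_ylabel("drive-wheels")
fig.savefig("price_heatmap.png")
```

pcolor draws the first row of the grid at the bottom of the plot, so the row order of the pivot table decides which category ends up at the top.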
Summary
Exploratory Data Analysis
Implement descriptive statistics
Demonstrate the basics of grouping