Eda Expt

Uploaded by

mohsinzari468

Experiment No. 01

Aim: To Perform Exploratory Data Analysis (EDA) on automobile data

Prerequisites: Automobile Data, Jupyter Notebook, Google Colab

Theory:

Exploratory Data Analysis (EDA) is an analysis approach that identifies general patterns in
the data, including outliers and features of the data that might be unexpected. It is an
important first step in any data analysis: an initial investigation of the data to discover
patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary
statistics and graphical representations.

Data analytics in the automotive industry enhances supply chain management by providing
insights into the entire supply chain, from raw material suppliers to finished vehicle
dealerships. This information can be used to improve efficiency, reduce costs, and mitigate
risks. An automobile dataset typically includes information about various types of vehicles,
such as cars, trucks, and motorcycles, and may record the make, model, year, and manufacturer
of each vehicle.

Types of EDA:

There are three types of EDA, viz.:

1. Univariate Analysis

2. Bivariate Analysis

3. Multivariate Analysis

Types of Exploratory Data Analysis (EDA)

1. Univariate Analysis
Definition: Focuses on analyzing a single variable at a time.
Purpose: To understand the variable's distribution, central tendency, and spread.
Techniques: Descriptive statistics (mean, median, mode, variance, standard deviation).
Visualizations (histograms, box plots, bar charts, pie charts).

2. Bivariate Analysis

Definition: Examines the relationship between two variables.

Purpose: To understand how one variable affects or is associated with another.
Techniques: Correlation coefficients (Pearson, Spearman).
Cross-tabulations and contingency tables.
Visualizations (line plots, scatter plots, pair plots).

3. Multivariate Analysis

Definition: Investigates interactions between three or more variables.

Purpose: To understand the complex relationships and interactions in the data.
Techniques:
Multivariate plots (pair plots, parallel coordinates plots).
Dimensionality reduction techniques (PCA, t-SNE).
Cluster analysis.
Heatmaps and correlation matrices.
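
The three levels of analysis listed above can be sketched numerically with pandas. This is a minimal illustration on a hypothetical toy table (the column names and values are assumptions, not taken from Automobile_data.csv). The Spearman coefficient is computed here as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only.
df = pd.DataFrame({
    "horsepower": [111, 154, 102, 115, 110, 140],
    "engine_size": [130, 152, 109, 136, 136, 131],
    "price": [13495, 16500, 13950, 17450, 15250, 17710],
})

# Univariate: centre and spread of a single variable.
univariate = df["price"].agg(["mean", "median", "std"])

# Bivariate: strength of association between two variables.
pearson = df["horsepower"].corr(df["price"])
# Spearman correlation, computed as Pearson correlation of the ranks.
spearman = df["horsepower"].rank().corr(df["price"].rank())

# Multivariate: pairwise correlations across all numeric variables.
corr_matrix = df.corr()

print(univariate)
print("Pearson:", pearson, "Spearman:", spearman)
print(corr_matrix)
```

The correlation matrix is the table that a heatmap would normally visualize.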

Common visualizations in Exploratory Data Analysis include:

1. Histograms

2. Scatter Plots

3. Box Plots

4. Bar Charts

5. Line Charts

6. Heatmaps

7. Pair Plots

Conclusion:

EDA is an essential process for data scientists to analyze the data before reaching final
conclusions. It can help data scientists identify errors and abnormal events, promote a
better understanding of patterns within the data, and clarify the data set's variables. EDA
helps us comprehend the underlying patterns, identify potential biases or confounding factors,
and communicate hidden data insights. Explanatory data analysis, by contrast, is a more formal
and rigorous approach that is used to test hypotheses, make predictions, and draw conclusions
based on the data. We conclude that:

• EDA greatly improves an analyst's core understanding of different variables.

• More importantly, EDA can help analysts identify major errors, anomalies, or
missing values in their dataset.

• EDA can also help analysts identify key patterns.

• EDA also helps to identify potential outliers or anomalies in the dataset.
Outliers can have a significant impact on the ML model or data analysis results, so
removing or otherwise dealing with outliers becomes a critical part of the data science
process.
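
One common way to flag potential outliers is the 1.5 × IQR rule. The sketch below is an assumption about how this could be done, not a step from the experiment itself, and the price values are hypothetical.

```python
import pandas as pd

# Hypothetical prices; the last value is deliberately extreme.
prices = pd.Series([13495, 16500, 13950, 17450, 15250, 45400], name="price")

# Interquartile range: the spread of the middle 50% of the values.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as potential outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

print(outliers)
```

Whether a flagged value should be removed, capped, or kept depends on the analysis; the rule only points at candidates.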
['Automobile_data.csv']

Data Loading

In [2]:
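
The code of this cell did not survive in the document. A minimal sketch of the loading step, assuming pandas and the DataFrame name df_automobile used in the later cells, could look as follows. To keep the sketch self-contained it reads a hypothetical three-row excerpt from memory instead of the real Automobile_data.csv.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for Automobile_data.csv.
sample_csv = io.StringIO(
    "make,num-of-doors,horsepower,price\n"
    "alfa-romero,two,111,13495\n"
    "audi,four,102,13950\n"
    "audi,?,?,?\n"
)

# With the real file this would be pd.read_csv('Automobile_data.csv').
df_automobile = pd.read_csv(sample_csv)

print(df_automobile.shape)
# Columns that contain "?" are not parsed as numbers.
print(df_automobile.dtypes)
```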

Data Cleaning

The data contains "?" as a placeholder for missing values; replace it with NaN.

In [3]:
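
The code of this cell is also missing. One way to carry out the step described above is sketched here on hypothetical data. The result is stored under a separate, assumed name (df_nan) and used only to count missing values, so that code which still tests for "?" in df_automobile keeps working.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with "?" marking missing entries.
df_automobile = pd.DataFrame({
    "horsepower": ["111", "?", "102"],
    "num-of-doors": ["two", "four", "?"],
})

# Replace the "?" placeholder with NaN in a copy of the data.
df_nan = df_automobile.replace("?", np.nan)

# Count the missing values per column.
missing_counts = df_nan.isna().sum()
print(missing_counts)
```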
Missing Data

Fill missing values of normalized-losses, price, horsepower, peak-rpm, bore, and stroke with the
respective column mean. Fill missing values of the categorical column num-of-doors with the mode
of the column, i.e. four.
In [4]:
# Integer columns: replace "?" with the mean of the known values.
df_temp = df_automobile[df_automobile['normalized-losses']!='?']
normalised_mean = df_temp['normalized-losses'].astype(int).mean()
df_automobile['normalized-losses'] = df_automobile['normalized-losses'].replace('?',normalised_mean).astype(int)

df_temp = df_automobile[df_automobile['price']!='?']
normalised_mean = df_temp['price'].astype(int).mean()
df_automobile['price'] = df_automobile['price'].replace('?',normalised_mean).astype(int)

df_temp = df_automobile[df_automobile['horsepower']!='?']
normalised_mean = df_temp['horsepower'].astype(int).mean()
df_automobile['horsepower'] = df_automobile['horsepower'].replace('?',normalised_mean).astype(int)

df_temp = df_automobile[df_automobile['peak-rpm']!='?']
normalised_mean = df_temp['peak-rpm'].astype(int).mean()
df_automobile['peak-rpm'] = df_automobile['peak-rpm'].replace('?',normalised_mean).astype(int)

# Float columns: same approach, cast to float.
df_temp = df_automobile[df_automobile['bore']!='?']
normalised_mean = df_temp['bore'].astype(float).mean()
df_automobile['bore'] = df_automobile['bore'].replace('?',normalised_mean).astype(float)

df_temp = df_automobile[df_automobile['stroke']!='?']
normalised_mean = df_temp['stroke'].astype(float).mean()
df_automobile['stroke'] = df_automobile['stroke'].replace('?',normalised_mean).astype(float)

# Categorical column: replace "?" with the mode, 'four'.
df_automobile['num-of-doors'] = df_automobile['num-of-doors'].replace('?','four')
df_automobile.head()
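
The same imputation can be written more compactly with a loop. The sketch below is an alternative, not the notebook's own code: it runs on a hypothetical excerpt, uses an assumed name (df), and computes the mode instead of hard-coding 'four'. Unlike the cell above, it keeps the imputed columns as floats instead of casting back to int.

```python
import pandas as pd

# Hypothetical excerpt with "?" marking missing entries.
df = pd.DataFrame({
    "horsepower": ["111", "?", "102", "115"],
    "bore": ["3.47", "2.68", "?", "3.19"],
    "num-of-doors": ["two", "four", "?", "four"],
})

# Numeric columns: parse to numbers ("?" becomes NaN), then fill with the column mean.
for col in ["horsepower", "bore"]:
    values = pd.to_numeric(df[col], errors="coerce")
    df[col] = values.fillna(values.mean())

# Categorical column: blank out "?" and fill with the most frequent value.
doors = df["num-of-doors"].mask(df["num-of-doors"] == "?")
df["num-of-doors"] = doors.fillna(doors.mode()[0])

print(df)
```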

Out[4]:

Summary statistics of variables

In [5]:
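
The code of this cell is missing. Summary statistics are typically obtained with describe(); the sketch below assumes that approach and uses a hypothetical four-row table.

```python
import pandas as pd

# Hypothetical excerpt of cleaned data.
df = pd.DataFrame({
    "horsepower": [111, 154, 102, 115],
    "price": [13495, 16500, 13950, 17450],
    "body-style": ["convertible", "hatchback", "sedan", "sedan"],
})

# Numeric columns: count, mean, std, min, quartiles, max.
numeric_summary = df.describe()

# A categorical column: count, number of unique values, most frequent value and its frequency.
body_style_summary = df["body-style"].describe()

print(numeric_summary)
print(body_style_summary)
```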

Out[5]:
In [7]:

Findings

More than 70% of the vehicles have an OHC engine type.

57% of the cars have four doors.

Gas is the fuel type of 85% of the vehicles.

The most common body style is sedan, at around 48%, followed by hatchback at 32%.
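
Percentages like these can be obtained from the relative frequencies of each category. The sketch below shows one way to do this; the helper name category_shares and the four-row table are hypothetical, so the printed shares do not match the findings above.

```python
import pandas as pd

# Hypothetical excerpt; the real dataset has many more rows.
df = pd.DataFrame({
    "fuel-type": ["gas", "gas", "gas", "diesel"],
    "num-of-doors": ["four", "two", "four", "four"],
})

def category_shares(frame, column):
    """Return the percentage of rows in each category of `column`."""
    return (frame[column].value_counts(normalize=True) * 100).round(1)

fuel_share = category_shares(df, "fuel-type")
door_share = category_shares(df, "num-of-doors")

print(fuel_share)
print(door_share)
```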
