Eda Expt

Uploaded by

mohsinzari468

Experiment No. 01

Aim: To Perform Exploratory Data Analysis (EDA) on automobile data

Prerequisites: - Automobile Data, Jupyter Notebook, Google Colab

Theory:

Exploratory Data Analysis (EDA) is an approach to analysis that identifies general patterns in
the data, including outliers and features of the data that might be unexpected. It is an
important first step in any data analysis: an initial investigation of the data to discover
patterns, spot anomalies, test hypotheses and check assumptions with the help of summary
statistics and graphical representations.

Data analytics in the automotive industry enhances supply chain management by providing
insights into the entire supply chain, from raw material suppliers to finished vehicle
dealerships. This information can be used to improve efficiency, reduce costs, and mitigate
risks. An automobile dataset typically includes information about various types of vehicles,
such as cars, trucks, and motorcycles, and may record the make, model, year, and manufacturer
of each vehicle.

Types of EDA:

There are three types of EDA, viz.:

1. Univariate Analysis

2. Bivariate Analysis

3. Multivariate Analysis

Types of Exploratory Data Analysis (EDA)

1. Univariate Analysis
Definition: Focuses on analyzing a single variable at a time.
Purpose: To understand the variable's distribution, central tendency, and spread.
Techniques: Descriptive statistics (mean, median, mode, variance, standard deviation).
Visualizations (histograms, box plots, bar charts, pie charts).

2. Bivariate Analysis
Definition: Examines the relationship between two variables.
Purpose: To understand how one variable affects or is associated with another.
Techniques: Correlation coefficients (Pearson, Spearman).
Cross-tabulations and contingency tables.
Visualizations (line plots, scatter plots, pair plots).

3. Multivariate Analysis
Definition: Investigates interactions between three or more variables.
Purpose: To understand the complex relationships and interactions in the data.
Techniques: Multivariate plots (pair plots, parallel coordinates plots).
Dimensionality reduction techniques (PCA, t-SNE).
Cluster analysis.
Heatmaps and correlation matrices.
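
The three levels of analysis described above can be sketched with pandas as follows. This is only an illustration: the column names and values are made up for the example and are not taken from the automobile dataset.

```python
import pandas as pd

# Hypothetical toy values, not taken from the automobile dataset.
df = pd.DataFrame({
    "horsepower": [68, 95, 110, 150, 200],
    "price": [6000, 9000, 12000, 18000, 30000],
    "city_mpg": [38, 30, 26, 21, 16],
})

# Univariate: distribution, central tendency and spread of one variable.
hp_summary = df["horsepower"].describe()

# Bivariate: association between two variables.
pearson = df["horsepower"].corr(df["price"])  # Pearson is the default method
spearman = df[["horsepower", "price"]].corr(method="spearman").loc["horsepower", "price"]

# Multivariate: correlation matrix over all numeric variables
# (the table that a heatmap would visualize).
corr_matrix = df.corr()

print(hp_summary)
print(pearson, spearman)
print(corr_matrix)
```

The Pearson coefficient measures linear association, while the Spearman coefficient only looks at ranks, so the two can differ when a relationship is monotonic but not linear.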

Common visualizations in Exploratory Data Analysis include:

1. Histograms

2. Scatter plots

3. Box plots

4. Bar charts

5. Line charts

6. Heatmaps

7. Pair plots

Conclusion:

EDA is an essential process that lets data scientists examine the data before settling on final
assumptions. It helps them identify errors and abnormal events, promotes a better understanding
of patterns within the data, and clarifies the variables of the data set. Explanatory data
analysis, by contrast, is a more formal and rigorous approach that is used to test hypotheses,
make predictions, and draw conclusions based on the data. EDA helps us comprehend the
underlying patterns, identify potential biases or confounding factors, and communicate hidden
data insights. We conclude that:

• EDA greatly improves an analyst's core understanding of different variables.
• More importantly, EDA can help analysts identify major errors, anomalies, or missing values
in their dataset.
• EDA can also help analysts identify key patterns.
• EDA also helps to identify potential outliers in the dataset. Outliers can have a significant
impact on an ML model or on data analysis results, so removing or otherwise dealing with
outliers becomes a critical part of the data science process.
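
One common way to flag potential outliers is the 1.5 × IQR rule that box plots use for their whiskers. The sketch below is an assumption about how this could be done, not the method used in this experiment, and the price values are invented for illustration.

```python
import pandas as pd

# Hypothetical prices; the last value is deliberately extreme.
prices = pd.Series([7000, 8500, 9000, 9500, 10000, 11000, 45000], name="price")

# Interquartile range: spread of the middle 50% of the values.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as potential outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

print(outliers)
```

Whether a flagged value should be removed, capped or kept depends on the analysis; the rule only marks candidates for inspection.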
['Automobile_data.csv']

Data Loading

In [2]:
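
The contents of this cell are not preserved. A minimal sketch of the loading step, assuming the file is a plain comma-separated file read with pandas, might look like the following. The three rows are a hypothetical excerpt that stands in for Automobile_data.csv so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for Automobile_data.csv.
sample_csv = io.StringIO(
    "normalized-losses,num-of-doors,bore,price\n"
    "?,two,3.47,13495\n"
    "164,four,3.19,13950\n"
    "158,?,?,?\n"
)

# In the notebook this would presumably be pd.read_csv('Automobile_data.csv').
df_automobile = pd.read_csv(sample_csv)

print(df_automobile.shape)
print(df_automobile.head())
```

Because of the "?" placeholders, columns that look numeric are read as text, which is why the later cells convert them explicitly.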

Data Cleaning

The data contains "?" as a placeholder for missing values; replace it with NaN.

In [3]:
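
The contents of this cell are not preserved either. One way to perform the replacement, shown here on a small made-up frame, is sketched below; the names df_raw and df_nan are hypothetical. The result is stored in a separate variable because the imputation cell that follows still compares against "?".

```python
import numpy as np
import pandas as pd

# Hypothetical frame with "?" marking missing entries.
df_raw = pd.DataFrame({
    "horsepower": ["111", "?", "154"],
    "num-of-doors": ["two", "four", "?"],
})

# Replace the placeholder with NaN so pandas treats it as missing.
df_nan = df_raw.replace("?", np.nan)

# Count the missing values per column.
missing_per_column = df_nan.isna().sum()
print(missing_per_column)
```

An alternative is to pass na_values="?" to pd.read_csv so that the placeholders become NaN at load time.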
Missing Data

Fill the missing values of normalized-losses, price, horsepower, peak-rpm, bore and stroke with
the respective column mean. Fill the missing values of the categorical column num-of-doors with
the mode of the column, i.e. four.
In [4]:
# Integer-valued columns: replace '?' with the mean of the known values
# (the mean is truncated when the column is cast to int).
for col in ['normalized-losses', 'price', 'horsepower', 'peak-rpm']:
    df_temp = df_automobile[df_automobile[col] != '?']
    normalised_mean = df_temp[col].astype(int).mean()
    df_automobile[col] = df_automobile[col].replace('?', normalised_mean).astype(int)

# Float-valued columns: same approach, keeping the decimals.
for col in ['bore', 'stroke']:
    df_temp = df_automobile[df_automobile[col] != '?']
    normalised_mean = df_temp[col].astype(float).mean()
    df_automobile[col] = df_automobile[col].replace('?', normalised_mean).astype(float)

# Categorical column: replace '?' with the mode, 'four'.
df_automobile['num-of-doors'] = df_automobile['num-of-doors'].replace('?', 'four')
df_automobile.head()

Out[4]:

Summary statistics of the variables

In [5]:
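
The cell and its output are not preserved. Summary statistics are commonly obtained with describe(); the sketch below assumes that approach and uses a small invented frame rather than the real data.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "horsepower": [111, 154, 102, 115],
    "body-style": ["sedan", "sedan", "hatchback", "wagon"],
})

# Numeric columns: count, mean, std, min, quartiles, max.
numeric_summary = df.describe()

# A categorical column: count, number of unique values, most frequent value and its frequency.
body_style_summary = df["body-style"].describe()

print(numeric_summary)
print(body_style_summary)
```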

Out[5]:
In [7]:

Findings

More than 70% of the vehicles have an OHC-type engine.

57% of the cars have four doors.

Gas is the fuel type of 85% of the vehicles.

The most common body style is sedan (around 48%), followed by hatchback (32%).
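
Percentages like those in the findings can be derived from the relative frequency of each category. The sketch below assumes a value_counts-based approach and uses ten invented body-style entries, so its numbers differ from the findings above.

```python
import pandas as pd

# Hypothetical sample of ten vehicles; not the real dataset.
body_style = pd.Series(
    ["sedan", "sedan", "hatchback", "sedan", "wagon",
     "hatchback", "sedan", "hatchback", "sedan", "convertible"],
    name="body-style",
)

# Share of each category in percent, largest first.
share = body_style.value_counts(normalize=True).mul(100).round(1)
print(share)
```

The same call applied to the engine type, number of doors and fuel type columns would give the other shares.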
