
DAL - 1 Complete

This document describes an exploratory data analysis project on automobile data using Python. The analysis includes loading and cleaning the data, calculating summary statistics, and performing univariate and bivariate analyses. Univariate analyses include histograms of various variables and frequency counts of categorical variables. Bivariate analyses include box plots, scatter plots, and correlation calculations to explore relationships between variables like price, engine size, fuel efficiency, and other attributes.

Uploaded by sagar korde
Copyright © All Rights Reserved

PDVVP College of Engineering

Practical No. 1

Aim- To perform Exploratory Data Analysis on Automobile data.

Prerequisites- Automobile data, Jupyter Notebook.

Program & Outputs-


1. Packages-
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

2. Data loading-
dataset=pd.read_csv(r'C:\Users\My Pc\Desktop\automobile.csv')
dataset
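Raw automobile files often mark missing values with '?', which forces numeric columns to load as text. A minimal sketch, assuming a file in that style: `read_csv` can convert the placeholder to NaN while parsing via its `na_values` parameter. The three rows below are hypothetical toy data, not taken from automobile.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row extract; '?' stands in for a missing value
raw_csv = io.StringIO(
    "make,body-style,horsepower,price\n"
    "audi,sedan,102,13950\n"
    "bmw,sedan,?,16430\n"
    "honda,hatchback,76,?\n"
)

# na_values='?' turns the placeholder into NaN during parsing, so the
# numeric columns get a float dtype instead of falling back to text
df = pd.read_csv(raw_csv, na_values='?')

print(df.dtypes)
print(df.isnull().sum())
```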


dataset.head()

dataset.shape

dataset.info()
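`shape` gives (rows, columns) and `info()` lists each column's dtype and non-null count. The dtype split is what decides which later step applies to a column. A small sketch on hypothetical toy data, separating numeric columns (histograms, correlation) from the remaining text columns (frequency counts, box plots):

```python
import pandas as pd

# Hypothetical toy frame with a few of the automobile columns
df = pd.DataFrame({
    "make": ["audi", "bmw", "honda", "toyota"],
    "num-of-doors": ["four", "two", "two", "four"],
    "engine-size": [109, 164, 92, 98],
    "price": [13950.0, 16430.0, 6479.0, 8058.0],
})

n_rows, n_cols = df.shape

# Numeric columns feed histograms/correlations; everything else is categorical
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols)      # 4 4
print(numeric_cols)        # ['engine-size', 'price']
print(categorical_cols)    # ['make', 'num-of-doors']
```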


3. Data cleaning-
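# Count missing values, treating any '?' placeholder as NaN (np.NAN was removed in NumPy 2.0; use np.nan)
data=dataset.replace('?',np.nan)
data.isnull().sum()

# Fill 'stroke' (numeric) with its mean and 'horsepower-binned' (categorical) with its most frequent value
dataset['stroke']=pd.to_numeric(dataset['stroke'],errors='coerce')
dataset['stroke']=dataset['stroke'].fillna(dataset['stroke'].mean())
dataset['horsepower-binned']=dataset['horsepower-binned'].fillna(dataset['horsepower-binned'].mode()[0])

dataset.isnull().sum()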
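The same mean/mode rule can be applied to every column that has gaps. The sketch below uses a hypothetical helper, `impute_missing`, and hypothetical toy data. It is one common imputation choice, not the only valid one; dropping incomplete rows is another.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where numeric NaNs get the column mean and
    non-numeric NaNs get the column's most frequent value (mode)."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isnull().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode()[0])
    return out


# Hypothetical toy data mirroring the two columns cleaned above
toy = pd.DataFrame({
    "stroke": [3.40, np.nan, 3.20, 3.00],
    "horsepower-binned": ["Low", "Medium", np.nan, "Low"],
})
clean = impute_missing(toy)
print(clean)
```

Working on a copy leaves the original frame untouched, so the before/after missing-value counts can still be compared.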


dataset.head(10)

4. Summary statistics of variables-


dataset.describe()
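By default `describe()` summarises only numeric columns. Passing `exclude=[np.number]` gives the categorical counterpart: count, number of unique values, most frequent value (`top`) and its frequency (`freq`). A sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame
df = pd.DataFrame({
    "body-style": ["sedan", "sedan", "hatchback", "wagon", "sedan"],
    "price": [13950.0, 17450.0, 6479.0, 16430.0, 8058.0],
})

num_summary = df.describe()                      # count, mean, std, min, quartiles, max
cat_summary = df.describe(exclude=[np.number])   # count, unique, top, freq

print(num_summary)
print(cat_summary)
```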


5. Univariate analysis-
# DataFrame.hist creates its own figure (sized by figsize) with one histogram per column
dataset[['engine-size','peak-rpm','curb-weight','horsepower','price']].hist(figsize=(10,8))
plt.tight_layout()
plt.show()
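Each histogram bar is a count of values that fall into one bin. `DataFrame.hist` uses 10 equal-width bins by default, and the same counts can be computed directly with `np.histogram`. The engine-size values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical engine-size values
engine_size = pd.Series([97, 109, 122, 130, 136, 152, 164, 209, 258, 326])

# 10 equal-width bins between the minimum and maximum value
counts, edges = np.histogram(engine_size.dropna(), bins=10)

# Each bar of the histogram is one (left edge, right edge, count) triple
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} - {right:6.1f}: {n}")
```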


6. Findings-
plt.figure(figsize=(10,8))

# Relative frequency (share of vehicles) of each engine type
plt.subplot(221)
dataset['engine-type'].value_counts(normalize=True).plot(kind='bar',color='green')
plt.title("Engine type frequency diagram")
plt.ylabel('Proportion of vehicles')
plt.xlabel('engine-type')

# Relative frequency of the number of doors
plt.subplot(222)
dataset['num-of-doors'].value_counts(normalize=True).plot(kind='bar',color='red')
plt.title("Number of doors frequency diagram")
plt.ylabel('Proportion of vehicles')
plt.xlabel('num-of-doors')

# Relative frequency of each body style
plt.subplot(223)
dataset['body-style'].value_counts(normalize=True).plot(kind='bar',color='purple')
plt.title("Body style frequency diagram")
plt.ylabel('Proportion of vehicles')
plt.xlabel('body-style')

plt.tight_layout()
plt.show()
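With `normalize=True`, `value_counts` returns shares that sum to 1 rather than raw counts. This is why the bar heights above are proportions, not numbers of vehicles. A sketch comparing both on a hypothetical `num-of-doors` column:

```python
import pandas as pd

# Hypothetical num-of-doors values
doors = pd.Series(["four", "two", "four", "four", "two"], name="num-of-doors")

counts = doors.value_counts()                  # absolute frequencies
shares = doors.value_counts(normalize=True)    # relative frequencies, sum to 1

print(pd.DataFrame({"count": counts, "share": shares}))
```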


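# numeric_only=True skips text columns (pandas 2.0+ raises an error on non-numeric data otherwise)
corr=dataset.corr(numeric_only=True)
plt.figure(figsize=(20,9))
a=sns.heatmap(corr,cmap='brg',annot=True,fmt='.2f')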
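The heatmap is easier to act on when the price column of the matrix is pulled out and sorted, because that ranks every numeric variable by its linear relationship with price. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy frame; 'make' is text and is skipped by numeric_only=True
df = pd.DataFrame({
    "make": ["audi", "bmw", "honda", "toyota", "volvo"],
    "engine-size": [109, 164, 92, 98, 141],
    "highway-mpg": [30, 25, 38, 37, 28],
    "price": [13950.0, 16430.0, 6479.0, 8058.0, 15985.0],
})

corr = df.corr(numeric_only=True)

# Rank the other numeric variables by their correlation with price
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```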


7. Bivariate analysis-
plt.rcParams['figure.figsize']=(18,9)
ax=sns.boxplot(x="make",y="price",data=dataset)

plt.rcParams['figure.figsize']=(19,7)
ax=sns.boxplot(x="body-style",y="price",data=dataset)


plt.rcParams['figure.figsize']=(10,5)
ax=sns.boxplot(x="drive-wheels",y="price",data=dataset)
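Each box in these plots marks the group's quartiles Q1 and Q3 (box edges) and the median (middle line). The whiskers extend to the most extreme points within 1.5 × IQR. The same per-group numbers can be tabulated with `groupby`, as in this sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical prices grouped by drive-wheels
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "price": [6479.0, 7999.0, 9095.0, 16430.0, 21105.0, 30760.0, 7603.0, 17450.0],
})

# Quartiles per group: the numbers a box plot draws as box edges and median line
box_stats = df.groupby("drive-wheels")["price"].quantile([0.25, 0.5, 0.75]).unstack()
box_stats.columns = ["Q1", "median", "Q3"]
box_stats["IQR"] = box_stats["Q3"] - box_stats["Q1"]
print(box_stats)
```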

sns.regplot(x="engine-size",y="price",data=dataset)
plt.ylim(0,)

dataset[["engine-size","price"]].corr()
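With its default `order=1`, `regplot` draws an ordinary least-squares line. A degree-1 `np.polyfit` gives the slope and intercept of that line. The (engine-size, price) pairs below are hypothetical.

```python
import numpy as np

# Hypothetical (engine-size, price) pairs
engine_size = np.array([92, 98, 109, 141, 164], dtype=float)
price = np.array([6479, 8058, 13950, 15985, 16430], dtype=float)

# Least-squares line: price ~ slope * engine-size + intercept
slope, intercept = np.polyfit(engine_size, price, deg=1)
predicted = slope * engine_size + intercept

print(f"price ~ {slope:.1f} * engine-size + {intercept:.1f}")
```

A positive slope corresponds to the upward-sloping regplot line and to a positive value in the correlation table.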


sns.regplot(x="highway-mpg",y="price",data=dataset)
plt.ylim(0,)

dataset[['highway-mpg','price']].corr()

sns.regplot(x="peak-rpm",y="price",data=dataset)


dataset[['peak-rpm','price']].corr()
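`DataFrame.corr()` computes the Pearson coefficient by default: the covariance divided by the product of the standard deviations. Values near 0, as can happen for peak-rpm versus price, indicate a weak linear relationship. A sketch with a hypothetical helper, `pearson_r`, checked against pandas on hypothetical data:

```python
import numpy as np
import pandas as pd


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: covariance over the product of standard deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical (peak-rpm, price) pairs
df = pd.DataFrame({
    "peak-rpm": [5500, 5800, 5100, 4800, 6000],
    "price": [13950.0, 17450.0, 16430.0, 15985.0, 6479.0],
})

r_manual = pearson_r(df["peak-rpm"].to_numpy(dtype=float), df["price"].to_numpy(dtype=float))
r_pandas = df[["peak-rpm", "price"]].corr().loc["peak-rpm", "price"]
print(r_manual, r_pandas)
```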

sns.barplot(x="body-style",y="price",data=dataset)

sns.barplot(x="engine-location",y="price",data=dataset)


sns.barplot(x="drive-wheels",y="price",data=dataset)
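In `sns.barplot` each bar height is the group mean, which is its default estimator. The black line on top is a confidence interval for that mean, a 95% bootstrap interval by default. The bar heights can be reproduced with `groupby`, as in this sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical prices by engine-location
df = pd.DataFrame({
    "engine-location": ["front", "front", "front", "rear", "rear"],
    "price": [13950.0, 8058.0, 16430.0, 34028.0, 37028.0],
})

# Bar heights are group means; the count shows how many cars back each bar
bar_heights = df.groupby("engine-location")["price"].agg(["mean", "count"])
print(bar_heights)
```

Checking the counts matters here: a group with very few cars, such as rear-engined ones, produces a bar with a wide and unreliable interval.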
