
Trilokesh Assignment

Uploaded by trilokesh51


11/11/2023, 21:32 MLops_Plotly.ipynb - Colaboratory

import numpy as np
import pandas as pd
import plotly.express as px

Read File

df=pd.read_csv('cardekho.csv')
df.head(5)

   Car_Name                  Year  Selling_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner
0  Maruti 800 AC             2007  60000          70000       Petrol     Individual   Manual        First Owner
1  Maruti Wagon R LXI Minor  2007  135000         50000       Petrol     Individual   Manual        First Owner
2  Hyundai Verna 1.6 SX      2012  600000         100000      Diesel     Individual   Manual        First Owner
3  Datsun RediGO T Option    2017  250000         46000       Petrol     Individual   Manual        First Owner
4  Honda Amaze VX i-DTEC     2014  450000         141000      Diesel     Individual   Manual        Second Owner

df['Year'].sort_values(ascending=False)

3206 2022
4179 2021
1777 2020
2481 2020
1575 2020
...
3661 1997
61 1996
2972 1996
631 1995
3334 1992
Name: Year, Length: 4340, dtype: int64

# Age of each car relative to the current year
current_year=2023
df['Age']=current_year-df['Year']

df.columns

Index(['Car_Name', 'Year', 'Selling_Price', 'Kms_Driven', 'Fuel_Type',
       'Seller_Type', 'Transmission', 'Owner', 'Age'],
      dtype='object')

Data Visualisation

# Scatter Matrix
scatter_matrix_fig = px.scatter_matrix(df, dimensions=['Year', 'Selling_Price', 'Kms_Driven'],\
color='Fuel_Type', title='Scatter Matrix')
scatter_matrix_fig.show()

https://colab.research.google.com/drive/1bYSu9K9cH9GP3WFcoRGXSaSx3QJgI_mn?authuser=3#scrollTo=CC0C_UO28h4n&printMode=true 1/15

[Figure: Scatter Matrix of Year, Selling_Price and Kms_Driven, coloured by Fuel_Type]

fig = px.scatter(df, x='Selling_Price', y='Kms_Driven', title='Selling Price vs. Kms_Driven')
fig.show()

[Figure: Selling Price vs. Kms_Driven (x: Selling_Price, y: Kms_Driven)]

# Box Plot
box_plot_fig = px.box(df, x='Fuel_Type', y='Selling_Price', color='Transmission',\
title='Box Plot of Selling Price by Fuel Type and Transmission')
box_plot_fig.show()

[Figure: Box Plot of Selling Price by Fuel Type and Transmission (x: Fuel_Type, i.e. Petrol, Diesel, CNG, LPG; y: Selling_Price)]

px.box(df,x='Selling_Price',points='suspectedoutliers')


[Figure: box plot of Selling_Price with suspected outliers (x: Selling_Price)]

df.loc[df['Kms_Driven']<1000]

      Car_Name                           Year  Selling_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner           Age
1312  Mahindra Quanto C6                 2014  250000         1           Diesel     Individual   Manual        Second Owner    9
1714  Ford Freestyle Titanium Diesel     2020  784000         101         Diesel     Dealer       Manual        Test Drive Car  3
1715  Ford Figo Titanium                 2020  635000         101         Petrol     Dealer       Manual        Test Drive Car  3
1716  Ford Ecosport 1.5 Diesel Titanium  2020  1000000        101         Diesel     Dealer       Manual        Test Drive Car  3

No clear correlation between Selling_Price and Kms_Driven can be observed from this.
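You can measure the strength of this association directly instead of judging it from the scatter plot. The sketch below is a minimal example. It assumes a small hypothetical df_demo that has two of the cardekho.csv columns; the values are illustrative and do not come from the full dataset.

```python
import pandas as pd

# Hypothetical rows using two cardekho.csv column names; values are illustrative only
df_demo = pd.DataFrame({
    "Selling_Price": [60000, 135000, 600000, 250000, 450000, 784000],
    "Kms_Driven": [70000, 50000, 100000, 46000, 141000, 101],
})

# Pearson measures linear association; Spearman works on ranks and is less
# sensitive to the extreme prices and mileages visible in the plots
pearson = df_demo.corr(method="pearson").loc["Selling_Price", "Kms_Driven"]
spearman = df_demo.corr(method="spearman").loc["Selling_Price", "Kms_Driven"]
print(f"Pearson: {pearson:.3f}  Spearman: {spearman:.3f}")
```

On the real data you would call the same method on df[['Selling_Price', 'Kms_Driven']]. A value close to zero would support the "no clear correlation" reading.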

fig22 = px.pie(df,names='Seller_Type',title='Percentage of cars by Seller type')

fig22.show()

[Figure: Percentage of cars by Seller type (pie chart with slices of 74.7%, 22.9% and 2.35%)]

The Individual seller type has the largest share of cars (74.7%).
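value_counts(normalize=True) reproduces the percentages shown in the pie chart without plotting. The sketch below uses a short hypothetical list of Seller_Type values. The "Trustmark Dealer" label is included as an assumed example category.

```python
import pandas as pd

# Hypothetical Seller_Type values; on the real data use df['Seller_Type']
seller = pd.Series(["Individual", "Individual", "Dealer", "Individual", "Trustmark Dealer",
                    "Individual", "Dealer", "Individual"], name="Seller_Type")

# normalize=True gives the fraction of rows per category; px.pie plots the same proportions
share = seller.value_counts(normalize=True).mul(100).round(2)
print(share)
```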

sunburst_fig = px.sunburst(df, path=['Fuel_Type', 'Transmission'], values='Selling_Price',

title='Sunburst Chart of Selling Price by Fuel Type and Transmission')
sunburst_fig.show()


[Figure: Sunburst Chart of Selling Price by Fuel Type and Transmission (inner ring: Fuel_Type, i.e. Diesel, Petrol, CNG, LPG, Electric; outer ring: Manual / Automatic)]

fig3 = px.histogram(df, x='Selling_Price', nbins=50, title='Histogram of Selling Price',color_discrete_sequence=['#6495ED'])
fig3.show()

[Figure: Histogram of Selling Price (x: Selling_Price, y: count)]

The 2L to 4L (200k to 400k) price range has the highest number of cars sold, almost 1,400.
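Plotly picks the histogram bin edges itself when nbins=50 is set, so a count read from the chart is approximate. Explicit price bands give an exact count for a range such as 2L to 4L. The sketch below uses hypothetical prices. The band edges are an assumed choice, with 1L taken as 100,000.

```python
import pandas as pd

# Hypothetical prices; on the real data use df['Selling_Price']
prices = pd.Series([60000, 135000, 250000, 320000, 399999, 400000, 450000, 600000, 950000])

# Band edges in lakh (1L = 100,000); right=False makes each band [low, high)
edges = [0, 200_000, 400_000, 600_000, 1_000_000]
labels = ["<2L", "2L-4L", "4L-6L", "6L-10L"]
bands = pd.cut(prices, bins=edges, labels=labels, right=False)

# value_counts on a categorical keeps every band; sort_index restores band order
counts = bands.value_counts().sort_index()
print(counts)
```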

cars=df.groupby('Year')['Year'].count()
fig = px.line(cars, x=cars.index, y=cars.values,color_discrete_sequence=['#6495ED'],title='Cars_Sold_By_Year')
fig.show()


[Figure: Cars_Sold_By_Year (line chart, x: Year, y: number of cars)]

1. The highest number of sales was recorded in 2017; after that, sales decreased.
2. Surprisingly, fewer cars were sold in 2019, yet the selling price was high that year, so electric cars may have made up a larger share of 2019 sales.

cars=df.groupby('Transmission')['Transmission'].count()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Cars_Sold'},color_discrete_sequence=['#03DAC5'])
fig12.show()

[Figure: bar chart of Cars_Sold by Transmission (Automatic, Manual)]

Cars with manual transmission are sold much more often than cars with automatic transmission.

cars=df.groupby('Year')['Selling_Price'].mean()
fig = px.line(cars, x=cars.index, y=cars.values,title='Avg_selling_Price_by_Year',color_discrete_sequence=['#03DAC5'])
fig.show()


[Figure: Avg_selling_Price_by_Year (line chart, x: Year, y: average Selling_Price)]

2019 has the highest average selling price. 1999 shows a sudden drop, and after that the average selling price increases gradually.

fig_box = px.box(df, x='Kms_Driven', title='Distribution of Kms Driven', height=250,
                 color_discrete_sequence=['#03DAC5'],
                 )
fig_box.show()

[Figure: Distribution of Kms Driven (box plot, x: Kms_Driven)]

The median distance driven is around 60k km, while a few cars have been driven as much as 800k km.
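The line in the middle of a box plot is the median, not the mean, and a few very high-mileage cars pull the mean upward. The sketch below computes the quartiles and the common 1.5 × IQR outlier fences on hypothetical Kms_Driven values. Plotly's own quartile method may differ slightly from pandas' linear interpolation.

```python
import pandas as pd

# Hypothetical Kms_Driven values; on the real data use df['Kms_Driven']
kms = pd.Series([1, 20000, 35000, 50000, 60000, 70000, 90000, 120000, 800000])

q1, median, q3 = kms.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Common box-plot rule: points more than 1.5 * IQR beyond the box are outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = kms[(kms < lower) | (kms > upper)]

print(f"median={median:.0f}, mean={kms.mean():.0f}")
print("outliers:", outliers.tolist())
```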

fig_box = px.box(df, x='Selling_Price', title='Distribution of selling_price', height=250,

color_discrete_sequence=['#03DAC5'],
)

fig_box.show()

[Figure: Distribution of selling_price (box plot, x: Selling_Price)]

fig_box = px.box(df, x='Age', title='Distribution of Age', height=250,

color_discrete_sequence=['#03DAC5'],
)

fig_box.show()

[Figure: box plot of Age (x: Age, axis from 0 to 25)]

The median Selling_Price is about 350k, while a few cars sell for between 8M and 9M.

cars=df.groupby('Fuel_Type')['Kms_Driven'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_Kms_Driven'},color_discrete_sequence=['#6495ED'])
fig12.show()


[Figure: bar chart of Avg_Kms_Driven by Fuel_Type]

LPG cars have the highest average Kms_Driven of all fuel types.

cars=df.groupby('Owner')['Selling_Price'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_Selling_Price_By_Ownership'},color_discrete_sequence=['#6495ED'])
fig12.show()

[Figure: bar chart of Avg_Selling_Price_By_Ownership by Owner (First Owner, Fourth & Above Owner, Second Owner, Test Drive Car, ...)]

Test Drive Cars have the highest average selling price.

cars=df.groupby('Fuel_Type')['Selling_Price'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_selling_price_by_Fuel_type'},color_discrete_sequence=['#03DAC5'])
fig12.show()


[Figure: bar chart of Avg_selling_price_by_Fuel_type (x: Fuel_Type)]

Electric cars are priced higher on average than diesel cars, although more diesel cars were sold than electric cars.

cars=df.groupby('Fuel_Type')['Fuel_Type'].count()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Fuel_type_Cars_Sold'},color_discrete_sequence=['#6495ED'])
fig12.show()

[Figure: bar chart of Fuel_type_Cars_Sold by Fuel_Type (CNG, Diesel, Electric, LPG, ...)]

Diesel cars have the highest number of sales.


# Revenue = mean price x number of cars, i.e. the total Selling_Price per fuel type
cars=df.groupby('Fuel_Type').agg({'Selling_Price':'mean','Fuel_Type':'count'})
cars['Revenue']=cars['Selling_Price']*cars['Fuel_Type']
fig = px.bar(cars, x=cars.index, y=cars['Revenue'],title='Avg Revenue by Fuel type')
fig.update_xaxes(categoryorder='total descending')
# fig.update_yaxes(showgrid=False)
# fig.update_xaxes(showgrid=False)
fig.show()
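Multiplying a group's mean by its count gives the group total. The bars above therefore show total revenue per fuel type, and a plain groupby sum yields the same numbers. The sketch below checks this on a few hypothetical rows.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the real ones come from cardekho.csv
df_demo = pd.DataFrame({
    "Fuel_Type": ["Diesel", "Diesel", "Petrol", "Petrol", "Petrol", "CNG"],
    "Selling_Price": [600000, 450000, 60000, 135000, 255000, 300000],
})

by_fuel = df_demo.groupby("Fuel_Type")["Selling_Price"].agg(["mean", "count", "sum"])
by_fuel["Revenue"] = by_fuel["mean"] * by_fuel["count"]

# mean x count reproduces the group total, so "sum" alone gives the same bars
same_as_sum = np.allclose(by_fuel["Revenue"], by_fuel["sum"])
print(by_fuel.sort_values("Revenue", ascending=False))
print("Revenue equals sum:", same_as_sum)
```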


[Figure: Avg Revenue by Fuel type (bar chart, x: Fuel_Type in the order Diesel, Petrol, CNG, Electric; y: Revenue, axis up to 1.4B)]

Diesel cars make the largest contribution to revenue.

fig = px.histogram(df, x="Owner", color="Fuel_Type", barmode="stack")
fig.update_layout(
    title="Cardekho - Stacked Column Chart by Owner and Fuel Type",
    xaxis_title="Owner_Type",
    yaxis_title="Count",
    legend_title="Fuel_Type")
fig.show()

[Figure: Cardekho - Stacked Column Chart by Owner and Fuel Type (x: Owner_Type, i.e. First Owner, Second Owner, Fourth & Above Owner, Third Owner; y: Count)]

For every owner type, diesel is the most common fuel type and petrol is the second.
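pd.crosstab returns the counts behind a stacked column chart as a table. idxmax(axis=1) then picks out the most common fuel type for each owner category. The sketch below uses hypothetical rows; on the real data you would pass df['Owner'] and df['Fuel_Type'].

```python
import pandas as pd

# Hypothetical Owner / Fuel_Type rows for illustration
df_demo = pd.DataFrame({
    "Owner": ["First Owner", "First Owner", "First Owner",
              "Second Owner", "Second Owner", "Second Owner", "Third Owner"],
    "Fuel_Type": ["Diesel", "Petrol", "Diesel", "Diesel", "Diesel", "Petrol", "Petrol"],
})

# Rows: owner type, columns: fuel type, cells: number of cars
table = pd.crosstab(df_demo["Owner"], df_demo["Fuel_Type"])

# Most common fuel type within each owner category
top_fuel = table.idxmax(axis=1)
print(table)
print(top_fuel)
```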

# Pair Plot
pair_plot_fig = px.scatter(df, x='Year', y='Selling_Price', color='Fuel_Type',\
marginal_y='violin', marginal_x='histogram',\
title='Pair Plot of Year vs Selling Price')
pair_plot_fig.show()


[Figure: Pair Plot of Year vs Selling Price (scatter of Selling_Price against Year, coloured by Fuel_Type, with a marginal histogram and violin)]

# numeric_only=True restricts the correlation to numeric columns (avoids the pandas FutureWarning/error on text columns)
corr=df.corr(numeric_only=True)
fig = px.imshow(corr, color_continuous_scale='YlOrRd',text_auto=True)
fig.update_layout(
    title='Correlation Matrix',
    margin=dict(l=100, r=100, t=100, b=100))
fig.show()

Correlation Matrix

               Year        Selling_Price  Kms_Driven  Age
Year           1           0.4139634      −0.4196243  −1
Selling_Price  0.4139634   1              −0.1923481  −0.4139634
Kms_Driven     −0.4196243  −0.1923481     1           0.4196243
Age            −1          −0.4139634     0.4196243   1
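The Age row mirrors the Year row with the signs flipped because Age is computed as 2023 − Year. Any exact decreasing linear relation has a Pearson correlation of −1, so Year and Age carry the same information, and only one of them is needed as a model input. The sketch below demonstrates this with hypothetical model years.

```python
import pandas as pd

# Hypothetical model years; Age is derived the same way as in the notebook (2023 - Year)
year = pd.Series([2007, 2012, 2017, 2014, 2020, 2022])
age = 2023 - year

# An exact linear relation age = 2023 - year gives a Pearson correlation of -1
r = year.corr(age)
print(round(r, 6))
```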

cars=df.groupby('Age')['Selling_Price'].mean()
fig = px.bar(cars, x=cars.index, y=cars.values,title='Avg. Selling price by Age')
fig.update_xaxes(categoryorder='total descending')
fig.update_yaxes(showgrid=False)
fig.update_xaxes(showgrid=False)
fig.update_layout(xaxis_title='Age', yaxis_title="Avg. Selling Price",
                  plot_bgcolor='#2d3035', paper_bgcolor='#2d3035',
                  title_font=dict(size=25, color='#ffffff', family="Muli, sans-serif"),
                  font=dict(color='#ffffff'),
                  )
fig.show()


[Figure: Avg. Selling price by Age (bar chart, x: Age, y: Avg. Selling Price)]

df['Brand_Name']=df['Car_Name'].str.split()
df['Brand_Name']=df['Brand_Name'].apply(lambda x:x[0])
df.head(1)

   Car_Name       Year  Selling_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner        Age  Brand_Name
0  Maruti 800 AC  2007  60000          70000       Petrol     Individual   Manual        First Owner  16   Maruti

cars=df.groupby('Brand_Name')['Selling_Price'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_SellingPrice_By_Brand'},color_discrete_sequence=['#03DAC5'])
fig12.update_xaxes(categoryorder='total descending')
fig12.show()

[Figure: Avg_SellingPrice_By_Brand by Brand_Name, y axis up to about 3.5M; labels visible in the printout, in descending order: Land, BMW, Mercedes-Benz, Volvo, Jaguar, Audi, MG, Jeep, Isuzu, Kia, Toyota, Mitsubishi, Mahindra, Ford, OpelCorsa, Honda, Skoda, Volkswagen, Nissan, Hyundai, Renault, Force, Maruti]
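The .str accessor does the split and the first-token lookup in one vectorised step. Taking only the first word turns "Land Rover" into "Land", which is why that label appears in the chart. The sketch below uses hypothetical car names in the cardekho.csv format. The brand_fixes mapping is a hypothetical fix-up and is not part of the dataset.

```python
import pandas as pd

# Hypothetical car names in the cardekho.csv format (brand first)
car_names = pd.Series(["Maruti 800 AC", "Hyundai Verna 1.6 SX",
                       "Land Rover Discovery Sport", "Honda Amaze VX i-DTEC"])

# Vectorised equivalent of str.split() followed by apply(lambda x: x[0]);
# n=1 stops after the first whitespace split
brands = car_names.str.split(n=1).str[0]

# Multi-word brands lose their second word; a hypothetical mapping restores them
brand_fixes = {"Land": "Land Rover"}
brands = brands.replace(brand_fixes)
print(brands.tolist())
```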

df['Brand_Name'].nunique()

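# Revenue = mean price x number of cars (the total Selling_Price per brand);
# multiplying the sum by the count would inflate brands with many listings
cars=df.groupby('Brand_Name').agg({'Selling_Price':'mean','Brand_Name':'count'})
cars['Revenue']=cars['Selling_Price']*cars['Brand_Name']
fig = px.bar(cars, x=cars.index, y=cars['Revenue'],title='Revenue by Car_Brands')
fig.update_xaxes(categoryorder='total descending')
fig.show()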


[Figure: Revenue by Car_Brands (bar chart of Revenue per Brand_Name in descending order; Maruti has the tallest bar)]

Maruti contributes the highest revenue.

Label Encoding for Finding the best feature

df=pd.read_csv('cardekho.csv')

Assigning weights according to domain knowledge

# Finding the age of cars (the newest model year counts as 1 year old)
df.insert(0, "Age_of_car", df["Year"].max()+1-df["Year"])
df.drop('Year', axis=1, inplace=True)
# df.head()

column_to_encode = 'Owner'
weights = {'First Owner': .8, 'Second Owner': .7, 'Fourth & Above Owner': .4,'Third Owner':.5,'Test Drive Car':.9}
df['weighted_Owner'] = df[column_to_encode].map(weights)

column_to_encode = 'Fuel_Type'
weights = {'CNG': 0, 'Diesel': .3, 'Electric': .2, 'LPG': .1, 'Petrol': .4}
# Create a new column for the weighted labels
df['weighted_Fuel'] = df[column_to_encode].map(weights)
# df.drop(['Owner','Fuel_Type'],inplace=True)
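Series.map with a dict returns NaN for any category missing from the dict, so it is worth checking for unmapped values before the weighted columns are used. The sketch below reuses the Owner weights from the cell above. The "Fifth Owner" label is a hypothetical unmapped category added for illustration.

```python
import pandas as pd

# Owner weights as defined above; "Fifth Owner" is a hypothetical unmapped label
owner = pd.Series(["First Owner", "Second Owner", "Test Drive Car", "Fifth Owner"])
weights = {"First Owner": .8, "Second Owner": .7, "Fourth & Above Owner": .4,
           "Third Owner": .5, "Test Drive Car": .9}

weighted = owner.map(weights)

# Categories missing from the dict silently become NaN
unmapped = owner[weighted.isna()].unique()
print(weighted.tolist())
print("unmapped:", list(unmapped))
```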

Finding weights

sum_sell=df['Selling_Price'].sum()
agg_sell=df.groupby('Owner')['Selling_Price'].agg(["count","mean","sum",'median'])

agg_sell['Owner_weight'] = agg_sell['sum']/sum_sell

agg_sell

                      count  mean           sum         median    Owner_weight
Owner
First Owner           2833   598760.994705  1696289898  450000.0  0.774976
Fourth & Above Owner  81     173901.197531  14085997    130000.0  0.006435
Second Owner          1106   343891.088608  380343544   250499.5  0.173766
Test Drive Car        17     954293.941176  16222997    894999.0  0.007412
Third Owner           303    270247.844884  81885097    190000.0  0.037410
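The Owner_weight column could replace the hand-picked Owner weights used earlier. Mapping each category to its share of total Selling_Price is one option. The sketch below shows this on hypothetical rows. These weights are computed from the target, so on real data they should come from the training split only, to avoid leaking target information into the evaluation.

```python
import pandas as pd

# Hypothetical rows; on the real data use df[['Owner', 'Selling_Price']]
df_demo = pd.DataFrame({
    "Owner": ["First Owner", "First Owner", "Second Owner", "Third Owner"],
    "Selling_Price": [500000, 300000, 150000, 50000],
})

total = df_demo["Selling_Price"].sum()

# Same idea as agg_sell['Owner_weight']: each group's share of the total selling price
owner_weight = df_demo.groupby("Owner")["Selling_Price"].sum() / total

# Series.map with a Series looks values up by index label
df_demo["weighted_Owner"] = df_demo["Owner"].map(owner_weight)
print(df_demo)
```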

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['Car_Name'] = le.fit_transform(df['Car_Name'])
df['Encoded_Transmission'] = le.fit_transform(df['Transmission'])
df['Encoded_Seller_Type'] = le.fit_transform(df['Seller_Type'])
# df['Encoded_Brand'] = le.fit_transform(df['Brand'])
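The cell above refits one LabelEncoder for every column. After the last call it only remembers the Seller_Type classes, so the earlier encodings can no longer be inverted with it. One way around this is to keep one encoder per column, as in the sketch below, which uses a few hypothetical rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows for two of the categorical columns
df_demo = pd.DataFrame({
    "Transmission": ["Manual", "Automatic", "Manual"],
    "Seller_Type": ["Individual", "Dealer", "Trustmark Dealer"],
})

# One fitted encoder per column keeps each column's classes_ for inverse_transform
encoders = {}
for col in ["Transmission", "Seller_Type"]:
    encoders[col] = LabelEncoder()
    df_demo["Encoded_" + col] = encoders[col].fit_transform(df_demo[col])

print(df_demo)
print(encoders["Transmission"].inverse_transform([0, 1]))
```

LabelEncoder assigns codes in sorted class order, so "Automatic" becomes 0 and "Manual" becomes 1.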

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 12 columns):

# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age_of_car 4340 non-null int64
1 Car_Name 4340 non-null int64
2 Selling_Price 4340 non-null int64
3 Kms_Driven 4340 non-null int64
4 Fuel_Type 4340 non-null object
5 Seller_Type 4340 non-null object
6 Transmission 4340 non-null object
7 Owner 4340 non-null object
8 weighted_Owner 4340 non-null float64
9 weighted_Fuel 4340 non-null float64
10 Encoded_Transmission 4340 non-null int64
11 Encoded_Seller_Type 4340 non-null int64
dtypes: float64(2), int64(6), object(4)
memory usage: 407.0+ KB

numeric_cols = df.select_dtypes(include=['number'])

cat_cols = df.select_dtypes(include=['object'])

cat_cols.columns

Index(['Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'], dtype='object')

df_cat=df[['Fuel_Type', 'Seller_Type', 'Transmission', 'Owner']]

df=df[['Age_of_car', 'Car_Name', 'Selling_Price', 'Kms_Driven',\

'weighted_Owner', 'weighted_Fuel', 'Encoded_Transmission',\
'Encoded_Seller_Type']]

df.head()

Age_of_car Car_Name Selling_Price Kms_Driven weighted_Owner weighted_Fuel Encoded_Transmission Encoded_Seller_Typ

0 16 775 60000 70000 0.8 0.4 1

1 16 1041 135000 50000 0.8 0.4 1

2 11 505 600000 100000 0.8 0.3 1

3 6 118 250000 46000 0.8 0.4 1

4 9 279 450000 141000 0.7 0.3 1

# Assuming df is your DataFrame

X = df.drop('Selling_Price', axis=1)
y = df['Selling_Price']

from sklearn.feature_selection import RFE

from sklearn.linear_model import LinearRegression

model = LinearRegression()
rfe = RFE(model, n_features_to_select=1)
rfe.fit(X, y)

feature_ranking = pd.DataFrame({'Feature': X.columns, 'Ranking': rfe.ranking_})

feature_ranking = feature_ranking.sort_values(by='Ranking')
print(feature_ranking)

                Feature  Ranking
4         weighted_Fuel        1
3        weighted_Owner        2
5  Encoded_Transmission        3
0            Age_of_car        4
6   Encoded_Seller_Type        5
1              Car_Name        6
2            Kms_Driven        7

Using Recursive Feature Elimination, weighted_Fuel (fuel type) is ranked as the best feature.
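With LinearRegression, RFE drops the feature with the smallest absolute coefficient at each step. Coefficient size depends on the feature's scale, so columns on a 0 to 1 scale, such as weighted_Fuel, tend to get large coefficients. The sketch below uses hypothetical synthetic data to show that standardizing the features first can reverse the ranking.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: 'a' on a large scale, 'b' on a 0-1 scale (like the weighted_* columns)
a = rng.uniform(0, 1000, 200)
b = rng.uniform(0, 1, 200)
X_demo = np.column_stack([a, b])
y_demo = 1.0 * a + 100.0 * b  # 'a' drives most of the variation in y

# Raw features: |coef_b| = 100 > |coef_a| = 1, so 'a' is eliminated first
raw_rank = RFE(LinearRegression(), n_features_to_select=1).fit(X_demo, y_demo).ranking_

# Standardized features: coefficients now reflect each feature's contribution
X_scaled = StandardScaler().fit_transform(X_demo)
scaled_rank = RFE(LinearRegression(), n_features_to_select=1).fit(X_scaled, y_demo).ranking_

print("raw ranking:", raw_rank, "scaled ranking:", scaled_rank)
```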

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a decision tree model

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

# Plot feature importance

feature_importance = model.feature_importances_
feature_names = X.columns
plt.barh(feature_names, feature_importance)
plt.show()

Using the DecisionTreeRegressor feature importances, transmission (Encoded_Transmission) is the best feature.
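DecisionTreeRegressor() without a random_state can produce different importances from one run to the next when splits tie. Putting the importances into a sorted Series makes the ranking easy to read. The sketch below uses hypothetical synthetic data in which the target depends only on a binary transmission column, with max_depth=3 as an assumed setting. scikit-learn's documentation notes that impurity-based importances can favour high-cardinality features.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
# Hypothetical features mimicking the encoded columns plus a pure-noise column
X_demo = pd.DataFrame({
    "Encoded_Transmission": rng.integers(0, 2, 300),
    "weighted_Fuel": rng.choice([0.0, 0.1, 0.2, 0.3, 0.4], 300),
    "noise": rng.normal(size=300),
})
# Hypothetical target that depends only on transmission
y_demo = 300000 + 400000 * X_demo["Encoded_Transmission"]

# random_state makes the fitted tree (and its importances) reproducible
tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_demo, y_demo)
importance = pd.Series(tree.feature_importances_, index=X_demo.columns).sort_values(ascending=False)
print(importance)
```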

df.drop('Car_Name',axis=1,inplace=True)

continuous_features=df.select_dtypes(np.int64)

categorical_features=df.select_dtypes(np.object_)

# Candidate association measures for categorical features: Theil's U statistic, Cramér's V, chi-square, weight of evidence

df.columns

Index(['Age_of_car', 'Selling_Price', 'Kms_Driven', 'weighted_Owner',

'weighted_Fuel', 'Encoded_Transmission', 'Encoded_Seller_Type'],
dtype='object')

from scipy.stats import chi2_contingency

!pip install association-metrics

Requirement already satisfied: association-metrics in /usr/local/lib/python3.10/dist-packages (0.0.1)

categorical_cols = df_cat

import association_metrics as am
import pandas as pd
import seaborn as sns
df = categorical_cols.apply(lambda x: x.astype("category") if x.dtype == "object" else x)
cramers_v = am.CramersV(df)
cfit = cramers_v.fit().round(2)
print(cfit)

Fuel_Type Seller_Type Transmission Owner

Fuel_Type 1.00 0.04 0.07 0.03
Seller_Type 0.04 1.00 0.21 0.21
Transmission 0.07 0.21 1.00 0.09
Owner 0.03 0.21 0.09 1.00

plt.figure(figsize=(10, 8))
sns.heatmap(cfit, annot=True, cmap='coolwarm', fmt='.2f', linewidths=.5)
plt.title("Cramér's V Heatmap")
plt.show()
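The notebook imports chi2_contingency but never calls it. The sketch below uses it to compute Cramér's V by hand, V = sqrt(chi² / (n · (min(rows, cols) − 1))), without Yates correction or bias correction. association_metrics may apply a different variant, so its values are not guaranteed to match exactly. The inputs are hypothetical toy Series.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V from a contingency table (no Yates correction, no bias correction)."""
    table = pd.crosstab(x, y)
    k = min(table.shape) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    return float(np.sqrt(chi2 / (n * k)))

x = pd.Series(["a", "a", "b", "b"])
v_perfect = cramers_v(x, pd.Series(["u", "u", "v", "v"]))  # perfectly associated
v_none = cramers_v(x, pd.Series(["u", "v", "u", "v"]))     # independent
print(v_perfect, v_none)
```

On the notebook data, cramers_v(df_cat['Seller_Type'], df_cat['Owner']) would give one cell of the heatmap.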

