Trilokesh Assignment
import numpy as np
import pandas as pd
Read File
import plotly.express as px
df=pd.read_csv('cardekho.csv')
df.head(5)
   Car_Name                  Year  Selling_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner
0  Maruti 800 AC             2007  60000          70000       Petrol     Individual   Manual        First Owner
1  Maruti Wagon R LXI Minor  2007  135000         50000       Petrol     Individual   Manual        First Owner
2  Hyundai Verna 1.6 SX      2012  600000         100000      Diesel     Individual   Manual        First Owner
3  Datsun RediGO T Option    2017  250000         46000       Petrol     Individual   Manual        First Owner
4  Honda Amaze VX i-DTEC     2014  450000         141000      Diesel     Individual   Manual        Second Owner
df['Year'].sort_values(ascending=False)
3206 2022
4179 2021
1777 2020
2481 2020
1575 2020
...
3661 1997
61 1996
2972 1996
631 1995
3334 1992
Name: Year, Length: 4340, dtype: int64
current_age=2023
df['Age']=current_age-df['Year']
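The age calculation above can be sketched on a tiny stand-in frame. The three rows and the name `REFERENCE_YEAR` are illustrative assumptions, not part of the notebook; the constant plays the role of `current_age`, which actually holds a calendar year rather than an age.

```python
import pandas as pd

# Hypothetical three-row sample standing in for cardekho.csv.
sample = pd.DataFrame({"Year": [2007, 2012, 2017]})

# Assumption: the analysis treats 2023 as "now".
REFERENCE_YEAR = 2023

# Age in years is the reference year minus the model year.
sample["Age"] = REFERENCE_YEAR - sample["Year"]
print(sample)
```

Naming the constant after what it stores makes it harder to confuse the reference year with the derived age column.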
df.columns
Data Visualisation
# Scatter Matrix
scatter_matrix_fig = px.scatter_matrix(df, dimensions=['Year', 'Selling_Price', 'Kms_Driven'],\
color='Fuel_Type', title='Scatter Matrix')
scatter_matrix_fig.show()
https://colab.research.google.com/drive/1bYSu9K9cH9GP3WFcoRGXSaSx3QJgI_mn?authuser=3#scrollTo=CC0C_UO28h4n&printMode=true 1/15
11/11/2023, 21:32 MLops_Plotly.ipynb - Colaboratory
[Figure: Scatter Matrix of Year, Selling_Price, and Kms_Driven, colored by Fuel_Type]
fig = px.scatter(df, x='Selling_Price', y='Kms_Driven', title='Selling Price vs. Kms_Driven')
fig.show()
[Figure: Selling Price vs. Kms_Driven; x-axis Selling_Price, y-axis Kms_Driven]
# Box Plot
box_plot_fig = px.box(df, x='Fuel_Type', y='Selling_Price', color='Transmission',\
title='Box Plot of Selling Price by Fuel Type and Transmission')
box_plot_fig.show()
[Figure: Box Plot of Selling Price by Fuel Type and Transmission; x-axis Fuel_Type, y-axis Selling_Price]
px.box(df,x='Selling_Price',points='suspectedoutliers')
[Figure: box plot of Selling_Price with suspected outliers marked]
df.loc[df['Kms_Driven']<1000]
      Car_Name                           Year  Selling_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner           Age
1312  Mahindra Quanto C6                 2014  250000         1           Diesel     Individual   Manual        Second Owner    9
1714  Ford Freestyle Titanium Diesel     2020  784000         101         Diesel     Dealer       Manual        Test Drive Car  3
1715  Ford Figo Titanium                 2020  635000         101         Petrol     Dealer       Manual        Test Drive Car  3
1716  Ford Ecosport 1.5 Diesel Titanium  2020  1000000        101         Diesel     Dealer       Manual        Test Drive Car  3
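The box plot flags unusually large values, but very small odometer readings like the ones above need their own check. The sketch below, on a hypothetical handful of `Kms_Driven` values, assumes the usual 1.5 x IQR fences and shows why an explicit threshold such as `< 1000` is still needed; `iqr_bounds` is an illustrative helper, not a pandas function.

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5):
    """Return the lower and upper fences q1 - k*IQR and q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical readings: two near-zero values, typical values, one extreme value.
kms = pd.Series([1, 101, 46000, 50000, 70000, 100000, 141000, 800000],
                name="Kms_Driven")

low, high = iqr_bounds(kms)
flagged = kms[(kms < low) | (kms > high)]   # values outside the fences
suspicious_low = kms[kms < 1000]            # explicit low-reading check

print(flagged.tolist())
print(suspicious_low.tolist())
```

Because the lower fence falls below zero for this right-skewed sample, the near-zero readings are not flagged by the fences and only the explicit filter catches them.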
[Figure: pie chart; slices 74.7%, 22.9%, 2.35%]
[Figure: Fuel_Type (Petrol, Diesel, CNG, LPG, Electric) broken down by Transmission (Manual, Automatic)]
[Figure: histogram of Selling_Price; y-axis count]
cars=df.groupby('Year')['Year'].count()
fig = px.line(cars, x=cars.index, y=cars.values,color_discrete_sequence=['#6495ED'],title='Cars_Sold_By_Year')
fig.show()
[Figure: Cars_Sold_By_Year; line chart of the number of cars by Year]
1. The highest sales were recorded in 2017; sales decreased after that.
2. Surprisingly, fewer cars were sold in 2019 even though the selling price was high that year, so it is possible that more electric cars were sold in 2019.
cars=df.groupby('Transmission')['Transmission'].count()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Cars_Sold'},color_discrete_sequence=['#03DAC5'])
fig12.show()
[Figure: bar chart of Cars_Sold by Transmission (Automatic, Manual)]
Manual transmission is more common than automatic transmission in this dataset.
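The comparison above can also be expressed as shares rather than raw counts. A minimal sketch, assuming a hypothetical ten-row `Transmission` column; `value_counts` gives the same counts as the `groupby(...).count()` call, and `normalize=True` converts them to fractions.

```python
import pandas as pd

# Hypothetical sample: nine manual cars and one automatic car.
transmission = pd.Series(["Manual"] * 9 + ["Automatic"], name="Transmission")

counts = transmission.value_counts()                 # cars per transmission type
shares = transmission.value_counts(normalize=True)   # fraction of all cars

print(counts)
print(shares)
```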
cars=df.groupby('Year')['Selling_Price'].mean()
fig = px.line(cars, x=cars.index, y=cars.values,title='Avg_selling_Price_by_Year',color_discrete_sequence=['#03DAC5'])
fig.show()
[Figure: Avg_selling_Price_by_Year; line chart of mean Selling_Price by Year]
2019 has the highest average selling price, while 1999 shows a sudden drop; after that, the average selling price gradually increases.
fig_box.show()
Average Kms_Driven is around 60k km, while a few cars have been driven about 800k km.
fig_box.show()
[Figure: Distribution of selling_price; x-axis Selling_Price]
fig_box.show()
[Figure: Distribution of selling_price; x-axis Age]
The average Selling_Price is about 350k, while a few cars have a Selling_Price between 8M and 9M.
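When a few very expensive cars sit far from the bulk of the data, the mean and the median can tell different stories. The sketch below uses five hypothetical prices (not taken from cardekho.csv) to show how one extreme value pulls the mean upward while the median stays with the typical cars.

```python
import pandas as pd

# Hypothetical prices: four typical cars and one very expensive car.
prices = pd.Series([250_000, 300_000, 350_000, 400_000, 8_900_000],
                   name="Selling_Price")

mean_price = prices.mean()      # sensitive to the extreme value
median_price = prices.median()  # robust to the extreme value

print(mean_price, median_price)
```

Reporting both numbers next to a box plot makes it clear which "average" a statement refers to.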
cars=df.groupby('Fuel_Type')['Kms_Driven'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_Kms_Driven'},color_discrete_sequence=['#6495ED'])
fig12.show()
[Figure: bar chart of Avg_Kms_Driven by Fuel_Type]
cars=df.groupby('Owner')['Selling_Price'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_Selling_Price_By_Ownership'},color_discrete_sequence=['#6495ED'])
fig12.show()
[Figure: bar chart of Avg_Selling_Price_By_Ownership by Owner]
cars=df.groupby('Fuel_Type')['Selling_Price'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_selling_price_by_Fuel_type'},color_discrete_sequence=['#03DAC5'])
fig12.show()
[Figure: bar chart of Avg_selling_price_by_Fuel_type by Fuel_Type]
Electric cars are priced higher than diesel cars, though more diesel cars were sold than electric cars.
cars=df.groupby('Fuel_Type')['Fuel_Type'].count()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Fuel_type_Cars_Sold'},color_discrete_sequence=['#6495ED'])
fig12.show()
[Figure: bar chart of Fuel_type_Cars_Sold by Fuel_Type]
cars=df.groupby('Fuel_Type').agg({'Selling_Price':'mean','Fuel_Type':'count'})
cars['Revenue']=cars['Selling_Price']*cars['Fuel_Type']
fig = px.bar(cars, x=cars.index, y=cars['Revenue'],title='Avg Revenue by Fuel type')
fig.update_xaxes(categoryorder='total descending')
# fig.update_yaxes(showgrid=False),
# fig.update_xaxes(showgrid=False),
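The `Revenue` column above multiplies the mean price by the number of cars, which is the same as summing the prices per group. A minimal sketch on a hypothetical five-row sample (values are illustrative, not read from cardekho.csv):

```python
import pandas as pd

# Hypothetical sample with two fuel types.
sample = pd.DataFrame({
    "Fuel_Type": ["Diesel", "Diesel", "Petrol", "Petrol", "Petrol"],
    "Selling_Price": [600_000, 450_000, 60_000, 135_000, 255_000],
})

by_fuel = sample.groupby("Fuel_Type")["Selling_Price"].agg(["mean", "count", "sum"])

# mean * count reproduces the per-group total.
by_fuel["Revenue"] = by_fuel["mean"] * by_fuel["count"]
print(by_fuel)
```

Since the product is a total rather than an average, a title such as "Total revenue by fuel type" may describe the chart more precisely.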
[Figure: Avg Revenue by Fuel type; bar chart of Revenue by Fuel_Type, bars ordered Diesel, Petrol, CNG, Electric]
Diesel cars make the major contribution to revenue.
fig = px.histogram(df, x="Owner", color="Fuel_Type", barmode="stack")
fig.update_layout(
    title="Cardekho - Stacked Column Chart by Owner and Fuel Type",
    xaxis_title="Owner_Type",
    yaxis_title="Count",
    legend_title="Fuel_Type")
fig.show()
[Figure: Cardekho - Stacked Column Chart by Owner and Fuel Type; x-axis Owner_Type, y-axis Count]
For every owner type, diesel cars are the first preference and petrol cars the second.
# Pair Plot
pair_plot_fig = px.scatter(df, x='Year', y='Selling_Price', color='Fuel_Type',\
marginal_y='violin', marginal_x='histogram',\
title='Pair Plot of Year vs Selling Price')
pair_plot_fig.show()
[Figure: Pair Plot of Year vs Selling Price; x-axis Year, y-axis Selling_Price, colored by Fuel_Type]
# numeric_only=True restricts the correlation to numeric columns and avoids the pandas FutureWarning
corr=df.corr(numeric_only=True)
fig = px.imshow(corr, color_continuous_scale='YlOrRd',text_auto=True)
fig.update_layout(
    title='Correlation Matrix',
    margin=dict(l=100, r=100, t=100, b=100))
fig.show()
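The correlation matrix only makes sense for numeric columns, so text columns such as Fuel_Type have to be left out. A minimal sketch on a hypothetical four-row frame; it selects numeric columns explicitly with `select_dtypes`, which behaves the same across pandas versions (recent versions also accept `numeric_only=True` in `DataFrame.corr`).

```python
import pandas as pd

# Hypothetical sample mixing numeric and text columns.
sample = pd.DataFrame({
    "Year": [2007, 2012, 2014, 2017],
    "Selling_Price": [60_000, 600_000, 450_000, 250_000],
    "Fuel_Type": ["Petrol", "Diesel", "Diesel", "Petrol"],
})

# Keep only numeric columns before computing pairwise Pearson correlations.
corr = sample.select_dtypes(include="number").corr()
print(corr)
```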
cars=df.groupby('Age')['Selling_Price'].mean()
fig = px.bar(cars, x=cars.index, y=cars.values,title='Avg. Selling price by Age')
fig.update_xaxes(categoryorder='total descending')
fig.update_yaxes(showgrid=False),
fig.update_xaxes(showgrid=False),
fig.update_layout(xaxis_title='Age', yaxis_title="Avg. Selling Price",
plot_bgcolor='#2d3035', paper_bgcolor='#2d3035',
title_font=dict(size=25, color='#ffffff', family="Muli, sans-serif"),
font=dict(color='#ffffff'),
)
[Figure: Avg. Selling price by Age; x-axis Age, y-axis Avg. Selling Price]
df['Brand_Name']=df['Car_Name'].str.split()
df['Brand_Name']=df['Brand_Name'].apply(lambda x:x[0])
df.head(1)
   Car_Name       Year  Selling_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner        Age  Brand_Name
0  Maruti 800 AC  2007  60000          70000       Petrol     Individual   Manual        First Owner  16   Maruti
cars=df.groupby('Brand_Name')['Selling_Price'].mean()
fig12=px.bar(cars,x=cars.index,y=cars.values,labels={'y':'Avg_SellingPrice_By_Brand'},color_discrete_sequence=['#03DAC5'])
fig12.update_xaxes(categoryorder='total descending')
fig12.show()
[Figure: bar chart of Avg_SellingPrice_By_Brand by Brand_Name, in descending order]
df['Brand_Name'].nunique()
29
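The brand is taken as the first word of `Car_Name`. The sketch below shows a vectorized way to do the same thing with the `.str` accessor, on four hypothetical names; it also illustrates that two-word brands are cut to their first word, which affects the brand count.

```python
import pandas as pd

# Hypothetical car names; the last one has a two-word brand.
names = pd.Series(
    ["Maruti 800 AC", "Hyundai Verna 1.6 SX",
     "Honda Amaze VX i-DTEC", "Land Rover Discovery"],
    name="Car_Name",
)

# Split on whitespace and keep the first token of each name.
brand = names.str.split().str[0]
print(brand.tolist())
```

If multi-word brands matter for the analysis, a small lookup of known exceptions could be applied after this step.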
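cars=df.groupby('Brand_Name').agg({'Selling_Price':'mean','Brand_Name':'count'})
# Revenue = mean price * number of cars; aggregating with 'sum' here would multiply the total by the count a second time
cars['Revenue']=cars['Selling_Price']*cars['Brand_Name']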
fig = px.bar(cars, x=cars.index, y=cars['Revenue'],title='Revenue by Car_Brands')
fig.update_xaxes(categoryorder='total descending')
[Figure: Revenue by Car_Brands; bar chart of Revenue by Brand_Name, in descending order]
df=pd.read_csv('cardekho.csv')
Assigning weights according to domain knowledge
# Finding The age of Cars
df.insert(0, "Age_of_car", df["Year"].max()+1-df["Year"] )
df.drop('Year', axis=1, inplace=True)
# df.head()
column_to_encode = 'Owner'
weights = {'First Owner': .8, 'Second Owner': .7, 'Fourth & Above Owner': .4,'Third Owner':.5,'Test Drive Car':.9}
df['weighted_Owner'] = df[column_to_encode].map(weights)
column_to_encode = 'Fuel_Type'
weights = {'CNG': 0, 'Diesel': .3, 'Electric': .2, 'LPG': .1, 'Petrol': .4}
# Create a new column for the weighted labels
df['weighted_Fuel'] = df[column_to_encode].map(weights)
# df.drop(['Owner','Fuel_Type'],inplace=True)
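`Series.map` with a dictionary returns NaN for any category missing from the dictionary, so it is worth checking the result after encoding. A minimal sketch using the owner weights from above; the value "Fifth Owner" is a hypothetical category added only to trigger the check.

```python
import pandas as pd

# Owner weights as assigned in the notebook.
weights = {'First Owner': .8, 'Second Owner': .7, 'Fourth & Above Owner': .4,
           'Third Owner': .5, 'Test Drive Car': .9}

# Hypothetical column; "Fifth Owner" is not a key in the weights dictionary.
owner = pd.Series(["First Owner", "Second Owner", "Test Drive Car", "Fifth Owner"],
                  name="Owner")

weighted = owner.map(weights)

# Categories that were not mapped show up as NaN in the result.
unmapped = owner[weighted.isna()].unique().tolist()
print(unmapped)
```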
Finding weights
sum_sell=df['Selling_Price'].sum()
agg_sell=df.groupby('Owner')['Selling_Price'].agg(["count","mean","sum",'median'])
agg_sell['Owner_weight'] = agg_sell['sum']/sum_sell
agg_sell
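`Owner_weight` above is each owner category's share of the total selling price, so the weights sum to 1. A minimal sketch on a hypothetical four-row sample:

```python
import pandas as pd

# Hypothetical sample; prices are illustrative.
sample = pd.DataFrame({
    "Owner": ["First Owner", "First Owner", "Second Owner", "Third Owner"],
    "Selling_Price": [600_000, 250_000, 450_000, 200_000],
})

total = sample["Selling_Price"].sum()

# Share of the total selling price contributed by each owner category.
owner_weight = sample.groupby("Owner")["Selling_Price"].sum() / total
print(owner_weight)
```

Weights derived this way reflect both how many cars a category has and how expensive they are, which differs from the hand-assigned weights in the previous step.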
Owner
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age_of_car 4340 non-null int64
1 Car_Name 4340 non-null int64
2 Selling_Price 4340 non-null int64
3 Kms_Driven 4340 non-null int64
4 Fuel_Type 4340 non-null object
5 Seller_Type 4340 non-null object
6 Transmission 4340 non-null object
7 Owner 4340 non-null object
8 weighted_Owner 4340 non-null float64
9 weighted_Fuel 4340 non-null float64
10 Encoded_Transmission 4340 non-null int64
11 Encoded_Seller_Type 4340 non-null int64
dtypes: float64(2), int64(6), object(4)
memory usage: 407.0+ KB
numeric_cols = df.select_dtypes(include=['number'])
cat_cols = df.select_dtypes(include=['object'])
cat_cols.columns
df.head()
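from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
# Rank all features by recursively eliminating the one with the smallest coefficient
model = LinearRegression()
rfe = RFE(model, n_features_to_select=1)
rfe.fit(X, y)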
output
   Feature               Ranking
4  weighted_Fuel         1
3  weighted_Owner        2
5  Encoded_Transmission  3
0  Age_of_car            4
6  Encoded_Seller_Type   5
1  Car_Name              6
2  Kms_Driven            7
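To show how a ranking like the one above is produced, here is a minimal sketch of recursive feature elimination on synthetic data. The three features and the target are generated for illustration only; they are assumptions and not the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: three standard-normal features.
X = rng.normal(size=(200, 3))
# The target depends strongly on column 0, weakly on column 1, and not on column 2.
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Keep eliminating the feature with the smallest coefficient until one is left.
rfe = RFE(LinearRegression(), n_features_to_select=1)
rfe.fit(X, y)

ranking = rfe.ranking_   # 1 = kept longest, larger = eliminated earlier
print(ranking)
```

With a linear model, the elimination order follows coefficient magnitude, which depends on each feature's scale. A feature measured in large units, such as kilometres driven, can therefore rank low unless the features are standardized first.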
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
df.drop('Car_Name',axis=1,inplace=True)
continuous_features=df.select_dtypes(np.int64)
categorical_features=df.select_dtypes(np.object_)
df.columns
categorical_cols = df_cat
import association_metrics as am
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = categorical_cols.apply(lambda x: x.astype("category") if x.dtype == "object" else x)
cramers_v = am.CramersV(df)
cfit = cramers_v.fit().round(2)
print(cfit)
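For reference, Cramér's V between two categorical columns can be computed by hand from a contingency table. The sketch below uses only pandas and NumPy and the uncorrected formula V = sqrt(chi2 / (n * (min(rows, cols) - 1))); `cramers_v` is an illustrative helper, and its values may differ slightly from `association_metrics` if that library applies a correction.

```python
import numpy as np
import pandas as pd

def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Uncorrected Cramér's V; assumes each series has at least two categories."""
    observed = pd.crosstab(a, b).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Hypothetical columns: fuel fully determines seller, transmission is unrelated.
fuel = pd.Series(["Petrol", "Petrol", "Diesel", "Diesel"])
seller = pd.Series(["Individual", "Individual", "Dealer", "Dealer"])
transmission = pd.Series(["Manual", "Automatic", "Manual", "Automatic"])

v_dependent = cramers_v(fuel, seller)
v_independent = cramers_v(fuel, transmission)
print(v_dependent, v_independent)
```

Values near 1 indicate that one column almost determines the other, while values near 0 indicate little association.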
plt.figure(figsize=(10, 8))
sns.heatmap(cfit, annot=True, cmap='coolwarm', fmt='.2f', linewidths=.5)
plt.title("Cramér's V Heatmap")
plt.show()