Datacleaning - Ipynb - Colab

The document contains a Jupyter notebook for data cleaning using a car dataset loaded from a Google Sheets CSV link. It performs various operations such as dropping rows with engine values greater than 1500, filling missing engine values with the mode, and removing the 'model' column due to low uniqueness. The notebook also includes visualizations using Plotly to analyze relationships between engine size, kilometers driven, and price.

10/22/24, 7:17 PM datacleaning.ipynb - Colab

import pandas as pd

df=pd.read_csv("https://docs.google.com/spreadsheets/d/1okbuAhRRXId5a5LPC85wSE2MBgKpQ-kY4nnABE3gCFc/export?format=csv&usp=sharing")

df

     brand     model     transmission  age  fuel    engine  km     owner  price
0    mahindra  thar      manual        4    diesel  2184.0  11003  1      1231000
1    hyundai   verna     manual        6    petrol  1591.0  66936  1      786000
2    tata      harrier   manual        2    diesel  1956.0  27990  1      1489000
3    honda     city      automatic     1    petrol  1498.0  5061   1      1227000
4    ford      ecosport  manual        3    diesel  1498.0  23480  1      887000
...  ...       ...       ...           ...  ...     ...     ...    ...    ...
174  maruti    baleno    manual        8    petrol  1197.0  33099  1      473000
175  mg        hector    automatic     4    petrol  1451.0  66590  1      1269000
176  hyundai   creta     manual        4    petrol  1497.0  80554  2      968000
177  kia       seltos    automatic     5    diesel  1493.0  51955  1      1266000
178  hyundai   creta     automatic     8    petrol  1591.0  61059  3      715000

179 rows × 9 columns


import plotly.express as px

px.scatter(x=df['engine'],y=df['km'])

[Scatter plot: engine on the x-axis (ticks 800 to 1800), km on the y-axis (ticks 20k to 120k)]

i=df[df['engine']>1500].index

df=df.drop(index=i)

# Add a running 0..n-1 counter as an 'index' column; the DataFrame's own row labels stay unchanged
df['index']=range(len(df))
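
The drop-and-renumber step above can also be written with a boolean mask and `reset_index`. This is a minimal sketch on a toy frame (the values are made up for illustration and are not the notebook's dataset). It differs from the notebook in one respect: it renumbers the DataFrame's own row labels instead of adding a separate `index` column, so the original labels are discarded.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not the notebook's dataset).
toy = pd.DataFrame({
    "engine": [2184.0, 1498.0, np.nan, 1199.0],
    "km": [11003, 5061, 59866, 44787],
})

# Keep rows whose engine is NOT above 1500. NaN > 1500 evaluates to False,
# so rows with a missing engine value are kept, as they are in the notebook.
kept = toy[~(toy["engine"] > 1500)].reset_index(drop=True)
print(kept)
```

Writing the condition as `~(engine > 1500)` rather than `engine <= 1500` matters here: the second form would silently drop the rows with a missing engine value.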

px.scatter(x=df['engine'],y=df['km'])


[Scatter plot after dropping engine > 1500: engine on the x-axis (ticks 800 to 1300), km on the y-axis (ticks 20k to 120k)]

px.scatter(x=df['price'],y=df['km'])

[Scatter plot: price on the x-axis (ticks 0.2M to 1.2M), km on the y-axis (ticks 20k to 120k)]

df.isnull().sum()


brand           0
model           0
transmission    0
age             0
fuel            0
engine          4
km              0
owner           0
price           0
index           0
dtype: int64

df[df['engine'].isnull()]

     brand    model   transmission  age  fuel    engine  km     owner  price   index
7    tata     nexon   automatic     4    diesel  NaN     59866  1      728000  4
13   hyundai  creta   manual        6    petrol  NaN     59085  1      823000  9
74   hyundai  new     automatic     0    petrol  NaN     3295   1      974000  66
130  kia      seltos  manual        5    petrol  NaN     90505  1      980000  117

df['engine'].unique()

array([1498., 1199.,   nan, 1497., 1197.,  998., 1493., 1451., 1194.,
       1462.,  796.,  999., 1496., 1086., 1373., 1198., 1248.,  799.,
       1396., 1368.])

df['engine'].mode()[0]

1197.0

df['engine']=df['engine'].fillna(df['engine'].mode()[0])
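
A note on the mode-based fill above: `Series.mode()` returns a Series, not a scalar, and when several values tie for most frequent it returns all of them in sorted order, so `[0]` picks the smallest. The sketch below shows that behaviour on made-up values and, purely as an alternative for comparison, a median fill. The notebook itself uses only the mode.

```python
import numpy as np
import pandas as pd

# Made-up engine sizes with a two-way tie for the most frequent value.
engine = pd.Series([1197.0, 1197.0, 1498.0, 1498.0, np.nan])

modes = engine.mode()        # NaN is ignored; tied modes come back sorted
fill_value = modes[0]        # the smallest of the tied modes
filled = engine.fillna(fill_value)

# Alternative (not used in the notebook): fill with the median instead.
median_filled = engine.fillna(engine.median())

print(modes.tolist())
print(filled.tolist())
print(median_filled.tolist())
```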

df[df['engine'].isnull()]

brand model transmission age fuel engine km owner price index

df.isnull().sum()

brand           0
model           0
transmission    0
age             0
fuel            0
engine          0
km              0
owner           0
price           0
index           0
dtype: int64

df.duplicated().sum()
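
The output of this cell is not preserved in the printout. One thing worth checking when reproducing it: `df.duplicated()` compares whole rows, and the `index` column added earlier holds a distinct value in every row, so no two rows can compare equal while that column is included. A minimal sketch on a toy frame (values made up for illustration) that excludes the helper column through the `subset` parameter:

```python
import pandas as pd

# Toy data for illustration only; the first two rows are identical listings.
toy = pd.DataFrame({
    "brand": ["tata", "tata", "kia"],
    "km": [450, 450, 51955],
})
toy["index"] = range(len(toy))   # helper counter column, as in the notebook

# With the helper column included, every row is unique.
with_helper = toy.duplicated().sum()

# Compare on the real data columns only.
cols = [c for c in toy.columns if c != "index"]
without_helper = toy.duplicated(subset=cols).sum()
deduped = toy.drop_duplicates(subset=cols)

print(with_helper, without_helper, len(deduped))
```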

len(df['model'].unique())/len(df)

0.2981366459627329

len(df['model'].unique())

48

len(df)

161

df=df.drop(columns=['model'])
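
The notebook computes the unique-value ratio for `model` only and does not state the rule it used to decide that the column should go. The same ratio can be computed for every column at once, as sketched below on a toy frame. The data and the `threshold` value are hypothetical, chosen only to illustrate the pattern. Note that `nunique()` skips missing values by default, whereas `len(df[col].unique())` counts NaN as a value.

```python
import pandas as pd

# Toy data for illustration only (not the notebook's dataset).
toy = pd.DataFrame({
    "brand": ["tata", "tata", "kia", "kia"],
    "model": ["nexon", "harrier", "seltos", "sonet"],
    "fuel": ["petrol", "diesel", "petrol", "petrol"],
})

# Fraction of distinct values per column.
ratio = toy.nunique() / len(toy)

threshold = 0.9   # hypothetical cutoff, not taken from the notebook
to_drop = ratio[ratio > threshold].index.tolist()

print(ratio)
print(to_drop)
```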

df

     brand     transmission  age  fuel    engine  km     owner  price    index
3    honda     automatic     1    petrol  1498.0  5061   1      1227000  0
4    ford      manual        3    diesel  1498.0  23480  1      887000   1
5    honda     manual        3    petrol  1199.0  44787  1      796000   2
6    tata      manual        2    petrol  1199.0  450    1      813000   3
7    tata      automatic     4    diesel  1197.0  59866  1      728000   4
...  ...       ...           ...  ...     ...     ...    ...    ...      ...
173  mahindra  manual        4    petrol  1197.0  40924  1      681000   156
174  maruti    manual        8    petrol  1197.0  33099  1      473000   157
175  mg        automatic     4    petrol  1451.0  66590  1      1269000  158
176  hyundai   manual        4    petrol  1497.0  80554  2      968000   159
177  kia       automatic     5    diesel  1493.0  51955  1      1266000  160

161 rows × 9 columns
