Data Science Project
import pandas as pd
df = pd.read_csv('/content/Housing_Data_Set (1).csv')
df
output price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking prefar
... ... ... ... ... ... ... ... ... ... ... ...
df.head()
price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking prefarea
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 538 non-null float64
1 area 545 non-null int64
2 bedrooms 544 non-null float64
3 bathrooms 545 non-null int64
4 stories 543 non-null float64
5 mainroad 545 non-null object
6 guestroom 543 non-null object
7 basement 545 non-null object
8 hotwaterheating 537 non-null object
9 airconditioning 544 non-null object
10 parking 545 non-null int64
11 prefarea 545 non-null object
12 furnishingstatus 545 non-null object
13 Date 542 non-null object
dtypes: float64(3), int64(3), object(8)
memory usage: 59.7+ KB
df.describe()
https://colab.research.google.com/drive/1HU684UqeRwNdBFPV2X1bDAnDeztx5GVD#scrollTo=W2c02AzKM3kZ&printMode=true 1/7
12/4/23, 7:28 PM Data Cleaning Project 1st Draft.ipynb - Colaboratory
df.isnull().sum()
price 7
area 0
bedrooms 1
bathrooms 0
stories 2
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
Date 3
dtype: int64
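The counts above are easier to compare with a percentage column next to them. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration and are not the housing data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the housing data
toy = pd.DataFrame({
    "price": [100.0, np.nan, 300.0, 400.0],
    "area": [50, 60, 70, 80],
    "guestroom": ["yes", None, "no", "no"],
})

# Count and percentage of missing values per column
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean() * 100,
})

# Keep only the columns that actually have gaps, largest count first
missing = missing[missing["n_missing"] > 0].sort_values("n_missing", ascending=False)
print(missing)
```

On the real data the same two expressions can be applied to `df` instead of `toy`.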
df = df.drop(columns = 'Date')
df.head()
Fill the null values in the numeric columns with the column mean.
df['price'] = df['price'].fillna(df['price'].mean())
df.isnull().sum()
price 0
area 0
bedrooms 1
bathrooms 0
stories 2
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
df
df['bedrooms'] = df['bedrooms'].fillna(df['bedrooms'].mean())
df.isnull().sum()
price 0
area 0
bedrooms 0
bathrooms 0
stories 2
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
df['stories'] = df['stories'].fillna(df['stories'].mean())
df.isnull().sum()
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
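The three mean fills above repeat the same pattern, so they could also be written as one pass over all numeric columns. This is a sketch on a hypothetical toy frame (the values are invented); it assumes every numeric column should be filled with its own mean, which the notebook does not state explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the column names mirror the housing data
toy = pd.DataFrame({
    "price": [100.0, np.nan, 300.0],
    "bedrooms": [2.0, 3.0, np.nan],
    "stories": [1.0, np.nan, 3.0],
    "mainroad": ["yes", "no", None],
})

# Select the numeric columns and fill each one with its own mean.
# DataFrame.mean() skips NaN by default, and fillna() with a Series
# matches the fill values to columns by label.
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())

print(toy)
```

For count-like columns such as bedrooms or stories the mean can be fractional; the median is a common alternative when whole numbers are wanted.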
df['guestroom'] = df['guestroom'].fillna(df['guestroom'].mode().iloc[0])
df.isnull().sum()
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
df['hotwaterheating'] = df['hotwaterheating'].fillna(df['hotwaterheating'].mode().iloc[0])
df.isnull().sum()
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
df['airconditioning'] = df['airconditioning'].fillna(df['airconditioning'].mode().iloc[0])
df.isnull().sum()
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
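The categorical columns were each filled with their most frequent value. The same steps can be sketched as a loop; the toy frame below is hypothetical and only reuses the column names.

```python
import pandas as pd

# Hypothetical toy frame with yes/no columns and a few gaps
toy = pd.DataFrame({
    "guestroom": ["no", "no", None, "yes"],
    "hotwaterheating": ["no", None, "no", "no"],
    "airconditioning": ["yes", "yes", "no", None],
})

for col in ["guestroom", "hotwaterheating", "airconditioning"]:
    # mode() returns a Series (there can be ties); take the first entry
    most_common = toy[col].mode().iloc[0]
    toy[col] = toy[col].fillna(most_common)

print(toy.isnull().sum())
```

Assigning the result back to the column avoids relying on `inplace=True` through a column selection, which newer pandas versions warn about.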
df = df.sort_values('furnishingstatus')
df.head()
# reset_index() keeps the old row labels as a new 'index' column; pass drop=True to discard them
df = df.reset_index()
df
index price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking
... ... ... ... ... ... ... ... ... ... ... ... ...
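The extra `index` column in the output above comes from `reset_index()` keeping the old row labels. A small sketch of the difference, using a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration
toy = pd.DataFrame({
    "price": [300, 100, 200],
    "furnishingstatus": ["unfurnished", "furnished", "semi-furnished"],
})

# Default: the old row labels are kept as a new 'index' column
kept = toy.sort_values("furnishingstatus").reset_index()

# drop=True: the old row labels are discarded
dropped = toy.sort_values("furnishingstatus").reset_index(drop=True)

print(kept.columns.tolist())     # ['index', 'price', 'furnishingstatus']
print(dropped.columns.tolist())  # ['price', 'furnishingstatus']
```

Whether to keep the column depends on whether the original row order is needed later; if it is kept, it will also show up as a numeric variable in plots such as a pair plot.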
df.isnull().sum()
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
import matplotlib.pyplot as plt
plt.scatter(df["price"], df["area"])
<matplotlib.collections.PathCollection at 0x7a4a6102f010>
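The scatter plot above has no axis labels, so it is hard to tell which variable is on which axis. A possible labeled version is sketched below on hypothetical values. Putting area on the x-axis is an assumption of this sketch (the notebook plots price on x), and the `Agg` backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy values standing in for the housing data
toy = pd.DataFrame({
    "area": [3000, 4500, 6000, 7500],
    "price": [2.5e6, 4.0e6, 5.5e6, 7.0e6],
})

fig, ax = plt.subplots()
ax.scatter(toy["area"], toy["price"])
ax.set_xlabel("area")
ax.set_ylabel("price")
ax.set_title("price vs. area")
```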
import seaborn as sns
sns.pairplot(df)
<seaborn.axisgrid.PairGrid at 0x7a4a6109f880>
sns.set(style="darkgrid")
<Axes: >
<seaborn.axisgrid.FacetGrid at 0x7a4a5c071e40>