
DMV - 3 - Jupyter Notebook

Uploaded by

Anushka Jadhav

10/6/24, 7:24 PM DMV_3 - Jupyter Notebook

In [1]: import pandas as pd

In [2]: df = pd.read_csv('Housing.csv')

In [3]: # Normalize column names: trim whitespace, use underscores, drop other punctuation
df.columns = df.columns.str.strip()
df.columns = df.columns.str.replace(' ', '_')
df.columns = df.columns.str.replace('[^A-Za-z0-9_]', '', regex=True)
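To make the effect of the three renaming steps in cell 3 visible, here is a minimal sketch on a few made-up headers. The header strings are hypothetical and do not come from Housing.csv; only the chain of `str` calls mirrors the notebook.

```python
import pandas as pd

# Hypothetical messy headers, invented for illustration only.
raw = pd.Index([' Price ', 'Area (sq ft)', 'main road'])

cleaned = (
    raw.str.strip()                                    # drop leading/trailing whitespace
       .str.replace(' ', '_', regex=False)             # inner spaces -> underscores
       .str.replace('[^A-Za-z0-9_]', '', regex=True)   # drop any remaining punctuation
)

print(list(cleaned))  # ['Price', 'Area_sq_ft', 'main_road']
```

The order matters: stripping first avoids leading or trailing underscores, and the final regex only removes characters that are neither alphanumeric nor an underscore.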

In [4]: df.head()

Out[4]:
      price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
0  13300000  7420         4          2        3       yes         no        no               no              yes        2       yes         furnished
1  12250000  8960         4          4        4       yes         no        no               no              yes        3        no         furnished
2  12250000  9960         3          2        2       yes         no       yes               no               no        2       yes    semi-furnished
3  12215000  7500         4          2        2       yes         no       yes               no              yes        3       yes         furnished
4  11410000  7420         4          1        2       yes        yes       yes               no              yes        2        no         furnished

In [5]: df.tail()

Out[5]:
       price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
540  1820000  3000         2          1        1       yes         no       yes               no               no        2        no       unfurnished
541  1767150  2400         3          1        1        no         no        no               no               no        0        no    semi-furnished
542  1750000  3620         2          1        1       yes         no        no               no               no        0        no       unfurnished
543  1750000  2910         3          1        1        no         no        no               no               no        0        no         furnished
544  1750000  3850         3          1        2       yes         no        no               no               no        0        no       unfurnished

In [6]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   price             545 non-null    int64
 1   area              545 non-null    int64
 2   bedrooms          545 non-null    int64
 3   bathrooms         545 non-null    int64
 4   stories           545 non-null    int64
 5   mainroad          545 non-null    object
 6   guestroom         545 non-null    object
 7   basement          545 non-null    object
 8   hotwaterheating   545 non-null    object
 9   airconditioning   545 non-null    object
 10  parking           545 non-null    int64
 11  prefarea          545 non-null    object
 12  furnishingstatus  545 non-null    object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB

In [7]: df.describe()

Out[7]:
              price          area    bedrooms   bathrooms     stories     parking
count  5.450000e+02    545.000000  545.000000  545.000000  545.000000  545.000000
mean   4.766729e+06   5150.541284    2.965138    1.286239    1.805505    0.693578
std    1.870440e+06   2170.141023    0.738064    0.502470    0.867492    0.861586
min    1.750000e+06   1650.000000    1.000000    1.000000    1.000000    0.000000
25%    3.430000e+06   3600.000000    2.000000    1.000000    1.000000    0.000000
50%    4.340000e+06   4600.000000    3.000000    1.000000    2.000000    0.000000
75%    5.740000e+06   6360.000000    3.000000    2.000000    2.000000    1.000000
max    1.330000e+07  16200.000000    6.000000    4.000000    4.000000    3.000000

In [8]: df.shape

Out[8]: (545, 13)

localhost:8888/notebooks/BE_PRACTICALS/DMV_3.ipynb 1/2

In [9]: df.columns

Out[9]: Index(['price', 'area', 'bedrooms', 'bathrooms', 'stories', 'mainroad',
               'guestroom', 'basement', 'hotwaterheating', 'airconditioning',
               'parking', 'prefarea', 'furnishingstatus'],
              dtype='object')

In [10]: df.isnull().sum()

Out[10]: price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64

In [16]: Categorical_Column = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea', 'furnishingstatus']

In [19]: filtered_data = df[df['price'] > 100000]
print("Filtered data: ", filtered_data.head())

Filtered data:        price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  \
0  13300000  7420         4          2        3       yes         no        no
1  12250000  8960         4          4        4       yes         no        no
2  12250000  9960         3          2        2       yes         no       yes
3  12215000  7500         4          2        2       yes         no       yes
4  11410000  7420         4          1        2       yes        yes       yes

   hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
0               no              yes        2       yes         furnished
1               no              yes        3        no         furnished
2               no               no        2       yes    semi-furnished
3               no              yes        3       yes         furnished
4               no              yes        2        no         furnished
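A note on the threshold in cell 19: `df.describe()` reports a minimum price of 1.75e6, so `price > 100000` appears to keep every row. The sketch below illustrates this with a handful of prices copied from the head and tail outputs above; it is a small sample, not the full dataset, and the names `sample`, `mask` and `filtered` are introduced here for illustration.

```python
import pandas as pd

# A few prices copied from the df.head()/df.tail() outputs, not the full file.
sample = pd.DataFrame({'price': [13300000, 12250000, 1767150, 1750000]})

mask = sample['price'] > 100000   # boolean Series, one entry per row
filtered = sample[mask]           # keeps only the rows where mask is True

print(mask.all())                     # True: every sampled row passes the filter
print(len(filtered) == len(sample))   # True: nothing was dropped
```

Comparing `len(filtered_data)` with `len(df)` in the notebook itself would confirm whether the filter removes anything on the real data.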

In [21]: categorical_cols = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea', 'furnishingstatus']
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)
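As a rough illustration of what `drop_first=True` does in cell 21, the sketch below encodes a three-row toy frame. The rows are made up; only the column names and category labels are taken from the notebook output.

```python
import pandas as pd

# Toy rows for illustration; labels match those seen in df.head()/df.tail().
toy = pd.DataFrame({
    'mainroad': ['yes', 'no', 'yes'],
    'furnishingstatus': ['furnished', 'semi-furnished', 'unfurnished'],
})

encoded = pd.get_dummies(
    toy,
    columns=['mainroad', 'furnishingstatus'],
    drop_first=True,   # drop the first (alphabetically lowest) level of each column
)

print(list(encoded.columns))
# ['mainroad_yes', 'furnishingstatus_semi-furnished', 'furnishingstatus_unfurnished']
```

The dropped levels (`no` and `furnished`) become the implicit baseline: a row with all indicator columns for a feature set to false belongs to that baseline category. This avoids perfectly collinear indicator columns.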

In [23]: Q1 = df['price'].quantile(0.25)
Q3 = df['price'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

data_no_outliers = df[(df['price'] >= lower_bound) & (df['price'] <= upper_bound)]
print("Data after removing outliers:\n", data_no_outliers.describe())

Data after removing outliers:
              price          area    bedrooms   bathrooms     stories  \
count  5.300000e+02    530.000000  530.000000  530.000000  530.000000
mean   4.600663e+06   5061.518868    2.943396    1.260377    1.788679
std    1.596119e+06   2075.449479    0.730515    0.464359    0.861190
min    1.750000e+06   1650.000000    1.000000    1.000000    1.000000
25%    3.430000e+06   3547.500000    2.000000    1.000000    1.000000
50%    4.270000e+06   4500.000000    3.000000    1.000000    2.000000
75%    5.600000e+06   6315.750000    3.000000    1.000000    2.000000
max    9.100000e+06  15600.000000    6.000000    3.000000    4.000000

          parking
count  530.000000
mean     0.664151
std      0.843320
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      3.000000
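The IQR bounds from cell 23 can be checked by hand. The sketch below plugs in the price quartiles as printed by `df.describe()` (25% = 3.43e6, 75% = 5.74e6) and applies the resulting fence to a few prices copied from the outputs above. It assumes the printed quartiles are exact; the variable names are illustrative.

```python
import pandas as pd

# Price quartiles as printed by df.describe() before outlier removal.
q1, q3 = 3_430_000, 5_740_000

iqr = q3 - q1             # interquartile range
lower = q1 - 1.5 * iqr    # lower fence
upper = q3 + 1.5 * iqr    # upper fence
print(iqr, lower, upper)  # 2310000 -35000.0 9205000.0

# A few prices taken from the notebook outputs (not the full dataset).
prices = pd.Series([13300000, 12250000, 9100000, 1750000])
kept = prices[(prices >= lower) & (prices <= upper)]
print(kept.tolist())      # [9100000, 1750000]
```

The lower fence is negative, so only high prices can be flagged. This is consistent with the output above: the row count drops from 545 to 530 and the maximum price falls from 1.33e7 to 9.1e6, just under the upper fence of 9,205,000.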
