
Copy - of - Descriptive - EDA - Munjal - Exercise1.ipynb - Colaboratory

The document analyzes a housing dataset with 1460 rows and 81 columns. It checks the shape of the data, visualizes the first 5 records, lists the column names, describes numerical columns, checks the datatype of each column, and performs a missing values analysis. It finds the dataset has 259 missing values in the LotFrontage column and 1369 missing in the Alley column.

21/11/2023, 00:00 Copy_of_Descriptive_EDA_Munjal_exercise1.ipynb - Colaboratory

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Read the dataset from a local file

housing_dataset = pd.read_csv("housing_dataset.csv")

# from google.colab import drive


# drive.mount('/content/drive')

Check the number of rows and columns in the data

housing_dataset.shape

(1460, 81)

Visualize the first 5 records of the dataset

housing_dataset.head()

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities ..

0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub .

1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub .

2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub .

3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub .

4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub .

5 rows × 81 columns

Check the column names of the dataset

housing_dataset.columns

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',


'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',

https://colab.research.google.com/drive/1nHXtKRleY5CVYi8iMwDKGDTnWVeiP6jL#scrollTo=SWU1K04iQody&printMode=true 1/30
'SaleCondition', 'SalePrice'],
dtype='object')

Describe the numerical columns in data

housing_dataset.describe()

Id MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt YearRem

count 1460.000000 1460.000000 1201.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.

mean 730.500000 56.897260 70.049958 10516.828082 6.099315 5.575342 1971.267808 1984.

std 421.610009 42.300571 24.284752 9981.264932 1.382997 1.112799 30.202904 20.

min 1.000000 20.000000 21.000000 1300.000000 1.000000 1.000000 1872.000000 1950.

25% 365.750000 20.000000 59.000000 7553.500000 5.000000 5.000000 1954.000000 1967.

50% 730.500000 50.000000 69.000000 9478.500000 6.000000 5.000000 1973.000000 1994.

75% 1095.250000 70.000000 80.000000 11601.500000 7.000000 6.000000 2000.000000 2004.

max 1460.000000 190.000000 313.000000 215245.000000 10.000000 9.000000 2010.000000 2010.

8 rows × 38 columns
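In the summary above, the count for LotFrontage is 1201 while the other columns show 1460, because describe() counts only non-null values. A minimal sketch of that behaviour on a small made-up frame (the name `toy` and its values are illustrative assumptions, not rows from the housing data):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one missing entry in LotFrontage.
toy = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, np.nan, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})

summary = toy.describe()

# The "count" row reports non-null values only, so it differs between columns.
lotfrontage_count = summary.loc["count", "LotFrontage"]
lotarea_count = summary.loc["count", "LotArea"]
print(lotfrontage_count, lotarea_count)
```

A count below the number of rows is therefore an early hint that a numerical column has missing values.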

Checking datatype of each column

housing_dataset.dtypes

Id int64
MSSubClass int64
MSZoning object
LotFrontage float64
LotArea int64
...
MoSold int64
YrSold int64
SaleType object
SaleCondition object
SalePrice int64
Length: 81, dtype: object
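When the dtypes listing is long, it can help to group the column names by kind instead of reading them one by one. A rough sketch using select_dtypes on a made-up frame (the frame `toy` and its values are assumptions for illustration only):

```python
import pandas as pd

# Toy frame (hypothetical values) mixing integer, text and float columns.
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "MSZoning": ["RL", "RL", "RM"],
    "LotFrontage": [65.0, 80.0, 68.0],
})

# Numerical columns (int64 and float64 here).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything that is not numerical, i.e. the text columns in this frame.
text_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols, text_cols)
```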

Tip - use pd.set_option to display the full data, either rows or columns

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
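pd.set_option changes the display limits for the rest of the session. If the wider display is only needed for one cell, pd.option_context is a possible alternative; the sketch below assumes a made-up wide frame named `wide`:

```python
import pandas as pd

# Hypothetical frame with more columns than the default display shows.
wide = pd.DataFrame({f"col{i}": [i] for i in range(30)})

before = pd.get_option("display.max_columns")

# Raise the column limit only inside the with-block.
with pd.option_context("display.max_columns", 500):
    inside = pd.get_option("display.max_columns")
    print(wide)

# On exit the previous setting is restored.
after = pd.get_option("display.max_columns")
```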

housing_dataset.dtypes # Now dtypes does not display ...

Id int64
MSSubClass int64
MSZoning object
LotFrontage float64
LotArea int64
Street object
Alley object
LotShape object
LandContour object
Utilities object
LotConfig object
LandSlope object
Neighborhood object

Condition1 object
Condition2 object
BldgType object
HouseStyle object
OverallQual int64
OverallCond int64
YearBuilt int64
YearRemodAdd int64
RoofStyle object
RoofMatl object
Exterior1st object
Exterior2nd object
MasVnrType object
MasVnrArea float64
ExterQual object
ExterCond object
Foundation object
BsmtQual object
BsmtCond object
BsmtExposure object
BsmtFinType1 object
BsmtFinSF1 int64
BsmtFinType2 object
BsmtFinSF2 int64
BsmtUnfSF int64
TotalBsmtSF int64
Heating object
HeatingQC object
CentralAir object
Electrical object
1stFlrSF int64
2ndFlrSF int64
LowQualFinSF int64
GrLivArea int64
BsmtFullBath int64
BsmtHalfBath int64
FullBath int64
HalfBath int64
BedroomAbvGr int64
KitchenAbvGr int64
KitchenQual object
TotRmsAbvGrd int64
Functional object
Fireplaces int64
FireplaceQu object

housing_dataset.head() # Now head does not display ... in columns

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape Land

0 1 60 RL 65.0 8450 Pave NaN Reg

1 2 20 RL 80.0 9600 Pave NaN Reg

2 3 60 RL 68.0 11250 Pave NaN IR1

3 4 70 RL 60.0 9550 Pave NaN IR1

4 5 60 RL 84.0 14260 Pave NaN IR1

Missing values analysis

Check the number of missing values in each column

pd.DataFrame(housing_dataset.isnull().sum())


Id 0

MSSubClass 0

MSZoning 0

LotFrontage 259

LotArea 0

Street 0

Alley 1369

LotShape 0

LandContour 0

Utilities 0

LotConfig 0

LandSlope 0

Neighborhood 0

Condition1 0

Condition2 0

BldgType 0

HouseStyle 0

OverallQual 0

OverallCond 0

YearBuilt 0

YearRemodAdd 0

RoofStyle 0

RoofMatl 0

Exterior1st 0

Exterior2nd 0

MasVnrType 8

MasVnrArea 8

ExterQual 0

ExterCond 0

Foundation 0

BsmtQual 37

BsmtCond 37

BsmtExposure 38

BsmtFinType1 37

BsmtFinSF1 0

BsmtFinType2 38

BsmtFinSF2 0

BsmtUnfSF 0

TotalBsmtSF 0

Heating 0

HeatingQC 0

CentralAir 0

Electrical 1

1stFlrSF 0

2ndFlrSF 0

LowQualFinSF 0

GrLivArea 0

BsmtFullBath 0

BsmtHalfBath 0

FullBath 0

HalfBath 0

BedroomAbvGr 0

KitchenAbvGr 0

KitchenQual 0

TotRmsAbvGrd 0

Functional 0

Fireplaces 0

FireplaceQu 690

GarageType 81

GarageYrBlt 81

GarageFinish 81

GarageCars 0

GarageArea 0

GarageQual 81

GarageCond 81

PavedDrive 0

WoodDeckSF 0

OpenPorchSF 0

EnclosedPorch 0

3SsnPorch 0

ScreenPorch 0

PoolArea 0

PoolQC 1453

Fence 1179

MiscFeature 1406

MiscVal 0

MoSold 0

YrSold 0

SaleType 0

SaleCondition 0
SalePrice 0

pd.DataFrame(housing_dataset.isnull().sum()/housing_dataset.isnull().count(),columns=['% missing']).sor


% missing

PoolQC 0.995205

MiscFeature 0.963014

Alley 0.937671

Fence 0.807534

FireplaceQu 0.472603

LotFrontage 0.177397

GarageYrBlt 0.055479

GarageCond 0.055479

GarageType 0.055479

GarageFinish 0.055479

GarageQual 0.055479

BsmtFinType2 0.026027

BsmtExposure 0.026027

BsmtQual 0.025342

BsmtCond 0.025342

BsmtFinType1 0.025342

MasVnrArea 0.005479

MasVnrType 0.005479

Electrical 0.000685

Id 0.000000

Functional 0.000000

Fireplaces 0.000000

KitchenQual 0.000000

KitchenAbvGr 0.000000

BedroomAbvGr 0.000000

HalfBath 0.000000

FullBath 0.000000

BsmtHalfBath 0.000000

TotRmsAbvGrd 0.000000

GarageCars 0.000000

GrLivArea 0.000000

GarageArea 0.000000

PavedDrive 0.000000

WoodDeckSF 0.000000

OpenPorchSF 0.000000

EnclosedPorch 0.000000

3SsnPorch 0.000000

ScreenPorch 0.000000

PoolArea 0.000000

MiscVal 0.000000

MoSold 0.000000

YrSold 0.000000

SaleType 0.000000

SaleCondition 0.000000

BsmtFullBath 0.000000

HeatingQC 0.000000

LowQualFinSF 0.000000

LandSlope 0.000000

OverallQual 0.000000

HouseStyle 0.000000

BldgType 0.000000

Condition2 0.000000

Condition1 0.000000

Neighborhood 0.000000

LotConfig 0.000000

YearBuilt 0.000000

Utilities 0.000000

LandContour 0.000000

LotShape 0.000000

Street 0.000000

LotArea 0.000000

MSZoning 0.000000

OverallCond 0.000000

YearRemodAdd 0.000000

2ndFlrSF 0.000000

BsmtFinSF2 0.000000

1stFlrSF 0.000000

CentralAir 0.000000

MSSubClass 0.000000

Heating 0.000000

TotalBsmtSF 0.000000

BsmtUnfSF 0.000000

BsmtFinSF1 0.000000

RoofStyle 0.000000

Foundation 0.000000

ExterCond 0.000000

ExterQual 0.000000

Exterior2nd 0.000000

Exterior1st 0.000000

RoofMatl 0.000000

SalePrice 0.000000
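The table above is ordered from the most to the least incomplete column, and the '% missing' column holds fractions between 0 and 1 rather than percentages. The cell that produced it is cut off in this printout, so the sketch below is only one way such a table could be built, shown on a made-up frame (`toy` and its values are assumptions):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with different amounts of missing data.
toy = pd.DataFrame({
    "Alley": [np.nan, np.nan, "Grvl", np.nan],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})

# isnull().mean() gives the fraction of missing values per column,
# which equals isnull().sum() divided by the number of rows.
missing = (
    toy.isnull().mean()
    .to_frame(name="% missing")
    .sort_values(by="% missing", ascending=False)
)
print(missing)
```

Multiplying the column by 100 would turn the fractions into actual percentages.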
Missing value imputation example using mean


housing_dataset['LotFrontage'].isnull()

0 False
1 False
2 False
3 False
4 False
...
1455 False
1456 False
1457 False
1458 False
1459 False
Name: LotFrontage, Length: 1460, dtype: bool

housing_dataset['LotFrontage'].isnull().sum()

259

housing_dataset['LotFrontage'].shape # Out of 1460 records in LotFrontage, 259 values are null

(1460,)

housing_dataset['LotFrontage'] = housing_dataset['LotFrontage'].fillna(housing_dataset['LotFrontage'].m

housing_dataset['LotFrontage'].isnull().sum() # After imputation by mean , sum of nulls in LotFrontage
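The imputation cell above is truncated in this printout. As a minimal sketch of mean imputation in general, assuming a made-up column with two missing values (the frame `toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical values) with two missing entries.
toy = pd.DataFrame({"LotFrontage": [60.0, np.nan, 80.0, np.nan]})

# mean() skips NaN, so the mean is taken over the observed values only.
mean_value = toy["LotFrontage"].mean()

# Replace every missing entry with that mean and assign the result back.
toy["LotFrontage"] = toy["LotFrontage"].fillna(mean_value)

remaining_nulls = toy["LotFrontage"].isnull().sum()
print(mean_value, remaining_nulls)
```

Mean imputation keeps the column mean unchanged but reduces its spread. This matches the summaries in this notebook, where LotFrontage has mean 70.049958 both times while its std goes from 24.284752 before imputation to 22.024023 after.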

Dropping columns from dataset

# Drop single column - BsmtFinSF1 column from data


housing_dataset = housing_dataset.drop(['BsmtFinSF1'], axis = 1) # Tip - use inplace = True
print(housing_dataset.head())
print(housing_dataset.columns)

# Drop multiple columns - BsmtFinSF2,BsmtUnfSF columns from data


housing_dataset.drop(['BsmtFinSF2','BsmtUnfSF'], axis = 1,inplace = True) # Tip - use inplace = True
print(housing_dataset.head())
print(housing_dataset.columns)

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \


0 1 60 RL 65.0 8450 Pave NaN Reg
1 2 20 RL 80.0 9600 Pave NaN Reg
2 3 60 RL 68.0 11250 Pave NaN IR1
3 4 70 RL 60.0 9550 Pave NaN IR1
4 5 60 RL 84.0 14260 Pave NaN IR1

LandContour Utilities LotConfig LandSlope Neighborhood Condition1 \


0 Lvl AllPub Inside Gtl CollgCr Norm
1 Lvl AllPub FR2 Gtl Veenker Feedr
2 Lvl AllPub Inside Gtl CollgCr Norm
3 Lvl AllPub Corner Gtl Crawfor Norm
4 Lvl AllPub FR2 Gtl NoRidge Norm

Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt \


0 Norm 1Fam 2Story 7 5 2003
1 Norm 1Fam 1Story 6 8 1976
2 Norm 1Fam 2Story 7 5 2001
3 Norm 1Fam 2Story 7 5 1915
4 Norm 1Fam 2Story 8 5 2000

YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType \


0 2003 Gable CompShg VinylSd VinylSd BrkFace
1 1976 Gable CompShg MetalSd MetalSd None
2 2002 Gable CompShg VinylSd VinylSd BrkFace
3 1970 Gable CompShg Wd Sdng Wd Shng None
4 2000 Gable CompShg VinylSd VinylSd BrkFace

MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure \


0 196.0 Gd TA PConc Gd TA No
1 0.0 TA TA CBlock Gd TA Gd
2 162.0 Gd TA PConc Gd TA Mn
3 0.0 TA TA BrkTil TA Gd No
4 350.0 Gd TA PConc Gd TA Av

BsmtFinType1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating \


0 GLQ Unf 0 150 856 GasA
1 ALQ Unf 0 284 1262 GasA
2 GLQ Unf 0 434 920 GasA
3 ALQ Unf 0 540 756 GasA
4 GLQ Unf 0 490 1145 GasA

HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF \


0 Ex Y SBrkr 856 854 0
1 Ex Y SBrkr 1262 0 0
2 Ex Y SBrkr 920 866 0
3 Gd Y SBrkr 961 756 0
4 Ex Y SBrkr 1145 1053 0

GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr \


0 1710 1 0 2 1 3
1 1262 0 1 2 0 3
2 1786 1 0 2 1 3
3 1717 1 0 1 0 3
4 2198 1 0 2 1 4

KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu \


0 1 Gd 8 Typ 0 NaN
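The two drop cells above show the assignment form and the inplace form. A short sketch of the assignment form on a made-up frame, with errors="ignore" added as an optional safeguard for re-running a cell (the frame `toy` and its values are assumptions):

```python
import pandas as pd

# Toy frame (hypothetical values) with the three basement columns.
toy = pd.DataFrame({
    "Id": [1, 2],
    "BsmtFinSF1": [706, 978],
    "BsmtFinSF2": [0, 0],
    "BsmtUnfSF": [150, 284],
})

# drop() returns a new frame; `toy` itself is left unchanged.
without_one = toy.drop(columns=["BsmtFinSF1"])

# errors="ignore" skips labels that are already gone,
# so dropping BsmtFinSF1 a second time does not raise.
without_three = without_one.drop(
    columns=["BsmtFinSF1", "BsmtFinSF2", "BsmtUnfSF"], errors="ignore"
)
print(without_three.columns.tolist())
```

Without errors="ignore", running a drop cell twice raises a KeyError because the column no longer exists.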

Creating new column by combining columns and separating by underscore

housing_dataset['House_Roof_Style'] = housing_dataset['HouseStyle'] + " " + housing_dataset['RoofStyle


housing_dataset.head()

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape Land

0 1 60 RL 65.0 8450 Pave NaN Reg

1 2 20 RL 80.0 9600 Pave NaN Reg

2 3 60 RL 68.0 11250 Pave NaN IR1

3 4 70 RL 60.0 9550 Pave NaN IR1

4 5 60 RL 84.0 14260 Pave NaN IR1
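The heading of this step mentions an underscore, while the visible part of the cell joins the two columns with a space; the rest of that line is cut off. A small sketch of joining two text columns with a chosen separator, assuming an underscore and a made-up frame `toy`:

```python
import pandas as pd

# Toy frame (hypothetical values) with the two style columns.
toy = pd.DataFrame({
    "HouseStyle": ["2Story", "1Story"],
    "RoofStyle": ["Gable", "Hip"],
})

sep = "_"  # assumption: underscore, as the heading describes

# Element-wise string concatenation with the separator in between.
toy["House_Roof_Style"] = toy["HouseStyle"] + sep + toy["RoofStyle"]

# The same result through the string accessor.
alt = toy["HouseStyle"].str.cat(toy["RoofStyle"], sep=sep)
print(toy["House_Roof_Style"].tolist())
```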

Separate and get just the number of floors from column House Style

housing_dataset['House_floors'] = housing_dataset.House_Roof_Style.str.split().str.get(0)
housing_dataset.head()


Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape Land

0 1 60 RL 65.0 8450 Pave NaN Reg

1 2 20 RL 80.0 9600 Pave NaN Reg

2 3 60 RL 68.0 11250 Pave NaN IR1

3 4 70 RL 60.0 9550 Pave NaN IR1

4 5 60 RL 84.0 14260 Pave NaN IR1

housing_dataset['House_Roof_Style'] = housing_dataset.House_Roof_Style.str.split().str.get(0) + housing
housing_dataset.head()

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape Land

0 1 60 RL 65.0 8450 Pave NaN Reg

1 2 20 RL 80.0 9600 Pave NaN Reg

2 3 60 RL 68.0 11250 Pave NaN IR1

3 4 70 RL 60.0 9550 Pave NaN IR1

4 5 60 RL 84.0 14260 Pave NaN IR1
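The str.split().str.get(0) chain used in this step splits each value into pieces and keeps the first one. A minimal sketch on made-up values (both Series below are illustrative assumptions):

```python
import pandas as pd

# Hypothetical combined values separated by a space.
combined = pd.Series(["2Story Gable", "1Story Hip", "1.5Fin Gable"])

# split() with no argument splits on whitespace; get(0) picks the first piece.
house_part = combined.str.split().str.get(0)

# With an underscore separator the delimiter has to be passed explicitly.
underscored = pd.Series(["2Story_Gable", "1Story_Hip"])
first_of_underscored = underscored.str.split("_").str.get(0)

print(house_part.tolist(), first_of_underscored.tolist())
```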

Replacing values in a column with other values

housing_dataset['SaleCondition'].unique()

array(['Normal', 'Abnorml', 'Partial', 'AdjLand', 'Alloca', 'Family'],
      dtype=object)

# searchfor = ['Abnorml','Alloca']
# housing_dataset[housing_dataset.SaleCondition.str.contains('|'.join(searchfor))].head(10)
housing_dataset[housing_dataset.SaleCondition.isin(['Abnorml','Alloca'])].head(10)

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape Lan

3 4 70 RL 60.0 9550 Pave NaN IR1

8 9 50 RM 51.0 6120 Pave NaN Reg

19 20 20 RL 70.0 7560 Pave NaN Reg

38 39 20 RL 68.0 7922 Pave NaN Reg

40 41 20 RL 84.0 8658 Pave NaN Reg

46 47 50 RL 48.0 12822 Pave NaN IR1

56 57 160 FV 24.0 2645 Pave Pave Reg

88 89 50 C (all) 105.0 8470 Pave NaN IR1

91 92 20 RL 85.0 8500 Pave NaN Reg

98 99 30 RL 85.0 10625 Pave NaN Reg
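The cell above filters with isin and keeps a str.contains variant in comments. A small sketch comparing the two on made-up rows (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

# Toy frame (hypothetical rows) with four sale conditions.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "SaleCondition": ["Normal", "Abnorml", "Alloca", "Partial"],
})

# isin keeps rows whose value equals one of the listed values exactly.
exact = toy[toy["SaleCondition"].isin(["Abnorml", "Alloca"])]

# str.contains treats the joined string as a regex and matches substrings,
# so it can also select values that merely contain one of the terms.
searchfor = ["Abnorml", "Alloca"]
by_pattern = toy[toy["SaleCondition"].str.contains("|".join(searchfor))]

print(exact["Id"].tolist(), by_pattern["Id"].tolist())
```

For whole-category matches like these, isin is the stricter choice because it never matches partial strings.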

housing_dataset["SaleCondition"].replace({"Abnorml": "Abnormal", "Alloca": "Allocated","Partial":"Prtl"


housing_dataset['SaleCondition'].unique()
housing_dataset[housing_dataset.SaleCondition.isin(['Abnormal','Allocated',"Prtl"])]
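The replacement cell in this step is cut off at the page edge, so its remaining arguments are not visible. As a sketch only, the same mapping can be applied to a made-up column and assigned back (the frame `toy` is hypothetical):

```python
import pandas as pd

# Toy column (hypothetical rows) with the original category labels.
toy = pd.DataFrame({"SaleCondition": ["Normal", "Abnorml", "Alloca", "Partial"]})

# Old label -> new label, taken from the visible part of the cell.
mapping = {"Abnorml": "Abnormal", "Alloca": "Allocated", "Partial": "Prtl"}

# Assigning the result back avoids relying on inplace=True on a selected
# column, which newer pandas versions may not propagate to the parent frame.
toy["SaleCondition"] = toy["SaleCondition"].replace(mapping)

print(toy["SaleCondition"].unique())
```

Values that are not keys of the mapping, such as Normal, are left as they are.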


Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape

3 4 70 RL 60.000000 9550 Pave NaN IR1

8 9 50 RM 51.000000 6120 Pave NaN Reg

11 12 60 RL 85.000000 11924 Pave NaN IR1

13 14 20 RL 91.000000 10652 Pave NaN IR1

19 20 20 RL 70.000000 7560 Pave NaN Reg

20 21 60 RL 101.000000 14215 Pave NaN IR1

38 39 20 RL 68.000000 7922 Pave NaN Reg

40 41 20 RL 84.000000 8658 Pave NaN Reg

46 47 50 RL 48.000000 12822 Pave NaN IR1

48 49 190 RM 33.000000 4456 Pave NaN Reg

56 57 160 FV 24.000000 2645 Pave Pave Reg

58 59 60 RL 66.000000 13682 Pave NaN IR2

60 61 20 RL 63.000000 13072 Pave NaN Reg

87 88 160 FV 40.000000 3951 Pave Pave Reg

88 89 50 C (all) 105.000000 8470 Pave NaN IR1

91 92 20 RL 85.000000 8500 Pave NaN Reg

98 99 30 RL 85.000000 10625 Pave NaN Reg

102 103 90 RL 64.000000 7018 Pave NaN Reg

107 108 20 RM 50.000000 6000 Pave NaN Reg

112 113 60 RL 77.000000 9965 Pave NaN Reg

113 114 20 RL 70.049958 21000 Pave NaN Reg

117 118 20 RL 74.000000 8536 Pave NaN Reg

119 120 60 RL 65.000000 8461 Pave NaN Reg

129 130 20 RL 69.000000 8973 Pave NaN Reg

144 145 90 RM 70.000000 9100 Pave NaN Reg

151 152 20 RL 107.000000 13891 Pave NaN Reg

157 158 60 RL 92.000000 12003 Pave NaN Reg

159 160 60 RL 134.000000 19378 Pave NaN IR1

162 163 20 RL 95.000000 12182 Pave NaN Reg

167 168 60 RL 86.000000 10562 Pave NaN Reg

178 179 20 RL 63.000000 17423 Pave NaN IR1

188 189 90 RL 64.000000 7018 Pave NaN Reg

196 197 20 RL 79.000000 9416 Pave NaN Reg

197 198 75 RL 174.000000 25419 Pave NaN Reg

198 199 75 RM 92.000000 5520 Pave NaN Reg

212 213 60 FV 72.000000 8640 Pave NaN Reg

219 220 120 RL 43.000000 3010 Pave NaN Reg

220 221 20 RL 73.000000 8990 Pave NaN IR1

223 224 20 RL 70.000000 10500 Pave NaN Reg

225 226 160 RM 21.000000 1680 Pave NaN Reg

226 227 60 RL 82.000000 9950 Pave NaN IR1

238 239 20 RL 93.000000 12030 Pave NaN Reg

257 258 20 RL 68.000000 8814 Pave NaN Reg

261 262 60 RL 69.000000 9588 Pave NaN IR1

270 271 60 FV 84.000000 10728 Pave NaN Reg

278 279 20 RL 107.000000 14450 Pave NaN Reg

281 282 20 FV 60.000000 7200 Pave Pave Reg

283 284 20 RL 74.000000 9612 Pave NaN Reg

285 286 160 FV 35.000000 4251 Pave Pave IR1

290 291 60 RL 120.000000 15611 Pave NaN Reg

303 304 20 RL 70.000000 9800 Pave NaN Reg

320 321 60 RL 111.000000 16259 Pave NaN Reg

349 350 60 RL 56.000000 20431 Pave NaN IR2

350 351 120 RL 68.000000 7820 Pave NaN IR1

351 352 120 RL 70.049958 5271 Pave NaN IR1

358 359 80 RL 92.000000 6930 Pave NaN IR1

378 379 20 RL 88.000000 11394 Pave NaN Reg

381 382 20 FV 60.000000 7200 Pave Pave Reg

387 388 80 RL 72.000000 7200 Pave NaN Reg

389 390 60 RL 96.000000 12474 Pave NaN Reg

393 394 30 RL 70.049958 7446 Pave NaN Reg

398 399 30 RM 60.000000 8967 Pave NaN Reg

401 402 20 RL 65.000000 8767 Pave NaN IR1

403 404 60 RL 93.000000 12090 Pave NaN Reg

408 409 60 RL 109.000000 14154 Pave NaN Reg

409 410 60 FV 85.000000 10800 Pave NaN Reg

410 411 20 RL 68.000000 9571 Pave NaN Reg

412 413 20 FV 70.049958 4403 Pave NaN IR2

415 416 20 RL 73.000000 8899 Pave NaN IR1

420 421 90 RM 78.000000 7060 Pave NaN Reg

428 429 20 RL 64.000000 6762 Pave NaN Reg

430 431 160 RM 21.000000 1680 Pave NaN Reg

431 432 50 RM 60.000000 5586 Pave NaN IR1

443 444 120 RL 53.000000 3922 Pave NaN Reg

456 457 70 RM 34.000000 4571 Pave Grvl Reg

460 461 60 FV 75.000000 8004 Pave NaN IR1

473 474 20 RL 110.000000 14977 Pave NaN IR1

479 480 30 RM 50.000000 5925 Pave NaN Reg

492 493 60 RL 105.000000 15578 Pave NaN IR1

495 496 30 C (all) 60.000000 7879 Pave NaN Reg

507 508 20 FV 75.000000 7862 Pave NaN IR1

511 512 120 RL 40.000000 6792 Pave NaN IR1

515 516 20 RL 94.000000 12220 Pave NaN Reg

516 517 80 RL 70.049958 10448 Pave NaN IR1

523 524 60 RL 130.000000 40094 Pave NaN IR1

527 528 60 RL 67.000000 14948 Pave NaN IR1

529 530 20 RL 70.049958 32668 Pave NaN IR1

530 531 80 RL 85.000000 10200 Pave NaN Reg

544 545 60 RL 58.000000 17104 Pave NaN IR1

550 551 120 RL 53.000000 4043 Pave NaN Reg

571 572 20 RL 60.000000 7332 Pave NaN Reg

572 573 60 RL 83.000000 13159 Pave NaN IR1

575 576 50 RL 80.000000 8480 Pave NaN Reg

577 578 80 RL 96.000000 11777 Pave NaN IR1

578 579 160 FV 34.000000 3604 Pave Pave Reg

581 582 20 RL 98.000000 12704 Pave NaN Reg

585 586 20 RL 88.000000 11443 Pave NaN Reg

588 589 20 RL 65.000000 25095 Pave NaN IR1

595 596 20 RL 69.000000 11302 Pave NaN IR1

597 598 120 RL 53.000000 3922 Pave NaN Reg

602 603 60 RL 80.000000 10041 Pave NaN IR1

608 609 70 RL 78.000000 12168 Pave NaN Reg

613 614 20 RL 70.000000 8402 Pave NaN Reg

615 616 85 RL 80.000000 8800 Pave NaN Reg

618 619 20 RL 90.000000 11694 Pave NaN Reg

630 631 70 RM 50.000000 9000 Pave Grvl Reg

635 636 190 RH 60.000000 10896 Pave Pave Reg

639 640 120 RL 53.000000 3982 Pave NaN Reg

644 645 20 FV 85.000000 9187 Pave NaN Reg

658 659 50 RL 78.000000 17503 Pave NaN Reg

664 665 20 RL 49.000000 20896 Pave NaN IR2

666 667 60 RL 70.049958 18450 Pave NaN IR1

678 679 20 RL 80.000000 11844 Pave NaN IR1

681 682 50 RH 55.000000 4500 Pave Pave IR2

686 687 60 FV 84.000000 10207 Pave NaN Reg

688 689 20 RL 60.000000 8089 Pave NaN Reg

693 694 30 RL 60.000000 5400 Pave NaN Reg

702 703 60 RL 82.000000 12438 Pave NaN IR1

708 709 60 RL 65.000000 9018 Pave NaN IR1

709 710 20 RL 70.049958 7162 Pave NaN IR1

711 712 50 C (all) 66.000000 8712 Pave Pave Reg

728 729 90 RL 85.000000 11475 Pave NaN Reg

738 739 90 RL 60.000000 10800 Pave NaN Reg

740 741 70 RM 60.000000 9600 Pave Grvl Reg

757 758 60 RL 70.049958 11616 Pave NaN IR1

765 766 20 RL 75.000000 14587 Pave NaN IR1

772 773 80 RL 94.000000 7819 Pave NaN Reg

774 775 20 RL 110.000000 14226 Pave NaN Reg

776 777 20 RL 86.000000 11210 Pave NaN IR1

793 794 20 RL 76.000000 9158 Pave NaN Reg

797 798 20 RL 57.000000 7677 Pave NaN Reg

798 799 60 RL 104.000000 13518 Pave NaN Reg

803 804 60 RL 107.000000 13891 Pave NaN Reg

805 806 20 RL 91.000000 12274 Pave NaN IR1

812 813 20 C (all) 66.000000 8712 Grvl NaN Reg

819 820 120 RL 44.000000 6371 Pave NaN IR1

824 825 20 FV 81.000000 11216 Pave NaN Reg

825 826 20 RL 114.000000 14803 Pave NaN Reg

828 829 60 RL 70.049958 28698 Pave NaN IR2

854 855 20 RL 102.000000 17920 Pave NaN Reg

864 865 20 FV 72.000000 8640 Pave NaN Reg

866 867 20 RL 67.000000 10656 Pave NaN IR1

874 875 50 RM 52.000000 5720 Pave NaN Reg

875 876 60 FV 75.000000 9000 Pave NaN Reg

885 886 120 FV 50.000000 5119 Pave NaN IR1

894 895 90 RL 64.000000 7018 Pave NaN Reg

896 897 30 RM 50.000000 8765 Pave Grvl Reg

897 898 90 RL 64.000000 7018 Pave NaN Reg

898 899 20 RL 100.000000 12919 Pave NaN IR1

903 904 20 RL 50.000000 14859 Pave NaN IR1

912 913 30 RM 51.000000 6120 Pave NaN Reg

914 915 160 FV 30.000000 3000 Pave Pave Reg

916 917 20 C (all) 50.000000 9000 Pave NaN Reg

922 923 20 RL 65.000000 10237 Pave NaN Reg

925 926 20 RL 70.049958 15611 Pave NaN IR1

938 939 60 RL 73.000000 8760 Pave NaN Reg

942 943 90 RL 42.000000 7711 Pave NaN IR1

944 945 20 RL 70.049958 14375 Pave NaN IR1

951 952 20 RH 60.000000 7800 Pave NaN Reg

965 966 60 RL 65.000000 10237 Pave NaN Reg

968 969 50 RM 50.000000 5925 Pave NaN Reg

970 971 50 RL 60.000000 10800 Pave NaN Reg

973 974 20 FV 95.000000 11639 Pave NaN Reg

977 978 120 FV 35.000000 4274 Pave Pave IR1

978 979 20 RL 68.000000 9450 Pave NaN Reg

987 988 20 RL 83.000000 10159 Pave NaN IR1


989 990 60 FV 65.000000 8125 Pave NaN Reg

993 994 60 RL 68.000000 8846 Pave NaN Reg

995 996 50 RL 51.000000 4712 Pave NaN IR1

1001 1002 30 RL 60.000000 5400 Pave NaN Reg

1017 1018 120 RL 70.049958 5814 Pave NaN IR1

1021 1022 20 RL 64.000000 7406 Pave NaN Reg

1024 1025 20 RL 70.049958 15498 Pave NaN IR1

1027 1028 20 RL 71.000000 9520 Pave NaN IR1

1032 1033 60 RL 70.049958 14541 Pave NaN IR1

1046 1047 60 RL 85.000000 16056 Pave NaN IR1

1049 1050 20 RL 60.000000 11100 Pave NaN Reg

1050 1051 20 RL 73.000000 8993 Pave NaN IR1

1051 1052 20 RL 103.000000 11175 Pave NaN IR1

1055 1056 20 RL 104.000000 11361 Pave NaN Reg

1077 1078 20 RL 70.049958 15870 Pave NaN IR1

1080 1081 20 RL 80.000000 11040 Pave NaN Reg

1099 1100 20 RL 82.000000 11880 Pave NaN IR1

1107 1108 60 RL 168.000000 23257 Pave NaN IR3

1108 1109 60 RL 70.049958 8063 Pave NaN Reg

1115 1116 20 RL 93.000000 12085 Pave NaN Reg

1121 1122 20 RL 84.000000 10084 Pave NaN Reg

1122 1123 20 RL 70.049958 8926 Pave NaN IR1

1131 1132 20 RL 63.000000 10712 Pave NaN Reg

1136 1137 50 RL 80.000000 9600 Pave NaN Reg

1140 1141 20 RL 60.000000 7350 Pave NaN Reg

1142 1143 60 RL 77.000000 9965 Pave NaN Reg

1152 1153 20 RL 90.000000 14115 Pave NaN IR1

1158 1159 20 RL 92.000000 11932 Pave NaN Reg

1163 1164 90 RL 60.000000 12900 Pave NaN Reg

1165 1166 20 RL 79.000000 9541 Pave NaN IR1

1181 1182 120 RM 64.000000 5587 Pave NaN IR1

1182 1183 60 RL 160.000000 15623 Pave NaN IR1

1186 1187 190 RL 107.000000 10615 Pave NaN IR1

1196 1197 60 RL 58.000000 14054 Pave NaN IR1

1200 1201 20 RL 71.000000 9353 Pave NaN Reg

1209 1210 20 RL 85.000000 10182 Pave NaN IR1

1217 1218 20 FV 72.000000 8640 Pave NaN Reg

1219 1220 160 RM 21.000000 1680 Pave NaN Reg

1220 1221 20 RL 66.000000 7800 Pave NaN IR1

1228 1229 120 RL 65.000000 8769 Pave NaN Reg

1233 1234 20 RL 70.049958 12160 Pave NaN IR1

1234 1235 70 RH 55.000000 8525 Pave NaN Reg


1238 1239 20 RL 63.000000 13072 Pave NaN Reg

1241 1242 20 RL 83.000000 9849 Pave NaN Reg

1243 1244 20 RL 107.000000 13891 Pave NaN Reg

1245 1246 80 RL 78.000000 12090 Pave NaN Reg

1246 1247 60 FV 65.000000 8125 Pave NaN Reg

1264 1265 120 RH 34.000000 4060 Pave NaN Reg

1279 1280 50 C (all) 60.000000 7500 Pave NaN Reg

1289 1290 60 RL 86.000000 11065 Pave NaN IR1

1297 1298 180 RM 35.000000 3675 Pave NaN Reg

1298 1299 60 RL 313.000000 63887 Pave NaN IR3

1306 1307 120 RL 48.000000 6955 Pave NaN IR1

1311 1312 20 RL 68.000000 8814 Pave NaN Reg

1317 1318 120 FV 47.000000 4230 Pave Pave Reg

1324 1325 20 RL 75.000000 9986 Pave NaN Reg

1344 1345 60 RL 85.000000 11103 Pave NaN IR1

1347 1348 20 RL 93.000000 15306 Pave NaN IR1

1363 1364 60 RL 73.000000 8499 Pave NaN IR1

1364 1365 160 FV 30.000000 3180 Pave Pave Reg

1366 1367 60 RL 68.000000 9179 Pave NaN IR1

1375 1376 20 RL 89.000000 10991 Pave NaN IR1

1394 1395 120 RL 53.000000 4045 Pave NaN Reg

1402 1403 20 RL 64.000000 6762 Pave NaN Reg

1413 1414 20 RL 88.000000 10994 Pave NaN IR1

1423 1424 80 RL 70.049958 19690 Pave NaN IR1

1428 1429 30 RM 60.000000 7200 Pave NaN Reg

1435 1436 20 RL 80.000000 8400 Pave NaN Reg

1437 1438 20 RL 96.000000 12444 Pave NaN Reg

1449 1450 180 RM 21.000000 1533 Pave NaN Reg

1451 1452 20 RL 78.000000 9262 Pave NaN Reg

1453 1454 20 RL 90.000000 17217 Pave NaN Reg

Statistical summary

housing_dataset.describe().T # T is for transpose


count mean std min 25% 50%

Id 1460.0 730.500000 421.610009 1.0 365.75 730.500000

MSSubClass 1460.0 56.897260 42.300571 20.0 20.00 50.000000

LotFrontage 1460.0 70.049958 22.024023 21.0 60.00 70.049958

LotArea 1460.0 10516.828082 9981.264932 1300.0 7553.50 9478.500000

OverallQual 1460.0 6.099315 1.382997 1.0 5.00 6.000000

OverallCond 1460.0 5.575342 1.112799 1.0 5.00 5.000000

YearBuilt 1460.0 1971.267808 30.202904 1872.0 1954.00 1973.000000

YearRemodAdd 1460.0 1984.865753 20.645407 1950.0 1967.00 1994.000000

MasVnrArea 1452.0 103.685262 181.066207 0.0 0.00 0.000000

TotalBsmtSF 1460.0 1057.429452 438.705324 0.0 795.75 991.500000

1stFlrSF 1460.0 1162.626712 386.587738 334.0 882.00 1087.000000

2ndFlrSF 1460.0 346.992466 436.528436 0.0 0.00 0.000000

LowQualFinSF 1460.0 5.844521 48.623081 0.0 0.00 0.000000

GrLivArea 1460.0 1515.463699 525.480383 334.0 1129.50 1464.000000

BsmtFullBath 1460.0 0.425342 0.518911 0.0 0.00 0.000000

BsmtHalfBath 1460.0 0.057534 0.238753 0.0 0.00 0.000000

FullBath 1460.0 1.565068 0.550916 0.0 1.00 2.000000

HalfBath 1460.0 0.382877 0.502885 0.0 0.00 0.000000

BedroomAbvGr 1460.0 2.866438 0.815778 0.0 2.00 3.000000

KitchenAbvGr 1460.0 1.046575 0.220338 0.0 1.00 1.000000

TotRmsAbvGrd 1460.0 6.517808 1.625393 2.0 5.00 6.000000

Fireplaces 1460.0 0.613014 0.644666 0.0 0.00 1.000000

GarageYrBlt 1379.0 1978.506164 24.689725 1900.0 1961.00 1980.000000

GarageCars 1460.0 1.767123 0.747315 0.0 1.00 2.000000

GarageArea 1460.0 472.980137 213.804841 0.0 334.50 480.000000

WoodDeckSF 1460.0 94.244521 125.338794 0.0 0.00 0.000000

OpenPorchSF 1460.0 46.660274 66.256028 0.0 0.00 25.000000

EnclosedPorch 1460.0 21.954110 61.119149 0.0 0.00 0.000000

3SsnPorch 1460.0 3.409589 29.317331 0.0 0.00 0.000000

ScreenPorch 1460.0 15.060959 55.757415 0.0 0.00 0.000000

PoolArea 1460.0 2.758904 40.177307 0.0 0.00 0.000000

MiscVal 1460.0 43.489041 496.123024 0.0 0.00 0.000000

MoSold 1460.0 6.321918 2.703626 1.0 5.00 6.000000

YrSold 1460.0 2007.815753 1.328095 2006.0 2007.00 2008.000000

SalePrice 1460.0 180921.195890 79442.502883 34900.0 129975.00 163000.000000

Statistical summary for categorical columns too

housing_dataset.describe(include='all').T

count unique top freq mean std

Id 1460.0 NaN NaN NaN 730.5 421.610009

MSSubClass 1460.0 NaN NaN NaN 56.89726 42.300571

MSZoning 1460 5 RL 1151 NaN NaN

LotFrontage 1460.0 NaN NaN NaN 70.049958 22.024023

LotArea 1460.0 NaN NaN NaN 10516.828082 9981.264932

Street 1460 2 Pave 1454 NaN NaN

Alley 91 2 Grvl 50 NaN NaN

LotShape 1460 4 Reg 925 NaN NaN

LandContour 1460 4 Lvl 1311 NaN NaN

Utilities 1460 2 AllPub 1459 NaN NaN

LotConfig 1460 5 Inside 1052 NaN NaN

LandSlope 1460 3 Gtl 1382 NaN NaN

Neighborhood 1460 25 NAmes 225 NaN NaN

Condition1 1460 9 Norm 1260 NaN NaN

Condition2 1460 8 Norm 1445 NaN NaN

BldgType 1460 5 1Fam 1220 NaN NaN

HouseStyle 1460 8 1Story 726 NaN NaN

OverallQual 1460.0 NaN NaN NaN 6.099315 1.382997

OverallCond 1460.0 NaN NaN NaN 5.575342 1.112799

YearBuilt 1460.0 NaN NaN NaN 1971.267808 30.202904

YearRemodAdd 1460.0 NaN NaN NaN 1984.865753 20.645407

RoofStyle 1460 6 Gable 1141 NaN NaN

RoofMatl 1460 8 CompShg 1434 NaN NaN

Exterior1st 1460 15 VinylSd 515 NaN NaN

Exterior2nd 1460 16 VinylSd 504 NaN NaN

MasVnrType 1452 4 None 864 NaN NaN

MasVnrArea 1452.0 NaN NaN NaN 103.685262 181.066207

ExterQual 1460 4 TA 906 NaN NaN

ExterCond 1460 5 TA 1282 NaN NaN

Foundation 1460 6 PConc 647 NaN NaN

BsmtQual 1423 4 TA 649 NaN NaN

BsmtCond 1423 4 TA 1311 NaN NaN

BsmtExposure 1422 4 No 953 NaN NaN

BsmtFinType1 1423 6 Unf 430 NaN NaN

BsmtFinType2 1422 6 Unf 1256 NaN NaN

TotalBsmtSF 1460.0 NaN NaN NaN 1057.429452 438.705324

Heating 1460 6 GasA 1428 NaN NaN

HeatingQC 1460 5 Ex 741 NaN NaN

CentralAir 1460 2 Y 1365 NaN NaN

Electrical 1459 5 SBrkr 1334 NaN NaN

1stFlrSF 1460.0 NaN NaN NaN 1162.626712 386.587738

2ndFlrSF 1460.0 NaN NaN NaN 346.992466 436.528436

LowQualFinSF 1460.0 NaN NaN NaN 5.844521 48.623081

GrLivArea 1460.0 NaN NaN NaN 1515.463699 525.480383

BsmtFullBath 1460.0 NaN NaN NaN 0.425342 0.518911

BsmtHalfBath 1460.0 NaN NaN NaN 0.057534 0.238753

FullBath 1460.0 NaN NaN NaN 1.565068 0.550916

HalfBath 1460.0 NaN NaN NaN 0.382877 0.502885

BedroomAbvGr 1460.0 NaN NaN NaN 2.866438 0.815778

KitchenAbvGr 1460.0 NaN NaN NaN 1.046575 0.220338

KitchenQual 1460 4 TA 735 NaN NaN

TotRmsAbvGrd 1460.0 NaN NaN NaN 6.517808 1.625393

Functional 1460 7 Typ 1360 NaN NaN

Fireplaces 1460.0 NaN NaN NaN 0.613014 0.644666

FireplaceQu 770 5 Gd 380 NaN NaN

GarageType 1379 6 Attchd 870 NaN NaN

GarageYrBlt 1379.0 NaN NaN NaN 1978.506164 24.689725

GarageFinish 1379 3 Unf 605 NaN NaN

GarageCars 1460.0 NaN NaN NaN 1.767123 0.747315

GarageArea 1460.0 NaN NaN NaN 472.980137 213.804841

GarageQual 1379 5 TA 1311 NaN NaN

GarageCond 1379 5 TA 1326 NaN NaN

PavedDrive 1460 3 Y 1340 NaN NaN

WoodDeckSF 1460.0 NaN NaN NaN 94.244521 125.338794

OpenPorchSF 1460.0 NaN NaN NaN 46.660274 66.256028

EnclosedPorch 1460.0 NaN NaN NaN 21.95411 61.119149

3SsnPorch 1460.0 NaN NaN NaN 3.409589 29.317331


ScreenPorch 1460.0 NaN NaN NaN 15.060959 55.757415

PoolArea 1460.0 NaN NaN NaN 2.758904 40.177307

PoolQC 7 3 Gd 3 NaN NaN

Fence 281 4 MnPrv 157 NaN NaN

MiscFeature 54 4 Shed 49 NaN NaN

MiscVal 1460.0 NaN NaN NaN 43.489041 496.123024

MoSold 1460.0 NaN NaN NaN 6.321918 2.703626

YrSold 1460.0 NaN NaN NaN 2007.815753 1.328095

SaleType 1460 9 WD 1267 NaN NaN

SaleCondition 1460 6 Normal 1198 NaN NaN

SalePrice 1460.0 NaN NaN NaN 180921.19589 79442.502883

House_Roof_Style 1460 27 1StoryGable 513 NaN NaN

House_floors 1460 8 1Story 726 NaN NaN

Separation of numerical and categorical variables

cat_cols=housing_dataset.select_dtypes(include=['object']).columns
num_cols = housing_dataset.select_dtypes(include=np.number).columns.tolist()
print("Categorical Variables:")
print(cat_cols)
print("Numerical Variables:")
print(num_cols)

Categorical Variables:
Index(['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities',
       'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
       'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation',
       'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
       'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual',
       'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual',
       'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature',
       'SaleType', 'SaleCondition', 'House_Roof_Style', 'House_floors'],
      dtype='object')
Numerical Variables:
['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemo
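
The split above returns a pandas Index for the categorical columns and a plain list for the numerical ones. A minimal sketch of the same idea that returns lists for both is shown here; the `toy` frame and its values are made up for illustration and only stand in for `housing_dataset`. Selecting everything that is not numeric, rather than `object` specifically, is a choice made for this sketch so that it does not depend on how a given pandas version stores strings.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for housing_dataset (values are illustrative only)
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, 80.0, 68.0],
    "Street": ["Pave", "Pave", "Grvl"],
    "CentralAir": ["Y", "Y", "N"],
})

# Numeric columns are selected by dtype; everything else is treated as categorical
num_cols = toy.select_dtypes(include=np.number).columns.tolist()
cat_cols = toy.select_dtypes(exclude=np.number).columns.tolist()

print("Categorical Variables:", cat_cols)
print("Numerical Variables:", num_cols)
```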


Univariate analysis

housing_dataset['SalePrice'].describe()

count 1460.000000
mean 180921.195890
std 79442.502883
min 34900.000000
25% 129975.000000
50% 163000.000000
75% 214000.000000
max 755000.000000
Name: SalePrice, dtype: float64

Histogram of a numerical column

sns.distplot(housing_dataset['SalePrice'])

/Users/prarab/opt/anaconda3/lib/python3.9/site-packages/seaborn/distributions
warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='SalePrice', ylabel='Density'>

Mean of a quantitative variable

np.mean(housing_dataset['SalePrice'])

180921.19589041095
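
The mean (about 180921) sits well above the median reported by describe() (163000), which is what a right-skewed variable looks like. A small sketch with made-up numbers shows how a single large value pulls the mean but not the median; the `values` series is hypothetical.

```python
import pandas as pd

# Five made-up prices (in thousands); the last one is an outlier
values = pd.Series([100, 120, 130, 150, 700])

mean_value = values.mean()      # pulled upward by the outlier
median_value = values.median()  # middle value, unaffected by the outlier

print("mean:", mean_value)
print("median:", median_value)
```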

Range of a quantitative variable

np.max(housing_dataset['LotFrontage']) - np.min(housing_dataset['LotFrontage'])

292.0
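
The range depends only on the two most extreme values, so it is often reported together with the interquartile range. The sketch below computes both on a made-up `frontage` series that includes one missing value, to show that the pandas methods skip NaN while the same computation on the raw NumPy array does not.

```python
import numpy as np
import pandas as pd

# Made-up frontage values with one missing entry (illustrative only)
frontage = pd.Series([21.0, 60.0, np.nan, 80.0, 313.0], name="LotFrontage")

# Series.max/min/quantile skip NaN by default
value_range = frontage.max() - frontage.min()
iqr = frontage.quantile(0.75) - frontage.quantile(0.25)

# On the underlying ndarray, NaN propagates into the result
raw = frontage.to_numpy()
raw_range = np.max(raw) - np.min(raw)

print("range:", value_range)
print("IQR:", iqr)
print("range on raw array:", raw_range)
```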


for iz in range(10):
    print(iz+1)

1
2
3
4
5
6
7
8
9
10

# To show graphs of all numerical columns at once:

import matplotlib.pyplot as plt

for col in num_cols:
    print(col)
    print('Skew :', round(housing_dataset[col].skew(), 2))
    plt.figure(figsize = (15, 4))
    # Left panel: histogram of the column
    plt.subplot(1, 2, 1)
    housing_dataset[col].hist(grid=False)
    plt.xlabel(col)
    plt.ylabel('count')
    # Right panel: box plot of the same column
    plt.subplot(1, 2, 2)
    sns.boxplot(x=housing_dataset[col])
    plt.show()


Id
Skew : 0.0

MSSubClass
Skew : 1.41

LotFrontage
Skew : 2.38

LotArea
Skew : 12.21

OverallQual
Skew : 0.22

OverallCond
Skew : 0.69


YearBuilt
Skew : -0.61

YearRemodAdd
Skew : -0.5

MasVnrArea
Skew : 2.67

TotalBsmtSF
Skew : 1.52

1stFlrSF
Skew : 1.38

2ndFlrSF
Skew : 0.81


LowQualFinSF
Skew : 9.01

GrLivArea
Skew : 1.37

BsmtFullBath
Skew : 0.6

BsmtHalfBath
Skew : 4.1

FullBath
Skew : 0.04

HalfBath
Skew : 0.68


BedroomAbvGr
Skew : 0.21

KitchenAbvGr
Skew : 4.49

TotRmsAbvGrd
Skew : 0.68
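
The skew values printed above can also be collected in one sorted table, which makes strongly right-skewed columns such as LotArea easy to spot. The sketch below does this on synthetic data and also shows that a log transform (`np.log1p`) reduces right skew; the `toy` frame is randomly generated and only borrows the column names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: a heavy right tail and a roughly symmetric rating
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "LotArea": rng.lognormal(mean=9.0, sigma=0.8, size=2000),
    "OverallQual": rng.integers(1, 11, size=2000).astype(float),
})

# Skewness of every column at once, most skewed first
skews = toy.skew().sort_values(ascending=False)
print(skews.round(2))

# Skewness of the right-tailed column before and after a log transform
raw_skew = toy["LotArea"].skew()
log_skew = np.log1p(toy["LotArea"]).skew()
print("raw:", round(raw_skew, 2), "log1p:", round(log_skew, 2))
```

On the real data the first step would be `housing_dataset[num_cols].skew().sort_values(ascending=False)`.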


Univariate analysis of categorical columns

# Categorical Variables:
# Index(['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities',
# 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
# 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
# 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation',
# 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
# 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual',
# 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual',
# 'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature',
# 'SaleType', 'SaleCondition', 'House_Roof_Style', 'House_floors'],
# dtype='object')
# Numerical Variables:

fig, axes = plt.subplots(2, 2, figsize = (18, 18))

fig.suptitle('Bar plot for categorical variables in the dataset')
sns.countplot(ax = axes[0, 0], x = 'HouseStyle', data = housing_dataset, color = 'blue',
              order = housing_dataset['HouseStyle'].value_counts().index);
sns.countplot(ax = axes[0, 1], x = 'House_floors', data = housing_dataset, color = 'blue',
              order = housing_dataset['House_floors'].value_counts().index);
sns.countplot(ax = axes[1, 0], x = 'Street', data = housing_dataset, color = 'blue',
              order = housing_dataset['Street'].value_counts().index);

# Note: only the bar order is derived from the first 20 records; the counts use the full
# dataset, and any RoofStyle category absent from those 20 records is not drawn
sns.countplot(ax = axes[1, 1], x = 'RoofStyle', data = housing_dataset, color = 'blue',
              order = housing_dataset['RoofStyle'].head(20).value_counts().index);

# In case you want to rotate text labels of x axis of any plot
axes[0][1].tick_params(labelrotation=45);
axes[1][1].tick_params(labelrotation=90);
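
The numbers behind a count plot can be read directly with value_counts(), which is also what supplies the bar order in the cell above. The sketch below uses a made-up `styles` series; it also shows that an order computed from only the first few records can miss categories that appear later.

```python
import pandas as pd

# Made-up roof styles (illustrative only)
styles = pd.Series(["Gable", "Hip", "Gable", "Gable", "Flat", "Hip"], name="RoofStyle")

# Absolute counts, most frequent first (the same ordering used for the bars)
counts = styles.value_counts()
print(counts)

# Relative frequencies instead of raw counts
shares = styles.value_counts(normalize=True).round(2)
print(shares)

# An order taken from the first 3 records only: "Flat" never appears in it
subset_order = styles.head(3).value_counts().index.tolist()
print(subset_order)
```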


Bivariate analysis

Numerical vs numerical

housing_dataset.plot.scatter(x='GrLivArea', y='SalePrice')
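
A scatter plot shows the shape of the relationship; a correlation coefficient puts a number on its linear strength. The sketch below computes the Pearson correlation on synthetic data; the `toy` frame, the slope and the noise level are assumptions chosen for illustration, not properties of the housing data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: price grows linearly with area, plus random noise
rng = np.random.default_rng(1)
area = rng.uniform(600, 3000, size=300)
price = 100 * area + rng.normal(0, 30000, size=300)
toy = pd.DataFrame({"GrLivArea": area, "SalePrice": price})

# Pearson correlation between the two columns (Series.corr defaults to Pearson)
r = toy["GrLivArea"].corr(toy["SalePrice"])
print("Pearson r:", round(r, 2))
```

On the real data this would be `housing_dataset['GrLivArea'].corr(housing_dataset['SalePrice'])`.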
