Copy - of - Descriptive - EDA - Munjal - Exercise1.ipynb - Colaboratory
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
housing_dataset = pd.read_csv("housing_dataset.csv")
housing_dataset.shape
housing_dataset.head()
5 rows × 81 columns
housing_dataset.columns
https://colab.research.google.com/drive/1nHXtKRleY5CVYi8iMwDKGDTnWVeiP6jL#scrollTo=SWU1K04iQody&printMode=true 1/30
21/11/2023, 00:00 Copy_of_Descriptive_EDA_Munjal_exercise1.ipynb - Colaboratory
'SaleCondition', 'SalePrice'],
dtype='object')
housing_dataset.describe()
8 rows × 38 columns
housing_dataset.dtypes
Id int64
MSSubClass int64
MSZoning object
LotFrontage float64
LotArea int64
...
MoSold int64
YrSold int64
SaleType object
SaleCondition object
SalePrice int64
Length: 81, dtype: object
Tip: use pd.set_option to display the full data, either rows or columns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
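The two pd.set_option calls above change the display limits for the rest of the session. As a hedged alternative, the sketch below uses pd.option_context to raise a limit only inside a with-block; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Remember the current limit so we can confirm it is restored afterwards.
before = pd.get_option("display.max_columns")

# Inside the with-block the limit is 500; outside it the old value returns.
with pd.option_context("display.max_columns", 500):
    inside = pd.get_option("display.max_columns")

after = pd.get_option("display.max_columns")
print(before, inside, after)
```

This keeps wide output available for one cell without affecting later cells; pd.reset_option("display.max_columns") is another way to return to the default.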
Id int64
MSSubClass int64
MSZoning object
LotFrontage float64
LotArea int64
Street object
Alley object
LotShape object
LandContour object
Utilities object
LotConfig object
LandSlope object
Neighborhood object
Condition1 object
Condition2 object
BldgType object
HouseStyle object
OverallQual int64
OverallCond int64
YearBuilt int64
YearRemodAdd int64
RoofStyle object
RoofMatl object
Exterior1st object
Exterior2nd object
MasVnrType object
MasVnrArea float64
ExterQual object
ExterCond object
Foundation object
BsmtQual object
BsmtCond object
BsmtExposure object
BsmtFinType1 object
BsmtFinSF1 int64
BsmtFinType2 object
BsmtFinSF2 int64
BsmtUnfSF int64
TotalBsmtSF int64
Heating object
HeatingQC object
CentralAir object
Electrical object
1stFlrSF int64
2ndFlrSF int64
LowQualFinSF int64
GrLivArea int64
BsmtFullBath int64
BsmtHalfBath int64
FullBath int64
HalfBath int64
BedroomAbvGr int64
KitchenAbvGr int64
KitchenQual object
TotRmsAbvGrd int64
Functional object
Fireplaces int64
FireplaceQu object
pd.DataFrame(housing_dataset.isnull().sum())
Id 0
MSSubClass 0
MSZoning 0
LotFrontage 259
LotArea 0
Street 0
Alley 1369
LotShape 0
LandContour 0
Utilities 0
LotConfig 0
LandSlope 0
Neighborhood 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
OverallQual 0
OverallCond 0
YearBuilt 0
YearRemodAdd 0
RoofStyle 0
RoofMatl 0
Exterior1st 0
Exterior2nd 0
MasVnrType 8
MasVnrArea 8
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 37
BsmtCond 37
BsmtExposure 38
BsmtFinType1 37
BsmtFinSF1 0
BsmtFinType2 38
BsmtFinSF2 0
BsmtUnfSF 0
TotalBsmtSF 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 1
1stFlrSF 0
2ndFlrSF 0
LowQualFinSF 0
GrLivArea 0
BsmtFullBath 0
BsmtHalfBath 0
FullBath 0
HalfBath 0
BedroomAbvGr 0
KitchenAbvGr 0
KitchenQual 0
TotRmsAbvGrd 0
Functional 0
Fireplaces 0
FireplaceQu 690
GarageType 81
GarageYrBlt 81
GarageFinish 81
GarageCars 0
GarageArea 0
GarageQual 81
GarageCond 81
PavedDrive 0
WoodDeckSF 0
OpenPorchSF 0
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
PoolQC 1453
Fence 1179
MiscFeature 1406
MiscVal 0
MoSold 0
YrSold 0
SaleType 0
SaleCondition 0
SalePrice 0
pd.DataFrame(housing_dataset.isnull().sum()/housing_dataset.isnull().count(),columns=['% missing']).sor
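The cell above is cut off in this export, so the exact call is not recoverable. The following is a minimal sketch of one way to build a table of missing-value fractions sorted from most to least missing. It assumes a descending sort_values on the '% missing' column (an assumption, not confirmed by the source) and uses a small hypothetical frame named toy in place of housing_dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for housing_dataset.
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# isnull().mean() gives the fraction of missing values per column,
# which equals isnull().sum() / isnull().count().
missing = (
    toy.isnull().mean()
    .to_frame(name="% missing")
    .sort_values(by="% missing", ascending=False)
)
print(missing)
```

Note that the column holds fractions between 0 and 1 despite its '% missing' label; multiply by 100 if true percentages are wanted.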
% missing
PoolQC 0.995205
MiscFeature 0.963014
Alley 0.937671
Fence 0.807534
FireplaceQu 0.472603
LotFrontage 0.177397
GarageYrBlt 0.055479
GarageCond 0.055479
GarageType 0.055479
GarageFinish 0.055479
GarageQual 0.055479
BsmtFinType2 0.026027
BsmtExposure 0.026027
BsmtQual 0.025342
BsmtCond 0.025342
BsmtFinType1 0.025342
MasVnrArea 0.005479
MasVnrType 0.005479
Electrical 0.000685
Id 0.000000
Functional 0.000000
Fireplaces 0.000000
KitchenQual 0.000000
KitchenAbvGr 0.000000
BedroomAbvGr 0.000000
HalfBath 0.000000
FullBath 0.000000
BsmtHalfBath 0.000000
TotRmsAbvGrd 0.000000
GarageCars 0.000000
GrLivArea 0.000000
GarageArea 0.000000
PavedDrive 0.000000
WoodDeckSF 0.000000
OpenPorchSF 0.000000
EnclosedPorch 0.000000
3SsnPorch 0.000000
ScreenPorch 0.000000
PoolArea 0.000000
MiscVal 0.000000
MoSold 0.000000
YrSold 0.000000
SaleType 0.000000
SaleCondition 0.000000
BsmtFullBath 0.000000
HeatingQC 0.000000
LowQualFinSF 0.000000
LandSlope 0.000000
OverallQual 0.000000
HouseStyle 0.000000
BldgType 0.000000
Condition2 0.000000
Condition1 0.000000
Neighborhood 0.000000
LotConfig 0.000000
YearBuilt 0.000000
Utilities 0.000000
LandContour 0.000000
LotShape 0.000000
Street 0.000000
LotArea 0.000000
MSZoning 0.000000
OverallCond 0.000000
YearRemodAdd 0.000000
2ndFlrSF 0.000000
BsmtFinSF2 0.000000
1stFlrSF 0.000000
CentralAir 0.000000
MSSubClass 0.000000
Heating 0.000000
TotalBsmtSF 0.000000
BsmtUnfSF 0.000000
BsmtFinSF1 0.000000
RoofStyle 0.000000
Foundation 0.000000
ExterCond 0.000000
ExterQual 0.000000
Exterior2nd 0.000000
Exterior1st 0.000000
RoofMatl 0.000000
SalePrice 0.000000
Missing value imputation example using mean
housing_dataset['LotFrontage'].isnull()
0 False
1 False
2 False
3 False
4 False
...
1455 False
1456 False
1457 False
1458 False
1459 False
Name: LotFrontage, Length: 1460, dtype: bool
housing_dataset['LotFrontage'].isnull().sum()
259
(1460,)
housing_dataset['LotFrontage'] = housing_dataset['LotFrontage'].fillna(housing_dataset['LotFrontage'].m
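The fillna line above is truncated in this export. Going by the section heading, a mean imputation can be sketched as follows; the toy frame and the mean_value name are hypothetical, and using the column mean is an assumption based on the heading rather than on the visible code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing entries.
toy = pd.DataFrame({"LotFrontage": [65.0, np.nan, 68.0, 60.0, np.nan]})

# Series.mean() skips NaN by default, so this is the mean of the observed values.
mean_value = toy["LotFrontage"].mean()

# Replace every missing entry with that mean.
toy["LotFrontage"] = toy["LotFrontage"].fillna(mean_value)
print(toy)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is worth keeping in mind when a large share of values is missing.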
Extract just the number of floors from the House Style column (taken here as the first token of House_Roof_Style)
housing_dataset['House_floors'] = housing_dataset.House_Roof_Style.str.split().str.get(0)
housing_dataset.head()
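The cell that creates House_Roof_Style does not appear in this export. The sketch below assumes it holds the house style and roof style joined by a space (an assumption), and shows how str.split() followed by str.get(0) keeps only the first token. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical values, assuming "<HouseStyle> <RoofStyle>" joined by a space.
toy = pd.DataFrame({"House_Roof_Style": ["2Story Gable", "1Story Hip", "1.5Fin Gable"]})

# split() breaks each string on whitespace; get(0) picks the first piece.
toy["House_floors"] = toy["House_Roof_Style"].str.split().str.get(0)
print(toy)
```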
housing_dataset['SaleCondition'].unique()
# searchfor = ['Abnorml','Alloca']
# housing_dataset[housing_dataset.SaleCondition.str.contains('|'.join(searchfor))].head(10)
housing_dataset[housing_dataset.SaleCondition.isin(['Abnorml','Alloca'])].head(10)
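The cell above shows two filtering approaches, one commented out. As a small sketch on a hypothetical toy frame: isin keeps rows whose value exactly equals one of the listed labels, while str.contains with a '|'-joined pattern performs a regular-expression substring search.

```python
import pandas as pd

# Hypothetical toy column standing in for housing_dataset.SaleCondition.
toy = pd.DataFrame({"SaleCondition": ["Normal", "Abnorml", "Partial", "Alloca", "AdjLand"]})
searchfor = ["Abnorml", "Alloca"]

# Exact membership test.
exact = toy[toy["SaleCondition"].isin(searchfor)]

# Regex substring test: the pattern is "Abnorml|Alloca".
substring = toy[toy["SaleCondition"].str.contains("|".join(searchfor))]

print(exact)
print(substring)
```

On these values both filters select the same rows. They would differ if some label merely contained one of the search terms as part of a longer string, in which case only str.contains would match it.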
757 758 60 RL 70.049958 11616 Pave NaN IR1
1428 1429 30 RM 60.000000 7200 Pave NaN Reg
Statistical summary
GarageYrBlt 1379.0 1978.506164 24.689725 1900.0 1961.00 1980.000000
Statistical summary for categorical columns too
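The code cell for the categorical summary is missing from this export. One possible way to get such a summary is sketched below, assuming the non-numeric columns are selected first and then passed to describe(); the toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with two categorical columns and one numeric column.
toy = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "RL"],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# For non-numeric columns, describe() reports count, unique, top and freq.
summary = toy.select_dtypes(exclude="number").describe().T
print(summary)
```

Here top is the most frequent label and freq is how often it occurs, which serves as the categorical counterpart of the mean and standard deviation shown for numeric columns.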
Univariate analysis
housing_dataset['SalePrice'].describe()
count 1460.000000
mean 180921.195890
std 79442.502883
min 34900.000000
25% 129975.000000
50% 163000.000000
75% 214000.000000
max 755000.000000
Name: SalePrice, dtype: float64
sns.distplot(housing_dataset['SalePrice'])
/Users/prarab/opt/anaconda3/lib/python3.9/site-packages/seaborn/distributions
warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='SalePrice', ylabel='Density'>
np.mean(housing_dataset['SalePrice'])
180921.19589041095
np.max(housing_dataset['LotFrontage']) - np.min(housing_dataset['LotFrontage'])
292.0
for iz in range(10):
    print(iz+1)
1
2
3
4
5
6
7
8
9
10
Id
Skew : 0.0
MSSubClass
Skew : 1.41
LotFrontage
Skew : 2.38
LotArea
Skew : 12.21
OverallQual
Skew : 0.22
OverallCond
Skew : 0.69
YearBuilt
Skew : -0.61
YearRemodAdd
Skew : -0.5
MasVnrArea
Skew : 2.67
TotalBsmtSF
Skew : 1.52
1stFlrSF
Skew : 1.38
2ndFlrSF
Skew : 0.81
LowQualFinSF
Skew : 9.01
GrLivArea
Skew : 1.37
BsmtFullBath
Skew : 0.6
BsmtHalfBath
Skew : 4.1
FullBath
Skew : 0.04
HalfBath
Skew : 0.68
BedroomAbvGr
Skew : 0.21
KitchenAbvGr
Skew : 4.49
TotRmsAbvGrd
Skew : 0.68
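The cell that printed the skew values above is not included in this export. A minimal sketch that would produce output in the same "name, then Skew : value" layout is given below, assuming a loop over the numeric columns and pandas' Series.skew(); the toy frame and the skews dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; LotArea has one very large value, so it is right-skewed.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 215245],
    "OverallQual": [7, 6, 7, 7, 8],
    "MSZoning": ["RL", "RL", "RL", "RL", "RM"],
})

skews = {}
for col in toy.select_dtypes(include="number").columns:
    # Series.skew() returns the sample skewness of the column.
    skews[col] = round(toy[col].skew(), 2)
    print(col)
    print("Skew :", skews[col])
```

Positive values indicate a long right tail (as with LotArea and LowQualFinSF above), while negative values such as YearBuilt indicate a long left tail.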
# Categorical Variables:
# Index(['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities',
# 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
# 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
# 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation',
# 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
# 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual',
# 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual',
# 'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature',
# 'SaleType', 'SaleCondition', 'House_Roof_Style', 'House_floors'],
# dtype='object')
# Numerical Variables:
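The code that produced the column lists above is not shown in this export. One way to split column names by type is sketched below with select_dtypes, on a hypothetical toy frame; the names cat_cols and num_cols are illustrative.

```python
import pandas as pd

# Hypothetical toy data with mixed column types.
toy = pd.DataFrame({
    "MSZoning": ["RL", "RM"],
    "SaleCondition": ["Normal", "Abnorml"],
    "LotArea": [8450, 9600],
    "LotFrontage": [65.0, 80.0],
})

# Non-numeric columns are treated as categorical, the rest as numerical.
cat_cols = toy.select_dtypes(exclude="number").columns
num_cols = toy.select_dtypes(include="number").columns

print("Categorical Variables:")
print(cat_cols)
print("Numerical Variables:")
print(num_cols)
```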
Bivariate analysis
Numerical vs numerical
housing_dataset.plot.scatter(x='GrLivArea', y='SalePrice')
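A scatter plot shows the relationship visually; as a hedged complement, the strength of a linear relationship between two numeric columns can be summarised with the Pearson correlation coefficient. The sketch below uses hypothetical toy values and is not taken from the original notebook.

```python
import pandas as pd

# Hypothetical toy values for the two columns plotted above.
toy = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Series.corr() defaults to the Pearson correlation, a value between -1 and 1.
r = toy["GrLivArea"].corr(toy["SalePrice"])
print(r)
```

A value near 1 means larger living areas tend to go with higher sale prices; a value near 0 means no clear linear trend.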