Assignment-2: pandas (pd), NumPy (np), Seaborn (sns), matplotlib.pyplot (plt)
[11]: print(data.head())
[13]: print(data.tail())
[17]: print("Data shape:", data.shape)
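A quick sketch of what these first-look calls return, run on a small hypothetical frame (`df` and its prices are invented, not the housing data): `head()` and `tail()` default to five rows, and `shape` is a `(rows, columns)` tuple. Since `info()` below reports 545 entries and 13 columns, `data.shape` should come out as `(545, 13)`.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`; the prices are invented.
df = pd.DataFrame({
    "price": [100, 200, 300, 400, 500, 600, 700],
    "area": [7420, 8960, 9960, 7500, 7420, 3000, 2400],
})

first = df.head()          # first 5 rows by default
last = df.tail(2)          # last 2 rows; pass n to change the count
n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple

print(first.index.tolist())  # [0, 1, 2, 3, 4]
print(last.index.tolist())   # [5, 6]
print(n_rows, n_cols)        # 7 2
```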
[19]: data.info()  # info() prints its report directly and returns None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   price             545 non-null    int64
 1   area              545 non-null    int64
 2   bedrooms          545 non-null    int64
 3   bathrooms         545 non-null    int64
 4   stories           545 non-null    int64
 5   mainroad          545 non-null    object
 6   guestroom         545 non-null    object
 7   basement          545 non-null    object
 8   hotwaterheating   545 non-null    object
 9   airconditioning   545 non-null    object
 10  parking           545 non-null    int64
 11  prefarea          545 non-null    object
 12  furnishingstatus  545 non-null    object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB
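`info()` prints its report and returns `None`, which is why wrapping it in `print()` adds a stray `None` line. The `dtypes: int64(6), object(7)` line splits the columns into six integer and seven text (yes/no or category) columns. The sketch below recovers the same split with `select_dtypes`, which the z-score cell later relies on; the three-column frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame mixing integer and yes/no text columns (values invented).
df = pd.DataFrame({
    "price": [100, 200, 300],
    "area": [7420, 8960, 9960],
    "mainroad": ["yes", "yes", "no"],
})

# Same kind of tally as the "dtypes:" line of df.info()
dtype_counts = df.dtypes.astype(str).value_counts()

numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)  # ['price', 'area']
print(text_cols)     # ['mainroad']
```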
[21]: ss = data.describe()
ss
parking
count 545.000000
mean 0.693578
std 0.861586
min 0.000000
25% 0.000000
50% 0.000000
75% 1.000000
max 3.000000
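`describe()` reports, per numeric column, the count, mean, sample standard deviation (`ddof=1`), minimum, the 25/50/75% quantiles and the maximum. In the `parking` output the median (50%) is 0, so at least half of the 545 houses list no parking spaces. A small check of those definitions on a hypothetical `parking` column (values invented):

```python
import numpy as np
import pandas as pd

# Hypothetical parking counts (invented values).
df = pd.DataFrame({"parking": [0, 0, 1, 2, 3]})
desc = df.describe()["parking"]

print(desc["mean"])                            # 1.2
print(desc["25%"], desc["50%"], desc["75%"])   # 0.0 1.0 2.0
# describe()'s std is the sample standard deviation (ddof=1)
print(np.isclose(desc["std"], df["parking"].std(ddof=1)))  # True
```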
[25]: data
[29]: data.isnull()
(Output: a 545 x 13 boolean DataFrame; every entry visible in the export, including row 544 and the furnishingstatus column, is False.)
[31]: data.isnull().sum()
[31]: price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
[33]: data.isna().mean()*100
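`isnull()` and `isna()` are aliases. Summing the boolean mask counts the missing cells per column, and its mean is the fraction missing, so `* 100` gives a percentage; for this dataset both are all zeros. To see non-zero values, here is the same pair of calls on a hypothetical frame with gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps (the housing data itself has none).
df = pd.DataFrame({
    "price": [100, np.nan, 300, 400],
    "prefarea": ["yes", "no", None, None],
})

missing_count = df.isnull().sum()    # missing cells per column
missing_pct = df.isna().mean() * 100  # mean of True/False = fraction missing
print(missing_count)
print(missing_pct)
```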
[35]: data.head()
[39]: data.fillna('Disha')  # returns a filled copy; data itself is unchanged and has no missing values
hotwaterheating airconditioning parking prefarea furnishingstatus
0 no yes 2 yes furnished
1 no yes 3 no furnished
2 no no 2 yes semi-furnished
3 no yes 3 yes furnished
4 no yes 2 no furnished
.. … … … … …
540 no no 2 no unfurnished
541 no no 0 no semi-furnished
542 no no 0 no unfurnished
543 no no 0 no furnished
544 no no 0 no unfurnished
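`fillna` returns a new DataFrame and leaves `data` unchanged unless the result is reassigned; because nothing is missing here, the output above is simply `data` again. Filling every column with one string such as `'Disha'` would also put text into numeric columns if they had gaps. A sketch of per-column fill values on a hypothetical frame (the median and `'no'` are illustrative choices, not part of the assignment):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap per column.
raw = pd.DataFrame({"area": [7420.0, np.nan, 9960.0], "prefarea": ["yes", None, "no"]})

# A dict maps each column to its own fill value; the result is a new frame.
filled = raw.fillna({"area": raw["area"].median(), "prefarea": "no"})

print(int(raw.isna().sum().sum()))     # 2 (the original is untouched)
print(int(filled.isna().sum().sum()))  # 0
print(filled.loc[1, "area"])           # 8690.0
```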
[41]: data.duplicated().sum()
[41]: 0
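`duplicated()` flags each row that repeats an earlier one (the first copy is not flagged), so a sum of 0 means there are no fully identical rows. A sketch on a hypothetical frame showing the `keep` and `subset` options and `drop_duplicates`:

```python
import pandas as pd

# Hypothetical rows; row 2 repeats row 0 exactly.
df = pd.DataFrame({"area": [7420, 8960, 7420], "price": [100, 200, 100]})

print(df.duplicated().tolist())              # [False, False, True]
print(df.duplicated(keep=False).tolist())    # [True, False, True]
print(df.duplicated(subset=["area"]).sum())  # 1 (compare the area column only)
deduped = df.drop_duplicates()               # keeps the first copy of each row
print(len(deduped))                          # 2
```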
[43]: data
[545 rows x 13 columns]
[45]: data['area']
[45]: 0 7420
1 8960
2 9960
3 7500
4 7420
…
540 3000
541 2400
542 3620
543 2910
544 3850
Name: area, Length: 545, dtype: int64
[59]: sns.histplot(data['area'], kde=True, color='b', fill=False)
plt.show()
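`histplot` counts how many values fall into each bin; its default `bins="auto"` is passed on to NumPy's bin-edge rules. The sketch computes those counts directly with `np.histogram` for the ten `area` values visible in the `data['area']` output (rows 0 to 4 and 540 to 544), so the bars should match a histplot of just those values; it is not the full 545-row column.

```python
import numpy as np

# The ten area values printed above (rows 0-4 and 540-544), not the full column.
area = np.array([7420, 8960, 9960, 7500, 7420, 3000, 2400, 3620, 2910, 3850])

counts, edges = np.histogram(area, bins="auto")  # bar heights and bin boundaries
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:7.0f} to {hi:7.0f}: {c}")
```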
[63]: sns.kdeplot(data)  # wide-form input: one density curve per numeric column, all on one x-axis
plt.show()
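Passing the whole DataFrame makes `kdeplot` treat it as wide-form data (seaborn keeps only the numeric columns), so columns on very different scales, such as `area` in the thousands and `parking` between 0 and 3, share one axis and the narrow ones are hard to read; plotting a single column, e.g. `sns.kdeplot(data['area'])`, avoids this. Each curve is a sum of Gaussian bumps, one per observation. The sketch below builds one in NumPy with Scott's-rule bandwidth (seaborn's default bandwidth method); `gaussian_kde_1d` is a hypothetical helper, and the input is the ten `area` values shown earlier.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: Gaussian KDE evaluated on `grid`, Scott's-rule bandwidth."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bw = samples.std(ddof=1) * n ** (-1 / 5)     # Scott's rule in one dimension
    u = (grid[:, None] - samples[None, :]) / bw  # scaled distance, grid point x sample
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bw)        # average of the bumps, total area 1

area = np.array([7420, 8960, 9960, 7500, 7420, 3000, 2400, 3620, 2910, 3850])
grid = np.linspace(area.min() - 10000, area.max() + 10000, 4001)
density = gaussian_kde_1d(area, grid)

step = grid[1] - grid[0]
print(round(density.sum() * step, 3))  # about 1.0: a density integrates to one
```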
[69]: import warnings
warnings.filterwarnings('ignore')
[87]: sns.boxplot(data['area'])
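A box plot summarises a column with a few numbers: the box spans Q1 to Q3 with a line at the median, and with the default `whis=1.5` the whiskers reach the most extreme values within 1.5 times the IQR of the box; anything beyond is drawn as a separate point. The sketch computes these for the ten `area` values printed earlier, assuming the plotting libraries use NumPy's default linear-interpolation percentiles.

```python
import numpy as np

# The ten area values printed earlier (rows 0-4 and 540-544).
area = np.array([7420, 8960, 9960, 7500, 7420, 3000, 2400, 3620, 2910, 3850])

q1, median, q3 = np.percentile(area, [25, 50, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations still inside the fences.
whisker_low = area[area >= low_fence].min()
whisker_high = area[area <= high_fence].max()
fliers = area[(area < low_fence) | (area > high_fence)]

print(q1, median, q3)             # 3155.0 5635.0 7480.0
print(whisker_low, whisker_high)  # 2400 9960
print(fliers.size)                # 0
```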
[91]: sns.countplot(x='area', data=data)  # one bar per distinct area value, so expect many short bars
plt.xticks(rotation=45) # Rotate labels if needed
plt.show()
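`countplot` draws one bar per distinct value, so on a continuous column like `area` it produces many bars, mostly of height 1 or 2, which stay hard to read even with rotated labels. A categorical column such as `furnishingstatus` gives a few tall bars instead. The bar heights are just `value_counts()`; shown here for the ten rows visible in earlier outputs (rows 0 to 4 and 540 to 544):

```python
import pandas as pd

# Rows 0-4 and 540-544 as printed earlier in the notebook.
df = pd.DataFrame({
    "area": [7420, 8960, 9960, 7500, 7420, 3000, 2400, 3620, 2910, 3850],
    "furnishingstatus": ["furnished", "furnished", "semi-furnished", "furnished",
                         "furnished", "unfurnished", "semi-furnished", "unfurnished",
                         "furnished", "unfurnished"],
})

n_area_bars = df["area"].nunique()  # countplot(x='area') draws this many bars
status_counts = df["furnishingstatus"].value_counts()  # bar heights for the categorical column
print(n_area_bars)    # 9
print(status_counts)
```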
[117]: sns.scatterplot(x='area', y='price', data=data)
plt.show()
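The scatter plot shows whether larger houses tend to cost more. Pearson's correlation coefficient condenses the linear part of that pattern into one number between -1 and 1, and `data['area'].corr(data['price'])` would compute it for the real data. The sketch checks pandas against the textbook formula on hypothetical (area, price) pairs with invented prices:

```python
import numpy as np
import pandas as pd

# Hypothetical (area, price) pairs; the prices are invented for illustration.
df = pd.DataFrame({
    "area": [2400, 3000, 3620, 7420, 9960],
    "price": [1.8e6, 2.2e6, 2.9e6, 6.1e6, 8.0e6],
})

r = df["area"].corr(df["price"])  # Pearson by default

# Same coefficient from its definition: co-variation over the product of spreads.
x = df["area"].to_numpy(dtype=float)
y = df["price"].to_numpy(dtype=float)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())
print(round(r, 4), round(r_manual, 4))
```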
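[ ]: import numpy as np
from scipy.stats import zscore
# |z| for every numeric column; scipy's zscore uses the population std (ddof=0) by default
z_scores = np.abs(zscore(data.select_dtypes(include=[np.number])))
# keep rows where any numeric column lies more than 3 standard deviations from its mean
outliers = data[(z_scores > 3).any(axis=1)]
print(outliers)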
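What `scipy.stats.zscore` computes, written out in pandas: subtract each column's mean and divide by its standard deviation. One detail matters when reproducing it: scipy uses the population standard deviation (`ddof=0`) by default, while pandas' `.std()` defaults to the sample version (`ddof=1`), which gives slightly smaller |z|. The frame is hypothetical, with one deliberately extreme `area`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: ten identical areas plus one extreme value at index 10.
df = pd.DataFrame({
    "area": [3000] * 10 + [30000],
    "price": [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200],
    "mainroad": ["yes"] * 11,  # text column, excluded by select_dtypes below
})

numeric = df.select_dtypes(include=[np.number])
# ddof=0 (population std) matches scipy.stats.zscore's default
z = ((numeric - numeric.mean()) / numeric.std(ddof=0)).abs()
outliers = df[(z > 3).any(axis=1)]

print(outliers.index.tolist())     # [10]
print(round(z.loc[10, "area"], 3))  # 3.162, i.e. sqrt(10)
# With pandas' default ddof=1 the same point scores 10 / sqrt(11), about 3.015
z_sample = (df["area"] - df["area"].mean()) / df["area"].std()
print(round(abs(z_sample[10]), 3))  # 3.015
```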
[ ]: Q1 = data['price'].quantile(0.25)
Q3 = data['price'].quantile(0.75)
IQR = Q3 - Q1
# Tukey's fences: flag prices more than 1.5 * IQR below Q1 or above Q3
outliers = data[(data['price'] < Q1 - 1.5 * IQR) |
                (data['price'] > Q3 + 1.5 * IQR)]
print(outliers)
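The same 1.5 times IQR rule can be wrapped in a reusable helper so it applies to `area` or any other numeric column. `iqr_outliers` is a hypothetical name; the frame holds the ten `area` values printed earlier plus one invented extreme value (30000) so that something gets flagged:

```python
import pandas as pd

def iqr_outliers(frame, column, k=1.5):
    """Hypothetical helper: rows whose `column` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return frame[(frame[column] < lower) | (frame[column] > upper)]

# Ten area values from the notebook output plus one invented extreme (index 10).
df = pd.DataFrame({"area": [7420, 8960, 9960, 7500, 7420, 3000, 2400,
                            3620, 2910, 3850, 30000]})

flagged = iqr_outliers(df, "area")
print(flagged["area"].tolist())  # [30000]
```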