DL 1
Name: Gayatri Rajendra Jagadale
Roll No.:2447062
Batch: D
Problem Statement –
A real estate agent wants help predicting house prices for regions in the USA. He has given you a dataset
to work on, and you have decided to use a Linear Regression model. Create a model that will help him
estimate what a house would sell for. URL for the dataset:
https://github.com/huzaifsayed/Linear-Regression-Model-for-House-PricePrediction/blob/master/USA_Housing.csv
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('USA_Housing.csv')
df
df.describe()
       Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  \
count       5000.000000          5000.000000                5000.000000
mean       68583.108984             5.977222                   6.987792
std        10657.991214             0.991456                   1.005833
min        17796.631190             2.644304                   3.236194
25%        61480.562388             5.322283                   6.299250
50%        68804.286404             5.970429                   7.002902
75%        75783.338666             6.650808                   7.665871
max       107701.748378             9.519088                  10.759588
df.columns
Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
      dtype='object')
sns.pairplot(df)
<seaborn.axisgrid.PairGrid at 0x26b82949fd0>
sns.distplot(df['Price'])
C:\Users\shrey\AppData\Local\Temp\ipykernel_18444\834922981.py:1: UserWarning:
`distplot` is a deprecated function and will be removed in seaborn v0.14.0.
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
  sns.distplot(df['Price'])
<Axes: xlabel='Price', ylabel='Density'>
sns.heatmap(df.corr(), annot=True)
C:\Users\shrey\AppData\Local\Temp\ipykernel_18444\621126171.py:1: FutureWarning:
The default value of numeric_only in DataFrame.corr is deprecated. In a future
version, it will default to False. Select only valid columns or specify the
value of numeric_only to silence this warning.
  sns.heatmap(df.corr(), annot=True)
<Axes: >
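The FutureWarning above appears because `df` contains the text column 'Address'. A minimal sketch of the fix the warning suggests, passing `numeric_only=True` to `DataFrame.corr`, is shown below. The small frame is made up for illustration and only mimics the shape of the real data.

```python
import pandas as pd

# Tiny made-up frame with one text column, mimicking the 'Address' column.
demo = pd.DataFrame({
    "Avg. Area Income": [60000.0, 70000.0, 80000.0, 90000.0],
    "Price": [1.0e6, 1.2e6, 1.4e6, 1.6e6],
    "Address": ["a", "b", "c", "d"],
})

# numeric_only=True restricts the correlation to numeric columns,
# so the text column is skipped instead of triggering a warning or an error.
corr = demo.corr(numeric_only=True)
print(corr)
```

In the notebook the same idea would read `sns.heatmap(df.corr(numeric_only=True), annot=True)`.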
X = df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
        'Avg. Area Number of Bedrooms', 'Area Population']]
y = df['Price']
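The model fit that follows uses `X_train` and `y_train`, but this export does not show where they are created. A minimal sketch of the missing step, assuming scikit-learn's `train_test_split` was used, is given below. The values `test_size=0.4` and `random_state=101` are assumptions and are not recorded anywhere in this document; the synthetic `X` and `y` stand in for the real ones.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X and y (assumption: the real ones come from df).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["feature_a", "feature_b"])
y = pd.Series(rng.normal(size=100), name="Price")

# Hold out part of the data for testing; test_size and random_state are
# illustrative assumptions, not values taken from this notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)
print(X_train.shape, X_test.shape)
```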
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
LinearRegression()
print(lr.intercept_)
-2640159.7968526953
coeff_df = pd.DataFrame(lr.coef_, X.columns, columns=['Coefficient'])
coeff_df
                                Coefficient
Avg. Area Income                  21.528276
Avg. Area House Age           164883.282027
Avg. Area Number of Rooms     122368.678027
Avg. Area Number of Bedrooms    2233.801864
Area Population                   15.150420
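Each coefficient is the estimated change in price for a one-unit increase in that feature while the other features stay fixed. A small worked sketch of how the intercept and coefficients printed above combine into a single prediction follows. The feature values for the example house are hypothetical and chosen only for illustration.

```python
# Intercept and coefficients copied from the output above.
intercept = -2640159.7968526953
coefficients = {
    "Avg. Area Income": 21.528276,
    "Avg. Area House Age": 164883.282027,
    "Avg. Area Number of Rooms": 122368.678027,
    "Avg. Area Number of Bedrooms": 2233.801864,
    "Area Population": 15.150420,
}

# Hypothetical house: these feature values are made up for illustration.
house = {
    "Avg. Area Income": 68000.0,
    "Avg. Area House Age": 6.0,
    "Avg. Area Number of Rooms": 7.0,
    "Avg. Area Number of Bedrooms": 4.0,
    "Area Population": 36000.0,
}

# Linear regression prediction: intercept + sum(coefficient * feature value).
predicted_price = intercept + sum(
    coefficients[name] * house[name] for name in coefficients
)
print(f"Predicted price: {predicted_price:,.2f}")
```

This is the same arithmetic `lr.predict` performs for each row of `X_test`.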
predictions = lr.predict(X_test)
plt.scatter(y_test,predictions)
<matplotlib.collections.PathCollection at 0x26b8bb4c4c0>
sns.distplot((y_test-predictions),bins=50)
C:\Users\shrey\AppData\Local\Temp\ipykernel_18444\1061164399.py:1: UserWarning:
`distplot` is a deprecated function and will be removed in seaborn v0.14.0.
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
  sns.distplot((y_test-predictions),bins=50)
<Axes: xlabel='Price', ylabel='Density'>
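The residual plot above shows how the prediction errors are distributed. A minimal sketch of summarising the same errors as numbers, using scikit-learn's `mean_absolute_error` and `mean_squared_error`, is given below. This step is not part of the recorded output; the arrays are made-up stand-ins for `y_test` and `predictions`, so the printed values say nothing about the real model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up values standing in for y_test and predictions.
y_true = np.array([1.0e6, 1.2e6, 1.5e6, 0.9e6])
y_pred = np.array([1.1e6, 1.1e6, 1.4e6, 1.0e6])

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # same units as the price
print(mae, mse, rmse)
```

In the notebook the calls would take `y_test` and `predictions` in place of the stand-in arrays.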