
DL 1

The document outlines an assignment by Gayatri Rajendra Jagadale to create a Linear Regression Model for predicting house prices in the USA using a provided dataset. It includes data exploration, model training, and evaluation metrics such as MAE, MSE, and RMSE. The code demonstrates data loading, preprocessing, model fitting, and visualization of predictions.

Assignment No.: 1
Name: Gayatri Rajendra Jagadale
Roll No.: 2447062
Batch: D
Problem Statement –
A real estate agent wants help predicting house prices for regions in the USA. He gave you the dataset
to work on, and you decided to use a Linear Regression model. Create a model that will help him
estimate what a house would sell for. URL for the dataset:
https://github.com/huzaifsayed/Linear-Regression-Model-for-House-PricePrediction/blob/master/USA_Housing.csv
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

df = pd.read_csv('USA_Housing.csv')

df

      Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  \
0         79545.458574             5.682861                   7.009188
1         79248.642455             6.002900                   6.730821
2         61287.067179             5.865890                   8.512727
3         63345.240046             7.188236                   5.586729
4         59982.197226             5.040555                   7.839388
...                ...                  ...                        ...
4995      60567.944140             7.830362                   6.137356
4996      78491.275435             6.999135                   6.576763
4997      63390.686886             7.250591                   4.805081
4998      68001.331235             5.534388                   7.130144
4999      65510.581804             5.992305                   6.792336

      Avg. Area Number of Bedrooms  Area Population         Price  \
0                             4.09     23086.800503  1.059034e+06
1                             3.09     40173.072174  1.505891e+06
2                             5.13     36882.159400  1.058988e+06
3                             3.26     34310.242831  1.260617e+06
4                             4.23     26354.109472  6.309435e+05
...                            ...              ...           ...
4995                          3.46     22837.361035  1.060194e+06
4996                          4.02     25616.115489  1.482618e+06
4997                          2.13     33266.145490  1.030730e+06
4998                          5.44     42625.620156  1.198657e+06
4999                          4.07     46501.283803  1.298950e+06

                                                Address
0     208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
1     188 Johnson Views Suite 079\nLake Kathleen, CA...
2     9127 Elizabeth Stravenue\nDanieltown, WI 06482...
3                             USS Barnett\nFPO AP 44820
4                            USNS Raymond\nFPO AE 09386
...                                                 ...
4995                  USNS Williams\nFPO AP 30153-7653
4996             PSC 9258, Box 8489\nAPO AA 42991-3352
4997  4215 Tracy Garden Suite 076\nJoshualand, VA 01...
4998                          USS Wallace\nFPO AE 73316
4999  37778 George Ridges Apt. 509\nEast Holly, NV 2...

[5000 rows x 7 columns]
df.head()
   Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  \
0      79545.458574             5.682861                   7.009188
1      79248.642455             6.002900                   6.730821
2      61287.067179             5.865890                   8.512727
3      63345.240046             7.188236                   5.586729
4      59982.197226             5.040555                   7.839388

   Avg. Area Number of Bedrooms  Area Population         Price  \
0                          4.09     23086.800503  1.059034e+06
1                          3.09     40173.072174  1.505891e+06
2                          5.13     36882.159400  1.058988e+06
3                          3.26     34310.242831  1.260617e+06
4                          4.23     26354.109472  6.309435e+05

                                             Address
0  208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
1  188 Johnson Views Suite 079\nLake Kathleen, CA...
2  9127 Elizabeth Stravenue\nDanieltown, WI 06482...
3                          USS Barnett\nFPO AP 44820
4                         USNS Raymond\nFPO AE 09386
df.describe()

       Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  \
count       5000.000000          5000.000000                5000.000000
mean       68583.108984             5.977222                   6.987792
std        10657.991214             0.991456                   1.005833
min        17796.631190             2.644304                   3.236194
25%        61480.562388             5.322283                   6.299250
50%        68804.286404             5.970429                   7.002902
75%        75783.338666             6.650808                   7.665871
max       107701.748378             9.519088                  10.759588

       Avg. Area Number of Bedrooms  Area Population         Price
count                   5000.000000      5000.000000  5.000000e+03
mean                       3.981330     36163.516039  1.232073e+06
std                        1.234137      9925.650114  3.531176e+05
min                        2.000000       172.610686  1.593866e+04
25%                        3.140000     29403.928702  9.975771e+05
50%                        4.050000     36199.406689  1.232669e+06
75%                        4.490000     42861.290769  1.471210e+06
max                        6.500000     69621.713378  2.469066e+06
df.columns

Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of
Rooms',
'Avg. Area Number of Bedrooms', 'Area Population', 'Price',
'Address'],
dtype='object')
sns.pairplot(df)

<seaborn.axisgrid.PairGrid at 0x26b82949fd0>
sns.distplot(df['Price'])
C:\Users\shrey\AppData\Local\Temp\ipykernel_18444\834922981.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  sns.distplot(df['Price'])
<Axes: xlabel='Price', ylabel='Density'>
sns.heatmap(df.corr(), annot=True)

C:\Users\shrey\AppData\Local\Temp\ipykernel_18444\621126171.py:1:
FutureWarning: The default value of numeric_only in DataFrame.corr is
deprecated. In a future version, it will default to False. Select only
valid columns or specify the value of numeric_only to silence this
warning.
sns.heatmap(df.corr(), annot=True )

<Axes: >
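
The FutureWarning above is triggered because `df` still contains the text column `Address` when `corr()` is called. One way to avoid it, sketched here under the assumption that only the numeric columns matter for the heatmap, is to select those columns first. The small `df_demo` frame and its placeholder addresses are hypothetical and stand in for the real data.

```python
import pandas as pd

# Tiny illustrative frame; the real data has five numeric features,
# Price, and Address. Address values here are placeholders.
df_demo = pd.DataFrame({
    "Avg. Area Income": [79545.46, 79248.64, 61287.07, 63345.24],
    "Price": [1.059034e6, 1.505891e6, 1.058988e6, 1.260617e6],
    "Address": ["addr 0", "addr 1", "addr 2", "addr 3"],
})

# Keep only numeric columns so the text column never reaches corr().
numeric_df = df_demo.select_dtypes(include="number")
corr = numeric_df.corr()
print(corr.columns.tolist())
```

The resulting `corr` frame can be passed to `sns.heatmap(corr, annot=True)` in the same way as before.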
X = df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
        'Avg. Area Number of Bedrooms', 'Area Population']]

y = df['Price']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101)
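
With `test_size=0.4`, 40% of the 5000 rows are held out for testing and the remaining 60% are used for training. The sketch below checks the resulting sizes on placeholder arrays that have the same shape as the feature matrix (5000 rows, 5 columns). `X_demo` and `y_demo` are hypothetical names and contain no real housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the dataset's shape: 5000 rows, 5 features.
X_demo = np.arange(5000 * 5).reshape(5000, 5)
y_demo = np.arange(5000)

# Same split settings as the notebook cell.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)
print(X_tr.shape, X_te.shape)
```

Fixing `random_state` makes the shuffle reproducible, so rerunning the cell yields the same train and test rows.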

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)

LinearRegression()
print(lr.intercept_)
-2640159.7968526953
coeff_df = pd.DataFrame(lr.coef_,X.columns,columns=['Coefficient'])
coeff_df
Coefficient
Avg. Area Income 21.528276
Avg. Area House Age 164883.282027
Avg. Area Number of Rooms 122368.678027
Avg. Area Number of Bedrooms 2233.801864
Area Population 15.150420
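
Each coefficient is the estimated change in price for a one-unit increase in that feature with the others held fixed, and a prediction is the intercept plus the sum of coefficient times feature value. As a rough illustration, the sketch below applies the printed intercept and coefficients to row 0 from `df.head()`. The coefficients are the rounded values shown above, so the result only approximates what `lr.predict` would return, and whether row 0 landed in the training or the test split is not known from the output.

```python
# Intercept and coefficients copied from the printed model output.
intercept = -2640159.7968526953
coefficients = {
    "Avg. Area Income": 21.528276,
    "Avg. Area House Age": 164883.282027,
    "Avg. Area Number of Rooms": 122368.678027,
    "Avg. Area Number of Bedrooms": 2233.801864,
    "Area Population": 15.150420,
}

# Feature values of row 0 as shown in df.head().
row0 = {
    "Avg. Area Income": 79545.458574,
    "Avg. Area House Age": 5.682861,
    "Avg. Area Number of Rooms": 7.009188,
    "Avg. Area Number of Bedrooms": 4.09,
    "Area Population": 23086.800503,
}

# Linear model: intercept + sum(coefficient * feature value).
predicted_price = intercept + sum(
    coefficients[name] * row0[name] for name in coefficients
)
print(f"Predicted price for row 0: {predicted_price:,.2f}")
```

The actual price recorded for row 0 is 1.059034e+06, so the difference between the two numbers is that row's residual.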
predictions = lr.predict(X_test)

plt.scatter(y_test,predictions)

<matplotlib.collections.PathCollection at 0x26b8bb4c4c0>

sns.distplot((y_test-predictions),bins=50)
C:\Users\shrey\AppData\Local\Temp\ipykernel_18444\1061164399.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  sns.distplot((y_test-predictions),bins=50)
<Axes: xlabel='Price', ylabel='Density'>

from sklearn import metrics

print('MAE:', metrics.mean_absolute_error(y_test, predictions))


print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test,
predictions)))
MAE: 82288.22251914942
MSE: 10460958907.20898
RMSE: 102278.82922290899
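
The three metrics are all summaries of the residuals (actual minus predicted): MAE averages their absolute values, MSE averages their squares, and RMSE is the square root of MSE. The sketch below computes them by hand on toy numbers; `regression_errors` is a hypothetical helper written for illustration, not part of scikit-learn. It also checks that the reported RMSE is the square root of the reported MSE.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    return mae, mse, math.sqrt(mse)


# Toy values chosen so the arithmetic is easy to follow by hand.
mae, mse, rmse = regression_errors([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(mae, mse, rmse)

# Consistency check on the figures printed by the notebook.
reported_mse = 10460958907.20898
reported_rmse = 102278.82922290899
consistent = math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-9)
print(consistent)
```

Because RMSE is in the same units as the target, it can be compared with the mean price from `df.describe()` (1.232073e+06): the reported RMSE is about 8% of that mean.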
