Linear Regression Using Python
My neighbor has asked me if I could help her out with my new data science skills. I say yes, and
decide that linear regression might be a good way to solve this problem!
My neighbor then gives me some information about a bunch of houses in regions of
India; it is all in the data set INDIA_Housing.csv.
The data contains the following columns:
'Avg. Area Income': Avg. income of residents of the city the house is located in
'Avg. Area House Age': Avg. age of houses in the same city
'Avg. Area Number of Rooms': Avg. number of rooms for houses in the same city
'Avg. Area Number of Bedrooms': Avg. number of bedrooms for houses in the same city
'Area Population': Population of the city the house is located in
'Price': Price that the house sold at
'Address': Address for the house
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
INDIAhousing = pd.read_csv('INDIA_Housing.csv')  # file name as given in the description above
INDIAhousing.head()
Out[2]:
   Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price  Address
2      61287.067179             5.865890                   8.512727                          5.13     36882.159400  1.058988e+06  9127 Elizabeth Stravenue\nDanieltown, WI 06482..
3      63345.240046             7.188236                   5.586729                          3.26     34310.242831  1.260617e+06  USS Barnett\nFPO AP 44820
4      59982.197226             5.040555                   7.839388                          4.23     26354.109472  6.309435e+05  USNS Raymond\nFPO AE 09386
In [3]: INDIAhousing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Avg. Area Income 5000 non-null float64
1 Avg. Area House Age 5000 non-null float64
2 Avg. Area Number of Rooms 5000 non-null float64
3 Avg. Area Number of Bedrooms 5000 non-null float64
4 Area Population 5000 non-null float64
5 Price 5000 non-null float64
6 Address 5000 non-null object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB
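The info() output shows six float64 columns and one object column (Address). A linear model needs numeric inputs, so one possible way to separate the numeric columns from the text column is sketched here. The small DataFrame is only a stand-in for INDIAhousing (three rows copied from the head() output, with shortened addresses), and the names `sample`, `numeric` and `missing` are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for INDIAhousing: three rows copied from the head() output,
# with the Address strings shortened.
sample = pd.DataFrame({
    'Avg. Area Income': [61287.067179, 63345.240046, 59982.197226],
    'Avg. Area House Age': [5.865890, 7.188236, 5.040555],
    'Avg. Area Number of Rooms': [8.512727, 5.586729, 7.839388],
    'Avg. Area Number of Bedrooms': [5.13, 3.26, 4.23],
    'Area Population': [36882.159400, 34310.242831, 26354.109472],
    'Price': [1.058988e+06, 1.260617e+06, 6.309435e+05],
    'Address': ['9127 Elizabeth Stravenue', 'USS Barnett', 'USNS Raymond'],
})

# Keep only the numeric columns; the text column 'Address' is left out.
numeric = sample.select_dtypes(include='number')

# Count missing values per numeric column.
missing = numeric.isnull().sum()

print(list(numeric.columns))
print(missing.to_dict())
```

On the full data set the same two calls could be applied to INDIAhousing directly.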
In [4]: INDIAhousing.describe()
In [5]: INDIAhousing.columns
In [6]: sns.pairplot(INDIAhousing)
Out[6]: <seaborn.axisgrid.PairGrid at 0x21385c9ef50>
In [7]: sns.distplot(INDIAhousing['Price'])
C:\Users\HP\AppData\Local\Temp\ipykernel_4772\867072288.py:1: UserWarning:
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(INDIAhousing['Price'])
In [8]: sns.distplot(INDIAhousing['Area Population'])
Out[8]: <Axes: xlabel='Area Population', ylabel='Density'>
In [13]: sns.distplot(INDIAhousing['Area Population'])
Out[13]: <Axes: xlabel='Area Population', ylabel='Density'>
In [15]: sns.histplot(INDIAhousing['Price'])
<Axes: >
Out[22]:
Y = INDIAhousing['Price']
In [27]: lm = LinearRegression()
In [28]: lm.fit(X_train,Y_train)
Out[28]: LinearRegression()
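The cells that define X, split the data and import LinearRegression are not shown here, so the following is only a sketch of how those steps could fit together. It runs on synthetic data rather than INDIA_Housing.csv; the two-feature DataFrame `df`, the values test_size=0.4 and random_state=101, and the name `coeff_df` are all assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for INDIAhousing (NOT the real data): two features and a
# price built from a known linear rule plus noise.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Avg. Area Income": rng.normal(60000, 10000, n),
    "Avg. Area House Age": rng.normal(6, 1, n),
})
df["Price"] = (20.0 * df["Avg. Area Income"]
               + 150000.0 * df["Avg. Area House Age"]
               + rng.normal(0, 1000, n))

X = df[["Avg. Area Income", "Avg. Area House Age"]]
Y = df["Price"]

# test_size and random_state are assumed values, not taken from the notebook.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.4, random_state=101)

lm = LinearRegression()
lm.fit(X_train, Y_train)

# One coefficient per feature, labelled by column name.
coeff_df = pd.DataFrame(lm.coef_, index=X.columns, columns=["Coefficient"])

# Predictions on the held-out rows, as used by the later plots.
predictions = lm.predict(X_test)

print(lm.intercept_)
print(coeff_df)
```

Because the synthetic price is generated from known slopes (20 and 150000), the fitted coefficients should land close to those values, which makes the sketch easy to sanity-check.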
-2640159.79685267
Out[30]: Coefficient
Holding all other features fixed, a 1 unit increase in Avg. Area Income is associated with an
increase of $21.52 in the predicted price.
Holding all other features fixed, a 1 unit increase in Avg. Area House Age is associated with
an increase of $164883.28 in the predicted price.
Holding all other features fixed, a 1 unit increase in Avg. Area Number of Rooms is
associated with an increase of $122368.67 in the predicted price.
Holding all other features fixed, a 1 unit increase in Avg. Area Number of Bedrooms is
associated with an increase of $2233.80 in the predicted price.
Holding all other features fixed, a 1 unit increase in Area Population is associated with an
increase of $15.15 in the predicted price.
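The interpretation above can be checked with plain arithmetic: a linear model predicts the intercept plus the sum of each coefficient times its feature value. The sketch below assumes that the value -2640159.79685267 printed earlier is the intercept, uses the rounded coefficients quoted in the text, and takes the feature values of the row with index 2 from the head() output. The helper `predict_price` is hypothetical and written only for this illustration.

```python
# Assumed intercept (the value printed before the coefficient table) and the
# rounded coefficients quoted in the interpretation text.
intercept = -2640159.79685267
coef = {
    'Avg. Area Income': 21.52,
    'Avg. Area House Age': 164883.28,
    'Avg. Area Number of Rooms': 122368.67,
    'Avg. Area Number of Bedrooms': 2233.80,
    'Area Population': 15.15,
}


def predict_price(features):
    """Linear prediction: intercept + sum of coefficient * feature value."""
    return intercept + sum(coef[name] * value for name, value in features.items())


# Feature values copied from the row with index 2 in the head() output.
house = {
    'Avg. Area Income': 61287.067179,
    'Avg. Area House Age': 5.865890,
    'Avg. Area Number of Rooms': 8.512727,
    'Avg. Area Number of Bedrooms': 5.13,
    'Area Population': 36882.159400,
}

base = predict_price(house)

# Raise income by one unit while holding every other feature fixed.
bumped = predict_price({**house, 'Avg. Area Income': house['Avg. Area Income'] + 1})
diff = bumped - base

print(round(diff, 2))
```

The difference between the two predictions equals the income coefficient, which is exactly what "holding all other features fixed" means.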
Does this make sense? Probably not, because this data is made up. If you want real data to
repeat this sort of analysis, check out the Boston dataset:
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
boston = load_boston()
print(boston.DESCR)
boston_df = boston.data
In [32]: plt.scatter(Y_test,predictions)
Out[32]: <matplotlib.collections.PathCollection at 0x2138b335750>
Residual Histogram
In [33]: sns.distplot((Y_test-predictions),bins=50);
In [34]: sns.histplot((Y_test-predictions),bins=50);
MAE: 82288.22251914942
MSE: 1.0461e+10 (approximately, the square of the RMSE)
RMSE: 102278.82922290884
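The cell that computes these metrics is not shown, so here is a minimal sketch of how the three values relate, written in plain Python on made-up numbers (`y_true` and `y_pred` are hypothetical, not the notebook's Y_test and predictions). scikit-learn's sklearn.metrics module offers mean_absolute_error and mean_squared_error for the same quantities.

```python
import math

# Hypothetical true and predicted prices (made-up numbers, not the notebook's data).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 330.0, 380.0]

# Residuals: true value minus prediction.
errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e ** 2 for e in errors) / len(errors)   # mean squared error
rmse = math.sqrt(mse)                             # root mean squared error

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
```

Since RMSE is the square root of MSE, the two printed values can always be checked against each other; MAE and MSE coincide only by accident.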
Thank You