
Ash Regression

This notebook loads and explores a housing dataset with pandas and fits a linear regression to it. It reads the data, cleans it by dropping rows with missing values and removing the CHAS and ZN columns, explores relationships between variables with scatter plots, and builds a scikit-learn linear regression model to predict prices.

Uploaded by sarpateashish5
Copyright © All Rights Reserved

hs4e8yoxb

May 8, 2024

[1]: from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

[2]: import pandas as pd
df=pd.read_csv('/content/drive/MyDrive/rgression_dataset/HousingData.csv')
df

[2]:
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT  MEDV
0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90   4.98  24.0
1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90   9.14  21.6
2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03  34.7
3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94  33.4
4    0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90    NaN  36.2
..       ...   ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...    ...   ...
501  0.06263   0.0  11.93   0.0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99    NaN  22.4
502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90   9.08  20.6
503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90   5.64  23.9
504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45   6.48  22.0
505  0.04741   0.0  11.93   0.0  0.573  6.030   NaN  2.5050    1  273     21.0  396.90   7.88  11.9

[506 rows x 14 columns]

[3]: df.rename(columns={'MEDV':'Price'},inplace=True)

[4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CRIM 486 non-null float64
1 ZN 486 non-null float64
2 INDUS 486 non-null float64
3 CHAS 486 non-null float64
4 NOX 506 non-null float64
5 RM 506 non-null float64
6 AGE 486 non-null float64
7 DIS 506 non-null float64
8 RAD 506 non-null int64
9 TAX 506 non-null int64
10 PTRATIO 506 non-null float64
11 B 506 non-null float64
12 LSTAT 486 non-null float64
13 Price 506 non-null float64
dtypes: float64(12), int64(2)
memory usage: 55.5 KB
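`df.info()` reports 486 non-null values (so 20 missing) in CRIM, ZN, INDUS, CHAS, AGE and LSTAT. A quicker way to see the gaps per column, and how many rows would survive a `dropna()`, is `isna().sum()`. The sketch below runs on a small made-up DataFrame; its columns and values are placeholders, not the housing data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing DataFrame: 5 rows with a few NaNs
df = pd.DataFrame({
    "CRIM": [0.006, np.nan, 0.027, 0.032, 0.069],
    "AGE": [65.2, 78.9, np.nan, 45.8, 54.2],
    "Price": [24.0, 21.6, 34.7, 33.4, 36.2],
})

missing_per_column = df.isna().sum()               # NaN count for each column
complete_rows = int(df.notna().all(axis=1).sum())  # rows that dropna() would keep

print(missing_per_column)
print("rows kept by dropna():", complete_rows, "of", len(df))
```

On the real data, `dropna()` keeps 394 of 506 rows, far fewer than the 486 a single column would allow, which indicates the missing values are spread over different rows rather than sharing the same 20.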

[5]: df=df.dropna()
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 394 entries, 0 to 504
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CRIM 394 non-null float64
1 ZN 394 non-null float64
2 INDUS 394 non-null float64
3 CHAS 394 non-null float64
4 NOX 394 non-null float64
5 RM 394 non-null float64
6 AGE 394 non-null float64
7 DIS 394 non-null float64
8 RAD 394 non-null int64
9 TAX 394 non-null int64
10 PTRATIO 394 non-null float64
11 B 394 non-null float64
12 LSTAT 394 non-null float64
13 Price 394 non-null float64
dtypes: float64(12), int64(2)
memory usage: 46.2 KB
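`dropna()` discards 112 of the 506 rows (about 22%). An alternative that this notebook does not use is to fill each gap with its column's median so that every row is kept. A minimal sketch with pandas `fillna`, again on a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (not the housing dataset)
df = pd.DataFrame({
    "CRIM": [0.006, np.nan, 0.027, 0.032],
    "LSTAT": [4.98, 9.14, np.nan, 2.94],
})

# Replace each NaN with the median of its own column; no rows are dropped
df_imputed = df.fillna(df.median())
print(df_imputed)
```

In a train/test workflow the medians would normally be computed on the training split only (for example with scikit-learn's `SimpleImputer(strategy="median")` fitted on `x_train`), so that test rows do not leak into the preprocessing.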

[6]: df

[6]:
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT  Price
0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90   4.98   24.0
1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90   9.14   21.6
2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03   34.7
3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94   33.4
5    0.02985   0.0   2.18   0.0  0.458  6.430  58.7  6.0622    3  222     18.7  394.12   5.21   28.7
..       ...   ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...    ...    ...
499  0.17783   0.0   9.69   0.0  0.585  5.569  73.5  2.3999    6  391     19.2  395.77  15.10   17.5
500  0.22438   0.0   9.69   0.0  0.585  6.027  79.7  2.4982    6  391     19.2  396.90  14.33   16.8
502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90   9.08   20.6
503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90   5.64   23.9
504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45   6.48   22.0

[394 rows x 14 columns]

[7]: df.drop('CHAS',axis=1,inplace=True)

<ipython-input-7-0b8b043076ef>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop('CHAS',axis=1,inplace=True)

[8]: df.drop('ZN',axis=1,inplace =True )

<ipython-input-8-2e9e1705e643>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop('ZN',axis=1,inplace =True )
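Both warnings come from modifying, with `inplace=True`, the frame returned by `df.dropna()`: pandas treats that result as a possible slice of the earlier frame and warns that the edit may not propagate. Building the cleaned frame in one chained expression, or taking an explicit `.copy()` after `dropna()`, yields an independent DataFrame and avoids the warning. A sketch on a made-up frame that reuses the notebook's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame containing the columns the notebook drops
raw = pd.DataFrame({
    "CRIM": [0.006, 0.027, np.nan],
    "ZN": [18.0, 0.0, 0.0],
    "CHAS": [0.0, 0.0, 1.0],
    "Price": [24.0, 21.6, 34.7],
})

# One chained expression: drop incomplete rows, then the unwanted columns.
# Each step returns a new DataFrame; nothing is modified in place.
df = raw.dropna().drop(columns=["CHAS", "ZN"])

# Equivalent alternative: take an explicit copy before any in-place edit
df_alt = raw.dropna().copy()
df_alt.drop(columns=["CHAS", "ZN"], inplace=True)

print(df)
```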

[9]: df

[9]:
        CRIM  INDUS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT  Price
0    0.00632   2.31  0.538  6.575  65.2  4.0900    1  296     15.3  396.90   4.98   24.0
1    0.02731   7.07  0.469  6.421  78.9  4.9671    2  242     17.8  396.90   9.14   21.6
2    0.02729   7.07  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03   34.7
3    0.03237   2.18  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94   33.4
5    0.02985   2.18  0.458  6.430  58.7  6.0622    3  222     18.7  394.12   5.21   28.7
..       ...    ...    ...    ...   ...     ...  ...  ...      ...     ...    ...    ...
499  0.17783   9.69  0.585  5.569  73.5  2.3999    6  391     19.2  395.77  15.10   17.5
500  0.22438   9.69  0.585  6.027  79.7  2.4982    6  391     19.2  396.90  14.33   16.8
502  0.04527  11.93  0.573  6.120  76.7  2.2875    1  273     21.0  396.90   9.08   20.6
503  0.06076  11.93  0.573  6.976  91.0  2.1675    1  273     21.0  396.90   5.64   23.9
504  0.10959  11.93  0.573  6.794  89.3  2.3889    1  273     21.0  393.45   6.48   22.0

[394 rows x 12 columns]

[10]: import matplotlib.pyplot as plt

df.plot.scatter(x='CRIM',y='Price')

[10]: <Axes: xlabel='CRIM', ylabel='Price'>

[11]: df.plot.scatter(x='TAX',y='Price')

[11]: <Axes: xlabel='TAX', ylabel='Price'>

[12]: df.plot.scatter(x='B',y='Price')

[12]: <Axes: xlabel='B', ylabel='Price'>

[13]: df.plot.scatter(x='LSTAT',y='B')

[13]: <Axes: xlabel='LSTAT', ylabel='B'>
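The scatter plots compare one feature with Price at a time. A compact complement is the Pearson correlation of every column with the target via `DataFrame.corr()`; it only measures linear association, so it supplements the plots rather than replacing them. The frame below is synthetic: the formula that generates Price is an assumption chosen to give one positive and one negative relationship.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.3, 0.7, n)    # made-up "rooms" feature
lstat = rng.uniform(2, 35, n)   # made-up "lower-status %" feature
price = 5 * rm - 0.5 * lstat + rng.normal(0, 2, n)

df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "Price": price})

# Correlation of each feature with Price, most negative first
corr_with_price = df.corr()["Price"].drop("Price").sort_values()
print(corr_with_price)
```

On the notebook's `df`, the same last two lines would rank all eleven remaining features by their linear association with Price.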

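[14]: import numpy as np
from sklearn import linear_model  # not used below; LinearRegression is imported in cell [16]

# Features: every column except the target; target: Price
x=df.drop('Price',axis=1)
y=df['Price']
x = np.array(x)
# reshape() returns a new array and the result is not assigned, so x keeps its
# (394, 11) shape; this line only displays a flattened (1, 4334) copy
x.reshape(1,-1)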

[14]: array([[6.3200e-03, 2.3100e+00, 5.3800e-01, ..., 2.1000e+01, 3.9345e+02,
        6.4800e+00]])
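`reshape` returns a new array object (a view when possible) and leaves the original unchanged, so the unassigned call in cell [14] affects only what is displayed; the model later receives `x` in the 2-D (samples, features) layout scikit-learn expects. A small check on a made-up 3 × 2 array:

```python
import numpy as np

x = np.arange(6.0).reshape(3, 2)  # 3 samples, 2 features (made-up values)

flat = x.reshape(1, -1)           # new array: 1 row holding all 6 values

print(x.shape, flat.shape)        # (3, 2) (1, 6): x itself is unchanged
```

`reshape(1, -1)` is more commonly used to turn one sample into a 2-D input, for example `model.predict(one_row.reshape(1, -1))`, rather than on the whole feature matrix.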

[15]: from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
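With `test_size=0.2`, scikit-learn rounds the test size up, so the 394 cleaned rows split into ceil(0.2 × 394) = 79 test rows and 315 training rows, which matches the `(315,)` shape shown in cell [27]. The sketch reproduces the arithmetic on zero-filled placeholder arrays with the notebook's shapes (394 × 11):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the notebook's shapes: 394 rows, 11 features
x = np.zeros((394, 11))
y = np.zeros(394)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=4
)
print(x_train.shape, x_test.shape)  # (315, 11) (79, 11)
```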

[16]: from sklearn.linear_model import LinearRegression

model=LinearRegression()
model.fit(x_train,y_train)

[16]: LinearRegression()
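`LinearRegression` fits price = intercept_ + Σ coef_[i] · x_i by ordinary least squares, and inspecting `coef_` and `intercept_` shows what the model learned. On noise-free synthetic data (the true intercept 5 and coefficients 2 and -3 are made up for the demo) it recovers the parameters up to floating-point error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))             # 50 samples, 2 made-up features
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]  # exact linear target, no noise

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)
```

On the housing features the coefficients carry different units (CRIM versus TAX, for instance), so their magnitudes are only directly comparable after standardizing the inputs.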

[17]: y_pred=model.predict(x_test)

[18]: model.score(x_test,y_test)

[18]: 0.6975020387554531

[19]: model.score(x_train,y_train)

[19]: 0.7725543921852038
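The training R^2 (0.77) is higher than the test R^2 (0.70), and both come from a single split (`random_state=4`). K-fold cross-validation averages the score over several splits and gives a steadier estimate. A sketch with `cross_val_score` on synthetic data (the generating line is an assumption; in the notebook one would pass `x` and `y` instead):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# 5 shuffled folds; each fold is held out once while the model trains on the rest
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print(scores.round(3), "mean R^2:", scores.mean().round(3))
```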

[20]: import numpy as np
from sklearn import metrics

print("MAE:",metrics.mean_absolute_error(y_test,y_pred))
print("MSE:",metrics.mean_squared_error(y_test,y_pred))
print("RMSE:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

MAE: 3.81048024985643
MSE: 37.14034003813928
RMSE: 6.094287492245445
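The three error metrics, and the R^2 that `model.score` reports, all follow from the residuals y_test - y_pred. Computing them by hand on a tiny made-up example shows what each measures: MAE averages absolute errors, MSE averages squared errors (so large misses weigh more), RMSE brings MSE back to the target's units, and R^2 = 1 - SS_res / SS_tot compares the model with always predicting the mean.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])  # made-up targets
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # made-up predictions

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))          # 0.875
mse = np.mean(residuals ** 2)             # 1.3125
rmse = np.sqrt(mse)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                  # the quantity LinearRegression.score returns

print(mae, mse, rmse, r2)
```

In the notebook, RMSE (6.09) is noticeably larger than MAE (3.81), which suggests that a few test houses have large prediction errors.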

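[21]: # model.score returns R^2 (the coefficient of determination), not a classification
# accuracy, so the percentage below is R^2 x 100 rather than a share of correct predictions
print("Accuracy:",model.score(x_test,y_test)*100,"%")

Accuracy: 69.75020387554531 %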

[27]: x_train.shape  # not echoed: only a cell's last expression is displayed; it is (315, 11)
y_train.shape

[27]: (315,)

[23]: plt.scatter(x_train,y_train, color = 'red')
plt.plot(x_train,model.predict(x_train), color='blue')
plt.x_label("crime")
plt.y_label("house prices")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-07e70f9b4ff2> in <cell line: 1>()
----> 1 plt.scatter(x_train,y_train, color = 'red')
      2 plt.plot(x_train,model.predict(x_train), color='blue')
      3 plt.x_label("crime")
      4 plt.y_label("house prices")

/usr/local/lib/python3.10/dist-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, data, **kwargs)
   2860 vmin=None, vmax=None, alpha=None, linewidths=None, *,
   2861 edgecolors=None, plotnonfinite=False, data=None, **kwargs):
-> 2862 __ret = gca().scatter(
   2863 x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
   2864 vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,

/usr/local/lib/python3.10/dist-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1440 def inner(ax, *args, data=None, **kwargs):
   1441 if data is None:
-> 1442 return func(ax, *map(sanitize_sequence, args), **kwargs)
   1443
   1444 bound = new_sig.bind(ax, *args, **kwargs)

/usr/local/lib/python3.10/dist-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, **kwargs)
   4582 y = np.ma.ravel(y)
   4583 if x.size != y.size:
-> 4584 raise ValueError("x and y must be the same size")
   4585
   4586 if s is None:

ValueError: x and y must be the same size
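The error arises because `x_train` is a 2-D array with 11 feature columns (315 × 11 = 3465 values) while `y_train` holds 315 prices; in addition, pyplot's label functions are `plt.xlabel` and `plt.ylabel`, not `x_label` and `y_label`. One possible fix, sketched below on synthetic stand-ins for `x_train`, `y_train` and `model`, plots a single column against actual and predicted prices. Column 0 is CRIM because `x` was built from `df.drop('Price', axis=1)`. Since the model uses all 11 features, its predictions do not lie on a line in CRIM, so they are drawn as points instead of with `plt.plot`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins with the notebook's shapes (values are made up)
rng = np.random.default_rng(3)
x_train = rng.uniform(0, 10, size=(315, 11))
y_train = 30 - 1.2 * x_train[:, 0] + rng.normal(0, 2, 315)
model = LinearRegression().fit(x_train, y_train)

crim = x_train[:, 0]  # one feature column, same length as y_train

plt.scatter(crim, y_train, color="red", label="actual")
plt.scatter(crim, model.predict(x_train), color="blue", label="predicted")
plt.xlabel("crime")
plt.ylabel("house prices")
plt.legend()
```

In the notebook itself, only the lines from `crim = x_train[:, 0]` onward are needed, and the `matplotlib.use("Agg")` line should be left out so the figure renders inline.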
