DSBDA 4

The document shows code for fitting a simple linear regression with NumPy's polyfit and for loading and exploring the Boston housing dataset using pandas and scikit-learn. It splits the data into training and test sets, fits a linear regression model to predict housing prices (medv), and calculates metrics such as mean squared error.



In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [19]: x = np.array([95, 85, 80, 70, 60])
y = np.array([85, 95, 70, 65, 70])

In [20]: model=np.polyfit(x,y,1)

In [21]: model

Out[21]: array([ 0.64383562, 26.78082192])

In [22]: predict = np.poly1d(model)
predict(65)

Out[22]: 68.63013698630137
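np.polyfit(x, y, 1) returns the coefficients of the degree-1 fit with the highest power first, so model[0] ≈ 0.6438 is the slope and model[1] ≈ 26.7808 is the intercept. The fitted line is therefore ŷ ≈ 0.6438·x + 26.7808, and predict(65) = 0.6438·65 + 26.7808 ≈ 68.63, matching the output above.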

In [23]: y_pred = predict(x)
y_pred

Out[23]: array([87.94520548, 81.50684932, 78.28767123, 71.84931507, 65.4109589 ])

In [24]: from sklearn.metrics import r2_score
r2_score(y, y_pred)

Out[24]: 0.4803218090889326
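r2_score computes the coefficient of determination, R² = 1 − SS_res / SS_tot. A minimal hand check with the arrays above (np is already imported in In [1]):

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
1 - ss_res / ss_tot                   # ≈ 0.4803, matching r2_score above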

In [25]: y_line = model[1] + model[0] * x
plt.plot(x, y_line, c='r')
plt.scatter(x, y_pred)
plt.scatter(x, y, c='r')

Out[25]: <matplotlib.collections.PathCollection at 0x7fb3fed92f50>


In [2]: data=pd.read_csv("/home/student/Desktop/Boston.csv")

In [3]: data

Out[3]:
     Unnamed: 0     crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio   black
0             1  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90
1             2  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90
2             3  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83
3             4  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63
4             5  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90
...         ...      ...   ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...
501         502  0.06263   0.0  11.93     0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99
502         503  0.04527   0.0  11.93     0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90
503         504  0.06076   0.0  11.93     0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90
504         505  0.10959   0.0  11.93     0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45
505         506  0.04741   0.0  11.93     0  0.573  6.030  80.8  2.5050    1  273     21.0  396.90

506 rows × 15 columns (the lstat and medv columns are cut off in this view)

In [4]: data.isnull().sum()

Out[4]:
Unnamed: 0    0
crim          0
zn            0
indus         0
chas          0
nox           0
rm            0
age           0
dis           0
rad           0
tax           0
ptratio       0
black         0
lstat         0
medv          0
dtype: int64

In [5]: x = data.drop(['medv'], axis=1)
y = data['medv']
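Note that only medv is dropped here, so the 'Unnamed: 0' row-index column from the CSV remains in x as a feature. If that is unintended, a minimal sketch that drops it as well:

x = data.drop(['Unnamed: 0', 'medv'], axis=1)  # drop both the CSV index column and the target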

In [10]: from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=0)  # random_state value cut off in the export; 0 assumed here

In [11]: import sklearn
from sklearn.linear_model import LinearRegression

In [12]: lm = LinearRegression()
model = lm.fit(xtrain, ytrain)
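To inspect what was learned, the fitted intercept and per-feature coefficients can be read from the model's attributes (a minimal sketch; the exact values depend on the data and the split):

print(lm.intercept_)                   # fitted intercept
print(dict(zip(x.columns, lm.coef_)))  # one coefficient per feature column in x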

In [13]: ytrain_pred = lm.predict(xtrain)
ytest_pred = lm.predict(xtest)

In [14]: df = pd.DataFrame(ytrain_pred, ytrain)
df = pd.DataFrame(ytest_pred, ytest)
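The two assignments above overwrite df, and passing ytest as the second positional argument makes it the DataFrame's index rather than a column. Assuming the intent is to view predictions beside the true values, a minimal alternative sketch:

df = pd.DataFrame({'Actual': ytest.values, 'Predicted': ytest_pred})
df.head()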

In [15]: from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(ytest, ytest_pred)
print(mse)
mse = mean_squared_error(ytrain, ytrain_pred)
print(mse)

33.266961459239134
19.302216223048
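The test-set MSE (≈ 33.27) is noticeably higher than the training-set MSE (≈ 19.30), which is the usual generalization gap. To express the error in the same units as medv, a minimal sketch of the RMSE using the np alias imported in In [1]:

rmse_test = np.sqrt(mean_squared_error(ytest, ytest_pred))     # ≈ 5.77
rmse_train = np.sqrt(mean_squared_error(ytrain, ytrain_pred))  # ≈ 4.39
print(rmse_test, rmse_train)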

In [16]: mse = mean_squared_error(ytest, ytest_pred)
print(mse)

33.266961459239134

In [18]: plt.scatter(ytrain, ytrain_pred, c='blue', marker='o', label='Training data')
plt.scatter(ytest, ytest_pred, c='lightgreen', marker='s', label='Test data')
plt.xlabel('True values')
plt.ylabel('Predicted')
plt.title("True value vs Predicted value")
plt.legend(loc='upper left')
#plt.hlines(y=0, xmin=0, xmax=50)
plt.plot()
plt.show()

In [ ]:
