Ash Regression

This notebook loads a housing dataset with pandas, drops the rows that contain missing values and two columns (CHAS and ZN), explores relationships between variables with scatter plots, and fits a scikit-learn linear regression model to predict prices.

Uploaded by

sarpateashish5


hs4e8yoxb

May 8, 2024

[1]: from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive

[2]: import pandas as pd

df=pd.read_csv('/content/drive/MyDrive/rgression_dataset/HousingData.csv')
df

[2]:         CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  \
     0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296
     1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242
     2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242
     3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222
     4    0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622    3  222
     ..       ...   ...    ...   ...    ...    ...   ...     ...  ...  ...
     501  0.06263   0.0  11.93   0.0  0.573  6.593  69.1  2.4786    1  273
     502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273
     503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273
     504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273
     505  0.04741   0.0  11.93   0.0  0.573  6.030   NaN  2.5050    1  273

          PTRATIO       B  LSTAT  MEDV
     0       15.3  396.90   4.98  24.0
     1       17.8  396.90   9.14  21.6
     2       17.8  392.83   4.03  34.7
     3       18.7  394.63   2.94  33.4
     4       18.7  396.90    NaN  36.2
     ..       ...     ...    ...   ...
     501     21.0  391.99    NaN  22.4
     502     21.0  396.90   9.08  20.6
     503     21.0  396.90   5.64  23.9
     504     21.0  393.45   6.48  22.0
     505     21.0  396.90   7.88  11.9

     [506 rows x 14 columns]

[3]: df.rename(columns={'MEDV':'Price'},inplace=True)

[4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   CRIM     486 non-null    float64
 1   ZN       486 non-null    float64
 2   INDUS    486 non-null    float64
 3   CHAS     486 non-null    float64
 4   NOX      506 non-null    float64
 5   RM       506 non-null    float64
 6   AGE      486 non-null    float64
 7   DIS      506 non-null    float64
 8   RAD      506 non-null    int64
 9   TAX      506 non-null    int64
 10  PTRATIO  506 non-null    float64
 11  B        506 non-null    float64
 12  LSTAT    486 non-null    float64
 13  Price    506 non-null    float64
dtypes: float64(12), int64(2)
memory usage: 55.5 KB

[5]: df=df.dropna()
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 394 entries, 0 to 504
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   CRIM     394 non-null    float64
 1   ZN       394 non-null    float64
 2   INDUS    394 non-null    float64
 3   CHAS     394 non-null    float64
 4   NOX      394 non-null    float64
 5   RM       394 non-null    float64
 6   AGE      394 non-null    float64
 7   DIS      394 non-null    float64
 8   RAD      394 non-null    int64
 9   TAX      394 non-null    int64
 10  PTRATIO  394 non-null    float64
 11  B        394 non-null    float64
 12  LSTAT    394 non-null    float64
 13  Price    394 non-null    float64
dtypes: float64(12), int64(2)
memory usage: 46.2 KB
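
The row count falls from 506 to 394 because dropna() discards every row that has at least one missing value, so 112 rows are lost even though each affected column is missing only 20 entries. The sketch below shows one way to count missing values before dropping anything. The frame `toy` and its values are hypothetical stand-ins for HousingData.csv, not data from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for HousingData.csv.
toy = pd.DataFrame({
    "CRIM": [0.006, 0.027, np.nan, 0.032],
    "AGE": [65.2, np.nan, 61.1, 45.8],
    "Price": [24.0, 21.6, 34.7, 33.4],
})

# Number of NaN entries in each column.
missing_per_column = toy.isna().sum()

# Number of rows that contain at least one NaN; these are the rows dropna() removes.
rows_with_missing = int(toy.isna().any(axis=1).sum())

cleaned = toy.dropna()

print(missing_per_column)
print("rows removed:", rows_with_missing, "rows kept:", len(cleaned))
```

Checking these counts first makes it easier to judge whether dropping rows is acceptable or whether too much of the dataset would be discarded.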

[6]: df

[6]:         CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  \
     0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296
     1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242
     2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242
     3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222
     5    0.02985   0.0   2.18   0.0  0.458  6.430  58.7  6.0622    3  222
     ..       ...   ...    ...   ...    ...    ...   ...     ...  ...  ...
     499  0.17783   0.0   9.69   0.0  0.585  5.569  73.5  2.3999    6  391
     500  0.22438   0.0   9.69   0.0  0.585  6.027  79.7  2.4982    6  391
     502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273
     503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273
     504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273

          PTRATIO       B  LSTAT  Price
     0       15.3  396.90   4.98   24.0
     1       17.8  396.90   9.14   21.6
     2       17.8  392.83   4.03   34.7
     3       18.7  394.63   2.94   33.4
     5       18.7  394.12   5.21   28.7
     ..       ...     ...    ...    ...
     499     19.2  395.77  15.10   17.5
     500     19.2  396.90  14.33   16.8
     502     21.0  396.90   9.08   20.6
     503     21.0  396.90   5.64   23.9
     504     21.0  393.45   6.48   22.0

     [394 rows x 14 columns]

[7]: df.drop('CHAS',axis=1,inplace=True)

<ipython-input-7-0b8b043076ef>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop('CHAS',axis=1,inplace=True)

[8]: df.drop('ZN',axis=1,inplace =True )

<ipython-input-8-2e9e1705e643>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop('ZN',axis=1,inplace =True )
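
Both drops above still take effect, but each one raises a SettingWithCopyWarning because `df` came from dropna() and is then modified with inplace=True. One common way to sidestep the warning is to reassign the result instead of modifying in place. The following is a minimal sketch of that pattern on a hypothetical miniature frame (`raw` and its values are invented for illustration), not a rerun of the notebook.

```python
import pandas as pd

# Hypothetical miniature frame; the column names mirror the notebook.
raw = pd.DataFrame({
    "CRIM": [0.006, None, 0.027],
    "ZN": [18.0, 0.0, 0.0],
    "CHAS": [0.0, 0.0, 0.0],
    "Price": [24.0, 21.6, 34.7],
})

# Drop incomplete rows, then drop both columns by reassignment
# rather than with inplace=True.
clean = raw.dropna().drop(columns=["CHAS", "ZN"])

print(list(clean.columns))
print(len(clean))
```

Chaining the two calls also removes both columns in a single step, which keeps the cleaning logic in one place.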

[9]: df

[9]:         CRIM  INDUS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  \
     0    0.00632   2.31  0.538  6.575  65.2  4.0900    1  296     15.3  396.90
     1    0.02731   7.07  0.469  6.421  78.9  4.9671    2  242     17.8  396.90
     2    0.02729   7.07  0.469  7.185  61.1  4.9671    2  242     17.8  392.83
     3    0.03237   2.18  0.458  6.998  45.8  6.0622    3  222     18.7  394.63
     5    0.02985   2.18  0.458  6.430  58.7  6.0622    3  222     18.7  394.12
     ..       ...    ...    ...    ...   ...     ...  ...  ...      ...     ...
     499  0.17783   9.69  0.585  5.569  73.5  2.3999    6  391     19.2  395.77
     500  0.22438   9.69  0.585  6.027  79.7  2.4982    6  391     19.2  396.90
     502  0.04527  11.93  0.573  6.120  76.7  2.2875    1  273     21.0  396.90
     503  0.06076  11.93  0.573  6.976  91.0  2.1675    1  273     21.0  396.90
     504  0.10959  11.93  0.573  6.794  89.3  2.3889    1  273     21.0  393.45

          LSTAT  Price
     0     4.98   24.0
     1     9.14   21.6
     2     4.03   34.7
     3     2.94   33.4
     5     5.21   28.7
     ..     ...    ...
     499  15.10   17.5
     500  14.33   16.8
     502   9.08   20.6
     503   5.64   23.9
     504   6.48   22.0

     [394 rows x 12 columns]

[10]: import matplotlib.pyplot as plt

df.plot.scatter(x='CRIM',y='Price')

[10]: <Axes: xlabel='CRIM', ylabel='Price'>

[11]: df.plot.scatter(x='TAX',y='Price')

[11]: <Axes: xlabel='TAX', ylabel='Price'>

[12]: df.plot.scatter(x='B',y='Price')

[12]: <Axes: xlabel='B', ylabel='Price'>

[13]: df.plot.scatter(x='LSTAT',y='B')

[13]: <Axes: xlabel='LSTAT', ylabel='B'>

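[14]: import numpy as np

from sklearn import linear_model

x=df.drop('Price',axis=1)  # feature matrix: every column except the target
y=df['Price']              # target vector
x = np.array(x)
# reshape returns a new array and leaves x unchanged; the result is only displayed below
x.reshape(1,-1)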

[14]: array([[6.3200e-03, 2.3100e+00, 5.3800e-01, …, 2.1000e+01, 3.9345e+02,

6.4800e+00]])

[15]: from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
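
With 394 rows and test_size=0.2, the split leaves 315 rows for training and 79 for testing, and each row keeps its 11 feature columns. The sketch below reproduces only those shapes. The zero-filled arrays `x_demo` and `y_demo` are placeholders with the notebook's dimensions, not the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the notebook's dimensions: 394 rows, 11 features.
x_demo = np.zeros((394, 11))
y_demo = np.zeros(394)

# Same arguments as the notebook's call.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=4
)

print(x_tr.shape, x_te.shape, y_tr.shape, y_te.shape)
```

The training features form a two-dimensional array, while the training target is one-dimensional.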


[16]: from sklearn.linear_model import LinearRegression

model=LinearRegression()
model.fit(x_train,y_train)

[16]: LinearRegression()

[17]: y_pred=model.predict(x_test)

[18]: model.score(x_test,y_test)

[18]: 0.6975020387554531

[19]: model.score(x_train,y_train)

[19]: 0.7725543921852038
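
For a regressor, model.score returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, so the two values above say the model explains roughly 70% of the price variance on the test set and 77% on the training set. As a rough illustration of the formula, the sketch below computes R^2 by hand. The arrays `y_true` and `y_hat` are made-up values, not the notebook's predictions.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print(r2)
```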

[20]: import numpy as np

from sklearn import metrics

print("MAE:",metrics.mean_absolute_error(y_test,y_pred))
print("MSE:",metrics.mean_squared_error(y_test,y_pred))
print("RMSE:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

MAE: 3.81048024985643
MSE: 37.14034003813928
RMSE: 6.094287492245445
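
The three metrics are all averages over the prediction errors: MAE is the mean absolute error, MSE is the mean squared error, and RMSE is the square root of MSE (here, the square root of 37.14 is about 6.09). A minimal sketch of the same calculations in plain NumPy follows. The arrays are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))  # mean absolute error
mse = np.mean(errors ** 2)     # mean squared error
rmse = np.sqrt(mse)            # root mean squared error, in the target's units

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```

Because RMSE is expressed in the same units as the target, it is often the easiest of the three to compare against typical prices.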

[21]: # Note: for a regressor, model.score returns R^2, not a classification accuracy.
print("Accuracy:",model.score(x_test,y_test)*100,"%")

Accuracy: 69.75020387554531 %

[27]: x_train.shape

y_train.shape

[27]: (315,)

[23]: plt.scatter(x_train,y_train, color = 'red')

plt.plot(x_train,model.predict(x_train), color='blue')
plt.x_label("crime")
plt.y_label("house prices")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-07e70f9b4ff2> in <cell line: 1>()
----> 1 plt.scatter(x_train,y_train, color = 'red')
      2 plt.plot(x_train,model.predict(x_train), color='blue')
      3 plt.x_label("crime")
      4 plt.y_label("house prices")

/usr/local/lib/python3.10/dist-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, data, **kwargs)
   2860         vmin=None, vmax=None, alpha=None, linewidths=None, *,
   2861         edgecolors=None, plotnonfinite=False, data=None, **kwargs):
-> 2862     __ret = gca().scatter(
   2863         x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
   2864         vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,

/usr/local/lib/python3.10/dist-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1440     def inner(ax, *args, data=None, **kwargs):
   1441         if data is None:
-> 1442             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1443
   1444         bound = new_sig.bind(ax, *args, **kwargs)

/usr/local/lib/python3.10/dist-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, **kwargs)
   4582         y = np.ma.ravel(y)
   4583         if x.size != y.size:
-> 4584             raise ValueError("x and y must be the same size")
   4585
   4586         if s is None:

ValueError: x and y must be the same size
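
The call fails because x_train holds all 11 feature columns (315 x 11 values) while y_train holds 315 values, so scatter cannot pair them up. Separately, pyplot provides xlabel and ylabel, not x_label and y_label, so the last two lines would also have failed. The sketch below shows one possible fix: plot a single feature against the price and draw the line from a model fitted on that feature alone. The data are synthetic and `single` is a separate one-feature model introduced only for illustration, not the notebook's 11-feature model.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for one feature column (e.g. CRIM) and the Price target.
rng = np.random.default_rng(0)
feature = rng.uniform(0.0, 10.0, size=50)
price = 30.0 - 1.5 * feature + rng.normal(0.0, 2.0, size=50)

# Fit on a single feature so the fitted line can be drawn in two dimensions.
# scikit-learn expects a 2-D feature array, hence reshape(-1, 1).
single = LinearRegression().fit(feature.reshape(-1, 1), price)

# Sort by feature value so the line is drawn from left to right.
order = np.argsort(feature)

fig, ax = plt.subplots()
ax.scatter(feature, price, color="red")
ax.plot(
    feature[order],
    single.predict(feature[order].reshape(-1, 1)),
    color="blue",
)
ax.set_xlabel("crime")
ax.set_ylabel("house prices")
```

For the full 11-feature model, a plot of predicted against actual prices is a common alternative, since a single straight line cannot represent a fit over many features.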
