
Introduction to Regression.

Please install scikit-learn and matplotlib. We will use scikit-learn for the data analysis and matplotlib for plotting. Pandas will be used for reading the data.

1 Linear Regression

Figure 1: Linear regression

Regression analysis is a conceptually simple method for investigating functional relationships among
variables. These variables are known as the dependent/response variable (y) and the independent/predictor
variables (x).

y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b \qquad (1)

a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \qquad (2)

b = \frac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n} \qquad (3)
Please consider a dataset of the marks distribution of a class, as given below:

Listing 1: marks_Btech.dat

Names Maths Phys Chem Bio Eng Soc-Science Comp-Science


Rishi 92 80 91 78 76 89 95
Arpit 91 77 74 81 83 91 96
Raj 95 92 78 91 78 79 99
Indra 78 72 68 81 84 89 82
Vinay 89 81 78 86 81 84 92
Ashok 48 39 37 38 51 57 61
Arya 38 27 31 38 38 51 44
Priti 91 90 91 95 98 99 98
Som 90 89 95 78 91 91 96
Akash 38 37 34 71 81 78 51
Sruti 98 92 78 91 78 68 100
Palash 79 72 68 81 84 89 86
Arun 89 81 78 86 81 80 92
Rajesh 80 99 97 88 81 84 84
Veer 88 27 31 38 38 45 92
Sarti 94 90 91 95 98 99 97
Somen 89 89 95 78 91 89 93
Aksh 88 77 74 71 81 84 92
Arpita 93 77 74 81 83 88 96

1.1 Single variable linear regression


In the program given below, we study the regression between the marks scored by students in maths and
in computer science. The marks scored in maths form X; the marks scored in computer science form y.
First, we split the data (X, y) into training (70%) and test (30%) sets. We then fit the training data to
calculate the regression coefficient and intercept and, finally, predict the line of best fit for y.

The least-squares regression line for a set of n data points is given by y = ax + b.
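
As a quick cross-check of equations (2) and (3), the slope a and intercept b can be computed directly with NumPy. This is a minimal sketch using two small hypothetical arrays (not taken from the dataset), independent of the listing below:

import numpy as np

# Hypothetical example marks, for illustration only
x = np.array([92, 91, 95, 78, 89], dtype=float)
y = np.array([95, 96, 99, 82, 92], dtype=float)

n = len(x)
# Slope from equation (2) and intercept from equation (3)
a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b = (np.sum(y) - a * np.sum(x)) / n
print(a, b)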

Listing 2: linear_regression

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Read the whitespace-separated marks file
df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in maths (column 1), y: marks in computer science (column 7)
X = df.iloc[:, 1:2].values
print(X)
y = df.iloc[:, 7].values
print(y)

# Split the data into 70% training and 30% test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print(X_train)
print(X_test)
print(y_train)
print(y_test)

# Fit an ordinary least-squares model on the training data
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

print(reg.coef_)
print(reg.intercept_)

# Predict computer-science marks for the test set
y_pred = reg.predict(X_test)

# Plot the test data (black), training data (red) and predictions (blue)
plt.scatter(X_test, y_test, color='black', linewidth=3)
plt.scatter(X_train, y_train, color='red', linewidth=1)
plt.scatter(X_test, y_pred, color='blue', linewidth=3)
plt.show()
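
To get a quick sense of the quality of the fit, the model can also be scored on the held-out data. The following lines are a small optional addition to Listing 2 (they assume reg, X_test, y_test and y_pred from the listing above); reg.score returns the coefficient of determination R²:

# Coefficient of determination (R^2) on the test data
print(reg.score(X_test, y_test))

# Mean squared error between predictions and true test values
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_pred))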

1.2 Multivariable linear regression


The dependent variable y depends on n independent variables x_1, x_2, ..., x_n:

y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b \qquad (4)

Suppose we want to calculate the linear regression between the marks scored in all the remaining
subjects and the marks scored in computer science. We perform multivariable linear regression:

Listing 3: Multi_linear

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in all subjects except computer science, y: marks in computer science
X = df.iloc[:, 1:-1].values
print(X, len(X))
y = df.iloc[:, 7].values
print(y, len(y))

# Split the data into 80% training and 20% test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train, len(X_train))
print(X_test)
print(y_train, len(y_train))
print(y_test)

# Fit an ordinary least-squares model on the training data
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

print(reg.coef_)

# Predict computer-science marks for the test set
y_pred = reg.predict(X_test)

print(y_test)
print(reg.intercept_)
print(y_pred)

2 Polynomial regression
y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \qquad (5)

The data consist of n observations of the dependent variable y on the independent variable x.

In the program given below, we use polynomial regression of order 1 to 6 to study the regression
between the marks scored by the students in maths and in computer science.

Figure 2: Linear regression

Listing 4: Poly_regression

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in maths, y: marks in computer science
X = df.iloc[:, 1:2].values
print(X)
y = df.iloc[:, 7].values
print(y)

# Split the data into 70% training and 30% test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

reg = linear_model.LinearRegression()

# Example: polynomial regression of degree 2, fitted on the training set
poly_reg = PolynomialFeatures(degree=2)
x_poly = poly_reg.fit_transform(X_train)
reg.fit(x_poly, y_train)
y_pred = reg.predict(poly_reg.transform(X_test))
print(reg.coef_)
print(reg.intercept_)

# Polynomials of degree 1 to 6, each fitted on the training set and scored on the test set
for i in range(1, 7):
    poly_reg = PolynomialFeatures(degree=i)
    x_poly = poly_reg.fit_transform(X_train)
    reg.fit(x_poly, y_train)
    print('Degree of Equation:', i)
    print('Coefficient:', reg.coef_)
    print('Intercept:', reg.intercept_)
    print('Accuracy Score:', reg.score(poly_reg.transform(X_test), y_test))

3 Assignment
• Please use matplotlib to plot y_pred as a function of X_test using polynomial regression of
order 1 to 6 and compare them in the same plot.

• In the program written above, we performed single-variable polynomial regression. Please perform
multivariable polynomial regression to predict y, the marks scored in computer science,
as a function of the marks scored in all the other subjects.

4 Dealing with collinearity


Multicollinearity arises when the independent variables (X) are nearly linearly dependent on each other.
Multicollinearity can create inaccurate estimates of the regression coefficients, inflate the standard
errors of the regression coefficients, give false and non-significant coefficient values, and degrade the
predictability of the model.
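
One simple way to check for such dependence before fitting any model is to inspect the pairwise correlations between the feature columns. This is a rough sketch, assuming the marks_Btech.dat file from Listing 1 is present; correlations close to ±1 indicate strongly collinear features:

import pandas as pd

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# Correlation matrix of the subject columns (all columns except the names)
corr = df.iloc[:, 1:].corr()
print(corr.round(2))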

4.1 Ridge
Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on
the size of coefficients. The ridge coefficients minimize a penalized residual sum of squares,

\min_{\beta}\left( \| Y - X\beta \|_2^2 + \lambda \| \beta \|_2^2 \right) \qquad (6)

Here, λ is a complexity parameter that controls the amount of shrinkage: the larger the value of λ, the
greater the amount of shrinkage, and the more robust the coefficients become to collinearity.

1. It shrinks the parameters and is therefore mostly used to prevent multicollinearity.

2. It reduces the model complexity by coefficient shrinkage, as the small sweep over λ below illustrates.
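
To see the shrinkage in practice, one can refit the ridge model for a few increasing values of alpha (λ in equation 6) and watch the magnitude of the coefficients decrease. A minimal sketch, assuming the same marks_Btech.dat file and train/test split as in Listing 5 below:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

df = pd.read_table("marks_Btech.dat", sep=r"\s+")
X = df.iloc[:, 1:-1].values
y = df.iloc[:, 7].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Larger alpha -> stronger penalty -> smaller coefficients
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha, fit_intercept=True)
    model.fit(X_train, y_train)
    print(alpha, np.round(model.coef_, 3), round(model.score(X_test, y_test), 3))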

Listing 5: Ridge

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in all subjects except computer science, y: marks in computer science
X = df.iloc[:, 1:-1].values
print(X)
y = df.iloc[:, 7].values
print(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Ridge regression with regularization strength alpha
model = Ridge(alpha=1.0, fit_intercept=True)
model.fit(X_train, y_train)
print(model.coef_)

y_pred = model.predict(X_test)
print(y_test, y_pred)
print(model.intercept_)

Figure 3: Ridge and LASSO

4.2 LASSO
LASSO (Least Absolute Shrinkage and Selection Operator) is a technique similar to ridge. LASSO keeps
only some of the features and shrinks the coefficients of the others to zero. This property, known as
feature selection, is absent in ridge regression. In LASSO, instead of adding the squares of β to the
penalty, we add the absolute values of β.

\min_{\beta}\left( \| Y - X\beta \|_2^2 + \lambda \| \beta \|_1 \right) \qquad (7)

1. It is generally used when we have a large number of features, because it automatically performs
feature selection, as the small sketch below illustrates.

LASSO is useful in some contexts due to its tendency to prefer solutions with fewer non-zero coefficients,
effectively reducing the number of variables upon which the given solution depends. For this
reason, the LASSO and its variants are fundamental to the field of compressed sensing.
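
A quick way to see this feature-selection behaviour is to count how many coefficients remain non-zero as the penalty grows. A minimal sketch, assuming the same data file and split as in Listing 6 below:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

df = pd.read_table("marks_Btech.dat", sep=r"\s+")
X = df.iloc[:, 1:-1].values
y = df.iloc[:, 7].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Larger alpha -> more coefficients driven exactly to zero
for alpha in [0.01, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, fit_intercept=True, max_iter=10000)
    lasso.fit(X_train, y_train)
    print(alpha, np.sum(lasso.coef_ != 0), np.round(lasso.coef_, 3))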

Listing 6: Lasso

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in all subjects except computer science, y: marks in computer science
X = df.iloc[:, 1:-1].values
print(X)
y = df.iloc[:, 7].values
print(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# LASSO regression with an L1 penalty of strength alpha
lasso_model = Lasso(alpha=1.0, fit_intercept=True)
lasso_model.fit(X_train, y_train)
print(lasso_model.coef_)

y_pred = lasso_model.predict(X_test)
print(y_test, y_pred)
print(lasso_model.intercept_)

4.3 ENet
Elastic net regression is basically a hybrid of ridge and lasso regression. This combination allows
learning a sparse model, in which only a few of the weights are non-zero as in LASSO, while still
maintaining the regularization properties of ridge (keeping the same number of features, but reducing
the magnitude of the coefficients).

\min_{\beta}\left( \| Y - X\beta \|_2^2 + \lambda_1 \| \beta \|_1 + \lambda_2 \| \beta \|_2^2 \right) \qquad (8)
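
Note that scikit-learn's ElasticNet does not take λ1 and λ2 directly; it uses an overall strength alpha and a mixing parameter l1_ratio (roughly, the L1 part grows with alpha · l1_ratio and the L2 part with alpha · (1 − l1_ratio)). A minimal sketch of this parameterization, assuming the same data file and split as in the listings above:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

df = pd.read_table("marks_Btech.dat", sep=r"\s+")
X = df.iloc[:, 1:-1].values
y = df.iloc[:, 7].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# alpha sets the overall penalty strength; l1_ratio mixes the L1 and L2 parts
# (l1_ratio close to 1 behaves like LASSO, close to 0 behaves like ridge)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True)
enet.fit(X_train, y_train)
print(enet.coef_)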

Listing 7: ENet

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in all subjects except computer science, y: marks in computer science
X = df.iloc[:, 1:-1].values
print(X)
y = df.iloc[:, 7].values
print(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print(X_train)
print(X_test)
print(y_train)
print(y_test)

# Elastic net regression combining L1 and L2 penalties
enet = ElasticNet(alpha=3.0, fit_intercept=True)
enet.fit(X_train, y_train)
print(enet.coef_)

y_pred = enet.predict(X_test)
print(y_test, y_pred)
print(enet.intercept_)

4.4 Feature reduction


We may reduce the number of features using the following program. Please note that in this case the
predicted values of y (y_pred) differ a lot from the test values of y (y_test); the short sketch after
Listing 8 quantifies this difference. Feature reduction generally works only when some of the features
are very alike (redundant), which is not so in this case.

Listing 8: Feature_reduction

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

df = pd.read_table("marks_Btech.dat", sep=r"\s+")

# X: marks in all subjects except computer science, y: marks in computer science
X = df.iloc[:, 1:-1].values
print(X, len(X))
y = df.iloc[:, 7].values
print(y, len(y))
print(X.shape)

# Keep only the two features that score highest on the chi-squared test against y
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print(X_new.shape)
print(X_new)

X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.2, random_state=0)

reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

print(reg.coef_)

y_pred = reg.predict(X_test)

print(y_test)
print(reg.intercept_)
print(y_pred)
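
To quantify how far the predictions after feature reduction are from the true test values, the fitted model can be scored with standard error metrics. A small optional addition, assuming reg, X_test, y_test and y_pred from Listing 8 above:

from sklearn.metrics import mean_squared_error, r2_score

# A low MSE and an R^2 close to 1 indicate good predictions; with the reduced
# feature set the scores are expected to be noticeably worse here.
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))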

5 Assignment
• Please also go through (i) support vector machine regression and (ii) kernel ridge regression.
