Lab 6 - Linear Regression and Multiple Linear Regression

The document provides an overview of linear regression, including simple and multiple linear regression implementations in Python. It explains the assumptions of linear regression, such as linearity, multicollinearity, autocorrelation, and homoscedasticity, and discusses applications in various fields like economics and finance. Additionally, it includes code examples for implementing linear regression using datasets like Boston housing and diabetes data.

Linear Regression (Python Implementation)

Linear regression is a statistical method for modeling the relationship between a dependent variable and a given set of independent variables.
Note: For simplicity, we refer to the dependent variable as the response and to the independent variables as features.

To provide a basic understanding of linear regression, we start with its most basic version: simple linear regression.

Simple Linear Regression

Simple linear regression is an approach for predicting a response using a single feature. It is assumed that the two variables are linearly related. Hence, we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature, or independent variable (x).

Let us consider a dataset where we have a value of the response y for every feature x (the same values used in the code below):

x: 0  1  2  3  4  5  6  7  8  9
y: 1  3  2  5  7  8  8  9  10  12

For generality, we define:

x as the feature vector, i.e. x = [x_1, x_2, ..., x_n],
y as the response vector, i.e. y = [y_1, y_2, ..., y_n]

for n observations (in the above example, n = 10).
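
For reference, the least-squares coefficient estimates that the code below computes are (a standard result, written out here so the code can be checked against it):

\[
b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2},
\qquad
b_0 = \bar{y} - b_1\,\bar{x},
\]

where \(\bar{x}\) and \(\bar{y}\) are the sample means; the fitted line is then \(y = b_0 + b_1 x\).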

A scatter plot of the above dataset looks like this:

Code: Python implementation of the above technique
1) Run IDLE
2) Click File > New File, type the following code, and save it as LRM.py
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
3) Click Run>Run Module and observe the following output and model graph

Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
And the graph obtained looks like this:

4) Change x and y to different values, run the LRM.py file, and note down the output in your observation.

5) Use the diabetes data set from UCI and the Pima Indians Diabetes data set to perform linear regression modeling, and note down the steps and outputs in your observation (a starting sketch is given below).
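
As a starting point for step 5, a minimal sketch follows. It reuses the functions from LRM.py on a single feature; the scikit-learn diabetes dataset stands in here for the UCI/Pima files, which you would instead load from CSV (e.g. with np.loadtxt):

from sklearn import datasets
from LRM import estimate_coef, plot_regression_line

# load the diabetes dataset (stand-in for the UCI/Pima CSV files)
diabetes = datasets.load_diabetes()

# pick a single feature: column 2 is the (standardized) BMI
x = diabetes.data[:, 2]
y = diabetes.target

# estimate and plot exactly as in LRM.py
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
plot_regression_line(x, y, b)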
Multiple Linear Regression

Code: Python implementation of multiple linear regression on the Boston house pricing dataset using scikit-learn.
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split

# load the boston dataset
# (note: load_boston requires scikit-learn < 1.2, where it was removed)
boston = datasets.load_boston(return_X_y=False)

# defining feature matrix (X) and response vector (y)
X = boston.data
y = boston.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)

# create linear regression object
reg = linear_model.LinearRegression()

# train the model using the training sets
reg.fit(X_train, y_train)

# regression coefficients
print('Coefficients: ', reg.coef_)

# variance score: 1 means perfect prediction
print('Variance score: {}'.format(reg.score(X_test, y_test)))

# plot for residual error

## setting plot style
plt.style.use('fivethirtyeight')

## plotting residual errors in training data
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train,
            color="green", s=10, label='Train data')

## plotting residual errors in test data
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test,
            color="blue", s=10, label='Test data')

## plotting line for zero residual error
plt.hlines(y=0, xmin=0, xmax=50, linewidth=2)

## plotting legend
plt.legend(loc='upper right')

## plot title
plt.title("Residual errors")

## method call for showing the plot
plt.show()

Output:
Coefficients:
[ -8.80740828e-02 6.72507352e-02 5.10280463e-02 2.18879172e+00
-1.72283734e+01 3.62985243e+00 2.13933641e-03 -1.36531300e+00
2.88788067e-01 -1.22618657e-02 -8.36014969e-01 9.53058061e-03
-5.05036163e-01]
Variance score: 0.720898784611

And the residual error plot looks like this:


Exercise:
Use the diabetes data set from UCI and the Pima Indians Diabetes data set and perform multiple linear regression. Also compare the results of the above analysis for the two data sets (a starting sketch is given below).
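
As a starting point for the exercise, here is a minimal sketch that mirrors the Boston code; again the scikit-learn diabetes dataset stands in for the UCI/Pima files, which you would load from CSV and split into X and y yourself:

from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# load the diabetes dataset (stand-in for the UCI/Pima CSV files)
X, y = datasets.load_diabetes(return_X_y=True)

# same split and model as in the Boston example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

print('Coefficients: ', reg.coef_)
print('Variance score: {}'.format(reg.score(X_test, y_test)))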

In the Boston housing example above, we determined the accuracy score using the Explained Variance Score.

We define:

explained_variance_score = 1 - Var{y - y'} / Var{y}

where y' is the estimated target output, y the corresponding (correct) target output, and Var is the variance, i.e. the square of the standard deviation.
The best possible score is 1.0; lower values are worse. (Strictly speaking, reg.score returns the coefficient of determination R^2, which coincides with the explained variance score when the residuals have zero mean.)
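
Both scores can be computed directly with sklearn.metrics; here is a short, self-contained sketch on a small made-up example, showing where the two differ:

import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

# small made-up example: true targets and model predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# r2_score is what reg.score(X_test, y_test) computes; the explained
# variance score ignores any constant bias in the residuals, so the
# two coincide exactly when the residuals have zero mean
print('R^2:                ', r2_score(y_true, y_pred))
print('Explained variance: ', explained_variance_score(y_true, y_pred))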

Assumptions:
Given below are the basic assumptions that a linear regression model makes about the dataset to which it is applied:

• Linear relationship: The relationship between the response and feature variables should be linear. The linearity assumption can be tested using scatter plots. As shown below, the first figure represents linearly related variables, whereas the variables in the second and third figures are most likely non-linear; so the first figure will give better predictions using linear regression.

• Little or no multicollinearity: It is assumed that there is little or no multicollinearity in the data. Multicollinearity occurs when the features (or independent variables) are not independent of each other.

• Little or no autocorrelation: Another assumption is that there is little or no autocorrelation in the data. Autocorrelation occurs when the residual errors are not independent of each other.

• Homoscedasticity: Homoscedasticity describes a situation in which the error term (that is, the "noise" or random disturbance in the relationship between the independent variables and the dependent variable) is the same across all values of the independent variables. As shown below, figure 1 has homoscedasticity while figure 2 has heteroscedasticity. (Simple numerical checks for these assumptions are sketched after this list.)
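
As a rough numerical complement to the visual checks, the sketch below uses statsmodels and is meant to be appended to the Boston script above (where X_train, y_train, and reg are defined); the thresholds in the comments are rules of thumb, not hard rules:

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

residuals = y_train - reg.predict(X_train)

# autocorrelation: a Durbin-Watson statistic near 2 suggests little
# autocorrelation in the residuals
print('Durbin-Watson:', durbin_watson(residuals))

# multicollinearity: variance inflation factor per feature; values
# well above 10 are commonly read as a warning sign
vif = [variance_inflation_factor(X_train, i) for i in range(X_train.shape[1])]
print('VIF per feature:', np.round(vif, 2))

# homoscedasticity: the residual plot drawn earlier should show a
# roughly constant spread of residuals across the predicted values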
Applications:

• Trend lines: A trend line represents the variation in quantitative data with the passage of time (like GDP, oil prices, etc.). These trends usually follow a linear relationship. Hence, linear regression can be applied to predict future values. However, this method suffers from a lack of scientific validity in cases where other potential changes can affect the data.

• Economics: Linear regression is the predominant empirical tool in economics. For example, it is used to predict consumer spending, fixed investment spending, inventory investment, purchases of a country's exports, spending on imports, the demand to hold liquid assets, labor demand, and labor supply.

• Finance: The capital asset pricing model uses linear regression to analyze and quantify the systematic risk of an investment.

• Biology: Linear regression is used to model causal relationships between parameters in biological systems.
References:

• https://www.geeksforgeeks.org/linear-regression-python-implementation/
• https://en.wikipedia.org/wiki/Linear_regression
• https://en.wikipedia.org/wiki/Simple_linear_regression
• http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
• http://www.statisticssolutions.com/assumptions-of-linear-regression/
