100% found this document useful (1 vote)

119 views18 pages

Lab2 Linear Regression

This document describes a lab project on applying linear regression to regression and classification tasks. The purpose is to use linear regression to predict house prices using the Boston Housing dataset, and classify handwritten digits using the MNIST dataset. Both closed-form and iterative gradient descent methods are used to solve the linear regression problems. Experimental results show that gradient descent with different hyperparameters, such as learning rate and mini-batch size, can converge the solution, though a large learning rate may not. Scikit-learn is also used to implement linear regression for comparison.

Uploaded by

Juan Zarate

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

119 views18 pages

Lab2 Linear Regression

Uploaded by

Juan Zarate

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

Lab 2: Linear Regression

NOTE: This is a lab project accompanying the following book [MLF] and it should be used together with the
book.

[MLF] H. Jiang, "Machine Learning Fundamentals: A Concise Introduction

(http://wiki.eecs.yorku.ca/user/hj/research:mlfbook)", Cambridge University Press, 2021. (bibtex
(http://www.cse.yorku.ca/~hj/mlf-jiang.bib))

The purpose of this lab is to apply a simple machine learning method, namely linear regression, to some
regression and classification tasks on two popular data sets. We will show how linear regression may diﬀer
when used to solve a regression or classification problem. As we know, linear regression is simple enough
so that we can derive the closed-form solution to solve it. In this project, we will use both the closed-form
method and an iterative gradient descent method (e.g. minibatch SGD) to solve linear regression for these
tasks and compare their pros and cons in practice. Moreover, we will use linear regression as a simple
example to explain some fine-tuning tricks when using any iterative optimization methods (e.g. SGD) in
machine learning. As we will see in the up-coming projects, these tricks become vital in learning large
models in machine learning, such as deep neural networks.

Prerequisites: N/A

I. Linear Regression for Regression

Example 2.1:
Use linear regression to predict house prices in the popular Boston House data set
(https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html). Consider to use both the closed-form
solution and an iterative method to fit to the data and discuss their pros and cons with some experimental
results.

In [ ]: #link my Google drive

from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive
In [ ]: # load Boston House data set
import pandas as pd
import numpy as np

raw_data = pd.read_csv('/content/drive/My Drive/Colab Notebooks/datase

ts/boston.csv', header=None)
data_rows = np.reshape(raw_data.to_numpy(), (506,14))
data = data_rows[:,:13]
target = data_rows[:,13]

# normalize input features to zero-mean and unit-variance

data = (data-np.mean(data, axis=0))/np.std(data, axis=0)

print(data.shape)
print(target.shape)

(506, 13)
(506,)

In [ ]: # use the closed-form solution (Eq(6.9) on page 112)

# add a constant column of '1' to accomodate the bias (see the margin
note on page 107)
data_wb = np.hstack((data, np.ones((data.shape[0], 1), dtype=data.dtyp
e)))

print(data_wb.shape)

# refer to the closed-form solution, i.e. Eq.(6.9) on page 112

w = np.linalg.inv(data_wb.T @ data_wb) @ data_wb.T @ target

# calculate the mean square error in the training set

predict = data_wb @ w
error = np.sum((predict - target)*(predict - target))/data.shape[0]

print(f'mean square error for the closed-form solution: {error:.5f}')

(506, 14)
mean square error for the closed-form solution: 21.89483
Consider to solve the above linear regression using an iterative optimization, such as gradient descent.

Refer to eq.(6.8) on page 112, the objective function in linear regression, i.e. the mean square error (MSE), is
given as:
𝑁
1
(𝐰⊺ 𝐱𝑖 − 𝑦𝑖 )
2
2∑
𝐸(𝐰) =
𝑖=1

(𝐰 𝐗 𝐗𝐰 − 2𝐰⊺ 𝐗⊺ 𝐲 + 𝐲⊺ 𝐲)
1 ⊺ ⊺
=
2
we can show that its gradient can be computed in several equivallent ways as follows:
𝑁 𝑁
∂𝐸(𝐰)
(𝐰 𝐱𝑖 − 𝑦𝑖 )𝐱𝑖 = ∑ 𝐱𝑖 (𝐱𝑖 𝐰 − 𝑦𝑖 )
⊺ ⊺
∑
=
∂𝐰 𝑖=1 𝑖=1
𝑁 𝑁

( ∑ 𝐱𝑖 𝐱𝑖 )𝐰
⊺
𝑦𝑖 𝐱𝑖
∑
= −
𝑖=1 𝑖=1
⊺ ⊺
= 𝐗 𝐗𝐰 − 𝐗 𝐲
where 𝐗 and 𝐲 are defined in the same way as on page 112.

In the following, we use the formula from last row to calculate gradients via vectorization. Furthermore, we
implement a mini-batch SGD, .i.e. Algorithm 2.3 on page 62, to learn linear regression iteratively.
In [ ]: # solve linear regression using gradient descent
import numpy as np

class Optimizer():
def __init__(self, lr, annealing_rate, batch_size, max_epochs):
self.lr = lr
self.annealing_rate = annealing_rate
self.batch_size = batch_size
self.max_epochs = max_epochs

# X[N,d]: input features; y[N]: output targets; op: hyper-parameters f

or optimzer
def linear_regression_gd(X, y, op):
n = X.shape[0] # number of samples
w = np.zeros(X.shape[1]) # initialization

lr = op.lr
errors = np.zeros(op.max_epochs)
for epoch in range(op.max_epochs):
indices = np.random.permutation(n) #randomly shuffle data indices
for batch_start in range(0, n, op.batch_size):
X_batch = X[indices[batch_start:batch_start + op.batch_size]]
y_batch = y[indices[batch_start:batch_start + op.batch_size]]

# vectorization to compute gradients for a whole mini-batch (see

the above formula)
w_grad = X_batch.T @ X_batch @ w - X_batch.T @ y_batch

w -= lr * w_grad / X_batch.shape[0]

diff = X @ w - y # prediction difference

errors[epoch] = np.sum(diff*diff)/n
lr *= op.annealing_rate
#print(f'epoch={epoch}: the mean square error is {errors[epoch]}')

return w, errors
In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=30, max_epoch

s=500)

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('A larger mini-batch; converging smoothly but still a gap')

plt.xlabel('number of epoches')
plt.plot(errors, 'b', 21.89*np.ones(errors.shape[0]), 'c--')
plt.legend(['gradient descent', 'closed-form solution'])

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed7e1d10>

In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=2, max_epochs

=100)

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('use a small mini-batch; converging quicklly and nicely to t

he optimum')
plt.xlabel('number of epoches')
plt.plot(errors, 'b', 21.89*np.ones(errors.shape[0]), 'c--')
plt.legend(['gradient descent', 'closed-form solution'])

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed8d6c10>

In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.01, annealing_rate=0.99, batch_size=20, max_epochs

=100)

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('use a larger learning rate and larger mini-batch, also conv

erging nicely')
plt.xlabel('number of epoches')
plt.plot(errors, 'b', 21.89*np.ones(errors.shape[0]), 'c--')
plt.legend(['gradient descent', 'closed-form solution'])

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed7e1e50>

In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.2, annealing_rate=0.99, batch_size=20, max_epochs=

12)

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('not converging due to an overly large learning rate is used

')
plt.xlabel('number of epoches')
plt.plot(errors, 'b', 21.89*np.ones(errors.shape[0]), 'c--')
plt.legend(['gradient descent', 'closed-form solution'])

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed27b290>

Finally, let us show how to solve the above linear regression problem using the scikit-learning
implmenetation.
In [ ]: import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Create linear regression object

l_regr = linear_model.LinearRegression()

# Train the model using the training set

l_regr.fit(data_wb, target)

# Make predictions using the same training set

predict = l_regr.predict(data_wb)

# The mean squared error

print("Mean squared error: %.5f" % mean_squared_error(target, predict)
)

Mean squared error: 21.89483

II. Linear Regression for Classification

Example 2.2:
Use linear regression to build a binary classifier to classify two digits ('3' and '8') in the MNIST data set.
Consider to use both the closed-form solution and an iterative method to fit to the data and discuss their
pros and cons with experimental results.

In [ ]: # install python_mnist

!pip install python_mnist

Collecting python_mnist
Downloading python_mnist-0.7-py2.py3-none-any.whl (9.6 kB)
Installing collected packages: python-mnist
Successfully installed python-mnist-0.7
In [ ]: #load MINST images

from mnist import MNIST

import numpy as np

mnist_loader = MNIST('/content/drive/My Drive/Colab Notebooks/datasets

/MNIST')
train_data, train_label = mnist_loader.load_training()
test_data, test_label = mnist_loader.load_testing()
train_data = np.array(train_data, dtype='float')/255 # norm to [0,1]
train_label = np.array(train_label, dtype='short')
test_data = np.array(test_data, dtype='float')/255 # norm to [0,1]
test_label = np.array(test_label, dtype='short')

#add small random noise to avoid matrix singularity

train_data += np.random.normal(0,0.0001,train_data.shape)

print(train_data.shape, train_label.shape, test_data.shape, test_label

.shape)

(60000, 784) (60000,) (10000, 784) (10000,)

In [ ]: # prepare digits '3' and '8' for linear regression

digit_train_index = np.logical_or(train_label == 3, train_label == 8)

X_train = train_data[digit_train_index]
y_train = train_label[digit_train_index]
digit_test_index = np.logical_or(test_label == 3, test_label == 8)
X_test = test_data[digit_test_index]
y_test = test_label[digit_test_index]

# add a constant column of '1' to accomodate the bias (see the margin
note on page 107)
X_train = np.hstack((X_train, np.ones((X_train.shape[0], 1), dtype=X_t
rain.dtype)))
X_test = np.hstack((X_test, np.ones((X_test.shape[0], 1), dtype=X_test
.dtype)))

# convert labels: '3' => -1, '8' => +1

CUTOFF = 5 # any number between '3' and '8'
y_train = np.sign(y_train-CUTOFF)
y_test = np.sign(y_test-CUTOFF)

print(X_train.shape)
print(y_train)

print(X_test.shape)
print(y_test)

(11982, 784)
[-1 -1 -1 ... 1 -1 1]
(1984, 784)
[-1 -1 -1 ... -1 1 -1]
In [ ]: # use the closed-form solution

# refer to the closed-form solution, i.e. Eq.(6.9) on page 112

w = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train

# calculate the mean square error and classification accuracy on the t

raining set
predict = X_train @ w
error = np.sum((predict - y_train)*(predict - y_train))/X_train.shape[
0]
print(f'mean square error on training data for the closed-form solutio
n: {error:.5f}')

accuracy = np.count_nonzero(np.equal(np.sign(predict),y_train))/y_trai
n.size*100.0
print(f'classification accuracy on training data for the closed-form s
olution: {accuracy:.2f}%')

# calculate the mean square error and classification accuracy on the t

est set
predict = X_test @ w
error = np.sum((predict - y_test)*(predict - y_test))/X_test.shape[0]
print(f'mean square error on test data for the closed-form solution: {
error:.5f}')

accuracy = np.count_nonzero(np.equal(np.sign(predict),y_test))/y_test.
size*100.0
print(f'classification accuracy on test data for the closed-form solut
ion: {accuracy:.2f}%')

mean square error on training data for the closed-form solution: 0.1
9629
classification accuracy on training data for the closed-form solutio
n: 96.99%
mean square error on training data for the closed-form solution: 1.3
0940
classification accuracy on test data for the closed-form solution: 9
5.92%
In [ ]: # use linear regression from sklearn

import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Create linear regression object

l_regr = linear_model.LinearRegression()

# Train the model using the training set

l_regr.fit(X_train, y_train)

# Make predictions using the same training set

predict = l_regr.predict(X_train)
print("Mean squared error on training data: %.5f" % mean_squared_error
(y_train, predict))

# Make predictions using the test set

predict = l_regr.predict(X_test)
print("Mean squared error on test data: %.5f" % mean_squared_error(y_t
est, predict))

Mean squared error on training data: 0.19629

Mean squared error on test data: 1.30940

Next, let us consider to use mini-batch stochastic gradient descent (SGD) to learn linear regression models
for this binary classification problem. When we fine-tune any SGD method for a classificaion problem in
machline learning, it is very important to monitor the following three learning curves:

1. Classification Accuracy on the training set (curve A): this is the goal of the empirical risk mininization
(ERM) of the zero-one loss for classification (see Eq.(5.6) on page 99).
2. Classification Accuracy on an unseen test/development set (curve B): we need to compare the curves A
and B over the learning course to monitor whether overfitting or underfit occurs. Overfitting happens
when the gap between A and B is overly big while underfitting happens when A and B get very close
and both of them yield fairly poor performance. Moreover, we can also monitor the curves A and B to
determine when to terminate the learning proces for the best possible performance on the
test/devopement set.
3. The value of the learning objective function (curve C): because the zero-one loss is not directly
minimizable, we will have to establish a proxy objective function according to some criteria (see Tabe
7.1 on page 135). These objective functions are closelly related to the zero-one loss but they are NOT
the same. When we fine-tune an iterative optimization method, the first thing is to ensure that the value
of the chosen objective function descreases over the entire learning course. If we cannot reduce the
objective function (even when a suﬃciently small learning rate is used), it is very likely that the
implementation or code is buggy. Furthermore, if curve C is going down while curve A is not going up,
this is another indicator that someting is still wrong in the implementation.

In [ ]: # solve linear regression using gradient descent

import numpy as np

class Optimizer():
def __init__(self, lr, annealing_rate, batch_size, max_epochs):
self.lr = lr
self.annealing_rate = annealing_rate
self.batch_size = batch_size
self.max_epochs = max_epochs

# X[N,d]: training features; y[N]: training targets;

# X2[N,d]: test features; y2[N]: test targets;
# op: hyper-parameters for optimzer
#
# Note: X2 and y2 are not used in training
# but only for computting the learning curve B
#
def linear_regression_gd2(X, y, X2, y2, op):
n = X.shape[0] # number of samples
w = np.zeros(X.shape[1]) # initialization

lr = op.lr
errorsA = np.zeros(op.max_epochs)
errorsB = np.zeros(op.max_epochs)
errorsC = np.zeros(op.max_epochs)

for epoch in range(op.max_epochs):

indices = np.random.permutation(n) #randomly shuffle data indices
for batch_start in range(0, n, op.batch_size):
X_batch = X[indices[batch_start:batch_start + op.batch_size]]
y_batch = y[indices[batch_start:batch_start + op.batch_size]]

# vectorization to compute gradients for a whole mini-batch (see

the above formula)
w_grad = X_batch.T @ X_batch @ w - X_batch.T @ y_batch

w -= lr * w_grad / X_batch.shape[0]

# for learning curve C

diff = X @ w - y # prediction difference
errorsC[epoch] = np.sum(diff*diff)/n

# for learning curve A

predict = np.sign(X @ w)
errorsA[epoch] = np.count_nonzero(np.equal(predict,y))/y.size

# for learning curve B

predict2 = np.sign(X2 @ w)
errorsB[epoch] = np.count_nonzero(np.equal(predict2,y2))/y2.size

lr *= op.annealing_rate
print(f'epoch={epoch}: the mean square error is {errorsC[epoch]:.3
f} ({errorsA[epoch]:.3f},{errorsB[epoch]:.3f})')
return w, errorsA, errorsB, errorsC

In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=50, max_epoch

s=20)

w, A, B, C = linear_regression_gd2(X_train, y_train, X_test, y_test, o

fig, ax = plt.subplots(2)
fig.suptitle('monitoring three learning curves (A, B, C)')
ax[0].plot(C, 'b', 0.196*np.ones(C.shape[0]), 'c--')
ax[0].legend(['curve C', 'closed-form solution'])

ax[1].plot(A, 'b', B, 'r')

ax[1].legend(['curve A', 'curve B'])
epoch=0: the mean square error is 0.398 (0.911,0.920)
epoch=1: the mean square error is 0.309 (0.929,0.942)
epoch=2: the mean square error is 0.279 (0.938,0.950)
epoch=3: the mean square error is 0.264 (0.945,0.957)
epoch=4: the mean square error is 0.254 (0.949,0.959)
epoch=5: the mean square error is 0.248 (0.951,0.962)
epoch=6: the mean square error is 0.244 (0.953,0.961)
epoch=7: the mean square error is 0.240 (0.954,0.962)
epoch=8: the mean square error is 0.237 (0.955,0.964)
epoch=9: the mean square error is 0.235 (0.956,0.963)
epoch=10: the mean square error is 0.234 (0.957,0.963)
epoch=11: the mean square error is 0.232 (0.957,0.964)
epoch=12: the mean square error is 0.231 (0.958,0.964)
epoch=13: the mean square error is 0.230 (0.958,0.964)
epoch=14: the mean square error is 0.229 (0.958,0.964)
epoch=15: the mean square error is 0.228 (0.959,0.964)
epoch=16: the mean square error is 0.227 (0.959,0.962)
epoch=17: the mean square error is 0.226 (0.959,0.963)
epoch=18: the mean square error is 0.226 (0.959,0.965)
epoch=19: the mean square error is 0.225 (0.960,0.965)

Out[ ]: <matplotlib.legend.Legend at 0x7ffadb5ad890>

In the above setting, we use a large mini-batch (50), which leads to fairly smooth convergence. As we can
see, even though there is a big gap in the objective function between SGD (curve C) and the closed-form
solution, classification accuracy of SGD exceeds that of the closed-form solution on either the training or
testing set. This indicates that MSE used in linear regression is NOT a good learning criterion for
classification (see why in section 7.1.1 on page 136).
In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=5, max_epochs

=10)

w, A, B, C = linear_regression_gd2(X_train, y_train, X_test, y_test, o

fig, ax = plt.subplots(2)
fig.suptitle('monitoring three learning curves (A, B, C)')
ax[0].plot(C, 'b', 0.196*np.ones(C.shape[0]), 'c--')
ax[0].legend(['curve C', 'closed-form solution'])

ax[1].plot(A, 'b', B, 'r')

ax[1].legend(['curve A', 'curve B'])

epoch=0: the mean square error is 0.236 (0.956,0.963)

epoch=1: the mean square error is 0.224 (0.960,0.964)
epoch=2: the mean square error is 0.220 (0.961,0.963)
epoch=3: the mean square error is 0.217 (0.962,0.964)
epoch=4: the mean square error is 0.215 (0.962,0.963)
epoch=5: the mean square error is 0.214 (0.963,0.963)
epoch=6: the mean square error is 0.213 (0.964,0.965)
epoch=7: the mean square error is 0.215 (0.963,0.962)
epoch=8: the mean square error is 0.212 (0.963,0.964)
epoch=9: the mean square error is 0.214 (0.963,0.963)

Out[ ]: <matplotlib.legend.Legend at 0x7ffadb49df50>

In this setting, we use the same learning rate but a much smaller mini-batch size. A smaller mini-batch
means more model updates in each epoch. As a result, the classification accuracy on the training set (curve
A) is improved over the previous setting. However, we can see that the classification accuracy on the unseen
set (curve B) starts to go down after epoch 6, which indicates that we should terminate the learning at epoch
6.

Exercises

Problem 2.1:
Use Ridge regression to solve the regression problem in Example 2.1 as well as the classification problem in
Example 2.2, also implement both closed-form and iterative approachs, compare the results of Ridge
regression with those of linear regression.

Problem 2.2:
Use LASSO to solve the regression problem in Example 2.1 as well as the classification problem in Example
2.2, compare the results of LASSO with those of linear regression and Ridge regression.

Bootstrap Powerpoint
100% (1)
Bootstrap Powerpoint
20 pages
Predicting Life Expectancy Using Machine Learning
100% (1)
Predicting Life Expectancy Using Machine Learning
9 pages
QuantEconlectures Python3 PDF
100% (1)
QuantEconlectures Python3 PDF
1,125 pages
CS178 Homework #1: Problem 0: Getting Connected
No ratings yet
CS178 Homework #1: Problem 0: Getting Connected
4 pages
Cynix:1 Vulnhub Walkthrough: Table of Content
No ratings yet
Cynix:1 Vulnhub Walkthrough: Table of Content
17 pages
1 Lect - 1.2 - 12 - August 2022 PDF
No ratings yet
1 Lect - 1.2 - 12 - August 2022 PDF
59 pages
Unit - V
100% (1)
Unit - V
75 pages
Logistic Regression
100% (1)
Logistic Regression
12 pages
EDA On Titanic Dataset
100% (1)
EDA On Titanic Dataset
39 pages
A Star Search PDF
100% (1)
A Star Search PDF
6 pages
Under Sampling
100% (1)
Under Sampling
7 pages
Whole ML PDF 1614408656
100% (1)
Whole ML PDF 1614408656
214 pages
Session 11 - Multiple Regression Analysis (GbA) PDF
No ratings yet
Session 11 - Multiple Regression Analysis (GbA) PDF
119 pages
Exercise 3: Logistic Regression: Andrew NG (Very Slightly Edited by Luis R. Izquierdo For The University of Burgos)
No ratings yet
Exercise 3: Logistic Regression: Andrew NG (Very Slightly Edited by Luis R. Izquierdo For The University of Burgos)
5 pages
Linear Discriminant Analysis - Credit Card Default Analysis
No ratings yet
Linear Discriminant Analysis - Credit Card Default Analysis
7 pages
Poly
100% (1)
Poly
108 pages
Lecture2 DataMiningFunctionalities
No ratings yet
Lecture2 DataMiningFunctionalities
18 pages
Data Science Engineering Full Time Program Brochure
No ratings yet
Data Science Engineering Full Time Program Brochure
21 pages
Course Title: Data Pre-Processing and Visualization
100% (2)
Course Title: Data Pre-Processing and Visualization
11 pages
Merge +1
No ratings yet
Merge +1
107 pages
Credit Card Fraud Detection Using Machine Learning
100% (1)
Credit Card Fraud Detection Using Machine Learning
82 pages
Regression Models Course Project
100% (1)
Regression Models Course Project
4 pages
Logistic Regression: Gunjan Bharadwaj Assistant Professor Dept of CEA
100% (1)
Logistic Regression: Gunjan Bharadwaj Assistant Professor Dept of CEA
42 pages
Dokumen - Pub Approaching Almost Any Machine Learning Problem 9788269211528 L 5276104
100% (1)
Dokumen - Pub Approaching Almost Any Machine Learning Problem 9788269211528 L 5276104
151 pages
Employee Attrition Miniblogs
100% (1)
Employee Attrition Miniblogs
15 pages
CS229 Lecture 3 PDF
100% (1)
CS229 Lecture 3 PDF
35 pages
Name: Reg. No.: Lab Exercise:: Shivam Batra 19BPS1131
100% (1)
Name: Reg. No.: Lab Exercise:: Shivam Batra 19BPS1131
10 pages
1694600777-Unit2.2 Logistic Regression CU 2.0
100% (1)
1694600777-Unit2.2 Logistic Regression CU 2.0
37 pages
KPMG - Data Set
100% (1)
KPMG - Data Set
1,685 pages
Ensemble Methods
100% (1)
Ensemble Methods
15 pages
UE20CS302 Unit3 Slides
No ratings yet
UE20CS302 Unit3 Slides
308 pages
Abstraction and Interface
No ratings yet
Abstraction and Interface
17 pages
Scip y Lectures
100% (1)
Scip y Lectures
329 pages
Semi-Automated Exploratory Data Analysis (EDA) in Python - by Destin Gong - Mar, 2021 - Towards Data
No ratings yet
Semi-Automated Exploratory Data Analysis (EDA) in Python - by Destin Gong - Mar, 2021 - Towards Data
3 pages
Decision Tree Classification
100% (1)
Decision Tree Classification
11 pages
Nearest Neighbour Algorithm
No ratings yet
Nearest Neighbour Algorithm
20 pages
Tutor
100% (1)
Tutor
309 pages
Decision Tree Classifier-Introduction, ID3
No ratings yet
Decision Tree Classifier-Introduction, ID3
34 pages
Interview Questions For DS & DA (ML)
100% (1)
Interview Questions For DS & DA (ML)
66 pages
EDA Lecture Module 2
100% (1)
EDA Lecture Module 2
42 pages
KPMG
100% (1)
KPMG
2 pages
Project 5 PDF
100% (1)
Project 5 PDF
48 pages
CPE412 Pattern Recognition (Week 8)
100% (1)
CPE412 Pattern Recognition (Week 8)
25 pages
Decision Trees: at Some Point of Time You Have To Take A Decision Sitting On A Tree
100% (1)
Decision Trees: at Some Point of Time You Have To Take A Decision Sitting On A Tree
19 pages
Python For You and Me: Release 0.3.alpha1
100% (1)
Python For You and Me: Release 0.3.alpha1
143 pages
Blank: CFC Cumulative Forecast Error or Bias Error
100% (1)
Blank: CFC Cumulative Forecast Error or Bias Error
2 pages
Leer Los Datos: Import As Import As Import As From Import From Import
100% (1)
Leer Los Datos: Import As Import As Import As From Import From Import
14 pages
Association Rules
No ratings yet
Association Rules
64 pages
Fuzzy Logic
No ratings yet
Fuzzy Logic
49 pages
Machine Learning Notes
No ratings yet
Machine Learning Notes
3 pages
Tableau MMMF PDF
100% (1)
Tableau MMMF PDF
11 pages
Sas Notes Module 4-Categorical Data Analysis Testing Association Between Categorical Variables
100% (1)
Sas Notes Module 4-Categorical Data Analysis Testing Association Between Categorical Variables
16 pages
Classification and Regression Trees
100% (1)
Classification and Regression Trees
60 pages
Capstone Story Template
No ratings yet
Capstone Story Template
30 pages
Student Guide - Module 2 Machine Learning
No ratings yet
Student Guide - Module 2 Machine Learning
50 pages
Logistic Regression
100% (1)
Logistic Regression
14 pages
8multiple Linear Regression
100% (1)
8multiple Linear Regression
21 pages
1.1 Statistics For Data Science PDF
No ratings yet
1.1 Statistics For Data Science PDF
91 pages
LinearRegression Tutorial
No ratings yet
LinearRegression Tutorial
40 pages
Multiple Regression
No ratings yet
Multiple Regression
7 pages
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Adversarial Training For Large Neural Language Models
No ratings yet
Adversarial Training For Large Neural Language Models
13 pages
1 s2.0 S2095809922006324 Main
No ratings yet
1 s2.0 S2095809922006324 Main
20 pages
Lecture 1
No ratings yet
Lecture 1
36 pages
ch9 Ensemble Learning
No ratings yet
ch9 Ensemble Learning
19 pages
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
No ratings yet
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
23 pages
1 s2.0 S0168169919311573 Main
No ratings yet
1 s2.0 S0168169919311573 Main
11 pages
Health Care Call Center Whitesheet
No ratings yet
Health Care Call Center Whitesheet
14 pages
Landscape Definition Guide For SAP HANA® - : Best Practices For A Successful Deployment
No ratings yet
Landscape Definition Guide For SAP HANA® - : Best Practices For A Successful Deployment
19 pages
Expanded CRM Notes BCom6thSem BCU
No ratings yet
Expanded CRM Notes BCom6thSem BCU
6 pages
03 MRK Foods
No ratings yet
03 MRK Foods
1 page
R3 in Geomatics Research Results
100% (2)
R3 in Geomatics Research Results
53 pages
Mechanical Engineering Undergraduate-Manual
No ratings yet
Mechanical Engineering Undergraduate-Manual
27 pages
Digital File Despatch (DFD), Available Only When Final Entries Have Been Received Date Available On The School Support Hub
No ratings yet
Digital File Despatch (DFD), Available Only When Final Entries Have Been Received Date Available On The School Support Hub
2 pages
Sat Practice Test 2 Answers.25 28
No ratings yet
Sat Practice Test 2 Answers.25 28
4 pages
CompTIA A+ Core 1 Cable Types & Connectors
No ratings yet
CompTIA A+ Core 1 Cable Types & Connectors
28 pages
Digital Manufacturing in Design and Arch
No ratings yet
Digital Manufacturing in Design and Arch
8 pages
As ISO 14976-2006 Surface Chemical Analysis - Data Transfer Format
No ratings yet
As ISO 14976-2006 Surface Chemical Analysis - Data Transfer Format
8 pages
Security Best Practices Guide For ICM and IPCC Enterprise & Hosted Editions
No ratings yet
Security Best Practices Guide For ICM and IPCC Enterprise & Hosted Editions
120 pages
Nc7Wz14 Tinylogic Uhs Dual Inverter With Schmitt Trigger Inputs
No ratings yet
Nc7Wz14 Tinylogic Uhs Dual Inverter With Schmitt Trigger Inputs
10 pages
ACN Question Bank
100% (1)
ACN Question Bank
3 pages
JK Sharma - PERT and CPM
No ratings yet
JK Sharma - PERT and CPM
139 pages
Subjects Revised Curriculum BS ELECTRICAL ENGINEERING
No ratings yet
Subjects Revised Curriculum BS ELECTRICAL ENGINEERING
18 pages
VB Script General Examples
100% (1)
VB Script General Examples
12 pages
File Format Bulk Transfers 3.3
100% (1)
File Format Bulk Transfers 3.3
3 pages
Assignment - Algebraic Expressions Class 7
No ratings yet
Assignment - Algebraic Expressions Class 7
4 pages
Detecting Cyberattacks Using Anomaly Detection in Industrial Control Systems - A Federated Learning Approach
No ratings yet
Detecting Cyberattacks Using Anomaly Detection in Industrial Control Systems - A Federated Learning Approach
16 pages
ARDUINO ေရးသားနည္း
67% (3)
ARDUINO ေရးသားနည္း
81 pages
SM 06 New PDF Free
No ratings yet
SM 06 New PDF Free
25 pages
IP DIRECCION Y SUBNETING IPV4 Ent 1
No ratings yet
IP DIRECCION Y SUBNETING IPV4 Ent 1
12 pages
DSSLTC 4020
No ratings yet
DSSLTC 4020
42 pages
2003 A New Technique For Identification of Eighteen Flutter Derivatives Using A Three-Degree-Of-Freedom Section Model
No ratings yet
2003 A New Technique For Identification of Eighteen Flutter Derivatives Using A Three-Degree-Of-Freedom Section Model
10 pages
Palay Market Trend and Forecast
No ratings yet
Palay Market Trend and Forecast
43 pages
CHAPTER III KLS XI Prtmuan 2
No ratings yet
CHAPTER III KLS XI Prtmuan 2
6 pages
GC 2024 10 18
No ratings yet
GC 2024 10 18
13 pages
EDI&Gentran Course
No ratings yet
EDI&Gentran Course
1 page

Lab2 Linear Regression

Uploaded by

Lab2 Linear Regression

Uploaded by

Lab 2: Linear Regression

[MLF] H. Jiang, "Machine Learning Fundamentals: A Concise Introduction

I. Linear Regression for Regression

In [ ]: #link my Google drive

from google.colab import drive

raw_data = pd.read_csv('/content/drive/My Drive/Colab Notebooks/datase

# normalize input features to zero-mean and unit-variance

In [ ]: # use the closed-form solution (Eq(6.9) on page 112)

# refer to the closed-form solution, i.e. Eq.(6.9) on page 112

# calculate the mean square error in the training set

print(f'mean square error for the closed-form solution: {error:.5f}')

# X[N,d]: input features; y[N]: output targets; op: hyper-parameters f

# vectorization to compute gradients for a whole mini-batch (see

diff = X @ w - y # prediction difference

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=30, max_epoch

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('A larger mini-batch; converging smoothly but still a gap')

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed7e1d10>

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=2, max_epochs

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('use a small mini-batch; converging quicklly and nicely to t

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed8d6c10>

op = Optimizer(lr=0.01, annealing_rate=0.99, batch_size=20, max_epochs

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('use a larger learning rate and larger mini-batch, also conv

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed7e1e50>

op = Optimizer(lr=0.2, annealing_rate=0.99, batch_size=20, max_epochs=

w, errors = linear_regression_gd(data_wb, target, op)

plt.title('not converging due to an overly large learning rate is used

Out[ ]: <matplotlib.legend.Legend at 0x7ffaed27b290>

# Create linear regression object

# Train the model using the training set

# Make predictions using the same training set

# The mean squared error

Mean squared error: 21.89483

II. Linear Regression for Classification

!pip install python_mnist

from mnist import MNIST

mnist_loader = MNIST('/content/drive/My Drive/Colab Notebooks/datasets

#add small random noise to avoid matrix singularity

print(train_data.shape, train_label.shape, test_data.shape, test_label

(60000, 784) (60000,) (10000, 784) (10000,)

digit_train_index = np.logical_or(train_label == 3, train_label == 8)

# convert labels: '3' => -1, '8' => +1

# refer to the closed-form solution, i.e. Eq.(6.9) on page 112

# calculate the mean square error and classification accuracy on the t

# calculate the mean square error and classification accuracy on the t

# Create linear regression object

# Train the model using the training set

# Make predictions using the same training set

# Make predictions using the test set

Mean squared error on training data: 0.19629

In [ ]: # solve linear regression using gradient descent

# X[N,d]: training features; y[N]: training targets;

for epoch in range(op.max_epochs):

# vectorization to compute gradients for a whole mini-batch (see

# for learning curve C

# for learning curve A

# for learning curve B

In [ ]: import matplotlib.pyplot as plt

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=50, max_epoch

w, A, B, C = linear_regression_gd2(X_train, y_train, X_test, y_test, o

ax[1].plot(A, 'b', B, 'r')

Out[ ]: <matplotlib.legend.Legend at 0x7ffadb5ad890>

op = Optimizer(lr=0.001, annealing_rate=0.99, batch_size=5, max_epochs

w, A, B, C = linear_regression_gd2(X_train, y_train, X_test, y_test, o

ax[1].plot(A, 'b', B, 'r')

epoch=0: the mean square error is 0.236 (0.956,0.963)

Out[ ]: <matplotlib.legend.Legend at 0x7ffadb49df50>

You might also like