Module3_Ch1

Module 3 covers various machine learning models and their training algorithms, including linear regression, gradient descent, and logistic regression. It discusses concepts such as bias, variance, and regularization techniques like Ridge and Lasso regression. The module also explains the importance of training methods, validation techniques, and the computational complexity of different algorithms.


Module 3

Training Models: Linear regression, gradient descent, polynomial regression, learning curves, regularized linear models, logistic regression
Support Vector Machine: linear, nonlinear, SVM regression, and under the hood
Topics covered:
• Machine Learning models and their training algorithms are often treated mostly like black boxes.
• Bias is the inability of the model to capture the true relationship, because of which there is some difference, or error, between the model’s predicted value and the actual value.
• These differences between the actual (expected) values and the predicted values are known as bias error, or error due to bias.
• Bias is a systematic error that occurs due to wrong assumptions in
the machine learning process.
• Low Bias: Low bias value means fewer assumptions are taken to build
the target function. In this case, the model will closely match the
training dataset.
• High Bias: High bias value means more assumptions are taken to
build the target function. In this case, the model will not match the
training dataset closely.
Chapter objectives
• Linear Regression model, one of the simplest models there is.
• Training methods
• A direct “closed-form” equation that directly computes the model parameters
that best fit the model to the training set (i.e., the model parameters that
minimize the cost function over the training set).
• Using an iterative optimization approach, called Gradient Descent (GD), that
gradually tweaks the model parameters to minimize the cost function over
the training set, eventually converging to the same set of parameters as the
first method.
• Polynomial Regression
• Logistic Regression
• Softmax Regression
• A simple linear model is mathematically represented as y = a0 + a1x + e
• where a0 is the bias (intercept), a1 is the slope of the line, and e is the prediction error
• a0 and a1 are the regression coefficients
• Y is random, and the observations are mutually independent: every event is independent of any intersection of the other events.
• The differences between predicted and true values are called errors; the errors are also mutually independent.
• The unknown parameters are constants.
• Figure: the optimal line, the data points x1…xn, and the errors ei
• A regression line is the line of best fit, for which the sum of the squares of the residuals is minimum.
• In matrix form, linear regression is modelled as Y = Xa + e, with dimensions (n×1) = (n×2)(2×1) + (n×1), where n is the number of data points.
• Example: given weekly sales y, predict the 7th week’s sales (see the sketch below).
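A minimal NumPy sketch of this setup; the six sales figures are made-up numbers used only for illustration:

import numpy as np

weeks = np.arange(1, 7).reshape(-1, 1)                            # weeks 1..6, shape (n, 1)
sales = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 6.1]).reshape(-1, 1)   # made-up weekly sales y

X = np.c_[np.ones((6, 1)), weeks]                   # (n x 2): column of 1s for a0, week index for a1
a = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(sales)   # closed-form estimate of [a0, a1]
week7_pred = np.array([[1, 7]]).dot(a)              # predicted sales for the 7th week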
Closed-form solution: â = (XᵀX)⁻¹ Xᵀ Y
Training Models: Linear regression
• Simple regression model of life satisfaction:
• life_satisfaction = θ0 + θ1 × GDP_per_capita
• θ0 , θ1 are model parameters
• input feature: GDP_per_capita
• Linear Regression model prediction: ŷ = θ0 + θ1x1 + θ2x2 + ⋯ + θnxn
• Where ŷ (y hat) is the predicted value
• n is the number of features.
• xi is the ith feature value.
• θj is the jth model parameter (including the bias term θ0 and the feature weights θ1, θ2, ⋯, θn).
Linear Regression model prediction (vectorized form): ŷ = hθ(x) = θ · x = θᵀx
Validating regression methods

• 1. Standard error (SE), based on the residuals y − ŷ
• 2. Mean Absolute Error (MAE) = (1/n) Σi |yi − ŷi|
• 3. Mean Squared Error (MSE) = (1/n) Σi (yi − ŷi)²
• 4. Root Mean Squared Error (RMSE) = √MSE
• 5. Relative MSE (RelMSE), i.e., MSE normalized relative to the variability of y
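A minimal NumPy sketch of these measures; y_true and y_pred are assumed arrays of true and predicted values, and the Relative MSE shown divides MSE by the variance of y (one common convention):

import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])    # assumed true values
y_pred = np.array([2.8, 5.4, 7.1, 9.6])    # assumed predictions

errors = y_true - y_pred                   # residuals y - ŷ
mae = np.mean(np.abs(errors))              # Mean Absolute Error
mse = np.mean(errors ** 2)                 # Mean Squared Error
rmse = np.sqrt(mse)                        # Root Mean Squared Error
rel_mse = mse / np.var(y_true)             # Relative MSE (MSE normalized by the variance of y)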
MSE cost function for a Linear Regression model: MSE(θ) = (1/m) Σi=1..m (θᵀx(i) − y(i))²
The Normal Equation
• To find the value of θ that minimizes the cost function, there is a closed-form solution, a mathematical equation that gives the result directly: the Normal Equation, θ̂ = (XᵀX)⁻¹ Xᵀ y.
• Randomly generated linear dataset

import numpy as np
import matplotlib.pyplot as plt

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
plt.plot(X, y, "b.")
Add x0 = 1 to each instance and compute θ̂ with the Normal Equation:
X_b = np.c_[np.ones((100, 1)), X] # add x0 = 1 to each instance
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

• Apply the inv() function from NumPy’s linear algebra module (np.linalg) to compute the inverse of a matrix, and the dot() method for matrix multiplication.
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
# add x0 = 1 to each instance
y_predict = X_new_b.dot(theta_best)
Plot
plt.plot(X_new, y_predict, "r-")
plt.plot(X, y, "b.")
plt.axis([0, 2, 0, 15])
plt.show()
Performing linear regression using Scikit-
Learn is quite simple
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_
lin_reg.predict(X_new)
theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
np.linalg.pinv(X_b).dot(y)

• The LinearRegression class is based on the scipy.linalg.lstsq() function (the name stands for “least squares”).
• The pseudoinverse itself is computed using a standard matrix factorization technique called Singular Value Decomposition (SVD), which decomposes the training set matrix X into the matrix multiplication of three matrices U Σ Vᵀ.
Computational Complexity
• The Normal Equation computes the inverse of XᵀX, which is an (n + 1) × (n + 1) matrix.
• The computational complexity of inverting such a matrix is typically about O(n^2.4) to O(n^3), depending on the implementation.
• In other words, if you double the number of features, you multiply the computation time by roughly 2^2.4 ≈ 5.3 to 2^3 = 8.
• The SVD approach used by Scikit-Learn’s LinearRegression class is about O(n^2).


Gradient Descent
• Gradient Descent is a very generic optimization algorithm capable of
finding optimal solutions to a wide range of problems

• Start by filling θ with random values (this is called random initialization), then improve it gradually, taking one small step at a time, each step attempting to decrease the cost function (e.g., the MSE), until the algorithm converges to a minimum.
• An important parameter in Gradient Descent is the size of the steps, determined by the learning rate hyperparameter.
• If the learning rate is too small, then the algorithm will have to go
through many iterations to converge, which will take a long time
• If the learning rate is too high, you might jump across the valley and
end up on the other side, possibly even higher up than you were
before.
• This might make the algorithm diverge, with larger and larger values,
failing to find a good solution
• Finally, not all cost functions look like nice regular bowls. There may
be holes, ridges, plateaus, and all sorts of irregular terrains, making
convergence to the minimum very difficult.
• Two main challenges with Gradient
Descent: if the random initialization starts
the algorithm on the left, then it will
converge to a local minimum, which is not
as good as the global minimum.
• If it starts on the right, then it will take a
very long time to cross the plateau, and if
you stop too early you will never reach the
global minimum.
Global and local minima point
• The point at which a function takes the minimum value is called
global minima.
• However, when the goal is to minimize the function and solved using
optimization algorithms such as gradient descent, it may so happen
that function may appear to have a minimum value at different
points. Those several points which appear to be minima but are not
the point where the function actually takes the minimum value are
called local minima.
• Machine learning algorithms such as gradient descent may get stuck in local minima during training: gradient descent only follows the local slope of the cost function, so depending on where it starts it may settle in a local minimum instead of the global minimum.
• The MSE cost function for a Linear Regression model happens to be a convex function, which means that if you pick any two points on the curve, the line segment joining them never crosses the curve. As a result, Gradient Descent is guaranteed to approach arbitrarily close to the global minimum.
Batch Gradient Descent

• To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θj.
• In other words, you compute the partial derivative of the cost function with regard to parameter θj, noted ∂MSE(θ)/∂θj.
Gradient vector of the cost function: ∇θ MSE(θ) = (2/m) Xᵀ(Xθ − y)
This formula involves calculations over the full training set X at each Gradient Descent step!
This is why the algorithm is called Batch Gradient Descent: it uses the whole batch of training data at every step (actually, Full Gradient Descent would probably be a better name).
• Once the gradient vector is obtained, which points uphill, just go in
the opposite direction to go downhill.
• This means subtracting ∇θ MSE(θ) from θ.
• This is where the learning rate η comes into play: multiply the gradient vector by η to determine the size of the downhill step, θ(next step) = θ − η ∇θ MSE(θ).
eta = 0.1  # learning rate; eta (η) is the 7th letter of the Greek alphabet
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)  # random initialization
for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - eta * gradients
Stochastic Gradient Descent
• The main problem with Batch Gradient Descent is the fact that it uses
the whole training set to compute the gradients at every step, which
makes it very slow when the training set is large.
• At the opposite extreme, Stochastic Gradient Descent just picks a
random instance in the training set at every step and computes the
gradients based only on that single instance.
• Obviously this makes the algorithm much faster
• On the other hand, due to its stochastic (i.e., random) nature, this
algorithm is much less regular than Batch Gradient Descent: instead
of gently decreasing until it reaches the minimum, the cost function
will bounce up and down, decreasing only on average.
• Over time it will end up very close to the minimum, but once it gets
there it will continue to bounce around, never settling down
n_epochs = 50
t0, t1 = 5, 50  # learning schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)

theta = np.random.randn(2, 1)  # random initialization
for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())
sgd_reg.intercept_, sgd_reg.coef_
Mini-batch Gradient Descent
• At each step, instead of computing the gradients based on the full training set (as in Batch GD) or based on just one instance (as in Stochastic GD), Mini-batch GD computes the gradients on small random sets of instances called mini-batches (see the sketch below).
• The main advantage of Mini-batch GD over Stochastic GD is that you
can get a performance boost from hardware optimization of matrix
operations, especially when using GPUs
• Mini-batch GD will end up walking
around a bit closer to the minimum
than SGD
• All end up near the minimum, but
Batch GD’s path actually stops at
the minimum, while both Stochastic
GD and Mini-batch GD continue to
walk around.
• Batch GD takes a lot of time to take each step, but Stochastic GD and Mini-batch GD would also reach the minimum if a good learning schedule is used.
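A minimal Mini-batch GD sketch, reusing X_b, y, m, and learning_schedule from the earlier cells; the batch size of 20 is an arbitrary choice:

minibatch_size = 20
n_epochs = 50

theta = np.random.randn(2, 1)                 # random initialization
for epoch in range(n_epochs):
    shuffled = np.random.permutation(m)       # shuffle the training set each epoch
    X_b_shuffled, y_shuffled = X_b[shuffled], y[shuffled]
    for i in range(0, m, minibatch_size):
        xi = X_b_shuffled[i:i + minibatch_size]
        yi = y_shuffled[i:i + minibatch_size]
        gradients = 2 / len(xi) * xi.T.dot(xi.dot(theta) - yi)  # gradient on the mini-batch only
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients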
Polynomial Regression
• Polynomial regression can handle nonlinear relationships among variables by using an nth-degree polynomial.
• It fits nonlinear curves such as quadratic and cubic curves.

Second-degree transformation (quadratic): y = a0 + a1x + a2x²
Third-degree transformation (cubic): y = a0 + a1x + a2x² + a3x³

Polynomial with degree 2
Simple quadratic equation (degree 2):
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

import matplotlib.pyplot as plt

plt.plot(X, y, "b.")
plt.axis([-3, 3, 0, 10])  # X ranges from -3 to 3, so widen the axes accordingly
plt.show()
• from sklearn.preprocessing import PolynomialFeatures
• poly = PolynomialFeatures(degree=2, include_bias=False)
• X_poly = poly.fit_transform(X)

• from sklearn.linear_model import LinearRegression
• lin_reg = LinearRegression()
• lin_reg.fit(X_poly, y)
• lin_reg.intercept_, lin_reg.coef_  # intercept and coefficients of ŷ = a0 + a1x + a2x²
• Estimated model: ŷ = 0.56 x1² + 0.93 x1 + 1.78
• Original function: y = 0.5 x1² + 1.0 x1 + 2.0 + Gaussian noise

• Polynomial Regression is capable of finding relationships between features.
• Training data:
• when there are just one or two instances in the training set, the
model can fit them perfectly, which is why the curve starts at
zero.
• But as new instances are added to the training set, it becomes
impossible for the model to fit the training data perfectly, both
because the data is noisy and because it is not linear at all.
• So the error on the training data goes up until it reaches a
plateau, at which point adding new instances to the training set
doesn’t make the average error much better or worse.
• validation data:
• When the model is trained on very few training instances, it is
incapable of generalizing properly, hence validation error is
initially quite big.
• Then as the model is shown more training examples, it learns
and thus the validation error slowly goes down.
• However, once again a straight line cannot do a good job
modeling the data, so the error ends up at a plateau, very
close to the other curve.
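A minimal sketch of how such learning curves can be plotted; the helper name plot_learning_curves and the 80/20 split are assumptions, not part of the slides:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    # Hold out a fixed validation set, then train on growing subsets of the training set
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        train_errors.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
        val_errors.append(mean_squared_error(y_val, model.predict(X_val)))
    plt.plot(np.sqrt(train_errors), "r-+", label="training set")
    plt.plot(np.sqrt(val_errors), "b-", label="validation set")
    plt.legend()
    plt.show()

# plot_learning_curves(LinearRegression(), X, y)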
Bias, Variance, Irreducible error
• Bias: This part of the generalization error is due to wrong assumptions, such as assuming
that the data is linear when it is actually quadratic. A high-bias model is most
likely to underfit the training data
• Variance
This part is due to the model’s excessive sensitivity to small variations in the training data.
A model with many degrees of freedom (such as a high-degree polynomial
model) is likely to have high variance, and thus to overfit the training data.
• Irreducible error
This part is due to the noisiness of the data itself. The only way to reduce this
part of the error is to clean up the data (e.g., fix the data sources, such as broken
sensors, or detect and remove outliers)
Regularized Linear Models
• Ridge Regression
• Lasso Regression
• Elastic Net
Ridge Regression
• Ridge Regression is also called Tikhonov regularization
• The regularization term used is α Σi=1..n θi².
• This term is added to the cost function.
• This forces the learning algorithm to not only fit the data but also keep the
model weights as small as possible.
• The regularization term should only be added to the cost function during
training.
• Evaluation of model’s performance is using the un-regularized performance
measure
• Cost function used: J(θ) = MSE(θ) + α Σi=1..n θi²
• θ0 is not regularized, hence the sum starts at i = 1.
Ridge function usage in scikit
• from sklearn.linear_model import Ridge
• ridge_reg = Ridge(alpha=1, solver="cholesky")
• ridge_reg.fit(X, y)
• ridge_reg.predict([[1.5]])
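Alternatively, a sketch using Stochastic Gradient Descent: setting penalty="l2" in SGDRegressor adds an ℓ2 regularization term to the cost function, which is essentially Ridge-style regularization.

from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(penalty="l2")  # ℓ2 penalty, i.e., Ridge-style regularization
sgd_reg.fit(X, y.ravel())
sgd_reg.predict([[1.5]])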
Lasso Regression
(Least Absolute Shrinkage and Selection Operator
Regression)
• Cost function: J(θ) = MSE(θ) + α Σi=1..n |θi|
• Lasso tends to eliminate the weights of the least important features (i.e., set them to zero).
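A usage sketch mirroring the Ridge example above; alpha=0.1 is an arbitrary choice:

from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)
lasso_reg.predict([[1.5]])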


Logistic Regression
• Logistic Regression (also called Logit Regression)
• Commonly used to estimate the probability that an instance belongs
to a particular class
• Logistic Regression model estimated probability (vectorized form): p̂ = hθ(x) = σ(xᵀθ)
• where σ(·) is the sigmoid function, σ(t) = 1 / (1 + exp(−t))

Examples for logistic regression
• Model prediction: ŷ = 0 if p̂ < 0.5, and ŷ = 1 if p̂ ≥ 0.5.
• σ(t) < 0.5 when t < 0, and σ(t) ≥ 0.5 when t ≥ 0, so a Logistic Regression model predicts 1 if xᵀθ is positive, and 0 if it is negative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
list(iris.keys())
# ['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']
X = iris["data"][:, 3:]  # petal width
y = (iris["target"] == 2).astype(np.int64)  # 1 if Iris-Virginica, else 0
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X, y)
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_reg.predict_proba(X_new)
plt.plot(X_new, y_proba[:, 1], "g-", label="Iris-Virginica")
plt.plot(X_new, y_proba[:, 0], "b--", label="Not Iris-Virginica")
plt.legend()
plt.show()

log_reg.predict([[1.7], [1.5]])
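As a quick check (a sketch reusing X_new and y_proba from above), the petal width at which the estimated probability crosses 50% can be read off directly; it comes out at around 1.6 cm:

decision_boundary = X_new[y_proba[:, 1] >= 0.5][0]  # first petal width with P(Iris-Virginica) >= 50%
print(decision_boundary)  # about 1.6 cm: wider petals are classified as Iris-Virginica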
Softmax Regression
• The Logistic Regression model can be generalized to support multiple
classes directly, without having to train and combine multiple binary
classifiers.
• Softmax score for class k: sk(x) = xᵀθ(k)
• Softmax function: p̂k = exp(sk(x)) / Σj=1..K exp(sj(x))
• Predictor: ŷ = argmaxk p̂k = argmaxk sk(x)
In scikit learn
• X = iris["data"][:, (2, 3)]  # petal length, petal width
• y = iris["target"]
• softmax_reg = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=10)
• softmax_reg.fit(X, y)
• softmax_reg.predict([[5, 2]])
• softmax_reg.predict_proba([[5, 2]])
