
Regression

What is regression?

❖ In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' or 'target' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables', 'features', or 'attributes').

❖ Regression analysis is primarily used for two conceptually distinct purposes: prediction and understanding causal relationships.

❖ Applications: economics, finance, biology, psychology, and machine learning, to make predictions, infer relationships, and understand the underlying patterns in data.

❖ The earliest form of regression was the method of least squares, which was published
by Legendre in 1805 [1], and by Gauss in 1809 [2].

❖ History: The term "regression" was coined by Francis Galton in the 19th century to
describe a biological phenomenon. The phenomenon was that the heights of
descendants of tall ancestors tend to regress down towards a normal average (a
phenomenon also known as regression toward the mean).
Types of regression

There are different types of regression models, including:

❖ Linear Regression: Assumes a linear relationship between the dependent variable and
one or more independent variables.

❖ Multiple Regression: Extends linear regression to include multiple independent variables.

❖ Polynomial Regression: Allows for non-linear relationships by including polynomial terms in the model.

❖ Logistic Regression: Used when the dependent variable is binary (two possible
outcomes) and models the probability of a particular outcome.

❖ Ridge Regression and Lasso Regression: Variants of linear regression that include
regularization terms to prevent overfitting.
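❖ Ridge and Lasso are not demonstrated later in these slides, so here is a minimal sketch (an illustrative addition; the toy data and the alpha values are arbitrary choices) of how they can be fit with Scikit-Learn:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

X = 2 * np.random.rand(100, 1)               # toy feature matrix (illustrative)
y = 4 + 3 * X[:, 0] + np.random.randn(100)   # toy target vector (illustrative)

ridge_reg = Ridge(alpha=1.0)   # alpha controls the strength of the L2 penalty (illustrative value)
ridge_reg.fit(X, y)

lasso_reg = Lasso(alpha=0.1)   # alpha controls the strength of the L1 penalty (illustrative value)
lasso_reg.fit(X, y)

In both cases the regularization term shrinks the weights, which is what helps prevent overfitting.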
Linear regression

❖ The most common form of regression analysis is linear regression, in which the relationship between the dependent and independent variables is approximated by a linear equation.
❖ The goal is to find the best-fitting line (or hyperplane in the case of multiple
independent variables) that minimizes the sum of squared differences between the
observed and predicted values.

❖ Linear regression model:

   ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

where ŷ is the predicted value, n is the number of features, xᵢ is the i-th feature value, θ₀ is the bias term, and θ₁, …, θₙ are the feature weights.

Linear regression (cont.)

❖ Linear regression model prediction in vectorized form:

   ŷ = hθ(x) = θ⊺x

where θ is the model's parameter vector (containing the bias term θ₀ and the feature weights θ₁ to θₙ) and x is the instance's feature vector with x₀ = 1.

Linear regression: method of least squares

❖ The model parameters for the hypothesis function may be determined by the principle of least squares: minimize the mean of the squared differences (MSE) between the observed dependent or target variable in the input dataset and the output of the (linear) hypothesis function of the independent or feature variables.

❖ MSE cost function for a linear regression model:

   MSE(X, hθ) = (1/m) Σᵢ (θ⊺x⁽ⁱ⁾ − y⁽ⁱ⁾)²

where the sum runs over the m training instances, x⁽ⁱ⁾ is the feature vector of the i-th instance (with x₀ = 1) and y⁽ⁱ⁾ is its target value.

❖ The normal equation gives the parameter vector that minimizes the cost function in closed form:

   θ̂ = (X⊺X)⁻¹ X⊺ y

where X is the matrix of training feature vectors and y is the vector of target values.

❖ The matrix X⊺X is referred to as the Gram matrix or normal matrix.


Coding linear regression model from scratch
import numpy as np

np.random.seed(42)  # to make this code example reproducible
m = 100  # number of instances
X = 2 * np.random.rand(m, 1)  # column vector
y = 4 + 3 * X + np.random.randn(m, 1)  # column vector

from sklearn.preprocessing import add_dummy_feature

X_b = add_dummy_feature(X)  # add x0 = 1 to each instance
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

>>> theta_best
array([[4.21509616],
       [2.77011339]])

Fig.2.1. A randomly generated linear dataset

>>> X_new = np.array([[0], [2]])
>>> X_new_b = add_dummy_feature(X_new)  # add x0 = 1 to each instance
>>> y_predict = X_new_b @ theta_best
>>> y_predict
array([[4.21509616],
       [9.75532293]])

import matplotlib.pyplot as plt

plt.plot(X_new, y_predict, "r-", label="Predictions")
plt.plot(X, y, "b.")

Fig.2.2. Linear regression model predictions
Performing linear regression using Scikit-Learn

>>> from sklearn.linear_model import LinearRegression
>>> lin_reg = LinearRegression()
>>> lin_reg.fit(X, y)
>>> lin_reg.intercept_, lin_reg.coef_
(array([4.21509616]), array([[2.77011339]]))
>>> lin_reg.predict(X_new)
array([[4.21509616],
       [9.75532293]])

>>> theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
>>> theta_best_svd
array([[4.21509616],
       [2.77011339]])

>>> np.linalg.pinv(X_b) @ y
array([[4.21509616],
       [2.77011339]])

❖ The LinearRegression class is based on the scipy.linalg.lstsq() function (the name stands for "least squares"), which can be called directly.

❖ This function computes θ̂ = X⁺y, where X⁺ is the pseudoinverse of X (specifically, the Moore–Penrose inverse). You can use np.linalg.pinv() to compute the pseudoinverse directly.

❖ The pseudoinverse itself is computed using a standard matrix factorization technique called singular value decomposition (SVD), which decomposes the training set matrix X into the matrix multiplication of three matrices U Σ V⊺. The pseudoinverse is computed as X⁺ = VΣ⁺U⊺.

❖ Computational complexity: normal equation: O(n³); SVD-based pseudoinverse: O(n²).
Computational complexity of linear regression - Gradient Descent

❖ Computational cost of solving the normal equation: O(n³), where n is the number of features.

❖ Computational cost of the SVD-based approach of Scikit-Learn: O(n²).

❖ Both the Normal equation and the SVD approach get very slow when the number of
features grows large (e.g., 100,000).

❖ Gradient descent is a fundamental optimization technique and is widely employed in training various machine learning models, including neural networks, linear regression, and logistic regression, among others.

❖ The primary goal of gradient descent is to iteratively adjust the model's parameters in
the direction that reduces the error.
Gradient Descent: Overview
❖ High level overview of gradient descent:
• Initialize Parameters: Start with initial values for the parameters of the model.

• Calculate the Gradient: Compute the gradient of the cost function with respect to each
parameter. The gradient represents the direction and magnitude of the steepest
increase in the cost function.

• Update Parameters: Adjust the parameters in the opposite direction of the gradient to
decrease the cost. This is done by multiplying the gradient by a learning rate and
subtracting the result from the current parameter values.

• Repeat steps 2 and 3 until convergence or for a predefined number of iterations (a minimal code sketch of this loop follows the figure below).

❖ Different variants:
Batch Gradient Descent,
Stochastic Gradient Descent,
Mini-Batch Gradient Descent

Fig.2.3. Pictorial description of Gradient Descent algorithm
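❖ As a minimal illustration of these four steps (not from the original slides; the cost function, learning rate, and iteration count are arbitrary choices), plain gradient descent on the one-variable function f(w) = (w − 3)² looks like this:

def grad(w):
    return 2 * (w - 3)     # derivative of f(w) = (w - 3)**2

w = 0.0                    # step 1: initialize the parameter
eta = 0.1                  # learning rate
for _ in range(100):       # step 4: repeat for a fixed number of iterations
    g = grad(w)            # step 2: compute the gradient at the current value
    w = w - eta * g        # step 3: move against the gradient, scaled by the learning rate

print(w)                   # converges to w ≈ 3, the minimizer of f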


Gradient Descent: Learning Rate
❖ The learning rate is a crucial hyperparameter: a value that is too small may lead to slow convergence, while a value that is too large may cause overshooting and instability.

Fig.2.4. Learning rate too small
Fig.2.5. Learning rate too high

❖ Local and global minimum

❖ Plateau

❖ Fortunately, the MSE cost function for a linear regression model happens to be a convex function.

Fig.2.6. Gradient descent pitfalls


Gradient Descent: Feature scaling
❖ When using gradient descent, you should ensure that all features have a similar scale
(e.g., using Scikit-Learn’s StandardScaler class), or else it will take much longer to
converge.

Fig.2.7. Gradient descent with (left) and without (right) feature scaling
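❖ A minimal sketch of this scaling step (an illustrative addition; it assumes a feature matrix X like the one generated earlier):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # each feature now has zero mean and unit variance
# gradient descent would then be run on X_scaled instead of X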
Batch Gradient Descent
❖ To implement gradient descent, you need to compute the gradient of the cost function with regard to each model parameter θⱼ:

   ∂MSE(θ)/∂θⱼ = (2/m) Σᵢ (θ⊺x⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾   (5)

❖ Gradient vector of the cost function:

   ∇θMSE(θ) = (2/m) X⊺(Xθ − y)   (6)

❖ Gradient Descent step:

   θ(next step) = θ − η ∇θMSE(θ)   (7)

An implementation of batch gradient descent:

eta = 0.1  # learning rate
n_epochs = 1000
m = len(X_b)  # number of instances

np.random.seed(42)
theta = np.random.randn(2, 1)  # randomly initialized model parameters

for epoch in range(n_epochs):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients

>>> theta
array([[4.21509616],
       [2.77011339]])
Batch Gradient Descent: Learning rates
❖ Gradient descent worked perfectly. But what if you had used a different learning rate (eta)? Fig.2.8 shows the first 20 steps of gradient descent using three different learning rates.

Fig.2.8. Batch gradient descent with various learning rates

❖ To find a good learning rate, we can use grid search.
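❖ A minimal sketch of such a grid search (an illustrative addition; the candidate values and the use of SGDRegressor's eta0 hyperparameter as the learning rate are assumptions, and X and y are the data generated earlier):

from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {"eta0": [0.001, 0.01, 0.1, 0.5]}   # candidate learning rates
grid_search = GridSearchCV(
    SGDRegressor(max_iter=1000, random_state=42),
    param_grid, cv=5, scoring="neg_root_mean_squared_error")
grid_search.fit(X, y.ravel())        # y.ravel() because fit() expects 1D targets
print(grid_search.best_params_)      # learning rate with the best cross-validation score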


Stochastic Gradient Descent
❖ Instead of using the whole training set, stochastic gradient descent picks a random instance in the training set at every step and computes the gradients based only on that single instance.
❖ It is very fast and memory efficient.

❖ Due to its stochastic nature, the algorithm is much less regular than batch gradient descent.
❖ The randomness also helps the algorithm jump out of local minima.

Fig.2.9. Convergence of the Stochastic Gradient Descent algorithm

from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-5, penalty=None, eta0=0.01,
                       n_iter_no_change=100, random_state=42)
sgd_reg.fit(X, y.ravel())  # y.ravel() because fit() expects 1D targets

>>> sgd_reg.intercept_, sgd_reg.coef_
(array([4.21278812]), array([2.77270267]))

❖ Learning schedule: simulated annealing
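❖ Since the slide only names the learning schedule, here is a minimal from-scratch sketch (an illustration, not the slides' code; the schedule constants t0 and t1 are arbitrary) of stochastic gradient descent with a decaying learning rate, reusing X_b, y, m, and the numpy import from earlier:

n_epochs = 50
t0, t1 = 5, 50                       # learning schedule hyperparameters (illustrative values)

def learning_schedule(t):
    return t0 / (t + t1)             # the learning rate decays as training progresses

np.random.seed(42)
theta = np.random.randn(2, 1)        # random initialization

for epoch in range(n_epochs):
    for iteration in range(m):
        random_index = np.random.randint(m)         # pick one training instance at random
        xi = X_b[random_index : random_index + 1]   # keep the 2D shape (1, n+1)
        yi = y[random_index : random_index + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)    # gradient computed from a single instance
        eta = learning_schedule(epoch * m + iteration)
        theta = theta - eta * gradients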
Mini Batch Gradient Descent
❖ Mini-batch GD computes the gradients on small random sets of instances called
mini-batches.

Fig.2.10. Gradient descent paths in parameter space
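❖ The slides include no code for this variant, so here is a minimal from-scratch sketch (an illustrative addition; the batch size and fixed learning rate are arbitrary choices, and X_b, y, and m come from earlier):

n_epochs = 50
batch_size = 20
eta = 0.1                                            # fixed learning rate for simplicity
np.random.seed(42)
theta = np.random.randn(2, 1)                        # random initialization

for epoch in range(n_epochs):
    shuffled = np.random.permutation(m)              # reshuffle the training set each epoch
    X_shuffled, y_shuffled = X_b[shuffled], y[shuffled]
    for start in range(0, m, batch_size):
        xi = X_shuffled[start : start + batch_size]  # one mini-batch of instances
        yi = y_shuffled[start : start + batch_size]
        gradients = 2 / len(xi) * xi.T @ (xi @ theta - yi)   # gradient on the mini-batch
        theta = theta - eta * gradients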
Polynomial Regression
❖ The same ordinary least squares method used in linear regression can be used for polynomial regression as well.
❖ A simple way to do this is to add powers of each feature as new features, then train a linear model
on this extended set of features.
np.random.seed(42)
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X ** 2 + X + 2 + np.random.randn(m, 1)

Fig.2.11. Generated nonlinear and noisy dataset

>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly_features = PolynomialFeatures(degree=2, include_bias=False)
>>> X_poly = poly_features.fit_transform(X)
>>> X[0]
array([-0.75275929])
>>> X_poly[0]
array([-0.75275929, 0.56664654])
>>> lin_reg = LinearRegression()
>>> lin_reg.fit(X_poly, y)
>>> lin_reg.intercept_, lin_reg.coef_
(array([1.78134581]), array([[0.93366893, 0.56456263]]))
❖ Model estimate: ŷ = 0.56 x₁² + 0.93 x₁ + 1.78

Fig.2.12. Polynomial regression model predictions
Learning Curves
❖ The high-degree polynomial regression model is severely overfitting the training data, while the
linear model is underfitting it.

❖ In general, how can you decide how complex your model should be? How can you tell that your model is overfitting or underfitting the data?

❖ Cross-validation is the method to get an estimate of a model's generalization performance (see the sketch at the end of this slide).

❖ Learning curves: plots of the model's training error and validation error as a function of the training iteration.

Fig.2.13. High-degree polynomial regression

❖ Learning curves: there is a learning_curve() function in sklearn.model_selection to analyze how the performance of a LinearRegression model changes as the training dataset size increases.
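❖ A minimal sketch of the cross-validation idea (an illustrative addition, not part of the slides; it reuses the noisy quadratic X and y from Fig.2.11):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation; scores are negative RMSE, so negate them to get errors
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_root_mean_squared_error")
print(-scores.mean())   # estimated generalization error (RMSE)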
Learning Curves (Cont.)

❖ Fig.2.14 shows the output of the learning curve for simple linear regression applied to the noisy quadratic dataset of Fig.2.11.

from sklearn.model_selection import learning_curve

train_sizes, train_scores, valid_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.01, 1.0, 40), cv=5,
    scoring="neg_root_mean_squared_error")

train_errors = -train_scores.mean(axis=1)
valid_errors = -valid_scores.mean(axis=1)

Fig.2.14. Learning curves


❖ Interpretation of the results: the training and validation errors are close, but both are high, which indicates underfitting.
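❖ The code above computes the error arrays but does not show the plotting step; a minimal plotting sketch (an illustrative addition) that would produce a figure like Fig.2.14:

import matplotlib.pyplot as plt

plt.plot(train_sizes, train_errors, "r-+", linewidth=2, label="train")
plt.plot(train_sizes, valid_errors, "b-", linewidth=3, label="valid")
plt.xlabel("Training set size")
plt.ylabel("RMSE")
plt.legend()
plt.show()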
Learning Curves (Cont.)
❖ Now, consider the learning curves of a 10th-degree polynomial model on the same data (Fig.2.11):

from sklearn.pipeline import make_pipeline

polynomial_regression = make_pipeline(
    PolynomialFeatures(degree=10, include_bias=False),
    LinearRegression())

train_sizes, train_scores, valid_scores = learning_curve(
    polynomial_regression, X, y,
    train_sizes=np.linspace(0.01, 1.0, 40), cv=5,
    scoring="neg_root_mean_squared_error")

Fig.2.15. Learning curves
❖ There are two important differences as compared to the previous case:
• The error on the training data is much lower than before.
• There is a gap between the curves. This means that the model performs significantly better on
the training data than on the validation data, which is the sign of overfitting.

❖ Bottom-line:
• If the training and validation scores are close and good, the model generalizes well.
• If the training score is much better than the validation score, the model may be overfitting.
• If both scores are poor, the model may be underfitting.
Logistic Regression

❖ Logistic regression is a statistical method used for binary classification problems, where the goal is to
predict the probability that an instance belongs to a particular class.

❖ Just like a linear regression model, a logistic regression model computes a weighted sum of the input features (plus a bias term), but instead of outputting the result directly like the linear regression model does, it outputs the logistic of this result.

❖ Estimating probabilities:

   p̂ = hθ(x) = σ(θ⊺x)

❖ Logistic sigmoid function:

   σ(t) = 1 / (1 + exp(−t))

Fig.2.11. Logistic function

❖ Prediction: logistic regression model prediction using a 50% threshold probability:

   ŷ = 0 if p̂ < 0.5,  1 if p̂ ≥ 0.5   (10)

❖ With the default threshold of 50% probability, the model predicts 1 if θ⊺x > 0 and 0 if θ⊺x < 0.
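❖ As a minimal numerical illustration of these formulas (not from the slides; the parameter and feature values are arbitrary), the probability estimate and the 50% threshold rule can be written directly in NumPy:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))   # logistic sigmoid

theta = np.array([-3.0, 2.0])         # illustrative parameters: bias term and one feature weight
x = np.array([1.0, 1.8])              # instance with bias feature x0 = 1 and feature x1 = 1.8

p_hat = sigmoid(theta @ x)            # estimated probability of the positive class
y_hat = int(p_hat >= 0.5)             # predict 1 when the probability is at least 50%
print(p_hat, y_hat)                   # theta @ x = 0.6 > 0, so p_hat ≈ 0.65 and y_hat = 1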
Logistic Regression: Training and Cost Function

❖ The objective of training is to set the parameter vector θ so that the model estimates high
probabilities for positive instances (y = 1) and low probabilities for negative instances (y = 0).

❖ Cost function for a single training instance:

   c(θ) = −log(p̂) if y = 1,  −log(1 − p̂) if y = 0   (11)

❖ Log loss (the cost function over the whole training set):

   J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(p̂⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − p̂⁽ⁱ⁾) ]   (12)

❖ Minimization of the cost function: there is no closed-form solution, but the cost function is convex, so gradient descent can be used with the partial derivatives

   ∂J(θ)/∂θⱼ = (1/m) Σᵢ (σ(θ⊺x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾   (13)

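❖ To make equations (12) and (13) concrete, here is a minimal from-scratch sketch (an illustration on assumed toy data, not the slides' code) that computes the log loss and runs plain gradient descent:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# toy data: 4 instances, one feature, with a bias column x0 = 1 (illustrative values)
X_toy = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 2.0], [1.0, 2.5]])
y_toy = np.array([[0.0], [0.0], [1.0], [1.0]])
m_toy = len(X_toy)

theta = np.zeros((2, 1))   # initialize the parameters
eta = 0.1                  # learning rate

for _ in range(1000):
    p_hat = sigmoid(X_toy @ theta)                                            # estimated probabilities
    log_loss = -(y_toy * np.log(p_hat) + (1 - y_toy) * np.log(1 - p_hat)).mean()  # eq. (12)
    gradients = (1 / m_toy) * X_toy.T @ (p_hat - y_toy)                       # eq. (13)
    theta = theta - eta * gradients                                           # gradient descent step

print(theta.ravel(), log_loss)   # learned [bias, weight] and the final log loss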
Logistic Regression: Decision Boundary

❖ We can use the iris dataset to illustrate logistic regression. This is a famous dataset that contains the sepal and petal length and width of 150 iris flowers of three different species: Iris setosa, Iris versicolor, and Iris virginica.
>>> from sklearn.datasets import load_iris
>>> iris = load_iris(as_frame=True)
>>> list(iris)
['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module']
>>> iris.data.head(3)
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
>>> iris.target.head(3)  # note that the instances are not shuffled
0    0
1    0
2    0
Name: target, dtype: int64
>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = iris.data[["petal width (cm)"]].values
y = iris.target_names[iris.target] == 'virginica'
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train, y_train)
Logistic Regression: Decision Boundary

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)  # reshape to get a column vector
y_proba = log_reg.predict_proba(X_new)
decision_boundary = X_new[y_proba[:, 1] >= 0.5][0, 0]

plt.plot(X_new, y_proba[:, 0], "b--", linewidth=2, label="Not Iris virginica proba")
plt.plot(X_new, y_proba[:, 1], "g-", linewidth=2, label="Iris virginica proba")
plt.plot([decision_boundary, decision_boundary], [0, 1], "k:", linewidth=2,
         label="Decision boundary")
[...]  # beautify the figure: add grid, labels, axis, legend, arrows, and samples
plt.show()

>>> decision_boundary
1.6516516516516517
>>> log_reg.predict([[1.7], [1.5]])
array([ True, False])
References

1. A.M. Legendre. Nouvelles méthodes pour la détermination des orbites des comètes, Firmin Didot,
Paris, 1805. “Sur la Méthode des moindres quarrés” appears as an appendix.

2. Chapter 1 of: Angrist, J. D., & Pischke, J. S. (2008). Mostly Harmless Econometrics: An Empiricist's
Companion. Princeton University Press.
