Linear Regression

Overview:
• Linear regression aims to predict a target value from an input value using a linear model.
• Nonlinear basis functions can be applied to the inputs to capture nonlinear relationships while still using a linear model.
• Maximum likelihood estimation and least squares are commonly used to fit the model parameters by minimizing the sum of squared errors.
• Regularization can be added to the least-squares objective to prevent overfitting, and multiple outputs can be modeled with a matrix of weights.


Introduction to Machine Learning

Linear Models for Regression


Yen-Yu Lin (林彥宇), Professor
Department of Computer Science, National Yang Ming Chiao Tung University (國立陽明交通大學 資訊工程學系)

Some slides are modified from Prof. Sheng-Jyh Wang and Prof. Hwang-Tzong Chen

Regression

• Given a training data set comprising 𝑁 observations {𝐱𝑛}, 𝑛 = 1, … , 𝑁,
  and the corresponding target values {𝑡𝑛}, the goal of regression is to
  predict the value of 𝑡 for a new value of 𝐱

https://www.scribbr.com/statistics/linear-regression-in-r/
2
A simple regression model

• A simple linear model: 𝑦(𝐱, 𝐰) = 𝑤0 + 𝑤1𝑥1 + ⋯ + 𝑤𝐷𝑥𝐷

➢ Each observation is in a 𝐷-dimensional space: 𝐱 = (𝑥1, … , 𝑥𝐷)T
➢ 𝑦 is a regression model parametrized by 𝐰 = (𝑤0, … , 𝑤𝐷)T
➢ The output is a linear combination of the input variables
➢ It is a linear function of the parameters
➢ Its fitting power is quite limited, so we seek a nonlinear extension
  for the input variables

3
An example

• A regressor in the form of 𝑦(𝑥, 𝐰) = 𝑤0 + 𝑤1𝑥

➢ A straight line in this case → insufficient fitting power
➢ Apply nonlinear feature transforms before linear regression

4
Linear regression with nonlinear basis functions

• Simple linear model: 𝑦(𝐱, 𝐰) = 𝑤0 + 𝑤1𝑥1 + ⋯ + 𝑤𝐷𝑥𝐷

• A linear model with nonlinear basis functions:

  𝑦(𝐱, 𝐰) = 𝑤0 + ∑𝑗 𝑤𝑗 𝜙𝑗(𝐱)  (𝑗 = 1, … , 𝑀 − 1)

  where {𝜙𝑗}: nonlinear basis functions
        𝑀: the number of parameters
        𝑤0: the bias parameter allowing a fixed offset

• The regression output is a linear combination of nonlinear basis
  functions of the inputs
5
Linear regression with nonlinear basis functions

• A linear model with nonlinear basis functions:

  𝑦(𝐱, 𝐰) = 𝑤0 + ∑𝑗 𝑤𝑗 𝜙𝑗(𝐱)  (𝑗 = 1, … , 𝑀 − 1)

• Let 𝜙0(𝐱) = 1 be a dummy basis function. The regression function
  is equivalently expressed as

  𝑦(𝐱, 𝐰) = ∑𝑗 𝑤𝑗 𝜙𝑗(𝐱) = 𝐰T𝝓(𝐱)  (𝑗 = 0, … , 𝑀 − 1)

  where 𝐰 = (𝑤0, … , 𝑤𝑀−1)T and 𝝓(𝐱) = (𝜙0(𝐱), … , 𝜙𝑀−1(𝐱))T

6
Examples of basis functions
• Polynomial basis function: taking the form of powers of 𝑥

  𝜙𝑗(𝑥) = 𝑥^𝑗

• Gaussian basis function: governed by a location 𝜇𝑗 and a scale 𝑠

  𝜙𝑗(𝑥) = exp(−(𝑥 − 𝜇𝑗)² / (2𝑠²))

➢ 𝜇𝑗 governs the location while 𝑠 governs the scale

• Sigmoidal basis function: governed by 𝜇𝑗 and 𝑠

  𝜙𝑗(𝑥) = 𝜎((𝑥 − 𝜇𝑗) / 𝑠), where 𝜎(𝑎) = 1 / (1 + exp(−𝑎))

7
How basis functions work

• Take Gaussian basis functions as an example

y = w0 + w11 ( x ) + w22 ( x ) + ... + wM −1M −1 ( x )

1(x) 2(x) 3(x) 4(x) 5(x) 6(x) 7(x) 8(x)

8
Maximum likelihood and least squares

• Assume each observation is sampled from a deterministic function
  with added Gaussian noise:

  𝑡 = 𝑦(𝐱, 𝐰) + 𝜀

  where 𝜀 is a zero-mean Gaussian noise whose precision (inverse
  variance) is 𝛽

• Thus, we have the conditional probability

  𝑝(𝑡 | 𝐱, 𝐰, 𝛽) = 𝒩(𝑡 | 𝑦(𝐱, 𝐰), 𝛽⁻¹)

9
Maximum likelihood and least squares

• Given a data set of inputs X = {𝐱1, … , 𝐱𝑁} with corresponding
  target values 𝑡1, … , 𝑡𝑁, we have the likelihood function

  𝑝(𝐭 | X, 𝐰, 𝛽) = ∏𝑛 𝒩(𝑡𝑛 | 𝐰T𝝓(𝐱𝑛), 𝛽⁻¹)

• The log likelihood function is

  ln 𝑝(𝐭 | 𝐰, 𝛽) = (𝑁/2) ln 𝛽 − (𝑁/2) ln(2π) − 𝛽 𝐸𝐷(𝐰)

  where 𝐸𝐷(𝐰) = (1/2) ∑𝑛 {𝑡𝑛 − 𝐰T𝝓(𝐱𝑛)}²

10
Maximum likelihood and least squares

• Given a data set of inputs X = {𝐱1, … , 𝐱𝑁} with corresponding
  target values 𝑡1, … , 𝑡𝑁, we have the likelihood function

  𝑝(𝐭 | X, 𝐰, 𝛽) = ∏𝑛 𝒩(𝑡𝑛 | 𝐰T𝝓(𝐱𝑛), 𝛽⁻¹)

• The log likelihood function is

  ln 𝑝(𝐭 | 𝐰, 𝛽) = (𝑁/2) ln 𝛽 − (𝑁/2) ln(2π) − 𝛽 𝐸𝐷(𝐰)   (How?)

  where 𝐸𝐷(𝐰) = (1/2) ∑𝑛 {𝑡𝑛 − 𝐰T𝝓(𝐱𝑛)}²
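One way to answer the "How?" above, reconstructed here as a worked step following the PRML presentation that these slides cite: take the logarithm of the product of Gaussians and use ln 𝒩(𝑡 | 𝜇, 𝛽⁻¹) = (1/2) ln 𝛽 − (1/2) ln(2π) − (𝛽/2)(𝑡 − 𝜇)².

\ln p(\mathbf{t}\mid\mathbf{w},\beta)
  = \sum_{n=1}^{N} \ln \mathcal{N}\!\left(t_n \mid \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n),\, \beta^{-1}\right)
  = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}),
\qquad
E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\bigr\}^2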

11
Maximum likelihood and least squares

• Gaussian noise likelihood ⟺ sum-of-squares error function

• Maximum likelihood solution: optimize 𝐰 by maximizing the
  log likelihood function

• Step 1: Compute the gradient of the log likelihood w.r.t. 𝐰

  ∇𝐰 ln 𝑝(𝐭 | 𝐰, 𝛽) = 𝛽 ∑𝑛 {𝑡𝑛 − 𝐰T𝝓(𝐱𝑛)} 𝝓(𝐱𝑛)T

• Step 2: Set the gradient to zero, which gives

  0 = ∑𝑛 𝑡𝑛 𝝓(𝐱𝑛)T − 𝐰T (∑𝑛 𝝓(𝐱𝑛) 𝝓(𝐱𝑛)T)

12
Maximum likelihood and least squares

• Define the design matrix 𝚽 in this task, with elements 𝚽𝑛𝑗 = 𝜙𝑗(𝐱𝑛)

➢ It has 𝑁 rows, one for each training sample
➢ It has 𝑀 columns, one for each basis function
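For reference, the full design matrix described by the two bullets above, written out in the form used in PRML (the source these slides cite):

\boldsymbol{\Phi} =
\begin{pmatrix}
\phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\
\phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\
\vdots               & \vdots               & \ddots & \vdots                   \\
\phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N)
\end{pmatrix},
\qquad \Phi_{nj} = \phi_j(\mathbf{x}_n)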

13
Maximum likelihood and least squares

• Setting the gradient to zero

  0 = ∑𝑛 𝑡𝑛 𝝓(𝐱𝑛)T − 𝐰T (∑𝑛 𝝓(𝐱𝑛) 𝝓(𝐱𝑛)T)

  we have the normal equations

  𝐰ML = (𝚽T𝚽)⁻¹𝚽T𝐭

• How to derive?
➢ Hint 1:
➢ Hint 2:

14
Maximum likelihood and least squares

• The ML solution

  𝐰ML = (𝚽T𝚽)⁻¹𝚽T𝐭 = 𝚽†𝐭

• 𝚽† = (𝚽T𝚽)⁻¹𝚽T is known as the Moore-Penrose pseudo-inverse of
  the design matrix

• 𝚽 has linearly independent columns. Why is 𝚽T𝚽 invertible?
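A minimal NumPy sketch, not part of the original slides, of the maximum-likelihood solution via the pseudo-inverse; the design matrix Phi and target vector t are assumed to be given, and beta_ml follows the formula on the next slide.

import numpy as np

def fit_ml(Phi, t):
    # w_ML = pinv(Phi) @ t, i.e., (Phi^T Phi)^(-1) Phi^T t when Phi has full column rank
    w_ml = np.linalg.pinv(Phi) @ t
    residuals = t - Phi @ w_ml
    beta_ml = len(t) / np.sum(residuals ** 2)   # 1/beta_ML = average squared error
    return w_ml, beta_ml

# np.linalg.lstsq(Phi, t, rcond=None) is a numerically safer alternative in practice.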

15
Maximum likelihood and least squares

• Similarly, 𝛽 is optimized by maximizing the log likelihood

  ln 𝑝(𝐭 | 𝐰, 𝛽) = (𝑁/2) ln 𝛽 − (𝑁/2) ln(2π) − 𝛽 𝐸𝐷(𝐰)

  where 𝐸𝐷(𝐰) = (1/2) ∑𝑛 {𝑡𝑛 − 𝐰T𝝓(𝐱𝑛)}²

• We get

  1/𝛽ML = (1/𝑁) ∑𝑛 {𝑡𝑛 − (𝐰ML)T𝝓(𝐱𝑛)}²

16
Regression for a new data point

• The conditional probability (likelihood function)

  𝑝(𝑡 | 𝐱, 𝐰, 𝛽) = 𝒩(𝑡 | 𝑦(𝐱, 𝐰), 𝛽⁻¹)

• After learning, we set 𝐰 ← 𝐰ML and 𝛽 ← 𝛽ML

• The prediction for a new data point 𝐱 is then specified as a Gaussian
  distribution with mean 𝑦(𝐱, 𝐰ML) and variance 𝛽ML⁻¹

17
Regularized least squares

• Adding a regularization term helps alleviate over-fitting

• The simplest form of the regularization term: 𝐸𝑊(𝐰) = (1/2) 𝐰T𝐰

• The total error function becomes

  (1/2) ∑𝑛 {𝑡𝑛 − 𝐰T𝝓(𝐱𝑛)}² + (λ/2) 𝐰T𝐰

• Setting the gradient of this function w.r.t. 𝐰 to 0, we have

  𝐰 = (λ𝐈 + 𝚽T𝚽)⁻¹𝚽T𝐭
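A minimal NumPy sketch, not part of the original slides, of the regularized (ridge) least-squares solution above; Phi, t, and the regularization coefficient lam are assumed to be given.

import numpy as np

def fit_regularized(Phi, t, lam):
    # w = (lambda I + Phi^T Phi)^(-1) Phi^T t, solved without forming an explicit inverse
    M = Phi.shape[1]
    A = lam * np.eye(M) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)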

18
Regularized least squares

• A more general regularizer

  (1/2) ∑𝑛 {𝑡𝑛 − 𝐰T𝝓(𝐱𝑛)}² + (λ/2) ∑𝑗 |𝑤𝑗|^𝑞

• 𝑞 = 2 → quadratic regularizer
• 𝑞 = 1 → the lasso in the statistics literature
• [Figure: contours of the regularization term for different values of 𝑞]

19
Multiple outputs

• In some applications, we wish to predict 𝐾 > 1 target values

➢ One target value: Income → Happiness
➢ Multiple target values: Income → Happiness, Hours of duty, Health

• Recall the one-dimensional case: 𝑦(𝐱, 𝐰) = 𝐰T𝝓(𝐱)

• With the same basis functions, the regression approach becomes

  𝐲(𝐱, 𝐖) = 𝐖T𝝓(𝐱)

  where 𝐖 is an 𝑀 × 𝐾 matrix, 𝑀 is the number of basis functions,
  and 𝐾 is the number of target values

20
Multiple outputs

• The conditional probability of a single observation is

  𝑝(𝐭 | 𝐱, 𝐖, 𝛽) = 𝒩(𝐭 | 𝐖T𝝓(𝐱), 𝛽⁻¹𝐈)

➢ An isotropic Gaussian, i.e., its covariance matrix is a scalar
  multiple of the identity
➢ Each pair of target variables is independent

• The log likelihood function is

  ln 𝑝(𝐓 | 𝐗, 𝐖, 𝛽) = (𝑁𝐾/2) ln(𝛽/2π) − (𝛽/2) ∑𝑛 ‖𝐭𝑛 − 𝐖T𝝓(𝐱𝑛)‖²

21
Multiple outputs: Maximum likelihood solution

• Setting the gradient of the log likelihood function w.r.t. 𝐖 to 0,
  we have

  𝐖ML = (𝚽T𝚽)⁻¹𝚽T𝐓

• Consider the 𝑘th column of 𝐖ML

  𝐰𝑘 = (𝚽T𝚽)⁻¹𝚽T𝐭𝑘

  where 𝐭𝑘 is an 𝑁-dimensional vector with components 𝑡𝑛𝑘, 𝑛 = 1, … , 𝑁

• It leads to 𝐾 independent regression problems

22
Sequential learning

• The maximum likelihood derivation is a batch technique


➢ It takes all training data into account at the same time
➢ Case 1: The training data set is sufficiently large
➢ Case 2: Data points are arriving in a continuous stream

• For the two cases, it is worthwhile to use sequential algorithms, or
  on-line algorithms, in which the data points are considered one by
  one, and the model parameters are updated incrementally

23
Sequential learning

• Stochastic gradient descent

➢ The error function comprises a sum over data points: 𝐸 = ∑𝑛 𝐸𝑛

➢ Given data point 𝐱𝑛, the parameter vector 𝐰 is updated by

  𝐰(𝜏+1) = 𝐰(𝜏) − η ∇𝐸𝑛

  where 𝜏 is the iteration number and η is the learning rate

➢ In the case of the sum-of-squares error, it is

  𝐰(𝜏+1) = 𝐰(𝜏) + η (𝑡𝑛 − (𝐰(𝜏))T𝝓(𝐱𝑛)) 𝝓(𝐱𝑛)
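A minimal NumPy sketch, not part of the original slides, of one least-mean-squares (LMS) update for the sum-of-squares error; phi_n is the basis-function vector of the incoming sample and eta is an assumed learning rate.

import numpy as np

def lms_update(w, phi_n, t_n, eta=0.01):
    # w <- w + eta * (t_n - w^T phi_n) * phi_n
    return w + eta * (t_n - w @ phi_n) * phi_n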

24
Maximum a posteriori

• Likelihood function

  𝑝(𝐭 | 𝐰, 𝛽) = ∏𝑛 𝒩(𝑡𝑛 | 𝐰T𝝓(𝐱𝑛), 𝛽⁻¹)

• Let us consider a prior function, which is a Gaussian

  𝑝(𝐰) = 𝒩(𝐰 | 𝐦0, 𝐒0)

  where 𝐦0 is the mean and 𝐒0 is the covariance matrix

• The posterior function is also a Gaussian, 𝑝(𝐰 | 𝐭) = 𝒩(𝐰 | 𝐦𝑁, 𝐒𝑁),
  where 𝐦𝑁 is the mean and 𝐒𝑁 is the covariance (see the
  reconstruction below)
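The posterior mean and covariance referred to above, reconstructed from the PRML treatment that these slides cite:

\mathbf{m}_N = \mathbf{S}_N\!\left(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\,\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}\right),
\qquad
\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\,\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}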
25
How to derive the mean and covariance in posterior

• According to the results for marginal and conditional Gaussians on
  page 93 of the PRML textbook

26
A zero-mean isotropic Gaussian prior

• A general Gaussian prior function

  𝑝(𝐰) = 𝒩(𝐰 | 𝐦0, 𝐒0)

  where 𝐦0 is the mean and 𝐒0 is the covariance matrix

• A widely used Gaussian prior: zero-mean and isotropic

  𝑝(𝐰 | α) = 𝒩(𝐰 | 𝟎, α⁻¹𝐈)

• Mean and covariance of the resulting posterior function

  𝐦𝑁 = 𝛽 𝐒𝑁 𝚽T𝐭,   𝐒𝑁⁻¹ = α𝐈 + 𝛽 𝚽T𝚽
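A minimal NumPy sketch, not part of the original slides, of computing this posterior for the zero-mean isotropic prior; Phi, t, alpha, and beta are assumed to be given.

import numpy as np

def posterior(Phi, t, alpha, beta):
    # S_N^(-1) = alpha I + beta Phi^T Phi,  m_N = beta S_N Phi^T t
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N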

27
Sequential Bayesian learning: An example

• Data, including observations and target values, are given one by one

• Data are in a one-dimensional space

• Data are sampled from a linear function 𝑓(𝑥, 𝐚) = 𝑎0 + 𝑎1𝑥 with fixed
  coefficients 𝑎0 and 𝑎1 (in PRML's version of this example, 𝑎0 = −0.3
  and 𝑎1 = 0.5), and a Gaussian noise is added

➢ Note that the function is unknown to the learner
➢ We have just the observations and the target values

28
An example

• Regression function: 𝑦(𝑥, 𝐰) = 𝑤0 + 𝑤1𝑥

29
An example

• Regression function

• In the beginning, no data are available

• Constant likelihood

• Prior = posterior

• Sample 6 curves for the function according to the posterior
  distribution

30
An example

• Regression function

• One data sample (blue circle) is given

• Likelihood for this sample

• White cross: the true parameter values

• Posterior proportional to likelihood × prior

• Sample 6 curves according to the posterior
31
An example

• Regression function

• A second data sample (blue circle) is given

• Likelihood for the second sample

• White cross: the true parameter values

• Posterior proportional to likelihood × prior

• Sample 6 curves according to the posterior
32
An example

• Regression function

• 20 data samples (blue circles) are given

• Likelihood for the 20th sample

• White cross: the true parameter values

• Posterior proportional to likelihood × prior

• Sample 6 curves according to the posterior
33
Predictive distribution
• Recall the posterior function

  𝑝(𝐰 | 𝐭) = 𝒩(𝐰 | 𝐦𝑁, 𝐒𝑁)

  where 𝐦𝑁 = 𝛽 𝐒𝑁 𝚽T𝐭 and 𝐒𝑁⁻¹ = α𝐈 + 𝛽 𝚽T𝚽

• Given 𝐰, we regress a data sample via 𝑦(𝐱, 𝐰) = 𝐰T𝝓(𝐱)

• In the Bayesian treatment, the predictive distribution is

  𝑝(𝑡 | 𝐭, α, 𝛽) = ∫ 𝑝(𝑡 | 𝐰, 𝛽) 𝑝(𝐰 | 𝐭, α, 𝛽) d𝐰

• Then we have

  𝑝(𝑡 | 𝐱, 𝐭, α, 𝛽) = 𝒩(𝑡 | 𝐦𝑁T𝝓(𝐱), σ𝑁²(𝐱))

• where σ𝑁²(𝐱) = 1/𝛽 + 𝝓(𝐱)T𝐒𝑁𝝓(𝐱)
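A minimal NumPy sketch, not part of the original slides, of evaluating this predictive distribution at a new input; phi_x is the basis-function vector of the new input, and m_N, S_N come from the posterior() sketch given earlier.

import numpy as np

def predictive(phi_x, m_N, S_N, beta):
    # mean = m_N^T phi(x),  variance = 1/beta + phi(x)^T S_N phi(x)
    mean = m_N @ phi_x
    var = 1.0 / beta + phi_x @ S_N @ phi_x
    return mean, var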
34
• Green curve: the function used to sample data; it is unknown to the learner
• Blue circles: the sampled data points
• After learning, the predictive distribution 𝑝(𝑡 | 𝑥, 𝐭) is obtained
• Red curve: the mean of the Gaussian predictive distribution above
• Red shaded region: one standard deviation on either side of the mean
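A minimal sketch, not part of the original slides, of how such a plot can be produced with Matplotlib; xs is a grid of test inputs, and design_row(x) is an assumed helper that returns the basis-function vector 𝝓(𝑥) used in the earlier sketches.

import numpy as np
import matplotlib.pyplot as plt

def plot_predictive(xs, design_row, m_N, S_N, beta):
    means, stds = [], []
    for x in xs:
        phi_x = design_row(x)
        mean = m_N @ phi_x                              # predictive mean
        var = 1.0 / beta + phi_x @ S_N @ phi_x          # predictive variance
        means.append(mean)
        stds.append(np.sqrt(var))
    means, stds = np.array(means), np.array(stds)
    plt.plot(xs, means, 'r-')                           # red curve: predictive mean
    plt.fill_between(xs, means - stds, means + stds,    # red shaded region: +/- 1 std
                     color='r', alpha=0.2)
    plt.show()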
35
36
• Sample 5 points of 𝐰 according to the posterior function
• Plot the corresponding regression functions

37
References

• Sections 3.1 and 3.3 of C. M. Bishop, Pattern Recognition and Machine Learning (PRML), Springer, 2006

38
Thank You for Your Attention!

Yen-Yu Lin (林彥宇)


Email: [email protected]
URL: https://www.cs.nycu.edu.tw/members/detail/lin

39
