
CIS 4526: Foundations of Machine Learning

Linear Regression
(modified from Sanja Fidler)

Instructor: Kai Zhang


CIS @ Temple University, Fall 2020
Regression Problems

• Curve Fitting
• Time Series Forecast

[Figures: example regression problems, including curve fitting and time-series forecasting]
Regression Problem

• What do all these problems have in common?
– Input: d-dimensional samples/vectors x
– Output: continuous target value y
• How do we make predictions? We need three ingredients:
– A model, a function y(x) that represents the relationship between x and y
– A loss (or cost, or objective) function, which tells us how well our model approximates the training examples
– Optimization, a way of finding the parameters of our model that minimize the loss function
Simple 1-D Example

[Figure: 1-D training points with a fitted curve y(x)]
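To make the three ingredients concrete in this 1-D setting, here is a minimal worked instance (the specific model form is the linear one introduced on the following slides):

$$\text{Model: } y(x) = w_0 + w_1 x, \qquad \text{Loss: } E(w_0, w_1) = \frac{1}{N}\sum_{n=1}^{N}\bigl(w_0 + w_1 x_n - y_n\bigr)^2, \qquad \text{Optimization: } \min_{w_0, w_1} E(w_0, w_1)$$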
Model Selection

• Model Complexity
– In what form should we parameterize the prediction function?
– How complex should the model be?
• Example: linear, quadratic, or degree-d polynomial? (1-D case)
• Common Belief
– Simple models
• less flexible, but may be easy to solve (such as a linear model)
– Complex models
• more powerful, but harder to solve, and prone to overfitting
• We will start by building simple, linear models
Higher-Dimensional Linear Regression

• Circles are training examples
• The line/plane represents the model/hypothesis
• Red lines represent the in-sample error
Linear Model

• Given $N$ training samples $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$ with d-dimensional inputs $\mathbf{x}_n \in \mathbf{R}^d$
• Linear model: $y(\mathbf{x}) = w_0 + w_1 x_1 + \dots + w_d x_d$
– Equivalent form: $y(\mathbf{x}_n) = \mathbf{w}'\mathbf{x}_n$, where $\mathbf{w} = (w_0, w_1, \dots, w_d)'$ and $\mathbf{x}_n = (1, x_{n,1}, \dots, x_{n,d})'$
– So we usually apply ``augmented'', (d+1)-dimensional data: each sample gets a constant 1 prepended
• More convenient for deriving closed-form solutions
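A minimal numpy sketch of the augmentation step (array names are illustrative, not from the slides):

```python
import numpy as np

# N samples with d features: X_raw has shape (N, d)
X_raw = np.array([[0.5, 1.2],
                  [1.0, 0.3],
                  [2.0, 1.8]])

# Prepend a constant-1 column so the bias w_0 folds into w;
# the augmented matrix has shape (N, d+1)
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
print(X)  # each row is (1, x_1, ..., x_d)
```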
Training (In-Sample) Error, E_in

• Loss function: $E_{in}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{w}'\mathbf{x}_n - y_n)^2 = \frac{1}{N}\,\|X\mathbf{w} - \mathbf{y}\|^2$
• Model: weight vector $\mathbf{w} \in \mathbf{R}^{(d+1)\times 1}$
• Input data matrix: $X \in \mathbf{R}^{N \times (d+1)}$, one augmented sample per row
• Output: target vector $\mathbf{y} \in \mathbf{R}^{N \times 1}$
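A quick numpy check of this loss on a toy fit (names are illustrative; the example is constructed so the residual is zero):

```python
import numpy as np

def in_sample_error(X, w, y):
    """E_in(w) = (1/N) * ||X w - y||^2 on the augmented data matrix X."""
    residual = X @ w - y           # shape (N,)
    return np.mean(residual ** 2)

# toy data: 3 samples, d = 1 (so d+1 = 2 columns after augmentation)
X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 1.5, 2.5])
w = np.array([0.5, 1.0])           # w_0 = 0.5, w_1 = 1.0 fits y exactly
print(in_sample_error(X, w, y))    # 0.0
```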
Minimizing E_in by Closed-Form Solution

In order to minimize the function
$E_{in}(\mathbf{w}) = \frac{1}{N}\|X\mathbf{w} - \mathbf{y}\|^2 = \frac{1}{N}(X\mathbf{w} - \mathbf{y})'(X\mathbf{w} - \mathbf{y}) = \frac{1}{N}(\mathbf{w}'X'X\mathbf{w} - 2\mathbf{w}'X'\mathbf{y} + \mathbf{y}'\mathbf{y})$
we need to set its gradient to 0:
$\nabla E_{in}(\mathbf{w}) = \frac{2}{N}(X'X\mathbf{w} - X'\mathbf{y}) = 0$
and then solve the resulting equation (usually easier):
$X'X\mathbf{w} = X'\mathbf{y}$
Since $X'X$ is a square matrix, we can use its inverse:
$\mathbf{w} = (X'X)^{-1}X'\mathbf{y} = X^{+}\mathbf{y}$, where $X^{+} = (X'X)^{-1}X'$ is the pseudo-inverse of $X$

Why a pseudo-inverse? It is the counterpart/generalization of the square-matrix inverse in solving linear systems. If we want to solve $X\mathbf{w} = \mathbf{y}$, $X$ is rectangular (no inverse defined); but $X^{+}$ applies here, acting as if $X$ could be inverted: $\mathbf{w} = X^{+}\mathbf{y}$.
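A minimal numpy sketch of the normal-equation solution (toy data; in practice a least-squares solver is preferred over forming the inverse explicitly):

```python
import numpy as np

# augmented data matrix X (N x (d+1)) and targets y (N,)
X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.1, 1.4, 2.6])

# normal equations: solve X'X w = X'y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# equivalent, numerically safer route via least squares
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_closed, w_lstsq)  # agree up to floating-point error
```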
Pseudo-Inverse by SVD

• A non-square m-by-n matrix $A$ typically has no inverse; there are two definitions of its pseudo-inverse
– (1) Mathematically flavored definition: the Moore-Penrose generalized inverse is an n-by-m matrix $A^{+}$ satisfying $AA^{+}A = A$ and $A^{+}AA^{+} = A^{+}$, with $AA^{+}$ and $A^{+}A$ symmetric
– (2) Machine-learning flavored definition: $A^{+}\mathbf{b}$ is the least-squares solution of the linear system $A\mathbf{x} = \mathbf{b}$
• The pseudo-inverse can be computed by SVD
– Using the SVD $A = U\Sigma V'$ (assume that $A$ is of full column rank), then
$A^{+} = V\Sigma^{-1}U'$
– Properties of $U$, $V$, and $\Sigma$: $U'U = I$, $V'V = VV' = I$, and $\Sigma$ is diagonal (invertible under the full-column-rank assumption)
– How to prove that the SVD-based $V\Sigma^{-1}U'$ is the pseudo-inverse of $A$ (by using the properties of $U$, $V$, and $\Sigma$ above)?
• either verify the Moore-Penrose conditions in definition (1),
• or show it yields the solution of $\min_{\mathbf{x}}\|A\mathbf{x} - \mathbf{b}\|^2$ in definition (2)
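A short numpy verification of the SVD route (toy matrix; np.linalg.pinv is the library implementation of the same idea):

```python
import numpy as np

# rectangular matrix with full column rank (m = 4, n = 2)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])

# thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# pseudo-inverse: A+ = V @ diag(1/s) @ U'
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True: matches the library
print(np.allclose(A_pinv @ A, np.eye(2)))      # True: A+ A = I (full column rank)
```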
Linear Regression Algorithm

1. Construct the augmented data matrix $X$ and target vector $\mathbf{y}$ from the training set
2. Compute the pseudo-inverse $X^{+}$
3. Return $\mathbf{w} = X^{+}\mathbf{y}$

• For numerical stability, we can replace the pseudo-inverse $(X'X)^{-1}X'$ by its SVD-based form $V\Sigma^{-1}U'$
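Putting the pieces together, a minimal end-to-end sketch (function names are illustrative, not from the slides):

```python
import numpy as np

def fit_linear_regression(X_raw, y):
    """Fit w via the pseudo-inverse of the augmented data matrix."""
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])  # augment
    return np.linalg.pinv(X) @ y   # SVD-based pseudo-inverse for stability

def predict(X_raw, w):
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
    return X @ w

X_raw = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 2.1, 2.9, 4.2])
w = fit_linear_regression(X_raw, y)
print(w, predict(X_raw, w))
```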
Augmented Linear Model

• Can we obtain both (1) a closed-form solution and (2) the capacity to model nonlinear shapes?
• Nonlinear Data Augmentation
– If we want to use the following model (1-D input, degree-d polynomial):
$y(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_d x^d$
– Then the regression can be written as a linear model over augmented features:
$\mathbf{x}_n = (1, x_n, x_n^2, \dots, x_n^d)'$, so that $y(x_n) = \mathbf{w}'\mathbf{x}_n$, with data matrix
$X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^d \end{bmatrix}$
– The model is nonlinear in $x$ but still linear in $\mathbf{w}$, so the closed-form solution still applies
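A minimal numpy sketch of this polynomial augmentation (degree and data are illustrative):

```python
import numpy as np

def polynomial_features(x, degree):
    """Map 1-D inputs x to rows (1, x, x^2, ..., x^degree)."""
    return np.vander(x, N=degree + 1, increasing=True)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = x ** 2 - x + 1.0                    # a nonlinear target

X = polynomial_features(x, degree=2)    # shape (5, 3)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # approximately [1, -1, 1]: a linear solve recovers the curve
```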
Minimizing Loss by Gradient Descent

• Option 1: compute the gradient and set it to 0 (matrix form, more compact; gives the closed form)
$\nabla E_{in}(\mathbf{w}) = \frac{2}{N}(X'X\mathbf{w} - X'\mathbf{y}) = 0 \;\Rightarrow\; \mathbf{w} = (X'X)^{-1}X'\mathbf{y}$
• Option 2: compute the gradient and iteratively descend ("hill climbing" in reverse)
$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta\,\nabla E_{in}(\mathbf{w}^{(t)})$, with step size $\eta > 0$

Question: will they lead to the same solution?
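A minimal numpy sketch of option 2, compared against the closed form (learning rate and iteration count are illustrative):

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, n_iters=1000):
    """Minimize E_in(w) = (1/N)||Xw - y||^2 by batch gradient descent."""
    N, d1 = X.shape
    w = np.zeros(d1)
    for _ in range(n_iters):
        grad = (2.0 / N) * X.T @ (X @ w - y)  # gradient of the loss
        w -= eta * grad                       # step downhill
    return w

X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.1, 1.4, 2.6])

w_gd = gradient_descent(X, y)
w_cf = np.linalg.solve(X.T @ X, X.T @ y)      # closed form for comparison
print(np.allclose(w_gd, w_cf, atol=1e-3))     # True: same minimizer
```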


Convexity/Optimality

• Gradient-based methods vs closed-form solutions
– A gradient-based solution is easy to derive, but we need to choose a step length and iterate many times
– A closed-form solution is not always achievable and needs more math; but it may reach the optimal solution in one step
• For the objective in linear regression
– The two methods theoretically find the same solution
– Because the loss function is quadratic, hence convex
• For general optimization problems
– The gradient method is more popular/feasible
• But prone to local optima
– A closed-form solution is usually difficult
• But you can be lucky if you can
– Derive a fixed-point iteration
– Or transform it into an SVD problem
Stochastic Gradient Descent

• The gradient is a sum of N terms (given N samples), in summation form:
$\nabla E_{in}(\mathbf{w}) = \frac{2}{N}\sum_{n=1}^{N}(\mathbf{w}'\mathbf{x}_n - y_n)\,\mathbf{x}_n$
– Can be quite expensive for a large sample set
• Stochastic Gradient Descent: one sample at a time
$\mathbf{w} \leftarrow \mathbf{w} - \eta\,2(\mathbf{w}'\mathbf{x}_n - y_n)\,\mathbf{x}_n$, where the sample index n can be randomly chosen
• Mini-batch Stochastic Gradient: a random subset of samples (indexed by B)
$\mathbf{w} \leftarrow \mathbf{w} - \eta\,\frac{2}{|B|}\sum_{n \in B}(\mathbf{w}'\mathbf{x}_n - y_n)\,\mathbf{x}_n$, with |B| = 32, 64, …
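A minimal numpy sketch of the mini-batch variant (batch size, learning rate, and epoch count are illustrative; batch_size=1 gives plain SGD):

```python
import numpy as np

def minibatch_sgd(X, y, eta=0.05, batch_size=2, n_epochs=200, seed=0):
    """Mini-batch SGD on E_in(w) = (1/N)||Xw - y||^2."""
    rng = np.random.default_rng(seed)
    N, d1 = X.shape
    w = np.zeros(d1)
    for _ in range(n_epochs):
        for _ in range(N // batch_size):
            B = rng.integers(0, N, size=batch_size)  # random mini-batch
            grad = (2.0 / batch_size) * X[B].T @ (X[B] @ w - y[B])
            w -= eta * grad
    return w

X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 1.4, 2.6, 3.4])
print(minibatch_sgd(X, y))  # a noisy estimate near the closed-form solution
```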


SGD may drive the
iterations out of a
local optimal

Gradient
Stochastic
descent
Gradient

SGD leads to fluctuating


objective function

Loss
Function
