
CS 550: Machine Learning

Lecture 2: Regression

Instructor: Dr. Gagan Raj Gupta


Today’s Class

 Linear Regression problems


 Loss Functions for Regression
 Stability Issues in matrix operations
 Regularization to avoid over-fitting
 Different regularization objectives
 SGD to solve Linear Regression
Linear Models
 Consider learning to map an input $\mathbf{x} \in \mathbb{R}^D$ to the corresponding (say real-valued) output $y$
 Assume the output to be a linear weighted combination of the D input features

      $y = \mathbf{w}^\top \mathbf{x} = \sum_{d=1}^{D} w_d x_d$

  This defines a linear model with parameters given by a "weight vector" $\mathbf{w} = [w_1, w_2, \ldots, w_D]^\top$
  Each of these weights has a simple interpretation: $w_d$ is the "weight" or importance of the d-th feature in making this prediction
  The "optimal" weights are unknown and have to be learned by solving an optimization problem, using some training data
 This simple model can be used for Linear Regression
 This simple model can also be used as a "building block" for more complex models
  Even classification (binary/multiclass/multi-output/multi-label) and various other ML/deep learning models
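A minimal NumPy sketch of this prediction rule (the input features and weights below are made-up numbers, not learned):

import numpy as np

# Hypothetical 4-dimensional input and weight vector (in practice w is learned from data)
x = np.array([1.0, 0.5, -2.0, 3.0])
w = np.array([0.2, -0.1, 0.4, 0.05])

# Linear model prediction: y = w^T x = sum_d w_d * x_d
y_hat = w @ x          # same as np.dot(w, x)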
Simple Linear Models as Building Blocks
 In some regression problems, each output itself is a real-valued vector $\mathbf{y} \in \mathbb{R}^M$
  Example: Given a full-body image of a person, predict height, weight, hand size, and leg size (an M = 4 dimensional output)
  Such problems are commonly known as multi-output regression
 We can assume a separate linear model for each of the M outputs

      $y_m = \mathbf{w}_m^\top \mathbf{x}$        (each $\mathbf{w}_m$ is a D-dimensional weight vector for predicting the m-th output)

      $\mathbf{y} = \mathbf{W}\mathbf{x}$         (here $\mathbf{W}$ is an M×D weight matrix with its m-th row containing $\mathbf{w}_m^\top$)

  Learning this model will require us to learn the weight matrix $\mathbf{W}$ (or equivalently, the M weight vectors)
 Note: Learning separate models may not be ideal if these multiple outputs are somewhat correlated with each other. But this model can be extended to handle such situations (the techniques are a bit advanced to be discussed right now – but if curious, you may look up multitask learning techniques)
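A small NumPy sketch of the multi-output model $\mathbf{y} = \mathbf{W}\mathbf{x}$ (the sizes and values below are made up):

import numpy as np

# Hypothetical sizes: D = 4 input features, M = 3 outputs
x = np.array([1.0, 0.5, -2.0, 3.0])      # D-dimensional input
W = np.random.randn(3, 4)                # M x D weight matrix, row m is w_m (learned in practice)

y = W @ x                                # M-dimensional prediction: y_m = w_m^T x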
Simple Linear Models as Building Blocks
 Linear models are also used in multiclass classification problems
 Assuming K classes, we can assume the following model

      $y = \arg\max_{k \in \{1, 2, \ldots, K\}} \mathbf{w}_k^\top \mathbf{x}$

  Can think of $\mathbf{w}_k^\top \mathbf{x}$ as the score of the input $\mathbf{x}$ for the k-th class
 Once learned (using some optimization technique), these weight vectors (one for each class) can sometimes have nice interpretations, especially when the inputs are images
  The learned weight vectors of the classes, visualized as images (e.g., $\mathbf{w}_{car}$, $\mathbf{w}_{frog}$, $\mathbf{w}_{horse}$, $\mathbf{w}_{cat}$), kind of look like a "template" of what the images from that class should look like
  These templates sort of look like class prototypes, as in LwP. That is why the dot product of each of these weight vectors with an image from the correct class is expected to be the largest – no wonder LwP (with Euclidean distances) acts like a linear model
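A minimal NumPy sketch of this argmax prediction rule (sizes and weights are made up; in practice the K weight vectors are learned):

import numpy as np

K, D = 4, 10
W = np.random.randn(K, D)        # row k holds the weight vector w_k of class k
x = np.random.randn(D)           # a test input

scores = W @ x                   # score of x for each class: w_k^T x
y_pred = int(np.argmax(scores))  # predicted class = class with the largest score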
Simple Linear Models as Building Blocks
 Linear models are building blocks for dimensionality reduction methods like PCA
  This looks very similar to the multi-output model, except that the values of the latent features are not known and have to be learned
 Linear models are building blocks even for deep learning models
  In a deep learning model, each layer learns a latent feature representation of the inputs using a model like a multi-output linear model, followed by a nonlinearity
  The last (output) layer can have one or more outputs
  More on this when we discuss deep learning later
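A tiny sketch of the "multi-output linear model + nonlinearity" view of a layer (sizes are made up; a bias term is included for completeness and ReLU is used as the nonlinearity):

import numpy as np

def layer(x, W, b):
    # multi-output linear model W x + b, followed by the ReLU nonlinearity max(0, .)
    return np.maximum(0.0, W @ x + b)

# Hypothetical two-layer model with layer sizes 5 -> 8 -> 1 (random, untrained weights)
x = np.random.randn(5)
h = layer(x, np.random.randn(8, 5), np.zeros(8))   # latent feature representation
y = np.random.randn(1, 8) @ h                      # last (output) layer: a linear model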
Learning Linear Models
Linear Regression (Problem setup)
 We are given N observations (x, y)
 Let us build a linear model to predict y from the observations x, i.e., $\mathbf{y} \approx \mathbf{X}\mathbf{w}$
 Compute the "best" solution $\mathbf{w}$ to the model, i.e., the one that minimizes the SSE (sum of squared errors)
 This problem is very similar to solving $A\mathbf{x} = \mathbf{b}$, a system of linear equations
  In most cases, there may be no exact solution to this problem
  But there are many approaches we can take to get the "least squares" solution
  This minimizes the sum of squared errors

Normal Equations
 Since $(\mathbf{b} - A\hat{\mathbf{x}})$ is perpendicular to all vectors $A\mathbf{x}$ in the column space, $A^\top(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$
 Normal equation for solving $A\mathbf{x} \approx \mathbf{b}$:  $A^\top A\,\hat{\mathbf{x}} = A^\top \mathbf{b}$
 Least squares solution to $A\mathbf{x} = \mathbf{b}$:  $\hat{\mathbf{x}} = (A^\top A)^{-1} A^\top \mathbf{b}$
 Projection of $\mathbf{b}$ onto Col(A):  $\mathbf{p} = A\hat{\mathbf{x}} = A(A^\top A)^{-1} A^\top \mathbf{b}$
 Projection matrix that multiplies $\mathbf{b}$ to give $\mathbf{p}$:  $P = A(A^\top A)^{-1} A^\top$
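A small NumPy sketch of solving a least-squares problem via the normal equations (the matrix and right-hand side below are random placeholders):

import numpy as np

# Hypothetical overdetermined system: 50 equations, 5 unknowns
A = np.random.randn(50, 5)
b = np.random.randn(50)

# Least-squares solution from the normal equations: (A^T A) x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

p = A @ x_hat                                   # projection of b onto Col(A)

# Same solution via a library least-squares routine (usually preferred numerically)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_ls)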
Pseudo Inverse Method
 Pseudo-inverse method: the least squares solution is $\hat{\mathbf{x}} = A^{+}\mathbf{b}$
  If A has independent columns, $A^{+} = (A^\top A)^{-1}A^\top$ (a left inverse)
  If A has independent rows, $A^{+} = A^\top(AA^\top)^{-1}$ (a right inverse)
 The pseudo-inverse can be computed using the SVD $A = U\Sigma V^\top$:  $A^{+} = V\Sigma^{+}U^\top$
  $\Sigma^{+}$ contains the inverse of all non-zero diagonal elements in $\Sigma$
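A NumPy sketch of computing the pseudo-inverse from the SVD (random placeholder matrix; assumes all singular values are non-zero):

import numpy as np

A = np.random.randn(50, 5)                    # hypothetical matrix with independent columns
b = np.random.randn(50)

# SVD-based pseudo-inverse: A = U diag(s) V^T  =>  A+ = V diag(1/s) U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x_hat = A_pinv @ b                            # least-squares solution
assert np.allclose(A_pinv, np.linalg.pinv(A)) # matches NumPy's built-in pinv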
QR (Gram-Schmidt) Method
 Decompose $A = QR$
  Where Q is an orthogonal matrix ($Q^\top Q = I$) and R is an upper-triangular matrix
 Then $A^\top A = R^\top Q^\top Q R = R^\top R$, and the normal equation $A^\top A\,\hat{\mathbf{x}} = A^\top\mathbf{b}$ can be solved as $R\hat{\mathbf{x}} = Q^\top\mathbf{b}$ (by back-substitution, since R is triangular)
 This is computationally efficient to solve, and more numerically stable than forming $A^\top A$ explicitly
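A NumPy/SciPy sketch of the QR route (random placeholder data; SciPy's triangular solver is used for the back-substitution step):

import numpy as np
from scipy.linalg import solve_triangular

A = np.random.randn(50, 5)
b = np.random.randn(50)

Q, R = np.linalg.qr(A)                 # thin QR: Q is 50x5 orthonormal, R is 5x5 upper triangular
x_hat = solve_triangular(R, Q.T @ b)   # solve R x = Q^T b by back-substitution

assert np.allclose(A.T @ A @ x_hat, A.T @ b)   # satisfies the normal equations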


Alternatives to compute Low Rank Approximations
 Let A be an m×n matrix of low numerical rank
 Suppose that you can't afford to compute the full SVD, or you don't have a good implementation
 How can you compute a low-rank (rank-k) approximation to A?
  Gram-Schmidt: keep removing a rank-1 component from A, one at a time
   Complexity: O(mnk)
  Krylov methods: restrict the matrix A to a k-dimensional "Krylov subspace" (e.g., Span{b, Ab, A²b, …, Aᵏ⁻¹b})
 Each of these approximations results in a factorization of the form $A \approx QB$ (with $B = Q^\top A$), where Q is an approximate orthonormal basis that spans the column space of A
Randomized Low Rank Approximations
 Range finding (basis) problem: Given an m×n matrix A and an integer k < min(m, n), find an orthonormal m×k matrix Q such that $A \approx QQ^\top A$
 Solving the primitive problem via randomized sampling — intuition:
  1. Draw k Gaussian random vectors $\mathbf{g}_1, \ldots, \mathbf{g}_k$
  2. Form "sample" vectors $\mathbf{y}_j = A\mathbf{g}_j$
  3. Form orthonormal vectors $\mathbf{q}_1, \ldots, \mathbf{q}_k$ such that Span($\mathbf{q}_1, \ldots, \mathbf{q}_k$) = Span($\mathbf{y}_1, \ldots, \mathbf{y}_k$)
   For instance, Gram-Schmidt can be used — pivoting is rarely required.
 If A has exact rank k, then Span($\mathbf{q}_1, \ldots, \mathbf{q}_k$) = Range(A) with probability 1.
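A minimal NumPy sketch of this randomized range finder (the test matrix is a made-up example with exact rank k; QR is used instead of explicit Gram-Schmidt):

import numpy as np

def range_finder(A, k, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((n, k))   # 1. Gaussian random vectors (columns of G)
    Y = A @ G                         # 2. sample vectors y_j = A g_j
    Q, _ = np.linalg.qr(Y)            # 3. orthonormalize the samples
    return Q                          # m x k, with A ~= Q Q^T A

A = np.random.randn(200, 30) @ np.random.randn(30, 300)        # exact rank 30
Q = range_finder(A, k=30)
print(np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A))   # ~ machine precision since rank(A) = k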
Randomized SVD
 Goal: Given an m×n matrix A, compute an approximate rank-k SVD $A \approx U\Sigma V^\top$
 Algorithm:
  1. Draw an n×k Gaussian random matrix G.                        G = randn(n,k)
  2. Form the m×k sample matrix Y = AG.                           Y = A*G
  3. Form an m×k orthonormal matrix Q such that Y = QR.           [Q, ~] = qr(Y,0)
  4. Form the k×n matrix $B = Q^\top A$.                          B = Q'*A
  5. Compute the SVD of the small matrix B: $B = \hat{U}\Sigma V^\top$.   [Uhat, Sigma, V] = svd(B,0)
  6. Form the matrix $U = Q\hat{U}$.                              U = Q*Uhat
 Power iteration to improve the accuracy: The computed factorization is close to optimally accurate when the singular values of A decay rapidly. When they do not, a small number of power iteration steps (e.g., sampling with $(AA^\top)^q A$ in place of A) can be used to improve the accuracy.
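The same algorithm as a runnable NumPy sketch, mirroring the MATLAB-style pseudo-code above (the test matrix below is made up so that its singular values decay rapidly):

import numpy as np

def randomized_svd(A, k, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((n, k))       # 1. n x k Gaussian random matrix
    Y = A @ G                             # 2. m x k sample matrix
    # For slowly decaying spectra, one power iteration: Y = A @ (A.T @ (A @ G))
    Q, _ = np.linalg.qr(Y)                # 3. m x k orthonormal matrix, Y = QR
    B = Q.T @ A                           # 4. k x n matrix
    U_hat, Sigma, Vt = np.linalg.svd(B, full_matrices=False)   # 5. SVD of the small matrix
    return Q @ U_hat, Sigma, Vt           # 6. U = Q * Uhat

rng = np.random.default_rng(1)
A = (rng.standard_normal((500, 50)) * (0.5 ** np.arange(50))) @ rng.standard_normal((50, 400))
U, S, Vt = randomized_svd(A, k=20)
print(np.linalg.norm(A - (U * S) @ Vt) / np.linalg.norm(A))    # small relative error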
Linear Regression
 Given: Training data with N input-output pairs $(\mathbf{x}_n, y_n)$, $\mathbf{x}_n \in \mathbb{R}^D$, $y_n \in \mathbb{R}$, $n = 1, 2, \ldots, N$
 Goal: Learn a model to predict the output for new test inputs
 Assume the function that approximates the I/O relationship to be a linear model

      $y_n \approx f(\mathbf{x}_n) = \mathbf{w}^\top\mathbf{x}_n \quad (n = 1, 2, \ldots, N)$

  Can also write all of them compactly using matrix-vector notation as $\mathbf{y} \approx \mathbf{X}\mathbf{w}$
 Let's write the total error or "loss" of this model over the training data as

      $L(\mathbf{w}) = \sum_{n=1}^{N} \ell(y_n, f(\mathbf{x}_n))$

  Each term $\ell(y_n, f(\mathbf{x}_n))$ measures the prediction error or "loss" or "deviation" of the model on a single training input
 The goal of learning is to find the $\mathbf{w}$ that minimizes this loss + does well on test data
  Unlike models like KNN and DT, here we have an explicit problem-specific objective (loss function) that we wish to optimize
Linear Regression: Pictorially
 Linear regression is like fitting a line or (hyper)plane to a set of points
  [Figure: with a single original feature (input x vs. output y), we fit a line; with two features (x1, x2 vs. output y), we can fit a plane (linear)]
 What if a line/plane doesn't model the input-output relationship very well, e.g., if their relationship is better modeled by a nonlinear curve or curved surface?
  Do linear models become useless in such cases? No. We can even fit a curve using a linear model, after suitably transforming the inputs, e.g., $[z_1, z_2]^\top = \phi(\mathbf{x})$
  The transformation can be predefined or learned (e.g., using kernel methods or a deep neural network based feature extractor). More on this later
 The line/plane must also predict outputs for the unseen (test) inputs well
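A small sketch of the "transform the inputs, then fit a linear model" idea, using a hypothetical quadratic feature map $\phi(x) = [1, x, x^2]$ on made-up 1-D data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(100)   # nonlinear relationship

Phi = np.stack([np.ones_like(x), x, x**2], axis=1)   # transformed features phi(x)

# Ordinary least squares on the transformed features fits a curve in the original space
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)   # roughly recovers the true coefficients [1.0, 2.0, -0.5]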
Loss Functions for Regression
 Many possible loss functions for regression problems
  The choice of loss function usually depends on the nature of the data. Also, some loss functions result in an easier optimization problem than others
 Squared loss: $\ell(y_n, f(\mathbf{x}_n)) = (y_n - f(\mathbf{x}_n))^2$
  Very commonly used for regression. Leads to an easy-to-solve optimization problem
 Absolute loss: $\ell(y_n, f(\mathbf{x}_n)) = |y_n - f(\mathbf{x}_n)|$
  Grows more slowly than squared loss. Thus better suited when the data has some outliers (inputs on which the model makes large errors)
 Huber loss: squared loss for small errors (say up to $\delta$); absolute loss for larger errors
  Good for data with outliers
 $\epsilon$-insensitive loss (a.k.a. Vapnik loss): $\ell(y_n, f(\mathbf{x}_n)) = \max(0,\; |y_n - f(\mathbf{x}_n)| - \epsilon)$
  Zero loss for small errors (say up to $\epsilon$); absolute loss for larger errors
  Note: Can also use squared loss instead of absolute loss for the larger errors
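The four losses above as short NumPy functions (a minimal sketch; the Huber constants below are one common way of matching the two pieces at δ, which the slide leaves unspecified):

import numpy as np

def squared_loss(y, f):
    return (y - f) ** 2

def absolute_loss(y, f):
    return np.abs(y - f)

def huber_loss(y, f, delta=1.0):
    e = np.abs(y - f)
    # squared for small errors, linear for large ones, with the pieces matched at delta
    return np.where(e <= delta, e ** 2, 2 * delta * e - delta ** 2)

def eps_insensitive_loss(y, f, eps=0.1):
    return np.maximum(0.0, np.abs(y - f) - eps)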
Linear Regression with Squared Loss
 In this case, the loss function will be

      $L(\mathbf{w}) = \sum_{n=1}^{N} (y_n - \mathbf{w}^\top\mathbf{x}_n)^2$

  In matrix-vector notation, can write it compactly as $L(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 = (\mathbf{y} - \mathbf{X}\mathbf{w})^\top(\mathbf{y} - \mathbf{X}\mathbf{w})$
 Let us find the $\mathbf{w}$ that optimizes (minimizes) the above squared loss
  This is the "least squares" (LS) problem (Gauss–Legendre, 18th century)
  We need calculus and optimization to do this!
 The LS problem can be solved easily and has a closed-form solution

      $\mathbf{w}_{LS} = \arg\min_{\mathbf{w}} L(\mathbf{w}) = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$

  Closed-form solutions to ML problems are rare
  Requires a D×D matrix inversion – can be expensive. Ways to handle this – will see later
Proof: A bit of calculus/optim. (more on this later)
 We wanted to find the minima of $L(\mathbf{w}) = \sum_{n=1}^{N} (y_n - \mathbf{w}^\top\mathbf{x}_n)^2$
 Let us apply the basic rule of calculus: take the first derivative of $L(\mathbf{w})$ and set it to zero

      $\frac{\partial L(\mathbf{w})}{\partial \mathbf{w}} = \sum_{n=1}^{N} 2\,(y_n - \mathbf{w}^\top\mathbf{x}_n)\,\frac{\partial (y_n - \mathbf{w}^\top\mathbf{x}_n)}{\partial \mathbf{w}} = 0$      (chain rule of calculus)

 Using the fact $\frac{\partial\,\mathbf{w}^\top\mathbf{x}_n}{\partial \mathbf{w}} = \mathbf{x}_n$ (partial derivative of the dot product w.r.t. each element of $\mathbf{w}$; the result has the same size as $\mathbf{w}$), we get

      $\sum_{n=1}^{N} 2\,\mathbf{x}_n (y_n - \mathbf{x}_n^\top\mathbf{w}) = 0$

 To separate $\mathbf{w}$ to get a solution, we write the above as

      $\sum_{n=1}^{N} y_n \mathbf{x}_n - \Big(\sum_{n=1}^{N} \mathbf{x}_n\mathbf{x}_n^\top\Big)\mathbf{w} = 0 \quad\Rightarrow\quad \mathbf{w} = \Big(\sum_{n=1}^{N} \mathbf{x}_n\mathbf{x}_n^\top\Big)^{-1}\sum_{n=1}^{N} y_n\mathbf{x}_n = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$
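A quick numerical sanity check of this gradient using finite differences, on made-up data:

import numpy as np

rng = np.random.default_rng(0)
X, y, w = rng.standard_normal((20, 5)), rng.standard_normal(20), rng.standard_normal(5)

loss = lambda w: np.sum((y - X @ w) ** 2)

grad = -2 * X.T @ (y - X @ w)     # analytic gradient: -2 * sum_n x_n (y_n - x_n^T w)

eps = 1e-6                        # central finite-difference approximation
num_grad = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps) for e in np.eye(5)])
assert np.allclose(grad, num_grad, atol=1e-4)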
Problem(s) with the Solution!
 We minimized the objective $L(\mathbf{w})$ w.r.t. $\mathbf{w}$ and got

      $\mathbf{w} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$

 Problem: The D×D matrix $\mathbf{X}^\top\mathbf{X}$ may not be invertible
  This may lead to non-unique solutions for $\mathbf{w}$
 Problem: Overfitting, since we only minimized the loss defined on the training data
  Weights may become arbitrarily large to fit the training data perfectly
  Such weights may however perform poorly on the test data
 One solution: Minimize a regularized objective $L(\mathbf{w}) + \lambda R(\mathbf{w})$
  $R(\mathbf{w})$ is called the regularizer and measures the "magnitude" of $\mathbf{w}$
  $\lambda$ is the regularization hyperparameter: it controls how much we wish to regularize (needs to be tuned via cross-validation)
  The regularizer will prevent the elements of $\mathbf{w}$ from becoming too large
  Reason: Now we are minimizing training error + magnitude of the weight vector
Regularized Least Squares (a.k.a. Ridge Regression)
 Recall that the regularized objective is of the form $L(\mathbf{w}) + \lambda R(\mathbf{w})$
 One possible/popular regularizer: the squared Euclidean ($\ell_2$ squared) norm of $\mathbf{w}$

      $R(\mathbf{w}) = \|\mathbf{w}\|_2^2 = \mathbf{w}^\top\mathbf{w}$

 With this regularizer, we have the regularized least squares problem as

      $\mathbf{w}_{ridge} = \arg\min_{\mathbf{w}} \sum_{n=1}^{N} (y_n - \mathbf{w}^\top\mathbf{x}_n)^2 + \lambda\,\mathbf{w}^\top\mathbf{w}$

 Proceeding just like the LS case, we can find the optimal $\mathbf{w}$, which is given by

      $\mathbf{w}_{ridge} = (\mathbf{X}^\top\mathbf{X} + \lambda I_D)^{-1}\mathbf{X}^\top\mathbf{y}$

  Why is the method called "ridge" regression? Look at the form of the solution: we are adding a small value $\lambda$ to the diagonals of the D×D matrix $\mathbf{X}^\top\mathbf{X}$ (like adding a ridge/mountain to some land)
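A minimal NumPy sketch of the closed-form ridge solution (data and λ below are placeholders):

import numpy as np

def ridge_fit(X, y, lam):
    # w = (X^T X + lam * I_D)^{-1} X^T y, computed via a linear solve instead of an explicit inverse
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 5)), rng.standard_normal(100)
w_ridge = ridge_fit(X, y, lam=0.1)    # lam is the regularization hyperparameter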
A closer look at regularization
 Remember – in general, weights with large magnitude are bad since they can cause overfitting on training data and may not work well on test data
 The regularized objective we minimized is

      $L_{reg}(\mathbf{w}) = \sum_{n=1}^{N} (y_n - \mathbf{w}^\top\mathbf{x}_n)^2 + \lambda\,\mathbf{w}^\top\mathbf{w}$

 Minimizing $L_{reg}(\mathbf{w})$ w.r.t. $\mathbf{w}$ gives a solution that
  Keeps the training error small
  Has a small squared $\ell_2$ norm $\|\mathbf{w}\|_2^2 = \mathbf{w}^\top\mathbf{w}$
   Good because, consequently, the individual entries of the weight vector are also prevented from becoming too large
 Small entries in $\mathbf{w}$ are good since they lead to "smooth" models
  A model that is not "smooth" may change its test predictions drastically even with small changes in some feature's value
 Example: a typical $\mathbf{w}$ learned without regularization

      $\mathbf{x}_n$ = [1.2  0.5  2.4  0.3  0.8  0.1  0.9  2.1],   $y_n$ = 0.8
      $\mathbf{x}_m$ = [1.2  0.5  2.4  0.3  0.8  0.1  0.9  2.1],   $y_m$ = 100
      learned $\mathbf{w}$ = [3.2  1.8  1.3  2.1  10000  2.5  3.1  0.1]

  The two feature vectors are (almost) exactly the same, differing in just one feature by a small amount, yet the outputs are very different (maybe one of these two training examples is an outlier)
  Just to fit the training data, where one of the inputs was possibly an outlier, one of the weights became too big. Such a weight vector will possibly do poorly on normal test inputs
Other Ways to Control Overfitting
 Use a regularizer defined by other norms, e.g.,
  $\ell_1$ norm regularizer:  $\|\mathbf{w}\|_1 = \sum_{d=1}^{D} |w_d|$
  $\ell_0$ norm regularizer:  $\|\mathbf{w}\|_0 = \mathrm{nnz}(\mathbf{w})$  (counts the number of nonzeros in $\mathbf{w}$)
  When should I use these regularizers instead of the $\ell_2$ regularizer? Use them if you have a very large number of features, many of which are irrelevant. These regularizers can help in automatic feature selection
  Using such regularizers gives a sparse weight vector as the solution; "sparse" means many entries in $\mathbf{w}$ will be zero or near zero. Those features will be considered irrelevant by the model and will not influence prediction (see the short sketch after this slide)
  Note that optimizing loss functions with such regularizers is usually harder than ridge regression, but several advanced techniques exist (we will see some of those later)
 Use non-regularization based approaches
  Early stopping (stopping training just when we have a decent validation set accuracy)
  Dropout (in each iteration, don't update some of the weights)
  Injecting noise in the inputs
  All of these are very popular ways to control overfitting in deep learning models. More on these later when we talk about deep learning
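A hedged sketch of the sparsity effect using scikit-learn's Ridge and Lasso ($\ell_2$- vs $\ell_1$-regularized linear regression); the data, feature counts, and regularization strengths below are made up:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Hypothetical data: 100 examples, 50 features, but only the first 5 features actually matter
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[:5] = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)   # l2 regularizer: weights shrink but are mostly non-zero
lasso = Lasso(alpha=0.1).fit(X, y)   # l1 regularizer: typically many exactly-zero weights

print(np.sum(np.abs(ridge.coef_) < 1e-6), "near-zero weights with ridge")
print(np.sum(np.abs(lasso.coef_) < 1e-6), "near-zero weights with lasso")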
Linear Regression as Solving System of Linear Eqs
 The form of the lin. reg. model $\mathbf{y} \approx \mathbf{X}\mathbf{w}$ is akin to a system of linear equations
 Assuming N training examples with D features each, we have

      First training example:    $y_1 = x_{11}w_1 + x_{12}w_2 + \ldots + x_{1D}w_D$
      Second training example:   $y_2 = x_{21}w_1 + x_{22}w_2 + \ldots + x_{2D}w_D$
      ...
      N-th training example:     $y_N = x_{N1}w_1 + x_{N2}w_2 + \ldots + x_{ND}w_D$

  Note: Here $x_{nd}$ denotes the d-th feature of the n-th training example
  N equations and D unknowns ($w_1, w_2, \ldots, w_D$) here
 However, in regression, we rarely have N = D but rather N > D or N < D
  Thus we have an overdetermined (N > D) or underdetermined (N < D) system
 Methods to solve over/underdetermined systems can be used for lin-reg as well
  Many of these methods don't require expensive matrix inversion
 Solving lin-reg as a system of linear equations: $\mathbf{w} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$, where $\mathbf{X}$ is N×D, $\mathbf{w}$ is D×1, and $\mathbf{y}$ is N×1
  This is equivalent to the system of linear equations $\mathbf{X}^\top\mathbf{X}\,\mathbf{w} = \mathbf{X}^\top\mathbf{y}$, with D equations and D unknowns – now solve this!
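A short NumPy sketch of handling both cases with a least-squares solver (random placeholder data):

import numpy as np

rng = np.random.default_rng(0)

# Overdetermined case (N > D): more equations than unknowns, usually no exact solution
X_over, y_over = rng.standard_normal((100, 5)), rng.standard_normal(100)
w_over, *_ = np.linalg.lstsq(X_over, y_over, rcond=None)   # least-squares solution

# Underdetermined case (N < D): infinitely many solutions; lstsq returns the minimum-norm one
X_under, y_under = rng.standard_normal((5, 100)), rng.standard_normal(5)
w_under, *_ = np.linalg.lstsq(X_under, y_under, rcond=None)

# Neither call forms or inverts X^T X explicitly (lstsq uses an SVD-based routine internally)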
Calculus and Optimization for ML
 Regularized Linear Regression (a.k.a. Ridge Regression)

      Problem, more compactly written:   $\mathbf{w}_{ridge} = \arg\min_{\mathbf{w}} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_2^2$
      Solution, more compactly written:  $\mathbf{w}_{ridge} = (\mathbf{X}^\top\mathbf{X} + \lambda I_D)^{-1}\mathbf{X}^\top\mathbf{y}$

 Getting the closed-form solution required simple calculus, but it is expensive to compute
  Especially when D is very large (since we need to invert a D×D matrix)
 How do we solve this and other (possibly more difficult) optimization problems arising in ML efficiently?
 What's the basic calculus and optimization knowledge we need for ML?
