
https://introml.mit.edu/

Intro to Machine Learning


Lecture 2: Linear regression and regularization

Shen Shen
Feb 9, 2024

(many slides adapted from Tamara Broderick)


Logistical issues? Personal concerns?
We’d love to help out at
[email protected]
Logistics
The 11am Sections 3 and 4 are completely full and we have many
requests to switch; the physical space is packed.
If at all possible, please help us by signing up for or switching to other slots.

OHs start this Sunday; please also join our Piazza.


Thanks for all the assignment feedback. We are adapting on the
go, but it certainly benefits future semesters.
Assignments are starting to come due now (first up, Exercises 2; keep an
eye on the "due" dates).
https://shenshen.mit.edu/demos/gifs/atlas_darpa_overall.gif

Optimization + first-principles physics


https://www.youtube.com/embed/fn3KWM1kuAw?enablejsapi=1
Outline
Recap of last (content) week.
Ordinary least-squares regression
Analytical solution (when it exists)
Cases when analytical solutions don't exist
Practically, visually, mathematically
Regularization
Hyper-parameter, cross-validation
$$\theta^* = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{Y}$$

When it exists, $\theta^* = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{Y}$ is guaranteed to be the unique minimizer of the OLS objective.

Now, the catch: $\theta^*$ may not be well-defined.

$\theta^* = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{Y}$ is not well-defined if $(\tilde{X}^\top \tilde{X})$ is not invertible.
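To make the closed form concrete, here is a minimal numpy sketch (the toy data and variable names are illustrative, not from the slides):

```python
import numpy as np

# toy data: n = 5 points, d = 2 features (X already includes the offset column)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
Y = np.array([[1.1], [1.9], [3.2], [3.9], [5.1]])

# theta* = (X^T X)^{-1} X^T Y; solving the linear system is preferred
# over explicitly forming the inverse
theta_star = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta_star)   # approximately [[0.04], [1.0]]
```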

Indeed, it's possible that $(\tilde{X}^\top \tilde{X})$ is not invertible.

In particular, $(\tilde{X}^\top \tilde{X})$ is not invertible if and only if $\tilde{X}$ is not full column rank.

So the catch: $\theta^* = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{Y}$ is not well-defined if $\tilde{X}$ is not full column rank.

Recall: indeed, $\tilde{X}$ is not full column rank

1. if $n < d$ (i.e., not enough data), or
2. if columns (features) in $\tilde{X}$ have a linear dependency (i.e., so-called collinearity).

In both cases, $\theta^* = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{Y}$ is not defined.
Both cases do happen in practice.

In both cases, the loss function is a "half-pipe": it is flat along some directions at the bottom, so there are infinitely many optimal hypotheses.

Side note: sometimes noise can resolve the invertibility issue, but this is undesirable.
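A quick numpy illustration of both failure modes (the matrices here are invented toy examples):

```python
import numpy as np

# Case 1: n < d (2 data points, 3 features) -- X cannot be full column rank
X1 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])
print(np.linalg.matrix_rank(X1.T @ X1))  # 2 < 3: X1^T X1 is singular

# Case 2: collinear features (third column = first column + second column)
X2 = np.array([[1.0, 2.0,  3.0],
               [4.0, 5.0,  9.0],
               [7.0, 8.0, 15.0],
               [2.0, 1.0,  3.0]])
print(np.linalg.matrix_rank(X2.T @ X2))  # 2 < 3: singular again

# In either case (X^T X)^{-1} does not exist, so the closed-form
# theta* = (X^T X)^{-1} X^T Y is not well-defined.
```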
Outline
Recap of last (content) week.
Ordinary least-squares regression
Analytical solution (when it exists)
Cases when analytical solutions don't exist
Practically, visually, mathematically
Regularization
Hyper-parameter, cross-validation
Regularization

Ridge Regression Regularization

λ is a hyper-parameter
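The ridge objective itself appears only as images in the slides; as a reference point, a standard formulation (an assumption about the exact scaling convention) is $J_{\text{ridge}}(\theta) = \frac{1}{n}\|\tilde{X}\theta - \tilde{Y}\|^2 + \lambda\|\theta\|^2$, whose minimizer is $\theta^*_{\text{ridge}} = (\tilde{X}^\top \tilde{X} + n\lambda I)^{-1}\tilde{X}^\top \tilde{Y}$. A minimal numpy sketch under that assumption:

```python
import numpy as np

def ridge_closed_form(X, Y, lam):
    """Closed-form minimizer of (1/n)*||X @ theta - Y||^2 + lam*||theta||^2."""
    n, d = X.shape
    # (X^T X + n*lam*I) is invertible for any lam > 0,
    # even when X is not full column rank.
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)

# tiny example: collinear features (second column = 2 * first),
# yet ridge still returns a unique, finite solution
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
Y = np.array([[1.0], [2.0], [3.0]])
print(ridge_closed_form(X, Y, lam=0.1))
```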
Outline
Recap of last (content) week.
Ordinary least-squares regression
Analytical solution (when it exists)
Cases when analytical solutions don't exist
Practically, visually, mathematically
Regularization
Hyper-parameter, cross-validation
Cross-validation
Comments about cross-validation

It is a good idea to shuffle the data first.

It is a way to "reuse" data.

It evaluates not a single hypothesis, but rather a learning algorithm (e.g., a hypothesis class or a hyper-parameter setting); see the sketch after this list.

One could, e.g., have an outer loop for picking a good hyper-parameter or hypothesis class.
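A minimal sketch of k-fold cross-validation with an outer loop over hyper-parameters (the toy data and the helper names ridge_fit, mse, cross_validate are illustrative, not from the course code; the ridge solver follows the closed form above):

```python
import numpy as np

def ridge_fit(X, Y, lam):
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)

def mse(X, Y, theta):
    return float(np.mean((X @ theta - Y) ** 2))

def cross_validate(X, Y, lam, k=5, seed=0):
    """Average validation MSE of ridge with this lam over k folds."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)   # shuffle data first
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        theta = ridge_fit(X[train], Y[train], lam)
        scores.append(mse(X[val], Y[val], theta))
    return float(np.mean(scores))

# outer loop over candidate hyper-parameters
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
Y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(40, 1))
for lam in [0.0, 0.01, 0.1, 1.0]:
    print(lam, cross_validate(X, Y, lam))
```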
Summary
One strategy for finding ML algorithms is to reduce the ML
problem to an optimization problem.
For ordinary least squares (OLS), we can find the optimizer
analytically, using basic calculus! Take the gradient and set it to
zero, as sketched below. (In general we need more than gradient information; it suffices for OLS.)
Two ways to approach the calculus problem: write it out in terms of
explicit sums, or keep it in vector-matrix form. The vector-matrix form is
easier to manage as things get complicated (and they will!). There
are some good discussions in the lecture notes.
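For concreteness, a compact version of that calculation in vector-matrix form (written here with the mean-squared-error scaling $\frac{1}{n}$; any positive scaling gives the same minimizer):

$$
\nabla_\theta \, \tfrac{1}{n}\|\tilde{X}\theta - \tilde{Y}\|^2
= \tfrac{2}{n}\,\tilde{X}^\top(\tilde{X}\theta - \tilde{Y}) = 0
\;\Longrightarrow\;
\tilde{X}^\top\tilde{X}\,\theta = \tilde{X}^\top\tilde{Y}
\;\Longrightarrow\;
\theta^* = (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{Y},
$$

assuming $\tilde{X}^\top\tilde{X}$ is invertible.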
Summary
What it means for the problem to be well-posed.
When there are many possible solutions, we need to indicate our
preference somehow.
Regularization is a way to construct a new optimization problem that encodes that preference.
Least-squares regularization leads to the ridge-regression formulation.
Good news: we can still solve it analytically!
Hyper-parameters and how to pick them: cross-validation.
We'd love it for you to share some lecture feedback.

Thanks!
