16 - The Key To The Most Powerful ML Models
Linear Models
Linear classification: $y = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$; linear regression: $y = \mathbf{w}^\top \mathbf{x} + b$.
It does not seem that any linear model can ever do well on these tasks, no matter how carefully we learn that model.
You are right; to solve such learning problems, an entirely new class of models is needed.
[Figure: a 2D dataset (axes $x$ and $y$) on which no linear decision boundary does well]
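As a minimal sketch (not from the slides; the parameter values below are placeholders), the two linear models above can be written directly in NumPy:

```python
import numpy as np

def linear_regression_predict(w, b, x):
    """Linear regression: y = w^T x + b."""
    return w @ x + b

def linear_classification_predict(w, b, x):
    """Linear binary classification: y = sign(w^T x + b)."""
    return np.sign(w @ x + b)

# Placeholder parameter values, just to show the call pattern
w = np.array([1.0, -1.0])
b = 0.5
x = np.array([2.0, 0.5])
print(linear_regression_predict(w, b, x))      # 2.0
print(linear_classification_predict(w, b, x))  # 1.0
```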
Non-linear Models
Moving beyond the linear forms $y = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$ and $y = \mathbf{w}^\top \mathbf{x} + b$ to non-linear models, i.e., stepping out of line, seems to be the key to ML's success.
Linear Functions
A function $f: \mathbb{R}^d \to \mathbb{R}$ is called linear if it satisfies two properties:
(Additivity) For any two vectors $\mathbf{x}, \mathbf{z}$, we have $f(\mathbf{x} + \mathbf{z}) = f(\mathbf{x}) + f(\mathbf{z})$.
(Homogeneity) For any vector $\mathbf{x}$ and scalar $c$, we have $f(c\,\mathbf{x}) = c\,f(\mathbf{x})$.
Claim: Every linear function is of the form $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$ for some fixed vector $\mathbf{w}$ whose $i$-th coordinate is $w_i = f(\mathbf{e}_i)$.
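A short derivation of the claim (a standard argument, written out here for completeness, assuming $\mathbf{e}_1, \dots, \mathbf{e}_d$ are the standard basis vectors):

```latex
\begin{align*}
f(\mathbf{x})
  &= f\!\Big(\textstyle\sum_{i=1}^{d} x_i \,\mathbf{e}_i\Big) \\
  &= \textstyle\sum_{i=1}^{d} f(x_i \,\mathbf{e}_i)  && \text{(additivity, applied repeatedly)} \\
  &= \textstyle\sum_{i=1}^{d} x_i \, f(\mathbf{e}_i) && \text{(homogeneity)} \\
  &= \mathbf{w}^\top \mathbf{x}, \quad \text{where } w_i := f(\mathbf{e}_i).
\end{align*}
```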
[Figure: a decision tree with YES/NO branches and three leaf nodes, shown alongside the regions it carves out of the plot]
Could we not have gotten the same result by creating two regions by splitting vertically, say using a test on $x$ alone?
Decision Trees
We will study algorithms to learn DTs later. However, note that DT learning is an intractable (NP-hard) problem. Usually we learn layer-by-layer: first the root model, then the models for its children, and so on.
[Figure: a decision tree whose internal nodes are linear classifiers $\operatorname{sign}(\mathbf{u}^\top \mathbf{x})$, $\operatorname{sign}(\mathbf{v}^\top \mathbf{x})$ and $\operatorname{sign}(\mathbf{w}^\top \mathbf{x})$, and whose leaves predict $+1$ or $-1$, shown alongside the piecewise-linear regions it creates in the 2D plot]
How does one learn all these multiple linear models? Do we learn them together or one after another? How do we decide how many layers to have?
The number of layers needs to be tuned as a hyperparameter.
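A minimal sketch (the parameter values and tree structure are illustrative, not taken from the figure) of how such a tree routes a point through linear classifier node models:

```python
import numpy as np

def linear_sign(w, x):
    """Node model: a linear classifier sign(w^T x)."""
    return 1 if w @ x >= 0 else -1

def tree_predict(x, u, v, w):
    """Two-level decision tree whose internal nodes are linear classifiers.

    The root tests sign(u^T x); each child applies its own linear model
    (v or w), and the leaves output +1 / -1 labels.
    """
    if linear_sign(u, x) == 1:
        return +1 if linear_sign(v, x) == 1 else -1
    return +1 if linear_sign(w, x) == 1 else -1

# Illustrative (placeholder) node parameters
u = np.array([1.0, 0.0])    # root splits on the first coordinate
v = np.array([0.0, 1.0])    # one child splits on the second coordinate
w = np.array([0.0, -1.0])   # the other child splits the opposite way
print(tree_predict(np.array([0.5, -0.5]), u, v, w))  # -1
```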
Regression Trees
Decision trees for regression problems are often called regression trees.
[Figure: a regression tree whose root is the linear classifier $\operatorname{sign}(u x + a)$ with $u = 1$, $a = -1$ (i.e., the test $x > 1$), and whose two leaves contain linear regression models of the form $v x + b$ and $w x + c$; the resulting fit is a piecewise-linear curve]
Notice that since this decision tree is solving a regression problem, the leaves each contain a regression model.
Notice that this regression tree cleverly used linear models for classification as well as regression to solve a non-linear regression problem!
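A small sketch of this idea (the root test matches the figure's $u = 1$, $a = -1$; the leaf coefficients below are illustrative placeholders):

```python
def regression_tree_predict(x, u=1.0, a=-1.0, v=0.5, b=1.5, w=-1.0, c=0.5):
    """Regression tree: a linear classifier at the root, linear regressors at the leaves.

    Root test: sign(u*x + a), i.e. whether x > -a/u.
    Leaves:    y = v*x + b on the YES branch, y = w*x + c on the NO branch.
    """
    if u * x + a >= 0:
        return v * x + b
    return w * x + c

# The overall prediction is piecewise linear, hence non-linear in x
for x in [-2.0, 0.0, 2.0, 4.0]:
    print(x, regression_tree_predict(x))
```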
Neural Networks
When an NN fails, it could either be that its parameter values (in this case $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$) were not good, or there may be a deeper (pun intended) problem with the architecture of the NN itself.
The network computes $\operatorname{sign}(\mathbf{w}^\top \max(A\mathbf{x}, 0) + b)$, i.e., $\operatorname{sign}(\mathbf{w}^\top \phi(\mathbf{x}) + b)$, where $\phi(\mathbf{x}) = \max(A\mathbf{x}, 0)$ is the ReLU activation.
[Figure: a one-hidden-layer network $\mathbf{x} \to A \to \text{ReLU } \phi \to \mathbf{w}, b \to \operatorname{sign}$, alongside the 2D dataset it fails to classify]
"Didn't work, did it?" -- Maggie Smith, 2002 (as recounted by Ian McKellen)
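A minimal sketch of this one-hidden-layer forward pass (the weights $\mathbf{w}$ and bias $b$ below are placeholders; the point of the slide is that a poor choice of $A$ leaves the model unable to solve the task):

```python
import numpy as np

def nn_predict(x, A, w, b):
    """One-hidden-layer network: sign(w^T max(Ax, 0) + b)."""
    phi = np.maximum(A @ x, 0.0)   # ReLU feature map phi(x) = max(Ax, 0)
    return np.sign(w @ phi + b)

# With A = identity the "features" are just ReLU'd copies of the inputs,
# so the model on top is still essentially a linear classifier.
A = np.eye(2)
w = np.array([1.0, 1.0])   # placeholder weights
b = -1.0                   # placeholder bias
print(nn_predict(np.array([0.5, -1.5]), A, w, b))
```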
Neural Networks
I have so many questions: how did we choose these features that did well, and how are they learnt?
The network here can discover two new features, each of which looks like $\max(\mathbf{a}^\top \mathbf{x}, 0)$. With parameter matrix $A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, the network computes $\operatorname{sign}(\mathbf{w}^\top \max(A\mathbf{x}, 0) + b)$, where $\phi(\mathbf{x}) = \max(A\mathbf{x}, 0)$ is the ReLU activation. The parameter values are learnt using (S)GD.
The design of the neural network (number of layers, number of nodes in each layer, activation functions) decides what sort of features the NN can discover.
These features were able to solve this task but may not work for some other task. Learning the optimal neural network is also an NP-hard problem.
[Figure: the same one-hidden-layer network, now with decision regions that classify the 2D dataset correctly]
Neural Networks
With $A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, $\mathbf{w} = (1, 1)^\top$ and $b = -1$, and using the identity $\max(t, 0) + \max(-t, 0) = |t|$, the network's prediction simplifies to
$\operatorname{sign}(\mathbf{w}^\top \max(A\mathbf{x}, 0) + b) = \operatorname{sign}(\max(x - y, 0) + \max(y - x, 0) - 1) = \operatorname{sign}(|x - y| - 1)$.
Notice that the NN learnt a very different decision boundary than the DT.
[Figure: the same network with ReLU activation $\phi(\mathbf{x}) = \max(A\mathbf{x}, 0)$, and the resulting decision regions, whose boundary is the pair of lines $|x - y| = 1$]
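A quick numerical check of this simplification (a sketch using the parameter values from the derivation above):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
w = np.array([1.0, 1.0])
b = -1.0

def nn_predict(x):
    """sign(w^T max(Ax, 0) + b) for the parameters above."""
    return np.sign(w @ np.maximum(A @ x, 0.0) + b)

def closed_form(x):
    """The closed form the derivation arrives at: sign(|x1 - x2| - 1)."""
    return np.sign(abs(x[0] - x[1]) - 1.0)

# The two agree on any input
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.uniform(-2, 2, size=2)
    assert nn_predict(x) == closed_form(x)
print("NN output matches sign(|x - y| - 1)")
```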
Exercises
Note that the classifier shown in the figure will also solve this problem. Can you find model parameter values for the previous neural network that will yield this classifier?
[Figure: an alternative decision boundary on the same 2D dataset]
Neural Networks
Note that this neural network creates two new features even though the data point had only one feature to begin with.
The output is $\mathbf{w}^\top \phi(x) + b$ with no final activation (this is a regression problem), where the ReLU activation gives $\phi(x) = \max(x \cdot \mathbf{a} + \mathbf{c}, 0)$ with $\mathbf{a} = (1, -1)$ and $\mathbf{c} = (-1, 1)$.
[Figure: a one-hidden-layer network $x \to \mathbf{a}, \mathbf{c} \to \text{ReLU } \phi \to \mathbf{w}, b$ fitting a non-linear 1D regression curve]
Neural Networks
The two new features define new axes: $\phi(x) = \max(x \cdot \mathbf{a} + \mathbf{c}, 0) = (\max(x - 1, 0), \max(1 - x, 0))$. The new axis values are calculated using the original axis values, not the new ones. The output is again $\mathbf{w}^\top \phi(x) + b$ with no final activation.
[Figure: the same data replotted in the new feature space defined by the two features, where a linear model fits it well]
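A sketch of this 1D feature construction (the feature map uses the slide's values $\mathbf{a} = (1, -1)$, $\mathbf{c} = (-1, 1)$; the output-layer parameters $\mathbf{w}$, $b$ below are placeholders):

```python
import numpy as np

a = np.array([1.0, -1.0])
c = np.array([-1.0, 1.0])

def phi(x):
    """Two ReLU features from a single input: (max(x-1, 0), max(1-x, 0))."""
    return np.maximum(x * a + c, 0.0)

def nn_regress(x, w, b):
    """Regression network: w^T phi(x) + b, with no final activation."""
    return w @ phi(x) + b

# Placeholder output-layer parameters; with w = (1, 1) and b = 0 the network
# outputs |x - 1|, a piecewise-linear (hence non-linear) function of x.
w = np.array([1.0, 1.0])
b = 0.0
for x in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    print(x, nn_regress(x, w, b))
```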
Non-linearity in Neural Networks
The simplest features learnt by NNs look like $\phi(\mathbf{x}) = \sigma(A\mathbf{x} + \mathbf{c})$, where $\sigma$ is an activation function. Examples: ReLU, GeLU, sigmoid, tanh. Activation functions are often applied coordinate-wise.
NNs often stack such feature learners. Such deep stacking of layers allows NNs to learn very powerful features, on top of which a linear model can do well (see the sketch below).
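A minimal sketch of such stacking (the layer sizes, the random weights, and the choice of ReLU here are illustrative assumptions):

```python
import numpy as np

def relu(t):
    """Coordinate-wise ReLU activation."""
    return np.maximum(t, 0.0)

def stacked_features(x, layers):
    """Apply a stack of feature learners phi(h) = sigma(A h + c), layer by layer."""
    h = x
    for A, c in layers:
        h = relu(A @ h + c)
    return h

rng = np.random.default_rng(0)
# Two illustrative layers mapping 2 -> 4 -> 3 features
layers = [(rng.normal(size=(4, 2)), rng.normal(size=4)),
          (rng.normal(size=(3, 4)), rng.normal(size=3))]
w, b = rng.normal(size=3), 0.0           # a final linear model on top
x = np.array([0.5, -1.0])
print(w @ stacked_features(x, layers) + b)
```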
Exercises
Note that we will often use the terms linear and affine interchangeably: strictly, $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$ is an affine function, and is linear (in the sense defined earlier) only when $b = 0$.