Lec 3

The document discusses linear models and their application in authentication through secret questions, focusing on the concept of large margin classifiers and Support Vector Machines (SVM). It explains the optimization problems associated with SVMs, including the use of slack variables and the hinge loss function to balance classification accuracy and margin size. Additionally, it covers optimization techniques such as gradient descent and coordinate descent for finding optimal classifiers.

The Best Linear Model?

Authentication by Secret Questions

SERVER: "Give me your device ID and answer the following questions."
DEVICE: "TS271828182845"

  Question (challenge)   Answer (response)
  1. 10111100            1
  2. 00110010            0
  3. 10001110            1
  4. 00010100            0
  5. …                    …
Arbiter PUFs

If the top signal reaches the finish line first, the "answer" to this question is 0; else, if the bottom signal reaches the finish line first, the "answer" is 1.

[Figure: an arbiter PUF circuit being fed the question (challenge) 1011, one bit per stage, producing the answer 1?]
What just happened?

[Figure: an embedding of the challenges as points 𝐱0, 𝐱1, …, 𝐱9 in a feature space, with the two response classes separable by a line]

  Challenge   Response
  10111100    1
  00110010    0
  10001110    1
  00010100    0
  01101111    1
  01010111    1
  10100110    0
  10101001    0
  11010111    0
  00001010    1
Linear Models

We have a timing difference $\Delta = \mathbf{w}^\top\mathbf{x} + b$, where $\mathbf{x}$ is a feature vector derived from the challenge bits and $(\mathbf{w}, b)$ is determined by the wire delays of the device.

If $\Delta > 0$, the upper signal wins and the answer is 0
If $\Delta < 0$, the lower signal wins and the answer is 1
Thus, the answer is simply determined by the sign of $\mathbf{w}^\top\mathbf{x} + b$

This is nothing but a linear classifier!
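As a tiny illustration (not from the lecture), here is what this thresholding looks like in code; the feature vector, weights and bias below are made up, and the exact mapping from challenge bits to the feature vector is left abstract.

```python
import numpy as np

# Hypothetical example: some feature vector derived from an 8-bit challenge
# and some delay-related parameters (w, b) of the device.
x = np.array([1., -1., 1., 1., -1., -1., 1., -1.])   # assumed feature map of the challenge
w = np.array([0.3, -0.1, 0.7, -0.2, 0.05, 0.4, -0.6, 0.1])
b = -0.05

delta = w @ x + b                 # signed "timing difference" of the two signals
answer = 0 if delta > 0 else 1    # upper signal wins -> 0, lower signal wins -> 1
print(answer)
```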
The "Best" Linear Classifier 6

"It seems infinitely many classifiers perfectly classify the data. Which one should I choose?"

Indeed! Such models would be very brittle and might misclassify test data (i.e. predict the wrong class), even test data that look very similar to the train data.

It is better not to select a model whose decision boundary passes very close to a training data point.
Large Margin Classifiers 7

Fact: the distance of the origin from the hyperplane $\{\mathbf{x} : \mathbf{w}^\top\mathbf{x} + b = 0\}$ is $|b| / \|\mathbf{w}\|_2$.
Fact: the distance of a point $\mathbf{x}$ from this hyperplane is $|\mathbf{w}^\top\mathbf{x} + b| / \|\mathbf{w}\|_2$.

Given train data $\{(\mathbf{x}^i, y^i)\}_{i=1}^n$ for a binary classification problem, where $\mathbf{x}^i \in \mathbb{R}^d$ and $y^i \in \{-1, +1\}$, we want two things from a classifier.

Demand 1: classify every point correctly – how to ask this politely?
One way: demand that for all $i$, $\mathrm{sign}(\mathbf{w}^\top\mathbf{x}^i + b) = y^i$
Easier way: demand that for all $i$, $y^i(\mathbf{w}^\top\mathbf{x}^i + b) > 0$

Demand 2: do not let any data point come close to the boundary
Demand that the geometric margin $\min_i \frac{|\mathbf{w}^\top\mathbf{x}^i + b|}{\|\mathbf{w}\|_2}$ be as large as possible
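A small sketch (with made-up data and weights) of checking Demand 1 and computing the quantity in Demand 2:

```python
import numpy as np

# Made-up 2D training set: X is n x d, y in {-1, +1}
X = np.array([[2.0, 1.0], [1.5, 2.5], [-1.0, -2.0], [-2.5, -0.5]])
y = np.array([+1, +1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

scores = X @ w + b
perfectly_classified = np.all(y * scores > 0)                   # Demand 1
geometric_margin = np.min(np.abs(scores)) / np.linalg.norm(w)   # Demand 2
print(perfectly_classified, geometric_margin)
```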
Support Vector Machines 8

"Support Vector Machine" is just a fancy way of saying: "Please find me a linear classifier that perfectly classifies the train data while keeping the data points as far away from the hyperplane as possible."

The mathematical way of writing this request is the following:

$$\max_{\mathbf{w}, b} \ \min_i \frac{|\mathbf{w}^\top\mathbf{x}^i + b|}{\|\mathbf{w}\|_2} \quad \text{such that} \quad y^i(\mathbf{w}^\top\mathbf{x}^i + b) > 0 \ \text{for all}\ i$$

This is known as an optimization problem, with an objective (the quantity being maximized) and constraints (the conditions that must hold).

"This looks so complicated, how will I ever find a solution to this optimization problem?" Let us simplify this optimization problem.
Constrained Optimization 101 9

Constraints are usually specified using math equations. The set of points that satisfy all the constraints is called the feasible set of the optimization problem.

HOW WE MUST SPEAK TO MELBO (objective + constraints):
$$\min_x \ f(x) \quad \text{s.t.} \quad x \ge 3 \ \text{and}\ x \le 6$$
Here the feasible set is the interval $[3, 6]$, and for the specified constraints, the optimal (least) value of $f$ is achieved at a point of this feasible set.

HOW WE SPEAK TO A HUMAN: "I want to find an unknown $x$ that gives me the best value according to this function $f$. Oh! and btw, not any $x$ would do! It must satisfy these conditions. All I am saying is: of the values of $x$ that satisfy my conditions, find me the one that gives the best (least) value according to $f$."

If, on the other hand, no point satisfies all the constraints, the feasible set is empty and the optimization problem has no solution.
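As a minimal sketch of handing an objective and constraints to a solver (the objective $f(x) = (x-1)^2$ here is a made-up choice, and the interval constraints mirror the example above), one could use scipy:

```python
from scipy.optimize import minimize

# Hypothetical objective: f(x) = (x - 1)^2, minimized over the feasible set [3, 6]
objective = lambda x: (x[0] - 1.0) ** 2

constraints = [
    {"type": "ineq", "fun": lambda x: x[0] - 3.0},  # x >= 3
    {"type": "ineq", "fun": lambda x: 6.0 - x[0]},  # x <= 6
]

result = minimize(objective, x0=[4.0], constraints=constraints, method="SLSQP")
print(result.x)  # the optimum lies at the boundary x = 3 of the feasible set
```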
Back to SVMs 10

Assume there do exist params $(\mathbf{w}, b)$ that perfectly classify all the train data. (What if the train data is non-linearly separable, i.e. no linear classifier can perfectly classify it? We will remove this assumption later.)

Consider one such $(\mathbf{w}, b)$ which classifies the train data perfectly.
Now, as $\mathrm{sign}(\mathbf{w}^\top\mathbf{x}^i + b) = y^i$ for every $i$, we have $|\mathbf{w}^\top\mathbf{x}^i + b| = y^i(\mathbf{w}^\top\mathbf{x}^i + b)$ for every $i$.
Thus, the geometric margin $\min_i \frac{|\mathbf{w}^\top\mathbf{x}^i + b|}{\|\mathbf{w}\|_2}$ is the same as $\min_i \frac{y^i(\mathbf{w}^\top\mathbf{x}^i + b)}{\|\mathbf{w}\|_2}$, since the model has perfect classification!
We will use this useful fact to greatly simplify the optimization problem.
Support Vector Machines 11

Recall that all this discussion holds only for a perfect classifier.
Let $\mathbf{x}^m$ be the data point that comes closest to the hyperplane, i.e. $m = \arg\min_i y^i(\mathbf{w}^\top\mathbf{x}^i + b)$.
Let $\gamma = y^m(\mathbf{w}^\top\mathbf{x}^m + b)$ and consider $\tilde{\mathbf{w}} = \mathbf{w}/\gamma$, $\tilde{b} = b/\gamma$.
Note this gives us $y^i(\tilde{\mathbf{w}}^\top\mathbf{x}^i + \tilde{b}) \ge 1$ for all $i$, as well as $\min_i y^i(\tilde{\mathbf{w}}^\top\mathbf{x}^i + \tilde{b}) = 1$ (as $\gamma > 0$).
Thus, instead of searching for $(\mathbf{w}, b)$, it is easier to search for $(\tilde{\mathbf{w}}, \tilde{b})$:

$$\min_{\tilde{\mathbf{w}}, \tilde{b}} \ \|\tilde{\mathbf{w}}\|_2^2 \quad \text{such that} \quad y^i(\tilde{\mathbf{w}}^\top\mathbf{x}^i + \tilde{b}) \ge 1 \ \text{for all}\ i$$
The C-SVM Technique 12

For linearly separable cases where we suspect a perfect classifier exists:

$$\min_{\mathbf{w}, b} \ \|\mathbf{w}\|_2^2 \quad \text{s.t.} \quad y^i(\mathbf{w}^\top\mathbf{x}^i + b) \ge 1 \ \text{for all}\ i$$

If a linear classifier cannot perfectly classify the data, then find the model using

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \|\mathbf{w}\|_2^2 + C\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad y^i(\mathbf{w}^\top\mathbf{x}^i + b) \ge 1 - \xi_i \ \text{for all}\ i, \ \text{as well as}\ \xi_i \ge 0 \ \text{for all}\ i$$

The terms $\xi_i$ are called slack variables (recall the English phrase "cut me some slack"). They allow some data points to come close to the hyperplane or be misclassified altogether.

"What prevents me from misusing the slack variables to learn a model that misclassifies every data point?" The term $C\sum_i \xi_i$ prevents you from doing so. If we set $C$ to a large value (it is a hyper-parameter), then it will penalize solutions that misuse slack too much. Having the constraint $\xi_i \ge 0$ prevents us from misusing slack to artificially inflate the margin.
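In practice, an objective of this regularized hinge-loss form can be handed to an off-the-shelf solver. Below is a minimal sketch using scikit-learn's LinearSVC on made-up data; its objective matches the C-SVM above up to the exact scaling of the regularizer, and the value of C shown is arbitrary.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up linearly separable data: X is n x d, y in {-1, +1}
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([+1, +1, -1, -1])

# C is the hyper-parameter trading off margin size against slack usage
clf = LinearSVC(C=10.0, loss="hinge")
clf.fit(X, y)
print(clf.coef_, clf.intercept_)   # learned (w, b)
print(clf.predict([[1.5, 1.0]]))   # predicted label for a new point
```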
From C-SVM to Loss Functions 13

We can further simplify the previous optimization problem.
Note that $\xi_i$ basically allows us to have $y^i(\mathbf{w}^\top\mathbf{x}^i + b) < 1$ (even $< 0$).
Thus, the amount of slack we want is just $\xi_i = 1 - y^i(\mathbf{w}^\top\mathbf{x}^i + b)$.
However, recall that we must also satisfy $\xi_i \ge 0$. Another way of saying this: if you already have $y^i(\mathbf{w}^\top\mathbf{x}^i + b) \ge 1$, then you don't need any slack, i.e. you should have $\xi_i = 0$ in this case.
Thus, we need only set $\xi_i = \left[1 - y^i(\mathbf{w}^\top\mathbf{x}^i + b)\right]_+$, where $[x]_+ = \max\{x, 0\}$.
The above is nothing but the popular hinge loss function!
Hinge Loss 14

Captures how well a classifier classified a data point.
Suppose on a data point $(\mathbf{x}, y)$, a model gives a prediction score of $\hat{y}$ (for a linear model $(\mathbf{w}, b)$, we have $\hat{y} = \mathbf{w}^\top\mathbf{x} + b$).
We obviously want $y \cdot \hat{y} > 0$ for correct classification, but we also want $y \cdot \hat{y} \ge 1$ for a large margin – the hinge loss function
$$\ell_{\text{hinge}}(y, \hat{y}) = \left[1 - y \cdot \hat{y}\right]_+$$
captures both.
Note that hinge loss not only penalizes misclassification but also penalizes correct classification if the data point gets too close to the hyperplane (i.e. $0 < y \cdot \hat{y} < 1$).
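A one-function sketch of the hinge loss and how it scores a misclassified point, a correct-but-too-close point, and a safely classified point (the scores are made up):

```python
import numpy as np

def hinge_loss(y, y_hat):
    """Hinge loss [1 - y * y_hat]_+ for labels y in {-1, +1} and prediction scores y_hat."""
    return np.maximum(0.0, 1.0 - y * y_hat)

print(hinge_loss(+1, -0.5))  # 1.5 : misclassified, penalized heavily
print(hinge_loss(+1, 0.3))   # 0.7 : correct, but within the margin, still penalized
print(hinge_loss(+1, 2.0))   # 0.0 : correct with a comfortable margin
```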
Final Form of C-SVM 15

Recall that the C-SVM optimization finds a model by solving
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \|\mathbf{w}\|_2^2 + C\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad y^i(\mathbf{w}^\top\mathbf{x}^i + b) \ge 1 - \xi_i \ \text{for all}\ i, \ \text{as well as}\ \xi_i \ge 0 \ \text{for all}\ i$$
Using the previous discussion, we can rewrite the above very simply as the unconstrained problem
$$\min_{\mathbf{w}, b} \ \|\mathbf{w}\|_2^2 + C\sum_{i=1}^n \left[1 - y^i(\mathbf{w}^\top\mathbf{x}^i + b)\right]_+$$
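A short sketch of evaluating this rewritten, unconstrained objective on made-up data and a made-up model:

```python
import numpy as np

def csvm_objective(w, b, X, y, C):
    """||w||_2^2 + C * sum_i [1 - y_i (w.x_i + b)]_+  (the rewritten C-SVM objective)."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return w @ w + C * hinge.sum()

# Made-up data and model
X = np.array([[2.0, 1.0], [-1.0, -2.0], [0.2, 0.1]])
y = np.array([+1, -1, +1])
print(csvm_objective(np.array([1.0, 1.0]), 0.0, X, y, C=1.0))
```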
Use Calculus for Optimization 16

Method 1: First-order optimality condition
Exploits the fact that the gradient must vanish at a local optimum.
Also exploits the fact that for convex functions, local minima are global.
Warning: works only for simple convex functions when there are no constraints.
To do: given a convex function that we wish to minimize, try finding all the stationary points of the function (set the gradient to zero).
If you find only one, that has to be the global minimum!
Example: for a convex function such as $f(x) = x^2 - 2x + 3$, $f'(x) = 2x - 2 = 0$ only at $x = 1$; $f$ is cvx, i.e. $x = 1$ is the global min.
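A tiny sketch of Method 1 on the convex example above, using sympy to find the stationary point by setting the derivative to zero:

```python
import sympy as sp

x = sp.symbols("x")
f = x**2 - 2*x + 3                       # a simple convex function (illustrative choice)

stationary_points = sp.solve(sp.diff(f, x), x)   # set f'(x) = 0
print(stationary_points)                 # [1] : the only stationary point
print(f.subs(x, stationary_points[0]))   # 2   : hence the global minimum value
```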
Use Calculus for Optimization 17

Method 2: Perform (sub)gradient descent
Recall that the direction opposite to the gradient offers the steepest descent.

(SUB)GRADIENT DESCENT
1. Given: obj. func. $f$ to minimize
2. Initialize $\mathbf{w}^0$  (how to initialize $\mathbf{w}^0$?)
3. For $t = 1, 2, \ldots$
   1. Obtain a (sub)gradient $\mathbf{g}^t$ of $f$ at $\mathbf{w}^{t-1}$
   2. Choose a step length $\eta_t$  (often called "step length" or "learning rate" – how to choose $\eta_t$?)
   3. Update $\mathbf{w}^t \leftarrow \mathbf{w}^{t-1} - \eta_t \cdot \mathbf{g}^t$
4. Repeat until convergence  (what is convergence? how to decide if we have converged?)
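A minimal sketch of this loop in code; the objective, initialization, fixed step length, and gradient-norm stopping rule below are all illustrative choices, not the only options:

```python
import numpy as np

def gradient_descent(grad, w0, step=0.1, tol=1e-6, max_iters=1000):
    """Generic GD loop: repeatedly move opposite to the gradient until convergence."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        g = grad(w)                      # 1. obtain a (sub)gradient
        if np.linalg.norm(g) < tol:      #    declare convergence when it is tiny
            break
        w = w - step * g                 # 2-3. choose a step length and update
    return w

# Example: minimize f(w) = ||w - [3, -1]||^2, whose gradient is 2 (w - [3, -1])
grad = lambda w: 2.0 * (w - np.array([3.0, -1.0]))
print(gradient_descent(grad, w0=[0.0, 0.0]))   # converges near [3, -1]
```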
Gradient Descent (GD) 18

Move opposite to the gradient at every step. Choose the step length carefully, else you may overshoot the minimum even with a great initialization.

Also, initialization may affect the result: one initialization may be such that we converge to a local minimum, while a really nice initialization takes us to the global minimum.

With convex functions, all local minima are global minima, so we can afford to be less careful with initialization. Still, we need to be careful with step lengths, otherwise we may overshoot the global minimum.
Behind the scenes in GD for SVM 19

So gradient descent, although a mathematical tool from calculus, actually tries very actively to make the model perform better on all data points.

Consider the C-SVM objective (ignore the bias $b$ for now):
$$f(\mathbf{w}) = \|\mathbf{w}\|_2^2 + C\sum_{i=1}^n \left[1 - y^i \cdot \mathbf{w}^\top\mathbf{x}^i\right]_+$$
A subgradient is $\mathbf{g} = 2\mathbf{w} + C\sum_{i=1}^n \mathbf{g}^i$, where $\mathbf{g}^i = -y^i\mathbf{x}^i$ if $y^i \cdot \mathbf{w}^\top\mathbf{x}^i < 1$ and $\mathbf{g}^i = \mathbf{0}$ otherwise.

Assume for a moment, for the sake of understanding, that the step length is $\eta = 1$.
If $\mathbf{w}$ does well on $\mathbf{x}^i$, say $y^i \cdot \mathbf{w}^\top\mathbf{x}^i \ge 1$, then $\mathbf{g}^i = \mathbf{0}$: no change to $\mathbf{w}$ due to the data point $\mathbf{x}^i$.
If $\mathbf{w}$ does badly on $\mathbf{x}^i$, say $y^i \cdot \mathbf{w}^\top\mathbf{x}^i < 1$, then the update adds $C \cdot y^i\mathbf{x}^i$ to $\mathbf{w}$, so the new $\mathbf{w}$ may get a much better margin on $\mathbf{x}^i$ than the old one.

Small $\eta$: do not change $\mathbf{w}$ too much! Large $\eta$: feel free to change $\mathbf{w}$ as much as the gradient dictates.
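A sketch of these updates for the (bias-free) C-SVM objective on made-up data; the decaying step length $\eta_t = 1/t$ and the iteration count are arbitrary choices:

```python
import numpy as np

def svm_subgradient(w, X, y, C):
    """Subgradient of ||w||^2 + C * sum_i [1 - y_i w.x_i]_+ (no bias term)."""
    margins = y * (X @ w)
    violating = margins < 1                               # points the model does badly on
    g_data = -(y[violating, None] * X[violating]).sum(axis=0)
    return 2.0 * w + C * g_data                           # well-classified points contribute nothing

# Made-up data: X is n x d, y in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-1.0, -2.5]])
y = np.array([+1, +1, -1, -1])

w = np.zeros(2)
for t in range(1, 201):
    w -= (1.0 / t) * svm_subgradient(w, X, y, C=1.0)      # decaying step length
print(w, y * (X @ w))    # final weights and the margins they achieve
```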
Stochastic Gradient Method 20
, where
Calculating each takes time since - total
At each time, choose a random data point
- only time!!
Warning: may have to perform several SGD steps
than we had to do with GD but each SGD step is much
cheaper than a GD step
We take Doawerandom data
really need point to
to spend avoidallbeing
Initially, we need unlucky
is a
(also it is
so cheap)
much time on just one general direction in which
update? No,toSGD
move
gives a
Especially in the beginning,
cheaper way to
when we are far away from
perform gradient
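A sketch of the corresponding stochastic update, where each step looks at a single randomly chosen data point and rescales its contribution by $n$ (this rescaling convention is an assumption made here so that the estimate matches the full gradient in expectation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(w, X, y, C, step):
    """One SGD step for ||w||^2 + C * sum_i hinge_i, using a single random data point."""
    n = len(y)
    i = rng.integers(n)                                   # pick one point at random
    g_i = -y[i] * X[i] if y[i] * (X[i] @ w) < 1 else np.zeros_like(w)
    g = 2.0 * w + C * n * g_i                             # O(d) work instead of O(nd)
    return w - step * g

# Made-up data, same shape conventions as before
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-1.0, -2.5]])
y = np.array([+1, +1, -1, -1])
w = np.zeros(2)
for t in range(1, 501):
    w = sgd_step(w, X, y, C=1.0, step=1.0 / t)
print(w)
```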
Mini-batch SGD 21

If the data is very diverse, the "stochastic" gradient may vary quite a lot depending on which random data point is chosen.
This is called variance (more on this later), and it can slow down the SGD process – make it jittery.
One solution: choose more than one random point.
At each step, choose $B$ random data points ($B$ = mini-batch size) without replacement, say $S_t \subset \{1, \ldots, n\}$ with $|S_t| = B$, and use
$$\mathbf{g}^t = 2\mathbf{w} + C \cdot \frac{n}{B} \sum_{i \in S_t} \mathbf{g}^i$$
Takes $O(Bd)$ time to execute MBSGD – more expensive than SGD, but still much cheaper than GD when $B \ll n$.
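A short sketch of the mini-batch gradient estimate, sampling $B$ indices without replacement and rescaling by $n/B$ as above:

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_gradient(w, X, y, C, B):
    """Mini-batch estimate of the gradient of ||w||^2 + C * sum_i hinge_i."""
    n = len(y)
    idx = rng.choice(n, size=B, replace=False)            # B random points, no repeats
    Xb, yb = X[idx], y[idx]
    violating = yb * (Xb @ w) < 1
    g_batch = -(yb[violating, None] * Xb[violating]).sum(axis=0)
    return 2.0 * w + C * (n / B) * g_batch                # rescale so the estimate is unbiased

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-1.0, -2.5]])
y = np.array([+1, +1, -1, -1])
print(minibatch_gradient(np.zeros(2), X, y, C=1.0, B=2))
```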
Coordinate Descent 22

Sometimes we are able to optimize completely along a given variable (even if constraints are there) – this is called coordinate minimization (CM).

Coordinate descent is similar to GD except only one coordinate is changed in a single step.
E.g. to minimize $f(\mathbf{w})$, with $g_j$ denoting the $j$-th partial derivative $\frac{\partial f}{\partial w_j}$:

COORDINATE DESCENT
1. For $t = 1, 2, \ldots$
   1. Select a coordinate $j_t$
   2. Let $w^t_{j_t} \leftarrow w^{t-1}_{j_t} - \eta_t \cdot g_{j_t}(\mathbf{w}^{t-1})$
   3. Let $w^t_j \leftarrow w^{t-1}_j$ for $j \ne j_t$
   4. Repeat until convergence

Ways to select the coordinate:
CCD: choose coordinates cyclically, i.e. cycle through $j = 1, \ldots, d$ in order
SCD: choose $j_t$ randomly
Randperm: permute the coordinates randomly and choose them in that order; once the list is over, choose a new random permutation
Block CD: choose a small set of coordinates to update at each step
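A minimal sketch of cyclic coordinate descent (CCD) on a made-up smooth objective; the step length and epoch count are arbitrary choices:

```python
import numpy as np

def cyclic_coordinate_descent(grad, w0, step=0.2, n_epochs=100):
    """CCD: in each step, update only one coordinate, cycling through them in order."""
    w = np.asarray(w0, dtype=float)
    d = len(w)
    for _ in range(n_epochs):
        for j in range(d):               # CCD chooses coordinates cyclically
            g = grad(w)                  # full gradient; only its j-th entry is used
            w[j] -= step * g[j]
    return w

# Example: minimize f(w) = (w1 - 2)^2 + 3 (w2 + 1)^2
grad = lambda w: np.array([2.0 * (w[0] - 2.0), 6.0 * (w[1] + 1.0)])
print(cyclic_coordinate_descent(grad, w0=[0.0, 0.0]))   # converges near [2, -1]
```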