Linear classifiers
1 Classification
A binary classifier is a mapping from Rd → {−1, +1}. We'll often use the letter h (for hypothesis) to stand for a classifier, so the classification process looks like:

x → h → y.

(Actually, general classifiers can have a range which is any discrete set, but we'll work with this specific case for a while.)

Real life rarely gives us vectors of real numbers; the x we really want to classify is usually something like a song, image, or person. In that case, we'll have to define a function ϕ(x), whose range is Rd, where ϕ represents features of x, like a person's height or the amount of bass in a song, and then let h : ϕ(x) → {−1, +1}. In much of the following, we'll omit explicit mention of ϕ and assume that the x(i) are in Rd, but you should always have in mind that some additional process was almost surely required to go from the actual input examples to their feature representation.
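As a toy illustration of such a feature function (not part of the original notes; the particular features and the dict representation of a "person" are invented here), a ϕ might collect a few numeric measurements into a column vector:

```python
import numpy as np

def phi(person):
    """Map a raw example (here, a dict describing a person) to a feature
    vector in R^d, represented as a d x 1 column vector. The specific
    features are made up purely for illustration."""
    return np.array([[person["height_cm"]],
                     [person["weight_kg"]]])

x = phi({"height_cm": 170.0, "weight_kg": 65.0})  # a 2 x 1 column vector
```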
In supervised learning we are given a training data set of the form
D_n = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\}.
We will assume that each x(i) is a d × 1 column vector. The intended meaning of this data is
that, when given an input x(i) , the learned hypothesis should generate output y(i) .
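In numpy, one convenient convention (our choice for these sketches, not mandated by the notes) is to stack the column vectors x(i) side by side into a d × n array and the labels into a 1 × n array; the numbers below are made up:

```python
import numpy as np

# d = 2 features, n = 4 training examples; values are invented for illustration.
X = np.array([[3.0, 4.0, -1.0, 0.5],   # column i is x^(i), a d x 1 vector
              [2.0, -1.0, 2.0, 1.0]])
y = np.array([[+1, -1, +1, -1]])        # y[0, i] is the label y^(i)
```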
What makes a classifier useful? That it works well on new data; that is, that it makes good predictions on examples it hasn't seen. But we don't know exactly what data this classifier might be tested on when we use it in the real world. So, we have to assume a connection between the training data and testing data; typically, they are drawn independently from the same probability distribution.

(My favorite analogy is to problem sets. We evaluate a student's ability to generalize by putting questions on the exam that were not on the homework (training set).)

Given a training set Dn and a classifier h, we can define the training error of h to be

E_n(h) = \frac{1}{n} \sum_{i=1}^{n} \begin{cases} 1 & \text{if } h(x^{(i)}) \neq y^{(i)} \\ 0 & \text{otherwise.} \end{cases}
For now, we will try to find a classifier with small training error (later, with some added
criteria) and hope it generalizes well to new data, and has a small test error
E(h) = \frac{1}{n'} \sum_{i=n+1}^{n+n'} \begin{cases} 1 & \text{if } h(x^{(i)}) \neq y^{(i)} \\ 0 & \text{otherwise} \end{cases}
on n′ new examples that were not used in the process of finding the classifier.
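To make these definitions concrete, here is one way to compute such an error rate in numpy, under the illustrative convention above that examples are stored as columns of a d × m array and labels as a 1 × m array; the function name and calling convention are ours:

```python
import numpy as np

def error_rate(h, X, Y):
    """Fraction of examples in (X, Y) that classifier h gets wrong.
    X: d x m array whose columns are examples; Y: 1 x m array of +/-1 labels;
    h: a function taking a d x 1 column vector to +1 or -1."""
    m = X.shape[1]
    mistakes = sum(1 for i in range(m) if h(X[:, i:i+1]) != Y[0, i])
    return mistakes / m

# The training error E_n(h) is this quantity computed on the training data;
# the test error E(h) is the same computation applied to n' fresh examples
# that were not used to find h.
```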
2 Learning algorithm
A hypothesis class H is a set (finite or infinite) of possible classifiers, each of which represents
a mapping from Rd → {−1, +1}.
A learning algorithm is a procedure that takes a data set Dn as input and returns an element h of H; it looks like:

Dn → learning algorithm (H) → h.
We will find that the choice of H can have a big impact on the test error of the h that
results from this process. One way to get an h that generalizes well is to restrict the size, or "expressiveness," of H.
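As a sketch of this interface (the finite hypothesis class and the brute-force search are invented purely for illustration, and the `learn` name is ours), a learning algorithm is just a function from a data set to a hypothesis; for instance, one could pick the element of a small finite H with the lowest training error, reusing the error_rate sketch above:

```python
def learn(X, Y, hypotheses):
    """Return the hypothesis in the (finite, illustrative) collection
    `hypotheses` with the smallest training error on the data set (X, Y)."""
    return min(hypotheses, key=lambda h: error_rate(h, X, Y))
```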
3 Linear classifiers
We’ll start with the hypothesis class of linear classifiers. They are (relatively) easy to un-
derstand, simple in a mathematical sense, powerful on their own, and the basis for many
other more sophisticated methods.
A linear classifier in d dimensions is defined by a vector of parameters θ ∈ Rd and
scalar θ0 ∈ R. So, the hypothesis class H of linear classifiers in d dimensions is the set of all
vectors in Rd+1 . We’ll assume that θ is a d × 1 column vector.
Given particular values for θ and θ0, the classifier is defined by

h(x; \theta, \theta_0) = \mathrm{sign}(\theta^T x + \theta_0) = \begin{cases} +1 & \text{if } \theta^T x + \theta_0 > 0 \\ -1 & \text{otherwise.} \end{cases}

(Let's be careful about dimensions. We have assumed that x and θ are both d × 1 column vectors. So θT x is 1 × 1, which in math (but not necessarily numpy) is the same as a scalar.)

Remember that we can think of θ, θ0 as specifying a hyperplane. It divides Rd, the space our x(i) points live in, into two half-spaces. The one that is on the same side as the normal vector is the positive half-space, and we classify all points in that space as positive. The half-space on the other side is negative and all points in it are classified as negative.
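A direct translation of this definition into numpy might look like the following (a minimal sketch; the function name and calling convention are ours):

```python
import numpy as np

def linear_classify(x, theta, theta_0):
    """h(x; theta, theta_0) = sign(theta^T x + theta_0), where x and theta are
    d x 1 column vectors and theta_0 is a scalar. Following the definition
    above, a value of exactly 0 is classified as -1."""
    # .item() turns the 1 x 1 array theta^T x into a Python scalar,
    # as the dimensions note above warns.
    return +1 if (theta.T @ x).item() + theta_0 > 0 else -1
```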
Example: Let h be the linear classifier defined by \theta = \begin{bmatrix} -1 \\ 1.5 \end{bmatrix}, \theta_0 = 3.

The diagram below shows several points classified by h. In particular, let x^{(1)} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} and x^{(2)} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}. Then

h(x^{(1)}; \theta, \theta_0) = \mathrm{sign}\left(\begin{bmatrix} -1 & 1.5 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} + 3\right) = \mathrm{sign}(3) = +1

h(x^{(2)}; \theta, \theta_0) = \mathrm{sign}\left(\begin{bmatrix} -1 & 1.5 \end{bmatrix}\begin{bmatrix} 4 \\ -1 \end{bmatrix} + 3\right) = \mathrm{sign}(-2.5) = -1
Thus, x(1) and x(2) are given positive and negative classifications, respectively.
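Plugging the example's numbers into the linear_classify sketch above reproduces the two classifications:

```python
import numpy as np

theta = np.array([[-1.0], [1.5]])
theta_0 = 3.0
x1 = np.array([[3.0], [2.0]])
x2 = np.array([[4.0], [-1.0]])

print(linear_classify(x1, theta, theta_0))  # +1, since -1*3 + 1.5*2 + 3 = 3 > 0
print(linear_classify(x2, theta, theta_0))  # -1, since -1*4 + 1.5*(-1) + 3 = -2.5
```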
[Diagram: the points x(1) and x(2) plotted on either side of the separating hyperplane θT x + θ0 = 0, with a green normal vector pointing into the positive half-space.]
Study Question: What is the green vector normal to the hyperplane? Specify it as a
column vector.
Study Question: What change would you have to make to θ, θ0 if you wanted to
have the separating hyperplane in the same place, but to classify all the points la-
beled ’+’ in the diagram as negative and all the points labeled ’-’ in the diagram as
positive?
• Evaluate resulting h on a testing set that does not overlap the training set
Doing this multiple times controls for possible poor choices of training set or unfortunate
randomization inside the algorithm itself.
One concern is that we might need a lot of data to do this, and in many applications
data is expensive or difficult to acquire. We can re-use data with cross validation (but it’s
harder to do theoretical analysis).
CROSS-VALIDATE(D, k)
1  divide D into k chunks D1, D2, . . . , Dk (of roughly equal size)
2  for i = 1 to k
3      train hi on D \ Di (withholding chunk Di)
4      compute "test" error Ei(hi) on withheld data Di
5  return (1/k) Σ_{i=1}^{k} Ei(hi)
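A runnable version of this procedure (a sketch under the same array conventions and the error_rate helper from above; the contiguous column-slice chunking and the `train` parameter are our choices) might look like:

```python
import numpy as np

def cross_validate(X, Y, k, train):
    """Estimate the performance of a learning algorithm `train` by k-fold
    cross-validation. `train(X, Y)` returns a classifier h; the chunks
    D_1 .. D_k are contiguous column slices of roughly equal size."""
    n = X.shape[1]
    idx = np.array_split(np.arange(n), k)            # indices of chunks D_1 .. D_k
    errors = []
    for i in range(k):
        held_out = idx[i]
        kept = np.concatenate([idx[j] for j in range(k) if j != i])
        h = train(X[:, kept], Y[:, kept])             # train h_i on D \ D_i
        errors.append(error_rate(h, X[:, held_out], Y[:, held_out]))
    return sum(errors) / k                            # average "test" error
```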
It’s very important to understand that cross-validation neither delivers nor evaluates a
single particular hypothesis h. It evaluates the algorithm that produces hypotheses.