
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 10:

LINEAR DISCRIMINATION
Likelihood- vs. Discriminant-based Classification

- Likelihood-based: assume a model for p(x | C_i) and use Bayes' rule to calculate P(C_i | x); the discriminant is g_i(x) = log P(C_i | x).
- Discriminant-based: assume a model directly for g_i(x | Φ_i); no density estimation.
- Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries.
Linear Discriminant

- Linear discriminant:

  g_i(x | w_i, w_{i0}) = w_i^T x + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}

- Advantages:
  - Simple: O(d) space/computation
  - Knowledge extraction: the discriminant is a weighted sum of the attributes; the signs and magnitudes of the weights are interpretable (e.g., credit scoring)
  - Optimal when the p(x | C_i) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable
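A minimal sketch, assuming NumPy and made-up weight values, of evaluating the linear discriminants and choosing the class with the largest g_i(x):

```python
import numpy as np

# Illustrative parameters for K = 3 classes and d = 3 features (values are made up).
W = np.array([[ 0.8, -0.2,  0.5],    # w_1
              [-0.3,  0.6,  0.1],    # w_2
              [ 0.1,  0.1, -0.7]])   # w_3
w0 = np.array([0.2, -0.1, 0.4])      # biases w_{i0}

def linear_discriminants(x, W, w0):
    """g_i(x) = w_i^T x + w_{i0} for every class i."""
    return W @ x + w0

x = np.array([1.0, 2.0, -1.0])
g = linear_discriminants(x, W, w0)
predicted_class = int(np.argmax(g))   # choose C_i with the largest g_i(x)
```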
Generalized Linear Model

- Quadratic discriminant:

  g_i(x | W_i, w_i, w_{i0}) = x^T W_i x + w_i^T x + w_{i0}

- Higher-order (product) terms, e.g.:

  z_1 = x_1, \; z_2 = x_2, \; z_3 = x_1^2, \; z_4 = x_2^2, \; z_5 = x_1 x_2

- Map from x to z using nonlinear basis functions and use a linear discriminant in z-space:

  g_i(x) = \sum_{j=1}^{k} w_{ij} \phi_j(x)
Two Classes

  g(x) = g_1(x) - g_2(x)
       = (w_1^T x + w_{10}) - (w_2^T x + w_{20})
       = (w_1 - w_2)^T x + (w_{10} - w_{20})
       = w^T x + w_0

Choose C_1 if g(x) > 0, and C_2 otherwise.
Geometry

(Figure: geometry of the linear discriminant g(x) = w^T x + w_0 and its decision boundary.)
Multiple Classes

  g_i(x | w_i, w_{i0}) = w_i^T x + w_{i0}

Choose C_i if g_i(x) = \max_{j=1,\ldots,K} g_j(x)

Classes are linearly separable.
Pairwise Separation

  g_{ij}(x | w_{ij}, w_{ij0}) = w_{ij}^T x + w_{ij0}

  g_{ij}(x) > 0 if x \in C_i
  g_{ij}(x) \le 0 if x \in C_j
  don't care otherwise

Choose C_i if g_{ij}(x) > 0 for all j \ne i.
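A minimal sketch (the pairwise parameters here are random placeholders) of the pairwise decision rule:

```python
import numpy as np

# Illustrative pairwise parameters: W_pair[i, j] plays the role of w_ij, b_pair[i, j] of w_ij0.
K, d = 3, 2
rng = np.random.default_rng(0)
W_pair = rng.standard_normal((K, K, d))
b_pair = rng.standard_normal((K, K))

def choose_class_pairwise(x):
    """Return i such that g_ij(x) > 0 for all j != i, or None if no class wins."""
    for i in range(K):
        if all(W_pair[i, j] @ x + b_pair[i, j] > 0 for j in range(K) if j != i):
            return i
    return None

print(choose_class_pairwise(np.array([0.5, -1.0])))
```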
From Discriminants to Posteriors

When p(x | C_i) ~ N(\mu_i, \Sigma):

  g_i(x | w_i, w_{i0}) = w_i^T x + w_{i0}
  w_i = \Sigma^{-1} \mu_i, \quad w_{i0} = -\frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P(C_i)

Let y = P(C_1 | x) and P(C_2 | x) = 1 - y. Choose C_1 (and C_2 otherwise) if

  y > 0.5, \quad \text{equivalently} \quad y / (1 - y) > 1, \quad \text{equivalently} \quad \log \frac{y}{1 - y} > 0

  \mathrm{logit}(P(C_1 | x)) = \log \frac{P(C_1 | x)}{1 - P(C_1 | x)} = \log \frac{P(C_1 | x)}{P(C_2 | x)}
    = \log \frac{p(x | C_1)}{p(x | C_2)} + \log \frac{P(C_1)}{P(C_2)}
    = \log \frac{(2\pi)^{-d/2} |\Sigma|^{-1/2} \exp[-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1}(x - \mu_1)]}{(2\pi)^{-d/2} |\Sigma|^{-1/2} \exp[-\frac{1}{2}(x - \mu_2)^T \Sigma^{-1}(x - \mu_2)]} + \log \frac{P(C_1)}{P(C_2)}
    = w^T x + w_0

  where w = \Sigma^{-1}(\mu_1 - \mu_2) and w_0 = -\frac{1}{2}(\mu_1 + \mu_2)^T \Sigma^{-1}(\mu_1 - \mu_2) + \log \frac{P(C_1)}{P(C_2)}
The inverse of logit

  \log \frac{P(C_1 | x)}{1 - P(C_1 | x)} = w^T x + w_0

  P(C_1 | x) = \mathrm{sigmoid}(w^T x + w_0) = \frac{1}{1 + \exp[-(w^T x + w_0)]}
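A short sketch of turning estimated class means, a shared covariance, and priors into w and w_0 and then reading off the posterior with the sigmoid; the synthetic data and the pooled-covariance estimator are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([2.0, 0.0], np.eye(2), size=100)  # samples from C1
X2 = rng.multivariate_normal([0.0, 1.0], np.eye(2), size=150)  # samples from C2

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
# Pooled (shared) covariance estimate across the two classes.
Sigma = (np.cov(X1.T) * (len(X1) - 1) + np.cov(X2.T) * (len(X2) - 1)) / (len(X1) + len(X2) - 2)
P1 = len(X1) / (len(X1) + len(X2))
P2 = 1.0 - P1

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
w0 = -0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2) + np.log(P1 / P2)

def posterior_C1(x):
    """P(C1 | x) = sigmoid(w^T x + w0)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

print(posterior_C1(np.array([1.5, 0.2])))
```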
Sigmoid (Logistic) Function

- Calculate g(x) = w^T x + w_0 and choose C_1 if g(x) > 0, or
- Calculate y = sigmoid(w^T x + w_0) and choose C_1 if y > 0.5
Gradient-Descent

- E(w | X) is the error with parameters w on sample X:

  w^* = \arg\min_w E(w | X)

- Gradient:

  \nabla_w E = \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_d} \right]^T

- Gradient-descent: starts from a random w and updates w iteratively in the negative direction of the gradient
Gradient-Descent

  \Delta w_i = -\eta \frac{\partial E}{\partial w_i}, \quad \forall i
  w_i \leftarrow w_i + \Delta w_i

(Figure: a single step of size \eta from w^t to w^{t+1} down the error surface, from E(w^t) to E(w^{t+1}).)
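A generic sketch of the update loop; the quadratic example error and the learning rate are illustrative:

```python
import numpy as np

def gradient_descent(grad_E, w_init, eta=0.1, n_iters=100):
    """Iteratively move w in the negative gradient direction."""
    w = np.array(w_init, dtype=float)
    for _ in range(n_iters):
        w += -eta * grad_E(w)          # delta_w = -eta * dE/dw
    return w

# Example: minimize E(w) = ||w - 3||^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w_init=[0.0, 0.0])
print(w_star)   # approaches [3, 3]
```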
Logistic Discrimination

Two classes: assume the log likelihood ratio is linear:

  \log \frac{p(x | C_1)}{p(x | C_2)} = w^T x + w_0^o

  \mathrm{logit}(P(C_1 | x)) = \log \frac{P(C_1 | x)}{1 - P(C_1 | x)} = \log \frac{p(x | C_1)}{p(x | C_2)} + \log \frac{P(C_1)}{P(C_2)} = w^T x + w_0

  where w_0 = w_0^o + \log \frac{P(C_1)}{P(C_2)}

  y = \hat{P}(C_1 | x) = \frac{1}{1 + \exp[-(w^T x + w_0)]}
Training: Two Classes

  X = \{x^t, r^t\}_t, \quad r^t | x^t \sim \mathrm{Bernoulli}(y^t)

  y = P(C_1 | x) = \frac{1}{1 + \exp[-(w^T x + w_0)]}

  l(w, w_0 | X) = \prod_t (y^t)^{r^t} (1 - y^t)^{1 - r^t}

  E = -\log l

  E(w, w_0 | X) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]
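A small sketch (the arrays are made up) of evaluating this cross-entropy error for given w and w_0:

```python
import numpy as np

def cross_entropy(w, w0, X, r):
    """E(w, w0 | X) = -sum_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ]."""
    y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))   # y^t = sigmoid(w^T x^t + w0)
    eps = 1e-12                               # guard against log(0)
    return -np.sum(r * np.log(y + eps) + (1 - r) * np.log(1 - y + eps))

X = np.array([[0.5, 1.0], [2.0, -1.0], [1.5, 0.5]])
r = np.array([1, 0, 1])
print(cross_entropy(np.array([0.3, -0.2]), 0.1, X, r))
```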
Training: Gradient-Descent

  E(w, w_0 | X) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]

  If y = \mathrm{sigmoid}(a), then \frac{dy}{da} = y(1 - y), so

  \Delta w_j = -\eta \frac{\partial E}{\partial w_j}
             = \eta \sum_t \left( \frac{r^t}{y^t} - \frac{1 - r^t}{1 - y^t} \right) y^t (1 - y^t)\, x_j^t
             = \eta \sum_t (r^t - y^t)\, x_j^t, \quad j = 1, \ldots, d

  \Delta w_0 = -\eta \frac{\partial E}{\partial w_0} = \eta \sum_t (r^t - y^t)
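A compact sketch of the resulting two-class training loop; the synthetic data, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def train_logistic(X, r, eta=0.01, n_iters=1000):
    """Gradient descent on the two-class cross-entropy error."""
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iters):
        y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))   # y^t for every training instance
        w = w + eta * X.T @ (r - y)               # delta_w_j = eta * sum_t (r^t - y^t) x_j^t
        w0 = w0 + eta * np.sum(r - y)             # delta_w_0 = eta * sum_t (r^t - y^t)
    return w, w0

# Toy data: two Gaussian clouds labelled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
r = np.hstack([np.zeros(50), np.ones(50)])
w, w0 = train_logistic(X, r)
```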
(Figures: the fitted discriminant after 10, 100, and 1000 iterations of gradient descent.)
K>2 Classes

  X = \{x^t, r^t\}_t, \quad r^t | x^t \sim \mathrm{Mult}_K(1, y^t)

  \log \frac{p(x | C_i)}{p(x | C_K)} = w_i^T x + w_{i0}^o

  y_i = \hat{P}(C_i | x) = \frac{\exp(w_i^T x + w_{i0})}{\sum_{j=1}^{K} \exp(w_j^T x + w_{j0})}, \quad i = 1, \ldots, K \quad \text{(softmax)}

  l(\{w_i, w_{i0}\}_i | X) = \prod_t \prod_i (y_i^t)^{r_i^t}

  E(\{w_i, w_{i0}\}_i | X) = -\sum_t \sum_i r_i^t \log y_i^t

  \Delta w_j = \eta \sum_t (r_j^t - y_j^t)\, x^t, \quad \Delta w_{j0} = \eta \sum_t (r_j^t - y_j^t)
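A sketch of the K-class training loop these updates imply; the one-hot label matrix, the numerical-stability shift, and the hyperparameters are illustrative choices:

```python
import numpy as np

def train_softmax(X, R, eta=0.01, n_iters=1000):
    """Gradient descent on the K-class cross-entropy; R is one-hot with shape (n, K)."""
    n, d = X.shape
    K = R.shape[1]
    W, w0 = np.zeros((K, d)), np.zeros(K)
    for _ in range(n_iters):
        A = X @ W.T + w0                       # linear scores w_i^T x^t + w_i0, shape (n, K)
        A -= A.max(axis=1, keepdims=True)      # shift for numerical stability
        Y = np.exp(A)
        Y /= Y.sum(axis=1, keepdims=True)      # y_i^t via softmax
        W += eta * (R - Y).T @ X               # delta_w_j  = eta * sum_t (r_j^t - y_j^t) x^t
        w0 += eta * (R - Y).sum(axis=0)        # delta_w_j0 = eta * sum_t (r_j^t - y_j^t)
    return W, w0
```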
Example

(Figure illustrating the K>2-class case.)
Generalizing the Linear Model

- Quadratic:

  \log \frac{p(x | C_i)}{p(x | C_K)} = x^T W_i x + w_i^T x + w_{i0}

- Sum of basis functions:

  \log \frac{p(x | C_i)}{p(x | C_K)} = w_i^T \phi(x) + w_{i0}

  where the \phi(x) are basis functions. Examples:
  - Hidden units in neural networks (Chapters 11 and 12)
  - Kernels in SVM (Chapter 13)
Discrimination by Regression

- Classes are NOT mutually exclusive and exhaustive

  r^t = y^t + \epsilon, \quad \epsilon \sim N(0, \sigma^2)

  y^t = \mathrm{sigmoid}(w^T x^t + w_0) = \frac{1}{1 + \exp[-(w^T x^t + w_0)]}

  l(w, w_0 | X) = \prod_t \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(r^t - y^t)^2}{2\sigma^2} \right]

  E(w, w_0 | X) = \frac{1}{2} \sum_t (r^t - y^t)^2

  \Delta w = \eta \sum_t (r^t - y^t)\, y^t (1 - y^t)\, x^t
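A sketch of the corresponding squared-error update through the sigmoid; the bias update is the analogous term (not given on the slide), and the hyperparameters are illustrative:

```python
import numpy as np

def train_by_regression(X, r, eta=0.01, n_iters=1000):
    """Gradient descent on the squared error with a sigmoid output."""
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iters):
        y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))
        delta = (r - y) * y * (1 - y)          # (r^t - y^t) y^t (1 - y^t)
        w += eta * X.T @ delta                 # delta_w = eta * sum_t (r^t - y^t) y^t (1 - y^t) x^t
        w0 += eta * np.sum(delta)              # analogous bias update (assumption)
    return w, w0
```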
Learning to Rank

- Ranking: a different problem than classification or regression
- Let us say x^u and x^v are two instances, e.g., two movies. We prefer u to v implies that g(x^u) > g(x^v), where g(x) is a score function, here linear: g(x) = w^T x
- Find a direction w such that we get the desired ranks when instances are projected along w
Ranking Error

- We prefer u to v implies that g(x^u) > g(x^v), so the error is g(x^v) - g(x^u) if g(x^u) < g(x^v)
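A minimal sketch (the data and preference pairs are made up) of accumulating this pairwise ranking error with a linear score g(x) = w^T x:

```python
import numpy as np

def ranking_error(w, X, preferred_pairs):
    """Sum of g(x_v) - g(x_u) over pairs (u, v) where u is preferred but scored lower."""
    g = X @ w
    err = 0.0
    for u, v in preferred_pairs:       # u is preferred to v
        if g[u] < g[v]:
            err += g[v] - g[u]
    return err

X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
pairs = [(0, 1), (1, 2)]               # prefer instance 0 to 1, and 1 to 2
print(ranking_error(np.array([0.2, 1.0]), X, pairs))   # nonzero: both pairs are misordered
```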