Lecture7C Classification

The document provides an overview of binary classification, focusing on algorithms such as Support Vector Machines (SVM), K Nearest Neighbor (KNN), and Artificial Neural Networks (ANN). It discusses evaluation metrics for binary classifiers, the concept of hyperplanes, and the optimization techniques used in SVM to maximize the margin while minimizing misclassification. It also introduces the soft-margin SVM and the kernel trick for handling non-linearly separable data.

Classification

§ Outline:
1. Introduction
2. K Nearest Neighbor (KNN)
3. Artificial Neural Network (ANN)
4. Support Vector Machine (SVM)
Binary classifier

§ A supervised learning algorithm
§ Categorizes new observations (data points) into one of two predefined classes

§ Applications:
Common binary classification algorithms

§ Support Vector Machines

§ Naïve Bayes

§ K Nearest Neighbor

§ Decision Trees

§ Logistic Regression

§ Artificial Neural Networks


Binary classification evaluation

§ TP, TN, FP, FN
§ Accuracy
§ Recall
§ Precision
§ F1-score
§ ROC
§ AUC
§ …
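A minimal sketch of these metrics with scikit-learn, assuming small hypothetical toy arrays of true labels, predicted labels, and scores:

# A minimal sketch of binary classification metrics, assuming scikit-learn
# and small hypothetical toy label arrays.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])    # ground-truth labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])    # predicted labels
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3])  # scores for ROC/AUC

# Confusion matrix layout for binary labels: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP+TN) / all
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP+FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP+FN)
print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean of P and R
print("AUC      :", roc_auc_score(y_true, y_score))   # area under the ROC curve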
Example of binary classification

§ Red fruit vs. green fruit: 1D data
(figure credit: StatQuest with Josh Starmer)

Example of binary classification (cont)

§ A threshold separates red fruit from green fruit on the 1D axis
§ A threshold placed right next to one cluster is not good: new data near the boundary gets misclassified


How to find a better threshold?

§ Focus on the edges of each class/cluster
§ Use the midpoint between the two edges as the threshold
§ The distance from the threshold to each edge is the margin

This is the Maximal Margin Classifier!
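A minimal sketch of this idea on 1D data, assuming two hypothetical NumPy arrays of fruit positions:

# A minimal sketch of a 1D maximal margin classifier: the threshold is the
# midpoint between the closest edges of the two clusters. Toy data assumed.
import numpy as np

red = np.array([1.0, 1.5, 2.0, 2.8])    # red fruit positions (left cluster)
green = np.array([6.2, 7.0, 7.5, 8.0])  # green fruit positions (right cluster)

edge_red, edge_green = red.max(), green.min()  # edges of each cluster
threshold = (edge_red + edge_green) / 2        # midpoint = maximal margin threshold
margin = (edge_green - edge_red) / 2           # distance from threshold to each edge
print(f"threshold = {threshold}, margin = {margin}")

new_fruit = 5.9
print("green" if new_fruit > threshold else "red")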
Is maximal margin classifier good?

§ With an outlier in the training data:
→ the maximal margin classifier ends up super close to the green cluster
→ the maximal margin classifier is super sensitive to outliers in the training data
Misclassification

§ Choose a threshold that allows misclassification: ignore the outlier!
§ The resulting margin, which allows misclassification, is a soft margin
→ This is a Soft Margin Classifier
How to choose a good soft margin?

§ Choosing this one? Or choosing this one?
§ Use cross validation → count how many misclassification errors and how many observations fall within each candidate soft margin

Soft margin classifier = support vector classifier

§ A support vector classifier can handle outliers


Support vector classifier

§ 1D data: the classifier is a single point (threshold); the observations on the edges of the margin are the support vectors
§ 2D data: the classifier is a line
§ 3D data: the classifier is a plane
Support vector classifier

§ n-D data (n ≥ 4): the support vector classifier is a hyperplane
§ The term "hyperplane" is usually used when we can't draw the boundary
§ A support vector classifier allows misclassification → it can handle overlapping classes
Overlapping classification

§ In case of lots of overlap (e.g., cured and uncured patients interleaved along the measurement axis), support vector classifiers don't perform well → solution???


Solution

Support Vector Machines (SVM)
SVM: a visual explanation

How to split the data in the best possible way?
(figure credit: Alice Zhao)
SVM: a visual explanation (cont)

§ Not a good split
§ The best split: the one with the largest margin

Margin

§ Margin = the distance between the hyperplane and the closest data point from either class
Margin (cont)
What is a hyperplane?

§ In an n-D space, a hyperplane is a flat affine (n−1)-D subspace
§ In an SVM, the hyperplane is the decision boundary that separates the two classes:
  w^T x + b = 0   (w: vector, b: scalar)
§ The optimal hyperplane is a hyperplane that:
Ø Separates the classes as well as possible
Ø Maximizes the margin
Ø Minimizes the misclassification


Example of hyperplane

(figure: a separating hyperplane with the support vectors highlighted)
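A minimal sketch, assuming NumPy and hypothetical values for w and b, of how a hyperplane w^T x + b = 0 acts as a decision boundary: the sign of w^T x + b gives the side of the hyperplane, and |w^T x + b| / ||w|| gives the perpendicular distance to it.

# A minimal sketch of classifying points with a hyperplane w^T x + b = 0.
# The values of w and b here are hypothetical, not learned from data.
import numpy as np

w = np.array([2.0, -1.0])   # normal vector of the hyperplane
b = -3.0                    # offset scalar

X = np.array([[3.0, 1.0],   # some query points
              [0.0, 0.0],
              [1.0, 2.0]])

scores = X @ w + b                              # f(x) = w^T x + b
labels = np.sign(scores)                        # +1 / -1 = side of the hyperplane
distances = np.abs(scores) / np.linalg.norm(w)  # perpendicular distance
for x, s, d in zip(X, labels, distances):
    print(f"x={x}, class={int(s):+d}, distance={d:.3f}")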
How to find the optimal hyperplane?

§ Finding the best hyperplane = maximizing the margin
→ a constrained optimization problem
→ solved using the Lagrange multipliers technique
How to find the optimal hyperplane? (cont)

§ The hyperplane equation: f(x) = w^T x + b = 0
§ SVM aims to maximize the margin 2 / ||w||
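A minimal sketch, assuming scikit-learn and toy data, that fits a linear SVM and reads the margin 2 / ||w|| back from the learned weights (SVC exposes coef_ for a linear kernel):

# A minimal sketch: fit a (nearly) hard-margin linear SVM on toy data and
# compute the margin width 2 / ||w|| from the learned weight vector.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0],   # class -1
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]        # weight vector of the separating hyperplane
b = clf.intercept_[0]   # bias term
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)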
How to find the optimal hyperplane? (cont)

§ Given training data {(x_i, y_i)}_{i=1}^{N} (linearly separable),
  x_i = n-D feature vector, y_i ∈ {−1, +1} = label
§ The SVM optimization is:
  min_{w,b} (1/2) ||w||^2
  subject to: y_i (w^T x_i + b) ≥ 1, i = 1, …, N
Quadratic programming solving

§ Given training data {(x_i, y_i)}_{i=1}^{N},
  x_i = n-D feature vector, y_i ∈ {−1, +1} = label
§ The SVM optimization is:
  min_{w,b} (1/2) ||w||^2
  subject to: y_i (w^T x_i + b) ≥ 1, i = 1, …, N
§ The objective function is quadratic and convex; the constraints are linear
§ This is a convex quadratic programming problem
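Since the problem is a convex QP, an off-the-shelf solver can handle it directly. A minimal sketch using the cvxopt package on toy data: via Lagrange multipliers the hard-margin problem becomes the standard dual in α, from which w and b are recovered from the support vectors (the dual form itself is standard material, assumed here rather than derived in the slides).

# A minimal sketch of solving the hard-margin SVM dual as a convex QP
# with cvxopt. Toy linearly separable 2D data assumed.
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 8.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
n = len(y)

# Dual problem: min (1/2) a^T P a - 1^T a  s.t.  a_i >= 0,  y^T a = 0,
# where P_ij = y_i y_j (x_i . x_j)
K = X @ X.T                     # linear-kernel Gram matrix
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))          # encodes a_i >= 0 as -a_i <= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

sv = alpha > 1e-6                              # support vectors have alpha > 0
w = ((alpha * y)[:, None] * X).sum(axis=0)     # w = sum_i alpha_i y_i x_i
b_hat = np.mean(y[sv] - X[sv] @ w)             # b from y_i (w.x_i + b) = 1
print("w =", w, "b =", b_hat, "margin =", 2 / np.linalg.norm(w))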


How to make SVM more powerful?

§ When data is not linearly separable and/or has some outliers:
Ø Soft-margin SVM
Ø Kernel trick
Ø Mixed (both combined)
Soft-margin SVM

§ We allow some misclassifications by introducing slack variables ξ_i ≥ 0 and a parameter C:
  min_{w,b,ξ} (1/2) ||w||^2 + C Σ_{i=1}^{N} ξ_i
  subject to: y_i (w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, N
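A minimal sketch, assuming scikit-learn: SVC's C parameter plays the role of the C above, and the toy data below includes one deliberately overlapping point that the slack variables absorb (this also previews the effect of C discussed next):

# A minimal sketch of a soft-margin linear SVM: C trades margin width
# against slack. Toy data with one overlapping point assumed.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 1.0], [6.5, 5.5],   # last -1 point overlaps class +1
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:6}: margin = {2 / np.linalg.norm(w):.3f}, "
          f"train accuracy = {clf.score(X, y):.2f}")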
The role of C parameter

§ C controls the trade-off between:

Ø Maximizing the margin

Ø Minimizing the classification errors

§ Effects of C:

Ø Large C: small margin, few misclassification errors

Ø Small C: large margin, more tolerance to errors

§ How to choose a good value of C?

Ø Cross validation
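A minimal sketch, assuming scikit-learn, of choosing C by cross validation with GridSearchCV (the dataset and the candidate grid of C values are arbitrary choices):

# A minimal sketch of picking C by cross validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                    cv=5)               # 5-fold cross validation
grid.fit(X, y)
print("best C:", grid.best_params_["C"],
      " cv accuracy:", round(grid.best_score_, 3))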
Kernel trick

§ When data is not linearly separable:
Ø Map the data to a higher-dimensional feature space where it becomes separable
§ Effect on the decision boundary: a linear hyperplane in the feature space corresponds to a non-linear decision boundary in the original space
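A minimal sketch, assuming scikit-learn, of the kernel trick on data a linear SVM cannot separate (concentric circles); the RBF kernel implicitly performs the higher-dimensional mapping:

# A minimal sketch of the kernel trick: concentric circles are not linearly
# separable, but an RBF-kernel SVM separates them easily.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)       # implicit high-dimensional mapping
print("linear kernel accuracy:", linear.score(X, y))  # poor: data not separable
print("RBF kernel accuracy   :", rbf.score(X, y))     # near 1.0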
Summary

Data Type                       | Use                          | Notes
--------------------------------|------------------------------|-------------------------------------
Linearly separable + noise-free | Linear SVM, hard margin      | Perfect separation, no noise
Nearly separable + noise        | Linear SVM, soft margin (C)  | C controls error tolerance
Non-linearly separable          | Kernel SVM + soft margin (C) | Use RBF/poly kernel for flexibility