
COMP 4423: Image classification

Fundamentals
Easy Computer Vision
Xiaoyong Wei (魏驍勇)
[email protected]
New Toy
Outline

• Classification
• Supervised learning
• K nearest neighbors (k-NN)
• Bayesian classifiers
• Support vector machines (SVM)
• Rock-Paper-Scissors
How do you group them?
Feature Space

(Scatter plot of the examples, with Hair Color on the x-axis and Skin Color on the y-axis.)
Clustering is unsupervised learning, which means we (humans) don't have to tell the computers what each group looks like. It is data-driven, without using human knowledge (supervision).
Sounds good?

But …
Feature Space

(The same scatter plot, now with new examples marked "?".)

What if we have new, unseen examples?
Feature Space

(The same scatter plot, with "age" annotated next to the unknown examples.)

What if the features selected are not representative enough, or not consistent with our understanding (e.g., age)?
We can tell the computers about our understanding of the subjects by giving labels.
Training Examples (Seen)

Skin Color (S)   Hair Color (H)   Class Label (L)
Yellow (2)       White (1)        W
Black (3)        Yellow (2)       A
Black (3)        Yellow (2)       A
Yellow (2)       White (1)        W
Black (3)        Black (3)        B

Testing Examples (Unseen)

Skin Color (S)   Hair Color (H)   Class Label (L)
2.2              0.8              ?
3.2              1.9              ?
3.1              2.2              ?
2.4              1.3              ?
3.1              2.9              ?
Classification: to predict the labels of
the testing (unseen) examples based
on the knowledge learned from the
training (seen) examples

We are training the computers, and the process is called training.

The computers are learning. This is what the term "machine learning" refers to.
Our participation in learning makes it
supervised learning.

The result is the machine's understanding of the knowledge. We sometimes call it a model.
Classification – Supervised Learning

Training Set
Samples   Features    Labels
1         [0.1, …]    1
2         [0.3, …]    -1
…         …           …

Validation Set
Samples   Features    Labels
1         [0.5, …]    -1
2         [0.9, …]    1
…         …           …

        (Training: labelled examples → Model)

Testing Set
Samples   Features    Labels
1         [0.8, …]    ?
2         [0.7, …]    ?
…         …           …

Testing Set with predicted labels
Samples   Features    Labels
1         [0.8, …]    -1
2         [0.7, …]    1
…         …           …

        (Testing: Model → predicted labels)
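A minimal code sketch of this training / validation / testing split, assuming scikit-learn is available; the feature vectors and labels below are invented for illustration:

```python
# A minimal sketch of building training / validation / testing sets;
# assumes scikit-learn is installed, and the data values are invented.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([[0.1, 0.2], [0.3, 0.1], [0.5, 0.7], [0.9, 0.4],
              [0.8, 0.6], [0.7, 0.3], [0.2, 0.9], [0.4, 0.5]])
y = np.array([1, -1, -1, 1, -1, 1, 1, -1])

# First hold out a testing set, then carve a validation set out of the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # e.g. 4 2 2
```

The held-out testing set plays the role of the unseen examples: its labels are never shown to the model during training.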
Then how?
Instance-based Learning

PDA Cellphone Desktop PC Laptop

"It looks like the laptop I saw last time."

How about using image retrieval to find the most similar ones from the seen examples for reference?
kNN Classifiers
kNN Classifier
>NN: nearest neighbors
>k: number of nearest neighbors
>Idea
• When k = 1: assign the unseen example the label of its nearest neighbor
• "Near vermilion one turns red; near ink one turns black" (i.e., if you lie down with dogs, you will get up with fleas)
k = 1

(Feature-space plots, Hair color vs. Skin color: the unseen example and its nearest neighbor are highlighted; the unseen example takes that neighbor's label.)
kNN Classifier
>NN: nearest neighbors
>k: number of nearest neighbors
>Idea
• k = 1: assign the unseen example the label of its nearest neighbor
• k > 1: assign the dominating label among those of the k nearest neighbors (a code sketch follows the example below)
k = 3

(Feature-space plot: among the unseen example's 3 nearest neighbors, #A = 2 and #W = 1, so #A > #W and the example is labelled A.)
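A from-scratch sketch of the kNN voting rule described above (not the lecture's code; the toy points loosely follow the hair/skin-colour tables earlier):

```python
# A minimal kNN sketch: Euclidean distance plus a majority vote among the
# k nearest seen examples. The toy rows loosely follow the tables above.
import numpy as np
from collections import Counter

X_train = np.array([[2.0, 1.0], [3.0, 2.0], [3.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y_train = np.array(["W", "A", "A", "W", "B"])

def knn_predict(x, k=3):
    # Distances from the unseen example x to every seen (training) example.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k nearest neighbors.
    nearest = y_train[np.argsort(dists)[:k]]
    # k = 1 simply copies the single nearest label; k > 1 takes the
    # dominating (most common) label among the k neighbors.
    return Counter(nearest).most_common(1)[0][0]

print(knn_predict(np.array([2.2, 0.8]), k=1))  # -> "W"
print(knn_predict(np.array([3.1, 2.2]), k=3))  # -> "A" (two A's vs one B)
```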
It's straightforward. But so far we have picked the simplest case (classes are well separated) for illustration purposes.
In a more general sense, this is what we are going to have.

kNN may fail.

Can we teach the computer to draw a "circle" for each of the classes and evaluate the membership of an example by measuring how much it falls into each "circle"?
Bayesian Classifiers
Bayesian Classifiers

• Classes A and B as two sets

(Venn diagram: two overlapping circles A and B; the overlap is labelled "A and B".)
Bayesian Classifiers
• Classes A and B as two sets
• P(A|x): the probability that class A is observed when seeing an x
• P(B|x): the probability that class B is observed when seeing an x
Bayesian Classifiers
• Classes A and B as two sets
• P(A|x) ∝ P(x|A)P(A), since P(A|x) = P(x|A)P(A) / P(x)
• P(B|x) ∝ P(x|B)P(B), since P(B|x) = P(x|B)P(B) / P(x)

(Venn diagram of A and B, with an example x marked.)
https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c
Bayesian Classifiers
• Classes A and B as two sets
• P(A|x) ∝ P(x|A)P(A)
• P(B|x) ∝ P(x|B)P(B)


P(A)=#A/(#A+#B) P(B)=#B/(#A+#B)

The Prior Probability


Bayesian Classifiers
• Classes A and B as two sets
• P(A|x) ∝ P(x|A)P(A)
• P(B|x) ∝ P(x|B)P(B)


P(x|A) = #(x in A) / #A      P(x|B) = #(x in B) / #B

The Conditional Probability (Likelihood)


Bayesian Classifiers
• Classes A and B as two sets
• P(A|x) ∝ P(x|A)P(A)
• P(B|x) ∝ P(x|B)P(B)

Decision function: x is A if P(A|x) > P(B|x), and B otherwise.
Naive Bayesian
• Advantages:
• Fast
• Extendable to multi-class problems
• Requires less training examples
• Works well for categorical data
• Disadvantages:
• Features are assumed to be independent of each other
(which is rarely true in real-world applications)
• Zero-frequency problem
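A counting-based naive Bayes sketch built directly from the formulas above (my own illustration, not the lecture's code; add-one Laplace smoothing is included to sidestep the zero-frequency problem):

```python
# A minimal naive Bayes sketch over categorical features, following the
# counting formulas above; add-one (Laplace) smoothing avoids zero frequencies.
from collections import Counter, defaultdict

# Toy training data: (skin colour, hair colour) -> class label, as in the tables above.
X_train = [("yellow", "white"), ("black", "yellow"), ("black", "yellow"),
           ("yellow", "white"), ("black", "black")]
y_train = ["W", "A", "A", "W", "B"]

classes = set(y_train)
prior = {c: y_train.count(c) / len(y_train) for c in classes}   # P(c) = #c / N

# Per-class, per-feature value counts: counts[c][feature_index][value]
counts = {c: defaultdict(Counter) for c in classes}
for x, c in zip(X_train, y_train):
    for i, v in enumerate(x):
        counts[c][i][v] += 1

def posterior_score(x, c, alpha=1.0):
    # P(c) * prod_i P(x_i | c), with add-one smoothing on the likelihoods.
    score = prior[c]
    n_c = y_train.count(c)
    for i, v in enumerate(x):
        vocab = {val for cl in classes for val in counts[cl][i]}
        score *= (counts[c][i][v] + alpha) / (n_c + alpha * len(vocab))
    return score

x_new = ("black", "yellow")
print(max(classes, key=lambda c: posterior_score(x_new, c)))  # -> "A"
```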
Support Vector Machines
Linear Separators
>Binary classification can be viewed as the task of
separating classes in feature space:
y = wx + b   ⇔   wx − y + b = 0   ⇔   [w  −1]T [x  y] + b = 0

i.e., in vector form: wTx + b = 0
Linear Separators
>Binary classification can be viewed as the task of
separating classes in feature space:

wTx + b = 0
wTx + b > 0
wTx + b < 0

f(x) = sign(wTx + b)
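A tiny numeric sketch of this decision rule (the weights w, bias b, and points are arbitrary values chosen for illustration):

```python
# A linear separator's decision function: f(x) = sign(w·x + b).
# The weights, bias, and points below are arbitrary illustrative values.
import numpy as np

w = np.array([1.0, -2.0])
b = 0.5

def f(x):
    return np.sign(w @ x + b)   # +1 on one side of the hyperplane, -1 on the other

print(f(np.array([3.0, 1.0])))   # w·x + b = 3 - 2 + 0.5 = 1.5 -> +1
print(f(np.array([0.0, 2.0])))   # w·x + b = 0 - 4 + 0.5 = -3.5 -> -1
```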
Linear Separators
>Binary classification can be viewed as the task of
separating classes in feature space:

>Which one is the best?


Margin
> Distance from example xi to the separator is r = (wTxi + b) / ‖w‖
> Examples closest to the hyperplane are support vectors.
> Margin ρ of the separator is the distance between support vectors.

(Figure: the separating hyperplane with margin ρ and distance r marked.)
Maximum Margin Classification
> Maximizing the margin is good according to intuition and
PAC theory (Probably Approximately Correct).
> Implies that only support vectors matter; other training
examples are ignorable.
Soft Margin Classification
>What if the training set is not linearly separable?
>Slack variables ξi can be added to allow misclassification of difficult or noisy examples; the resulting margin is then called a soft margin (a brief code sketch of the slack penalty follows).

(Figure: two misclassified examples, each with its slack variable ξi.)
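A small sketch of the slack penalty in practice, assuming scikit-learn's SVC (its parameter C weighs the slack terms; the noisy toy data is generated here purely for illustration):

```python
# Soft-margin sketch: in scikit-learn's SVC, the parameter C penalises the slack
# variables. A small C tolerates more violations (softer margin, typically more
# support vectors); a large C tries hard to classify every training point.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.0, size=(50, 2)),   # class -1, noisy
               rng.normal(+1, 1.0, size=(50, 2))])  # class +1, noisy
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_)} support vectors")
```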
Linear SVMs: Overview
> The classifier is a separating hyperplane
> Most “important” training points are support vectors; they
define the hyperplane.
> Quadratic optimization algorithms can identify which training
points xi are support vectors with non-zero Lagrangian
multipliers αi.
> Both in the dual formulation of the problem and in the
solution training points appear only inside inner products:
(Read the math on your own)
Find α1…αN such that
Q(α) = Σαi − ½ ΣΣ αiαjyiyjxiTxj is maximized, subject to
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
The resulting classifier is f(x) = Σ αiyixiTx + b
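A sketch of these ideas with scikit-learn's SVC, whose fitted attributes expose the support vectors and the products αiyi; the toy points are invented for illustration:

```python
# After fitting a linear SVC, only the support vectors carry non-zero Lagrange
# multipliers αi; scikit-learn exposes αi*yi as dual_coef_ and the corresponding
# points as support_vectors_. Toy data invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],    # class -1
              [4.0, 4.0], [4.5, 3.0], [5.0, 4.5]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors:\n", clf.support_vectors_)  # the points that define the hyperplane
print("alpha_i * y_i:", clf.dual_coef_)            # non-zero only for support vectors
print("w:", clf.coef_, "b:", clf.intercept_)       # the separating hyperplane w, b

# Prediction uses only inner products with the support vectors:
x_new = np.array([[3.0, 2.0]])
print("prediction:", clf.predict(x_new))
```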
Non-linear SVMs
> Datasets that are linearly separable with some noise work
out great:

(1-D example: points on a line (the x-axis), almost linearly separable apart from some noise.)

> How about… mapping data to a higher-dimensional space?

In a higher-dimensional space there is a better chance of finding a line (hyperplane) that separates the classes.

(1-D example mapped to 2-D, e.g. x → (x, x²), where the classes become linearly separable.)
Non-linear SVMs: Feature spaces
>General idea: the original feature space can
always be mapped to some higher-dimensional
feature space where the training set is separable:

Φ: x → φ(x)
The “Kernel Trick”
> The linear classifier relies on inner product between vectors K(xi,xj)=xiTxj
> If every datapoint is mapped into high-dimensional space via some
transformation Φ: x → φ(x), the inner product becomes:
K(xi,xj)= φ(xi) Tφ(xj)
> A kernel function is a function that is equivalent to an inner product in
some feature space.
> Example:
2-dimensional vectors x = [x1 x2]; let K(xi,xj) = (1 + xiTxj)2.
Need to show that K(xi,xj) = φ(xi)Tφ(xj):
K(xi,xj) = (1 + xiTxj)2 = 1 + xi12xj12 + 2xi1xj1xi2xj2 + xi22xj22 + 2xi1xj1 + 2xi2xj2
= [1  xi12  √2 xi1xi2  xi22  √2 xi1  √2 xi2]T [1  xj12  √2 xj1xj2  xj22  √2 xj1  √2 xj2]
= φ(xi)Tφ(xj), where φ(x) = [1  x12  √2 x1x2  x22  √2 x1  √2 x2]
> Thus, a kernel function implicitly maps data to a high-dimensional space
(without the need to compute each φ(x) explicitly).
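A quick numeric check of this identity (my own verification code, not part of the lecture):

```python
# Numeric check that the polynomial kernel (1 + xi·xj)^2 equals the inner
# product of the explicit 6-dimensional mapping phi(x) given above.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([0.3, -1.2])
xj = np.array([2.0, 0.5])

kernel = (1 + xi @ xj) ** 2          # computed in the original 2-D space
explicit = phi(xi) @ phi(xj)         # computed in the mapped 6-D space

print(kernel, explicit)              # the two values agree (up to float rounding)
assert np.isclose(kernel, explicit)
```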
Examples of Kernel Functions
> Linear: K(xi,xj)= xiTxj
• Mapping Φ: x → φ(x), where φ(x) is x itself
> Polynomial of power p: K(xi,xj) = (1 + xiTxj)p
• Mapping Φ: x → φ(x), where φ(x) has C(d + p, p) dimensions
> Gaussian (radial-basis function): K(xi,xj) = exp(−‖xi − xj‖² / (2σ²))
• Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional:
every point is mapped to a function (a Gaussian);
combination of functions for support vectors is the
separator.
> The higher-dimensional space still has intrinsic dimensionality d
(the mapping is not onto), but linear separators in it correspond
to non-linear separators in the original space.
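A sketch comparing these kernels on data that no straight line can separate in the original space, assuming scikit-learn (make_circles generates a toy ring-shaped dataset):

```python
# Comparing kernels on a toy dataset (two concentric rings) that is not
# linearly separable in the original 2-D space. Assumes scikit-learn.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    # degree/coef0 only affect the polynomial kernel; (1 + gamma*x·x')^2
    # contains the squared terms needed to separate the rings.
    clf = SVC(kernel=kernel, degree=2, coef0=1, gamma="scale").fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))   # poly/rbf should beat linear here
```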
Classification – Supervised Learning (recap of the training / validation / testing workflow shown earlier)
The New Toy
New Toy
Feature Extraction
Training & Testing
Thank you!
