L7 Lecture: Image Classification with DNNs (v4)

COMP 4423: Machine Learning and Deep Learning


Easy Computer Vision
Xiaoyong Wei (魏驍勇)
[email protected]
Outline

• Traditional machine learning vs. deep learning


• Gradient descent
• Neural networks
• Deep neural networks
• Convolutional neural networks (CNN)
• Layers, pooling, and activation
• AlexNet, VGG, and ResNet
Traditional classification methods
work well for simple tasks. Models
are usually built in a controlled
environment (e.g., a lab setting) to
eliminate variations in illumination,
viewpoint, scale, and so on.
Popular traditional datasets

Olivetti Face Dataset, AT&T


Popular traditional datasets

MNIST Handwritten Digits


Popular traditional datasets

Palmprint Acquisition and Datasets


However, in real applications, those
variations are inevitable.
Viewpoints

Fei-Fei Li, Ranjay Krishna, Danfei Xu, Image Classification: A Core Task in Computer Vision
Illumination

Fei-Fei Li, Ranjay Krishna, Danfei Xu, Image Classification: A Core Task in Computer Vision
Occlusions

Fei-Fei Li, Ranjay Krishna, Danfei Xu, Image Classification: A Core Task in Computer Vision
Background Clutter

Fei-Fei Li, Ranjay Krishna, Danfei Xu, Image Classification: A Core Task in Computer Vision
Intra-class Variations

Fei-Fei Li, Ranjay Krishna, Danfei Xu, Image Classification: A Core Task in Computer Vision
Hand Gesture Recognition
ImageNet
ImageNet: 12 subtrees with 5247 synsets and 3.2 million images in total

J. Deng, W. Dong, R. Socher, L. Li, Kai Li and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE
Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255, doi: 10.1109/CVPR.2009.5206848.
Deep Learning is a popular solution
to address these challenges.

(This is what you’re waiting for. LOL!)


Let's start by reviewing how a
decision boundary is learned,
through an example: classifying
red and green apples.
List of apples

No.  x    y     others  Color (z)
1    9    53    …       Red (1)
2    25   45    …       Green (-1)
3    225  56.7  …       Red (1)
4    576  52.9  …       Green (-1)
5    676  60.2  …       Red (1)
6    900  55.7  …       Green (-1)
7    …    …     …       …
Apple Space

[Scatter plot: red and green apples plotted by features x and y]
Apple Space

Finding the best line dividing the two groups of apples means finding the best parameters a and b.

The line: y = a*x + b

The model:
z = a*x - y + b
Outputs 1 if z > 0
Outputs -1 if z <= 0

[Scatter plot: a candidate line y = a*x + b separating the apples]
Initialization

Without knowing which line is the best at the beginning, we can pick a random one by setting a and b to random numbers a' and b'.

The model:
z0 = a'*x - y + b'
How can we evaluate how good the
model (a’ and b’) is?

Intuitively, we can compare the
prediction z' to the ground-truth label z
using (z' - z)^2. By applying this to all N samples,
we have a loss function

$L(a', b') = \frac{1}{N}\sum_{i=1}^{N}(z'_i - z_i)^2$
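As a concrete sketch, this loss can be computed in Python/NumPy as below. The names (`loss`, `a_p`, `b_p`) are illustrative, not from the slides, and the raw score z' = a'*x - y + b' is compared to the label before thresholding so that the loss is differentiable (an assumption made here).

```python
import numpy as np

def loss(a_p, b_p, x, y, z_true):
    """Squared-error loss L(a', b'), averaged over all N samples.

    x, y, z_true: NumPy arrays holding the N apples' features and
    their +1 / -1 labels. The raw score is used instead of the
    thresholded output so gradients can flow.
    """
    z_pred = a_p * x - y + b_p   # z' = a'*x - y + b'
    return np.mean((z_pred - z_true) ** 2)
```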
With the "goodness" evaluated, we
can update a' and b' by replacing
them with better ones.

This updating process is what we call learning.

But how?
The best parameters are the ones
that minimize the loss function L. The
optimal parameters can thus be
found where the gradients of L are
zero:

$\frac{\partial L}{\partial a} = 0, \quad \frac{\partial L}{\partial b} = 0$
We can update a' and b' by pushing
the gradients towards zero!

$a' \leftarrow a' - \frac{\partial L}{\partial a'}$

$b' \leftarrow b' - \frac{\partial L}{\partial b'}$

Gradient Descent
A learning rate δ controls the step size:

$a' \leftarrow a' - \delta \frac{\partial L}{\partial a'}$

$b' \leftarrow b' - \delta \frac{\partial L}{\partial b'}$

δ: Learning Rate

Gradient Descent


[Plot: the loss surface L over the parameters a' and b'; each gradient descent update steps downhill towards the minimum]
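Putting the pieces together, a minimal gradient descent loop for this model might look as follows. It is a sketch under the simplification above (loss on the raw score); the function name, learning rate, and step count are illustrative, and with unscaled features like x up to 900, the learning rate must be tiny (or the features normalized first).

```python
import numpy as np

def train(x, y, z_true, lr=1e-6, steps=10000):
    """Fit a' and b' by gradient descent on the squared-error loss."""
    rng = np.random.default_rng(0)
    a_p, b_p = rng.normal(), rng.normal()     # random initialization a', b'
    n = len(x)
    for _ in range(steps):
        z_pred = a_p * x - y + b_p            # raw scores z'
        err = z_pred - z_true
        grad_a = (2.0 / n) * np.sum(err * x)  # dL/da'
        grad_b = (2.0 / n) * np.sum(err)      # dL/db'
        a_p -= lr * grad_a                    # push gradients towards zero
        b_p -= lr * grad_b
    return a_p, b_p
```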
Apple Space

[Scatter plot: the learned line now separates the red and green apples]
Machine learning is a process of
finding the best set of parameters
to fit a model/hypothesis.
The learning is usually conducted by
updating the initial parameters, with a
learning rate, towards the optimum of
a loss function. Gradient Descent is
one of the most popular updating
strategies.
Let’s implement the learning using
neural networks.
Neural Network Version of the Model

[Diagram: a single neuron with inputs x (weight a), y (weight -1), and constant 1 (weight b), computing z = a*x - y + b; the output is 1 if z > 0 and -1 if z <= 0]
Neural Network Version of the Model

[Diagram: inputs x and y feed three hidden neurons z1, z2, z3, whose outputs combine into d; the output is 1 if d > 0 and -1 if d <= 0]

[Plot: the new decision boundary formed by combining the three neurons]
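As a sketch, the forward pass of this two-layer version can be written in NumPy as below; the weight shapes follow the diagram (2 inputs, 3 hidden neurons, 1 output), while the names and the sign activation are illustrative assumptions.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Two-layer network: 2 inputs -> 3 hidden neurons (z1..z3) -> d.

    X: (N, 2) array of [x, y] pairs.
    W1: (2, 3), b1: (3,), W2: (3, 1), b2: (1,).
    """
    Z = np.sign(X @ W1 + b1)   # each hidden neuron draws one line
    d = Z @ W2 + b2            # combine the three decisions into d
    return np.where(d.ravel() > 0, 1, -1)
```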
How is the learning conducted with
more layers and weights?

Gradient Descent on Neural Networks

[Diagram: the output d is compared with the label by a loss "judge"; the loss signal is propagated back through z1, z2, z3 to update all weights]
The Chain Rule for Backpropagation
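As a generic illustration (the notation is an assumption, not taken from the slides): for a weight w on the path w → z → d → L, the chain rule factors the gradient through each layer:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial d} \cdot \frac{\partial d}{\partial z} \cdot \frac{\partial z}{\partial w}$$

Backpropagation evaluates these factors from the loss backwards, layer by layer.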


By stacking more layers you have
Deep Neural Networks
By employing more loss judges you
have Deep Learning
Now, we're ready for a few
more concepts (tricks).
Layers

Input layer: receives data from external sources (data files, images, sensors, etc.)
Hidden layers: process the data
Output layer: produces the network's predictions

https://ipullrank.com/resources/guides-ebooks/machine-learning-guide/chapter-1
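As an illustration of this three-part structure, a minimal PyTorch sketch is given below; the layer sizes (2 inputs, 8 hidden units, 1 output) are arbitrary assumptions, not from the lecture.

```python
import torch.nn as nn

# The input layer is implicit in the first Linear's in_features.
model = nn.Sequential(
    nn.Linear(2, 8),   # hidden layer: processes the data
    nn.ReLU(),
    nn.Linear(8, 1),   # output layer: produces the prediction
)
```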
Convolutional Layers

Instead of using fully connected layers, we can add a few more "partially" connected layers.
Recall the Filters and Convolutions

Sobel Filter
Implement with Neural Networks

Image (5x5):        Convolutional Filter (3x3):
1 1 1 0 0           1 0 1
0 1 1 1 0           0 1 0
0 0 1 1 1           1 0 1
0 0 1 1 0
0 1 1 0 0

Convolved Feature: the first value (top-left 3x3 window) is 4.

Input Layer → Convolutional Layer
Implement with Neural Networks

Neurons are not fully connected, which results in Local Receptive Fields: each output neuron sees only a 3x3 patch of the input.
Implement with Neural Networks

Weights of the filters can be learned by Backpropagation.
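A minimal NumPy sketch of this 2D convolution (no padding, stride 1) is shown below, reproducing the slide's example; `conv2d` is an illustrative helper, not a library function.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Local receptive field: a kh x kw patch of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d(image, kernel))  # top-left value is 4, as on the slide
```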
Pooling

Input (3x3):    Max Pooling (2x2 window, stride 1):
0 1 2           7   6
7 6 5           9   10
8 9 10
                Mean Pooling (2x2 window, stride 1):
                3.5 3.5
                7.5 7.5
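Both operations can be sketched in NumPy as below, reproducing the 3x3 example above (the helper name `pool2d` is illustrative):

```python
import numpy as np

def pool2d(x, size=2, op=np.max):
    """Pooling with a size x size window and stride 1."""
    oh, ow = x.shape[0] - size + 1, x.shape[1] - size + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = op(x[i:i + size, j:j + size])
    return out

x = np.array([[0, 1, 2],
              [7, 6, 5],
              [8, 9, 10]])
print(pool2d(x, op=np.max))   # [[ 7.  6.] [ 9. 10.]]
print(pool2d(x, op=np.mean))  # [[3.5 3.5] [7.5 7.5]]
```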
Activation

$ReLU(x) = \max(0, x)$

[Plot: the ReLU activation function]
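In NumPy, ReLU is a one-liner:

```python
import numpy as np

def relu(x):
    """ReLU activation: element-wise max(0, x)."""
    return np.maximum(0, x)
```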
Convolutional Networks

https://discuss.boardinfinity.com/t/what-do-you-mean-by-convolutional-neural-network/8533
AlexNet by Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton
VGG16 by Karen Simonyan, Andrew Zisserman @ Oxford
ResNet by K. He et al.

ResNet @ ILSVRC & COCO 2015 Competitions


1st places in all five main tracks
ImageNet Classification: “Ultra-deep” 152-layer nets
ImageNet Detection: 16% better than 2nd
ImageNet Localization: 27% better than 2nd
COCO Detection: 11% better than 2nd
COCO Segmentation: 12% better than 2nd
Vanishing Gradients and Residual Learning

$H(x) = F(x) + x$

$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial H(x)} \cdot \frac{\partial H(x)}{\partial x}$

$\frac{\partial H(x)}{\partial x} = \frac{\partial (F(x) + x)}{\partial x} = \frac{\partial F(x)}{\partial x} + 1$

The "+1" from the identity shortcut keeps the gradient from vanishing even when ∂F(x)/∂x is small.

He K., Zhang X., Ren S., et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
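As an illustration of the identity shortcut, here is a minimal PyTorch-style residual block sketch; it is simplified (same channel count, no downsampling), and the layer choices are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x with a two-conv F, after He et al. (simplified)."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)  # "+ x" is the identity shortcut
```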
Thank You!