Convolutional Neural Networks For Visual Recognition

The presentation discusses the history and success of convolutional neural networks (CNNs) in computer vision tasks. CNNs took the top places in ImageNet classification (ILSVRC 2012 and 2013) by learning hierarchical image features automatically from large datasets.

CNNs achieved breakthrough results in the ImageNet visual recognition challenge in 2012: the winning deep CNN reached a top-5 error of 0.153, against 0.262 for the best entry based on hand-crafted features. Their success was due to having many layers that learned increasingly complex features, from edges to shapes to objects.

CNNs have been applied to tasks beyond classification like detection, scene parsing, indoor semantic labeling, and action detection by adapting the network architecture. They have proven effective for a wide range of computer vision problems.

Introduction: Convolutional Neural Networks for Visual Recognition

boris [email protected]

Acknowledgments

This presentation is heavily based on:

– http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php
– http://deeplearning.net/reading-list/tutorials/
– http://deeplearning.net/tutorial/lenet.html
– http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial

… and many others

Agenda

1. Course overview
2. Introduction to Deep Learning
– Classical Computer Vision vs. Deep Learning
3. Introduction to Convolutional Networks
– Basic CNN Architecture
– Large-Scale Image Classification
– How deep should ConvNets be?
– Detection and Other Visual Apps

Course overview

1. Introduction
– Intro to Deep Learning
– Caffe: Getting started
– CNN: network topology, layer definitions
2. CNN Training
– Backward propagation
– Optimization for Deep Learning: SGD: momentum, rate adaptation, Adagrad, SGD with Line Search, CGD
– “Regularization” (Dropout, Maxout)
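
As a rough illustration of two of the optimizers named in the training topics, the sketch below applies SGD with momentum and Adagrad to a toy one-dimensional loss. The loss f(w) = (w - 3)^2, the function names, and all hyperparameter values are assumptions made for this example only; they do not come from the course material.

```python
import math


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2 (illustrative assumption).
    return 2.0 * (w - 3.0)


def sgd_momentum(w, lr=0.1, momentum=0.9, steps=200):
    """Gradient descent with classical momentum on the toy loss."""
    velocity = 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradients.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


def adagrad(w, lr=0.5, eps=1e-8, steps=200):
    """Adagrad: the step is scaled by the root of the summed squared gradients."""
    accum = 0.0
    for _ in range(steps):
        g = grad(w)
        accum += g * g
        w -= lr * g / (math.sqrt(accum) + eps)
    return w


w_momentum = sgd_momentum(0.0)
w_adagrad = adagrad(0.0)
```

Both runs start at w = 0 and should approach the minimum at w = 3. In real training the gradient would come from backward propagation over mini-batches rather than from a closed-form expression.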

Course overview (continued)

3. Localization and Detection
– Overfeat
– R-CNN (Regions with CNN)
4. CPU / GPU performance optimization
– CUDA
– VTune, OpenMP, and Intel MKL (Math Kernel Library)

Introduction to Deep Learning

Buzz…

Deep Learning – from Research to Technology

Deep Learning: a breakthrough in visual and speech recognition

Classical Computer Vision Pipeline

CV experts:
1. Select / develop features: SURF, HoG, SIFT, RIFT, …
2. Add Machine Learning on top of this for multi-class recognition and train the classifier

[Diagram: Feature Extraction (SIFT, HoG...) → Detection, Classification, Recognition]

Classical CV feature definition is domain-specific and time-consuming.
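
The two-step classical pipeline can be sketched as follows: a hand-written feature extractor followed by a separately trained classifier. The descriptor here is a heavily simplified orientation histogram in the spirit of HoG (no cells, blocks, or block normalization), and the nearest-centroid classifier, the stripe images, and all names are assumptions chosen to keep the example small.

```python
import math


def orientation_histogram(image, bins=4):
    """Simplified HoG-like descriptor: gradient orientations weighted by magnitude."""
    hist = [0.0] * bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            # Bins are centered at 0, pi/4, pi/2, 3*pi/4 (for bins=4) and wrap around.
            hist[int(angle / math.pi * bins + 0.5) % bins] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]


def nearest_class(feature, centroids):
    """Step 2 of the pipeline: a minimal classifier on top of the fixed features."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: dist(feature, centroids[label]))


# Toy "training set": one image per class, stripes two pixels wide.
vertical = [[(x // 2) % 2 for x in range(8)] for y in range(8)]
horizontal = [[(y // 2) % 2 for x in range(8)] for y in range(8)]
centroids = {
    "vertical": orientation_histogram(vertical),
    "horizontal": orientation_histogram(horizontal),
}

# Unseen image: vertical stripes shifted by one pixel.
test_image = [[((x + 1) // 2) % 2 for x in range(8)] for y in range(8)]
predicted = nearest_class(orientation_histogram(test_image), centroids)
```

The feature extractor encodes a human decision about what matters (edge orientation); moving to a different domain would typically mean designing a different descriptor, which is the cost the slide points at.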

Deep Learning-based Vision Pipeline

Deep Learning:
– Builds features automatically based on training data
– Combines feature extraction and classification
DL experts: define the NN topology and train the NN

[Diagram: Deep NN... → Deep NN... → Detection, Classification, Recognition]

Deep Learning promise: train good features automatically, with the same method for different domains.
Computer Vision + Deep Learning + Machine Learning

We want to combine Deep Learning + CV + ML:
– Combine pre-defined features with learned features
– Use the best ML methods for multi-class recognition
CV+DL+ML experts needed to build the best-in-class

[Diagram: CV features (HoG, SIFT) + Deep NN... → ML (AdaBoost, …)]

Combine the best of Computer Vision, Deep Learning, and Machine Learning.

Deep Learning Basics

Deep Learning is a set of machine learning algorithms based on multi-layer networks.

[Figure: multi-layer network with INPUTS, HIDDEN NODES, and OUTPUTS (CAT, DOG); later builds of the slide are labeled Training]
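
A minimal sketch of the forward pass through such a multi-layer network, with inputs, one hidden layer, and two outputs standing in for CAT and DOG. The layer sizes, the sigmoid activation, and every weight value are hypothetical; training would adjust the weights, which is not shown here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then a non-linearity."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the input through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 4 hidden nodes -> 2 outputs.
hidden = (
    [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.4], [0.1, 0.1, 0.1]],
    [0.0, 0.1, -0.1, 0.0],
)
output = (
    [[0.6, -0.2, 0.3, 0.1], [-0.5, 0.4, 0.2, 0.3]],
    [0.0, 0.0],
)

scores = forward([0.5, -1.0, 2.0], [hidden, output])
labels = ["CAT", "DOG"]
prediction = labels[scores.index(max(scores))]
```

Each output is a score between 0 and 1 for its label; the scores are not normalized to sum to 1, which a soft-max output layer would do.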

Deep Learning Taxonomy

Supervised:
– Convolutional NN (LeCun)
– Recurrent Neural Nets (Schmidhuber)

Unsupervised:
– Deep Belief Nets / Stacked RBMs (Hinton)
– Stacked denoising autoencoders (Bengio)
– Sparse AutoEncoders (LeCun, A. Ng)

Convolutional Networks

Convolutional NN

A Convolutional Neural Network is an extension of the traditional Multi-layer Perceptron, based on 3 ideas:
1. Local receptive fields
2. Shared weights
3. Spatial / temporal sub-sampling
See the LeCun paper (1998) on text recognition:
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
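
The first two ideas can be illustrated with a plain 2D convolution: each output value depends only on a small patch of the input (local receptive field), and the same kernel is reused at every position (shared weights). This is a minimal sketch, not code from the cited paper; the image, the kernel, and the 32x32 / 28x28 sizes in the weight count are assumptions for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image ('valid' mode, stride 1).

    Like most CNN libraries, this computes cross-correlation (no kernel flip).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local receptive field: only the kh x kw patch at (i, j) is read.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d_valid(image, kernel)  # [[6, 8], [12, 14]]

# Weight count for mapping a 32x32 input to a 28x28 map (bias terms ignored):
shared_weights = 5 * 5                  # one 5x5 kernel reused everywhere
dense_weights = (32 * 32) * (28 * 28)   # fully connected alternative
```

With these assumed sizes the shared kernel needs 25 weights where a fully connected layer would need 802,816, which is why weight sharing makes image-sized inputs tractable.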

What is a Convolutional NN?

CNN: a multi-layer NN architecture
– Convolutional + Non-Linear Layer
– Sub-sampling Layer
– Convolutional + Non-Linear Layer
– Fully connected layers
– Supervised

[Diagram: Feature Extraction → Classification]

What is a Convolutional NN? (continued)

[Figure: Convolution + NL → Sub-sampling (2x2) → Convolution + NL]
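
The non-linearity and the 2x2 sub-sampling stage in the figure can be sketched as below. ReLU and max pooling are assumed here as common choices; the slides do not say which non-linearity or which sub-sampling operator (max or average) is meant, and the feature map values are made up.

```python
def relu(feature_map):
    """Element-wise non-linearity (NL): negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 sub-sampling with stride 2: keep the largest value in each block."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[i][j], feature_map[i][j + 1],
                           feature_map[i + 1][j], feature_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 output of a convolutional layer.
conv_output = [[1, -2, 3, 0],
               [4, 5, -6, 7],
               [0, 1, 2, 3],
               [-1, -2, -3, -4]]

pooled = max_pool_2x2(relu(conv_output))  # [[5, 7], [1, 3]]
```

Sub-sampling halves each spatial dimension, so the next convolutional layer sees a smaller map in which each value summarizes a larger region of the original image.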


CNN success story: ILSVRC 2012

ImageNet database: 14 million labeled images, 20K categories

ILSVRC: Classification

ImageNet Classification 2012

ILSVRC 2012: top rankers

http://www.image-net.org/challenges/LSVRC/2012/results.html

N  Error-5  Algorithm                                      Team                Authors
1  0.153    Deep Conv. Neural Network                      Univ. of Toronto    Krizhevsky et al.
2  0.262    Features + Fisher Vectors + Linear classifier  ISI                 Gunji et al.
3  0.270    Features + FV + SVM                            OXFORD_VGG          Simonyan et al.
4  0.271    SIFT + FV + PQ + SVM                           XRCE/INRIA          Perronnin et al.
5  0.300    Color desc. + SVM                              Univ. of Amsterdam  van de Sande et al.
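
The Error-5 column is the ILSVRC top-5 error: the fraction of test images whose true label is not among the five highest-scoring predictions. A minimal sketch of that metric follows; the function name and the tiny six-class score table are made up for illustration and have nothing to do with the actual challenge data.

```python
def top_k_error(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is not among the k best-scored classes."""
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Three hypothetical images, six classes each.
score_rows = [
    [0.10, 0.20, 0.30, 0.15, 0.05, 0.20],  # true class 4 has the lowest score
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # true class 0 is ranked first
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked second
]
true_labels = [4, 0, 0]

top5 = top_k_error(score_rows, true_labels, k=5)  # only the first image misses
top1 = top_k_error(score_rows, true_labels, k=1)  # first and third images miss
```

Top-5 error is more forgiving than top-1 error, which suits ImageNet, where an image may contain several plausible objects but carries a single label.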

ImageNet 2013: top rankers

http://www.image-net.org/challenges/LSVRC/2013/results.php

N  Error-5  Algorithm                           Team                  Authors
1  0.117    Deep Convolutional Neural Network   Clarifai              Zeiler
2  0.129    Deep Convolutional Neural Networks  Nat. Univ. Singapore  Min Lin
3  0.135    Deep Convolutional Neural Networks  NYU                   Zeiler, Fergus
4  0.135    Deep Convolutional Neural Networks  Andrew Howard
5  0.137    Deep Convolutional Neural Networks  Overfeat NYU          Pierre Sermanet et al.

ImageNet Classification 2013

Conv Net Topology

– 5 convolutional layers
– 3 fully connected layers + soft-max
– 650K neurons, 60 million weights
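
The weight count can be checked with a short calculation. The slide gives only the totals, so the layer shapes below are an assumption: they are the shapes published in Krizhevsky et al. (2012), including the two-GPU split that halves the input channels seen by the filters in conv2, conv4, and conv5.

```python
# (name, number of filters, kernel height, kernel width, input channels per filter)
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),    # 48 = 96 / 2 because of the two-GPU split
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),   # 192 = 384 / 2
    ("conv5", 256, 3, 3, 192),
]

# (name, number of inputs, number of outputs)
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),         # 1000 ImageNet classes, followed by soft-max
]


def count_parameters():
    """Return weights plus biases per layer."""
    counts = {}
    for name, filters, kh, kw, channels in CONV_LAYERS:
        counts[name] = filters * kh * kw * channels + filters
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = n_in * n_out + n_out
    return counts


counts = count_parameters()
total = sum(counts.values())
fc_share = sum(counts[name] for name, _, _ in FC_LAYERS) / total
```

Under these assumptions the total comes to about 61 million parameters, in line with the 60 million on the slide, and the three fully connected layers hold well over 90% of them.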

Why Should a ConvNet Be Deep?

[Figures: Rob Fergus, NIPS 2013]

Conv Nets: beyond Visual Classification

CNN applications

CNN is a big hammer. Plenty of low-hanging fruit.

You just need the right nail!


Conv NN: Detection
Sermanet, CVPR 2014

Conv NN: Scene parsing
Farabet, PAMI 2013

CNN: indoor semantic labeling RGBD
Farabet, 2013

Conv NN: Action Detection
Taylor, ECCV 2010

Conv NN: Image Processing
Eigen, ICCV 2013


BACKUP

BUZZ

A lot of buzz about Deep Learning

– July 2012: started DL lab
– Nov 2012: big improvement in Speech and OCR:
  – Speech: error rate reduced by 25%
  – OCR: error rate reduced by 30%
– 2013: launched 5 DL-based products:
  – Voice search
  – Photo Wonder
  – Visual search

Microsoft on Deep Learning for Speech (go to 3:00-5:10)

Why Google invests in Deep Learning

NYU “Deep Learning” Professor LeCun Will Head Facebook’s New Artificial Intelligence Lab, Dec 10, 2013