Deep Learning in Object Detection, PDF

This document summarizes recent advances in deep learning for object detection, segmentation, and recognition tasks in computer vision. It discusses how early neural networks were limited but have seen resurgence due to improved techniques like unsupervised pre-training and larger datasets. Deep learning now achieves state-of-the-art results for tasks like image classification, face detection, and pedestrian detection. The author's group has applied deep learning to problems like face parsing, pedestrian parsing, and face attribute recognition. Open questions remain around how to best formulate vision problems and make use of deep models' large learning capacity.

Uploaded by

Raghavendra Shetty


Deep Learning in Object Detection, Segmentation, and Recognition

Xiaogang Wang
Department of Electronic Engineering,
The Chinese University of Hong Kong
[Figure: application areas arranged around "Deep learning": face alignment, face parsing, face recognition, attribute recognition, pedestrian parsing, human pose estimation, pedestrian detection, crowd segmentation, crowd tracking, crowd video surveillance, person re-identification across camera views, crowd behaviour analysis]
Neural network
Back propagation

1986

• Solve general learning problems

• Tied with biological system
But it was given up…

• Hard to train
• Insufficient computational resources
• Small training sets
• Does not work well
Neural network
Back propagation

1986 2006

• SVM
• Boosting
• Decision tree
• KNN
• …

• Loose tie with biological systems
• Flat structures
• Specific methods for specific tasks
– Hand crafted features (GMM-HMM, SIFT, LBP, HOG)

Kruger et al. TPAMI’13

Neural network Deep belief net
Back propagation Science

1986 2006

• Unsupervised & layer-wise pre-training
• Better designs for modeling and training (normalization, nonlinearity, dropout)
• Feature learning
• New development of computer architectures
– GPU
– Multi-core computer systems
• Large scale databases
Neural network Deep belief net
Back propagation Science Speech

1986 2006 2011

deep learning results

• Solve general learning problems

• Tied with biological system
But it was given up…
Neural network Deep belief net
Back propagation Science Speech

1986 2006 2011 2012

How Many Computers to Identify a Cat? 16000 CPU cores


Rank  Name         Error rate  Description
1     U. Toronto   0.15315     Deep learning
2     U. Tokyo     0.26172     Hand-crafted features and learning models. Bottleneck.
3     U. Oxford    0.26979     Hand-crafted features and learning models. Bottleneck.
4     Xerox/INRIA  0.27058     Hand-crafted features and learning models. Bottleneck.

Object recognition over 1,000,000 images and 1,000 categories (2 GPU)

• ImageNet 2013

Rank Name Error rate Description

1 NYU 0.11197 Deep learning
2 NUS 0.12535 Deep learning
3 Oxford 0.13555 Deep learning

MSRA, IBM, Adobe, NEC, Clarifai, Berkeley, U. Tokyo, UCLA, UIUC, Toronto …

Top 20 groups all used deep learning


• Google and Baidu announced their deep learning based visual search engines (2013)
– Google
– Baidu
Works Done by Us

Detection
– Pedestrian detection
– Facial keypoint detection

Segmentation
– Face parsing
– Pedestrian parsing

Recognition
– Face verification
– Face attribute recognition
Pedestrian Detection

Improve the state-of-the-art average miss detection rate on the largest Caltech dataset from 63% to 39% (ICCV’13)

[Figures labelled CVPR’12, CVPR’13, ICCV’13]

Facial keypoint detection, CVPR’13 (2% average error on LFPW)
Face parsing, CVPR’12
Pedestrian parsing, CVPR’12

Face Recognition and Face Attribute Recognition (LFW: 96.45%)
Face verification, ICCV’13
Recovering Canonical-View Face Images, ICCV’13
Face attribute recognition, ICCV’13
Introduction on Classical Deep Models
• Convolutional Neural Networks (CNN)
• Deep Belief Net (DBN)
• Auto-encoder
Classical Deep Models
• Convolutional Neural Networks (CNN)
– LeCun’95

Convolution Pooling
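
The convolution and pooling stages named on this slide can be illustrated with a minimal sketch in plain Python. The 5 x 5 image, the `edge_kernel` filter, and the function names are hypothetical choices for illustration and do not come from the talk; like most deep learning libraries, the sketch computes cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows) without padding.

    Computes cross-correlation: the kernel is applied as written, not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical input: dark left half, bright right half.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical filter that responds to a dark-to-bright step from left to right.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, edge_kernel)  # 4 x 4 response map
pooled = max_pool2d(feature_map, size=2)        # 2 x 2 map after pooling
```

The filter responds only where the intensity step sits, and pooling keeps that response while halving the resolution, which is what gives the representation some tolerance to small shifts.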
Classical Deep Models
• Deep belief net
– Hinton’06

$P(x, h_1, h_2) = p(x \mid h_1)\, p(h_1, h_2)$

$P(x, h_1) = \dfrac{e^{-E(x, h_1)}}{\sum_{x, h_1} e^{-E(x, h_1)}}$

$E(x, h_1) = b' x + c' h_1 + h_1' W x$
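
As a small numerical illustration of the energy and the normalized probability on this slide, the sketch below enumerates every binary state of a tiny model. It keeps the sign convention written on the slide, E(x, h1) = b'x + c'h1 + h1'Wx. The parameter values and the sizes (2 visible units, 1 hidden unit) are hypothetical, and brute-force enumeration is only feasible for models this small.

```python
import itertools
import math


def energy(x, h, W, b, c):
    """E(x, h) = b'x + c'h + h'Wx, with the sign convention written on the slide."""
    bias_x = sum(bi * xi for bi, xi in zip(b, x))
    bias_h = sum(cj * hj for cj, hj in zip(c, h))
    interaction = sum(h[j] * W[j][i] * x[i]
                      for j in range(len(h)) for i in range(len(x)))
    return bias_x + bias_h + interaction


def joint_distribution(W, b, c):
    """Return {(x, h): P(x, h)} by enumerating all binary states."""
    n_x, n_h = len(b), len(c)
    states = [(x, h)
              for x in itertools.product((0, 1), repeat=n_x)
              for h in itertools.product((0, 1), repeat=n_h)]
    unnormalized = {s: math.exp(-energy(s[0], s[1], W, b, c)) for s in states}
    partition = sum(unnormalized.values())  # the sum in the denominator
    return {s: value / partition for s, value in unnormalized.items()}


# Hypothetical parameters: W has shape (hidden units, visible units).
W = [[-1.0, -1.0]]
b = [0.0, 0.0]
c = [0.5]

P = joint_distribution(W, b, c)
most_likely_state = max(P, key=P.get)
```

With these values the state x = (1, 1), h1 = (1,) has the lowest energy and therefore the highest probability.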

Classical Deep Models
• Auto-encoder
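– Hinton’06

Encoding: $h_1 = \sigma(W_1 x + b_1)$, $h_2 = \sigma(W_2 h_1 + b_2)$

Decoding: $\tilde{h}_1 = \sigma(W_2' h_2 + b_3)$, $\tilde{x} = \sigma(W_1' \tilde{h}_1 + b_4)$

[Figure: network diagram, bottom to top: x → h1 (W1, b1) → h2 (W2, b2) → reconstructed h1 (W'2, b3) → reconstructed x (W'1, b4)]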
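
The encoding and decoding equations of the auto-encoder can be traced with a minimal forward-pass sketch. It assumes that W'1 and W'2 denote the transposes of W1 and W2 (tied weights), which the slide does not state explicitly. The layer sizes (4 → 3 → 2) and all parameter values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(W, v, bias):
    """Compute sigma(W v + bias) for a matrix W given as a list of rows."""
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + b_k)
            for row, b_k in zip(W, bias)]


def transpose(W):
    return [list(col) for col in zip(*W)]


def autoencode(x, W1, b1, W2, b2, b3, b4):
    """Forward pass; assumes W'1 and W'2 are the transposes of W1 and W2."""
    h1 = affine_sigmoid(W1, x, b1)                     # encoding, layer 1
    h2 = affine_sigmoid(W2, h1, b2)                    # encoding, layer 2
    h1_rec = affine_sigmoid(transpose(W2), h2, b3)     # decoding with W'2
    x_rec = affine_sigmoid(transpose(W1), h1_rec, b4)  # decoding with W'1
    return h1, h2, h1_rec, x_rec


def reconstruction_error(x, x_rec):
    """Squared error between the input and its reconstruction."""
    return sum((a - r) ** 2 for a, r in zip(x, x_rec))


# Hypothetical input and parameters.
x = [1.0, 0.0, 1.0, 0.0]
W1 = [[0.1, -0.2, 0.3, 0.0],
      [0.0, 0.4, -0.1, 0.2],
      [-0.3, 0.1, 0.2, 0.1]]
b1 = [0.0, 0.0, 0.0]
W2 = [[0.5, -0.5, 0.1],
      [0.2, 0.3, -0.4]]
b2 = [0.0, 0.0]
b3 = [0.0, 0.0, 0.0]
b4 = [0.0, 0.0, 0.0, 0.0]

h1, h2, h1_rec, x_rec = autoencode(x, W1, b1, W2, b2, b3, b4)
error = reconstruction_error(x, x_rec)
```

Training would adjust the weights and biases to reduce `error` over a dataset; that step is omitted here.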
Opinion I
• How to formulate a vision problem with deep learning?
– Make use of experience and insights obtained in CV research
– Sequential design/learning vs joint learning
– Effectively train a deep model (layerwise pre-training + fine tuning)

[Figure: conventional object recognition scheme: feature extraction → quantization (visual words) → spatial pyramid (histograms in local regions) → classification]
[Figure: deep model (Krizhevsky NIPS’12): … → filtering & max pooling → filtering & max pooling → filtering & max pooling]

Correspondence between the two schemes:
– Feature extraction ↔ filtering
– Quantization ↔ filtering
– Spatial pyramid ↔ multi-level pooling
Opinion II
• How to make use of the large learning capacity of
deep models?
– High dimensional data transform
– Hierarchical nonlinear representations

[Figure: input → ? → output. Candidates for the "?" block: SVM + feature, smoothness, shape prior…, or a high-dimensional data transform]
Opinion III
• Deep learning likes challenging tasks (for better
generalization)
– Make input data more challenging (augmenting data by
translating, rotating, and scaling)
– Make training process more challenging (dropout:
randomly setting some responses to zero; dropconnect:
randomly setting some weights to zero)
– Make prediction more challenging
Learning feature through face verification (predicting 0/1 label): 92.57% on LFW with 480 CNNs
Learning feature through face reconstruction (predicting 9216 pixels): 96.45% on LFW with 4 CNNs

Y. Sun, X. Wang, and X. Tang, “Hybrid Deep Learning for Computing Face Similarities,” ICCV 2013.
Z. Zhu, P. Luo, X. Wang, and X. Tang, “Deep Learning Identity-Preserving Face Space,” ICCV 2013.
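
The two training-time perturbations named under Opinion III differ only in what is zeroed: dropout zeroes responses, dropconnect zeroes weights. The sketch below illustrates that difference for a single linear layer. The layer, its values, and the drop probability are hypothetical, and the rescaling that practical implementations usually apply to compensate for the dropped units is left out to keep the contrast visible.

```python
import random


def dropout(responses, drop_prob, rng):
    """Set each response to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else r for r in responses]


def dropconnect(weights, drop_prob, rng):
    """Set each individual weight to zero with probability drop_prob."""
    return [[0.0 if rng.random() < drop_prob else w for w in row]
            for row in weights]


def linear(weights, inputs):
    """One output per weight row: the dot product of the row with the inputs."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical layer: 3 inputs, 2 outputs.
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0, 0.25],
           [1.0, 0.0, -0.5]]

responses = linear(weights, inputs)

# Dropout: compute the responses first, then zero some of them.
dropped_responses = dropout(responses, drop_prob=0.5, rng=rng)

# Dropconnect: zero some weights first, then compute the responses.
responses_from_dropped_weights = linear(dropconnect(weights, 0.5, rng), inputs)
```

Both perturbations are applied only during training; at prediction time the full set of responses and weights is used.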
Joint Deep Learning
What if we treat an existing deep model as
a black box in pedestrian detection?

ConvNet−U−MS
– P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning,” CVPR 2013.
[Figure: results on Caltech Test and results on ETHZ]
• N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)
• P. Felzenszwalb, D. McAllester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)
• W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.
Our Joint Deep Learning Model
Modeling Part Detectors
• Design the filters in the second
convolutional layer with variable sizes
Part models learned from HOG
Part filters learned at the second convolutional layer
Deformation Layer
Visibility Reasoning with Deep Belief Net
Experimental Results
• Caltech – Test dataset (largest, most widely used)
[Figure: average miss rate (%) on Caltech-Test plotted against year, 2000 to 2014. Labelled values: 95%, 68%, 63% (state-of-the-art), 53%, 39% (best performing). Annotation: improve by ~20%]

W. Ouyang and X. Wang, “A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling,” CVPR 2012.
W. Ouyang, X. Zeng, and X. Wang, “Modeling Mutual Visibility Relationship in Pedestrian Detection,” CVPR 2013.
W. Ouyang and X. Wang, “Single-Pedestrian Detection aided by Multi-pedestrian Detection,” CVPR 2013.
X. Zeng, W. Ouyang, and X. Wang, “A Cascaded Deep Learning Architecture for Pedestrian Detection,” ICCV 2013.
W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” ICCV 2013.
[Figure: results on Caltech Test and results on ETHZ. Compared variants: DN-HOG, UDN-HOG, UDN-HOGCSS, UDN-CNNFeat, UDN-DefLayer]
Multi-Stage Contextual Deep Learning
Motivated by Cascaded Classifiers and Contextual Boost
• The classifier of each stage deals with a specific set of samples
• The score map output by one classifier can serve as contextual information for the next classifier

Conventional cascaded classifiers for detection:
– Only pass one detection score to the next stage
– Classifiers are trained sequentially

• Our deep model keeps the score map output by the current classifier and it
serves as contextual information to support the decision at the next stage
• Cascaded classifiers are jointly optimized instead of being trained sequentially
• To avoid overfitting, a stage-wise pre-training scheme is proposed to regularize
optimization
• Simulate the cascaded classifiers by mining hard samples to train the network
stage-by-stage
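
The contrast drawn above, between a conventional cascade that passes on a single score and a model that keeps earlier outputs as context, can be sketched as follows. This is a simplified illustration and not the authors' network: each stage returns one scalar per candidate window instead of a score map, the stage functions are hypothetical, and the joint optimization and stage-wise pre-training are not shown.

```python
def conventional_cascade(stages, thresholds, features):
    """Run stages in order and reject as soon as a score falls below its threshold.

    Only a single detection score survives from one stage to the next.
    """
    score = None
    for stage, threshold in zip(stages, thresholds):
        score = stage(features)
        if score < threshold:
            return False, score  # rejected early; later stages never run
    return True, score


def contextual_cascade(stages, features):
    """Every stage sees the features plus the scores of all earlier stages."""
    context = []
    for stage in stages:
        context.append(stage(features, context))
    return context[-1], context


# Hypothetical stage functions, for illustration only.
def mean_stage(features):
    return sum(features) / len(features)


def contextual_stage(features, context):
    previous = context[-1] if context else 0.0
    return 0.5 * mean_stage(features) + 0.5 * previous


window_features = [0.25, 0.5, 0.75]

accepted, last_score = conventional_cascade(
    [mean_stage, mean_stage], [0.4, 0.6], window_features)

final_score, all_scores = contextual_cascade(
    [contextual_stage] * 3, window_features)
```

In the conventional version the decision at each stage depends only on that stage's own score, while in the contextual version the earlier scores remain available as inputs to every later stage.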
Training Strategies
• Unsupervised pre-train Wh,i+1 layer-by-layer, setting Ws,i+1 = 0, Fi+1 = 0
• Fine-tune all the Wh,i+1 with supervised BP
• Train Fi+1 and Ws,i+1 with BP stage-by-stage
• A correctly classified sample at the previous stage does not influence the update of parameters
• Stage-by-stage training can be considered as adding regularization
constraints to parameters, i.e. some parameters are constrained to be
zeros in the early training stages
Log error function:

Gradients for updating parameters:

Experimental Results

[Figure: results on Caltech and ETHZ; legend entry: DeepNetNoneFilter]
Comparison of Different Training Strategies

Network-BP: use back propagation to update all the parameters without pre-training
PretrainTransferMatrix-BP: the transfer matrices are pre-trained without supervision, and then all the parameters are fine-tuned
Multi-stage: our multi-stage training strategy
High-Dimensional Data Transforms
[Figure: input → high-dimensional data transform → output]

Facial keypoint detection: face image -> facial keypoints
Face transform: face image in an arbitrary view -> face image in a canonical view
Face parsing: face image -> segmentation maps
Pedestrian parsing: pedestrian image -> segmentation maps
Recovering Canonical-View Face Images

• Z. Zhu, P. Luo, X. Wang, and X. Tang, “Deep Learning Identity-Preserving Face Space,” ICCV 2013.

[Figure: reconstruction examples from LFW]

• No 3D model; no prior information on pose and lighting condition
• Deep model can disentangle hidden factors through feature extraction over multiple layers
• Model multiple complex transforms
• Reconstructing the whole face is much stronger supervision than predicting a 0/1 class label and helps to avoid overfitting

[Figure: arbitrary view → canonical view]

Comparison on Multi-PIE

Method          -45°   -30°   -15°   +15°   +30°   +45°   Avg    Pose
LGBP [26]       37.7   62.5   77     83     59.2   36.1   59.3   √
VAAM [17]       74.1   91     95.7   95.7   89.5   74.8   86.9   √
FA-EGFC [3]     84.7   95     99.3   99     92.9   85.2   92.7   x
SA-EGFC [3]     93     98.7   99.7   99.7   98.3   93.6   97.2   √
LE [4] + LDA    86.9   95.5   99.9   99.7   95.5   81.8   93.2   x
CRBM [9] + LDA  80.3   90.5   94.9   96.4   88.3   89.8   87.6   x
Ours            95.6   98.5   100.0  99.3   98.5   97.8   98.3   x
Comparison on LFW (without outside training data)

Method                                           Accuracy (%)
PLDA (Li, TPAMI’12)                              90.07
Joint Bayesian (Chen, ECCV’12, 5-point align)    90.9
Fisher Vector Faces (Barkan, ICCV’13)            93.30
High-dim LBP (Chen, CVPR’13, 27-point align)     93.18
Ours (5-point align)                             94.38
Comparison on LFW (with outside training data)

Method                                                          Accuracy (%)
Associate-Predict (Yin, CVPR’12)                                90.57
Joint Bayesian (Chen, ECCV’12, 5-point align)                   92.4
Tom-vs-Peter (Berg, BMVC’12, 90-point align)                    93.30
High-dim LBP (Chen, CVPR’13, 27-point align)                    95.17
Transfer learning joint Bayesian (Cao, ICCV’13, 27-point align) 96.33
Ours (5-point align)                                            96.45
Face Parsing
• P. Luo, X. Wang and X. Tang, “Hierarchical Face
Parsing via Deep Learning,” CVPR 2012
Motivations

• Recast face segmentation as a cross-modality data transformation problem
• Cross modality autoencoder
• Data of two different modalities share the same
representations in the deep model
• Deep models can be used to learn shape priors for
segmentation
Hierarchical Representation of Face Parsing
Joint Bayesian Formulation
• Detectors are trained with deep belief net (DBN) and segmentators are trained with deep autoencoder. Both are generative models.
• Joint Bayesian framework for face detection, part detection,
component detection, and component segmentation
Training Segmentators
Human Parsing
• P. Luo, X. Wang, and X. Tang, “Pedestrian Parsing via
Deep Decompositional Network,” ICCV 2013
Second row: our result
Third row: ground truth
Facial Keypoint Detection
• Y. Sun, X. Wang and X. Tang, “Deep Convolutional Network
Cascade for Facial Point Detection,” CVPR 2013
Benefits of Using Deep Model
• Take the full face as input to make full use of texture context
information over the entire face to locate each keypoint
• The first network, which takes the whole face as input, needs deep structures to extract high-level features
• Since the networks are trained to predict all the keypoints
simultaneously, the geometric constraints among keypoints
are implicitly encoded
Comparison with Belhumeur et al. [4], Cao et al. [5] on LFPW test images.

1. http://www.luxand.com/facesdk/
2. http://research.microsoft.com/en-us/projects/facesdk/
3. O. Jesorsky, K. J. Kirchberg, and R. Frischholz. Robust face detection using the Hausdorff distance. In Proc. AVBPA, 2001.
4. P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In Proc. CVPR, 2011.
5. X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression. In Proc. CVPR, 2012.
6. L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In Proc. ECCV, 2008.
7. M. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In Proc. CVPR, 2010.

[Figure panels: Validation, BioID, LFPW]
Conclusions
• Deep learning can jointly optimize key components in
vision systems
• Prior knowledge from vision research is valuable for
developing deep models and training strategies
• Deep learning can solve some vision challenges as
problems of high-dimensional data transform
• Challenging prediction tasks can make better use of the large learning capacity and avoid overfitting
People working on deep learning in our group

Wanli Ouyang Ping Luo Yi Sun Xingyu Zeng Zhenyao Zhu

Acknowledgement
Hong Kong Research Grants Council
National Natural Science Foundation of China
Thank you!

http://mmlab.ie.cuhk.edu.hk/ http://www.ee.cuhk.edu.hk/~xgwang/
