OBJECT TRACKING IN VIDEO
Master Thesis Project of Andrea Ferri
Supervised by Jordi Torres and Xavier Giro I Nieto
20th October 2016, UPC, Barcelona
Summary
1. Project Overview;
2. Methodology;
3. Project Development;
4. Solved Problems;
5. Running Example;
6. Evaluations;
7. Conclusions;
8. References.
Project Overview: Goals
Build a working Model for Object Tracking in Video
Object Detection from Video
Can TensorFlow
be the fastest adaptable environment
for Machine Learning,
to implement Research and Development
on a different infrastructure architecture,
with great prospects for improvement?
Goals:
TensorFlow is an Open Source Software
Library for Machine Intelligence
[Logos not preserved for: Based on, Work in, Powered by]
Goals: Environment
TensorFlow is NEW (released November 9, 2015)
Less than 1 year old
Great Potential for Improvement
Great Community Support
Still Few Available Components
I make the best of the usable projects!
Goals: ImageNet
Image Database organized according
to the WordNet hierarchy
Used for the ILSVRC
(ImageNet Large Scale Visual Recognition Challenge)
VID Challenge:
30 Moving Object Classes;
Specific Datasets Provided:
- Train 3862 Snippets;
- Validation 555 Snippets;
- Test 973 Snippets.
Model Neural Network
System of programs and data structures which
approximates the operation of the human
brain.
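As a rough illustration of the definition above, a network can be seen as a stack of layers, each computing weighted sums of its inputs followed by a non-linearity. The sketch below is a minimal pure-Python example, not the thesis model: the layer sizes, the weights and the choice of a sigmoid activation are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the inputs through every layer in order and return the output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 0.5, -1.0], layers)
```

Real models such as the ones used later in this project have many more layers and learn their weights from data instead of having them written by hand.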
Model Neural Network
[Diagram: several INPUTs enter the Input Layer, pass through Neural Layers 1 to N and the Last Neural Layer, and produce the OUTPUT.]
Back Propagation
Method for training artificial neural networks,
used together with an optimization method such as gradient descent
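The idea can be sketched on a very small network. The example below is an assumption-laden illustration, not the training code used in the project: it uses one hidden layer, sigmoid activations, no biases, a squared error and plain gradient descent with a hypothetical learning rate of 0.5.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One forward and backward pass; updates the weights in place.

    w_hidden: one list of input weights per hidden neuron.
    w_out: one weight per hidden neuron, feeding the single output neuron.
    Returns the squared error measured before the update.
    """
    # Forward pass
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    error = 0.5 * (output - target) ** 2

    # Backward pass: apply the chain rule from the output back to the inputs
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]

    # Gradient descent update (the optimization method)
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] -= lr * delta_hidden[j] * xi
    return error

# Hypothetical starting weights and a single training example
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]
errors = [train_step([1.0, 0.0], 1.0, w_hidden, w_out) for _ in range(200)]
```

Repeating the step drives the error down, which is the behaviour the error back propagation slide describes.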
Back Propagation
[Diagram: INPUTs flow through weighted connections from the INPUT LAYER to the HIDDEN LAYER/S; the ERROR is propagated back through the weights (ERROR BACK PROPAGATION).]
Goals: Model for Object Tracking in Video
t=0: Class: Dog, conf 0.78
t=1: Class: Dog, conf 0.59
t=2: Class: Dog, conf 0.34
t=3: No Objects
For each frame:
I. Detect possible objects;
II. Identify possible detections.
Track them in time and space.
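One simple way to picture the goal is as a list of per-frame detections that are grouped into a track over time. The sketch below reuses the dog example from this slide; the `Detection` tuple and the `build_tracks` helper are hypothetical names, and grouping by class label alone is a simplification that ignores where the object is in the frame.

```python
from collections import namedtuple

# Hypothetical record for one detection in one frame
Detection = namedtuple("Detection", ["frame", "label", "confidence"])

def build_tracks(frames):
    """Group per-frame detections by class label, in frame order.

    frames: list indexed by time step; each item is a list of
    (label, confidence) pairs found in that frame.
    Returns a dict mapping each label to its list of Detection records.
    """
    tracks = {}
    for t, detections in enumerate(frames):
        for label, confidence in detections:
            tracks.setdefault(label, []).append(Detection(t, label, confidence))
    return tracks

frames = [
    [("dog", 0.78)],  # t=0
    [("dog", 0.59)],  # t=1
    [("dog", 0.34)],  # t=2
    [],               # t=3: no objects
]
tracks = build_tracks(frames)
```

Here `tracks["dog"]` covers frames 0 to 2 and ends when the frame at t=3 contains no objects.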
Project Overview: Architecture
Model for Object Tracking in Video
Per-frame Analysis (Still-Image Approach):
I. Detect possible objects;
II. Identify possible detections.
In Time & Space Analysis (Post-Processing Approach)
Modular Structure
1. GENERAL OBJECT DETECTOR
2. POST PROCESSING TRACKER
3. IMAGE CLASSIFIER (OBJECT CLC)
Example: AIRPLANES
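The modular structure can be sketched as three interchangeable functions chained together. This is only an illustration under assumptions: the module order follows the slide as drawn, and `run_pipeline` and the three `fake_*` stand-ins are hypothetical names, not code from the thesis repository.

```python
def run_pipeline(frames, detect, track, classify):
    """Chain the modules: detect per frame, track across frames, classify each track."""
    boxes_per_frame = [detect(frame) for frame in frames]
    tracks = track(boxes_per_frame)
    return [{"label": classify(boxes), "boxes": boxes} for boxes in tracks]

# Hypothetical stand-ins for the three modules
def fake_detect(frame):
    # In this toy example a "frame" is already a list of (x1, y1, x2, y2) boxes
    return frame

def fake_track(boxes_per_frame):
    # One track made of the first box of every non-empty frame
    return [[boxes[0] for boxes in boxes_per_frame if boxes]]

def fake_classify(track_boxes):
    return "airplane"

result = run_pipeline(
    [[(0, 0, 10, 10)], [(1, 0, 11, 10)], []],
    fake_detect, fake_track, fake_classify,
)
```

Because each module is passed in as a function, any one of them can be replaced or retrained without touching the other two.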
Methodology
Time Constraint of 5 months
Starting from Scratch
Fast Learn
Fast Develop
The Power of the Community
Project Development
Still-Image Approach
- GENERAL OBJECT DETECTOR: TensorBox (GitHub Repo)
- IMAGE CLASSIFIER (OBJECT CLC): Inception (GitHub Repo)
Post Processing Approach
- POST PROCESSING TRACKER: Python Implementation
Still Image Analysis
TensorBox (GitHub Repo)
OverFeat Model (Pierre Sermanet et al.)
GENERAL OBJECT DETECTOR
Unbalanced
Trained as Single Class on the 30 VID Classes
[Plot] Lots of Peaks
Trained as Single Class on the 30 VID Classes
[Plot] Regular Curve
Inception (GitHub Repo)
Inception V3 Model
(Christian Szegedy et al.)
IMAGE CLASSIFIER (OBJECT CLC)
Well Balanced
Trained as Multi Class on the 30 VID Classes
Really Smooth
Post Processing Analysis
POST PROCESSING TRACKER
Python Implementation
Not a Trainable Model
Based on a simplification of
the Slow and Steady Feature Analysis:
- Bounding Boxes
- Object movement
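A non-trainable tracker that relies only on bounding boxes and their movement can be sketched as a greedy association between consecutive frames. The code below is a generic illustration, not the thesis implementation and not the Slow and Steady Feature Analysis itself: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the `min_iou` threshold of 0.3 is an arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def link_boxes(boxes_per_frame, min_iou=0.3):
    """Greedily extend each track with the best-overlapping box of the next frame."""
    tracks = []  # each track is a list of (frame_index, box)
    for t, boxes in enumerate(boxes_per_frame):
        unmatched = list(boxes)
        for track in tracks:
            last_t, last_box = track[-1]
            if last_t != t - 1 or not unmatched:
                continue  # track ended earlier, or no boxes left to match
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= min_iou:
                track.append((t, best))
                unmatched.remove(best)
        # Boxes that matched no existing track start new tracks
        tracks.extend([[(t, b)] for b in unmatched])
    return tracks

# A box that shifts slightly, then an unrelated box far away
tracks = link_boxes([[(0, 0, 10, 10)], [(1, 0, 11, 10)], [(50, 50, 60, 60)]])
```

In this toy input the first two boxes overlap strongly and form one track, while the distant box in the last frame starts a second track.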
Solved Problems
Environment Installation;
Libraries Setting;
Components Training;
Components Combination;
Post Processing Implementation;
Dataset usage.
Results: VID ImageNet Challenge
With Provided Data
Team name | Entry description | Number of object categories won | mAP
NUIST | cascaded region regression + tracking | 10 | 0.808292
NUIST | cascaded region regression + tracking | 10 | 0.803154
CUVideo | 4-model ensemble (Multi-Context .. & Motion-Guided .. ) | 9 | 0.767981
Trimps-Soushen | Ensemble 2 | 1 | 0.709651
Results: VID ImageNet Challenge
With Additional Data
Team name | Entry description | Number of object categories won | mAP
NUIST | cascaded region regression + tracking | 17 | 0.79593
NUIST | cascaded region regression + tracking | 5 | 0.781144
Trimps-Soushen | Ensemble 6 | 5 | 0.720704
ITLab-Inha | An ensemble for detection, MCMOT for tracking | 3 | 0.731471
Results: VID ImageNet Challenge
Tracking + With Provided Data
Team name | Entry description | mAP
CUVideo | 4-model ensemble | 0.558557
Tracking + With Additional Data
Team name | Entry description | Description of outside data used | mAP
NUIST | cascaded region regression + tracking | proposal network is fine-tuned from COCO | 0.583898
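The mAP figures in these tables are means, over the object classes, of a per-class average precision. The sketch below shows one common non-interpolated way to compute it; it is not the official ILSVRC evaluation code, the function names are hypothetical, and the example detections are made up for illustration.

```python
def average_precision(scored_hits, num_ground_truth):
    """Non-interpolated AP for one class.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision measured at the rank of each correct detection
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth

def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Made-up detections for one class: correct, wrong, correct
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
m_ap = mean_average_precision({"class_a": 1.0, "class_b": 0.5})
```

Under this definition a class with no correct detections contributes an AP of 0, so weak classes pull the mean down sharply.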
Results: Validation Developed Model
Overall: 0.002263 mAP
Class | mAP | Class | mAP | Class | mAP
airplane | 0 | elephant | 0 | red panda | 0
antelope | 0 | fox | 0 | sheep | 0.0329
bear | 0 | giant panda | 0 | snake | 0
bicycle | 0 | hamster | 0 | squirrel | 0
bird | 0 | horse | 0 | tiger | 0
bus | 0 | lion | 0 | train | 0
car | 0.0002 | lizard | 0 | turtle | 0.0615
cattle | 0 | monkey | 0 | watercraft | 0.0001
dog | 0.0006 | motorcycle | 0.0219 | whale | 0
domestic cat | 0.1492 | rabbit | 0 | zebra | 0
Evaluations: Developed Model
Modular Structure
- GENERAL OBJECT DETECTOR: LOW STARTING ACCURACY
- POST PROCESSING TRACKER: NOT ENOUGH TO COMPENSATE
- IMAGE CLASSIFIER (OBJECT CLC)
LOC Validation Results for the General Object Detector (G.O.D.)
Best Overlap
Class | mAP | Class | mAP | Class | mAP
airplane | 0 | elephant | -0.0021 | red panda | 0
antelope | 0 | fox | +0.0843 | sheep | -0
bear | 0 | giant panda | 0 | snake | +0.0214
bicycle | 0 | hamster | 0 | squirrel | 0
bird | 0 | horse | 0 | tiger | 0
bus | +0.0003 | lion | 0 | train | +0.0011
car | +0.0019 | lizard | +0.0001 | turtle | +0.0991
cattle | 0 | monkey | 0 | watercraft | +0.0003
dog | -0 | motorcycle | -0 | whale | +0.0002
domestic cat | -0.0006 | rabbit | +0.0003 | zebra | 0
LOC Validation Results for the General Object Detector (G.O.D.)
Best Intersection Over Union
Class | mAP | Class | mAP | Class | mAP
airplane | 0 | elephant | +0.4077 | red panda | 0
antelope | 0 | fox | 0 | sheep | +0.0789
bear | 0 | giant panda | 0 | snake | -0
bicycle | 0 | hamster | 0 | squirrel | 0
bird | 0 | horse | 0 | tiger | 0
bus | +0.0014 | lion | +0.0007 | train | +0.0056
car | +0.0091 | lizard | 0 | turtle | +0.4935
cattle | +0.0002 | monkey | 0 | watercraft | +0.0013
dog | -0.0001 | motorcycle | -0 | whale | +0.0010
domestic cat | -0.0103 | rabbit | 0 | zebra | 0
Evaluations: Possible Improvements
Change Modules Order
- GENERAL OBJECT DETECTOR: IDENTIFY 30 OBJECTS
- POST PROCESSING TRACKER: SPECIFIC MODELS
- IMAGE CLASSIFIER (OBJECT CLC): TRAINABLE MODEL
Initial Question
Can TensorFlow
be the fastest adaptable environment
for Machine Learning,
to implement Research and Development
on a different infrastructure architecture,
with great prospects for improvement?
Conclusions
I started without any background in
Deep Learning
and Visual Recognition.
I finished by implementing
a working model
for Object Tracking in Video.
Conclusions
Yes! I think TensorFlow
has proven to be adaptable,
with great prospects
for improvement.
THANKS!
References
Thesis Project GitHub;
TensorBox GitHub;
YOLO GitHub;
Inception GitHub;
TensorFlow.
Questions & Answers