Vision Algorithms For Mobile Robotics: Davide Scaramuzza
Lecture 01
Introduction
Davide Scaramuzza
http://rpg.ifi.uzh.ch
Today’s Class
• About me and my research lab
• What is Computer Vision?
• Why study computer vision?
• Example of Vision Applications
• Live Demos!
• Specifics of this course
• Overview of Visual Odometry
Who am I?
Current positions
Professor of Robotics, Dep. of Informatics and Neuroinformatics (UZH & ETH)
Education
PhD from ETH Zurich with Roland Siegwart
Post-doc at the University of Pennsylvania with Vijay Kumar & Kostas Daniilidis
Highlights
Coordinator of the European project sFly on visual navigation of micro drones
which introduced the PX4 autopilot and visual navigation of drones
My Research Background
Computer Vision
Visual Odometry and SLAM
Sensor fusion
Camera calibration
My lab
http://rpg.ifi.uzh.ch
Close to Bahnhof Oerlikon,
Andreasstrasse 15, 2nd floor
Research Overview
Real-time, Onboard Computer Vision and Control for Autonomous, Agile Drone Flight
Falanga et al., The Foldable Drone: A Morphing Quadrotor that can Squeeze and Fly, RAL’19. PDF. Videos.
Featured in IEEE Spectrum.
Research Overview
Real-time, Onboard Computer Vision and Control for Autonomous, Agile Drone Flight
Kaufmann, Loquercio, Dosovitskiy, Ranftl, Koltun, Scaramuzza, Deep Drone Racing: Learning Agile Flight in
Dynamic Environments, Conference on Robot Learning (CoRL), Zurich, Oct. 29-31, 2018. PDF, YouTube
Student Projects: http://rpg.ifi.uzh.ch/student_projects.php
Successful Startups
Fotokite (2014) – Power-over-tether drone for aerial filming
Pilot-free tethered aerial camera system with limitless flight time and data bandwidth
The first and only system approved by the FAA for use by public-safety teams without a
pilot license
Zurich-Eye (2015) - now Oculus Zurich
Vision-based Localization and Mapping Solutions for Mobile Robots
Created in Sep. 2015, became Facebook-Oculus Zurich in Sep. 2016
The Zurich Eye team is behind the new Oculus Quest
We will have a lecture by Christian Forster from Oculus Zurich at the end of November!
Today’s Class
• About me and my research lab
• What is Computer Vision?
• Why study computer vision?
• Example of Vision Applications
• Live Demos!
• Specifics of this course
• Overview of Visual Odometry
What is computer vision?
Automatic extraction of “meaningful” information from
images and videos
(Figure: a street scene annotated with the labels sky, roof, chimney, tree, building, window, and door.)
Today’s Class
• About me and my research lab
• What is Computer Vision?
• Why study computer vision?
• Example of Vision Applications
• Live Demos!
• Specifics of this course
• Overview of Visual Odometry
Why study computer vision?
Relieve humans of boring, easy tasks
Enhance human abilities: human-computer interaction, visualization,
augmented reality (AR)
Perception for autonomous robots
Organize and give access to visual content
Lots of computer-vision companies and jobs in Switzerland (Zurich &
Lausanne):
• Facebook-Oculus (Zurich): AR/VR
• Magic-Leap (Zurich & Lausanne): AR/VR
• Microsoft Research (Zurich): Robotics and Hololens
• Google (Zurich): Brain, ARCore, Street View, YouTube
• Apple (Zurich): Autonomous Driving, face tracking
• NVIDIA (Zurich): simulation, autonomous driving
• Logitech (Zurich, Lausanne)
• Disney-Research (Zurich)
• Pix4D (Lausanne)
• VIZRT (Zurich): sport broadcasting, 3D replay
• More: https://de.glassdoor.ch/Job/z%C3%BCrich-computer-vision-jobs-SRCH_IL.0,6_IC3297851_KO7,22.htm
Vision in humans
Vision is our most powerful sense. Half of the primate cerebral cortex is devoted
to visual processing
The retina is ~1,000 mm² and contains 130 million photoreceptors
(120 million rods for low-light vision and 10 million cones for color sampling)
It provides an enormous amount of information: a data rate of ~3 GB/s
To match the eye's resolution, we would need a 500-megapixel camera. In practice,
however, the acuity of the eye is 8 megapixels within an 18-degree field of view
(5.5 mm diameter) region called the fovea
What a newborn sees every month in the first year
“Your baby sees things best from 15 to 30 cm away. This is the perfect distance for gazing up
into the eyes of mom or dad. Any farther than that, and the newborn sees mostly blurry
shapes because they're nearsighted. At birth, a newborn's eyesight is between 20/200 and
20/400.”
http://uk.businessinsider.com/what-a-baby-can-see-every-month-for-the-first-year-of-its-life-2017-1?r=US&IR=T
Why is vision hard?
How do we go from an array of numbers to recognizing a
fruit?
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 8 28
0 0 0 0 0 0 11 37 61 88 116 132
0 0 0 0 15 64 108 130 131 135 141 145
0 3 32 71 107 132 144 139 139 144 137 143
41 90 113 124 138 140 145 148 147 155 152 139
123 134 136 140 147 149 152 160 160 155 163 155
143 144 147 151 156 160 157 159 167 167 160 167
152 156 161 165 166 169 170 170 164 169 173 164
157 157 161 168 176 175 174 180 173 164 165 171
165 166 164 163 166 172 179 177 168 173 167 168
167 168 169 175 173 168 171 177 174 168 172 173
What we see
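To a program, the picture is nothing more than the 12×12 grid of intensities above. A minimal Python/NumPy sketch, assuming the values are 8-bit grayscale stored row by row as printed, loads them into an array. A naive global threshold (the value 100 and the helper name `bright_mask` are hypothetical choices) does separate the bright region from the dark background, but it only yields a binary mask: nothing in it says "fruit".

```python
import numpy as np

# 12x12 grayscale patch from the slide (8-bit intensities, row by row).
PATCH = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 28],
    [0, 0, 0, 0, 0, 0, 11, 37, 61, 88, 116, 132],
    [0, 0, 0, 0, 15, 64, 108, 130, 131, 135, 141, 145],
    [0, 3, 32, 71, 107, 132, 144, 139, 139, 144, 137, 143],
    [41, 90, 113, 124, 138, 140, 145, 148, 147, 155, 152, 139],
    [123, 134, 136, 140, 147, 149, 152, 160, 160, 155, 163, 155],
    [143, 144, 147, 151, 156, 160, 157, 159, 167, 167, 160, 167],
    [152, 156, 161, 165, 166, 169, 170, 170, 164, 169, 173, 164],
    [157, 157, 161, 168, 176, 175, 174, 180, 173, 164, 165, 171],
    [165, 166, 164, 163, 166, 172, 179, 177, 168, 173, 167, 168],
    [167, 168, 169, 175, 173, 168, 171, 177, 174, 168, 172, 173],
], dtype=np.uint8)


def bright_mask(image: np.ndarray, threshold: int = 100) -> np.ndarray:
    """Boolean mask of pixels brighter than `threshold` (hypothetical value)."""
    return image > threshold


mask = bright_mask(PATCH)
print(PATCH.shape, PATCH.min(), PATCH.max())  # (12, 12) 0 180
print(int(mask.sum()), "of", mask.size, "pixels are above the threshold")
```

The mask tells us where the bright pixels are, not what they depict; closing that gap is what the rest of the course (and computer vision in general) is about.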
Related disciplines
(Figure: computer vision at the center of its related disciplines: artificial intelligence, machine learning, graphics, image processing, cognitive science, and robotics.)
Computer Vision vs Computer Graphics
Computer Graphics
Today’s Class
• About me and my research lab
• What is Computer Vision?
• Why study computer vision?
• Example of Vision Applications
• Live Demos!
• Specifics of this course
• Overview of Visual Odometry
Optical character recognition (OCR)
Technology to convert scanned documents into text
Automotive safety
• Mobileye: Vision systems in high-end Tesla, BMW, GM, Volvo models. Bought by Intel in
2017 for 15 billion USD!
– Pedestrian collision warning
– Forward collision warning
– Lane departure warning
– Headway monitoring and warning
Video gaming: Xbox Kinect
Lots of Computer Vision in Modern Smartphones
iPhone X
Vision in space
NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Microsoft HoloLens
Google Visual Positioning Service
(integrated into Maps and Street View)
Instructors
• Lecturer
Organization of this Course
Lectures:
• 10:15 to 12:00 every week
• Room: ETH LFW C5, Universitätstrasse 2, 8092 Zurich.
Exercises:
• 13:15 to 15:00: Starting from next week (Lecture 02). Then almost every
week.
• Room: ETH HG E 1.1, Rämistrasse 101, 8092 Zurich.
Learning Objectives
• High-level goal: learn to implement the visual odometry pipelines currently used in
mobile robots (drones, cars, Mars rovers) and in virtual-reality (VR) and
augmented-reality (AR) products, e.g., Google Visual Positioning Service, Oculus
Quest, Microsoft HoloLens, Magic Leap.
• You will also learn to implement the fundamental computer vision algorithms
used in mobile robotics, in particular: feature extraction, multiple view
geometry, dense reconstruction, object tracking, image retrieval, visual-inertial
fusion, event-based vision.
Course Schedule
For updates, slides, and additional material: http://rpg.ifi.uzh.ch/teaching.html
Date       | Session                                                                                    | Instructor
19.09.2019 | Lecture 01 - Introduction to Computer Vision and Visual Odometry                          | Davide Scaramuzza
26.09.2019 | Lecture 02 - Image Formation 1: perspective projection and camera models                  | Davide Scaramuzza
           | Exercise 01 - Augmented reality wireframe cube                                            | Daniel & Mathias Gehrig
03.10.2019 | Lecture 03 - Image Formation 2: camera calibration algorithms                             | Davide Scaramuzza
           | Exercise 02 - PnP problem                                                                 | Daniel & Mathias Gehrig
10.10.2019 | Lecture 04 - Filtering & Edge detection                                                   | Davide Scaramuzza
17.10.2019 | Lecture 05 - Point Feature Detectors, Part 1                                              | Davide Scaramuzza
           | Exercise 03 - Harris detector + descriptor + matching                                     | Daniel & Mathias Gehrig
24.10.2019 | Lecture 06 - Point Feature Detectors, Part 2                                              | Davide Scaramuzza
           | Exercise 04 - SIFT detector + descriptor + matching                                       | Daniel & Mathias Gehrig
31.10.2019 | Lecture 07 - Multiple-view geometry                                                       | Davide Scaramuzza
           | Exercise 05 - Stereo vision: rectification, epipolar matching, disparity, triangulation   | Daniel & Mathias Gehrig
07.11.2019 | Lecture 08 - Multiple-view geometry 2                                                     | Antonio Loquercio
           | Exercise 06 - Eight-Point Algorithm                                                       | Daniel & Mathias Gehrig
14.11.2019 | Lecture 09 - Multiple-view geometry 3 (Part 1)                                            | Antonio Loquercio
21.11.2019 | Lecture 10 - Multiple-view geometry 3 (Part 2)                                            | Davide Scaramuzza
           | Exercise session: Intermediate VO Integration                                             | Daniel & Mathias Gehrig
28.11.2019 | Lecture 11 - Optical Flow and Tracking (Lucas-Kanade)                                     | Davide Scaramuzza
           | Exercise 08 - Lucas-Kanade tracker                                                        | Daniel & Mathias Gehrig
05.12.2019 | Lecture 12 - Place recognition and 3D Reconstruction                                      | Davide Scaramuzza
           | Exercise session: Deep Learning Tutorial                                                  | Daniel & Mathias Gehrig
12.12.2019 | Lecture 13 - Visual inertial fusion                                                       | Davide Scaramuzza
           | Exercise 09 - Bundle Adjustment                                                           | Daniel & Mathias Gehrig
19.12.2019 | Lecture 14 - Event based vision                                                           | Davide Scaramuzza
           | Exercise session: Final VO Integration                                                    | Daniel & Mathias Gehrig
19.12.2019: After the lecture, we will visit Scaramuzza's lab. Departure from the lecture room at 12:00 via tram 10.
Exercises
• Almost every week starting from next week (check out course schedule)
• Participation in the exercise sessions is mandatory. Questions about the
implementation details might be asked at the exam.
• Bring your own laptop
• Each exercise will consist of coding a building block of a visual odometry pipeline.
There will be two exercises dedicated to integrating these blocks together.
• Have Matlab pre-installed!
– ETH: Download: https://idesnx.ethz.ch/
– UZH: Download: https://www.zi.uzh.ch/de/students/software-elearning/softwareinstructions/Matlab.html
– An introductory tutorial on Matlab can be found here:
http://rpg.ifi.uzh.ch/docs/teaching/2019/MatlabPrimer.pdf
– Please install all the toolboxes included in the license.
Exercises
• Learning goal of the exercises: implement a full visual odometry pipeline
(similar to those running on Mars rovers and on current AR/VR devices, but
actually much better).
• Each week you will learn how to implement a building block of visual
odometry. The building blocks are:
– Image sequence
– Feature detection
– Motion estimation (2D-2D, 3D-3D, 3D-2D)
– Local optimization
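Of the three motion-estimation cases listed above (2D-2D, 3D-3D, 3D-2D), 3D-3D is the quickest to sketch: if the same 3D points are available in two consecutive frames (for example, triangulated with a stereo camera), the relative rotation R and translation t follow in closed form from an SVD, using the least-squares method of Arun, Huang and Blostein (1987). The sketch below is a minimal Python/NumPy illustration, assuming noise-free, already-matched correspondences; the name `align_3d_3d` is hypothetical, and a real pipeline would also need outlier rejection (e.g., RANSAC) before this step.

```python
import numpy as np


def align_3d_3d(P: np.ndarray, Q: np.ndarray):
    """Estimate R, t such that Q ~= R @ P + t for matched 3xN point sets.

    Least-squares rigid alignment via SVD (Arun et al., 1987). Assumes the
    columns of P and Q are already-matched, outlier-free correspondences.
    """
    p_mean = P.mean(axis=1, keepdims=True)
    q_mean = Q.mean(axis=1, keepdims=True)
    # Cross-covariance of the centred point sets (3x3).
    H = (P - p_mean) @ (Q - q_mean).T
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation (det = +1),
    # not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t


# Usage: synthetic points moved by a known rotation about z and a translation.
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(3, 20))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([[0.5], [-0.2], [1.0]])
Q = R_true @ P + t_true
R_est, t_est = align_3d_3d(P, Q)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```

The 2D-2D and 3D-2D cases need more machinery (epipolar geometry and PnP, respectively), which is why they are left to the exercises.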
Outcome of last year's exercises
Recommended Textbooks
Robotics, Vision and Control: Fundamental Algorithms, by
Peter Corke (2011). The PDF of the book can be freely
downloaded (only with the ETH VPN) from Springer or,
alternatively, from Library Genesis
Other books:
• An Invitation to 3D Vision: Y. Ma, S. Soatto, J. Kosecka, S.S.
Sastry
• Multiple View Geometry: R. Hartley and A. Zisserman (Library
Genesis)
Prerequisites
• Linear algebra
• Matrix calculus
• No prior knowledge of computer vision and image processing
required
Grading and Exam
• The final grade is based on an oral exam (30 minutes). Example exam questions
here.
– Exam dates:
• UZH: January 9, 2020
• ETH: from January 20 to February 14, 2020 (the dates are handled by
ETH Exam Center and are usually communicated before Christmas)
• Strong class participation can offset negative performance at the oral exam.
• Optional mini project:
– you have the option (i.e., not mandatory) to do a mini project, which
consists of implementing a working visual odometry algorithm in Matlab
(C++ or Python are also acceptable)
– If the algorithm runs smoothly and produces a reasonable result, you will be
rewarded with a grade increase of up to 0.5 on the final grade. However,
note that the mini project can be quite time-consuming!
– The deadline to hand in the mini project is 5.01.2020.
– Group work (up to 4 people) is possible.
Class Participation
• Class participation includes
– showing up
– being able to articulate key points from the last lecture
– asking and answering questions
Today’s Class
• About me and my research lab
• What is Computer Vision?
• Why study computer vision?
• Example of Vision Applications
• Live Demos!
• Specifics of this course
• Overview of Visual Odometry
Visual Odometry (VO) is the process of incrementally estimating the pose of the vehicle by
examining the changes that motion induces on the images of its onboard
cameras
Davide Scaramuzza – University of Zurich – Robotics and Perception Group - rpg.ifi.uzh.ch
Unlike wheel odometry, VO is not affected by wheel slippage on uneven
terrain or other adverse conditions.
Assumptions required by VO:
Sufficient illumination in the environment
Dominance of the static scene over moving objects
Enough texture to allow apparent motion to be extracted
Sufficient scene overlap between consecutive frames
1980: First known real-time VO implementation on a robot, in Hans Moravec's PhD
thesis (NASA/JPL), for Mars rovers, using one sliding camera (sliding stereo).
2004: VO was used on a robot on another planet: the Mars rovers Spirit and Opportunity
(see the seminal paper from NASA/JPL, 2007)
Scaramuzza, D., Fraundorfer, F., Visual Odometry: Part I - The First 30 Years and
Fundamentals, IEEE Robotics and Automation Magazine, Volume 18, issue 4, 2011. PDF
Fraundorfer, F., Scaramuzza, D., Visual Odometry: Part II - Matching, Robustness, and
Applications, IEEE Robotics and Automation Magazine, Volume 19, issue 1, 2012. PDF
C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I.D. Reid, J.J. Leonard,
Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-
Perception Age, IEEE Transactions on Robotics, Vol. 32, Issue 6, 2016. PDF
(Figure: nested sets, with VO contained in VSLAM and VSLAM contained in SFM.)
Structure from Motion (SFM) is more general than VO and tackles the problem of 3D
reconstruction and 6-DOF pose estimation from unordered image sets
Visual Odometry
• Focuses on incremental estimation
• Guarantees local consistency
Pipeline:
– Image sequence
– Front-end: feature detection, motion estimation
– Back-end: local optimization
VO computes the camera path incrementally (pose after pose):
– Image sequence
– Feature detection
– Feature matching (tracking)
– Motion estimation
– Local optimization
(Figure: consecutive camera poses Ck-1, Ck, Ck+1 linked by the relative transformations Tk,k-1 and Tk+1,k.)
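The C and T labels in this slide reflect how VO builds the path: each relative transformation Tk,k-1 between consecutive frames is chained onto the previous absolute pose. A minimal Python/NumPy sketch, assuming 4×4 homogeneous matrices and the convention C_k = C_{k-1} T_{k,k-1} (the one used in the Scaramuzza and Fraundorfer VO tutorial listed above); `make_pose` and `concatenate_poses` are hypothetical helper names chosen for illustration.

```python
import numpy as np


def make_pose(yaw_rad: float, t_xyz) -> np.ndarray:
    """Hypothetical helper: 4x4 transform with a rotation about z and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t_xyz
    return T


def concatenate_poses(relative_transforms, C0=None):
    """Chain relative motions T_{k,k-1} into absolute poses C_0, ..., C_n.

    Uses the convention C_k = C_{k-1} @ T_{k,k-1} (an assumption of this sketch).
    """
    C = np.eye(4) if C0 is None else C0
    path = [C]
    for T in relative_transforms:
        C = C @ T
        path.append(C)
    return path


# Usage: four identical steps "move 1 m along x, then turn 90 degrees about z"
# trace a closed square and bring the camera back to its start pose.
step = make_pose(np.pi / 2, [1.0, 0.0, 0.0])
path = concatenate_poses([step] * 4)
print([np.round(C[:3, 3], 3).tolist() for C in path])  # camera positions
print(np.allclose(path[-1], np.eye(4)))  # True
```

Because every new pose multiplies in one more estimated Tk,k-1, small errors in each relative motion accumulate along the path (drift), which is one motivation for the local-optimization stage.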
Course Topics
• Principles of image formation
• Image Filtering
• Feature detection and matching
• Multi-view geometry
• Dense reconstruction
• Visual place recognition
• Visual inertial fusion
• Event-based Vision
Course Topics
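• Principles of image formation
– Perspective projection
– Camera calibration
(Figure: pinhole camera model. A scene point Pc, expressed in the camera frame (C; Xc, Yc, Zc), projects to the image point p on the image plane (CCD) at focal length f. Pixel coordinates (u, v) start at (0,0) in the image corner; the principal point O lies at (u0, v0).)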
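Perspective projection with the symbols of this figure fits in a few lines. A minimal Python/NumPy sketch, assuming an ideal pinhole with square pixels, zero skew, no lens distortion, and the focal length f expressed in pixels: a point Pc = (Xc, Yc, Zc) maps to u = u0 + f·Xc/Zc, v = v0 + f·Yc/Zc. The numeric values (f = 500 px, principal point (320, 240)) and the name `project_points` are hypothetical; the calibrated model is developed later in the course.

```python
import numpy as np


def project_points(points_c: np.ndarray, f: float, u0: float, v0: float) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates (ideal pinhole).

    Assumes square pixels, zero skew, no distortion, and Zc > 0 for every point.
    """
    K = np.array([[f, 0.0, u0],
                  [0.0, f, v0],
                  [0.0, 0.0, 1.0]])
    homogeneous = points_c @ K.T  # rows are (f*X + u0*Z, f*Y + v0*Z, Z)
    return homogeneous[:, :2] / homogeneous[:, 2:3]


pts = np.array([[0.0, 0.0, 2.0],    # on the optical axis -> principal point
                [0.2, -0.1, 2.0]])  # 0.2 m right, 0.1 m up-axis offset, 2 m away
print(project_points(pts, f=500.0, u0=320.0, v0=240.0))
# [[320. 240.]
#  [370. 215.]]
```

Dividing by Z is what makes distant objects appear smaller: doubling Zc halves the pixel offset from (u0, v0).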
Course Topics
• Feature detection and matching
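Matching, the second half of this topic, pairs each feature in one image with its most similar feature in another. One common strategy, sketched here in Python/NumPy under stated assumptions, compares descriptors by sum of squared differences (SSD) and keeps a match only when the best candidate is clearly better than the second best (a ratio test, here applied to squared distances; the 0.8 threshold, the toy 2-D descriptors, and the name `match_descriptors` are hypothetical). The course exercises may use a different criterion.

```python
import numpy as np


def match_descriptors(desc1: np.ndarray, desc2: np.ndarray, ratio: float = 0.8):
    """Match rows of desc1 to rows of desc2 by nearest neighbour in SSD.

    A match is kept only if the best SSD is below `ratio` times the second-best
    SSD (ratio test on squared distances; threshold is an assumption).
    Requires desc2 to have at least two rows.
    Returns a list of (index_in_desc1, index_in_desc2) pairs.
    """
    # Pairwise sum of squared differences, shape (N1, N2).
    ssd = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i, row in enumerate(ssd):
        best, second = np.argsort(row)[:2]
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches


# Toy descriptors: the first query is clearly closest to desc2[0]; the second
# is equidistant from all three candidates, so the ratio test rejects it.
desc1 = np.array([[0.5, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
print(match_descriptors(desc1, desc2))  # [(0, 0)]
```

Rejecting ambiguous matches early keeps many outliers away from the motion-estimation step.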
Course Topics
• Multi-view geometry and sparse 3D reconstruction
Course Topics
• Dense 3D reconstruction
Course Topics
• Place recognition
(Figure: a query image and the most similar places retrieved from a database of millions of images.)
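One common way to find the "most similar places" is to summarize each image by a single global vector (for instance, a bag-of-visual-words histogram) and rank the database by cosine similarity to the query. A minimal Python/NumPy sketch, assuming such descriptors have already been computed; the 4-bin histograms are made-up toy data and `rank_by_cosine_similarity` is a hypothetical name.

```python
import numpy as np


def rank_by_cosine_similarity(query: np.ndarray, database: np.ndarray, top_k: int = 3):
    """Return indices and scores of the top_k database rows most similar to query.

    query: (D,) global descriptor; database: (N, D), one descriptor per image.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                      # cosine similarity for each database image
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Toy "visual word" histograms (hypothetical data, 4 words per image).
database = np.array([[5.0, 0.0, 1.0, 0.0],
                     [0.0, 3.0, 0.0, 4.0],
                     [4.0, 1.0, 1.0, 0.0],
                     [0.0, 0.0, 6.0, 1.0]])
query = np.array([5.0, 1.0, 1.0, 0.0])
indices, scores = rank_by_cosine_similarity(query, database, top_k=2)
print(indices)  # [2 0]
```

This brute-force scan touches every database entry; at the scale of millions of images, retrieval systems typically rely on inverted indices or approximate nearest-neighbour search instead.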
Course Topics
• Visual-inertial fusion
Course Topics
• Event-based vision
Application: High speed VO
Rosinol et al., Ultimate SLAM?, IEEE RAL'18, Best Paper Award Honorable Mention. PDF. Video. IEEE Spectrum.
Understanding Check
Are you able to:
• Provide a definition of Visual Odometry?
• Explain the most important differences between VO, VSLAM and SFM?
• Describe the needed assumptions for VO?
• Illustrate its building blocks?