Lecture 01
Olivier Moindrot
Department of Computer Science
Stanford University
Stanford, CA 94305
[email protected]
1.1 Definition
Two definitions of computer vision Computer vision can be defined as a scientific field that
extracts information out of digital images. The type of information gained from an image ranges
from identification to space measurements for navigation or augmented reality applications.
Another way to define computer vision is through its applications: computer vision is building
algorithms that can understand the content of images and use it for other applications. We will see
in more detail in section 4 the different domains where computer vision is applied.
A bit of history The origins of computer vision go back to an MIT undergraduate summer project
in 1966 [4]. It was believed at the time that computer vision could be solved in one summer, but we
now have a 50-year-old scientific field that is still far from being solved.
Computer Vision: Foundations and Applications (CS 131, 2017), Stanford University.
1.2 An interdisciplinary field
Computer vision brings together a large set of disciplines. Neuroscience can help computer vision by
first understanding human vision, as we will see in section 2. Computer vision can be seen as a part
of computer science, and algorithm theory or machine learning are essential for developing computer
vision algorithms.
We will show in this class how all the fields in figure 1 are connected, and how computer vision draws
inspiration and techniques from them.
Computer vision has not been solved in 50 years, and is still a very hard problem. It’s something that
we humans do unconsciously but that is genuinely hard for computers.
Poetry harder than chess In 1997, the IBM supercomputer Deep Blue defeated the reigning world
chess champion, Garry Kasparov, for the first time. Today we still struggle to create algorithms that
output well-formed sentences, let alone poems. The gap between these two domains shows that what
humans call intelligence is often not a good criterion for assessing the difficulty of a computer task.
Deep Blue won through brute-force search among millions of possibilities and was not more intelligent
than Kasparov.
Why is it so hard? Computer vision is hard because there is a huge gap between pixels and
meaning. What the computer sees in a 200 × 200 RGB image is a set of 120,000 values. The
road from these numbers to meaningful information is very difficult. Arguably, the human brain’s
visual cortex solves an equally difficult problem: understanding images that are projected on our retina
and converted to neuron signals. The next section will show how studying the brain can help computer
vision.
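To make the gap concrete, here is a minimal sketch using NumPy, with a random array standing in for a real photo, of what a 200 × 200 RGB image looks like to a computer:

```python
import numpy as np

# A 200 x 200 RGB image as the computer sees it: a grid of
# height x width x 3 integer intensities between 0 and 255.
image = np.random.randint(0, 256, size=(200, 200, 3), dtype=np.uint8)

print(image.shape)  # (200, 200, 3)
print(image.size)   # 120000 raw numbers, with no meaning attached
```

Nothing in these 120,000 numbers says "cat" or "road"; extracting that meaning is the whole problem.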
In 1962, Hubel & Wiesel [3] tried to understand the visual system of a cat by recording neurons while
showing a cat bright lines. They found that some specialized neurons fired only when the line was in
a particular spot on the retina or if it had a certain orientation.
Their research led to the beginning of a scientific journey to understand the human visual system,
which is still active today.
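Orientation-selective cells have a loose analogue in computer vision: oriented filters that respond strongly only to edges of a particular orientation. The following is an illustrative sketch (not Hubel & Wiesel's method), using a hand-made 3 × 3 filter and toy image patches:

```python
import numpy as np

# A tiny "vertical edge detector": like an orientation-selective
# neuron, it responds strongly only to vertically oriented contrast.
vertical_filter = np.array([[-1, 0, 1],
                            [-1, 0, 1],
                            [-1, 0, 1]])

# A 3x3 patch containing a vertical edge (dark left, bright right)...
vertical_edge = np.array([[0, 0, 9],
                          [0, 0, 9],
                          [0, 0, 9]])
# ...and the same edge rotated to horizontal.
horizontal_edge = vertical_edge.T

print(np.sum(vertical_filter * vertical_edge))    # strong response: 27
print(np.sum(vertical_filter * horizontal_edge))  # no response: 0
```

Sliding such filters across an image is the basic operation behind edge detection, and later lectures build on exactly this idea.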
They were awarded the Nobel Prize in Physiology or Medicine in 1981 for their work. After the
announcement, Dr. Hubel said:
There has been a myth that the brain cannot understand itself. It is compared to
a man trying to lift himself by his own bootstraps. We feel that is nonsense. The
brain can be studied just as the kidney can.
Speed The human visual system is very efficient. As recognizing threats and reacting to them
quickly was paramount to survival, evolution has refined the visual system of mammals over millions
of years.
The speed of the human visual system has been measured [7] at around 150 ms to recognize an animal
in a natural scene. Figure 2 shows how the brain’s responses to images of animals and
non-animals diverge after around 150 ms.
Fooling humans However, this speed comes with some drawbacks. Changing small
irrelevant parts of an image, such as a water reflection or the background, can go unnoticed because the
human brain focuses on the important parts of an image [5].
If the signal is very close to the background, it can be difficult to detect and segment the relevant part
of the image.
Context Humans use context all the time to infer clues about images. Prior knowledge is one
of the most difficult tools to incorporate into computer vision. Humans use context to know where
to focus on an image and what to expect at certain positions. Context also helps the brain
compensate for colors in shadows.
However, context can be used to fool the human brain.
Imitating birds did not lead humans to planes. Plainly copying nature is neither the best nor the
most efficient way to learn how to fly. But studying birds helped us understand aerodynamics, and
understanding concepts like lift allowed us to build planes.
The same might be true of intelligence. Even setting aside that it is not possible with today’s
technology, simulating a full human brain might still not be the best way to create intelligence.
However, neuroscientists hope to gain insights into what may be the concepts behind vision, language,
and other forms of intelligence.
Robots navigating in an unknown location need to be able to scan their surroundings to compute the
best path. Using computer vision, we can measure the space around a robot and create a map of its
environment.
Stereo cameras give depth information, like our two eyes, through triangulation. Stereo vision is a
big field of computer vision and there is a lot of research seeking to create a precise depth map given
stereo images.
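The triangulation idea can be sketched with the standard pinhole-camera relation for a rectified stereo pair, Z = fB/d. This is a simplification, and the focal length, baseline, and disparity values below are made up for illustration:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair:
    Z = f * B / d, where f is the focal length in pixels,
    B the baseline (distance between the two cameras), and
    d the disparity (horizontal shift of the point between
    the left and right images)."""
    return focal_length_px * baseline_m / disparity_px

# A point that shifts 35 pixels between the two views of a rig
# with a 700-pixel focal length and a 10 cm baseline:
z = depth_from_disparity(700, 0.10, 35)
print(z)  # 2.0 meters
```

Note how depth is inversely proportional to disparity: nearby points shift a lot between the two views, distant points barely move, which is why stereo depth estimates degrade with distance.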
If we increase the number of viewpoints to cover all sides of an object, we can create a 3D surface
representing the object [2]. An even more challenging idea is to reconstruct the 3D model of a
monument from all the results of a Google image search for that monument [1].
There is also research in grasping, where computer vision can help understand the 3D geometry of
an object to help a robot grasp it. Through the robot’s camera, we can recognize the handle of an
object and infer its shape, enabling the robot to find a good grasping position [6].
On top of measurement information, an image contains a very dense amount of semantic information.
We can label objects in an image, label the whole scene, and recognize people, actions, gestures, and
faces.
Medical images also contain a lot of semantic information. Computer vision can be helpful for a
diagnosis based on images of skin cells, for instance, to decide whether they are cancerous or not.
Special effects Shape and motion capture are new techniques used in movies like Avatar to animate
digital characters by recording the movements played by a human actor. In order to do that, we have
to find the exact positions of markers on the actor’s face in a 3D space, and then recreate them on the
digital avatar.
3D urban modeling Taking pictures with a drone over a city can be used to render a 3D model of
the city. Computer vision is used to combine all the photos into a single 3D model.
Scene recognition It is possible to recognize the location where a photo was taken. For instance, a
photo of a landmark can be compared to billions of photos on Google to find the best matches. We
can then identify the best match and deduce the location of the photo.
Face detection Face detection has been used for multiple years in cameras to take better pictures
and focus on the faces. Smile detection can allow a camera to take pictures automatically when
the subject is smiling. Face recognition is more difficult than face detection, but with the scale of
today’s data, companies like Facebook are able to get very good performance. Finally, we can also
use computer vision for biometrics, using unique iris pattern recognition or fingerprints.
Optical Character Recognition One of the oldest successful applications of computer vision is
recognizing characters and numbers. This can be used to read ZIP codes or license plates.
Mobile visual search With computer vision, we can do a search on Google using an image as the
query.
Self-driving cars Autonomous driving is one of the hottest applications of computer vision.
Companies like Tesla, Google, and General Motors compete to be the first to build a fully autonomous
car.
Automatic checkout Amazon Go is a new kind of store with no checkout. With computer
vision, algorithms detect exactly which products you take and charge you as you walk out of the
store.
Vision-based interaction Microsoft’s Kinect captures movement in real time and allows players
to interact directly with a game through moves.
Augmented Reality AR is also a very hot field right now, and multiple companies are competing
to provide the best mobile AR platform. Apple released ARKit in June and there are already
impressive applications.
Virtual Reality VR uses computer vision techniques similar to AR’s. The algorithm needs to
know the position of the user and the positions of all the objects around them. As the user moves
around, everything needs to be updated in a realistic and smooth way.
References
[1] Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, and Steven M Seitz. Multi-view stereo for
community photo collections. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference
on, pages 1–8. IEEE, 2007.
[2] Anders Heyden and Marc Pollefeys. Multiple view geometry. Emerging topics in computer vision, pages
45–107, 2005.
[3] David H Hubel and Torsten N Wiesel. Receptive fields, binocular interaction and functional architecture in
the cat’s visual cortex. The Journal of physiology, 160(1):106–154, 1962.
[4] Seymour A Papert. The summer vision project. 1966.
[5] Ronald A Rensink, J Kevin O’Regan, and James J Clark. On the failure to detect changes in scenes across
brief interruptions. Visual cognition, 7(1-3):127–145, 2000.
[6] Ashutosh Saxena, Justin Driemeyer, and Andrew Y Ng. Robotic grasping of novel objects using vision. The
International Journal of Robotics Research, 27(2):157–173, 2008.
[7] Simon Thorpe, Denise Fize, and Catherine Marlot. Speed of processing in the human visual system. Nature,
381(6582):520, 1996.