Lecture 01 Introduction

This document provides an overview of the field of computer vision including its goals, challenges, applications, and history. Computer vision aims to make computers understand images and videos by analyzing visual content to classify and detect objects, scenes, and activities. While significant progress has been made, fully achieving human-level perception remains an ongoing challenge.

Uploaded by

Ankush Jain

Computer Vision

Introduction
What is Computer Vision?
• Make computers understand images and videos.
• What kind of scene?
• Where are the cars?
• How far is the building?
• What are they doing?
• Why is this happening?
• What is important?
• What will I see?


Computer Vision and Nearby Fields
[Figure: diagram relating Digital Image Processing, Computational Photography, Computer Vision, and Computer Graphics; labels: Images (2D), Geometry (3D), Shape, Photometry, Appearance]
• Machine learning: Vision = Machine learning applied to visual data
Image Processing vs Computer Vision
• Image Processing
– Mostly concerned with image-to-image transformations
– Filtering
– Enhancement
– Compression
• Computer Vision
– Concerned with how images reflect the 3D world
– Filtering for feature extraction
– Enhancement for recognition/detection
– Compression that preserves geometric information in images
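A minimal sketch of the contrast above, assuming a grayscale image stored as a list of rows. The same filtering routine serves both goals: a box kernel smooths the image (image-to-image enhancement), while a Sobel-style kernel extracts edge features. The helper name `filter3x3` and the toy image are illustrative assumptions, not part of the lecture.

```python
# Hypothetical helper: slide a 3x3 kernel over a grayscale image (list of rows).
# Border pixels are skipped, so the output is 2 smaller along each axis.
# The kernel is applied without flipping (correlation-style filtering).
def filter3x3(image, kernel):
    height, width = len(image), len(image[0])
    out = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            row.append(acc)
        out.append(row)
    return out

# Image processing view: a box filter that averages each 3x3 neighborhood.
BOX = [[1 / 9] * 3 for _ in range(3)]

# Computer vision view: a Sobel-style kernel that responds to vertical edges.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Toy 5x5 image (assumed data): dark left part (0), bright right part (10).
image = [[0, 0, 10, 10, 10] for _ in range(5)]

smoothed = filter3x3(image, BOX)     # blurred step between dark and bright
edges = filter3x3(image, SOBEL_X)    # large response only near the step
```

The edge response is zero in the flat bright region and large next to the dark-to-bright step, which is the kind of feature a recognition or detection stage could build on.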
Visual data on the Internet
• Flickr
– 10+ billion photographs
– 60 million images uploaded a month
• Facebook
– 250 billion+
– 300 million a day
• Instagram
– 55 million a day
• YouTube
– 100 hours uploaded every minute
• 90% of net traffic will be visual!
• Mostly about cats
• Too big for humans

http://www.petittube.com/

• Need automatic tools to access and analyze visual data!


Vision is Really Hard
• Vision is an amazing feature of natural intelligence
• Visual cortex occupies about 50% of the macaque brain
• More of the human brain is devoted to vision than to anything else

Is that a queen or a bishop?
Why is Computer Vision Hard?
What did you see?
• Where was this picture taken?
• How many people are there?
• What are they doing?
• What object is the person on the left standing on?
• Why is this a funny picture?

Challenges: Many nuisance parameters
• Illumination
• Object pose
• Clutter
• Occlusions
• Intra-class appearance
• Viewpoint
Challenges: Intra-class variation
Handling challenges?
Computer vision is still really, really far from human-like perception. But we have:
• Lots and lots and lots of data.
• Humans to learn from.
• Prior knowledge, which can provide constraints on the problem.
Computer Vision Technology Can Better Our Lives
• Safety
• Health
• Security
• Comfort
• Fun
• Access

History of Computer Vision
“In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a camera to a computer and get the machine to describe what it sees.”
Crevier 1993, pg. 88
Marvin Minsky, MIT, Turing award, 1969
Half a century later, we're still working on it.
1960’s: interpretation of synthetic worlds
Larry Roberts, “Father of Computer Vision”
[Figure panels: input image; 2x2 gradient operator; computed 3D model rendered from new viewpoint]
Larry Roberts PhD Thesis, MIT, 1963, Machine Perception of Three-Dimensional Solids
Slide credit: Steve Seitz
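The 2x2 gradient operator mentioned above is commonly known as the Roberts cross. A minimal sketch, assuming a grayscale image stored as a list of rows; the function name and the toy image are illustrative assumptions, and this is not a reproduction of the thesis code.

```python
import math

def roberts_cross(image):
    """Gradient magnitude from the two diagonal differences of each 2x2 block."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(height - 1):
        row = []
        for c in range(width - 1):
            gx = image[r][c] - image[r + 1][c + 1]   # main-diagonal difference
            gy = image[r][c + 1] - image[r + 1][c]   # anti-diagonal difference
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out

# Toy 4x4 image (assumed data): dark top half, bright bottom half.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [9, 9, 9, 9],
    [9, 9, 9, 9],
]
magnitude = roberts_cross(image)  # nonzero only where a 2x2 block straddles the edge
```

Only the middle row of the output is nonzero, because only those 2x2 blocks contain both dark and bright pixels.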
1970’s: some progress on interpreting selected images

The representation and matching of pictorial structures, Fischler and Elschlager, 1973
1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor
Image credit: Rick Szeliski
1990’s: face recognition; statistical analysis in vogue
2000’s: broader recognition; large annotated datasets available; video processing starts
2010’s: resurgence of deep learning

[AlexNet NIPS 2012] [DeepFace CVPR 2014]

[DeepPose CVPR 2014] [Show, Attend and Tell ICML 2015]


2020’s: autonomous vehicles
2030’s: robot uprising?
Examples of Computer Vision Applications
• How is computer vision used today?
Face detection

• Most digital cameras and smart phones detect faces (and more)
• Canon, Sony, Fuji, …
• For smart focus, exposure compensation, and cropping
Slide credit: Steve Seitz
Face recognition

Facebook face auto-tagging


Face Landmark Alignment – 3D Persona

What Makes Tom Hanks Look Like Tom Hanks ICCV 2015
Smile Detection

Sony Cyber-shot® T70 Digital Still Camera
Slide credit: Steve Seitz
Vision-based Biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” (read the story; see Wikipedia)
Slide credit: Steve Seitz
Optical Character Recognition (OCR)
• Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software

• Digit recognition, AT&T labs: http://www.research.att.com/~yann/
• License plate readers: http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Slide credit: Steve Seitz
Computer vision in sports
• Hawk-Eye: helping/improving referee decisions
• SportVision: improving viewer experiences
• Replay Technologies: improving viewer experiences
• Play tracking
Visual recognition for photo organization

Google Photos
Earth viewers (3D modeling)

Image from Microsoft’s Virtual Earth


(see also: Google Earth)
Slide credit: Steve Seitz
3D from thousands of images

[Furukawa et al. CVPR 2010]


3D Time-lapse from Internet Photos

3D Time-lapse from Internet Photos, ICCV 2015


Style transfer

[Figure panels: source image (style); target image (content); output (deepart)]

A Neural Algorithm of Artistic Style [Gatys et al. 2015]


Special effects: Matting and composition

Kylie Minogue - Come Into My World


Special effects: Shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC

Slide credit: Steve Seitz


Special effects: Motion capture

Pirates of the Caribbean, Industrial Light and Magic
Slide credit: Steve Seitz
Google cars

Google in talks with Ford, Toyota and Volkswagen to realise driverless cars

http://www.theatlantic.com/technology/archive/2014/05/all-the-world-a-track-the-trick-that-makes-googles-self-driving-cars-work/370871/
Interactive Games: Kinect
• Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
Vision in space

NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.

Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Industrial robots

Vision-guided robots position nut runners on wheels

http://www.automationworld.com/computer-vision-opportunity-or-threat
Mobile robots

• NASA’s Mars Spirit Rover
• http://www.robocup.org/
• Saxena et al. 2008, STAIR at Stanford
• http://www.youtube.com/watch?v=DF39Ygp53mQ
Medical imaging

• 3D imaging: MRI, CT
• Image guided surgery: Grimson et al., MIT
Computer vision for the masses
• Counting cells
• Predicting poverty
Current state of the art
• Many of these are less than 5 years old
• Very active and exciting research area!
• To learn more about vision applications and companies
– David Lowe maintains an excellent overview of vision companies
• http://www.cs.ubc.ca/spider/lowe/vision.html
Topics of Study in Computer Vision
• Interpreting Intensities
– What determines the brightness and color of a pixel?
– How can we use image filters to extract meaningful information from the image?
• Correspondence and Alignment
– How can we find corresponding points in objects or scenes?
– How can we estimate the transformation between them?
• Perspective and 3D Geometry
– How can we map between the 3D world and the 2D image?
– How can we recover 3D coordinates from images or video?
• Grouping and Segmentation
– How can we group pixels into meaningful regions?
• Categorization and Object Recognition
– How can we represent images and categorize them?
– How can we recognize categories of objects?
• Advanced Topics
– Action recognition, 3D scenes and context, CNNs, …
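As one concrete instance of the perspective and 3D geometry questions above, here is a minimal sketch of the pinhole camera model, assuming points are given in the camera frame and lens distortion is ignored. The function names, the focal length, and the principal point are illustrative assumptions.

```python
def project_point(point_3d, focal_length, principal_point):
    """Map a camera-frame 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    cx, cy = principal_point
    u = focal_length * x / z + cx
    v = focal_length * y / z + cy
    return u, v

def back_project(pixel, depth, focal_length, principal_point):
    """Recover the 3D point for a pixel, assuming its depth Z is known."""
    u, v = pixel
    cx, cy = principal_point
    x = (u - cx) * depth / focal_length
    y = (v - cy) * depth / focal_length
    return x, y, depth

# Assumed example values: 500-pixel focal length, center of a 640x480 image.
f, center = 500.0, (320.0, 240.0)
near = project_point((1.0, 0.5, 5.0), f, center)    # same X, Y at depth 5
far = project_point((1.0, 0.5, 10.0), f, center)    # same X, Y at depth 10
recovered = back_project(near, 5.0, f, center)
```

The farther point lands closer to the image center, which is the perspective effect. Going back from a pixel to 3D needs extra information (here a known depth), because a single image loses distance along the viewing ray.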
Resources
• Books
– “Computer Vision: A Modern Approach”, by D. A. Forsyth, J. Ponce.
– “Digital Image Processing: An Algorithmic Approach” by Madhuri A. Joshi
• Videos
– https://www.youtube.com/playlist?list=PLyqSpQzTE6M_PI-rIz4O1jEgffhJU9GgG
