1 Introduction

Uploaded by

jamil.arbas

CPS834/CPS8307

Introduction to Computer Vision

Dr. Omar Falou


Toronto Metropolitan University

Fall 2024
My Contact Info
Dr. Omar Falou
[email protected] - please put CPS834/CPS8307 in subject line
Office – VIC 717

E-mail expectations:
⚫ I will check my e-mail at minimum once a day
⚫ I answer e-mails in the order they are received and aim to
respond to e-mails within 48 hours (counting “business day”
hours).
⚫ Asking questions in class, after class, or during office hours
is the fastest way to get them answered!
D2L and e-mail
⚫ You are required to maintain a TMU e-mail account. E-mails
from other non-TMU accounts may not be answered!
⚫ I will send e-mails to your TMU account with important class
info.
⚫ Check the D2L Course Shell regularly (before each class) to
check for important announcements.
⚫ Access from my.torontomu.ca
⚫ Everything course-related will be posted on D2L!
Course Description
This course describes foundational concepts of
computer vision. In particular, the course covers
the image formation process, image
representation, feature extraction, model fitting,
motion analysis, 3D parameter estimation and
applications.

Lecture: 3 hrs. 3:10 pm – 6:00 pm (LIB 72)

Labs: 0 hrs.
Office Hours
⚫ Tuesdays 11:00 am – 1:00 pm (VIC 717)
⚫ Office Hours are the best time to ask
questions about the course.
Textbooks
⚫ Multiple View Geometry in Computer Vision, by
Richard Hartley and Andrew Zisserman. ISBN:
9780511811685
⚫ Digital Image Processing, by Rafael C. Gonzalez,
Richard E. Woods. ISBN: 9780133356724
⚫ Computer Vision: Algorithms and Applications, by
Richard Szeliski, an electronic version of the book
can be found on the author’s website.
Topics
Evaluation
Assignments
⚫ Research Topic (15%)
⚫ Topics will be assigned weekly. Two weeks to submit/present.
⚫ Groups of two.
⚫ Requirements: 3-4 pages report (all students) + in-class
presentation (graduate students)
⚫ Paper Critique (15%)
⚫ Given after study week
⚫ Groups of two.
⚫ 2-3 pages.
Projects
⚫ Group of 4-5 students
⚫ Due at the end of the semester
⚫ Source code (Python) + report + video demo + contributions
⚫ Examples:
⚫ Medical Image Segmentation and Classification
⚫ Medical Image Synthesis
⚫ Object Detection and Tracking
⚫ Automatic Image Captioning
⚫ Emotion Detection from Facial Expressions
⚫ Autonomous Vehicle Lane Detection
⚫ 3D Object Reconstruction from 2D Images
⚫ Defect Detection in Manufacturing
⚫ AI-Powered Visual Search Engine
⚫ Plant Disease Detection
⚫ Sports Action Recognition in Video Clips
Final Exam
⚫ During the examination period.
⚫ Format: Multiple choice questions.
⚫ Duration: 2 hours.
Religious Exemption

⚫ If you have a religious observance at this
time, you must notify me within the first
two weeks of class. (TMU Policy 150)
⚫ For already-scheduled course components (most of them),
you must submit a Request for Accommodation of Student
Religious, Aboriginal and Spiritual Observance AND an
Academic Consideration Request form within the first 2 weeks
of the class.
⚫ For a final examination, submit both within 2 weeks of the
posting of the examination schedule.
Remarking
⚫ Requests for remarking of any work must
be made within 10 working days of the
assessment being released to students.
⚫ A request should be made in writing (e-mail).
⚫ Students should be aware that as a result
of remarking, their mark may go up, down,
or stay the same.
My Work
⚫ Medical Imaging Applications
⚫ Transfer Learning for the Prediction of Breast
Cancer Treatment using Quantitative Ultrasound
Imaging
⚫ Machine Learning for the Prediction of Kidney
Fibrosis using Ultrasound and Photoacoustic
Imaging
⚫ Deep Learning for the Synthesis of Cancer Images
Falou et al. “Transfer Learning of Pre-treatment Quantitative Ultrasound Multi-Parametric
Images for the Prediction of Breast Cancer Response to Neoadjuvant Chemotherapy,” Scientific
Reports, 14(1):2340, 2024
Acknowledgments
Many of the following slides are adapted from the
outstanding lecture materials of similar courses taught by
Professors Yung-Yu Chuang, Fredo Durand, Alyosha Efros,
Bill Freeman, James Hays, Svetlana Lazebnik, Andrej
Karpathy, Fei-Fei Li, Srinivasa Narasimhan, Silvio Savarese,
Steve Seitz, Richard Szeliski, Li Zhang, Kosta Derpanis, Yan
Tong, and Noah Snavely. The instructor deeply appreciates
these researchers for sharing their notes publicly, which
have been invaluable in developing this course.
Some Context
Every image tells a story
• Goal of computer vision:
perceive the “story” behind
the picture
• Compute properties of the
world
– 3D shape
– Names of people or objects
– What happened?
The goal of computer vision
Can computers match human perception?
• Yes and no (mainly no)
– computers can be better at
“easy” things
– humans are better at “hard”
things

• But huge progress
– Accelerating in the last five
years due to deep learning
– What is considered “hard”
keeps changing
Human perception has its shortcomings

https://twitter.com/pickover/status/1460275132958662657/
But humans can tell a lot about a scene from a
little information…

Source: “80 million tiny images” by Torralba, et al.


The goal of computer vision
• Compute the 3D shape of the world

ZED 2i Camera
The goal of computer vision
• Recognize objects and people

Terminator 2, 1991
[Figure: example image annotated with object labels: sky, building, flag, face, banner, wall, street lamp, bus, cars]

slide credit: Fei-Fei, Fergus & Torralba


The goal of computer vision
• “Enhance” images
The goal of computer vision
• Forensics

Source: Nayar and Nishino, “Eyes for Relighting”


The goal of computer vision
• Improve photos (“Computational Photography”)

Super-resolution (source: 2d3)

Depth of field on cell phone camera
(source: Google Research Blog)
Removing objects
(Google Magic Eraser)
Low-light photography
(credit: Hasinoff et al., SIGGRAPH ASIA 2016)
April 10, 2019
Why study computer vision?
• Billions of images/videos captured per day

• Huge number of potential applications


• The next slides show the current state of the art
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software

Digit recognition, AT&T labs (1990s)
http://yann.lecun.com/exdb/lenet/

License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Sudoku grabber
http://sudokugrab.blogspot.com/

Automatic check processing


Face detection

• Nearly all cameras detect faces in real time


– (Why?)
Face analysis and recognition
Vision-based biometrics

Who is she? Source: S. Seitz


Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story

Source: S. Seitz
Login without a password

Fingerprint scanners on many new smartphones and other devices

Face unlock on Apple iPhone X
See also http://www.sensiblevision.com/
New York Times, Jan. 18, 2020
by Kashmir Hill
Bird identification

Merlin Bird ID (based on Cornell Tech technology!)


Special effects: shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC


Source: S. Seitz
Special effects: motion capture

Pirates of the Caribbean, Industrial Light and Magic
Source: S. Seitz


3D face tracking w/ consumer cameras

Snapchat Lenses

Face2Face system (Thies et al.)


Image synthesis

https://this-person-does-not-exist.com/
Which face is real?

https://www.whichfaceisreal.com/
Image synthesis

“An astronaut riding a horse in a photorealistic style” – DALL-E 2

“A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat” – Imagen
Sports

Sportvision first down line


Explanation on www.howstuffworks.com

Source: S. Seitz
Smart cars

• Mobileye
• Tesla Autopilot
• Safety features in many cars
Self-driving cars

Waymo
Robotics

NASA’s Mars Curiosity Rover
https://en.wikipedia.org/wiki/Curiosity_(rover)

Amazon Picking Challenge
http://www.robocup2016.org/en/events/amazon-picking-challenge/

Amazon Prime Air
Amazon Scout


Medical imaging

3D imaging (MRI, CT)

Skin cancer classification with deep learning
https://cs.stanford.edu/people/esteva/nature/
Virtual & Augmented Reality

6DoF head tracking Hand & body tracking

3D scene understanding 3D-360 video capture


Current state of the art
• You just saw many examples of current systems.
– Many of these are less than 5 years old

• Computer vision is an active, rapidly changing research area
– Many new apps in the next 5 years
– Deep learning and generative methods powering many modern
applications

• Many startups across a dizzying array of areas
– Generative AI, robotics, autonomous vehicles, medical imaging,
construction, inspection, VR/AR, …
Why is computer vision difficult?

Viewpoint variation

Credit: Flickr user michaelpaul


Scale
Illumination
Why is computer vision difficult?

Motion (Source: S. Lazebnik)


Intra-class variation

Background clutter Occlusion


Challenges: local ambiguity

slide credit: Fei-Fei, Fergus & Torralba


But there are lots of visual cues we can use…

Source: S. Lazebnik
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a given 2D
image
– We often must use prior knowledge about the world’s structure

Artist Julian Beever with his anamorphic Coke bottle
Image source: F. Durand
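
The ambiguity above can be illustrated with a small sketch. It assumes an ideal pinhole camera at the origin looking along +Z, which is a modelling assumption and not something these slides specify; the names `project` and `scale_along_ray` and the sample coordinates are hypothetical. Under that assumption, every 3D point on the same ray through the camera centre lands on the same 2D image point, so the image alone cannot tell a small nearby object from a large distant one.

```python
def project(point, focal_length=1.0):
    """Ideal pinhole projection of a 3D point (X, Y, Z) to image coordinates (x, y).

    Assumes the camera sits at the origin and looks along +Z (illustrative model).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def scale_along_ray(point, s):
    """Move a 3D point along the ray from the camera centre by a factor s."""
    return tuple(s * c for c in point)


# Two different 3D points on the same viewing ray (values chosen for illustration).
near = (1.0, 2.0, 4.0)
far = scale_along_ray(near, 2.5)  # (2.5, 5.0, 10.0): farther away and "bigger"

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points produce the same image coordinates, which is why prior knowledge about the world (typical object sizes, ground planes, lighting) is needed to choose among the possible 3D interpretations.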
Good luck!
I am here to help you succeed.
