Computer Vision ch1

This document provides information about the EE5811 Topics in Computer Vision course. It introduces the instructor, Dr. LI Haoliang, and the three teaching assistants. It lists the recommended textbooks and discusses technical paper reading. It outlines the course goals of understanding the physical world from images and reconstructing 3D models from 2D images. Finally, it provides a tentative schedule covering topics such as image filtering, edge detection, object detection, 3D reconstruction, and motion analysis.


EE5811

Topics in Computer Vision


Dr. LI Haoliang
Department of Electrical Engineering
EE5811 Background Info
• Instructor: LI Haoliang
– Assistant Professor @ EE department
• Office: Yeung-G6526
• Email: [email protected]
– The best way to reach me.
• Tel: 3442-6087
Teaching Assistants
• Mr. QIN Tiexin
[email protected]
• Mr. LIU Jie
[email protected]
• Mr. LIU Hui
[email protected]
Textbooks
• Milan Sonka, Vaclav Hlavac, Roger Boyle, Image
Processing, Analysis, and Machine Vision (CENGAGE
Learning, 4th edition, 2015)
• D. Forsyth and J. Ponce, Computer Vision: A Modern
Approach, 2nd edition, Prentice Hall (2011)
• R. Gonzalez and R. Woods, Digital Image Processing, 3rd
edition, Prentice Hall (2007)
• Computer Vision: Algorithms and Applications, 2nd
edition, 2021 http://szeliski.org/Book/
Technical paper reading
• Computer Vision Venue
– CVPR, ICCV, ECCV, ACCV, BMVC, etc.
• Machine Learning Venue
– ICML, NeurIPS, ICLR, UAI, AISTATS, etc.
• Other Venues
– ACL, KDD, MICCAI, etc.
Every image tells a story
• Goal of computer vision:
perceive the “story”
behind the picture
• Compute properties of
the world
– 3D shape
– Names of people or
objects
– What happened?
Can the computer match human perception?
• Yes and no (mainly no)
– Computers can be better at “easy” things
– Humans are much better at “hard” things
• But huge progress has been made
– Accelerating in the last few years due to deep learning
– What is considered “hard” keeps changing
Humans can tell a lot about a scene from a little information…
Source: “80 million tiny images” by Torralba, et al.
The goal of computer vision
• Compute and understand the physical world
• Reconstruct 3D models from crowdsourcing
– Internet photos (“Colosseum”) → reconstructed 3D cameras and points → dense 3D model
• Recognize objects and people
– (Scene-labeling example from Terminator 2, 1991; labels include sky, building, flag, face, banner, wall, street lamp, bus, cars. Slide credit: Fei-Fei, Fergus & Torralba)


Why study computer vision?
• Billions of images/videos captured per day

• Huge number of useful applications


• The next slides show the current state of the art
Computer Vision
• Low Level Vision
– Measurements
– Enhancements
– Region segmentation
– Features
• Mid Level Vision
– Reconstruction
– Depth
– Motion estimation
• High Level Vision
– Category detection
– Activity recognition
– Deep understanding
Image enhancement
• Improve photos (“Computational Photography”)
– Haze removal
– Super-resolution (source: 2d3)
– Inpainting / image completion (image credit: Hays and Efros)
Applications: 3D Scanning
Scanning Michelangelo’s “The David”
• The Digital Michelangelo Project
– http://graphics.stanford.edu/projects/mich/
• UW Prof. Brian Curless, collaborator
• 2 BILLION polygons, accuracy to 0.29mm
Google’s 3D Maps
Structure estimation from tourist photos

Apple’s 3D maps

https://www.youtube.com/watch?v=InIVv-LsgZE

Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software
– Digit recognition, AT&T labs: http://www.research.att.com/~yann/
– License plate readers: http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
– Equation recognition
– Automatic check processing
Source: S. Seitz
Face detection
• Nearly all cameras detect faces in real time

Face recognition / Micro-expression
• Who is she? (Source: S. Seitz)

Vision-based biometrics
• “How the Afghan Girl was Identified by Her Iris Patterns” (Source: S. Seitz)
Object recognition (in mobile phones)
• Google Goggles (Source: S. Seitz)

Bird Identification
• Merlin Bird ID (based on Cornell Tech technology!)
Plant Identification
• Pl@ntNet is a research and educational initiative on plant biodiversity supported by Agropolis Foundation since 2009.
Marine Mammal Recognition
• Vessel-based CWD survey
• Underwater fish counting

Amazon Picking Challenge
• http://www.robocup2016.org/en/events/amazon-picking-challenge/

Robomart
Medical imaging / Healthcare
• Food-recognition example (predictions from different models on one dish image):
– Gist: chili fish head
– Color moment: braised pork
– FC7: steamed chicken feet
– AlexNet: Kung Pao chicken
– VGG: Kung Pao chicken
– Multi-task VGG: Kung Pao chicken [chicken, chili, peanut]
– Region-based multi-task VGG: chicken (dice, stir-fry); chili (dry); peanut (roasted)
Virtual & Augmented Reality
• 6DoF head tracking
• Hand & body tracking
• 3D scene understanding
• 3D-360 video capture
Why is computer vision difficult?
• Viewpoint variation
• Scale
• Illumination
• Motion (Source: S. Lazebnik)
• Intra-class variation
• Background clutter
• Occlusion
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a particular 2D picture
– We often need to use prior knowledge about the structure of the world
Image source: F. Durand
“In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a television camera to a computer and get the machine to describe what it sees.”
Crevier 1993, pg. 88

Marvin Minsky, MIT (Turing Award, 1969), and Gerald Sussman, MIT (the undergraduate)

“You’ll notice that Sussman never worked in vision again!” – Berthold Horn
A brief History of Computer Vision
• Timeline spanning the 1970s through the 2010s, including David Marr (1982)
Current state of the art
• You just saw examples of current systems
– Most of these are less than 5 years old
• This is a very active research area, and rapidly changing
– More algorithms and apps in the next 5 years??
• To learn more about vision applications and companies
– David Lowe maintains an excellent overview of vision companies
• http://www.cs.ubc.ca/spider/lowe/vision.html
Scope of Our Course
• Computer vision overlaps with many fields: robotics, machine learning, human-computer interaction, image processing, scene understanding, graphics, object recognition, medical imaging, motion analysis, computational photography, neuroscience, and multimedia
Tentative Schedule
• Week 1 (1/Sep): Introduction
• Week 2-3 (8/Sep - 15/Sep): Image filtering, color, texture, and
resampling.
• Week 4 (22/Sep): Edge detection (Quiz 1)
• Week 5-6 (29/Sep - 6/Oct): Image keypoint, descriptor,
transformation and alignment.
• Week 7 (13/Oct): Deep Neural Networks
• Week 8 (20/Oct): Object detection and Image segmentation
• Week 9 (27/Oct): Generative Model
• Week 10 (3/Nov): Camera (Quiz 2)
• Week 11 (10/Nov): 3D and Stereo
• Week 12 (17/Nov): Depth and Structure
• Week 13 (24/Nov): Motion and Wrap Up
Course information
• Prerequisites
– A good working knowledge of programming
• We will briefly go through Pillow and OpenCV later.
– Data structure and algorithm
– Some math: linear algebra, vector calculus
• We will revisit some basic math in the lecture.

• Grading
– Project assignment (30%)
– Quiz (20%): paper exam (1 – 2 hours)
– Final exam (50%)
Project Assignment
• Literature survey (10%)
– Survey of at least 10 research papers (published in the last 4 years) on a
specific problem/topic of computer vision and image processing.
– Slide presentation through video recording.
• Course project: computer vision applications (20%)
– Source Code (colab or Jupyter Notebook (ipynb))
– Project report
– Reports are limited to 4 pages, including figures and tables, in the CVPR style
(with name and student ID). Additional pages may contain only cited references.

• Grouping
– Each group can have 1~3 members. For groups with more than one member, each
member’s contribution should be listed at the end of the report (this may go in
the additional pages) for reference.
– Find your groupmates NOW!!

https://cvpr2022.thecvf.com/author-guidelines
Suggested Topics
• Detection/Tracking
• Object/face recognition
• Segmentation
• Image registration
• (Medical) image processing/enhancement/…

You need not follow these suggested topics; you are free to choose a topic of your own preference. ☺
Let’s have some fun!
Play with Colab
• Colab, or "Colaboratory", allows you to write
and execute Python in your browser, with
– Zero configuration required
– Access to GPUs free of charge
– Easy sharing
Python Programming
for Image Processing
• Pillow
– Pillow is a fork of PIL, the Python Imaging Library
• Installation:
https://pillow.readthedocs.io/en/latest/installation.html
pip install Pillow

Need to include an exclamation mark when using colab. (!pip install Pillow)

– Image Basics
• http://pillow.readthedocs.org/en/latest/handbook/concepts.html

– List of Modules
• http://pillow.readthedocs.org/en/latest/reference/index.html
Image Reading / Writing

from PIL import Image

# read image
im = Image.open("cat.jpg")
print(im.format, im.size, im.mode) # JPEG (512, 512) RGB
im.show() # use display(im) for colab

# create thumbnails
newsize = (128, 128)
im.thumbnail(newsize)

# write image
outfile = "cat_thumbnail.jpg"
im.save(outfile)

There can be many parameters in these functions, GOOGLE it if you want to explore more!
Image Cutting / Pasting / Merging

# copying a subrectangle from an image
box = (100, 100, 400, 400)
region = im.crop(box)

# processing a subrectangle, and pasting it back
region = region.transpose(Image.ROTATE_180)
im.paste(region, box)

# splitting and merging bands
r, g, b = im.split()
im = Image.merge("RGB", (b, g, r))
Geometric / Color Transformation

# geometric transforms:
out1 = im.resize((128, 128))
out2 = im.rotate(45) # degrees counter-clockwise
out3 = im.transpose(Image.FLIP_LEFT_RIGHT)
out4 = im.transpose(Image.FLIP_TOP_BOTTOM)

# color transforms:
out5 = im.convert("L") # convert to grayscale
Image Filter

from PIL import ImageFilter

# image smoothing / edge detection
out1 = im.filter(ImageFilter.BLUR)
out2 = im.filter(ImageFilter.GaussianBlur(radius=20))
out3 = im.filter(ImageFilter.CONTOUR)
out4 = im.filter(ImageFilter.FIND_EDGES)
OpenCV
• OpenCV (Open Source Computer Vision Library) is an open source
computer vision and machine learning software library
– (Latest version: OpenCV 4.6)

• OpenCV-Python is the Python API of OpenCV

• Cross-platform (Windows, Mac, Linux, Android, iOS, etc )

• Open Source and free (May have some commercial packages)

• Documentation:
http://docs.opencv.org/modules/core/doc/intro.html
Installation

• Install OpenCV-Python in Windows (site)


– Install Anaconda
• Anaconda is essentially a nicely packaged Python IDE that is shipped with
tons of useful packages, such as NumPy, Pandas, IPython Notebook, etc.
• Installation: https://docs.anaconda.com/anaconda/install/
– Install python virtual environment on Anaconda
• https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-
environments.html
– Install opencv-python:
pip install opencv-python # exclamation mark for colab

– Test:
• Open Python IDLE and type the following in a Python terminal
>>> import cv2
>>> print(cv2.__version__)
Reading, Colorspace Changing:
import cv2

# reading image:
image = cv2.imread('cat.jpg') # default
image = cv2.imread('cat.jpg', 1) # color, BGR
image = cv2.imread('cat.jpg', 0) # gray-scale
image = cv2.imread('cat.jpg', -1) # unchanged
Reading, Colorspace Changing:
import cv2

# change image colorspace:
image = cv2.imread('cat.jpg') # default, BGR
image2 = cv2.cvtColor(image, flag)

flag:
• BGR->Gray: cv2.COLOR_BGR2GRAY
• BGR->RGB: cv2.COLOR_BGR2RGB
• BGR->HSV: cv2.COLOR_BGR2HSV
Image Segmentation
• K-means Clustering
– It clusters the given data into k-clusters or parts based on the k-centroids
– The motivation behind image segmentation using k-means is that we try to assign labels to each pixel based on
the RGB (or HSV) values
• OpenCV
– cv2.kmeans(data, K, criteria, attempts, flags[, bestLabels[, centers]]) →
retval, bestLabels, centers
– Input parameters
• data: np.float32 data type, and each feature should be put in a single column
• K : Number of clusters required at end
• criteria : It is the iteration termination criteria
• attempts: Flag to specify the number of times the algorithm is executed
• flags: how initial centers are taken
– Output parameters
• retval: the sum of squared distances from each point to its corresponding center
• bestLabels: the label array, where each element is marked ‘0’, ‘1’, …
• centers: array of cluster centers
Image Segmentation
• Preprocessing
• Convert the MxNx3 image into a Kx3 matrix (K = MxN)

import numpy as np
import cv2
from matplotlib import pyplot as plt

image = cv2.imread('coins.jpg')
# reduce noise and make the image smoother
image = cv2.GaussianBlur(image, (7, 7), 0)

# each row is now a vector in the 3-D color space (BGR)
vectorized = image.reshape(-1, 3)
# convert the uint8 values to float32 (opencv requirement)
vectorized = np.float32(vectorized)
Image Segmentation
• k-means algorithm

# define number of segments
segments = 2

# OpenCV k-means function
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(vectorized, segments, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# assign every pixel a color based on the label map
res = center[label.flatten()]
# reshape to image size
segmented_map = res.reshape((image.shape))
result = segmented_map.astype(np.uint8)
cv2.imwrite("segmented.jpg", result)
Feature Matching
• SIFT
– With OpenCV 3 came a big push to move many of these “non-free” modules out of the default OpenCV install and into the opencv_contrib package
– To get access to the original SIFT and SURF implementations found in OpenCV 2.4.X, you need to pull down both the opencv and opencv_contrib repositories from GitHub and then compile and install OpenCV 3 from source
– (Note: the SIFT patent has since expired, and SIFT is included in the main opencv-python package from version 4.4 onward)
• ORB (Oriented FAST and Rotated BRIEF)
– An efficient alternative to SIFT or SURF
– A fusion of the FAST keypoint detector and the BRIEF descriptor
Feature Matching
• Example

import numpy as np
import cv2

img1 = cv2.imread('church.jpg', 0)      # queryImage
img2 = cv2.imread('church_part.jpg', 0) # trainImage

# initiate detector
orb = cv2.ORB_create()

# find the keypoints and descriptors
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
Feature Matching
• Brute-Force matcher
– It takes the descriptor of one feature in the first set and matches it with all other features in the second set using some distance calculation
– The closest one is returned
Face Tracking
• Object detection using Haar feature-based cascade classifiers is an effective object detection method
– Group the features into different stages of classifiers and apply them one-by-one
– If a window fails the first stage, discard it
– If it passes, apply the second stage of features and continue the process
– A window which passes all stages is a face region
• OpenCV contains many pre-trained classifiers for faces, eyes, smiles, etc.
• Those XML files are stored in the opencv/data/haarcascades/ folder
Face Tracking
• Example
• Result
– Input image → output image with detected faces outlined
Deep Learning Library
