Computer Vision ch1
Terminator 2, 1991
[Figure labels: sky, building, flag, face, banner, wall, street lamp, bus]
– Motion Estimation 1m
Haze removal
Apple’s 3D maps
https://www.youtube.com/watch?v=InIVv-LsgZE
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
Source: S. Seitz
Object recognition (in mobile phones)
Source: S. Seitz
Google Goggles
Bird Identification
Robomart
Medical imaging
Healthcare
Gist – Chili fish head
Color moment – Braised pork
FC7 – Steamed chicken feet
Viewpoint variation
Scale
Illumination
Why is computer vision difficult?
Marvin Minsky, MIT, Turing Award, 1969
Marvin Minsky, MIT
Gerald Sussman, MIT (the undergraduate)
Scope of Our Course
• Machine Learning
• Human Computer Interaction
• Image processing
• Scene understanding
• Graphics
• Object recognition
• Medical Imaging
• Motion analysis
• Computational Photography
• Neuroscience
• Multimedia
Tentative Schedule
• Week 1 (1/Sep): Introduction
• Week 2-3 (8/Sep - 15/Sep): Image filtering, color, texture, and
resampling.
• Week 4 (22/Sep): Edge detection (Quiz 1)
• Week 5-6 (29/Sep - 6/Oct): Image keypoints, descriptors,
transformation, and alignment.
• Week 7 (13/Oct): Deep Neural Networks
• Week 8 (20/Oct): Object detection and Image segmentation
• Week 9 (27/Oct): Generative Model
• Week 10 (3/Nov): Camera (Quiz 2)
• Week 11 (10/Nov): 3D and Stereo
• Week 12 (17/Nov): Depth and Structure
• Week 13 (24/Nov): Motion and Wrap Up
Course information
• Prerequisites
– A good working knowledge of programming
• We will briefly go through Pillow and OpenCV later.
– Data structures and algorithms
– Some math: linear algebra, vector calculus
• We will revisit some basic math in the lecture.
• Grading
– Project assignment (30%)
– Quiz (20%): paper exam (1 – 2 hours)
– Final exam (50%)
Project Assignment
• Literature survey (10%)
– Survey of at least 10 research papers (published in the last 4 years) on a
specific problem/topic in computer vision and image processing.
– Slide presentation through video recording.
• Course project: computer vision applications (20%)
– Source code (Colab or Jupyter Notebook, .ipynb)
– Project report
– Reports are limited to 4 pages, including figures and tables, in the CVPR style
(with name and student ID). Additional pages may contain cited references.
• Grouping
– Each group can have 1 to 3 members. For groups of more than one member, the
contribution of each member should be listed at the end of the report (this can
go in the additional pages) for reference.
– Find your groupmates NOW!!
https://cvpr2022.thecvf.com/author-guidelines
Suggested Topics
Detection/Tracking
Object/face recognition
Segmentation
You do not have to follow these suggested topics; you may
choose a topic of your own. ☺
Let’s have some fun!
Play with Colab
• Colab, or "Colaboratory", allows you to write
and execute Python in your browser, with
– Zero configuration required
– Access to GPUs free of charge
– Easy sharing
Python Programming for Image Processing
• Pillow
– Pillow is a fork of PIL, the Python Imaging Library
• Installation:
https://pillow.readthedocs.io/en/latest/installation.html
pip install Pillow
In Colab, prefix the command with an exclamation mark: !pip install Pillow
– Image Basics
• http://pillow.readthedocs.org/en/latest/handbook/concepts.html
– List of Modules
• http://pillow.readthedocs.org/en/latest/reference/index.html
Image Reading / Writing
from PIL import Image
# read image
im = Image.open("cat.jpg")
print(im.format, im.size, im.mode)  # e.g. JPEG (512, 512) RGB
im.show()  # use display(im) in Colab
# create a thumbnail (modifies im in place and preserves the aspect ratio)
newsize = (128, 128)
im.thumbnail(newsize)
# write image
outfile = "cat_thumbnail.jpg"
im.save(outfile)
Geometric / Color Transformation
# geometric transforms:
out1 = im.resize((128, 128))
out2 = im.rotate(45) # degrees counter-clockwise
out3 = im.transpose(Image.FLIP_LEFT_RIGHT)
out4 = im.transpose(Image.FLIP_TOP_BOTTOM)
# color transforms:
out5 = im.convert("L")
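The transforms above can be tried without any image file. The following is a minimal sketch on a synthetic image created with Image.new (the 200x100 red image and the variable names are assumptions for illustration, not part of the slides). It also contrasts resize(), which returns exactly the requested size, with thumbnail(), which preserves the aspect ratio.

```python
from PIL import Image

# Synthetic 200x100 (width x height) red image, so no file is needed.
im = Image.new("RGB", (200, 100), color=(255, 0, 0))

# resize() returns a new image with exactly the requested size.
resized = im.resize((128, 128))

# thumbnail() works in place and preserves the aspect ratio,
# so we apply it to a copy to keep the original untouched.
thumb = im.copy()
thumb.thumbnail((128, 128))

# rotate() keeps the canvas size by default; expand=True grows it to fit.
rotated = im.rotate(90)
rotated_expanded = im.rotate(90, expand=True)

# Flip horizontally and convert to 8-bit grayscale.
flipped = im.transpose(Image.FLIP_LEFT_RIGHT)
gray = im.convert("L")

print(resized.size, thumb.size, rotated.size, rotated_expanded.size, gray.mode)
```

Note that Pillow reports size as (width, height), whereas NumPy and OpenCV arrays are indexed as (height, width).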
• OpenCV documentation:
http://docs.opencv.org/modules/core/doc/intro.html
Installation
– Test:
• Open Python IDLE and type the following in the Python shell:
>>> import cv2
>>> print(cv2.__version__)
Reading, Colorspace Changing:
import cv2
# reading image:
image = cv2.imread('cat.jpg') # default: color, BGR
image = cv2.imread('cat.jpg', 1) # color, BGR
image = cv2.imread('cat.jpg', 0) # grayscale
image = cv2.imread('cat.jpg', -1) # unchanged (keeps an alpha channel, if any)
Reading, Colorspace Changing:
import cv2
Conversion flags (the second argument of cv2.cvtColor):
• BGR->Gray: cv2.COLOR_BGR2GRAY
• BGR->RGB: cv2.COLOR_BGR2RGB
• BGR->HSV: cv2.COLOR_BGR2HSV
Image Segmentation
• K-means Clustering
– It partitions the given data into k clusters, each represented by its centroid
– The motivation behind image segmentation using k-means is to assign a label to each pixel based on
its RGB (or HSV) values
• OpenCV
– cv2.kmeans(data, K, bestLabels, criteria, attempts, flags[, centers]) →
retval, bestLabels, centers (argument order in OpenCV 3 and later)
– Input parameters
• data: np.float32 data type, with one sample per row and each feature in its own column
• K: number of clusters required at the end
• bestLabels: input/output label array; usually passed as None in Python
• criteria: the iteration termination criteria, a tuple (type, max_iter, epsilon)
• attempts: number of times the algorithm is executed with different initial labelings; the best result is returned
• flags: how the initial centers are chosen (e.g. cv2.KMEANS_RANDOM_CENTERS or cv2.KMEANS_PP_CENTERS)
– Output parameters
• retval: the compactness, i.e. the sum of squared distances from each point to its corresponding center
• bestLabels: the label array, where each element holds the cluster index (0, 1, ...) of the corresponding sample
• centers: array of cluster centers, one row per cluster
Image Segmentation
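• Preprocessing
– Reduce noise, then convert the MxNx3 image into a Kx3 matrix (K=MxN)
import numpy as np
import cv2
from matplotlib import pyplot as plt
image = cv2.imread('coins.jpg')
# reduce noise and make the image smoother
image = cv2.GaussianBlur(image, (7, 7), 0)
# flatten the MxNx3 image into a Kx3 float32 matrix (K = MxN) for cv2.kmeans
data = image.reshape((-1, 3)).astype(np.float32)
[Figure: input image and the result after noise removal]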
Image Segmentation
• k-means algorithm
import numpy as np
import cv2
# Initiate detector
orb = cv2.ORB_create()
Face Tracking
• Haar feature-based cascade classifiers are an effective
object detection method
– Group the features into stages of classifiers and apply the stages
one by one
– If a window fails the first stage, discard it
– If it passes, apply the second stage of features and continue the
process
– A window that passes all stages is a face region
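The staged rejection described above can be sketched in a few lines of plain Python. This is a toy illustration only: the stage functions, thresholds, and four-value "windows" are hypothetical stand-ins for groups of Haar features and are not OpenCV's trained classifiers.

```python
from typing import Callable, List, Sequence

Window = Sequence[float]
Stage = Callable[[Window], bool]


def passes_cascade(window: Window, stages: List[Stage]) -> bool:
    """Return True only if the window passes every stage, stopping at the first failure."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early; later stages never run
    return True


# Hypothetical stages standing in for groups of Haar features, cheapest first.
stages: List[Stage] = [
    lambda w: sum(w) / len(w) > 0.2,  # stage 1: enough overall brightness
    lambda w: max(w) - min(w) > 0.3,  # stage 2: enough contrast
    lambda w: w[0] < w[len(w) // 2],  # stage 3: first value darker than the middle one
]

windows = {
    "blank": [0.0, 0.0, 0.0, 0.0],
    "flat": [0.5, 0.5, 0.5, 0.5],
    "face_like": [0.2, 0.9, 0.8, 0.3],
}
results = {name: passes_cascade(w, stages) for name, w in windows.items()}
print(results)
```

Most windows in a real image are rejected by the first stages, which is what keeps the method fast. In OpenCV, trained cascades are loaded with cv2.CascadeClassifier and applied with its detectMultiScale method.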
Deep Learning Library