Computer_vision_part2


Computer Vision

IFT6758 - Data Science

Sources:
http://www.cs.cmu.edu/~16385/
http://cs231n.stanford.edu/2018/syllabus.html
http://www.cse.psu.edu/~rtc12/CSE486/
Announcement

• Assignment 3 is available on Gradescope:
  • Algorithmic discrimination
  • NLP (has two bonus-point questions)
  • CV (basics)
  • Due: November 28

• Grades of assignment 2 and the mid-term will be published this week.

!2
CV Pipeline

!3
Image transformations

!4
Recall: filtering

!5
Convolution

!6
Convolution

All of computer vision is convolutions (basically)

!7
Convolution

• The mathematics of many filters, such as smoothing and sharpening images and detecting edges, can be expressed in a principled manner using 2D convolution.

• Convolution in 2D operates on two images: one is the input image and the other, called the kernel, serves as a filter.

• It expresses the amount of overlap of one function as it is shifted over another function: the output image is produced by sliding the kernel over the input image.

!8
Convolution
• Convolution is the process of adding each element of the image to its local neighbors,
weighted by the kernel.
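
The sliding-and-weighting idea can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized routine: the function name `convolve2d`, the list-of-lists image format, and the choice to compute only the "valid" region (no padding at the borders) are assumptions made for this example.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of two lists of lists (no border padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Convolution flips the kernel in both directions before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Weighted sum of the pixel neighborhood under the kernel.
            out[y][x] = sum(
                flipped[i][j] * image[y + i][x + j]
                for i in range(kh)
                for j in range(kw)
            )
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box = [[1 / 9] * 3 for _ in range(3)]  # 3x3 averaging (box) kernel
smoothed = convolve2d(image, box)      # 2x2 image of local means
```

With the box kernel, each output pixel is simply the mean of the 3x3 neighborhood around it.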

!9
Convolution for 2D discrete signals

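Definition of filtering as convolution:

$$(f * I)(x, y) = \sum_{i,j=-\infty}^{\infty} f(i, j)\, I(x - i, y - j)$$

where $(f * I)$ is the filtered image, $f$ is the filter, and $I$ is the input image.

If the filter $f(i, j)$ is non-zero only within $-1 \le i, j \le 1$, then

$$(f * I)(x, y) = \sum_{i,j=-1}^{1} f(i, j)\, I(x - i, y - j)$$

The box filter kernel we saw earlier is the 3x3 matrix representation of $f(i, j)$.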

!11
Convolution Examples

!12
Convolution Examples

• A sharpening filter can be broken down into two steps: It takes a smoothed
image, subtracts it from the original image to obtain the "details" of the
image, and adds the "details" to the original image.
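
The two steps can be sketched as follows, assuming a 3x3 box filter as the smoother (the slide does not fix which smoothing filter is used). The helper `filter2d` and the test image are hypothetical. The last lines check that both steps collapse into one kernel, 2 x identity minus box.

```python
def filter2d(image, kernel):
    """Slide a square kernel over an image ('valid' region only)."""
    k = len(kernel)
    n_rows, n_cols = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j]
                for i in range(k) for j in range(k))
            for x in range(n_cols)
        ]
        for y in range(n_rows)
    ]


# A dark-to-bright vertical step edge.
image = [[10, 10, 10, 50, 50, 50] for _ in range(5)]

box = [[1 / 9] * 3 for _ in range(3)]
smoothed = filter2d(image, box)

# Crop the original to the 'valid' region so the shapes match.
original = [row[1:-1] for row in image[1:-1]]

# Step 1: details = original - smoothed. Step 2: sharpened = original + details.
sharpened = [
    [o + (o - s) for o, s in zip(orig_row, smooth_row)]
    for orig_row, smooth_row in zip(original, smoothed)
]

# Equivalent single kernel: 2 * identity - box.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sharpen_kernel = [
    [2 * identity[i][j] - box[i][j] for j in range(3)] for i in range(3)
]
sharpened_direct = filter2d(image, sharpen_kernel)
```

Next to the edge, the sharpened values undershoot the dark side and overshoot the bright side, which is what makes the edge look crisper.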

!13
Convolution Examples

• Gaussian smoothing filter: a weighted averaging filter that gives more weight to central pixels and less weight to their neighbors.
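
As an illustration, a Gaussian kernel can be built by sampling the 2D Gaussian on a grid and normalizing it. This is a sketch: the function name `gaussian_kernel` and the parameters `size` (assumed odd) and `sigma` are choices made for this example.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    raw = [
        [
            math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    total = sum(map(sum, raw))
    # Normalizing keeps the overall image brightness unchanged.
    return [[w / total for w in row] for row in raw]


kernel = gaussian_kernel(size=5, sigma=1.0)
center = kernel[2][2]   # largest weight
corner = kernel[0][0]   # smallest weight
```

A larger `sigma` spreads the weights out and smooths more strongly.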

!15
Convolution vs Correlation

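Definition of discrete 2D convolution (notice the flip):

$$(f * I)(x, y) = \sum_{i,j=-\infty}^{\infty} f(i, j)\, I(x - i, y - j)$$

Definition of discrete 2D correlation (notice the lack of a flip):

$$(f \otimes I)(x, y) = \sum_{i,j=-\infty}^{\infty} f(i, j)\, I(x + i, y + j)$$

• Most of the time the difference won't matter, because our kernels will be symmetric.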
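
The role of the flip is easiest to see in 1D with a unit impulse as input. In this small sketch (the function names are hypothetical and only "valid" positions are computed), convolution reproduces the kernel, correlation reproduces its mirror image, and a symmetric kernel gives the same result either way.

```python
def correlate1d(signal, kernel):
    """Slide the kernel as-is (no flip); 'valid' positions only."""
    k = len(kernel)
    return [
        sum(kernel[i] * signal[x + i] for i in range(k))
        for x in range(len(signal) - k + 1)
    ]


def convolve1d(signal, kernel):
    """Convolution is correlation with the kernel flipped."""
    return correlate1d(signal, kernel[::-1])


signal = [0, 0, 1, 0, 0]   # unit impulse
asymmetric = [1, 2, 3]
symmetric = [1, 2, 1]

conv_asym = convolve1d(signal, asymmetric)   # copy of the kernel
corr_asym = correlate1d(signal, asymmetric)  # mirrored copy of the kernel
conv_sym = convolve1d(signal, symmetric)
corr_sym = correlate1d(signal, symmetric)    # identical to conv_sym
```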

!16
Convolution vs correlation

• Convolution is an integral (a sum, for discrete signals) that expresses the amount of overlap of one function as it is shifted over another function.

• Convolution is a filtering operation.

• Correlation compares the similarity of two sets of data: it computes a measure of similarity of two input signals as they are shifted relative to one another.

• The correlation reaches a maximum at the shift where the two signals match best.

!17
Correlation application

• Correlation tells you how similar the signal is to the filter at any point. This is
used for image alignment, template matching and simple image matching.
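
A minimal template-matching sketch is given below. It uses zero-mean normalized cross-correlation rather than the raw correlation sum, because raw correlation also responds strongly to any bright region; this variant is a choice made for the example, not something the slide specifies. The function names and the toy image are hypothetical.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t)
    )
    return num / den if den > 0 else 0.0  # flat patches score 0


def match_template(image, template):
    """Return ((row, col), score) of the best-matching top-left position."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (y, x), score
    return best_pos, best_score


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [1, 9]]
position, score = match_template(image, template)
```

The score lies in [-1, 1] and reaches 1 where the patch matches the template up to brightness and contrast.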

Template

Original image

!18
Separable filters
A 2D filter is separable if it can be written as the product of a “column” and a “row”.

Example: the box filter

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} * \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$$

(column * row)

2D convolution with a separable filter is equivalent to two 1D convolutions (with the “column” and “row” filters).

If the image has M x M pixels and the filter kernel has size N x N:

What is the cost of convolution with a non-separable filter? $M^2 \times N^2$

What is the cost of convolution with a separable filter? $2 \times N \times M^2$
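
The equivalence can be checked numerically. The sketch below (helper names and the test image are hypothetical, and only the "valid" region is computed) filters an image once with the full 3x3 box kernel and once with a row pass followed by a column pass, then compares the per-pixel multiplication counts.

```python
def filter_rows(image, row_kernel):
    """1D filtering along each row ('valid' positions only)."""
    k = len(row_kernel)
    return [
        [
            sum(row_kernel[j] * row[x + j] for j in range(k))
            for x in range(len(row) - k + 1)
        ]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def filter_separable(image, column_kernel, row_kernel):
    """Row pass followed by a column pass (done as a row pass on the transpose)."""
    after_rows = filter_rows(image, row_kernel)
    return transpose(filter_rows(transpose(after_rows), column_kernel))


def filter_2d(image, kernel):
    """Direct 2D filtering with the full kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j]
                for i in range(kh) for j in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]


column = [1, 1, 1]
row = [1, 1, 1]
box = [[c * r for r in row] for c in column]  # outer product: 3x3 of ones

image = [[(3 * y + 5 * x) % 7 for x in range(6)] for y in range(6)]
full = filter_2d(image, box)
separable = filter_separable(image, column, row)  # same result as `full`

# Multiplications per output pixel: N*N for the 2D kernel, 2*N for two 1D passes.
N = len(row)
cost_2d, cost_separable = N * N, 2 * N
```

The saving grows with the kernel size: for N = 3 it is 9 versus 6 multiplications per pixel, and for N = 15 it is 225 versus 30.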

!24
Examples of separable filters

• Box filter, which is used as a smoothing filter.

• Sobel operator, which is commonly used for edge detection.
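
To make the Sobel case concrete, the horizontal Sobel kernel is the product of a smoothing column and a differencing row. The sketch below uses one common sign convention (conventions differ between references) and applies the kernel without flipping; the helper `filter_2d` and the test image are hypothetical.

```python
smooth = [1, 2, 1]        # column: smoothing across rows
derivative = [-1, 0, 1]   # row: central difference across columns

# Outer product of the column and the row gives the 3x3 kernel:
# [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
sobel_x = [[s * d for d in derivative] for s in smooth]


def filter_2d(image, kernel):
    """Slide the kernel over the image without flipping ('valid' region)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j]
                for i in range(kh) for j in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]


# A vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
response = filter_2d(image, sobel_x)  # non-zero only around the edge
```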

!25
CV Pipeline

!26
Edge detection

• Edges are the points in an image where the image brightness changes sharply
or has discontinuities. Such discontinuities generally correspond to:

• Discontinuities in depth

• Discontinuities in surface orientation

• Changes in material properties

• Variations in scene illumination

• Edges are important for two main reasons.

• 1) Most semantic and shape information can be deduced from them, so we can
perform object recognition and analyze perspectives and geometry of an image.

• 2) They are a more compact representation than pixels.

!27
Edge detection

!28
Characterizing edges

• We can pinpoint where edges occur from an image's intensity profile along a row or column of the image. A rapid change in the intensity function indicates an edge; it shows up as a local extremum of the function's first derivative.
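
A minimal sketch of this idea on a single image row, with made-up intensity values: the derivative is approximated by a forward difference, and the edge is placed where its magnitude peaks.

```python
# Intensities along one image row (hypothetical values).
profile = [10, 10, 11, 12, 40, 80, 82, 83, 83]

# Forward difference: derivative[i] approximates the slope between
# pixels i and i + 1.
derivative = [b - a for a, b in zip(profile, profile[1:])]

# The edge sits where the magnitude of the first derivative peaks.
edge_index = max(range(len(derivative)), key=lambda i: abs(derivative[i]))
```

Here the largest jump lies between pixels 4 and 5, so `edge_index` is 4.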

!29
Partial derivatives with Convolution

!30
Partial derivatives of an image

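[Figure: finite-difference kernels for the partial derivatives of an image in x and y; surviving kernel entries: -1, 0, 1.]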

!31
Image Gradient

!32
Intensity profile

!33
Effect of noise

!34
Effect of noise

!35
Solution: Smoothing

!36
Edge detection via Convolution

!37
Derivative of Gaussian filter

!38
Derivatives of Gaussian filter

!39
Edge detectors

!40
Canny Edge detection

!41
Corner/blob detectors

• Edges are useful as local features, but corners and small areas (blobs) are
generally more helpful in computer vision tasks. Blob detectors can be built
by extending the basic edge detector idea that we just discussed.

!42
Scale Invariant Feature Transform (SIFT)
• Keypoints are the points of interest in an image; they are analogous to the features of a given image.

• They are locations that define what is interesting in the image. Keypoints are important because, ideally, the same keypoints are found no matter how the image is modified (rotation, shrinking, expanding, distortion).

Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of computer vision 60.2 (2004): 91-110.

!43
SIFT

!44
Speeded-Up Robust Features (SURF)

• Speeded-Up Robust Features (SURF) is designed as a faster alternative to SIFT. Its authors report that it is much faster while matching or exceeding earlier detectors in robustness to image transformations.

• Both methods build on the Laplacian of Gaussian (LoG). The Laplacian kernel approximates a second derivative of the image, so it is very sensitive to noise. The image is therefore smoothed with a Gaussian kernel before the Laplacian kernel is applied, hence the name Laplacian of Gaussian. SIFT approximates the LoG with a Difference of Gaussians (DoG).

• SURF approximates the LoG with box filters (kernels). Convolutions with box filters can be computed in parallel for different scales, which is the underlying reason for the enhanced speed of SURF compared to SIFT.
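
One reason box filters are cheap is the integral image used in the SURF paper (this detail comes from the cited paper, not from the slide). Once it is built, the sum over any rectangle costs four lookups, whatever the rectangle's size. A minimal sketch with hypothetical function names:

```python
def integral_image(image):
    """ii[y][x] = sum of image[0..y-1][0..x-1]; padded with a zero row and column."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum over a rectangle using four lookups, regardless of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ii = integral_image(image)
small = box_sum(ii, top=1, left=1, height=2, width=2)  # 6 + 7 + 10 + 11
large = box_sum(ii, top=0, left=0, height=4, width=4)  # whole image
```

Because the cost does not depend on the box size, larger scales can be evaluated by enlarging the filter instead of repeatedly smoothing and downsampling the image.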

Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "Surf: Speeded up robust features." European conference on computer vision. Springer, Berlin, Heidelberg, 2006.

!45
SURF

!46
Bag of Words

Dictionary Learning:
Learn Visual Words using clustering

Encode:
build Bags-of-Words (BOW) vectors
for each image

Classify:
Train and test data using BOWs
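
The encoding step can be sketched as follows, assuming the visual dictionary (the cluster centers, e.g. from k-means) has already been learned. The function names and the toy 2-D descriptors are hypothetical; real SIFT descriptors have 128 dimensions.

```python
def nearest_word(descriptor, dictionary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        range(len(dictionary)),
        key=lambda k: sq_dist(descriptor, dictionary[k]),
    )


def encode_bow(descriptors, dictionary):
    """Normalized histogram of visual-word counts for one image."""
    counts = [0] * len(dictionary)
    for d in descriptors:
        counts[nearest_word(d, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy dictionary with three visual words and four local descriptors.
dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8], [4.7, 5.1]]
bow = encode_bow(descriptors, dictionary)
```

The resulting vector has one entry per visual word, so every image gets a fixed-length representation that a standard classifier can consume, no matter how many local features it contains.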

!47
Which object do these parts belong to?

!48
Some local features are very informative

An object as a collection of local features (bag-of-words)

• deals well with occlusion
• scale invariant
• rotation invariant

!49
BOW

• Extract features (e.g., SIFT)

• Learn a visual dictionary

!50
Image Features

!51
Image Features: Motivation

!52
Image Features: Motivation

!53
Image Features

!54
Image Features

!55
Image features vs ConvNets

!56
!57
ImageNet

• ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node.

!58
!59
AlexNet

“AlexNet is considered one of the most influential papers published in computer vision, having spurred many more papers published employing CNNs and GPUs to accelerate deep learning.”

!60
Convolutional Neural Networks

We will learn about it on Thursday!

!61
Conferences focusing on CV

• CVPR: IEEE/CVF Conference on Computer Vision and Pattern Recognition
  http://cvpr2020.thecvf.com/

• ICCV: IEEE/CVF International Conference on Computer Vision
  http://iccv2019.thecvf.com/

• ACMMM: ACM International Conference on Multimedia
  https://www.acmmm.org/2020/

• CV is one of the main topics of the major machine learning and AI conferences such as AAAI, IJCAI, ICML, NeurIPS, …

!62
