Lecture 4
What is this?
[Figure: an input image and its intensity histogram; x-axis: intensity (black pixels, gray), y-axis: pixel count. Source: K. Grauman]
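Not from the slides, just to make the idea concrete: a grayscale image is a grid of intensity values, and the plot is its intensity histogram. A minimal NumPy sketch, assuming an 8-bit image (the random image here is purely illustrative):

```python
import numpy as np

# Hypothetical 8-bit grayscale image (intensities 0..255).
image = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)

# Histogram: x-axis = intensity (0..255), y-axis = pixel count.
counts, bin_edges = np.histogram(image, bins=256, range=(0, 256))

print(counts.shape)  # (256,) -- one bin per intensity value
print(counts.sum())  # total number of pixels: 480 * 640
```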
Deep Learning: Semantic Segmentation, Classification + Localization, Object Detection, Instance Segmentation
[Figure: two example scenes labeled per-pixel as Sky, Trees, Cat / Cow, Grass]
Semantic segmentation with a fully convolutional network: the input image (3 x H x W) passes through a stack of convolutions that produce class scores (C x H x W), and the argmax over classes gives per-pixel predictions (H x W).

With downsampling and upsampling inside the network:
Input: 3 x H x W → High-res: D1 x H/2 x W/2 → Med-res: D2 x H/4 x W/4 → Low-res: D3 x H/4 x W/4 → Med-res: D2 x H/4 x W/4 → High-res: D1 x H/2 x W/2 → Predictions: H x W

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al., “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
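A minimal PyTorch sketch of the downsample-then-upsample design; the channel widths (D1, D2, D3), the use of strided convolutions for downsampling, and the transposed convolutions for upsampling are illustrative assumptions, not the exact architectures of the cited papers:

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Illustrative fully convolutional net: downsample inside the
    network, then upsample back to per-pixel class scores."""
    def __init__(self, num_classes=21, d1=64, d2=128, d3=256):
        super().__init__()
        self.down1 = nn.Conv2d(3, d1, 3, stride=2, padding=1)   # 3 x H x W -> D1 x H/2 x W/2
        self.down2 = nn.Conv2d(d1, d2, 3, stride=2, padding=1)  # -> D2 x H/4 x W/4
        self.mid = nn.Conv2d(d2, d3, 3, padding=1)              # -> D3 x H/4 x W/4 (low-res)
        self.up1 = nn.ConvTranspose2d(d3, d1, 4, stride=2, padding=1)           # -> D1 x H/2 x W/2
        self.up2 = nn.ConvTranspose2d(d1, num_classes, 4, stride=2, padding=1)  # -> C x H x W
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.down1(x))
        x = self.relu(self.down2(x))
        x = self.relu(self.mid(x))
        x = self.relu(self.up1(x))
        return self.up2(x)  # scores: C x H x W

scores = TinyFCN()(torch.randn(1, 3, 64, 64))  # (1, C, 64, 64)
preds = scores.argmax(dim=1)                   # (1, 64, 64) per-pixel class labels
print(preds.shape)
```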
In-network upsampling with max unpooling: during max pooling, the network remembers which position in each window held the maximum; the corresponding unpooling layer (input: 2 x 2, output: 4 x 4) places each value back at that remembered position and fills the rest with zeros. Downsampling and upsampling layers are used as corresponding pairs, with the rest of the network in between.
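A minimal PyTorch sketch of such a corresponding pooling/unpooling pair; the 4 x 4 input values are just an example:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # downsampling half of the pair
unpool = nn.MaxUnpool2d(2, stride=2)                    # matching upsampling half

x = torch.tensor([[[[1., 2., 6., 3.],
                    [3., 5., 2., 1.],
                    [1., 2., 2., 1.],
                    [7., 3., 4., 8.]]]])

pooled, indices = pool(x)           # 2 x 2 maxima plus the positions they came from
# ... the "rest of the network" would transform `pooled` here ...
restored = unpool(pooled, indices)  # 4 x 4: values back at remembered positions, zeros elsewhere

print(pooled.squeeze())    # [[5., 6.], [7., 8.]]
print(restored.squeeze())  # zeros everywhere except the four max positions
```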
Learnable upsampling with transpose convolution (e.g. input: 2 x 2, output: 4 x 4): each input value gives the weight for a copy of the filter, and the output contains these input-weighted copies of the filter, summed wherever the copies overlap in the output.

1D example with stride 2: input [a, b], filter [x, y, z], output [ax, ay, az + bx, by, bz].
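A minimal NumPy sketch of that 1D, stride-2 transpose convolution: each input element scales a copy of the filter, copies are placed two output positions apart, and overlaps are summed:

```python
import numpy as np

def transpose_conv1d(inp, filt, stride=2):
    """Sum input-weighted copies of `filt`, placed `stride` apart in the output."""
    out = np.zeros(stride * (len(inp) - 1) + len(filt))
    for i, v in enumerate(inp):
        out[i * stride : i * stride + len(filt)] += v * filt
    return out

a, b = 2.0, 3.0             # input values
x, y, z = 1.0, 10.0, 100.0  # filter taps
print(transpose_conv1d(np.array([a, b]), np.array([x, y, z])))
# [ax, ay, az + bx, by, bz] = [2., 20., 203., 30., 300.]
```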
Object detection: each image needs a different number of outputs.
- One object: CAT: (x, y, w, h) → 4 numbers
- Four objects: DOG: (x, y, w, h), DOG: (x, y, w, h), CAT: (x, y, w, h), DUCK: (x, y, w, h) → 16 numbers
- Many objects: DUCK: (x, y, w, h), DUCK: (x, y, w, h), … → many numbers
Object detection as classification, using a sliding window: apply a CNN to many different crops of the image, classifying each crop as an object class or as background, e.g.:
- Dog? NO, Cat? NO, Background? YES
- Dog? YES, Cat? NO, Background? NO
- Dog? YES, Cat? NO, Background? NO
- Dog? NO, Cat? YES, Background? NO
- Dog? NO, Cat? YES, Background? NO

Problem: the CNN needs to be applied to a huge number of locations and scales, which is very computationally expensive!
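A minimal sketch of that brute-force loop, assuming a classifier function `classify_crop` (hypothetical, standing in for the CNN) that returns per-class scores for a crop; the crop size and stride are illustrative:

```python
import numpy as np

def sliding_window_detect(image, classify_crop, crop=64, stride=32):
    """Classify every crop; keep the ones not labeled as background."""
    detections = []
    height, width = image.shape[:2]
    for top in range(0, height - crop + 1, stride):
        for left in range(0, width - crop + 1, stride):
            window = image[top:top + crop, left:left + crop]
            scores = classify_crop(window)  # e.g. {"dog": ..., "cat": ..., "background": ...}
            label = max(scores, key=scores.get)
            if label != "background":
                detections.append((left, top, crop, crop, label, scores[label]))
    return detections

# Dummy classifier just to show the call pattern: everything is background.
dummy = lambda w: {"dog": 0.1, "cat": 0.1, "background": 0.8}
print(sliding_window_detect(np.zeros((128, 128)), dummy))  # []
```

Even this small example runs the classifier on every (location, scale) combination, which is why the approach is so expensive.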
[Figure: a separate ConvNet forward pass is run for each crop]
YOLO: You Only Look Once
- 7x7 grid
- 2 bounding boxes / cell
- 20 classes

Split the image into a grid. Each cell predicts bounding boxes and confidences, P(Object). Each cell also predicts a class probability, P(Class | Object), over classes such as Bicycle, Car, Dog, and Dining Table. Combine the box and class predictions, then finally do non-maximum suppression and threshold the detections (sketched below). YOLO also generalizes well to new domains.

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
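A minimal NumPy sketch of those last two steps, combining the per-box confidence P(Object) with the per-cell class probabilities P(Class | Object) and then running greedy non-maximum suppression; the shapes, thresholds, and IoU helper below are illustrative assumptions, not YOLO's exact post-processing:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best box, drop heavily overlapping ones, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        mask = np.array([iou(boxes[best], boxes[i]) < iou_thresh for i in rest], dtype=bool)
        order = rest[mask]
    return keep

# Class-specific confidence = P(Object) * P(Class | Object).
p_object = np.array([0.9, 0.8, 0.1])                      # one confidence per predicted box
p_class = np.array([[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]])  # per-box class probabilities
class_scores = p_object[:, None] * p_class                # shape (boxes, classes)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
dog_scores = class_scores[:, 0]
detected = dog_scores > 0.2                               # threshold the detections
print(nms(boxes[detected], dog_scores[detected]))         # indices of surviving boxes
```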