
Instance Segmentation

Riley Simmons-Edler, Berthy Feng


Instance Segmentation Task

● Label each foreground pixel with object and instance
● Object detection + semantic segmentation

Slide Credit: Kaiming He
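
To make the task definition concrete, here is a minimal sketch on a toy strip of six pixels. The class ids, instance ids, and the helper name instance_labels are hypothetical and chosen only for illustration. Semantic segmentation alone gives one class id per pixel; instance segmentation additionally tells the two cats apart.

```python
# Toy 1x6 strip of pixels containing two cats and one dog (made-up data).
# 0 means background in both maps.
semantic = [0, 1, 1, 1, 2, 0]   # class per pixel: 1 = cat, 2 = dog
instance = [0, 1, 2, 2, 1, 0]   # instance index within each class


def instance_labels(semantic, instance):
    """Pair class id and instance id for every foreground pixel."""
    return [(c, i) if c != 0 else None for c, i in zip(semantic, instance)]


labels = instance_labels(semantic, instance)
distinct_objects = {lab for lab in labels if lab is not None}
distinct_classes = {c for c in semantic if c != 0}

print(labels)
print(len(distinct_classes), "classes,", len(distinct_objects), "object instances")
```

The semantic map sees two classes, while the combined labels separate three objects.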


In This Lecture...

● Microsoft COCO dataset
● Mask R-CNN (fully supervised)
● Mask^X R-CNN (partially supervised)

Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, et al. “Microsoft COCO: Common Objects in Context.” arXiv, 2015.
Previous Datasets
● ImageNet: many object categories
● PASCAL VOC: object detection in natural images, small number of classes
● SUN: labeling scene types and commonly occurring objects, but not many instances per category
Image Credit: Tsung-Yi Lin et al.
Goal: Push research in scene understanding

1. Detecting non-iconic views
2. Contextual reasoning between objects
3. Precise 2D localization of objects
MS COCO Dataset
❖ 91 object classes
❖ 328,000 images
❖ 2.5 million labeled instances

Image Credit: Tsung-Yi Lin et al.


Image Collection & Annotation
Object Categories

Image Credit: Tsung-Yi Lin et al.


Non-Iconic Image Collection

Image Credit: Tsung-Yi Lin et al.


Annotation

Image Credit: Tsung-Yi Lin et al.


Dataset Evaluation
Statistics

Image Credit: Tsung-Yi Lin et al.


Statistics

Image Credit: Tsung-Yi Lin et al.


COCO Detection Challenge

Image Credit: Tsung-Yi Lin et al.


COCO Keypoint Challenge

Image Credit: Tsung-Yi Lin et al.


COCO Stuff Challenge

Image Credit: Tsung-Yi Lin et al.


COCO Places Challenges

Image Credit: Tsung-Yi Lin et al.


Mask R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. “Mask R-CNN.” ICCV, 2017.
Faster R-CNN
Fast R-CNN

Image Credit: Shaoqing Ren et al. Image Credit: Tomasz Grel


Insight: Region Proposal and Detection Use the Same Features

Image Credit: Shaoqing Ren et al.


Faster R-CNN = RPN + Fast R-CNN
RPN = Fully Convolutional Network
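
As a rough illustration of what the RPN scores at each feature-map location, the sketch below generates anchor boxes for one location. It assumes the commonly cited Faster R-CNN defaults of three scales and three aspect ratios (nine anchors per location). The function name anchors_at and the choice to define the ratio as height over width are assumptions made for this example.

```python
import math


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchor boxes centred on (cx, cy).

    Each anchor has area scale**2; ratio is taken as height / width,
    so ratio 0.5 gives a wide box and ratio 2.0 a tall one.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


boxes = anchors_at(300.0, 200.0)
print(len(boxes))
for x0, y0, x1, y1 in boxes[:3]:
    print(round(x1 - x0, 1), round(y1 - y0, 1))
```

Because the RPN is fully convolutional, the same small set of anchors is evaluated at every spatial position of the shared feature map.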
Extending to Instance Segmentation
Visual Perception Problems

Slide Credit: Kaiming He


Instance Segmentation Methods

Slide Credit: Kaiming He


Insight: Mask Prediction in Parallel

Slide Credit: Kaiming He


RoIPool

Image Credit: Tomasz Grel


RoIPool

Slide Credit: Kaiming He


RoIAlign

Slide Credit: Kaiming He
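
The difference between RoIPool and RoIAlign can be sketched on a tiny feature map. RoIPool snaps real-valued RoI coordinates to integer cells, while RoIAlign keeps them and reads the feature map with bilinear interpolation. This is a simplified sketch: it takes a single sample point, whereas the Mask R-CNN paper samples several points per bin and aggregates them. The function names and the 3x3 feature map are made up for illustration.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp the sample point to the valid range of the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feat, y, x):
    """RoIPool-style lookup: snap non-negative coordinates to an integer cell."""
    return feat[int(y)][int(x)]


feat = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]

print(quantized_sample(feat, 0.5, 1.5))  # reads cell (0, 1) -> 1.0
print(bilinear_sample(feat, 0.5, 1.5))   # blends four neighbours -> 3.0
```

The quantized lookup is off by half a cell in each direction here, which is the kind of misalignment that matters for pixel-accurate masks.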


Mask R-CNN
Mask R-CNN Results
Examples

● Mask AP = 35.7

Image Credit: Kaiming He et al.
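
Mask AP on COCO is built on the intersection-over-union between predicted and ground-truth masks, averaged over IoU thresholds from 0.50 to 0.95. The sketch below shows only the mask IoU step on two small binary masks. The masks and the function name mask_iou are illustrative, and the full AP computation (matching, ranking by score, precision-recall integration) is not shown.

```python
def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks given as lists of rows of 0/1."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 0.0


pred = [[1, 1, 0],
        [1, 1, 0]]
gt = [[0, 1, 1],
      [0, 1, 1]]

iou = mask_iou(pred, gt)
print(round(iou, 3))

# IoU thresholds used for COCO-style averaging: 0.50, 0.55, ..., 0.95.
thresholds = [0.5 + 0.05 * k for k in range(10)]
print(sum(iou >= t for t in thresholds), "of", len(thresholds), "thresholds passed")
```

With an IoU of one third, this prediction would not count as a match at any of the ten thresholds.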


Comparisons

Image Credit: Kaiming He et al.


Comparisons

Image Credit: Kaiming He et al.


Application: Human Pose Estimation

Image Credit: Kaiming He et al.


Mask R-CNN Recap

● Add a parallel mask prediction head to Faster R-CNN
● RoIAlign allows for precise localization
● Mask R-CNN improves on the AP of the previous state of the art and can be applied to human pose estimation
Learning to Segment Every Thing
Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. “Learning to Segment Every Thing.” arXiv, 2017.
Partially Supervised Model
Motivation for a Partially Supervised Model

A = set of object categories with complete mask annotations
B = set of object categories with only bounding boxes (no segmentation annotations)

How can we learn to segment every category in C = A ∪ B?

Image Credit: Ronghang Hu et al.


Transfer Learning

Image Credit: Ronghang Hu et al.


Weight Transfer Function

Image Credit: Ronghang Hu et al.
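
The idea of a weight transfer function can be sketched as follows: one shared, class-agnostic function maps each category's detection weights to mask weights, so a category with only box annotations still gets mask weights. In this sketch a single linear layer stands in for whatever small network the authors use. The dimensions, the random initialisation, and the names make_transfer, w_det, and w_seg are assumptions for illustration.

```python
import random


def make_transfer(in_dim, out_dim, seed=0):
    """Build a class-agnostic linear map from detection weights to mask weights."""
    rng = random.Random(seed)
    # theta is shared by all categories; it would be learned from classes in A.
    theta = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def transfer(w_det_c):
        return [sum(t * w for t, w in zip(row, w_det_c)) for row in theta]

    return transfer


transfer = make_transfer(in_dim=4, out_dim=6)

# Hypothetical per-category detection weights (cat is in A, zebra is in B).
w_det = {
    "cat": [0.2, -0.1, 0.4, 0.0],
    "zebra": [0.3, 0.1, -0.2, 0.5],
}

# The same function produces mask weights for every category, including
# zebra, which has no mask annotations of its own.
w_seg = {c: transfer(w) for c, w in w_det.items()}
print({c: len(w) for c, w in w_seg.items()})
```

The design choice is that only the shared parameters need mask supervision; the per-category inputs come from box supervision, which is available for every class.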


Training
● Train the bounding box head using standard box detection losses on all classes in A ∪ B
● Train the mask head and weight transfer function using mask loss on classes in A

Image Credit: Ronghang Hu et al.
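
The two training rules above amount to masking the mask loss by category set. A minimal sketch, assuming per-sample loss values have already been computed elsewhere; the category names, the loss numbers, and the function name training_loss are made up for illustration.

```python
A = {"cat", "dog"}   # categories with mask annotations
B = {"zebra"}        # categories with box annotations only


def training_loss(samples):
    """Sum box loss over all samples, but mask loss only for categories in A."""
    box_total = 0.0
    mask_total = 0.0
    for s in samples:
        box_total += s["box_loss"]          # every class in A or B contributes
        if s["category"] in A:
            mask_total += s["mask_loss"]    # only classes in A have mask targets
    return box_total + mask_total


samples = [
    {"category": "cat", "box_loss": 0.5, "mask_loss": 0.25},
    {"category": "zebra", "box_loss": 0.75, "mask_loss": None},  # no mask label
]

print(training_loss(samples))
```

The zebra sample still improves the box head, and through the weight transfer function its detection weights can later be mapped to mask weights.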


Stage-Wise Training
1. Detection training
2. Segmentation training
● Train detection once and then fine-tune weight transfer function
● Inferior performance

Image Credit: Ronghang Hu et al.


End-to-End Joint Training

● Jointly train detection head and mask head end-to-end
● Want detection weights to stay constant between A and B

Image Credit: Ronghang Hu et al.


End-to-End Training Better

Image Credit: Ronghang Hu et al.


Mask Prediction
Baseline: Class-agnostic FCN mask prediction

Extension: FCN+MLP mask heads

Image Credit: Ronghang Hu et al.


Results
Examples

Image Credit: Ronghang Hu et al.


Comparisons

Image Credit: Ronghang Hu et al.


Segmenting Everything

Image Credit: Ronghang Hu et al.
