
Fully Convolutional Neural Network

Fully convolutional networks (FCNs) are end-to-end neural networks that take input of any size and produce pixelwise outputs for tasks like semantic segmentation. FCNs modify classification networks to be fully convolutional by converting fully connected layers to convolutional layers. This allows processing entire images with variable sizes efficiently in one forward pass. FCNs achieve state-of-the-art results on semantic segmentation benchmarks while being much faster than prior methods. Code and pre-trained models for FCNs are available on various datasets.


Fully Convolutional Networks

Jon Long and Evan Shelhamer


CVPR15 Caffe Tutorial
pixels in, pixels out
- monocular depth estimation (Liu et al. 2015)
- semantic segmentation
- boundary prediction (Xie & Tu 2015)

end-to-end learning, inference in < 1/5 second
a classification network

“tabby cat”

becoming fully convolutional
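The conversion is exact: a fully connected layer over a fixed-size feature map computes the same numbers as a convolution whose kernels are the reshaped FC weights, and once expressed as convolution the net accepts any input size. A minimal NumPy sketch of this equivalence (toy sizes and variable names are ours, not the authors' code):

```python
import numpy as np

# A fully connected layer over a C x H x W feature map is equivalent to
# an H x W convolution whose kernels are the reshaped FC weights.
C, H, W, K = 256, 7, 7, 10          # toy sizes; real nets use e.g. 4096-d fc layers
rng = np.random.default_rng(0)
fc_weights = rng.standard_normal((K, C * H * W))
feat = rng.standard_normal((C, H, W))

# FC layer: flatten the feature map, then matrix-multiply.
fc_out = fc_weights @ feat.ravel()

# Equivalent convolution: reshape weights to K kernels of shape C x H x W
# and correlate each kernel with the (same-sized) feature map.
kernels = fc_weights.reshape(K, C, H, W)
conv_out = np.array([(k * feat).sum() for k in kernels])

assert np.allclose(fc_out, conv_out)
```

On a larger input, the same kernels simply slide, producing a spatial map of class scores instead of a single vector.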
upsampling output

end-to-end, pixels-to-pixels network
- conv, pool, nonlinearity
- upsampling
- pixelwise output + loss
spectrum of deep features
combine where (local, shallow) with what (global, deep)

fuse features into deep jet

(cf. Hariharan et al. CVPR15 “hypercolumn”)


skip layers

skip to fuse layers: interp + sum

end-to-end, joint learning of semantics and location

dense output by skip layer refinement
[figure: input image; stride 32 (no skips), stride 16 (1 skip), and stride 8 (2 skips) predictions; ground truth]
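A single interp + sum fusion step can be sketched as below, with toy score maps and nearest-neighbor upsampling standing in for the learned, bilinear-initialized deconvolution the actual networks use:

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbor 2x upsampling; real FCNs use a learned
    (bilinear-initialized) deconvolution instead."""
    return scores.repeat(2, axis=-2).repeat(2, axis=-1)

coarse = np.zeros((21, 8, 8))    # stride-32 class scores (toy sizes, 21 classes)
fine = np.ones((21, 16, 16))     # stride-16 class scores from a shallower layer

# Skip fusion: upsample the coarse prediction to the finer stride and sum.
fused = upsample2x(coarse) + fine
assert fused.shape == (21, 16, 16)
```

Repeating this once more against stride-8 features gives the 2-skip model in the figure.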


training + testing
- train on whole images at a time, without patch sampling
- reshape the network to take input of any size
- forward time is ~150 ms for a 500 × 500 × 21 output
[figure: FCN vs. SDS* predictions, with ground truth and input]

Relative to prior state-of-the-art SDS:
- 20% improvement in mean IoU
- 286× faster

*Simultaneous Detection and Segmentation, Hariharan et al. ECCV14
models + code
fully convolutional networks are fast, end-to-end models for pixelwise problems

- code in Caffe branch (merged soon): caffe.berkeleyvision.org
- models for PASCAL VOC, NYUDv2, SIFT Flow, and PASCAL-Context in the Model Zoo

fcn.berkeleyvision.org
github.com/BVLC/caffe
models
- PASCAL VOC: standard object segmentation
- NYUDv2: multi-modal RGB + depth scene segmentation
- SIFT Flow: multi-task semantic + geometric segmentation
- PASCAL-Context: object + scene segmentation
inference

inference script (gist)


solving

solving script (gist)


Reshape
- Decide shape on-the-fly in C++ / Python / MATLAB
- DataLayer automatically reshapes for batch size == 1
- Essentially free (only reallocates when necessary)
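As a toy illustration of why on-the-fly reshape is essentially free, here is a minimal buffer class (our own sketch, not Caffe's actual Blob implementation) that only reallocates its backing store when the requested size grows:

```python
import numpy as np

class Blob:
    """Toy blob: reshape changes the logical shape, and memory is
    reallocated only when the new element count exceeds capacity."""
    def __init__(self):
        self._buf = np.empty(0, dtype=np.float32)
        self.shape = (0,)

    def reshape(self, *shape):
        n = int(np.prod(shape))
        if n > self._buf.size:            # reallocate only on growth
            self._buf = np.empty(n, dtype=np.float32)
        self.shape = shape

    @property
    def data(self):
        return self._buf[:int(np.prod(self.shape))].reshape(self.shape)

b = Blob()
b.reshape(1, 3, 500, 500)                 # first call allocates
big = b._buf
b.reshape(1, 3, 250, 250)                 # shrinking: no reallocation
assert b._buf is big
```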
Helpful Layers
- Losses can take spatial predictions + truths
- Deconvolution / “backward convolution” can compute interpolation
- Crop: maps coordinates between layers
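Using Deconvolution for interpolation hinges on initializing its kernel with bilinear weights. A common sketch of such a kernel generator (the function name is ours):

```python
import numpy as np

def bilinear_kernel(size):
    """2D bilinear upsampling kernel of shape (size, size), suitable for
    initializing a deconvolution filter (kernel 2f, stride f gives f-x
    upsampling)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

k = bilinear_kernel(4)           # kernel size 4, stride 2 → 2x upsampling
assert k.shape == (4, 4)
assert np.isclose(k.sum(), 4.0)  # a 2x bilinear kernel sums to factor**2
```

Each class channel gets its own copy of this kernel; left fixed it performs plain bilinear interpolation, and it can also be learned end-to-end.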
FCN for Pose Estimation
Georgia Gkioxari
UC Berkeley

Input data: image, keypoints, labels
- Define an area around each keypoint as its positive neighborhood, with radius r.
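The positive-neighborhood labeling can be sketched as follows (a hypothetical helper of our own, assuming pixels within radius r of a keypoint are positive for that keypoint's class):

```python
import numpy as np

def keypoint_labels(h, w, keypoints, r):
    """keypoints: list of (x, y) pixel coordinates.
    Returns (len(keypoints), h, w) binary label maps, one per keypoint:
    a pixel is positive iff it lies within radius r of that keypoint."""
    ys, xs = np.mgrid[:h, :w]
    return np.stack([((xs - x) ** 2 + (ys - y) ** 2 <= r ** 2).astype(np.float32)
                     for x, y in keypoints])

# radius = 0.1 * im.shape[0], as in the slide's Details section
labels = keypoint_labels(100, 100, [(50, 40)], r=0.1 * 100)
assert labels.shape == (1, 100, 100)
assert labels[0, 40, 50] == 1.0   # the keypoint itself is positive
```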
Heat Map Predictions from FCN
[figure: test image with heat maps for Right Ankle, Right Knee, Right Hip, Right Wrist, Right Elbow, Right Shoulder]

Two modes in the shoulder heat map because there are two right shoulders in the image!
Heat Maps to Keypoints
PCK @ 0.2 on the LSP test set (FCN baseline mean PCK ≈ 69%; state of the art ≈ 72%)

- Ankle 56.5
- Knee 60.0
- Hip 56.6
- Wrist 62.9
- Elbow 71.8
- Shoulder 78.8
- Head 93.6
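The PCK metric can be computed as below. This is a hedged sketch: the exact normalization scale (torso vs. whole-person size) is a benchmark convention not stated on the slide, so it is left as a parameter.

```python
import numpy as np

def pck(pred, gt, scale, alpha=0.2):
    """Percentage of Correct Keypoints: a prediction counts as correct
    if it lies within alpha * scale of the ground-truth keypoint.
    pred, gt: (N, 2) arrays of (x, y); returns the fraction correct."""
    dist = np.linalg.norm(pred - gt, axis=1)
    return float((dist <= alpha * scale).mean())

gt = np.array([[10.0, 10.0], [50.0, 50.0]])
pred = np.array([[12.0, 10.0], [80.0, 50.0]])
assert pck(pred, gt, scale=100.0) == 0.5   # first within 20 px, second not
```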
Details
Architecture:
● FCN, stride 32; no data augmentation
● radius = 0.1 * im.shape[0] (no cross-validation)

Runtime on a K40:
● 0.7 sec/iteration for training (15 hrs for 80K iterations)
● 0.25 sec/image for inference over all keypoints
conclusion
fully convolutional networks are fast, end-to-end models for pixelwise problems

- code in Caffe branch (merged soon): caffe.berkeleyvision.org
- models for PASCAL VOC, NYUDv2, SIFT Flow, and PASCAL-Context in the Model Zoo

fcn.berkeleyvision.org
github.com/BVLC/caffe
