
Od Segment 221219 043435

The document discusses deep learning techniques for object detection and segmentation, comparing methods like RCNN, YOLO, and SSD. It explains the concepts of semantic and instance segmentation, highlighting architectures such as U-Net and Mask R-CNN. Additionally, it outlines various applications in fields like robotics, medical imaging, and autonomous vehicles.

Deep learning – Object Detection and Segmentation

Classification vs. Detection


[Figure: classification assigns the single label "Dog" to the image; detection labels each dog separately.]
Object Detection
[Figure: detected objects labeled "deer" and "cat".]
Object Detection as Classification
[Figure: an image region is passed to a CNN, which classifies it as deer, cat, or background.]
Object Detection as Classification with Sliding Window
[Figure: each window position is passed to a CNN, which classifies it as deer, cat, or background.]
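The sliding-window idea can be sketched roughly as follows. This is a minimal illustration, not the deck's demo code: the window size, the stride, and the `classify` callback (a stand-in for the CNN) are all assumptions made for the example.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (left, top, right, bottom) boxes that cover the image."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (left, top, left + win_w, top + win_h)


def detect(img_w, img_h, classify, win=(64, 64), stride=32):
    """Classify every window and keep the ones that are not background.

    `classify` is a hypothetical stand-in for the CNN: it takes a box
    and returns a (label, score) pair.
    """
    detections = []
    for box in sliding_windows(img_w, img_h, win[0], win[1], stride):
        label, score = classify(box)
        if label != "background":
            detections.append((box, label, score))
    return detections


def toy_classifier(box):
    """Assumed toy rule: a window is a 'cat' if it contains pixel (100, 100)."""
    left, top, right, bottom = box
    if left <= 100 < right and top <= 100 < bottom:
        return ("cat", 0.9)
    return ("background", 0.9)


windows = list(sliding_windows(128, 128, 64, 64, 32))
found = detect(128, 128, toy_classifier)
```

Every window costs one classifier call, which is why the number of windows (and therefore scales and strides) dominates the running time of this approach.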
Object Detection as Classification with Box Proposals
Histogram of Oriented Gradients (HOG) - 1986
Example:
Demo code:
HOG\HOG.py
Object Detection

• The RCNN Object Detector (2014)
• The Fast RCNN Object Detector (2015)
• The Faster RCNN Object Detector (2015)
• The YOLO Object Detector (2016)
• The SSD Object Detector (2016)
• Mask-RCNN (2017)
RCNN

https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick et al. CVPR 2014.
Fast-RCNN

Idea: No need to recompute features for every box independently; compute them once for the whole image and share them across boxes.
Regress refined bounding box coordinates.

https://arxiv.org/abs/1504.08083
https://github.com/sunshineatnoon/Paper-Collection/blob/master/Fast-RCNN.md
Fast R-CNN. Girshick. ICCV 2015.
Faster-RCNN

Idea: Integrate the bounding box proposals as part of the CNN predictions.

https://arxiv.org/abs/1506.01497
Ren et al. NIPS 2015.
YOLO- You Only Look Once

Idea: No bounding box proposals. Predict a class and a box for every location in a grid.

https://arxiv.org/abs/1506.02640
Redmon et al. CVPR 2016.


YOLO- You Only Look Once

Divide the image into 7x7 cells.
Each cell is trained as a detector.
The detector needs to predict the object's class distribution.
The detector has 2 bounding-box predictors to predict bounding boxes and confidence scores.

Demo Code:
YOLO\ytest.py

https://arxiv.org/abs/1506.02640
Redmon et al. CVPR 2016.
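The grid layout above fixes the size of the network output. The sketch below works through that arithmetic under two assumptions that are not stated on the slide: each box predictor outputs 5 numbers (x, y, w, h, confidence), and there are 20 classes, as in the PASCAL VOC setup of the YOLO paper. The helper names are hypothetical.

```python
def yolo_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor: one vector per grid cell.

    Assumes 5 values per box (x, y, w, h, confidence) and one class
    distribution shared by all boxes of a cell.
    """
    per_cell = boxes_per_cell * 5 + num_classes
    return (grid, grid, per_cell)


def cell_for_point(x, y, img_w, img_h, grid=7):
    """Return the (row, col) of the grid cell that contains pixel (x, y)."""
    col = min(int(x / img_w * grid), grid - 1)
    row = min(int(y / img_h * grid), grid - 1)
    return (row, col)


shape = yolo_output_shape()
center_cell = cell_for_point(224, 224, 448, 448)
corner_cell = cell_for_point(447, 0, 448, 448)
```

With these assumptions the output is a 7 x 7 x 30 tensor, and an object is assigned to the cell that contains its center.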


SSD: Single Shot Detector

Idea: Similar to YOLO, but with a denser grid map and multiscale grid maps, plus data augmentation, hard negative mining, and other design choices in the network.
Liu et al. ECCV 2016.
Non-Max Suppression: Non-Max Suppression (NMS) is a technique used to select one bounding box for an object when an object detection algorithm (for example, Faster R-CNN or YOLO) detects multiple bounding boxes with varying probability scores. The overlap between two boxes is measured with IoU (Intersection over Union).

Demo Code:
NMS\Nms.py
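A minimal sketch of IoU and greedy NMS is given below. It is not the contents of the deck's demo script; the box format (x1, y1, x2, y2) and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap either of them and survives.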
Segmentation
What is the difference?

In the left image, every pixel belongs to a particular class (either background or person), and all the pixels belonging to a particular class are shown in the same color (background as black and person as pink). This is an example of semantic segmentation.

The right image also assigns a class to each pixel. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation.
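The difference can also be seen in the label maps themselves. The toy example below is an assumption made for illustration (the 3x6 grid and the ids are invented): an instance map stores one id per object, and collapsing the ids to their classes gives the semantic map.

```python
# Toy instance map: 0 = background, 1 = Person 1, 2 = Person 2.
instance_map = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

# Class of every instance id (both persons share the class "person").
instance_class = {0: "background", 1: "person", 2: "person"}


def to_semantic(inst_map, inst_class):
    """Collapse instance ids to class labels: the semantic segmentation."""
    return [[inst_class[i] for i in row] for row in inst_map]


def count_instances(inst_map, inst_class, cls):
    """Number of distinct objects of class `cls` in the instance map."""
    return len({i for row in inst_map for i in row if inst_class[i] == cls})


semantic_map = to_semantic(instance_map, instance_class)
num_persons = count_instances(instance_map, instance_class, "person")
```

The semantic map can always be derived from the instance map, but not the other way round: once both persons carry the same label, the information about which pixels belong to which person is gone.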
Thresholding
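As a rough sketch of the simplest form of this technique, global thresholding marks a pixel as foreground when its intensity exceeds a fixed value. The slide does not say which variant is meant, so the fixed threshold and the toy image below are assumptions.

```python
def threshold(image, t):
    """Global thresholding: 1 (foreground) where intensity > t, else 0."""
    return [[1 if px > t else 0 for px in row] for row in image]


# Toy grayscale image with a bright 2x2 object on a dark background.
gray = [
    [10, 12, 11, 9],
    [10, 200, 210, 9],
    [11, 205, 220, 10],
    [10, 11, 12, 9],
]

mask = threshold(gray, 128)
```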

Edge Segmentation
Deep Learning-based methods

Convolutional Encoder-Decoder Architecture

SegNet - 2015
Mask R-CNN

1. We take an image as input and pass it to the ConvNet, which returns the feature map for that image.
2. A region proposal network (RPN) is applied to these feature maps. This returns the object proposals along with their objectness scores.
3. A RoI pooling layer (RoIAlign in Mask R-CNN) is applied to these proposals to bring all of them down to the same size.
4. Finally, the proposals are passed to fully connected layers that classify them and output the bounding boxes for the objects. A parallel mask branch also returns the mask for each proposal.
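Step 3 can be sketched as follows. This is plain RoI max pooling on a single-channel feature map, written as an illustration under assumed names and sizes; Mask R-CNN itself uses RoIAlign, which samples with bilinear interpolation instead of taking the maximum over integer bins.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Pool a region of any size down to a fixed out_h x out_w grid.

    feature_map: list of rows (one channel).
    roi: (row0, col0, row1, col1) with exclusive end indices.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Row range of bin i (floor for the start, ceil for the end),
        # forced to contain at least one row.
        rs = r0 + (i * h) // out_h
        re = max(rs + 1, r0 + ((i + 1) * h + out_h - 1) // out_h)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = max(cs + 1, c0 + ((j + 1) * w + out_w - 1) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# Toy 4x6 feature map whose value at (r, c) is r * 6 + c.
fmap = [[r * 6 + c for c in range(6)] for r in range(4)]

whole = roi_max_pool(fmap, (0, 0, 4, 6))   # a 4x6 proposal
small = roi_max_pool(fmap, (1, 1, 4, 3))   # a 3x2 proposal
```

Both proposals come out as 2x2 grids even though they differ in size, which is what allows the following fully connected layers to accept them.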
U-Net – medical image segmentation

U-Net: The U-Net addresses problems that general CNNs face in medical image segmentation by adopting a symmetric encoder-decoder structure with skip connections.

Unlike common image segmentation, medical images usually contain noise and show blurred boundaries. It is therefore very difficult to detect or recognize objects in medical images using low-level image features alone.

Meanwhile, it is also impossible to obtain accurate boundaries from semantic (high-level) features alone, because they lack image detail. The U-Net effectively fuses low-level and high-level image features by combining low-resolution and high-resolution feature maps through skip connections, which makes it well suited to medical image segmentation tasks.

Currently, the U-Net has become the benchmark for most medical image segmentation tasks and has inspired many meaningful improvements.
The low-level information helps to improve accuracy. The high-level information helps to extract complex features.
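The fusion through a skip connection can be sketched as an upsampling step followed by a channel-wise concatenation. The example below is a simplified illustration with assumed names and sizes, not the deck's training script: nearest-neighbour upsampling stands in for the learned up-convolution of the original U-Net, and feature maps are plain nested lists.

```python
def upsample_nearest_2x(channel):
    """Double height and width by repeating each value."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_with_skip(encoder_channels, decoder_channels):
    """Upsample the low-resolution decoder channels and concatenate them
    with the high-resolution encoder channels along the channel axis."""
    upsampled = [upsample_nearest_2x(ch) for ch in decoder_channels]
    enc_size = (len(encoder_channels[0]), len(encoder_channels[0][0]))
    up_size = (len(upsampled[0]), len(upsampled[0][0]))
    if enc_size != up_size:
        raise ValueError("spatial sizes must match before concatenation")
    return encoder_channels + upsampled


# One high-resolution 4x4 encoder channel (low-level detail).
encoder_high_res = [[[1] * 4 for _ in range(4)]]
# Two low-resolution 2x2 decoder channels (high-level features).
decoder_low_res = [[[5, 6], [7, 8]], [[0, 0], [0, 9]]]

fused = fuse_with_skip(encoder_high_res, decoder_low_res)
```

The fused result has 3 channels at 4x4 resolution, so the convolutions that follow see both the fine detail from the encoder and the semantic features from the decoder.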

Demo code:
UNET\runtrain.py
Annotation
https://www.mdpi.com/2071-1050/13/3/1224/pdf
Image segmentation applications
1. Robotics (Machine Vision)
• Instance segmentation for robotic grasping
• Recycling object picking
• Autonomous navigation and SLAM

https://youtu.be/aZkmeGIWZVw

2. Medical imaging
Medical image segmentation is the process of extracting the desired object (organ) from a medical image (2D or 3D).
• X-Ray segmentation
• CT scan organ segmentation
• Dental instance segmentation
• Digital pathology cell segmentation
• Surgical video annotation

https://youtu.be/wYdI12EN00M

3. Self-Driving Cars
• Drivable surface semantic segmentation
• Car and pedestrian instance segmentation
• In-vehicle object detection (items left behind by passengers)
• Pothole detection and segmentation

and many more.
