Mask R-CNN

Benedict Aryo presents a session on Mask R-CNN and its application using Detectron 2, emphasizing the importance of understanding objectives in computer vision rather than just learning models. He discusses the evolution of image classification to instance segmentation and highlights the differences between Mask R-CNN and Faster R-CNN. The presentation includes open-source resources for attendees to access the code and materials for practical application.

Uploaded by

BennedictLuisant

Mask R-CNN using Detectron 2

Slide 1:
Hi, thank you for the opportunity to present this sharing session.
My name is Benedict Aryo (introduction).

Today I'm going to share my learning journey about Mask R-CNN and applying it using Detectron 2.

Slide 2:
Here's the agenda for this sharing session. We will start with some intuition before we jump into the code. I will provide
the code as a Jupyter notebook file, along with this PPT & PDF file.
I've made these open source on my GitHub, so everyone can freely use both the code & the presentation; the links are
provided in the last slide.
I also put the code in Google Colab, so you guys don't have to install anything locally.
One note: I previously marked training as optional because it might take a long time, but I changed the sample dataset
at the last minute, so I think the training will be shorter.

Slide 3:
Honestly, I dug deep into Mask R-CNN because I was assigned to by Pak Sankar.
But it was kind of a blessing in disguise, since it turned out that I didn't know much about this topic.

OK, let me share my view. When I thought about Mask R-CNN, I thought of it as part of the R-CNN family, which is kind of
true, you know: they both use region proposals to detect objects,
but the difference is that it produces not only a bounding box but also a segmentation of the object, like a mask.

Slide 4:
So I thought that Mask R-CNN is kind of Faster R-CNN on steroids, or with some magic.

Because besides creating bounding boxes for object detection, it also provides information about where exactly the object
is in the image, down to the pixel level.
That's why I called it 'some magic'. My question at that time was like, 'Mask R-CNN is just Faster R-CNN with added
features, right?'

So I started questioning it, and it led me to some new information, a new point of view.

Slide 5:
It turns out, I came to the conclusion that Mask R-CNN & Faster R-CNN serve different purposes.
I understand that maybe your response is like, 'Wait... what?'

Slide 6:
So in fact,
the pictures that I showed you earlier are not Faster R-CNN & Mask R-CNN.
These pictures are not even about R-CNN at all.

The first picture, on the left, is taken from the YOLOv3 paper by Joseph Redmon. They created a model called YOLO,
version 3, using Darknet-53 as the backbone. It aims to solve object detection in real time.
And the picture on the right is YOLACT. As you may guess, it provides object instance segmentation in real time; it's
newer than Mask R-CNN & also, of course, faster.

So, like, what's the point of showing you this?

Slide 7:
I think the point is that we should focus on the objectives instead of just learning model by model.

Quoting Pak Sankar: last week he reminded me that models are something we can learn.
Our focus should be on how to solve the problems, which means we should understand the objectives of the
problems that we want to solve.

Slide 8:
OK, so let's dive into the problems in Computer Vision, especially Image Recognition.
The first one is Image Classification, which is very common.
Given a picture, we predict what object is in the picture; usually 1 image contains 1 object.
Since it's classification, you can literally use any classification model, like Logistic Regression, SVM, Random Forest,
AdaBoost, etc.

But since 2012, after AlexNet won the ImageNet competition, people started to move to the neural
network approach. Since then, almost every year there has been a new neural network architecture capable of solving image
classification with high accuracy, like AlexNet, DenseNet, ResNet, VGG, Inception, and more.

Slide 9:
OK, now that we can classify what object is in the picture, can we go beyond that and classify at the
pixel level? That means we classify which class each pixel belongs to. This is known as Semantic
Segmentation, because we are not just classifying the object but also segmenting it.
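
To make the "classify each pixel" idea concrete, here is a minimal sketch in plain Python. The class names and score values are made up for illustration, and a real model would produce the score map with a neural network; the only point is that the output is one class label per pixel.

```python
# Toy per-pixel class scores for a 2 x 3 image and 3 classes.
# The class list and the numbers are illustrative assumptions.
CLASS_NAMES = ["background", "balloon", "person"]

scores = [
    [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10], [0.10, 0.80, 0.10]],
    [[0.80, 0.10, 0.10], [0.30, 0.30, 0.40], [0.10, 0.20, 0.70]],
]


def semantic_segmentation(score_map):
    """Return an H x W label map holding the arg-max class of every pixel."""
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in score_map
    ]


label_map = semantic_segmentation(scores)
for row in label_map:
    print([CLASS_NAMES[c] for c in row])
```

Note that the label map says which class each pixel is, but nothing about which object it belongs to.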

Slide 10:
Next is Classification + Localization.
Long before the object detection that we know today, there was an approach, still relevant today, called Classification +
Localization.

This approach is an improvement on image classification: instead of just knowing what the image is (the object), it also
provides information about where the object is located.

The location is given by a bounding box, which we obtain by regressing the 4 numbers that define the box (for example x, y, width and height).
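
As a rough sketch of what "regressing 4 numbers" means, the snippet below treats a box as (x, y, width, height), compares a predicted box against a ground-truth box with a plain L1 distance (the kind of quantity a regression loss would shrink), and measures their overlap with IoU. The box format, the function names and the example values are assumptions for illustration, not taken from the slides.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to corner form (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def l1_distance(pred, target):
    """Sum of absolute differences over the 4 regressed numbers."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(box_a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


predicted = (10.0, 10.0, 20.0, 20.0)      # hypothetical model output
ground_truth = (15.0, 15.0, 20.0, 20.0)   # hypothetical annotation

print("L1 distance:", l1_distance(predicted, ground_truth))
print("IoU:", round(iou(predicted, ground_truth), 3))
```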

Slide 11:
And then, next is Object Detection.
I think it was already well explained in the previous sharing session, in the Faster R-CNN section.

Slide 12:
Next are Instance Segmentation & Panoptic Segmentation.
In short, Instance Segmentation is a combination of Semantic Segmentation and Object Detection, which we will discuss
in a minute.
And Panoptic Segmentation is kind of a combination of Semantic Segmentation & Instance Segmentation.
Unfortunately we won't discuss it, since I haven't looked at it very deeply, but I hope it might inspire you for the next
sharing session if you're interested in it.

OK, any questions?

If not, then I'm the one asking the question.
My question is: what do you think the applications of these methods are?
For example, in the precision forestry project we use object detection to count trees.

Slide 13:
OK, now we're talking about Instance Segmentation.
Actually, when we talk about Instance Segmentation, what we mean is Semantic Segmentation that has
instance-awareness.

As far as I know, cmiiw,
it was first mentioned in a paper called Instance-aware Semantic Segmentation via Multi-task Network Cascades (MNC),
co-authored by Kaiming He, who later moved to Facebook and invented Mask R-CNN.

Slide 14:
OK, so what are the differences?
As you can see here, semantic segmentation can precisely segment the object, in this case balloons, but all
the balloons belong to the same segment. Whereas object detection, as we know, can detect multiple items of the
same class; that's why it's useful for things like counting objects.
Instance segmentation is like combining the best of both worlds: we get accurate detection down to the
pixel level while still detecting many separate objects.
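
The difference in output can be shown on a toy mask. In the sketch below, the semantic mask marks every balloon pixel with the same label, so the balloons cannot be counted from it directly; the instance map gives each balloon its own id. To produce the instance map this example uses a simple connected-components flood fill, which is only a stand-in that works when objects do not touch. It is not how Mask R-CNN finds instances, and the mask values are made up.

```python
from collections import deque

# Semantic mask: 1 = "balloon", 0 = background (illustrative values).
semantic_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
]


def label_instances(mask):
    """Give every 4-connected blob of foreground pixels its own id (1, 2, ...)."""
    height, width = len(mask), len(mask[0])
    instance_ids = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or instance_ids[r][c] != 0:
                continue
            next_id += 1
            instance_ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and mask[nr][nc] == 1 and instance_ids[nr][nc] == 0:
                        instance_ids[nr][nc] = next_id
                        queue.append((nr, nc))
    return instance_ids, next_id


instance_map, num_instances = label_instances(semantic_mask)
print("balloons counted:", num_instances)
for row in instance_map:
    print(row)
```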

Slide 15:
But before we go to Mask R-CNN for Instance Segmentation, a quick recap of
the famous R-CNN family for Object Detection.
The original R-CNN used a fixed function like selective search to propose regions of interest, meaning regions
that are likely to contain an object. Then every region proposal is fed into the CNN, combined with a bbox
regressor and an SVM for classification.

As you might notice, this was very slow, because we run the CNN for every region proposal. So Fast R-CNN uses one big CNN pass for
all the proposed regions together, later combined with fully connected layers: a linear layer for bbox regression and a
linear layer with softmax for object classification.

And Faster R-CNN was kind of revolutionary, since it generates the region proposals inside the network itself. I think the
details were already explained in the previous sharing session.

Slide 16:
So here comes Mask R-CNN by Kaiming He, Ross Girshick and their co-authors. I think you know both of them: Kaiming He is a
co-creator of MNC & ResNet, and Ross created R-CNN & Fast R-CNN (Fast R-CNN while he was still at Microsoft). So it's funny, since those 2 created Mask R-CNN at
Facebook and both previously worked at Microsoft.

As you can read in the abstract of the paper, they extend Faster R-CNN by adding a branch for predicting an object mask in
parallel with the existing branch for bounding box recognition. And as mentioned in the paper, the code is
open sourced on GitHub at facebookresearch/Detectron, so this repository contains the original implementation from the
paper.
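
For reference, the extra branch also shows up in the training objective. The Mask R-CNN paper defines the loss on each sampled RoI as a multi-task sum, and the last term is the only one that the mask branch adds on top of the classification and box terms:

```latex
L = L_{cls} + L_{box} + L_{mask}
```

In the paper, L_mask is the average per-pixel binary cross-entropy, computed only on the mask predicted for the ground-truth class of the RoI.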

Slide 17:
OK, let's play a kids' game.
I put the Faster R-CNN & Mask R-CNN model architectures side by side here.
How many differences can you find between those 2?
What are they?

OK, so what happens if I remove the mask branch from Mask R-CNN?

Yes, it becomes Faster R-CNN.

Next, can Faster R-CNN use RoIAlign instead of RoIPooling?

Yes, it can. In fact, that's what happened in the Skymap modelling: they used Mask R-CNN and removed the mask branch.
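
To give some intuition for why RoIAlign matters, here is a small sketch of the core difference when reading a feature map at a fractional position: RoIPooling-style sampling snaps the coordinate to an integer cell first, while RoIAlign-style sampling keeps the fractional coordinate and uses bilinear interpolation. This is a simplification under stated assumptions: it shows a single sample point on a made-up 4 x 4 feature map, whereas the real layers work on whole RoI bins and then aggregate (max or average) over them.

```python
import math

# Made-up feature map where value = 4 * row + col, so the exact value
# at any fractional position is easy to check by hand.
feature_map = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
    [12.0, 13.0, 14.0, 15.0],
]


def sample_quantized(fmap, y, x):
    """RoIPooling-style read: snap the coordinate to an integer cell."""
    return fmap[int(math.floor(y))][int(math.floor(x))]


def sample_bilinear(fmap, y, x):
    """RoIAlign-style read: interpolate between the 4 surrounding cells."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(fmap) - 1)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


y, x = 1.5, 2.25
quantized = sample_quantized(feature_map, y, x)
aligned = sample_bilinear(feature_map, y, x)
print("quantized read:", quantized)
print("bilinear read:", aligned)
```

The quantized read throws away the fractional part of the position, and that small misalignment is what hurts pixel-level masks much more than it hurts boxes.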

Slide 18:
So here's the detail of Mask R-CNN. Given that the difference is quite small (only adding the mask branch for
segmentation), it also adds only a small overhead compared to Faster R-CNN.

Slide 19:
So here are some popular implementations of Mask R-CNN.
The Facebook Research version, Detectron, is the original implementation from the paper, but since it uses the Caffe2
framework, most people prefer the Matterport version, which is implemented in TensorFlow.
But that repository is no longer maintained, which means that if we want to use it, we need to modify the library itself so it
is compatible with newer TensorFlow versions. In some cases, you can no longer use the Matterport version on
GPU because the supported CUDA version is quite old, so you may need to downgrade your driver software, which is
quite a pain.

Slide 20:
So that's why I chose to use Detectron2. It is a ground-up rewrite of the previous version; it's new, well maintained and
well documented.

Detectron2 supports not only Mask R-CNN Instance Segmentation, but also Object Detection, Panoptic
Segmentation & Keypoint Detection.
In short, it's kind of like scikit-learn, but for Computer Vision.
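
As a rough idea of what using it looks like, here is a minimal inference sketch that follows the pattern of the official Detectron2 Colab tutorial. It assumes Detectron2 is installed and that the chosen model zoo config is available; the helper names build_predictor and segment are hypothetical, made up for this example, and this is not necessarily the exact code from the notebook of this session.

```python
# Model zoo entry for a COCO-pretrained Mask R-CNN (ResNet-50 + FPN).
CONFIG_NAME = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"


def build_predictor(score_threshold=0.5, device="cuda"):
    """Build a DefaultPredictor for a pretrained Mask R-CNN (hypothetical helper)."""
    # Imported here so this file can still be loaded on a machine
    # where Detectron2 is not installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(CONFIG_NAME))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(CONFIG_NAME)
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_threshold  # drop weak detections
    cfg.MODEL.DEVICE = device  # use "cpu" when no GPU is available
    return DefaultPredictor(cfg)


def segment(predictor, bgr_image):
    """Run inference on one BGR image array and return classes, boxes and masks."""
    instances = predictor(bgr_image)["instances"].to("cpu")
    return instances.pred_classes, instances.pred_boxes, instances.pred_masks
```

A typical call would read an image with OpenCV (which loads it in BGR order), then pass it through `segment(build_predictor(), image)`; each detected object comes back with its own class, box and binary mask.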

Slide 21:
OK, as mentioned earlier, this whole presentation & the code that I'm going to demonstrate are open source and
publicly available; you can visit the shortened links here.
You can download or clone them locally to your laptop or server.
Or, if you want to try it first, you can scroll down the page, find the
"Open in Colab" button, and try it directly there, so no local installation is needed.
I will show you in a second.
