
Mask R-CNN

Benedict Aryo presents a session on Mask R-CNN and its application using Detectron 2, emphasizing the importance of understanding objectives in computer vision rather than just learning models. He discusses the evolution of image classification to instance segmentation and highlights the differences between Mask R-CNN and Faster R-CNN. The presentation includes open-source resources for attendees to access the code and materials for practical application.

Uploaded by

BennedictLuisant

Mask R-CNN using Detectron 2

Slide 1:
Hi, thank you for the opportunity to give this sharing session.
My name is Benedict Aryo (introduction).

Today I'm going to share my learning journey with Mask R-CNN and how to apply it using Detectron2.

Slide 2:
Here's the agenda for this sharing session. We will start with some intuition before we jump into the code. I will provide
the code as a Jupyter notebook file, along with this presentation as PPT & PDF files.
I've made these open source on my GitHub, so everyone can freely use both the code & the presentation. The links are
provided in the last slide.
I also made the code available in Google Colab, so you guys don't have to install anything locally.
One note: previously I marked the training part as optional because it might take a long time to train, but I
changed the sample dataset at the last minute, so I think the training will be shorter.

Slide 3:
Honestly, I dug deep into Mask R-CNN because I was assigned to by Pak Sankar.
But it was kind of a blessing in disguise, since it turned out that I didn't know much about this topic.

OK, let me share my view. When I thought about Mask R-CNN, I saw it as part of the R-CNN family, which is kind of
true, you know: they both use region proposals to detect objects.
The difference is that it produces not only a bounding box but also a mask that segments the object.

Slide 4:
So I thought that Mask R-CNN was kind of Faster R-CNN on steroids, or with some magic.

Because besides creating the bounding box for object detection, it also provides information about where exactly the
object is in the image, down to the pixel level.
That's why I called it 'some magic'. My question at that time was, 'Mask R-CNN is just Faster R-CNN with added
features, right?'

So I started questioning it, and that led me to some new information and a new point of view.

Slide 5:
It turns out, I came to the conclusion that Mask R-CNN & Faster R-CNN serve different purposes.
I understand that your response might be like, 'Wait.. What?'

Slide 6:
So, in fact,
the pictures that I showed you earlier are not from Faster R-CNN & Mask R-CNN.
These pictures are not even about R-CNN at all.

The first picture, on the left, is taken from the YOLOv3 paper by Joseph Redmon. They created a model called YOLO,
version 3, using Darknet as the backbone. It aims to solve object detection in real time.
And the right picture is from YOLACT. As you may guess, it provides instance segmentation in real time. It's newer than
Mask R-CNN & of course also faster.

So, what's the point of showing you this?


Slide 7:
I think the point is that we should focus on the objectives instead of just learning model by model.

Quoting Pak Sankar, who reminded me last week: models are something we can learn.
Our focus should be on how to solve the problems, which means we need to understand the objectives of the
problems that we want to solve.

Slide 8:
OK, so let's dive into the problems in Computer Vision, especially Image Recognition.
The first one is Image Classification. This is very common.
Given a picture, we predict what object is in the picture. Usually 1 image contains 1 object.
Since it's classification, you can in principle use any classification model, like Logistic Regression, SVM, Random Forest,
AdaBoost, etc.

But since 2012, after AlexNet won the ImageNet competition, people started to move to the neural
network approach. Since then, almost every year there has been a new neural network architecture capable of solving image
classification with high accuracy, like AlexNet, DenseNet, ResNet, VGG, Inception, and more.

Slide 9:
OK, now that we can classify what object is in the picture, can we go beyond that and go deeper, into
pixel-level classification? That means we classify which class each pixel belongs to. This is known as Semantic
Segmentation, because we are not just classifying the object but also segmenting it.
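
To make "classify each pixel" concrete, here is a minimal sketch in plain Python. The class names and the score values are made up for illustration; a real model would produce the per-pixel scores.

```python
# Toy per-pixel class scores for a 2x3 image and 3 classes.
# scores[row][col] holds one score per class (made-up numbers).
CLASS_NAMES = ["background", "balloon", "person"]  # hypothetical label set

scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7]],
]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def semantic_segmentation(score_map):
    """Assign each pixel the class with the highest score."""
    return [[argmax(pixel) for pixel in row] for row in score_map]


label_map = semantic_segmentation(scores)
for row in label_map:
    print([CLASS_NAMES[c] for c in row])
```

The output is a label map with the same height and width as the image, one class per pixel.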

Slide 10:
Next is Classification + Localization.
Long before the object detection that we know today, there was an approach, still relevant today, called Classification +
Localization.

This approach is an improvement on image classification: instead of just knowing what the image is (the object), it also
provides information about where the object is located.

The location is identified by a bounding box, which we obtain by regressing the 4 coordinates of the bounding box.

Slide 11:
And then, next is Object Detection.
I think it was already well explained in the previous sharing session, in the Faster R-CNN section.

Slide 12:
Next are Instance Segmentation & Panoptic Segmentation.
In short, Instance Segmentation is a combination of Semantic Segmentation and Object Detection, which we will discuss
in a minute.
And Panoptic Segmentation is kind of a combination of Semantic Segmentation & Instance Segmentation.
Unfortunately we won't discuss it, since I haven't looked at it very deeply, but I hope it might inspire you for the next
sharing session if you are interested in that.

OK, any questions?


If not, then I'm the one asking the question.
My question is: what do you think are the applications for these methods?
For example, in the precision forestry project we use object detection to count the trees.

Slide 13:
OK, now we're talking about Instance Segmentation.
Actually, when we talk about Instance Segmentation, what we mean is Semantic Segmentation that has
instance-awareness.

As far as I know, cmiiw,
it was first mentioned in the paper called Instance-aware Semantic Segmentation via Multi-task Network Cascades (MNC)
by Jifeng Dai, Kaiming He, and Jian Sun. Kaiming He later moved to Facebook and co-invented Mask R-CNN.

Slide 14:
OK, so what are the differences?
As you can see here, semantic segmentation can precisely segment the object, in this case balloons, but all
the balloons end up in the same segment. Object detection, as we know, can detect multiple items of the
same class, which is why it's useful for things like counting objects.
Instance segmentation is like combining the best of both worlds: we get accurate detection down to the
pixel level while still detecting many separate objects.
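
The balloon example can be sketched with toy data. This is only an illustration of the two output formats, with hand-written masks rather than real model output; the dictionary layout is an assumption made for this sketch.

```python
# Toy 4x6 semantic mask: 0 = background, 1 = balloon. There are two balloons,
# but semantic segmentation gives both of them the same label.
semantic_mask = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
balloon_pixels = {
    (r, c)
    for r, row in enumerate(semantic_mask)
    for c, value in enumerate(row)
    if value == 1
}

# Instance segmentation returns one mask per object (hypothetical format:
# each instance stores its label and the set of pixels it covers).
instances = [
    {"label": "balloon", "pixels": {(0, 0), (0, 1), (1, 0), (1, 1)}},
    {"label": "balloon", "pixels": {(0, 4), (0, 5), (1, 4), (1, 5)}},
]

balloon_count = sum(1 for inst in instances if inst["label"] == "balloon")
covered = set().union(*(inst["pixels"] for inst in instances))

print("balloon pixels (semantic):", len(balloon_pixels))
print("balloon objects (instance):", balloon_count)
print("same pixels covered:", covered == balloon_pixels)
```

The semantic mask alone only tells us how many pixels are balloon; the instance list tells us both which pixels and how many balloons.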

Slide 15:
But before we go to Mask R-CNN for Instance Segmentation, a quick recap of
the famous R-CNN family for Object Detection.
The original R-CNN used a fixed algorithm, selective search, to propose regions of interest, meaning regions
that are likely to contain an object. Every region proposal is then fed into the CNN, combined with a bbox
regressor and an SVM for classification.

As you might notice, this was very slow, because we run the CNN for every region proposal. So Fast R-CNN runs one big CNN
once over the whole image and shares the features across all the proposed regions, followed by fully connected layers: a
linear bbox regressor and a linear layer with softmax for object classification.

And Faster R-CNN was kind of revolutionary, since it generates the region proposals inside the network itself. I think the
details were already explained in the previous sharing session.

Slide 16:
So here comes Mask R-CNN by Kaiming He & Ross Girshick. I think you know both of them: Kaiming He co-created ResNet &
MNC, and Ross created Fast R-CNN while he was still at Microsoft. So it's funny that the two of them created Mask R-CNN at
Facebook and both previously worked at Microsoft.

As you can read in the abstract of the paper, they extend Faster R-CNN by adding a branch for predicting an object mask in
parallel with the existing branch for bounding box recognition. And as mentioned in the paper, the code is
open sourced on GitHub at facebookresearch/Detectron, so this repository contains the original implementation from the
paper.

Slide 17:
OK, let's play a kids' game.
I put the Faster R-CNN & Mask R-CNN model architectures side by side here.
How many differences can you find between the two?
What are they?

OK, so what happens if I remove the Mask Branch from Mask R-CNN?


Yes, it becomes Faster R-CNN.

Next, can Faster R-CNN use RoIAlign instead of RoIPooling?


Yes, it can. In fact, that's what happens in the Skymap modelling: they use Mask R-CNN and remove the Mask Branch.

Slide 18:
So here are the details of Mask R-CNN. Given that the difference is quite small (only adding the Mask Branch for
segmentation), it also adds only a small overhead to the performance.
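
The RoIPooling versus RoIAlign difference from the previous slide can be sketched in a few lines. This is a simplified illustration, not the implementation from the paper: it looks up a single sample point, while the real layers pool over several bins and sample points per region. The function names and the toy feature map are made up for this sketch.

```python
import math

# Toy 3x3 feature map (made-up values).
feature = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]


def sample_quantized(fmap, y, x):
    """RoIPooling-style lookup: snap the coordinate to the integer grid first."""
    return fmap[int(math.floor(y))][int(math.floor(x))]


def sample_bilinear(fmap, y, x):
    """RoIAlign-style lookup: bilinear interpolation of the 4 neighbouring cells."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the second neighbour so we never read outside the feature map.
    y1 = min(y0 + 1, len(fmap) - 1)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


# A region boundary rarely falls exactly on a feature-map cell.
y, x = 0.5, 1.5
print("quantized:", sample_quantized(feature, y, x))
print("bilinear: ", sample_bilinear(feature, y, x))
```

The quantized lookup throws away the fractional part of the coordinate, which is the misalignment that matters once we predict masks at the pixel level. The bilinear lookup keeps it.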

Slide 19:
So here are some popular implementations of Mask R-CNN.
The Facebook Research version, Detectron, is the original implementation from the paper, but since it uses the Caffe2
framework, most people prefer the Matterport version, which is implemented in TensorFlow.
But that repository is no longer maintained, which means that if we want to use it we need to modify the library itself so
it is compatible with newer TensorFlow versions. In some cases you can no longer use the Matterport version on a
GPU, because the supported CUDA version is quite old, so you may need to downgrade your driver software, which is
quite a pain.

Slide 20:
So that's why I chose to use Detectron2. It is a ground-up rewrite of the previous version; it's new, well maintained, and
well documented.

Detectron2 supports not only Mask R-CNN Instance Segmentation, but also Object Detection, Panoptic
Segmentation & Keypoint Detection.
In short, it's kind of like scikit-learn, but for Computer Vision.
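
As a rough idea of what using it looks like, here is a minimal inference sketch with a pretrained Mask R-CNN from the Detectron2 model zoo. It assumes Detectron2 and OpenCV are installed; the function names `build_predictor` and `segment` and the image path argument are placeholders for this sketch, and it is not necessarily the same code as in the notebook for this session.

```python
# Config name from the Detectron2 model zoo (Mask R-CNN, ResNet-50 + FPN, COCO).
CONFIG_NAME = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"


def build_predictor(score_threshold=0.5):
    """Build a Mask R-CNN predictor with pretrained COCO weights."""
    # Imported here so the sketch can be read without Detectron2 installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(CONFIG_NAME))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(CONFIG_NAME)
    # Only keep detections scoring above this threshold.
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_threshold
    return DefaultPredictor(cfg)


def segment(image_path):
    """Run instance segmentation on one image (image_path is a placeholder)."""
    import cv2  # DefaultPredictor expects a BGR image array by default

    predictor = build_predictor()
    image = cv2.imread(image_path)
    instances = predictor(image)["instances"].to("cpu")
    # One class id, one box, and one binary mask per detected object.
    return instances.pred_classes, instances.pred_boxes, instances.pred_masks
```

On a machine without a GPU, setting `cfg.MODEL.DEVICE = "cpu"` before building the predictor should be needed as well.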

Slide 21:
OK, as mentioned earlier, this whole presentation & the code that I'm going to demonstrate are open source and
publicly available. You can visit the shortened links here.
You can download or clone them locally to your laptop or server.
Or, if you want to try it first, you can scroll down the page and find the
"Open in Colab" button and try it directly there, so there is no need to install anything locally.
I will show you in a second.
