
Project Report (Object Detection)

on
Computer Graphics Lab

(Department of Computer Science & Engineering)

Submitted by: Joy Talukder, Reg. 2016-13-34, Session: 2016-2017, Department of CSE
Submitted to: Juel Sikder, Assistant Professor, Dept. of CSE
Abstract

Object detection is a key problem in computer vision. We report our work on object detection using neural networks and other computer vision features. We use the Faster Region-based Convolutional Neural Network (Faster R-CNN) for detection and then match each detected object using features from both the neural network and classical descriptors such as histograms of oriented gradients (HOG). We achieve real-time performance and satisfactory matching results.

1 Introduction

Object detection is a challenging and exciting task in computer vision. Detection can be difficult because variations in orientation, lighting, background, and occlusion can produce completely different images of the very same object. With the advances in deep learning and neural networks, we can now tackle such problems in real time without hand-crafting heuristics.

We installed and trained the Faster R-CNN model [3] on the Caffe deep learning framework [2]. Faster R-CNN is a region-based detection method. First, we used a region proposal network (RPN) to generate detection proposals. Then we employed the same network structure as Fast R-CNN to classify the object and refine the bounding box. Furthermore, we extracted features from each detected object using an algorithm we developed. Finally, we matched detected objects against those stored in a database.

2 Related Work
In 2012, Krizhevsky et al. [1] trained a deep convolutional neural network to classify the ILSVRC-2010 ImageNet images into 1,000 classes with much better precision than previous work, which marked the beginning of deep learning's widespread use in computer vision. In 2014, Jia et al. [2] released Caffe, a clean and modifiable deep learning framework.

In 2015, Ren et al. [3] proposed Faster R-CNN, built around the Region Proposal Network. The network shares convolutional features across the whole image, which makes region proposals nearly cost-free. Besides Faster R-CNN, there are many other approaches to improving CNN detection performance. More recent approaches such as YOLO [4] and SSD [5] feed the whole image directly into the neural network and predict scored boxes in a single pass, further reducing running time compared to Faster R-CNN. You Only Look Once (YOLO) applies a single neural network to the full image; the network divides the image into regions and predicts bounding boxes and probabilities for each region. The Single Shot MultiBox Detector (SSD) discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature-map location.

3 Dataset and Features
For general object detection, we used the SUN2012 database from MIT CSAIL, which contains images with objects labeled by polygons, as shown in Fig. 1. For each object category, we extracted the bounding boxes for detection and further classification tasks. For training, we used images from 3 categories to build the database, splitting the whole dataset into a training set (90%) and a testing set (10%). We also tested detecting automobiles from different perspectives. The car images come from the KITTI database, which contains about 14,000 images; we chose fully visible and partly occluded automobiles for training and testing. The KITTI database also provides the observation angle of each object, which we used to label the perspective (front, back, left, right) of the automobiles.

Figure 1: One database example.

Figure 2: Illustration of the Faster R-CNN training process.
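The 90/10 split described above can be sketched as follows; the file names and helper function are illustrative, not the exact scripts we used.

```python
import random

def split_dataset(samples, train_frac=0.9, seed=0):
    """Shuffle samples and split into train/test subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the input list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Example: 100 labeled images -> 90 for training, 10 for testing
images = [f"img_{i:04d}.jpg" for i in range(100)]
train, test = split_dataset(images)
```

Fixing the random seed keeps the split reproducible across runs, which matters when comparing detector variants on the same test set.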

4 Methodology
Faster Region-based Convolutional Neural Network
We trained the Faster Region-based Convolutional Neural Network (Faster R-CNN) model on the Caffe deep learning framework in Python. Faster R-CNN is a region-based detection method. It first uses a region proposal network (RPN) to generate detection proposals, then uses the same network structure as Fast R-CNN to classify each object and refine its bounding box. The strategy is visualized in Fig. 2.

The training strategy of this model was as follows. First, we trained the RPN end-to-end by back-propagation and stochastic gradient descent, choosing the ZF and VGG16 nets to extract features; all layers were initialized from a model pre-trained for ImageNet classification. We used a learning rate of 0.0001 for 40k mini-batches, a momentum of 0.9, and a weight decay of 0.0005. Second, we trained a separate detection network with Fast R-CNN for 20k mini-batches, using the proposals generated by the RPN from step 1; this detection network was also initialized from a pre-trained model. At this point the two networks did not share convolutional layers. Third, we used the detection network to initialize RPN training for another 40k mini-batches, but fixed the shared convolutional layers. Finally, keeping the shared convolutional layers fixed, we fine-tuned the layers unique to Fast R-CNN for 20k mini-batches.
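The solver settings above (learning rate 0.0001, momentum 0.9, weight decay 0.0005) correspond to the standard SGD-with-momentum update. A minimal NumPy sketch of that update rule, not the actual Caffe solver:

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=1e-4, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay (Caffe-style)."""
    # Weight decay adds an L2 penalty gradient before the momentum update.
    g = grad + weight_decay * w
    v = momentum * v - lr * g
    return w + v, v

# Toy example: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(1000):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=1e-2)  # w converges toward 0
```

The momentum term accumulates a velocity across mini-batches, which smooths noisy gradients; the weight decay term shrinks the weights toward zero as regularization.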

5 Results & Discussion

5.1 Object Detection
We successfully detected backpacks (Fig. 3(a)), towels (Fig. 3(b)), clocks, bottles, and automobiles from different perspectives (front, back, left, and right) (Fig. 3(c)). Automobile detection achieved high precision, as shown in Table 1:

Perspective   Average Precision
Front         81.57%
Back          86.40%
Left          87.56%
Right         79.31%

Table 1: Average precision of detecting cars from different perspectives.

Figure 3: Object Detection.
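The per-perspective numbers in Table 1 are average precision (AP), the area under the precision-recall curve of the ranked detections. A minimal sketch of the all-point AP computation, assuming detections are already matched to ground truth (e.g. by an IoU threshold); this is illustrative, not the exact evaluation script we used:

```python
import numpy as np

def average_precision(scores, labels, n_gt):
    """All-point AP: area under the precision-recall curve.

    scores: detection confidences; labels: 1 if the detection matches a
    ground-truth box, else 0; n_gt: number of ground-truth objects.
    """
    order = np.argsort(scores)[::-1]            # rank detections by confidence
    tp = np.cumsum(np.asarray(labels)[order])
    precision = tp / np.arange(1, len(labels) + 1)
    recall = tp / n_gt
    # Rectangle rule over recall: add precision at each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

# Three correct detections ranked perfectly over three objects -> AP = 1.0
ap = average_precision([0.9, 0.8, 0.7], [1, 1, 1], n_gt=3)
```

Benchmarks such as PASCAL VOC use interpolated variants of this curve, so absolute numbers can differ slightly depending on the convention.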

5.2 Object Detection Output on Sample Images

6 Conclusion & Future Work

In conclusion, we trained Faster R-CNN to detect objects in real time. We then extracted features such as color, HOG, and SIFT descriptors, together with the output of the last network layer given by Faster R-CNN. Finally, we compared each detected object with those in our database and chose the best match based on the extracted features.
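To illustrate the kind of gradient-orientation feature used for matching, here is a simplified HOG-style descriptor in NumPy. It is a sketch only: real HOG (as in Dalal and Triggs) adds overlapping block normalization and other refinements that this version omits.

```python
import numpy as np

def hog_like_descriptor(img, n_bins=9, cell=8):
    """Simplified HOG-style descriptor: per-cell histograms of gradient
    orientations, weighted by gradient magnitude and L2-normalized."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=n_bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    f = np.concatenate(feats).astype(float)
    norm = np.linalg.norm(f)
    return f / norm if norm > 0 else f

# A 32x32 grayscale patch -> 4x4 cells x 9 bins = 144-dim feature vector
desc = hog_like_descriptor(np.random.rand(32, 32))
```

In practice such hand-crafted descriptors are concatenated with color statistics and the last-layer network activations before matching.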

For future work, we could test our matching algorithm more thoroughly with a larger dataset and more rigorous diagnostics, for example by plotting the error rate against the dataset size or the confusion matrix of the test set.

We tried feature combinations such as the fully connected layer, RGB color, and HOG, but other combinations might be useful; viable candidates include rectangle features and other features from the neural network. Even if we confine our attention to kNN, there are many other distance functions to try: the Manhattan distance, histogram intersection distance, and Chebyshev distance can all be implemented with ease. Matching quality can also be improved by better detection. We further recognized that the observation angle is very important for object matching, so "car face" alignment is essential; we could apply an algorithm similar to those used for face recognition in our project.
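The alternative distance functions mentioned above drop into a kNN matcher with a one-line change each. A minimal sketch (the database and query vectors here are toy values, not our actual features):

```python
import numpy as np

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def chebyshev(a, b):
    return np.max(np.abs(a - b))

def histogram_intersection(a, b):
    # Intersection is a similarity; negate it so smaller means closer.
    return -np.sum(np.minimum(a, b))

def knn_match(query, database, k=1, dist=manhattan):
    """Return indices of the k database features closest to the query."""
    d = np.array([dist(query, f) for f in database])
    return np.argsort(d)[:k]

# Toy database of three feature vectors; the query is nearest to index 1.
db = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([5.0, 5.0])]
idx = knn_match(np.array([0.9, 1.1]), db, k=1)
```

Because only the `dist` argument changes, comparing Manhattan, Chebyshev, and histogram intersection on the same test set requires no other code changes.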

References

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012.
[2] Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," ACM Multimedia, 2014.
[3] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," NIPS, 2015.
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," CVPR, 2016.
[5] W. Liu et al., "SSD: Single Shot MultiBox Detector," ECCV, 2016.
