
Contrastive Learning for Object Detection

Rishab Balasubramanian Kunal Rathore


Oregon State University
{balasuri, rathorek}@oregonstate.edu
arXiv:2208.06412v1 [cs.CV] 12 Aug 2022

1. Introduction

1.1. Problem Statement

In this project we work on object detection using contrastive learning. The goal of the project is to implement and evaluate the contrastive learning paradigm for learning better feature representations, and to use these representations for object detection.

Contrastive learning follows from the traditional triplet loss, where the similarity between an “anchor” and a “positive” is maximized, and the similarity between the “anchor” and a “negative” is minimized. Contrastive learning is commonly used as a method of self-supervised learning, with the “anchor” and “positive” being two random augmentations of a given input image, and the “negative” being the set of all other images. This has been shown to outperform traditional approaches such as the triplet loss and the N-pair loss [1]. However, the requirement of large batch sizes and memory banks has made these models difficult and slow to train ([1], [2], [3]). This motivated the rise of supervised contrastive approaches that overcome these problems by using annotated data [5]. However, there is no explicit emphasis on learning good representations; rather, the idea is to cluster points into regions that are separable in the higher-dimensional parameter space. The authors of [4] attempt to enforce better representation learning in the contrastive learning framework by clustering classes together based on their similarity to each other.

Inspired by this approach, we rank classes based on their similarity and observe the impact of human bias (in the form of ranking) on the learned representations. We feel this is an important question to address, as learning good feature embeddings has been a long-sought-after problem in computer vision. It is also important for related domains such as OOD detection, image matching/retrieval, and other tasks that require a good representation of the images. Code is available at https://github.com/rishabbala/Contrastive_Learning_For_Object_Detection.

1.2. Scope and Challenges

We work in a supervised setting, with labels for the bounding boxes of objects and their corresponding classes. We use VOC 2007, a sufficiently large dataset of annotated images belonging to 20 different classes. The main challenge in our work is the training time for multiple experiments: due to the long training time and the large batch sizes of contrastive learning, we need access to GPUs. The class-similarity rankings, which are a user input, are also a challenge, as they require manual tuning and multiple experiments to identify a useful ranking order.

2. Approach

Fig 1 shows the approach we follow for training our model. Given an input image, we pass it through the Object Detection module to predict bounding boxes (Sec 2.1). Once the bounding boxes are predicted, we perform a Two-Crop transformation on each object in the image (Sec 2.2), and pass the result through our Contrastive Learning framework (Sec 2.3). We divide the process into three main stages:

• Object Detection

• Two Crop Augmentations

• Object Classification

Figure 1. Our Approach

2.1. Object Detection

In this work we use Faster RCNN [6] for object detection. Faster RCNN has two main components: a region proposal network (RPN) and a classification network. We remove the classification network and retain only the bounding boxes proposed by the RPN. Fig 2 (adapted from d2l) shows the Faster RCNN pipeline used in our work. Given the locations of the bounding boxes, the image is cropped at these locations and a two-crop transformation is performed.

Figure 2. Faster RCNN
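As an illustration of this stage, below is a minimal sketch that loads a Faster RCNN (ResNet-50 FPN) through detectron2 and crops an image at the predicted boxes. The config, weights, and input path are illustrative stand-ins (we fine-tune on VOC2007 rather than using COCO weights), and for brevity it crops the predictor's final boxes rather than the raw RPN proposals used in our pipeline.

```python
# Sketch: predict boxes with detectron2's Faster RCNN (ResNet-50 FPN)
# and crop the input image at each box for the two-crop stage.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))  # stand-in config
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")   # stand-in weights
predictor = DefaultPredictor(cfg)

image = cv2.imread("example.jpg")  # hypothetical input image
boxes = predictor(image)["instances"].pred_boxes.tensor.cpu().numpy()

# Crop each predicted box (x0, y0, x1, y1) out of the image
crops = [image[int(y0):int(y1), int(x0):int(x1)] for x0, y0, x1, y1 in boxes]
```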

2.2. Two Crop Transformation

The cropped images at the locations of the bounding boxes are stacked together into a batch of images. We first normalize the images using the mean and standard deviation of the dataset. Then we follow the standard transformations for contrastive learning proposed in [5], as shown in Fig 3, except that we do not use the proposed cutout, blur, and Sobel filter augmentations. We apply two random combinations of a subset of these methods to produce two augmentations of each image. The first is the “anchor”, and the second falls in the “positive” class.

Figure 3. Proposed transformations
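The following is a minimal torchvision sketch of this transformation, in the style of the SupCon codebase [5]. The crop size, jitter strengths, and the ImageNet-style normalization constants are assumptions for illustration, not our exact settings.

```python
# Sketch: produce two independent augmentations ("anchor" and "positive")
# of each object crop. All magnitudes below are illustrative.
import torchvision.transforms as T

class TwoCropTransform:
    """Apply the same random augmentation pipeline twice, independently."""
    def __init__(self, transform):
        self.transform = transform

    def __call__(self, x):
        return [self.transform(x), self.transform(x)]  # [anchor, positive]

mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]  # assumed stats
base = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
    T.Normalize(mean, std),
])
two_crop = TwoCropTransform(base)  # two_crop(img) -> [view1, view2]
```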
2.3. Object Classification

Fig 4 (adapted from [4]) shows the contrastive learning approach we use. We follow [4] and use a ranked supervised contrastive learning method, where the ranking is user defined. This differs from traditional approaches, which use a single “positive” image/class.

For each anchor (query) image q, we rank a number of similar classes as P_1, ..., P_r, where r denotes the number of positive classes in our ranking. We also define a negative class N. Let h(q, x) be the cosine similarity between the query and any other image x. Then we can define our objective as enforcing:

    h(q, P_1) > h(q, P_2) > \cdots > h(q, P_r) > h(q, N)    (1)

We enforce this by defining a loss L = \sum_{i=1}^{r} l_i, where

    l_i = -\log \frac{\sum_{p \in P_i} \exp(h(q, p) / \tau_i)}{\sum_{p \in \bigcup_{j \geq i} P_j} \exp(h(q, p) / \tau_i) + \sum_{n \in N} \exp(h(q, n) / \tau_i)}    (2)

This can be thought of as recursively computing the loss L, treating the current highest-ranked class i as “positive” and all other classes as negative. After computing the loss for rank i, the current highest-ranked class is removed, and the loss is computed again for class i + 1. To ensure good separation, we set \tau_{i+1} > \tau_i, following the empirical studies provided in [4].

As opposed to [4], we rank classes instead of clustering them into groups. The difference is that in our ranking, any class can be rated as similar to any other class with a user-defined score, whereas in the clustering of [4], only classes within the same cluster are considered similar. For example, [4] puts the classes “aeroplane” and “ship” together as “vehicles”. However, from human knowledge, we know that an “aeroplane” is also (probably more) similar to a “bird” than to a “ship”. Hence, in our method, we create the ranking for the class “aeroplane” as {“bird”, “ship”, ...}, in decreasing order of similarity from left to right.

Figure 4. Traditional contrastive learning approaches are binary (left), where there is a single “anchor” and a single “positive” image/class. We use a ranking system to improve the learned features (right).
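To make Eqs. (1)-(2) concrete, below is a small PyTorch sketch of the ranked loss. The tensor layout is a simplifying assumption (per-rank similarity vectors rather than a batched similarity matrix), so this is a readable reference rather than an efficient or numerically hardened implementation.

```python
# Sketch of the ranked contrastive loss L = sum_i l_i from Eq. (2).
# sims[i] holds similarities h(q, p) for all p in rank set P_{i+1};
# neg holds similarities h(q, n) for all negatives n in N.
import torch

def ranked_contrastive_loss(sims, neg, taus):
    loss = torch.zeros(())
    for i, tau in enumerate(taus):
        # numerator: positives at the current rank, scaled by tau_i
        num = torch.exp(sims[i] / tau).sum()
        # denominator: positives ranked i or lower, plus all negatives
        den = sum(torch.exp(s / tau).sum() for s in sims[i:]) \
            + torch.exp(neg / tau).sum()
        loss = loss - torch.log(num / den)
    return loss

# Toy usage: r = 3 rank sets, 16 negatives, increasing temperatures
sims = [torch.rand(4) * 2 - 1 for _ in range(3)]  # cosine sims in [-1, 1]
neg = torch.rand(16) * 2 - 1
print(ranked_contrastive_loss(sims, neg, taus=[0.1, 0.2, 0.3]))
```

Note that the temperatures passed in the toy call increase with rank, matching the \tau_{i+1} > \tau_i constraint above.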
3. Evaluation

3.1. Implementation Details

• We use the detectron2 library [7], open-sourced by Facebook, for the bounding box predictions, with a ResNet-50 FPN backbone for object detection.

• We build upon the code provided in [4] to incorporate our ranking and experiments.

• We use a ResNet-50 backbone for all our contrastive learning experiments.


• We run our experiments with a batch size of 32 on the VOC2007 dataset.

• We train our model for 500 epochs.

• We use cosine similarity, with a learning rate of 0.5 and a learning-rate decay of 0.1.

• We set the temperature in the loss to τ ∈ [0.1, 0.6].

• The experiments are conducted with the number of positively ranked classes r ∈ {1, 3, 5}.

3.2. Dataset

We evaluate on the VOC2007 dataset, which has 20 classes, 9963 images, and 24640 objects (bounding boxes). It is split into 5011 images in the train/val set and 4952 images in the test set, with a similar number of objects in each. The dataset provides the images, the ground-truth bounding box annotations, and the corresponding class for each bounding box.
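For reference, these splits can be loaded with torchvision's built-in VOCDetection dataset; this is an illustrative sketch, not necessarily the loader used in our experiments.

```python
# Sketch: load the VOC2007 trainval/test splits and read one annotation.
from torchvision.datasets import VOCDetection

trainval = VOCDetection("data", year="2007", image_set="trainval", download=True)
test = VOCDetection("data", year="2007", image_set="test", download=True)

img, ann = trainval[0]
objs = ann["annotation"]["object"]
objs = objs if isinstance(objs, list) else [objs]  # single-object edge case
for obj in objs:
    print(obj["name"], obj["bndbox"])  # class name and ground-truth box
```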
3.3. Metrics and Comparison

We evaluate mAP for the object detection stage, and classification accuracy for the contrastive learning model. We compare this accuracy with SupCL ([5], where there is no ranking), RINCE ([4], where similar classes are clustered together), and SoftMax (the common discriminative approach of training a ResNet-50 with a softmax loss). Extra Credit: We also test our model for detecting OOD classes. In this case we plot the ROC curve of true positive rate vs. false positive rate and compare the area under the curve.
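As a sketch of this OOD evaluation, the snippet below computes the ROC points and AUROC with scikit-learn on synthetic stand-in scores; in our setting the score could be, for example, one minus the maximum similarity to any known class.

```python
# Sketch: ROC curve (TPR vs FPR) and AUROC for an OOD score.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
scores_known = rng.normal(0.3, 0.1, 500)  # synthetic in-distribution scores
scores_ood = rng.normal(0.5, 0.1, 100)    # synthetic withheld-class scores

y_true = np.concatenate([np.zeros(500), np.ones(100)])  # 1 = OOD
y_score = np.concatenate([scores_known, scores_ood])

fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUROC:", roc_auc_score(y_true, y_score))
```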

Model                                                     AP      AP50    AP75
Faster RCNN ResNet50 FPN (trained on VOC2007 train+val)   45.254  72.746  49.338

Table 1. Object Detection scores

Figure 5. Results

3.4. Results & Evaluations

We first evaluate the efficiency of the Faster-RCNN model in predicting bounding boxes. To do so, we compute the AP scores using a pre-trained model. Table 1 shows the AP, AP50, and AP75 scores of the object detection module we used. These are lower than the values reported in the Faster RCNN paper, and also lower than those of more modern approaches. Since the objective was not only object detection, we did not try different detection models.

Table 2 shows the classification accuracy on the VOC2007 dataset, and Fig 5 shows the results from our model. We observe that our accuracy is comparable to RINCE [4] and SupCL [5], while the discriminative classifier receives a much lower score. However, our scores are slightly lower than those of SupCL and RINCE. Since all other parameters were kept the same during testing, the two factors that affect these results the most are the ranking and the user-tuned class-similarity scores. Due to the diverse nature of the classes in the dataset, these resulted in a sparser ranking, which hurts the performance of our approach.

Method         Classification Accuracy
SupCL (r=1)    0.6499
Ours (r=3)     0.6068
RINCE (r=5)    0.6368
SoftMax        0.5829

Table 2. Classification Accuracy on VOC2007

3.5. Extra Credit: Out of Distribution Detection

We finally evaluate our model’s performance for OOD object detection. We evaluate on the VOC2007 dataset with 2 classes withheld, and show the ROC curve in Fig 6 and the AUROC in Table 3.

Figure 6. ROC plots for VOC: (a) baseline, (b) 1 positive class (r = 1), (c) 5 positive classes (r = 5)

We see that our model does not perform better than the baselines. This shows that our method enforces good representation learning when human input is given, but the representations for new classes are poor. We can conclude that, given an unobserved object class, our model pushes it close to one of the known classes, thus resulting in poor results.

Method        AUROC
SupCL (r=1)   0.6679
Ours (r=5)    0.5532
SoftMax       0.5621

Table 3. AUROC on VOC2007 with 2 classes withheld

3.6. Runtimes & Hardware

We train all our models on the HPC cluster using a Tesla V100 GPU. For training we use a batch size of 32, and observe that it takes around 1.5-2 minutes per epoch. During testing, it takes about 2 minutes to generate results over the validation set. Evaluation of the Average Precision scores for Faster RCNN takes about 5 minutes.

3.7. Individual Contributions


Kunal worked on the object detection pipeline and its evaluation. Rishab worked on setting up the contrastive learning experiments and training them. We worked together on OOD detection. We each filled in our respective portions of the report and made changes together.

References

[1] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020.

[2] Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.

[3] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.

[4] David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, and Mehdi Noroozi. Ranking info noise contrastive estimation: Boosting contrastive learning via ranked positives. arXiv preprint arXiv:2201.11736, 2022.

[5] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673, 2020.

[6] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 2015.

[7] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019.
