Copy Paste++ v1
Abstract: We present an extended version of the Simple Copy Paste augmentation [1]
that applies mask-wise large scale jittering (LSJ). This improves the robustness of the
model during training by helping it identify and learn objects in a size- and scenario-agnostic
manner for instance segmentation. We further introduce a stratified training technique
that enables us to train our models with smaller batch sizes. Our model achieves
a score of 22.9 AP on the LVIS Challenge 2021 - Instance Segmentation task [2]
within just 8 epochs, and is yet to converge.
1. Introduction
Instance segmentation is one of the prominent tasks in computer vision where the goal is to localize and classify
instances in an image. The LVIS dataset is one such instance segmentation dataset that has a large number of
categories. The number of images in some categories is much smaller than in others. This long-tailed nature of
the LVIS dataset poses a major challenge to model training. Some existing works, such as Balanced Group Softmax [3],
Seesaw Loss [4], Balanced Mosaic [5], and Equalization Loss [6] have shown that re-weighting the loss for
tail classes and enhancing images using augmentations are effective ways to achieve better results. One such
effective and efficient augmentation technique is Simple Copy Paste. Building on this, we propose the Copy
Paste++ Augmentation.
2. Previous Work
2.1. Seesaw Loss
Seesaw Loss is derived from cross-entropy loss. It accumulates the number of training samples for each category
during every training iteration and uses two complementary factors, a mitigation factor and a compensation factor,
to re-balance the gradients of positive and negative samples based on the accumulated counts. The mitigation
factor reduces the punishment of tail categories, while the compensation factor increases the penalty on incorrectly
classified instances to avoid false positives for rarely occurring categories.
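The interplay of the two factors can be sketched as a per-class-pair re-weighting term. The formulation below follows the paper's mitigation and compensation factors, but the function name, argument layout, and defaults are our own illustration:

```python
def seesaw_factor(n_i, n_j, sigma_i, sigma_j, p=0.8, q=2.0):
    """Re-weighting factor S_ij applied to the negative gradient that a
    sample of ground-truth class i puts on class j.

    n_i, n_j         -- sample counts accumulated for classes i and j
    sigma_i, sigma_j -- predicted probabilities for classes i and j
    p, q             -- hyper-parameters (the paper defaults to p=0.8, q=2)
    """
    # Mitigation: when j is rarer than i, shrink the punishment that
    # head-class samples push onto the tail class j.
    mitigation = (n_j / n_i) ** p if n_j < n_i else 1.0
    # Compensation: when j is wrongly scored above the true class i,
    # scale the penalty back up to suppress false positives.
    compensation = (sigma_j / sigma_i) ** q if sigma_j > sigma_i else 1.0
    return mitigation * compensation
```

With a head class i (1000 samples) and a tail class j (10 samples), the mitigation factor scales the negative gradient on j down by (10/1000)^0.8, unless j is being scored above the true class, in which case the compensation factor pushes it back up.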
2.2. SyncBN
Synchronized Batch Normalization (SyncBN) [7] is a type of batch normalization used for multi-GPU training.
Standard batch normalization only normalizes the data within each device (GPU). SyncBN normalizes the input
within the whole mini-batch.
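The difference can be illustrated with a toy numpy sketch (our own illustration, not a framework's implementation): standard BN normalizes each device's shard with its local statistics, while SyncBN first aggregates mean and variance over the whole mini-batch.

```python
import numpy as np

def bn_per_device(shards, eps=1e-5):
    # Standard BN: every GPU normalizes its shard with its own
    # per-shard mean and variance.
    return [(x - x.mean(0)) / np.sqrt(x.var(0) + eps) for x in shards]

def sync_bn(shards, eps=1e-5):
    # SyncBN: mean and variance are computed over the whole mini-batch
    # (all shards concatenated), then shared by every device.
    full = np.concatenate(shards, axis=0)
    mu, var = full.mean(0), full.var(0)
    return [(x - mu) / np.sqrt(var + eps) for x in shards]
```

In PyTorch, `torch.nn.SyncBatchNorm.convert_sync_batchnorm` performs this swap on an existing model; the synchronization itself only takes effect under distributed training.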
3. Experiments
For our baseline, we use a Mask RCNN [8] architecture with ResNet-101 [9] as our backbone. We train it with
multi-scale training, repeat factor sampling, SyncBN, and Seesaw Loss using the 2x schedule provided by MMDetection [10].
We then upgrade the architecture to Cascade Mask RCNN [11], which gives us an increase of 1.9 AP.
We fine-tune this model using Copy Paste++, reaching a current score of 22.8 AP after 22 epochs. We
train our model on 8 Nvidia Tesla T4 GPUs using Stochastic Gradient Descent with a learning rate of 10e-5,
a momentum of 0.9, and a weight decay of 10e-4. We use a batch size of 16 images. The maximum number of
detections per image was set to 1000 in all our inferences. All scores are reported on the LVIS V1 validation set.
The Cascade Mask RCNN models were trained for 24 epochs, and the models with the Copy Paste++ augmentation
were trained for 8 epochs.*
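The training hyper-parameters above can be expressed as an MMDetection-style config fragment. The field names follow MMDetection conventions, and the 2-images-per-GPU split is our assumption for how a batch size of 16 is reached on 8 GPUs:

```python
# Optimizer settings from Section 3, written as an MMDetection-style
# config fragment. Values mirror the text; the per-GPU split is assumed.
optimizer = dict(type='SGD', lr=10e-5, momentum=0.9, weight_decay=10e-4)
data = dict(samples_per_gpu=2)  # assumed: 8 GPUs x 2 images = batch size 16
```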
Experiment List

Experiments                                    Mask AP   Mask APr   Mask APc   Mask APf
Baseline                                       28.2      21.0       27.8       31.8
Baseline + Cascade Mask RCNN                   30.1      21.1       30.3       33.9
Baseline + Cascade Mask RCNN + Copy Paste++*   22.9      14.8       22.7       26.7
4. Our Contribution
4.1. Copy Paste ++
Our approach proposes an alternative to the Simple Copy Paste algorithm. Instead of performing large scale
jittering on all the masks in the source image with the same random resize ratio, we randomly choose a set of
masks from the source image and perform large scale jittering independently on each of them, each with a different
random resize ratio. We then apply a smart pasting technique that modifies the annotations in both the destination
image and the source image. This ensures that completely occluded ground truths lose their annotations and that
partially occluded ground truths receive updated annotations, even when masks are jittered with different random
scales.
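The two steps, mask-wise LSJ and smart pasting, can be sketched as follows. This is a minimal illustration under our own assumptions (function names, the 0.5 selection probability, and the occlusion-keep threshold are illustrative, not taken from the paper), and the rescaling here anchors each jittered mask at the image origin, whereas a full augmentation would also translate it randomly:

```python
import numpy as np

def rescale_mask(mask, ratio, H, W):
    """Nearest-neighbour rescale of a boolean mask onto an HxW canvas."""
    yy = (np.arange(H) / ratio).astype(int)
    xx = (np.arange(W) / ratio).astype(int)
    valid = (yy < mask.shape[0])[:, None] & (xx < mask.shape[1])[None, :]
    out = mask[np.minimum(yy, mask.shape[0] - 1)][:, np.minimum(xx, mask.shape[1] - 1)]
    return out & valid

def copy_paste_pp(src_masks, dst_masks, H, W, scale_range=(0.5, 2.0),
                  keep_thresh=0.01, rng=None):
    rng = rng or np.random.default_rng()
    pasted = np.zeros((H, W), dtype=bool)
    jittered = []
    # 1) Randomly pick a subset of source masks, then jitter each one
    #    independently with its own resize ratio (mask-wise LSJ).
    for m in src_masks:
        if rng.random() < 0.5:
            continue
        j = rescale_mask(m, rng.uniform(*scale_range), H, W)
        jittered.append(j)
        pasted |= j
    # 2) Smart paste: clip every destination mask by the pasted region;
    #    drop ground truths that end up (almost) completely occluded,
    #    keep the visible remainder of partially occluded ones.
    updated = []
    for m in dst_masks:
        vis = m & ~pasted
        if m.any() and vis.sum() / m.sum() > keep_thresh:
            updated.append(vis)
    return jittered, updated
```

A destination mask that is fully covered by a pasted mask is dropped from the annotation list, while a partially covered one is replaced by its visible remainder, which is exactly the annotation update the smart pasting step performs.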
This helps us achieve two things. First, we enhance the size-agnostic behavior of the model by increasing the
variance in the sizes of individual masks, since each is scaled independently with a different resize ratio. This
also avoids unwanted correlations between the feature representations corresponding to the sizes of different
objects in the same image.
Second, with the help of our smart pasting technique, we are able to generate new images with spatial overlaps
between objects from the source image that did not originally overlap. This enhancement would not have been
possible in the original Copy Paste method.
5. Future Work
Since we were limited by computational constraints, we were unable to train our model to completion using our
stratified fine-tuning methodology. However, we believe that our model can achieve even better results if trained
to convergence with larger ranges of re-scaling ratios.
6. Conclusion
We propose an augmentation technique, Copy Paste++, which is an extension of Simple Copy Paste. We also
propose a stratified fine-tuning method for researchers with limited compute, so that they are able to leverage our
augmentation technique, even with a batch size as small as 16.
References
1. Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, and Barret Zoph.
Simple copy-paste is a strong data augmentation method for instance segmentation, 2021.
2. Agrim Gupta, Piotr Dollár, and Ross Girshick. Lvis: A dataset for large vocabulary instance segmentation, 2019.
3. Yu Li, Tao Wang, Bingyi Kang, Sheng Tang, Chunfeng Wang, Jintao Li, and Jiashi Feng. Overcoming classifier
imbalance for long-tail object detection with balanced group softmax, 2020.
4. Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change
Loy, and Dahua Lin. Seesaw loss for long-tailed instance segmentation, 2021.
5. Lei Chen, Qiang Zhou, Wei Li, Zhibin Wang, and Hao Li. Balanced mosaic and double classifier for large vocabulary
instance segmentation, 2020.
6. Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, and Junjie Yan. Equalization loss
for long-tailed object recognition, 2020.
7. Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. Con-
text encoding for semantic segmentation, 2018.
8. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn, 2018.
9. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
10. Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu,
Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue
Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open
mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019.
11. Zhaowei Cai and Nuno Vasconcelos. Cascade r-cnn: High quality object detection and instance segmentation, 2019.