Chen Dense Learning Based Semi-Supervised Object Detection CVPR 2022 Paper

This document proposes a Dense Learning based (DSL) method for semi-supervised object detection without anchors. It introduces adaptive filtering to assign pseudo-labels to pixels, an aggregated teacher to enhance pseudo-label quality, and consistency regularization across scales and shuffled patches to improve generalization. Experiments on MS-COCO and PASCAL-VOC show the anchor-free DSL method outperforms existing semi-supervised object detection approaches.


Dense Learning based Semi-Supervised Object Detection

Binghui Chen¹, Pengyu Li¹, Xiang Chen¹, Biao Wang¹, Lei Zhang², Xian-Sheng Hua¹
¹Alibaba Group, ²The Hong Kong Polytechnic University

Abstract

Semi-supervised object detection (SSOD) aims to facilitate the training and deployment of object detectors with the help of a large amount of unlabeled data. Though various self-training based and consistency-regularization based SSOD methods have been proposed, most of them are anchor-based detectors, ignoring the fact that in many real-world applications anchor-free detectors are in greater demand. In this paper, we intend to bridge this gap and propose a DenSe Learning (DSL) based anchor-free SSOD algorithm. Specifically, we achieve this goal by introducing several novel techniques, including an Adaptive Filtering strategy for assigning multi-level and accurate dense pixel-wise pseudo-labels, an Aggregated Teacher for producing stable and precise pseudo-labels, and an uncertainty-consistency-regularization term among scales and shuffled patches for improving the generalization capability of the detector. Extensive experiments are conducted on MS-COCO and PASCAL-VOC, and the results show that our proposed DSL method records new state-of-the-art SSOD performance, surpassing existing methods by a large margin. Codes can be found at https://github.com/chenbinghui1/DSL.

Figure 1. The SSOD performance comparisons between the proposed anchor-free based DSL and anchor-based methods STAC [38] and ISMT [48]. One can observe that anchor-based detector Faster-RCNN [36] and anchor-free based detector FCOS [44] have similar baseline performance under the supervised settings, while our proposed DSL achieves the state-of-the-art SSOD performance, outperforming the existing methods by a large margin.

1. Introduction

The recent rapid development of object detection (OD) methods [5, 17, 40] largely owes to the availability of large-scale and well-annotated datasets, such as the MS-COCO benchmark [27]. With the increasing demand for more powerful and accurate detection models, the need to collect and label more data also increases. However, manually labeling the class labels and bounding-boxes for large-scale datasets is a very expensive and tedious job, which is not cost-effective in practical applications. As a remedy, semi-supervised [38, 48] and self-supervised [28] OD algorithms, which aim to employ the large amount of unlabeled data to improve the performance of OD, have been attracting much attention in recent years. In this paper, we focus on the semi-supervised object detection (SSOD) methods.

The current state-of-the-art SSOD methods are pseudo-label based approaches [31, 38, 48, 51], while most of them are based on a two-stage anchor-based detector such as Faster-RCNN [36]. Specifically, they first use a teacher model to generate pseudo-labels for unlabeled images and then train a two-stage anchor-based detector with both labeled and unlabeled images. However, in real-world applications, the one-stage anchor-free based detectors (e.g., FCOS [44]) are more attractive and practical since they are much easier and more efficient to deploy on resource-limited devices without heavy pre/post-processing except NMS. Different from Faster-RCNN, the learning of FCOS is established on dense feature predictions; that is, each pixel is directly supervised by the corresponding label. Without the help of predefined anchors and multiple refinements of the predictions, the learning of anchor-free based detectors requires more careful guidance, especially under the SSOD settings. Unfortunately, few works on anchor-free SSOD
have been reported, and how to handle the dense pseudo-labels predicted by anchor-free detectors remains a challenging problem.

To address the above mentioned challenges, in this paper we propose a DenSe Learning (DSL) algorithm for anchor-free SSOD¹. Specifically, to perform careful label guidance for dense learning, we first present an Adaptive Filtering (AF) strategy to partition pseudo-labels into three fine-grained parts, including background, foreground, and ignorable regions. Then we refine these pseudo-labels by using a MetaNet so as to remove the classification false-positives, which have higher prediction scores but are actually false predictions in category. Considering that the correctness of pseudo-labels determines the performance of SSOD models, we introduce an Aggregated Teacher (AT) to further enhance the stability and quality of the estimated pseudo-labels. Moreover, to improve the model generalization capability, we learn from shuffled image patches and regularize the uncertainty of dense feature maps to make them consistent among image scales. The main contributions of this paper are summarized as follows:

¹ In this paper, we employ FCOS [44] as our baseline detector.

• A simple yet effective DenSe Learning (DSL) method is developed to improve the utilization of large-scale unlabeled data for SSOD. To the best of our knowledge, this is the first anchor-free method for SSOD.

• An Adaptive Filtering (AF) strategy is proposed to assign fine-grained pseudo-labels to each pixel; an Aggregated Teacher (AT) is introduced to enhance the stability and quality of estimated pseudo-labels; and learning from shuffled patches and uncertainty-consistency-regularization among scales are employed to improve the model generalization performance.

Extensive experiments conducted on MS-COCO [27] and PASCAL-VOC [8] demonstrate that the proposed DSL method achieves significant performance improvements over existing state-of-the-art SSOD methods.

2. Related Work

Semi-Supervised Learning for Image Classification. Recently, semi-supervised learning (SSL) has achieved significant progress in image classification with the rapid development of deep learning techniques. SSL aims to employ a large amount of unlabeled data to learn robust and discriminative classification boundaries. Specifically, self-ensembling is used in [19] to stabilize the learning targets for unlabeled data. A new measure of local smoothness of the conditional label distribution is proposed in [32] for improving the SSL learning performance. Mean teacher is employed in [42] to produce accurate labels instead of label ensembles. Generally speaking, the above consistency-based methods apply perturbations to the input image and then minimize the differences between their output predictions. These methods have proved to be effective at smoothing the feature manifold, and consequently improving the generalization performance of models. There are also some other techniques that aim to utilize the unlabeled data to improve image classification, including self-training [6, 20, 23, 46], data augmentation [2, 37] and so on. Though many SSL methods have been proposed for image classification, it is not trivial to transfer them to the task of object detection due to the complex architectural design and multi-task learning (classification and regression) nature of object detectors.

Object Detection is a fundamental task in computer vision. Current CNN-based object detectors can be categorized into anchor-based and anchor-free methods. Faster R-CNN [36] is a well-known and representative two-stage anchor-based detector. It consists of a region proposal network (RPN) and a region-wise prediction network (R-CNN) for detecting objects. Many works [1, 3, 4, 21, 24, 43] have been proposed to improve the performance of Faster RCNN. For anchor-free object detection, the state-of-the-art methods [13, 18, 30, 35, 44] mostly regard the center (e.g., the center point or part) of an object as a foreground to define positives, and then predict the distances from positives to the four sides of the object bounding box (BBox). For example, FCOS [44] takes all the pixels inside the BBox as positives, and uses these four distances and a centerness score to detect objects. CSP [30] defines only the center point of the object box as positive to detect pedestrians with fixed aspect ratio. FoveaBox [18] regards pixels in the middle part of an object as positives and learns four distances to perform detection. Without the need to set anchors, anchor-free detectors are much easier and more flexible to deploy in real applications.

Semi-Supervised Object Detection (SSOD). SSOD aims to improve the performance of object detectors by using larger-scale unlabeled data. Since the manual annotation of object labels is very expensive, producing pseudo-labels for unlabeled data is very attractive. In [34, 39, 52], the pseudo-labels are produced by ensembling the predictions from different data augmentations. STAC [38] uses both weak and strong augmentations for model training, where strong augmentations are only applied to unlabeled data while weak augmentations are used to produce stable pseudo-labels. UBA [31] employs the EMA teacher [42] for producing more accurate pseudo-labels. ISMT [48] fuses the current pseudo-labels with history labels via NMS, and uses multiple detection heads to improve the accuracy of pseudo-labels. Instant-Teaching [51] combines more powerful augmentations like Mixup and Mosaic into the training stage. Humble-Teacher [41] uses plenty of proposals and soft pseudo-labels for the unlabeled data. Certainty-aware pseudo-labels are tailored in [22] for object detection. E2E [47] uses a soft teacher mechanism for training with the unlabeled data. Almost all the above methods are built upon anchor-based detectors, e.g., Faster RCNN, which are not convenient to deploy in real applications with limited resources. Therefore, in this work we develop, for the first time to the best of our knowledge, an anchor-free SSOD method.

Figure 2. The pipeline of our proposed DenSe Learning (DSL) based SSOD method. The training data contain both labeled and unlabeled images. During each training iteration, a teacher model is employed to produce pseudo-labels for weakly augmented unlabeled images. In anchor-free based detectors like FCOS [44], each spatial location of the dense predictions will be assigned one label, and the model performance is sensitive to noisy pseudo-labels. To alleviate this problem, an Adaptive Filtering strategy is proposed to split the pseudo-labels into three types, including background, foreground and ignorable regions. Moreover, there exist some false positive cases, which have higher scores but are obviously wrong predictions. Thus, a MetaNet is proposed to refine these cases. To improve the model generalization capability, unlabeled images are patch-shuffled and consistency regularizations are applied on these images with different scales. For improving the stability and quality of pseudo-labels, the teacher model is updated by the student models via aggregation, called Aggregated Teacher. After obtaining the fine-grained pixel-wise pseudo-labels, the detector can be optimized by the final loss, which is the sum of $L_s$, $L_u$ and $L_{scale}$.

3. Methods

3.1. Preliminary

For the convenience of expression, we first provide some notations for the SSOD task. Suppose that we have two sets of data, a labeled set $X=\{X_i\}_{i=1}^{N_l}$ and an unlabeled set $U=\{U_i\}_{i=1}^{N_u}$, where $N_l$ and $N_u$ are the number of labeled and unlabeled images, respectively, and $N_u \gg N_l$. Each labeled image has annotations of category $p^{*}\in[0,C-1]$ ($C$ is the number of foreground classes) and annotations of bounding box (BBox) $t^{*}$. In an image, each region annotated by BBox and class label is called an instance. Without loss of generality, we take the anchor-free FCOS [44] detector as our baseline, which is composed of a ResNet50 [9] backbone, an FPN [26] neck and a dense head. To use both labeled and unlabeled data for training, the overall loss can be defined as follows:

$$L=L_{s}+\alpha L_{u} \tag{1}$$

where $L_s$ and $L_u$ denote the supervised loss and the unsupervised loss, respectively, and $\alpha$ is the hyper-parameter to control the contribution of unlabeled data.

Both the supervised and unsupervised losses are normalized by the corresponding number of positive pixels in each mini-batch as follows:

$$L_{s}=\frac{1}{N_{pos}}\sum_{i}\sum_{h,w}\Big(L_{cls}(X_{i,h,w})+\mathbbm{1}_{\{p^{*}_{h,w}\in[0,C-1]\}}L_{reg}(X_{i,h,w})+\mathbbm{1}_{\{p^{*}_{h,w}\in[0,C-1]\}}L_{center}(X_{i,h,w})\Big) \tag{2}$$

$$L_{u}=\frac{1}{N_{pos}}\sum_{i}\sum_{h,w}\Big(L_{cls}(U_{i,h,w})+\mathbbm{1}_{\{\bar{p}^{*}_{h,w}\in[0,C-1]\}}L_{reg}(U_{i,h,w})+\mathbbm{1}_{\{\bar{p}^{*}_{h,w}\in[0,C-1]\}}L_{center}(U_{i,h,w})\Big) \tag{3}$$

where $N_{pos}$ means the number of positive pixels in one mini-batch, $X_{i,h,w}$ means the predicted vector at spatial location $(h,w)$ from the $i$-th image, and $\bar{p}^{*}_{h,w}$ is the corresponding estimated pseudo-label at location $(h,w)$. $L_{cls}$, $L_{reg}$ and $L_{center}$ are the default losses used in FCOS [44]. $\mathbbm{1}_{\{\cdot\}}$ is the indicator function, which outputs 1 if condition $\{\cdot\}$ is satisfied and 0 otherwise.

In this paper, we propose a DenSe Learning (DSL) algorithm for bridging the gap between SSOD and anchor-free detectors. The pipeline of our DSL method is illustrated in Figure 2. It is mainly composed of an Adaptive Filtering (AF) strategy, a MetaNet, an Aggregated Teacher (AT) and an Uncertainty-Consistency regularization term, which are introduced in detail in the following sections.
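As a rough illustration of Eqs. 1-3, the sketch below sums per-pixel losses, applies the regression and centerness terms only at foreground pixels, and divides by the number of positive pixels. It is a minimal plain-Python sketch rather than the authors' implementation: the function names `normalized_dense_loss` and `overall_loss`, the flat per-pixel lists standing in for dense tensors, and the guard for a mini-batch without positive pixels are all assumptions made for illustration.

```python
def normalized_dense_loss(cls_losses, reg_losses, center_losses, labels, num_classes):
    """Eq. 2 / Eq. 3 over a flat list of pixels (hypothetical helper).

    labels[i] in [0, num_classes - 1] marks a foreground pixel;
    any other value (e.g. num_classes) marks background.
    """
    total = 0.0
    n_pos = 0
    for cls, reg, ctr, label in zip(cls_losses, reg_losses, center_losses, labels):
        total += cls  # the classification loss is applied at every pixel
        if 0 <= label <= num_classes - 1:
            total += reg + ctr  # indicator: regression and centerness on foreground only
            n_pos += 1
    # Assumption: fall back to 1 when the batch has no positive pixel.
    return total / max(n_pos, 1)


def overall_loss(loss_s, loss_u, alpha):
    """Eq. 1: L = L_s + alpha * L_u."""
    return loss_s + alpha * loss_u
```

With three pixels labeled `[0, 2, 3]` and `num_classes=3`, the last pixel is background, so only its classification loss contributes and `N_pos` is 2.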
Figure 3. The distributions of TP+, TP- and BG when using 10% labeled data on COCO. 'TP+' means that the estimated instance has the same class ID as the ground-truth (GT) and the IOU of BBox is above 0.5. 'TP-' means that the estimated instance has the same class ID as GT but the IOU of BBox is below 0.5. 'BG' means that the estimated instance belongs to the background or has wrong class ID.

Figure 4. (a) The estimated classification-false-positive instances which have high scores but are obvious false predictions in category. (b) Our proposed MetaNet for refining the pseudo-labels of instances. '√' and '×' mean retention and deletion, respectively.

3.2. Adaptive Filtering Strategy

The FCOS [44] detector reduces the dependency on predefined anchors by introducing dense pixel-wise supervision. Though this is helpful for easy deployment in actual applications, the performance of the model is sensitive to the quality of pixel-wise labels. Because the predicted pseudo-labels in SSOD will have noise no matter how powerful the detector is, the pixel-wise supervision for FCOS should be treated prudently. To this end, we propose an Adaptive Filtering (AF) strategy to carefully handle the pseudo-labels for dense learning.

To exploit the unlabeled data, we need to assign a pseudo-label to each pixel in the output dense tensor. As shown in Figure 3, however, we can see that the TP+, TP- and BG instances coexist with each other, and their distributions are much more complex. If we simply use a single threshold to define foreground and background, many instances will be assigned wrong labels, resulting in heavy noise and damaging the learning of an accurate detector. For example, if we set a relatively higher threshold 0.4 to define the positive instances, there will be many TP+ and TP- wrongly assigned to the background. Conversely, if we set a relatively lower threshold 0.1 to define the background instances, there will be many BG instances wrongly assigned to the foreground. Therefore, we propose to use multiple thresholds $\{\tau_1,\tau_2\}$ to partition the estimated instances into three parts: background, ignorable region and foreground:

$$\bar{p}^{*}_{h,w}=\begin{cases}\text{Foreground: }[0,\cdots,C-1], & p_{h,w}\geq\tau_{2},\\ \text{Ignorable Region: }[-1], & \tau_{1}<p_{h,w}<\tau_{2},\\ \text{Background: }[C], & p_{h,w}\leq\tau_{1}.\end{cases} \tag{4}$$

Different from foreground and background regions, we ignore the gradient computation and propagation for ignorable regions as:

$$L_{u}=\frac{1}{N_{pos}}\sum_{i}\sum_{h,w}\Big(\mathbbm{1}_{\{\bar{p}^{*}_{h,w}\geq 0\}}L_{cls}(U_{i,h,w})+\mathbbm{1}_{\{\bar{p}^{*}_{h,w}\in[0,C-1]\}}L_{reg}(U_{i,h,w})+\mathbbm{1}_{\{\bar{p}^{*}_{h,w}\in[0,C-1]\}}L_{center}(U_{i,h,w})\Big). \tag{5}$$

$\tau_1$ in Eq. 4 is used to filter out the background and thus it is relatively easy to set. We set $\tau_1=0.1$ throughout our experiments. $\tau_2$ is employed to filter out the foreground and it is harder to set for different classes. We propose to use a class-adaptive $\tau_2^k$ instead of a fixed $\tau_2$:

$$\tau_{2}^{k}=\left(\frac{\sum_{h,w}\mathbbm{1}_{\{\bar{p}^{*}_{h,w}==k\}}\,p_{h,w}}{N_{pos}}\right)^{\beta}\tau, \tag{6}$$

where $\tau_2^k$ is the threshold for the $k$-th class, $\beta=0.7$ is used to control the degree of focus on tail-classes, and $\tau=0.35$ is used as a fixed reference threshold.

Remarks: Different from those anchor-based detectors, anchor-free detectors will predict each pixel as either background or foreground, and compute gradients for all of them. However, for unlabeled data, instances with scores within the interval $[\tau_1,\tau_2^k]$ are noisy and confusing, and treating them as either foreground or background will degrade the detection performance. Therefore, in anchor-free SSOD we should explicitly set multiple fine-grained thresholds to identify not only the background and foreground but also the ignorable regions. The proposed AF strategy can handle this problem well and assign fine-grained and multi-level labels to the dense pixels, as illustrated in Figure 2. We experimentally demonstrate that the AF strategy is very important for anchor-free SSOD.
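The three-way partition of Eq. 4 and the class-adaptive threshold of Eq. 6 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the released DSL code: the names `assign_pseudo_label`, `class_adaptive_threshold` and `IGNORE` are hypothetical, the predicted class is assumed to be supplied by the caller (for example the arg-max class of the teacher), and the sketch takes the pseudo-labels that feed Eq. 6 as an input because this section does not spell out how they are bootstrapped. It also assumes `n_pos > 0`.

```python
IGNORE = -1  # label of the ignorable region in Eq. 4


def assign_pseudo_label(score, predicted_class, tau1, tau2, num_classes):
    """Eq. 4: map one pixel's score to a foreground class, IGNORE, or background."""
    if score >= tau2:
        return predicted_class  # foreground: a class index in [0, C - 1]
    if score > tau1:
        return IGNORE           # ignorable region: excluded from the loss by Eq. 5
    return num_classes          # background is encoded as C


def class_adaptive_threshold(scores, pseudo_labels, k, n_pos, beta=0.7, tau=0.35):
    """Eq. 6: tau_2^k = (sum of scores of pixels labeled k / N_pos) ** beta * tau."""
    class_score_sum = sum(s for s, label in zip(scores, pseudo_labels) if label == k)
    return (class_score_sum / n_pos) ** beta * tau
```

Because the ratio inside the parentheses is at most 1 for scores in [0, 1], classes that collect little score mass receive a threshold below the reference value `tau`, which is how the exponent `beta` shifts focus toward tail classes.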

In Eq. 4, $p_{h,w}$ is the predicted score at location $(h,w)$ (if not specified, it is the product of the classification score and the centerness score), and $\bar{p}^{*}_{h,w}$ is the corresponding pseudo-label.

3.3. MetaNet

Though AF has the ability to improve the quality of pseudo-labels for dense learning, there still exist some classification-false-positive instances, which have high scores but are obvious false predictions, as shown in Figure 4(a). In order to handle these instances, we resort to using a MetaNet, as shown in Figure 4(b). We use a ResNet50 to implement the MetaNet. Before DSL training, we first pass all the labeled instances into the MetaNet and compute the following class-wise proxies $m_k$:

$$m_{k}=\frac{\sum_{i}f_{i,k}}{N_{k}}, \tag{7}$$

where $f_{i,k}$ is the 1-D feature vector of the $i$-th instance belonging to the $k$-th class, and $N_k$ is the number of instances of the $k$-th class. After obtaining the class-wise proxies, we refine the pseudo-labels by computing the cosine distance between the feature vector of the unlabeled instance and the corresponding class proxy vector. If the distance is smaller than a threshold $d=0.6$, we will change the label 'Foreground' of this instance to the label 'Ignorable Region'.

Remarks: MetaNet is employed to rectify the predicted foreground class labels of those error-prone instances. It only performs the meta update step and thus can work in a plug-and-play manner. The computation of MetaNet only involves the class proxy update on the labeled instances without gradient back-propagation, and thus it is fast and the cost is negligible compared with the training of DSL. With the help of stable class proxies, we can successfully remove many classification-false-positive instances.

3.4. Aggregated Teacher

In pseudo-label based methods, the stability and quality of the predicted pseudo-labels are important to the final performance. Therefore, almost all the existing anchor-based methods [22, 31, 41, 47, 48] employ an EMA Teacher to improve the quality of pseudo-labels for the unlabeled data. As illustrated in Figure 5(a), EMA is usually performed in the following manner:

$$\theta'^{\,t}=\epsilon\,\theta'^{\,t-1}+(1-\epsilon)\,\theta^{t}, \tag{8}$$

where $\epsilon$ is a smoothing hyperparameter, $t$ means the iteration, and $\theta$ and $\theta'$ are the parameters of the student and teacher models, respectively.

Figure 5. The illustration of (a) EMA Teacher and (b) our Aggregated Teacher. EMA teacher performs aggregation only over parameters, while our Aggregated teacher performs aggregation over both parameters and layers.

EMA update aims to obtain a more stable and powerful teacher model via the ensemble of students. However, such an update in Eq. 8 might still be coarse and weak because it only aggregates parameters in the same layer at different iterations, without considering the correlation across layers. To further enhance the capability of the teacher model, motivated by the dense aggregation mechanism [12, 49, 50], we introduce an Aggregated Teacher (AT), which performs not only parameter aggregation across time but also recurrent layer aggregation across layers, as illustrated in Figure 5(b). Specifically, for parameter aggregation, we still adopt the existing EMA update as in Eq. 8. For layer aggregation, to avoid introducing a heavy parameter overhead, we follow the recurrent learning [11, 25, 50] and use a recurrent layer aggregation mechanism as below:

$$x_{l+1}=\theta_{l+1}[x_{l}+h_{l}]+x_{l}, \tag{9}$$

$$h_{l+1}=g_{2}[g_{1}[\theta_{l+1}[x_{l}+h_{l}]]+h_{l}], \tag{10}$$

where $x_l$ is the $l$-th layer's tensor in the CNN and $\theta_l$ denotes the corresponding convolution parameters. $h_l$ is the hidden state tensor for the $l$-th layer, and $h_1$ is initialized with zero. $g_1$ and $g_2$ are the corresponding $1\times1$ and $3\times3$ Conv layers used for recurrent computing, which are parameter-shared across the adjacent layers within the same stage. $*[\cdot]$ indicates the convolution operation between input tensor '$\cdot$' and parameter '$*$'. By using the recurrent mechanism, the number of introduced parameters is negligible. One can see from Eq. 9 that it will degrade to the default residual unit of ResNet when the hidden state $h_{l-1}$ is removed. In other words, the recurrent layer aggregation can be easily applied to the current residual CNN models. Moreover, since the neck and heads in the detector are very shallow, we only perform layer aggregation over the backbone.

Remarks: Since the parameter aggregation in EMA Teacher treats each layer independently, the relationship between layers might be destroyed during aggregation, and thus one aggregated layer may not work well with the adjacent ones. Therefore, layer aggregation is considered in our model. By explicitly using the hidden state to connect the current layer with the previous layers, the knowledge propagation will be more stable and accurate. Moreover, the shared recurrent layers impose regularization over the propagated information. Compared with EMA Teacher, the Aggregated Teacher is able to produce more stable and accurate pseudo-labels for dense learning.

3.5. Uncertainty Consistency

By using the proposed AF, MetaNet and AT, the dense pixel-wise pseudo-labels can be obtained to supervise the learning of SSOD models by optimizing the loss $L_u$. In order to further improve the generalization capability of the SSOD model, we propose to regularize the uncertainty consistency over the unlabeled images. From Figure 6, one
Algorithm 1: Patch Shuffle
Input: Unlabeled image U;
Output: Patch shuffled image U_p;
Initialization: U^0 = U, total iteration number J;
for j = 0, ..., J - 1 do
    (1) Mode m: randomly select a mode from ['horizontal', 'vertical'];
    (2) Normalized size s: randomly generate s from interval [0, 1];
    (3) Crop U^j into two parts based on mode m and normalized size s;
    (4) Shuffle the order of the two parts, and concatenate them into a new image Û^j;
    (5) U^{j+1} = Û^j;
end
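Algorithm 1 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name `patch_shuffle` is hypothetical, the image is represented as a non-empty list of pixel rows instead of a tensor, 'horizontal' is taken to mean a left/right split and 'vertical' a top/bottom split, and "shuffle the order of the two parts" is read as swapping them. The default of two iterations follows the value of J quoted in the implementation details.

```python
import random


def patch_shuffle(image, iterations=2, rng=None):
    """Swap two image parts along a random axis, repeated `iterations` times.

    `image` is a non-empty list of rows, each row a list of pixel values.
    The input is not modified; a new list of rows is returned.
    """
    rng = rng or random.Random()
    current = [list(row) for row in image]
    for _ in range(iterations):
        mode = rng.choice(["horizontal", "vertical"])  # step (1)
        s = rng.random()                               # step (2): normalized size
        height, width = len(current), len(current[0])
        if mode == "horizontal":
            cut = int(s * width)   # step (3): split into left and right parts
            # step (4): swap the parts and concatenate them row by row
            current = [row[cut:] + row[:cut] for row in current]
        else:
            cut = int(s * height)  # step (3): split into top and bottom parts
            current = current[cut:] + current[:cut]    # step (4)
    return current                 # corresponds to U^J in Algorithm 1
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when comparing runs.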
Figure 6. Illustration of the uncertainty consistency regularization
among scales. The input images come from the same unlabeled
image Ui .
can see that the input consists of a pair of images: the Strong & Patch Augmented image ($U_{sp}$) and the corresponding Down-sampled image ($U_d$). The downsampling ratio is set to $r=2$ in producing $U_d$. By patch shuffle augmentation, we randomly crop an image into several parts along the horizontal or vertical directions and then shuffle these parts (the detailed algorithm can be found in Algorithm 1). Both images will be fed into our detector, producing dense score maps at different scale levels. (In FCOS, there are 5 levels, i.e., $v\in[1,\cdots,5]$.)

To improve the generalization performance of SSOD, we adopt the following regularization loss:

$$L_{scale}=\sum_{v=1}^{4}\left\|p^{v}[U_{d}]-p^{v+1}[U_{sp}]\right\|_{2}^{2}, \tag{11}$$

where $p^{v}[U_{*}]$ indicates the score map $p^{v}$ derived from image $U_{*}$. Since the downsampling ratio is $r=2$, $p^{v}[U_d]$ has the same resolution as $p^{v+1}[U_{sp}]$, and they are constrained to be consistent.

Remarks: The output dense score maps reveal the uncertainty or the reliability of the predicted label for each pixel. The lower the score is, the higher the uncertainty that the pixel belongs to a foreground object. Data uncertainty has been widely used to indicate the data importance in previous works [6, 10, 15, 16, 45]. In this paper, we regularize the uncertainty consistency. Patch shuffle is used to reduce the dependency of foreground objects on their surrounding contexts, improving the model robustness to context variations. In addition, to ensure consistent outputs among scales, $L_{scale}$ is then defined to improve the model robustness to object scaling variations.

So far, all the components of our DSL have been described, and the overall pipeline is shown in Figure 2.

4. Experiments

Datasets & Evaluation Metrics: We conduct experiments on the popular object detection benchmarks, including MS-COCO [27] and PASCAL-VOC [8]. MS-COCO contains more than 118k labeled images, and there are about 850k instances from 80 classes. In addition, there are 123k unlabeled images provided for semi-supervised learning. VOC07 contains 5,011 training images from 20 classes, while VOC12 has 11,540 training images.

On MS-COCO, we follow the settings in STAC [38] and evaluate with both the protocols of Partially Labeled Data and Fully Labeled Data. The former randomly samples 1%, 2%, 5% and 10% of the training data as labeled data, and treats the remainder as unlabeled data. (For this protocol, we create 3 data folds and report the mean results over them.) The latter uses all the training data as labeled data and the additional unlabeled data as unlabeled samples. We adopt the mean average precision AP50:90 (denoted by mAP) as the evaluation metric.

For experiments on PASCAL-VOC07, following STAC [38], we use the VOC07 training set as the labeled data, and either the VOC12 training set alone or the VOC12 training set together with the images from the same 20 classes in MS-COCO (denoted by COCO20) as the unlabeled data. We adopt the VOC default AP50 metric and the COCO default mAP metric as the evaluation metrics.

Implementation Details: We adopt the popular anchor-free detector FCOS [44] with ResNet50 [9] as backbone, and FPN [26] as neck and dense heads. Images in MS-COCO are resized to have shorter edge 800, or 640 if the longer edge is less than 1,333. Images in PASCAL-VOC are resized to have shorter edge 600, or 480 if the longer edge is less than 1,000. For fair comparison, following [31, 38], in all experiments, random flip is used as weak augmentation, while strong augmentation includes random flip, color jittering and cutout. The iteration number J is set to 2 in Patch Shuffle. For training configurations, learning rate starts from
Table 1. The mAP performance (%) of competing methods on the MS-COCO [27] dataset. The used protocol is Partially Labeled Data. † means that the method uses a larger batch size 32 or 40, and ‡ indicates that strong augmentation is applied on the labeled data. Note that †, ‡ are not the default settings in STAC [38] but they will improve the performance of both supervised baseline and SSOD. 'Supervised' means that only the corresponding labeled data are used for training, and this is set as the baseline for SSOD.

Detector type  Methods           Deployment  1%             2%             5%             10%
Anchor-based   Supervised [38]   Hard        9.05 ± 0.16    12.70 ± 0.15   18.47 ± 0.22   23.86 ± 0.81
Anchor-based   CSD [14]          Hard        11.12 ± 0.15   14.15 ± 0.13   18.79 ± 0.13   24.50 ± 0.15
Anchor-based   STAC [38]         Hard        13.97 ± 0.35   18.25 ± 0.25   24.38 ± 0.12   28.64 ± 0.21
Anchor-based   IT [51]           Hard        16.00 ± 0.20   20.70 ± 0.30   25.50 ± 0.05   29.45 ± 0.15
Anchor-based   ISMT [48]         Hard        18.88 ± 0.74   22.43 ± 0.56   26.37 ± 0.24   30.53 ± 0.52
Anchor-based   Humble [41]       Hard        16.96 ± 0.38   21.72 ± 0.24   27.70 ± 0.15   31.60 ± 0.28
Anchor-based   UB† [31]          Hard        20.75 ± 0.12   24.30 ± 0.97   28.27 ± 0.11   31.50 ± 0.10
Anchor-based   E2E†‡ [47]        Hard        20.46 ± 0.39   -              30.74 ± 0.08   34.04 ± 0.14
Anchor-free    Supervised(Ours)  Easy        9.53 ± 0.23    11.71 ± 0.26   18.74 ± 0.18   23.70 ± 0.22
Anchor-free    DSL(Ours)         Easy        22.03 ± 0.28   25.19 ± 0.37   30.87 ± 0.24   36.22 ± 0.18

0.01 and is divided by 10 at 16 and 22 epochs. The max epoch is 24. $\alpha$ is set to 3 and 1 for the partially and fully labeled protocols, respectively, and 2.5 for VOC. $\epsilon$ is set to 0.99. For parameter $\tau_2^k$, we set it within the range [0.25, 0.35]. All of our experiments are based on Pytorch [33] and MMDetection [7]. We use 8 NVIDIA-V100 GPUs with 32G memory per GPU. For each GPU, we randomly sample 2 images from the labeled set and the unlabeled set with ratio 1:1.

Table 2. The mAP performance (%) of competing methods on the MS-COCO [27] dataset. The used protocol is Fully Labeled Data.

Detector type  Methods      Deployment  100% (supervised baseline → SSOD, gain)
Anchor-based   STAC [38]    Hard        37.6 → 39.2 (+1.6)
Anchor-based   ISMT [48]    Hard        37.8 → 39.6 (+1.8)
Anchor-based   UB† [31]     Hard        40.2 → 41.3 (+1.1)
Anchor-based   E2E†‡ [47]   Hard        40.9 → 44.5 (+3.6)
Anchor-free    DSL(Ours)    Easy        40.2 → 43.8 (+3.6)

4.1. Comparison with State-of-the-Arts

We compare the proposed DSL with existing SOTA methods that are based on anchor-based detectors such as Faster-RCNN [36] and SSD [29]. The results are shown in Tables 1, 2 and 3.

From Table 1, one can see that under the supervised setting of the Partially Labeled Data protocol in COCO, our anchor-free detector achieves similar baseline performance to those anchor-based detectors, i.e., 9.53 vs. 9.05, 11.71 vs. 12.70, 18.74 vs. 18.47 and 23.7 vs. 23.86 with 1%, 2%, 5% and 10% labeled data, respectively. This means that anchor-free and anchor-based SSOD models are comparable when partially labeled data are used. After applying the proposed DSL algorithm, the SSOD performance can be significantly and consistently improved over the baselines under all protocols. DSL outperforms all the competing methods by a large margin, demonstrating the effectiveness and superiority of our method.

We also conduct experiments following the Fully Labeled Data protocol of COCO. The results are shown in Table 2. Since the reported performance of those supervised methods varies a lot in the original works, we report their results together with their baselines, and compare their relative performance improvements. From Table 2, one can see that our DSL achieves the largest performance improvement, i.e., 3.6 mAP gain. The results on PASCAL-VOC are listed in Table 3. We can see that the proposed DSL also achieves significant performance improvements over the supervised baselines as well as all the compared methods.

In summary, the results in Tables 1, 2 and 3 all demonstrate the effectiveness of our DSL method. It is worth mentioning that the proposed DSL is much easier to deploy in real applications due to its negligible pre/post-processing costs compared to anchor-based methods, showing the great potential value of the anchor-free SSOD algorithm.

4.2. Ablation Studies

To better understand how the proposed DSL works, we conduct a series of ablation studies under the MS-COCO 10% labeled data protocol.

Effectiveness of each component. The contributions of different components of DSL are listed in Table 4. From this table, one can see that by using AF, the performance can be significantly improved from 23.7 to 32.2 mAP, which has already surpassed most SOTA methods shown in Table 1. By adopting the MetaNet to refine the foreground pseudo-labels, the performance can be further improved to 32.5. By applying AT to encourage the stability and quality of the pseudo-labels, the performance is further improved to 34.5 mAP. Finally, by learning from shuffled patches and constraining the consistency among image scales, the overall model becomes more robust and exhibits higher accuracy, i.e., 36.2 mAP. The ablation studies in Table 4 verify the effectiveness of each module in DSL.

Ablation studies on AF. Table 5 shows the ablation studies on our AF strategy. In order to demonstrate the importance of multiple thresholds, we experiment with a
Table 3. The results (%) of competing methods on the PASCAL-VOC [8] dataset. The performances are evaluated on the VOC07 test set.
Unlabeled: VOC12 Unlabeled: VOC12 + COCO20
Methods Deployment
AP50 AP50:90 AP50 AP50:90
Supervised [38] Hard 72.75 42.04 72.75 42.04
CSD [14] Hard 74.7 - 75.1 -
STAC [38] Hard 77.45 44.64 79.08 46.01
Anchor-based
IT [51] Hard 78.3 48.7 79 49.7
ISMT [48] Hard 77.23 46.23 77.75 49.59
UB† [31] Hard 77.37 48.69 78.82 50.34
Supervised(Ours) Easy 69.6 45.9 69.6 45.9
Anchor-free
DSL (Ours) Easy 80.7 56.8 82.1 59.8

single threshold strategy as reference, where instances are regarded as foreground if their scores are above the threshold and background otherwise. One can see that the single threshold strategy cannot achieve satisfactory performance. The best result is only 30.7 mAP when the threshold is set to 0.2, indicating that many instances are wrongly labeled by a single threshold. In contrast, by using our multi-level threshold strategy, i.e., AF, the performance can be significantly improved: even by using a fixed τ2k = 0.3, the result can be improved to 36.0 mAP; and when the adaptive τ2k is used for each class, it can be further improved to 36.2 mAP, showing the effectiveness and importance of our AF strategy.

Table 4. Effectiveness of each component of the proposed DSL method. ‘+’ means training by the proposed method.

Methods           mAP
Supervised        23.7
+ AF              32.2
+ MetaNet         32.5
+ AT              34.5
+ Patch-Shuffle   34.9
+ L_scale         36.2

Table 5. Ablation studies on Adaptive Filtering.

Methods            Threshold   mAP
Single threshold   0.05        27.1
Single threshold   0.1         28.8
Single threshold   0.2         30.7
Single threshold   0.3         27.5
AF (fixed τ2k)     0.2         34.3
AF (fixed τ2k)     0.3         36.0
AF (fixed τ2k)     0.4         35.6
AF                 adaptive    36.2

Ablation studies on AT. From Table 6, one can see that layer aggregation (LA) achieves a higher performance gain than EMA because it considers the fine-grained relationships across layers, while EMA simply aggregates layer-wise parameters independently, so that the relationships between layers can be harmed. In addition, by employing both EMA and LA, our AT can further improve the performance to 36.2 mAP. This implies that aggregations over parameters and layers are actually complementary.

Table 6. Ablation studies on Aggregated Teacher. ‘LA’ means layer aggregation.

Methods   No teacher   + EMA   + LA   AT
mAP       33.0         34.1    35.0   36.2

Ablation studies on loss weight α. From Table 7, one can see that the performance peaks around α = 3. Too large a weight, such as α = 4, gives the unlabeled images too much influence in training and hence reduces the stability of the model.

Table 7. Ablation studies on loss weight α for unlabeled data. ‘fail’ means that the training loss will easily get to ‘nan’.

α     1      2      3      4
mAP   33.9   35.4   36.2   fail

Discussions. In anchor-based SSOD, the negative/ignorable instances have been implicitly handled by the label assigner and sampler, and we only need to consider how to recall the foreground instances via a threshold. In contrast, in anchor-free SSOD the multi-level pseudo-labels should be explicitly considered due to the pixel-wise gradient propagation. This is demonstrated by our AF strategy in Table 5. Moreover, without the help of predefined anchors for scale variances, FPN [26] with a dense head has been widely used in anchor-free detectors to address the scaling issue. Thus L_scale can be generally adopted and regarded as a default trick in anchor-free SSOD, and this is verified to be effective in Table 4. In summary, most of our techniques are proposed by considering the special characteristics of anchor-free detectors, and our work in this paper makes the first step towards anchor-free SSOD.

5. Conclusion

In this paper, we made the first attempt, to the best of our knowledge, to bridge the gap between SSOD and anchor-free detectors, and developed a DSL-based SSOD method. The DSL was built upon several novel techniques, such as Adaptive Filtering, Aggregated Teacher and uncertainty regularization. Our experiments showed that the proposed DSL outperformed the state-of-the-art SSOD methods by a large margin on both COCO and VOC datasets. We expect that our work can inspire more in-depth explorations of anchor-free SSOD methods.
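The difference between the single-threshold reference and a multi-level assignment, as compared in the AF ablation, can be illustrated with a small sketch. It assumes a two-threshold rule in which scores below a lower threshold become background, scores at or above an upper threshold become foreground, and scores in between are ignored. The function names, the lower threshold of 0.1 and the example scores are hypothetical; the class-adaptive choice of τ2k and the MetaNet refinement are not modeled here.

```python
from typing import List

BACKGROUND, IGNORE, FOREGROUND = "background", "ignore", "foreground"


def single_threshold(scores: List[float], tau: float) -> List[str]:
    """Reference strategy: foreground if the score is above tau, background otherwise."""
    return [FOREGROUND if s > tau else BACKGROUND for s in scores]


def multi_level(scores: List[float], tau_low: float, tau_high: float) -> List[str]:
    """Assumed two-threshold rule with an ignorable band between the thresholds."""
    if tau_low > tau_high:
        raise ValueError("tau_low must not exceed tau_high")
    labels = []
    for s in scores:
        if s >= tau_high:
            labels.append(FOREGROUND)   # confident enough to supervise as an object
        elif s >= tau_low:
            labels.append(IGNORE)       # ambiguous: contributes no gradient
        else:
            labels.append(BACKGROUND)   # confident negative
    return labels


if __name__ == "__main__":
    # Hypothetical per-pixel teacher scores for one class.
    pixel_scores = [0.02, 0.15, 0.28, 0.60]
    print(single_threshold(pixel_scores, tau=0.2))
    print(multi_level(pixel_scores, tau_low=0.1, tau_high=0.3))
```

With these example values, the ambiguous scores 0.15 and 0.28 are no longer forced into background or foreground, which is the intuition behind handling negative and ignorable pixels explicitly in a dense, anchor-free detector.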

References

[1] Sean Bell, C Lawrence Zitnick, Kavita Bala, and Ross Girshick. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2874–2883, 2016.
[2] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. Mixmatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249, 2019.
[3] Zhaowei Cai, Quanfu Fan, Rogerio S Feris, and Nuno Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. In European conference on computer vision, pages 354–370. Springer, 2016.
[4] Zhaowei Cai and Nuno Vasconcelos. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6154–6162, 2018.
[5] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.
[6] Binghui Chen and Weihong Deng. Weakly-supervised deep self-learning for face recognition. In 2016 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2016.
[7] Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019.
[8] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[10] Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, and Sung Ju Hwang. Uncertainty-aware attention for reliable interpretation and prediction. arXiv preprint arXiv:1805.09653, 2018.
[11] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
[12] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
[13] Lichao Huang, Yi Yang, Yafeng Deng, and Yinan Yu. Densebox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874, 2015.
[14] Jisoo Jeong, Seungeui Lee, Jeesoo Kim, and Nojun Kwak. Consistency-based semi-supervised learning for object detection. Advances in neural information processing systems, 32:10759–10768, 2019.
[15] Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? arXiv preprint arXiv:1703.04977, 2017.
[16] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018.
[17] Kang Kim and Hee Seok Lee. Probabilistic anchor assignment with iou prediction for object detection. In ECCV, 2020.
[18] Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Lei Li, and Jianbo Shi. Foveabox: Beyound anchor-based object detection. IEEE Transactions on Image Processing, 29:7389–7398, 2020.
[19] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
[20] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, volume 3, page 896, 2013.
[21] Hyungtae Lee, Sungmin Eum, and Heesung Kwon. Me r-cnn: Multi-expert r-cnn for object detection. IEEE Transactions on Image Processing, 29:1030–1044, 2019.
[22] Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, and Larry S Davis. Rethinking pseudo labels for semi-supervised object detection. arXiv preprint arXiv:2106.00168, 2021.
[23] Xinzhe Li, Qianru Sun, Yaoyao Liu, Qin Zhou, Shibao Zheng, Tat-Seng Chua, and Bernt Schiele. Learning to self-train for semi-supervised few-shot classification. Advances in Neural Information Processing Systems, 32:10276–10286, 2019.
[24] Yanghao Li, Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. Scale-aware trident networks for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6054–6063, 2019.
[25] Tsungnan Lin, Bill G Horne, Peter Tino, and C Lee Giles. Learning long-term dependencies in narx recurrent neural networks. IEEE Transactions on Neural Networks, 7(6):1329–1338, 1996.
[26] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017.
[27] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
[28] Songtao Liu, Zeming Li, and Jian Sun. Self-emd: Self-supervised object detection without imagenet. arXiv preprint arXiv:2011.13677, 2020.
[29] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.
[30] Wei Liu, Shengcai Liao, Weiqiang Ren, Weidong Hu, and Yinan Yu. High-level semantic feature detection: A new perspective for pedestrian detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5187–5196, 2019.
[31] Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, and Peter Vajda. Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480, 2021.
[32] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018.
[33] Pytorch. https://pytorch.org/.
[34] Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He. Data distillation: Towards omni-supervised learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4119–4128, 2018.
[35] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
[36] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28:91–99, 2015.
[37] Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D Cubuk, Alex Kurakin, Han Zhang, and Colin Raffel. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. arXiv preprint arXiv:2001.07685, 2020.
[38] Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, and Tomas Pfister. A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757, 2020.
[39] Xiaolin Song, Binghui Chen, Pengyu Li, Biao Wang, and Honggang Zhang. Prnet++: Learning towards generalized occluded pedestrian detection via progressive refinement network. Neurocomputing, 2022.
[40] Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, et al. Sparse r-cnn: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14454–14463, 2021.
[41] Yihe Tang, Weifeng Chen, Yijun Luo, and Yuting Zhang. Humble teachers teach better students for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3132–3141, 2021.
[42] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems, 30, 2017.
[43] Wanxin Tian, Zixuan Wang, Haifeng Shen, Weihong Deng, Yiping Meng, Binghui Chen, Xiubao Zhang, Yuan Zhao, and Xiehe Huang. Learning better features for face detection with feature fusion and segmentation supervision. arXiv preprint arXiv:1811.08557, 2018.
[44] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9627–9636, 2019.
[45] Zhenyu Wang, Yali Li, Ye Guo, Lu Fang, and Shengjin Wang. Data-uncertainty guided multi-phase learning for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4568–4577, 2021.
[46] Qizhe Xie, Minh-Thang Luong, Eduard Hovy, and Quoc V Le. Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10687–10698, 2020.
[47] Mengde Xu, Zheng Zhang, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, and Zicheng Liu. End-to-end semi-supervised object detection with soft teacher. arXiv preprint arXiv:2106.09018, 2021.
[48] Qize Yang, Xihan Wei, Biao Wang, Xian-Sheng Hua, and Lei Zhang. Interactive self-training with mean teachers for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5941–5950, 2021.
[49] Fisher Yu, Dequan Wang, Evan Shelhamer, and Trevor Darrell. Deep layer aggregation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2403–2412, 2018.
[50] Jingyu Zhao, Yanwen Fang, and Guodong Li. Recurrence along depth: Deep convolutional neural networks with recurrent layer aggregation. Advances in Neural Information Processing Systems, 34, 2021.
[51] Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, and Hao Li. Instant-teaching: An end-to-end semi-supervised object detection framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4081–4090, 2021.
[52] Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Dogus Cubuk, and Quoc Le. Rethinking pre-training and self-training. Advances in Neural Information Processing Systems, 33, 2020.
