Varifocal Net
To overcome these shortcomings, we naturally would like to ask: instead of predicting an additional localization accuracy score, can we merge it into the classification score? That is, can we predict a localization-aware, or IoU-aware, classification score (IACS) that simultaneously represents the presence of a certain object class and the localization accuracy of a generated bounding box?

In this paper, we answer the above question and make the following contributions. (1) We show that accurately ranking candidate detections is critical for high-performing dense object detectors, and that the IACS achieves a better ranking than other measures (Section 3). (2) We propose a new Varifocal Loss for training dense object detectors to regress the IACS. (3) We design a new star-shaped bounding box feature representation for computing the IACS and refining the bounding box. (4) We develop a new dense object detector based on FCOS [9]+ATSS [12] and the proposed components, named VarifocalNet or VFNet for short, to exploit the advantage of the IACS. An illustration of our method is shown in Figure 1.

The Varifocal Loss, inspired by the focal loss [8], is a dynamically scaled binary cross entropy loss. However, it supervises the dense object detector to regress continuous IACSs, and, more distinctively, it adopts an asymmetric training example weighting scheme. It down-weights only negative examples to address the class imbalance problem during training, and up-weights high-quality positive examples to generate prime detections. This focuses the training on high-quality positive examples, which is important for achieving high detection performance.

The star-shaped bounding box feature representation uses the features at nine fixed sampling points (yellow circles in Figure 1) to represent a bounding box via deformable convolution [13, 14]. Compared to the point feature used in most existing dense object detectors [7–9, 15], this representation captures the geometry of the bounding box and its nearby contextual information, which is essential for predicting an accurate IACS. It also enables us to effectively refine the initially generated coarse bounding box without losing efficiency.

To verify the effectiveness of the proposed modules, we build the VFNet on top of FCOS+ATSS and evaluate it on the COCO benchmark [16]. Experiments show that our VFNet consistently exceeds the strong baseline by ∼2.0 AP with different backbones, and that our best model, VFNet-X-1200 with Res2Net-101-DCN, reaches a single-model single-scale 55.1 AP on COCO test-dev, surpassing previously published best single-model single-scale results.

2. Related Work

Object Detection. Currently popular object detectors can be categorized by whether or not they use anchor boxes. While popular two-stage methods [3, 4] and multi-stage methods [17] usually employ anchors to generate object proposals for downstream classification and regression, anchor-based one-stage methods [6–8, 12, 18, 19] directly classify and refine anchor boxes without object proposal generation. More recently, anchor-free detectors have attracted substantial attention due to their novelty and simplicity. One kind formulates object detection as a key-point or semantic-point detection problem, including CornerNet [20], CenterNet [21], ExtremeNet [22], ObjectsAsPoints [23] and RepPoints [24]. Another type of anchor-free detector is similar to anchor-based one-stage methods but removes the anchor boxes: it classifies each point on the feature pyramids [25] into foreground classes or background, and directly predicts the distances from a foreground point to the four sides of the ground-truth bounding box to produce the detection. Popular methods include DenseBox [26], FSAF [27], FoveaBox [15], FCOS [9], and SAPD [28]. We build our VFNet on the ATSS [12] version of FCOS due to its simplicity, high efficiency and excellent performance.

Detection Ranking Measures. In addition to the classification score, other detection ranking measures have been proposed. IoU-Net [10] adopts an additional network to predict the IoU and uses it to rank bounding boxes in NMS, but it still selects the classification score as the final detection score. Fitness NMS [29], IoU-aware RetinaNet [11] and [30] are similar to IoU-Net in essence, except that they multiply the predicted IoU or IoU-based ranking scores with the classification score as the ranking basis. Instead of predicting an IoU-based score, FCOS [9] predicts a centerness score to suppress low-quality detections.

By contrast, we predict only the IACS as the ranking score. This avoids the overhead of an additional network and the possibly worse ranking basis that results from multiplying imperfect localization and classification scores.

Encoding the Bounding Box. Extracting discriminative features to represent a bounding box is important for downstream classification and regression in object detection. In two-stage and multi-stage methods, RoI Pooling [2, 3] or RoIAlign [4] is widely employed to extract bounding box features, but applying these operations in dense object detectors is time-consuming. Instead, one-stage detectors generally use point features as the bounding box descriptor [7–9] for efficiency. However, such local features fail to capture the geometry of the bounding box and essential contextual information.

Alternatively, HSD [31] and RepPoints [24] extract features at learned semantic points with deformable convolution to encode a bounding box. However, learning to localize semantic points is challenging due to the lack of strong supervision, and the prediction of semantic points also aggravates the computation burden.
In comparison, our proposed star-shaped bounding box representation uses the features at nine fixed sampling points to describe a bounding box. It is simple, efficient, and yet able to capture the geometry of the bounding box and the spatial context cues around it.

Generalized Focal Loss. The most similar work to ours is a concurrent one, the Generalized Focal Loss (GFL) [32]. The GFL extends the focal loss [8] to a continuous version and trains a detector to predict a joint representation of localization quality and classification. We emphasize first that our varifocal loss is a distinct function from the GFL: it weights positive and negative examples asymmetrically, whereas the GFL deals with them equally, and experimental results show that our varifocal loss performs better than the GFL. Moreover, we propose a star-shaped bounding box feature representation to facilitate the prediction of the IACS, and further improve the object localization accuracy through a bounding box refinement step, neither of which is considered in the GFL.

3. Motivation

In this section, we investigate the performance upper bound of a popular anchor-free dense object detector, FCOS [9], identify its main performance hindrance and show the importance of producing the IoU-aware classification score as the ranking criterion.

FCOS is built on FPN [25] and its detection head has three branches. One predicts the classification score for each point on the feature map, one regresses the distances from the point to the four sides of a bounding box, and another predicts the centerness score, which is multiplied by the classification score to rank the bounding boxes in NMS. Figure 2 shows an example of the output from the FCOS head. In this paper, we actually study the ATSS version of FCOS (FCOS+ATSS), in which the Adaptive Training Sample Selection (ATSS) mechanism [12] is used to define foreground and background points on the feature pyramids during training. We refer the reader to [12] for more details.

Figure 2: An example of the output from the FCOS head, which includes a classification score, a bounding box and a centerness score.

To investigate the performance upper bound of the FCOS+ATSS (trained on COCO train2017 [16]), we alternately replace the predicted classification score, the distance offsets and the centerness score with corresponding ground-truth values for foreground points before NMS, and evaluate the detection performance in terms of AP [16] on COCO val2017. For the classification score vector, we implement two options, that is, replacing its element at the ground-truth label position with a value of 1.0 or with the IoU between the predicted bounding box and the ground-truth one (termed the gt IoU). We also consider replacing the centerness score with the gt IoU in addition to its true value.

Table 1: Performance of the FCOS+ATSS on COCO val2017 with different oracle predictions. w/ctr means using the centerness score in inference; gt ctr, gt ctr iou, gt bbox, gt cls and gt cls iou mean replacing the corresponding prediction with the ground-truth centerness, the gt IoU (as centerness), the ground-truth box, 1.0 at the ground-truth class, and the gt IoU at the ground-truth class, respectively.

FCOS+ATSS   (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)
w/ctr              X     X     X           X           X           X
gt ctr                   X
gt ctr iou                     X
gt bbox                              X     X
gt cls                                           X     X
gt cls iou                                                    X     X
AP          38.5  39.2  41.1  43.5  56.1  56.3  43.1  58.1  74.7  67.4

The results are shown in Table 1. We can see that the original FCOS+ATSS achieves 39.2 AP. When using the ground-truth centerness score (gt ctr) in inference, unexpectedly, only about 2.0 AP is gained. Similarly, replacing the predicted centerness score with the gt IoU (gt ctr iou) only achieves 43.5 AP. This indicates that ranking detections by the product of the classification score and either the predicted centerness score or an IoU score cannot bring a significant performance gain.

By contrast, the FCOS+ATSS with ground-truth bounding boxes (gt bbox) achieves 56.1 AP even without the centerness score (no w/ctr) in inference. If the classification score is instead set to 1.0 at the ground-truth label position (gt cls), whether or not the centerness score is used becomes important (43.1 AP vs 58.1 AP), because the centerness score can differentiate accurate and inaccurate boxes to some extent.

The most surprising result is the one obtained by replacing the classification score of the ground-truth class with the gt IoU (gt cls iou). Without the centerness score, this case achieves an impressive 74.7 AP, significantly higher than the other cases. This in fact reveals that, for most objects, accurately localized bounding boxes already exist in the large candidate pool. The key to achieving excellent detection performance is to accurately select those high-quality detections from the pool, and these results show that replacing the classification score of the ground-truth class with the gt IoU is the most promising selection measure. We refer to the element of such a score vector as the IoU-aware Classification Score (IACS).
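To make the gt cls iou oracle concrete, the sketch below (not code from the paper; the box_iou helper, the corner box format and the COCO class count of 80 are assumptions made for illustration) builds the IACS-style score vector for one foreground point: the gt IoU at the ground-truth class index, zero elsewhere.

```python
import numpy as np

def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2); assumed helper."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def iacs_oracle(pred_box, gt_box, gt_label, num_classes=80):
    """gt cls iou oracle: gt IoU at the ground-truth class, 0 elsewhere."""
    scores = np.zeros(num_classes, dtype=np.float32)
    scores[gt_label] = box_iou(pred_box, gt_box)
    return scores

# A well-localized candidate of class 17 receives a high ranking score ...
print(iacs_oracle([10, 10, 50, 50], [12, 10, 52, 50], gt_label=17)[17])  # ~0.90
# ... while a poorly localized candidate of the same class is ranked low.
print(iacs_oracle([10, 10, 30, 30], [12, 10, 52, 50], gt_label=17)[17])  # ~0.22
```

Ranking candidates by such a vector is exactly what yields the 74.7 AP oracle result above.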
4. VarifocalNet

Based on the discovery above, we propose to learn the IoU-aware classification score (IACS) to rank detections. To this end, we build a new dense object detector, coined VarifocalNet or VFNet, based on the FCOS+ATSS with the centerness branch removed. Compared with the FCOS+ATSS, it has three new components: the varifocal loss, the star-shaped bounding box feature representation and the bounding box refinement.

4.1. IACS – IoU-Aware Classification Score

We define the IACS as a scalar element of a classification score vector, in which the value at the ground-truth class label position is the IoU between the predicted bounding box and its ground truth, and 0 at the other positions.

4.2. Varifocal Loss

We design the novel Varifocal Loss for training a dense object detector to predict the IACS. Since it is inspired by the focal loss [8], we first briefly review the focal loss. The focal loss is designed to address the extreme imbalance between foreground and background classes during the training of dense object detectors. It is defined as:

\mathrm{FL}(p, y) = \begin{cases} -\alpha (1-p)^{\gamma} \log(p) & \text{if } y = 1 \\ -(1-\alpha)\, p^{\gamma} \log(1-p) & \text{otherwise,} \end{cases} \quad (1)

where y ∈ {±1} specifies the ground-truth class and p ∈ [0, 1] is the predicted probability for the foreground class. As shown in Equation 1, the modulating factor ((1 − p)^γ for the foreground class and p^γ for the background class) reduces the loss contribution from easy examples and relatively increases the importance of mis-classified examples. Thus, the focal loss prevents the vast number of easy negatives from overwhelming the detector during training and focuses the detector on a sparse set of hard examples.

We borrow the example weighting idea from the focal loss to address the class imbalance problem when training a dense object detector to regress the continuous IACS. However, unlike the focal loss, which deals with positives and negatives equally, we treat them asymmetrically. Our varifocal loss is also based on the binary cross entropy loss and is defined as:

\mathrm{VFL}(p, q) = \begin{cases} -q \left( q \log(p) + (1-q) \log(1-p) \right) & q > 0 \\ -\alpha\, p^{\gamma} \log(1-p) & q = 0, \end{cases} \quad (2)

where p is the predicted IACS and q is the target score. For a foreground point, q for its ground-truth class is set to the IoU between the generated bounding box and its ground truth (the gt IoU) and to 0 otherwise, whereas for a background point, the target q for all classes is 0 (see Figure 1).

As Equation 2 shows, the varifocal loss only reduces the loss contribution from negative examples (q = 0), by scaling their losses with a factor of p^γ, and does not down-weight positive examples (q > 0) in the same way. This is because positive examples are extremely rare compared with negatives and we should preserve their precious learning signals. On the other hand, inspired by PISA [33] and [34], we weight each positive example with its training target q. If a positive example has a high gt IoU, its contribution to the loss will thus be relatively large. This focuses the training on those high-quality positive examples, which are more important for achieving a higher AP than the low-quality ones.

To balance the losses between positive and negative examples, we add an adjustable scaling factor α to the negative loss term.
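For reference, a minimal element-wise NumPy sketch of Equation 2 follows. It is not the authors' MMDetection implementation; the clipping epsilon and the defaults α = 0.75, γ = 2.0 (the best values found in Section 5.1.1) are choices made only for illustration.

```python
import numpy as np

def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Element-wise varifocal loss (Equation 2).

    p: predicted IACS in (0, 1), shape (N, C)
    q: target score, the gt IoU at the gt class of a foreground point, 0 elsewhere
    """
    p = np.clip(p, eps, 1.0 - eps)
    bce = -(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))  # binary cross entropy
    loss = np.where(
        q > 0,
        q * bce,                                     # positives: weighted by the target q
        alpha * (p ** gamma) * (-np.log(1.0 - p)),   # negatives: down-weighted by p^gamma
    )
    return loss.sum()

# Toy check: a high-quality positive (q = 0.9) contributes more than a low-quality one (q = 0.3).
p = np.array([[0.6, 0.6]]); q = np.array([[0.9, 0.3]])
print(varifocal_loss(p, q))
```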
4.3. Star-Shaped Box Feature Representation

We design a star-shaped bounding box feature representation for IACS prediction. It uses the features at nine fixed sampling points (yellow circles in Figure 1) to represent a bounding box with the deformable convolution [13, 14]. This new representation can capture the geometry of a bounding box and its nearby contextual information, which is essential for encoding the misalignment between the predicted bounding box and the ground-truth one.

Specifically, given a sampling location (x, y) on the image plane (or a projected point on the feature map), we first regress an initial bounding box from it with a 3×3 convolution. Following FCOS, this bounding box is encoded by a 4D vector (l', t', r', b') denoting the distances from the location (x, y) to the left, top, right and bottom sides of the bounding box, respectively. With this distance vector, we heuristically select nine sampling points at (x, y), (x−l', y), (x, y−t'), (x+r', y), (x, y+b'), (x−l', y−t'), (x+r', y−t'), (x−l', y+b') and (x+r', y+b'), and then map them onto the feature map. Their relative offsets to the projected point of (x, y) serve as the offsets of the deformable convolution [13, 14], and the features at these nine projected points are then convolved by the deformable convolution to represent a bounding box. Since these points are selected manually, without an additional prediction burden, our new representation is computationally efficient.
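The bookkeeping for turning the nine star points into deformable-convolution offsets could look like the sketch below. It is an assumption-laden illustration: the stride handling, the point ordering and the exact offset channel convention of the deformable-convolution operator are not fixed by the text above and differ between implementations.

```python
def star_offsets(x, y, l, t, r, b, stride):
    """Nine star sampling points around (x, y) given box distances (l', t', r', b'),
    returned as offsets w.r.t. the regular 3x3 grid on the feature map."""
    # Nine points on the image plane: corners, side midpoints and the centre.
    points = [
        (x - l, y - t), (x,     y - t), (x + r, y - t),
        (x - l, y    ), (x,     y    ), (x + r, y    ),
        (x - l, y + b), (x,     y + b), (x + r, y + b),
    ]
    # Regular 3x3 grid cells a standard convolution would sample, in the same order.
    base = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    offsets = []
    for (px, py), (bx, by) in zip(points, base):
        # Project the point onto the feature map and take its shift from the grid cell.
        fx, fy = (px - x) / stride, (py - y) / stride
        offsets.append((fy - by, fx - bx))  # (dy, dx) pairs; channel order is implementation-dependent
    return offsets

# Example: a point at (100, 60) with distances l'=40, t'=20, r'=24, b'=36 on an FPN level of stride 8.
print(star_offsets(100, 60, 40, 20, 24, 36, stride=8))
```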
4.4. Bounding Box Refinement

We further improve the object localization accuracy through a bounding box refinement step. Bounding box refinement is a common technique in object detection [17, 35]; however, it is not widely adopted in dense object detectors due to the lack of an efficient and discriminative object descriptor. With our new star representation, we can now adopt it in dense object detectors without losing efficiency.

We model the bounding box refinement as a residual learning problem. For an initially regressed bounding box (l', t', r', b'), we first extract the star-shaped representation to encode it. Then, based on this representation, we learn four distance scaling factors (∆l, ∆t, ∆r, ∆b) to scale the initial distance vector, so that the refined bounding box represented by (l, t, r, b) = (∆l×l', ∆t×t', ∆r×r', ∆b×b') is closer to the ground truth.
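A minimal sketch of this residual refinement step is given below, assuming a simple (x1, y1, x2, y2) corner output and illustrative function names rather than the paper's code.

```python
def refine_box(x, y, dists, scales):
    """Scale the initial distances (l', t', r', b') at point (x, y) by the learned
    factors (dl, dt, dr, db) and return the refined box as (x1, y1, x2, y2)."""
    (l0, t0, r0, b0), (dl, dt, dr, db) = dists, scales
    l, t, r, b = dl * l0, dt * t0, dr * r0, db * b0  # refined distance vector
    return x - l, y - t, x + r, y + b

# Initial box regressed at (100, 60) ...
print(refine_box(100, 60, (40, 20, 24, 36), (1.0, 1.0, 1.0, 1.0)))   # (60, 40, 124, 96)
# ... and the refined box after predicting scaling factors close to 1.
print(refine_box(100, 60, (40, 20, 24, 36), (1.1, 0.9, 1.05, 1.0)))  # (56.0, 42.0, 125.2, 96.0)
```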
Figure 3: The network architecture of our VFNet. The VFNet is built on the FPN (P3-P7). Its head consists of two subnetworks, one for regressing the initial bounding box and refining it, and the other for predicting the IoU-aware classification score based on a star-shaped bounding box feature representation (Star Dconv). H×W denotes the size of the feature map.

4.5. VarifocalNet

Attaching the above three components to the FCOS network architecture and removing the original centerness branch, we get the VarifocalNet. Figure 3 illustrates the network architecture of the VFNet. The backbone and FPN parts of the VFNet are the same as those of FCOS; the difference lies in the head structure. The VFNet head consists of two subnetworks. The localization subnet performs bounding box regression and subsequent refinement. It takes as input the feature map from each level of the FPN and first applies three 3×3 conv layers with ReLU activations, producing a feature map with 256 channels. One branch of the localization subnet convolves this feature map again and then outputs a 4D distance vector (l', t', r', b') per spatial location, which represents the initial bounding box. Given the initial box and the feature map, the other branch applies a star-shaped deformable convolution to the nine feature sampling points and produces the distance scaling factors (∆l, ∆t, ∆r, ∆b), which are multiplied by the initial distance vector to generate the refined bounding box (l, t, r, b).

The other subnet aims to predict the IACS. It has a similar structure to the localization subnet (the refinement branch), except that it outputs a vector of C (the class number) elements per spatial location, where each element jointly represents the object presence confidence and the localization accuracy.

4.6. Loss Function and Inference

Loss Function. The training of our VFNet is supervised by the loss function:

\mathrm{Loss} = \frac{1}{N_{pos}} \sum_{i} \sum_{c} \mathrm{VFL}(p_{c,i}, q_{c,i}) + \frac{\lambda_0}{N_{pos}} \sum_{i} q_{c^*,i}\, L_{bbox}(bbox'_i, bbox^*_i) + \frac{\lambda_1}{N_{pos}} \sum_{i} q_{c^*,i}\, L_{bbox}(bbox_i, bbox^*_i) \quad (3)

where p_{c,i} and q_{c,i} denote the predicted and target IACS, respectively, for class c at location i on each FPN feature level. L_{bbox} is the GIoU loss [36], and bbox'_i, bbox_i and bbox^*_i denote the initial, refined and ground-truth bounding boxes, respectively. We weight L_{bbox} with the training target q_{c^*,i}, which is the gt IoU for foreground points and 0 otherwise, following FCOS. λ_0 and λ_1 are the balance weights for L_{bbox} and are empirically set to 1.5 and 2.0, respectively, in this paper. N_{pos} is the number of foreground points and is used to normalize the total loss. As mentioned in Section 3, we employ the ATSS [12] to define foreground and background points during training.
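The sketch below assembles Equation 3 from per-point arrays. It is illustrative only: the vfl and giou_loss arguments stand for the varifocal-loss sketch above and an assumed per-box GIoU-loss helper, and the array layout is an assumption, not the actual training pipeline.

```python
def vfnet_loss(p, q, init_boxes, refined_boxes, gt_boxes, fg_mask,
               vfl, giou_loss, lambda0=1.5, lambda1=2.0):
    """Total training loss of Equation 3 (illustrative sketch).

    p, q:       (N, C) predicted and target IACS over all points and classes
    *_boxes:    (N, 4) initial, refined and ground-truth boxes per point
    fg_mask:    (N,) boolean mask of foreground points
    vfl:        classification loss, e.g. the varifocal_loss sketch above
    giou_loss:  assumed helper returning a per-point GIoU loss, shape (N,)
    """
    n_pos = max(int(fg_mask.sum()), 1)
    cls_term = vfl(p, q) / n_pos          # first term of Eq. 3
    q_star = q.max(axis=1)                # gt IoU per foreground point, 0 for background
    init_term = (q_star * giou_loss(init_boxes, gt_boxes)).sum() / n_pos
    refine_term = (q_star * giou_loss(refined_boxes, gt_boxes)).sum() / n_pos
    return cls_term + lambda0 * init_term + lambda1 * refine_term
```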
Inference. The inference of the VFNet is straightforward. It simply involves forwarding an input image through the network and an NMS post-processing step to remove redundant detections.
5. Experiments

Dataset and Evaluation Metrics. We evaluate the VFNet on the challenging MS COCO 2017 benchmark [16]. Following common practice [3, 8, 9, 12], we train detectors on the train2017 split, report ablation results on the val2017 split, and compare with other detectors on the test-dev split by uploading the results to the evaluation server. We adopt the standard COCO-style Average Precision (AP) as the evaluation metric.

Implementation and Training Details. We implement the VFNet with MMDetection [37]. Unless otherwise specified, we adopt the default hyper-parameters used in MMDetection. The initial learning rate is set to 0.01 and we employ the linear warm-up policy [38] to start the training, with the warm-up ratio set to 0.1. We use 8 V100 GPUs for training with a total batch size of 16 (2 images per GPU) in both the ablation studies and the performance comparison.

For the ablation studies on val2017, ResNet-50 [39] is used as the backbone network and the 1x training schedule (12 epochs) [37] is adopted. Input images are resized to a maximum scale of 1333×800 without changing the aspect ratio. Only random horizontal image flipping is used for data augmentation.

For the performance comparison with the state of the art on test-dev, we train the VFNet with different backbone networks, including ones with deformable convolution layers [13, 14] (denoted as DCN) inserted. When DCN is used in the backbone, we also insert it into the last layers before the star deformable convolution in the VFNet head. The 2x (24 epochs) training scheme and multi-scale training (MSTrain) are adopted, where the maximum image scale for each iteration is randomly selected from a scale range. In fact, we apply two image scale ranges in our experiments. For a fair comparison with the baseline, we use the scale range 1333×[640:800]; out of curiosity, we also experiment with a wider scale range, 1333×[480:960]. Note that even when MSTrain is employed, we keep the maximum image scale at 1333×800 in inference, although a larger scale performs slightly better (about 0.4 AP gain with a 1333×900 scale).

Inference Details. In inference, we forward the input image, resized to a maximum scale of 1333×800, through the network and obtain the estimated bounding boxes with their corresponding IACSs. We first filter out bounding boxes with p_max ≤ 0.05 and select at most 1k top-scoring detections per FPN level. The selected detections are then merged and redundant detections are removed by NMS with a threshold of 0.6 to yield the final results.
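The same post-processing can be summarized as the sketch below, keeping the thresholds stated above (score 0.05, at most 1k detections per FPN level, NMS IoU threshold 0.6); the nms helper and the per-level data layout are assumptions, not the MMDetection code path.

```python
import numpy as np

def postprocess(level_boxes, level_scores, nms, score_thr=0.05,
                max_per_level=1000, iou_thr=0.6):
    """Merge per-FPN-level detections as described above.

    level_boxes:  list of (Ni, 4) arrays, one per FPN level
    level_scores: list of (Ni,) arrays of IACS values (max over classes)
    nms:          assumed helper (boxes, scores, iou_thr) -> kept indices
    """
    boxes, scores = [], []
    for b, s in zip(level_boxes, level_scores):
        keep = s > score_thr                     # drop low-confidence boxes
        b, s = b[keep], s[keep]
        order = np.argsort(-s)[:max_per_level]   # at most 1k top-scoring per level
        boxes.append(b[order]); scores.append(s[order])
    boxes, scores = np.concatenate(boxes), np.concatenate(scores)
    keep = nms(boxes, scores, iou_thr)           # remove redundant detections
    return boxes[keep], scores[keep]
```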
5.1. Ablation Study

5.1.1 Varifocal Loss

We first investigate the effect of the hyper-parameters of the varifocal loss on the detection performance. There are two hyper-parameters: α, for balancing the losses between positive and negative examples, and γ, for down-weighting the losses of the easy negative examples. Table 2 shows the performance of the VFNet when varying α from 0.5 to 1.5 and γ from 1.0 to 3.0 (only the results obtained with the optimal α are shown). Similar results above 41.2 AP are achieved, showing that our varifocal loss is quite robust to different settings of (α, γ). Among them, α = 0.75 and γ = 2.0 work best (41.6 AP), and we adopt these two values for all the following experiments.

γ     α     q weighting   AP    AP50   AP75
1.0   0.50      X         41.2  59.2   44.7
1.5   0.75      X         41.5  59.7   45.1
2.0   0.75      X         41.6  59.5   45.0
2.0   0.75                41.2  59.1   44.4
2.5   1.25      X         41.5  59.4   45.2
3.0   1.00      X         41.3  59.0   44.7

Table 2: Performance of the VFNet when changing the hyper-parameters (α, γ) of the varifocal loss. q weighting means weighting the loss of a positive example with its learning target q.

We also investigate the effect of weighting the loss of a positive example with its training target q, termed q weighting. The fourth row in Table 2 shows the performance of the optimal setting of (α, γ) without q weighting, where a 0.4 AP drop is observed (41.2 AP vs. 41.6 AP). This confirms the positive effect of q weighting.

5.1.2 Individual Component Contribution

We study the impact of the individual components of our method; the results are shown in Table 3. The first row shows the performance of the raw VFNet (FCOS+ATSS without the centerness branch) trained with the focal loss, which achieves 39.0 AP. Replacing the focal loss with our varifocal loss improves the performance to 40.1 AP, which is 0.9 AP higher than the FCOS+ATSS. Adding the star-shaped representation and the bounding box refinement modules further boosts the performance to 40.7 AP and 41.6 AP, respectively. These results verify the effectiveness of the three modules in our VFNet.

VFL   Star Dconv   BBox Refinement   AP    AP50   AP75
                                     39.0  57.7   41.8
 X                                   40.1  58.5   43.4
 X        X                          40.7  59.0   44.0
 X        X             X            41.6  59.5   45.0
FCOS+ATSS                            39.2  57.3   42.4

Table 3: Individual contribution of the components in our method. The first row corresponds to the raw VFNet trained with the focal loss [8].
Method Backbone FPS AP AP50 AP75 APS APM APL
Anchor-based multi-stage:
Faster R-CNN [3] X-101 40.3 62.7 44.0 24.4 43.7 49.8
Libra R-CNN [40] R-101 41.1 62.1 44.7 23.4 43.7 52.5
Mask R-CNN [4] X-101 41.4 63.4 45.2 24.5 44.9 51.8
R-FCN [41] R-101 41.4 63.4 45.2 24.5 44.9 51.8
TridentNet [42] R-101 42.7 63.6 46.5 23.9 46.6 56.6
Cascade R-CNN [17] R-101 42.8 62.1 46.3 23.7 45.5 55.2
SNIP [43] R-101 43.4 65.5 48.4 27.2 46.5 54.9
Anchor-based one-stage:
SSD512 [7] R-101 31.2 50.4 33.3 10.2 34.5 49.8
YOLOv3 [6] DarkNet-53 33.0 57.9 34.4 18.3 35.4 41.9
DSSD513 [44] R-101 33.2 53.3 35.2 13.0 35.4 51.1
RefineDet [35] R-101 36.4 57.5 39.5 16.6 39.9 51.4
RetinaNet [8] R-101 39.1 59.1 42.3 21.8 42.7 50.2
FreeAnchor [18] R-101 43.1 62.2 46.4 24.5 46.1 54.8
GFL [32] R-101-DCN 47.3 66.3 51.4 28.0 51.1 59.2
GFL [32] X-101-32x4d-DCN 48.2 67.4 52.6 29.2 51.7 60.2
EfficientDet-D6 [45] B6 5.3† 51.7 71.2 56.0 34.1 55.2 64.1
EfficientDet-D7 [45] B6 3.8† 52.2 71.4 56.3 34.8 55.5 64.6
Anchor-free key-point:
ExtremeNet [22] Hourglass-104 40.2 55.5 43.2 20.4 43.2 53.1
CornerNet [20] Hourglass-104 40.5 56.5 43.1 19.4 42.7 53.9
Grid R-CNN [46] X-101 43.2 63.0 46.6 25.1 46.5 55.2
CenterNet [21] Hourglass-104 44.9 62.4 48.1 25.6 47.4 57.4
RepPoints [24] R-101-DCN 45.0 66.1 49.0 26.6 48.6 57.5
Anchor-free one-stage:
FoveaBox [15] X-101 42.1 61.9 45.2 24.9 46.8 55.6
FSAF [27] X-101-64x4d 42.9 63.8 46.3 26.6 46.2 52.7
FCOS [9] R-101 43.0 61.7 46.3 26.0 46.8 55.0
SAPD [28] R-101 43.5 63.6 46.5 24.9 46.8 54.6
SAPD [28] R-101-DCN 46.0 65.9 49.6 26.3 49.2 59.6
Baseline:
ATSS [12] R-101 17.5 43.6 62.1 47.4 26.1 47.0 53.6
ATSS [12] X-101-64x4d 8.9 45.6 64.6 49.7 28.5 48.9 55.6
ATSS [12] R-101-DCN 13.7 46.3 64.7 50.4 27.7 49.8 58.4
ATSS [12] X-101-64x4d-DCN 6.9 47.7 66.5 51.9 29.7 50.8 59.4
Ours:
VFNet R-50 19.3 44.3/44.8 62.5/63.1 48.1/48.7 26.7/27.2 47.3/48.1 54.3/54.8
VFNet R-101 15.6 46.0/46.7 64.2/64.9 50.0/50.8 27.5/28.4 49.4/50.2 56.9/57.6
VFNet X-101-32x4d 13.1 46.7/47.6 65.2/66.1 50.8/51.8 28.3/29.4 50.1/50.9 57.3/58.4
VFNet X-101-64x4d 9.2 47.4/48.5 65.8/67.0 51.5/52.6 29.5/30.1 50.7/51.7 58.1/59.7
VFNet R2-101 [47] 13.0 48.4/49.3 66.9/67.6 52.6/53.5 30.3/30.5 52.0/53.1 59.2/60.5
VFNet R-50-DCN 16.3 47.3/48.0 65.6/66.4 51.4/52.3 28.4/29.0 50.3/51.2 59.4/60.4
VFNet R-101-DCN 12.6 48.4/49.2 66.7/67.5 52.6/53.7 28.9/29.7 51.7/52.6 61.0/62.4
VFNet X-101-32x4d-DCN 10.1 49.2/50.0 67.8/68.5 53.6/54.4 30.0/30.4 52.6/53.2 62.1/62.9
VFNet X-101-64x4d-DCN 6.7 49.9/50.8 68.5/69.3 54.3/55.3 30.7/31.6 53.1/54.2 62.8/64.4
VFNet R2-101-DCN [47] 10.3 50.4/51.3 68.9/69.7 54.7/55.8 31.2/31.9 53.7/54.7 63.3/64.4
VFNet-X-800 R2-101-DCN [47] 8.0 53.7 71.6 58.7 34.4 57.5 67.5
VFNet-X-1200 R2-101-DCN [47] 4.2 55.1 73.0 60.1 37.4 58.2 67.0
Table 4: Performance (single-model single-scale) comparison with state-of-the-art detectors on MS COCO test-dev. VFNet consistently outperforms the strong baseline ATSS by ∼2.0 AP. Our best model, VFNet-X-1200, reaches 55.1 AP, achieving a new state of the art. 'R': ResNet. 'X': ResNeXt. 'R2': Res2Net. 'DCN': deformable convolution network. '/' separates results for the MSTrain image scale ranges 1333×[640:800] / 1333×[480:960]. FPS values marked with † are taken from the corresponding papers.
5.2. Comparison with State-of-the-Art

We compare our VFNet with other detectors on COCO test-dev. We select ATSS [12] as our baseline since it has similar performance to the FCOS+ATSS. Table 4 presents the results. Compared with the strong baseline ATSS, our VFNet achieves gains of ∼2.0 AP with different backbones, e.g. 46.0 AP vs. 43.6 AP with the ResNet-101 backbone. This validates the contributions of our method. Compared to the concurrent work GFL [32] (whose MSTrain scale range is 1333×[480:800]), our VFNet is consistently better by a considerable margin. Meanwhile, our model trained with Res2Net-101-DCN [47] achieves a single-model single-scale AP of 51.3, surpassing almost all recent state-of-the-art detectors.

We also report the inference speed of the VFNet in terms of frames per second (FPS) on an Nvidia V100 GPU. Since it is difficult to obtain the speed of all the listed detectors under exactly the same settings, we only compare the VFNet with the baseline ATSS. Our VFNet is very efficient, e.g. achieving 44.8 AP at 19.3 FPS, and only incurs a small additional computation overhead compared to the baseline.

5.3. VarifocalNet-X

To push the envelope of the VFNet, we also implement some extensions to the original VFNet. This version is called VFNet-X and the extensions include:

PAFPN. We replace the FPN with the PAFPN [48], and apply DCN and group normalization (GN) [49] in it.

More and Wider Conv Layers. We stack 4 convolution layers in the detection head, instead of the 3 layers in the original VFNet, and increase the original 256 feature channels to 384 channels.

RandomCrop and Cutout. We employ random crop and cutout [50] as additional data augmentation methods.

Wider MSTrain Scale Range and Longer Training. We adopt a wider MSTrain scale range, from 750×500 to 2100×1400, and initially train the VFNet-X for 41 epochs.

SWA. We apply the technique of stochastic weight averaging (SWA) [51] in training the VFNet-X, which brings a 1.2 AP gain. Specifically, after the initial 41-epoch training of VFNet-X, we further train it for another 18 epochs using a cyclic learning rate schedule and then simply average those 18 checkpoints as our final model, as sketched below.
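A minimal sketch of that averaging step, assuming the 18 checkpoints are plain PyTorch state dicts saved to disk (the file names are placeholders, not the actual training pipeline):

```python
import torch

def average_checkpoints(paths):
    """Average the model weights of several checkpoints (SWA-style)."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# e.g. the 18 checkpoints saved during the cyclic-learning-rate epochs (placeholder paths).
swa_weights = average_checkpoints([f"epoch_{i}.pth" for i in range(42, 60)])
```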
The performance of VFNet-X on COCO test-dev is shown in the last rows of Table 4. When the inference scale 1333×800 and soft-NMS [52] are adopted, VFNet-X-800 achieves 53.7 AP, while simply increasing the image scale to 1800×1200 lets VFNet-X-1200 reach a new state-of-the-art 55.1 AP, surpassing prior detectors by a large margin. Qualitative detection examples of applying this model to COCO test-dev can be found in Figure 4.

5.4. Generality and Superiority of Varifocal Loss

To verify the generality of our varifocal loss, we apply it to some existing popular dense object detectors, including RetinaNet [8], FoveaBox [15], RepPoints [24] and ATSS [12], and evaluate the performance on val2017. We simply replace the focal loss (FL) [8] used in these detectors (ResNet-50 backbone) with our varifocal loss for training. For comparison, we also train them with the generalized focal loss (GFL) [32].

Method                  AP    AP50   AP75
RetinaNet [8] + FL      36.5  55.5   38.8
RetinaNet [8] + GFL     37.3  56.4   40.0
RetinaNet [8] + VFL     37.4  56.5   40.2
FoveaBox [15] + FL      36.3  56.3   38.3
FoveaBox [15] + GFL     36.9  56.0   39.7
FoveaBox [15] + VFL     37.2  56.2   39.8
RepPoints [24] + FL     38.3  59.2   41.1
RepPoints [24] + GFL    39.2  59.8   42.5
RepPoints [24] + VFL    39.7  59.8   43.1
ATSS [12] + FL          39.3  57.5   42.5
ATSS [12] + GFL         39.8  57.7   43.2
ATSS [12] + VFL         40.2  58.2   44.0
VFNet + FL              40.0  58.0   43.2
VFNet + GFL             41.1  58.9   42.2
VFNet + VFL             41.6  59.5   45.0

Table 5: Comparison of performance when applying the focal loss (FL) [8], the generalized focal loss (GFL) [32] and our varifocal loss (VFL) to existing popular dense object detectors and to our VFNet.

Table 5 shows the results. Our varifocal loss improves RetinaNet, FoveaBox and ATSS consistently by 0.9 AP; for RepPoints, the gain increases to 1.4 AP. This shows that our varifocal loss can easily bring a considerable performance boost to existing dense object detectors. Compared to the GFL, our varifocal loss performs better in all cases, evidencing its superiority.

Additionally, we train our VFNet with the FL and the GFL for further comparison. The results are shown in the last section of Table 5, where the consistent advantage of our varifocal loss over the FL and the GFL can be observed.

6. Conclusion

In this paper, we propose to learn the IACS for ranking detections. We first show the importance of producing the IACS to rank bounding boxes and then develop a dense object detector, VarifocalNet, to exploit the advantage of the IACS. In particular, we design a varifocal loss for training the detector to predict the IACS, and a star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Experiments on the MS COCO benchmark verify the effectiveness of our methods and show that our VarifocalNet achieves new state-of-the-art performance among various object detectors.
Figure 4: Detection examples of applying our best model on COCO test-dev. The score threshold for visualization is 0.3.
References

[1] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[2] Ross Girshick. Fast R-CNN. In ICCV, 2015.
[3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
[4] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
[5] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016.
[6] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[7] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In ECCV, 2016.
[8] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
[9] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. In ICCV, 2019.
[10] Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, and Yuning Jiang. Acquisition of localization confidence for accurate object detection. In ECCV, 2018.
[11] Shengkai Wu, Xiaoping Li, and Xinggang Wang. IoU-aware single-stage object detector for accurate localization. Image and Vision Computing, 2020.
[12] Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, and Stan Z Li. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In CVPR, 2020.
[13] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. In ICCV, 2017.
[14] Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. Deformable ConvNets v2: More deformable, better results. In CVPR, 2019.
[15] Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Lei Li, and Jianbo Shi. FoveaBox: Beyound anchor-based object detection. IEEE Transactions on Image Processing, 2020.
[16] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[17] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In CVPR, 2018.
[18] Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, and Qixiang Ye. FreeAnchor: Learning to match anchors for visual object detection. In NIPS, 2019.
[19] Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, and Dahua Lin. Region proposal by guided anchoring. In CVPR, 2019.
[20] Hei Law and Jia Deng. CornerNet: Detecting objects as paired keypoints. In ECCV, 2018.
[21] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. CenterNet: Keypoint triplets for object detection. In ICCV, 2019.
[22] Xingyi Zhou, Jiacheng Zhuo, and Philipp Krahenbuhl. Bottom-up object detection by grouping extreme and center points. In CVPR, 2019.
[23] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
[24] Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, and Stephen Lin. RepPoints: Point set representation for object detection. In ICCV, 2019.
[25] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
[26] Lichao Huang, Yi Yang, Yafeng Deng, and Yinan Yu. DenseBox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874, 2015.
[27] Chenchen Zhu, Yihui He, and Marios Savvides. Feature selective anchor-free module for single-shot object detection. In CVPR, 2019.
[28] Chenchen Zhu, Fangyi Chen, Zhiqiang Shen, and Marios Savvides. Soft anchor-point object detection. In ECCV, 2020.
[29] Lachlan Tychsen-Smith and Lars Petersson. Improving object localization with fitness NMS and bounded IoU loss. In CVPR, 2018.
[30] Zhiyu Tan, Xuecheng Nie, Qi Qian, Nan Li, and Hao Li. Learning to rank proposals for object detection. In ICCV, 2019.
[31] Jiale Cao, Yanwei Pang, Jungong Han, and Xuelong Li. Hierarchical shot detector. In ICCV, 2019.
[32] Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, and Jian Yang. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. arXiv preprint arXiv:2006.04388, 2020.
[33] Yuhang Cao, Kai Chen, Chen Change Loy, and Dahua Lin. Prime sample attention in object detection. In CVPR, 2020.
[34] Shengkai Wu and Xiaoping Li. IoU-balanced loss functions for single-stage object detection. arXiv preprint arXiv:1908.05641, 2019.
[35] Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z Li. Single-shot refinement neural network for object detection. In CVPR, 2018.
[36] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In CVPR, 2019.
[37] Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019.
[38] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[39] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[40] Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, and Dahua Lin. Libra R-CNN: Towards balanced learning for object detection. In CVPR, 2019.
[41] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object detection via region-based fully convolutional networks. In NIPS, 2016.
[42] Yanghao Li, Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. Scale-aware trident networks for object detection. In ICCV, 2019.
[43] Bharat Singh and Larry S Davis. An analysis of scale invariance in object detection - SNIP. In CVPR, 2018.
[44] Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C Berg. DSSD: Deconvolutional single shot detector. CoRR, 2017.
[45] Mingxing Tan, Ruoming Pang, and Quoc V Le. EfficientDet: Scalable and efficient object detection. In CVPR, 2020.
[46] Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, and Junjie Yan. Grid R-CNN. In CVPR, 2019.
[47] Shanghua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, and Philip HS Torr. Res2Net: A new multi-scale backbone architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[48] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. In CVPR, 2018.
[49] Yuxin Wu and Kaiming He. Group normalization. In ECCV, 2018.
[50] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
[51] Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407, 2018.
[52] Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S Davis. Soft-NMS: Improving object detection with one line of code. In ICCV, 2017.