CornerNet: Detecting Objects as Paired Keypoints

Hei Law · Jia Deng

Abstract We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.

Keywords Object Detection

H. Law
Princeton University, Princeton, NJ, USA
E-mail: [email protected]

J. Deng
Princeton University, Princeton, NJ, USA

1 Introduction

Object detectors based on convolutional neural networks (ConvNets) (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014; He et al., 2016) have achieved state-of-the-art results on various challenging benchmarks (Lin et al., 2014; Deng et al., 2009; Everingham et al., 2015). A common component of state-of-the-art approaches is anchor boxes (Ren et al., 2015; Liu et al., 2016), which are boxes of various sizes and aspect ratios that serve as detection candidates. Anchor boxes are extensively used in one-stage detectors (Liu et al., 2016; Fu et al., 2017; Redmon and Farhadi, 2016; Lin et al., 2017), which can achieve results highly competitive with two-stage detectors (Ren et al., 2015; Girshick et al., 2014; Girshick, 2015; He et al., 2017) while being more efficient. One-stage detectors place anchor boxes densely over an image and generate final box predictions by scoring anchor boxes and refining their coordinates through regression.

But the use of anchor boxes has two drawbacks. First, we typically need a very large set of anchor boxes, e.g. more than 40k in DSSD (Fu et al., 2017) and more than 100k in RetinaNet (Lin et al., 2017). This is because the detector is trained to classify whether each anchor box sufficiently overlaps with a ground truth box, and a large number of anchor boxes is needed to ensure sufficient overlap with most ground truth boxes. As a result, only a tiny fraction of anchor boxes will overlap with ground truth; this creates a huge imbalance between positive and negative anchor boxes and slows down training (Lin et al., 2017).

Second, the use of anchor boxes introduces many hyperparameters and design choices: how many boxes, what sizes, and what aspect ratios. Such choices have largely been made via ad-hoc heuristics, and can become even more complicated when combined with multiscale architectures where a single network makes separate predictions at multiple resolutions, with each scale using different features and its own set of anchor boxes (Liu et al., 2016; Fu et al., 2017; Lin et al., 2017).

In this paper we introduce CornerNet, a new one-stage approach to object detection that does away with anchor boxes. We detect an object as a pair of keypoints: the top-left corner and bottom-right corner of the bounding box. We use a single convolutional network to predict a heatmap for the top-left corners of all instances of the same object category, a heatmap for all bottom-right corners, and an embedding vector for each detected corner. The embeddings serve to group a pair of corners that belong to the same object; the network is trained to predict similar embeddings for them.
Fig. 1 We detect an object as a pair of bounding box corners grouped together. A convolutional network outputs a heatmap
for all top-left corners, a heatmap for all bottom-right corners, and an embedding vector for each detected corner. The network
is trained to predict similar embeddings for corners that belong to the same object.
Our approach greatly simplifies the output of the network and eliminates the need for designing anchor boxes. Our approach is inspired by the associative embedding method proposed by Newell et al. (2017), who detect and group keypoints in the context of multi-person human-pose estimation. Fig. 1 illustrates the overall pipeline of our approach.

Another novel component of CornerNet is corner pooling, a new type of pooling layer that helps a convolutional network better localize corners of bounding boxes. A corner of a bounding box is often outside the object; consider the case of a circle as well as the examples in Fig. 2. In such cases a corner cannot be localized based on local evidence. Instead, to determine whether there is a top-left corner at a pixel location, we need to look horizontally towards the right for the topmost boundary of the object, and look vertically towards the bottom for the leftmost boundary. This motivates our corner pooling layer: it takes in two feature maps; at each pixel location it max-pools all feature vectors to the right from the first feature map, max-pools all feature vectors directly below from the second feature map, and then adds the two pooled results together. An example is shown in Fig. 3.

We hypothesize two reasons why detecting corners would work better than bounding box centers or proposals. First, the center of a box can be harder to localize because it depends on all 4 sides of the object, whereas locating a corner depends on 2 sides and is thus easier, and even more so with corner pooling, which encodes some explicit prior knowledge about the definition of corners. Second, corners provide a more efficient way of densely discretizing the space of boxes: we just need O(wh) corners to represent O(w²h²) possible anchor boxes.

We demonstrate the effectiveness of CornerNet on MS COCO (Lin et al., 2014). CornerNet achieves a 42.2% AP, outperforming all existing one-stage detectors. In addition, through ablation studies we show that corner pooling is critical to the superior performance of CornerNet. Code is available at https://github.com/princeton-vl/CornerNet.

2 Related Works

2.1 Two-stage object detectors

The two-stage approach was first introduced and popularized by R-CNN (Girshick et al., 2014). Two-stage detectors generate a sparse set of regions of interest (RoIs) and classify each of them by a network. R-CNN generates RoIs using a low-level vision algorithm (Uijlings et al., 2013; Zitnick and Dollár, 2014). Each region is then extracted from the image and processed by a ConvNet independently, which creates lots of redundant computations. Later, SPP (He et al., 2014) and Fast-RCNN (Girshick, 2015) improve R-CNN by designing a special pooling layer that instead pools each region from feature maps. However, both still rely on separate proposal algorithms and cannot be trained end-to-end. Faster-RCNN (Ren et al., 2015) does away with low-level proposal algorithms by introducing a region proposal network (RPN), which generates proposals from a set of pre-determined candidate boxes, usually known as anchor boxes.
Fig. 2 Often there is no local evidence to determine the location of a bounding box corner. We address this issue by proposing
a new type of pooling layer.
This not only makes the detectors more efficient but also allows the detectors to be trained end-to-end. R-FCN (Dai et al., 2016) further improves the efficiency of Faster-RCNN by replacing the fully connected sub-detection network with a fully convolutional sub-detection network. Other works focus on incorporating sub-category information (Xiang et al., 2016), generating object proposals at multiple scales with more contextual information (Bell et al., 2016; Cai et al., 2016; Shrivastava et al., 2016; Lin et al., 2016), selecting better features (Zhai et al., 2017), improving speed (Li et al., 2017), cascade procedures (Cai and Vasconcelos, 2017) and better training procedures (Singh and Davis, 2017).

2.2 One-stage object detectors

On the other hand, YOLO (Redmon et al., 2016) and SSD (Liu et al., 2016) have popularized the one-stage approach, which removes the RoI pooling step and detects objects in a single network. One-stage detectors are usually more computationally efficient than two-stage detectors while maintaining competitive performance on different challenging benchmarks.

SSD places anchor boxes densely over feature maps from multiple scales, and directly classifies and refines each anchor box. YOLO predicts bounding box coordinates directly from an image, and is later improved in YOLO9000 (Redmon and Farhadi, 2016) by switching to anchor boxes. DSSD (Fu et al., 2017) and RON (Kong et al., 2017) adopt networks similar to the hourglass network (Newell et al., 2016), enabling them to combine low-level and high-level features via skip connections to predict bounding boxes more accurately. However, these one-stage detectors were still outperformed by the two-stage detectors until the introduction of RetinaNet (Lin et al., 2017). In (Lin et al., 2017), the authors suggest that the dense anchor boxes create a huge imbalance between positive and negative anchor boxes during training. This imbalance causes the training to be inefficient and hence the performance to be suboptimal. They propose a new loss, Focal Loss, to dynamically adjust the weights of each anchor box and show that their one-stage detector can outperform the two-stage detectors. RefineDet (Zhang et al., 2017) proposes to filter the anchor boxes.
DeNet (Tychsen-Smith and Petersson, 2017a) is a two-stage detector which generates RoIs without using anchor boxes. It first determines how likely each location belongs to either the top-left, top-right, bottom-left or bottom-right corner of a bounding box. It then generates RoIs by enumerating all possible corner combinations, and follows the standard two-stage approach to classify each RoI. Our approach is very different from DeNet. First, DeNet does not identify whether two corners are from the same object and relies on a sub-detection network to reject poor RoIs. In contrast, our approach is a one-stage approach which detects and groups the corners using a single ConvNet. Second, DeNet selects features at manually determined locations relative to a region for classification, while our approach does not require any feature selection step. Third, we introduce corner pooling, a novel type of layer to enhance corner detection.

Point Linking Network (PLN) (Wang et al., 2017) is a one-stage detector without anchor boxes. It first predicts the locations of the four corners and the center of a bounding box. Then, at each corner location, it predicts how likely each pixel location in the image is the center. Similarly, at the center location, it predicts how likely each pixel location belongs to either the top-left, top-right, bottom-left or bottom-right corner. It combines the predictions from each corner and center pair to generate a bounding box. Finally, it merges the four bounding boxes to give a single bounding box. CornerNet is very different from PLN. First, CornerNet groups the corners by predicting embedding vectors, while PLN groups the corner and center by predicting pixel locations. Second, CornerNet uses corner pooling to better localize the corners.

Our approach is inspired by Newell et al. (2017) on Associative Embedding in the context of multi-person pose estimation. Newell et al. propose an approach that detects and groups human joints in a single network. In their approach each detected human joint has an embedding vector. The joints are grouped based on the distances between their embeddings. To the best of our knowledge, we are the first to formulate the task of object detection as a task of detecting and grouping corners with embeddings. Another novelty of ours is the corner pooling layers that help better localize the corners. We also significantly modify the hourglass architecture and add our novel variant of focal loss (Lin et al., 2017) to help better train the network.

3 CornerNet

3.1 Overview

In CornerNet, we detect an object as a pair of keypoints: the top-left corner and bottom-right corner of the bounding box. A convolutional network predicts two sets of heatmaps to represent the locations of corners of different object categories, one set for the top-left corners and the other for the bottom-right corners. The network also predicts an embedding vector for each detected corner (Newell et al., 2017) such that the distance between the embeddings of two corners from the same object is small. To produce tighter bounding boxes, the network also predicts offsets to slightly adjust the locations of the corners. With the predicted heatmaps, embeddings and offsets, we apply a simple post-processing algorithm to obtain the final bounding boxes.

Fig. 4 provides an overview of CornerNet. We use the hourglass network (Newell et al., 2016) as the backbone network of CornerNet. The hourglass network is followed by two prediction modules. One module is for the top-left corners, while the other one is for the bottom-right corners. Each module has its own corner pooling module to pool features from the hourglass network before predicting the heatmaps, embeddings and offsets. Unlike many other object detectors, we do not use features from different scales to detect objects of different sizes. We only apply both modules to the output of the hourglass network.

3.2 Detecting Corners

We predict two sets of heatmaps, one for top-left corners and one for bottom-right corners. Each set of heatmaps has C channels, where C is the number of categories, and is of size H × W. There is no background channel. Each channel is a binary mask indicating the locations of the corners for a class.

For each corner, there is one ground-truth positive location, and all other locations are negative. During training, instead of equally penalizing negative locations, we reduce the penalty given to negative locations within a radius of the positive location. This is because a pair of false corner detections, if they are close to their respective ground truth locations, can still produce a box that sufficiently overlaps the ground-truth box (Fig. 5). We determine the radius based on the size of an object, ensuring that a pair of points within the radius would generate a bounding box with at least t IoU with the ground-truth annotation (we set t to 0.3 in all experiments). Given the radius, the amount of penalty reduction is given by an unnormalized 2D Gaussian, $e^{-\frac{x^2+y^2}{2\sigma^2}}$, whose center is at the positive location and whose $\sigma$ is 1/3 of the radius.
Fig. 4 Overview of CornerNet. The backbone network is followed by two prediction modules, one for the top-left corners and
the other for the bottom-right corners. Using the predictions from both modules, we locate and group the corners.
Fig. 5 "Ground-truth" heatmaps for training. Boxes (green dotted rectangles) whose corners are within the radii of the positive locations (orange circles) still have large overlaps with the ground-truth annotations (red solid rectangles).

Many networks downsample the input, so a location $(x, y)$ in the image is mapped to the location $(\lfloor x/n \rfloor, \lfloor y/n \rfloor)$ in the heatmaps, where $n$ is the downsampling factor. To produce tighter boxes, we predict location offsets to slightly adjust the corner locations before remapping them to the input resolution:

$$\boldsymbol{o}_k = \left( \frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor,\ \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor \right) \qquad (2)$$

where $\boldsymbol{o}_k$ is the offset, and $x_k$ and $y_k$ are the x and y coordinates of corner $k$. In particular, we predict one set of offsets shared by the top-left corners of all categories, and another set shared by the bottom-right corners. For training, we apply the smooth L1 loss (Girshick, 2015) at ground-truth corner locations:

$$L_{\text{off}} = \frac{1}{N} \sum_{k=1}^{N} \text{SmoothL1Loss}\left(\boldsymbol{o}_k, \hat{\boldsymbol{o}}_k\right) \qquad (3)$$
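To make the corner targets concrete, the following is a minimal NumPy sketch of how the ground-truth heatmaps with Gaussian penalty reduction and the offset targets of Eq. (2) could be built for a single category. The helper names and the single shared `radius` are our simplifications (the paper derives the radius per object from its size so that boxes within the radius keep at least t = 0.3 IoU); this is not the authors' implementation.

```python
import numpy as np

def gaussian_penalty(height, width, cx, cy, sigma):
    """Unnormalized 2D Gaussian exp(-(x^2 + y^2) / (2 sigma^2)) centered at (cx, cy)."""
    ys, xs = np.ogrid[:height, :width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def corner_targets(boxes, out_h, out_w, n, radius):
    """boxes: list of (x1, y1, x2, y2) in image coordinates; n: downsampling factor.
    radius: penalty-reduction radius (shared here for brevity; the paper computes
    it per object)."""
    tl_heat = np.zeros((out_h, out_w), dtype=np.float32)
    br_heat = np.zeros((out_h, out_w), dtype=np.float32)
    tl_off, br_off = [], []
    for x1, y1, x2, y2 in boxes:
        for heat, offs, (x, y) in ((tl_heat, tl_off, (x1, y1)),
                                   (br_heat, br_off, (x2, y2))):
            fx, fy = x / n, y / n            # exact corner in heatmap coordinates
            ix, iy = int(fx), int(fy)        # floored location, as in Eq. (2)
            offs.append((fx - ix, fy - iy))  # offset target o_k
            # Reduce the penalty around the positive with sigma = radius / 3;
            # the positive location itself receives the peak value 1.
            penalty = gaussian_penalty(out_h, out_w, ix, iy, radius / 3.0)
            np.maximum(heat, penalty, out=heat)
    return tl_heat, br_heat, np.array(tl_off), np.array(br_off)
```

Where the Gaussians of two objects overlap we keep the elementwise maximum, and the predicted offsets are trained against these targets with the smooth L1 loss of Eq. (3).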
3.3 Grouping Corners

The network predicts an embedding vector for each detected corner such that, if a top-left corner and a bottom-right corner belong to the same bounding box, the distance between their embeddings should be small. We can then group the corners based on the distances between the embeddings of the top-left and bottom-right corners. The actual values of the embeddings are unimportant. Only the distances between the embeddings are used to group the corners.

We follow Newell et al. (2017) and use embeddings of 1 dimension. Let $e_{t_k}$ be the embedding for the top-left corner of object $k$ and $e_{b_k}$ for the bottom-right corner. As in Newell and Deng (2017), we use the "pull" loss to train the network to group the corners and the "push" loss to separate the corners:

$$L_{\text{pull}} = \frac{1}{N} \sum_{k=1}^{N} \left[ \left(e_{t_k} - e_k\right)^2 + \left(e_{b_k} - e_k\right)^2 \right] \qquad (4)$$

$$L_{\text{push}} = \frac{1}{N(N-1)} \sum_{k=1}^{N} \sum_{\substack{j=1 \\ j \neq k}}^{N} \max\left(0,\ \Delta - \left|e_k - e_j\right|\right) \qquad (5)$$

where $e_k$ is the average of $e_{t_k}$ and $e_{b_k}$ and we set $\Delta$ to be 1 in all our experiments. Similar to the offset loss, we only apply the losses at the ground-truth corner locations.

3.4 Corner Pooling

For a top-left corner at location $(i, j)$, the top-left corner pooling layer max-pools all feature vectors to the right of $(i, j)$ in the first feature map into a vector $t_{ij}$, and all feature vectors directly below $(i, j)$ in the second feature map into a vector $l_{ij}$, where we apply an elementwise max operation; the two pooled vectors are then added. Both $t_{ij}$ and $l_{ij}$ can be computed efficiently by dynamic programming, as shown in Fig. 6.

We define the bottom-right corner pooling layer in a similar way. It max-pools all feature vectors between $(0, j)$ and $(i, j)$, and all feature vectors between $(i, 0)$ and $(i, j)$ before adding the pooled results. The corner pooling layers are used in the prediction modules to predict heatmaps, embeddings and offsets.

The architecture of the prediction module is shown in Fig. 7. The first part of the module is a modified version of the residual block (He et al., 2016). In this modified residual block, we replace the first 3 × 3 convolution module with a corner pooling module, which first processes the features from the backbone network by two 3 × 3 convolution modules with 128 channels and then applies a corner pooling layer. Following the design of a residual block, we then feed the pooled features into a 3 × 3 Conv-BN layer with 256 channels and add back the projection shortcut. The modified residual block is followed by a 3 × 3 convolution module with 256 channels, and 3 Conv-ReLU-Conv layers to produce the heatmaps, embeddings and offsets.
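As a concrete reading of Eqs. (4) and (5) above, here is a minimal PyTorch sketch of the pull and push losses over the N ≥ 2 annotated objects of one image, assuming 1-dimensional embeddings as described. The function name and tensor layout are ours, not the released code.

```python
import torch

def pull_push_losses(e_tl: torch.Tensor, e_br: torch.Tensor, delta: float = 1.0):
    """e_tl, e_br: shape (N,), embeddings of the N ground-truth corner pairs."""
    n = e_tl.shape[0]
    e_k = (e_tl + e_br) / 2                      # per-object mean embedding
    # Eq. (4): pull the two corners of each object towards their mean.
    pull = ((e_tl - e_k) ** 2 + (e_br - e_k) ** 2).mean()
    # Eq. (5): push mean embeddings of different objects at least delta apart.
    dist = (e_k[:, None] - e_k[None, :]).abs()   # |e_k - e_j| for all pairs
    margin = torch.relu(delta - dist)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=e_k.device)
    push = margin[off_diag].sum() / (n * (n - 1))
    return pull, push
```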
Fig. 6 The top-left corner pooling layer can be implemented very efficiently. We scan from right to left for the horizontal max-pooling and from bottom to top for the vertical max-pooling. We then add the two max-pooled feature maps.
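The scans in Fig. 6 are just running maxima along rows and columns, so top-left corner pooling can be written densely in a few lines. A minimal PyTorch sketch (the released code uses custom pooling layers; this equivalent formulation and the names `f_t`, `f_l` for the two input feature maps are ours):

```python
import torch

def top_left_pool(f_t: torch.Tensor, f_l: torch.Tensor) -> torch.Tensor:
    """f_t, f_l: feature maps of shape (B, C, H, W). At each location, add the
    max over all features to the right in f_t and the max over all features
    below in f_l."""
    # Horizontal max-pooling, scanned right to left: running max over columns >= j.
    t = f_t.flip(-1).cummax(dim=-1).values.flip(-1)
    # Vertical max-pooling, scanned bottom to top: running max over rows >= i.
    l = f_l.flip(-2).cummax(dim=-2).values.flip(-2)
    return t + l
```

Bottom-right corner pooling mirrors this with left-to-right and top-to-bottom scans, i.e. plain `cummax` calls without the flips.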
Fig. 7 The prediction module starts with a modified residual block, in which we replace the first convolution module with
our corner pooling module. The modified residual block is then followed by a convolution module. We have multiple branches
for predicting the heatmaps, embeddings and offsets.
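Given the heatmaps, embeddings and offsets from the two prediction modules, the simple post-processing mentioned in Sec. 3.1 can be sketched as follows for a single category. This brute-force version, the score threshold `thr`, the embedding-distance threshold `max_dist`, and the averaged corner score are illustrative assumptions on our part, not the paper's exact procedure.

```python
import numpy as np

def decode(tl_heat, br_heat, tl_emb, br_emb, tl_off, br_off, n,
           thr=0.5, max_dist=0.5):
    """Heatmaps/embeddings: (H, W); offsets: (2, H, W); n: downsampling factor."""
    boxes = []
    tl_ys, tl_xs = np.where(tl_heat > thr)   # candidate top-left corners
    br_ys, br_xs = np.where(br_heat > thr)   # candidate bottom-right corners
    for ty, tx in zip(tl_ys, tl_xs):
        for by, bx in zip(br_ys, br_xs):
            if bx <= tx or by <= ty:         # bottom-right must lie below and right
                continue
            if abs(tl_emb[ty, tx] - br_emb[by, bx]) > max_dist:
                continue                      # distant embeddings: different objects
            # Remap to image coordinates using the predicted offsets (cf. Eq. (2)).
            x1 = (tx + tl_off[0, ty, tx]) * n
            y1 = (ty + tl_off[1, ty, tx]) * n
            x2 = (bx + br_off[0, by, bx]) * n
            y2 = (by + br_off[1, by, bx]) * n
            score = (tl_heat[ty, tx] + br_heat[by, bx]) / 2
            boxes.append((x1, y1, x2, y2, score))
    return boxes
```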
Table 2 Reducing the penalty given to the negative locations near positive locations helps significantly improve the perfor-
mance of the network
AP AP50 AP75 APs APm APl
w/o reducing penalty 32.9 49.1 34.8 19.0 37.0 40.7
fixed radius 35.6 52.5 37.7 18.7 38.5 46.0
object-dependent radius 38.4 53.8 40.9 18.6 40.5 51.8
Table 3 Corner pooling consistently improves the network performance on detecting corners in different image quadrants,
showing that corner pooling is effective and stable over both small and large areas.
mAP w/o pooling mAP w/ pooling improvement
Top-Left Corners
Top-Left Quad. 66.1 69.2 +3.1
Bottom-Right Quad. 60.8 63.5 +2.7
Bottom-Right Corners
Top-Left Quad. 53.4 56.2 +2.8
Bottom-Right Quad. 65.0 67.6 +2.6
We set both α and β to 0.1 and γ to 1. We find that values of α and β of 1 or larger lead to poor performance. We use a batch size of 49 and train the network on 10 Titan X (PASCAL) GPUs (4 images on the master GPU, 5 images per GPU for the rest of the GPUs). To conserve GPU resources, in our ablation experiments, we train the networks for 250k iterations with a learning rate of 2.5 × 10⁻⁴. When we compare our results with other detectors, we train the networks for an extra 250k iterations and reduce the learning rate to 2.5 × 10⁻⁵ for the last 50k iterations.

During testing, we maintain the original resolution of the image and pad it with zeros before feeding it to CornerNet. Both the original and flipped images are used for testing. We combine the detections from the original and flipped images, and apply soft-nms (Bodla et al., 2017) to suppress redundant detections. Only the top 100 detections are reported. The average inference time is 244ms per image on a Titan X (PASCAL) GPU.
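For illustration, here is a minimal NumPy sketch of the suppression step just described, using the linear-decay variant of soft-NMS (Bodla et al., 2017). The thresholds and the decay rule are generic choices rather than the paper's exact settings, and detections from the flipped image are assumed to have already been mirrored back into the original coordinate frame.

```python
import numpy as np

def iou_one_vs_many(a, b):
    """IoU of box `a` (x1, y1, x2, y2) against an (N, 4) array `b`."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thr=0.5, top_k=100):
    """boxes: (N, 4) array pooled from the original and flipped images."""
    scores = scores.copy()
    keep = []
    while len(keep) < top_k and scores.size and scores.max() > 0:
        i = int(scores.argmax())
        keep.append((boxes[i], float(scores[i])))
        scores[i] = 0.0                          # remove the selected box from the pool
        overlaps = iou_one_vs_many(boxes[i], boxes)
        decay = overlaps > iou_thr
        scores[decay] *= 1.0 - overlaps[decay]   # decay scores instead of hard removal
    return keep                                  # at most the top 100 detections
```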
Fig. 8 Qualitative examples showing that corner pooling helps better localize the corners.
Table 5 CornerNet performs much better at high IoUs than other state-of-the-art detectors.
AP AP50 AP60 AP70 AP80 AP90
RetinaNet (Lin et al., 2017) 39.8 59.5 55.6 48.2 36.4 15.1
Cascade R-CNN (Cai and Vasconcelos, 2017) 38.9 57.8 53.4 46.9 35.8 15.8
Cascade R-CNN + IoU Net (Jiang et al., 2018) 41.4 59.3 55.3 49.6 39.4 19.5
CornerNet 40.6 56.1 52.0 46.8 38.8 23.4
Table 6 Error analysis. We replace the predicted heatmaps and offsets with the ground-truth values. Using the ground-truth
heatmaps alone improves the AP from 38.4% to 73.1%, suggesting that the main bottleneck of CornerNet is detecting corners.
AP AP50 AP75 APs APm APl
38.4 53.8 40.9 18.6 40.5 51.8
w/ gt heatmaps 73.1 87.7 78.4 60.9 81.2 81.8
w/ gt heatmaps + offsets 86.1 88.9 85.5 84.8 87.2 82.0
Fig. 9 Qualitative examples showing errors in predicting corners and embeddings. The first row shows images where CornerNet mistakenly combines boundary evidence from different objects. The second row shows images where CornerNet predicts similar embeddings for corners from different objects.
Using an object-dependent radius instead of a fixed radius further improves AP by 2.8%, APm by 2.0% and APl by 5.8%. In addition, we see that the penalty reduction especially benefits medium and large objects.

4.4.4 Hourglass Network

CornerNet uses the hourglass network (Newell et al., 2016) as its backbone network. Since the hourglass network is not commonly used in other state-of-the-art detectors, we perform an experiment to study the contribution of the hourglass network in CornerNet. We train a CornerNet in which we replace the hourglass network with FPN (w/ ResNet-101) (Lin et al., 2017), which is more commonly used in state-of-the-art object detectors. We only use the final output of FPN for predictions. Meanwhile, we train an anchor box based detector which uses the hourglass network as its backbone. Each hourglass module predicts anchor boxes at multiple resolutions by using features at multiple scales during the upsampling stage. We follow the anchor box design in RetinaNet (Lin et al., 2017) and add intermediate supervisions during training. In both experiments, we initialize the networks from scratch and follow the same training procedure as we train CornerNet (Sec. 4.1).

Tab. 4 shows that CornerNet with the hourglass network outperforms CornerNet with FPN by 8.2% AP, and the anchor box based detector with the hourglass network by 5.5% AP. The results suggest that the choice of the backbone network is important and the hourglass network is crucial to the performance of CornerNet.
Table 7 CornerNet versus others on MS COCO test-dev. CornerNet outperforms all one-stage detectors and achieves results competitive with two-stage detectors
Method Backbone AP AP50 AP75 APs APm APl AR1 AR10 AR100 ARs ARm ARl
Two-stage detectors
DeNet (Tychsen-Smith and Petersson, 2017a) ResNet-101 33.8 53.4 36.1 12.3 36.1 50.8 29.6 42.6 43.5 19.2 46.9 64.3
CoupleNet (Zhu et al., 2017) ResNet-101 34.4 54.8 37.2 13.4 38.1 50.8 30.0 45.0 46.4 20.7 53.1 68.5
Faster R-CNN by G-RMI (Huang et al., 2017) Inception-ResNet-v2 (Szegedy et al., 2017) 34.7 55.5 36.7 13.5 38.1 52.0 - - - - - -
Faster R-CNN+++ (He et al., 2016) ResNet-101 34.9 55.7 37.4 15.6 38.7 50.9 - - - - - -
Faster R-CNN w/ FPN (Lin et al., 2016) ResNet-101 36.2 59.1 39.0 18.2 39.0 48.2 - - - - - -
Faster R-CNN w/ TDM (Shrivastava et al., 2016) Inception-ResNet-v2 36.8 57.7 39.2 16.2 39.8 52.1 31.6 49.3 51.9 28.1 56.6 71.1
D-FCN (Dai et al., 2017) Aligned-Inception-ResNet 37.5 58.0 - 19.4 40.1 52.5 - - - - - -
Regionlets (Xu et al., 2017) ResNet-101 39.3 59.8 - 21.7 43.7 50.9 - - - - - -
Mask R-CNN (He et al., 2017) ResNeXt-101 39.8 62.3 43.4 22.1 43.2 51.2 - - - - - -
Soft-NMS (Bodla et al., 2017) Aligned-Inception-ResNet 40.9 62.8 - 23.3 43.6 53.3 - - - - - -
LH R-CNN (Li et al., 2017) ResNet-101 41.5 - - 25.2 45.3 53.1 - - - - - -
Fitness-NMS (Tychsen-Smith and Petersson, 2017b) ResNet-101 41.8 60.9 44.9 21.5 45.0 57.5 - - - - - -
Cascade R-CNN (Cai and Vasconcelos, 2017) ResNet-101 42.8 62.1 46.3 23.7 45.5 55.2 - - - - - -
D-RFCN + SNIP (Singh and Davis, 2017) DPN-98 (Chen et al., 2017) 45.7 67.3 51.1 29.3 48.8 57.1 - - - - - -
One-stage detectors
YOLOv2 (Redmon and Farhadi, 2016) DarkNet-19 21.6 44.0 19.2 5.0 22.4 35.5 20.7 31.6 33.3 9.8 36.5 54.4
DSOD300 (Shen et al., 2017a) DS/64-192-48-1 29.3 47.3 30.6 9.4 31.5 47.0 27.3 40.7 43.0 16.7 47.1 65.0
GRP-DSOD320 (Shen et al., 2017b) DS/64-192-48-1 30.0 47.9 31.8 10.9 33.6 46.3 28.0 42.1 44.5 18.8 49.1 65.0
SSD513 (Liu et al., 2016) ResNet-101 31.2 50.4 33.3 10.2 34.5 49.8 28.3 42.1 44.4 17.6 49.2 65.8
DSSD513 (Fu et al., 2017) ResNet-101 33.2 53.3 35.2 13.0 35.4 51.1 28.9 43.5 46.2 21.8 49.1 66.4
RefineDet512 (single scale) (Zhang et al., 2017) ResNet-101 36.4 57.5 39.5 16.6 39.9 51.4 - - - - - -
RetinaNet800 (Lin et al., 2017) ResNet-101 39.1 59.1 42.3 21.8 42.7 50.2 - - - - - -
RefineDet512 (multi scale) (Zhang et al., 2017) ResNet-101 41.8 62.9 45.7 25.6 45.1 54.1 - - - - - -
CornerNet511 (single scale) Hourglass-104 40.6 56.4 43.2 19.1 42.8 54.3 35.3 54.7 59.4 37.4 62.4 77.2
CornerNet511 (multi scale) Hourglass-104 42.2 57.8 45.2 20.7 44.8 56.6 36.6 55.9 60.3 39.5 63.2 77.3
4.4.5 Quality of the Bounding Boxes

A good detector should predict high quality bounding boxes that cover objects tightly. To understand the quality of the bounding boxes predicted by CornerNet, we evaluate the performance of CornerNet at multiple IoU thresholds, and compare the results with other state-of-the-art detectors, including RetinaNet (Lin et al., 2017), Cascade R-CNN (Cai and Vasconcelos, 2017) and IoU-Net (Jiang et al., 2018).

Tab. 5 shows that CornerNet achieves a much higher AP at 0.9 IoU than other detectors, outperforming Cascade R-CNN + IoU-Net by 3.9%, Cascade R-CNN by 7.6% and RetinaNet² by 7.3%. This suggests that CornerNet is able to generate bounding boxes of higher quality compared to other state-of-the-art detectors.

4.4.6 Error Analysis

CornerNet simultaneously outputs heatmaps, offsets, and embeddings, all of which affect detection performance. An object will be missed if either corner is missed; precise offsets are needed to generate tight bounding boxes; incorrect embeddings will result in many false bounding boxes. To understand how each part contributes to the final error, we perform an error analysis by replacing the predicted heatmaps and offsets with the ground-truth values and evaluating performance on the validation set.

Tab. 6 shows that using the ground-truth corner heatmaps alone improves the AP from 38.4% to 73.1%.

² We use the best model publicly available on https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md
APs, APm and APl also increase by 42.3%, 40.7% and 30.0% respectively. If we replace the predicted offsets with the ground-truth offsets, the AP further increases by 13.0% to 86.1%. This suggests that although there is still ample room for improvement in both detecting and grouping corners, the main bottleneck is detecting corners. Fig. 9 shows some qualitative examples where the corner locations or embeddings are incorrect.

With multi-scale evaluation, CornerNet achieves an AP of 42.2%, the state of the art among existing one-stage methods and competitive with two-stage methods.

5 Conclusion

We have presented CornerNet, a new approach to object detection that detects bounding boxes as pairs of corners. We evaluate CornerNet on MS COCO and demonstrate competitive results.
Acknowledgements This work is partially supported by a grant from Toyota Research Institute and a DARPA grant FA8750-18-2-0019. This article solely reflects the opinions and conclusions of its authors.

References

Bell, S., Lawrence Zitnick, C., Bala, K., and Girshick, R. (2016). Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2874–2883.
Bodla, N., Singh, B., Chellappa, R., and Davis, L. S. (2017). Soft-NMS: Improving object detection with one line of code. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5562–5570. IEEE.
Cai, Z., Fan, Q., Feris, R. S., and Vasconcelos, N. (2016). A unified multi-scale deep convolutional neural network for fast object detection. In European Conference on Computer Vision, pages 354–370. Springer.
Cai, Z. and Vasconcelos, N. (2017). Cascade R-CNN: Delving into high quality object detection. arXiv preprint arXiv:1712.00726.
Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., and Feng, J. (2017). Dual path networks. In Advances in Neural Information Processing Systems, pages 4470–4478.
Dai, J., Li, Y., He, K., and Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. arXiv preprint arXiv:1605.06409.
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017). Deformable convolutional networks. arXiv preprint arXiv:1703.06211.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.
Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2015). The Pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136.
Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A. C. (2017). DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659.
Girshick, R. (2015). Fast R-CNN. arXiv preprint arXiv:1504.08083.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.
He, K., Zhang, X., Ren, S., and Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision, pages 346–361. Springer.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In IEEE CVPR.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456.
Jiang, B., Luo, R., Mao, J., Xiao, T., and Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. In Computer Vision – ECCV 2018, pages 816–832. Springer.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kong, T., Sun, F., Yao, A., Liu, H., Lu, M., and Chen, Y. (2017). RON: Reverse connection with objectness prior networks for object detection. arXiv preprint arXiv:1707.01691.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., and Sun, J. (2017). Light-head R-CNN: In defense of two-stage object detector. arXiv preprint arXiv:1711.07264.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2016). Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017). Focal loss for dense object detection. arXiv preprint arXiv:1708.02002.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer.
Newell, A. and Deng, J. (2017). Pixels to graphs by associative embedding. In Advances in Neural Information Processing Systems, pages 2168–2177.
Newell, A., Huang, Z., and Deng, J. (2017). Associative embedding: End-to-end learning for joint detection and grouping. In Advances in Neural Information Processing Systems, pages 2274–2284.
Newell, A., Yang, K., and Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision, pages 483–499. Springer.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017). Automatic differentiation in PyTorch.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788.
Redmon, J. and Farhadi, A. (2016). YOLO9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99.
Shen, Z., Liu, Z., Li, J., Jiang, Y.-G., Chen, Y., and Xue, X. (2017a). DSOD: Learning deeply supervised object detectors from scratch. In The IEEE International Conference on Computer Vision (ICCV), volume 3, page 7.
Shen, Z., Shi, H., Feris, R., Cao, L., Yan, S., Liu, D., Wang, X., Xue, X., and Huang, T. S. (2017b). Learning object detectors from scratch with gated recurrent feature pyramids. arXiv preprint arXiv:1712.00886.
Shrivastava, A., Sukthankar, R., Malik, J., and Gupta, A. (2016). Beyond skip connections: Top-down modulation for object detection. arXiv preprint arXiv:1612.06851.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Singh, B. and Davis, L. S. (2017). An analysis of scale invariance in object detection – SNIP. arXiv preprint arXiv:1711.08189.
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, volume 4, page 12.
Tychsen-Smith, L. and Petersson, L. (2017a). DeNet: Scalable real-time object detection with directed sparse sampling. arXiv preprint arXiv:1703.10295.
Tychsen-Smith, L. and Petersson, L. (2017b). Improving object localization with fitness NMS and bounded IoU loss. arXiv preprint arXiv:1711.00164.
Uijlings, J. R., van de Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171.
Wang, X., Chen, K., Huang, Z., Yao, C., and Liu, W. (2017). Point linking network for object detection. arXiv preprint arXiv:1706.03646.
Xiang, Y., Choi, W., Lin, Y., and Savarese, S. (2016). Subcategory-aware convolutional neural networks for object proposals and detection. arXiv preprint arXiv:1604.04693.
Xu, H., Lv, X., Wang, X., Ren, Z., and Chellappa, R. (2017). Deep regionlets for object detection. arXiv preprint arXiv:1712.02408.
Zhai, Y., Fu, J., Lu, Y., and Li, H. (2017). Feature selective networks for object detection. arXiv preprint arXiv:1711.08879.
Zhang, S., Wen, L., Bian, X., Lei, Z., and Li, S. Z. (2017). Single-shot refinement neural network for object detection. arXiv preprint arXiv:1711.06897.
Zhu, Y., Zhao, C., Wang, J., Zhao, X., Wu, Y., and Lu, H. (2017). CoupleNet: Coupling global structure with local parts for object detection. In Proceedings of the International Conference on Computer Vision (ICCV).
Zitnick, C. L. and Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European Conference on Computer Vision, pages 391–405. Springer.