Holistically-Nested Edge Detection
Abstract

We develop a new edge detection algorithm that addresses two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning. Our proposed method, holistically-nested edge detection (HED), performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets. HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important in order to resolve the challenging ambiguity in edge and object boundary detection. We significantly advance the state-of-the-art on the BSD500 dataset (ODS F-score of .782) and the NYU Depth dataset (ODS F-score of .746), and do so with an improved speed (0.4 s per image) that is orders of magnitude faster than some recent CNN-based edge detection algorithms.

Figure 1. Illustration of the proposed HED algorithm. In the first row: (a) shows an example test image in the BSD500 dataset [28]; (b) shows its corresponding edges as annotated by human subjects; (c) displays the HED results. In the second row: (d), (e), and (f), respectively, show side edge responses from layers 2, 3, and 4 of our convolutional neural networks. In the third row: (g), (h), and (i), respectively, show edge responses from the Canny detector [4] at the scales σ = 2.0, σ = 4.0, and σ = 8.0. HED shows a clear advantage in consistency over Canny.

1. Introduction

In this paper, we address the problem of detecting edges and object boundaries in natural images. This problem is both fundamental and of great importance to a variety of computer vision areas, ranging from traditional tasks such as visual saliency, segmentation, object detection/recognition, tracking and motion analysis, medical imaging, structure-from-motion, and 3D reconstruction, to modern applications like autonomous driving, mobile computing, and image-to-text analysis. It has long been understood that precisely localizing edges in natural images involves visual perception at various "levels" [18, 27]. A relatively comprehensive data collection and cognitive study [28] shows that while different subjects do have somewhat different preferences regarding where to place edges and boundaries, there is nonetheless impressive consistency between subjects, e.g. reaching an F-score of 0.80 in the consistency study [28].
The history of computational edge detection is extremely rich; we now highlight a few representative works that have proven to be of great practical importance. Broadly speaking, one may categorize works into a few groups, such as: I, early pioneering methods like the Sobel detector [20], zero-crossing [27, 37], and the widely adopted Canny detector [4]; II, methods driven by information theory on top of features arrived at through careful manual design, such as Statistical Edges [22], Pb [28], and gPb [1]; and III, learning-based methods that remain reliant on features of human design, such as BEL [5], Multi-scale [30], Sketch Tokens [24], and Structured Edges [6]. In addition, there has been a recent wave of development using Convolutional Neural Networks that emphasizes the importance of automatic hierarchical feature learning, including N4-Fields [10], DeepContour [34], DeepEdge [2], and CSCNN [19]. Prior to this explosive development in deep learning, the Structured Edges method (typically abbreviated SE) [6] emerged as one of the most celebrated systems for edge detection, thanks to its state-of-the-art performance on the BSD500 dataset [28] (with, e.g., an F-score of .746) and its practically significant speed of 2.5 frames per second. Recent CNN-based methods [10, 34, 2, 19] have demonstrated promising F-score performance improvements over SE. However, there still remains large room for improvement in these CNN-based methods, in both F-score performance and in speed — at present, the time to make a prediction ranges from several seconds [10] to a few hours [2] (even when using modern GPUs).

Here, we develop an end-to-end edge detection system, holistically-nested edge detection (HED), that automatically learns the type of rich hierarchical features that are crucial if we are to approach the human ability to resolve ambiguity in natural image edge and object boundary detection. We use the term "holistic" because HED, despite not explicitly modeling structured output, aims to train and predict edges in an image-to-image fashion. With "nested", we emphasize the inherited and progressively refined edge maps produced as side outputs — we intend to show that the path along which each prediction is made is common to each of these edge maps, with successive edge maps being more concise. This integrated learning of hierarchical features is in distinction to previous multi-scale approaches [40, 41, 30] in which scale-space edge fields are neither automatically learned nor hierarchically connected. Figure 1 gives an illustration of an example image together with the human subject ground truth annotation, as well as results by the proposed HED edge detector (including the side responses of the individual layers), and results by the Canny edge detector [4] with different scale parameters. Not only are Canny edges at different scales not directly connected, they also exhibit spatial shift and inconsistency.

The proposed holistically-nested edge detector (HED) tackles two critical issues: (1) holistic image training and prediction, inspired by fully convolutional neural networks [26], for image-to-image classification (the system takes an image as input, and directly produces the edge map image as output); and (2) nested multi-scale feature learning, inspired by deeply-supervised nets [23], that performs deep layer supervision to "guide" early classification results. We find that the favorable characteristics of these underlying techniques manifest in HED being both accurate and computationally efficient.

2. Holistically-Nested Edge Detection

In this section, we describe in detail the formulation of our proposed edge detection system. We start by discussing related neural-network-based approaches, particularly those that emphasize multi-scale and multi-level feature learning. The task of edge and object boundary detection is inherently challenging. After decades of research, a number of key properties have emerged that are likely to play a role in a successful system: (1) carefully designed and/or learned features [28, 5]; (2) multi-scale response fusion [40, 32, 30]; (3) engagement of different levels of visual perception [18, 27, 39, 17], such as mid-level Gestalt law information [7]; (4) incorporation of structural information (the intrinsic correlation carried within the input data and output solution) [6] and context (both short- and long-range interactions) [38]; (5) holistic image prediction (referring to approaches that perform prediction by taking the image contents globally and directly) [25]; (6) exploitation of 3D geometry [15]; and (7) handling of occlusion boundaries [16].

Structured Edges (SE) [6] primarily focuses on three of these aspects: using a large number of manually designed features (property 1), fusing multi-scale responses (property 2), and incorporating structural information (property 4). A recent wave of work using CNNs for patch-based edge prediction [10, 34, 2, 19] contains an alternative common thread that focuses on three aspects: automatic feature learning (property 1), multi-scale response fusion (property 2), and possible engagement of different levels of visual perception (property 3). However, due to the lack of deep supervision (which we include in our method), the multi-scale responses produced at the hidden layers in [2, 19] are less semantically meaningful, since feedback must be back-propagated through the intermediate layers. More importantly, their patch-to-pixel or patch-to-patch strategy results in significantly downgraded training and prediction efficiency. By "holistically-nested", we intend to emphasize that we are producing an end-to-end edge detection system, a strategy inspired by fully convolutional neural networks [26], but with additional deep supervision on top of trimmed VGG nets [36] (shown in Figure 3). In the absence of deep supervision and side outputs, a fully convolutional network [26] (FCN) produces a less satisfactory result (e.g. an F-score of .745 on BSD500) than HED, since edge detection demands highly accurate edge pixel localization. One thing worth mentioning is that our image-to-image training and prediction strategy still does not explicitly engage contextual information, since constraints on neighboring pixel labels are not directly enforced in HED. In addition to the speed gain over patch-based CNN edge detection methods, the performance gain is largely due to three aspects: (1) FCN-like image-to-image training allows us to simultaneously train on a significantly larger number of samples (see Table 4); (2) deep supervision in our model guides the learning of more transparent features (see Table 2); and (3) interpolating the side outputs in the end-to-end learning encourages coherent contributions from each layer (see Table 3).
[Figure 2: schematic comparison of the multi-scale deep architecture variants (a)-(e) discussed in Section 2.1; the diagram's node labels are "Output Layer", "Hidden Layer", and "Output Data".]
2.1. Existing multi-scale and multi-level NN

Due to the nature of hierarchical learning in deep convolutional neural networks, the concept of multi-scale and multi-level learning might differ from situation to situation. For example, multi-scale learning can be "inside" the neural network, in the form of increasingly larger receptive fields and downsampled (strided) layers. In this "inside" case, the feature representations learned in each layer are naturally multi-scale. On the other hand, multi-scale learning can be "outside" of the neural network, for example by "tweaking the scales" of input images. While these two variants have some notable similarities, we have seen both of them applied to various tasks.

We continue by formalizing the possible configurations of multi-scale deep learning into four categories, namely, multi-stream learning, skip-net learning, a single model running on multiple inputs, and training of independent networks. An illustration is shown in Fig 2. Having these possibilities in mind will help make clearer the ways in which our proposed holistically-nested network approach differs from previous efforts and will help to highlight the important benefits in terms of representation and efficiency.

Multi-stream learning [3, 29]: A typical multi-stream learning architecture is illustrated in Fig 2(a). Note that the multiple (parallel) network streams have different parameter numbers and receptive field sizes, corresponding to multiple scales. Input data are simultaneously fed into multiple streams, after which the concatenated feature responses produced by the various streams are fed into a global output layer to produce the final result.

Skip-layer network learning: Examples of this form of network include [26, 14, 2, 33, 10]. The key concept in "skip-layer" network learning is shown in Fig 2(b). Instead of training multiple parallel streams, the topology for the skip-net architecture centers on a primary stream. Links are added to incorporate the feature responses from different levels of the primary network stream, and these responses are then combined in a shared output layer.

A common point in the two settings above is that, in both of the architectures, there is only one output loss function with a single prediction produced. However, in edge detection, it is often favorable (and indeed prevalent) to obtain multiple predictions and to combine the edge maps together.

Single model on multiple inputs: To get multi-scale predictions, one can also run a single network (or networks with tied weights) on multiple (scaled) input images, as illustrated in Fig 2(c). This strategy can be applied at both the training stage (as data augmentation) and the testing stage (as "ensemble testing"). One notable example is the tied-weight pyramid network [8]. This approach is also common in non-deep-learning-based methods [6]. Note that ensemble testing impairs the prediction efficiency of learning systems, especially with deeper models [2, 10].

Training independent networks: As an extreme variant of Fig 2(a), one might pursue Fig 2(d), in which multi-scale predictions are made by training multiple independent networks with different depths and different output loss layers. This might be practically challenging to implement, as the duplication would multiply the amount of resources required for training.

Holistically-nested networks: We list these variants to help clarify the distinction between existing approaches and our proposed holistically-nested network approach, illustrated in Fig 2(e). There is often significant redundancy in existing approaches, in terms of both representation and computational complexity. Our proposed holistically-nested network is a relatively simple variant that is able to produce predictions from multiple scales. The architecture can be interpreted as a "holistically-nested" version of the "independent networks" approach in Fig 2(d), motivating our choice of name. Our architecture comprises a single-stream deep network with multiple side outputs. This architecture resembles several previous works, particularly the deeply-supervised net [23] approach, in which the authors show that hidden layer supervision can improve both optimization and generalization for image classification tasks. The multiple side outputs also give us the flexibility to add an additional fusion layer if a unified output is desired; a minimal code sketch of this design follows.
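To make the single-stream, multi-side-output design concrete, here is a minimal sketch in PyTorch (the paper's own implementation is in Caffe, per Section 4.1; all module and variable names below are ours and illustrative). It wires a trimmed VGG-16-style trunk [36] with a 1x1 convolution classifier on the last layer of each convolutional stage, bilinearly upsamples each side output to the input resolution, and fuses them with a weighted-fusion layer whose weights are initialized to 1/5, matching the setting reported in Section 4.1.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HEDSketch(nn.Module):
        """Illustrative HED-style network: a single-stream VGG-16-like trunk
        with one 1x1-conv side-output classifier per stage and a learned
        weighted fusion of the upsampled side outputs."""

        def __init__(self):
            super().__init__()
            # (in_channels, out_channels, number of 3x3 convs) per VGG-16 stage
            cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3),
                   (256, 512, 3), (512, 512, 3)]
            self.stages = nn.ModuleList()
            self.side = nn.ModuleList()
            for c_in, c_out, n in cfg:
                convs = []
                for i in range(n):
                    convs += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                              nn.ReLU(inplace=True)]
                self.stages.append(nn.Sequential(*convs))
                self.side.append(nn.Conv2d(c_out, 1, kernel_size=1))
            self.pool = nn.MaxPool2d(2, stride=2, ceil_mode=True)
            # fusion over the five side outputs, each weight initialized to 1/5
            self.fuse = nn.Conv2d(5, 1, kernel_size=1, bias=False)
            nn.init.constant_(self.fuse.weight, 0.2)

        def forward(self, x):
            h, w = x.shape[2:]
            side_outs, feats = [], x
            for i, stage in enumerate(self.stages):
                if i > 0:
                    feats = self.pool(feats)  # downsample between stages
                feats = stage(feats)
                s = self.side[i](feats)       # one-channel side-output logits
                # upsample each side output back to the input resolution
                side_outs.append(F.interpolate(s, size=(h, w), mode='bilinear',
                                               align_corners=False))
            fused = self.fuse(torch.cat(side_outs, dim=1))
            return side_outs, fused           # sigmoids are applied in the loss

At test time, sigmoid(fused), or alternatively an average of the side outputs (cf. Table 3), serves as the edge probability map.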
2.2. Formulation

Training Phase. We denote our input training data set by S = {(X_n, Y_n), n = 1, ..., N}, where sample X_n = {x_j^(n), j = 1, ..., |X_n|} denotes the raw input image and Y_n = {y_j^(n), j = 1, ..., |X_n|}, y_j^(n) ∈ {0, 1}, denotes the corresponding ground truth binary edge map for image X_n. We subsequently drop the subscript n for notational simplicity, since we consider each image holistically and independently. Our goal is to have a network that learns features from which it is possible to produce edge maps approaching the ground truth. For simplicity, we denote the collection of all standard network layer parameters as W. Suppose in the network we have M side-output layers. Each side-output layer is also associated with a classifier, in which the corresponding weights are denoted as w = (w^(1), ..., w^(M)). We consider the objective function

    L_side(W, w) = Σ_{m=1}^{M} α_m ℓ_side^(m)(W, w^(m)),    (1)

where ℓ_side denotes the image-level loss function for side-outputs. In our image-to-image training, the loss function is computed over all pixels in a training image X = (x_j, j = 1, ..., |X|) and edge map Y = (y_j, j = 1, ..., |X|), y_j ∈ {0, 1}. For a typical natural image, the distribution of edge/non-edge pixels is heavily biased: 90% of the ground truth is non-edge. A cost-sensitive loss function is proposed [...]

[Figure 3: the HED network architecture. An input image X is passed through a single-stream deep network; side-output layers 1-5 (with receptive field sizes 5, 14, 40, 92, and 196) each produce an edge map, and a weighted-fusion layer combines them. Error propagation paths run from both the side-output layers and the weighted-fusion layer back through the network, each supervised by the ground truth Y.]
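The detailed form of this cost-sensitive loss falls on pages missing from this copy, so the sketch below is illustrative only: a per-image class-balanced cross-entropy of the kind the passage motivates, plugged into the weighted sum of Eq. (1). A fusion-output term is included because Figure 3 shows an error propagation path from the weighted-fusion layer; the paper's exact balancing and fusion weighting may differ.

    import torch
    import torch.nn.functional as F

    def balanced_bce(logits, target):
        """Class-balanced cross-entropy for one side output.
        logits, target: (N, 1, H, W); target entries are in {0, 1}.
        Edge pixels are up-weighted by the non-edge fraction beta, so the
        roughly 90% non-edge pixels do not dominate the gradient."""
        beta = 1.0 - target.sum() / target.numel()   # fraction of non-edges
        weight = torch.where(target > 0.5, beta, 1.0 - beta)
        return F.binary_cross_entropy_with_logits(logits, target, weight=weight)

    def hed_loss(side_logits, fused_logits, target, alphas=None):
        """Weighted sum of side-output losses, as in Eq. (1), plus a term
        for the weighted-fusion output."""
        alphas = alphas or [1.0] * len(side_logits)  # alpha_m = 1 (Sec. 4.1)
        loss = sum(a * balanced_bce(s, target)
                   for a, s in zip(alphas, side_logits))
        return loss + balanced_bce(fused_logits, target)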
[...] stage, respectively conv1_2, conv2_2, conv3_3, conv4_3, [...]

[Figure 4: side-output edge responses trained with deep supervision ("w/ deep supervision") versus without ("w/o deep supervision").]
[...] experimental setup, the result on the benchmark dataset (row three of Table 2) differs only marginally in F-score but displays severely degenerated average precision; without direct control and guidance across multiple scales, this network is heavily biased towards learning large-structure edges.

4. Experiments

In this section we discuss our detailed implementation and report the performance of our proposed algorithm.

4.1. Implementation

We implement our framework using the publicly available Caffe library, building on top of the existing implementations of FCN [26] and DSN [23]. Thus, relatively little engineering hacking is required. In our HED system, the whole network is fine-tuned from an initialization with the pre-trained VGG-16 Net model.

Model parameters. In contrast to fine-tuning a CNN to perform image classification or semantic segmentation, adapting a CNN to perform low-level edge detection requires special care. Differences in data distribution, ground truth distribution, and loss function all contribute to difficulties in network convergence, even with the initialization of a pre-trained model. We first use a validation set and follow the evaluation strategy used in [6] to tune the deep model hyper-parameters. The hyper-parameters (and the values we choose) include: mini-batch size (10), learning rate (1e-6), loss weight α_m for each side-output layer (1), momentum (0.9), initialization of the nested filters (0), initialization of the fusion-layer weights (1/5), weight decay (0.0002), and number of training iterations (10,000; divide the learning rate by 10 after 5,000). We focus on the convergence behavior of the network. We observe that whenever training converges, the deviations in F-score on the validation set tend to be very small. In order to investigate whether including additional nonlinearity helps, we also consider a setting in which we add an additional layer (with 50 filters and a ReLU) before each side-output layer; we find that this worsens performance. On another note, we observe that our nested multi-scale framework is insensitive to input image scales; during our training process, we take advantage of this by resizing all the images to 400 × 400 to reduce GPU memory usage and to take advantage of efficient batch processing. In the experiments that follow, we fix the values of all hyper-parameters discussed above to explore the benefits of possible variants of HED.

Consensus sampling. In our approach, we duplicate the ground truth at each side-output layer and resize the (downsampled) side output to its original scale. Thus, there exists a mismatch in the high-level side-outputs: the edge predictions are coarse and global, while the ground truth still contains many weak edges that could even be considered as noise. This issue leads to problematic convergence behavior, even with the help of a pre-trained model. We observe that this mismatch leads to back-propagated gradients that explode at the high-level side-output layers. We therefore adjust how we make use of the ground truth labels in the BSDS dataset to combat this issue. Specifically, the ground truth labels are provided by multiple annotators and thus, implicitly, greater labeler consensus indicates stronger ground truth edges. We adopt a relatively brute-force solution: only assign a pixel a positive label if it is labeled as positive by at least three annotators, and regard all other labeled pixels as negatives. This helps with the problem of gradient explosion in the high-level side-output layers. For the low-level layers, this consensus approach brings additional robustness to edge classification and prevents the network from being distracted by weak edges. Although not fully explored in our paper, a careful handling of consensus levels of ground truth edges might lead to further improvement.
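The consensus rule just described is straightforward to implement. Below is a minimal NumPy sketch, assuming the BSDS ground truth is available as one binary map per annotator (the function name and data layout are ours).

    import numpy as np

    def consensus_ground_truth(annotator_maps, min_votes=3):
        """Binarize multi-annotator edge labels by consensus: a pixel gets a
        positive edge label only if at least `min_votes` annotators marked
        it; all other pixels are treated as negatives.
        annotator_maps: list of (H, W) binary arrays, one per annotator."""
        votes = np.sum([m.astype(np.int32) for m in annotator_maps], axis=0)
        return (votes >= min_votes).astype(np.float32)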
Data augmentation. Data augmentation has proven to be a crucial technique in deep networks. We rotate the images to 16 different angles and crop the largest rectangle in the rotated image; we also flip the image at each angle, leading to an augmented training set that is a factor of 32 larger than the unaugmented set. During testing we operate on an input image at its original size. We also note that "ensemble testing" (making predictions on rotated/flipped images and averaging the predictions) yields no improvement in F-score, nor in average precision.
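As a rough sketch of this augmentation pipeline (the paper does not publish the cropping code; the inscribed-rectangle geometry below is a standard construction, and the helper names are ours): rotate to each of 16 angles, crop the largest axis-aligned rectangle that fits inside the rotated frame, and mirror each crop, for a 32x larger set.

    import math
    from PIL import Image

    def largest_rotated_rect(w, h, angle_rad):
        """Width/height of the largest axis-aligned rectangle inside a
        w x h rectangle rotated by angle_rad (standard geometric result)."""
        width_is_longer = w >= h
        long_side, short_side = (w, h) if width_is_longer else (h, w)
        sin_a, cos_a = abs(math.sin(angle_rad)), abs(math.cos(angle_rad))
        if short_side <= 2.0 * sin_a * cos_a * long_side or abs(sin_a - cos_a) < 1e-10:
            # half-constrained case: two crop corners touch the longer side
            x = 0.5 * short_side
            wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
        else:
            cos_2a = cos_a * cos_a - sin_a * sin_a
            wr = (w * cos_a - h * sin_a) / cos_2a
            hr = (h * cos_a - w * sin_a) / cos_2a
        return wr, hr

    def augment(image, label, n_angles=16):
        """Yield 2 * n_angles (image, edge map) pairs: each rotation is
        cropped to its largest inscribed rectangle, then also mirrored."""
        w, h = image.size
        for k in range(n_angles):
            angle = 360.0 * k / n_angles
            rw, rh = largest_rotated_rect(w, h, math.radians(angle))
            img = image.rotate(angle, resample=Image.BILINEAR, expand=True)
            lab = label.rotate(angle, resample=Image.NEAREST, expand=True)
            cx, cy = img.size[0] / 2.0, img.size[1] / 2.0
            box = (int(cx - rw / 2), int(cy - rh / 2),
                   int(cx + rw / 2), int(cy + rh / 2))
            img, lab = img.crop(box), lab.crop(box)
            yield img, lab
            yield (img.transpose(Image.FLIP_LEFT_RIGHT),
                   lab.transpose(Image.FLIP_LEFT_RIGHT))

Note that the edge map is rotated with nearest-neighbor resampling so that its labels stay binary.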
Different pooling functions. Previous work [2] suggests that different pooling functions can have a major impact on edge detection results. We conduct a controlled experiment in which all pooling layers are replaced by average pooling. We find that using average pooling decreases the performance to ODS=.741.

In-network bilinear interpolation. Side-output prediction upsampling is implemented with in-network deconvolutional layers, similar to those in [26]. We fix all the deconvolutional layers to perform linear interpolation. Although it was pointed out in [26] that one can learn arbitrary interpolation functions, we find that learned deconvolutions provide no noticeable improvement in our experiments.
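Fixing a deconvolutional layer to perform bilinear interpolation amounts to initializing (and freezing) its kernel with bilinear weights. The sketch below follows the standard FCN-style recipe for building such a kernel; the helper name is ours.

    import numpy as np

    def bilinear_kernel(factor, channels=1):
        """Weights that make a transposed convolution (kernel size
        2*factor - factor % 2, stride `factor`) perform bilinear
        upsampling by `factor`; one independent filter per channel."""
        size = 2 * factor - factor % 2
        center = (size - 1) / 2.0 if size % 2 == 1 else factor - 0.5
        og = np.ogrid[:size, :size]
        filt = ((1 - abs(og[0] - center) / factor) *
                (1 - abs(og[1] - center) / factor))
        weights = np.zeros((channels, channels, size, size), dtype=np.float32)
        for c in range(channels):
            weights[c, c] = filt   # no cross-channel mixing
        return weights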
Running time. Training takes about 7 hours on a single NVIDIA K40 GPU. For a 320 × 480 image, it takes HED 400 ms to produce the final edge map (including the interface overhead), which is significantly faster than existing CNN-based methods [34, 2]. Some previous edge detectors also try to improve performance by the less desirable expedient of sacrificing efficiency (for example, by testing on input images at multiple scales and averaging the results).

4.2. BSDS500 dataset
We evaluate HED on the Berkeley Segmentation Dataset and Benchmark (BSDS 500) [1], which is composed of 200 training, 100 validation, and 200 testing images. Each image has manually annotated ground truth contours. Edge detection accuracy is evaluated using three standard measures: fixed contour threshold (ODS), per-image best threshold (OIS), and average precision (AP). We apply a standard non-maximal suppression technique to our edge maps to obtain thinned edges for evaluation. The results are shown in Figure 5 and Table 4.
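For readers unfamiliar with these measures, the sketch below shows how per-threshold match counts reduce to ODS and OIS F-scores. The correspondence between predicted and ground-truth edge pixels (with its spatial tolerance) is computed by the standard BSDS evaluation code and is assumed here as input; the function names and this aggregation layout are ours.

    import numpy as np

    def f_measure(p, r, eps=1e-12):
        return 2 * p * r / np.maximum(p + r, eps)

    def ods_ois(tp_pred, n_pred, tp_gt, n_gt, eps=1e-12):
        """ODS/OIS aggregation over per-image, per-threshold match counts.
        Each argument is an (n_images, n_thresholds) array: matched and
        total prediction pixels, matched and total ground-truth pixels."""
        # ODS: one contour threshold fixed over the whole dataset
        p_ds = tp_pred.sum(0) / np.maximum(n_pred.sum(0), eps)
        r_ds = tp_gt.sum(0) / np.maximum(n_gt.sum(0), eps)
        ods = f_measure(p_ds, r_ds).max()
        # OIS: the best threshold is chosen independently for each image
        p_im = tp_pred / np.maximum(n_pred, eps)
        r_im = tp_gt / np.maximum(n_gt, eps)
        best = f_measure(p_im, r_im).argmax(axis=1)
        rows = np.arange(tp_pred.shape[0])
        p_oi = tp_pred[rows, best].sum() / max(n_pred[rows, best].sum(), eps)
        r_oi = tp_gt[rows, best].sum() / max(n_gt[rows, best].sum(), eps)
        return ods, f_measure(p_oi, r_oi)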
Figure 5. Results on the BSDS500 dataset. Our proposed HED framework achieves the best result (ODS=.782). Compared to several recent CNN-based edge detectors, our approach is also orders of magnitude faster. See Table 4 for a detailed discussion. [Precision-recall plot; legend entries (each with its F-score): Human, HED (ours), DeepContour, CSCNN, DeepEdge, OEF, SE-multi-ucm, SE, SCG, Sketch Tokens, gPb-owt-ucm, ISCRA, Gb, Mean Shift, Normalized Cuts, Felz-Hutt, Canny.]
Table 3. Results of single and averaged side outputs of HED on the BSDS 500 dataset. Each individual side output contributes to the fused/averaged result. Note that the learned weighted fusion (Fusion-output) achieves the best F-score, while directly averaging all five layers (Average 1-5) produces better average precision. Merging those two readily available outputs further boosts the performance.

                 ODS   OIS   AP
Side-output 1   .595  .620  .582
Side-output 2   .697  .715  .673
Side-output 3   .738  .756  .717
Side-output 4   .740  .759  .672
Side-output 5   .606  .611  .429
Fusion-output   .782  .802  .787
Average 1-4     .760  .784  .800
Average 1-5     .774  .797  .822
Average 2-4     .766  .788  .798
Average 2-5     .777  .800  .814
Merged result   .782  .804  .833

Table 4. Results on BSDS500. ∗BSDS300 results; †GPU time.

                     ODS   OIS   AP    FPS
Human                .80   .80   -     -
Canny                .600  .640  .580  15
Felz-Hutt [9]        .610  .640  .560  10
BEL [5]              .660∗ -     -     1/10
gPb-owt-ucm [1]      .726  .757  .696  1/240
Sketch Tokens [24]   .727  .746  .780  1
SCG [31]             .739  .758  .773  1/280
SE-Var [6]           .746  .767  .803  2.5
OEF [13]             .749  .772  .817  -
DeepNets [21]        .738  .759  .758  1/5†
N4-Fields [10]       .753  .769  .784  1/6†
DeepEdge [2]         .753  .772  .807  1/10^3†
CSCNN [19]           .756  .775  .798  -
DeepContour [34]     .756  .773  .797  1/30†
HED (ours)           .782  .804  .833  2.5†, 1/12
Side outputs. To explicitly validate the side outputs, we summarize the results produced by the individual side outputs at different scales in Table 3, including different combinations of the multi-scale edge maps. We emphasize here that all the side-output predictions are obtained in one pass; this enables us to fully investigate different configurations of combining the outputs at no extra cost. There are several interesting observations from the results: for instance, combining predictions from multiple scales yields better performance; moreover, all of the side-output layers contribute to the performance gain, either in F-score or in average precision. To see this, note that in Table 3, side-output layers 1 and 5 (the lowest and highest layers) achieve similar, relatively low performance. One might expect these two side-output layers not to be useful in the averaged results. However, this turns out not to be the case — for example, Average 1-4 achieves ODS=.760, and incorporating side-output layer 5, the averaged prediction achieves ODS=.774. We find a similar phenomenon when considering other ranges. As mentioned above, the predictions obtained using different combination strategies are complementary, and a late merging of the averaged predictions with the learned fusion-layer predictions leads to the best result. Another observation is that, when compared to previous "non-deep" methods, the performance of all "deep" methods drops more in the high-recall regime. This might indicate that deep learned features are capable of (and favor) learning the global object boundary — thus many weak edges are omitted. HED is better than the other deep-learning-based methods in the high-recall regime because deep supervision helps us to take the low-level predictions into account.

Late merging to boost average precision. We find that the weighted-fusion layer output gives the best performance in F-score. However, the average precision degrades compared to directly averaging all the side outputs. This might be due to our focus on "global" object boundaries during fusion-layer weight learning. Taking advantage of the readily available side outputs in HED, we merge the fusion-layer output with the side outputs (at no extra cost) in order to compensate for the loss in average precision. This simple heuristic gives us the best performance across all measures that we report in Figure 5 and Table 4.
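The text leaves the exact merging rule unspecified; one natural reading, sketched below, is an equal-weight average of the two readily available outputs (the 0.5 weighting is our assumption).

    import numpy as np

    def merge_predictions(side_maps, fused_map):
        """Late merging: combine the mean of the upsampled side-output
        probability maps ("Average 1-5" in Table 3) with the learned
        weighted-fusion output.
        side_maps: list of (H, W) arrays; fused_map: (H, W) array."""
        average = np.mean(side_maps, axis=0)
        return 0.5 * (average + fused_map)   # equal weights: an assumption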
More training data. Deep models have significantly advanced results in a variety of computer vision applications, at least in part due to the availability of large training data. In edge detection, however, we are limited by the number of training images available in the existing benchmarks. Here we want to explore whether adding more training data will help further improve the results. To do this, we expand the training set by randomly sampling 100 images from the test set. We then evaluate the result on the remaining 100 test images. We report the averaged result over 5 such trials. We observe that by adding only 100 training images, performance improves from ODS=.782 to ODS=.797 (±.003), nearly touching the human benchmark. This shows a potentially promising direction for further enhancing HED by training it with a larger dataset.

4.3. NYUDv2 Dataset

The NYU Depth (NYUD) dataset [35] has 1449 RGB-D images. This dataset was used for edge detection in [31] and [11]. Here we use the setting described in [6] and evaluate HED on data processed by [11]. The NYUD dataset is split into 381 training, 414 validation, and 654 testing images. All images are made the same size, and we train our network on full-resolution images. As in [12, 6], during evaluation we increase the maximum tolerance allowed for correct matches of edge predictions to ground truth from .0075 to .011.

Table 5. Results on the NYUD dataset [35]. †GPU time.

                 ODS   OIS   AP    FPS
gPb-ucm          .632  .661  .562  1/360
Silberman [35]   .658  .661  -     <1/360
gPb+NG [11]      .687  .716  .629  1/375
SE [6]           .685  .699  .679  5
SE+NG+ [12]      .710  .723  .738  1/15
HED-RGB          .720  .734  .734  2.5†
HED-HHA          .682  .695  .702  2.5†
HED-RGB-HHA      .746  .761  .786  1†

Figure 6. Precision/recall curves on the NYUD dataset. Holistically-nested edge detection (HED) trained with RGB and HHA features achieves the best result (ODS=.746). See Table 5 for additional information. [Legend entries: HED (ours), SE+NG+, SE, gPb+NG, Silberman, gPb-owt-ucm.]

Depth information encoding. Following the success in [12] and [26], we leverage the depth information by utilizing HHA features, in which the depth information is embedded into three channels: horizontal disparity, height above ground, and angle of the local surface normal with the inferred direction of gravity. We use the same HED architecture and hyper-parameter settings as were used for BSDS 500. We train two different models in parallel, one on RGB images and another on HHA feature images, and report the results below. We directly average the RGB and HHA predictions to produce the final result, thereby leveraging the RGB-D information. We also tried other approaches to incorporating the depth information, for example, training on the raw depth channel, or concatenating the depth channel with the RGB channels before the first convolutional layer. None of these attempts yields notable improvement compared to the approach using HHA. The effectiveness of the HHA features shows that, although deep neural networks are capable of automatic feature learning, for depth data carefully hand-designed features are still necessary, especially when only limited training data is available.
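The RGB-D combination is a direct average of the two models' edge probability maps; a minimal sketch, with hypothetical model callables, follows.

    def rgbd_edges(rgb_input, hha_input, model_rgb, model_hha):
        """Average the edge probability maps of the RGB-trained and
        HHA-trained HED models, as described above. `model_rgb` and
        `model_hha` are assumed to map an input image to an (H, W)
        probability map."""
        return 0.5 * (model_rgb(rgb_input) + model_hha(hha_input))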
Table 5 and Figure 6 show the precision-recall evaluations of HED in comparison to other competing methods. Our network structures for training are kept the same as for BSDS. During testing we use the Average 2-4 prediction instead of the fusion-layer output, as it yields the best performance. We do not perform late merging, since combining two sources of edge map predictions (RGB and HHA) already gives good average precision. Note that the results achieved using the RGB modality only are already better than those of the previous approaches.

5. Conclusion

In this paper, we have developed a new convolutional-neural-network-based edge detection system that demonstrates state-of-the-art performance on natural images at a speed of practical relevance (e.g., 0.4 seconds using a GPU and 12 seconds using a CPU). Our algorithm builds on top of the ideas of fully convolutional neural networks and deeply-supervised nets. We also initialize our network structure and parameters by adopting a pre-trained trimmed VGGNet. Our method shows promising results in performing image-to-image learning by combining multi-scale and multi-level visual responses, even though explicit contextual and high-level information has not been enforced.

Acknowledgment. This work is supported by NSF IIS-1216528 (IIS-1360566), NSF award IIS-0844566 (IIS-1360568), and a Northrop Grumman Contextual Robotics grant. We gratefully thank Patrick Gallagher for helping improve this manuscript. We are grateful for the generous donation of the GPUs by NVIDIA.
References

[1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. PAMI, 33(5):898-916, 2011.
[2] G. Bertasius, J. Shi, and L. Torresani. DeepEdge: A multi-scale bifurcated deep network for top-down contour detection. In CVPR, 2015.
[3] P. Buyssens, A. Elmoataz, and O. Lézoray. Multiscale convolutional neural networks for vision-based classification of cells. In ACCV, 2013.
[4] J. Canny. A computational approach to edge detection. PAMI, (6):679-698, 1986.
[5] P. Dollár, Z. Tu, and S. Belongie. Supervised learning of edges and object boundaries. In CVPR, 2006.
[6] P. Dollár and C. L. Zitnick. Fast edge detection using structured forests. PAMI, 2015.
[7] J. H. Elder and R. M. Goldberg. Ecological statistics of Gestalt laws for the perceptual organization of contours. Journal of Vision, 2(4):5, 2002.
[8] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. PAMI, 2013.
[9] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59(2):167-181, 2004.
[10] Y. Ganin and V. Lempitsky. N4-Fields: Neural network nearest neighbor fields for image transforms. arXiv preprint arXiv:1406.6558, 2014.
[11] S. Gupta, P. Arbelaez, and J. Malik. Perceptual organization and recognition of indoor scenes from RGB-D images. In CVPR, 2013.
[12] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik. Learning rich features from RGB-D images for object detection and segmentation. In ECCV, 2014.
[13] S. Hallman and C. C. Fowlkes. Oriented edge forests for boundary detection. arXiv preprint arXiv:1412.4181, 2014.
[14] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
[15] D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 80(1):3-15, 2008.
[16] D. Hoiem, A. N. Stein, A. A. Efros, and M. Hebert. Recovering occlusion boundaries from a single image. In ICCV, 2007.
[17] X. Hou, A. Yuille, and C. Koch. Boundary detection benchmarking: Beyond F-measures. In CVPR, 2013.
[18] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1):106-154, 1962.
[19] J.-J. Hwang and T.-L. Liu. Pixel-wise deep learning for contour detection. In ICLR, 2015.
[20] J. Kittler. On the accuracy of the Sobel edge detector. Image and Vision Computing, 1(1):37-42, 1983.
[21] J. J. Kivinen, C. K. Williams, and N. Heess. Visual boundary prediction: A deep neural prediction network and quality dissection. In AISTATS, 2014.
[22] S. Konishi, A. L. Yuille, J. M. Coughlan, and S. C. Zhu. Statistical edge detection: Learning and evaluating edge cues. PAMI, 25(1):57-74, 2003.
[23] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.
[24] J. J. Lim, C. L. Zitnick, and P. Dollár. Sketch tokens: A learned mid-level representation for contour and object detection. In CVPR, 2013.
[25] C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing via label transfer. PAMI, 33(12):2368-2382, 2011.
[26] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[27] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society of London. Series B, Biological Sciences, 207(1167):187-217, 1980.
[28] D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 26(5):530-549, 2004.
[29] N. Neverova, C. Wolf, G. W. Taylor, and F. Nebout. Multi-scale deep learning for gesture detection and localization. In ECCV Workshops, 2014.
[30] X. Ren. Multi-scale improves boundary detection in natural images. In ECCV, 2008.
[31] X. Ren and L. Bo. Discriminatively trained sparse code gradients for contour detection. In NIPS, 2012.
[32] D. L. Ruderman and W. Bialek. Statistics of natural images: Scaling in the woods. Physical Review Letters, 73(6):814, 1994.
[33] P. Sermanet, S. Chintala, and Y. LeCun. Convolutional neural networks applied to house numbers digit classification. In ICPR, 2012.
[34] W. Shen, X. Wang, Y. Wang, X. Bai, and Z. Zhang. DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection. In CVPR, 2015.
[35] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.
[36] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[37] V. Torre and T. A. Poggio. On edge detection. PAMI, (2):147-163, 1986.
[38] Z. Tu. Auto-context and its application to high-level vision tasks. In CVPR, 2008.
[39] D. C. Van Essen and J. L. Gallant. Neural mechanisms of form and motion processing in the primate visual system. Neuron, 13(1):1-10, 1994.
[40] A. P. Witkin. Scale-space filtering: A new approach to multi-scale description. In ICASSP, 1984.
[41] A. L. Yuille and T. A. Poggio. Scaling theorems for zero crossings. PAMI, (1):15-25, 1986.