3D Graph Neural Networks for RGBD Semantic Segmentation
Xiaojuan Qi† Renjie Liao‡,§ Jiaya Jia†,♭ Sanja Fidler‡ Raquel Urtasun§,‡
† The Chinese University of Hong Kong   ‡ University of Toronto
§ Uber Advanced Technologies Group   ♭ Youtu Lab, Tencent
{xjqi, leojia}@cse.cuhk.edu.hk {rjliao, fidler, urtasun}@cs.toronto.edu
Abstract
RGBD semantic segmentation requires joint reasoning
about 2D appearance and 3D geometric information. In
this paper we propose a 3D graph neural network (3DGNN)
that builds a k-nearest neighbor graph on top of 3D point
cloud. Each node in the graph corresponds to a set of points
and is associated with a hidden representation vector ini-
tialized with an appearance feature extracted by a unary
CNN from 2D images. Relying on recurrent functions, every
node dynamically updates its hidden representation based
on the current status and incoming messages from its neigh-
bors. This propagation model is unrolled for a certain num-
ber of time steps and the final per-node representation is
used for predicting the semantic class of each pixel. We
use back-propagation through time to train the model. Extensive experiments on NYUD2 and SUN-RGBD datasets demonstrate the effectiveness of our approach.

Figure 1. 2D and 3D context. The solid lines indicate neighbors in 3D while the dotted lines are for neighbors in 2D but not in 3D. (a) Input image. (b) 2D image projected into the 3D point cloud. (c) Prediction by the two-stream CNN with HHA encoding [29]. (d) Our 3DGNN prediction.
1. Introduction
The advent of depth sensors makes it possible to perform RGBD semantic segmentation along with many applications in virtual reality, robotics and human-computer interaction. Compared with the more common 2D image setting, RGBD semantic segmentation can utilize real-world geometric information by exploiting depth information. For example, in Fig. 1(a), given the 2D image alone, the local neighborhood of the red point located on the table inevitably includes microwave and counter pixels. However, in 3D, there is no such confusion because these points are distant in the 3D point cloud, as shown in Fig. 1(b).

Several methods [11, 29, 21, 7] treat RGBD segmentation as a 2D segmentation problem where depth is taken as another input image. Deep convolutional neural networks (CNNs) are applied separately to the color and depth images to extract features. These methods need two CNNs, which doubles computation and memory consumption. Possible errors stem from missing part of the geometric context information, since 2D pixels are associated with 3D ones in the real world. For example, in Fig. 1(c), the two-network model [29] classifies the table as part of the counter.

An alternative is to use 3D CNNs [37] in voxelized 3D space. This type of method has the potential to extract more geometric information. However, since 3D point clouds are quite sparse, effective representation learning from such data is challenging. In addition, 3D CNNs are computationally more expensive than their 2D version, thus it is difficult to scale up these systems to deal with a large number of classes. Anisotropic convolutional neural networks [1, 30] provide a promising way to learn filters in non-Euclidean space for shape analysis. Yet they face the same difficulty of scaling up to perform large-scale RGBD dense semantic segmentation due to the complex association of points.

To tackle the challenges above, we propose an end-to-end 3D graph neural network, which directly learns its representation from 3D points.
Figure 2. Overview of our 3D graph neural network. The top part of the figure shows the 3D point cloud and a close-up of the graph constructed from the point cloud. Blue points and the associated black dotted lines represent nodes and edges that exist in the graph constructed from the 2D image. It is clear that a graph built from the 3D point cloud encodes geometric information that is hard to infer from the 2D image. In the bottom part, we show the sub-graph connected to the red point as an example to illustrate the propagation process. We highlight the source of the messages the red point receives at different time steps using yellow edges.
We first cast the 2D pixels into 3D based on depth information and associate with each 3D point a unary feature vector, i.e., an output of a 2D segmentation CNN. We then build a graph whose nodes are these 3D points and whose edges are constructed by finding the nearest neighbors in 3D. For each node, we take the image feature vector as the initial representation and iteratively update it using a recurrent function. The key idea of this dynamic computation scheme is that the node state is determined by its history state and the messages sent by its neighbors, while taking both appearance and 3D information into consideration.

We use the final state of each node to perform per-node classification. The back-propagation through time (BPTT) algorithm is adopted to compute gradients of the graph neural network. Further, we pass the gradients to the unary CNN to facilitate end-to-end training. Our experimental results show state-of-the-art performance on the challenging NYUD2 and SUN-RGBD datasets.

2. Related Work

2D Semantic Segmentation. Fully convolutional networks (FCN) [29] have demonstrated effectiveness in performing semantic segmentation. In fact, most of the following work [3, 45, 44, 26, 25, 46, 28, 43, 4] is built on top of FCN. Chen et al. [3] used dilated convolutions to enlarge the receptive field of the network while retaining dense prediction. Conditional random fields (CRFs) have been applied as post-processing [3] or have been integrated into the network [45] to refine prediction boundaries. Recently, global and local context has been modeled in scene parsing [42, 27]. In [44, 27], the context is incorporated with pyramid pooling [12]. Liang et al. [22, 23] tackled semantic segmentation as sequence prediction and used LSTMs to capture local and global dependencies. Graph LSTM [22] was used to model structured data. However, the update procedure and sequential processing make it hard to scale up the system to large graphs.

RGBD Semantic Segmentation. Compared to the 2D setting, RGBD semantic segmentation has the benefit of exploiting more geometric information. Several methods have encoded the depth map as an image. In [11, 29, 21], depth information forms three channels via HHA encoding: horizontal disparity, height above ground, and the angle between the local surface normal and the gravity direction. In [7], the depth image is simply treated as a one-channel image. FCNs were then applied to extract semantic features directly on the encoded images.

In [11, 24], a set of 2.5D region proposals is first generated. Each proposal is then represented by its RGB image and the encoded HHA image. Two CNNs were used to extract features separately, which are finally concatenated and passed as input to SVM classification. Besides high computation, separate region proposal generation and label assignment make these systems fragile. The final classification stage can be influenced by errors produced in the proposal stage. Long et al. [29] applied FCN to RGB and HHA images separately and fused the scores for final prediction. Eigen et al. [7] proposed a global-to-local strategy to combine different levels of prediction, which simply extracts features via CNNs from the depth image. The extracted feature is again concatenated with the image one for prediction. Li et al. [21] used LSTMs to fuse the HHA image and color information. These methods all belong to the category that uses 2D CNNs to extract depth features.
Alternatively, several approaches deal with 2.5D data using 3D voxel networks [37, 41, 36]. Song et al. [37] used a 3D dilated voxel convolutional neural network to learn the semantics and occupancy of each voxel. These methods take better advantage of 3D context. However, scaling up to deal with high-resolution and complex scenes is challenging since 3D voxel networks are computationally expensive. Further, quantization of the 3D space can lead to additional errors. Other methods [1, 30] learned non-Euclidean filters for shape analysis. They typically rely on well-defined point associations, e.g., meshes, which are not readily available for complex RGBD segmentation data.

Graph neural networks. In terms of the structure of neural networks, there has been effort to generalize neural networks to graph data. One direction is to apply Convolutional Neural Networks (CNNs) to graphs. In [2, 5, 18], CNNs are employed in the spectral domain relying on the graph Laplacian, while [6] used hash functions so that CNNs can be applied to graphs. Another direction is to recurrently apply neural networks to every node of the graph [9, 33, 20, 39], producing "Graph Neural Networks". This model includes a propagation process, which resembles message passing in graphical models [8]. The learning process of such a model can be achieved by the back-propagation through time (BPTT) algorithm.

3. Graph Neural Networks

A graph neural network associates each node v with a hidden state vector h_v^t that is updated iteratively. At every time step, the propagation model is

m_v^t = M({h_u^t | u ∈ Ω_v}),
h_v^{t+1} = F(h_v^t, m_v^t),        (1)

where m_v^t is a vector that indicates the aggregation of messages that node v receives from its neighbors Ω_v, M is a function to compute the message, and F is the function to update the hidden state. Similar to a recurrent neural network, M and F are feedforward neural networks that are shared among different time steps. Simple choices of M and F are an element-wise summation function and a fully connected layer, respectively. Note that these update functions specify a propagation model of information inside the graph. It is also possible to incorporate more information from the graph with different types of edges using multiple M.

Inference is performed by executing the above propagation model for a certain number of steps. The final prediction can be at the node or at the graph level depending on the task. For example, one can feed the hidden representation (or an aggregation of it) to another neural network to perform node (or graph) classification.

Graph Neural Networks are closely related to many existing models, such as conditional random fields (CRFs) and recurrent neural networks (RNNs). We discuss them next. We focus on pairwise CRFs but note that the connection extends to higher-order models.

Loopy Belief Propagation Inference. We are given a pairwise (often cyclic in practice) CRF whose conditional distribution factorizes as log P(Y|I) ∝ −∑_{i∈V} φ_u(y_i|I) − ∑_{(i,j)∈E} φ_p(y_i, y_j|I), with Y = {y_i | i ∈ V} the set of all labels, I the set of all observed image pixels, and φ_u and φ_p the unary and pairwise potentials, respectively. One fundamental algorithm for approximate inference in general MRFs/CRFs is loopy belief propagation (BP) [31, 8]. The propagation process is denoted as

β_{i→j} = ∑_{y_i} exp{−φ_u(y_i) − φ_p(y_i, y_j)} ∏_{k∈Ω_i\j} β_{k→i}.        (2)

Mean Field Inference. Mean field inference defines an approximate distribution Q(Y) = ∏_i Q(y_i) and minimizes the KL-divergence KL(Q‖P). The fixed-point propagation equations characterize the stationary points of the KL-divergence as

Q(y_i) = (1/Z_i) exp{−φ_u(y_i) − ∑_{j∈Ω_i} E_{Q(y_j)}[φ_p(y_j, y_i)]},        (3)

where Z_i is a normalizing constant and Ω_i is the neighborhood of node i. This fixed-point iteration converges to a local minimum [8]. From Eqs. (1) and (3), it is clear that
the mean field propagation is a special case of graph neural networks. The hidden representation of node i is just the approximate distribution Q(y_i); M and F are the negation of element-wise summation and the softmax function, respectively; and E_{Q(y_j)}[φ_p(y_j, y_i)] is the message sent from node j to node i. While the messages of CRFs lie in the space of output labels y, GNNs have messages m_v^t in the space of hidden representations. GNNs are therefore more flexible in terms of information propagation, as the dimension of the hidden space can be much larger than that of the label space.
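To make this correspondence concrete, the following toy sketch implements the mean field update of Eq. (3) in NumPy; the potentials and neighbor lists are illustrative assumptions, and each iteration has exactly the message-then-update structure of Eq. (1).

```python
import numpy as np

def mean_field_step(Q, unary, pairwise, neighbors):
    """Q: (N, L) current marginals; unary: (N, L) potentials phi_u;
    pairwise: (L, L) potentials phi_p; neighbors: list of neighbor index lists."""
    new_Q = np.empty_like(Q)
    for i in range(len(Q)):
        # "message": sum_j E_{Q(y_j)}[phi_p(y_j, y_i)] over the neighbors of node i
        msg = sum(Q[j] @ pairwise for j in neighbors[i])
        logits = -unary[i] - msg
        new_Q[i] = np.exp(logits - logits.max())   # softmax-style "update" of Eq. (3)
        new_Q[i] /= new_Q[i].sum()                 # normalize by Z_i
    return new_Q
```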
Figure 3. Mean IoU (%) with different numbers of propagation steps for the RNN and LSTM update functions.
4. 3DGNN for RGBD Semantic Segmentation

In this section, we propose a special GNN to tackle the problem of RGBD semantic segmentation.

4.1. Graph Construction

Given an image, we construct a directed graph based on the 2D positions and the depth information of the pixels. Let [x, y, z] be the 3D coordinates of a point in the camera coordinate system and let [u, v] be its projection onto the image according to the pinhole camera model. The geometry of perspective projection yields

x = (u − c_x) · z / f_x,
y = (v − c_y) · z / f_y,        (4)

where f_x and f_y are the focal lengths along the x and y directions, and c_x and c_y are the coordinates of the principal point. To form our graph, we regard each pixel as a node and connect it via directed edges to its K nearest neighbors (KNN) in 3D space, where K is set to 64 in our experiments. Note that this process creates an asymmetric structure, i.e., an edge from A to B does not necessarily imply the existence of an edge from B to A. We visualize the graph in Fig. 2.
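To make the construction concrete, the sketch below back-projects a depth map with Eq. (4) and builds the directed KNN graph. It assumes NumPy/SciPy and is only an illustration, not the implementation used in our experiments; handling of missing depth values is omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_3d_knn_graph(depth, fx, fy, cx, cy, K=64):
    """depth: (H, W) depth map in metres; fx, fy, cx, cy: camera intrinsics.
    Returns the 3D points (H*W, 3) and, for each node, the indices of its K
    nearest neighbors in 3D (H*W, K); each (node, neighbor) pair is a directed edge."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinates
    x = (u - cx) * depth / fx                        # back-projection, Eq. (4)
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    tree = cKDTree(points)
    _, idx = tree.query(points, k=K + 1)             # K+1: the closest point is the node itself
    neighbors = idx[:, 1:]                           # drop self, keep K directed edges per node
    return points, neighbors
```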
4.2. Propagation Model

After constructing the graph, we use a CNN as the unary model to compute the features for each pixel. We provide details of this unary CNN in Section 5. The feature vector of each pixel initializes the hidden state h_v^0 of the corresponding node, and the hidden states are then updated as

m_v^t = (1/|Ω_v|) ∑_{u∈Ω_v} g(h_u^t),
h_v^{t+1} = F(h_v^t, m_v^t),        (5)

where g is a multi-layer perceptron (MLP). Unless otherwise specified, all instances of MLP that we employ have one layer with ReLU [19] as the nonlinearity. At each time step, every node collects messages from its neighbors. The message is computed by first feeding the hidden states to the MLP g and then taking the average over the neighborhood. Then every node updates its hidden state based on its previous state and the aggregated message. This process is shown in Fig. 2. We consider two choices of the update function F.

Vanilla RNN Update. We can use an MLP as the update function:

h_v^{t+1} = q([h_v^t, m_v^t]),        (6)

where we concatenate the hidden state and the message before feeding them to the MLP q. This type of update function is common in vanilla RNNs.

LSTM Update. Another choice is to use a long short-term memory (LSTM) [15] cell. This is more powerful since it maintains its own memory, which helps extract useful information from incoming messages.
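A minimal sketch of one propagation step (Eqs. (5) and (6)) is given below; it assumes PyTorch and a dense (N, K) neighbor-index tensor from the KNN graph, and is illustrative rather than our exact implementation. Replacing q with an LSTM cell that takes the message as input and h_v as its state gives the LSTM update.

```python
import torch
import torch.nn as nn

class PropagationStep(nn.Module):
    """One step of Eq. (5) with the vanilla RNN update of Eq. (6)."""
    def __init__(self, dim):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())      # message MLP g
        self.q = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())  # update MLP q

    def forward(self, h, neighbors):
        # h: (N, dim) hidden states; neighbors: (N, K) long tensor of neighbor indices
        msg = self.g(h)[neighbors].mean(dim=1)       # average g(h_u) over the neighborhood
        return self.q(torch.cat([h, msg], dim=1))    # update on the concatenated [h_v, m_v]
```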
4.3. Prediction Model

Assuming the propagation model in Eq. (5) is unrolled for T steps, we now predict the semantic label for each pixel in the score map. For node v corresponding to a pixel in the score map, we predict the probability over semantic classes y_v as follows:

p_{y_v} = s([h_v^T, h_v^0]),        (7)

where s is an MLP with a softmax layer shared by all nodes. Note that we concatenate the initial hidden state, which is the output of the unary CNN, to capture the 2D appearance information. We finally associate a softmax cross-entropy loss function with each node and train the model with the back-propagation through time (BPTT) algorithm.
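The sketch below (again assuming PyTorch; the function names are illustrative) shows the classifier of Eq. (7) and the per-node cross-entropy training signal; back-propagating through the unrolled steps yields BPTT.

```python
import torch
import torch.nn as nn

class NodeClassifier(nn.Module):
    """Shared classifier s of Eq. (7) applied to the concatenation [h_v^T, h_v^0]."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.s = nn.Linear(2 * dim, num_classes)

    def forward(self, h_T, h_0):
        return self.s(torch.cat([h_T, h_0], dim=1))  # per-node class logits

def segmentation_loss(step, classifier, h0, neighbors, labels, T):
    """Unroll the propagation model for T steps, then apply softmax cross-entropy per node."""
    h = h0
    for _ in range(T):
        h = step(h, neighbors)                       # propagation model, Eq. (5)
    logits = classifier(h, h0)                       # prediction model, Eq. (7)
    return nn.functional.cross_entropy(logits, labels)
```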
5. Experiments

Datasets. We evaluate our method on two popular RGBD datasets: NYUD2 [34] and SUN-RGBD [35]. NYUD2 contains a total of 1,449 RGBD image pairs from 464 different scenes. The dataset is divided into 795 images from 249 scenes for training and 654 images from 215 scenes for testing. We randomly split 49 scenes from the training set as the validation set, which contains 167 images. The remaining 628 images from 200 scenes are used as the training set. SUN-RGBD consists of 10,335 images, which are divided into 5,285 RGBD image pairs for training and 5,050 for testing. All our hyperparameter search and ablation studies are performed on the NYUD2 validation set.

Unary CNN. For most of the ablation experiments, we use a modified VGG-16 network, i.e., Deeplab-LargeFOV [3] with dilated convolutions, as our unary CNN to extract appearance features from the 2D images. We use the fc7 feature map. The output feature map is of size H × W × C, where H, W and C are the height, width and channel size, respectively. Note that due to the stride and pooling of this network, H and W are 1/8 of the original input size. Therefore, our 3D graph is built on top of the downsampled feature maps.

To further incorporate contextual information, we use global pooling [27] to compute another C-dimensional vector from the feature map. We then append this vector to all spatial positions, which results in an H × W × 2C feature map. In our experiments, C = 1024 and a 1 × 1 convolution layer is used to further reduce the dimension to 512. We also experimented with replacing the VGG network with ResNet-101 [14] or combining it with the HHA encoding.
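A sketch of this context step (PyTorch is an assumption here): global average pooling yields a C-dimensional vector that is appended to every spatial position, followed by a 1 × 1 convolution reducing the 2C channels to 512.

```python
import torch
import torch.nn as nn

def append_global_context(feat, reduce_conv):
    # feat: (B, C, H, W) unary CNN feature map
    B, C, H, W = feat.shape
    ctx = feat.mean(dim=(2, 3), keepdim=True)   # global pooling -> (B, C, 1, 1)
    ctx = ctx.expand(B, C, H, W)                # copy the context vector to every position
    fused = torch.cat([feat, ctx], dim=1)       # (B, 2C, H, W)
    return reduce_conv(fused)                   # 1x1 conv: 2C -> 512 channels

reduce_conv = nn.Conv2d(2 * 1024, 512, kernel_size=1)  # C = 1024 in our experiments
```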
Implementation Details. We initialize the unary CNN from the pre-trained VGG network of [3]. We use SGD with momentum to optimize the network and clip the norm of the gradients such that it is not larger than 10. The initial learning rates of the pre-trained unary CNN and the GNN are 0.001 and 0.01, respectively. Momentum is set to 0.9. We initialize the RNN and LSTM update functions of the graph neural network with the MSRA method [13]. We randomly scale the images within the range [0.5, 2] and randomly crop 425×425 patches. For multi-scale testing, we use three scales: 0.8, 1.0 and 1.2. In the ResNet-101 experiment, we modified the network by reducing the overall stride to 8 and by adding dilated convolutions to enlarge the receptive field.
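The optimization setup can be summarized by the following sketch; PyTorch and the module names unary_cnn and gnn are placeholders for illustration only, not our released code.

```python
import torch
import torch.nn as nn

unary_cnn = nn.Conv2d(3, 64, 3, padding=1)  # placeholder for the pre-trained VGG unary CNN
gnn = nn.Linear(64, 64)                     # placeholder for the graph neural network

optimizer = torch.optim.SGD(
    [{"params": unary_cnn.parameters(), "lr": 0.001},   # pre-trained unary CNN
     {"params": gnn.parameters(), "lr": 0.01}],         # graph neural network
    lr=0.001, momentum=0.9)

def training_step(loss):
    optimizer.zero_grad()
    loss.backward()
    params = list(unary_cnn.parameters()) + list(gnn.parameters())
    nn.utils.clip_grad_norm_(params, max_norm=10)        # clip gradient norm at 10
    optimizer.step()
```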
We adopt two common metrics to evaluate our method: mean accuracy and mean intersection-over-union (IoU).
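For clarity, both metrics can be computed from a class confusion matrix as in the sketch below (NumPy assumed; the exact evaluation protocol follows the standard benchmarks).

```python
import numpy as np

def mean_acc_and_iou(conf):
    """conf: (C, C) confusion matrix with conf[i, j] = #pixels of class i predicted as j."""
    tp = np.diag(conf).astype(float)
    gt = conf.sum(axis=1)                  # pixels per ground-truth class
    pred = conf.sum(axis=0)                # pixels per predicted class
    valid = gt > 0                         # ignore classes absent from the ground truth
    mean_acc = (tp[valid] / gt[valid]).mean()
    iou = tp[valid] / (gt[valid] + pred[valid] - tp[valid])
    return mean_acc, iou.mean()
```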
5.1. Comparison with State-of-the-art

In our comparison with other methods, we use the vanilla RNN update function in all our experiments due to its efficiency and good performance. We defer the thorough ablation study to Section 5.2.

NYUD2 dataset. We first compare with other methods in the NYUD2 40-class and 37-class settings. As shown in Tables 1 and 2, our model achieves very good performance in both settings. Note that Long et al. [29] and Eigen et al. [7] both used two VGG networks with HHA image/depth encoding, whereas we only use one VGG network to extract appearance features. The configuration of Lin et al. [26] is a bit different since it only takes the color image as input and builds a complicated model that involves several VGG networks to extract image features.

Results in these tables also reveal that by combining VGG with the HHA encoding features [29] as the unary model, our method further improves in performance.

model                        mean IoU%   mean acc%
Gupta et al. [11] (2014)     28.6        35.1
Long et al. [29] (2015)      34.0        46.1
Eigen et al. [7] (2015)      34.1        45.1
Lin et al. [26] + ms (2016)  40.6        53.6
HHA + ss                     40.8        54.6
3DGNN + ss                   39.9        54.0
3DGNN + ms                   41.7        55.4
HHA-3DGNN + ss               42.0        55.2
HHA-3DGNN + ms               43.1        55.7

Table 1. Comparison with state-of-the-art methods on the NYUD2 test set in the 40-class setting. "HHA" means combining the HHA feature [29]. "ss" and "ms" indicate single- and multi-scale testing.

model                         mean IoU%   mean acc%
Silberman et al. [34] (2012)  -           17.5
Ren et al. [32] (2012)        -           20.2
Gupta et al. [10] (2015)      -           30.2
Wang et al. [40] (2015)       -           29.2
Khan et al. [17] (2016)       -           43.9
Li et al. [21] (2016)         -           49.4
3DGNN + ss                    43.6        57.1
3DGNN + ms                    45.4        59.5

Table 2. Comparison with state-of-the-art methods on the NYUD2 test set in the 37-class setting. "ss" and "ms" indicate single- and multi-scale testing.

SUN-RGBD dataset. We also compare these methods on SUN-RGBD in Table 3. The performance difference is significant. Note that Li et al. [21] also adopted the Deeplab-LargeFOV network for extracting image features and a separate network for HHA-encoded depth feature extraction. Our single 3DGNN model already outperforms previous ones by a large margin. Combining HHA features or replacing the VGG network with ResNet-101 further boosts the performance. These gains showcase that our method is effective in encoding 3D geometric context.

model                       mean IoU%   mean acc%
Song et al. [35] (2015)     -           36.3
Kendall et al. [16] (2015)  -           45.9
Li et al. [21] (2016)       -           48.1
HHA + ss                    41.7        52.3
ResNet-101 + ss             42.7        53.5
3DGNN + ss                  40.2        52.5
3DGNN + ms                  42.3        54.6
HHA-3DGNN + ss              42.0        55.2
HHA-3DGNN + ms              43.1        55.7
ResNet-101-3DGNN + ss       44.1        55.7
ResNet-101-3DGNN + ms       45.9        57.0

Table 3. Comparison with other methods on the SUN-RGBD test set. "ResNet-101" exploits ResNet-101 as the unary model. "HHA" denotes a combination of the RGB image feature with the HHA image feature [29]. "ss" and "ms" indicate single-scale and multi-scale testing.

5.2. Ablation Study

In this section, we conduct an ablation study on the NYUD2 validation set to verify the functionality of different parts of our model.

Propagation Steps. We first calculate statistics of the constructed 3D graphs. The average diameter of all graphs is 21; it corresponds to the average number of propagation steps needed to traverse a graph. The average distance between any pair of nodes is 7.9. We investigate how the number of propagation steps affects the performance of our model in Fig. 3. The performance, i.e., mean IoU, gradually improves as the number of propagation steps increases. The oscillation when the number of propagation steps is large might relate to the optimization process. We found that 3 to 6 propagation steps produce reasonably good results. We also show segmentation maps obtained with different numbers of propagation steps in Fig. 4. Limited by the receptive field size, the unary CNN often yields wrong predictions when the objects are too large. For example, in the first row of Fig. 4, the table is confused with the counter. With 4 propagation steps, our prediction of the table becomes much more accurate.

Figure 4. Influence of different propagation steps on the NYUD2 validation set. (a) Original image; (b) ground truth; (c) unary CNN; (d) propagation step 1; (e) propagation step 4.

Update Equation. Here we compare the two update equations described in Section 4.2. As shown in Fig. 3, the vanilla RNN performs similarly to the LSTM, while the computational complexity of the LSTM update is much larger than that of the vanilla RNN. Based on this finding, we stick to the vanilla RNN update in all our experiments.

2D vs. 3D Graph. To investigate how much improvement the 3D graph additionally brings, we compare with 2D graphs that are built on 2D pixel positions with the same KNN method. We conduct experiments using the same graph neural network and show the performance with different propagation steps in Table 4. Results on the whole test set are shown in Table 5. They indicate that with 3DGNN, more 3D geometric context is captured, which in turn makes prediction more accurate. Another interesting observation is that even the simple 2DGNN still outperforms the unary CNN.

Propagation Step   Unary CNN   2DGNN   3DGNN
0                  37.9        -       -
1                  -           37.8    38.1
3                  -           38.4    39.3
4                  -           38.0    39.4
6                  -           38.1    39.0

Table 4. 2D vs. 3D graph: performance (mean IoU%) with different propagation steps on the NYUD2 validation set.

Dataset     network     mean IoU%   mean acc%
NYUD2-40    Unary CNN   37.1        51.0
            2DGNN       38.7        52.9
            3DGNN       39.9        54.0
NYUD2-37    Unary CNN   41.7        55.0
            3DGNN       43.6        57.0
SUN-RGBD    Unary CNN   38.5        49.4
            2DGNN       38.9        50.3
            3DGNN       40.2        52.5

Table 5. Comparison with the unary CNN on the NYUD2 and SUN-RGBD test sets.

Performance Analysis. We now compare our 3DGNN to the unary CNN in order to investigate how the GNN can be enhanced by leveraging 3D geometric information. The results based on single-scale input are listed in Table 5. Our 3DGNN model outperforms the unary and 2DGNN models, which again supports the fact that 3D context is important in semantic segmentation.

We further break down the improvement in performance for each semantic class in Fig. 5. The statistics show that our 3DGNN outperforms the unary CNN by a large margin for classes like cabinet, bed, dresser, and refrigerator. This is likely because these objects are easily misclassified as their surroundings in the 2D image. However, in 3D space, they typically have rigid shapes and the depth distribution
is more consistent, which makes the classification task relatively easier to tackle.

Figure 5. Improvement of IoU over the unary CNN for each semantic class.

To better understand what contributes to the improvement, we analyze how the performance gain varies with different sizes of objects. In particular, for each semantic class, we first divide the ground-truth segmentation maps into a set of connected components, where each component is regarded as one instance of an object of that class. We then count the sizes of the object instances for all classes. The range of object sizes covers up to 10,200 different values in terms of the number of pixels.
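This size statistic can be computed as in the following sketch (NumPy/SciPy assumed), which splits each class mask into connected components and records the instance sizes in pixels.

```python
import numpy as np
from scipy import ndimage

def instance_sizes(gt, num_classes):
    """gt: (H, W) integer ground-truth label map.
    Returns a dict mapping class id -> list of instance sizes in pixels."""
    sizes = {}
    for c in range(num_classes):
        mask = (gt == c)
        if not mask.any():
            continue
        labeled, n = ndimage.label(mask)            # connected components = object instances
        counts = np.bincount(labeled.ravel())[1:]   # pixel count per instance (skip background 0)
        sizes[c] = counts.tolist()
    return sizes
```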
References

[1] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In NIPS, pages 3189–3197, 2016.
[2] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. ICLR, 2014.
[3] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs. ICLR, 2015.
[4] J. Dai, K. He, and J. Sun. Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In ICCV, 2015.
[5] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
[6] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In NIPS, 2015.
[7] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In ICCV, 2015.
[8] N. Friedman. Inferring cellular networks using probabilistic graphical models. Science, 2004.
[9] M. Gori, G. Monfardini, and F. Scarselli. A new model for learning in graph domains. In IJCNN, 2005.
[10] S. Gupta, P. Arbeláez, R. Girshick, and J. Malik. Indoor scene understanding with rgb-d images: Bottom-up segmentation, object detection and semantic segmentation. IJCV, 2015.
[11] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik. Learning rich features from rgb-d images for object detection and segmentation. In ECCV, 2014.
[12] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
[13] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[15] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 1997.
[16] A. Kendall, V. Badrinarayanan, and R. Cipolla. Bayesian segnet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv, 2015.
[17] S. H. Khan, M. Bennamoun, F. Sohel, R. Togneri, and I. Naseem. Integrating geometrical context for semantic labeling of indoor scenes using rgbd images. IJCV, 2016.
[18] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[20] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. ICLR, 2016.
[21] Z. Li, Y. Gan, X. Liang, Y. Yu, H. Cheng, and L. Lin. Lstm-cf: Unifying context modeling and fusion with lstms for rgb-d scene labeling. In ECCV, 2016.
[22] X. Liang, X. Shen, J. Feng, L. Lin, and S. Yan. Semantic object parsing with graph lstm. In ECCV, 2016.
[23] X. Liang, X. Shen, D. Xiang, J. Feng, L. Lin, and S. Yan. Semantic object parsing with local-global long short-term memory. In CVPR, 2016.
[24] D. Lin, S. Fidler, and R. Urtasun. Holistic scene understanding for 3d object detection with rgbd cameras. In ICCV, 2013.
[25] G. Lin, A. Milan, C. Shen, and I. Reid. Refinenet: Multi-path refinement networks with identity mappings for high-resolution semantic segmentation. arXiv, 2016.
[26] G. Lin, C. Shen, A. van den Hengel, and I. Reid. Efficient piecewise training of deep structured models for semantic segmentation. In CVPR, 2016.
[27] W. Liu, A. Rabinovich, and A. C. Berg. Parsenet: Looking wider to see better. ICLR Workshop, 2016.
[28] Z. Liu, X. Li, P. Luo, C.-C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.
[29] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[30] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In ICCV Workshops, pages 37–45, 2015.
[31] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[32] X. Ren, L. Bo, and D. Fox. Rgb-(d) scene labeling: Features and algorithms. In CVPR, 2012.
[33] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE TNN, 2009.
[34] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from rgbd images. In ECCV, 2012.
[35] S. Song, S. P. Lichtenberg, and J. Xiao. Sun rgb-d: A rgb-d scene understanding benchmark suite. In CVPR, 2015.
[36] S. Song and J. Xiao. Deep sliding shapes for amodal 3d object detection in rgb-d images. In CVPR, 2016.
[37] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. arXiv, 2016.
[38] I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. In ICML, 2011.
[39] K. S. Tai, R. Socher, and C. D. Manning. Improved semantic representations from tree-structured long short-term memory networks. ACL, 2015.
[40] A. Wang, J. Lu, J. Cai, G. Wang, and T.-J. Cham. Unsupervised joint feature learning and encoding for rgb-d scene labeling. TIP, 2015.
[41] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In CVPR, 2015.
[42] J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR, 2012.
[43] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv, 2015.
[44] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. arXiv, 2016.
[45] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.
[46] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Scene parsing through ade20k dataset. In CVPR, 2017.