
FAST SINGLE-VIEW 3D OBJECT RECONSTRUCTION WITH FINE DETAILS THROUGH

DILATED DOWNSAMPLE AND MULTI-PATH UPSAMPLE DEEP NEURAL NETWORK

Chia-Ho Hsu, Ching-Te Chiu, and Chia-Yu Kuan

Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan

ABSTRACT

Three-dimensional (3D) object reconstruction is among the most important research areas in the field of computer vision. Its purpose is to reconstruct the overall shape of an object from its two-dimensional (2D) image. With the development of deep learning, many methods based on convolutional neural networks (CNNs) have been applied in related research.
To achieve 3D shape reconstruction with low computation time, we focus on the commonly used setting: single-image reconstruction. The main issue with using a single image as input is that the reconstructed shape often lacks structural detail. To address this issue, we propose two methods: the dilated downsample block and the multi-path upsample block. The dilated downsample block extracts more features, and the multi-path upsample block makes better use of those features in our architecture. In addition, we concatenate the encoder and decoder at corresponding layers to preserve the image features during the reconstruction process.
Finally, we perform experiments on the dataset provided by Choy et al. Results show that our method achieves 67.7% intersection-over-union (IoU) accuracy, 3.6% higher than the state-of-the-art method VTN. Compared with the PSVH method, our result reaches 71.4%, an increase of 3.4%. Our average reconstruction time is 13 ms, approximately 25 times faster than PSVH.
Index Terms— 3D object reconstruction, 3D shape reconstruction, deep convolutional neural network, single view
1. INTRODUCTION

Three-dimensional (3D) object reconstruction is a computer vision technique that uses two-dimensional (2D) information to reconstruct a 3D shape. Its purpose is to recover the shape of an object from its 2D image, including the information that the image cannot present. Given an image of an object, a person can recognize it and imagine the parts that the image does not show. However, 3D object reconstruction remains a difficult task for computer vision.
There are two main types of 3D object reconstruction: single-view reconstruction and multi-view reconstruction. In single-view reconstruction [1] [2], at each time step one view is randomly selected from the several available views of the same object to reconstruct its corresponding 3D shape. By contrast, multi-view reconstruction [3] [4] uses more than one of these views and reconstructs the 3D shape by integrating the features of the different views. With the same deep network architecture, the reconstruction quality of the latter is better than that of the former. However, in real-world applications such as AR and VR, the efficiency of the former is greater than that of the latter. Furthermore, although the reconstruction quality of the former does not surpass the latter, the result is not far from the expected shape.
With the emergence of large-scale datasets like ShapeNet [5], many learning-based works have been reported. For example, Choy et al. [3] proposed a unified recurrent convolutional neural network that can take a single-view image or multiple-view images as input. With their method, the reconstruction quality improves when the network sees more images of the object. However, CNN-based methods that reconstruct the object shape in 3D space usually miss some shape details, as the variations in object shape can be very large, even within the same object category.
Therefore, Wang et al. [2] use the probabilistic single-view visual hull (PSVH) to map the 2D silhouette of the object and a pose estimate into 3D space. With PSVH, they successfully address the problem of missing shape detail. However, they need additional silhouette and camera-pose ground truth in their training stage.
Our work aims to use a single-view image to reconstruct a detailed 3D shape in a voxel grid without additional information. It is not surprising that taking only one view tends to produce a coarse shape, so we enhance our model to increase its reconstruction ability. In this regard, we propose a powerful feature extractor called the dilated downsample block, which integrates the residual block and dilated convolution [6]. We then propose a multi-path upsample block that extends the paths of the residual block to increase the usage of the features, and we concatenate the encoder and decoder. Moreover, without additional input, we can also improve the reconstruction speed.

2. RELATED WORK

2.1. Traditional Method

Traditional methods [7] [8] use geometry priors for this difficult task. For example, Kar et al. [8] reconstruct a 3D shape template from images of objects in the same category as a shape prior. Given an input image, they first estimate the silhouette and viewpoint from the image, and then reconstruct the 3D object by fitting the shape template. However, these methods often rely on the database to achieve a precise shape; if the database does not cover the target shape, the reconstruction quality will be low.

2.2. Learning-Based Method

With the appearance of large-scale shape repositories like ShapeNet [5], many data-driven methods have been proposed, especially ones using CNNs [4] [9] [10] [11]. For example, Choy et al. [3] propose the 3D recurrent reconstruction neural network (3D-R2N2), a unified network for single-view or multi-view 3D object reconstruction. Because this network has a memory property that can memorize the state of previous inputs, its reconstruction quality gets increasingly better as it sees more views of the object. However, this model has worse reconstruction quality when it only sees one view of the object; it needs multi-view images to show its advantage.

Fig. 1. Overall architecture of the proposed method. In the training stage, we randomly choose one view of each object in the training batch as input, so every view of the same object is trained at a different time. The constant-size block (CB) is similar to the DSDB.

The most related work to ours is the voxel tube network (VTN) from Richter et al. [1]. It uses a single image and the 3D shape ground truth for 3D shape reconstruction without any additional ground truth such as camera pose or object silhouette. Their network architecture is memory-friendly because it mainly applies 2D convolution modules in its decoder, called the voxel tube decoder. Besides, the computation time of 2D convolution is less than that of 3D convolution. However, because the object shape lives in 3D, 2D convolution is not powerful enough to reconstruct the 3D shape. Thus, we still use 3D convolution modules in the decoder and integrate our proposed methods to address the single-view reconstruction problem.

3. PROPOSED METHOD

In our proposed method, there are four modules: the dilated downsample block, the multi-path upsample block, concatenation, and the integrated loss function. Fig. 1 shows our detailed architecture.
3.1. 2D Image Encoder

The encoder focuses on extracting image features and downsampling the input. Therefore, the first method that comes to mind is the residual block, because it has been proven to have this ability. In the following sections, we describe three blocks: the sparse step downsample block (SSDB), the dense step downsample block (DSDB), and our proposed dilated downsample block (D-DB). All blocks are based on the residual block.
3.1.1. Sparse Step Downsample Block & Dense Step Downsample Block

Because the input must be downsampled in the encoder, the first convolutional layer of the block sets stride = 2 for this purpose. However, with this setting some information in the input feature maps of the previous layer may not contribute to the next layer, because the strided convolution skips elements of the input feature maps.
If we instead set stride = 1, which gives the dense step downsample block, we avoid this problem. However, this block alone cannot downsample the input feature maps, so we add a max pooling layer immediately after each dense step downsample block to reduce the size of the input feature map.
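For concreteness, a minimal PyTorch sketch of the two baseline blocks is given below. The exact layer counts, channel widths, and normalization choices are not spelled out in the text, so they are illustrative assumptions; only the stride-2 convolution in the SSDB and the stride-1 convolution followed by max pooling in the DSDB are taken from the description above.

```python
import torch
import torch.nn as nn

class SparseStepDownsample(nn.Module):
    """Residual-style block that downsamples with a stride-2 convolution (SSDB).
    The strided first layer skips elements of the input feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

class DenseStepDownsample(nn.Module):
    """Residual-style block that keeps stride = 1 and downsamples afterwards
    with max pooling (DSDB), so no input element is skipped by the convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=2)  # downsampling happens here

    def forward(self, x):
        return self.pool(self.relu(self.body(x) + self.shortcut(x)))
```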
3.1.2. Dilated Downsample Block

Recall that the encoder is mainly responsible for extracting image features, so we need to design a powerful extractor that captures more features. For this purpose, we propose the dilated downsample block, shown in Fig. 2.

Fig. 2. Detail structure of the dilated downsample block.

The reason this downsample block is more powerful than the two blocks described previously is that convolutions with different kernel sizes have different receptive fields, and different receptive fields let us extract different features. Dilated convolution is a good candidate for extracting such features because it can obtain a large receptive field with a small kernel, without increasing the number of network parameters. Therefore, we do not use convolutions with large kernels in our proposed dilated downsample block.
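Since the exact topology of the dilated downsample block is given only in Fig. 2, the sketch below should be read as an assumption-laden illustration of the idea: several parallel 3x3 convolutions with different dilation rates (and thus different receptive fields) downsample the input, their outputs are fused, and a strided shortcut is added. The branch count, dilation rates, and fusion layer are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DilatedDownsampleBlock(nn.Module):
    """Sketch of a residual downsample block whose branches use 3x3 convolutions
    with different dilation rates, so each branch covers a different receptive
    field without resorting to large kernels. Branch count, dilation rates and
    channel sizes are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                          padding=d, dilation=d),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for d in dilations])
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # All branches produce the same spatial size, so they can be concatenated.
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return self.relu(self.fuse(multi_scale) + self.shortcut(x))
```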

3.2. 3D Shape Decoder

The decoder is mainly responsible for upsampling the input features from the latent space and reconstructing the 3D object shape in a voxel representation from these features. To reconstruct an object shape with detailed structure, the decoder must make full use of the input features. Therefore, the upsample block used in the decoder is similar to the residual block. We describe the upsample block and our proposed multi-path upsample block in the following sections.

3.2.1. Upsample Block

Because we want to reconstruct the 3D object at a specific resolution, the main difference between the upsample block and the residual block is that the first layer of the upsample block is changed from a convolution to a deconvolution, since a convolution does not upsample the input features. In addition, a general upsampling operation, such as nearest-neighbor or bilinear interpolation, is added to the first layer of the shortcut path of the upsample block.
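A minimal PyTorch sketch of such an upsample block is shown below, assuming a 3D decoder with a stride-2 transposed convolution on the main path and nearest-neighbor interpolation on the shortcut path; the channel sizes and the 1x1x1 shortcut convolution are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleBlock(nn.Module):
    """Residual-style 3D upsample block: the first layer of the main path is a
    transposed convolution (deconvolution) that doubles the resolution, and the
    shortcut path is upsampled by interpolation so the two paths can be added."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose3d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch))
        self.shortcut = nn.Conv3d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Shortcut: nearest-neighbor interpolation followed by a 1x1x1 convolution.
        skip = self.shortcut(F.interpolate(x, scale_factor=2, mode='nearest'))
        return self.relu(self.body(x) + skip)
```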
3.2.2. Multi-path Upsample Block

Although the upsample block can reconstruct the object shape at the specified resolution, it is not powerful enough. To make greater use of the input features, we expand the upsample block into several paths, and the first layer of each path uses a deconvolution with a different kernel size. Fig. 3 shows the detailed structure of the multi-path upsample block. The motivation is similar to that of the dilated downsample block described above: the input features can be used as much as possible.

Fig. 3. Detail structure of our multi-path upsample block.
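The sketch below illustrates one plausible reading of the multi-path upsample block: each path starts with a stride-2 transposed convolution of a different kernel size (all producing the same output resolution), the paths are fused, and an interpolated shortcut is added. The number of paths, the kernel sizes, and the fusion layer are assumptions, since the exact layout is given only in Fig. 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPathUpsampleBlock(nn.Module):
    """Sketch of the multi-path upsample block: parallel deconvolution paths with
    different kernel sizes, a 1x1x1 fusion layer, and an interpolated shortcut."""
    def __init__(self, in_ch, out_ch, kernels=((2, 0), (4, 1), (6, 2))):
        super().__init__()
        # Each (kernel, padding) pair doubles the spatial resolution at stride 2.
        self.paths = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose3d(in_ch, out_ch, kernel_size=k, stride=2, padding=p),
                nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
                nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm3d(out_ch))
            for k, p in kernels])
        self.fuse = nn.Conv3d(out_ch * len(kernels), out_ch, kernel_size=1)
        self.shortcut = nn.Conv3d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        paths = torch.cat([p(x) for p in self.paths], dim=1)
        skip = self.shortcut(F.interpolate(x, scale_factor=2, mode='nearest'))
        return self.relu(self.fuse(paths) + skip)
```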

3.3. Concatenation

In our architecture, the image features extracted by the encoder are passed to the latent space through a fully connected layer. Therefore, when we reconstruct the object shape in the decoder, the object features from the encoder have vanished. Thus, to preserve the image features, we must pass them from the encoder to the decoder.
Inspired by [12], concatenation is the proper technique for preserving image features. However, because our encoder and decoder operate in different dimensions, we must transform the 2D image feature maps into a 3D feature volume. The transformation stacks several feature maps so that they form a 3D feature volume.
In this way, we can concatenate the transformed feature volumes with the corresponding decoder layers of the same 3D volume size, as Fig. 4 shows.

Fig. 4. Detail structure of the concatenation module in our decoder. "T" is the transformation and "C" is the concatenation.
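A small sketch of this transformation is given below: the 2D feature maps are reshaped so that part of the channel dimension becomes the depth axis of a 3D volume, which can then be concatenated with the decoder features. The channel grouping and the tensor shapes are illustrative assumptions.

```python
import torch

def feature_maps_to_volume(feat_2d: torch.Tensor, depth: int) -> torch.Tensor:
    """Stack 2D feature maps along a new depth axis so they form a 3D feature
    volume. feat_2d has shape (B, C, H, W); the C channels are regrouped so the
    result has shape (B, C // depth, depth, H, W). The exact grouping used in
    the paper is not spelled out, so this reshaping is an assumption."""
    b, c, h, w = feat_2d.shape
    assert c % depth == 0, "channel count must be divisible by the target depth"
    return feat_2d.view(b, c // depth, depth, h, w)

# Usage inside the decoder (shapes are illustrative):
enc_feat = torch.randn(2, 256, 16, 16)        # 2D encoder feature maps
dec_feat = torch.randn(2, 16, 16, 16, 16)     # 3D decoder features, same resolution
vol = feature_maps_to_volume(enc_feat, depth=16)   # (2, 16, 16, 16, 16)
fused = torch.cat([dec_feat, vol], dim=1)          # concatenate on the channel axis
```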
3.4. Objective Function

We first define some notation for convenience. Because the predicted shape is 3D, let V* = [v_1*, v_2*, ..., v_N*] denote the probabilistic volume obtained by applying a sigmoid to the output volume of our network, where N is the number of voxels in V* and 0 < v_i* < 1 for all i. V = [v_1, v_2, ..., v_N] represents the ground-truth volume, and each voxel of the ground-truth volume is 0 or 1.

3.4.1. Intersection-over-Union (IoU) Loss

Inspired by Richter et al. [1], the IoU divides the size of the intersection by the size of the union. For the segmentation of a 3D object, correct foreground predictions are effectively weighted by the sizes of the ground-truth and predicted shapes, which is beneficial for segmenting the object shape from the background.
To use the IoU for training, the IoU loss is defined as follows:

L_{IoU}(V^*, V) = 1 - \frac{\sum_i v_i^* v_i}{\sum_i [v_i^* + v_i - v_i^* v_i]},    (1)

where V* is the predicted probabilistic volume, V is the ground-truth volume, and i runs over all voxels in both V* and V.
in decoder as Fig. 4 shows.

3.4.2. Mean Squared False Cross Entropy Loss

In general, 3D object reconstruction in a voxel grid is cast as binary classification, so minimizing a binary cross-entropy loss is the main objective. Many works [3] [2] use the standard binary cross-entropy function, which weights false-positive and false-negative results equally.
However, in the context of 3D object reconstruction, the shape volume is sparse: there is a severe imbalance between occupied and unoccupied voxels. If this loss function is used, the loss is therefore unbalanced, and the network easily produces false-positive estimations.
Consequently, inspired by [13], we leverage the mean squared false cross entropy loss (MSFCEL) proposed by Sun et al. [13] to handle this imbalance, expressed as

MSFCEL(V^*, V) = FPCE^2 + FNCE^2,    (2)

where FPCE is the false-positive cross entropy on the unoccupied voxels of the ground-truth shape volume and FNCE is the false-negative cross entropy on the occupied voxels:

FPCE = -\frac{1}{N_1} \sum_{n=1}^{N_1} [v_n \log v_n^* + (1 - v_n) \log(1 - v_n^*)],    (3)

FNCE = -\frac{1}{N_2} \sum_{p=1}^{N_2} [v_p \log v_p^* + (1 - v_p) \log(1 - v_p^*)],    (4)

where N_1 is the number of unoccupied voxels of V and N_2 is the number of occupied voxels, v_n is the n-th unoccupied voxel, v_p is the p-th occupied voxel, and v_n^* and v_p^* are the predicted values for v_n and v_p, respectively.
Thus, the losses on occupied and unoccupied voxels are minimized together and balanced. Finally, we integrate the two loss functions into our final loss function L_Final, expressed as

L_{Final}(V^*, V) = MSFCEL(V^*, V) + L_{IoU}(V^*, V).    (5)
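A compact PyTorch sketch of Eqs. (2)-(5) is given below, reusing the iou_loss function from the earlier sketch; the clamping of the predictions before the logarithm is a numerical-stability detail not present in the equations.

```python
import torch

def msfcel(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Mean squared false cross entropy loss of Eqs. (2)-(4): the cross entropy is
    averaged separately over unoccupied voxels (FPCE) and occupied voxels (FNCE),
    and the two means are squared and summed so both classes are balanced."""
    pred = pred.clamp(eps, 1.0 - eps)
    ce = -(target * torch.log(pred) + (1.0 - target) * torch.log(1.0 - pred))
    fpce = ce[target == 0].mean()   # false-positive term over unoccupied voxels
    fnce = ce[target == 1].mean()   # false-negative term over occupied voxels
    return fpce ** 2 + fnce ** 2

def final_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Integrated objective of Eq. (5), combining MSFCEL and the IoU loss."""
    return msfcel(pred, target) + iou_loss(pred, target)
```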

Table 1. Comparison with other works. Note: * is recalculated by the authors of AtlasNet; ** and *** are tested in our own experiment environment.

Method | 2D Image Encoder-Decoder Arch. | ShapeNet dataset | Mean IoU (%) | Mean CD | Memory (MB) | Time per image
3D-R2N2 [3] | 3D RNN | 13 categories | 56.0% | - | 205.8 MB | -
OGN [14] | Octree Decoder | 13 categories | 59.6% (+3.6%) | - | 296.9 MB | 16 ms on Titan X
PSGN [15] | Point Cloud Decoder | 13 categories | 64.0% (+8.0%) | 6.41* | 148.4 MB | 130 ms on laptop CPU
AtlasNet [9] | Mesh Decoder | 13 categories | - | 5.11 (-1.3) | 488.4 MB | -
VTN [1] | Voxel Tube Decoder | 13 categories | 64.1% (+8.1%) | - | 126.6 MB | 8 ms** on GTX 1080 Ti
PSVH [2] | 3D Shape Decoder | 4 categories | 68.0% | - | 226.9 MB | 18 ms on Tesla M40; 323 ms*** on GTX 1080 Ti
D-DB-MpUB w/ L_Final | 3D Shape Decoder | 13 categories | 66.7% (+10.7%) | 5.01 (-1.40) | 112.3 MB | 11 ms on GTX 1080 Ti
C-D-DB-MpUB w/ L_Final | 3D Shape Decoder | 13 categories | 67.7% (+11.7%) | 4.83 (-1.58) | 118.1 MB | 13 ms on GTX 1080 Ti

4. EXPERIMENTAL RESULTS

4.1. Environment and Dataset

We implement our method in PyTorch. The CPU is an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.1 GHz, the main memory is 32 GB of DDR4 RAM, and the GPU is an NVIDIA GeForce GTX 1080 Ti.
In this work, we mainly use the dataset provided by [3], which renders objects from the ShapeNet dataset [5]. The dataset has 13 categories and consists of nearly 50K 3D objects, and each object has 24 images from different views. We use the same training and testing split provided by Choy et al. [3]. The object resolution of this dataset is 32 × 32 × 32 in voxel representation, and we train one network for all 13 categories.
We train our model with batch size 64 for 210 epochs. The initial learning rate is 10^-3, the learning rate is decayed by 0.5 every 30 epochs, and the optimizer is Adam.
We evaluate our reconstruction results with two metrics: intersection over union (IoU) and chamfer distance (CD). A higher IoU score means a better reconstruction result; a lower CD value means a better reconstruction result.
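The training setup described in this subsection can be written down as the following PyTorch sketch. ReconstructionNet and train_loader are hypothetical placeholders for the network and data pipeline, and final_loss is the loss sketch from Sec. 3.4; only the batch size, epoch count, learning-rate schedule, optimizer, and loss follow the text and Eq. (5).

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

# Hypothetical model and dataloader names; hyperparameters follow Sec. 4.1.
model = ReconstructionNet().cuda()
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = StepLR(optimizer, step_size=30, gamma=0.5)   # halve the LR every 30 epochs

for epoch in range(210):
    for images, voxels in train_loader:                  # batch size 64
        pred = torch.sigmoid(model(images.cuda()))       # probabilistic volume V*
        loss = final_loss(pred, voxels.cuda())           # Eq. (5)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```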
4.2. Results

In terms of the architecture, we propose two modules and leverage the concatenation technique. We then integrate our proposed modules with our loss L_Final. Table 2 shows the influence of each module.

Table 2. Summary of all modules used and comparison of the influence of each module.

Method | D-DB | MpUB | Concat. | w/ L_Final | Mean IoU (%)
SSDB-UB (Baseline) |  |  |  |  | 63.5%
D-DB-UB | √ |  |  |  | 64.4%
D-DB-MpUB | √ | √ |  |  | 64.6%
C-D-DB-MpUB | √ | √ | √ |  | 65.6%
C-D-DB-MpUB w/ L_Final | √ | √ | √ | √ | 67.7%

In this table, our baseline is the sparse step downsample block (SSDB) with the upsample block (UB). D-DB is the dilated downsample block, MpUB is the multi-path upsample block, and w/ L_Final means integrating L_IoU and the MSFCEL. We can infer that each proposed module benefits the reconstruction result. Specifically, the concatenation module gives a 1% improvement over its previous entry in Table 2, and L_Final also brings a large improvement, 2.1% higher than its previous entry.

Table 3. Reconstruction comparison with PSVH.

Method | aero | car | chair | sofa | Mean IoU (%) | Time per image
PSVH [2] | 63.1 | 83.9 | 55.2 | 69.8 | 68.0 | 323 ms
Ours | 68.0 | 85.5 | 57.6 | 74.4 | 71.4 (+3.4) | 13 ms

4.3. Comparison with Other Works

First, we compare our final result with PSVH [2], whose research purpose is the same as ours. Table 3 shows that our reconstruction result is better than PSVH, with a 3.4% improvement. For reconstruction speed, our method takes 13 ms per image, approximately 25 times faster than their 323 ms. Note that we use their pretrained model and test it in our own environment.
Finally, we compare our result with other works in several aspects. As Table 1 shows, our best method achieves a 67.7% IoU score, 3.6% higher than our reference architecture VTN, and obtains better results in both IoU and CD. Even with a slower GPU, our average reconstruction time is lower than that of most works. Note that all CD values are multiplied by 10^3. In addition, we find that our reconstruction time is lower without using concatenation. VTN has the lowest reconstruction time because it applies 2D convolution in its voxel tube decoder, whose computation time is less than that of 3D convolution.

5. CONCLUSION

We design a network for single-view 3D object reconstruction without other additional input. We focus on designing a network with a powerful ability to extract object features and to make full use of them. We propose two blocks, the dilated downsample block and the multi-path upsample block, to enhance our model. In addition, we use concatenation to preserve the encoder features during the reconstruction step and leverage the MSFCEL to handle the imbalance problem.
With all of our proposed modules, our final reconstruction result achieves state-of-the-art performance, 67.7% IoU over 13 categories, a 3.6% improvement compared with VTN [1]. In addition, over 4 categories, our result achieves 71.4% IoU, 3.4% higher than PSVH [2]. In terms of reconstruction speed, our average reconstruction time is 13 ms, 25 times faster than PSVH in our own experiment environment.
6. REFERENCES

[1] Stephan R. Richter and Stefan Roth, "Matryoshka networks: Predicting 3d geometry via nested shape layers," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[2] Hanqing Wang, Jiaolong Yang, Wei Liang, and Xin Tong, "Deep single-view 3d object reconstruction with visual hull embedding," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019.

[3] Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese, "3d-r2n2: A unified approach for single and multi-view 3d object reconstruction," in Computer Vision – ECCV 2016, Cham, 2016, pp. 628–644, Springer International Publishing.

[4] Shubham Tulsiani, Alexei A. Efros, and Jitendra Malik, "Multi-view consistency as supervisory signal for learning shape and pose prediction," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[5] Angel X. Chang, Thomas A. Funkhouser, Leonidas J. Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu, "Shapenet: An information-rich 3d model repository," CoRR, vol. abs/1512.03012, 2015.

[6] Fisher Yu and Vladlen Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.

[7] Minhyuk Sung, Vladimir G. Kim, Roland Angst, and Leonidas Guibas, "Data-driven structural priors for shape completion," ACM Trans. Graph., vol. 34, no. 6, pp. 175:1–175:11, Oct. 2015.

[8] Abhishek Kar, Shubham Tulsiani, Joao Carreira, and Jitendra Malik, "Category-specific object reconstruction from a single image," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

[9] Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry, "A papier-mâché approach to learning 3d surface generation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[10] Guandao Yang, Yin Cui, Serge Belongie, and Bharath Hariharan, "Learning single-view 3d reconstruction with limited pose supervision," in The European Conference on Computer Vision (ECCV), September 2018.

[11] Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, and William T. Freeman, "Pix3d: Dataset and methods for single-image 3d shape modeling," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[12] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," CoRR, vol. abs/1505.04597, 2015.

[13] Yongbin Sun, Ziwei Liu, Yue Wang, and Sanjay E. Sarma, "Im2avatar: Colorful 3d reconstruction from a single image," CoRR, vol. abs/1804.06375, 2018.

[14] Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox, "Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs," in The IEEE International Conference on Computer Vision (ICCV), Oct 2017.

[15] Haoqiang Fan, Hao Su, and Leonidas J. Guibas, "A point set generation network for 3d object reconstruction from a single image," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
