Single-Shot 3D Reconstruction via Nonlinear Fringe
Transformation: Supervised and Unsupervised
Learning Approaches
Andrew-Hieu Nguyen 1 and Zhaoyang Wang 2, *
1 Neuroimaging Research Branch, National Institute on Drug Abuse, National Institutes of Health,
Baltimore, MD 21224, USA; [email protected]
2 Department of Mechanical Engineering, School of Engineering, The Catholic University of America,
Washington, DC 20064, USA
* Correspondence: [email protected]
Abstract: The field of computer vision has been focusing on achieving accurate three-dimensional
(3D) object representations from a single two-dimensional (2D) image through deep artificial neural
networks. Recent advancements in 3D shape reconstruction techniques that combine structured
light and deep learning show promise in acquiring high-quality geometric information about object
surfaces. This paper introduces a new single-shot 3D shape reconstruction method that uses a
nonlinear fringe transformation approach through both supervised and unsupervised learning
networks. In this method, a deep learning network learns to convert a grayscale fringe input
into multiple phase-shifted fringe outputs with different frequencies, which act as an intermediate
result for the subsequent 3D reconstruction process using the structured-light fringe projection
profilometry technique. Experiments have been conducted to validate the practicality and robustness
of the proposed technique. The experimental results demonstrate that the unsupervised learning
approach using a deep convolutional generative adversarial network (DCGAN) is superior to the
supervised learning approach using UNet in image-to-image generation. The proposed technique’s
ability to accurately reconstruct 3D shapes of objects using only a single fringe image opens up vast
opportunities for its application across diverse real-world scenarios.
Keywords: fringe projection; deep learning; generative adversarial network; three-dimensional imaging; three-dimensional shape measurement
1. Introduction
We live in a world where big data are a dominant force, generating an enormous amount of information every second. The capture of comprehensive data from the physical world necessitates the use of a wide range of integrated sensors and system devices, among which cameras are some of the most common. Cameras have a significant impact on many areas of our lives, such as security and surveillance, healthcare and medical imaging, environmental monitoring, aerial photography, and media and entertainment using augmented reality (AR) and virtual reality (VR) [1–4]. Additionally, cameras play a crucial role in the field of 3D reconstruction, where they are used to capture detailed images and videos of physical spaces or objects, enabling the creation of 3D representations of objects from 2D images.

In the past, achieving both speed and accuracy with high performance in 3D shape reconstruction has been a challenging task due to hardware constraints and algorithm limitations. Various techniques, such as stereo vision, time of flight (ToF), and structure from motion (SfM), prioritize speed over accuracy, making them ideal for application cases that require real-time processing, such as robots, autonomous vehicles, AR/VR, scene understanding, and interactive 3D modeling [5–8]. Conversely, structured light scanning,
Figure 1. (A) Flowchart of the proposed nonlinear fringe transformation for 3D reconstruction;
(B) dual-frequency nonlinear transformation; (C) triple-frequency nonlinear transformation.
generating 3D models for real-world targets, it has the potential to make a notable impact
across numerous applications in various fields.
The remainder of the paper is organized as follows: Section 2
outlines various related methods in the field that integrate fringe projection profilometry
technique with deep learning for 3D shape reconstruction. Section 3 depicts the fringe pro-
jection method, which includes dual-frequency and triple-frequency approaches. Section 4
describes the structure of the training networks. Section 5 presents diverse quantitative
and qualitative outcomes, along with exhibitions of 3D reconstruction. The last two sec-
tions discuss potential limitations, applications, future directions, and insights into the
overall framework.
2. Related Work
The integration of fringe projection with deep learning for 3D reconstruction began
in early 2019 with the methods of transforming captured fringe patterns into a depth
map [26] or intermediate outputs [27–29]. Since then, the integration primarily falls into
two categories: direct and indirect methods, regardless of whether the input is a single shot
or multiple shots.
The direct method operates on a concept that closely resembles depth estimation or
image-to-image conversion in the field of computer vision. In this approach, a network
model is trained to transform a single 2D image into either a depth map or an equiva-
lent disparity map. Specifically, the input vector consists of fringe pattern image(s), and
the output vector comprises a map image with the same resolution, where the value of
each pixel represents geometric depth or height information. Several early methods use
autoencoder-based networks, typically UNet, for the fringe-to-depth conversion [20,30],
while other approaches use unsupervised network models to predict the depth or height
map [31,32]. Nguyen and colleagues enhanced the accuracy of 3D reconstruction by in-
troducing an autoencoder-based network and later incorporating an additional h-shaped
global guidance branch in their work [33,34]. Wang et al. proposed SwinConvUNet,
which employs a self-attention mechanism and the Swin Transformer for fringe-to-depth
conversion, aiming to extract both local and global features [35]. Similar to these works,
other researchers [36–39] presented a few end-to-end network models, such as MSUNet++,
PCTNet, DF-Dnet, and DCAHINet; they focus on depth recovery through diverse multi-
scale feature fusion modules. The team in [40] introduced LiteF2DNet, a lightweight deep
learning framework designed to reduce network weights, and they tested it on computer-
aided design (CAD) objects. Further exploration and in-depth adjustment of supervised
networks for single-shot 3D measurement led to the development of a depthwise-separable
model named DD-Inceptionv2-UNet, as introduced in [41].
In contrast to direct methods, indirect methods encompass various outputs that serve
as the link for determining the unwrapped phase map and the final depth map through
system calibration. These connecting outputs, often referred to as intermediate outputs,
typically originate from the traditional FPP technique. Such outputs consist of phase-
shifted fringe patterns, the numerators and denominators of the arctangent function, the
wrapped phase map, the unwrapped phase map, and integer fringe orders. In many
studies, researchers often convert the input fringe patterns into the numerators and denom-
inators of the arctangent function [42–45]. Alternatively, transforming the fringe patterns
into multiple phase-shifted patterns is an option based on the image-to-image transfor-
mation concept [46–49]. In addition to being predicted through supervised learning, the
n-step phase-shifting fringe patterns can also be generated using the virtual temporal
phase-shifting method by employing generative adversarial networks (GANs) [50]. How-
ever, employing the discrete cosine transform (DCT) for phase unwrapping from a single
frequency may result in a less accurate determination of the true phase. The need for em-
ploying a multi-frequency phase-shifting approach in the phase unwrapping process has
driven the exploration of encoding diverse information into a composite pattern [25,51–54].
Subsequently, the wrapped phase map can serve as either the input or output vector in cooperation between fringe projection and learning-based networks [55–58]. Recognizing the
significance of the integer fringe orders in the phase unwrapping scheme, several methods
have trained CNN models to segment the integer fringe orders or predict the coarse phase
map [59–62]. In recent research, instead of employing multiple networks or a multi-stage
scheme to determine separate wrapped phases, fringe orders, or coarse phase maps, a
single network with multiple decoder branches has been developed to predict multiple
intermediate quantities for determining the unwrapped phase map [63–65]. In contrast to
simultaneously predicting multiple intermediate outputs, various approaches incorporate
multiple input types—such as a reference plane, additional fringe image, intermediate
fringe image, wrapped phase, fringe order, etc.—in addition to the fringe pattern, aiming
to further improve the accuracy of phase reconstruction [28,66–69].
Figure 2. Diagram illustrating the fringe projection profilometry method using a dual-frequency
four-step phase-shifting approach.
The fringe patterns captured by the camera can be expressed as follows [29]:
$$I_i^j(u,v) = I_a(u,v) + I_b(u,v)\cos\left[\phi_i(u,v) + \delta_j\right] \quad (1)$$
where $I$, $I_a$, and $I_b$ represent the pixel intensities of the captured patterns, the intensity background, and the fringe amplitude at a specific pixel location $(u, v)$. The superscript $j$ denotes the order of the phase-shifted image, with $j$ ranging from 1 to 4 in the case of a four-step phase-shifting algorithm; the subscript $i$ implies the $i$th frequency; $\delta_j$ is the phase-shift amount with $\delta_j = \frac{(j-1)\pi}{2}$. The value of $\phi_i(u, v)$ can be computed using the standard four-step phase-shifting algorithm.
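For illustration, the wrapped phase of a four-step set with $\delta_j = (j-1)\pi/2$ can be computed as follows; this is a minimal NumPy sketch rather than the authors' implementation, and the array names are illustrative.

```python
import numpy as np

def wrapped_phase_four_step(I1, I2, I3, I4):
    """Standard four-step phase-shifting with shifts 0, pi/2, pi, 3*pi/2.

    Since I_j = Ia + Ib*cos(phi + delta_j), it follows that
    I4 - I2 = 2*Ib*sin(phi) and I1 - I3 = 2*Ib*cos(phi), so the
    wrapped phase is arctan2(I4 - I2, I1 - I3) in (-pi, pi].
    """
    return np.arctan2(I4 - I2, I1 - I3)
```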
The multi-frequency phase-shifting algorithm is commonly used in FPP 3D imaging
due to its ability to manage geometric discontinuities and overlapping objects with varying
height or depth information. Our proposed approach employs a dual-frequency four-
step (DFFS) phase-shifting scheme that uses two fringe frequencies, as well as the triple-
frequency four-step (TFFS) scheme, which involves three frequencies. When using the DFFS
phase-shifting scheme, the unwrapped phase can be obtained by satisfying the condition
that the difference between the two frequencies is one. In such cases, the equations that
govern the unwrapped phase can be expressed as follows [52]:
$$\phi_{12}^{uw} = \phi_2^{w} - \phi_1^{w} + \begin{cases} 0, & \phi_2^{w} \geq \phi_1^{w} \\ 2\pi, & \phi_2^{w} < \phi_1^{w} \end{cases}$$
$$\phi = \phi_2^{uw} = \phi_2^{w} + \mathrm{INT}\!\left(\frac{\phi_{12}^{uw} f_2 - \phi_2^{w}}{2\pi}\right) 2\pi \quad (2)$$
Equation (2) describes the process of unwrapping the phase of two different frequencies, $f_1$ and $f_2$. It involves using their wrapped phases, $\phi_1^{w}$ and $\phi_2^{w}$, respectively. However, since the initial unwrapped phase, $\phi_{12}^{uw}$, is derived with only one fringe, it cannot be used directly due to the noise caused by the difference between the two frequencies. Instead, $\phi_{12}^{uw}$ serves as the interfering unwrapped phase for the hierarchical phase-unwrapping process of $\phi_2^{uw}$. The final unwrapped phase, denoted as $\phi$, corresponds to the phase distribution of the highest fringe frequency. It is noted that this study utilizes $f_1 = 79$ and $f_2 = 80$, which meet the requirements of the DFFS scheme.
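The DFFS unwrapping of Equation (2) can be sketched in a few lines of NumPy; this is an illustrative implementation under the stated condition $f_2 - f_1 = 1$, not the authors' code, and the variable names are hypothetical.

```python
import numpy as np

def unwrap_dffs(phi1_w, phi2_w, f2=80):
    """Hierarchical unwrapping for the DFFS scheme (Equation (2)).

    phi1_w, phi2_w: wrapped phase maps of the two frequencies f1 and f2,
    with f2 - f1 = 1 so that their difference spans a single fringe.
    """
    # Beat phase of the two frequencies: one fringe across the field,
    # hence already unwrapped.
    diff = phi2_w - phi1_w
    phi12_uw = np.where(phi2_w >= phi1_w, diff, diff + 2 * np.pi)
    # Scale the beat phase to frequency f2 and recover the fringe order.
    order = np.rint((phi12_uw * f2 - phi2_w) / (2 * np.pi))
    return phi2_w + 2 * np.pi * order
```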
The TFFS scheme adopts three fringe frequencies that must meet a specific condition to compute the unwrapped phase of the fringe patterns with the highest frequency. Specifically, it requires that $(f_3 - f_2) - (f_2 - f_1) = 1$, where $(f_3 - f_2) > (f_2 - f_1) > 0$. The unwrapped phase can be determined by a set of hierarchical equations [47,53]:

$$\phi_{12}^{w} = \phi_2^{w} - \phi_1^{w} + \begin{cases} 0, & \phi_2^{w} \geq \phi_1^{w} \\ 2\pi, & \phi_2^{w} < \phi_1^{w} \end{cases}$$
$$\phi_{23}^{w} = \phi_3^{w} - \phi_2^{w} + \begin{cases} 0, & \phi_3^{w} \geq \phi_2^{w} \\ 2\pi, & \phi_3^{w} < \phi_2^{w} \end{cases}$$
$$\phi_{123}^{w} = \phi_{23}^{w} - \phi_{12}^{w} + \begin{cases} 0, & \phi_{23}^{w} \geq \phi_{12}^{w} \\ 2\pi, & \phi_{23}^{w} < \phi_{12}^{w} \end{cases} \quad (3)$$
$$\phi_{23} = \phi_{23}^{w} + \mathrm{INT}\!\left(\frac{\phi_{123}^{w}(f_3 - f_2) - \phi_{23}^{w}}{2\pi}\right) 2\pi$$
$$\phi = \phi_3^{uw} = \phi_3^{w} + \mathrm{INT}\!\left(\frac{\phi_{23}\frac{f_3}{f_3 - f_2} - \phi_3^{w}}{2\pi}\right) 2\pi$$
The conditions for the TFFS scheme may seem more complicated, but they are designed to ensure that the highest-frequency fringe patterns can be accurately analyzed. Equation (3) involves two different types of phases, wrapped and unwrapped, denoted as $\phi^{w}$ and $\phi^{uw}$, respectively. The function "INT" is used to round off numbers to the nearest integer. The term $\phi_{mn}$ represents the difference between the phase values $\phi_m$ and $\phi_n$, where the difference $(f_n - f_m)$ corresponds to the number of wrapped fringes in the phase map. The algorithm exploits the fact that $\phi_{123}$, which has only one fringe in the pattern, is both wrapped and unwrapped, and this property enables a hierarchical phase-unwrapping process connecting $\phi_{123}$ and $\phi_3$ through $\phi_{23}$. Finally, the highest-frequency fringe pattern, $\phi_3$, is used for the final phase determination, as it provides the highest level of accuracy.
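Similarly, the hierarchical TFFS unwrapping of Equation (3) can be expressed as a short NumPy sketch; the function and variable names are illustrative rather than taken from the original implementation.

```python
import numpy as np

def wrap_diff(phi_hi, phi_lo):
    """Difference of two wrapped phases, shifted into [0, 2*pi) as in Equation (3)."""
    d = phi_hi - phi_lo
    return np.where(phi_hi >= phi_lo, d, d + 2 * np.pi)

def unwrap_tffs(phi1_w, phi2_w, phi3_w, f1, f2, f3):
    """Hierarchical unwrapping for the TFFS scheme, assuming
    (f3 - f2) - (f2 - f1) = 1 and (f3 - f2) > (f2 - f1) > 0."""
    phi12_w = wrap_diff(phi2_w, phi1_w)
    phi23_w = wrap_diff(phi3_w, phi2_w)
    phi123 = wrap_diff(phi23_w, phi12_w)  # one fringe: already unwrapped
    # Unwrap phi23 using phi123 scaled by its fringe count (f3 - f2).
    phi23 = phi23_w + 2 * np.pi * np.rint(
        (phi123 * (f3 - f2) - phi23_w) / (2 * np.pi))
    # Unwrap the highest-frequency phase using phi23 scaled by f3 / (f3 - f2).
    return phi3_w + 2 * np.pi * np.rint(
        (phi23 * f3 / (f3 - f2) - phi3_w) / (2 * np.pi))
```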
To extract height or depth information, we utilize the FPP 3D imaging technique, which directly reconstructs the data from the unwrapped phase obtained from Equation (2) or Equation (3). The formulation for extracting the depth map from $\phi$ is provided in [70]:
$$z = \frac{\mathbf{c}\,[P_1\ \ P_2]^{\top}}{\mathbf{d}\,[P_1\ \ P_2]^{\top}}$$
$$\mathbf{c} = \{1\ \ c_1\ \ c_2\ \ c_3\ \cdots\ c_{17}\ \ c_{18}\ \ c_{19}\}$$
$$\mathbf{d} = \{d_0\ \ d_1\ \ d_2\ \ d_3\ \cdots\ d_{17}\ \ d_{18}\ \ d_{19}\} \quad (4)$$
$$P_1 = \left\{1\ \ \phi\ \ u\ \ u\phi\ \ v\ \ v\phi\ \ u^2\ \ u^2\phi\ \ uv\ \ uv\phi\ \ v^2\ \ v^2\phi\right\}$$
$$P_2 = \left\{u^3\ \ u^3\phi\ \ u^2 v\ \ u^2 v\phi\ \ uv^2\ \ uv^2\phi\ \ v^3\ \ v^3\phi\right\}$$
The parameters, denoted as $c_1$ to $c_{19}$ and $d_0$ to $d_{19}$ in the equation, are pre-determined through a system calibration process. It is noteworthy that after the determination of the
height or depth z at each pixel coordinate (u, v), the other two coordinates, x and y, can be
directly determined from the calibrated camera model.
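Once the calibration vectors are known, Equation (4) reduces to a per-pixel polynomial evaluation. The following NumPy sketch shows one possible vectorized form; the pixel-coordinate convention and the ordering of the basis terms are assumed to match Equation (4).

```python
import numpy as np

def depth_from_phase(phi, c, d):
    """Evaluate Equation (4) at every pixel of an unwrapped phase map.

    phi  : unwrapped phase map of shape (H, W)
    c, d : length-20 calibration vectors (c[0] fixed to 1), obtained
           beforehand from the system calibration.
    """
    H, W = phi.shape
    v, u = np.mgrid[0:H, 0:W].astype(float)
    # Monomial basis [P1 P2] of Equation (4), stacked along the last axis.
    P = np.stack([
        np.ones_like(phi), phi, u, u * phi, v, v * phi,
        u**2, u**2 * phi, u * v, u * v * phi, v**2, v**2 * phi,
        u**3, u**3 * phi, u**2 * v, u**2 * v * phi,
        u * v**2, u * v**2 * phi, v**3, v**3 * phi,
    ], axis=-1)
    return (P @ np.asarray(c)) / (P @ np.asarray(d))
```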
Figure 3. Illustration of the input–output pairs used for training within the datasets employing the DFFS scheme: a grayscale fringe input image, the output images (multiple phase-shifted fringes), and the corresponding ground-truth phase and depth maps.
fringes. Figure 4B visualizes the architecture of the unsupervised model DCGAN, which
includes two separate parts of the generator and discriminator.
Figure 4. Architecture of the supervised UNet (A) and unsupervised DCGAN (B) models utilized in
the training process.
The generator’s architecture is similar to that of the UNet model mentioned earlier,
with the objective of generating fringe patterns closely resembling the ground-truth fringes
(real fringes). The generator comprises a series of transposed convolutional layers that
up-sample the input noise vector to generate the final output images. The generator also
contains skip connections that copy the corresponding layers from the encoder, concatenat-
ing them with the decoder path to preserve fine feature information during upsampling.
The discriminator, on the other hand, takes both generated and real fringes as input vec-
tors, discerning between them. The discriminator incorporates convolution layers with
a stride of 2 to extract feature information from the input feature map. Each convolution
layer undergoes batch normalization and applies the LeakyReLU activation function to
introduce nonlinearity. The final output layer uses a sigmoid activation function for binary
cross-entropy loss, generating a probability score that indicates the likelihood that the
input image is either real or generated/fake. In particular, probability scores approaching 1
indicate real fringes, while values approaching 0 signify generated fringes. The adversarial
training process between the generator and discriminator continues until the generator can
produce lifelike fringes that deceive the discriminator into believing they are real.
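As a rough sketch of the discriminator side described above (strided convolutions, batch normalization, LeakyReLU, and a sigmoid output), a PatchGAN-style Keras model might look as follows; the image resolution and channel counts are illustrative assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_patchgan_discriminator(fringe_shape=(480, 640, 1),
                                 target_shape=(480, 640, 8)):
    """PatchGAN-style discriminator sketch: the single input fringe is
    concatenated with the real or generated phase-shifted fringes, and the
    output is a grid of per-patch real/fake probabilities."""
    inp = layers.Input(shape=fringe_shape, name="input_fringe")
    tar = layers.Input(shape=target_shape, name="phase_shifted_fringes")
    x = layers.Concatenate()([inp, tar])
    for filters in (64, 128, 256, 512):
        # Strided convolutions halve the spatial resolution at each stage.
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    # Sigmoid maps each patch score to a probability (1 = real, 0 = fake).
    patch_scores = layers.Conv2D(1, 4, padding="same", activation="sigmoid")(x)
    return tf.keras.Model([inp, tar], patch_scores, name="discriminator")
```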
In the supervised learning phase, the Adam optimizer was employed with an initial learning rate of 0.0001 for 300 learning epochs. After that, a step decay schedule was utilized for the last 100 epochs to gradually diminish the learning rate, aiding in the convergence of the network. The batch size was set to 1, and the loss function for the image regression task was the mean squared error (MSE).
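A minimal sketch of this supervised training setup is given below, using the hyperparameters listed above; `unet_model`, `train_x`, and `train_y` are placeholders, and the exact decay factor and validation split are assumptions.

```python
import tensorflow as tf

def train_supervised(unet_model, train_x, train_y):
    """Supervised UNet training: Adam (lr = 1e-4) with MSE loss, batch size 1,
    300 epochs at a constant rate followed by 100 epochs of step decay."""
    def step_decay(epoch, lr):
        # Constant learning rate for the first 300 epochs, then a gradual
        # stepwise reduction over the last 100 epochs (decay factor assumed).
        return lr if epoch < 300 else lr * 0.9

    unet_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                       loss="mse")
    unet_model.fit(train_x, train_y, batch_size=1, epochs=400,
                   validation_split=0.1,
                   callbacks=[tf.keras.callbacks.LearningRateScheduler(step_decay)])
    return unet_model
```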
In the unsupervised learning scenario, the dataset was divided into training and test
sets using a ratio of 90%:10%. The Adam optimizer, with a learning rate of 0.0002 and a
beta value of 0.5, was applied to both the discriminator and the entire DCGAN model.
The discriminator utilized the MSE as the loss function, focusing on distinguishing real
fringes from the generated ones. The DCGAN loss is a combination of sigmoid cross-
entropy and L1 loss (mean absolute error—MAE), with a weight loss ratio of 1 versus
100 (i.e., LAMBDA = 100). PatchGAN, a typical discriminator architecture for image-
to-image translation, was employed to discern real fringes or generated fringes at the
patch level, using arrays of 1’s and 0’s instead of a single output, as seen in traditional
discriminators. The entire learning process involves 200 iterative epochs with a batch size of
2. Notably, samples are randomly selected internally in each epoch. In both the supervised
and unsupervised learning processes, a 10% split test set is completely separated from
the training and validation datasets. This ensures that the object surface has not been
encountered during training or validation, thereby mitigating possible overfitting issues
and biased evaluations.
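The DCGAN generator objective described above (sigmoid cross-entropy plus L1, weighted 1:100) can be written as follows; this sketch assumes the discriminator outputs sigmoid probabilities, as in a pix2pix-style setup, and is not the authors' exact code.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)
LAMBDA = 100  # weight ratio between the adversarial and L1 terms

def generator_loss(disc_output_on_fake, generated_fringes, target_fringes):
    """Adversarial term pushing the discriminator's patch scores toward 1,
    plus LAMBDA times the L1 (MAE) distance to the ground-truth fringes."""
    adv_loss = bce(tf.ones_like(disc_output_on_fake), disc_output_on_fake)
    l1_loss = tf.reduce_mean(tf.abs(target_fringes - generated_fringes))
    return adv_loss + LAMBDA * l1_loss
```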
Multiple graphics processing unit (GPU) nodes within the Biowulf cluster at the
National Institutes of Health (NIH) were required to train the models. Specifically, the two
main GPU nodes utilized in this project consist of 4 × NVIDIA A100 GPUs with 80 GB
VRAM each and 4 × NVIDIA V100-SXM2 GPUs with 32 GB VRAM each. The programming
framework relies on Python 3.10.8, and the deep learning Keras framework version 2.11.0 is
employed to construct the network architecture, as well as for data prediction and analysis.
It is worth noting that the training time for the DCGAN model was 9 h using NVIDIA A100
GPUs with a batch size of 2 and 200 epochs. In contrast, the UNet model required less than
6 h for 400 epochs. This highlights the importance of choosing the right model architecture
and training parameters to optimize the training time without compromising the quality of
the results.
the UNet model demonstrated its ability to generate satisfactory fringe patterns for the
final 3D reconstruction process.
Figure 5. Assessment of image quality using the SSIM and PSNR metrics for the predicted fringe images $I_{79}^{1}$–$I_{79}^{4}$: (A) nonlinear fringe transformation using UNet; (B) nonlinear fringe transformation using DCGAN.
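For reference, SSIM and PSNR values such as those reported in Figure 5 can be computed per image pair as sketched below; the use of scikit-image and 8-bit grayscale inputs is an assumption, not a statement of the authors' tooling.

```python
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def fringe_quality(predicted, ground_truth):
    """Compute SSIM and PSNR between a predicted fringe image and its
    ground truth, both assumed to be 8-bit grayscale arrays."""
    ssim = structural_similarity(ground_truth, predicted, data_range=255)
    psnr = peak_signal_noise_ratio(ground_truth, predicted, data_range=255)
    return ssim, psnr
```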
5.2. The 3D Reconstruction of a Single Object and Multiple Objects via the DFFS Scheme
After successfully training the model, we proceeded to utilize a single fringe to generate new fringe patterns. The experiment employed the DFFS scheme with $I_{79}^{1}$ as the initial fringe input. The trained model produced eight fringes, $I_{79}^{1-4}$ and $I_{80}^{1-4}$, as outputs. Figure 6 shows a few representative results obtained by using two different approaches: UNet and DCGAN. The first column in the figure presents the fringe input to the network, followed by the ground-truth 3D shape generated by the conventional DFFS scheme. Subsequently, the third and fourth columns display the 3D reconstructions, highlighting key deviations from the reference 3D shape.
To evaluate the model’s performance, we conducted a comparative analysis between
the UNet and DCGAN schemes. While both approaches successfully rendered an overall
3D representation of the targets, distinctions in their performance emerged. Upon closer examination of the generated 3D shapes, it became evident that the DCGAN model produced more detailed depth information than the UNet did, as the region highlighted by
a green square shows. Moreover, the region highlighted by a green circle indicated that
the UNet model struggled to generate the desired 3D shape due to incorrect fringe orders.
The region in the yellow circle further shows the UNet’s deficiency in capturing certain
depth details owing to inaccuracies in fringe ordering. Additionally, we noted some shape
discontinuities along the edges for both approaches.
Figure 6. Comparison of the 3D reconstruction results between UNet and DCGAN models when
utilizing the DFFS scheme.
5.3. Investigation of the Frequency-Dependent Fringe Inputs $I_{79}^{1}$ and $I_{80}^{1}$
There is a growing concern about how fringe inputs with different frequencies could affect the reconstruction of 3D shapes. To investigate this issue, two additional networks, UNet and DCGAN, were trained using the single fringe input $I_{80}^{1}$ while keeping the output fringes as either $I_{79}^{1-4}$ or $I_{80}^{1-4}$. Figure 7 is divided into two sections to make it easier to visually compare the use of different fringe inputs.
Based on the visual observations, it appears that the performance of both the UNet and DCGAN models is inferior when using the fringe input $I_{80}^{1}$ in comparison with $I_{79}^{1}$. This discrepancy arises from the surfaces of the shapes appearing blurred due to subsequent post-processing steps aimed at filling gaps and eliminating shape irregularities. It is important to note that these post-processing steps are applied uniformly to all reconstruction regions with identical parameters to ensure a fair visual comparison. For a more detailed examination of single-object reconstruction, the green arrows highlight the rough nature of the reconstructions and their inability to capture finer details when the network operates with a fringe frequency of $f = 80$. Furthermore, the red arrows point out shape-disconnected regions and missing areas in the reconstructed object when using the fringe input $I_{80}^{1}$.
Overall, the observations indicate that employing the fringe input $I_{80}^{1}$ leads to inferior 3D reconstruction results.
(Figure 7 panels: input image, ground-truth 3D, UNet 3D, and DCGAN 3D.)
6. Discussion
This article presents an innovative method for reconstructing 3D image(s) by inte-
grating structured light and deep learning. The proposed approach involves training a
single fringe pattern to generate multiple phase-shifted patterns with varying frequencies,
followed by a conventional algorithm for subsequent 3D reconstruction. Validation of
the technique was conducted on two datasets, employing both supervised and unsuper-
vised deep learning networks to execute the nonlinear fringe transformation. The results
showed that the unsupervised learning strategy using the DCGAN model outperformed
the supervised UNet model on both datasets.
During the training and prediction stages, both supervised and unsupervised learning models tend to produce some incorrect fringe orders when obtaining the unwrapped phase. These errors lead to the generation of multiple layers of 3D shapes and irrelevant
scattering point clouds. However, these limitations can be addressed with various auto-
matic noise-removing techniques. It is noteworthy that prior state-of-the-art techniques,
which had sufficient input information, such as encoded composite fringes, multiple fringes,
or fringes with a reference image, did not encounter such issues. Nevertheless, the elimi-
nation of irrelevant point clouds can also be achieved manually through post-processing
steps. The comparison between UNet and DCGAN also highlighted the importance of
obtaining accurate fringe orders in generating detailed 3D shapes, which is an obvious
requirement. It is suggested that further exploration in this area could lead to more precise
3D reconstruction results.
In our proposed approach, we utilized the DFFS and TFFS schemes with high-
frequency fringe patterns for both training and testing scenarios. The use of low-frequency
(e.g., 4, 8, etc.) fringe images is undesirable, as they typically result in lower accuracy
in phase determinations. As demonstrated in our previous work, low-frequency fringe
patterns yield inferior 3D reconstruction results compared to high-frequency fringe patterns
in the image-to-depth construction process [34].
This study used both an unsupervised DCGAN and the conditional pix2pix model [71]
for image-to-image translation and compared their performance. Pix2pix is a type of
conditional GAN where the input is combined with both real and generated images to
distinguish between real and fake. However, the pix2pix model was observed to be less
accurate in generating the 3D shape of the object and introduced more scattered noise and
incorrect layers. It should be noted that the pix2pix model was trained in parallel with
the DCGAN model, but none of its results outperformed the UNet and DCGAN models.
Therefore, we have left out the relevant descriptions. It is also noted that the DCGAN
yields some unexpected artifacts, which can be seen in Figure 5. Although such artifacts are
common in GAN models, our experimental results have demonstrated that these generative
traits effectively cope with the nonlinear fringe transformation task, whereas the supervised
learning model falls short.
Given that the proposed networks are trained on powerful GPUs, i.e., NVIDIA A100 and V100-SXM2 cards equipped with ample VRAM, the likelihood of encountering out-of-memory issues during training is significantly reduced. However, rather than generating
multiple intermediate outputs in the form of phase-shifted fringe patterns, a more efficient
strategy could be producing the numerators and denominators used in subsequent phase
calculation. This approach can help cope with the potential memory-related challenges.
In this specific situation, the memory required for the outputs can be reduced to half. It
is worth mentioning that training for 400 epochs in the supervised learning process is
significantly faster than training for 200 epochs in the unsupervised learning approach.
This discrepancy arises because the supervised learning approach loads all of the training
and validation datasets into the VRAM at once and conducts internal splitting, whereas
the unsupervised learning approach automatically draws a small batch randomly into the
VRAM for training. Additionally, the inclusion of an additional discriminator architecture,
along with the fine-tuning process, adds to the computational cost of the unsupervised
learning approach.
We acknowledge that this paper lacks a thorough comparison with other cutting-edge
techniques for 3D reconstruction that integrate fringe projection and deep learning. Ob-
taining sample codes from these techniques is challenging, and each method involves
its own hyperparameter tuning and network construction. Consequently, the proposed
approach only utilizes the most popular network for fringe-to-fringe transformation, specif-
ically, an autoencoder-based UNet, to compare it with the unsupervised-learning-based
DCGAN. Furthermore, the proposed technique primarily serves as a proof of concept for
nonlinear fringe transformation using deep learning. More comprehensive comparisons
can be performed in future work as the field progresses. Future research could explore
more rigorous comparisons and strive to enhance the accuracy of the nonlinear fringe
transformation method.
Given the inherent complexity in constructing robust network architectures and tuning
deep learning models, we recognize that future datasets should include a wider variety of
objects and more challenging scenes with diverse testing conditions to address the concern
of possible overfitting issues. We are committed to expanding our dataset collection efforts
to provide the research community with more comprehensive and representative datasets.
7. Conclusions
In summary, this manuscript introduces a novel 3D reconstruction approach via a
nonlinear fringe transformation method that combines fringe projection with deep learning.
This technique utilizes a single high-frequency fringe image as input and generates multiple
high-frequency fringe images with varying frequencies and phase shift amounts. These
intermediate outputs facilitate the subsequent 3D reconstruction process through the
traditional FPP technique. Since the proposed method requires only a single fringe image
to reconstruct the 3D shapes of objects with good accuracy, it provides great potential
for various dynamic applications, such as 3D body scanning, heritage and preservation
scanning, virtual clothing try-ons, indoor mapping, and beyond. Regarding future works,
there is a wide range of possibilities for exploring and enhancing the proposed approach.
One possible direction is to investigate the method’s performance on larger datasets and
more complex objects. Moreover, researchers can explore different imaging configurations
to enhance reconstruction quality and accuracy. Additionally, the potential of this method
is worth exploring in other domains, such as medical imaging or robotics.
References
1. Kim, S.J.; Lin, H.T.; Lu, Z.; Süsstrunk, S.; Lin, S.; Brown, M.S. A New In-Camera Imaging Model for Color Computer Vision and
Its Application. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2289–2302. [PubMed]
2. Kim, J.; Gurdjos, P.; Kweon, I. Geometric and algebraic constraints of projected concentric circles and their applications to camera
calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 637–642.
3. Fleck, S.; Straßer, W. Smart Camera Based Monitoring System and Its Application to Assisted Living. Proc. IEEE 2008, 96, 1698–1714.
[CrossRef]
4. Capel, D.; Zisserman, A. Computer vision applied to super resolution. IEEE Signal Process. Mag. 2003, 20, 75–86. [CrossRef]
5. Kolb, A.; Barth, E.; Koch, R.; Larsen, R. Time-of-flight cameras in computer graphics. Comput. Graph. Forum 2010, 29, 141–159. [CrossRef]
6. Wang, Z.; Kieu, H.; Nguyen, H.; Le, M. Digital image correlation in experimental mechanics and image registration in computer
vision: Similarities, differences and complements. Opt. Lasers Eng. 2015, 65, 18–27. [CrossRef]
7. Nguyen, H.; Wang, Z.; Jones, P.; Zhao, B. Accurate 3D shape measurement of multiple separate objects with stereo vision. Appl.
Opt. 2017, 56, 9030–9037. [CrossRef] [PubMed]
8. Westoby, M.; Brasington, J.; Glasser, N.; Hambrey, M.; Reynolds, J. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective
tool for geoscience applications. Geomorphology 2012, 179, 300–314. [CrossRef]
9. Geng, J. Structured-light 3D surface imaging: A tutorial. Adv. Opt. Photonics 2011, 3, 128–160. [CrossRef]
10. Osten, W.; Faridian, A.; Gao, P.; Körner, K.; Naik, D.; Pedrini, G.; Singh, A.K.; Takeda, M.; Wilke, M. Recent advances in digital
holography [invited]. Appl. Opt. 2014, 53, G44–G63. [CrossRef]
11. Shen, S. Accurate Multiple View 3D Reconstruction Using Patch-Based Stereo for Large-Scale Scenes. IEEE Trans. Image Process.
2013, 22, 1901–1914. [CrossRef] [PubMed]
12. Han, X.F.; Laga, H.; Bennamoun, M. Image-Based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning
Era. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1578–1604. [CrossRef] [PubMed]
13. Chen, J.; Kira, Z.; Cho, Y. Deep Learning Approach to Point Cloud Scene Understanding for Automated Scan to 3D Reconstruction.
J. Comput. Civ. Eng. 2019, 33, 04019027. [CrossRef]
14. Zhu, Z.; Wang, X.; Bai, S.; Yao, C.; Bai, X. Deep Learning Representation using Autoencoder for 3D Shape Retrieval. Neurocomputing
2016, 204, 41–50. [CrossRef]
15. Wang, G.; Ye, J.C.; De Man, B. Deep learning for tomographic image reconstruction. Nat. Mach. Intell. 2020, 2, 737–748. [CrossRef]
16. Zuo, C.; Qian, J.; Feng, S.; Yin, W.; Li, Y.; Fan, P.; Han, J.; Qian, K.; Chen, Q. Deep learning in optical metrology: A review. Light
Sci. Appl. 2022, 11, 39. [CrossRef] [PubMed]
17. Zhang, T.; Jiang, S.; Zhao, Z.; Dixit, K.; Zhou, X.; Hou, J.; Zhang, Y.; Yan, C. Rapid and robust two-dimensional phase unwrapping
via deep learning. Opt. Express 2019, 27, 23173–23185. [CrossRef] [PubMed]
18. Maggipinto, M.; Terzi, M.; Masiero, C.; Beghi, A.; Susto, G.A. A Computer Vision-Inspired Deep Learning Architecture for Virtual
Metrology Modeling with 2-Dimensional Data. IEEE Trans. Semicond. Manuf. 2018, 31, 376–384. [CrossRef]
19. Catalucci, S.; Thompson, A.; Piano, S.; Branson, D.T., III; Leach, R. Optical metrology for digital manufacturing: A review. Int. J.
Adv. Manuf. Technol. 2022, 120, 4271–4290. [CrossRef]
20. Nguyen, H.; Wang, Y.; Wang, Z. Single-Shot 3D Shape Reconstruction Using Structured Light and Deep Convolutional Neural
Networks. Sensors 2020, 20, 3718. [CrossRef] [PubMed]
21. Yang, R.; Li, Y.; Zeng, D.; Guo, P. Deep DIC: Deep learning-based digital image correlation for end-to-end displacement and strain
measurement. J. Mater. Process. Technol. 2022, 302, 117474. [CrossRef]
22. Nguyen, H.; Tran, T.; Wang, Y.; Wang, Z. Three-dimensional Shape Reconstruction from Single-shot Speckle Image Using Deep
Convolutional Neural Networks. Opt. Lasers Eng. 2021, 143, 106639. [CrossRef]
23. Feng, S.; Zuo, C.; Zhang, L.; Yin, W.; Chen, Q. Generalized framework for non-sinusoidal fringe analysis using deep learning.
Photonics Res. 2021, 9, 1084–1098. [CrossRef]
24. Yan, K.; Yu, Y.; Huang, C.; Sui, L.; Qian, K.; Asundi, A. Fringe pattern denoising based on deep learning. Opt. Comm. 2019, 437, 148–152. [CrossRef]
25. Li, Y.; Qian, J.; Feng, S.; Chen, Q.; Zuo, C. Composite fringe projection deep learning profilometry for single-shot absolute 3D
shape measurement. Opt. Express 2022, 30, 3424–3442. [CrossRef] [PubMed]
26. Van der Jeught, S.; Dirckx, J. Deep neural networks for single shot structured light profilometry. Opt. Express 2019, 27, 17091–17101. [CrossRef] [PubMed]
27. Shi, J.; Zhu, X.; Wang, H.; Song, L.; Guo, Q. Label enhanced and patch based deep learning for phase retrieval from single frame
fringe pattern in fringe projection 3D measurement. Opt. Express 2019, 27, 28929–28943. [CrossRef] [PubMed]
28. Feng, S.; Chen, Q.; Gu, G.; Tao, T.; Zhang, L.; Hu, Y.; Yin, W.; Zuo, C. Fringe pattern analysis using deep learning. Adv. Photonics
2019, 1, 025001. [CrossRef]
29. Nguyen, H.; Dunne, N.; Li, H.; Wang, Y.; Wang, Z. Real-time 3D shape measurement using 3LCD projection and deep machine
learning. Appl. Opt. 2019, 58, 7100–7109. [CrossRef] [PubMed]
30. Zheng, Y.; Wang, S.; Li, Q.; Li, B. Fringe projection profilometry by conducting deep learning from its digital twin. Opt. Express
2020, 28, 36568–36583. [CrossRef] [PubMed]
31. Fan, S.; Liu, S.; Zhang, X.; Huang, H.; Liu, W.; Jin, P. Unsupervised deep learning for 3D reconstruction with dual-frequency
fringe projection profilometry. Opt. Express 2021, 29, 32547–32567. [CrossRef]
32. Wang, F.; Wang, C.; Guan, Q. Single-shot fringe projection profilometry based on deep learning and computer graphics. Opt.
Express 2021, 29, 8024–8040. [CrossRef] [PubMed]
33. Nguyen, H.; Ly, K.; Tran, T.; Wang, Y.; Wang, Z. hNet: Single-shot 3D shape reconstruction using structured light and h-shaped
global guidance network. Results Opt. 2021, 4, 100104. [CrossRef]
34. Nguyen, A.; Sun, B.; Li, C.; Wang, Z. Different structured-light patterns in single-shot 2D-to-3D image conversion using deep
learning. Appl. Opt. 2022, 61, 10105–10115. [CrossRef] [PubMed]
35. Wang, L.; Lu, D.; Tao, J.; Qiu, R. Single-shot structured light projection profilometry with SwinConvUNet. Opt. Eng. 2022,
61, 114101.
36. Wang, C.; Zhou, P.; Zhu, J. Deep learning-based end-to-end 3D depth recovery from a single-frame fringe pattern with the
MSUNet++ network. Opt. Express 2023, 31, 33287–33298. [CrossRef] [PubMed]
37. Zhu, X.; Han, Z.; Zhang, Z.; Song, L.; Wang, H.; Guo, Q. PCTNet: Depth estimation from single structured light image with a
parallel CNN-transformer network. Meas. Sci. Technol. 2023, 34, 085402. [CrossRef]
38. Wu, Y.; Wang, Z.; Liu, L.; Yang, N.; Zhao, X.; Wang, A. Depth acquisition from dual-frequency fringes based on end-to-end
learning. Meas. Sci. Technol. 2024, 35, 045203. [CrossRef]
39. Song, X.; Wang, L. Dual-stage hybrid network for single-shot fringe projection profilometry based on a phase-height model. Opt.
Express 2024, 32, 891–906. [CrossRef] [PubMed]
40. Ravi, V.; Gorthi, R. LiteF2DNet: A lightweight learning framework for 3D reconstruction using fringe projection profilometry.
Appl. Opt. 2023, 62, 3215–3224. [CrossRef] [PubMed]
41. Wang, L.; Xue, W.; Wang, C.; Gao, Q.; Liang, W.; Zhang, Y. Depth estimation from a single-shot fringe pattern based on
DD-Inceptionv2-UNet. Appl. Opt. 2023, 62, 9144–9155. [CrossRef] [PubMed]
42. Zhao, P.; Gai, S.; Chen, Y.; Long, W.; Da, F. A multi-code 3D measurement technique based on deep learning. Opt. Lasers Eng.
2021, 143, 106623.
43. Feng, S.; Zuo, C.; Yin, W.; Gu, G.; Chen, Q. Micro deep learning profilometry for high-speed 3D surface imaging. Opt. Lasers Eng.
2019, 121, 416–427. [CrossRef]
44. Liu, X.; Yang, L.; Chu, X.; Zhou, L. A novel phase unwrapping method for binocular structured light 3D reconstruction based on
deep learning. Optik 2023, 279, 170727. [CrossRef]
45. Nguyen, A.; Wang, Z. Time-Distributed Framework for 3D Reconstruction Integrating Fringe Projection with Deep Learning.
Sensors 2023, 23, 7284. [CrossRef] [PubMed]
46. Yu, H.; Chen, X.; Zhang, Z.; Zuo, C.; Yang, Y.; Zheng, D.; Han, J. Dynamic 3-D measurement based on fringe-to-fringe
transformation using deep learning. Opt. Express 2020, 28, 9405–9418. [CrossRef] [PubMed]
47. Nguyen, H.; Wang, Z. Accurate 3D Shape Reconstruction from Single Structured-Light Image via Fringe-to-Fringe Network.
Photonics 2021, 8, 459. [CrossRef]
48. Yang, Y.; Hou, Q.; Li, Y.; Cai, Z.; Liu, X.; Xi, J.; Peng, X. Phase error compensation based on Tree-Net using deep learning. Opt.
Lasers Eng. 2021, 143, 106628. [CrossRef]
49. Qi, Z.; Liu, X.; Pang, J.; Hao, Y.; Hu, R.; Zhang, Y. PSNet: A Deep Learning Model-Based Single-Shot Digital Phase-Shifting
Algorithm. Sensors 2023, 23, 8305. [CrossRef]
50. Yan, K.; Khan, A.; Asundi, A.; Zhang, Y.; Yu, Y. Virtual temporal phase-shifting phase extraction using generative adversarial
networks. Appl. Opt. 2022, 61, 2525–2535. [CrossRef]
51. Fu, Y.; Huang, Y.; Xiao, W.; Li, F.; Li, Y.; Zuo, P. Deep learning-based binocular composite color fringe projection profilometry for
fast 3D measurements. Opt. Lasers Eng. 2024, 172, 107866. [CrossRef]
52. Nguyen, H.; Ly, K.; Li, C.; Wang, Z. Single-shot 3D shape acquisition using a learning-based structured-light technique. Appl. Opt.
2022, 61, 8589–8599. [CrossRef] [PubMed]
53. Nguyen, H.; Novak, E.; Wang, Z. Accurate 3D reconstruction via fringe-to-phase network. Measurement 2022, 190, 110663.
[CrossRef]
54. Yu, R.; Yu, H.; Sun, W.; Akhtar, N. Color phase order coding and interleaved phase unwrapping for three-dimensional shape
measurement with few projected pattern. Opt. Laser Technol. 2024, 168, 109842. [CrossRef]
55. Liang, J.; Zhang, J.; Shao, J.; Song, B.; Yao, B.; Liang, R. Deep Convolutional Neural Network Phase Unwrapping for Fringe
Projection 3D Imaging. Sensors 2020, 20, 3691. [CrossRef] [PubMed]
56. Hu, W.; Miao, H.; Yan, K.; Fu, Y. A Fringe Phase Extraction Method Based on Neural Network. Sensors 2021, 21, 1664. [CrossRef]
[PubMed]
57. Wang, M.; Kong, L. Single-shot 3D measurement of highly reflective objects with deep learning. Opt. Express 2023, 31, 14965–14985.
58. Sun, G.; Li, B.; Li, Z.; Wang, X.; Cai, P.; Qie, C. Phase unwrapping based on channel transformer U-Net for single-shot fringe
projection profilometry. J. Opt. 2023, 1–11. [CrossRef]
59. Huang, W.; Mei, X.; Fan, Z.; Jiang, G.; Wang, W.; Zhang, R. Pixel-wise phase unwrapping of fringe projection profilometry based
on deep learning. Measurement 2023, 220, 113323. [CrossRef]
60. Yu, H.; Chen, X.; Huang, R.; Bai, L.; Zheng, D.; Han, J. Untrained deep learning-based phase retrieval for fringe projection
profilometry. Opt. Lasers Eng. 2023, 164, 107483. [CrossRef]
61. Bai, S.; Luo, X.; Xiao, K.; Tan, C.; Song, Z. Deep absolute phase recovery from single-frequency phase map for handheld 3D
measurement. Opt. Comm. 2022, 512, 128008. [CrossRef]
62. Song, J.; Liu, K.; Sowmya, A.; Sun, C. Super-Resolution Phase Retrieval Network for Single-Pattern Structured Light 3D Imaging.
IEEE Trans. Image Process. 2023, 32, 537–549. [CrossRef] [PubMed]
63. Zhu, X.; Zhao, H.; Song, L.; Wang, H.; Guo, Q. Triple-output phase unwrapping network with a physical prior in fringe projection profilometry. Appl. Opt. 2023, 62, 7910–7916. [CrossRef] [PubMed]
64. Nguyen, A.; Ly, K.; Lam, V.; Wang, Z. Generalized Fringe-to-Phase Framework for Single-Shot 3D Reconstruction Integrating
Structured Light with Deep Learning. Sensors 2023, 23, 4209. [CrossRef] [PubMed]
65. Nguyen, A.; Rees, O.; Wang, Z. Learning-based 3D imaging from single structured-light image. Graph. Models 2023, 126, 101171.
[CrossRef]
66. Machineni, R.C.; Spoorthi, G.; Vengala, K.S.; Gorthi, S.; Gorthi, R.K.S.S. End-to-end deep learning-based fringe projection
framework for 3D profiling of objects. Comput. Vis. Image Underst. 2020, 199, 103023. [CrossRef]
67. Li, W.; Yu, J.; Gai, S.; Da, F. Absolute phase retrieval for a single-shot fringe projection profilometry based on deep learning. Opt.
Eng. 2021, 60, 064104. [CrossRef]
68. Tan, H.; Xu, Y.; Zhang, C.; Xu, Z.; Kong, C.; Tang, D.; Guo, B. A Y-shaped network based single-shot absolute phase recovery
method for fringe projection profilometry. Meas. Sci. Technol. 2023, 35, 035203. [CrossRef]
69. Yin, W.; Che, Y.; Li, X.; Li, M.; Hu, Y.; Feng, S.; Lam, E.Y.; Chen, Q.; Zuo, C. Physics-informed deep learning for fringe pattern
analysis. Opto-Electron. Adv. 2024, 7, 230034. [CrossRef]
70. Nguyen, H.; Liang, J.; Wang, Y.; Wang, Z. Accuracy assessment of fringe projection profilometry and digital image correlation
techniques for three-dimensional shape measurements. J. Phys. Photonics 2021, 3, 014004. [CrossRef]
71. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017;
pp. 5967–5976.