
Single-Shot 3D Reconstruction via Nonlinear Fringe
Transformation: Supervised and Unsupervised
Learning Approaches
Andrew-Hieu Nguyen 1 and Zhaoyang Wang 2, *

1 Neuroimaging Research Branch, National Institute on Drug Abuse, National Institutes of Health,
Baltimore, MD 21224, USA; [email protected]
2 Department of Mechanical Engineering, School of Engineering, The Catholic University of America,
Washington, DC 20064, USA
* Correspondence: [email protected]

Abstract: The field of computer vision has been focusing on achieving accurate three-dimensional
(3D) object representations from a single two-dimensional (2D) image through deep artificial neural
networks. Recent advancements in 3D shape reconstruction techniques that combine structured
light and deep learning show promise in acquiring high-quality geometric information about object
surfaces. This paper introduces a new single-shot 3D shape reconstruction method that uses a
nonlinear fringe transformation approach through both supervised and unsupervised learning
networks. In this method, a deep learning network learns to convert a grayscale fringe input
into multiple phase-shifted fringe outputs with different frequencies, which act as an intermediate
result for the subsequent 3D reconstruction process using the structured-light fringe projection
profilometry technique. Experiments have been conducted to validate the practicality and robustness
of the proposed technique. The experimental results demonstrate that the unsupervised learning
approach using a deep convolutional generative adversarial network (DCGAN) is superior to the
supervised learning approach using UNet in image-to-image generation. The proposed technique’s
ability to accurately reconstruct 3D shapes of objects using only a single fringe image opens up vast
opportunities for its application across diverse real-world scenarios.
Keywords: fringe projection; deep learning; generative adversarial network; three-dimensional imaging; three-dimensional shape measurement

Citation: Nguyen, A.-H.; Wang, Z. Single-Shot 3D Reconstruction via Nonlinear Fringe Transformation: Supervised and Unsupervised Learning Approaches. Sensors 2024, 24, 3246. https://doi.org/10.3390/s24103246
1. Introduction

We live in a world where big data are a dominant force, generating an enormous amount of information every second. The capture of comprehensive data from the physical world necessitates the use of a wide range of integrated sensors and system devices, among which cameras are some of the most common. Cameras have a significant impact on many areas of our lives, such as security and surveillance, healthcare and medical imaging, environmental monitoring, aerial photography, and media and entertainment using augmented reality (AR) and virtual reality (VR) [1–4]. Additionally, cameras play a crucial role in the field of 3D reconstruction, where they are used to capture detailed images and videos of physical spaces or objects, enabling the creation of 3D representations of objects from 2D images.

In the past, achieving both speed and accuracy with high performance in 3D shape reconstruction has been a challenging task due to hardware constraints and algorithm limitations. Various techniques, such as stereo vision, time of flight (ToF), and structure from motion (SfM), prioritize speed over accuracy, making them ideal for application cases that require real-time processing, such as robots, autonomous vehicles, AR/VR, scene understanding, and interactive 3D modeling [5–8]. Conversely, structured light scanning, digital holography, and patch-based multi-view stereo (PMVS) or photogrammetry are chosen for applications that require good precision and accuracy [9–11].
Over the past decade, there has been an ever-growing interest in using artificial intelli-
gence (AI) as a supportive tool for 3D shape reconstruction. With the rapid advancement of
deep learning techniques, notably, convolutional neural networks (CNNs) and generative
adversarial networks (GANs), researchers have been able to continuously improve and
apply these methods within the field of 3D reconstruction. One of the key advantages of em-
ploying deep learning models for 3D shape reconstruction lies in their capacity to accurately
and efficiently map the intricate structures of physical objects, as captured by 2D imaging,
into detailed 3D profiles. This is accomplished through the use of learning-based
model networks, which are trained on large datasets comprising conventional 2D images
and their corresponding 3D shapes. The integration of 3D reconstruction techniques with
deep learning methods mainly depends on the representation of the output. Some com-
mon approaches include volumetric representations, surface-based representations, and
intermediation techniques [12–15].
Deep learning has also emerged as a versatile tool in experimental mechanics and
optical metrology, offering the ability to enhance the performance of traditional techniques. It
has been adopted in a wide range of applications, including defect identification, shape
recognition, deformation analysis, quality control, 3D shape reconstruction, stress and
strain examination, and fringe pattern and interferogram analysis [16–22]. Of notable
interest is the integration of fringe projection profilometry (FPP) with deep learning, which
has attracted significant attention among researchers and engineers. The integration fo-
cuses on tasks such as fringe analysis, phase determination, depth estimation, and 3D
reconstruction [23–25]. While relatively new compared to traditional computer vision
techniques, the approach has undergone considerable development and can be classified
into various schemes based on the input type (single-shot or multi-shot), stage structure
(single-stage or multi-stage), and whether it involves direct conversion or passes through
intermediate outputs.
This paper introduces an innovative method for reconstructing 3D shape models by
integrating fringe projection and deep learning. The proposed approach involves utilizing
a deep learning network to perform nonlinear fringe conversion, generating multiple
phase-shifted fringe patterns from a single high-frequency fringe image. These predicted
fringe patterns are then employed to derive step-by-step intermediate results for classical
FPP techniques, such as wrapped phase, fringe order, and unwrapped phase, facilitating
subsequent depth estimation and 3D reconstruction. Figure 1A demonstrates a flowchart of
the proposed method, describing the step-by-step nonlinear fringe transformation leading
to wrapped and unwrapped phase determination and eventual depth estimation.
Classical FPP methods are utilized to gather the necessary datasets to facilitate the
learning process using deep learning techniques. Classical FPP techniques within the multi-
frequency phase-shifting (MFPS) framework are renowned in optical metrology for their
effectiveness in high-accuracy 3D shape reconstruction and deformation determination.
This manuscript employs two types of MFPS schemes called the dual-frequency and
triple-frequency phase-shifting schemes for training, validation, and testing purposes. The
deep learning approach is leveraged to transform a single fringe pattern into multiple
fringe patterns at various frequencies and phase shift amounts required for different MFPS
schemes. Unsupervised learning with a GAN and a comparable supervised learning
network are chosen to evaluate the effectiveness of the proposed methods. To offer a more
comprehensive understanding of how the nonlinear fringe transformation operates with
varying frequencies and phase shift amounts, Figure 1B,C present an in-depth framework,
including a specific zoomed-in region.

Figure 1. (A) Flowchart of the proposed nonlinear fringe transformation for 3D reconstruction;
(B) dual-frequency nonlinear transformation; (C) triple-frequency nonlinear transformation.

The proposed technique is well supported by experimental results and assessments, demonstrating its effectiveness and superiority over existing methods. Its key advantages are as follows:
• Unlike other approaches that require multiple patterns with different frequencies or
additional reference images, this technique operates with just a single high-frequency
pattern as input.
• It introduces a novel unsupervised learning network (specifically, a GAN) rather than
a supervised autoencoder network.
• The nonlinear fringe conversion is achieved through a single network, eliminating the
need for multiple sub-networks or a multi-stage network.
• The technique maintains the accuracy advantage of the traditional FPP-based approach
while addressing the time-consuming issue associated with acquiring multiple phase-
shifted fringe patterns.
This innovative approach holds substantial promise for advancing the 3D imaging
and shape reconstruction technology. By providing an accurate and efficient means of
generating 3D models for real-world targets, it has the potential to make a notable impact
across numerous applications in various fields.
The remainder of this paper is organized as follows: Section 2 reviews related methods that integrate the fringe projection profilometry technique with deep learning for 3D shape reconstruction. Section 3 describes the fringe projection method, including the dual-frequency and triple-frequency approaches. Section 4 details the structure of the training networks. Section 5 presents diverse quantitative and qualitative results, along with examples of 3D reconstruction. The last two sections discuss potential limitations, applications, future directions, and insights into the overall framework.

2. Related Work
The integration of fringe projection with deep learning for 3D reconstruction began
in early 2019 with the methods of transforming captured fringe patterns into a depth
map [26] or intermediate outputs [27–29]. Since then, the integration primarily falls into
two categories: direct and indirect methods, regardless of whether the input is a single shot
or multiple shots.
The direct method operates on a concept that closely resembles depth estimation or
image-to-image conversion in the field of computer vision. In this approach, a network
model is trained to transform a single 2D image into either a depth map or an equiva-
lent disparity map. Specifically, the input vector consists of fringe pattern image(s), and
the output vector comprises a map image with the same resolution, where the value of
each pixel represents geometric depth or height information. Several early methods use
autoencoder-based networks, typically UNet, for the fringe-to-depth conversion [20,30],
while other approaches use unsupervised network models to predict the depth or height
map [31,32]. Nguyen and colleagues enhanced the accuracy of 3D reconstruction by in-
troducing an autoencoder-based network and later incorporating an additional h-shaped
global guidance branch in their work [33,34]. Wang et al. proposed SwinConvUNet,
which employs a self-attention mechanism and the Swin Transformer for fringe-to-depth
conversion, aiming to extract both local and global features [35]. Similar to these works,
other researchers [36–39] presented a few end-to-end network models, such as MSUNet++,
PCTNet, DF-Dnet, and DCAHINet; they focus on depth recovery through diverse multi-
scale feature fusion modules. The team in [40] introduced LiteF2DNet, a lightweight deep
learning framework designed to reduce network weights, and they tested it on computer-
aided design (CAD) objects. Further exploration and in-depth adjustment of supervised
networks for single-shot 3D measurement led to the development of a depthwise-separable
model named DD-Inceptionv2-UNet, as introduced in [41].
In contrast to direct methods, indirect methods encompass various outputs that serve
as the link for determining the unwrapped phase map and the final depth map through
system calibration. These connecting outputs, often referred to as intermediate outputs,
typically originate from the traditional FPP technique. Such outputs consist of phase-
shifted fringe patterns, the numerators and denominators of the arctangent function, the
wrapped phase map, the unwrapped phase map, and integer fringe orders. In many
studies, researchers often convert the input fringe patterns into the numerators and denom-
inators of the arctangent function [42–45]. Alternatively, transforming the fringe patterns
into multiple phase-shifted patterns is an option based on the image-to-image transfor-
mation concept [46–49]. In addition to being predicted through supervised learning, the
n-step phase-shifting fringe patterns can also be generated using the virtual temporal
phase-shifting method by employing generative adversarial networks (GANs) [50]. How-
ever, employing the discrete cosine transform (DCT) for phase unwrapping from a single
frequency may result in a less accurate determination of the true phase. The need for em-
ploying a multi-frequency phase-shifting approach in the phase unwrapping process has
driven the exploration of encoding diverse information into a composite pattern [25,51–54].
Subsequently, the wrapped phase map can serve as either the input or output vector in co-
operation between fringe projection and learning-based networks [55–58]. Recognizing the
significance of the integer fringe orders in the phase unwrapping scheme, several methods
have trained CNN models to segment the integer fringe orders or predict the coarse phase
map [59–62]. In recent research, instead of employing multiple networks or a multi-stage
scheme to determine separate wrapped phases, fringe orders, or coarse phase maps, a
single network with multiple decoder branches has been developed to predict multiple
intermediate quantities for determining the unwrapped phase map [63–65]. In contrast to
simultaneously predicting multiple intermediate outputs, various approaches incorporate
multiple input types—such as a reference plane, additional fringe image, intermediate
fringe image, wrapped phase, fringe order, etc.—in addition to the fringe pattern, aiming
to further improve the accuracy of phase reconstruction [28,66–69].

3. Materials and Methods


The proposed method aims to use both supervised and unsupervised learning net-
works to transform fringe patterns into desired outputs in a nonlinear manner. The pre-
dicted outputs enable the reconstruction of phase information and 3D geometry. To achieve
this, an optimized GAN model and a supervised model were trained using datasets with
the help of the FPP technique. Dual-frequency and triple-frequency FPP schemes have
been adopted for generating the training labels. Details of the FPP technique and the data
labeling process are described below.

3.1. Fringe Projection Profilometry Technique with Dual-Frequency and Triple-Frequency Four-Step Phase-Shifting Schemes
The FPP method is a widely used technique in experimental mechanics and optical
metrology that is capable of measuring 3D shapes with high precision. This approach
involves projecting a sequence of fringe images that are phase-shifted onto objects or
scenes in a vertical or horizontal direction. The distorted fringe images are captured by a
camera, which automatically encodes them with phase and shape information. Figure 2
demonstrates a typical computational workflow of the FPP-based 3D imaging technique.

[Figure 2: dual-frequency phase-shifted fringe patterns (f_1 = 79, f_2 = 80) → wrapped phases → unwrapped phase → depth map]
Figure 2. Diagram illustrating the fringe projection profilometry method using a dual-frequency
four-step phase-shifting approach.

The fringe patterns captured by the camera can be expressed as follows [29]:
$$ I_i^j(u,v) = I_a(u,v) + I_b(u,v)\cos\!\left[\phi_i(u,v) + \delta_j\right] \tag{1} $$

where I, I_a, and I_b represent the pixel intensities of the captured patterns, the intensity background, and the fringe amplitude at a specific pixel location (u, v). The superscript j denotes the order of the phase-shifted image, with j ranging from 1 to 4 in the case of a four-step phase-shifting algorithm; the subscript i indicates the ith frequency; δ is the phase-shift amount, with δ_j = (j − 1)π/2. The value of ϕ_i(u, v) can be computed using the standard four-step phase-shifting algorithm.
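
As a concrete illustration of this step, the following minimal sketch (not the authors' code) computes the wrapped phase from four phase-shifted images of one frequency, assuming the shifts δ_j = 0, π/2, π, 3π/2 defined above; the function and argument names are placeholders.

```python
import numpy as np

def wrapped_phase_four_step(I1, I2, I3, I4):
    """Wrapped phase from four fringe images with shifts 0, pi/2, pi, 3*pi/2.

    With Equation (1), I4 - I2 = 2*Ib*sin(phi) and I1 - I3 = 2*Ib*cos(phi),
    so the wrapped phase follows from a four-quadrant arctangent,
    independent of the background Ia and the amplitude Ib.
    """
    return np.arctan2(I4 - I2, I1 - I3)
```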
The multi-frequency phase-shifting algorithm is commonly used in FPP 3D imaging
due to its ability to manage geometric discontinuities and overlapping objects with varying
height or depth information. Our proposed approach employs a dual-frequency four-
step (DFFS) phase-shifting scheme that uses two fringe frequencies, as well as the triple-
frequency four-step (TFFS) scheme, which involves three frequencies. When using the DFFS
phase-shifting scheme, the unwrapped phase can be obtained by satisfying the condition
that the difference between the two frequencies is one. In such cases, the equations that
govern the unwrapped phase can be expressed as follows [52]:

$$
\phi_{12}^{uw} = \phi_2^{w} - \phi_1^{w} +
\begin{cases} 0, & \phi_2^{w} \ge \phi_1^{w} \\ 2\pi, & \phi_2^{w} < \phi_1^{w} \end{cases}
\qquad
\phi = \phi_2^{uw} = \phi_2^{w} + \operatorname{INT}\!\left(\frac{\phi_{12}^{uw}\, f_2 - \phi_2^{w}}{2\pi}\right) 2\pi
\tag{2}
$$

Equation (2) describes the process of unwrapping the phase of two different frequencies, f_1 and f_2. It involves using their wrapped phases, ϕ_1^w and ϕ_2^w, respectively. However, since the initial unwrapped phase, ϕ_12^uw, is derived with only one fringe, it cannot be used directly due to the noise caused by the difference between the two frequencies. Instead, ϕ_12^uw serves as the interfering unwrapped phase for the hierarchical phase-unwrapping process of ϕ_2^uw. The final unwrapped phase, denoted as ϕ, corresponds to the phase distribution of the highest fringe frequency. It is noted that this study utilizes f_1 = 79 and f_2 = 80, which meet the requirements of the DFFS scheme.
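
A minimal sketch of the hierarchical unwrapping in Equation (2) is given below; it assumes f_2 − f_1 = 1 so that the difference phase spans exactly one fringe, and the function and variable names are illustrative rather than the authors' implementation.

```python
import numpy as np

def unwrap_dual_frequency(phi1_w, phi2_w, f2=80):
    """Hierarchical phase unwrapping for the DFFS scheme (Equation (2))."""
    # Difference phase between the two wrapped phases spans exactly one fringe.
    phi12_uw = phi2_w - phi1_w
    phi12_uw = np.where(phi12_uw < 0, phi12_uw + 2.0 * np.pi, phi12_uw)
    # Scale to frequency f2, then round to the nearest fringe order (INT).
    k = np.rint((phi12_uw * f2 - phi2_w) / (2.0 * np.pi))
    return phi2_w + 2.0 * np.pi * k
```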
The TFFS scheme adopts three fringe frequencies that must meet a specific condi-
tion to compute the unwrapped phase of the fringe patterns with the highest frequency.
Specifically, it requires that ( f 3 − f 2 ) − ( f 2 − f 1 ) = 1, where ( f 3 − f 2 ) > ( f 2 − f 1 ) > 0. The
unwrapped phase can be determined by a set of hierarchical equations [47,53]:

ϕ2w ⩾ ϕ1w

w 0
ϕ12 = ϕ2w − ϕ1w +
2π ϕ2w < ϕ1w
ϕ3w ⩾ ϕ2w

w 0
ϕ23 = ϕ3w − ϕ2w +
2π ϕ3w < ϕ2w
w ⩾ ϕw

w w 0 ϕ23 12
ϕ123 = ϕ23 − ϕ12 + w w (3)
2π ϕ23 < ϕ12
 w 
ϕ123 ( f 3 − f 2 ) − ϕ23
w
ϕ23 = ϕ23 + INT 2π

 
f
ϕ23 f −3 f − ϕ3w
ϕ = ϕ3uw = ϕ3w + INT 3 2 2π

The conditions for the TFFS scheme may seem more complicated, but they are designed to ensure that the highest-frequency fringe patterns can be accurately analyzed. Equation (3) involves two different types of phases, wrapped and unwrapped, denoted as ϕ^w and ϕ^uw, respectively. The function "INT" rounds its argument to the nearest integer. The term ϕ_mn represents the difference between the phase maps ϕ_m and ϕ_n, where the frequency difference (f_n − f_m) corresponds to the number of wrapped fringes in the resulting phase map. The algorithm exploits the fact that ϕ_123, which contains only one fringe in the pattern, is simultaneously wrapped and unwrapped, and this property enables a hierarchical phase-unwrapping process connecting ϕ_123 and ϕ_3 through ϕ_23. Finally, the highest-frequency phase, ϕ_3, is used for the final phase determination, as it provides the highest level of accuracy.
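
The same hierarchy can be sketched for the TFFS scheme; the helper names below are illustrative and assume (f_3 − f_2) − (f_2 − f_1) = 1, as required by Equation (3).

```python
import numpy as np

def wrapped_diff(phi_hi, phi_lo):
    """Wrapped difference of two wrapped phase maps, kept within [0, 2*pi)."""
    d = phi_hi - phi_lo
    return np.where(d < 0, d + 2.0 * np.pi, d)

def unwrap_against(phi_ref, scale, phi_w):
    """Unwrap phi_w using an already-unwrapped reference phase scaled by 'scale'."""
    k = np.rint((phi_ref * scale - phi_w) / (2.0 * np.pi))
    return phi_w + 2.0 * np.pi * k

def unwrap_triple_frequency(phi1_w, phi2_w, phi3_w, f1=61, f2=70, f3=80):
    """Hierarchical TFFS unwrapping following Equation (3); a sketch only."""
    phi12 = wrapped_diff(phi2_w, phi1_w)      # contains (f2 - f1) fringes
    phi23 = wrapped_diff(phi3_w, phi2_w)      # contains (f3 - f2) fringes
    phi123 = wrapped_diff(phi23, phi12)       # contains exactly one fringe
    phi23_uw = unwrap_against(phi123, f3 - f2, phi23)
    return unwrap_against(phi23_uw, f3 / (f3 - f2), phi3_w)
```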
To extract height or depth information, we utilize the FPP 3D imaging technique,
which directly reconstructs the data from the unwrapped phase obtained from Equation (2)
or Equation (3). The following explanation of the depth map extraction from ϕ is provided
in [70]:
$$
z = \frac{\mathbf{c}\,\{P_1\ \ P_2\}^{\intercal}}{\mathbf{d}\,\{P_1\ \ P_2\}^{\intercal}}
$$
$$
\mathbf{c} = \{1\ \ c_1\ \ c_2\ \ c_3\ \cdots\ c_{17}\ \ c_{18}\ \ c_{19}\}, \qquad
\mathbf{d} = \{d_0\ \ d_1\ \ d_2\ \ d_3\ \cdots\ d_{17}\ \ d_{18}\ \ d_{19}\}
\tag{4}
$$
$$
P_1 = \left\{1\ \ \phi\ \ u\ \ u\phi\ \ v\ \ v\phi\ \ u^2\ \ u^2\phi\ \ uv\ \ uv\phi\ \ v^2\ \ v^2\phi\right\}
$$
$$
P_2 = \left\{u^3\ \ u^3\phi\ \ u^2v\ \ u^2v\phi\ \ uv^2\ \ uv^2\phi\ \ v^3\ \ v^3\phi\right\}
$$

The parameters, denoted as c1 to c19 and d0 to d19 in the equation, are pre-determined
through a system calibration process. It is noteworthy that after the determination of the
height or depth z at each pixel coordinate (u, v), the other two coordinates, x and y, can be
directly determined from the calibrated camera model.
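
For illustration, the pixel-wise depth evaluation of Equation (4) can be sketched as below; the coordinate convention (u as column index, v as row index) and the function name are assumptions, and c and d are the 20-element calibration vectors described above.

```python
import numpy as np

def depth_from_phase(phi, c, d):
    """Depth map from the unwrapped phase via the calibration polynomial (Eq. (4))."""
    h, w = phi.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    # Geometric monomials up to third order; each appears with and without phi,
    # reproducing the 20 terms of {P1, P2}.
    geom = [np.ones_like(phi), u, v, u**2, u*v, v**2, u**3, u**2*v, u*v**2, v**3]
    P = np.stack([t for g in geom for t in (g, g * phi)], axis=-1)   # shape (h, w, 20)
    return (P @ np.asarray(c, dtype=np.float64)) / (P @ np.asarray(d, dtype=np.float64))
```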

3.2. Data Labeling Process


To assess the effectiveness of the proposed nonlinear transformation framework, we
employed a 3D camera (Model: RVC-X mini, RVBUST INC., Shenzhen, China) to capture
real datasets consisting of a collection of plaster sculptures. The sculptures were chosen for
their various geometric surfaces and shapes, with the aim of ensuring that the dataset was
diverse and representative of real-world objects. During the capture process, we randomly
placed objects in the scene, varying their heights and orientations relative to a reference
plane. To further increase the diversity of the dataset, they were grouped in two or more
and placed arbitrarily in the scene using the same method but with different orientations
and height/depth changes. Typically, objects in the scene were situated within a depth
range of −20 mm to 160 mm relative to the reference plane, which was about 850 mm away
from the camera baseline.
Following the described capture strategy, we obtained two separate datasets based on
the DFFS and TFFS schemes. The DFFS scheme captured a total of 2048 scenes with a spatial
resolution of 640 × 448. For each scene, we used two different frequencies and a four-step
phase-shifting scheme. Each scene was illuminated with eight uniform sinusoidal fringe
patterns captured simultaneously by the camera. The first fringe image of each frequency
serves as the single input image, denoted as I_79^1 and I_80^1, and all eight captured fringe images
form the output vector of the training network. Moreover, a traditional DFFS scheme was
used to obtain the ground-truth phase and depth maps for further comparisons. Figure 3
illustrates several examples of the training input–output pairs, as well as the ground-truth
unwrapped phase and depth map. Our proposed approach involves taking a single image
of a fringe pattern as an input and generating multiple phase-shifted fringe images, as
depicted in the second column and the third to fourth columns of Figure 3. Other images,
including grayscale images, ground-truth phase maps, and ground-truth depth maps, are
included to facilitate comparison of the final 3D reconstruction results.
In addition to the DFFS scheme, we implemented the TFFS scheme using three different
fringe frequencies, f 1 = 61, f 2 = 70, and f 3 = 80. This allows for validation of the proposed
nonlinear transformation technique with varying spacings in fringe patterns. The TFFS
datasets comprise a total of 1500 data samples, each with a resolution of 640 × 352, and
12 fringe images were recorded for each scene.
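
For clarity, the resulting training tensors have the following shapes under a channels-last (Keras) layout; the array names and the layout itself are assumptions made for illustration, not part of the released datasets.

```python
import numpy as np

# DFFS dataset: 2048 scenes at 640 x 448; one fringe in, eight phase-shifted fringes out.
x_dffs = np.zeros((2048, 448, 640, 1), dtype=np.float32)   # I_79^1 (or I_80^1)
y_dffs = np.zeros((2048, 448, 640, 8), dtype=np.float32)   # I_79^{1-4} and I_80^{1-4}

# TFFS dataset: 1500 scenes at 640 x 352; one fringe in, twelve fringes out.
x_tffs = np.zeros((1500, 352, 640, 1), dtype=np.float32)   # I_80^1
y_tffs = np.zeros((1500, 352, 640, 12), dtype=np.float32)  # I_61^{1-4}, I_70^{1-4}, I_80^{1-4}
```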

[Figure 3 columns: grayscale image | input image | output images (multiple phase-shifted fringes) | ground-truth phase | ground-truth depth]

Figure 3. Illustration of the input–output pairs used for training within the datasets employing the
DFFS scheme.

4. Supervised and Unsupervised Networks for Nonlinear Fringe Transformation


4.1. Network Architecture
Our proposed work achieves image-to-image translation through a supervised learn-
ing approach utilizing the UNet architecture, a widely recognized autoencoder-based
network specifically tailored for image segmentation tasks. The UNet architecture com-
prises an encoder and a decoder path, forming a U-shaped structure that includes a
bottleneck at the bottom connecting the encoder and decoder. The encoder path gradually
reduces the spatial dimension through max-pooling layers, while filter sizes are increased
to extract hierarchical features using convolution layers. The decoder path reverses these
operations using transposed convolutions for up-sampling. To preserve fine feature infor-
mation during up-sampling, skip connections are utilized to copy corresponding layers
from the encoder and concatenate them with the decoder path. Each convolution layer
within the UNet architecture incorporates a LeakyReLU activation function with an alpha
value set to 0.2, effectively addressing the zero-gradient problem commonly encountered
in training deep neural networks. The final stage of the process involves linear activa-
tion through a 1 × 1 convolution layer, which maps the extracted features to arbitrary
pixel values necessary for generating fringe patterns. For further insights into the UNet
model’s application in fringe-to-fringe transformation, detailed information can be found
in Refs. [47,52]. Figure 4A shows the supervised UNet model tailored for nonlinear fringe
transformation tasks.
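
A compact Keras sketch of such a UNet is shown below; the depth, filter counts, and input size are placeholders chosen for illustration (the paper's exact configuration appears in Figure 4A and Refs. [47,52]), but the LeakyReLU(0.2) activations, max-pooling encoder, transposed-convolution decoder, skip connections, and 1×1 linear output follow the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_unet(input_shape=(448, 640, 1), out_channels=8, base_filters=32):
    """Minimal UNet sketch for nonlinear fringe-to-fringe transformation."""
    inputs = keras.Input(shape=input_shape)
    x, skips = inputs, []
    for i in range(3):                                    # encoder
        x = layers.Conv2D(base_filters * 2**i, 3, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(base_filters * 8, 3, padding="same")(x)   # bottleneck
    x = layers.LeakyReLU(0.2)(x)
    for i in reversed(range(3)):                          # decoder with skip connections
        x = layers.Conv2DTranspose(base_filters * 2**i, 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[i]])
        x = layers.Conv2D(base_filters * 2**i, 3, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    outputs = layers.Conv2D(out_channels, 1, activation="linear")(x)  # 1x1 linear mapping
    return keras.Model(inputs, outputs, name="unet_fringe2fringe")
```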
In unsupervised learning, the image generation task involves using a deep convolu-
tional generative adversarial network (DCGAN) for nonlinear fringe generation based on
different-frequency fringe input. The DCGAN consists of a generator and a discriminator,
both trained simultaneously through adversarial training. The generator’s objective is to
create realistic fringes that can deceive the discriminator, while the discriminator aims to
accurately distinguish between real and generated/fake fringes. Through the adversarial
training process involving generator and discriminator loss functions, the weights are
updated, encouraging the generator to progressively enhance its ability to produce lifelike
fringes. Figure 4B visualizes the architecture of the unsupervised model DCGAN, which
includes two separate parts of the generator and discriminator.

[Figure 4(A) Supervised-learning model (UNet) for image-to-image translation. Legend: Convolution 3×3 + LeakyReLU; Max Pooling; Transpose Convolution; Convolution 1×1 + Linear; Concatenation; Copy.]
[Figure 4(B) Unsupervised-learning model (DCGAN) for image-to-image translation: the generator maps the input image to generated fringe images; the discriminator receives real images (ground-truth fringes) and generated fringes and labels them real or fake, fine-tuning the training.]
Figure 4. Architecture of the supervised UNet (A) and unsupervised DCGAN (B) models utilized in
the training process.

The generator’s architecture is similar to that of the UNet model mentioned earlier,
with the objective of generating fringe patterns closely resembling the ground-truth fringes
(real fringes). The generator comprises a series of transposed convolutional layers that
up-sample the input noise vector to generate the final output images. The generator also
contains skip connections that copy the corresponding layers from the encoder, concatenat-
ing them with the decoder path to preserve fine feature information during upsampling.
The discriminator, on the other hand, takes both generated and real fringes as input vec-
tors, discerning between them. The discriminator incorporates convolution layers with
a stride of 2 to extract feature information from the input feature map. Each convolution
layer undergoes batch normalization and applies the LeakyReLU activation function to
introduce nonlinearity. The final output layer uses a sigmoid activation function for binary
cross-entropy loss, generating a probability score that indicates the likelihood that the
input image is either real or generated/fake. In particular, probability scores approaching 1
indicate real fringes, while values approaching 0 signify generated fringes. The adversarial
training process between the generator and discriminator continues until the generator can
produce lifelike fringes that deceive the discriminator into believing they are real.
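
The discriminator can be sketched in Keras as below; layer counts and filter sizes are assumptions, while the stride-2 convolutions, batch normalization, LeakyReLU activations, and sigmoid patch-level output mirror the description above (the PatchGAN output used here is detailed in Section 4.2).

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_patchgan_discriminator(input_shape=(448, 640, 8)):
    """Sketch of a PatchGAN-style discriminator for real/generated fringe stacks."""
    fringes = keras.Input(shape=input_shape)
    x = fringes
    for filters in (64, 128, 256):
        # Stride-2 convolutions extract features while halving the resolution.
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    # Patch-level probability map: values near 1 mean "real", near 0 mean "generated".
    patch_scores = layers.Conv2D(1, 4, padding="same", activation="sigmoid")(x)
    return keras.Model(fringes, patch_scores, name="patchgan_discriminator")
```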

4.2. Hyperparameter Tuning and Training Process


The process of capturing and processing image datasets requires a great deal of effort
and attention. In this case, an RVC-X camera was used to capture the raw images, followed
by a pre-processing step to prepare the input–output pair for the training process. To
ensure that the input images are in an appropriate format, the input fringe images undergo
a normalization to a scale of [−1, 1]. Meanwhile, the output images could either be in
the raw format combined with a linear activation function or scaled down to [−1, 1] with
the tanh activation function. This step is essential to ensure that the input and output
images are comparable and that the neural network can accurately learn the relationship
between them.
Supervised learning involves partitioning the entire dataset into three sets—the train-
ing set, the validation set, and the test set. In this work, the partitioning is 80%:10%:10%. A
Keras framework handles the internal split between the training and validation sets. During
the supervised learning phase, the Adam optimizer was employed with an initial learning
rate of 0.0001 for 300 learning epochs. After that, a step decay schedule was utilized for the
last 100 epochs to gradually diminish the learning rate, aiding in the convergence of the
network. The batch size was set to 1, and the loss function for the image regression task is
the mean squared error (MSE).
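
The supervised training loop can be summarized by the following Keras sketch; build_unet, x_train, and y_train refer to the earlier sketches and are placeholders, and the exact form of the step-decay schedule (here halving every 25 epochs after epoch 300) is an assumption, since only its presence is stated above.

```python
from tensorflow import keras

model = build_unet()          # from the earlier sketch
initial_lr = 1e-4

def schedule(epoch, lr):
    # Constant rate for the first 300 epochs, then an (assumed) step decay.
    if epoch < 300:
        return initial_lr
    return initial_lr * 0.5 ** ((epoch - 300) // 25 + 1)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=initial_lr), loss="mse")
model.fit(x_train, y_train,
          validation_split=1/9,   # 10% of the full set once the 10% test split is held out
          batch_size=1, epochs=400,
          callbacks=[keras.callbacks.LearningRateScheduler(schedule)])
```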
In the unsupervised learning scenario, the dataset was divided into training and test
sets using a ratio of 90%:10%. The Adam optimizer, with a learning rate of 0.0002 and a
beta value of 0.5, was applied to both the discriminator and the entire DCGAN model.
The discriminator utilized the MSE as the loss function, focusing on distinguishing real
fringes from the generated ones. The DCGAN loss is a combination of sigmoid cross-
entropy and L1 loss (mean absolute error—MAE), with a weight loss ratio of 1 versus
100 (i.e., LAMBDA = 100). PatchGAN, a typical discriminator architecture for image-
to-image translation, was employed to discern real fringes or generated fringes at the
patch level, using arrays of 1’s and 0’s instead of a single output, as seen in traditional
discriminators. The entire learning process involves 200 iterative epochs with a batch size of
2. Notably, samples are randomly selected internally in each epoch. In both the supervised
and unsupervised learning processes, a 10% split test set is completely separated from
the training and validation datasets. This ensures that the object surface has not been
encountered during training or validation, thereby mitigating possible overfitting issues
and biased evaluations.
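
One adversarial update consistent with these settings can be sketched as follows; it reuses the build_unet and build_patchgan_discriminator sketches as generator and discriminator, and the exact wiring of the authors' DCGAN may differ.

```python
import tensorflow as tf
from tensorflow import keras

LAMBDA = 100.0
bce = keras.losses.BinaryCrossentropy()
mse = keras.losses.MeanSquaredError()
mae = keras.losses.MeanAbsoluteError()

generator = build_unet()                         # generator mirrors the UNet sketch
discriminator = build_patchgan_discriminator()
g_opt = keras.optimizers.Adam(2e-4, beta_1=0.5)
d_opt = keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(input_fringe, real_fringes):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_fringes = generator(input_fringe, training=True)
        real_score = discriminator(real_fringes, training=True)
        fake_score = discriminator(fake_fringes, training=True)
        # Discriminator: MSE between patch scores and real(1)/generated(0) labels.
        d_loss = mse(tf.ones_like(real_score), real_score) + \
                 mse(tf.zeros_like(fake_score), fake_score)
        # Generator: adversarial cross-entropy plus L1 fidelity, weighted 1:100.
        g_loss = bce(tf.ones_like(fake_score), fake_score) + \
                 LAMBDA * mae(real_fringes, fake_fringes)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss
```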
Multiple graphics processing unit (GPU) nodes within the Biowulf cluster at the
National Institutes of Health (NIH) were required to train the models. Specifically, the two
main GPU nodes utilized in this project consist of 4 × NVIDIA A100 GPUs with 80 GB
VRAM each and 4 × NVIDIA V100-SXM2 GPUs with 32GB VRAM each. The programming
framework relies on Python 3.10.8, and the deep learning Keras framework version 2.11.0 is
employed to construct the network architecture, as well as for data prediction and analysis.
It is worth noting that the training time for the DCGAN model was 9 h using NVIDIA A100
GPUs with a batch size of 2 and 200 epochs. In contrast, the UNet model required less than
6 h for 400 epochs. This highlights the importance of choosing the right model architecture
and training parameters to optimize the training time without compromising the quality of
the results.

5. Experiments and Results


5.1. Assessment of Image Quality for Generated Fringe Patterns
Our proposed approach for achieving accurate 3D reconstruction begins with a nonlin-
ear fringe transformation utilizing cutting-edge image-to-image conversion techniques. To
ensure visual fidelity in the predicted fringe patterns, we employ a range of robust image
quality assessment methods. Notably, we use the structural similarity index measurement
(SSIM) and peak signal-to-noise ratio (PSNR) to assess the quality disparity between real
and generated fringes. Through this methodology, we are assured of our ability to deliver
the most accurate and reliable 3D reconstructions possible.
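
These two metrics can be computed per image pair with scikit-image, as in the short sketch below (assuming 8-bit grayscale fringe images; the function name is a placeholder).

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def fringe_quality(real, generated):
    """SSIM and PSNR between a ground-truth fringe image and a predicted one."""
    real = np.asarray(real, dtype=np.uint8)
    generated = np.asarray(generated, dtype=np.uint8)
    ssim = structural_similarity(real, generated, data_range=255)
    psnr = peak_signal_noise_ratio(real, generated, data_range=255)
    return ssim, psnr
```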
This experiment evaluates the image quality and quantitative metrics of generated
fringe patterns using two deep neural network models, specifically, the UNet and DCGAN.
Figure 5 illustrates a comparative analysis of the performance of these models. The UNet
model exhibited superior performance by accurately producing the initial four fringes of I_79^{1−4} with high SSIM scores ranging from 0.998 to 1.000 and PSNR values exceeding 40. This success was attributed to the selection of an appropriate fringe input, I_79^1, and the network's ability to accurately shift the fringe patterns by a phase-shift amount of π/2. The initial predicted fringe closely resembled the input, resulting in the highest SSIM and PSNR scores. Furthermore, the UNet model effectively predicted the subsequent four fringes with a different frequency of f_2 = 80, representing a nonlinear fringe transformation task where the number of fringes in the generated images increased by one and the images underwent a phase shift of π/2. Despite its relatively lower SSIM and PSNR values,
the UNet model demonstrated its ability to generate satisfactory fringe patterns for the
final 3D reconstruction process.

[Figure 5 data: SSIM and PSNR of the predicted fringe images]
(A) UNet nonlinear fringe transformation:
  I_79^1: SSIM 1.000, PSNR 53.007   I_79^2: SSIM 0.998, PSNR 42.011   I_79^3: SSIM 0.997, PSNR 39.989   I_79^4: SSIM 0.998, PSNR 42.079
  I_80^1: SSIM 0.997, PSNR 40.392   I_80^2: SSIM 0.998, PSNR 41.273   I_80^3: SSIM 0.999, PSNR 43.510   I_80^4: SSIM 0.998, PSNR 42.535
(B) DCGAN nonlinear fringe transformation:
  I_79^1: SSIM 0.998, PSNR 34.401   I_79^2: SSIM 0.995, PSNR 35.200   I_79^3: SSIM 0.992, PSNR 32.204   I_79^4: SSIM 0.994, PSNR 32.606
  I_80^1: SSIM 0.964, PSNR 27.980   I_80^2: SSIM 0.978, PSNR 30.890   I_80^3: SSIM 0.990, PSNR 36.082   I_80^4: SSIM 0.996, PSNR 34.451
Figure 5. Assessment of image quality using the SSIM and PSNR metrics for predicted fringe images.

In comparison, the DCGAN model demonstrated an inferior performance to its supervised UNet counterpart. The eight generated fringes and their corresponding SSIM and
PSNR values indicated a decline in quality compared to the real fringes. The initial four
fringes at the same frequency f 1 = 79 as the fringe input were predicted correctly, but the
subsequent four fringes at frequency f 2 = 80 showed a deterioration in quality compared
to the real fringes. Notably, the degradation of the four fringes I_80^{1−4} was evident in the last
row, where several vertical white pixels were visible.

5.2. The 3D Reconstruction of a Single Object and Multiple Objects via the DFFS Scheme
After successfully training the model, we proceeded to utilize a single fringe to gener-
ate new fringe patterns. The experiment employed the DFFS scheme with I_79^1 as the initial fringe input. The trained model produced eight fringes of I_79^{1−4} and I_80^{1−4} as outputs. Figure 6
shows a few representative results obtained by using two different approaches: UNet and
DCGAN. The first column in the figure presents the fringe input to the network, followed
by the ground-truth 3D shape generated by the conventional DFFS scheme. Subsequently,
the third and fourth columns display the 3D reconstruction, highlighting key deviations
from the reference 3D shape.
To evaluate the model’s performance, we conducted a comparative analysis between
the UNet and DCGAN schemes. While both approaches successfully rendered an overall
3D representation of the targets, distinctions in their performance emerged. Upon closer
examination of the generated 3D shapes, it became evident that the DCGAN model pro-
duced more detailed depth information than the UNet did, as the region highlighted by
a green square shows. Moreover, the region highlighted by a green circle indicated that
the UNet model struggled to generate the desired 3D shape due to incorrect fringe orders.
The region in the yellow circle further shows the UNet’s deficiency in capturing certain
depth details owing to inaccuracies in fringe ordering. Additionally, we noted some shape
discontinuities along the edges for both approaches.

[Figure 6 columns: input image | ground-truth 3D | UNet 3D | DCGAN 3D]

Figure 6. Comparison of the 3D reconstruction results between UNet and DCGAN models when
utilizing the DFFS scheme.
5.3. Investigation of the Frequency-Dependent Fringe Input of I_79^1 and I_80^1
There is a growing concern about how using different fringe inputs with different
frequencies could affect the reconstruction of 3D shapes. To investigate this issue, two
additional networks, UNet and DCGAN, were trained using one fringe input I_80^1 while keeping the output fringes as either I_79^{1−4} or I_80^{1−4}. Figure 7 is divided into two sections to
make it easier to visually compare the use of different fringe inputs.
Based on the visual observations, it appears that the performance of both the UNet
and DCGAN models is inferior when using the fringe input I_80^1 in comparison with I_79^1.
This discrepancy arises from the surfaces of the shapes appearing blurred due to sub-
sequent post-processing steps aimed at filling gaps and eliminating shape irregularities.
It is important to note that these post-processing steps are applied uniformly to all re-
construction regions with identical parameters to ensure a fair visual comparison. For a
more detailed examination of single-object reconstruction, the green arrows highlight the
rough nature of the reconstructions and their inability to capture finer details when the
network operates with a fringe frequency of f = 80. Furthermore, the red arrows point out
shape-disconnected regions and missing areas in the reconstructed object when using the
fringe input I_80^1.
Overall, the observations indicate that employing the fringe input I_80^1 leads to inferior shape reconstructions characterized by disconnected regions and a deficiency in fine details. Conversely, utilizing the fringe input I_79^1 yields a clearer and more accurate shape representation.

[Figure 7 panels: (A) fringe input with f = 79; (B) fringe input with f = 80. Rows: input image, ground-truth 3D, UNet 3D, DCGAN 3D.]
Figure 7. Comparison of 3D reconstruction using distinct fringe inputs at frequencies f = 79 and f = 80.

5.4. The 3D Reconstruction of a Single-Object Scene via the TFFS Scheme


The previous tests were carried out using the DFFS scheme, which involves a nonlinear
fringe transformation to generate fringe patterns with two distinct frequencies, and the
frequency difference is exactly one. However, it is unclear if deep learning networks can
effortlessly handle multiple frequencies. Therefore, this experiment uses the TFFS scheme to
perform a more complex nonlinear transformation task. In the TFFS scheme, the differences
in fringe frequencies are much larger; specifically, the following three frequencies are used:
f_1 = 61, f_2 = 70, and f_3 = 80. In this particular experiment, the fringe input is I_80^1 and the output arrays consist of 12 fringes, namely, I_61^{1−4}, I_70^{1−4}, and I_80^{1−4}.
The results of 3D reconstruction using the TFFS scheme are presented in Figure 8.
Again, a comparison between the UNet and DCGAN models for nonlinear fringe transfor-
mation is conducted. Upon analysis, it is evident that the unsupervised learning approach
utilized by the DCGAN model outperforms the UNet model.
The supervised learning approach is limited in its ability to capture objects’ shape
details accurately, and it introduces considerable noise and discontinuities along the edges.
In contrast, the unsupervised learning approach employed by DCGAN excels in recon-
structing most of the shape details, resembling the ground-truth 3D shape with a higher
accuracy. The DCGAN model’s ability to learn from unlabelled data is the key to its su-
perior performance, as it can capture more complex features and patterns in the object’s
shape, leading to higher accuracy and better reconstruction results.

[Figure 8 columns: single input | ground-truth (GT) 3D | UNet 3D | DCGAN 3D]

Figure 8. Visualization of 3D reconstruction in single-object scenes utilizing the TFFS scheme.

6. Discussion
This article presents an innovative method for reconstructing 3D shapes by integrating structured light and deep learning. The proposed approach trains a network to transform a single fringe pattern into multiple phase-shifted patterns with varying frequencies,
followed by a conventional algorithm for subsequent 3D reconstruction. Validation of
the technique was conducted on two datasets, employing both supervised and unsuper-
vised deep learning networks to execute the nonlinear fringe transformation. The results
showed that the unsupervised learning strategy using the DCGAN model outperformed
the supervised UNet model on both datasets.
During the training and prediction stages, both supervised and unsupervised learning
models tend to produce some incorrect fringe orders when obtaining the unwrapped
phase. These errors lead to the generation of multiple layers of 3D shapes and irrelevant
scattering point clouds. However, these limitations can be addressed with various auto-
matic noise-removing techniques. It is noteworthy that prior state-of-the-art techniques,
which had sufficient input information, such as encoded composite fringes, multiple fringes,
or fringes with a reference image, did not encounter such issues. Nevertheless, the elimi-
nation of irrelevant point clouds can also be achieved manually through post-processing
steps. The comparison between UNet and DCGAN also highlighted the importance of
obtaining accurate fringe orders in generating detailed 3D shapes, which is an obvious
requirement. It is suggested that further exploration in this area could lead to more precise
3D reconstruction results.
In our proposed approach, we utilized the DFFS and TFFS schemes with high-
frequency fringe patterns for both training and testing scenarios. The use of low-frequency
(e.g., 4, 8, etc.) fringe images is undesirable, as they typically result in lower accuracy
in phase determinations. As demonstrated in our previous work, low-frequency fringe
patterns yield inferior 3D reconstruction results compared to high-frequency fringe patterns
in the image-to-depth construction process [34].

This study used both an unsupervised DCGAN and the conditional pix2pix model [71]
for image-to-image translation and compared their performance. Pix2pix is a type of
conditional GAN where the input is combined with both real and generated images to
distinguish between real and fake. However, the pix2pix model was observed to be less
accurate in generating the 3D shape of the object and introduced more scattered noise and
incorrect layers. It should be noted that the pix2pix model was trained in parallel with
the DCGAN model, but none of its results outperformed the UNet and DCGAN models.
Therefore, we have left out the relevant descriptions. It is also noted that the DCGAN
yields some unexpected artifacts, which can be seen in Figure 5. Although such artifacts are
common in GAN models, our experimental results have demonstrated that these generative
traits effectively cope with the nonlinear fringe transformation task, whereas the supervised
learning model falls short.
Given that the proposed networks are trained on robust GPU cards, i.e., NVIDIA
A100 and V100-SXM2, equipped with ample VRAM, the likelihood of encountering out-of-
memory issues during training is significantly reduced. However, rather than generating
multiple intermediate outputs in the form of phase-shifted fringe patterns, a more efficient
strategy could be producing the numerators and denominators used in subsequent phase
calculation. This approach can help cope with the potential memory-related challenges.
In this specific situation, the memory required for the outputs can be reduced to half. It
is worth mentioning that training for 400 epochs in the supervised learning process is
significantly faster than training for 200 epochs in the unsupervised learning approach.
This discrepancy arises because the supervised learning approach loads all of the training
and validation datasets into the VRAM at once and conducts internal splitting, whereas
the unsupervised learning approach automatically draws a small batch randomly into the
VRAM for training. Additionally, the inclusion of an additional discriminator architecture,
along with the fine-tuning process, adds to the computational cost of the unsupervised
learning approach.
We acknowledge that this paper lacks a thorough comparison with other cutting-edge
techniques for 3D reconstruction that integrate fringe projection and deep learning. Ob-
taining sample codes from these techniques is challenging, and each method involves
its own hyperparameter tuning and network construction. Consequently, the proposed
approach only utilizes the most popular network for fringe-to-fringe transformation, specif-
ically, an autoencoder-based UNet, to compare it with the unsupervised-learning-based
DCGAN. Furthermore, the proposed technique primarily serves as a proof of concept for
nonlinear fringe transformation using deep learning. More comprehensive comparisons
can be performed in future work as the field progresses. Future research could explore
more rigorous comparisons and strive to enhance the accuracy of the nonlinear fringe
transformation method.
Given the inherent complexity in constructing robust network architectures and tuning
deep learning models, we recognize that future datasets should include a wider variety of
objects and more challenging scenes with diverse testing conditions to address the concern
of possible overfitting issues. We are committed to expanding our dataset collection efforts
to provide the research community with more comprehensive and representative datasets.

7. Conclusions
In summary, this manuscript introduces a novel 3D reconstruction approach via a
nonlinear fringe transformation method that combines fringe projection with deep learning.
This technique utilizes a single high-frequency fringe image as input and generates multiple
high-frequency fringe images with varying frequencies and phase shift amounts. These
intermediate outputs facilitate the subsequent 3D reconstruction process through the
traditional FPP technique. Since the proposed method requires only a single fringe image
to reconstruct the 3D shapes of objects with good accuracy, it provides great potential
for various dynamic applications, such as 3D body scanning, heritage and preservation
scanning, virtual clothing try-ons, indoor mapping, and beyond. Regarding future works,
there is a wide range of possibilities for exploring and enhancing the proposed approach.
One possible direction is to investigate the method’s performance on larger datasets and
more complex objects. Moreover, researchers can explore different imaging configurations
to enhance reconstruction quality and accuracy. Additionally, the potential of this method
is worth exploring in other domains, such as medical imaging or robotics.

Author Contributions: Conceptualization, A.-H.N.; methodology, A.-H.N. and Z.W.; software, A.-H.N. and Z.W.; validation, A.-H.N.; formal analysis, A.-H.N. and Z.W.; investigation, A.-H.N.;
resources, A.-H.N.; data curation, A.-H.N. and Z.W.; writing—original draft preparation, A.-H.N. and
Z.W.; writing—review and editing, A.-H.N. and Z.W.; visualization, A.-H.N.; project administration,
Z.W. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the United States Army Research Office under grant
W911NF-23-1-0367.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available upon request from the
corresponding author.
Acknowledgments: This work utilized the computational resources of the NIH HPC Biowulf cluster
(http://hpc.nih.gov, accessed on 1 November 2023).
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Kim, S.J.; Lin, H.T.; Lu, Z.; Süsstrunk, S.; Lin, S.; Brown, M.S. A New In-Camera Imaging Model for Color Computer Vision and
Its Application. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2289–2302. [PubMed]
2. Kim, J.; Gurdjos, P.; Kweon, I. Geometric and algebraic constraints of projected concentric circles and their applications to camera
calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 637–642.
3. Fleck, S.; Straßer, W. Smart Camera Based Monitoring System and Its Application to Assisted Living. Proc. IEEE 2008, 96, 1698–1714.
[CrossRef]
4. Capel, D.; Zisserman, A. Computer vision applied to super resolution. IEEE Signal Process. Mag. 2003, 20, 75–86. [CrossRef]
5. Kolb, A.; Barth, E.; Koch, R.; Larsen, R. Time-of-Flight Cameras in Computer Graphics. Comput. Graph. Forum 2010, 29, 141–159.
[CrossRef]
6. Wang, Z.; Kieu, H.; Nguyen, H.; Le, M. Digital image correlation in experimental mechanics and image registration in computer
vision: Similarities, differences and complements. Opt. Lasers Eng. 2015, 65, 18–27. [CrossRef]
7. Nguyen, H.; Wang, Z.; Jones, P.; Zhao, B. Accurate 3D shape measurement of multiple separate objects with stereo vision. Appl.
Opt. 2017, 56, 9030–9037. [CrossRef] [PubMed]
8. Westoby, M.; Brasington, J.; Glasser, N.; Hambrey, M.; Reynolds, J. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective
tool for geoscience applications. Geomorphology 2012, 179, 300–314. [CrossRef]
9. Geng, J. Structured-light 3D surface imaging: A tutorial. Adv. Opt. Photonics 2011, 3, 128–160. [CrossRef]
10. Osten, W.; Faridian, A.; Gao, P.; Körner, K.; Naik, D.; Pedrini, G.; Singh, A.K.; Takeda, M.; Wilke, M. Recent advances in digital
holography [invited]. Appl. Opt. 2014, 53, G44–G63. [CrossRef]
11. Shen, S. Accurate Multiple View 3D Reconstruction Using Patch-Based Stereo for Large-Scale Scenes. IEEE Trans. Image Process.
2013, 22, 1901–1914. [CrossRef] [PubMed]
12. Han, X.F.; Laga, H.; Bennamoun, M. Image-Based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning
Era. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1578–1604. [CrossRef] [PubMed]
13. Chen, J.; Kira, Z.; Cho, Y. Deep Learning Approach to Point Cloud Scene Understanding for Automated Scan to 3D Reconstruction.
J. Comput. Civ. Eng. 2019, 33, 04019027. [CrossRef]
14. Zhu, Z.; Wang, X.; Bai, S.; Yao, C.; Bai, X. Deep Learning Representation using Autoencoder for 3D Shape Retrieval. Neurocomputing
2016, 204, 41–50. [CrossRef]
15. Wang, G.; Ye, J.; Man, B. Deep learning for tomographic image reconstruction. Nat. Mach. Intell. 2020, 2, 737–748. [CrossRef]
16. Zuo, C.; Qian, J.; Feng, S.; Yin, W.; Li, Y.; Fan, P.; Han, J.; Qian, K.; Chen, Q. Deep learning in optical metrology: A review. Light
Sci. Appl. 2022, 11, 39. [CrossRef] [PubMed]
17. Zhang, T.; Jiang, S.; Zhao, Z.; Dixit, K.; Zhou, X.; Hou, J.; Zhang, Y.; Yan, C. Rapid and robust two-dimensional phase unwrapping
via deep learning. Opt. Express 2019, 27, 23173–23185. [CrossRef] [PubMed]
18. Maggipinto, M.; Terzi, M.; Masiero, C.; Beghi, A.; Susto, G.A. A Computer Vision-Inspired Deep Learning Architecture for Virtual
Metrology Modeling with 2-Dimensional Data. IEEE Trans. Semicond. Manuf. 2018, 31, 376–384. [CrossRef]
19. Catalucci, S.; Thompson, A.; Piano, S.; Branson, D.T., III; Leach, R. Optical metrology for digital manufacturing: A review. Int. J.
Adv. Manuf. Technol. 2022, 120, 4271–4290. [CrossRef]
20. Nguyen, H.; Wang, Y.; Wang, Z. Single-Shot 3D Shape Reconstruction Using Structured Light and Deep Convolutional Neural
Networks. Sensors 2020, 20, 3718. [CrossRef] [PubMed]
21. Yang, R.; Li, Y.; Zeng, D.; Guo, P. Deep DIC: Deep learning-based digital image correlation for end-to-end displacement and strain
measurement. J. Mater. Process. Technol. 2022, 302, 117474. [CrossRef]
22. Nguyen, H.; Tran, T.; Wang, Y.; Wang, Z. Three-dimensional Shape Reconstruction from Single-shot Speckle Image Using Deep
Convolutional Neural Networks. Opt. Lasers Eng. 2021, 143, 106639. [CrossRef]
23. Feng, S.; Zuo, C.; Zhang, L.; Yin, W.; Chen, Q. Generalized framework for non-sinusoidal fringe analysis using deep learning.
Photonics Res. 2021, 9, 1084–1098. [CrossRef]
24. Yan, K.; Yu, Y.; Huang, C.; Sui, L.; Qian, K.; Asundi, A. Fringe pattern denoising based on deep learning. Opt. Commun. 2019, 437, 148–152. [CrossRef]
25. Li, Y.; Qian, J.; Feng, S.; Chen, Q.; Zuo, C. Composite fringe projection deep learning profilometry for single-shot absolute 3D
shape measurement. Opt. Express 2022, 30, 3424–3442. [CrossRef] [PubMed]
26. Jeught, S.; Dirckx, J. Deep neural networks for single shot structured light profilometry. Opt. Express 2019, 27, 17091–17101.
[CrossRef] [PubMed]
27. Shi, J.; Zhu, X.; Wang, H.; Song, L.; Guo, Q. Label enhanced and patch based deep learning for phase retrieval from single frame
fringe pattern in fringe projection 3D measurement. Opt. Express 2019, 27, 28929–28943. [CrossRef] [PubMed]
28. Feng, S.; Chen, Q.; Gu, G.; Tao, T.; Zhang, L.; Hu, Y.; Yin, W.; Zuo, C. Fringe pattern analysis using deep learning. Adv. Photonics
2019, 1, 025001. [CrossRef]
29. Nguyen, H.; Dunne, N.; Li, H.; Wang, Y.; Wang, Z. Real-time 3D shape measurement using 3LCD projection and deep machine
learning. Appl. Opt. 2019, 58, 7100–7109. [CrossRef] [PubMed]
30. Zheng, Y.; Wang, S.; Li, Q.; Li, B. Fringe projection profilometry by conducting deep learning from its digital twin. Opt. Express
2020, 28, 36568–36583. [CrossRef] [PubMed]
31. Fan, S.; Liu, S.; Zhang, X.; Huang, H.; Liu, W.; Jin, P. Unsupervised deep learning for 3D reconstruction with dual-frequency
fringe projection profilometry. Opt. Express 2021, 29, 32547–32567. [CrossRef]
32. Wang, F.; Wang, C.; Guan, Q. Single-shot fringe projection profilometry based on deep learning and computer graphics. Opt.
Express 2021, 29, 8024–8040. [CrossRef] [PubMed]
33. Nguyen, H.; Ly, K.; Tran, T.; Wang, Y.; Wang, Z. hNet: Single-shot 3D shape reconstruction using structured light and h-shaped
global guidance network. Results Opt. 2021, 4, 100104. [CrossRef]
34. Nguyen, A.; Sun, B.; Li, C.; Wang, Z. Different structured-light patterns in single-shot 2D-to-3D image conversion using deep
learning. Appl. Opt. 2022, 61, 10105–10115. [CrossRef] [PubMed]
35. Wang, L.; Lu, D.; Tao, J.; Qiu, R. Single-shot structured light projection profilometry with SwinConvUNet. Opt. Eng. 2022,
61, 114101.
36. Wang, C.; Zhou, P.; Zhu, J. Deep learning-based end-to-end 3D depth recovery from a single-frame fringe pattern with the
MSUNet++ network. Opt. Express 2023, 31, 33287–33298. [CrossRef] [PubMed]
37. Zhu, X.; Han, Z.; Zhang, Z.; Song, L.; Wang, H.; Guo, Q. PCTNet: Depth estimation from single structured light image with a
parallel CNN-transformer network. Meas. Sci. Technol. 2023, 34, 085402. [CrossRef]
38. Wu, Y.; Wang, Z.; Liu, L.; Yang, N.; Zhao, X.; Wang, A. Depth acquisition from dual-frequency fringes based on end-to-end
learning. Meas. Sci. Technol. 2024, 35, 045203. [CrossRef]
39. Song, X.; Wang, L. Dual-stage hybrid network for single-shot fringe projection profilometry based on a phase-height model. Opt.
Express 2024, 32, 891–906. [CrossRef] [PubMed]
40. Ravi, V.; Gorthi, R. LiteF2DNet: A lightweight learning framework for 3D reconstruction using fringe projection profilometry.
Appl. Opt. 2023, 62, 3215–3224. [CrossRef] [PubMed]
41. Wang, L.; Xue, W.; Wang, C.; Gao, Q.; Liang, W.; Zhang, Y. Depth estimation from a single-shot fringe pattern based on
DD-Inceptionv2-UNet. Appl. Opt. 2023, 62, 9144–9155. [CrossRef] [PubMed]
42. Zhao, P.; Gai, S.; Chen, Y.; Long, W.; Da, F. A multi-code 3D measurement technique based on deep learning. Opt. Lasers Eng.
2021, 143, 106623.
43. Feng, S.; Zuo, C.; Yin, W.; Gu, G.; Chen, Q. Micro deep learning profilometry for high-speed 3D surface imaging. Opt. Lasers Eng.
2019, 121, 416–427. [CrossRef]
44. Liu, X.; Yang, L.; Chu, X.; Zhou, L. A novel phase unwrapping method for binocular structured light 3D reconstruction based on
deep learning. Optik 2023, 279, 170727. [CrossRef]
45. Nguyen, A.; Wang, Z. Time-Distributed Framework for 3D Reconstruction Integrating Fringe Projection with Deep Learning.
Sensors 2023, 23, 7284. [CrossRef] [PubMed]
46. Yu, H.; Chen, X.; Zhang, Z.; Zuo, C.; Yang, Y.; Zheng, D.; Han, J. Dynamic 3-D measurement based on fringe-to-fringe
transformation using deep learning. Opt. Express 2020, 28, 9405–9418. [CrossRef] [PubMed]
47. Nguyen, H.; Wang, Z. Accurate 3D Shape Reconstruction from Single Structured-Light Image via Fringe-to-Fringe Network.
Photonics 2021, 8, 459. [CrossRef]
Sensors 2024, 24, 3246 18 of 18

48. Yang, Y.; Hou, Q.; Li, Y.; Cai, Z.; Liu, X.; Xi, J.; Peng, X. Phase error compensation based on Tree-Net using deep learning. Opt.
Lasers Eng. 2021, 143, 106628. [CrossRef]
49. Qi, Z.; Liu, X.; Pang, J.; Hao, Y.; Hu, R.; Zhang, Y. PSNet: A Deep Learning Model-Based Single-Shot Digital Phase-Shifting
Algorithm. Sensors 2023, 23, 8305. [CrossRef]
50. Yan, K.; Khan, A.; Asundi, A.; Zhang, Y.; Yu, Y. Virtual temporal phase-shifting phase extraction using generative adversarial
networks. Appl. Opt. 2022, 61, 2525–2535. [CrossRef]
51. Fu, Y.; Huang, Y.; Xiao, W.; Li, F.; Li, Y.; Zuo, P. Deep learning-based binocular composite color fringe projection profilometry for
fast 3D measurements. Opt. Lasers Eng. 2024, 172, 107866. [CrossRef]
52. Nguyen, H.; Ly, K.; Li, C.; Wang, Z. Single-shot 3D shape acquisition using a learning-based structured-light technique. Appl. Opt.
2022, 61, 8589–8599. [CrossRef] [PubMed]
53. Nguyen, H.; Novak, E.; Wang, Z. Accurate 3D reconstruction via fringe-to-phase network. Measurement 2022, 190, 110663.
[CrossRef]
54. Yu, R.; Yu, H.; Sun, W.; Akhtar, N. Color phase order coding and interleaved phase unwrapping for three-dimensional shape
measurement with few projected pattern. Opt. Laser Technol. 2024, 168, 109842. [CrossRef]
55. Liang, J.; Zhang, J.; Shao, J.; Song, B.; Yao, B.; Liang, R. Deep Convolutional Neural Network Phase Unwrapping for Fringe
Projection 3D Imaging. Sensors 2020, 20, 3691. [CrossRef] [PubMed]
56. Hu, W.; Miao, H.; Yan, K.; Fu, Y. A Fringe Phase Extraction Method Based on Neural Network. Sensors 2021, 21, 1664. [CrossRef]
[PubMed]
57. Wang, M.; Kong, L. Single-shot 3D measurement of highly reflective objects with deep learning. Opt. Express 2023, 31, 14965–14985.
58. Sun, G.; Li, B.; Li, Z.; Wang, X.; Cai, P.; Qie, C. Phase unwrapping based on channel transformer U-Net for single-shot fringe
projection profilometry. J. Opt. 2023, 1–11. [CrossRef]
59. Huang, W.; Mei, X.; Fan, Z.; Jiang, G.; Wang, W.; Zhang, R. Pixel-wise phase unwrapping of fringe projection profilometry based
on deep learning. Measurement 2023, 220, 113323. [CrossRef]
60. Yu, H.; Chen, X.; Huang, R.; Bai, L.; Zheng, D.; Han, J. Untrained deep learning-based phase retrieval for fringe projection
profilometry. Opt. Lasers Eng. 2023, 164, 107483. [CrossRef]
61. Bai, S.; Luo, X.; Xiao, K.; Tan, C.; Song, Z. Deep absolute phase recovery from single-frequency phase map for handheld 3D
measurement. Opt. Comm. 2022, 512, 128008. [CrossRef]
62. Song, J.; Liu, K.; Sowmya, A.; Sun, C. Super-Resolution Phase Retrieval Network for Single-Pattern Structured Light 3D Imaging.
IEEE Trans. Image Process. 2023, 32, 537–549. [CrossRef] [PubMed]
63. Zhu, X.; Zhao, H.; Song, L.; Wang, H.; Guo, Q. Triple-output phase unwrapping network with a physical prior in fringe projection
profilometry. Opt. Express 2023, 62, 7910–7916. [CrossRef] [PubMed]
64. Nguyen, A.; Ly, K.; Lam, V.; Wang, Z. Generalized Fringe-to-Phase Framework for Single-Shot 3D Reconstruction Integrating
Structured Light with Deep Learning. Sensors 2023, 23, 4209. [CrossRef] [PubMed]
65. Nguyen, A.; Rees, O.; Wang, Z. Learning-based 3D imaging from single structured-light image. Graph. Models 2023, 126, 101171.
[CrossRef]
66. Machineni, R.C.; Spoorthi, G.; Vengala, K.S.; Gorthi, S.; Gorthi, R.K.S.S. End-to-end deep learning-based fringe projection
framework for 3D profiling of objects. Comput. Vis. Image Underst. 2020, 199, 103023. [CrossRef]
67. Li, W.; Yu, J.; Gai, S.; Da, F. Absolute phase retrieval for a single-shot fringe projection profilometry based on deep learning. Opt.
Eng. 2021, 60, 064104. [CrossRef]
68. Tan, H.; Xu, Y.; Zhang, C.; Xu, Z.; Kong, C.; Tang, D.; Guo, B. A Y-shaped network based single-shot absolute phase recovery
method for fringe projection profilometry. Meas. Sci. Technol. 2023, 35, 035203. [CrossRef]
69. Yin, W.; Che, Y.; Li, X.; Li, M.; Hu, Y.; Feng, S.; Lam, E.Y.; Chen, Q.; Zuo, C. Physics-informed deep learning for fringe pattern
analysis. Opto-Electron. Adv. 2024, 7, 230034. [CrossRef]
70. Nguyen, H.; Liang, J.; Wang, Y.; Wang, Z. Accuracy assessment of fringe projection profilometry and digital image correlation
techniques for three-dimensional shape measurements. J. Phys. Photonics 2021, 3, 014004. [CrossRef]
71. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017;
pp. 5967–5976.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.