Joint Image and Depth Estimation With Mask-Based Lensless Cameras
Abstract—Mask-based lensless cameras replace the lens of a conventional camera with a custom mask. These cameras can potentially be very thin and even flexible. Recently, it has been demonstrated that such mask-based cameras can recover the light intensity and depth information of a scene. Existing depth recovery algorithms either assume that the scene consists of a small number of depth planes or solve a sparse recovery problem over a large 3D volume. Both of these approaches fail to recover scenes with large depth variations. In this paper, we propose a new approach for depth estimation based on an alternating gradient descent algorithm that jointly estimates a continuous depth map and the light distribution of the unknown scene from its lensless measurements. We present simulation results on image and depth reconstruction for a variety of 3D test scenes. A comparison between the proposed algorithm and other methods shows that our algorithm is more robust for natural scenes with a large range of depths. We built a prototype lensless camera and present experimental results for the reconstruction of intensity and depth maps of different real objects.

Index Terms—Lensless imaging, flatcam, depth estimation, non-convex optimization, alternating minimization.

I. INTRODUCTION

DEPTH estimation is an important and challenging problem that arises in a variety of applications, including computer vision, robotics, and autonomous systems. Existing depth estimation systems use stereo pairs of conventional (lens-based) cameras or time-of-flight sensors [2]–[4]. These cameras can be heavy and bulky and require a large space for their installation. Therefore, their adoption in portable and lightweight devices with strict physical constraints is still limited.

In this paper, we propose a joint image and depth estimation framework for a computational lensless camera that consists of a fixed, binary mask placed on top of a bare sensor. Such mask-based cameras offer an alternative design for building cameras without lenses. A recent example of a mask-based lensless camera is known as FlatCam [5]. In contrast with a lens-based camera, which is designed to map every point in the scene to a single pixel on the sensor, every sensor pixel in a FlatCam records light from many points in the scene. A single point source in the scene casts a shadow of the mask on the sensor, which shifts if the point moves parallel to the sensor plane and expands/shrinks if the point source moves toward/away from the sensor plane. The measurements recorded on the sensor thus represent a superposition of shifted and scaled versions of the mask shadows corresponding to light sources in different directions and depths. Image and depth information about the scene is thus encoded in the measurements, and we can solve an inverse problem to estimate both of them.

To jointly estimate the depth and light distribution, we propose a two-step approach that consists of an initialization step and an alternating gradient descent step to minimize our objective. To preserve sharp edges in the image intensity and depth map, we include an adaptive regularization penalty in our objective function.

An overview of the reconstruction framework is illustrated in Fig. 1. In this paper, we use the same sampling framework proposed in [6]. We initialize the estimate of the depth map by selecting a single plane or by solving the greedy algorithm proposed in [6]. The greedy method assumes that the scene consists of a small number of depth planes and fails to recover scenes with continuous depth variations. The method proposed in this paper can estimate continuous depth by minimizing an objective function with respect to image intensity and depth via alternating gradient descent. We present extensive simulation and real experimental results with different objects.

The main contributions of this paper are as follows.
• We propose a new computational framework for joint estimation of light intensity and depth maps from a single image of a mask-based lensless camera. In contrast to other methods, our method estimates the depth map on a continuous domain. Our algorithm consists of a careful initialization step based on greedy pursuit and an alternating minimization step based on gradient descent.
• The problem of joint image and depth recovery is highly nonconvex. To tackle this issue, we present different regularization schemes that offer robust recovery on a diverse dataset.
• We present simulation results on standard 3D datasets and demonstrate a significant improvement over existing methods for 3D imaging using coded mask-based lensless cameras.
• We built a hardware prototype to capture measurements of real objects. We present image and depth reconstruction results for these real objects using our proposed algorithm and a comparison with existing methods.

Manuscript received October 16, 2019; revised April 23, 2020 and July 6, 2020; accepted July 12, 2020. Date of publication July 20, 2020; date of current version July 31, 2020. This work was supported in part by a Google Faculty Award. A shorter version with preliminary results was presented in [1]. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Hajime Nagahara. (Corresponding author: M. Salman Asif.) The authors are with the Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521 USA (e-mail: [email protected]; [email protected]). This article has supplementary downloadable material available at https://ieeexplore.ieee.org, provided by the authors. Digital Object Identifier 10.1109/TCI.2020.3010360.
Fig. 1. A coded mask-based imaging model and an overview of the proposed continuous depth estimation framework.
II. RELATED WORK

A pinhole camera, also known as a camera obscura, is the simplest example of a mask-based lensless camera. Even though a pinhole can easily provide an image of the scene on a sensor plane, the image quality is often severely affected by noise because the amount of light collected is limited by the pinhole aperture [7]. Coded aperture-based lensless cameras avoid this problem by increasing the number of pinholes and allowing more light to reach the sensor [5], [8]–[12]. In contrast to a pinhole camera, where only one inverted image of the scene is obtained through a single pinhole, the measurements captured through a coded mask are a linear combination of all the pinhole images under every mask element. To recover an image of the scene, we need to solve a computational image recovery problem [5], [8], [12].

Recent work on mask-based lensless imaging broadly falls into two categories. FlatCam [6] uses a separable mask aligned with the sensor such that the sensor measurements corresponding to a plane at a fixed depth from the sensor can be written as a separable system. DiffuserCam [12] assumes that the mask size and the angular span of the object are small enough that the sensor measurements of a plane can be modeled as a convolution of the mask pattern with the image intensity at that plane. The convolutional model can be computationally efficient if the object falls within a small angular range because we can use the fast Fourier transform to compute convolutions. The separable model does not require a small angular range assumption. A number of methods based on deep learning have also been developed recently for both separable and convolutional imaging models to recover images at a fixed depth plane [13]–[15].

A coded aperture system offers another advantage by encoding light from different directions and depths differently. The depth-dependent imaging capability of coded aperture systems has been known since the pioneering work in this domain [8], [16]. However, the classical methods usually assume that the scene consists of a single plane at a known depth. In this paper, we assume that the depth map is arbitrarily distributed on a continuous domain and that the true depth map is unknown at the time of reconstruction.

The 3D lensless imaging problem has also recently been studied in [6], [11], [12], [17], [18]. These methods can broadly be divided into two categories. In the first category, the 3D scene is divided into a finite number of voxels. To recover the 3D light distribution, these methods solve an ℓ1 norm-based recovery problem under the assumption that the scene is very sparse [12], [17]. In the second category, the 3D scene is divided into an intensity map and multiple depth planes such that each pixel is assigned one intensity and one depth. To solve the intensity and depth recovery problem, these methods either sweep through the depth planes [11], [18] or assign a depth to each pixel using a greedy method [6]. Our proposed method belongs to the second category: we model the image intensity and depth separately and assume that the depth values of the scene are distributed on a continuous domain. To recover the 3D scene, we jointly estimate the image intensity and depth map from the available sensor measurements.

Joint estimation of an image intensity and depth map can be viewed as a nonlinear inverse problem in which the sampling function depends on the scene depth. Similar inverse problems also arise in many other fields, such as direction-of-arrival estimation in radar [19], super-resolution [20], and compressed sensing [21]–[23]. Similar to the joint estimation of image intensity and depth, the solution approaches to these problems consist of two main steps: identification of signal bases and estimation of signal intensities based on the identified bases. The problem of identifying the signal bases from continuously varying candidates is often called off-the-grid signal recovery. The methods for solving off-the-grid signal recovery problems can be divided into two main types. The first approach formulates the problem as a convex program on a continuous domain and solves it using an atomic norm minimization approach [24], [25]. The second approach linearizes the problem for the optimization parameter using a first-order approximation at every iteration [20], [26]. Our proposed algorithm is inspired by the second approach.

Mask-based lensless cameras have traditionally been used for imaging light at wavelengths beyond the visible spectrum [9], [10]. Other examples related to mask-based cameras include controllable apertures, coded masks for compressed sensing and computational imaging [27], [28], distributed lensless cameras [29], single-pixel cameras [30], and external mask settings [31].

Coded masks have also recently been used with conventional lens-based cameras to estimate depth and light fields [32]–[35]. Recently, a number of data-driven methods have been proposed to design custom phase masks and optical elements that estimate depth from a single image [36], [37]. An all-optical diffractive deep neural network is proposed in [38], [39], which can perform pattern recognition tasks, such as handwritten digit classification, using optical mask layers. Such networks can process images at very high speed with near-zero energy cost.
III. METHODS

A. Imaging Model

We divide the 3D scene under observation into N × N uniformly spaced directions. We use θ_i and θ_j to denote the angular directions of a light source with respect to the center of the sensor. The intensity and depth of the light source are denoted using l_{i,j} and z_{i,j}, respectively. Fig. 1(a) depicts the geometry of such an imaging model. A planar coded mask is placed on top of a planar sensor array at distance d. The M × M sensor array captures light coming from the scene, modulated by the coded mask.

Every light source in the scene casts a shadow of the mask on the sensor array, which we denote using basis functions ψ. We use s_u and s_v to index a pixel on the rectangular sensor array. The shadow cast by a light source with unit intensity at (θ_i, θ_j, z_{i,j}) can be represented as the following basis or point spread function:

ψ_{i,j}(s_u, s_v) = mask[α_{i,j} s_u + d tan(θ_i), α_{i,j} s_v + d tan(θ_j)],   (1)

where mask[u, v] denotes the transmittance of the mask pattern at location (u, v) on the mask plane and α_{i,j} is a variable that is related to the physical depth z_{i,j} with the following inverse relation:

α_{i,j} = 1 − d / z_{i,j}.   (2)

If the 3D scene consists of only a single point source at (θ_i, θ_j) with light intensity l_{i,j}, the measurement captured at sensor pixel (s_u, s_v) would be

y(s_u, s_v) = ψ_{i,j}(s_u, s_v) l_{i,j}.   (3)

The measurement recorded on any sensor pixel is the summation of contributions from each of the point sources in the 3D scene. The imaging model for a single sensor pixel can thus be represented by

y(s_u, s_v) = Σ_{i=1}^{N} Σ_{j=1}^{N} ψ_{i,j}(s_u, s_v) l_{i,j}.   (4)

We can write the imaging model for the entire sensor in a compact form as

y = Ψ(α) l + e,   (5)

where y ∈ R^{M²} is a vectorized form of the M × M matrix of sensor measurements, l ∈ R^{N²} is a vectorized form of the N × N matrix of light intensities from all the locations (θ_i, θ_j, α_{i,j}), and Ψ is a matrix with all the basis functions corresponding to (θ_i, θ_j, α_{i,j}). The basis functions in (5) are parameterized by the unknown α ∈ R^{N²}, and e denotes noise and other nonidealities in the system.

We can jointly estimate the light distribution (l) and the inverse depth map (α)¹ by solving the following optimization problem:

minimize_{α,l} (1/2) ‖y − Ψ(α) l‖²₂.   (6)

Note that if we know the true values of α (or we fix them to something), the problem in (6) reduces to a linear least-squares problem that can be efficiently solved via standard solvers. On the other hand, if we fix the value of l, the problem remains nonlinear with respect to α. In the next few sections, we discuss our approach for solving the problem in (6) via alternating minimization.

¹ α has an inverse relation with the depth map (2); therefore, we refer to it as the inverse depth map throughout the paper.
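To make the forward model concrete, the following NumPy sketch renders measurements for a separable mask under (1)–(5). The geometry values, the random placeholder mask, and the helper names (mask_1d, psf_1d, render) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Illustrative geometry in mm (not the paper's exact values): d is the
# mask-to-sensor distance; sensor pixels have 50 um pitch, mask features 30 um.
d = 4.0
sensor = (np.arange(512) - 255.5) * 0.05             # sensor coordinates s_u (mm)
mask_x = (np.arange(1024) - 511.5) * 0.03            # mask feature centers (mm)
mask_t = (np.random.rand(1024) < 0.5).astype(float)  # placeholder transmittance

def mask_1d(u):
    """Sample the (smoothed) 1D mask transmittance at continuous locations u."""
    return np.interp(u, mask_x, mask_t, left=0.0, right=0.0)

def psf_1d(theta, alpha):
    """Eq. (1) along one axis: mask shadow for direction theta, inverse depth alpha."""
    return mask_1d(alpha * sensor + d * np.tan(theta))

def render(thetas, L, A):
    """Eqs. (4)-(5) for a separable mask: y = sum_ij l_ij * outer(psi_i, psi_j)."""
    y = np.zeros((sensor.size, sensor.size))
    for i, ti in enumerate(thetas):
        for j, tj in enumerate(thetas):
            y += L[i, j] * np.outer(psf_1d(ti, A[i, j]), psf_1d(tj, A[i, j]))
    return y
```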
B. Initialization

Since the minimization problem in (6) is not convex, a proper initialization is often needed to ensure convergence to a local minimum close to the optimal point. A naïve approach is to initialize all the point sources in the scene at the same depth plane. To select an initial depth plane, we sweep through a set of candidate depth planes and perform image reconstruction on one depth plane at a time by solving the following linear least squares problem:

minimize_l (1/2) ‖y − Ψ(α) l‖²₂.   (7)

We evaluate the loss value for all the candidate depth planes and pick the one with the smallest loss as our initial depth. The mask basis function in (1) changes as we change α, which has an inverse relation with the scene depth. We select candidate depths corresponding to uniformly sampled values of α, which yields non-uniform sampling of the physical scene depth. This single-depth initialization approach is computationally simple and provides a reasonable initialization of the light distribution to start with, especially when the scene is far from the sensor.
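A minimal sketch of this single-plane sweep, reusing psf_1d from the earlier snippet. Forming Ψ densely as below is only feasible for small illustrative sizes, and build_Psi is an assumed helper.

```python
import numpy as np

def build_Psi(thetas, alpha):
    """Columns of Psi(alpha): vectorized PSFs psi_ij from Eq. (1), with all
    scene pixels at a single shared inverse depth alpha (dense; small sizes only)."""
    cols = [np.outer(psf_1d(ti, alpha), psf_1d(tj, alpha)).ravel()
            for ti in thetas for tj in thetas]
    return np.stack(cols, axis=1)

def sweep_init(y, thetas, alphas):
    """Single-plane initialization: solve Eq. (7) per candidate alpha, keep the best."""
    best = None
    for a in alphas:
        Psi = build_Psi(thetas, a)
        l, *_ = np.linalg.lstsq(Psi, y.ravel(), rcond=None)
        loss = 0.5 * np.sum((y.ravel() - Psi @ l) ** 2)
        if best is None or loss < best[0]:
            best = (loss, a, l)
    return best  # (loss, alpha_init, l_init)
```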
Our second approach for initialization is the greedy method proposed in [6]. Greedy algorithms are widely used for sparse signal recovery [21]–[23]. Based on these algorithms, [6] proposed a greedy depth pursuit algorithm for depth estimation from FlatCam [5]. The algorithm works by iteratively updating the depth surface that best matches the observed measurements.

The depth pursuit method assumes that the scene consists of a small number of predefined depth planes. We start the program by initializing all the pixels at a single depth plane and estimating the light intensities l based on the initialized depth map. The first step is to select new candidate values for α. The new candidates are selected using the basis vectors that are most correlated with the current residual of the estimate. In the second step, the new candidates for α are appended to the current estimate, and we solve a least squares problem using the appended α. In the third step, we prune α by selecting each α_{i,j} as the value corresponding to the largest magnitude of l_{i,j}. Although this method may not estimate off-grid point sources well, it produces a good preliminary estimate of the scene.
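The three steps can be sketched as follows. This is a simplified reading of the pursuit in [6]: the helper Psi_at, the per-pixel pruning rule, and the fixed iteration count are assumptions rather than the exact algorithm.

```python
import numpy as np

def depth_pursuit(y, Psi_at, candidates, n_pix, iters=10):
    """Schematic greedy depth pursuit: Psi_at(k, a) returns the vectorized PSF of
    scene pixel k at inverse depth a; y is the vectorized sensor measurement."""
    alpha = np.full(n_pix, candidates[0])       # start all pixels on one plane
    for _ in range(iters):
        A = np.stack([Psi_at(k, alpha[k]) for k in range(n_pix)], axis=1)
        l, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ l                           # residual of current estimate
        # Steps 1-2: per pixel, append the candidate whose basis vector is most
        # correlated with the residual, then solve a larger least squares problem.
        new = np.array([max(candidates, key=lambda a: abs(Psi_at(k, a) @ r))
                        for k in range(n_pix)])
        A2 = np.concatenate(
            [A, np.stack([Psi_at(k, new[k]) for k in range(n_pix)], axis=1)], axis=1)
        l2, *_ = np.linalg.lstsq(A2, y, rcond=None)
        # Step 3: prune -- keep, per pixel, the alpha with the larger |l|.
        keep = np.abs(l2[:n_pix]) >= np.abs(l2[n_pix:])
        alpha = np.where(keep, alpha, new)
    return alpha
```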
C. Refinement via Alternating Gradient Descent

To solve the minimization problem in (6), we start with the preliminary image and depth estimates from the initialization step and alternately update the depth and light distribution via gradient descent. The main computational task in the gradient descent method is computing the gradient of the loss function with respect to α. To compute that gradient, we expand the loss function in (6) as

L = (1/2) Σ_{u,v=1}^{M} ( y(s_u, s_v) − Σ_{i,j=1}^{N} ψ_{i,j}(s_u, s_v) l_{i,j} )².   (8)

We define R_{u,v} = y(s_u, s_v) − Σ_{i,j=1}^{N} ψ_{i,j}(s_u, s_v) l_{i,j} as the residual approximation error at location (s_u, s_v). The derivative of the loss function with respect to α_{i,j} is given as

∂L/∂α_{i,j} = Σ_{u,v=1}^{M} R_{u,v} (∂R_{u,v}/∂α_{i,j}) = −l_{i,j} Σ_{u,v=1}^{M} R_{u,v} (∂ψ_{i,j}(s_u, s_v)/∂α_{i,j}).   (9)

We compute the derivative of the sensor response with respect to α_{i,j} using the total derivative² as follows:

∂ψ_{i,j}(s_u, s_v)/∂α_{i,j} = (∂ψ_{i,j}(s_u, s_v)/∂u_{i,j})(∂u_{i,j}/∂α_{i,j}) + (∂ψ_{i,j}(s_u, s_v)/∂v_{i,j})(∂v_{i,j}/∂α_{i,j}) = (∂ψ_{i,j}(s_u, s_v)/∂u_{i,j}) s_u + (∂ψ_{i,j}(s_u, s_v)/∂v_{i,j}) s_v.   (10)

Here, u_{i,j} = α_{i,j} s_u + d tan(θ_i) and v_{i,j} = α_{i,j} s_v + d tan(θ_j) denote two dummy variables that correspond to the specific location on the mask where a light ray from a point source at angle (θ_i, θ_j) and depth α_{i,j} intersects the mask plane on its way to sensor pixel (s_u, s_v). The terms ∂ψ_{i,j}(s_u, s_v)/∂u_{i,j} and ∂ψ_{i,j}(s_u, s_v)/∂v_{i,j} can be viewed as the derivatives of the mask pattern along the respective spatial coordinates, evaluated at (u_{i,j}, v_{i,j}). We compute these derivatives using finite differences of ψ_{i,j}(s_u, s_v) over a fine grid and linear interpolation.

² Recall that the total derivative of a multivariate function f(x, y) is (∂f(x, y)/∂x) dx + (∂f(x, y)/∂y) dy.
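One way to realize this finite-difference-plus-interpolation step, reusing mask_x, mask_1d, sensor, and d from the imaging-model sketch above (the fine-grid resolution is an assumption):

```python
import numpy as np

# Tabulate the mask derivative once on a fine grid, then interpolate.
fine = np.linspace(mask_x[0], mask_x[-1], 20001)
dmask_fine = np.gradient(mask_1d(fine), fine)   # finite-difference d(mask)/du

def dmask_1d(u):
    return np.interp(u, fine, dmask_fine, left=0.0, right=0.0)

def dpsf_dalpha_1d(theta, alpha):
    """Eq. (10) along one axis: d(psi)/d(alpha) = s * mask'(alpha*s + d*tan(theta))."""
    return sensor * dmask_1d(alpha * sensor + d * np.tan(theta))
```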
D. Algorithm Analysis

To solve the nonlinear least squares problem in (6) in our algorithms, we compute the gradient derived in (10) and use it as input to an optimization solver. Suppose ψ_i and ψ_j denote the basis function vectors evaluated on a 1D mask as

ψ_i(s_u) = mask[α_{i,j} s_u + d tan(θ_i)],  ψ_j(s_v) = mask[α_{i,j} s_v + d tan(θ_j)].   (11)

If we use a separable mask pattern, then the 2D mask function ψ_{i,j} in (1) can be computed as the outer product of two vectors: ψ_{i,j} = ψ_i ψ_j^T. Similarly, we define 1D sub-gradient functions g as

g_i(s_u) = ∂ψ_{i,j}(s_u, s_v)/∂u_{i,j},  g_j(s_v) = ∂ψ_{i,j}(s_u, s_v)/∂v_{i,j}.   (12)

Similar to (10), g_i and g_j are the sub-gradient functions along the 1D mask; they take nonzero values at locations where the mask pattern value changes and are zero elsewhere. Using the derivation in (10), the matrix containing ∂ψ_{i,j}(s_u, s_v)/∂α_{i,j} at all (s_u, s_v) can be computed as the following sum of two vector outer products:

∂ψ_{i,j}/∂α_{i,j} = g_i ψ_j^T + ψ_i g_j^T.   (13)

Using the derivation in (9), the derivative of the loss function with respect to each depth value can be computed using the following matrix multiplications, where R refers to the matrix of residuals R_{u,v} at all (s_u, s_v):

∂L/∂α_{i,j} = g_i^T R ψ_j + ψ_i^T R g_j.   (14)

Suppose we have M × M pixels on the sensor array. The computation in (14) takes 2M² + 2M multiplications. We then feed our gradients to the minFunc solver [40] with the L-BFGS algorithm [41] to solve the nonlinear optimization problem in (6).
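In code, each per-pixel gradient then reduces to two 1D matrix products; a sketch (the −l_{i,j} factor is carried over from (9)):

```python
import numpy as np

def grad_alpha_ij(R, psi_i, psi_j, g_i, g_j, l_ij):
    """Eqs. (13)-(14): gradient of the loss w.r.t. alpha_ij from the residual
    matrix R; the factor -l_ij comes from Eq. (9)."""
    return -l_ij * (g_i @ R @ psi_j + psi_i @ R @ g_j)
```

Each evaluation is two M × M matrix-vector products, consistent with the 2M² + 2M multiplication count above.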
E. Regularization Approaches

ℓ2 regularization on spatial gradients: The optimization problem in (6) is highly non-convex and contains several local minima; therefore, the estimate often gets stuck in a local minimum, and the estimated intensity and depth maps are coarse. To improve the performance of our algorithm for solving the non-convex problem in (6), we seek to exploit additional structure in the scene. A standard assumption is that the depths of neighboring pixels are usually close, which implies that the spatial differences of the (inverse) depth map are small. To incorporate this assumption into our model, we add a quadratic regularization term on the spatial gradients of the inverse depth map to our loss function. The quadratic regularization term is defined as an adaptively weighted regularization inspired by [42]:

R_W(α) = Σ_{i,j=1}^{N} W^r_{i,j} (α_{i,j} − α_{i+1,j})² + W^c_{i,j} (α_{i,j} − α_{i,j+1})²,   (16)

where W^r_{i,j} and W^c_{i,j} denote weights for the row and column differences, respectively. We aim to select these weights to promote depth similarity between neighboring pixels while avoiding smoothing of sharp edges; to promote this, we select the weights adaptively [42].
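A small sketch of evaluating (16) and its gradient for use inside the descent step; the array shapes and the placement of the weights are our assumptions:

```python
import numpy as np

def weighted_l2_penalty(alpha, Wr, Wc):
    """Eq. (16): weighted quadratic penalty on inverse-depth differences.
    alpha, Wr, Wc are N x N arrays; boundary terms are simply dropped."""
    dr = alpha[:-1, :] - alpha[1:, :]      # alpha_ij - alpha_(i+1)j
    dc = alpha[:, :-1] - alpha[:, 1:]      # alpha_ij - alpha_i(j+1)
    return np.sum(Wr[:-1, :] * dr ** 2) + np.sum(Wc[:, :-1] * dc ** 2)

def weighted_l2_grad(alpha, Wr, Wc):
    """Gradient of Eq. (16) w.r.t. alpha, added to the data-term gradient."""
    g = np.zeros_like(alpha)
    dr = alpha[:-1, :] - alpha[1:, :]
    dc = alpha[:, :-1] - alpha[:, 1:]
    g[:-1, :] += 2 * Wr[:-1, :] * dr
    g[1:, :] -= 2 * Wr[:-1, :] * dr
    g[:, :-1] += 2 * Wc[:, :-1] * dc
    g[:, 1:] -= 2 * Wc[:, :-1] * dc
    return g
```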
ℓ1 regularization on spatial gradients: An alternative is to penalize the ℓ1 norm of the spatial gradients of the inverse depth map, which takes the form

‖∇_r α‖_1 + ‖∇_c α‖_1,   (19)

where ∇_r and ∇_c denote the row and column finite-difference operators. To solve the nonlinear optimization problem with ℓ1-norm regularization, we write the optimization problem as

minimize_{α,l} (1/2) ‖y − Ψ(α) l‖²₂ + λ (‖d_r‖_1 + ‖d_c‖_1)  subject to  d_r = ∇_r α, d_c = ∇_c α.   (20)

We solve problem (20) using a split-Bregman method [46].
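A schematic split-Bregman loop for (20). The α-subproblem is abstracted behind an assumed solve_alpha callable, and the update structure below follows the generic method of [46]; it is a sketch, not the exact solver.

```python
import numpy as np

def shrink(x, t):
    """Soft thresholding: closed-form minimizer of t*|d| + (d - x)^2 / 2."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_bregman_tv(alpha0, solve_alpha, grad_r, grad_c, lam, mu, iters=20):
    """Generic split-Bregman iterations for (20). solve_alpha(dr, dc, br, bc) is
    assumed to (approximately) minimize the data term plus the quadratic coupling
    mu/2 * ||d - grad(alpha) - b||^2; grad_r/grad_c are forward differences."""
    alpha = alpha0.copy()
    dr = np.zeros_like(grad_r(alpha))
    dc = np.zeros_like(grad_c(alpha))
    br = np.zeros_like(dr)
    bc = np.zeros_like(dc)
    for _ in range(iters):
        alpha = solve_alpha(dr, dc, br, bc)           # alpha (and l) subproblem
        dr = shrink(grad_r(alpha) + br, lam / mu)     # closed-form d-updates
        dc = shrink(grad_c(alpha) + bc, lam / mu)
        br += grad_r(alpha) - dr                      # Bregman variable updates
        bc += grad_c(alpha) - dc
    return alpha
```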
A. Simulation Setup

To validate the performance of the proposed algorithm, we simulate a lensless imaging system using a binary planar mask with a separable maximum length sequence (MLS) pattern [47] that is placed 4 mm away from a planar sensor array. We used an MLS sequence of length 1024 and converted all the −1s to 0s to create a separable binary pattern. We used square mask features, each of which is 30 μm wide. Since we optimize the objective function in (6) with respect to α and need to compute the gradient in (9), we require the mask function to be smooth and differentiable with respect to α. Therefore, we convolved the binary pattern with a Gaussian blur kernel of length 15 μm and standard deviation 5. In our simulations, we do not explicitly model the diffraction blur. However, the Gaussian blur kernel that we apply to the mask function can be viewed as an approximation of the diffraction blur. The sensor contains 512 × 512 square pixels, each of which is 50 μm wide. The chief ray angle of each sensor pixel is ±18°. We assume that there is no noise added to the sensor measurements. In our experiments for continuous depth estimation, we fixed all the parameters to these default values and analyzed the performance with respect to a single parameter at a time.
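A plausible construction of such a mask profile with SciPy; the padding to length 1024, the fine-grid resolution, and the blur width in grid samples are assumptions:

```python
import numpy as np
from scipy.signal import max_len_seq
from scipy.ndimage import gaussian_filter1d

# MLS bits in {0, 1}; a 10-bit register gives length 2**10 - 1 = 1023,
# padded here to 1024 (the paper's exact construction may differ).
bits = max_len_seq(10)[0]
m1d = np.concatenate([bits, [bits[0]]]).astype(float)

# Represent the 1D profile on a fine grid (1 um per sample for 30 um features)
# and smooth it so that mask(u), and hence Eq. (1), is differentiable in alpha.
profile = np.repeat(m1d, 30)                     # 30 samples per mask feature
profile = gaussian_filter1d(profile, sigma=5.0)  # blur; sigma in 1 um samples

# With a separable mask, the 2D pattern is the outer product of this profile
# with itself (Eq. (11)); it never needs to be formed explicitly.
```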
Fig. 3. Left to right: original image and depth of the Cones scene; image and depth initialized via the greedy algorithm [6]; depth estimation using weighted ℓ2-based regularization. The depth in this scene varies from around 0.99 m to 1.7 m.

B. Reconstruction of Scenes With Continuous Depth

Depth datasets: We performed all our experiments on 3D images created using light intensities and depth information from Middlebury [48], Make3D [49], [50], and NYU Depth [51]; the test scenes and their depth ranges are listed in Table I.

Initialization via greedy method: Let us further discuss our simulation setup using the Cones scene, for which the results are presented in Fig. 3. We simulated the 3D scene using depth data from the Middlebury dataset [48]. We sample the scene at uniform angles to create a 128 × 128 image and its (inverse) depth map of the same size. We can compute the physical depth from α using (2). In our simulation, the depth of this scene ranges from around 0.99 m to 1.7 m. We used the depth pursuit greedy algorithm in [6] as our initialization method. We selected 15 candidate depths by uniformly sampling the inverse depth values α from 0.996 to 0.9976, which gives an effective depth range that matches the original depth. Since we are trying to gauge the performance of off-the-grid depth estimation, the candidate values of α are not exactly the same as the true values of α in our simulations. The output of the initialization algorithm is then fed into the alternating gradient descent method.

Performance metrics: We evaluate the performance of the recovered image intensity and depth independently of each other. We report the peak signal-to-noise ratio (PSNR) of the estimated intensity distribution and the root mean squared error (RMSE) of the estimated depth maps for all our experiments.
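For concreteness, the two metrics can be computed as follows (a peak intensity of 1.0 for normalized images is an assumption):

```python
import numpy as np

def psnr(est, ref, peak=1.0):
    """PSNR (dB) of the estimated intensity map against the ground truth."""
    return 10 * np.log10(peak ** 2 / np.mean((est - ref) ** 2))

def depth_rmse(z_est, z_ref):
    """RMSE of the estimated depth map, in the units of the ground truth."""
    return np.sqrt(np.mean((z_est - z_ref) ** 2))
```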
The estimates of image intensity and depth maps for the initialization and our proposed weighted TV-ℓ2 method are shown in Fig. 3, along with the PSNR and RMSE. We can observe that both the image and depth estimates from the greedy method [6] contain several spikes because of the model mismatch with the predefined depth grid. In contrast, many of these spikes are removed in the estimates from the proposed algorithm with weighted TV-ℓ2, while the edges are preserved.

Comparison of regularization methods: Here we present a comparison between three different regularization approaches. We reconstruct the image intensity and (inverse) depth map from the same measurements with TV-ℓ2, weighted TV-ℓ2, and TV-ℓ1 regularization. The results are shown in Fig. 4. Compared to the TV-ℓ2 method, we observe that both weighted TV-ℓ2 and TV-ℓ1 preserve the sharp edges in the image and depth estimates. Overall, in our experiments, weighted TV-ℓ2 provided the best results. Therefore, we used it as our default method for the rest of the paper.

Fig. 4. Comparison between reconstructions using three different regularization approaches from the same measurements.

C. Effects of Noise

Sensor noise exists widely in any observation process. The amplitude of the noise depends on the intensities of the sensor measurements and can adversely affect the reconstruction results. To investigate the effect of noise on our algorithm, we present simulation results for the reconstruction of scenes from the same sensor measurements under different levels of additive white Gaussian noise. The experiments are performed on multiple 3D scenes listed in Table I. Some examples of reconstruction with different levels of noise are shown in Fig. 5.

Fig. 5. Effects of noise: Reconstruction from measurements with signal-to-noise ratio (SNR) of 20 dB, 30 dB, and 40 dB, along with the PSNR of the reconstructed image and the RMSE of the reconstructed depth map. As expected, the quality of the reconstructed image and depth improves as the noise level is reduced. The sequence on the left is for Sword; the one on the right is for Playtable.

The plots recording the PSNR of image intensities and the RMSE of depth maps over a range of measurement SNR values are presented in Fig. 6. As we can observe from the curves, the quality of both the estimated image and depth improves when the measurements have small noise (high SNR), and the quality degrades as we add more noise to the measurements (low SNR). Another observation is that scenes that are farther away have higher RMSE. This is understandable because, as the scenes move farther, the α values of all the scene pixels get very close to 1 and we cannot resolve fine depth variations in the scene.
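A sketch of corrupting clean measurements at a prescribed SNR (defined as signal power over noise power, in dB):

```python
import numpy as np

def add_awgn(y, snr_db, rng=None):
    """Add white Gaussian noise scaled so that the measurement SNR equals snr_db."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise_power = np.mean(y ** 2) / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), y.shape)
```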
Fig. 7. Reconstructions from measurements with different sizes of sensor pixels. The number of sensor pixels is fixed as 512 × 512. The quality of depth
reconstruction improves as we increase the size of sensor pixels.
Fig. 8. Comparison of existing 3D recovery methods for lensless imaging, 3D grid method in [12], [17] and greedy method in [6], with our proposed method.
3D grid method provides a 3D volume with multiple depth planes; therefore, we pick the depth with the largest light intensity along any angle for comparison.
Fig. 10. Experiments on real objects. (a) A slanted card; the depth range is 18–28 cm. (b) Two slanted cards; the depth range of the left card is 18–28 cm and that of the right card is 26–29 cm. (c) Hand sculpture; the depth range is 15–30 cm. (d) A mug with card texture; the depth range is 24–27 cm. We divide each group of real scenes into four columns: the first column shows the front view and side view of the scene, the second column is the result from the greedy algorithm in [6], the third column is the output of the sparse 3D grid recovery algorithm proposed in [12] and [17], and the last column is the image intensity and depth map estimated using our proposed algorithm.
sensor, we captured sensor measurements for an LED flashlight at 9 different angles at the same depth and merged them to estimate the mask function at that depth.

In our experiments, we captured the sensor measurements by placing an LED at z = 42 cm away from the sensor, which corresponds to the mask function in (1) evaluated at α = 1 − d/z = 0.9905 for d = 4 mm and z = 42 cm. We first resized the calibrated mask function to compute the mask function corresponding to α = 1.
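Under (1), changing the inverse depth only rescales the mask coordinates, so the calibrated mask function can be resampled rather than recalibrated; a 1D sketch (the interpolation order is an assumption, and cropping/padding back to the sensor grid is omitted):

```python
import numpy as np
from scipy.ndimage import zoom

def rescale_mask(mask_cal, alpha_cal, alpha_new):
    """Resample a mask function calibrated at inverse depth alpha_cal (0.9905
    here) to alpha_new. From Eq. (1), psi_alpha(s) = mask(alpha * s + ...), so
    psi_new(s) = psi_cal(s * alpha_new / alpha_cal): a pure coordinate rescaling,
    i.e., a zoom by the factor alpha_cal / alpha_new."""
    return zoom(np.asarray(mask_cal, dtype=float), alpha_cal / alpha_new, order=1)
```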
C. Reconstruction of Real Objects

We present results for four objects in Fig. 10: (a) a slanted card with depth range from 18 cm to 28 cm, (b) two slanted cards with depth ranges from 18 cm to 28 cm and from 26 cm to 29 cm, (c) a hand sculpture with depth range from 15 cm to 30 cm, and (d) a mug with card texture with depth from 24 cm to 27 cm. The figure is divided into four boxes. In each box, we present a front view and side view of the object along with the estimated scene intensity and depth maps for three methods: the greedy algorithm in [6], the sparse 3D volume recovery method from [12], [17], and our proposed method. For the greedy and 3D grid methods, we generated 15 candidate depth planes by uniformly sampling the inverse depth values α between 0.96 and 0.9905 (corresponding to depths of 10 cm and 42 cm, respectively).

All the objects in our experiments are placed in front of a black background, so the depth values for dark pixels are not meaningful. We can observe that in all these experiments, our proposed method provides a continuous depth map that is consistent with the real depth of the object in the scene. In comparison, both the greedy algorithm [6] and the sparse 3D volume recovery algorithm [12], [17] produce coarse and discretized depth maps. The intensity map recovered by our method is also visually better than those of the other methods.

Even though our proposed algorithm produces better intensity and depth maps than the greedy and 3D grid methods, we observed that the estimated depth has some errors in the darker parts of the objects. For instance, the left part of the mug is darker than the right part because the object was illuminated by a lamp on the right side. The left part appears to have errors in the depth estimate, as several pixels are assigned small depth values even though that part is in fact farther from the sensor. We observe a similar effect in other experiments, where the depth estimates for darker parts of the scene appear to have larger errors.
VI. CONCLUSION

We presented a new algorithm to jointly estimate the image and depth of a scene using a single snapshot of a mask-based lensless camera. Existing methods for 3D lensless imaging either estimate the scene over a predefined 3D grid (which is computationally expensive) or over a small number of candidate depth planes (which provides a coarse depth map). We divide the scene into an intensity map at uniform angles and a depth map on a continuous domain, which allows us to estimate a variety of scenes with different depth ranges using the same formulation. We jointly estimate the image intensity and depth map by solving a nonconvex problem. We initialize our estimates using a greedy method and add weighted regularization to enforce smoothness in the depth estimate while preserving sharp edges. We demonstrated with extensive simulations and experiments with real data that our proposed method can recover image and depth with high accuracy for a variety of scenes. We evaluated the performance of our method under different noise levels, sensor sizes, and numbers of sensor pixels and found it to be robust. We presented a comparison with existing methods for lensless 3D imaging and demonstrated, both in simulation and in real experiments, that our method provides significantly better results. We believe this work provides a step toward capturing complex scenes with lensless cameras, where depth estimation is a feature as well as a necessity: if the depth information is unavailable or inaccurate, it will cause artifacts in the recovered images.
REFERENCES

[1] Y. Zheng and M. S. Asif, "Image and depth estimation with mask-based lensless cameras," in Proc. IEEE Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process. (CAMSAP), 2019, pp. 91–95.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[3] S. B. Gokturk, H. Yalcin, and C. Bamji, "A time-of-flight depth sensor - system description, issues and solutions," in Proc. Conf. Comput. Vision Pattern Recognit. Workshop, Jun. 2004, pp. 35–35.
[4] F. Heide, M. B. Hullin, J. Gregson, and W. Heidrich, "Low-budget transient imaging using photonic mixer devices," ACM Trans. Graph., vol. 32, no. 4, p. 45, 2013.
[5] M. S. Asif, A. Ayremlou, A. Sankaranarayanan, A. Veeraraghavan, and R. G. Baraniuk, "FlatCam: Thin, lensless cameras using coded aperture and computation," IEEE Trans. Comput. Imag., vol. 3, no. 3, pp. 384–397, Sep. 2017.
[6] M. S. Asif, "Lensless 3D imaging using mask-based cameras," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 2018, pp. 6498–6502.
[7] A. Yedidia, C. Thrampoulidis, and G. Wornell, "Analysis and optimization of aperture design in computational imaging," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2018, pp. 4029–4033. [Online]. Available: http://sigport.org/3049
[8] E. E. Fenimore and T. M. Cannon, "Coded aperture imaging with uniformly redundant arrays," Appl. Opt., vol. 17, no. 3, pp. 337–347, Feb. 1978. [Online]. Available: http://ao.osa.org/abstract.cfm?URI=ao-17-3-337
[9] A. Busboom, H. Elders-Boll, and H. D. Schotten, "Uniformly redundant arrays," Exp. Astron., vol. 8, no. 2, pp. 97–123, Jun. 1998. [Online]. Available: https://doi.org/10.1023/A:1007966830741
[10] T. M. Cannon and E. E. Fenimore, "Coded aperture imaging: Many holes make light work," Opt. Eng., vol. 19, pp. 283–289, Jun. 1980.
[11] V. Boominathan et al., "Lensless imaging: A computational renaissance," IEEE Signal Process. Mag., vol. 33, no. 5, pp. 23–35, 2016.
[12] N. Antipa et al., "DiffuserCam: Lensless single-exposure 3D imaging," Optica, vol. 5, no. 1, pp. 1–9, Jan. 2018. [Online]. Available: http://www.osapublishing.org/optica/abstract.cfm?URI=optica-5-1-1
[13] S. S. Khan, A. V. R., V. Boominathan, J. Tan, A. Veeraraghavan, and K. Mitra, "Towards photorealistic reconstruction of highly multiplexed lensless images," in Proc. IEEE Int. Conf. Comput. Vision, Oct. 2019, pp. 7859–7868.
[14] K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny, and L. Waller, "Learned reconstructions for practical mask-based lensless imaging," Opt. Express, vol. 27, no. 20, pp. 28075–28090, Sep. 2019. [Online]. Available: http://www.opticsexpress.org/abstract.cfm?URI=oe-27-20-28075
[15] A. Dave, A. K. Vadathya, R. Subramanyam, R. Baburajan, and K. Mitra, "Solving inverse computational imaging problems using deep pixel-level prior," IEEE Trans. Comput. Imag., vol. 5, no. 1, pp. 37–51, 2019. [Online]. Available: https://doi.org/10.1109/TCI.2018.2882698
[16] H. H. Barrett, D. T. Wilson, G. D. DeMeester, and H. Scharfman, "Fresnel zone plate imaging in radiology and nuclear medicine," in Application of Optical Instrumentation in Medicine I, vol. 0035, P. L. Carson, W. R. Hendee, and W. C. Zarnstorff, Eds. International Society for Optics and Photonics, 1972, pp. 199–206.
[17] J. K. Adams et al., "Single-frame 3D fluorescence microscopy with ultraminiature lensless FlatScope," Sci. Adv., vol. 3, no. 12, 2017. [Online]. Available: https://advances.sciencemag.org/content/3/12/e1701548
[18] Y. Hua, S. Nakamura, M. S. Asif, and A. C. Sankaranarayanan, "SweepCam: Depth-aware lensless imaging using programmable masks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 7, pp. 1606–1617, 2020.
[19] Z. Tan, P. Yang, and A. Nehorai, "Joint sparse recovery method for compressed sensing with structured dictionary mismatches," IEEE Trans. Signal Process., vol. 62, no. 19, pp. 4997–5008, Oct. 2014.
[20] N. Boyd, G. Schiebinger, and B. Recht, "The alternating descent conditional gradient method for sparse inverse problems," in Proc. IEEE 6th Int. Workshop Comput. Adv. Multi-Sensor Adaptive Process., Dec. 2015, pp. 57–60.
[21] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[22] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Commun. ACM, vol. 53, no. 12, pp. 93–100, Dec. 2010. [Online]. Available: http://doi.acm.org/10.1145/1859204.1859229
[23] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1982–2001, 2010.
[24] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Foundations Comput. Math., vol. 12, no. 6, pp. 805–849, Dec. 2012. [Online]. Available: https://doi.org/10.1007/s10208-012-9135-7
[25] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, "Compressed sensing off the grid," IEEE Trans. Inform. Theory, vol. 59, no. 11, pp. 7465–7490, Nov. 2013.
[26] Z. Yang, L. Xie, and C. Zhang, "Off-grid direction of arrival estimation using sparse Bayesian inference," IEEE Trans. Signal Process., vol. 61, no. 1, pp. 38–43, Jan. 2013.
[27] D. Takhar et al., "A new compressive imaging camera architecture using optical-domain compression," in Proc. Comput. Imag. IV, SPIE Electron. Imag., 2006, pp. 43–52.
[28] A. Zomet and S. K. Nayar, "Lensless imaging with a controllable aperture," in Proc. IEEE Comput. Vision Pattern Recognit., vol. 1, Jun. 2006, pp. 339–346.
[29] Y. Zheng and M. S. Asif, "Imaging with distributed lensless line sensors," in Proc. 53rd Asilomar Conf. Signals, Syst., Comput., Nov. 2019, pp. 1289–1293.
[30] M. F. Duarte et al., "Single-pixel imaging via compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 83–91, Mar. 2008.
[31] D. Reddy, J. Bai, and R. Ramamoorthi, "External mask based depth and light field camera," in Proc. IEEE Int. Conf. Comput. Vision Workshops, Dec. 2013, pp. 37–44.
[32] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, "Image and depth from a conventional camera with a coded aperture," ACM Trans. Graph., vol. 26, no. 3, pp. 70–es, 2007.
[33] A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, "Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing," ACM Trans. Graph., vol. 26, no. 3, pp. 69–es, 2007.
[34] K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar, "Compressive light field photography using overcomplete dictionaries and optimized projections," ACM Trans. Graph. (Proc. SIGGRAPH), vol. 32, no. 4, pp. 1–11, 2013.
[35] M. Hirsch, S. Sivaramakrishnan, S. Jayasuriya, A. Wang, A. Molnar, R. Raskar, and G. Wetzstein, "A switchable light field camera architecture with angle sensitive pixels and dictionary-based sparse coding," in Proc. IEEE Int. Conf. Comput. Photography (ICCP), 2014, pp. 1–10.
[36] Y. Wu, V. Boominathan, H. Chen, A. Sankaranarayanan, and A. Veeraraghavan, "PhaseCam3D: Learning phase masks for passive single view depth estimation," in Proc. IEEE Int. Conf. Comput. Photography (ICCP), May 2019, pp. 1–12.
[37] J. Chang and G. Wetzstein, "Deep optics for monocular depth estimation and 3D object detection," in Proc. IEEE Int. Conf. Comput. Vision, 2019.
[38] X. Lin et al., "All-optical machine learning using diffractive deep neural networks," Science, vol. 361, no. 6406, pp. 1004–1008, 2018. [Online]. Available: https://science.sciencemag.org/content/361/6406/1004
[39] D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, "Analysis of diffractive optical neural networks and their integration with electronic neural networks," IEEE J. Sel. Topics Quantum Electron., vol. 26, no. 1, pp. 1–14, Jan. 2020.
[40] M. Schmidt, "minFunc: Unconstrained differentiable multivariate optimization in MATLAB," 2005. [Online]. Available: http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
[41] D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Math. Program., vol. 45, no. 1, pp. 503–528, Aug. 1989. [Online]. Available: https://doi.org/10.1007/BF01589116
[42] Y. Liu, J. Ma, Y. Fan, and Z. Liang, "Adaptive-weighted total variation minimization for sparse data toward low-dose X-ray computed tomography image reconstruction," Phys. Med. Biol., vol. 57, no. 23, pp. 7923–7956, 2012.
[43] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. Sixth Int. Conf. Comput. Vision, Jan. 1998, pp. 839–846.
[44] F. Durand and J. Dorsey, "Fast bilateral filtering for the display of high-dynamic-range images," ACM Trans. Graph., vol. 21, no. 3, pp. 257–266, Jul. 2002. [Online]. Available: http://doi.acm.org/10.1145/566654.566574
[45] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, vol. 60, no. 1–4, pp. 259–268, Nov. 1992. [Online]. Available: https://doi.org/10.1016/0167-2789(92)90242-F
[46] T. Goldstein and S. Osher, "The split Bregman method for L1-regularized problems," SIAM J. Imag. Sci., vol. 2, no. 2, pp. 323–343, Apr. 2009. [Online]. Available: http://dx.doi.org/10.1137/080725891
[47] F. J. MacWilliams and N. J. A. Sloane, "Pseudo-random sequences and arrays," Proc. IEEE, vol. 64, no. 12, pp. 1715–1729, Dec. 1976.
[48] D. Scharstein, R. Szeliski, and R. Zabih, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," in Proc. IEEE Workshop Stereo Multi-Baseline Vis., Dec. 2001, pp. 131–140.
[49] A. Saxena, S. H. Chung, and A. Y. Ng, "Learning depth from single monocular images," in Adv. Neural Inf. Process. Syst. 18, Y. Weiss, B. Schölkopf, and J. C. Platt, Eds. MIT Press, 2006, pp. 1161–1168. [Online]. Available: http://papers.nips.cc/paper/2921-learning-depth-from-single-monocular-images.pdf
[50] A. Saxena, M. Sun, and A. Y. Ng, "Make3D: Learning 3D scene structure from a single still image," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 5, pp. 824–840, May 2009.
[51] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in Proc. Eur. Conf. Comput. Vision, 2012, pp. 746–760.

Yucheng Zheng received the B.Sc. degree in electrical engineering from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2017. He is currently working toward the Ph.D. degree at the University of California, Riverside, CA, USA. His current research interests include computational imaging, computer vision, and signal processing.

M. Salman Asif received the B.Sc. degree from the University of Engineering and Technology, Lahore, Pakistan, in 2004, and the M.S.E.E. and Ph.D. degrees from the Georgia Institute of Technology, Atlanta, GA, USA, in 2008 and 2013, respectively. He is an Assistant Professor in the Department of Electrical and Computer Engineering, University of California, Riverside, CA, USA. He was a Research Intern at Mitsubishi Electric Research Laboratories, Cambridge, MA, USA, in the Summer of 2009, and at Samsung Standards Research Laboratory, Richardson, TX, USA, in the Summer of 2010. He was a Senior Research Engineer at Samsung Research America, Dallas, TX, USA, from August 2012 to January 2014, and a Postdoctoral Researcher at Rice University from February 2014 to June 2016. His research interests include compressive sensing, computational and medical imaging, and machine learning.