
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.2996679, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

JOURNAL OF LATEX, VOL. XX, NO. X, AUGUST 20XX

Hyperspectral Compressive Sensing with a System-on-Chip FPGA

José M. P. Nascimento, Mário Véstias and Gabriel Martín

Abstract—Advances in hyperspectral sensors have led to a significantly increased capability for high-quality data. This trend calls for the development of new techniques to enhance the way that such unprecedented volumes of data are stored, processed, and transmitted to the ground station.
An important approach to deal with massive volumes of information is an emerging technique, called compressive sensing, which acquires the compressed signal directly instead of acquiring the full data set, thus reducing the amount of data that needs to be measured, transmitted and stored in the first place.
In this paper, a hardware/software implementation of compressive sensing in a system-on-chip Field-Programmable Gate Array (FPGA) is proposed. The proposed hardware/software architecture runs the compressive sensing algorithm with a unitary compression rate over an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor image with 512 lines, 614 samples, and 224 bands in 0.35 seconds. The proposed system runs 49× and 216× faster than an embedded 256-core GPU of a Jetson TX2 board and the ARM of the system-on-chip FPGA, respectively. In terms of energy, the proposed architecture requires around 100× less energy.

Index Terms—Hyperspectral Imagery, Compressive Sensing, Field-Programmable Gate Arrays (FPGA), On-board Processing, Real-Time.

I. INTRODUCTION

HYPERSPECTRAL sensors acquire images containing hundreds of spectral bands with high spatial and spectral resolution. The high spectral resolution of these sensors allows an accurate identification of the different materials contained in the scene of interest. This feature, among others, has turned hyperspectral images into a powerful tool in many applications in the fields of agriculture [1], surveillance [2], [3], medical imaging [4], food safety [5], [6], forensic applications [7], [8], and many others [9].

Considering the collected data in a 2D spatial domain of megapixel size and a spectral dimension with hundreds of bands, one can represent the data as a three-dimensional image cube comprising a huge amount of data. Consequently, its scanning, storage and digital processing are challenging [10]. In remote sensing scenarios where hyperspectral images are collected on-board satellites and need to be transferred to the Earth ground station, an efficient compression of such images is mandatory [11].

The compressed sensing (CS) theory proposed in [12], [13] has received considerable interest since it states that if a signal is sparse, it can be sampled with much less data than dictated by the Shannon–Nyquist theorem and reconstructed accurately from these sampled data [14], [15]. Fortunately, the structure of hyperspectral data is sparse [16], [17] and it can be modeled by a linear mixing model, considering that the total set of pixel vectors is represented by a small number of endmembers [9], [18]. Additionally, these images also present a high correlation in the spatial domain, which may improve the compression ratio and the quality of the reconstructed image [19]. These hyperspectral features have encouraged recent developments and implementations of CS techniques on hyperspectral imagery [20]–[26].

Running compressive sensing algorithms on on-board processing platforms is subject to throughput and power constraints. Images are acquired at a certain rate and therefore these platforms must be fast enough for real-time processing to avoid image storage. For example, AVIRIS senses 512 pixels of 224 spectral bands in 8.3 ms, so 614 samples must be processed in about 5 s. On-board processing is also subject to power constraints. Thus, platforms must be designed for the best energy efficiency with reduced power.

Since the CS measurement process is based on performing a large number of parallel dot products between random vectors and the signal of interest, Graphics Processing Units (GPUs) are well suited to perform this task. Several implementations of CS and random projection algorithms over hyperspectral data with GPUs have been proposed [27]–[29], concluding that by using GPUs it is possible to achieve real-time performance for the random projection step. On the other hand, the power requirements of this hardware make it ineffective for on-board applications. Over the last years, the advances in the semiconductor industry and the huge interest in developing mobile devices have allowed companies such as Nvidia to develop low-power GPUs. For example, the Jetson TX2 board has a low-power GPU that, nevertheless, can at the same time achieve high throughput in image processing applications [11], [30], [31].

Field-programmable gate arrays (FPGAs) are also a good platform for on-board processing systems since they have high computational performance, compact size, reduced weight and low power consumption, among other characteristics. Additionally, FPGAs permit the adaptation of the hardware to the needs of different missions, which makes them appealing for satellite platforms [32]–[36]. However, in order to include FPGAs in a satellite payload, they must be resistant to damage or malfunctions caused by the ionizing radiation present in the harsh environment of outer space [37]–[39]. The available rad-hard FPGAs (e.g. Virtex-5QV) easily provide sufficient resources to implement the proposed architecture with the same performance and with fault tolerance.

J. Nascimento is with Instituto Superior de Engenharia de Lisboa and with Instituto de Telecomunicações, Lisbon, Portugal.
M. Véstias is with Instituto Superior de Engenharia de Lisboa and with INESC-ID, Lisbon, Portugal.
Manuscript received April xx, 20xx; revised August xx, 20xx.
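The real-time budget quoted above can be cross-checked with a line of arithmetic (an illustrative sketch only; the variable names are ours, not the paper's):

```python
# AVIRIS acquires one 512-pixel scan of 224 spectral bands in 8.3 ms,
# so a 614-line scene takes roughly 614 * 8.3 ms of acquisition time.
# Any on-board compressor finishing within that budget runs in real time.
scan_time_s = 8.3e-3                      # time per 512-pixel scan
frame_time_s = 614 * scan_time_s          # acquisition time of one scene, ~5.1 s
fpga_time_s = 0.35                        # compression time reported in this paper
margin = frame_time_s / fpga_time_s       # headroom over the real-time budget
```

With the 0.35 s compression time reported later, the proposed design finishes roughly 14× inside the acquisition budget.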

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

In this paper, a hardware/software architecture is proposed to run the CS method. The architecture is implemented in a System-on-Chip (SoC) FPGA. The performance of the system is compared to the execution of the algorithm in the ARM of the SoC FPGA and in an embedded GPU on a Jetson TX2 board. The work compares the performance and the power consumption of the three implementations. The results indicate that the proposed system delivers a peak performance of 96.8 GOPs (Giga Operations per second) and runs the compressive sensing algorithm with a unitary compression rate over an AVIRIS sensor image with 512 lines, 614 samples, and 224 bands in 0.35 seconds. Compared to the other two platforms, it runs 49 times and 216 times faster than the embedded GPU and the ARM, respectively. In terms of energy, the proposed architecture requires around 100 times less energy than the other two solutions.

The remainder of this paper is organized as follows. In section II, the compressive sensing method is summarized. Section III describes the design and implementation of the proposed hardware/software architecture. In section IV, a set of experiments is conducted to demonstrate the effectiveness of the architecture to execute compressed sensing. Also, the architecture is compared to other computing platforms. Finally, section V presents some conclusions and future lines of research work.

II. COMPRESSIVE SENSING METHOD

In this section the CS method termed Hyperspectral Coded Aperture (HYCA) [22] is briefly described. This method, given its characteristics, is well suited to be developed in a parallel fashion [27]. It takes advantage of two central properties of most hyperspectral images: i) the spectral vectors live systematically in low-dimensional subspaces [17] and ii) the spectral bands present a high correlation in the spatial domain. The former property allows the data vectors to be represented using a reduced set of spectral endmembers due to the mixing phenomenon [18]; the latter exploits the high spatial correlation of the fractional abundances associated to the spectral endmembers.

HYCA performs CS in the spectral domain. For this purpose, a set of q inner products between random vectors and the image pixels is performed, with q lower than the original number of bands of the hyperspectral data. Thus the size of the compressed signal is bands/q times smaller than the original. This operation may be represented as y_p = H_p x_p for p ∈ {1 . . . n}, where n is the number of pixels of the image, y_p ∈ R^q is the p-th compressed pixel, and H_p is a matrix containing the random vectors used for the measurement process for the p-th pixel, which is represented as x_p ∈ R^bands. Since the number of pixels n in a given scene may be very large (for instance, the AVIRIS sensor acquires for each image scene a set of 512 scans containing 614 samples and 224 bands, which yields approximately 140 Megabytes (MB)), storing in memory a different matrix H_p for each p ∈ {1 . . . n} is unattainable. The HYCA measurement strategy therefore splits the dataset into different windows of size m = ws × ws and then repeats the matrices H_i used in each window, thus requiring only m different H_p matrices to be stored in memory. Formally, the pseudo-code of the compressive sensing method is given by Algorithm 1.

Algorithm 1: Compressive sensing algorithm.
  Input: X - 3D hyperspectral image with (lines × samples × bands) pixels;
         H - matrix of random vectors;
         ws - window size
  Result: Y - compressed image of size lines × samples × q
  1  for j ← 0 to lines − 1 do
  2      il ← j mod ws;
  3      for i ← 0 to samples − 1 do
  4          ic ← i mod ws;
  5          for n ← 0 to q − 1 do
  6              Y[n + (j × samples + i) × q] ←
                     Σ_{b=0}^{bands−1} H[b + n × bands + (il × ws + ic) × bands × q]
                                        × X[b + (j × samples + i) × bands]
  7          end
  8      end
  9  end

Algorithm 1 considers that matrix X is represented as a 2D matrix of size lines × samples, where each entry of the matrix (pixel) is a vector of bands values, and that the disposition in memory follows the Band-Interleaved-by-Pixel (BIP) format. For each pixel X(i, j), q inner products are calculated between its vector of bands and a vector of matrix H. The result is stored in matrix Y. Each inner product with a vector of H produces one element of the vector associated with the compressed pixel.

Considering that the measurements are sent from the on-board platform, the bulk of the processing to reconstruct the original image is performed on the Earth ground station. The reconstruction of the original image can be formulated as an optimization problem, where it is assumed that the dataset lives in a low-dimensional subspace [17]. Furthermore, the abundances exhibit a high spatial correlation and must be nonnegative; these features are exploited for estimating z by solving:

    min_{z ≥ 0} (1/2)‖y − Hz‖² + λ_TV TV(z).    (1)

Therefore, the minimization of Eq. (1) aims at finding a solution which is a compromise between the fidelity to the measured data, enforced by the quadratic term (1/2)‖y − Hz‖², and the properties enforced by the total variation regularizer TV(z), that is, a piecewise smooth image of abundances. The relative weight between the two characteristics of the solution is set by the regularization parameter λ_TV > 0.

To solve the convex optimization problem in Eq. (1), a set of new variables per term of the objective function is introduced and the ADMM methodology [40] has been adopted to decompose very hard problems into a cyclic sequence of simpler problems. Further details on the algorithm implementation and its parallelization can be found in [27], [41].
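The indexing in Algorithm 1 can be cross-checked with a small plain-Python reference model (our sketch, not the paper's code; it assumes flat BIP-ordered arrays as described above, and the all-ones H used in the example is only for checking):

```python
def cs_measure(X, H, lines, samples, bands, q, ws):
    """Reference model of Algorithm 1: per-pixel random projections.

    X: flat list in BIP order, length lines*samples*bands
    H: flat list holding ws*ws measurement matrices, each q x bands
    Returns Y: flat list, length lines*samples*q
    """
    Y = [0] * (lines * samples * q)
    for j in range(lines):
        il = j % ws                      # window row of this pixel
        for i in range(samples):
            ic = i % ws                  # window column of this pixel
            for n in range(q):
                acc = 0
                for b in range(bands):
                    h = H[b + n * bands + (il * ws + ic) * bands * q]
                    x = X[b + (j * samples + i) * bands]
                    acc += h * x
                Y[n + (j * samples + i) * q] = acc
    return Y

# Tiny example: 2x2 image, 4 bands, q = 2, ws = 2.
lines, samples, bands, q, ws = 2, 2, 4, 2, 2
X = list(range(lines * samples * bands))
H = [1] * (ws * ws * q * bands)   # all-ones "random" vectors for checking
Y = cs_measure(X, H, lines, samples, bands, q, ws)
```

With all-ones measurement vectors, each compressed value is simply the sum of that pixel's bands, which makes the index arithmetic easy to verify by hand.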


III. DESIGN AND IMPLEMENTATION OF THE HARDWARE/SOFTWARE ARCHITECTURE

In this section, the hardware/software architecture to run the compressive sensing algorithm is described. The methodology followed to design the hardware architecture consisted of the following steps:
1) Algorithm Optimization - The algorithm was reorganized to improve data accesses from external memory. Loop tiling exploits spatial and temporal locality, allowing data to be accessed in blocks (tiles) and permitting the operations to be executed over a block of data stored in on-chip memory;
2) Architecture Design - Design of a hardware/software architecture where the hardware runs the compute-intensive operations of the algorithm and the processor controls the cycles of the algorithm and the data transfers from/to external memory.

A. Algorithm Optimization

The original algorithm slides sequentially over the input pixels of the hyperspectral image. The problem with this approach is that matrix H is not reused by the next pixel. So, unless the local memory is large enough to hold all ws × ws H matrices, each H matrix is read (lines × samples)/(ws × ws) times from main memory. This introduces a penalty in the execution time of the algorithm.

Since there are no data dependencies between the calculation of different output pixels and there are no constraints on the order in which the output pixels must be produced, a loop tiling technique has been applied to the algorithm which guarantees that each matrix H is only read once from main memory (see Algorithm 2).

In the optimized algorithm, each different matrix H is read only once and used in all pixels at multiples of the window size. Each matrix H is then reused (lines/ws) × (samples/ws) times before the next matrix H is read. Also, each pixel is reused q times in the calculation of the inner products between the pixel and the vectors of matrix H.

Algorithm 2: Compressive sensing algorithm with optimization of memory accesses.
  Input: X - 3D hyperspectral image with (lines × samples × bands) pixels;
         H - matrix of random vectors;
         ws - window size
  Result: Y - compressed image of size lines × samples × q
  1  for r ← 0 to ws − 1 do
  2      for k ← 0 to ws − 1 do
  3          for m ← 0 to q × bands − 1 do
  4              Hlocal[m] ← H[m + (r × ws + k) × bands × q]
  5          end
  6          for j ← r to lines − 1, j = j + ws do
  7              for i ← k to samples − 1, i = i + ws do
  8                  for l ← 0 to bands − 1 do
  9                      Xlocal[l] ← X[l + (j × samples + i) × bands]
 10                  end
 11                  for n ← 0 to q − 1 do
 12                      Y[n + (j × samples + i) × q] ←
                             Σ_{b=0}^{bands−1} Hlocal[b + n × bands] × Xlocal[b]
 13                  end
 14              end
 15          end
 16      end
 17  end

B. FPGA Architecture

This section presents the proposed hardware/software implementation of Algorithm 2 described in the previous section. The architecture is designed to support real-time processing of hyperspectral images acquired from the AVIRIS sensor. AVIRIS is a whiskbroom scanning system that collects data with a 12-bit quantization. Each image contains 614 × 512 pixels comprising 224 spectral bands in the range from 370 to 2500 nm¹. Radiance values are stored, after onboard calibration, as 16-bit integers [42]. Thus, the proposed architecture uses 16-bit short integers (int16), which guarantees enough precision for the algorithm.

The architecture consists of a general-purpose processor (ARM) and a dedicated hardware accelerator to run the core of the algorithm. Algorithm 2 was partitioned between the accelerator and the processor. The processor controls the whole algorithm, namely the cycles in lines 1, 2, 6, and 7 of the algorithm. In lines 3-5 and lines 8-10 the processor configures a set of DMAs (Direct Memory Access) to send matrix H and the input image to the hardware. Lines 11-13 are implemented in hardware, consisting of the inner product calculation. The compressed image is sent back to main memory. The block diagram of the architecture is illustrated in Fig. 1.

Fig. 1. Hardware/Software architecture designed to run the compressive sensing algorithm.

¹[Online] Available: https://aviris.jpl.nasa.gov/html/aviris.instrument.html
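The reuse claim behind Algorithm 2 can be checked with a small Python model (illustrative only; names mirror the pseudocode, and the `h_loads` counter is our addition): the tiled loop order produces exactly the same output as the per-pixel order of Algorithm 1, while loading each of the ws × ws H matrices from main memory only once.

```python
def measure_tiled(X, H, lines, samples, bands, q, ws):
    """Model of Algorithm 2: iterate over tile offsets (r, k) so each
    H matrix is copied to 'on-chip' memory once and reused across the image."""
    Y = [0] * (lines * samples * q)
    h_loads = 0                            # counts H-matrix loads from "main memory"
    for r in range(ws):
        for k in range(ws):
            # lines 3-5: copy one q x bands measurement matrix to local memory
            base = (r * ws + k) * bands * q
            Hlocal = H[base: base + q * bands]
            h_loads += 1
            for j in range(r, lines, ws):          # line 6
                for i in range(k, samples, ws):    # line 7
                    # lines 8-10: copy one pixel to local memory
                    Xlocal = X[(j * samples + i) * bands:
                               (j * samples + i + 1) * bands]
                    # lines 11-13: q inner products for this pixel
                    for n in range(q):
                        Y[n + (j * samples + i) * q] = sum(
                            Hlocal[b + n * bands] * Xlocal[b]
                            for b in range(bands))
    return Y, h_loads

lines, samples, bands, q, ws = 4, 4, 3, 2, 2
X = [(p * 7 + b) % 11 for p in range(lines * samples) for b in range(bands)]
H = [(m % 5) - 2 for m in range(ws * ws * q * bands)]
Y, h_loads = measure_tiled(X, H, lines, samples, bands, q, ws)
```

Here `h_loads` equals ws × ws = 4, whereas the untiled loop order of Algorithm 1 would fetch an H matrix once per pixel unless all of them fit in local memory.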


The architecture contains an ARM processor with access to external memory and to the accelerator implemented in the programmable logic of the FPGA. The transfer of data between external memory and the accelerator in the programmable logic is done through four High-Performance (HP) ports using four DMA (Direct Memory Access) blocks. The Data Dispatch & Interconnect (DPI) block is responsible for forwarding data between DMA buffers and the on-chip memories used to store the matrices of the algorithm to be processed by the inner product module. The DPI block is configured and controlled by the processor.

The architecture can also be implemented in a non-SoC FPGA by replacing the processor of the SoC FPGA with a soft processor or a dedicated controller for a fixed algorithm configuration.

The Inner Product block implements the inner products of the algorithm (lines 11-13). To improve the performance of the solution, multiple values are read from local memories and calculated in parallel (see Fig. 2).

All bands of a pixel can be read and multiplied by an H vector in parallel. More parallelism can be exposed by unrolling the cycle of line 11 (parallel inner products for different q). The inner product block is statically configurable in terms of parallel multipliers and unrolling factor, permitting the architecture to be optimized for a required throughput and the available FPGA resources. When unrolled, multiple inner products are calculated in parallel, one for each value of q. The example illustrated in Fig. 2, and later used in the tested implementations, corresponds to an implementation of the cycle unrolled four times.

The outputs of the multipliers are accumulated with an adder tree. The adder tree guarantees full arithmetic, that is, the adders of each subsequent level have an extra bit to represent the result and keep full precision. The number of levels of the adder tree, l, depends on the number of multipliers, Nmult, that is: l = ⌈log₂ Nmult⌉. The accumulator (ACC) at the end of the adder tree is required if the number of multipliers is lower than the number of bands. In this case, the inner product must be executed in multiple steps and the intermediate results are accumulated in ACC. For example, to process 224 bands with only 112 multipliers, the architecture determines the inner product of the first 112 bands and the result is accumulated with the inner product of the next 112 bands. Therefore, in this case, it takes two steps. To improve the throughput of the circuit, the whole datapath is pipelined, illustrated with gray lines (registers) in the figure.

After calculating the inner products, the results are truncated back to 16 bits before being stored and sent to external memory.

The Data Dispatch and Interconnect block transfers data (H matrix, image) from external memory to on-chip memories (H memory and pixel memory) and from the Y memory (compressed image) to external memory (see Fig. 3).

The data transfer is done through the four High-Performance (HP) ports of the ZYNQ FPGA that allow a total data transfer of up to 4 × 1.2 GBytes per second. Four DMAs are used to do data transfers. These are dynamically configured for specific data transfers (start address and data size) by a central controller that is configured by the ARM processor. Each on-chip memory has an associated address generator that generates the read and write addresses of simple dual-port memories. On-chip memories are dual-port to allow simultaneous reading and writing of data.

All four DMAs are used to read data from external memory to on-chip memory. One DMA is used to transfer the compressed image from the local memory (Y memory) to external memory.

The H and pixel memories store four bands of four different H vectors and pixels, respectively, in each memory write, and the Y memory stores four bands of an output pixel in a single write. H and pixel memories must have a large bandwidth so that multiple values are read in parallel. Therefore, these memories are implemented with a set of distributed memories (BRAMs of the FPGA), each having an output datawidth of 64 bits (see Fig. 4).

Each memory block H stores several vectors of the H matrix and each memory block PixelMem stores several pixels.

The main control block of the data dispatch and interconnect block guarantees the synchronization between data communication and computation. Following Algorithm 2 described previously, the ARM sets the control block to configure the DMAs to transfer the H matrix and the set of pixels to the local memories of the architecture. After transferring the first data, the controller signals the address generators and the inner product block to start execution. At the same time, it configures the DMAs to transfer the next H matrix and the next set of pixels of the input image.

After finishing the operation, the inner product block notifies the controller. If the next H matrix and the first pixels for the next inner product are already available in the on-chip memories, the controller signals the address generators to restart. The process repeats until the end of the algorithm.

IV. EXPERIMENTAL RESULTS

The proposed hardware architecture has been described in VHDL and implemented on a Xilinx Zynq Zedboard with an XC7Z020 SoC FPGA. The hardware design and implementation have been done with Vivado Design Suite 2019.1 and the power of the circuits has been estimated with the Xilinx Power Estimator tool.

The FPGA board has 512 MB of DDR3 memory with a measured 3.3 GB/s of memory bandwidth. This is the memory bandwidth available to transfer the hyperspectral image and the matrices H to the FPGA and the compressed image from the FPGA to the external memory.

The target FPGA is a system on chip that contains a dual-core ARM Cortex-A9 and a reconfigurable area with Artix-7 technology. This family of FPGAs is quite appropriate for developing embedded systems, making these boards ideal for fast prototyping, proof-of-concept development and fast deployment of embedded systems. The programmable logic of the FPGA has 85K logic cells with 106400 registers, 53200 look-up tables (LUTs), 140 BRAMs and 220 digital signal processing blocks (DSP48).


Fig. 2. Architecture of the inner product block of the proposed architecture.

TABLE I
UTILIZATION OF RESOURCES OF THE PROPOSED ARCHITECTURE FOR DIFFERENT CONFIGURATIONS IN TERMS OF NUMBER OF MULTIPLIERS OF THE INNER PRODUCT BLOCK AND FOR SEVERAL VALUES OF q.

  # Multipliers | LUTs  | DSPs | BRAMs (q = 224, 112, 48, 16)
  56 × 4        | 15188 | 220  | 135, 135, 135, 135
  28 × 4        | 10253 | 112  | 107, 79, 79, 79
  14 × 4        | 8430  | 56   | 93, 65, 51, 51
  7 × 4         | 7452  | 28   | 86, 58, 44, 36
  4 × 4         | 7001  | 16   | 83, 55, 41, 30

Fig. 3. Block diagram of the data dispatch and interconnect block.

Fig. 4. Organization of the on-chip memories: H and Pixel memories.

To test the architecture, the DDR memory available in the board was utilized to store the dataset. The ARM processor available in the FPGA was utilized as the processor of the proposed architecture.

The experiments are carried out on the Cuprite AVIRIS scene labeled as f970619t01p02 r02 sc03.a.rfl. This scene has 614 × 512 pixels comprising 224 spectral bands.

A. Area, Performance and Power of the Proposed Architecture

Several implementations of the proposed architecture were designed for different compression rates and with different numbers of multipliers in a pipelined datapath to calculate four output pixels in parallel (unroll factor of four). The utilization of resources after post-place and route is given in Table I. The number of multipliers determines the number of used DSPs, while q determines the number of BRAMs of the architecture. The largest architecture, with the longest critical path, operates at a maximum frequency of 220 MHz. The same operating frequency was considered for all architectures.

The architecture with 224 multipliers was implemented for different compression rates and the execution times to compress a hyperspectral image of size 512 × 614 × 224 were determined (see the results for window sizes of 8 and 64 in Fig. 5).

With a unitary compression, the proposed circuit compresses a hyperspectral image of size 512 × 614 × 224 in 0.35 seconds. The execution time reduces five times with a


0,40 0,40

0,35 0,35
ws = 8 0,30
0,30

Execution Time (s)


ws = 64
Execution Time (s)

0,25
0,25
0,20
0,20
0,15
0,15
0,10
0,10
0,05
0,05
0,00
0,00 1,0 2,0 4,0 8,0 16,0 32,0 64,0
1,0 2,0 4,0 8,0 16,0 32,0 64,0 Compression Rate (bands/q)
Compression Rate (bands/q)
Communication Computation Total

Fig. 5. Execution time of the compression process of a hyperspectral image


(512 × 614 × 224) for different compression rates and window sizes of 8 and Fig. 7. Computation versus communication time of the compression process
64. The circuit runs at 220 MHz in a ZYNQ7020 SoC FPGA. of a hyperspectral image (512 × 614 × 224), for different compression rates
and ws = 64. The circuit runs at 220 MHz in a ZYNQ7020 SoC FPGA.

20,00 4 1,40 100


18,00 3,5 90

Performance Efficiency (%)


1,20
16,00 80

Execution time (s)


3 1,00 70
Execution time (s)

14,00
0,80 60
12,00 2,5 Power (W)
50
0,60
10,00 2 40
8,00 0,40 30
1,5 20
6,00 0,20
10
1
4,00 0,00 0
2,00 0,5 1,0 1,1 1,2 1,3 1,4 1,6 1,8 2,0 2,3 2,8 3,5 4,7 7,0 14,0 28,0 56,0 74,7

0,00 0 Compression Rate (bands/q)


Perf. (224 mult) Perf. (112 mult) Perf. (56 mult)
4 8 16 28 56 112 224
Perf. Eff. (224 mult) Perf. Eff. (112 mult) Perf. Eff. (56 mult)
Number of Multipliers
Performance (q:224) Performance (q:112) Performance (q:48)
Performance (q:16) Aviris Time Power Fig. 8. Performance and performance efficiency of the compression process
of a hyperspectral image (512 × 614 × 224), for different compression ratios
and three different architectures (with 224, 112 and 56 multipliers) running
Fig. 6. Execution time of the compression process of a hyperspectral image at 220 MHz in a ZYNQ7020 SoC FPGA.
(512 × 614 × 224), for different values of q, and for different levels of
architectural parallelism. The circuit runs at 220 MHz in a ZYNQ7020 SoC
FPGA.
compression rate of 14. The circuit has a total power of 3.66 W. Another important observation is that the execution time of the compression algorithm in our architecture decreases with the window size. The figure shows the results for the extreme cases of window sizes: 8 and 64. The largest variations occur for the higher compression rates.

The previous performance results are for the configuration with the highest throughput. However, since the AVIRIS sensor acquires 512 pixels of 224 spectral bands in 8.3 ms [43] (5.1 seconds to process 614 samples), the parallelism can be reduced, which reduces the required hardware resources and the power (see Fig. 6).

As can be observed, with just 16 multipliers in parallel, with q = 224 and a power of 1.94 W, it is possible to process AVIRIS images in real time. This design uses just 7001 LUTs, 16 DSPs and 31 BRAMs. The power is smaller, but the energy increases from 1.3 to 9.6 Joules, since the power associated with the processor is almost constant across hardware architectures. With compression rates above four, the results show that four multipliers are enough to run the algorithm in real time. Consequently, the hardware resources are drastically reduced.

The compression rate determines the requirements in terms of computation and communication. Therefore, the architecture can be optimized in terms of resources considering the compression rate. The execution times of both components, communication and computation, have been determined (see Fig. 7).

For low compression rates the total execution time is determined by the computational performance of the architecture. For compression rates higher than three, the total execution time is determined by the communication performance of the architecture. In this case, the bottleneck is the memory bandwidth.

Given these communication-to-computation ratios, it is important to determine, when designing the architecture, how efficiently the computational resources are used. The metric used to quantify this is performance efficiency: the ratio between measured performance and peak performance, expressed as a percentage.

To analyze the efficiency of the proposed architecture, three architectures with different trade-offs between performance and performance efficiency were implemented for a window size of 64, with 224, 112 and 56 parallel multipliers. In all cases, we measured the execution times for different compression ratios and from these determined the performance efficiency (see Fig. 8).

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.2996679, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
JOURNAL OF LATEX, VOL. XX, NO. X, AUGUST 20XX

As can be observed from the figure, the performance efficiency reduces considerably (from 91% to 31%) with the compression rate when the architecture is designed with 224 multipliers. This is because increasing the compression rate increases the ratio between communication delay and computation delay. When the communication delay is higher than the computation delay, the idle times of the computational units increase and, consequently, the performance efficiency drops.

When the number of multipliers is halved, the computation time doubles, reducing the communication-to-computation ratio. With 112 multipliers, the performance efficiency reduces to 61% at a compression ratio of 14 and to 20% at the highest compression rates. The gain in efficiency is traded off against performance, that is, improving the efficiency reduces the performance, since there are fewer computational resources. With 56 multipliers, the highest performance efficiency is kept up to a compression rate of 14. Since the execution times with a compression rate above 14 are independent of the number of multipliers, the reduction in performance efficiency is due to the communication bottleneck.

The most appropriate architecture depends on the application requirements in terms of performance and energy. The designer should stay as close to the requirement as possible to improve the efficiency of the architecture.

B. Comparison with Other Embedded Computing Platforms

The performance, power and energy consumption of the proposed SoC architecture were compared to other embedded computing platforms, namely the embedded GPU of the Jetson TX2 platform [44] and the ARM processor of the ZYNQ7020 SoC FPGA, in the execution of the compressive sensing algorithm.

Jetson TX2 incorporates a quad-core 2.0-GHz 64-bit ARMv8 A57 processor, a dual-core 2.0-GHz superscalar ARMv8 Denver processor and an integrated low-power embedded Pascal GPU. There are two 2-MB L2 caches, one shared by the four A57 cores and one shared by the two Denver cores. The GPU has two streaming multiprocessors (SMs), each providing 128 1.3-GHz cores that share a 512-KB L2 cache. The six CPU cores and the integrated GPU share 8 GB of 1.866-GHz DRAM memory [45]. The Jetson TX2 typically draws between 7.5 and 15 watts with a voltage input of 5.5-19.6 V DC and requires minimal cooling and additional space.

The processing system side of the ZYNQ7020 device contains a dual-core ARM Cortex-A9 running at 667 MHz. The memory hierarchy consists of a 32-KB level-1 cache for each core, a 512-KB level-2 cache common to both cores, 256 KB of on-chip memory and a memory controller to access the external board memory, with a measured memory bandwidth of 3.3 GBytes/s. The dual-core ARM and caches are integrated in a complete processing system that also includes a NEON media processing engine and a single- and double-precision vector floating-point unit. The NEON engine was not used to run the algorithm.

All platforms run the integer version of the algorithm for a fair comparison. Running the algorithm with integer data on these platforms achieves slightly better performance than running with floating-point arithmetic, which may be caused by improved cache utilization or better compiler optimization.

The real hyperspectral data set used in these experiments, acquired by the AVIRIS sensor, has 614 samples, 512 lines and 224 bands. The window size is set to 64 (see results in Table II).

TABLE II
COMPARISON OF THE DELAY OF THE PROPOSED ARCHITECTURE AGAINST AN EMBEDDED GPU AND A DUAL-CORE ARM PROCESSOR RUNNING COMPRESSIVE SENSING OVER AN AVIRIS SENSOR IMAGE WITH 512 LINES, 614 SAMPLES, 224 BANDS AND A WINDOW SIZE OF 64.

Compression Rate   Embedded GPU (s)   ARM (s)   This work (s)
1.0                17.1               75.8      0.35
1.1                15.2               71.2      0.33
1.2                14.0               65.3      0.31
1.3                12.8               60.3      0.29
1.4                11.9               55.4      0.27
1.6                10.8               50.2      0.25
1.8                9.1                45.3      0.22
2.0                8.3                40.1      0.20
2.3                7.1                35.0      0.18
2.8                5.9                29.8      0.16
3.5                5.0                24.7      0.14
4.7                4.1                19.6      0.12
7.0                1.9                14.6      0.09
14.0               1.1                9.9       0.07

The results show that the proposed hardware/software architecture is 49 times faster than the implementation on the embedded GPU and 216 times faster than the solution with the dual-core ARM processor.

Considering power and energy consumption, the proposed solution is also better (see energy results in Table III). However, it should be noted that the GPU and ARM implementations were not subject to the same development effort, so their results can potentially be improved, reducing the gap to the FPGA solution.

Since the SoC FPGA needs less power and executes the algorithm faster, its energy consumption is 77 to 119 times lower than that of the embedded GPU and 99 to 146 times lower than that of the ARM processor.

C. Comparison with Other FPGA-based Platforms

As far as we know, there is no previous implementation of compressive sensing on FPGA. Most of the related FPGA work on compressive sensing concerns the reconstruction part, which can be computed at the ground station. However, there are a few FPGA designs for the CCSDS 123 recommendation used to compress hyperspectral images. A comparison of our work against implementations of recommendation CCSDS 123 is given in Table IV.

The reported resources and power of the proposed architecture are for the whole system, including the data dispatch and interconnect module, so in general it reports more resources.
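A first-order model helps see why the bottleneck shifts from computation to communication as the compression rate grows. The sketch below assumes one multiply-accumulate per band per measurement, 16-bit samples, and the 3.3 GB/s measured memory bandwidth quoted above; these costs and the function name are assumptions, the phases are treated as non-overlapping, so the model reproduces the trend rather than the exact crossover point:

```python
# Illustrative first-order timing model (assumed cost model, not the
# authors' exact architecture): the slower of the compute and
# communication phases dominates the total time.
def execution_time(lines=512, samples=614, bands=224, q=16,
                   multipliers=224, f_clk=220e6, mem_bw=3.3e9):
    pixels = lines * samples
    # Computation: bands * q multiply-accumulates per pixel, spread
    # over the parallel multipliers at the circuit clock frequency.
    t_comp = pixels * bands * q / (multipliers * f_clk)
    # Communication: read all bands and write q measurements per pixel,
    # assuming 16-bit (2-byte) words over the external memory.
    t_comm = pixels * (bands + q) * 2 / mem_bw
    return max(t_comp, t_comm), t_comp, t_comm

total, t_comp, t_comm = execution_time(q=16)   # compression rate 14
bottleneck = "communication" if t_comm > t_comp else "computation"
```

Under this model, a full-rate configuration (q = 224 with 224 multipliers) is compute-bound while a high-rate one (q = 16) is communication-bound, matching the behavior of Figs. 7 and 8; with only 16 multipliers and q = 224, the modeled time stays under the 5.1-second AVIRIS acquisition budget, consistent with the real-time discussion above.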


TABLE III
COMPARISON OF THE ENERGY OF THE PROPOSED ARCHITECTURE AGAINST AN EMBEDDED GPU AND A DUAL-CORE ARM PROCESSOR RUNNING COMPRESSIVE SENSING OVER AN AVIRIS SENSOR IMAGE WITH 512 LINES, 614 SAMPLES, 224 BANDS AND A WINDOW SIZE OF 64.

Compression Rate   Embedded GPU (J)   ARM (J)   This work (J)
1.0                154.7              128.9     1.3
1.1                135.0              120.9     1.2
1.2                126.0              111.0     1.1
1.3                114.4              96.6      1.0
1.4                103.2              88.5      0.9
1.6                91.3               80.3      0.9
1.8                73.8               72.3      0.7
2.0                65.6               64.2      0.7
2.3                56.7               52.5      0.6
2.8                48.6               44.9      0.5
3.5                40.0               37.1      0.4
4.7                32.0               29.4      0.3
7.0                15.6               22.1      0.2
14.0               7.7                14.6      0.1

Fig. 9. Peak Signal to Noise Ratio (PSNR) in decibels (dB) for different compression ratios. Proposed architecture (solid line); work [50] (dashed line).

The power figures in [48] and [49] refer only to the dedicated hardware block and do not include the power of the processor and of the data transfers. The proposed architecture can achieve a compression ratio of 14, whereas the reported CCSDS implementations have a compression factor lower than 4.6. The highest throughputs are also achieved by the proposed architecture: with a compression rate of 14, the throughput is 56% better than the throughput reported in [49].

Another aspect of the previous architectures is that it is not clear how data is sent to and received from the main computing core. A large percentage of the area and energy consumed by the proposed architecture is due to the hardware for data communication.

Also, the results of the proposed architecture are for the highest throughput. However, as shown previously, the required resources reduce drastically if the main goal is only to achieve enough performance for real-time operation.

D. Comparison with Other Methods

The performance of the proposed SoC architecture is compared with other methods, namely the compressive sensing method called SPECA, introduced in [28], and a parallel version of HYCA, introduced in [27]. Tables V and VI compare the execution times of the methods. In [27] the method is tested on three different platforms: first, an Intel i7-4770K CPU at 3.50 GHz with four physical cores and 32 GB of DDR3 RAM; second, a GPU on an NVidia GeForce GTX 590 board, which features 1024 processor cores operating at 1.215 GHz and 3072 MB of dedicated memory; and finally, a GPU on an NVidia GeForce GTX TITAN board, which features 2688 processor cores operating at 876 MHz and 6144 MB of dedicated memory. In [28], SPECA is tested on an Intel i7-4790 CPU at 3.6 GHz connected to 32 GB of RAM and on an NVidia GeForce GTX 980 GPU, which contains 16 multiprocessors with 128 CUDA cores each at 1.33 GHz and 4 GB of memory. From the results presented, one can conclude that, for the same compression ratio, the execution time of the proposed work is lower than that of the other methods.

The accuracy of the proposed SoC architecture is compared with the lossy compression method introduced in [50]. This method employs a prediction-based scheme, with quantization and rate-distortion optimization, using a technique with low complexity in terms of memory and computational requirements.

The experiments are carried out using the Yellowstone AVIRIS image labeled as "Scene0" from flight f060925t01p00r12. Fig. 9 presents the accuracy in terms of the peak signal-to-noise ratio (PSNR) for different compression ratios. For the proposed architecture, the image reconstruction is done with the P-HYCA reconstruction algorithm [27] on a desktop computer. As shown in the figure, the proposed architecture has a PSNR higher than 80 dB for compression ratios smaller than 30×. The PSNR starts dropping for compression ratios higher than 50×. The results reported for [50] are better in terms of PSNR for compression ratios higher than 75×.

V. CONCLUSIONS

On-board processing systems have recently emerged to cope with the huge amount of data to transfer from the satellite to the ground station. Hyperspectral imagery is a remote sensing technology that can benefit from on-board processing. This paper proposes a hardware/software FPGA-based architecture for compressive sensing of hyperspectral images.

The original algorithm was reorganized to improve the accesses to data stored in external memory. The proposed architecture has been designed on a Xilinx Zynq board with a Zynq-7020 SoC FPGA. Experimental results with real hyperspectral datasets indicate that the proposed implementation can fulfill real-time requirements with low resources in a low-cost SoC FPGA. The architecture is also around 100 times more energy efficient when compared to a software-only solution and to an embedded GPU.
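The roughly 100-times energy advantage can be checked directly against Table III; the snippet below copies the table's energy columns (values from the paper) and computes the GPU/FPGA and ARM/FPGA ratios:

```python
# Energy values (Joules) copied from Table III, for compression
# rates 1.0 ... 14.0 (first to last row).
fpga = [1.3, 1.2, 1.1, 1.0, 0.9, 0.9, 0.7,
        0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
gpu = [154.7, 135.0, 126.0, 114.4, 103.2, 91.3, 73.8,
       65.6, 56.7, 48.6, 40.0, 32.0, 15.6, 7.7]
arm = [128.9, 120.9, 111.0, 96.6, 88.5, 80.3, 72.3,
       64.2, 52.5, 44.9, 37.1, 29.4, 22.1, 14.6]

# Row-by-row energy ratios relative to the FPGA solution.
gpu_ratio = [g / f for g, f in zip(gpu, fpga)]
arm_ratio = [a / f for a, f in zip(arm, fpga)]
# At the table's endpoints: GPU/FPGA is about 119x at rate 1.0 and 77x at
# rate 14.0; ARM/FPGA is about 99x at rate 1.0 and 146x at rate 14.0,
# matching the ranges quoted in the text.
```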


TABLE IV
COMPARISON OF THE PROPOSED ARCHITECTURE AGAINST CCSDS-123 IMPLEMENTATIONS.

Work                    FPGA Device      Freq (MHz)   Throughput (Mb/s)   Power (W)   Compression   LUTs    DSPs   BRAMs
Santos et al. [46]      Virtex5 FX130T   140          2240                —           3.45          4645    11     74
Tsigkanos et al. [47]   Virtex5 FX130T   213          3300                4.7         —             9462    6      83
Fjeldtvedt et al. [48]  ZYNQ7020         147          2350                0.3         —             5872    6      84
Orlandic et al. [49]    ZYNQ7020         150          12000               0.5         —             14709   7      37
This work               ZYNQ7020         220          6259                3.7         2             15188   220    135
This work               ZYNQ7020         220          14083               3.7         4.7           15188   220    135
This work               ZYNQ7020         220          18778               2.6         14            10253   112    107

TABLE V
COMPARISON OF THE EXECUTION TIMES (IN SECONDS) FOR THE COMPRESSIVE SENSING ALGORITHMS, FOR A COMPRESSION RATE OF 14.6, CONSIDERING AN IMAGE WITH 512 LINES, 614 SAMPLES AND 224 BANDS.

Work        Device              Time (s)
This work   ZYNQ7020            0.07
[27]        Intel i7-4770K      5.99
[27]        GeForce GTX 590     0.135
[27]        GeForce GTX TITAN   0.107

TABLE VI
COMPARISON OF THE EXECUTION TIMES (IN SECONDS) FOR THE COMPRESSIVE SENSING ALGORITHMS, FOR A COMPRESSION RATE OF 7.7, CONSIDERING AN IMAGE WITH 512 LINES, 614 SAMPLES AND 224 BANDS.

Work        Device            Time (s)
This work   ZYNQ7020          0.09
[28]        Intel i7-4790     3.07
[28]        GeForce GTX 980   0.32

Since the FPGA permits configuring the architecture for other custom bitwidths, the algorithm and the architecture can be further optimized for specific bitwidths with less than 16 bits. However, this requires an additional analysis to determine how the bitwidth reduction influences the signal-to-noise ratio.

ACKNOWLEDGMENT

This work was supported by Instituto de Telecomunicações and Fundação para a Ciência e a Tecnologia (FCT) under Project UID/EEA/50008/2019, Project FIREFRONT with reference PCIF/SSI/0096/2017, and by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UIDB/50021/2020.

REFERENCES

[1] T. Adão, J. Hruška, L. Pádua, J. Bessa, E. Peres, R. Morais, and J. J. Sousa, "Hyperspectral imaging: A review on UAV-based sensors, data processing and applications for agriculture and forestry," Remote Sensing, vol. 9, no. 11, 2017. [Online]. Available: https://www.mdpi.com/2072-4292/9/11/1110
[2] S. B. Tombet, F. Marcotte, É. Guyot, M. Chamberland, and V. Farley, "Toward UAV based compact thermal infrared hyperspectral imaging solution for real-time gas detection identification and quantification (Conference Presentation)," in Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XX, J. A. Guicheteau and C. R. Howle, Eds., vol. 11010, International Society for Optics and Photonics. SPIE, 2019. [Online]. Available: https://doi.org/10.1117/12.2521191
[3] P. W. Yuen and M. Richardson, "An introduction to hyperspectral imaging and its application for security, surveillance and target acquisition," The Imaging Science Journal, vol. 58, no. 5, pp. 241–253, 2010. [Online]. Available: https://doi.org/10.1179/174313110X12771950995716
[4] H. Fabelo, S. Ortega, R. Guerra, G. Callicó, A. Szolna, J. F. Piñeiro, M. Tejedor, S. López, and R. Sarmiento, "A novel use of hyperspectral images for human brain cancer detection using in-vivo samples," in Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: Smart-BIODEV, (BIOSTEC 2016), INSTICC. SciTePress, 2016, pp. 311–320.
[5] X. Fu and J. Chen, "A review of hyperspectral imaging for chicken meat safety and quality evaluation: Application, hardware, and software," Comprehensive Reviews in Food Science and Food Safety, vol. 18, no. 2, pp. 535–547, 2019. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/1541-4337.12428
[6] Y.-Z. Feng and D.-W. Sun, "Application of hyperspectral imaging in food safety inspection and control: A review," Critical Reviews in Food Science and Nutrition, vol. 52, no. 11, pp. 1039–1058, 2012, PMID: 22823350. [Online]. Available: https://doi.org/10.1080/10408398.2011.651542
[7] J. Yang, D. W. Messinger, and R. R. Dube, "Bloodstain detection and discrimination impacted by spectral shift when using an interference filter-based visible and near-infrared multispectral crime scene imaging system," Optical Engineering, vol. 57, no. 3, pp. 1–10, 2018. [Online]. Available: https://doi.org/10.1117/1.OE.57.3.033101
[8] B. Li, P. Beveridge, W. T. O'Hare, and M. Islam, "The application of visible wavelength reflectance hyperspectral imaging for the detection and identification of blood stains," Science & Justice, vol. 54, no. 6, pp. 432–438, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1355030614000586
[9] P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, "Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art," IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 37–78, Dec. 2017.
[10] X. Ceamanos and S. Valero, "4 - Processing hyperspectral images," in Optical Remote Sensing of Land Surface, N. Baghdadi and M. Zribi, Eds. Elsevier, 2016, pp. 163–200. [Online]. Available: http://www.sciencedirect.com/science/article/pii/B9781785481024500041
[11] R. Guerra, Y. Barrios, M. Díaz, L. Santos, S. López, and R. Sarmiento, "A new algorithm for the on-board compression of hyperspectral images," Remote Sensing, vol. 10, no. 3, p. 428, 2018.
[12] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[13] E. J. Candes and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, March 2008.
[14] Y. Wu, M. Rosca, and T. Lillicrap, "Deep compressed sensing," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. Long Beach, California, USA: PMLR, 09–15 Jun 2019, pp. 6850–6860. [Online]. Available: http://proceedings.mlr.press/v97/wu19d.html
[15] M. F. Duarte and Y. C. Eldar, "Structured compressed sensing: From theory to applications," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, Sep. 2011.
[16] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: Sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, Jan. 2014.


[17] J. M. Bioucas-Dias and J. M. P. Nascimento, "Hyperspectral Subspace Identification," IEEE Trans. Geosci. Remote Sensing, vol. 46, no. 8, pp. 2435–2445, 2008.
[18] J. M. P. Nascimento and J. M. Bioucas-Dias, "Does Independent Component Analysis Play a Role in Unmixing Hyperspectral Data?" IEEE Trans. Geosci. Remote Sensing, vol. 43, no. 1, pp. 175–187, 2005.
[19] C. R. Berger, Z. Wang, J. Huang, and S. Zhou, "Application of compressive sensing to sparse channel estimation," IEEE Communications Magazine, vol. 48, no. 11, pp. 164–174, November 2010.
[20] Y. Oiknine, I. August, V. Farber, D. Gedalin, and A. Stern, "Compressive sensing hyperspectral imaging by spectral multiplexing with liquid crystal," Journal of Imaging, vol. 5, no. 1, p. 3, 2019.
[21] P. Xu, B. Chen, L. Xue, J. Zhang, and L. Zhu, "A prediction-based spatial-spectral adaptive hyperspectral compressive sensing algorithm," Sensors, vol. 18, no. 10, p. 3289, 2018.
[22] G. Martín, J. M. Bioucas-Dias, and A. Plaza, "HYCA: A new technique for hyperspectral compressive sensing," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2819–2831, May 2015.
[23] Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu, "HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections," in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017, pp. 518–525.
[24] I. Choi, D. S. Jeon, G. Nam, D. Gutierrez, and M. H. Kim, "High-quality hyperspectral reconstruction using a spectral prior," ACM Trans. Graph., vol. 36, no. 6, pp. 218:1–218:13, Nov. 2017. [Online]. Available: http://doi.acm.org/10.1145/3130800.3130810
[25] L. Zhang, W. Wei, Y. Zhang, C. Shen, A. van den Hengel, and Q. Shi, "Dictionary learning for promoting structured sparsity in hyperspectral compressive sensing," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7223–7235, Dec. 2016.
[26] G. Martín and J. M. Bioucas-Dias, "Hyperspectral blind reconstruction from random spectral projections," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 6, pp. 2390–2399, 2016.
[27] S. Bernabe, G. Martín, J. Nascimento, J. Bioucas-Dias, A. Plaza, and V. Silva, "Parallel hyperspectral coded aperture for compressive sensing on GPUs," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 2, pp. 932–944, 2015.
[28] J. Sevilla, G. Martín, J. Nascimento, and J. Bioucas-Dias, "Hyperspectral image reconstruction from random projections on GPU," in Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, 2016, pp. 280–283.
[29] J. Sevilla, G. Martín, and J. M. Nascimento, "Parallel hyperspectral image reconstruction using random projections," in SPIE Remote Sensing. International Society for Optics and Photonics, 2016, pp. 1000707–1000707-9.
[30] J. M. Haut, S. Bernabé, M. E. Paoletti, R. Fernandez-Beltran, A. Plaza, and J. Plaza, "Low–high-power consumption architectures for deep-learning models applied to hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 776–780, May 2019.
[31] M. Díaz, R. Guerra, P. Horstrand, E. Martel, S. López, J. F. López, and R. Sarmiento, "Real-time hyperspectral image compression onto embedded GPUs," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 8, pp. 2792–2809, Aug. 2019.
[32] A. Rodríguez, L. Santos, R. Sarmiento, and E. D. L. Torre, "Scalable hardware-based on-board processing for run-time adaptive lossless hyperspectral compression," IEEE Access, vol. 7, pp. 10644–10652, 2019.
[33] D. Valsesia and E. Magli, "High-throughput onboard hyperspectral image compression with ground-based CNN reconstruction," IEEE Transactions on Geoscience and Remote Sensing, vol. PP, July 2019.
[34] D. Báscones, C. González, and D. Mozos, "FPGA implementation of the CCSDS 1.2.3 standard for real-time hyperspectral lossless compression," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 4, pp. 1158–1165, April 2018.
[35] J. Fjeldtvedt, M. Orlandić, and T. A. Johansen, "An efficient real-time FPGA implementation of the CCSDS-123 compression standard for hyperspectral images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3841–3852, Oct. 2018.
[36] A. Plaza, Q. Du, Y.-L. Chang, and R. King, "High performance computing for hyperspectral remote sensing," IEEE J. Sel. Topics Appl. Earth Observations Remote Sens., vol. 4, no. 3, pp. 528–544, 2011.
[37] H. Quinn, "Radiation effects in reconfigurable FPGAs," Semiconductor Science and Technology, vol. 32, no. 4, p. 044001, Mar. 2017.
[38] M. Wirthlin, "High-reliability FPGA-based systems: Space, high-energy physics, and beyond," Proceedings of the IEEE, vol. 103, no. 3, pp. 379–389, March 2015.
[39] S. Lopez, T. Vladimirova, C. Gonzalez, J. Resano, D. Mozos, and A. Plaza, "The promise of reconfigurable computing for hyperspectral imaging onboard systems: A review and trends," Proceedings of the IEEE, vol. 101, no. 3, pp. 698–722, March 2013.
[40] J. Eckstein and D. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Mathematical Programming, vol. 5, pp. 293–318, 1992.
[41] G. Martín, J. M. Bioucas-Dias, and A. Plaza, "HYCA: A new technique for hyperspectral compressive sensing," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2819–2831, May 2015.
[42] A. B. Kiely and M. A. Klimesh, "Exploiting calibration-induced artifacts in lossless compression of hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 8, pp. 2672–2678, 2009.
[43] R. Green, M. Eastwood, C. Sarture et al., "Imaging Spectroscopy and the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," Rem. Sens. of the Environ., vol. 65, no. 3, pp. 227–248, 1998.
[44] J. M. P. Nascimento and M. Véstias, "Hyperspectral compressive sensing: a comparison of embedded GPU and ARM implementations," in Emerging Imaging and Sensing Technologies for Security and Defence IV, G. S. Buller, R. C. Hollins, R. A. Lamb, and M. Laurenzis, Eds., vol. 11163, International Society for Optics and Photonics. SPIE, 2019, pp. 88–97. [Online]. Available: https://doi.org/10.1117/12.2532581
[45] T. Amert, N. Otterness, M. Yang, J. H. Anderson, and F. D. Smith, "GPU scheduling on the NVIDIA TX2: Hidden details revealed," in 2017 IEEE Real-Time Systems Symposium (RTSS), Dec. 2017, pp. 104–115.
[46] L. Santos, L. Berrojo, J. Moreno, J. F. López, and R. Sarmiento, "Multispectral and hyperspectral lossless compressor for space applications (HyLoC): A low-complexity FPGA implementation of the CCSDS 123 standard," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 2, pp. 757–770, Feb. 2016.
[47] A. Tsigkanos, N. Kranitis, G. A. Theodorou, and A. Paschalis, "A 3.3 Gbps CCSDS 123.0-B-1 multispectral hyperspectral image compression hardware accelerator on a space-grade SRAM FPGA," IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2018.
[48] J. Fjeldtvedt, M. Orlandić, and T. A. Johansen, "An efficient real-time FPGA implementation of the CCSDS-123 compression standard for hyperspectral images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3841–3852, Oct. 2018.
[49] M. Orlandic, J. Fjeldtvedt, and T. A. Johansen, "A parallel FPGA implementation of the CCSDS-123 compression algorithm," Remote Sensing, vol. 11, p. 673, 2019.
[50] A. Abrardo, M. Barni, and E. Magli, "Low-complexity predictive lossy compression of hyperspectral and ultraspectral images," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 797–800.

José Nascimento received the Ph.D. degree in electrical and computer engineering from the Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal, in 2006. Currently, he is a Professor at the Instituto Superior de Engenharia de Lisboa and a researcher at the Instituto de Telecomunicações, Lisbon, Portugal. He has contributed to more than 60 journal papers, international conference papers, and book chapters. He is currently serving as a reviewer for several international journals and has also been a member of the program/technical committees of several international conferences. His current research interests include remote sensing, image processing, and high-performance computing.


Mário P. Véstias is a Coordinator Professor at the Polytechnic Institute of Lisbon, School of Engineering (ISEL), Department of Electronic, Telecommunications and Computer Engineering (DEETC). He is a senior researcher in the Electronic Systems Design and Automation group at the research institute INESC-ID in Lisbon. His main research interests are computer architectures and digital systems for high-performance embedded computing, with an emphasis on reconfigurable computing. He holds a PhD in Electrical and Computer Engineering from the Technical University of Lisbon.

Gabriel Martín received the PhD degree in Computer Engineering from the University of Extremadura, Cáceres, Spain, in 2013. He obtained several prizes for his PhD dissertation, such as the "Best Iberian PhD Dissertation in Information Systems and Technologies", awarded by the Iberian Association for Information Systems and Technologies, and the "Outstanding PhD Dissertation award" of the University of Extremadura. He was a Predoctoral Research Associate (funded by the Spanish Ministry of Science and Innovation) with the Hyperspectral Computing Laboratory and a Postdoctoral Researcher at the Instituto de Telecomunicações, Lisbon, Portugal. Currently, he works as a senior performance analytics engineer at Atrio Inc. His research interests include hyperspectral image processing, specifically the areas of unmixing and compressive sensing of hyperspectral images, as well as the efficient processing of these images on high-performance computing architectures such as GPUs. He is a co-author of a patent on a portable performance analytics system. He has authored or co-authored more than 60 publications, including several book chapters, journal citation report (JCR) papers and peer-reviewed international conference papers. Dr. Martín has served as a reviewer for the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing and the IEEE Transactions on Geoscience and Remote Sensing.
