2023 - Case-Study For Integration of COTS SoC Devices in Reliable Space Systems For On-Board Processing

Case-Study for Integration of COTS SoC Devices in Reliable Space Systems for On-Board Processing


Ivan Rodriguez-Ferrandez∗†‡, David Steenari‡, Maris Tali‡, Leonidas Kosmidis†∗, Ferdinando Tonicello‡
∗ Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
† Barcelona Supercomputing Center (BSC), Barcelona, Spain
‡ European Space Agency (ESA), Noordwijk, The Netherlands

Abstract—Recent trends in Data Handling Systems (DHS) include increased data rates, in-orbit reconfiguration and the introduction of advanced On-Board Processing (OBP) methods to extract actionable information on-board with low latency. At the same time, the NewSpace industry has successfully deployed COTS-based DHS and processing equipment in space. In particular for OBP, COTS-based processors and FPGAs can offer higher computational performance than space-qualified equivalents. This in turn can enable new applications through more advanced OBP algorithms. The individual component cost is also lower for COTS than for space-qualified components, which can allow overall cost optimisation at mission level. An internal working group at ESA has studied the concept of using "safety barriers" to prevent failures from propagating from functions implemented with COTS to other on-board units. Thanks to these ESA studies, there is a path to use Class IV equipment on missions of higher class (Class I-III) through hardware "safety barriers" that limit failure propagation from Commercial Off-The-Shelf (COTS) equipment. In our work, we take the above-mentioned concepts and apply them to reference implementations. These target payload processing modules based on standard form factors (ADHA) and automotive-grade GPU Systems-on-Chip (SoCs). In addition, we make use of previous work on improving the in-flight availability of complex COTS SoC processors through system and software Fault Detection, Isolation and Recovery (FDIR) techniques.

Index Terms—Data Handling Systems, On-Board Processing, Space Systems, Reliable Systems

© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. https://dx.doi.org/10.23919/EDHPC59100.2023.10396004

I. INTRODUCTION

Recent trends in DHS include increased data rates, in-orbit reconfiguration and the introduction of advanced OBP methods. These methods use Artificial Intelligence (AI) algorithms to extract actionable information on-board with low latency [1] [2] [3] [4]. The European Space Agency (ESA) and the European space industry have recognised the importance of ensuring the availability of mature, high-performance DHS equipment for institutional Earth Observation (EO) missions. To achieve this, the ADHA modularity concept has been introduced [5].

Meanwhile, the NewSpace industry has successfully demonstrated the deployment of Commercial Off-The-Shelf (COTS)-based DHS and processing equipment in space. Such equipment offers higher computational performance and lower individual component costs than space-qualified equivalents [6].

One identified issue with the use of COTS equipment is the possible propagation of failures to external equipment. To address this concern, the ESA COTS EEE Working Group has proposed hardware safety barriers as a means of containing potential failures [7].

Safety barriers isolate the COTS equipment from other mission-critical components. This mitigates the risks of using non-space-qualified components while still taking advantage of their lower cost and higher performance.

To explore the potential of these concepts in real-world scenarios, we present a case study that focuses on standard form factors for payload processing modules using automotive-grade embedded GPU SoCs.

Moreover, we propose a Software Safety Barrier concept. It uses a separate, reliable processor running qualified software with error checking to limit application-level error propagation and ensure system availability. In addition, we propose an extension to the ESA hardware safety barrier to manage single event latch-up (SEL) in COTS devices with highly variable power consumption.

Finally, we apply the aforementioned solutions to three hardware form factors: 6U-ADHA, 3U-ADHA and CubeSat. Our goal is to provide insights into the potential of using COTS-based systems and software/hardware safety barriers in the design of DHS equipment for future space missions.

II. BACKGROUND

A. Data Processing Units (DPUs) and OBP

As part of the DHS, Data Processing Units (DPUs) are often responsible for pre-processing, data reduction and compression in space missions. In certain cases, a DPU can be integrated with the Instrument Controller Unit (ICU). Usually, DPUs interface directly with one or multiple instrument front-ends: they receive raw payload data, process it and transfer it to the Mass-Memory and Formatting Unit (MMFU).

In smaller missions such as micro-satellites or CubeSats, a single payload computer can integrate all the aforementioned functions: instrument interfacing, data processing and mass-memory storage.

We summarise some of the major challenges that DPU designers face today:
• In-flight reconfigurability requirements of On-Board Processing algorithms
• High instrument data rates (ranging from tens of Mbit/s to multiple Gbit/s, depending on the mission size)
• Requirements for high energy efficiency, i.e. high performance within a limited power envelope

B. Advanced Data Handling Architecture (ADHA)

ESA has initiated the ADHA-2 (Advanced Data Handling Architecture) studies together with European Large Space Integrators (LSIs) and DHS equipment suppliers [5].

ADHA specifies the module form factor and the backplane definition. The backplane connector is based on Compact PCI Serial Space (cPCI-S-S), but with a revised pin-out definition. This revision adds redundancy features typically required in DHS equipment used in ESA missions. The backplane uses Controller Area Network (CAN), SpaceWire and SpaceFibre interfaces for control and data transfer.

At module (board) level, two form factors have been defined, equivalent to Eurocard-3U and Eurocard-6U. At unit level, several standard functions have been defined, including the On-Board Computer (OBC) module, power module, mass-memory module and Co-Processing Module (CPM). An ADHA-based DPU or payload computer can be built from a system controller and multiple CPMs.

C. Use of COTS in ESA Class IV Missions

Recently, ESA started considering the use of COTS devices in Class IV missions. This class includes lower-cost missions targeting micro-satellites or constellations. To bring innovation to this class of missions, the ESA COTS EEE Working Group is working towards standardising the use of these new COTS devices. As part of this process, it defines categories, testing guidelines and rules for their interaction with space-qualified components. These rules always ensure that a COTS system will not harm other subsystems. The outcome of this effort is included in the update of the ECSS-Q-ST-60-13 standard [8].

This effort includes the definition of the Q1 and Q2 COTS component classes, for components that lack lot traceability [9]. This new classification creates a framework for using NewSpace products based on COTS within institutional missions with limited budgets.

The principles for Q1 can be summarised as follows:
1) Use automotive-grade components when possible.
2) Perform proton radiation testing at component level for single event effects (SEE) characterisation, focusing on SEL and non-recoverable single event functional interrupts (SEFI).
3) Perform total ionising dose (TID) radiation testing at component or board level.

D. Hardware Safety Barrier for COTS Equipment

Due to the rapid development of technology, the performance and cost gap between space-qualified components and COTS devices has widened. This creates an interest in using these components in some parts of a mission to increase mission return and reduce development costs. This is why ESA has been working on a set of procedures to allow these components to be used in higher-class missions, not only in NewSpace ones (Class V). The first step is their use in Class IV missions. In the future, their use may be extended to even higher-class missions (Class I-III).

This work is carried out by the ESA COTS EEE components and modules Working Group [9], which developed the ESA COTS guidelines to define a set of rules that limit failure propagation. Two measures achieve this: local power regulation on the power buses, and external drivers on all interfaces to limit failure propagation through communication interfaces. To accomplish this, reference designs are proposed for the power delivery to the COTS device and for voltage clamps on low-speed serial lines.

E. Automotive-grade COTS GPU-SoCs

Among all the hardware options for COTS devices, automotive embedded Graphics Processing Units (GPUs) have shown great potential in terms of both performance and safety features [6]. Driven by autonomous driving applications, these devices have been designed for high performance. They also include reliability features to meet the safety requirements defined by the automotive functional safety standard ISO 26262 [10].

However, whether GPU-SoCs can be used to accelerate space algorithms remained an open question until recently. The recent ESA-funded activity GPU4S (GPUs for Space) [6] developed the OBPMark open-source benchmark suite [11]. OBPMark is currently the standard software for testing new payload processing computers at ESA. It supports both sequential and parallel implementations for multicore CPUs and GPUs. This allows us to evaluate the performance benefits of GPUs for specific space-related algorithms. In particular, it has been shown that high performance can be achieved on these GPU platforms with reasonable programming effort [12].

Despite these benefits, several challenges must be overcome to use such devices in space: a) module integration in spacecraft systems, b) radiation characterisation and c) latch-up protection.

From a general integration perspective, these high-performance GPU SoCs are available as Systems-on-Module (SoMs). This allows fast integration, because the supplier has already solved the most challenging parts of the system, such as DRAM, boot sequencing and power distribution. For space use, however, this is a complication, because the designer cannot choose the auxiliary components. As a consequence, workarounds and an in-depth study of the module are required. These identify all possible failure modes and how to isolate the system from the rest of the spacecraft.

Regarding radiation testing, there is already considerable effort to characterise such systems. Examples include E.J. Wyrwas et al. [13] with the NVIDIA TX2, studies of the newer generation of this class of embedded GPUs, the NVIDIA Xavier [14], and Guertin et al. [15] with the Snapdragon 820, which was used in the Mars Helicopter Ingenuity [16].
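The Q1 principles of Section II-C, together with the component-selection rule given later in Section III-A, amount to a simple screening decision. The sketch below illustrates that decision under our own assumptions and is not part of the ECSS or ESA guidelines. The `Q1TestReport` fields and the verdict strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Q1TestReport:
    """Hypothetical summary of the Q1 evidence gathered for one candidate part."""
    automotive_grade: bool = False      # principle 1
    proton_see_tested: bool = False     # principle 2: component-level proton SEE test
    tid_tested: bool = False            # principle 3: TID test (component or board)
    destructive_sel: bool = False       # SEL that permanently damages the part
    non_destructive_sel: bool = False   # SEL cleared by a power cycle
    non_recoverable_sefi: bool = False  # e.g. corruption of boot memory
    application_enabling: bool = False  # performance the use case cannot obtain otherwise


def screen_component(r: Q1TestReport) -> Tuple[str, List[str]]:
    """Return a verdict and advisory notes for one candidate component.

    Verdicts (hypothetical): 'incomplete', 'reselect',
    'accept_with_justification', 'accept'.
    """
    notes: List[str] = []
    if not r.automotive_grade:
        # Principle 1 is a preference ("when possible"), so it only adds a note.
        notes.append("not automotive-grade: check for an automotive-grade alternative")
    if not (r.proton_see_tested and r.tid_tested):
        # No verdict is possible until both radiation campaigns are done.
        return "incomplete", notes
    if r.destructive_sel or r.non_recoverable_sefi:
        # Iterate component selection and radiation testing.
        return "reselect", notes
    if r.non_destructive_sel:
        # Tolerated only in rare, use-case-driven cases.
        if r.application_enabling:
            return "accept_with_justification", notes
        return "reselect", notes
    return "accept", notes
```

A part with non-destructive SEL is sent back to selection unless its performance is application-enabling. This mirrors the rule that such cases are rare and driven entirely by the use case.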
All of these tests have yielded promising reliability data for such systems. Finally, in Section III-C we address the problem of latch-up protection by proposing a state-aware delatcher.

III. PROPOSED SOLUTIONS

In this Section we summarise our proposals to address the aforementioned issues. The proposed solutions are:
1) Use of Q1 components.
2) Implementation of hardware safety barriers at module level.
3) Extension of the hardware safety barrier system.
4) Implementation of software safety barriers at module level.

Figure 1 gives a general view of how the proposed solutions work for the complete system.

[Figure: block diagram with the labels "Sw. Safety Barrier", "COTS Device", "Monitor Processor", "App. Processor", "Hw. Accelerators", "Monitor I/O", "FPGA", "P/Monitor", "Hw. Safety Barrier Extension", "Hw. Safety Barrier", "Power LCL", "Voltage Clamp" and "TX/RX".]
Fig. 1. General solution diagram

A. Use of Q1 Components

As outlined above, we propose to follow the component qualification guidelines of the upcoming Q1 component classification, as described in [9].

Proton SEE tests may show that key components are susceptible to (destructive) SEL or to non-recoverable SEFI (such as corruption of the boot memory). In that case, the component selection and radiation testing must be iterated to identify components with similar function and performance.

In rare cases, components with non-destructive SEL can still be used, provided that the component offers application-enabling performance. In our case, this is computational performance or throughput. This decision is entirely driven by the use case.

B. Leverage Automotive Certifiable HW/SW

As discussed in Section II-E, these COTS embedded computers must include built-in reliability and safety features to comply with ISO 26262. Such systems include multiple hardware and software components to guarantee that the computer always provides valid and trustworthy outputs.

One of the safety features in automotive-grade components such as the NVIDIA Xavier GPU SoC family is the ARM Reliability, Availability, and Serviceability (RAS) extension [17]. RAS is a set of hardware and software diagnostic mechanisms that focuses on the reliability and error reporting of ARM-based CPU systems. It also includes protocols for communication between the different safety mechanisms and high-level software, such as the operating system (OS). This system can be used to pinpoint errors under radiation. It can also identify sensitive parts of these complex COTS System-on-Chip (SoC) devices when they are studied under irradiation [14]. In addition, starting with Armv8.2, RAS is mandatory for all ARM processors compliant with that version of the architecture. This feature will therefore be available in more ARM devices and will enable interoperability between different products.

Another feature to leverage is the use of real-time, safety-certified processors in multiple parts of the subsystems. For example, the more complex automotive SoCs such as the NVIDIA Xavier/Orin feature a set of dual-lockstep ARM Cortex-R5 processors [18]. These are used during the boot sequence to guarantee the correctness of the bootloader and of other vital parts of the system, such as power delivery and main memory.

Other manufacturers guarantee correctness and real-time operation in other ways. Some implement dual modular redundancy using lockstep processors, such as the Aurix TriCore [19]. Others use dynamic modular redundancy, like the multicore ARM Cortex-A78AE [20], whose cores can run independently for more performance or in lockstep to guarantee correctness. All these features help ensure that the system can operate correctly under radiation.

C. Hardware Safety Barrier Expansion

As described in the Background Section (II-D), the hardware safety barrier allows a COTS device to be added while ensuring that errors do not cascade to the rest of the spacecraft.

However, in our proposal we are interested in using complex COTS SoCs with embedded GPUs. In these devices, current consumption varies widely depending on whether the GPU is in use [21] [22]. The basic delatcher implementation for SEL prevention is therefore not sufficient, since it cannot distinguish the start of a demanding computation from a real latch-up.

To guarantee the reliable use of these high-performance COTS systems, a state-aware delatcher must be developed. This delatcher needs information about the state of the SoC's main application processors and accelerators, the state of the computation and the responsiveness of the system. With that information, the delatcher can decide whether an observed current spike is due to standard task processing or to an SEL.

The state-aware delatcher has two systems running in parallel: the SoC state handler and the SEL detector.
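The interplay between the SoC state handler and the SEL detector can be illustrated with a small model of the decision flow of Fig. 2. The detector reads the supply current. Only when a threshold is exceeded does it query the SoC state, and it power cycles if that state cannot explain the current draw.

This is a minimal sketch assuming a sampled current monitor. The per-state current envelopes (in amperes), the function names and the handling of the crash and unknown states are illustrative assumptions, not values from the reference designs. For speed, a real implementation would be in hardware rather than Python.

```python
from enum import Enum


class SocState(Enum):
    """SoC states reported by the state handler (names follow this section)."""
    OFF = "off"
    IDLE = "idle"
    COMPUTING = "computing"
    BOOTING = "booting"
    CRASH = "crash"
    UNKNOWN = "unknown"


# Hypothetical current envelopes in amperes: the maximum draw that each state
# can legitimately explain. Placeholder values, not measured data.
EXPECTED_MAX_CURRENT_A = {
    SocState.OFF: 0.05,
    SocState.IDLE: 0.8,
    SocState.BOOTING: 1.5,
    SocState.COMPUTING: 4.0,
}

# Hypothetical trip threshold: readings above it trigger a state request.
THRESHOLD_A = 0.8


def naive_delatcher(current_a: float) -> bool:
    """Basic delatcher: power cycle whenever the threshold is exceeded."""
    return current_a > THRESHOLD_A


def state_aware_delatcher(current_a: float, state: SocState) -> bool:
    """Return True if the board should be power cycled."""
    # Read current: below the threshold, keep monitoring.
    if current_a <= THRESHOLD_A:
        return False
    # Threshold met: request the SoC state and check if it explains the draw.
    expected_max = EXPECTED_MAX_CURRENT_A.get(state)
    if expected_max is None:
        # crash/unknown: no expected mode applies (assumption: treat as SEL).
        return True
    return current_a > expected_max
```

For a 3 A reading, `naive_delatcher` always power cycles. `state_aware_delatcher` power cycles unless the state handler reports `computing`, which is the distinction the basic delatcher cannot make.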
The SoC state handler continuously reads four sources: the Software Safety Barrier, the low-level GPIO signals, the application processor watchdog and the control processor. From this information, it decides which of the following states the SoC is in:
• off. The main application processor and accelerator are off.
• idle. The main application processor and accelerator are powered on but idle.
• computing. The main application processor and accelerator are powered on and running applications.
• booting. The main application processor and accelerator are powered on and booting.
• crash. The SoC has suffered a crash, either software- or SEE-related.
• unknown. The SoC state cannot be determined and an intervention is needed to restore the system.

When a current spike occurs, the SEL detector uses the state information to decide whether to power cycle the board. The full flow is shown in Figure 2.

[Figure: flow chart "Read Current Data" → "Threshold met?" (No: keep reading; Yes: continue) → "Request State Data" → "In Expected mode?" (Yes: keep reading; No: "Power Cycle").]
Fig. 2. Operation flow of the smart delatcher

Because SEL develops quickly, this full state awareness must be implemented in hardware. It must also communicate with the software safety barrier, described in Section III-D, to decide whether a current spike is related to an SEE or to normal computation. This solution has a small drawback. To know the status of the main SoC quickly enough, near real-time communication must be established. This increases the base power consumption of the device and uses some of the available computational capacity.

D. Software Safety Barrier

Some high-performance COTS processors require closed-source proprietary drivers and libraries, e.g. to use the GPU, and these are only available for Linux. It is therefore not feasible to run an entirely space-qualified software stack on these COTS platforms [23].

To enable the use of such COTS processors, and similarly to the hardware barrier described above, we also propose a Software Safety Barrier. It limits possible failure propagation from the COTS processor to external equipment through software, data and control interfaces.

The software safety barrier introduces a second layer of software within a module or unit. This layer is external to the COTS processor but embedded in the module/unit, and it uses a fully space-qualified software stack. It runs on a Fault-Tolerant (FT) CPU.

For control, external equipment only interacts with the qualified software running on the FT-CPU. High-speed data interfaces cannot be managed in software without introducing unacceptable bottlenecks, so dedicated hardware IP in the FPGA handles them.

The method can be summarised as:
• All control and data interfaces of the COTS processor are connected to an external FPGA or SoC with known radiation effects and reliability.
• The FPGA contains an FT-CPU soft-core (such as NOEL-V or LEON3FT) [24] [25].
• The FT-CPU runs a fully space-qualified software stack, such as RTEMS [26].
• The FT-CPU monitors the COTS processor and collects and checks telemetry (TM) and status.
• All Telemetry and Telecommand (TM/TC) to external equipment is filtered through a qualified software node.
• All high-speed data interfaces are monitored by dedicated IPs in the FPGA.

IV. REFERENCE DESIGNS

Based on our proposals, we present reference designs for integrating such systems in three different form factors for space:
1) 6U-ADHA Co-Processing Module
2) 3U-ADHA Payload Computer Module
3) CubeSat form-factor Payload Computer

A. 6U-ADHA Co-Processing Module

The reference design for the 6U module is based on the NVIDIA AGX Orin and the NVIDIA AGX Orin Industrial. Because these devices are large, this is the only form factor that can host them. In return, it allows a fully qualified industrial module to be used, with more high-speed I/O, fully automotive-qualified components and more low-level control signals for the SoM. A block diagram of the proposed reference design is shown in Figure 3.

In this reference design, we use the FPGA as an intermediary between the COTS device and the rest of the spacecraft. This allows the COTS device to be connected to the spacecraft safely. As shown in the diagram, the FPGA also acts as a bridge that translates commercial high-speed interfaces, such as Ethernet or PCIe, into space standards such as SpaceFibre and SpaceWire.

This arrangement makes the COTS system transparent to the OBC. The FPGA handles communication with the OBC and processes the commands, forwarding some of them to the COTS device.

[Figure: block diagram of the 6U-ADHA module with an SoM (Orin) containing the SoC (boot processor, application processors, hardware accelerators), ECC DRAM, boot memory and power conditioning; hardware and software safety barriers with a hardware safety barrier extension; an FPGA (RT-Polarfire, NG-ULTRA300 or COTS) hosting a NOEL-V FT RISC-V processor, state-aware SEL detection and hardware monitors; nominal and redundant CAN, SpaceWire and SpaceFibre interfaces through LVDS transceivers and MDM9 connectors; PCIe, Ethernet, UART, JTAG and boot-signal (GPIO) links; power LCL and voltage clamp; ECC-protected SDRAM DDR3/4 working memory, application storage and boot memory; backplane connector.]
Fig. 3. Reference design architecture for the 6U-ADHA module

B. 3U-ADHA Payload Computer Module

The reference design for the 3U module is based on the NVIDIA NX. The system is similar to the CubeSat design presented next, but uses the ADHA form factor. This module takes up 25% of the available board area. That leaves enough space for the FPGA, power distribution and conditioning, and the backplane and front-end connectors. Using this module, we achieve high performance in a smaller form factor, at the cost of some of the SoM's I/O and low-level control signals. Moreover, the NVIDIA NX can be ruggedised, but it does not meet all the automotive-grade qualifications, unlike the larger NVIDIA AGX Orin and AGX Orin Industrial. The reference block diagram is shown in Figure 4.

This design shares many commonalities with the 6U-ADHA design. The differences are that some of the front-end control signals are removed, as is the JTAG interface, which is not available on the NVIDIA NX device. This also means that both systems can share similar software and FPGA hardware designs, with changes only in the I/O.

[Figure: block diagram of the 3U-ADHA module with the same elements as Fig. 3 (SoM (Orin), hardware and software safety barriers, FPGA with NOEL-V FT and state-aware SEL detection, CAN, SpaceWire and SpaceFibre interfaces, power LCL, voltage clamp, backplane connector), without the JTAG debug interface.]
Fig. 4. Reference design architecture for the 3U-ADHA module

C. CubeSat form-factor Payload Computer

The smallest reference design, which targets CubeSats, is based on the NVIDIA Orin NX. It again uses NVIDIA's NX form factor, which gives the same hardware and software interface as several other modules. These include other NX modules such as the Orin Nano, previous models of the same SoC family such as the Xavier NX, and other compatible compute modules such as the Mixtile Core 3588E [27].

CubeSats are widely used in the NewSpace industry, but the standard only defines the outer dimensions of the satellite, for compatibility with multiple launchers. Internal systems such as the on-board computer (OBC) or power systems use different interconnections or form factors, due to the lack of standardisation. For this reason, in our reference design we chose to follow the PC/104 space standard. It is widely used in industry and uses two 52-pin connectors to deliver power and control signals between the different modules.

Compared with the Compact PCI connector of the ADHA design, the PC/104 connector has a limitation: it cannot carry high-speed low-voltage differential signalling (LVDS), such as PCI Express or Gigabit Ethernet. For this reason, we took inspiration from the PCIe/104 specification [28] and use the express connector to carry LVDS signals between modules. The proposed reference block diagram is shown in Figure 5.

As the block diagram shows, the module's control signals (CAN and I2C) are connected to the PC/104 connector. So are some low-level control signals that handle the module's power. As in the ADHA reference design, only the CAN connection is exposed and protected by the hardware safety barrier. In contrast to the previous designs, however, the remaining connections are not protected. This is a trade-off of the CubeSat design, imposed by the limited space of the PC/104 form factor.

Finally, because the PCIe lanes are routed through the connector, the design can be extended in the future with more modules attached to the PCIe/104 connector.

V. ON-BOARD SOFTWARE ECOSYSTEM

In addition to the hardware reference designs described in the previous Section, we have put significant effort into implementing end-user application software for on-board data processing. We focus primarily on payload data, since the targeted use case is DPUs.

The baseline code for the on-board payload data processing tasks is taken from the reference parallel implementations of the OBPMark and OBPMark-ML benchmarks in OpenMP and CUDA [29]. These implementations exploit the multicore and GPU features of the embedded COTS GPU device.

For optical imagers, the following payload processing stages are available:
1) Image pre-processing corrections and calibrations (flat-field correction, bad pixel correction, radiation scrubbing, stacking, binning, etc.) [30] [29].
2) Cloud screening, using Convolutional Neural Network methods [29].
3) Object detection, e.g. ship detection [29].
4) Image and data compression: CCSDS 121, CCSDS 122 [31] [29]. Support for CCSDS 123 is ongoing.
5) Data encryption [29].

In addition, for synthetic aperture radar (SAR) instruments, image formation (i.e. the range-Doppler algorithm) has also been implemented in parallel software [32] within the OBPMark benchmark suite.

All the implementations mentioned above are available in parallelised form in both OpenMP and CUDA. It is therefore possible to use the embedded multicore CPU and the embedded GPU at the same time, e.g. for different stages of a processing pipeline, to make full use of the available computational elements.
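Running different pipeline stages concurrently on different computational elements can be illustrated with a generic two-stage pipeline. Bounded queues link the stages, so stage 1 can work on frame n+1 while stage 2 processes frame n. This is a minimal sketch using Python threads, not OBPMark code. `offset_correct` and `reduce_frame` are hypothetical stand-ins for, e.g., a CUDA pre-processing kernel and an OpenMP data-reduction stage. The threads only model the overlap of stages and do not offload any work to a GPU.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_END = object()  # sentinel marking the end of the frame stream


def _run_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to each frame from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(frames: Iterable[Any], stages: List[Callable[[Any], Any]], depth: int = 2) -> List[Any]:
    """Run each stage in its own thread so that consecutive frames overlap."""
    # Bounded queues between stages provide back-pressure; the final queue is
    # unbounded so the last stage never blocks while frames are still being fed.
    queues = [queue.Queue(maxsize=depth) for _ in stages] + [queue.Queue()]
    workers = [
        threading.Thread(target=_run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(_END)
    results: List[Any] = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


def offset_correct(frame: List[int]) -> List[int]:
    """Hypothetical stand-in for a GPU pre-processing kernel (offset removal)."""
    return [px - 1 for px in frame]


def reduce_frame(frame: List[int]) -> int:
    """Hypothetical stand-in for a CPU data-reduction stage."""
    return sum(frame)


print(run_pipeline([[1, 2, 3], [4, 5, 6]], [offset_correct, reduce_frame]))  # [3, 12]
```

Frame order is preserved because each stage runs in a single thread and the queues are FIFO. The `depth` parameter bounds how many frames are buffered between stages, which matters given the limited memory of an embedded SoM.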
[Figure: block diagram of the CubeSat payload computer with a PC/104 connector (CAN bus, I2C bus, power, GPIO and enables); a hardware safety barrier containing the CAN transceiver and power management & control; an NVIDIA Jetson SoM (6 to 8-core ARM CPU, GPU with 6/8 SMs, LPDDR4/5, eMMC, CAN, I2C, UART, SPI, PCIe, USB OTG, Ethernet, GPIO and power controllers); an RS422 transceiver; and PCIe express, UART, MDM9 and debug & programming connectors.]
Fig. 5. Reference design architecture for the CubeSat form-factor payload computer.
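The qualified software node of the Software Safety Barrier (Section III-D) filters TM/TC. This filtering can be sketched as an allow-list for telecommands forwarded to the COTS processor, plus a range check on the telemetry it returns. This is a minimal illustration under our own assumptions. The command names, telemetry parameters and limits are hypothetical. A flight implementation would run in the qualified software stack on the FT-CPU (e.g. under RTEMS), not in Python.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical allow-list of telecommands that may be forwarded to the COTS processor.
ALLOWED_TC = {"START_PIPELINE", "STOP_PIPELINE", "GET_STATUS"}

# Hypothetical telemetry limits: parameter -> (minimum, maximum).
TM_LIMITS: Dict[str, Tuple[float, float]] = {
    "soc_temp_c": (-20.0, 85.0),
    "gpu_load_pct": (0.0, 100.0),
}


@dataclass(frozen=True)
class Telecommand:
    name: str
    args: Tuple[int, ...] = ()


def filter_tc(tc: Telecommand) -> bool:
    """Forward a telecommand to the COTS processor only if it is allow-listed."""
    return tc.name in ALLOWED_TC


def check_tm(tm: Dict[str, float]) -> List[str]:
    """Return the anomalies found in one telemetry packet from the COTS processor."""
    anomalies: List[str] = []
    for param, (lo, hi) in TM_LIMITS.items():
        if param not in tm:
            anomalies.append(f"{param}: missing")
        elif not lo <= tm[param] <= hi:
            anomalies.append(f"{param}: {tm[param]} outside [{lo}, {hi}]")
    # Parameters the barrier does not know about are flagged, not forwarded.
    anomalies.extend(f"{p}: unexpected" for p in tm if p not in TM_LIMITS)
    return anomalies
```

In such a scheme, external equipment only ever sees telemetry that passed these checks. Rejected telecommands and anomalies would be reported by the FT-CPU itself.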

In addition, the devices include several hardware accelera- as a possible solution for mitigating error propagation on the
tors, such as: AI/ML processing for quantized neural networks, software and data handling interfacing level and an extension
JPEG image compression and H.264/H.265 video encoding. of the ESA COTS EEE Hardware Safety Barrier to be used
If higher performance is required for tasks that are not easily in state-of-the-art complex COTS embedded GPUs.
parallelisable, such as e.g. some compression or encryption We have presented three different hardware/software con-
algorithms, additional hardware accelerator IPs can also be ceptual designs for payload computers on standard form-
added to the on-board FPGA. For example, we have observed factors: 6U-ADHA module, 3U-ADHA module and Cubesat.
that the encoding stages of CCSDS compression algorithms Finally, we have presented an overview of the software ecosys-
are more efficiently implemented in FPGAs in terms of tem in such a design.
performance per power. Hence, the pre-processing stage (e.g. ACKNOWLEDGEMENTS
predictor in CCSDS 121/123 or the discrete wavelet transform
This work was supported by ESA through the
in CCSDS 122) [33] can be easily implemented in the
4000136514/21/NL/GLC/my co-funded PhD activity ”Mixed
embedded GPUs, which can be combined with the encoding
Software/Hardware-based Fault-tolerance Techniques for
stage implemented in the FPGA.
VI. CONCLUSIONS

In this paper, we have outlined methodologies for the use of complex COTS System-on-Chip in reliable data handling systems, focusing on on-board payload processing applications. Finally, we have presented the new concept of using a Software Safety Barrier in complex SoCs with embedded GPUs […]tem in such a design.

ACKNOWLEDGEMENTS

This work was supported by ESA through the 4000136514/21/NL/GLC/my co-funded PhD activity “Mixed Software/Hardware-based Fault-tolerance Techniques for Complex COTS System-on-Chip in Radiation Environments” and the GPU4S (GPU for Space) ESA-funded project. Moreover, it was supported by the European Community’s Horizon Europe programme under the METASAT project (grant agreement 101082622). In addition, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC-2020-045931-I (Spanish State Research Agency / Agencia Española de Investigación (AEI) / http://dx.doi.org/10.13039/501100011033) and by the Department of Research and Universities of the Government of Catalonia with a grant to the CAOS Research Group (Code: 2021 SGR 00637). Leonidas Kosmidis was partially supported by the Spanish Ministry of Economy and Competitiveness under grant IJC2020-045931-I (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033).

REFERENCES

[1] G. Furano, G. Meoni, A. Dunne, D. Moloney, V. Ferlet-Cavrois, A. Tavoularis, J. Byrne, L. Buckley, M. Psarakis, K.-O. Voss et al., “Towards the Use of Artificial Intelligence on the Edge in Space Systems: Challenges and Opportunities,” IEEE Aerospace and Electronic Systems Magazine, vol. 35, no. 12, pp. 44–56, 2020.
[2] G. Giuffrida, L. Fanucci, G. Meoni, M. Batič, L. Buckley, A. Dunne, C. Van Dijk, M. Esposito, J. Hefele, N. Vercruyssen et al., “The Φ-Sat-1 Mission: The First On-board Deep Neural Network Demonstrator for Satellite Earth Observation,” IEEE Transactions on Geoscience and Remote Sensing, 2021.
[3] F. Ouallouche, K. Labadi, Y. Mohia, M. Lazri, and S. Ameur, “Artificial Intelligence for Satellite Image Processing: Application to Rainfall Estimation,” in Intelligent Systems and Applications: Select Proceedings of ICISA 2022. Springer, 2023, pp. 165–174.
[4] F. Ortiz, V. Monzon Baeza, L. M. Garces-Socarras, J. A. Vásquez-Peralvo, J. L. Gonzalez, G. Fontanesi, E. Lagunas, J. Querol, and S. Chatzinotas, “Onboard Processing in Satellite Communications Using AI Accelerators,” Aerospace, vol. 10, no. 2, p. 101, 2023.
[5] O. Mourra, “ADHA 2022 Workshop (Advanced Data Handling Architecture),” Nov. 2022. [Online]. Available: https://indico.esa.int/event/427/
[6] L. Kosmidis, I. Rodriguez, A. Jover-Alvarez, S. Alcaide, J. Lachaize, O. Notebaert, A. Certain, and D. Steenari, “GPU4S: Major Project Outcomes, Lessons Learnt and Way Forward,” in Design Automation and Test in Europe Conference (DATE), 2021.
[7] M. Nikulainen and F. Tonicello, “Utilization of COTS in ESA Missions,” NASA Electronic Parts and Packaging (NEPP) Electronics Technology Workshop, 2021, accessed: March 10, 2023. [Online]. Available: https://nepp.nasa.gov/workshops/etw2021/talks/17-JUN-21 Thur/1045 Nikulainen Tonicello-Utilisation-of-COTS-in-ESA-Missions.pdf
[8] European Cooperation for Space Standardization (ECSS), “ECSS-Q-ST-60-13C Rev.1 – Commercial electrical, electronic and electromechanical (EEE) components,” European Space Agency, ECSS Standard ECSS-Q-ST-60-13C Rev.1, 2022, accessed: March 10, 2023. [Online]. Available: https://ecss.nl/standard/ecss-q-st-60-13c-rev-1-commercial-electrical-electronic-and-electromechanical-eee-components-12-may-2022/
[9] F. Tonicello, “ESA Position And Activities Related to COTS Usage in Space,” in ACCEDE 2022 Workshop on COTS Components for Space Applications, Oct. 2022.
[10] International Organization for Standardization, ISO/DIS 26262. Road Vehicles – Functional Safety, 2018.
[11] D. Steenari, L. Kosmidis, I. Rodriguez-Ferrandez, A. Jover-Alvarez, and K. Forster, “OBPMark (On-Board Processing Benchmarks)-Open Source Computational Performance Benchmarks for Space Applications,” in European Workshop on On-Board Data Processing (OBDP), 2021.
[12] L. Kosmidis, I. Rodriguez, A. Jover-Alvarez, S. Alcaide, J. Lachaize, O. Notebaert, A. Certain, and D. Steenari, “GPU4S: Major Project Outcomes, Lessons Learnt and Way Forward,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2021, pp. 1314–1319.
[13] E. Wyrwas, “Proton Testing of nVidia Jetson TX2,” Jun. 2019. [Online]. Available: https://nepp.nasa.gov/files/30370/NEPP-TR-2019-Wyrwas-NEPPweb-TR-19-021 1nVidia-Jetson-TX2-2019June02-TN72754.pdf
[14] I. Rodriguez-Ferrandez, M. Tali, L. Kosmidis, M. Rovituso, and D. Steenari, “Sources of Single Event Effects in the NVIDIA Xavier SoC Family under Proton Irradiation,” in 2022 IEEE 28th International Symposium on On-Line Testing and Robust System Design (IOLTS). IEEE, 2022, pp. 1–7.
[15] S. M. Guertin and M. Cui, “SEE Test Results for the Snapdragon 820,” in 2017 IEEE Radiation Effects Data Workshop (REDW). IEEE, 2017, pp. 1–6.
[16] B. Balaram, T. Canham, C. Duncan, H. F. Grip, W. Johnson, J. Maki, A. Quon, R. Stern, and D. Zhu, “Mars Helicopter Technology Demonstrator,” in AIAA Atmospheric Flight Mechanics Conference, 2018.
[17] ARM. (2020) Arm Architecture Reference Manual Supplement Reliability, Availability, and Serviceability (RAS), for Armv8-A. [Online]. Available: https://developer.arm.com/documentation/ddi0587/latest
[18] ARM, Cortex-R5, Mar. 2018. [Online]. Available: https://www.arm.com/products/silicon-ip-cpu/cortex-r/cortex-r5
[19] A. Hayek and J. Börcsök, “Safety Chips in Light of the Standard IEC 61508: Survey and Analysis,” in 2014 International Symposium on Fundamentals of Electrical Engineering (ISFEE). IEEE, 2014.
[20] ARM, Cortex-A78AE, Mar. 2022. [Online]. Available: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae
[21] E. Aragon, J. M. Jiménez, A. Maghazeh, J. Rasmusson, and U. D. Bordoloi, “Pattern Matching in OpenCL: GPU vs CPU Energy Consumption on two Mobile Chipsets,” in Proceedings of the International Workshop on OpenCL 2013 & 2014, 2014, pp. 1–7.
[22] I. Rodríguez Ferrandez, “An On-board Algorithm Implementation on an Embedded GPU: A Space Case Study,” Master’s thesis, Universitat Politècnica de Catalunya, 2021, https://upcommons.upc.edu/handle/2117/344892.
[23] L. Kosmidis, M. Solé Bonet, I. Rodriguez-Ferrández, J. Wolf, and M. M. Trompouki, “The METASAT Hardware Platform: A High-Performance Multicore, AI SIMD and GPU RISC-V Platform for On-board Processing,” in European Data Handling and Data Processing Conference for Space (EDHPC), 2023.
[24] Frontgrade Gaisler, NOEL-V Processor. [Online]. Available: https://www.gaisler.com/index.php/products/processors/noel-v
[25] Frontgrade Gaisler, Leon3 Processor. [Online]. Available: https://www.gaisler.com/index.php/products/processors/leon3
[26] J. Seronie-Vivien and C. Cantenot, “RTEMS Operating System Qualification,” in Data Systems in Aerospace (DASIA), vol. 602, 2005.
[27] Mixtile, “Mixtile core 3588e,” 2023, accessed: August 21, 2023. [Online]. Available: https://www.mixtile.com/core-3588e/
[28] PC/104 Consortium, “PCIe/104,” 2021, accessed: March 10, 2023. [Online]. Available: https://pc104.org/hardware-specifications/pcie104/
[29] D. Steenari, L. Kosmidis, I. Rodriguez, S. Muret, M. Bargholz, D. M. Tali, and L. Mansilla, “OBPMark and OBPMark-ML - Computational Benchmarks for On-Board Data Processing and Machine Learning,” in European Data Handling and Data Processing Conference for Space (EDHPC), 2023.
[30] I. Rodriguez, L. Kosmidis, M. M. Trompouki, F. J. Cazorla, and D. Steenari, “Evaluating the Computational Capabilities of Embedded Multicore and GPU Platforms for On-Board Image Processing,” in European Data Handling and Data Processing Conference for Space (EDHPC), 2023.
[31] A. Jover-Alvarez, I. Rodríguez, L. Kosmidis, and D. Steenari, “Space Compression Algorithms Acceleration on Embedded Multi-Core and GPU Platforms,” Ada Lett., vol. 42, no. 1, Dec. 2022.
[32] M. Solé, I. Rodriguez-Ferrandez, D. Steenari, and L. Kosmidis, “Acceleration of Synthetic Aperture Radar for On-board Space Systems,” IEEE High Performance Extreme Computing (HPEC), 2023.
[33] Y. Barrios, A. J. Sánchez, L. Santos, and R. Sarmiento, “SHyLoC 2.0: A Versatile Hardware Solution for On-board Data and Hyperspectral Image Compression on Future Space Missions,” IEEE Access, vol. 8, pp. 54269–54287, 2020.
