The METASAT Hardware Platform: A High-Performance Multicore, AI SIMD and GPU RISC-V Platform For On-Board Processing
Abstract—The METASAT Horizon Europe project, which is funded by the European Commission and started in January 2023, will enable model-based design methodologies in order to manage the complexity of upcoming hardware and software for space on-board processing. As a representative high performance platform for on-board processing, METASAT will design a multicore platform featuring accelerators prototyped on an FPGA. This includes both an AI SIMD accelerator tightly integrated with the CPU, as well as a GPU. All hardware components of the METASAT platform will be open source and based on the RISC-V open ISA. In this paper, we provide an overview of the platform architecture as well as preliminary implementation decisions, the current development status and early results.

Index Terms—multicore, SIMD, GPU, AI accelerator, RISC-V

I. INTRODUCTION

Current and upcoming space missions become increasingly complex, incorporating new functionalities and even the use of Artificial Intelligence (AI) for on-board processing. For this reason, there is a trend to move towards more powerful hardware architectures which can provide the computational power required by this type of processing.

Multicores have been present in space platforms for more than a decade since the introduction of the NGMP (Next Generation Microprocessor), which has recently been qualified. Next generation Frontgrade Gaisler platforms such as the GR765 are also based on multicores. In terms of Real-Time Operating System (RTOS) support, RTEMS SMP has full support for multicores and is fully qualified for the NGMP, with the pre-qualification toolkit for the GR712RC and the GR740 being openly available [4].

Despite the proliferation of multicores in space, currently their use is limited to executing single threaded tasks on different cores, not real parallel processing. While this can increase the overall computation capacity of the on-board processing platform, the single thread performance provided by each core is not enough for the advanced functionalities mentioned earlier [14]. This can be solved either with the introduction of more capable hardware such as accelerators [2], or with more complex software, through the use of new parallel programming models like OpenMP [14] and OpenCL. Hardware accelerators, especially the ones focused on AI processing, also require complex programming models and software stacks, such as popular Machine Learning (ML) frameworks like TensorFlow, PyTorch, OpenVX and neural network exchange formats like ONNX and NNEF.

Such an increased complexity of both the hardware and software of future space platforms is hard to manage. For this reason, model-based engineering approaches are increasingly employed in the design of space systems. In particular, the European Space Agency (ESA) has developed the open source TASTE framework [12], which is constantly under development with new functionalities.

The Horizon Europe project METASAT [10], funded by the European Commission, will develop model-based design approaches which will help to manage the complexity of programming such advanced high performance platforms, including AI accelerators and GPUs. For this reason, the METASAT reference platform is currently under development, which will serve as a low TRL (3-4) target for the development of the aforementioned model-based design methods.

The METASAT hardware platform will be prototyped on an FPGA and will target an architecture which can be qualified and used in institutional missions in the future. Moreover, a virtual platform of the METASAT hardware will be developed, which will allow software development to start before the hardware is available and will support the model-based design process.

In this paper, we provide an overview of the architectural design of the METASAT platform, as well as its current development status and preliminary hardware implementation and performance results.

The rest of the paper is organised as follows: Section II provides an overview of the hardware architecture and its envisaged software stack. Section III discusses the trade-off study for the selection of the FPGA platform, while Sections IV and V summarize the hardware developments related to the FPGA prototype and the virtual platform respectively. Section VI presents the current implementation status and early hardware resource utilisation results, while Section VII presents preliminary performance results. Finally, Section VIII provides the conclusions of the paper.

II. THE METASAT HARDWARE PLATFORM OVERVIEW
[Fig. 1. High-level hardware and software architecture of the METASAT platform: high-criticality partitions (with qualification and real-time requirements) and a low-criticality best-effort partition, each running application software (possibly legacy) over Ada/C/RTEMS (SMP), LithOS/APEX or a Linux kernel; a middleware layer with a portable middleware exposing a common API for all the considered hypervisor technologies and a qualifiable FT GPU software stack with a GPU remoting API; an OS layer with the Xtratum hypervisor and the GPU driver; and a hardware layer with the (multicore) CPUs + SPARROW AI SIMD accelerator and the GPU. Dedicated partitions provide partition management, health monitoring, update management, satellite management and other functionalities.]

Fig. 2. Preliminary FPGA trade-off analysis.

FPGA   | Cost      | Lead Time | Resources (LUTs) | Preliminary synthesis results                                             | Ethernet IP
KCU105 | $3,882.00 | 16 weeks  | 242,400          | 4 HP NOEL-V cores (102% LUT utilization)                                  | No
VC707  | $5,664.00 | 18 weeks  | 303,600          | 4 HP NOEL-V cores (92% LUT utilization)                                   | No
VCU118 | $9,066.00 | 12 weeks  | 1,182,240        | 6 HP NOEL-V cores (30% LUT utilization)                                   | Yes
ZCU102 | $3,234.00 | 20 weeks  | 274,080          | 4 HP NOEL-V cores (82% LUT utilization)                                   | No
VCU108 | $7,770.00 | 2 weeks   | 537,600          | 4 HP NOEL-V cores (52% LUT utilization; naive synthesis, no port-mapping) | No

The METASAT hardware platform will be a mixed-criticality platform, allowing the deployment of software of different criticality on the same hardware. In order to achieve
this, it employs the concept of virtualisation. In particular, it uses the XtratuM XNG hypervisor from fentISS [7], which is a member of the consortium. Figure 1 shows a high level hardware and software architecture of the platform.

In order to provide a high-performance design, in which partitions of different criticality will be assigned to separate CPU cores, METASAT employs a multicore version of the Frontgrade Gaisler NOEL-V [8] RISC-V CPU, which is enhanced with AI processing capabilities through integration with the SPARROW open source AI SIMD (Single Instruction Multiple Data) unit [2]. This will satisfy applications with moderate AI acceleration needs, low latency requirements and a need for high criticality, qualifiable software.

For the acceleration of applications with much higher performance needs, the METASAT platform includes an open source RISC-V based GPU [16] [17]. The GPU will be extended with real-time capabilities, such as hardware features that allow the computation of the Worst Case Execution Time (WCET) of GPU tasks and the reduction of interference between multiple GPU tasks executed concurrently on the GPU, as well as with reliability features which are required for use in space. The GPU is fully configurable in the number of shader cores, the number of threads, and the presence and size of shared L2 and L3 caches.

One of the current limitations for the use of GPUs in institutional missions of high criticality is that most GPUs require device drivers and user space libraries for non-qualifiable operating systems like Linux and Android [11]. Moreover, their closed source nature prevents their porting to qualifiable, real-time operating systems used in space like RTEMS. In METASAT, we will overcome this limitation by adapting Vortex's bare metal open source GPU driver, and by developing a portable method for the use of GPUs among multiple partitions, no matter whether they run on bare-metal, RTEMS, the Xtratum Runtime Environment or a full Linux partition for low criticality software.

Finally, the METASAT platform will include an Ethernet IP and a UART in order to provide communication interfaces with external equipment. These will be used in the project with a space case study provided by OHB, emulating the satellite's platform. Additional space-related use cases with high performance processing needs will be provided by BSC, based on the open source OBPMark benchmarking suite [15].

III. INITIAL FPGA TRADE-OFF ANALYSIS

For the choice of the FPGA platform that will host the hardware prototype, a large FPGA was required, enough to fit a multicore and a reasonably sized GPU. A preliminary trade-off analysis was performed in order to identify the best FPGA board for use in the project, considering the following aspects: FPGA resources, price, lead time and whether the FPGA platform contains an Ethernet IP.

In terms of FPGA capacity, the 5 largest COTS Xilinx FPGAs were first considered, to ensure that they could fit both a multicore and a GPU. Two of them (VCU1525 and KCU1500) are discontinued, while one of them, the VCU110, has a cost beyond $20K, which makes it prohibitive for use in the project, since the hardware budget allocated per partner at the proposal stage did not exceed $10K. This limited the selection of possible high capacity FPGAs to only the VCU118 and the VCU108. Note that RadHard FPGAs were not considered, due to their higher cost and the low target Technology Readiness Level (TRL) of the project.

Figure 2 shows the results of our preliminary trade-off analysis of these FPGAs, as well as a comparison with other smaller and lower cost FPGAs already supported by Frontgrade Gaisler. The comparison includes a preliminary synthesis of a multicore, high performance configuration of the NOEL-V processor, as an indication of the utilisation remaining for the GPU implementation.

Based on our early evaluation, the selected platform is the Xilinx VCU118. A preliminary synthesis of the METASAT platform shows that this FPGA is enough to include 4 64-bit high performance configurations of NOEL-V with SPARROW AI accelerators with a 30% utilisation, or 8 cores with 48%
utilisation. This leaves the other half of the FPGA for the implementation of a multicore Vortex GPU consisting of 4 64-bit shader cores and a 64 KB L2 cache, which our analysis shows requires 50% utilisation.

[Fig. 3. High-level METASAT Hardware Platform architecture, showing the multicore CPUs with their SPARROW SIMD units, the GPU compute units (CUs), and the L2/L3 cache hierarchy.]
IV. FPGA DEVELOPMENT

Currently, the METASAT platform is under development. Once the integration of the multicore CPU and the GPU is completed and fully functional, the most appropriate configuration (number of cores, cache sizes, number of shader cores, threads and GPU cache sizes) will be determined. It is also worth noting that the platform will be released as open source.

The baseline configuration of the platform consists of an integrated multicore platform based on NOEL-V with the SPARROW AI accelerator, the Vortex GPU, a UART and Ethernet. Figure 3 shows a high level picture of the architecture of the METASAT hardware platform. The hardware elements shown in the figure can be enabled or disabled at platform instantiation time, and their number and size are configurable. Next, we provide more details for each of the hardware components of the METASAT platform.
A. CPU Configuration

Due to the need to support mixed criticality software, which will be satisfied using the Xtratum hypervisor, a NOEL-V configuration with support for a Memory Management Unit (MMU) and the RISC-V hypervisor extension was needed. For this reason, the METASAT platform uses the highest performance NOEL-V configuration. This is a 64-bit RISC-V core with a dual issue pipeline and a floating point unit, clocked at 100 MHz. Each CPU features private L1 caches for instructions and data, each of 16 KB with a 32 byte cache line length.

Since the GPL version of NOEL-V and Frontgrade Gaisler's GRLIB IP library is used, METASAT's platform uses some IP components which are less optimised and have fewer configurations. In particular, for the floating point unit, METASAT uses the area optimised, low performance NanoFPUnv. In terms of the unified L2 cache shared among the multiple cores, METASAT uses the L2 cache Lite (L2C-LITE) with 256 KB size and 2 ways, with a pseudo-random replacement policy.

B. SPARROW SIMD Accelerator

The SPARROW AI accelerator [2] has been improved and integrated with the high performance NOEL-V CPU configuration used in the METASAT platform. In particular, SPARROW's opcodes have been reorganised, since newer NOEL-V versions have used some of the RISC-V opcodes reserved for custom processor extensions, which were also used by SPARROW.

In addition, SPARROW has been extended to 8 SIMD lanes, since in the 64-bit configuration of NOEL-V each integer register is 64 bits wide. Moreover, since the processor is dual-issue, two SPARROW units have been added. This allows up to 16 8-bit operations to be executed per cycle per CPU. It is worth noting that the SPARROW integration does not impact the NOEL-V frequency, which remains 100 MHz.
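To illustrate the kind of computation SPARROW accelerates, the minimal C sketch below shows an 8-bit multiply-accumulate loop, the dominant pattern in AI inference kernels. The plain C form and the function name are only illustrative; on METASAT, such loops are mapped to SPARROW's custom RISC-V SIMD instructions by the adapted compiler [2].

/* Illustrative sketch: the 8-bit dot product pattern that SPARROW
 * accelerates. With 8 8-bit lanes per unit and two units on the
 * dual-issue NOEL-V, up to 16 of these multiply-accumulate operations
 * can execute per cycle per CPU. */
#include <stdint.h>

int32_t dot8(const int8_t *a, const int8_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)  /* candidate loop for the 8-lane SIMD unit */
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}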
C. Vortex GPU

Due to resource limitations in the FPGA, and in order to facilitate the implementation of a functional prototype (i.e. through RTL simulations for debugging), the Vortex GPU is currently configured in its simplest and smallest configuration. In particular, it uses a cluster with a single compute unit (CU) with 4 threads. The compute unit has a 16 KB instruction cache and a 16 KB data cache, organised in 4 banks, one per thread. Once a functional prototype with the GPU is available, more complex configurations will be instantiated.

Vortex officially supports only the Intel Arria 10 and Intel Stratix 10 FPGA acceleration cards. These FPGAs are connected to an x86 computer through PCIe, and communicate with the host processor using the Open Programmable Acceleration Engine (OPAE) framework. In addition, Vortex features an AXI interface which allows it to connect to the DRAM. However, our target FPGA is a standalone device, which implements both the host processor and the GPU in the FPGA fabric as soft IPs. For this reason, we first needed to port Vortex to our Xilinx FPGA. Since both the NOEL-V multicore CPU system and the GPU need to access memory using AXI, we have used an interconnection network based on the PULP platform's AXI modules [6] and adapted the network also used in the SELENE project [9]. This interconnect arbitrates between the CPU and the GPU, serving each memory request accordingly.
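Since the GPU is driven by the soft CPU rather than an x86 host, offloading work follows the usual open/allocate/copy/launch/wait structure of a bare-metal GPU driver. The C sketch below illustrates this flow; the whole interface shown (the types and gpu_* functions) is a hypothetical placeholder declared for illustration, not the actual Vortex or METASAT driver API.

/* Hypothetical bare-metal offload flow (illustration only). The gpu_*
 * interface below is a placeholder declared for this sketch, not the
 * actual Vortex or METASAT driver API. */
#include <stddef.h>
#include <stdint.h>

typedef struct gpu_dev gpu_dev_t;           /* opaque device handle */
int gpu_open(gpu_dev_t **dev);              /* attach to the GPU */
int gpu_alloc(gpu_dev_t *dev, size_t size, uint64_t *addr);
int gpu_copy_to(gpu_dev_t *dev, uint64_t dst, const void *src, size_t size);
int gpu_copy_from(gpu_dev_t *dev, void *dst, uint64_t src, size_t size);
int gpu_start(gpu_dev_t *dev, uint64_t entry);  /* launch the kernel */
int gpu_wait(gpu_dev_t *dev);                   /* block until completion */

int run_kernel(const void *bin, size_t bin_size, void *out, size_t out_size)
{
    gpu_dev_t *dev;
    uint64_t code, result;

    if (gpu_open(&dev) != 0)
        return -1;
    gpu_alloc(dev, bin_size, &code);         /* device memory for the kernel */
    gpu_alloc(dev, out_size, &result);       /* device memory for the output */
    gpu_copy_to(dev, code, bin, bin_size);   /* upload the kernel binary */
    gpu_start(dev, code);
    gpu_wait(dev);
    return gpu_copy_from(dev, out, result, out_size);
}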
D. UART and Ethernet

For the METASAT platform connectivity, UART and Ethernet interfaces are included. In both cases, GPL GRLIB IPs are used. For the UART, the APBUART IP is used, while for Ethernet the GRETH IP is used, which provides up to 100 Mbit Ethernet connectivity, together with Frontgrade Gaisler's RGMII (reduced gigabit media-independent interface) to GMII (gigabit media-independent interface) adapter.

V. VIRTUAL PLATFORM

The availability of a virtual platform for the METASAT hardware is very important for the project goals. First, a virtual platform allows software development and porting without waiting for the final RTL (register transfer level) hardware implementation on the FPGA.

In addition, as discussed in Section III, the FPGA used for prototyping METASAT's platform is quite costly, so only
a limited number of boards is available within the project. However, multiple developers can work in parallel in a cost
effective way, with the availability of a simulated platform. Last but not least, the model-based design approach developed in METASAT can benefit significantly from the availability of a digital twin [10].

First, the open source SIS simulator, used in the RTEMS project as a simulation platform, is modified in order to add support for SPARROW and Vortex. SIS models a generic RISC-V CPU, but it does not support the full features of NOEL-V. In particular, the Memory Management Unit (MMU) and hypervisor support are missing, and therefore it is not a simulator that can fit all METASAT needs. However, SIS has a small code base and it is easy to modify and customise. Moreover, SIS provides support for the GRETH Ethernet IP.

The second option, QEMU, is a more generic simulator, supporting full system emulation for many different instruction set architectures and several device models. QEMU supports, among others, a generic leon3 model with basic GRLIB peripherals, as well as a generic riscv64 model. However, two notable features missing from QEMU's implementation of GRLIB components are support for the Multiprocessor Interrupt Controller with more than one core, and support for the GRETH device.

As with SIS, a NOEL-V and a METASAT model with SPARROW and Vortex are being developed within QEMU. As a part of this effort, models for Frontgrade Gaisler's existing platforms GR740 and GR712RC are also under development, for familiarisation with QEMU development. Currently, our GR740 and GR712RC QEMU platform models are able to execute unmodified RTEMS binaries produced by Frontgrade Gaisler's RCC compiler, as well as to boot an unmodified Linux kernel compiled for these platforms. Combining the Linux kernel with a ramdisk built with Gaisler's LEON buildroot results in a fully functional emulated system.
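For context, board models such as these are added through QEMU's machine API in C, roughly as in the sketch below; the machine name, description and the empty init body are simplified assumptions and do not correspond to the actual model code under development.

/* Simplified sketch of registering a board model with QEMU's machine
 * API. The name, description and empty init body are illustrative. */
#include "qemu/osdep.h"
#include "hw/boards.h"

static void gr740_init(MachineState *machine)
{
    /* A real model would instantiate the CPUs, the RAM and the GRLIB
     * peripherals (APBUART, GRETH, multiprocessor interrupt controller)
     * here, and load the guest image. */
}

static void gr740_machine_init(MachineClass *mc)
{
    mc->desc = "Frontgrade Gaisler GR740 board (sketch)";
    mc->init = gr740_init;
    mc->max_cpus = 4;
}

DEFINE_MACHINE("gr740-sketch", gr740_machine_init)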
Similar to the FPGA development, our developments regarding the virtual platforms will also be open sourced, and we will also try to get our patches accepted upstream.

VI. DEVELOPMENT STATUS AND PRELIMINARY HARDWARE IMPLEMENTATION RESULTS

Currently, a baseline METASAT platform is integrated and synthesised for the Xilinx VCU118. The multicore part of the platform is fully functional and can boot both RTEMS SMP and Linux. The GRETH functionality has been tested successfully with Gaisler's Linux port. Compiler support for SPARROW has been added in the RTEMS SMP compiler, and it is possible to execute programs that combine both OpenMP and SPARROW instructions under RTEMS.

The GPU functionality and its interaction with the NOEL-V CPU have been successfully verified in full system RTL simulation using QuestaSim, under a bare metal configuration. In the next steps, FPGA emulation will be performed, and GPU software support under RTEMS and Xtratum will be developed, including GPU sharing between different partitions.

TABLE I. PRELIMINARY RESOURCE UTILIZATION OF THE METASAT HARDWARE PLATFORM

Resource | Xilinx VCU118 (available) | Multicore NOEL-V + SPARROW | Multicore NOEL-V + SPARROW + Vortex GPU
LUT      | 1,182,240                 | 296,360 (25.07%)           | 422,308 (35.72%)
LUTRAM   | 591,840                   | 3,251 (0.55%)              | 7,204 (1.22%)
FF       | 2,364,480                 | 123,917 (5.24%)            | 401,645 (16.99%)
BRAM     | 2,160                     | 196 (9.07%)                | 196 (9.07%)

[Fig. 4. Preliminary METASAT Hardware Platform FPGA floorplan, showing the multicore CPU platform (cores with their SPARROW units and the interrupt controller), the memory controller, the AXI interconnect, the Ethernet IP and the GPU.]

Table I shows the preliminary synthesis results of the baseline METASAT platform, broken down in two parts: the multicore platform, which consists of the NOEL-V cores integrated with the SPARROW units, and the entire METASAT platform, which also includes the Vortex GPU. The multicore part of the design consumes 5.244 W according to the Xilinx reports, while the total platform consumption including the GPU is 5.883 W. Note however that the current GPU configuration is minimal, as explained in Section IV-C, in order to facilitate the integration until a fully functional platform becomes available.

Figure 4 shows the floor plan of the preliminary platform configuration, in which the different parts of the design are shown. We notice that roughly half of the design is occupied by the multicore CPU, while the other half is used by the GPU. We can see that each core features 2 SPARROW units, which occupy only a fraction of the core's utilisation. Moreover, there are enough available resources in the FPGA in order
to implement more complex configurations.

TABLE II. PRELIMINARY PERFORMANCE RESULTS WITH MATRIX MULTIPLICATION FOR SIZE 1024x1024

Implementation        | Execution Time (s) | Speed-up
Sequential            | 86.70              | —
OpenMP 4 CPUs         | 19.93              | 4.4×
SPARROW 1 CPU         | 19.40              | 4.5×
SPARROW OpenMP 4 CPUs | 5.66               | 15.3×

TABLE III. PRELIMINARY PERFORMANCE RESULTS WITH MATRIX MULTIPLICATION FOR SIZE 4096x4096

Implementation        | Execution Time (s) | Speed-up
Sequential            | 5757.96            | —
OpenMP 4 CPUs         | 1339.93            | 4.3×
SPARROW 1 CPU         | 1452.94            | 3.9×
SPARROW OpenMP 4 CPUs | 852.34             | 6.7×
When a fully functional baseline platform is implemented (i.e. after verifying the GPU functionality on the FPGA), a design space exploration step will be performed in order to select the best configuration for the METASAT use cases.

VII. PRELIMINARY PERFORMANCE RESULTS

Since the multicore part of the platform is fully functional, in this section we provide some preliminary performance results obtained with the current METASAT FPGA prototype.

Table II presents the results of 8-bit matrix multiplication, a computational kernel used very frequently in deep learning workloads, since it is used for the implementation of fully connected layers in neural networks. The multiplied matrices have size 1024x1024 and the second matrix is transposed in order to obtain a cache friendly memory access pattern. The execution time of the transposition is included in the measurements, and the results are obtained under RTEMS SMP.
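The structure of the measured kernel follows the classic triple loop, sketched in C below under the stated assumptions (8-bit operands, a pre-transposed second matrix, OpenMP parallelisation across rows). This is an illustrative reconstruction, not the exact benchmark code; in particular, the function name and the 32-bit accumulator are assumptions.

/* Illustrative reconstruction of the benchmarked kernel: C = A x B
 * with 8-bit elements, where B has been transposed beforehand (Bt) so
 * that both operands are traversed row-wise (cache friendly). The
 * OpenMP pragma distributes rows across the 4 NOEL-V cores, and the
 * inner loop is the 8-bit multiply-accumulate pattern that SPARROW
 * can additionally execute in SIMD fashion. */
#include <stdint.h>

void matmul8(int n, const int8_t *A, const int8_t *Bt, int32_t *C)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int k = 0; k < n; k++)
                acc += (int32_t)A[i * n + k] * (int32_t)Bt[j * n + k];
            C[i * n + j] = acc;
        }
}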
We notice that the OpenMP parallelisation results in more than 4× speedup over the sequential version. The obtained speedup is superlinear since each CPU is dual-issue. The size of the matrices is smaller than the caches, and therefore they can even fit in the L1 cache. Interestingly, if we use SPARROW and only one CPU, we get an even higher speedup. Therefore, a simple SIMD unit with very low hardware cost can provide a speedup similar to that of 4 cores, which come at a much higher hardware cost. If SPARROW is combined with OpenMP parallelisation, the combined performance is 15.3× faster than the sequential version.

Table III shows execution time results for the multiplication of matrices with size 4096x4096, which do not fit in the cache. In this case, the obtained speed-up is slightly lower both for OpenMP and for SPARROW, but again very close to 4×. Moreover, in this case the multicore performance is higher than the SPARROW one. When OpenMP and SPARROW are used together, the combined speed-up is 6.7×, which is again quite high.

In [14], the performance of the preliminary METASAT hardware platform is compared against several multicore platforms targeting the space domain, using GPU4S Bench [13], which forms part of ESA's open source benchmarking suite OBPMark [3] [15]. The preliminary METASAT platform prototyped on the FPGA provides multicore performance close to that of the GR740 ASIC implementation, and near linear speed-ups with the number of cores. In particular, the GR740 is 2× faster for integer processing, and 8-10× faster for floating point workloads. In terms of 8-bit processing for AI workloads, the METASAT FPGA platform is 2× faster than the GR740 thanks to SPARROW, both in single core and multicore workloads. Considering that the METASAT platform is implemented on an FPGA and therefore has a lower frequency (100 MHz) than the GR740 (250 MHz), and that it uses lower performance GPL components from GRLIB (the iterative NanoFPUnv and a less optimised L2 cache), this is a very promising result. It indicates that a potential future ASIC implementation of the METASAT platform with commercial GRLIB components will achieve much higher performance.

VIII. CONCLUSION

In this paper, we presented the architecture of the METASAT platform, which is currently under development in the METASAT Horizon Europe project [10]. The platform will provide high performance for on-board processing relying on the open RISC-V instruction set architecture. The METASAT platform combines high performance NOEL-V CPUs from Frontgrade Gaisler, enhanced with the SPARROW AI accelerator, and a high performance Vortex GPU. On the software side, it will support a fully qualifiable software stack, which will enable the use of these technologies in institutional missions.

In addition to the architecture, we covered the current status of the hardware prototyping on the Xilinx VCU118 FPGA, and the ongoing efforts for the creation of its virtual platforms. Finally, we have presented some preliminary hardware implementation results and early performance results obtained with a preliminary platform configuration. The obtained speedups over the sequential version are very promising, since they are almost linear and, in absolute terms, quite close to the ASIC implementation of the GR740.

ACKNOWLEDGEMENTS

This work was supported by the European Community's Horizon Europe programme under the METASAT project (grant agreement 101082622). In addition, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC-2020-045931-I (Spanish State Research Agency / Agencia Española de Investigación (AEI) / http://dx.doi.org/10.13039/501100011033) and by the Department of Research and Universities of the Government of Catalonia with a grant to the CAOS Research Group (Code: 2021 SGR 00637).
REFERENCES

[1] Fabrice Bellard. QEMU, a Fast and Portable Dynamic Translator. In USENIX Annual Technical Conference, 2005.
[2] Marc Solé Bonet and Leonidas Kosmidis. SPARROW: A Low-Cost Hardware/Software Co-designed SIMD Microarchitecture for AI Operations in Space Processors. In Design, Automation Test in Europe Conference & Exhibition (DATE), 2022.
[3] D. Steenari et al. On-Board Processing Benchmarks, 2021. http://obpmark.github.io/.
[4] ESA. RTEMS SMP Qualification Data Packet. https://rtems-qual.io.esa.int/.
[5] ESA. SPARC Instruction Simulator. https://essr.esa.int/project/erc32-and-sis.
[6] ETH. AXI SystemVerilog Modules for High-Performance On-Chip Communication. https://github.com/pulp-platform/axi.
[7] fentISS. XtratuM hypervisor. https://www.fentiss.com/xtratum/.
[8] Frontgrade Gaisler. NOEL-V Processor. https://www.gaisler.com/index.php/products/processors/noel-v.
[9] Carles Hernàndez, Jose Flieh, Roberto Paredes, Charles-Alexis Lefebvre, Imanol Allende, Jaume Abella, David Trillin, Martin Matschnig, Bernhard Fischer, Konrad Schwarz, Jan Kiszka, Martin Rönnbäck, Johan Klockars, Nicholas McGuire, Franz Rammerstorfer, Christian Schwarzl, Franck Wartet, Dierk Lüdemann, and Mikel Labayen. SELENE: Self-Monitored Dependable Platform for High-Performance Safety-Critical Systems. In Euromicro Conference on Digital System Design (DSD), 2020.
[10] Leonidas Kosmidis, Alejandro J. Calderón, Aridane Álvarez Suárez, Stefano Sinisi, Eckart Göhler, Paco Gómez Molinero, Alfred Hönle, Álvaro Jover Álvarez, Lorenzo Lazzara, Miguel Masmano Tello, Peio Onaindia, Tomaso Poggi, Iván Rodríguez Ferrández, Marc Solé Bonet, Giulia Stazi, Matina Maria Trompouki, Alessandro Ulisse, Valerio Di Valerio, Jannis Wolf, and Irune Yarza. METASAT: Modular Model-Based Design and Testing for Applications in Satellites. In Embedded Computer Systems: Architectures, Modeling, and Simulation - 22nd International Conference (SAMOS), Lecture Notes in Computer Science, 2023.
[11] Leonidas Kosmidis, Iván Rodriguez, Alvaro Jover-Alvarez, Sergi Alcaide, Jérôme Lachaize, Olivier Notebaert, Antoine Certain, and David Steenari. GPU4S: Major Project Outcomes, Lessons Learnt and Way Forward. In Design, Automation Test in Europe Conference & Exhibition (DATE), 2021.
[12] Maxime Perrotin, Eric Conquet, Julien Delange, and Thanassis Tsiodras. TASTE: An Open-source Tool-chain for Embedded System and Software Development. In Embedded Real Time Software and Systems (ERTS 2012), Toulouse, France, February 2012.
[13] Ivan Rodriguez, Leonidas Kosmidis, Jerome Lachaize, Olivier Notebaert, and David Steenari. GPU4S Bench: Design and Implementation of an Open GPU Benchmarking Suite for Space On-board Processing. Technical Report UPC-DAC-RR-CAP-2019-1, Universitat Politècnica de Catalunya. https://www.ac.upc.edu/app/research-reports/public/html/research center index-CAP-2019,en.html.
[14] Marc Solé, Jannis Wolf, Ivan Rodriguez, Alvaro Jover, Matina Maria Trompouki, and Leonidas Kosmidis. Evaluation of the Multicore Performance Capabilities of the Next Generation Flight Computers. In Digital Avionics Systems Conference (DASC), 2023.
[15] David Steenari, Leonidas Kosmidis, Ivan Rodríguez-Ferrández, Álvaro Jover-Álvarez, and Kyra Förster. OBPMark (On-Board Processing Benchmarks) - Open Source Computational Performance Benchmarks for Space Applications. In 2nd European Workshop on On-Board Data Processing (OBDP), 2021.
[16] Georgia Tech. Vortex GPU. https://vortex.cc.gatech.edu/.
[17] Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, and Hyesoon Kim. Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. In International Symposium on Microarchitecture (MICRO), 2021.