The METASAT Hardware Platform: A High-Performance Multicore, AI SIMD and GPU RISC-V Platform For On-Board Processing
Abstract—The METASAT Horizon Europe project, which is funded by the European Commission and started in January 2023, will enable model-based design methodologies in order to manage the complexity of upcoming hardware and software for space on-board processing. As a representative high performance platform for on-board processing, METASAT will design a multicore platform featuring accelerators prototyped on an FPGA. This includes both an AI SIMD accelerator tightly integrated with the CPU, as well as a GPU. All hardware components of the METASAT platform will be open source and based on the RISC-V open ISA. In this paper, we provide an overview of the platform architecture as well as preliminary implementation decisions, the current development status and early results.

Index Terms—multicore, SIMD, GPU, AI accelerator, RISC-V

I. INTRODUCTION

Current and upcoming space missions become increasingly complex, incorporating new functionalities and even the use of Artificial Intelligence (AI) for on-board processing. For this reason, there is a trend to move towards more powerful hardware architectures which can provide the computational power required by this type of processing.

Multicores have been present in space platforms for more than a decade since the introduction of the NGMP (Next Generation Microprocessor), which has recently been qualified. Next generation Frontgrade Gaisler platforms such as the GR765 are also based on multicores. In terms of Real-Time Operating System (RTOS) support, RTEMS SMP has full support for multicores and is fully qualified for the NGMP, with the pre-qualification toolkit for the GR712RC and the GR740 being openly available [4].

Despite the proliferation of multicores in space, currently their use is limited to executing single threaded tasks on different cores, not real parallel processing. While this can increase the overall computation capacity of the on-board processing platform, the single thread performance provided by each core is not enough for the advanced functionalities mentioned earlier [14]. This can be solved either with the introduction of more capable hardware such as accelerators [2], or with more complex software, through the use of new parallel programming models like OpenMP [14] and OpenCL. Hardware accelerators, especially the ones focused on AI processing, also require complex programming models and software stacks, such as popular Machine Learning (ML) frameworks like TensorFlow, PyTorch, OpenVX and neural network exchange formats like ONNX and NNEF.

Such an increased complexity of both the hardware and software of future space platforms is hard to manage. For this reason, model-based engineering approaches are increasingly employed in the design of space systems. In particular, the European Space Agency (ESA) has developed the open source TASTE framework [12], which is constantly under development with new functionalities.

The Horizon Europe project METASAT [10], funded by the European Commission, will develop model-based design approaches which will help to manage the complexity of programming such advanced high performance platforms, including AI accelerators and GPUs. For this reason, the METASAT reference platform is currently under development, which will serve as a low TRL (3-4) target for the development of the aforementioned model-based design methods.

The METASAT hardware platform will be prototyped on an FPGA and will target an architecture which can be qualified and used in institutional missions in the future. Moreover, a virtual platform of the METASAT hardware will be developed, which will allow software development to start before the hardware is available and will support the model-based design process.

In this paper, we provide an overview of the architectural design of the METASAT platform, as well as its current development status and preliminary hardware implementation and performance results.

The rest of the paper is organised as follows: Section II provides an overview of the hardware architecture and its envisaged software stack. Section III discusses the trade-off study for the selection of the FPGA platform, while Sections IV and V summarize the hardware developments related to the FPGA prototype and the virtual platform respectively. Section VI presents the current implementation status and early hardware resource utilisation results, while Section VII presents preliminary performance results. Finally, Section VIII provides the conclusions of the paper.

II. THE METASAT HARDWARE PLATFORM OVERVIEW
[Fig. 1. High-level hardware and software architecture of the METASAT platform: high-criticality partitions (with qualification and real-time requirements) and a low-criticality best-effort partition, each running application software (possibly legacy) over Ada/C/RTEMS (SMP), LithOS/APEX or a Linux kernel; a middleware layer with a portable middleware exposing a common API for all the considered hypervisor technologies and a qualifiable FT GPU software stack with a GPU remoting API; an OS layer with the Xtratum hypervisor and the GPU driver; and a hardware layer with the (multicore) CPUs + SPARROW AI SIMD accelerator and the GPU. Dedicated partitions provide partition management, health monitoring, update management, satellite management and other functionalities.]

Fig. 2. Preliminary FPGA trade-off analysis.

FPGA   | Cost      | Lead Time | Resources (LUTs) | Preliminary synthesis results                                             | Ethernet IP
KCU105 | $3,882.00 | 16 weeks  | 242,400          | 4 HP NOEL-V cores (102% LUT utilization)                                  | No
VC707  | $5,664.00 | 18 weeks  | 303,600          | 4 HP NOEL-V cores (92% LUT utilization)                                   | No
VCU118 | $9,066.00 | 12 weeks  | 1,182,240        | 6 HP NOEL-V cores (30% LUT utilization)                                   | Yes
ZCU102 | $3,234.00 | 20 weeks  | 274,080          | 4 HP NOEL-V cores (82% LUT utilization)                                   | No
VCU108 | $7,770.00 | 2 weeks   | 537,600          | 4 HP NOEL-V cores (52% LUT utilization; naive synthesis, no port-mapping) | No

The METASAT hardware platform will be a mixed-criticality platform, allowing the deployment of software of different criticality on the same hardware. In order to achieve
this, it employs the concept of virtualisation. In particular, it uses the XtratuM XNG hypervisor from fentISS [7], which is a member of the consortium. Figure 1 shows a high level hardware and software architecture of the platform.

In order to provide a high-performance design, in which partitions of different criticality will be assigned to separate CPU cores, METASAT employs a multicore version of the Frontgrade Gaisler NOEL-V [8] RISC-V CPU, which is enhanced with AI processing capabilities through integration with the SPARROW open source AI SIMD (Single Instruction Multiple Data) unit [2]. This will satisfy applications with moderate AI acceleration needs, low latency requirements and a need for high criticality, qualifiable software.

For the acceleration of applications with much higher performance needs, the METASAT platform includes an open source RISC-V based GPU [16] [17]. The GPU will be extended with real-time capabilities, such as hardware features that allow the computation of the Worst Case Execution Time (WCET) of GPU tasks and the reduction of interference between multiple GPU tasks executed concurrently on the GPU, as well as with reliability features which are required for use in space. The GPU is fully configurable in the number of shader cores, the number of threads, and the presence and size of shared L2 and L3 caches.

One of the current limitations for the use of GPUs in institutional missions of high criticality is that most GPUs require device drivers and user space libraries for non-qualifiable operating systems like Linux and Android [11]. Moreover, their closed source nature prevents their porting to qualifiable, real-time operating systems used in space like RTEMS. In METASAT, we will overcome this limitation by adapting Vortex's bare metal open source GPU driver, and by developing a portable method for the use of GPUs among multiple partitions, no matter whether they run on bare-metal, RTEMS, the Xtratum Runtime Environment or a full Linux partition for low criticality software.

Finally, the METASAT platform will include an Ethernet IP and a UART in order to provide communication interfaces with external equipment. These will be used in the project with a space case study provided by OHB, emulating the satellite's platform. Additional space-related use cases with high performance processing needs will be provided by BSC, based on the open source OBPMark benchmarking suite [15].

III. INITIAL FPGA TRADE-OFF ANALYSIS

For the choice of the FPGA platform that will host the hardware prototype, a large FPGA was required, enough to fit a multicore and a reasonably sized GPU. A preliminary trade-off analysis was performed in order to identify the best FPGA board for use in the project, considering the following aspects: FPGA resources, price, lead time and whether the FPGA platform contains an Ethernet IP.

In terms of FPGA capacity, the 5 largest COTS Xilinx FPGAs were first considered, to ensure that they could fit both a multicore and a GPU. Two of them (VCU1525 and KCU1500) are discontinued, while one of them, the VCU110, has a cost beyond $20K, which makes it prohibitive for use in the project, since the hardware budget allocated per partner at the proposal stage did not exceed $10K. This limited the selection of possible high capacity FPGAs to only the VCU118 and the VCU108. Note that RadHard FPGAs were not considered, due to their higher cost and the low target Technology Readiness Level (TRL) of the project.

Figure 2 shows the results of our preliminary trade-off analysis of these FPGAs, as well as a comparison with other smaller and lower cost FPGAs already supported by Frontgrade Gaisler. The comparison includes a preliminary synthesis of a multicore, high performance configuration of the NOEL-V processor, as an indication of the utilisation remaining for the GPU implementation.

Based on our early evaluation, the selected platform is the Xilinx VCU118. A preliminary synthesis of the METASAT platform shows that this FPGA is enough to include 4 64-bit high performance configurations of NOEL-V with SPARROW AI accelerators with a 30% utilisation, or 8 cores with 48%
utilisation. This leaves the other half of the FPGA for the implementation of a multicore Vortex GPU consisting of 4 64-bit shader cores and a 64 KB L2 cache, which our analysis shows requires 50% utilisation.

[Fig. 3. High-level METASAT Hardware Platform architecture, showing the multicore CPUs with their SPARROW SIMD units, the GPU compute units (CUs), and the L2/L3 cache hierarchy.]
IV. FPGA DEVELOPMENT

Currently, the METASAT platform is under development. Once the integration of the multicore CPU and the GPU is completed and fully functional, the most appropriate configuration (number of cores, cache sizes, number of shader cores, threads and GPU cache sizes) will be determined. It is also worth noting that the platform will be released as open source.

The baseline configuration of the platform consists of an integrated multicore platform based on NOEL-V with the SPARROW AI accelerator, the Vortex GPU, a UART and Ethernet. Figure 3 shows a high level picture of the architecture of the METASAT hardware platform. The hardware elements shown in the figure can be enabled or disabled at platform instantiation time, and their number and size are configurable. Next, we provide more details for each of the hardware components of the METASAT platform.
A. CPU Configuration

Due to the need to support mixed criticality software, which will be satisfied using the Xtratum hypervisor, a NOEL-V configuration with support for a Memory Management Unit (MMU) and the RISC-V hypervisor extension was needed. For this reason, the METASAT platform uses the highest performance NOEL-V configuration. This is a 64-bit RISC-V core with a dual issue pipeline and a floating point unit, clocked at 100 MHz. Each CPU features private L1 caches for instructions and data, each of 16 KB with a 32 byte cache line length.

Since the GPL version of NOEL-V and Frontgrade Gaisler's GRLIB IP library is used, METASAT's platform uses some IP components which are less optimised and have fewer configurations. In particular, for the floating point unit, METASAT uses the area optimised, low performance NanoFPUnv. In terms of the unified L2 cache shared among the multiple cores, METASAT uses the L2 cache Lite (L2C-LITE) with 256 KB size and 2 ways, with a pseudo-random replacement policy.

B. SPARROW SIMD Accelerator

The SPARROW AI accelerator [2] has been improved and integrated with the high performance NOEL-V CPU configuration used in the METASAT platform. In particular, SPARROW's opcodes have been reorganised, since newer NOEL-V versions have used some of the RISC-V opcodes reserved for custom processor extensions, which were also used by SPARROW.

In addition, SPARROW has been extended to 8 SIMD lanes, since in the 64-bit configuration of NOEL-V each integer register is 64 bits wide. Moreover, since the processor is dual-issue, two SPARROW units have been added. This allows up to 16 8-bit operations to be executed per cycle per CPU. It is worth noting that the SPARROW integration does not impact the NOEL-V frequency, which remains 100 MHz.
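To illustrate the kind of computation SPARROW accelerates, the minimal C sketch below shows an 8-bit multiply-accumulate loop, the dominant pattern in AI inference kernels. The plain C form and the function name are only illustrative; on METASAT, such loops are mapped to SPARROW's custom RISC-V SIMD instructions by the adapted compiler [2].

/* Illustrative sketch: the 8-bit dot product pattern that SPARROW
 * accelerates. With 8 8-bit lanes per unit and two units on the
 * dual-issue NOEL-V, up to 16 of these multiply-accumulate operations
 * can execute per cycle per CPU. */
#include <stdint.h>

int32_t dot8(const int8_t *a, const int8_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)  /* candidate loop for the 8-lane SIMD unit */
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}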
C. Vortex GPU

Due to resource limitations in the FPGA, and in order to facilitate the implementation of a functional prototype (i.e. through RTL simulations for debugging), the Vortex GPU is currently configured in its simplest and smallest configuration. In particular, it uses a cluster with a single compute unit (CU) with 4 threads. The compute unit has a 16 KB instruction cache and a 16 KB data cache, organised in 4 banks, one per thread. Once a functional prototype with the GPU is available, more complex configurations will be instantiated.

Vortex officially supports only the Intel Arria 10 and Intel Stratix 10 FPGA acceleration cards. These FPGAs are connected to an x86 computer through PCIe, and communicate with the host processor using the Open Programmable Acceleration Engine (OPAE) framework. In addition, Vortex features an AXI interface which allows it to connect to the DRAM. However, our target FPGA is a standalone device, which implements both the host processor and the GPU in the FPGA fabric as soft IPs. For this reason, we first needed to port Vortex to our Xilinx FPGA. Since both the NOEL-V multicore CPU system and the GPU need to access memory using AXI, we have used an interconnection network based on the PULP platform's AXI modules [6] and adapted the network also used in the SELENE project [9]. This interconnect arbitrates between the CPU and the GPU, serving each memory request accordingly.
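Since the GPU is driven by the soft CPU rather than an x86 host, offloading work follows the usual open/allocate/copy/launch/wait structure of a bare-metal GPU driver. The C sketch below illustrates this flow; the whole interface shown (the types and gpu_* functions) is a hypothetical placeholder declared for illustration, not the actual Vortex or METASAT driver API.

/* Hypothetical bare-metal offload flow (illustration only). The gpu_*
 * interface below is a placeholder declared for this sketch, not the
 * actual Vortex or METASAT driver API. */
#include <stddef.h>
#include <stdint.h>

typedef struct gpu_dev gpu_dev_t;           /* opaque device handle */
int gpu_open(gpu_dev_t **dev);              /* attach to the GPU */
int gpu_alloc(gpu_dev_t *dev, size_t size, uint64_t *addr);
int gpu_copy_to(gpu_dev_t *dev, uint64_t dst, const void *src, size_t size);
int gpu_copy_from(gpu_dev_t *dev, void *dst, uint64_t src, size_t size);
int gpu_start(gpu_dev_t *dev, uint64_t entry);  /* launch the kernel */
int gpu_wait(gpu_dev_t *dev);                   /* block until completion */

int run_kernel(const void *bin, size_t bin_size, void *out, size_t out_size)
{
    gpu_dev_t *dev;
    uint64_t code, result;

    if (gpu_open(&dev) != 0)
        return -1;
    gpu_alloc(dev, bin_size, &code);         /* device memory for the kernel */
    gpu_alloc(dev, out_size, &result);       /* device memory for the output */
    gpu_copy_to(dev, code, bin, bin_size);   /* upload the kernel binary */
    gpu_start(dev, code);
    gpu_wait(dev);
    return gpu_copy_from(dev, out, result, out_size);
}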
D. UART and Ethernet

For the METASAT platform connectivity, UART and Ethernet interfaces are included. In both cases, GPL GRLIB IPs are used. For the UART, the APBUART IP is used, while for Ethernet the GRETH IP is used, which provides up to 100 Mbit Ethernet connectivity, together with Frontgrade Gaisler's RGMII (reduced gigabit media-independent interface) to GMII (gigabit media-independent interface) adapter.

V. VIRTUAL PLATFORM

The availability of a virtual platform for the METASAT hardware is very important for the project goals. First, a virtual platform allows software development and porting without waiting for the final RTL (register transfer level) hardware implementation on the FPGA.

In addition, as discussed in Section III, the FPGA used for prototyping METASAT's platform is quite costly, so only
a limited number of boards is available within the project. However, multiple developers can work in parallel in a cost
effective way, with the availability of a simulated platform. Last but not least, the model-based design approach developed in METASAT can benefit significantly from the availability of a digital twin [10].

First, the open source SIS simulator, used in the RTEMS project as a simulation platform, is modified in order to add support for SPARROW and Vortex. SIS models a generic RISC-V CPU, but it does not support the full features of NOEL-V. In particular, the Memory Management Unit (MMU) and hypervisor support are missing, and therefore it is not a simulator that can fit all METASAT needs. However, SIS has a small code base and it is easy to modify and customise. Moreover, SIS provides support for the GRETH Ethernet IP.

The second option, QEMU, is a more generic simulator, supporting full system emulation for many different instruction set architectures and several device models. QEMU supports, among others, a generic leon3 model with basic GRLIB peripherals, as well as a generic riscv64 model. However, two notable features missing from QEMU's implementation of GRLIB components are support for the Multiprocessor Interrupt Controller with more than one core, and support for the GRETH device.

As with SIS, a NOEL-V and a METASAT model with SPARROW and Vortex are being developed within QEMU. As a part of this effort, models for Frontgrade Gaisler's existing platforms GR740 and GR712RC are also under development, for familiarisation with QEMU development. Currently, our GR740 and GR712RC QEMU platform models are able to execute unmodified RTEMS binaries produced by Frontgrade Gaisler's RCC compiler, as well as to boot an unmodified Linux kernel compiled for these platforms. Combining the Linux kernel with a ramdisk built with Gaisler's LEON buildroot results in a fully functional emulated system.
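For context, board models such as these are added through QEMU's machine API in C, roughly as in the sketch below; the machine name, description and the empty init body are simplified assumptions and do not correspond to the actual model code under development.

/* Simplified sketch of registering a board model with QEMU's machine
 * API. The name, description and empty init body are illustrative. */
#include "qemu/osdep.h"
#include "hw/boards.h"

static void gr740_init(MachineState *machine)
{
    /* A real model would instantiate the CPUs, the RAM and the GRLIB
     * peripherals (APBUART, GRETH, multiprocessor interrupt controller)
     * here, and load the guest image. */
}

static void gr740_machine_init(MachineClass *mc)
{
    mc->desc = "Frontgrade Gaisler GR740 board (sketch)";
    mc->init = gr740_init;
    mc->max_cpus = 4;
}

DEFINE_MACHINE("gr740-sketch", gr740_machine_init)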
Similar to the FPGA development, our developments regarding the virtual platforms will also be open sourced, and we will also try to get our patches accepted upstream.

VI. DEVELOPMENT STATUS AND PRELIMINARY HARDWARE IMPLEMENTATION RESULTS

Currently, a baseline METASAT platform is integrated and synthesised for the Xilinx VCU118. The multicore part of the platform is fully functional and can boot both RTEMS SMP and Linux. The GRETH functionality has been tested successfully with Gaisler's Linux port. Compiler support for SPARROW has been added in the RTEMS SMP compiler, and it is possible to execute programs that combine both OpenMP and SPARROW instructions under RTEMS.

The GPU functionality and its interaction with the NOEL-V CPU have been successfully verified in full system RTL simulation using QuestaSim, under a bare metal configuration. In the next steps, FPGA emulation will be performed, and GPU software support under RTEMS and Xtratum will be developed, including GPU sharing between different partitions.

TABLE I. PRELIMINARY RESOURCE UTILIZATION OF THE METASAT HARDWARE PLATFORM

Resource | Xilinx VCU118 (available) | Multicore NOEL-V + SPARROW | Multicore NOEL-V + SPARROW + Vortex GPU
LUT      | 1,182,240                 | 296,360 (25.07%)           | 422,308 (35.72%)
LUTRAM   | 591,840                   | 3,251 (0.55%)              | 7,204 (1.22%)
FF       | 2,364,480                 | 123,917 (5.24%)            | 401,645 (16.99%)
BRAM     | 2,160                     | 196 (9.07%)                | 196 (9.07%)

[Fig. 4. Preliminary METASAT Hardware Platform FPGA floorplan, showing the multicore CPU platform (cores with their SPARROW units and the interrupt controller), the memory controller, the AXI interconnect, the Ethernet IP and the GPU.]

Table I shows the preliminary synthesis results of the baseline METASAT platform, broken down in two parts: the multicore platform, which consists of the NOEL-V cores integrated with the SPARROW units, and the entire METASAT platform, which also includes the Vortex GPU. The multicore part of the design consumes 5.244 W according to the Xilinx reports, while the total platform consumption including the GPU is 5.883 W. Note however that the current GPU configuration is minimal, as explained in Section IV-C, in order to facilitate the integration until a fully functional platform becomes available.

Figure 4 shows the floor plan of the preliminary platform configuration, in which the different parts of the design are shown. We notice that roughly half of the design is occupied by the multicore CPU, while the other half is used by the GPU. We can see that each core features 2 SPARROW units, which occupy only a fraction of the core's utilisation. Moreover, there are enough available resources in the FPGA in order
to implement more complex configurations.

TABLE II. PRELIMINARY PERFORMANCE RESULTS WITH MATRIX MULTIPLICATION FOR SIZE 1024x1024

Implementation        | Execution Time (s) | Speed-up
Sequential            | 86.70              | —
OpenMP 4 CPUs         | 19.93              | 4.4×
SPARROW 1 CPU         | 19.40              | 4.5×
SPARROW OpenMP 4 CPUs | 5.66               | 15.3×

TABLE III. PRELIMINARY PERFORMANCE RESULTS WITH MATRIX MULTIPLICATION FOR SIZE 4096x4096

Implementation        | Execution Time (s) | Speed-up
Sequential            | 5757.96            | —
OpenMP 4 CPUs         | 1339.93            | 4.3×
SPARROW 1 CPU         | 1452.94            | 3.9×
SPARROW OpenMP 4 CPUs | 852.34             | 6.7×
When a fully functional baseline platform is implemented (i.e. after verifying the GPU functionality on the FPGA), a design space exploration step will be performed in order to select the best configuration for the METASAT use cases.

VII. PRELIMINARY PERFORMANCE RESULTS

Since the multicore part of the platform is fully functional, in this section we provide some preliminary performance results obtained with the current METASAT FPGA prototype.

Table II presents the results of 8-bit matrix multiplication, a computational kernel used very frequently in deep learning workloads, since it is used for the implementation of fully connected layers in neural networks. The multiplied matrices have size 1024x1024 and the second matrix is transposed in order to obtain a cache friendly memory access pattern. The execution time of the transposition is included in the measurements, and the results are obtained under RTEMS SMP.
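The structure of the measured kernel follows the classic triple loop, sketched in C below under the stated assumptions (8-bit operands, a pre-transposed second matrix, OpenMP parallelisation across rows). This is an illustrative reconstruction, not the exact benchmark code; in particular, the function name and the 32-bit accumulator are assumptions.

/* Illustrative reconstruction of the benchmarked kernel: C = A x B
 * with 8-bit elements, where B has been transposed beforehand (Bt) so
 * that both operands are traversed row-wise (cache friendly). The
 * OpenMP pragma distributes rows across the 4 NOEL-V cores, and the
 * inner loop is the 8-bit multiply-accumulate pattern that SPARROW
 * can additionally execute in SIMD fashion. */
#include <stdint.h>

void matmul8(int n, const int8_t *A, const int8_t *Bt, int32_t *C)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int k = 0; k < n; k++)
                acc += (int32_t)A[i * n + k] * (int32_t)Bt[j * n + k];
            C[i * n + j] = acc;
        }
}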
We notice that the OpenMP parallelisation results in more than 4× speedup over the sequential version. The obtained speedup is superlinear since each CPU is dual-issue. The size of the matrices is smaller than the caches, and therefore they can even fit in the L1 cache. Interestingly, if we use SPARROW and only one CPU, we get an even higher speedup. Therefore, a simple SIMD unit with very low hardware cost can provide a speedup similar to that of 4 cores, which come at a much higher hardware cost. If SPARROW is combined with OpenMP parallelisation, the combined performance is 15.3× faster than the sequential version.

Table III shows execution time results for the multiplication of matrices with size 4096x4096, which do not fit in the cache. In this case, the obtained speed-up is slightly lower both for OpenMP and for SPARROW, but again very close to 4×. Moreover, in this case the multicore performance is higher than the SPARROW one. When OpenMP and SPARROW are used together, the combined speed-up is 6.7×, which is again quite high.

In [14], the performance of the preliminary METASAT hardware platform is compared against several multicore platforms targeting the space domain, using GPU4S Bench [13], which forms part of ESA's open source benchmarking suite OBPMark [3] [15]. The preliminary METASAT platform prototyped on the FPGA provides multicore performance close to that of the GR740 ASIC implementation, and near linear speed-ups with the number of cores. In particular, the GR740 is 2× faster for integer processing, and 8-10× faster for floating point workloads. In terms of 8-bit processing for AI workloads, the METASAT FPGA platform is 2× faster than the GR740 thanks to SPARROW, both in single core and multicore workloads. Considering that the METASAT platform is implemented on an FPGA and therefore has a lower frequency (100 MHz) than the GR740 (250 MHz), and that it uses lower performance GPL components from GRLIB (the iterative NanoFPUnv and a less optimised L2 cache), this is a very promising result. It indicates that a potential future ASIC implementation of the METASAT platform with commercial GRLIB components will achieve much higher performance.

VIII. CONCLUSION

In this paper, we presented the architecture of the METASAT platform, which is currently under development in the METASAT Horizon Europe project [10]. The platform will provide high performance for on-board processing relying on the open RISC-V instruction set architecture. The METASAT platform combines high performance NOEL-V CPUs from Frontgrade Gaisler, enhanced with the SPARROW AI accelerator, and a high performance Vortex GPU. On the software side, it will support a fully qualifiable software stack, which will enable the use of these technologies in institutional missions.

In addition to the architecture, we covered the current status of the hardware prototyping on the Xilinx VCU118 FPGA, and the ongoing efforts for the creation of its virtual platforms. Finally, we have presented some preliminary hardware implementation results and early performance results obtained with a preliminary platform configuration. The obtained speedups over the sequential version are very promising, since they are almost linear and, in absolute terms, quite close to the ASIC implementation of the GR740.

ACKNOWLEDGEMENTS

This work was supported by the European Community's Horizon Europe programme under the METASAT project (grant agreement 101082622). In addition, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC-2020-045931-I (Spanish State Research Agency / Agencia Española de Investigación (AEI) / http://dx.doi.org/10.13039/501100011033) and by the Department of Research and Universities of the Government of Catalonia with a grant to the CAOS Research Group (Code: 2021 SGR 00637).
REFERENCES

[1] Fabrice Bellard. QEMU, a Fast and Portable Dynamic Translator. In USENIX Annual Technical Conference, 2005.
[2] Marc Solé Bonet and Leonidas Kosmidis. SPARROW: A Low-Cost Hardware/Software Co-designed SIMD Microarchitecture for AI Operations in Space Processors. In Design, Automation Test in Europe Conference & Exhibition (DATE), 2022.
[3] D. Steenari et al. On-Board Processing Benchmarks, 2021. http://obpmark.github.io/.
[4] ESA. RTEMS SMP Qualification Data Packet. https://rtems-qual.io.esa.int/.
[5] ESA. SPARC Instruction Simulator. https://essr.esa.int/project/erc32-and-sis.
[6] ETH. AXI SystemVerilog Modules for High-Performance On-Chip Communication. https://github.com/pulp-platform/axi.
[7] fentISS. XtratuM hypervisor. https://www.fentiss.com/xtratum/.
[8] Frontgrade Gaisler. NOEL-V Processor. https://www.gaisler.com/index.php/products/processors/noel-v.
[9] Carles Hernàndez, Jose Flieh, Roberto Paredes, Charles-Alexis Lefebvre, Imanol Allende, Jaume Abella, David Trillin, Martin Matschnig, Bernhard Fischer, Konrad Schwarz, Jan Kiszka, Martin Rönnbäck, Johan Klockars, Nicholas McGuire, Franz Rammerstorfer, Christian Schwarzl, Franck Wartet, Dierk Lüdemann, and Mikel Labayen. SELENE: Self-Monitored Dependable Platform for High-Performance Safety-Critical Systems. In Euromicro Conference on Digital System Design (DSD), 2020.
[10] Leonidas Kosmidis, Alejandro J. Calderón, Aridane Álvarez Suárez, Stefano Sinisi, Eckart Göhler, Paco Gómez Molinero, Alfred Hönle, Álvaro Jover Álvarez, Lorenzo Lazzara, Miguel Masmano Tello, Peio Onaindia, Tomaso Poggi, Iván Rodríguez Ferrández, Marc Solé Bonet, Giulia Stazi, Matina Maria Trompouki, Alessandro Ulisse, Valerio Di Valerio, Jannis Wolf, and Irune Yarza. METASAT: Modular Model-Based Design and Testing for Applications in Satellites. In Embedded Computer Systems: Architectures, Modeling, and Simulation - 22nd International Conference (SAMOS), Lecture Notes in Computer Science, 2023.
[11] Leonidas Kosmidis, Iván Rodriguez, Alvaro Jover-Alvarez, Sergi Alcaide, Jérôme Lachaize, Olivier Notebaert, Antoine Certain, and David Steenari. GPU4S: Major Project Outcomes, Lessons Learnt and Way Forward. In Design, Automation Test in Europe Conference & Exhibition (DATE), 2021.
[12] Maxime Perrotin, Eric Conquet, Julien Delange, and Thanassis Tsiodras. TASTE: An Open-source Tool-chain for Embedded System and Software Development. In Embedded Real Time Software and Systems (ERTS 2012), Toulouse, France, February 2012.
[13] Ivan Rodriguez, Leonidas Kosmidis, Jerome Lachaize, Olivier Notebaert, and David Steenari. GPU4S Bench: Design and Implementation of an Open GPU Benchmarking Suite for Space On-board Processing. Technical Report UPC-DAC-RR-CAP-2019-1, Universitat Politècnica de Catalunya. https://www.ac.upc.edu/app/research-reports/public/html/research center index-CAP-2019,en.html.
[14] Marc Solé, Jannis Wolf, Ivan Rodriguez, Alvaro Jover, Matina Maria Trompouki, and Leonidas Kosmidis. Evaluation of the Multicore Performance Capabilities of the Next Generation Flight Computers. In Digital Avionics Systems Conference (DASC), 2023.
[15] David Steenari, Leonidas Kosmidis, Ivan Rodríguez-Ferrández, Álvaro Jover-Álvarez, and Kyra Förster. OBPMark (On-Board Processing Benchmarks) - Open Source Computational Performance Benchmarks for Space Applications. In 2nd European Workshop on On-Board Data Processing (OBDP), 2021.
[16] Georgia Tech. Vortex GPU. https://vortex.cc.gatech.edu/.
[17] Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, and Hyesoon Kim. Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. In International Symposium on Microarchitecture (MICRO), 2021.