
Hardware Acceleration of Data Distribution Service (DDS) For Automotive Communication and Computing

This article discusses hardware acceleration of the Data Distribution Service (DDS) middleware for automotive communication and computing. DDS is a powerful middleware being adopted in automotive systems, but some of its functionalities are computationally intensive. The article proposes implementing some DDS functionalities using hardware accelerators to improve performance while reducing software complexity. This would allow building cost-effective automotive systems suitable for next-generation vehicles.


This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3213664


Hardware Acceleration of Data Distribution Service (DDS) for Automotive Communication and Computing
CLAUDIO SCORDINO¹ (IEEE Member), ANGELA GONZALEZ MARIÑO², and FRANCESC FONS² (IEEE Senior Member)
¹ Huawei Research Center, Pisa, Italy (e-mail: [email protected])
² Huawei Technologies Duesseldorf GmbH, Munich, Germany (e-mail: {angela.gonzalez.marino, francesc.fons}@huawei.com)
Corresponding author: Angela Gonzalez Mariño (e-mail: [email protected]).

ABSTRACT The growing complexity of vehicle functionalities is driving a technological shift in the design of software architectures in the automotive industry. Traditional signal-oriented networking is being replaced by service-oriented communications enabled by a new generation of Electronic Control Units (ECUs). These powerful ECUs can support the growing interest in full-fledged middlewares. However, the new capabilities come at a non-negligible cost, which conflicts with the need to design a cost-effective solution that meets aggressive budget goals in a high-volume market like automotive. In this paper, we illustrate how a significant part of the functionalities of a powerful middleware like Data Distribution Service (DDS) can be effectively implemented through hardware accelerators. We show that our approach can guarantee high performance while minimizing system complexity at the software level (e.g. AUTOSAR) by shifting painful or inefficient software implementations of QoS policies directly to hardware. This, in turn, allows building cost-effective solutions suitable for next-generation automotive systems.

INDEX TERMS Automotive, AUTOSAR, DDS, Networking, Real-Time, Service-Oriented, Hardware Accelerators

I. INTRODUCTION

For decades, automotive has been a very conservative industry, with electronic functionalities made of simple Electronic Control Units (ECUs) executing tiny real-time operating systems (RTOSs) and communicating through domain-specific networks (e.g. CAN, LIN, FlexRay). The main focus has been on safety and qualification (e.g. ISO26262 [1]), while production cost has been kept under control through standardization. In particular, the AUTOSAR (AUTomotive Open System ARchitecture) partnership, started in 2004, has coordinated and driven a huge international effort to create an open and standardized software architecture for automotive ECUs. The consortium has fostered the growth of an open market where different actors (vehicle manufacturers, suppliers, service providers and companies from the electronics, semiconductor and software industry) can collaborate based on common specifications.

Fig. 1 shows an example of a traditional ECU for Gasoline and Diesel engines based on the original specifications (named AUTOSAR Classic [2]). The ECU receives a set of inputs (from the human driver, through the pedals and the cruise-control lever, from sensors or other ECUs) and controls the injection system and the throttle. Interested readers can refer to the original study [3] for further details. As illustrated in the bottom part of Fig. 1, the AUTOSAR software stack contains the Microcontroller Abstraction Layer (MCAL). This layer, usually provided by the chip vendor, has direct access to the on-chip MCU peripheral modules and external devices and makes the upper layers independent of the specific microcontroller (MCU). An intermediate layer (called "Basic Software") contains the real-time operating system (RTOS), based on the OSEK/VDX standard [4], and provides a set of services (including communication). The generated Run-Time Environment (RTE) abstracts the communication path to the application. Finally,


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

C. Scordino et al.: Hardware Acceleration of Data Distribution Service (DDS) for Automotive Communication and Computing

FIGURE 1. Example of automotive ECU for Gasoline and Diesel engines.

the functional part is implemented by the application layer consisting of a set of Software Components (SWCs).

The recent exponential increase in complexity of automotive systems [5], due to the integration of novel functionalities like assisted or autonomous driving, has shown the limits of the original specifications, calling for a revolutionary shift in the design of the computing platforms and the related software stacks [6], [7]. Current luxury cars already contain more than 100 ECUs for a total of more than 100 million lines of code [8].

To properly address this novel class of functionalities, the consortium has thus introduced an additional standard, called AUTOSAR Adaptive [9]. The software stack for this kind of ECUs consists of a general-purpose OS based on the POSIX standard (e.g. Linux) and a set of C++ libraries to support multi-threaded applications. In this novel standard, the original signal-oriented paradigm has been replaced by a modern service-oriented architecture (SoA). Using this paradigm, the various software components are decoupled from each other and communicate by requesting and providing "services". Each component can be designed in isolation and the system is assembled by composing and integrating the various functionalities. This separation of concerns [10] lowers the complexity of the designed system to a manageable level through composability, scalability and reusability of the various components.

At the same time, automotive OEMs have started replacing traditional automotive networks with general-purpose networks (namely, Ethernet) that reach higher throughputs and also solve the issue of complex cabling inside the vehicle [11]–[13]. Following the trend towards Ethernet networks inside the vehicle, Time Sensitive Networking (TSN) technologies are also being explored for automotive purposes [14]. TSN brings the determinism that Ethernet lacks and bridges the gap towards the integration of Ethernet into real-time systems such as In-Vehicle Networks (IVNs) [15].

Among the various available technologies for implementing SoA, Data Distribution Service (DDS) [16] is becoming a de-facto communication standard. The success of DDS rests on its very powerful Quality of Service (QoS) mechanisms, which make it possible to set requirements at various levels of the communication stack and shape the traffic accordingly. DDS is already supported by most frameworks used for modern high-performance functionalities in automotive. These include, for example, the mentioned AUTOSAR Adaptive, the Robot Operating System (ROS2) framework [17] and its derived Apex.OS operating system [18]. In fact, according to some recent investigations [19], the ROS framework is already being used by about 80% of the automotive OEMs and Tier-1s developing autonomous vehicles. Moreover, an ongoing effort aims at including DDS support also in the more traditional AUTOSAR Classic standard [20].

DDS is not just a protocol. It is a full-fledged middleware. Therefore, it should come as no surprise that some of its powerful functionalities come at a price, often requiring powerful hardware to be executed in a timely manner. In this paper, we illustrate how some functionalities can be effectively implemented by hardware accelerators, relieving slow microcontrollers of the execution of heavy computations and allowing the overall system to better meet the timing requirements. To the best of the authors’ knowledge, this is the first attempt to propose the implementation of DDS functionalities with hardware support. Furthermore, we explore the combination of DDS with TSN technologies from a HW perspective, which allows integrating the QoS features at different levels.

The paper is organized as follows. First, Section II presents the key characteristics of the DDS middleware. Second, Section III introduces the context of DDS in automotive and illustrates some automotive use-cases that can take advantage of DDS features. Section IV introduces TSN technologies in the context of automotive. Section V introduces a new HW-based network processing architecture, called Elastic Gateway (eGW), where DDS and TSN features can be deployed, together with the Software Defined Network (SDN) implementation paradigm. Then, Section VI explains how some DDS functionalities can be effectively moved to hardware accelerators and describes the deployment of such features into

the eGW architecture. Subsequently, Section VII describes the framework that allows integrating DDS features in the HW implementation from a high-level SW-based definition. Later, Section VIII explores the intersection between DDS and TSN technologies. Section IX compares our HW-centric approach with the AUTOSAR SW-centric approach and shows that our proposal does not conflict with the AUTOSAR software stack but is compatible and integrable with it. Before concluding, Section X shows a proof-of-concept of HW-accelerated DDS in eGW based on real hardware. Finally, Section XI states the conclusions and future work.

II. DATA DISTRIBUTION SERVICE (DDS)

Originally proposed in 2001, DDS became an Object Management Group (OMG) standard in 2004, with several open-source implementations available nowadays. OMG [21] is an international not-for-profit consortium producing and maintaining computer industry standards for the design of interoperable and portable systems.

The DDS specifications [16] describe a Data-Centric Publish-Subscribe model for distributed application communication. This model builds on the concept of a "global data space" contributed by publishers and accessed by subscribers: each time a publisher posts new data into this global data space, the DDS middleware propagates the information to all interested subscribers. The data-centric communication decouples publishers from subscribers, thus building a very scalable and flexible architecture. The underlying data model specifies the set of data items, identified by "topics".

A Topic corresponds to a single data type, but it may gather multiple data-object instances (in which case they are differentiated by some key data field). A DataWriter is the typed object used by an application to communicate to the Publisher the value of data-objects of a given type. The Publisher, which can publish data of different data types, is then responsible for data distribution according to the configured QoS policies. Similarly, a Subscriber is the object responsible for receiving data of different data types. To access the received data, the application must use a typed DataReader attached to the Subscriber. Thus, a subscription is defined by the association of a DataReader with a Subscriber. On the subscriber’s side, the notification can be either synchronous or asynchronous. A domain is a distributed concept that links all the applications able to communicate with each other. Only publishers and subscribers attached to the same domain may interact. A DomainParticipant represents the local membership of the application in a domain.

Fig. 2 shows a simplified view of the interaction between a sender and a receiver belonging to the same domain. Before the communication can occur, the sender needs to instantiate DomainParticipant, Topic, Publisher and DataWriter. Similarly, the receiver must instantiate DomainParticipant, Topic, Subscriber and DataReader. The DDS middleware will then take care of matching the two endpoints and properly deliver the sent messages. Interested readers can refer to the protocol specifications [16] for a full explanation.

It is important to note that the main DDS specification does not address the underlying transport protocol used for exchanging messages (e.g. TCP and UDP). This functionality is provided by the underlying Real Time Publish Subscribe (RTPS) wire protocol [22], specifically designed to support the unique requirements of data-distribution services. A recent Request For Proposals by OMG [23] aims at investigating the usage of DDS on top of Time-Sensitive Networks (TSN).

TABLE 1. Supported QoS policies of DDS version 1.4

QoS policy | Description | Notes | Topic | DataReader | DataWriter | Publisher | Subscriber | DomainParticipant
USER_DATA | Custom user data | - | - | x | x | - | - | x
TOPIC_DATA | Custom user data | - | x | - | - | - | - | -
GROUP_DATA | Custom user data | - | - | - | - | x | x | -
DURABILITY | If data should “outlive” their writing time (e.g. late-joining DataReaders) | a, b, c | x | x | x | - | - | -
DURABILITY_SERVICE | Specifies the service implementing the durability (if any) | b | x | - | x | - | - | -
PRESENTATION | How changes to data are presented to subscribing applications | a, b, c | - | - | - | x | x | -
DEADLINE | Maximum time after which DataReader expects an update of periodic data | a, c | x | x | x | - | - | -
LATENCY_BUDGET | Maximum delay from data write to data reception and notification | a, c | x | x | x | - | - | -
OWNERSHIP | If multiple DataWriters can write the same data instance | a, b, c | x | x | x | - | - | -
OWNERSHIP_STRENGTH | Strength of the DataWriter for arbitration in case of exclusive OWNERSHIP | c | - | - | x | - | - | -
LIVELINESS | Mechanism to determine if an entity is active (“alive”) | a, b, c | x | x | x | - | - | -
TIME_BASED_FILTER | Minimum separation between the updates a DataReader is interested in receiving | - | - | x | - | - | - | -
PARTITION | Logical partition among the topics visible by the Publisher and the Subscriber | c | - | - | - | x | x | -
RELIABILITY | Reliability level of message delivery | a, b, c | x | x | x | - | - | -
TRANSPORT_PRIORITY | Priority to be used on underlying transport | c | x | - | x | - | - | -
LIFESPAN | Maximum time of validity of written data, to avoid delivery of “stale” data | c | x | - | x | - | - | -
DESTINATION_ORDER | Logical order among changes made by Publishers to the same data instance | a, b, c | x | x | x | - | - | -
HISTORY | Behavior in case a sample changes before being communicated | b | x | x | x | - | - | -
RESOURCE_LIMITS | Maximum amount of resources consumed by the service | b | x | x | x | - | - | -
ENTITY_FACTORY | Behavior of an entity when creating other entities | - | - | - | - | x | x | x
WRITER_DATA_LIFECYCLE | Behavior of DataWriter with respect to the lifecycle of the data-instances | - | - | - | x | - | - | -
READER_DATA_LIFECYCLE | Behavior of DataReader with respect to the lifecycle of the data-instances | - | - | x | - | - | - | -

a Values on the publishing and subscribing sides must be compatible.
b Not changeable.
c May appear as in-line QoS inside RTPS messages [22].
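The entity model of Section II can be made concrete with a small in-memory sketch. It is not a DDS implementation and does not use any vendor API: the class `GlobalDataSpace` and all method names are hypothetical, the Publisher and Subscriber objects are collapsed into the participant for brevity, and discovery, QoS and the RTPS wire protocol are left out. It only illustrates that a sample reaches a DataReader when domain and topic match, and that notification can be polled or callback-based.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Optional, Tuple


class DataReader:
    """Reader for one topic; supports polling and an optional callback."""

    def __init__(self, topic: str,
                 on_data: Optional[Callable[[Any], None]] = None) -> None:
        self.topic = topic
        self.on_data = on_data        # asynchronous notification, if provided
        self.samples: List[Any] = []  # synchronous access by polling

    def deliver(self, sample: Any) -> None:
        self.samples.append(sample)
        if self.on_data is not None:
            self.on_data(sample)


class GlobalDataSpace:
    """Hypothetical stand-in for the middleware: routes by (domain id, topic)."""

    def __init__(self) -> None:
        self._readers: DefaultDict[Tuple[int, str], List[DataReader]] = defaultdict(list)

    def attach(self, domain_id: int, reader: DataReader) -> None:
        self._readers[(domain_id, reader.topic)].append(reader)

    def propagate(self, domain_id: int, topic: str, sample: Any) -> int:
        """Deliver a sample to every matched reader; return how many got it."""
        matched = self._readers.get((domain_id, topic), [])
        for reader in matched:
            reader.deliver(sample)
        return len(matched)


class DomainParticipant:
    """Local membership of an application in one domain."""

    def __init__(self, space: GlobalDataSpace, domain_id: int) -> None:
        self.space = space
        self.domain_id = domain_id

    def create_datawriter(self, topic: str) -> "DataWriter":
        return DataWriter(self, topic)

    def create_datareader(self, topic: str,
                          on_data: Optional[Callable[[Any], None]] = None) -> DataReader:
        reader = DataReader(topic, on_data)
        self.space.attach(self.domain_id, reader)
        return reader


class DataWriter:
    """Writer for one topic, bound to the domain of its participant."""

    def __init__(self, participant: DomainParticipant, topic: str) -> None:
        self._participant = participant
        self.topic = topic

    def write(self, sample: Any) -> int:
        p = self._participant
        return p.space.propagate(p.domain_id, self.topic, sample)


if __name__ == "__main__":
    space = GlobalDataSpace()
    sender = DomainParticipant(space, domain_id=0)
    receiver = DomainParticipant(space, domain_id=0)
    reader = receiver.create_datareader("VehicleSpeed")
    writer = sender.create_datawriter("VehicleSpeed")
    print(writer.write(87.5), reader.samples)
```

The writer never refers to a reader directly: both sides only name a topic inside a domain, which is the decoupling that the data-centric model relies on.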

FIGURE 2. Simplified sequence diagram of DDS.

A. QUALITY OF SERVICE
The success of DDS rests on its very powerful QoS mechanism, which makes it possible to assign different QoS policies to the various entities in the system (i.e. Topic, DataWriter, DataReader, Publisher, Subscriber and DomainParticipant). These policies control the behavior of the middleware in terms of timing predictability, overhead and resource utilization. Table 1 summarizes the 22 possible QoS policies according to the latest version of the standard (1.4). Interested readers can refer to [16] for a full description of the policies.
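Footnote a of Table 1 states that some policies must be compatible between the publishing and the subscribing side. A minimal sketch of such a check is given below for two policies only, assuming the usual request/offered rules of the DDS specification: the offered RELIABILITY kind must be at least as strong as the requested one, and the offered DEADLINE period must not exceed the requested one. The type `EndpointQos` and the function `incompatible_policies` are hypothetical names introduced for illustration, not part of any DDS API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Reliability(IntEnum):
    # Ordered so that a larger value is a stronger guarantee.
    BEST_EFFORT = 0
    RELIABLE = 1


@dataclass(frozen=True)
class EndpointQos:
    """Hypothetical subset of the QoS of a DataWriter or DataReader."""
    reliability: Reliability
    deadline_s: float  # deadline period in seconds; float("inf") = none


def incompatible_policies(offered: EndpointQos, requested: EndpointQos) -> List[str]:
    """Return the policies that prevent a writer/reader pair from matching."""
    problems = []
    # The writer must offer at least the reliability the reader requests.
    if offered.reliability < requested.reliability:
        problems.append("RELIABILITY")
    # The writer must commit to updates at least as often as the reader expects.
    if offered.deadline_s > requested.deadline_s:
        problems.append("DEADLINE")
    return problems


if __name__ == "__main__":
    writer = EndpointQos(Reliability.RELIABLE, deadline_s=0.010)
    reader = EndpointQos(Reliability.BEST_EFFORT, deadline_s=0.020)
    print(incompatible_policies(writer, reader))
```

An empty result means the two endpoints can be matched; a non-empty one names the policies that an application would have to relax on one side.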

III. DDS IN AUTOMOTIVE: CONTEXT AND USE-CASES

In this section, we discuss the suitability of the DDS middleware to the automotive domain. We start by introducing the state of the art of In-Vehicle Network architectures, and discuss the current challenges and opportunities regarding the design of such networks. Then, we introduce the topic of Functional Safety, which is relevant to the automotive domain, where HW-accelerated DDS capabilities can be beneficial. Later, we discuss the existing alternative to DDS in automotive: namely, SOME/IP [24]. Afterwards, we cover the typical aspects that are sometimes seen as a disadvantage of DDS for embedded systems and show how, today, it is possible to overcome them. At the end of this section, we present several use-cases for HW-accelerated DDS capabilities in automotive.

FIGURE 3. Zonal-network architecture deployment in vehicle.

A. IN-VEHICLE NETWORK ARCHITECTURE
With the introduction of novel use-cases and technologies in automotive, the electric/electronic (E/E) architecture of the vehicle is changing radically. In order to achieve the desired level of autonomy and connectivity, the number of sensors (e.g. cameras, radars, LIDARs, ultrasound) increases dramatically. Also, the required computation becomes more complex (e.g. object/event detection, decision making), requiring more powerful computing platforms. Apart from the cost of the new components included, the cost of the cabling required to connect them is impacting the overall cost of the solution. As described in [11]–[13], the In-Vehicle Network is shifting from a logical distribution of the functionalities (i.e. domain-based architecture) to a physical distribution (i.e. zone-based architecture). Fig. 3 shows the new IVN defined for modern vehicles. The architecture is composed of several zonal gateway controllers which handle the communication with sensors and actuators of one physical area, combining the network protocols required for each of the devices. At the same time, these zonal gateways are connected to each other and with the central CPU or High Performance Computer

(HPC) through an Ethernet backbone. This distributed zonal architecture simplifies the layout of sensors, actuators and cabling inside the car. However, it introduces new technical challenges. Now, each zonal gateway needs to manage different network technologies, different kinds of functionalities and traffic with different criticality levels. The authors of [25] review the challenges of current IVNs, focusing on the complexity of software configuration and mapping of functionalities to available resources. In [26], the authors analyse the requirements of new ECUs in order to provide the capabilities demanded by autonomous and connected vehicles. Interested readers can find there details on performance requirements (latency, bandwidth, technologies to be supported, etc.) as well as an analysis of existing platforms in the state of the art. The conclusion of the study is that new high-performance solutions are needed to reach the desired outcome, and that HW acceleration can be the key to unlock the potential of IVNs.

All in all, we see that there is a need for new strategies that efficiently handle these new challenges and enable the required performance and QoS throughout the network. Some examples are the current works on the integration of TSN in the vehicular network, which try to optimize latency and reliability. Other ongoing efforts are related to safety aspects that are relevant when increasing the level of autonomy of vehicles. On top of these functional aspects, the platforms where they run also need to keep up with the ongoing changes, as seen above. In this work, we cover these trends in the following sections, and focus on a particular strategy that improves performance and QoS in the network: the HW acceleration of DDS policies.

B. FUNCTIONAL SAFETY
One characteristic of modern automotive functionalities is that they can span multiple ECUs. When this is the case, traditional AUTOSAR Classic ECUs [2] can be used to implement real-time control loops for sensing and actuation, while AUTOSAR Adaptive ECUs can perform HPC processing. When the implemented functionality is safety-relevant, functional safety practices (e.g. ISO26262 [1]) help in lowering the risk introduced by malfunctions. For each implemented vehicle function ("item"), the safety requirements are identified ("safety goals") and safety mechanisms are implemented to reduce the risk of hazardous events. In [27], an example of safety analysis and development of a safety concept for IVNs is presented. The benefits of HW acceleration for safety-critical features have already been explored in the state of the art. In [28], the authors propose the use of reconfigurable HW as an alternative to multi-core systems. Considering not only safety but also security features, [29] and [30] explore the benefits of HW acceleration for embedded cyber-security in automotive.

Later in this work, we show how HW acceleration can be beneficial for the processing of DDS policies related to safety-critical systems within the vehicle. The use-cases illustrated in Section III-E help in understanding how DDS can contribute to implementing such safety mechanisms. For example, if a safety goal requires detecting and handling long communication latencies, a callback can be attached to the LATENCY_BUDGET policy. Similarly, the LIVELINESS policy could be used when the safety goal requires detecting the silent failure of specific components. The DEADLINE policy makes it possible to detect if a publisher fails to send information at the requested rate. Moreover, the OWNERSHIP policy supports the design of safe fail-over schemes. Overall, it is clear how the DDS middleware supports the design of a safe system also in the automotive domain. The alternative would be the development of the whole logic in the application code, which would become hard to port and maintain. Furthermore, certification by a functional safety institution/authority would also be more complex, mainly in terms of proving freedom from interference (FFI) when this code gets merged with other non-safety-relevant code running concurrently on the same processor or core.

C. SOME/IP VS DDS IN AUTOMOTIVE
Originally proposed by BMW, Scalable service-Oriented MiddlewarE over IP (SOME/IP) [24] is a protocol specifically designed for Ethernet-based communications in automotive. This standard specifies the serialization mechanism, the service discovery and the integration with the AUTOSAR stack. DDS, instead, is a full-fledged middleware standardized by OMG and designed as a cross-domain technology, and therefore also used for aerospace, robotics and industrial automation.

Both protocols allow distributed communication through either publish/subscribe or Remote Procedure Call (RPC) patterns. However, there are some substantial differences behind the provided functionalities:
• DDS allows looser coupling and more scalability by offering a fully-decentralized data-centric communication; the service-based pattern provided by SOME/IP, instead, is more coupled.
• On DDS, reliability and fragmentation are provided by a transport-agnostic layer (RTPS), which makes it possible to transfer large data reliably over (even multicast) UDP; on SOME/IP, instead, reliability needs to be provided by the underlying transport protocol (namely, TCP).
• The DDS standard also offers optional transport-agnostic security; SOME/IP, instead, does not include security functionalities and therefore relies on the functionalities offered by the underlying transports.
• DDS applications are more portable across different platforms as the specification also standardizes the full API for various programming languages.
• However, the most relevant difference is about QoS support. SOME/IP provides very limited QoS support, relying on the functionalities offered by the underlying transport. DDS, instead, offers a wide range of QoS policies (22 in the current specification, as listed in Table 1) that can be used to implement complex safety mechanisms.
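As an illustration of the kind of safety mechanism that such QoS policies enable, the fail-over use of OWNERSHIP mentioned in Section III-B can be sketched as follows. This is a simplified model under stated assumptions, not the behavior of a specific DDS stack: a writer counts as alive while its last liveliness assertion is within its lease, the alive writer with the highest OWNERSHIP_STRENGTH owns the instance, and ties are broken by writer name (a hypothetical stand-in for the deterministic tie-break of a real implementation). The names `WriterState` and `ExclusiveInstance` are invented for this example, and time is passed in explicitly so that the logic stays deterministic.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class WriterState:
    strength: int        # OWNERSHIP_STRENGTH of this DataWriter
    lease_s: float       # LIVELINESS lease duration, in seconds
    last_alive_s: float  # time of the last liveliness assertion


class ExclusiveInstance:
    """One data instance under exclusive OWNERSHIP (simplified model)."""

    def __init__(self) -> None:
        self._writers: Dict[str, WriterState] = {}
        self.value: Any = None

    def add_writer(self, name: str, strength: int, lease_s: float, now_s: float) -> None:
        self._writers[name] = WriterState(strength, lease_s, now_s)

    def owner(self, now_s: float) -> Optional[str]:
        """Strongest writer whose liveliness lease has not expired, if any."""
        alive = [name for name, w in self._writers.items()
                 if now_s - w.last_alive_s <= w.lease_s]
        if not alive:
            return None
        # Assumed tie-break: the name decides between equal strengths.
        return max(alive, key=lambda name: (self._writers[name].strength, name))

    def write(self, name: str, value: Any, now_s: float) -> bool:
        """Accept the sample only if the writer currently owns the instance."""
        self._writers[name].last_alive_s = now_s  # writing asserts liveliness
        if self.owner(now_s) != name:
            return False
        self.value = value
        return True


if __name__ == "__main__":
    inst = ExclusiveInstance()
    inst.add_writer("primary", strength=10, lease_s=0.1, now_s=0.0)
    inst.add_writer("backup", strength=5, lease_s=0.1, now_s=0.0)
    inst.write("primary", "p0", now_s=0.0)
    # The primary goes silent; after its lease expires the backup takes over.
    print(inst.write("backup", "b1", now_s=0.2), inst.value)
```

The readers never need to know which writer is active: samples from the backup are ignored while the primary is alive and become visible only once the primary's lease expires. This per-sample arbitration is the sort of logic that would otherwise have to live in application code.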

For these reasons, although SOME/IP has been supported by AUTOSAR since 2014, in recent years there has been a growing interest in DDS communications. DDS support was first included in AUTOSAR Adaptive in 2018 and is now being added to AUTOSAR Classic as well [20]. Moreover, DDS is the communication mechanism behind ROS2, a framework often used for implementing novel automotive functionalities.

D. CRITICISM
In this section, we discuss some technical aspects that are sometimes used to argue that DDS is not suitable for embedded systems and/or the automotive domain. We show that, even though these issues had some foundation in the past, the technology has evolved enough to overcome them.

a: CPU load
One general concern when substituting lightweight automotive communication protocols (e.g. SOME/IP) with DDS is related to performance: DDS is a complex middleware and thus companies fear longer communication latencies and especially more CPU processing. Indeed, it has been shown that the execution of a DDS stack can imply a non-negligible amount of CPU processing. Bellavista et al. [31] have shown that even a scalable stack can consume up to 10% of an Intel CPU at 1.8 GHz for a simple Round-Trip-Time (RTT) test. Wu et al. [32] showed 20% of CPU usage on an Intel Xeon machine when transferring Computer Vision data for autonomous driving through DDS in the ROS2 framework. Profanter et al. [33] reported overload conditions when trying to run 100 DDS nodes on a powerful Intel i7 at 3.7 GHz. This overhead comes from the various QoS policies (which often imply message de-serialization and filtering) and the additional messages for service discovery [33]. Some recent work aimed at reducing overhead by implementing zero-copy mechanisms [19], [34]. However, the amount of CPU processing is still quite significant if compared to other technologies like SOME/IP. Indeed, the architecture proposed in this paper specifically takes these concerns into account, reducing both CPU processing and latencies by moving part of the DDS stack to hardware accelerators.

b: Bandwidth usage
Some previous work [33], [35] argued that DDS has higher overhead in terms of payload size with respect to other protocols like MQTT [36], thus resulting in a higher bandwidth usage. MQTT is an OASIS standard for a lightweight publish/subscribe protocol with minimal network bandwidth requirements. However, such evaluations did not take into account the possibility of disabling dynamic service discovery of DDS, which is responsible for a significant part of the additional overhead. The evaluation showed that DDS provides "superior performance on data latency and reliability" than MQTT. At the same time, MQTT is not considered a viable protocol in the automotive domain due to its poor scalability [37].

c: Suitability of the Ethernet protocol
Another objection concerns the suitability of the Ethernet medium for in-vehicle communications. Indeed, the automotive domain traditionally preferred the design and usage of ad-hoc protocols (e.g. CAN, FlexRay, LIN) for communications. However, these technologies cannot sustain the high throughput necessary to support novel automotive functionalities. Moreover, the burden of cabling is affecting the design, the cost and the maintenance of the vehicles. For instance, the impact of its weight can compromise the autonomy of electric vehicles (notice that the wire harness is the third heaviest and third most expensive component in a vehicle, behind engine and chassis).

For these reasons, rather than designing new ad-hoc high-throughput communication mechanisms, automotive OEMs decided to switch from previous communication technologies towards Ethernet. The design of the SOME/IP protocol has been a first step towards this transition, which is already happening in modern vehicles [38]. The next step envisioned by the AUTOSAR Consortium is the design of the zonal architecture previously illustrated, where gateway controllers communicate with a main HPC through an Ethernet backbone.

Addressing the technical and electrical suitability of Ethernet for in-vehicle communications would be an interesting discussion, but it is out of the scope of the current work. This is being dealt with through standardization of 100/1000BASE-T1 technology migrating from the BroadR-Reach PHY, which was specifically designed to address the stringent electromagnetic compatibility (EMC) requirements of Ethernet in vehicles. However, when Ethernet is used, DDS is expected to provide additional benefits over the usage of other protocols (namely, SOME/IP), as will be illustrated in the next sections.

d: Interoperability
Another objection is that the main benefits provided by the DDS middleware (i.e. QoS policies) are restricted only to peers communicating through this protocol. This is indeed a limitation that is already being taken into account by the AUTOSAR Consortium, which in 2022 started an effort to make DDS available on all ECUs designed according to the standard. DDS is already officially supported on AUTOSAR Adaptive ECUs [9]. The ongoing effort, however, is standardizing DDS support also for ECUs designed according to the more traditional AUTOSAR Classic standard [20]. Once this standardization activity is finished, all automotive ECUs designed according to (either Classic or Adaptive) AUTOSAR or ROS2 could interoperate based on the DDS protocol, taking full advantage of its QoS policies.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3213664

C. Scordino et al.: Hardware Acceleration of Data Distribution Service (DDS) for Automotive Communication and Computing

E. POSSIBLE APPLICATION EXAMPLES FOR HW-ACCELERATED DDS IN AUTOMOTIVE
This section illustrates some automotive use cases that can take advantage of the features of the DDS middleware.

a: Real-time safety-critical communications
In automotive, as in similar real-time domains, safety-critical communications need to be delivered within a bounded amount of time, otherwise people might be injured. As an example, Fig. 4 illustrates a camera-based Emergency Braking Assistant (EBA) similar to the one shipped with the AUTOSAR demonstrator [39]. The example consists of a pipeline of SWCs activated periodically by the operating system (with a period in the order of tens of milliseconds) and communicating through one-slot input buffers. The Pre-processing SWC identifies the current travel lane on every frame received from the Video Adapter. This information is sent along with the original frame to the Computer Vision SWC, which detects vehicles and estimates their distance. The Emergency Braking SWC then receives this information and decides whether an emergency braking should occur. Since the system is composed of several components communicating via network, it is important to bound the E2E latency, otherwise the system might not react in time and people could be injured in a car accident.

FIGURE 4. Use-case a: DDS communication in Emergency Braking system.

DDS allows assigning these safety-critical communications a higher priority than other communications through the TRANSPORT_PRIORITY QoS policy. This way, the middleware will take care of prioritizing this communication by e.g. increasing the priority of the underlying transport protocol.
The timing requirement, however, can be further enforced by additionally setting the LATENCY_BUDGET policy. This QoS policy specifies the maximum acceptable delay from the time the data is written until the data is received and the application notified. This policy not only helps in prioritizing the messages, but also makes it possible to execute proper diagnostic or recovery mechanisms in case the timing requirement has not been respected.

b: Healthy state of safety-critical components
Collateral to communications, it is also important to check the healthy state of critical components. The LIVELINESS policy allows a safety monitor to check if a component silently fails or loses communication with the rest of the system and, in that case, to trigger some recovery mechanism to enter a fail state and/or restore the failed component. In addition, the OWNERSHIP and OWNERSHIP_STRENGTH policies make it possible to implement a fail-over scheme through a backup DataWriter that automatically becomes visible and substitutes the failed component.
Fig. 5 illustrates an example where, in case of failure of an autonomous driving system, another (simpler) component can take over the control and park the vehicle in safe conditions. Both the autonomous driving and the backup SWCs periodically send messages under the same DDS topic to the Control SWC which, as a result, controls the vehicle actuators. However, the SWC for autonomous driving is assigned a higher OWNERSHIP_STRENGTH than the backup SWC. This way, whenever the former SWC is alive, it "wins" the ownership and controls the system. However, if it silently fails, then the control is automatically acquired by the backup SWC.

FIGURE 5. Use-case b: Fail-over of autonomous driving.

c: Avoid processing of unneeded messages
Data messages sent on the IVN could be needed by multiple subscribers, which may have different requirements about how frequently to be notified of the most recent values. Forcing all subscribers to receive (and process) the incoming messages at the highest frequency would be a waste of processing resources and might force system designers to select more powerful and expensive hardware.
Fig. 6 shows an example where a message containing the vehicle's speed needs to be received by two different components. This is known as the "1-N" scenario, widely used in autonomous vehicles [32]. In the example, an AUTOSAR Classic ECU is in charge of pre-processing and publishing speed information. A high-performance ECU needs to receive the sent data at a high rate (e.g. 100 Hz) to implement autonomous driving functionalities. Another ECU, instead, needs to receive the same data at a lower rate (e.g. 10 Hz) to show the information on a dashboard.

FIGURE 6. Use-case c: Multi-frequency communications.
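The multi-rate scenario of Fig. 6 can be mimicked with a small simulation. The sketch below is not DDS code: `FilteredReader`, `on_sample` and `simulate` are hypothetical names, and the filter models only the per-reader minimum-separation behaviour that DDS exposes through the TIME_BASED_FILTER QoS policy. It assumes a 100 Hz publisher observed for one second and a dashboard that wants at most one sample every 100 ms.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FilteredReader:
    """Toy stand-in for a DataReader with a minimum-separation filter (hypothetical)."""
    name: str
    min_separation_ms: int  # 0 disables filtering
    delivered: List[Tuple[int, float]] = field(default_factory=list)
    last_delivery_ms: Optional[int] = None

    def on_sample(self, t_ms: int, speed_kmh: float) -> bool:
        """Dispatch the sample to the application unless it arrives too early."""
        too_early = (
            self.last_delivery_ms is not None
            and t_ms - self.last_delivery_ms < self.min_separation_ms
        )
        if too_early:
            return False  # discarded by the middleware model, no application work
        self.last_delivery_ms = t_ms
        self.delivered.append((t_ms, speed_kmh))
        return True


def simulate(duration_ms: int = 1000, period_ms: int = 10) -> List[FilteredReader]:
    """Publish one speed sample every period_ms and fan it out to both readers."""
    readers = [
        FilteredReader("autonomous-driving ECU", min_separation_ms=0),
        FilteredReader("dashboard ECU", min_separation_ms=100),
    ]
    for t_ms in range(0, duration_ms, period_ms):
        speed_kmh = 50.0 + t_ms / 1000.0  # arbitrary, slowly increasing test signal
        for reader in readers:
            reader.on_sample(t_ms, speed_kmh)
    return readers


if __name__ == "__main__":
    for r in simulate():
        print(f"{r.name}: {len(r.delivered)} samples dispatched")
```

In this model the publisher is unaware of the different rates: each reader declares its own requirement and the filtering happens on the receiving side, which is the property the use case relies on.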

Without DDS (e.g. using SOME/IP), the system designer would need to choose between either (i) receiving and processing data at the faster rate also on the dashboard or (ii) hardcoding two different communications for the ECUs. DDS, instead, offers the TIME_BASED_FILTER policy, which allows each DataReader to specify a minimum separation between the samples it receives, i.e. the highest rate at which it needs the data. This way, each component can specify its own requirement, and the middleware takes care of dispatching data at the needed frequency. The DDS middleware running on the Dashboard will still receive data at the faster frequency, but data will be dispatched to the application at a lower frequency, thus reducing the amount of processing. Note that implementing this functionality in hardware (as proposed in the next sections) reduces CPU processing even further. In more complex scenarios, it is possible to additionally use content-based filtering, a functionality offered by DDS to specify which values of the Topic the subscriber wants to be notified for (e.g. a range of values of interest).

IV. TIME SENSITIVE NETWORKING IN AUTOMOTIVE
One of the technologies to be integrated in automotive networks in order to provide determinism over the Ethernet backbone is Time Sensitive Networking (TSN). TSN is a collection of IEEE standards that define several mechanisms to provide bounded jitter and latency over Ethernet networks [40]. A comprehensive survey of TSN technologies is available in [15], [41], and the suitability of these technologies for the automotive use case is explored in [42]. TSN provides a toolkit of mechanisms to equip networks with deterministic capabilities, so that the right configuration can be defined for each use case. However, this means that many parameters need to be properly defined for the network to operate correctly. The state of the art is also focusing on the configuration aspects of TSN, as in [43]. In this section we provide an overview of TSN technologies applicable to the automotive domain, guided by P802.1DG (the TSN profile for automotive [44]), which is currently under development.
TSN and DDS share the same goal of providing the right QoS for frames traversing the network, but at different levels of abstraction. While TSN operates at the network access layer, DDS operates at the transport layer. In Section VIII we explore the intersection of TSN with DDS from a HW perspective, aiming at simplifying the software management of QoS aspects and providing the best possible performance.

A. TSN AUTOMOTIVE PROFILE: P802.1DG
Given that TSN technologies are applicable to many different industries, there are various working groups within IEEE that are working on different profiles to promote and ease the adoption of TSN in different domains. The goal of these profiles is to provide guidelines on the key features that can be of interest for a particular use case and to simplify the selection of mechanisms in order to provide the desired performance. In automotive, the P802.1DG profile is defining the TSN features that IVNs should integrate in order to be standard compliant. The profile defines a base profile and an extended profile where some of the TSN features are required and some others are optional. An overview of the P802.1DG profile and the integrated sub-standards is given in Fig. 7.

[Figure 7: diagram grouping the P802.1DG sub-standards into Time Synchronization (1AS), Reliability (Qci, 1CB) and Bounded Latency (Qav, Qbv, Qch, Qcr, Qbu); each sub-standard is marked as belonging to the base or extended profile and as required or optional.]
FIGURE 7. IEEE P802.1DG - TSN profile for automotive - overview

As shown in the figure, there are three main categories of TSN sub-standards that are applicable to automotive: (i) Time Synchronization, (ii) Reliability and (iii) Bounded Latency. Next, we introduce all the sub-standards included in P802.1DG according to their functional category and provide a summary in Table 2.
• Time Synchronization: TSN technologies rely on the synchronization of the different nodes in the network, which provides a common reference of time. This mechanism is standardized in IEEE802.1AS, which describes how this synchronization can be achieved [45]. For automotive, the purpose of 1AS is to provide synchronization across the whole network with a 1 µs accuracy. For this, one node acts as a master and distributes its time to the other nodes periodically, allowing the slaves to adjust their internal timing to match the one of the master.
• Reliability: The focus of TSN regarding reliable networks goes in two directions. On one side, it provides a mechanism to filter unwanted traffic (IEEE802.1Qci). On the other side, it provides a mechanism to ensure that critical traffic is delivered across the network by using physical redundancy (IEEE802.1CB).
-- IEEE802.1Qci [46]: This standard defines the Per-Stream Filtering and Policing strategy that can be used to filter streams that are identified as some kind of threat. It provides mechanisms to define and identify such streams, as well as guidelines on what kind of traffic should be dropped.
-- IEEE802.1CB [47]: This standard defines the Frames Replication and Elimination for Reliability strategy, which duplicates the traffic of critical flows in order to maximize the probability of critical messages arriving at their destination. It uses mechanisms similar to those of Qci in order to identify flows and process frame replication and elimination accordingly.

• Bounded Latency: Finally, TSN also provides several mechanisms with the aim of guaranteeing bounded latency across the network. These mechanisms can be used independently or combined, and give network designers the flexibility to adapt them to their specific use cases.
-- IEEE802.1Qav [48]: This standard defines the Credit Based Shaper (CBS) algorithm, which limits the amount of bandwidth consumed by some specific flows. This is useful in limiting faulty devices or attackers that could intentionally or unintentionally flood the network, resulting in severe faulty behaviour. By limiting certain high bandwidth consuming flows, it is possible to guarantee a certain bandwidth for other flows in the network, or even extra bandwidth that may be needed in case of exceptional events such as a failure.
-- IEEE802.1Qbv [49]: This standard defines the Time Aware Shaper (TAS) algorithm, which makes it possible to effectively manage periodic traffic. The TAS defines a base cycle time divided in smaller time windows, and controls which traffic is allowed to be transmitted in each window. With this, it is possible to guarantee a certain open transmission window for critical periodic traffic, minimizing the delay experienced by this traffic.
-- IEEE802.1Qch [50]: This standard defines the Cyclic Queueing and Forwarding (CQF) algorithm, which sets an upper limit to the delay of frames. Similarly to Qbv, it defines a base cycle time, but in this case it is divided only in two windows. At the same time, all stations have two internal buffers, which are used either for transmission or reception in each window time. This way, the maximum delay of one message across the network is bounded by the cycle time and the number of hops traversed across the network.
-- IEEE802.1Qcr [51]: This standard defines the Asynchronous Traffic Shaping (ATS) algorithm, which provides reduced latency without the need of time synchronization. The concept is based on the Urgency Based Scheduler and the implementation of a token-based algorithm to assign transmission windows.
-- IEEE802.1Qbu [52]: This standard defines the strategy of frame preemption within TSN networks. Frame preemption defines two categories of traffic: express and preemptable traffic. Given this classification, preemption defines how express traffic can interrupt the transmission of preemptable traffic, minimizing the delay experienced by express traffic. Like Qcr, it is independent of time synchronization mechanisms.

TABLE 2. Summary of TSN standards included in P802.1DG

| Category | Standard | Description |
|---|---|---|
| Time Synchronization | IEEE802.1AS | Periodic distribution of time reference through network in order to synchronize all nodes |
| Reliability | IEEE802.1Qci | Per Stream Filtering and Policing: provides mechanisms to filter undesired flows |
| Reliability | IEEE802.1CB | Frames Replication and Elimination for Reliability: strategy to duplicate critical flows in order to make sure that they reach their destination |
| Bounded Latency | IEEE802.1Qav | Traffic Shaper based on Credit Based Shaping strategy |
| Bounded Latency | IEEE802.1Qbv | Traffic Shaper based on Time Aware Shaper |
| Bounded Latency | IEEE802.1Qch | Traffic Shaper based on Cyclic Queueing and Forwarding algorithm |
| Bounded Latency | IEEE802.1Qcr | Traffic Shaper based on Asynchronous Traffic Shaping strategy |
| Bounded Latency | IEEE802.1Qbu | Support for frame preemption |

V. ELASTIC GATEWAY SOC ARCHITECTURE
In this section, we introduce a novel System-on-Chip (SoC) architecture for network processing within IVNs: Elastic Gateway (eGW). This architecture has been specifically designed to meet the requirements of future IVNs defined in [26]. Elastic Gateway provides a full HW datapath composed of a set of Intellectual Property Cores (IPCores) that optimize the gateway performance while keeping complexity and HW resources under control. The IPCores are optimized for the gateway application and designed with scalability and reusability in mind, so that new gateway designs can be created by instantiating several IPCores. The architecture follows the Software Defined Networking (SDN) approach, separating control plane from data plane while keeping them within the device, and providing full configuration capabilities from the system CPU, so that, for instance, the datapath of each ingress/egress port can be configured in a different way.
The high-level architecture of eGW is depicted in Fig. 8. As illustrated in the figure, eGW is composed of three main stages, similarly to other network processing architectures: ingress stage, processing stage and egress stage. On the ingress stage, frames are received and adapted to the internal device processing format. To this aim, a new frame normalization concept and HW IPCore (Normalizer in Fig. 8) were introduced in [53]. This normalization concept extracts the required information from frames so that they can be processed accordingly. To do so, the normalization stage extracts the metadata of interest from the frame and composes a new frame, which we call instruction. This approach is graphically explained in Fig. 9. Traditionally, metadata is embedded in the header of each frame as a sort of new layer, following the Open Systems Interconnection model (OSI) layered approach [54]. However, in eGW we propose to separate the metadata from the data frame and to generate a new instruction frame that is transported in parallel with the data. This instruction frame has its own

[Figure 8: block diagram. Ingress stage: for each ingress port 0..N, a PHY, a frame normalizer and an ingress queueing block. Processing stage: frame filtering (Match 0..N), intermediate queueing and frame processing (Action, TASK 0..K). Egress stage: for each egress port 0..M, an egress queueing block, egress traffic shaping and a PHY. DAE blocks sit at the stages that read from queueing blocks. A System Time Engine distributes the TimeReference signal, the CPU connects through a CPU-HW interface, and two loopback paths (Loopback Processing and Loopback TSN) feed frames back into the datapath.]

FIGURE 8. High level architecture of Elastic Gateway.

header, payload and tail, and transports the metadata related to the data frame. This new concept provides full separation between control plane and data plane, since the data frame flows through the data plane, while the instruction frame carrying the metadata flows through the control plane. More details on this strategy are given in [53].
The format of the instruction frame is shown in Table 3. The header contains information extracted from the frame at ingress stage, such as the size, ingress port ID or type of network (CAN, Ethernet, etc.). An important field within the header is the Timestamp of the frame at ingress, which can later be used to track the time that frames spend within eGW. The payload contains information about which action needs to be performed over the frame, encoded in an operation code (OPCODE), which will be used later in the processing stage. Finally, the tail consists of a checksum used to verify the integrity of the instruction frame across the different stages.
After normalization, frames are stored in the ingress buffers. The FIFO-based queueing buffers, which are reused across the different stages (ingress, intermediate and egress queueing in the figure), accommodate frames and resolve possible conflicts on shared resources. Additionally, these buffers make it easy to change clock domains or to modify the datapath width across the different processing stages, from ingress to egress, of the SoC device. The internal architecture of these queueing buffers was described in [55].
The processing stage is composed of a single Match & Action stage with queueing modules between the steps. The match or filtering block identifies patterns within frames and determines the required processing for each of them. To this aim, a Content Addressable Memory (CAM) is implemented within the filtering block. The Action stage, instead, is composed of a stack of tasks which can be performed in parallel over different frames, providing maximum performance and exploiting all the available HW resources. For this, the control plane of the action block determines which frame from the intermediate queueing will be assigned to each task at each moment in time. To make this decision, it takes into account the processing needed for each frame and the priority of the enqueued frames in case of conflicts. The strategy followed in order to prioritize frames is described later in this section. An example of application implemented in this architecture is detailed in [56]. In case more than one action is required for a frame, the loopback processing path highlighted in green in the figure can be used. In this way, a pseudo-pipeline of Match & Action stages can be created, with as many stages as required by the processing of each frame. That is, the architecture makes it possible to adapt the datapath to the processing required by each frame,

FIGURE 9. Normalization strategy.

TABLE 3. Instruction Frame format.

| Section | Fields |
|---|---|
| Header | Prio, Size, Time, Port, Type |
| Payload | Opcode |
| Tail | CS |
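As an illustration of the layout in Table 3, the sketch below packs and checks an instruction frame in software. The field widths, the byte order, the additive 8-bit checksum and all names (`Instruction`, `pack`, `unpack`, `residence_time`) are assumptions made for the example; the text does not specify them.

```python
import struct
from dataclasses import dataclass

# Assumed widths: prio 32 bit, size 16 bit, time 32 bit, port/type/opcode 8 bit each.
_BODY = struct.Struct(">IHIBBB")


@dataclass(frozen=True)
class Instruction:
    """Software model of the instruction frame: header, payload and tail."""
    prio: int       # header
    size: int       # header: size of the data frame
    timestamp: int  # header: ingress time
    port: int       # header: ingress port ID
    net_type: int   # header: type of network (e.g. CAN, Ethernet)
    opcode: int     # payload: action requested for the data frame

    def pack(self) -> bytes:
        """Serialize header and payload, then append the checksum tail."""
        body = _BODY.pack(self.prio, self.size, self.timestamp,
                          self.port, self.net_type, self.opcode)
        return body + bytes([sum(body) & 0xFF])  # assumed checksum: byte sum mod 256

    @classmethod
    def unpack(cls, raw: bytes) -> "Instruction":
        """Verify the tail and rebuild the instruction; raise on corruption."""
        body, checksum = raw[:-1], raw[-1]
        if len(body) != _BODY.size or (sum(body) & 0xFF) != checksum:
            raise ValueError("corrupted instruction frame")
        return cls(*_BODY.unpack(body))

    def residence_time(self, now: int) -> int:
        """Time spent inside the gateway since ingress, in timestamp units."""
        return now - self.timestamp


if __name__ == "__main__":
    ins = Instruction(prio=7, size=64, timestamp=1000, port=2, net_type=1, opcode=0x10)
    print(Instruction.unpack(ins.pack()) == ins, ins.residence_time(now=1250))
```

Keeping the metadata in a fixed-size record of this kind is what lets each stage re-check the tail and read the timestamp without touching the data frame itself.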


thus optimizing the processing time when combined with an effective prioritization and arbitration strategy responsible for the queueing and dequeueing of the internal frames at each moment, called Distributed Arbitration Engine (DAE) and described later. More details on the use of this loopback path, together with an example of application implemented on this architecture, are available in [57].
After processing, frames are stored in the egress queues and finally transmitted to the egress PHY, according to the shaping rules determined by the traffic shaping module. This IPCore can deploy any traffic shaping standard, such as the ones defined by IEEE TSN standards or others that may come in the future. The details of this IPCore are available in [58]. Once again, the datapath can be extended through a loopback, if required, after the traffic shaping stage. This provides the required flexibility to perform recursive processing with frames that have already reached the egress stage. More details about this loopback path and suitable use cases are described in [59].
One important aspect of the eGW architecture is the scalability and flexibility provided towards designing different families of gateway products: from a very simple gateway with few ingress/egress ports and few processing functionalities, to a high-end product with many ingress/egress ports and complex functionalities integrated, the high level architecture depicted in Fig. 8 remains the same. Exploiting this characteristic of the architecture, a system design framework called eGW Builder, which allows for selecting the geometry and capabilities of each gateway design, has been developed [60] together with a generic validation framework for the generated designs [61]. In this work, we explain how the integration of DDS features can be defined in the design framework and how this is translated into the HW implementation.
The flexibility and scalability claimed by the eGW architecture are supported by four main features: a System Time Engine (STE) that provides system time awareness, a Distributed Arbitration Engine (DAE) that enables flexible control over frames routing/processing based on dynamic priorities, an Elastic Queueing Engine (EQE) that supports flexible memory organization for the storage of frames and a Traffic Shaping Engine (TSE) that shapes traffic according to priorities and other shaping algorithms in place. These aspects are detailed in the following subsections.

A. SYSTEM TIME ENGINE
The System Time Engine (STE) is in charge of two structural aspects of the eGW: (i) generating the clock references to be used by the different eGW IPCores, and (ii) generating the system Time Reference. The internal architecture of STE is depicted in Fig. 10. On one hand, this block contains a Phase Locked Loop (PLL) that can be configured to generate different clock frequencies according to the configuration defined by the CPU. Then, the different clocks are routed to the corresponding IPCores, providing flexibility with regard to the operating frequency of each module. For example, the ingress stage will usually operate at the rate of the ingress port, but it may be useful to operate at higher rates in the processing stage in order to accelerate complex processing tasks when required. The queueing modules in the eGW architecture allow frames to seamlessly traverse from one clock domain to another.
On the other hand, the System Time Engine includes a System Time Controller that generates the TimeReference signal within the system. This is an essential item within eGW, since it distributes a common notion of time across the different eGW IPCores. As seen in Fig. 8, the TimeReference signal is distributed to all the IPCores of eGW. This information is then used at the different stages in order to make decisions based on time. This time signal is synchronized across the network with other nodes by using the IEEE802.1AS synchronization protocol, thus enabling the integration of further TSN technologies such as synchronous traffic shaping algorithms along distributed systems composed of several nodes or ECUs like the ones depicted in Fig. 3. System Time exchanges information with the processing stage in order to extract the required time synchronization information from frames related to IEEE802.1AS.

[Figure 10: block diagram. The CPU configures the System Time Engine through a memory map and receives feedback from it; a Phase Locked Loop (PLL) derives the clocks CLK0 ... CLKN from CLKIN; the System Time Controller (IEEE 802.1AS) exchanges feedback with the processing stage and outputs the TimeReference signal.]
FIGURE 10. System Time Engine.

B. DISTRIBUTED ARBITRATION ENGINE
The strategy followed by the Distributed Arbitration Engine (DAE) relies on the SDN concept at the core of the eGW architecture. As explained before, the eGW SoC separates control plane from data plane, so that the frame itself and the metadata related to it (instruction frame) can be handled in parallel. The most important field within the context of this work is the first field in the header (see Table 3): frame priority (PRIO). This field is the one used by the DAE in order to make decisions on how to prioritize frames. This field defines the priority of the frame in real time, and is composed not only of fixed priorities that can be assigned by strategies such as VLAN tag, but also of dynamic aspects such as the time left for a frame to meet its deadline. All these bits have a meaning and a weight that determine how to perform

the arbitration in the dequeueing. The bits within the PRIO field are organised in such a way that the DAE only needs to find the largest number among the available frames to select the next frame for processing/transmission (fast sorting in HW).

[Figure 11: layout of the PRIO field, from most to least significant bits: I (Interrupt, set by an interrupt event), T (Timestamp bits, produced by the deadline calculation from the system time), DP (Defined Priority, the priority defined by the previous stage) and IBT (In Band Telemetry, fed by the statistics memory map).]
FIGURE 11. Priority field composition.

[Figure 12: block diagram of the queueing block, with inputs I0 ... Ii, outputs O0 ... Oo, a queues controller with CTRL, FULL and EMPTY signals, and instruction interfaces (Instruction0 ... Instructioni) from/to the DAE.]
FIGURE 12. Elastic Queueing Engine Architecture
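The arbitration rule can be illustrated in a few lines. This is a software sketch, not the hardware design: the field widths, the clamping of the remaining time and the names `compose_prio` and `arbitrate` are assumptions, while the field order (interrupt, time, defined priority, in-band telemetry) follows Fig. 11. The time field stores the two's complement of the remaining time, so that a frame closer to its deadline compares as larger.

```python
from typing import Dict

T_BITS, DP_BITS, IBT_BITS = 16, 8, 7  # assumed widths; with the interrupt bit: 32 bits


def compose_prio(interrupt: bool, remaining_us: int, defined_prio: int,
                 telemetry: int = 0) -> int:
    """Build a PRIO word whose unsigned value grows with urgency."""
    t_mask = (1 << T_BITS) - 1
    # Assumption: clamp to [1, t_mask] so the two's complement never wraps to 0.
    remaining = min(max(remaining_us, 1), t_mask)
    urgency = (-remaining) & t_mask  # two's complement: less time left, larger value
    word = int(interrupt)                                       # most significant bit
    word = (word << T_BITS) | urgency                           # time to deadline
    word = (word << DP_BITS) | (defined_prio & ((1 << DP_BITS) - 1))
    word = (word << IBT_BITS) | (telemetry & ((1 << IBT_BITS) - 1))
    return word


def arbitrate(queue: Dict[str, int]) -> str:
    """Return the name of the frame with the largest PRIO word."""
    return max(queue, key=queue.get)


if __name__ == "__main__":
    # Frame names and values are made up for the example.
    queue = {
        "frame_a": compose_prio(False, remaining_us=500, defined_prio=3),
        "frame_b": compose_prio(False, remaining_us=100, defined_prio=1),
        "frame_c": compose_prio(True, remaining_us=5000, defined_prio=0),
    }
    print(arbitrate(queue))  # the interrupt bit dominates every other field
```

Because the fields are concatenated by decreasing weight, a single unsigned comparison implements the whole policy, which is what makes the selection cheap to realise as a comparator tree.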
The DAE is used in different stages of eGW as seen in Fig. 8. Mainly, it is present in every IPCore that needs to read frames from a queueing block, since this implies deciding which frame(s) to read/dequeue. The most complex arbitration happens in the processing stage, where several frames can be selected in parallel and processed in different processing units, as illustrated in Fig. 8 by means of the building blocks TASK 0 to K. In the egress stage, it is integrated in the Traffic Shaping Engine as described later in this section.
The composition of the PRIO field is depicted in Fig. 11. The first bit corresponds to interrupt events. This makes it possible to suddenly raise the priority of a frame to the highest (because it is the most significant bit) in the case of particular events. Then, the notion of time is integrated by introducing a deadline, which indicates how much time is left for a frame to still get to its destination on time. In this case, the two's complement of the deadline is used in order to give more priority to frames with less time available. Afterwards, a predefined priority is included. This field can change across the different stages of the eGW. For instance, at the ingress stage the normalizer can assign a default priority to all the frames coming through an ingress port. Different normalizers can have different priorities, establishing a first prioritization between ingress ports. Then, in the filtering stage, the priority can change according to the pattern identified in the frame. This allows the DAE in the processing stage to know which frames have higher priority, including interrupts or tight timing constraints as conditions that can raise or lower the assigned priority at run-time. Finally, aspects coming from the internal telemetry or in band telemetry can also be part of the prioritization of frames, using information collected by the statistics module of the eGW, which adds further flexibility in defining and configuring the prioritization strategy.

C. ELASTIC QUEUEING ENGINE
The Elastic Queueing Engine (EQE) provides a set of FIFO memories that store frames while they wait to be processed by the next stage. At the same time, it acts as interface between stages that can operate at different frequencies and/or have different datapath sizes, enabling the claimed flexibility in the eGW datapath. The internal architecture is depicted in Fig. 12. As seen in the figure, the datapath is very simple, with interconnection resources that make it possible to store any ingress frame into any of the internal queues. The selection of where to store the frames is done by the control plane, more precisely by the DAE, taking into account the current status of the queues as well as the metadata embedded in the instruction of each frame. In this way, the queues controller block shown in Fig. 12 is seamlessly integrated with the DAE in order to perform the queueing and dequeueing of frames at run time, in real time. On the output interface, EQE allows the next stage to read from any of the queues, thus providing maximum flexibility to manage the available frames. In terms of configuration options, EQE supports selecting the number of input (i) and output (o) ports as well as the size of the internal buffers, thus adapting to the needs of the different stages that require queueing resources within the eGW datapath.
Complementing the previously described orchestration and arbitration strategy, the queueing stage provides an extra degree of flexibility, since the memory resources inside the queueing module can be organized in different ways. On one side, the architecture makes it possible to define different queueing blocks based on the same structure by selecting configuration parameters such as number of FIFO modules, queueing depth per unit, input/output width, etc. On the other side, it also provides flexibility regarding the usage of the instantiated resources during operation, i.e. at run-time, maximizing the use of the available memory. In other words, trying

to avoid unnecessary message drops if the eGW SoC has free space in any of the queues, independently of how they are organized, which is not the case in other more rigid, i.e. inelastic, chipset solutions on the market today. Further details and experimental results of some initial dynamic strategies for queueing management are available in [55]. In this work, we extend EQE by introducing a new approach to the management of queueing resources. Instead of focusing on virtually extending the queues as in the previous work, we now focus on providing a large number of small queues that are all accessible to the next stage. With this new approach we increase the accessibility of the data stored in the queues, providing finer granularity in decision making, since many more frames can be compared and stored in parallel to improve the inline arbitration.

[Fig. 13 block diagram. Blocks shown: HOST CPU (from/to host CPU); TRAFFIC SHAPING ENGINE (TSE) with a CONTROL PLANE (configuration registers (memory map), time reference with CURRENT TIME (T bit), interrupt sources, TSE CORE containing FSM, queues control, statistics and DAE) and a DATA PLANE (N FIFOs, DATA IN (M bit), DATA OUT (M bit) towards the egress PHY). Signals shown: INSTRUCTION IN (K bit), READ_EN (N bit), NF/FULL/NE/EMPTY STATUS (N bit each), Queue Tx Allowed.]
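As a rough illustration of how the PRIO composition of Fig. 11 can be combined with many small, individually readable queues, the following C++ sketch packs a PRIO word and picks the most urgent queue head. The field widths, the names `make_prio`, `QueueHead` and `select_queue`, and the plain unsigned comparison are assumptions made for this example only; the actual layout is the one defined in Fig. 11, and in eGW the arbitration is performed by the DAE in hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field widths; the real PRIO layout is the one in Fig. 11.
constexpr unsigned kDeadlineBits = 16;
constexpr unsigned kStaticBits = 8;
constexpr std::uint32_t kDeadlineMask = (1u << kDeadlineBits) - 1u;
constexpr std::uint32_t kStaticMask = (1u << kStaticBits) - 1u;

// Packs [interrupt | two's complement of deadline | predefined priority].
// A larger word means a more urgent frame.
inline std::uint32_t make_prio(bool interrupt, std::uint32_t deadline,
                               std::uint32_t static_prio) {
    // Two's complement within the field: less remaining time -> larger value.
    // Note that a deadline of 0 wraps to 0 in this simple encoding.
    const std::uint32_t neg = (~deadline + 1u) & kDeadlineMask;
    const std::uint32_t irq = interrupt ? 1u : 0u;
    return (irq << (kDeadlineBits + kStaticBits)) |
           (neg << kStaticBits) |
           (static_prio & kStaticMask);
}

// Head-of-queue view of one small FIFO (hypothetical structure).
struct QueueHead {
    bool not_empty;      // queue status reported to the arbiter
    std::uint32_t prio;  // PRIO word of the frame at the head
};

// Returns the index of the non-empty queue whose head has the highest PRIO,
// or -1 when every queue is empty. Ties go to the lowest index.
inline int select_queue(const std::vector<QueueHead>& heads) {
    int best = -1;
    for (std::size_t i = 0; i < heads.size(); ++i) {
        if (!heads[i].not_empty) continue;
        if (best < 0 ||
            heads[i].prio > heads[static_cast<std::size_t>(best)].prio) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With this encoding, a frame with 5 time units left outranks one with 100 time units left regardless of the predefined priority, and a frame with the interrupt bit set outranks both.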

D. TRAFFIC SHAPING ENGINE


The Traffic Shaping Engine (TSE) selects the next frame eligible for transmission based not only on the priority field in the instruction frame, but also on other parameters defined by the traffic shaping algorithms in place. It provides a common HW architecture to implement any shaping algorithm required by TSN standards or other shaping strategies that may be of interest. It provides a set of registers in the form of a memory map that can be configured from the CPU in order to implement different shaping strategies. Finally, it also provides the option to configure interrupt sources that can change the configuration in place, in real time and at run-time. The high-level architecture of TSE is depicted in Fig. 13. As shown in the figure, similarly to EQE, the data plane is kept very simple in order to minimize the latency experienced by frames traversing the eGW. The intelligence is managed in the control plane, mainly by the TSE Core. This block takes as input the configuration written in the memory map and the status of the queues, together with the available instruction frames. Additionally, it has dedicated inputs for interrupt sources that may alter the behavior of the controller. In [58] we introduced the concept of TSE for the first time and performed some experiments to evaluate its capability to implement the different TSN standards. In this work, we extend the previous one by integrating the DAE within TSE, improving the flexibility regarding the management of frames. We also explore the suitability of TSE to support the combination of DDS and TSN with a HW-based approach (Section VIII).

FIGURE 13. Traffic Shaping Engine Architecture

The TSE core internal architecture is also visible in Fig. 13. We show how the DAE is integrated in order to decide which queue is the next one to be transmitted. However, in this case, apart from the instruction frames available in the queueing block, DAE takes as input the output of the queue control stage, which determines whether queues are allowed to transmit or not. Previously, we saw how DAE can take the status of the queues as input and combine it with the priority field in the instruction frame. Now we add an additional stage to the processing, determining the queues eligible for transmission based not only on availability (queues status) but also on the configuration of the traffic shaping algorithms determined by the memory map. This means that for each queue we are able to determine at each moment in time whether it is allowed to transmit or not, and have DAE select the next transmission frame only among the queues for which transmission is allowed. For example, when implementing TAS or CBS, the algorithm would run individually for each queue, determining whether the queue can transmit or not (i.e. gate open/closed in the case of TAS, or credit available in the case of CBS). Afterwards, DAE would select the highest-priority frame among those allowed to transmit. The main difference with the previous work in [58] is that by using DAE the priority is determined by the frames themselves, and not by the queue in which they are stored. This allows more options in how to store and handle the frames, giving more flexibility within the shaping stage and making our arbitration strategy much more scalable and HW-independent/agnostic.

VI. HARDWARE-ACCELERATED DDS: DEPLOYMENT IN ELASTIC GATEWAY

DDS has been designed to be timing predictable and efficient in its resource usage. However, its timely execution is affected by the typical issues occurring on general-purpose platforms, such as contention on shared software and hardware resources (e.g. data structures, threads, memory hierarchy, CPUs, etc.). Furthermore, being a software technology, DDS needs some non-negligible CPU processing, which might not be available on slow automotive CPUs. The community answered this issue by proposing the DDS For Extremely Resource Constrained Environments (DDS-XRCE) standard [62]. However, this profile does not offer key features of the full standard. Moreover, it introduces the need for an Agent in the software architecture to communicate with a DDS network, which could easily become a single point of failure.

It is therefore clear that the complexity of the software stack must be reduced and its timing accuracy improved in order to fully exploit the potential of DDS technology in the automotive domain. This section presents the novel idea of implementing a subset of the DDS QoS functionalities directly in hardware by means of reliable accelerators. In particular, we showcase how some of the DDS QoS policies can be deployed in the previously described eGW architecture. By snooping the DDS traffic, the network equipment can automatically configure the hardware components and resources to enforce and better meet the requested QoS.

A. USE CASE 1: EGW AS A NETWORK RELAY BETWEEN PUBLISHERS AND SUBSCRIBERS

In this section, we detail how eGW supports DDS policies for traffic that is traversing the eGW, i.e., eGW is neither Publisher nor Subscriber for this traffic, just an element of the network path between them acting as a relay or hop. The main idea is to deploy the capability to detect DDS-related frames and configure the eGW in order to better meet the requirements of the communication at run-time. For instance, it is possible to detect DDS policies such as transport priority or latency budget, and adjust the prioritization of the traffic classes within the eGW in order to ensure that DDS requirements are met. In Fig. 14, we show how this strategy can be deployed within the eGW internal architecture. After reception, frames traverse the filtering stage, where RTPS messages can be identified. Furthermore, by defining the appropriate matching rules, the information embedded in the RTPS message can also be identified, extracting the related inlineQoS parameters and deciding the prioritization of the frame within the eGW. This way, the QoS required by the services can be guaranteed both during the processing stage and in the egress ports, which is where most conflicts occur.

The definition of matching rules for DDS Parameter Extraction exploits the modularity of RTPS messages, which makes it easy to obtain the desired DDS parameters. The frame format of an RTPS Message consists of a fixed-size Header followed by a variable number of RTPS Submessage parts. Each Submessage also has a SubmessageHeader and a variable number of SubmessageElements, as shown in Table 4. There are several kinds of Submessages, such as Data, AckNack, Heartbeat, etc. In the case of DDS QoS policies, RTPS messages allow this information to be sent in a Data Submessage. Therefore, we now focus only on the structure of this kind of Submessage. The most relevant parameters of this submessage for our work are the InlineQosFlag and inlineQos, as highlighted in Table 5. The first one signals whether any QoS property is indicated within the message. If so, the latter contains the inlineQos parameter list. The inlineQos parameter list contains information related to the DDS QoS policies used in this transaction. These can be the parameters related to any of the QoS policies marked with "c" in Table 1, together with information identifying the service and entity which the message belongs to. By exploiting this fixed structure it is possible to define the filtering rules that can extract inlineQoS parameters in a simple way. Table 6 shows the InlineQoS ParameterList format in RTPS Data messages. The parameter types are defined in the DDS specification [16]. Apart from these QoS properties, the submessage contains information related to the reader and writer involved in the communication, or other parameters outside of the scope of this work. Table 6 highlights the QoS policies that can be supported by eGW for this use case and that will be further described in the next paragraphs.

TABLE 4. RTPS message format.

Header    Submessage Header    Submessage Element    Submessage Header    Submessage Element

TABLE 5. RTPS Data Submessage format.

Data Submessage Header     SubmessageId
                           Flags
                           SubmessageLength
Data Submessage Element    EndiannessFlag
                           InlineQosFlag
                           DataFlag
                           KeyFlag
                           NonStandardPayloadFlag
                           readerId
                           writerId
                           writerSN
                           inlineQos
                           serializedPayload

TABLE 6. InlineQoS ParameterList in Data Message format.

PARAMETER NAME          ID        TYPE
TOPIC_NAME              0x0005    String<256>
DURABILITY              0x001D    DurabilityQoSPolicy
PRESENTATION            0x0021    PresentationQoSPolicy
DEADLINE                0x0023    DeadlineQoSPolicy
LATENCY_BUDGET          0x0027    LatencyBudgetQoSPolicy
OWNERSHIP               0x001F    OwnershipQoSPolicy
OWNERSHIP_STRENGTH      0x0006    OwnershipStrengthQoSPolicy
LIVELINESS              0x001B    LivelinessQoSPolicy
PARTITION               0x0029    PartitionQoSPolicy
RELIABILITY             0x001A    ReliabilityQoSPolicy
TRANSPORT_PRIORITY      0x0049    TransportPriorityQoSPolicy
LIFESPAN                0x002B    LifespanQoSPolicy
DESTINATION_ORDER       0x0025    DestinationOrderQoSPolicy
CONTENT_FILTER_INFO     0x0055    ContentFilterInfo_t
COHERENT_SET            0x0056    SequenceNumber_t
DIRECTED_WRITE          0x0057    GUID_t
ORIGINAL_WRITER_INFO    0x0061    OriginalWriterInfo_t
GROUP_COHERENT_SET      0x0063    SequenceNumber_t
GROUP_SEQ_NUM           0x0064    SequenceNumber_t
WRITER_GROUP_INFO       0x0065    WriterGroupInfo_t
SECURE_WR._GR._INFO     0x0066    WriterGroupInfo_t
KEY_HASH                0x0070    KeyHash_t
STATUS_INFO             0x0071    StatusInfo_t

1) Deadline

On automotive systems, there is often the requirement that some safety-critical data is sent and received at regular periods. An example is the radar information that needs to be sent


(and received) at regular intervals to let the receiving component take decisions for emergency braking. DDS allows this requirement to be codified through the DEADLINE QoS, which specifies the maximum period within which data must be sent and received, on the writer's and receiver's sides respectively.

The API to set this QoS policy for a Topic on software DDS implementations is similar to the following:

    TopicQoS custom_qos;
    DeadlineQoSPolicy deadline;
    deadline.period = {..., ...};    // maximum period between samples
    custom_qos.deadline(deadline);
    create_topic(tp_name, tp_type, custom_qos);

In software implementations, however, the latency in the intermediate network equipment might trigger unwanted callback executions. In fact, even if the DataWriter respects the contract by sending the information at the right period, the DataReader might receive it with jitter, as if it had not been sent in time. For this reason, it is important that the intermediate network equipment is aware of the DEADLINE value, so that it can prioritize the messages when the period is expiring. This information is embedded in the RTPS messages used to send the data of a specific service from publisher to subscriber. Therefore, as seen in Fig. 14, it is possible for the eGW to extract the DEADLINE information and classify this traffic into the appropriate traffic class within the system, providing more guarantees towards meeting the required delay. It is then the job of the DAE to perform the dequeueing of frames, taking into consideration the PRIO field where all this DDS information is encoded, as shown in Fig. 11. Moreover, this policy is a good fit for interaction between DDS and TSN standards, since it could benefit from the same HW implementation as IEEE802.1Qbv, as we explain later in Section VIII.

[Fig. 14 block diagram. Blocks shown: ingress PHYs; frame filtering (Match 0, Match 1) with DDS rules; ingress queueing 0 and 1; frame processing (Action) with TASK 0, TASK 1, ..., TASK K and DAE; egress queueing; traffic shaping; egress PHY. Annotation on the filtering stage: "Detect DDS frame, extract DDS parameters and determine priority of frames accordingly". Annotation on the processing stage: "Arbitration of parallel queues based on frames priority, with DDS frames competing with the rest of non-DDS enqueued traffic".]

FIGURE 14. DDS support in eGW, when eGW is acting as a switch.

2) Latency Budget

The LATENCY_BUDGET QoS policy specifies the maximum delay from data write to data reception and notification.

The API to set this QoS policy for a Topic on software DDS implementations is similar to the following:

    TopicQoS custom_qos;
    LatencyBudgetQoSPolicy latency;
    latency.duration = {..., ...};    // maximum acceptable delay
    custom_qos.latency_budget(latency);
    create_topic(tp_name, tp_type, custom_qos);

Similarly to the DEADLINE policy, this information comes in an RTPS message when exchanging information between publisher and subscriber. Again, eGW can extract this information and prioritize the traffic accordingly. In this case, classification could be based on a set of thresholds or ranges defined within the DDS rules (when latency budget < X, then priority -> X; when latency budget > Y, then priority -> Y). Since the latency budget format is defined by the RTPS frame, it is possible to define the HW accordingly in order to guarantee that all values of the policy can be handled in the HW. Furthermore, thanks to the SDN approach followed within eGW, these thresholds can be updated by the CPU when required, allowing the configuration used to prioritize DDS traffic to be changed dynamically.

3) Transport Priority

The TRANSPORT_PRIORITY QoS policy specifies the priority to be used on the underlying transport.

The API to set this QoS policy for a Topic on software DDS implementations is similar to the following:

    TopicQoS custom_qos;
    TransportPriorityQoSPolicy priority;
    priority.value = ...;    // priority hint for the underlying transport
    custom_qos.transport_priority(priority);
    create_topic(tp_name, tp_type, custom_qos);

As in the previous cases, the DDS Parameter Extraction module gets this information and defines the priority used for these frames in the egress stage. In this case, the DDS policy is already enforcing an explicit priority, which only needs to be converted to the internal priorities in order to assign it to the corresponding frame. The mapping between priorities is done based on a table defined by the eGW CPU, which guarantees that every possible priority in DDS will have a corresponding priority in eGW.
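The extraction and classification steps described for Use Case 1 can be mimicked in software to make the data flow concrete. The sketch below walks an inlineQos parameter list, pulls out one parameter, and maps its value to an internal priority through a range table. It is only an illustration under stated assumptions: the list is taken to be little-endian with each entry laid out as a 16-bit ID, a 16-bit length and the value; only the first 32-bit word of a value is read (the seconds word for a duration); and the names `find_param_word`, `RangeTable` and `lookup`, as well as the table contents, are hypothetical. The parameter IDs are those of Table 6, and the sentinel ID that closes a list comes from the RTPS specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Parameter IDs taken from Table 6.
constexpr std::uint16_t kPidLatencyBudget = 0x0027;
constexpr std::uint16_t kPidTransportPriority = 0x0049;
// RTPS marks the end of a parameter list with PID_SENTINEL.
constexpr std::uint16_t kPidSentinel = 0x0001;

// Little-endian readers (assumes the EndiannessFlag is set).
inline std::uint16_t rd16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
inline std::uint32_t rd32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Looks for `wanted` in a list of {id:2, length:2, value:length} entries.
// On success stores the first 32-bit word of the value and returns true.
inline bool find_param_word(const std::vector<std::uint8_t>& list,
                            std::uint16_t wanted, std::uint32_t& word) {
    std::size_t pos = 0;
    while (pos + 4 <= list.size()) {
        const std::uint16_t pid = rd16(&list[pos]);
        const std::uint16_t len = rd16(&list[pos + 2]);
        pos += 4;
        // Stop at the sentinel or when an entry would run past the buffer.
        if (pid == kPidSentinel || pos + len > list.size()) return false;
        if (pid == wanted && len >= 4) {
            word = rd32(&list[pos]);
            return true;
        }
        pos += len;
    }
    return false;
}

// Hypothetical CPU-written rule table: each key is the smallest value of a
// range, the mapped value is the internal eGW priority for that range.
using RangeTable = std::map<std::int64_t, std::uint8_t>;

inline std::uint8_t lookup(const RangeTable& table, std::int64_t value,
                           std::uint8_t fallback) {
    auto it = table.upper_bound(value);        // first key greater than value
    if (it == table.begin()) return fallback;  // value is below every range
    return std::prev(it)->second;
}
```

For a latency budget, a table such as `{{0, 7}, {1, 4}, {10, 1}}` would give the highest internal priority to budgets below one second; the same `lookup` can serve TRANSPORT_PRIORITY with a table that assigns higher internal priorities to higher DDS values, with the fallback covering any value below the first range.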

4) Reliability

The RELIABILITY QoS policy specifies the reliability level of message delivery.

The API to set this QoS policy for a Topic on software DDS implementations is similar to the following:

    TopicQoS custom_qos;
    ReliabilityQoSPolicy rel;
    rel.kind = RELIABLE_RELIABILITY_QOS;    // request reliable delivery
    custom_qos.reliability(rel);
    create_topic(tp_name, tp_type, custom_qos);

In this case, the information can be identified and used to prioritize traffic requiring higher reliability, but also to configure other internal reliability mechanisms, such as the Frame Replication and Elimination for Reliability (FRER) strategy defined in TSN standards. The interaction between this DDS policy and the TSN standard is described in Section VIII.

B. USE CASE 2: EGW AS A PROCESSING ELEMENT HOSTING A PUBLISHER OR SUBSCRIBER

In this section, we describe how eGW can support DDS QoS policies for traffic that is generated/consumed at the eGW itself, i.e. the case when eGW hosts the application of the reader/writer. Essentially, new HW accelerators are deployed in the frame processing stage, where some of the DDS QoS policies can be offloaded from the CPU, as shown in Fig. 15. On one side, there is HW support for publisher features, which collects information from the CPU memory map and generates the required traffic (e.g. alive messages). On the other side, there is HW support for subscriber features, where traffic is first analyzed in the HW, some characteristics are extracted, and then frames are sent to the CPU when applicable.

Next, we elaborate on some DDS policies that can be deployed within the described architecture and how this can be implemented.

1) Liveliness

The LIVELINESS QoS policy describes a "mechanism to determine if an entity is active ("alive")".

The API to set this QoS policy for a Topic on software DDS implementations is similar to the following:

    TopicQoS custom_qos;
    LivelinessQoSPolicy live;
    live.kind = AUTOMATIC_LIVELINESS_QOS;    // middleware asserts liveliness
    live.duration = {..., ...};
    custom_qos.liveliness(live);
    create_topic(tp_name, tp_type, custom_qos);

When configured, entities send alive messages informing that their instance is up and running. The policy can be configured to either let the middleware send these messages automatically (as in the example above) or to leave this responsibility to the application code.

For this policy, a different type of RTPS message is used: Heartbeat. As highlighted in Table 7, within the Heartbeat message, the LivelinessFlag is the element that needs to be asserted in order to inform the reader about the writer status. The other fields can also be filled accordingly to maximize usability of this Heartbeat message, although the purpose of each field is not of interest for this particular work. In our case, eGW can easily perform the task of periodically sending Heartbeat messages within the Publisher Support module, offloading this processing from the CPU.

TABLE 7. RTPS Heartbeat Submessage format.

Heartbeat Submessage Header     SubmessageId
                                Flags
                                SubmessageLength
Heartbeat Submessage Element    EndiannessFlag
                                FinalFlag
                                LivelinessFlag
                                GroupInfoFlag
                                readerId
                                writerId
                                firstSN
                                lastSN
                                count
                                currentGSN
                                firstGSN
                                lastGSN
                                writerSet
                                secureWriterSet

2) Time-based filter

The TIME_BASED_FILTER QoS policy defines the minimum separation between the updates that a DataReader is interested in receiving.

The API to set this QoS policy for a DataReader on software DDS implementations is similar to the following:

    DataReaderQos custom_qos;
    TimeBasedFilterQosPolicy time;
    time.minimum_separation = {..., ...};    // minimum time between updates
    custom_qos.time_based_filter(time);
    create_datareader(tp_name, custom_qos, ...);

When established, Readers filter out frames which are outside of the receiver window (i.e. earlier than what the time-based filter requires). However, this policy is not embedded as an inlineQoS parameter within an RTPS message. In this case, it is a configuration that belongs to the reader and can be specified in the device supporting the reader application processing. eGW supports this feature within the Subscriber Support module by simply dropping messages according to the time window rules, thus offloading this filtering task from the CPU and reducing unnecessary workload.

3) Destination order

The DESTINATION_ORDER QoS policy defines the logical order among changes made by Publishers to the same data instance. In essence, it defines means to resolve conflicts when several publishers update the same data instance. For example, it allows choosing whether recorded values are based on reception timestamp (last received prevails) or source timestamp (last sent prevails). The API to set this QoS policy for a Topic on software DDS implementations is similar to the following:

    TopicQoS custom_qos;
    DestinationOrderQoSPolicy order;
    // order updates by reception timestamp
    order.kind = BY_RECEPTION_TIMESTAMP_DESTINATIONORDER_QOS;
    custom_qos.destination_order(order);
    create_topic(tp_name, tp_type, custom_qos);
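To illustrate the kind of reader-side decision that could be offloaded from the CPU, the following C++ sketch models the time-based filter and a source-timestamp destination order as a single accept/drop test. It is a software analogy under assumptions, not the eGW implementation: the names `Update` and `ReaderFilter`, the use of abstract time units, and the rule of comparing each update only against the last accepted one are all hypothetical choices made for the example.

```cpp
#include <cstdint>

// One received update as seen by a (hypothetical) subscriber-support block.
struct Update {
    std::uint64_t source_ts;     // timestamp written by the DataWriter
    std::uint64_t reception_ts;  // timestamp taken on arrival at the eGW
};

class ReaderFilter {
public:
    ReaderFilter(std::uint64_t min_separation, bool by_source_timestamp)
        : min_separation_(min_separation), by_source_(by_source_timestamp) {}

    // Returns true when the update should be handed to the CPU.
    // Reception timestamps are assumed to be non-decreasing.
    bool accept(const Update& u) {
        if (has_last_) {
            // TIME_BASED_FILTER: drop updates that arrive too soon.
            if (u.reception_ts - last_.reception_ts < min_separation_) {
                return false;
            }
            // DESTINATION_ORDER by source timestamp: drop stale updates.
            if (by_source_ && u.source_ts < last_.source_ts) {
                return false;
            }
        }
        last_ = u;
        has_last_ = true;
        return true;
    }

private:
    std::uint64_t min_separation_;
    bool by_source_;
    bool has_last_ = false;
    Update last_{0, 0};
};
```

With `by_source_timestamp` set to false the block behaves as "last received prevails": every update that passes the time window is forwarded, whatever its source timestamp.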


[Fig. 15 block diagram. Blocks shown: CPU; ingress PHYs; frame filtering (Match 0, Match 1); ingress queueing 0 and 1; frame processing with memory map, DDS publisher support, DDS subscriber support and TASK K; egress queueing; traffic shaping; egress PHY.]

FIGURE 15. DDS support in eGW when acting as Publisher/Subscriber.

Typically, the DDS_BY_SOURCE_TIMESTAMP_DESTINATION_ORDER_QOS policy is used, for which a timestamp must be included in the messages by every Writer. In eGW, this can be supported in the subscriber support module, delivering the chosen data to the CPU and abstracting from the complexity of such conflicts.

All in all, the examples elaborated in this section show how we can reduce the software complexity of certain timing-related functions when they are appropriately ported to a dedicated and cost-effective hardware architecture based on hardware accelerators (HWA). Our proposal performs all this processing in-memory and inline, in a smooth and more reliable manner, avoiding by design the potential software uncertainties/interferences that appear when deploying a multicore or multi-thread SW-centric solution. In summary, our technical proposal consists in moving all this time-critical processing from upper SW layers to a lower layer managed in HW but configurable in SW by the host CPU. With this approach, that timing complexity and uncertainty is removed by design.

VII. FROM DDS POLICIES TO IPCORES: ELASTIC GATEWAY BUILDER TOOL

The composition of a complex HW design is often an arduous task. To reduce the complexity, there are several frameworks available targeting different applications. For instance, the authors in [63] present a framework that eases the integration of Deep Neural Networks in FPGAs. In [64] a framework to automate the deployment of TSN switch configurations in FPGAs is presented. Following this trend, we briefly introduce the automation framework that enables the design of gateways based on the eGW architecture. More details on this framework are available in [60], [61].

As seen before, eGW allows different instances of a GW design to be created by choosing which features are integrated in each of them. To ease this task, an automation design framework, "Elastic Gateway Builder", has been developed [60]. The framework allows the user to first define the geometry and shape of the gateway design (architecture) and later configure the embedded functions and features of each IPCore (micro-architecture), with both levels managed by means of configurable parameters. Fig. 16 shows how the framework allows selecting the DDS policies that will be included in the design, and how these are mapped to the HW implementation.

At the top of the figure, an architecture example with two ingress and two egress ports is shown. After defining this first step, the user can go block by block configuring the micro-architecture parameters. For instance, when configuring the micro-architecture view of the Action block, there are several sub-blocks available, as seen in the bottom left of the figure, corresponding to the low-level components of the Action stage. When the selection of parameters of all blocks is done, the designer can click on the "Generate HDL" button to automatically generate the code corresponding to this particular design. In this automatically generated code, the IPCores of eGW will include the required blocks according to the parameters selected, as seen in the bottom right corner of the figure. In this case, the publisher and subscriber support for DDS features will be included. For more details on how the code is automatically generated, interested readers can refer to [60].

Overall, Fig. 16 shows the graphical interface of the tool that allows selecting the desired features, and how the selection is later translated into the HW implementation by instantiating the corresponding IPCores from our eGW Builder library. In this example, we focus on the DDS features, particularly the Liveliness policy. Additionally, for each of the selected features, the associated registers are automatically introduced in the memory map of the full chipset, allowing the required functionalities to be configured. An example of the registers corresponding to the DDS features instantiated in Fig. 16 is shown in Table 8.

With this, we show how the gap between high-level DDS policies and low-level HW IPCores can be bridged thanks to the SDN strategy followed in the eGW architecture and supported by eGW Builder as automation tool.
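The step in which registers are added to the memory map for each selected DDS feature can be pictured with a small C++ sketch. It only mirrors the idea summarised in Table 8 (one parameter per policy, typed as Natural or Boolean); the names `DdsFeature`, `Register` and `build_memory_map`, the 32-bit register size and the sequential offsets are assumptions for illustration and do not describe the code that eGW Builder actually generates.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// DDS features that can be selected (the three shown in Fig. 16).
enum class DdsFeature { Liveliness, TimeBasedFilter, DestinationOrder };

struct Register {
    std::string name;      // parameter exposed to the host CPU
    std::uint32_t offset;  // byte offset in the generated memory map
    bool is_boolean;       // Boolean vs. Natural (unsigned) register
};

// Builds a memory map holding one register per selected feature.
// The 32-bit register size and sequential offsets are assumptions.
inline std::vector<Register> build_memory_map(
        const std::vector<DdsFeature>& selected, std::uint32_t base) {
    std::vector<Register> regs;
    std::uint32_t offset = base;
    for (DdsFeature f : selected) {
        switch (f) {
            case DdsFeature::Liveliness:
                regs.push_back({"alive_timeout", offset, false});
                break;
            case DdsFeature::TimeBasedFilter:
                regs.push_back({"minimum_time", offset, false});
                break;
            case DdsFeature::DestinationOrder:
                regs.push_back({"rx_vs_tx_timestamp", offset, true});
                break;
        }
        offset += 4;  // one 32-bit word per register
    }
    return regs;
}
```

Selecting only Liveliness and Destination Order with a base of 0x100 would, in this sketch, yield two registers at offsets 0x100 and 0x104, so features left out of the design consume no address space.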

TABLE 8. eGW DDS policies Memory Map.

DDS Policy           Parameter              Type
LIVELINESS           Alive timeout          Natural
TIME_BASED_FILTER    Minimum Time           Natural
DESTINATION_ORDER    Rx vs. Tx Timestamp    Boolean

[Fig. 16 screenshot of the ELASTIC GATEWAY BUILDER. Step 1, GW Architecture Definition: GW ARCHITECTURE VIEW with the ACTION block. Step 2, GW Blocks Micro-Architecture Definition: MICROARCHITECTURE VIEW, BLOCK NAME: ACTION, SUB-BLOCKS: Generic Interface, Encryption, Decryption, Forwarding, DDS - Liveliness, DDS - Time Based Filter, DDS - Destination Order, ...; PARAMETERS SELECTED: DDS - Liveliness; PROCESS button. Step 3, HW Implementation: CPU and FRAME PROCESSING with DDS MEMORY MAP (Liveliness Timeout, Time-based Filter Min., Rx./Tx.), DDS PUBLISHER SUPPORT (Alive message, Timeout Counter) and DDS SUBSCRIBER SUPPORT (Time based Filter, Destination Order support).]

FIGURE 16. From DDS Policies to HW implementation with eGW Builder Tool.

TABLE 9. Mapping of QoS features to Simplified OSI model.

Layer                   QoS features
Application Layer       -
Transport Layer         DDS library
Protocol Layer          RTPS protocol
Network Access Layer    TSN technologies

VIII. COEXISTENCE OF DDS WITH TSN IN EGW


The traditional layered approach defined by the OSI model is the de-facto standard in networking devices. Its success resides in the fact that it decouples functionality from the transport and physical layers, permitting functionalities to be reused across different network technologies and increasing the scalability of network management. It also simplifies the application layer processing, by allowing higher layers to operate without needing to know anything about how the lower layers work. However, this freedom in terms of functionality comes at a cost: it is not only that higher layers do not need to know how lower layers operate; it also means that they cannot influence their behavior, even if they want to.

For this reason, QoS management is nowadays commonly deployed in lower layers (TSN at network access layer) but there is no link to the application layer. On the other hand, middlewares such as DDS try to overcome this limitation by offering service-related QoS features at the transport layer. However, without linking DDS to the lower layers, the deployment is still subject to how the link and physical layers are managed. Table 9 shows how the existing QoS mechanisms are mapped to the different layers defined in the OSI model (a simplified view).

There are already some works in the literature exploring the convenient integration of TSN and DDS. In [65] the authors exploit the QoS capabilities of DDS in terms of traffic prioritization to guarantee the QoS of their application, while using TSN to provide time synchronization across the network. However, in this research TSN and DDS just happen to be in place in the experiments, but are independent of one another.

In [66], the authors describe a case study of DDS over TSN for military applications where they map the DDS applications to TSN-enabled end stations. Similarly to the previous reference, TSN and DDS run in parallel, but independently. In [67], the authors explore the QoS capabilities of wireless TSN networks when combined with DDS. In that work, the intersection between TSN and DDS is introduced by mapping DDS topics to TSN streams.

Here, we identify a gap in the state of the art, which is why we explore how DDS and TSN QoS features can be seamlessly combined in order to provide this missing bridge, which would allow low-level QoS properties to be defined at higher levels of abstraction. This bridge is an important advance in the state of the art since it makes it possible to ensure the performance of application-level services that have highly stringent real-time requirements for their operation.

In this section, we analyze how DDS and TSN can be successfully combined in order to improve the QoS management from three different perspectives. Moreover, we also show how this combination of DDS and TSN can be deployed in the previously introduced eGW architecture, maximizing performance through the combination of DDS, TSN and HW-based network processing. Table 10 summarizes the DDS policies that can be mapped to TSN technologies and how they are deployed in eGW.

TABLE 10. Mapping of DDS policies to TSN technologies and eGW SoC.

DDS policy           TSN feature       eGW parameters
TRANSPORT_PRIORITY   Strict Priority   Frame Priority Field
LATENCY_BUDGET       Strict Priority   Frame Priority Field
DEADLINE             IEEE 802.1Qbv     TSE Configuration
RELIABILITY          IEEE 802.1CB      FRER Support

A. TRAFFIC PRIORITIZATION: WHO GOES FIRST?
The whole problem of traffic management can be abstracted as a conflict resolution problem. In the end, QoS strategies are needed because frames encounter conflicts along their trip through the network. In other words, in a network without conflicts, no QoS strategies would be required. However, conflicts do exist in real-world networks, mainly due to the presence of shared resources that need to be arbitrated, which is why different strategies are needed in order to solve them. The first approach to tackle this problem is to introduce traffic prioritization, i.e. out of a collection of frames that may be eligible for transmission at a specific moment in time, "Which one should go first?" Defining priorities makes this decision simple.

DDS provides means to prioritize traffic through two different QoS policies. The first one is the TRANSPORT_PRIORITY policy, which explicitly indicates the priority to be used for a particular frame at the transport layer. Another policy that expresses priority in an indirect way is the LATENCY_BUDGET policy. By indicating the time available for a frame to reach its destination, the priority of the frame can be made dynamic: if a frame has a wide margin to reach its destination, a low priority can be used for its transport, giving more priority to other frames; if, instead, it is running out of time, the priority of this particular frame can be increased, allowing it to meet the latency requirement.

TSN also allows traffic to be classified according to priorities. In this case, frames are divided into different Traffic Classes (TC) (8 according to IEEE 802.1Q), with each TC having a priority assigned. Then, a Strict Priority scheduler chooses the next frame for transmission, which will be the one with the highest priority out of the frames eligible at a particular moment in time. Here we see an opportunity to build a bridge between the DDS specification and TSN, matching DDS prioritization to TSN traffic classes. For the TRANSPORT_PRIORITY policy, this is a direct match between the priorities specified by DDS frames and the TCs defined within the gateway, which can be configured in the gateway memory map. For LATENCY_BUDGET, instead, a dynamic range can be defined, such that depending on the remaining time that a frame has to reach its destination, the priority used at L2 level (i.e. the traffic class) can be adjusted. Table 10 shows how the mapping between DDS policies, TSN traffic classes and eGW registers in the memory map can be deployed.

B. PERIODIC TRAFFIC MANAGEMENT: A COMMON CONCEPT OF PERIOD, TIME AND DEADLINES
Another common challenge of traffic management is how to deal efficiently with periodic traffic. On the one hand, periodic traffic should be, at first glance, easy to manage since we know in advance when it is going to occur. However, when it is mixed with traffic sent with a different periodicity and also with event-based traffic, correct management is not so simple. Furthermore, in order to plan resources well for periodic traffic, a common notion of time needs to be shared across the different devices of the network. This means that, in order to plan for a flow that may arrive at a specific moment in time, all nodes must have a common knowledge of "what time it is". In other words, all nodes need to be synchronized.

For this, TSN offers the IEEE 802.1AS standard, which distributes the time of a master node so that all the other nodes can synchronize with it with an accuracy of 1 µs or less. Once synchronization is in place, different strategies can be used in order to handle periodic traffic. The proposal from TSN technologies is IEEE 802.1Qbv, also known as Time Aware Shaper (TAS). Basically, TAS defines time windows in which only one traffic class is allowed to transmit, ensuring that periodic traffic will encounter an open path when arriving at the node, thus minimizing the end-to-end latency if all nodes synchronize these windows properly.

On the DDS side, the DEADLINE policy defines the maximum time (i.e. the period) between updates of a data instance. Again, we can match the period of the services with a particular deadline policy and make it a part of the TAS cyclic traffic management, ensuring that the traffic corresponding to a particular service will always meet its deadline. This can be

deployed in eGW by appropriately configuring the registers within the traffic shaping stage that define the time windows for each traffic class, as seen in Section V.

C. TRAFFIC RELIABILITY MANAGEMENT: WHEN BEST EFFORT IS NOT ENOUGH
Apart from timing constraints, QoS also applies to the reliability of frame transmission. In safety-related applications this is of utmost importance, since people's health might be at stake. Most of the time, reliability requires redundancy. This is because safety-related information usually needs to be received within a bounded amount of time, which does not allow for data re-transmission, and also because the system must be robust to a link failure, in which case re-transmissions, as used for best-effort traffic, would not be feasible. There are several strategies for redundancy, ranging from topology decisions, where the number of redundant links is chosen, to protocol-level strategies deciding which traffic should be replicated, how and when.

DDS provides the RELIABILITY policy, which specifies the reliability required for a particular service. However, this is more a high-level requirement than an implementation specification, since DDS does not prescribe how this reliability should be provided.

TSN also addresses reliability, for which IEEE 802.1CB has been defined. This standard defines the "Frame Replication and Elimination for Reliability" (FRER) algorithm, which provides a standard way to implement this redundancy of the safety-related communication over alternate paths.

Within the eGW, FRER can be integrated in the processing stage, as one more task in the stack that handles the generation and elimination of replicas, offloading this reliability-related processing from the CPU. More details on how the FRER algorithm can be deployed in the eGW architecture are described in [56].

IX. EGW SOC ARCHITECTURE AND AUTOSAR: HW VS SW CENTRICITY
In this section, we showcase how the eGW SoC architecture is compatible and integrable with the AUTOSAR Classic software stack. This compatibility is a key aspect towards a possible future adoption of the eGW architecture in the industry, for instance in the form of new networking HW peripherals that can be integrated in next-generation networking SoC devices and adopted in automotive and AUTOSAR standards. Furthermore, it is the main reason why it is possible to embed such a wide range of functionalities related to IVNs, and it is the enabler of the integration of HW-accelerated DDS. In order to show this concept, we start with an overview of the AUTOSAR software stack and afterwards detail how each of the layers maps to the eGW SoC architecture.

On the left side of Fig. 17 the AUTOSAR Classic SW stack is shown. From top to bottom, AUTOSAR defines an application layer where the different functionalities required in the vehicle, organized in SW components, are deployed. This application layer runs on top of the Run Time Environment (RTE). Below the RTE there are different software layers that provide functionalities related to lower-level infrastructure. On the one hand, there is a layer for service-related functionalities, which supports the service-oriented architecture. This is key in providing flexibility in the distribution of services across the network, supporting the SDN paradigm. On the other hand, there are different HW abstraction layers that make the higher layers independent of the lower layers. In particular, this abstraction is divided between the MCU abstraction layer, which provides the required APIs to interact with the particular MCU in each implementation, and the ECU abstraction layer, which abstracts the whole ECU, where one or more MCUs may be in place. Additionally, there is a set of drivers that manage the communication between the RTE and the MCU, skipping the other layers when necessary.

FIGURE 17. AUTOSAR SW stack vs eGW SoC architecture.

On the right side of Fig. 17, we show how eGW maps to these software layers defined in AUTOSAR. On the top, the application layer runs on the eGW CPU, similarly to any other AUTOSAR implementation. There, the SW components responsible for running the applications of each automotive domain are deployed, i.e., ADAS/AD, body/comfort, cockpit/infotainment, powertrain/chassis and connectivity. Then, all the HW abstraction layers together with services and RTE are absorbed by the IPCores that compose the eGW data-path and their interconnections. The ECU and microcontroller (MCU) abstraction layers are simplified in the CPU-HW interface (yellow line in the figure), together with the memory map embedded in each of the IPCores. This simplification is possible thanks to the integration of functionalities directly in HW, which "lifts up" the abstraction required by the CPU, covering most of the lower layers and reducing the complexity significantly.

The communication drivers are split across the different IPCores of the eGW data-path. Fig. 18 shows a more detailed view of the functionalities embedded in these drivers in the AUTOSAR specification. On the one hand, the drivers manage the communication with different network protocols (CAN, Ethernet, FlexRay, etc. in the figure). This functionality is absorbed in the Normalizer stage of eGW together with the SDN strategy previously described. Furthermore, eGW allows new network protocols to be integrated by writing a different configuration in the Normalizer Memory Map. In AUTOSAR, the integration of a new protocol requires the standardization and development of a new Basic Software (BSW) module or a custom Complex Device Driver (CDD). So, again, we see how eGW simplifies the management of complexity through HW integration. On the other hand, the drivers also perform routing/diagnostics functionalities, which are split between the Normalizer, Filtering and Action stages in eGW, providing all these functionalities in HW, without intervention of the CPU, increasing performance while reducing CPU load.

FIGURE 18. Simplified AUTOSAR stack (adapted from [68]).

Finally, the Services Layer is absorbed between the Match and Action stages of eGW, by integrating the required IPCores for each service, together with the eGW data-path itself and the queuing and DAE strategies described above, as we have shown for DDS in the previous sections.

Overall, we see how eGW remains compatible with AUTOSAR and how it enables the integration of modern functionalities and services such as DDS in a simple way. More importantly, eGW is able to reduce the complexity of handling and orchestrating all the required functionalities while providing maximum performance, through a pioneering fully HW-centric approach.

X. PROOF OF CONCEPT

This section shows experimental results of some of the concepts defined above, synthesized on the FPGA of a Xilinx Zynq UltraScale+ ZU19EG SoC-based platform [69]. We evaluate the impact of QoS on performance both for the case when eGW is a network switch between publishers and subscribers, and when it is acting as a publisher or subscriber. For the PoC design, we use the previously described eGW Builder Tool in order to generate the corresponding design file for each test case. For the experimentation and data analysis we follow the approach described in [61]. Mainly, we base the experimentation on the use of standard PCAP files in order to inject/log traffic to/from the system. Afterwards, we process the collected PCAPs with Python scripts that extract the data we are looking for.

A. TEST CASE 1: EGW AS A SWITCH BETWEEN PUBLISHERS AND SUBSCRIBERS
In this subsection, we evaluate the performance improvement in terms of QoS of RTPS messages provided by the eGW DDS support when eGW is a switch between publisher and subscribers, i.e. eGW is neither transmitter nor receiver of the RTPS messages. For this, we focus on the TRANSPORT_PRIORITY QoS policy, which provides a simple yet powerful mechanism to control the priority with which a frame is transmitted through the network. To evaluate this feature, we run experiments with different configurations regarding the priority used for transportation of RTPS traffic. The experiments run are the following:
• Experiment 1: No prioritization of RTPS traffic. First, we measure the delay observed for each frame without using any prioritization or traffic shaping mechanism, just a FIFO approach with all the traffic using the same queue inside the queueing modules.
• Experiment 2: Prioritization of RTPS traffic over Best Effort traffic. Second, we define a rule in the filtering stage that recognises RTPS frames and assigns them a priority higher than the rest of the traffic, which is considered best effort in the eGW.
• Experiment 3: Prioritization of RTPS traffic according to TRANSPORT_PRIORITY QoS. Third, we add more rules in the filtering stage that not only recognise RTPS frames, but also inspect the QoS inline parameters. The priority used for the transmission of traffic in the egress stage of eGW is determined by the

FIGURE 19. PoC: eGW as a switch between publishers and subscribers.
(Filtering rules shown in the figure: RTPS message with TRANSPORT_PRIORITY QoS, PRIO = TRANSPORT_PRIO; other RTPS message, PRIO = 3; non-RTPS message, PRIO = 0.)
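The three filtering rules of Fig. 19 and the strict-priority selection at the egress can be sketched in a few lines of Python. This is only a behavioural illustration under stated assumptions: the Frame class, its field names, the clamping of the priority to the range 0..7 and the FIFO tie-break among equal priorities are hypothetical and are not taken from the eGW implementation, which performs these steps in hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    is_rtps: bool
    # Value of the TRANSPORT_PRIORITY inline QoS, or None if the frame has none.
    transport_priority: Optional[int] = None


def classify(frame: Frame) -> int:
    """Return the internal priority (traffic class 0..7) assigned to a frame."""
    if frame.is_rtps and frame.transport_priority is not None:
        # RTPS message with TRANSPORT_PRIORITY QoS: reuse it (clamped, by assumption).
        return max(0, min(7, frame.transport_priority))
    if frame.is_rtps:
        return 3  # RTPS message with other QoS properties
    return 0      # non-RTPS traffic is treated as best effort


def select_next(queue):
    """Strict priority: remove and return the highest-priority frame (FIFO among equals)."""
    if not queue:
        return None
    best = max(range(len(queue)), key=lambda i: (classify(queue[i]), -i))
    return queue.pop(best)


queue = [Frame(False), Frame(True), Frame(True, 6)]
order = []
while queue:
    order.append(classify(select_next(queue)))
print(order)  # prints [6, 3, 0]
```

In this sketch the classification is recomputed on selection for brevity; in the architecture described here the priority is carried with the frame, so the egress stage only needs to compare priority fields.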

content of the TRANSPORT_PRIORITY QoS policy.

The PoC runs on a GW platform based on the eGW architecture, particularized for this use case as depicted in Fig. 19. In this case, we instantiate two ingress ports and one egress port, with Ethernet connections of 100 Mbps. We inject the same traffic (at the same time) in the two ingress ports, and forward all the traffic to the egress port. The traffic in each of the ports consists of 1000 frames of 64 Bytes which are sent every 200 µs. We define 8 different traffic classes, following the IEEE TSN standardization approach [40], where Traffic Class 7 represents the highest priority, and Traffic Class 0 represents the lowest priority. As shown in Fig. 19, we define three filtering rules to differentiate RTPS messages with TRANSPORT_PRIORITY QoS policy, RTPS messages in general with other QoS properties and non-RTPS messages. For each of them, a different priority is defined and used internally in eGW. This way, frames are stored in different queues of the egress stage (based on availability and not on priority, since priority is embedded in the instruction frame). Finally, the TSE is able to select between the frames based on the priority field of the instruction frame, thus closing the loop and allowing the QoS properties defined by the DDS middleware to be used. However, as detailed before, not all the experiments are sensitive to this traffic classification. In the first experiment there is no prioritization of traffic (i.e. all frames fall under Rule 3 in Fig. 19). In the second one, all RTPS messages are given the same priority (Rules 2 and 3 apply, differentiating RTPS from non-RTPS traffic). Finally, in the third experiment, the TRANSPORT_PRIORITY QoS policy is used (Rules 1, 2 and 3 apply, differentiating RTPS messages with TRANSPORT_PRIORITY QoS, RTPS messages with other QoS properties and non-RTPS messages). In our experiments, the TRANSPORT_PRIORITY used in the messages corresponds to the traffic class of the frames, for the sake of easing the final data analysis. We generate the traffic based on PCAP files with the CANoe tool from Vector [70].

FIGURE 20. PoC: Plot of delay measurement of frames across experiments.
(Axes: Traffic Class, 7 to 0, against Time (ns); series: Experiment 1, Experiment 2, Experiment 3.)

The detailed results for each of the experiments are shown in Table 11. For each of the experiments, we show the maximum, minimum and average delay for the flows of each traffic class. Traffic Class 7 represents traffic with the highest priority, and Traffic Class 0 represents traffic with the lowest priority. As we can see in the tables, in the case of Experiment 1, eGW does not differentiate among traffic classes and therefore the average delay is roughly the same for all classes. When we introduce the prioritization strategy, we already see some improvements (Experiment 2) regarding the delay of the RTPS traffic, which further improves when introducing the use of the TRANSPORT_PRIORITY QoS policy (Experiment 3). The rows of Table 11 for the highest priorities (Traffic Classes 6 and 7) show the delay experienced by the corresponding frames. There we see how in the first experiment these flows have a similar delay to the others, while in experiments 2 and 3 they benefit from higher transport priority. On the other hand, the worst case delay for traffic with lower priority increases significantly in exchange for reducing the worst case delay of high priority

TABLE 11. PoC: Measured frame delay (in ns) across the different experiments.

Experiment configuration   Experiment 1 delay          Experiment 2 delay          Experiment 3 delay
Traffic Class   Frames     Minimum  Maximum  Average   Minimum  Maximum  Average   Minimum  Maximum  Average
7*              125        13824    41984    22759     14080    35072    20342     13824    35072    20256
6*              125        13824    41984    22870     13824    36096    21004     13824    35840    20592
5               125        13824    41984    22884     13824    36096    21880     13824    41984    20721
4               125        13824    40960    23799     14080    41216    22484     13824    38144    21110
3               125        13824    41216    22851     13824    42240    22032     13824    41984    20948
2               125        13824    47872    23443     13824    40960    21288     13824    46848    21071
1               125        14080    41984    24107     13824    41984    22769     13824    41216    20707
0               1125       13824    47872    23042     13824    48128    22034     13824    58112    20985
* Traffic Classes 6 and 7 represent the highest priority traffic.

TABLE 12. PoC: Average delay comparison across the different experiments (total measurements in ns).

#   TC 7*            TC 6*            TC 5             TC 4             TC 3             TC 2             TC 1             TC 0
1   22759 (Ref)      22870 (Ref)      22884 (Ref)      23799 (Ref)      22851 (Ref)      23443 (Ref)      24107 (Ref)      23042 (Ref)
2   20342 (-10.6%)   21004 (-8.1%)    21880 (-4.4%)    22484 (-5.5%)    22032 (-3.6%)    21288 (-9.2%)    22769 (-5.5%)    22034 (-4.4%)
3   20256 (-11%)     20592 (-10%)     20721 (-9.5%)    21110 (-11.3%)   20948 (-8.3%)    21071 (-9.2%)    20707 (-14.1%)   20985 (-4.37%)
#: Experiment Number
* Traffic Classes 6 and 7 represent the highest priority traffic.
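The per-class statistics of Table 11 and the relative changes of Table 12 follow from simple arithmetic on the logged timestamps. The sketch below shows one possible way to compute them in Python; the record format (traffic class, ingress timestamp, egress timestamp) and the function names are assumptions made for illustration and do not reproduce the scripts used for the experiments. The usage example only re-derives the Traffic Class 7 and 6 percentages of Experiment 3 from the averages reported above.

```python
from collections import defaultdict


def delay_stats(records):
    """records: iterable of (traffic_class, ingress_ns, egress_ns) tuples (assumed format).

    Returns {traffic_class: (minimum, maximum, average)} of the per-frame delay in ns.
    """
    delays = defaultdict(list)
    for traffic_class, ingress_ns, egress_ns in records:
        delays[traffic_class].append(egress_ns - ingress_ns)
    return {tc: (min(d), max(d), sum(d) / len(d)) for tc, d in delays.items()}


def relative_change(reference, value):
    """Percentage change of value with respect to reference (negative means lower delay)."""
    return 100.0 * (value - reference) / reference


# Average delays (ns) of Traffic Classes 7 and 6, copied from the table above.
experiment_1 = {7: 22759, 6: 22870}   # reference
experiment_3 = {7: 20256, 6: 20592}
for tc in (7, 6):
    print(f"TC {tc}: {relative_change(experiment_1[tc], experiment_3[tc]):.1f}%")
# prints:
# TC 7: -11.0%
# TC 6: -10.0%
```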

traffic, as it could be expected. This effect appears because simulate vehicular traffic for experiments. Authors in [71]
the experiments are using two ingress ports at maximum identify this issue too, and provide some guidelines on how
rate to transmit traffic over one single egress port, taking the to build the traffic that can be used in IVN experiments.
system to the limit. Therefore, the egress port needs to queue In our case, we follow these guidelines and randomize the
some of the low priority frames for a longer time in order to generation of frames when possible in order to get the most
infer the lowest possible delay over new coming high priority realistic results we can. From our perspective, this is an
traffic. In Table 12, we compare a summary of the results opportunity for future research as well, which could represent
focusing on the average delay per traffic class in each of the an important contribution in the field.
approaches. We see that, in general, the average delay of all
traffic classes improves, with higher improvements related to B. TEST CASE 2: EGW AS A PUBLISHER OR
higher traffic prioritization (improvement for Traffic Class 6 SUBSCRIBER
and 7 highlighted in bold). In this subsection, we evaluate some of the benefits of per-
Finally, Fig. 20 provides a graphical view of the results. forming certain DDS features in a HW accelerator when
Again, we see the results for each of the traffic classes, being eGW is acting as publisher or subscriber. In particular, we
TC 7 and 6 the ones with highest priority. We represent evaluate the benefit of offloading a simple task such as the
the range of delay from minimum to maximum within each LIVELINESS QoS policy. As described before, this policy
traffic class with vertical lines, and the average with dots. mainly defines the periodicity of Heartbeat messages that
The different experiments are differentiated with colors and need to be generated and transmitted by writers. The benefits
superposed in the figure to ease visual comparison. Experi- of offloading such functionality are two-fold:
ment one is plotted in blue, experiment 2 in red and exper-
iment 3 in brown. The Figure shows how the introduction • CPU-load reduction: On one side, the CPU is free of
of prioritization improves the average response time of all this task and can dedicate its resources to other pro-
traffic classes for this particular use case. Furthermore, we cessing tasks that require software capabilities (e.g. due
also see how the use of specific priorities for each traffic to algorithm complexity or high silicon cost of a HW
class reduces the delay of higher priority classes, allowing alternative). This is a qualitative benefit that is difficult
to guarantee a maximum worst case for a given traffic class. to measure since it depends on the applications running
We see how the introduction of priorities per traffic class on the CPU.
reduces the maximum delay of high priority traffic (TC 6 • Better timing accuracy: On the other side, the time pre-
and 7) in exchange for longer worst case delays in the lower cision achieved by the HW implementation is typically
priority traffic classes. The big variations observed in the higher (i.e. the jitter observed regarding the established
experiments are caused by the traffic pattern used, where period is expected to be smaller). This aspect can be
two ports are sending a burst of traffic to one single egress quantitatively evaluated, so we focus on this second
port simultaneously. During the experiments, we saw that benefit in this PoC.
the traffic pattern used has a great impact on the result. In order to evaluate the difference in the jitter between
One shortcoming we identified, is that there is currently messages at receiver side, we deploy two implementations of
no available and standardized data set that can be used to the LIVELINESS QoS policy: a software-based implementa-
VOLUME 4, 2016 23

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3213664

C. Scordino et al.: Hardware Acceleration of Data Distribution Service (DDS) for Automotive Communication and Computing

1.000500060 erators. This approach allows to either avoid the need of pow-
erful and expensive MCUs/CPUs to execute the DDS mid-
Time (s)

1.000000060
dleware and improve the predictability of the overall system.
The concept of this transition has been not only detailed but
0.999500060
also deployed through a Proof of Concept. For such a goal,
the eGW SoC architecture has been developed showcasing
0.999000060
HW offload SW implementation the possibility to manage DDS requirements at HW level
through the right management of queueing, arbitration and
FIGURE 21. PoC: Plot of jitter measurement of Heartbeat messages.

TABLE 13. PoC: Results of jitter measurement of Heartbeat messages.

Statistic              SW implementation   HW offload
Maximum time (s)       1.000393828         1.000000250
Average time (s)       1.000004977         1.000000147
Minimum time (s)       0.999118813         1
Jitter (Max-Min) (s)   0.001275015         0.000000250

tion and a HW-accelerated implementation. In the software-based implementation, we define a function that sends frames with a fixed period by writing the corresponding registers of the HW driver. No other functionality runs on the CPU at run-time and no other traffic is present in the system. Although this is a simplistic system, it allows us to compare the impact of the SW implementation running on the CPU operating system against the HW-accelerated functionality. In the HW-based implementation, we implement the publisher support block described in Fig. 16. Both the SW and HW implementations are configured to generate alive messages every second, and we run the experiment on both platforms for 1000 seconds. The results are summarized in Table 13 and Fig. 21. As expected, the jitter of the HW implementation (250 ns) is much smaller than the jitter of the SW implementation (about 1.28 ms). Considering that the SW implementation used for this experiment is a very simple one, without interference from other tasks or threads running in parallel, it is reasonable to expect that the benefit would be even higher on an ECU loaded with many different applications running concurrently. It is important to note that we use the timestamp provided by the CANoe tool [70] when recording the resulting PCAP files. This means that the measurements include delay and jitter related to the tool itself, and not only to the eGW implementation. However, since this applies equally to the SW and HW implementations, it is just an offset in both cases and does not affect the purpose of the comparison.

XI. CONCLUSIONS
In this paper, we have illustrated the rationale behind the transition of the automotive market towards service-oriented architectures (SoA), and we have shown how some use cases can take advantage of the functionalities provided by modern middlewares.
The research has then contributed to bringing the DDS technology to the next level, by proposing the transition of some DDS functionalities from software to hardware accel-
scheduling of frames with the different proposed IPCores. With this, the suitability of the HW-centric approach for high-performance DDS deployment is successfully demonstrated. Moreover, we have given some insight into the relative ease with which the HW-centric DDS solution could be adopted and standardized in AUTOSAR, by means of new standardizable HW peripherals that can be integrated as accelerators in future next-generation automotive SoC devices. The work also shows that the evaluated DDS QoS policies can coexist with other TSN standards and with the IVN functional safety mechanisms required in next-generation autonomous-connected-electric-and-shared (ACES) vehicles. The work also describes the compatibility of DDS with part of the TSN P802.1DG and Functional Safety ISO 26262 standards for certain automotive in-vehicle networking use cases.
All in all, this work pioneers the deployment of HW-centric DDS in cyber-physical systems, particularized here in a zonal gateway networking SoC device for automotive-related scenarios. The work shows how the presented eGW SoC architecture enables not only the integration of HW-centric DDS and TSN features but also compatibility with the AUTOSAR software stack. The research has also contributed to bridging the gap in automotive software complexity thanks to a pioneering HW-oriented approach that integrates SDN, TSN and DDS.

REFERENCES
[1] ISO. ISO 26262-1:2018 Road vehicles - Functional safety - Part 1: Vocabulary. https://www.iso.org/standard/68383.html.
[2] AUTOSAR. Classic Platform. https://www.autosar.org/standards/classic-platform/.
[3] EGAS Workgroup. Standardized E-Gas Monitoring Concept for Gasoline and Diesel Engine Control Units Version 6.0. Technical report, EGAS Workgroup, 2015.
[4] OSEK. OSEK/VDX Operating System Specification 2.2.3. Standard, February 2005.
[5] McKinsey. The case for an end-to-end automotive software platform. https://www.mckinsey.com/industries/automotive-and-assembly/our-insights/the-case-for-an-end-to-end-automotive-software-platform.
[6] Luc van Dijk. Future vehicle networks and ECUs architecture and technology considerations. NXP Semiconductors, Also available at: https://www.nxp.com/docs/en/white-paper/FVNECUA4WP.pdf, 2017.
[7] Hadi Askaripoor, Morteza Hashemi Farzaneh, and Alois Knoll. E/E Architecture Synthesis: Challenges and Technologies. Electronics, 11(4):518, 2022.
[8] Kim Strandberg, Tomas Olovsson, and Erland Jonsson. Securing the Connected Car: A Security-Enhancement Methodology. IEEE Vehicular Technology Magazine, 13(1):56–65, 2018.
[9] AUTOSAR. Adaptive Platform. https://www.autosar.org/standards/adaptive-platform/.
[10] Edsger W Dijkstra. On the role of scientific thought. In Selected writings on computing: a personal perspective, pages 60–66. Springer, 1982.
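The statistics reported in Table 13 can be reproduced from a list of frame arrival timestamps. The following is a minimal sketch, not the authors' tooling: the function name `interarrival_stats` and the sample timestamps are hypothetical, and it assumes the timestamps have already been extracted from the capture as integer nanoseconds. The last lines only cross-check the Jitter (Max-Min) row using the maximum and minimum values given in Table 13.

```python
from decimal import Decimal


def interarrival_stats(timestamps_ns):
    """Return (max, mean, min, jitter) of the gaps between consecutive
    timestamps. Inputs are integer nanoseconds, so max, min and jitter
    are exact integers; only the mean is a float."""
    if len(timestamps_ns) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [b - a for a, b in zip(timestamps_ns, timestamps_ns[1:])]
    longest, shortest = max(gaps), min(gaps)
    return longest, sum(gaps) / len(gaps), shortest, longest - shortest


# Made-up arrival times (ns) for four frames sent nominally once per second.
example_ns = [0, 1_000_000_250, 2_000_000_250, 3_000_000_397]
longest, average, shortest, jitter = interarrival_stats(example_ns)
print(longest, average, shortest, jitter)

# Cross-check of the Jitter (Max-Min) row using the extremes from Table 13.
# Decimal avoids binary floating-point rounding at nanosecond resolution.
sw_jitter = Decimal("1.000393828") - Decimal("0.999118813")
hw_jitter = Decimal("1.000000250") - Decimal("1")
print(sw_jitter, hw_jitter)
```

Integer nanoseconds are used for the timestamps so that the max-min difference is exact. With floating-point seconds, a 250 ns difference is still representable, but it picks up small rounding errors.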


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

[11] Shane Tuohy, Martin Glavin, Ciarán Hughes, Edward Jones, Mohan Trivedi, and Liam Kilmartin. Intra-vehicle networks: A review. IEEE Transactions on Intelligent Transportation Systems, 16(2):534–545, 2014.
[12] Onur Alparslan, Shin’ichi Arakawa, and Masayuki Murata. Next generation intra-vehicle backbone network architectures. In 2021 IEEE 22nd International Conference on High Performance Switching and Routing (HPSR), pages 1–7. IEEE, 2021.
[13] Jean Walrand, Max Turner, and Roy Myers. An Architecture for In-Vehicle Networks. IEEE Transactions on Vehicular Technology, 70(7):6335–6342, 2021.
[14] Gaetano Patti, Lucia Lo Bello, and Luca Leonardi. Deadline-aware online scheduling of TSN flows for automotive applications. IEEE Transactions on Industrial Informatics, pages 1–10, 2022.
[15] Youhwan Seol, Doyeon Hyeon, Junhong Min, Moonbeom Kim, and Jeongyeup Paek. Timely Survey of Time-Sensitive Networking: Past and Future Directions. IEEE Access, 9:142506–142527, 2021.
[16] OMG. Data Distribution Service (DDS) version 1.4. https://www.omg.org/spec/DDS/.
[17] Robot Operating System (ROS). https://www.ros.org.
[18] Apex.AI, Inc. Apex.OS. https://www.apex.ai.
[19] Michael Pöhnl, Alban Tamisier, and Tobias Blass. A middleware journey from microcontrollers to microprocessors. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 282–286. IEEE, 2022.
[20] AUTOSAR. 13th AUTOSAR Open Conference (AOC). https://www.autosar.org/news-events/aoc2022/, 2022.
[21] Object Management Group. https://www.omg.org.
[22] OMG. The Real-time Publish-Subscribe Protocol DDS Interoperability Wire Protocol (DDSI-RTPS™) Specification version 2.5. https://www.omg.org/spec/DDSI-RTPS.
[23] OMG. DDS-TSN Request For Proposals. https://www.omg.org/news/releases/pr2018/10-08-18.htm.
[24] AUTOSAR. SOME/IP Protocol Specification. https://www.autosar.org/fileadmin/user_upload/standards/foundation/21-11/AUTOSAR_PRS_SOMEIPProtocol.pdf.
[25] Hadi Askaripoor, Morteza Hashemi Farzaneh, and Alois Knoll. E/E Architecture Synthesis: Challenges and Technologies. Electronics, 11(4), 2022.
[26] Angela Gonzalez Mariño, Francesc Fons, and Juan Manuel Moreno Arostegui. The future roadmap of in-vehicle network processing: A HW-centric (r-)evolution. IEEE Access, 10:69223–69249, 2022.
[27] Abdoul Aziz Kane, Angela Gonzalez Mariño, Francesc Fons, Sandro Nueesch, Piotr Serwa, and Michael Schoetz. Elastic gateway functional safety architecture and deployment: A case study. IEEE Access, 10:91771–91801, 2022.
[28] Francesc Fons and Mariano Fons. FPGA-based Automotive ECU Addresses AUTOSAR and ISO 26262 Standards. Xcellence in Automotive Applications, (1), 2012.
[29] Francesc Fons, Mariano Fons, Paul Olivier, and Andre Weimerskirch. A Modular, Reconfigurable and Updateable Embedded Cyber Security Hardware Solution for Automotive. Embedded World Conference 2017, 2017.
[30] Shanker Shreejith and Suhaib A. Fahmy. Security aware network controllers for next generation automotive embedded systems. In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–6, 2015.
[31] Paolo Bellavista, Antonio Corradi, Luca Foschini, and Alessandro Pernafini. Data Distribution Service (DDS): A performance comparison of OpenSplice and RTI implementations. In 2013 IEEE Symposium on Computers and Communications (ISCC), pages 377–383, 2013.
[32] Tianze Wu, Baofu Wu, Sa Wang, Liangkai Liu, Shaoshan Liu, Yungang Bao, and Weisong Shi. Oops! It’s Too Late. Your Autonomous Driving System Needs a Faster Middleware. IEEE Robotics and Automation Letters, 6(4):7301–7308, 2021.
[33] Stefan Profanter, Ayhun Tekat, Kirill Dorofeev, Markus Rickert, and Alois Knoll. OPC UA versus ROS, DDS, and MQTT: Performance Evaluation of Industry 4.0 Protocols. In 2019 IEEE International Conference on Industrial Technology (ICIT), pages 955–962, 2019.
[34] OpenADX. iceoryx - true zero-copy inter-process-communication. https://github.com/eclipse-iceoryx/iceoryx.
[35] Yuang Chen and Thomas Kunz. Performance evaluation of IoT protocols under a constrained wireless access network. In 2016 International Conference on Selected Topics in Mobile & Wireless Networking (MoWNeT), pages 1–7. IEEE, 2016.
[36] OASIS. MQTT. Standard.
[37] Vector. Middleware Protocols in the Automobile: Service-Oriented, Data-Centric or RESTful? Elektronik automotive Magazine, (3), March 2020.
[38] Alexandru Ioana, Adrian Korodi, and Ioan Silea. Automotive IoT Ethernet-Based Communication Technologies Applied in a V2X Context via a Multi-Protocol Gateway. Sensors, 22(17):6382, 2022.
[39] Christian Menard, Andrés Goens, Marten Lohstroh, and Jeronimo Castrillon. Achieving determinism in adaptive AUTOSAR. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 822–827. IEEE, 2020.
[40] IEEE. Time sensitive networking working group. https://1.ieee802.org/tsn/.
[41] Ahmed Nasrallah, Akhilesh S. Thyagaturu, Ziyad Alharbi, Cuixiang Wang, Xing Shao, Martin Reisslein, and Hesham ElBakoury. Ultra-Low Latency (ULL) Networks: The IEEE TSN and IETF DetNet Standards and Related 5G ULL Research. IEEE Communications Surveys & Tutorials, 21(1):88–145, 2019.
[42] Lucia Lo Bello, Gaetano Patti, and Giancarlo Vasta. Assessments of Real-Time Communications over TSN Automotive Networks. Electronics, 10(5), 2021.
[43] Siwar Ben Hadj Said, Quang Huy Truong, and Michael Boc. SDN-Based Configuration Solution for IEEE 802.1 Time Sensitive Networking (TSN). SIGBED Rev., 16(1):27–32, Feb. 2019.
[44] IEEE P802.1DG/D1.4 Draft Standard for Local and metropolitan area networks - Time-Sensitive Networking Profile for Automotive In-Vehicle Ethernet Communications. 2021.
[45] 802.1AS-2020 - IEEE Standard for Local and Metropolitan Area Networks–Timing and Synchronization for Time-Sensitive Applications - IEEE Standard. 2020.
[46] IEEE Standard for Local and metropolitan area networks–Bridges and Bridged Networks–Amendment 28: Per-Stream Filtering and Policing. IEEE Std 802.1Qci-2017 (Amendment to IEEE Std 802.1Q-2014 as amended by IEEE Std 802.1Qca-2015, IEEE Std 802.1Qcd-2015, IEEE Std 802.1Q-2014/Cor 1-2015, IEEE Std 802.1Qbv-2015, IEEE Std 802.1Qbu-2016, and IEEE Std 802.1Qbz-2016), pages 1–65, 2017.
[47] LAN/MAN Standards Committee of the IEEE Computer Society. IEEE Standard for Local and metropolitan area networks–Frame Replication and Elimination for Reliability. 2017.
[48] 802.1Qav-2009 - IEEE Standard for Local and metropolitan area networks–Virtual Bridged Local Area Networks Amendment 12: Forwarding and Queuing Enhancements for Time-Sensitive Streams. IEEE, 2010.
[49] IEEE Standard for Local and metropolitan area networks–Bridges and Bridged Networks–Amendment 25: Enhancements for Scheduled Traffic. IEEE Std 802.1Qbv-2015 (Amendment to IEEE Std 802.1Q-2014 as amended by IEEE Std 802.1Qca-2015, IEEE Std 802.1Qcd-2015, and IEEE Std 802.1Q-2014/Cor 1-2015), pages 1–57, 2016.
[50] IEEE Standard for Local and metropolitan area networks–Bridges and Bridged Networks–Amendment 29: Cyclic Queuing and Forwarding. IEEE 802.1Qch-2017 (Amendment to IEEE Std 802.1Q-2014 as amended by IEEE Std 802.1Qca-2015, IEEE Std 802.1Qcd(TM)-2015, IEEE Std 802.1Q-2014/Cor 1-2015, IEEE Std 802.1Qbv-2015, IEEE Std 802.1Qbu-2016, IEEE Std 802.1Qbz-2016, and IEEE Std 802.1Qci-2017), pages 1–30, 2017.
[51] IEEE Standard for Local and Metropolitan Area Networks–Bridges and Bridged Networks–Amendment 34: Asynchronous Traffic Shaping. IEEE Std 802.1Qcr-2020 (Amendment to IEEE Std 802.1Q-2018 as amended by IEEE Std 802.1Qcp-2018, IEEE Std 802.1Qcc-2018, IEEE Std 802.1Qcy-2019, and IEEE Std 802.1Qcx-2020), pages 1–151, 2020.
[52] IEEE Standard for Local and metropolitan area networks–Bridges and Bridged Networks–Amendment 26: Frame Preemption. IEEE Std 802.1Qbu-2016 (Amendment to IEEE Std 802.1Q-2014), pages 1–52, 2016.
[53] Angela Gonzalez Mariño, Francesc Fons, Li Ming, and Juan Manuel Moreno Arostegui. PDU Normalizer Engine for Heterogeneous In-Vehicle Networks in Automotive Gateways. In Steven Derrien, Frank Hannig, Pedro C. Diniz, and Daniel Chillet, editors, Applied Reconfigurable Computing. Architectures, Tools, and Applications, pages 140–155, Cham, 2021. Springer International Publishing.
[54] ISO. ISO 7498-2:1989 Information processing systems - Open Systems Interconnection - Basic Reference Model. https://www.iso.org/standard/14256.html.
[55] Angela Gonzalez Mariño, Francesc Fons, Ahmed Gharba, Li Ming, and Juan Manuel Moreno Arostegui. Elastic Queueing Engine for Time Sen-


sitive Networking. In 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring), pages 1–7, 2021.
[56] Angela Gonzalez Mariño, Abdoul Aziz Kane, Francesc Fons, and Juan Manuel Moreno Arostegui. Enhancements for Hardware-Based IEEE802.1CB Embedded in Automotive Gateway System-on-Chip. In Proceedings of the Symposium on Architectures for Networking and Communications Systems, ANCS ’21, pages 31–37, New York, NY, USA, 2021. Association for Computing Machinery.
[57] Angela Gonzalez Mariño, Francesc Fons, Zhang Haigang, and Juan Manuel Moreno Arostegui. Loopback Strategy for In-Vehicle Network Processing in Automotive Gateway Network on Chip. In Proceedings of the 14th International Workshop on Network on Chip Architectures, NoCArc ’21, pages 22–28, New York, NY, USA, 2021. Association for Computing Machinery.
[58] Angela Gonzalez Mariño, Francesc Fons, Zhang Haigang, and Juan Manuel Moreno Arostegui. Traffic Shaping Engine for Time Sensitive Networking Integration within In-Vehicle Networks. In 2021 IEEE Vehicular Networking Conference (VNC), pages 182–189, 2021.
[59] Angela Gonzalez Mariño, Francesc Fons, Zhang Haigang, and Juan Manuel Moreno Arostegui. Loopback Strategy for TSN-compliant Traffic Queueing and Shaping in Automotive Gateways. In 2021 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), pages 47–53, 2021.
[60] Angela Gonzalez Mariño, Nikhil Halinge Naganath, Francesc Fons, and Juan Manuel Moreno Arostegui. Build Automation Framework for Architecture Design of Automotive Elastic Gateway. In Embedded World Conference, 2022.
[61] Angela Gonzalez Mariño, Nikhil Halinge Naganath, Francesc Fons, and Juan Manuel Moreno Arostegui. Build Automation Framework for Design Validation of Automotive Elastic Gateway Controllers. In SLICES workshop held during IFIP Networking 2022, 2022.
[62] OMG. DDS for eXtremely Resource Constrained Environments version 1.0. https://www.omg.org/spec/DDS-XRCE.
[63] Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–8, 2018.
[64] Jinli Yan, Wei Quan, Xiangrui Yang, Wenwen Fu, Yue Jiang, Hui Yang, and Zhigang Sun. TSN-Builder: Enabling Rapid Customization of Resource-Efficient Switches for Time-Sensitive Networking. In 2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1–6, 2020.
[65] Tanushree Agarwal, Payam Niknejad, M. R. Barzegaran, and Luigi Vanfretti. Multi-Level Time-Sensitive Networking (TSN) Using the Data Distribution Services (DDS) for Synchronized Three-Phase Measurement Data Transfer. IEEE Access, 7:131407–131417, 2019.
[66] RELYUM and Real Time Innovations (RTI). Using DDS over TSN to support NATO Generic Vehicle Architecture (NGVA) for Land Systems. 2019.
[67] Susruth Sudhakaran, Vincent Mageshkumar, Amit Baxi, and Dave Cavalcanti. Enabling QoS for collaborative robotics applications with wireless TSN. In 2021 IEEE International Conference on Communications Workshops (ICC Workshops), pages 1–6, 2021.
[68] AUTOSAR. AUTOSAR Layered Software Architecture. https://www.autosar.org/fileadmin/user_upload/standards/classic/21-11/AUTOSAR_EXP_LayeredSoftwareArchitecture.pdf.
[69] ProFPGA Zynq UltraScale+ ZU19EG. https://www.profpga.com/products/fpga-modules-overview/zynq-ultrascale-based/profpga-zu19eg.
[70] Vector. CANoe. https://www.vector.com/int/en/products/products-a-z/software/canoe/.
[71] Filip Rezabek, Marcin Bosk, Thomas Paul, Kilian Holzinger, Sebastian Gallenmüller, Angela Gonzalez, Abdoul Kane, Francesc Fons, Zhang Haigang, Georg Carle, and Jörg Ott. EnGINE: Flexible Research Infrastructure for Reliable and Scalable Time Sensitive Networks. Journal of Network and Systems Management, 30, 2022.

CLAUDIO SCORDINO received an M.Sc. degree in Computer Engineering in 2003 and a PhD in Computer Science in 2007 from the University of Pisa. He collaborated with Scuola Superiore Sant’Anna and the University of Pittsburgh on research into power-aware real-time operating systems. He has collaborated with the Linux kernel community, especially on the development of the SCHED_DEADLINE CPU scheduler. His research activity focuses on real-time operating systems, middlewares and hypervisors. He is an AUTOSAR member, actively contributing to the standard.

ANGELA GONZALEZ MARIÑO received the Bachelor's degree in Telecommunications Engineering from the Universidade de Vigo (UVIGO), Vigo, Spain, in 2015 and the Master's degree in Electronics Engineering Systems from the Universidad Politecnica de Madrid (UPM), Madrid, Spain, in 2016.
She worked at HP Inc Barcelona as a Research and Development Electronics Engineer from 2016 to 2020, designing electronics for large format printers and supporting the full product lifecycle development. At present she is with Huawei Technologies in the Automotive Engineering Laboratory of Munich Research Center (Munich, Germany), focusing on HW accelerator design for automotive networking solutions, and pursuing a Ph.D. together with the Universitat Politècnica de Catalunya (UPC), Barcelona, Spain. Her current areas of interest focus on HW design for automotive In-Vehicle Networks and System on Chip design.

FRANCESC FONS (Senior Member 2020) received the Bachelor's degree in Electrical Engineering, the Master's degree in Automatic Control and Industrial Electronics Engineering and the PhD degree in Electronics Technology from the Universitat Rovira i Virgili (URV), Tarragona, Spain, in 1995, 2001 and 2012, respectively. He has focused his professional career on the automotive electronics industry, working on R&D in the areas of embedded software, systems, hardware and networks. Throughout his career, he has been with different automotive Tier 1 and Tier 2 suppliers from the US, Germany and China, and has participated in the successful launch of many commercial products for OEMs in Europe and Asia. At present, he is with Huawei Technologies, where he has the role of Chief Automotive In-Vehicle Network Researcher in the Automotive Engineering Laboratory of Huawei Munich Research Center.
