

Volume 9, Issue 5, May – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24MAY458

Homogeneous and Heterogeneous Multicore Systems


Srinath Bheemaraju
Chief Manager- SW Architecture, ADAS, Continental Autonomous Mobility, Bangalore, India

Teja Sri Venkat Saidhar Movva
SW Architect, ADAS, Continental Autonomous Mobility, Bangalore, India

Priyadharshini S
Engineer, ADAS, Continental Autonomous Mobility, Bangalore, India

Gayathri Nair
Engineer, ADAS, Continental Autonomous Mobility, Bangalore, India

Chandravardhana Kuluru
SW Architect, ADAS, Continental Autonomous Mobility, Neu-Ulm, Germany

Abstract:- Multicore systems have gained significant importance in the Automotive industry due to data-intensive applications, such as image processing, high-speed processing, and GPS applications. They enable manufacturers to build smaller chips, simplifying board architecture and routing, reducing power consumption and cost, and increasing programmability. Multicore platforms are segregated into two categories, namely Homogeneous (symmetric) and Heterogeneous (asymmetric) multicore systems.

In Homogeneous systems, all cores are the same, including frequencies, cache sizes and functions. In Heterogeneous systems, different cores operate with different frequencies, cache sizes and functions.

In this paper, we discuss various aspects of homogeneous and heterogeneous multicore architectures, such as the number and level of caches, interconnection of the cores, Physical and Temporal isolation, Energy Efficiency, Concurrency, Performance, Reliability and Robustness, along with the evaluation of these two architectures for applications based on the above-mentioned aspects.

Keywords:- Multicore System, ADAS, Automotive, Homogeneous, Heterogeneous, System on Chip, Communication, Interconnect bus, CoreConnect, AMBA, Wishbone, OS scheduling, Energy efficiency, Performance, Isolation, Temporal, Physical, Concurrency.

I. INTRODUCTION

Advanced Driver Assistance Systems (ADAS) applications are characterized by a growing interest in the real-time and low-power implementation of digital signal processing (DSP) techniques for image processing and RADAR signal processing to improve system performance in terms of processing efficiency. State-of-the-art complex algorithms are proposed to realize highly efficient image processing and more dynamic real-time control systems. To achieve the higher computational power required by ADAS applications, multicore systems are needed.

In high-performance computational systems, core architecture is of particular importance. A generic core contains a processor, FPU, general-purpose registers, timers, watchdog, core-specific debug port, core-specific I-Cache and D-Cache, optional L2 RAM and a supporting bus system. The combination of these elements defines core performance.

• CPU - Central Processing Unit
• DSP - Digital Signal Processor
• GPU - Graphics Processing Unit


Fig 1: Multicore System

Multicore systems can be categorized as homogeneous multicore systems and heterogeneous multicore systems [1][3].

• Homogeneous (Symmetric) Multicore Systems
Homogeneous cores are uniform in their attributes: they can execute the same functions and possess an equivalent set of capabilities.

Personal computer processors have homogeneous cores, which means that a task consumes the same power whichever of the processor's cores carries it out. A task is expected to complete in the same time irrespective of which core it is scheduled on. Homogeneous multicore systems can be operated with single OS scheduling per core or with common OS scheduling for all cores in the system [5]. A popular example of a homogeneous system is the ARM quad-core Cortex-A53 system [17].

• Heterogeneous (Asymmetric) Multicore Systems
Heterogeneous cores are not identical. They can differ in capabilities and speed, may lack certain features, or may otherwise perform a task differently.

Modern high-end mobile phones tend to have heterogeneous cores. Each core has its own specified functionality and its own performance and power dissipation. Heterogeneous multicore systems can be operated with single OS scheduling per core. The Renesas R-Car-M3Ne uses a heterogeneous architecture.
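The distinction above can be sketched in a few lines of Python. This is only an illustration: the `Core` record, its field names and the frequency and cache values are hypothetical and are not taken from any datasheet. A system is treated as homogeneous when every core reports the same frequency, cache size and function set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """Hypothetical description of one core (illustrative fields only)."""
    name: str
    freq_mhz: int
    l1_cache_kib: int
    functions: frozenset


def is_homogeneous(cores):
    """Return True when all cores share frequency, cache size and functions."""
    if not cores:
        return True
    first = cores[0]
    reference = (first.freq_mhz, first.l1_cache_kib, first.functions)
    return all(
        (c.freq_mhz, c.l1_cache_kib, c.functions) == reference for c in cores
    )


GENERAL = frozenset({"general-purpose"})

# Four identical cores: a homogeneous (symmetric) system.
quad = [Core(f"core{i}", 1200, 32, GENERAL) for i in range(4)]

# Cores that differ in speed, cache and function: a heterogeneous system.
mixed = [
    Core("app0", 1500, 32, GENERAL),
    Core("rt0", 800, 16, frozenset({"real-time"})),
]

print(is_homogeneous(quad), is_homogeneous(mixed))
```

The name of a core is deliberately excluded from the comparison, since two cores with different identifiers can still be identical in capability.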

Fig 2: Homogeneous and Heterogeneous Multicore Systems


In modern embedded applications, both categories of multicore processors are widely used, depending on the application [16].

II. MULTICORE SYSTEM ARCHITECTURE CONSIDERATIONS

The following design and architecture aspects of multicore systems are considered in the subsequent sections:

• Communication between the cores
• Evaluation of multicore systems
• Physical Isolation
• Temporal Isolation
• Energy efficiency
• Concurrency

III. COMMUNICATION BETWEEN THE CORES

Multicore interconnect plays a significant role in multicore architecture systems. Some of the publicly available bus architectures from leading manufacturers are AMBA, CoreConnect and Wishbone [6][18].

Fig 3: Interconnect in a Multicore System

A. AMBA [20][6][12]
Advanced Microcontroller Bus Architecture (AMBA) is a bus standard devised by ARM to support efficient on-chip communication among ARM processor cores [20]. AMBA is currently one of the leading on-chip bus systems used in the design of high-performance multicore computers. This bus architecture helps reduce the silicon infrastructure required for on-chip communication. AMBA is hierarchically divided into system-bus and peripheral-bus segments, which are linked by a bridge that buffers data and operations between them [12]. The AMBA specifications define common bus protocols, regardless of processor type, for interconnecting on-chip components in various multicore system structures. Arbitration procedures are not defined by AMBA; instead, the arbiter can be designed to meet the requirements of the application [6][12]. AMBA is currently in its 5th version.


Fig 4: AMBA System Architecture

The private peripheral bus is an APB (Advanced Peripheral Bus) that is used to connect additional on-chip peripherals such as debug components. Fig. 4 [21] shows the AMBA bus architecture devised for ARM processors.

B. CoreConnect
CoreConnect is an on-chip silicon bus developed by IBM for FPGA or ASIC designs [12]. It comprises three levels, namely the Processor Local Bus (PLB), the On-chip Peripheral Bus (OPB) and the Device Control Register (DCR) bus, as shown in Fig. 5.

PLB is the main on-chip bus and is used to connect processors with on-chip memory, memory controllers and other high-speed peripherals, including DMA controllers. It is a synchronous, high-performance, arbitrated bus. It supports concurrent read and write operations for the same master. A central arbitration mechanism is used to grant access.
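To make the idea of central arbitration concrete, the following Python sketch models an arbiter that grants the bus to one requesting master per cycle. The text does not say which arbitration policy is used, so the round-robin policy here is an assumption chosen for illustration, and the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Toy central arbiter: one grant per bus cycle, rotating priority.

    Round-robin is an assumed policy; the bus description above only
    states that a central arbitration mechanism grants access.
    """

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 has the highest priority in the first cycle.
        self._last_granted = num_masters - 1

    def grant(self, requests):
        """Return the id of the master granted this cycle, or None if idle.

        `requests` is the set of master ids requesting the bus this cycle.
        """
        for offset in range(1, self.num_masters + 1):
            candidate = (self._last_granted + offset) % self.num_masters
            if candidate in requests:
                self._last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(3)
# Masters 0 and 2 request the bus in every cycle; master 1 stays idle.
grants = [arbiter.grant({0, 2}) for _ in range(4)]
print(grants)
```

With rotating priority, neither requesting master can starve the other. A fixed-priority arbiter would be simpler but could let one master monopolize the bus.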

Fig 5: Core Connect System Architecture


OPB is designed to support slower peripherals. It is implemented as a straightforward multimaster, arbitrated bus. It is fully synchronous, with a 32-bit address bus and a 32-bit data bus. Data transactions between master and slave take place in a single clock cycle, and burst operations are supported. The masters compete for the bus through arbitration. PLB transactions are converted into OPB transactions by the PLB-to-OPB bridge.

DCR is used for configuring on-chip devices. It offers an alternative path to the system for setting the individual device control registers. A bus arbiter decides which of the masters is allowed to control the bus in each bus cycle.

C. Wishbone
Wishbone is a portable interface for semiconductor IP cores [13][12]. Design reusability is made possible by establishing a shared, logical interface between IP cores. As a result, the system is more reliable and portable, and the end user experiences a shorter time to market. Wishbone is a standard for building IP cores, not an IP core in and of itself. The handshaking protocol implemented in Wishbone allows each IP core to regulate its data transfer speed [14].

Wishbone supports various IP core interconnections, including:

• Point-to-point
• Data flow interconnection
• Shared bus interconnection
• Crossbar switch
• Off-chip

Fig 6 shows possible Wishbone topologies and how masters and slaves can be connected. Point-to-point is the simplest way of connecting a master and a slave [19]. Data flow interconnection is a connection in which data flows through a set of IP cores in a prearranged sequential order. In shared bus interconnection, only one master can use the interconnection at a time, as it initiates the addressable bus cycles for a slave. Crossbar switch architecture allows several channels to operate in parallel, which increases the data transfer rate. In Fig 6, dotted lines denote a possible connection; similar connections can be made to establish communication channels. Off-chip interconnection is used when an interface extends off-chip [14].
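The difference between a shared bus and a crossbar switch can be illustrated with a small cycle-count model in Python. It rests on simplifying assumptions that are not part of the Wishbone specification: every transfer takes exactly one bus cycle, each master and each slave port handles at most one transfer per cycle, and the crossbar is scheduled ideally. The function names and the transfer list are hypothetical.

```python
from collections import Counter


def shared_bus_cycles(transfers):
    """Only one master owns a shared bus per cycle, so transfers serialize."""
    return len(transfers)


def crossbar_cycles(transfers):
    """Cycles needed on an ideally scheduled crossbar.

    Transfers run in parallel unless they use the same master or the same
    slave, so the busiest port determines the total number of cycles.
    """
    if not transfers:
        return 0
    per_master = Counter(master for master, _ in transfers)
    per_slave = Counter(slave for _, slave in transfers)
    return max(max(per_master.values()), max(per_slave.values()))


# Each tuple is (master, slave) for one single-cycle transfer.
transfers = [("M0", "S0"), ("M1", "S1"), ("M0", "S1"), ("M1", "S0")]
print(shared_bus_cycles(transfers), crossbar_cycles(transfers))
```

In this example the shared bus needs four cycles, while the crossbar needs two, because two disjoint master-slave pairs can be active in each cycle.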

Fig 6: Wishbone Topologies

IV. DESIGN FACTORS FOR MULTICORE SYSTEMS

The following sections explain the criteria for evaluating multicore systems in terms of design and performance.

A. Isolation
Isolation physically and logically separates the resources and software programs between the multiple cores [7][11].

• Physical Isolation
Physical isolation ensures that the cores in a single chip cannot access the same physical hardware (e.g., memory locations such as caches and RAM).
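One way to picture physical isolation of memory is as a check that no two cores are assigned overlapping address ranges. The Python sketch below is a minimal illustration under that assumption; the region tables and addresses are invented for the example, and real systems enforce such partitions in hardware (for instance through memory protection units) rather than in application code.

```python
def regions_overlap(a, b):
    """Half-open ranges (start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def physically_isolated(core_regions):
    """Return True if no memory region is reachable from two different cores.

    `core_regions` maps a core name to the list of (start, end) address
    ranges that the core is allowed to access.
    """
    cores = list(core_regions)
    for index, first in enumerate(cores):
        for second in cores[index + 1:]:
            for region_a in core_regions[first]:
                for region_b in core_regions[second]:
                    if regions_overlap(region_a, region_b):
                        return False
    return True


# Hypothetical partitions: disjoint ranges versus ranges sharing 0x4000-0x5000.
isolated = {"core0": [(0x0000, 0x4000)], "core1": [(0x4000, 0x8000)]}
shared = {"core0": [(0x0000, 0x5000)], "core1": [(0x4000, 0x8000)]}

print(physically_isolated(isolated), physically_isolated(shared))
```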


Fig 7: Physical Isolation

Physical isolation involves physically separating the resources, such as memory and I/O devices, between different cores. Each core has its own dedicated resources and there is no sharing between cores. This approach ensures that there is no interference between tasks or threads running on different cores. Fig 7 shows how two cores in a multicore system are physically isolated from each other but may share some resources.

High availability is often valued more than performance. To achieve high system availability, redundant hardware is commonly used to detect errors. Chip multiprocessors with many replicated resources, such as cores, memory and interconnection networks, would seem to be the best starting points for developing high-availability solutions on chips. On the contrary, doing so poses significant challenges with respect to error containment and replacement of the faulty component. Increasing silicon and transient fault rates with future technology scaling make the issue worse.

• Temporal Isolation
Temporal isolation ensures that the execution of software on one core does not impact the temporal behavior of software running on another core. Resources are shared, but at any specific time only one core has access to a specific resource.

Assuring temporal isolation and adequate execution time determinism is crucial in a single-core real-time domain. Extra slack execution time is added to account for nondeterministic disturbances, such as interrupts, whose temporal effect is carefully considered. Applications are executed in such a way that they seem to run in parallel, but frequent context switches are performed to give each application a fair share of resources in the allocated time. Effectively, the complete processor is partitioned using time division multiple access (TDMA), with some exceptions. Direct Memory Access (DMA) transfers and the resulting memory bus contention are examples of exceptions that have temporal effects, but they usually have well-known solutions and their implementation is extremely limited.

When moving to multicore, the sequential execution assumption of a single-core system can no longer be made safely. The applications now run simultaneously and have non-exclusive access to resources. Within a multicore environment, applications contend for access to system resources, and this contention is typically mediated implicitly by the hardware implementation. As a result, execution incurs non-deterministic temporal delays. A subset of the effects described in this paper might be present in each processor architecture.

B. Energy Efficiency
Architects can use multicore processors to reduce the number of embedded computers. Multicore systems overcome the excessive heat generation associated with Moore's law scaling, which reduces the need for cooling. Because less energy is released as heat with multicore processing, power usage is reduced and battery life is increased [2][9][10].

Various energy-efficiency techniques can be implemented in a multicore architecture, depending on the application:
• Voltage and frequency scaling [15]
• Cache configuration
• Individual core control
• Low-power and high-power domain configuration
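The effect of voltage and frequency scaling can be estimated with the standard CMOS dynamic-power approximation P = C · V² · f, which this paper does not state and which is assumed here. The Python sketch below ignores static (leakage) power, and the capacitance, voltage and frequency values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic power in watts under the assumed model P = C * V^2 * f."""
    return c_eff * voltage ** 2 * freq_hz


def energy_for_work(c_eff, voltage, freq_hz, cycles):
    """Energy in joules to execute a fixed number of clock cycles."""
    runtime_s = cycles / freq_hz
    return dynamic_power(c_eff, voltage, freq_hz) * runtime_s


C_EFF = 1e-9      # effective switched capacitance in farads (hypothetical)
WORK = 1e9        # clock cycles needed by the task (hypothetical)

# Two hypothetical operating points: full speed and a scaled-down point.
high = energy_for_work(C_EFF, 1.0, 1.0e9, WORK)
low = energy_for_work(C_EFF, 0.8, 0.5e9, WORK)

print(f"high: {high:.2f} J, low: {low:.2f} J")
```

Under this model the energy for a fixed amount of work depends on V² but not on f, so halving the frequency saves energy only because it allows the voltage to be lowered. The task also takes twice as long at the lower operating point, which is the usual trade-off against real-time deadlines.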

C. Concurrency
Multicore processing improves the intrinsic support for real (as opposed to virtual) parallel processing, both within individual software applications and across multiple applications, by allocating applications to different cores, as illustrated in Fig 9. Concurrency is the capacity for many tasks to be carried out simultaneously and in any sequence without affecting the output.
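The order-independence property in this definition can be demonstrated with a short Python sketch. The per-task function and the frame numbering are hypothetical stand-ins for independent work items. Note that this only shows that independent tasks give the same result in any submission order; Python threads do not by themselves guarantee true parallel execution on separate cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def process_frame(frame_id):
    """Stand-in for an independent computation with no shared state."""
    return frame_id * frame_id


def run_concurrently(frame_ids, workers=4):
    """Run every task on a worker pool and collect results by frame id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_frame, frame_ids)
        return dict(zip(frame_ids, results))


frames = list(range(8))
in_order = run_concurrently(frames)

# Submit the same tasks in a random order; the output must not change.
shuffled = frames[:]
random.shuffle(shuffled)
out_of_order = run_concurrently(shuffled)

print(in_order == out_of_order)
```

The result is the same for both runs because the tasks share no state. Tasks that read and write common data would need synchronization to keep this property.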
Fig 8: Temporal Isolation


Fig 9: Concurrency

D. Performance
Multicore processing can improve performance by running several applications simultaneously. The closer spacing between cores on an integrated chip allows faster cache speeds and lower latency compared with using separate processors or computers. The number of cores, the degree of actual concurrency in the software and the use of shared resources are all factors that affect performance [4].

Multiprocessors (multiple CPUs on different chips attached to the same motherboard) can achieve high performance and speed. However, because of their unfavorably high power consumption, research trends promoted the development of multicore CPUs to simultaneously decrease power consumption and boost computing speed. The multicore CPU architecture delivers speed and efficiency with less power usage for power-hungry applications.

• Performance Analysis
To guarantee reliable performance at a specific cost, performance analysis, a criterion that defines the performance of a system, is necessary at every stage of the computer system life cycle. The demand for performance analysis is derived from the following factors:

• The requirements of computer users today are different from those of 20 years ago.
• The popularity of computer technology has increased drastically.

These factors have resulted in the development of different computer systems with distinct performance characteristics. These developments require continuous performance analysis that meets the user's demands and helps to select the best alternative, that is, the one providing the highest performance at a given cost.

The selection of proper evaluation criteria is the initial step in performance analysis. Analytical modeling, simulation and measurement are the primary methods. In evaluating multicore CPU performance, the techniques used depend on different considerations. However, the result of one technique cannot be trusted unless it is compared with the results of other techniques [4][1].

• Performance Metrics
The following performance metrics are used in evaluating multicore CPU performance for applications:

• Throughput: The number of processes that the cores complete per unit time.
• Response time: The interval between when a request is submitted and when the system starts generating the response. Ideally, the response time of a core should be minimal.
• Execution time: The time taken to complete program execution. The major concern is to achieve minimum execution time without compromising energy, thermal behavior and other vital factors.
• Energy: The energy needed to run a program. To reduce power consumption, multicore systems must have separate power management units. The architecture must also keep the thermal energy dissipated by the cores to a minimum.
• Memory bandwidth: The rate at which data is transferred between the CPU core and the RAM (Random Access Memory). Put another way, it describes how often data can be moved to and from memory. When memory bandwidth decreases, the core has difficulty processing or even loading data, which delays processing.
• Memory latency: The delay between the memory controller signaling the memory module to access a byte or word in RAM and the data being retrieved by the memory module. It is also known as CAS (Column Address Strobe) latency.
• Percent of memory contention: Memory contention is a condition where two programs, or two parts of a program, attempt to read data from the same memory block at the same moment. A contention value between 5% and 50% is a "normal overcommit".

Fig 10: Performance Metrics
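Three of the metrics listed above can be computed directly from task timestamps, as the following Python sketch shows. It takes throughput in its conventional sense of tasks completed per unit time, measured from the first submission to the last completion. The `TaskRecord` type and the timestamp values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Timestamps in seconds for one task (hypothetical measurement record)."""
    submitted: float   # when the request was submitted
    started: float     # when the system began producing the response
    finished: float    # when execution completed


def response_time(task):
    """Interval between submission and the start of the response."""
    return task.started - task.submitted


def execution_time(task):
    """Time spent executing the task once it has started."""
    return task.finished - task.started


def throughput(tasks):
    """Completed tasks per second over the whole observation window."""
    window = max(t.finished for t in tasks) - min(t.submitted for t in tasks)
    return len(tasks) / window


records = [TaskRecord(0.0, 0.5, 2.5), TaskRecord(1.0, 1.5, 4.0)]

print([response_time(t) for t in records])
print([execution_time(t) for t in records])
print(throughput(records))
```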


V. MULTICORE SYSTEM EVALUATION WITH EXAMPLES

The following parameters and factors need to be considered when evaluating multicore systems:

• Memory: The memory architecture used and the memory speed can affect the performance of the multicore CPU.
• Scalability: Performance depends on the rate at which the workload (i.e., the number of tasks) increases. If the number of tasks is equal to the number of cores, performance scales linearly. In reality, if the number of tasks exceeds the number of cores, performance depends on various factors such as cache utilization.
• I/O bandwidth: I/O bandwidth refers to the rate at which a core reads from or writes to the network. This can affect performance because it occupies the CPU cores, which leads to higher resource consumption.
• Inter-core communication: The interaction between cores in multicore CPUs is vital. It can be implemented by various mechanisms, which affect overall CPU performance because workloads are shared between cores. Cores that communicate effectively can increase the performance of the system as a whole.
• Operating system (OS): The OS is software that manages the hardware. It assigns tasks to cores based on a scheduling mechanism and provides an orderly allocation of cores, memory and other resources to programs.
• CPU clock speed: Clock speed affects processor performance. A slow clock speed reduces throughput. Clock speed can be increased only up to a certain limit.
• Number of cores: The number of cores present and their types can affect CPU performance, as a multicore architecture divides the workload between cores.
• Cache coherence: The challenge of maintaining consistency of the data in caches is known as cache coherence. Multicore architectures use different caching mechanisms because the cache is shared among the cores. Maintaining the consistency of data affects the performance of the cores.

Considering all these multicore evaluation parameters, a few examples are discussed with respect to the ARM architecture. ARM Cortex-A cores are used in devices that run powerful operating systems such as Linux or Android, which require high performance and computational power. For real-time applications that require deterministic interrupt response to meet hard real-time requirements, Cortex-R cores with features such as tightly coupled memory are recommended. Cortex-M cores are optimized for low-power, cost-sensitive embedded systems. Examples of multicore architectures that can be considered for power-optimized designs and for performance-, safety- or real-time-oriented designs are shown in the figures below.

Fig 11: Power Optimized ARM based Multicore Architecture

Fig 12: Performance and Realtime ARM based Multicore Architecture

Fig 11 shows an ARM-based multicore architecture that can be considered where optimum power utilization is the main use case. There is always a tradeoff between power and performance in any system.

Fig 12 represents a multicore architecture in which the cores are selected based on their performance. The cores used for an application can also form a core cluster, in which two or more homogeneous cores are used.
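The core-selection guidance in this section can be summarized as a small lookup in Python. This is only a sketch of the rules of thumb stated above; the function name, its parameters and the order in which the requirements are checked are assumptions made for illustration, and a real selection would weigh many more factors.

```python
def recommend_cortex_family(hard_real_time=False, rich_os=False, low_power=False):
    """Map coarse application requirements to an ARM Cortex family.

    The priority order (real-time first, then rich OS, then low power)
    is an assumption for this sketch, not a rule from the paper.
    """
    if hard_real_time:
        return "Cortex-R"   # deterministic interrupt response
    if rich_os:
        return "Cortex-A"   # Linux or Android class operating systems
    if low_power:
        return "Cortex-M"   # low-power, cost-sensitive embedded systems
    return None


# A heterogeneous design may combine families, one per workload.
workloads = {
    "perception": recommend_cortex_family(rich_os=True),
    "actuation": recommend_cortex_family(hard_real_time=True),
    "housekeeping": recommend_cortex_family(low_power=True),
}
print(workloads)
```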


VI. SUMMARY

In modern automotive applications like ADAS, both homogeneous and heterogeneous categories of multicore processors are widely used, depending on the application. Using a multicore system, we can increase the performance of a system without increasing the clock frequency. In this paper, different parameters, considerations and factors, such as core-connect buses, isolation and energy efficiency, are discussed with reference to homogeneous and heterogeneous multicore systems. In general, where performance optimization, system reliability and reduction of power consumption are the focus areas despite architectural and design complexity, heterogeneous systems are beneficial. However, if architectural and design simplicity is the key criterion, homogeneous systems are a better choice. A combination of both heterogeneous and homogeneous cores can also be considered for system designs. All these aspects need to be evaluated in a balanced way to select an appropriate multicore system, based on the application needs, when migrating from an existing single-core system to a multicore system. These topics can be part of further study by considering specific multicore SoCs.

ACKNOWLEDGEMENTS

We sincerely wish to thank Mr. Sudeepth Puthumana, Head of Market Segments, Autonomous Mobility (AM) India, Continental Tech Center India (TCI), Bangalore; Mr. Vinayaka Nagaraja, Head of Market Japan, Korea and India; Mr. Praveen Kumar BL, Head of Application Projects, Japan, Korea and India Segment for providing this opportunity and giving valuable support to us for working on this study. We would like to thank all the members of Market Japan, Korea and India SW Architecture team, Mr. Srinath Murthy, Head of Market Europe and our sincere thanks to Corporate Communications Team and Technical Paper Publishing team at TCI for their cooperation.

REFERENCES

[1]. A.S. Radhamani, "Performance Analysis of Homogeneous and Heterogeneous Multicore Processor Using Static and Dynamic Schedulers," Asian Journal of Information Technology
[2]. C. Leech and T. J. Kazmierski, "Energy Efficient Multicore Processing," Electronics
[3]. Sergio Saponara and Luca Fanucci, Homogeneous and Heterogeneous MPSoC Architectures with Network-On-Chip Connectivity for Low-Power and Real-Time
[4]. IntervalZero, "How to Optimize the Scalability & Performance of a Multicore Operating System," IntervalZero.com
[5]. Ajeya Naithani, Stijn Eyerman, Lieven Eeckhout. Reliability-Aware Scheduling on Heterogeneous Multicore Processors
[6]. M. Mitić and M. Stojčev, "A Survey of Three System-on-Chip Buses: AMBA, CoreConnect and Wishbone," International Journal of Electrical and Computer Engineering
[7]. J. Zamorano and J. A. de la Puente, "Memory Isolation in Many-Core Embedded Systems,"
[8]. N. Aggarwal, P. Ranganathan, N. P. Jouppi, and J. E. Smith, "Configurable Isolation: Building High Availability Systems with Commodity Multicore Processors,"
[9]. J. Cong and B. Yuan, "Energy-Efficient Scheduling on Heterogeneous Multicore Architectures,"
[10]. A. Merkel and F. Bellosa, "Memory-aware Scheduling for Energy Efficiency on Multicore Processors."
[11]. H. Omar, H. Dogan, B. Kahne, and O. Khan, "Multicore Resource Isolation for Deterministic, Resilient, and Secure Concurrent Execution of Safety-Critical Applications," IEEE Computer Architecture Letters, vol. 17, no. 2, July-Dec. 2018
[12]. R. Usselmann, "OpenCores SoC Bus Review," Rev. 1.0, January 9, 2001
[13]. "Specification for the: WISHBONE System-on-Chip (SoC), Interconnection Architecture for Portable IP Cores," Revision: B.3, Released: September 7, 2002.
[14]. "Wishbone B4, WISHBONE System-on-Chip (SoC) Interconnection, Architecture for Portable IP Cores,"
[15]. Hwang-cheng Wang and Alagan Anpalagan, "Energy-efficient tasks scheduling algorithm for real-time multiprocessor embedded systems," Journal of Systems Architecture, vol. 57, no. 5, pp. 498-505, May 2011. doi: 10.1016/j.sysarc.2010.10.003.
[16]. Nik Jedrzejewski, "Three Reasons Why Embedded Heterogeneous Systems Are More Efficient," NXP Blog, Jan. 30, 2019.
[17]. ARM Cortex-A53 MPCore Processor Technical Reference Manual, Ver: r0p4, https://developer.arm.com/documentation/ddi0500/j/Introduction/About-the-Cortex-A53-processor.
[18]. IJSRD - International Journal for Scientific Research and Development. (2014, May 24). A comparative Study of Different system-on-Chip Buses based on Industry standards: AMBA, CoreConnect and Wishbone.
[19]. IJEERT - Kolte, Mahesh. (2014). Design and Verification Point-to-Point Architecture of Wishbone Bus for System On Chip. International Journal of Emerging Engineering Research and Technology. 2. 155-159.
[20]. ARM, "AMBA," https://developer.arm.com/ip-products/system-ip/amba.
[21]. J. Yiu, "CHAPTER 6 - Cortex-M3 Implementation Overview," The Definitive Guide to the ARM Cortex-M3 (Second Edition), Newnes, 2010, pp. 99-108
