PWRARBYNDBITSRAS
Networking elements, medical devices and defense and aerospace applications are all growing in complexity and demanding ever-increasing computational power. At the same time, many of these systems must continue to address the thermal dissipation and low-power constraints inherent to embedded devices. Freescale QorIQ processors directly address these requirements by providing much better processing capacity per watt and per square inch than conventional single-core processors.

Multicore processors such as Freescale QorIQ platforms are, in effect, multiprocessing systems-on-chip (SoCs). Many Freescale SoCs have separate L1 and L2 caches per core but share an L3 cache, memory subsystem, interrupt subsystem and peripherals. To take advantage of these processors, embedded developers must graduate from a serial execution model, where software tasks take turns running on a single processor, to a parallel execution model, where multiple software tasks can run simultaneously. The more parallelism developers can achieve, the better their multicore systems will perform.

To address these challenges, developers need tools that can analyze the complex system-level behavior of a multicore chip. At any instant, threads can be migrating across cores, communicating with threads on other cores or sharing resources with them. Conventional debug tools were never designed to analyze interactions this complex. Fortunately, vendors such as QNX Software Systems have introduced system tracing tools that provide a comprehensive view of multicore behavior. These tools allow the developer to visualize interactions between cores and eliminate a variety of performance bottlenecks. Using the information these tools generate, the developer can reduce resource contention, optimize thread migration, identify opportunities for parallelism and achieve maximum utilization of every processor core.

Multiprocessing Modes and the Role of the OS

Developers must also choose the form of multiprocessing appropriate to their application requirements. This choice determines how easily both new and existing code can achieve maximum concurrency. As Table 1 illustrates, developers have three basic forms to choose from: asymmetric multiprocessing (AMP), symmetric multiprocessing (SMP) and bound multiprocessing (BMP).

Symmetric multiprocessing (SMP)
  How it works: A single OS manages all processor cores simultaneously. The OS can dynamically schedule any process on any core, enabling full utilization of all cores.
  Benefits: Provides greater scalability and parallelism than AMP, along with simpler shared resource management.

Bound multiprocessing (BMP)
  How it works: A single OS manages all cores simultaneously. As in SMP, the OS can dynamically schedule processes on any core. However, the developer can also lock any process (and all of its associated threads) to a specific core.
  Benefits: Combines the developer control of AMP with the transparent resource management of SMP. The option to lock threads to any core simplifies migration of legacy code and allows designers to dedicate cores to specific operations.
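BMP's option to lock a process and its threads to a specific core corresponds to what many operating systems call CPU affinity. QNX Neutrino has its own mechanism for this, which is not shown here. The minimal sketch below uses the Linux/glibc `sched_setaffinity()` interface instead, purely as an illustrative stand-in. The helper names `first_allowed_cpu()`, `allowed_cpu_count()` and `lock_to_core()` are hypothetical. The sketch pins only the calling thread to one core and leaves every other thread free to be scheduled anywhere, which is the BMP trade-off in miniature.

```c
/* Sketch only: Linux/glibc CPU-affinity calls used as a stand-in for BMP
 * thread locking. Helper names are hypothetical, not a QNX API. */
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Lowest-numbered core the calling thread may currently run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}

/* Number of cores the calling thread may currently run on, or -1. */
static int allowed_cpu_count(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Lock the calling thread to a single core (pid 0 means "this thread").
 * Returns 0 on success, -1 on failure with errno set. */
static int lock_to_core(int cpu)
{
    cpu_set_t mask;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof mask, &mask);
}
```

Calling `lock_to_core(first_allowed_cpu())` from a thread restricts that thread to a single core, after which `allowed_cpu_count()` should report 1. Threads that never make this call remain under normal SMP-style scheduling.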
Beyond Bits Power Architecture Edition
Asymmetric Multiprocessing (AMP)

AMP provides an execution environment similar to that of conventional uniprocessor systems, which most developers already know and understand. Consequently, it offers a relatively straightforward path for porting legacy code. It also provides a direct mechanism for controlling how the CPU cores are used and, in most cases, lets developers work with standard debugging tools and techniques.

AMP can be either homogeneous, where each core runs the same type and version of OS, or heterogeneous, where each core runs either a different OS or a different version of the same OS. In a homogeneous environment, developers can make best use of the multiple cores by choosing an OS that offers a distributed programming model, such as the QNX® Neutrino® RTOS. Properly implemented, this model allows applications running on one core to communicate transparently with applications and system services (e.g., device drivers and protocol stacks) on other cores. It does so without the high CPU utilization imposed by traditional forms of interprocessor communication.

A heterogeneous environment has somewhat different requirements. In this case, the developer must either implement a proprietary communications scheme or choose two OSs that share a common infrastructure (likely IP-based) for interprocessor communications. To help avoid resource conflicts, the OSs should also provide standardized mechanisms for accessing shared hardware components. In virtually all cases, OS support for a lean, easy-to-use communications protocol will greatly enhance core-to-core operation. In particular, an OS built with the distributed programming paradigm in mind can take greater advantage of the parallelism provided by the multiple cores.

In the homogeneous example shown in Figure 1, one core of the P2020 processor handles ingress traffic from a hardware interface while the other handles the egress traffic. Because the traffic exists as two independent streams, the two cores don't need to communicate or share data with each other. As a result, the OS doesn't have to provide core-to-core IPC. It must, however, provide the real-time performance needed to manage the traffic flows.

[Figure 1: Using Homogeneous AMP to Handle Both Ingress and Egress Traffic. Diagram: Core 0 and Core 1 each run QNX Neutrino, serving a data plane in half-duplex mode.]

Figure 2 shows another homogeneous example, but this time both e500 cores implement a distributed control plane, with each core handling different aspects of a data plane. To control the data plane correctly, applications running on the cores must function in a coordinated fashion. To enable this coordination, the OS should provide strong IPC support, such as a shared memory infrastructure for routing table information.

[Figure 2: Using Homogeneous AMP to Implement a Distributed Control Plane. Diagram: QNX Neutrino on Core 0 and Core 1, linked by IPC, above the data plane hardware.]

In the heterogeneous example shown in Figure 3, one core implements the control plane, while the other handles all the data plane traffic, which has real-time performance requirements. In this case, the OSs running on the two cores must both provide a consistent IPC mechanism, such as the Transparent Inter-Process Communication (TIPC) protocol. That mechanism allows the cores to communicate efficiently, possibly through shared data structures.

[Figure 3: AMP Control/Data Plane. Diagram: Linux on Core 0 and QNX Neutrino on Core 1, linked by IPC, split into control plane and data plane.]
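The article leaves the "shared data structures" approach abstract. As a rough illustration only, the sketch below shows a single-producer, single-consumer ring buffer that one core could use to pass fixed-size messages to another through shared memory. It is not TIPC and not the QNX Neutrino IPC API. It assumes that both sides see the same memory region, that caches are coherent, and that C11 atomics on `uint32_t` are lock-free on the target. The names `ring`, `ring_push()` and `ring_pop()` are hypothetical. Between two separately booted OSs, the region would also have to be mapped at an agreed physical address, which this sketch does not cover.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0,
              "RING_SLOTS must be a power of two");

/* Hypothetical message ring living in memory visible to both cores. */
struct ring {
    _Atomic uint32_t head;      /* advanced only by the producer core */
    _Atomic uint32_t tail;      /* advanced only by the consumer core */
    uint32_t msg[RING_SLOTS];   /* fixed-size message slots */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool ring_push(struct ring *r, uint32_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)          /* unsigned wraparound is fine */
        return false;
    r->msg[head & (RING_SLOTS - 1u)] = value;
    /* Release: the slot write above is visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_pop(struct ring *r, uint32_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->msg[tail & (RING_SLOTS - 1u)];
    /* Release: the slot read above completes before the slot is reused. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Each index is written by only one side, so no lock is needed. The release store that publishes a new `head` pairs with the consumer's acquire load, so a message slot is always written before the other core reads it.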
freescale.com
Enablement
CPU Utilization in AMP Mode

In AMP mode, a process and all of its threads are locked to a single processor core. While this approach is useful for running legacy code, it can result in underutilization of processor cores. For instance, if one core becomes busy, applications running on that core cannot, in most cases, migrate to a core that has more CPU cycles available (see Figure 4). Although such dynamic migration is possible, it typically involves complex checkpointing of the application's state. It can also interrupt service while the application is stopped on one core and restarted on another. Migration becomes even more difficult, if not impossible, if the cores run different OSs.

[Figure 4: AMP Multicore System. Diagram: four CPUs, each running its own OS (OS 1 to OS 4) and applications, connected through a system interconnect to per-core I/O and a memory controller. Shared memory is partitioned into OS 1 to OS 4 regions. Callout: user management of shared resources complicates design.]
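The underutilization described above can be put in rough numbers with a deliberately idealized model. The model is an assumption of this sketch, not a measurement from the article. Each core's software generates some demand, measured in cores' worth of work. Under AMP, each core is capped at 100% because excess work cannot migrate. Under SMP, all demand is pooled across cores, which presumes the excess consists of several threads the scheduler can spread out. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Idealized model: demand[i] is the work generated by the software bound
 * to core i, in cores' worth (1.5 means 150% of one core). Returns the
 * fraction of total CPU capacity that actually gets used. */

/* AMP: each core can use at most 100%; excess demand cannot migrate. */
static double amp_utilization(const double *demand, size_t ncores)
{
    assert(ncores > 0);
    double used = 0.0;
    for (size_t i = 0; i < ncores; i++)
        used += demand[i] < 1.0 ? demand[i] : 1.0;
    return used / (double)ncores;
}

/* SMP: one OS pools all demand and may run it on any idle core. */
static double smp_utilization(const double *demand, size_t ncores)
{
    assert(ncores > 0);
    double total = 0.0;
    for (size_t i = 0; i < ncores; i++)
        total += demand[i];
    double cap = (double)ncores;
    return (total < cap ? total : cap) / cap;
}
```

Consider two cores with demands of 1.5 and 0.25. Here `amp_utilization()` returns 0.625 (62.5% of total capacity) while `smp_utilization()` returns 0.875, because SMP can run core 0's overflow on core 1's idle cycles. When no core is overloaded, the two models agree.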
Symmetric Multiprocessing (SMP) Mode
Allocating resources in a multicore
design can be difficult, especially when
multiple software components are
unaware of how other components
are employing those resources. SMP
addresses many of these issues by
running only one copy of an OS across
all the chip’s cores. Because the OS
has insight into all system elements
at all times, it can allocate resources
on multiple cores with little or no input
from the application designer. By
running only one copy of the OS, SMP
can dynamically allocate resources
to specific applications rather than to
CPU cores, thereby enabling greater
utilization of available processing power.
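Because the single SMP OS, not the application, decides where work runs, application code can simply create threads and leave placement to the scheduler. The minimal POSIX threads sketch below splits a summation across several workers without setting any CPU affinity. An SMP OS is therefore free to run each worker on whichever core has cycles available. This is a generic illustration, not part of any QNX or Freescale API; compile with `-pthread` on systems that require it. The function `parallel_sum()` and its chunking scheme are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

struct chunk {
    const long *data;   /* start of this worker's slice */
    size_t len;         /* number of elements in the slice */
    long sum;           /* result written by the worker */
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    long s = 0;
    for (size_t i = 0; i < c->len; i++)
        s += c->data[i];
    c->sum = s;
    return NULL;
}

/* Sum data[0..n) using nworkers threads. No affinity is set, so an SMP
 * scheduler may place each worker on any core. Returns 0 on success. */
static int parallel_sum(const long *data, size_t n, size_t nworkers, long *result)
{
    pthread_t tid[MAX_WORKERS];
    struct chunk ch[MAX_WORKERS];
    assert(result != NULL);
    if (nworkers == 0 || nworkers > MAX_WORKERS)
        return -1;

    size_t base = n / nworkers, extra = n % nworkers, off = 0, started = 0;
    int rc = 0;
    for (size_t w = 0; w < nworkers; w++) {
        size_t len = base + (w < extra ? 1 : 0);  /* spread the remainder */
        ch[w] = (struct chunk){ data + off, len, 0 };
        off += len;
        if (pthread_create(&tid[w], NULL, sum_chunk, &ch[w]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    long total = 0;
    for (size_t w = 0; w < started; w++) {   /* always join what started */
        pthread_join(tid[w], NULL);
        total += ch[w].sum;
    }
    if (rc == 0)
        *result = total;
    return rc;
}
```

The design choice mirrors the SMP argument above: the code allocates work to threads, not to cores, and leaves the OS to map those threads onto whatever processing power is free.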