Multicore Embeddedfinal Revised
Emerging Trends
Christian Märtin
Faculty of Computer Science
Augsburg University of Applied Sciences
Augsburg, Germany
[email protected]
Abstract: This paper undertakes a critical review of the current challenges in multicore processor evolution, underlying trends and design decisions for future multicore processor implementations. It is first shown that for keeping up with Moore's law during the last decade, the VLSI scaling rules for processor design had to be dramatically changed. In future multicore designs large quantities of dark silicon will be unavoidable and chip architects will have to find new ways of balancing further performance gains, energy efficiency and software complexity. The paper compares the various architectural alternatives on the basis of specific analytical models for multicore systems. Examples of leading commercial multicore processors and architectural research trends are given to underscore the dramatic changes lying ahead in computer architecture and multicore processor design.

Keywords: multicore processor, Moore's law, Post-Dennard scaling, multicore architectures, many-core architecture, multicore performance models, dark silicon

I. INTRODUCTION

More than 12 years after IBM ushered in the age of multicore processors with the IBM Power4, the first commercial dual core processor chip, software and system developers as well as end users of business, engineering and embedded applications still take it for granted that the performance gains delivered by each new chip generation will continue to improve more than linearly over the decade ahead. Moore's law still appears to be valid, as demonstrated by Intel's fast track from 32 to 22nm mass production and towards its new 14nm CMOS process, with even smaller and at the same time more energy efficient structures every two years [1].

Very successfully and at the extreme end of the performance spectrum, Moore's law is also expressed by the industry's multi-billion transistor multicore and many-core server chips and GPUs. Obviously the transistor raw material needed for integrating even more processor cores and larger caches onto future chips for all application areas and performance levels is still available.

However, just as the necessary transition from complex single core architectures with high operating frequencies to multicore processors with moderate frequencies was caused by the exponentially growing thermal design power (TDP) that complex single core processors needed to reach linear performance improvements [2], the ongoing multicore evolution has again hit the power wall and will undergo dramatic changes during the next several years [3]. As new analytical models and studies show [4], [5], power problems and the limited degree of inherent application parallelism will lead to rising percentages of dark or dim silicon in future multicore processors. This means that large parts of the chip have to be switched off or operated at low frequencies all the time. It remains to be studied whether the effects of such pessimistic forecasts will affect embedded applications and system environments in a milder way than the software in more conservative standard and high-performance computing environments.

In this paper we discuss the reasons for these developments together with other future challenges for multicore processors. We also examine possible solution approaches to some of the topics. When discussing the performance of multicore systems, we must first look at adequate multicore performance models that consider both the effects of Amdahl's law on different multicore architectures and workloads and the consequences of these models for multicore power and energy requirements. We also use the models to introduce the different architectural classes for multicore processors.

The paper will therefore give an overview of the most promising current architectures and predictable trends and will finally point at some typical implementations of server, workstation, and embedded multicore chips. Multicore processor implementations in the same architectural class may vary significantly depending on the targeted application domain and the given power budget. As will be shown, the trend towards more heterogeneous and/or dynamic architectures and innovative design directions can mitigate several of the expected problems.
$$ S_{\mathrm{dyn}}(f, n, r) = \frac{1}{\frac{1-f}{\mathrm{perf}(r)} + \frac{f}{n}} \tag{6} $$
Note that for the asymmetric case in equation (5), the parallel work is done jointly by the large core and the 𝑛 − 𝑟 small cores. If, to keep the available power budget in balance, either the large core or the small cores are running, but never both at the same time, the equation changes to:
$$ S_{\mathrm{asym}}(f, n, r) = \frac{1}{\frac{1-f}{\mathrm{perf}(r)} + \frac{f}{n-r}} \tag{7} $$
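The speedup models of equations (6) and (7) can be explored numerically with a minimal sketch such as the following. It assumes perf(r) = sqrt(r), the rule commonly used with the Hill and Marty model; this assumption is not stated in this section. The function names are hypothetical and chosen for illustration only.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a core built from r BCEs.

    ASSUMPTION: perf(r) = sqrt(r); swap in another rule if the
    underlying model defines perf(r) differently.
    """
    return math.sqrt(r)


def speedup_dynamic(f: float, n: int, r: int) -> float:
    """Equation (6): the sequential part runs on r fused BCEs,
    the parallel part runs on all n BCEs."""
    return 1.0 / ((1.0 - f) / perf(r) + f / n)


def speedup_asym_exclusive(f: float, n: int, r: int) -> float:
    """Equation (7): the sequential part runs on the large core (r BCEs),
    the parallel part runs only on the n - r small cores."""
    if n <= r:
        raise ValueError("need n > r so that at least one small core exists")
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


if __name__ == "__main__":
    # Hypothetical example values: 90% parallel fraction, 64 BCEs, large core of 16 BCEs.
    f, n, r = 0.9, 64, 16
    print("dynamic   :", speedup_dynamic(f, n, r))
    print("asymmetric:", speedup_asym_exclusive(f, n, r))
```

With these example values the dynamic model yields a speedup of about 25.6 and the power-balanced asymmetric model about 22.9. The difference illustrates the cost of leaving the large core idle during parallel phases.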
In addition to achievable performance, power (TDP) and energy consumption are important indicators for the feasibility and appropriateness of multi- and many-core architectures with symmetric and asymmetric structures. Woo and Lee [13] have extended the work of Hill and Marty to model the average power envelope when workloads that can be modeled with Amdahl's law are executed on multicore processors. The equation for the symmetric case is given here:

$$ W = \frac{1 + (n-1)\,k\,(1-f)}{(1-f) + \frac{f}{n}} \tag{8} $$

In this equation 𝑊 is the average power consumption, 𝑘 is the fraction of power that one core consumes in idle state, and the power of a core in active state is 1. For the ratio of performance to power we get

$$ \frac{\mathrm{Perf}}{W} = \frac{1}{1 + (n-1)\,k\,(1-f)} \tag{9} $$

However, if we look at the evolution of commercial multicore processors, we can meanwhile see, in addition to the architectural alternatives already discussed, new architectural variants for asymmetric systems (fig. 4), dynamic systems (fig. 5) and heterogeneous multicore systems (fig. 6).

Fig. 5. Dynamic multicore with 4 cores and frequency scaling using the power budget of 8 BCEs. Currently one core is at full core TDP, two cores are at ½ core TDP. One core is switched off.

Fig. 6. Heterogeneous multicore with a large core, four BCEs, and two accelerators or co-processors each of type A, B, D, and E. Each co-processor/accelerator uses the same transistor budget as a BCE.

B. Models for Heterogeneous Multicores

Heterogeneous architectures at first glance look similar to asymmetric multicores. However, in addition to a conventional complex core, they introduce unconventional cores or U-cores [11] that represent custom logic, GPU resources, or FPGAs. Such cores can be interesting for specific application types with SIMD parallelism, GPU-like multithreaded parallelism, or specific parallel algorithms that have been mapped to custom or FPGA logic. To model such an unconventional core, Chung et al. suggest assigning a U-core the same transistor budget as a simple BCE core. The U-core then executes a specific parallel application section with a relative performance of 𝜇, while consuming a relative power of 𝜑, compared to a BCE with 𝜇 = 𝜑 = 1. If 𝜇 > 1 for a specific workload, the U-core works as an accelerator. With 𝜑 < 1 the U-core consumes less power and can help to make better use of the dark silicon. The resulting speedup equation by Chung et al. can be derived directly from equation (7):
$$ S_{\mathrm{heterogeneous}}(f, n, r, \mu) = \frac{1}{\frac{1-f}{\mathrm{perf}(r)} + \frac{f}{\mu\,(n-r)}} \tag{10} $$
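As a rough illustration, equations (8) to (10) can be evaluated with a short sketch like the one below. It again assumes perf(r) = sqrt(r), which is not stated in this section. All function names are hypothetical. The relative power 𝜑 of the U-cores is deliberately left out, because the equations shown here do not include it.

```python
import math


def perf(r: float) -> float:
    """ASSUMPTION: sequential performance of a core built from r BCEs is sqrt(r)."""
    return math.sqrt(r)


def speedup_heterogeneous(f: float, n: int, r: int, mu: float) -> float:
    """Equation (10): the parallel part runs on n - r BCE-sized U-cores,
    each with relative performance mu (mu = 1 reproduces equation (7))."""
    if n <= r:
        raise ValueError("need n > r so that at least one U-core exists")
    return 1.0 / ((1.0 - f) / perf(r) + f / (mu * (n - r)))


def average_power_symmetric(f: float, n: int, k: float) -> float:
    """Equation (8): average power W of n symmetric cores, where an idle
    core consumes the fraction k of an active core's power (active = 1)."""
    return (1.0 + (n - 1) * k * (1.0 - f)) / ((1.0 - f) + f / n)


def perf_per_watt_symmetric(f: float, n: int, k: float) -> float:
    """Equation (9): performance per unit of average power, symmetric case."""
    return 1.0 / (1.0 + (n - 1) * k * (1.0 - f))


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    f, n, r, k = 0.9, 64, 16, 0.3
    for mu in (1.0, 2.0, 4.0):
        print(f"mu={mu}: speedup={speedup_heterogeneous(f, n, r, mu):.2f}")
    print("W      :", average_power_symmetric(f, 16, k))
    print("Perf/W :", perf_per_watt_symmetric(f, 16, k))
```

Setting 𝜇 = 1 reproduces the value of equation (7), which is a quick consistency check on the derivation. Dividing the Amdahl speedup 1 / ((1 − f) + f/n) by 𝑊 from equation (8) gives exactly the value of equation (9).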