Multi-Core Architectures
Rakesh Kumar
rakumar@cs.ucsd.edu
[Figure: log-scale plot (1.00 to 100.00) over the years 1985 to 2005; legend includes PowerPC and AMD.]
Price being paid
[Figure: Watts/Spec (log scale, 0.01 to 1) versus Spec2000 (log scale, 1 to 10000) for Intel, Alpha, Sparc, Mips, HP PA, PowerPC, and AMD processors.]
Lessons learned
Marginal utility of transistors decreasing
If n is the number of transistors:
Power and Area are O(n)
Performance is O(sqrt(n))
• Wrong side of square law
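The scaling stated above can be made concrete with a small sketch. It assumes, purely for illustration, that power and area grow exactly linearly in n and performance exactly as sqrt(n), with unit constants; the function names are hypothetical and not from the talk.

```python
import math


def relative_cost(n_ratio: float) -> float:
    """Growth in power/area when the transistor count grows by n_ratio (assumed O(n))."""
    return n_ratio


def relative_performance(n_ratio: float) -> float:
    """Growth in performance for the same transistor growth (assumed O(sqrt(n)))."""
    return math.sqrt(n_ratio)


def performance_per_cost(n_ratio: float) -> float:
    """Performance gained per unit of power/area spent; falls as 1/sqrt(n_ratio)."""
    return relative_performance(n_ratio) / relative_cost(n_ratio)


for ratio in (1, 4, 16):
    print(ratio, relative_performance(ratio), performance_per_cost(ratio))
```

Under these assumptions, quadrupling the transistor budget of one core only doubles its performance, so the return per unit of power or area is halved.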
One way of handling a problem is...
...instead of confronting the problem, try skipping to a simpler one
For example,
In terms of area:
1 EV6 ≈ 5 EV5 cores
In terms of throughput:
1 EV6 ≈ 2.0-2.2 EV5 cores
So, in the same area: 5 EV5 cores >= 2 EV6 cores in throughput
• Performance doubled just by having multiple cores!
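The arithmetic behind this comparison can be sketched as below. The two ratios are the ones quoted on the slide; the helper name is hypothetical, and the calculation assumes there are enough independent threads to keep every EV5 core busy.

```python
# Ratios quoted on the slide.
AREA_EV6_IN_EV5 = 5.0                # 1 EV6 occupies about the area of 5 EV5 cores
THROUGHPUT_EV6_IN_EV5 = (2.0, 2.2)   # 1 EV6 delivers the throughput of 2.0-2.2 EV5 cores


def ev5_throughput_in_ev6_units(num_ev5: float, ev6_in_ev5: float) -> float:
    """Throughput of num_ev5 fully used EV5 cores, expressed in EV6-equivalents."""
    return num_ev5 / ev6_in_ev5


# Replace one EV6 with the EV5 cores that fit in the same area.
low = ev5_throughput_in_ev6_units(AREA_EV6_IN_EV5, max(THROUGHPUT_EV6_IN_EV5))
high = ev5_throughput_in_ev6_units(AREA_EV6_IN_EV5, min(THROUGHPUT_EV6_IN_EV5))
print(f"{low:.2f} to {high:.2f} EV6-equivalents in the area of one EV6")
```

Both ends of the range exceed 2, which is the sense in which throughput is doubled for the same area.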
Multi-core Architecture: Definition
Simpler core
Possibility of lower cycle time, better optimisation etc.
So, the next question to ask obviously is…
Goals of my thesis research
Demonstrate that the prior methodology is highly
inefficient in terms of area and power
Before scrutinizing the “identical cores” assumption...
Implication of diversity on multi-core design
An example multi-core architecture
An alternate multi-core architecture
Single-ISA Heterogeneous Multi-core Architectures
Another Performance Advantage: Adjusts to varying TLP
[Figure: weighted speedup versus number of threads (1 to 20) for 4EV6.]
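The plots in this part of the talk report weighted speedup, which the slides do not define. A common definition, assumed here, sums each thread's IPC in the multithreaded run divided by its IPC when running alone on a baseline core. The function name, the choice of baseline, and the sample numbers are illustrative assumptions, not data from the talk.

```python
from typing import Sequence


def weighted_speedup(ipc_shared: Sequence[float], ipc_alone: Sequence[float]) -> float:
    """Sum of per-thread IPC ratios: IPC in the shared run over IPC when run alone.

    ipc_shared[i] is thread i's IPC in the multithreaded run;
    ipc_alone[i] is its IPC when running alone on the assumed baseline core.
    """
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one baseline IPC per thread")
    if any(b <= 0 for b in ipc_alone):
        raise ValueError("baseline IPC must be positive")
    return sum(s / b for s, b in zip(ipc_shared, ipc_alone))


# Made-up IPC values for three threads (not measurements from the talk).
ws = weighted_speedup([1.0, 0.5, 0.75], [1.0, 1.0, 1.5])
print(ws)
```

Under this definition, a single thread running at full baseline speed scores 1, so the metric rewards both running more threads and running each of them quickly.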
Comparing Single-ISA Heterogeneous
Architectures against Conventional CMPs
[Figure: weighted speedup versus number of threads (1 to 20) for 4EV6 and 20EV5.]
A choice has to be made between throughput and single-thread (ST) performance
[Figure: weighted speedup versus number of threads (1 to 20) for 4EV6, 20EV5, and 3EV6 & 5EV5 (static best).]
Then there is intra-program diversity as well!
[Figure: IPS over the course of a program's execution (x-axis 1 to 801) on EV8-, EV6, EV5, and EV4.]
[Figure: results versus number of threads (1 to 8) for 4EV6, 3EV6 & 5EV5 (random), and 3EV6 & 5EV5 (static best).]
To sum up…
Single-ISA heterogeneous architectures are a good design
point for throughput as well as performance
Talk Outline
Reducing power for a conventional multi-core architecture
An example Single-ISA heterogeneous multi-core architecture
EV4     4.97    3
EV5     9.83    5
EV6    17.80   24
EV8-   92.88  260
Choosing Dynamically the Core with Least Energy
(perf. loss<10%)
[Figure: IPS over the course of a program's execution (x-axis 1 to 801) on EV8-, EV6, EV5, and EV4.]
[Figure: the same IPS traces for EV8-, EV6, EV5, and EV4 with an additional Best-path trace.]
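The policy named in the slide title can be sketched as follows: among the cores whose performance stays within 10% of the best available, pick the one with the lowest energy. The per-core numbers, the `CoreSample` type, and the `pick_core` helper are hypothetical. The talk does not spell out how the loss is measured, so the sketch assumes it is relative to the fastest core in the same interval.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CoreSample:
    name: str
    ips: float     # performance of the program on this core in the current interval
    energy: float  # energy spent on this core for the interval (arbitrary units)


def pick_core(samples: Iterable[CoreSample], max_loss: float = 0.10) -> CoreSample:
    """Return the lowest-energy core whose IPS is within max_loss of the best IPS.

    A loss of exactly max_loss is treated as acceptable (assumption).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("no cores to choose from")
    best_ips = max(s.ips for s in samples)
    eligible = [s for s in samples if s.ips >= (1.0 - max_loss) * best_ips]
    return min(eligible, key=lambda s: s.energy)


# Made-up values for one execution interval (not data from the talk).
interval = [
    CoreSample("EV4", ips=0.50, energy=1.0),
    CoreSample("EV5", ips=0.80, energy=2.0),
    CoreSample("EV6", ips=1.15, energy=4.0),
    CoreSample("EV8-", ips=1.20, energy=20.0),
]
chosen = pick_core(interval)
print(chosen.name)
```

Repeating this choice interval by interval is one way to read the Best-path trace; a real system would also have to account for the cost of switching cores, which this sketch ignores.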
Choosing Dynamically the Core with Least Energy
(perf. loss<10%)
[Summary of results]
Realistic heuristics
[Figure: energy, performance (1/execution time), and energy-delay, normalized to EV8-, for the neighbor, neighbor-global, random, and all heuristics and the dynamic oracle.]
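The three quantities reported in the summary can be computed as in the sketch below. It assumes energy-delay means energy multiplied by execution time and that every value is normalized to a run on EV8- alone; the `Run` type, the helper name, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Run:
    energy: float          # total energy of the run (arbitrary units)
    execution_time: float  # total execution time (arbitrary units)


def normalized_metrics(run: Run, baseline: Run) -> Dict[str, float]:
    """Energy, performance (1/execution time), and energy-delay relative to baseline."""
    return {
        "energy": run.energy / baseline.energy,
        # Performance is 1/execution-time, so the ratio inverts the times.
        "performance": baseline.execution_time / run.execution_time,
        "energy_delay": (run.energy * run.execution_time)
        / (baseline.energy * baseline.execution_time),
    }


# Made-up numbers: a core-switching run compared with running on EV8- only.
ev8_only = Run(energy=100.0, execution_time=10.0)
switching = Run(energy=50.0, execution_time=12.5)
m = normalized_metrics(switching, ev8_only)
print(m)
```

With these made-up inputs, the switching run uses half the energy at 80% of the performance, and its energy-delay is lower than the baseline's.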
To sum up…
Bottom line
Summary of talk