Multi-Core Programming - Increasing Performance Through Software Multi-Threading
Introduction
Based on slides from Intel Software College and on Multi-Core Programming: Increasing Performance Through Software Multi-threading by Shameem Akhter and Jason Roberts.
Paul Otellini, Chief Executive Officer, Intel Developer Forum, Fall 2003
Concurrency – in everyday use
• Reality
– A PC must download streaming video data, decompress/decode it, and display it on the screen; it must also handle the streaming audio and send it to the sound card
– The OS may be doing system tasks in the background
– The server must receive the broadcast, encode/compress it in near real time, and send it to possibly thousands of users
Concurrency in sequential systems!
• Streaming Video
– While waiting to receive the next frame, decode the previous frame
• FTP server
– Create a task (thread) for each user that connects (see the sketch below)
– Much simpler to write and easier to maintain than serving all users from one loop
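A minimal sketch of the thread-per-user pattern, assuming POSIX sockets and pthreads; the echo loop, the port number, and the handle_client name are illustrative stand-ins, not code from the book:

/* Thread-per-connection server sketch (illustrative; error checks
 * omitted for brevity). Each accepted user gets a dedicated thread. */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void *handle_client(void *arg)
{
    int fd = *(int *)arg;
    free(arg);
    char buf[512];
    ssize_t n;
    /* This thread serves one user's session; other threads serve
     * other users concurrently. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(fd, buf, (size_t)n);   /* echo stands in for real FTP handling */
    close(fd);
    return NULL;
}

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(2121);          /* placeholder port */
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);

    for (;;) {
        int *fd = malloc(sizeof *fd);
        *fd = accept(srv, NULL, NULL);           /* one thread per connection */
        pthread_t t;
        pthread_create(&t, NULL, handle_client, fd);
        pthread_detach(t);                       /* thread cleans up after itself */
    }
}

Each connection's logic reads as straight-line code, which is what makes this structure simpler to maintain than multiplexing all users in a single loop.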
Concurrency vs. Parallelism
• Parallel
– Multiple jobs (threads) are running simultaneously on different hardware resources, or processing elements (PEs)
– Each can execute and make progress at the same time
– Each PE can execute an instruction from a different thread simultaneously
• Concurrency
– We often say multiple threads or processes are running on the same PE or CPU at the same time
– But this really means that the execution of the threads is interleaved in time
– A single PE executes an instruction from only a single thread at any particular time
• To achieve parallelism, concurrent execution must use multiple hardware resources (see the sketch below)
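A minimal sketch (my example, assuming pthreads; not from the original slides) that makes the distinction observable: on a single PE the two threads' printed steps interleave in time, while on a multi-core machine they can genuinely execute at the same instant.

/* Two threads running the same worker. Compile: gcc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    const char *name = arg;
    for (int i = 0; i < 5; i++)
        printf("%s: step %d\n", name, i);  /* interleaving visible in output */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, "thread 1");
    pthread_create(&t2, NULL, worker, "thread 2");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}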
Concurrency vs. Parallelism
– Concurrency: Thread 1 and Thread 2 are both in progress, interleaved on a single core
– Parallelism: two or more threads are executing at the same time (Thread 1 and Thread 2 each run on their own core)
– Multiple cores needed for parallelism
(Timeline figures: interleaved vs. simultaneous execution of Thread 1 and Thread 2)
Multiprocessing vs. Multitasking
Bleeding Edge of Computer Architecture
Flynn’s Taxonomy of Parallel Computers (1972)
• SISD – single instruction, single data
• Traditional sequential computers
• Instructions executed in a serial manner
• MISD – multiple instruction, single data
• More of a theoretical model
• SIMD – single instruction, multiple data
• One instruction operates on many data elements at once
• Vector/array processors; the SSE units in modern CPUs (see the sketch below)
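To make the SISD/SIMD contrast concrete, here is an illustrative C sketch (my example, not from the slides; it assumes an x86 CPU with SSE): the scalar loop issues one add per element, while _mm_add_ps performs four float additions with a single instruction.

/* Scalar (SISD-style) vs. SIMD addition of four floats. */
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

int main(void)
{
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    /* SISD: one addition per instruction */
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    /* SIMD: four additions in a single instruction */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));

    for (int i = 0; i < 4; i++)
        printf("%.0f ", c[i]);   /* prints: 11 22 33 44 */
    printf("\n");
    return 0;
}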
Flynn’s Taxonomy of Parallel Computers (1972)
• MIMD – multiple instruction, multiple data
• Executes different instructions on different data
• Most common parallel platform today
• Multi-core computers
Expanded Taxonomy of Parallel Architectures
• MIMD (Multiple Instruction, Multiple Data) – executes different instructions on different data
• MPP (Massively Parallel Processors)
– Distributed memory (Intel Paragon)
– Shared memory without coherent caches (BBN Butterfly, Cray T3E)
– CC-NUMA [cache-coherent non-uniform memory architecture] (HP Exemplar, SGI Origin 2000)
• Clusters – an ensemble of commodity components connected by an interconnection network, within a single administrative domain and usually in one room
• (Geographically) Distributed Systems – exploit available cycles (Grid, DSI, Entropia, SETI@home)
• Parallel Computers
– shared memory, bus-based symmetric multiprocessors (SMPs)
Evolution of Parallel Machines
(Figure: parallel machines built from standard chips)
(Timeline: Intel Scientific founded; iPSC/2 shipped; iPSC/860; Delta shipped – fastest computer in the world; Paragon; ASCI Red – world's first TFLOP computer)
… and we were pretty good at it
We held the MP-LINPACK record over most of the '90s
(Chart: MP-LINPACK performance, 1991–1999 – Intel MPP supercomputers, 512 to 9472 CPUs: Delta (512), Paragon (3744), Paragon (6768); versus Thinking Machines CM-5 (1024 CPUs), SGI ASCI Blue Mountain (5040 CPUs), and IBM ASCI Blue Pacific)
Moore’s Law
The most popular formulation: the number of transistors on an integrated circuit doubles every 12 (or 18) months.
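As a hedged worked example (my arithmetic, not from the slide), the formulation can be written as a simple growth law, with N_0 the initial transistor count, T the doubling period in months, and t measured in years:

N(t) = N_0 \cdot 2^{12t/T}

With T = 18 months, one decade gives N(10)/N_0 = 2^{120/18} \approx 2^{6.7} \approx 100, i.e. roughly a hundredfold increase in transistor count.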
Mistaken Interpretation of Moore’s Law