Architecture1 1 (2012)
January 2012
Kenichi Miura, Ph.D.
Professor, National Institute of Informatics
Fellow, Fujitsu Laboratories Limited
Course Outline
1. Introduction (What is parallel processing? Why is it needed?)
2. HPC Architecture
   2.1. History of Supercomputers and Trends
   2.2. Classification of Parallel Architecture (From CPU to System)
   2.3. Memory Architecture (Shared, Distributed)
3. Computational Models
4. Parallel Algorithms
   4.1. Serial vs. Parallel Algorithms
   4.2. Hardware Realization and Examples of Special-purpose Processors
5. Parallel Programming Languages
   5.1. Relations between Parallel Languages and Architecture
   5.2. Parallel Language for Shared-memory Architecture: OpenMP
   5.3. Parallel Languages for Distributed-memory Architecture: Message Passing Interface
6. Application Areas for Large-Scale Scientific Computations/Simulations
7. Grid Computing and Cloud
"There are THREE things which are inevitable in this world: Death, Tax and Parallelism!"
John Riganati, Sarnoff Lab
USA
- ENIAC: Ballistic Table
- EDVAC, JOHNNIAC, MANIAC: Nuclear Weapon Development
U.K.
- Colossus: Code Breaking
ENIAC (c. 1946)
D.H. Lehmer's Criticism
"This (ENIAC) was a highly parallel machine, until von Neumann spoiled it."
D.H. Lehmer: A History of the Sieve Process, in N. Metropolis et al.: A History of Computing in the Twentieth Century (1980)
[Figure: block diagram with Input Device, Output Device, and Memory (MSU) holding Program and Data]
[Figure: supercomputers plotted by year (1950 to 2000) on a logarithmic vertical axis marked kilo, mega, giga, tera. Machines shown: Mark I, ENIAC, UNIVAC, IBM704, LARC, STRETCH, CDC1604, CDC6600, CDC7600, ILLIAC4, CRAY1, 230-75/APU, VP200, VP400, CRAY X-MP, CRAY2, CRAY Y-MP, VP2000, CRAY C90, CRAY T90, VPP800. Source:L.Smar]
Parallel Architecture - Discussion items
- Classification of Hardware Parallelism
- Processor Design (Vector, Scalar)
- Memory Design (Shared, Distributed)
- Computational Models and Parallelism
- Correspondence with Systems
- Vector vs Scalar Parallel
- Applications and Parallelism (SIMD, MIMD, SPMD)
- Data Parallel vs Control Parallel
- Numerical Algorithms and Parallelism
Moore's Law
The number of transistors that can be mounted on a chip doubles every 18 to 24 months.
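To see what the two doubling periods quoted on the slide (18 and 24 months) mean over a longer horizon, here is a minimal sketch. The function name `transistor_growth` and the 10-year horizon are illustrative assumptions, not part of the lecture.

```python
def transistor_growth(years: float, doubling_months: float) -> float:
    """Growth factor in transistor count after `years`,
    assuming one doubling every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# Compare the two doubling periods over an assumed 10-year horizon.
for months in (18, 24):
    factor = transistor_growth(10, months)
    print(f"doubling every {months} months: x{factor:.1f} in 10 years")
```

The gap between the two periods compounds quickly: over ten years a 24-month period gives a factor of 32, while an 18-month period gives roughly three times that.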
Measurements
Temperature (Celsius) of a microprocessor motherboard, measured at six locations
Why?
Arrhenius equation: temperature implications
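The slide names the Arrhenius equation without writing it out. Assuming it is meant in its usual reliability reading, where a thermally activated rate scales as exp(-Ea / (k T)), the sketch below computes how much faster such a process runs at a higher temperature. The activation energy of 0.7 eV and the function name are assumptions chosen for illustration only; the real value depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_low_c: float, t_high_c: float,
                           activation_energy_ev: float = 0.7) -> float:
    """Ratio rate(t_high) / rate(t_low) for a process following
    rate = A * exp(-Ea / (k * T)). Temperatures are in Celsius.
    The default Ea = 0.7 eV is an assumed, illustrative value."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k)
    return math.exp(exponent)


# Effect of running 10 degrees hotter, under the assumed activation energy.
print(arrhenius_acceleration(50.0, 60.0))
```

Under these assumptions a 10-degree rise from 50 to 60 Celsius roughly doubles the rate, which is why chip temperature matters for component lifetime.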
- Instruction Stream (Fetch, Decode, Issue, ...)
- Data Stream (Segmented Arithmetic Units, chaining, ...)
Seymour Cray
Cray-1
Sourced from http://www.thocp.net/hardware/cray_1.htm
CRAY 1 Architecture
VP100/200/400 Architecture
ILLIAC IV:
The first SIMD Supercomputer System (1975-1982 @ NASA Ames Research Center)
D.L. Slotnick
ILLIAC IV Architecture
- SIMD: 64-way parallelism
- ECL (7 gates/chip, by TI)
- 16 MHz clock
- First semiconductor memory, by Fairchild
- Designed at University of Illinois
- Built by Burroughs Corporation
[Figure: architecture classes arranged against a performance scale (1 G, 10 G, 100 G): Vector-SMP, V-SMP Cluster, Scalar SMP, SPP / SMP-Cluster, MPP/Cluster]
[Figure: block diagram with Input Device, Output Device, and Cache levels L1, L2, L3]
More complexity is introduced in order to cope with the mismatch between CPU and memory!
Interleaved Memory
- Cache avoidance technique
- Cyclic use of multiple memory banks to hide the cycle time of memory
- Used in vector processing systems
[Figure: CPU registers connected to the MSU, which is divided into Bank 0, Bank 1, Bank 2, ..., Bank n-2, Bank n-1]
[Figure: four panels showing elements of a matrix A placed across memory banks B0, B1, B2, ..., B255]
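The cyclic use of banks described above can be illustrated with a small timing model. This is a sketch under stated assumptions, not a description of any particular machine: word addresses map to banks by `address % num_banks`, one access can be issued per cycle, and a bank stays busy for `bank_busy_cycles` cycles after each access. The function names and the busy time of 8 cycles are hypothetical; the 256 banks follow the B0 to B255 labels in the figure.

```python
def bank_of(address: int, num_banks: int) -> int:
    """Low-order interleaving: consecutive words go to consecutive banks."""
    return address % num_banks


def access_cycles(start: int, stride: int, count: int,
                  num_banks: int, bank_busy_cycles: int) -> int:
    """Cycles needed to issue `count` strided accesses, issuing at most one
    per cycle and stalling whenever the target bank is still busy."""
    free_at = [0] * num_banks          # cycle at which each bank is free again
    cycle = 0
    for i in range(count):
        bank = bank_of(start + i * stride, num_banks)
        cycle = max(cycle, free_at[bank])      # stall until the bank is free
        free_at[bank] = cycle + bank_busy_cycles
        cycle += 1                             # next access issues one cycle later
    return cycle


# 64 accesses, 256 banks, assumed bank busy time of 8 cycles.
print(access_cycles(0, 1, 64, 256, 8))    # unit stride: every access hits a new bank
print(access_cycles(0, 256, 64, 256, 8))  # stride 256: every access hits bank 0
print(access_cycles(0, 257, 64, 256, 8))  # stride 257: banks rotate again
```

In this model, walking a matrix whose leading dimension equals the number of banks sends every access to the same bank, so the memory cycle time is fully exposed. Padding the leading dimension by one element (stride 257) restores conflict-free access.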
Intel Microprocessor
IBM Power7
Power Wall
25 MW to the building, 12.5 MW to the computers
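The two figures above can be turned into a single ratio. The sketch below uses the common power usage effectiveness (PUE) metric, total facility power divided by power delivered to the computing equipment. The slide itself does not name PUE, so treating the numbers this way is an assumption, as is the function name.

```python
def power_usage_effectiveness(facility_mw: float, computer_mw: float) -> float:
    """Total facility power divided by power reaching the computers."""
    return facility_mw / computer_mw


pue = power_usage_effectiveness(25.0, 12.5)
overhead_fraction = 1.0 - 12.5 / 25.0   # share of power not reaching the computers
print(pue, overhead_fraction)
```

With the slide's numbers, half of the power entering the building never reaches the computers.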
Memory Bus
System Interconnect
ToFu 1
ToFu 2
ToFu 3
ToFu 4
ToFu 5
[Figure: 4 x 4 Crossbar Network, with cable connections]
The key issue is how to keep the pipelines filled for a longer time!
- Degree of parallelism in application software
- Locality of datasets
- Memory latency hiding and wider memory bandwidth
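Why keeping the pipelines filled matters can be seen from the textbook timing model of an ideal pipeline: with `stages` stages, the first result appears after `stages` cycles and one more result follows every cycle. The sketch below is a simplified model with hypothetical function names; it ignores stalls, memory latency, and any machine-specific startup cost.

```python
def pipelined_cycles(n_ops: int, stages: int) -> int:
    """Cycles to complete n_ops independent operations on an ideal pipeline."""
    if n_ops < 1 or stages < 1:
        raise ValueError("n_ops and stages must be at least 1")
    return stages + n_ops - 1      # fill the pipeline once, then one result per cycle


def pipeline_efficiency(n_ops: int, stages: int) -> float:
    """Fraction of the peak rate (one result per cycle) actually achieved."""
    return n_ops / pipelined_cycles(n_ops, stages)


# Assumed 8-stage pipeline: short runs waste most cycles on filling.
for n in (8, 64, 1024):
    print(n, round(pipeline_efficiency(n, 8), 3))
```

Longer runs of independent operations amortize the fill time, which is why the degree of parallelism in the application and the ability to hide memory latency determine how close a pipelined machine gets to its peak rate.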
Multi-threading
Denelcor HEP, Cray MTA/XMT
- Multiple instruction counters (one per thread)
- Pipelines everywhere
- Full/empty bits on every memory word
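The full/empty bit idea can be emulated in software to show the synchronization semantics: a write waits until the word is empty and then marks it full, and a read waits until the word is full and then marks it empty. The sketch below uses a Python `threading.Condition` as a stand-in for the hardware. The class and method names are hypothetical, and real machines implement this per memory word in hardware rather than with locks.

```python
import threading


class FullEmptyWord:
    """Software emulation of one memory word with a full/empty bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_when_empty(self, value):
        """Block until the word is empty, store the value, set the bit to full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the word is full, return the value, set the bit to empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_demo(n_items: int = 5):
    """One producer thread and one consumer thread sharing a single word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(n_items):
            word.write_when_empty(i * i)

    def consumer():
        for _ in range(n_items):
            received.append(word.read_when_full())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(run_demo())
```

Because each word carries its own synchronization state, producer and consumer threads can hand values to each other without a separate lock or barrier, which suits a machine that switches among many threads to hide memory latency.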
HEP Architecture