Advanced Computer Architecture Slides
By Kai Hwang.
Computer Generations
1. Computers evolved through two stages of development,
Sequential Execution.
Detection of parallelism using Bernstein's Conditions
If two adders are available simultaneously, the
parallel execution requires only three steps.
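Bernstein's conditions, named in the slide title above, can be checked mechanically. The following is a minimal sketch, assuming each process is described by its input (read) set and output (write) set; the function name and the three example statements are hypothetical and not taken from the slides.

```python
def bernstein_parallel(in1, out1, in2, out2):
    """Return True if two processes satisfy Bernstein's conditions.

    Two processes P1 and P2 can run in parallel when:
      I1 and O2 are disjoint (no anti-dependence),
      I2 and O1 are disjoint (no flow dependence),
      O1 and O2 are disjoint (no output dependence).
    """
    in1, out1, in2, out2 = (set(s) for s in (in1, out1, in2, out2))
    no_anti = not (in1 & out2)
    no_flow = not (in2 & out1)
    no_output = not (out1 & out2)
    return no_anti and no_flow and no_output


# Hypothetical statements used only for illustration:
#   P1: C = D * E
#   P2: M = G + C
#   P3: A = B + C
p1 = ({"D", "E"}, {"C"})
p2 = ({"G", "C"}, {"M"})
p3 = ({"B", "C"}, {"A"})

print(bernstein_parallel(*p1, *p2))  # P2 reads C, which P1 writes
print(bernstein_parallel(*p2, *p3))  # P2 and P3 only share the input C
```

Under these assumptions P1 and P2 cannot run in parallel, while P2 and P3 can, since sharing an input violates none of the three conditions.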
Detection of parallelism using Bernstein's Conditions
We Learn:
Computational Granularity or level of
parallelism in programs
Communication latency
Scheduling Issues
Program Partitioning and Scheduling
Using grain packing, the nodes in the top two levels are packed into four
coarse-grain nodes: V, W, X, and Y.
The remaining three nodes form the fifth node, Z.
Parallel decomposition for static
multiprocessor schedule
Speedup: 864/446 = 1.94
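The speedup figure is the ratio of sequential execution time to parallel execution time. A minimal sketch of that arithmetic, using the two cycle counts quoted on the slide (the function name is an assumption for illustration):

```python
def speedup(sequential_cycles, parallel_cycles):
    """Speedup = sequential execution time / parallel execution time."""
    if parallel_cycles <= 0:
        raise ValueError("parallel_cycles must be positive")
    return sequential_cycles / parallel_cycles


# Cycle counts taken from the slide: 864 sequential, 446 after grain packing.
print(round(speedup(864, 446), 2))  # 1.94
```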
Program Flow Mechanisms
Conventional computers are based on a control
flow mechanism, in which the order of instruction
execution is stated explicitly by the user program.
Data flow computers allow any instruction to
execute as soon as its data (operands) are
available.
Data flow computers emphasize a high degree
of parallelism at the fine-grain instruction
level.
Program Flow Mechanisms
Control Flow Computer:
A program counter sequences the program.
Uses shared memory to hold the program instructions.
Control driven.
Control flow can be made parallel by using parallel language constructs.

Data Flow Computer:
Execution is driven by data availability.
Computational results (data tokens) are passed directly between instructions.
The data generated is duplicated into many copies and forwarded directly to all
needy instructions.
Data tokens, once consumed by an instruction, are no longer available for
reuse.
Requires no program counter, no shared memory, and no control sequencer.
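The data flow properties listed above (firing on operand availability, copying results to every consumer, and consuming tokens on use) can be illustrated with a toy interpreter. This is a minimal sketch under assumed data structures, not a model of any real machine; the function name, the graph encoding, and the example graph are all hypothetical.

```python
import operator


def run_dataflow(instructions, initial_tokens):
    """Fire each instruction as soon as all of its operand tokens are present.

    instructions:   name -> (function, operand_count, [(consumer, port), ...])
    initial_tokens: list of (instruction_name, port, value)
    There is no program counter: the order of firing depends only on tokens.
    """
    waiting = {name: {} for name in instructions}  # operand slots per instruction
    pending = list(initial_tokens)                 # tokens still in flight
    results = {}
    while pending:
        dest, port, value = pending.pop(0)
        waiting[dest][port] = value
        func, arity, consumers = instructions[dest]
        if len(waiting[dest]) == arity:            # all operands available: fire
            operands = [waiting[dest][p] for p in range(arity)]
            waiting[dest] = {}                     # tokens are consumed, not reused
            result = func(*operands)
            results[dest] = result
            for consumer, consumer_port in consumers:
                # one copy of the result token per consuming instruction
                pending.append((consumer, consumer_port, result))
    return results


# Hypothetical graph: s = a + b; p = s * c; d = s - e
graph = {
    "add": (operator.add, 2, [("mul", 0), ("sub", 0)]),
    "mul": (operator.mul, 2, []),
    "sub": (operator.sub, 2, []),
}
tokens = [("add", 0, 2), ("add", 1, 3), ("mul", 1, 4), ("sub", 1, 1)]
print(run_dataflow(graph, tokens))
```

With these inputs the result of `add` is copied to both `mul` and `sub`, which fire once their second operand is also present.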
A data flow Architecture
Arvind at MIT developed the tagged-token
architecture.
A data flow Architecture
Inter-PE communication is done
through a pipelined routing network.
Within each PE, the machine provides a low-level
token-matching mechanism that dispatches
only those instructions whose input data tokens
are already available.
Each datum is tagged with the address of the
instruction to which it belongs.
Instructions are stored in program memory.
Tagged tokens enter the PE through a local
path.
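The token-matching step inside a PE can be sketched as a small lookup structure. This is a simplified illustration, assuming the tag is just the instruction address described above and that program memory records how many operands each instruction needs; the class and method names are hypothetical.

```python
from collections import defaultdict


class MatchingUnit:
    """Toy token-matching store for one processing element (PE)."""

    def __init__(self, program_memory):
        # program_memory: instruction address -> number of input operands
        self.program_memory = program_memory
        self.waiting = defaultdict(dict)  # tag -> {port: value}

    def receive(self, tag, port, value):
        """Store a tagged token.

        Returns the operand list when every input of the instruction at
        `tag` has arrived, otherwise None (the instruction is not dispatched).
        """
        slots = self.waiting[tag]
        slots[port] = value
        if len(slots) < self.program_memory[tag]:
            return None
        del self.waiting[tag]  # matched tokens leave the store
        return [slots[p] for p in sorted(slots)]


# Hypothetical two-operand instruction stored at address 0x10.
unit = MatchingUnit({0x10: 2})
print(unit.receive(0x10, 0, 7))  # first operand only: nothing to dispatch
print(unit.receive(0x10, 1, 5))  # second operand arrives: instruction is ready
```

Keying the store on the tag lets tokens arrive in any order, which is what allows dispatch to depend only on operand availability.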
Data flow Graph
A data flow graph is similar to a dependence
graph, with the difference that data tokens are
passed along the edges of a data flow graph.
Add: 1 cycle
Multiply: 2 cycles
Divide: 3 cycles
Sequential execution on a uniprocessor
requires 48 cycles.
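On a uniprocessor the operations run one after another, so the sequential time is the sum of all operation latencies. The sketch below uses the latencies given above; the operation counts (eight adds, eight multiplies, eight divides) are an assumption, since the slide text does not list them, chosen because they are consistent with the 48-cycle figure.

```python
# Latencies in cycles, as given on the slide.
LATENCY = {"add": 1, "multiply": 2, "divide": 3}

# ASSUMPTION: operation counts are not stated on the slide.
ASSUMED_OP_COUNTS = {"add": 8, "multiply": 8, "divide": 8}


def sequential_cycles(op_counts, latency=LATENCY):
    """Total cycles when every operation executes one after another."""
    return sum(latency[op] * count for op, count in op_counts.items())


print(sequential_cycles(ASSUMED_OP_COUNTS))  # 48
```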
Data flow Graph
Data-driven execution on 4 processors requires
24 cycles.