SIMD ARRAY
PROCESSORS
Chapter 4
NOTE: Refer two author book Kai Hwang and Briggs page no:325
SIMD ARRAY PROCESSORS
• ILLIAC-IV
• Burroughs Scientific Processor(BSP)
ILLIAC-IV
ILLIAC IV
• Constitutes of multiple synchronized processing
elements under one control unit
• Each processing elements contain an ALU, registers
and local memory
• User programs are loaded into the control unit
memory from an external source
• The control unit then decodes and decides where the
instruction should be executed.
ILLIAC IV
• Branching instructions are executed directly on the
control unit
• Vector instructions are sent to the processing
elements for distributive execution to achieve spatial
parallelism via duplicate arithmetic units
• The data can be loaded into the processing element’s
memory from external source via system bus and
broadcast mode of control unit
ILLIAC
•
IV
Masking scheme is used to control the status of
processing element during execution
• A PE may either be activated or deactivated during an
instruction cycle
• Enabled PE only perform execution
• Data exchange between the PE is done via
interconnection network that performs data routing
and manipulation function
ILLIAC IV
• Interconnection network is too under the supervision
of control unit
• A host computer is interfaced with array processor
through control unit
• A host computer is a general purpose machine which
serves as a overall manager of the entire system.
• Host : front-end machine; manages resources, i/o activity
• Array Processor can be considered as back-end attached
computer
• Note: each node has local memory
Burroughs Scientific Processor(BSP)
• Effectively a successor to the ILLIAC IV machine
• But with an architecture modified to reflect the fact that the BSP was
intended to be a commercial product
• It has fewer processing units than ILLIAC IV
• All processors enjoyed equal access to a common logical address space
which was divided into a number of physically separate memory
modules
• Each processing element is nothing more than an arithmetic unit with
input and output registers, and these units are homogeneous and non-
pipelined
• Note: All nodes share the same memory howsoever the memory are divided
into modules
SIMD Interconnection Network
1D : Linear; 2D: Ring, Mesh,Star, Tree; 3D: Hypercube
SIMD Interconnection Network
• Data exchange among the PE’s are done via interconnection
network
• They perform all data routing and manipulation functions
• Architecture of an interconnection network is based on
topology
• Static: pattern is fixed and cannot be reconfigured
• Dynamic : pattern inside the network are not fixed
• Depending on the number of stages it is divided into two
types
• Single stage
• Multi stage
NOTE: refer 334 page no of two author kai Hwang and Briggs
Single Stage
• Only one stage is used
• Depending on the interstage connection used a single stage is also
known as recirculating network
• Data may recirculate single stage many times before reaching their
destination
• E.g : Crossbar Network
NOTE: refer 334 page no of two author kai Hwang and Briggs
Multi Stage
• Consist of many stages interconnected switch
• Characterized by switch box and network connectivity
• The connectivity is controlled by choice of interstage
connection pattern
• A switch box may have any of the four patterns
mentioned in the next slide.
NOTE: refer 337 page no of two author kai Hwang and Briggs
1: Straight
2:Exchange
3: Lower Broadcast
4:Upper Broadcast
NOTE: refer 337 page no of two author kai Hwang and Briggs
Mesh Connected Illilac Network
No of Nodes = N=16
No of interconnection per node (r) =√N=4
Max no of hops≤ √N-1 Example:
Routing function for connecting ith node: Node 3 will be connected with:
R+i(1)= (i+1) mod N 0≤ i ≤N-I i+1= 4th node
R-i(1)= (i-1) mod N i-1=3rd node
R+r(i)= (r+i) mod N i+4=7th node
R-r(i)= (r-i) mod N i-4=(-1)=15
Shuffle exchange and omega Networks:
• The class of shuffle- exchange network is based on two routing function
shuffle(S) and exchange(G).
• Two types: perfect shuffle and Inverse shuffle
• For perfect shuffle
• Let A=an-1, an-2,…….a1,a0 be a PE address.
• S(an-1, an-2,…….a1,a0 )=an-2,…….a1,a0 , an-1 ..
• N= number of PE’s.
• n= log n
• The cyclic shifting of bits in A to the left for one bit position is performed by
the S.
NOTE: 350 page no of two author kai Hwang and Briggs
PERFECT SHUFFLE
NOTE: 351 page no of two author kai Hwang and Briggs
Shuffle exchange and omega Networks:
• Inverse shuffle
• Let A=an-1, an-2,…….a1,a0 be a PE address.
• S(an-1, an-2,…….a1,a0 )=a0, an-1, an-2,…….a1
• N= number of PE’s.
• n= log n
• The cyclic shifting of bits in A to the right for one bit position is
performed by the S.
NOTE: 350 page no of two author kai Hwang and Briggs
Inverse Shuffle
NOTE: 351 page no of two author kai Hwang and Briggs
OMEGA NETWORK (by using perfect
shuffle)
NOTE: refer Video
BLOCKING STATE
• If the i/o ports are on the same side then the network
is called one-sided networks, also known as full
switches
• Two side multistage has an input and output side
• This can be classified further as blocking or non blocking
• If the simultaneous connections of some multiple i/o pairs result in
conflicts in switches or links, then the multistage network is known
as blocking network. Eg: OMEGA NETWORK
• If the network can perform all possible connections sources (input)
and destination(output) by rearranging its connection then its
known as non-blocking network. Eg: Benes Network
Cube Interconnection Network
• Cube network can be implemented as either a re-circular network or
as multistage network for SIMD.
• In cube network by single cube we can able to connect 8 PEs.
• To connect 16 node we need two cube and so on.
• For representing 8 PEs 3bits are required.
• Interconnection made by using following rule:
• vertical lines connect vertices(PEs) differ in the most significant bit.
• Horizontal line differs in the least significant bit.
• Vertices at both ends of diagonal lines differs in the middle bit position.
(refer page no:343
Cube Interconnection Network
Assignment:
• Design a 4-cube network for an array processor consisting of 16
Processing Elements (PEs). Trace the path to route a packet of data
from the node 0110 to 1101.
• Barrel Shifter Network (refer page no:345)
• With the aid of an example explain the ‘Masking’ and ‘Data Routing’
mechanism in an SIMD Array processor.
Thank You