Ca 1

Uploaded by

ashikapramodpm
Module III - STRUCTURES AND ALGORITHMS FOR ARRAY PROCESSORS
SIMD ARRAY PROCESSORS

 A synchronous array of parallel processors is called
an array processor
 It consists of multiple Processing Elements (PEs)
under the supervision of one Control Unit (CU)
 An array processor can handle single-instruction,
multiple-data (SIMD) streams
 Array processors are also known as SIMD computers
 SIMD computers appear in two basic architectural
organizations:
1. Array Processors – using random-access
memory
2. Associative Processors – using content
addressable memory
SIMD Computer Organizations

 An array processor may assume one of two slightly
different configurations
Configuration I
 Configuration I is structured with N synchronized
PEs, all of which are under the control of one CU.
 Each PEi is essentially an arithmetic logic unit (ALU)
with attached working registers and a local memory
PEMi
 The CU has its own main memory for the storage of
programs
 The system and user programs are executed under
the control of CU
 The user programs are loaded into the CU memory
from the external source
 The function of the CU is to decode all the
instructions and determine where the decoded
instructions should be executed
 Scalar or control-type instructions are directly
executed inside the CU.
 Vector instructions are broadcast to the PEs for
distributed execution to achieve spatial parallelism
through duplicate arithmetic units (PEs).
 All the PEs perform the same function
synchronously in a lock-step fashion under the
command of the CU.
 Vector operands are distributed to the PEMs before
parallel execution in the array of PEs.
 The distributed data can be loaded into the PEMs
from an external source via the system data bus, or
via the CU in a broadcast mode using the control bus
 Masking schemes are used to control the status of
each PE during the execution of a vector instruction.
 Each PE may be either active or disabled during an
instruction cycle.
 A masking vector is used to control the status of all
PEs.
 Only enabled PEs perform computation
 Data exchanges among the PEs are done via an inter-
PE communication network, which performs all
necessary data-routing and manipulation functions.
 This interconnection network is under the control of
the control unit.
 An array processor is normally interfaced to a host
computer through the control unit.
 The host computer is a general-purpose machine
which serves as the "operating manager" of the
entire system, consisting of the host and the
processor array.
 The functions of the host computer include resource
management and peripheral and I/O supervision.
 The control unit of the processor array directly
supervises the execution of programs, whereas the
host machine performs the executive and I/O
functions with the outside world.
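The lock-step execution and masking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `broadcast`, the list-based operands, and the rule that a disabled PE keeps its old operand value are hypothetical choices, not details taken from any particular machine.

```python
from typing import Callable, List


def broadcast(op: Callable[[int, int], int],
              a: List[int], b: List[int], mask: List[int]) -> List[int]:
    """Apply one broadcast vector instruction across all PEs.

    a[i] and b[i] stand for the operands held by PE i; mask[i] is 1 for
    an enabled PE and 0 for a disabled one. A disabled PE does not
    execute the instruction, so its a[i] is returned unchanged.
    """
    if not (len(a) == len(b) == len(mask)):
        raise ValueError("one operand pair and one mask bit per PE are required")
    return [op(x, y) if m == 1 else x for x, y, m in zip(a, b, mask)]


# Four PEs, each holding one element of two vector operands.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
mask = [1, 0, 1, 1]  # PE 1 is disabled for this instruction

result = broadcast(lambda x, y: x + y, a, b, mask)
print(result)
```

Every enabled PE applies the same operation to its own data, which is the spatial parallelism the CU exploits; the list comprehension stands in for hardware that does this simultaneously.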
Configuration II
 Configuration II differs from configuration I
in two aspects.
1. The local memories attached to the PEs are now
replaced by parallel memory modules shared by all
the PEs through an alignment network.
2. The inter-PE permutation network is replaced by
the inter-PE memory alignment network, which is
again controlled by the CU.
 A good example of a configuration II SIMD machine
is the Burroughs Scientific Processor (BSP).
 There are N PEs and P memory modules in
configuration II.
 The two numbers are not necessarily equal.
 In the BSP they were chosen to be relatively prime
(16 PEs and 17 memory modules).
 The alignment network is a path-switching network
between the PEs and the parallel memories.
 Such an alignment network is desired to allow
conflict-free accesses of the shared memories by as
many PEs as possible.
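One way to see why a module count that is relatively prime to common access strides helps is to count memory-module collisions. The sketch below assumes a simple mapping in which address a lives in module a mod P; that mapping, the function names, and the choice of P = 17 as an illustrative value relatively prime to N = 16 are assumptions made for this example only.

```python
def modules_hit(start: int, stride: int, n_pes: int, n_modules: int) -> list:
    """Module touched by each PE, assuming address a is stored in module a mod P."""
    return [(start + i * stride) % n_modules for i in range(n_pes)]


def conflict_free(start: int, stride: int, n_pes: int, n_modules: int) -> bool:
    """True when no two PEs address the same memory module."""
    hits = modules_hit(start, stride, n_pes, n_modules)
    return len(set(hits)) == len(hits)


N = 16  # number of PEs

# With 17 modules, every stride from 1 to 16 is conflict free.
prime_ok = all(conflict_free(0, s, N, 17) for s in range(1, 17))

# With 16 modules, a stride of 2 already sends two PEs to each even module.
power_of_two_ok = conflict_free(0, 2, N, 16)

print(prime_ok, power_of_two_ok)
```

Under this mapping a stride that is a multiple of 17 would still send every PE to the same module, so the scheme reduces conflicts for common strides rather than eliminating them.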
SIMD computer

 Formally, an SIMD computer C is characterized by
the following set of parameters:
C = < N, F, I, M >
 Where
 N = the number of PEs in the system. For example,
the Illiac-IV has N= 64, the BSP has N =16, and the
MPP has N = 16,384.
 F = a set of data-routing functions provided by the
interconnection network (in configuration I) or by
the alignment network (in configuration II).
 I = the set of machine instructions for scalar-vector,
data-routing, and network-manipulation operations.
 M = the set of masking schemes, where each mask
partitions the set of PEs into the two disjoint subsets
of enabled PEs and disabled PEs.
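The four parameters can be written down as a small record to make their roles concrete. This is a hedged sketch: the class name `SIMDComputer`, the example routing function `shift_plus_1`, the instruction names, and the mask name `even` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class SIMDComputer:
    n: int                                  # N: number of PEs
    routing: Dict[str, Callable[[int], int]]  # F: data-routing functions
    instructions: Set[str]                  # I: machine instructions
    masks: Dict[str, List[int]]             # M: masking schemes

    def partition(self, mask_name: str) -> Tuple[List[int], List[int]]:
        """Split PE addresses into (enabled, disabled) under one mask."""
        mask = self.masks[mask_name]
        enabled = [i for i in range(self.n) if mask[i] == 1]
        disabled = [i for i in range(self.n) if mask[i] == 0]
        return enabled, disabled


c = SIMDComputer(
    n=8,
    routing={"shift_plus_1": lambda i: (i + 1) % 8},  # hypothetical routing function
    instructions={"vadd", "route", "setmask"},         # hypothetical instruction names
    masks={"even": [1, 0, 1, 0, 1, 0, 1, 0]},
)

enabled, disabled = c.partition("even")
destinations = [c.routing["shift_plus_1"](i) for i in range(c.n)]
print(enabled, disabled, destinations)
```

The two lists returned by `partition` are disjoint and together cover all N addresses, which is what it means for a mask to partition the set of PEs.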
Masking and Data Routing Mechanisms

 Each PEi is
o a processor with its own memory PEMi ,
o a set of working registers and flags, namely Ai ,
Bi , Ci and Si,
o an arithmetic logic unit,
o a local index register Ii ,
o an address register Di and
o a data-routing register Ri .
 The Ri of each PEi is connected to the Ri of other PEs
via the interconnection network.
 When data transfer among PEs occurs, it is the
contents of the Ri registers that are being
transferred.
 We denote the N PEs as PEi for i = 0, 1, 2, ..., N-1,
where the index i is the address of PEi.
 We assume N = 2^m, so m = log2 N binary digits are
needed to encode the address of a PE.
 The address register Di is used to hold the m bit
address of the PEi.
 This PE structure is essentially based on the design
in Illiac IV
 Some array processors may use two routing registers,
one for input and the other for output.
 Each PEi is either active or in the inactive mode
during each instruction cycle.
 If a PEi is active, it executes the instruction broadcast
to it by the CU.
 If a PEi is inactive, it will not execute the broadcast
instruction.
 The masking schemes are used to specify the status
flag Si of PEi.
 The convention Si = 1 is chosen for an active PEi and
Si = 0 for an inactive PEi .
 In the CU, there is a global index register I and a
Masking register M.
 The M register has N bits.
 The physical length of a vector is determined by the
number of PEs.
 The CU performs the segmentation of a long vector
into vector loops, the setting of a global address, and
the offset increment.
 In an array processor, vector operands can be
specified by the registers to be used or by the
memory addresses to be referenced.
 For memory-reference instructions, each PEi
accesses the local PEMi offset by its own index
register Ii.
 The register Ii modifies the global memory address
broadcast from the CU.
 Thus, different locations in different PEMs can be
accessed simultaneously with the same global
address specified by the CU.
 Array processors are special purpose computers for
limited scientific applications.
 The PEs in the array are passive arithmetic units
waiting to be called on for parallel computation duties.
 The permutation network among PEs is under
control from the CU
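The vector segmentation done by the CU and the indexed memory access done by each PE can be sketched as follows. The helper names `segment` and `indexed_fetch`, the list-of-lists model of the PEMs, and the use of `None` for a disabled PE are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple


def segment(length: int, n_pes: int) -> List[Tuple[int, int]]:
    """Split a long vector into loops of at most n_pes elements.

    Returns (start, end) index pairs, one per vector loop.
    """
    return [(s, min(s + n_pes, length)) for s in range(0, length, n_pes)]


def indexed_fetch(pems: List[List[int]], global_addr: int,
                  index_regs: List[int], mask: List[int]) -> List[Optional[int]]:
    """Each enabled PE i reads PEM_i[global_addr + I_i]; a disabled PE reads nothing."""
    return [pem[global_addr + ix] if m == 1 else None
            for pem, ix, m in zip(pems, index_regs, mask)]


# Four PEs; word j of PEM i holds the value 10*i + j.
pems = [[10 * i + j for j in range(4)] for i in range(4)]

loops = segment(10, 4)
diagonal = indexed_fetch(pems, 0, [0, 1, 2, 3], [1, 1, 1, 1])
print(loops, diagonal)
```

With the same global address 0 and index registers 0 to 3, the four PEs read a different location in each PEM, which here picks out a diagonal of the stored data.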
Inter-PE Communications

 Network design decisions for inter-PE
communications are:
1. Operation modes
2. Control strategies
3. Switching methodologies
4. Network topologies
 These are fundamental decisions in determining the
appropriate architecture of an interconnection
network for an SIMD machine.
 The decisions are made between operation modes,
control strategies, switching methodologies, and
network topologies.
Operation modes

 Two types of communication can be identified:
1. Synchronous: Synchronous communication is
needed for establishing communication paths
synchronously for either a data-manipulating
function or a data/instruction broadcast.
2. Asynchronous: Asynchronous communication
is needed for multiprocessing in which
connection requests are issued dynamically.
 A system may also be designed to facilitate both
synchronous and asynchronous processing.
 The typical operation modes of interconnection
networks can be classified into three categories:
synchronous, asynchronous, and combined.
 All existing SIMD machines choose the synchronous
operation mode, in which lock-step operations
among all PEs are enforced.
Control strategies

 An interconnection network consists of a number of
switching elements and interconnecting links.
 Interconnection functions are realized by properly setting
the controls of the switching elements.
 The control-setting function can be managed by a
centralized controller or by the individual switching
element.
 The latter strategy is called distributed control and the
first strategy corresponds to centralized control.
 Most existing SIMD interconnection networks choose the
centralized control on all switch elements by the control
unit.
Switching methodologies

 The two major switching methodologies are:
 Circuit switching : A physical path is actually
established between a source and a destination.
 Packet switching : Data is put in a packet and
routed through the interconnection network
without establishing a physical connection path.
 Circuit switching is much more suitable for bulk data
transmission, and packet switching is more efficient
for many short data messages.
 Another option, integrated switching, includes the
capabilities of both circuit switching and packet
switching.
 Most SIMD interconnection networks are hardwired
to assume circuit switching operations.
 Packet switched networks have been suggested
mainly for MIMD machines.
Network topologies

 A network can be depicted by a graph in which nodes
represent switching points and edges represent
communication links.
 The topologies tend to be regular and can be grouped
into two categories:
 Static: Links between two processors are passive,
and the dedicated buses cannot be reconfigured for
direct connections to other processors.
 Dynamic: Links in the dynamic category can be
reconfigured by setting the network’s active
switching elements.
 The space of interconnection networks can be
represented by the Cartesian product of the above
four sets of design features:
{operation mode} × {control strategy} × {switching
methodology} × {network topology}
 The choice of a particular interconnection network
depends on the application demands, technology
supports, and cost-effectiveness.
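The design space can be enumerated directly as a Cartesian product. The sketch below uses the alternatives named in these notes for each feature; the variable names and the short string labels are illustrative choices.

```python
from itertools import product

operation_modes = ["synchronous", "asynchronous", "combined"]
control_strategies = ["centralized", "distributed"]
switching_methods = ["circuit", "packet", "integrated"]
topologies = ["static", "dynamic"]

# Every combination of one choice per design feature.
design_space = list(product(operation_modes, control_strategies,
                            switching_methods, topologies))

# The combination these notes describe as typical for SIMD networks,
# shown here with both topology classes.
typical_simd = [d for d in design_space
                if d[:3] == ("synchronous", "centralized", "circuit")]

print(len(design_space))
for choice in typical_simd:
    print(choice)
```

The product gives 3 × 2 × 3 × 2 = 36 points, of which SIMD machines as described here occupy only the synchronous, centralized, circuit-switched corner.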
