ch.4 and 5
4th Year
Term2
- The number of lines in the address bus (n bits) determines the maximum size of the main memory (RAM):
memory size = $2^n$ locations
- The number of lines in the data bus determines how many bits can be stored in one main memory location.
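As a quick numeric check of the two rules above, the following minimal sketch computes the number of addressable locations and the total capacity in bits. The function names and the 16-line / 8-line example values are illustrative assumptions, not part of the lecture.

```python
def memory_locations(address_lines: int) -> int:
    """Number of addressable locations for an n-line address bus: 2^n."""
    return 2 ** address_lines


def memory_capacity_bits(address_lines: int, data_lines: int) -> int:
    """Total capacity in bits: (number of locations) * (bits per location)."""
    return memory_locations(address_lines) * data_lines


# Hypothetical example: 16 address lines and 8 data lines.
print(memory_locations(16))         # 65536 locations
print(memory_capacity_bits(16, 8))  # 524288 bits (64 KiB)
```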
Interconnection Bus Structures (cont.)
Control signals (control bus): control lines used to control the operation of the system memory,
I/O devices, instruction execution, the ALU, etc.
Some of the control signals are:
- Memory read: RD
- Memory write: WR
- I/O read
- I/O write
- Clock
- Reset
- Interrupt
- Request
Several physical (hardware) techniques are available for establishing an interconnection network.
Some of these schemes are presented in this section:
1. Time-shared common bus
2. Multiport memory
3. Crossbar switch
4. Multistage switching network
5. Hypercube system
1) Time-Shared Common Bus
A common-bus multiprocessor system consists of several processors and I/O devices
connected through a common path (bus) to a main memory unit.
- Example: A time-shared common bus for three processors and two I/O devices is shown in
Fig. 4.1.
Disadvantages
- Only one processor can communicate with the memory or another processor at any given
time.
- Consequently, the overall transfer rate (bandwidth) of the system is limited by
the speed of the single path.
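The limitation above can be illustrated with a small simulation in which the bus grants exactly one transfer per cycle. This is only a sketch: the round-robin grant order and the name `run_shared_bus` are assumptions made for illustration, since the notes do not specify an arbitration scheme.

```python
from collections import deque


def run_shared_bus(requests, cycles):
    """Simulate a time-shared common bus.

    requests: dict mapping unit name -> number of pending transfers.
    Returns a list with the unit granted the bus in each cycle
    (None when no unit has a pending transfer).
    """
    order = deque(requests)   # round-robin order (an assumption)
    pending = dict(requests)
    grants = []
    for _ in range(cycles):
        granted = None
        # Look at each unit at most once per cycle, in round-robin order.
        for _ in range(len(order)):
            unit = order[0]
            order.rotate(-1)
            if pending[unit] > 0:
                pending[unit] -= 1
                granted = unit   # only ONE unit uses the bus per cycle
                break
        grants.append(granted)
    return grants


# Four transfers need four bus cycles, no matter how many processors exist.
print(run_shared_bus({"P1": 2, "P2": 1, "P3": 1}, 5))
# ['P1', 'P2', 'P3', 'P1', None]
```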
1) Time-Shared Common Bus (cont.)
The performance of the system can be increased if two or more independent buses can be used to
transfer information. However, this increases the bus cost and complexity.
o Example: A more economical alternative is a dual-bus structure, as shown in Fig. (4.2).
o Part of the local memory may be designed as a cache memory attached to the CPU.
Disadvantage
The disadvantage of the multiport memory organization is that it requires expensive memory
control logic and many cables and connectors. Consequently, this interconnection structure is
usually appropriate for systems with a small number of processors.
3) Crossbar Switch (also called a switching network)
- Consists of a set of cross-points that are placed at intersections between processor buses and
memory module paths.
- A crossbar can be defined as a switching network with N inputs and M outputs, which
allows up to min{N, M} one-to-one interconnections without contention.
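The min{N, M} bound can be seen in a minimal sketch of cross-point allocation: each memory module accepts at most one processor at a time, so the number of simultaneous connections can never exceed either the number of processors or the number of modules. The function name `crossbar_connect` and the fixed-priority rule (the lowest-numbered processor wins a conflict) are assumptions for illustration only.

```python
def crossbar_connect(requests, num_modules):
    """Grant one-to-one processor-to-module connections in a crossbar.

    requests[p] is the memory module that processor p wants, or None.
    Returns a dict {processor: module} of granted connections.
    Conflicts are resolved by fixed priority (an assumption): the
    lowest-numbered processor requesting a module gets it.
    """
    busy = set()      # modules whose cross-point column is already in use
    granted = {}
    for p, m in enumerate(requests):
        if m is None:
            continue
        if not 0 <= m < num_modules:
            raise ValueError(f"processor {p} requested unknown module {m}")
        if m not in busy:
            busy.add(m)
            granted[p] = m
    return granted


# No contention: three processors reach three different modules at once.
print(crossbar_connect([0, 1, 2], 3))   # {0: 0, 1: 1, 2: 2}
# Contention on module 0: processor 1 has to wait.
print(crossbar_connect([0, 0, 1], 3))   # {0: 0, 2: 1}
```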
Types of Crossbar Switch:
a) Uni-directional crossbar
b) Bidirectional crossbar
Disadvantage:
o The hardware required to implement the switch can become quite large and complex.
Fig. (4.4) shows the functional design of a crossbar switch connected to one memory
module.
3) Crossbar Switch (cont.)
- The small square at each cross-point in Fig. (4.4) is a switch that determines the path from
a processor to a memory module.
[Fig. 4.4: crossbar switch with processors P1 ... Pn and cross-point control signals C11, C12, ..., C1n, ..., Cn1, Cn2, ..., Cnn]
- cij are the control signals to determine ...
- ... OR gates needed.
• All processors Pi execute the same instruction simultaneously (for vector processing),
thus providing a single instruction stream with multiple data streams (SIMD operation).
b) MCM (Master Control Memory): holds the instructions and common data.
Array Processor (cont.)
- Data is exchanged between the scratchpad registers and the local memories of the Pi's. This
exchange takes place through paths provided by the Inter-Processor Communication
Network (IPCN).
Example: Consider the following recurrence equation:
$z_i = z_{i-1} + a_i$ for $i = 0, \ldots, 3$, with $z_{-1} = 0$
Using an array processor, calculate the result of the equation and draw the array processor graph
that indicates how the recurrence equation is calculated. Determine the number of steps
needed to evaluate the equation.
Solution
First, expand the recurrence equation $z_i = z_{i-1} + a_i$ for $i = 0, \ldots, 3$, with $z_{-1} = 0$:
$z_0 = z_{-1} + a_0 = 0 + a_0 = a_0$   (i = 0)
$z_1 = z_0 + a_1 = a_0 + a_1$   (i = 1)
$z_2 = z_1 + a_2 = a_0 + a_1 + a_2$   (i = 2)
$z_3 = z_2 + a_3 = a_0 + a_1 + a_2 + a_3$   (i = 3)
- To evaluate the recurrence equation on an array processing system, we need four processing
elements (4 PEs).
- We assume that each PE (Pi) is initialized with the data $a_i$. The following graph shows how
the values of $z_i$ are calculated.
                  P0        P1        P2            P3
initialization    a0        a1        a2            a3
Step 2            Disable   Disable   Enable        Enable
after Step 2      a0        a0+a1     a0+a1+a2      a0+a1+a2+a3
                  (= z0)    (= z1)    (= z2)        (= z3)
Notes:
- In general, for an array processor system with N processing elements (where N is a power of
2), it is possible to evaluate the N values $z_0, z_1, \ldots, z_{N-1}$ in $\log_2 N$ steps.
- Also, $2^{k-1}$ processing elements must be disabled during step k.
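The two notes above can be checked with a short simulation of the array processor. It is a sketch under stated assumptions: in step k every enabled PE i adds the value held by PE i - 2^(k-1) (the usual recursive-doubling scheme, which matches the 2^(k-1) disabled PEs per step), and all PEs update simultaneously. The name `prefix_sums_simd` is hypothetical.

```python
def prefix_sums_simd(a):
    """Evaluate z_i = z_{i-1} + a_i (z_{-1} = 0) on N processing elements.

    Returns (z, steps). N must be a power of 2.
    In step k, the PEs 0 .. 2^(k-1) - 1 are disabled; every other PE i
    adds the value currently held by PE i - 2^(k-1).
    """
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("number of PEs must be a power of 2")
    z = list(a)            # initialization: PE i holds a_i
    shift = 1              # 2^(k-1) for step k = 1
    steps = 0
    while shift < n:
        old = list(z)      # all PEs read before any PE writes (SIMD step)
        for i in range(shift, n):      # PEs 0 .. shift-1 are disabled
            z[i] = old[i] + old[i - shift]
        shift *= 2
        steps += 1
    return z, steps


# a0..a3 = 1, 2, 3, 4:
#   after step 1 (P0 disabled):      [1, 3, 5, 7]
#   after step 2 (P0, P1 disabled):  [1, 3, 6, 10]
print(prefix_sums_simd([1, 2, 3, 4]))   # ([1, 3, 6, 10], 2)
```

With N = 4 the result is reached in log2(4) = 2 steps, compared with the 4 sequential additions of the expanded recurrence.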
Usage of Array Processors
• Array processors enhance the total speed of instruction processing.
• Most array processors are designed to optimize performance for repetitive arithmetic
operations, making them faster at vector arithmetic than the host CPU.
• Since most array processors run asynchronously from the host CPU, the system's overall
capacity is improved.
• Array processors have their own local memory, providing additional memory to the system.
This is an important consideration for systems with limited physical memory or address space.
Applications:
Array processing is used at various places, including:
[Figure: array cell with inputs X1 and X2, shown before and after a step]
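Solution: Matrix multiplication Z = X*Y
$$\begin{pmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{pmatrix} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix} * \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}$$
$Z_{11} = X_{11} Y_{11} + X_{12} Y_{21}$
$Z_{12} = X_{11} Y_{12} + X_{12} Y_{22}$
$Z_{21} = X_{21} Y_{11} + X_{22} Y_{21}$
$Z_{22} = X_{21} Y_{12} + X_{22} Y_{22}$
[Figure: initial state of the (2*2) array: the cells hold X11, X12 (row 1) and X21, X22 (row 2), zeros enter each row as the initial partial sums, and the values Y11, Y12, Y21, Y22 are queued at the inputs]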
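[Figures: array cell before and after a clock (inputs X1, X2), followed by the state of the (2*2) array at Clock 1, Clock 2, Clock 3, and Clock 4. The Y values advance through the cells holding X11, X12, X21, X22, and each row accumulates its partial sum starting from 0, for example 0 + X11*0 and then Z12 = X11 Y12 + X12 Y22.]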
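The clock-by-clock data movement of the example can be imitated in software. The following is a minimal sketch of a systolic-style matrix multiplication, not a reproduction of the figure: it assumes an output-stationary arrangement (cell (i, j) accumulates Z_ij, rows of X move right, columns of Y move down, inputs are skewed by one clock per row or column), which may differ from the arrangement drawn in the slides. The name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(X, Y):
    """Multiply two n*n matrices on a simulated n*n systolic array.

    Assumed (output-stationary) arrangement:
      - cell (i, j) keeps the running sum for Z[i][j];
      - row i of X enters from the left, delayed by i clocks;
      - column j of Y enters from the top, delayed by j clocks;
      - on every clock each cell passes its X operand to the right
        and its Y operand downward, then multiplies and accumulates.
    """
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    a = [[0] * n for _ in range(n)]   # X operands currently in the cells
    b = [[0] * n for _ in range(n)]   # Y operands currently in the cells
    for t in range(3 * n - 2):        # clocks needed to drain the array
        # Shift X operands one cell to the right, Y operands one cell down.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a[i][j] = a[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b[i][j] = b[i - 1][j]
        # Feed the skewed inputs at the left and top edges (0 when idle).
        for i in range(n):
            k = t - i
            a[i][0] = X[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - j
            b[0][j] = Y[k][j] if 0 <= k < n else 0
        # Every cell performs one multiply-accumulate per clock.
        for i in range(n):
            for j in range(n):
                Z[i][j] += a[i][j] * b[i][j]
    return Z


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

For the (2*2) case the sketch needs 3*2 - 2 = 4 clocks, and each Z_ij equals the corresponding formula above, e.g. Z11 = 1*5 + 2*7 = 19.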
Homework: Using the focal plane systolic array architecture, provide a step-by-step block diagram
approach for a (3*3) matrix multiplication.
3) Wavefront Array Processor
- Wavefront arrays are another kind of SIMD system.
- A wavefront array is very similar to a systolic array: it comprises a set of simple processing
elements (PEs) with regular, local connections, which take external inputs and process
them in a predetermined, pipelined fashion.
- Unlike a systolic array, however, it is an asynchronous network.
Example: Consider the following wavefront array cell. Provide a step-by-step block diagram
approach for a (2*2) matrix multiplication Z = X*Y.
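[Figure: wavefront array cell with lines labelled A, B, and data]
Solution: Matrix multiplication Z = X*Y
$$\begin{pmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{pmatrix} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix} * \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}$$
$Z_{11} = X_{11} Y_{11} + X_{12} Y_{21}$
$Z_{12} = X_{11} Y_{12} + X_{12} Y_{22}$
$Z_{21} = X_{21} Y_{11} + X_{22} Y_{21}$
$Z_{22} = X_{21} Y_{12} + X_{22} Y_{22}$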
[Figures: step-by-step state of the (2*2) wavefront array (Initialization, Step 1, Step 2, and the following steps). At initialization all partial sums are 0, the X values (X11, X12 and X21, X22) wait at the row inputs and the Y values (Y11, Y21 and Y12, Y22) wait at the column inputs. The first cell forms 0 + X11 Y11; next, Z11 = X11 Y11 + X12 Y21 completes while the neighbouring cells form 0 + X11 Y12 and 0 + X21 Y11; then Z12 = X11 Y12 + X12 Y22 completes.]
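The asynchronous, data-driven behaviour of the wavefront array can be sketched in software by letting each cell fire whenever both of its operands have arrived, without any global clock. This is an illustrative model only: the FIFO queues between cells, the scan order of the loop, and the name `wavefront_matmul` are assumptions, and the sketch does not model the handshake lines of the cell shown in the figure.

```python
from collections import deque


def wavefront_matmul(X, Y):
    """Multiply two n*n matrices on a simulated n*n wavefront array.

    Cell (i, j) accumulates Z[i][j]. It fires only when an operand is
    waiting on BOTH its left input and its top input (data-driven, no
    global clock), then forwards the operands to its right and lower
    neighbours.
    """
    n = len(X)
    left = [[deque() for _ in range(n)] for _ in range(n)]  # inputs from the left
    top = [[deque() for _ in range(n)] for _ in range(n)]   # inputs from above
    for i in range(n):
        left[i][0].extend(X[i])                       # row i of X enters cell (i, 0)
    for j in range(n):
        top[0][j].extend(Y[k][j] for k in range(n))   # column j of Y enters cell (0, j)

    Z = [[0] * n for _ in range(n)]
    fired = True
    while fired:                  # keep going until no cell is able to fire
        fired = False
        for i in range(n):
            for j in range(n):
                if left[i][j] and top[i][j]:
                    a = left[i][j].popleft()
                    b = top[i][j].popleft()
                    Z[i][j] += a * b
                    if j + 1 < n:
                        left[i][j + 1].append(a)   # pass X operand to the right
                    if i + 1 < n:
                        top[i + 1][j].append(b)    # pass Y operand downward
                    fired = True
    return Z


print(wavefront_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Because each cell waits for its operands instead of a clock edge, no input skewing is needed here; the computation spreads from cell (1,1) across the array like a wavefront.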
Exercise: Consider the following wavefront array cell. Provide a step-by-step block
diagram approach for a (3*3) matrix multiplication Z = X*Y.
[Figure: wavefront array cell with inputs X1 and X2, shown before and after a step]