
Unit # 5

Pipeline and Vector Processing

Dr. Rajesh Tiwari


Professor (CSE – AIML)
CMREC, Hyderabad, Telangana
Characteristics of multiprocessors
• A multiprocessor system is an interconnection of two or more
CPUs with memory and input-output equipment.
• The term “processor” in multiprocessor can mean either a central
processing unit (CPU) or an input-output processor (IOP).
• Multiprocessors are classified as multiple instruction stream,
multiple data stream (MIMD) systems
• The similarity and distinction between multiprocessor and
multicomputer are:
– Similarity
• Both support concurrent operations
– Distinction
• A multicomputer network consists of several autonomous computers that may or may not communicate with each other.
• A multiprocessor system is controlled by one operating system that provides interaction between processors, and all the components of the system cooperate in the solution of a problem.
Characteristics of multiprocessors
• Multiprocessing improves the reliability of the system.
• The benefit derived from a multiprocessor organization is an
improved system performance.
– Multiple independent jobs can be made to operate in parallel.
– A single job can be partitioned into multiple parallel tasks.
• Multiprocessing can improve performance by decomposing a
program into parallel executable tasks.
– The user can explicitly declare that certain tasks of the program be
executed in parallel.
• This must be done prior to loading the program by specifying the parallel
executable segments.
– The other approach is to provide a compiler with multiprocessor software that can automatically detect parallelism in a user’s program.
Characteristics of multiprocessors
• Multiprocessors are classified by the way their memory is organized.
– A multiprocessor system with common shared memory is classified as a
shared-memory or tightly coupled multiprocessor.
• Tolerate a higher degree of interaction between tasks.
– Each processor element with its own private local memory is classified as
a distributed-memory or loosely coupled system.
• Are most efficient when the interaction between tasks is minimal
Interconnection Structures
• The components that form a multiprocessor system are CPUs,
IOPs connected to input-output devices, and a memory unit.
• The interconnection between the components can have
different physical configurations, depending on the number of
transfer paths that are available
– Between the processors and memory in a shared memory system
– Among the processing elements in a loosely coupled system
• There are several physical forms available for establishing an
interconnection network.
– Time-shared common bus
– Multiport memory
– Crossbar switch
– Multistage switching network
– Hypercube system
Time Shared Common Bus
• A common-bus multiprocessor system consists of a number of
processors connected through a common path to a memory unit.
• Disadvantages:
– Only one processor can communicate with the memory or
another processor at any given time.
– As a consequence, the total overall transfer rate within the
system is limited by the speed of the single path
• Fig. 5.1 shows the basic time-shared common bus organization; a more economical dual bus structure is depicted in Fig. 5.2.
• Part of the local memory may be designed as a cache memory
attached to the CPU.
Time Shared Common Bus

Figure 5.1: Time shared common bus organization


Time Shared Common Bus

Figure 5.2: System bus structure for multiprocessors


Multiport Memory
• A multiport memory system employs separate buses between
each memory module and each CPU.
• The module must have internal control logic to determine which
port will have access to memory at any given time.
• Memory access conflicts are resolved by assigning fixed
priorities to each memory port.
• Advantage:
– The high transfer rate can be achieved because of the multiple paths.
• Disadvantage:
– It requires expensive memory control logic and a large number of cables
and connections
• Fig. 5.3 shows a multiport memory system.
Multiport Memory

Figure 5.3: Multiport memory organization


Crossbar Switch
• This consists of a number of crosspoints that are placed at
intersections between processor buses and memory module
paths.
• The small square in each crosspoint is a switch that determines
the path from a processor to a memory module.
• Advantage:
– Supports simultaneous transfers from all memory modules
• Disadvantage:
– The hardware required to implement the switch can become
quite large and complex.
• Fig. 5.4 shows the functional design of a crossbar switch
connected to one memory module.
• Fig. 5.5 shows a block diagram of the crossbar switch.
Crossbar Switch

Figure 5.4: Crossbar switch


Crossbar Switch

Figure 5.5: Block diagram of crossbar switch


Multistage Switching Network
• The basic component of a multistage network is a two-input, two-output interchange switch, as shown in the figure below.
Multistage Switching Network
• Using the 2x2 switch as a building block, it is possible to build
a multistage network to control the communication between a
number of sources and destinations.
– To see how this is done, consider the binary tree shown in the figure below.
– Certain request patterns cannot be satisfied simultaneously; e.g., if P1 is connected to a destination in 000–011, then P2 can only be connected to a destination in 100–111.
Multistage Switching Network
• One such topology is the omega switching network shown in Fig.
5.6.

Fig. 5.6: 8 x 8 Omega Switching Network


• A particular request is initiated in the switching network by the
source, which sends a 3-bit pattern representing the destination
number.
• As the binary pattern moves through the network, each level
examines a different bit to determine the 2 x 2 switch setting.
• Level 1 inspects the most significant bit, level 2 inspects the middle bit, and level 3 inspects the least significant bit.
• When a request arrives on either input of a 2 x 2 switch, it is routed to the upper output if the specified bit is 0, or to the lower output if the bit is 1.
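The destination-tag routing just described can be sketched in a few lines of Python (an illustration of ours, not part of the original slides; the function name and "upper"/"lower" labels are our own):

```python
def omega_route(dest, stages=3):
    """Per-stage switch settings for an 8 x 8 omega network.

    Level 1 examines the most significant bit of the 3-bit destination,
    each later level the next bit; a 0 bit selects the upper output of
    the 2 x 2 switch, a 1 bit the lower output.
    """
    settings = []
    for level in range(stages):
        bit = (dest >> (stages - 1 - level)) & 1
        settings.append("lower" if bit else "upper")
    return settings

# Destination 101 routes lower, upper, lower through the three levels.
print(omega_route(0b101))  # ['lower', 'upper', 'lower']
```

Every source uses the same rule, which is why the switch settings depend only on the destination address, not on which source issued the request.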
Multistage Switching Network
• Some request patterns cannot be connected simultaneously; e.g., two particular sources may not be connectable simultaneously to destinations 000 and 001.

• In a tightly coupled multiprocessor system, the source is a processor and the destination is a memory module. The first pass through the network sets up the path. Succeeding passes are used to transfer the address into memory and then transfer the data in either direction, depending on whether the request is a read or a write.

• In a loosely coupled multiprocessor system, both the source and destination are processing elements. After the path is established, the source processor transfers a message to the destination processor.
Hypercube System
• The hypercube or binary n-cube multiprocessor structure is a loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube.
– Each processor forms a node of the cube; in effect, each node contains not only a CPU but also local memory and an I/O interface.
– Each processor address differs from that of each of its n neighbors by exactly one bit position.
• Fig. 5.7 shows the hypercube structure for n = 1, 2, and 3.
• Routing messages through an n-cube structure may take from one to n links from a source node to a destination node.
– A routing procedure can be developed by computing the exclusive-OR of the source node address with the destination node address.
– The resulting binary value has 1 bits in the positions where the two node addresses differ; the message is then sent along any one of those axes.
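The exclusive-OR routing procedure can be illustrated with a short Python sketch (names are ours; the slides do not prescribe an implementation, and real routers pick one differing axis per hop exactly as below):

```python
def hypercube_route(src, dst, n):
    """Route a message in an n-cube by flipping, one hop at a time,
    each bit position where the source and destination addresses
    differ (the 1 bits of src XOR dst)."""
    path = [src]
    node = src
    diff = src ^ dst          # 1 bits mark the axes on which the nodes differ
    for axis in range(n):
        if diff & (1 << axis):
            node ^= (1 << axis)   # move one link along this axis
            path.append(node)
    return path

# In a 3-cube, node 000 reaches node 101 in two hops: 000 -> 001 -> 101.
print([format(v, "03b") for v in hypercube_route(0b000, 0b101, 3)])
```

The number of hops equals the number of 1 bits in src XOR dst, which is why a route never needs more than n links.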
Hypercube System
• A representative of the hypercube architecture is the Intel iPSC
computer complex.
• It consists of 128 (n = 7) microcomputers; each node consists of a CPU, a floating-point processor, local memory, and serial communication interface units.

Fig. 5.7: Hypercube structures for n=1,2,3


Interprocessor Arbitration
• System Bus
– A typical system bus consists of approximately 100 signal lines.
– These lines are divided into three functional groups: data, address, and control.
– In addition, there are power distribution lines that supply power to the components.
– For example, the IEEE standard 796 multibus system has 16 data lines, 24 address lines, 26 control lines, and 20 power lines, for a total of 86 lines.
Interprocessor Arbitration
Serial Arbitration Procedure:
• The serial priority resolving technique is obtained from a daisy-
chain connection of bus arbitration circuits similar to the priority
interrupt logic presented.

• The processors connected to the system bus are assigned priority


according to their position along the priority control line.

• The device closest to the priority line is assigned the highest


priority.

• When multiple devices concurrently request the use of the bus, the
device with the highest priority is granted access to it.
Interprocessor Arbitration
Serial Arbitration Procedure:
• Next Figure shows the daisy-chain connection of four arbiters.
• It is assumed that each processor has its own bus arbiter logic
with priority-in and priority-out lines.
• The priority out (PO) of each arbiter is connected to the
priority in (PI) of the next-lower-priority arbiter.
• The PI of the highest-priority unit is maintained at a logic 1
value.
• The highest-priority unit in the system will always receive
access to the system bus when it requests it.
• The PO output for a particular arbiter is equal to 1 if its PI
input is equal to 1 and the processor associated with the arbiter
logic is not requesting control of the bus.
Interprocessor Arbitration

Figure: Serial (daisy-chain) arbitration


Interprocessor Arbitration
• If the processor requests control of the bus and the corresponding arbiter finds its PI input equal to 1, it sets its PO output to 0.
• Lower-priority arbiters receive a 0 in PI and generate a 0 in PO.
• Thus the processor whose arbiter has PI = 1 and PO = 0 is the one that is given control of the system bus.
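The PI/PO rule can be modeled behaviorally in Python (a sketch of ours; the real arbiters are combinational hardware, but the grant logic is the same):

```python
def daisy_chain_grant(requests):
    """Model the serial (daisy-chain) arbitration line.

    requests[0] belongs to the highest-priority arbiter.  PI of the
    first arbiter is held at 1; each arbiter asserts PO = 1 only if
    its PI is 1 and its own processor is not requesting the bus.
    The arbiter with PI = 1 and PO = 0 is granted the bus.
    """
    pi = True
    for i, requesting in enumerate(requests):
        po = pi and not requesting    # pass priority along only if idle
        if pi and not po:
            return i                  # this arbiter wins the bus
        pi = po
    return None                       # no processor requested the bus
```

With arbiters 1 and 2 both requesting, arbiter 1 (closer to the head of the chain) wins, exactly as the positional priority rule dictates.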
Interprocessor Arbitration
Parallel Arbitration Logic:
• The parallel bus arbitration technique uses an external priority encoder and a decoder, as shown in the next figure.
• Each bus arbiter in the parallel scheme has a bus request output line and a bus acknowledge input line.
• Each arbiter enables its request line when its processor is requesting access to the system bus.
• The processor takes control of the bus if its acknowledge input line is enabled.
• The bus busy line provides an orderly transfer of control, as in the daisy-chaining case.
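The encoder/decoder pair can be sketched behaviorally in Python (our illustration; the figure's scheme uses a hardware priority encoder feeding a decoder, modeled here as one function):

```python
def parallel_arbitrate(requests):
    """Model parallel arbitration: a priority encoder picks the
    highest-priority active request (index 0 is highest), and the
    decoder asserts that arbiter's acknowledge line (one-hot)."""
    ack = [False] * len(requests)
    for i, requesting in enumerate(requests):
        if requesting:
            ack[i] = True             # decoder enables this acknowledge line
            return ack, i
    return ack, None                  # no requests: nothing acknowledged
```

Unlike the daisy chain, all request lines are examined at once, so arbitration time does not grow with the number of processors.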
Interprocessor Arbitration

Figure: Parallel arbitration


Interprocessor Communication
& Synchronization
• Interprocessor communication is used for interchanging useful information among various regions in one or more processes (or programs).
• This communication could involve letting another process know that some event has occurred, or transferring data from one process to another.
Interprocessor Communication
& Synchronization

Figure: Interprocess communication


Interprocessor Communication
& Synchronization
• Synchronization is an essential part of interprocess communication.
• It refers to the case where the data used to communicate between processors is control information.
• It is either provided by the interprocess control mechanism or handled by the communicating processes.
• It is required to maintain the correct sequence of processes and to ensure fair access to shared writable data.
Interprocessor Communication
& Synchronization
• Multiprocessor systems have various mechanisms for
the synchronization of resources.
• Some common methods for providing synchronization are:
– Mutual Exclusion
– Semaphore
– Barrier
– Spinlock
Interprocessor Communication
& Synchronization
• Mutual Exclusion
– Mutual exclusion requires that only a single process or thread can enter the critical section at a time.
– This also helps synchronize the processes and prevents race conditions by keeping shared state consistent.
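A minimal Python illustration of mutual exclusion (our example, not from the slides): a lock guards the critical section so that concurrent increments of a shared counter are never lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:            # critical section: one thread at a time
            counter += 1

# Four threads race on the counter; the lock keeps every update intact.
threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the `with lock:` line, the read-modify-write of `counter` could interleave between threads and updates would be lost.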
Interprocessor Communication
& Synchronization
• Semaphore
– Semaphore is a type of variable that generally controls the
access to the shared resources by several processes.
– Semaphore is divided into two types as follows:
• Binary Semaphore
A binary semaphore is limited to the values zero and one. It can be used to control access to a single resource; in particular, it can enforce mutually exclusive access to a critical section in user code.

• Counting Semaphore
A counting semaphore may take any non-negative integer value. It can be used to control access to a resource that has multiple instances.
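Both semaphore types map directly onto Python's threading module (a sketch of ours, assuming a pool of two resource instances; a binary semaphore would simply be `threading.Semaphore(1)`):

```python
import threading

pool = threading.Semaphore(2)   # counting semaphore: two resource instances
guard = threading.Lock()
in_use = []
peak = 0

def use_resource(worker):
    global peak
    with pool:                  # blocks while both instances are taken
        with guard:
            in_use.append(worker)
            peak = max(peak, len(in_use))   # record concurrency observed
        with guard:
            in_use.remove(worker)

threads = [threading.Thread(target=use_resource, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the six workers are scheduled, the semaphore guarantees that at most two of them ever hold a resource instance at the same time.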
Interprocessor Communication
& Synchronization
• Barrier
– A barrier (as its name suggests) does not allow an individual process to proceed until all participating processes have reached it.
– Many parallel languages use it, and collective routines impose barriers.
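Python's `threading.Barrier` exhibits exactly this behavior (our illustration): no worker's post-barrier work can begin until every worker has arrived.

```python
import threading

barrier = threading.Barrier(3)  # all three workers must arrive
log = []
log_lock = threading.Lock()

def worker(wid):
    with log_lock:
        log.append(("before", wid))
    barrier.wait()              # no worker proceeds until all have arrived
    with log_lock:
        log.append(("after", wid))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Regardless of scheduling, every "before" entry in the log precedes every "after" entry.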
Interprocessor Communication
& Synchronization
• Spinlock
– A spinlock is a type of lock that makes a process wait until the lock becomes available before it can proceed.
– A process trying to acquire a spinlock waits in a loop, repeatedly checking whether the lock is available.
– This is also known as busy waiting, because the process does no useful work even though it remains active.
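The busy-waiting loop can be sketched in Python; the atomic test-and-set that real spinlocks get from a hardware instruction is emulated here with a non-blocking lock acquire (illustrative only):

```python
import threading

class SpinLock:
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Busy wait: keep retrying until the test-and-set succeeds.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

spin = SpinLock()
total = 0

def bump(times):
    global total
    for _ in range(times):
        spin.acquire()
        total += 1            # protected update
        spin.release()

threads = [threading.Thread(target=bump, args=(5_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 20000
```

Spinning wastes CPU cycles while waiting, so spinlocks pay off only when the expected hold time is shorter than the cost of putting the waiter to sleep.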
Cache Coherence
• The primary advantage of cache is its ability to reduce the average access time in uniprocessors.
• When the processor finds a word in cache during a read operation, the main memory is not involved in the transfer.
• If the operation is a write, there are two commonly used procedures to update memory.
• In the write-through policy, both cache and main memory are updated with every write operation.
• In the write-back policy, only the cache is updated, and the location is marked so that it can be copied later into main memory.
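The two write policies can be contrasted with a toy cache model in Python (the class and method names are our own invention, not a real API):

```python
class ToyCache:
    """Single-level cache over a dict-backed memory, illustrating the
    write-through and write-back update policies described above."""

    def __init__(self, memory, policy):
        self.memory = memory   # address -> value
        self.policy = policy   # "write-through" or "write-back"
        self.lines = {}        # cached copies
        self.dirty = set()     # locations marked for later copy-back

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value   # memory updated on every write
        else:
            self.dirty.add(addr)        # write-back: mark now, copy later

    def flush(self):
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()
```

With write-through, memory tracks every store immediately; with write-back, memory lags behind the cache until `flush()` copies the dirty lines back.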
• To ensure the ability of the system to execute memory operations correctly, the multiple copies must be kept identical.
• This requirement imposes a cache coherence problem.
• A memory scheme is coherent if the value returned on a load instruction is always the value given by the latest store instruction with the same address.
• Without a proper solution to the cache coherence problem, caching cannot be used in bus-oriented multiprocessors with two or more processors.
• The cache coherence problem arises when multiple processor cores share the same memory hierarchy but have their own L1 data and instruction caches.
• Incorrect execution could occur if two or more copies of a given cache block exist in two processors’ caches and one of these copies is modified.
• In a multiprocessor system, data inconsistency may occur
among adjacent levels or within the same level of the memory
hierarchy.
• In a shared memory multiprocessor with a separate cache
memory for each processor, it is possible to have many copies
of any one instruction operand: one copy in the main memory
and one in each cache memory.
• When one copy of an operand is changed, the other copies of
the operand must be changed also.
• Example: the cache and the main memory may hold inconsistent copies of the same object.
• Suppose there are three processors, each with its own cache, and consider the following scenario:
• Processor 1 reads X: it obtains 24 from memory and caches it.
• Processor 2 reads X: it obtains 24 from memory and caches it.
• Processor 1 then writes X = 64: only its locally cached copy is updated.
• Now, when processor 3 reads X, what value should it get?
• Memory and processor 2 think X is 24, while processor 1 thinks it is 64.
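This scenario can be replayed with three private caches over one shared memory, using write-back updates and no coherence protocol (a deliberately broken sketch of ours, written to exhibit the stale read):

```python
memory = {"X": 24}
caches = {1: {}, 2: {}, 3: {}}   # one private cache per processor

def read(proc, addr):
    if addr not in caches[proc]:        # miss: fetch from shared memory
        caches[proc][addr] = memory[addr]
    return caches[proc][addr]

def local_write(proc, addr, value):
    caches[proc][addr] = value          # write-back: only the local copy changes

read(1, "X")                 # processor 1 caches 24
read(2, "X")                 # processor 2 caches 24
local_write(1, "X", 64)      # processor 1's copy becomes 64; memory still 24
stale = read(3, "X")         # processor 3 fetches the stale value
print(stale)  # 24, even though processor 1 holds 64
```

A coherence protocol would either propagate the new value of X to memory and the other caches, or invalidate the stale copies before processor 3's read completes.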
• Because multiple processors operate in parallel and multiple caches may independently hold different copies of the same memory block, a cache coherence problem arises.
• Cache coherence is the discipline that ensures that changes in the
values of shared operands are propagated throughout the system in a
timely fashion.
• There are three distinct levels of cache coherence:
– Every write operation appears to occur instantaneously.
– All processors see exactly the same sequence of changes of values for each separate operand.
– Different processors may observe different sequences of values for the same operand; this is known as non-coherent behavior.
Thank You
