Unit 4 COA
Parallel computing is a form of computing in which jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs. Parallel systems deal with the simultaneous use of multiple computer resources, which can include a single computer with multiple processors, a number of computers connected by a network to form a parallel processing cluster, or a combination of both. Parallel systems are more difficult to program than computers with a single processor because the architecture of parallel computers varies widely and the processes running on multiple CPUs must be coordinated and synchronized.
The crux of parallel processing is the CPU. Based on the number of instruction streams and data streams that can be processed simultaneously, computing systems are classified into four major categories, known as Flynn's classification: SISD (Single Instruction stream, Single Data stream), SIMD (Single Instruction stream, Multiple Data streams), MISD (Multiple Instruction streams, Single Data stream), and MIMD (Multiple Instruction streams, Multiple Data streams).
The speed of the processing element in the SISD model is limited by the rate at which the computer can transfer information internally. Dominant representative SISD systems are the IBM PC and workstations.
MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on the way the PEs (processing elements) are coupled to the main memory.
In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory and all have access to it. Communication between PEs in this model takes place through the shared memory; a modification of the data stored in the global memory by one PE is visible to all the other PEs. Dominant representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM SMP (Symmetric Multi-Processing) machines.
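As an illustration of the shared-memory model, below is a minimal sketch using POSIX threads to stand in for PEs: two threads update a single global variable, and a mutex provides the coordination mentioned above. The two-thread setup and the summation workload are assumptions made only for this example (compile with -lpthread).

/* Shared-memory MIMD sketch: two "PEs" (threads) share one global memory. */
#include <pthread.h>
#include <stdio.h>

static long shared_sum = 0;                       /* the single global memory  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *pe_work(void *arg)
{
    long start = (long)arg;
    for (long i = start; i < start + 1000; i++) {
        pthread_mutex_lock(&lock);                /* coordinate access         */
        shared_sum += i;                          /* update visible to all PEs */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t pe[2];
    pthread_create(&pe[0], NULL, pe_work, (void *)0L);
    pthread_create(&pe[1], NULL, pe_work, (void *)1000L);
    pthread_join(pe[0], NULL);
    pthread_join(pe[1], NULL);
    printf("shared_sum = %ld\n", shared_sum);     /* sum of 0..1999 = 1999000  */
    return 0;
}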
In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local memory. Communication between PEs in this model takes place through the interconnection network (the inter-process communication, or IPC, channel). The network connecting the PEs can be configured as a tree, a mesh, or another topology in accordance with the requirements.
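For contrast, here is a minimal sketch of the distributed-memory model: each PE is modelled as a separate process with its own address space, and a pipe plays the role of the IPC channel over which results are exchanged. The partial-sum workload is again an assumption made only for this example.

/* Distributed-memory MIMD sketch: each "PE" is a process with local memory,
 * and a pipe acts as the interconnection (IPC) channel between them. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int channel[2];                                /* [0] = read end, [1] = write end */
    if (pipe(channel) == -1)
        return 1;

    if (fork() == 0) {                             /* child PE with its own memory    */
        long local_sum = 0;                        /* invisible to the parent PE      */
        for (long i = 0; i < 1000; i++)
            local_sum += i;
        write(channel[1], &local_sum, sizeof local_sum);   /* send result over IPC    */
        _exit(0);
    }

    long child_part = 0, parent_sum = 0;
    for (long i = 1000; i < 2000; i++)             /* parent PE works on its own half */
        parent_sum += i;
    read(channel[0], &child_part, sizeof child_part);      /* receive child's result  */
    wait(NULL);
    printf("total = %ld\n", parent_sum + child_part);      /* sum of 0..1999          */
    return 0;
}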
The shared-memory MIMD architecture is easier to program but is less tolerant of failures and harder to extend than the distributed-memory MIMD model. A failure in a shared-memory MIMD system affects the entire system, whereas this is not the case in the distributed model, in which each of the PEs can be easily isolated. Moreover, shared-memory MIMD architectures are less likely to scale, because the addition of more PEs leads to memory contention; this situation does not arise with distributed memory, in which each PE has its own memory. As a result of practical outcomes and users' requirements, the distributed-memory MIMD architecture is generally considered superior to the other existing models.
Parallel processing is used to increase the computational speed of computer systems by performing multiple data-processing operations simultaneously. For example, while an instruction is being executed in the ALU, the next instruction can be read from memory. The system can have two or more ALUs and be able to execute multiple instructions at the same time. In addition, a system may have two or more processors operating concurrently. Parallel processing increases a computer's processing capacity, but the amount of hardware grows with it, and with it the cost of the system. However, technological development has reduced hardware costs to the point where parallel processing methods are economically feasible.
Parallel processing can be established at various levels of complexity. At the lowest level, parallel and serial operations are distinguished by the type of registers used: shift registers operate one bit at a time in a serial fashion, while parallel registers operate simultaneously on all the bits of a word. At higher levels of complexity, parallel processing is achieved by providing a plurality of functional units that perform identical or different operations simultaneously, and by distributing the data among these functional units. For example, the arithmetic, shift, and logic operations can be separated into three units, with the operands diverted to each unit under the supervision of a control unit. One possible method of dividing the execution unit into eight functional units operating in parallel is shown in the figure. Depending on the operation specified by the instruction, the operands in the registers are transferred to the unit associated with that operation. The operation performed in each functional unit is denoted in each block of the diagram.
The arithmetic operations with integer numbers are performed by the adder and integer multiplier.
Floating-point operations can be divided among three circuits operating in parallel. Logic, shift, and increment operations are performed concurrently on different data, and all the units are independent of each other, so one number can be shifted while another number is being incremented. Generally, a multi-functional organization is associated with a complex control unit that coordinates all the activities among the several components.
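The routing of operands to separate functional units can be sketched, very roughly, as a dispatcher that sends the operands to an adder, a shifter, or an incrementer according to the operation specified. The opcode names and the particular set of units below are hypothetical, and in real hardware the units would operate simultaneously rather than being invoked one after another.

/* Sketch: operands are routed to one of several functional units according to
 * the operation specified by the instruction (hypothetical opcodes). */
#include <stdio.h>

typedef enum { OP_ADD, OP_SHIFT_LEFT, OP_INCREMENT } opcode_t;

static int adder(int a, int b)     { return a + b;  }   /* integer adder unit */
static int shifter(int a, int amt) { return a << amt; } /* shift unit         */
static int incrementer(int a)      { return a + 1;  }   /* incrementer unit   */

/* The "control unit": selects the functional unit for the given operation. */
static int dispatch(opcode_t op, int a, int b)
{
    switch (op) {
    case OP_ADD:        return adder(a, b);
    case OP_SHIFT_LEFT: return shifter(a, b);
    case OP_INCREMENT:  return incrementer(a);
    }
    return 0;
}

int main(void)
{
    printf("%d %d %d\n",
           dispatch(OP_ADD, 3, 4),          /* 7  */
           dispatch(OP_SHIFT_LEFT, 1, 3),   /* 8  */
           dispatch(OP_INCREMENT, 9, 0));   /* 10 */
    return 0;
}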
The main advantage of parallel processing is that it provides better utilization of system resources by increasing resource multiplicity, which improves overall system throughput.
COA Pipelining
The term Pipelining refers to a technique of decomposing a sequential process into sub-operations,
with each sub-operation being executed in a dedicated segment that operates concurrently with all
other segments.
The most important characteristic of a pipeline technique is that several computations can be in
progress in distinct segments at the same time. The overlapping of computation is made possible by
associating a register with each segment in the pipeline. The registers provide isolation between
each segment so that each can operate on distinct data simultaneously.
The structure of a pipeline organization can be represented simply by including an input register for
each segment followed by a combinational circuit.
Let us consider an example of combined multiplication and addition operation to get a better
understanding of the pipeline organization.
The combined multiplication and addition operation is performed on a stream of numbers of the form Ai * Bi + Ci for i = 1, 2, 3, ...
The operation to be performed on the numbers is decomposed into sub-operations, with each sub-operation implemented in a segment within a pipeline.
The sub-operations performed in each segment of the pipeline are defined as:
R1 ← Ai, R2 ← Bi        Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci   Multiply and input Ci
R5 ← R3 + R4            Add Ci to product
The following block diagram represents the combined as well as the sub-operations performed in
each segment of the pipeline.
Registers R1, R2, R3, and R4 hold the data, and the combinational circuits operate on it in a particular segment.
The output generated by the combinational circuit in a given segment is applied to the input register of the next segment. For instance, from the block diagram, we can see that register R3 is used as one of the input registers for the combinational adder circuit.
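A small simulation of this three-segment pipeline is sketched below, assuming the sub-operations listed earlier. Each loop iteration represents one clock pulse, and the sample values of Ai, Bi, and Ci are arbitrary; the point is that once the pipeline has filled, one result R5 emerges on every subsequent clock pulse.

/* Simulation of the three-segment pipeline computing Ai*Bi + Ci. Segments are
 * evaluated back-to-front so that each register still holds the value it was
 * given on the previous clock pulse. */
#include <stdio.h>

#define N 7

int main(void)
{
    int A[N] = {1, 2, 3, 4, 5, 6, 7};   /* arbitrary sample streams */
    int B[N] = {7, 6, 5, 4, 3, 2, 1};
    int C[N] = {1, 1, 1, 1, 1, 1, 1};
    int R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;

    for (int clock = 0; clock < N + 2; clock++) {        /* +2 pulses to drain     */
        R5 = R3 + R4;                                    /* segment 3: add Ci      */
        R3 = R1 * R2;                                    /* segment 2: multiply    */
        R4 = (clock >= 1 && clock <= N) ? C[clock - 1] : 0;   /* ... and load Ci   */
        if (clock < N) { R1 = A[clock]; R2 = B[clock]; } /* segment 1: load Ai, Bi */
        if (clock >= 2)                                  /* valid after the fill   */
            printf("pulse %d: R5 = A*B + C = %d\n", clock + 1, R5);
    }
    return 0;
}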
In general, the pipeline organization is applicable to two areas of computer design:
1. Arithmetic Pipeline
2. Instruction Pipeline
1. Arithmetic Pipeline:
Components:
o Addition Stage: In this stage, the pipeline performs the addition operation. Addition is a fundamental arithmetic operation and is frequently broken down into sub-operations for efficient processing.
o Division Stage: Division is another arithmetic operation that can take advantage of pipelining. Dividing two numbers involves more than one step, and breaking the procedure down into pipeline stages can improve the overall speed of execution.
Advantages:
o Optimized Resource Utilization: The pipeline structure allows processing resources to be used efficiently. While one arithmetic operation is in the multiplication stage, another can be in the addition stage, maximizing the utilization of the processor.
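The stages of an arithmetic pipeline are not spelled out above, so the sketch below uses the standard textbook decomposition of floating-point addition (compare exponents, align mantissas, add mantissas, normalize) as a stand-in; the operand values and the decimal representation are assumptions chosen for readability, and only one operand pair is traced through the four stages (a real pipeline would hold a different pair in every stage on each clock pulse). Compile with -lm.

/* Sketch of a four-stage floating-point addition pipeline, traced for one
 * operand pair: X = 0.9504 * 10^3 and Y = 0.8200 * 10^2 (assumed values). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double mx = 0.9504, my = 0.8200;    /* mantissas */
    int    ex = 3,      ey = 2;         /* exponents */

    /* Stage 1: compare the exponents and choose the larger one. */
    int e = (ex > ey) ? ex : ey;

    /* Stage 2: align the mantissa belonging to the smaller exponent. */
    if (ex > ey) my /= pow(10.0, ex - ey);
    else         mx /= pow(10.0, ey - ex);

    /* Stage 3: add the mantissas. */
    double m = mx + my;                 /* 0.9504 + 0.0820 = 1.0324 */

    /* Stage 4: normalize the result. */
    if (m >= 1.0) { m /= 10.0; e += 1; }

    printf("result = %.5f * 10^%d\n", m, e);   /* 0.10324 * 10^4 */
    return 0;
}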
2. Instruction Pipeline:
Components:
o Instruction Fetch (IF): The first stage involves fetching the instruction from memory. The program counter is used to determine the address of the next instruction.
o Instruction Decode (ID): In this stage, the fetched instruction is decoded to determine the operation to be performed and to identify the operands involved.
o Execution (EX): The actual computation or operation specified by the instruction takes place in this stage. It may involve arithmetic or logical operations.
o Memory Access (MEM): If the instruction requires access to memory, this is the stage in which data is read from or written to memory.
o Write Back (WB): The final stage involves writing the results back to a register or to memory, completing the execution of the instruction.
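A compact sketch of how these five stages overlap is given below: on every clock cycle each instruction in flight advances one stage, so a new instruction can enter IF while older ones occupy ID, EX, MEM, and WB. The program prints the usual space-time diagram; the number of instructions is an arbitrary assumption.

/* Space-time diagram for a five-stage instruction pipeline (IF ID EX MEM WB).
 * Instruction i (counting from 0) occupies stage s during clock cycle i + s. */
#include <stdio.h>

int main(void)
{
    const char *stage[5] = {"IF ", "ID ", "EX ", "MEM", "WB "};
    int n = 4;                                   /* number of instructions (assumed) */
    int cycles = n + 5 - 1;                      /* cycles needed to finish them all */

    printf("cycle:");
    for (int c = 0; c < cycles; c++)
        printf(" %3d", c + 1);
    printf("\n");

    for (int i = 0; i < n; i++) {                /* one row per instruction */
        printf("  I%d: ", i + 1);
        for (int c = 0; c < cycles; c++) {
            int s = c - i;                       /* stage occupied in cycle c */
            if (s >= 0 && s < 5) printf(" %s", stage[s]);
            else                 printf("    ");
        }
        printf("\n");
    }
    return 0;
}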
Advantages:
o Improved Throughput: The instruction pipeline allows a continuous flow of instructions through the processor, improving overall throughput. While one instruction is in the execution stage, another can be in the decoding stage, resulting in better resource utilization.
o Faster Program Execution: By overlapping the execution of instructions, the time taken to execute a sequence of instructions is reduced. This results in faster program execution, a vital element in improving the overall performance of a computer system.
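To make the throughput claim concrete, a commonly used estimate compares n tasks on a k-segment pipeline with clock period tp against the same tasks executed one at a time in tn time units each, giving a speedup of S = (n * tn) / ((k + n - 1) * tp), which approaches k for large n. The figures used below (k = 5 segments, tp = 20 ns, tn = k * tp) are assumptions chosen only to show the trend.

/* Pipeline speedup estimate S = (n * tn) / ((k + n - 1) * tp) under assumed
 * figures: k = 5 segments, tp = 20 ns per pulse, tn = k * tp per task. */
#include <stdio.h>

int main(void)
{
    int    k  = 5;                     /* number of pipeline segments       */
    double tp = 20.0;                  /* pipeline clock period (ns)        */
    double tn = k * tp;                /* non-pipelined time per task (ns)  */

    for (int n = 1; n <= 1000; n *= 10) {
        double s = (n * tn) / ((k + n - 1) * tp);
        printf("n = %4d  speedup = %.2f\n", n, s);   /* tends toward k = 5 */
    }
    return 0;
}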
Conclusion
In short, pipelining stands as a cornerstone of processor design, offering a systematic and effective technique for improving overall performance through parallelism. Its applications range from simple instruction pipelines to the superscalar architectures seen in modern CPUs. The evolution of pipelining techniques, coupled with improvements in the memory hierarchy and in instruction-level parallelism (ILP), continues to increase the power of computer systems, pushing the bounds of their computational capabilities. Going forward, the principles of pipelining will likely remain central to the continued pursuit of faster and more reliable computing systems.