Unit 6 JSPSingh
Computer Arithmetic:
Addition and Subtraction Algorithm, Multiplication Algorithm
Introduction to Parallel Processing: Pipelining, Characteristics of Multiprocessors, Interconnection Structures, Parallel Processing
Latest technology and trends in computer architecture: next-generation processor architecture, microarchitecture, latest processors for smartphones, tablets, and desktops
Content
- Computer Arithmetic
- Multiplication Algorithm
- Parallel processing
- Characteristics of Multiprocessors
- Interconnection Structures
- Pipelining
Computer Arithmetic
Signed binary numbers can be represented in three ways:
- Signed-magnitude representation
- Signed 1's complement representation
- Signed 2's complement representation
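Signed-magnitude operations in which the magnitudes are added:
Operation       Result
(+A) + (+B)     +(A + B)
(-A) + (-B)     -(A + B)
(+A) - (-B)     +(A + B)
(-A) - (+B)     -(A + B)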
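The complete signed-magnitude rule can be sketched in Python as follows. This is a minimal version that assumes each operand is a (sign, magnitude) pair, with sign 0 for + and 1 for -. The name `sm_add_sub` is hypothetical. The sketch does not model a fixed register width, so overflow is ignored.

```python
def sm_add_sub(a_sign, a_mag, b_sign, b_mag, subtract=False):
    """Add or subtract two signed-magnitude numbers (hypothetical helper).

    Signs are 0 for + and 1 for -. Returns (sign, magnitude).
    """
    if subtract:
        b_sign ^= 1  # A - B is computed as A + (-B): flip the sign of B
    if a_sign == b_sign:
        # Same signs: add the magnitudes, result takes the sign of A
        return a_sign, a_mag + b_mag
    # Different signs: subtract the smaller magnitude from the larger one
    if a_mag > b_mag:
        return a_sign, a_mag - b_mag
    if a_mag < b_mag:
        return b_sign, b_mag - a_mag
    return 0, 0  # equal magnitudes: the result is +0
```

For example, `sm_add_sub(1, 5, 0, 3, subtract=True)` computes (-5) - (+3) and falls into the "add magnitudes" row of the table.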
[Figure: addition/subtraction hardware: B register with sign bit Bs, complementer controlled by mode control M, parallel adder with input carry and output carry E, overflow flip-flop AVF]
The signed-magnitude procedure requires a magnitude comparator, an adder, and two subtractors. The alternative of using 2's complement for the operation requires only an adder and a complementer.
- Addition: the two numbers, including their sign bits, are added in the parallel adder. A carry out of the sign-bit position is discarded.
- Subtraction: take the 2's complement of the subtrahend, then add it to the minuend.
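The 2's complement procedure can be sketched in Python as follows. This is a minimal version that assumes n-bit registers. Subtraction is done the way the hardware does it: the subtrahend passes through the complementer (B') and the adder's input carry supplies the +1. Overflow is flagged when the carry into the sign bit differs from the carry out of it. That is the standard 2's complement overflow test; the slides do not state it. `add_sub` is a hypothetical name.

```python
def add_sub(a, b, n, subtract=False):
    """Return (result, overflow) for a + b or a - b in n-bit two's complement."""
    mask = (1 << n) - 1
    x = a & mask                    # n-bit pattern of A
    y = b & mask                    # n-bit pattern of B
    carry_in = 0
    if subtract:
        y = ~y & mask               # complementer: 1's complement B'
        carry_in = 1                # input carry adds 1, giving B' + 1
    raw = x + y + carry_in          # parallel adder
    total = raw & mask              # n-bit sum; carry out of sign bit discarded
    carry_out = raw >> n
    low = (1 << (n - 1)) - 1        # mask for the bits below the sign bit
    carry_into_sign = ((x & low) + (y & low) + carry_in) >> (n - 1)
    overflow = carry_out != carry_into_sign
    # Interpret the n-bit sum as a signed value
    result = total - (1 << n) if total >> (n - 1) else total
    return result, overflow
```

With n = 4, `add_sub(6, 3, 4, subtract=True)` gives 3 with no overflow. `add_sub(5, 3, 4)` overflows because +8 does not fit in 4 bits.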
Binary Multiplication
The sign of the product is determined from the signs of the multiplicand and the multiplier: positive if they are the same, negative otherwise.
Hardware Implementation: Multiplication
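Signed-magnitude shift-and-add multiplication can be sketched in Python as follows. This is a minimal version that assumes the usual register set: A holds the partial product, Q the multiplier, B the multiplicand, and E the adder's carry. The register names and the function `sm_multiply` are assumptions, not taken from the slides. The product sign is the XOR of the operand signs, matching the rule above. The n magnitude bits are processed with one conditional add and one shift per step.

```python
def sm_multiply(b_sign, b_mag, q_sign, q_mag, n):
    """Shift-and-add multiply of n-bit signed-magnitude operands (hypothetical).

    Returns (sign, magnitude) where the magnitude occupies 2n bits (A:Q).
    """
    sign = b_sign ^ q_sign          # + if the signs are equal, - otherwise
    mask = (1 << n) - 1
    a, e, q = 0, 0, q_mag           # clear A and E, load the multiplier into Q
    for _ in range(n):              # sequence counter runs n times
        if q & 1:                   # multiplier bit is 1: add the multiplicand to A
            total = a + b_mag
            e, a = total >> n, total & mask
        # Shift E, A, Q right one position as one combined register
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (e << (n - 1))
        e = 0
    return sign, (a << n) | q
```

For example, `sm_multiply(1, 9, 1, 13, 5)` gives a positive product of magnitude 117.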
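- Booth's algorithm operates on the fact that strings of 0's in the multiplier require no addition but just shifting. A string of 1's in the multiplier from bit weight $2^k$ down to weight $2^m$ can be treated as $2^{k+1} - 2^m$.
- For example, the multiplier $14 = 001110_2$ has a string of 1's from $2^3$ down to $2^1$ ($k = 3$, $m = 1$). Therefore
  $M \times 14 = M \times 2^4 - M \times 2^1$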
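The string-of-1's identity can be checked numerically in Python. This minimal sketch uses an arbitrary multiplicand M = 7. The helper `run_of_ones` is a hypothetical name. Multiplying by 14 directly needs three shifted additions of M. The recoded form needs only one shifted addition and one shifted subtraction.

```python
def run_of_ones(k, m):
    """Value of a string of 1's occupying bit weights 2^m up to 2^k (k >= m)."""
    return sum(1 << i for i in range(m, k + 1))

M = 7                                   # arbitrary example multiplicand
direct = M * run_of_ones(3, 1)          # 14 = 001110: M*2^3 + M*2^2 + M*2^1
recoded = (M << 4) - (M << 1)           # M*2^4 - M*2^1
```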
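Booth’s Multiplication Algorithm
1. The multiplicand is subtracted from the partial product upon encountering the first least significant 1 in a string of 1's in the multiplier.
2. The multiplicand is added to the partial product upon encountering the first 0 (provided that there was a previous 1) in a string of 0's in the multiplier.
3. The partial product does not change when the multiplier bit is identical to the previous multiplier bit.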
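The three rules above can be sketched in Python as a register-level version of Booth's algorithm. The sketch assumes the following:

- BR holds the multiplicand and QR the multiplier, as in the example that follows.
- AC holds the partial product and `qn1` is an extra bit Q(n+1) holding the previously examined multiplier bit. These two names are assumptions.
- Both operands fit in n bits, and the multiplicand is not the most negative n-bit value, whose negation does not fit in AC.

`booth_multiply` is a hypothetical name.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Booth multiplication of n-bit two's complement operands (hypothetical).

    Returns the 2n-bit product (AC:QR) as a signed integer.
    """
    mask = (1 << n) - 1
    br = multiplicand & mask
    br_neg = (~br + 1) & mask          # BR' + 1, used for subtraction
    ac, qr, qn1 = 0, multiplier & mask, 0
    for _ in range(n):                 # sequence counter runs n times
        pair = (qr & 1, qn1)           # current multiplier bit, previous bit
        if pair == (1, 0):             # rule 1: first 1 in a string of 1's
            ac = (ac + br_neg) & mask  # subtract multiplicand
        elif pair == (0, 1):           # rule 2: first 0 after a string of 1's
            ac = (ac + br) & mask      # add multiplicand
        # rule 3: (0, 0) or (1, 1) leaves the partial product unchanged
        # Arithmetic shift right of AC, QR, Qn+1 (AC's sign bit is replicated)
        qn1 = qr & 1
        qr = (qr >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & (1 << (n - 1)))
    product = (ac << n) | qr
    return product - (1 << (2 * n)) if product >> (2 * n - 1) else product
```

For the worked example, `booth_multiply(-9, -13, 5)` leaves AC = 00011 and QR = 10101, that is, the 10-bit product 0001110101 = +117.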
Solve using Booth’s Algorithm
−9 × −13
- BR register (5 bits), multiplicand
  - -9 in signed magnitude = 1 1001
  - 1's complement = 1 0110
  - 2's complement = 1 0111
  - BR' + 1 = 0 1001
- QR register (5 bits), multiplier
  - -13 in signed magnitude = 1 1101
  - 1's complement = 1 0010
  - 2's complement = 1 0011
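The 5-bit encodings listed above can be reproduced with a short Python sketch. The helpers `fmt` and `representations` are hypothetical names. The output format mimics the slide: the sign bit, a space, then the remaining bits.

```python
def fmt(bits, n):
    """Format an n-bit pattern as 'sign-bit remaining-bits', e.g. '1 0111'."""
    s = format(bits, f"0{n}b")
    return s[0] + " " + s[1:]

def representations(value, n):
    """n-bit signed-magnitude, 1's complement and 2's complement of value."""
    mask = (1 << n) - 1
    mag = abs(value)
    if value >= 0:
        sm = ones = twos = mag          # all three agree for non-negative values
    else:
        sm = (1 << (n - 1)) | mag       # sign bit 1, magnitude unchanged
        ones = ~mag & mask              # invert every bit of +|value|
        twos = (ones + 1) & mask        # 1's complement plus 1
    return fmt(sm, n), fmt(ones, n), fmt(twos, n)

br_plus_1 = fmt((~0b10111 + 1) & 0b11111, 5)   # BR' + 1 for BR = -9
```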
Example - Booth’s Multiplication Algorithm
Parallel Processing
Architectural Classification
- Flynn's classification
  - Based on the multiplicity of instruction streams and data streams
  - Instruction stream: sequence of instructions read from memory
  - Data stream: operations performed on the data in the processor
SISD Processors
- Characteristics
  - Uniprocessor machine capable of executing a single instruction stream on a single data stream
  - A single computer containing a control unit, a processor unit, and a memory unit
  - Instructions and data are stored in memory, and instructions are executed sequentially
  - May or may not have internal parallel processing capabilities
  - Parallel processing can be achieved by pipelining
SIMD Processors
- Characteristics
  - Capable of executing the same instruction on all processors, each operating on a different data stream
  - Only one copy of the program exists
  - A single controller executes one instruction at a time
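The SIMD idea can be sketched in Python as follows. This is a minimal version under two assumptions. First, each processing element's local memory is modeled as a dict. Second, an "instruction" is an (operation, destination, source1, source2) tuple. `simd_step`, `pes`, and `program` are hypothetical names. The loop over the PEs stands in for what SIMD hardware does in one lockstep cycle. There is one copy of the program and one instruction issued at a time.

```python
import operator

def simd_step(local_memories, op, dst, src1, src2):
    """Broadcast one instruction; every PE applies it to its own local data."""
    for mem in local_memories:          # in hardware: all PEs in the same cycle
        mem[dst] = op(mem[src1], mem[src2])

# Four PEs, each with its own data stream
pes = [{"a": i, "b": 10 * i} for i in range(4)]
program = [(operator.add, "c", "a", "b"),   # c = a + b on every PE
           (operator.mul, "d", "c", "a")]   # d = c * a on every PE
for instruction in program:                 # controller issues one at a time
    simd_step(pes, *instruction)
```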
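[Figure: SIMD organization: memory, data bus, control unit issuing the instruction stream, alignment network]
[Figure: MIMD organization: repeated sets of memory (M), control unit (CU), and processor (P), each with its own instruction stream and data stream]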
MIMD Processors
- Characteristics
  - Capable of executing multiple instructions on multiple data sets
  - Capable of processing several programs simultaneously
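The MIMD model can be sketched in Python as follows. Each worker runs its own program (instruction stream) on its own data (data stream). This minimal version uses the standard `concurrent.futures.ThreadPoolExecutor`. The three "programs" are hypothetical placeholders. In CPython, threads do not execute Python bytecode in parallel because of the global interpreter lock. `ProcessPoolExecutor` would be the choice for real parallelism, so treat this as an illustration of the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_program(data):
    return sum(data)

def max_program(data):
    return max(data)

def sort_program(data):
    return sorted(data)

# Each (program, data) pair is an independent instruction stream + data stream
jobs = [(sum_program, [1, 2, 3]), (max_program, [7, 4, 9]), (sort_program, [3, 1, 2])]
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(program, data) for program, data in jobs]
    results = [f.result() for f in futures]
```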
Interconnection Structure
- The components that form a multiprocessor system are CPUs, IOPs connected to input-output devices, and a memory unit
- The interconnection between the components can have different physical configurations, depending on the number of transfer paths that are available
  - between the processors and memory in a shared-memory system
  - among the processing elements in a loosely coupled system
- There are several physical forms available for establishing an interconnection network:
  - Time-shared common bus
  - Multiport memory
  - Crossbar switch
  - Multistage switching network
  - Hypercube system
Time-Shared Common Bus
- A common-bus multiprocessor system consists of a number of processors connected through a common path to a memory unit
- Only one processor can communicate with the memory or another processor at any given time
- As a consequence, the total overall transfer rate within the system is limited by the speed of the single path
- Part of the local memory may be designed as a cache memory attached to the CPU
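The effect of the single shared path can be sketched in Python as follows. This minimal simulation grants at most one transfer per cycle. The slides do not specify how the bus is arbitrated, so round-robin order is assumed here. `bus_schedule` is a hypothetical name. However the processors are ordered, the total number of cycles equals the total number of requests.

```python
from collections import deque

def bus_schedule(requests):
    """requests[p] = list of addresses processor p wants to access.

    Returns a log of (cycle, processor, address): one transfer per cycle,
    granted in round-robin order (an assumed arbitration policy).
    """
    queues = [deque(r) for r in requests]
    log, cycle, p = [], 0, 0
    while any(queues):
        while not queues[p]:               # skip processors with nothing pending
            p = (p + 1) % len(queues)
        log.append((cycle, p, queues[p].popleft()))
        cycle += 1
        p = (p + 1) % len(queues)          # next processor gets the next turn
    return log

log = bus_schedule([[0, 1], [2], [3, 4, 5]])   # 6 requests -> 6 bus cycles
```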
Multiport Memory
- A multiport memory system employs separate buses between each memory module and each CPU
- Each module must have internal control logic to determine which port will have access to memory at any given time
- Memory access conflicts are resolved by assigning fixed priorities to each memory port
- Advantage: a high transfer rate can be achieved because of the multiple paths
- Disadvantage: requires expensive memory control logic and a large number of cables and connections
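Fixed-priority conflict resolution in a multiport memory can be sketched in Python for a single cycle. This minimal version assumes a lower CPU number means a higher port priority. `multiport_grants` is a hypothetical name. A module serves the highest-priority CPU requesting it, and the others wait. Different modules still serve different CPUs in the same cycle because each has its own bus.

```python
def multiport_grants(requests):
    """requests: dict CPU number -> memory module it wants this cycle.

    Returns dict module -> CPU granted access (lower CPU number = higher priority).
    """
    grants = {}
    for cpu in sorted(requests):        # visit ports in fixed priority order
        module = requests[cpu]
        if module not in grants:        # first (highest-priority) requester wins
            grants[module] = cpu
    return grants

grants = multiport_grants({1: "MM1", 2: "MM1", 3: "MM2"})   # CPU2 must wait
```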
Crossbar Switch
- Consists of a number of crosspoints placed at the intersections between processor buses and memory module paths
- The small square at each crosspoint is a switch that determines the path from a processor to a memory module
- Advantage: supports simultaneous transfers from all memory modules
- Disadvantage: the hardware required to implement the switch can become quite large and complex
[Figure: 4 × 4 crossbar switch connecting processors CPU1 to CPU4 with memory modules MM1 to MM4]
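The advantage of the crossbar can be sketched in Python with a cycle count. This minimal version assumes each processor issues exactly one request and each memory module serves one processor per cycle. Under those assumptions, only requests to the same module are serialized. On a time-shared bus, every request is serialized. `crossbar_cycles` and `time_shared_bus_cycles` are hypothetical names.

```python
from collections import Counter

def crossbar_cycles(targets):
    """targets[p] = module requested by processor p (one request each).

    Different modules transfer simultaneously, so the cycle count is the
    number of requests queued at the most contended module.
    """
    return max(Counter(targets).values(), default=0)

def time_shared_bus_cycles(targets):
    """On a single shared bus every request takes its own cycle."""
    return len(targets)

no_conflict = ["MM1", "MM2", "MM3", "MM4"]   # all four transfers in parallel
conflict = ["MM1", "MM1", "MM3", "MM4"]      # CPU1 and CPU2 contend for MM1
```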
Multistage Switching Network
Pipelining
- A technique of decomposing a sequential process into suboperations, with each subprocess executed in a special dedicated segment that operates concurrently with all other segments
- Each segment performs partial processing dictated by the way the task is partitioned
- The result obtained from the computation in each segment is transferred to the next segment in the pipeline
- The final result is obtained after the data have passed through all segments
Design of Basic Pipeline
- In a pipelined processor, a pipeline has two ends: the input end and the output end
- Between these ends there are multiple stages (segments). The output of one stage is connected to the input of the next stage, and each stage performs a specific operation
- Interface registers, also called latches or buffers, hold the intermediate output between two stages
- All the stages in the pipeline, along with the interface registers, are controlled by a common clock
- Pipeline Stages
  - A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set
  - The 5 stages of the RISC pipeline and their respective operations are:
    1. Instruction Fetch (IF): the CPU reads the instruction from the memory address held in the program counter
    2. Instruction Decode (ID): the instruction is decoded and the register file is accessed to get the values of the registers used in the instruction
    3. Execute (EX): the ALU performs the operation specified by the instruction
    4. Memory Access (MEM): memory operands specified by the instruction are read from or written to memory
    5. Write Back (WB): the computed or fetched value is written back to the register specified in the instruction
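Overlap in the five-stage pipeline can be sketched in Python by listing which instruction occupies each stage in every clock cycle. This minimal version assumes one instruction enters per cycle and there are no stalls or hazards. `pipeline_table` and the labels I1, I2, ... are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # the five RISC pipeline stages

def pipeline_table(n_instructions):
    """rows[c] = list of (stage, instruction) active in clock cycle c + 1.

    Assumes one instruction enters per cycle and no stalls or hazards.
    """
    k = len(STAGES)
    rows = []
    for cycle in range(k + n_instructions - 1):     # k + n - 1 cycles in total
        active = []
        for s, stage in enumerate(STAGES):
            i = cycle - s                           # instruction now in stage s
            if 0 <= i < n_instructions:
                active.append((stage, f"I{i + 1}"))
        rows.append(active)
    return rows

for c, active in enumerate(pipeline_table(3), start=1):
    print(c, " ".join(f"{stage}:{instr}" for stage, instr in active))
```

Three instructions finish in 5 + 3 - 1 = 7 cycles rather than 15.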
Pipelining
- The simplest way to understand pipelining is to imagine that each segment consists of an input register followed by a combinational circuit. The output of the combinational circuit in a segment is applied to the input register of the next segment
Example: $A_i \times B_i + C_i$ for $i = 1, 2, 3, \ldots, 7$
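[Figure: three-segment pipeline; Ai, Bi, and Ci are read from memory]
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to product)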
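The three-segment pipeline can be simulated clock by clock in Python. In this minimal sketch, all registers R1 to R5 load on the same clock edge, so each new value is computed from the previous cycle's contents. `None` marks an empty register. `pipeline_multiply_add` is a hypothetical name.

```python
def pipeline_multiply_add(A, B, C):
    """Simulate the three-segment pipeline computing A[i]*B[i] + C[i].

    Returns the values that appear in R5, in order.
    """
    n = len(A)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    for clock in range(n + 2):                  # k + n - 1 clocks with k = 3
        # Compute every register's next value from the current contents
        new_r5 = r3 + r4 if r3 is not None else None          # segment 3: adder
        new_r3 = r1 * r2 if r1 is not None else None          # segment 2: multiplier
        new_r4 = C[clock - 1] if 1 <= clock <= n else None    # segment 2: Ci enters
        new_r1, new_r2 = (A[clock], B[clock]) if clock < n else (None, None)  # segment 1
        # Common clock edge: all registers load together
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)
    return results

A = [1, 2, 3, 4, 5, 6, 7]
B = [2] * 7
C = [10] * 7
out = pipeline_multiply_add(A, B, C)
```

Seven tasks need 3 + 7 - 1 = 9 clock pulses. After the pipeline fills, one result leaves R5 on every clock.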
[Figure: four-segment pipeline: Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4, with all registers driven by a common clock]
Space-Time Diagram (4 segments, 6 tasks)
Clock cycle:  1   2   3   4   5   6   7   8   9
Segment 1:    T1  T2  T3  T4  T5  T6
Segment 2:        T1  T2  T3  T4  T5  T6
Segment 3:            T1  T2  T3  T4  T5  T6
Segment 4:                T1  T2  T3  T4  T5  T6
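Pipeline Speed-Up
- k: number of segments in the pipeline
- $t_p$: clock cycle time of the pipeline
- n: number of tasks to be performed
- $t_n$: time to complete each task in an equivalent non-pipelined unit
- Speedup $S_k$ of the pipeline over the non-pipelined unit:
  $S_k = \frac{n t_n}{(k + n - 1) t_p}$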
- When n becomes much larger than $k - 1$, $k + n - 1$ approaches n, and the speedup becomes
  $S = \frac{t_n}{t_p}$
- If we assume that the time taken to process a task is the same in both circuits, then $t_n = k t_p$ and the speedup reduces to
  $S = \frac{k t_p}{t_p} = k$
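The speedup formula and its limit can be checked numerically in Python. The values k = 4 segments and $t_p$ = 20 ns are arbitrary illustration values, and $t_n$ defaults to $k t_p$ as in the last step above. `pipeline_speedup` is a hypothetical name. The speedup grows toward k as n increases.

```python
def pipeline_speedup(n, k, tp, tn=None):
    """S_k = n*tn / ((k + n - 1) * tp); tn defaults to k*tp."""
    if tn is None:
        tn = k * tp                     # same total work per task in both units
    return (n * tn) / ((k + n - 1) * tp)

k, tp = 4, 20                           # example values: 4 segments, 20 ns clock
speedups = [pipeline_speedup(n, k, tp) for n in (1, 10, 100, 10**6)]
```

With n = 1 there is no gain (S = 1). The values then rise toward k = 4 as the $k - 1$ fill cycles are amortized over more tasks.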