COA Notes Unit 5
Parallel Processing
Introduction
Parallel processing is a class of techniques that enables a system to carry out many data-processing tasks simultaneously, increasing the computational speed of a computer system.
A parallel processing system can carry out simultaneous data-processing to achieve
faster execution time.
The primary purpose of parallel processing is to enhance the computer processing
capability and increase its throughput, i.e. the amount of processing that can be
accomplished during a given interval of time.
A parallel processing system can be achieved by having a multiplicity of functional
units that perform identical or different operations simultaneously.
The data can be distributed among the multiple functional units.
The following diagram shows one possible way of separating the execution unit into
eight functional units operating in parallel; the operation performed in each
functional unit is indicated in each block of the diagram:
o The adder and integer multiplier perform arithmetic operations on integer
numbers.
o The floating-point operations are separated into three circuits operating in parallel.
o The logic, shift, and increment operations can be performed concurrently on different
data.
o All units are independent of each other.
There are a variety of ways in which parallel processing can be classified. It can be
considered from the:
o Internal organization of the processors
o Interconnection structure between processors
o The flow of information through the system
One classification introduced by M. J. Flynn considers the organization of a computer
system by the number of instructions and data items that are manipulated
simultaneously.
The normal operation of a computer is to fetch instructions from memory and execute
them in the processor.
The sequence of instructions read from memory constitutes an instruction stream.
The operations performed on the data in the processor constitute a data stream.
Parallel processing may occur in the instruction stream, in the data stream, or in both.
Flynn’s classification divides computers into four major groups as follows:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
3. Multiple Instruction stream, Single Data stream (MISD)
• Here there are n processor units, each receiving distinct instructions operating on the
same data stream.
• The results of one processor become the input of the next processor in the macro
pipe.
• This structure has received less attention and has been challenged as impractical in
some applications.
Advantages of Parallel Processing
1. It saves time and money, as many resources working together reduce the time and
cut potential costs.
2. It can be impractical to solve larger problems with serial computing.
3. It can take advantage of non-local resources when the local resources are finite.
Examples of Pipelining
Example 2:
To perform the combined multiply and add operations with a stream of
numbers Ai*Bi + Ci for i = 1, 2, 3, …, 7,
each sub-operation is implemented in a segment within a pipeline:
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to the product)
The five registers are loaded with new data every clock pulse.
The effect of each clock is shown in Table 9-1.
The first clock pulse transfers A1 and B1 into R1 and R2.
The second clock pulse transfers the product of R1 and R2 into R3 and C1 into R4.
The same clock pulse transfers A2 and B2 into R1 and R2.
The third clock pulse operates on all three segments simultaneously.
It places A3 and B3 into R1 and R2, transfers the product of R1 and R2 into R3,
transfers C2 into R4, and places the sum of R3 and R4 into R5.
It takes three clock pulses to fill up the pipe and retrieve the first output from R5.
From there on, each clock produces a new output and moves the data one step down
the pipeline.
Each segment has one or two registers and a combinational circuit, as shown in Fig. 9-2.
Consider applying this pipeline to the seven-element arrays A[7] and B[7].
If the task is executed without pipelining, each data operation takes 5 cycles, so 35
CPU cycles in total are needed to perform the operation.
But using the pipeline, the task can be executed in 9 cycles.
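To make the timing concrete, here is a minimal Python sketch (not part of the original notes; the operand values are made up) that clocks the three segments and prints the register contents pulse by pulse:

# Sketch of Table 9-1: clock-by-clock contents of the three-segment
# pipeline R1<-Ai, R2<-Bi | R3<-R1*R2, R4<-Ci | R5<-R3+R4.
A = [1, 2, 3, 4, 5, 6, 7]          # hypothetical operand streams
B = [8, 7, 6, 5, 4, 3, 2]
C = [9, 9, 9, 9, 9, 9, 9]
n, k = len(A), 3                   # 7 operand sets, 3 segments

R1 = R2 = R3 = R4 = R5 = None
print(" clk    R1    R2    R3    R4    R5")
for clk in range(1, n + k):        # n + k - 1 = 9 clock pulses
    # All registers are clocked at once: compute the next state from
    # the current state, then commit it.
    nR5 = R3 + R4 if R3 is not None else None          # segment 3: add
    nR3 = R1 * R2 if R1 is not None else None          # segment 2: multiply
    nR4 = C[clk - 2] if 2 <= clk <= n + 1 else None    # segment 2: input Ci
    nR1 = A[clk - 1] if clk <= n else None             # segment 1: input Ai
    nR2 = B[clk - 1] if clk <= n else None             # segment 1: input Bi
    R1, R2, R3, R4, R5 = nR1, nR2, nR3, nR4, nR5
    print(f"{clk:4}" + "".join(f"{str(r):>6}" for r in (R1, R2, R3, R4, R5)))

The last value leaves R5 at pulse 9, matching the n + k - 1 = 7 + 3 - 1 = 9 cycles quoted above.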
Principles of Pipelining
General considerations
Any operation that can be decomposed into a sequence of sub operations of about the
same complexity can be implemented by a pipeline processor.
The general structure of a four-segment pipeline is illustrated in Fig. 9-3.
The behavior of a pipeline can be illustrated with a space-time diagram.
The space-time diagram of a four-segment pipeline is demonstrated in Fig. 9-4.
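From the space-time diagram, the standard timing and speedup expressions follow; in the notation below, k is the number of segments, n the number of tasks, t_p the pipeline clock period, and t_n the time a non-pipelined unit takes per task:

\[
  T_{\text{pipeline}} = (k + n - 1)\,t_p ,
  \qquad
  T_{\text{sequential}} = n\,t_n ,
\]
\[
  S = \frac{T_{\text{sequential}}}{T_{\text{pipeline}}}
    = \frac{n\,t_n}{(k + n - 1)\,t_p} .
\]

As n grows large, S approaches t_n/t_p; if t_n = k t_p, the maximum speedup approaches k, the number of segments. For the multiply-add example, k = 3 and n = 7 give k + n - 1 = 9 clock cycles.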
Arithmetic Pipeline
An arithmetic pipeline divides an arithmetic problem into various sub problems for
execution in various pipeline segments.
Arithmetic Pipelines are mostly used in high-speed computers.
They are used to implement floating-point operations, multiplication of fixed-point
numbers, and similar computations encountered in scientific problems.
To understand the concepts of arithmetic pipeline, let us consider an example of a
pipeline unit for floating-point addition and subtraction with two normalized floating-
point binary numbers defined as:
X = A × 2^a = 0.9504 × 10^3
Y = B × 2^b = 0.8200 × 10^2
where A and B are two fractions that represent the mantissas, and a and b are the exponents.
The combined operation of floating-point addition and subtraction is divided into four
segments.
Each segment contains the corresponding sub-operation to be performed in the given
pipeline.
The sub-operations that are shown in the four segments are:
1. Compare the exponents by subtraction.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
The flowchart of the arithmetic pipeline for floating-point addition is shown in
the diagram below.
Figure: Flowchart of Arithmetic Pipeline
For the example values, the four segments produce:
1. Compare exponents: 3 − 2 = 1.
2. Align mantissas: Y becomes 0.0820 × 10^3.
3. Add mantissas: Z = X + Y = 1.0324 × 10^3.
4. Normalize the result: Z = 0.10324 × 10^4.
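As a rough illustration (a simplified decimal model, not the hardware algorithm itself), the four sub-operations can be written out in Python using the X and Y values above:

# Simplified model of the four pipeline segments for floating-point
# addition, on (mantissa, exponent) pairs with base-10 exponents.
def fp_add(a_mant, a_exp, b_mant, b_exp):
    # Segment 1: compare the exponents by subtraction.
    diff = a_exp - b_exp
    # Segment 2: align the mantissa of the number with the smaller exponent.
    if diff >= 0:
        b_mant /= 10 ** diff
        exp = a_exp
    else:
        a_mant /= 10 ** (-diff)
        exp = b_exp
    # Segment 3: add the mantissas.
    mant = a_mant + b_mant
    # Segment 4: normalize the result so that 0.1 <= |mant| < 1.
    while abs(mant) >= 1:
        mant /= 10
        exp += 1
    while mant != 0 and abs(mant) < 0.1:
        mant *= 10
        exp -= 1
    return mant, exp

print(fp_add(0.9504, 3, 0.8200, 2))   # approximately (0.10324, 4)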
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream
as well.
Most digital computers with complex instructions require an instruction pipeline to
carry out operations such as fetching, decoding and executing instructions.
In this technique, a stream of instructions is executed by overlapping the fetch, decode
and execute phases of the instruction cycle, which increases the throughput of the
computer system.
An instruction pipeline reads instructions from memory while previous instructions
are being executed in other segments of the pipeline.
Thus we can execute multiple instructions simultaneously.
The pipeline will be more efficient if the instruction cycle is divided into segments of
equal duration.
In general, the computer needs to process each instruction with the following sequence
of steps:
1. Fetch instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the results.
The flowchart for instruction pipeline is shown below.
Figure 3.1 shows how the instruction cycle in the CPU can be processed with a four-
segment pipeline.
While an instruction is being executed in segment 4, the next instruction in sequence
is busy fetching an operand from memory in segment 3.
The effective address may be calculated in a separate arithmetic circuit for the third
instruction, and whenever the memory is available, the fourth and all subsequent
instructions can be fetched and placed in an instruction FIFO.
Thus up to four sub-operations in the instruction cycle can overlap and up to four
different instructions can be in progress of being processed at the same time.
Once in a while, an instruction in the sequence may be a program control type that
causes a branch out of normal sequence.
In that case the pending operations in the last two segments are completed and all
information stored in the instruction buffer is deleted.
The pipeline then restarts from the new address stored in the program counter.
Similarly, an interrupt request, when acknowledged, will cause the pipeline to empty
and start again from a new address value.
Figure 3.2 shows the operation of an instruction pipeline with the help of a
four-segment pipeline.
The fetch and decode phases overlap due to pipelining, i.e. by the time the first
instruction is being decoded, the next instruction is fetched by the pipeline.
The third instruction is a branch instruction.
While it is being decoded, the fourth instruction is fetched simultaneously.
But since it is a branch instruction, it may point to some other instruction once it is
decoded.
Thus the fourth instruction is kept on hold until the branch instruction is executed.
Once the branch is resolved, the fourth instruction (or the instruction at the branch
target) is fetched again and the remaining phases continue as usual.
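The overlap and the branch-induced flush can be sketched with a small simulation. The program below is hypothetical; the branch is resolved in the EX segment, at which point the speculatively fetched instructions I4 and I5 are discarded and fetching resumes at the target:

# Four-segment instruction pipeline (FI, DA, FO, EX) with a branch.
program = ["I1", "I2", "BR X", "I4", "I5", "X1", "X2"]
branch_at, target = 2, 5        # program[2] branches to program[5]

stages = ["FI", "DA", "FO", "EX"]
pipe = [None] * 4               # pipe[s] = program index in stage s
pc, clk = 0, 0
while pc < len(program) or any(i is not None for i in pipe[:3]):
    clk += 1
    pipe = [None] + pipe[:3]    # every instruction advances one segment
    if pipe[3] == branch_at:    # branch resolved in EX: flush FI-FO
        pipe[0] = pipe[1] = pipe[2] = None
        pc = target             # restart fetching at the branch target
    elif pc < len(program):
        pipe[0] = pc            # fetch the next sequential instruction
        pc += 1
    cells = ["-" if i is None else program[i] for i in pipe]
    print(f"clk {clk:2}: " + "  ".join(f"{s}={c:<5}" for s, c in zip(stages, cells)))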
RISC Pipeline
Among the characteristics attributed to RISC is its ability to use an efficient
instruction pipeline.
The simplicity of the instruction set can be utilized to implement an instruction
pipeline using a small number of sub operations, with each being executed in one clock
cycle.
Because of the fixed-length instruction format, the decoding of the operation can occur
at the same time as the register selection.
All data manipulation instructions have register-to-register operations.
Since all operands are in registers, there is no need for calculating an effective address
or fetching of operands from memory.
Therefore, the instruction pipeline can be implemented with two or three segments.
One segment fetches the instruction from program memory, and the other segment
executes the instruction in the ALU.
A third segment may be used to store the result of the ALU operation in a destination
register.
The data transfer instructions in RISC are limited to load and store instructions.
1. These instructions use register indirect addressing. They usually need three or
four stages in the pipeline.
2. To prevent conflicts between a memory access to fetch an instruction and to
load or store an operand, most RISC machines use two separate buses with two
memories.
3. Cache memories are used that operate at the same speed as the CPU clock.
One of the major advantages of RISC is its ability to execute instructions at the rate of
one per clock cycle.
1. In effect, the approach is to start each instruction with each clock cycle and to
pipeline the processor so as to achieve the goal of single-cycle instruction execution.
2. RISC can achieve pipeline segments that require just one clock cycle each.
Consider the hardware operation for such a computer.
The control section fetches the instruction from program memory into an instruction
register.
1. The instruction is decoded at the same time that the registers needed for the
execution of the instruction are selected.
The processor unit consists of a number of registers and an arithmetic logic unit
(ALU).
A data memory is used to load or store the data from a selected register in the register
file.
The instruction cycle can be divided into three sub operations and implemented in
three segments:
1. I: Instruction fetch
Fetches the instruction from program memory
2. A: ALU operation
The instruction is decoded and an ALU operation is performed.
It performs an operation for a data manipulation instruction.
It evaluates the effective address for a load or store instruction.
It calculates the branch address for a program control instruction.
3. E: Execute instruction
Directs the output of the ALU to one of three destinations, depending on
the decoded instruction.
It transfers the result of the ALU operation into a destination register in
the register file.
It transfers the effective address to a data memory for loading or
storing.
It transfers the branch address to the program counter.
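A small sketch (not from the notes) prints the space-time occupancy of this three-segment pipeline: with one-cycle segments, four instructions finish in 4 + 3 - 1 = 6 cycles, one per cycle once the pipe is full:

# Space-time occupancy of the three-segment RISC pipeline (I, A, E).
segments = ["I", "A", "E"]
n, k = 4, 3                       # 4 instructions, 3 segments
for instr in range(1, n + 1):
    # instruction `instr` occupies segment s during cycle instr + s
    busy = {instr + s: segments[s] for s in range(k)}
    cells = [busy.get(cycle, ".") for cycle in range(1, n + k)]
    print(f"instr {instr}: " + " ".join(cells))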
Delayed Load
Consider a load instruction that transfers data from memory into register R2,
immediately followed by an add instruction that uses R2 as a source operand
(Fig. 9-9(a)). The add would read R2 before the load has completed, creating a data
conflict; delaying the use of the data loaded from memory in this way is referred to as a
delayed load.
Figure 9-9(b) shows the same program with a no-op instruction inserted after the load
to R2 instruction.
The data is loaded into R2 in clock cycle 4.
The add instruction uses the value of R2 in step 5.
Thus the no-op instruction is used to advance one clock cycle in order to compensate
for the data conflict in the pipeline.
The advantage of the delayed load approach is that the data dependency is taken care
of by the compiler rather than the hardware.
This results in a simpler hardware segment since the segment does not have to check if
the content of the register being accessed is currently valid or not.
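The compiler's job can be sketched as follows. The instruction format (opcode, destination, sources...) is hypothetical; a no-op is inserted whenever the instruction immediately after a load uses the loaded register, as in Fig. 9-9:

# Delayed-load scheduling sketch: insert a NOP after a load whose
# destination register is used by the very next instruction.
def schedule_delayed_load(prog):
    out = []
    for idx, ins in enumerate(prog):
        out.append(ins)
        nxt = prog[idx + 1] if idx + 1 < len(prog) else None
        if ins[0] == "LOAD" and nxt is not None and ins[1] in nxt[2:]:
            out.append(("NOP",))          # fill the load delay slot
    return out

prog = [("LOAD",  "R1", "M[address 1]"),
        ("LOAD",  "R2", "M[address 2]"),
        ("ADD",   "R3", "R1", "R2"),      # uses R2 loaded just above
        ("STORE", "M[address 3]", "R3")]
for ins in schedule_delayed_load(prog):
    print(ins)

The output matches the idea of Fig. 9-9(b): a single no-op appears between the second load and the add.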
Delayed Branch
The method used in most RISC processors is to rely on the compiler to redefine the
branches so that they take effect at the proper time in the pipeline.
This method is referred to as delayed branch.
The compiler is designed to analyze the instructions before and after the branch
and rearrange the program sequence by inserting useful instructions in the delay
steps.
It is up to the compiler to find useful instructions to put after the branch instruction.
Failing that, the compiler can insert no-op instructions.
In Fig. 9-10(a) the compiler inserts two no-op instructions after the branch.
The branch address X is transferred to PC in clock cycle 7.
The fetching of the instruction at X is delayed by two clock cycles by the no-op
instructions.
The instruction at X starts the fetch phase at clock cycle 8 after the program counter
PC has been updated.
The program in Fig. 9-10(b) is rearranged by placing the add and subtract instructions
after the branch instruction instead of before it, as in the original program.
PC is updated to the value of X in clock cycle 5, but the add and subtract instructions
are fetched from memory and executed in the proper sequence.
In other words, if the load instruction is at address 101 and X is equal to 350, the
branch instruction is fetched from address 103.
The add instruction is fetched from address 104 and executed in clock cycle 6.
The subtract instruction is fetched from address 105 and executed in clock cycle 7.
Since the value of X is transferred to PC with clock cycle 5 in the E segment, the
instruction fetched from memory at clock cycle 6 is from address 350, which is the
instruction at the branch address.
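Written out as a hypothetical listing (the original-program addresses are assumed), the rearrangement looks like this; the add and subtract do not affect the branch condition, so they can occupy the delay slots:

# Delayed-branch sketch mirroring the Fig. 9-10 discussion.
original = [              # (a) compiler inserts two no-ops
    (101, "LOAD"), (102, "INCREMENT"), (103, "ADD"), (104, "SUBTRACT"),
    (105, "BRANCH TO X=350"), (106, "NOP"), (107, "NOP"),
]
rearranged = [            # (b) add/subtract moved into the delay slots
    (101, "LOAD"), (102, "INCREMENT"), (103, "BRANCH TO X=350"),
    (104, "ADD"),         # delay slot 1: executed while the branch resolves
    (105, "SUBTRACT"),    # delay slot 2
]
for addr, op in rearranged:
    print(addr, op)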
Types of Hazards
There are typically three types of hazards:
1. Data Hazards.
2. Structural Hazards.
3. Control (or) Branch (or) Instruction Hazards.
Data Hazards
A data hazard occurs when instructions that exhibit a data dependency are in different
stages of the pipeline. (or)
A data hazard is a situation in which the pipeline is stalled because the data to be
operated on is delayed for some reason.
For example, stage E in the four-stage pipeline in Figure 6.1 is responsible for
arithmetic and logic operations, and some operations, such as divide, may require more
time to complete; here I2 requires three cycles, from cycle 4 through cycle 6.
Thus in cycles 5 and 6, the write stage must be told to do nothing, because it has no
data to work with.
Meanwhile, the information in buffer B2 must remain intact until the Execute stage
has completed its operation.
This means that stage 2 and stage 1 are blocked from accepting new instructions
because the information in B1 cannot be overwritten.
Thus, steps D4 and F5 must be postponed.
The pipelined operation is said to have stalled for two clock cycles, which leads to a hazard.
A data hazard is any condition in which either the source or destination operands of an
instruction are not available at the time expected in the pipeline.
As a result, some operation has to be delayed and the pipeline stalls.
In the above Figure 6.2, the result of multiply is placed into register R4, which in turn
is one of the two source operands of Add instruction.
Assume that multiply operation takes one clock cycle to complete execution.
As the decode unit decodes the Add instruction in cycle 3, it realizes that R4 is used as
a source operand.
Hence, the D step of that instruction cannot be completed until the W step of multiply
instruction has been completed.
Completion of Step D2 must be delayed to clock cycle 5.
Instruction I3 is fetched in cycle 3, but its decoding must be delayed because the step
D3 cannot precede D2.
Hence, pipelined execution is stalled for two cycles.
Read After Write (RAW): (j tries to read a source before i writes to it)
A read after write (RAW) data hazard refers to a situation where an instruction refers
to a result that has not yet been calculated or retrieved.
This can occur because even though an instruction is executed after a previous
instruction, the previous instruction has not been completely processed through the
pipeline.
Example: I1:R2 ← R1 + R3
I2:R4 ← R2 + R3
The first instruction is calculating a value to be saved in register 2, and the second
instruction is going to use this value to compute a result for register 4.
However, in a pipeline, when we fetch the operands for the 2nd operation, the results
from the first will not yet have been saved, and hence we have a data dependency.
There is a data dependency with instruction 2, as it is dependent on the completion of
instruction 1.
Write After Read (WAR):(j tries to write a destination before it is read by i).
A write after read (WAR) data hazard represents a problem with concurrent execution.
Example: I1:R4 ← R1+ R3
I2:R3 ← R1 + R2
If there is a chance that I2 may be completed before I1 (i.e. with concurrent
execution),
we must ensure that we do not store the result into register 3 before I1 has had a
chance to fetch its operands.
Write After Write (WAW): (j tries to write an operand before it is written by i).
A write after write (WAW) data hazard may occur in a concurrent execution
environment.
For example:
I1: R2 ← R1 + R2
I2: R2 ← R4 + R7
We must delay the WB (Write Back) of i2 until the write back of i1 is completed.
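The three definitions can be captured in a few lines of Python (a sketch; each instruction is given as a destination register plus a set of source registers). Note that the WAW example above also contains a WAR dependency, since I1 reads R2:

# Classify the hazards between instruction i and a later instruction j.
def classify(i, j):
    i_dst, i_src = i
    j_dst, j_src = j
    hazards = []
    if i_dst in j_src:        # j reads a value before i writes it
        hazards.append("RAW")
    if j_dst in i_src:        # j writes a value before i reads it
        hazards.append("WAR")
    if i_dst == j_dst:        # j writes a value before i writes it
        hazards.append("WAW")
    return hazards or ["none"]

print(classify(("R2", {"R1", "R3"}), ("R4", {"R2", "R3"})))  # ['RAW']
print(classify(("R4", {"R1", "R3"}), ("R3", {"R1", "R2"})))  # ['WAR']
print(classify(("R2", {"R1", "R2"}), ("R2", {"R4", "R7"})))  # ['WAR', 'WAW']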
1. Pipeline Bubbling
A pipeline bubble, also known as a pipeline break or a pipeline stall, is a software
method for preventing data hazards from occurring.
When instructions are fetched, control logic determines whether a hazard could occur.
If it is true, then the control logic inserts a NOP instruction (No operation) into the
pipeline.
Thus, before the next instruction (which could cause hazard) is executed, the previous
one will have sufficient time to complete and prevent hazard.
If the responsibility of detecting dependencies is left entirely to the software, the
compiler must insert the NOP instruction to obtain correct results.
If the number of NOPs is equal to the number of stages in the pipeline, the processor
has been cleared of all instructions and can proceed free from hazards.
This is called flushing the pipeline. All forms of stalling introduce a delay before the
processor can resume execution.
2. Data Forwarding
The data forwarding mechanism works as follows:
Forwarding is implemented by feeding the output of a previous instruction back into
the later stages of the pipeline as soon as that output is available.
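A back-of-the-envelope sketch of the benefit, assuming the four-stage timing used in the Figure 6.2 discussion (the producer's result leaves E at the end of cycle 3 and is written in W in cycle 4; the dependent instruction reads its operands in D in cycle 3, or receives a forwarded value at its E input in cycle 4):

# Stall cycles for a dependent instruction immediately following its
# producer in a four-stage (F, D, E, W) pipeline.
def stall_cycles(forwarding):
    ready = 3 if forwarding else 4   # value ready: end of E vs end of W
    needs = 4 if forwarding else 3   # value needed: consumer's E vs D
    return max(0, ready - needs + 1)

print("without forwarding:", stall_cycles(False), "stall cycles")  # 2
print("with forwarding:   ", stall_cycles(True), "stall cycles")   # 0

The two-cycle stall without forwarding matches the delay of step D2 to cycle 5 in the Figure 6.2 discussion.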
Structural Hazards
A structural hazard arises when two instructions require the same hardware resource
during the same clock cycle.
Structural hazards are avoided by providing sufficient hardware resources on the
processor chip.
Pipelining does not result in individual instructions being executed faster; rather, it is
the throughput that increases, where throughput is measured by the rate at which
instruction execution is completed.
Any time one of the stages in the pipeline cannot complete its operation in one clock
cycle, the pipeline stalls and the performance of the pipeline degrades.
Example 2:
A machine has a single shared memory pipeline for data and instructions. As a result,
when an instruction contains a data-memory reference (load), it will conflict with the
instruction fetch of a later instruction (instruction 3).
Clock cycle number
Instr    1     2     3     4     5     6     7     8     9
Load     IF    ID    EX    MEM   WB
Instr 1        IF    ID    EX    MEM   WB
Instr 2              IF    ID    EX    MEM   WB
Instr 3                    stall IF    ID    EX    MEM   WB
Stalling of Instr 3 to resolve the structural hazard.
Control (Branch) Hazards
Branch hazards (also known as control or instruction hazards) occur with branches.
On many instruction pipeline micro architectures, the processor will not know the
outcome of the branch when it needs to insert a new instruction into the pipeline
(normally the fetch stage).
Branch hazards occur when the processor is told to branch i.e., if a certain condition is
true, then jump from one part of the instruction stream to another - not necessarily to the
next instruction sequentially.
In such a case, the processor cannot tell in advance whether it should process the next
instruction (when it may instead have to move to a distant instruction).
This can result in the processor doing unwanted actions.
Unconditional branches
The time lost as a result of a branch instruction is often referred to as branch penalty.
Here, the branch penalty is one clock cycle. For a longer pipeline, the branch penalty
may be higher.
Either a cache miss or a branch instruction stalls the pipeline for one or more clock
cycles.
To reduce this effect, many processors use fetch units that can fetch instructions
before they are needed and put them in a queue.
The fetch unit has dedicated hardware to identify branch instructions and
compute the branch target address as quickly as possible after an instruction is fetched.
A separate dispatch unit takes instructions from the front of the queue and sends
them to the execution units. It also performs the decoding function.
To be effective, the fetch unit must have sufficient decoding and processing
capability to recognize and execute branch instructions.
When the dispatch unit is not able to issue instructions for execution because of
data hazards, the fetch unit continues to fetch instructions and add them to the queue.
Conversely, if there is a delay in fetching because of a branch or a cache miss, the
dispatch unit continues to dispatch instructions from the instruction queue.
Having an instruction queue is beneficial in dealing with cache misses. When a cache
miss occurs, the dispatch unit continues to send instructions for execution as long as
the queue is not empty.
Meanwhile, the desired cache block is read from the main memory.
We assume that initially the queue contains one instruction. Every fetch operation
adds one instruction to the queue and every dispatch operation reduces the queue length
by one.
Conditional Branches
A conditional branch introduces an added hazard caused by the dependency of the
branch condition on the result of a preceding instruction.
The decision to branch cannot be made until the execution of that instruction has been
completed.
Delayed Branch
In Figure 6.15, the processor fetches the instruction I3 before it determines whether
the current instruction I2 is a branch instruction.
When the execution of I2 is completed and a branch is made, the processor must
discard I3 and fetch the instruction at the branch target.
The location following a branch instruction is called a branch delay slot.
There may be more than one delay slot depending on the time it takes to execute a
branch instruction.
The instructions in the delay slots are always fetched and at least partially executed
before the branch decision is made and the branch target address is computed.
A technique called delayed branching can minimize the penalty incurred as a result of
conditional branch instructions.
The instructions in the delay slots are always fetched.
We would like to arrange them to be fully executed whether or not the branch is taken.
The objective is to place useful instructions in these slots.
If no useful instructions can be placed in the delay slots, these slots must be filled with
NOP instruction.
Branch Prediction
Branch prediction is a technique used to reduce the branch penalty associated with
conditional branches.
In the above figure, the prediction takes place in cycle 3, while instruction I3 is being
fetched. The fetch unit predicts that the branch will not be taken, and continues to fetch
instructions I3 and I4 in sequential order.
The results of the compare operation are available at the end of cycle 3. Assuming that
these results are forwarded directly to the instruction fetch unit, the branch condition is
evaluated in cycle 4.
Here the instruction fetch unit realizes that the prediction was incorrect: the two
instructions in the execution pipe are purged, and the new instruction is fetched from
the branch target address in clock cycle 5.
Dynamic branch prediction reduces the branch penalty under hardware control.
The direction of each branch is predicted by recording information about past branch
history in hardware during program execution; the prediction is therefore made at
runtime.
Here, the prediction is done in the IF stage of the pipeline.
The simplest dynamic prediction scheme is a branch prediction buffer or branch
history table.
The branch prediction buffer is a small, fast memory that contains history information
about the previous outcomes of the branch as a prediction field.
The branch target can then be accessed as soon as the branch target address is
computed, before the branch condition is available.
2-State algorithm (LNT = branch likely not taken, LT = branch likely taken)
Suppose the algorithm is started in the LNT state. When the branch is executed, if the
branch is taken the machine moves from LNT to LT; otherwise it remains in LNT.
The next time the same instruction is encountered, the branch is predicted as taken if
the machine is in the LT state; otherwise it is predicted as not taken.
Suppose the algorithm is started in the LT state. When the branch is executed, if the
branch is not taken the machine moves to LNT; otherwise it remains in LT.
The next time the same instruction is encountered, the branch is predicted as taken
while the machine is in the LT state.
This scheme requires one bit of the history information for each branch instruction.
Better performance can be achieved by keeping more information about the execution
history.
4-State algorithm (SNT = strongly likely not taken, LNT = likely not taken, LT = likely taken, ST = strongly likely taken)
Suppose the algorithm is started in the LNT state. When the branch is executed, if the
branch is taken it moves from LNT to ST; otherwise it moves to SNT.
Suppose the algorithm is started in the SNT state. When the branch is executed, if the
branch is taken it moves from SNT to LNT; otherwise it remains in SNT.
When the branch is in the LNT state and it is executed again, if the branch is taken
(i.e. the prediction has been incorrect twice) it moves from LNT to ST.
Suppose the algorithm is started in the LT state. When the branch is executed, if the
branch is taken it moves from LT to ST; otherwise it moves to SNT.
Suppose the algorithm is in the ST state. When the branch is executed, if the branch
is taken (the prediction is correct) it remains in the ST state; otherwise it moves to LT.
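These transitions can be encoded directly as a table (a sketch; True means the branch was taken, and "taken" is predicted in the LT and ST states):

# Four-state branch prediction algorithm as described above.
NEXT = {
    ("SNT", True): "LNT", ("SNT", False): "SNT",
    ("LNT", True): "ST",  ("LNT", False): "SNT",
    ("LT",  True): "ST",  ("LT",  False): "SNT",
    ("ST",  True): "ST",  ("ST",  False): "LT",
}

def run(outcomes, state="LNT"):
    correct = 0
    for taken in outcomes:
        predicted_taken = state in ("LT", "ST")
        correct += (predicted_taken == taken)
        state = NEXT[(state, taken)]
    return correct

# A loop-closing branch: taken nine times, then not taken once.
history = [True] * 9 + [False]
print(run(history), "of", len(history), "predictions correct")  # 8 of 10

For a loop-closing branch that is taken nine times and then falls through, the predictor is wrong only on the first and last executions.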
Characteristics of RISC
Simpler instruction, hence simple instruction decoding.
Instructions fit within the size of one word.
An instruction takes a single clock cycle to get executed.
More general-purpose registers.
Simple Addressing Modes.
Fewer Data types.
Pipelining can be achieved.
Advantages of RISC
Simpler instructions: RISC processors use a smaller set of simple instructions,
which makes them easier to decode and execute quickly. This results in faster
processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can
execute instructions faster than CISC processors.
Lower power consumption: RISC processors consume less power than CISC
processors, making them ideal for portable devices.
Disadvantages of RISC
More instructions required: RISC processors require more instructions to perform
complex tasks than CISC processors.
Increased memory usage: RISC processors require more memory to store the
additional instructions needed to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive
than CISC processors.
Characteristics of CISC
Complex instruction, hence complex instruction decoding.
Instructions are larger than one-word size.
Instruction may take more than a single clock cycle to get executed.
Fewer general-purpose registers, as operations are performed in memory
itself.
Complex Addressing Modes.
More Data types.
Advantages of CISC
Reduced code size: CISC processors use complex instructions that can perform
multiple operations, reducing the amount of code needed to perform a task.
More memory efficient: Because CISC instructions are more complex, they require
fewer instructions to perform complex tasks, which can result in more memory-
efficient code.
Widely used: CISC processors have been in use for a longer time than RISC
processors, so they have a larger user base and more available software.
Disadvantages of CISC
Slower execution: CISC processors take longer to execute instructions because they
have more complex instructions and need more time to decode them.
More complex design: CISC processors have more complex instruction sets, which
makes them more difficult to design and manufacture.
Higher power consumption: CISC processors consume more power than RISC
processors because of their more complex instruction sets.
Key Difference between RISC and CISC Processor
• Instructions are easier to decode in RISC than in CISC, which has a more complex
decoding process.
• CISC instructions can operate directly on data in memory, whereas RISC calculations are carried out only on register operands.
• While CISC only has a single register set, RISC has numerous register sets.
• The execution time per instruction in RISC is very low compared to CISC, where it
is very high.
RISC architectures emphasize simpler instructions, shorter execution times, and
efficient use of registers.
CISC architectures offer complex instructions, potentially leading to fewer
instructions overall but with longer execution times.
RISC architectures often follow a load-store model, while CISC architectures allow
direct memory access instructions.
Example: Multiplying Two Numbers in Memory
Consider a hypothetical computer whose main memory is divided into locations
addressed as row:column and whose execution unit carries out all computations.
However, the execution unit can only operate on data that has been loaded
into one of its six registers (A, B, C, D, E, or F).
Suppose we want to find the product of two numbers, one stored in location 2:3 and
the other in location 5:2, and then store the product back in location 2:3.
The primary goal of CISC architecture is to complete a task in as few lines of
assembly as possible. For this task, a CISC processor comes prepared with a single
instruction that, when executed, loads the two values into separate registers, multiplies
the operands in the execution unit, and then stores the product in the appropriate
register.
The task of multiplying two numbers can be completed with one
instruction: MULT 2:3, 5:2
MULT is known as a "complex instruction": it operates directly on the
computer's memory banks and does not require the programmer to explicitly call any
loading or storing functions.
One of the primary advantages is that the compiler has to do very little work to
translate a high-level language statement into assembly. Because the length of the
code is relatively short, very little RAM is required to store the instructions. The
emphasis is on building complex instructions directly into the hardware.
RISC processors only use simple instructions that can be executed within one clock cycle.
Thus, the "MULT" command could be divided into three separate commands: "LOAD,"
which moves data from the memory bank to a register, "PROD," which finds the product of
two operands located within the registers, and "STORE," which moves data from a register
to the memory banks.
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
Here there are more lines of code, and more RAM is needed to store the assembly-level
instructions.
The compiler must also perform more work to convert a high-level language statement into
code of this form.
The advantage of RISC is that because each instruction requires only one clock cycle to
execute, the entire program will execute in approximately the same amount of time as the
multi-cycle "MULT" command.
These RISC "reduced instructions" require fewer transistors of hardware space than the
complex instructions, leaving more room for general-purpose registers.
Because all of the instructions execute in a uniform amount of time (i.e. one clock),
pipelining is possible.
Separating the "LOAD" and "STORE" instructions actually reduces the amount of work
that the computer must perform.
CISC VERSUS RISC
1. Instruction execution: CISC processors execute microcode instructions, whereas
RISC processors have a number of hardwired instructions.
2. Registers: CISC processors cannot have a large number of registers, whereas RISC
processors have a large number of registers, most of which can be used as
general-purpose registers.
3. Typical products: Intel and AMD CPUs are based on CISC architectures, whereas
Apple uses RISC chips.
4. Applications: CISC is mainly used in normal PCs, workstations and servers, whereas
RISC is mainly used for real-time applications.
5. Memory operands: CISC allows direct addition between data in two memory
locations (e.g. the 8085), whereas in RISC direct addition is not possible.
6. Design approach: the CISC approach minimizes the number of instructions per
program and increases the number of cycles per instruction, whereas the RISC
approach maximizes the number of instructions per program and reduces the number
of cycles per instruction.