DDCO module5 (1)
MODULE-V
Basic Processing Unit: Some Fundamental Concepts: Register Transfers, Performing ALU
Operations, Fetching a Word from Memory, Storing a Word in Memory. Execution of a
Complete Instruction.
Pipelining: Basic Concepts, Role of Cache Memory, Pipeline Performance.
Text book 2: 7.1, 7.2, 8.1
BASIC PROCESSING UNIT:
The processing unit executes machine instructions and coordinates the activities of other units.
This unit is also called the instruction set processor, or simply the processor.
The figure shows an organization in which the arithmetic and logic unit (ALU) and all the registers
are interconnected via a single common bus. The data and address lines of the external memory bus
are connected to the internal processor bus via MDR and MAR respectively. The control lines of
the memory bus are connected to the instruction decoder and control logic block. This block is
responsible for issuing the signals that control the operation of all units inside the processor and
for interaction with the memory bus.
The number and use of processor registers R0 through R(n-1) vary considerably from one
processor to another. Three registers X, Y and TEMP are used by the processor for temporary
storage during execution of some instructions. The multiplexer MUX selects output of register Y
(selectY) or a constant value 4 to be provided as input A of the ALU. The registers, the ALU and
the interconnecting bus are collectively referred to as the datapath. With few exceptions, an
instruction can be executed by performing one or more of the following operations in some
specified sequence:
• Transfer a word of data from one processor register to another or to the ALU
• Perform an arithmetic or a logic operation and store the result in a processor register
• Fetch the contents of a given memory location and load them into a processor register
• Store a word of data from a processor register into a given memory location
We now consider in detail how each of these operations is implemented, using the simple
processor model.
Consider the instruction Move (R1), R2. The memory read operation requires three steps:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
where WMFC is the control signal that causes the processor’s control circuitry to wait for the
arrival of the MFC (Memory Function Completed) signal. Assume that the output of MAR is enabled
all the time. When a new address is loaded into MAR, it appears on the memory bus at the beginning
of the next clock cycle, as shown in the timing diagram. A Read control signal is activated at the
same time MAR is loaded. This signal causes the bus interface circuit to send a read command, MR,
on the bus. Control signal MDRinE is activated while waiting for a response from memory. Thus, the
data received from memory are loaded into MDR at the end of the clock cycle in which the MFC signal
is received. In the next clock cycle, MDRout is activated to transfer the data to register R2.
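The three control steps above can be traced with a small Python sketch. It is only an illustrative model, not part of the textbook: the dictionary-based registers and memory, the function name read_memory_operand, and the address and data values are all assumptions, and the wait for MFC is collapsed into a single assignment.

```python
# Hypothetical model of the single-bus datapath; the signal names mirror the text,
# but the data structures and values are assumptions made for illustration.
def read_memory_operand(regs, memory, src="R1", dst="R2"):
    """Trace the three control steps of Move (src), dst and return the signals used."""
    trace = []
    # Step 1: R1out, MARin, Read. The address in src is gated onto the bus into MAR.
    regs["MAR"] = regs[src]
    trace.append(f"{src}out, MARin, Read")
    # Step 2: MDRinE, WMFC. The processor waits for MFC; the word is latched into MDR.
    regs["MDR"] = memory[regs["MAR"]]
    trace.append("MDRinE, WMFC")
    # Step 3: MDRout, R2in. The word is transferred over the bus into the destination.
    regs[dst] = regs["MDR"]
    trace.append(f"MDRout, {dst}in")
    return trace


regs = {"R1": 0x100, "R2": 0, "MAR": 0, "MDR": 0}
memory = {0x100: 42}  # assumed memory contents
trace = read_memory_operand(regs, memory)
for step, signals in enumerate(trace, start=1):
    print(step, signals)
```

After the call, regs["R2"] holds the word stored at the address that was in R1.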
For a branch instruction, processing starts as usual, and the fetch phase ends in step 3. In step 4,
the offset value is extracted from the IR by the instruction decoding circuit and added to the
updated contents of the PC. In step 5, the result, which is the branch target address, is loaded
into the PC.
5.3 Pipelining
• Pipelining is a technique of decomposing a sequential process into suboperations, with
each subprocess being executed in a special dedicated segment that operates
concurrently with all other segments.
✓ Decomposing a sequential process into suboperations
✓ Each subprocess is executed in a special dedicated segment concurrently
• The computer is controlled by a clock whose period is such that the fetch and execute
steps of any instruction can each be completed in one clock cycle. Operation of the
computer proceeds as in Figure 5.1c.
• In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and stores it in
buffer B1 at the end of the clock cycle.
• In the second clock cycle, the instruction fetch unit proceeds with the fetch operation for
instruction I2 (step F2). Meanwhile, the execution unit performs the operation specified
by instruction I1, which is available to it in buffer B1 (step E1).
• By the end of the second clock cycle, the execution of instruction I1 is completed and
instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer
needed. Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and execute
units are kept busy all the time.
• If the pattern in Figure 5.1c can be sustained for a long time, the completion rate of
instruction execution will be twice that achievable by the sequential operation depicted in
Figure 5.1a.
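The claim that the completion rate approaches twice the sequential rate can be checked with a short calculation. The sketch below assumes, as in the text, that fetch and execute each take exactly one clock cycle and that no stalls occur; the function names are hypothetical.

```python
def sequential_cycles(n):
    """Cycles for n instructions when each fetch and each execute run back to back."""
    return 2 * n


def two_stage_pipelined_cycles(n):
    """Cycles for n instructions when fetch of I(k+1) overlaps execute of I(k)."""
    return n + 1 if n > 0 else 0


for n in (3, 10, 1000):
    seq = sequential_cycles(n)
    pipe = two_stage_pipelined_cycles(n)
    print(f"n={n}: sequential={seq}, pipelined={pipe}, speedup={seq / pipe:.3f}")
```

For three instructions the pipeline needs 4 cycles instead of 6; the ratio 2n/(n+1) approaches 2 only when the pattern is sustained over many instructions.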
In summary, the fetch and execute units in Figure 5.1b constitute a two-stage pipeline in which
each stage performs one step in processing an instruction. An interstage storage buffer, B1, is
needed to hold the information being passed from one stage to the next. New information is
loaded into this buffer at the end of each clock cycle. The processing of an instruction need not
be divided into only two steps.
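For example, a pipelined processor may process each instruction in four steps, as follows:
• F (Fetch): read the instruction from memory.
• D (Decode): decode the instruction and fetch the source operand(s).
• E (Execute): perform the operation specified by the instruction.
• W (Write): store the result in the destination location.
During clock cycle 4, the information in the buffers is as follows: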
• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction-decoding unit.
• Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in
cycle 3. The buffer also holds the information needed for the write step of instruction I2
(step W2). Even though it is not needed by stage E, this information must be passed on to
stage W in the following clock cycle to enable that stage to perform the required Write
operation.
• Buffer B3 holds the results produced by the execution unit and the destination information for
instruction I1.
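The buffer contents listed above follow from a simple rule: in an ideal four-stage pipeline, instruction Ik enters stage F in cycle k and moves one stage per cycle. A minimal sketch of that rule, assuming no stalls (the function name stage_occupancy is hypothetical):

```python
STAGES = ["F", "D", "E", "W"]


def stage_occupancy(cycle):
    """Return {stage: instruction} for an ideal pipeline in which Ik enters F in cycle k."""
    occupancy = {}
    for position, stage in enumerate(STAGES):
        instr = cycle - position  # an instruction in stage s entered F `position` cycles ago
        if instr >= 1:
            occupancy[stage] = f"I{instr}"
    return occupancy


print(stage_occupancy(4))
```

In cycle 4 this gives I4 in F, I3 in D (reading B1), I2 in E (reading B2) and I1 in W (reading B3), which agrees with the buffer description.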
• A unit that completes its task early is idle for the remainder of the clock period. Hence,
pipelining is most effective in improving performance if the tasks being performed in
different stages require about the same amount of time.
• This consideration is particularly important for the instruction fetch step, which is
assigned one clock period in Figure 5.2a.
• The clock cycle has to be equal to or greater than the time needed to complete a fetch
operation. However, the access time of the main memory may be as much as ten times
greater than the time needed to perform basic pipeline stage operations inside the
processor, such as adding two numbers. Thus, if each instruction fetch required access to
the main memory, pipelining would be of little value.
• The use of cache memories solves the memory access problem.
Dept. of CSE, CEC, MANGALORE
DIGITAL DESIGN AND Computer Organization Notes (BCS302) 2024
• In particular, when a cache is included on the same chip as the processor, access time to
the cache is usually the same as the time needed to perform other basic operations inside
the processor. This makes it possible to divide instruction fetching and processing into
steps that are more or less equal in duration.
• Each of these steps is performed by a different pipeline stage, and the clock period is
chosen to correspond to the longest one.
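The effect of the fetch time on the clock period can be illustrated numerically. The delays below are assumed values in arbitrary time units, chosen only to reflect the "ten times" ratio mentioned above; they are not measurements, and the function names are hypothetical.

```python
def clock_period(stage_delays):
    """The clock period must cover the slowest stage."""
    return max(stage_delays.values())


def stage_utilization(stage_delays):
    """Fraction of each clock period in which a stage is doing useful work."""
    period = clock_period(stage_delays)
    return {stage: delay / period for stage, delay in stage_delays.items()}


# Assumed delays: a main-memory fetch taken as ten times a basic internal operation.
fetch_from_main_memory = {"F": 10, "D": 1, "E": 1, "W": 1}
fetch_from_cache = {"F": 1, "D": 1, "E": 1, "W": 1}

print(clock_period(fetch_from_main_memory), stage_utilization(fetch_from_main_memory))
print(clock_period(fetch_from_cache), stage_utilization(fetch_from_cache))
```

With the main-memory figures, stages D, E and W are idle for 90% of every cycle; with an on-chip cache the stages are balanced and the clock period shrinks to that of a basic operation.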
o For a variety of reasons, one of the pipeline stages may not be able to complete its
processing task for a given instruction in the time allotted.
o For example, stage E in the four-stage pipeline of Figure 5.2b is responsible for
arithmetic and logic operations, and one clock cycle is assigned for this task. Although
this may be sufficient for most operations, some operations, such as divide, may require
more time to complete. Figure 5.3 shows an example in which the operation specified in
instruction I2 requires three cycles to complete, from cycle 4 through cycle 6.
o Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has no data
to work with.
o Meanwhile, the information in buffer B2 must remain intact until the Execute stage has
completed its operation. This means that stage 2 and, in turn, stage 1 are blocked from
accepting new instructions because the information in B1 cannot be overwritten. Thus,
steps D4 and F5 must be postponed as shown.
Figure 5.3 Effect of an execution operation taking more than one clock cycle.
Pipelined operation in Figure 5.3 is said to have been stalled for two clock cycles. Normal
pipelined operation resumes in cycle 7.
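The stall described above can be reproduced with a small in-order scheduling sketch. It assumes a simplified model in which an instruction stays in its stage until the next stage is free (so the buffer in front of it cannot be overwritten). The function schedule and the durations table are hypothetical names for illustration.

```python
STAGES = ["F", "D", "E", "W"]


def schedule(durations):
    """durations[i][s] is the number of cycles instruction i needs in stage s.

    Returns enter[i][s], the clock cycle in which instruction i enters stage s.
    """
    n_stages = len(STAGES)
    enter = []
    for i, dur in enumerate(durations):
        row = []
        for s in range(n_stages):
            # Cycle by which this instruction has finished its previous stage.
            ready = 1 if s == 0 else row[s - 1] + dur[s - 1]
            # Cycle in which the preceding instruction vacates stage s.
            if i == 0:
                free = 1
            elif s == n_stages - 1:
                free = enter[i - 1][s] + durations[i - 1][s]
            else:
                free = enter[i - 1][s + 1]
            row.append(max(ready, free))
        enter.append(row)
    return enter


# Five instructions; only I2 needs three cycles in stage E, as in Figure 5.3.
durations = [[1, 1, 1, 1] for _ in range(5)]
durations[1][2] = 3
enter = schedule(durations)
for i, row in enumerate(enter, start=1):
    print(f"I{i}:", dict(zip(STAGES, row)))
```

Under these assumptions, W2 occurs in cycle 7 instead of cycle 5, D4 and F5 are both postponed to cycle 7, and from cycle 7 onward one instruction completes per cycle again.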
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. So some operation has to
be delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an instruction causes the
pipeline to stall.
• Structural hazard – the situation when two instructions require the use of a given
hardware resource at the same time.
• An example of a structural hazard is shown in Figure 5.5. This figure shows how the load
instruction
Load X(R1),R2
can be accommodated in our example 4-stage pipeline.
• The memory address, X+[R1], is computed in step E2 in cycle 4, then memory access
takes place in cycle 5.
• The operand read from memory is written into register R2 in cycle 6. This means that the
execution step of this instruction takes two clock cycles (cycles 4 and 5).
• It causes the pipeline to stall for one cycle, because both instructions I2 and I3 require
access to the register file in cycle 6.
• Even though the instructions and their data are all available, the pipeline is stalled
because one hardware resource, the register file, cannot handle two operations at once.
• If the register file had two input ports, that is, if it allowed two simultaneous write
operations, the pipeline would not be stalled. In general, structural hazards are avoided by
providing sufficient hardware resources on the processor chip.
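The write-port conflict can be sketched as follows. This is a deliberately simplified model, not a reproduction of Figure 5.5: it looks only at the cycle in which each instruction wants to write the register file and assumes that a delayed write pushes back every later instruction by the same amount. The function name and the list of requested cycles are assumptions.

```python
from collections import Counter


def write_back_cycles(requested, write_ports=1):
    """Assign each instruction (in program order) a cycle in which it writes the register file.

    requested[i] is the cycle in which instruction i would write if there were no conflicts.
    """
    in_use = Counter()  # cycle -> number of write ports already taken
    stall = 0           # delay inherited from earlier conflicts
    actual = []
    for want in requested:
        cycle = want + stall
        while in_use[cycle] >= write_ports:
            cycle += 1  # all ports busy in this cycle: wait one more cycle
        stall = cycle - want
        in_use[cycle] += 1
        actual.append(cycle)
    return actual


# Assumed request pattern: I1 writes in cycle 4; the Load (I2) and I3 both want cycle 6.
requested = [4, 6, 6, 7, 8]
print(write_back_cycles(requested, write_ports=1))
print(write_back_cycles(requested, write_ports=2))
```

With one write port, I3 is pushed from cycle 6 to cycle 7 and the instructions behind it slip by one cycle; with two write ports every instruction writes in the cycle it requested.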
Questions:
1. Write down the control sequence for the execution of the instruction Add
(R3), R1.
(in execution of a complete instruction)
[What are the actions required to execute a complete instruction Add (R3),
R1]