M5 Notes
MODULE 5:
Basic Processing Unit and Pipelining
Basic Processing Unit: Some Fundamental Concepts: Register Transfers, Performing ALU
operations, fetching a word from Memory, Storing a word in memory. Execution of a Complete
Instruction.
Pipelining: Basic concepts, Role of Cache memory, Pipeline Performance.
• Here the processor contains only a single bus for the movement of data,
addresses and instructions.
• ALU and all the registers are interconnected via a Single Common Bus
(Figure 7.1).
• The data and address lines of the external memory-bus are connected to
the internal processor-bus via MDR and MAR respectively
(MDR -> Memory Data Register, MAR -> Memory Address Register).
• MDR has 2 inputs and 2 outputs. Data may be loaded
→ into MDR either from memory-bus (external) or
→ from processor-bus (internal).
• MAR's input is connected to the internal bus; MAR's output is connected to the
external bus (addresses are sent from the processor to the memory only).
• The control-signals that govern a particular transfer are asserted at the
start of the clock cycle.
Input & Output Gating for one Register Bit
Implementation for one bit of register Ri (as shown in Figure 7.3)
All operations and data transfers are controlled by the processor clock.
• A 2-input multiplexer is used to select the data applied to the input
of an edge-triggered D flip-flop.
• When Riin=1, the multiplexer selects the data on the bus. This data is loaded into
the flip-flop at the rising edge of the clock.
• When Riin=0, the multiplexer feeds back the value currently stored in the flip-flop.
• The Q output of the flip-flop is connected to the bus via a tri-state gate.
• When Riout=0, the gate's output is in the high-impedance state.
• When Riout=1, the gate drives the bus to 0 or 1, depending on the value
of Q.
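The gating described above can be sketched in a few lines of Python. This is only an illustrative model, not a circuit description: the class name RegisterBit and its method names are hypothetical, and the value None is used here to stand for the high-impedance state.

```python
class RegisterBit:
    """One bit of register Ri: multiplexer -> D flip-flop -> tri-state gate.

    Hypothetical model for illustration only.
    """

    def __init__(self, q=0):
        self.q = q  # value currently stored in the flip-flop

    def rising_edge(self, bus, ri_in):
        # 2-input multiplexer: Riin=1 selects the bus, Riin=0 feeds back Q.
        d = bus if ri_in else self.q
        # The edge-triggered flip-flop captures D at the rising clock edge.
        self.q = d

    def bus_output(self, ri_out):
        # Tri-state gate: drives Q onto the bus only when Riout=1.
        # None stands for the high-impedance state (an assumption of this model).
        return self.q if ri_out else None


bit = RegisterBit()
bit.rising_edge(bus=1, ri_in=1)  # Riin=1: the bus value is loaded
bit.rising_edge(bus=0, ri_in=0)  # Riin=0: the stored value is held
print(bit.q, bit.bus_output(ri_out=1), bit.bus_output(ri_out=0))
```

The second clock edge leaves the stored bit unchanged even though the bus carries 0, because the multiplexer feeds Q back to the D input.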
CONTROL-SIGNALS OF MDR
• The MDR register has 4 control-signals (Figure 7.4):
1) MDRin & MDRout control the connection to the internal processor data bus, and
2) MDRinE & MDRoutE control the connection to the external memory data bus.
• Similarly, the MAR register has 2 control-signals:
1) MARin controls the connection to the internal processor address bus, and
2) MARout controls the connection to the memory address bus.
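As a rough sketch of how the four MDR control-signals select the data path, the model below treats MDR as a register with two possible sources and two possible destinations. The class and parameter names are hypothetical, and the rule that only one input may be enabled in a clock cycle is an assumption of this sketch, not something stated in the notes.

```python
class MDR:
    """Memory Data Register with two inputs and two outputs (illustrative model)."""

    def __init__(self):
        self.value = 0

    def rising_edge(self, processor_bus, memory_bus, mdr_in=0, mdr_in_e=0):
        # Assumption of this sketch: at most one input is enabled per cycle.
        if mdr_in and mdr_in_e:
            raise ValueError("MDRin and MDRinE asserted in the same cycle")
        if mdr_in:
            self.value = processor_bus  # load from the internal processor bus
        elif mdr_in_e:
            self.value = memory_bus     # load from the external memory bus

    def outputs(self, mdr_out=0, mdr_out_e=0):
        # None stands for "not driving this bus".
        to_processor_bus = self.value if mdr_out else None
        to_memory_bus = self.value if mdr_out_e else None
        return to_processor_bus, to_memory_bus


mdr = MDR()
# A word read from memory arrives on the memory bus (MDRinE=1) ...
mdr.rising_edge(processor_bus=0, memory_bus=0x2A, mdr_in_e=1)
# ... and is then placed on the processor bus (MDRout=1).
print(mdr.outputs(mdr_out=1))
```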
Pipelining:
Basic Concepts:
The speed of execution of programs is influenced by many factors.
➢ One way to improve performance is to use faster circuit technology to build the
processor and the main memory. Another possibility is to arrange the hardware so that
more than one operation can be performed at the same time. In this way, the number
of operations performed per second is increased even though the elapsed time needed
to perform any one operation is not changed.
➢ Pipelining is a particularly effective way of organizing concurrent activity in a
computer system.
➢ Pipelining is the technique of decomposing a sequential process into sub-operations,
with each sub-operation being executed in a dedicated segment.
➢ Pipelining is commonly known as an assembly-line operation.
Consider how the idea of pipelining can be used in a computer. The processor executes
a program by fetching and executing instructions, one after the other.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii . Execution of a
program consists of a sequence of fetch and execute steps, as shown in Figure a.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure b. The instruction
fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer
is needed to enable the execution unit to execute the instruction while the fetch unit is
fetching the next instruction. The results of execution are deposited in the destination
location specified by the instruction.
The computer is controlled by a clock.
Assume that any instruction fetch or execute step is completed in one clock cycle.
Operation of the computer proceeds as in Figure 8.1c.
In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and
stores it in buffer B1 at the end of the clock cycle.
In the second clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I2 (step F2). Meanwhile, the execution unit performs the
operation specified by instruction I1, which is available to it in buffer B1 (step E1).
By the end of the second clock cycle, the execution of instruction I1 is completed
and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is
no longer needed.
Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and
execute units are kept busy all the time. If the pattern in Figure 8.1c can be
sustained for a long time, the completion rate of instruction execution will be twice
that achievable by the sequential operation depicted in Figure a.
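The overlap of fetch and execute steps can be tabulated with a short Python sketch. It assumes, as in the text, that every fetch and every execute step takes exactly one clock cycle; the function names are hypothetical.

```python
def two_stage_schedule(n):
    """Return (cycle, fetch step, execute step) rows for n instructions."""
    rows = []
    for cycle in range(1, n + 2):
        fetch = f"F{cycle}" if cycle <= n else "-"        # Fi happens in cycle i
        execute = f"E{cycle - 1}" if cycle >= 2 else "-"  # Ei happens in cycle i+1
        rows.append((cycle, fetch, execute))
    return rows


def sequential_cycles(n):
    # Without overlap, each instruction needs one fetch and one execute cycle.
    return 2 * n


def pipelined_cycles(n):
    # With overlap, one instruction completes per cycle after the first fetch.
    return n + 1


for row in two_stage_schedule(3):
    print(row)

n = 1000
print(sequential_cycles(n) / pipelined_cycles(n))  # approaches 2 as n grows
```

For a long instruction sequence the ratio approaches 2, which is the doubling of the completion rate described above.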
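A pipelined processor may also process each instruction in four steps: Fetch (F),
Decode (D), Execute (E) and Write (W).
The sequence of events for this case is shown in Figure a. Four instructions are in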
progress at any given time. This means that four distinct hardware units are
needed, as shown in Figure b. These units must be capable of performing their
tasks simultaneously and without interfering with one another. Information is
passed from one unit to the next through a storage buffer. As an instruction
progresses through the pipeline, all the information needed by the stages
downstream must be passed along. For example, during clock cycle 4, the
information in the buffers is as follows:
➢ Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being
decoded by the instruction-decoding unit.
➢ Buffer B2 holds both the source operands for instruction I2 and the
specification of the operation to be performed. This is the information
produced by the decoding hardware in cycle 3. The buffer also holds the
information needed for the write step of instruction I2 (step W2). Even though
it is not needed by stage E, this information must be passed on to stage W
in the following clock cycle to enable that stage to perform the required Write
operation.
➢ Buffer B3 holds the results produced by the execution unit and the
destination information for instruction I1.
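The buffer contents listed above follow from a simple rule: in an ideal four-stage pipeline, instruction Ii is in stage number s (counting F as 0) during cycle i + s. The sketch below applies that rule; the function name is hypothetical and it assumes no stalls.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write


def stage_contents(cycle, n_instructions):
    """Return which instruction each stage works on in the given clock cycle."""
    contents = {}
    for s, name in enumerate(STAGES):
        i = cycle - s  # instruction Ii occupies stage s in cycle i + s
        contents[name] = f"I{i}" if 1 <= i <= n_instructions else None
    return contents


# During cycle 4: I4 is fetched, I3 is decoded (from B1),
# I2 is executed (from B2) and I1 is written (from B3).
print(stage_contents(4, n_instructions=4))
```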
Pipeline Performance:
➢ The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
➢ However, this increase would be achieved only if pipelined operation as
depicted in Figure a could be sustained without interruption throughout
program execution.
➢ Unfortunately, this is not the case.
➢ A floating-point operation, for example, may take many clock cycles.
➢ For a variety of reasons, one of the pipeline stages may not be able to
complete its processing task for a given instruction in the time allotted. For
example, stage E in the four-stage pipeline of Figure b is responsible for
arithmetic and logic operations, and one clock cycle is assigned for this task.
Although this may be sufficient for most operations, some operations, such
as divide, may require more time to complete. Figure 8.3 shows an example in
which the operation specified in instruction I2 requires three cycles to
complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write
stage must be told to do nothing, because it has no data to work with.
Meanwhile, the information in buffer B2 must remain intact until the
Execute stage has completed its operation. This means that stage 2 and, in
turn, stage 1 are blocked from accepting new instructions because the
information in B1 cannot be overwritten. Thus, steps D4 and F5 must be
postponed as shown.
Pipelined operation in Figure 8.3 is said to have been stalled for two clock
cycles. Normal pipelined operation resumes in cycle 7. Any condition that
causes the pipeline to stall is called a hazard. We have just seen an example
of a data hazard.
1) A data hazard is any condition in which either the source or the
destination operands of an instruction are not available at the time
expected in the pipeline. As a result some operation has to be
delayed, and the pipeline stalls.
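The two-cycle stall discussed above can be reproduced with a small scheduling sketch. It is a simplified model under stated assumptions: instructions proceed strictly in order, each stage holds one instruction, and an instruction stays in its stage (keeping its input buffer intact) until the next stage is free. The function name and the list-of-lists result are hypothetical.

```python
def schedule(execute_cycles):
    """Return, per instruction, the cycle in which it starts F, D, E and W.

    execute_cycles[i] is the number of cycles instruction I(i+1) needs in stage E.
    """
    durations = [[1, 1, e, 1] for e in execute_cycles]
    starts = []
    for i, dur in enumerate(durations):
        row = []
        for s in range(4):
            # The instruction's own previous stage must have finished.
            ready = 1 if s == 0 else row[s - 1] + dur[s - 1]
            if i > 0:
                prev = starts[i - 1]
                if s < 3:
                    # The stage is free once the previous instruction has
                    # moved on to the next stage.
                    free = prev[s + 1]
                else:
                    # The last stage is free once the previous Write is done.
                    free = prev[s] + durations[i - 1][s]
                ready = max(ready, free)
            row.append(ready)
        starts.append(row)
    return starts


# I2 needs three cycles in stage E (cycles 4 to 6); all others need one.
for number, row in enumerate(schedule([1, 3, 1, 1, 1]), start=1):
    print(f"I{number}", dict(zip("FDEW", row)))
```

In this model W2 starts in cycle 7, and D4 and F5 are both postponed to cycle 7, so the fifth instruction completes two cycles later than it would with `schedule([1, 1, 1, 1, 1])`.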
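A structural hazard arises when two instructions require the use of a given hardware
resource at the same time. The most common case is access to memory: one instruction
may need to access memory for its operand while another instruction is being fetched.
If instructions and data reside in the same cache unit, only one instruction can
proceed and the other instruction is delayed. Many processors use separate
instruction and data caches to avoid this delay.
An example of a structural hazard is shown in the figure, which shows how the
load instruction
Load X(R1),R2
is handled in the four-stage pipeline.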
➢ The memory address, X+[R1], is computed in step E2 in cycle 4, then memory
access takes place in cycle 5. The operand read from memory is written into
register R2 in cycle 6. This means that the execution step of this instruction
takes two clock cycles (cycles 4 and 5). It causes the pipeline to stall for one
cycle, because both instructions I2 and I3 require access to the register file
in cycle 6.
➢ Even though the instructions and their data are all available, the pipeline is
stalled because one hardware resource, the register file, cannot handle two
operations at once. If the register file had two input ports, that is, if it allowed
two simultaneous write operations, the pipeline would not be stalled. In
general, structural hazards are avoided by providing sufficient hardware
resources on the processor chip.
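The effect of the number of register-file write ports can be illustrated with a short sketch. It models only the write-port conflict, not the whole pipeline: each instruction asks for a write cycle, writes happen in program order, and a write is pushed to a later cycle when all ports are busy. The function name and the requested cycle numbers (taken from the example above, with I2 being the load) are assumptions of this sketch.

```python
def assign_write_cycles(requested, write_ports=1):
    """Give each instruction the first cycle in which a write port is free."""
    used = {}      # cycle -> number of write ports already in use
    actual = []
    earliest = 0   # writes stay in program order
    for cycle in requested:
        cycle = max(cycle, earliest)
        while used.get(cycle, 0) >= write_ports:
            cycle += 1  # the register file is busy: stall for one cycle
        used[cycle] = used.get(cycle, 0) + 1
        actual.append(cycle)
        earliest = cycle
    return actual


# Requested write cycles for I1..I5; the load I2 writes in cycle 6, like I3.
requested = [4, 6, 6, 7, 8]
print(assign_write_cycles(requested, write_ports=1))
print(assign_write_cycles(requested, write_ports=2))
```

With one write port, I3 and every later instruction are delayed by one cycle; with two write ports, all writes happen in the requested cycles and no stall occurs.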