DIGITAL DESIGN AND Computer Organization Notes (BCS302) 2024

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LECTURE NOTES – MODULE 5


SUBJECT : COMPUTER ORGANIZATION
SUBJECT CODE: BCS302
SEMESTER : III

1 Dept. of CSE, CEC, MANGALORE



MODULE-V
Basic Processing Unit: Some Fundamental Concepts: Register Transfers, Performing ALU
operations, Fetching a word from Memory, Storing a word in memory. Execution of a
Complete Instruction.
Pipelining: Basic concepts, Role of Cache memory, Pipeline Performance.
Text book 2: 7.1, 7.2, 8.1
BASIC PROCESSING UNIT:

The processing unit executes machine instructions and coordinates the activities of the other units. This
unit is also called the instruction set processor, or simply the processor.

5.1 Some Fundamental Concepts


To execute an instruction, the processor has to perform the following three steps:
1. Fetch the contents of memory location pointed to by the PC into the IR. That is,
IR←[[PC]]
2. Assuming that the memory is byte addressable and that each instruction occupies one 4-byte word, increment the contents of the PC by 4, that is,
PC←[PC]+4
3. Carry out the actions specified by the instruction in the IR.
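The three steps can be mimicked in a few lines of Python. This is a minimal sketch under stated assumptions, not a model of real hardware: memory is a dictionary from byte addresses to instruction labels, every instruction is assumed to be 4 bytes long, and `execute` is a hypothetical callback standing in for step 3 (it returns the possibly modified PC, so a branch could redirect it).

```python
def run(memory, pc, execute, max_instructions):
    """Repeat the fetch/increment/execute cycle max_instructions times.

    Assumptions (illustrative only): memory maps byte addresses to
    instructions, and every instruction is 4 bytes long.
    """
    trace = []
    for _ in range(max_instructions):
        ir = memory[pc]          # step 1: IR <- [[PC]]
        pc = pc + 4              # step 2: PC <- [PC] + 4
        pc = execute(ir, pc)     # step 3: carry out the instruction (may change PC)
        trace.append(ir)
    return trace, pc


program = {0: "I1", 4: "I2", 8: "I3"}
# An execute step that never branches simply returns the incremented PC.
trace, final_pc = run(program, 0, lambda ir, pc: pc, 3)
print(trace, final_pc)  # ['I1', 'I2', 'I3'] 12
```

Because the PC is incremented before step 3, an instruction that changes the flow of control only has to overwrite the already-updated PC.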

Single-bus organization of datapath inside a processor


The figure shows an organization in which the arithmetic and logic unit (ALU) and all the registers
are interconnected via a single common bus. The data and address lines of the external memory bus
are connected to the internal processor bus via MDR and MAR, respectively. The control lines of the
memory bus are connected to the instruction decoder and control logic block. This unit is
responsible for issuing the signals that control the operation of all units inside the processor and for
interacting with the memory bus.

The number and use of the processor registers R0 through R(n-1) vary considerably from one
processor to another. Three registers, Y, Z and TEMP, are used by the processor for temporary
storage during execution of some instructions. The multiplexer MUX selects either the output of register Y
(SelectY) or the constant value 4 (Select4) as input A of the ALU. The registers, the ALU and
the interconnecting bus are collectively referred to as the datapath. With few exceptions, an
instruction can be executed by performing one or more of the following operations in some
specified sequence:
• Transfer a word of data from one processor register to another or to the ALU
• Perform an arithmetic or a logic operation and store the result in a processor register
• Fetch the contents of a given memory location and load them into a processor register
• Store a word of data from a processor register into a given memory location
We now consider in detail how each of these operations is implemented, using the simple
processor model.

5.1.1 Register Transfers


Instruction execution involves a sequence of steps in which data are transferred from one
register to another. Each register has input and output gating, and these gates are controlled
by corresponding control signals, as shown in the figure.
• The input and output gates of register Ri are electronic switches
controlled by the control signals Riin and Riout.
• When signal Riin is 1, the data on the bus are loaded into Ri.
• When signal Riout is 1, the contents of register Ri are placed on the bus.
• While Riout is 0, the bus can be used for transferring data from other registers.
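A register-level sketch of this gating in Python, under the simplifying assumption that a transfer cycle has exactly one active Riout signal (two registers driving the bus at once would be a conflict). The register names and the `bus_cycle` helper are illustrative only.

```python
def bus_cycle(registers, out_signals, in_signals):
    """Apply one clock cycle of register transfers on a single shared bus.

    out_signals: registers whose Riout is 1 (must be exactly one).
    in_signals:  registers whose Riin is 1; each loads the bus value.
    """
    if len(out_signals) != 1:
        raise ValueError("exactly one Riout signal may be 1 during a transfer")
    (source,) = out_signals
    bus = registers[source]          # Riout = 1: source register drives the bus
    for name in in_signals:          # Riin = 1: destination loads from the bus
        registers[name] = bus
    return registers


regs = {"R1": 7, "R2": 0, "R4": 0}
bus_cycle(regs, {"R1"}, {"R2", "R4"})   # R1out, R2in, R4in
print(regs)  # {'R1': 7, 'R2': 7, 'R4': 7}
```

Several Riin signals may be active together, because loading the same bus value into more than one register causes no conflict.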


Input and output gating for the registers inside a processor

5.1.2 Performing an Arithmetic or Logic Operation


The ALU performs arithmetic and logic operations on the two operands applied to its A and B
inputs. One operand is the output of the MUX and the other is taken directly from the bus. The
result produced by the ALU is stored temporarily in register Z. A sequence of operations to add
the contents of register R1 to those of R2 and store the sum in R3 is:
1) R1out, Yin // transfer the contents of R1 to register Y
2) R2out, SelectY, Add, Zin // the contents of R2 go directly to input B of the ALU; the
numbers are added and the sum is loaded into register Z
3) Zout, R3in // the sum in register Z is transferred to the destination register R3
The signals listed in each step are activated for the duration of the clock cycle corresponding to that step. All other
signals are inactive.
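The three steps above can be traced with a toy interpreter in Python. This is a sketch of a hypothetical model, not real control hardware: a signal ending in "out" gates that register onto the bus, one ending in "in" loads the register from the bus, the MUX feeds Y or the constant 4 to ALU input A, and the bus feeds ALU input B.

```python
def step(state, signals):
    """Apply one clock cycle of control signals to a toy single-bus datapath."""
    sources = [s[:-3] for s in signals if s.endswith("out")]
    if len(sources) > 1:
        raise ValueError("bus conflict: more than one register output enabled")
    bus = state[sources[0]] if sources else None
    a = 4 if "Select4" in signals else state["Y"]   # MUX output -> ALU input A
    alu = a + bus if "Add" in signals else None     # ALU input B comes from the bus
    for s in signals:
        if s == "Zin":
            state["Z"] = alu                        # Z latches the ALU result
        elif s.endswith("in"):
            state[s[:-2]] = bus                     # other registers latch the bus
    return state


state = {"R1": 5, "R2": 9, "R3": 0, "Y": 0, "Z": 0}
for signals in [{"R1out", "Yin"},
                {"R2out", "SelectY", "Add", "Zin"},
                {"Zout", "R3in"}]:
    step(state, signals)
print(state["R3"])  # 14
```

Registers Y and Z are what make the single bus sufficient: the bus can carry only one value per cycle, so one operand must be parked in Y and the result held in Z until the bus is free again.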


5.1.3 Fetching a word from memory


To fetch a word of information from memory, the processor has to specify the address of the
memory location where this information is stored and request a Read operation. When the
requested data are received from the memory, they are stored in register MDR, from where they
can be transferred to other registers in the processor. The connections for register MDR are
illustrated in the figure.

Connection and Control signals for register MDR


Register MDR has four control signals: MDRin and MDRout control the connection to the internal bus, and
MDRinE and MDRoutE control the connection to the external bus. The control signal MFC (Memory-Function-Completed)
is set to 1 by the addressed device to indicate that the contents of the
specified location have been read and are available on the data lines of the memory bus.

Consider the instruction Move (R1), R2. The memory read operation requires three steps:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Here WMFC is the control signal that causes the processor’s control circuitry to wait for the
arrival of the MFC signal. Assume that the output of MAR is enabled all the time. When a new address is
loaded into MAR, it appears on the memory bus at the beginning of the next clock cycle, as
shown in the timing diagram. A Read control signal is activated at the same time MAR is loaded.
This signal causes the bus interface circuit to send a read command, MR, on the bus. Control
signal MDRinE is activated while waiting for a response from memory. Thus, the data received
from memory are loaded into MDR at the end of the clock cycle in which the MFC signal is
received. In the next clock cycle, MDRout is activated to transfer the data to register R2.
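The effect of WMFC on timing can be sketched in Python. The assumption here is a hypothetical memory that asserts MFC during the `latency`-th cycle of step 2 (latency of at least one); the function name and register dictionary are illustrative only.

```python
def move_indirect_to_register(regs, memory, latency):
    """Simulate Move (R1), R2 and return the number of clock cycles used.

    Assumption: the memory asserts MFC during the `latency`-th cycle of the
    MDRinE, WMFC step, and MDR is loaded at the end of that cycle.
    """
    if latency < 1:
        raise ValueError("latency must be at least one clock cycle")
    cycles = 0

    # Step 1: R1out, MARin, Read
    mar = regs["R1"]
    cycles += 1

    # Step 2: MDRinE, WMFC. The step is held until MFC = 1.
    waited = 0
    mfc = False
    while not mfc:
        cycles += 1
        waited += 1
        mfc = waited == latency
    mdr = memory[mar]            # data loaded into MDR at end of the MFC cycle

    # Step 3: MDRout, R2in
    regs["R2"] = mdr
    cycles += 1
    return cycles


regs = {"R1": 0x100, "R2": 0}
memory = {0x100: 42}
cycles = move_indirect_to_register(regs, memory, latency=3)
print(regs["R2"], cycles)  # 42 5
```

Only step 2 stretches with the memory's response time; steps 1 and 3 always take one cycle each.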


Timing of a memory read operation

5.1.4 Storing a word in memory


To write a word into a memory location, the desired address is loaded into MAR. Then the data to be
written are loaded into MDR, and a Write command is issued. Executing the instruction Move R2,
(R1) requires the following sequence:
1. R1out, MARin
2. R2out , MDRin, Write
3. MDRoutE, WMFC
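A matching Python sketch for the write sequence, again assuming a hypothetical memory that completes the write and asserts MFC after `latency` cycles of step 3. Names are illustrative.

```python
def move_register_to_indirect(regs, memory, latency):
    """Simulate Move R2, (R1) and return the number of clock cycles used.

    Assumption: memory completes the write and asserts MFC after `latency`
    cycles of the MDRoutE, WMFC step (latency >= 1).
    """
    if latency < 1:
        raise ValueError("latency must be at least one clock cycle")
    cycles = 0
    mar = regs["R1"]          # step 1: R1out, MARin
    cycles += 1
    mdr = regs["R2"]          # step 2: R2out, MDRin, Write
    cycles += 1
    cycles += latency         # step 3: MDRoutE, WMFC (held until MFC = 1)
    memory[mar] = mdr         # the word is in memory once MFC is received
    return cycles


regs = {"R1": 0x200, "R2": 99}
memory = {}
cycles = move_register_to_indirect(regs, memory, latency=2)
print(cycles, memory)  # 4 {512: 99}
```

Unlike the read case, MDR is loaded from the internal bus (MDRin) and then drives the external bus (MDRoutE).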

5.2 Execution of a complete instruction


Consider the instruction Add (R3), R1, which adds the contents of the memory location pointed to by
R3 to register R1. Executing this instruction requires the following actions:
1) Fetch the instruction
2) Fetch the first operand
3) Perform the addition
4) Load the result into R1
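The control sequence for execution of this instruction is as follows:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End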


Instruction execution proceeds as follows.


• In step 1, the instruction fetch operation is initiated by loading the contents of the
PC into the MAR and sending a Read request to the memory. The Select signal
is set to Select4, which causes the MUX to select the constant 4. This value is
added to the operand at input B, which is the contents of the PC, and the result
is stored in register Z.
• In step 2, the updated value is moved from register Z back into the PC, and also into
register Y, while waiting for the memory to respond.
• In step 3, the instruction fetched from memory is loaded into the IR.
• In step 4, the contents of R3 are loaded into MAR and a memory Read signal is issued.
• In step 5, the contents of R1 are transferred to Y to prepare for the addition.
• In step 6, when the Read operation is completed, the memory operand is available in MDR,
and the addition is performed.
• In step 7, the sum, which is stored in register Z, is transferred to R1. The End signal causes a
new instruction fetch cycle to begin by returning to step 1.
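The seven control steps for Add (R3), R1 can be run on a toy datapath in Python to check what each register holds at the end. This is a sketch of a hypothetical model: it assumes the memory answers every Read by the next WMFC step (so WMFC never stalls) and treats MDRinE as always enabled. The interpreter and the `pending` field are illustrative, not part of the text.

```python
# Control steps for Add (R3), R1 on the single-bus datapath.
ADD_R3_INDIRECT_R1 = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    {"Zout", "PCin", "Yin", "WMFC"},
    {"MDRout", "IRin"},
    {"R3out", "MARin", "Read"},
    {"R1out", "Yin", "WMFC"},
    {"MDRout", "SelectY", "Add", "Zin"},
    {"Zout", "R1in", "End"},
]


def run_steps(state, memory, steps):
    """Interpret control steps on a toy datapath (hypothetical model)."""
    for signals in steps:
        sources = [s[:-3] for s in signals if s.endswith("out")]
        if len(sources) > 1:
            raise ValueError("bus conflict")
        bus = state[sources[0]] if sources else None
        a = 4 if "Select4" in signals else state["Y"]   # MUX -> ALU input A
        alu = a + bus if "Add" in signals else None     # bus -> ALU input B
        for s in signals:
            if s == "Zin":
                state["Z"] = alu
            elif s.endswith("in"):
                state[s[:-2]] = bus
        if "Read" in signals:
            state["pending"] = state["MAR"]             # read uses the new MAR value
        if "WMFC" in signals:
            state["MDR"] = memory[state["pending"]]     # MFC = 1: data now in MDR
        if "End" in signals:
            break                                       # next step would start a new fetch
    return state


state = {"PC": 0, "IR": None, "MAR": 0, "MDR": 0, "Y": 0, "Z": 0,
         "R1": 10, "R3": 0x40, "pending": None}
memory = {0: "Add (R3), R1", 0x40: 32}
run_steps(state, memory, ADD_R3_INDIRECT_R1)
print(state["R1"], state["PC"], state["IR"])  # 42 4 Add (R3), R1
```

Steps 1 to 3 are the same for every instruction; only steps 4 to 7 depend on the decoded contents of the IR.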

5.2.1 Branch instructions

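The control sequence for an unconditional branch instruction is as follows:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Offset-field-of-IRout, Add, Zin
5. Zout, PCin, End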


Processing starts as usual, and the fetch phase ends in step 3. In step 4, the offset value is
extracted from the IR by the instruction decoding circuit and added to the updated PC value
already held in register Y; the sum is placed in Z. In step 5, the result, which is the
branch target address, is loaded into the PC.
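The address arithmetic of the branch sequence can be checked with a small Python sketch. It assumes 4-byte instructions and an offset measured from the updated PC (the address that follows the branch), which is what adding the offset to the value saved in Y implies; the function name is illustrative.

```python
def branch_target(branch_address, offset, instruction_size=4):
    """Return the new PC produced by an unconditional branch.

    Steps 1-2: PC <- [PC] + 4, and the updated PC is also copied into Y.
    Steps 4-5: Z <- [Y] + offset; PC <- [Z].
    """
    updated_pc = branch_address + instruction_size
    y = updated_pc
    z = y + offset
    return z


print(branch_target(1000, 20))   # 1024: forward branch
print(branch_target(1000, -12))  # 992: backward branch, e.g. closing a loop
```

A negative offset produces a backward branch, which is how a loop returns to its first instruction.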

5.3 Pipelining
• Pipelining is a technique of decomposing a sequential process into suboperations, with
each sub process being executed in a special dedicated segment that operates
concurrently with all other segments.
✓ Decomposing a sequential process into suboperations
✓ Each subprocess is executed in a special dedicated segment concurrently
• The computer is controlled by a clock whose period is such that the fetch and execute
steps of any instruction can each be completed in one clock cycle. Operation of the
computer proceeds as in Figure 5.1c.
• In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and stores it in
buffer B1 at the end of the clock cycle.
• In the second clock cycle, the instruction fetch unit proceeds with the fetch operation for
instruction I2 (step F2). Meanwhile, the execution unit performs the operation specified
by instruction I1, which is available to it in buffer B1 (step E1).
• By the end of the second clock cycle, the execution of instruction I1 is completed and
instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer
needed. Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and execute
units are kept busy all the time.
• If the pattern in Figure 5.1c can be sustained for a long time, the completion rate of
instruction execution will be twice that achievable by the sequential operation depicted in
Figure 5.1a.
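The fetch/execute overlap of Figure 5.1c can be listed cycle by cycle with a short Python sketch, assuming an ideal two-stage pipeline in which every F and E step takes exactly one cycle and nothing stalls.

```python
def two_stage_schedule(n):
    """Return, for each clock cycle, the steps active in the fetch (F) and
    execute (E) units of an ideal two-stage pipeline running n instructions."""
    timeline = []
    for cycle in range(1, n + 2):          # n instructions finish in n + 1 cycles
        steps = []
        if cycle <= n:
            steps.append(f"F{cycle}")      # fetch unit works on I_cycle
        if cycle >= 2:
            steps.append(f"E{cycle - 1}")  # execute unit works on I_(cycle-1) from B1
        timeline.append(steps)
    return timeline


for cycle, steps in enumerate(two_stage_schedule(3), start=1):
    print(cycle, steps)
# 1 ['F1']
# 2 ['F2', 'E1']
# 3 ['F3', 'E2']
# 4 ['E3']
```

For n = 3, sequential operation needs 2n = 6 cycles and the pipeline needs n + 1 = 4; as n grows, the ratio 2n/(n + 1) approaches the factor of two stated above.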


Figure 5.1 Basic idea of instruction pipelining.


In summary, the fetch and execute units in Figure 5.1b constitute a two-stage pipeline in which
each stage performs one step in processing an instruction. An interstage storage buffer, B1, is
needed to hold the information being passed from one stage to the next. New information is
loaded into this buffer at the end of each clock cycle. The processing of an instruction need not
be divided into only two steps.

For example, a pipelined processor may process each instruction in four steps, as follows:

• Fetch (F): read the instruction from the memory
• Decode (D): decode the instruction and fetch the source operand(s)
• Execute (E): perform the operation specified by the instruction
• Write (W): store the result in the destination location

• The sequence of events for this case is shown in Figure 5.2a.


• Four instructions are in progress at any given time. This means that four distinct
hardware units are needed, as shown in Figure 5.2b.
• These units must be capable of performing their tasks simultaneously and without
interfering with one another. Information is passed from one unit to the next through a
storage buffer. As an instruction progresses through the pipeline, all the information
needed by the stages downstream must be passed along.

For example, during clock cycle 4, the information in the buffers is as follows:

• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction-decoding unit.
• Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in
cycle 3. The buffer also holds the information needed for the write step of instruction I2
(step W2). Even though it is not needed by stage E, this information must be passed on to
stage W in the following clock cycle to enable that stage to perform the required Write
operation.
• Buffer B3 holds the results produced by the execution unit and the destination information for
instruction I1.
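The buffer contents listed for cycle 4 follow a simple rule, sketched here in Python under the assumption of an ideal four-stage pipeline with no stalls and I1 fetched in cycle 1: buffer Bk holds what stage k produced in the previous cycle, which is the instruction that entered the pipeline k cycles earlier.

```python
def buffer_contents(cycle, num_buffers=3):
    """Instruction held in each interstage buffer during `cycle`
    (ideal pipeline, no stalls, I1 fetched in cycle 1)."""
    contents = {}
    for k in range(1, num_buffers + 1):
        instr = cycle - k                  # instruction fetched k cycles earlier
        contents[f"B{k}"] = f"I{instr}" if instr >= 1 else None
    return contents


print(buffer_contents(4))  # {'B1': 'I3', 'B2': 'I2', 'B3': 'I1'}
```

During the first cycles some buffers are still empty; for example, in cycle 2 only B1 holds an instruction.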

5.3.1 Role of Cache Memory

• Each pipeline stage is expected to complete in one clock cycle.
• The clock period should be long enough to let the slowest pipeline stage complete.
• Faster stages can only wait for the slowest one to complete.
• Since main memory is very slow compared to execution, pipelining is of little benefit if each
instruction has to be fetched from main memory.
• Fortunately, a cache memory can supply instructions much faster.


Figure 5.2 A 4-stage pipeline.

• A unit that completes its task early is idle for the remainder of the clock period. Hence,
pipelining is most effective in improving performance if the tasks being performed in
different stages require about the same amount of time.
• This consideration is particularly important for the instruction fetch step, which is
assigned one clock period in Figure 5.2a.
• The clock cycle has to be equal to or greater than the time needed to complete a fetch
operation. However, the access time of the main memory may be as much as ten times
greater than the time needed to perform basic pipeline stage operations inside the
processor, such as adding two numbers. Thus, if each instruction fetch required access to
the main memory, pipelining would be of little value.
• The use of cache memories solves the memory access problem.

• In particular, when a cache is included on the same chip as the processor, access time to
the cache is usually the same as the time needed to perform other basic operations inside
the processor. This makes it possible to divide instruction fetching and processing into
steps that are more or less equal in duration.
• Each of these steps is performed by a different pipeline stage, and the clock period is
chosen to correspond to the longest one.
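A worked comparison in Python makes the point concrete. The stage delays are hypothetical (arbitrary time units, with a main-memory fetch taken as ten times slower than the other steps, as suggested above), and the pipelined time assumes an ideal k-stage pipeline needing k + n - 1 cycles for n instructions with no stalls.

```python
def clock_period(stage_delays):
    """The clock period must cover the slowest stage."""
    return max(stage_delays)


def pipelined_time(n, stage_delays):
    """Time for n instructions in an ideal k-stage pipeline with no stalls."""
    k = len(stage_delays)
    return (k + n - 1) * clock_period(stage_delays)


def sequential_time(n, stage_delays):
    """Time for n instructions if each one completes all steps before the next starts."""
    return n * sum(stage_delays)


# Hypothetical stage delays for F, D, E, W.
main_memory_fetch = [10, 1, 1, 1]   # fetch from main memory, ~10x slower
cache_fetch = [1, 1, 1, 1]          # fetch from an on-chip cache

for name, delays in [("main memory", main_memory_fetch), ("cache", cache_fetch)]:
    print(name, sequential_time(100, delays), pipelined_time(100, delays))
# main memory 1300 1030
# cache 400 103
```

With main-memory fetches the slow F stage sets the clock for every stage, so pipelining gains only about 26%; with balanced stages the same pipeline is almost four times faster than sequential execution.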

5.3.2 Pipeline Performance

• The potential increase in performance resulting from pipelining is proportional to the
number of pipeline stages.
• However, this increase would be achieved only if all pipeline stages require the same
time to complete and there is no interruption throughout program execution.
• Unfortunately, in practice these conditions are not always met.

o For a variety of reasons, one of the pipeline stages may not be able to complete its
processing task for a given instruction in the time allotted.
o For example, stage E in the four-stage pipeline of Figure 5.2b is responsible for
arithmetic and logic operations, and one clock cycle is assigned for this task. Although
this may be sufficient for most operations, some operations, such as divide, may require
more time to complete. Figure 5.3 shows an example in which the operation specified in
instruction I2 requires 3 cycles to complete, from cycle 4 through cycle 6.
o Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has no data
to work with.
o Meanwhile, the information in buffer B2 must remain intact until the Execute stage has
completed its operation. This means that stage 2 and, in turn, stage 1 are blocked from
accepting new instructions because the information in B1 cannot be overwritten. Thus,
steps D4 and F5 must be postponed as shown.
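The timing of Figure 5.3 can be reproduced with a minimal in-order pipeline model in Python. The model is an assumption made for illustration: an instruction stays in a stage until it has finished its work and the next stage has been vacated, which is how the blocking of B1 and B2 is described above. `durations[i][s]` is the number of cycles instruction i needs in stage s.

```python
def schedule(durations):
    """Return start[i][s]: the 1-based cycle in which instruction i enters stage s.

    An instruction leaves a stage only after finishing its work there and
    after the previous instruction has left the next stage (blocking buffers).
    """
    n, k = len(durations), len(durations[0])
    start = [[0] * k for _ in range(n)]
    depart = [[0] * k for _ in range(n)]   # last cycle spent in each stage
    for i in range(n):
        # Fetch begins once the previous instruction has left the fetch stage.
        start[i][0] = depart[i - 1][0] + 1 if i > 0 else 1
        for s in range(k):
            done = start[i][s] + durations[i][s] - 1   # last cycle of useful work
            if s == k - 1:
                depart[i][s] = done
            else:
                blocked_until = depart[i - 1][s + 1] if i > 0 else 0
                start[i][s + 1] = max(done, blocked_until) + 1
                depart[i][s] = start[i][s + 1] - 1
    return start


# Five instructions through F, D, E, W; the Execute step of I2 takes 3 cycles.
durations = [[1, 1, 1, 1] for _ in range(5)]
durations[1][2] = 3
for i, row in enumerate(schedule(durations), start=1):
    print(f"I{i}", row)
# I1 [1, 2, 3, 4]
# I2 [2, 3, 4, 7]
# I3 [3, 4, 7, 8]
# I4 [4, 7, 8, 9]
# I5 [7, 8, 9, 10]
```

D4 moves to cycle 7 and F5 to cycle 7, and I5 completes in cycle 10 instead of 8, which is the two-cycle stall discussed below.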


Figure 5.3 Effect of an execution operation taking more than one clock cycle.

Pipelined operation in Figure 5.3 is said to have been stalled for two clock cycles. Normal
pipelined operation resumes in cycle 7.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. So some operation has to
be delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an instruction causes the
pipeline to stall.
• Structural hazard – the situation when two instructions require the use of a given
hardware resource at the same time.

The effect of a cache miss on pipelined operation is illustrated in Figure 5.4.


o Instruction I1 is fetched from the cache in cycle 1, and its execution proceeds normally.
o However, the fetch operation for instruction I2, which is started in cycle 2, results in a
cache miss.
o The instruction fetch unit must now suspend any further fetch requests and wait for I2 to
arrive. We assume that instruction I2 is received and loaded into buffer B1 at the end of
cycle 5.
o The pipeline resumes its normal operation at that point.


Figure 5.4 Pipeline stall caused by a cache miss in F2.

• An alternative representation of the operation of a pipeline in the case of a cache miss is
shown in Figure 5.4b.
• This figure gives the function performed by each pipeline stage in each clock cycle. Note
that the Decode unit is idle in cycles 3 through 5, the Execute unit is idle in cycles 4
through 6, and the Write unit is idle in cycles 5 through 7. Such idle periods are called
stalls. They are also often referred to as bubbles in the pipeline. Once created as a result
of a delay in one of the pipeline stages, a bubble moves downstream until it reaches the
last unit.
• A structural hazard arises when two instructions require the use of a given hardware resource
at the same time. The most common case in which this hazard may arise is in access to
memory. One instruction may need to access memory as part of the Execute or Write
stage while another instruction is being fetched.
• If instructions and data reside in the same cache unit, only one instruction can proceed
and the other instruction is delayed. Many processors use separate instruction and data
caches to avoid this delay.
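The idle periods listed for the cache-miss case (Decode idle in cycles 3 to 5, Execute in 4 to 6, Write in 5 to 7) can be reproduced with a short Python sketch. It assumes that only the fetch of one instruction misses, that every other step takes one cycle, and that later stages never block, which holds here because the delay occurs at the front of the pipeline. The function name and parameters are illustrative.

```python
def bubbles(num_instructions, miss_instr, extra_cycles, stages=("F", "D", "E", "W")):
    """Idle cycles of each stage after F when the fetch of instruction
    `miss_instr` takes `extra_cycles` more than one cycle (a cache miss)."""
    idle = {}
    for s, name in enumerate(stages):
        if s == 0:
            continue  # the fetch unit is waiting on memory, not idle
        busy = set()
        for i in range(1, num_instructions + 1):
            delay = extra_cycles if i >= miss_instr else 0
            busy.add(i + s + delay)     # cycle in which instruction i uses stage s
        idle[name] = sorted(set(range(min(busy), max(busy) + 1)) - busy)
    return idle


# I2 starts its fetch in cycle 2 and arrives at the end of cycle 5: 3 extra cycles.
print(bubbles(num_instructions=3, miss_instr=2, extra_cycles=3))
# {'D': [3, 4, 5], 'E': [4, 5, 6], 'W': [5, 6, 7]}
```

Each later stage is idle for the same three cycles, shifted by one cycle per stage, which is the bubble moving downstream.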


• An example of a structural hazard is shown in Figure 5.5. This figure shows how the load
instruction
Load X(R1),R2
can be accommodated in our example 4-stage pipeline.
• The memory address, X+[R1], is computed in step E2 in cycle 4, then memory access
takes place in cycle 5.
• The operand read from memory is written into register R2 in cycle 6. This means that the
execution step of this instruction takes two clock cycles (cycles 4 and 5).
• This causes the pipeline to stall for one cycle, because both instructions I2 and I3 require
access to the register file in cycle 6.
• Even though the instructions and their data are all available, the pipeline is stalled
because one hardware resource, the register file, cannot handle two operations at once.

Figure 5.5 Effect of a Load instruction on pipeline timing.

• If the register file had two input ports, that is, if it allowed two simultaneous write
operations, the pipeline would not be stalled. In general, structural hazards are avoided by
providing sufficient hardware resources on the processor chip.
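The register-file conflict can be sketched in Python as a simple arbiter that grants at most `ports` writes per cycle in program order. This is an illustrative model, not a description of real register-file hardware; the request cycles come from the Load example above (I2 and I3 both want to write in cycle 6).

```python
def schedule_writes(requests, ports):
    """requests: list of (instruction, earliest_cycle) in program order.
    Returns {instruction: cycle}, granting at most `ports` writes per cycle."""
    used = {}
    granted = {}
    for instr, cycle in requests:
        c = cycle
        while used.get(c, 0) >= ports:   # all write ports busy: wait a cycle
            c += 1
        used[c] = used.get(c, 0) + 1
        granted[instr] = c
    return granted


writes = [("I2", 6), ("I3", 6)]
print(schedule_writes(writes, ports=1))  # {'I2': 6, 'I3': 7}
print(schedule_writes(writes, ports=2))  # {'I2': 6, 'I3': 6}
```

With one write port, I3 is pushed to cycle 7 (the one-cycle stall); adding a second port removes the conflict, as stated above.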


• It is important to understand that pipelining does not result in individual instructions being
executed faster; rather, it is the throughput that increases, where throughput is measured
by the rate at which instruction execution is completed.
• Any time one of the stages in the pipeline cannot complete its operation in one clock
cycle, the pipeline stalls, and some degradation in performance occurs. Thus, the
performance level of one instruction completion in each clock cycle is actually the upper
limit for the throughput achievable in a pipelined processor organized as in Figure 5.2b.
• An important goal in designing processors is to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their impact.
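The difference between latency and throughput can be quantified with a short Python sketch. It assumes each stage normally takes one clock cycle, so n instructions need k + n - 1 cycles on a k-stage pipeline, plus any cycles lost to stalls; the function and parameter names are illustrative.

```python
def pipeline_stats(n, k, stall_cycles=0):
    """Minimum latency and throughput for n instructions on a k-stage pipeline.

    Assumes one clock cycle per stage; stall_cycles are extra cycles lost
    to hazards over the whole run.
    """
    total_cycles = k + n - 1 + stall_cycles
    min_latency = k                  # every instruction still visits all k stages
    throughput = n / total_cycles    # instructions completed per clock cycle
    return min_latency, throughput


print(pipeline_stats(1, 4))          # (4, 0.25)
print(pipeline_stats(1000, 4))       # throughput just under 1 instruction per cycle
print(pipeline_stats(1000, 4, 200))  # stalls pull throughput further below 1
```

Latency never drops below k cycles, while throughput approaches, but never exceeds, one instruction per cycle, the upper limit stated above.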


Questions:
1. Write down the control sequence for the execution of the instruction Add
(R3), R1 (execution of a complete instruction).
[What are the actions required to execute the complete instruction Add (R3),
R1?]

2. Write the control sequence for an unconditional branch instruction.

3. Discuss, with a neat diagram, the single-bus organization of the datapath inside a
processor.
4. Write and discuss the microroutine for the complete execution of the instruction
Add (R1), R2 in a single-bus organization.
5. Write the sequence of control steps required for a single-bus structure for each
of the following instructions:
(i) Add the contents of memory location NUM to register R1.
(ii) Add the contents of the memory location whose address is at memory
location NUM to register R1.
6. Explain the role of cache memory in pipelining.
7. Explain pipeline performance.
8. Explain pipelining and data hazards.
