Notes Co Unit3

Uploaded by

soumyaks81
Copyright © All Rights Reserved

Processing Unit

To execute a program, the processor fetches one instruction at a time and performs the operations
specified. Instructions are fetched from successive memory locations until a branch or a jump
instruction is encountered. The processor keeps track of the address of the memory location
containing the next instruction to be fetched using the program counter, PC. After fetching an
instruction, the contents of the PC are updated to point to the next instruction in the sequence. A
branch instruction may load a different value into the PC.
Another key register in the processor is the instruction register, IR. Suppose that each instruction
comprises 4 bytes, and that it is stored in one memory word. To execute an instruction, the
processor has to perform the following three steps:
1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are
interpreted as an instruction to be executed. Hence, they are loaded into the IR. Symbolically,
this can be written as
IR ⟵ [[PC]]
2. Assuming that the memory is byte addressable, increment the contents of the PC by 4, that is,
PC ⟵ [PC]+4
3. Carry out the actions specified by the instruction in the IR.

In cases where an instruction occupies more than one word, steps 1 and 2 must be repeated as many
times as necessary to fetch the complete instruction. These two steps are usually referred to as the
fetch phase; step 3 constitutes the execution phase.
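The three steps above can be traced with a small simulation. This is a minimal sketch, not part of the original notes: the memory contents, the `Halt` placeholder instruction, and the function name `run` are assumptions made for illustration, and step 3 is only recorded, not actually executed.

```python
# Hypothetical byte-addressable memory: address -> one 4-byte instruction word.
memory = {0: "Add (R3), R1", 4: "Move (R1), R2", 8: "Halt"}

def run(memory, start=0):
    """Trace the fetch phase; 'Halt' is a made-up instruction that stops the loop."""
    pc = start
    trace = []
    while True:
        ir = memory[pc]          # step 1: IR <- [[PC]]
        pc = pc + 4              # step 2: PC <- [PC] + 4
        trace.append((ir, pc))   # step 3 would carry out the instruction in IR
        if ir == "Halt":
            return trace

trace = run(memory)
```

Each trace entry pairs the fetched instruction with the PC value after the increment, so the PC always points one word past the instruction currently in the IR.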
Figure shows the processor organization in which the arithmetic and logic unit (ALU) and all the
registers are interconnected via a single common bus. This bus is internal to the processor.
The data and address lines of the external memory bus are connected to the internal processor bus
via the memory data register, MDR, and the memory address register, MAR, respectively. Register
MDR has two inputs and two outputs. Data may be loaded into MDR either from the memory bus
or from the internal processor bus. The data stored in MDR may be placed on either bus. The input
of MAR is connected to the internal bus, and its output is connected to the external bus. The control
lines of the memory bus are connected to the instruction decoder and control logic block. This unit
is responsible for issuing the signals that control the operation of all the units inside the processor
and for interacting with the memory bus.
The number and use of the processor registers R0 through
R(n - 1) vary considerably from one processor to another.
Registers may be provided for general-purpose use by the
programmer. Some may be dedicated as special-purpose
registers, such as index registers or stack pointers. Three
registers, Y, Z, and TEMP in Figure are transparent to the
programmer, that is, the programmer need not be concerned
with them because they are never referenced explicitly by
any instruction. They are used by the processor for
temporary storage during execution of some instructions.
These registers are never used for storing data generated by
one instruction for later use by another instruction.
The multiplexer MUX selects either the output of register Y
or a constant value 4 to be provided as input A of the ALU.
The constant 4 is used to increment the contents of the
program counter.
As instruction execution progresses, data are transferred
from one register to another, often passing through the ALU
to perform some arithmetic or logic operation. The
instruction decoder and control logic unit is responsible for
implementing the actions specified by the instruction loaded
in the IR register. The decoder generates the control signals

Computer Organisation (R’21) Unit III Page | 1


needed to select the registers involved and direct the transfer of data. The registers, the ALU, and the
interconnecting bus are collectively referred to as the datapath.
With few exceptions, an instruction can be executed by performing one or more of the following
operations in some specified sequence:
• Transfer a word of data from one processor register to another or to the ALU
• Perform an arithmetic or a logic operation and store the result in a processor register
• Fetch the contents of a given memory location and load them into a processor register
• Store a word of data from a processor register into a given memory location
REGISTER TRANSFERS
Instruction execution involves a sequence of steps in which data are
transferred from one register to another. For each register, two control
signals are used to place the contents of that register on the bus or to
load the data on the bus into the register. This is represented
symbolically in Figure. The input and output of register Ri are
connected to the bus via switches controlled by the signals Riin and
Riout respectively. When Riin is set to 1, the data on the bus are
loaded into Ri. Similarly, when Riout is set to 1, the contents of
register Ri are placed on the bus. While Riout is equal to 0, the bus
can be used for transferring data from other registers.
Suppose that we wish to transfer the contents of register R1 to register
R4. This can be accomplished as follows:
• Enable the output of register R1 by setting R1out to 1. This
places the contents of R1 on the processor bus.
• Enable the input of register R4 by setting R4in to 1. This loads
data from the processor bus into register R4.
All operations and data transfers within the processor take place
within time periods defined by the processor clock. The control
signals that govern a particular transfer are asserted at the start of the
clock cycle. In our example, R1out and R4in are set to 1. The
registers consist of edge-triggered flip-flops. Hence, at the next active
edge of the clock, the flip-flops that constitute R4 will load the data
present at their inputs. At the same time, the control signals R1out
and R4in will return to 0. Other timing schemes are also possible. For
example, data transfers may use both the rising and falling edges of
the clock. Also, when edge-triggered flip-flops are not used, two or
more clock signals may be needed to guarantee proper transfer of data.
This is known as multiphase clocking.
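The R1-to-R4 transfer can be sketched as one clock cycle on a shared bus. This is an illustrative model only: the function `clock_cycle` and its parameters are assumed names, and it ignores timing details such as setup time and the clock edge itself.

```python
def clock_cycle(regs, out_signal, in_signals):
    """One clock cycle on a single bus: one register drives it, others may load."""
    bus = regs[out_signal]        # e.g. R1out = 1 places the contents of R1 on the bus
    new_regs = dict(regs)
    for name in in_signals:       # e.g. R4in = 1 loads the bus value into R4
        new_regs[name] = bus
    return new_regs

regs = {"R1": 25, "R4": 0}
regs = clock_cycle(regs, out_signal="R1", in_signals=["R4"])
```

Taking a single `out_signal` mirrors the constraint that only one register output can drive the bus in a given cycle, while any number of registers may load from it.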
PERFORMING AN ARITHMETIC OR LOGIC OPERATION
The ALU is a combinational circuit that has no internal storage. It performs arithmetic and logic
operations on the two operands applied to its A and B inputs. In the previous figures, one of the
operands is the output of the multiplexer MUX and the other operand is obtained directly from the
bus. The result produced by the ALU is stored temporarily in register Z. Therefore, a sequence of
operations to add the contents of register R1 to those of register R2 and store the result in register R3
is
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in

The signals whose names are given in any step are activated for the duration of the clock cycle
corresponding to that step. All other signals are inactive. Hence, in step 1, the output of register R1
and the input of register Y are enabled, causing the contents of R1 to be transferred over the bus to
Y. In step 2, the multiplexer's Select signal is set to SelectY, causing the multiplexer to gate the
contents of register Y to input A of the ALU. At the same time, the contents of register R2 are gated
onto the bus and, hence, to input B. The function performed by the ALU depends on the signals
applied to its control lines. In this case, the Add line is set to 1, causing the output of the ALU to be
the sum of the two numbers at inputs A and B. This sum is loaded into register Z because its input
control signal is activated. In step 3, the contents of register Z are transferred to the destination
register, R3. This last transfer cannot be carried out during step 2, because only one register output
can be connected to the bus during any clock cycle.
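The three-step addition can be followed with concrete values. The sketch below is a hand-unrolled illustration under assumed register contents (7 and 5); the dictionary `regs` and the variable names are not from the notes.

```python
regs = {"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin
bus = regs["R1"]
regs["Y"] = bus

# Step 2: R2out, SelectY, Add, Zin
bus = regs["R2"]              # R2 drives the bus, which feeds ALU input B
a_input = regs["Y"]           # SelectY: MUX passes Y (not the constant 4) to input A
regs["Z"] = a_input + bus     # Add: the ALU output A + B is loaded into Z

# Step 3: Zout, R3in
bus = regs["Z"]
regs["R3"] = bus
```

The variable `bus` is reassigned exactly once per step, which is why the result needs its own step 3: Z cannot drive the bus while R2 is still driving it in step 2.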
FETCHING A WORD FROM MEMORY
To fetch a word of information from memory, the processor has to
specify the address of the memory location where this information
is stored and request a Read operation. This applies whether the
information to be fetched represents an instruction in a program or
an operand specified by an instruction. The processor transfers the
required address to the MAR, whose output is connected to the
address lines of the memory bus. At the same time, the processor
uses the control lines of the memory bus to indicate that a Read operation is needed. When the
requested data are received from the memory they are stored in register MDR, from where they can
be transferred to other registers in the processor.
The connections for register MDR are illustrated in Figure. It has four control signals: MDRin and
MDRout control the connection to the internal bus, and MDRinE and MDRoutE control the
connection to the external bus.
During memory Read and Write operations, the timing of internal processor operations must be
coordinated with the response of the addressed device on the memory bus. The processor completes
one internal data transfer in one clock cycle. The speed of operation of the addressed device varies
with the device. To accommodate the variability in response time, the processor waits until it
receives an indication that the requested Read operation has been completed. A control signal called
Memory-Function-Completed (MFC) is used for this purpose. The addressed device sets this signal
to 1 to indicate that the contents of the specified location have been read and are available on the
data lines of the memory bus.
As an example of a read operation, consider the instruction Move (R1), R2. The actions needed to
execute this instruction are:
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
These actions may be carried out as separate steps, but some can be combined into a single step.
Each action can be completed in one clock cycle, except action 3 which requires one or more clock
cycles, depending on the speed of the addressed device.
Let us assume that the output of MAR is enabled all the time. The contents of MAR are always
available on the address lines of the memory bus. This is the case when the processor is the bus
master. When a new address is loaded into MAR, it will appear on the memory bus at the beginning
of the next clock cycle. A Read control signal is activated at the same time MAR is loaded. This
signal will cause the bus interface circuit to send a read command, MR, on the bus. With this
arrangement, we have combined actions 1 and 2 above into a single control step. Actions 3 and 4
can also be combined by activating control signal MDRinE while waiting for a response from the
memory. Thus, the data received from the memory are loaded into MDR at the end of the clock
cycle in which the MFC signal is received. In the next clock cycle, MDRout is activated to transfer
the data to register R2. This means that the memory read operation requires three steps, which can
be described by the signals being activated as follows:

1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
where WMFC is the control signal that causes the processor's control circuitry to wait for the arrival
of the MFC signal.
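The effect of WMFC on timing can be sketched by counting clock cycles. This is a simplified model, not the notes' own: the function `read_word` and the `latency` parameter (the number of cycles, at least 1, before the memory asserts MFC) are assumptions for illustration.

```python
def read_word(regs, memory, latency):
    """Simulate Move (R1), R2 and return the number of clock cycles used."""
    cycles = 0
    # Step 1: R1out, MARin, Read
    regs["MAR"] = regs["R1"]
    cycles += 1
    # Step 2: MDRinE, WMFC -- the processor stays in this step until MFC arrives
    waited = 0
    mfc = False
    while not mfc:
        waited += 1
        cycles += 1
        mfc = waited >= latency
    regs["MDR"] = memory[regs["MAR"]]   # loaded in the cycle in which MFC is received
    # Step 3: MDRout, R2in
    regs["R2"] = regs["MDR"]
    cycles += 1
    return cycles

regs = {"R1": 100, "R2": 0, "MAR": 0, "MDR": 0}
cycles = read_word(regs, {100: 42}, latency=3)
```

With a latency of 3, the read takes 1 + 3 + 1 = 5 cycles even though it consists of only three control steps, because step 2 is stretched by the wait.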
STORING A WORD IN MEMORY
Writing a word into a memory location follows a similar procedure. The desired address is loaded
into MAR. Then, the data to be written are loaded into MDR, and a Write command is issued.
Hence, executing the instruction Move R2, (R1) requires the following sequence:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
As in the case of the read operation, the Write control signal causes the memory bus interface
hardware to issue a Write command on the memory bus. The processor remains in step 3 until the
memory operation is completed and an MFC response is received.
EXECUTION OF A COMPLETE INSTRUCTION
Consider the instruction Add (R3), R1 which adds the contents of a memory location pointed to by
R3 to register R1. Executing this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
The sequence of control steps required to perform these operations for the single-bus
architecture is given below.
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
In step 1, the instruction fetch operation is initiated by loading the contents of the PC into the MAR
and sending a Read request to the memory. The Select signal is set to Select4, which causes the
multiplexer MUX to select the constant 4. This value is added to the operand at input B, which is
the contents of the PC, and the result is stored in register Z. The updated value is moved from
register Z back into the PC during step 2, while waiting for the memory to respond. In step 3, the
word fetched from the memory is loaded into the IR.
Steps 1 through 3 constitute the instruction fetch phase, which is the same for all instructions. The
instruction decoding circuit interprets the contents of the IR at the beginning of step 4. This enables
the control circuitry to activate the control signals for steps 4 through 7, which constitute the
execution phase. The contents of register R3 are transferred to the MAR in step 4, and a memory
read operation is initiated. Then the contents of R1 are transferred to register Y in step 5, to prepare
for the addition operation. When the Read operation is completed, the memory operand is available
in register MDR, and the addition operation is performed in step 6. The contents of MDR are gated
to the bus, and thus also to the B input of the ALU, and register Y is selected as the second input to
the ALU by choosing SelectY. The sum is stored in register Z, then transferred to R1 in step 7. The
End signal causes a new instruction fetch cycle to begin by returning to step 1.
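The seven control steps can be replayed on a toy register file to check that they produce the expected result. This is a rough sketch with assumed values: the function name `execute_add`, the register contents, and the memory map are illustrative, and the WMFC wait is collapsed into an immediate memory response.

```python
def execute_add(regs, memory):
    """Replay the control steps of Add (R3), R1 on the single-bus datapath."""
    # 1. PCout, MARin, Read, Select4, Add, Zin
    regs["MAR"] = regs["PC"]
    regs["Z"] = 4 + regs["PC"]
    # 2. Zout, PCin, Yin, WMFC
    regs["PC"] = regs["Z"]
    regs["Y"] = regs["Z"]
    regs["MDR"] = memory[regs["MAR"]]   # memory responds (MFC); wait not modelled
    # 3. MDRout, IRin
    regs["IR"] = regs["MDR"]
    # 4. R3out, MARin, Read
    regs["MAR"] = regs["R3"]
    # 5. R1out, Yin, WMFC
    regs["Y"] = regs["R1"]
    regs["MDR"] = memory[regs["MAR"]]   # operand arrives while Y is being loaded
    # 6. MDRout, SelectY, Add, Zin
    regs["Z"] = regs["Y"] + regs["MDR"]
    # 7. Zout, R1in, End
    regs["R1"] = regs["Z"]
    return regs

memory = {0: "Add (R3), R1", 200: 32}
regs = execute_add({"PC": 0, "R1": 10, "R3": 200}, memory)
```

Note that Y is written twice: in step 2 it receives the updated PC (useful for branch instructions), and in step 5 it is overwritten with R1 for the addition.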
BRANCH INSTRUCTIONS
A branch instruction replaces the contents of the PC with the branch target address. This address is
usually obtained by adding an offset X, which is given in the branch instruction, to the updated
value of the PC. In the control sequence that implements an unconditional branch
instruction, processing starts with the fetch phase. This phase ends when the instruction is loaded
into the IR in step 3. The offset value is extracted from the IR by the instruction decoding circuit,
which will also perform sign extension if required. Since the value of the updated PC is already
available in register Y, the offset X is gated onto the bus in step 4, and an addition operation is
performed. The result, which is the branch target address, is loaded into the PC in step 5.
The offset X used in a branch instruction is usually the difference between the branch target address
and the address immediately following the branch instruction. For example, if the branch instruction
is at location 2000 and if the branch target address is 2050, the value of X must be 46. The PC is
incremented during the fetch phase, before knowing the type of instruction being executed. Thus,
when the branch address is computed in step 4, the PC value used is the updated value, which points
to the instruction following the branch instruction in the memory.
Consider now a conditional branch. In this case, we need to check the status of the condition codes
before loading a new value into the PC. For example, for a Branch-on-negative (Branch<0)
instruction, step 4 in the above control sequence is replaced with
Offset-field-of-IRout, Add, Zin, If N = 0 then End
Thus, if N= 0 the processor returns to step 1 immediately after step 4. If N = 1, step 5 is performed
to load a new value into the PC, thus performing the branch operation.
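The offset arithmetic and the Branch<0 decision can be checked numerically. This is a minimal sketch assuming 4-byte instructions as in the rest of these notes; the function names `branch_offset` and `next_pc` are made up for illustration.

```python
def branch_offset(branch_addr, target_addr, instr_size=4):
    """Offset X measured from the updated PC (the address after the branch)."""
    return target_addr - (branch_addr + instr_size)

def next_pc(branch_addr, offset, n_flag, instr_size=4):
    """Branch<0: the branch is taken only when the N condition flag is 1."""
    updated_pc = branch_addr + instr_size   # PC after the fetch phase
    if n_flag == 0:
        return updated_pc                   # "If N = 0 then End": fall through
    return updated_pc + offset              # step 5 loads the branch target into the PC

x = branch_offset(2000, 2050)
taken = next_pc(2000, x, n_flag=1)
not_taken = next_pc(2000, x, n_flag=0)
```

For the example in the text, the updated PC is 2004, so X = 2050 - 2004 = 46; the branch lands on 2050 when N = 1 and on 2004 otherwise.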

HARDWIRED CONTROL
The techniques to generate the control signals
needed to execute instructions in the proper
sequence fall into one of two categories: hardwired
control and microprogrammed control.
In the sequence of control signals, each step is
completed in one clock period. A counter may be
used to keep track of the control steps, as shown in
Figure. Each state, or count, of this counter
corresponds to one control step. The required
control signals are determined by the following
information:
• Contents of the control step counter
• Contents of the instruction register
• Contents of the condition code flags
• External input signals, such as MFC and
interrupt requests
The decoder/encoder block in Figure is a combinational circuit that generates the required control
outputs, depending on the state of all its inputs. The step decoder provides a separate signal line for
each step, or time slot, in the control sequence. The output of the instruction decoder consists of a
separate line for each machine instruction. For any instruction loaded in the IR, one of the output
lines INS1 through INSm is set to 1, and all other lines are set to 0. The input signals to the encoder
block in Figure are combined to generate the individual control signals Yin, PCout, Add, End, and
so on. An example of how the encoder generates the Zin control signal is given in Figure. This
circuit implements the logic function
Zin = T1 + T6 · ADD + T4 · BR + ...
This signal is asserted during time slot T1 for all instructions,
during T6 for an Add instruction, during T4 for an unconditional
branch instruction, and so on. The sequence of operations carried
out by the machine is determined by the wiring of the logic
elements, hence the name "hardwired". A controller that uses this
approach can operate at high speed. However, it has little flexibility, and the
complexity of the instruction set it can implement is limited.
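The Zin logic function can be written as a Boolean expression over the step-decoder and instruction-decoder outputs. The sketch below covers only the three terms listed in the text (the trailing terms of the equation are omitted), and the function name `z_in` and the instruction labels `"ADD"` and `"BR"` are assumptions.

```python
def z_in(step, instruction):
    """Zin = T1 + T6.ADD + T4.BR, using only the terms given in the text."""
    t = {i: step == i for i in range(1, 8)}   # step decoder: one-hot lines T1..T7
    add = instruction == "ADD"                # instruction decoder output lines
    br = instruction == "BR"
    return t[1] or (t[6] and add) or (t[4] and br)

zin_by_step_for_add = [z_in(step, "ADD") for step in range(1, 8)]
```

For an Add instruction this yields a 1 only in steps 1 and 6, matching the two steps of the Add (R3), R1 sequence that list Zin.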

A COMPLETE PROCESSOR
A complete processor can be designed using the structure
shown in Figure. This structure has an instruction unit
that fetches instructions from an instruction cache or from
the main memory when the desired instructions are not
already in the cache. It has separate processing units to
deal with integer data and floating-point data. A data
cache is inserted between these units and the main
memory. Using separate caches for instructions and data
is common practice in many processors today. Other
processors use a single cache that stores both instructions
and data. The processor is connected to the system bus
and to the rest of the computer, by means of a bus
interface.
MICROPROGRAMMED CONTROL
In microprogrammed control, the control signals are
generated by a program similar to machine language
programs.
A control word (CW) is a word whose individual bits
represent the various control signals. Each of the control
steps in the control sequence of an instruction
defines a unique combination of 1s and 0s in the
CW. The CWs corresponding to the 7 steps of
Add (R3), R1 are shown in Figure. We have
assumed that SelectY is represented by Select = 0
and Select4 by Select = 1. A sequence of CWs
corresponding to the control sequence of a
machine instruction constitutes the microroutine
for that instruction, and the individual control
words in this microroutine are referred to as
microinstructions.
The microroutines for all instructions in the instruction set of a computer are
stored in a special memory called the control store. The control unit can
generate the control signals for any instruction by sequentially reading the
CWs of the corresponding microroutine from the control store. To read the
control words sequentially from the control store, a microprogram counter
(μPC) is used. Every time a new instruction is loaded into the IR, the output
of the block labeled "starting address generator" is loaded into the μPC. The
μPC is then automatically incremented by the clock, causing successive
microinstructions to be read from the control store. Hence, the control
signals are delivered to various parts of the processor in the correct
sequence.
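Sequential reading of a microroutine can be sketched as a loop over a control store. This is a simplified illustration: each control word is modelled as the set of signal names that are 1 (rather than a bit vector with a single Select bit), the whole routine for Add (R3), R1 is stored contiguously from address 0, and the names `CONTROL_STORE` and `run_microroutine` are assumed.

```python
# Hypothetical control store: each entry lists the signals set to 1 in that CW.
CONTROL_STORE = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    {"Zout", "PCin", "Yin", "WMFC"},
    {"MDRout", "IRin"},
    {"R3out", "MARin", "Read"},
    {"R1out", "Yin", "WMFC"},
    {"MDRout", "SelectY", "Add", "Zin"},
    {"Zout", "R1in", "End"},
]

def run_microroutine(control_store, start_address):
    """Read CWs one after another via the microprogram counter until End appears."""
    upc = start_address               # loaded from the starting address generator
    issued = []
    while True:
        cw = control_store[upc]       # one microinstruction per clock
        issued.append(cw)
        if "End" in cw:
            return issued
        upc += 1                      # the uPC is incremented automatically

issued = run_microroutine(CONTROL_STORE, 0)
```

Changing the behaviour of an instruction then means editing entries in the control store rather than rewiring logic, which is the flexibility that hardwired control lacks.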

INSTRUCTION PIPELINING
An instruction has a number of stages. Consider
subdividing instruction processing into two stages: fetch
instruction and execute instruction. There are times during
the execution of an instruction when main memory is not
being accessed. This time could be used to fetch the next
instruction in parallel with the execution of the current
one. Figure(a) depicts this approach. The pipeline has two
independent stages. The first stage fetches an instruction
and buffers it. When the second stage is free, the first
stage passes it the buffered instruction. While the second stage is executing the instruction, the first
stage takes advantage of any unused memory cycles to fetch and buffer the next instruction. This is
called instruction prefetch or fetch overlap. Note that this approach, which involves instruction
buffering, requires more registers. This process will speed up instruction execution. If the fetch and
execute stages were of equal duration, the instruction cycle time would be halved. However, if we
look more closely at this pipeline (Figure(b)), we will see that this doubling of execution rate is
unlikely for two reasons:
1. The execution time will generally be longer than the fetch time. Execution will involve
reading and storing operands and the performance of some operation. Thus, the fetch stage
may have to wait for some time before it can empty its buffer.
2. A conditional branch instruction makes the address of the next instruction to be fetched
unknown. Thus, the fetch stage must wait until it receives the next instruction address from
the execute stage. The execute stage may then have to wait while the next instruction is
fetched.
To gain further speedup, the pipeline must have more stages. Let us consider the following
decomposition of the instruction processing.
• Fetch instruction (FI): Read the next expected instruction into a buffer.
• Decode instruction (DI): Determine the
opcode and the operand specifiers.
• Calculate operands (CO): Calculate the
effective address of each source operand. This
may involve displacement, register indirect,
indirect, or other forms of address calculation.
• Fetch operands (FO): Fetch each operand
from memory. Operands in registers need not
be fetched.
• Execute instruction (EI): Perform the indicated
operation and store the result, if any, in the
specified destination operand location.
• Write operand (WO): Store the result in
memory.
Figure shows that a six-stage pipeline can reduce the
execution time for 9 instructions from 54 time units to 14 time units.
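The 54 and 14 figures follow from simple counting under ideal conditions: every stage takes one time unit, every instruction passes through all stages, and nothing stalls. The sketch below states that arithmetic; the function names are assumptions, and the formula k + (n - 1) is the standard ideal-pipeline count rather than something stated explicitly in these notes.

```python
def sequential_time(n_instructions, k_stages):
    """Without pipelining, each instruction occupies all k stages in turn."""
    return n_instructions * k_stages

def pipelined_time(n_instructions, k_stages):
    """Ideal pipeline: k units to fill, then one instruction completes per unit."""
    return k_stages + (n_instructions - 1)

unpipelined = sequential_time(9, 6)
pipelined = pipelined_time(9, 6)
```

The first instruction needs all 6 time units to pass through the pipeline, and each of the remaining 8 instructions finishes one time unit after its predecessor, giving 6 + 8 = 14.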
The diagram assumes that each instruction goes through all six stages of the pipeline. This will not
always be the case. Also, the diagram assumes that all of the stages can be performed in parallel. It
is assumed that there are no memory conflicts. If the six stages are not of equal duration, there will
be some waiting involved at various pipeline stages. Another difficulty is the conditional branch
instruction, which can invalidate several instruction fetches. A
similar unpredictable event is an interrupt. The CO stage may
depend on the contents of a register that could be altered by a
previous instruction that is still in the pipeline. Other such register
and memory conflicts could occur. The system must contain logic
to account for this type of conflict.
Pipeline Hazards
A pipeline hazard occurs when the pipeline, or some portion of the
pipeline, must stall because conditions do not permit continued
execution. Such a pipeline stall is also referred to as a pipeline
bubble. There are three types of hazards: resource, data, and
control.
RESOURCE HAZARDS A resource hazard occurs when two (or
more) instructions that are already in the pipeline need the same
resource. The result is that the instructions must be executed in
serial rather than parallel for a portion of the pipeline. A resource
hazard is sometimes referred to as a structural hazard.

Consider an example of a resource hazard. Assume a simplified five-stage pipeline, in which each
stage takes one clock cycle. Assume that main memory has a single port and that all instruction
fetches and data reads and writes must be performed one at a time. An operand read from or write to
memory cannot be performed in parallel with an instruction fetch. Suppose the source operand for
instruction I1 is in memory, rather than a register, and all other operands are in registers. The fetch
instruction stage of the pipeline must then idle for one cycle before beginning the instruction fetch
for instruction I3.
Another example of a resource conflict is a situation in which multiple instructions are ready to
enter the execute instruction phase and there is a single ALU. One solution to such resource
hazards is to increase available resources, such as having multiple ports into main memory and
multiple ALUs.
DATA HAZARDS A data hazard occurs when there is a conflict in
the access of an operand location. In general terms, we can state
the hazard in this form: Two instructions in a program are to be
executed in sequence and both access a particular memory or
register operand. If the two instructions are executed in strict
sequence, no problem occurs. If the instructions are executed in a
pipeline, then it is possible for the operand value to be updated in
such a way as to produce a different result than would occur with
strict sequential execution. The program produces an incorrect
result because of the use of pipelining.
As an example, consider the following x86 machine instruction sequence:
ADD EAX, EBX /* EAX = EAX + EBX */
SUB ECX, EAX /* ECX = ECX - EAX */
The first instruction adds the contents of the 32-bit registers EAX and EBX and stores the result in
EAX. The second instruction subtracts the contents of EAX from ECX and stores the result in ECX.
The ADD instruction does not update register EAX until the end of stage 5, which occurs at clock
cycle 5. But the SUB instruction needs that value at the beginning of its stage 2, which occurs at
clock cycle 4. To maintain correct operation, the pipeline must stall for two clock cycles. Thus, in
the absence of special hardware and specific avoidance algorithms, such a data hazard results in
inefficient pipeline usage.
There are three types of data hazards:
• Read after write (RAW), or true dependency: An instruction modifies a register or memory
location and a succeeding instruction reads the data in that memory or register location. A
hazard occurs if the read takes place before the write operation is complete.
• Write after read (WAR), or antidependency: An instruction reads a register or memory
location and a succeeding instruction writes to the location. A hazard occurs if the write
operation completes before the read operation takes place.
• Write after write (WAW), or output dependency: Two instructions both write to the same
location. A hazard occurs if the write operations take place in the reverse order of the
intended sequence.
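The three dependency types can be detected by comparing the read and write sets of two instructions in program order. This is a small sketch with an assumed representation (each instruction as a pair of sets of location names) and an assumed function name `classify`. It reports only potential hazards: whether a stall actually occurs depends on the pipeline timing.

```python
def classify(first, second):
    """Return the dependency types from `first` to the later instruction `second`.

    Each instruction is a (reads, writes) pair of sets of location names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    hazards = set()
    if writes1 & reads2:
        hazards.add("RAW")   # second reads what first writes
    if reads1 & writes2:
        hazards.add("WAR")   # second writes what first reads
    if writes1 & writes2:
        hazards.add("WAW")   # both write the same location
    return hazards

add = ({"EAX", "EBX"}, {"EAX"})   # ADD EAX, EBX
sub = ({"ECX", "EAX"}, {"ECX"})   # SUB ECX, EAX
result = classify(add, sub)
```

For the ADD/SUB pair above, only the RAW dependency on EAX is reported, which is the case that forces the stall described in the text.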
CONTROL HAZARDS A control hazard, also known as a branch hazard, occurs when the pipeline
makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline
that must subsequently be discarded.
