COA Notes

Uploaded by

Fact Guru
Copyright
© © All Rights Reserved
CHAPTER 2

COMPUTER ORGANIZATION AND ARCHITECTURE

Syllabus: Machine instructions and addressing modes, ALU and data path, CPU control design, memory interface,
I/O interface (interrupt and DMA mode), instruction pipelining, cache and main memory, secondary storage

2.1 INTRODUCTION

Computer architecture and organization is the science of interconnecting hardware components and of designing and configuring the hardware/software interface to fulfill the functional and performance goals of a computer. This chapter outlines the basic hardware structure of a modern digital programmable computer, the basic laws for performance evaluation, the design of the control and data path hardware for a processor, the concept of pipelining for executing machine instructions simultaneously, and the design of fast memory and storage systems.

Computer architecture deals with the structure and behaviour of a computer system as viewed by the user. It encompasses instruction formats, the instruction set architecture (ISA) and addressing modes.

Computer organization deals with the operation and interconnection of the various hardware components.

2.2 COMPUTER ARCHITECTURE

2.2.1 Register Set

The computer needs registers for processing and manipulating data and for holding memory addresses that are available to the machine-code programmer. Some registers for a basic computer are given in Table 2.1.

Table 2.1 | Types of registers and their functions

Register Symbol   Register Name       Function
DR                Data register       Holds memory operand
ACC               Accumulator         Special purpose processor register
AR                Address register    Holds address for memory
(Continued)


Table 2.1 | Continued

Register Symbol   Register Name          Function
IR                Instruction register   Holds an instruction that is to be executed
PC                Program counter        Holds address of instruction to be executed next
TR                Temporary register     Holds temporary data if required
INPR              Input register         Holds input character
OUTR              Output register        Holds output character

When the system consists of multiple frequent cases, where i indexes the frequent cases, F_i is the fraction of the old execution time spent in case i and S_i is the speedup of that case, the overall speedup is

$$S_{overall} = \left[\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}\right]^{-1}$$

Problem 2.1: Consider a hypothetical processor used in mathematical model simulation. It consists of two functional units, floating point and integer. When the floating-point unit is enhanced, it runs two times faster, but only 10% of the instructions are floating point. What is the speedup?

Solution: Here S = 2, F = 0.1

$$S_{overall} = \left[(1 - 0.1) + \frac{0.1}{2}\right]^{-1} = \frac{1}{0.95} \approx 1.053$$
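The result of Problem 2.1 can be checked numerically. The following is a minimal Python sketch of the speedup formula for one or more enhanced cases; the function name `amdahl_speedup` and its list-based interface are illustrative choices, not part of the text.

```python
def amdahl_speedup(fractions, speedups):
    """Overall speedup when several cases are enhanced.

    fractions[i] is the fraction F_i of the old execution time spent in case i;
    speedups[i] is the factor S_i by which that case becomes faster.
    """
    if len(fractions) != len(speedups):
        raise ValueError("fractions and speedups must have the same length")
    unenhanced = 1.0 - sum(fractions)          # (1 - sum of F_i)
    if unenhanced < 0:
        raise ValueError("fractions must not sum to more than 1")
    enhanced = sum(f / s for f, s in zip(fractions, speedups))  # sum of F_i / S_i
    return 1.0 / (unenhanced + enhanced)


if __name__ == "__main__":
    # Problem 2.1: floating point is 10% of the work and becomes 2x faster.
    print(round(amdahl_speedup([0.1], [2]), 3))  # 1.053
```

With a single case the function reduces to 1 / ((1 − F) + F/S). Even a very large speedup of a 10% fraction cannot push the overall gain beyond 1/0.9 ≈ 1.11, which is the practical message of the law.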
2.2.2 Quantitative Principles to Design High-Performance Processor

Amdahl's law quantifies the performance gain obtained by enhancing part of a system. The performance gain is denoted by S_overall, and ET stands for execution time.

$$S_{overall} = \frac{\text{Performance of the system with enhancement}}{\text{Performance of the system without enhancement}} = \frac{1/ET_{new}}{1/ET_{old}} = \frac{ET_{old}}{ET_{new}} \qquad (2.1)$$

After enhancement, the system consists of two portions: an unenhanced portion and an enhanced portion.

$$ET_{new} = ET \text{ of the unenhanced portion} + ET \text{ of the enhanced portion}$$

To calculate ET_new, the following two factors are needed:
1. Fraction_enhance (F): It indicates how much of the old system (as a fraction of the old execution time) undergoes enhancement.
2. Speed_enhance (S): It indicates how many times faster the enhanced portion runs than the old portion.

For the enhanced portion alone,

$$S = \frac{\text{Performance}_{new,F}}{\text{Performance}_{old,F}} = \frac{1/ET_{new,F}}{1/ET_{old,F}} = \frac{ET_{old,F}}{ET_{new,F}}, \qquad \text{so} \qquad ET_{new,F} = \frac{ET_{old,F}}{S}$$

Since the enhanced portion accounts for a fraction F of the old execution time, ET_{old,F} = F · ET_old, and therefore

$$ET_{new} = ET_{old}(1 - F) + \frac{ET_{old}\,F}{S}$$

Substituting this value of ET_new in Eq. (2.1) and letting ET_old = 1,

$$S_{overall} = \frac{1}{(1 - F) + F/S} = \left[(1 - F) + \frac{F}{S}\right]^{-1}$$

2.3 MACHINE INSTRUCTIONS AND ADDRESSING MODES

A machine instruction is an individual machine code. The complete set of all machine codes recognized by a particular processor makes up its instruction set. Instructions can be grouped according to the function they perform. The ways in which arguments for these machine instructions can be specified constitute the addressing modes of a processor.

2.3.1 Machine Instructions

An instruction is a command to the microprocessor to perform a given task. Most computer instructions are classified as follows:

1. Data transfer instructions: These instructions move data from one place to another in the computer without changing the data content. Example: LOAD, MOVE, IN, OUT, PUSH, STORE.
2. Data manipulation instructions: These instructions perform arithmetic, logical and shift operations on data. Example: ADD, SUB, MUL, DIV, INC, AND, XOR, OR, SHR, SHL, ROR, ROL.
3. Program control instructions: These instructions may change the address value in the program counter and cause the normal sequential flow to change.

On the basis of the number of address fields in an instruction, they are classified as follows:

1. Three-address instruction: A computer with a three-address instruction format uses the address fields to specify two sources and a destination, each of which can be either a processor register or a memory operand. It results in short programs but requires many bits to specify three addresses.
   Example: ADD R1, A, B    (R1 ← M[A] + M[B])


2. Two-address instruction: Each address field can specify either a processor register or a memory word.
   Example: MOV R1, A     (R1 ← M[A]);
            MUL R1, R2    (R1 ← R1 * R2)
3. One-address instruction: It uses an implied accumulator (AC) register for all data manipulation. The other operand is in a register or in memory.
   Example: LOAD A    (AC ← M[A]);
            ADD B     (AC ← AC + M[B])
4. Zero-address instruction: A stack-organized computer does not use an address field for the instructions ADD and MUL.

Problem 2.2:
(a) The ISA of a processor consists of 64 registers, 125 instructions and 8 bits for immediate mode. In a given program, 30% of the instructions take one input register and have one output register, 30% have two input registers and one output register, 20% have one immediate input and one output register, and the remaining 20% have two immediate inputs, one register input and one output register. Calculate the number of bits required for each instruction type. Assume that the ISA requires that all instructions be a multiple of 8 bits in length.
(b) Compare the memory space required with that of a variable-length instruction set.

Solution:
(a) Since there are 125 instructions, 7 bits are needed to differentiate them, as 64 < 125 ≤ 128. Selecting one of 64 registers needs 6 bits, and each immediate needs 8 bits. Each total is rounded up to the next multiple of 8 bits.
    For Type 1, 1 reg in, 1 reg out: 7 + 6 + 6 = 19 bits → 24 bits
    For Type 2, 2 reg in, 1 reg out: 7 + 6 + 6 + 6 = 25 bits → 32 bits
    For Type 3, 1 imm in, 1 reg out: 7 + 6 + 8 = 21 bits → 24 bits
    For Type 4, 1 reg in, 2 imm in, 1 reg out: 7 + 6 + 8 + 8 + 6 = 35 bits → 40 bits
(b) As the largest instruction type requires 40 bits, a fixed-length instruction format uses 40 bits per instruction. A variable-length instruction format uses 0.3 × 24 + 0.3 × 32 + 0.2 × 24 + 0.2 × 40 = 29.6 bits on average, that is, 26% less space.

2.3.2 Addressing Modes

The addressing mode specifies how the effective address of an operand is calculated from an instruction. Computers use various addressing mode techniques to:

1. provide programming flexibility to users through the use of pointers to memory, counters for loop control, data indexing and program relocation.
2. reduce the size of the addressing field of the instruction.

Let us suppose [x] means the contents at location x for all the addressing modes.

2.3.2.1 Types of Addressing Modes

1. Implied mode: In this mode, the operands are implicitly stated in the instruction. Examples are register reference instructions such as CMA (complement accumulator) and CLA (clear accumulator), and zero-address instructions that use stack organization.
2. Immediate mode: In this mode, the operand is specified in the instruction itself, that is, the address field is replaced by an actual operand. Immediate mode instructions are useful for initializing CPU registers to a constant value, such as MOV R1, #34.

   Instruction with immediate mode:  | Opcode | Data operand |

3. Register (direct) mode: In this mode, the operands are in CPU registers. An n-bit register field can specify any one of 2^n registers. Example: ADD R1 will add the contents of the accumulator and the contents of R1, that is, ACC = [ACC] + [R1].

   Instruction with register direct mode:  | Opcode | Register address | → CPU register (operand)

4. Register (indirect) mode: In this mode, the instruction specifies a CPU register which contains the effective address of the operand residing in memory. This mode needs fewer bits in the instruction, because specifying a register takes fewer bits than specifying a memory location. Example: ADD @R1 will add to the accumulator the contents of the memory location whose address is held in register R1, that is, ACC = [ACC] + [[R1]].

   Instruction with register indirect mode:  | Opcode | Register address | → CPU register (pointer to operand) → Memory (operand)


5. Auto-increment or Auto-decrement mode: This is similar to register indirect mode except that the register containing the effective address is incremented or decremented after (or before) its value is used to access memory.
6. Direct address mode: In this mode, the effective address of an operand is equal to the address part of the instruction. Example: ADD A adds the content of memory cell A to the accumulator, that is, ACC = [ACC] + M[A].

   Instruction with direct address mode:  | Opcode | Memory address | → Memory (operand)

7. Indirect address mode: In this mode, the memory location specified by the address field contains the address of (pointer to) the operand. Example: ADD @A will add to the accumulator the operand whose address is stored in memory cell A, that is, ACC = [ACC] + M[M[A]].

   Instruction with indirect address mode:  | Opcode | Memory address | → Memory (pointer to operand) → Memory (operand)

8. Relative address mode: In this mode, the effective address of an operand is obtained by adding the content of the program counter to the address part of the instruction. The address part of the instruction can be either positive or negative, represented in 2's complement. The result is an effective address whose position in memory is relative to the address of the next instruction.
9. Index address mode: In this mode, the effective address of an operand is obtained by adding the content of an index register to the address part of the instruction. The index register is a special CPU register that stores an index value, and the address field of the instruction stores the base address of a data array in the memory. The distance between the base address and the address of the operand is the index value that is stored in the index register. The index register can be incremented to facilitate access to consecutive operands stored in arrays using the same instruction.
10. Base register addressing mode: In this mode, the effective address of an operand is obtained by adding the content of a base register to the address part of the instruction. This is similar to the indexed addressing mode except that the register holds a base (beginning) address instead of an index. It is used for program relocation.

Problem 2.3: A two-word instruction LOAD is stored at location 300 with its address field in the next location. The address field has the value 600, the value stored at 600 is 500 and the value stored at 500 is 650. The words stored at 900, 901 and 902 are 400, 401 and 402, respectively. A processor register R contains the number 800 and the index register has the value 100. Evaluate the effective address and operand if the addressing mode of the instruction is as follows:
1. Direct
2. Indirect
3. Relative
4. Immediate
5. Register indirect
6. Index

Solution: The memory layout is as follows:

Address   Content
300       LOAD
301       600
500       650
600       500
700       900
800       700
900       400
901       401
902       402

After the two-word instruction is fetched, the program counter holds 302, so the relative address is 302 + 600 = 902. The index address is 600 + 100 = 700.

Addressing Mode      Effective Address   Operand
Direct               600                 500
Indirect             500                 650
Relative             902                 402
Immediate            301                 600
Register indirect    800                 700
Index                700                 900

Problem 2.4: A relative mode branch type instruction is stored in memory at an address equivalent to decimal 600 and the branch is made to an address equivalent to decimal 400. What is the value of the relative address field of the instruction (in decimal)?

Solution: After the instruction is fetched, the program counter holds 601, so relative address = 400 − 601 = −201.
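The effective-address rules used in Problems 2.3 and 2.4 can be sketched in a few lines of Python. The dictionary `MEMORY`, the constants and the function names below are illustrative assumptions chosen to mirror Problem 2.3; they are not part of any real instruction set.

```python
# Memory contents from Problem 2.3 (address -> word).
MEMORY = {300: "LOAD", 301: 600, 500: 650, 600: 500, 700: 900,
          800: 700, 900: 400, 901: 401, 902: 402}
INSTR_ADDR = 300      # location of the opcode word
INSTR_WORDS = 2       # LOAD occupies two words
R = 800               # processor register used for register indirect mode
XR = 100              # index register


def resolve(mode):
    """Return (effective address, operand) for the given addressing mode."""
    field_location = INSTR_ADDR + 1          # where the address field is stored
    address = MEMORY[field_location]         # value of the address field (600)
    pc = INSTR_ADDR + INSTR_WORDS            # PC after the instruction fetch (302)
    if mode == "direct":
        ea = address
    elif mode == "indirect":
        ea = MEMORY[address]
    elif mode == "relative":
        ea = pc + address
    elif mode == "immediate":
        ea = field_location                  # the operand is the address field itself
    elif mode == "register indirect":
        ea = R
    elif mode == "index":
        ea = XR + address
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea, MEMORY[ea]


def relative_field(instr_addr, target, instr_words=1):
    """Relative address field for a branch, measured from the next instruction."""
    return target - (instr_addr + instr_words)


if __name__ == "__main__":
    for mode in ("direct", "indirect", "relative", "immediate",
                 "register indirect", "index"):
        print(mode, resolve(mode))
    print(relative_field(600, 400))          # Problem 2.4
```

The sketch treats the immediate operand as stored at location 301, as the solution table does, and `relative_field` assumes a one-word branch instruction, as the solution to Problem 2.4 does.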


Table 2.2 | Arithmetic circuit function table

Select           Input to    Output of adder      Micro-operation
S1   S0   Cin    adder Y     D = A + Y + Cin
0    0    0      0           D = A                Transfer A
0    0    1      0           D = A + 1            Increment A
0    1    0      B           D = A + B            Add
0    1    1      B           D = A + B + 1        Add with carry
1    0    0      B′          D = A + B′           Subtract with borrow
1    0    1      B′          D = A + B′ + 1       Subtract
1    1    0      All 1's     D = A − 1            Decrement A
1    1    1      All 1's     D = A                Transfer A

(B′ denotes the complement of B.)
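The function table can be reproduced with a small behavioural model. This is a sketch only: it assumes a 4-bit word (the `width` parameter) and the name `arithmetic_circuit` is hypothetical. It models the multiplexer as a lookup on (S1, S0) and the parallel adder as an integer addition.

```python
def arithmetic_circuit(a, b, s1, s0, cin, width=4):
    """Model of the adder-plus-multiplexer circuit: D = A + Y + Cin.

    The multiplexer picks Y from 0, B, B' (complement) or all 1's
    according to the select lines (S1, S0). Returns (D, carry_out).
    """
    mask = (1 << width) - 1
    y_choices = {
        (0, 0): 0,              # Y = 0
        (0, 1): b & mask,       # Y = B
        (1, 0): ~b & mask,      # Y = B' (1's complement of B)
        (1, 1): mask,           # Y = all 1's (that is, -1 in 2's complement)
    }
    y = y_choices[(s1, s0)]
    total = (a & mask) + y + cin
    return total & mask, total >> width


if __name__ == "__main__":
    a, b = 5, 3
    for s1 in (0, 1):
        for s0 in (0, 1):
            for cin in (0, 1):
                d, cout = arithmetic_circuit(a, b, s1, s0, cin)
                print(s1, s0, cin, "D =", d, "Cout =", cout)
```

For A = 5 and B = 3 the eight select combinations give 5, 6, 8, 9, 1, 2, 4 and 5, matching transfer, increment, add, add with carry, subtract with borrow, subtract, decrement and transfer in the table.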

2.4 ARITHMETIC LOGIC UNIT

Arithmetic logic unit (ALU) is a combinational circuit that performs all arithmetic and logic operations so that the entire register transfer operation, from the source registers through the ALU and into the destination register, can be performed during one clock pulse period.

2.4.1 Arithmetic Micro-Operations

The basic arithmetic micro-operations such as addition, subtraction, increment, decrement and shift are performed on numeric data stored in registers. The basic component of an arithmetic circuit is the parallel binary adder, and by controlling the input to the adder, different micro-operations can be realized. Figure 2.1 depicts a 2-bit arithmetic circuit which includes two full-adder circuits and two multiplexers for choosing different arithmetic micro-operations. There are two 2-bit input numbers A and B and a 2-bit output D. The two inputs from A go directly to the X inputs of the full adders. The output of each multiplexer goes to the Y input of its full adder.

Figure 2.1 | A 2-bit arithmetic circuit. (Two full adders FA and two 4×1 multiplexers; inputs A0, A1, B0, B1, S1, S0 and Cin; outputs D0, D1 and Cout.)

By controlling the output Y of the multiplexers with the two selection inputs S1 and S0, and setting Cin to either 0 or 1, we can generate the eight arithmetic micro-operations (Table 2.2).

2.4.2 Logic Micro-Operations

Logic micro-operations such as AND, OR, Exclusive OR, etc., consider each bit of a register separately and specify binary operations for strings of bits (Table 2.3).

Table 2.3 | Types of micro-operations

Micro-operation      Name
F ← 0                Clear
F ← A ∧ B            AND
F ← A ∧ B′
F ← A                Transfer A
F ← A′ ∧ B
F ← B                Transfer B
F ← A ⊕ B            Exclusive OR
F ← A ∨ B            OR
F ← (A ∨ B)′         NOR
F ← (A ⊕ B)′         Exclusive NOR
F ← B′               Complement B
F ← A ∨ B′
F ← A′               Complement A
F ← A′ ∨ B
F ← (A ∧ B)′         NAND
F ← all 1's          Set to all 1's

(A′ denotes the complement of A.)
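The sixteen logic micro-operations of Table 2.3 can be tried out bit by bit. The following Python sketch assumes 4-bit registers (the `WIDTH` constant), and the names `LOGIC_MICRO_OPS` and `apply_all` are illustrative only; complements are masked to the register width because Python integers are unbounded.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1          # 1111 for a 4-bit register


def complement(x):
    """1's complement restricted to WIDTH bits."""
    return ~x & MASK


# The sixteen logic micro-operations, in the order of Table 2.3.
LOGIC_MICRO_OPS = [
    ("F <- 0",          lambda a, b: 0),
    ("F <- A AND B",    lambda a, b: a & b),
    ("F <- A AND B'",   lambda a, b: a & complement(b)),
    ("F <- A",          lambda a, b: a),
    ("F <- A' AND B",   lambda a, b: complement(a) & b),
    ("F <- B",          lambda a, b: b),
    ("F <- A XOR B",    lambda a, b: a ^ b),
    ("F <- A OR B",     lambda a, b: a | b),
    ("F <- (A OR B)'",  lambda a, b: complement(a | b)),
    ("F <- (A XOR B)'", lambda a, b: complement(a ^ b)),
    ("F <- B'",         lambda a, b: complement(b)),
    ("F <- A OR B'",    lambda a, b: a | complement(b)),
    ("F <- A'",         lambda a, b: complement(a)),
    ("F <- A' OR B",    lambda a, b: complement(a) | b),
    ("F <- (A AND B)'", lambda a, b: complement(a & b)),
    ("F <- all 1's",    lambda a, b: MASK),
]


def apply_all(a, b):
    """Apply every micro-operation to the register values a and b."""
    return {name: op(a & MASK, b & MASK) for name, op in LOGIC_MICRO_OPS}


if __name__ == "__main__":
    for name, value in apply_all(0b1100, 0b1010).items():
        print(f"{name:<16} {value:04b}")
```

With A = 1100 and B = 1010 the four bit positions cover all four input combinations, so each of the sixteen operations produces a different 4-bit result; this is why the table has exactly sixteen rows.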


Logic micro-operations are capable of manipulating individual bits or a portion of a word stored in CPU registers. Let us consider the data in a register A. Another register, B, holds the operand that will be used to modify the contents of A using logic micro-operations. Some of the applications are as follows:

1. Selective set operation: If a bit in B is set to 1, the same position in A is set to 1; otherwise that bit in A retains its previous value.

   1 1 0 0   A(t)
   1 0 1 0   B (to set some bits in A)
   1 1 1 0   A(t+1)   (A ← A ∨ B)

2. Selective complement operation: If a bit in B is set to 1, the same position in A is complemented from its original value; otherwise it remains unchanged.

   1 1 0 0   A(t)
   1 0 1 0   B (to complement some bits in A)
   0 1 1 0   A(t+1)   (A ← A ⊕ B)

3. Selective clear operation: If a bit in B is set to 1, the same position in A is set to 0; otherwise it remains unchanged.

   1 1 0 0   A(t)
   1 0 1 0   B (to clear some bits in A)
   0 1 0 0   A(t+1)   (A ← A ∧ B′)

4. Mask operation: If a bit in B is set to 0, the same position in A is set to 0; otherwise it remains unchanged.

   1 1 0 0   A(t)
   1 0 1 0   B (to mask some bits in A)
   1 0 0 0   A(t+1)   (A ← A ∧ B)

5. Clear operation: If the bits in the same position in A and B are the same, they are cleared in A; otherwise they are set in A.

   1 1 0 0   A(t)
   1 0 1 0   B
   0 1 1 0   A(t+1)   (A ← A ⊕ B)

6. Insert operation: It is used to insert a specific bit pattern into register A, leaving the other bit positions unchanged. This is accomplished by two sub-operations: a mask operation to clear the desired bit positions, followed by an OR operation to introduce the new bits into the desired positions. Suppose you want to introduce 10 into the low-order two bits of A, so that 1101 (original) becomes 1110 (desired):

   1 1 0 1   A (original)
   1 1 0 0   Mask
   1 1 0 0   A (intermediate)
   0 0 1 0   Added bits
   1 1 1 0   A (desired)

2.5 CPU CONTROL DESIGN

Central processing unit (CPU), or the brain of a computer, performs the data processing operations. It consists of three major parts: a register set that stores intermediate data during instruction execution, an ALU that performs the required micro-operations, and a control unit that supervises all other elements for the transfer of information from one register to another. The main function of a CPU is to fetch an instruction from the memory and execute it. CPUs are divided into three types of organizations:

1. Single accumulator organization: In this, one operand is implied in the accumulator, a special purpose register, and the other operand is a register or the memory. Example: ADD R1 (AC ← AC + R1), LOAD A (AC ← M[A]), STORE T (M[T] ← AC).
2. General register organization: In this, the CPU has several general purpose registers, which leads to shorter and more efficient programs because registers are faster than memory. Example: ADD R1, R2 (R1 ← R1 + R2). Figure 2.2 shows the bus organization for three registers R1, R2 and R3. The outputs of these registers and one external input are connected to two multiplexers, A and B. The select lines SELA and SELB of multiplexers A and B each select one of the inputs and feed it to the ALU. OPR specifies one of the possible operation codes that the ALU will perform on the data inputs, and the output is transferred either to one of the registers, selected by SELD through a 2 × 4 decoder, or to the external output, say memory. The control word (Fig. 2.3) for a two-operand instruction is as follows:

   2 bits   2 bits   2 bits   4 bits
   SELA     SELB     SELD     OPR

   Figure 2.2 | General register organization. (External input and registers R1, R2, R3 feed two multiplexers selected by SELA and SELB; the A bus and B bus feed the ALU, controlled by OPR; a 2 × 4 decoder driven by SELD selects the destination register; the ALU result is also the external output.)

   Figure 2.3 | A control word.


3. Stack organization: A stack may consist of a number of registers or a part of main memory in which data items are stored in consecutive locations that are accessed by a LIFO (last in, first out) mechanism. As there is a limited number of registers, a part of memory is implemented as a stack for storage and retrieval of intermediate data. The stack pointer (SP) keeps track of the top item of the stack. The process of inserting a new item onto a stack is known as push, accomplished by first incrementing the stack pointer and then inserting an item from the data register.

   SP ← SP + 1
   M[SP] ← DR

   The process of removing an item from the top of a stack is known as pop, performed by first transferring the data into DR and then decrementing SP.

   DR ← M[SP]
   SP ← SP − 1

Problem 2.5: A system has a CPU organized in the form of general register organization consisting of 16 registers, each storing 32-bit data. Assume the ALU has 35 operations.
(a) How many multiplexers are there in A bus and B bus, and what is the size of each multiplexer?
(b) How many selection inputs are needed for MUX A and MUX B?
(c) How many inputs and outputs are there in the decoder?
(d) How many inputs and outputs are there in the ALU for data, including input and output carries?
(e) Formulate a control word for the system.

Solution:
(a) 32 multiplexers for each bus, each of size 16 × 1.
(b) 4 selection inputs each, to select one of 16 registers.
(c) A 4-to-16 line decoder: 4 inputs and 16 outputs.
(d) 32 + 32 + 1 = 65 data input lines and 32 + 1 = 33 data output lines.
(e) With 35 operations, OPR needs 6 bits:

   4 bits   4 bits   4 bits   6 bits
   SELA     SELB     SELD     OPR

2.5.1 Instruction Execution

A CPU generally executes one instruction at a time, sequentially, and a sequence of such instructions is known as a program. The CPU executes the instructions that reside in the main memory. In order to execute an instruction, the CPU first has to fetch the instruction from the main memory into one of its registers. It then decodes the instruction, that is, it determines what the instruction is intended to do, fetches the required operands and finally executes the instruction. This process is repeated continuously for a complete program and is known as the fetch-execute cycle (Fig. 2.4). The following steps are performed for executing an instruction:

Figure 2.4 | Instruction cycle. (Flowchart: Start → load PC contents to MAR → increment PC to point to next instruction → load the instruction stored at MAR to IR, IR ← M[MAR] → decode the instruction → load any data required into MDR → check for jump instruction; if yes, set PC to the value from the jump instruction; if no, execute the instruction → check for interrupts; if yes, service the interrupt; if no, continue with the next instruction.)

1. Fetching the instruction: The next instruction is fetched from the memory address that is saved in the program counter, and the memory content fetched is stored in the instruction register (IR). The program counter then points to the next instruction that will be read in the next cycle.
2. Decode the instruction: During this cycle, the instruction inside the IR is interpreted by the decoder.


3. Operand fetch: In case of a direct or indirect memory instruction, the execution begins in the next clock cycle. If the instruction has an indirect address, the effective address of the operand is read from the main memory, and the required data is fetched from the memory into the memory data register. If the instruction has a direct address, nothing is done at this clock cycle.
4. Execute the instruction: The control unit of the CPU passes the decoded instruction as a sequence of control signals to the different functional units of the CPU to execute the tasks required by the instruction, such as reading values from registers or input devices, performing mathematical or logic micro-operations in the ALU, and writing the result back to a register or main memory.

2.5.2 CPU Data Path

The CPU contains data paths that are responsible for routing data between the functional units of a computer. The following are the different data path structures available for routing:

1. Single bus structure: In this architecture, all CPU registers are connected to the same bus. Data can be transferred either between CPU registers or between a CPU register and the ALU at a given clock pulse. The speed of operation is slow as only one operand can be transferred in one clock cycle, and the addition operation (R1 ← R2 + R3) occurs in three clock cycles.
2. Two bus structure: All general purpose CPU registers are connected to both buses, say bus A and bus B; but special purpose registers are divided into two groups, say group 1 connecting bus A to the program counter and one input of the ALU, and group 2 connecting bus B to the MDR (Memory Data Register) and the other input of the ALU. The two operands are transferred to the ALU in 1 clock cycle and the addition operation (R1 ← R2 + R3) occurs in 2 clock cycles.
3. Three bus structure: The performance can be further improved by using three buses such that the addition operation (R1 ← R2 + R3) can occur in one clock cycle.

2.5.3 Control Unit Design

The control unit is considered the brain of a CPU as it controls the various units in the data path. The performance of the control unit is important as it determines the clock cycle of the processor. A control unit can be designed either as hardwired or as micro-programmed.

1. Hardwired control: The control unit is made up of sequential and combinational circuits to generate the control signals and interpret instructions (Fig. 2.5). The instruction decoder decodes the instruction loaded in the instruction register. The step decoder generates a separate control line for each step in the control sequence. The encoder gets its input signals from the instruction decoder, step decoder, external inputs and condition codes and generates the individual control signals. It is faster and more efficient but less flexible, and it is difficult to add new features or correct mistakes in the original design.

Figure 2.5 | Block diagram of hardwired control unit. (Clock → control step counter, with Reset → step decoder producing T1, T2, ..., Tn; IR → instruction decoder producing I1, I2, ..., In; the encoder combines these with external inputs and condition codes to produce the control signals and End.)

Figure 2.6 | Block diagram of micro-programmed control unit. (IR, external inputs and condition codes → sequencer (starting and branch address generator) → control address register, driven by the clock → address and read command to control memory → control word → micro-instruction register → decoder → control signals within CPU and control signals to system bus.)

2. Micro-programmed control: Control signals are generated by using programs known as micro-programs, which consist of micro-instructions (control words) (Fig. 2.6). Memory that is part


of CPU is known as control memory and stores micro-instructions. The micro-program sequencer generates the address of the micro-instruction according to the instruction stored in the instruction register. The address of the micro-instruction to be executed is available in the control address register. The micro-program sequencer issues a read command to read the micro-instruction from control memory into the micro-instruction register, which on execution generates control signals for various parts of the processor. This control unit design is more flexible in accommodating new features and less error prone, but slower than the hardwired unit.

The format of the control word is

   | Branch condition | Flag | Control signal | Control memory address |

On the basis of the type of control word supported, it is divided into two types:
1. Horizontal micro-programmed control unit: In this design, the control signals are represented in the form of 1 bit per control signal, so it has a longer control word.
2. Vertical micro-programmed control unit: In this design, the control signals are represented in an encoded format.

Problem 2.6: Consider a control unit which has a control memory of 1024 control words; it supports 48 control signals and 8 flag conditions. What is the size of the control word in bits and of the control memory in bytes?

Solution: Addressing 1024 control words needs 10 bits and selecting one of 8 flag conditions needs 3 bits.

(a) Using horizontal micro-programmed control unit

   0 bits             3 bits   48 bits          10 bits
   Branch condition   Flag     Control signal   Control memory address

   Size of control word = 61 bits
   Control memory = (1024 × 61)/8 = 128 × 61 = 7808 bytes

(b) Using vertical micro-programmed control unit, the 48 control signals are encoded in ⌈log2 48⌉ = 6 bits.

   0 bits             3 bits   6 bits           10 bits
   Branch condition   Flag     Control signal   Control memory address

   Size of control word = 19 bits
   Control memory = (1024 × 19)/8 = 128 × 19 = 2432 bytes

2.5.4 RISC versus CISC Processors

The differences between reduced and complex instruction set computers are given in Table 2.4.

Table 2.4 | RISC versus CISC

RISC (Reduced Instruction Set Computers)        CISC (Complex Instruction Set Computers)
Rich register set                               Fewer registers
Supports fewer addressing modes                 Supports more addressing modes
Supports fixed-length instructions              Supports variable-length instructions
Successful pipeline with one instruction        Unsuccessful pipeline
per cycle
Example: ARM, Motorola                          Example: Pentium processors

2.6 I/O INTERFACE (INTERRUPT AND DMA MODE)

The I/O interface bridges the differences between the CPU and peripheral devices and provides a method for transferring information between internal storage and external I/O devices. There are the following three modes of I/O transfer:

1. Programmed I/O: The I/O device does not have direct access to memory. A transfer requires the execution of several instructions by the CPU, and the CPU has to wait for the I/O device to be ready for either reception or transmission of data.
2. Interrupt initiated I/O: In this, instead of waiting, control is transferred from the currently running program to a service program as a result of an externally or internally generated request.
   • Hardware interrupts: These interrupts are raised through the hardware pins.
   • Software interrupts: These are instructions used in the program whenever the required functionality is needed.
   • Maskable interrupts: These interrupts may be enabled or disabled explicitly.
   • Non-maskable interrupts: These interrupts are always in the enabled state. We cannot disable them by explicit conditions (flags).
   • Vectored interrupts: These interrupts are associated with a static vector address.
   • Non-vectored interrupts: These interrupts are associated with a dynamic vector address.
   • External interrupts: These interrupts are generated by external devices such as I/O.
   • Internal interrupts: These interrupts are generated by the internal components of the processor, for example by a temperature sensor, a power failure or an erroneous instruction.


   • Synchronous interrupts: These interrupts are controlled by a fixed time interval. All the interval interrupts are called synchronous interrupts.
   • Asynchronous interrupts: These interrupts are initiated based on the feedback of previous instructions. All the external interrupts are called asynchronous interrupts.
3. Direct memory access (DMA): It is one of several methods for coordinating data transfers between an I/O device and the core processing unit or memory in a computer. It refers to the transfer of data directly between a fast storage device and memory, bypassing the CPU because of its limited speed. DMA provides a significant improvement in terms of latency and throughput as it allows the I/O device to access the memory directly, without using the processor. There are certain advantages of using DMA for data transfer:
   • DMA saves the processor's MIPS as the core can operate in parallel.
   • DMA saves power because it requires less circuitry than the processor to transfer data.
   • DMA has no modulo block size restrictions.

   The direct memory access (DMA) controller takes over the control of the buses to manage the transfer directly between the I/O device and memory. Bus request (BR) and bus grant (BG) signals are used by the DMA controller to request the CPU to relinquish control of the buses and to get control of the system buses (Fig. 2.7). The DMA controller consists of 3 different registers: an address register, a control register and a word count register. To transfer a block of data between an I/O device and memory, the controller stores the initial values in these registers. The DMA channel then transfers the block of information from or to memory according to the control register. The starting address of the block in memory is given by the address register, and the number of bytes to transfer is given by the word count register. The controller decrements the word counter each time it moves a data byte.

   [Figure 2.7, partially shown: DMA controller with a data bus buffer and an address bus buffer on the data and address buses, an internal bus, an address register, a word count register and control logic with the signals DMA select (DS), register select (RS), read (RD), write (WR) and bus request (BR).]

   There are several modes of operation of DMA:
   • Burst or block transfer mode: In this mode, the entire block of data is transferred once the DMA controller is granted access to the system bus by the CPU. All the bytes of data in the block are transferred before control of the system buses is released back to the CPU. The only disadvantage of this mode is that it renders the CPU inactive for relatively long periods of time.
   • Cycle stealing mode: In this mode, the DMA controller obtains access to the system buses as in burst mode; but after one byte of data transfer, control of the system bus is released back to the CPU via BG. It is then continually requested again via BR, transferring one byte of data per request, until the entire block of data has been transferred. This mode is suitable for systems in which the CPU cannot be disabled for the considerable length of time needed in burst transfer mode, such as controllers monitoring data in real time. The advantage is that the CPU is not idled for as long as in burst mode, but the data block is not transferred as quickly.
   • Transparent mode: It is the slowest but the most efficient data transfer mode in terms of overall system performance. The DMA controller transfers data only when the CPU is busy performing operations that do not use the system buses. So, the CPU never stops executing its programs, but the biggest disadvantage is the complex hardware circuitry needed to determine when the CPU is not using the system buses.

   A DMA read transfers data from the memory to the I/O device, while a DMA write transfers data from an I/O device to memory. The functional behaviour of a DMA transfer is outlined in Fig. 2.8:
   • The CPU transmits the following information to the DMA controller:
     (a) The beginning address in memory, which is stored in the address register of the DMA controller.
     (b) The number of words to transfer, which is stored in the word count register of the DMA controller.
     (c) The direction (memory-to-I/O device or I/O device-to-memory), port ID, DMA mode of transfer and whether the end of the block transfer is signalled through an interrupt request or not, which are stored in the control register as a command word.
Bus register ••The processor them relinquishes control of ­address,
BG
grant data and control buses to DMA Controller and
Interrupt Interrupt I/O ­returns to other processing activities while the
device
DMA controller starts the data transfer between
Figure 2.7 |   Block diagram of the DMA controller. I/O device and memory.


Figure 2.8 |   DMA controller interconnection with memory, CPU and I/O devices.

••When the DMA controller accesses memory, it synchronizes this memory request with an idle period of the processor, thus disabling the processor or requesting a halt of the processor, and awaits an acknowledgement.
••After the completion of the block transfer, the DMA controller either raises an interrupt request, if interrupts are enabled, or indicates the completion in its status register. The processor recognizes the I/O completion (either by the interrupt signal or by reading the status register), gets its system buses back and normal processing resumes. The device has to initiate each new data transfer through the DMA request signal, which is again acknowledged by the CPU through the DMA acknowledge signal via the DMA controller.

2.7 INSTRUCTION PIPELINING

In early computers, each instruction completely finished before the execution of the next one began. The hardware circuits needed to perform the different operations of an instruction cycle are different, and most of these processor circuits are idle at any given moment, waiting for the other parts of the processor to complete their part of the execution. Instruction pipelining is a technique for overlapping the execution of several instructions to reduce the overall execution time of a set of instructions, so that most of the processor circuits no longer have to wait for the other parts of the processor to complete their part of the execution. Pipeline speed is limited by the slowest pipeline stage.
Throughput of a processor is the rate at which operations get executed. Latency is the amount of time that a single operation takes to execute. In an unpipelined computer, throughput = 1/latency, as each operation executes by itself; in a pipelined computer, throughput > 1/latency, since the execution of instructions is overlapped.
Consider a k-segment pipeline with a clock cycle time Tp used to execute n tasks (Fig. 2.9). An equivalent non-pipelined system takes Tn time to complete each task. The speed up of a pipelined system over a non-pipelined system is given by the following relation:

\[ S = \frac{n \times T_n}{(k + n - 1) \times T_p} \]

Theoretically, as n becomes large and with Tn = k × Tp, the maximum speed up that a pipelined system can achieve is given by the following equation:

\[ S = \frac{T_n}{T_p} = \frac{k T_p}{T_p} = k \]

Pipelining Hazards: These hazards reduce the ideal speed up gained by pipelining by preventing the next instruction in the sequence from being executed during its designated clock pulse. Hazards force the pipeline to stall. There are three types of hazards:


[Figure 2.9(a): fetch instruction, decode instruction, fetch operand, execute instruction and write result back are all completed within one long clock cycle. Figure 2.9(b): the same five steps form pipeline stages separated by pipeline latches (overhead/delay), each stage taking one clock cycle.]

Step →          1    2    3    4    5    6    7    8
Fetch           I1   I2   I3   I4
Decode               I1   I2   I3   I4
Fetch operand             I1   I2   I3   I4
Execute                        I1   I2   I3   I4
Write back                          I1   I2   I3   I4

Figure 2.9 |   (a) Unpipelined processor. (b) Pipelined five-stage processor.
(c) Timing diagram of a five-stage instruction pipeline.

1. Structural hazards: These result from resource conflicts when the hardware cannot support instructions that need simultaneous execution in the pipeline.
2. Data hazards: They arise when an instruction depends on the result of a previous instruction and that result is not yet calculated.
There are three situations in which data hazards can occur:
••Read after write (RAW), a true dependency
••Write after read (WAR), an anti dependency
••Write after write (WAW), an output dependency
Consider two instructions inst1 and inst2, with inst1 occurring before inst2 in the program order.
••Read after write (RAW): A read after write (RAW) data hazard is a situation in which an instruction refers to a result which has not yet been calculated, that is, inst2 tries to read a source before inst1 writes to it. This situation arises if the read operation by one instruction takes place before the write done by the other instruction. For example,
inst1: R3 <- R1 + R2
inst2: R4 <- R3 + R2
The first instruction calculates a value by adding the values in registers R1 and R2 and saves the result in register R3, and the second instruction uses this saved value to calculate a result for register R4. However, in a pipeline, when the operands for the second operation are fetched, the result from the first instruction will not have been saved yet, and so there arises a data dependency. It can be said that there is a data dependency with instruction inst2, as it is dependent on the completion of instruction inst1.


••Write after read (WAR): A write after read (WAR) data hazard refers to a situation in which there is a problem with concurrent execution, that is, inst2 tries to write a destination before it is read by inst1. This situation arises if the write operation by one instruction completes before the read operation by the other instruction takes place. For example,
inst1: R4 <- R1 + R3
inst2: R3 <- R1 + R2
If there is a chance that inst2 may complete before inst1 (i.e., with concurrent execution), we must ensure that the result is not stored in register R3 before inst1 has had a chance to fetch its operands.
••Write after write (WAW): A write after write (WAW) data hazard refers to a situation in a concurrent execution environment in which inst2 tries to write an operand before it is written by inst1. This situation arises if the write operations by the instructions occur in the reverse order of the intended sequence. For example,
inst1: R2 <- R1 + R3
inst2: R2 <- R4 + R5
The WB (write back) of inst2 must be delayed until inst1 has executed.
3. Control hazards: They arise from the pipelining of branches and other instructions that change the value of PC.

\[ \text{Speed up from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined} \times \text{Clock cycle unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle pipelined}} \]

\[ \text{Ideal CPI} = \frac{\text{CPI unpipelined}}{\text{Pipeline depth}} \]

\[ \text{Speed up from pipelining} = \frac{\text{Ideal CPI} \times \text{Pipeline depth} \times \text{Clock cycle unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle pipelined}} = \frac{\text{Ideal CPI} \times \text{Pipeline depth} \times \text{Clock cycle unpipelined}}{(\text{Ideal CPI} + \text{Pipeline stall}) \times \text{Clock cycle pipelined}} \]

Assuming ideal CPI as 1, speed up is:

\[ \text{Speed up from pipelining} = \frac{\text{Pipeline depth} \times \text{Clock cycle unpipelined}}{(1 + \text{Pipeline stall}) \times \text{Clock cycle pipelined}} \]

where CPI is cycles per instruction.
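
The three data hazard types can be told apart mechanically from the destination and source registers of two instructions. The following Python sketch illustrates this; the function name classify_hazards and the (destination, sources) tuple representation are assumptions made for this illustration and do not come from the text. It only inspects register names and does not model pipeline timing.

```python
def classify_hazards(inst1, inst2):
    """Return the data hazards between inst1 and inst2.

    Each instruction is a (destination, sources) tuple, and inst1 is
    assumed to come before inst2 in program order.
    """
    dest1, sources1 = inst1
    dest2, sources2 = inst2
    hazards = []
    if dest1 in sources2:
        hazards.append("RAW")  # inst2 reads what inst1 writes
    if dest2 in sources1:
        hazards.append("WAR")  # inst2 writes what inst1 reads
    if dest1 == dest2:
        hazards.append("WAW")  # both write the same register
    return hazards


# The three register examples used in this section.
raw_example = classify_hazards(("R3", ("R1", "R2")), ("R4", ("R3", "R2")))
war_example = classify_hazards(("R4", ("R1", "R3")), ("R3", ("R1", "R2")))
waw_example = classify_hazards(("R2", ("R1", "R3")), ("R2", ("R4", "R5")))
print(raw_example, war_example, waw_example)
```

Whether a detected dependency actually stalls the pipeline depends on the stage timing and on any forwarding hardware, which this sketch leaves out.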

Problem 2.7: Consider a four-stage pipeline processor. The number of cycles needed by the four instructions I1, I2,
I3 and I4 in stages instruction fetch, decode, operand fetch and execute are shown below. Assume I2 is the branch
instruction. Draw the timing space diagram.

S1 S2 S3 S4
I1 2 1 1 1
I2 1 2 3 1
I3 1 1 1 2
I4 2 1 3 1

Solution:

STEP →         1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
Fetch          I1  I1  I2  I3  -   -   -   -   -   I3  I4  I4
Decode                 I1  I2  I2  -   -   -   -   -   I3  -   I4
Operand Fetch              I1      I2  I2  I2  -   -   -   I3  -   I4  I4  I4
Execute                        I1  -   -   -   I2  -   -   -   I3  I3  -   -   I4


Problem 2.8: Assume a simple 5-stage pipeline (IF, ID, E, DF, W) in which each stage takes a single cycle, and assume that there are no cache misses. How many cycles would the following code take to execute if there is no special hardware to improve performance in the presence of hazards?
MOV edx,[ecx+100]
MOV ebx,[ecx+104]
ADD edx,ebx
MOV [ecx+108],ebx
MOV eax,[ecx+100]
ADD ebx,eax

Solution: The above code takes 14 cycles to execute, as shown below:

1     2     3     4     5     6     7     8     9     10    11    12    13    14
IF    ID    DF    E     W
      IF    ID    DF    E     W
            IF    ID    DF    stall E     W
                  IF    ID    stall DF    stall W
                        IF    ID    stall DF    stall stall E     W
                              IF    ID    stall DF    stall stall stall E     W

Problem 2.9: For the pipeline in the figure below, calculate the total execution time after which the result of the fourth task entering the pipeline is ready.

IF ID EX MEM WB

5 ns 5 ns 10 ns 10 ns 5 ns
Solution:
5 10 15 20 25 30 35 40 45 50 55 60 65
Inst1 IF ID EX EX MEM MEM WB
Inst2 IF ID EX EX MEM MEM WB
Inst3 IF ID EX EX MEM MEM WB
Inst4 IF ID EX EX MEM MEM WB

Therefore, the total execution time is 65 ns.
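
The 65 ns figure can be checked with a short Python sketch. It assumes a simple model that is not spelled out in the text: a task enters a stage as soon as it has left the previous stage and the preceding task has left that stage. The function name completion_times is hypothetical.

```python
def completion_times(stage_times, n_tasks):
    """Return the time at which each task leaves the last pipeline stage.

    stage_times lists the duration of each stage in order. A task starts
    a stage once it has finished the previous stage and the preceding
    task has vacated that stage (assumed model).
    """
    previous_finish = [0] * len(stage_times)
    results = []
    for _ in range(n_tasks):
        finish = []
        t = 0  # time at which this task leaves the previous stage
        for stage, duration in enumerate(stage_times):
            start = max(t, previous_finish[stage])
            t = start + duration
            finish.append(t)
        previous_finish = finish
        results.append(t)
    return results


# Stage times of Problem 2.9 in ns: IF, ID, EX, MEM, WB.
times = completion_times([5, 5, 10, 10, 5], 4)
print(times)  # completion time of each of the four tasks, in ns
```

Under this model the first task completes after 35 ns and each later task completes 10 ns after its predecessor, because the 10 ns stages set the pace of the pipeline.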

Problem 2.10: What is the mean overhead of a pipeline with 8 stages and an execution time per stage of 2 ns?

Solution: The mean overhead = (Stages - 1) × Execution time per stage = (8 - 1) × 2 = 7 × 2 = 14 ns

Problem 2.11: How many stages has a pipeline that achieves a speed up of 9.9 for 90 operations?

Solution: With n operations and k stages,
Speed up = (n × k)/(k - 1 + n) ⇒ 9.9 = (90 × k)/((90 - 1) + k) ⇒ k = 11

Problem 2.12: Calculate the time required to perform 1000 operations in a 6-stage pipeline with an execution time of 3 ns per stage.

Solution: Tp = (k - 1 + n) × T = (6 - 1 + 1000) × 3 = 3015 ns = 3.015 µs

Problem 2.13: Calculate the mean overhead of a pipeline with 7 stages and an execution time per stage of 2 ns.

Solution: Mean overhead of pipeline = (k × Tp - Tn)/k = (k - 1) × T = (7 - 1) × 2 = 12 ns
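
The arithmetic in Problems 2.11 and 2.12 follows directly from the speed up relation of Section 2.7. The Python sketch below is one way to check it, assuming Tn = k × Tp (every stage takes the same time); the function names are chosen for this illustration only.

```python
def pipelined_time(k, n, stage_time):
    """Total time for n operations on a k-stage pipeline: (k + n - 1) x T."""
    return (k + n - 1) * stage_time


def speedup(k, n):
    """Speed up n*k / (k + n - 1), assuming Tn = k x Tp."""
    return n * k / (k + n - 1)


def stages_for_speedup(s, n):
    """Solve s = n*k / (k + n - 1) for k, giving k = s*(n - 1) / (n - s)."""
    return s * (n - 1) / (n - s)


print(pipelined_time(6, 1000, 3))   # Problem 2.12, result in ns
print(stages_for_speedup(9.9, 90))  # values used in the solution of Problem 2.11
```

The speed up approaches k only when n is much larger than k, which is why 90 operations on 11 stages give 9.9 rather than 11.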


Problem 2.14: Consider a pipeline with 5 stages: IF, ID, EX, M and W. Assume that each stage requires one clock cycle. Show how the following program segment for adding 2 arrays is processed, and compare the clock cycles needed in a non-pipelined system with those needed in a pipelined system when the result of the branch instruction, i.e. the content of the PC, is available after the WB stage.

LOAD R4 #400
L1: LOAD R1, 0 (R4);
LOAD R2, 400 (R4);
ADD R3, R1, R2;
STORE R3, 0 (R4);
SUB R4, R4, #4;
BNEZ R4, L1;

Solution: Number of cycles = [Initial instruction + (Number of instructions in the loop L1) × Number of loop
cycles] × Number of clock cycles/instruction (CPI)
= [1 + (6) × 400/4] × 5 = 3005

Timing diagram for one loop iteration in a pipelined system is as follows:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
LOAD R4 #400 IF ID EX M W
LOAD R1, 0 (R4) IF ID EX M W
LOAD R2, 400 (R4) IF ID stall stall EX M W
ADD R3, R1, R2 IF ID stall stall EX stall M W
STORE R3, 0 (R4) IF ID stall DF stall stall E W
SUB R4,R4, #4 IF ID stall Ex M W
BNEZ R4, L1 IF stall ID stall stall EX M W

Number of cycles in the loop = 15


Number of clock cycles for segment execution on pipelined processor
= 1 + (Number of clock cycles in the loop L1) × Number of loop cycles
= 1 + 15 × 400/4 = 1501
\[ \text{Speedup} = \frac{\text{Number of clock cycles for the program execution on non-pipelined processor}}{\text{Number of clock cycles for the segment execution on pipelined processor}} = \frac{3005}{1501} \approx 2 \text{ times} \]
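
The cycle counts of Problem 2.14 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above; the function name and parameter names are assumptions, and the 15 cycles per loop iteration are taken from the timing diagram rather than computed.

```python
def loop_cycle_counts(loop_instructions, iterations, stages,
                      pipelined_cycles_per_iteration, setup_instructions=1):
    """Return (non-pipelined cycles, pipelined cycles) for a simple loop."""
    # Non-pipelined: every instruction occupies all stages in turn.
    non_pipelined = (setup_instructions + loop_instructions * iterations) * stages
    # Pipelined: cycles per iteration are read off the timing diagram.
    pipelined = setup_instructions + pipelined_cycles_per_iteration * iterations
    return non_pipelined, pipelined


iterations = 400 // 4  # R4 counts down from 400 in steps of 4
non_pipelined, pipelined = loop_cycle_counts(6, iterations, 5, 15)
print(non_pipelined, pipelined, non_pipelined / pipelined)
```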

Problem 2.15: Consider a 5-stage pipeline. For all the following questions we assume that: (a) The pipeline contains the stages IF (Instruction Fetch), IS (Issue), FO (Fetch operand), E (Execute) and W (Write). (b) Each stage except E requires one clock cycle and the system has 4 functional units (FUs) for floating point operations: FP load/store, FP addition/subtraction, FP multiplication and FP division. (c) The execution stage requires 1 clock cycle for Load/Store operations, 1 clock cycle for ADD or SUB operations, 3 clock cycles for the MUL operation and 4 clock cycles for the DIV operation. All memory references hit in the cache. The pipeline has forwarding circuitry for all FUs, except FP load/store, where the operand is ready after the W stage.


The timing diagram is presented below:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
LOAD F6, 20(R5) IF IS FO E W
LOAD F2, 28(R5) IF IS FO E W
MUL F0, F2, F4 IF IS stall stall FO E E E W
SUB F8, F6, F3 IF IS FO E W
DIV F10, F0, F6 IF IS stall stall stall stall FO E E E E W
ADD F6, F8, F2 IF IS FO E W
STORE F8, 50(R5) IF IS FO E W

Identify the hazards in the following instructions from the following list (Structural, Data, Control, RAW, WAR,
WAW, None)
1. MULT F0, F2, F4 and STORE F8, 50(R5)
2. DIV F10, F0, F6 and ADD F6, F8, F2
3. MULT F0, F2, F4 and DIV F10, F0, F6
4. DIV F10, F0, F6 and ADD F6, F8, F2

Solution: 1. Structural; 2. Data; 3. RAW; 4. WAR.

2.8 MEMORY HIERARCHY

The storage media can be categorized in a hierarchy according to their speed and cost (Fig. 2.10). As we move down the hierarchy, access time increases and cost per bit decreases.

[Figure 2.10 lists the levels from top to bottom: CPU registers, cache memory, main memory, magnetic disks, magnetic tapes. Cost and speed increase towards the top; size increases towards the bottom.]
Figure 2.10 |   Memory hierarchy.

2.8.1 Main Memory

It is the central storage unit that directly communicates with the CPU. It is designed using semiconductor integrated circuits and needs a constant power supply to maintain the information. It is expensive compared to auxiliary storage, so it has limited capacity. Example: R/W (read/write) memory or RAM (random access memory) and ROM (read only memory). Integrated RAM chips are available in two modes:

1. Static RAM: It stores the binary information in flip flops, and the information remains valid as long as power is supplied. It has a faster access time and is used in implementing cache memory.
2. Dynamic RAM: It stores the binary information as a charge on a capacitor. It requires refreshing circuitry to restore the charge on the capacitors every few milliseconds. It contains more memory cells per unit area than SRAM.

2.8.1.1 Memory Interfacing

If the memory required for the computer is larger than the capacity of one chip, it is necessary to connect multiple RAM and ROM chips to the CPU through the data and address buses (Fig. 2.11). The low-order address bus lines select the word within a chip, and the other lines select a particular chip through its chip select inputs. Assume a computer system needs 256 bytes of RAM and 512 bytes of ROM. The configuration of the RAM chip is 128 × 8 and that of the ROM chip is 512 × 8. The RAM and ROM chips required are as follows:

Number of RAM chips = 256/128 = 2
Number of ROM chips = 512/512 = 1


The memory interconnection is depicted in the following diagram:

[Figure 2.11(a): a 128 × 8 RAM chip with chip select inputs CS1 and CS2, read (RD) and write (WR) inputs, address bus AD7 and a bidirectional data bus. Figure 2.11(b): a 512 × 8 ROM chip with chip select inputs CS1 and CS2, address bus AD9 and a data bus in output mode only.]

Figure 2.11 |   (a) RAM chip. (b) ROM chip.

Problem 2.16: A computer employs RAM chips of 256 × 8 and ROM chips of 1024 × 16. The computer system needs
2K bytes of RAM and 4K bytes of ROM and four interface units each with four registers. Draw a memory address
map for the system and give the address range in hexadecimal for RAM and ROM chips.

Solution: RAM 2048/256 = 8 chips; 2048 = 2^11; 256 = 2^8
ROM 4096/1024 = 4 chips; 4096 = 2^12; 1024 = 2^10
Interface 4 × 4 = 16 registers; 16 = 2^4

Component   Address     16  15  14  13  12  11  10  9   8   7   6   5   4   3   2   1
RAM         0000-07FF   0   0   0   0   0   ↔   ↔   ↔   x   x   x   x   x   x   x   x
ROM         4000-4FFF   0   1   0   0   ↔   ↔   x   x   x   x   x   x   x   x   x   x
Interface   8000-800F   1   0   0   0   0   0   0   0   0   0   0   0   x   x   x   x

The lines marked ↔ feed a decoder that selects the chip: a 3 × 8 decoder for the eight RAM chips and a 2 × 4 decoder for the four ROM chips.
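
The address map of Problem 2.16 can also be read as a decoding rule. The Python sketch below applies that rule to a 16-bit address; the function name decode_address and the returned tuple layout are assumptions made for this illustration, and addresses outside the three listed ranges are not checked.

```python
def decode_address(address):
    """Map a 16-bit address to (component, chip number, offset within chip).

    Follows the address map of Problem 2.16: line 16 selects the
    interface, line 15 selects ROM, otherwise RAM is selected.
    """
    if address & 0x8000:                       # line 16 set: interface registers
        return ("Interface", 0, address & 0xF)
    if address & 0x4000:                       # line 15 set: ROM
        chip = (address >> 10) & 0x3           # lines 12-11 via the 2 x 4 decoder
        return ("ROM", chip, address & 0x3FF)  # lines 10-1 address the chip
    chip = (address >> 8) & 0x7                # lines 11-9 via the 3 x 8 decoder
    return ("RAM", chip, address & 0xFF)       # lines 8-1 address the chip


print(decode_address(0x07FF))  # last word of the last RAM chip
print(decode_address(0x4000))  # first word of the first ROM chip
print(decode_address(0x800F))  # last interface register
```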

2.8.2 Secondary Memory

Secondary memory, also known as auxiliary memory or external memory, can store a large amount of data at a lower cost per byte than the main memory. It is non-volatile in nature, that is, data is not lost when the device is powered off. The most common auxiliary storage devices used in consumer systems are flash memory, optical disks and magnetic disks.
1. Flash memory: Flash memory is a fast, electronic, non-volatile computer storage device that can be electrically erased and reprogrammed. Example: flash drives and solid state drives.
2. Optical disk: Optical disks are low-cost mass storage devices on which read and write operations are performed using laser technology. Optical disks can store huge amounts of data, up to 6 GB (6 billion bytes). Different types of optical disks are CD-ROM (compact disk read-only), WORM (write-once read-many), EO (erasable optical disks) and DVD.
3. Magnetic disk: A magnetic disk is composed of a circular platter made of metal or plastic and coated with magnetized material on both sides. Multiple disks are stacked over one another on the spindle with read/write heads on each surface. Bits are stored as spots on the magnetized surface along concentric circles called tracks. Tracks are further divided into wedge-shaped sectors.
4. Magnetic tapes: A magnetic tape consists of a plastic tape covered with a magnetic oxide coating. Tapes are mounted on reels. Bits are recorded as magnetic spots on the tape along several tracks. R/W heads are mounted in each track so that data can be recorded and read as a sequence of characters. Seven or nine bits are recorded to form a character together with a parity bit. Data is recorded in contiguous blocks separated by inter-record gaps.

2.8.3 Cache Memory

It is a special memory that compensates for the speed mismatch between the processor and the main memory access time. It temporarily stores frequently used instructions and data for faster processing by the CPU. Cache hit ratio is calculated to measure its performance. If a data


item requested by the CPU is found in the cache, it is called a hit; otherwise it is a miss. Hit ratio is defined as the number of hits divided by the total number of CPU references to memory.

\[ \text{Hit ratio } (h) = \frac{\text{Number of hits}}{\text{Number of hits} + \text{Number of misses}} \]

\[ \text{Average access time} = \text{Hit ratio} \times T_c + (1 - \text{Hit ratio})(T_c + T_m) \]

where Tc is the cache access time and Tm is the main memory access time.

2.8.3.1 Elements of Cache Design

The various elements of cache design are as follows:
1. Cache size: It should be optimum: small enough to keep the average cost per bit close to that of the main memory and large enough to keep the overall average access time close to that of the cache memory.
2. Mapping function: It describes the mapping of a main memory block to a cache block. There are three different mapping techniques: fully associative, direct mapped and set-associative cache organization.
3. Replacement algorithm: When a new memory block is required in the cache, one of the existing blocks must be replaced by the new block. Example: FIFO (first in, first out), LRU (least recently used).
4. Write policy: Cache memory follows write-through and write-back updating policies. In the write-through policy, the cache controller copies data to main memory immediately as it is written in the cache. The data in main memory is always valid, but this approach reduces system performance. In the write-back policy, the update to the memory block is delayed until the updated cache block is replaced by a new block.

2.8.4 Cache Mapping Techniques

The cache memory can store a reasonable number of blocks, but this number is always small compared to the number of blocks in the main memory, to keep the average cost per bit low. The correspondence between memory blocks and cache blocks is specified by the following mapping techniques. Consider a cache memory consisting of 2K words with 128 blocks of 16 words each. The number of bits required to address a word in the cache is 11. Main memory has 64K words and the number of bits required to address it is 16 (Fig. 2.12).

[Figure 2.12 shows the CPU connected to a 2K × 8 cache memory and a 64K × 8 main memory.]
Figure 2.12 |   Cache mapping example.

2.8.4.1 Direct Mapping

In this technique, each block from the main memory has only one possible location in the cache memory. In this example, block i of the main memory maps onto block (i mod 128) of the cache. If there are 2^n words in the cache memory and 2^m words in the main memory, then the m-bit main memory address is divided into two fields: n bits for the index field to access the cache and (m − n) bits for the tag field. Each word in the cache consists of the data and the associated tag. Whenever a new block is brought into the cache, the tag is stored along with the data bits. The index field is further divided into block and word fields if there are multiple words (say 2^k) in a block. The lower k bits, known as the word field, select one of the 2^k words in a block. The block field is used to distinguish a block from other blocks.

Tag (m − n) bits | Index (n bits)
Tag (m − n) bits | Block (n − k) bits | Word (k bits)

When the CPU generates a memory request, the block field points to a particular block location in the cache. The high-order tag field is compared with the tag bits associated with that cache location. If they match, then the desired word is in that block of the cache. If there is no match, then the block containing the required word must first be loaded into the cache (Fig. 2.13).

[Figure 2.13 shows the 16-bit main memory address split into Tag (5 bits), Block (7 bits) and Word (4 bits); main memory blocks 0 to 4095; and cache blocks 0 to 127, each holding a 5-bit tag and data.]
Figure 2.13 |   Direct mapped cache organization.

The demerit of direct mapping is that the hit ratio drops considerably if two or more words having the same index and different tags are accessed consecutively, one after the other.

2.8.4.2 Fully Associative Mapping

In this technique, a main memory block can be placed into any cache block location. It is the most flexible cache organization. The main memory address is divided into


two fields: word and tag. The associative memory stores both the address (tag) and the data of the main memory. Figure 2.14 shows the mapping of different blocks into the cache. The high-order 12 bits of the CPU address are placed in the argument register of the associative memory and compared to the tag bits of each block of the cache to see if the desired block is present. Once the desired block is found, the 4-bit word field is used to extract the necessary word from the cache.

[Figure 2.14 shows the main memory address split into Tag (12 bits) and Word (4 bits); main memory blocks 0 to 4095; and a cache in which each block holds a 12-bit tag and data.]
Figure 2.14 |   Associative mapped cache organization.

It is necessary to compare the high-order bits of the main memory address with all the tag bits corresponding to each block to find whether a given block is present in the cache, so it is the most expensive organization.

2.8.4.3 Set-Associative Mapping

As fully associative mapping is an expensive solution and direct mapping does not allow words with the same index but different tags to exist in the cache, set-associative mapping is a combination of both. It is an improvement over direct mapping in which the contention problem is solved by having several choices for block placement. Figure 2.15 shows a two-way set-associative cache, in which each block of main memory has two choices for block placement in the cache. A block i in the main memory can be in any block belonging to set i mod S of the cache, where S is the number of sets. Blocks 0, 64, 128, … and so on of main memory can map into either of the two blocks in set 0.
The main memory address is divided into three fields: low-order bits for the word field, a set field to determine the desired set from all possible sets and high-order bits for the tag field. Each word in the cache consists of data and the associated tag.

Tag | Set | Word

When the CPU generates a memory request, the set field points to a particular set of the cache which might contain the desired block. The high-order tag field is then compared associatively to the tags corresponding to the matched set. If a match occurs, the corresponding word is read from the cache; otherwise main memory is referred to and the block containing that word is brought into the cache for future reference (Fig. 2.15).

[Figure 2.15 shows the main memory address split into Tag (6 bits), Set (6 bits) and Word (4 bits); main memory blocks 0 to 4095; and cache sets 0 to 63, each holding two blocks with a 6-bit tag and data.]
Figure 2.15 |   Set-associative mapped cache organization.

Problem 2.17: Consider a memory hierarchy system containing a cache, a main memory and a virtual memory. Assume a cache access time of 5 ns and an 80% hit ratio. The access time of the main memory is 100 ns, and it has a 99.5% hit rate. The access time of the virtual memory is 10 ms. Calculate the average access time of the memory hierarchy.

Solution: Since the hit rate of the virtual memory is 100%, the average access time for requests that reach the main memory is (100 ns × 0.995) + (10 ms × 0.005) = 50,099.5 ns. Given this, the average access time for requests that reach the cache is (5 ns × 0.80) + (50,099.5 ns × 0.20) ≈ 10,024 ns.

Problem 2.18: A computer uses RAM chips of 1024 × 1 capacity.
(a) How many chips are needed to provide a memory capacity of 16K bytes?
(b) How many of the address lines will be common to all chips?

Solution:
(a) Chips needed to provide a memory capacity of 16K bytes = 16 × 8 = 128 chips
(b) Of the 14 address lines (16K = 2^14), 10 lines specify the address within a chip (1024 = 2^10) and are common to all chips.
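
The calculation in Problem 2.17 works from the slowest level back to the fastest. The Python sketch below follows the same model as that solution, in which a request is charged only the access time of the level that serves it; the function name hierarchy_access_time and the list-of-tuples input are assumptions made for this illustration.

```python
def hierarchy_access_time(levels):
    """Average access time of a memory hierarchy.

    levels is a list of (access_time_ns, hit_rate) pairs ordered from the
    fastest level to the slowest; the last level is assumed to always hit.
    """
    average = 0.0
    for access_time, hit_rate in reversed(levels):
        # A hit costs this level's access time; a miss costs the average
        # time of everything below this level.
        average = hit_rate * access_time + (1 - hit_rate) * average
    return average


levels = [
    (5, 0.80),           # cache: 5 ns, 80% hit ratio
    (100, 0.995),        # main memory: 100 ns, 99.5% hit rate
    (10_000_000, 1.0),   # virtual memory: 10 ms expressed in ns
]
print(hierarchy_access_time(levels))  # average access time in ns
```

The rare virtual memory accesses dominate the result: 0.1% of all requests account for about 10,000 ns of the average.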


Problem 2.19: Consider a 2-way set-associative cache consisting of 256 blocks of 8 words each, and assume that the main memory is addressable by a 16-bit address and that it consists of 4K blocks. Calculate the number of bits in each of the TAG, BLOCK/SET and WORD fields for the different mapping techniques.

Solution: For direct mapping, the word field is of 3 bits to identify 8 different words in a block (2^3 = 8). As the cache memory consists of 256 blocks (2^8 = 256), 8 bits are required to address a block, because block k in main memory maps to block (k mod 256) in cache memory. The remaining 5 (16 − 8 − 3) high-order address bits are tag bits. Thus, the main memory address for direct mapping is divided as follows:

Tag       Block     Word
5 bits    8 bits    3 bits

For fully associative mapping, the number of word bits is the same, that is, 3 bits. Cache memory stores both tag and data. The high-order tag bits of an address generated by the CPU are compared with the tag bits of each block, so the number of block bits is zero. All remaining bits (except the word bits) are identified as tag bits. Thus, the main memory address for fully associative mapping is divided as follows:

Tag        Word
13 bits    3 bits

For two-way set-associative mapping, cache memory is mapped with two blocks per set. The word field has the same number of bits, 3 bits. There are 128 (256/2) sets, so 7 bits are required to uniquely identify these 128 sets. The remaining 6 (16 − 7 − 3) bits are used as tag bits. Thus, the main memory address for set-associative mapping is divided as follows:

Tag       Set       Word
6 bits    7 bits    3 bits

Problem 2.20: The access time of a cache memory is 200 ns and that of main memory is 2000 ns. It is estimated that 70% of the memory requests are for read and the remaining 30% for write. The hit ratio for read accesses only is 0.9. A write-through procedure is used.
(a) What is the average access time of the system considering only memory read cycles?
(b) What is the average access time of the system for both read and write requests?
(c) What is the hit ratio taking into consideration the write cycles also?

Solution:
(a) Average access time = 0.9 × 200 + 0.1 × 2200 = 180 + 220 = 400 ns
(b) Average access time = 0.3 × 2000 + 0.7 × 400 = 600 + 280 = 880 ns
(c) Hit ratio = 0.7 × 0.9 = 0.63

Problem 2.21: A 4-way set-associative cache memory uses blocks of four words. The cache can accommodate a total of 1024 words from the main memory. The main memory size is 128K × 32.
(a) Formulate all pertinent information required to construct the cache memory.
(b) What is the size of the cache memory?

Solution:
(a) Main memory is 128K = 2^17, so a 17-bit address is generated. The cache holds 1024/4 = 256 blocks. For a set size of 4, the number of sets in the cache is 256/4 = 64, so 6 bits are required. Each block consists of 4 words, so 2 bits are required for the word field.

Tag       Set       Word
9 bits    6 bits    2 bits

(b) Since cache is 4-way set associative, 4 blocks per set are stored in cache memory.

Tag 1 Data 1 Tag 2 Data 2 Tag 3 Data 3 Tag 4 Data 4


9 bits 32 bits 9 bits 32 bits 9 bits 32 bits 9 bits 32 bits
Set 0
- Tag Data Tag Data Tag Data Tag Data
Set 128
Size of cache memory is 128 × 4(9 + 32) = 5248 bits.
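
The read and write averages of Problem 2.20 can be checked with the average access time formula of Section 2.8.3. The Python sketch below assumes, as that solution does, that with write-through every write costs one main memory access; the function names are hypothetical.

```python
def read_access_time(hit_ratio, t_cache, t_main):
    """Average read time: h*Tc + (1 - h)*(Tc + Tm)."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)


def write_through_access_time(read_fraction, read_hit_ratio, t_cache, t_main):
    """Average time over reads and writes with a write-through cache.

    Every write is assumed to take one main memory access (assumption
    taken from the solution of Problem 2.20).
    """
    t_read = read_access_time(read_hit_ratio, t_cache, t_main)
    return read_fraction * t_read + (1 - read_fraction) * t_main


print(read_access_time(0.9, 200, 2000))                # part (a), in ns
print(write_through_access_time(0.7, 0.9, 200, 2000))  # part (b), in ns
print(0.7 * 0.9)                                       # part (c), overall hit ratio
```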


Problem 2.22: Suppose physical memory is of 2 GB and each word is of 16 bits. There is a cache containing 2K words of data, and each cache block contains 16 words. For each of the direct mapped and 2-way set associative cache configurations, specify how the address would be partitioned.

Solution:
(a) For direct mapping, the word field is of 4 bits to identify 16 different words in a block (2^4 = 16). The cache memory consists of 2K words, which is equivalent to 2K/16 = 128 blocks, so 7 bits (2^7 = 128) are required to address a block. The remaining 20 (31 − 7 − 4) high-order address bits are tag bits. Thus, the main memory address for direct mapping is divided as follows:
Tag Block Word
20 bits 7 bits 4 bits

(b) Since cache is 2-way set associative, 2 blocks per set are stored in cache memory. So, the number of sets is 128/2 = 64, which needs 6 bits, and the remaining 21 (31 − 6 − 4) bits are tag bits.
Tag Set Word
21 bits 6 bits 4 bits

Problem 2.23: Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32-bit addresses. What are the number of bits needed for addressing a block in cache and the number of tag bits?

Solution:
Tag Block Word
32 − 10 − 5 = 17 bits 10 bits 5 bits
The number of bits needed for addressing a block in cache and the number of tag bits are 10 and 17, respectively.
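
The field widths in Problems 2.22 and 2.23 follow one rule, which can be sketched in a few lines of Python. This is only an illustration: the function name `partition_address` and its parameters are hypothetical, and every size is assumed to be a power of two expressed in the same unit as the address (words or bytes).

```python
def partition_address(address_bits, cache_blocks, block_size, ways=1):
    """Return the (tag, index, offset) widths in bits.

    Assumes cache_blocks, block_size and ways are powers of two.
    ways=1 models a direct mapped cache; ways=2 a 2-way set-associative one.
    """
    offset = block_size.bit_length() - 1   # log2(block_size)
    sets = cache_blocks // ways            # blocks are grouped into sets
    index = sets.bit_length() - 1          # log2(number of sets)
    tag = address_bits - index - offset    # whatever is left over
    return tag, index, offset


print(partition_address(31, 128, 16))           # Problem 2.22(a)
print(partition_address(31, 128, 16, ways=2))   # Problem 2.22(b)
print(partition_address(32, 1024, 32))          # Problem 2.23
```

The three calls use the block counts worked out in the solutions (2K/16 = 128 blocks and 32 KB/32 B = 1024 blocks).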

IMPORTANT FORMULAS

• Amdahl’s law
  $S_{\text{overall}} = \left[ \left( 1 - \sum_i F_i \right) + \sum_i \frac{F_i}{S_i} \right]^{-1}$
• The speed up of a pipelined system over a non-pipelined system is given by:
  $S = \frac{n \times T_n}{(k + n - 1) \times T_p}$
• Maximum speed up that a pipelined system can achieve (with $T_n = k T_p$) is given by:
  $S = \frac{T_n}{T_p} = \frac{k T_p}{T_p} = k$
• Hit ratio (h)
  $h = \frac{\text{Number of hits}}{\text{Number of hits} + \text{Number of misses}}$
• Average access time
  $= \text{Hit ratio} \times T_c + (1 - \text{Hit ratio})(T_c + T_m)$
  where $T_c$ is cache access time and $T_m$ is main memory access time.
• Direct mapping
  Tag: (m - n) bits | Block: (n - k) bits | Word: k bits
• Set-associative mapping
  Tag | Set | Word
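
The formulas above translate directly into code. The following is a minimal sketch; the function and parameter names are hypothetical and chosen only to mirror the symbols used in the formulas.

```python
def amdahl_speedup(fractions, speedups):
    """Overall speed up when fraction F_i of the work is sped up by S_i."""
    enhanced = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / ((1.0 - sum(fractions)) + enhanced)


def pipeline_speedup(n, k, t_n, t_p):
    """Speed up of a k-stage pipeline over a non-pipelined unit for n tasks.

    t_n is the time per task without pipelining, t_p the pipeline clock period.
    """
    return (n * t_n) / ((k + n - 1) * t_p)


def average_access_time(hit_ratio, t_c, t_m):
    """Average access time with cache time t_c and main memory time t_m."""
    return hit_ratio * t_c + (1 - hit_ratio) * (t_c + t_m)


# Example values (assumed for illustration only)
print(amdahl_speedup([0.5], [2]))
print(pipeline_speedup(n=100, k=4, t_n=80, t_p=20))
print(average_access_time(0.9, t_c=10, t_m=100))
```

As n grows, `pipeline_speedup` approaches t_n / t_p, which equals k when t_n = k × t_p.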

SOLVED EXAMPLES

1. The principle of locality is used in
(a) Interrupt (b) Registers
(c) DMA (d) Cache memory
Solution: It is used in cache memory to help the program access small amounts of address space at any instant.
Ans. (d)
2. Which memory unit has lowest access time?
(a) Cache (b) Registers
(c) Optical disk (d) Main memory
Solution: Registers are used for processing and manipulating data and for holding memory addresses that are available to the machine-code programmer. So, they have lowest access time.
Ans. (b)


3. During DMA transfer, the DMA controller takes over the buses to manage the transfer
(a) Directly from CPU to memory
(b) Directly from memory to CPU
(c) Directly between the memory and registers
(d) Directly between the I/O device and memory
Solution: DMA controller manages transfer between I/O device and memory.
Ans. (d)
4. Booth’s algorithm is used for the arithmetic operation of
(a) addition. (b) subtraction.
(c) multiplication. (d) division.
Solution: It is a multiplication algorithm that multiplies two signed binary numbers in 2’s complement notation.
Ans. (c)
5. The reason for improvement in CPU performance during pipelining is
(a) reduced memory access time.
(b) increased clock speed.
(c) introduction of parallelism.
(d) increase in cache memory.
Solution: Instruction-level parallelism is implemented within a single processor to allow faster CPU throughput.
Ans. (c)
6. Use of cache memory enhances
(a) I/O access time.
(b) memory access time.
(c) effective memory access time.
(d) secondary storage access time.
Solution: Cache memory compensates for the speed mismatch between processor and main memory access time.
Ans. (c)
7. An instruction cycle refers to
(a) fetching an instruction.
(b) executing an instruction.
(c) fetching, decoding and executing an instruction.
(d) reading and executing an instruction.
Solution: It involves fetching, decoding and executing the instruction.
Ans. (c)
8. A hardware interrupt is also called
(a) an internal interrupt.
(b) an external interrupt.
(c) a processor interrupt.
(d) a clock interrupt.
Solution: Hardware interrupt is present in hardware pins.
Ans. (b)
9. Priority is provided by ________ for access to memory by various I/O channels and processors.
(a) a register
(b) a counter
(c) the processor scheduler
(d) a controller
Solution: Controller sets priority for memory access by various I/O devices and processes.
Ans. (d)
10. By applying the principle of temporal locality, processes are likely to reference pages that
(a) have been referenced recently.
(b) are located at address near recently referenced pages in memory.
(c) have been preloaded into memory.
(d) have to be reloaded into memory.
Solution: Temporal locality refers to reuse of resources referenced within a short time frame.
Ans. (a)
11. Which of the following is a correct statement related to L2 cache memory?
(a) The level 1 cache is always faster than the level 2 cache.
(b) The level 2 cache is used to mitigate the dynamic slowdown every time a level 1 cache miss occurs.
(c) Level 2 cache comes as on board only.
(d) In modern day computer, the level 2 cache is considered an internal cache.
Solution: L2 level of cache is placed between the L1 and RAM. The L1 cache is always the fastest.
Ans. (a)
12. What is the control unit’s function in the CPU?
(a) To decode program instructions
(b) To transfer data to primary storage
(c) To perform logical operations
(d) To store arithmetic operations
Solution: Control unit controls several units of CPU and helps decode program instructions.
Ans. (a)
13. CPU fetches the data and instructions from
(a) ROM (b) control unit
(c) RAM (d) coprocessors chip


Solution: The CPU fetches data and instructions from RAM and executes them.
Ans. (c)
14. Which of the following affects the processing power of the CPU?
(a) Data bus, addressing schemes
(b) Clock speed, addressing schemes
(c) Clock speed, data bus
(d) Clock speed, data bus, addressing schemes
Solution: CPU processing speed is affected by clock cycle, data bus for routing data and addressing modes.
Ans. (d)
15. The contents of the flag register after execution of the following program by 8085 microprocessor will be:
Program
SUB A
MVI B, (01)H
DCR B
HLT
(a) (54)H (b) (00)H (c) (01)H (d) (45)H
Solution:
SUB A           Subtracts A from itself, so A = (00)H
MVI B, (01)H    Loads B with (01)H
DCR B           Decrements B to (00)H; the Z, AC and P flags are set, so the flag register holds (54)H
HLT
Ans. (a)
16. An N-bit carry look-ahead adder, where N is a multiple of 4, employs ICs 74181 (4-bit ALU) and 74182 (4-bit carry look-ahead generator).
The minimum addition time using the best architecture for adder is
(a) Proportional to N
(b) Proportional to log N
(c) A constant
(d) None of the above
Solution: Each 74182 looks ahead across only four groups, so a wider adder needs a tree of look-ahead generators. The number of levels in this tree grows as log N (base 4), so the addition time is proportional to log N. It would be a constant only if a single look-ahead stage of unlimited fan-in were available.
Ans. (b)
17. The main memory of a computer has 2cm blocks while the cache has 2c blocks. If the cache uses the set-associative mapping scheme with two blocks per set, then block k of the main memory maps to the set:
(a) (k mod m) of the cache
(b) (k mod c) of the cache
(c) (k mod 2c) of the cache
(d) (k mod 2cm) of the cache
Solution:
[Figure: two-way set-associative cache with blocks 0 to 2c − 1 grouped into sets, and main memory with blocks 0 to 2cm − 1]
Number of sets in cache = (Number of blocks in cache)/(Number of blocks in one set) = 2c/2 = c
So block k of the main memory maps to the set (k mod c) of the cache.
Ans. (b)
18. Which of the following is/are advantage of virtual memory?
(a) Faster access to memory on an average
(b) Processes can be given protected address spaces
(c) Linker can assign addresses independent of where the program will be loaded in physical memory
(d) Programs larger than the physical memory size can be run
Solution: Virtual memory allows programs to have a larger space than the physical memory size to run. So, the programmer does not have to worry about physical memory size.
Ans. (d)
19. The number of full and half adders required to add 16-bit numbers is
(a) 8 half adders, 8 full adders
(b) 1 half adder, 15 full adders
(c) 16 half adders, 0 full adder
(d) 4 half adders, 12 full adders
Solution: To add N-bit numbers:
Number of half adders required = 1
Number of full adders required = N − 1
Therefore, to add 16-bit numbers, only 1 half adder and 15 full adders are required.
Ans. (b)
20. Which of the following requires a device driver?
(a) Register (b) Cache
(c) Main memory (d) Disk


Solution: Disk is the I/O device attached externally to the processor. Therefore, disk requires a device driver.
Ans. (d)
21. More than one word are put in one cache block to
(a) exploit the temporal locality of reference in a program.
(b) exploit the spatial locality of reference in a program.
(c) reduce the miss penalty.
(d) none of the above.
Solution: There are two types of locality of reference: temporal and spatial locality. Under spatial locality, instead of fetching just one item from the main memory to the cache, it is useful to fetch several items that reside at adjacent addresses as well. So, option (b) is correct.
Ans. (b)
22. Which of the following statements is false?
(a) Virtual memory implements the translation of a program’s address space into physical memory address space.
(b) Virtual memory allows each program to exceed the size of the primary memory.
(c) Virtual memory increases the degree of multiprogramming.
(d) Virtual memory reduces the context-switching overhead.
Solution: Virtual memory increases the context-switching overhead.
Ans. (d)
23. Advantage of synchronous sequential circuits over asynchronous ones is
(a) faster operation.
(b) ease of avoiding problems due to hazards.
(c) lower hardware requirement.
(d) better noise immunity.
Solution: In a synchronous circuit the state changes only on clock edges, so transient glitches between edges do not affect the state, and hazards are easy to avoid. Asynchronous circuits do not wait for a clock and are generally faster.
Ans. (b)
24. The total size of address space in a virtual memory system is limited by
(a) the length of MAR.
(b) the available secondary storage.
(c) the available main memory.
(d) all of the above.
Solution: Virtual memory depends only on the available size of the secondary memory.
Ans. (b)
25. Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that
(a) T1 ≤ T2
(b) T1 ≥ T2
(c) T1 < T2
(d) T1 plus T2 is the time taken for one instruction fetch cycle
Solution: In case of one instruction, non-pipelined CPU takes less time as compared to pipelined CPU. This is due to buffer delays for pipelining.
Ans. (b)

GATE PREVIOUS YEARS’ QUESTIONS

1. For a pipelined CPU with a single ALU, consider the following situations:
I. The (j + 1)-st instruction uses the result of the j-th instruction as an operand.
II. The execution of a conditional jump instruction.
III. The j-th and (j + 1)-st instructions require the ALU at the same time.
Which of the above can cause a hazard?
(a) I and II only (b) II and III only
(c) III only (d) All the three
(GATE 2003: 1 Mark)
Solution: All the three statements cause hazards.
Ans. (d)
Common Data Questions 2 and 3: Consider the following assembly language program for a hypothetical processor. A, B and C are 8-bit registers. The meanings of various instructions are shown as comments:
   MOV B, #0   ; B ← 0
   MOV C, #8   ; C ← 8
Z: CMP C, #0   ; Compare C with 0
   JZ X        ; Jump to X if zero flag is set


   SUB C, #1   ; C ← C − 1
   RRC A, #1   ; Right rotate A through carry by one bit. Thus:
               ; if the initial values of A and the carry flag are a7..a0 and
               ; c0 respectively, their values after the execution of this
               ; instruction will be c0 a7..a1 and a0, respectively.
   JC Y        ; Jump to Y if carry flag is set
   JMP Z       ; Jump to Z
Y: ADD B, #1   ; B ← B + 1
   JMP Z       ; Jump to Z
X:
2. If the initial value of register A is A0, the value of register B after the program execution will be
(a) The number of 0 bits in A0
(b) The number of 1 bits in A0
(c) A0
(d) 8
(GATE 2003: 2 Marks)
Solution: The loop runs 8 times. Each rotation moves one bit of A0 into the carry flag, and B is incremented only when that carry is set. So, after execution, register B contains the number of 1 bits in A0.
Ans. (b)
3. Which of the following instructions when inserted at location X will ensure that the value of register A after program execution is the same as its initial value?
(a) RRC A, #1
(b) NOP; no operation
(c) LRC A, #1; left rotate A through carry flag by one bit
(d) ADD A, #1
(GATE 2003: 2 Marks)
Solution: RRC A, #1. A and the carry flag together form a 9-bit ring, so the 8 rotations in the loop need one more right rotation to bring A back to its initial value.
Ans. (a)
4. Which of the following addressing modes are suitable for program relocation at run time?
(i) Absolute addressing (ii) Base addressing
(iii) Relative addressing (iv) Indirect addressing
(a) (i) and (iv) (b) (i) and (ii)
(c) (ii) and (iii) (d) (i), (ii) and (iv)
(GATE 2004: 1 Mark)
Solution: Both base addressing and relative addressing modes are suitable.
Ans. (c)
Common Data Questions 5 and 6: Consider the following program segment for a hypothetical CPU having three user registers R1, R2 and R3.
Instruction      Operation               Instruction Size (in words)
MOV R1, 5000     ; R1 ← Memory[5000]     2
MOV R2, (R1)     ; R2 ← Memory[(R1)]     1
ADD R2, R3       ; R2 ← R2 + R3          1
MOV 6000, R2     ; Memory[6000] ← R2     2
HALT             ; Machine halts         1
5. Consider that the memory is byte addressable with size 32 bits, and the program has been loaded starting from memory location 1000 (decimal). If an interrupt occurs while the CPU has been halted after executing the HALT instruction, the return address (in decimal) saved in the stack will be
(a) 1007 (b) 1020
(c) 1024 (d) 1028
(GATE 2004: 2 Marks)
Solution: Each word occupies 4 bytes, so the instructions occupy the following byte addresses:
Instruction      Size (words)   Addresses
MOV R1, 5000     2              1000 to 1007
MOV R2, (R1)     1              1008 to 1011
ADD R2, R3       1              1012 to 1015
MOV 6000, R2     2              1016 to 1023
HALT             1              1024 to 1027
Ans. (c)
6. Let the clock cycles required for various operations be as follows:
Register to/from memory transfer: 3 clock cycles
ADD with both operands in register: 1 clock cycle
Instruction fetch and decode: 2 clock cycles per word
The total number of clock cycles required to execute the program is
(a) 29 (b) 24
(c) 23 (d) 20
(GATE 2004: 2 Marks)


Solution: Each instruction needs 2 clock cycles per word for fetch and decode, plus the cycles for its operation:
Instruction      Size (words)   Clock Cycles
MOV R1, 5000     2              2 × 2 + 3 = 7
MOV R2, (R1)     1              2 + 3 = 5
ADD R2, R3       1              2 + 1 = 3
MOV 6000, R2     2              2 × 2 + 3 = 7
HALT             1              2
Total 24 clock cycles are required.
Ans. (b)
7. Consider a small two-way set-associative cache memory, consisting of four blocks. For choosing the block to be replaced, use the least recently used (LRU) scheme. The number of cache misses for the following sequence of block addresses 8, 12, 0, 12, 8 is
(a) 2 (b) 3 (c) 4 (d) 5
(GATE 2004: 2 Marks)
Solution: Sequence is: 8, 12, 0, 12, 8. Four blocks in a two-way cache give 2 sets, and all three addresses map to set 0 (block address mod 2 = 0).
8: miss; 12: miss; 0: miss (replaces 8, the least recently used); 12: hit; 8: miss (replaces 0).
Total number of misses = 4
Ans. (c)
8. A hard disk with a transfer rate of 10 MB/s is constantly transferring data to memory using DMA. The processor runs at 600 MHz, and takes 300 and 900 clock cycles to initiate and complete DMA transfer, respectively. If the size of the transfer is 20 KB, what is the percentage of processor time consumed for the transfer operation?
(a) 5.0% (b) 1.0% (c) 0.5% (d) 0.1%
(GATE 2004: 2 Marks)
Solution: Given that data transfer rate = 10 MB/s and size of transfer = 20 KB
Time for the transfer = 20 KB/(10 MB/s) ≈ 2 ms
Processor time = (300 + 900) cycles/600 MHz = 2 µs
So, % processor time consumed = (2 µs/2 ms) × 100 = 0.1%
Ans. (d)
9. A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 ns, respectively. Registers that are used between the stages have a delay of 5 ns each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be
(a) 120.4 µs (b) 160.5 µs (c) 165.5 µs (d) 590.0 µs
(GATE 2004: 2 Marks)
Solution: Maximum stage delay = 160 + 5 = 165 ns
T = [4 + (1000 − 1)] × 165 ns = 165.5 µs
Ans. (c)
10. Consider a multiplexer with X and Y as data inputs and Z as control input. Z = 0 selects input X, and Z = 1 selects input Y. What are the connections required to realize the 2-variable Boolean function f = T + R, without using any additional hardware?
(a) R to X, 1 to Y, T to Z
(b) T to X, R to Y, T to Z
(c) T to X, R to Y, 0 to Z
(d) R to X, 0 to Y, T to Z
Solution: The multiplexer output is f = Z'X + ZY. Connect R to X, 1 to Y and T to Z; then f = T'R + T = T + R.
Ans. (a)
11. The microinstructions stored in the control memory of a processor have a width of 26 bits. Each microinstruction is divided into three fields: a micro-operation field of 13 bits, a next address field (X), and a MUX select field (Y). There are 8 status bits in the inputs of the MUX. How many bits are there in the X and Y fields, and what is the size of the control memory in number of words?
(a) 10, 3, 1024 (b) 8, 5, 256
(c) 5, 8, 2048 (d) 10, 3, 512
(GATE 2004: 2 Marks)
[Figure: control address register with Load and Increment inputs addressing the control memory; the microinstruction supplies 13 micro-operation bits, the next address field X and the MUX select field Y; the MUX has 8 status bits as inputs]
Solution: MUX has 8 input bits, so 3 select lines are required. So, Y = 3. Total bits = 26
X = 26 − 13 − 3 = 10. So, memory required = 2^10 = 1024 words
Ans. (a)
12. Which one of the following is true for a CPU having a single interrupt request line and a single interrupt grant line?
(a) Neither vectored interrupt nor multiple interrupting devices are possible.
(b) Vectored interrupts are not possible but multiple interrupting devices are possible.
(c) Vectored interrupts and multiple interrupting devices are both possible.
(d) Vectored interrupt is possible but multiple interrupting devices are not possible.
(GATE 2005: 1 Mark)
Solution: Vectored interrupts are not possible but multiple interrupting devices are possible.
Ans. (b)
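
The miss count in Question 7 can be checked by simulating the cache. The following is a minimal sketch, assuming that a block maps to set (block address mod number of sets); the function name `count_misses` and its parameters are hypothetical.

```python
from collections import OrderedDict


def count_misses(block_addresses, num_blocks, ways):
    """Count misses in a set-associative cache with LRU replacement."""
    num_sets = num_blocks // ways
    # One ordered dict per set: oldest entry first, most recent last.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)        # hit: now the most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[block] = True
    return misses


print(count_misses([8, 12, 0, 12, 8], num_blocks=4, ways=2))  # prints 4
```

Setting `ways=1` turns the same routine into a direct mapped cache, which is handy for comparing the two organizations on one reference string.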


13. Normally user programs are prevented from handling I/O directly by I/O instructions in them. For CPUs having explicit I/O instructions, such I/O protection is ensured by having the I/O instructions privileged. In a CPU with memory mapped I/O, there is no explicit I/O instruction. Which one of the following is true for a CPU with memory mapped I/O?
(a) I/O protection is ensured by operating system routine(s).
(b) I/O protection is ensured by a hardware trap.
(c) I/O protection is ensured during system configuration.
(d) I/O protection is not possible.
(GATE 2005: 1 Mark)
Solution: I/O protection is ensured by operating system routine(s).
Ans. (a)
14. Consider a three word machine instruction ADD A[R0], @B
The first operand (destination) “A[R0]” uses indexed addressing mode with R0 as the index register. The second operand (source) “@B” uses indirect addressing mode. A and B are memory addresses residing at the second and the third words, respectively. The first word of the instruction specifies the opcode, the index register designation and the source and destination addressing modes.
During execution of ADD instruction, the two operands are added and stored in the destination (first operand).
The number of memory cycles needed during the execution cycle of the instruction is:
(a) 3 (b) 4
(c) 5 (d) 6
(GATE 2005: 2 Marks)
Solution: A[R0] requires 1 memory cycle.
The 3rd and 4th operands are indirectly addressed so each requires 2 cycles = 4 memory cycles.
1 memory cycle will be required to store the result.
So, total of 6 memory cycles are required.
Ans. (d)
15. Match each of the high level language statements given on the left-hand side with the most natural addressing mode from those listed on the right-hand side.
(1) A[I] = B[J]      (a) Indirect addressing
(2) while (*A++)     (b) Indexed addressing
(3) int temp = *x    (c) Auto increment
(a) (1, c), (2, b), (3, a) (b) (1, a), (2, c), (3, b)
(c) (1, b), (2, c), (3, a) (d) (1, a), (2, b), (3, c)
(GATE 2005: 2 Marks)
Solution: A[I] = indexed addressing
*A++ = auto increment
temp = *x = indirect addressing
Ans. (c)
16. Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32-bit addresses. The number of bits needed for cache indexing and the number of tag bits are, respectively
(a) 10, 17 (b) 10, 22 (c) 15, 17 (d) 5, 17
(GATE 2005: 2 Marks)
Solution: Number of blocks in cache = 32 KB/32 B = 2^10, so index bits = 10
Block size = 32 B = 2^5 B, so block offset bits = 5
Tag bits = 32 − 10 − 5 = 17
Ans. (a)
17. A 5-stage pipelined CPU has the following sequence of stages:
IF: Instruction fetch from instruction memory
RD: Instruction decode and register read
EX: Execute ALU operation for data and address computation
MA: Data memory access; for write access, the register read at RD stage is used.
WB: Register write back.
Consider the following sequence of instructions:
I1: L R0, loc1 ; R0 ⇐ M[loc1]
I2: A R0, R0 ; R0 ⇐ R0 + R0
I3: S R2, R0 ; R2 ⇐ R2 − R0
Let each stage take one clock cycle. What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of I1?
(a) 8 (b) 10 (c) 12 (d) 15
(GATE 2005: 2 Marks)
Solution: 12 clock cycles are taken, as shown below

Clocks  1   2   3   4   5   6   7   8   9   10  11  12
I1      IF  RD  EX  MA  WB
I2          IF  -   -   -   RD  EX  MA  WB
I3              IF  -   -   -   RD  -   -   EX  MA  WB

Ans. (c)


18. A device with data transfer rate 10 KB/s is connected to a CPU. Data is transferred byte-wise. Let the interrupt overhead be 4 µs. The byte transfer time between the device interface register and CPU or memory is negligible. What is the minimum performance gain of operating the device under interrupt mode over operating it under program-controlled mode?
(a) 15 (b) 25
(c) 35 (d) 45
(GATE 2005: 2 Marks)
Solution: Data transfer rate = 10 KB/s
Interrupt overhead = 4 × 10^−6 s
10 KB is sent in 1 s
1 B is sent in 1/10K s = 100 × 10^−6 s
Minimum performance gain = (100 × 10^−6)/(4 × 10^−6) = 25
Ans. (b)
Common Data Questions 19 and 20: Consider the following data path of a CPU. The ALU, the bus and all the registers in the data path are of identical size. All operations including incrementation of the PC and the GPRs are to be carried out in the ALU. Two clock cycles are needed for memory read operation - the first one for loading address in the MAR and the next one for loading data from the memory bus into the MDR.
[Figure: data path connecting MAR, MDR, IR, PC, GPRs and the ALU with its registers S and T]
19. The instruction “add R0, R1” has the register transfer interpretation R0 ⇐ R0 + R1. The minimum number of clock cycles needed for execution cycle of this instruction is
(a) 2 (b) 3
(c) 4 (d) 5
(GATE 2005: 2 Marks)
Solution: There will be three cycles: (1) R0out, Sin; (2) R1out, Tin; and (3) Sout, Tout, ALUadd, R0in.
Ans. (b)
20. The instruction “call Rn, sub” is a two-word instruction. Assuming that PC is incremented during the fetch cycle of the first word of the instruction, its register transfer interpretation is
Rn ⇐ PC + 1;
PC ⇐ M[PC];
The minimum number of CPU clock cycles needed during the execution cycle of this instruction is
(a) 2 (b) 3 (c) 4 (d) 5
(GATE 2005: 2 Marks)
Solution: The minimum number of CPU clock cycles needed during the execution cycle = 4. This is because
1 cycle is required to transfer the already incremented value of PC
2 cycles for getting data in MDR
1 cycle to load the value of MDR in PC
Ans. (c)
21. Consider a disk drive with the following specifications:
16 surfaces, 512 tracks/surface, 512 sectors/track, 1 KB/sector, rotation speed 3000 rpm. The disk is operated in cycle stealing mode whereby whenever one 4 byte word is ready it is sent to memory; similarly, for writing, the disk interface reads a 4 byte word from the memory in each DMA cycle. Memory cycle time is 40 nsec. The maximum percentage of time that the CPU gets blocked during DMA operation is:
(a) 10 (b) 25
(c) 40 (d) 50
(GATE 2005: 2 Marks)
Solution:
Data transferred in one rotation = 512 × 1024 bytes = 512 KB
1 rotation takes 60/3000 s = 20 ms
So 512 KB is transferred in 60/3000 s
1 byte is transferred in 60/(3000 × 512 × 1024) s = 38.15 ns
4 bytes are transferred in 4 × 38.15 ns = 152.58 ns
Block % = 40/152.58 = 26%, which is closest to option (b)
Ans. (b)
22. A CPU has 24-bit instructions. A program starts at address 300 (in decimal). Which one of the following is a legal program counter (all values in decimal)?
(a) 400 (b) 500
(c) 600 (d) 700
(GATE 2006: 1 Mark)


Solution: Size of instruction = 24 bits = 3 bytes; start address = 300. Every instruction begins 3 bytes after the previous one and 300 is itself a multiple of three, so a legal address must be a multiple of three. Of the options, only 600 qualifies.
Ans. (c)
23. A CPU has a cache with block size 64 bytes. The main memory has k banks, each bank being c bytes wide. Consecutive c byte chunks are mapped on consecutive banks with wrap around. All the k banks can be accessed in parallel, but two accesses to the same bank must be serialized. A cache block access may involve multiple iterations of parallel bank accesses depending on the amount of data obtained by accessing all the k banks in parallel. Each iteration requires decoding the bank numbers to be accessed in parallel and this takes k/2 ns. The latency of one bank access is 80 ns. If c = 2 and k = 24, the latency of retrieving a cache block starting at address zero from the main memory is
(a) 92 ns (b) 104 ns
(c) 172 ns (d) 184 ns
(GATE 2006: 2 Marks)
Solution:
One parallel iteration fetches k × c = 24 × 2 = 48 bytes, so a 64-byte block needs 2 iterations.
Time for one iteration = k/2 + bank latency = 24/2 + 80 = 92 ns
Total time = 2 × 92 = 184 ns
Ans. (d)
24. A CPU has a five-stage pipeline and runs at 1 GHz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes 10^9 instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is
(a) 1.0 s (b) 1.2 s
(c) 1.4 s (d) 1.6 s
(GATE 2006: 2 Marks)
Solution:
Each conditional branch stalls instruction fetch for 2 cycles, since its outcome is known only in the third stage.
Total cycles = 10^9 + 0.20 × 2 × 10^9 = 1.4 × 10^9
At 1 GHz, the total execution time of the program = 1.4 s
Ans. (c)
25. Consider a new instruction named branch-on-bit-set (mnemonic bbs). The instruction “bbs reg, pos, label” jumps to label if bit in position pos of register operand reg is one. A register is 32 bits wide and the bits are numbered 0 to 31, bit in position 0 being the least significant. Consider the following emulation of this instruction on a processor that does not have bbs implemented.
temp ← reg & mask
Branch to label if temp is non-zero.
The variable temp is a temporary register. For correct emulation, the variable mask must be generated by
(a) mask ← 0x1 << pos
(b) mask ← 0xffffffff << pos
(c) mask ← pos
(d) mask ← 0xf
(GATE 2006: 2 Marks)
Solution: As only the bit at position pos is to be tested, all the other bits need to be set to 0 in temp. The mask register must have 1 in the pos position only, for which 1 has to be shifted left by pos positions.
Ans. (a)
Common Data Questions 26 and 27: Consider two cache organizations. The first one is 32 KB, 2-way set associative with 32-byte block size. The second one is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of 0.6 ns while a k-bit comparator has a latency of k/10 ns. The hit latency of the set-associative organization is h1 while that of the direct mapped one is h2.
26. The value of h1 is
(a) 2.4 ns (b) 2.3 ns
(c) 1.8 ns (d) 1.7 ns
(GATE 2006: 2 Marks)
Solution:
Address bits = 32
Block size = 32 B, so word (offset) bits = 5
Number of blocks in cache = 32 KB/32 B = 1K = 2^10
For 2-way set-associative memory, number of sets = 2^9, so index bits = 9 and tag bits = 32 − 9 − 5 = 18
h1 = comparator latency + multiplexer latency = 18/10 + 0.6 = 2.4 ns
Ans. (a)
27. The value of h2 is
(a) 2.4 ns (b) 2.3 ns
(c) 1.8 ns (d) 1.7 ns
(GATE 2006: 2 Marks)
Solution:
In direct mapping: Tag bits = 17; Index = 10 bits; word = 5 bits
A direct mapped cache has only one candidate block per index, so no multiplexer is needed to choose between ways, and only the comparator contributes.
h2 = 17/10 = 1.7 ns
Ans. (d)
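
The emulation of bbs in Question 25 can be tried out directly. The following is a minimal sketch in Python rather than in the machine's own instruction set; the function name `bbs_taken` is hypothetical, and reg is assumed to be an unsigned 32-bit value.

```python
def bbs_taken(reg, pos):
    """Return True when the bbs branch would be taken.

    reg: unsigned 32-bit register value; pos: bit position, 0 = least significant.
    """
    mask = 0x1 << pos     # a single 1 shifted left by pos positions
    temp = reg & mask     # every bit except bit pos is cleared
    return temp != 0      # branch to label if temp is non-zero


print(bbs_taken(0b1000, 3))       # bit 3 is set
print(bbs_taken(0b1000, 2))       # bit 2 is clear
print(bbs_taken(0x80000000, 31))  # most significant bit is set
```

A mask of pos or 0xf would test the wrong bits, and 0xffffffff shifted by pos would test many bits at once, which is why only a single shifted 1 gives the correct emulation.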


Common Data Questions 28 and 29: A CPU has a 32-KB direct-mapped cache with 128-byte block size. Suppose A is a two-dimensional array of size 512 × 512 with elements that occupy 8 bytes each. Consider the following two C code segments, P1 and P2.
P1: for (i=0; i<512; i++) {
        for (j=0; j<512; j++) {
            x += A[i][j];
        }
    }
P2: for (i=0; i<512; i++) {
        for (j=0; j<512; j++) {
            x += A[j][i];
        }
    }
P1 and P2 are executed independently with the same initial state, namely, the array A is not in the cache and i, j, x are in registers. Let the number of cache misses experienced by P1 be M1 and that for P2 be M2.
28. The value of M1 is
(a) 0 (b) 2048
(c) 16384 (d) 262144
(GATE 2006: 2 Marks)
Solution:
Cache size = 32 KB
Block size = 128 B
Number of blocks in cache = 32 KB/128 B = 256
Number of elements in a block = 128/8 = 16
P1 accesses the array row by row, so it misses once per block:
M1 = 512 × 512/16 = 16384
Ans. (c)
29. The value of the ratio M1/M2 is
(a) 0 (b) 1/16
(c) 1/8 (d) 16
(GATE 2006: 2 Marks)
Solution:
P2 accesses the array column by column. Consecutive accesses are 512 × 8 = 4096 bytes apart, and a block is evicted before it is reused, so every access is a miss.
P2 number of cache misses = M2 = 512 × 512 = 262144
Ratio of M1:M2 = 1:16
Ans. (b)
30. Consider a 4-way set-associative cache consisting of 128 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the TAG, LINE and WORD fields are, respectively:
(a) 9, 6, 5 (b) 7, 7, 6
(c) 7, 5, 8 (d) 9, 5, 6
(GATE 2007: 1 Mark)
Solution: For 64 words, log2 64 = 6 bits are required for the WORD field.
Number of sets = 128/4 = 32, so 5 bits are required for the LINE field.
Tag bits = 20 − 5 − 6 = 9
Tag bits  Line  Word
9         5     6
Ans. (d)
31. Consider a pipelined processor with the following four stages:
IF: Instruction fetch
ID: Instruction decode and operand fetch
EX: Execute
WB: Write back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?
ADD R2, R1, R0    R2 ← R1 + R0
MUL R4, R3, R2    R4 ← R3 × R2
SUB R6, R5, R4    R6 ← R5 − R4
(a) 7 (b) 8 (c) 10 (d) 14
(GATE 2007: 1 Mark)
Solution:
Clock  1   2   3   4   5   6   7   8
I1     IF  ID  EX  WB
I2         IF  ID  EX  EX  EX  WB
I3             IF  ID  -   -   EX  WB
Using operand forwarding, 8 clock cycles are required.
Ans. (b)
32. In a simplified computer, the instructions are
OP Rj, Ri - Performs Rj OP Ri and stores the result in register Ri.
OP m, Ri - Performs val OP Ri and stores the result in Ri. val denotes the content of memory location m.
MOV m, Ri - Moves the content of memory location m to register Ri.
MOV Ri, m - Moves the content of register Ri to memory location m.
The computer has only two registers, and OP is either ADD or SUB. Consider the following basic block:
t1 = a + b
t2 = c + d
t3 = e − t2
t4 = t1 − t3


Assume that all operands are initially in memory. The Solution: Given that R1 = 10, so the loop will
final value of the computation should be in memory. run 10 times
What is the minimum number of MOV instructions 10 × 2 + 1 = 21
in the code generated for this basic block? Ans. (d)
(a) 2 (b) 3
(c) 5 (d) 6 34. Assume that the memory is word addressable.
After the execution of this program, the content of
(GATE 2007: 1 Mark) memory location 2010 is
Solution: The instructions generated in the code (a) 100 (b) 101
for this basic block are as follows: (c) 102 (d) 110
MOV a, Ri
ADD b, Ri
MOV c, Rj
ADD d, Rj
SUB e, Rj
SUB Ri, Rj
MOV m, Ri
Ans. (b)
Common Data Questions 33-35: Consider the following program segment. Here R1, R2 and R3 are general purpose registers.

Instruction          Operation            Instruction Size (No. of Words)
MOV R1, (3000)       R1 ← M[3000]         2
LOOP: MOV R2, (R3)   R2 ← M[R3]           1
ADD R2, R1           R2 ← R1 + R2         1
MOV (R3), R2         M[R3] ← R2           1
INC R3               R3 ← R3 + 1          1
DEC R1               R1 ← R1 - 1          1
BNZ LOOP             Branch on not zero   2
HALT                 Stop                 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

33. Assume that the memory is word addressable. The number of memory references for accessing the data in executing the program completely is
(a) 10 (b) 11
(c) 20 (d) 21
(GATE 2007: 1 Mark)
34. (GATE 2007: 1 Mark)
Solution: The content of memory location 2010 remains 100, because the loop exits when the value in R1 becomes 0, at which point the address in R3 has become 2010 and that location has not been written.
Ans. (a)
35. Assume that the memory is byte addressable and the word size is 32 bits. If an interrupt occurs during the execution of the instruction “INC R3”, what return address will be pushed on to the stack?
(a) 1005 (b) 1020
(c) 1024 (d) 1040
(GATE 2007: 1 Mark)
Solution: The memory is byte addressable and each word is 4 bytes. The four instructions before INC R3 occupy 2 + 1 + 1 + 1 = 5 words = 20 bytes, so INC R3 occupies addresses 1020 to 1023. The return address pushed on to the stack is the address of the next instruction, 1024.
Ans. (c)
36. Consider a disk pack with 16 surfaces, 128 tracks per surface and 256 sectors per track. 512 bytes of data are stored in a bit serial manner in a sector. The capacity of the disk pack and the number of bits required to specify a particular sector in the disk are respectively:
(a) 256 Mbyte, 19 bits (b) 256 Mbyte, 28 bits
(c) 512 Mbyte, 20 bits (d) 64 Gbyte, 28 bits
(GATE 2007: 1 Mark)
Solution:
Disk capacity = 16 surfaces × 128 tracks × 256 sectors × 512 bytes = 2^28 bytes = 256 MB
Total number of sectors = 16 × 128 × 256 = 2^19, so 19 bits are required to specify a sector.
Ans. (a)

Linked Answer Questions 37 and 38: Consider a machine with a byte addressable main memory of 2^16 bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 × 50 two-dimensional array of bytes is stored in the main memory starting from memory location 1100H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses.
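The return address in Question 35 can be checked by laying the instructions out in memory. The following is a minimal sketch, assuming the instruction sizes from the table above; the names `INSTRUCTIONS` and `layout` are illustrative and not part of the original text.

```python
# Instruction sizes (in words) taken from the program table, in program order.
INSTRUCTIONS = [
    ("MOV R1, (3000)", 2),
    ("MOV R2, (R3)", 1),
    ("ADD R2, R1", 1),
    ("MOV (R3), R2", 1),
    ("INC R3", 1),
    ("DEC R1", 1),
    ("BNZ LOOP", 2),
    ("HALT", 1),
]


def layout(base, bytes_per_word):
    """Map each instruction to (start address, address of the next instruction)."""
    table = {}
    address = base
    for mnemonic, words in INSTRUCTIONS:
        next_address = address + words * bytes_per_word
        table[mnemonic] = (address, next_address)
        address = next_address
    return table


# Byte-addressable memory with 32-bit (4-byte) words, program loaded at 1000.
start, return_address = layout(base=1000, bytes_per_word=4)["INC R3"]
print(start, return_address)  # 1020 1024
```

The second value is the address of the instruction after INC R3, which is what gets pushed when the interrupt is serviced after the current instruction completes.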


37. How many data cache misses will occur in total?
(a) 48    (b) 50    (c) 56    (d) 59
(GATE 2007: 2 Marks)
Solution:
Main memory = 2^16 B
Block size = 64 B
Number of cache lines = 32
Number of elements = 50 × 50 = 2500 (one byte each)
The array starts at 1100H = 4352 = 68 × 64, that is, at the start of memory block 68.
Number of blocks occupied = ⌈2500/64⌉ = 40 (blocks 68 to 107)
Block k maps to line k mod 32, so blocks 68 to 75 and blocks 100 to 107 both map to lines 4 to 11.
First access: the cache is initially empty, so all 40 blocks miss.
Second access: blocks 68 to 75 miss (8 misses), blocks 76 to 99 hit, and blocks 100 to 107 miss (8 misses).
The array is traversed twice, so data cache misses = 40 + 8 + 8 = 56
Ans. (c)
38. Which of the following lines of the data cache will be replaced by new blocks in accessing the array for the second time?
(a) line 4 to line 11 (b) line 4 to line 12
(c) line 0 to line 7 (d) line 0 to line 8
Solution:
Applying k mod c to find the line: 68 mod 32 = 4, so the eight conflicting blocks occupy lines 4 to 11.
Ans. (a)

39. Which of the following is/are true for the auto-increment addressing mode?
I. It is useful in creating self-relocating code.
II. If it is included in an Instruction Set Architecture, then an additional ALU is required for effective address calculation.
III. The amount of increment depends on the size of the data item accessed.
(a) I only (b) II only
(c) III only (d) II and III only
(GATE 2008: 2 Marks)
Solution: Only statement (III) is true.
Ans. (c)
40. Which of the following must be true for the RFE (Return from Exception) instruction on a general purpose processor?
I. It must be a trap instruction.
II. It must be a privileged instruction.
III. An exception cannot be allowed to occur during execution of an RFE instruction.
(a) I only (b) II only
(c) I and II only (d) I, II and III only
(GATE 2008: 2 Marks)
Solution: RFE (Return from Exception) is a privileged trap instruction that is executed when returning from an exception handler, and no exception may be allowed to occur while it executes. All three statements hold, so option (d) is the correct option.
Ans. (d)
41. For a magnetic disk with concentric circular tracks, the seek latency is not linearly proportional to the seek distance due to
(a) non-uniform distribution of requests
(b) arm starting and stopping inertia
(c) higher capacity of tracks on the periphery of the platter
(d) use of unfair arm scheduling policies
(GATE 2008: 2 Marks)
Solution: Tracks on a magnetic disk are concentric, and a seek moves the arm from one track to another. The arm must accelerate from rest and then decelerate to a stop, so the time spent starting and stopping does not scale with the number of tracks crossed. Seek latency is therefore not linearly proportional to seek distance.
Ans. (b)

42. Which of the following are NOT true in a pipelined processor?
I. Bypassing can handle all RAW hazards.
II. Register renaming can eliminate all register carried WAR hazards.
III. Control hazard penalties can be eliminated by dynamic branch prediction.
(a) I and II only (b) I and III only
(c) II and III only (d) I, II and III
(GATE 2008: 2 Marks)
Solution: Statement I is not true: a load followed immediately by an instruction that uses the loaded value still needs a stall. Statement II is true. Statement III is not true: dynamic branch prediction reduces control hazard penalties but cannot eliminate them, because mispredictions still occur.
Ans. (b)

43. For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?
I. L1 must be a write-through cache
II. L2 must be a write-through cache
III. The associativity of L2 must be greater than that of L1
IV. The L2 cache must be at least as large as the L1 cache
(a) IV only (b) I and IV only
(c) I, II and IV only (d) I, II, III and IV
(GATE 2008: 2 Marks)
Solution: The L1 and L2 caches are placed between the CPU and main memory. Both can be write-through caches, but this is not necessary for inclusion.
Associativity does not matter.
The L2 cache must be at least as large as the L1 cache, since all the words in L1 are also in L2.
Ans. (a)
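The miss count in Questions 37 and 38 can be reproduced with a small simulation. This is a minimal sketch, assuming the array bytes are accessed in increasing address order on each pass; the function name `simulate_direct_mapped` is illustrative.

```python
def simulate_direct_mapped(start, size, passes, line_count=32, block_size=64):
    """Access bytes start..start+size-1 in order, `passes` times, on a direct-mapped cache.

    Returns (misses per pass, sorted lines whose contents were replaced in the last pass).
    """
    lines = [None] * line_count        # memory block currently held by each cache line
    misses_per_pass = []
    replaced = set()
    for _ in range(passes):
        misses = 0
        replaced = set()
        for address in range(start, start + size):
            block = address // block_size
            line = block % line_count
            if lines[line] != block:
                misses += 1
                if lines[line] is not None:
                    replaced.add(line)  # an older block is evicted from this line
                lines[line] = block
        misses_per_pass.append(misses)
    return misses_per_pass, sorted(replaced)


misses, replaced = simulate_direct_mapped(start=0x1100, size=50 * 50, passes=2)
print(misses, sum(misses))  # [40, 16] 56
print(replaced)             # [4, 5, 6, 7, 8, 9, 10, 11]
```

The second pass misses only on the sixteen blocks that compete for lines 4 to 11, which matches the 40 + 8 + 8 count.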


44. The use of multiple register windows with overlap causes a reduction in the number of memory accesses for
I. Function locals and parameters
II. Register saves and restores
III. Instruction fetches
(a) I only (b) II only (c) III only (d) I, II and III
(GATE 2008: 2 Marks)
Solution: Multiple register windows with overlap cause a reduction in the number of memory accesses for register saves and restores.
Ans. (b)

45. Consider a machine with a 2-way set-associative data cache of size 64 KB and block size 16 bytes. The cache is managed using 32 bit virtual addresses and the page size is 4 KB. A program to be run on this machine begins as follows:

double ARR [1024] [1024];
int i, j;
/* Initialize array ARR to 0.0 */
for(i=0; i<1024; i++)
for(j=0; j<1024; j++)
ARR [i] [j] =0.0;

The size of double is 8 bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR.
The total size of the tags in the cache directory is
(a) 32 Kbits (b) 34 Kbits
(c) 64 Kbits (d) 68 Kbits
(GATE 2008: 2 Marks)
Solution:
Virtual address = 32 bits
Cache size = 64 KB and block size = 16 bytes, so number of blocks = 64 KB/16 B = 4096
2-way set associative, so number of sets = 4096/2 = 2048 = 11 bits
Block size = 16 bytes = 4 bits

Tag bits   Set bits   Word bits
17         11         4

Tag size = 17 × 2 × 2048 = 69632 bits = 68 Kbits
Ans. (d)

46. Consider the data given in the above question. Which of the following array elements has the same cache index as ARR[0][0]?
(a) ARR[0][4] (b) ARR[4][0]
(c) ARR[0][5] (d) ARR[5][0]
(GATE 2008: 2 Marks)
Solution: The cache has 2048 sets of 16-byte blocks, so addresses that differ by a multiple of 2048 × 16 B = 32 KB map to the same set. Each row of ARR occupies 1024 × 8 B = 8 KB, so ARR[4][0] lies exactly 32 KB (4096 elements) after ARR[0][0] and has the same cache index.
Ans. (b)

47. The cache hit ratio for this initialization loop is
(a) 0%   (b) 25%   (c) 50%   (d) 75%
(GATE 2008: 2 Marks)
Solution: Each 16-byte block holds two 8-byte elements. In row major order, the first element of every block misses and the second hits, so the cache hit ratio is
1024/2048 = 1/2 = 50%
Ans. (c)

Linked Answer Questions 48 and 49: Delayed branching can help in the handling of control hazards.

48. For all delayed conditional branch instructions, irrespective of whether the condition evaluates to true or false:
(a) The instruction following the conditional branch instruction in memory is executed.
(b) The first instruction in the fall through path is executed.
(c) The first instruction in the taken path is executed.
(d) The branch takes longer to execute than any other instruction.
(GATE 2008: 2 Marks)
Solution: The instruction that follows the branch instruction in memory (the delay slot) is always executed, irrespective of whether the branch is taken or not.
Ans. (a)

49. The following code is to run on a pipelined processor with one branch delay slot:
I1: ADD R2 ← R7 + R8
I2: SUB R4 ← R5 − R6
I3: ADD R1 ← R2 + R3
I4: STORE Memory [R4] ← R1
BRANCH to Label if R1 == 0
Which of the instructions I1, I2, I3 or I4 can legitimately occupy the delay slot without any other program modification?
(a) I1     (b) I2     (c) I3     (d) I4
(GATE 2008: 2 Marks)
Solution: I1 produces R2, which I3 reads. I2 produces R4, which I4 reads. I3 produces R1, which both I4 and the branch read. None of these can be moved below the branch. I4 only stores R1 to memory and does not change any register used by the branch, so I4 can occupy the delay slot.
Ans. (d)
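The set index comparison in Question 46 and the hit ratio in Question 47 can be checked numerically. This is a minimal sketch, assuming LRU replacement within each set (the question does not name a policy) and simulating only the first few rows to keep the run short; all names are illustrative.

```python
BLOCK_SIZE = 16                             # bytes per cache block
WAYS = 2                                    # 2-way set associative
CACHE_SIZE = 64 * 1024                      # bytes
SETS = CACHE_SIZE // (BLOCK_SIZE * WAYS)    # 2048 sets, i.e. 11 index bits
BASE = 0xFF000 << 12                        # start of virtual page 0xFF000 with 4 KB pages
ELEMENT_SIZE = 8                            # sizeof(double)
COLUMNS = 1024


def address_of(i, j):
    """Virtual address of ARR[i][j] in row major order."""
    return BASE + (i * COLUMNS + j) * ELEMENT_SIZE


def set_index(address):
    return (address // BLOCK_SIZE) % SETS


def hit_ratio(rows):
    """Simulate the initialization loop over the first `rows` rows (assumed LRU)."""
    sets = [[] for _ in range(SETS)]        # tags per set, most recently used last
    hits = accesses = 0
    for i in range(rows):
        for j in range(COLUMNS):
            block = address_of(i, j) // BLOCK_SIZE
            index, tag = block % SETS, block // SETS
            ways = sets[index]
            accesses += 1
            if tag in ways:
                hits += 1
                ways.remove(tag)
            elif len(ways) == WAYS:
                ways.pop(0)                 # evict the least recently used tag
            ways.append(tag)
    return hits / accesses


same = [name for name, (i, j) in {"ARR[0][4]": (0, 4), "ARR[4][0]": (4, 0),
                                  "ARR[0][5]": (0, 5), "ARR[5][0]": (5, 0)}.items()
        if set_index(address_of(i, j)) == set_index(address_of(0, 0))]
print(same)          # ['ARR[4][0]']
print(hit_ratio(8))  # 0.5
```

Because every block is touched exactly twice in a row (one miss, then one hit), the ratio stays at 0.5 however many rows are simulated.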


50. How many 32K × 1 RAM chips are needed to provide a memory capacity of 256 K bytes?
(a) 8 (b) 32
(c) 64 (d) 128
Solution:
Number of chips required = (256K × 8)/(32K × 1) = 64
Ans. (c)

51. A CPU generally handles an interrupt by executing an interrupt service routine
(a) as soon as an interrupt is raised.
(b) by checking the interrupt register at the end of the fetch cycle.
(c) by checking the interrupt register after finishing the execution of the current instruction.
(d) by checking the interrupt register at fixed time intervals.
Solution: Interrupts are handled by checking the interrupt register after finishing the execution of the current instruction.
Ans. (c)

52. Consider a 4-stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

      S1  S2  S3  S4
I1    2   1   1   1
I2    1   3   2   2
I3    2   1   1   3
I4    1   2   2   2

What is the number of cycles needed to execute the following loop?
for (i=1 to 2) {I1; I2; I3; I4;}
(a) 16   (b) 23   (c) 28   (d) 30
(GATE 2009: 2 Marks)
Solution: The first iteration is scheduled as follows (a dash marks a cycle spent waiting for the next stage):

Clock  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
I1     S1 S1 S2 S3 S4
I2           S1 S2 S2 S2 S3 S3 S4 S4
I3              S1 S1 -  S2 -  S3 -  S4 S4 S4
I4                    S1 -  S2 S2 S3 S3 -  -  S4 S4

One iteration takes 15 clock cycles. The second iteration overlaps with the first: its I1 enters S1 in cycle 7, as soon as I4 of the first iteration has left S1, and its I4 leaves S4 at the end of cycle 23.
For two iterations = 23 clock cycles
Ans. (b)

53. Consider a 4-way set-associative cache (initially empty) with total 16 cache blocks. The main memory consists of 256 blocks and the request for memory blocks is in the following order:
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
Which one of the following memory blocks will NOT be in cache if LRU replacement policy is used?
(a) 3    (b) 8    (c) 129    (d) 216
(GATE 2009: 2 Marks)
Solution:
There are 16/4 = 4 sets, so the set is found by applying mod 4 to the block number.

Set 0    0, 4, 8, 216 → 48, 32, 8, 92
Set 1    1, 133, 129, 73
Set 2    155
Set 3    255, 3, 159, 63

In set 0, block 8 is referenced again before 48, 32 and 92 arrive, so these three replace 0, 4 and 216 in LRU order. 216 will not be there in cache.
Ans. (d)
Common Data Questions 54 and 55: A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and 1000 cylinders. The address of a sector is given as a triple <c, h, s>, where c is the cylinder number, h is the surface number and s is the sector number. Thus, the 0th sector is addressed as <0, 0, 0>, the 1st sector as <0, 0, 1>, and so on.
(GATE 2009: 2 Marks)
54. The address <400, 16, 29> corresponds to sector number:
(a) 505035 (b) 505036
(c) 505037 (d) 505038
Solution:
Total surfaces = 10 × 2 = 20
Address is: 400 × 20 × 63 + 16 × 63 + 29 = 505037
Ans. (c)
55. The address of 1039th sector is
(a) <0, 15, 31> (b) <0, 16, 30>
(c) <0, 16, 31> (d) <0, 17, 31>
Solution:
1039 = 16 × 63 + 31, so the 1039th sector is sector 31 on surface 16 of cylinder 0.
Ans. (c)
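The timing for Question 52 can be reproduced with a short simulation. A single pass of I1 to I4 takes 15 cycles; when the second iteration is allowed to enter the pipeline as soon as S1 is free, the whole loop completes in 23 cycles, which is option (b). This is a minimal sketch, assuming in-order issue, the stage times 2-1-1-3 for I3 that the 15-cycle chart implies, and that an instruction frees a stage as soon as its own cycles there are done; `loop_cycles` is an illustrative name.

```python
def loop_cycles(stage_cycles, iterations):
    """Cycles to run the instruction list `iterations` times through the pipeline.

    stage_cycles[i][s] is the number of cycles instruction i spends in stage s.
    """
    stage_count = len(stage_cycles[0])
    stage_free = [0] * stage_count    # cycle after which each stage is free again
    for _ in range(iterations):
        for durations in stage_cycles:
            done = 0                  # cycle at which this instruction left the previous stage
            for s in range(stage_count):
                start = max(done, stage_free[s])
                done = start + durations[s]
                stage_free[s] = done
    return stage_free[-1]


TABLE = [
    [2, 1, 1, 1],  # I1
    [1, 3, 2, 2],  # I2
    [2, 1, 1, 3],  # I3
    [1, 2, 2, 2],  # I4
]
print(loop_cycles(TABLE, 1))  # 15
print(loop_cycles(TABLE, 2))  # 23
```

Multiplying the single-iteration figure by two would give 30, which ignores the overlap between the tail of the first iteration and the head of the second.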


56. A main memory unit with a capacity of 4 megabytes is built using 1M × 1-bit DRAM chips. Each DRAM chip has 1 K rows of cells with 1K cells in each row. The time taken for a single refresh operation is 100 ns. The time required to perform one refresh operation on all the cells in the memory unit is
(a) 100 nanoseconds
(b) 100 × 2^10 nanoseconds
(c) 100 × 2^20 nanoseconds
(d) 3200 × 2^20 nanoseconds
(GATE 2010: 1 Mark)
Solution:
Main memory = 4 MB
Number of DRAM chips = 4 MB/(1M × 1 bit) = 32
Total cells = 32 × 1K × 1K
Time taken to refresh all the cells = 32 × 1K × 1K × 100 ns = 3200 × 2^20 ns
Ans. (d)
57. The weight of a sequence a_0, a_1, …, a_{n−1} of real numbers is defined as a_0 + a_1/2 + … + a_{n−1}/2^{n−1}. A subsequence of a sequence is obtained by deleting some elements from the sequence, keeping the order of the remaining elements the same. Let X denote the maximum possible weight of a subsequence of a_0, a_1, …, a_{n−1} and Y the maximum possible weight of a subsequence of a_1, a_2, …, a_{n−1}. Then X is equal to
(a) max(Y, a_0 + Y) (b) max(Y, a_0 + Y/2)
(c) max(Y, a_0 + 2Y) (d) a_0 + Y/2
(GATE 2010: 2 Marks)
Solution: The concept involved is dynamic programming.
Given that
X = maximum weight of a subsequence of (a_0, a_1, a_2, …, a_{n−1})
Y = maximum weight of a subsequence of (a_1, a_2, …, a_{n−1})
Here we have to find X in terms of Y. A maximum-weight subsequence of the full sequence either excludes a_0 or includes it.
If a_0 is excluded, the subsequence is drawn from (a_1, a_2, …, a_{n−1}), so X = Y.
If a_0 is included, it takes the first position with weight a_0 and every remaining element moves one position later, which halves its contribution, so X = a_0 + Y/2. For example, when every element is kept, Y = a_1 + a_2/2 + a_3/4 + … + a_{n−1}/2^{n−2} and a_0 + a_1/2 + a_2/4 + … + a_{n−1}/2^{n−1} = a_0 + Y/2.
Hence, we sum up as
X = max(Y, a_0 + Y/2)
Ans. (b)
58. A 5-stage pipelined processor has instruction fetch (IF), instruction decode (ID), operand fetch (OF), perform operation (PO) and write operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction and 6 clock cycles for DIV instruction, respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

Instruction          Meaning of Instruction
I0: MUL R2, R0, R1   R2 ← R0 × R1
I1: DIV R5, R3, R4   R5 ← R3/R4
I2: ADD R2, R5, R2   R2 ← R5 + R2
I3: SUB R5, R2, R6   R5 ← R2 − R6

(a) 13   (b) 15   (c) 17    (d) 19
(GATE 2010: 2 Marks)
Solution:

Clock  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
I0     IF ID OF PO PO PO WO
I1        IF ID OF -  -  PO PO PO PO PO PO WO
I2           IF ID OF -  -  -  -  -  -  -  PO WO
I3              IF ID OF -  -  -  -  -  -  -  PO WO

With operand forwarding, 15 clock cycles are needed.
Ans. (b)
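The recurrence in Question 57, X = max(Y, a_0 + Y/2), can be applied from the end of the sequence and compared against an exhaustive search. This is a minimal sketch, assuming the empty subsequence is allowed and has weight 0 (the question does not state this); the function names are illustrative.

```python
from itertools import combinations


def max_weight(sequence):
    """Maximum subsequence weight via X = max(Y, a0 + Y/2), scanning right to left."""
    best = 0.0                      # weight of the empty subsequence (assumption)
    for a in reversed(sequence):
        best = max(best, a + best / 2)
    return best


def brute_force(sequence):
    """Try every non-empty subsequence; combinations() preserves the original order."""
    best = 0.0
    for size in range(1, len(sequence) + 1):
        for picked in combinations(sequence, size):
            weight = sum(a / 2 ** k for k, a in enumerate(picked))
            best = max(best, weight)
    return best


print(max_weight([3, -4, 8, 2]))   # 9.0
print(brute_force([3, -4, 8, 2]))  # 9.0
```

The recurrence needs one pass over the sequence, whereas the exhaustive search grows exponentially with its length.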


59. The program below uses six temporary variables a, b, c, d, e, f.
a = 1
b = 10
c = 20
d = a + b
e = c + d
f = c + e
b = c + e
e = b + f
d = 5 + e
return d + f

Assuming that all operations take their operands from registers, what is the minimum number of registers needed to execute this program without spilling?
(a) 2 (b) 3
(c) 4 (d) 6
(GATE 2010: 2 Marks)
Solution: Let us take three registers R1, R2 and R3. The contents of the registers after each statement are:

Statement             R1        R2        R3
a = 1, b = 10, c = 20 a = 1     b = 10    c = 20
d = a + b             d = 11    b = 10    c = 20
e = c + d             d = 11    e = 31    c = 20
f = c + e             f = 51    e = 31    c = 20
b = c + e             f = 51    e = 31    b = 51
e = b + f             f = 51    e = 102   b = 51
d = 5 + e             f = 51    d = 107   b = 51

Each new value is placed in a register whose old value is no longer needed. All the operations will be completed using three registers only.
Ans. (b)
Common Data Questions 60 and 61: A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and main memory unit, respectively.
[Figure: L1 cache connected to L2 cache by a 4-word data bus; L2 cache connected to main memory by a 4-word data bus]
60. When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?
(a) 2 nanoseconds (b) 20 nanoseconds
(c) 22 nanoseconds (d) 88 nanoseconds
(GATE 2010: 2 Marks)
Solution: Access to L1 cache = 2 ns
Access to L2 cache = 20 ns
The block transferred to L1 is an L1 block of 4 words, which equals the data bus size of 4 words, so a single transfer is enough.
So, time taken for this transfer = 20 + 2 = 22 ns.
Ans. (c)
61. When there is a miss in both L1 cache and L2 cache, first a block is transferred from main memory to L2 cache, and then a block is transferred from L2 cache to L1 cache. What is the total time taken for these transfers?
(a) 222 nanoseconds (b) 888 nanoseconds
(c) 902 nanoseconds (d) 968 nanoseconds
(GATE 2010: 2 Marks)
Solution: Block size of L2 is 16 words and the data bus size is 4 words, so 4 transfers are needed from main memory to L2 cache.
Block transfer from main memory to L2 cache = 4 × (200 + 20) = 880 ns
L2 to L1 = 20 + 2 = 22 ns
Total = 880 + 22 = 902 ns
Ans. (c)
62. A computer handles several interrupt sources of which the following are relevant for this question.
• Interrupt from CPU temperature sensor (raises interrupt if CPU temperature is too high)
• Interrupt from Mouse (raises interrupt if the mouse is moved or a button is pressed)
• Interrupt from Keyboard (raises interrupt when a key is pressed or released)
• Interrupt from Hard Disk (raises interrupt when a disk read is completed)
Which one of these will be handled at the HIGHEST priority?
(a) Interrupt from Hard Disk
(b) Interrupt from Mouse
(c) Interrupt from Keyboard
(d) Interrupt from CPU temperature sensor
(GATE 2011: 1 Mark)
Solution: Interrupt from CPU temperature sensor will be handled at the highest priority.
Ans. (d)
63. Consider a hypothetical processor with an instruction of type LW R1, 20 (R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory?
(a) Immediate addressing
(b) Register addressing
(c) Register indirect scaled addressing
(d) Base indexed addressing
(GATE 2011: 1 Mark)
Solution: Effective address = contents of register R2 + 20.
Ans. (d)
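The register count in Question 59 can be cross-checked with a backward liveness scan: for straight-line code, the number of registers needed without spilling equals the largest number of variables that are live at the same time. This is a minimal sketch under that assumption; `PROGRAM` encodes each statement as (destination, sources) and both names are illustrative.

```python
# Each entry is (destination variable, variables read); constants are ignored.
PROGRAM = [
    ("a", []),
    ("b", []),
    ("c", []),
    ("d", ["a", "b"]),
    ("e", ["c", "d"]),
    ("f", ["c", "e"]),
    ("b", ["c", "e"]),
    ("e", ["b", "f"]),
    ("d", ["e"]),
    (None, ["d", "f"]),   # return d + f
]


def max_live(program):
    """Largest number of simultaneously live variables, scanning backwards."""
    live = set()
    peak = 0
    for dest, sources in reversed(program):
        live.discard(dest)      # the value is created here, so it is not live before
        live.update(sources)    # operands must be live just before the statement
        peak = max(peak, len(live))
    return peak


print(max_live(PROGRAM))  # 3
```

The peak of three is reached just before d = a + b (a, b, c live) and again just before b = c + e (c, e, f live).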


64. On a non-pipelined sequential processor, a program segment, which is a part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.

Initialize the address register
Initialize the count to 500
LOOP: Load a byte from device
Store in memory at address given by address register
Increment the address register
Decrement the count
If count != 0 go to LOOP

Assume that each statement in this program is equivalent to a machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute.
The designer of the system also has an alternate approach of using the DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory.
What is the approximate speed up when the DMA controller based design is used in place of the interrupt-driven program-based input output?
(a) 3.4 (b) 4.4
(c) 5.1 (d) 6.7
(GATE 2011: 2 Marks)
Solution: By using load store instructions: 1 + 1 + 500 × (2 + 2 + 1 + 1 + 1) = 2 + 500 × 7 = 3502
By using DMA: 20 + 500 × 2 = 1020
Speed up = 3502/1020 = 3.4
Ans. (a)

65. An 8 KB direct-mapped write-back cache is organized as multiple blocks, each of size 32 bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising of the following:
1 Valid bit
1 Modified bit
As many bits as the minimum needed to identify the memory block mapped in the cache.
What is the total size of the memory needed at the cache controller to store metadata (tags) for the cache?
(a) 4864 bits (b) 6144 bits
(c) 6656 bits (d) 5376 bits
(GATE 2011: 2 Marks)
Solution: Number of blocks = 8K/32 = 256 blocks

Tag bits   Block bits   Word bits
19         8            5

Memory size for tag bits = 19 + 2 = 21
Total size of memory for tags = 21 × 256 = 5376 bits
Ans. (d)

66. Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are given in the figure.
[Figure: four pipeline stages S1 (delay 5 ns), S2 (delay 6 ns), S3 (delay 11 ns) and S4 (delay 8 ns), each followed by a pipeline register (delay 1 ns)]
What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?
(a) 4.0   (b) 2.5   (c) 1.1   (d) 3.0
Solution:
Speed up = Time taken by non-pipelined/Time taken by pipelined = (5 + 6 + 11 + 8)/(11 + 1) = 2.5
Ans. (b)
67. The decimal value 0.5 in IEEE single-precision floating-point representation has
(a) Fraction bits of 000…000 and exponent value of 0
(b) Fraction bits of 000…000 and exponent value of −1
(c) Fraction bits of 100…000 and exponent value of 0
(d) No exact representation
(GATE 2012: 1 Mark)
Solution: Binary representation of 0.5 is 0.1000, which can be written as 1.000 × 2^−1.
Here, exponent = −1 and fraction bits are 000…000.
Ans. (b)
68. Register renaming is done in pipelined processors
(a) As an alternative to register allocation at compile time
(b) For efficient access to function parameters and local variables
(c) To handle certain kinds of hazards
(d) As part of address translation
(GATE 2012: 1 Mark)
Solution: Register renaming is done to handle WAR/WAW hazards.
Ans. (c)
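The field values in Question 67 can be read directly from the bit pattern of 0.5. This is a minimal sketch using Python's standard `struct` module; the helper name `decode_single` is illustrative, and the unbiased exponent it returns is only meaningful for normalized numbers.

```python
import struct


def decode_single(value):
    """Split an IEEE 754 single-precision value into (sign, unbiased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))  # reinterpret float32 as uint32
    sign = bits >> 31
    exponent_field = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF             # 23 fraction bits
    return sign, exponent_field - 127, fraction


print(decode_single(0.5))  # (0, -1, 0)
```

A fraction of 0 with exponent −1 corresponds to 1.000 × 2^−1, matching option (b).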


69. The amount of ROM needed to implement a 4-bit multiplier is
(a) 64 bits (b) 128 bits (c) 1 Kbits (d) 2 Kbits
Solution: Amount of ROM required
= 2^(2k) × 2k bits (where k = number of bits in each operand)
= 2^(2×4) × (2 × 4)
= 256 × 8 = 2 Kbits
Ans. (d)

70. A computer has a 256 KB, 4-way set-associative, write-back data cache with block size of 32 bytes. The processor sends 32-bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit.
The number of bits in the tag field of an address is
(a) 11   (b) 14 (c) 16   (d) 27
(GATE 2012: 2 Marks)
Solution:
Number of blocks = 256 KB/32 B = 2^13 blocks
Due to 4-way set associativity, number of sets = 2^13/2^2 = 2^11, so the set field is 11 bits
Block size = 32 bytes, so the offset field is 5 bits
Tag field = 32 − 11 − 5 = 16 bits
Ans. (c)
71. The size of the cache tag directory is
(a) 160 Kbits (b) 136 bits
(c) 40 Kbits (d) 32 bits
(GATE 2012: 2 Marks)
Solution: Each tag directory entry contains: 16 tag bits + 4 additional bits = 20 bits
Size of cache tag directory = 20 × 2^13 = 160 Kbits
Ans. (a)

72. In a k-way set associative cache, the cache is divided into v sets, each of which consists of k lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced before the lines in set (s+1). The main memory blocks are numbered 0 onwards. The main memory block numbered j must be mapped to any one of the cache lines from
(a) (j mod v) * k to (j mod v) * k + (k-1)
(b) (j mod v) to (j mod v) + (k-1)
(c) (j mod k) to (j mod k) + (v-1)
(d) (j mod k) * v to (j mod k) * v + (v-1)
(GATE 2013: 1 Mark)
Solution: Number of sets = v
Number of lines in each set = k (numbered 0 to k-1 within the set)
Memory block j maps to set (j mod v), and the lines of set s occupy positions s*k to s*k + (k-1).
Position will be (j mod v)*k to (j mod v)*k + (k-1)
Ans. (a)

73. Consider the following sequence of micro-operations.
MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR
Which one of the following is a possible operation performed by this sequence?
(GATE 2013: 2 Marks)
(a) Instruction fetch
(b) Operand fetch
(c) Conditional branch
(d) Initiation of interrupt service
Solution: The program counter value is stored in memory through MBR, and PC gets a new address Y. This indicates initiation of an interrupt service routine.
Ans. (d)

74. Consider a hard disk with 16 recording surfaces (0-15) having 16384 cylinders (0-16383) and each cylinder contains 64 sectors (0-63). Data storage capacity in each sector is 512 bytes. Data are organized cylinder-wise and the addressing format is <cylinder no., surface no., sector no.>. A file of size 42797 KB is stored in the disk and the starting disk location of the file is <1200, 9, 40>. What is the cylinder number of the last sector of the file, if it is stored in a contiguous manner?
(GATE 2013: 2 Marks)
(a) 1281 (b) 1282 (c) 1283 (d) 1284
Solution: Number of sectors required to store the file = (42797 × 1024)/512 = 85594 sectors
Number of sectors in a cylinder = 16 × 64 = 1024
The file starts at sector 9 × 64 + 40 = 616 of cylinder 1200, so 1024 − 616 = 408 sectors of the file fit in that cylinder.
Number of further cylinders required = ⌈(85594 − 408)/1024⌉ = ⌈85186/1024⌉ = 84
Last sector will be stored on cylinder 1200 + 84 = 1284.
Ans. (d)

75. Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, … I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is
(a) 132 (b) 165 (c) 176 (d) 328
(GATE 2013: 2 Marks)
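The address arithmetic in Question 74 can be verified by converting the starting address to a linear sector number, adding the file length, and converting back. This is a minimal sketch using the geometry given in the question; the function name `last_sector_address` and its parameter names are illustrative.

```python
def last_sector_address(start, file_bytes, surfaces=16, sectors_per_track=64,
                        sector_bytes=512):
    """Return <cylinder, surface, sector> of the last sector of a contiguous file."""
    cylinder, surface, sector = start
    sectors_needed = -(-file_bytes // sector_bytes)      # ceiling division
    per_cylinder = surfaces * sectors_per_track          # 1024 sectors per cylinder
    first = cylinder * per_cylinder + surface * sectors_per_track + sector
    last = first + sectors_needed - 1
    cylinder, rest = divmod(last, per_cylinder)
    surface, sector = divmod(rest, sectors_per_track)
    return cylinder, surface, sector


print(last_sector_address((1200, 9, 40), 42797 * 1024))  # (1284, 3, 1)
```

The cylinder component, 1284, is the quantity the question asks for.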


Solution: Clock pulse duration = Maximum stage delay + Buffer delay = 10 + 1 = 11 ns
Instructions I1 to I4 take (5 + 4 − 1) × 11 = 88 ns. Since the branch is taken, I5 to I8 are skipped and I9 to I12 take a further (5 + 4 − 1) × 11 = 88 ns, except that the fetch of I9 overlaps the last stage of I4 by one clock (11 ns).
Total time = 88 + 88 − 11 = 165 ns
Ans. (b)

Common Data Questions 76 and 77: The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have at most two source operands and one destination operand. Assume that all variables are dead after this code segment.
c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
y = a * a;
}
else {
d = d * d;
e = e * e;
}

76. Suppose the instruction set architecture of the processor has only two registers. The only allowed compiler optimization is code motion, which moves statements from one place to another while preserving correctness. What is the minimum number of spills to memory in the compiled code?
(a) 0 (b) 1
(c) 2 (d) 3
(GATE 2013: 2 Marks)
Solution: Suppose R1 and R2 hold a and b, respectively. Here for code optimization, code motion is allowed, so the statements that compute d and e are moved into the else branch. The machine code is:
R2 ← R1 + R2   (c = a + b)
Memory ← R2    (spill c to memory)
R2 ← R2 * R2   (x = c * c)
if (R2 > R1)
{
R2 ← R1 * R1 [y = a*a]
}
else
{
R2 ← c     [read value c from memory]
R2 ← R2 * R1 [d = c * a;]
R2 ← R2 * R2 [d = d * d;]
R2 ← c     [read value c from memory]
R2 ← R2 + R1 [e = c + a;]
R2 ← R2 * R2 [e = e * e;]
}
Only ‘c’ is a spilled variable here, which needs to be stored to and loaded from memory.
Ans. (b)

77. What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation.
(a) 3 (b) 4   (c) 5   (d) 6
(GATE 2013: 2 Marks)
Solution: Let R1 = a, R2 = b. The machine code is

c = a + b;      R2 ← R1 + R2
d = c * a;      R3 ← R2 * R1
e = c + a;      R4 ← R2 + R1
x = c * c;      R2 ← R2 * R2
if (x > a) {    R2 > R1
y = a * a;      R1 ← R1 * R1
}
else {
d = d * d;      R3 ← R3 * R3
e = e * e;      R4 ← R4 * R4
}

Four registers are required.
Ans. (b)
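The 165 ns figure in Question 75 can also be obtained by counting clock cycles. This is a minimal sketch, assuming the branch outcome is known at the end of the EI stage so that three fetch slots are lost after I4, and that only I1 to I4 and I9 to I12 complete; the function name and parameters are illustrative.

```python
def pipeline_time_ns(stage_delays, buffer_delay, executed, stall_cycles):
    """Total time for `executed` instructions on a linear pipeline with extra stall cycles."""
    clock = max(stage_delays) + buffer_delay                 # slowest stage plus buffer
    cycles = len(stage_delays) + executed - 1 + stall_cycles
    return clock * cycles


# 8 instructions complete (I1..I4 and I9..I12); 3 stall cycles follow the taken branch I4.
print(pipeline_time_ns([5, 7, 10, 8, 6], 1, executed=8, stall_cycles=3))  # 165
```

With no stalls the same eight instructions would need 12 cycles (132 ns), so the taken branch costs three extra cycles under this assumption.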

PRACTICE EXERCISES

Set 1

1. The three different kinds of buses supporting the architecture of a computer are
(a) Address bus, data bus, I/O bus
(b) Front bus, data bus, address bus
(c) Control bus, data bus, address bus
(d) I/O bus, address bus, instruction bus

2. The term ‘WORD’ of a CPU equals
(a) The maximum addressable memory size
(b) The width of a CPU register (integer or floating point)
(c) The width of the address bus
(d) The total number of general purpose CPU registers

3. Which of the following statements is false about CISC architecture?
(a) CISC machine instructions may include complex addressing modes, which require many clock cycles to carry out.
(b) CISC control units are typically micro-programmed, allowing the instruction set to be more flexible.


(c) In the CISC instruction set, all arithmetic/logic instructions must be register based for fast processing.
(d) CISC architectures may perform better in network centric applications than RISC.

4. The register that holds the address of the location to or from which data are to be transferred is called
(a) Index register
(b) Accumulator
(c) Memory address register
(d) Memory data register

5. Which one of the following is not a type of I/O channel?
(a) Multiplexer (b) Selector
(c) Block multiplexer (d) None of the above

6. The performance of a pipelined processor is degraded if
(a) the pipeline stages have different delays
(b) consecutive instructions are to be executed serially
(c) the pipeline stages share hardware resources
(d) all of the above

7. The minimum time delay between the initiation of two independent memory operations is called
(a) Access time (b) Cycle time
(c) Rotational time (d) Latency time

8. The register which keeps track of the execution of a program and which contains the memory address of the instruction currently being executed is known as
(a) index register
(b) memory address register
(c) program counter
(d) instruction register

9. For interval arithmetic, the best rounding technique used is
(a) rounding to plus and minus infinity
(b) rounding to zero
(c) rounding to nearest zero
(d) rounding to the next number

10. Hardwired control units are faster than micro-programmed control units because
(a) they do not consist of slower memory elements.
(b) they do not have slower elements such as gates, flip flops and registers.
(c) they consist of elements based on VLSI design technology.
(d) they contain high-speed digital components.

11. The most relevant addressing mode to write position-independent code is
(a) direct mode (b) auto mode
(c) relative mode (d) indexed mode

12. A CPU uses 24-bit instructions. A program starts at address 300 (in decimal). Which one of the following is a legal program counter content (all values in decimal)?
(a) 324 (b) 512
(c) 600 (d) 700

13. An attempt to access a location not owned by a program is called
(a) data fault (b) address fault
(c) instruction fault (d) page fault

14. Which of the following statements about relative addressing mode is FALSE?
(a) It enables reduced program code
(b) It allows indexing of array elements with the same instruction
(c) It enables easy relocation of data
(d) It enables faster address calculation than absolute addressing

15. Compared to CISC processors, RISC processors contain
(a) more registers and smaller instruction set
(b) larger instruction set and fewer registers
(c) fewer registers and smaller instruction set
(d) more registers and larger instruction set

16. Micro-programmed control cannot be implemented in RISC architecture because
(a) it tends to slow down the processor.
(b) it consumes more chip area and a large instruction set.
(c) handling a large number of registers is impossible in a micro-programmed system.
(d) the 1 instruction/cycle timing requirement for RISC is difficult to achieve in a micro-programmed architecture.

17. Relocation of the code is easier in ________ irrespective of the program code
(a) indirect addressing
(b) indexed addressing
(c) base register addressing
(d) absolute addressing

18. In inverted page table organization, the size of the page table depends on
(a) the number of processes
(b) the size of page
(c) the size of main memory
(d) the number of frames in the main memory


19. When using the concept of locality of reference, the page reference being made by a process
(a) will always be to the page used in the previous page reference
(b) is likely to be one of the pages used in the past few page references
(c) will always be to one of the pages existing in the main memory
(d) will always lead to page fault

20. If the new version of a processor is not made compatible with programs written for its older version, it could be able to process at a faster speed
(a) the statement is true.
(b) the statement is false.
(c) the speed cannot be predicted.
(d) speed has nothing to do with the compatibility.

21. A certain snooping cache can snoop only on address line. Which of the following is true?
(a) This would adversely affect the system if the write-through protocol is used.
(b) It would run well if the write-through protocol is used.
(c) Data snooping is mandatory to be implemented on data line.
(d) Data snooping may not be required.

22. When the frequency of the input signal to a CMOS gate is increased, the average power dissipation
(a) decreases exponentially
(b) increases
(c) decreases
(d) increases exponentially

23. The disadvantage of hardwired control units with flip flops is
(a) design becomes complex
(b) it requires more flip flops
(c) control circuit speed does not match with flip flops
(d) flip-flops can handle the data unit, not the control unit

24. In a vectored interrupt,
(a) the branch address is assigned to a fixed location in memory.
(b) the interrupting source supplies the branch information to the processor through an interrupt vector.
(c) the branch address is obtained from a register in the processor.
(d) the branch address is obtained from program counter.

25. A device employing INTA line for device interrupt puts the CALL instruction on the data bus while
(a) INTA is active. (b) HOLD is active.
(c) READY is active. (d) READY is active.

26. On receiving an interrupt from an I/O device, the CPU
(a) branches off to halt (or wait) for a predetermined time
(b) branches off to the interrupt service routine after completion of the current instruction
(c) branches off to the interrupt service routine immediately
(d) hands over control of address bus and data bus to the interrupting device

27. Using large block size in a fixed block size file system leads to
(a) better disk throughput but poorer disk space utilization
(b) better disk throughput and better disk space utilization
(c) it does not matter as the total memory size is same
(d) poorer disk throughput but better disk space utilization

28. Which of the following statements are true about paging?
(a) It divides memory into units of equal size.
(b) It permits implementation of virtual memory.
(c) It suffers from internal fragmentation.
(d) It suffers from external fragmentation.

29. The number of entries in an inverted page table
(a) is equal to the number of processes.
(b) is equal to the number of page frames in the main memory.
(c) is equal to the size of the page frame.
(d) is equal to the number of page frames in cache memory.

30. In a virtual memory system, the addresses used by the programmer belong to
(a) memory space (b) physical space
(c) address space (d) main memory space

31. Power consumption of processors can be vastly reduced by making use of ________ based transistors to implement the ICs.
(a) NMOS only
(b) TTL Schottky and PMOS
(c) PMOS only
(d) NMOS and PMOS


32. Address symbol table is generated by the
(a) memory management software
(b) assembler
(c) table match of associative memory
(d) generated by CPU

33. How many 128 × 8 RAM chips are needed to have a total RAM of 2048 bytes?
(a) 8    (b) 16    (c) 24    (d) 32

34. In 8085 microprocessor, how many I/O devices can be interfaced in I/O mapped I/O technique?
(a) Either 256 input devices or 256 output devices
(b) 8 I/O devices
(c) 256 input devices and 256 output devices
(d) 512 input-output devices

35. After reset, the CPU starts the execution of instruction from memory address
(a) 1111H    (b) 8000H
(c) 0000H    (d) FFFFH

36. In a microprocessor system, suppose the TRAP, HOLD and RESET pins got activated at the same time, while the processor was executing some instructions. The system will
(a) execute the TRAP instruction
(b) execute the HOLD instruction
(c) execute the RESET instruction
(d) none of these instructions will be executed

37. In 8085 microprocessor, the programmer cannot access which flag directly?
(a) Sign flag               (b) Carry flag
(c) Auxiliary carry flag    (d) Parity flag

38. Which of the following is a pseudo-instruction for 8085?
(a) SPHL    (b) CMP
(c) NOP     (d) END

39. The term "cycle stealing" refers to:
(a) Interrupt-based data transfer
(b) DMA-based data transfer
(c) Polling mode data transfer
(d) Clock cycle overriding

40. Which of the following architectures is not suitable for the SIMD architecture?
(a) Vector processor    (b) PLA-based processor
(c) Von Neumann         (d) PAL-based processor

41. In a multiprogramming system, which of the following concepts is used?
(a) Data parallelism    (b) Paging
(c) L1 cache            (d) DMA

42. PAL circuit can be defined as
(a) fixed OR and programmable AND logic.
(b) programmable OR and programmable AND logic.
(c) fixed AND and programmable OR logic.
(d) fixed OR and fixed AND logic.

43. If the clock input applied to a cascaded Mod-6 and Mod-4 counter is 48 kHz, then the output of the cascaded arrangement shall be of:
(a) 4.8 kHz    (b) 12 kHz
(c) 8 kHz      (d) 48 kHz

44. If there are four ROM ICs of 8K and two RAM ICs of 4K words, then the address range of the first RAM is (assume initial addresses correspond to ROMs)
(a) (8000)H to (9FFF)H    (b) (5000)H to (7FFF)H
(c) (8000)H to (8FFF)H    (d) (5000)H to (9FFF)H

45. The method for updating the main memory as soon as a word is removed from the cache is called
(a) Write-through    (b) Write-back
(c) Write-save       (d) Cache-save

Set 2

1. The most appropriate matching for the following pairs
X. Indirect addressing          1. Loops
Y. Immediate addressing         2. Pointers
Z. Auto-decrement addressing    3. Constants
(a) X - 3 Y - 2 Z - 1    (b) X - 1 Y - 3 Z - 2
(c) X - 2 Y - 3 Z - 1    (d) X - 3 Y - 1 Z - 2

2. Which of the following is not a form of memory?
(a) Instruction cache
(b) Instruction register
(c) Instruction opcode
(d) Translation look aside buffer

3. In serial data transmission, every byte of data is padded with a '0' in the beginning and one or two '1's at the end of the byte because
(a) Receiver is to be synchronized for byte reception.
(b) Receiver recovers lost '0' and '1' from these padded bits.
(c) Padded bits are useful in parity computation.
(d) None of these.


4. In 2's complement addition, overflow
(a) is flagged whenever there is carry from sign bit addition
(b) cannot occur when a positive value is added to a negative value
(c) is flagged when the carries from sign bit and previous bit match
(d) none of the above

5. The performance of a pipelined processor suffers if
(a) the pipeline stages have different delays
(b) consecutive instructions are dependent on each other
(c) the pipeline stages share hardware resources
(d) all of the above

6. Match the pairs in the following question by writing the corresponding letters only
(A) IEEE 488    (P) Specifies the interface for connecting a single device
(B) IEEE 796    (Q) Specifies the bus standard for connecting a computer to other devices including CPUs
(C) IEEE 696    (R) Specifies the standard for an instrumentation bus
(D) RS232-C     (S) Specifies the bus standard for the "back plane" bus called multibus
(a) (A) - (P); (B) - (R); (C) - (S); (D) - (Q)
(b) (A) - (P); (B) - (Q); (C) - (R); (D) - (S)
(c) (A) - (Q); (B) - (S); (C) - (R); (D) - (P)
(d) (A) - (R); (B) - (S); (C) - (P); (D) - (R)

7. When an interrupt occurs, an operating system
(a) Ignores the interrupt
(b) Always changes state of interrupted process after processing the interrupt
(c) Always resumes execution of interrupted process after processing the interrupt
(d) May change state of interrupted process to 'blocked' and schedule another process.

8. RAID configurations of disks are used to provide
(a) Fault-tolerance      (b) High speed
(c) High data density    (d) None of the above

9. Arrange the following configurations for CPU in decreasing order of operating speeds: hardwired control, vertical microprogramming, horizontal microprogramming.
(a) Hardwired control, vertical microprogramming, horizontal microprogramming.
(b) Hardwired control, horizontal microprogramming, vertical microprogramming.
(c) Horizontal microprogramming, vertical microprogramming, hardwired control.
(d) Vertical microprogramming, horizontal microprogramming, hardwired control.

10. A processor needs software interrupt to
(a) test the interrupt system of the processor
(b) implement co-routines
(c) obtain system services which need execution of privileged instructions
(d) return from subroutine

11. Which is the most appropriate match for the items in the first list with the items in the second list?
List-I                          List-II
X. Indirect addressing          I. Array implementation
Y. Index addressing             II. Writing relocatable code
Z. Base register addressing     III. Passing array as parameter
(a) (X, III) (Y, I) (Z, II)    (b) (X, II) (Y, III) (Z, I)
(c) (X, III) (Y, II) (Z, I)    (d) (X, I) (Y, III) (Z, II)

12. Consider the following data path of a simple non-pipelined CPU. The registers A, B, A1, A2, MDR, the bus and the ALU are 8-bit wide. SP and MAR are 16-bit registers. The MUX is of size 8 × (2:1) and the DEMUX is of size 8 × (1:2). Each memory operation takes 2 CPU clock cycles and uses MAR (memory address register) and MDR (memory data register). SP can be decremented locally.

[Figure: data path showing registers A, B, A1 and A2, the MUX, the DEMUX, SP with its local decrement (dcr), MAR and MDR]

The CPU instruction "push r", where r = A or B, has the specification
M[SP] ← r
SP ← SP − 1
How many CPU clock cycles are needed to execute the "push r" instruction?
(a) 2    (b) 3    (c) 4    (d) 5

13. In the absolute addressing mode,
(a) the operand is inside the instruction.
(b) the address of the operand is inside the instruction.


(c) the register containing the address of the operand is specified inside the instruction.
(d) the location of the operand is implicit.

14. What are the states of the auxiliary carry (AC) and carry flag (CY) after executing the following 8085 program?
    MVI H, 5DH
    MVI L, 6BH
    MOV A, H
    ADD L
(a) AC = 0, CY = 0    (b) AC = 1, CY = 1
(c) AC = 1, CY = 0    (d) AC = 0, CY = 1

15. Horizontal microprogramming
(a) does not require use of signal decoders
(b) results in larger-sized microinstructions than vertical microprogramming
(c) uses 1 bit for each control signal
(d) all of the above

16. For the daisy chain scheme of connecting I/O devices, which of the following statements is true?
(a) It gives non-uniform priority to various devices.
(b) It gives uniform priority to all devices.
(c) It is only useful for connecting slow devices to a processor device.
(d) It requires a separate interrupt pin on the processor for each device.

17. A microprogram control unit is required to generate a total of 25 control signals. Assume that during any microinstruction at most two control signals are active. Minimum number of bits required in the control word to generate the required control signal is ______.

18. The correct matching for the following pairs is
A. DMA I/O                    1. High-speed RAM
B. Cache                      2. Disk
C. Interrupt I/O              3. Printer
D. Condition code register    4. ALU
(a) A - 4, B - 3, C - 1, D - 2
(b) A - 2, B - 1, C - 3, D - 4
(c) A - 4, B - 3, C - 2, D - 1
(d) A - 2, B - 3, C - 4, D - 1

19. I/O redirection
(a) implies changing the name of a file
(b) can be employed to use an existing file as input file for a program
(c) implies connecting two programs through a pipe
(d) none of the above

20. The main difference(s) between a CISC and a RISC processor is/are that a RISC processor typically
(a) has fewer instructions
(b) has fewer addressing modes and more registers
(c) is easier to implement using hardwired control logic
(d) all of these

21. How big is a 4-way set-associative cache made up of 16-bit words, 4 words per line and having 1024 sets?
(a) 32 KB     (b) 64 KB
(c) 128 KB    (d) 256 KB

22. Consider the expression to be evaluated as X = (M + N × O)/(P × Q). How many three-address instructions are required to evaluate the above given expression?
(a) 3    (b) 4    (c) 5    (d) 6

23. In X = (M + N × O)/(P × Q), how many one-address instructions are required to evaluate it?
(a) 4    (b) 6    (c) 8    (d) 10

24. A decimal number has 64 digits. The number of bits needed for its equivalent binary representation is
(a) 200    (b) 213    (c) 246    (d) 277

25. Determine the speed up obtained from pipelining if the latencies for each stage in a single cycle processor are given as:
IF       ID       ALU      MEM      WB
45 ns    20 ns    52 ns    44 ns    18 ns

ANSWERS TO PRACTICE EXERCISES

Set 1

1. (c)     2. (b)     3. (c)     4. (c)     5. (d)
6. (d)     7. (b)     8. (c)     9. (a)     10. (a)
11. (c)    12. (c)    13. (b)    14. (d)    15. (a)
16. (a)    17. (c)    18. (d)    19. (b)    20. (a)
21. (a)    22. (b)    23. (b)    24. (a)    25. (a)
26. (b)    27. (a)    28. (c)    29. (b)    30. (c)
31. (d)    32. (b)    33. (b)    34. (c)    35. (c)
36. (d)    37. (c)    38. (d)    39. (b)    40. (c)
41. (b)    42. (a)    43. (a)    44. (c)    45. (b)

Set 2

1. (c) Pointers are used for accessing indirect addresses. In immediate addressing, the constant operand is held in the instruction itself. Auto-decrement/increment is used for loop variables.
Indirect addressing mode → Pointer access
Immediate addressing mode → Constant access
Auto-decrement addressing mode → Loops

2. (c) An opcode is the portion of a machine language instruction that specifies the operation to be performed.

3. (a) The bits at the beginning and at the end of the byte are known as start and stop bits, respectively. Start and stop bits are used for synchronization purposes.

4. (b) When two 2's complement numbers are added, overflow occurs in the following situations:
Both operands are positive and the result is negative.
Both operands are negative and the result is positive.
So option (b) is true: overflow does not occur when positive and negative numbers are added.

5. (d) Pipeline performance suffers in all three cases:
Stages with different delays force the clock period to match the slowest stage, so the faster stages sit idle.
Consecutive instructions that depend on each other create data dependencies (data hazards).
Stages that share hardware resources create structural dependencies (structural hazards).

6. (c) (A) IEEE 488 - (Q)
(B) IEEE 796 - (S)
(C) IEEE 696 - (R)
(D) RS232-C - (P)

7. (c) When an interrupt occurs, the execution of the current program is suspended. After handling the interrupt, the program resumes its execution.

8. (a) RAID is a redundant array of independent disks that combines multiple disk drive components into a logical unit. RAID configuration provides fault-tolerance and high speed.

9. (b) Horizontal microprogramming has higher parallelism than vertical microprogramming. So, the speed order is:
Hardwired control > horizontal microprogramming > vertical microprogramming

10. (c) A processor needs software interrupts to obtain system services which need execution of privileged instructions.

11. (a) An array is passed as a parameter through a pointer to it, arrays are implemented with index addressing, and base register addressing makes code relocatable.
Indirect addressing → Passing array as parameter
Index addressing → Array implementation
Base register addressing → Writing relocatable code

12. (a) The memory operation of "push r" needs 2 clock cycles.

13. (b) In absolute addressing mode, the address of the operand is inside the instruction.

14. (c)
Carries out of bits 7 to 0:    0 1 1 1 1 1 1 1
(5D)H   =     (0 1 0 1 1 1 0 1)B
+ (6B)H = +   (0 1 1 0 1 0 1 1)B
(C8)H   =     (1 1 0 0 1 0 0 0)B
The carry out of bit 7 is 0 and the carry out of bit 3 is 1, so AC = 1 and CY = 0.

15. (d) Features of horizontal microprogramming are
(i) It does not require use of signal decoders
(ii) It results in larger-sized microinstructions than vertical microprogramming
(iii) It uses 1 bit for each control signal

16. (a) The daisy chaining method of establishing priority consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices up to the device with the lowest priority, which is placed last in the chain. The farther the device is from the first position, the lower is its priority. Therefore, daisy chain gives non-uniform priority to various devices.
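The flag values in Answer 14 can be checked with a short sketch. It assumes only the standard 8085 definitions (CY is the carry out of bit 7, AC is the carry out of bit 3); the function name `add8_flags` is illustrative and not part of any 8085 toolchain.

```python
def add8_flags(a: int, b: int) -> tuple[int, int, int]:
    """Add two 8-bit values and return (result, AC, CY) as an 8-bit ADD would set them."""
    total = a + b
    result = total & 0xFF                              # keep only the low 8 bits
    cy = 1 if total > 0xFF else 0                      # carry out of bit 7
    ac = 1 if (a & 0x0F) + (b & 0x0F) > 0x0F else 0    # carry out of bit 3
    return result, ac, cy


result, ac, cy = add8_flags(0x5D, 0x6B)
print(f"{result:02X}H AC={ac} CY={cy}")  # C8H AC=1 CY=0
```

The low nibbles D and B sum to more than 15, which sets AC, while the full sum C8H still fits in 8 bits, which leaves CY clear.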

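The overflow rule in Answer 4 can be illustrated with a small sketch. It assumes an 8-bit word by default and uses the usual hardware test: overflow is flagged when the carry into the sign bit differs from the carry out of it. The function name `add_twos_complement` is hypothetical.

```python
def add_twos_complement(a: int, b: int, bits: int = 8) -> tuple[int, bool]:
    """Add two signed values in a fixed width; return (wrapped result, overflow flag)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    ua, ub = a & mask, b & mask                  # raw bit patterns of the operands
    carry_in = ((ua & (sign - 1)) + (ub & (sign - 1))) >> (bits - 1)  # carry into sign bit
    carry_out = (ua + ub) >> bits                # carry out of sign bit
    raw = (ua + ub) & mask
    result = raw - (1 << bits) if raw & sign else raw  # reinterpret as signed
    return result, carry_in != carry_out


print(add_twos_complement(100, 50))    # two positives: overflow
print(add_twos_complement(-100, -50))  # two negatives: overflow
print(add_twos_complement(100, -50))   # mixed signs: never overflows
```

With mixed signs the two carries are always equal, which is why option (b) of Question 4 holds.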



17. (10) To identify one of 25 control signals, 5 bits are required ($2^5 = 32 \geq 25$). Since at most two control signals are active in a microinstruction, the following scheme is used: 5 bits identify the first signal and 5 bits identify the second, including the case when one of them is not present. So total bits required = 10.

18. (b) DMA I/O - Disk
Cache - High-speed RAM
Interrupt I/O - Printer
Condition code register - ALU

19. (c) I/O redirection implies connecting two programs through a pipe.

20. (d) The major characteristics of a RISC processor are
(i) Relatively few instructions
(ii) Relatively few addressing modes
(iii) More registers
(iv) Hardwired rather than microprogrammed control

21. (a) Number of bytes per line = 16 bits × 4 words = 64 bits = 8 bytes
Cache size = 8 bytes × 4 ways × 1024 sets = 32 KB

22. (b) The three-address instructions are as follows:
MUL R1, N, O
MUL R2, P, Q
ADD R3, M, R1
DIV X, R3, R2

23. (c) The one-address instructions are as follows:
LOAD P     (AC ← P)
MPY Q      (AC ← AC × Q)
STORE X    (X ← AC)
LOAD N     (AC ← N)
MPY O      (AC ← AC × O)
ADD M      (AC ← AC + M)
DIV X      (AC ← AC/X)
STORE X    (X ← AC)

24. (b) The largest 64-digit decimal number is $10^{64} - 1$. If $x$ bits are needed, then
$10^{64} - 1 = 2^{x} - 1 \Rightarrow 10^{64} = 2^{x} \Rightarrow x = \log_2 10^{64} = 64 \log_2 10 \approx 212.6$,
so 213 bits are needed.

25. (3.44) The pipelined clock period is set by the slowest stage (52 ns), so the speed up obtained by pipelining is
$\frac{45 + 20 + 52 + 44 + 18}{52} = \frac{179}{52} \approx 3.44$
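The speed up in Answer 25 can be reproduced with a few lines. This is a sketch under idealized assumptions that the exercise does not state explicitly: no pipeline register overhead, no stalls, and a long run of instructions, so that one instruction completes per pipelined clock. The names `stage_ns` and `pipeline_speedup` are illustrative.

```python
# Stage latencies from Question 25, in nanoseconds.
stage_ns = {"IF": 45, "ID": 20, "ALU": 52, "MEM": 44, "WB": 18}


def pipeline_speedup(latencies: list[int]) -> float:
    """Ideal speed up of a pipeline over a single-cycle design."""
    single_cycle_time = sum(latencies)  # one long cycle covers every stage
    pipelined_cycle = max(latencies)    # clock period set by the slowest stage
    return single_cycle_time / pipelined_cycle


print(round(pipeline_speedup(list(stage_ns.values())), 2))  # 3.44
```

If the stages were perfectly balanced, the same function would return the number of stages, which is the upper bound on the speed up.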

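The arithmetic in Answers 21 and 24 can be checked in the same way. The helper names below are hypothetical, and the cache figure counts data storage only (tag and valid bits are ignored, as in the answer).

```python
def bits_for_decimal_digits(digits: int) -> int:
    """Bits needed to represent the largest decimal number with the given digit count."""
    largest = 10**digits - 1
    return largest.bit_length()


def cache_data_bytes(word_bits: int, words_per_line: int, ways: int, sets: int) -> int:
    """Data capacity of a set-associative cache in bytes (tags not counted)."""
    line_bytes = word_bits * words_per_line // 8
    return line_bytes * ways * sets


print(bits_for_decimal_digits(64))                  # 213
print(cache_data_bytes(16, 4, 4, 1024) // 1024)     # 32 (KB)
```

Python integers have arbitrary precision, so `bit_length` gives the exact count without relying on a floating-point logarithm.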
