Two-Marks Material Computer Architecture
Prepared by: Ms. R.P. Narmadha, AP/IT
Reviewed by:
Dr NGP IT CS 8491 Computer Architecture Dept of IT
SYLLABUS
UNIT IV PARALLELISM 9
Parallel processing challenges – Flynn's classification – SISD, MIMD, SIMD, SPMD, and Vector
Architectures - Hardware multithreading – Multi-core processors and other Shared Memory Multiprocessors
-Introduction to Graphics Processing Units, Clusters, Warehouse Scale Computers and other Message-
Passing Multiprocessors.
TEXT BOOK:
1. David A. Patterson and John L. Hennessy, Computer Organization and Design: The
Hardware/Software Interface, Fifth Edition, Morgan Kaufmann / Elsevier, 2014.
2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian, Computer Organization and
Embedded Systems, Sixth Edition, Tata McGraw Hill, 2012.
REFERENCES:
1. William Stallings “Computer Organization and Architecture” , Seventh Edition , Pearson
Education, 2006.
2. John P. Hayes, “Computer Architecture and Organization”, Third Edition, Tata Mc Graw Hill,
1998.
3. John L. Hennessy and David A. Patterson, "Computer Architecture – A Quantitative Approach",
Morgan Kaufmann / Elsevier Publishers, Fifth Edition, 2012.
4. http://nptel.ac.in/.
UNIT 1
BASIC STRUCTURE OF A COMPUTER SYSTEM
Main memory vs. secondary memory:
- Main memory is closely connected to the processor; secondary memory is connected to main memory through the bus and a controller.
- In main memory, stored data are quickly and easily changed; in secondary memory, stored data are easily changed but changes are slow compared to main memory.
- Main memory holds the programs and data that the processor is actively working with; secondary memory is used for long-term storage of programs and data.
- Main memory interacts with the processor millions of times per second; before data and programs in secondary memory can be used, they must be copied from secondary memory into main memory.
8. Identify the advantages of network computers.
Networked computers have several major advantages:
Communication: Information is exchanged between computers at high speeds.
Resource sharing: Rather than each computer having its own I/O devices, computers
on the network can share I/O devices.
Nonlocal access: By connecting computers over long distances, users need not be near
the computer they are using.
12. Express the Execution Time. (Nov/Dec 2016) (Nov/Dec 2015)
Execution time is defined as the reciprocal of the performance of the computer system. It is
related by:
Execution time = 1 / Performance
CPU time means the time the CPU is computing, not including the time waiting for I/O or
running other programs. It can be further divided into the CPU time spent in the program, called user
CPU time, and the CPU time spent in the operating system performing tasks requested by the
program, called system CPU time.
14. List out the types of programs to evaluate the performance.
There are four levels of programs. They are:
- Real Programs
- Kernels
- Toy benchmarks
- Synthetic benchmarks
18. How would you formulate the speedup?
Speedup is the ratio:
Speedup = Performance with enhancement / Performance without enhancement
        = Execution time without enhancement / Execution time with enhancement
Speedup tells us how much faster a task will run using the machine with the enhancement as
opposed to the original machine.
20. Write the formula for CPU execution time for a program.
CPU execution time = Instruction count × CPI × Clock cycle time
                   = (Instruction count × CPI) / Clock rate
21. If computer A runs a program in 10 seconds, and computer B runs the same program in 15
seconds, how much faster is A over B?
Performance A / Performance B = Execution time B / Execution time A = 15 / 10 = 1.5,
so A is 1.5 times faster than B.
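The relative-performance calculation in Question 21 can be sketched in Python (an illustrative helper, not from the textbook):

```python
def speedup(time_slow, time_fast):
    # performance is the reciprocal of execution time, so the
    # performance ratio is the inverse of the execution-time ratio
    return time_slow / time_fast

print(speedup(15, 10))  # → 1.5, so A is 1.5 times faster than B
```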
The MIPS R-format instruction fields are: op | rs | rt | rd | shamt | funct
28. Show the addressing modes and its various types. (Nov/Dec 2017)
The different ways in which the location of an operand is specified in an instruction are referred to
as addressing modes.
The MIPS addressing modes are the following:
1. Immediate addressing
2. Register addressing
3. Base or displacement addressing
4. PC-relative addressing
5. Pseudo direct addressing
40. State the need for indirect addressing mode. (Apr/May 2017)
With direct addressing, the length of the address field is usually less than the word length,
thus limiting the address range. One solution is to have the address field refer to the
address of a word in memory, which in turn contains a full-length address of the operand.
This is known as indirect addressing.
45. Suppose that we are considering an enhancement to the processor of a server system used for Web
serving. The new CPU is 10 times faster on computation in the Web serving application than the
original processor. Assuming that the original CPU is busy with computation 40% of the time and is
waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the
enhancement? (April/May 2019)
Overall speedup = 1 / ((1 − 0.4) + 0.4/10) = 1 / 0.64 ≈ 1.56
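One way to check the Amdahl's Law arithmetic for Question 45 is a small Python helper (the function name is illustrative):

```python
def overall_speedup(fraction_enhanced, speedup_factor):
    # Amdahl's Law: the unenhanced fraction (here, I/O wait) limits the gain
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_factor)

print(overall_speedup(0.4, 10))  # about 1.56x overall
```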
UNIT – II
The ALU has two input registers, named A and B, and one output storage register, named C.
It performs the operation:
C = A op B
The input data are stored in A and B; according to the operation specified on the control lines, the
ALU performs the operation and puts the result in register C.
3. Add 6₁₀ to 7₁₀ in binary and subtract 6₁₀ from 7₁₀ in binary.
0110₂ + 0111₂ = 1101₂ = 13₁₀
0111₂ − 0110₂ = 0001₂ = 1₁₀
4. Write the overflow conditions for addition and subtraction. (APR/MAY 2015) (Nov/Dec 2016)
(Nov/Dec 2015)
The overflow conditions for addition and subtraction are:
Operation | Operand A | Operand B | Result indicating overflow
A + B     | ≥ 0       | ≥ 0       | < 0
A + B     | < 0       | < 0       | ≥ 0
A − B     | ≥ 0       | < 0       | < 0
A − B     | < 0       | ≥ 0       | ≥ 0
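These conditions can also be checked mechanically; the sketch below (assuming 8-bit two's-complement operands) flags results that fall outside the representable range:

```python
def add_overflows(a, b, bits=8):
    """True if a + b overflows bits-bit two's-complement arithmetic."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return not (lo <= a + b <= hi)

print(add_overflows(100, 100))   # two positives giving 200 > 127: True
print(add_overflows(100, -100))  # 0 fits in range: False
```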
The SRT division technique is used to perform faster division. It tries to guess several quotient bits
per step, using a table lookup based on the upper bits of the dividend and remainder.
The IEEE 754 standard floating point representation is almost always an approximation of the real number.
Fraction: the value, generally between 0 and 1, placed in the fraction field.
Scientific notation: a notation that renders numbers with a single digit to the left of the decimal
point.
Exponent: in the numerical representation system of floating point arithmetic, the value that is placed
in the exponent field.
11. List out the advantages of using normalized scientific notation.
There are three advantages:
It simplifies exchange of data that includes floating-point numbers
It simplifies the floating point arithmetic algorithms to know that numbers will always be in this form
It increases the accuracy of the numbers that can be stored in a word, since the unnecessary leading 0s
are replaced by real digits to the right of the binary point.
An unscheduled event that disrupts program execution is called an exception. It is also called
an interrupt.
The address of the instruction that overflowed is saved in a register, and the computer jumps
to a predefined address to invoke the appropriate routine for that exception.
Guard is the first of two extra bits kept on the right during intermediate calculations of floating point
numbers. It is used to improve rounding accuracy.
Round is a method to make the intermediate floating-point result fit the floating-point format; the
goal is typically to find the nearest number that can be represented in the format. IEEE 754, therefore, always
keeps two extra bits on the right during intermediate additions, called guard and round, respectively.
Units in the last place (ulp) is the number of bits in error in the least significant bits of the
significand between the actual number and the number that can be represented.
21. Show the sub word parallelism. (APR/MAY 2015, MAY/JUNE 2016)
Subword Parallelism-
Subword Parallelism is a technique that enables the full use of word-oriented datapaths when dealing
with lower-precision data. It is a form of low-cost, small-scale SIMD parallelism.
Graphics and audio applications can take advantage of performing simultaneous operations on short
vectors. By partitioning the carry chains within a 128-bit adder, a processor could use parallelism to
perform simultaneous operations on short vectors of sixteen 8-bit operands, eight 16-bit operands,
four 32-bit operands, or two 64-bit operands.
Example: a 128-bit adder can perform:
Sixteen 8-bit adds
Eight 16-bit adds
Four 32-bit adds
Two 64-bit adds
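The partitioned-adder idea can be imitated in software with masking (a SWAR-style sketch; the constants assume eight 8-bit lanes in a 64-bit word):

```python
LANE_HI = 0x8080808080808080  # the MSB of each 8-bit lane
LANE_LO = 0x7F7F7F7F7F7F7F7F  # the low 7 bits of each lane

def packed_add8(a, b):
    # add the low 7 bits of every lane, then patch each lane's MSB with
    # XOR so that carries never propagate across an 8-bit lane boundary
    low = (a & LANE_LO) + (b & LANE_LO)
    return (low ^ ((a ^ b) & LANE_HI)) & 0xFFFFFFFFFFFFFFFF

print(hex(packed_add8(0x01FF, 0x0101)))  # lanes wrap independently: 0x200
```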
22. State an opcode. How many bits are needed to specify 32 distinct operations?
The field that denotes the operation and format of the instruction is called opcode. To specify 32
distinct operations, 5 bits are necessary.
23. List out the use of round bit in floating point arithmetic.
Round is a method to make the intermediate floating point result to fit the floating point format.
The purpose of round is to find the nearest number that can be represented in the format.
Subword parallelism performs simultaneous operations on short vectors with the following values:
1. Sixteen 8-bit operands
2. Eight 16-bit operands
3. Four 32-bit operands
4. Two 64-bit operands
MIPS has two instructions to produce a proper product for signed and unsigned numbers such as
1. Multiply (mult)
2. Multiply unsigned (multu)
In binary numbers, the most significant bit (MSB) is used to represent the sign. If the MSB is 0, the
number is positive, and if it is 1, the number is negative.
32. List out the rules to perform addition on floating point numbers. (Apr/May 2017)
Step 1: Compare the exponents of the two numbers. Shift the smaller number to the right
until its exponent would match the larger exponent
Step 2: Add the significands
Step 3: Normalize the sum, either shifting right and incrementing the exponent or shifting
left and decrementing the exponent
Step 4: Check for Overflow or Underflow
Step 5: Round the significand to the appropriate number of bits.
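The first three steps above can be modeled with integer significands. This toy sketch rescales both numbers exactly instead of shifting the smaller one right with guard/round bits, as real hardware does:

```python
def fp_add(f1, e1, f2, e2):
    """Add f1*2**e1 and f2*2**e2, where f is an integer significand (toy model)."""
    # Step 1: rewrite both numbers on the smaller exponent
    e = min(e1, e2)
    f = (f1 << (e1 - e)) + (f2 << (e2 - e))  # Step 2: add the significands
    # Step 3: normalize by moving factors of two back into the exponent
    while f and f % 2 == 0:
        f //= 2
        e += 1
    return f, e

print(fp_add(3, -1, 1, -1))  # 1.5 + 0.5 → (1, 1), i.e. 1 * 2**1 = 2.0
```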
33. Write the single and double precision binary representation of −0.75ten (APR/MAY 2018)
−0.75₁₀ = −1.1₂ × 2⁻¹, so the biased exponents are 126 (single) and 1022 (double).
Single precision: sign 1, exponent 01111110, fraction 1000…0 (23 bits)
Double precision: sign 1, exponent 01111111110, fraction 1000…0 (52 bits)
Add the binary values of +4 and −48 (in two's complement) to get the correct answer:
  0 0 0 0 0 1 0 0 = +4
+ 1 1 0 1 0 0 0 0 = −48
_____________________
  1 1 0 1 0 1 0 0 = −44
UNIT – III
2. State the data path element and program counter. Nov/Dec 2016
A data path element is a unit used to operate on or hold data within a processor. In the MIPS
implementation, the data path elements include the instruction and data memories, the register file, the ALU
and adders. Program Counter (PC) is the register containing the address of the current instruction in the
program being executed.
5. List out the two state elements needed to store and access an instruction.
Two state elements needed to store and access instructions are the instruction memory and the
program counter. An adder is needed to compute the next instruction address.
10. List out the three instruction classes and their instruction formats?
The three instruction classes (R-type, load and store, and branch) use two different instruction
formats.
The destination address for a jump instruction is formed by concatenating the upper 4 bits of the
current PC + 4 to the 26-bit address field in the jump instruction and appending 00 as the 2 low-order bits.
13. Point out the five steps in MIPS instruction execution. (April/May 2019)
The five steps in MIPS instruction execution are:
1. Fetch instruction from memory.
2. Read registers while decoding the instruction. The regular format of MIPS instructions allows
reading and decoding to occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.
14. Write the formula for calculating time between instructions in a pipelined processor.
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of pipeline stages
(under ideal conditions, i.e., when the stages are perfectly balanced)
15. Identify the Hazards. Write its types. (Nov/Dec 2015) (Apr/May 2017)(Apr/May 2017)
Hazards are the situations in pipelining when the next instruction cannot be executed in the following
clock cycle. The types of hazards are:
1. Structural Hazards
2. Data Hazards
3. Control Hazards
Forwarding, also called bypassing, is a method of resolving a data hazard by retrieving the missing data
element from internal buffers rather than waiting for it to arrive from programmer visible registers or
memory.
A load-use data hazard is a specific form of data hazard in which the data being loaded by a load
instruction has not yet become available when it is needed by another instruction.
30. List down the steps to be carried out in executing a load word instruction.
1. An instruction is fetched from the instruction memory and the PC is incremented
2. A register value is read from the register file
3. The ALU computes the sum of the value read from the register file and the sign-extended, lower 16
bits of the instruction
4. The sum from the ALU is used as the address for the data memory
5. The data from the memory unit is written into the register file in the destination register.
31. Name the control Signal required to perform arithmetic operations (Apr/May 2017)
The control signals required to perform arithmetic operations are
1. RegDst
2. RegWrite
3. ALUSrc
4. MemRead
5. MemWrite
UNIT IV
PARALLELISM
1. State the Instruction level parallelism. (Nov/Dec 2016) (Nov/Dec 2015) (Apr/May 2017)
Pipelining exploits the potential parallelism among instructions. This parallelism is called instruction-
level parallelism (ILP). There are two primary methods for increasing the potential amount of instruction-
level parallelism.
1. Increasing the depth of the pipeline to overlap more instructions.
2. Multiple issue.
Multiple issue is a scheme whereby multiple instructions are launched in one clock cycle. It is a
method for increasing the potential amount of instruction-level parallelism. It is done by replicating the
internal components of the computer so that it can launch multiple instructions in every pipeline stage. The
two approaches are
1. Static multiple issue (at compile time)
2. Dynamic multiple issue (at run time)
Speculation is one of the most important methods for finding and exploiting more ILP. It is an
approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a
dependence in executing other instructions. For example, we might speculate on the outcome of a
branch, so that instructions after the branch could be executed earlier.
Static multiple issue is an approach to implement a multiple-issue processor where many decisions
are made by the compiler before execution.
Issue slots are the positions from which instructions could be issued in a given clock cycle. By
analogy, these correspond to positions at the starting blocks for a sprint. Issue packet is the set of
instructions that issues together in one clock cycle; the packet may be determined statically by the
compiler or dynamically by the processor.
Very Long Instruction Word (VLIW) is a style of instruction set architecture that launches many
operations that are defined to be independent in a single wide instruction, typically with many separate
opcode fields.
Superscalar is an advanced pipelining technique that enables the processor to execute more than one
instruction per clock cycle by selecting them during execution. Instructions issue in order, and the processor
decides whether zero, one, or more instructions can issue in a given clock cycle.
An important compiler technique to get more performance from loops is loop unrolling, where
multiple copies of the loop body are made. After unrolling, there is more ILP available by overlapping
instructions from different iterations.
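The transformation can be illustrated in Python (the compiler does this at the instruction level; the four accumulators stand in for independent instructions that can overlap):

```python
def sum_rolled(a):
    s = 0
    for x in a:
        s += x
    return s

def sum_unrolled4(a):
    # four copies of the loop body with independent accumulators,
    # exposing operations from different iterations to the scheduler
    s0 = s1 = s2 = s3 = 0
    n4 = len(a) - len(a) % 4
    for i in range(0, n4, 4):
        s0 += a[i]
        s1 += a[i + 1]
        s2 += a[i + 2]
        s3 += a[i + 3]
    tail = sum(a[n4:])  # cleanup loop for leftover iterations
    return s0 + s1 + s2 + s3 + tail

print(sum_unrolled4(list(range(10))))  # → 45, same as the rolled loop
```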
Anti-dependence is an ordering forced by the reuse of a name, typically a register, rather than by a
true dependence that carries a value between two instructions. It is also called a name dependence.
Renaming is the technique used to remove anti-dependence, in which the registers are renamed by the
compiler or hardware.
Reservation station is a buffer within a functional unit that holds the operands and the operation. Reorder
buffer is the buffer that holds results in a dynamically scheduled processor until it is safe to store the results
to memory or a register.
Out-of-order execution is a situation in pipelined execution in which an instruction blocked from
executing does not cause the following instructions to wait. It preserves the data flow order of the program.
In-order execution requires the instruction fetch and decode unit to issue instructions in order, which
allows dependences to be tracked, and requires the commit unit to write results to registers and memory in
program fetch order. This conservative mode is called in-order commit.
Blocked multithreading
This is also known as coarse-grained multithreading. The instructions of a thread are executed
successively until an event occurs that may cause delay, such as a cache miss. This event
induces a switch to another thread. This approach is effective on an in-order processor that would
stall the pipeline for a delay event such as a cache miss.
Simultaneous multithreading (SMT): instructions are simultaneously issued from multiple threads to the
execution units of a superscalar processor. This combines the wide superscalar instruction issue
capability with the use of multiple thread contexts.
Shared memory multiprocessor (SMP) is one that offers the programmer a single physical address space
across all processors - which is nearly always the case for multicore chips. Processors communicate through
shared variables in memory, with all processors capable of accessing any memory location via loads and
stores.
Uniform memory access (UMA) is a multiprocessor in which latency to any word in main memory is about
the same no matter which processor requests the access.
Non uniform memory access (NUMA) is a type of single address space multiprocessor in which some
memory accesses are much faster than others depending on which processor asks for which word.
Executing some instructions in a different order from the way they occur in the instruction
stream, and beginning execution of instructions that may never be needed, may be reaching a
limit due to complexity and power consumption concerns.
An alternative approach, which allows for a high degree of instruction-level parallelism
without increasing circuit complexity or power consumption, is called multithreading.
The instruction stream is divided into several smaller streams, known as threads, such that the
threads can be executed in parallel.
Thread:
A dispatchable unit of work within a process. It includes a processor context (which includes the
program counter and stack pointer) and its own data area for a stack (to enable subroutine
branching).
A thread executes sequentially and is interruptible so that the processor can turn to another thread.
A thread is concerned with scheduling and execution.
22. Recall the task level parallelism and data level parallelism.
Task-level parallelism or process-level parallelism means utilizing multiple processors by running
independent programs simultaneously. Parallelism achieved by performing the same operation on
independent data is called data-level parallelism.
Consider an example in which instruction A comes before instruction B in program order. A writes to a
location and B writes to the same location. If B writes first and then A writes, the location will end up
with the wrong value. This is called an output dependency.
Consider an example in which instruction A comes before instruction B in program order. A reads from
a location and B writes to the location, so B has a WAR dependency on A. If B executes before A has
read its operand, then the operand will be lost. This is called an anti-dependency.
Multithreading implies that there are multiple threads of control in each processor. Multithreading
offers an effective mechanism for hiding long latency in building large-scale microprocessors.
Multithreading is the ability of a program or an operating system to serve more than one user at a
time and to manage multiple simultaneous requests without the need to have multiple copies of the
program running within the computer. To support this, central processing units have hardware support
to efficiently execute multiple threads.
29. Differentiate between strong scaling and weak scaling (APR/MAY 2015, NOV/DEC2017)
In strong scaling methods, speed up is achieved on a multiprocessor without increasing the size of
the problem. Strong scaling means measuring speed up while keeping the problem size fixed. In weak
scaling method, speed up is achieved on a multiprocessor while increasing the size of the problem
proportionally to the increase in the number of processors.
Synchronization is the process of coordinating the behavior of two or more processes running on different
processors.
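A minimal sketch of synchronization using Python's standard threading module; without the lock, the increments of the shared counter could interleave and lose updates:

```python
import threading

def parallel_count(n_threads=4, n_iters=10000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_iters):
            with lock:  # the critical section: one thread at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(parallel_count())  # → 40000: every increment is preserved
```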
31. List the Fine grained multithreading and Coarse grained multithreading. MAY/JUNE
2016,NOV/DEC2017
Fine-grained multithreading
Switches between threads on each instruction, causing the execution of multiple threads to be interleaved.
- Usually done in a round-robin fashion, skipping any stalled threads
- The CPU must be able to switch threads every clock cycle
Coarse-grained multithreading
Switches threads only on costly stalls, such as L2 cache misses.
UNIT V
Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again in the
near future.
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will
tend to be referenced in the near future.
Flash memory is a type of electrically erasable programmable read-only memory (EEPROM). Unlike
disks and DRAM, EEPROM technologies can wear out flash memory bits. To cope with such limits, most
flash products include a controller to spread the writes by remapping blocks that have been written
many times to less trodden blocks. This technique is called wear leveling.
Rotational latency, also called rotational delay, is the time required for the desired sector of a disk to rotate
under the read/write head, usually assumed to be half the rotation time.
8. Consider a cache with 64 blocks and a block size of 16 bytes. To what block number does byte
address 1200 map?
The block number is given by:
Block number = (Block address) modulo (Number of blocks in the cache)
Block address = ⌊1200 / 16⌋ = 75
Block number = 75 modulo 64 = 11
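The direct-mapped index calculation for Question 8 can be computed with a short helper (names are illustrative):

```python
def cache_block(byte_addr, block_bytes, num_blocks):
    block_addr = byte_addr // block_bytes   # which memory block the byte is in
    return block_addr % num_blocks          # direct-mapped cache index

print(cache_block(1200, 16, 64))  # → 11
```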
9. How many total bits are required for a direct-mapped cache with 16 KiB of data and 4-word
blocks, assuming a 32-bit address?
16 KiB of data is 4096 words, i.e., 1024 blocks of 4 words each, so 10 bits index the cache. Each
block holds 4 × 32 = 128 bits of data plus a tag of 32 − 10 − 2 − 2 = 18 bits and a valid bit.
Total bits = 1024 × (128 + 18 + 1) = 150,528 bits = 147 Kibibits, or about 18.4 KiB for a 16 KiB cache.
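The bit count for a direct-mapped cache can be reproduced with a sketch (assumes power-of-two sizes, 32-bit words, and a 32-bit address; the function name is illustrative):

```python
def direct_mapped_cache_bits(data_kib, words_per_block, addr_bits=32):
    block_bytes = words_per_block * 4               # 32-bit words
    num_blocks = data_kib * 1024 // block_bytes
    index_bits = num_blocks.bit_length() - 1        # log2 of a power of two
    offset_bits = block_bytes.bit_length() - 1      # byte offset within a block
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = words_per_block * 32
    return num_blocks * (data_bits + tag_bits + 1)  # +1 for the valid bit

print(direct_mapped_cache_bits(16, 4))  # → 150528 bits
```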
Write-through is a scheme in which writes always update both the cache and the next lower level of
the memory hierarchy, ensuring that data is always consistent between the two.
Write-back is a scheme that handles writes by updating values only to the block in the cache, then
writing the modified block to the lower level of the hierarchy when the block is replaced.
Average memory access time is the average time to access memory considering both hits and misses and the
frequency of different accesses. It is equal to the following:
AMAT = Time for a hit + Miss rate × Miss penalty
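The average memory access time formula is easy to evaluate; the figures below are made-up example values, not from the text:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # every access pays the hit time; misses additionally pay the penalty
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.05, 100))  # about 6 cycles for a 5% miss rate
```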
Direct-mapped cache is a cache structure in which each memory location is mapped to exactly one location
in the cache.
Fully associative cache is a cache structure in which a block can be placed in any location in the cache.
Set-associative cache is a cache that has a fixed number of locations (at least two) where each block
can be placed.
Reliability is a measure of the continuous service accomplishment or,equivalently, of the time to failure from
a reference point. Hence, mean time to failure (MTTF) is a reliability measure. A related term is annual
failure rate (AFR), which is just the percentage of devices that would be expected to fail in a year for a given
MTTF.
Availability is then a measure of service accomplishment with respect to the alternation between the
two states of accomplishment and interruption. Availability is statistically quantified as:
Availability = MTTF / (MTTF + MTTR)
Virtual memory is a technique that uses main memory as a “cache” for secondary storage. Two major
motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs,
and to remove the programming burdens of a small, limited amount of main memory.
Address translation, also called address mapping, is the process by which a virtual address is mapped
to an address used to access data in memory.
22. How does the size of memory work out if 20 address lines are used?
If the processor has 20 address lines, it is capable of addressing up to 2²⁰ memory locations. Hence
the size of the memory is 1 MB.
Valid bit is a field in the tables of a memory hierarchy that indicates that the associated block in the
hierarchy contains valid data.
A field that is set whenever a page is accessed and is used to implement LRU or other replacement schemes
is called reference bit.
DMA works in different modes. It is based on the degrees of overlap between the CPU and DMA
Operations. The various modes of DMA operations are
1. Block Transfer
2. Cycle Stealing
3. Transparent DMA
The two approaches to bus arbitration are centralized and distributed arbitration. In centralized
arbitration, a single bus arbiter performs the arbitration and selects the bus master. In distributed
arbitration, all the devices participate in the selection of the next bus master.
Memory-mapped I/O | I/O-mapped (isolated) I/O
3. Implementation: Easy to implement | Difficult to implement
6. Control lines used: READ M, WRITE M | READ M, WRITE M, READ IO, WRITE IO
1. The speed of the CPU is reduced due to low speed IO Devices. The speed with which the CPU can
test and transfer data between IO devices is limited due to low transfer rate of IO devices.
2. Most of the CPU time is wasted. The time that the CPU spends testing IO device status and executing
IO data transfers is too long; that time could be spent on other tasks.
Cache memory is the memory nearest to the CPU; all recently used instructions are stored in the
cache. The cache holds the data and instructions the CPU needs to perform a task, but its capacity is
small compared to main memory and the hard disk. The cache has a smaller access time than main
memory and is therefore faster: a cache memory may have an access time of 100 ns, while the main
memory may have an access time of 700 ns.
When a processor refers to a data item, if the referenced item is in the cache, the reference is called a
hit. If the referenced data is not in the cache, it is called a miss. The hit ratio is defined as the ratio of
the number of hits to the total number of references:
Hit ratio = Number of hits / Total number of references
In order to carry out two or more simultaneous accesses to memory, the memory must be partitioned
into separate modules. The advantage of a modular memory is that it allows interleaving, i.e.,
consecutive addresses are assigned to different memory modules.
33. Mention the use of DMA. (Dec 2012)(Dec 2013,APR/MAY2018)
DMA (Direct Memory Access) provides I/O transfer of data directly to and from the memory unit and the
peripheral. Direct memory access (DMA) is a method that allows an input/output (I/O) device to send or
receive data directly to or from the main memory, bypassing the CPU to speed up memory operations. The
process is managed by a chip known as a DMA controller (DMAC).