Notes 1
SEM/YEAR: IV/II
UNIT 1
PART-A
3. What is a bus? What are the different buses in a CPU? [APR/MAY 2011]
A group of lines that serves as a connecting path for several devices is called a bus.
The different buses in a CPU are 1) Data bus 2) Address bus 3) Control bus.
VLSI is the abbreviation for Very Large Scale Integration. In this technology, millions of transistors are put inside a single chip as tiny components, so a single VLSI chip does the work of millions of transistors. VLSI chips are used to implement parallel algorithms directly in hardware.
7. Define multiprocessing.
Multiprocessing is the ability of an operating system to support more than one process at the same time.
11.What is uniprocessor?
A uniprocessor system is defined as a computer system that has a single central
processing unit that is used to execute computer tasks. As more and more modern
software is able to make use of multiprocessing architectures, such as SMP and
MPP, the term uniprocessor is therefore used to distinguish the class of computers
where all processing tasks share a single CPU.
13. Differentiate super computer and mainframe computer.
A computer with high computational speed, very large memory and parallel structured hardware is known as a super computer. EX: CDC 6600. A mainframe computer is a large computer system containing thousands of ICs. It is a room-sized machine placed in special computer centers and not directly accessible to average users. It serves as a central computing facility for an organization such as a university, factory or bank.
20.What is RISC and CISC?
The processors with simple instructions are called as Reduced Instruction Set
Computers (RISC). The processors with more complex instructions are called as
Complex Instruction Set Computers (CISC).
26. Define Relative mode addressing. (Nov 2014)
In PC-relative addressing, the branch address is the sum of the PC and a constant in the instruction. In the relative addressing mode, the effective address is determined by the Index mode using the program counter in place of a general-purpose processor register. This mode is called relative addressing mode.
30. Brief about relative addressing mode. NOV/DEC 2014
Relative addressing mode: the effective address is determined by the Index mode using the program counter in place of a general-purpose processor register; hence the name relative addressing mode.
MAY/JUNE 2016
A special case of Indirect mode is the Register indirect mode: the register whose number is included in the instruction code contains the address of the operand.
Autoincrement mode: after the operand is addressed, the contents of the register are incremented.
Autodecrement mode: before the operand is addressed, the contents of the register are decremented.
We denote the autoincrement mode by putting the specified register in parentheses, to show that the contents of the register are used as the effective address, followed by a plus sign to indicate that these contents are to be incremented after the operand is accessed. Thus, using register R4, the autoincrement mode is written as (R4)+.
As a companion for the autoincrement mode, another mode is often available in which operands are accessed in the reverse order. In the autodecrement mode, the contents of a register specified in the instruction are decremented, and these contents are then used as the effective address of the operand. We denote the autodecrement mode by putting the specified register in parentheses, preceded by a minus sign to indicate that the contents of the register are to be decremented before being used as the effective address. Thus, we write -(R4).
This mode allows the accessing of operands in the direction of descending addresses. The actions performed by the autoincrement and autodecrement addressing modes can be achieved using two instructions, one to access the operand and the other to increment or decrement the register that contains the operand address. Combining the two operations in one instruction reduces the number of instructions needed to perform the task.
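These modes behave like pointer post-increment and pre-decrement in C. A minimal C sketch of the analogy (the array and variable names are only illustrative):

#include <stdio.h>

int main(void) {
    int a[4] = {10, 20, 30, 40};
    int *r4 = a;              /* R4 holds the address of the first operand         */

    int x = *r4++;            /* (R4)+ : use the address, then increment R4        */
    int y = *r4++;            /* (R4)+ : reads the next element                    */
    int z = *--r4;            /* -(R4) : decrement R4 first, then use the address  */

    printf("%d %d %d\n", x, y, z);   /* prints 10 20 20 */
    return 0;
}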
32.If computer A runs a program in 10 seconds and computer B runs the same
program in 15 seconds how much faster is A than B?
Performance_A / Performance_B = Execution time_B / Execution time_A = 15 / 10 = 1.5
and A is therefore 1.5 times as fast as B.
In the above example, we could also say that computer B is 1.5 times slower than computer A, since
Performance_A / Performance_B = 1.5
means that
Performance_A = 1.5 × Performance_B
To run the program in 6 seconds, B must have twice the clock rate of A.
34.Suppose we have two implementations of the same instruction set
architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for
some program, and computer B has a clock cycle time of 500 ps and a CPI of
1.2 for the same program. Which computer is faster for this program and by
how much?
We know that each computer executes the same number of instructions for the program; let's call this number I. First, find the number of processor clock cycles and the CPU time for each computer:
CPU clock cycles_A = I × 2.0
CPU time_A = I × 2.0 × 250 ps = 500 × I ps
Likewise, for B:
CPU clock cycles_B = I × 1.2
CPU time_B = I × 1.2 × 500 ps = 600 × I ps
Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:
CPU time_B / CPU time_A = (600 × I ps) / (500 × I ps) = 1.2
We can conclude that computer A is 1.2 times as fast as computer B for this program.
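The same arithmetic checked in a few lines of C (variable names are only illustrative):

#include <stdio.h>

int main(void) {
    double cycle_a = 250e-12, cpi_a = 2.0;   /* computer A: 250 ps cycle, CPI 2.0 */
    double cycle_b = 500e-12, cpi_b = 1.2;   /* computer B: 500 ps cycle, CPI 1.2 */

    /* CPU time = I * CPI * cycle time; the instruction count I is the same
       for both machines, so it cancels out of the ratio.                     */
    double per_instr_a = cpi_a * cycle_a;
    double per_instr_b = cpi_b * cycle_b;

    printf("A is %.1f times as fast as B\n", per_instr_b / per_instr_a);   /* 1.2 */
    return 0;
}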
35. Define system CPU time.
The CPU time spent in the operating system performing tasks on behalf of the program.
36.Define response time
Response time:
Also called execution time. The total time required for the computer to
complete a task, including disk accesses, memory accesses, I/O activities,
operating system overhead, CPU execution time, and so on.
37.What is Throughput?
Also called bandwidth. Another measure of performance, it is the number
of tasks completed per unit time.
f = (g + h) – (i + j);
add t0,g,h # temporary variable t0 contains g + h
add t1,i,j # temporary variable t1 contains i + j
sub f,t0,t1 # f gets t0 – t1, which is (g + h) – (i + j)
43. What are the three types of operands in MIPS?
1. Register operands
2. Memory operands
3. Constant or immediate operands
Example of a memory-operand access, compiling A[12] = h + A[8]; (with the base address of A in $s3 and h in $s2):
lw $t0,32($s3)   # Temporary reg $t0 gets A[8]
add $t0,$s2,$t0  # Temporary reg $t0 gets h + A[8]
sw $t0,48($s3)   # Stores h + A[8] back into A[12]
48.What are the types of instruction in MIPS.(APR/MAY2018)
1. Arithmetic instruction
2. Data transfer Instruction
3. Logical Instruction
4. Conditional Branch Instruction
5. Unconditional jump Instruction
while (save[i] == k)
    i += 1;
Ans:
Loop: sll $t1,$s3,2      # Temp reg $t1 = i * 4
      add $t1,$t1,$s6    # $t1 = address of save[i]
      lw $t0,0($t1)      # Temp reg $t0 = save[i]
      bne $t0,$s5,Exit   # go to Exit if save[i] ≠ k
      addi $s3,$s3,1     # i = i + 1
      j Loop             # go to Loop
Exit:
Indirect Mode: The effective address of the operand is the contents of a register or main memory location whose address appears in the instruction. When the processor goes to that register or location, instead of finding an operand it finds the address where the operand is located.
Example: LOAD R1, @R2 : Load the content of the memory address stored in register R2 into register R1.
52. Suppose that we want to enhance the processor used for Web serving. The new processor is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement? [APR 2019]
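Using Amdahl's law with the figures given in the question:
Fraction enhanced (Fe) = 0.4, Speedup enhanced (Se) = 10
Overall speedup = 1 / ((1 − 0.4) + 0.4 / 10) = 1 / (0.6 + 0.04) = 1 / 0.64 ≈ 1.56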
53. Write down the five stages of Instruction Execution. [APR 2019]
1. Instruction Fetch (IF) 2. Instruction Decode and register read (ID) 3. Execute or address calculation (EX) 4. Memory access (MEM) 5. Write Back (WB)
56. How many total bits are required for a direct-mapped cache with 16 KiB of data and 4-word blocks,
assuming a 32-bit address?[APR 2019]
16 KiB = 16384 (2^14) bytes = 4096 (2^12) words. A block size of 4 (2^2) words = 16 bytes (2^4) gives 1024 (2^10) blocks, each with 4 × 32 = 128 bits of data. So n = 10 (index bits) and m = 2 (word-offset bits):
Total bits = 2^10 × (4 × 32 + (32 − 10 − 2 − 2) + 1) = 2^10 × 147 = 147 Kibibits = 18.4 KiB
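The same bookkeeping expressed as a short C sketch (the parameters are those of the question):

#include <stdio.h>

int main(void) {
    int words       = 16 * 1024 / 4;           /* 16 KiB of data = 4096 32-bit words */
    int block_words = 4;                       /* 4-word blocks                      */
    int blocks      = words / block_words;     /* 1024 blocks                        */
    int index_bits  = 10;                      /* log2(1024)                         */
    int word_offset = 2;                       /* log2(4)                            */
    int byte_offset = 2;                       /* 32-bit (4-byte) words              */
    int tag_bits    = 32 - index_bits - word_offset - byte_offset;    /* 18 bits     */
    int per_block   = block_words * 32 + tag_bits + 1;                /* 147 bits    */

    printf("total = %d Kibibits\n", blocks * per_block / 1024);       /* 147         */
    return 0;
}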
Number of cores. Different multicore processors often have different numbers of cores. For example, a
quad-core processor has four cores. The number of cores is usually a power of two.
Number of core types.
o Homogeneous (symmetric) cores. All of the cores in a homogeneous multicore processor are of
the same type; typically the core processing units are general-purpose central processing units that run a
single multicore operating system.
o Heterogeneous (asymmetric) cores. Heterogeneous multicore processors have a mix of
core types that often run different operating systems and include graphics processing units.
PART-B
Q. No. Questions
1. i) Discuss in detail about Eight great ideas of computer Architecture. (8) (APR 2019)
Refer Notes (Pg 1-3)
ii) Explain in detail about Technologies for Building Processors and Memory (8)
Refer Notes (Pg 5-7)
2. Explain the various components of computer System with neat diagram (16)
(NOV/DEC2014,NOV/DEC2015,APR/MAY 2016,NOV/DEC
2016,APR/MAY2018/APR 2019) Refer Notes(Pg 3-5)
4. Define Addressing mode and explain the different types of basic addressing
modes with an example
(APRIL/MAY2015 ,NOV/DEC2015,APR/MAY 2016,NOV/DEC
2016,APR/MAY2018/APR 2019) Refer Notes(Pg 24-28)
6. Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and a CPI of 2.2. (APR/MAY 2018)
a. Which processor has the highest performance expressed in instructions per
second?
b. If the processors each execute a program in 10 seconds, find the number of
cycles and the number of instructions.
c. We are trying to reduce the execution time by 30% but this leads to an
increase
of 20% in the CPI. What clock rate should we have to get this time
reduction?
Ans:
1.P1: 3GHz / 1.5 = 2 * 10^9 instructions per second
P2: 2.5GHz / 1.0 = 2.5 * 10^9 instructions per second
P3: 4GHz / 2.2 = 1.82 * 10^9 instructions per second
So P2 has the highest performance among the three.
2. Cycles: P1: 3GHz * 10 = 3 * 10^10 cycles
P2: 2.5GHz * 10 = 2.5 * 10^10 cycles
P3: 4GHz * 10 = 4 * 10^10 cycles
3.Num of instructions: P1: 3GHz * 10 / 1.5 = 2 * 10^10 instructions
P2: 2.5GHz * 10 / 1.0 = 2.5 * 10^10 instructions
P3: 4GHz * 10 / 2.2 = 1.82 * 10^10 instructions
4. Execution time = (Num of instructions * CPI) / (Clock rate)
So if we want to reduce the execution time by 30%, and CPI increases by 20%, we
have: Execution time * 0.7 = (Num of instructions * CPI * 1.2) / (New Clock rate)
New Clock rate = Clock rate * 1.2 / 0.7 = 1.71 * Clock rate
New Clock rate for each processor: P1: 3GHz * 1.71 = 5.13 GHz
P2: 2.5GHz * 1.71 = 4.27 GHz
P3: 4GHz * 1.71 = 6.84 GHz
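The same calculation can be checked with a short C sketch (names are only illustrative):

#include <stdio.h>

int main(void) {
    double rate[3] = {3e9, 2.5e9, 4e9};   /* clock rates of P1, P2, P3 */
    double cpi[3]  = {1.5, 1.0, 2.2};     /* CPI of P1, P2, P3         */

    for (int i = 0; i < 3; i++) {
        double ips     = rate[i] / cpi[i];       /* instructions per second       */
        double cycles  = rate[i] * 10.0;         /* clock cycles in a 10 s run    */
        double instrs  = cycles / cpi[i];        /* instructions in that run      */
        double newrate = rate[i] * 1.2 / 0.7;    /* 30% less time, 20% higher CPI */
        printf("P%d: %.2e IPS, %.2e cycles, %.2e instructions, new rate %.2f GHz\n",
               i + 1, ips, cycles, instrs, newrate / 1e9);
    }
    return 0;
}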
8. Explain direct ,immediate ,relative and indexed addressing modes with example
(APR/MAY2018/APR 2019)
Refer Notes(Pg 20-21)
9. State the CPU performance equation and the factors that affect performance (8)
(NOV/DEC2014) Refer Notes(Pg 7-10)
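For quick reference, the equation asked for here is the standard one:
CPU time = Instruction count × CPI × Clock cycle time = (Instruction count × CPI) / Clock rate
so the factors that affect performance are the instruction count (determined by the algorithm, the compiler and the instruction set), the CPI (determined by the instruction set and the processor organization) and the clock cycle time or clock rate (determined by the organization and the hardware technology).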
11. What is the need for addressing in a computer system?Explain the different
addressing modes with suitable examples.(APRIL/MAY2015/APR 2019)
Refer Notes(Pg 24-28)
14. Describe the steps that transform a program written in a high-level
language such as C into a representation that is directly executed by a
computer processor.
Language Processors–
Assembly language is machine dependent yet mnemonics that are being used
to represent instructions in it are not directly understandable by machine and
high Level language is machine independent. A computer understands
instructions in machine code, i.e. in the form of 0s and 1s. It is a tedious task
to write a computer program directly in machine code. The programs are
written mostly in high level languages like Java, C++, Python etc. and are
called source code. These source code cannot be executed directly by the
computer and must be converted into machine language to be executed.
Hence, a special translator system software is used to translate the program
written in high-level language into machine code is called Language
Processor and the program after translated into machine code (object program
/ object code).
The language processors can be any of the following three types:
1. Compiler –
The language processor that reads the complete source program written
in high level language as a whole in one go and translates it into an
equivalent program in machine language is called as a Compiler.
Example: C, C++, C#, Java
In a compiler, the source code is translated to object code successfully only if it is free of errors. The compiler reports the errors, with line numbers, at the end of compilation when there are any errors in the source code. The errors must be removed before the compiler can successfully recompile the source code.
2. Assembler –
The Assembler is used to translate a program written in Assembly language into machine code. The source program, containing assembly language instructions, is the input to the assembler. The output generated by the assembler is the object code or machine code understandable by the computer.
3. Interpreter –
An interpreter is a language processor that translates a single statement of the source program into machine code and executes it immediately, before moving on to the next line. If there is an error in the statement, the interpreter terminates its translating process at that statement and displays an error message. The interpreter moves on to the next line for execution only after removal of the error. An interpreter directly executes instructions written in a programming or scripting language without previously converting them to object code or machine code.
Example: Perl, Python and MATLAB.
Difference between Compiler and Interpreter –
COMPILER: takes a large amount of time to analyze the entire source code, but the overall execution time of the program is comparatively faster; it reports errors only after scanning the whole program, so debugging is comparatively hard because the error can be present anywhere in the program.
INTERPRETER: takes less time to analyze the source code, but the overall execution time of the program is slower; it stops translation at the first error it meets, so debugging is comparatively easy.
UNIT 2
PART -A
7.What are the main features of Booth’s algorithm?
● It handles both positive and negative multipliers uniformly.
● It achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s.
Given that the parallelism occurs within a wide word, the extensions
are classified as sub-word parallelism. It is also classified under the more
general name of data level parallelism. They have been also called vector or
SIMD, for single instruction, multiple data . The rising popularity of
multimedia applications led to arithmetic instructions that support narrower
operations that can easily operate in parallel.
12.How can we speed up the multiplication process?
There are two techniques to speed up the multiplication process:
1) The first technique guarantees that the maximum number of summands that must be added is n/2 for n-bit operands (bit-pair recoding of the multiplier).
2) The second technique reduces the time needed to add the summands (carry-save addition of summands).
13. What is bit pair recoding? Give an example.
Bit-pair recoding halves the maximum number of summands. Group the Booth-recoded multiplier bits in pairs and observe the following: the pair (+1, −1) is equivalent to the pair (0, +1). That is, instead of adding −1 times the multiplicand M at shift position i to +1 × M at position i + 1, the same result is obtained by adding +1 × M at position i.
Eg: for 11010 the bit-pair recoded value is 0 −1 −2.
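Working the example out: Booth recoding of the multiplier 1 1 0 1 0 (with an implied 0 appended on the right) gives, from MSB to LSB, 0 −1 +1 −1 0. Pairing these Booth digits from the right: (−1, 0) → 2(−1) + 0 = −2; (−1, +1) → 2(−1) + 1 = −1; (0, 0) → 0. The bit-pair recoded multiplier is therefore 0 −1 −2, and indeed 0×16 + (−1)×4 + (−2)×1 = −6, the value of 11010 as a 5-bit 2's-complement number.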
14. What are the two methods of achieving the 2's complement?
a. Take the 1's complement of the number and add 1.
b. Leave all the least significant 0's and the first 1 unchanged, and complement the remaining bits.
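For example, the 2's complement of 01100:
Method (a): 1's complement = 10011; adding 1 gives 10100.
Method (b): keep the trailing 100 (the least significant 0's and the first 1) unchanged and complement the remaining bits 01, again giving 10100.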
15.What is the advantage of using Booth algorithm?
1) It handles both positive and negative multiplier uniformly.
2) It achieves efficiency in the number of additions required when the
multiplier
has a few large blocks of 1’s.
3) The speed gained by skipping 1’s depends on the data
18.When can you say that a number is normalized?
When the decimal point is placed to the right of the first (nonzero)
significant digit, the number is said to be normalized.
The end values 0 and 255 of the excess-127 exponent E are used to represent special values:
a) When E = 0 and the mantissa fraction M is zero, the value exact 0 is represented.
b) When E = 255 and M = 0, the value infinity (∞) is represented.
c) When E = 0 and M ≠ 0, denormal values are represented.
d) When E = 255 and M ≠ 0, the value represented is called Not a Number (NaN).
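A small C sketch that extracts the S, E and M fields of a 32-bit float and checks these cases (function and variable names are only illustrative):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

static void classify(float f) {
    uint32_t w;
    memcpy(&w, &f, sizeof w);            /* reinterpret the float's bit pattern */
    uint32_t s = w >> 31;                /* sign bit                            */
    uint32_t e = (w >> 23) & 0xFF;       /* excess-127 exponent field E         */
    uint32_t m = w & 0x7FFFFF;           /* 23-bit fraction field M             */

    if (e == 0 && m == 0)        printf("s=%u E=0   M=0  -> zero\n", s);
    else if (e == 255 && m == 0) printf("s=%u E=255 M=0  -> infinity\n", s);
    else if (e == 0)             printf("s=%u E=0   M!=0 -> denormal\n", s);
    else if (e == 255)           printf("s=%u E=255 M!=0 -> NaN\n", s);
    else                         printf("s=%u E=%u       -> normal\n", s, e);
}

int main(void) {
    classify(0.0f);
    classify(INFINITY);
    classify(NAN);
    classify(-5.0f);
    return 0;
}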
19.How overflow occur in subtraction?APRIL/MAY2015
If 2 Two's Complement numbers are subtracted, and their signs are different,
then overflow occurs if and only if the result has the same sign as the
subtrahend.
Overflow occurs if
● (+A) − (−B) = −C
● (−A) − (+B) = +C
20.Write the Add/subtract rule for floating point numbers.
1) Choose the number with the smaller exponent and shift its
mantissa right a number of steps equal to the difference in
exponents.
2) Set the exponent of the result equal to the larger exponent.
3) Perform addition/subtraction on the mantissa and determine the sign of
the result
4) Normalize the resulting value, if necessary.
21.Define ALU. MAY/JUNE 2016
The arithmetic and logic unit (ALU) of a computer system is the place where the actual execution of the instructions takes place during processing operations. All calculations are performed and all comparisons (decisions) are made in the ALU. The data and instructions, stored in primary storage prior to processing, are transferred as and when needed to the ALU, where processing takes place.
22.Write the multiply rule for floating point numbers.
1) Add the exponent and subtract 127.
2) Multiply the mantissa and determine the sign of the result
3) Normalize the resulting value, if necessary
23. State double precision floating point number. NOV/DEC 2015
Double precision is a floating-point value represented in two 32-bit words (64 bits in all): a 1-bit sign, an 11-bit exponent and a 52-bit fraction.
29. Define Von Neumann Rounding.
If at least one of the guard bits is 1, the least significant bit of the retained bits is set to 1; otherwise nothing is changed in the retained bits, and the guard bits are simply dropped.
For example, ARM added more than 100 instructions in the NEON multimedia instruction extension to support sub-word parallelism, which can be used either with ARMv7 or ARMv8.
Multiply 1000ten × 1001ten.
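Working it out with the usual paper-and-pencil method (only the digits 0 and 1 appear):
Multiplicand 1000, multiplier 1001. The partial products, each shifted left by its digit position, are:
1000 × 1 = 1000 (no shift)
1000 × 0 = 0000 (shifted 1 place)
1000 × 0 = 0000 (shifted 2 places)
1000 × 1 = 1000 (shifted 3 places)
Adding them: 1000 + 0 + 0 + 1000000 = 1001000, so 1000ten × 1001ten = 1001000ten.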
30. Write the algorithm for non-restoring division.
Non-Restoring Division Algorithm
Step 1: Do the following n times: If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
31.Define Exception
Also called interrupt. An unscheduled event that disrupts program
execution; used to detect overflow.
32.Define Interrupt
An exception that comes from outside of the processor. (Some
architectures use the term interrupt for all exceptions.)
34. Write the multiplication algorithm.
1. Test bit 0 of the Multiplier. If it is 1, add the Multiplicand to the Product and place the result in the Product register.
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
4. If this is not the 32nd repetition, repeat from step 1; otherwise, stop.
36. Write the division algorithm (restoring division).
1. Subtract the Divisor register from the Remainder register and place the result in the Remainder register.
2a. If the Remainder is ≥ 0, shift the Quotient register left, setting the new rightmost bit to 1.
2b. If the Remainder is < 0, restore the original value by adding the Divisor back to the Remainder, then shift the Quotient register left, setting the new rightmost bit to 0.
3. Shift the Divisor register right 1 bit.
4. If this is not the 33rd repetition, repeat from step 1; otherwise, stop.
37. Define scientific notation.
A notation that renders numbers with a single digit to the left of the decimal point. For example, 1.0ten × 10^−9 is in (normalized) scientific notation, whereas 0.1ten × 10^−8 is not.
41. Define Single precision.
A floating-point value represented in a single 32-bit word. The value represented is (−1)^S × (1 + Fraction) × 2^(Exponent − Bias), where
o s is the sign of the floating-point number (1 meaning negative),
o exponent is the value of the 8-bit exponent field (including the sign of the exponent), and
o fraction is the 23-bit number.
What decimal number is represented by the single-precision word 1 10000001 01000000000000000000000?
Answer
The sign bit is 1, the exponent field contains 129, and the fraction field contains 1 × 2^−2 = 1/4, or 0.25. Using the basic equation,
(−1)^S × (1 + Fraction) × 2^(Exponent − Bias) = (−1)^1 × (1 + 0.25) × 2^(129 − 127) = −1.25 × 4 = −5.0
45. Write the algorithm for binary floating-point addition that follows this decimal example.
47. What are the floating point instructions supported by MIPS?
■ Floating-point addition, single (add.s) and addition, double (add.d)
■ Floating-point subtraction, single (sub.s) and subtraction, double (sub.d)
■ Floating-point multiplication, single (mul.s) and multiplication, double (mul.d)
■ Floating-point division, single (div.s) and division, double (div.d)
49. What are the advantages of representing numbers in IEEE format?
1. It simplifies exchange of data that includes floating-point numbers;
2. it simplifies the floating-point arithmetic algorithms to know that numbers will always be in this form; and
3. it increases the accuracy of the numbers that can be stored in a word, since the unnecessary leading 0s are replaced by real digits to the right of the binary point.
50. State the rules for floating point addition. (APR/MAY 2017)
Assume that only four decimal digits of the significand and two decimal digits of the exponent can be stored.
Step 1: Align the decimal point of the number that has the smaller exponent.
Step 2: Add the significands.
Step 3: If the sum is not in normalized scientific notation, adjust (normalize) it.
Step 4: Since the significand can be only four digits long (excluding the sign), round the number to four significand digits.
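A worked decimal example of these four steps, adding 9.999ten × 10^1 and 1.610ten × 10^−1:
Step 1: align the number with the smaller exponent: 1.610 × 10^−1 = 0.016 × 10^1 (kept to four significand digits).
Step 2: add the significands: 9.999 + 0.016 = 10.015, giving 10.015 × 10^1.
Step 3: normalize: 1.0015 × 10^2.
Step 4: round to four significand digits: 1.002 × 10^2.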
PART-B
Questions
1. Explain the sequential version of the Multiplication algorithm in detail with hardware diagram and examples. (APRIL/MAY 2015)
Refer Notes (40-43)
10.Briefly explain Carry lookahead adder(NOV/DEC2014) (6)
Refer Notes(Pg 62-64)
UNIT 3
PART A
Questions
1.What is pipelining?
The technique of overlapping the execution of successive instruction for
substantial improvement in performance is called pipelining.
The time required between moving an instruction one step down the pipeline is
a processor cycle.
To resolve the hazard, the pipeline is stalled for 1 clock cycle. A stall is commonly called a pipeline bubble, since it floats through the pipeline taking space but carrying no useful work.
Adding registers between pipeline stages means adding logic between stages, plus the setup and hold times needed for proper operation. This delay is known as pipeline register delay.
Any condition that causes the pipeline to stall is called a hazard. Hazards are also called stalls or bubbles. The various pipeline hazards are: 1. Structural hazard 2. Data hazard 3. Control (branch) hazard.
Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time. The most common case in which this hazard may arise is in access to memory.
11. What is side effect?
When a location other than the one explicitly named in an instruction as the destination operand is affected, the instruction is said to have a side effect (for example, autoincrement/autodecrement addressing modes modify a register, and some instructions implicitly set condition-code flags).
15.Define exception and interrupt. (DEC 2012,NOV/DEC
14,MAY/JUNE/2016,APR/MAY 2018)
Exception:
The term exception is used to refer to any event that causes an interruption.
Interrupt:
An exception that comes from outside of the processor. There are two
types of interrupt.
1. Imprecise interrupt and 2.Precise interrupt
16. Why is a branch prediction algorithm needed? Differentiate between the static and dynamic techniques. (May 2013, APR/MAY 2015, NOV/DEC 15)
Branch instructions introduce a branch penalty which reduces the gain in performance expected from pipelining. Branch instructions can be handled in several ways to reduce their negative impact on the rate of execution of instructions; thus a branch prediction algorithm is needed.
Static branch prediction assumes that the branch will not take place and continues to fetch instructions in sequential address order.
In dynamic branch prediction, the processor hardware assesses the likelihood of a given branch being taken by keeping track of branch decisions every time that instruction is executed. The execution history used in predicting the outcome of a given branch instruction is the result of the most recent execution(s) of that instruction.
Branch target address: the address specified in a branch, which becomes the new program counter if the branch is taken. In MIPS the branch target address is given by the sum of the offset field of the instruction and the address of the instruction following the branch.
18. How do control instructions like branch cause problems in a pipelined processor?
A pipelined processor gives the best throughput for a sequential stream of instructions. A branch instruction must first calculate the target address to decide whether execution jumps from one memory location to another. In the meantime, before the target is calculated, the next sequential instructions are fetched into the pipeline and must be rolled back (flushed) when the branch is taken.
19.What is meant by super scalar processor?
Super scalar processors are designed to exploit more instruction level
parallelism in user programs. This means that multiple functional units are
used. With such an arrangement it is possible to start the execution of several
instructions in every clock cycle. This mode of operation is called super scalar
execution.
21.What is Vectorizer?
The process to replace a block of sequential code by vector instructions is
called vectorization. The system software, which generates parallelism, is
called as vectorizing compiler.
Based on the configuration → static and dynamic pipelines, and linear and non-linear pipelines.
27. What are the problems faced in an instruction pipeline?
Resource conflicts → caused by access to memory by two instructions at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
Data dependency → arises when an instruction depends on the result of a previous instruction, but this result is not yet available.
Branch difficulties → arise from branch and other instructions that change the value of the PC (Program Counter).
One of the most important methods for finding and exploiting more ILP is speculation. It is an approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions. For example, we might speculate on the outcome of a branch, so that instructions after the branch could be executed earlier.
Speculation (also known as speculative loading) is a process implemented in Explicitly Parallel Instruction Computing (EPIC) processors and their compilers to reduce processor-memory exchange bottlenecks or latency by putting the data into memory in advance of the actual load instruction.
33. What is an Adder?
An adder is needed to compute the next instruction address. The adder is an ALU wired to always add its two 32-bit inputs and place the sum on its output.
ALU control lines and their functions:
0000 → AND
0001 → OR
0010 → add
0110 → subtract
38.Define Don’t-care term
An element of a logical function in which the output does not depend on
the values of all the inputs
41.What is Structural hazard?
When a planned instruction cannot execute in the proper clock cycle
because the hardware does not support the combination of instructions that are set
to execute.
Example: if there were a single memory instead of two memories and the pipeline had a fourth instruction, then in the same clock cycle the first instruction would be accessing data from memory while the fourth instruction is fetching an instruction from that same memory. Without two memories, the pipeline could have a structural hazard.
To avoid structural hazards:
● When designing a pipeline, the designer can change the design.
● Provide sufficient resources.
Define Data Hazards. (APR/MAY 2017)
A data hazard is also called a pipeline data hazard. It occurs when a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.
● In a computer pipeline, data hazards arise from the dependence of one instruction on an earlier one that is still in the pipeline.
● Example: an add instruction followed immediately by a subtract instruction that uses the sum ($s0):
add $s0, $t0, $t1
sub $t2, $s0, $t3
44.Define Pipeline stall
Pipeline stall is also called as bubble. A stall initiated in order to resolve a
hazard.
Correlating predictor: a branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.
49. Name the control signals required to perform an arithmetic operation. (APR/MAY 2017)
1. RegDst
2. RegWrite
3. ALUSrc
PART-B
Questions
1.Explain the basic MIPS implementation with binary multiplexers and
control lines(16) (NOV/DEC 15/APR 2019)
Refer Notes(Pg 65-67 )
2. What is a hazard? Explain the different types of pipeline hazards with suitable examples. (NOV/DEC 2014, APRIL/MAY 2015, MAY/JUNE 2016, NOV/DEC 2017)
Refer Notes (Pg 84-89)
3. Explain how the instruction pipeline works. What are the various situations where an instruction pipeline can stall? Illustrate with an example. (NOV/DEC 2015, NOV/DEC 2016/APR 2019). Refer Notes (Pg 78-83)
4.Explain data path in detail(NOV/DEC 14,NOV/DEC2017)
Refer Notes(Pg 68-72)
5.Explain dynamic branch prediction .Refer Notes(Pg 89-93)
9. What is pipelining? Discuss about the pipelined datapath and control. (16) MAY/JUNE 2016
Refer Notes (Pg 78-83)
16.For the problems in this exercise, assume that there are no pipeline stalls
and that the breakdown of executed instructions is as follows:
add addi not beq lw sw
20% 20% 0% 25% 25% 10%
In what fraction of all cycles is the data memory used?
In what fraction of all cycles is the input of the sign-extend
circuit needed? What is this circuit doing in cycles in which its input is not
needed? (Refer notes)
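Using the instruction mix above and the usual single-issue MIPS datapath assumptions (lw and sw are the only instructions that use the data memory; the sign-extend unit is needed by every instruction carrying a 16-bit immediate):
Data memory used: lw + sw = 25% + 10% = 35% of all cycles.
Sign-extend input needed: addi + beq + lw + sw = 20% + 25% + 25% + 10% = 80% of all cycles. In the remaining cycles the circuit still sign-extends whatever value is on its input; the result is simply not used.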
Consider the following loop.
loop: lw   r1,0(r1)
      and  r1,r1,r2
      lw   r1,0(r1)
      lw   r1,0(r1)
      beq  r1,r0,loop
17.Assume that perfect branch prediction is used (no stalls due to control
hazards),that there are no delay slots, and that the pipeline has full forwarding
support. Also assume that many iterations of this loop are executed before the
loop exits. (Refer notes)
UNIT 4
PART-A
5.What is Speculation?
An approach whereby the compiler or processor guesses the outcome of an
instruction to remove it as dependence in executing other instructions
7.What is Loop unrolling?
A technique to get more performance from loops that access arrays, in which multiple
copies of the loop body are made and instructions from different iterations are
scheduled together
15.Define Strong scaling and weak scaling. APRIL/MAY 2015,NOV/DEC2017
Strong scaling
Speed-up achieved on a multi-processor without increasing the size of the problem.
Weak scaling.
Speed-up achieved on a multi-processor while increasing the size of the problem
proportionally to the increase in the number of processors.
16.Define Single Instruction, Single Data stream(SISD)
A sequential computer which exploits no parallelism in either the instruction or
data streams. Single control unit (CU) fetches single Instruction Stream (IS)
from memory. The CU then generates appropriate control signals to direct
single processing element (PE) to operate on single Data Stream (DS) i.e. one
operation at a time.
Examples of SISD architecture are the traditional uniprocessor machines like a PC.
Fine-grained multithreading
- Switches between threads on each instruction, interleaving the execution of multiple threads
- The CPU must be able to switch threads every clock cycle
Coarse-grained multithreading
- Switches threads only on costly stalls, such as L2 cache misses
26.Differentiate UMA from NUMA. (APRIL/MAY2015)
Uniform memory access (UMA)is a multiprocessor in which latency to
any word in main memory is about the same no matter which processor requests
the access.
Non uniform memory access (NUMA) is a type of single address space
multiprocessor in which some memory accesses are much faster than others
depending on which processor asks for which word.
32.Define cluster.
Group of independent servers (usually in close proximity to one another)
interconnected through a dedicated network to work as one centralized data
processing resource. Clusters are capable of performing multiple complex
instructions by distributing workload across all connected servers. Clustering
improves the system's availability to users, its aggregate performance, and
overall tolerance to faults and component failures. A failed server is
automatically shut down and its users are switched instantly to the other
servers
37.What are the advantages of Speculation?
● Speculating on certain instructions may introduce exceptions that were
formerly not present.
● Example a load instruction is moved in a speculative manner, but the
address it uses is not legal when the speculation is incorrect.
● Compiler-based speculation, such problems are avoided by adding special
speculation support that allows such exceptions to be ignored until it is
clear that they really should occur.
● In hardware-based speculation, exceptions are simply buffered until it is clear that the instruction causing them is no longer speculative and is ready to complete; at that point the exception is raised, and normal exception handling proceeds.
● Speculation can improve performance when done properly and decrease
performance when done carelessly.
43.Write down the difference between this simple superscalar and a VLIW
processor:
● The code, whether scheduled or not, is guaranteed by the hardware to
execute correctly.
● The compiled code always run correctly independent of the issue rate
or pipeline structure of the processor.
● In some VLIW designs, recompilation was required when moving
across different processor models.
● In other static issue processors, code would run correctly across different implementations, but often so poorly as to make recompilation effectively required for performance.
46. Write down the Speed-up (Performance Improvement) equation.
It tells us how much faster a task can be executed using the machine with the enhancement as compared to the original machine. It is defined as
Speedup = Execution time without the enhancement / Execution time with the enhancement
or, in terms of the enhanced fraction Fraction_enhanced (Fe) and its speedup Speedup_enhanced (Se):
Speedup_overall = 1 / ((1 − Fe) + Fe / Se)
Advantages:
● coarse-grained multithreading is much more useful for reducing the
penalty of high-cost stalls
Disadvantages:
● Coarse-grained multithreading is limited in its ability to overcome
throughput losses, especially from shorter stalls.
● This limitation arises from the pipeline start-up costs of coarse-grained
multithreading. Because a processor with coarse-grained multithreading
issues instructions from a single thread, when a stall occurs, the pipeline
must be emptied or frozen.
The new thread that begins executing after the stall must fill the pipeline before
instructions will be able to complete
50.What are the Advantages SMT.
● Simultaneous Multithreaded Architecture is superior in performance to a
multiple-issue multiprocessor (multiple-issue CMP).
● SMT boosts utilization by dynamically scheduling functional units among multiple threads.
● SMT also increases hardware design flexibility.
● SMT increases the complexity of instruction scheduling.
● With register renaming and dynamic scheduling, multiple instructions
from independent threads can be issued without regard to the dependences
among them; the resolution of the dependences can be handled by the
dynamic scheduling capability.
● Since you are relying on the existing dynamic mechanisms, SMT does not
switch resources every cycle. Instead, SMT is always executing
instructions from multiple threads, leaving it up to the hardware to
associate instruction slots and renamed registers with their proper threads.
PART-B
Questions
1. Explain Instruction level parallel processing and state the challenges of parallel processing. (NOV/DEC 2014, APR/MAY 2018) Refer Notes (Pg 96-103)
2016,NOV/DEC2017/APR 2019)
Refer Notes(Pg 108-110)
5.Explain cluster and other Message passing Multiprocessor (Refer notes.)
UNIT 5
PART -A
Spatial locality: the locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
Memory hierarchy: a structure that uses multiple levels of memory with different speeds and sizes. The faster memories are more expensive per bit than the slower memories.
Temporal locality: the principle stating that if a data location is referenced, it will tend to be referenced again soon.
7.How cache memory is used to reduce the execution time. (APR/MAY’10)
If active portions of the program and data are placed in a fast small
memory, the average memory access time can be reduced, thus reducing the
total execution time of the program. Such a fast small memory is called as
cache memory.
8. Define memory interleaving. (A.U. MAY/JUNE '11) (APR/MAY 2017)
In order to carry out two or more simultaneous accesses to memory, the memory must be partitioned into separate modules. The advantage of a modular memory is that it allows interleaving, i.e. consecutive addresses are assigned to different memory modules.
Cache memory: it is a fast memory that is inserted between the larger, slower main memory and the processor. It holds the currently active segments of a program and their data.
11. What is memory system? [MAY/JUNE '11] [APR/MAY 2012]
Every computer contains several types of devices to store the instructions and data required for its operation. These storage devices, plus the algorithms (implemented by hardware and/or software) needed to manage the stored information, form the memory system of the computer.
15.Distinguish between isolated and memory mapped I/O? (May 2013)
The isolated I/O method isolates memory and I/O addresses so that memory
address values are not affected by interface address assignment since each has its
own address space.
In memory mapped I/O, there are no specific input or output instructions. The
CPU can manipulate I/O data residing in interface registers with the same
instructions that are used to manipulate memory words
16. Distinguish between memory mapped I/O and I/O mapped I/O.
Memory mapped I/O: when I/O devices and the memory share the same address space, the arrangement is called memory-mapped I/O. The machine instructions that can access memory are used to transfer data to or from an I/O device.
22. Compare Static RAM and Dynamic RAM. (Dec 2013, APR/MAY 2018)
Static RAM is more expensive and requires four times the amount of space for a given amount of data as dynamic RAM, but, unlike dynamic RAM, it does not need to
amount of data than dynamic RAM, but, unlike dynamic RAM, does not need to
be power-refreshed and is therefore faster to access. Dynamic RAM uses a kind of
capacitor that needs frequent power refreshing to retain its charge. Because reading
a DRAM discharges its contents, a power refresh is required after each read. Apart
from reading, just to maintain the charge that holds its content in place, DRAM
must be refreshed about every 15 microseconds. DRAM is the least expensive kind
of RAM.
SRAMs are simply integrated circuits that are memory arrays with a single
access port that can provide either a read or a write. SRAMs have a fixed access
time to any datum.
SRAMs don’t need to refresh and so the access time is very close to the cycle
time. SRAMs typically use six to eight transistors per bit to prevent the
information from being disturbed when read. SRAM needs only minimal power to
retain the charge in standby mode.
In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge
in a capacitor. A single transistor is then used to access this stored charge, either to
read the value or to overwrite the charge stored there. Because DRAMs use only a
single transistor per bit of storage, they are much denser and cheaper per bit than
SRAM
DRAMs store the charge on a capacitor, it cannot be kept indefinitely and
must periodically be refreshed.
26. What is the need to implement memory as a hierarchy? (APRIL/MAY 2015/APR 2019)
Ideally memory should be large, fast and cheap, but no single memory technology provides all three. A memory hierarchy combines a small, fast (expensive) memory close to the processor with progressively larger, slower (cheaper) levels below it. By exploiting temporal and spatial locality, it gives the illusion of a memory as large as the largest level and nearly as fast as the fastest level, at a cost per bit close to that of the cheapest level.
27.Point out how DMA can improve I/O speed? APRIL/MAY 2015
CPU speeds continue to increase, and new CPUs have multiple processing elements on the same chip, so a large amount of data can be processed very quickly. The problem is transferring data to the CPU or to memory fast enough that the CPU has work to do at all times. Without DMA, when the CPU is using programmed input/output, it is typically fully occupied for the entire
is using programmed input/output, it is typically fully occupied for the entire
duration of the read or write operation, and is thus unavailable to perform other
work. With DMA, the CPU first initiates the transfer, then it does other operations
while the transfer is in progress, and it finally receives an interrupt from the DMA
controller when the operation is done.
30.In many computers the cache block size is in the range 32 to 128 bytes.
What would be the main Advantages and disadvantages of making the size
of the cache blocks larger or smaller?
The larger the cache block, the fewer the cache misses, provided most of the data in the block are actually used. It is wasteful if much of the data are not used before the block is removed from the cache. A smaller block size means more misses.
31. Define USB.
Universal Serial Bus (USB) is an external bus standard that supports data transfer rates of 12 Mbps. A single USB port can be used to connect up to 127 peripheral devices, such as mice, modems, and keyboards. USB also supports Plug-and-Play installation and hot plugging.
32.Define Memory latency
The amount of time it takes to transfer a word of data to or from the memory.
38.Define miss penalty
The miss penalty is the time to replace a block in the upper level with the
corresponding block from the lower level, plus the time to deliver this block to
the processor
Tag: a field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.
Write-through: the simplest way to keep the main memory and the cache consistent is always to write the data into both the memory and the cache. This scheme is called write-through.
Write-back: in a write-back scheme, when a write occurs, the new value is written only to the block in the cache.
43. What is TLB?
The translation-lookaside buffer (TLB) is a small, fast cache that keeps recently used address mappings (page table entries), so that a virtual-to-physical address translation can be performed without accessing the page table in memory on every reference.
44.What are the messages transferred in DMA?
Bus Master: The device that is allowed to initiate data transfers on the bus at
any given time is called the bus master
Bus Arbitration: It is the process by which the next device to become the bus
master is selected and the bus mastership is transferred to it.
PART -B
Questions
1. Explain in detail about memory Technologies. (APRIL/MAY 2015, DEC 2017) Refer Notes (Pg 120-124)
2. Explain in detail about memory Hierarchy with neat diagram. Refer Notes (Pg 118-120)
6. Draw the typical block diagram of a DMA controller and explain how it is used for direct data transfer between memory and peripherals. (NOV/DEC 2015, MAY/JUNE 2016, NOV/DEC 2016, MAY/JUN 2018/APR 2019)
Refer Notes (146-151)
11. Explain in detail about any two standard input and output interfaces required to connect the I/O devices to the bus. (NOV/DEC 2014/APR 2019) Refer Notes (Pg 151-156)