Advanced Computer Architecture (ACA) Assignment
Machine Cycle
For every instruction, a processor repeats a set of four basic operations, which comprise a
machine cycle: (1) fetching, (2) decoding, (3) executing, and, if necessary, (4) storing. Fetching is the process of
obtaining a program instruction or data item from memory. The term decoding refers to the process of
translating the instruction into signals the computer can execute. Executing is the process of carrying out the
commands. Storing, in this context, means writing the result to memory (not to a storage medium).
Instruction Cycle
An instruction cycle is the cycle in which one instruction is fetched from memory and then executed; the processor repeats it for every machine-language instruction it receives.
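The four phases of the machine cycle can be sketched as a toy interpreter loop (everything here, including the invented mini instruction set, is purely illustrative and not a real processor):

```python
# Toy fetch-decode-execute-store loop. "LOAD n" puts n in the accumulator,
# "ADD n" adds n, "STORE a" writes the accumulator to data memory address a.
program = [("LOAD", 5), ("ADD", 3), ("STORE", 0), ("HALT", None)]
data = [0] * 4
acc, pc, running = 0, 0, True

while running:
    op, arg = program[pc]     # 1. fetch the instruction from memory
    pc += 1
    if op == "LOAD":          # 2. decode the opcode ...
        acc = arg             # 3. ... and execute the command
    elif op == "ADD":
        acc += arg
    elif op == "STORE":
        data[arg] = acc       # 4. store the result back to memory
    elif op == "HALT":
        running = False

print(acc, data[0])   # 8 8
```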
Before moving forward with pipelining, check these topics out to understand the concept better :
Memory Organization
Memory Mapping and Virtual Memory
Parallel Processing
Pipelining is a technique in which the execution of multiple instructions is overlapped. A pipeline is divided into stages, and these stages are connected to one another to form a pipe-like structure.
Instructions enter at one end and exit at the other.
Q3) How the speedup is related to number of stages of the pipeline? Justify
your answer.
Ans: Pipelining is one way of improving the overall processing performance of a processor.
This architectural approach allows the simultaneous execution of several instructions.
Pipelining is transparent to the programmer; it exploits parallelism at the instruction level by
overlapping the execution process of instructions. It is analogous to an assembly line where
workers perform a specific task and pass the partially completed product to the next worker.
For a pipeline with k stages, each taking one clock cycle, n instructions complete in
k + (n - 1) cycles, whereas a non-pipelined processor needs n*k cycles. The speedup is
therefore S = n*k / (k + (n - 1)), which approaches k as n becomes large. So the ideal
(maximum) speedup equals the number of pipeline stages. In practice the speedup is lower,
because the stages are never perfectly balanced, latch overhead is added at every stage
boundary, and hazards introduce stalls, so increasing the number of stages beyond a point
yields diminishing returns.
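The justification above can be checked numerically (a sketch; the function name is ours):

```python
def pipeline_speedup(k: int, n: int) -> float:
    """Speedup of a k-stage pipeline over a non-pipelined unit for n tasks.

    Non-pipelined: n * k cycles (each task uses all k stages serially).
    Pipelined:     k + (n - 1) cycles (the first task fills the pipe,
                   then one task completes per cycle).
    """
    return (n * k) / (k + n - 1)

# As n grows, the speedup approaches the number of stages k:
k = 5
for n in (1, 10, 100, 100_000):
    print(n, round(pipeline_speedup(k, n), 3))
```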
Ans:
SIMD vs MIMD
Ans: Data hazards occur when instructions that exhibit data dependence modify data in different stages of a
pipeline. Hazards cause delays in the pipeline. There are mainly three types of data hazards:
RAW hazard occurs when instruction J tries to read data before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R4 <- R2 + R3
WAR hazard occurs when instruction J tries to write data before instruction I reads it.
Eg:
I: R2 <- R1 + R3
J: R3 <- R4 + R5
WAW hazard occurs when instruction J tries to write output before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R2 <- R4 + R5
WAR and WAW hazards occur during the out-of-order execution of the instructions.
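The three cases can be summarized as set intersections over the registers each instruction reads and writes (a sketch; the function and argument names are ours, and I is assumed to come before J in program order):

```python
def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify data hazards between instruction I and a later instruction J.

    Each argument is a set of register names. Returns the subset of
    {"RAW", "WAR", "WAW"} that applies.
    """
    hazards = set()
    if i_writes & j_reads:   # J reads a register that I writes
        hazards.add("RAW")
    if i_reads & j_writes:   # J writes a register that I reads
        hazards.add("WAR")
    if i_writes & j_writes:  # J writes a register that I also writes
        hazards.add("WAW")
    return hazards

# I: R2 <- R1 + R3   J: R4 <- R2 + R3  -> RAW on R2
print(classify_hazards({"R1", "R3"}, {"R2"}, {"R2", "R3"}, {"R4"}))
```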
Ans: A cache miss is an event in which a system or application makes a request to retrieve data from a cache, but that
specific data is not currently in cache memory. Contrast this to a cache hit, in which the requested data is successfully
retrieved from the cache. A cache miss requires the system or application to make a second attempt to locate the data, this
time against the slower main database. If the data is found in the main database, the data is then typically copied into the
cache in anticipation of another near-future request for that same data. A cache miss occurs either because the data was
never placed in the cache, or because the data was removed ("evicted") from the cache, either by the caching system itself or
by an external application that specifically requested the eviction. Eviction by the caching system itself occurs when space
needs to be freed up to add new data to the cache, or when the time-to-live policy on the data has expired.
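The hit/miss/eviction flow described above can be sketched as a minimal look-aside cache (illustrative only; the class, its FIFO eviction policy, and the dict-as-database are our assumptions, not a real caching API):

```python
class Cache:
    """Toy look-aside cache in front of a slower backing store."""

    def __init__(self, backing_store, capacity=4):
        self.backing_store = backing_store   # the slower "main database"
        self.capacity = capacity
        self.data = {}                       # insertion-ordered in Python 3.7+
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.data:                      # cache hit
            self.hits += 1
            return self.data[key]
        self.misses += 1                          # cache miss
        value = self.backing_store[key]           # second, slower lookup
        if len(self.data) >= self.capacity:       # evict to free up space
            self.data.pop(next(iter(self.data)))  # FIFO eviction
        self.data[key] = value                    # copy in for future requests
        return value

store = {"a": 1, "b": 2}
cache = Cache(store)
cache.get("a")
cache.get("a")
print(cache.hits, cache.misses)   # 1 1
```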
An array is an indexed collection of data items. Although usage varies, a vector is typically a
one-dimensional array of operands that a vector processor can operate on with a single instruction; vectors are
sometimes referred to as "blocks" of computer data.
Vector and array processing technology is most often seen in high-traffic servers.
RISC vs CISC

RISC: emphasizes efficiency in cycles per instruction.
CISC: emphasizes efficiency in instructions per program.

RISC: very few instructions are present; generally fewer than 100.
CISC: a large number of instructions are present in the architecture.

RISC: no instruction has a long execution time, due to the very simple instruction set. Some early RISC machines did not even have an integer multiply instruction, requiring compilers to implement multiplication as a sequence of additions.
CISC: some instructions have long execution times, including instructions that copy an entire block from one part of memory to another and others that copy multiple registers to and from memory.

RISC: fixed-length instruction encodings are used. Example: in MIPS, all instructions are encoded as 4 bytes.
CISC: variable-length instruction encodings are used. Example: IA32 instruction sizes range from 1 to 15 bytes.

RISC: simple addressing formats are supported; only base-and-displacement addressing is allowed.
CISC: multiple formats are supported for specifying operands; a memory operand specifier can combine displacement, base, and index registers.

RISC: does not provide instructions that operate on arrays.
CISC: provides instructions that operate on arrays.

RISC: implementation artifacts are exposed to machine-level programs; some RISC machines do not allow specific instruction sequences.
CISC: implementation artifacts are hidden from machine-level programs; the ISA provides a clean abstraction between programs and how they get executed.

RISC: registers are used for procedure arguments and return addresses; memory references can be avoided by some procedures.
CISC: the stack is used for procedure arguments and return addresses.

RISC: no condition codes are used.
CISC: condition codes are used.
Q10) Identify the data hazard while executing the following instruction in DLX
pipeline. Draw the forwarding path to avoid the hazard.
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
Ans: All four instructions after ADD use the result of ADD (register R1). ADD writes R1 in the
WB stage, but SUB needs it in ID: this is a RAW data hazard.
The ADD instruction causes a hazard for the next three instructions because R1 is not written
back until after they read it. The hazard is avoided by forwarding: the ALU result is fed back
from the EX/MEM and MEM/WB pipeline registers directly to the ALU inputs, so SUB and AND
receive R1 without stalling. If the register file is written in the first half of a clock cycle
and read in the second half, OR reads the correct value from the register file, and XOR needs
no special handling.
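The forwarding decision for each dependent instruction can be tabulated as follows (a sketch assuming the classic 5-stage IF/ID/EX/MEM/WB pipeline with a split-cycle register file; the function name is ours):

```python
def r1_source(distance):
    """Where the consumer of R1 gets its value, given its distance
    (in instructions) after the producing ADD."""
    if distance == 1:
        return "forward from EX/MEM latch"        # SUB
    if distance == 2:
        return "forward from MEM/WB latch"        # AND
    if distance == 3:
        # ADD writes the register file in the first half of the cycle,
        # the consumer reads it in the second half of the same cycle.
        return "register file (write-before-read)"  # OR
    return "register file"                        # XOR and later

for dist, instr in enumerate(["SUB", "AND", "OR", "XOR"], start=1):
    print(instr, "->", r1_source(dist))
```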
(c) USB
A serial transmission format has been chosen for the USB because a serial bus
satisfies the low-cost and flexibility requirements. Clock and data information are
encoded together and transmitted as a single signal. Hence, there are no limitations
on clock frequency or distance arising from data skew. Therefore, it is possible to
provide a high data transfer bandwidth by using a high clock frequency. As pointed
out earlier, the USB offers three bit rates, ranging from 1.5 to 480 megabits/s, to suit
the needs of different I/O devices.
For example, if a hit takes 0.5 ns and happens 90% of the time, and a miss takes 10 ns
and happens 10% of the time, then on average you spend 0.9 x 0.5 = 0.45 ns in hits and
0.1 x 10 = 1.0 ns in misses, for an average access time of 1.45 ns.
The Average Memory Access Time equation (AMAT) has three components: hit time,
miss rate, miss penalty.
1) Hit time (H) is the time to hit in the cache.
2) Miss rate (MR) is the frequency of cache misses.
3) Miss penalty (MP) is the cost of a cache miss in terms of time.
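The equation can be checked numerically (a sketch; the function name is ours):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time: AMAT = H + MR * MP.

    Here miss_penalty is the *additional* time a miss costs beyond the
    cache lookup itself, so the hit time is paid on every access.
    """
    return hit_time + miss_rate * miss_penalty

# With H = 0.5 ns, MR = 0.1, MP = 10 ns:
print(amat(0.5, 0.1, 10))   # 1.5

# If instead hits and misses are treated as two alternative paths, the
# weighted average is 0.9 * 0.5 + 0.1 * 10 = 1.45 ns; the two conventions
# differ only in whether a miss also pays the hit-check time.
```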
Assignment -2
SpeedupMAX = 1 / ((1 - p) + (p / s))
SpeedupMAX = maximum performance gain, where p is the fraction of execution time taken by
the enhanced portion and s is the speedup of that portion.
Proof :-
Let the overall speedup be S, the old execution time be T, the new execution time be T',
the execution time taken by portion A (the portion that will be enhanced) be t, the
execution time taken by portion A after enhancing be t', the execution time taken by the
portion that won't be enhanced be tn, the fraction enhanced be f = t/T, and the speedup
of the enhanced portion be S' = t/t'. Then
T = tn + t, and T' = tn + t' = T(1 - f) + T*f/S',
so
S = T/T' = 1 / ((1 - f) + f/S'),
which is the formula above with p = f and s = S'.
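Amdahl's law can be evaluated directly (a sketch; the function name is ours):

```python
def amdahl_speedup(p, s):
    """Maximum overall speedup when a fraction p of execution time
    is enhanced by a factor s (Amdahl's law):
        Speedup_max = 1 / ((1 - p) + p / s)
    """
    return 1.0 / ((1.0 - p) + p / s)

# Enhancing half the program (p = 0.5) by 2x gives only ~1.33x overall:
print(round(amdahl_speedup(0.5, 2), 2))
# Even as s grows without bound, the speedup is capped at 1 / (1 - p):
print(round(amdahl_speedup(0.5, 1e9), 2))
```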
2-Explain Temporal Locality and Spatial Locality.
Temporal locality - the concept that a resource referenced at one point in
time will be referenced again sometime in the near future. Temporal locality refers to
the reuse of specific data and/or resources within a relatively small time duration.
Spatial locality - the concept that when a resource is referenced, resources at nearby
storage locations are likely to be referenced soon, as when a program steps through
consecutive elements of an array.
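Locality is a property of the access pattern, which can be illustrated with a 2-D traversal (a sketch; Python hides the real memory layout, so only the access order is meaningful here):

```python
N = 4
matrix = [[r * N + c for c in range(N)] for r in range(N)]

# Row-major traversal: consecutive accesses touch neighbouring elements
# of the same row -> good spatial locality in a row-major layout.
row_major = [matrix[r][c] for r in range(N) for c in range(N)]

# Column-major traversal: consecutive accesses jump between rows ->
# poor spatial locality in a row-major layout.
col_major = [matrix[r][c] for c in range(N) for r in range(N)]

print(row_major[:5])   # [0, 1, 2, 3, 4]
print(col_major[:5])   # [0, 4, 8, 12, 1]
```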
So, 200 tasks would need 200 + 199 = 399 clock cycles.
ADD: R2 <- R2 + M[313]
INC: R3 <- R3 + 1
STORE: M[314] <- R3