
Advanced Computer Architecture (ACA) Assignment

The document discusses differences between a memory cycle and instruction cycle. A memory cycle consists of four basic operations: fetching, decoding, executing, and storing. An instruction cycle involves fetching an instruction from memory and executing it. The document also discusses how pipelining improves processor performance by allowing simultaneous execution of instructions through overlapping stages, increasing throughput. It relates speedup to the number of pipeline stages, noting that more stages can provide more parallelism and thus higher speedup.


Assignment-1

Name: Sayed Aman Konen


Redg.: 1801227460
Q1) Differentiate between memory cycle and instruction
cycle.
Ans: For every instruction, a processor repeats a set of four basic operations that comprise a machine (memory) cycle: (1) fetching, (2) decoding, (3) executing, and, if necessary, (4) storing. The instruction cycle, by contrast, is the complete sequence in which one instruction is fetched from memory and then executed; it is repeated for every instruction the processor runs.

Machine Cycle

Machine Cycle For every instruction, a processor repeats a set of four basic operations, which comprise a
machine cycle: (1) fetching, (2) decoding, (3) executing, and, if necessary, (4) storing. Fetching is the process of
obtaining a program instruction or data item from memory. The term decoding refers to the process of
translating the instruction into signals the computer can execute. Executing is the process of carrying out the
commands. Storing, in this context, means writing the result to memory (not to a storage medium).

Instruction Cycle

The instruction cycle is the sequence in which one instruction is fetched from memory and then executed by the processor. It repeats continuously, instruction by instruction, as a program runs.

Q2) How do pipeline improves performance of a system?


Ans: Pipelining is the process of accumulating instructions from the processor through a pipeline. It allows storing and executing instructions in an orderly, overlapped fashion. It is also known as pipeline processing.

Before moving forward with pipelining, check these topics out to understand the concept better:

- Memory Organization
- Memory Mapping and Virtual Memory
- Parallel Processing

Pipelining is a technique where multiple instructions are overlapped during execution. Pipeline is
divided into stages and these stages are connected with one another to form a pipe like structure.
Instructions enter from one end and exit from another end.

Pipelining increases the overall instruction throughput.


In a pipelined system, each segment consists of an input register followed by a combinational circuit. The register holds the data and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.
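The register-plus-combinational-circuit structure described above can be sketched in Python. This is a minimal illustration (function names and stage operations are hypothetical; in real hardware the stages operate concurrently, one per clock):

```python
def run_pipeline(stages, instructions):
    """stages: list of functions (the 'combinational circuits').
    Returns the clock cycles needed to drain all instructions, plus results."""
    k, n = len(stages), len(instructions)
    # In steady state a new instruction enters every cycle, so the total
    # time is k cycles for the first instruction plus 1 for each of the rest.
    cycles = k + (n - 1)
    results = []
    for item in instructions:
        for stage in stages:          # each stage transforms the register contents
            item = stage(item)
        results.append(item)
    return cycles, results

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]   # 3 segments
cycles, out = run_pipeline(stages, [1, 2, 3, 4])
print(cycles)   # 3 + (4 - 1) = 6 cycles
print(out)      # [1, 3, 5, 7], i.e. (x + 1) * 2 - 3 for each input
```

The data transformation is applied sequentially here only to keep the sketch short; the cycle count is what the pipeline buys.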

Q3) How the speedup is related to number of stages of the pipeline? Justify
your answer.
Ans: Pipelining is one way of improving the overall processing performance of a processor. This architectural approach allows the simultaneous execution of several instructions. Pipelining is transparent to the programmer; it exploits parallelism at the instruction level by overlapping the execution process of instructions. It is analogous to an assembly line where workers perform a specific task and pass the partially completed product to the next worker.
The speedup is directly related to the number of stages. For n tasks on a k-stage pipeline with clock period tp, the pipelined time is (k + n - 1)tp, whereas an equivalent non-pipelined unit takes n k tp (assuming equal stage delays). The speedup is therefore

S = n k / (k + n - 1)

which tends to k as n becomes large. So, ideally, a pipeline with more stages provides more overlap and a higher maximum speedup, equal to the number of stages; in practice, latch overhead and hazards keep the achieved speedup below this bound.
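As a numerical check of the relationship, the ideal speedup of a k-stage pipeline for n tasks (assuming equal stage delays and no stalls) is nk / (k + n - 1):

```python
def pipeline_speedup(k, n):
    """Ideal speedup of a k-stage pipeline over a non-pipelined unit
    for n tasks, assuming equal stage delays and no hazards or stalls."""
    return (n * k) / (k + n - 1)

for k in (2, 4, 6, 8):
    print(k, round(pipeline_speedup(k, 1000), 2))
# As n grows large, the speedup for each k approaches k itself,
# i.e. the number of stages bounds the achievable speedup.
```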

Q4) What is the difference between SIMD and MIMD?

Ans:

SIMD vs MIMD

1. SIMD stands for Single Instruction, Multiple Data, while MIMD stands for Multiple Instruction, Multiple Data.
2. SIMD requires small or less memory, while MIMD requires more or larger memory.
3. The cost of SIMD is less than that of MIMD.
4. SIMD has a single decoder, while MIMD has multiple decoders.
5. SIMD uses latent or tacit synchronization, while MIMD uses accurate or explicit synchronization.

Q5) What is the difference between instruction hazard and


data hazard?

Ans: An instruction (control) hazard occurs when the pipeline cannot fetch the correct next instruction in time, typically because the outcome or target of a branch is not yet known, or because of an instruction fetch miss. A data hazard occurs when instructions that exhibit data dependence modify data in different stages of the pipeline. Both kinds of hazard cause delays (stalls) in the pipeline. There are mainly three types of data hazards:

1) RAW (Read after Write) [Flow/True data dependency]


2) WAR (Write after Read) [Anti-Data dependency]
3) WAW (Write after Write) [Output data dependency]

Let there be two instructions I and J, such that J follow I. Then,

- RAW hazard occurs when instruction J tries to read data before instruction I writes it.
  Eg:
  I: R2 <- R1 + R3
  J: R4 <- R2 + R3
- WAR hazard occurs when instruction J tries to write data before instruction I reads it.
  Eg:
  I: R2 <- R1 + R3
  J: R3 <- R4 + R5
- WAW hazard occurs when instruction J tries to write output before instruction I writes it.
  Eg:
  I: R2 <- R1 + R3
  J: R2 <- R4 + R5
WAR and WAW hazards occur during the out-of-order execution of the instructions.
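The three hazard types can be detected mechanically from each instruction's destination and source registers. A minimal sketch (the tuple representation of instructions is an illustrative assumption):

```python
def classify_hazards(i, j):
    """i, j: (dest, srcs) pairs for two instructions, with j following i.
    Returns the set of data hazards between them."""
    hazards = set()
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    if i_dest in j_srcs:
        hazards.add("RAW")       # j reads what i writes (true dependency)
    if j_dest in i_srcs:
        hazards.add("WAR")       # j writes what i reads (anti-dependency)
    if j_dest == i_dest:
        hazards.add("WAW")       # both write the same register (output dependency)
    return hazards

# I: R2 <- R1 + R3 ; J: R4 <- R2 + R3  -> RAW on R2
print(classify_hazards(("R2", {"R1", "R3"}), ("R4", {"R2", "R3"})))  # {'RAW'}
```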

Q6) What do you mean by cache miss?

Ans: A cache miss is an event in which a system or application makes a request to retrieve data from a cache, but that specific data is not currently in cache memory. Contrast this with a cache hit, in which the requested data is successfully retrieved from the cache. A cache miss requires the system or application to make a second attempt to locate the data, this time against the slower main data store. If the data is found there, it is typically copied into the cache in anticipation of a near-future request for the same data.
A cache miss occurs either because the data was never placed in the cache, or because the data was removed ("evicted") from the cache, either by the caching system itself or by an external application that specifically requested the eviction. Eviction by the caching system occurs when space needs to be freed up to admit new data, or when the time-to-live policy on the data has expired.
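The hit/miss behaviour described above can be sketched with a tiny look-aside cache. The class name and API below are hypothetical, not a real caching library:

```python
class TinyCache:
    """Illustrative look-aside cache with hit/miss counters."""
    def __init__(self, backing):
        self.backing = backing        # the slower main data store
        self.store = {}
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1            # cache hit: served from the cache
            return self.store[key]
        self.misses += 1              # cache miss: fall back to backing store
        value = self.backing[key]
        self.store[key] = value       # copy in, anticipating a repeat request
        return value

db = {"a": 1, "b": 2}
c = TinyCache(db)
c.get("a"); c.get("a"); c.get("b")
print(c.hits, c.misses)   # 1 hit ("a" the second time), 2 misses
```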

Q7) Differentiate between array and vector processors.


Ans: In computing, a vector processor or array processor is a central processing unit (CPU) that implements an
instruction set containing instructions that operate on one-dimensional arrays of data called vectors.
Vector and array processing are essentially the same with small differences.

An array is made up of indexed collections of information, addressed by one or more indices. Though an array can, in rare cases, have only one index, the term vector here suggests an array addressed by at least two indices. Vectors are sometimes referred to as "blocks" of computer data.

Vector and array processing technology are most often seen in high-traffic servers.

Q8) Differentiate between RISC and CISC architectures.


Ans:

RISC vs CISC

- RISC emphasizes efficiency in cycles per instruction; CISC emphasizes efficiency in instructions per program.
- RISC has very few instructions, generally fewer than 100; CISC has a large number of instructions in the architecture.
- RISC has no instruction with a long execution time, owing to its very simple instruction set; some early RISC machines did not even have an integer multiply instruction, requiring compilers to implement multiplication as a sequence of additions. CISC has some instructions with long execution times, including instructions that copy an entire block from one part of memory to another and others that copy multiple registers to and from memory.
- RISC uses fixed-length instruction encodings; for example, in MIPS all instructions are encoded as 4 bytes. CISC uses variable-length encodings; for example, IA32 instruction size can range from 1 to 15 bytes.
- RISC supports only simple addressing formats, typically base-plus-displacement addressing. CISC supports multiple formats for specifying operands; a memory operand specifier can combine displacement, base, and index registers.
- RISC does not directly support complex (array-style) memory operands; CISC does.
- RISC exposes implementation artifacts to machine-level programs, and some RISC machines prohibit particular instruction sequences. CISC hides implementation artifacts from machine-level programs; the ISA provides a clean abstraction between programs and how they get executed.
- RISC uses registers for procedure arguments and return addresses, so memory references can be avoided by some procedures. CISC uses the stack for procedure arguments and return addresses.
- RISC (in its early forms) uses no condition codes; CISC uses condition codes.

Q9) Give an account of processor performance.


Ans: Processor performance is usually measured using benchmark programs. There are many
benchmark programs available and one should exercise care when comparing the performance of
various processors as the performance depends on many external factors, such as the efficiency of
the compiler used, and the type of operation performed for the measurement.
Many attempts were made in the past to measure the performance of a processor and quote it as
a single number. For example, MOPS, MFLOPS, Dhrystone, DMIPS, BogoMIPS, and so on.
Nowadays, CoreMark is one of the most commonly used benchmark programs for indicating processor performance.

Q10) Identify the data hazard while executing the following instruction in DLX
pipeline. Draw the forwarding path to avoid the hazard.
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

Ans: All of the instructions after the ADD use the result of the ADD. The ADD writes R1 in the WB stage, but the SUB needs it in its ID stage, so this is a data (RAW) hazard.
The ADD instruction causes a hazard in the next three instructions (SUB, AND, OR) because the register is not written until after those three read it. The hazard is avoided by forwarding paths that route the ALU result from the EX/MEM and MEM/WB pipeline registers directly back to the ALU inputs, so each dependent instruction receives R1 without stalling.
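A rough way to see which of the four dependent instructions need a forwarding path, assuming the classic 5-stage DLX timing (IF ID EX MEM WB, one instruction issued per clock, and no split-cycle register-file trick):

```python
# ADD writes R1 in WB (cycle 5); the instruction n places later reads its
# registers in ID (cycle 2 + n). Any read at or before the write cycle
# needs the value forwarded from the EX/MEM or MEM/WB pipeline registers.
producer_wb_cycle = 5                       # ADD's WB stage
consumers = ["SUB", "AND", "OR", "XOR"]     # all read R1
needs = {}
for n, name in enumerate(consumers, start=1):
    id_cycle = 2 + n                        # consumer's ID (register-read) stage
    needs[name] = id_cycle <= producer_wb_cycle
    print(name, "forwarded" if needs[name] else "reads from the register file")
# SUB, AND and OR need forwarding; XOR reads R1 from the register file normally.
```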

Q11) Give an account of hierarchical memory system with


respect to speed and size.
Ans: In a hierarchical memory system, the entire addressable memory space is available in the largest, slowest memory, and incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor. This hierarchical organization of memory works primarily because of the principle of locality. There are basically two different types of locality: temporal and spatial.
Temporal and spatial locality ensure that nearly all references can be found in the smaller memories, which at the same time gives the processor the illusion of a single large, fast memory. As we move away from the processor, the speed decreases, the cost per byte decreases, and the size increases. The registers and cache memories are closest to the processor, satisfying its speed requirements; the main memory comes next; and last of all comes secondary storage, which satisfies the capacity requirements. The registers, which are part of the CPU itself, provide at most a few hundred bytes of storage. The first-level cache holds a few kilobytes, the second-level cache a few hundred kilobytes, and capacity grows to a few megabytes at the third level of cache.

1-Write short notes on the following:


(a) Branch prediction in pipeline
Branch prediction is an approach in computer architecture that attempts to mitigate the cost of branches in pipelined CPUs. Rather than waiting for a branch to be resolved, the processor guesses its outcome and keeps fetching along the predicted path so the pipeline stays full; a wrong guess is discarded and fetching restarts from the correct target. Branch prediction is typically implemented in hardware using a branch predictor. It should not be confused with branch predication, a different technique in which instructions are executed conditionally under a predicate, eliminating the branch altogether.
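As an illustration of hardware branch prediction (not predication), here is a sketch of the classic 2-bit saturating-counter predictor; the class and its initial state are illustrative choices:

```python
class TwoBitPredictor:
    """Classic 2-bit saturating-counter branch predictor (sketch).
    States 0-1 predict not-taken, states 2-3 predict taken."""
    def __init__(self):
        self.state = 2   # start weakly taken (an arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate the counter toward the actual outcome.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]   # a loop branch that exits once
correct = 0
for t in outcomes:
    correct += (p.predict() == t)
    p.update(t)
print(correct, "of", len(outcomes))   # 4 of 5: only the single exit mispredicts
```

The 2-bit counter tolerates one anomalous outcome before flipping its prediction, which is why loop branches predict well.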

(b) Super scalar Vs Super pipeline architecture


Super-pipelining attempts to increase performance by reducing the clock cycle time.
It achieves that by making each pipeline stage very shallow, resulting in a large
number of pipe stages. A shorter clock cycle means a faster clock. As long as your
cycles per instruction (CPI) doesn’t change, a faster clock means better performance.
Super-pipelining works best with code that doesn’t branch often, or has easily
predicted branches.
Superscalar attempts to increase performance by executing multiple instructions in
parallel. If you can issue more instructions every cycle—without decreasing clock rate
—then your CPI decreases, therefore increasing performance.
Superscalar breaks into two broad flavors: In-order and out-of-order. In-order
superscalar mainly provides benefit to code with instruction-level parallelism among
a small window of consecutive instructions. Out-of-order superscalar allows the
pipeline to find parallelism across larger windows of code, and to hide latencies
associated with long-running instructions. (Example: load instructions that miss the
cache.) In-order vs. out-of-order form a continuum: Some processors have in-order
issue, but out-of-order completion, for example.

(c) USB
A serial transmission format has been chosen for the USB because a serial bus
satisfies the low-cost and flexibility requirements. Clock and data information are
encoded together and transmitted as a single signal. Hence, there are no limitations
on clock frequency or distance arising from data skew. Therefore, it is possible to
provide a high data transfer bandwidth by using a high clock frequency. As pointed
out earlier, the USB offers three bit rates, ranging from 1.5 to 480 megabits/s, to suit
the needs of different I/O devices.

(d) Explain need of hierarchy in memory organization.


In the Computer System Design, Memory Hierarchy is an enhancement to organize
the memory such that it can minimize the access time. The Memory Hierarchy was
developed based on a program behavior known as locality of references.

(e) Differentiate Temporal Locality and Spatial Locality.

Spatial Locality vs Temporal Locality

1. In spatial locality, instructions near a recently executed instruction are likely to be executed soon; in temporal locality, a recently executed instruction is likely to be executed again very soon.
2. Spatial locality refers to the tendency of execution to involve memory locations near those already used; temporal locality refers to the tendency of execution to reuse memory locations that have been accessed recently.
3. Spatial locality is also known as locality in space; temporal locality is also known as locality in time.
4. Spatial locality refers only to data items that are close together in memory; temporal locality refers to repeated references to the same data within a short time span.
5. With spatial locality, new (adjacent) data comes into execution each time; with temporal locality, the same useful data comes back into execution.
6. Example: data elements accessed in an array, where each time a different (or just the next) element is accessed (spatial); data elements accessed in loops, where the same elements are accessed multiple times (temporal).

(f)Formulate Average Memory Access time (AMAT). Briefly


discuss each component of the AMAT.
A single-level cache is pretty easy to model mathematically. Each access is either a hit or a miss, so average memory access time (AMAT) is:

AMAT = time spent in hits + time spent in misses


= hit rate * hit time + miss rate * miss time

For example, if a hit takes 0.5 ns and happens 90% of the time, and a miss takes 10 ns and happens 10% of the time, on average you spend 0.45 ns in hits and 1.0 ns in misses, for a total of 1.45 ns average access time.
The Average Memory Access Time equation (AMAT) has three components: hit time,
miss rate, miss penalty.  
1) Hit time (H) is the time to hit in the cache.
2) Miss rate (MR) is the frequency of cache misses,
3) Miss penalty (MP) is the cost of a cache miss in terms of time. 
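The AMAT formula can be computed directly. A small sketch using the weighted-average form from above (the equivalent textbook form is hit time + miss rate × miss penalty):

```python
def amat(hit_rate, hit_time, miss_time):
    """Average memory access time as a weighted average of hit and miss times."""
    return hit_rate * hit_time + (1 - hit_rate) * miss_time

# The worked example above: 0.5 ns hits 90% of the time, 10 ns misses 10%.
print(round(amat(0.90, 0.5, 10.0), 2))   # 0.45 + 1.0 = 1.45 ns
```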

(g) Associative, Set associative and Direct mapping


Cache Mapping:
There are three different types of mapping used for the purpose of cache memory
which are as follows: Direct mapping, Associative mapping, and Set-Associative
mapping. These are explained below.
1) Direct Mapping –
The simplest technique, known as direct mapping, maps each block of main
memory into only one possible cache line. Direct mapping assigns each memory
block to a specific line in the cache; if that line is already occupied when a
new block needs to be loaded, the old block is evicted. The memory address is
split into fields: a tag field and an index (line) field, plus a word offset
within the block. The index selects the cache line, and the cache stores the
tag alongside the data so a lookup can confirm which block is resident.
Direct mapping's performance is directly proportional to the hit ratio.
2) Associative Mapping –
In this type of mapping, the associative memory is used to store content and
addresses of the memory word. Any block can go into any line of the cache. This
means that the word id bits are used to identify which word in the block is
needed, but the tag becomes all of the remaining bits. This enables the
placement of any word at any place in the cache memory. It is considered to be
the fastest and the most flexible mapping form.
3) Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping where the
drawbacks of direct mapping are removed. Set associative addresses the
problem of possible thrashing in the direct mapping method. It does this by
saying that instead of having exactly one line that a block can map to in the
cache, we will group a few lines together creating a set. Then a block in memory
can map to any one of the lines of a specific set. Set-associative mapping allows
that each word that is present in the cache can have two or more words in the
main memory for the same index address. Set associative cache mapping
combines the best of direct and associative cache mapping techniques.
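All three mapping schemes rely on splitting the address into tag, index, and offset fields. A sketch of that decomposition, assuming power-of-two sizes (for a direct-mapped cache each set is a single line; for a fully associative cache there is effectively one set and the tag absorbs the index bits):

```python
def split_address(addr, block_size, num_sets):
    """Split a byte address into (tag, set index, block offset) — the
    decomposition used by direct-mapped and set-associative caches."""
    offset = addr % block_size        # byte within the block
    block = addr // block_size        # block number in main memory
    index = block % num_sets          # which set (or line) the block maps to
    tag = block // num_sets           # remaining bits, stored for comparison
    return tag, index, offset

# 64-byte blocks, 128 sets: address 0x12345 ->
print(split_address(0x12345, 64, 128))   # (9, 13, 5)
```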

Assignment -2

1-State Amdahl’s law. Derive the formula for Overall Speedup.


In computer architecture, Amdahl's law (or Amdahl's argument) is a formula which
gives the theoretical speedup in latency of the execution of a task at
fixed workload that can be expected of a system whose resources are improved.

Amdahl's Law can be expressed mathematically as follows:

SpeedupMAX = 1 / ((1 - p) + (p / s))

where p is the fraction of the task whose performance is to be improved, s is the performance gain factor of that fraction after the enhancement, and SpeedupMAX is the maximum overall performance gain.

Proof:
Let the old execution time be T, and let the portion to be enhanced account for fraction f of it, so it takes fT before the enhancement while the remaining (1 - f)T is unaffected. If the enhanced portion is sped up by a factor S', the new execution time is

T' = (1 - f)T + (fT / S')

The overall speedup is therefore

S = T / T' = T / ((1 - f)T + fT / S') = 1 / ((1 - f) + f / S')

which is Amdahl's law, with f and S' playing the roles of p and s above.
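The resulting formula is easy to evaluate. A small sketch, including the limiting behaviour:

```python
def amdahl_speedup(p, s):
    """Overall speedup when fraction p of the work is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

print(round(amdahl_speedup(0.75, 4), 3))   # 2.286
# Even as s -> infinity, the overall speedup is bounded by 1/(1 - p),
# here 1/0.25 = 4: the serial fraction dominates eventually.
```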
2-Explain Temporal Locality and Spatial Locality.
Temporal locality - The concept that a resource that is referenced at one point in
time will be referenced again sometime in the near future. Temporal locality refers to
the reuse of specific data and/or resources within a relatively small time duration.

Spatial locality - The concept that the likelihood of referencing a resource is higher if a resource near it was just referenced. Spatial locality (also termed data locality) refers to the use of data elements within relatively close storage locations.

3-Define the term Speedup enhanced and Fraction Enhanced.


Speedup enhanced – The improvement gained by the enhanced execution mode;
that is, how much faster the task would run if the enhanced mode were used for the
entire program. For example – If the enhanced mode takes, say 3 seconds for a
portion of the program, while it is 6 seconds in the original mode, the improvement
is 6/3. This value is Speedup enhanced.
Speedup Enhanced is always greater than 1. The overall Speedup is the ratio of the execution time before the enhancement to the execution time after it.
Fraction enhanced – The fraction of the computation time in the original computer
that can be converted to take advantage of the enhancement. For example- if 10
seconds of the execution time of a program that takes 40 seconds in total can use
an enhancement , the fraction is 10/40. This obtained value is Fraction Enhanced.
Fraction enhanced is always less than 1.

4-In certain scientific computations it is necessary to perform


the arithmetic operation (Ai + Bi) (Ci + Di) with a stream of
numbers. Specify a pipeline configuration to carry out this
task. List the contents of all registers in the pipeline for i=1
through 6.
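One common three-segment configuration for (Ai + Bi)(Ci + Di) (register names R1–R7 and the sample operand streams below are illustrative assumptions) uses segment 1 to load Ai, Bi, Ci, Di into R1–R4, segment 2 to form R5 <- R1 + R2 and R6 <- R3 + R4 with two adders, and segment 3 to form R7 <- R5 * R6 with a multiplier. A sketch that prints the register contents at each clock:

```python
A = [1, 2, 3, 4, 5, 6]; B = [10] * 6
C = [1, 2, 3, 4, 5, 6]; D = [0] * 6

R1 = R2 = R3 = R4 = R5 = R6 = R7 = None
for clock in range(1, len(A) + 3):               # 6 operand sets drain in 6 + 2 clocks
    # Update back-to-front so each segment consumes last clock's values.
    R7 = R5 * R6 if R5 is not None else None     # segment 3: multiplier
    if R1 is not None:
        R5, R6 = R1 + R2, R3 + R4                # segment 2: two adders
    else:
        R5 = R6 = None
    if clock <= len(A):
        i = clock - 1
        R1, R2, R3, R4 = A[i], B[i], C[i], D[i]  # segment 1: load operands
    else:
        R1 = R2 = R3 = R4 = None                 # input stream exhausted
    print(clock, R1, R2, R3, R4, R5, R6, R7)
# The final R7 is (6 + 10)(6 + 0) = 96; earlier clocks show the
# intermediate register contents for i = 1 through 6.
```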
5-Draw a space-time diagram for a six segmented pipeline
showing the time it takes to process eight tasks.
The 1st task needs 6 clock cycles to come out of the pipeline, and then each of the remaining 7 tasks needs 1 more clock cycle = 13 clock cycles in total. The space-time diagram shows the 8 tasks staggered one cycle apart across the 6 segments, occupying clock cycles 1 through 13.

(By the same reasoning, 200 tasks would need 6 + 199 = 205 clock cycles.)

6-Determine the number of clock cycles that it takes to process


200 tasks in a six-segment pipeline.
Let there be n tasks to be performed in a k-segment pipelined processor.
The first instruction takes k cycles to exit the pipeline, but each of the
other n - 1 instructions takes only 1 cycle more, i.e., a total of n - 1
additional cycles. So, to perform n instructions in a pipelined processor,
the time taken is k + (n - 1) cycles.
So, in our case, the number of clock cycles = 6 + (200 - 1) = 205.
7- A non-pipeline system takes 50 ns to process a task. The
same task can be processed in a six-segment pipeline with
clock cycles of 10 ns. Determine the speed up ratio of the
pipeline for 100 tasks. What is the maximum speed up that
can be achieved?

Total number of tasks n = 100

Time taken by the non-pipelined unit per task, Tn = 50 ns

Time for 100 tasks = n × Tn = 100 × 50 = 5000 ns

Number of segment pipeline "K" = 6

Time period of 1 clock cycle = 10 ns

Total time required = ( k + n - 1)tp


                            = ( 6 + 100 - 1)10
                            = 1050 ns

Speed-up ratio S = 5000/1050 = 4.76

The maximum speedup, approached as the number of tasks grows large, is Tn/tp = 50/10 = 5.
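The same computation as a small sketch (parameter names are illustrative):

```python
def pipeline_times(n, k, tp, tn):
    """Non-pipelined vs pipelined time for n tasks.
    tn: non-pipelined time per task (ns); tp: pipeline clock (ns); k: stages."""
    t_nonpipe = n * tn
    t_pipe = (k + n - 1) * tp
    return t_nonpipe, t_pipe, t_nonpipe / t_pipe

t1, t2, s = pipeline_times(n=100, k=6, tp=10, tn=50)
print(t1, t2, round(s, 2))   # 5000 1050 4.76
print(50 / 10)               # maximum speedup tn/tp = 5.0, the large-n limit
```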

8-Formulate a six-segment instruction pipeline for a computer.


Specify the operations to be performed in each segment
9-Consider the four instructions in the following program.
Suppose that the first instruction starts from step 1 in the
pipeline. Specify what operations are performed in the four
segments during step 4.

10- LOAD R1 M [312]

ADD R2 R2 + M[313]
INC R3 R3 + 1
STORE M [314] R3



Segments of a four-segment instruction pipeline are:
FI: fetches an instruction
DA: decodes the instruction and calculate the effective address
FO: fetches the operand
EX: executes the instruction

Timing of instruction pipeline for these four instructions is:


Step:    1   2   3   4   5   6   7
LOAD     FI  DA  FO  EX
ADD          FI  DA  FO  EX
INC              FI  DA  FO  EX
STORE                FI  DA  FO  EX
Operations performed in pipeline during step 4 are:
Segment EX: executing the Load instruction
Segment FO: fetching the operand from memory for ADD instruction
Segment DA: decoding INC instruction
Segment FI: fetching the STORE instruction
In step 4 the LOAD instruction has already completed the first three segments (FI,
DA, FO) and is in EX pipeline segment, i.e. it is going to load R1 with the contents of
memory at location 312.
At that time the ADD instruction has already completed the first two segments (FI,
DA) and is about to fetch its operand from memory location 313.
The INC instruction has completed the first (FI) pipeline segment and is about to be decoded.
The STORE instruction is in the FI pipeline segment, i.e. it is about to be fetched from memory.
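The stage occupied by each instruction at any step follows directly from the diagonal pattern of the timing table. A sketch (assuming one instruction enters per clock and no stalls):

```python
def stage_at(instr_index, step, stages=("FI", "DA", "FO", "EX")):
    """Which segment instruction instr_index (0-based) occupies at a given
    1-based step; None if it has not entered or has already left the pipe."""
    pos = step - 1 - instr_index
    return stages[pos] if 0 <= pos < len(stages) else None

program = ["LOAD", "ADD", "INC", "STORE"]
for i, name in enumerate(program):
    print(name, stage_at(i, 4))
# LOAD EX, ADD FO, INC DA, STORE FI — the step-4 column of the table.
```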
