Computer Architecture Unit 1


Unit 1
INTRODUCTION
Computer architecture is a specification detailing how a set of software and
hardware technology standards interact to form a computer system or
platform. It refers to how a computer system is designed and what
technologies it is compatible with.
REVIEW OF BASIC COMPUTER ARCHITECTURE
The main components in a typical computer system are:
Processor : The central processor of a computer is also known as the
CPU, or Central Processing Unit. This processor handles all the basic system
instructions, such as processing mouse and keyboard input and running
applications.
Example: Intel, Advanced Micro Devices (AMD), Celeron, Pentium, Core,
Sempron, Athlon, Phenom.
Memory : Memory stores data and instructions, much like a human
brain. It is the storage space in the computer where the data to be
processed and the instructions required for processing are stored.
Example: Primary memory [RAM->volatile, ROM->non-volatile], Secondary
memory [hard drive, CD].
Input/Output devices : An input device sends information to a
computer system for processing, and an output device reproduces or
displays the result of that processing.
Example: Input device= mouse, keyboard etc. Output device= printer,
monitor etc.
Communication channels : A communication channel refers either
to a physical transmission medium such as a wire, or to a logical connection
over a multiplexed medium such as a radio channel in telecommunications
and computer networking. Communicating data from one location to
another requires some form of pathway or medium.

Von Neumann Architecture : The Von Neumann design consists of a
Control Unit, an Arithmetic & Logic Unit, Registers and Input/Output.
It is based on the stored-program concept, where instruction
data and program data are stored in the same memory.

QUANTITATIVE TECHNIQUES IN COMPUTER DESIGN


The most important and pervasive principle of computer design
is to make the common case fast. In applying this simple principle, we have
to decide what the frequent case is and how much performance can be
improved by making the case faster.
A fundamental law, called Amdahl’s Law can be used to quantify this
principle.
Amdahl’s Law: The performance gain that can be obtained by improving
some portion of a computer can be calculated using Amdahl’s Law.
Amdahl’s Law states that the performance improvement to be gained
from using some faster mode of execution is limited by the fraction of the
time the faster mode can be used.

Amdahl’s Law defines the speedup that can be gained by using a
particular feature.
Speedup= Performance for entire task using enhancement when
possible / Performance for entire task without using enhancement
Speedup=Serial execution time/Parallel execution time
Example:
(Q) If a serial application executes in 6720 seconds and the corresponding
parallel application runs in 126.7 seconds (using 64 threads and cores), find
the speedup of the parallel application.
Soln: Speedup=serial execution time/Parallel execution time
=6720/126.7
=53.038
=53x
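The calculation above can be sketched in a few lines of Python (a minimal illustration; the function name is ours, not from the text):

```python
# Speedup of a parallel run relative to a serial run.
def speedup(serial_time, parallel_time):
    return serial_time / parallel_time

# Values from the example above: 6720 s serial, 126.7 s parallel.
print(speedup(6720.0, 126.7))  # roughly 53x
```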
Speedup tells us how much faster a task will run using the machine with
the enhancement as opposed to the original machine. Amdahl’s Law gives us
a quick way to find the speedup from some enhancement, which depends on
two factors:

1. The fraction of the computation time in the original machine that can
be converted to take advantage of the enhancement
Example: If 20 seconds of the execution time of a program that takes 60
seconds in total can use an enhancement, the fraction is 20/60. This value,
which we will call Fraction_enhanced, is always less than or equal to 1.

2. The improvement gained by the enhanced execution mode; that is, how
much faster the task would run if the enhanced mode were used for the
entire program. This value is the time of the original mode over the
time of the enhanced mode.

Example: If the enhanced mode takes 2 seconds for some portion of the
program that can completely use the mode, while the original mode took
5 seconds for the same portion, the improvement is 5/2.
We will call this value, which is always greater than 1, Speedup_enhanced.
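Putting the two factors together, Amdahl's Law gives the overall speedup as 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). A small Python sketch using the fractions from the two examples above (20/60 and 5/2):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup = 1 / ((1 - F) + F / S), per Amdahl's Law.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Fraction 20/60 of the program enhanced, enhanced mode 5/2 = 2.5x faster.
print(amdahl_speedup(20 / 60, 5 / 2))  # approximately 1.25 overall
```

Note that even though the enhanced portion runs 2.5x faster, the overall speedup is only about 1.25x, because two-thirds of the program is untouched.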

CPU Performance Equation: Essentially all computers are
constructed using a clock running at a constant rate. These discrete time
events are called ticks, clock ticks, clock periods, clocks, cycles, or clock
cycles. Computer designers refer to the time of a clock period by its duration
(e.g., 1 ns) or by its rate (e.g., 1 GHz). CPU time for a program can then be
expressed as:
CPU Time = CPU Clock Cycles for a Program X Clock Cycle Time
In addition to the number of clock cycles needed to execute a program, we
can also count the number of instructions executed—the instruction path
length or instruction count (IC). If we know the number of clock cycles and
the instruction count we can calculate the average number of clock cycles
per instruction (CPI).
CPI = CPU Clock Cycles for a Program / Instruction Count
CPU time = Instruction Count X Cycles per Instruction X Clock Cycle
Time
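The CPU performance equation can be expressed directly in code (a small sketch; the values below are hypothetical, chosen only to illustrate the units):

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    # CPU time = Instruction Count x CPI x Clock Cycle Time
    return instruction_count * cpi * clock_cycle_time

# Hypothetical program: 1 billion instructions, CPI of 2, 1 ns clock (1 GHz).
print(cpu_time(10**9, 2.0, 1e-9))  # 2.0 seconds
```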
Clock cycle time: The speed of a computer processor, or CPU, is
determined by the clock cycle, which is the amount of time between
two pulses of an oscillator. Generally, the higher the number of pulses per
second, the faster the computer processor is able to process
information.
• Clock period or Cycle time (Tc): The time between successive rising
edges of the clock signal.
• Clock Frequency (Fc): Reciprocal of clock time.
Fc =1/Tc
Increasing the clock frequency increases the work that a digital
system can accomplish per unit time.

A clock speed of 3.5 GHz to 4.0 GHz is generally considered a good clock
speed for gaming.
CPI(Clock cycles Per Instruction): It is one aspect of a
processor’s performance: the average number of clock cycles per instruction
for a program or program fragment.
CPI = ∑i (ICi × CCi) / IC
where ICi = number of instructions of a given instruction type i,
IC = ∑i ICi is the total instruction count, and
CCi = clock cycles for instruction type i.

There are five types of instructions in multi-cycle MIPS:
i. Load (5 cycles)
ii. Store (4 cycles)
iii. R-type (4 cycles)
iv. Branch (3 cycles)
v. Jump (3 cycles)

Example:
(Q) If a program has 50% load instructions, 25% store instructions, 15% R-
type instructions, 8% branch instructions and 2% jump instructions, find the CPI.
Soln: CPI= {(5*50) +(4*25) +(4*15) +(3*8) +(3*2)}/100
= (250+100+60+24+6)/100
= 440/100
= 4.4
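The same weighted-average calculation in Python (the instruction mix and cycle counts are taken from the example above):

```python
def weighted_cpi(mix):
    # mix: list of (percentage_of_instructions, cycles_per_instruction) pairs
    total_share = sum(share for share, _ in mix)
    return sum(share * cycles for share, cycles in mix) / total_share

# 50% loads (5 cycles), 25% stores (4), 15% R-type (4), 8% branches (3), 2% jumps (3)
mips_mix = [(50, 5), (25, 4), (15, 4), (8, 3), (2, 3)]
print(weighted_cpi(mips_mix))  # 4.4
```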
Instruction Count: The total number of instructions that get executed for a
particular task, algorithm, workload, or program is referred to as the
instruction count.
The instruction count forms the basis for various performance metrics
of the microprocessor, such as Instructions Per Cycle (IPC) and Cycles Per
Instruction (CPI).

MEASURING & REPORTING PERFORMANCE


Two ways to measure performance are:
1. The speed measure: which measures how fast a computer
completes a single task.
Ex: SPECint95 is used for comparing the ability of a computer to complete
single tasks.
2. The throughput measure: which measures how many tasks a
computer can complete in a certain amount of time.
The computer user is interested in reducing response time (the time
between the start and the completion of an event), also referred to as
execution time. The manager of a large data processing center may be
interested in increasing throughput (the total amount of work done in a
given time).
Even execution time can be defined in different ways depending on what we
count. The most straightforward definition of time is called wall-clock time,
response time, or elapsed time, which is the latency to complete a task,
including disk accesses, memory accesses, input/output activities, and
operating system overhead.

Measuring performances
•Response time: how long does it take to execute a certain application / a
certain amount of work.
•Given two platforms X and Y, X is n times faster than Y for a certain
application, if
n=Timey/Timex
•Performance of X is n times faster than the performance of Y, if
n=Timey/Timex
=(1/Perfy)/(1/Perfx)
=Perfx/Perfy
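The relative-performance definition above can be sketched as follows (the times are hypothetical, chosen only to illustrate the ratio):

```python
def times_faster(time_y, time_x):
    # X is n times faster than Y when n = Time_Y / Time_X = Perf_X / Perf_Y.
    return time_y / time_x

# Hypothetical: Y takes 30 s, X takes 10 s for the same application.
print(times_faster(30.0, 10.0))  # 3.0 -> X is 3 times faster than Y
```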
Timing how long an application takes

• Wall-clock time / elapsed time: the time to complete a task as seen by
the user. It might include operating system overhead or interference
from other applications. [If other applications start running on their
own while we work on one application, the measured time will vary.]
• CPU time: does not include time slices introduced by external
sources (e.g. running other applications). CPU time can be further
divided:
➢ User CPU time: CPU time spent in the program.
➢ System CPU time: CPU time spent in the OS performing tasks
requested by the program.

Choosing the right programs to test a system


• Real application: Use the target application for the machine in
order to evaluate its performance. (Best solution if application
available)
• Modified application: Real application has been modified in order
to measure a certain feature. (Remove I/O parts of an application
in order to focus on the CPU performance)
• Application Kernels: Focus on the most time-consuming parts of
an application. (E.g.: Extract the matrix-vector multiply of an
application, since this uses 80% of the user CPU time.)
• Toy benchmarks: Very small code segments which produce a
predictable result. (Eg: Sieve of Eratosthenes, quicksort)
• Synthetic benchmarks: Try to match the average frequency of
operations and operands for a certain program. (The code does not
do any useful work.) [When more than one operation is used in a
program, the average frequency of the operations is calculated.]

SPEC
• The Standard Performance Evaluation Corporation (SPEC) is a non-
profit corporation formed to establish, maintain and endorse(support)
a standardized set of relevant benchmarks that can be applied to the
newest generation of high- performance computers.

• SPEC develops suites of benchmarks and also reviews and publishes
submitted results from its member organizations and other
benchmark licensees.

Why do we need benchmarks?


• Identify problems: Measure machine properties.
• Time evaluation: Verify that we make progress.
• Coverage: Help vendors to have representative codes, increase
competition by transparency, and drive future development.
• Relevance: Help consumers to choose the right computer.
Reporting results
• SPEC produces a minimal set of representative numbers:
➢ Reduces complexity to understand correlations.
➢ Eases comparison of different systems.
➢ Loss of information.
• Results have to be compliant to the SPEC benchmarking rules in order
to be approved as an official SPEC report.
➢ All components have to be available within 3 months of the
publication. (including a runtime environment for C/C++/Fortran
applications)
➢ Usage of SPEC tools for compiling and reporting.
➢ Each individual benchmark has to be executed at least three
times.
➢ Verification of the benchmark output.
➢ A maximum of four optimization flags are allowed for the base
run. (including preprocessor and link directives)
➢ Disclosure report containing all relevant data has to be available.

PIPELINING
Definition
▪ Pipelining is the process of arrangement of hardware elements of CPU
such that its overall performance is increased.

▪ Simultaneous execution of more than one instruction takes place in a
pipelined processor.
▪ In pipelining multiple instructions are overlapped in execution.

Basic concepts
• Pipelining is the process of feeding instructions to the
processor through a pipeline.
• It allows storing and executing instructions in an orderly process. It is
known as pipeline processing.
• Pipelining is a technique where multiple instructions are overlapped
during execution.

        t0   t1   t2   t3   t4   t5   t6   t7   t8
Ins 1   IF   ID   IE   MEM  WB
Ins 2        IF   ID   IE   MEM  WB
Ins 3             IF   ID   IE   MEM  WB
Ins 4                  IF   ID   IE   MEM  WB
Ins 5                       IF   ID   IE   MEM  WB
IF=Instruction Fetch
ID=Instruction Decode
IE=Instruction Execute
MEM=Memory Access
WB=Write Back

• Pipeline is divided into stages and these stages are connected with one
another to form a pipe like structure. Instructions enter from one end
and exit from another end.
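The space-time diagram above implies a simple cycle count: with k stages and one new instruction entering per cycle, n instructions finish in k + (n − 1) cycles instead of n × k. A small sketch of this idealized model (it deliberately ignores hazards and stalls):

```python
def pipelined_cycles(n_instructions, n_stages):
    # The first instruction takes n_stages cycles to fill the pipe;
    # each later instruction completes one cycle after the previous one.
    return n_stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, n_stages):
    # Without pipelining, every instruction uses all stages serially.
    return n_instructions * n_stages

# The 5-instruction, 5-stage example above:
print(pipelined_cycles(5, 5), unpipelined_cycles(5, 5))  # 9 vs 25 cycles
```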

• In a pipelined system, each segment consists of an input register
followed by a combinational circuit.
• The register is used to hold data, and the combinational circuit performs
operations on it.
• The output of combinational circuit is applied to the input register of
the next segment.

Stages of Pipeline
There is a 5-stage instruction pipeline to execute all the instructions in the
RISC instruction set.
• Stage 1 (Instruction Fetch): In this stage the CPU fetches the
instructions from the address present in the memory location whose
value is stored in the program counter.
• Stage 2 (Instruction Decode): In this stage, the instruction is decoded
and register file is accessed to obtain the values of registers used in the
instruction.
• Stage 3 (Instruction Execute): In this stage, activities such as ALU
operations are performed.
• Stage 4 (Memory Access): In this stage, memory operands are read
and written from/to the memory that is present in the instruction.
• Stage 5 (Write Back): In this stage, computed/fetched value is written
back to the register present in the instructions.

Types of pipeline
It is divided into two categories:
1)Arithmetic pipeline:
▪ It is usually found in most of the computers.
▪ They are used for floating point operation, multiplication of fixed-point
numbers etc.
▪ Example: The inputs to the floating-point adder pipeline are:
X = A × 2^a
Y = B × 2^b
where A and B are mantissas and a and b are exponents.
2)Instruction pipeline:
▪ Here, a stream of instructions can be executed by overlapping the
fetch, decode and execute phases of an instruction cycle.
▪ This type of technique is used to increase the throughput of the
computer system.
▪ An instruction pipeline reads instructions from memory while
previous instructions are being executed in other segments of the
pipeline. Thus, we can execute multiple instructions simultaneously.
▪ The pipeline will be more efficient if the instruction cycle is divided
into segments of equal duration.

What is Throughput?
• It measures the number of instructions completed per unit time.
• It represents the overall processing speed of the pipeline.
• Higher throughput indicates better processing speed of the pipeline.
• Calculated as: throughput = number of instructions executed / execution
time.
• It can be affected by pipeline length, clock frequency, efficiency of
instruction execution and the presence of pipeline hazards or stalls.
What is Latency?

• It measures the time taken for a single instruction to complete its
execution.
• It represents delay or time it takes for an instruction to pass through
pipeline stages.
• Lower latency indicates better performance.
• It is calculated as: Latency = Execution time / Number of instructions
executed.
• It is influenced by pipeline length and depth, clock cycle time, instruction
dependencies and pipeline hazards.
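Both formulas above can be written directly in code (the instruction count and time below are hypothetical, chosen only to illustrate the units):

```python
def throughput(n_instructions, execution_time):
    # Instructions completed per unit time.
    return n_instructions / execution_time

def latency(execution_time, n_instructions):
    # Average time per instruction.
    return execution_time / n_instructions

# Hypothetical: 1000 instructions complete in 2 microseconds.
print(throughput(1000, 2e-6))  # 5e8 instructions per second
print(latency(2e-6, 1000))     # about 2 ns per instruction
```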

Pipeline conflicts
There are some factors that cause the pipeline to deviate from its normal
performance. Some of these factors are given below:
1)Timing Variations:
All stages cannot take the same amount of time. This problem generally
occurs when instructions have different operand requirements and thus
different processing times.
2)Data Hazards:
When several instructions are in parallel execution and they
reference the same data, a problem arises. We must ensure that the next
instruction does not attempt to access the data before the current instruction
has finished with it, because this will lead to incorrect results.
3)Branching:
In order to fetch and execute the next instruction, the processor must know
what that instruction is. If the present instruction is a conditional branch
whose result determines the next instruction, then the next instruction may
not be known until the current one is processed.
4)Interrupts:
Interrupts insert unwanted instructions into the instruction stream and
affect the execution of instructions.

5)Data Dependency:
It arises when an instruction depends upon the result of a previous
instruction, but that result is not yet available.

Advantages of Pipelining
➢ The cycle time of the processor is reduced.
➢ It increases the throughput of the system.
➢ It makes the system reliable.

Disadvantages of Pipelining
➢ The design of pipelined processor is complex and costly to
manufacture.
➢ The instruction latency is more.

HAZARDS
In pipelining, CPI should ideally be 1, i.e. one instruction should complete
every clock cycle. But this is difficult to achieve, and the problems that
arise in achieving it are called hazards.
Data Hazards
Data hazards occur when an instruction depends on the result of a previous
instruction and that result has not yet been computed.
Whenever two different instructions use the same storage location, it must
appear as if the instructions execute in sequential order.
Consider the pipelined execution of a sequence like the following (a
representative example, since the original listing is not reproduced here):
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
All the instructions after the ADD use the result of the ADD instruction (in
R1).
The ADD instruction writes the value of R1 in the WB stage, but the SUB
instruction reads the value during its ID stage. This problem is called a
data hazard.

There are four types of data dependencies: Read After Write (RAW), Write
After Read (WAR), Write After Write (WAW), and Read After Read (RAR).
These are explained as follows.
• Read After Write (RAW) :
It is also known as a true dependency or flow dependency. It occurs
when the value produced by an instruction is required by a subsequent
instruction.
• Write After Read (WAR) :
It is also known as an anti-dependency. It occurs when an instruction
writes to a register that a previous instruction still needs to read.
• Write After Write (WAW) :
It is also known as an output dependency. It occurs when two
instructions write to the same register.
• Read After Read (RAR) :
It occurs when two instructions both read from the same register. This
is not a hazard, since reading does not change the value.
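These orderings can be detected mechanically from each instruction's read and write register sets. A small sketch (the dict encoding is our own illustration, not a standard API):

```python
def classify_dependencies(earlier, later):
    # earlier/later: dicts with 'reads' and 'writes' register-name sets.
    deps = []
    if later['reads'] & earlier['writes']:
        deps.append('RAW')   # true / flow dependency
    if later['writes'] & earlier['reads']:
        deps.append('WAR')   # anti-dependency
    if later['writes'] & earlier['writes']:
        deps.append('WAW')   # output dependency
    return deps             # RAR (read/read overlap) is not a hazard

add_insn = {'reads': {'R2', 'R3'}, 'writes': {'R1'}}  # ADD R1, R2, R3
sub_insn = {'reads': {'R1', 'R5'}, 'writes': {'R4'}}  # SUB R4, R1, R5
print(classify_dependencies(add_insn, sub_insn))  # ['RAW']
```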
Structural Hazards
Multiple instructions, but limited resources.
A structural hazard occurs when two (or more) instructions that are already
in the pipeline need the same resource. The result is that the instructions
must be executed in series rather than in parallel for a portion of the
pipeline. Structural hazards are sometimes referred to as resource hazards.
Solution for Structural Hazards:
1.Resource Duplication: Increase the number of resources / use multiple
copies of the resource.
2.Resource Pipelining: Pipeline the resource itself, just as instructions are
pipelined; however, this increases complexity.
3.Change the ordering: Reorder the instructions so that the instruction that
needs the busy resource executes later.
Control Hazards
A control hazard in pipelining occurs when the processor cannot decide in
time which instruction to fetch next. This can lead to delays in instruction
fetching. Control hazards are also known as branch hazards; they occur
when the pipeline makes the wrong decision about which instruction to fetch.
Branch prediction:
The most common approach to handle control hazards is using a branch
prediction unit that tries to guess whether a branch will be taken or not,
minimizing the need for flushing or stalling.
Solution for Control Hazards:
1.Flushing: Happens when a branch prediction is completely wrong, causing
instructions that were fetched based on the wrong prediction to be
discarded.
2.Stalling: Occurs when the pipeline pauses execution until the branch
decision is made, allowing the processor to fetch the correct instructions
based on the actual branch outcome.
Generally preferred over flushing as it only delays the pipeline for a few
cycles, not completely restarting it.

Techniques for Handling Hazards



1. Stalling: Stalling involves delaying the execution of an instruction until
the hazard is resolved. This can be done by inserting bubbles into the
pipeline or by stalling the entire pipeline.
2. Forwarding: Forwarding involves bypassing the result of an instruction
from one stage to another stage, rather than waiting for the result to be
written back to the register file.
3. Register Renaming: Register renaming involves assigning a new register
name to an instruction that is dependent on a previous instruction, thereby
avoiding the hazard.
4. Reordering: Reordering involves reordering the instructions in the
pipeline to avoid hazards. This can be done using techniques such as
instruction-level parallelism (ILP) or out-of-order execution (OoOE).
5. Hazard Detection and Resolution: Hazard detection and resolution
involves detecting hazards and resolving them using techniques such as
stalling, forwarding, or register renaming.
6. Pipeline Flush: Pipeline flush involves flushing the entire pipeline when a
hazard is detected, and restarting the pipeline from the beginning.
7. Branch Prediction: Branch prediction involves predicting the outcome of
a branch instruction and speculatively executing the instructions following
the branch. If the prediction is incorrect, the pipeline is flushed and the
correct instructions are executed.
8. Delayed Branch: Delayed branch fills the slot(s) immediately after a
branch with instructions that are executed regardless of the branch
outcome, giving the branch time to resolve without stalling.
9. Speculative Execution: Speculative execution involves speculatively
executing instructions that are dependent on a previous instruction, and
discarding the results if the speculation is incorrect.
10. Tomasulo's Algorithm: Tomasulo's algorithm involves using a
combination of register renaming, forwarding, and stalling to handle hazards
in a pipelined processor.
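To see why forwarding (technique 2 above) matters, here is a toy stall-count model for the classic 5-stage pipeline. This is our own simplification; it assumes that without forwarding a register value only becomes readable the cycle after write-back:

```python
# Stages: IF=0, ID=1, EX=2, MEM=3, WB=4; instruction k occupies stage s in cycle k + s.
def raw_stalls(produce_stage, consume_stage, distance, forwarding):
    # produce_stage: stage whose end makes the value available (EX=2 for ALU, MEM=3 for load).
    # consume_stage: stage whose start needs the value (EX=2 for an ALU input).
    # distance: instructions between producer and consumer (1 = back to back).
    if not forwarding:
        produce_stage = 4  # value only visible after write-back
        consume_stage = 1  # registers are read in ID
    available = produce_stage + 1      # first cycle the value can be used
    needed = distance + consume_stage  # cycle the consumer enters its stage
    return max(0, available - needed)

print(raw_stalls(2, 2, 1, forwarding=True))   # ALU -> ALU, adjacent: 0 stalls
print(raw_stalls(3, 2, 1, forwarding=True))   # load -> use, adjacent: 1 stall
print(raw_stalls(2, 2, 1, forwarding=False))  # no forwarding: 3 stalls
```

With a split-cycle register file (write in the first half of a cycle, read in the second), the no-forwarding case would drop to 2 stalls; the model keeps the pessimistic assumption for simplicity.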

EXCEPTION HANDLING

➢ Exceptions and interrupts are unexpected events that disrupt the
normal flow of instruction execution.
➢ An exception is an unexpected event within the processor.
➢ An interrupt is an unexpected event from outside the processor.
➢ Exceptions generally refer to events that arise within the CPU.
Example: undefined opcode, overflow, system call etc.
➢ Interrupts point to requests coming from an external I/O controller or
device to the processor.

Some examples of exceptions are:


• I/O device request.
• Invoking an OS service from a user program.
• Tracing instruction execution.
• Breakpoint.
• Integer arithmetic overflow.
• FP arithmetic anomaly.
• Page fault.
• Misaligned memory access.
• Memory protection violation.
• Using an undefined or unimplemented instruction.
• Hardware malfunctions.
• Power failure.

There are different characteristics for exceptions. They are as follows:


Synchronous VS Asynchronous
❖ Some exceptions may be synchronous, whereas others may be
asynchronous. If the same exception occurs in the same place with the
same data and memory allocation, then it is a synchronous exception.
They are more difficult to handle.
❖ Devices external to the CPU and memory cause asynchronous
exceptions. They can be handled after the current instructions and
hence easier than synchronous exceptions.

User requested VS Coerced


❖ Some exceptions may be user requested and not automatic. Such
exceptions are predictable and can be handled after the current
instruction.
❖ Coerced exceptions are generally raised by hardware and not under the
control of the user program. They are harder to handle.

User maskable VS unmaskable


❖ Exceptions can be maskable or unmaskable. A maskable exception can
be masked or unmasked by a user task, which decides whether the
hardware responds to the exception or not. We may have instructions
that enable or disable exceptions.
Within VS Between instructions
❖ Exceptions may have to be handled within the instruction or between
the instruction. Within exceptions are normally synchronous and are
harder since the instruction has to be stopped and restarted.
Catastrophic exceptions like hardware malfunction will normally cause
termination.
❖ Exceptions that can be handled between two instructions are easier to
handle.

Resume VS Terminate
❖ Some exceptions may allow the program to be continued after the
exception, while others may lead to termination. Things are much
more complicated if we have to restart.
❖ Exceptions that lead to termination are easier, since we just have to
terminate and need not restore the original status.

PIPELINE OPTIMIZATION TECHNIQUES



➢ The goal is to maximize the rendering speed, and then allow stages that
are not bottlenecks to consume as much time as the bottleneck stage.
➢ Pipelining is a technique used to improve the execution throughput of a
CPU by using the processor resources in a more efficient manner. The
basic idea is to split the processor instruction into a series of small
independent stages. Each stage is designed to perform a certain part of
the instructions.
➢ The optimizing technique can greatly reduce contention on a shared data
bus and improve the performance of applications with inherent data-
pipeline characteristics.

Pipeline Optimization
▪ Stages execute in parallel.
▪ The slowest stage is always the bottleneck of the pipeline.
▪ The bottleneck determines throughput (i.e. maximum speed).
▪ The bottleneck is the average bottleneck over a frame.
▪ Intra-frame bottlenecks cannot be measured easily.
▪ Bottlenecks can change over a frame.
▪ Most important: find the bottleneck, then optimize that stage.

Locating the Bottleneck


Two bottleneck location techniques:
❖ Technique 1:
• Make a certain stage work less.
• If performance is better, then that stage is the bottleneck.
❖ Technique 2:
• Make the other two stages work less or (better) not at all.
• If the performance is the same, then the stage not included above
is the bottleneck.

COMPILER TECHNIQUES FOR IMPROVING PERFORMANCE


1. Instruction Level Parallelism (ILP): execute multiple
instructions simultaneously
2. Pipeline Optimization: minimize pipeline stalls
3. Loop Unrolling for Cache: reduce cache misses in loops
4. Register Allocation for Renaming: eliminate register
hazards
5. Branch Prediction Assistance: help CPU predict branches
6. Data Alignment for SIMD: optimize data for Single Instruction
Multiple Data
7. Dead Store Elimination for Cache: remove unnecessary
cache stores
