
distinguishes them from shared-memory multiprocessors, leading to the name message-passing multicomputers.

METRICS
Measuring Performance
Time is the measure of computer performance: the computer that performs the same
amount of work in the least time is the fastest. Program execution time is measured in seconds
per program. However, time can be defined in different ways, depending on what we count. The
most straightforward definition of time is called wall clock time, response time, or elapsed
time. These terms mean the total time to complete a task, including disk accesses, memory
accesses, input/output (I/O) activities, operating system overhead—everything.
Computers are often shared, however, and a processor may work on several programs
simultaneously. In such cases, the system may try to optimize throughput rather than attempt
to minimize the elapsed time for one program. Hence, we often want to distinguish between
the elapsed time and the time over which the processor is working on our behalf. CPU
execution time or simply CPU time, which recognizes this distinction, is the time the CPU
spends computing for this task and does not include time spent waiting for I/O or running other
programs. (Remember, though, that the response time experienced by the user will be the
elapsed time of the program, not the CPU time.) CPU time can be further divided into the CPU
time spent in the program, called user CPU time, and the CPU time spent in the operating
system performing tasks on behalf of the program, called system CPU time. Differentiating
between system and user CPU time is difficult to do accurately, because it is often hard to
assign responsibility for operating system activities to one user program rather than another
and because of the functionality differences among operating systems. For consistency, we
maintain a distinction between performance based on elapsed time and that based on CPU
execution time. We will use the term system performance to refer to elapsed time on an
unloaded system and CPU performance to refer to user CPU time.
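As a rough illustration of the distinction, the following minimal sketch uses Python's standard library; the workload and the half-second sleep are made up purely for demonstration. Elapsed time includes time spent waiting, while CPU time does not:

import time

start_wall = time.perf_counter()    # wall clock (elapsed) time
start_cpu = time.process_time()     # user + system CPU time of this process

total = sum(i * i for i in range(1_000_000))   # some CPU-bound work
time.sleep(0.5)                                # waiting consumes elapsed time, not CPU time

elapsed = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"elapsed: {elapsed:.2f} s   CPU: {cpu:.2f} s")   # elapsed includes the 0.5 s of waiting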
Although as computer users we care about time, when we examine the details of a
computer it’s convenient to think about performance in other metrics. In particular, computer
designers may want to think about a computer by using a measure that relates to how fast the
hardware can perform basic functions. Almost all computers are constructed using a clock that
determines when events take place in the hardware. These discrete time intervals are called
clock cycles (or ticks, clock ticks, clock periods, clocks, cycles). Designers refer to the length
of a clock period both as the time for a complete clock cycle (e.g., 250 picoseconds, or 250 ps)
and as the clock rate (e.g., 4 gigahertz, or 4 GHz), which is the inverse of the clock period. In
the next subsection, we will formalize the relationship between the clock cycles of the hardware
designer and the seconds of the computer user.
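The two example figures above are consistent with each other, since rate and period are reciprocals; a one-line check in Python (the numbers are the ones quoted above):

clock_period_s = 250e-12                 # 250 ps clock period
clock_rate_hz = 1 / clock_period_s       # clock rate is the inverse of the clock period
print(clock_rate_hz / 1e9, "GHz")        # prints 4.0 GHz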
1. Suppose we know that an application that uses both personal mobile devices and the Cloud
is limited by network performance. For the following changes, state whether only the throughput
improves, both response time and throughput improve, or neither improves.
a. An extra network channel is added between the PMD and the Cloud, increasing the
total network throughput and reducing the delay to obtain network access (since there are
now two channels).
b. The networking software is improved, thereby reducing the network communication
delay, but not increasing throughput.
c. More memory is added to the computer.
2. Computer C’s performance is 4 times as fast as the performance of computer B, which runs
a given application in 28 seconds.

CPU Performance and Its Factors


Users and designers often examine performance using different metrics. If we could
relate these different metrics, we could determine the effect of a design change on the
performance as experienced by the user. Since we are confining ourselves to CPU performance
at this point, the bottom-line performance measure is CPU execution time. A simple formula
relates the most basic metrics (clock cycles and clock cycle time) to CPU time:
CPU time = CPU clock cycles for the program × Clock cycle time
This formula makes it clear that the hardware designer can improve performance by reducing
the number of clock cycles required for a program or the length of the clock cycle. As we will see
in later chapters, the designer often faces a trade-off between the number of clock cycles needed
for a program and the length of each cycle. Many techniques that decrease the number of clock
cycles may also increase the clock cycle time.
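A small numeric sketch of this relation, with illustrative (made-up) values for the cycle count and clock rate:

clock_cycles = 10e9                              # clock cycles needed by the program (illustrative)
clock_rate_hz = 4e9                              # 4 GHz clock
clock_cycle_time_s = 1 / clock_rate_hz           # 250 ps per cycle

cpu_time_s = clock_cycles * clock_cycle_time_s   # CPU time = clock cycles × clock cycle time
print(cpu_time_s, "seconds")                     # 2.5 seconds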
Instruction Performance
The performance equations above did not include any reference to the number of
instructions needed for the program. However, since the compiler clearly generated instructions
to execute, and the computer had to execute the instructions to run the program, the execution
time must depend on the number of instructions in a program. One way to think about execution
time is that it equals the number of instructions executed multiplied by the average time per
instruction. Therefore, the number of clock cycles required for a program can be written as
CPU clock cycles = Instructions for a program × Average clock cycles per instruction
The term clock cycles per instruction, which is the average number of clock cycles each
instruction takes to execute, is often abbreviated as CPI. Since different instructions may take
different amounts of time depending on what they do, CPI is an average of all the instructions
executed in the program. CPI provides one way of comparing two different implementations
of the same instruction set architecture, since the number of instructions executed for a program
will, of course, be the same.
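As a sketch with made-up measurements, the CPI of each implementation is simply its measured cycle count divided by the (common) instruction count, which is what makes the comparison possible:

instruction_count = 5e9                        # same program, so the same instruction count on both machines
cycles_machine_a = 10e9                        # measured clock cycles on implementation A (illustrative)
cycles_machine_b = 6e9                         # measured clock cycles on implementation B (illustrative)

cpi_a = cycles_machine_a / instruction_count   # average clock cycles per instruction
cpi_b = cycles_machine_b / instruction_count
print(cpi_a, cpi_b)                            # 2.0 versus 1.2: B needs fewer cycles per instruction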

Classic CPU Performance Equation


We can now write this basic performance equation in terms of instruction count (the
number of instructions executed by the program), CPI, and clock cycle time:
CPU time = Instruction count × CPI × Clock cycle time
or, since the clock rate is the inverse of clock cycle time:
CPU time = (Instruction count × CPI) / Clock rate
These formulas are particularly useful because they separate the three key factors that affect
performance. We can use these formulas to compare two different implementations or to evaluate
a design alternative if we know its impact on these three parameters. The performance of a
program depends on the algorithm, the language, the compiler, the architecture, and the actual
hardware.
Clock cycle - Also called tick, clock tick, clock period, clock, or cycle. The time for one clock
period, usually of the processor clock, which runs at a constant rate.
Clock period - The length of each clock cycle.
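Returning to the equation, a short sketch shows how the three factors combine; the instruction count, CPI, and clock rate below are all illustrative values, not measurements from the text:

instruction_count = 5e9        # instructions executed by the program
cpi = 2.0                      # average clock cycles per instruction
clock_rate_hz = 4e9            # 4 GHz

# CPU time = instruction count × CPI × clock cycle time = instruction count × CPI / clock rate
cpu_time_s = instruction_count * cpi / clock_rate_hz
print(cpu_time_s, "seconds")   # 2.5 seconds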

The following table summarizes how these components affect the factors in the CPU
performance equation.
Algorithm
Affects what? Instruction count, possibly CPI.
How? The algorithm determines the number of source program instructions executed and hence
the number of processor instructions executed. The algorithm may also affect the CPI by favoring
slower or faster instructions; for example, if the algorithm uses more divides, it will tend to have
a higher CPI.

Programming language
Affects what? Instruction count, CPI.
How? The programming language certainly affects the instruction count, since statements in the
language are translated to processor instructions, which determine instruction count. The language
may also affect the CPI because of its features; for example, a language with heavy support for
data abstraction (e.g., Java) will require indirect calls, which will use higher-CPI instructions.

Compiler
Affects what? Instruction count, CPI.
How? The efficiency of the compiler affects both the instruction count and the average cycles per
instruction, since the compiler determines the translation of the source language instructions into
computer instructions. The compiler's role can be very complex and affect the CPI in complex
ways.

Instruction set architecture
Affects what? Instruction count, clock rate, CPI.
How? The instruction set architecture affects all three aspects of CPU performance, since it affects
the instructions needed for a function, the cost in cycles of each instruction, and the overall clock
rate of the processor.
2. INSTRUCTIONS AND INSTRUCTION SEQUENCING
Objectives: You will learn about machine instructions and program execution, including
branching and subroutine call and return operations.

Memory Locations and Addresses

We will first consider how the memory of a computer is organized. The memory
consists of many millions of storage cells, each of which can store a bit of information having
the value 0 or 1. Because a single bit represents a very small amount of information, bits are
seldom handled individually. The usual approach is to deal with them in groups of fixed size.
Modern computers have word lengths that typically range from 16 to 64 bits. If the word length
of a computer is 32 bits, a single word can store a 32-bit signed number or four ASCII-encoded
characters, each occupying 8 bits. Accessing the memory to store or retrieve a single item of
information, either a word or a byte, requires distinct names or addresses for each location. We
now have three basic information quantities to deal with: bit, byte, and word. A byte is always
8 bits, but the word length typically ranges from 16 to 64 bits. The most practical assignment
is to have successive addresses refer to successive byte locations in the memory. This is the
assignment used in most modern computers. The term byte-addressable memory is used for
this assignment. Byte locations have addresses 0, 1, 2, . . . . There are two ways that byte
addresses can be assigned across words, as shown in Figure.

The name big-endian is used when lower byte addresses are used for the more
significant bytes (the leftmost bytes) of the word. The name little-endian is used for the
opposite ordering, where the lower byte addresses are used for the less significant bytes (the
rightmost bytes) of the word. In the case of a 32-bit word length, natural word boundaries occur
at addresses 0, 4, 8, . . . , as shown in Figure. We say that the word locations have aligned
addresses if they begin at a byte address that is a multiple of the number of bytes in a word.
For practical reasons associated with manipulating binary-coded addresses, the number of
bytes in a word is a power of 2.
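A small Python sketch of how the same 32-bit word is laid out in byte-addressable memory under the two conventions; the value 0x12345678 is just an example:

value = 0x12345678                          # a 32-bit word
big = value.to_bytes(4, "big")              # big-endian: lower addresses hold the more significant bytes
little = value.to_bytes(4, "little")        # little-endian: lower addresses hold the less significant bytes

for offset in range(4):
    print(f"byte address {offset}: big-endian 0x{big[offset]:02X}, little-endian 0x{little[offset]:02X}")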

Memory Operations

Both program instructions and data operands are stored in the memory. To execute an
instruction, the processor control circuits must cause the word (or words) containing the
instruction to be transferred from the memory to the processor. Operands and results must also
be moved between the memory and the processor. Thus, two basic operations involving the
memory are needed, namely, Read and Write.
The Read operation transfers a copy of the contents of a specific memory location to
the processor. The memory contents remain unchanged. To start a Read operation, the
processor sends the address of the desired location to the memory and requests that its contents
be read. The memory reads the data stored at that address and sends them to the processor.

The Write operation transfers an item of information from the processor to a specific
memory location, overwriting the former contents of that location. To initiate a Write
operation, the processor sends the address of the desired location to the memory, together with
the data to be written into that location. The memory then uses the address and data to perform
the write.
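A minimal sketch of these two operations, modeling the memory as a Python dictionary indexed by address; the function names, address, and data value are illustrative, not from the text:

memory = {}

def write(address, data):
    # Write: store the data at the given location, overwriting its former contents.
    memory[address] = data

def read(address):
    # Read: return a copy of the contents; the memory location itself is unchanged.
    return memory[address]

write(0x1000, 42)
print(read(0x1000))    # 42, and location 0x1000 still holds 42 afterwards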

INSTRUCTIONS AND INSTRUCTION SEQUENCING

The tasks carried out by a computer program consist of a sequence of small steps, such as
adding two numbers, testing for a particular condition, reading a character from the keyboard,
or sending a character to be displayed on a display screen. A computer must have instructions
capable of performing four types of operations:
• Data transfers between the memory and the processor registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers
We begin by discussing instructions for the first two types of operations.

Register Transfer Notation

We need to describe the transfer of information from one location in a computer to
another. Possible locations that may be involved in such transfers are memory locations,
processor registers, or registers in the I/O subsystem. Most of the time, we identify such
locations symbolically with convenient names. For example, names that represent the addresses
of memory locations may be LOC, PLACE, A, or VAR2. Predefined names for the processor
registers may be R0 or R5. Registers in the I/O subsystem may be identified by names such as
DATAIN or OUTSTATUS. To describe the transfer of information, the contents of any
location are denoted by placing square brackets around its name. Thus, the expression
R2 ← [LOC]
means that the contents of memory location LOC are transferred into processor register R2.
As another example, consider the operation that adds the contents of registers R2 and
R3, and places their sum into register R4. This action is indicated as
R4 ← [R2] + [R3]
This type of notation is known as Register Transfer Notation (RTN). Note that the right-hand
side of an RTN expression always denotes a value, and the left-hand side is the name of a
location where the value is to be placed, overwriting the old contents of that location. In
computer jargon, the words “transfer” and “move” are commonly used to mean “copy.”
Transferring data from a source location A to a destination location B means that the contents
of location A are read and then written into location B. In this operation, only the contents of
the destination will change. The contents of the source will stay the same.

Assembly-Language Notation

We need another type of notation to represent machine instructions and programs. For
this, we use assembly language. For example, a generic instruction that causes the transfer
described above, from memory location LOC to processor register R2, is specified by the
statement
Load R2, LOC
The contents of LOC are unchanged by the execution of this instruction, but the old contents
of register R2 are overwritten. The name Load is appropriate for this instruction, because the
contents read from a memory location are loaded into a processor register. The second example
of adding two numbers contained in processor registers R2 and R3 and placing their sum in R4
can be specified by the assembly-language statement
Add R4, R2, R3
In this case, registers R2 and R3 hold the source operands, while R4 is the destination.

An instruction specifies an operation to be performed and the operands involved. In the
above examples, we used the English words Load and Add to denote the required operations.
In the assembly-language instructions of actual (commercial) processors, such operations are
defined by using mnemonics, which are typically abbreviations of the words describing the
operations. For example, the operation Load may be written as LD, while the operation Store,
which transfers a word from a processor register to the memory, may be written as STR or ST.
Assembly languages for different processors often use different mnemonics for a given
operation. To avoid the need for details of a particular assembly language at this early stage,
we will continue the presentation in this chapter by using English words rather than processor-
specific mnemonics.

RISC and CISC Instruction Sets

One of the most important characteristics that distinguish different computers is the
nature of their instructions. There are two fundamentally different approaches in the design of
instruction sets for modern computers. One popular approach is based on the premise that
higher performance can be achieved if each instruction occupies exactly one word in memory,
and all operands needed to execute a given arithmetic or logic operation specified by an
instruction are already in processor registers. This approach is conducive to an implementation
of the processing unit in which the various operations needed to process a sequence of
instructions are performed in “pipelined” fashion to overlap activity and reduce total execution
time of a program. The restriction that each instruction must fit into a single word reduces the
complexity and the number of different types of instructions that may be included in the
instruction set of a computer. Such computers are called Reduced Instruction Set Computers
(RISC).

An alternative to the RISC approach is to make use of more complex instructions which
may span more than one word of memory, and which may specify more complicated
operations. This approach was prevalent prior to the introduction of the RISC approach in the
1970s. Although the use of complex instructions was not originally identified by any particular
label, computers based on this idea have been subsequently called Complex Instruction Set
Computers (CISC).

We will start our presentation by concentrating on RISC-style instruction sets because
they are simpler and therefore easier to understand.

Introduction to RISC Instruction Sets

Two key characteristics of RISC instruction sets are:


• Each instruction fits in a single word.
• A load/store architecture is used, in which
– Memory operands are accessed only using Load and Store instructions.
– All operands involved in an arithmetic or logic operation must either be in processor registers,
or one of the operands may be given explicitly within the instruction word.
At the start of execution of a program, all instructions and data used in the program are
stored in the memory of a computer. Processor registers do not contain valid operands at that
time. If operands are expected to be in processor registers before they can be used by an
instruction, then it is necessary to first bring these operands into the registers. This task is done
by Load instructions which copy the contents of a memory location into a processor register.
Load instructions are of the form
Load destination, source
or more specifically
Load processor_register, memory_location
The memory location can be specified in several ways. The term addressing modes is used to
refer to the different ways in which this may be accomplished. Let us now consider a typical
arithmetic operation. The operation of adding two numbers is a fundamental capability in any
computer. The statement
C = A + B
in a high-level language program instructs the computer to add the current values of the two
variables called A and B, and to assign the sum to a third variable, C. When the program
containing this statement is compiled, the three variables, A, B, and C, are assigned to distinct
locations in the memory. For simplicity, we will refer to the addresses of these locations as A,
B, and C, respectively. The contents of these locations represent the values of the three
variables. Hence, the above high-level language statement requires the action
C ← [A] + [B]
to take place in the computer. To carry out this action, the contents of memory locations A and
B are fetched from the memory and transferred into the processor where their sum is computed.
This result is then sent back to the memory and stored in location C. The required action can
be accomplished by a sequence of simple machine instructions. We choose to use registers R2,
R3, and R4 to perform the task with four instructions:
Load R2, A
Load R3, B
Add R4, R2, R3
Store R4, C
We say that Add is a three-operand, or a three-address, instruction of the form
Add destination, source1, source2
The Store instruction is of the form
Store source, destination
where the source is a processor register and the destination is a memory location. Observe that
in the Store instruction the source and destination are specified in the reverse order from the Load
instruction; this is a commonly used convention. Note that we can accomplish the desired addition
by using only two registers, R2 and R3, if one of the source registers is also used as the
destination for the result. In this case the addition would be performed as
Add R3, R2, R3
and the last instruction would become
Store R3, C
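The effect of the original four-instruction sequence can be sketched in Python, with one dictionary standing in for the memory and another for the registers; the data values are made up for illustration:

memory = {"A": 10, "B": 32, "C": 0}
regs = {"R2": 0, "R3": 0, "R4": 0}

regs["R2"] = memory["A"]               # Load R2, A
regs["R3"] = memory["B"]               # Load R3, B
regs["R4"] = regs["R2"] + regs["R3"]   # Add R4, R2, R3
memory["C"] = regs["R4"]               # Store R4, C
print(memory["C"])                     # 42; the contents of locations A and B are unchanged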

INSTRUCTION EXECUTION AND STRAIGHT-LINE SEQUENCING


We used the task C = A + B, implemented as C ← [A] + [B], as an example. Figure shows
a possible program segment for this task as it appears in the memory of a computer. We assume
that the word length is 32 bits and the memory is byte-addressable. The four instructions of the
program are in successive word locations, starting at location i. Since each instruction is 4 bytes
long, the second, third, and fourth instructions are at addresses i + 4, i + 8, and i + 12. For
simplicity, we assume that a desired memory address can be directly specified in Load and Store
instructions, although this is not possible if a full 32-bit address is involved.

Let us consider how this program is executed. The processor contains a register called the
program counter (PC), which holds the address of the next instruction to be executed. To
begin executing a program, the address of its first instruction (i in our example) must be placed
into the PC. Then, the processor control circuits use the information in the PC to fetch and
execute instructions, one at a time, in the order of increasing addresses. This is called straight-
line sequencing. During the execution of each instruction, the PC is incremented by 4 to point
to the next instruction. Thus, after the Store instruction at location i + 12 is executed, the PC
contains the value i + 16, which is the address of the first instruction of the next program
segment.
Executing a given instruction is a two-phase procedure. In the first phase, called
instruction fetch, the instruction is fetched from the memory location whose address is in the
PC. This instruction is placed in the instruction register (IR) in the processor. At the start of the
second phase, called instruction execute, the instruction in IR is examined to determine which
operation is to be performed. The specified operation is then performed by the processor. This
involves a small number of steps such as fetching operands from the memory or from processor
registers, performing an arithmetic or logic operation, and storing the result in the destination
location. At some point during this two-phase procedure, the contents of the PC are advanced
to point to the next instruction. When the execute phase of an instruction is completed, the PC
contains the address of the next instruction, and a new instruction fetch phase can begin.
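The fetch/execute cycle described above can be sketched as a small Python loop; the instruction encoding (tuples) and the starting address are hypothetical, chosen only to make the roles of the PC and IR visible:

i = 0x1000                                    # address of the first instruction
imem = {                                      # instruction memory, 4 bytes per instruction
    i:      ("Load",  "R2", "A"),
    i + 4:  ("Load",  "R3", "B"),
    i + 8:  ("Add",   "R4", "R2", "R3"),
    i + 12: ("Store", "R4", "C"),
}
dmem = {"A": 10, "B": 32, "C": 0}             # data memory
regs = {"R2": 0, "R3": 0, "R4": 0}

pc = i                                        # the PC starts at the first instruction
while pc in imem:
    ir = imem[pc]                             # instruction fetch: memory -> IR
    pc += 4                                   # advance the PC to the next instruction
    op = ir[0]                                # instruction execute: examine IR and perform the operation
    if op == "Load":
        regs[ir[1]] = dmem[ir[2]]
    elif op == "Add":
        regs[ir[1]] = regs[ir[2]] + regs[ir[3]]
    elif op == "Store":
        dmem[ir[2]] = regs[ir[1]]

print(dmem["C"], hex(pc))                     # 42, and the PC now holds i + 16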
Branching
Consider the task of adding a list of n numbers. The program outlined in Figure is a
generalization of the program in Figure. The addresses of the memory locations containing the
n numbers are symbolically given as NUM1, NUM2, . . . , NUMn, and separate Load and Add
instructions are used to add each number to the contents of register R2. After all the numbers
have been added, the result is placed in memory location SUM. Instead of using a long list of
Load and Add instructions, as in Figure, it is possible to implement a program loop in which
the instructions read the next number in the list and add it to the current sum. To add all
numbers, the loop has to be executed as many times as there are numbers in the list. Figure
shows the structure of the desired program. The body of the loop is a straight-line sequence of
instructions executed repeatedly. It starts at location LOOP and ends at the instruction
Branch_if_[R2]>0. During each pass through this loop, the address of the next list entry is
determined, and that entry is loaded into R5 and added to R3. The address of an operand can
be specified in various ways. For now, we concentrate on how to create and control a program
loop. Assume that the number of entries in the list, n, is stored in memory location N, as shown.
Register R2 is used as a counter to determine the number of times the loop is executed. Hence,
