Co Unit1 Part3
METRICS
Measuring Performance
Time is the measure of computer performance: the computer that performs the same
amount of work in the least time is the fastest. Program execution time is measured in seconds
per program. However, time can be defined in different ways, depending on what we count. The
most straightforward definition of time is called wall clock time, response time, or elapsed
time. These terms mean the total time to complete a task, including disk accesses, memory
accesses, input/output (I/O) activities, and operating system overhead; in short, everything.
Computers are often shared, however, and a processor may work on several programs
simultaneously. In such cases, the system may try to optimize throughput rather than attempt
to minimize the elapsed time for one program. Hence, we often want to distinguish between
the elapsed time and the time over which the processor is working on our behalf. CPU
execution time or simply CPU time, which recognizes this distinction, is the time the CPU
spends computing for this task and does not include time spent waiting for I/O or running other
programs. (Remember, though, that the response time experienced by the user will be the
elapsed time of the program, not the CPU time.) CPU time can be further divided into the CPU
time spent in the program, called user CPU time, and the CPU time spent in the operating
system performing tasks on behalf of the program, called system CPU time. Differentiating
between system and user CPU time is difficult to do accurately, because it is often hard to
assign responsibility for operating system activities to one user program rather than another
and because of the functionality differences among operating systems. For consistency, we
maintain a distinction between performance based on elapsed time and that based on CPU
execution time. We will use the term system performance to refer to elapsed time on an
unloaded system and CPU performance to refer to user CPU time.
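To make the distinction concrete, here is a minimal C sketch that measures both quantities for the same piece of work. It uses only standard library calls: clock() reports processor (CPU) time consumed by the program, while time() reports wall-clock (elapsed) time. The loop is an arbitrary stand-in workload.

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        time_t wall_start = time(NULL);  /* wall clock (elapsed) time, in seconds */
        clock_t cpu_start = clock();     /* processor time used by this program */

        volatile double x = 0.0;         /* arbitrary workload to measure */
        for (long i = 0; i < 100000000L; i++)
            x += 1.0;

        double cpu_seconds  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
        double wall_seconds = difftime(time(NULL), wall_start);
        printf("CPU time:     %.2f s\n", cpu_seconds);
        printf("Elapsed time: %.0f s\n", wall_seconds);
        return 0;
    }

On a lightly loaded machine the two figures are close; when the program waits for I/O or shares the processor with other programs, elapsed time exceeds CPU time.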
Although as computer users we care about time, when we examine the details of a
computer it’s convenient to think about performance in other metrics. In particular, computer
designers may want to think about a computer by using a measure that relates to how fast the
hardware can perform basic functions. Almost all computers are constructed using a clock that
determines when events take place in the hardware. These discrete time intervals are called
clock cycles (or ticks, clock ticks, clock periods, clocks, cycles). Designers refer to the length
of a clock period both as the time for a complete clock cycle (e.g., 250 picoseconds, or 250 ps)
and as the clock rate (e.g., 4 gigahertz, or 4 GHz), which is the inverse of the clock period. In
the next subsection, we will formalize the relationship between the clock cycles of the hardware
designer and the seconds of the computer user.
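As a quick check of that inverse relationship: a clock period of 250 ps corresponds to a clock rate of 1 / (250 × 10^-12 s) = 4 × 10^9 Hz = 4 GHz, and conversely.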
1. Suppose we know that an application that uses both personal mobile devices and the Cloud
is limited by network performance. For the following changes, state whether only the throughput
improves, both response time and throughput improve, or neither improves.
a. An extra network channel is added between the PMD and the Cloud, increasing the
total network throughput and reducing the delay to obtain network access (since there are
now two channels).
b. The networking software is improved, thereby reducing the network communication
delay, but not increasing throughput.
c. More memory is added to the computer.
2. Computer C’s performance is 4 times as fast as the performance of computer B, which runs
a given application in 28 seconds. How long will computer C take to run that application?
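(Worked answer, for checking: if C is 4 times as fast, its execution time is B's time divided by 4, i.e., 28 s / 4 = 7 s.)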
The CPU performance equation relates execution time to the hardware quantities introduced above:

    CPU execution time for a program = CPU clock cycles for a program × Clock cycle time
                                     = CPU clock cycles for a program / Clock rate

This formula makes it clear that the hardware designer can improve performance by reducing
the number of clock cycles required for a program or the length of the clock cycle. As we will see
in later chapters, the designer often faces a trade-off between the number of clock cycles needed
for a program and the length of each cycle. Many techniques that decrease the number of clock
cycles may also increase the clock cycle time.
Instruction Performance
The performance equations above did not include any reference to the number of
instructions needed for the program. However, since the compiler clearly generated instructions
to execute, and the computer had to execute the instructions to run the program, the execution
time must depend on the number of instructions in a program. One way to think about execution
time is that it equals the number of instructions executed multiplied by the average time per
instruction. Therefore, the number of clock cycles required for a program can be written as

    CPU clock cycles = Instructions for a program × Average clock cycles per instruction

The term clock cycles per instruction, which is the average number of clock cycles each
instruction takes to execute, is often abbreviated as CPI. Since different instructions may take
different amounts of time depending on what they do, CPI is an average of all the instructions
executed in the program. CPI provides one way of comparing two different implementations
of the same instruction set architecture, since the number of instructions executed for a program
will, of course, be the same.
Substituting for the clock cycles gives the basic performance equation in terms of instruction
count (the number of instructions executed by the program), CPI, and clock cycle time:

    CPU time = Instruction count × CPI × Clock cycle time
             = Instruction count × CPI / Clock rate

These formulas are particularly useful because they separate the three key factors that affect
performance. We can use them to compare two different implementations or to evaluate
a design alternative if we know its impact on these three parameters. The performance of a
program depends on the algorithm, the language, the compiler, the architecture, and the actual
hardware.

Clock cycle: also called tick, clock tick, clock period, clock, or cycle. The time for
one clock period, usually of the processor clock, which runs at a constant rate.
Clock period: the length of each clock cycle.
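As an illustration with assumed numbers: a program that executes 10^9 instructions with an average CPI of 2.0 on a 4 GHz processor takes (10^9 × 2.0) / (4 × 10^9) = 0.5 seconds of CPU time.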
The following summary shows how each hardware or software component affects the factors
in the CPU performance equation (what it affects, and how).

Algorithm
  Affects: instruction count, possibly CPI.
  How: The algorithm determines the number of source program instructions executed and
  hence the number of processor instructions executed. The algorithm may also affect the
  CPI, by favoring slower or faster instructions. For example, if the algorithm uses more
  divides, it will tend to have a higher CPI.

Programming language
  Affects: instruction count, CPI.
  How: The programming language certainly affects the instruction count, since statements
  in the language are translated to processor instructions, which determine instruction count.
  The language may also affect the CPI because of its features; for example, a language with
  heavy support for data abstraction (e.g., Java) will require indirect calls, which will use
  higher-CPI instructions.

Compiler
  Affects: instruction count, CPI.
  How: The efficiency of the compiler affects both the instruction count and the average
  cycles per instruction, since the compiler determines the translation of the source language
  instructions into computer instructions. The compiler's role can be very complex and affect
  the CPI in complex ways.

Instruction set architecture
  Affects: instruction count, clock rate, CPI.
  How: The instruction set architecture affects all three aspects of CPU performance, since it
  affects the instructions needed for a function, the cost in cycles of each instruction, and the
  overall clock rate of the processor.
2. INSTRUCTIONS AND INSTRUCTION SEQUENCING
Objectives: You will learn about machine instructions and program execution, including
branching and subroutine call and return operations.
We will first consider how the memory of a computer is organized. The memory
consists of many millions of storage cells, each of which can store a bit of information having
the value 0 or 1. Because a single bit represents a very small amount of information, bits are
seldom handled individually. The usual approach is to deal with them in groups of fixed size.
Modern computers have word lengths that typically range from 16 to 64 bits. If the word length
of a computer is 32 bits, a single word can store a 32-bit signed number or four ASCII-encoded
characters, each occupying 8 bits. Accessing the memory to store or retrieve a single item of
information, either a word or a byte, requires distinct names or addresses for each location. We
now have three basic information quantities to deal with: bit, byte, and word. A byte is always
8 bits, but the word length typically ranges from 16 to 64 bits. The most practical assignment
is to have successive addresses refer to successive byte locations in the memory. This is the
assignment used in most modern computers. The term byte-addressable memory is used for
this assignment. Byte locations have addresses 0, 1, 2, . . . . There are two ways that byte
addresses can be assigned across words, as shown in Figure.
The name big-endian is used when lower byte addresses are used for the more
significant bytes (the leftmost bytes) of the word. The name little-endian is used for the
opposite ordering, where the lower byte addresses are used for the less significant bytes (the
rightmost bytes) of the word. In the case of a 32-bit word length, natural word boundaries occur
at addresses 0, 4, 8, . . . , as shown in Figure. We say that the word locations have aligned
addresses if they begin at a byte address that is a multiple of the number of bytes in a word.
For practical reasons associated with manipulating binary-coded addresses, the number of
bytes in a word is a power of 2.
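Endianness can be observed directly in software. The following minimal C sketch (the variable names are illustrative) stores a 32-bit word with four distinct byte values and inspects the byte at the lowest address:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t word = 0x01020304;         /* a 32-bit word with distinct bytes */
        uint8_t *bytes = (uint8_t *)&word;  /* view the same word byte by byte */

        /* Little-endian: the lowest address holds the least significant byte
           (0x04). Big-endian: it holds the most significant byte (0x01). */
        if (bytes[0] == 0x04)
            printf("little-endian\n");
        else
            printf("big-endian\n");
        return 0;
    }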
Memory Operations
Both program instructions and data operands are stored in the memory. To execute an
instruction, the processor control circuits must cause the word (or words) containing the
instruction to be transferred from the memory to the processor. Operands and results must also
be moved between the memory and the processor. Thus, two basic operations involving the
memory are needed, namely, Read and Write.
The Read operation transfers a copy of the contents of a specific memory location to
the processor. The memory contents remain unchanged. To start a Read operation, the
processor sends the address of the desired location to the memory and requests that its contents
be read. The memory reads the data stored at that address and sends them to the processor.
The Write operation transfers an item of information from the processor to a specific
memory location, overwriting the former contents of that location. To initiate a Write
operation, the processor sends the address of the desired location to the memory, together with
the data to be written into that location. The memory then uses the address and data to perform
the write.
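As a toy illustration of these two operations (a sketch of the behavior, not of how real memory hardware is implemented), a byte-addressable memory can be modeled in C as an array indexed by address:

    #include <stdio.h>
    #include <stdint.h>

    #define MEM_SIZE 1024
    static uint8_t memory[MEM_SIZE];   /* byte-addressable memory model */

    /* Read: copy the contents of a location to the processor; memory unchanged */
    uint8_t mem_read(uint32_t address) {
        return memory[address];
    }

    /* Write: overwrite the former contents of a location with new data */
    void mem_write(uint32_t address, uint8_t data) {
        memory[address] = data;
    }

    int main(void) {
        mem_write(100, 0x2A);                           /* store 0x2A at address 100 */
        printf("memory[100] = 0x%02X\n", mem_read(100));
        return 0;
    }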
The tasks carried out by a computer program consist of a sequence of small steps, such as
adding two numbers, testing for a particular condition, reading a character from the keyboard,
or sending a character to a display screen. A computer must have instructions
capable of performing four types of operations:
• Data transfers between the memory and the processor registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers
We begin by discussing instructions for the first two types of operations.
Assembly-Language Notation
We need another type of notation to represent machine instructions and programs. For
this, we use assembly language. For example, a generic instruction that causes the transfer
described above, from memory location LOC to processor register R2, is specified by the
statement
Load R2, LOC
The contents of LOC are unchanged by the execution of this instruction, but the old contents
of register R2 are overwritten. The name Load is appropriate for this instruction, because the
contents read from a memory location are loaded into a processor register. The second example
of adding two numbers contained in processor registers R2 and R3 and placing their sum in R4
can be specified by the assembly-language statement
Add R4, R2, R3
In this case, registers R2 and R3 hold the source operands, while R4 is the destination.
One of the most important characteristics that distinguish different computers is the
nature of their instructions. There are two fundamentally different approaches in the design of
instruction sets for modern computers. One popular approach is based on the premise that
higher performance can be achieved if each instruction occupies exactly one word in memory,
and all operands needed to execute a given arithmetic or logic operation specified by an
instruction are already in processor registers. This approach is conducive to an implementation
of the processing unit in which the various operations needed to process a sequence of
instructions are performed in “pipelined” fashion to overlap activity and reduce total execution
time of a program. The restriction that each instruction must fit into a single word reduces the
complexity and the number of different types of instructions that may be included in the
instruction set of a computer. Such computers are called Reduced Instruction Set Computers
(RISC).
An alternative to the RISC approach is to make use of more complex instructions which
may span more than one word of memory, and which may specify more complicated
operations. This approach was prevalent prior to the introduction of the RISC approach in the
1970s. Although the use of complex instructions was not originally identified by any particular
label, computers based on this idea have been subsequently called Complex Instruction Set
Computers (CISC).
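For concreteness, the program discussed next can be reconstructed from the description that follows; it combines the Load and Add instructions introduced above with a Store, using operand names A, B, and C that are assumed here for illustration. Its four one-word instructions occupy successive word addresses i through i + 12 and compute the sum of the values at A and B, storing the result at C:

    i        Load  R2, A
    i + 4    Load  R3, B
    i + 8    Add   R4, R2, R3
    i + 12   Store R4, C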
Let us consider how this program is executed. The processor contains a register called the
program counter (PC), which holds the address of the next instruction to be executed. To
begin executing a program, the address of its first instruction (i in our example) must be placed
into the PC. Then, the processor control circuits use the information in the PC to fetch and
execute instructions, one at a time, in the order of increasing addresses. This is called straight-
line sequencing. During the execution of each instruction, the PC is incremented by 4 to point
to the next instruction. Thus, after the Store instruction at location i + 12 is executed, the PC
contains the value i + 16, which is the address of the first instruction of the next program
segment.
Executing a given instruction is a two-phase procedure. In the first phase, called
instruction fetch, the instruction is fetched from the memory location whose address is in the
PC. This instruction is placed in the instruction register (IR) in the processor. At the start of the
second phase, called instruction execute, the instruction in IR is examined to determine which
operation is to be performed. The specified operation is then performed by the processor. This
involves a small number of steps such as fetching operands from the memory or from processor
registers, performing an arithmetic or logic operation, and storing the result in the destination
location. At some point during this two-phase procedure, the contents of the PC are advanced
to point to the next instruction. When the execute phase of an instruction is completed, the PC
contains the address of the next instruction, and a new instruction fetch phase can begin.
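The two-phase cycle can be sketched as a loop in C. This toy simulator (the instruction encoding and opcode names are assumptions, not a real instruction set) keeps the opcode in the high byte of each 32-bit instruction word:

    #include <stdio.h>
    #include <stdint.h>

    enum { OP_HALT = 0, OP_ADD = 1 };   /* toy opcodes */

    int main(void) {
        /* a tiny "memory" of 32-bit instruction words: two ADDs, then HALT */
        uint32_t memory[] = { OP_ADD << 24, OP_ADD << 24, OP_HALT << 24 };
        uint32_t pc = 0;   /* program counter: byte address of next instruction */
        uint32_t ir;       /* instruction register */

        for (;;) {
            ir = memory[pc / 4];         /* phase 1: instruction fetch */
            pc += 4;                     /* advance PC to the next instruction */
            uint32_t opcode = ir >> 24;  /* phase 2: decode and execute */
            if (opcode == OP_HALT)
                break;
            printf("executed ADD fetched from address %u\n", (unsigned)(pc - 4));
        }
        return 0;
    }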
Branching
Consider the task of adding a list of n numbers. The program outlined in the figure is a
generalization of the earlier program. The addresses of the memory locations containing the
n numbers are symbolically given as NUM1, NUM2, . . . , NUMn, and separate Load and Add
instructions are used to add each number to the contents of register R2. After all the numbers
have been added, the result is placed in memory location SUM. Instead of using a long list of
Load and Add instructions, as in Figure, it is possible to implement a program loop in which
the instructions read the next number in the list and add it to the current sum. To add all
numbers, the loop has to be executed as many times as there are numbers in the list. Figure
shows the structure of the desired program. The body of the loop is a straight-line sequence of
instructions executed repeatedly. It starts at location LOOP and ends at the instruction
Branch_if_[R2]>0. During each pass through this loop, the address of the next list entry is
determined, and that entry is loaded into R5 and added to R3. The address of an operand can
be specified in various ways. For now, we concentrate on how to create and control a program
loop. Assume that the number of entries in the list, n, is stored in memory location N, as shown.
Register R2 is used as a counter to determine the number of times the loop is executed. Hence,
the contents of location N are loaded into register R2 at the beginning of the program.
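In C, the structure of this counted loop looks like the sketch below (the names NUM, N, and SUM mirror the symbolic memory locations in the text; the data values are illustrative):

    #include <stdio.h>

    int main(void) {
        int NUM[] = { 3, 7, 1, 9 };             /* the list of numbers */
        int N = sizeof NUM / sizeof NUM[0];     /* n, the number of entries */
        int sum = 0;                            /* running total (R3 in the text) */
        int counter = N;                        /* loop counter (R2 in the text) */
        int i = 0;

        while (counter > 0) {                   /* Branch_if_[R2]>0 closes the loop */
            sum += NUM[i++];                    /* load the next entry and add it */
            counter--;                          /* decrement the counter */
        }
        printf("SUM = %d\n", sum);              /* result stored to location SUM */
        return 0;
    }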