Csci 260 Study Guide-2

The document discusses CPU performance metrics like CPI (cycles per instruction), instruction count, clock cycles, and clock rate. It shows formulas that relate these metrics to calculate execution time (CPU time) for different processors running the same program. One formula shown is that CPU time equals instruction count multiplied by CPI multiplied by clock cycle time. The document also provides an example that calculates and compares the execution times and performance of Processor A and Processor B running the same program. Processor B is found to be 1.28 times as fast as Processor A.

Uploaded by

zubayerthewizard

● Find the percentage of each instruction type and use the weighted average method.

$\frac{5{,}000}{5{,}000 + 25{,}000} = 16.7\%$ floating point instructions
$\frac{25{,}000}{5{,}000 + 25{,}000} = 83.3\%$ integer instructions

Thus, $CPI_A = (0.167)(7) + (0.833)(1) = 2.0$ cycles/instruction.

● Find the total cycles and the total instructions in the whole program, and
then find the ratio of cycles per instruction.

There are 60,000 cycles and 25,000 + 5,000 = 30,000 instructions in the
whole program. Thus, $\frac{60{,}000 \text{ cycles}}{30{,}000 \text{ instructions}} = 2$ cycles/instruction.

3. Processor A runs Program 2 consisting of 100,000 floating point instructions and 50,000
integer instructions. What's the average CPI for this program?

● Method 1: Weighted average method

$\frac{100{,}000}{100{,}000 + 50{,}000} = \frac{10}{15} = \frac{2}{3}$ floating point instructions
$\frac{50{,}000}{100{,}000 + 50{,}000} = \frac{5}{15} = \frac{1}{3}$ integer instructions

Thus, $CPI = \frac{2}{3}(7) + \frac{1}{3}(1) = \frac{15}{3} = 5$ cycles/instruction.

● Method 2: Ratio of total cycles and total instructions

To use this method, we must first calculate the number of clock cycles for
Program 2. This is given by
$\text{clock cycles}_{fp} + \text{clock cycles}_{int} = 7(100{,}000) + 1(50{,}000) = 750{,}000$ cycles
in total.

Thus, $CPI = \frac{\text{total cycles}}{\text{total instructions}} = \frac{750{,}000 \text{ cycles}}{150{,}000 \text{ instructions}} = 5$ cycles/instruction.
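The two methods can be cross-checked with a short Python sketch. It assumes the per-class CPIs used in the worked problems (7 cycles for floating point, 1 cycle for integer instructions on Processor A). The function names are illustrative and do not come from the course material.

```python
from fractions import Fraction

# Assumed per-class CPIs for Processor A, taken from the worked problems.
CPI_FP = 7
CPI_INT = 1

def cpi_weighted(fp_count, int_count):
    """Method 1: weight each class CPI by its share of the instruction mix."""
    total = fp_count + int_count
    return Fraction(fp_count, total) * CPI_FP + Fraction(int_count, total) * CPI_INT

def cpi_ratio(fp_count, int_count):
    """Method 2: total clock cycles divided by total instructions."""
    cycles = CPI_FP * fp_count + CPI_INT * int_count
    return Fraction(cycles, fp_count + int_count)

# Program 2: 100,000 floating point and 50,000 integer instructions.
print(cpi_weighted(100_000, 50_000), cpi_ratio(100_000, 50_000))
```

Exact fractions are used so that both methods can be compared for equality without rounding error. The two functions always agree because they are the same sum written in two ways.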

4. Processor B has an average CPI for Program 2 of 3.5. Its clock rate is 1.8 GHz. How
much time does it take to execute the program?
Here we're given the instruction count (from the previous problem), the CPI, and the
cycle time (in the form of the clock rate). Thus,
$\text{CPU time} = \text{instruction count} \times CPI \times \text{clock cycle time}$
$\text{CPU time} = 150{,}000 \times 3.5 \times \frac{1}{1.8 \times 10^{9} \text{ Hz}} = \frac{291{,}667}{10^{9}} \text{ s} \approx 292\ \mu s$

5. For Program 2, which processor is faster and by how much?

We must first calculate Processor A's CPU time with Program 2. This is given by
$\text{CPU time} = 150{,}000 \times 5 \times \frac{1}{2 \times 10^{9} \text{ Hz}} = \frac{375{,}000}{10^{9}} \text{ s} = 375\ \mu s$

Processor B has the shorter execution time, so it is faster than Processor A. Thus,

$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution}_A}{\text{Execution}_B} = \frac{375\ \mu s}{292\ \mu s} \approx 1.28$

Thus, Processor B is 1.28 times as fast as A.
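As a rough check of problems 4 and 5, the sketch below recomputes both execution times from the CPU time formula. The function name `cpu_time` is illustrative. The inputs are the instruction count, CPIs, and clock rates from the problems, with Processor A at 2 GHz. Without rounding Processor B's time to 292 µs first, the ratio comes out as 9/7 ≈ 1.286, which agrees with the 1.28 above up to that intermediate rounding.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count x CPI x (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz

time_a = cpu_time(150_000, 5, 2e9)      # Processor A: CPI 5 at 2 GHz
time_b = cpu_time(150_000, 3.5, 1.8e9)  # Processor B: CPI 3.5 at 1.8 GHz

# Relative performance of B over A is the inverse ratio of execution times.
speedup_b_over_a = time_a / time_b

print(f"A: {time_a * 1e6:.2f} us, B: {time_b * 1e6:.2f} us")
print(f"B is about {speedup_b_over_a:.3f} times as fast as A")
```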

Power Wall
The power wall refers to the limit that power consumption and heat place on how fast
transistors can be switched, and therefore on the clock rate. In other words, it describes the
electric energy consumption of a chip as a limiting factor for increases in processor frequency.
If a chip consumes too much power, it overheats and malfunctions. However, if we cut power by
lowering the supply voltage too far, the transistors start to leak current.

Reading: The Power Wall, Why Aren’t Modern CPUs Faster?

CPU Benchmark
A benchmark is a test program used to compare the performance of computers. Specifically, it
shows how well the same program runs on each computer, for example in terms of execution time or CPI.

Summary

Formula | Meaning
$\text{Performance}_A = \frac{1}{\text{Execution}_A}$ | Relates the performance and (CPU) execution time for a computer A (i.e., the lesser the execution time the higher the performance).
$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{1/\text{Execution}_A}{1/\text{Execution}_B} = \frac{\text{Execution}_B}{\text{Execution}_A} = n$ | Relates the performance of computers A and B. It denotes that computer A is n times faster than B.
$\text{CPU time} = \text{clock cycles} \times \text{cycle time}$ | Relates the clock cycles (how many clock cycles?) and cycle time (how long is each cycle?).
$\text{CPU time} = \frac{\text{clock cycles}}{\text{clock rate}}$ | Relates clock cycles and clock rate. This is due to cycle time (period) and clock rate (frequency) being inverse of each other. Since cycle time is measured in seconds, the clock rate is 1/seconds = Hertz.
$\text{clock cycles} = \text{instruction count} \times \text{avg CPI}$ | Relates the instruction count (how many instructions?) and the average CPI (cycles per instruction).
$\text{CPU time} = \text{instruction count} \times CPI \times \text{clock cycle time}$ | Relates instruction count, CPI, and cycle time.

Assembly Programming Language


Machine language is the code that a computer can execute directly. It consists of binary values
(often written in hexadecimal) that are impractical for a human to read or write, yet are what the
computer actually runs. For this reason, machine language is considered a low-level language.

The next rung on the ladder is assembly language, which can be read by a human but isn't directly
understood by the CPU. Apart from pseudoinstructions, each assembly instruction corresponds to exactly
one machine instruction, and an assembler converts assembly code into machine code. Pseudoinstructions
are convenience instructions that the assembler accepts even though the hardware does not implement
them; the assembler translates each one into one or more real machine instructions.

Lastly, the compiler and the interpreter take a high-level language and convert it into a form that can
eventually be run as machine language. Although similar, they differ in an important way. An
interpreter executes the program directly from its source or an intermediate form (the Java virtual
machine interpreting bytecode, for example), while a compiler translates the high-level code into
machine language ahead of time (C++, for example).

From high-level languages to machine code:


1. Compiler/Interpreter
2. Assembler/Pseudo instructions
3. Machine Language
Intro to MIPS Assembly Language
To command a computer’s hardware, you must speak its language. The words of a computer’s
language are called instructions, and its vocabulary is called an instruction set. MIPS is an
instruction set that comes from MIPS Technologies, and is an elegant example of the instruction sets
designed since the 1980s. Among other popular instruction sets there are:
● ARMv7 is similar to MIPS. More than 9 billion chips with ARM processors were manufactured in
2011, making it the most popular instruction set in the world.
● The second example is the Intel x86, which powers both the PC and the cloud of the PostPC
Era.
● The third example is ARMv8, which extends the address size of the ARMv7 from 32 bits to 64
bits. Ironically, as we shall see, this 2013 instruction set is closer to MIPS than it is to ARMv7.

A computer architecture is a set of rules and methods that describe the functionality, organization,
and implementation of computer systems. Some definitions of architecture define it as describing the
capabilities and programming model of a computer but not a particular implementation. MIPS is a
reduced instruction set computer (RISC) architecture, and it is the one used in this course.

Operands and Memory Access


Operands of arithmetic instructions are restricted: they must come from a limited number of special
locations built directly in hardware called registers. The size of a register in the MIPS architecture is 32
bits; a group of 32 bits is called a word, a natural unit of access in a computer.

Arithmetic Operands: add and sub


A MIPS arithmetic instruction operates on two source operands and places the result in one destination
operand. The add instruction adds the contents of two registers and places the result into another one.
Likewise, the sub instruction subtracts the contents of one register from another.

add $s0, $s1, $s2 # s0 = s1 + s2


sub $s0, $s1, $s2 # s0 = s1 - s2

Example: Consider the C statement f = (g + h) - (i + j);. The variables f, g, h, i, and j are
assigned to the registers $s0, $s1, $s2, $s3, and $s4, respectively. What is the compiled MIPS code?

1. The first MIPS instruction calculates g + h. The result must be placed somewhere, so the compiler
creates a temporary variable $t0 and places it there.

2. Next we calculate i + j by placing the sum into a temporary variable $t1.


3. Finally, the sub instruction subtracts the second sum from the first and places the difference in the
register $s0 (i.e., variable f):

Thus:

add $t0, $s1, $s2 # t0 = s1 + s2
add $t1, $s3, $s4 # t1 = s3 + s4
sub $s0, $t0, $t1 # s0 = t0 - t1

are the MIPS instructions for f = (g + h) - (i + j);

Constant/Immediate Operands: addi


Many times a program will use constants in an operation (e.g., incrementing an index to point to the
next element of an array). In such cases we could use the arithmetic instructions we've seen so far but
we would need to load the constants from memory, which can be avoided by using versions of the
arithmetic instructions in which an operand is a constant.

Example: To add the constant 4 to register $s3, we could use the code

lw $t0, AddrConstant4($s1) # t0 = constant 4


add $s3,$s3,$t0 # s3 = s3 + t0

assuming that $s1 + AddrConstant4 is the memory address of the constant 4.

The same operation can be achieved by using addi (add immediate) instead:

addi $s3, $s3, 4 # s3 = s3 + 4

By including constants inside arithmetic instructions, operations are much faster and use less energy
than if constants were loaded from memory. The constant zero has another role, which is to simplify the
instruction set by offering useful variations. For example, the move operation is just an add instruction
where one operand is zero. Hence, MIPS dedicates a register $zero to be hard-wired to the value zero.
(As you might expect, it is register number 0.)

NOTE: Since addi accepts negative constants, MIPS has no need for a subtract immediate instruction
(there is no subi); subtracting a constant is done by adding its negative.
Memory Operands: lw and sw
Arithmetic operations occur only on registers in MIPS instructions; thus, MIPS must include instructions
that transfer data between memory and registers. These instructions are called data transfer
instructions. To access a word in memory, the instruction must supply the memory address. Memory is
just a large, single-dimensional array, with the address acting as the index to that array, starting at 0.
For example, given the layout down below, the address of the third data element is 2, and the value of
Memory[2] is C:

Address | 0 | 1 | 2 | 3 | ...
Data | A | B | C | 4 | ...

lw
lw (stands for load word) - This MIPS instruction copies data from memory to a register, and is
traditionally called load. The format of the load instruction is the name of the operation followed by the
register to be loaded with the memory's data, then a constant (i.e., offset) and register used to access
memory (i.e., base register). The sum of the constant portion of the instruction and the contents of the
second register forms the memory address.

lw $s1, 0($s0) # loads the word at memory address s0 + 0 into s1
lw $s1, 16($s0) # loads the word at memory address s0 + 16 into s1

MIPS memory is byte-addressable and a word is 4 bytes, so the addresses of consecutive words differ
by 4. The offsets used with lw and sw are therefore multiples of 4: each step of 4 bytes moves to the
next word.

Example: Let’s assume that A is an array of 100 words and that the compiler has associated the
variables g and h with the registers $s1 and $s2 as before. Let’s also assume that the starting address,
or base address, of the array is in $s3. Compile the C assignment statement g = h + A[8];.

1. We must first transfer A[8] into a register. The address of this array element is the sum of the
base of the array A, found in register $s3, plus the number to select element 8. The data should
be placed in a temporary register for use in the next instruction.

2. Next we add h to A[8] (already in register $t0) and place the sum into the register corresponding
to g.
Thus we get:

lw $t0, 32($s3) # t0 = A[8]


add $s1, $s2, $t0 # g = h + A[8]
Notice that the proper offset added to the base register $s3 is 4 bytes × 8 = 32 bytes, and not simply
8. If we used 8($s3), we would select A[8/4] = A[2] (remember that each word occupies 4 bytes) and not
the correct A[8].
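The relationship between an array index and its byte offset can be sketched in Python. This is only an illustration of the arithmetic, assuming 4-byte words as in the text. The helper names `word_offset` and `element_selected` are made up for this example.

```python
WORD_SIZE = 4  # bytes per MIPS word

def word_offset(index):
    """Byte offset to place in lw/sw to reach element `index` of a word array."""
    return index * WORD_SIZE

def element_selected(byte_offset):
    """Array element that a given byte offset from the base address refers to."""
    if byte_offset % WORD_SIZE != 0:
        raise ValueError("offset is not word-aligned")
    return byte_offset // WORD_SIZE

print(word_offset(8))       # offset needed for A[8]
print(element_selected(8))  # element reached by mistakenly using 8($s3)
```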

sw
sw (stands for store word) - Traditionally called store, this instruction copies data from a register to
memory. The format of a store is similar to that of a load.

sw $s1, 0($s0) # stores register s1 into the word at memory address s0 + 0
sw $s1, 16($s0) # stores register s1 into the word at memory address s0 + 16

Example: Assume variable h is associated with register $s2 and the base address of the array A is in
$s3. What is the MIPS assembly code for the C assignment statement A[12] = h + A[8];?

1. Transfer A[8] into a register (we use $t0).


2. Add h and A[8] into a register (we reuse $t0).
3. Store the sum into A[12].

Thus we have:

lw $t0, 32($s3) # t0 = A[8]


add $t0, $s2, $t0 # t0 = h + A[8]
sw $t0, 48($s3) # A[12] = t0

NOTE: Load word and store word are the instructions that copy words between memory and registers
in the MIPS architecture. Other brands of computers use other instructions along with load and store to
transfer data.

More Examples
Example: Compile x = arr[3] into MIPS assembly code.

# Allocate $s1 = x and $s2 = base of arr

lw $s1, 12($s2) # x = arr[3] (offset 3 × 4 = 12 bytes)

[Figure: memory layout of arr, not reproduced here]
Here we assume that registers $s1 and $s2 store the variable x and the base of array arr, respectively.
Again, the load word instruction follows the format lw REG, offset(MEM), which loads the word stored at
the memory address held in MEM into REG (in the CPU). MEM is itself a register, but the unsigned
integer it holds is interpreted as a memory address.

The part offset(MEM) is interpreted as MEM + offset. We're basically given the base address of the
array and we must use the offset to access the appropriate element. In this example, we're accessing
the element at position 3 and since each element (i.e., an integer) occupies 4 bytes in memory, we
move 3 × 4 bytes away from the base address (see figure above). Thus, to access the element at position
3, we load 12 bytes away from the base address of the array. To access the element at position 0, we load
0 bytes "away" from the base address (i.e., we don't move since we're at the correct address after all).

TRIVIA: Most high-level programming languages owe the convention of 0-indexing arrays to the way
memory is accessed.
Remember that the first element of an array is at position 0, the second at position 1, and so on.
Therefore,

arr[0]    lw $s1, 0($s2)
arr[1]    lw $s1, 4($s2)
arr[2]    lw $s1, 8($s2)
arr[3]    lw $s1, 12($s2)
arr[4]    lw $s1, 16($s2)
arr[5]    lw $s1, 20($s2)

assuming that the array is large enough to access those elements. Notice the increment by 4 bytes:
integers in MIPS are 4 bytes = 32 bits, so each one occupies a 4-byte slot in memory.

NOTE: The offset in a data transfer instruction (such as load word and store word) is always a constant
integer, a 16-bit signed integer to be precise. Thus, you cannot use a register as an offset for those
instructions. If you did, the result wouldn't be a legal MIPS instruction.
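A 16-bit signed integer ranges from -32768 to 32767, which bounds the constant offsets that lw and sw can encode. The Python sketch below checks whether a given array index can be reached with a constant offset alone. The helper names are hypothetical and the 4-byte word size is the one assumed throughout this guide.

```python
OFFSET_MIN = -(1 << 15)     # -32768, smallest 16-bit signed value
OFFSET_MAX = (1 << 15) - 1  # 32767, largest 16-bit signed value

def fits_in_offset(byte_offset):
    """True if the byte offset can be encoded in the 16-bit offset field."""
    return OFFSET_MIN <= byte_offset <= OFFSET_MAX

def constant_offset_for(index, word_size=4):
    """Byte offset for a constant array index, or an error if it cannot be encoded."""
    offset = index * word_size
    if not fits_in_offset(offset):
        raise ValueError(f"offset {offset} does not fit in 16 signed bits")
    return offset

print(constant_offset_for(3))   # offset for arr[3]
print(fits_in_offset(40_000))   # too large for a single lw/sw offset
```

When the index is a variable, or the offset is out of range, the address has to be built in a register with add instructions first.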

Example: Compile arr[2*k + 3] = arr[k + 5] + 6 into MIPS assembly code.

Here, instead of accessing the array with just a constant, we're also using a variable. The process is
almost the same. For instance, for arr[k + 5], we have 4 bytes × (k + 5) = 4k + 20 bytes. This
means that we move 4k bytes away from the base address of arr, and then we use an offset of
20 bytes.

Another difference is that now we're also storing from a register into memory, so we must use the store
word instruction. Again, the store word instruction follows the format sw REG, offset(MEM), which
stores the contents of register REG (CPU) at the memory address MEM + offset. For arr[2*k + 3], we have
4 bytes × (2k + 3) = 8k + 12 bytes. Thus, we move 8k bytes away from the base address, and
then use an offset of 12 bytes.

# Allocate:
# $s1 = k
# $s4 = (base add. of) arr

add $t1, $s1, $s1 # t1 = k + k = 2k
add $t1, $t1, $t1 # t1 = 2k + 2k = 4k
add $t2, $s4, $t1 # t2 = base add. + 4k = add. of arr[k]
lw $t3, 20($t2) # t3 = load from (add. of arr[k] + 20) = arr[k + 5]
addi $t3, $t3, 6 # t3 = arr[k + 5] + 6
add $t4, $t2, $t1 # t4 = add. of arr[k] + 4k = base add. + 8k = add. of arr[2k]
sw $t3, 12($t4) # arr[2k + 3] = arr[k + 5] + 6

Notice that with add $t4, $t2, $t1, we reused registers $t2 and $t1, which already held the address
of arr[k] and 4k respectively, to get the address of arr[2k]. To demonstrate that we get the same result
as above, let's do it again by building up the address of arr[2k] from scratch:

# Allocate: $s1 = k, $s4 = (base add. of) arr

add $t1, $s1, $s1 # t1 = k + k = 2k
add $t1, $t1, $t1 # t1 = 2k + 2k = 4k
add $t1, $s4, $t1 # t1 = base add. + 4k = add. of arr[k]
lw $t2, 20($t1) # t2 = arr[k + 5]
addi $t2, $t2, 6 # t2 = arr[k + 5] + 6
add $t3, $s1, $s1 # t3 = k + k = 2k
add $t3, $t3, $t3 # t3 = 2k + 2k = 4k
add $t3, $t3, $t3 # t3 = 4k + 4k = 8k
add $t4, $s4, $t3 # t4 = base add. + 8k = add. of arr[2k]
sw $t2, 12($t4) # arr[2k + 3] = arr[k + 5] + 6 (the value is in t2, not t3)

We obviously wrote more instructions here but we get the same result nonetheless.
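One way to convince yourself that such a sequence does what its comments claim is to trace it on a toy model. The Python sketch below is a minimal, hypothetical interpreter for only the four instructions used here (add, addi, lw, sw). It runs the shorter seven-instruction sequence for arr[2*k + 3] = arr[k + 5] + 6. The base address, the value of k, and the array contents are made-up test values. The model ignores 32-bit overflow.

```python
def run(program, regs, mem):
    """Run (op, a, b, c) tuples; mem maps byte addresses to word values."""
    for op, a, b, c in program:
        if op == "add":        # add a, b, c   ->  a = b + c
            regs[a] = regs[b] + regs[c]
        elif op == "addi":     # addi a, b, c  ->  a = b + constant c
            regs[a] = regs[b] + c
        elif op == "lw":       # lw a, b(c)    ->  a = Memory[c + b]
            regs[a] = mem[regs[c] + b]
        elif op == "sw":       # sw a, b(c)    ->  Memory[c + b] = a
            mem[regs[c] + b] = regs[a]
        else:
            raise ValueError(f"unsupported instruction: {op}")

# arr[2*k + 3] = arr[k + 5] + 6, with $s1 = k and $s4 = base address of arr
PROGRAM = [
    ("add",  "$t1", "$s1", "$s1"),   # t1 = 2k
    ("add",  "$t1", "$t1", "$t1"),   # t1 = 4k
    ("add",  "$t2", "$s4", "$t1"),   # t2 = address of arr[k]
    ("lw",   "$t3", 20, "$t2"),      # t3 = arr[k + 5]
    ("addi", "$t3", "$t3", 6),       # t3 = arr[k + 5] + 6
    ("add",  "$t4", "$t2", "$t1"),   # t4 = address of arr[2k]
    ("sw",   "$t3", 12, "$t4"),      # arr[2k + 3] = t3
]

BASE, K = 0x1000, 3                               # hypothetical test values
mem = {BASE + 4 * i: 10 * i for i in range(16)}   # arr[i] = 10 * i
regs = {"$s1": K, "$s4": BASE}
run(PROGRAM, regs, mem)

print(mem[BASE + 4 * (2 * K + 3)])  # arr[9] after the store: arr[8] + 6 = 86
```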

Example: Compile *p = a[n] - (*p) into MIPS assembly code.

# Allocate:
# $s0 = n
# $s1 = (base add. of) a
# $s2 = p

add $t1, $s0, $s0 # t1 = n + n = 2n
add $t1, $t1, $t1 # t1 = 2n + 2n = 4n
add $t1, $s1, $t1 # t1 = base add. of a + 4n = add. of a[n]
lw $t1, 0($t1) # t1 = a[n]
lw $t2, 0($s2) # t2 = *p
sub $t1, $t1, $t2 # t1 = a[n] - (*p)
sw $t1, 0($s2) # *p = a[n] - (*p)

Example: Compile A[3i + 4] = B[2i] + C[i+1] into MIPS assembly code.

# Allocate: $s0 = i, $s1 = A, $s2 = B, $s3 = C

add $t0, $s0, $s0 # t0 = i + i = 2i
add $t0, $t0, $t0 # t0 = 2i + 2i = 4i
add $t1, $s3, $t0 # t1 = add. of C[i]
lw $t1, 4($t1) # t1 = C[i + 1]
add $t0, $t0, $t0 # t0 = 4i + 4i = 8i
add $t2, $s2, $t0 # t2 = add. of B[2i]
lw $t2, 0($t2) # t2 = B[2i]
add $t3, $s0, $s0 # t3 = i + i = 2i
add $t3, $t3, $t3 # t3 = 2i + 2i = 4i
add $t0, $t0, $t3 # t0 = 8i + 4i = 12i
add $t3, $s1, $t0 # t3 = add. of A[3i]
add $t1, $t2, $t1 # t1 = B[2i] + C[i + 1]
sw $t1, 16($t3) # A[3i + 4] = B[2i] + C[i + 1]
