Csci 260 Study Guide-2
\[ \frac{5{,}000}{5{,}000 + 25{,}000} = 16.7\% \text{ floating point instructions} \]
\[ \frac{25{,}000}{5{,}000 + 25{,}000} = 83.3\% \text{ integer instructions} \]
Thus, \( CPI_A = (0.167)(7) + (0.833)(1) \approx 2.0 \) cycles/instruction.
● Find the total cycles and the total instructions in the whole program, and
then find the ratio of cycles per instruction.
There are 60,000 cycles and 25,000 + 5,000 = 30,000 instructions in the
whole program. Thus,
\[ \frac{60{,}000 \text{ cycles}}{30{,}000 \text{ instructions}} = 2 \text{ cycles/instruction}. \]
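\[ \frac{100{,}000}{100{,}000 + 50{,}000} = \frac{10}{15} = \frac{2}{3} \text{ floating point instructions} \]
\[ \frac{50{,}000}{100{,}000 + 50{,}000} = \frac{5}{15} = \frac{1}{3} \text{ integer instructions} \]
Thus, \( CPI = \frac{2}{3}(7) + \frac{1}{3}(1) = \frac{15}{3} = 5 \) cycles/instruction.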
To use this method, we must first calculate the number of clock cycles for
Program 2. This is given by
\[ \text{clock cycles}_{fp} + \text{clock cycles}_{int} = 7(100{,}000) + 1(50{,}000) = 750{,}000 \text{ cycles} \]
in total.
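To finish this method, the total cycles are divided by the total instruction count. The step below is a sketch that takes the instruction count of Program 2 to be 100,000 + 50,000 = 150,000, as used elsewhere in this guide:

\[ CPI = \frac{750{,}000 \text{ cycles}}{150{,}000 \text{ instructions}} = 5 \text{ cycles/instruction} \]

This agrees with the weighted-average result for the same program.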
4. Processor B has an average CPI for Program 2 of 3.5. Its clock rate is 1.8 GHz. How
much time does it take to execute the program?
Here we're given the instruction count (from previous problem), the CPI and the
cycle time (in the form of the clock rate). Thus,
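\[ \text{CPU time} = \text{instruction count} \times CPI \times \text{clock cycle time} \]
\[ \text{CPU time} = 150{,}000 \times 3.5 \times \frac{1}{1.8 \times 10^{9} \text{ hertz}} = \frac{291{,}667}{10^{9}} \text{ s} \approx 291.7\ \mu\text{s} \]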
Power Wall
The power wall refers to the limit that power consumption and heat dissipation place on raising a
processor's clock frequency. In other words, the electric energy consumed by a chip becomes the
limiting factor for further frequency increases. If a chip draws too much power, it overheats and
its transistors malfunction. Lowering the supply voltage reduces power, but if the voltage drops
too far, the transistors start to leak current.
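For reference, the dynamic power of a CMOS chip is commonly approximated by the relation below. The symbols are not defined in this guide and are introduced here as assumptions: \(C\) is the capacitive load, \(V\) the supply voltage, and \(f\) the switching frequency.

\[ P_{dynamic} \propto \frac{1}{2} \times C \times V^{2} \times f \]

Because power grows with the square of the voltage, lowering \(V\) was the main way to hold power steady while \(f\) rose, until leakage made further reductions impractical.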
CPU Benchmark
A benchmark is a test program used to compare the performance of computers, specifically how well
the same program runs on each computer (for example, by comparing execution time or CPI).
Summary
Formula | Meaning
\( \text{Performance}_A = \frac{1}{\text{Execution}_A} \) | Relates the performance and (CPU) execution time for a computer A (i.e., the lower the execution time, the higher the performance).
\( \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{1/\text{Execution}_A}{1/\text{Execution}_B} = \frac{\text{Execution}_B}{\text{Execution}_A} = n \) | Relates the performance of computers A and B. It denotes that computer A is n times faster than B.
\( \text{CPU time} = \text{clock cycles} \times \text{cycle time} \) | Relates the clock cycles (how many clock cycles?) and cycle time (how long is each cycle?).
\( \text{CPU time} = \frac{\text{clock cycles}}{\text{clock rate}} \) | Relates clock cycles and clock rate. This is due to cycle time (period) and clock rate (frequency) being inverses of each other. Since cycle time is measured in seconds, the clock rate is measured in 1/seconds = hertz.
\( \text{clock cycles} = \text{instruction count} \times \text{avg CPI} \) | Relates the instruction count (how many instructions?) and the average CPI (cycles per instruction).
\( \text{CPU time} = \text{instruction count} \times CPI \times \text{clock cycle time} \) | Relates instruction count, CPI, and cycle time.
The next rung on the ladder is assembly language, which can be read by a human but isn't directly
understood by the CPU. Assembly instructions correspond essentially 1:1 with machine instructions and
are converted into machine code by an assembler. Pseudoinstructions are convenience instructions that
the assembler accepts even though the hardware does not implement them; the assembler translates each
one into one or more real machine instructions (move, for example, is assembled as an add with $zero).
Lastly, the compiler and the interpreter each take a high-level language and convert it into a form
that can eventually be run as machine language. Although similar, they differ in how they do it. An
interpreter executes the program from its source (or an intermediate) form as it goes (Java bytecode
on the JVM, for example), while a compiler translates the whole high-level program into machine
language ahead of time (C++, for example).
A computer architecture is a set of rules and methods that describe the functionality, organization,
and implementation of computer systems. Some definitions of architecture define it as describing the
capabilities and programming model of a computer but not a particular implementation. The MIPS
architecture is a computer architecture built around a reduced instruction set (RISC), and it is
the architecture used in this course.
1. The first MIPS instruction calculates g + h. The result must be placed somewhere, so the compiler
creates a temporary variable $t0 and places it there.
Thus:
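A minimal sketch of this first instruction, assuming g and h have been assigned to registers $s1 and $s2 (the allocation used in the later examples of this guide):

```mips
add  $t0, $s1, $s2     # $t0 = g + h  (assumes g in $s1, h in $s2)
```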
Example: To add the constant 4 to register $s3, we could use the code
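A sketch of the load-then-add version follows. It assumes the constant 4 has already been placed in memory at a hypothetical offset named AddrConstant4 from a base address held in $s1, and it uses $t0 as a scratch register:

```mips
lw   $t0, AddrConstant4($s1)   # $t0 = constant 4 (AddrConstant4 is a hypothetical label)
add  $s3, $s3, $t0             # $s3 = $s3 + $t0, i.e. $s3 + 4
```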
The same operation can be achieved by using addi (add immediate) instead:
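A minimal sketch of the add immediate form, where the constant is encoded in the instruction itself and no memory access is needed:

```mips
addi $s3, $s3, 4       # $s3 = $s3 + 4
```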
By including constants inside arithmetic instructions, operations are much faster and use less energy
than if constants were loaded from memory. The constant zero has another role, which is to simplify the
instruction set by offering useful variations. For example, the move operation is just an add instruction
where one operand is zero. Hence, MIPS dedicates a register $zero to be hard-wired to the value zero.
(As you might expect, it is register number 0.)
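As an illustration of the move case, copying one register into another can be written as an add with $zero. The registers $t0 and $s1 here are arbitrary choices for the sketch:

```mips
add  $t0, $s1, $zero   # $t0 = $s1 + 0, i.e. copy $s1 into $t0
```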
NOTE: Since addi accepts negative constants, MIPS does not need a subtract immediate instruction
(there is no subi).
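For instance, subtracting 4 from a register can be sketched with a negative immediate ($s3 is an arbitrary register chosen for illustration):

```mips
addi $s3, $s3, -4      # $s3 = $s3 - 4
```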
Memory Operands: lw and sw
Arithmetic operations occur only on registers in MIPS instructions; thus, MIPS must include instructions
that transfer data between memory and registers. These instructions are called data transfer
instructions. To access a word in memory, the instruction must supply the memory address. Memory is
just a large, single-dimensional array, with the address acting as the index to that array, starting at 0.
For example, given the layout below, the address of the third data element is 2, and the value of
Memory[2] is C:
Address | 0 | 1 | 2 | 3 | ...
Data | A | B | C | 4 | ...
lw
lw (stands for load word) - This MIPS instruction copies data from memory to a register, and is
traditionally called load. The format of the load instruction is the name of the operation followed by the
register to be loaded with the memory's data, then a constant (i.e., offset) and register used to access
memory (i.e., base register). The sum of the constant portion of the instruction and the contents of the
second register forms the memory address.
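In actual MIPS, memory is byte-addressable: each address identifies a single byte, so consecutive
words are 4 bytes apart. Word addresses, and therefore word offsets, are multiples of 4, and moving
to the next word increases the address by 4.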
Example: Let’s assume that A is an array of 100 words and that the compiler has associated the
variables g and h with the registers $s1 and $s2 as before. Let’s also assume that the starting address,
or base address, of the array is in $s3. Compile the C assignment statement g = h + A[8];.
1. We must first transfer A[8] into a register. The address of this array element is the sum of the
base of the array A, found in register $s3, plus the number to select element 8. The data should
be placed in a temporary register for use in the next instruction.
2. Next we add h to A[8] (already in register $t0) and place the sum into the register corresponding
to g.
Thus we get:
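A sketch of the two instructions described in the steps above, assuming byte addressing so that element 8 sits 8 × 4 = 32 bytes past the base address in $s3:

```mips
lw   $t0, 32($s3)      # $t0 = A[8]; offset = 8 * 4 bytes
add  $s1, $s2, $t0     # g = h + A[8]  (g in $s1, h in $s2)
```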
sw
sw (stands for store word) - Traditionally called store, this instruction copies data from a register to
memory. The format of a store is similar to that of a load.
Example: Assume variable h is associated with register $s2 and the base address of the array A is in
$s3. What is the MIPS assembly code for the C assignment statement A[12] = h + A[8];?
Thus we have:
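A sketch under the stated allocation (h in $s2, base of A in $s3), assuming $t0 as the temporary register and byte offsets of 8 × 4 = 32 and 12 × 4 = 48:

```mips
lw   $t0, 32($s3)      # $t0 = A[8]
add  $t0, $s2, $t0     # $t0 = h + A[8]
sw   $t0, 48($s3)      # A[12] = h + A[8]; offset = 12 * 4 bytes
```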
NOTE: Load word and store word are the instructions that copy words between memory and registers
in the MIPS architecture. Other brands of computers use other instructions along with load and store to
transfer data.
More Examples
Example: Compile x = arr[3] into MIPS assembly code.
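One way to write this as a single load, assuming x is kept in $s1 and the base address of arr in $s2, with each element occupying 4 bytes:

```mips
lw   $s1, 12($s2)      # x = arr[3]; offset = 3 * 4 bytes
```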
Graphically:
Here we assume that registers $s1 and $s2 store the variable x and the base of array arr, respectively.
Again, the load word instruction follows the format lw REG, offset(MEM), which loads the contents of
the memory address held in MEM (from memory) into REG (CPU). MEM is itself a register, but the
unsigned integer it holds is a memory address.
The part offset(MEM) is interpreted as MEM + offset. We're basically given the base address of the
array and we must use the offset to access the appropriate element. In this example, we're accessing
the element at position 3, and since each element (i.e., an integer) occupies 4 bytes in memory, we
move 3 × 4 = 12 bytes away from the base address (see figure above). Thus, to access the element at
position 3, we load from 12 bytes past the base address of the array. To access the element at position 0,
we load 0 bytes "away" from the base address (i.e., we don't move since we're already at the correct address).
TRIVIA: Most high-level programming languages owe the convention of 0-indexing arrays to the way
memory is accessed.
Remember that the first element of an array is at position 0, the second at position 1, and so on.
Therefore,
NOTE: The offset in a data transfer instruction (such as load word and store word) is always a constant
integer, a 16-bit signed integer to be precise. Thus, you cannot use a register as the offset in those
instructions; doing so would not be a legal MIPS instruction.
Here, instead of accessing the array with just a constant, we're also using a variable. The process is
almost the same. For instance, for arr[k + 5] we have 4 bytes × (k + 5) = 4k + 20 bytes. This
means that we move 4k bytes away from the base address, namely arr, and then we use an offset of
20 bytes.
Another difference is that now we're also storing from a register into memory, so we must use the store
word instruction. Again, the store word instruction follows the format sw REG, offset(MEM), which
stores the contents of register REG (CPU) at the memory address MEM + offset. For arr[2*k + 3], we have
4 bytes × (2k + 3) = 8k + 12 bytes. Thus, we move 8k bytes away from the base address, and
then use an offset of 12 bytes.
# Allocate:
# $s1 = k
# $s4 = (base add. of) arr
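With that allocation, one possible instruction sequence is sketched below. It assumes the statement being compiled is arr[2*k + 3] = arr[k + 5], which combines the two accesses discussed above, and it uses $t1, $t2, and $t4 in the roles the text gives them. $t3 is a hypothetical choice for the loaded value.

```mips
# Assumed statement: arr[2*k + 3] = arr[k + 5]
sll  $t1, $s1, 2       # $t1 = 4k
add  $t2, $s4, $t1     # $t2 = address of arr[k]
lw   $t3, 20($t2)      # $t3 = arr[k + 5]; offset = 5 * 4 bytes
add  $t4, $t2, $t1     # $t4 = address of arr[2k]  (arr + 4k + 4k)
sw   $t3, 12($t4)      # arr[2k + 3] = $t3; offset = 3 * 4 bytes
```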
Notice that with add $t4, $t2, $t1, we reused registers $t2 and $t1, which already held the address
of arr[k] and the value 4k respectively, to get the address of arr[2k]. To demonstrate that we get the
same result as above, let's build up the address of arr[2k] right from zero:
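A sketch of that from-scratch computation, assuming k is in $s1 and the base address of arr is in $s4, and shifting left by 3 to multiply k by 8:

```mips
sll  $t4, $s1, 3       # $t4 = 8k
add  $t4, $s4, $t4     # $t4 = arr + 8k = address of arr[2k]
```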
We obviously wrote more instructions here but we get the same result nonetheless.
# Allocate:
# $s0 = n
# $s1 = (base add. of) a
# $s2 = p