Computer Language
add a, b, c    # a = b + c
(operation, operands, comment)
Eg: add t0, g, h    # t0 = g + h
    add t1, i, j    # t1 = i + j
    sub f, t0, t1   # f = t0 - t1
-> equivalent C code: f = (g + h) - (i + j)
+> ?: why not have instructions with 4 or 5 inputs? - DP1 (Design Principle 1): Simplicity favors regularity
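The three-instruction sequence above can be traced with a small sketch. It assumes a hypothetical dictionary standing in for the registers and variables; the helper names (add, sub, compute_f) are illustrative only and are not part of MIPS.

```python
def add(regs, dst, src1, src2):
    """Model of `add dst, src1, src2`: dst = src1 + src2."""
    regs[dst] = regs[src1] + regs[src2]


def sub(regs, dst, src1, src2):
    """Model of `sub dst, src1, src2`: dst = src1 - src2."""
    regs[dst] = regs[src1] - regs[src2]


def compute_f(g, h, i, j):
    """Run the three-instruction sequence and return f = (g + h) - (i + j)."""
    regs = {"g": g, "h": h, "i": i, "j": j}  # assumed storage, for illustration
    add(regs, "t0", "g", "h")    # t0 = g + h
    add(regs, "t1", "i", "j")    # t1 = i + j
    sub(regs, "f", "t0", "t1")   # f = t0 - t1
    return regs["f"]
```

Every instruction takes exactly two sources and one destination, so one C expression with four inputs becomes three regular instructions and two temporaries.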
- Operands
+ Object of operation
. Source operand: provides input data
. Destination operand: stores the result of operation
+ MIPS operands
. Registers
. Memory locations
. constant/Immediate
+ A larger register file might seem better: more flexibility for CPU operation
. Moore's law: the number of transistors doubles about every 18 months
. ?: why only 32 registers, not more? - DP2 (Design Principle 2): Smaller is faster
-> Effective use of the register file is critical
- Memory operand
+ Data stored in the computer's main memory:
. Large size
. Outside the CPU -> slower than registers
+ Operations with memory operand
. Load values from memory to register
. Store result from register to memory
+ Sample instruction
lw $t0, 32($s3)    # $t0 = A[8]
# do something
#
sw $t0, 48($s3)    # A[12] = $t0
int A[1000];       // $s3 -> base address of A; each element is a 4-byte word
$s3 + 32 -> A[8]
$s3 + 48 -> A[12]
A[12] = A[8]
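The offsets 32 and 48 come from multiplying the array index by the 4-byte word size. A minimal sketch of that arithmetic follows, assuming memory is modelled as a dictionary from byte address to word value; byte_offset and copy_word are hypothetical helper names.

```python
WORD_SIZE = 4  # bytes per MIPS word


def byte_offset(index, word_size=WORD_SIZE):
    """Byte offset of A[index] from the base address of a word array."""
    return index * word_size


def copy_word(memory, base, src_offset, dst_offset):
    """Mimic lw followed by sw through a temporary register."""
    t0 = memory[base + src_offset]    # lw $t0, src_offset($s3)
    memory[base + dst_offset] = t0    # sw $t0, dst_offset($s3)


# Assumed base address 1000 plays the role of $s3.
base = 1000
memory = {base + byte_offset(8): 7, base + byte_offset(12): 0}
copy_word(memory, base, byte_offset(8), byte_offset(12))  # A[12] = A[8]
```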
+ Byte Addresses
. Big Endian: the leftmost (most significant) byte is stored at the word address
. Little Endian: the rightmost (least significant) byte is stored at the word address
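The two byte orders can be compared with Python's built-in int.to_bytes. This is only an illustration of the ordering; the sample value 0x0A0B0C0D and the helper name word_bytes are assumptions chosen for the example.

```python
def word_bytes(value, byteorder):
    """Return the 4 bytes of a 32-bit word, lowest memory address first."""
    return list(value.to_bytes(4, byteorder))


big = word_bytes(0x0A0B0C0D, "big")        # most significant byte at the word address
little = word_bytes(0x0A0B0C0D, "little")  # least significant byte at the word address
```

In the big-endian list the first element is 0x0A (the leftmost byte of the word); in the little-endian list it is 0x0D (the rightmost byte).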
+ Immediate operand
. ?: most frequently used constant? - 0, available in register $zero - why? DP3 (Design Principle 3): Make the common case fast
- Instruction set
+ 3 formats:
. Register (R)
. Immediate (I)
. Jump (J)
+ R-instruction: all operands are registers
+ I-instruction: one operand is an immediate
+ J-instruction: unconditional jump (branch)
* note: all are 32 bits long
+> ?: why not only one format? - DP4 (Design Principle 4): Good design demands good compromises
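To show how an instruction fits into a fixed 32-bit word, here is a sketch that packs an R-format instruction. The field widths (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6) are the standard MIPS32 R-format layout, which these notes do not spell out; encode_r is a hypothetical helper name.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R-format fields into a single 32-bit instruction word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# add $t0, $s1, $s2: register numbers $s1 = 17, $s2 = 18, $t0 = 8; funct 0x20 = add
word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=0x20)
```

The I and J formats reuse the same 6-bit opcode position but spend the remaining bits on a 16-bit immediate or a 26-bit target, which is the compromise DP4 refers to.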
+ 5 types:
. Arithmetic: addition, subtraction
. Data transfer: move data between registers and memory, or load an immediate into a register
. Logical: and, or, shift
. Conditional branch
. Unconditional branch
+ Arithmetic operations
. MIPS arithmetic statements
add rd, rs, rt       # rd <- rs + rt
sub rd, rs, rt       # rd <- rs - rt
addi rt, rs, const   # rt <- rs + const
+ Logical operations
* Single cycle vs Pipeline
- Single cycle CPU: 800 x 10^6 (ps)
- Pipeline CPU: 200 x 10^6 + 4 x 200 (ps), where 4 x 200 ps is the (negligible) fill-up time
+ speed-up = about 4 times
+ CPI = 1
Eg:
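The timing figures above can be reproduced with a short calculation. This sketch assumes they describe 10^6 instructions, an 800 ps single-cycle clock, a 200 ps pipelined clock, and 4 fill-up cycles; that reading is inferred from the numbers, and the function names are hypothetical.

```python
def single_cycle_time_ps(n_instructions, cycle_ps):
    """Total time when every instruction takes one long cycle."""
    return n_instructions * cycle_ps


def pipelined_time_ps(n_instructions, stage_ps, fill_cycles):
    """Total time with one instruction finishing per cycle, plus pipeline fill-up."""
    return n_instructions * stage_ps + fill_cycles * stage_ps


single = single_cycle_time_ps(10**6, 800)   # 800 x 10^6 ps
piped = pipelined_time_ps(10**6, 200, 4)    # 200 x 10^6 + 4 x 200 ps
speedup = single / piped                    # slightly below 4
```

The fill-up term is a fixed cost, so its share of the total shrinks as the instruction count grows and the speed-up approaches 800 / 200 = 4.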
* Pipeline hazard
- Hazard: a situation that prevents starting the next instruction in the next cycle
+ structural: two different instructions attempt to use the same resource at the same time
+ data: an instruction attempts to use data before it is ready
+ control: an attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
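A data hazard of the read-after-write kind can be spotted by comparing register names. The sketch below is a simplification that only looks at two adjacent instructions and ignores forwarding; has_raw_hazard and its arguments are hypothetical names.

```python
def has_raw_hazard(producer_dest, consumer_sources):
    """True when the next instruction reads the register the previous one writes.

    Writes to $zero are ignored because that register always reads as 0.
    """
    return producer_dest != "$zero" and producer_dest in consumer_sources


# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3: $s0 is read before it is written back
dependent = has_raw_hazard("$s0", ["$s0", "$t3"])
# add $s0, $t0, $t1 followed by sub $t2, $t4, $t3: no shared register
independent = has_raw_hazard("$s0", ["$t4", "$t3"])
```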
* Pipeline stages and the hardware each one uses
IF: PC (Program Counter), instruction memory
ID: register file, control unit
EX: ALU
MEM: data memory
WB: register file
Cache: m blocks = 2^10
Mem: p blocks = 2^28
Memory block i -> cache block (i mod m)
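The modulo mapping can be sketched as below, with m = 2^10 cache blocks as in the figures above. The tag helper is an addition not stated in these notes: it is the usual way to tell apart memory blocks that land on the same cache block. The function names are hypothetical.

```python
NUM_CACHE_BLOCKS = 2**10  # m


def cache_index(mem_block, num_cache_blocks=NUM_CACHE_BLOCKS):
    """Direct-mapped placement: memory block i goes to cache block i mod m."""
    return mem_block % num_cache_blocks


def cache_tag(mem_block, num_cache_blocks=NUM_CACHE_BLOCKS):
    """Remaining upper part of the block number (assumed tag scheme)."""
    return mem_block // num_cache_blocks


# Memory blocks 5 and 5 + 1024 compete for the same cache block but have different tags.
same_slot = cache_index(5) == cache_index(5 + 1024)
```

Because m is a power of two, i mod m is just the low 10 bits of the block number, so no division hardware is needed.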
* Principle of Locality
- Temporal: many blocks in the cache
- Spatial: large block size
* How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
CPU: 32-bit address
Block size: 4 words = 16 bytes = 2^4 bytes
Cache data: 16 KB = 2^14 bytes
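One way to finish this calculation is sketched below. It assumes byte addressing, power-of-two sizes, and one valid bit per block in addition to the tag; the function name and parameters are hypothetical.

```python
def direct_mapped_total_bits(data_bytes, block_bytes, address_bits):
    """Total storage bits: data + tag + valid bit for every cache block."""
    num_blocks = data_bytes // block_bytes          # 2^14 / 2^4 = 2^10 blocks
    index_bits = num_blocks.bit_length() - 1        # log2 of a power of two
    offset_bits = block_bytes.bit_length() - 1      # byte offset within a block
    tag_bits = address_bits - index_bits - offset_bits
    valid_bits = 1                                  # assumed: one valid bit per block
    bits_per_block = block_bytes * 8 + tag_bits + valid_bits
    return num_blocks * bits_per_block


total = direct_mapped_total_bits(data_bytes=2**14, block_bytes=2**4, address_bits=32)
```

Under these assumptions each block holds 128 data bits, an 18-bit tag (32 - 10 - 4), and 1 valid bit, so the total is 2^10 x 147 bits, about 15% more than the 16 KB of data alone.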