Lecture Notes-Computer Architecture-Module 1
This module introduces computer architecture along with related concepts such as MIPS
instructions and computer performance. The topics covered are:
1. Introduction to computer architecture
2. Microprocessor, Microcontroller
3. Assemblers and Compilers
4. Application specific processor
5. Microblaze soft core RISC processor
6. Performance analysis of a computer
7. MIPS instructions
8. Instruction encoding formats
Microprocessor:
- Types: Reduced Instruction Set Computer (RISC) based - a computer with a small, highly
optimized set of instructions; and Complex Instruction Set Computer (CISC) based - a more
specialized set of instructions.
Microcontroller:
- Low power feature exists as it finds application in embedded systems.
- Examples: used in washing machines, MP3 players, and embedded systems.
Software Abstraction flow:
Below is a high level overview of where assembly language comes into play in MIPS
processors.
Assembler: The assembler converts assembly language into machine-level code.
Compiler: The compiler translates high-level programming language code into machine-level
code, typically by way of assembly language.
Next, we compare the performance of an FPGA based Application Specific Processor (ASP) with
that of a Microblaze soft core RISC processor embedded in an FPGA, including a comparative
analysis of their cost-performance ratio (CPR). Additionally, you will learn about the design
process of a performance-optimized, power-constrained ASP: a computation-intensive application
is converted into an actual Register Transfer Level (RTL) hardware design, and the same
application is also implemented on the Microblaze soft core RISC processor.
Algorithm/Application:
To compare the performance of the FPGA based ASP and the Microblaze soft core RISC processor
embedded in an FPGA, consider a sample application that produces a digital output value
according to the following function:
where 'B', 'Wc', 'F', 'Tp', and 'WB' are variables and '2' is a constant; together these are the
input vectors for the processor. 'Y(n)' is the output of the application.
● In the execution time expression, 'L' is the latency, 'Tc' is the cycle time and 'N' is the
number of processed data sets. The multiplexing scheme is then determined for constructing
the data path unit.
● Latency is the delay before a single output is produced, while cycle time is the difference
in delay between two consecutive outputs.
● The controller was then designed to provide the timing sequence to the data path. The
controller and the data path of the device were developed in the Xilinx Integrated
Software Environment (ISE) tool, version 9.2i.
● After synthesis and implementation, verification of the controller and the data path of the
device was carried out in the Xilinx ISE simulator.
● Simulation indicated that the designed processor, with its data path and controller, is in
compliance with the functional and timing specification.
● Analysis of the simulation results revealed that the ASP was successfully designed and
complies with all the technical specifications of the design problem.
● After implementation and verification of the processor, the bitstream was downloaded
to an xc3s500e-5fg320 Spartan 3E FPGA, on which the design ran successfully.
Analysis:
There are two different analyses:
a) The analysis of the FPGA based speedup comparison between the two processors
b) The cost performance ratio (CPR) analysis between the application specific processor
and the Microblaze soft core RISC processor.
Speed-up Analysis:
The performance of the ASP and the Microblaze soft core processor for the same application
was compared for a fixed set of processing data (N = 200), as shown below. The execution time
(TASP) of the ASP implementation is given by:
TASP = L + (N - 1) * Tc
● The value of 'L' is obtained after implementation of the ASP in Xilinx ISE. Here L =
19 cc, N = 200 and Tc = 19 cc. Therefore, TASP = 19 + (199 * 19) = 3800 clock cycles (cc).
● The execution time (TRISC) for N = 200 sets of data obtained with the Microblaze soft
core RISC processor is 935,946 clock cycles. Therefore, the speedup obtained (ASP vs.
Microblaze soft core RISC processor) is:
Speedup = TRISC / TASP = 935,946 / 3800 ≈ 246.3
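The speed-up arithmetic can be checked with a short Python sketch. The values of L, Tc, N and TRISC come from the notes; the formula T = L + (N - 1) * Tc is an assumption that reproduces the stated 3800 clock cycles, and all function and variable names are made up for illustration.

```python
# Values taken from the notes. The formula T = L + (N - 1) * Tc is an
# assumption chosen because it reproduces the stated 3800 clock cycles.
LATENCY_CC = 19        # L, in clock cycles
CYCLE_TIME_CC = 19     # Tc, in clock cycles
NUM_DATA_SETS = 200    # N
T_RISC_CC = 935_946    # Microblaze execution time, in clock cycles


def asp_execution_time(latency, cycle_time, n):
    """First output after `latency` cycles, then one output every `cycle_time` cycles."""
    return latency + (n - 1) * cycle_time


t_asp = asp_execution_time(LATENCY_CC, CYCLE_TIME_CC, NUM_DATA_SETS)
speedup = T_RISC_CC / t_asp
print(t_asp, round(speedup, 1))
```

Under these assumptions the ASP finishes in 3800 cycles and is roughly 246 times faster than the soft core for this application.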
Performance Metrics:
Some of the important performance metrics are latency, execution time, CPU time and number
of clock cycles.
Performance -
➢ It is a measure of how quickly a processor finishes a task.
➢ The higher the performance, the more efficient the processor.
Latency - It is the time taken to produce the first output from a set of provided inputs.
Execution Time -
➢ It is the inverse of performance.
➢ Therefore (Performance)x = 1 / (Execution Time)x
➢ (Performance)x / (Performance)y = (Execution Time)y / (Execution Time)x = N means
processor 'X' is 'N' times faster than processor 'Y'.
CPU time - Execution time spent on the program itself; it does not count I/O or time spent
running other programs.
Clock Cycles -
➢ Execution time per program can also be reported in clock cycles instead of seconds.
➢ Clock frequency is the inverse of the clock period, so a cycle count can be converted
into seconds.
➢ Time in secs = # of clock cycles * clock period
➢ Time in secs = # of clock cycles * (1/freq.)
➢ Eg: If # of clock cycles = 10^10 and frequency = 500 MHz, then execution time in secs
= 10^10 * (1/(500 * 10^6)) = 20 secs
CPU clock cycles = sum over all instruction types 'j' of (CPIj * Ij),
where CPIj = CPI of the 'j'th type of instruction and Ij = number of instructions of the 'j'th type.
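The conversion from clock cycles to seconds and the per-type CPI sum can be sketched in Python as follows. The instruction mix is hypothetical and only illustrates the arithmetic; the sum assumes the usual definition, CPU clock cycles = sum of CPIj * Ij over all instruction types.

```python
def execution_time_seconds(clock_cycles, frequency_hz):
    """Time in secs = # of clock cycles * (1 / frequency)."""
    return clock_cycles / frequency_hz


def total_clock_cycles(instruction_mix):
    """Sum CPI_j * I_j over all instruction types j."""
    return sum(cpi * count for cpi, count in instruction_mix.values())


# Example from the notes: 10^10 clock cycles at 500 MHz.
example_time = execution_time_seconds(10**10, 500e6)

# Hypothetical instruction mix: type -> (CPI_j, I_j). Not from the notes.
mix = {"alu": (1, 500), "load_store": (2, 300), "branch": (3, 200)}
cycles = total_clock_cycles(mix)
instructions = sum(count for _, count in mix.values())
average_cpi = cycles / instructions

print(example_time, cycles, average_cpi)
```

The average CPI is the total cycle count divided by the total instruction count, which is the CPI value used in the basic performance formula.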
Overview of performance:
● Execution time is the most important performance metric.
● Basic formula for performance:
- Execution time = instructions * cycle time * CPI
Processor   Clock rate   CPI
P1          2 GHz        1.5
P2          1.5 GHz      1.0
P3          3 GHz        2.5
Solution:
Ans 1 -
Let the number of instructions be ‘I’.
CPU clock cycles= Avg. clock cycles per instruction*I =CPI*I
CPU clock cycles (P1)=1.5*I (1)
CPU clock cycles (P2)=1*I (2)
CPU clock cycles (P3)=2.5*I (3)
CPU time = CPU clock cycles/ frequency
CPU time (P1) = CPU clock cycles (P1) / (2 GHz) = (1.5 * I) / (2 * 10^9) seconds
= 0.75 * I * 10^-9 sec (4)
CPU time (P2) = CPU clock cycles (P2) / (1.5 GHz) = (1 * I) / (1.5 * 10^9) seconds
= 0.67 * I * 10^-9 sec (5)
CPU time (P3) = CPU clock cycles (P3) / (3 GHz) = (2.5 * I) / (3 * 10^9) seconds
= 0.833 * I * 10^-9 sec (6)
Since CPU performance is inversely proportional to CPU time, the processor with the
lowest CPU time for the same number of instructions has the highest performance, i.e. processor P2.
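The comparison in Ans 1 can be reproduced with a small Python sketch. The clock rates and CPI values are the ones used in the solution; the dictionary layout and names are illustrative only.

```python
# Clock rate (Hz) and CPI per processor, as used in the solution.
processors = {
    "P1": (2.0e9, 1.5),
    "P2": (1.5e9, 1.0),
    "P3": (3.0e9, 2.5),
}

# CPU time per instruction in nanoseconds: CPI / clock rate.
time_per_instruction_ns = {
    name: cpi / clock_hz * 1e9 for name, (clock_hz, cpi) in processors.items()
}

# The processor with the lowest time per instruction has the highest performance.
fastest = min(time_per_instruction_ns, key=time_per_instruction_ns.get)
print(time_per_instruction_ns, fastest)
```

Because every processor runs the same number of instructions 'I', comparing the time per instruction is enough to rank them.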
Ans 2 -
Each processor executes the program in 10 s.
Using eq. (4), (5) and (6),
CPU time (P1) = 0.75 * I * 10^-9 sec = 10
Hence, the # instructions (I) for P1 = 13.33 * 10^9
Similarly,
CPU time (P2) = 0.67 * I * 10^-9 sec = 10
Hence, the # instructions (I) for P2 = 14.92 * 10^9 (exactly 15 * 10^9 if the unrounded
coefficient 1/1.5 is used)
CPU time (P3) = 0.833 * I * 10^-9 sec = 10
Hence, the # instructions (I) for P3 = 12 * 10^9
Ans 3 -
If the execution (CPU) time is reduced by 30%, the new CPU time is 0.7 * 10 = 7 seconds.
Since there is an increase of 20% in the CPI,
new CPI = old CPI + (old CPI) * 20/100 = 1.2 * old CPI
new CPI (P1) = 1.2* old CPI (P1)
= 1.2* 1.5= 1.8
Similarly, new CPI (P2) = 1.2*1.0= 1.2
new CPI (P3) = 1.2*2.5= 3.0
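Ans 2 and Ans 3 can be checked together with the sketch below. The notes stop at the new CPI values; the final step, computing the clock rate each processor would need to finish in 7 seconds, is an assumption about what the exercise asks for next, and all names are illustrative.

```python
processors = {
    "P1": {"clock_hz": 2.0e9, "cpi": 1.5},
    "P2": {"clock_hz": 1.5e9, "cpi": 1.0},
    "P3": {"clock_hz": 3.0e9, "cpi": 2.5},
}
OLD_TIME_S = 10.0               # each processor runs the program in 10 s
NEW_TIME_S = 0.7 * OLD_TIME_S   # 30% reduction in CPU time
CPI_GROWTH = 1.2                # 20% increase in CPI

results = {}
for name, p in processors.items():
    # Ans 2: CPU time = I * CPI / clock rate, solved for I.
    instructions = OLD_TIME_S * p["clock_hz"] / p["cpi"]
    # Ans 3: new CPI after the 20% increase.
    new_cpi = CPI_GROWTH * p["cpi"]
    # Assumed follow-up: clock rate needed to run I instructions in 7 s.
    required_clock_hz = instructions * new_cpi / NEW_TIME_S
    results[name] = (instructions, new_cpi, required_clock_hz)

for name, (instructions, new_cpi, clock_hz) in results.items():
    print(name, f"{instructions:.3e}", round(new_cpi, 2), f"{clock_hz / 1e9:.2f} GHz")
```

With unrounded arithmetic the instruction count for P2 comes out as 15 * 10^9; the 14.92 * 10^9 figure results from rounding the coefficient to 0.67.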
MIPS Instructions:
Let’s look at the categories of instructions.
1. Arithmetic type
- Integer
- Floating Point
2. Memory access instruction type
- Load & Store
3. Control flow type
- Jump
- Conditional Branch
- Call & Return
Registers in MIPS:
Arithmetic Operations:
● Most instructions have 3 operands
● Operand order is fixed (destination, source 1, source 2)
● Example 1:
High level code : Z = Y + X
MIPS code : add $s0, $s1, $s2
● $s0, $s1 and $s2 are associated with the variables Z, Y and X respectively by the compiler.
● Example 2:
High level code : X = P + Q + R;
Z = Y - X;
MIPS instructions:
MIPS instructions are divided into 4 classes:
1. Arithmetic/logical/shift/comparison
2. Control instructions (branch and jump)
3. Load/store
4. Other (exception, register movement to/from GP registers, etc.)
Example 1 -
add $t0, $s1, $s2
As registers have numbers, $t0=8, $s1=17, $s2=18
Here rd takes the value of 8, rs takes the value of 17 and rt takes the value of 18.
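How the register numbers end up in the instruction word can be sketched in Python. This assumes the standard MIPS R-type layout (opcode, rs, rt, rd, shamt, funct with widths 6, 5, 5, 5, 5, 6 bits), the standard register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and funct = 0x20 for add; the helper name is made up.

```python
# Standard MIPS register numbers for the registers used in the example.
REGISTER_NUMBER = {"$t0": 8, "$s1": 17, "$s2": 18}


def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2  ->  rd = $t0, rs = $s1, rt = $s2, funct = 0x20 (add)
word = encode_r_type(
    rs=REGISTER_NUMBER["$s1"],
    rt=REGISTER_NUMBER["$s2"],
    rd=REGISTER_NUMBER["$t0"],
    shamt=0,
    funct=0x20,
)
print(f"{word:#010x}")
```

Note that the destination register comes first in assembly syntax but sits in the third register field (rd) of the encoded word.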
Load/Store Instructions:
Instructions:
Memory Organisation:
● Memory is viewed as a large, single-dimension array in which every element has an address.
● A memory address is an index into the array.
● Byte addressing means that successive addresses are one byte apart.
● Since a MIPS word is four bytes, the addresses of successive words are four bytes apart.
Spilling registers:
When there are more variables than registers, the compiler tries to keep the most frequently
used variables in registers and places the rest in memory. This is called spilling registers.
Register vs Memory:
Registers are preferred over memory for holding variables because:
● Smaller is faster: registers are faster than memory.
● A MIPS arithmetic instruction can read two registers, operate on them, and write one
register, all in a single instruction.
● A MIPS data transfer instruction only reads or writes one operand and performs no
operation on it.
1. A[8] = h + A[8]
Here h is associated with register $s2 and the base address of the array A is in register
$s3. To reach index 8 of the array, we have to add 8*4 = 32 bytes to the base address
of the array.
2. Below is the conversion of the C code for swapping two elements of an array into MIPS
assembly code. To reach index k of the array v[ ] we have to add 4*k bytes to the base
address of the array, and to reach index (k+1) we have to add 4*(k+1) bytes to the base
address. Here the base address of the array is in $a0, and $t1 holds the address of v[k].
lw $t0, 0($t1) - Loads the value at index k of the array into the register $t0.
lw $t2, 4($t1) - Adds an offset of 4 bytes to $t1 to reach index (k+1) of the array and
loads the value at index (k+1) into the register $t2.
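Both array examples can be modeled in Python with memory as a dictionary from byte address to word value. This is only a sketch: the base addresses and data values are hypothetical, and the MIPS instructions in the comments that the notes do not list (the add and sw steps) are the conventional translation, shown here as an assumption.

```python
WORD_BYTES = 4  # a MIPS word is 4 bytes


def word_address(base, index):
    """Byte address of element `index` in a word array starting at `base`."""
    return base + index * WORD_BYTES


def make_array(base, values):
    """Model memory as a dict from byte address to word value."""
    return {word_address(base, i): v for i, v in enumerate(values)}


# Example 1: A[8] = h + A[8], with h in $s2 and the base address of A in $s3.
A_BASE = 0x1000  # hypothetical base address
memory = make_array(A_BASE, [10 * i for i in range(10)])
regs = {"$s2": 5, "$s3": A_BASE}             # h = 5 is a made-up value
regs["$t0"] = memory[regs["$s3"] + 32]       # like: lw  $t0, 32($s3)
regs["$t0"] = regs["$s2"] + regs["$t0"]      # like: add $t0, $s2, $t0 (assumed)
memory[regs["$s3"] + 32] = regs["$t0"]       # like: sw  $t0, 32($s3)  (assumed)


# Example 2: swap v[k] and v[k+1], with the base address of v in $a0.
def swap(mem, base, k):
    t1 = word_address(base, k)   # address of v[k], held in $t1
    t0 = mem[t1]                 # lw $t0, 0($t1)
    t2 = mem[t1 + 4]             # lw $t2, 4($t1)
    mem[t1] = t2                 # assumed store: sw $t2, 0($t1)
    mem[t1 + 4] = t0             # assumed store: sw $t0, 4($t1)


V_BASE = 0x2000  # hypothetical base address
v_mem = make_array(V_BASE, [1, 2, 3, 4])
swap(v_mem, V_BASE, 1)
v_after = [v_mem[word_address(V_BASE, i)] for i in range(4)]
print(memory[word_address(A_BASE, 8)], v_after)
```

In both cases the byte offset is the element index multiplied by 4, which is why the constant 32 appears for index 8 and why neighbouring elements are 4 bytes apart.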
MIPS Branching:
MIPS processor conditional instructions:
1. Branch on not equal (bne)
2. Branch on equal (beq)
3. Both are I-type instructions
Examples:
1. bne $t0, $t1, Label - if the contents of $t0 and $t1 are not equal, control flow of the
program goes to the label.
2. beq $t0, $t1, Label - if the contents of $t0 and $t1 are equal, control flow of the
program goes to the label.
● If $s4 (i) and $s5 (j) are equal, control flow of the program goes to Label 1, which
subtracts $s5 from $s4.
● Else the addition of $s4 and $s5 executes.
Shifting the 26-bit immediate field of the jump instruction left by two positions is equivalent
to multiplying it by 4, which gives the lower 28 bits of the jump address.
Jump target address = (upper 4 bits (31:28) of current PC+4) concatenated with (26-bit
immediate field of the jump instruction) concatenated with (lower order bits 00).
Eg: a conditional branch with operands $s1, $s2, 25
Here '25' is the target label (an offset counted in words) and not the target address. So the
target address is calculated as:
Target address = (4*25) + (PC+4)
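The branch target calculation can be sketched in Python. The PC value is hypothetical; the function simply applies Target address = (4 * offset) + (PC + 4) from the example.

```python
def branch_target(pc, word_offset):
    """Target address = (PC + 4) + 4 * word_offset; a negative offset branches backwards."""
    return (pc + 4) + (word_offset << 2)


PC = 0x00400000  # hypothetical address of the branch instruction
target = branch_target(PC, 25)
print(hex(target))
```

The offset is counted in words relative to the instruction after the branch, which is why PC+4 rather than PC is the starting point.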
Calculating target address of Jump:
● The 26-bit field shifted left by two is not a 32-bit address; it only forms the lower
28 bits of the address. To form a 32-bit address, the upper 4 bits of the updated PC (PC+4)
must be concatenated with these 28 bits.
Eg 1 - j 2500
Here '2500' is the target label and not the target address.
So the lower 28 bits of the target address are calculated as: TA = (4*2500) = 10000
Eg 2 - jal 2500
Here '2500' is the target label and not the target address.
$ra = PC+4, then go to TA = 10000.
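The jump target calculation for j and jal can be sketched in Python as follows. The PC values are hypothetical and the function names are made up; the bit manipulation follows the rule that the upper 4 bits of PC+4 are concatenated with the 26-bit field shifted left by two.

```python
def jump_target(pc, imm26):
    """Concatenate the upper 4 bits of PC+4 with the 26-bit field and two zero bits."""
    upper4 = (pc + 4) & 0xF0000000
    lower28 = (imm26 & 0x03FFFFFF) << 2
    return upper4 | lower28


def jal(pc, imm26):
    """Return (value written to $ra, jump target) for a jal instruction at `pc`."""
    return pc + 4, jump_target(pc, imm26)


PC = 0x00400000  # hypothetical address of the jump instruction
target = jump_target(PC, 2500)
ra, jal_target = jal(PC, 2500)
print(target, hex(ra), jal_target)
```

When the upper 4 bits of PC+4 are zero, as in this example, the full target equals 4 * 2500 = 10000; for a PC such as 0x10000000 the same instruction would jump to 0x10002710.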
C code:
while (save[i] == k)
i = i + 1;
Here let's assume register $s6 holds the base address of the array save[ ], i is stored in
register $s3 and k is stored in register $s5.
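The loop can be traced in Python with memory modeled as a dictionary from byte address to word value. The MIPS instructions in the comments are the conventional translation of this loop and are an assumption, since the notes do not list them; the base address and array contents are hypothetical.

```python
def run_while_loop(memory, save_base, i, k):
    """Return the final value of i after: while (save[i] == k) i = i + 1;"""
    while True:
        t1 = save_base + (i << 2)   # like: sll $t1, $s3, 2 ; add $t1, $t1, $s6
        t0 = memory[t1]             # like: lw  $t0, 0($t1)
        if t0 != k:                 # like: bne $t0, $s5, Exit
            break
        i += 1                      # like: addi $s3, $s3, 1 ; j Loop
    return i


SAVE_BASE = 0x3000  # hypothetical base address of save[ ] (the value in $s6)
save = [7, 7, 7, 3, 7]
memory = {SAVE_BASE + 4 * idx: value for idx, value in enumerate(save)}
final_i = run_while_loop(memory, SAVE_BASE, 0, 7)
print(final_i)
```

As in the C code, there is no bounds check, so the sketch relies on the array containing at least one element different from k.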
Pseudo MIPS instruction:
Below are some pseudo instructions.