Lecture Notes-Computer Architecture-Module 1

This document discusses a computer architecture module that focuses on introducing concepts like MIPS instructions and computer performance. It describes topics that will be covered, including introduction to computer architecture, microprocessors, assemblers, compilers, application specific processors, and the Microblaze soft core RISC processor. It also discusses analyzing the performance of an ASP implementation versus a Microblaze soft core processor implementation for a given application. The document provides details about implementing the application in both an ASP and the Microblaze processor to analyze speedup and cost performance ratio.

Uploaded by

mokshagnanare26

Computer Architecture - Module 1

This module introduces computer architecture along with a few related concepts such as
MIPS instructions and computer performance. Listed below are the topics that will be
discussed here:
1. Introduction to computer architecture
2. Microprocessor and Microcontroller
3. Assemblers and Compilers
4. Application specific processor
5. Microblaze soft core RISC processor
6. Performance analysis of a computer
7. MIPS instructions
8. Instruction encoding formats

Introduction to Computer Architecture:


Computer architecture is the study of computer system hardware, comprising the processor,
memory, sub-systems and their interconnection. Knowledge of this course is useful in:

1. designing consumer electronics systems.
2. designing optimized systems used in medical applications.
3. performing hardware-software partitioning.
4. designing AI and ML hardware.
5. designing processors for gaming systems, audio-video applications etc.
6. analyzing computer hardware systems in terms of performance parameters such as
power, delay, area etc.

Fundamentals of Computer Architecture:

You will learn about the ALU, control unit, RAM, cache and memory module in the later part of the course.

Classification of computer architecture:


Computer architecture is classified into three different types.

1. General purpose architecture -


● These processors have general purpose applicability. Examples of general
purpose computer architectures are the central processing unit (CPU) and the graphics
processing unit (GPU).
● They are used for a wide variety of applications by a user.
● Examples include the MIPS processor (microprocessor), HCS12 (Freescale
microcontroller), Intel Pentium series etc.
● Disadvantage: These processors are not optimized for high performance and
low power.

2. Custom/Application specific architecture -


● Used for a specific application or class of applications. It has dedicated usage for
performing data-intensive and power-hungry calculations.
● Cannot be used for general purpose applications. It is usually optimized to
deliver high performance and low power.
● Examples include filter ASICs (application specific integrated circuits), JPEG
compression/decompression IPs etc.

3. Reconfigurable architecture/programmable architecture -


● Programmable architectures are usually called programmable logic devices. These
can be of 2 types: programmable logic array (PLA) and programmable array logic (PAL).
● They can accommodate/emulate different digital applications on the same hardware by
programming fuse links.
● Examples include FPGAs (field programmable gate arrays) such as Spartan and Virtex,
and complex programmable logic devices (CPLDs).
● These provide better performance than general purpose architecture.

Microprocessor vs Microcontroller:
Both microprocessors and microcontrollers come under general purpose computer architecture.

Microprocessor:
- Consists of only a Central Processing Unit.
- Used in desktop machines and personal systems.
- Could be Von Neumann architecture based.
- Examples are Intel Pentium processors, AMD processors etc.
- Types: Reduced Instruction Set Computer (RISC) based, a computer with a small, highly
optimized set of instructions, and Complex Instruction Set Computer (CISC) based, with a
more specialized set of instructions.

Microcontroller:
- Contains a CPU, memory and I/O, all integrated into one chip.
- Used in embedded systems.
- Could be Harvard architecture based.
- Examples are Freescale Semiconductor HCS12 etc.
- Low power feature exists, as it finds application in embedded systems.
- Examples of use: washing machines, MP3 players and other embedded systems.

Software Abstraction flow:
Below is a high level overview of where assembly language comes into play in MIPS
processors.

Assembler : The assembler converts assembly level language to machine level code.

Compiler : The compiler translates high level programming language code to lower level code, typically assembly language, which the assembler then converts to machine level code.

Application specific processor:


● An Application Specific Processor (ASP) is designed only for a certain
specific application or for a certain class of applications.
● The design of an ASP demands high performance within the working constraints
provided, so it is used as a function specific system.
● ASP cores are being increasingly used to address the need for high performance, low
area, minimum cost and timely operation in many embedded systems.

Microblaze soft core RISC processor:
● A Microblaze system can range from a processor core with a minimum of local memory to
a large system with many Microblaze processors, sizable external memory, and
numerous On-chip Peripheral Bus (OPB) peripherals.
● An embedded system built around Microblaze consists of the Microblaze Soft Processor
Core, On-chip Local Memory, Standard Bus Interconnects and OPB Peripherals.
● Microblaze uses a three-stage pipeline architecture with fetch, decode, and execute
stages. Data forwarding, pipeline stalls, and branches are resolved in the hardware
automatically.

Next, we compare the performance of an FPGA based ASP with that of a Microblaze soft core
RISC processor embedded in an FPGA, and use this FPGA based comparative analysis to
compare the cost-performance ratio (CPR) of the two. Additionally, you will learn about the
design process of a performance optimized, power stringent ASP by converting a
computation intensive application into an actual Register Transfer Level (RTL) hardware design,
and about implementing the same application on the Microblaze soft core RISC processor.

Algorithm/Application:
To compare the performance of the FPGA based ASP and the Microblaze soft core RISC processor
embedded in an FPGA, let’s consider a sample application that produces a digital output value
according to the following function:

Where ‘B’, ‘Wc’, ‘F’, ‘Tp’ and ‘WB’ are variables and ‘2’ is a constant; all of these are input
vectors for the processor. Further, ‘Y (n)’ is the output of the application.

Implementation of the application in ASP:


● Once the optimal architecture for the given application is obtained, the as soon as possible
(ASAP) algorithm is applied to form a scheduled sequencing graph. The latency and the
cycle time are determined from this graph. The execution time of the ASP (TASP) for
‘N’ sets of processed data is then determined using the equation below.

TASP = L + (N - 1) * Tc

● Where ‘L’ is the latency, ‘Tc’ is the cycle time and ‘N’ is the number of processed data
sets. Then the multiplexing scheme is determined for constructing the data path unit.
● Latency is the delay in producing a single output, while cycle time is the difference
in delay between two consecutive outputs.
● The controller was then designed to provide the timing sequence to the data path. The
development of the controller and the data path of the device was finally performed in
the Xilinx Integrated Software Environment (ISE) tool, version 9.2i.
● After synthesis and implementation, verification of the controller and the data path of the
device was carried out in the Xilinx ISE simulator.
● Simulation indicated that the designed processor with data path and controller is in
compliance with the functional and timing specification.
● Analysis of the simulation results revealed that the ASP was successfully designed and
complies with all the technical specifications given in the design problem.
● After implementation and verification of the processor, the bitstream was downloaded
to the xc3s500e-5fg320 Spartan 3E FPGA, on which the design was successfully implemented.

Implementation of application in Microblaze processor:


● The same application was also implemented on the soft core Microblaze RISC processor
in the Spartan 3E FPGA. The C implementation is done by coding the application in the XPS
tool.
● The C code consists of the application itself as well as a timer which records
the starting time, current time and final time while processing ‘N’ sets of
data.
● Next, compilation was done in XPS (EDK) to generate the bitstream of the application
for downloading to the Spartan FPGA.

Analysis:
There are two different analyses:
a) The analysis of the FPGA based speedup comparison between the two processors
b) The cost performance ratio (CPR) analysis between the application specific processor
and the Microblaze soft core RISC processor.

Speed-up Analysis:
The performance of the ASP and the Microblaze soft core processor for the same application
was compared for a fixed set of processing data (N = 200), as shown below. The execution time
(TASP) of the ASP implementation is given by:

TASP = L + (N - 1) * Tc

● The value of ‘L’ is obtained after implementation of the ASP in Xilinx ISE. Here, L =
19 cc, N = 200 and Tc = 19 cc. Therefore, TASP = 19 + (199 * 19) = 3800 clock cycles (cc).
● The execution time (TRISC) for N = 200 sets of data obtained through the Microblaze soft
core RISC processor is 935,946 clock cycles. Therefore, the speedup obtained (ASP vs.
Microblaze soft core RISC processor) is:

Speedup = TRISC / TASP = 935,946 / 3800 ≈ 246.3

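
The speed-up figures can be checked with a short Python sketch. It assumes the usual pipelined form of the execution time, TASP = L + (N - 1) * Tc, which follows from the definitions of latency and cycle time given earlier; the function and variable names are illustrative only.

```python
def asp_execution_cycles(latency_cc, cycle_time_cc, n_sets):
    """Clock cycles an ASP needs to produce n_sets outputs.

    Assumed model: the first output appears after the latency, and every
    later output follows one cycle time after the previous one.
    """
    return latency_cc + (n_sets - 1) * cycle_time_cc


# Values quoted in the notes: L = 19 cc, Tc = 19 cc, N = 200.
t_asp = asp_execution_cycles(latency_cc=19, cycle_time_cc=19, n_sets=200)

# Cycle count reported for the Microblaze run of the same N = 200 data sets.
t_risc = 935_946

speedup = t_risc / t_asp
print(t_asp, round(speedup, 1))  # 3800 246.3
```

Under these assumptions the ASP is roughly 246 times faster than the soft core for this application.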
Performance Metrics:
Some of the important performance metrics are latency, execution time, CPU time and number
of clock cycles.

Performance -
➢ It is a measure of how quickly a processor finishes a task.
➢ The higher the performance, the more efficient the processor.

Latency - It is the time taken to produce the first output based on a set of provided inputs.

Execution Time -
➢ It is the inverse of performance.
➢ Therefore (Performance)x = 1 / (Execution Time)x
➢ (Performance)x / (Performance)y = (Execution Time)y / (Execution Time)x = N means
processor ‘X’ is ‘N’ times faster than processor ‘Y’.

CPU time - Execution time, not counting I/O or time spent running other programs.

Clock Cycles -
➢ Execution time per program can also be reported in clock cycles instead of
seconds.
➢ Clock cycles are converted into seconds using the clock period, which is the reciprocal of the clock frequency.
➢ Time in secs = # of clock cycles * clock period
➢ Time in secs = # of clock cycles * (1/freq.)
➢ Eg: If # of clock cycles = 10^10 and frequency = 500 MHz, then execution time in secs
= 10^10 * (1/(500 * 10^6)) = 20 secs
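
As a minimal sketch, the conversion from clock cycles to seconds and the "N times faster" ratio can be written in Python as follows. The helper names are illustrative, not taken from the notes.

```python
def execution_time_seconds(clock_cycles, frequency_hz):
    """Time in secs = # of clock cycles * (1 / frequency)."""
    return clock_cycles / frequency_hz


def times_faster(exec_time_x, exec_time_y):
    """How many times faster X is than Y: Performance_x / Performance_y."""
    return exec_time_y / exec_time_x


# Example from the notes: 10^10 clock cycles at 500 MHz.
print(execution_time_seconds(10**10, 500e6))  # 20.0
```

For instance, `times_faster(10.0, 20.0)` returns 2.0: a processor that finishes in 10 s is twice as fast as one that needs 20 s.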

Factors that affect the execution time -

➢ Number of instructions: execution time is directly proportional to the number of
instructions.
➢ Different types of instructions consume different numbers of clock cycles.
➢ Multiplication takes more time than addition.
➢ Floating-point operations take longer than integer operations.
➢ Accessing memory takes more time than accessing a register.

Ways to decrease execution time -
➢ Reduce the clock cycle duration
➢ Reduce the number of clock cycles
➢ Reduce the number of instructions
➢ Increase operating clock frequency

Classical CPU Equation:

CPU time = seconds/program = Instructions/program x cycles/instruction x seconds/cycle.

Cycles per Instruction (CPI):

● CPI measures the average number of clock cycles consumed per instruction.
● CPI = Total number of clock cycles / Total number of instructions in a program
● Total number of clock cycles = CPI * Total number of instructions in a program
● CPU time = Clock duration (period) * total # of clock cycles.
● For a program that mixes instruction types:
Total number of clock cycles = sum over all types j of (CPIj * Ij)

where CPIj = CPI of ‘j’th type instruction and Ij = number of ‘j’th type instructions.
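
The relations above can be tried out with a small Python sketch. The instruction mix and the 1 GHz clock used below are made-up values for illustration only; they do not come from the notes.

```python
def total_cycles(mix):
    """Total clock cycles = sum over instruction types j of CPI_j * I_j.

    mix is a list of (cpi_j, count_j) pairs, one pair per instruction type.
    """
    return sum(cpi * count for cpi, count in mix)


def average_cpi(mix):
    """Average CPI = total clock cycles / total number of instructions."""
    instructions = sum(count for _, count in mix)
    return total_cycles(mix) / instructions


def cpu_time_seconds(mix, clock_rate_hz):
    """CPU time = total clock cycles * clock period."""
    return total_cycles(mix) / clock_rate_hz


# Hypothetical mix: 500k 1-cycle, 300k 2-cycle and 200k 3-cycle instructions.
mix = [(1, 500_000), (2, 300_000), (3, 200_000)]
print(total_cycles(mix), average_cpi(mix), cpu_time_seconds(mix, 1e9))
```

With this hypothetical mix the program needs 1,700,000 cycles, so the average CPI is 1.7 and the CPU time at 1 GHz is about 1.7 ms.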

MIPS Rating - Million instructions per second.

The formula to calculate MIPS is as follows.

MIPS = Instruction count / (Execution time * 10^6) = Clock rate / (CPI * 10^6)

Amdahl’s Law:
Amdahl’s law states that the new execution time is equal to the sum of the unaffected execution
time and the affected execution time divided by the amount of improvement.

New Execution time =
(Execution time affected by improvement / Amount of improvement) + Unaffected execution time

Eg : Suppose 70% of the execution time is spent on integer instructions and 6% on floating point
instructions, and the total execution time is 100 seconds.
1. What’s the effect of making integer instructions 7 times faster?
New time = Execution time affected by impr./ Amount of Improvement + Unaffected
Execution Time = (100 * 0.70) / 7 + (100 * (1 - 0.70) )
= 70/7 + 100*0.30
= 10+30
= 40 seconds

2. What’s the effect of making F.P. instructions twice as fast?


New time = (100 * 0.06) / 2 + (100 * 0.94)
= 3+94 = 97 seconds
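
The two Amdahl's law calculations above can be reproduced with a minimal Python sketch. The function and parameter names are illustrative.

```python
def amdahl_new_time(total_time, affected_fraction, improvement):
    """New time = affected time / improvement + unaffected time."""
    affected = total_time * affected_fraction
    unaffected = total_time - affected
    return affected / improvement + unaffected


# Integer instructions (70% of 100 s) made 7 times faster.
integer_case = amdahl_new_time(100.0, 0.70, 7)

# Floating point instructions (6% of 100 s) made twice as fast.
fp_case = amdahl_new_time(100.0, 0.06, 2)

print(round(integer_case, 2), round(fp_case, 2))  # 40.0 97.0
```

The comparison shows why it pays to improve the common case: speeding up the 70% portion saves 60 seconds, while doubling the speed of the 6% portion saves only 3 seconds.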

Overview of performance:
● Execution time is the most important performance metric.
● Basic formula for performance:
- Execution time = instructions * cycle time * CPI

Numerical on CPU performance:


Let us solve a numerical for better understanding of the concepts discussed earlier.
Consider 3 different processors P1, P2, and P3 executing the same instruction set with the
clock rates and CPIs given in the following table:

Processor    Clock rate    CPI
P1           2 GHz         1.5
P2           1.5 GHz       1.0
P3           3 GHz         2.5

1. Which processor has the highest performance?
2. If the processors each execute a program in 10 s, find the number of instructions.
3. We are trying to reduce the time by 30% but this leads to an increase of 20% in the CPI.
What clock rate should we have to get this time reduction?

Solution:

Ans 1 -
Let the number of instructions be ‘I’.
CPU clock cycles = Avg. clock cycles per instruction * I = CPI * I
CPU clock cycles (P1) = 1.5 * I (1)
CPU clock cycles (P2) = 1.0 * I (2)
CPU clock cycles (P3) = 2.5 * I (3)
CPU time = CPU clock cycles / frequency
CPU time (P1) = CPU clock cycles (P1) / (2 GHz) = (1.5 * I) / (2 * 10^9) seconds
= 0.75 * I * 10^-9 sec (4)
CPU time (P2) = CPU clock cycles (P2) / (1.5 GHz) = (1.0 * I) / (1.5 * 10^9) seconds
= 0.667 * I * 10^-9 sec (5)
CPU time (P3) = CPU clock cycles (P3) / (3 GHz) = (2.5 * I) / (3 * 10^9) seconds
= 0.833 * I * 10^-9 sec (6)

Since CPU performance is inversely proportional to CPU time, the processor with the
lowest CPU time has the highest performance, i.e. processor P2.

Ans 2 -
Each processor executes the program in 10 s.
Using eq. (4), (5) and (6),
CPU time (P1) = 0.75 * I * 10^-9 sec = 10
Hence, the # instructions (I) for P1 = 13.33 * 10^9
Similarly,
CPU time (P2) = 0.667 * I * 10^-9 sec = 10
Hence, the # instructions (I) for P2 = 15 * 10^9
CPU time (P3) = 0.833 * I * 10^-9 sec = 10
Hence, the # instructions (I) for P3 = 12 * 10^9

Ans 3 -
If the execution (CPU) time reduces by 30%, then the new CPU time is 7 seconds.
Since there is an increase of 20% in the CPI,
new CPI = old CPI + (old CPI) * 20/100 = 1.2 * old CPI
new CPI (P1) = 1.2 * 1.5 = 1.8
new CPI (P2) = 1.2 * 1.0 = 1.2
new CPI (P3) = 1.2 * 2.5 = 3.0

CPU clock cycles (P1) = I * CPI = 13.33 * 10^9 * 1.8 = 24 * 10^9
CPU clock cycles (P2) = I * CPI = 15 * 10^9 * 1.2 = 18 * 10^9
CPU clock cycles (P3) = I * CPI = 12 * 10^9 * 3.0 = 36 * 10^9
Hence, the new clock rates for P1, P2 and P3 to get a CPU time reduction of 30% are:
New frequency (P1) = CPU clock cycles (P1) / new CPU time = (24 * 10^9) / 7 = 3.43 GHz
New frequency (P2) = CPU clock cycles (P2) / new CPU time = (18 * 10^9) / 7 = 2.57 GHz
New frequency (P3) = CPU clock cycles (P3) / new CPU time = (36 * 10^9) / 7 = 5.14 GHz
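
The three answers can be cross-checked with a short Python sketch that works from the unrounded values (so P2 comes out as exactly 15 * 10^9 instructions). The helper names are illustrative; the clock rates and CPIs are those given in the problem table.

```python
# Clock rate (Hz) and CPI for each processor, taken from the problem table.
PROCESSORS = {"P1": (2.0e9, 1.5), "P2": (1.5e9, 1.0), "P3": (3.0e9, 2.5)}


def time_per_instruction(clock_hz, cpi):
    """Average seconds spent per instruction: CPI / clock rate."""
    return cpi / clock_hz


def instruction_count(cpu_time_s, clock_hz, cpi):
    """I = CPU time * clock rate / CPI."""
    return cpu_time_s * clock_hz / cpi


def required_clock(instructions, cpi, cpu_time_s):
    """Clock rate needed to run the instructions at this CPI in cpu_time_s."""
    return instructions * cpi / cpu_time_s


# Ans 1: for the same instruction count, the lowest time per instruction wins.
fastest = min(PROCESSORS, key=lambda name: time_per_instruction(*PROCESSORS[name]))
print("Highest performance:", fastest)

# Ans 2 and Ans 3: 10 s run time, then 30% less time with 20% higher CPI.
for name, (clock_hz, cpi) in PROCESSORS.items():
    count = instruction_count(10.0, clock_hz, cpi)
    new_clock = required_clock(count, 1.2 * cpi, 7.0)
    print(name, f"I = {count / 1e9:.2f} * 10^9, new clock = {new_clock / 1e9:.2f} GHz")
```

Note that the required clock rate is simply 1.2 / 0.7 (about 1.71) times the old one for every processor, since the instruction count cancels out.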

MIPS Instructions :
Let’s look at the categories of instructions.
1. Arithmetic type
- Integer
- Floating Point
2. Memory access instruction type
- Load & Store
3. Control flow type
- Jump
- Conditional Branch
- Call & Return

Registers in MIPS:

Register name    Register number    Usage
$zero            0                  The constant 0
$v0 - $v1        2-3                Returned values
$a0 - $a3        4-7                Arguments
$t0 - $t7        8-15               Temporaries
$s0 - $s7        16-23              Saved values
$t8 - $t9        24-25              Temporaries
$gp              28                 Global pointer
$sp              29                 Stack pointer
$fp              30                 Frame pointer
$ra              31                 Return address


These registers are used in the instruction formats. The global pointer, stack pointer, frame pointer
and return address ($ra) registers are used less frequently than the others in this module.

Arithmetic Operations:
● Most instructions have 3 operands
● Operand order is fixed (destination, source 1, source 2)
● Example 1:
High level code : Z = Y + X
MIPS code : add $s0, $s1, $s2
● $s0, $s1 and $s2 are associated with the variables Z, Y and X respectively by the compiler.

● Example 2:
High level code : X = P + Q + R;
                  Z = Y - X;

MIPS code : add $t0, $s1, $s2
            add $s0, $t0, $s3
            sub $s4, $s5, $s0
● Let’s say $s1 = Q and $s2 = R. The sum of Q and R is stored in a temporary register
$t0. Next, this temporary register is added to $s3 (P) and the result is stored in $s0 (X). Finally,
X ($s0) is subtracted from Y ($s5) and the result is stored in Z ($s4).
● Operands of arithmetic instructions must be registers; MIPS provides only 32 registers.

MIPS instructions:
MIPS instructions are divided into 4 classes:
1. Arithmetic/logical/shift/comparison
2. Control instructions (branch and jump)
3. Load/store
4. Other (exception, register movement to/from GP registers, etc.)

Instruction encoding formats:


There are three instruction encoding formats, which are crucial for converting assembly
language code to machine code. In each format the fields add up to 32 bits.
1. R-type (6-bit opcode, 5-bit rs, 5-bit rt, 5-bit rd, 5-bit shamt, 6-bit function code)

● 6-bit opcode/operation code specifies the operation/instruction to be performed.
● 5-bit rs indicates the first source register
● 5-bit rt represents the second source register
● 5-bit rd represents the destination register
● 5-bit shamt indicates the shift amount in the instruction format
● 6-bit function code identifies the specific R-format instruction

2. I-type (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate)

● 6-bit opcode/operation code specifies the operation/instruction to be performed.
● 5-bit rs indicates the first source register
● 5-bit rt represents the destination register (for stores and branches it is a second source register)
● 16-bit immediate could be a constant or an offset.

3. J-type (6-bit opcode, 26-bit pseudo-direct address)

● 6-bit opcode indicates which J-type instruction is given
● 26-bit pseudo-direct address represents the target address.

Instruction format Examples:

Example 1-
add $t0, $s1, $s2
As registers have numbers, $t0 = 8, $s1 = 17, $s2 = 18

rs = identifier of the first source register
rt = identifier of the second source register
rd = identifier of the destination register

Here rs takes the value 17, rt takes the value 18 and rd takes the value 8.

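
As a sketch of how the R-type fields are packed into one 32-bit word, the Python helper below encodes add $t0, $s1, $s2. Register numbers follow the register table ($t0 = 8, $s1 = 17, $s2 = 18); opcode 0 and function code 0x20 are the standard MIPS values for add. The helper name is illustrative.

```python
def encode_r_type(rs, rt, rd, shamt=0, funct=0, opcode=0):
    """Pack the six R-type fields into a 32-bit instruction word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    word = 0
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        # Shift earlier fields left and append this field on the right.
        word = (word << bits) | value
    return word


# add $t0, $s1, $s2  ->  rd = 8, rs = 17, rt = 18, funct = 0x20
word = encode_r_type(rs=17, rt=18, rd=8, funct=0x20)
print(f"{word:032b}")
print(f"0x{word:08x}")  # 0x02324020
```

Reading the binary output in groups of 6, 5, 5, 5, 5 and 6 bits gives back opcode, rs, rt, rd, shamt and function code in that order.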
Load/Store Instructions:

● The memory address used by a load/store instruction has two components:
- A register whose content is known (the base register)
- An offset stored in 16 bits

● Offset
- It is written in terms of number of bytes, and it is stored in the instruction in bytes as well.
- Every word in MIPS memory occupies 4 bytes, so the element at index 8 of a word array
is at a byte offset of 8 * 4 = 32 from the base address.

● Computation of the address:
- The content of the register is added to the offset.

● Every address has both these components
- Register 0 is used when no base register is needed.
- Register 0 always stores the value 0.

Instructions:
Memory Organisation:
● Memory is viewed as a large, single-dimension array, with each element having an address.
● A memory address is an index into the array.
● Byte addressing means that successive addresses are one byte apart.
● Word addressing means that successive word addresses are four bytes apart.

Spilling registers:
When there are more variables than registers, the compiler tries to keep the most frequently used
variables in registers and places the rest in memory. This is called spilling registers.

Register vs Memory:
We prefer registers to memory for storing variables because:
● Smaller is faster: Registers are faster than memory.
● MIPS arithmetic instructions can read two registers, operate on them, and write one
register per instruction.
● A MIPS data transfer instruction only reads or writes one operand, and performs no operation on it.

MIPS instructions examples:

1. A[8] = h + A[8]
Here h is associated with register $s2 and the base address of the array A is in register
$s3. To get to index 8 of the array, we have to add 8*4 = 32 bytes to the base address
of the array. One equivalent MIPS sequence, using $t0 as a temporary, is:
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 32($s3)
2. Below is part of the conversion of C code that swaps two elements of an array into MIPS
code. To get to the kth index of the array v[ ] we have to add 4*k bytes to the base address
of the array, and to get to the (k+1)th index of the array v[ ] we have to add 4*(k+1) bytes to
the base address of the array. Here the base address of the array is in $a0, and $t1 holds the
address of the kth element (base address + 4*k).
lw $t0, 0($t1) - Loads the value at the kth index of the array into register $t0.

lw $t2, 4($t1) - Adds an offset of 4 bytes to $t1 to get to the (k+1)th index of the array and
loads the value at the (k+1)th index of the array into register $t2.
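
The base-plus-offset addressing used in both examples can be imitated with a small Python model of word memory. This is only a sketch: the Memory class, the base address 0x1000 and the array contents are hypothetical, and Python variables stand in for the MIPS registers.

```python
WORD = 4  # bytes per MIPS word


class Memory:
    """Toy word memory indexed by byte address (hypothetical helper)."""

    def __init__(self):
        self.words = {}

    def _address(self, base, offset):
        address = base + offset  # effective address = register + offset
        if address % WORD != 0:
            raise ValueError("word accesses must be 4-byte aligned")
        return address

    def lw(self, base, offset):
        return self.words.get(self._address(base, offset), 0)

    def sw(self, value, base, offset):
        self.words[self._address(base, offset)] = value & 0xFFFFFFFF


mem = Memory()
s3 = 0x1000                      # assumed base address of array A
for i in range(10):
    mem.sw(10 * i, s3, WORD * i)  # A[i] = 10 * i

# Example 1: A[8] = h + A[8], with h = 5
s2 = 5
t0 = mem.lw(s3, 32)              # lw  $t0, 32($s3)
t0 = s2 + t0                     # add $t0, $s2, $t0
mem.sw(t0, s3, 32)               # sw  $t0, 32($s3)

# Example 2: swap A[k] and A[k+1] for k = 2
k = 2
t1 = s3 + WORD * k               # address of the kth element
t0 = mem.lw(t1, 0)               # lw $t0, 0($t1)
t2 = mem.lw(t1, 4)               # lw $t2, 4($t1)
mem.sw(t2, t1, 0)                # sw $t2, 0($t1)
mem.sw(t0, t1, 4)                # sw $t0, 4($t1)

print(mem.lw(s3, 32), mem.lw(s3, 8), mem.lw(s3, 12))  # 85 30 20
```

The two sw lines in the swap are not shown in the notes; they are included here only to complete the exchange.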

MIPS Branching:
MIPS processor conditional branch instructions:
1. Branch on not equal (bne)
2. Branch on equal (beq)
Both are I-type instructions.

Examples:
1. bne $t0, $t1, Label - means that if the contents of $t0 and $t1 are not equal, the control
flow of the program goes to the label.
2. beq $t0, $t1, Label - means that if the contents of $t0 and $t1 are equal, the control flow
of the program goes to the label.

Eg 1: if (i == j) h = i + j;

One equivalent MIPS code, with i in $s0, j in $s1 and h in $s3, is
bne $s0, $s1, Label
add $s3, $s0, $s1
Label: ...

Eg 2​ : The C-code is as follows.


MIPS code:

● If $s4 (i) and $s5 (j) are equal, the control flow of the program goes to Label 1,
where $s5 is subtracted from $s4.
● Otherwise, the addition of $s4 and $s5 executes.

Eg 3: Loading 32-bit constants into a register.

lui stands for load upper immediate: it loads the upper 16 bits of a 32-bit
constant into a register ($t0 here) and sets the remaining lower 16 bits to 0. ori stands for or
immediate: it performs an OR operation between $t0 (which contains the upper 16 bits) and the
lower 16 bits of the constant, and the final 32-bit result is stored in the $t0 register. This is how
a 32-bit value is loaded using the lui and ori instructions.
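
The effect of the two instructions on the register can be modelled in a few lines of Python. This is a sketch of the register arithmetic only; the function names mirror the mnemonics, and the constant 0x12345678 is an arbitrary example.

```python
def lui(imm16):
    """Load upper immediate: imm16 goes to bits 31..16, lower 16 bits are 0."""
    return (imm16 & 0xFFFF) << 16


def ori(register, imm16):
    """OR immediate: the 16-bit immediate is zero-extended, then ORed in."""
    return (register | (imm16 & 0xFFFF)) & 0xFFFFFFFF


def load_constant(value):
    """Build a 32-bit constant in a register using lui followed by ori."""
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    t0 = lui(upper)      # lui $t0, upper
    t0 = ori(t0, lower)  # ori $t0, $t0, lower
    return t0


print(hex(lui(0x1234)))                 # 0x12340000
print(hex(load_constant(0x12345678)))   # 0x12345678
```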

Calculating Jump Target address:


The jump instruction has a 6-bit opcode (000010) and a 26-bit pseudo-direct address. Let’s understand
how to calculate the jump target address using the below example.

Shifting the 26-bit address field left by two positions is equivalent to multiplying it by 4, which gives
the lower 28 bits of the jump address.
Jump target address = (Upper 4 bits (31:28) of current PC+4) concatenated with (26-bit immediate
field of the jump instruction) concatenated with (lower order bits 00).

Calculate target address of Branch:

● I-type formats for branches


- Immediate (target label) = [target address – (PC+4)] / 4.
- Target address = 4 * Immediate + (PC+4) .

Eg : a branch instruction with operands $s1, $s2, 25
Here ‘25’ is the target label (the immediate) and not the target address. So the target address is
calculated as:
Target address = (4*25) + (PC+4) = 100 + (PC+4)

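
The branch formulas can be checked with a small Python sketch. The PC value 0x00400000 is an assumed example address, and the function names are illustrative. The 16-bit immediate is treated as signed, so backward branches work as well.

```python
def sign_extend_16(imm16):
    """Interpret a 16-bit field as a signed two's complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Target address = 4 * Immediate + (PC + 4)."""
    return (pc + 4 + 4 * sign_extend_16(imm16)) & 0xFFFFFFFF


def branch_immediate(pc, target):
    """Immediate = [target address - (PC + 4)] / 4, as a 16-bit field."""
    return ((target - (pc + 4)) // 4) & 0xFFFF


pc = 0x00400000  # hypothetical address of the branch instruction
print(hex(branch_target(pc, 25)))           # 0x400068
print(branch_immediate(pc, 0x00400068))     # 25
```

An immediate of 0xFFFF (that is, -1) gives a target of PC itself, which is the branch instruction's own address.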
Calculating target address of Jump:

● J-type format for jumps
- Immediate (target label) = [target address] / 4
- 4 * Immediate = target address

● This is not a 32-bit address. It only forms the lower 28 bits of the address. To form a
32-bit address, the upper 4 bits of the updated PC (PC+4) must be concatenated with these 28 bits.

Eg 1 - j 2500
Here ‘2500’ is the target label and not the target address.
So the target address is calculated as: TA = (4*2500) = 10000

Eg 2 - jal 2500
Here ‘2500’ is the target label and not the target address.
jal saves the return address in $ra ($ra = PC+4) and then jumps to the target address TA = 10000.
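
A minimal Python sketch of the jump address calculation is given below. The PC values are assumed examples in the lowest 256 MB region, where the upper 4 bits of PC+4 are 0 and the result matches the 10000 computed above. The function names are illustrative.

```python
def jump_target(pc, address26):
    """Upper 4 bits of PC+4, then the 26-bit field, then two 0 bits."""
    upper4 = (pc + 4) & 0xF0000000
    lower28 = (address26 & 0x03FFFFFF) << 2  # multiply the field by 4
    return upper4 | lower28


def jal(pc, address26):
    """Return (value written to $ra, jump target) for a jal instruction."""
    return (pc + 4) & 0xFFFFFFFF, jump_target(pc, address26)


print(jump_target(0x00000100, 2500))  # 10000
print(jal(0x00000100, 2500))          # (260, 10000)
```

If the jump sits at an address such as 0x40000000, the same field 2500 yields 0x40002710, because the upper 4 bits come from PC+4.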

MIPS code to Machine code:

C - code :
while (save[i] == k)
    i = i + 1;

Here let’s assume register $s6 stores the base address of the array save[ ], i is stored in
register $s3 and k is stored in register $s5.

Pseudo MIPS instructions:
Below are some pseudo instructions.

blt - branch on less than
ble - branch on less than or equal to
bgt - branch on greater than
bge - branch on greater than or equal to
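
Pseudo instructions are not encoded directly; the assembler replaces each one with real instructions. The Python sketch below shows one common expansion, using slt with the assembler temporary register $at followed by beq or bne. The exact sequence an assembler emits may differ, and the table and function names are illustrative.

```python
# One common expansion of each pseudo branch into two real instructions.
EXPANSIONS = {
    "blt": ("slt $at, {rs}, {rt}", "bne $at, $zero, {label}"),  # rs <  rt
    "bgt": ("slt $at, {rt}, {rs}", "bne $at, $zero, {label}"),  # rs >  rt
    "ble": ("slt $at, {rt}, {rs}", "beq $at, $zero, {label}"),  # rs <= rt
    "bge": ("slt $at, {rs}, {rt}", "beq $at, $zero, {label}"),  # rs >= rt
}


def expand(mnemonic, rs, rt, label):
    """Return the real instructions for a pseudo branch as a list of strings."""
    return [line.format(rs=rs, rt=rt, label=label) for line in EXPANSIONS[mnemonic]]


for line in expand("blt", "$s0", "$s1", "Less"):
    print(line)
```

For blt $s0, $s1, Less this prints slt $at, $s0, $s1 followed by bne $at, $zero, Less: slt sets $at to 1 when $s0 < $s1, and bne then branches because $at is not zero.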

A few more pseudo instructions are:
