
BSC Computer Architecture

J S Mirza is an advisor in the Department of Computer Science, COMSATS Lahore, Pakistan, and actively teaches the computer architecture course to both undergraduate and graduate students. The book has XX chapters covering almost all the important topics. The author does not guarantee the 100% accuracy of any information published herein.

Uploaded by

Arslan Muhammad
Copyright
© Attribution Non-Commercial (BY-NC)

ADVANCED COMPUTER ARCHITECTURE

First edition

BSC COMPUTER ARCHITECTURE


First edition

Dr J S MIRZA Advisor Department of Computer Science COMSATS Institute of Information Technology Lahore, Pakistan

J S Mirza is an advisor in the Department of Computer Science, COMSATS Lahore, Pakistan, and actively teaches the computer architecture course to both undergraduate and graduate students. He received his education from Matric (equivalent to O level) through the Master of Science (MSc) in Physics at Punjab University, Lahore, Pakistan. Immediately afterwards he served for about one year as a Senior Research Assistant (SRA) at Mangla Dam, Pakistan. He then worked for more than one year as a lecturer in the Department of Physics at Govt College Gujranwala, then at Rahim Yar Khan, and lastly at Islamia College Civil Lines, Lahore. He then went to the University of Salford, Lancashire, UK, where he completed an M.Sc. (equivalent to MS/M.Phil) and a PhD in the Department of Electrical Engineering. He financed the M.Sc. himself and obtained a scholarship from the university for his PhD study. On his return from England he joined the Physics Department of Punjab University, Lahore, Pakistan.

PREFACE
Progress in the field of computer architecture is made almost daily. In teaching especially, the contents of this course should include the new inventions made in the field every semester. The book has XX chapters covering almost all the important topics. Chapter 1 covers XXX. Chapter 2 covers the floating point representation of numerical data. Chapter 3 covers assembly language, in which a program can be written. Chapter 4 covers the datapath, detailing how data flows inside a computer. Chapter 5 covers multicycle implementation. Chapter 6 covers the definitions of RISC and CISC, their differences, and the advantages and disadvantages of these structures. Chapter 7 covers the pipelining techniques used in processor architecture. Chapter 8 covers performance measurement. Chapter 9 covers instruction level processing. Chapter 10 covers memory. Chapter 11 covers virtual memory. Chapter 12 covers cache. Chapter 13 covers multilevel processing. An index follows.

The author, Dr J S Mirza, does not guarantee the 100% accuracy of any information published herein. The author also takes no responsibility whatsoever for any error, omission or damage resulting from the use of the information contained herein. However, to the best of his knowledge, the author has collected the information from various sources that he believes are reliable and up to date. The author has long experience of teaching various subjects in departments of computer science in various countries. Special thanks go to the MS(CS) students of COMSATS Institute of Information Technology (CIIT) Lahore who studied advanced computer architecture (Code: ) with the author and showed a lot of interest in this publication.
Copyright: Dr J S Mirza, COMSATS Institute of Information Technology, Lahore, Pakistan. No part of this publication may be reproduced or distributed by any means, or stored in a database or retrieval system, without the prior written permission of the author.
Dr J S Mirza

CONTENTS
CHAPTER 1 Introduction
CHAPTER 2 Instruction set
CHAPTER 3 Datapath
CHAPTER 4 Pipeline performance
CHAPTER 5 Multicycle Implementation
CHAPTER 6 RISC & CISC
CHAPTER 8 Performance
CHAPTER 9 Instruction Level Processing
CHAPTER 10 Memory
CHAPTER 11 Virtual Memory
CHAPTER 12 Cache
CHAPTER 13 Multilevel Processing

CHAPTER 1 INTRODUCTION

1. Computer Architecture
2. Everything inside a computer is 1 and 0, with examples
3. Bits, Bytes, Word Size, Instructions, Programs
4. Numbers
5. Signed numbers
6. 2s complement
7. Sign extension
8. Why MIPS
9. Processor Speedup
10. Amdahl's law
11. Benchmarks

1.1

COMPUTER ARCHITECTURE

The word architecture probably suggests the design of buildings. The word computer in computer architecture makes clear that we are talking about the design of computers. A computer deals mainly with memory and processing units. Memory is often divided into a number of categories, which may also be called memory units: hard disk, main memory, virtual memory and cache. Each memory category has its characteristic speed and its particular placement with respect to the CPU. The placement of the memory units with respect to the processing unit is part of what is called computer architecture. Many other factors also have to be considered, and these are dealt with in later chapters. The function of each component and its relationships with the other components also have to be taken into account. Thus computer architecture means not only the placement of the different components with respect to the processing unit but also their functions and their relationships with one another.

For good reason, the cache is traditionally placed nearest to the CPU, followed by main memory and then the hard disk. Virtual memory is the technique that lets main memory act as a cache for the hard disk (see Figure 1.1). Cache is made of SRAM, which is quite expensive, so it is kept much smaller. DRAM is cheaper than SRAM, so manufacturers can afford to make main memory bigger. The hard disk is much cheaper still, which is why it can be made much bigger. The prices and access times of the different memory units are given below for comparison.

Figure 1.1: The placement of the processing unit and the different categories of memory

Memory type  Used in      Access time  Access time ratio  Price              Price ratio
                                       (SRAM = 1)                            (disk = 1)
SRAM         Cache        5 ns         1                  $25/MB             2500
DRAM         Main memory  60 ns        12                 $1/MB              100
Disk         Hard disk    7 ms         1,400,000          $0.01/MB (1 cent)  1

Fig 1.2: Comparison of the prices and access times of different memory units (2008)

1.2

Every data item and instruction inside a computer is represented by a long sequence of 1s and 0s.

Memory is made of transistors. In the simplified picture used here, each transistor stores one bit, so eight transistors together store one byte. A particular terminal of each transistor either has a voltage on it or has none. If the voltage on that terminal is, say, +3 V, we call it 1. If it is 0 V, we call it 0. Thus, if there are 8 transistors in a line and the voltage levels from the first transistor to the last are +3, +3, +3, +3, 0, 0, 0, 0, we say the data stored is 11110000. When we put data into a computer we can choose to put either 1 or 0 in a particular transistor. When the computer makes calculations, it sets the 1s and 0s on the transistors itself. For ease of reading we normally write data in groups of 4 bits, putting a space after each group. The computer does not waste memory by storing these spaces. Thus the data 11110000 is written, for ease of reading, as 1111 0000, with one space between each group of 4 bits.
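The reading of voltages as bits and the grouping into sets of 4 can be sketched in a few lines. This is a minimal illustration only. The helper name group_nibbles is hypothetical, and the rule that any positive voltage reads as 1 is an assumption that matches the +3 V example above.

```python
def group_nibbles(bits: str) -> str:
    """Put a space after every 4 bits, counting from the left (for human readers only)."""
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


# Voltage on the terminal of each of 8 transistors, first to last.
# Assumption: any positive voltage (here +3 V) is read as 1 and 0 V as 0.
levels = [3, 3, 3, 3, 0, 0, 0, 0]
bits = "".join("1" if v > 0 else "0" for v in levels)

print(bits)                 # 11110000
print(group_nibbles(bits))  # 1111 0000 (the stored value is unchanged)
```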

For convenience, data can be written in decimal, binary or hexadecimal. A two-letter prefix in front of the data shows how it has been written: 0b means the data is binary, and 0x means the data is hexadecimal.

Example 1.2: 15 = +15 = 0b0000 1111 = 0x0F

Another way to show the radix is to write it as a suffix after the LSB of the word, as in 0000 1111two or 0Fhex. The bit on the extreme right is always called the Least Significant Bit (LSB), and the bit on the extreme left is called the Most Significant Bit (MSB). The assembler or compiler reads the 0b or 0x prefix to know whether the data is binary or hexadecimal and converts it into binary accordingly.

Whether the thing inside the computer is music, a picture, text or numbers, it is always represented by a long sequence of 1s and 0s. Let me explain how this is possible. Consider a computer with word size 16. This means that all data (and instructions) must be represented inside the computer as 16 bits. We deal with numbers every day: this class contains 50 students; this thing cost us $11; I have 4 children. Often we have to add, subtract, multiply or divide numbers. Examples of positive numbers are +5, +64, 5 or 64. The + sign is optional in this case. Numbers are represented as 1s and 0s (digital representation) in the computer word. Suppose the computer word size is 8. Then
+5 = 0000 0101
+64 = 0100 0000
5 = 0000 0101
64 = 0100 0000
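Python happens to accept the same 0b and 0x prefixes described above. A few lines are therefore enough to confirm that 0b0000 1111, 0x0F and 15 are one value written three ways, and that 0xF0 is a different number. This is only an illustration in one language. Other assemblers and compilers may use other notations.

```python
# Underscores inside the binary literal are only visual separators.
a = 0b0000_1111   # binary
b = 0x0F          # hexadecimal
c = 15            # decimal

print(a == b == c)        # True: one value, three notations
print(format(c, "08b"))   # 00001111
print(format(c, "#04x"))  # 0x0f
print(0xF0)               # 240, which is why 0xF0 is not 15
```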

Music: when you speak or sing in front of a microphone, its diaphragm vibrates. Its motion in one direction may be called positive, and its motion in the other direction negative. The diaphragm therefore oscillates between positive and negative values. The following diagram shows this figuratively, with only one oscillation drawn.

Figure 1.2: One oscillation of the diaphragm of a microphone, where the instantaneous values go from the positive to the negative side

The analogue waveshape of the music is digitized. The decimal values of the upper (positive) displacement are 5, 18, 28, 25, 35, 40, 28, 28, 10, and the computer converts them into binary numbers as:
05 = 0000 0101
18 = 0001 0010
28 = 0001 1100
25 = 0001 1001
35 = 0010 0011
40 = 0010 1000
28 = 0001 1100
28 = 0001 1100
10 = 0000 1010
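The conversion of the digitized samples into binary can be sketched as follows. This assumes, as the list above does, that each sample is a non-negative value that fits in 8 unsigned bits. The helper name to_8bit is hypothetical.

```python
def to_8bit(value: int) -> str:
    """Return a sample in the range 0-255 as 8 bits, grouped in fours."""
    if not 0 <= value <= 255:
        raise ValueError("sample does not fit in 8 unsigned bits")
    b = format(value, "08b")
    return b[:4] + " " + b[4:]


samples = [5, 18, 28, 25, 35, 40, 28, 28, 10]  # the upper displacements above
for s in samples:
    print(f"{s:02d} = {to_8bit(s)}")
```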

The digitized samples may either be fed straight to the loudspeaker (after conversion back to an analogue signal) or be processed further.

Picture: Consider, for convenience, a black and white picture. Draw horizontal lines on it from top to bottom and take any one line. Each pixel (picture element or grain) on this line, say from left to right, has some degree of grey. Some elements will be very grey and some will be only slightly grey. Let us convert these grey levels into decimal numbers, as we did for the music file above. The decimal numbers can again be converted into binary bits. If these bits are transmitted over a long distance and fed to another computer, the same line can be regenerated there.

Text: In ASCII code (using 8 bits of word size), characters are coded as follows:
Digit  ASCII binary (decimal)  Capital letter  ASCII binary (decimal)  Small letter  ASCII binary (decimal)
0      0011 0000 (48)          A               0100 0001 (65)          a             0110 0001 (97)
1      0011 0001 (49)          B               0100 0010 (66)          b             0110 0010 (98)
2      0011 0010 (50)          C               0100 0011 (67)          c             0110 0011 (99)
3      0011 0011 (51)          D               0100 0100 (68)          d             0110 0100 (100)
4      0011 0100 (52)          E               0100 0101 (69)          e             0110 0101 (101)
5      0011 0101 (53)          F               0100 0110 (70)          f             0110 0110 (102)
6      0011 0110 (54)          G               0100 0111 (71)          g             0110 0111 (103)
7      0011 0111 (55)          H               0100 1000 (72)          h             0110 1000 (104)
8      0011 1000 (56)          I               0100 1001 (73)          i             0110 1001 (105)
9      0011 1001 (57)          J               0100 1010 (74)          j             0110 1010 (106)
                               K               0100 1011 (75)          k             0110 1011 (107)
.      .                       .               .                       .             .

The number 10 has no single ASCII code. It is stored as the two characters 1 and 0. Code 0011 1010 (58) is the colon, ':'.

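The figures in brackets are the decimal values, given only for reference. Our concern is with the binary numbers, because we are showing that all text can also be represented in binary. Let us write the word Dig. From the table above it is:
D = 0100 0100, i = 0110 1001, g = 0110 0111
In the same way, every kind of data is stored as a sequence of 1s and 0s. This includes a number (for example 2, which is 0000 0000 0000 0010 in 16 bits), music, a picture (for example the Mona Lisa) and text (for example a Naseem Hijazi novel).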
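Looking each character up in the ASCII table can be mimicked with Python's built-in ord, which returns a character's code. This is a minimal sketch. The helper name ascii_bits is hypothetical, and it rejects anything outside the 7-bit ASCII range.

```python
def ascii_bits(text: str) -> list[str]:
    """Encode each character as its 8-bit ASCII code, grouped in fours."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        b = format(code, "08b")
        out.append(b[:4] + " " + b[4:])
    return out


print(ascii_bits("Dig"))  # ['0100 0100', '0110 1001', '0110 0111']
```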

1.3

Bits, Bytes, Word Size, Instructions, Programs

Bit: The easy way is to count the number of 1s and 0s in a word. If a word is 10001000 and you are asked how many bits it has, you count them and find there are 8 bits. Similarly, the word 1111 0000 1111 0000 contains 16 bits.
Byte: 8 bits taken together are called a byte. Thus 0000 1111 0000 1111 contains two bytes. The byte on the low side can be called the low-end byte, and the byte on the high side the high-end byte.
Word size: A computer has a word size, and it almost always demands that instructions and data be supplied to it in that word size. It also makes calculations and provides results in the word size unless told to do otherwise. Thus an 8-bit computer means that all instructions and data must always comprise 8 bits. Similarly, a 16-bit computer means that all instructions and data must comprise 16 bits. PCs with a 64-bit word size are now available. One advantage of a bigger word size is that much more information can be packed into each instruction.
Instructions: Each instruction is a command given to the processing unit to take some action. For instance:
add, subtract, multiply or divide two given numbers;
compare two numbers to find out whether they are equal or different and, if different, which one is bigger and which one is smaller;
depending on the comparison, decide whether to take the next instruction or some other instruction.
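Counting bits and separating a 16-bit word into its high-end and low-end bytes can be sketched with shifts and masks. The function names are hypothetical, and the 16-bit word size is an assumption chosen to match the two-byte example above.

```python
def count_bits(word: str) -> int:
    """Count the 1s and 0s in a word, ignoring the readability spaces."""
    return len(word.replace(" ", ""))


def split_bytes(word: int) -> tuple[int, int]:
    """Return (high-end byte, low-end byte) of a 16-bit word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("not a 16-bit word")
    return (word >> 8) & 0xFF, word & 0xFF


print(count_bits("10001000"))             # 8
print(count_bits("1111 0000 1111 0000"))  # 16
high, low = split_bytes(0b0000_1111_0000_1111)
print(format(high, "08b"), format(low, "08b"))  # 00001111 00001111
```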

Program: When electronic computers first appeared in the 1940s, instructions were fed to the computer one by one. John von Neumann, a Hungarian scientist, then suggested that the instructions could be collected in one place inside the computer. The computer would take the first instruction and execute it, then fetch the next instruction by itself and execute it, and so on. The entire set of instructions written in this way is called a program.
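The stored-program idea can be illustrated with a toy machine that fetches, decodes and executes its stored instructions one after another. This is only a sketch. The three instructions (load, add, halt) and the tuple format are invented for illustration and are not the instruction set of any real processor.

```python
# A hypothetical stored program: each tuple is one instruction.
program = [
    ("load", "x", 7),        # x <- 7
    ("load", "y", 5),        # y <- 5
    ("add", "z", "x", "y"),  # z <- x + y
    ("halt",),
]


def run(program):
    registers = {}
    pc = 0  # program counter: index of the next instruction to fetch
    while True:
        instr = program[pc]  # fetch the instruction from the stored program
        pc += 1              # by default, move on to the next instruction
        op = instr[0]        # decode
        if op == "load":     # execute
            registers[instr[1]] = instr[2]
        elif op == "add":
            registers[instr[1]] = registers[instr[2]] + registers[instr[3]]
        elif op == "halt":
            return registers
        else:
            raise ValueError(f"unknown instruction {op!r}")


print(run(program)["z"])  # 12
```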

1.4

Numbers

In modern computers, integers (whether positive or negative) are represented in 2s complement. If you supply an integer to the computer, it is converted into 2s complement and then stored. A number can also be written in sign and magnitude form, but computers do not use this form for integers. For us it is mainly an academic exercise.

1.5

Sign and magnitude numbers

The MSB (most significant bit) is reserved for the sign of the number. If the MSB is 0 the word is positive, and if it is 1 the word is negative. This means that if the word size is 8 bits, the remaining 7 bits give the magnitude of the number.
Example 1.5 (2):
1111 0000 represents decimal -112 (sign 1, magnitude 111 0000 = 112)
0111 0000 represents decimal +112
Similarly, -70 is 1100 0110 and +70 is 0100 0110.
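Decoding a sign-and-magnitude word follows directly from the rule above: the MSB gives the sign and the other bits give the magnitude. The function name is hypothetical. Note that this sketch maps both 0000 0000 and 1000 0000 (+0 and -0) to 0.

```python
def sign_magnitude_value(bits: str) -> int:
    """Decode a sign-and-magnitude word: MSB is the sign, the rest is the magnitude."""
    bits = bits.replace(" ", "")
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(sign_magnitude_value("1111 0000"))  # -112
print(sign_magnitude_value("0111 0000"))  # 112
print(sign_magnitude_value("1100 0110"))  # -70
```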

1.51 Unsigned numbers


Unsigned numbers are numbers that do not carry a sign, such as 70 or 127. If you think about it, you will realize that unsigned and positive numbers do not differ: 70 and +70 mean the same thing, so unsigned numbers are in effect positive numbers. Consider whether this reasoning also holds for +0, 0 and -0.

1.6

2s complement numbers

Nowadays computers almost always use 2s complement numbers. The 2s complement takes care of the sign as well as the value of the number. Let us see some examples, taking the word size as 4 bits. The 2s complement of a number is written manually as follows.

For positive numbers, just write the binary value as you would for an unsigned binary number. Thus in 4 bits the 2s complements of +1, +3 and +4 are 0001, 0011 and 0100. The 2s complements of 1, 3 and 4 are the same, because numbers with a + sign and unsigned numbers have the same 2s complement.

For negative numbers (numbers with a minus sign attached), for example -6:
1. Write the magnitude of the number in 4 bits: 6 = 0110.
2. Invert every bit, 0 into 1 and 1 into 0: 1001.
3. Add 1: 1001 + 1 = 1010.
The result, 1010, is the 2s complement of -6.

Restrictions: for a 4-bit word, the maximum positive (or unsigned) number we can write in 2s complement is $2^{n-1} - 1$ with n = 4, that is, 7. Similarly, for an 8-bit word the maximum is $2^{n-1} - 1$ with n = 8, that is, 127.

Example 1.6 (1): Write the 2s complements of 8, +9, +7 and -6 in a 4-bit word.
The magnitude of a positive number written in 2s complement must not be greater than $2^{n-1} - 1$, where n is the word size. For n = 4 the limit is 7.
8 cannot be written in 2s complement in a 4-bit word, because it is greater than 7.
+9 cannot be written in a 4-bit word either, because 9 is greater than 7.
+7 can be written in 2s complement, because 7 is not greater than 7.
-6 can be written in 2s complement, because its magnitude, 6, is less than 7.

Number  Sign and magnitude  Unsigned  2s complement
+7      0111                0111      0111
-7      1111                --        1001
(Look at the MSB: if it is 1 the number is negative; if it is 0 the number is positive.)
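The manual method above (plain binary for positives; write the magnitude, invert, add 1 for negatives) can be sketched together with the range check. The function name is hypothetical. The check uses the lower limit $-2^{n-1}$, the most negative n-bit 2s complement value, so -8 is accepted for n = 4.

```python
def twos_complement(value: int, n: int) -> str:
    """Return value as an n-bit 2s complement bit string, using the manual method."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} cannot be written in {n}-bit 2s complement ({lo} to {hi})")
    if value >= 0:
        # Positive numbers: plain binary, exactly as for unsigned numbers.
        return format(value, f"0{n}b")
    # Negative numbers: write the magnitude, invert every bit, then add 1.
    magnitude = format(-value, f"0{n}b")
    inverted = "".join("1" if bit == "0" else "0" for bit in magnitude)
    return format(int(inverted, 2) + 1, f"0{n}b")


for v in (7, -7, -6, 8, 9):
    try:
        print(v, twos_complement(v, 4))
    except ValueError as err:
        print(err)
```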

Note that the minimum (most negative) number in 2s complement is $-2^{n-1}$: -8 if the word size is 4 bits, and -128 if the word size is 8 bits. The maximum and minimum numbers in 2s complement for two different word sizes are shown below. The decimal values in brackets read the words as though they were unsigned, except where stated.

Word size  Maximum          Minimum                                          Range
4 bits     0111 (7)         1000 (8; as 2s complement, -8)                   1 0000 (16)
8 bits     0111 1111 (127)  1000 0000 (128; as 2s complement, -128)          1 0000 0000 (256)

If you look at the pattern, it is the same throughout. For the maximum in 2s complement, put 0 on the left-hand side and 1 everywhere else. For the minimum, put 1 on the left-hand side and 0 everywhere else. For the range, put an extra 1 on the left-hand side and 0 everywhere else.

Number  2s complement (4 bits)    Calculation of the value
        (negatives: inverted
        magnitude + 1)
 7      0111                      0+4+2+1 = 7
 6      0110                      0+4+2+0 = 6
 5      0101                      0+4+0+1 = 5
 4      0100                      0+4+0+0 = 4
 3      0011                      0+0+2+1 = 3
 2      0010                      0+0+2+0 = 2
 1      0001                      0+0+0+1 = 1
 0      0000                      0+0+0+0 = 0
-1      1110+1 = 1111             -8+4+2+1 = -1
-2      1101+1 = 1110             -8+4+2+0 = -2
-3      1100+1 = 1101             -8+4+0+1 = -3
-4      1011+1 = 1100             -8+4+0+0 = -4
-5      1010+1 = 1011             -8+0+2+1 = -5
-6      1001+1 = 1010             -8+0+2+0 = -6
-7      1000+1 = 1001             -8+0+0+1 = -7

When calculating the value, the MSB is given a negative weight (-8 for 4 bits), and the remaining bits are given positive weights.

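The rule used in the table above (the MSB has a negative weight, all other bits positive weights) can be written as a short decoder. The function name is hypothetical. It works for any word length, not only 4 bits.

```python
def twos_complement_value(bits: str) -> int:
    """Value of a 2s complement word: the MSB weighs -2**(n-1), every other bit is positive."""
    bits = bits.replace(" ", "")
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    for position, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - position)
    return value


for word in ("0111", "0100", "1111", "1101", "1001", "1000"):
    print(word, twos_complement_value(word))
```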

Note 1: In the table above, all the positive (or unsigned) numbers start with 0, that is, their MSB is 0. All the negative numbers start with 1 on the left-hand side. The easiest way to write the maximum limit (the largest value you can correctly reach) and the minimum limit (the smallest value you can correctly reach) in 2s complement is the following:

Word size  Maximum limit          Minimum limit          Range (figuratively)
4          0111                   1000                   1111
8          0111 1111              1000 0000              1111 1111
16         0111 1111 1111 1111    1000 0000 0000 0000    1111 1111 1111 1111
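The pattern for the limits (0 then all 1s for the maximum, 1 then all 0s for the minimum) can be generated for any word size. Here the range is given as the count of distinct values, $2^n$, which is the 1 followed by n zeros described earlier. The function name is hypothetical.

```python
def limits(n: int) -> tuple[str, str, int]:
    """Maximum pattern, minimum pattern and number of values for n-bit 2s complement."""
    maximum = "0" + "1" * (n - 1)  # 0 on the left, 1 everywhere else
    minimum = "1" + "0" * (n - 1)  # 1 on the left, 0 everywhere else
    return maximum, minimum, 2 ** n


for n in (4, 8, 16):
    mx, mn, count = limits(n)
    print(f"{n:2d} bits: max {mx} = {int(mx, 2)}, min {mn} = {-2 ** (n - 1)}, {count} values")
```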

1.7

Sign extension

Suppose you have a number in 8 bits and want to convert it into 16 bits. The most significant (sign) bit is copied into the new upper bits until the word is 16 bits long. You will see in the Datapath chapter that we normally convert 16-bit data into 32-bit data. Fig 1.3 shows this diagrammatically: the left-hand side is the 16-bit data input and the right-hand side is the 32-bit output. This conversion is required because the ALU always takes two numbers, each 32 bits long.

Fig 1.3: Conversion of data from 16 bits into 32 bits

Examples: Suppose the number is 1111 0000 (8 bits). To convert it into 16 bits, extend the sign bit: 1111 1111 1111 0000. Both words have the value -16. The same check works for the shorter words below:

Word (4 bits)  Word (8 bits, by sign extension)  Explanation
0001           0000 0001                         Both words have the value 1
1000           1111 1000                         4 bits: -8; 8 bits: -128 + 120 = -8; same

Suppose another number is 0000 1111 (8 bits). Converting it into 16 bits by sign extension gives 0000 0000 0000 1111.
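Sign extension can be sketched with bit masks: if the sign bit of the narrow word is 1, fill all the new upper bits with 1s. Otherwise leave them as 0s. The function name is hypothetical, and the input is assumed to already fit in from_bits bits.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide word into the new upper bits."""
    sign = (value >> (from_bits - 1)) & 1
    if sign:
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)  # 1s only in the new bits
        value |= upper
    return value & ((1 << to_bits) - 1)


print(format(sign_extend(0b1111_0000, 8, 16), "016b"))  # 1111111111110000
print(format(sign_extend(0b0000_1111, 8, 16), "016b"))  # 0000000000001111
print(hex(sign_extend(0x8000, 16, 32)))                 # 0xffff8000
```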

1.8 MIPS PROCESSOR


First, a short introduction to MIPS. MIPS is the name of a microprocessor. It works on the stored-program concept first suggested by von Neumann: a number of instructions are stored inside the computer first. Instructions are just words, like 1001 0101, made up of 1s and 0s, that tell the computer what to do, for instance add two numbers or subtract two numbers. Taken together, these instructions are called a program. Once the program is stored inside the computer, the computer is started. It then works through the instructions one by one, in the sequence provided, without any help from a human being. When the instructions are complete, the computer stops. 100 million MIPS processors were made in 2002, and they are used by:

ATI Technologies, Broadcom, Cisco, NEC, Nintendo, Silicon Graphics, Sony, Texas Instruments, Toshiba

CHAPTER 2 INSTRUCTION SET


Register file Machine language and its fields

We are used to writing programs in high-level languages such as C and C++, because we learned these languages at university. A compiler (together with an assembler) converts a program written in a high-level language into machine language, and the computer then works on the machine-language program. A machine-language program consists of 1s and 0s, but these are packed into groups called fields. Each instruction is composed of a number of fields, say 5 or 6, and each field conveys certain information to the processor. If the word size of the computer is 32, each instruction contains 32 bits grouped into a number of fields, and the processor works according to the message each field conveys. There is another form of program, called an assembly language program. People cannot easily understand machine-language programs, or take too long to make sense of them, so a program can also be represented in assembly language. Assembly language lies between high-level language and machine language and can therefore be understood relatively easily.

2.1

Some Assembly Language Instructions

add $s1, $s2, $s3


This instruction means: add the contents of register $s2 to the contents of register $s3 and place the result in register $s1. Note that $s1, $s2 and $s3 name the registers, not their contents. The contents are found once the processor accesses these registers. Similarly, there are instructions that subtract, multiply and divide two numbers, using the abbreviations sub, mul and div, and some instructions compare two numbers. Almost all the instructions we deal with use two numbers and specify where the result should go, so only three operands appear in each instruction. The first word, add, is called the operation code (op code), and $s1, $s2 and $s3 are called operands. Each instruction always has one op code, and almost all instructions give three operands. Why three? Because an ALU has three terminals: two inputs and one output.

Figure 2.1: The Arithmetic Logic Unit (ALU), an important part of the processor. It takes two numbers applied at its two input terminals and provides the result at its third, output, terminal.

In the add instruction above, add is called the operation, or more specifically the arithmetic operation. The registers $s1, $s2 and $s3 are called the operands, and there are three operands in each such operation. Sticking to the requirement that every instruction has three operands has advantages. One is that it keeps the hardware simpler than if some instructions took more or fewer than three operands. We will see later that some instructions require only two operands.

The register file of MIPS contains 32 registers, and a $ is put before each register name. Some registers are named $t (t stands for temporary) and some are named $s. The words after the # mark (sharp mark) are comments. They help the program writer or other readers understand what the instructions mean, because sometimes you write an instruction and later forget why it was written that way. The 32 registers in the register file are named and numbered as follows. Registers 1, 26 and 27 are not available to the user. Register 1, called $at, is reserved for the assembler, and registers 26 and 27, called $k0 and $k1, are used by the operating system alone.

Table XX: Register names and numbers

Register name  Register number(s)  Can the user write it?    Meaning
$zero          0                   No                        the value is always zero; this register cannot be written
$at            1                   No                        reserved for the assembler
$v0-$v1        2-3                 Yes                       v stands for value
$a0-$a3        4-7                 Yes                       a stands for argument
$t0-$t7        8-15                Yes                       t stands for temporary
$s0-$s7        16-23               Yes                       s stands for saved
$t8-$t9        24-25               Yes                       t stands for temporary
$k0            26                  Used by operating system
$k1            27                  Used by operating system
$gp            28                  Yes                       global pointer
$sp            29                  Yes                       stack pointer
$fp            30                  Yes                       frame pointer
$ra            31                  Yes                       ra stands for return address
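The numbering in the register table can be captured as a lookup from register name to register number. This is convenient when working out machine-language fields by hand. The function name register_numbers is hypothetical. The mapping follows the table: $zero = 0, $at = 1, $v0-$v1 = 2-3 and so on up to $ra = 31.

```python
def register_numbers() -> dict[str, int]:
    """Map each MIPS register name to its number in the register file."""
    numbers = {"$zero": 0, "$at": 1, "$k0": 26, "$k1": 27,
               "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31}
    # (prefix, first index, last index, number of the first register in the group)
    groups = [("$v", 0, 1, 2), ("$a", 0, 3, 4), ("$t", 0, 7, 8),
              ("$s", 0, 7, 16), ("$t", 8, 9, 24)]
    for prefix, first, last, start in groups:
        for k in range(first, last + 1):
            numbers[f"{prefix}{k}"] = start + (k - first)
    return numbers


REGS = register_numbers()
print(REGS["$s1"], REGS["$s2"], REGS["$s3"], REGS["$t0"])  # 17 18 19 8
```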

In assembly language, most computer instructions appear to have four fields, as shown below:
add $s1, $s2, $s3

But when this instruction is converted into machine language, the language the computer understands, it becomes 6 fields. Some assembly language instructions convert into 6 fields and some into 4 fields. The add instruction looks like this in terms of machine-language fields:

op       rs                     rt                      rd                    shamt         funct
6 bits   5 bits                 5 bits                  5 bits                5 bits        6 bits
opcode   first source operand   second source operand   destination operand   shift amount  function code

The first 6 bits, in the 1st field, tell the machine about the operation.

The next 5 bits, in the 2nd field, give the number of the register that contains the first data item. This is called the first source operand and is denoted rs (s stands for source). The next 5 bits, in the 3rd field, give the number of the register that contains the second data item, denoted rt. The next 5 bits, in the 4th field, give the number of the register where the result should go, denoted rd (d stands for destination). The next 5 bits, in the 5th field, give the shift amount, if any. The add instruction does not shift, so for add this field is 00000 and is ignored. The function code in the last (6th) field elaborates the op code by selecting the exact operation.

Note that two fields, the first one and the last one, tell specifically about the operation.
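Packing the six fields into one 32-bit word is a matter of shifting each field to its position: op at bit 26, rs at 21, rt at 16, rd at 11, shamt at 6, funct at 0. The sketch below encodes add $s1, $s2, $s3 and sub $s1, $s2, $s3, using funct 32 for add and 34 for sub as given in this chapter. The helper names are hypothetical, and only the four register numbers needed here are listed.

```python
# Register numbers used in this chapter's examples (from the register table).
REG = {"$t0": 8, "$s1": 17, "$s2": 18, "$s3": 19}


def encode_r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-type fields (6, 5, 5, 5, 5 and 6 bits) into one 32-bit word."""
    for value, width in ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in a {width}-bit field")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def show_fields(word: int) -> str:
    """Return a 32-bit word as a string split at the R-type field boundaries."""
    b = format(word, "032b")
    return " ".join([b[0:6], b[6:11], b[11:16], b[16:21], b[21:26], b[26:32]])


# add $s1, $s2, $s3: rs = $s2, rt = $s3, rd = $s1, funct = 32
add_word = encode_r_type(0, REG["$s2"], REG["$s3"], REG["$s1"], 0, 32)
# sub $s1, $s2, $s3: same register fields, funct = 34
sub_word = encode_r_type(0, REG["$s2"], REG["$s3"], REG["$s1"], 0, 34)

print(show_fields(add_word))  # 000000 10010 10011 10001 00000 100000
print(show_fields(sub_word))  # 000000 10010 10011 10001 00000 100010
```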

Example 2.1 (1)


If the machine-language instruction is as shown below, which registers have been used by the instruction?
000000two 10001two 10010two 01000two 00000two 100000two

The word two written after each binary number above indicates that the number has been written in radix two. A number can be written in any of several radixes, for example:
Radix two (binary): the word is written in binary form
Radix eight (octal): the word is written in octal form
Radix ten (decimal): the word is written in decimal form
Radix sixteen (hexadecimal, or hex): the word is written in hex form
In MIPS, the first and last fields of each R-type instruction comprise 6 bits, and the other fields comprise 5 bits. In the table below, the six fields of the machine instruction are given as decimal numbers, as is the tradition. This makes it easy to see which registers are the sources and which is the destination. Looking at the machine instruction, we can tell that register number 17 ($s1) in the register file is used for the first source operand, rs. Register number 18 ($s2) is used for the second source operand, rt, and register number 8 ($t0) receives the result of the addition, rd. The operation field and the function field together tell the computer exactly what has to be done with the two numbers.
Operation  rs  rt  rd  shamt  funct   (all decimal values)
0          17  18  8   0      32
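Going the other way, the fields of Example 2.1 (1) can be recovered from the 32-bit word by shifting and masking. This gives the same decimal values as the table above. The function name decode_r_type is hypothetical. The masks 0x3F and 0x1F keep 6 and 5 bits respectively.

```python
def decode_r_type(word: int) -> dict[str, int]:
    """Split a 32-bit R-type word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# The instruction of Example 2.1 (1), field by field.
word = int("000000" "10001" "10010" "01000" "00000" "100000", 2)
print(decode_r_type(word))
# {'op': 0, 'rs': 17, 'rt': 18, 'rd': 8, 'shamt': 0, 'funct': 32}
```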

A machine-language instruction with these 6 fields is called an R-type instruction. There are also 4-field instructions in machine language, called I-type instructions. In an R-type instruction three registers must be given, and the exact operation is determined by the contents of the 1st and 6th fields. In the table we have shown this instruction, and in the last column of the table we indicate whether there is overflow. Let us discuss what overflow is.

The Word Size and Overflow


The MIPS processor word size is 32 bits. This means that all its words, whether instructions or data, are stored in memory as 32 bits. When calculations are made, each operand must contain exactly 32 bits, and the result must also be 32 bits. If the first bit on the right-hand side is counted as bit number 0, the last bit is bit number 31. Also remember that numbers are almost always represented in 2s complement. For quick calculation, remember that an unsigned word 10 bits long has a maximum value of about 1K. For 20 bits the maximum is about 1M, and for 30 bits about 1G. Thus, for an unsigned word 32 bits long, you can immediately say that 30 bits mean 1G, so 31 bits mean 2G and 32 bits mean 4G.

Word size  Rough maximum  Exact maximum   Range
10 bits    1K (1 Kilo)    1,023           0 to 1,023
20 bits    1M (1 Mega)    1,048,575       0 to 1,048,575
30 bits    1G (1 Giga)    1,073,741,823   0 to 1,073,741,823
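The exact values in the table are simply $2^n - 1$ for an n-bit unsigned word. A short sketch reproduces them and extends the rough 2G/4G rule to 31 and 32 bits. The function name is hypothetical.

```python
def max_unsigned(n: int) -> int:
    """Largest unsigned value that fits in n bits: 2**n - 1."""
    return 2 ** n - 1


for n, rough in ((10, "1K"), (20, "1M"), (30, "1G"), (31, "2G"), (32, "4G")):
    print(f"{n} bits: about {rough}, exactly 0 to {max_unsigned(n):,}")
```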

Remember that computer words are always represented in memory in 2s complement form. For positive words, the 2s complement is exactly the same as the unsigned binary word. For negative numbers, a conversion has to be made. Below are some numbers and their 2s complements. For convenience, the word size has been taken as 8, not 32. Remember the restrictions on converting a number into 2s complement? They are shown again below.

Restrictions: for a 4-bit word, the maximum positive (or unsigned) number we can write in 2s complement is $2^{n-1} - 1$ with n = 4, that is, 7. Another way of quickly telling the maximum number which can be converted into 2s complement is . . . . ( 1000) Similarly, for an 8-bit word the maximum positive or unsigned number we can write in 2s complement is $2^{n-1} - 1$ with n = 8, that is, 127. Some examples of converting decimal numbers into 2s complement are given below. The word size is 8. The words in the table lie within the limits, that is, between -128 and +127, except +128, which is shown to illustrate a value beyond the limit.
Word (decimal)   8-bit binary of magnitude   Conversion                                  Two's complement
+5               0000 0101                   positive: unchanged                         0000 0101
-5               0000 0101                   invert = 1111 1010, add 1 = 1111 1011       1111 1011
+7               0000 0111                   positive: unchanged                         0000 0111
-7               0000 0111                   invert = 1111 1000, add 1 = 1111 1001       1111 1001
+127             0111 1111                   positive: unchanged                         0111 1111
-127             0111 1111                   invert = 1000 0000, add 1 = 1000 0001       1000 0001
+128             1000 0000                   beyond the limit (largest is +127)          ---
-128             1000 0000                   invert = 0111 1111, add 1 = 1000 0000       1000 0000
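The invert-and-add-1 rule in the table can be checked mechanically. The sketch below is minimal and assumes an 8-bit word, as in the table. `twos_complement` is a hypothetical helper, not something defined in the text. It rejects values outside the restrictions given above.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Two's complement pattern of `value` in a `bits`-bit word, found with the
    invert-and-add-1 method used in the table."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} is beyond the limit ({lo} to {hi}) for {bits} bits")
    mask = (1 << bits) - 1
    if value >= 0:
        pattern = value                           # positive: same as unsigned binary
    else:
        inverted = ~(-value) & mask               # invert every bit of the magnitude
        pattern = (inverted + 1) & mask           # then add 1
    s = format(pattern, f"0{bits}b")
    return " ".join(s[i:i + 4] for i in range(0, bits, 4))


for n in (5, -5, 7, -7, 127, -127, -128):
    print(f"{n:+5d}  {twos_complement(n)}")
```

Calling `twos_complement(128)` raises `ValueError`, matching the "beyond the limit" row.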

2.2. sub $s1, $s2, $s3


This instruction subtracts the contents of $s3 from the contents of $s2 and places the result in $s1. If the result overflows, the hardware detects it by itself; the programmer does not have to do anything. In machine language the instruction is represented as

000000 10010 10011 10001 00000 100010

op      rs      rt      rd      shamt   funct     (field)
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits    (width)
0       18      19      17      0       34        (decimal value)

The register fields decode as follows:

- rs = 18 is $s2;
- rt = 19 is $s3;
- rd = 17 is $s1.

Note that the 1st and 6th fields for add are 0 and 32; for sub they are 0 and 34.
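To see where the binary pattern comes from, the six fields can be packed into one 32-bit word with shifts. The sketch below is in Python. It assumes the conventional MIPS register numbers: $t0 = 8 and $s0 to $s7 = 16 to 23. `encode_r` and `fields` are hypothetical helpers.

```python
# Conventional MIPS register numbers (assumption: standard MIPS register convention).
REG = {"$t0": 8, "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19}


def encode_r(rs: str, rt: str, rd: str, funct: int, shamt: int = 0, op: int = 0) -> int:
    """Pack the six R-type fields op(6) rs(5) rt(5) rd(5) shamt(5) funct(6) into 32 bits."""
    return (op << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | (shamt << 6) | funct


def fields(word: int) -> str:
    """Split a 32-bit word into 6/5/5/5/5/6-bit groups for display."""
    bits = format(word, "032b")
    out, pos = [], 0
    for width in (6, 5, 5, 5, 5, 6):
        out.append(bits[pos:pos + width])
        pos += width
    return " ".join(out)


# sub $s1, $s2, $s3: destination $s1 goes in rd, $s2 in rs, $s3 in rt; funct = 34
word = encode_r(rs="$s2", rt="$s3", rd="$s1", funct=34)
print(fields(word))   # 000000 10010 10011 10001 00000 100010
```

By comparison, `encode_r("$s1", "$s2", "$t0", 32)` gives 000000 10001 10010 01000 00000 100000, which is add $t0, $s1, $s2.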

2.2. lw $t0, 32($s3)


Load means transferring data from a location in main memory to one of the registers in the register file. The instruction above loads a word from the array whose base address is held in $s3 and puts it in $t0. The word starts at byte offset 32 from the base. Because each data word consists of 4 bytes, byte offset 32 is word number 8, counting from word 0 (A[8] if the array is called A).

Often you deal with arrays that have thousands of elements, each of which has to be processed. You might think it would save time to keep the entire array in registers. However, there are only 32 registers in the register file, so thousands of array elements cannot be held there. The array has to be stored in memory, which has millions of locations. Since the ALU performs calculations only on the contents of registers, the array data must first be loaded into the register file.

The instruction works as follows:

1. The constant 32 is added to the contents of register $s3.
2. The memory location at the resulting address is read.
3. Its contents are transferred into $t0.

$s3 contains the base address of the array, that is, the address where the array starts or where its first element is stored. The register that holds the base address is called the base register (it is also sometimes called an index register). The offset added to it is 32 for word 8 of the array.

2.3 sw $t0, 48($s3)


sw stands for store word, and it is the reverse of the load operation: it transfers data from one of the registers in the register file to memory. sw $t0, 48($s3) stores the word in register $t0 into memory at the address obtained by adding 48 to the contents of $s3.

The lw and sw operations are called data transfer instructions; each one transfers a single word from memory to a register or vice versa. A program may have more variables than the 32 registers can hold. In that case the most frequently used data are kept in the register file and the less frequently used data are placed in memory. Moving variables out to memory in this way is called spilling registers (register spillover).

As said earlier, an array may contain thousands of data words that need processing. Suppose that at some stage you need to load the word at offset 1023 from the base of an array into register $s1, with the base address in $s0. The instruction would be lw $s1, 1023($s0). (A real lw offset must also keep the address a multiple of 4; 1023 is used here only to show the field width.)

This instruction cannot be written in the R-type format, because 1023 (11 1111 1111) does not fit in a 5-bit field. Instructions such as lw and sw therefore need more bits for constants like 1023, so the last three fields of the R-type format are combined into a single 16-bit field. 16 bits give $2^{16}$ = 65,536 (64K) different values; read as a signed two's complement offset, they cover -32,768 to +32,767. Thus the R-type instruction

op (6 bits)  rs (5 bits)  rt (5 bits)  rd (5 bits)  shamt (5 bits)  funct (6 bits)

becomes

op (6 bits)  rs (5 bits)  rt (5 bits)  constant or address (16 bits)

This type of instruction is called I-type. I-type instructions are used for lw and sw operations, and for immediate arithmetic such as addi. The table below gives the details of a few instructions:

Instruction     Format  op      rs   rt   rd    shamt  funct   address
add             R       0       reg  reg  reg   0      32ten   n.a.
sub (subtract)  R       0       reg  reg  reg   0      34ten   n.a.
addi            I       8ten    reg  reg  n.a.  n.a.   n.a.    constant
lw              I       35ten   reg  reg  n.a.  n.a.   n.a.    address
sw              I       43ten   reg  reg  n.a.  n.a.   n.a.    address
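The I-type layout can be sketched the same way as the R-type one, using the opcodes 8, 35 and 43 from the table. In this minimal Python sketch, `encode_i` and the register map are hypothetical (conventional MIPS numbering assumed). The sketch also enforces the signed 16-bit limit described above.

```python
REG = {"$t0": 8, "$s0": 16, "$s1": 17, "$s3": 19}   # conventional MIPS numbering
OPCODE = {"addi": 8, "lw": 35, "sw": 43}          # decimal opcodes from the table


def encode_i(op: str, rs: str, rt: str, imm: int) -> int:
    """Pack op(6) rs(5) rt(5) immediate(16); the constant must fit in 16 signed bits."""
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError(f"{imm} does not fit in a 16-bit signed field")
    return (OPCODE[op] << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


# lw $t0, 32($s3): the base register $s3 goes in rs, the destination $t0 in rt
lw_word = encode_i("lw", rs="$s3", rt="$t0", imm=32)
# A negative offset is stored as its 16-bit two's complement pattern
sw_word = encode_i("sw", rs="$s3", rt="$t0", imm=-4)
print(hex(lw_word), hex(sw_word))   # 0x8e680020 0xae68fffc
```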

ARITHMETIC instructions: there are 7 instructions in this category. In the register column, the leftmost register ($s1) is the destination register. The word immediate in an instruction name indicates that one of the operands is a constant.

Instruction                     Code   Example              Meaning            Comments
add                             add    add $s1,$s2,$s3      $s1 = $s2 + $s3    3 operands; overflow detected
subtract                        sub    sub $s1,$s2,$s3      $s1 = $s2 - $s3    3 operands; overflow detected
add immediate                   addi   addi $s1,$s2,100     $s1 = $s2 + 100    + constant; overflow detected
add unsigned                    addu   addu $s1,$s2,$s3     $s1 = $s2 + $s3    3 operands; overflow not detected
subtract unsigned               subu   subu $s1,$s2,$s3     $s1 = $s2 - $s3    3 operands; overflow not detected
add immediate unsigned          addiu  addiu $s1,$s2,100    $s1 = $s2 + 100    + constant; overflow not detected
move from coprocessor register  mfc0   mfc0 $s1,$epc        $s1 = $epc         copy exception PC + special registers
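The difference between add and addu lies only in the overflow check; both produce the same 32-bit pattern. Below is a minimal Python sketch, assuming the overflow exception can be modelled as a Python `OverflowError`. `add`, `addu` and `to_signed32` are hypothetical helpers.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Read the low 32 bits of x as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def addu(a: int, b: int) -> int:
    """addu: keep the low 32 bits of the sum; overflow is not detected."""
    return (a + b) & MASK32


def add(a: int, b: int) -> int:
    """add: same 32-bit sum, but signed overflow raises an exception."""
    exact = to_signed32(a) + to_signed32(b)
    if not -(1 << 31) <= exact < (1 << 31):
        raise OverflowError("arithmetic overflow in add")
    return exact & MASK32


largest = 0x7FFF_FFFF                 # +2,147,483,647, the largest signed 32-bit value
print(hex(addu(largest, 1)))          # 0x80000000: wrapped round to a negative pattern
try:
    add(largest, 1)
except OverflowError as err:
    print("add:", err)
```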

DATA TRANSFER: data transfer means data moves either from memory to a register or from a register to memory. Load always means data goes from memory to a register, and store always means data goes from a register to memory. In data transfer we deal with a full word, a half word or a byte. Immediate means that we are dealing with a constant.

Instruction           Code  Example            Meaning                       Comments
load word             lw    lw $s1,100($s2)    $s1 = Memory[$s2 + 100]       word from memory to register
store word            sw    sw $s1,100($s2)    Memory[$s2 + 100] = $s1       word from register to memory
load half unsigned    lhu   lhu $s1,100($s2)   $s1 = Memory[$s2 + 100]       half word from memory to register
store half            sh    sh $s1,100($s2)    Memory[$s2 + 100] = $s1       half word from register to memory
load byte unsigned    lbu   lbu $s1,100($s2)   $s1 = Memory[$s2 + 100]       byte from memory to register
store byte            sb    sb $s1,100($s2)    Memory[$s2 + 100] = $s1       byte from register to memory
load upper immediate  lui   lui $s1,100        $s1 = $100 \times 2^{16}$     load constant in upper 16 bits
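Because an I-type constant is only 16 bits, lui is what makes larger constants reachable. The minimal Python sketch below uses hypothetical helper names. It shows lui alone and then pairs it with ori to build a full 32-bit constant. The lui-then-ori pairing is a common MIPS idiom; it is not described in the text above.

```python
def lui(imm: int) -> int:
    """lui: put a 16-bit constant in the upper half of the register; lower half becomes 0."""
    return (imm & 0xFFFF) << 16


def ori(reg: int, imm: int) -> int:
    """ori: OR a zero-extended 16-bit constant into the register."""
    return reg | (imm & 0xFFFF)


print(lui(100))                     # 6553600, i.e. 100 * 2**16
print(ori(lui(0x003D), 0x0900))     # 4000000 = 0x003D0900
```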

LOGICAL instructions:

Instruction          Code  Example            Meaning
and                  and   and $s1,$s2,$s3    $s1 = $s2 & $s3
or                   or    or $s1,$s2,$s3     $s1 = $s2 | $s3
nor                  nor   nor $s1,$s2,$s3    $s1 = ~($s2 | $s3)
and immediate        andi  andi $s1,$s2,100   $s1 = $s2 & 100
or immediate         ori   ori $s1,$s2,100    $s1 = $s2 | 100
shift left logical   sll   sll $s1,$s2,10     $s1 = $s2 << 10
shift right logical  srl   srl $s1,$s2,10     $s1 = $s2 >> 10

In each instruction, $s1, $s2 and either $s3 or a constant are given. In the immediate instructions the constant (100 in these examples) is given instead of $s3. AND and OR are performed on $s2 and $s3, or on $s2 and the constant. sll and srl shift the contents of register $s2 by the given amount (10 here) and save the result in $s1.

CONDITIONAL BRANCH: there are 6 instructions in this category, given below in tabular form. Each of them tests a condition, shown in brackets in the Meaning column.

- beq and bne: if the condition is true, the branch is taken; otherwise the processor continues with the next instruction in its normal sequence.
- The four set instructions: these do not branch. They set $s1 to 1 if the condition is true and to 0 otherwise, and their result is normally tested by a following beq or bne.

Instruction                           Code   Example             Meaning
branch on equal                       beq    beq $s1,$s2,25      if ($s1 == $s2) go to PC + 4 + 100
branch on not equal                   bne    bne $s1,$s2,25      if ($s1 != $s2) go to PC + 4 + 100
set on less than                      slt    slt $s1,$s2,$s3     if ($s2 < $s3) $s1 = 1; else $s1 = 0
set on less than immediate            slti   slti $s1,$s2,100    if ($s2 < 100) $s1 = 1; else $s1 = 0
set on less than unsigned             sltu   sltu $s1,$s2,$s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0 (unsigned comparison)
set on less than immediate unsigned   sltiu  sltiu $s1,$s2,100   if ($s2 < 100) $s1 = 1; else $s1 = 0 (unsigned comparison)

Note: the branch offset 25 counts words, so it translates to 25 x 4 = 100 bytes.
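slt and sltu can give different answers for the same two bit patterns, because one reads them as signed numbers and the other as unsigned. The minimal Python sketch below makes this concrete. `slt`, `sltu` and `to_signed32` are hypothetical helpers modelling 32-bit registers.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Read the low 32 bits of x as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def slt(rs: int, rt: int) -> int:
    """slt: 1 if rs < rt with both 32-bit patterns read as signed numbers, else 0."""
    return 1 if to_signed32(rs) < to_signed32(rt) else 0


def sltu(rs: int, rt: int) -> int:
    """sltu: the same test with both patterns read as unsigned numbers."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0


minus_one = 0xFFFF_FFFF           # the 32-bit pattern of -1
print(slt(minus_one, 1), sltu(minus_one, 1))   # 1 0
```

Read as signed, 0xFFFFFFFF is -1, which is less than 1. Read as unsigned, it is 4,294,967,295, which is not.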

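UNCONDITIONAL JUMP:

Instruction     Code  Example     Meaning
jump            j     j 2500      go to 10000 (2500 x 4)
jump register   jr    jr $ra      go to the address held in $ra
jump and link   jal   jal 2500    $ra = PC + 4; go to 10000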

Branching instruction

2.4 LOGICAL INSTRUCTIONS


We will be dealing with shift instructions, which shift a 32-bit word left or right: sll means shift left logical and srl means shift right logical. We will also deal with the other logical instructions shown in the table below. The arithmetic and data transfer instructions are listed with them for comparison:

Category       Instruction          Example             Meaning                       Comments
arithmetic     add                  add $s1,$s2,$s3     $s1 = $s2 + $s3               overflow detected
arithmetic     subtract             sub $s1,$s2,$s3     $s1 = $s2 - $s3               overflow detected
arithmetic     add immediate        addi $s1,$s2,100    $s1 = $s2 + 100               overflow detected
logical        and                  and $s1,$s2,$s3     $s1 = $s2 & $s3
logical        or                   or $s1,$s2,$s3      $s1 = $s2 | $s3
logical        nor                  nor $s1,$s2,$s3     $s1 = ~($s2 | $s3)
logical        and immediate        andi $s1,$s2,100    $s1 = $s2 & 100
logical        or immediate         ori $s1,$s2,100     $s1 = $s2 | 100
logical        shift left logical   sll $s1,$s2,10      $s1 = $s2 << 10
logical        shift right logical  srl $s1,$s2,10      $s1 = $s2 >> 10
data transfer  load word            lw $s1,100($s2)     $s1 = Memory[$s2 + 100]
data transfer  store word           sw $s1,100($s2)     Memory[$s2 + 100] = $s1
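A logical shift moves in zeros, and bits shifted past either end are lost. Shifting left by k bits therefore multiplies by $2^k$, as long as no 1 bits fall off the left. The minimal Python sketch below uses hypothetical `sll`/`srl` helpers on 32-bit values.

```python
MASK32 = 0xFFFF_FFFF


def sll(value: int, shamt: int) -> int:
    """Shift left logical: zeros enter on the right; bits pushed past bit 31 are lost."""
    return (value << shamt) & MASK32


def srl(value: int, shamt: int) -> int:
    """Shift right logical: zeros enter on the left."""
    return (value & MASK32) >> shamt


print(sll(9, 4))                  # 144: shifting left by 4 multiplies by 2**4
print(sll(3, 10))                 # 3072: sll by 10 multiplies by 1,024
print(hex(srl(0x8000_0000, 31)))  # 0x1: the sign bit is not copied, zeros fill in
```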

Overflow is detected only in arithmetic operations; it is not detected in logical or data transfer operations. Examples of the and, andi, or, ori and nor instructions are given below, using a short word size of 8 bits.

A common use of these operations is masking a word. Only the bits of the word that we are interested in remain unchanged, and the rest are forced to a fixed value. For instance, suppose we want the rightmost 4 bits of a word to remain unchanged and all the other bits to become zero. The mask should then be:

Mask           = 0000 1111
Word to mask   = 1111 0101
After AND      = 0000 0101   (the rightmost 4 bits are unchanged; the rest become zero)

EXAMPLES

AND is used to force the unwanted bits in a word to 0:

Mask        0000 1111   0000 1111   0000 0000
Word        1010 0111   1010 1010   0111 1010
After AND   0000 0111   0000 1010   0000 0000

The AND operation forces the unwanted bits in a word to 0. It is done by ANDing each bit of the word with the corresponding bit of the mask and saving the resulting bit. Take the first word in the table above and start from the left. AND the first bit of the mask with the first bit of the word, then the second, the third and so on. For the leftmost four bits this gives 0&1=0, 0&0=0, 0&1=0, 0&0=0; the remaining bits are handled in the same way. The mask contains zeros where the word's bits are to be masked out.

OR is used to force some bits in the word to 1:

Mask        0000 1111   0000 1111   0000 0000
Word        1010 0111   1010 1010   0111 1010
After OR    1010 1111   1010 1111   0111 1010

The OR operation forces the selected bits in a word to 1. It is done by ORing each bit of the word with the corresponding bit of the mask and saving the resulting bit. For the first word, the rightmost four bits (bits 5 to 8 counting from the left) give 1|0=1, 1|1=1, 1|1=1, 1|1=1. The remaining bits are handled in the same way.

NOR (not OR) ORs the two words and then inverts every bit of the result. NOR with a mask of all zeros simply inverts the word; this is how MIPS performs NOT, since it has no separate NOT instruction:

Mask        0000 0000   0000 1111   0000 0000
Word        1010 0111   1010 1010   0111 1010
After NOR   0101 1000   0101 0000   1000 0101

SOME DEFINITIONS
The numeric version of instructions is called machine language, to distinguish it from assembly language.
A sequence of such instructions is called machine code.
The layout of an instruction is called the instruction format.
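The masking tables above can be reproduced with Python's bitwise operators on 8-bit values. In this minimal sketch, `bits8`, `show` and `nor8` are hypothetical helpers. Python integers are not fixed-width, so NOR is masked back to 8 bits after the inversion.

```python
def bits8(s: str) -> int:
    """Parse an 8-bit pattern such as '0000 1111' (spaces allowed)."""
    return int(s.replace(" ", ""), 2)


def show(x: int) -> str:
    """Format the low 8 bits as two groups of four."""
    s = format(x & 0xFF, "08b")
    return s[:4] + " " + s[4:]


def nor8(a: int, b: int) -> int:
    """8-bit NOR: OR the two words, then invert every bit."""
    return ~(a | b) & 0xFF


mask, word = bits8("0000 1111"), bits8("1010 0111")
print(show(mask & word))       # 0000 0111  AND clears the bits where the mask is 0
print(show(mask | word))       # 1010 1111  OR sets the bits where the mask is 1
print(show(nor8(0, word)))     # 0101 1000  NOR with 0 inverts the word
```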

2.5. DECISION MAKING/ IF ELSE THEN/ CONDITIONAL JUMP/ UNCONDITIONAL JUMP


Often you have to make a decision. For instance: if i is equal to j, add h to j; if i is not equal to j, subtract h from j. Diagrammatically this is a two-way decision. With i in $s1 and j in $s2, the assembly language starts with:

beq $s1, $s2, L1   # if the contents of $s1 and $s2 are equal, branch to the instruction labelled L1; otherwise continue with the next instruction (the subtraction)

The code is usually simpler when the test is done in the reverse direction:

bne $s1, $s2, L1   # if the contents of $s1 and $s2 are not equal, branch to L1; otherwise continue with the next instruction

Why so? Testing the opposite condition lets the code for the equal case follow the branch directly, which is simpler to write and saves time. In a branch, the condition is tested first, and then either the branch is taken or execution continues; this is called a conditional jump. There is also an unconditional jump: no condition is tested, and the jump always takes place. To distinguish it from a conditional branch, the unconditional jump is written with j, for example j Exit. When the program reaches this instruction, it jumps to the instruction labelled Exit.

2.6 LOOPS

Loops are important. Suppose you want to execute a set of instructions several times. You execute the set once and decrease the contents of a certain register, say $s1, by one; if $s1 originally contained 10, it now contains 9. You repeat the same set of instructions over and over again, decreasing $s1 each time, until it becomes zero. This means you repeat the set ten times: you are looping, that is, repeating the same set of instructions 10 times. In short:

1. Put 10 in a register, say $s1.
2. Each time the set is completed, decrease $s1 by 1.
3. When $s1 becomes zero, come out of the loop.

STACKS / PROCEDURE / FUNCTION

Your program, whether it is written in assembly language or in a high-level language, executes its instructions in order: first instruction #1, then instruction #2, then instruction #3, and so on. Such execution is called sequential, and all programs run sequentially.

Often, however, the program has to leave its sequence and jump to another short program to get a job done. The program it jumps to is called a procedure or function. The program from which the jump is made is called the caller, and the procedure it jumps to is called the callee.

Suppose you have written a caller that adds a very long list of numbers, and somewhere in the middle it needs to multiply two numbers and then continue the addition. The jump is made to a callee that has been written to multiply two numbers. Going to a callee for the multiplication has at least one advantage: you do not have to write the multiplication instructions yourself. If you did, your program would become bigger and you might make errors in the multiplication code. Why bother writing a multiplication of two numbers if a program for it has already been written and well tested? Programmers always do this: they use short programs written by others to get jobs done.

Before your program jumps to a procedure, the following things must be done:

1. The caller must provide the data on which the callee has to work.
2. The data must be placed where the callee can access it. Four registers have been reserved for this purpose, called $a0, $a1, $a2 and $a3; a stands for the XXX.
3. The caller then calls the callee to do its work by issuing the instruction jal XXX. jal stands for jump and link. It gives the address to which the jump should be made, and it also saves the return address (PC + 4) in register $ra.
4. The callee will usually need a few registers, which may already be in use by the caller. They are therefore saved first (on the stack), so that when the callee finishes its job their original values can be restored. If, say, ten registers are needed, all ten are saved. In MIPS the registers that must be preserved in this way are the saved registers $s0 to $s7; s stands for saved.
5. After finishing its job, the callee tells the caller to resume its work. To do this the callee executes the instruction jr XXX. jr stands for jump register, and it jumps to the address held in register XXX.

Nested procedures: suppose a caller calls procedure #1, and before procedure #1 is complete it calls procedure #2. This is a case of nested procedures; procedure #2 may in turn call procedure #3 before it completes, and so on. Nested procedures are not uncommon. The point to remember is how to save the registers on the stack, so that when a procedure finishes its task and returns, it presents exactly the same environment to its caller. Diagrammatically, nested procedures can be shown as follows.
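The saving and restoring described above can be simulated with a Python list standing in for the stack. This is a toy model, not real MIPS execution, and the names `regs`, `push`, `pop`, `jal`, `square` and `sum_of_squares` are hypothetical. It also assumes the conventional result register $v0, which the text does not mention. The nested procedure must save $ra as well as $s0, because its own jal calls overwrite $ra.

```python
# Toy model: registers are a dict and the stack is a list (hypothetical names).
regs = {"$a0": 0, "$a1": 0, "$v0": 0, "$s0": 0, "$ra": 0}
stack = []


def push(*names):
    """Save registers on the stack (the job done by sw instructions in real code)."""
    for name in names:
        stack.append(regs[name])


def pop(*names):
    """Restore registers in the reverse order in which they were pushed."""
    for name in reversed(names):
        regs[name] = stack.pop()


def jal(procedure, return_address):
    """jal: save the return address in $ra, then jump to the procedure."""
    regs["$ra"] = return_address
    procedure()               # returning from this call plays the role of jr $ra


def square():                 # leaf procedure: $v0 = $a0 * $a0
    push("$s0")               # the callee saves the $s register it is about to use
    regs["$s0"] = regs["$a0"] * regs["$a0"]
    regs["$v0"] = regs["$s0"]
    pop("$s0")


def sum_of_squares():         # nested: $v0 = $a0**2 + $a1**2
    push("$ra", "$s0")        # its own jal calls overwrite $ra, so save it too
    regs["$s0"] = regs["$a1"]                # keep the second argument safe
    jal(square, 0x2000)                      # $v0 = $a0 ** 2
    regs["$a0"], regs["$s0"] = regs["$s0"], regs["$v0"]
    jal(square, 0x2008)                      # $v0 = $a1 ** 2
    regs["$v0"] += regs["$s0"]
    pop("$ra", "$s0")         # restore the caller's environment before returning


regs.update({"$s0": 42, "$a0": 3, "$a1": 4})   # the caller's $s0 must survive the call
jal(sum_of_squares, 0x1000)
print(regs["$v0"], regs["$s0"], hex(regs["$ra"]))   # 25 42 0x1000
```

After the call, the result is in $v0 and both $s0 and $ra are back to their original values. The stack is also empty again.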

Figure: nested procedure calls

ADDRESSING MODES
There are many addressing modes; five of them are important:
1. Register addressing
2. Base or displacement addressing
3. Immediate addressing
4. PC-relative addressing
5. Pseudodirect addressing

These modes are best described diagrammatically. Their explanations are:

Mode                          Explanation
Register addressing           all three operands are in registers
Base or displacement          the operand is in memory, at the address formed by adding a register and a constant in the instruction (as in lw and sw)
Immediate addressing          two operands are in registers and the third one is a constant in the instruction
PC-relative addressing        the address is formed by adding the constant in the instruction (times 4) to the updated PC (PC + 4); used by branches
Pseudodirect addressing       the jump address is the 26-bit field of the instruction shifted left by 2, joined to the upper 4 bits of the PC

JUMP and unconditional branch instructions: j, jr, jal.
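Two of these modes can be sketched as short address calculations. This minimal Python sketch uses hypothetical helpers, and the register and PC values are made up for illustration. Pseudodirect addressing is what turns j 2500 into byte address 10000.

```python
MASK32 = 0xFFFF_FFFF


def base_address(base: int, offset: int) -> int:
    """Base (displacement) addressing: register contents + signed 16-bit constant."""
    return (base + offset) & MASK32


def pseudodirect(pc: int, target26: int) -> int:
    """Pseudodirect addressing (j, jal): the upper 4 bits of PC + 4 joined to the
    26-bit field shifted left by 2 (a word number becomes a byte address)."""
    return ((pc + 4) & 0xF000_0000) | ((target26 & 0x3FF_FFFF) << 2)


# Hypothetical register/PC values chosen only for illustration.
print(hex(base_address(0x1000_0000, 32)))   # lw $t0, 32($s3) with $s3 = 0x10000000
print(pseudodirect(0x0040_0000, 2500))      # j 2500 reaches byte address 10000
```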

Scientific notation: a number written with both a fraction and an exponent is said to be in scientific notation; both the fraction and the exponent must be present. Example: $3.578 \times 10^{-5}$. Since a computer writes numbers in binary (0s and 1s), a binary number in scientific notation looks like, say, $0111.1010 \times 2^{-1}$.

Normalized scientific notation: a number in scientific notation is normalized when there is exactly one digit to the left of the binary point and that digit is not zero. In binary this digit is therefore always 1. For instance, $1.0011 \times 2^{0}$ and $1.1110 \times 2^{-1}$ are normalized.

How the computer detects overflow

Operation   Operand A   Operand B   Result indicating overflow
A + B       positive    positive    negative
A + B       negative    negative    positive
A - B       positive    negative    negative
A - B       negative    positive    positive

Here positive means that the sign bit is 0, so zero counts as positive.
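The four rows of the table can be checked with sign bits alone, without knowing the true sum. This minimal Python sketch assumes the 8-bit word size used in the earlier examples. `add_overflows`, `sub_overflows` and `enc` are hypothetical helpers.

```python
BITS = 8                      # short word size, as in the earlier examples
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def negative(x: int) -> bool:
    """True if the sign bit of the BITS-bit pattern x is 1."""
    return bool(x & SIGN)


def add_overflows(a: int, b: int) -> bool:
    """A + B: operands of the same sign giving a result of the other sign."""
    r = (a + b) & MASK
    return negative(a) == negative(b) and negative(r) != negative(a)


def sub_overflows(a: int, b: int) -> bool:
    """A - B: operands of different signs giving a result whose sign differs from A."""
    r = (a - b) & MASK
    return negative(a) != negative(b) and negative(r) != negative(a)


def enc(n: int) -> int:
    """Two's complement pattern of n in BITS bits."""
    return n & MASK


print(add_overflows(enc(127), enc(1)))    # True: positive + positive gave negative
print(sub_overflows(enc(-128), enc(1)))   # True: negative - positive gave positive
print(add_overflows(enc(5), enc(-5)))     # False
```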

Overflow: suppose you add two floating point numbers and the result has an exponent so big that it cannot be stored in the 8 bits reserved for the exponent. Then overflow is said to have occurred. The largest exponent allowed is +127, so a result exponent of 127 is not an overflow, but 128 is. (The sign rules in the table above are how the hardware detects overflow in integer addition and subtraction.)

Underflow: suppose the computer subtracts a floating point number B from another floating point number A, and the result has an exponent so small (so negative) that it cannot be represented in the 8 bits reserved for the exponent. Then underflow is said to have occurred. In our example this happens when the exponent turns out to be smaller than -126. Note that -127 is reserved for another purpose.

Reserved exponent values: as mentioned in the overflow and underflow paragraphs, the maximum exponent is +127 and the minimum is -126. The value -127 is reserved and will be explained later.

Significand versus fraction: the fraction is the part stored in the fraction bits. The significand is the value obtained when the implicit leading 1 is added to the fraction, that is, significand = 1 + fraction.

EXAMPLE OF ADDING TWO FLOATING POINT NUMBERS (add): add 0.75 to 0.25. The steps are:
1. Make sure the two numbers are in significand and exponent format, written in binary. Any number that is not in this format must first be converted.
2. Increase the exponent of the smaller number until it equals the exponent of the larger number, shifting its significand to the right.
3. When the exponents are equal, add the significands.
4. Normalize the sum and check for overflow or underflow; if either occurs, an exception is raised.
5. Round the significand if necessary.
6. Check normalization again; if rounding has left the result unnormalized, normalize it again.
All the steps of the algorithm are given in the flow chart shown below.

FIG 2.2

The block diagram of the unit that adds floating point numbers is shown in fig XXXX.
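The steps above can be followed on the worked example. In binary, $0.75 = 1.1_2 \times 2^{-1}$ and $0.25 = 1.0_2 \times 2^{-2}$. Shifting the smaller number gives $0.1_2 \times 2^{-1}$, and the sum is $10.0_2 \times 2^{-1}$, which normalizes to $1.0_2 \times 2^{0} = 1$.

The Python sketch below follows the same steps using exact fractions. It is a simplification with several assumptions: it handles only positive numbers, keeps only 4 fraction bits (a toy precision), and rounds half up rather than using IEEE round-to-nearest-even. `decompose` and `fp_add` are hypothetical helpers.

```python
from fractions import Fraction

FRACTION_BITS = 4           # toy precision (assumption): 4 bits after the binary point
MAX_EXP, MIN_EXP = 127, -126


def decompose(x: Fraction) -> tuple[Fraction, int]:
    """Step 1: write x as significand * 2**exponent with 1 <= significand < 2."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    e = 0
    while x >= 2:
        x, e = x / 2, e + 1
    while x < 1:
        x, e = x * 2, e - 1
    return x, e


def fp_add(a: Fraction, b: Fraction) -> tuple[Fraction, int]:
    (sa, ea), (sb, eb) = decompose(a), decompose(b)
    if ea < eb:                                  # make (sa, ea) the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb = sb / 2 ** (ea - eb)                     # Step 2: shift smaller significand right
    s, e = sa + sb, ea                           # Step 3: add significands
    if s >= 2:                                   # Step 4: normalise (the sum is below 4)
        s, e = s / 2, e + 1
    scale = 2 ** FRACTION_BITS                   # Step 5: round to FRACTION_BITS bits
    s = Fraction(int(s * scale + Fraction(1, 2)), scale)   # round half up (simplification)
    if s >= 2:                                   # Step 6: rounding may denormalise
        s, e = s / 2, e + 1
    if e > MAX_EXP:
        raise OverflowError("exponent overflow")
    if e < MIN_EXP:
        raise ArithmeticError("exponent underflow")
    return s, e


s, e = fp_add(Fraction(3, 4), Fraction(1, 4))    # 0.75 + 0.25
print(s, e)                                       # 1 0  ->  1.0 x 2**0
```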

FIG 2.3

Instruction size


MIPS memory is byte addressed: each memory location holds 8 bits (1 byte). This is true for both instructions and data. Since each instruction is 32 bits long, every instruction needs 4 bytes, so when instructions are loaded into memory each one fills 4 consecutive locations. Thus, to go from one instruction to the next, the address has to be incremented by 4. This is done by an ALU that is used only as an adder.

CHAPTER 3 DATAPATH AND CONTROL


Assembly language instructions are of 5 categories.

THE ARITHMETIC INSTRUCTIONS. There are 7 instructions in this category: add, sub, addi, addu, subu, addiu and mfc0. The abbreviations stand for add, subtract, add immediate, add unsigned, subtract unsigned, add immediate unsigned and move from coprocessor. An immediate instruction replaces one register with a constant word.

Note that these instructions name 3 registers, except the immediate instructions, which name two registers and one constant (and mfc0, which names two registers).

FIGURE 3.1 Datapath of an arithmetic instruction that uses three registers (Fig 5.10)

op    rs (reg 1)    rt (reg 2)    rd (reg 3)    shamt    funct

FIGURE 3.2: the six fields of an R-type instruction

To align with the lecture notes, rs is called reg 1, rt reg 2 and rd reg 3 in the six fields above. In Figure 3.1, the register file connects to these fields as follows:

- Read register 1 receives the number of register 1, that is, rs in the machine language instruction. The contents of this register appear at Read data 1 of the register file.
- Read register 2 receives rt, and the contents of that register appear at Read data 2.
- Write register receives rd, the number of the register to be written. The value written into it is calculated by the ALU shown in the figure.

THE DATAPATH OF THIS INSTRUCTION:

1. The 32-bit instruction enters the register file (please see the figure). Read register 1, Read register 2 and Write register receive their register numbers from the instruction automatically.
2. Read data 1 and Read data 2 deliver the contents of those two registers.
3. The ALU adds the two values, and the sum appears at its output.
4. This instruction does not need the data memory, so the result bypasses the data memory, passes through the MUX and goes back to the Write data input of the register file.
5. Both Write register and Write data are now supplied, so the register file writes the result into the register at the next clock edge.

DATA TRANSFER instructions: there are also 7 instructions in this category. In each of them, a data word moves from data memory to one of the registers in the register file or vice versa. The instructions are lw, sw, lhu, sh, lbu, sb and lui, which stand for load word, store word, load half unsigned, store half, load byte unsigned, store byte and load upper immediate. The abbreviations are formed from the first letters of the words.

LOGICAL INSTRUCTIONS: there are 7 instructions in this category as well: and, or, nor, andi, ori, sll and srl. They stand for and, or, nor (not or), and immediate, or immediate, shift left logical and shift right logical.

CONDITIONAL BRANCH instructions: there are 6 instructions in this category, namely beq, bne, slt, slti, sltu and sltiu. They stand for branch if equal, branch if not equal, set on less than, set on less than immediate, set on less than unsigned and set on less than immediate unsigned.

UNCONDITIONAL JUMP INSTRUCTIONS: there are three instructions in this category, namely j, jr and jal. They stand for jump, jump register and jump and link.

CATEGORIES of instructions: this is an appropriate point to state that there are three types of machine language instruction, namely R-type, I-type and J-type. The fields required for each are as follows:

Name       Fields (field size in bits)                              Comments
R format   op (6)  rs (5)  rt (5)  rd (5)  shamt (5)  funct (6)     6 fields; arithmetic instruction format
I format   op (6)  rs (5)  rt (5)  address / immediate (16)         4 fields; data transfer, branch and immediate format
J format   op (6)  target address (26)                              2 fields; jump instruction format

FIGURE 3.3 The types of instructions available in MIPS: R-type, I-type and J-type.

When the program is started, the processor first looks at the program counter (PC) and reads what is in it; this is the address of the first instruction of the program. The PC is then increased by 4 to fetch the next instruction. Each memory location that stores the program is one byte long, and each instruction is 32 bits (4 bytes) long. One instruction therefore occupies 4 locations, and two consecutive instructions are always 4 bytes apart. That is why the PC has to be increased by 4 to point to the next instruction. Mathematically:

PC(new value) = PC(old value) + 4

This is normally written as PC = PC + 4. If a branch instruction tells the processor to jump by, say, 100 bytes, the new value of the PC is PC = PC + 4 + 100.

FIGURE 3.4 This figure shows how the PC is incremented to fetch the next instruction. Two ALUs are shown. Their only action is to add, so they need no control signals. (Other ALUs can perform up to 16 operations, since 4 control bits give 16 combinations, and therefore need control signals to tell them which operation to perform. These two ALUs are only ever asked to add.) The word adder written inside the symbol indicates this.

The first adder adds 4 to the old value of the PC, and the MUX lets this value pass through to the input of the PC. Note that this MUX has two inputs and one output. The control signal tells it which input to pass to the output; the other input is blocked.

If a branch instruction tells the processor to jump by 100, the PC follows the equation PC = PC + 4 + 100. Both adders (ALU 1 and ALU 2) are used, and the MUX selects input 2 and blocks input 1.
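The two adders and the MUX can be modelled as one small function that returns either PC + 4 or the branch target. In this minimal Python sketch, `next_pc` and `sign_extend16` are hypothetical helpers, and `take_branch` plays the role of the MUX control signal. As in the beq example, the branch offset counts words, so it is multiplied by 4 before being added.

```python
def sign_extend16(x: int) -> int:
    """Read the low 16 bits of x as a signed two's complement number."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x


def next_pc(pc: int, take_branch: bool, offset16: int = 0) -> int:
    """Adder 1 forms PC + 4; adder 2 forms PC + 4 + offset * 4; the MUX
    (controlled by `take_branch`) passes one of the two to the PC."""
    pc_plus_4 = (pc + 4) & 0xFFFF_FFFF
    branch_target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFF_FFFF
    return branch_target if take_branch else pc_plus_4


print(next_pc(1000, False))        # 1004: the next instruction in sequence
print(next_pc(1000, True, 25))     # 1104: beq offset 25 -> PC + 4 + 100
```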

FIGURE 3.4: datapath for a load / store instruction (5.11)

Whether PC + 4 or PC + 4 + 100 is selected is controlled by the MUX.

MUX: a MUX (multiplexer) is an electronic device with a number of inputs and one output. There is always one output, and the number of inputs is traditionally 2, 4, 8, 16 and so on; that is, each size is double the next smaller one. The MUX is built from combinational circuits.

Examples of arithmetic-logical instructions, which contain three registers:

Add the contents of the two registers $s2 and $s3 and put the result into a third register, $s1:
add $s1, $s2, $s3    $s1 = $s2 + $s3 (3 registers)

Subtract the contents of register $s3 from the contents of register $s2 and put the result into a third register, $s1:
sub $s1, $s2, $s3    $s1 = $s2 - $s3 (3 registers)

AND the contents of two registers and put the result into a third register:
and $s1, $s2, $s3    $s1 = $s2 & $s3 (3 registers)

OR the contents of two registers and put the result into a third register:
or $s1, $s2, $s3     $s1 = $s2 | $s3 (3 registers)

Set a register to 1 if the contents of one register are less than the contents of another register:
slt $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1 else $s1 = 0 (3 registers)

Note that all of the arithmetic-logical instructions above must be given three registers.

Memory reference (data transfer) instructions are load word (lw) and store word (sw). They are the instructions that deal with the data memory: they either put a value into one of its locations or take a value from one of them. Examples:

lw $s1, 100($s2)    $s1 = Memory[$s2 + 100]
Load register $s1 with the contents of the memory location whose address is 100 plus the contents of the base register $s2.

sw $s1, 100($s2)    Memory[$s2 + 100] = $s1
Store the word in register $s1 into the memory location whose address is 100 plus the contents of the base register $s2.

In memory reference instructions only two registers are mentioned.

Branch instructions, that is, branch if equal (beq) and jump (j):
beq $s1, $s2, 25    if ($s1 == $s2) go to PC + 4 + 100
j 2500              go to 10000 (2500 x 4)

How the instruction data flows from the beginning to the end in the processor is known as the datapath (or dataflow). We will explain how an instruction proceeds through the stages Fetch, Decode, Register Read, Execute and finally Write Back. The diagram shown below contains five major parts of the processor:

- the PC (part 1);
- the instruction memory (part 2);
- the register file (part 3);
- the ALU (part 4);
- the data memory (part 5).

The part numbers are also shown in the diagram XX.

FIGURE 3.5 Two more adders are also shown; let us call them part 6 and part 7.

Every instruction, whether it belongs to class 1 (arithmetic-logical), class 2 (memory reference) or class 3 (branch), must use the PC and the instruction memory. Then one or two registers must be read from the register file, as determined by the op code of the instruction. After that the ALU is used, in a way that depends on the class:

- arithmetic-logical instructions use it to add or subtract two numbers, or to AND, OR or SLT the contents of two registers;
- memory reference instructions use it to calculate the memory address;
- branch instructions use it for the comparison.

State elements include the PC, the register file and the memories. The instruction memory is only read, so it can be treated as combinational logic. The processor's 32 general-purpose registers form the register file. Writing data into it requires both an input address and the data itself, and the write control signal must be asserted.

ALU CONTROL: the 4 bits of ALU control shown in the diagram come from two sources: the 6-bit function field of the machine instruction and two more bits called ALUOp. Altogether 8 bits therefore enter the ALU control unit. At its output, the unit produces the 4 bits that control the operation of the ALU.

FIGURE 3.4: How the 4 bits entering the ALU are derived

We will show that the code for the ALU operation can be generated easily from two inputs: the 6-bit function field of the machine language instruction and 2 more bits called ALUOp. The two ALUOp bits are generated by another control unit (the main control unit). First, the table below shows which operations the ALU has to perform. There are some 60 instructions, but only lw, sw, branch if equal and the R-type instructions are listed in the first column. For these, only five different ALU actions are needed: add, subtract, AND, OR and set on less than.

Instruction   ALUOp   Instruction        Function   Desired ALU        ALU control
opcode                operation          field      action             input (4 bits)
lw            00      load word          XXXXXX     add                0010
sw            00      store word         XXXXXX     add                0010
beq           01      branch if equal    XXXXXX     subtract           0110
R-type        10      add                100000     add                0010
R-type        10      subtract           100010     subtract           0110
R-type        10      AND                100100     and                0000
R-type        10      OR                 100101     or                 0001
R-type        10      set on less than   101010     set on less than   0111

Note in the 6-bits function field, x means dont care. The above table means that 4-bits or 4 lines entering the ALU require to generate only 6 codes which are highlighted. Also not that in the column of control inputs the Left most bits is always 0. So instead of 4 bits

our interest should be to cater for left three bits and make the 4 th bit always 0. That is 3 the control bits should be bothered only. The above table is again presented below but with a different viewpoint. If you look at the 3rd column of above table, 2-bit ALUop is given. In 2-bit ALUop If you have 1 on the left hand side (LHS) you are surely generating 0110. This means 1 is important and the adjoining 0 on the LHS is unimportant. In other words whenever you see 1 on the right hand side LHS can be ignored or do not care about it. Similarly If you have 1 on the left hand side you must also check what the function field is and depending upon the function field you will generate the 4 bits. Note when you have 1 on LHS you have always 10 on LHS of the function field. Why not bother about those bits which we have to care about and disregard those which we do not have to?. This fact is also said that we care about those bits which are asserted (that is which are 1) and stop bothering about those which are 0. Thus we will use this technique to generate the 4-bit code for ALU control. The beauty if this strategy is that our control units will be much easier to develop. ALUop ALUop1 0 X 1 1 1 1 1 ALUop0 0 1 X X X X X Function F5 X X X X X X X Field F4 X X X X X X X Operation F3 X X 0 0 0 0 1 F2 X X 0 0 1 1 0 F1 X X 0 1 0 0 1 F0 X X 0 0 0 1 0 0010 0110 0010 0110 0000 0001 0111

In the table above, the ignorable bits of ALUop are marked X to indicate that they are "don't care" bits. Also, since the two left-most bits of the function field (F5 F4) are always 10 for these instructions, we ignore them as well and mark them "don't care".
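The truth table above can be checked with a small Python sketch. The function name alu_control and the use of bit strings are assumptions made for illustration. The sketch mirrors only the rows of the table: it looks at ALUop first and, for R-type instructions, examines only F3..F0, so F5 F4 really are "don't care".

```python
# Hypothetical helper: 4-bit ALU control from the 2-bit ALUop and the
# 6-bit function field F5..F0, both given as bit strings.
R_TYPE_LOW_BITS = {
    "0000": "0010",  # add              (funct 100000)
    "0010": "0110",  # subtract         (funct 100010)
    "0100": "0000",  # AND              (funct 100100)
    "0101": "0001",  # OR               (funct 100101)
    "1010": "0111",  # set on less than (funct 101010)
}


def alu_control(alu_op: str, funct: str) -> str:
    alu_op1, alu_op0 = alu_op[0], alu_op[1]
    if alu_op1 == "0" and alu_op0 == "0":
        return "0010"            # 00: lw/sw, function field ignored, add
    if alu_op0 == "1":
        return "0110"            # X1: beq, ALUop1 ignored, subtract
    # 1X: R-type; F5 F4 are "don't care", so only F3..F0 are examined
    return R_TYPE_LOW_BITS[funct[2:]]


for op, f in [("00", "xxxxxx"), ("01", "xxxxxx"), ("10", "100000"),
              ("10", "100010"), ("10", "100100"), ("10", "100101"),
              ("10", "101010")]:
    print(op, f, "->", alu_control(op, f))
```

A function-field pattern that is not in the table raises a KeyError, which is deliberate: the sketch does not guess codes for instructions the table does not list.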

Let us explain how the 4 bits of ALU control are produced, starting with the first rows of the table. The 2 ALUop bits are determined by the instruction opcode: lw and sw produce 00, branch if equal produces 01, and R-type instructions always produce 10. The function field is ignored for lw, sw and beq, and the 4 ALU control bits for them are 0010, 0010 and 0110 respectively.

TABLE 3.1: The control outputs of the control unit and how each one affects the flow of data

Signal | Effect when deasserted (0) | Effect when asserted (1)
RegDst | The register destination number for the Write register comes from the rt field (bits 20-16). | The register destination number for the Write register comes from the rd field (bits 15-11).
RegWrite | None. | The register on the Write register input is written with the value on the Write data input.
ALUSrc | The second ALU operand comes from the second register file output (Read data 2). | The second ALU operand is the sign-extended lower 16 bits of the instruction.
PCSrc | The PC is replaced by the output of the adder that computes the value of PC + 4. | The PC is replaced by the output of the adder that computes the branch target.
MemRead | None. | Data memory contents designated by the address input are put on the Read data output.
MemWrite | None. | Data memory contents designated by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register Write data input comes from the ALU. | The value fed to the register Write data input comes from the data memory.

Example instructions: add $s1, $s2, $s3 (an R-type instruction: the destination comes from the rd field and the ALU result is written into a register) and sw $s1, 100($s2) (the second ALU operand is the sign-extended offset 100, and data memory is written).

[Figure: outputs of the main control unit: RegDst (Register Destination), Branch, MemRead (Memory Read), MemtoReg (Memory to Register), ALUop (ALU operation), MemWrite (Memory Write), ALUSrc (ALU Source), RegWrite (Register Write)]
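Three of the signals in Table 3.1 (RegDst, ALUSrc and MemtoReg) simply choose between two inputs, so they can be sketched as small selection functions. This is a minimal Python model, not a hardware description: all function names are hypothetical, instructions are plain integers, and the encodings of the two example instructions assume the usual MIPS register numbers $s1 = 17, $s2 = 18, $s3 = 19. PCSrc, MemRead and MemWrite are not modelled.

```python
# Hypothetical model of the three multiplexers described in Table 3.1.
def fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction into the fields used by the muxes."""
    imm = instr & 0xFFFF              # lower 16 bits
    if imm & 0x8000:                  # sign-extend: bit 15 set means negative
        imm -= 0x10000
    return {
        "rt": (instr >> 16) & 0x1F,   # bits 20-16
        "rd": (instr >> 11) & 0x1F,   # bits 15-11
        "imm": imm,
    }


def write_register(instr: int, reg_dst: int) -> int:
    """RegDst: 0 selects the rt field, 1 selects the rd field."""
    f = fields(instr)
    return f["rd"] if reg_dst else f["rt"]


def alu_operand2(instr: int, read_data2: int, alu_src: int) -> int:
    """ALUSrc: 0 selects Read data 2, 1 selects the sign-extended immediate."""
    return fields(instr)["imm"] if alu_src else read_data2


def write_data(alu_result: int, mem_read_data: int, mem_to_reg: int) -> int:
    """MemtoReg: 0 selects the ALU result, 1 selects the data memory output."""
    return mem_read_data if mem_to_reg else alu_result


ADD = 0x02538820   # add $s1, $s2, $s3
SW = 0xAE510064    # sw  $s1, 100($s2)
print(write_register(ADD, reg_dst=1))             # 17, i.e. $s1 (the rd field)
print(alu_operand2(SW, read_data2=0, alu_src=1))  # 100, the offset
```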

MAIN CONTROL UNIT

Before we explain the main control unit, some observations must be made.

CHAPTER 4

PIPELINING
4.1 What is pipelining?
In older processors there used to be one circuit which would process an instruction completely. Only once an instruction had finished and come out of the processor would the following instruction enter it. To speed up the processor, the concept of pipelining was introduced. Pipelining breaks the single circuit of the processor into five stages:
1. Fetch
2. Decode
3. Register Read
4. Execute
5. Write back

The figure below shows the processor broken up into these stages.

FIGURE 4.1: the breakup of the circuit of the processor into five stages

Five instructions are shown waiting outside the processor for execution; the instruction about to enter the processor is #1, followed by instruction #2, then #3, and so forth. In the old processor there was one circuit. When instruction #1 entered the processor, it was fetched, decoded, its registers were read, it was executed, and then the result of the execution was written back into a register of the register file. All of these steps happened one after the other, and only when they were complete did the instruction come out of the box, meaning that its execution was complete. Only then would instruction #2 enter and pass through the same steps as instruction #1. When instruction #2 was completely done, instruction #3 would enter, and so on. Thus instructions would enter the processor at the rate of one instruction per second.

Here we have supposed that each instruction remains inside the processor for one second, so on the output side of the processor one instruction comes out every second.

Now let us consider a pipelined processor. The single circuit of the old processor was sluggish, so to run things faster we split it into five stages, namely:
1. fetch
2. decode
3. register read
4. execute
5. writeback

Note that these are the same steps that the single circuit of the old processor performed; we have simply split that circuit into five stages. As instruction #1 enters the processor, it goes into the fetch stage. Once the fetch stage has done its job, the instruction leaves the fetch stage and enters the decode stage. Note that the fetch circuit is now empty and can be used for the following instruction, #2: while instruction #1 is being handled by the decode stage, the fetch stage is handling instruction #2. Similarly, when instruction #1 has reached the writeback stage, instruction #2 is inside the execute stage, instruction #3 is inside the register read stage, instruction #4 is inside the decode stage and instruction #5 is inside the fetch stage. This is shown in FIGURE 4.2.
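The snapshot just described (instruction #1 in writeback while #5 is being fetched) can be generated cycle by cycle with a short Python sketch. It assumes, as the text does here, that every stage takes exactly one cycle and that no instruction ever stalls; the function name occupancy is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Register Read", "Execute", "Writeback"]


def occupancy(num_instructions: int, cycle: int) -> dict:
    """Return which instruction (numbered from 1) sits in each stage during a
    given cycle (numbered from 1), or None if the stage is empty.
    Assumes one stage per cycle and no stalls."""
    snapshot = {}
    for position, stage in enumerate(STAGES):
        # instruction i enters stage number `position` in cycle i + position
        instr = cycle - position
        snapshot[stage] = instr if 1 <= instr <= num_instructions else None
    return snapshot


for cycle in range(1, 10):
    print(cycle, occupancy(5, cycle))
```

In cycle 5 the pipeline is completely full; from cycle 6 onwards the fetch stage is empty because only five instructions were supplied.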

If we keep the same value for the time taken by one instruction in the old processor, namely one second, then in the five-stage pipelined processor the instruction spends 0.2 second in fetch, 0.2 second in decode, 0.2 second in register read, and so on; that is, each instruction takes 0.2 second in each stage. The total time each instruction spends inside the pipelined processor is therefore still 1 second, but instructions come out of the pipelined processor much more often than out of the unpipelined one: one instruction every 0.2 second instead of one every second.
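The difference between "each instruction still takes 1 second" and "one instruction comes out every 0.2 second" shows up clearly if we list the moments at which instructions leave each processor. This is a minimal Python sketch under the text's idealized assumptions: all stages take equal time, there are no stalls, and the latches add no delay. The function names are hypothetical.

```python
def unpipelined_finish_times(n: int, instr_time: float = 1.0) -> list:
    """Each instruction starts only after the previous one has left."""
    return [instr_time * (k + 1) for k in range(n)]


def pipelined_finish_times(n: int, instr_time: float = 1.0, stages: int = 5) -> list:
    """The first instruction still needs the full instr_time; every later one
    leaves one stage time (instr_time / stages) after its predecessor."""
    stage_time = instr_time / stages
    return [instr_time + stage_time * k for k in range(n)]


print(unpipelined_finish_times(5))                        # [1.0, 2.0, 3.0, 4.0, 5.0]
print([round(t, 2) for t in pipelined_finish_times(5)])   # [1.0, 1.2, 1.4, 1.6, 1.8]
```

The first instruction finishes at 1 second in both cases; the gain comes entirely from the spacing between later instructions.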

Thus, by splitting the one circuit of the old processor into 5 stages, we have reduced the interval between completed instructions from 1 second to 0.2 second. In the ideal case, the new interval is the old time divided by the number of stages. If there were 50 stages inside the processor, the interval would be the time taken by the old processor divided by 50, i.e., 1/50 = 0.02 second. Thus, the more stages inside the processor, the less time between instructions coming out of it.

4.2 Latches

When the old processor is partitioned, we do make the processing faster. However, one thing has to be kept in mind. Each stage of the pipelined processor works fine, but once it has finished its job its output must be saved so that it can be provided to the next stage in the next cycle. Saving the result of a stage and providing it to the next stage in the next cycle is done by a latch. A latch is introduced after each stage, as shown in FIGURE 4.2.
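The rule "new interval = old time / number of stages" is easy to tabulate. In the Python sketch below, the latch_delay parameter is an assumption added for illustration and is not a figure from the text: it models the small extra time each latch might add per stage. With the default of 0.0 it reproduces the ideal numbers above (0.2 s for 5 stages, 0.02 s for 50).

```python
def pipelined_cycle_time(old_time: float, stages: int, latch_delay: float = 0.0) -> float:
    """Time between completed instructions when the old single-circuit time is
    split evenly over `stages` stages. latch_delay is a hypothetical extra
    delay per stage for the latch; 0.0 reproduces the ideal case."""
    return old_time / stages + latch_delay


for k in (5, 50):
    ideal = pipelined_cycle_time(1.0, k)
    with_latch = pipelined_cycle_time(1.0, k, latch_delay=0.01)  # assumed 0.01 s latch
    print(k, "stages:", round(ideal, 4), "s ideal,", round(with_latch, 4), "s with latch delay")
```

Under the assumed 0.01 s latch delay, 50 stages give an interval of 0.03 s, a speedup of about 33 rather than 50, which is one reason adding stages does not keep paying off indefinitely.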

Some examples of pipelining are given below.

Laundry: If you have laundry to wash, you put a load into the washer. When the washer has done its job, you take the load out and put it in the dryer. The washer, which is now empty, can take another load and start washing it while the dryer is drying the first load. When the dryer and the washer have both finished (they finish together, supposing that each unit takes the same time), the first load is moved to the iron for pressing, the second load moves to the dryer and a third load goes into the washer. Thus three units, namely the washer, the dryer and the iron, are working together, and three loads are being worked upon together: the third load is being washed by the washer, the second load is being dried by the dryer, and the first load is being pressed. This is exactly what a pipelined computer does with instructions.

Latency: It is the time taken by a single instruction to execute in a certain circuit. Let the instruction (operation) be add. Imagine a black box with input and output terminals, as shown below.

FIG 4.1 a black box representing a certain circuit with input and output terminals

Suppose the add instruction enters the black box through the input terminal, remains inside the box for some time while it is processed, and then comes out from the output terminal. The time it remains inside the box is called latency. Latency needs qualifiers to state exactly which system is causing it. Thus we can speak of pipelined processor latency, meaning the time taken by an instruction to pass through all the stages inside the processor; this is the instruction latency of the pipelined processor circuit inside the black box. Latency can be measured in time or in cycles.

4.3 PIPELINING AND CLOCK CYCLES

Imagine the add operation is again sent to a single circuit which processes it completely and then gives the result at the output. Suppose it takes a single clock cycle to process the operation; then the latency of the add operation is said to be one cycle for that circuit. Let us now intelligently divide that single circuit into five parts connected in series. The instruction enters the first part, the first stage, named FETCH, and its result is passed to the second stage, called DECODE. After it has been decoded, its result is passed to the third stage, called REGISTER READ. After this stage has done its job, its result is passed to the following stage, called EXECUTE, and the Execute stage's result is then passed to the WRITEBACK stage. Suppose these five stages together do exactly the same processing as the original unbroken circuit. The stage names, once again, are: Fetch, Decode, Register Read, Execute and Writeback.

Suppose that a number of instructions are placed in a block somewhere in the computer's memory and the processor is required to process them one by one, in the same sequence in which they appear.
Each instruction, when it is fetched from the block, passes through the five stages named above. These five stages are explained below.

FETCH: Fetch means to bring the instruction whose turn it is from that block into the processor. The processor has a register called the program counter (PC). The program counter starts out holding the address of the first instruction, so the instruction whose address is in the program counter is brought into the processor and processed. After that, the program counter is incremented to the address of the next instruction. The increment is four, because memory, from which the instructions are imported, is organized in bytes (8 bits), whereas an instruction is 32 bits long. So four bytes have to be fetched from memory, and these four bytes together make one instruction.

DECODE: Decode means that the processor determines what is to be done with the instruction. This is done at the decode stage: the processor works out whether the instruction requires addition, subtraction, ANDing, ORing, storing, loading, etc. Normally, programs at this level are first written in assembly language, which is meant for human consumption because human beings can understand it easily. The computer understands only machine language, so an assembler converts the assembly-language program into machine language. Examples of assembly-language instructions are:
add r1, r2, r3
sub r1, r2, r3
mul r1, r2, r3
div r1, r2, r3

REGISTER READ: Almost every instruction requires a register read operation, which means reading registers from the register file in the computer. If your processor is MIPS, most instructions mention three registers: the data from two registers are read, the two values are added, subtracted, compared, etc. as the instruction calls for, and the result is written into the third register.

EXECUTE: Execute means that the execute stage of the processor, once the data have been read from the two registers, executes the instruction; that is, the two values read from the registers are combined as the instruction calls for.

WRITEBACK: Writeback means that the result of combining the two values is written by the processor into a third register, whose name or number is supplied in the instruction.

These operations are shown by a diagram in FIGURE 4.2.
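The fetch step (read four consecutive bytes, treat them as one 32-bit instruction, add 4 to the PC) can be sketched in a few lines of Python. The memory contents here are two MIPS instructions, add $s1, $s2, $s3 and sw $s1, 100($s2), stored byte by byte; the big-endian byte order is an assumption (MIPS hardware can be configured either way), and the function name fetch is hypothetical.

```python
def fetch(memory: bytes, pc: int) -> tuple:
    """Read the four bytes at addresses pc .. pc+3, join them into one 32-bit
    instruction word and return (word, pc + 4). Big-endian byte order is an
    assumption here; a little-endian machine would use "little" instead."""
    word = int.from_bytes(memory[pc:pc + 4], "big")
    return word, pc + 4


# Two instructions stored byte by byte: add $s1,$s2,$s3 and sw $s1,100($s2)
memory = bytes.fromhex("02538820AE510064")
pc = 0
while pc < len(memory):
    address = pc
    word, pc = fetch(memory, pc)
    print(f"address {address}: instruction {word:08x}, next PC {pc}")
```

Because every instruction is exactly four bytes, the next-PC computation never needs to look at the instruction itself, which is what lets the fetch stage run ahead of decode.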

FIGURE 4.2: A processor processes the data available in register 1 and register 2 and puts the result in register 3

4.4 MIPS

MIPS has been designed for pipelined execution:
1. Each instruction is 32 bits long.
2. For R-type instructions, the first 6 bits (the opcode) and the last 6 bits (the function field) together indicate what the instruction wants.
3. The names of the two registers from which data are to be read always occupy the same place in the 32-bit machine language instruction.

Because of this systematic placement, with fixed groups of bits responsible for the operation and for the registers to be read, some stages can proceed together. For instance, decode and register read can be processed in the same cycle. Also, execution and the calculation of the address for write back can proceed in the same cycle.
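The fixed field layout can be made concrete by packing and unpacking an R-type instruction. This Python sketch is illustrative only: the function names are hypothetical, and it writes the text's add r1, r2, r3 style example with the standard MIPS register numbers $s1 = 17, $s2 = 18, $s3 = 19. The function-field values are the ones in the ALU control table of Figure 3.4 (add = 100000, subtract = 100010).

```python
def encode_r_type(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """Pack an R-type instruction; the opcode (first 6 bits) is 0 for R-type."""
    opcode = 0
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode(word: int) -> dict:
    """Pull out the fixed fields; the register fields are always in the same
    place, so they can be read before the operation is fully decoded."""
    return {
        "opcode": (word >> 26) & 0x3F,  # first 6 bits (31-26)
        "rs": (word >> 21) & 0x1F,      # first register to read (25-21)
        "rt": (word >> 16) & 0x1F,      # second register to read (20-16)
        "rd": (word >> 11) & 0x1F,      # register to write (15-11)
        "shamt": (word >> 6) & 0x1F,    # shift amount (10-6)
        "funct": word & 0x3F,           # last 6 bits (5-0)
    }


ADD_FUNCT, SUB_FUNCT = 0b100000, 0b100010
word = encode_r_type(rs=18, rt=19, rd=17, funct=ADD_FUNCT)   # add $s1, $s2, $s3
print(f"{word:08x}", decode(word))
```

Note that decode extracts rs and rt without looking at the opcode or funct fields at all; that independence is what allows register read to overlap with decode.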

4.5 HAZARDS
If the pipelined processor offers no hazard (danger) to the instructions during their execution, the flow of instructions will be smooth. That is, the instructions will flow smoothly along slanting lines, as shown in Figure 4.3, from the top end to the bottom end, without any horizontal part indicating that an instruction has stopped. If there are ten instructions in a program and they meet no hazard in the processor, you will see ten straight slanting lines, parallel to one another, in the flow diagram. In the ideal case, as the instructions run on the processor sequentially, each of them follows a straight line.

[Figure 4.3: pipelined stages IF, ID, RR, EX, WB plotted against clock cycles for Instr#1 and Instr#2]

Figure 4.3: Two instructions passing through a pipelined processor one after the other. No hazards are met on the way; the lines are straight, slanting and parallel.

Normally, if the program is reasonably long, hazards will have to be confronted. Hazards occur because one instruction has to stop for some reason on the way, and consequently the following instructions also have to stop, since their path has been closed by the instruction ahead of them. One example of such a stoppage: suppose the first instruction wants to write a certain register, say register #10, with some data, and the immediately following instruction needs to read the same register #10 to get the data written by the first instruction. That data has not yet been written by instruction #1. (See in the flow diagram at which cycle register #10 is written by the first instruction and at which cycle it is read by the following instruction: the writing of register #10 by instruction #1 takes place later than the reading of that register by instruction #2.) So instruction #2, immediately following instruction #1, has to stop progressing in the processor and wait in its place at the register read stage until the data has been written by instruction #1. This is one example of a hazard. It is called RAW (read after write): instruction #2, following instruction #1, wants to read only after instruction #1 has written the register. Note that it is the same register which has to be written by instruction #1 and then read by instruction #2. Consider the following two consecutive instructions in a certain program:
add r1, r2, r3
sub r4, r5, r1
The top instruction says: add r2 and r3 and put the result in r1.
The immediately following instruction tells the processor to subtract r1 from r5 and put the result in r4. Note that both instructions deal with the same register, r1: the top instruction uses r1 to store its answer, while the bottom instruction reads r1. If you look at the instruction flow diagram, you will note that the write of the top instruction takes place later than the register read stage of the following instruction. In fact, there are four types of hazard. Since hazards involve reading a register and writing the same register, there are four possible combinations, namely:
Read After Read (RAR)
Read After Write (RAW)
Write After Read (WAR)
Write After Write (WAW)
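The four combinations can be recognised mechanically by comparing the register written and the registers read by two consecutive instructions. The Python sketch below assumes the three-register form "op rd, rs, rt" used in this chapter (first register written, the other two read); the names Instr, parse and classify are hypothetical. It runs the example pairs discussed in this section.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str                 # register written
    sources: Tuple[str, ...]  # registers read


def parse(line: str) -> Instr:
    """Parse the three-register form 'op rd, rs, rt' used in the text."""
    op, operands = line.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(op, regs[0], tuple(regs[1:]))


def classify(first: Instr, second: Instr) -> list:
    """Name the read/write combinations between two consecutive instructions."""
    kinds = []
    if first.dest in second.sources:
        kinds.append("RAW")   # second reads what first writes
    if set(first.sources) & set(second.sources):
        kinds.append("RAR")   # both read the same register (harmless)
    if second.dest in first.sources:
        kinds.append("WAR")   # second writes what first reads
    if second.dest == first.dest:
        kinds.append("WAW")   # both write the same register
    return kinds


add = parse("add r1, r2, r3")
for later in ["sub r4, r5, r1", "sub r4, r5, r3", "sub r2, r5, r6",
              "sub r1, r5, r6", "sub r4, r5, r6"]:
    print(later, "->", classify(add, parse(later)) or "no shared register")
```

As the discussion below explains, only the RAW case forces the simple five-stage pipeline of this chapter to wait.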

RAW

The RAW hazard is also known by other names: DATA DEPENDENCY or TRUE DEPENDENCY. In practice, instructions often do not follow the smooth slanting line shown in the diagram above; instead they suffer disruptions. One example of a disruption arises when, of two consecutive instructions in the program, the following instruction wants to read data from a register which the earlier instruction is supposed to write (see the diagram). This evidently requires the following instruction to stay put in its stage, waiting until the earlier instruction has written that register. Consider the two instructions below; the diagram shows the five stages of the pipeline and the two instructions.
add r1, r2, r3
sub r4, r5, r1
The top instruction is executed earlier than the bottom instruction. The bottom instruction (sub) wants to read data from register r1, but the data has not yet been written by the add instruction (see the diagram). It helps to talk about the bottom instruction first and the top instruction later, so that you understand the name RAW: the bottom instruction Reads After the top instruction Writes. Note that in the bottom instruction it is register r1 which is being read, and it is the same register which is being written by the top instruction. If two consecutive instructions use the same register, the first writing it and the second reading it, a hazard is sure to appear. We could sometimes avoid this hazard by using different registers in the two instructions:
add r1, r2, r3
sub r4, r5, r6
Here the bottom instruction uses registers r4, r5 and r6, whereas the top instruction uses registers r1, r2 and r3. No register clashes, so no hazard will exist. Doing so may not be desirable, however, and we may have to stick with the same register. In the case of the first pair of instructions, add and sub, it can easily be seen that the bottom instruction, sub, will have to wait during cycles 4 and 5 until the earlier add instruction has finished writing register r1. This writing takes place at the end of cycle 5.
This waiting is shown by the horizontal part of the line in the graph of the bottom instruction. At the stages where an instruction stops and does nothing except wait, we say that a bubble or stall has been caused; in effect the processor inserts a "no operation" at that stage. If there were another instruction immediately following the sub instruction, it too would have to stop behind the sub instruction, and its line in the graph would also show the delay. Let us draw the RAW hazard by drawing the graphs of the two instructions add and sub.

[Figure 4.4: pipelined stages IF, ID, RR, EX, WB plotted against clock cycles for Instr#1 (add) and Instr#2 (sub)]
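The two-cycle wait of the sub instruction can be reproduced by counting cycles. This Python sketch follows only the assumptions stated in the text: one stage per cycle in the order IF, ID, RR, EX, WB; the register is written at the end of the writer's WB cycle; the reader may read it only in a later cycle; and nothing else stalls. The function name raw_stall_cycles and the distance parameter are hypothetical.

```python
STAGES = ["IF", "ID", "RR", "EX", "WB"]


def raw_stall_cycles(distance: int) -> int:
    """Cycles a reader must wait before its RR stage when it follows the
    writing instruction by `distance` instructions (1 = the very next one).
    Assumes one stage per cycle, no earlier stalls, the register written at
    the end of WB, and reading allowed only in a later cycle."""
    writer_fetch = 1
    write_cycle = writer_fetch + STAGES.index("WB")          # cycle 5
    reader_fetch = writer_fetch + distance
    normal_read_cycle = reader_fetch + STAGES.index("RR")    # cycle 4 for distance 1
    earliest_read_cycle = write_cycle + 1                    # after the write has finished
    return max(0, earliest_read_cycle - normal_read_cycle)


for d in (1, 2, 3):
    print(f"reader {d} instruction(s) after the writer waits {raw_stall_cycles(d)} cycle(s)")
```

For the adjacent add/sub pair (distance 1) the sketch gives the 2 stall cycles (cycles 4 and 5) described in the text; an instruction three or more places later would find r1 already written.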

The sub instruction had to stop for two cycles, waiting for the add instruction to finish writing register r1; this stop is shown by the horizontal line. Thus, after cycle 3, the sub instruction had to wait through cycle 4 and cycle 5 before proceeding.

RAR

add r1, r2, r3
sub r4, r5, r3
The bottom instruction wants to read register r3, and the top instruction has already read r3. Reading a register does not produce any clash, so there is no problem: reading a register will not cause any disruption. The graph is shown as follows.

[Figure 4.5: pipelined stages IF, ID, RR, EX, WB plotted against clock cycles for Instr#1 and Instr#2, with no stall]

WAR

Again consider two instructions:
add r1, r2, r3
sub r2, r5, r6
The bottom instruction wants to write register r2, which has already been read by the add instruction. So no problem exists, and Figure 4.5 applies.

WAW

WAW means write after write. Consider two instructions:
add r1, r2, r3
sub r1, r5, r6
The bottom instruction wants to write register r1, which has already been written by the top instruction. Apparently there is no hazard, but does the first write serve any purpose? The top instruction writes r1 and the sub instruction rewrites r1, while no instruction in the program has read r1 in between, so the write by the top instruction seems useless in this example. However, when we study systems where more than one processor works together and many processors need to read and write in the same cycle, we may have to confront such a situation. For now, Figure 4.5 applies. Thus a hazard does not necessarily mean that a delay has occurred in the flow of instructions; it may mean that something unnecessary has taken place.

There are factors which, to some extent, nullify the advantages of pipelining. We have presented four hazards so far, namely RAW, RAR, WAW and WAR, and we have also seen that in the case of a single pipeline only RAW seems to be a major disruption; the rest do not really appear to be hazards.

BRANCH

Another delay in processing is caused by branch instructions, namely branch if equal (beq) and branch if not equal (bne). Consider two consecutive instructions in a program:
add r1, r2, r3
beq r5, r1, L
The first instruction calls for adding the contents of r2 and r3 and storing the result in r1. The second instruction calls for branching to label L if the contents of registers r5 and r1 are equal. The flow diagram is shown below in FIGURE 4.6.

Figure 4.6

The first instruction flows through the processor without any hindrance. The second instruction, beq, stops right at the fetch stage, because it is not yet known whether the program has to branch to L or take the next instruction from the next location, as given by the PC. You can see that the branch instruction has been stopped for XX cycles; it will proceed only after register r1 has been written by the first instruction. Two remedies can be applied in the case of branch instructions, and if not all, at least a few cycles can be saved. The two remedies are:
Branch prediction

CHAPTER 5

PERFORMANCE
RESPONSE TIME, EXECUTION TIME, ELAPSED TIME

All three names refer to the same time: response time, execution time and elapsed time are one and the same thing. It is the time taken by a system to COMPLETELY finish a certain task, i.e., a program. Let me emphasize again that it is the time for the COMPLETE EXECUTION of a task. This means that if the task or program requires, aside from CPU execution time, other activities such as hard disk access, memory access, operating system activities and some I/O activities, those times must also be included. Thus certain computers may take less CPU execution time because of their faster processors, yet because of a sluggish hard disk, a slow operating system or slow memory, the net time to finish the task may turn out to be longer for the computer with the fast processor. This means that to have the least response time among competing computers, we need not only a fast processor; the other components of the system, such as the hard disk, memory and operating system, should also be fast.

5.2 PERFORMANCE OF A COMPUTER

$$\text{Performance} = \frac{1}{\text{Execution time}}$$

5.3 EXECUTION TIME

$$\text{Execution time} = \frac{\text{CPI} \times C}{\text{Clock rate}} = \text{CPI} \times C \times \text{Clock cycle time}$$

where CPI stands for the average number of clock Cycles Per Instruction and C stands for the number of instructions, which we call here the count. This formula is better than the other two in the sense that CPI and C are readily available for any program, so its execution time can be calculated easily and accurately. The average CPI can easily be calculated from the number of cycles consumed by each instruction. Consider, say, a program made of 10 instructions, each taking the number of cycles shown against it.

Instruction | No. of cycles
1st | 1
2nd | 1
3rd | 1
4th | 2
5th | 1
6th | 1
7th | 3
8th | 1
9th | 1
10th | 2
TOTAL | 14

Thus 10 instructions take 14 clock cycles, and the average CPI is 14/10 = 1.4. The count for this program is 10.

We often talk about the performance of a computer. The performance of a computer essentially relates to the execution time of a given program. If the system has a single processor that is running more than one program at the same time (working on one program at some moments and on another program at others), the response time, and hence the performance, becomes complex to calculate. This happens quite often, because modern systems are multitasking.

The performance of a computer is measured either in time or in the number of cycles consumed to finish a program running on it. Imagine we are given two computers, A and B, and asked which is the better performer. Obviously, the one which takes less time is the better performer. Therefore the performance of a computer is inversely proportional to the time taken to finish a program. In equation form:

$$\text{Performance} = \frac{1}{\text{Execution time}}$$

Example 1

A given program takes 10 s on computer A, whose clock rate is 4 GHz. Another manufacturer wants to build computer B, which will finish the same job in 6 seconds, but calculates that B will need 1.2 times as many clock cycles as A. What should the clock rate of computer B be?

Computer A: time taken = 10 s; clock rate = 4 GHz; so clock cycles required = 4 x 10^9 x 10 = 40 x 10^9
Computer B: time taken = 6 s; clock rate = ?; clock cycles required = 1.2 x 40 x 10^9 = 48 x 10^9
Clock rate of B = 48 x 10^9 / 6 = 8 GHz

5.4 RELATIVE PERFORMANCE

We often compare two computers and ask which of the two is faster; or we take a computer, make some changes to its hardware to make it faster, and ask how much speedup the computer has gained. In such comparisons we deal with relative performance. Before we attempt relative performance or speedup, we define the following terms:
THROUGHPUT
CPU EXECUTION TIME
USER CPU TIME
SYSTEM CPU TIME
SYSTEM PERFORMANCE
CPU PERFORMANCE
CLOCK CYCLES PER INSTRUCTION = AVERAGE CLOCK CYCLES PER INSTRUCTION = CPI

Some definitions follow.

THROUGHPUT: the rate at which operations get executed, measured either in operations per second or in operations per cycle. Imagine a pipeline and count the number of instructions which come out of it in one second; that number is the throughput. Normally we count the number of complete programs processed in one second. Throughput thus involves two quantities: the number of instructions (or programs) completely processed, and the time or cycles taken. The time taken by a system to completely finish a task is made up of many different times, which have to be considered separately.

CPU EXECUTION TIME / CPU TIME: the time the CPU spends on the task; it does not include time spent waiting for I/O or running other programs. We also ignore the overhead time the CPU spends when switching between programs.

USER CPU TIME: the CPU time spent in the program itself; CPU overheads are not counted in it.

SYSTEM CPU TIME: the CPU time spent by the CPU performing operations solely for the operating system.

CPU PERFORMANCE: performance calculated from the CPU time.

CLOCK CYCLE TIME / TICKS / CLOCK TICKS: Each computer has a clock in it, an electronic device which produces pulses at its output terminal. The following figure shows the clock pulses coming out of the device; each pulse is numbered, and the shape of the pulse is also evident.

Figure 5.1

Most of the definitions can be restated in terms of clock pulses:

$$\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$

Note that the clock cycle time is in fact

$$\text{Clock cycle time} = \frac{1}{\text{Clock rate}}$$

so the CPU execution time can also be written as

$$\text{CPU execution time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$

From these equations it is clear that to speed up the system, i.e., to reduce the CPU execution time, either the number of clock cycles needed by the program must be reduced, or the clock cycle time must be made smaller (equivalently, the clock rate increased). There is often a trade-off between these two parameters: reducing one tends to increase the other.

Suppose computer A has a clock rate of 4 x 10^9 cycles per second and takes 10 seconds to finish a certain program. The CPU clock cycles it takes are then 40 x 10^9. An ambitious designer wants a computer which finishes the program faster (in 6 seconds, as in Example 1). He discovers that he cannot increase the clock rate without also increasing the number of clock cycles. If the number of clock cycles increases to 1.2 x 40 x 10^9, he ends up with the following clock rate:

$$\text{Clock rate}_B = \frac{1.2 \times 40 \times 10^9}{6\ \text{s}} = 8\ \text{GHz}$$

It is assumed that the number of instructions remains the same when the new computer is designed. A better formula is:

$$\text{CPU clock cycles} = \text{Instruction count} \times \text{CPI}$$

This formula takes the number of instructions into account and introduces the average number of clock cycles per instruction, often abbreviated CPI. The word "average" is not mentioned, but it is taken for granted that CPI stands for the average number of clock cycles per instruction over all the instructions in the program being discussed. Using this formula, the performance of two computers A and B can be compared by the following equations.
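Example 1 and the designer's problem above use the same two relations: cycles = clock rate x time, and clock rate = cycles / time. A minimal Python sketch, with hypothetical function names, reproduces the 8 GHz answer. Integers are used so that no rounding creeps in, and the factor 1.2 is written as 12/10 for that reason.

```python
def clock_cycles(clock_rate_hz: int, seconds: int) -> int:
    """CPU clock cycles = clock rate x CPU execution time."""
    return clock_rate_hz * seconds


def required_clock_rate(cycles: int, seconds: int) -> float:
    """Clock rate = CPU clock cycles / CPU execution time."""
    return cycles / seconds


cycles_a = clock_cycles(4_000_000_000, 10)   # computer A: 4 GHz for 10 s
cycles_b = cycles_a * 12 // 10                # B needs 1.2 times as many cycles
rate_b = required_clock_rate(cycles_b, 6)     # B must finish in 6 s
print(cycles_a, cycles_b, rate_b / 1e9, "GHz")
```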

$$\text{Performance}_A = \frac{1}{\text{CPU time}_A}, \qquad \text{Performance}_B = \frac{1}{\text{CPU time}_B}, \qquad \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{CPU time}_B}{\text{CPU time}_A}$$

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

Remember that the only complete and reliable measure of computer performance is time. The number of clock cycles can also be written in another form:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

This formula indicates that the program for which it has been written is made up of n sets of instructions, each with a different CPI, and C_i is the count of instructions in set i. For example, if there are only 5 sets, the formula expands to

$$\text{CPU clock cycles} = \text{CPI}_1 \times C_1 + \text{CPI}_2 \times C_2 + \text{CPI}_3 \times C_3 + \text{CPI}_4 \times C_4 + \text{CPI}_5 \times C_5$$

It is assumed that the algorithm, the language, the compiler, the architecture and the actual hardware are the same; in other words, the compiled (machine language) program is exactly the same. In fact, all the things mentioned above affect the machine language program.

The SPEC Benchmarks

SPEC stands for the Standard Performance Evaluation Corporation (originally the System Performance Evaluation Cooperative). It was created by a group of workstation and server vendors to benchmark the available servers and workstations, and its first benchmark set appeared in 1989.

In a non-pipelined processor, throughput = 1/latency; in a pipelined processor, throughput is higher than in a non-pipelined processor. Each stage of the processor finishes its job in one complete cycle, so if there are five stages in the pipelined processor, five cycles have to be consumed, one by each stage, to process an instruction completely.
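The per-set formula for CPU clock cycles, together with average CPI and CPU time, can be sketched in Python using the 10-instruction program from the table in Section 5.3. Grouping the instructions by their cycle count gives the (CPI_i, C_i) sets. The function names are hypothetical, and the 1 GHz clock rate in the last line is an assumed value for illustration only; the text does not give a clock rate for this program.

```python
from collections import Counter


def cpu_clock_cycles(sets):
    """Sum of CPI_i x C_i over (CPI_i, C_i) pairs."""
    return sum(cpi * count for cpi, count in sets)


def average_cpi(sets):
    """Total clock cycles divided by the instruction count."""
    return cpu_clock_cycles(sets) / sum(count for _, count in sets)


def cpu_time(sets, clock_rate_hz):
    """CPU time = CPU clock cycles / clock rate."""
    return cpu_clock_cycles(sets) / clock_rate_hz


# Cycles taken by each of the 10 instructions in the table of Section 5.3
per_instruction = [1, 1, 1, 2, 1, 1, 3, 1, 1, 2]
sets = sorted(Counter(per_instruction).items())   # [(CPI, count), ...]
print(sets, cpu_clock_cycles(sets), average_cpi(sets))
print(cpu_time(sets, 1_000_000_000))   # at an assumed 1 GHz clock
```

The grouping yields three sets, (1, 7), (2, 2) and (3, 1), whose sum 7 + 4 + 3 = 14 matches the table total and gives the same average CPI of 1.4.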

Figure 5.2, shown below, shows the cycles along the top and the stages of the processor on the left-hand side. The first element of the first row of the table (instr. 1) enters the fetch stage. After it has been processed by the fetch stage, in the next clock cycle (2) it enters the ID stage; this is shown as the first element in the second row. In the third cycle, instruction 1 enters the RR stage, shown as the first element in the third row of the table. Thus the instruction's flow in the diagram is slanted, neither horizontal nor vertical. Right on the heels of instruction 1 comes instruction 2, which is also shown slanted. Seven cycles are shown in Figure 5.2. At the end of the 5th cycle, instruction 1 is complete and comes out of the pipelined processor; instruction 2 comes out at the end of the 6th cycle, and so on.

EXAMPLE

A software designer has to pick one of two codes, A or B, whose details are given below; obviously he will choose the code which runs faster. Each code has three types of instructions, A, B and C; the number of instructions and the CPI for each type are given below.

Code A:
Type A: 2 instructions; CPI = 1
Type B: 1 instruction; CPI = 2
Type C: 2 instructions; CPI = 3

Code B:
Type A: 4 instructions; CPI = 1
Type B: 1 instruction; CPI = 2
Type C: 1 instruction; CPI = 3

The formula to be used is

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

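The clock cycles for code A = 2x1 + 1x2 + 2x3 = 10 cycles
The clock cycles for code B = 4x1 + 1x2 + 1x3 = 9 cycles

Since both codes run on the same computer (the same clock rate), the ratio of performance is:

$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{CPU clock cycles}_A}{\text{CPU clock cycles}_B} = \frac{10}{9} \approx 1.11$$

so code B is faster, even though it contains more instructions (6 versus 5).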
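The comparison of the two codes can be checked with a few lines of Python. The data are the instruction counts and CPIs from the example; the function name clock_cycles is hypothetical, and the performance ratio is valid only under the example's assumption that both codes run on the same computer with the same clock rate.

```python
def clock_cycles(code):
    """CPU clock cycles = sum of C_i x CPI_i over (count, CPI) pairs."""
    return sum(count * cpi for count, cpi in code)


# (number of instructions, CPI) for instruction types A, B and C
code_a = [(2, 1), (1, 2), (2, 3)]
code_b = [(4, 1), (1, 2), (1, 3)]

cycles_a, cycles_b = clock_cycles(code_a), clock_cycles(code_b)
count_a = sum(n for n, _ in code_a)
count_b = sum(n for n, _ in code_b)
# Same computer, same clock rate: performance ratio = inverse ratio of cycles
speedup_b_over_a = cycles_a / cycles_b
print(cycles_a, cycles_b, round(speedup_b_over_a, 2))
print("average CPI:", cycles_a / count_a, cycles_b / count_b)
```

The average CPI line makes the design point visible: code B executes more instructions, but its lower average CPI (1.5 against 2.0) more than compensates.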

These codes are to be run on a computer whose details are

