Computer Architecture

(CS F342)
Motivation and Introduction
Automatic, Single & General Purpose Computing

© Kanchan Manna; BITS-Pilani, Goa Campus, India.


Class Schedule
• Google Classroom (class materials)
• Moodle or Quanta (labs)

Day              Venue    Time
Monday           LT-3     5 PM to 6 PM
Wednesday        LT-3     5 PM to 6 PM
Friday           LT-3     5 PM to 6 PM
Tuesday (Lab)    CC-219   2 PM to 4 PM
Instructors

• Dr Kanchan Manna ([email protected])

• Dr Kunal Korgaonkar ([email protected])


TAs
YASH RAJESH BHISIKAR [email protected]

ADITYA S HANDUR KULKARNI [email protected]

JINAM BHAVESH KENIYA [email protected]

NANDAN BIPINBHAI SURANI [email protected]

NISHANT ATUL BHANDARI [email protected]

PATEL DEVARSH AMIT [email protected]

ARNAV GOYAL [email protected]

MANRAJ SINGH CHAHAL [email protected]

HIMANSHU SINGH [email protected]

VAIBHAV JAIN [email protected]


Tentative Evaluation Guideline

Component       Duration (min)   Weightage (%)   Date & Time             Nature of Component
Regular Labs    -                15              Lab 1 to Lab 15 below   Open book; best n-2 of n
Lab Test        -                15              TBA                     Lab
Midsem          90               30              As per the timetable    TBA
Comprehensive   180              40              As per the timetable    TBA

Regular lab dates (2024):
Lab 1: Aug 6     Lab 2: Aug 13    Lab 3: Aug 20    Lab 4: Aug 27    Lab 5: Sep 3
Lab 6: Sep 10    Lab 7: Sep 17    Lab 8: Sep 24    Lab 9: Oct 15    Lab 10: Oct 22
Lab 11: Oct 29   Lab 12: Nov 05   Lab 13: Nov 12   Lab 14: Nov 19   Lab 15: Nov 28
Text and Reference Books
Textbooks:
(T1) Computer Organization and Design: The Hardware/Software Interface, MIPS Edition, by David A. Patterson and John L. Hennessy.
(T2) Computer Architecture: A Quantitative Approach by John L. Hennessy and David A. Patterson.

Reference Books:
(R1) Digital Design: With an Introduction to the Verilog HDL by M. Morris Mano and Michael D. Ciletti.
(R2) Verilog HDL: A Guide to Digital Design and Synthesis by Samir Palnitkar.
(R3) Computer Organization and Architecture: Designing for Performance by William Stallings.
We shall answer the following questions
1. What’s computation?
2. What’s uncomputable?
3. What’s automatic computation/automation?
4. What’s a single-purpose microprocessor?
5. What’s a general-purpose microprocessor?
6. How does a laptop solve problems?
7. Past & Future of Microprocessors
What is the meaning of computable?
• What does it mean for a problem to be computable?
  • Is a number prime?
  • An algorithm exists.
• Is anything uncomputable?
  • Is a number random?
  • Is P = NP?
  • No algorithm is known so far.

Unsolved problems: https://en.wikipedia.org/wiki/Millennium_Prize_Problems?fbclid=IwAR1I9mERgOdeaKJASjcXE9AaMht3U6x_4zc0orU_aEC49Vt5U35ZOZB3-Lo
https://en.wikipedia.org/wiki/Smale%27s_problems?fbclid=IwAR3K5n01NUIpeQy_AsXen6dKtMSDzl5RuQ6xQrXuIcNRmdIvR_kPjqVnyEQ
A problem: Addition of two numbers
• Two numbers: -2 and 8

• Result: 6

• How do humans calculate it?

• How does a calculator/computer/integrated circuit calculate it?

• Is there an algorithm available for this?

A problem: Addition of two numbers
• How does a calculator/computer/IC calculate it?
  • Representation of numbers (sign & magnitude)
  • Binary (digital) system and width
• Is there an algorithm available for this?

Half-Adder
Input    Output
A  B     S  Cout
0  0     0  0
0  1     1  0
1  0     1  0
1  1     0  1
S = A (XOR) B
Cout = A (AND) B

Full-Adder (1-bit)
Input         Output
Cin  A  B     S  Cout
0    0  0     0  0
0    0  1     1  0
0    1  0     1  0
0    1  1     0  1
1    0  0     1  0
1    0  1     0  1
1    1  0     0  1
1    1  1     1  1
Find the relation between the input variables and the output variables to compress the table:
S = A (XOR) B (XOR) Cin
Cout = A (AND) B + A (AND) Cin + B (AND) Cin
A problem: Addition of two numbers
• How does a calculator/computer calculate it?
• Is there an algorithm available for this?
  • Ripple-carry adder (algorithm)
• Are there more efficient algorithms for this?
• Datapath
• Controller

Can we write C code for the above circuit?

Reference: COD by P&H, Appendix B

Basic Elements:
Mapping of High-level Constructs onto Digital Constructs
Basic Elements:
Mapping of High-level Constructs onto Digital Constructs

How does one represent data?

For example, the C language provides data types to represent and store data.

How is this data stored in digital devices such as mobile phones, laptops, and smartwatches?

In storage elements: registers and memory.
Basic Elements:
Mapping of High-level Constructs onto Digital Constructs

Register:
A 32-bit register, with bit positions numbered 31 down to 0:
31 ... 1 0
Basic Elements:
Mapping of High-level Constructs onto Digital Constructs
Register:

Each bit position of the register is built from a latch or flip-flop.

Types of latch or flip-flop:
• RS
• D
• T

[Figures: D flip-flop; D latch]

Why do we need a clock in the circuit?

https://www.electronicsforu.com/technology-trends/latch-not-bad-latch-vs-flip-flop
https://www.build-electronic-circuits.com/d-flip-flop/
Reference: COD by P&H, Appendix B
Basic Elements:
Clock

T: clock (clk) period

CLK: edge-sensitive

+5 volts: logic 1
0 volts: logic 0
Reference: COD by P&H, Appendix B
Basic Elements:
Verilog D-Latch and D-F/F

https://circuitfever.com/d-flip-flop-in-verilog
How to map an algorithm’s elements onto architectural elements

High-level Construct                         Digital Construct     Type of Component
Scalars/variables                            Registers or wires    Sequential (if register)
Arrays                                       Memories              Sequential
Operators (+, -, *, /, etc.)                 Functional units      Combinational
Control flows (if-else, switch, and loops)   Control unit          Combinational/Sequential


How to map an algorithm’s elements onto architectural elements
if (sel)
    a = 10;
else
    a = 5;

Multiplexer


How to map an algorithm’s elements onto architectural elements
if (sel)
    a = 10;
else
    b = 5;

Decoder


How to map an algorithm’s elements onto architectural elements
int Mem[1024], Data;
Mem[10] = Data;    // write operation
Data = Mem[20];    // read operation

Write operation:
switch (PC) {
    case LOC0: Mem[LOC0] = Data; break;
    case LOC1: Mem[LOC1] = Data; break;
    case LOC2: Mem[LOC2] = Data; break;
    case LOC3: Mem[LOC3] = Data; break;
}

Read operation:
switch (PC) {
    case LOC0: Data_out = Mem[LOC0]; break;
    case LOC1: Data_out = Mem[LOC1]; break;
    case LOC2: Data_out = Mem[LOC2]; break;
    case LOC3: Data_out = Mem[LOC3]; break;
}

[Figure: Mem with a decoder on the write side and a multiplexer on the read side]
How to map an algorithm’s elements onto architectural elements
int i;
for (i = 10; i > 0; i--) {
}

[Figure: Counter (i), a counter/register loaded via "Load (Max. Value)", decremented on each CLK edge ("Decrement"), with a "Zero/stop?" detector]

Counter design using D flip-flops: https://www.youtube.com/watch?v=ts4g_NUuHAc
How to map an algorithm’s elements onto architectural elements
int i, k;
for (i = 10; i > 0; i--) {
    for (k = 20; k > 0; k--) {
    }
}

[Figure: two counters. Counter (i) with Max. Value (10) and Counter (k) with Max. Value (20), each with "Load (Max. Value)", "Decrement", CLK, and a "Zero (1)?" detector feeding stop]
Zero = 1 if the content of the counter is 0.
Stop = 1 if both counters' values are 0.

Counter design using D flip-flops: https://www.youtube.com/watch?v=ts4g_NUuHAc
How to map an algorithm’s elements onto architectural elements
Comparison
A problem: Find the minimum and maximum number from a set of numbers
• Set of numbers: 1, 2, -8, 0, 23, 11, -10
• Min = -10 and Max = 23
• Is there an algorithm to solve it automatically?
Input: set of n numbers stored in A
Output: Min & Max
Min = ∞
Max = -∞
Do: scan the i-th number
    If Max < A[i] then
        Max = A[i]
    If Min > A[i] then
        Min = A[i]
Until i reaches n
Stop


A problem: Find the minimum and maximum number from a set of numbers
• Is there an algorithm to solve it automatically?
  • Yes
• Is there an architecture for that algorithm?

High-level Construct    Digital Construct
Scalars/variables       Register or wire
Arrays                  Memories
Operators               Functional unit
Control flows           Control unit

Input: set of n numbers stored in A
Output: Min & Max
Min = ∞
Max = -∞
Do: scan the i-th number
    If Max < A[i] then
        Max = A[i]
    If Min > A[i] then
        Min = A[i]
Until i reaches n
Stop

[Figure: MinMax datapath and controller, with blocks PC/i, +, MEM, =, <, >, MaxReg, MinReg, 2:1 muxes, inputs 10 and -∞, signals Stop, LoadMax, LoadMin, CLK]
High-level synthesis
MinMax Processor
[Figure: MinMax processor datapath and controller, with blocks PC/i, +, MEM, =, <, >, MaxReg, MinReg, inputs 10 and -∞, signals Stop, LoadMax, LoadMin, CLK]

How does one measure the performance of this processor?
By applying the inputs and measuring the time taken to produce the output.
How does one include time in the processor?
Using a clock.
Basic Elements:
Clock

T: clock (clk) period

Setup and hold times come from the cell library.

Reference: COD by P&H, Appendix B
Data Setup Time, Data Hold Time, Clock Period Calculation
• Slow path
• Fast path
Data launched at one flop (Di) should be captured at the next flop (Do) on the next clock edge: data launched at one active clock edge should be captured at the next active clock edge.

[Figure: Di (input/source F/F, read) → Logic (TL) → Do (output/sink F/F, write), both clocked by CLK]
Let the clock period be T.
T: clock (clk) period
TL: time to compute the logic

Reference: COD by P&H, Appendix B
Data Setup Time, Data Hold Time, Clock Period Calculation
• Slow path
• Fast path
[Figure: Di (input/source F/F, read) → Logic (TL) → Do (output/sink F/F, write), both clocked by CLK]
Let the clock period be T.

If the data hold-time check is violated, data intended to be captured at the next edge gets captured at the same edge.
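The two checks can be written as inequalities. A sketch, ignoring clock skew and using symbol names that do not appear on the slides: t_pcq is the source flip-flop's clock-to-Q propagation delay, t_ccq its clock-to-Q contamination (minimum) delay, T_{L,max} and T_{L,min} the longest and shortest logic delays (the slide's TL), and t_setup, t_hold the sink flip-flop's setup and hold times.

```latex
% Setup (slow path): data launched at one edge must settle before the next edge
T \ge t_{pcq} + T_{L,\max} + t_{setup}

% Hold (fast path): new data must not reach the sink before its hold time ends
t_{ccq} + T_{L,\min} \ge t_{hold}
```

The setup check fixes the minimum clock period; the hold check does not contain T, so a hold violation cannot be fixed by slowing the clock.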
A problem: Find the minimum and maximum number from a set of numbers

Find the clock period (T):
• Compute Time(Input_Reg to Output_Reg) for every register-to-register path in the datapath, e.g.:
  • Time(Reg. i to MaxReg)
  • Time(Reg. i to MinReg)
  • Time(Reg. 10 to MaxReg)?
• Find the maximum time among all possible time values.

[Figure: MinMax datapath and controller, with blocks PC/i, +, MEM, =, <, >, MaxReg, MinReg, inputs 10 and -∞, signals Stop, LoadMax, LoadMin, CLK]
Can we execute other algorithms on this processor?
• No: it is a special-purpose (dedicated) processor, the MinMax Processor, whose datapath and controller implement only the min/max algorithm.

[Figure: the min/max algorithm beside the MinMax Processor datapath and controller]
Necessity of a General-purpose Processor
• Is there an algorithm that can execute or simulate other algorithms?
  • A processor running it executes any algorithm
  • Programmable
  • Turing model
• Is there any limitation of such an algorithm?
  • Halting problem: can there be an algorithm that takes another algorithm as input and decides, in general, whether the given algorithm will halt (stop) or not?
  • Suppose [*] such an algorithm A(P, D) exists. Build another algorithm B(X): loop forever if A(X, X) = "Halt", else halt. Now run B(B): if A(B, B) = "Halt", B(B) loops forever; otherwise B(B) halts. Either way A gives the wrong answer, so A(P, D) cannot exist.
  • The proof uses a self-referential structure.
• What kind of algorithm do we need to make the processor general-purpose, or programmable?
  • Fetch-and-execute algorithm → fetch-and-execute processor
  • Stored program (?) [*]
  • Generalized datapaths
  • Generalized functional unit (ALU/FU covering all possible operations)
  • Proposed by John von Neumann [*]

[Figure: fetch-and-execute processor: datapath (ALU/FU, MEM) and controller]
[*] A URL is embedded.
Fetch-and-Execute Algorithm
• What is to be fetched?
  • The program/instructions (the birth of the program, or software)
  • Representation of an instruction (opcode and operands)
  • Representation of a number (sign and magnitude) [parallel reasoning]
• Where do the instructions come from?
  • They are stored in storage/memory
• How are the instructions executed?
  • What kind of operation is it?
  • Decode the instruction
  • Execute the instruction
Fetch-and-Execute Algorithm
• How does one represent an instruction?
• Is there a similar problem we have already solved?
  • Number representation
• Representation of an instruction
  • Instruction format
Fetch-and-Execute Algorithm
Number representation
• Representation of an instruction
  • Instruction format

MIPS: Microprocessor without Interlocked Pipelined Stages
Fetch-and-Execute Algorithm

Instruction   Opcode
LW            100011
SW            101011
BEQ           000100
R-type        000000
addi          001000
j             000010
Shift instructions in MIPS
Fetch-and-Execute Algorithm
• Bitwise AND (&) and shift operations (<< and >>)
MIPS Processor/Generalized Components
Fetch-and-Execute Algorithm
#define OPCODE 0b11111100000000000000000000000000
#define RS     0b00000011111000000000000000000000
#define RT     0b00000000000111110000000000000000
#define DST    0b00000000000000001111100000000000   //RD
#define OFFSET 0b00000000000000001111111111111111

int PC, IMM[1024], DMM[1024], RF[32];

Load(IMM, DMM);
Set PC to the address of the first instruction stored in IMM;
while (1) {
    switch ((IMM[PC] & OPCODE) >> 26) {
        case R-type:
            RF[(IMM[PC] & DST) >> 11] = ALU(RF[(IMM[PC] & RS) >> 21], RF[(IMM[PC] & RT) >> 16]); PC = PC + 4; break;
        case S-type:
            DMM[ALU(RF[(IMM[PC] & RS) >> 21], SignExt(IMM[PC] & OFFSET))] = RF[(IMM[PC] & RT) >> 16]; PC = PC + 4; break;
        case L-type:
            RF[(IMM[PC] & RT) >> 16] = DMM[ALU(RF[(IMM[PC] & RS) >> 21], SignExt(IMM[PC] & OFFSET))]; PC = PC + 4; break;
        case B-type:
            if (ZERO) PC = (PC + 4) + (SignExt(IMM[PC] & OFFSET) << 2); else PC = PC + 4; break;
    }
}
// SignExt(): converts the 16-bit offset to 32 bits (sign extension)
Let's define ALU().
Can we have an architecture for this algorithm?
Fetch-and-Execute Algorithm for the MIPS Microprocessor (32-bit)
#define OPCODE 0b11111100000000000000000000000000
#define RS     0b00000011111000000000000000000000
#define RT     0b00000000000111110000000000000000
#define RD     0b00000000000000001111100000000000
#define SHIFT  0b00000000000000000000011111000000
#define OFFSET 0b00000000000000001111111111111111

int PC, IMM[1024], DMM[1024], RF[32], ALUControl; bool ZERO;

ALU(Src1, Src2) {
    switch (ALUControl) {
        case SUB:   // 0b0110, set for B-type
            ZERO = ((Src1 - Src2) == 0) ? 1 : 0;
            return (Src1 - Src2);
        case ADD:   // 0b0010
            return (Src1 + Src2);
    }
}

Load(IMM, DMM);
Set PC to the address of the first instruction stored in IMM;
while (1) {
    switch ((IMM[PC] & OPCODE) >> 26) {
        case R-type:
            Set ALUControl;   // ADD, SUB, AND, OR, etc.
            RF[(IMM[PC] & RD) >> 11] = ALU(RF[(IMM[PC] & RS) >> 21], RF[(IMM[PC] & RT) >> 16]);
            PC = PC + 4; break;
        case SW-type:
            Set ALUControl = 0b0010;
            DMM[ALU(RF[(IMM[PC] & RS) >> 21], SignExt(IMM[PC] & OFFSET))] = RF[(IMM[PC] & RT) >> 16];
            PC = PC + 4; break;
        case LW-type:
            Set ALUControl = 0b0010;
            RF[(IMM[PC] & RT) >> 16] = DMM[ALU(RF[(IMM[PC] & RS) >> 21], SignExt(IMM[PC] & OFFSET))];
            PC = PC + 4; break;
        case B-type:
            Set ALUControl = 0b0110;
            ALU(RF[(IMM[PC] & RS) >> 21], RF[(IMM[PC] & RT) >> 16]);
            if (ZERO == 1) PC = (PC + 4) + (SignExt(IMM[PC] & OFFSET) << 2); else PC = PC + 4;
            break;
    }
}
SignExt(): the 16-bit offset needs conversion (sign extension) to 32 bits.
Analysis of the datapath for the fetch stage

Why 4?

[Figure: PC drives the read address of the instruction memory, which outputs the instruction; an adder computes PC + 4, which is loaded into PC on the CLK edge]
Clock period (T-pc-to-pc): the source register is PC and the destination register is PC.

IMM[PC]; PC = PC + 4

Analysis of the datapath for R-type instructions

• ADD $S1, $S2, $S3    //$S1 ← $S2 + $S3
op              rs              rt              rd              shamt           funct
6 bits (31-26)  5 bits (25-21)  5 bits (20-16)  5 bits (15-11)  5 bits (10-6)   6 bits (5-0)

[Figure: instruction bits 25:21 → Read register 1, 20:16 → Read register 2, 15:11 → Write register; Read data 1 and Read data 2 → ALU; ALU result → Write data (RegWrite, CLK); funct bits 5:0 and ALUOp → ALU decoder → ALUControl; shamt = 5'b0]
Clock period (T-pc-to-rf): the source register is PC and the destination register is RF.
ALUOp = (10)₂
Analysis of the datapath for I-type instructions
• LW $S1, offset($S2)    //$S1 ← DM[offset + $S2]
Is offset a physical address?
No. It is a relative address (here, relative to a register).

op              rs              rt              offset
6 bits (31-26)  5 bits (25-21)  5 bits (20-16)  16 bits (15-0)

[Figure: LW datapath. Instruction bits 25:21 → Read register 1; 20:16 → Write register; 15:0 (offset) → sign extension (31:0) → ALU; Read data 1 → ALU; ALU result → data memory address; data memory read data → register file Write data; control: RegWrite, MemRead, MemWrite, ALUOp = (00)₂ → ALU decoder → ALUControl; CLK]

How does one measure the clock period?
Analysis of the datapath for I-type instructions
• SW $S1, offset($S2)    //DM[offset + $S2] ← $S1

op              rs              rt              offset
6 bits (31-26)  5 bits (25-21)  5 bits (20-16)  16 bits (15-0)

[Figure: SW datapath. Instruction bits 25:21 → Read register 1, 20:16 → Read register 2; 15:0 (offset) → sign extension (31:0) → ALU; ALU result → data memory address; Read data 2 → data memory Write data; control: MemWrite, MemRead, RegWrite, ALUOp = (00)₂ → ALU decoder → ALUControl; CLK]

How does one measure the clock period?
Analysis of the datapath for I-type instructions
• LW $S1, offset($S2)    //$S1 ← DM[offset + $S2]
• SW $S1, offset($S2)    //DM[offset + $S2] ← $S1
op              rs              rt              offset
6 bits (31-26)  5 bits (25-21)  5 bits (20-16)  16 bits (15-0)

[Figure: combined LW/SW datapath. Bits 25:21 → Read register 1; 20:16 → Read register 2 (SW) and Write register (LW); 15:0 → sign extension (31:0) → ALU; ALU result → data memory address; Read data 2 → data memory Write data (SW); data memory Read data → register file Write data (LW); control: MemWrite, MemRead, RegWrite, ALUOp = (00)₂]

How does one measure the clock period?
Analysis of the datapath for I(B)-type instructions
• BEQ $S1, $S2, offset    //Branch by offset instructions (relative to PC + 4) when $S1 = $S2
• BNE $S1, $S2, offset    //Branch by offset instructions (relative to PC + 4) when $S1 != $S2
op              rs              rt              offset
6 bits (31-26)  5 bits (25-21)  5 bits (20-16)  16 bits (15-0)

[Figure: branch datapath. Bits 25:21 → Read register 1, 20:16 → Read register 2; Read data 1 and Read data 2 → ALU (ALUOp = (01)₂ → ALU decoder → ALUControl), whose Zero output signals equality; bits 15:0 → sign extension (31:0) → shift left by 2 bits → adder with PC + 4 → branch address (Sum)]
• The offset indicates a number of instructions.
• Shifting left by 2 bits aligns it to the instruction (word) boundary.

How does one measure the clock period?
Analysis of the datapath for I-type instructions (ADDI)
• ADDI $S1, $S2, -12    //$S1 ← $S2 + (-12)

op              rs              rt              immediate
6 bits (31-26)  5 bits (25-21)  5 bits (20-16)  16 bits (15-0)

[Figure: ADDI datapath. Bits 25:21 → Read register 1; 20:16 → Write register; 15:0 (immediate) → sign extension (31:0) → ALU; Read data 1 → ALU; ALU result → register file Write data (RegWrite); ALUOp = (00)₂ → ALU decoder → ALUControl]

How does one measure the clock period?
Analysis of the datapath for J-type instructions
• J addrs    //PC ← {(PC + 4)[31:28], addrs, 00}
op              address
6 bits (31-26)  26 bits (25-0)

[Figure: jump datapath. PC → instruction memory read address; instruction bits 25:0 → shift left by 2 → bits 27:0; concatenated with bits 31:28 of PC + 4 to form the jump address]

How does one measure the clock period?
Building the Microprocessor
• We designed the individual datapaths for:
  • Instruction fetch
  • R-type instructions
  • I-type instructions
  • J-type instructions
• How does one build the microprocessor from these instructions?
  • Merge the datapaths for all the instruction types, including fetch. How?
  • Using multiplexers (MUX)


Combined fetch cycle, R-, M-, and I-type datapath
• For the ALU and the register file's write register, the input data can come from more than one source.
• Insert a MUX before each such input and choose the input through the MUX select line.

How does one measure the clock period?

[Figure: combined datapath. PC and a PC + 4 adder feed the instruction memory; a RegDst MUX chooses bits 20:16 or 15:11 as the write register; an ALUSrc MUX chooses Read data 2 or the sign-extended bits 15:0 as the second ALU input; a MemtoReg MUX chooses the ALU result or the data memory read data as the register write data; control signals RegWrite, MemRead, MemWrite, ALUOp, with funct bits 5:0 feeding the ALU decoder]
Combined fetch cycle, R-, M-, I-, and B-type datapath

[Figure: the combined datapath plus branch hardware: the sign-extended offset is shifted left by 2 and added to PC + 4; a MUX controlled by Branch and the ALU's Zero output chooses between PC + 4 and the branch address as the next PC; other controls: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, ALUOp]

How does one measure the clock period?
Combined fetch cycle, R-, M-, I-, B-, and J-type datapath

How does one measure the clock period?

[Figure: the branch-capable datapath plus jump hardware: instruction bits 25:0 are shifted left by 2 (27:0) and combined with bits 31:28 of PC + 4; a MUX controlled by Jump chooses between this jump address and the branch/PC + 4 result as the next PC]
Identify the control signals
• Jump
• RegDst
• RegWrite
• ALUSrc
• Branch
• ALUOp
• MemRead
• MemWrite
• MemtoReg

How can we design a controller?

ALUOp   Meaning
00      add
01      subtract
10      look at the funct field
11      n/a
Control Unit
[Figure: control unit. Inputs feed the main decoder, whose outputs are MemtoReg, MemWrite, Branch, Jump, ALUSrc, RegDst, RegWrite, and ALUOp1:0; ALUOp1:0 feeds the ALU decoder, which outputs ALUControl2:0]
Generation of Controls: Main decoder truth table
Inputs to the control unit: the op-code part [31:26] and the funct part [5:0] of the instruction (6:2^6 decoding of the op-code).
Microprogrammed CU

ALUOp   Meaning
00      add
01      subtract
10      look at the funct field
11      n/a

Output of the control unit:

Instr. (Input)  Jump  RegDst  RegWrite  ALUSrc  Branch  ALUOp1  ALUOp0  MemRead  MemWrite  MemtoReg
R-type          0     1       1         0       0       1       0       0        0         0
lw              0     0       1         1       0       0       0       1        0         1
sw              0     x       0         1       0       0       0       0        1         x
addi            0     0       1         1       0       0       0       0        0         0
B-type          0     x       0         0       1       0       1       0        0         x
J-type          1     x       0         x       x       x       x       0        0         x
Reference: COD by P&H, Appendix D
Generation of Controls: Main decoder truth table
Inputs to the control unit: the op-code part [31:26] and the funct part [5:0] of the instruction (6:2^6 decoding of the op-code).

Output of the control unit:
Instr. (Input)  Jump  RegDst  RegWrite  ALUSrc
R-type          0     1       1         0
lw              0     0       1         1
sw              0     x       0         1
addi            0     0       1         1
B-type          0     x       0         0
J-type          1     x       0         x

Consider ALUSrc. What could be the C expression?
if ((!R-type & !B-type & lw) ||
    (!R-type & !B-type & sw) ||
    (!R-type & !B-type & addi))
    ALUSrc = 1;
else
    ALUSrc = 0;

How can we write it as a logical expression?
ALUSrc = (!R-type & !B-type & lw) + (!R-type & !B-type & sw) + (!R-type & !B-type & addi)

Hardwired CU
Reference: COD by P&H, Appendix D
ALU Operations

ALU control lines (3:0)   ALU function
0000                      AND
0001                      OR
0010                      add
0110                      subtract
0111                      set on less than
Generation of Control: ALU decoder truth table

ALUOp   ALU control lines (2:0)   Funct
00      010 (add)                 X
01      110 (subtract)            X
1X      010 (add)                 100000 (add)
1X      110 (subtract)            100010 (sub)
1X      000 (and)                 100100 (and)
1X      001 (or)                  100101 (or)
1X      111 (set less than)       101010 (slt)
Generation of Controls

Inst. opcode     ALUOp  Instr. operation   Funct field  Desired ALU action  ALUControl
100011 (LW)      00     load word          xxxxxx       add                 0010
101011 (SW)      00     store word         xxxxxx       add                 0010
000100 (BEQ)     01     branch equal       xxxxxx       subtract            0110
000000 (R-type)  10     add                100000       add                 0010
000000 (R-type)  10     subtract           100010       subtract            0110
000000 (R-type)  10     AND                100100       AND                 0000
000000 (R-type)  10     OR                 100101       OR                  0001
000000 (R-type)  10     set on less than   101010       set on less than    0111
001000 (addi)    00     add immediate      xxxxxx       add                 0010
000010 (j)       xx     jump               xxxxxx       none                xxxx
Performance Analysis of the Implementation
Performance analysis of the implementation
Resource usage

Instr.  PC  IMM  ADD_PC  RF  EXTN  ALU  <<2  ADD_B  DMM  RF (Write)  Total resource usage
ADD     √   √    √       √   ×     √    ×    ×      ×    √           6
BNE     √   √    √       √   √     √    √    √      ×    ×           8
J       √   √    √       ×   ×     ×    ×    ×      ×    ×           3
SW      √   √    √       √   √     √    ×    ×      √    ×           7
LW      √   √    √       √   √     √    ×    ×      √    √           8
ADDI    √   √    √       √   √     √    ×    ×      ×    √           7

What could be the maximum time to update the PC? (The updated PC brings the next instruction into the datapath.)

Which instruction uses the most resources? Next, consider the time each resource takes to compute its function.
Consider the pink lines for LW’s path

[Figure: the complete single-cycle datapath (fetch, R-, I-, B-, and J-type), with the lines on LW's path highlighted in pink; state elements clocked by CLK]
Performance analysis of the single-cycle implementation
Single-cycle implementation
• The previous design is called a single-cycle implementation.
• The instruction memory, register file, and data memory are all read combinationally.
  • What does this mean?
  • When the address changes, the new instruction appears at the output of the instruction memory after some propagation delay.
• Operations (state updates) are done on the rising edge of the clock.
• The single-cycle microarchitecture executes an entire instruction in one clock cycle.
• Simple control unit (why?)
  • No next state is associated with it.
  • Every operation is done in one clock cycle.
Performance analysis of the single-cycle implementation

• We need some quantity (metric) to compare two designs.
• How does one measure the effectiveness of a new design?
Performance analysis of the single-cycle implementation
• How to calculate delay
  • Delay: the time between applying the input and producing the output; in other words, the time between two inputs, i.e., when we can update the PC
  • The input is reading an instruction from memory
  • The output is the result produced by the instruction that was read
  • The next input is available when we update the PC with (PC + 4) or the branch address
Performance analysis of the single-cycle implementation
• XYZ-organization is going to build the single-cycle MIPS processor in a 65-nm CMOS manufacturing process. The organization has determined that the logic elements have the delays given in the table. Help the organization compute the execution time for a program with 100 billion instructions.

[Table: Parameter / Delay (ps); the parameter names did not survive extraction; delays: 30, 250, 20, 200, 25, 20]
Performance analysis of the single-cycle implementation

[Table: Parameter / Delay (ps); the parameter names did not survive extraction; delays: 30, 250, 150, 200, 25, 20]


MIPS ISA & Load Byte (lb)

The MIPS ISA can be found at:
https://www.scribd.com/document/358717972/mips-ref-pdf
MIPS Reference Card (yumpu.com)
MIPS ISA & Load Byte (lb)

Name                     Instr.            Type  Operation
Load Byte                lb rt, imm(rs)    I     RF[rt] = SignExt(M[RF[rs] + SignExt(imm)][7:0])
Load Byte Unsigned       lbu rt, imm(rs)   I     RF[rt] = {24'b0, M[RF[rs] + SignExt(imm)][7:0]}
Load Half-word           lh rt, imm(rs)    I     RF[rt] = SignExt(M[RF[rs] + SignExt(imm)][15:0])
Load Half-word Unsigned  lhu rt, imm(rs)   I     RF[rt] = {16'b0, M[RF[rs] + SignExt(imm)][15:0]}
MIPS ISA & Store Byte (sb)

Name             Instr.           Type  Operation
Store Byte       sb rt, imm(rs)   I     M[RF[rs] + SignExt(imm)][7:0] = RF[rt][7:0]
Store Half-word  sh rt, imm(rs)   I     M[RF[rs] + SignExt(imm)][15:0] = RF[rt][15:0]

Datapath and control?


Find the minimum and maximum number from a set of numbers
MIPS: Microprocessor without Interlocked Pipelined Stages

.data
array:      .word 1, 2, -8, 0, 23, 11, -10
array_size: .word 7                 # number of elements in array
array_min:  .asciiz "\nMin: "
array_max:  .asciiz "\nMax: "
minE:       .word 999
maxE:       .word -999

.text
main:
    la   $a0, array
    lw   $a1, array_size
    lw   $t2, maxE          # max
    lw   $t3, minE          # min

loop_array:
    beq  $a1, $zero, print_and_exit
    lw   $t0, ($a0)
    bge  $t0, $t3, not_min  # if (current_element >= current_min) {don't modify min}
    move $t3, $t0
not_min:
    ble  $t0, $t2, not_max  # if (current_element <= current_max) {don't modify max}
    move $t2, $t0
not_max:
    addi $a1, $a1, -1
    addi $a0, $a0, 4
    j    loop_array

print_and_exit:
    # print minimum
    li   $v0, 4             # print-string syscall
    la   $a0, array_min
    syscall
    li   $v0, 1             # print-integer syscall
    move $a0, $t3
    syscall
    # print maximum
    li   $v0, 4             # print-string syscall
    la   $a0, array_max
    syscall
    li   $v0, 1             # print-integer syscall
    move $a0, $t2
    syscall
    # exit
    li   $v0, 10
    syscall
Find the minimum and maximum number from a set of numbers (C version)
#include <stdio.h>
#include <limits.h>

int main(void)
{
    int arr[] = {1, 2, -8, 0, 23, 11, -10};
    int N = sizeof(arr) / sizeof(arr[0]);   // 7 elements
    int i;
    int minE = INT_MAX, maxE = INT_MIN;
    // Traverse the given array
    for (i = 0; i < N; i++) {
        // If the current element is smaller than minE, update it
        if (arr[i] < minE) {
            minE = arr[i];
        }
        // If the current element is greater than maxE, update it
        if (arr[i] > maxE) {
            maxE = arr[i];
        }
    }
    printf("The minimum element is %d\n", minE);
    printf("The maximum element is %d\n", maxE);
    return 0;
}
Algorithm and its Possible Architectures

Algorithm
• Hardware route: RTL design compiler (Xilinx HLS, Intel HLS, Synopsys DC) → RTL design (hardware) → single-purpose or general-purpose microprocessor
• Software route: compiler (gcc, g++) → instructions (software) → general-purpose microprocessor


How does the general-purpose microprocessor solve the problem?

How do we ensure problems are solved by electrons? Through a sequence of transformations:

• Problem
• Algorithm
• Program/Language
• Runtime system (OS, VM, MM)
• ISA (Architecture)
• Micro-architecture (the focus of CS G524)
• Logic
• Devices
• Electrons

Besides solving the problem, the computer also consumes energy; besides producing solutions, it also produces heat.

Yale Patt, "Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution," Proc. of the IEEE, vol. 89, no. 11, Nov. 2001.

What is the basic building block of a program?

• A C program: min_max.c
• Compile it to assembly with: gcc -S min_max.c
• Resulting assembly code: min_max.s
• The basic building block of a program is the instruction
• What is the significance of instruction order?
• From a program's (compiler writer's) point of view, the computer is the instruction set
• The Instruction Set Architecture can be MIPS (32, 64), RISC-V (32, 64), 8085, x86 (the 16/32-bit instruction set descended from the Intel 8086), or x64 (the 64-bit extension of x86)
• All programs for a given ISA are built from the same set of instructions
What is the meaning of Computer Architecture?
• Architecture: the computational (dedicated or general-purpose) structure as seen by the user (programmer, etc.)
• Use of minimal resources
• Easily scalable
• For example, to a programmer, computer architecture can mean the instruction set
• A dedicated processor
• The algorithm has its own physical structure

Two Very Important Ideas


• Idea 1: All computers (the biggest and the smallest, the fastest and the
slowest, the most expensive and the cheapest) are capable of
computing exactly the same things if they are given enough time and
enough memory.
• Idea 2: We describe our problems in English or some other language
spoken by people. Yet the problems are solved by electrons running
around inside the computer. It is necessary to transform our problem
from the language of humans to the voltages that influence the flow of
electrons.
Can we automate the microprocessor design process?
• Electronic Design Automation (EDA)
• Algorithms for microprocessor design
• Intel, Xilinx, Synopsys, Cadence, Mentor Graphics, Qualcomm, etc.

Open-source tool: https://github.com/lana555/dynamatic

Can we design an efficient branch predictor?

Can we design an efficient cache replacement policy?

Can we design an efficient cache prefetcher?

Can we design an efficient register file prefetcher?

History of computation
• Homework
• Go through the material on Google Drive shared with you:
• https://drive.google.com/drive/folders/1JkTqBCwtP8o6-7YzjX8JfxbabKT8puzW
• Follow the order given in the xlsx file
• Questions on this material will be asked in the next class

Changes in Computation
• Manual
• Mechanical
• gears, chains, pulleys, and steam power
• Punch cards
• Electro-mechanical
• switches, relays
• Electrical
• plugboards, vacuum tubes
• later came drum memory, core memory, transistors, and so on

http://www.computerhistory.org/timeline

Computation in 2004
• 64-bit Itanium processor developed by Intel
• 1.7 billion transistors
• 1.7 GHz, issues up to 8 instructions per cycle
• 26 MB of cache
• In ~30 years, about 100,000-fold growth in transistor count and performance

What Happened in Between


• Moore's Law refers to Gordon Moore's observation that the number of transistors on a microchip doubles about every two years, while the cost of computers is halved.
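As a rough worked example, assume a constant doubling period $T_d$ and a starting transistor count $N_0$ (both symbols introduced here for illustration). Doubling every two years over ~30 years gives about a 33,000-fold increase; the ~100,000-fold growth quoted for the 2004 Itanium corresponds to a doubling period closer to 1.8 years.

```latex
N(t) = N_0 \cdot 2^{t/T_d}, \qquad T_d \approx 2 \text{ years}

\frac{N(30)}{N_0} = 2^{30/2} = 2^{15} = 32\,768

100\,000 = 2^{30/T_d}
\;\Longrightarrow\;
T_d = \frac{30}{\log_2 100\,000} \approx \frac{30}{16.6} \approx 1.8 \text{ years}
```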

Moore’s Law Scaling with Cores

[Figure panels: 1970 to ~2005; 2005 to now]

The Big Picture

What kind of growth is it?
The future is about quantum computing

Carbon nanotube computer


• https://www.nature.com/articles/nature12502
• https://en.wikipedia.org/wiki/Carbon_nanotube_computer
FPGA

• Field Programmable Gate Array
• CLB: Configurable Logic Block
• [Figure: inside a CLB, showing a 4-input LUT]

https://www.youtube.com/watch?v=WY-F3knih7c
https://www.youtube.com/watch?v=K2LRXsKCW7w

Quantum Computing
• https://awards.acm.org/about/2020-acm-prize
• https://www.ibm.com/quantum-computing/
• Google wants to build a useful quantum computer by 2029
• https://www.theverge.com/2021/5/19/22443453/google-quantum-computer-2029-decade-commercial-useful-qubits-quantum-transistor
• Quantum Computing: Untangling the Hype (talk at The Royal Institution)
• https://www.youtube.com/watch?v=wE1OCXvaDtc

Homework
• Design a single-purpose processor that performs only bubble sort.
• Design a general-purpose processor for the following set of instructions:
• ADD R1, R2, R3
• SUB R1, R2, R3
• LW R2, offset(R3)
• SW R2, offset(R3)
• BNE R1, R2, offset
• It has an instruction memory and a data memory
• It has 32 registers
• Register R0 always holds the value zero.

Summary
• Motivation for automated computation
• Dedicated processor vs. general-purpose processor
• Limitations of algorithms
• Building block of a program
• Steps by which a computer/laptop solves a problem
• Changes in computation

ALU Operations
ALU control lines (2:0) | ALU function
000 | AND
001 | OR
010 | Add
110 | Subtract
111 | Set on less than



Generation of Control: ALU decoder truth table

ALUOp | ALU control lines (2:0) | Funct
00 | 010 (add) | X
01 | 110 (subtract) | X
1X | 010 (add) | 100000 (add)
1X | 110 (subtract) | 100010 (sub)
1X | 000 (and) | 100100 (and)
1X | 001 (or) | 100101 (or)
1X | 111 (set less than) | 101010 (slt)



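Generation of Controls
ALUControl is shown as a 4-bit code; its lower three bits match the 3-bit ALU control lines (2:0) used in the ALU tables.

Inst. opcode | ALUOp | Instr. operation | Funct field | Desired ALU action | ALUControl
100011 (LW) | 00 | load word | xxxxxx | add | 0010
101011 (SW) | 00 | store word | xxxxxx | add | 0010
000100 (BEQ) | 01 | branch equal | xxxxxx | subtract | 0110
000000 (R-type) | 10 | add | 100000 | add | 0010
000000 (R-type) | 10 | subtract | 100010 | subtract | 0110
000000 (R-type) | 10 | AND | 100100 | AND | 0000
000000 (R-type) | 10 | OR | 100101 | OR | 0001
000000 (R-type) | 10 | set on less than | 101010 | set on less than | 0111
001000 (addi) | 00 | add immediate | xxxxxx | add | 0010
000010 (j) | xx | jump | xxxxxx | none (ALU not used) | xxxx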
