ECE2015 - CA - All Slides
ECE 2015
DR. M. S. ELLISON
ASSOCIATE PROFESSOR
SENSE
Reference Books:
1. M. Morris Mano, Rajib Mall, Computer System Architecture, Third Edition, Pearson Education, 2017.
2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, Fifth Edition, McGraw Hill, 2011.
Computer architecture is also referred to as the instruction set architecture (ISA), which defines how the various
components are controlled.
ISA
Organization is how features are implemented by interconnecting the operational units to realize the specific architectural
specifications.
◦ Control signals, interfaces, memory technology.
◦ e.g. Is there a hardware multiply unit or is it done by repeated addition?
Function Structure
MAR ← PC             MAR ← PC
MBR ← M[MAR]         MBR ← M[MAR]
IBR ← MBR[20:39]     IBR ← MBR[20:39]
IR ← MBR[0:7]        IR ← MBR[0:7]
MAR ← MBR[8:19]      MAR ← MBR[8:19]
MBR ← M[MAR]         MBR ← AC
AC ← MBR             M[MAR] ← MBR
IR ← IBR[0:7]
MAR ← IBR[8:19]
PC ← PC+1
MBR ← M[MAR]
AC ← AC+MBR
P3
P2 OPCODE OPERAND
00000001 000000000010
First, the CPU must access memory to fetch the instruction. The instruction contains the address of the
data we want to load. During the execute phase, the CPU accesses memory again to load the data value located at that
address, for a total of two trips to memory.
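The two memory trips can be traced with a small Python sketch of the register transfers for a load. It assumes a 40-bit word with bit 0 as the most significant bit, the opcode in bits 0:7 and the address in bits 8:19; the function names and the sample memory contents are illustrative only.

```python
WORD_BITS = 40  # assumed word width, with bit 0 as the most significant bit


def bits(word, start, end):
    """Return bits start..end (inclusive) of a 40-bit word, bit 0 = MSB."""
    width = end - start + 1
    shift = WORD_BITS - 1 - end
    return (word >> shift) & ((1 << width) - 1)


def run_load(memory, pc):
    """Fetch and execute one load instruction; return (AC, memory accesses)."""
    accesses = 0
    mar = pc                      # MAR <- PC
    mbr = memory[mar]             # MBR <- M[MAR]  (trip 1: instruction fetch)
    accesses += 1
    ir = bits(mbr, 0, 7)          # IR  <- MBR[0:7]
    mar = bits(mbr, 8, 19)        # MAR <- MBR[8:19]
    if ir != 0b00000001:
        raise ValueError("this sketch only models opcode 00000001")
    mbr = memory[mar]             # MBR <- M[MAR]  (trip 2: operand fetch)
    accesses += 1
    ac = mbr                      # AC  <- MBR
    return ac, accesses


# Opcode 00000001 with operand 000000000010, placed in the left half of word 0.
instruction = (0b00000001 << 12) | 0b000000000010
memory = {0: instruction << 20, 2: 1234}   # the value 1234 is made up
ac, accesses = run_load(memory, pc=0)
print(ac, accesses)
```

The counter makes the point of the paragraph explicit: one access for the instruction and one for the operand.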
P3
The vectors A, B, and C are each stored in 1,000 contiguous locations in memory, beginning at locations 1001, 2001, and 3001, respectively.
The program begins with the left half of location 3.
A counting variable N is set to 999 and decremented after each step until it reaches -1.
Thus, the vectors are processed from high location to low location.
For computer storage and processing minus signs and periods cannot be used
Only binary digits (0 and 1) may be used to represent numbers
If limited to non-negative integers, the representation is straightforward
DRAWBACKS
Addition and subtraction require consideration of both the signs of the numbers and their
relative magnitudes to carry out the required operation
Another drawback is that there are two representations of 0
The number zero is identified as positive and therefore has a 0 sign bit and a magnitude of all
0s.
The range of positive integers that may be represented in an n-bit word is from 0 (all of the magnitude bits are
0) through 2^(n-1) - 1 (all of the magnitude bits are 1).
Any larger number would require more bits
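A short Python sketch of sign-magnitude encoding makes the two zeros and the range limit concrete. The function names are illustrative, and an n-bit word with one sign bit followed by n-1 magnitude bits is assumed.

```python
def to_sign_magnitude(value, n_bits):
    """Encode an integer as an n-bit sign-magnitude bit string."""
    magnitude_bits = n_bits - 1
    max_magnitude = (1 << magnitude_bits) - 1      # 2^(n-1) - 1
    if abs(value) > max_magnitude:
        raise ValueError("value needs more bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), "0{}b".format(magnitude_bits))


def from_sign_magnitude(bit_string):
    """Decode an n-bit sign-magnitude bit string back to an integer."""
    magnitude = int(bit_string[1:], 2)
    return -magnitude if bit_string[0] == "1" else magnitude


print(to_sign_magnitude(18, 8))
print(to_sign_magnitude(-18, 8))
# Both bit patterns below decode to zero: the "two representations of 0".
print(from_sign_magnitude("00000000"), from_sign_magnitude("10000000"))
```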
If A is a positive number, the rule clearly works. Now, if A is negative and we want
to construct an m-bit representation, with m > n. Then
For an m-bit × n-bit multiplier we need m*n AND gates, n half adders, and (m-2)*n full adders
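The component counts stated above can be tabulated with a few lines of Python. The function name is illustrative; the formulas are taken directly from the statement above.

```python
def array_multiplier_components(m, n):
    """Component counts for an m-bit x n-bit array multiplier."""
    if m < 2 or n < 1:
        raise ValueError("expects m >= 2 and n >= 1")
    return {
        "and_gates": m * n,          # one AND gate per partial-product bit
        "half_adders": n,
        "full_adders": (m - 2) * n,
    }


print(array_multiplier_components(4, 4))
print(array_multiplier_components(8, 8))
```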
instead of
Solution 2
◦ Booth’s algorithm
Consider the following examples of integer division with all possible combinations of signs of D
and V
Ex 1.
  18        .625
- 16      - .5
   2        .125
-  2      - .125
   0        0
18.625 (base 10) = 10010.101 (base 2)
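The subtraction steps of the example can be sketched in Python as follows: at each step the largest remaining power of two is subtracted and a 1 is recorded, otherwise a 0. The function name and the limit on fraction bits are assumptions made for the sketch.

```python
def to_binary_by_subtraction(value, max_fraction_bits=8):
    """Convert a non-negative number to a binary string by subtracting powers of two."""
    integer = int(value)
    fraction = value - integer

    # Integer part: start from the largest power of two not exceeding it.
    power = 1
    while power * 2 <= integer:
        power *= 2
    int_digits = []
    while power >= 1:
        if integer >= power:
            int_digits.append("1")
            integer -= power
        else:
            int_digits.append("0")
        power //= 2

    # Fraction part: weights 1/2, 1/4, 1/8, ...
    frac_digits = []
    weight = 0.5
    while fraction > 0 and len(frac_digits) < max_fraction_bits:
        if fraction >= weight:
            frac_digits.append("1")
            fraction -= weight
        else:
            frac_digits.append("0")
        weight /= 2

    result = "".join(int_digits)
    if frac_digits:
        result += "." + "".join(frac_digits)
    return result


print(to_binary_by_subtraction(18.625))
```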
DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 156
Problem storing binary form
Standards committee came up with a way to store floating point numbers (that have a decimal
point)
[Figure: 32-bit floating-point format. Bit 1: sign; bits 2-9: biased exponent; bits 10-32: significand]
The advantage of using a larger exponent base is that a greater range can be achieved for the same
number of exponent bits.
However, this greater range comes at the expense of less precision.
AX - the Accumulator
BX - the Base Register
CX - the Count Register
DX - the Data Register
If the address location to which control is to be transferred lies in a segment
other than the current one, the mode is called intersegment mode.
If the destination lies in the same segment, the mode is called intrasegment mode.
◦ c=a+b
2. Arithmetic Instructions
3. Logical Instructions
Instruction Set
1. Data Transfer Instructions
ADD A, data
ADC A, data
SUB A, data
SBB A, data
After a DEC instruction, any jump (conditional or unconditional) can be used to form a loop
Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
MUL reg/ mem
CMP A, data
MOVS
(MAE) ← (MA)
MOVSW
REPNZ/ REPNE
REP MOVSW
While CX ≠ 0 and ZF = 0, repeat execution of string instruction
(Repeat string instruction until ZF = 1) and
(CX) ← (CX) - 1
In the example, the first form copies a single byte from the source string, at address DS:SI, to the destination string, at
address ES:DI, then increments (or decrements, if the Direction flag is set) both SI and DI.
The second form performs this operation and then decrements CX; if CX is not zero, the operation is repeated.
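The behaviour of the repeated form can be modelled in Python as a rough sketch. It uses one flat bytearray instead of separate DS and ES segments, and it tests CX before each iteration; both are simplifying assumptions, and the function name is illustrative.

```python
def rep_movs(memory, si, di, cx, direction_flag=0, size=1):
    """Model REP MOVSB (size=1) or REP MOVSW (size=2) on a flat bytearray."""
    step = -size if direction_flag else size   # DF = 1 walks downwards
    while cx != 0:
        memory[di:di + size] = memory[si:si + size]   # copy one element
        si += step
        di += step
        cx -= 1                                       # (CX) <- (CX) - 1
    return si, di, cx


memory = bytearray(b"HELLO" + bytes(5))
si, di, cx = rep_movs(memory, si=0, di=5, cx=5)
print(memory, si, di, cx)
```

With cx=1 the same function models the single, non-repeated form.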
CMPS
LODS
STOS
Mnemonics    Explanation
STC          Set CF ← 1
CLC          Clear CF ← 0
NOP          No operation
Mnemonics Explanation
CALL reg/ mem/ disp16 Call subroutine
Checks flags
Mnemonics Explanation
JC disp8 Jump if CF = 1
JP disp8 Jump if PF = 1
JO disp8 Jump if OF = 1
JS disp8 Jump if SF = 1
[Flowchart: Start; 0000 is assigned to SI; counter CX is assigned 05; data of SI moved to AX; increment SI by 02; if CX = 00? (yes / no); move contents of AX to memory location 1400; Stop]
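Program
MOV SI, 1200        ; SI points to the input word
MOV AX, [SI]        ; AX = input word
MOV BX, [SI]        ; BX = copy of the input word
SHR BX, 01          ; shift the copy right by one bit
XOR AX, BX          ; combine the word with its shifted copy
MOV [1400], AX      ; store the result at memory location 1400
HLT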
30 0 0 1 1 0 0 0 0
ROR,4
AL=03 0 0 0 0 0 0 1 1
MUL 0A
30 in 0 0 0 1 1 1 1 0
dec
AH
56 0 0 1 1 1 0 0 0
AND
0F 0 0 0 0 1 1 1 1
AH
08 0 0 0 0 1 0 0 0
BL
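Write a program to compute the factorials of 10 numbers stored starting at location
4000H:1000H. The results should be stored starting at 4000H:2000H.
        MOV AX, 4000H
        MOV DS, AX          ; DS = 4000H
        MOV SI, 1000H       ; SI points to the input numbers
        MOV DI, 2000H       ; DI points to the results
        MOV CX, 000AH       ; LOOP counts in CX: 10 numbers
NEXT:   MOV AL, 01H         ; reset the running product for each number
        MOV BL, [SI]
LOOK:   MUL BL              ; AX = AL * BL
        DEC BL
        JNZ LOOK
        MOV [DI], AL        ; only the low byte is stored, so inputs must be 1 to 5
        INC SI
        INC DI
        LOOP NEXT
        HLT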
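The inner multiply loop can be modelled in Python to show what an 8-bit result can hold. This is a sketch that assumes AL starts at 1 for each number and that only AL (the low byte of AX) is carried between iterations and stored; the function name is illustrative.

```python
def factorial_low_byte(n):
    """Model LOOK: MUL BL / DEC BL / JNZ LOOK with 8-bit AL and BL."""
    al = 1
    bl = n & 0xFF
    while True:
        ax = al * bl            # MUL BL: AX = AL * BL
        al = ax & 0xFF          # only AL feeds the next multiplication
        bl = (bl - 1) & 0xFF    # DEC BL
        if bl == 0:             # JNZ LOOK falls through when BL reaches 0
            break
    return al


for n in range(1, 8):
    print(n, factorial_low_byte(n))
```

Up to 5 the stored byte equals the true factorial (5! = 120). From 6 onwards the true value no longer fits in one byte, so a wider result would be needed.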
[Flowchart: Start → If equal? (No / Yes) → Move contents of AX to memory location 1400 → Stop]
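Program
Mnemonics
        MOV SI, 1200        ; SI points to the input word
        MOV AX, [SI]        ; AX accumulates the result
        MOV BX, [SI]        ; BX is the working copy that gets shifted
LOOP1:  SHR BX, 01          ; shift the copy right by one bit
        XOR AX, BX          ; fold the shifted copy into the result
        CMP BX, 0000        ; has the copy been shifted down to zero?
        JE LOOP2            ; yes: store the result
        JMP LOOP1           ; no: repeat
LOOP2:  MOV [1400], AX      ; store the result at memory location 1400
        HLT                 ; stop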
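A Python model of the shift-and-XOR loop in the program above helps to check what it computes. The function name is illustrative. The slides do not name the operation, but XOR-ing a word with all of its right shifts is the standard way to decode a Gray code into binary, which the loop at the end confirms for small values.

```python
def shift_xor_loop(value):
    """Model the SHR/XOR loop on 16-bit registers AX and BX."""
    ax = value & 0xFFFF
    bx = value & 0xFFFF
    while True:
        bx >>= 1          # SHR BX, 01
        ax ^= bx          # XOR AX, BX
        if bx == 0:       # CMP BX, 0000 / JE
            break
    return ax


# Encode n as a Gray code (n XOR n >> 1), then run the loop on it.
for n in range(8):
    gray = n ^ (n >> 1)
    print(n, format(gray, "03b"), shift_xor_loop(gray))
```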
Interrupts
An interrupt is used to cause a temporary halt in the execution of a program.
The microprocessor responds to the interrupt with an interrupt service routine (ISR), a short program or
subroutine that instructs the microprocessor on how to handle the interrupt.
Interrupt vector: Code loaded on the bus by the interrupting device that contains the Address
(segment and offset) of specific interrupt service routine
Interrupt Masking: Ignoring (disabling) an interrupt
Non-Maskable Interrupt: Interrupt that cannot be ignored (power-down)
[Flowchart: Accept interrupt? (N) → Get interrupt vector → Save PC → Load PC → Jump to ISR. Signals: INTR, INTA', D7-D0 (vector)]
Interrupts
The 8086 provides a 256 entry interrupt vector table beginning at address 0:0 in memory.
The Interrupt Vector Table occupies the address range from 00000H to 003FFH (the first
1024 bytes in the memory map).
This is a 1K table containing 256 4-byte entries.
Each entry in this table contains a segmented address that points at the interrupt service
routine in memory.
The lowest five types are dedicated to specific interrupts such as the divide by zero interrupt
and the non maskable interrupt.
The next 27 interrupt types, from 5 to 31 are reserved by Intel for use in future
microprocessors.
The upper 224 interrupt types, from 32 to 255, are available to use for hardware and
software interrupts.
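The table layout described above reduces to simple arithmetic, sketched here in Python. The function names are illustrative; the ranges are the ones listed above.

```python
def vector_table_address(int_type):
    """Address of the 4-byte table entry for an interrupt type (0 to 255)."""
    if not 0 <= int_type <= 255:
        raise ValueError("interrupt type must be 0 to 255")
    return int_type * 4          # 256 entries x 4 bytes = 1024 bytes


def classify(int_type):
    """Group an interrupt type according to the ranges given above."""
    if not 0 <= int_type <= 255:
        raise ValueError("interrupt type must be 0 to 255")
    if int_type <= 4:
        return "dedicated"
    if int_type <= 31:
        return "reserved"
    return "available"


for t in (0, 4, 5, 31, 32, 255):
    print(t, format(vector_table_address(t), "05X") + "H", classify(t))
```

The last entry (type 255) starts at 003FCH and ends at 003FFH, which matches the end of the table.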
[Figure: interrupt interface circuit. Labels: Peripheral Device, 8088 System, D7-D0, LS244 (I7-I0, Y7-Y0, E1, E2), A0, A19, +5V, INTR, INTA. Value shown: 4C = 0 1 0 0 1 1 0 0]
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
00000  3C 22 10 38 6F 13 2C 2A 33 22 21 67 EE F1 32 25
00010  11 3C 32 88 90 16 44 32 14 30 42 58 30 36 34 66
.....  .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
00100  4A 33 3C 4A AA 1A 1B A2 2A 33 3C 4A AA 1A 3E 77
00110  C1 58 4E C1 4F 11 66 F4 C5 58 4E 20 4F 11 F0 F4
.....  .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
00250  00 10 10 20 3F 26 33 3C 20 26 20 C1 3F 10 28 32
00260  20 4E 00 10 50 88 22 38 10 5A 38 10 4C 55 14 54
.....  .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
003E0  3A 10 45 2F 4E 33 6F 90 3A 44 37 43 3A 54 54 7F
003F0  22 3C 80 01 3C 4F 4E 88 22 3C 50 21 49 3F F4 65
Example
Write a sequence of instructions that initialize vector 40H to point to the ISR “isr40”.
Answer: Address in table = 4 × 40H = 100H
Set DS to 0 since the Interrupt Vector Table begins at 00000H
Get the offset address of the ISR using the OFFSET operator
and store it in the addresses 100H and 101H
Get the segment address of the ISR using the SEG operator
and store it in the addresses 102H and 103H
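The byte layout that results from these steps can be sketched in Python on a simulated 1 KB table. The segment and offset values used for "isr40" are made up for illustration, and the helper names are not part of any assembler or library.

```python
def install_vector(memory, int_type, segment, offset):
    """Write a 4-byte vector entry: offset word first, then segment word."""
    base = int_type * 4
    memory[base:base + 2] = offset.to_bytes(2, "little")       # e.g. 100H, 101H
    memory[base + 2:base + 4] = segment.to_bytes(2, "little")  # e.g. 102H, 103H


def read_vector(memory, int_type):
    """Read back (segment, offset) for an interrupt type."""
    base = int_type * 4
    offset = int.from_bytes(memory[base:base + 2], "little")
    segment = int.from_bytes(memory[base + 2:base + 4], "little")
    return segment, offset


table = bytearray(1024)                      # 256 entries x 4 bytes
# Hypothetical location of isr40: segment 2000H, offset 0150H.
install_vector(table, 0x40, segment=0x2000, offset=0x0150)
print(table[0x100:0x104].hex())
print(read_vector(table, 0x40))
```

Both words are stored low byte first, which is why the offset 0150H appears as the bytes 50H, 01H.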
Process interrupt
Restore state
◦ Load PC/IP, flags, registers etc.
INT 21h / AH=9 - output of a string at DS:DX. String must be terminated by '$'.
Allows plug-and-play connectivity, so devices and drivers can be loaded and unloaded dynamically.
Plug the device in and the host loads the drivers without needing a reboot or an explicit
connection setup or teardown.
Unplug the device and its absence is automatically detected by the host, which unloads the drivers.
PQ = 00 Fetch Cycle
PQ = 01 Indirect Cycle
PQ = 10 Execute Cycle
PQ = 11 Interrupt Cycle
Is it that simple?
No. In a modern complex processor, the number of Boolean equations needed to define the control unit is
very large. Hence a simpler and more efficient approach, known as the flowchart/delay element method, is usually used
•Referring to the figures above, depict the control signals required for the execution cycle of LOAD 200 and
STORE 300. [5]
•Express the Boolean expression for each of the control signals. [5]
•Implement the Boolean expressions for each of the control signals using minimum number of gates. [5]
[Flowchart: C1 → C2 → C3 → Is x = 0? (Y: C4, N: C5) → C6 → End]
Problems on Pipeline
Non-pipeline
Pipeline
A B C
A B C
A B C
Time
Problem 1: Find the number of clock cycles required to execute 10 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Instruction (I): F (2) D (1) E (1)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 1 clock cycle
Pipeline method
I1   F  F  D  E
I2         F  F  D  E
I3               F  F  D  E
I4                     F  F  D  E
I5                           F  F  D  E
So on …
No of Clock Cycle
    0  1  2  3  4  5  6  7  8  9  10 11 12
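The cycle counts for this kind of problem can be checked with a small Python sketch. It assumes every instruction has the same stage times and that a new instruction completes every max(stage) cycles once the first one has finished, which is the rule used in the worked solutions of these slides; the function names are illustrative.

```python
def non_pipelined_cycles(n, stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n * sum(stages)


def pipelined_cycles(n, stages):
    """First instruction takes sum(stages); each later one adds max(stages)."""
    if n == 0:
        return 0
    return sum(stages) + (n - 1) * max(stages)


# F (2) D (1) E (1), 10 instructions
print(non_pipelined_cycles(10, [2, 1, 1]), pipelined_cycles(10, [2, 1, 1]))
# F (2) D (1) E (3), 100 instructions
print(non_pipelined_cycles(100, [2, 1, 3]), pipelined_cycles(100, [2, 1, 3]))
```

For five instructions with F (2) D (1) E (1) the function gives 12 cycles, which agrees with the 0 to 12 clock axis of the diagram.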
Problem 2: Find the number of clock cycles required to execute 100 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Instruction (I): F (2) D (1) E (3)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 3 clock cycles
Pipeline method
F F D E E E
F F D E E E So on …
F F D E E E
F F D E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Assignment - 1: Find the number of clock cycles required to execute 1000 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Instruction (I): F (1) D (2) E (3)
Fetch - 1 clock cycle
Decoding - 2 clock cycles
Execution - 3 clock cycles
Pipeline method
F D D E E E
F D D E E E So on …
F D D E E E
F D D E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Problem 1: Find the number of clock cycles required to execute 1000 instructions with the
pipeline method and without the pipeline method for the following instruction structure. If the
microcontroller frequency is 1 GHz, also find the maximum operating frequency.
Instruction (I): F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles
Pipeline method
F D E E E E
F D E E E E So on …
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Problem 4: If the microcontroller frequency is 1 GHz, also find the maximum operating
frequency.
Instruction (I): F (2) D (1) E (1)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 1 clock cycle
Pipeline method
I1   F  F  D  E
I2         F  F  D  E
I3               F  F  D  E
I4                     F  F  D  E
I5                           F  F  D  E
So on …
No of Clock Cycle
    0  1  2  3  4  5  6  7  8  9  10 11 12
Problem 5: If the microcontroller frequency is 1 GHz, also find the maximum operating
frequency.
Instruction (I): F (2) D (1) E (3)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 3 clock cycles
Pipeline method
F F D E E E
F F D E E E So on …
F F D E E E
F F D E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Problem 1: Find the number of clock cycles required to execute 4 instructions with the pipeline method and
without the pipeline method for the following instruction structure. If the microcontroller frequency is 1 GHz,
also find the maximum operating frequency.
Instruction (I): F D E
Fetch – 2 clock cycles
Decoding – 1 clock cycle
Execution – 2 (I1), 3 (I2), 4 (I3) and 2 (I4) clock cycles
Pipeline method
F F D E E
F F D E E E
F F D E E E E
F F D E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Problem 2: Find the number of clock cycles required to execute 4 instructions with the pipeline method
and without the pipeline method for the following instruction structure. If the microcontroller frequency is
1 GHz, also find the maximum operating frequency.
Instruction (I): F D E
Fetch – 1 clock cycle
Decoding – 1 clock cycle
Execution – 2 (I1), 1 (I2), 1 (I3) and 2 (I4) clock cycles
Pipeline method
F D E E
F D E
F D E
F D E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8
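When the execution time differs per instruction, the simple closed-form rule no longer applies, and the schedule can be simulated instead. The Python sketch below assumes in-order issue, one instruction per stage at a time, and that an instruction waits whenever the next stage is still busy. These are modelling assumptions chosen to match the clock axes of the diagrams, and the function names are illustrative.

```python
def simulate_pipeline(instructions):
    """Return the clock cycle at which the last instruction finishes.

    instructions: list of per-instruction stage durations, e.g. (F, D, E).
    """
    if not instructions:
        return 0
    stage_free_at = [0] * len(instructions[0])   # when each stage becomes free
    for durations in instructions:
        t = 0                                    # when this instruction is ready
        for stage, duration in enumerate(durations):
            start = max(t, stage_free_at[stage])  # wait for the stage if busy
            t = start + duration
            stage_free_at[stage] = t
    return stage_free_at[-1]


def non_pipelined(instructions):
    """Total cycles when instructions run strictly one after another."""
    return sum(sum(durations) for durations in instructions)


# Fetch 1, decode 1, execution 2, 1, 1 and 2 cycles for I1 to I4.
program = [(1, 1, 2), (1, 1, 1), (1, 1, 1), (1, 1, 2)]
print(non_pipelined(program), simulate_pipeline(program))
```

For this four-instruction case the simulation ends at cycle 8, the last tick on the clock axis above.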
Maximum operating frequency is given by
Assignment - 1: Find the number of clock cycles required to execute 5 instructions with the pipeline method
and without the pipeline method for the following instruction structure. If the microcontroller frequency is 2 GHz,
also find the maximum operating frequency.
Instruction (I): F D E
Fetch – 1 clock cycle
Decoding – 1 clock cycle
Problem - 1: If the pipeline is flushed after every 3 instructions, find the number of clock cycles required
to execute 9 instructions with the pipeline method.
Instruction (I): F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles
Pipeline method
F D E E E E
F D E E E E So on …
F D E E E E
F D E E E E
F D E E E E
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Pipeline method
F D E E E E
F D E E E E
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
No of clock cycles required to execute 3 instructions
= (No of clocks required for the 1st instruction) + ((No of instructions - 1) x (offset between two successive instructions))
= 6 + ((3-1) x 4) = 6 + (2 x 4) = 6 + 8 = 14 clock cycles
No of clock cycles required to execute 9 instructions = 3 x 14 = 42 clock cycles
Problem - 2: If the pipeline is flushed after every 10 instructions, find the number of clock cycles
required to execute 40 instructions with the pipeline method.
Instruction (I): F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles
Pipeline method
F D E E E E
F D E E E E So on …
F D E E E E
F D E E E E
F D E E E E
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Pipeline method
F D E E E E
F D E E E E Up to 10 instructions
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
No of clock cycles required to execute 10 instructions
= (No of clocks required for the 1st instruction) + ((No of instructions - 1) x (offset between two successive instructions))
= 6 + ((10-1) x 4) = 6 + (9 x 4) = 6 + 36 = 42 clock cycles
No of clock cycles required to execute 40 instructions = 4 x 42 = 168 clock cycles
Problem - 3: If the pipeline is flushed after every 10 instructions, find the number of clock cycles
required to execute 41 instructions with the pipeline method.
Instruction (I): F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles
Pipeline method
F D E E E E
F D E E E E So on …
F D E E E E
F D E E E E
F D E E E E
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Pipeline method
F D E E E E
F D E E E E Up to 10 instructions
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
No of clock cycles required to execute 10 instructions
= (No of clocks required for the 1st instruction) + ((No of instructions - 1) x (offset between two successive instructions))
= 6 + ((10-1) x 4) = 6 + (9 x 4) = 6 + 36 = 42 clock cycles
No of clock cycles required to execute 41 instructions = 4 x 42 + 6 = 168 + 6 = 174 clock cycles
Problem - 4: If the pipeline is flushed after every 10 instructions, find the number of clock cycles
required to execute 141 instructions with the pipeline method.
Instruction (I): F (2) D (1) E (3)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 3 clock cycles
Pipeline method
F F D E E E
F F D E E E So on …
F F D E E E
F F D E E E
F F D E E E
F F D E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12
Pipeline method
F F D E E E
F F D E E E
Up to 10 instructions
F F D E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12
No of clock cycles required to execute 10 instructions
= (No of clocks required for the 1st instruction) + ((No of instructions - 1) x (offset between two successive instructions))
= 6 + ((10-1) x 3) = 6 + (9 x 3) = 6 + 27 = 33 clock cycles
No of clock cycles required to execute 141 instructions = 14 x 33 + 6 = 462 + 6 = 468 clock cycles
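The flush calculation above generalises to any instruction count. The Python sketch below assumes the pipeline restarts from empty after each group of flush_every instructions and that any leftover instructions form a final, shorter group; the function names are illustrative.

```python
def pipelined_cycles(n, stages):
    """sum(stages) for the first instruction, max(stages) for each later one."""
    if n == 0:
        return 0
    return sum(stages) + (n - 1) * max(stages)


def flushed_pipeline_cycles(n, flush_every, stages):
    """Total cycles when the pipeline is flushed after every flush_every instructions."""
    full_groups, remainder = divmod(n, flush_every)
    return (full_groups * pipelined_cycles(flush_every, stages)
            + pipelined_cycles(remainder, stages))


# 141 instructions, flush after every 10, F (2) D (1) E (3)
print(flushed_pipeline_cycles(141, 10, [2, 1, 3]))
# 9 instructions, flush after every 3, F (1) D (1) E (4)
print(flushed_pipeline_cycles(9, 3, [1, 1, 4]))
```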
Problem - 1: If the pipeline is flushed after every 15 instructions, find the number of clock cycles
required to execute 1501 instructions with the pipeline method.
Instruction (I): F (2) D (1) E (4)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 4 clock cycles
Problem 1: Find the number of clock cycles required to execute 10 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Improve the pipeline structure.
Instruction (I): F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles
F D E E E E
F D E E E E So on …
F D E E E E
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10
Problem 2: Find the number of clock cycles required to execute 100 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Improve the pipeline structure.
Instruction (I): F (2) D (1) E (6)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 6 clock cycles
F F D E E E E E E
F F D E E E E E E So on …
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
No of Clock Cycle
0 1 2 3 4 5 6 7 8 9 10 11 12
Problem - 1: Find the number of clock cycles required to execute 5432 instructions with the pipeline
method and without the pipeline method for the following instruction structure. Improve the
pipeline structure.
Instruction (I): F (2) D (1) E (8)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 8 clock cycles
Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns. Given latch delay is 10 ns.
Calculate-
1. Pipeline cycle time
2. Non-pipeline execution time
3. Speed up ratio
4. Pipeline time for 1000 instructions
5. Sequential time for 1000 instructions
6. Throughput
Solution:
Given-
o Four stage pipeline is used
o Delay of stages = 60, 50, 90 and 80 ns
o Latch delay or delay due to each register = 10 ns
[Figure: Pipelined Architecture. Surviving labels: S3 90 ns, S4 80 ns, Latch 10 ns]
Note: In any stage of the pipeline, the output of each stage is moved to the
next stage after 100 ns (max(60, 50, 90, 80) ns + 10 ns)
Cycle time = Maximum delay due to any stage + Delay due to its register (Latch)
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Part-06: Throughput-
Throughput for pipelined execution = Number of instructions executed per unit time
= 1000 instructions / 100300 ns
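All six requested quantities can be computed together in a short Python sketch. It assumes the usual textbook definitions: non-pipelined time per instruction is the sum of the stage delays without latches, and pipelined time for n instructions is (k + n - 1) cycles for k stages. Variable and function names are illustrative.

```python
stage_delays_ns = [60, 50, 90, 80]
latch_delay_ns = 10
n_instructions = 1000
k = len(stage_delays_ns)

# 1. Pipeline cycle time: slowest stage plus its latch
cycle_time_ns = max(stage_delays_ns) + latch_delay_ns

# 2. Non-pipelined execution time for one instruction
non_pipelined_time_ns = sum(stage_delays_ns)

# 3. Speed-up ratio for a single instruction slot
speedup = non_pipelined_time_ns / cycle_time_ns


def pipeline_time_ns(n):
    """4. k cycles for the first instruction, then one cycle per instruction."""
    return (k + n - 1) * cycle_time_ns


def sequential_time_ns(n):
    """5. Every instruction takes the full non-pipelined time."""
    return n * non_pipelined_time_ns


# 6. Throughput in instructions per second (1 ns = 1e-9 s)
throughput_per_s = n_instructions / (pipeline_time_ns(n_instructions) * 1e-9)

print(cycle_time_ns, non_pipelined_time_ns, speedup)
print(pipeline_time_ns(n_instructions), sequential_time_ns(n_instructions))
print(round(throughput_per_s))
```

The pipelined time for 1000 instructions comes out as 100300 ns, the denominator used in the throughput expression above.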
How fast?
◦ Time is money
How expensive?
Set-Associative Mapping
https://www.youtube.com/watch?v=mCF5XNn_xfA
https://www.youtube.com/watch?v=j5PUJllPPVE&t=893s
https://www.youtube.com/watch?v=1J_DhymCJok
Set-Associative Mapping
https://www.youtube.com/watch?v=vWxtmci1Nko&authuser=1