Computer Organisation and Architecture
Organization &
BITS
Architecture DS
Rao
Pilani 98663582
Pilani Campus 71
Computer Architecture and
Organization-1
• Architecture is those attributes visible to the programmer and having direct
impact on the logical execution of the program
– Instruction set, Number of bits used for data types, I/O mechanisms,
Memory addressing techniques
– e.g. x86 architecture, IBM/360 architecture
• Architecture Question?
– Is there a multiply/division instruction available?
• Organization Question?
– Is multiplication implemented by separate
hardware or is it done by repeated addition?
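The organization question above can be illustrated with a minimal sketch. Both functions below present the same "multiply" operation to the programmer (the architecture), but compute it differently (the organization). The function names are hypothetical and the inputs are assumed to be non-negative integers; a shift-and-add loop stands in here for what a dedicated hardware multiplier might do.

```python
def multiply_repeated_add(a, b):
    """Multiply by adding a to an accumulator b times."""
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_shift_add(a, b):
    """Multiply by examining the bits of b, as a shift-and-add unit would."""
    result = 0
    while b:
        if b & 1:        # lowest bit of the multiplier is set
            result += a  # add the shifted multiplicand
        a <<= 1          # shift multiplicand left
        b >>= 1          # move to the next multiplier bit
    return result


print(multiply_repeated_add(6, 7), multiply_shift_add(6, 7))
```

Both calls return the same product; only the number of steps taken differs.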
BITS Pilani, Pilani
Computer Organization vs. Computer
Architecture
• For example, it is an architectural design issue whether a computer will have a multiply
instruction.
• Many computer manufacturers offer a family of computer models, all with the same
architecture but with differences in organization.
• Different models in the family have different price and performance characteristics.
• A prominent example of both these phenomena is the IBM System/370 architecture. It was
first introduced in 1970 and included a number of models.
• This gives code compatibility
— At least backwards
• Organization differs between different versions
BITS Pilani, Deemed to be University under Section 3 of UGC
Structure and Functions of various components of
a computer
system.
• Structure is the way in which components relate to each other.
Structure = static relations among components
• Function is the operation of individual components as part of the structure.
Function = dynamic behaviour of each component
• In terms of description, we have two choices, starting at the bottom and
building up to a complete description or beginning with a top view and
decomposing the system into the subparts.
— Data processing
— Data storage
— Data movement
— Control
Computer Operations
(a) Data movement
(b) Storage
(c) Processing from/to storage
(d) Processing from storage to I/O
Structure
[Figure: The Computer (storage, processing) connected to peripherals and communication lines]
Structural View of a Computer
[Figure: the computer comprises the Central Processing Unit, Main Memory, Input/Output and Systems Interconnection, and connects to peripherals and communication lines]
• There are four main structural components:
• Central processing unit (CPU): controls the operation of the
computer and performs its data processing functions; often simply
referred to as processor.
[Figure: the CPU comprises Registers, Arithmetic and Logic Unit, Control Unit and Internal CPU Interconnection; the computer's I/O, System Bus, Memory and CPU are shown alongside]
Structural View of
the CPU
•The most complex component is the CPU and its major structural components are as
follows:
•Control unit: Controls the operation of the CPU and hence the computer.
•Arithmetic and logic unit (ALU): Performs the computer's data processing functions.
•CPU Interconnection: Some mechanism that provides for communication among the
IAS - details
• 1000 x 40 bit words
– Binary number
– Instruction Register
– Program Counter
– Accumulator
– Multiplier Quotient
Structure of IAS – detail
Commercial
Computers
Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory cells
and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer
DEC
PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
– $100k+ for IBM 360
• Embedded applications & OEM(original equipment
manufacturers)
• BUS STRUCTURE
models of the PDP-8 used a structure that became virtually universal for
microcomputers: the bus structure
x86
Evolution-2
• 8086
– Much more powerful (16 bit data)
– Instruction cache, pre-fetch few instructions
– 8088 (8 bit external bus) used in first IBM PC
• 80286
– 16 MByte memory addressable
– Up from 1MB (in 8086)
• 80386
– 32 bit processor with multitasking support
x86
Evolution-3
• 80486
– Sophisticated powerful cache and instruction pipelining
– Built in maths co-processor
• Pentium
– Superscalar
– Multiple instructions executed in parallel
• Pentium Pro
– Increased superscalar organization
– Aggressive register renaming
– Branch prediction and Data flow analysis
x86
Evolution-4
• Pentium II
– MMX technology, graphics, video & audio processing
• Pentium III
– Additional floating point instructions for 3D graphics
• Pentium 4
– Further floating point and multimedia enhancements
• Itanium Series
– 64 bit with Hardware enhancements to increase speed
• What’s next???
– Multi core architectures
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
Instruction format
[Opcode | Address]
• The instruction format provides 4 bits for the opcode, so that there can be as many as 2^4 = 16
different opcodes,
• and up to 2^12 = 4096 (4K) words of memory can be directly addressed.
Opcode  Function
0001    Load AC from memory
0010    Store AC to memory
0101    Add to AC from memory
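The 4-bit opcode and 12-bit address fields can be pulled out of an instruction word with shifts and masks. The sketch below assumes a 16-bit word with the opcode in the top four bits; the sample word 0x1940 and the function name are illustrative, not taken from the slides.

```python
# Opcode table from the slide; other opcodes are left undefined here.
OPCODES = {
    0b0001: "Load AC from memory",
    0b0010: "Store AC to memory",
    0b0101: "Add to AC from memory",
}


def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (word >> 12) & 0xF   # top 4 bits: 2^4 = 16 possible opcodes
    address = word & 0xFFF        # low 12 bits: 2^12 = 4096 addressable words
    return opcode, address


opcode, address = decode(0x1940)
print(OPCODES.get(opcode, "undefined"), hex(address))
```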
CPU Performance Equation
• Multiple aspects to performance: helps to isolate them
• Latency = seconds / program = (instructions / program) × (cycles / instruction) × (seconds / cycle)
PERFORMANCE
Performance is one of the key parameters used to
evaluate a system.
If we say one system is better than another
we consider
• Response Time
• Throughput
Clock Rate
• Operations performed by the processor are governed by the
system clock (fundamental level of processor speed
measurement)
– Generated by quartz crystal
Performance: Application
Specific
• Performance
– How a processor performs when executing a given application
CPU
Performance
• To maximize performance, need to minimize
execution time
performance = 1 / execution_time
Cycles Per Instruction
(CPI)
• For any given processor, number of cycles
required varies for different types of
instructions
– e.g. load, store, branch, add, mul etc.
• Hence CPI is not a constant value for a
processor
• Needs to calculate average CPI for
processor
CPU Performance and its Factors
In class
Example1
• Consider the execution of a program that results in the execution of 2 million
instructions on a 400-MHz processor. The program consists of four major types of
instructions. The instruction mix and the CPI for each instruction type are given
below, based on the result of a program trace experiment
The average CPI when the program is executed on a uniprocessor with the above trace results is
CPI = 0.6 + (2 * 0.18) + (4 * 0.12) + (8 * 0.1) = 2.24.
MIPS rate = Ic / (T * 10^6) = f / (CPI * 10^6)
The corresponding MIPS rate is (400 * 10^6) / (2.24 * 10^6) ≈ 178.
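The calculation above can be reproduced with a short sketch. The instruction-mix table itself did not survive on the slide, so the fractions and per-type CPI values below are read back from the CPI expression (60% at 1 cycle, 18% at 2, 12% at 4, 10% at 8); treat them as an assumption. The variable names are illustrative.

```python
CLOCK_HZ = 400e6          # 400-MHz processor
INSTRUCTION_COUNT = 2e6   # 2 million instructions

# (fraction of instructions, cycles per instruction), inferred from the CPI sum
mix = [(0.60, 1), (0.18, 2), (0.12, 4), (0.10, 8)]

# Average CPI is the mix-weighted sum of the per-type CPIs.
cpi = sum(fraction * cycles for fraction, cycles in mix)

# MIPS rate = f / (CPI * 10^6)
mips = CLOCK_HZ / (cpi * 1e6)

# Execution time T = Ic * CPI / f
exec_time = INSTRUCTION_COUNT * cpi / CLOCK_HZ

print(f"CPI = {cpi:.2f}, MIPS = {mips:.1f}, T = {exec_time * 1e3:.1f} ms")
```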
Example of Performance
Measure
Limitation of MIPS
Rate
• MIPS rate or instruction execution rate is also
inadequate to measure CPU performance. Why?
– Because of differences in ISA
– Ex. To execute a high level language statement
A=B+C (A,B and C are in memory) may need
different number of low level instructions for
different ISA
Amdahl’s Law
Formula
• For program running on single processor
• Conclusions
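The formula itself is not reproduced on this slide. As a hedged sketch, the commonly stated form of Amdahl's law is speedup = 1 / ((1 - f) + f / N), where f is the fraction of execution time that benefits from an enhancement and N is the speedup of that fraction. The function and parameter names below are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    unchanged = 1.0 - fraction_enhanced
    improved = fraction_enhanced / speedup_enhanced
    return 1.0 / (unchanged + improved)


# Half of the run time sped up by a factor of 2
print(amdahl_speedup(0.5, 2))
# Even a very large speedup of that half cannot push the overall figure past 2
print(amdahl_speedup(0.5, 1e9))
```

The second call shows the usual conclusion: the part that is not improved limits the overall gain.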
Apply Amdahl's Law to the above question and observe the result
With FP Speedupoverall=
Q2. Early examples of CISC and RISC are the VAX 11/780 and the IBM RS/6000,
respectively. Using a typical benchmark program, the following machine
characteristics results:
The final column shows that the VAX required 12 times longer than the IBM
measured in CPU
time.
•What is the relative size of the instruction count of the machine code for this
So that:
Ic = T × (MIPS rate) × 10^6
(x × 18) / (12x × 1) = 1.5
Review
Questions
• Differentiate between computer organization and architecture?
• What are the four main functions of a computer?
• What are the basic structural components of a computer?
• Describe the computer generations in brief.
• What was the first general purpose microprocessor?
• Consider two implementations of the same ISA. Computer A has a clock cycle time of 250
ns and a CPI of 2 for a program. Computer B has a clock cycle time of 500 ns and a
CPI of 1.2 for the same program. Which computer is faster for this program and by
how much?
• A program runs on computer A with a 2 GHz clock in 10 seconds. Another
computer B with a 4 GHz clock runs this program in 6 seconds. To accomplish this,
computer B will require P times as many clock cycles as computer A to run the
program. Find the value of P.
Performance revision
Consider two different machines, with two different instruction
sets, both of which have a clock rate of 200 MHz. The following
measurements are recorded on the two machines running a
given set of benchmark programs:

Instruction Type        Instruction Count (millions)   Cycles per Instruction
Machine A
  Arithmetic and logic  8                              1
  Load and store        4                              3
  Branch                2                              4
  Others                4                              3
Machine B
  Arithmetic and logic  10                             1
  Load and store        8                              2
  Branch                2                              4
  Others                4                              3

a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.
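One way to check a hand calculation for part (a) is the sketch below. It applies the relations used earlier in these slides (effective CPI as total cycles over total instructions, MIPS rate = f / (CPI × 10^6), execution time = total cycles / f) to the table values. The data layout and function name are illustrative.

```python
CLOCK_HZ = 200e6  # both machines run at 200 MHz

# (instruction count in millions, cycles per instruction) per instruction type
machines = {
    "A": [(8, 1), (4, 3), (2, 4), (4, 3)],
    "B": [(10, 1), (8, 2), (2, 4), (4, 3)],
}


def summarize(mix, clock_hz=CLOCK_HZ):
    """Return (effective CPI, MIPS rate, execution time in seconds)."""
    instructions = sum(count for count, _ in mix) * 1e6
    cycles = sum(count * cpi for count, cpi in mix) * 1e6
    effective_cpi = cycles / instructions
    mips = clock_hz / (effective_cpi * 1e6)
    exec_time = cycles / clock_hz
    return effective_cpi, mips, exec_time


for name, mix in machines.items():
    cpi, mips, seconds = summarize(mix)
    print(f"Machine {name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, time = {seconds:.2f} s")
```

Comparing the MIPS column with the time column is a useful starting point for part (b).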
Contact Session 3
Computer System Components and
Interconnections
Interconnections
• All the units must be connected
• Different type of connection for different type
of unit
– Memory
– Input/Output
– CPU
Memory Connection
– Read
– Write
– Timing
• Ribbon cables
• Sets of wires
Different
Bus
Data Bus
• Carries data
– Remember that there is no difference between “data” and “instruction”
at this level
• Width is a key determinant of performance
– 8, 16, 32, 64 bit
Address bus
LANs
WANs
High Performance Bus
High-performance Hierarchical Bus Architecture
o brings high-demand devices into closer integration with the processor and at the
same time is independent of the processor
o Changes in processor architecture do not affect the high-speed bus, and vice versa
Bus
Types
• Dedicated
– Separate data & address lines
• Multiplexed
– Shared lines
– Address valid or data valid control line
– Advantage - fewer lines
– Disadvantages
• More complex control
• Reduction in performance (cannot take place in parallel)
Bus Arbitration Methods
• Centralized
Centralized bus arbitration requires hardware (an arbiter) that
will grant the bus to one of the requesting devices. This
hardware can be part of the CPU or it can be a separate
device on the motherboard.
• Decentralized
In decentralized arbitration there is no arbiter, so the devices
have to decide among themselves who goes next. This makes the devices more
complicated, but saves the expense of having an arbiter.
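Centralized arbitration
[Figure: processor connected to DMA controller 1 and DMA controller 2 through the BR, BBSY, BG1 and BG2 lines]
[Timing diagram: BR, BG1, BG2 and BBSY against time; the bus master changes from Processor to DMA controller 2 and back to Processor]
• DMA controller 2 asserts the BR signal.
• Processor asserts the BG1 signal.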
Arbitration process:
•Each device compares the pattern that appears on the arbitration
lines to its own ID, starting with MSB.
•If it detects a difference, it transmits 0s on the arbitration lines for
that and all lower bit positions.
•Device A compares its ID 5, with pattern 0101, to the pattern 0111 on the lines.
•It detects a difference at bit position 2; as a result, it transmits a
pattern 0100 on the arbitration lines.
•The pattern that appears on the arbitration lines is the logical-OR of
0100 and 0110, which is 0110.
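The arbitration steps above can be sketched as a small simulation. It assumes 4-bit IDs, wired-OR arbitration lines, and that the second requester in the example has ID 6 (0110), which is implied by the patterns 0111 and 0110. The function name and the settle-by-iteration loop are illustrative; real hardware settles through the open-collector lines rather than in discrete rounds.

```python
def arbitrate(ids, width=4):
    """Return the pattern left on the arbitration lines once they settle."""
    driven = list(ids)            # every device starts by driving its full ID
    lines = 0
    for _ in range(width + 1):    # a few rounds are enough for the lines to settle
        lines = 0
        for pattern in driven:
            lines |= pattern      # the lines carry the logical OR of all drivers
        next_driven = []
        for dev_id in ids:
            out = dev_id
            for bit in range(width - 1, -1, -1):   # compare starting with the MSB
                mask = 1 << bit
                if (dev_id & mask) != (lines & mask):
                    # difference found: drive 0 on this and all lower positions
                    out = dev_id & ~((mask << 1) - 1)
                    break
            next_driven.append(out)
        if next_driven == driven:
            break
        driven = next_driven
    return lines


winner = arbitrate([0b0101, 0b0110])
print(f"{winner:04b}")
```

With IDs 5 and 6 the lines settle at 0110, so the device with the higher ID wins.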
Timing
• Co-ordination of events on bus
• Synchronous
– Events determined by clock signals and synchronized
on leading edge of clock
– All devices can read clock line
– Usually a single cycle for an event
Asynchronous
⮚ The occurrence of one event on a bus
depends on the occurrence of a previous event
⮚ Events on the bus are not synchronized with
the clock
Synchronous bus
[Figure: bus clock waveform with one bus cycle marked]
Synchronous bus (contd.)
[Timing diagram: bus clock, address and command, and data lines over one bus cycle, t0 to t2]
• t0: Master places the device address and command on the bus, and indicates that
it is a Read operation.
• t1: Addressed slave places data on the data lines.
• t2: Master “strobes” the data on the data lines into its input buffer, for a Read
operation.
•In case of a Write operation, the master places the data on the bus
along with the address and commands at time t0.
•The slave strobes the data into its input buffer at time t2.
Synchronous bus (contd.)
• Once the master places the device address and command on the bus, it
takes time for this information to propagate to the devices:
– This time depends on the physical and electrical characteristics of the bus.
• Also, all the devices have to be given enough time to decode the address
and control signals, so that the addressed slave can place data on the bus.
•Signals do not appear on the bus as soon as they are placed on the
bus, due to the propagation delay in the interface circuits.
•Signals reach the devices after a propagation delay which
depends on the characteristics of the bus.
•Data must remain on the bus for some time after t2 equal to
the hold time of the buffer.
Synchronous bus(contnd.)
• Most buses have control signals to represent a
response from the slave.
• Control signals serve two purposes:
– Inform the master that the slave has recognized the address, and is ready to
participate in a data transfer operation.
– Enable the duration of the data transfer operation to be adjusted based on the
speed of the participating slaves.
Synchronous bus (contd.)
Slave-ready signal is an acknowledgement from the slave to the master to confirm that the
valid data has been sent. Depending on when the slave-ready signal is asserted, the duration
of the data transfer can change.
[Timing diagram: Clock, Address, Command, Data and Slave-ready lines over clock cycles 1 to 4]
• Address & command requesting a Read operation appear on the bus.
• Slave places the data on the bus, and asserts Slave-ready signal.
• Master strobes data into the input buffer.
• Clock changes are seen by all the devices at the same time.
Synchronous Timing
Diagram
⚫ Common clock in the synchronous bus case is replaced by two timing control lines:
⚫ Master-ready,
⚫ Slave-ready.
⚫ Master-ready signal is asserted by the master to indicate to the slave that it is ready
to participate in a data transfer.
⚫ Slave-ready signal is asserted by the slave in response to the master-ready from the
master, and it indicates to the master that the slave is ready to participate in a data
transfer.
Asynchronous bus (contd.)
[Timing diagram: Address and command, Master-ready, Data and Slave-ready lines over one bus cycle, t0 to t5]
t0 - Master places the address and command information on the bus.
t1 - Master asserts the Master-ready signal. Master-ready signal is asserted
at t1 instead of t0.
t2 - Addressed slave places the data on the bus and asserts
the Slave-ready signal.
t3 - Slave-ready signal arrives at the master.
t4 - Master removes the address and command information.
t5 - Slave receives the transition of the Master-ready signal from 1 to 0. It
removes the data and the Slave-ready signal from the bus.
Asynchronous vs.
Synchronous bus
Asynchronous Timing – Read
Diagram
⚫ Low-cost bus
⚫ Processor independent
⚫ Plug-and-play capability
⚫ In today’s computers, most memory transfers involve a burst of data rather than just one
word. The PCI is designed primarily to support this mode of operation.
⚫ The bus supports three independent address spaces: memory, I/O, and configuration.
⚫ We assumed that the master maintains the address information on the bus until data transfer
is completed. But, the address is needed only long enough for the slave to be selected. Thus,
the address is needed on the bus for one clock cycle only, freeing the address lines to be used
for sending data in subsequent clock cycles. The result is a significant cost reduction.
⚫ A master is called an initiator in PCI terminology. The addressed device that responds to read
and write commands is called a target.
BITS Pilani, Deemed to be University under Section 3 of UGC
PCI Bus
Arbiter
c. At the same time, the arbiter asserts GNT-A to grant bus access to
A.
learns that it has been granted bus access. It also finds IRDY and
decision to grant the bus to B for the next transaction. It then asserts GNT-B and
deasserts
GNT-A. B will not be able to use the bus until it returns to an idle state.
f. A deasserts FRAME to indicate that the last (and only) data transfer is in
progress. It puts
the data on the data bus and signals the target with IRDY.The target reads the
data at the
g. At the beginning of clock 5, B finds IRDY and FRAME deasserted and so is able
to take
control of the bus by asserting FRAME. It also deasserts its REQ line, because it
only wants to
https://yasmin-cpu-os-
simulator.software.informer.com/6.1/
Chapter 9
• Computer Arithmetic
Arithmetic & Logic Unit (ALU)
Part of the computer that actually performs arithmetic
and logical operations on data
All of the other elements of the computer system are
there mainly to bring data into the ALU for it to process
and then to take the results back out
Based on the use of simple digital logic devices that can
store binary digits and perform simple Boolean logic
operations
Fig
9.1
1101101 = −2^(7−1)·1 + 2^5·1 + 2^4·0 + 2^3·1 + 2^2·1 + 2^1·0 + 2^0·1
= −64 + 32 + 0 + 8 + 4 + 0 + 1
= −19
Addition and Subtraction
OVERFLOW RULE: If two numbers are added, and they are both positive or
both negative, then overflow occurs if and only if the result has the opposite
sign.
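The overflow rule can be checked mechanically. The sketch below adds two signed integers in a fixed-width two's complement register and reports overflow exactly when both operands share a sign and the result has the opposite one. The 4-bit default width and the function name are assumptions made for illustration.

```python
def add_signed(a, b, bits=4):
    """Add two signed integers in a bits-wide register; return (result, overflow)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    raw = (a + b) & mask                                  # keep only the low bits
    result = raw - (1 << bits) if raw & sign else raw     # reinterpret as signed
    same_sign_operands = (a >= 0) == (b >= 0)
    overflow = same_sign_operands and ((result >= 0) != (a >= 0))
    return result, overflow


print(add_signed(5, 4))    # two positives giving a negative result
print(add_signed(5, -3))   # operands of different sign never overflow
```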
cycl
e
0 0101 1110 1011 Shift
9/4
n  Step                             Quotient  Divisor    Remainder
0  Initial value                    0000      0100 0000  0000 1001
1  1: Rem = Rem - Div               0000      0100 0000  1100 1001
   2b: Rem<0 => +Div, sll Q, Q0=0   0000      0100 0000  0000 1001
   3: Shift Div Right               0000      0010 0000  0000 1001
2  1: Rem = Rem - Div               0000      0010 0000  1110 1001
   2b: Rem<0 => +Div, sll Q, Q0=0   0000      0010 0000  0000 1001
   3: Shift Div Right               0000      0001 0000  0000 1001
3  1: Rem = Rem - Div               0000      0001 0000  1111 1001
   2b: Rem<0 => +Div, sll Q, Q0=0   0000      0001 0000  0000 1001
   3: Shift Div Right               0000      0000 1000  0000 1001
4  1: Rem = Rem - Div               0000      0000 1000  0000 0001
   2a: Rem>=0 => sll Q, Q0=1        0001      0000 1000  0000 0001
   3: Shift Div Right               0001      0000 0100  0000 0001
5  1: Rem = Rem - Div               0001      0000 0100  1111 1101
   2b: Rem<0 => +Div, sll Q, Q0=0   0010      0000 0100  0000 0001
   3: Shift Div Right               0010      0000 0010  0000 0001
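The three steps traced in the table (subtract, restore or set the quotient bit, shift the divisor right) can be written as a short loop. This is a sketch under the table's assumptions: unsigned operands, a 4-bit quotient, and the divisor starting in the upper half of an 8-bit register, so the loop runs bits + 1 times. The function name is illustrative.

```python
def restoring_divide(dividend, divisor, bits=4):
    """Unsigned restoring division; returns (quotient, remainder)."""
    mask = (1 << bits) - 1
    quotient = 0
    div = divisor << bits          # divisor starts in the upper half
    rem = dividend                 # remainder register starts as the dividend
    for _ in range(bits + 1):
        rem -= div                 # step 1: Rem = Rem - Div
        if rem < 0:
            rem += div             # step 2b: restore, shift in a 0
            quotient = (quotient << 1) & mask
        else:
            quotient = ((quotient << 1) | 1) & mask   # step 2a: shift in a 1
        div >>= 1                  # step 3: shift divisor right
    return quotient, rem


print(restoring_divide(9, 4))
```

For 9 / 4 the loop ends with quotient 0010 and remainder 0000 0001, matching the last row of the table.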
Signed Division
Remember the signs of the divisor and dividend and then negate
the quotient if the signs disagree.
Dividend = Quotient × Divisor + Remainder
Look at the example of ±11 ÷ ±5:
+11 ÷ +5: Quotient = +2, Remainder = +1
–11 ÷ +5: Quotient = –2, Remainder = –1
+11 ÷ –5: Quotient = –2, Remainder = +1
–11 ÷ –5: Quotient = +2, Remainder = –1
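The four cases follow one rule: divide the magnitudes, negate the quotient if the signs disagree, and give the remainder the sign of the dividend. A minimal sketch of that rule follows; the function name is illustrative. Note that Python's own `//` and `%` operators round toward negative infinity, so they are applied to magnitudes only here.

```python
def signed_divide(dividend, divisor):
    """Return (quotient, remainder) with the remainder taking the dividend's sign."""
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient          # signs disagree: negate the quotient
    if dividend < 0:
        remainder = -remainder        # remainder follows the dividend
    return quotient, remainder


for n, d in [(11, 5), (-11, 5), (11, -5), (-11, -5)]:
    q, r = signed_divide(n, d)
    print(f"{n:+d} / {d:+d}: quotient = {q:+d}, remainder = {r:+d}")
```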
Biased Exponent Representation
How to represent a signed exponent? Choices are …
Sign + magnitude representation for the exponent
Two’s complement representation
Biased representation
IEEE 754 uses biased representation for the exponent
Value of exponent
= val(E) = E – Bias (Bias is a constant)
= (1.01001100 … 0)₂ × 2^3 = (1010.01100 … 0)₂
Converting FP Decimal to Binary
Solution:
Fraction bits can be obtained using multiplication by 2:
0.8125 × 2 = 1.625
0.625 × 2 = 1.25
0.25 × 2 = 0.5
0.5 × 2 = 1.0
0.8125 = (0.1101)₂ = ½ + ¼ + 1/16 = 13/16
Fraction = (0.1101)₂ = (1.101)₂ × 2^–1 (Normalized)
1 01111110 10100000000000000000000
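The repeated multiplication by 2 can be sketched in a few lines, and the final 32-bit pattern can be cross-checked with Python's standard `struct` module. The problem statement is missing from the slide; the sign bit of 1 in the bit pattern suggests the value being encoded is −0.8125, which is treated as an assumption here. The function names are illustrative.

```python
import struct


def fraction_bits(fraction, max_bits=23):
    """Binary digits of a fraction in [0, 1), found by repeated doubling."""
    bits = []
    while fraction and len(bits) < max_bits:
        fraction *= 2
        bit = int(fraction)        # the integer part is the next bit
        bits.append(str(bit))
        fraction -= bit            # keep only the fractional part
    return "".join(bits)


def float32_bits(x):
    """IEEE 754 single-precision bit pattern of x as a 32-character string."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{raw:032b}"


print(fraction_bits(0.8125))
pattern = float32_bits(-0.8125)
print(pattern[0], pattern[1:9], pattern[9:])   # sign, exponent, fraction
```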
Largest Normalized Float
Zero
Exponent field E = 0 and fraction F = 0
+0 and – 0 are possible according to
sign bit S
Infinity
Infinity is a special value represented with maximum
E and
F=0
For single precision with 8-bit exponent: maximum E
= 255
Infinity can result from overflow or division by zero
+∞ and – ∞ are possible according to sign bit S
+ 1.11100100000000000000010 × 2^4
+ 0.01100000000000001100001 01 × 2^4 (shift right)
+10.01000100000000001100011 01 × 2^4 (result)
Consider multiplying:
–1.110 1000 0100 0000 1010 0001₂ × 2^–4
× 1.100 0000 0001 0000 0000 0000₂ × 2^–2
Unlike addition, we add the exponents of the operands
Result exponent value = (–4) + (–2) = –6
Using the biased representation: EZ = EX + EY – Bias
EX = (–4) + 127 = 123 (Bias = 127 for single precision)
EY = (–2) + 127 = 125
EZ = 123 + 125 – 127 = 121 (value = –6)
Sign bit of product can be
FP Multiplication (contd.)
Now multiply the significands:
(Multiplicand)   1.11010000100000010100001
(Multiplier)   × 1.10000000001000000000000
Partial products (one per 1 bit of the multiplier):
111010000100000010100001
111010000100000010100001
1.11010000100000010100001
Product:
10.1011100011111011111100110010100001000000000000
24 bits × 24 bits → 48 bits (double number of bits)
Multiplicand × 0 = 0 Zero rows are eliminated
Multiplicand × 1 = Multiplicand (shifted left)
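The exponent arithmetic and the 24-bit by 24-bit significand product in this example can be reproduced with integer arithmetic. The sketch treats each significand as a 24-bit integer (hidden 1 followed by the 23 fraction bits), so the 48-bit product has its binary point after the second bit. Variable names are illustrative, and normalization and rounding of the product are not shown.

```python
BIAS = 127  # single precision

# Biased exponents of the operands and of the product
ex = -4 + BIAS
ey = -2 + BIAS
ez = ex + ey - BIAS

# Significands as 24-bit integers: hidden 1 plus 23 fraction bits
multiplicand = int("1" + "11010000100000010100001", 2)
multiplier = int("1" + "10000000001" + "0" * 12, 2)

product = multiplicand * multiplier
product_bits = f"{product:048b}"        # 24 bits x 24 bits gives 48 bits

print(ex, ey, ez)
print(product_bits[:2] + "." + product_bits[2:])
```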
Single: (1 + 8 + 23)
Double: (1 + 11 + 52)
Program-controlled Input/Output
operations.
Memory Location, Addresses, and Operation
Memory consists of many millions of storage cells, each of which can store 1 bit.
Data is usually accessed in n-bit groups. n is called word length.
[Figure 2.5. Memory words: first word, second word, …, i-th word, …, last word, each n bits wide]
Memory Location, Addresses, and Operation
Sign bit: b31 = 0 for positive numbers
b31 = 1 for negative numbers
(a) A signed integer
1K (kilo) = 2^10
1T (tera) = 2^40
[Figure: byte address assignments within words, for word addresses 0, 4, …, 2^k – 4]
Three-Address Instructions
▪ADD R1, R2, R3    R1 ← R2 + R3 or R3 ← R1 + R2
Two-Address Instructions
▪ADD R1, R2    R1 ← R1 + R2
One-Address Instructions
▪ADD M    AC ← AC + M[AR]
Zero-Address Instructions
▪ADD    TOS ← TOS + (TOS – 1)
RISC Instructions
▪Lots of registers. Memory is restricted to Load & Store
Instruction Formats
Example: Evaluate (A+B) * (C+D)
Three-Address
1. ADD R1, A, B ; R1 ← M[A] + M[B]
2. ADD R2, C, D ; R2 ← M[C] + M[D]
3. MUL X, R1, R2 ; M[X] ← R1 * R2
Example: Evaluate (A+B) * (C+D)
Two-Address
1. MOV R1, A ; R1 ← M[A]
2. ADD R1, B ; R1 ← R1 + M[B]
3. MOV R2, C ; R2 ← M[C]
4. ADD R2, D ; R2 ← R2 + M[D]
5. MUL R1, R2 ; R1 ← R1 * R2
6. MOV X, R1 ; M[X] ← R1
halfwords to words.
Example: A − B with A = 11110000 and B = 00010100
A:      11110000
+(−B):  11101100
        11011100
C=1 Z=0 S=1 V=0
Overflow can only happen when adding two numbers of the same sign
and getting a different sign. So, to detect overflow we don't care about
any bits except the sign bits. Ignore the other bits.
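The flag values in the example can be reproduced with a small sketch that performs A − B as A + (−B) in an 8-bit register, as on the slide. C is taken here as the carry out of that addition, and V is set only when the two added values share a sign that the result does not. The function name and the dictionary layout are illustrative.

```python
def subtract_with_flags(a, b, bits=8):
    """Compute a - b as a + (-b) in a bits-wide register; return (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    neg_b = (~b + 1) & mask            # two's complement of b
    total = (a & mask) + neg_b
    result = total & mask
    flags = {
        "C": int(total > mask),                       # carry out of the top bit
        "Z": int(result == 0),                        # result is zero
        "S": int(bool(result & sign)),                # sign bit of the result
        "V": int((a & sign) == (neg_b & sign)         # same-sign operands ...
                 and (result & sign) != (a & sign)),  # ... different-sign result
    }
    return result, flags


result, flags = subtract_with_flags(0b11110000, 0b00010100)
print(f"{result:08b}", flags)
```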
a + (b × c) becomes a b c × +
(a + b) × c becomes a b + c ×
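Postfix expressions like the two above are evaluated with a stack: operands are pushed, and each operator pops its two operands and pushes the result. The sketch below assumes space-separated tokens, accepts "x", "×" or "*" for multiplication, and looks operand names up in a dictionary of sample values. All names are illustrative.

```python
def eval_postfix(expression, values):
    """Evaluate a space-separated postfix expression using a stack."""
    stack = []
    for token in expression.split():
        if token == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif token in ("x", "×", "*"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            stack.append(values[token])   # operand: push its value
    return stack.pop()


values = {"a": 2, "b": 3, "c": 4}
print(eval_postfix("a b c × +", values))   # a + (b × c)
print(eval_postfix("a b + c ×", values))   # (a + b) × c
```

A zero-address machine works the same way, with PUSH supplying operands and ADD or MUL acting on the top of the stack.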
Instruction Formats
Example: Evaluate (A+B) * (C+D)
Zero-Address
1. PUSH A ; TOS ← A
2. PUSH B ; TOS ← B
3. ADD ; TOS ← (A + B)
4. PUSH C ; TOS ← C
5. PUSH D ; TOS ← D
6. ADD ; TOS ← (C + D)
7. MUL ; TOS ← (C+D)*(A+B)
8. POP X ; M[X] ← TOS
Addressing Modes
• 8086 Intel Microprocessor Architecture
Implied
▪AC is implied in “ADD M[AR]” in “One-Address” instr.
▪TOS is implied in “ADD” in “Zero-Address” instr.
Immediate
▪The use of a constant in “MOV R1, 5”, i.e. R1 ← 5
Register
▪Indicate which register holds the operand
Addressing Modes
Register Indirect
▪Indicate the register that holds the number of the register that holds the operand
MOV R1, (R2)
Autoincrement / Autodecrement
▪Access & update in 1 instr.
Direct Address
▪Use the given address to access a memory location
[Figure: registers R1, R2 = 3, R3 = 5; AR = 101; memory locations 100 to 104, with location 101 holding 0104 and location 104 holding 110A]
[Figure: AR = 100 added to a value that could be positive or negative (2’s complement); memory locations 100 to 104, with location 102 holding 110A]
Addressing Modes
Indexed
▪EA = Index Register + Relative Addr
▪Useful with “Autoincrement” or “Autodecrement”
▪Could be Positive or Negative (2’s Complement)
[Figure: XR = 2 added to AR = 100; memory locations 100 to 104, with location 102 holding 110A]
Addressing Modes
Base Register
▪EA = Base Register + Relative Addr
▪The relative address (AR = 2) could be positive or negative (2’s complement)
▪The base register (BR = 100) usually points to the beginning of an array
Address  Contents
100      0005
101      0012
102      000A
103      0107
104      0059
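The effective-address calculations for the base-register and indexed modes are the same addition with different registers. The sketch below reuses the memory contents from the base-register figure, reading them as hexadecimal, which is an assumption. The helper names are hypothetical and memory is modelled as a plain dictionary.

```python
# Memory contents from the base-register figure (assumed to be hexadecimal)
memory = {100: 0x0005, 101: 0x0012, 102: 0x000A, 103: 0x0107, 104: 0x0059}


def direct(address):
    """Direct addressing: the instruction holds the operand's address."""
    return address


def base_register(base, relative):
    """Base-register addressing: EA = Base Register + Relative Addr."""
    return base + relative


def indexed(index, relative):
    """Indexed addressing: EA = Index Register + Relative Addr."""
    return index + relative


ea = base_register(base=100, relative=2)
print(ea, f"{memory[ea]:04X}")            # BR = 100, AR = 2

# A negative relative address steps backwards from the base
print(base_register(base=104, relative=-3))
```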
Offset
Register addressing
Branch
Immediate
– On word boundary
– Effective range +/-32MB from PC.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Load/Store
Multiple Addressing
Load/store subset of general-purpose registers
before
Instruction Length
Affected by and affects:
Memory size
Memory organization
Bus structure
CPU complexity
CPU speed
Trade off between powerful instruction repertoire and
saving space
Allocation of Bits
Number of addressing modes
Number of operands
Register versus memory
Number of register sets
Address range
Address granularity
Can also be downloaded from the below link and installed on your
PC, it is a free tool
STB #51,@R01
Opcode
Operand
Direct
operand references
Eg: Add A 87H
or Add A B
Next Instruction
Instruction Set Design Issue
Type of Operation ?
Kind of Data ?
Instruction Format ?
Number of internal Registers in
CPU ?
Addressing Modes?
▪Compilers
▪I/O routines
Sol:
a. 20  b. 40  c. 60  d. 30  e. 50  f. 70
In class example 4
Consider a 16-bit processor in which the following appears in main memory, starting at
location 200:
The first part of the first word indicates
that this instruction loads a value into
an accumulator.
The Mode field specifies an addressing mode and, if appropriate, indicates a source register;
assume that when used, the source register is R1, which has a value of 400. There is also a
base register that contains the value 100. The value of 500 in location 201 may be part of the
address calculation. Assume that location 399 contains the value 999, location 400 contains
the value 1000, and so on. Determine the effective address and the operand to be loaded for
the following address modes:
X = (A + B × C) ∕ (D − E × F)
[Figure: stack contents, top of stack first, as the expression is evaluated]
C, B, A
B×C, A
E×F, D, A+B×C
D−E×F, A+B×C
(A+B×C) ∕ (D−E×F)
DSRao
COA -IS ZC353
Memory
Characteristics of memory
systems
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical
characteristics
        Capacity        Speed (latency)
Logic:  2x in 3 years   2x in 3 years
DRAM:   4x in 3 years   2x in 10 years
Disk:   4x in 3 years   2x in 10 years

DRAM
Year   Size     Cycle Time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1995   64 Mb    120 ns
1998   256 Mb   100 ns
2001   1 Gb     80 ns
Size: 1000:1!   Cycle Time: 2:1!
[Figure: Processor-Memory Performance Gap (grows 50% / year); performance on a scale of 1 to 100 against time, 1980 to 2000; DRAM improves 9%/yr. (2X/10 yrs)]
Location
• Internal (main)
– CPU : In the form of registers, control unit memory
– Cache , main memory
• External (secondary)
[Figure: CPU containing the ALU, Registers, Control Unit with control memory, and Internal CPU Interconnection (Bus)]
• Direct
– Individual blocks have unique address
– Access is by jumping to vicinity plus sequential
search
– Access time is variable i.e. access time
depends on location and previous location
Access Methods (2)
Random
– Addressable locations are identified by individual
addresses
– Access time is independent of location or previous
access
– e.g. RAM
Associative
– A word is retrieved based on a portion of its
content rather than its address.
– Access time is independent of location or previous
access
– e.g. cache
Performance
Access time or Latency
– Time between presenting the address and
getting the valid data
Memory Cycle time
– Memory Cycle time = access time +
Additional time
– Additional time may be required for transients
to die out on signal lines or to generate data if
they are read destructively
Transfer Rate
– Rate at which data can be transferred into
or out of a memory unit
[Figure: levels in the memory hierarchy. Level 1 is closest to the CPU (processor); distance from the CPU in access time increases toward level n; data are transferred between adjacent levels.]
[Figure: processor (control, datapath, registers, on-chip cache), second-level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (tape).]
Performance of accesses
involving only level 1 (hit ratio)
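[Figure: direct-mapped cache. The address is split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset; each cache entry holds a valid bit, a tag, and data; a tag match on a valid entry signals a hit.]
▪ We need to address 1024 (2^10) words, so the index is 10 bits
▪ We could have any of 2^20 words per cache location
▪ Tag size is usually indicated by: address size − (log2(memory size) + 2), where memory size is the number of words the cache holds
▪ E.g. 32 − (10 + 2) = 20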
Elements of Cache Design
Cache Addresses
– Logical (also known as a virtual cache)
– Physical
Cache Size
Mapping Function
– Direct
– Associative
– Set Associative
Replacement Algorithm
– Least recently used (LRU)
– First in first out (FIFO)
– Least frequently used (LFU)
– Random
Write Policy
– Write through
– Write back
– Write once
Line Size
Number of caches
– Single or two level
– Unified or split
Size does matter
Cost
• Small enough so that the overall average cost per bit is close to that of main memory alone.
Speed
• Large enough so that the overall average access time is close to that of cache alone.
cache miss: A request for data from the cache that cannot be fulfilled
because the data is not present in the cache
CPU time =
(CPU execution clock cycles + Memory-stall clock cycles) × Clock
cycle time
• 32-word memory
• 8-word cache
• (The addresses below are word addresses.)

Index   Valid   Tag   Data
000
001
010
011
100
101
110
111

Address   Binary   Cache block   Hit or miss
22        10110    110
26        11010    010
22        10110    110
26        11010    010
16        10000    000
3         00011    011
16        10000    000
18        10010    010
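The hit or miss column of the trace above can be worked out mechanically. The following is a minimal sketch, assuming the cache starts empty and each block holds one word; the function name `simulate_direct_mapped` is illustrative and not part of the slides.

```python
def simulate_direct_mapped(addresses, num_blocks):
    """Replay word addresses through a direct-mapped cache of one-word blocks."""
    tags = [None] * num_blocks          # None marks an invalid (empty) entry
    results = []
    for addr in addresses:
        index = addr % num_blocks       # low-order bits select the cache block
        tag = addr // num_blocks        # remaining high-order bits form the tag
        hit = tags[index] == tag
        if not hit:
            tags[index] = tag           # on a miss the new word replaces the old one
        results.append((addr, index, "hit" if hit else "miss"))
    return results


trace = [22, 26, 22, 26, 16, 3, 16, 18]
results = simulate_direct_mapped(trace, num_blocks=8)
for addr, index, outcome in results:
    print(f"{addr:2d}  {addr:05b}  {index:03b}  {outcome}")
```

Under these assumptions the last access (18) misses because it maps to cache block 010, which still holds the word from address 26.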
cache with four byte blocks, if the cache contains 64 Kbytes of data and the
address length of the
With 32-byte blocks, the 64 Kbyte cache contains a total of 64K/32 = 2K blocks.
The cache has the following parameters:
(a) Byte select size: 5 bits (since 2^5 = 32 bytes/block)
(b) Cache index size: 11 bits (since 2^11 = 2K blocks)
(c) Cache tag size: 16 bits (remaining bits from 32)
(d) Block size: 256 bits (32 bytes)
(e) Number of blocks: 2K blocks (cache size/block size)
cache bits = number of blocks × (block size + tag size + 1)
= 2^11 × (256 + 16 + 1)
= 559,104 bits
Note: Increasing the block size tends to decrease the number of bits needed to implement
the cache
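The bit count above can be checked with a short calculation. This is a sketch only, assuming a direct-mapped cache, a 32-bit address, one valid bit per block, and power-of-two sizes; `direct_mapped_cache_bits` is a made-up helper name.

```python
def direct_mapped_cache_bits(data_bytes, block_bytes, address_bits=32):
    """Total storage bits (data + tag + valid) of a direct-mapped cache."""
    num_blocks = data_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = num_blocks.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1   # data + tag + valid bit
    return num_blocks * bits_per_block


bits_32_byte_blocks = direct_mapped_cache_bits(64 * 1024, 32)
bits_4_byte_blocks = direct_mapped_cache_bits(64 * 1024, 4)
print(bits_32_byte_blocks, bits_4_byte_blocks)
```

Running the same function with 4-byte blocks gives a larger total, which illustrates the note that larger blocks tend to need fewer bits overall.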
Miss rate versus block size
Explore?
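Address of the block is
$$\text{Block address} = \left\lfloor \frac{\text{Byte address}}{\text{Bytes per block}} \right\rfloor$$
This block contains all addresses between
$$\left\lfloor \frac{\text{Byte address}}{\text{Bytes per block}} \right\rfloor \times \text{Bytes per block}$$
and
$$\left\lfloor \frac{\text{Byte address}}{\text{Bytes per block}} \right\rfloor \times \text{Bytes per block} + (\text{Bytes per block} - 1)$$
With blocks of 8 bytes and byte address 600,
$$\frac{600}{8} = 75$$
is the block address.
This maps to cache block (75 modulo 32) = 11.
In fact this block contains all addresses between 600 and 607.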
Direct mapping Summary
Cache Line   Main Memory blocks assigned
0            0, m, 2m, …, 2^s − m
.            .
.            .
.            .
JEQ 28 instruction?
Configure the cache
Block Size = 4
Cache Type = Direct Mapped
Cache Size = 16
Write Policy = Write-Back
Fully associative:
Block 12 can go anywhere
Direct mapped:
Block no. = (Block address) mod (No. of blocks in cache)
Block 12 can go only into block 4 (12 mod 8 = 4)
=> Access block using lower 3 bits
Assume there are three small caches, each consisting of four one-word blocks. One cache is fully associative, a second is two-way set-associative, and the third is direct-mapped. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, and 8.

Direct-mapped cache
Cache block = (Block number) modulo (Number of blocks in the cache)

Block address   Cache block   Hit/miss   Cache content after access
0               0             miss       Mem[0]
8               0             miss       Mem[8]
0               0             miss       Mem[0]
6               2             miss       Mem[0] Mem[6]
8               0             miss       Mem[8] Mem[6]

Two-way set-associative cache
For two-way set-associative we have 2 sets (0, 1).

Block address   Cache set
0               (0 modulo 2) = 0
6               (6 modulo 2) = 0
8               (8 modulo 2) = 0

Set-associative caches usually replace the least recently used block within a set.

Block address   Set   Hit/miss   Cache content after access
0               0     miss       Mem[0]
8               0     miss       Mem[0] Mem[8]
0               0     hit        Mem[0] Mem[8]
6               0     miss       Mem[0] Mem[6]
8               0     miss       Mem[8] Mem[6]

Fully associative cache

Block address   Hit/miss   Cache content after access
0               miss       Mem[0]
8               miss       Mem[0] Mem[8]
0               hit        Mem[0] Mem[8]

What happens when the cache size is 8 blocks?

Block address   Hit/miss   Blocks in cache after access
0               M          0
8               M          0 8
0               H          0 8
6               M          0 8 6
8               H          0 8 6

There would be no replacements in the two-way set-associative cache, and it would have the same number of misses as the fully associative cache.
Check for yourself: if we had 16 blocks, all 3 caches would have the same number of misses.
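The miss counts for the three organizations can be reproduced with a small simulator. This is a sketch under stated assumptions: one-word blocks, an initially empty cache, and LRU replacement within each set (the slides name LRU only for the set-associative case). The helper `count_misses` is illustrative.

```python
def count_misses(block_addresses, num_blocks, ways):
    """Count misses for a cache of one-word blocks with LRU replacement.

    ways=1 models a direct-mapped cache; ways=num_blocks models a fully
    associative cache.
    """
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU first
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)            # hit: re-appended below as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)               # evict the least recently used block
        lines.append(block)
    return misses


sequence = [0, 8, 0, 6, 8]
for name, ways in [("direct-mapped", 1), ("two-way", 2), ("fully associative", 4)]:
    print(name, count_misses(sequence, num_blocks=4, ways=ways))
```

Calling `count_misses(sequence, num_blocks=8, ways=2)` or `count_misses(sequence, num_blocks=16, ways=1)` is one way to carry out the "check for yourself" step.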
Q2: How Is a Block Found?
The address can be divided into two main parts
▪ Block offset: selects the data from the block
offset size = log2(block size)
▪ Block address: tag + index
– index: selects set in cache
index size = log2(#blocks/associativity)
– tag: compared to tag in cache to determine hit
tag size = address size − index size − offset size
Each block has a valid bit that tells if the block is valid; the block is in the cache if the tags match and the valid bit is set.
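The field sizes above translate directly into shifts and masks. The sketch below assumes power-of-two block size and set count; the function name `split_address` and the sample address are illustrative only.

```python
def split_address(address, block_bytes, num_blocks, associativity=1):
    """Split a byte address into (tag, index, offset) fields."""
    num_sets = num_blocks // associativity
    offset_bits = block_bytes.bit_length() - 1    # log2(block size)
    index_bits = num_sets.bit_length() - 1        # log2(#blocks / associativity)
    offset = address & (block_bytes - 1)
    index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)   # whatever bits remain
    return tag, index, offset


# 64 Kbyte direct-mapped cache with 32-byte blocks (2K blocks), 32-bit address
tag, index, offset = split_address(0x12345678, block_bytes=32, num_blocks=2048)
print(hex(tag), index, offset)
```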
With two levels of caching, a miss in the primary (or first-level) cache can be satisfied either by the secondary cache or by main memory. The miss penalty for an access to the second-level cache is
$$\frac{5\ \text{ns}}{0.25\ \text{ns per clock cycle}} = 20\ \text{clock cycles}$$
Thus, for a two-level cache, total CPI is the sum of the stall
cycles from both levels of cache and the base CPI:
[Figure: miss rate (0% to 15%) versus associativity (one-way, two-way, four-way, eight-way), with one curve per cache size: 1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB.]
To further reduce the gap between fast clock rates of CPUs and the relatively long time
to access memory additional levels of cache are used (level two and level three
caches).
The primary cache is optimized for a fast hit time, which implies a relatively small size.
A secondary cache is optimized to reduce the miss rate and the penalty of going to main memory.
Example:
Assume CPI = 1 (with all hits) and 5 GHz clock
100 ns main memory access time
2% miss rate for primary cache
Secondary cache with 5 ns access time and miss rate of .5%
What is the total CPI with and without secondary cache?
Reducing the Miss Penalty using Multilevel Caches
The miss penalty to main memory: 100 ns / 0.2 ns per cycle = 500 cycles
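The example can be carried through numerically. This is a sketch only, and it rests on two assumptions that the slides do not state: miss rates are counted per instruction, and the 0.5% figure is the fraction of instructions that miss in both caches and go to main memory. All variable names are illustrative.

```python
def to_cycles(time_ns, clock_ghz):
    """Convert a latency in ns to clock cycles (a GHz clock ticks once per 1/GHz ns)."""
    return time_ns * clock_ghz


base_cpi = 1.0
clock_ghz = 5
main_memory_penalty = to_cycles(100, clock_ghz)   # 500 cycles
secondary_penalty = to_cycles(5, clock_ghz)       # 25 cycles at 5 GHz

primary_miss_rate = 0.02      # assumed: misses per instruction
global_miss_rate = 0.005      # assumed: instructions that reach main memory

# Every primary miss goes to main memory when there is no secondary cache.
cpi_without_secondary = base_cpi + primary_miss_rate * main_memory_penalty

# Total CPI = base CPI + primary stalls + secondary stalls.
cpi_with_secondary = (base_cpi
                      + primary_miss_rate * secondary_penalty
                      + global_miss_rate * main_memory_penalty)

print(cpi_without_secondary, cpi_with_secondary)
```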
With no cache
Fetch time = (10 passes) (68 blocks/pass) (10T/block) = 6800T
With cache
Fetch time = (68) (11T) [first pass] + (9) (48) (T) + (9) (20) (11T) [other passes] = 3160T
Modern Systems
External Device
An external device connects to the computer by a link to an I/O module
• Isolated I/O
o Separate address spaces
o Need I/O or memory select lines
o Special commands for I/O
– Limited set
[Figure: program timeline in which an interrupt occurs between instruction i and instruction i+1.]
CPU Viewpoint
• Issue read command
• Do other work
• Check for interrupt at end of each instruction cycle
• If interrupted:
o Save context (registers)
o Process interrupt
– Fetch data & store
Changes in Memory and Registers for an Interrupt
Daisy-chaining interrupts at the same level of priority
• Example - PC Bus
• 80x86 has one interrupt line
• 8086 based systems use one 8259A interrupt controller
• 8259A has 8 interrupt lines
o CPU is tied up
How many cycles per second does the processor spend handling I/O from
Sol: The device makes 150 requests, each of which requires one interrupt. Each interrupt takes 12,000 cycles (1000 to start the handler, 10,000 for the handler, 1000 to switch back to the original program), for a total of 1,800,000 cycles spent handling
Example (continued)
b. How many cycles per second are spent on I/O if polling is used (include all polling
attempts)? Assume the processor only polls during time slices when user programs
are not running, so do not include any context-switch time in your calculation
Sol: The processor polls every 0.5 ms, or 2000 times/s. Each polling attempt takes
500 cycles, so it spends 1,000,000 cycles/s polling. In 150 of the polling attempts,
a request is waiting from the I/O device, each of which takes 10,000 cycles to
complete for another 1,500,000 cycles.
Therefore, the total time spent on I/O each second is 2,500,000 cycles with polling.
c. How often would the processor have to poll for polling to take as many cycles per second as interrupts?
Sol: In the polling case, the 150 polling attempts that find a request waiting consume 1,500,000 cycles. Therefore, for polling to match the interrupt case, an additional 300,000 cycles are available for the polling attempts themselves; at 500 cycles per attempt, this is 600 polls per second.
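The three parts of this example reduce to a few lines of integer arithmetic. The sketch below only restates the figures given in the solutions; the constant names are made up for illustration.

```python
REQUESTS_PER_SECOND = 150
CYCLES_PER_INTERRUPT = 1000 + 10_000 + 1000   # start handler + handler + switch back
POLLS_PER_SECOND = 2000                       # one poll every 0.5 ms
CYCLES_PER_POLL = 500
CYCLES_PER_SERVICE = 10_000                   # work done when a request is waiting

# (a) interrupt-driven I/O
interrupt_cycles = REQUESTS_PER_SECOND * CYCLES_PER_INTERRUPT

# (b) polling: every attempt costs 500 cycles, 150 of them also do the service work
polling_cycles = (POLLS_PER_SECOND * CYCLES_PER_POLL
                  + REQUESTS_PER_SECOND * CYCLES_PER_SERVICE)

# (c) polling rate at which polling costs the same as interrupts
spare_cycles = interrupt_cycles - REQUESTS_PER_SECOND * CYCLES_PER_SERVICE
break_even_polls_per_second = spare_cycles // CYCLES_PER_POLL

print(interrupt_cycles, polling_cycles, break_even_polls_per_second)
```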
Example 2.
An I/O device transfers 10 MB/s of data into the memory of a processor over the I/O bus, which has a total data transfer capacity of 100 MB/s. The 10 MB/s of data is transferred as 2500 independent pages of 4 KB each. If the processor operates at 200 MHz, it takes 1000 cycles to initiate a DMA transaction, and 1500 cycles to respond to the device's interrupt when the DMA transfer completes, what fraction of the processor's time is spent handling
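A possible way to work this out, assuming the question asks for the share of processor cycles spent on DMA setup and completion interrupts, and assuming each of the 2500 pages per second is one DMA transaction. The names below are illustrative.

```python
PAGES_PER_SECOND = 2500
CYCLES_TO_INITIATE = 1000
CYCLES_FOR_COMPLETION_INTERRUPT = 1500
PROCESSOR_HZ = 200_000_000

# Each page costs one DMA setup plus one completion interrupt.
cycles_per_second = PAGES_PER_SECOND * (CYCLES_TO_INITIATE
                                        + CYCLES_FOR_COMPLETION_INTERRUPT)
fraction = cycles_per_second / PROCESSOR_HZ

print(f"{cycles_per_second} cycles/s, {fraction:.3%} of processor time")
```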
[Figure: disk layout with labelled positions sector 0 of track 0, sector 0 of track 1, and sector 2 of track n.]
Seek time: the time required to move the disk arm to the required track.
Rotational delay: Disks, other than floppy disks, rotate at speeds ranging from 3600 rpm (for handheld devices such as digital cameras) up to, as of this writing, 20,000 rpm;
Transfer time: The transfer time to or from the disk depends on the rotation speed of the disk:
$$T = \frac{b}{rN}$$
T = transfer time; b = number of bytes to be transferred; N = number of bytes on a track; r = rotation speed, in revolutions per second
$$T_{total} = T_s + \frac{1}{2r} + \frac{b}{rN}$$
where T_s is the average seek time.
First, let us assume that the file is stored as compactly as possible on the disk. That is, the
file occupies all of the sectors on 5 adjacent tracks (5 tracks * 500 sectors/track = 2500
sectors). This is known as sequential organization
Average seek 4 ms; average rotational delay 2 ms
Read 500 sectors = b/rN = (512 × 500 × 60)/(15000 × 500 × 512) s = 4 ms; thus total = 10 ms
Suppose that the remaining tracks can now be read with essentially no seek time.
That is, the I/O operation can keep up with the flow from the disk. Then, at most, we
need to deal with rotational delay for the four remaining tracks. Thus each successive
track is read in 2 + 4 = 6 ms.
To read the entire file, Total time = 10 + (4 * 6) = 34 ms = 0.034 seconds
Now let us calculate the time required to read the same data using random access rather
than sequential access;
that is, accesses to the sectors are distributed randomly over the disk.
For each sector, we have
Average seek 4 ms
Rotational delay 2 ms
Read 1 sector = b/rN = (512 × 60)/(15000 × 500 × 512) s = 0.008 ms
which is 6.008 ms per sector.
Total time = 2500 × 6.008 = 15,020 ms = 15.02 seconds
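The sequential and random timings can be recomputed from T = b/rN. This is a minimal sketch using the figures from the example (15,000 rpm, 500 sectors of 512 bytes per track, 4 ms average seek); the helper name `transfer_ms` is an assumption of this sketch.

```python
def transfer_ms(num_bytes, rpm, bytes_per_track):
    """Transfer time T = b / (r * N) in milliseconds, with r in revolutions per second."""
    revs_per_second = rpm / 60
    return num_bytes / (revs_per_second * bytes_per_track) * 1000


RPM = 15_000
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 500
TRACK_BYTES = SECTOR_BYTES * SECTORS_PER_TRACK
SEEK_MS = 4.0
rotational_delay_ms = 0.5 / (RPM / 60) * 1000        # half a revolution

# Sequential: 5 adjacent tracks, seek only before the first one
track_read_ms = transfer_ms(TRACK_BYTES, RPM, TRACK_BYTES)
first_track_ms = SEEK_MS + rotational_delay_ms + track_read_ms
sequential_ms = first_track_ms + 4 * (rotational_delay_ms + track_read_ms)

# Random: every one of the 2500 sectors pays seek and rotational delay
sector_read_ms = transfer_ms(SECTOR_BYTES, RPM, TRACK_BYTES)
random_ms = 2500 * (SEEK_MS + rotational_delay_ms + sector_read_ms)

print(sequential_ms, random_ms)
```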
Access Time
Seek time: 5 to 8 ms (time required to move the read/write head to the proper track)
Rotational delay or latency time: time for the correct sector to come under the head on the track. On average, this is the time for half a rotation of the disk.
Access time is the sum of these delays.
Disk capacity (of a 3.5 inch diameter disk)
20 data recording surfaces
15000 tracks / surface
400 sectors per track
GBytes
Typical average seek time: 6 ms
Disks rotate at 10,000 rpm, so latency = 60 / (10000 × 2) s = 3 ms
Access time = 6 ms + 3 ms = 9 ms
Some basic concepts
• Maximum size of the main memory is determined by the addressing scheme
• Byte-addressable (big endian or little endian)
• CPU-Main Memory Connection
[Figure: the processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W, MFC, etc.) run between processor and memory. The memory has up to 2^k addressable locations; word length = n bits.]
Big Endian or Little Endian
2^32 = 4 G memory locations
[Figure: memory cell with transistors T1 and T2, nodes X and Y, a word line, and bit lines.]
The figure shows a simplified timing diagram for a DRAM read operation over a bus. The access time is considered to last from t1 to t2. Then there is a recharge time, lasting from t2 to t3, during which the DRAM chips will have to recharge before the processor can access them again.
a. Assume that the access time is 60 ns and the recharge time is 40 ns. What is the memory cycle time? What is the maximum data rate this DRAM can sustain, assuming a 1-bit output?
b. Constructing a 32-bit wide memory system using these chips yields what data transfer rate?
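One way to work parts (a) and (b), using the earlier definition that memory cycle time is access time plus the additional (here, recharge) time. The sketch assumes one transfer per memory cycle; variable names are illustrative.

```python
ACCESS_NS = 60
RECHARGE_NS = 40

cycle_time_ns = ACCESS_NS + RECHARGE_NS               # memory cycle time
accesses_per_second = 1_000_000_000 // cycle_time_ns  # one access per cycle

# (a) 1-bit output: one bit per memory cycle
rate_1bit_bps = accesses_per_second * 1

# (b) 32 chips side by side deliver 32 bits per memory cycle
rate_32bit_bps = accesses_per_second * 32
rate_32bit_bytes_per_second = rate_32bit_bps // 8

print(cycle_time_ns, rate_1bit_bps, rate_32bit_bps, rate_32bit_bytes_per_second)
```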
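[Figure: memory controller between processor and DRAM. The processor supplies address, R/W, request, and clock; the controller drives row/column address, RAS, CAS, R/W, CS, and clock to the memory; data lines run between processor and memory.]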
• Consecutive words are located in consecutive modules.
• Consecutive addresses can be located in consecutive modules.
• While transferring a block of data, several memory modules can be kept busy at the same time.
[Figure: memory modules 0 through 2^k − 1, each with its own address buffer register (ABR) and data buffer register (DBR).]
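Placing consecutive addresses in consecutive modules corresponds to using the low-order k address bits as the module number. The sketch below assumes 2^k modules and that low-order interleaving scheme; the function name `interleave` is hypothetical.

```python
def interleave(address, k):
    """Map an address to (module, address within module) with 2**k modules."""
    module = address & ((1 << k) - 1)   # low-order k bits pick the module
    within_module = address >> k        # remaining bits address the word inside it
    return module, within_module


# Eight consecutive addresses spread over 4 modules (k = 2)
placement = [interleave(addr, k=2) for addr in range(8)]
print(placement)
```

Because a block of consecutive addresses touches every module in turn, the modules can work on different words of the block at the same time.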
Example (continued)
A disk has 24 recording surfaces, each of which has 14000 tracks. There is an average of 400 sectors per track. Each sector contains 512 bytes of data.
Need 9 bits to identify a sector, 14 bits for a track, and 5 bits for a surface.
Thus, a possible scheme is to use address bits A8-0 for sector, A22-9 for track, and A27-23 for surface.
Example
Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64 X 1 bit. Give the array
configuration of the chips on the memory board showing all required input and output signals for
assigning this memory to the lowest address space. The design should allow for both byte and 16-bit
word accesses.
Sol:
Number of addressable 16-bit locations = 8192/16 = 512
Number of 64 × 1 chips required = 8192/64 = 128 chips
There will be 128/16 = 8 rows of 16 chips (one chip per data bit)
The 512 locations can be arranged as 8 rows × 64 columns
For 8 rows, number of decoder bits = 3
For 64 columns, number of decoder bits = 6
• Out of the total memory cycle time of 250 ns, the refresh operation consumes (150 ns × 100) / 250 ns = 60%
Since the clock rate is 100 MHz, the cycle time is:
1/(100 MHz) = 10 ns
which gives
AMAT = 10 ns x (2 + 20 x 0.05) = 30 ns
Suppose doubling the size of the cache decreases the miss rate to 3%, but causes the hit time to increase to 3 cycles and the miss penalty to increase to 21 cycles. What is the AMAT of the new machine?
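The question can be answered with the same AMAT expression used above, AMAT = cycle time × (hit time + miss rate × miss penalty). A minimal sketch, with `amat_ns` as an illustrative helper name:

```python
def amat_ns(clock_mhz, hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in ns: cycle time x (hit time + miss rate x penalty)."""
    cycle_ns = 1000 / clock_mhz
    return cycle_ns * (hit_cycles + miss_rate * miss_penalty_cycles)


original = amat_ns(100, hit_cycles=2, miss_rate=0.05, miss_penalty_cycles=20)
doubled_cache = amat_ns(100, hit_cycles=3, miss_rate=0.03, miss_penalty_cycles=21)
print(original, doubled_cache)
```

With these numbers the larger cache has the lower miss rate but the higher AMAT, because the extra hit-time cycle is paid on every access.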
Mirroring
[Figure: Disk 0 and Disk 1 hold identical copies of blocks 1 to 5, laid out in the same sectors on both disks.]
• Keep two copies of data on two separate disks
• Gives good error recovery
– if some data is lost, get it from the other source
• Expensive
– requires twice as many disks
• Write performance can be slow
– have to write data to two different spots
• Read performance is enhanced
– can read data from file in parallel
[Figure: data blocks spread across Disk 0 and Disk 1, with each block stored on only one of the two disks.]
Data Mapping For RAID 0
RAID Level 2
Uses Bit-level striping with Hamming codes for ECC
Number of disks required depends on exact implementation
Only fair fault tolerance
Advantages
Random Read performance=fair
Sequential Read Performance=very good
Sequential Write performance=good
Disadvantages
Random Write Performance=poor
Requires a complex controller
High overhead for check disks
Not used in modern systems
• A computer executes a program
• Fetch/execute cycle
• Each cycle has a number of steps, called micro-operations
• Each step does very little: atomic operation of the CPU
• Micro-operations are needed to perform the subcycles of the instruction cycle.
IR ← [[PC]]
processor to another.
General-purpose registers are provided for use by the programmer.
Special-purpose registers: index and stack registers.
Instruction register
– Op-code causes different control signals for each
different instruction
– Unique logic for each op-code
Step Action
Add (R3),
1 PCout , MAR in , Read, Select4,Add, R1
2 Z in
4 C MDRout , IRin
5 Offset-field-of-IRout,
Add, Zin
Zout, PCin , End
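[Figure: logic generating the Zin control signal from timing signals T1, T4, T6 and the decoded Branch and Add instruction lines.]
Zin = T1 + T6 • ADD + T4 • BR + …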
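The expression for Zin is a sum of products over timing steps and decoded instructions. The sketch below encodes only the three terms that are visible above; the terms hidden behind "…" are not modelled, and the function name `z_in` is illustrative.

```python
def z_in(time_step, instruction):
    """Partial model of Zin = T1 + T6.ADD + T4.BR (the elided terms are omitted)."""
    return (time_step == 1                                   # T1: every instruction
            or (time_step == 6 and instruction == "ADD")     # T6 . ADD
            or (time_step == 4 and instruction == "BR"))     # T4 . BR


for step in range(1, 7):
    print(step, z_in(step, "ADD"), z_in(step, "BR"))
```

In a hardwired control unit each such expression becomes a small network of AND and OR gates, which is why adding an instruction means redesigning the logic.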
Problems With Hard Wired Designs
Complex sequencing & micro-operation logic
Difficult to design and test
Inflexible design
Difficult to add new instructions
Add R1, (R3)
▪ t4: if (MBR) == 0 then PC <- (PC) + 1
▪ "if" is a single micro-operation
▪ Micro-operations done during t4
Execute Cycle (BSA)
BSA X - Branch and save address
▪Address of instruction following BSA is
saved in X
▪ Execution continues from X+1
▪ t1: MAR <- (IRaddress)
      MBR <- (PC)
▪ t2: PC <- (IRaddress)
      memory <- (MBR)
▪ t3: PC <- (PC) + 1
The address in the PC at the start of the instruction is the address of the next
instruction in sequence. This is saved at the address designated in the IR.
The latter address is also incremented to provide the address of the
instruction for the next instruction cycle
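The three time steps of BSA can be traced in a few lines. This is a sketch that models memory as a Python dictionary and registers as local variables; the addresses 101 and 500 are made-up sample values.

```python
def execute_bsa(pc, ir_address, memory):
    """Run the BSA execute cycle; returns the new PC and updates memory in place."""
    # t1: MAR <- (IRaddress), MBR <- (PC)
    mar = ir_address
    mbr = pc
    # t2: PC <- (IRaddress), memory <- (MBR)
    pc = ir_address
    memory[mar] = mbr        # return address saved at location X
    # t3: PC <- (PC) + 1
    pc = pc + 1              # execution continues from X + 1
    return pc


memory = {}
new_pc = execute_bsa(pc=101, ir_address=500, memory=memory)
print(new_pc, memory)
```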
Multiple-Bus Organization
• Allow the contents of two different registers to be accessed simultaneously and have their contents placed on buses A and B.
• Incrementer unit.
Step Action
In this example, two integer, two floating-point, and one memory (either load or store) operations can be executing at the same time.
Superpipelined
• Many pipeline stages need less than half a clock cycle
• Double internal clock speed gets two tasks per external clock cycle (degree 2)
• Superscalar allows parallel fetch and execute
[Figure sequence: instruction fetch. The datapath consists of the PC, memory, IR, register file (R0 to R3), and ALU. The instruction is fetched from memory into the IR, and the PC then moves to the next instruction.]
Instruction Decode (ID)
ADD R3, R1, R2 => R3 = R1 + R2;
[Figure sequence: the instruction in the IR is decoded step by step into the fields ADD, R1, R2, and R3. The register file holds R0 = 0, R1 = x, R2 = y, R3 = z, and the operands x and y are read out for the ALU.]
Execution (EX)
ADD R3, R1, R2 => R3 = R1 + R2;
[Figure sequence: the ALU receives x and y and executes the operation.]
Writeback (WB)
ADD R3, R1, R2 => R3 = R1 + R2;
[Figure: the ALU result x+y is written to the register file, so R3 holds x+y.]
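The four stages shown for ADD R3, R1, R2 can be walked through in code. The sketch below is a simplification: instructions are stored as tuples rather than bit patterns, only ADD is handled, and the register values 5 and 7 stand in for x and y. The function name `step` is illustrative.

```python
def step(pc, memory, registers):
    """Run one ADD instruction through fetch, decode, execute, and writeback."""
    # Fetch: read the instruction into the IR, then move the PC to the next instruction
    ir = memory[pc]
    pc += 1
    # Decode: split the instruction into fields and read the source operands
    opcode, dest, src1, src2 = ir
    x, y = registers[src1], registers[src2]
    # Execute: the ALU performs the operation
    if opcode != "ADD":
        raise ValueError("this sketch only models ADD")
    result = x + y
    # Writeback: write the result to the destination register
    registers[dest] = result
    return pc


memory = [("ADD", "R3", "R1", "R2")]
registers = {"R0": 0, "R1": 5, "R2": 7, "R3": 0}
next_pc = step(0, memory, registers)
print(next_pc, registers)
```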