
CH - 3 Memory Hierarchy Design PDF

Advanced computer architecture


Memory Hierarchy Design

This chapter deals with the terms and concepts related to memory. After studying this chapter, students will be able to answer questions related to memory, levels of the memory hierarchy, how different types of memories are organised, etc.

3.1 INTRODUCTION

The memory unit is an essential component in a computer system; memory is used for storing programs (sequences of instructions) and data (the state of instructions). The term primary memory is used for the information in physical systems which function at high speed (i.e. RAM), as a distinction from secondary memory, which consists of physical devices for program and data storage that are slow to access but offer higher memory capacity.

Additional storage capacity is not needed in computers that are used for limited applications, but a general-purpose computer works well if additional storage beyond the capacity of main memory is provided.

There is not enough space in one memory unit to accommodate all the programs used in a typical computer. Also, we know that not all of the information stored in the computer is needed at the same time. Therefore it is more economical to use low-cost storage devices to serve as a backup for storing the information that is not currently used by the CPU.

Main memory is the memory that communicates directly with the CPU. For backup storage, auxiliary memory such as magnetic tapes and disk drives are used. The benefit of using auxiliary memory is that only the programs and data currently needed by the processor reside in main memory, and all other information is stored in auxiliary memory. The information is transferred from auxiliary memory to main memory when required.


3.2 MEMORY HIERARCHY

The total memory capacity of a computer can be seen as a hierarchy of components. While discussing performance issues in computer design, algorithm predictions, and programming constructs such as those involving locality of reference, the term memory hierarchy is used. A "memory hierarchy" in computer storage distinguishes each level in the hierarchy by response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their controlling technology.

Thus memory technology and the storage organization at each level are characterized by 5 parameters:

1. Access Time: Access time refers to the round-trip time from the CPU to the ith-level memory.
2. Size of Memory: The size of the memory is the number of bytes or words in level i.
3. Cost Per Byte: The cost of the ith-level memory is estimated by the product of the cost per byte and the memory size.
4. Transfer Bandwidth: Transfer bandwidth refers to the rate at which information is transferred between adjacent levels of the memory hierarchy.
5. Unit of Transfer: The unit of transfer refers to the unit of data transfer between level i and level i + 1.
The various components can be viewed as forming a hierarchy of memories (M1, M2, ..., Mn) in which each member Mi is subordinate to the next highest member of the hierarchy.

At the higher levels of the hierarchy are the relatively slow magnetic tapes, optical disks, etc., used to store removable files. Memory devices at the lower level (level 0) are faster to access, smaller in size, and more expensive per byte, having a higher bandwidth and using a smaller unit of transfer as compared with those at higher levels.

The idea behind the memory hierarchy is that:
• For each i, the faster, smaller device at level i serves as a cache for the larger, slower device at level i + 1.

Why do memory hierarchies work? The answers to this question are:
• Programs tend to access the data at level i more often than they access the data at level i + 1.
• The net effect is a large pool of memory that costs as little as the cheap storage near the bottom, but serves data to programs at the rate of the fast storage near the top.
[Figure: pyramid of memory levels. L0: CPU registers hold words retrieved from cache memory. L1: on-chip L1 cache (SRAM) holds cache lines retrieved from L2. L2: off-chip L2 cache (SRAM). L3: main memory (DRAM) holds disk blocks retrieved from local disks. L4: local secondary storage (local disks) holds files retrieved from disks on remote network servers. L5: remote secondary storage (distributed file systems, Web servers). Devices near the top are smaller, faster, and costlier per byte; devices near the bottom are larger, slower, and cheaper per byte.]

Fig. 3.1 Levels of memory hierarchy with increasing capacity and decreasing speed and cost from low to high levels.

[Figure: a more detailed view. Processor registers (super fast, super expensive, tiny capacity); CPU cache levels L1, L2, L3 (faster, expensive, small capacity); random access memory such as EDO, SD-RAM, DDR-SDRAM, RD-RAM and more (fast, reasonably priced, average capacity); SSD/flash drive, non-volatile flash-based memory (average speed, reasonably priced, average capacity); mechanical hard drive, file-based memory (slow, cheap, large capacity).]

Fig. 3.2 More detailed view of memory hierarchy (Ryan J. Leng).

Levels of Memory Hierarchy

1. Registers
A register consists of a group of flip-flops with a common clock input; a counter is constructed from two or more flip-flops. Registers are commonly used to store binary data. During the execution phase, register transfer operations are directly controlled by the processor.
2. Cache Memory
A cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations. The cache is controlled by the MMU and can be implemented at one level or at multiple levels, depending upon the speed and application requirements.

Most CPUs have at least three independent caches:
1. an instruction cache to speed up executable instruction fetch,
2. a data cache to speed up data fetch and store, and
3. a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data.
The data cache is usually organized as a hierarchy of more cache levels (L1, L2, etc.).
Random access memory (RAM) is a computer's volatile or temporary memory, which exists as chips on the motherboard near the CPU. RAM stores data and programs while they are being used and requires a power source to maintain its integrity. Cache memory is high-speed memory that resides between the CPU and RAM in a computer. Cache memory stores data and instructions that the CPU is likely to need next. The CPU can retrieve data or instructions more quickly from cache than it can from RAM or a disk.

3. Main Memory/Physical Memory
The terms primary storage, main memory, and physical memory are generally used interchangeably to refer to the memory that is attached directly to the processor. It is larger than the cache and implemented with cost-effective RAM chips. Integrated RAM chips are available in 2 possible modes: static and dynamic.
Static RAM (SRAM)
• In SRAM each cell stores a bit with a six-transistor circuit.
• SRAM retains its value indefinitely, as long as it is kept powered.
• SRAM is relatively insensitive to disturbances such as electrical noise.
• SRAM is faster (8-16 times faster) and more expensive (8-16 times more expensive as well) than DRAM.
Dynamic RAM (DRAM)
• In DRAM each cell stores a bit with a capacitor and a transistor.
• DRAM values must be refreshed every 10-100 ms.
• DRAM is sensitive to disturbances.
• DRAM is slower and cheaper than SRAM.

4. Disk Drives and Tape Units
External memory, which is sometimes called backing store or secondary memory, allows the permanent storage of large quantities of data, and the magnetic tape units are the off-line memory used for backup storage. The capacity of external memory is high, usually measured in hundreds of megabytes or even gigabytes (thousand million bytes) at present. External memory has the important property that the information stored is not lost when the computer is switched off.

It is important to note that the CPU can only directly access data that is in main memory. To process data that resides in external memory, the CPU must first transfer it to main memory. Accessing external memory to find the appropriate data is slow (milliseconds) in relation to CPU speeds, but the transfer of data to main memory is reasonably fast once it has been located.

3.3 MEMORY (Main Memory)

Physical memory is the main memory that has direct/indirect access to the CPU. Physical memory mainly consists of RAM, and RAM is of 2 types:
1. DRAM - Dynamic Random Access Memory
2. SRAM - Static Random Access Memory

[Figure: the CPU and cache memory exchange data with main memory; an I/O processor connects the auxiliary memory (magnetic tapes, magnetic disks) to main memory.]

Fig. 3.3 Main memory communicates directly with the CPU.

Fig. 3.4 Physical Memory.


[Figure: the CPU (I-unit and E-unit) with its register file, TLB, on-chip L1 instruction and data caches, and special-purpose caches connects through on-chip L2/L3 caches and an interconnection network to special-purpose memory, main memory, virtual memory, disk files and databases, CD-ROM/tape, and other computers and the WWW.]

Fig. 3.5 Physical Memory Architecture.

3.3.1 Static RAM (SRAM)

Basic Architecture: The basic architecture of SRAM includes one or more rectangular arrays of memory cells, with support circuitry to decode addresses and implement the read and write operations on chip. Additional support circuitry used to implement special features, such as burst operation, may also be present.

[Figure: a memory array with address inputs A1-An, clock CLK, control signals WE and CS, data inputs D1-Dn, and data outputs DQ1-DQn; the outputs may be driven through a register or latch.]

Fig. 3.6 Block diagram of SRAM.

Description of Diagram
1. Memory cell arrays: Memory arrays are arranged in rows and columns of memory cells, called wordlines and bitlines respectively. Each memory cell has a unique location defined by the intersection of a row and a column, and the number of rows and columns determines the size of the memory.
2. Memory cells: Each memory cell is a flip-flop that holds one bit and may be in one of two states.
3. Control Signals: CS is the chip select, used to select between two or more memory chips. WE is the write enable, used to select between the read and write operations. In some SRAMs, the DQ pins serve as both input and output.

[Figure: input and output signals gated by the write enable (WE) line.]

Fig. 3.7 WE, input and output signals in SRAM.

When the enable is high, the output is the same as the input; otherwise, the output holds its last value.

[Figure: a six-transistor SRAM cell. The word line WL gates two access transistors that connect the cross-coupled inverters (transistors M1-M4, supplied by VDD) to the bit lines BL and BL'.]

Fig. 3.8 SRAM memory cell.

Typical SRAM Organization: 16-word x 4-bit

[Figure: a 4 x 16 array of SRAM cells. Data inputs Din3-Din0 feed write drivers and prechargers at the top of each column; address bits A0-A3 are decoded to select one of the word lines Word 0 to Word 15; sense amplifiers at the bottom of each column drive the data outputs Dout3-Dout0; WrEn gates the write drivers.]

Fig. 3.9 SRAM Organisation (Alvin R. Lebeck 1998).

Advantages and Disadvantages of SRAM

Advantages: The advantage of SRAM is its high speed compared with DRAM. While SRAM memory cells require more space on the silicon chip, they have another advantage that translates directly into improved performance: SRAM cells need not be refreshed, which means that they are available for reading and writing data 100% of the time.

Disadvantages: The disadvantage of SRAM is its high power consumption and heat. Also, SRAM is more expensive than DRAM.
3.3.2 Dynamic RAM (DRAM)

Dynamic RAM is an alternative to SRAM. DRAMs consume less power than SRAMs. A DRAM cell's capacitor gradually loses its charge, and when left for a long time, a logical 1 changes to a logical 0; the time to discharge can be under a second. Though it is an inexpensive device, DRAM is thus an imperfect memory that must be refreshed.

[Figure: a one-bit DRAM circuit with write enable, input, output, and refresh logic.]

Fig. 3.10 Block diagram of DRAM.
DRAM Architecture
DRAM architecture consists of an array of columns, called bitlines, and an array of rows, called wordlines. The intersection of a bitline and a wordline is the address of the memory cell.

Support Circuitry
• Sense Amplifiers: Sense amplifiers amplify the signal or charge detected on a memory cell.
• Address Logic: Address logic is used to select rows and columns.
• Row Address Select (RAS) and Column Address Select (CAS): RAS and CAS latch and resolve the row and column addresses and initiate or terminate the read/write operation.
• Read/Write Circuitry: RD/WR circuitry stores or reads the information in the memory cell.
• Internal Counters: Internal counters keep track of the refresh sequence or initiate the refresh cycle.
• Output Enable Logic: Output enable logic prevents the data from being driven onto the output unnecessarily.
Classical DRAM Organisation (Square)

[Figure: a row decoder drives the word (row) select lines of a square RAM cell array; each intersection of a bit (data) line and a word line represents a 1-T DRAM cell. Sense amplifiers, a column selector, and I/O circuits sit below the array; the row address and column address together select 1 bit at a time on the data lines.]

Fig. 3.11 DRAM Organisation.

DRAM Read
1. Row Address Strobe (RAS): RAS is required at the beginning of every operation. It latches the row address and initiates the memory cycle. To enable RAS, the voltage transition should be from high to low; RAS is active-low and should maintain the low voltage as long as RAS is required. For a complete memory cycle, RAS must be active for a minimum amount of time and also inactive for a minimum amount of time.
2. Column Address Strobe (CAS): CAS latches the column address and initiates the read/write operation; it is also active on a low voltage. CAS must be active before RAS for the refresh cycle.
3. Write Enable (WE): The write enable signal is used to choose between the read and write operations. A high voltage level signifies a read operation; a low voltage level signifies a write operation.
4. Output Enable (OE): This control signal is used to prevent the data from being displayed at the output; the control signal is grounded when a write operation is selected.
5. Data In/Out (DQs): The DQ pins are used for input and output.

Steps to access a cell in DRAM

[Figure: control and address lines (row and column addresses) drive the DRAM cell array, which returns the selected data.]

Fig. 3.12 DRAM Structure.

Certain measures of memory technology are:
1. Density: Density refers to the number of memory cells per square area of silicon. Density is usually stated as the number of bits on a standard-size chip. Examples: a 1-meg chip holds one megabit of memory, and a 4-meg chip holds four megabits of memory. Note: a higher-density chip generates more heat.
2. Latency: Latency is the time that elapses between the start of an operation and the completion of the operation. Latency is not a constant value.

3.3.3 Physical Memory and Word Size

Bits of physical memory are divided into blocks of N bits each. A group of N bits is called a word, where N is known as the width of a word or the word size.

Physical address | 32 bits
0 | Word 0
1 | Word 1
2 | Word 2
3 | Word 3
4 | Word 4
5 | Word 5

Fig. 3.13 Physical memory organisation.
Physical memory is organized into words, where the size of a word is equal to the memory transfer size. Each read or write operation applies to an entire word.
A larger word size:
• is implemented with more parallel wires,
• results in higher performance,
• has a higher cost.

Note: An architect usually designs all parts of a computer to use one size for:
• the memory word,
• integers (general-purpose registers),
• floating point numbers.
Translation between byte and word addresses is performed by an intelligent memory controller: the CPU can use byte addresses, while physical memory can use word addresses (efficient).

Mathematics of Translation

[Figure: the processor issues byte addresses to the controller, which issues word addresses to physical memory.]

Fig. 3.14 Address translation by the memory controller.

The word address is given by:
W = B / N
The offset is given by:
O = B mod N

Example: with N = 4 bytes per word, byte address 11 is found in word 2 at offset 3.

Important Point
If the number of bytes per word, N, is a power of two, the translation is efficient: dividing the byte address by the bytes per word gives the word address, and the remainder gives the offset. When physical memory is organized this way, the translation simply means extracting bits from the byte address.
3.4 y extracting
AUXILIARY I ORY bits .
Auxilia ^^ "
:

&;
1
.

J
* «• r £> , JL* AW
,>
mi Ene ' , «
« «« <
''etap« nysne' eii'sblin"|’'Uuxiliuryme
tI1 ( tlc ^ , ^ ^ ««
ctl:ssil>

0t ler
lcbylh
P
11 Is non- volalilc. Aitollicr

y USed in c mputer systems


]

1
bubble comnon eni 0ry 1 Cnts tJlat vvere°used previously 1
etc .
1
a3
c
F « g 3 15 CD (Secondary Storage Device ). c
CD
O
CO
The main characteristics of a memory are:
• Capacity: Capacity represents the global volume of information (in bits) that the memory can store.
• Access time: Access time corresponds to the time interval between the read/write request and the availability of the data.
• Cycle time: Cycle time is the minimum time interval between two successive accesses.
• Throughput: Throughput defines the volume of information exchanged per unit of time, expressed in bits per second.
• Non-volatility: Non-volatility is the ability of a memory to store data when it is not being supplied with electricity.
The ideal memory has a large capacity with restricted access time and cycle time, a high throughput, and is non-volatile. However, fast memories are also the most expensive. This is why memories that use different technologies are used in a computer, interfaced with each other and organized hierarchically.

[Figure: access time grows with capacity, from registers to cache memories to random access memory to mass memory.]

Fig. 3.16 Access time vs. capacity.

Auxiliary Memories

1. Flash Memory: Flash memory is an electronic non-volatile computer storage device that can be electrically erased and reprogrammed, and works without any moving parts. Flash memory is a compromise between RAM and ROM memories: it allows both read and write access. However, the access times of flash memories are longer than the access times of RAM.

Fig. 3.17 Flash Drive.
2. Optical Disc: An optical disc is a storage medium from which data is read, and to which it is written, by lasers. Optical disks can store much more data (up to 6 gigabytes, i.e., 6 billion bytes) than most portable magnetic media, such as floppies. There are three basic types of optical disks: CD-ROM (Read-Only), WORM (Write-Once Read-Many), and EO (Erasable Optical) disks.
3. Magnetic Disk: A magnetic disk is a circular plate constructed of metal or plastic and coated with magnetized material. Both sides of the disk are used, and several disks may be stacked on one spindle with read/write heads available on each surface. Bits are stored on the magnetised surface in spots along concentric circles called tracks. Tracks are commonly divided into sections called sectors. Disks that are permanently attached and cannot be removed by the occasional user are called hard disks. A disk drive with removable disks is called a floppy disk drive.
[Figure: layout of a disk surface.]
A: Track
B: Geometrical Sector
C: Track Sector
D: Cluster

Fig. 3.18 Magnetic Disk.

4. Magnetic Tapes: A magnetic tape transport consists of the electrical, mechanical, and electronic components that provide the parts and control mechanism for a magnetic tape unit. The tape itself is a strip of plastic coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along several tracks. Seven or nine bits are recorded together to form a character, with a R/W head mounted on each track so that data can be recorded and read as a sequence of characters.
5. DVD: DVD, also known as Digital Versatile Disc or Digital Video Disc, is a popular optical disc storage media format. Its main uses are video and data storage. Most DVDs are of the same dimensions as compact discs (CDs) but store much more data. Variations of the term DVD often describe the way data is stored on the discs: DVD-ROM holds data that can only be read and not written, while DVD-RW and DVD+RW can be rewritten. The wavelength used by standard DVD lasers is 650 nm, and thus the light has a red color.

Fig. 3.19 DVD.


DVD-Video and DVD-Audio discs respectively refer to properly formatted and structured video and audio content. Other types of DVDs, including those with video content, may be referred to as DVD-Data discs. As next-generation high-definition optical formats, such as Blu-ray Disc, also use a disc identical in some aspects yet more advanced than a DVD, the original DVD is occasionally given the retronym SD DVD (for standard definition).

3.5 INCLUSION, COHERENCE AND LOCALITY

Three important properties that must be satisfied when information is stored in different levels of the memory hierarchy (L1, L2, L3, ..., Ln) are: inclusion, coherence, and locality.

As we can see from the memory hierarchy diagram, the cache is at the innermost level L1, which directly communicates with the CPU registers. The outermost level Ln (magnetic tape etc.) contains all the information words stored.
3.5.1 Inclusion Property
The general meaning of inclusion is:
1. The action or state of including or of being included within a group or structure: "the inclusion of handicapped students".
2. A person or thing that is included within a larger group or structure.
The same is the concept here. The inclusion property helps in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies.
The inclusion property is stated as L1 is a subset of L2, L2 is a subset of L3, ..., up to Ln. The set inclusion relationship implies that all information items are originally stored in the outermost level Ln. During processing, subsets of Ln are copied into Ln-1; similarly, subsets of Ln-1 are copied into Ln-2, and so on.
In other words, if an information word is found in Li, then copies of the same word can also be found in all upper levels Li+1, Li+2, ..., Ln. However, a word stored in Li+1 may not be found in Li. A word miss in Li implies that it is also missing from all lower levels Li-1, ..., L1. The highest level Ln is the backup storage, where everything can be found.
[Figure: CPU registers at the top; information is transferred in the form of words to the cache memory (divided into cache blocks); blocks are the units of data transfer between the cache and main memory; main memory is divided into pages; pages are the units of data transfer between main memory and disk storage; segments with different numbers of pages are transferred between disk storage and backup storage.]

Fig. 3.20 The inclusion property in a memory hierarchy, showing the unit of data transfer at each level.
The cache and main memory are divided into fixed-size units of information transfer. The cache is divided into cache blocks (e.g., 32 bytes each), and blocks are the units of data transfer between the cache and main memory. The main memory is divided into pages (e.g., 4 KBytes each); with these sizes, each page holds 128 blocks. Pages are the units of data transfer between main memory and disk storage.
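The example sizes above can be checked with a short Python sketch (the 32-byte blocks and 4-KByte pages are just the illustrative sizes from the text):

```python
BLOCK_SIZE = 32          # bytes per cache block (example size from the text)
PAGE_SIZE = 4 * 1024     # bytes per main-memory page (example size from the text)

print(PAGE_SIZE // BLOCK_SIZE)   # 128: blocks contained in each page

def decompose(addr):
    """Split a byte address into (page, block within page, offset within block)."""
    page, page_offset = divmod(addr, PAGE_SIZE)
    block, offset = divmod(page_offset, BLOCK_SIZE)
    return page, block, offset

print(decompose(5000))   # (1, 28, 8): byte 5000 sits in page 1, block 28, offset 8
```

The nested divmod calls mirror the nesting of the units: a page is a fixed number of blocks, and a block is a fixed number of bytes.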
3.5.2 Coherence Property
The coherence property requires that copies of the same information item at successive memory levels be consistent. If a word is modified in the cache, copies of that word must be updated immediately or eventually at all higher levels. Frequently used information is often found in the lower levels of the memory hierarchy in order to minimize the effective access time. The two most commonly used methods for maintaining coherence are:
• Write-through (WT), the simplest procedure, which demands immediate update of the next-level memory when a word is modified in the current level.
• Write-back (WB), which delays the update of the next-level memory until the modified word is replaced or removed from the current level.
3.5.3 Locality of References (Principles of Locality)

90-10 rule: According to Hennessy and Patterson (1990), the 90-10 rule states that a typical program may spend 90% of its execution time on only 10% of the code, such as the innermost loop of a nested looping operation.

There are three types of reference locality: temporal, spatial, and sequential. Each type of locality affects the design of the memory hierarchy. The principle of localities will guide us in the design of the cache, main memory, and even the virtual memory organization.
1. Temporal Locality: It is assumed that recently referenced items (instructions or data) are likely to be referenced again in the near future. If at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future. Once a loop is entered or a subroutine is called, a small code segment will be referenced repeatedly many times. Temporal locality helps in determining the size of memory at successive levels.


2. Spatial Locality: Spatial locality means that items at addresses close to the addresses of recently accessed items will be accessed in the near future. In this case it is common to attempt to guess the size and shape of the area around the current reference for which it is worthwhile to prepare faster access. For example, operations on tables or arrays involve accesses of a certain clustered area in the address space. Program segments, such as routines and macros, tend to be stored in the same neighborhood of the memory space. Spatial locality assists us in determining the size of unit data transfers between adjacent memory levels.
3. Sequential Locality: In typical programs, the execution of instructions follows a sequential order (or the program order) unless branch instructions create out-of-order executions. The ratio of in-order execution to out-of-order execution is roughly 5 to 1 in ordinary programs. Besides, the access of a large data array also follows a sequential order.

3.5.4 Impact of Temporal Locality

In this case, we consider instructions in program loops which are executed many times, e.g., n times. We assume that, on average, cache-resident instructions are used n times before they are replaced by new instructions. In deriving the average access time, the element is fetched once from the main memory (in time tm) and then accessed n times from the cache (each access taking time tc). The average access time, tav, is given by the following:

tav = tc + tm / n
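The formula can be exercised numerically. The sketch below (Python, with made-up access times) shows the miss cost tm being amortized as the reuse count n grows:

```python
def t_av_temporal(tc, tm, n):
    """Average access time: one fetch of cost tm amortized over n reuses of cost tc."""
    return tc + tm / n

tc, tm = 2.0, 100.0                 # hypothetical cache / main-memory times (ns)
for n in (1, 10, 100, 1000):
    print(n, t_av_temporal(tc, tm, n))
# As n grows, the average access time approaches tc.
```

With these illustrative numbers, a single use costs 102 ns on average, while a thousand reuses bring the average down to 2.1 ns, essentially the cache access time.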



3.5.5 Impact of Spatial Locality

In this case, it is assumed that the size of the block transferred from the main memory to the cache, upon a cache miss, is m elements. We also assume that due to spatial locality, all m elements were requested, one at a time, by the processor. Based on these assumptions, the average access time, tav, is given by the following:

tav = (m * tc + tm) / m = tc + tm / m

In deriving the above expression, it was assumed that the requested memory element created a cache miss, thus leading to the transfer of a main memory block, consisting of m elements, in time tm. Following that, m accesses, each for one of the elements constituting the block, were made. The above expression reveals that as the number of elements in a block, m, increases, the average access time decreases, a desirable feature of the memory hierarchy.

3.5.6 Impact of Combined Temporal and Spatial Locality

In this case, we assume that the element requested by the processor created a cache miss leading to the transfer of a block, consisting of m elements, to the cache. Now, due to spatial locality, all m elements constituting the block were requested, one at a time, by the processor. Following that, each element was accessed n times in total (temporal locality). Based on these assumptions, the average access time, tav, is given by the following:

tav = (m * tc + tm + (n - 1) * m * tc) / (n * m) = tm / (n * m) + ((n - 1) / n) * tc + tc / n

A simplifying assumption to the above expression is to assume that tm = m * tc. In this case the above expression simplifies to the following expression:

tav = tc / n + ((n - 1) / n) * tc + tc / n = ((n + 1) / n) * tc

The above expression reveals that as the number of repeated accesses n increases, the average access time will approach tc. This is a significant performance improvement.
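The simplification can be checked numerically. The sketch below (Python, hypothetical values) confirms that with tm = m * tc the combined expression collapses to ((n + 1) / n) * tc regardless of the block size m:

```python
def t_av_combined(tc, tm, n, m):
    """Average access time: block of m elements, transfer time tm, n accesses per element."""
    return tm / (n * m) + ((n - 1) / n) * tc + tc / n

tc, m = 2.0, 16
tm = m * tc                          # the simplifying assumption from the text
for n in (1, 10, 100):
    full = t_av_combined(tc, tm, n, m)
    simplified = (n + 1) / n * tc    # closed form derived above
    print(n, full, simplified)       # the two agree (up to floating-point rounding)
```

For n = 1 both expressions give 2 * tc, and as n grows both converge to tc, matching the conclusion in the text.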

3.5.7 Reasons for Locality


There are several reasons for locality . These are
• Predictability: locality is one type of predictable behavicr in computer systems. Luckiiy. many
of the practical problems aie decidable and hence the correspondme program can behave
predictably , if it is well written.
.
• Structure of the progi am Locality occurs often because of the way in which computer programs
are created , for handling decidable problems. Generally , related data is stored in nearby locations
in storage . One common pattern in computing involves the processing of several items. one at a
time. This means that if a lot of processing is done , the single item will be accessed more than

. once, ihus leading to temporal locality of reference.


Linear data structures: Locality often occurs because code contains loops that tend to reference
arrays or other data structures by indices. Sequenlial locality, a special case of spatial locality
occurs when relevant data elements are arranged and accessed linearly .


3.6 MEMORY CAPACITY PLANNING

The performance of a memory hierarchy is determined by the effective access time T_eff to any level in the hierarchy, i.e., the time to retrieve a piece of information for processing. It depends on the hit ratios and access frequencies at successive levels of the memory hierarchy.

Hit Ratios

A hit is said to occur when the requested data is found in the memory hierarchy. When an information item is found in M_i, we call it a hit; otherwise, a miss. The hit ratio h_i is the probability that the item is found at M_i; it is a concept defined for any two adjacent levels of the memory hierarchy, and the miss ratio at M_i is defined as 1 - h_i. The hit ratios at successive levels are a function of memory capacities, management policies, and program behavior. Successive hit ratios are independent random variables with values between 0 and 1. The access frequency to M_i is defined as

f_i = (1 - h_1)(1 - h_2) ... (1 - h_{i-1}) h_i

This is indeed the probability of successfully accessing M_i when there are i - 1 misses at the lower levels and a hit at M_i. Note that

f_1 + f_2 + ... + f_n = 1 and f_1 = h_1

Due to the locality property, the access frequencies decrease very rapidly from low to high levels; that is, f_1 >> f_2 >> f_3 >> ... >> f_n. This implies that the inner levels of memory are accessed more often than the outer levels.

To simplify future derivations, we assume h_0 = 0 and h_n = 1, which means the CPU always accesses M_1 first and the access to the outermost memory M_n is always a hit.
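The access-frequency definition can be sketched in a few lines of Python (the three hit-ratio values are invented for illustration):

```python
def access_frequencies(hit_ratios):
    """f_i = (1 - h_1)...(1 - h_{i-1}) * h_i, with h_n = 1 at the
    outermost level so that every access is eventually satisfied."""
    freqs, miss_prob = [], 1.0
    for h in hit_ratios:
        freqs.append(miss_prob * h)
        miss_prob *= 1.0 - h
    return freqs

f = access_frequencies([0.95, 0.90, 1.0])   # cache, main memory, disk
print(f)            # roughly [0.95, 0.045, 0.005]: f_1 >> f_2 >> f_3
print(sum(f))       # 1.0 up to rounding, as required
```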

3.6.1 Effective Access Time


For more efficient operation, we wish to achieve as high a hit ratio as possible at M_1 (lower levels like the cache). A miss in the cache is called a block miss, and a miss in main memory is called a page fault; they are so named because the block and the page are the units of data transfer, as described in the inclusion property.

Using the access frequencies f_i for i = 1, 2, ..., n, the effective access time of a memory hierarchy is:

T_eff = f_1 t_1 + f_2 t_2 + ... + f_n t_n    ...(1)

Substituting f_i = (1 - h_1)(1 - h_2) ... (1 - h_{i-1}) h_i in eq. (1), we get

T_eff = h_1 t_1 + (1 - h_1) h_2 t_2 + (1 - h_1)(1 - h_2) h_3 t_3 + ... + (1 - h_1)(1 - h_2) ... (1 - h_{n-1}) t_n
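Equation (1) is straightforward to evaluate directly. A minimal sketch (Python; the two-level timings below are invented):

```python
def effective_access_time(hit_ratios, times):
    """T_eff = sum of f_i * t_i, with f_i built from the hit ratios."""
    total, miss_prob = 0.0, 1.0
    for h, t in zip(hit_ratios, times):
        total += miss_prob * h * t
        miss_prob *= 1.0 - h
    return total

# cache (10 ns, h = 0.95) backed by main memory (100 ns, h_n = 1)
print(effective_access_time([0.95, 1.0], [10.0, 100.0]))  # about 14.5 ns
```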
3.6.2 Optimization of Memory Hierarchy

The total cost of a memory hierarchy is estimated as follows:

C_total = c_1 s_1 + c_2 s_2 + ... + c_n s_n    (cost per byte times memory size, summed over levels)

This implies that the cost is distributed over the n levels. Since c_1 > c_2 > c_3 > ... > c_n, we have to choose s_1 < s_2 < s_3 < ... < s_n. The optimal design of a memory hierarchy should result in a T_eff close to the t_1 of M_1 and a total cost close to the cost of M_n.

The optimization can be formulated as a linear programming problem, given a ceiling C_0 on the total cost; that is, minimize

T_eff = f_1 t_1 + f_2 t_2 + ... + f_n t_n

subject to the following constraints:

s_i > 0, t_i > 0 for i = 1, 2, ..., n
C_total = c_1 s_1 + c_2 s_2 + ... + c_n s_n < C_0

The unit cost c_i and capacity s_i at each level M_i depend on the speed t_i required. Therefore, the optimization involves tradeoffs among t_i, c_i, s_i and f_i at all n levels.
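For small design spaces, the constrained minimization can be illustrated by exhaustive search rather than a linear-programming solver. A toy sketch (Python; every capacity, hit ratio, cost and time below is invented):

```python
from itertools import product

# per-level candidates: (capacity in KB, hit ratio, cost per KB)
level1 = [(32, 0.90, 1.00), (64, 0.93, 1.00), (128, 0.95, 1.00)]
level2 = [(4096, 1.00, 0.01), (8192, 1.00, 0.01)]
t1, t2 = 5.0, 100.0            # access times in ns
C0 = 200.0                     # cost ceiling

best = None
for (s1, h1, c1), (s2, h2, c2) in product(level1, level2):
    if c1 * s1 + c2 * s2 > C0:           # enforce C_total < C_0
        continue
    teff = h1 * t1 + (1 - h1) * h2 * t2  # two-level T_eff
    if best is None or teff < best[0]:
        best = (teff, s1, s2)

print(best)   # smallest feasible T_eff and the chosen capacities
```

The search rejects the configuration with the largest capacities at both levels (it exceeds the ceiling) and keeps the cheapest design with the smallest T_eff.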
3.6.3 Memory System Performance Parameters

Two basic parameters that determine memory system performance are:
1. Access Time: The time required by a processor to access data or to write data from and to a memory chip is referred to as access time, i.e., access time is the time taken for a processor request to be transmitted to the memory system, a datum to be accessed, and the datum to be returned to the processor. Access time depends on physical parameters like bus delay, chip delay etc.
2. Memory Bandwidth: Memory bandwidth is the ability of the memory to respond to requests per unit time. It depends on the memory system organization, e.g., the number of memory modules. The offered request rate is the rate at which the processor would submit memory requests if memory had unlimited bandwidth. The offered request rate and the maximum memory bandwidth together determine the maximum achieved memory bandwidth.
3. Turnaround Time: Another attribute of program performance is the turnaround time, and it includes input and output tasks, compile time, OS time etc. To reduce the turnaround time, these time factors must be reduced.
:
The basic system attributes of performance are:
1. Clock rate and CPI: Let F_i be the frequency of type-i instructions in a program. Then

Average CPI = CPI_1 F_1 + CPI_2 F_2 + ... + CPI_n F_n, where F_i = IC_i / Instruction Count

and CPU time = Instruction Count x CPI x Cycle time.

2. MIPS rate: MIPS = Clock rate / (CPI x 10^6)

3. Throughput rate: defines the volume of information (number of programs) processed per unit of time.
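These relations can be checked with a small script. A sketch (Python; the instruction mix and clock rate are invented for illustration):

```python
def average_cpi(mix):
    """mix: (instruction_count, cpi) pairs, one per instruction type."""
    ic = sum(n for n, _ in mix)
    return sum(n * cpi for n, cpi in mix) / ic

mix = [(45_000, 1), (32_000, 2), (15_000, 2), (8_000, 2)]
clock_hz = 40e6                       # 40 MHz clock

cpi = average_cpi(mix)                # 1.55 for this mix
mips = clock_hz / (cpi * 1e6)         # MIPS = clock rate / (CPI * 10^6)
cpu_time = sum(n for n, _ in mix) * cpi / clock_hz
print(cpi, mips, cpu_time)
```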


3.7 INTERLEAVED MEMORY ORGANIZATION

Main memory is composed of a collection of DRAM memory chips. A number of chips can be grouped together to form a memory bank. It is possible to organize memory banks in a structure known as interleaved memory. Interleaving is a technique for compensating for the relatively slow speed of DRAM by arranging memory so that contiguously addressed words reside in different banks. The CPU can then access alternative sections immediately, without waiting for memory to catch up (through wait states): the memory banks take turns supplying data, so while one bank completes its cycle the next can already begin delivering the following word. Interleaved memory is thus one technique for matching a fast processor to relatively slow memory (Fig. 3.21).

[Fig. 3.21: (a) memory layout, cell i is in bank i mod 4; (b) elements fetched from an 8 x 8 array when the stride is 9.]

Fig. 3.21 Interleaved Memory.

• Accesses to different memory modules can occur in parallel, but accesses to addresses located in the same module must be serialized.
• Since m is a power of two, X mod m (for an address X) determines the memory module to be referenced, using the lower-order bits of the address. This is called low-order interleaving.
• Memory addresses can also be mapped to memory modules by higher-order interleaving, in which the upper bits of the memory address define a module and the lower bits define a word within the module.
• In higher-order interleaving most of the references tend to remain in a particular module, whereas in low-order interleaving the references tend to be distributed across all the modules.
• Low-order interleaving provides better memory bandwidth, whereas higher-order interleaving can be used to increase the reliability of the memory system by reconfiguring the system.
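The two address mappings are one line each. A sketch (Python; m = 4 modules and the function names are mine, chosen for illustration):

```python
def low_order(addr, m):
    """Low-order interleaving: low bits pick the module."""
    return addr % m, addr // m        # (module, word within module)

def high_order(addr, words_per_module):
    """High-order interleaving: high bits pick the module."""
    return addr // words_per_module, addr % words_per_module

m, words_per_module = 4, 16
# consecutive addresses spread across all modules...
print([low_order(a, m)[0] for a in range(8)])
# -> [0, 1, 2, 3, 0, 1, 2, 3]
# ...but stay inside one module under high-order interleaving
print([high_order(a, words_per_module)[0] for a in range(8)])
# -> [0, 0, 0, 0, 0, 0, 0, 0]
```

The printed module sequences show directly why low-order interleaving spreads sequential references (good bandwidth) while high-order interleaving confines them to one module.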

Memory System Design

In designing a memory system there are various questions to be answered. The basic design steps are as follows:
1. Determine the number of memory modules and how the memory system is partitioned.
2. Determine the offered bandwidth (already described earlier).
3. Decide which interconnection network should be used, because the physical delay through the network plus the delays due to network contention may cause reduced bandwidth and increased access time. A high-performance time-multiplexed bus or crossbar switch can reduce contention but increases cost.
4. Evaluate the memory model: in order to support the achieved bandwidth, an assessment of the achieved bandwidth, the actual memory access time and the queuing in the memory system is required.
3.8 PROCESSOR MEMORY INTERACTION [MDU: Dec 2008, 11, 12]


In systems with multiple processors or with complex single processors, requests may congest the memory system. Either multiple requests may occur at the same time, producing bus or network congestion, or requests arising from different sources may compete for access to the memory system. Requests that cannot be immediately honored by the memory system result in memory system contention. This contention degrades the bandwidth that it is possible to achieve from the memory system.

In the simplest possible arrangement, a single simple processor makes a request to a single memory module. The processor ceases activity (as with a blocking cache) and waits for service from the module. When the module responds, the processor resumes activity. Under such an arrangement, the results are completely predictable: there can be no contention of the memory system, since only one request is made at a time to the memory module. Now suppose we arrange to have n simple processors access m independent modules. Contention develops when multiple processors access the same module, and contention results in a reduced average bandwidth available to each of the processors. Asymptotically, a processor with a nonblocking cache making n requests to the memory system during a memory cycle resembles, from a modeling point of view, the n-processor, m-module memory system. But in modern systems, processors are usually buffered from the memory system. Whether or not a processor is slowed down by memory or bus contention during cache access depends on the cache design and the service rate of the processors that share the same memory system.

Nature of Processor

Processors are classified in 3 categories:
• Simple Processor: makes a single request and waits for the response from memory.
• Pipelined Processor: makes multiple requests, from various buffers, in each memory cycle.
• Multiple Processors: each processor makes a request once every memory cycle.

To represent the bandwidth available from the memory system, 2 symbols are used:
B(m) or B(m, n): the number of requests that are serviced per memory cycle,

where m is the number of modules and n is the number of requests each cycle; T_c is the cycle time and the service time is T_s = T_c.
B_w: the number of requests serviced per second, B_w = B(m)/T_s.

3.8.1 Hellerman's Model

(a) This model was developed by Hellerman and is known as one of the best memory models.
(b) In this model Hellerman assumes a single sequence of addresses.
(c) Bandwidth is determined by the average length of a conflict-free sequence of addresses, i.e., addresses are examined until a match between 2 addresses occurs in the w low-order bit positions, where w = log2 m and m is the number of modules.
(d) The modeling assumption is that no address queue is present and no out-of-order requests are possible.
(e) Under these conditions the maximum available bandwidth is found to be approximately

B(m) = sqrt(m) and B_w = sqrt(m)/T_s
3.8.2 Strecker's Model

This model was developed by Strecker. The model assumptions are:
(a) n simple processor requests are made per memory cycle and there are m modules.
(b) There is no bus contention.
(c) Requests are random and uniformly distributed, i.e., the probability of any one request going to a particular module is 1/m.
(d) Any busy module serves 1 request.
(e) All unserviced requests are dropped each cycle; there are no queues.

Model Analysis:
The bandwidth B(m, n) is the average number of memory requests serviced per memory cycle. This equals the average number of memory modules busy during each memory cycle.

Probability that a module is not referenced by one processor = 1 - 1/m
Probability that a module is not referenced by any processor = (1 - 1/m)^n
Probability that a module is busy = 1 - (1 - 1/m)^n

So B(m, n) = m[1 - (1 - 1/m)^n]

Thus the achieved bandwidth is less than the theoretical value due to contention, and neglecting the congestion carried over cycles results in an optimistic value for the bandwidth.
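Both bandwidth formulas, plus a small Monte Carlo check of Strecker's drop-unserviced-requests assumption, fit in a few lines. A sketch (Python; the module and processor counts are illustrative):

```python
import math
import random

def hellerman(m):
    """Hellerman's approximation: B(m) is about sqrt(m)."""
    return math.sqrt(m)

def strecker(m, n):
    """Strecker: B(m, n) = m * (1 - (1 - 1/m)**n)."""
    return m * (1 - (1 - 1 / m) ** n)

def simulate(m, n, cycles=50_000, seed=1):
    """Each cycle: n uniform random requests; a busy module serves one
    request and the rest are dropped (no queues), as Strecker assumes."""
    rng = random.Random(seed)
    busy = sum(len({rng.randrange(m) for _ in range(n)})
               for _ in range(cycles))
    return busy / cycles

m, n = 8, 8
print(hellerman(m))    # about 2.83
print(strecker(m, n))  # about 5.25
print(simulate(m, n))  # lands close to the Strecker value
```

The simulation counts distinct modules hit per cycle, which is exactly the "busy modules" quantity the closed-form Strecker expression averages.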

3.8.3 Rau's Model

1. Rau's model gives a solution for B(m, n), i.e., the average number of busy modules:

B(m, n) = [sum_{i=0}^{l} 2^i c(m - 1, i) c(n - 1, i)] / [sum_{i=0}^{l} 2^i c(m - 1, i) c(n - 1, i)/(i + 1)]

where c(m - 1, i) is the number of ways of choosing i objects out of a set of m - 1 objects (similarly for c(n - 1, i)) and l = min(m, n).

2. Rau's model is based on the analytic approximation that if all the processors are not serviced at a given memory module, then they are queued at the remaining m - 1 modules with, approximately, precisely the same distribution that would occur in a system consisting of the same number of processors and m - 1 memory modules.

Numerical:

Example: Consider a 2-level memory hierarchy M_1 and M_2. Let the hit ratio of M_1 be h. Let C_1 and C_2 be the costs per KByte, S_1 and S_2 the memory capacities, and t_1 and t_2 the access times, respectively.
(a) Under what conditions will the average cost of the entire memory system approach C_2?
(b) What is the effective memory access time t_a of this hierarchy?
(c) Let r = t_2/t_1 be the speed ratio of the 2 memories and let e = t_1/t_a be the access efficiency of the memory system. Express e in terms of r and h. What hit ratio is required to achieve e > 0.95 if r = 100?

Solution:
(a) Average cost C = (C_1 S_1 + C_2 S_2)/(S_1 + S_2). We have to find the condition under which the average cost of the entire memory system approaches C_2. When S_2 >> S_1,

C ≈ C_2 S_2 / S_2 = C_2

hence when S_2 >> S_1, C approaches C_2.

(b) Effective memory access time t_a = h t_1 + (1 - h) t_2

(c) e = t_1/t_a = t_1/(h t_1 + (1 - h) t_2) = 1/(h + r(1 - h))

For e > 0.95 with r = 100:

1/(h + 100(1 - h)) > 0.95
0.95(h + 100 - 100h) < 1
0.95h + 95 - 95h < 1
94.05h > 94
h > 94/94.05 ≈ 0.9995

Thus a hit ratio of at least about 0.9995 is required.
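Part (c) can be verified numerically. A sketch (Python):

```python
def access_efficiency(h, r):
    """e = t1 / ta = 1 / (h + r * (1 - h)) for a two-level hierarchy."""
    return 1.0 / (h + r * (1.0 - h))

r = 100
for h in (0.95, 0.99, 0.9995):
    print(h, access_efficiency(h, r))
# efficiency stays near 0.5 even at h = 0.99;
# only h around 0.9995 reaches the 0.95 target
```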
Example: Consider a 2-level memory hierarchy M_1 and M_2 with access times t_1 and t_2, costs per byte C_1 and C_2, and capacities S_1 and S_2, respectively. The cache hit ratio at the first level is h_1 = 0.95.
(a) Derive a formula for the effective access time of this memory system.
(b) Derive a formula showing the total capacity of the memory system.

3.9 CONCLUSIONS
1. Physical memory is organized into fixed-size words accessed through a controller.
2. The controller can use byte addressing when communicating with a processor and word addressing when communicating with a physical memory.
3. To avoid extra address arithmetic, powers of two are used for the address space size in bytes.
4. Memory banks are an alternative to a single memory with a single controller: the processor connects to multiple controllers and each controller connects to a separate physical memory. Controllers and memories can all operate simultaneously.
5. Interleaving is
- related to memory banks
- transparent to the programmer
- places consecutive bytes in separate physical memories
- uses the low-order bits of the address to choose the module, known as N-way interleaving (N is the number of physical memories)

[Fig. 3.22 shows memory requests passing through an interface that spreads consecutive addresses 0 to 11 across Module 0 to Module 3.]

Fig. 3.22 Memory Interleaving.

6. Main Memory Capacities
- Main memory capacity is determined by the DRAM chips.
- Multiple banks (or bigger chips) are used to increase memory capacity.
- Bandwidth is determined by the memory word width.
- Memory words are typically the same width as the bus.
- Peak memory bandwidth is usually one word per bus cycle.
7. The advantage of SRAM is its high speed; its disadvantages are high power consumption and heat.
8. DRAM is an alternative to SRAM. DRAM consumes less power but needs refreshing; information is lost if refreshing is not done at the proper time.
9. When information is stored at different levels of the memory hierarchy, three properties must be satisfied: inclusion, coherence and locality.
10. Locality of reference is of three types: temporal, spatial and sequential.
11. The main reasons for locality are predictability, the structure of the program and linear data structures.
12. Hit ratio is defined as hits / (hits + misses).
13. Access time, memory bandwidth and turnaround time are the basic parameters that determine memory system performance.
14. There are 3 types of processors: simple, pipelined and multiple processors.
15. Hellerman's model, Strecker's model and Rau's model are the three models of simple processor memory interaction.
16. The levels of the memory hierarchy are shown in Fig. 3.23: CPU registers at the top; temporary storage areas (Level 1 and Level 2 caches, physical RAM and virtual memory); and permanent storage areas (ROM/BIOS, removable drives, network/internet storage, hard drive), fed by input sources such as keyboard, mouse, removable media, camera/mic and remote sources.

Fig. 3.23 Memory Hierarchy.


EXERCISE

Q.1. If an 8-way set-associative cache is made up of 32-bit words, 4 words per line and 4096 sets, how big is the cache in bytes?
Solution: We convert words/line to bytes/line: 4 bytes/word x 4 words/line = 16 bytes/line.
Cache size = 16 x 8 x 4096 = 512K bytes.
Q.2. What is the shortest time it would take to load a complete line in the above cache using fast page mode DRAM that has a RAS access time of 50 ns, a CAS access time of 13 ns, a cycle time of 95 ns and a fast page mode cycle time of 35 ns?
Solution: We need a full RAS access to get the first word, then fast page mode cycles for the remaining words of the line: 50 + 35 + 35 + 35 = 155 ns.
Q.3. What is the shortest time it would take to load a complete line in the above cache using EDO DRAM that has a RAS access time of 50 ns, a CAS access time of 13 ns, a cycle time of 84 ns and an EDO cycle time of 20 ns?

Solution: We achieve the fastest time by using the EDO (hyper page mode) cycle to retrieve subsequent words after the first. A cache line is 4 words, so we need a full RAS access to get the first word and EDO cycles for the rest: 50 + 20 + 20 + 20 = 110 ns.
Q.4. What is the shortest time it would take to load a complete line in the above cache using SDRAM that requires 5 clock cycles from RAS to the first data out and is clocked at 100 MHz?
Solution: We achieve the fastest time by using synchronous access for the words after the first, but we need a full RAS access to get the first word. The minimum time is 5 clocks for the first word plus 1 clock per subsequent word. A line is 4 words, so, clocked at 100 MHz (10 ns per clock), we get 50 + 10 + 10 + 10 = 80 ns.
Q.5. If a memory system consists of a single external cache with an access time of 20 ns and a hit ratio of 0.92, and a main memory with an access time of 60 ns, what is the effective memory access time of the system?
Solution: t_eff = 20 + (0.08)(60) = 24.8 ns
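Questions 2 to 4 follow one pattern: a full RAS access for the first word, then the faster page-mode, EDO or synchronous cycle for the rest of the line. A sketch (Python):

```python
def line_fill_ns(first_word_ns, burst_cycle_ns, words=4):
    """Time to fill a line: one full first access, then fast cycles."""
    return first_word_ns + (words - 1) * burst_cycle_ns

print(line_fill_ns(50, 35))   # fast page mode DRAM: 155 ns
print(line_fill_ns(50, 20))   # EDO DRAM:            110 ns
print(line_fill_ns(50, 10))   # SDRAM at 100 MHz:     80 ns
```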
Q.6. We now add virtual memory to the system described in question 5. The TLB is implemented internal to the processor chip and takes 2 ns to do a translation on a TLB hit. The TLB hit ratio is 98%, the segment table hit ratio is 100% and the page table hit ratio is 50%. What is the effective memory access time of the system with virtual memory?
Solution:
t_eff = t_TLB + (1 - h_TLB)(t_SEG + t_PAGE + (1 - h_PAGE) t_MAIN) + t_CACHE + (1 - h_CACHE) t_MAIN
t_eff = 2 + 0.02(20 + 20 + 0.5(60)) + 20 + (0.08)(60) = 28.2 ns
This represents a drop in performance of (28.2 - 24.8)/24.8 ≈ 13.7%.
Q.7. Stone says that a simple rule of thumb is that doubling the cache size reduces the miss rate by roughly 30%. Given that the cache in question 5 is 256K bytes, what is the expected percentage improvement in the effective access time if we double the cache size to 512K bytes?
Solution: New miss rate = 0.08 - (0.3)(0.08) = 0.056, t_eff = 20 + (0.056)(60) = 23.36 ns.
Percentage improvement = (24.8 - 23.36)/24.8 = 5.8%.
Q.8. What is the expected percentage improvement in the effective access time over that in the above question if we double the cache size again to 1024K bytes?
Solution: New miss rate = 0.056 - (0.3)(0.056) = 0.039, t_eff = 20 + (0.039)(60) = 22.34 ns.
Percentage improvement = (23.36 - 22.34)/23.36 = 4.4%.
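The chain of answers in questions 5, 7 and 8 can be reproduced with a short loop. A sketch (Python):

```python
def t_eff(cache_ns, miss_rate, mem_ns):
    """Effective access time for a cache backed by main memory."""
    return cache_ns + miss_rate * mem_ns

miss, prev = 0.08, t_eff(20, 0.08, 60)       # 24.8 ns, question 5
for size in ("512K", "1024K"):
    miss *= 0.7                              # doubling cuts misses ~30%
    cur = t_eff(20, miss, 60)
    print(size, round(cur, 2), f"{(prev - cur) / prev:.1%}")
    prev = cur
```

The loop reports 5.8% and about 4.3%; the 4.4% in question 8 comes from rounding the miss rate to 0.039 before computing.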

REVIEW QUESTIONS

Q.1. What do you mean by memory hierarchy? Explain with the help of a diagram.
Q.2. Write a short note on: (a) Hit ratio and miss ratio (b) Access time (c) Turnaround time.
Q.3. Discuss the inclusion, locality and coherence properties of the memory hierarchy.
Q.4. Discuss and compare Hellerman's, Strecker's and Rau's models of simple memory processor interaction.
Q.5. What do you mean by low order memory interleaving and high order memory interleaving?