CH-3 Memory Hierarchy Design
3.1 INTRODUCTION

3.2 MEMORY HIERARCHY
The total memory capacity can be seen as being distributed across a number of levels. Since response time, complexity, and capacity are related, the levels may also be distinguished by their controlling technology. Thus memory technology and the storage organization at each level are characterized by 5 parameters:
1. Access Time: Access time refers to the round-trip time from the CPU to the ith-level memory.
2. Size of Memory: The size of the memory is the number of bytes or words in level i.
3. Cost Per Byte: The cost of the ith-level memory is estimated by the product cost per byte × memory size.
4. Transfer Bandwidth: Transfer bandwidth refers to the rate at which information is transferred between adjacent levels of the memory hierarchy.
5. Unit of Transfer: Unit of transfer refers to the grain size of data transfer between level i and level i + 1.
The various components can be viewed as forming a hierarchy of memories in which each member is a subordinate to the next-highest member of the hierarchy. At the higher levels of the hierarchy, relatively slow magnetic tapes, optical disks etc. are used to store the removable files. Memory devices at the lower level (level 0) are faster to access, smaller in size and more expensive per byte, and have a higher bandwidth and a smaller unit of transfer as compared with those at higher levels.
• For each i, the faster, smaller device at level i serves as a cache for the larger, slower device at level i + 1.
Why do memory hierarchies work?
Because of locality, programs tend to access the data at level i more often than they access the data at level i + 1, so the storage at level i + 1 can be slower, larger and cheaper per bit. The net effect is a large pool of memory that costs as little as the cheap storage near the bottom, but that serves data at the rate of the fast storage near the top.
[Figure: pyramid of levels. L0: CPU registers; L1: on-chip L1 cache (SRAM); L2: off-chip L2 cache (SRAM), which holds cache lines retrieved from memory; L3: main memory (DRAM), which holds disk blocks retrieved from local disks; L4: local secondary storage (local disks), which holds files retrieved from disks on remote network servers; L5: remote secondary storage (distributed file systems, Web servers). Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit toward the bottom.]
Fig. 3.1 Levels of memory hierarchy with increasing capacity and decreasing speed and cost from low to high levels.
[Figure: processor registers: super fast, super expensive, tiny capacity; Level 1 (L1), Level 2 (L2) and Level 3 (L3) CPU caches: faster, expensive, small capacity; random access memory (EDO, SD-RAM, DDR-SDRAM, RD-RAM and more): fast, priced reasonably, average capacity; SSD / flash drive (non-volatile flash-based memory): average speed, priced reasonably, average capacity; mechanical hard drive (file-based memory): slow, cheap, large capacity.]
Fig. 3.2 More detailed view of memory hierarchy (Ryan J. Leng).
Hierarchy Levels of Memory
Registers: A register consists of a group of flip-flops with a common clock input. Registers are commonly used to store binary data; a counter is constructed from two or more flip-flops. During the execution phase, the register transfer operations are directly controlled by the processor.
Cache: The cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations. The cache is controlled by the MMU, and it can be implemented to speed up executable instruction fetch or to hold both executable instructions and data. The data cache is usually organized as a hierarchy of cache levels, L1, L2, etc.
Random access memory (RAM) is a computer's volatile or temporary memory, which exists as chips on the motherboard near the CPU. RAM stores data and programs while they are being used and requires a power source to maintain its integrity. Cache memory is high-speed memory that resides between the CPU and RAM in a computer. Cache memory stores data and instructions that the CPU is likely to need next. The CPU can retrieve data or instructions more quickly from cache than it can from RAM or a disk.
Primary storage (also main memory and physical memory) are terms generally used interchangeably to refer to the memory that is attached directly to the processor. It is larger than the cache and implemented by cost-effective RAM chips. Integrated RAM chips are available in 2 possible modes: static and dynamic.
Static RAM (SRAM)
• In SRAM each cell stores a bit with a six-transistor circuit.
• SRAM retains its value indefinitely, as long as it is kept powered.
To use data that resides in external memory, the CPU must first transfer it to main memory. Searching external memory to find the appropriate data is slow (milliseconds) in relation to CPU speeds, but the transfer of data to main memory is reasonably fast once it has been located. Physical memory is the main memory that has direct/indirect access to the CPU.
[Figure: auxiliary memory (magnetic tapes and magnetic disks) connects through the I/O processor to main memory; cache memory sits between main memory and the CPU.]
Fig. 3.3 Main memory directly communicates with the CPU.
Cache Memory
[Figure: register file and TLB; on-chip L1 caches; on-chip L2/L3 caches; virtual memory; interconnection network; disk files and databases; special-purpose caches; CD-ROM, tape, etc.; other computers and the WWW.]
Basic Architecture: The basic architecture of SRAM includes one or more rectangular arrays of memory cells, with support circuitry to decode addresses and implement the read and write operations. Additional support circuitry is also used to implement special features, such as burst operation, that may also be present on chip.
[Figure: address inputs A1-An, clock CLK, control signals WE and CS, data inputs D1-Dn, the memory array, an output register or latch, and data outputs DQ.]
Fig. 3.6 Block diagram of SRAM.
SRAM memory arrays are arranged in rows and columns of memory cells. Each cell has a unique location defined by the intersection of a row and a column, and the number of rows and columns is determined by the size of the memory. The rows and columns of the array are connected by wordlines and bitlines respectively; their number determines the number of words the memory holds and the rate at which it can operate. A chip select (CS) input selects between memories, and input/output circuitry moves data between the array and the data pins.
[Figure: six-transistor SRAM memory cell, with wordline WL, supply VDD, transistors M1-M4, storage node Q, and the bitline pair BL and BL'.]
Fig. 3.8 SRAM memory cell.
[Figure: array of SRAM-style cells organized by word lines, with write-enable and refresh logic.]
Fig. 3.10 Block diagram of DRAM.
DRAM Architecture
DRAM architecture consists of an array in which the columns are called bitlines and the rows are called wordlines. The intersection of a bitline and a wordline is the address of a memory cell.
Support Circuitry
• Sense Amplifiers: A sense amplifier amplifies the signal or charge detected on the memory cell.
• Address Logic: Address logic is used to select rows and columns.
• Row Address Select (RAS) and Column Address Select (CAS): The RAS and CAS latches resolve the row and column addresses, and initiate or terminate the read/write operation.
• Read/Write Circuitry: RD/WR circuitry stores or reads the information in the memory cell.
• Internal Counters: Internal counters keep track of the refresh sequence or initiate the refresh cycle.
• Output Enable Logic: Output enable logic prevents the data from being driven out unnecessarily.
Classical DRAM Organisation (Square)
[Figure: row address decoder driving the cell array; sense amps, column selector and I/O circuits driven by the column address.]
Fig. 3.11 DRAM organisation.
DRAM Read
1. Row Address Strobe (RAS): RAS is required at the beginning of every operation. It latches the row address and initiates the memory cycle. To enable RAS the voltage transition should be from high to low voltage; RAS is active low and should maintain the low voltage as long as RAS is required. For a complete memory cycle, RAS must be active for a minimum amount of time and also inactive for a minimum amount of time.
2. Column Address Strobe (CAS): CAS is used to choose the column for the read/write operation, which a low voltage initiates. When CAS becomes active before RAS, a refresh cycle is performed.
3. Write Enable (WE): The write-enable signal is used to choose between a read and a write operation. A high-voltage level signifies a read operation; a low-voltage level signifies a write operation.
4. Output Enable (OE): This control signal is used to prevent the data from appearing at the output while a read operation is in progress. The control signal is grounded when a write operation is selected.
[Figure: DRAM chip with control and address inputs, an array of DRAM cells, and a data bus used for both input and output.]
1. Density: Density is the amount of memory a chip holds. Examples: a 1-meg chip holds one megabit of memory and a 4-meg chip holds four megabits of memory. Note: a higher-density chip generates more heat.
2. Latency: Latency is the time that elapses between the start of an operation and its completion.
[Figure: physical memory as an array of words, word 0 at address 0 through word 5 at address 5.]
Fig. 3.13 Physical memory organisation.
The memory transfers words, where a word is a transfer unit of N = 2^i bytes. A larger word size results in higher performance but has higher cost. Note: Architects usually design all parts of a computer to use one size for the memory word and the integer (general-purpose) registers.
Translation between byte and word addresses is performed by an intelligent memory controller: the CPU can use byte addresses, while the physical memory can use word addresses (efficient).
Mathematics of Translation
[Fig. 3.14: processor, controller, physical memory.]
Given a byte address B and N bytes per word, the word address is given by:
W = B / N
The offset is given by:
O = B mod N
Example: Byte address 11 (with N = 4) is found in word 2 at offset 3.
Important Point: When physical memory is organized with N bytes per word and N is a power of two, the translation from a byte address to a word address and offset can be performed by extracting bits: the low-order bits of the byte address give the offset (the remainder) and the remaining high-order bits give the word address (the quotient).

3.4 AUXILIARY MEMORY
Auxiliary memory provides backup storage for a computer system; it is non-volatile. The most common auxiliary memory devices are magnetic disks and magnetic tapes. Another component, magnetic bubble memory, was used previously.
[Figure omitted.]
Fig. 3.15 CD (Secondary Storage Device).
The main characteristics of a memory are:
1. Capacity: Capacity represents the global volume of information (in bits) that the memory can store.
2. Access time: Access time corresponds to the time interval between the read/write request and the availability of the data.
[Table: typical capacity and access time at each level: registers (about 1 ns), cache memories, random access memory, mass/auxiliary memories.]
Types of auxiliary memory are:
1. Flash Memory: An electronic non-volatile computer storage device that offers both read and write access. However, the access times of flash memories are a compromise between those of ROM memories and RAM: longer than the access times of RAM.
[Figure omitted.]
Fig. 3.17 Flash drive.
2. Optical Disc: An optical disc is a storage medium from which data is read, and to which it is written, by lasers. Optical disks can store much more data (up to 6 gigabytes, i.e. 6 billion bytes) than most portable magnetic media, such as floppies. There are three basic types of optical disks: CD-ROM (read-only), WORM (write-once, read-many) and EO (erasable optical disks).
3. Magnetic Disk: A magnetic disk is a circular plate constructed of metal or plastic coated with magnetized material. Both sides of the disk are used, and several disks may be stacked on one spindle with read/write heads available on each surface. Bits are stored on the magnetised surface in spots along concentric circles called tracks. Tracks are commonly divided into sections called sectors. Disks that are permanently attached and cannot be removed by the occasional user are called hard disks. A disk drive with removable disks is called a floppy disk drive.
[Figure: A: track; B: geometrical sector; C: track sector; D: cluster.]
Fig. 3.18 Magnetic disk.
4. Magnetic Tapes: A magnetic tape transport consists of electric, mechanical and electronic components that provide the parts and control mechanism for a magnetic tape unit. The tape itself is a strip of plastic coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along several tracks. Seven or nine bits are recorded together to form a character, with a read/write head mounted in each track so that data can be recorded and read as a sequence of characters.
5. DVD: DVD, also known as Digital Versatile Disc or Digital Video Disc, is a popular optical disc storage media format. Its main uses are video and data storage. Most DVDs are of the same dimensions as compact discs (CDs) but store much more data. Variations of the term DVD often describe the way data is stored on the discs: DVD-ROM holds data that can only be read and not written; DVD-RW and DVD+RW can be rewritten.
Inclusion Property
Inclusion: 1. The action or state of including or of being included within a group or structure: "the inclusion of handicapped students". 2. A person or thing that is included within a larger group or structure.
The same is the concept here; the inclusion property helps in reducing the cache-coherence complexity for multiprocessors with multilevel cache hierarchies. The inclusion property is stated as L1 ⊂ L2 ⊂ ... ⊂ Ln. The set inclusion relationship implies that all information items are originally stored in the outermost level Ln. During processing, subsets of Ln are copied into Ln-1; similarly, subsets of Ln-1 are copied into Ln-2, and so on. If an information word is found in level Li, then copies of the same word can also be found in all upper levels Li+1, Li+2, ..., Ln. However, a word stored in Li+1 may not be found in Li; a word miss in Li implies that it is also missing from all lower levels Li-1, Li-2, ..., L1.
The question arises: what are the units of data transfer at each level of the memory hierarchy?
• Blocks are the units of data transfer between the cache and main memory. The cache and main memory are divided into equal-sized blocks (e.g. 32 bytes, i.e. a few words).
• Pages are the units of data transfer between main memory and disk storage.
[Figure: registers, cache, main memory, disk storage and backup storage, with the unit of transfer between each pair of adjacent levels.]
Fig. 3.20 Units of data transfer in the memory hierarchy.
- hierarchy.
| c cohere
-
hr
insistent If
^£
s , « ££5
is often f,
S 5Sff
For ma,ntaini
, ,, "nl> the must be updated immediately r
dt successive memory levels
-
* C0
herence „
j d ,*
m ir i e
" *
,
|s
the feci 1 ' SUth: Fre<luently used information °
' *e 'ma'' ' .
^
iai all ve :it - Css
IS 0 s Called W neraOry I,; .
“ in, 1 die memory hierarchy
•»
ofi« Parallel lrit
, ' °
~ Methods
llle
" wore at y
S es
icmor/w ' "'Plesi and the n
» are:
""dified , 7''.fc adJ £ a„«„ with osle ,COmmonly used procedure
)
•
me hCh
(
^
, 111
° ^ . Short< this
<’ ri nemory being updated in
’ 2,* • •.
» n
' ^ I,
0
|
which demands immediate
'^
i "**•'* >.- w4Xi
W ^"
^
Locality of Reference
The 90-10 rule: According to Hennessy and Patterson (1990), the 90-10 rule states that a typical program may spend 90% of its execution time on only 10% of the code, such as the innermost loop of a nested looping operation.
There are three types of reference locality: temporal, spatial, and sequential. Each type of locality affects the design of the memory hierarchy. The principle of localities will guide us in the design of cache, main memory, and even virtual memory organization.
1. Temporal Locality: It is assumed that recently referenced items (instructions or data) are likely to be referenced again in the near future. If at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future. Once a loop is entered or a subroutine is called, a small code segment will be referenced repeatedly many times. Temporal locality helps in determining the size of memory at successive levels.
2. Spatial Locality: Spatial locality means items at addresses close to the addresses of recently accessed items will be accessed in the near future. In this case it is common to attempt to guess the size and shape of the area around the current reference for which it is worthwhile to prepare faster access. For example, operations on tables or arrays involve accesses of a certain clustered area in the address space. Program segments, such as routines and macros, tend to be stored in the same neighborhood of the memory space. Spatial locality assists us in determining the size of unit data transfers between adjacent memory levels.
3. Sequential Locality: In typical programs, the execution of instructions follows a sequential order (the program order) unless branch instructions create out-of-order executions. The ratio of in-order execution to out-of-order execution is roughly 5 to 1 in ordinary programs. Besides, the access of a large data array also follows a sequential order.
In the presence of a cache miss, the average access time tav is given by the following:
tav = (m·tc + tm)/m = tc + tm/m
In deriving the above expression, it was assumed that the requested memory element has created a cache miss, thus leading to the transfer of a main memory block, consisting of m elements, in time tm. Following that, m accesses, each taking tc and each for one of the elements constituting the block, were made. The above expression reveals that as the number of elements in a block, m, increases, the average access time decreases, a desirable feature of the memory hierarchy.
If the elements of the block are accessed n times in this repeated fashion, the average access time becomes
tav = (m·tc + tm + (n - 1)·m·tc)/(n·m) = tm/(n·m) + tc
A simplifying assumption to the above expression is to assume that tm = m·tc. In this case the above expression will simplify to the following expression:
tav = tc/n + tc = tc·(n + 1)/n
The above expression reveals that as the number of repeated accesses n increases, the average access time will approach tc. This is a significant performance improvement.
3.6 MEMORY HIERARCHY PERFORMANCE
The performance of a memory hierarchy is determined by the effective access time to any piece of information needed for processing. It depends on the hit ratios and access frequencies at successive levels of the memory hierarchy.
When an information item is found in Li, we call it a hit; otherwise, a miss. The hit ratio hi at Li is the probability that an item will be found in Li; it is a concept defined for any two adjacent levels. The miss ratio at Li is defined as 1 - hi. The hit ratios at successive levels are a function of memory capacities, management policies, and program behavior. Successive hit ratios are independent random variables with values between 0 and 1. The access frequency to Li is defined as
fi = (1 - h1)(1 - h2) ... (1 - hi-1)·hi
This is indeed the probability of successfully accessing Li when there are i - 1 misses at the lower levels and a hit at Li. Note that
Σ (i = 1 to n) fi = 1 and f1 = h1
Due to the locality property, the access frequencies decrease very rapidly from low to high levels; that is, f1 >> f2 >> f3 >> ... >> fn. This implies that the inner levels of memory are accessed more often than the outer levels.
To simplify the future derivation, we assume h0 = 0 and hn = 1, which means the CPU always accesses M1 first and the access to the outermost memory Mn is always a hit.
The effective access time is
Teff = Σ (i = 1 to n) fi·ti   ...(1)
Substituting fi = (1 - h1)(1 - h2) ... (1 - hi-1)·hi into eq. (1), we get
Teff = h1·t1 + (1 - h1)·h2·t2 + (1 - h1)(1 - h2)·h3·t3 + ... + (1 - h1)(1 - h2) ... (1 - hn-1)·tn
3.6.2 Optimization of Memory Hierarchy
The total cost of a memory hierarchy is estimated as follows:
Ctotal = Σ (i = 1 to n) ci·si   (cost per byte × memory size at each level)
This implies that the cost is distributed over n levels. Since c1 > c2 > c3 > ... > cn, we must have s1 < s2 < s3 < ... < sn. The optimal design of a memory hierarchy should result in an effective access time Teff close to the t1 of M1 and a total cost close to the cost of Mn. The design can be viewed as an optimization problem to minimize
Teff = Σ (i = 1 to n) fi·ti
subject to the following constraints:
• si > 0, ti > 0 for i = 1, 2, ..., n
• Ctotal = Σ ci·si < C0, where C0 is a given ceiling on the total cost.
The unit cost ci and capacity si at each level Mi depend on the speed ti required. Therefore, the optimization involves tradeoffs among speed, capacity and cost at all levels.
3.6.3 Memory System Performance Parameters
Two basic parameters that determine memory system performance are:
1. Access Time: The time required by a processor to access data or to write data from and to a memory chip is referred to as access time; i.e., access time is the time taken by a processor request to be transmitted to the memory system, access a datum, and return it back to the processor. Access time depends on physical parameters like bus delay, chip delay etc.
2. Memory Bandwidth: Memory bandwidth is the ability of the memory to respond to requests per unit time. Bandwidth depends on the memory system organization, e.g. the number of memory modules. The offered request rate is the rate at which the processor would submit memory requests if the memory had unlimited bandwidth; the offered request rate and the maximum memory bandwidth determine the maximum achieved memory bandwidth.
Turnaround Time: Another attribute of program performance is the turnaround time, which includes input/output tasks, compile time, etc. To reduce the turnaround time, these time factors must be reduced. System attributes to performance are:
1. Clock rate and CPI: Let F be the clock frequency and CPIi the average cycles per instruction for instructions of type i in a program. Then
Total CPU cycles = Σ (i = 1 to n) ICi × CPIi, where IC is the instruction count
CPU time = Σ (i = 1 to n) ICi × CPIi × cycle time
2. MIPS rate:
MIPS = clock rate / (CPI × 10^6)
3. Throughput rate: The throughput rate defines the volume of information exchanged per unit of time.
3.7 INTERLEAVED MEMORY
Main memory is composed of a collection of DRAM memory chips. A number of chips can be grouped together to form a memory bank, and it is possible to organize the memory banks in a way known as interleaved memory. Interleaving is a technique for compensating for the relatively slow speed of DRAM by arranging memory addresses non-contiguously across banks, so that contiguous reads and writes use each bank in turn. The CPU can access an alternative section immediately, without waiting for a busy bank: the memory banks take turns supplying data (Fig. 3.21).
[Figure: (a) four banks M0-M3 and the memory layout, with addresses 0 1 2 3 in the first row, 4 5 6 7 in the second, 8 ... (cell i is in bank i mod 4); (b) elements fetched from an 8 × 8 array when the stride is 9.]
Fig. 3.21 Interleaved memory.
• Accesses to different memory modules can occur in parallel, but accesses to addresses located in the same module must be serviced one at a time.
• Since m is a power of two, X mod m results in the memory module to be referenced being determined by the low-order bits of the memory address X. This is called low-order interleaving.
• Memory can also be addressed by higher-order interleaving: the upper bits of the memory address define a module and the lower bits define a word in the module.
• In higher-order interleaving most of the references tend to remain in a particular module, whereas in low-order interleaving the references tend to be distributed across all the modules.
• Low-order interleaving provides better memory bandwidth, whereas higher-order interleaving can be used to increase the reliability of the memory system by reconfiguring the system around a failed module.
3.8 MODELS OF SIMPLE PROCESSOR-MEMORY INTERACTION
In systems with multiple processors or with complex single processors, requests may congest the memory system. Either multiple requests may occur at the same time, producing bus or network congestion, or requests arising from different sources may request access to the same memory module. Requests that cannot be immediately honored by the memory system result in memory system contention. This contention degrades the bandwidth that it is possible to achieve from the memory system.
In the simplest possible arrangement, a single simple processor makes a request to a single memory module. The processor ceases activity (as with a blocking cache) and waits for service from the module. When the module responds, the processor resumes activity. Under such an arrangement, the results are completely predictable: there can be no contention of the memory system, since only one request is made at a time to the memory module. Now suppose we arrange to have n simple processors access m independent modules. Contention develops when multiple processors access the same module, and it results in a reduced average bandwidth available to each of the processors. Asymptotically, a processor with a nonblocking cache making n requests to the memory system during a memory cycle resembles, from a modeling point of view, the n-processor, m-module memory system. But in modern systems, processors are usually buffered from the memory system. Whether or not a processor is slowed down by memory or bus contention during cache access depends on the cache design and the service rate of the processors that share the same memory system.
Nature of Processors
Processors are classified in 3 categories:
• Simple Processor: A simple processor makes a single request and waits for the response from memory.
• Pipelined Processor: A pipelined processor makes multiple requests, for various buffers, in each memory cycle.
• Multiple Processors: Multiple processors each make a request once every memory cycle.
To represent the bandwidth available from the memory system, 2 symbols are used.
• B(m) or B(m, n): the number of requests serviced in each memory cycle, where m is the number of modules and n is the number of requests each cycle. Tc is the cycle time and Ts = Tc is the service time.
• B(w): the number of requests serviced per second, B(w) = B(m)/Ts.
3.8.1 Hellerman's Model
(a) This model was developed by Hellerman and is known as one of the best memory models.
(b) In this model Hellerman assumes a single sequence of addresses.
(c) Bandwidth is determined by the average length of a conflict-free sequence of addresses; i.e., addresses are examined until a match between 2 addresses occurs in the w low-order bit positions, where w = log2 m and m is the number of modules.
(d) The modeling assumption is that no address queue is present and no out-of-order requests are possible.
(e) Under these conditions the maximum available bandwidth is found to be approximately
B(m) = √m and B(w) = √m / Ts
3.8.2 Strecker's Model
Strecker's model estimates B(m, n) under the following assumptions:
(a) Each processor generates one request per memory cycle.
(b) The requests are random and uniformly distributed across the m modules.
(c) Any busy module serves 1 request.
(d) All unserviced requests are dropped each cycle and there are no queues.
Model Analysis:
• Bandwidth B(m, n) is the average number of memory requests serviced per memory cycle, which equals the average number of memory modules busy in each memory cycle.
• Prob. that a module is not referenced by one processor = (1 - 1/m)
• Prob. that a module is not referenced by any processor = (1 - 1/m)^n
• Prob. that a module is busy = 1 - (1 - 1/m)^n
So B(m, n) = m·[1 - (1 - 1/m)^n]
Neglecting the congestion carried over between cycles results in an optimistic value; the achieved bandwidth is less than this theoretical value due to contention.
3.8.3 Rau's Model
Rau's model gives a solution for B(m, n) (i.e., the average number of requests serviced per memory cycle) as a ratio of two binomial sums:
B(m, n) = [Σ (i ≥ 0) 2^i × c(m - 1, i) × c(n - 1, i)] / [Σ (i ≥ 0) 2^i × c(m - 1, i) × c(n - 1, i)/(i + 1)]
where c(m - 1, i) is the number of ways of choosing i objects out of a set of m - 1 objects, and c(n - 1, i) is defined similarly for n - 1 objects.
Numerical:
Example: Consider a 2-level memory hierarchy M1 and M2. Let the hit ratio of M1 be h. Let C1 and C2 be the costs per KByte, S1 and S2 the memory capacities, and t1 and t2 the access times respectively.
(a) Under what conditions will the average cost of the entire memory system approach C2?
(b) What is the effective memory access time ta of this hierarchy?
(c) Let r = t2/t1 be the speed ratio of the 2 memories and let e = t1/ta be the access efficiency of the memory system. What is required of h to make e >= 0.95 if r = 100?
Solution:
(a) Average cost C = (C1·S1 + C2·S2)/(S1 + S2)
We have to find the condition under which the average cost of the entire memory system approaches C2. When S2 >> S1,
C ≈ C2·S2/S2 = C2
Hence when S2 >> S1, C approaches C2.
(b) Effective memory access time ta = h·t1 + (1 - h)·t2.
(c) From r = t2/t1 and e = t1/ta:
e = t1/(h·t1 + (1 - h)·t2) = 1/(h + (1 - h)·r)
For e >= 0.95 with r = 100:
1/(h + 100(1 - h)) >= 0.95
=> 0.95·(h + 100 - 100h) <= 1
=> 95h + 9500 - 9500h <= 100
=> 9405h >= 9400
=> h >= 9400/9405 ≈ 0.9995
Thus the hit ratio must be at least about 0.9995.
Example: Consider a 2-level memory hierarchy M1 and M2 with access times t1 and t2, costs per byte C1 and C2, and capacities S1 and S2 respectively. The cache hit ratio at the first level is h1 = 0.95.
(a) Derive ...
3.9 CONCLUSIONS
1. Physical memory is organized into fixed-size words accessed through a controller.
2. The controller can use byte addressing when communicating with a processor and word addressing when communicating with a physical memory.
3. To avoid division arithmetic, use powers of two for the address-space size and the number of bytes per word.
4. Memory banks are an alternative to a single memory and single controller: the processor connects to multiple controllers, and each controller connects to a separate physical memory.
5. Controllers and memories can all operate simultaneously.
6. Interleaving is related to memory banks and is transparent to the programmer.
[Figure: requests arriving at a memory interface that distributes them over modules 0-3; consecutive addresses 0-11 are spread across the modules.]
13. Access time, memory bandwidth and turnaround time are the basic parameters that determine memory system performance.
14. There are 3 types of processors: simple, pipelined and multiple processors.
15. Hellerman's model, Strecker's model and Rau's model are the three models of simple processor-memory interaction.
16. Levels of memory hierarchy are:
[Figure: CPU register at the top; temporary storage areas (Level 1 and Level 2 caches, physical RAM, virtual memory); permanent storage areas (ROM/BIOS, removable drives, network/Internet storage, hard drive, other devices).]
EXERCISE

Q.1. If an 8-way set-associative cache is made up of 32-bit words, 4 words per line and 4096 sets, how big is the cache in bytes?
Solution: We convert words/line to bytes/line: 4 bytes/word × 4 words/line = 16 bytes/line.
Cache size is KLN = 16 × 8 × 4096 = 512K bytes.
Q.2. What is the shortest time it would take to load a complete line in the above cache using fast page mode DRAM that has a RAS access time of 50 ns, a CAS access time of 13 ns, a cycle time of 95 ns and a fast page mode cycle time of 35 ns?

Q.3. What is the shortest time it would take to load a complete line in the above cache using EDO DRAM that has a RAS access time of 50 ns, a CAS access time of 13 ns, a cycle time of 84 ns and an EDO cycle time of 20 ns?
Solution: We use the EDO (hyper page mode) cycle to retrieve the subsequent words after the first: 50 + 20 + 20 + 20 = 110 ns.

Q.4. What is the shortest time it would take to load a complete line in the above cache using synchronous DRAM that requires 5 clock cycles from RAS to the first data out, retrieves subsequent words in 1 clock each, and is clocked at 100 MHz?
Solution: We achieve the fastest time by using a full RAS access to get the first word, with synchronous access to the subsequent words after the first. A line is 4 words, so the minimum time would be 5 clocks for the first word plus 1 clock per subsequent word. Clocked at 100 MHz, we get 50 + 10 + 10 + 10 = 80 ns.
• teff = tTLB + (1 - hTLB)(tSEG + tPAGE + 0.5·tMAIN) + tCACHE + (1 - hCACHE)·tMAIN
• teff = 2 + 0.02(20 + 20 + 0.5(60)) + 20 + (0.08)(60) = 28.2 ns
• This represents a drop in performance of (28.2 - 24.8)/24.8 ≈ 14%.
Q.7. Stone says that a simple rule of thumb is that doubling the cache size reduces the miss rate by roughly 30%. Given that the cache in question 5 is 256K bytes, what is the expected percentage improvement in the effective access time if we double the cache size to 512K bytes?
Solution: New miss rate = 0.08 - (0.3)(0.08) = 0.056, teff = 20 + (0.056)(60) = 23.36 ns.
Percentage improvement is (24.8 - 23.36)/24.8 = 5.8%.

Q.8. What is the expected percentage improvement in the effective access time over that in the above question if we double the cache size again to 1024K bytes?
Solution: New miss rate = 0.056 - (0.3)(0.056) = 0.039, teff = 20 + (0.039)(60) = 22.34 ns.
Percentage improvement is (23.36 - 22.34)/23.36 = 4.4%.
REVIEW QUESTIONS
Q.1. What do you understand by memory hierarchy? Explain with the help of a diagram.
Q.2. Write short notes on: (a) hit ratio and miss ratio, (b) access time, (c) turnaround time.
Q.3. Discuss the inclusion, locality and coherence properties of the memory hierarchy.
Q.4. Discuss and compare Hellerman's, Strecker's and Rau's models of simple memory-processor interaction.
Q.5. What do you mean by low-order interleaving and high-order interleaving?