Unit 2 - Advanced Computer Architecture - WWW - Rgpvnotes.in
COMPARE - Compare numbers.
IN - Input information from a device, e.g., keyboard.
JUMP - Jump to designated RAM address.
LOAD - Load information from RAM to the CPU.
OUT - Output information to a device, e.g., monitor.
STORE - Store information to RAM.
Computers are classified on the basis of the instruction set they have as:
CISC Scalar Processors
CISC (Complex Instruction Set Computer): A CISC-based computer will have shorter programs, which are made up of symbolic machine language. A Complex Instruction Set Computer (CISC) supplies a large number of complex instructions at the assembly-language level. During the early years, memory was slow and expensive and programming was done in assembly language. Since memory was slow and instructions could be retrieved up to 10 times faster from a local ROM than from main memory, programmers tried to put as many instructions as possible in microcode.
RISC Processors
Advantages:
Speed: Since a simplified instruction set allows for a pipelined, superscalar design, RISC processors often achieve higher performance than CISC processors built with comparable technology and running at the same clock rates.
Simpler Hardware: Because the instruction set of a RISC processor is so simple, it uses up much less chip space; extra functions, such as memory management units or floating-point arithmetic units, can also be placed on the same chip. Smaller chips allow a semiconductor manufacturer to place more parts on a single silicon wafer, which can lower the per-chip cost dramatically.
Shorter Design Cycle: Since RISC processors are simpler than corresponding CISC processors, they can be designed more quickly, and can take advantage of other technological developments sooner than corresponding CISC designs, leading to greater leaps in performance between generations.
Difference between CISC and RISC
VLIW Architecture
Very long instruction word (VLIW) describes a computer processing architecture in which a language compiler or pre-processor breaks program instructions down into basic operations that can be performed by the processor in parallel (that is, at the same time). These operations are put into a very long instruction word, which the processor can then take apart without further analysis, handing each operation to an appropriate functional unit.
VLIW is sometimes viewed as the next step beyond the reduced instruction set computing (RISC) architecture, which also works with a limited set of relatively basic instructions and can usually execute more than one instruction at a time (a characteristic referred to as superscalar). The main advantage of VLIW processors is that complexity is moved from the hardware to the software, which means that the hardware can be smaller, cheaper, and require less power to operate. The challenge is to design a compiler or pre-processor that is intelligent enough to decide how to build the very long instruction words. If dynamic pre-processing is done as the program is run, performance may be a concern.
Figure 2.2: A VLIW processor architecture and instruction format
Figure 2.3: Pipeline execution
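The compiler's job described above can be illustrated with a minimal sketch: greedily grouping independent operations into fixed-width long instruction words. This is an illustrative toy, not a real VLIW scheduler; the operation format and function names are assumptions, and the dependence check only considers registers written earlier in the same bundle (it ignores anti-dependences for simplicity).

```python
# Toy sketch of VLIW bundle packing (illustrative names, not a real scheduler).
# Each operation is a (dest, src1, src2) register triple; an operation may not
# share a bundle with an earlier operation that writes a register it touches.

def pack_bundles(ops, slots_per_word=4):
    """Greedily group operations into long instruction words of up to
    slots_per_word mutually independent operations."""
    bundles = []
    current, written = [], set()
    for dest, src1, src2 in ops:
        independent = not ({dest, src1, src2} & written)
        if current and (len(current) >= slots_per_word or not independent):
            bundles.append(current)          # emit the current long word
            current, written = [], set()
        current.append((dest, src1, src2))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r5", "r6"),   # independent of the first -> same bundle
    ("r7", "r1", "r4"),   # reads r1 and r4 -> must start a new bundle
]
print(pack_bundles(program))
```

The point of the sketch is that all of this grouping happens before execution, so the hardware can dispatch each slot of a bundle directly to a functional unit without dependence analysis.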
Memory Hierarchy
The total memory capacity of a computer can be visualized as a hierarchy of components. The memory hierarchy system consists of all storage devices contained in a computer system, from the slow auxiliary memory to the faster main memory and to the smaller, still faster cache memory. Auxiliary memory access time is generally 1000 times that of the main memory; hence it is at the bottom of the hierarchy.
The main memory occupies the central position because it is equipped to communicate directly with the CPU and with auxiliary memory devices through the input/output (I/O) processor.
When a program not residing in main memory is needed by the CPU, it is brought in from auxiliary memory. Programs not currently needed in main memory are transferred into auxiliary memory to provide space in main memory for other programs that are currently in use.
The cache memory is used to store program data which is currently being executed in the CPU. The approximate access time ratio between cache memory and main memory is about 1 to 7-10.
3. Main memory or RAM (Random Access Memory): It is a type of computer memory and is a hardware component. It can be increased provided the operating system can handle it. Typical PCs these days use 8 GB of RAM. It is slower to access than cache.
4. Hard disk: A hard disk is a hardware component in a computer. Data is kept permanently in this memory. Memory on the hard disk is not directly accessed by the CPU; hence it is slower. As compared with RAM, the hard disk is cheaper per bit.
5. Magnetic tape: Magnetic tape memory is usually used for backing up large data. When the system needs to access a tape, the tape is first mounted to access the data, and un-mounted after the data has been accessed. Memory access is slower on magnetic tape, and it usually takes a few minutes to access a tape.
Figure 2.5: The inclusion property and data transfer between adjacent levels
The following three principles lead to an effective implementation of the memory hierarchy for a system:
1. Make the Common Case Fast: This principle says that the data which is more frequently used should be kept in the faster device. It is based on a fundamental law, called Amdahl's Law, which states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used. Thus, if the faster mode serves relatively infrequently used data, then most of the time the faster device will not be used, and the speedup achieved will be less than if the faster device were used more frequently.
2. Principle of Locality: It is a very common trend of programs to reuse data and instructions that were used recently. From this observation comes an important program property called locality of references: the instructions and data a program will use in the near future can be predicted from its accesses in the recent past. A famous 90/10 rule that comes from empirical observation is:
"A program spends 90% of its time in 10% of its code"
These localities can be categorized into three types:
a. Temporal locality: states that data items and code that are recently accessed are likely to be accessed in the near future. Thus, if location M is referenced at time t, then it (location M) will be referenced again at some time t + Δt.
b. Spatial locality: states that items tend to reside in proximity in the memory, i.e., the items whose addresses are near to each other are likely to be referenced together in time. Thus we can say memory accesses are clustered with respect to the address space. Thus, if location M is referenced at time t, then another location M ± Δm will be referenced at time t + Δt.
c. Sequential locality: Programs are stored sequentially in memory, and these programs normally have a sequential trend of execution. Thus we say instructions are stored in memory in certain array patterns and are accessed sequentially, one memory location after another. Thus, if location M is referenced at time t, then locations M+1, M+2, … will be referenced at times t + Δt, t + Δt′, etc. In each of these patterns, both Δm and Δt are "small." H&P suggest that 90 percent of the execution time in most programs is spent executing only 10 percent of the code. One of the implications of locality is that data and instructions should have separate data and instruction caches. The main advantage of separate caches is that one can fetch instructions and operands simultaneously. This concept is the basis of the design known as the Harvard architecture, after the Harvard Mark series of electromechanical machines, in which the instructions were supplied by a separate unit.
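The difference between good and poor spatial locality can be sketched with two traversals of the same 2-D array. In Python the effect on the cache is hidden by the interpreter, so this is only an illustration of the access pattern; in a language like C, the row-by-row version touches consecutive addresses (good spatial locality) while the column-by-column version strides across the address space:

```python
# Two traversals with identical results but different access patterns.

def row_major_sum(matrix):
    total = 0
    for row in matrix:                 # consecutive elements: good spatial locality
        for x in row:
            total += x
    return total

def column_major_sum(matrix):
    total = 0
    rows, cols = len(matrix), len(matrix[0])
    for j in range(cols):              # each access jumps a full row ahead
        for i in range(rows):
            total += matrix[i][j]
    return total

m = [[1, 2], [3, 4]]
print(row_major_sum(m), column_major_sum(m))   # both 10
```

Both loops also exhibit temporal locality on the accumulator `total`, which a compiler would keep in a register.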
3. Smaller is Faster: Smaller pieces of hardware will generally be faster than larger pieces.
Together, the above principles suggest that one should try to keep recently accessed items in the fastest memory.
While designing the memory hierarchy, the following points are always considered.
Inclusion property: If a value is found at one level, it should be present at all of the levels below it. The implication of the inclusion property is that all items of information in the "innermost" memory level (cache) also appear in the outer memory levels. The inverse, however, is not necessarily true. That is, the presence of a data item in level Mi+1 does not imply its presence in level Mi. We call a reference to a missing item a "miss."
The Coherence Property
The value of any data item should be consistent at all levels. The coherence property is, of course, never completely true at every instant, but it does represent a desired state. That is, as information is modified by the processor, copies of that information should be placed in the appropriate locations in outer memory levels. The requirement that copies of data items at successive memory levels be consistent is called the "coherence property."
Coherence Strategies
Write-through
As soon as a data item in Mi is modified, an immediate update of the corresponding data item(s) in Mi+1, Mi+2, … Mn is required. This is the most aggressive (and expensive) strategy.
Write-back
The data item in Mi+1 corresponding to a modified item in Mi is not updated until it (or the block/page/etc. in Mi that contains it) is replaced or removed. This is the most efficient approach, but it cannot be used (without modification) when multiple processors share Mi+1, …, Mn.
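The two strategies can be contrasted in a minimal sketch of a one-level cache over a backing store. The class and method names are illustrative, not a real cache API, and replacement policy is ignored:

```python
# Sketch of the two coherence strategies (illustrative names only).

class WriteThroughCache:
    def __init__(self, backing):
        self.backing = backing          # dict standing in for M_{i+1}
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.backing[addr] = value      # update M_{i+1} immediately

class WriteBackCache:
    def __init__(self, backing):
        self.backing = backing
        self.lines = {}
        self.dirty = set()

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)            # defer the update to M_{i+1}

    def evict(self, addr):
        if addr in self.dirty:
            self.backing[addr] = self.lines[addr]   # write back on replacement
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

memory = {}
wt = WriteThroughCache(memory)
wt.write(0x10, 7)
print(memory[0x10])                    # visible in memory at once

memory2 = {}
wb = WriteBackCache(memory2)
wb.write(0x10, 7)
print(0x10 in memory2)                 # memory still stale until eviction
wb.evict(0x10)
print(memory2[0x10])                   # updated only on eviction
```

The sketch makes the trade-off concrete: write-through pays a backing-store update on every write, while write-back batches updates but leaves the outer level temporarily inconsistent, which is exactly why it needs extra machinery when Mi+1 is shared.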
Memory Capacity Planning:
The performance of a memory hierarchy is determined by the effective access time (Teff) to any level in the hierarchy. It depends on the hit ratios and access frequencies at successive levels.
Hit Ratio (h): This is a concept defined for any two adjacent levels of a memory hierarchy. When an information item is found in Mi, it is a hit; otherwise, it is a miss. The hit ratio (hi) at Mi is the probability that an information item will be found in Mi. The miss ratio at Mi is defined as 1 - hi.
The access frequency to Mi is defined as
fi = (1-h1)(1-h2)…(1-hi-1)hi
Effective Access Time (Teff):
In practice, we wish to achieve as high a hit ratio as possible at M1. Every time a miss occurs, a penalty must be paid to access the next higher level of memory. The Teff of an n-level memory hierarchy is given by:
Teff = Σ (i = 1 to n) fi·ti
where ti is the access time of level Mi.
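These definitions can be checked numerically. The hit ratios and access times below are illustrative; the last level is given a hit ratio of 1, since a request that misses everywhere else must be satisfied there:

```python
# Effective access time Teff = sum_i f_i * t_i, where
# f_i = (1-h1)(1-h2)...(1-h_{i-1}) * h_i is the access frequency of level i.

def access_frequencies(hit_ratios):
    freqs, miss_so_far = [], 1.0
    for h in hit_ratios:
        freqs.append(miss_so_far * h)   # reach level i only after missing above
        miss_so_far *= (1.0 - h)
    return freqs

def effective_access_time(hit_ratios, access_times):
    return sum(f * t for f, t in
               zip(access_frequencies(hit_ratios), access_times))

# Cache: 95% hit ratio at 1 ns; main memory (last level, h = 1) at 10 ns.
print(effective_access_time([0.95, 1.0], [1.0, 10.0]))   # 0.95*1 + 0.05*10 = 1.45
```

Note how heavily Teff depends on the first-level hit ratio: dropping h1 from 0.95 to 0.90 raises Teff from 1.45 ns to 1.9 ns with the same hardware.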
Hierarchy Optimization:
The total cost of a memory hierarchy is estimated as:
Ctotal = Σ (i = 1 to n) ci·si
where ci is the cost per byte and si the capacity of level Mi.
Note that this means consecutive addresses are stored within the same module, except at the boundary. The above arrangement is called high-order interleaving, because it uses the high-order, i.e. most significant, bits of the address to determine which module the word is stored in.
Low-Order Interleaving
An alternative would be to use the low bits for that purpose. In our example here, for instance, this would entail feeding bus lines A0-A1 into the decoder, with bus lines A2-A27 being tied to the address pins of the memory modules. This would mean the following storage pattern:
In other words, consecutive addresses are stored in consecutive modules, with the understanding that this is mod 4, i.e. we wrap back to M0 after M3.
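The two decodings can be sketched directly. The module count and module size below are illustrative (four modules, as in the example above):

```python
# Address decoding for 4-way interleaving.
# Low-order: module number = low address bits, so consecutive addresses
#            land in consecutive modules (mod m).
# High-order: module number = high address bits, so consecutive addresses
#             stay in one module until its boundary is crossed.

M = 4             # number of modules
WORDS = 16        # words per module (illustrative size)

def low_order(addr):
    return addr % M, addr // M          # (module, offset within module)

def high_order(addr):
    return addr // WORDS, addr % WORDS  # (module, offset within module)

print([low_order(a)[0] for a in range(6)])    # modules 0,1,2,3,0,1 - wraps mod 4
print([high_order(a)[0] for a in range(6)])   # all in module 0
```

Low-order interleaving is what lets a block of consecutive words be fetched from all m modules in parallel, which is the basis of the bandwidth discussion below.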
Bandwidth
The memory bandwidth (B) of an m-way interleaved memory is lower-bounded by 1 and upper-bounded by m. Hellerman's approximation of B is:
B = m^0.56 ≈ √m
In this equation m denotes the number of interleaved memory modules. The equation indicates that the effective memory bandwidth is approximately twice that of a single module when four memory modules are used.
This pessimistic estimate stems from the fact that block accesses of different lengths and accesses of single words are randomly mixed in user programs. Hellerman's calculation was based on a single-processor system. The effective memory bandwidth decreases further if memory-access conflicts from multiple processors are considered.
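Hellerman's estimate is easy to evaluate for a few module counts, which makes the sub-linear scaling concrete:

```python
# Hellerman's estimate B = m**0.56 (close to sqrt(m)) for the effective
# bandwidth of m interleaved modules.

for m in (1, 4, 16, 64):
    print(m, round(m ** 0.56, 2))
# m = 4 gives about 2.17 (roughly double a single module);
# m = 16 gives only about 4.72, far below the upper bound of 16.
```

The gap between m and m^0.56 is the cost of random mixing of access patterns; an ideal sequential block access would approach the upper bound m instead.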
Fault Tolerance
To achieve various interleaved memory organizations, low-order and high-order interleaving can be combined. In high-order interleaved memory, sequential addresses are allocated within each memory module. This makes it simple to isolate faulty memory modules in a memory bank of m memory modules: if one module failure is detected, the remaining modules can still be used by opening a window in the address space. This fault isolation cannot be performed in low-order interleaved memory, where a module failure may paralyze the complete memory bank. Hence, low-order interleaved memory is not fault tolerant.
Backplane Buses
A backplane bus interconnects processors, data storage and peripheral devices in a tightly coupled hardware configuration. The system bus must be designed to allow communication between devices on the bus without disturbing the internal activities of all the devices attached to the bus. These are typically 'intermediate' buses, used to connect a variety of other buses to the CPU-memory bus. They are called backplane buses because they are restricted to the backplane of the system.
Backplane bus specification
They are generally connected to the CPU-memory bus by a bus adaptor, which handles translation between the buses. Commonly, this is integrated into the CPU-memory bus controller logic. While these buses can be used to directly control devices, they are mostly used as 'bridges' to other buses (for example, the AGP bus). A backplane bus interconnects the circuit boards containing processor, memory and I/O interfaces in an interconnection communication structure within the chassis. In the VME bus, for example:
Data, address and control lines form the data transfer bus (DTB).
The DTB arbitration bus provides control of the DTB to a requester using the arbitration logic.
The interrupt and synchronization bus is used for handling interrupts.
The utility bus includes signals that provide periodic timing and coordinate the power-up and power-down sequences of the system.
The backplane bus is made of signal lines and connectors. A special bus controller board is used to house the backplane control logic, such as the system clock driver, arbiter, bus timer and power driver.
Functional modules: A functional module is a collection of electronic circuitry that resides on one functional board and works to achieve special bus control functions. These functions are:
An arbiter is a functional module that accepts bus requests from the requester modules and grants control of the DTB to one requester at a time.
A bus timer measures the time each data transfer takes on the DTB and terminates the DTB cycle if a transfer takes too long.
An interrupter module generates an interrupt request and provides status/ID information when an interrupt handler module requests it.
A location monitor is a functional module that monitors data transfers over the DTB.
A power monitor watches the status of the power source and signals when the power becomes unstable.
A system clock driver is a module that provides a clock timing signal on the utility bus. In addition, board interface logic is needed to match the signal line impedance, the propagation time and the termination values between the backplane and the plug-in board.
Asynchronous Data Transfer
All the operations in a digital system are synchronized by a clock that is generated by a pulse generator. The CPU and I/O interface can be designed independently, or they can share a common clock. If the CPU and I/O interface share a common clock, the transfer of data between the two units is said to be synchronous. There are some disadvantages of synchronous data transfer, such as:
• It is not flexible, as all bus devices run at the same clock rate.
• Execution times are multiples of clock cycles (if an operation needs 3.1 clock cycles, it will take 4 cycles).
• The bus frequency has to be adapted to the slower devices; thus, one cannot take full advantage of the faster ones.
• It is particularly unsuitable for an I/O system in which the devices are comparatively much slower than the processor.
To overcome these problems, asynchronous data transfer is used for the input/output system. The word 'asynchronous' means not in step with the elapse of time. In asynchronous data transfer, the CPU and I/O interface are independent of each other; each uses its own internal clock to control its registers. There are two popular techniques used for such data transfer: strobe control and handshaking.
Strobe Control
In strobe control, a control signal called the strobe pulse, supplied from one unit to the other, indicates that a data transfer has to take place. Thus, for each data transfer, a strobe is activated either by the source or by the destination unit. A strobe is a single control line that informs the destination unit that valid data is available on the bus. The data bus carries the binary information from the source unit to the destination unit.
Data transfer from source to destination
The st eps involved in dat a t ransfer from source t o dest inat ion are as follow s: (i) The source unit places dat a on
t he dat a bus.
(ii) A source act ivat es t he st robe aft er a brief delay in order t o ensure t hat dat a values are st eadily placed on
t he dat a bus.
(iii) The informat ion on dat a bus and st robe signal remain act ive for some t ime t hat is sufficient for t he
dest inat ion t o receive it .
(iv) Aft er t his t ime t he sources remove t he dat a and disable t he st robe pulse, indicat ing t hat dat a bus does
not cont ain t he valid dat a.
(v) Once new dat a is available, st robe is enabled again.
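The steps above can be sketched as a toy simulation of a source-initiated strobe transfer. The class and signal names are illustrative; real hardware would involve timing constraints that a sequential program cannot capture:

```python
# Toy simulation of source-initiated strobe transfer: the source drives
# the data bus, raises the strobe, and the destination latches the data
# while the strobe is active (illustrative names only).

class Bus:
    def __init__(self):
        self.data = None
        self.strobe = False

class Destination:
    def __init__(self):
        self.latched = None

    def sample(self, bus):
        if bus.strobe:                 # strobe says valid data is on the bus
            self.latched = bus.data

def strobe_transfer(bus, dest, value):
    bus.data = value                   # (i)  place data on the data bus
    bus.strobe = True                  # (ii) activate the strobe
    dest.sample(bus)                   # (iii) destination receives the data
    bus.strobe = False                 # (iv) disable strobe, remove data
    bus.data = None

bus, dest = Bus(), Destination()
strobe_transfer(bus, dest, 0x2A)
print(hex(dest.latched))              # the destination latched 0x2a
```

Note what the sketch cannot express: the destination never acknowledges receipt, which is exactly the weakness that the handshaking technique (with its request and reply lines) addresses.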
[Figure: Source-initiated strobe transfer - source unit connected to the destination unit by a data bus, with the strobe line driven by the source]
[Figure: Destination-initiated strobe transfer - source unit connected to the destination unit by a data bus, with the strobe line driven by the destination]
[Figure: Source-initiated handshake transfer - source and destination units connected by a data bus with request and reply lines]
[Figure: Destination-initiated handshake transfer - destination and source units connected by a data bus with request and reply lines]
BGACK - At the end of the current bus cycle, the potential bus master takes control of the system buses and asserts a bus grant acknowledge signal to inform the old bus master that it is now controlling the buses. This signal should not be asserted until the following conditions are met:
1. A bus grant has been received.
2. Address strobe is inactive, which indicates that the microprocessor is not using the bus.
3. Data transfer acknowledge is inactive, which indicates that neither memory nor peripherals are using the bus.
4. Bus grant acknowledge is inactive, which indicates that no other device is still claiming bus mastership.
On a typical I/O bus, however, there may be multiple potential masters, and there is a need to arbitrate between simultaneous requests to use the bus. The arbitration can be either central or distributed. Centralized bus arbitration is a scheme in which a dedicated arbiter has the role of bus arbitration. In the central scheme, it is assumed that there is a single device (usually the CPU) that has the arbitration hardware. The central arbiter can determine priorities and can force termination of a transaction if necessary. Central arbitration is simpler and lower in cost for a uniprocessor system. It does not work as well for a symmetric multiprocessor design unless the arbiter is independent of the CPUs.
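The core decision a central arbiter makes can be sketched as a fixed-priority selection among simultaneous requests. This is a minimal illustration, assuming a simple fixed-priority policy; real arbiters may use rotating or fairness-based schemes:

```python
# Sketch of a fixed-priority central arbiter: among simultaneous
# requests, the lowest-numbered (highest-priority) device wins the bus.

def arbitrate(requests):
    """requests is a list of booleans indexed by device priority.
    Returns the index of the granted device, or None if the bus is idle."""
    for device, requesting in enumerate(requests):
        if requesting:
            return device
    return None

print(arbitrate([False, True, True]))    # device 1 outranks device 2
print(arbitrate([False, False, False]))  # no requests: bus stays idle
```

A fixed-priority policy like this can starve low-priority devices under heavy load, which is one reason multiprocessor designs favor an arbiter that is independent of any single CPU.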
[Figure: Centralized bus arbitration - central arbiter with bus request and bus release lines]