
Ch-7 Cache Coherence and Synchronization

Advanced computer architecture

Uploaded by

Basant
This chapter deals with cache coherence and synchronization mechanisms. Synchronization is important so that all the copies of data in the caches and main memory are consistent.

7.1 INTRODUCTION

The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. When a processor wants some data, the cache is searched first; if the data is found there, this is called a cache hit, and the main memory is not involved in the read operation. For write operations there are two policies for updating the memory: write through and write back.

In a shared memory multiprocessor system, all the processors share a common memory. The main reason for having a separate cache for each processor is to reduce the average access time of each processor. The same information may therefore reside in a number of copies, in some caches and in main memory. To ensure that memory operations are executed correctly, the multiple copies must be kept identical. This gives rise to the cache coherence problem.

When two or more copies of a given datum exist in different processors' memories, different processors may hold different values for the same variable, leading to an incoherence problem. Cache inconsistencies are caused by data sharing, process migration or input/output.

There is no incoherence problem with read-only data, only with data that is written. To illustrate this, the write-through and write-back policies are explained here again. Let there be 3 processors P1, P2, P3 with private cache memories C1, C2, C3, and let a variable X contain the value 20. A load of X by the three processors results in consistent copies in the caches and main memory.

Fig. 7.1 Cache configuration after a load on X.

Now, if a processor performs a store to X, the copies of X in the caches become inconsistent. A load by the other processors will not return the latest value.
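The stale-read situation just described can be reproduced with a toy model. This is only a sketch under stated assumptions: the dictionaries, the `load` and `store` helpers and the new value 40 are illustrative names and numbers, and no coherence protocol of any kind is modelled.

```python
# Toy model: three private caches over one shared memory, with NO coherence
# protocol. All names here are illustrative assumptions, not a real API.
memory = {"X": 20}
caches = [{}, {}, {}]  # private caches C1, C2, C3 (index 0, 1, 2)

def load(cpu, addr):
    """Return the cached value, filling the cache from memory on a miss."""
    if addr not in caches[cpu]:
        caches[cpu][addr] = memory[addr]
    return caches[cpu][addr]

def store(cpu, addr, value):
    """Update only the local cache copy; nobody else is told about it."""
    caches[cpu][addr] = value

# All three processors load X, so every cache holds X = 20.
for cpu in range(3):
    load(cpu, "X")

# P1 stores a new value; P2 and P3 still see the old one.
store(0, "X", 40)
values_seen = [load(cpu, "X") for cpu in range(3)]
print(values_seen)
```

After the store, P1 reads 40 while P2 and P3 still read 20 from their own caches, which is exactly the inconsistency a coherence protocol has to remove.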
Depending on the memory update policy used in the cache, the main memory may also be inconsistent with respect to the cache. This is shown in Fig. 7.2. Under a write-through policy, a store to X in the cache of processor P1 also updates the memory to the new value. A write-through policy thus maintains consistency between memory and the originating cache, but the other 2 caches are inconsistent since they still hold the old value.

Fig. 7.2 (a) Write-through cache policy (b) Write-back cache policy.

Under a write-back policy, the memory is not updated at the time of the store. The copies in the other 2 caches and in main memory are inconsistent. Memory is updated only when the modified data in the cache are copied back to memory.

To maintain cache coherence for shared writable data, there are 2 categories of coherence protocols:
1. Hardware based protocols
2. Software based protocols

Fig. 7.3 Hardware based cache coherence protocols.

We will now study hardware based protocols under three headings:
1. Memory update policy (the write-through and write-back policies already described).
2. Cache coherence policy (the write-update policy, also called the greedy policy, and the write-invalidate policy, also called the lazy policy; both are described under the snoopy bus protocol).
3. Interconnection scheme.

Snoopy Bus Protocol

There are two primary methods to maintain coherence: write invalidate and write update.

Write-invalidate: The write-invalidate policy invalidates all the remote copies when a local cache block is updated. The advantage of this method is its simple implementation; however, a later access to an invalidated copy by another processor results in a cache miss.

Write-update or write-broadcast: The processor that is writing the data broadcasts the new data over the bus (without issuing an invalidation signal). All caches that contain copies of the data are then updated. This scheme differs from write-invalidate in that it does not leave only one local copy after a write.
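The difference between the two policies can be seen by counting cache misses in a small simulation. The following is a minimal sketch, not a model of any real bus: the `SnoopyBus` class, its method names and the use of write-through to memory are assumptions made to keep the example short.

```python
class SnoopyBus:
    """Toy shared bus with private caches (hypothetical helper class).

    policy is "invalidate" (remote copies are dropped on a write) or
    "update" (remote copies receive the new value on a write).
    Memory is written through on every write to keep the sketch short.
    """

    def __init__(self, n_caches, policy, memory):
        assert policy in ("invalidate", "update")
        self.policy = policy
        self.memory = dict(memory)
        self.caches = [{} for _ in range(n_caches)]
        self.misses = 0

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:          # read miss: fetch from memory
            self.misses += 1
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, cpu, addr, value):
        self.caches[cpu][addr] = value
        self.memory[addr] = value
        # Every other cache snoops the write and reacts to it.
        for other, cache in enumerate(self.caches):
            if other != cpu and addr in cache:
                if self.policy == "invalidate":
                    del cache[addr]
                else:
                    cache[addr] = value

def run(policy):
    """Three loads of X, one store by P1, then three more loads."""
    bus = SnoopyBus(3, policy, {"X": 20})
    for cpu in range(3):
        bus.read(cpu, "X")
    bus.write(0, "X", 40)
    values = [bus.read(cpu, "X") for cpu in range(3)]
    return values, bus.misses

print(run("invalidate"))
print(run("update"))
```

Both policies return the new value to every processor, but in this sketch write-invalidate costs two extra misses (P2 and P3 must refetch X), whereas write-update pays instead with a broadcast on every write.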
Fig. 7.4 Snoopy bus protocol.

The advantage of this method is that it avoids the cache misses caused by invalidation. The disadvantage is that it consumes more bus bandwidth, because each update to a shared cache line is broadcast, which the invalidate protocol avoids.

In a snoopy protocol, each cache continuously snoops the bus, watching the addresses. It checks whether the address on the bus is present in its own cache and, depending on the request, responds accordingly. The snooping mechanism receives requests from the bus and responds to them according to whether the access hits or misses in the cache.

Consider Fig. 7.5. There are 2 cases of snoopy bus protocols:

1. Write invalidate: Let processor P1 modify its cached copy from X to X'; then all other copies (in the caches of P2 and P3) are invalidated. Invalidated blocks are also called "dirty blocks", in the sense that they should not be used.

Fig. 7.6 After a write-invalidate operation by P1 (I represents an invalidated block).

2. Write update: This protocol broadcasts the modified value X' to all cache copies via the bus. With write-through caches, the memory copy is also updated at the same time. With write-back caches, the memory copy is updated later, at the time of block replacement.

Fig. 7.7 Write update protocol.

Fig. 7.8 After a write-update operation by P1.

The state of a cache block copy changes with respect to read, write and replacement operations in the cache. Before discussing further, some terms are defined here:

1. Valid: A cache block can assume one of 2 possible states: valid or invalid.
The cache block is in the valid state if it is consistent with the memory copy, i.e., the block has been read from shared memory and has not been modified. The reverse case is called invalid, i.e., the block is not found in the cache or it is inconsistent with the memory copy.

2. Reserved: Data has been written exactly once since being read from shared memory. The cache copy is consistent with the memory copy, which is the only other copy.

3. Dirty: The cache block has been modified more than once and the cache copy is the only one in the system. Thus the cache copy is inconsistent with all other copies (either in other caches or in main memory).

4. Read-miss and Read-hit: When a processor wants to read some data, it first sends the request to its cache. If the data is found in the cache, this is called a read-hit; if not, it is called a read-miss, and a bus read operation is initiated. If no dirty copy exists, main memory has a consistent copy and supplies it to the requesting cache. If a dirty copy exists, that cache copy is the only one in the system, so the cache holding it inhibits the main memory and sends a copy to the requesting cache. After a read-miss, the cache block copy is consistent with the memory copy and hence enters the valid state.

5. Write-miss and Write-hit: When a processor wants to write to its local cache but the block is not there, this is called a write-miss. The copy must come either from main memory or from a remote cache with a dirty block. This is done by sending a read-invalidate command, which invalidates all other cache copies. The local copy is then updated. Since only the local copy is updated (and is inconsistent with all other copies), it ends up in the dirty state.
Write-hit: If the write can be carried out in the local cache, this is called a write-hit. If the copy is in the dirty or reserved state, the write is performed locally and the new state is dirty. If the copy is in the valid state, a write-invalidate command is broadcast to all caches, invalidating their copies; the memory is written through and the new state is reserved.

7.3.1.1 Write Through Cache

State transitions have been developed for 2 basic write-invalidate snoopy protocols, one for write-through caches and one for write-back caches. For a write-through cache, the states considered are valid and invalid. Here x denotes the local processor and y a remote processor.

• Valid state: The cache copy and main memory are consistent. The local processor x can read (R(x)) and can also write (W(x)).
• Invalid state: This state occurs when the block is either replaced or invalidated. Whenever a remote processor writes (W(y)) into its cache copy, all other cache copies become invalid.

The cache block in cache x becomes valid whenever a successful read (R(x)) or write (W(x)) is carried out by the local processor x.

Fig. 7.9 Write through cache.

7.3.1.2 Write Back Cache

Fig. 7.10 Write back cache states.

• Read only (RO): When the memory owns a block, the caches can contain only RO copies of the block. Every processor having a copy can read it (R(x), R(y)) safely.
• Read write (RW): The RW state corresponds to only one cache copy existing in the entire system, owned by the local processor x. In this state, read (R(x)) and write (W(x)) can be performed safely.
• INV or invalid state: As already discussed, the state becomes invalid whenever a remote processor writes (W(y)) its local copy or the local processor replaces its own block copy.

Write Once Protocol

This protocol for bus-based systems was proposed by James Goodman in 1983. It combines the advantages of write-through and write-back invalidation.
In this protocol, the first write of a cache block uses a write-through policy. The local copy and shared memory are then consistent, while all other copies are invalidated. After the first write, a write-back policy is used to update shared memory. 4 cache states (valid, invalid, reserved and dirty) are used to describe this scheme.

Fig. 7.12 Write once cache protocol. Solid lines show the commands issued by the local processor and zig-zag lines show commands issued by remote processors via the system bus.

7.3.2 Snoopy Bus Write Invalidate Strategies

From the above discussion it is clear that there are 2 basic approaches to snoopy protocols: update and invalidate. There are 4 variations of the invalidate strategy: write once, Synapse, Illinois and Berkeley.

Fig. 7.13 Variations on the invalidate strategy of snoopy protocols.

7.3.2.1 Berkeley Cache Coherence Protocol

According to the Berkeley cache coherence protocol, any cache in a shared memory multiprocessor can own a cache line, and if no cache owns the line then memory owns it. 4 states are used in this protocol: invalid, read only, shared dirty and private dirty.

Fig. 7.14 Berkeley cache coherence protocol for CPU initiated cache state transitions.

The cache which holds the ownership is in the shared dirty or private dirty state, and all others are in the read only state. When a read miss occurs, the requesting cache gets the data from the owner of the cache line. If the requested line is in no cache, then the owner is the memory. For a write hit, the requesting cache invalidates all other copies of the cache line and performs the write.
Private dirty will be the new state of the line.

7.3.2.2 Illinois Cache Coherence Protocol

The Illinois protocol is also known as the MESI protocol. It is used for cache coherence and mainly uses the write-back policy. 4 states are used in this protocol: Modified (also called private dirty), Exclusive (read private), Shared and Invalid.

Modified: The cache line is present only in the current cache, and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Exclusive state.

Exclusive: The cache line is present only in the current cache, but is clean; it matches main memory. It may be changed to the Shared state at any time, in response to a read request. Alternatively, it may be changed to the Modified state when writing to it.

Shared: Indicates that this cache line may be stored in other caches of the machine and is "clean"; it matches the main memory. The line may be discarded (changed to the Invalid state) at any time.

Invalid: Indicates that this cache line is invalid (unused).

When a read or write miss occurs, the missing line is first retrieved from other caches. If the line is dirty, it is also written back to memory at the same time. If the cache line is shared, the cache with the highest priority supplies the line.

Table: Comparison of the states used by the snoopy cache protocols (basic, Berkeley and Illinois).

7.3.2.3 Firefly Cache Coherence Protocol

This protocol never causes invalidation, so the Invalid state is not used here. In this protocol, the following states are assigned to each block:

• Valid-Exclusive: This block has a coherent copy of the memory.
There is only one copy of the data in the caches.
• Shared: This block has a coherent copy of the memory. The data may possibly be shared, but its content is not modified.
• Dirty: The block is the only copy of the data and it is incoherent with memory. This is the only state that generates a write-back when the block is replaced in the cache.

These states correspond to the Exclusive, Shared and Modified states of the MESI protocol.

Fig. 7.16 Firefly cache coherence protocol for CPU initiated cache state transitions.

In order to identify which transitions must be made, the protocol detects sharing using a special bus line named SharedLine. All other caches snoop all memory operations and raise the SharedLine if they detect a snoop hit in their own cache.

• Read hit: The data is supplied by the processor's own cache. No state change.
• Read miss: If another cache has a copy of the data, it supplies it. If the data was Dirty, it is also written to memory. All participating caches change their state to Shared. If there is no cache with the data, it is supplied by the memory and the state is set to Valid-Exclusive.
• Write hit: If the data is in the Dirty state, the cache line is updated without updating the memory. If the data is in the Valid-Exclusive state, the block is updated and its state is changed to Dirty. If the data is Shared, the write is a write-through that updates the memory. If the data is present in other caches, they raise the SharedLine and update their copies; if not, the SharedLine is not raised and the state is changed to Valid-Exclusive.
• Write miss: This is handled like a read miss followed by a write hit. If the data is present in other caches, they are updated and the cache line ends up in the Shared state. If the data is not present in other caches, the cache line ends up in the Dirty state.
• Eviction: A Dirty cache line may be written back to memory at any time. A line in the Valid-Exclusive or Shared state may be replaced with some other data at any time.

Fig. 7.17 (legend: processor-based transitions and bus-induced transitions).

7.3.2.4 Dragon Cache Coherency Protocol

This protocol uses a write-back update policy. The 4 states of Dragon are: read private, shared clean (clean, might be shared), shared dirty (modified, might be shared; this implies that there might be other copies of the data in the Shared-Clean state but that the memory copy is not up-to-date)
and dirty private (modified, only copy). The shared dirty state implies ownership.

A Shared line is used in Dragon; when it is asserted, at least one other cache is sharing the same cache line (thus multiple writers are allowed). The memory is not updated until a line is replaced. If there are multiple writes, then instead of writing to main memory, the updates are only propagated to the other caches holding the same cache line.

In order to identify which transitions must be made, the protocol detects sharing using a special bus line named Shared. All accesses on the memory bus are snooped by all caches, which assert the Shared line when a snoop hit occurs. The following rules are then applied to the transitions:

• Read hit: The data is supplied by the local cache. No state change.
• Read miss: If there is any cache with a copy of the cache line, it indicates this with the Shared line and supplies the data to the requesting cache, which keeps a copy in the Shared-Clean state. The supplying cache leaves its copy of the line in the Shared-Dirty or Shared-Clean state, as appropriate. Otherwise the data is fetched from main memory and the cache line is marked Clean.
• Write hit: If the data in the cache is in the Dirty or Clean state, the cached data is updated and marked Dirty. If the state is Shared-Clean or Shared-Dirty, the other caches are updated; if the Shared line is raised, the local cache changes to Shared-Dirty and the others to Shared-Clean, otherwise the local cache changes to Dirty.
• Write miss: If there is any cache with a copy, that cache supplies the data. The writer generates a write broadcast, the local cache changes to Shared-Dirty and all others to Shared-Clean. Otherwise main memory supplies the data and the local cache state changes to Dirty.

DIRECTORY BASED PROTOCOLS

Directory based protocols apply to network connected systems and snoopy bus protocols apply to bus connected systems.
So when a multistage network is used to build a large multiprocessor with hundreds of processors, the snoopy bus protocols must be modified to suit the network capabilities.

Directory Structures

As already said, a multistage network may connect hundreds of processors, so cache directories are used to store the coherence information. A cache directory records where copies of cache blocks reside. Various directory based protocols exist: some use a central directory scheme, others use a distributed directory scheme. The directory-based protocols differ in directory structure (full-map or limited), in how information is stored in the directory and in what information is stored.

1. Central directory scheme: This was the first directory scheme, proposed by Tang (1976). As the name indicates, a central directory is maintained which contains duplicates of all the cache directories. As a result it is usually very large and must be associatively searched, like the individual cache directories. The drawbacks of this scheme are that the large directory size makes the search time longer, and that contention for the directory is another problem.

Fig. 7.19 Centralised directory scheme.

2. Distributed directory scheme: This scheme was proposed by Censier in 1978. The limitations of the central directory scheme (long search time and contention) are removed by the distributed directory scheme. Instead of keeping the information of all the cache directories in one place, each memory module maintains a separate directory which records the state and presence information for each memory block. The presence information indicates which caches have a copy of the block.

Fig. 7.20 Distributed directory scheme.

Cache directory: The list of the locations of all the cached copies of each block of shared data is called the cache directory.
The cache directory may be centralized or distributed. The directory entry for each block of data contains:
1. A number of pointers that specify the locations of copies of the block.
2. A dirty bit that specifies whether a unique cache has permission to write the associated block of data.

Directory Protocols

A. Full-map directories: In this scheme the directory entry for each block contains:
(a) A dirty bit.
(b) Pointers (one bit per processor); each bit represents whether the block is present or absent in that processor's cache. If the dirty bit is set, then one and only one processor's bit is set, and that processor can write into the block.

In each cache, 2 bits per block are used to represent the state. One bit indicates whether the block is valid; the other indicates whether a valid block may be written. The protocol must keep the state bits in the memory directory and those in the caches consistent.

Three different states of a full-map directory are discussed here. The following terms are used:
• A is the location that the caches want to access.
• D: the dirty bit is set.
• C: the directory entry is set to clean.
• P1, P2, P3, ..., Pn are the processors.

1st state: Location A is missing in all caches and is present in memory only. The directory entry is clean (C), which means that no processor has permission to write to the block of data.

2nd state: Cache 1, cache 2 and cache 3 request copies of location A (Read A). Three pointers are set in the entry, representing the caches that have copies. The entry is still clean (C), so no processor has permission to write to the block of data.

3rd state: Processor P3 wants to write to location A, and cache 3 issues a write request.
Location A is also present in cache 1 and cache 2, so the memory module must invalidate those copies before the write can proceed. The steps are:

1. Cache 3 checks whether the block containing location A is valid. If it is valid but the processor does not have permission to write it, cache 3 issues a write request to the memory module containing location A and stalls processor P3.
2. The memory module issues invalidate requests to cache 1 and cache 2.
3. After receiving the invalidate requests, cache 1 and cache 2 set the appropriate bit to indicate that the block containing location A is invalid, and send acknowledgments back to the memory module.
4. After receiving the acknowledgments, the memory module sets the dirty bit (from C to D), clears the pointers to cache 1 and cache 2, and sends write permission to cache 3.
5. Cache 3 receives the write permission message and updates the state in the cache; processor P3 is reactivated.
6. The memory module waits to receive the acknowledgments before allowing processor P3 to complete its write transaction. By waiting for the acknowledgments, the protocol ensures sequential consistency.

Fig. 7.24

Total memory overhead: If there are N processors, the size of memory grows as O(N) and each directory entry holds N bits (one per processor), so the directory size is O(N^2).

B. Limited directories: In a limited directory there is a fixed number of pointers per entry, regardless of system size. The directory acts like a set-associative cache of pointers. Limited directory protocols are designed to solve the directory size problem.

In 1988, Agarwal used the notation Dir_i A for a directory protocol, where i is the number of pointers and A represents broadcast (B) or no broadcast (NB). Thus, for a full-map directory the notation is Dir_N NB, and for a limited directory the notation is Dir_i NB with i < N.

Suppose 2 caches request read copies of a particular block of data, and the entry has 2 pointers. Now cache 3 also requests a copy of location A. Since both pointers are in use, the memory module must invalidate the copy in either cache 1 or cache 2, so that the freed pointer can point to cache 3. This process of pointer replacement is called eviction.
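The pointer eviction and the invalidate-before-write behaviour described above can be sketched for a single directory entry. This is a minimal illustration under stated assumptions: the class name `LimitedDirectoryEntry`, the choice of the most recently added pointer as eviction victim, and the handling of a read while the entry is dirty are all hypothetical and are not taken from the text. Setting `num_pointers` to the number of caches gives full-map behaviour, since no eviction can then occur.

```python
class LimitedDirectoryEntry:
    """Directory entry for one memory block (hypothetical sketch).

    Holds a dirty bit and at most `num_pointers` pointers to caches
    that currently have a copy of the block.
    """

    def __init__(self, num_pointers):
        self.num_pointers = num_pointers
        self.dirty = False        # C (clean) or D (dirty)
        self.pointers = []        # ids of caches holding a copy
        self.invalidated = []     # log of caches sent an invalidate

    def read(self, cache_id):
        if cache_id in self.pointers:
            return
        if self.dirty:
            # Assumption: the single writer writes the block back and
            # keeps a clean copy, so the entry returns to the clean state.
            self.dirty = False
        if len(self.pointers) == self.num_pointers:
            # Eviction: no free pointer, so one existing copy is invalidated.
            # Assumption: the most recently added pointer is the victim.
            victim = self.pointers.pop()
            self.invalidated.append(victim)
        self.pointers.append(cache_id)

    def write(self, cache_id):
        # Invalidate every other copy before granting write permission.
        for other in self.pointers:
            if other != cache_id:
                self.invalidated.append(other)
        self.pointers = [cache_id]
        self.dirty = True

entry = LimitedDirectoryEntry(num_pointers=2)
for cache_id in (1, 2, 3):
    entry.read(cache_id)
after_reads = (list(entry.pointers), list(entry.invalidated))
entry.write(3)
print(after_reads, entry.pointers, entry.dirty, entry.invalidated)
```

With two pointers, the third read evicts cache 2, and the later write by cache 3 invalidates cache 1 and sets the dirty bit. A real protocol would also wait for acknowledgments before granting the write, which this sketch omits.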
Fig. 7.27 Eviction in a limited directory (pointer replacement from cache 2 to cache 3).

• The memory overhead is O(N log N), since each pointer requires log2 N bits of memory, where N is the number of processors in the system.

C. Chained directories: This scheme is called the chained scheme because it keeps track of shared copies of data by maintaining a chain of directory pointers. In limited directories i < N, which still limits scalability. Chained directories realize the scalability of limited directories without restricting the number of shared copies of data blocks.

Two cases are considered here. Suppose there are no shared copies of location A and processor P1 reads location A. The memory module sends a copy to cache 1 along with a chain termination (CT) pointer. The memory also keeps a pointer to cache 1.

Fig. 7.28 (a), (b)

Now P2 wants to read location A. The memory sends a copy to cache 2 along with a pointer to cache 1. The memory now keeps a pointer to cache 2, and cache 2 points to cache 1.

Fig. 7.29

If a processor writes to location A, a data invalidation message is sent down the chain, and the write cannot proceed until the acknowledgment of invalidation has arrived.

HIERARCHICAL CACHE COHERENT PROTOCOLS

Hierarchical cache coherence protocols apply to multiprocessors with a multilevel cache system. One such protocol was proposed by Wilson in 1987, who generalized Goodman's write-once cache coherency protocol to a multilevel cache system. Consider the 2-level hierarchical bus structure shown in the figure.

Fig. 7.30 Hierarchical cache coherence.

Assume a processor issues a write command to a data block A which has 4 copies in first-level caches. As shown, copies must also be available in the corresponding higher-level caches.
According to the write-once protocol, after A is written in the first-level cache of the writing processor and in its second-level cache, all other copies of A should be invalidated. This is achieved by broadcasting the write command first on the low-level bus and then on the higher-level bus. The copy of A in another first-level cache on the writer's low-level bus is invalidated by the write command on that bus. When the write command appears on the higher-level bus, the second-level caches check whether they have a copy of A. When C10 detects a write command to A and finds a copy of A, it must invalidate its own copy and send an invalidate command down to its own low-level bus, where the first-level copies of A must be invalidated. Finally, only the first-level and second-level caches associated with the updating processor have copies of A.

A read request issued by another processor should be broadcast at the lower level and, if it cannot be served at that level, it should be propagated up in the hierarchy. When the second-level cache of the updating processor detects the read request for A and finds that it has a dirty copy of A, it issues a flush request down to its low-level bus, where the first-level cache holding the block relinquishes exclusive access and supplies the dirty copy of A to the requesting cache.

In a shared memory multiprocessor system, cache coherence and synchronization must be maintained. These protocols are applied so that memory remains consistent.
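The two-level invalidation walk described in the hierarchical protocol can be sketched as follows. This is only an illustration under stated assumptions: the `Cluster` class, the cache names (`c1` to `c6`, `L2a` to `L2c`) and the layout of three clusters are hypothetical and do not reproduce the labels of Fig. 7.30.

```python
class Cluster:
    """One second-level cache plus the first-level caches on its local bus.

    Hypothetical structure for illustration; it tracks only whether each
    cache holds a copy of a single block A.
    """

    def __init__(self, l1_names):
        self.has_copy = False                   # copy of A in the second-level cache
        self.l1 = {name: False for name in l1_names}

def write(clusters, writer_cluster, writer_l1):
    """First write to A: keep only the writer's first- and second-level copies.

    Returns the caches invalidated, in the order the commands reach them.
    """
    log = []
    home = clusters[writer_cluster]
    home.has_copy = True
    home.l1[writer_l1] = True
    # Step 1: the write is broadcast on the writer's low-level bus.
    for name, present in home.l1.items():
        if name != writer_l1 and present:
            home.l1[name] = False
            log.append(name)
    # Step 2: the write is broadcast on the higher-level bus.
    for cname, cluster in clusters.items():
        if cname == writer_cluster or not cluster.has_copy:
            continue
        cluster.has_copy = False
        log.append(cname)
        # Step 3: that second-level cache sends an invalidate down its own bus.
        for name, present in cluster.l1.items():
            if present:
                cluster.l1[name] = False
                log.append(name)
    return log

clusters = {
    "L2a": Cluster(["c1", "c2"]),
    "L2b": Cluster(["c3", "c4"]),
    "L2c": Cluster(["c5", "c6"]),
}
# Block A starts with four first-level copies and two second-level copies.
for cname, names in (("L2a", ["c1", "c2"]), ("L2b", ["c3", "c4"])):
    clusters[cname].has_copy = True
    for name in names:
        clusters[cname].l1[name] = True

invalidation_order = write(clusters, "L2a", "c1")
print(invalidation_order)
```

In this sketch the cluster without a copy (`L2c`) filters the write at the higher-level bus, so its low-level bus sees no invalidation traffic at all, which is the main benefit of the hierarchy.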
