
4-Module #4-Shared-Memory-Students-Version-Final-October-24-2024

Uploaded by

Omar Amer
Copyright
© © All Rights Reserved

Module #4

Shared Memory Architectures


Professor Mostafa Abd-El-Barr

Fall Term 2024-2025

Sunday, October 27, 2024


Outline
1. Introduction to Shared Memory Architecture
2. Classification of Shared Memory Systems
3. Bus-based Symmetric Multiprocessors
4. Cache Incoherence Problem

Introduction
o A shared memory system is shown in the Figure.
o In a shared memory system, all inter-processor coordination and synchronization are accomplished via the global memory.
[Figure: memory modules (M) connected through an Interconnection Network to processors (P)]

o Two main problems arise in designing a shared memory system:
1. performance degradation due to contention, and
2. coherence problems.
o Performance degradation might happen when multiple processors are trying to access the shared memory simultaneously.
o Having multiple copies of data, spread throughout the caches, might lead to a coherence problem.
o Copies in the caches are coherent if they all equal the same value.
o If one of the processors writes over the value of one of the copies, then the copy becomes inconsistent because it no longer equals the value of the other copies.
Classification of Shared Memory Systems
o Consider first the simplest shared memory system, in which processors send their requests to a single memory module.
o An arbitration unit within the memory module passes requests through to a memory controller.
o If the memory module is not busy and a single request arrives, then the arbitration unit passes that request to the memory controller and the request is satisfied.
o If the arbitration unit receives two requests, it selects one of them and passes it to the memory controller.
o The module is placed in the busy state while a request is being serviced.
o If a new request arrives while the memory is busy servicing a previous request, the memory module sends a wait signal, through the memory controller, to the processor making the new request.
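The arbitration behaviour described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the class name `MemoryModule`, the fixed `service_cycles` service time, and the rule of granting the first listed requester are hypothetical choices, not part of the slides.

```python
class MemoryModule:
    """Toy model of one memory module with an arbitration unit (illustrative only)."""

    def __init__(self, service_cycles=2):
        self.service_cycles = service_cycles  # assumed fixed time to service a request
        self.busy_until = 0                   # first cycle at which the module is free

    def arbitrate(self, cycle, requesters):
        """Handle the requests that arrive in one cycle.

        Returns (granted, waiting): the processor whose request is passed to the
        memory controller (or None), and the processors that receive a wait signal.
        """
        if not requesters:
            return None, []
        if cycle < self.busy_until:
            # Module is busy servicing an earlier request: every new request waits.
            return None, list(requesters)
        # Module is free: select one request (assumption: the first one listed).
        granted, *waiting = requesters
        self.busy_until = cycle + self.service_cycles
        return granted, waiting


module = MemoryModule(service_cycles=2)
print(module.arbitrate(0, ["P1", "P2"]))  # ('P1', ['P2'])
print(module.arbitrate(1, ["P2"]))        # (None, ['P2'])
print(module.arbitrate(2, ["P2"]))        # ('P2', [])
```

A real arbitration unit would use a fairness rule such as round-robin or priorities; the fixed choice here just keeps the sketch short.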

o Based on the interconnection network used, shared memory systems can be classified into the following categories:
✓ Uniform Memory Access (UMA)
o In a UMA system, all processors have equal access time to any location in the shared memory.
o The interconnection network used in the UMA can be a single bus, multiple buses, or a crossbar switch.
o Because access to shared memory is balanced, these systems are also called SMP (Symmetric Multiprocessor) systems.
o Each processor has equal opportunity to read/write to memory, including equal access speed.
o Commercial examples of SMPs are Sun Microsystems multiprocessor servers and Silicon Graphics Inc. multiprocessor servers.
o A typical bus-structured SMP computer is shown in the Figure.
[Figure: memory modules (M) on a shared Bus, with each processor (P) attached to the bus through its cache (C)]
✓ Non-uniform memory access (NUMA)
o In this architecture, each processor has part of the shared memory attached.
o The memory has a single address space.
o Any processor can access any memory location directly using its real address.
o The access time to a module depends on its distance from the processor, i.e., the memory access time is non-uniform.
o A number of architectures are used in NUMA systems, such as tree networks.
o Examples of NUMA architectures are the BBN TC-2000, SGI Origin 3000, and Cray T3E.
o Figure 4.4 shows the NUMA system organization.

✓ Cache-Only Memory Architecture (COMA)
o Like the NUMA, each processor has part of the shared memory, which is a cache memory in the COMA.
o A COMA system requires that data be migrated to the processor requesting it.
o The address space is made of all the caches.
o There is a cache directory (D) that helps in remote cache access.
o The Figure shows the organization of COMA.
Bus Based Symmetric Multiprocessors
o A typical bus-based design uses high-speed caches to solve the bus contention problem.
o We define the variables for hit rate, number of processors, processor speed, bus speed, and processor duty cycle as follows:
▪ N = Number of processors
▪ h = Hit rate of each cache, assumed to be the same for all caches
▪ (1 – h) = Miss rate of each cache
▪ B = Bandwidth of the bus, measured in cycles/second
▪ I = Processor duty cycle, assumed to be identical for all processors, in fetches/cycle
▪ V = Peak processor speed, in fetches/second
o The effective bandwidth of the bus is B·I fetches/second.
o If each processor is running at a speed of V, then misses are being generated at a rate of V(1 – h).
o For an N-processor system, misses are simultaneously being generated at a rate of N(1 – h)V.
o The bus saturates when the N processors together generate misses faster than the bus can serve them, so avoiding saturation requires N(1 – h)V ≤ BI.
o The maximum number of processors with cache memories that the bus can support is therefore given by the relation N ≤ BI/((1 – h)V).

o Example 1
▪ Suppose a shared memory system is constructed from processors that can execute V = 110 instructions/second and the processor duty cycle I = 1.
▪ The caches are designed to support a hit rate of 97%, and the bus supports a peak bandwidth of B = 106 cycles/second.
▪ Then, (1 – h) = 0.03, and the maximum number of processors N is N ≤ 106/(0.03 × 110) ≈ 32.1.
▪ Thus, the system we have in mind can support only 32 processors!
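The bound N ≤ BI/((1 – h)V) can be checked numerically. The sketch below uses the values exactly as printed in Example 1; the function name `max_processors` and its parameter names are hypothetical.

```python
import math


def max_processors(bus_bandwidth, duty_cycle, hit_rate, peak_speed):
    """Largest integer N that satisfies N * (1 - h) * V <= B * I."""
    miss_rate_per_processor = (1.0 - hit_rate) * peak_speed  # bus fetches/second per processor
    effective_bandwidth = bus_bandwidth * duty_cycle         # fetches/second the bus can carry
    return math.floor(effective_bandwidth / miss_rate_per_processor)


# Values as given in Example 1: B = 106, I = 1, h = 0.97, V = 110.
n = max_processors(bus_bandwidth=106, duty_cycle=1, hit_rate=0.97, peak_speed=110)
print(n)  # 32
```

Because the miss rate sits in the denominator, small gains in hit rate raise the bound sharply: with h = 0.99 and the other values unchanged, the same function returns 96.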
Cache Incoherence Problem
Caching in the Single-Processor Case
Hit: data is in the cache
Miss: data is not in the cache
Cache hit rate (%): h
Cache miss rate (%): m = (1 - h)
[Figure: processor P with a cache holding a copy of x from memory]
Effect of Cache Hit Ratio
$t_{av} = h_1 t_1 + (1 - h_1)\,[\,t_1 + h_2 t_2 + (1 - h_2)(t_2 + t_3)\,]$

$h_1 = 0.8,\ t_1 = 5\ \text{ns},\ t_2 = 50\ \text{ns},\ t_3 = 200\ \text{ns},\ h_2 = 0.8$
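As a worked check, the average access time can be evaluated for the values above. This sketch assumes the formula is read in its nested three-level form, t_av = h1·t1 + (1 – h1)[t1 + h2·t2 + (1 – h2)(t2 + t3)], and that all times are in nanoseconds (the slide gives no unit for t1). The function name is hypothetical.

```python
def average_access_time(h1, t1, h2, t2, t3):
    """Average access time of a three-level hierarchy (assumed nested form)."""
    # Cost of going past level 1: hit in level 2, or miss and also pay t3.
    beyond_level1 = h2 * t2 + (1.0 - h2) * (t2 + t3)
    return h1 * t1 + (1.0 - h1) * (t1 + beyond_level1)


t_av = average_access_time(h1=0.8, t1=5, h2=0.8, t2=50, t3=200)
print(round(t_av, 3))  # 23.0
```

Under these assumptions the result is 23 ns: 4 ns from level-1 hits plus 0.2 × 95 ns = 19 ns from level-1 misses.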


✓ Writing to Cache in the Two-Processor Case
• Write Through
• Write Back
Let X be an element of shared data that has been referenced by two processors, P1 and P2.
1. In the beginning, the three copies of X (one in main memory and one in each processor's cache) are consistent.
2. If processor P1 writes a new value X1 into its cache using a write-through policy, the same value is written immediately into the shared memory.
3. In this case, consistency is maintained between P1's cache and the main memory.
4. When a write-back policy is used, the main memory is updated only when the modified data in the cache is replaced or invalidated.
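The difference between the two write policies can be sketched with a toy single-cache model. The class `SingleCache`, its methods, and the dictionary-based memory are hypothetical names chosen for illustration; only the timing of the memory update follows the text.

```python
class SingleCache:
    """Toy cache in front of a shared memory (a dict mapping address -> value)."""

    def __init__(self, memory, policy):
        assert policy in ("write-through", "write-back")
        self.memory = memory
        self.policy = policy
        self.lines = {}     # address -> value currently held in the cache
        self.dirty = set()  # addresses modified in the cache but not yet in memory

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the value from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value  # memory is updated immediately
        else:
            self.dirty.add(addr)       # memory is updated later, on replacement

    def replace(self, addr):
        """Evict a block; a write-back cache flushes modified data first."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


for policy in ("write-through", "write-back"):
    memory = {"X": "X"}
    cache = SingleCache(memory, policy)
    cache.read("X")
    cache.write("X", "X1")
    after_write = memory["X"]
    cache.replace("X")
    print(policy, after_write, memory["X"])
# write-through X1 X1
# write-back X X1
```

With write-back, memory holds the stale value X until the block is replaced, which is the window in which another processor could read inconsistent data.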
Caching in the Multiprocessor Case
In a shared memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data: one copy in the main memory and one in the local cache of each processor that requested it.
When one of the copies of data is changed, the other copies must reflect that change.
Cache coherence ensures that changes in the values of shared operands (data) are propagated throughout the system in a timely fashion.
The following are the requirements for cache coherence:
1. Write Propagation
Changes to the data in any cache must be propagated to other copies (of that cache line) in the peer caches.

2. Transaction Serialization
Reads/Writes to a single memory location must be seen by all processors in the same order.

Thus, if location X received two different values A and B, in this order, from any two processors, the processors can never read location X as B and then read it as A. The location X must be seen with values A and B in that order.
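The serialization requirement can be expressed as a simple check on the sequence of values one processor reads from a location. This is a sketch under the assumption that the written values are distinct; the function name `respects_write_order` is hypothetical.

```python
def respects_write_order(write_order, observed_reads):
    """Return True if the reads never go back to an earlier written value.

    write_order    -- values written to one location, in serialized order
    observed_reads -- values one processor read from that location, in order
    Assumes the written values are distinct.
    """
    position = {value: index for index, value in enumerate(write_order)}
    last_seen = -1
    for value in observed_reads:
        if position[value] < last_seen:
            return False  # an older value was seen after a newer one
        last_seen = position[value]
    return True


print(respects_write_order(["A", "B"], ["A", "A", "B"]))  # True
print(respects_write_order(["A", "B"], ["B", "A"]))       # False
```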
Example: Consider that more than one processor has cached a copy of the memory location X.

The following conditions are necessary to achieve cache coherence:

1. A read by processor P1 of location X that follows a write by the same processor P1 to X, with no writes to X by another processor occurring between the write and the read, must always return the value written by P1.

2. A read by processor P1 of location X that follows a write by another processor P2 to X, with no other writes to X by any processor occurring between the two accesses and with the read and write being sufficiently separated, must always return the value written by P2.

This condition defines the concept of a coherent view of memory.

If processor P1 reads the old value of X even after the write by P2, we say that the memory is incoherent.
✓ Writing to Cache in the Multiple-Processor Case
[Figure: shared memory holding X, with copies of x in the caches of processors P1, P2, P3, ..., Pn]
- Multiple copies of x
- What if P1 updates x?

Cache Coherence Policies
✓ Four cases for handling writes to a cache in the n-processor case:
• Write Update - Write Through
• Write Update - Write Back
• Write Invalidate - Write Through
• Write Invalidate - Write Back
✓ Cache-Memory Coherence
Illustration of Write-Through and Write-Back in a single processor:
                                Write Through       Write Back
Serial  Event                   Memory   Cache      Memory   Cache
1       -                       X        -          X        -
2       P reads X               X        X          X        X
3       P updates (writes) X    X’       X’         X        X’
✓ Cache – Cache Coherence
o When a task running on processor P requests the data in global memory location X, the contents of X are copied to processor P’s local cache.
o Suppose processor Q also accesses X and wants to write a new value to X.
o There are two fundamental cache coherence policies:
(1) write-invalidate, and
(2) write-update.
o Write-invalidate maintains consistency by reading from local caches until a write occurs.
o When any processor updates the value of X through a write, posting a dirty bit for X invalidates all other copies.
o Processor Q invalidates all other copies of X when it writes a new value into its cache. This sets the dirty bit for X.
o Q can continue to change X without further notifications to other caches because Q has the only valid copy of X.
o When processor P wants to read X, it must wait until X is updated and the dirty bit is cleared.
o Write-update maintains consistency by immediately updating all copies in all caches.
o See the Table for the write-update versus write-invalidate policies.
                        Write update              Write invalidate
Serial  Event           P’s Cache   Q’s Cache     P’s Cache   Q’s Cache
1       P reads X       X           -             X           -
2       Q reads X       X           X             X           X
3       Q updates X     X’          X’            INV         X’
4       Q updates X’    X’’         X’’           INV         X’’
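The two policies in the table can be reproduced with a short simulation that tracks only the cache contents of P and Q. This is a minimal sketch: the function `simulate`, the event tuples, and the choice to serve a read miss from a valid peer copy before memory are illustrative assumptions, and memory updates are not modelled.

```python
INV = "INV"


def simulate(policy, events):
    """Return the (P cache, Q cache) contents after each event.

    policy -- "write-update" or "write-invalidate"
    events -- list of (processor, "read" | "write", new_value_or_None)
    """
    caches = {"P": None, "Q": None}  # None means the block was never loaded
    memory = "X"                     # memory contents are not updated in this sketch
    rows = []
    for proc, op, value in events:
        others = [name for name in caches if name != proc]
        if op == "read":
            if caches[proc] in (None, INV):
                # Assumption: a miss is served from a valid peer copy, else from memory.
                valid = [caches[o] for o in others if caches[o] not in (None, INV)]
                caches[proc] = valid[0] if valid else memory
        else:
            caches[proc] = value
            for other in others:
                if caches[other] is not None:
                    caches[other] = value if policy == "write-update" else INV
        rows.append((caches["P"], caches["Q"]))
    return rows


events = [("P", "read", None), ("Q", "read", None),
          ("Q", "write", "X'"), ("Q", "write", "X''")]
for policy in ("write-update", "write-invalidate"):
    print(policy, simulate(policy, events))
```

Write-update sends every new value to P's cache, while write-invalidate sends nothing after the first invalidation; that is the trade-off between bus traffic and read latency.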
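Write-invalidate
[Figure: three panels, each showing memory above the caches of P1, P2, P3. Before: memory holds x and two caches hold x. Write Through: memory holds x’, the writer's cache holds x’, and the other copy is invalid (I). Write Back: memory still holds x, the writer's cache holds x’, and the other copy is invalid (I).]
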
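Write-Update
[Figure: three panels, each showing memory above the caches of P1, P2, P3. Before: memory holds x and two caches hold x. Write Through: memory holds x’ and both cached copies hold x’. Write Back: memory still holds x and both cached copies hold x’.]
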
✓ Write Invalidate Write Through
o Multiple processors can read block copies from main memory safely until one processor updates its copy.
o At this time, all other cache copies are invalidated and the memory is updated to remain consistent.

State            Description
Valid [VALID]    The copy is consistent with global memory.
Invalid [INV]    The copy is inconsistent.

Event        Actions
Read Hit     Use the local copy from the cache.
Read Miss    Fetch a copy from global memory. Set the state of this copy to Valid.
Write Hit    Perform the write locally. Broadcast an Invalid command to all caches. Update the global memory.
Write Miss   Get a copy from global memory. Broadcast an Invalid command to all caches. Update the global memory. Update the local copy and set its state to Valid.
Replace      Since memory is always consistent, no write back is needed when a block is replaced.

Example 1
Memory (M) initially holds X = 5, and processors P and Q each have a cache (C).
1. P reads X
2. Q reads X
3. Q updates X, X=10
4. Q reads X
5. Q updates X, X=15
6. P updates X, X=20
7. Q reads X
[Figure: memory M connected to the caches C of processors P and Q]
• The Table shows the contents of memory and the two caches after the execution of each operation when Write Invalidate Write Through was used for cache coherence.
• The table also shows the state of the block containing X in P’s cache and Q’s cache.

Serial  Event                       Memory X   P’s Cache X   P’s State   Q’s Cache X   Q’s State
0       Original value              5          -             -           -             -
1       P reads X (Read Miss)       5          5             VALID       -             -
2       Q reads X (Read Miss)       5          5             VALID       5             VALID
3       Q updates X (Write Hit)     10         5             INV         10            VALID
4       Q reads X (Read Hit)        10         5             INV         10            VALID
5       Q updates X (Write Hit)     15         5             INV         15            VALID
6       P updates X (Write Miss)    20         20            VALID       15            INV
7       Q reads X (Read Miss)       20         20            VALID       20            VALID
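The table above can be reproduced by applying the Write Invalidate Write Through rules to the seven operations. The following is a minimal sketch for a single location X and two caches; the function name, the tuple layout of the trace, and the `OPERATIONS` list are hypothetical names for illustration.

```python
def write_through_invalidate(operations, initial_value):
    """Trace one location X under write-through, write-invalidate.

    Each trace row is (event, memory, P value, P state, Q value, Q state).
    """
    memory = initial_value
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    trace = []
    for proc, op, value in operations:
        line = caches[proc]
        hit = line["state"] == "VALID"
        if op == "read":
            event = "Read Hit" if hit else "Read Miss"
            if not hit:
                # Read miss: fetch from global memory and mark the copy Valid.
                line["value"], line["state"] = memory, "VALID"
        else:
            event = "Write Hit" if hit else "Write Miss"
            # Broadcast Invalid to every other cache that holds a copy.
            for name, other in caches.items():
                if name != proc and other["state"] is not None:
                    other["state"] = "INV"
            memory = value  # write-through: global memory is updated at once
            line["value"], line["state"] = value, "VALID"
        p, q = caches["P"], caches["Q"]
        trace.append((event, memory, p["value"], p["state"], q["value"], q["state"]))
    return trace


OPERATIONS = [("P", "read", None), ("Q", "read", None), ("Q", "write", 10),
              ("Q", "read", None), ("Q", "write", 15), ("P", "write", 20),
              ("Q", "read", None)]
for serial, row in enumerate(write_through_invalidate(OPERATIONS, 5), start=1):
    print(serial, *row)
```

Each printed row corresponds to one table row; an invalidated cache keeps its stale value (for example 5 in P's cache after step 3) with the state INV.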
✓ Write Back- Write Invalidate (ownership)
o A valid block can be owned by memory and shared in multiple caches that can contain only the shared copies of the block.
o Multiple processors can safely read these blocks from their caches until one processor updates its copy.
o At this time, the writer becomes the only owner of the valid block and all other copies are invalidated.

State                          Description
Shared (Read-Only) [RO]        Data is valid and can be read safely. Multiple copies can be in this state.
Exclusive (Read-Write) [RW]    Only one valid cache copy exists and can be read from and written to safely. Copies in other caches are invalid.
Invalid [INV]                  The copy is inconsistent.

Event               Action
Read Hit            Use the local copy from the cache.
Read Miss           If no Exclusive (Read-Write) copy exists, then supply a copy from global memory. Set the state of this copy to Shared (Read-Only). If an Exclusive (Read-Write) copy exists, obtain a copy from the cache that holds the block in the Exclusive (Read-Write) state, and update global memory and the local cache with the copy. Set the state to Shared (Read-Only) in both caches.
Write Hit           If the copy is Exclusive (Read-Write), perform the write locally. If the state is Shared (Read-Only), then broadcast an Invalid command to all caches. Set the state to Exclusive (Read-Write).
Write Miss          Get a copy from either a cache with an Exclusive (Read-Write) copy, or from global memory itself. Broadcast an Invalid command to all caches. Update the local copy and set its state to Exclusive (Read-Write).
Block Replacement   If a copy is in an Exclusive (Read-Write) state, it has to be written back to main memory if the block is being replaced. If the copy is in Invalid or Shared (Read-Only) states, no write back is needed when a block is replaced.

Example:
Consider the shared memory system previously shown in the Figure (memory M, with processors P and Q each having a cache C) and the following operations:
1) P reads X,
2) Q reads X,
3) Q updates X,
4) Q reads X,
5) Q updates X,
6) P updates X,
7) Q reads X.
o The Table shows the contents of memory and the two caches after the execution of each operation when Write Invalidate Write Back was used for cache coherence. The table also shows the state of the block containing X in P’s cache and Q’s cache.

Serial  Event                       Memory X   P’s Cache X   P’s State   Q’s Cache X   Q’s State
0       Original value              5          -             -           -             -
1       P reads X (Read Miss)       5          5             RO          -             -
2       Q reads X (Read Miss)       5          5             RO          5             RO
3       Q updates X (Write Hit)     5          5             INV         10            RW
4       Q reads X (Read Hit)        5          5             INV         10            RW
5       Q updates X (Write Hit)     5          5             INV         15            RW
6       P updates X (Write Miss)    5          20            RW          15            INV
7       Q reads X (Read Miss)       20         20            RO          20            RO
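The Write Invalidate Write Back table can be reproduced in the same way. This sketch models a single one-word location X and two caches, so fetching the block on a write miss has no visible effect and block replacement is not modelled; the function name, the trace layout, and the `OPERATIONS` list are hypothetical.

```python
def write_back_invalidate(operations, initial_value):
    """Trace one location X under write-back, write-invalidate (ownership).

    Each trace row is (event, memory, P value, P state, Q value, Q state).
    """
    memory = initial_value
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    trace = []
    for proc, op, value in operations:
        line = caches[proc]
        hit = line["state"] in ("RO", "RW")
        # Another cache holding the block Exclusive (Read-Write), if any.
        owner = next((name for name, other in caches.items()
                      if name != proc and other["state"] == "RW"), None)
        if op == "read":
            event = "Read Hit" if hit else "Read Miss"
            if not hit:
                if owner is not None:
                    # Owner supplies the data: memory is updated, owner becomes Shared.
                    memory = caches[owner]["value"]
                    caches[owner]["state"] = "RO"
                line["value"], line["state"] = memory, "RO"
        else:
            event = "Write Hit" if hit else "Write Miss"
            if line["state"] != "RW":
                # Not yet the owner: broadcast Invalid to every other copy.
                for name, other in caches.items():
                    if name != proc and other["state"] is not None:
                        other["state"] = "INV"
            # Memory is NOT updated here; that is the write-back part.
            line["value"], line["state"] = value, "RW"
        p, q = caches["P"], caches["Q"]
        trace.append((event, memory, p["value"], p["state"], q["value"], q["state"]))
    return trace


OPERATIONS = [("P", "read", None), ("Q", "read", None), ("Q", "write", 10),
              ("Q", "read", None), ("Q", "write", 15), ("P", "write", 20),
              ("Q", "read", None)]
for serial, row in enumerate(write_back_invalidate(OPERATIONS, 5), start=1):
    print(serial, *row)
```

Memory keeps the original value 5 until step 7, when Q's read miss forces the owner P to supply X = 20 and update memory.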
References

• Textbook Chapter 4
