Lect10 SMPCC

Shared memory architectures allow multiple CPUs to access a shared pool of memory. This can cause cache coherence problems if different CPUs have conflicting copies of data in their caches. Cache coherence protocols like MSI resolve this by maintaining consistency between caches and main memory through snooping or a directory-based approach.


Shared Memory Architectures

 Introduce different shared memory architectures.

 Introduce the cache coherence problem and cache coherence protocols.
Shared memory architectures

 Multiple CPUs (or cores)

 One memory with a global address space
o May have many modules (to increase memory bandwidth)
o All CPUs access all memory through the global address space

 All CPUs can make changes to the shared memory
o Changes made by one processor are visible to all other processors
(see the sketch after this list).

 Data parallelism or function parallelism? MIMD supports both.
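
A minimal sketch of this visibility property, assuming POSIX threads and C11 atomics (the variable names are illustrative): one thread stores to a shared location, and another thread, possibly running on a different CPU, observes the new value through the common address space.

```c
/* Minimal sketch: two threads share one address space, so a store by
 * one is observed by the other. C11 atomics are used so the compiler
 * and hardware are obliged to make the store visible. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int shared_value = 0;
static atomic_int ready = 0;

static void *producer(void *arg) {
    (void)arg;
    atomic_store(&shared_value, 42);   /* write to shared memory */
    atomic_store(&ready, 1);           /* publish the write      */
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (!atomic_load(&ready))       /* spin until published   */
        ;
    printf("consumer sees %d\n", atomic_load(&shared_value));
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

The atomics stand in for what the hardware memory system guarantees at the architecture level; without them the compiler would be free to keep ready in a register and never re-read it.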


Major issue: how to connect CPUs and memory

 Ideal effect:
o Large memory bandwidth
o Low memory latency

 When accessing remote objects, bandwidth and latency are always the
key metrics. Think of the user experience when downloading many small
files versus one very large file.
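
The trade-off can be made concrete with a rough microbenchmark sketch: dependent loads pay the full memory latency on every access (like many small downloads), while independent sequential loads let the hardware overlap requests and approach peak bandwidth (like one large download). The array size and methodology here are illustrative assumptions; accurate measurement needs core pinning, warm-up, and repeated runs.

```c
/* Rough sketch contrasting memory latency (dependent loads) with
 * bandwidth (streaming loads). Illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)             /* 16M entries, ~128 MB: far beyond cache */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    size_t *next = malloc((size_t)N * sizeof *next);

    /* Sattolo's algorithm: a random single-cycle permutation, so each
     * load depends on the previous one and defeats prefetching. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    double t0 = now_sec();
    size_t idx = 0;
    for (long i = 0; i < N; i++)
        idx = next[idx];        /* serialized: pays full latency each time */
    double t_latency = now_sec() - t0;

    t0 = now_sec();
    size_t sum = 0;
    for (long i = 0; i < N; i++)
        sum += next[i];         /* independent loads: hardware overlaps them */
    double t_stream = now_sec() - t0;

    printf("pointer chase: %.2fs, streaming: %.2fs (sink: %zu %zu)\n",
           t_latency, t_stream, idx, sum);
    free(next);
    return 0;
}
```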
Shared memory architectures: UMA and NUMA

• One large memory
– On the same side of the interconnect (mostly a bus)
– Memory references from different CPUs travel the same distance
(same latency)
– Uniform memory access (UMA)

• Many small memories
– Local and remote memory
– Memory references from different CPUs travel different distances
(latency differs)
– Non-uniform memory access (NUMA)
Bus-based UMA Shared memory architecture
 Many processors and memory modules connect to the bus
o This architecture dominated in the server domain in the past.
 Faster processors began to saturate the bus, as bus technology could
not keep up with CPU processing power.
 The bus interconnect may also be replaced by a crossbar interconnect,
but that is expensive.
NUMA Shared memory architecture

 Identical processors, but the time to access memory differs across
different parts of the memory.
 Memory resides in “NUMA domains.”
 Current-generation SMPs adopt the NUMA architecture; for example, in
the AMD EPYC multi-chip module (MCM) processor, memory is distributed
across the modules (see the sketch below).
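
As a sketch of programming against NUMA domains, assuming Linux with the libnuma library installed (compile with -lnuma): memory is allocated explicitly on one domain so that threads pinned there get local-latency accesses.

```c
/* Sketch of NUMA-aware allocation on Linux with libnuma. Placing data
 * in the local NUMA domain avoids the higher latency of remote access. */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    printf("NUMA domains: %d\n", numa_max_node() + 1);

    /* Allocate 1 MB directly on node 0; accesses from CPUs in that
     * domain are local, accesses from other domains are remote. */
    size_t size = 1 << 20;
    void *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL)
        return 1;
    /* ... use buf from threads pinned to node 0 ... */
    numa_free(buf, size);
    return 0;
}
```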
Various SMP hardware organizations
Cache Coherence Problem

Because caches hold copies of memory, different processors may see
different values for the same memory location.
 In the classic example (simulated below), processors see different
values for u after event 3.
 With a write-back cache, memory may also hold the stale data.
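
A toy simulation makes the scenario concrete: per-CPU write-back caches with no coherence protocol, replaying the standard event sequence (u starts at 5; P3 writes 7 at event 3). This is an illustration of the problem, not a model of any real machine.

```c
/* Toy simulation of the classic coherence example: write-back caches
 * with NO coherence protocol. P1/P2/P3 are CPUs 0/1/2. */
#include <stdio.h>
#include <stdbool.h>

#define NCPU 3

static int memory_u = 5;                    /* main memory copy of u  */
static struct { bool valid; int value; } cache[NCPU];

static int cpu_read(int cpu) {
    if (!cache[cpu].valid) {                /* miss: fill from memory */
        cache[cpu].value = memory_u;
        cache[cpu].valid = true;
    }
    return cache[cpu].value;                /* hit: may be stale!     */
}

static void cpu_write(int cpu, int v) {
    cache[cpu].value = v;                   /* write-back: memory and */
    cache[cpu].valid = true;                /* other caches not told  */
}

int main(void) {
    printf("event 1: P1 reads u = %d\n", cpu_read(0));
    printf("event 2: P3 reads u = %d\n", cpu_read(2));
    cpu_write(2, 7);                        /* event 3: P3 writes u=7 */
    printf("event 4: P1 reads u = %d (stale cache copy)\n", cpu_read(0));
    printf("event 5: P2 reads u = %d (stale memory copy)\n", cpu_read(1));
    return 0;
}
```

After event 3, P3 sees 7 while P1 still sees its cached 5, and P2 reads 5 from memory because the write-back cache has not flushed the new value.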
Bus Snoopy Cache Coherence protocols
 Memory: centralized with uniform access time and bus interconnect.
Bus Snooping idea

 Send all requests for data to all processors (through the bus)
 Processors snoop to see if they have a copy and respond accordingly.
o Cache listens to both CPU and BUS.
o The state of a cache line may change due to (1) a CPU memory
operation, or (2) a bus transaction (a remote CPU's memory operation).
 Requires broadcast since caching information is at processors.
o Bus is a natural broadcast medium.
o Bus (centralized medium) also serializes requests.
Types of snoopy bus protocols
 Write-invalidate protocols
o Write to shared data: an invalidate is sent on the bus (all caches
snoop and invalidate their copies).
 Write-broadcast protocols (typically write-through)
o Write to shared data: the new value is broadcast on the bus;
processors snoop and update any copies.
An Example Snoopy Protocol (MSI)
 Invalidation protocol, write-back cache
 Each block of memory is in one state
o Clean in all caches and up-to-date in memory (shared)
o Dirty in exactly one cache (exclusive)
o Not in any cache

 Each cache block is in one state:
o Shared: the block can be read
o Exclusive: this cache has the only copy; it is writable and dirty
o Invalid: the block contains no valid data

 Read misses cause all caches to snoop the bus (bus transaction).

 A write to a shared block is treated as a miss (needs a bus transaction).
MSI protocol state machine for CPU requests
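
A sketch of the MSI transitions in code, covering both sides mentioned earlier: requests from the local CPU and transactions snooped from the bus. Bus-transaction generation and dirty-data flushes appear only as comments; the state and function names are illustrative.

```c
/* Sketch of the MSI state machine: one function per side. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } msi_state;
typedef enum { CPU_READ, CPU_WRITE } cpu_op;
typedef enum { BUS_RD, BUS_RDX } bus_op;     /* read / read-exclusive */

/* Transition on a request from the local CPU. */
static msi_state on_cpu(msi_state s, cpu_op op) {
    switch (s) {
    case INVALID:
        return (op == CPU_READ) ? SHARED     /* miss: issue BusRd     */
                                : MODIFIED;  /* miss: issue BusRdX    */
    case SHARED:
        return (op == CPU_READ) ? SHARED     /* hit: no bus traffic   */
                                : MODIFIED;  /* issue BusRdX to
                                                invalidate other copies */
    case MODIFIED:
        return MODIFIED;                     /* hit: no bus traffic   */
    }
    return s;
}

/* Transition on a transaction snooped from the bus (a remote CPU). */
static msi_state on_bus(msi_state s, bus_op op) {
    switch (s) {
    case SHARED:
        return (op == BUS_RD) ? SHARED       /* others may also read  */
                              : INVALID;     /* remote CPU will write */
    case MODIFIED:
        /* We hold the only up-to-date copy: flush it to the bus.     */
        return (op == BUS_RD) ? SHARED : INVALID;
    case INVALID:
        return INVALID;                      /* not our block         */
    }
    return s;
}

int main(void) {
    msi_state s = INVALID;
    s = on_cpu(s, CPU_READ);   /* I -> S: read miss, BusRd            */
    s = on_cpu(s, CPU_WRITE);  /* S -> M: write, BusRdX invalidates   */
    s = on_bus(s, BUS_RD);     /* M -> S: remote read, we flush       */
    printf("final state: %d (SHARED)\n", s);
    return 0;
}
```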
MSI cache coherence protocol variations

 Basic MSI protocol
o Three states: M, S, I.
o Can be optimized by refining the states so as to reduce bus
transactions in some cases.
 Berkeley protocol (five states)
o M is refined into owned exclusive and owned shared.
 MESI protocol (four states)
o M is refined into Modified and Exclusive.
 MESIF: the protocol used in Intel processors
o MESI + S is refined into S and F (the F cache is the responder for a
request).
 MOESI: the protocol used in AMD processors
Multiple levels of caches

 Most processors today have on-chip L1 and L2 caches.


 Transactions on the L1 cache are not visible on the bus (a separate
snooper for L1 coherence would be expensive).
 Typical solution:
o Maintain the inclusion property between the L1 and L2 caches so that
all bus transactions relevant to L1 are also relevant to L2: it is then
sufficient to use only the L2 controller to snoop the bus.
o Propagate relevant transactions through the cache hierarchy to
maintain coherence.
Large shared memory multiprocessors

• The interconnection network is usually not a bus.
• No broadcast medium → snooping is not possible.
• Needs a different kind of cache coherence protocol.
Cache coherence for large SMPs

 Similar idea to the MSI protocol, but the interconnect has no broadcast.
o Use a directory to record where each memory line is cached and who its
owner is (see the sketch after this list).

 A directory entry per memory line tracks the state of every cached copy
of that block.
o Tracking the state of all memory blocks means directory size =
O(memory size).

 Need to use a distributed directory
o A centralized directory becomes the bottleneck.
 Who is the central authority (home node) for a given cache line?

 Such machines are typically called cc-NUMA multiprocessors.

Performance implications of shared memory
architecture
 The NUMA architecture can have a very large impact on performance.
 The cache coherence protocol can also have an impact:
o Memory writes become even more expensive.
o False sharing (see the sketch below).
 One thread's cache behavior can affect other threads' performance.
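
A sketch of false sharing, assuming a 64-byte cache line (a common size, but an assumption here): two threads increment adjacent counters that land in the same line, so the coherence protocol bounces the line between the cores even though the threads never touch each other's data. Padding separates the counters. Compile with -pthread and compare run times of the two layouts.

```c
/* Sketch of false sharing. Both threads write only their own counter,
 * but when the counters share a cache line, each write invalidates the
 * peer's copy of the line. Swap in `padded` to compare. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000L

static struct { volatile long a; volatile long b; } same_line;
static struct { volatile long a; char pad[64]; volatile long b; } padded;

static void *bump(void *arg) {
    volatile long *p = arg;
    for (long i = 0; i < ITERS; i++)
        (*p)++;                  /* ping-pongs the line when falsely shared */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    (void)padded;                /* alternative layout for comparison */
    pthread_create(&t1, NULL, bump, (void *)&same_line.a);
    pthread_create(&t2, NULL, bump, (void *)&same_line.b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld %ld\n", same_line.a, same_line.b);
    return 0;
}
```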
Summary

 Shared memory architectures
o UMA and NUMA
o Bus-based systems and interconnect-based systems

 Cache coherence problem


 Cache coherence protocols
o Snoopy bus
o Directory based
