ARM Multi Core Processing
Module 8
[Figure: interconnection network and memory; labels: "Coherent", "Current value"]
15 © 2021 Arm Limited
Shared State (S)
[Figure: cached copy labelled "Current value"; memory]
Invalid State (I)
Invalid to Modified
• Occurs when the local core attempts to write some data to an address not already in the cache
1. Read-exclusive request
2. Data response
[Figure: local and remote core caches and memory, before and after; the local line ends in M.]
Invalid to Exclusive
• Occurs when the core attempts to read data from an address that is not already in the cache and no other cache has it
1. Read request
2. Data response
[Figure: local and remote core caches and memory, before and after; the local line ends in E.]
Invalid to Shared
• Occurs when the local core attempts to read data from an address that is not already in the cache, but other caches have a copy
• Data is supplied by another cache.
1. Read request
2. Data response
[Figure: local and remote core caches and memory, before and after; the remote line goes from E to S and the local line ends in S.]
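The three transitions out of Invalid differ only in the kind of access and in whether another cache already holds the line. A minimal Python sketch of that decision follows; the function name `fill_state` and its parameters are illustrative assumptions, not part of any Arm specification.

```python
def fill_state(is_write: bool, other_caches_have_copy: bool) -> str:
    """Return the MESI state a line is installed in after a miss (from Invalid)."""
    if is_write:
        # Write miss: read-exclusive request, line is installed as Modified.
        return "M"
    if other_caches_have_copy:
        # Read miss while another cache holds the line: installed as Shared.
        return "S"
    # Read miss and no other cache has the line: installed as Exclusive.
    return "E"


print(fill_state(is_write=False, other_caches_have_copy=True))  # S
```

In real hardware the "other caches have a copy" input comes from the snoop responses to the read request, which this sketch simply takes as a boolean.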
Shared to Modified
• Occurs when the local core attempts to write some data to an address that is already in the cache in the Shared state
Shared to Invalid
• Occurs when another core attempts to write some data to an address that is already in the cache
• The local cache snoops the read-exclusive request.
1. Read-exclusive request
[Figure: local and remote core caches and memory, before and after; the local line goes from S to invalid and the remote line from S to M.]
Exclusive to Modified
• Occurs when the local core attempts to write some data to an address that is already in the cache, and that's the only copy
• No need to invalidate other caches because we know they don't have a copy
[Figure: local and remote core caches and memory, before and after; the local line goes from E to M.]
Exclusive to Shared
• Occurs when another core attempts to read data from an address that this cache has, and it's the only copy
• Data are supplied by the cache after snooping the read request.
1. Read request
[Figure: local and remote core caches and memory, before and after; the local line goes from E to S and the remote line ends in S.]
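Exclusive to Invalid
• Occurs when another core attempts to write some data to an address that this cache has the only copy of
• The local cache snoops the read-exclusive request.
1. Read-exclusive request
[Figure: local and remote core caches and memory, before and after; the local line goes from E to invalid and the remote line ends in M.]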
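Modified to Shared
• Occurs when another core attempts to read data from an address that this cache has written to
• Must flush the data back to memory and the requesting cache
[Figure: local and remote core caches and memory, before and after; the local line goes from M to S and the remote line ends in S.]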
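Modified to Invalid
• Occurs when another core attempts to write some data to an address that this cache has altered
• Must flush the data back to memory and the requesting cache
1. Read-exclusive request
[Figure: local and remote core caches and memory, before and after; the local line goes from M to invalid and the remote line ends in M.]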
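The snoop-driven transitions share one pattern: a remote read downgrades a valid line to Shared, a remote read-exclusive invalidates it, and only a Modified line has to flush its data first. The Python sketch below captures that pattern under simplifying assumptions; the function `snoop` and the operation names `"read"` and `"read_exclusive"` are hypothetical labels chosen for illustration.

```python
from typing import Tuple


def snoop(state: str, remote_op: str) -> Tuple[str, bool]:
    """Return (new_state, must_flush) when this cache snoops a remote request.

    remote_op is "read" (remote read) or "read_exclusive" (remote write).
    """
    if remote_op not in ("read", "read_exclusive"):
        raise ValueError(f"unknown remote operation: {remote_op}")
    if state == "I":
        return "I", False          # nothing cached, nothing to do
    must_flush = state == "M"      # only a Modified line is newer than memory
    if remote_op == "read":
        return "S", must_flush     # M and E downgrade to Shared; S stays Shared
    return "I", must_flush         # read-exclusive invalidates any valid copy


print(snoop("M", "read"))            # ('S', True)
print(snoop("E", "read_exclusive"))  # ('I', False)
```

The boolean stands in for the write-back to memory and the data transfer to the requesting cache; a fuller model would move the data as well.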
[Figure: MESI state diagram]
• M: stays in M on a local core read/write
• E: stays in E on a local core read
• S: stays in S on a local/remote core read
• I: stays in I on a remote core read/write
• I to S: local core read, remote core has a copy
• I to E: local core read, no other copies
• I to M, S to M, E to M: local core write
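The local-core edges of the state diagram can be written as a lookup table and replayed over a sequence of accesses. This is a sketch only: the event names `read_shared` and `read_alone` are hypothetical labels for a local read miss with and without a copy in another cache, and remote-core edges are left out.

```python
# (state, local event) -> next state, for the local-core edges of the diagram.
LOCAL_TRANSITIONS = {
    ("I", "read_shared"): "S",  # local read, remote core has a copy
    ("I", "read_alone"): "E",   # local read, no other copies
    ("I", "write"): "M",
    ("S", "read"): "S",
    ("S", "write"): "M",
    ("E", "read"): "E",
    ("E", "write"): "M",
    ("M", "read"): "M",
    ("M", "write"): "M",
}


def trace(events, state="I"):
    """Return the list of states one cache line passes through."""
    states = [state]
    for event in events:
        state = LOCAL_TRANSITIONS[(state, event)]
        states.append(state)
    return states


print(trace(["read_alone", "read", "write", "read"]))
# ['I', 'E', 'E', 'M', 'M']
```

A table keeps every edge visible in one place, which makes it easy to compare against the diagram edge by edge.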
Memory Consistency
Accesses issued   Accesses issued   Accesses issued   Core 1’s accesses seen
Store B           Store C           Store B           Load A
Load A            Load A            Load A            Load D
Store C           Load B            Store C           Store B
Store A           Load C            Store A           Store C
Load C            Store D           Load C            Load C
Load D            Load A            Load D            Store A
[Figure labels: "Reordered", "Data propagation"]
Memory Consistency
• The memory consistency model defines the valid outcomes of sequences of accesses from the different cores.
• Sequential consistency (SC) is the strongest and most intuitive model.
  • The operations of each core occur in program order, and these are interleaved (at some granularity) across all cores.
  • This means that no loads or stores can bypass other loads or stores.
  • SC is overly strong: it prevents many useful optimizations, yet most programs do not need its guarantees.
• Total store order (TSO) is widely implemented (e.g., x86 architectures).
  • The same as sequential consistency, except that a younger load may observe a state of memory in which the effects of an older store have not yet become observable.
• Forms of relaxed consistency have also been adopted (e.g., Arm and PowerPC architectures).
  • In more relaxed consistency models, further SC constraints are removed; for example, a younger load may observe memory before an older load does.
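The difference between SC and TSO shows up in the classic "store buffering" test, where each core stores to one location and then loads the other. The Python sketch below enumerates the possible outcomes under a deliberately simplified model: SC is modelled as every interleaving that keeps program order, and TSO is approximated by also letting each core's load take effect before its older store to a different address. The variable names and this approximation are assumptions made for illustration, not a full model of any architecture.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen, ia, ib = set(positions), iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one global order of accesses; return the loaded values (r0, r1)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, addr, reg in schedule:
        if op == "store":
            mem[addr] = 1
        else:
            regs[reg] = mem[addr]
    return regs["r0"], regs["r1"]


CORE0 = [("store", "x", None), ("load", "y", "r0")]
CORE1 = [("store", "y", None), ("load", "x", "r1")]


def outcomes(core0_orders, core1_orders):
    """Collect (r0, r1) over all allowed per-core orders and interleavings."""
    return {run(s)
            for p0 in core0_orders
            for p1 in core1_orders
            for s in interleavings(p0, p1)}


sc = outcomes([CORE0], [CORE1])
# TSO sketch: the younger load may also be observed before the older store.
tso = outcomes([CORE0, CORE0[::-1]], [CORE1, CORE1[::-1]])

print(sorted(sc))   # [(0, 1), (1, 0), (1, 1)]
print(sorted(tso))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra outcome (0, 0) under the TSO sketch is the case where both loads read memory before either store has become observable, which no SC interleaving allows.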
• E: Exclusive/UniqueClean (UC) – the line is in only this
[Figure: IFU, L1 Instruction, L1 Data; * Optional]
Conclusions
• Multicore processors turn increasing numbers of transistors into performance.
  • Performance comes through thread-level parallelism.
• Shared-memory systems are the most common paradigm.
  • Cores share a memory and a common address space.
  • Data written by one core are read by others when they access the same location.
• Dealing with shared memory in the presence of caches poses a challenge.
  • This is where the cache-coherence protocol comes into play.
  • We looked at the MESI protocol, but both simpler and more complex protocols exist.
• Memory consistency defines the order in which reads/writes to different addresses are seen by the different cores in the system.