ARM Multi Core Processing

Multicore processors contain multiple CPU cores on a single chip connected by an interconnect. This allows parallel processing to improve performance. There are challenges in communication between cores and ensuring data consistency. Cores can communicate via either message passing or shared memory. Cache coherence protocols like MESI ensure cores always access the most up-to-date data values when using shared memory. The MESI protocol defines states like Modified, Exclusive, Shared, and Invalid for cache blocks to manage coherence across cores.

Multicore Processors

Module 8

© 2021 Arm Limited


Module Syllabus
• Multicore
• Communication
  • Message passing
  • Shared memory
• Cache-coherence protocol
  • MESI protocol states
  • MESI protocol transitions
  • Visualizing the protocol
  • MESI state transition diagram
• Memory consistency



Motivation
• Moore's law meant that the number of transistors available to an architect kept increasing.
• Historically, these were used to improve the performance of a single-core processor.
  • Usually through increased speculation support, but this gives sublinear performance improvements.
• However, the breakdown of Dennard scaling meant these schemes were no longer viable.
  • They consumed large amounts of power without giving commensurate performance improvements.
• Multicore architectures are an efficient way of using these transistors instead.
  • Performance comes from parallelism, specifically thread-level or process-level parallelism.
  • Note that we had multiprocessor systems long before Dennard scaling failed; this failure just pushed them into the mainstream.
• What are the challenges in providing multiple cores, and how do they communicate?



Multicore

Overview
• In a multicore processor, multiple CPU cores are provided within the chip.
• Cores are connected together through some form of interconnect.
• Cores share some components on chip.
  • For example, the memory interface, or a cache level
[Diagram: an example multicore system with four CPU cores on an interconnect that also links I/O, RAM, a peripheral, and a timer.]



Multicore
• Cores in a multicore processor are connected together and can collaborate.
• There are a number of challenges to consider when creating a system like this.
• How do cores communicate with each other?
• How is data synchronized?
• How do we ensure that cores don’t get stale data when it’s been modified by other cores?
• How do cores see the ordering of events coming from different cores?
• We’ll explore each of these by considering the concepts of
• Shared memory and message passing
• Cache coherence
• Memory consistency



Communication



Communication

Message Passing
• In this paradigm, applications running on each core wrap up the data they want to communicate into messages sent to other cores.
• Explicit communication via send and receive operations
• Synchronization is implicit via blocking on messages.

Shared Memory
• In this paradigm, there is a shared memory address space accessible to all cores, where they can read and write data.
• Implicit communication via memory accesses
• Synchronization is performed using atomic memory operations.



Message Passing
• Cores do not rely on a shared memory space.
• Each processing element (PE) has its own core, data, and I/O.
  • Explicit I/O to communicate with other cores
  • Synchronization via sending and receiving messages
• Advantages
  • Less hardware, easier to design
  • Focuses attention on costly non-local operations
[Diagram: each core has a private cache and private memory; cores communicate only over the interconnection network.]



Shared Memory
• Cores share a memory that they see as a single address space.
• Cores may have caches holding data.
• Communication involves reading and writing to locations in memory.
• Synchronization via atomic operations to modify memory
  – Specific instructions provided in the ISA
  – The hardware guarantees correct operation
• Advantages
  • Matches the programmer's view of the system
  • Hardware handles communication implicitly
[Diagram: cores with private caches connected through an interconnection network to a single shared memory.]



Shared Memory with Caches
• Caching shared-memory data has to be handled carefully.
• The cache stores a copy of some of the data in memory.
• In particular, writes to the data must be dealt with correctly.
  • A core may write to data in its own cache.
  • This makes copies in other caches become stale.
  • The version in memory won't get updated immediately either, if it's a write-back cache.
  • In these situations, there is a danger of one core subsequently reading a stale (old) value.
[Diagram: two cores with private caches above a shared cache and memory; cache coherence applies to the private-cache level of the hierarchy.]



Cache Coherence



Cache-coherence Protocol
• The cache-coherence protocol ensures that cores always read the most up-to-date values.
  • This means a core can always find the most recent value for some data, no matter where it is.
  • It describes the actions to take on seeing certain events from other cores.
• Cache coherence is required when caches share some memory.
• The protocols rely on caches seeing events from other cores.
  • In particular, reads and writes to the shared memory
  • Often this is realized through snooping operations on the interconnect.
• The protocol runs on each block in the cache independently.
  • This is the granularity at which commercial implementations track the state information.
• It runs in the private caches that have a shared ancestor.
[Diagram: two cores with private caches above a shared cache or memory; the protocol runs in the highlighted private caches.]
MESI Cache-coherence Protocol
• The MESI protocol is a write-invalidate cache-coherence protocol.
• When writing to a shared location, the related cache block is invalidated in the caches of all other
cores.
• This protocol can manage cache coherence for a specified memory area.
• In general, uses an allocate-on-write cache policy
• New data are loaded into the cache on both read and write misses.
• In general, uses a write-back cache
• So caches can store data that are more recent than the value in memory.
• The MESI protocol defines states for each cache block and transitions between them.
• Four states, corresponding to M, E, S, and I



Modified State (M)
• The local cache holds the only copy of the block, which is also the most recent version of the data; memory holds old (stale) data.
• Note that memory may actually be a shared cache between the cores' caches and main memory.
[Diagram: the local cache holds the current value in state M; the remote cache has no copy; memory holds a stale value and is not coherent with the local cache.]
Exclusive State (E)
• The local cache holds the only copy of the block, which is identical to memory's version.
[Diagram: the local cache holds the current value in state E; the remote cache has no copy; memory holds the same value, so cache and memory are coherent.]
Shared State (S)
• The local cache holds a copy of the block, which is identical to memory's version; other caches may also hold the block in shared state.
[Diagram: both the local and remote caches hold the current value in state S; memory holds the same value, so all copies are coherent.]
Invalid State (I)
• The local cache does not hold a copy of the block.
• This state may not be marked in the cache for each block; it could be inferred by a cache miss on that block.
[Diagram: the local cache holds no copy of the block.]
Invalid to Modified
• Occurs when the local core attempts to write some data to an address not already in the cache
  1. Read-exclusive request
  2. Data response
[Diagram: before, the local cache holds no copy; the local cache issues a read-exclusive request (1), memory supplies the data (2), and afterwards the local cache holds the block in state M.]
Invalid to Exclusive
• Occurs when the core attempts to read data from an address that is not already in the cache and no other cache has it
  1. Read request
  2. Data response
[Diagram: before, the local cache holds no copy; the local cache issues a read request (1), memory supplies the data (2), and afterwards the local cache holds the block in state E.]
Invalid to Shared
• Occurs when the local core attempts to read data from an address that is not already in the cache, but other caches have a copy
• Data are supplied by another cache.
  1. Read request
  2. Data response
[Diagram: before, the remote cache holds the block in state E; the local cache issues a read request (1), the remote cache supplies the data (2), and afterwards both caches hold the block in state S.]
Shared to Modified
• Occurs when the local core attempts to write some data to an address that is already in the cache, and other caches may have a copy
• Needs to invalidate other copies
  1. Read-exclusive request
  2. Invalidate
[Diagram: before, both caches hold the block in state S; the local cache issues a read-exclusive request (1), the remote copy is invalidated (2), and afterwards the local cache holds the block in state M.]
Shared to Invalid
• Occurs when another core attempts to write some data to an address that is already in the cache
• The local cache snoops the read-exclusive request.
  1. Read-exclusive request
  2. Invalidate
[Diagram: before, both caches hold the block in state S; the remote cache issues a read-exclusive request (1), the local copy is invalidated (2), and afterwards the remote cache holds the block in state M.]
Exclusive to Modified
• Occurs when the local core attempts to write some data to an address that is already in the cache, and that's the only copy
• No need to invalidate other caches because we know they don't have a copy
[Diagram: before, the local cache holds the block in state E; the write completes locally with no bus transaction, and afterwards the state is M.]
Exclusive to Shared
• Occurs when another core attempts to read data from an address that this cache has, and it's the only copy
• Data are supplied by the cache after snooping the read request.
  1. Read request
  2. Data response
[Diagram: before, the local cache holds the block in state E; the remote cache issues a read request (1), the local cache supplies the data (2), and afterwards both caches hold the block in state S.]
Exclusive to Invalid
• Occurs when another core attempts to write some data to an address that this cache has the only copy of
• The local cache snoops the read-exclusive request.
  1. Read-exclusive request
  2. Data response
  3. Invalidate
[Diagram: before, the local cache holds the block in state E; the remote cache issues a read-exclusive request (1), the local cache supplies the data (2), the local copy is invalidated (3), and afterwards the remote cache holds the block in state M.]
Modified to Shared
• Occurs when another core attempts to read data from an address that this cache has written to
• Must flush the data back to memory and the requesting cache
  1. Read request
  2. Data response
[Diagram: before, the local cache holds the block in state M; the remote cache issues a read request (1), the local cache flushes the data to memory and to the requester (2), and afterwards both caches hold the block in state S.]
Modified to Invalid
• Occurs when another core attempts to write some data to an address that this cache has altered
• Must flush the data back to memory and the requesting cache
  1. Read-exclusive request
  2. Data response
  3. Invalidate
[Diagram: before, the local cache holds the block in state M; the remote cache issues a read-exclusive request (1), the local cache flushes the data (2), the local copy is invalidated (3), and afterwards the remote cache holds the block in state M.]
Visualizing the Protocol
• We can visualize the protocol in two ways.
• First, a table showing the valid combinations of states that two caches can have for the same block
  • For example, two caches can have it in shared.
  • But if any cache has it in exclusive or modified, then all other caches are invalid.
• Second, we can draw a state-transition diagram to show all events and actions.

      M  E  S  I
  M   ⨯  ⨯  ⨯  ✓
  E   ⨯  ⨯  ⨯  ✓
  S   ⨯  ⨯  ✓  ✓
  I   ✓  ✓  ✓  ✓



MESI State-transition Diagram
• The diagram shows, for each state, the transition triggered by each local and remote read or write:
  • I → E: local core read, no other copies
  • I → S: local core read, remote core has a copy
  • I → M: local core write
  • E → M: local core write
  • E → S: remote core read
  • E → I: remote core write
  • S → M: local core write
  • S → I: remote core write
  • M → S: remote core read
  • M → I: remote core write
• A block stays in its state on the remaining events: M on local reads and writes, E on local reads, S on local and remote reads, and I on remote reads and writes.
Memory Consistency



Memory Consistency
• The purpose of cache coherence is to ensure data propagation and coherence.
• So when data are written by one core, all other cores can later read the correct value.
• When a core attempts to write, others know that their copies are stale.
• When a core attempts to read, others know they must provide their data, if modified.
• The cache-coherence protocol is run independently on each block of data.
• There is no direct interaction between different blocks, as far as the protocol is concerned.
• So what about the order in which data accesses by one core are seen by others?
• If a core performs certain reads and writes to different data, in what order do other cores see them?
• It is the purpose of the memory-consistency model to define this.
• And the job of the memory hierarchy (and core) to implement it



Memory Consistency
• Modern processors may reorder memory operations.
• Out-of-order processing can allow loads to access the cache ahead of older stores.
• Either because the addresses they access don’t match
• Or because the load has been speculatively executed and will be replayed later if a dependence is
found
• This avoids stalling loads unnecessarily, even though their effects can be seen externally.
• By other cores in the system
• Recall module 5 where this was introduced.



Memory Consistency vs Cache Coherence

Cache coherence (data propagation between cores)
  Core 1 accesses issued: Store B, Load A, Store C, Store A, Load C, Load D
  Core 2 accesses issued: Store C, Load A, Load B, Load C, Store D, Load A

Memory consistency (ordering of one core's accesses as seen by another)
  Core 1 accesses issued:            Store B, Load A, Store C, Store A, Load C, Load D
  Core 1's accesses seen by Core 2:  Load A, Load D, Store B, Store C, Load C, Store A (reordered)
Memory Consistency
• The memory consistency model defines valid outcomes of sequences of accesses of the
different cores.
• Sequential consistency (SC) is the strongest and most intuitive model.
• The operations of each core occur in program order, and these are interleaved (at some granularity)
across all cores.
• This means that no loads or stores can bypass other loads or stores.
• SC is overly strong because it prevents many useful optimizations without being needed by most
programs.
• Total store order (TSO) is widely implemented (e.g., x86 architectures).
• The same as sequential consistency but allows a younger load to observe a state of memory in which
the effects of an older store have not yet become observable
• Forms of relaxed consistency have been adopted (e.g., Arm and PowerPC architectures).
• In more relaxed consistency models, other constraints in SC are removed, such as a younger load
observing a state of memory before an older load does.



Case Study: Cortex-A9 MPCore
• Contains up to four Cortex-A9 processors
• Snoop Control Unit
  • Maintains L1 data cache coherency across processors
  • Arbitrates accesses to the L2 memory system, through one or two external 64-bit AXI manager interfaces



Case Study: Heterogeneous Multicore
• big.LITTLE is a heterogeneous processing architecture with two types of cores.
  • "big" cores for high compute performance
  • "LITTLE" cores for power efficiency
• L1 and L2 memory system in cores
• A DynamIQ system contains big and LITTLE cores and a shared unit containing:
  • L3 memory system
  • Control logic
  • External interfaces



Case Study: Cortex-A55
• The Cortex-A55 is a LITTLE core.
• Optionally contains an L2 cache
• Uses the MESI protocol for coherence
  • M: Modified/UniqueDirty (UD) – the line is in only this cache and is dirty.
  • E: Exclusive/UniqueClean (UC) – the line is in only this cache and is clean.
  • S: Shared/SharedClean (SC) – the line is possibly in other caches and is clean.
  • I: Invalid/Invalid (I) – the line is not in this cache.
• The data-cache unit (DCU) stores the MESI state of each cache line.
[Diagram: a cluster of up to eight Cortex-A55 cores (cores 1-7 optional); each core's L1 memory system contains L1 instruction and data caches plus IFU, DPU, DCU, STB, BIU, MMU, PMU, and an optional NEON unit; an optional per-core L2 cache, ETM, optional ELA, and GIC CPU interface connect through asynchronous bridges to the DSU SCU and L3.]
Conclusions
• Multicore processors provide performance from increasing numbers of transistors.
• Performance comes through thread-level parallelism.
• Shared-memory systems are the most common paradigm.
• Cores share a memory and common address space.
• Data written by one core are read by others when accessing the same location.
• Dealing with shared memory in the presence of caches poses a challenge.
• This is where the cache-coherence protocol comes into play.
• We looked at the MESI protocol, but other protocols exist, both simpler and more complex.
• Memory consistency defines the order that reads/writes to different addresses are seen
by the different cores in the system.

