
HIGH PERFORMANCE COMPUTING
LECTURE 6

Dr. Mohamed Ghetas


MIMD Systems Interconnection

MIMD
 Shared Memory
   ◼ Bus
   ◼ Crossbar
 Distributed Memory
   ◼ Direct: Ring, Toroidal mesh
   ◼ Indirect: Crossbar, Omega
Cache coherence
 Programmers have no control over caches and when they get updated.

Example
A shared memory system with two cores and two caches:
◼ y0 privately owned by Core 0
◼ y1 and z1 privately owned by Core 1
Cache coherence
 y0 privately owned by Core 0
 y1 and z1 privately owned by Core 1
 x = 2; /* shared variable */

 y0 eventually ends up = 2
 y1 eventually ends up = 6
 z1 = ???
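The value of z1 depends on whether Core 1 sees Core 0's update to x. A minimal sketch of the timeline, following the well-known textbook version of this example (the statements for y0 and y1 are inferred from the stated results, and the later assignment x = 7 is taken from the textbook, not the slide):

    int x = 2;    /* shared variable, cached by both cores */
    int y0;       /* private to Core 0 */
    int y1, z1;   /* private to Core 1 */

    y0 = x;       /* Core 0: y0 = 2 */
    y1 = 3 * x;   /* Core 1: y1 = 6 */
    x  = 7;       /* Core 0: updates its cached copy of x */
    z1 = 4 * x;   /* Core 1: 28 if its cache sees the new x = 7,
                     but 8 if it still holds the stale x = 2 */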

Problem with Write-Through Policy
 With a write-through policy, every write goes straight to main memory, but copies of the same cache line in other cores' caches are not updated, so those cores can keep reading stale values.

Problem with Write-Back Policy
 With a write-back policy, an update stays in the writing core's cache until the line is evicted, so in the meantime both main memory and the other caches hold stale values.
Cache coherence
 Programmers have no control over caches and when they get updated.
 Copies of the data stored in the shared memory must match the copies stored in the local caches. This is referred to as cache coherence.
 The copies of a shared variable are coherent if they are all equal.
 Cache coherence is important to guarantee correct program execution and to ensure high system performance.
Cache Coherence Protocols
 A cache coherence protocol must be used to ensure that the contents of the cache memories are consistent with the contents of the shared memory.
 Two main cache coherence protocols:
   1. Snooping Cache Coherence
   2. Directory Based Cache Coherence
Snooping Cache Coherence
 The cores share a bus.
 Any signal transmitted on the bus can be “seen” by all cores connected to the bus.
 When core 0 updates the copy of x stored in its cache, it also broadcasts this information across the bus.
 If core 1 is “snooping” the bus, it will see that x has been updated and it can mark its copy of x as invalid.
Snooping Cache Coherence
 Snooping works with both write-through and write-back caches.
 It requires a broadcast every time a variable is updated.
 In large networks, broadcasts are expensive.
 Snooping cache coherence isn’t scalable: as systems grow, the broadcast traffic causes performance to degrade.
Directory Based Cache Coherence
 Uses a data structure called a directory that stores the status of each cache line.
 When a variable is updated, the directory is consulted, and the cache controllers of the cores holding that variable’s cache line invalidate their copies.
Directory Based Cache Coherence
 The local caches associated with the processors have local cache controllers that coordinate updates to the copies of the shared variables stored in those caches.
 The central controller is responsible for cache coherence across the system.
 The directory requires additional storage.
 When a cached variable is updated, only the cores storing that variable need to be contacted.
False Sharing
 CPU caches are implemented in hardware, so they operate on cache lines, not individual variables.
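The code the following slides refer to was shown as an image on the original slide and did not survive extraction; following the textbook example, it is roughly a doubly nested loop that accumulates into a shared array y (f(i,j) stands for some computation on i and j):

    double y[m];                    /* shared array of m doubles */

    for (int i = 0; i < m; i++)     /* assign y = 0 */
        y[i] = 0.0;

    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            y[i] += f(i, j);        /* each iteration updates one element of y */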

False Sharing
 We can parallelize the previous code by dividing the iterations of the outer loop among the cores.
 If we have core_count cores, we might assign the first m/core_count iterations to the first core, the next m/core_count iterations to the second core, and so on, as sketched below.
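A minimal sketch of this block partition, assuming each thread knows its rank my_rank and that core_count evenly divides m (both names are illustrative, not from the slide):

    int iters_per_core = m / core_count;
    int my_first = my_rank * iters_per_core;     /* first iteration of this thread's block */
    int my_last  = my_first + iters_per_core;    /* one past the last */

    for (int i = my_first; i < my_last; i++)     /* this thread's slice of the outer loop */
        for (int j = 0; j < n; j++)
            y[i] += f(i, j);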

False Sharing
 Suppose our shared-memory system has two cores, m = 8, doubles are eight bytes, cache lines are 64 bytes, and y[0] is stored at the beginning of a cache line.
 A cache line can then hold eight doubles (64 / 8 = 8), so y occupies exactly one cache line.
 What happens when core 0 and core 1 simultaneously execute their code?
False Sharing
 Since all of y is stored in a single cache line, each time one of the cores executes the statement y[i] += f(i,j), the line is invalidated, and the next time the other core tries to execute this statement it has to fetch the updated line from memory!

False Sharing
 This is called false sharing, because the system behaves as if the elements of y were being shared by the cores.
 False sharing does not cause incorrect results.
 It can, however, degrade the performance of a program by causing many more memory accesses than necessary.

False Sharing
 How can we solve this problem?
 To reduce its effect, use temporary storage that is local to the thread or process, then copy the temporary storage to the shared storage, as in the sketch below.
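A sketch of that fix, reusing the names from the earlier partition sketch: each thread accumulates into private storage and writes the shared array y only once at the end.

    double my_y[ITERS_PER_CORE];                 /* thread-private scratch array */

    for (int i = my_first; i < my_last; i++) {
        my_y[i - my_first] = 0.0;
        for (int j = 0; j < n; j++)
            my_y[i - my_first] += f(i, j);       /* no writes to the shared cache line */
    }

    for (int i = my_first; i < my_last; i++)
        y[i] = my_y[i - my_first];               /* one final burst of shared writes */

The shared line holding y is now written only once per element instead of once per inner-loop iteration, so the invalidation ping-pong all but disappears.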

Parallel software
 The burden is on software: hardware and compilers can keep up the pace needed, but software must be written to exploit the parallelism.
 From now on…
   In shared memory programs:
     ◼ Start a single process and fork threads.
     ◼ Threads carry out tasks.
   In distributed memory programs:
     ◼ Start multiple processes.
     ◼ Processes carry out tasks.
SPMD – single program multiple data
 An SPMD program consists of a single executable that can behave as if it were multiple different programs through the use of conditional branches.

    if (I’m thread/process i)
        do this;
    else
        do that;
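A runnable illustration of this pattern, here using MPI as the process-spawning library (the slide itself does not name one):

    /* spmd.c – every process runs this same executable */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

        if (rank == 0)
            printf("process 0: doing this\n");  /* one branch of the program */
        else
            printf("process %d: doing that\n", rank);

        MPI_Finalize();
        return 0;
    }

Launched with, e.g., mpiexec -n 4 ./spmd, the single executable behaves as four cooperating programs.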
Writing Parallel Programs

    double x[n], y[n];
    …
    for (i = 0; i < n; i++)
        x[i] += y[i];

1. Divide the work among the processes/threads
   (a) so each process/thread gets roughly the same amount of work,
   (b) and communication is minimized.
2. Arrange for the processes/threads to synchronize.
3. Arrange for communication among processes/threads.
Shared Memory
 Dynamic threads
   ◼ Master thread waits for work, forks new threads, and when the threads are done, they terminate.
   ◼ Efficient use of resources, but thread creation and termination is time consuming.
 Static threads
   ◼ Pool of threads created and allocated work, but they do not terminate until cleanup (see the sketch below).
   ◼ Better performance, but potential waste of system resources.
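A minimal sketch of static threads with Pthreads (one possible library choice; the slide does not prescribe one): the pool is created once, and the threads are only joined at cleanup.

    #include <pthread.h>
    #include <stdio.h>

    #define THREAD_COUNT 4

    void *worker(void *arg) {
        long rank = (long) arg;
        printf("thread %ld carrying out its tasks\n", rank);  /* real work goes here */
        return NULL;
    }

    int main(void) {
        pthread_t pool[THREAD_COUNT];
        for (long t = 0; t < THREAD_COUNT; t++)               /* create the pool once */
            pthread_create(&pool[t], NULL, worker, (void *) t);
        for (long t = 0; t < THREAD_COUNT; t++)               /* join only at cleanup */
            pthread_join(pool[t], NULL);
        return 0;
    }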
