CS621 Week 6

CS621 Parallel and Distributed Computing

Dr. Muhammad Anwaar Saeed
Dr. Said Nabi
Ms. Hina Ishaq

Concurrency Control
Objectives
• What is Concurrency?
• Mechanisms for Concurrency Control.
Definition of Concurrency

"Concurrency is the task of running two or more computations over the same time interval. Two events are said to be concurrent if they occur within the same time interval."
What is Concurrency?

Concurrent doesn't necessarily mean at the same exact instant. For example, two tasks may occur concurrently within the same second, but with each task executing within different fractions of that second.

Concurrent tasks can execute in a single-processing or a multiprocessing environment:
• In a single-processing environment, concurrent tasks exist at the same time and execute within the same time period by context switching.
• In a multiprocessor environment, if enough processors are free, concurrent tasks may execute at the same instant over the same time period.

The determining factor for what makes an acceptable time period for concurrency is relative to the application.
Concurrency Control

• There must be an implicit or explicit control over concurrency. It is both hazardous and unsafe when multiple flows of execution operate simultaneously in the same address space without any kind of agreement on ordered access. Two or more activities might access the same data and thus induce data corruption as well as an inconsistent or invalid application state.
• Multiple activities that work jointly on a problem need an agreement on their common progress. Both issues represent fundamental challenges of concurrency and concurrent programming.
Concurrency Control Cont…

Synchronization and Coordination are two basic approaches to tackle this challenge:
• Synchronization is a mechanism that controls access to shared resources between multiple activities. It enforces exclusive and ordered access to the resource by different activities.
• Coordination aims at the orchestration of collaborating activities.
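To make this concrete, here is a minimal Python sketch of both mechanisms (the counter, the thread count, and all names are illustrative assumptions, not from the slides): a lock enforces exclusive access to shared state, and an event coordinates the common starting point of the activities.

```python
import threading

counter = 0
counter_lock = threading.Lock()   # synchronization: exclusive, ordered access
start = threading.Event()         # coordination: agree on common progress

def worker():
    global counter
    start.wait()                  # block until the coordinator gives the signal
    for _ in range(100_000):
        with counter_lock:        # only one activity updates the counter at a time
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
start.set()                       # let all workers begin together
for t in threads:
    t.join()
print(counter)                    # always 400000 with the lock in place
```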
Benefits of Concurrency

• The overall time to perform a series of tasks is reduced.
• Concurrent processes can reduce duplication in code.
• The overall runtime of an algorithm can be significantly reduced.
• Concurrency control can also increase the scalability of parallel and distributed computing systems.
• Redundancy can make systems more reliable.
• More real-world problems can be solved than with sequential algorithms alone.
Concurrency Control: Basic Approaches to Achieving Concurrency
Objectives
• Understanding of Concurrency Control.
• Parallel Programming Technique.
• Distributed Programming Technique.
Achieving Concurrency

Parallel programming and distributed programming are two basic approaches for achieving concurrency:
• Parallel programming techniques assign the work a program has to do to two or more processors within a single physical or a single virtual computer.
• Distributed programming techniques assign the work a program has to do to two or more processes, where the processes may or may not exist on the same computer.
Achieving Concurrency: Parallel Programming Technique

• The parallel application consists of one program divided into four tasks. Each task executes on a separate processor; therefore, the tasks may execute simultaneously. Each task can be implemented by either a process or a thread.

[Figure: Typical architecture for a parallel program.]
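A minimal sketch of this architecture in Python, assuming a sum-of-squares workload of our own choosing (the slide does not prescribe one): one program, four tasks, one worker process per task.

```python
from multiprocessing import Pool

def task(chunk):
    # Each task handles one part of the program's work.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]    # divide the work into four tasks
    with Pool(processes=4) as pool:            # one worker process per task
        partials = pool.map(task, chunks)      # tasks may execute simultaneously
    print(sum(partials))                       # combine the partial results
```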
Achieving Concurrency: Distributed Programming Technique

• The distributed application consists of three separate programs, with each program executing on a separate computer. Program 3 consists of two separate parts that execute on the same computer. Although tasks A and D of Program 3 are on the same computer, they are distributed because they are implemented by two separate processes.

[Figure: Typical architecture for a parallel and distributed program.]
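The same idea can be sketched with two processes that interact only over the network (a hedged illustration: the host, port, and messages are our assumptions, and both parts run on one machine here for convenience, whereas a real distributed program would place them on separate computers):

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 50007    # hypothetical address and port

listening = threading.Event()

def server():
    # Program 1: receives a message from another process and acknowledges it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        listening.set()            # signal that the server is ready
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"ack: " + data)

def client():
    # Program 2: sends its result to Program 1 over the network.
    listening.wait()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"result of task A")
        print(cli.recv(1024).decode())

t = threading.Thread(target=server)
t.start()
client()
t.join()
```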
Concurrency Control: Models for Programming Concurrency
Objectives
• Models of Programming Concurrency.
• Van Roy's Approaches for Programming Concurrency.
Concurrency Control: Models for Programming Concurrency

Van Roy introduces four main approaches for programming concurrency:
• Sequential Programming.
• Declarative Concurrency.
• Message-passing Concurrency.
• Shared-state Concurrency.
Sequential Programming

In this deterministic programming model, no concurrency is used at all. In its strongest form, there is a total order of all operations of the program. Weaker forms still keep the deterministic behavior; however, they either make no guarantees on the exact execution order to the programmer a priori, or they provide mechanisms for explicit preemption of the currently active task, as co-routines do, for instance.
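The co-routine style of explicit preemption can be sketched with Python generators (a minimal illustration; the task names and step counts are ours). Each task runs until it voluntarily yields, and a trivial round-robin scheduler fixes a deterministic execution order.

```python
def task(name, steps):
    # A co-routine: performs one step, then explicitly yields control.
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                          # explicit preemption point

# A trivial round-robin scheduler: the interleaving is fixed and deterministic.
ready = [task("A", 2), task("B", 2)]
while ready:
    current = ready.pop(0)
    try:
        next(current)                  # run the task up to its next yield
        ready.append(current)          # re-queue it behind the others
    except StopIteration:
        pass                           # the task has finished
```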
Declarative Concurrency

Declarative programming is a programming model that favors implicit control flow of computations. Control flow is not described directly; rather, it is a result of the computational logic of the program. The declarative concurrency model extends the declarative programming model by allowing multiple flows of execution. It adds implicit concurrency that is based on a data-driven or a demand-driven approach. While this introduces some form of nondeterminism at runtime, the nondeterminism is generally not observable from the outside.
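Van Roy's declarative concurrency is usually illustrated with dataflow variables (for example in Oz); as a rough Python approximation (an assumption on our part, since futures are not true dataflow variables), futures give a data-driven flavor: the final result is deterministic even though the two submitted computations may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Control flow follows the data dependencies rather than explicit sequencing:
# c depends on a and b, so it is computed only once both are available.
with ThreadPoolExecutor() as pool:
    a = pool.submit(lambda: 2 + 3)     # may run concurrently with b
    b = pool.submit(lambda: 4 * 5)     # may run concurrently with a
    c = a.result() + b.result()        # demand-driven: waits for a and b
print(c)                               # always 25: the nondeterminism is hidden
```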
Message-passing Concurrency

This model is a programming style that allows concurrent activities to communicate via messages. Generally, this is the only allowed form of interaction between activities, which are otherwise completely isolated. Message passing can be either synchronous or asynchronous, resulting in different mechanisms and patterns for synchronization and coordination.
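A minimal sketch in Python (the message contents and the use of threads rather than distributed processes are our simplifications): two otherwise isolated activities interact only through a message queue, with an asynchronous send and a blocking receive.

```python
import queue
import threading

mailbox = queue.Queue()                # the only channel between the activities

def producer():
    for i in range(3):
        mailbox.put(f"message {i}")    # asynchronous send: does not block
    mailbox.put(None)                  # sentinel: no more messages

def consumer():
    while True:
        msg = mailbox.get()            # blocks until a message arrives
        if msg is None:
            break
        print("received", msg)

threading.Thread(target=producer).start()
consumer()
```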
Shared-state Concurrency

Shared-state concurrency is an extended programming model where multiple activities are allowed to access contended resources and states. Sharing the exact same resources and states among different activities requires dedicated mechanisms for synchronization of access and coordination between activities. The general nondeterminism and missing invariants of this model would otherwise directly cause problems regarding consistency and state validity.
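The problem can be demonstrated with a short Python sketch (an illustrative assumption; the iteration counts are arbitrary): without synchronization, concurrent read-modify-write updates may overwrite each other, so the final value is nondeterministic. Guarding the update with a lock, as in the earlier synchronization sketch, restores a deterministic result.

```python
import threading

counter = 0                            # contended shared state

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1                   # read-modify-write: not atomic

threads = [threading.Thread(target=unsafe_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Without a lock the printed value is often less than 400000,
# because concurrent updates can be lost.
print(counter)
```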
Memory Hierarchies

Objectives
• Introduction to the Memory Hierarchy.
• Characteristics of the Memory Hierarchy.
Memory Hierarchies

"Memory Hierarchy, in Computer System Design, is an enhancement that helps in organizing the memory so that it can minimize the access time. The development of the Memory Hierarchy was based on a behavior of programs known as locality of reference."
Memory Hierarchies Cont…
We are concerned with five types of memory:
• Registers: are the fastest type of memory, which are located internal to a processor. These
elements are primarily used for temporary storage of operands, small partitions of memory,
etc., and are assumed to be one word (32 bits) in length in the MIPS architecture.
• Cache: is a very fast type of memory that can be external or internal to a given processor.
Cache is used for temporary storage of blocks of data obtained from main memory (read
operation) or created by the processor and eventually written to main memory (write
operation).
• Main Memory: is modelled as a large, linear array of storage elements that is partitioned
into static and dynamic storage. Main memory is used primarily for storage of data that a
program produces during its execution, as well as for instruction storage.
• Disk Storage: is much slower than main memory, but also has much higher capacity than
the preceding three types of memory.
• Archival Storage: is offline storage such as a CD-ROM jukebox or (in former years) rooms
filled with racks containing magnetic tapes. This type of storage has a very long access
time, in comparison with disk storage, and is also designed to be much less volatile than
disk data.
Memory Hierarchies Cont…
Characteristics of Memory Hierarchy

Characteristics of a Memory Hierarchy can be inferred from the previous figure:
• Capacity: It refers to the total volume of data that the system's memory can store. The capacity increases moving from the top to the bottom of the Memory Hierarchy.
• Access Time: It refers to the time interval between a request for a read/write and the availability of the data. The access time increases as we move from the top to the bottom of the Memory Hierarchy.
Characteristics of Memory Hierarchy Cont…

• Performance: Earlier, when computer systems were designed without a Memory Hierarchy, the speed gap between the CPU registers and Main Memory grew due to the large difference in their access times. This ultimately resulted in lower system performance, so an enhancement was required. That enhancement came in the form of the Memory Hierarchy Design, which increased the system's performance. One of the primary ways to increase the performance of a system is to minimize how far down the memory hierarchy one has to go to manipulate data.
• Cost per bit: The cost per bit increases as one moves from the bottom to the top of the Memory Hierarchy, i.e., External Memory is cheaper than Internal Memory.
Limitations of Memory System Performance

Objectives
• Understanding the Limitations of Memory System Performance.
• Memory Latency Example.
Limitations of Memory System Performance

The memory system, and not processor speed, is often the bottleneck for many applications. Memory system performance is largely captured by two parameters: latency and bandwidth.
• Latency: the time from the issue of a memory request to the time the data is available at the processor.
• Bandwidth: the rate at which data can be pumped to the processor by the memory system.
Limitations of Memory System Performance Cont…

It is very important to understand the difference between latency and bandwidth. Consider the example of a fire hose:
• If the water comes out of the hose two seconds after the hydrant is turned on, the latency of the system is two seconds.
• Once the water starts flowing, if the hydrant delivers water at the rate of 5 gallons/second, the bandwidth of the system is 5 gallons/second.
• If you want an immediate response from the hydrant, it is important to reduce latency.
• If you want to fight big fires, you want high bandwidth.
Memory Latency Example

Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns (no caches). Assume that the processor has two multiply-add units and is capable of executing four instructions in each cycle of 1 ns. The following observations follow:
• The peak processor rating is 4 GFLOPS.
• Since the memory latency is equal to 100 cycles and the block size is one word, every time a memory request is made, the processor must wait 100 cycles before it can process the data.
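• Assuming, as in the standard form of this example, a dot-product workload (the same computation the later memory-bandwidth example repeats): each floating-point operation requires one data fetch, so the peak speed of the computation is limited to one FLOP every 100 ns, i.e. about 10 MFLOPS, a tiny fraction of the 4 GFLOPS peak rating.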
Improving Effective Memory Latency Using Caches

Objectives
• Effect of Cache.
• Effect of Cache with Example.
Improving Effective Memory Latency Using Caches

• Caches are small and fast memory elements between the processor and DRAM.
• This memory acts as low-latency, high-bandwidth storage.
• If a piece of data is repeatedly used, the effective latency of the memory system can be reduced by the cache.
• The fraction of data references satisfied by the cache is called the cache hit ratio of the computation on the system.
• The cache hit ratio achieved by a code on a memory system often determines its performance.
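A standard supplementary formula (textbook material, not on the slides) makes this precise: with hit ratio h, cache latency t_cache, and DRAM latency t_DRAM, the effective memory latency is approximately

t_eff = h · t_cache + (1 − h) · t_DRAM

so a hit ratio close to 1 pulls the effective latency down toward the cache latency.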
Effect of Cache

• Repeated references to the same data item correspond to temporal locality.
• In the matrix-multiplication example that follows, there are O(n²) data accesses and O(n³) computation; this asymptotic difference makes the example particularly well suited to caches.
• Caches can also reduce network congestion and improve overall performance.
Effect of Cache Example

Continuing the previous memory latency example, we introduce a cache of size 32 KB with a latency of 1 ns, or one cycle. We use this setup to multiply two matrices A and B of dimensions 32 × 32 (8 KB, or 1K words, per matrix). We have carefully chosen these numbers so that the cache is large enough to store matrices A and B, as well as the result matrix C.
• 1 GHz processor, 4 GFLOPS theoretical peak, 100 ns memory latency.
• Assume 1 ns cache latency (full-speed cache).
Effect of Cache Example Cont…

The following observations can be made about the problem:
• Fetching the two matrices into the cache corresponds to fetching 2K words, which takes approximately 200 µs.
• Multiplying two n × n matrices takes 2n³ operations. For our problem, this corresponds to 64K operations, which can be performed in 16K cycles (or 16 µs) at four instructions per cycle.
• The total time for the computation is therefore approximately the sum of the time for load/store operations and the time for the computation itself, i.e., 200 + 16 µs.
• This corresponds to a peak computation rate of 64K operations / 216 µs, or roughly 303 MFLOPS.
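The slide's rounded arithmetic can be checked with a few lines of Python (a verification sketch, not part of the original example):

```python
flops = 2 * 32 ** 3                  # 2n^3 operations for n = 32 -> 65536 (64K)
fetch_us = 200                       # ~2K words at 100 ns/word, rounded as on the slide
compute_us = flops / 4 / 1000        # four instructions per 1 ns cycle -> ~16 us
print(round(flops / (fetch_us + compute_us)))   # 303 (MFLOPS), matching the slide
```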
Effect of Memory Bandwidth

Objectives
• Effect of Memory Bandwidth.
• Effect of Memory Bandwidth Example.
Effect of Memory Bandwidth

• Memory bandwidth is determined by the bandwidth of the memory bus as well as the memory units.
• Memory bandwidth can be improved by increasing the size of memory blocks.
• The performance of the CPU or GPU can also impact memory bandwidth.
Effect of Memory Bandwidth Example

Consider the same setup as in the previous topic, except that in this case the block size is 4 words instead of 1 word. We repeat the dot-product computation in this scenario:
• Assuming that the vectors are laid out linearly in memory, eight FLOPs (four multiply-adds) can be performed in 200 cycles.
• This is because a single memory access fetches four consecutive words in the vector.
• Therefore, two accesses can fetch four elements of each of the vectors. This corresponds to one FLOP every 25 ns, for a peak speed of 40 MFLOPS.
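Spelling out the arithmetic with the 1 ns clock from the earlier example: two 100-cycle accesses take 200 ns and deliver four elements of each vector, enough for four multiply-adds (8 FLOPs); 200 ns ÷ 8 FLOPs = 25 ns per FLOP, and one FLOP every 25 ns corresponds to 40 MFLOPS.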
Effect of Memory Bandwidth Example Cont…

• It is important to note that increasing the block size does not change the latency of the system.
• Physically, the scenario illustrated here can be viewed as a wide data bus (4 words, or 128 bits) connected to multiple memory banks.
• In practice, such wide buses are expensive to construct.
• In a more practical system, consecutive words are sent on the memory bus on subsequent bus cycles after the first word is retrieved.
Effect of Memory Bandwidth Example Cont…

• The above examples clearly illustrate how increased bandwidth results in higher peak computation rates.
• The data layouts were assumed to be such that consecutive data words in memory were used by successive instructions (spatial locality of reference).
• If we take a data-layout-centric view, computations must be reordered to enhance spatial locality of reference.
