The document discusses different types of parallel computer memory architectures and parallel programming models. It describes shared memory architectures including uniform memory access (UMA), non-uniform memory access (NUMA), and the advantages and disadvantages of shared memory. It also describes distributed memory and hybrid distributed-shared memory architectures. Finally, it summarizes common parallel programming models including shared memory, threads, message passing, data parallel, and others.
Introduction to Parallel Computing
Parallel Computer Memory Architectures
Shared Memory
• All processors access all memory as a global address space
• Multiple processors can operate independently but share the same memory resources
• Changes in a memory location effected by one processor are visible to all other processors
• Shared memory machines are classified as UMA and NUMA, based upon memory access times

Uniform Memory Access (UMA)
• Most commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors, each with equal access and equal access times to memory
• Sometimes called CC-UMA (Cache Coherent UMA): cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level

Non-Uniform Memory Access (NUMA)
• Often made by physically linking two or more SMPs
• One SMP can directly access the memory of another SMP
• Not all processors have equal access time to all memories; memory access across the link is slower (a short sketch of the practical consequence follows this section)
• If cache coherency is maintained, it may also be called CC-NUMA (Cache Coherent NUMA)
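One practical consequence of NUMA for programmers is data placement. The sketch below is a hedged illustration, not from the original slides: it assumes a Linux-style "first touch" policy, under which each memory page is allocated on the NUMA node of the processor that first writes it, so initializing an array in parallel (here with OpenMP) tends to keep each thread's pages in its local memory. The array size and loop bodies are made-up examples.

```c
/* Hypothetical first-touch illustration; assumes an OpenMP compiler
   (e.g. cc -fopenmp demo.c) and a first-touch NUMA placement policy. */
#include <stdlib.h>

int main(void) {
    long n = 1L << 26;                 /* illustrative array size */
    double *a = malloc(n * sizeof *a);

    /* Parallel initialization: each thread first-touches its own chunk,
       so those pages land on that thread's local NUMA node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        a[i] = 0.0;

    /* A later loop with the same static schedule reuses local pages,
       avoiding the slower cross-link memory accesses described above. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        a[i] = 2.0 * a[i] + 1.0;

    free(a);
    return 0;
}
```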
Shared Memory: Advantages and Disadvantages
• Advantages
  • The global address space provides a user-friendly programming perspective to memory
  • Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
• Disadvantages
  • The primary disadvantage is the lack of scalability between memory and CPUs: adding more CPUs geometrically increases traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increases traffic associated with cache/memory management
  • The programmer is responsible for the synchronization constructs that ensure "correct" access of global memory

Distributed Memory
• Processors have their own local memory; changes to a processor's local memory have no effect on the memory of other processors
• When a processor needs data that resides on another processor, it is usually the programmer's task to explicitly define how and when the data is communicated
• Synchronization between tasks is likewise the programmer's responsibility
• The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet

Distributed Memory: Advantages and Disadvantages
• Advantages
  • Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately
  • Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain global cache coherency
  • Cost effectiveness: commodity, off-the-shelf processors and networking can be used
• Disadvantages
  • The programmer is responsible for many of the details associated with data communication between processors
  • Non-uniform memory access times: data residing on a remote node takes longer to access than node-local data

Hybrid Distributed-Shared Memory
• The largest and fastest computers in the world today employ both shared and distributed memory architectures
• The shared memory component can be a shared memory machine and/or graphics processing units (GPUs)
• The distributed memory component is the networking of multiple shared memory or GPU machines
• Advantages and disadvantages: whatever is common to both shared and distributed memory architectures applies here as well. Increased scalability is an important advantage; increased programmer complexity is an important disadvantage

Parallel Programming Models

Parallel Programming Model
• A programming model provides an abstract view of the computing system
• It is an abstraction above hardware and memory architectures
• The value of a programming model is usually judged on its generality:
  • how well a range of different problems can be expressed, and
  • how well they execute on a range of different architectures
• The implementation of a programming model can take several forms, such as:
  • libraries invoked from traditional sequential languages,
  • language extensions, or
  • completely new execution models
• Parallel programming models in common use:
  • Shared Memory (without threads)
  • Threads
  • Distributed Memory / Message Passing
  • Data Parallel
  • Hybrid
  • Single Program Multiple Data (SPMD)
  • Multiple Program Multiple Data (MPMD)
• These models are NOT specific to a particular type of machine or memory architecture; any of these models can be implemented on any underlying hardware. Two examples:
  • A SHARED memory model on a DISTRIBUTED memory machine: machine memory was physically distributed across networked machines, but appeared to the user as a single shared memory (global address space). This approach is referred to as virtual shared memory
  • A DISTRIBUTED memory model on a SHARED memory machine: the SGI Origin 2000 employed the CC-NUMA type of shared memory architecture, where every task has direct access to a global address space spread across all machines. However, the ability to send and receive messages using MPI, as is commonly done over a network of distributed memory machines, was implemented and commonly used

Shared Memory Model - Without Threads
• Tasks share a common address space, which provides an efficient means of passing data between programs
• Various mechanisms such as locks and semaphores may be used to control access to the shared memory (see the sketch after this section)
• From the programmer's point of view:
  • The notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks; program development can often be simplified
  • A disadvantage in terms of performance is that it becomes more difficult to understand and manage data locality: keeping data local to the processor that works on it conserves the memory accesses, cache refreshes and bus traffic that occur when multiple processors use the same data
• Implementations
  • Native compilers and/or hardware translate user program variables into actual memory addresses, which are global
  • On stand-alone shared memory machines, this is straightforward
  • On distributed shared memory machines, memory is physically distributed across a network of machines, but made global through specialized hardware and software
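As a minimal sketch of this model on a POSIX system (assumed: Linux; the counts, structure names, and lack of error handling are illustrative): two processes share a single mmap'd region as their common address space, and a process-shared semaphore is the lock that controls access to it.

```c
/* Two processes, one shared address region, one semaphore as the lock.
   Illustrative sketch; compile on Linux with: cc demo.c -pthread */
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared { sem_t lock; long counter; };

int main(void) {
    /* One region visible to parent and child: a small global address space. */
    struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&s->lock, 1 /* shared between processes */, 1);
    s->counter = 0;

    pid_t pid = fork();                  /* two tasks, same memory */
    for (int i = 0; i < 100000; i++) {
        sem_wait(&s->lock);              /* lock: one writer at a time */
        s->counter++;                    /* update is visible to both tasks */
        sem_post(&s->lock);
    }
    if (pid != 0) {                      /* parent waits for child, then reports */
        wait(NULL);
        printf("counter = %ld\n", s->counter);   /* prints 200000 */
    }
    return 0;
}
```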
Threads Model
• A type of shared memory programming model in which a single "heavy weight" process can have multiple "light weight", concurrent execution paths
• The main program a.out is scheduled by the native OS; a.out loads and acquires all of the necessary system and user resources to run. This is the "heavy weight" process
• a.out performs some serial work, and then creates a number of tasks (threads) that can be scheduled and run by the operating system concurrently
• Each thread has local data, but also shares the entire resources of a.out. This saves the overhead associated with replicating a program's resources for each thread ("light weight"). Each thread also benefits from a global memory view because it shares the memory space of a.out
• Threads communicate with each other through global memory (updating address locations). This requires synchronization constructs to ensure that no two threads update the same global address at the same time
• Threads can come and go, but a.out remains present to provide the necessary shared resources until the application has completed
• Implementations (a minimal Pthreads sketch follows this list):
  • POSIX Threads
    • Library based; requires parallel coding
    • C language only
    • Commonly referred to as Pthreads
    • Most hardware vendors now offer Pthreads in addition to their proprietary threads implementations
    • Very explicit parallelism; requires significant programmer attention to detail
  • OpenMP
    • Compiler directive based; can use serial code
    • Portable / multi-platform, including Unix and Windows platforms
    • Available in C/C++ and Fortran implementations
    • Can be very easy and simple to use
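The following is a minimal Pthreads sketch of the model described above (the thread count, the shared variable, and the work done are illustrative): main() plays the role of the "heavy weight" a.out, the workers are the "light weight" threads, and a mutex provides the synchronization needed for updates to global memory.

```c
/* Minimal Pthreads sketch; compile with: cc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

long sum = 0;                                    /* global memory shared by all threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    long id = (long)arg;                         /* thread-local data */
    pthread_mutex_lock(&lock);                   /* no two threads update sum at once */
    sum += id;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    /* a.out does its serial work here, then creates the threads ... */
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    /* ... and remains present until they finish */
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %ld\n", sum);                  /* 0+1+2+3 = 6 */
    return 0;
}
```

With OpenMP, the same idea reduces to directives around the update, for example #pragma omp parallel together with #pragma omp critical, with thread creation and joining handled by the compiler.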
Distributed Memory / Message Passing Model
• A set of tasks that use their own local memory during computation; multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines
• Tasks exchange data through communications, by sending and receiving messages
• Data transfer usually requires cooperative operations to be performed by each process: for example, a send operation must have a matching receive operation (see the sketch after this section)
• Implementations
  • From a programming perspective, message passing implementations usually comprise a library of subroutines; calls to these subroutines are embedded in source code
  • MPI specifications are available on the web at http://www.mpi-forum.org/docs/
  • MPI implementations exist for virtually all popular parallel computing platforms
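A minimal MPI sketch of a matching send/receive pair follows; it assumes two tasks (for example, mpirun -np 2 ./a.out), and the value and message tag are illustrative.

```c
/* Cooperative data transfer: the send in task 0 must be matched
   by the receive in task 1. Compile with mpicc, run with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                          /* data in task 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);         /* matching receive */
        printf("task 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```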
Data Parallel Model
• The address space is treated globally
• A set of tasks works collectively on the same data structure; however, each task works on a different partition of that data structure (see the sketch after this section)
• On shared memory architectures, all tasks may have access to the data structure through global memory
• On distributed memory architectures, the data structure is split up and resides as "chunks" in the local memory of each task
• Implementations
  • Unified Parallel C (UPC): an extension to the C programming language for SPMD parallel programming. Compiler dependent. More information: http://upc.lbl.gov/
  • Global Arrays: provides a shared memory style programming environment in the context of distributed array data structures. Public domain library with C and Fortran77 bindings. More information: http://www.emsl.pnl.gov/docs/global/
  • X10: a PGAS-based parallel programming language being developed by IBM at the Thomas J. Watson Research Center. More information: http://x10-lang.org/
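Below is a sketch of the distributed-memory variant of this model, written in plain C with MPI rather than UPC or Global Arrays (which would express the partitioning more directly). The logical array size, the update applied, and the assumption that the size divides evenly among tasks are all illustrative.

```c
/* One logical array of N elements; each task stores and updates only
   its own "chunk" in local memory. Sketch only; assumes N % ntasks == 0. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    const long N = 1L << 20;                 /* global logical array size */
    int rank, ntasks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    long chunk = N / ntasks;                 /* this task's partition size */
    double *local = malloc(chunk * sizeof *local);

    /* Every task applies the SAME operation to a DIFFERENT partition;
       the global index of local[i] is rank * chunk + i. */
    for (long i = 0; i < chunk; i++)
        local[i] = 2.0 * (rank * chunk + i);

    printf("task %d owns global elements [%ld, %ld)\n",
           rank, rank * chunk, (rank + 1) * chunk);

    free(local);
    MPI_Finalize();
    return 0;
}
```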
Hybrid Model
• A hybrid model combines more than one of the previously described programming models (a combined hybrid/SPMD sketch appears at the end of this section)

Single Program Multiple Data (SPMD)
• A high level programming model that can be built upon any combination of the previously mentioned parallel programming models
• SINGLE PROGRAM: all tasks execute their copy of the same program simultaneously. This program can be threads, message passing, data parallel or hybrid
• MULTIPLE DATA: all tasks may use different data
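To close, here is a hedged sketch that combines the hybrid and SPMD ideas: every task runs this same program, MPI distributes the tasks across (possibly distributed-memory) nodes, and OpenMP threads share memory within each task. The printed messages and the division of roles are illustrative.

```c
/* SPMD + hybrid sketch: same program everywhere, MPI between tasks,
   OpenMP threads inside each task. Compile: mpicc -fopenmp demo.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, provided;
    /* Request thread support since this task will run OpenMP threads */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* SINGLE PROGRAM: identical code on every task.
       MULTIPLE DATA: rank and thread number select each worker's share. */
    if (rank == 0)
        printf("rank 0 might do extra coordination work here\n");

    #pragma omp parallel
    {
        /* Threads within one task share that task's memory (shared
           memory model); tasks communicate only via MPI messages. */
        printf("rank %d, thread %d working on its own slice\n",
               rank, omp_get_thread_num());
    }

    MPI_Finalize();
    return 0;
}
```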