
Parallel Programming

Scientific Computing
Fall, 2019
Paul Gribble

1 What is parallel computing?
2 Multi-threading
3 Symmetric Multiprocessing (SMP)
4 Hyperthreading
5 Clusters
6 Grids
7 GPU Computing
8 Types of Parallel problems
9 MATLAB
10 Shell scripts

1 What is parallel computing?


Simply put, parallel computing refers to performing multiple computations in parallel, i.e. simultaneously.
By default most operations that take place on your computer happen in serial, that is, one at a time. These
days CPU chips (even those on laptops) have multiple cores, which allow for some degree of parallel
operations.

In principle, every time you double the number of CPU cores (or CPUs), you can achieve
something close to a halving of the time needed to complete the operations. In practice, however, there is
always some overhead cost in carrying out the parallel computations. If the operations are at all lengthy,
the overhead cost is usually small relative to the time saved.
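
As a rough illustration of the previous paragraph's point, here is a minimal MATLAB sketch (with made-up timing numbers, not measurements) that models an ideal split of the work across cores plus a fixed overhead cost:

T_serial = 60;                      % time (s) to do all the work on one core (made up)
overhead = 2;                       % fixed cost (s) of splitting and collecting the work (made up)
ncores   = [2 4 8 16];
T_par    = T_serial ./ ncores + overhead;   % ideal parallel time plus overhead
speedup  = T_serial ./ T_par                % actual speedup is less than ncores

With these numbers, 16 cores give roughly a 10x speedup rather than 16x; the longer the computation is relative to the overhead, the closer you get to the ideal.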

There are several types of parallel computing, which we’ll talk briefly about (and which are listed below).
What we’ll get hands-on experience with is symmetric multiprocessing. This is the style of parallel
computing where multiple CPUs, or multiple cores on a single CPU, share access to the same memory
(RAM) store, and can carry out operations in parallel.

Today (Fall, 2015), it’s still the case that not many programs take advantage of multiple cores. Operating
systems, however, can take advantage of multiple cores through multithreading (see below), by assigning
different threads to different processing cores.

Some programs like MATLAB (and some Apple applications) come with the ability to take advantage of
multiple cores built-in. Due to the relative complexity of parallelizing serial code, however, most
applications still operate in a serial fashion.

Parallel computing (Wikipedia)


2 Multi-threading
Modern operating systems like Mac OS X, Linux, and other Unix variants, provide the ability for programs
to spawn multiple threads that execute independently of each other. The advantage of multithreading is that
one thread can do its work while the others carry on with theirs, instead of everything waiting for a single
task to finish. This is used extensively for graphical user interfaces.

When you copy a file, you can still move your mouse around, you can still start other programs, you can still
browse the web, while other things are happening simultaneously. Multithreading can occur on a single
CPU with a single core. This isn’t parallel computing per se, as multiple threads still have to share a single
CPU processing unit to do their work—but the operating system manages the multiple threads so that the
user has the impression that multiple things are happening at once.

See the wikipedia article for more details:

Multithreading (wikipedia)

3 Symmetric Multiprocessing (SMP)


These days modern computers ship with CPUs that have multiple cores, or even multiple CPUs each with
multiple cores. At the time of writing these notes (Fall 2015) you can for example buy a Mac Pro desktop
computer with two 6-core CPUs, for a total of 12 independent processing cores. With hyperthreading (see
below) you get 24 logical processing cores, all for around $5,000, which seems like a lot, but just 10 years ago a
computer cluster with 24 nodes would have cost around $75,000–$100,000.

When multiple CPUs and/or multiple CPU cores live in a single machine, they typically all share access to
the same physical RAM (memory). These days all Apple desktops and laptops have CPUs with multiple
cores. Generic PCs also ship with multiple CPUs and cores. Even smartphones (e.g. the iPhone, and
Google’s Nexus phone) come with multiple CPUs and multiple cores.

The great advantage of having multiple processing cores in a single machine is that, unlike multithreading
on a single CPU core, where the operating system has to switch back and forth between threads, each core
can execute a different task truly in parallel with the others (i.e. at the same time).

A good analogy is the following. Imagine someone gives me 10 decks of playing cards, and each deck has
been shuffled, and my task is to re-order each deck of cards. A computer with a single CPU/core is like a
single person who is tasked with sorting all 10 decks of cards. I would have to sort them all, one at a time,
one after the other, i.e. in serial. I could implement “multithreading” by sorting one deck for a few seconds,
setting it aside, sorting the next deck for a few seconds, setting it aside, and so on, sorting bits of each one,
one by one. It’s still happening in serial though.

If I had access to other processing nodes, I could parallelize the task. So imagine instead of me sorting all 10
decks, I found 9 other people to help me. I gave each of them one of the 10 decks of cards, and I took one. Now we
can all sort them, at the same time, in parallel. In theory it should take 1/10th the time compared to me sorting
them all myself, in serial. In fact though, there would be some overhead cost, for example at the beginning,
when I would have to hand out each deck and give everyone their instructions, and then at the end, when I
would have to collect all the sorted decks from each person. If the actual computational task being
parallelized is time intensive, however, then these overhead costs would be minimal compared to the gain in
speed I would achieve by parallelizing the task.
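
To make the analogy concrete, here is a minimal MATLAB sketch (assuming the Parallel Computing Toolbox is installed) in which each worker sorts one shuffled deck independently:

ndecks = 10;
decks  = cell(1, ndecks);
for i = 1:ndecks
    decks{i} = randperm(52);          % a shuffled 52-card "deck"
end
sorted = cell(1, ndecks);
parfor i = 1:ndecks                   % parfor starts a pool of workers if none is running
    sorted{i} = sort(decks{i});       % each deck is sorted independently, in parallel
end

The one-time cost of starting the worker pool, and of sending each worker its deck and collecting the sorted result, is exactly the overhead described above.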

Symmetric multiprocessing, i.e. having multiple independent processing units share the same memory store,
is advantageous compared to cluster or grid computing, where each processing node has its own memory.
In the latter cases, there is a (sometimes relatively major) overhead cost involved in transferring data to the
memory store for each processing unit, and back again to a head node. When this transfer happens over a
network, as you can imagine, it is much slower than when it happens on a common logic board on
which all the processing cores sit (as is the case with symmetric multiprocessing).

Here is a wikipedia article on symmetric multiprocessing:

Symmetric multiprocessing (wikipedia)

4 Hyperthreading
Hyperthreading is Intel's proprietary implementation of simultaneous multithreading, which allows modern
CPUs to present twice as many logical cores as physical cores. That is, if your CPU has two physical cores,
hyperthreading implements a series of optimizations in the CPU hardware itself that, together with scheduling
support at the operating system level, result in the ability to address four "logical" cores.

Unlike multithreading, which is simply a software implementation at the operating system level,
hyperthreading involves special implementations both at the operating system level and at the hardware
level. Current Apple laptops and desktops all implement hyperthreading, as do many generic PCs.

For large, time-consuming computations, hyperthreading won't actually double the computation speed,
since at the end of the day there are still x physical cores, even though hyperthreading presents 2x
logical cores. If, however, each computation is small and doesn't last a long time, hyperthreading can end
up giving you performance gains above and beyond regular multithreading, since it implements a number
of efficiencies at the software and hardware layers.
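
If you want to see how many cores MATLAB detects on your own machine, the following sketch reports them; note that feature('numcores') is an undocumented but widely used call, so treat this as a convenience rather than an official API:

nPhysical = feature('numcores')    % number of physical cores MATLAB detects
nThreads  = maxNumCompThreads      % MATLAB's computational thread limit (by default, the physical core count)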

For our purposes, hyperthreading is either there, or it isn’t, and it’s not something we will be fiddling with.
Here is a wikipedia article on hyperthreading:

Hyper-threading (wikipedia)

5 Clusters
So far we have been talking about a single machine with multiple CPUs and/or multiple cores. Another
way of implementing parallel computing is to connect together multiple machines, over a specialized local
network. In principle one can connect as many machines as one likes, to achieve just about any level of
parallelism one wants. Today’s fastest supercomputers are in fact clusters of machines hooked together. The
world’s fastest supercomputer, as of today, October 2015, is the Tianhe-2, located in Guangzhou, China. It
has 16,000 computer nodes, each one comprising two Intel Ivy Bridge Xeon CPUs and three Xeon Phi chips
for a total of 3,120,000 cores (3.12 million cores).

Sharcnet is a Canadian cluster computing facility with several individual clusters, the largest of which has
8,320 cores. Western has access to the Sharcnet clusters; you just have to sign up for an account.

Many individual researchers also operate smaller clusters, for example with 8, or 12, or 24 machines hooked
together.

A relatively recent development is the advent of gigantic server farms operated by private companies like
Amazon and Google. Amazon’s Elastic Compute Cloud allows individuals to spawn multiple “virtual”
machines, and hook them together in networks and clusters, and run jobs on them. Cost is per machine and
per unit time, so you can essentially (1) define your own cluster and (2) pay only for the minutes that
you actually use. It's a very flexible system that many researchers are beginning to adopt. Rhodri Cusack's
lab, for example, uses cloud-based machines for brain imaging data analysis.

The obvious advantage of a cluster over a single SMP machine is that you can keep adding nodes to the
cluster (growing it as you go) to whatever size you want, provided you can pay for it. The disadvantage is
that data transfer over a network can be much slower than on an SMP machine, where CPU cores share the same
RAM store. There is also added complexity in managing a cluster of machines, for example in configuring each
one, and in configuring a head node to manage all of the slave nodes. There is software out there that can
organize this for you, for example Oracle Grid Engine and others, but it's still not trivial and takes some
investment of time to fully implement.

Computer cluster (wikipedia)

6 Grids
A grid is like a cluster, except that the individual machines are not on a local network; they can be anywhere on
the internet. Sometimes multiple clusters are hooked together over the internet to form a grid. Sometimes a
grid is composed of multiple individual machines, spread out over multiple labs, departments,
universities, or even countries. Sometimes grids are set up so that individual machines can be "taken over"
as dedicated computational nodes. In other configurations, individual machines only process grid jobs
during their downtime, when for example the user is not using the machine for something else. One way of
setting this up is via a specialized screensaver. Whenever the screensaver activates (which is an indication
that the machine is not being used), the grid process starts up and processes grid jobs.

Two classic examples of grids are the SETI@home grid (analyzing radio telescope data for signs of extraterrestrial intelligence) and
the Folding@home grid (simulations of protein folding for disease research). In each case, anyone around
the world can sign up their machine to join the grid, install some local software, and donate computer time;
then, anytime their computer is not busy, it is recruited by the grid to process data. As of now (Oct 2015)
the Folding@home website shows 8,067,858 CPUs active on the Folding@home grid.

There are also nefarious uses for grids, which are sometimes called Botnets. In this case, a virus infects a
user’s machine, installs a nefarious program, which lies dormant until a central machine somewhere on the
internet activates it, for some nasty purpose (like a DDoS attack, or for sending spam). Your machine
essentially becomes a sleeper cell.

Grid computing (wikipedia)

7 GPU Computing
In recent years computer engineers and software developers have teamed up to deliver software
libraries that allow developers to use graphics cards for general-purpose computing (GPGPU
Computing).

Graphics cards, unlike CPUs, have hundreds if not thousands of cores, each of which is typically used to
process graphics for things like 3D games, video animation, and scientific visualization. Each processing unit
on a graphics card is a much simpler beast than the cores on CPU chips, but for some computational tasks
one doesn't need much complexity, and massive parallelism can be achieved by farming out general-purpose
computational tasks to the thousands of cores on a graphics card.

For example, today (Oct 2015) for around $5,000 one can purchase an NVidia Tesla GPU, a single
graphics card that has 12 GB of GPU memory, 2,880 cores, and a processing power of 1.43 Tflops. As you
can imagine, if your computational task is well suited to GPU processing, running it on 2,880 cores will be
quite a bit faster than running on the 4, 8, or 12 cores that you get with a modern dual 6-core-CPU Mac Pro.

There are two major C/C++ software libraries that provide relatively high-level interfaces for performing
general-purpose computation on graphics cards:

• CUDA (Nvidia proprietary)


• OpenCL (open)

MATLAB’s Parallel Computing Toolbox has the ability to farm out some computations to NVidia
CUDA-enabled GPUs, see this page for more info:

MATLAB GPU Computing Support for NVIDIA CUDA-Enabled GPUs
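
As a minimal sketch of what this looks like (assuming a CUDA-capable GPU and the Parallel Computing Toolbox), many built-in MATLAB functions run on the GPU automatically when given gpuArray inputs:

if gpuDeviceCount > 0
    A = gpuArray(rand(4000));     % copy a 4000x4000 random matrix to GPU memory
    B = gpuArray(rand(4000));
    C = A * B;                    % the matrix multiply runs on the GPU
    C = gather(C);                % copy the result back to main (CPU) memory
else
    disp('No supported GPU detected.')
end

Note that copying data to and from the GPU is itself an overhead cost, so small problems may run faster on the CPU.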

See this wikipedia page for more general information on GPGPU Computing:

GPGPU Computing

8 Types of Parallel problems


Multithreading is an example of fine-grained parallelism, in which the operating system manages (e.g.
switches between) threads at a very fast rate, many times per second. This is what your operating system
does in the background as you interact with your graphical user interface, surf the web, play music, and
process video, all the while copying files from one disk to another.

In another kind of fine-grained parallelism, multiple processes communicate with each other many, many
times per second.

In coarse-grained parallelism, there are many independent threads/tasks that rarely communicate with
each other.

Finally, so-called embarrassingly parallel problems consist of 100% independent operations that never
communicate with each other: no operation depends in any way on the result of another. This is the kind
of parallelism that we will be talking about in this class.

9 MATLAB
MATLAB provides parallel computing via its Parallel Computing Toolbox; see the links below and the short example after them.

• MATLAB Parallel Computing Toolbox


• MATLAB Execute loop iterations in parallel using parfor
• MATLAB Getting Started with parfor
• MATLAB Parallel Computing Toolbox Examples
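
Here is a minimal parfor sketch (assuming the Parallel Computing Toolbox is installed); each loop iteration is independent, i.e. embarrassingly parallel, so MATLAB can hand iterations out to different workers:

n = 100;
results = zeros(1, n);
tic
parfor i = 1:n
    results(i) = max(abs(eig(rand(300))));   % some independent, CPU-heavy work
end
toc

Compare against the same loop written with an ordinary for: on a machine with several cores the parfor version should be noticeably faster, although the very first parfor call pays the one-time cost of starting the worker pool.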

10 Shell scripts
Finally, one can parallelize tasks at the level of the shell, even if the programs you write/run aren't
themselves parallelized, using a tool like GNU Parallel (see below). Briefly, with GNU Parallel you can split a list of
(embarrassingly parallel) tasks across multiple cores even though each program run is serial in nature. See the
GNU Parallel page below and the tutorial page for some examples. In our lab we use GNU Parallel to
distribute subject-level brain imaging processing across multiple cores.

• GNU Parallel
• GNU Parallel tutorial
