CSE3057Y Parallel and Distributed Systems
Lecture 1
Introduction to Parallel Computing
Introduction
● What is parallel computation?
● Computations that use multiprocessor computers and/or several independent computers
interconnected in some way, working together on a common task
● Examples: Cray T3E, IBM SP, SGI Origin 3000, clusters of workstations
● Why use parallel computation?
● Computing power (speed/memory)
● Cost/Performance
● Scalability
● Tackle intractable problems
● Larger volumes of data (“Big Data”)
● Performance limits of parallel programs
● Available Parallelism – Amdahl’s Law
● Load Balance (some processors work while others wait)
● Extra work (management of parallelism, redundant computation)
● Communication
Is it really harder to “think” in parallel?
● Some would argue it is more natural to think in parallel...
● ... and many examples exist in daily life
● House construction: parallel tasks, with wiring and plumbing performed at once (independence), but framing must precede wiring (dependence)
– Similarly, developing large software systems
● Assembly-line manufacture: pipelining, with many instances in process at once
● Call center: independent calls handled simultaneously (data parallelism)
● “Multitasking” – all sorts of variations
Serial v/s Parallel Computation (1)
● Traditionally, software has been written for serial
computation:
● To be run on a single computer having a single CPU
● A problem is broken into a discrete series of instructions
● Instructions are executed one after another
● Only one instruction may execute at any moment in time.
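As a minimal, hypothetical sketch of this (the example and names below are not from the slides), a serial program works through its data as a single sequence of instructions on one CPU:

    # Serial computation: one series of instructions, executed one after another.
    def sum_of_squares(numbers):
        total = 0
        for n in numbers:          # only one instruction executes at any moment
            total += n * n
        return total

    if __name__ == "__main__":
        print(sum_of_squares(range(1_000_000)))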
Serial v/s Parallel Computation (2)
● In the simplest sense, parallel computing is the simultaneous
use of multiple compute resources to solve a computational
problem:
● To be run using multiple CPUs
● A problem is broken into discrete parts that can be solved
concurrently
● Each part is further broken down to a series of instructions
● Instructions from each part execute simultaneously on different CPUs
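Continuing the hypothetical sketch above, the same problem can be broken into discrete parts that are solved concurrently. The example below (illustrative only; the chunking scheme and worker count are assumptions) assigns each part to a separate process via Python's multiprocessing module:

    # Parallel computation: the problem is broken into parts that run on different CPUs.
    from multiprocessing import Pool

    def partial_sum_of_squares(chunk):
        # Each part is itself a series of instructions, executed by one worker.
        return sum(n * n for n in chunk)

    if __name__ == "__main__":
        data = range(1_000_000)
        parts = 4                                    # assumed: one part per CPU
        size = len(data) // parts
        chunks = [data[i * size:(i + 1) * size] for i in range(parts)]
        with Pool(processes=parts) as pool:
            partials = pool.map(partial_sum_of_squares, chunks)  # parts run simultaneously
        print(sum(partials))                         # combine the partial results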
Serial v/s Parallel Computation (3)
● The compute resources might be:
● A single computer with multiple processors
● An arbitrary number of computers connected by a network
● A combination of both.
● The computational problem should be able to:
● Be broken apart into discrete pieces of work that can be
solved simultaneously
● Execute multiple program instructions at any moment in time
● Be solved in less time with multiple compute resources than
with a single compute resource.
Parallel Computing – Definition (1)
● In simple terms:
● A parallel computer is a collection of processing
elements that cooperate to solve problems quickly
● Important considerations:
● Performance
● Efficiency
Parallel Computing – Definition (2)
● Parallel computing is a form of computation in
which many calculations are carried out
simultaneously, operating on the principle that large
problems can often be divided into smaller ones,
which are then solved concurrently ("in parallel")
● Parallelism has been employed for many years,
mainly in high-performance computing
(Source: Wikipedia)
Parallelism (1)
● Parallel computers can be roughly classified according
to the level at which the hardware supports parallelism
● Multicore and multiprocessor computers having multiple
processing elements within a single machine
● Clusters, MPPs (Massively Parallel Processors) and grids
use multiple computers to work on the same task
● Specialized parallel computer architectures are
sometimes used alongside traditional processors, for
accelerating specific tasks
Parallelism (2)
● Parallel computer programs are more difficult to
write than sequential ones, because concurrency
introduces several new classes of potential software
bugs
● Communication and synchronization between the
different subtasks are typically some of the greatest
obstacles to getting good parallel program
performance
Concurrency v/s Parallelism (1)
● Parallelism:
● A system in which the executions of several programs overlap in time by running them on separate processors
● Concurrency:
● Potential Parallelism
● Executions may, but need not, overlap
● Useful abstraction because concurrent programs
can better be understood by assuming that all
processes are executed in parallel
Concurrency v/s Parallelism (2)
● Concurrency is an abstraction
● Set of sequential programs that can be
executed in parallel
● Not related to performance
● Parallelism is an implementation concept
● Use of multiple processors to improve performance
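A hypothetical sketch of the distinction (not from the slides): with Python's standard library, threads give concurrency, since their executions may, but need not, overlap, while separate processes can be scheduled on different processors and so give parallelism.

    import threading
    import multiprocessing

    def work(label):
        total = 0
        for i in range(1_000_000):
            total += i
        print(label, "finished")

    if __name__ == "__main__":
        # Concurrency: two sequential programs whose executions may interleave.
        # (In the standard CPython interpreter they share one lock and do not run in parallel.)
        threads = [threading.Thread(target=work, args=(f"thread-{i}",)) for i in range(2)]
        for t in threads: t.start()
        for t in threads: t.join()

        # Parallelism: two processes the OS can run on separate processors at the same time.
        procs = [multiprocessing.Process(target=work, args=(f"process-{i}",)) for i in range(2)]
        for p in procs: p.start()
        for p in procs: p.join()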
Concurrency v/s Parallelism (3)
[Diagram: a 2×2 grid combining the two dimensions: concurrent v/s not concurrent, and parallel v/s not parallel.]
Concurrency as Abstract Parallelism
● I/O operations proceeding “in parallel” with
computation
● On a single CPU, the processing required for a character typed on a keyboard cannot really be done in parallel with another computation
● But processing the character only requires the use of
the CPU for a fraction of a microsecond
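As an illustrative sketch of this abstraction (an assumed example, not from the slides), a background thread can wait for keyboard input while the main program keeps computing on the same single CPU; handling the typed line needs the processor only briefly:

    import threading

    def handle_input():
        line = input("Type a line and press Enter: ")   # blocks without using the CPU
        print("Handled input:", line)

    if __name__ == "__main__":
        io_thread = threading.Thread(target=handle_input, daemon=True)
        io_thread.start()

        total = 0
        for i in range(10_000_000):     # the 'other computation' continues meanwhile
            total += i
        print("Computation finished:", total)
        io_thread.join(timeout=5)       # give the user a moment to type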
Parallelism on a Personal Computer
● Your PC contains more than one processor
● The graphics processor: taking information from the
computer's memory and rendering it on the computer
screen
● Specialized processors for I/O and communications
interface
● Therefore, on a PC:
– Multitasking performed by OS kernel
– Parallel Processing performed by specialized processors
Why Use Parallel Computing? (1)
● Save time and/or money
● More resources to solve a task will shorten its time to completion with potential
cost savings
● Solve larger problems
● Many problems are so large and/or complex that it is impractical or impossible
to solve them on a single computer, especially given limited computer memory.
● e.g. Web search engines/databases processing millions of transactions per second
● Provide Concurrency
● A single compute resource can only do one thing at a time. Multiple computing
resources can be doing many things simultaneously
● e.g. the Access Grid (www.accessgrid.org) provides a global collaboration
network where people from around the world can meet and conduct work
"virtually"
Why Use Parallel Computing? (2)
● Use of non-local resources
● Using compute resources on a wide area network, or
even the Internet when local compute resources are
scarce
● Limits to serial computing
● Both physical and practical reasons pose significant
constraints to simply building ever faster serial
computers
Parallel Programming Complexity
● Enough parallelism? (Amdahl's law)
● Granularity
● Locality
● Load balance
● Coordination and synchronisation
All of the above make parallel programming harder
than sequential programming
Speedup
● One major motivation for using parallel processing: to achieve a speedup
● For a given problem,
● Speedup (using p processors) = execution time using 1
processor / execution time using p processors
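Written as a formula (the same definition, with T_1 and T_p denoting the execution times on 1 and p processors respectively):

    \[
      S(p) = \frac{T_1}{T_p}
    \]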
Amdahl's Law
● Optimally, the speedup from parallelization would be linear—
doubling the number of processing elements should halve the runtime,
and doubling it a second time should again halve the runtime.
However, very few parallel algorithms achieve optimal speedup.
● The potential speedup of an algorithm on a parallel computing
platform is given by Amdahl's law. It states that a small portion of the
program which cannot be parallelized will limit the overall speedup
available from parallelization. A program solving a large mathematical
or engineering problem will typically consist of several parallelizable
parts and several non-parallelizable (sequential) parts.
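In its usual quantitative form (a standard formulation; the symbol f below is introduced here, not in the slides): if a fraction f of the execution time is inherently serial and the remaining 1 - f parallelizes perfectly over p processors, then

    \[
      S(p) = \frac{1}{f + \dfrac{1 - f}{p}}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{f}
    \]

For example, if 10% of the program is serial (f = 0.1), the speedup can never exceed 10, however many processors are used.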
Limits to Maximum Speedup
● The following factors limit the maximum speedup
that can be achieved
● Non-parallelizable parts of the program (as stated by
Amdahl's law)
● Communication (minimize communication cost to
improve speedup)
● Load imbalance (improve work distribution to improve
speedup)
Challenges of Parallel Computing
● Synchronization and communication between the different subtasks
are typically some of the greatest obstacles to getting good parallel
program performance
● Parallel processing is not a problem if processes are independent
● But, for example, if an I/O process accepts a character from the
keyboard, it must communicate it to the process running the word
processor and if there are multiple windows on display, it should
be sent to the one with the current focus
Multiple Computers (1)
● Essential when the computational task requires more processing
than is possible on one computer
● “Server farms” containing tens or hundreds of computers used by
internet companies to provide service to millions of customers
● Multiprocessors designed to bring the computing power of
several processors to work on a single computationally intensive
problem
● Scientific and Engineering simulation
● Weather forecasting and studying climate
Multiple Computers (2)
● Clusters
● A collection of ordinary PCs (microcomputers) that requires no special additional hardware. The PCs are connected to each other so that they operate as one unit; each PC in the cluster is called a node.
● Most clusters run Linux, as it is powerful, free, and can be modified without restriction
Parallel Computer Architectures
● MPP – Massively Parallel Processors
● The top of the TOP500 list consists mostly of MPPs, but clusters are “rising”
● Clusters
● “simple” cluster (1 processor in each node)
● Clusters of small SMPs (small # processors / node)
● Constellations (large # processors / node)
● Older Architectures
● SIMD – Single Instruction Multiple Data (CM2)
● Vector Processors (Old Cray machines)
Architecture Details
● MPPs are built with specialized networks by vendors with
the intent of being used as a parallel computer. Clusters are
built from independent computers integrated through an
aftermarket network.
● Buzzword: “COTS” (commodity off-the-shelf) components rather than custom architectures
● Clusters are a market reaction to MPPs, with the thought of being cheaper
● Originally considered to have slower communications but are
catching up.
Applications of Parallel Computing
● Desktop machines, engineering workstations, and compute servers with two, four, or even eight processors are becoming common platforms for design applications
● Large scale applications in science and engineering rely on larger
configurations of parallel computers, often comprising hundreds of
processors
● Data intensive platforms such as database or web servers and
applications such as transaction processing and data mining often use
clusters of workstations that provide high aggregate disk bandwidth
● Applications in graphics and visualization use multiple processing
elements to compute and render realistic environments
● Applications requiring high availability rely on parallel and
distributed platforms for redundancy
Parallel v/s Distributed Computing (1)
● Parallel computing:
  ● splits a single application up into tasks that are executed at the same time; more like a top-down approach
  ● is about decomposition:
    – how we can perform a single application concurrently
    – how we can divide a computation into smaller parts which may potentially be executed in parallel
  ● considers how to reach a maximum degree of concurrency
  ● Scientific computing
● Distributed computing:
  ● considers a single application which is executed as a whole but at different locations; more like a bottom-up approach
  ● is about composition:
    – what happens if many distributed processes interact with each other
    – whether a global function can be achieved although there is no global time or state
  ● considers reliability and availability
  ● Information/resource sharing
Parallel v/s Distributed Computing (2)
● The differences are now blurred, especially after the
introduction of grid computing and cloud computing
● The two related fields have many things in common
● Multiple processors
● Networks connecting the processors
● Multiple computing activities and processes
● Input/output data distributed among processors
Grid Computing
● Grid computing is the combination of computer resources from
multiple administrative domains applied to a common task,
usually to a scientific, technical or business problem that
requires a great number of computer processing cycles or the
need to process large amounts of data
● It is a form of distributed computing whereby a “super and
virtual computer” is composed of a cluster of networked
loosely coupled computers acting in concert to perform very
large tasks
● This technology has been applied to computationally intensive
scientific, mathematical, and academic problems, and used in
commercial enterprise data intensive applications
Cloud Computing
● A style of computing where massively scalable IT-related capabilities are provided “as a service” using
Internet technologies to multiple external customers
● Cloud computing describes a new supplement,
consumption and delivery model for IT services based
on the Internet, and it typically involves the provision
of dynamically scalable and often virtualized
resources (storage, platform, infrastructure, and
software) as a service over the Internet
Goal of Parallel Computing
● Solve bigger problems faster
● Often bigger is more important than faster
● p-fold speedups are not as important!
Challenge of Parallel Computing
● Coordinate, control, and monitor the computation
Module Contents (1)
● Designing and writing parallel programs that scale
● Parallel thinking:
– Decomposing work into pieces that can be performed in parallel
– Assigning work to processors
– Orchestrating communication/synchronisation between
processors so that it does not limit speedup
● Abstractions/mechanisms for performing the above tasks
– Writing code in parallel programming language
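As a rough preview of those three steps (a sketch with assumed names, not the notation or languages used later in the module), the snippet below decomposes a sum into chunks, assigns each chunk to a worker thread, and orchestrates the combination of partial results with a lock; in CPython the threads interleave rather than run truly in parallel, so this only illustrates the structure.

    import threading

    N_WORKERS = 4                                   # assumed worker count
    data = list(range(1_000_000))
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        global total
        partial = sum(x * x for x in chunk)         # decomposition: this worker's piece
        with lock:                                  # orchestration: synchronised update
            total += partial

    size = len(data) // N_WORKERS
    threads = [threading.Thread(target=worker,      # assignment: one chunk per worker
                                args=(data[i * size:(i + 1) * size],))
               for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(total)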
Module Contents (2)
● Parallel computer hardware implementation: how
parallel computers work
● Mechanisms used to implement abstractions efficiently
● Design tradeoffs: performance v/s convenience v/s cost
● Hardware implementation important because:
● Characteristics of the machine matter
● Efficiency and performance are the main objectives