Week1-Parallel-and-Distributed-Computing
Distributed Computing
Instructor
Muhammad Danish Khan
Lecturer, Department of Computer Science
FAST NUCES Karachi
m.danish@nu.edu.pk
Week 14: Distributed System Models and Enabling Technologies, Assignment Task(s)
Week 15: Distributed System Models and Enabling Technologies, Quiz-3, Project Evaluations
Week 16: Distributed System Models and Enabling Technologies, Project Evaluations
LMS: Google Classroom
Section 5C Class Code: eg2u47t
https://classroom.google.com/c/Mzg4NTA1MTMzNDI3?cjc=eg2u47t
Operating System Concepts
Program
◦ Set of instructions and associated data
◦ resides on the disk and is loaded by the operating system to perform some task.
◦ E.g., an executable file or a Python script file.
Process
◦ A program in execution.
◦ In order to run a program, the operating system's kernel is first asked to create a new
process, which is an environment in which a program executes.
◦ consists of instructions, user-data, and system-data segments, together with resources such as CPU time, memory, address space, and disk acquired at runtime.
Thread
◦ the smallest unit of execution in a process.
◦ A thread simply executes instructions serially.
◦ A process can have multiple threads running as part of it.
◦ Processes don't share resources amongst themselves, whereas the threads of a process share the resources allocated to that particular process, including its memory address space (see the sketch below).
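To make this concrete, here is a minimal C++ sketch (not from the slides; the names and counts are illustrative): two std::thread objects run inside a single process and both update a counter that lives in the process's shared address space.

#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;           // shared by every thread of this process
std::mutex counter_mutex;  // serializes access to the shared counter

void work() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
    }
}

int main() {
    std::thread t1(work);  // two threads executing within the same process
    std::thread t2(work);
    t1.join();
    t2.join();
    std::cout << "counter = " << counter << '\n';  // prints 200000
    return 0;
}

With GCC or Clang this typically builds with the -pthread flag; two separate processes could not share counter this way without an explicit shared-memory mechanism.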
Parallel Execution
Parallelism
The term parallelism means that an application splits its task into smaller subtasks that can be processed in parallel, for instance on multiple CPUs at the exact same time, as sketched below.
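A minimal sketch of this idea (the workload and sizes are assumed for illustration): summing a large array is split into two subtasks that std::async may run on different CPU cores, and the partial sums are then combined.

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> data(1'000'000, 1);
    auto mid = data.begin() + data.size() / 2;

    // each subtask can run on its own core at the same time
    auto left  = std::async(std::launch::async,
                            [&] { return std::accumulate(data.begin(), mid, 0LL); });
    auto right = std::async(std::launch::async,
                            [&] { return std::accumulate(mid, data.end(), 0LL); });

    long long total = left.get() + right.get();  // combine the partial results
    std::cout << "sum = " << total << '\n';      // prints 1000000
    return 0;
}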
Serial Execution vs. Parallel Execution
Transmission speeds - the speed of a serial computer is directly dependent upon how
fast data can move through hardware.
◦ Absolute limits are the speed of light (30 cm/nanosecond) and the transmission limit of copper
wire (9 cm/nanosecond). Increasing speeds necessitate increasing proximity of processing
elements.
◦ A problem can often be solved in less time with multiple compute resources than with a single compute resource.
LD  $12, (100)   ; load the word at memory address 100 into register $12
ADD $11, $12     ; $11 = $11 + $12 (needs the value loaded above)
SUB $10, $11     ; $10 = $10 - $11 (needs the result of the ADD)
INC $10          ; $10 = $10 + 1
SW  $13, ($10)   ; store $13 at the address now held in $10
Each instruction depends on the result of the one before it, so they must execute one after another on a single processor.
#include <iostream>

int sample2();          // forward declaration: sample1() calls sample2()

int sample1()
{
    int x = sample2();  // depends on the result of sample2()
    return x;
}

float sample3()
{
    float pi = 3.14f;   // independent of sample1() and sample2()
    return pi;
}

int sample2()
{
    int i;
    std::cin >> i;      // read an integer from standard input
    return i;
}
Parallel Computing: what for?
Example applications include:
… ..
Flynn Taxonomy
Flynn's Taxonomy is a classification system for computer architectures introduced by Michael J. Flynn in 1966. It categorizes computer systems based on the number of instruction streams and data streams they can handle simultaneously. This taxonomy is particularly useful for understanding parallel processing and the design of processors.
Flynn Taxonomy
Based on the number of concurrent instruction streams (single or multiple) and data streams (single or multiple) available in the architecture.
Single Instruction, Single Data (SISD)
It represents the organization of a single computer containing a control
unit, processor unit and a memory unit.
This is the oldest and, until recently, the most prevalent form of computer.
◦ Examples: most PCs, single CPU workstations and mainframes
Single Instruction, Multiple Data (SIMD)
Single instruction: All processing units execute the same instruction at any given clock cycle
Multiple data: Each processing unit can operate on a different data element
The processing units are made to operate under the control of a common control unit, thus
providing a single instruction stream and multiple data streams.
◦ Best suited for specialized problems characterized by a high degree of regularity, such as image processing.
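As a minimal illustration of SIMD (assuming an x86 CPU with SSE support; the data values are made up): the single _mm_add_ps instruction below performs four float additions at once, one instruction stream operating on multiple data elements.

#include <cstdio>
#include <xmmintrin.h>

int main() {
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float out[4];

    __m128 a = _mm_loadu_ps(x);    // load four floats
    __m128 b = _mm_loadu_ps(y);
    __m128 c = _mm_add_ps(a, b);   // one instruction, four additions
    _mm_storeu_ps(out, c);

    std::printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    return 0;
}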
Parallel Task
◦ A task that can be executed by multiple processors safely (yields correct results)
Serial Execution
◦ Execution of a program sequentially, one statement at a time. In the simplest sense, this is
what happens on a one processor machine. However, virtually all parallel tasks will have
sections of a parallel program that must be executed serially.
Parallel Execution
◦ Execution of a program by more than one task, with each task being able to execute the same or
different statement at the same moment in time.
Shared Memory
◦ From a strictly hardware point of view, describes a computer architecture where all processors have
direct (usually bus based) access to common physical memory.
◦ In a programming sense, it describes a model where parallel tasks all have the same "picture" of memory and can directly address and access the same
logical memory locations regardless of where the physical memory actually exists.
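A minimal sketch of the shared-memory programming model using OpenMP (one possible choice, not prescribed by these slides): all threads address the same array in the same logical address space, and the loop iterations are divided among them.

#include <cstdio>

int main() {
    const int n = 1000;
    double a[n];                 // a single copy, visible to every thread

    #pragma omp parallel for     // iterations are shared out among the threads
    for (int i = 0; i < n; ++i) {
        a[i] = 2.0 * i;
    }

    std::printf("a[%d] = %.1f\n", n - 1, a[n - 1]);
    return 0;
}

Built with -fopenmp on GCC/Clang; without OpenMP the pragma is ignored and the loop simply runs serially.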
Distributed Memory
◦ In hardware, refers to network based memory access for physical memory that is not common. As a
programming model, tasks can only logically "see" local machine memory and must use
communications to access memory on other machines where other tasks are executing.
Communications
◦ Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as
through a shared memory bus or over a network, however the actual event of data exchange is
commonly referred to as communications regardless of the method employed.
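A minimal sketch of distributed-memory communication using MPI (one common message-passing library; the value and ranks are illustrative): each process sees only its own local memory, so rank 0 must explicitly send the value for rank 1 to obtain it.

#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;                                // exists only in rank 0's memory
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value = 0;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d\n", value);    // obtained only via communication
    }

    MPI_Finalize();
    return 0;
}

Typically launched with a command such as mpirun -np 2 ./a.out.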
Synchronization
◦ The coordination of parallel tasks in real time, very often associated with communications. Often
implemented by establishing a synchronization point within an application where a task may not
proceed further until another task(s) reaches the same or logically equivalent point.
◦ Synchronization usually involves waiting by at least one task, and can therefore cause a parallel
application's wall clock execution time to increase.
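A minimal sketch of a synchronization point using an OpenMP barrier (one possible implementation, not the only one): no thread starts phase 2 until every thread has finished phase 1, so faster threads wait, which is the source of the extra wall-clock time mentioned above.

#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        std::printf("thread %d finished phase 1\n", id);

        #pragma omp barrier      // synchronization point: all threads must arrive here

        std::printf("thread %d starting phase 2\n", id);
    }
    return 0;
}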
Granularity
◦ In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.
◦ Coarse: relatively large amounts of computational work are done between communication events
◦ Fine: relatively small amounts of computational work are done between communication events
Observed Speedup
◦ Observed speedup of a code which has been parallelized, defined as: wall-clock time of serial execution / wall-clock time of parallel execution.
◦ One of the simplest and most widely used indicators for a parallel program's performance.
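For example (hypothetical timings, assumed for illustration): if the serial version of a program takes 80 seconds of wall-clock time and the parallelized version takes 20 seconds, the observed speedup is 80 / 20 = 4.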
Parallel Overhead
◦ The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel
overhead can include factors such as:
◦ Task start-up time
◦ Synchronizations
◦ Data communications
◦ Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
◦ Task termination time
Massively Parallel
◦ Refers to the hardware that comprises a given parallel system - having many processors. The meaning of "many" keeps increasing, but currently BG/L* pushes this number to 6 digits.
*Blue Gene is an IBM project aimed at designing supercomputers that can reach operating
speeds in the petaFLOPS (PFLOPS) range, with low power consumption.
Scalability
◦ Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the addition of
more processors.
Multiple processors can operate independently but share the same memory resources.
Changes in a memory location effected by one processor are visible to all other processors.
Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.
Shared Memory: UMA vs. NUMA
Uniform Memory Access (UMA):
◦ Most commonly represented today by Symmetric Multiprocessor (SMP) machines
◦ Identical processors with equal access and access times to memory
◦ Sometimes called CC-UMA - Cache Coherent UMA.
Distributed Memory: Disadvantages
◦ The programmer is responsible for many of the details associated with data communication between
processors.
◦ It may be difficult to map existing data structures, based on global memory, to this memory
organization.
◦ Non-uniform memory access (NUMA) times
Hybrid Distributed-Shared Memory
Comparison of Shared and Distributed Memory Architectures