CS621 Week 6

CS621 Parallel and Distributed Computing

Dr. Muhammad Anwaar Saeed
Dr. Said Nabi
Ms. Hina Ishaq

Concurrency Control
Objectives
• What is Concurrency?
• Mechanisms for Concurrency Control.
Definition of Concurrency

"Concurrency is the task of running two or more computations over the same time interval. Two events are said to be concurrent if they occur within the same time interval."
What is Concurrency?

Concurrent doesn't necessarily mean at the same exact instant. For example, two tasks may occur concurrently within the same second, but with each task executing within different fractions of that second.

Concurrent tasks can execute in a single-processing or a multiprocessing environment:
• In a single-processing environment, concurrent tasks exist at the same time and execute within the same time period by context switching.
• In a multiprocessor environment, if enough processors are free, concurrent tasks may execute at the same instant over the same time period.

The determining factor for what makes an acceptable time period for concurrency is relative to the application.
Concurrency Control

• There must be an implicit or explicit control over concurrency. It is both hazardous and unsafe when multiple flows of execution operate simultaneously in the same address space without any kind of agreement on ordered access. Two or more activities might access the same data and thus induce data corruption as well as an inconsistent or invalid application state.
• Multiple activities that work jointly on a problem need an agreement on their common progress. Both issues represent fundamental challenges of concurrency and concurrent programming.
Concurrency Control Cont…

Synchronization and Coordination are two basic approaches to tackle this challenge:
• Synchronization is a mechanism that controls access to shared resources between multiple activities. It enforces exclusive and ordered access to the resource by different activities.
• Coordination aims at the orchestration of collaborating activities.
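To make this concrete, here is a minimal Python sketch of both mechanisms (the counter, the thread count, and all names are illustrative assumptions, not from the slides): a lock enforces exclusive access to shared state, and an event coordinates the common starting point of the activities.

```python
import threading

counter = 0
counter_lock = threading.Lock()   # synchronization: exclusive, ordered access
start = threading.Event()         # coordination: agree on common progress

def worker():
    global counter
    start.wait()                  # block until the coordinator gives the signal
    for _ in range(100_000):
        with counter_lock:        # only one activity updates the counter at a time
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
start.set()                       # let all workers begin together
for t in threads:
    t.join()
print(counter)                    # always 400000 with the lock in place
```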
Benefits of Concurrency

• The overall time to perform a series of tasks is reduced.
• Concurrent processes can reduce duplication in code.
• The overall runtime of an algorithm can be significantly reduced.
• Concurrency control can also increase the scalability of parallel and distributed computing systems.
• Redundancy can make systems more reliable.
• More real-world problems can be solved than with sequential algorithms alone.
Concurrency Control: Basic Approaches to Achieving Concurrency
Objectives
• Understanding of Concurrency Control.
• Parallel Programming Technique.
• Distributed Programming Technique.
Achieving Concurrency

Parallel programming and distributed programming are two basic approaches for achieving concurrency:
• Parallel programming techniques assign the work a program has to do to two or more processors within a single physical or a single virtual computer.
• Distributed programming techniques assign the work a program has to do to two or more processes, where the processes may or may not exist on the same computer.
Achieving Concurrency: Parallel Programming Technique

• The parallel application consists of one program divided into four tasks. Each task executes on a separate processor; therefore, the tasks may execute simultaneously. Each task can be implemented by either a process or a thread.

[Figure: Typical architecture for a parallel program.]
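A minimal sketch of this architecture in Python, assuming a sum-of-squares workload of our own choosing (the slide does not prescribe one): one program, four tasks, one worker process per task.

```python
from multiprocessing import Pool

def task(chunk):
    # Each task handles one part of the program's work.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]    # divide the work into four tasks
    with Pool(processes=4) as pool:            # one worker process per task
        partials = pool.map(task, chunks)      # tasks may execute simultaneously
    print(sum(partials))                       # combine the partial results
```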
Achieving Concurrency: Distributed Programming Technique

• The distributed application consists of three separate programs, with each program executing on a separate computer. Program 3 consists of two separate parts that execute on the same computer. Although tasks A and D of Program 3 are on the same computer, they are distributed because they are implemented by two separate processes.

[Figure: Typical architecture for a parallel and distributed program.]
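The same idea can be sketched with two processes that interact only over the network (a hedged illustration: the host, port, and messages are our assumptions, and both parts run on one machine here for convenience, whereas a real distributed program would place them on separate computers):

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 50007    # hypothetical address and port

listening = threading.Event()

def server():
    # Program 1: receives a message from another process and acknowledges it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        listening.set()            # signal that the server is ready
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"ack: " + data)

def client():
    # Program 2: sends its result to Program 1 over the network.
    listening.wait()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"result of task A")
        print(cli.recv(1024).decode())

t = threading.Thread(target=server)
t.start()
client()
t.join()
```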
Concurrency Control: Models for Programming Concurrency
Objectives
• Models of Programming Concurrency.
• Van Roy's Approaches for Programming Concurrency.
Concurrency Control: Models for Programming Concurrency

Van Roy introduces four main approaches for programming concurrency:
• Sequential Programming.
• Declarative Concurrency.
• Message-passing Concurrency.
• Shared-state Concurrency.
Sequential Programming

In this deterministic programming model, no concurrency is used at all. In its strongest form, there is a total order of all operations of the program. Weaker forms still keep the deterministic behavior; however, they either make no guarantees on the exact execution order to the programmer a priori, or they provide mechanisms for explicit preemption of the currently active task, as co-routines do, for instance.
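The co-routine style of explicit preemption can be sketched with Python generators (a minimal illustration; the task names and step counts are ours). Each task runs until it voluntarily yields, and a trivial round-robin scheduler fixes a deterministic execution order.

```python
def task(name, steps):
    # A co-routine: performs one step, then explicitly yields control.
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                          # explicit preemption point

# A trivial round-robin scheduler: the interleaving is fixed and deterministic.
ready = [task("A", 2), task("B", 2)]
while ready:
    current = ready.pop(0)
    try:
        next(current)                  # run the task up to its next yield
        ready.append(current)          # re-queue it behind the others
    except StopIteration:
        pass                           # the task has finished
```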
Declarative Concurrency

Declarative programming is a programming model that favors implicit control flow of computations. Control flow is not described directly; rather, it is a result of the computational logic of the program. The declarative concurrency model extends the declarative programming model by allowing multiple flows of execution. It adds implicit concurrency that is based on a data-driven or a demand-driven approach. While this introduces some form of nondeterminism at runtime, the nondeterminism is generally not observable from the outside.
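Van Roy's declarative concurrency is usually illustrated with dataflow variables (for example in Oz); as a rough Python approximation (an assumption on our part, since futures are not true dataflow variables), futures give a data-driven flavor: the final result is deterministic even though the two submitted computations may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Control flow follows the data dependencies rather than explicit sequencing:
# c depends on a and b, so it is computed only once both are available.
with ThreadPoolExecutor() as pool:
    a = pool.submit(lambda: 2 + 3)     # may run concurrently with b
    b = pool.submit(lambda: 4 * 5)     # may run concurrently with a
    c = a.result() + b.result()        # demand-driven: waits for a and b
print(c)                               # always 25: the nondeterminism is hidden
```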
Message-passing Concurrency

This model is a programming style that allows concurrent activities to communicate via messages. Generally, this is the only allowed form of interaction between activities, which are otherwise completely isolated. Message passing can be either synchronous or asynchronous, resulting in different mechanisms and patterns for synchronization and coordination.
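A minimal sketch in Python (the message contents and the use of threads rather than distributed processes are our simplifications): two otherwise isolated activities interact only through a message queue, with an asynchronous send and a blocking receive.

```python
import queue
import threading

mailbox = queue.Queue()                # the only channel between the activities

def producer():
    for i in range(3):
        mailbox.put(f"message {i}")    # asynchronous send: does not block
    mailbox.put(None)                  # sentinel: no more messages

def consumer():
    while True:
        msg = mailbox.get()            # blocks until a message arrives
        if msg is None:
            break
        print("received", msg)

threading.Thread(target=producer).start()
consumer()
```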
Shared-state Concurrency

Shared-state concurrency is an extended programming model where multiple activities are allowed to access contended resources and states. Sharing the exact same resources and states among different activities requires dedicated mechanisms for synchronization of access and coordination between activities. The general nondeterminism and missing invariants of this model would otherwise directly cause problems regarding consistency and state validity.
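The problem can be demonstrated with a short Python sketch (an illustrative assumption; the iteration counts are arbitrary): without synchronization, concurrent read-modify-write updates may overwrite each other, so the final value is nondeterministic. Guarding the update with a lock, as in the earlier synchronization sketch, restores a deterministic result.

```python
import threading

counter = 0                            # contended shared state

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1                   # read-modify-write: not atomic

threads = [threading.Thread(target=unsafe_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Without a lock the printed value is often less than 400000,
# because concurrent updates can be lost.
print(counter)
```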
Memory Hierarchies

Objectives
• Introduction to the Memory Hierarchy.
• Characteristics of the Memory Hierarchy.
Memory Hierarchies

"Memory Hierarchy, in Computer System Design, is an enhancement that helps in organizing the memory so that it can minimize the access time. The development of the Memory Hierarchy was based on a behavior of programs known as locality of reference."
Memory Hierarchies Cont…
We are concerned with five types of memory:
• Registers: are the fastest type of memory, which are located internal to a processor. These
elements are primarily used for temporary storage of operands, small partitions of memory,
etc., and are assumed to be one word (32 bits) in length in the MIPS architecture.
• Cache: is a very fast type of memory that can be external or internal to a given processor.
Cache is used for temporary storage of blocks of data obtained from main memory (read
operation) or created by the processor and eventually written to main memory (write
operation).
• Main Memory: is modelled as a large, linear array of storage elements that is partitioned
into static and dynamic storage. Main memory is used primarily for storage of data that a
program produces during its execution, as well as for instruction storage.
• Disk Storage: is much slower than main memory, but also has much higher capacity than
the preceding three types of memory.
• Archival Storage: is offline storage such as a CD-ROM jukebox or (in former years) rooms
filled with racks containing magnetic tapes. This type of storage has a very long access
time, in comparison with disk storage, and is also designed to be much less volatile than
disk data.
Memory Hierarchies Cont…
Characteristics of Memory Hierarchy

Characteristics of a Memory Hierarchy can be inferred from the previous figure:
• Capacity: It refers to the total volume of data that the system's memory can store. The capacity increases moving from the top to the bottom of the Memory Hierarchy.
• Access Time: It refers to the time interval between a request for a read/write and the availability of the data. The access time increases as we move from the top to the bottom of the Memory Hierarchy.
Characteristics of Memory Hierarchy Cont…

• Performance: Earlier, when computer systems were designed without a Memory Hierarchy, the speed gap between the CPU registers and Main Memory grew due to the large difference in their access times. This ultimately resulted in lower system performance, so an enhancement was required. That enhancement came in the form of the Memory Hierarchy Design, which increased the system's performance. One of the primary ways to increase the performance of a system is to minimize how far down the memory hierarchy one has to go to manipulate data.
• Cost per bit: The cost per bit increases as one moves from the bottom to the top of the Memory Hierarchy, i.e., External Memory is cheaper than Internal Memory.
Limitations of Memory System Performance

Objectives
• Understanding the Limitations of Memory System Performance.
• Memory Latency Example.
Limitations of Memory System Performance

The memory system, and not processor speed, is often the bottleneck for many applications. Memory system performance is largely captured by two parameters: latency and bandwidth.
• Latency: the time from the issue of a memory request to the time the data is available at the processor.
• Bandwidth: the rate at which data can be pumped to the processor by the memory system.
Limitations of Memory System Performance Cont…

It is very important to understand the difference between latency and bandwidth. Consider the example of a fire hose:
• If the water comes out of the hose two seconds after the hydrant is turned on, the latency of the system is two seconds.
• Once the water starts flowing, if the hydrant delivers water at the rate of 5 gallons/second, the bandwidth of the system is 5 gallons/second.
• If you want an immediate response from the hydrant, it is important to reduce latency.
• If you want to fight big fires, you want high bandwidth.
Memory Latency Example

Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns (no caches). Assume that the processor has two multiply-add units and is capable of executing four instructions in each cycle of 1 ns. The following observations follow:
• The peak processor rating is 4 GFLOPS.
• Since the memory latency is equal to 100 cycles and the block size is one word, every time a memory request is made, the processor must wait 100 cycles before it can process the data.
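• Assuming, as in the standard form of this example, a dot-product workload (the same computation the later memory-bandwidth example repeats): each floating-point operation requires one data fetch, so the peak speed of the computation is limited to one FLOP every 100 ns, i.e. about 10 MFLOPS, a tiny fraction of the 4 GFLOPS peak rating.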
Improving Effective Memory Latency Using Caches

Objectives
• Effect of Cache.
• Effect of Cache with Example.
Improving Effective Memory Latency Using Caches

• Caches are small and fast memory elements between the processor and DRAM.
• This memory acts as low-latency, high-bandwidth storage.
• If a piece of data is repeatedly used, the effective latency of the memory system can be reduced by the cache.
• The fraction of data references satisfied by the cache is called the cache hit ratio of the computation on the system.
• The cache hit ratio achieved by a code on a memory system often determines its performance.
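A standard supplementary formula (textbook material, not on the slides) makes this precise: with hit ratio h, cache latency t_cache, and DRAM latency t_DRAM, the effective memory latency is approximately

t_eff = h · t_cache + (1 − h) · t_DRAM

so a hit ratio close to 1 pulls the effective latency down toward the cache latency.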
Effect of Cache

• Repeated references to the same data item correspond to temporal locality.
• In the matrix-multiplication example that follows, there are O(n²) data accesses and O(n³) computation; this asymptotic difference makes the example particularly well suited to caches.
• Caches can also reduce network congestion and improve overall performance.
Effect of Cache Example

Continuing the previous memory latency example, we introduce a cache of size 32 KB with a latency of 1 ns, or one cycle. We use this setup to multiply two matrices A and B of dimensions 32 × 32 (8 KB, or 1K words, per matrix). We have carefully chosen these numbers so that the cache is large enough to store matrices A and B, as well as the result matrix C.
• 1 GHz processor, 4 GFLOPS theoretical peak, 100 ns memory latency.
• Assume 1 ns cache latency (full-speed cache).
Effect of Cache Example Cont…

The following observations can be made about the problem:
• Fetching the two matrices into the cache corresponds to fetching 2K words, which takes approximately 200 µs.
• Multiplying two n × n matrices takes 2n³ operations. For our problem, this corresponds to 64K operations, which can be performed in 16K cycles (or 16 µs) at four instructions per cycle.
• The total time for the computation is therefore approximately the sum of the time for load/store operations and the time for the computation itself, i.e., 200 + 16 µs.
• This corresponds to a peak computation rate of 64K operations / 216 µs, or roughly 303 MFLOPS.
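The slide's rounded arithmetic can be checked with a few lines of Python (a verification sketch, not part of the original example):

```python
flops = 2 * 32 ** 3                  # 2n^3 operations for n = 32 -> 65536 (64K)
fetch_us = 200                       # ~2K words at 100 ns/word, rounded as on the slide
compute_us = flops / 4 / 1000        # four instructions per 1 ns cycle -> ~16 us
print(round(flops / (fetch_us + compute_us)))   # 303 (MFLOPS), matching the slide
```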
Effect of Memory Bandwidth

Objectives
• Effect of Memory Bandwidth.
• Effect of Memory Bandwidth Example.
Effect of Memory Bandwidth

• Memory bandwidth is determined by the bandwidth of the memory bus as well as the memory units.
• Memory bandwidth can be improved by increasing the size of memory blocks.
• The performance of the CPU or GPU can also impact memory bandwidth.
Effect of Memory Bandwidth Example

Consider the same setup as in the previous topic, except that in this case the block size is 4 words instead of 1 word. We repeat the dot-product computation in this scenario:
• Assuming that the vectors are laid out linearly in memory, eight FLOPs (four multiply-adds) can be performed in 200 cycles.
• This is because a single memory access fetches four consecutive words in the vector.
• Therefore, two accesses can fetch four elements of each of the vectors. This corresponds to one FLOP every 25 ns, for a peak speed of 40 MFLOPS.
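Spelling out the arithmetic with the 1 ns clock from the earlier example: two 100-cycle accesses take 200 ns and deliver four elements of each vector, enough for four multiply-adds (8 FLOPs); 200 ns ÷ 8 FLOPs = 25 ns per FLOP, and one FLOP every 25 ns corresponds to 40 MFLOPS.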
Effect of Memory Bandwidth Example Cont…

• It is important to note that increasing the block size does not change the latency of the system.
• Physically, the scenario illustrated here can be viewed as a wide data bus (4 words, or 128 bits) connected to multiple memory banks.
• In practice, such wide buses are expensive to construct.
• In a more practical system, consecutive words are sent on the memory bus on subsequent bus cycles after the first word is retrieved.
Effect of Memory Bandwidth Example Cont…

• The above examples clearly illustrate how increased bandwidth results in higher peak computation rates.
• The data layouts were assumed to be such that consecutive data words in memory were used by successive instructions (spatial locality of reference).
• If we take a data-layout-centric view, computations must be reordered to enhance spatial locality of reference.
