Unit 3

The document discusses shared memory architectures, specifically UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access), as well as parallel systems that improve processing speed through simultaneous task execution. It classifies multiprocessors based on memory architecture, instruction execution, and interconnection, highlighting their advantages and disadvantages. Additionally, it covers Flynn's taxonomy, message-passing versus shared memory systems, and the concepts of parallel, concurrent, and distributed programming.


Shared Memory

1. UMA (Uniform Memory Access)

This model distributes physical memory uniformly among the processors, with each processor having equal access time to all memory words.

When all the processors have equal access to all the peripheral devices, the system is called a symmetric multiprocessor (SMP).

When only one or a few processors can access the peripheral devices, the system is called an asymmetric multiprocessor (AMP).

When a CPU wants to access a memory location, it checks whether the bus is free, then sends the request to the memory interface module and waits for the requested data to become available on the bus.

2. NUMA (Non-Uniform Memory Access) model

In the NUMA multiprocessor model, the access time varies with the location of the memory word.

 Here, the shared memory is physically distributed among all the processors, as local memories.
 The collection of all local memories forms a global address space which can be accessed by all the processors.
 NUMA systems also share CPUs and the address space, but each processor has a local memory that is visible to all other processors.
 In NUMA systems, access to local memory blocks is quicker than access to remote memory blocks.
PARALLEL SYSTEMS
Parallel systems divide a program into multiple segments and process them simultaneously.

The main objective of parallel systems is to improve processing speed. They are sometimes known as multiprocessors or multicomputers. They refer to the simultaneous use of multiple computing resources, which can include a single computer with multiple processors, a number of computers connected by a network to form a parallel processing cluster, or a combination of both.

Characteristics of parallel systems: A parallel system may be broadly classified as belonging to one of the following types:
1. A multiprocessor system
2. A multicomputer parallel system

A multiprocessor is a system that contains multiple central processing units (CPUs) sharing full access to a common random-access memory (RAM). The main aim of using a multiprocessor is to enhance the speed of system execution, with other goals including application matching and fault tolerance. Multiple processors may execute various tasks simultaneously, and if one CPU fails, it does not affect the tasks of the other processors. As a result, a multiprocessor is more dependable.
Classification of Multiprocessors

Multiprocessors are classified based on memory architecture, instruction execution, interconnection, and processing organization. Below is a detailed classification:

1. Based on Memory Architecture

This classification depends on how processors access memory.

(a) Shared Memory Multiprocessors (Tightly Coupled Systems)

 All processors share a single global memory.
 Processors communicate via shared memory.
 Requires synchronization mechanisms to manage concurrent access.

Types:

 Uniform Memory Access (UMA): All processors have equal access time to memory (e.g., symmetric multiprocessing).
 Non-Uniform Memory Access (NUMA): Memory is closer to some processors, making access time variable (e.g., high-performance computing servers).

(b) Distributed Memory Multiprocessors (Loosely Coupled Systems)
 Each processor has its own private memory.
 Processors communicate via message passing.
 Used in large-scale computing systems like supercomputers
and cloud data centers.

2. Based on Instruction Execution

This classification is based on how processors handle instructions.

(a) Single Instruction, Multiple Data (SIMD)

 One instruction is executed on multiple data points simultaneously.
 Used in vector processors, GPUs, and AI accelerators.
 Example: Graphics Processing Units (GPUs) used in deep
learning.

(b) Multiple Instruction, Multiple Data (MIMD)

 Each processor executes different instructions on different data sets.
 Used in general-purpose multiprocessor systems (e.g., high-
performance computing, parallel databases).

3. Based on Interconnection Network

This classification is based on how processors and memory units are connected.

(a) Bus-Based Multiprocessors

 All processors share a common bus to access memory and communicate.
 Works well for a small number of processors but scales poorly.
(b) Crossbar and Multistage Interconnection Networks (MIN)

 Uses switching networks to connect processors and memory efficiently.
 Example: IBM POWER systems, high-end servers.

Advantages and Disadvantages of Multiprocessor

There are various advantages and disadvantages of the multiprocessor system. Some of them are as follows:

Advantages
1. As several processors share the work among themselves, the job is completed through collaboration. This makes such systems dependable.
2. Connecting numerous processors helps match the needs of an application. At the same time, a multiprocessor system saves money by eliminating the need for duplicated capabilities, and this structure also allows for future expansion.
3. It enhances the reliability of the system: a failure in any one component of a multiprocessor system has only a limited impact on the rest of the system.
4. It improves the system's cost-to-performance ratio.
5. A single-processor system carries a larger burden, because every process must execute on the one processor. In a multiprocessor system, the work is divided among several processors, so each processor is less heavily loaded.

Disadvantages

1. Several processors share memory, peripherals, and other resources, which can lead to contention for those resources.
2. If one of the CPUs fails, its work must be shared among the remaining CPUs. The negative effect is that the work is completed more slowly and the system's performance suffers.
3. Even though multiprocessor systems are cheaper in the long term than using several separate systems, they are still extremely expensive. A simple single-processor system is substantially cheaper to buy than a multiprocessor system.
4. All processors share memory in a multiprocessor system, so a substantially bigger pool of memory is needed than in single-processor systems.
5. If one processor is already utilizing an input/output device, additional processors cannot utilize the same device, which can result in deadlock.
A multicomputer parallel system
 It is a parallel system in which the multiple processors do not have direct access to shared memory. The memories of the multiple processors may or may not form a common address space. Such computers usually do not have a common clock.
 Every processor has its own memory, which is accessible only by that processor. An interconnection network allows the processors to communicate with one another.
 As a multicomputer can transmit messages between the processors, a task may be divided among the CPUs to be completed. Therefore, a multicomputer may be utilized for distributed computation. A multicomputer is easier and less expensive to develop than a multiprocessor; on the other hand, programming a multicomputer is more complex.
Flynn’s Taxonomy
Flynn's taxonomy is a classification of parallel computer architectures based on the number of concurrent instruction streams (single or multiple) and data streams (single or multiple) available in the architecture.

Flynn's taxonomy based on the number of instruction streams and data streams is the following:
1. (SISD) single instruction, single data
2. (SIMD) single instruction, multiple data
3. (MISD) multiple instruction, single data
4. (MIMD) multiple instruction, multiple data

1. SISD (Single Instruction, Single Data stream)

 Single Instruction, Single Data (SISD) refers to an instruction set architecture in which a single processor (one CPU) executes exactly one instruction stream at a time.
 It also fetches or stores one item of data at a time to operate on data stored in a single memory unit.
 Most CPU designs are based on the von Neumann architecture and follow SISD.
 The SISD model is a non-pipelined architecture with general-purpose registers, a Program Counter (PC), an Instruction Register (IR), Memory Address Registers (MAR), and Memory Data Registers (MDR).
Fig : Single Instruction, Single Data Stream (PU=Processing Unit)

SIMD (Single Instruction, Multiple Data streams)

 Single Instruction, Multiple Data (SIMD) is an instruction set architecture that has a single control unit (CU) and more than one processing unit (PU), each operating like a von Neumann machine; a single instruction stream is executed over the PUs, handled through the CU.
 The CU generates the control signals for all of the PUs, which thereby execute the same operation on different data streams. The SIMD architecture is capable of achieving data-level parallelism.
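The idea of one instruction applied across many data lanes can be sketched in plain Python. This is a conceptual illustration only, not real vector hardware, and the function name is invented for the example:

```python
# Conceptual SIMD: a single "instruction" (here, addition) is applied
# to every element of the data lanes in lockstep.
def simd_add(lane_a, lane_b):
    return [a + b for a, b in zip(lane_a, lane_b)]

# One logical add operates on four data elements at once.
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # → [11, 22, 33, 44]
```

Real SIMD hardware (e.g., vector units in GPUs) performs the per-lane additions simultaneously rather than in a loop.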

MISD (Multiple Instruction, Single Data stream)

 Multiple Instruction, Single Data (MISD) is an instruction set architecture for parallel computing where many functional units perform different operations by executing different instructions on the same data set.
 This type of architecture is common mainly in fault-tolerant computers, which execute the same instructions redundantly in order to detect and mask errors.

MIMD (Multiple Instruction, Multiple Data streams)

 Multiple Instruction stream, Multiple Data stream (MIMD) is an instruction set architecture for parallel computing that is typical of computers with multiprocessors.
 Using MIMD, each processor in a multiprocessor system can asynchronously execute a different set of instructions independently on a different set of data units.
 MIMD-based computer systems can use shared memory in a memory pool or work using distributed memory across heterogeneous networked computers in a distributed environment.
 MIMD architectures are primarily used in application areas such as computer-aided design/computer-aided manufacturing, simulation, modelling, and communication switches.
                      Single instruction        Multiple instruction
Single data           SISD                      MISD
                      (von Neumann,             (may be pipelined
                      single computers)         computers)
Multiple data         SIMD                      MIMD
                      (vector processors,       (multicomputers,
                      fine-grained data-        multiprocessors)
                      parallel computers)
MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS

Communication between the tasks in multiprocessor systems takes place through two main modes:

Message passing systems:
 This allows multiple processes to read and write data to a message queue without being connected to each other.
 Messages are stored in the queue until their recipient retrieves them. Message queues are quite useful for interprocess communication and are used by most operating systems.
Shared memory systems:
 The shared memory is the memory that can be
simultaneously accessed by multiple processes. This is
done so that the processors can communicate with each
other.
 Communication among processors takes place through
shared data variables, and control variables for
synchronization among the processors.
 Semaphores and monitors are common synchronization
mechanisms on shared memory systems.
 When the shared memory model is implemented in a distributed environment, it is termed distributed shared memory.

Fig : Inter-process communication models — a) Message Passing Model, b) Shared Memory Model
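The two models can be sketched with Python's `multiprocessing` module (the helper and function names here are invented for the illustration). The first exchange sends data through a queue (message passing, mediated by the kernel); the second has several processes update one shared counter, which requires an explicit lock:

```python
import multiprocessing as mp

def producer(q):
    # Message passing: the value is copied into a kernel-managed queue.
    q.put("hello")

def incrementer(counter, lock):
    # Shared memory: every process updates the very same memory cell,
    # so explicit synchronization (the lock) is required.
    with lock:
        counter.value += 1

def demo():
    # --- message-passing style ---
    q = mp.Queue()
    p = mp.Process(target=producer, args=(q,))
    p.start()
    msg = q.get()          # blocks until the message arrives
    p.join()

    # --- shared-memory style ---
    counter = mp.Value("i", 0)
    lock = mp.Lock()
    workers = [mp.Process(target=incrementer, args=(counter, lock))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return msg, counter.value

if __name__ == "__main__":
    print(demo())  # → ('hello', 4)
```

Note that with four incrementer processes the final counter is 4 only because of the lock; without it, concurrent read-modify-write cycles could lose updates, which is exactly the synchronization burden the table below attributes to shared memory.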

Differences between the message passing and shared memory models:

1. Shared memory is mainly used for communicating data; message passing is used for general communication between processes.
2. Shared memory offers maximum computation speed, because communication happens through the shared region and system calls are required only to establish it. Message passing takes much longer, because every exchange is performed via the kernel (system calls).
3. With shared memory, the code for reading and writing the data from the shared region must be written explicitly by the developer. With message passing, no such code is required, because the mechanism itself provides methods for communication and for synchronization of the activities of the communicating processes.
4. Shared memory is used to communicate within single-processor and multiprocessor systems in which the communicating processes reside on the same machine and share the same address space. Message passing is most commonly utilized in a distributed setting where the communicating processes are spread over multiple devices linked by a network.
5. Shared memory is a faster communication strategy than message passing; message passing is comparatively slower.
6. With shared memory, care must be taken that processes are not writing to the same address simultaneously. Message passing is useful for sharing small quantities of data without conflicts.
Parallel, distributed, and concurrent programs

In computer architecture, parallel, distributed, and concurrent programs refer to different ways of handling tasks and processes simultaneously to improve efficiency, processing speed, or resource utilization. Let's break down each concept:

1. Parallel Programming:

Parallel programming refers to the practice of running multiple tasks or operations at the same time. This is typically done by using multiple processors or cores in a single machine, enabling the system to perform more work in less time. The key point here is that the tasks are executed simultaneously on different processors or cores.

 Characteristics:
o Shared memory: In most parallel systems, multiple
processors or cores have access to shared memory.
o Fine-grained parallelism: The program is broken down into
small tasks that are executed concurrently.
o Tightly coupled systems: The processors are often on the
same physical machine and are interconnected with high-
speed communication channels.
 Example:
o Multi-core processors in modern computers allow for parallel
execution of programs. For instance, a program that
processes a large dataset can be divided into smaller
chunks that are processed in parallel on different cores,
speeding up the computation.
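The chunk-and-process idea can be sketched with Python's `multiprocessing.Pool`, which splits the input across worker processes on different cores (the function names are illustrative):

```python
from multiprocessing import Pool

def square(n):
    return n * n

def parallel_squares(data, workers=4):
    # Pool.map splits `data` into chunks and hands each chunk to a
    # worker process, so chunks are processed in parallel on
    # different cores where available.
    with Pool(workers) as pool:
        return pool.map(square, data)

if __name__ == "__main__":
    print(parallel_squares(range(8)))  # → [0, 1, 4, 9, 16, 25, 36, 49]
```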

2. Concurrent Programming:

Concurrent programming involves dealing with multiple tasks or processes that can start, run, and complete in overlapping time periods. Unlike parallelism, concurrency doesn't necessarily mean tasks are executed at the same exact moment. Instead, it implies that tasks are being managed in a way that lets them make progress without waiting for one another.
 Characteristics:
o Task management: The system can handle multiple tasks
at the same time, but it doesn’t guarantee that tasks will run
simultaneously. The tasks may be executed in a time-sliced
manner (in the case of a single processor).
o Shared or independent resources: Processes may share
resources or run independently. However, resource
management is critical to avoid conflicts, like deadlocks or
race conditions.
o Context switching: If the system has a single processor, it may switch between tasks (context switching), giving the illusion of simultaneous execution.
 Example: An operating system handling multiple programs at once on a single processor. Even though only one program can run at a time, the operating system switches between them so fast that it appears as though they are running concurrently.
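Overlapping progress without true simultaneity can be sketched with Python threads; in CPython, the global interpreter lock means these threads interleave rather than run in parallel, which matches the concurrency idea above (the names are invented for this sketch):

```python
import threading
import time

def run_interleaved():
    log = []

    def task(name):
        for i in range(2):
            log.append(f"{name}-{i}")
            time.sleep(0.01)   # yield the CPU so the other task can progress

    t1 = threading.Thread(target=task, args=("A",))
    t2 = threading.Thread(target=task, args=("B",))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return log

print(run_interleaved())   # steps of A and B overlap in time
```

The exact interleaving order is nondeterministic, which is why concurrent programs need careful resource management to avoid races.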

3. Distributed Programming:

Distributed programming involves dividing a task across multiple independent machines (often located remotely), which communicate over a network. Each machine in a distributed system typically has its own memory, and they work together to solve a common problem.

 Characteristics:
o Independent memory: Each machine has its own local
memory, and there’s no shared memory.
o Communication through messages: Machines
communicate with each other using network protocols (e.g.,
TCP/IP, RPC).
o Geographically dispersed: Machines may be in different
locations, leading to potential delays due to network latency.

 Example:
o Cloud computing platforms like AWS or Google Cloud use
distributed systems where multiple servers (distributed
across the globe) work together to provide services like data
storage, processing, or hosting applications.
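Communication by messages between independent "nodes" can be sketched with sockets. Here `socket.socketpair()` stands in for a real network connection between two machines, and the function names are illustrative:

```python
import socket
import threading

def remote_node(conn):
    # The remote node has no shared memory with the caller: it only
    # sees the bytes that arrive over the connection.
    request = conn.recv(1024)
    conn.sendall(request.upper())   # reply with a transformed message

def request_reply():
    a, b = socket.socketpair()      # stand-in for a TCP connection
    t = threading.Thread(target=remote_node, args=(b,))
    t.start()
    a.sendall(b"ping")              # send a request message
    reply = a.recv(1024)            # wait for the reply
    t.join()
    a.close(); b.close()
    return reply

print(request_reply())  # → b'PING'
```

In a real distributed system the same pattern runs over TCP/IP between machines, which introduces the latency and partitioning issues discussed later.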

Coupling
The term coupling is associated with the configuration and design of processors in a multiprocessor system. The degree of coupling among a set of modules, whether hardware or software, is measured in terms of the interdependency, binding, and/or homogeneity among the modules.

Multiprocessor systems are classified into two types based on coupling:
1. Loosely coupled systems
2. Tightly coupled systems
Tightly Coupled systems:
 Tightly coupled multiprocessor systems contain multiple
CPUs that are connected at the bus level with both local
as well as central shared memory.
 Tightly coupled systems perform better, due to faster
access to memory and intercommunication and are
physically smaller and use less power. They are
economically costlier.
 Tightly coupled multiprocessors with UMA shared memory may be either switch-based (e.g., NYU Ultracomputer, RP3) or bus-based (e.g., Sequent, Encore).
 Examples of tightly coupled multiprocessors with NUMA shared memory, or that communicate by message passing, include the SGI Origin 2000.

Loosely Coupled systems:


 Loosely coupled multiprocessors consist of distributed
memory where each processor has its own memory and
IO channels.
 The processors communicate with each other via
message passing or interconnection switching.
 Each processor may also run a different operating system
and have its own bus control logic.
 Loosely coupled systems are less costly than tightly
coupled systems, but are physically bigger and have a low
performance compared to tightly coupled systems.
 The individual nodes in a loosely coupled system can be
easily replaced and are usually inexpensive.
 The extra hardware required to provide communication
between the individual processors makes them complex
and less portable.
 These processors neither share memory nor have a
common clock.
 Loosely coupled multicomputers without shared memory
and without common clock and that are physically remote,
are termed as distributed systems.
SYNCHRONOUS VS ASYNCHRONOUS EXECUTIONS
The execution of processes in distributed systems may be synchronous or asynchronous.

Asynchronous Execution:
A communication among processes is considered asynchronous when every communicating process can have a different observation of the order of the messages being exchanged. In an asynchronous execution:
 there is no processor synchrony and there is no bound on the drift rate of processor clocks
 message delays are finite but unbounded
 there is no upper bound on the time taken by a process to execute a step

Synchronous Execution:
In synchronous execution, tasks or processes must wait for one another to complete before continuing. When one process sends a request or message, it waits for a response before proceeding. A communication among processes is considered synchronous when every process observes the same order of messages within the system. In the same manner, the execution is considered synchronous when every individual process in the system observes the same total order of all the events which happen within it. In a synchronous execution:
 processors are synchronized and the clock drift rate between any two processors is bounded
 message delivery times are such that they occur in one logical step or round
 there is a known upper bound on the time taken by a process to execute a step
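The "send a request and wait for the response" behaviour of a synchronous exchange can be sketched with blocking queues, using threads as a stand-in for two processes (the names are invented for this sketch):

```python
import queue
import threading

def synchronous_round_trip(msg):
    requests, replies = queue.Queue(), queue.Queue()

    def responder():
        # The remote process: take the request and send back an ack.
        replies.put(requests.get() + "-ack")

    t = threading.Thread(target=responder)
    t.start()
    requests.put(msg)
    reply = replies.get()    # synchronous: block until the response arrives
    t.join()
    return reply

print(synchronous_round_trip("commit?"))  # → commit?-ack
```

An asynchronous send would simply `put` the message and continue without calling `get`, accepting that there is no bound on when (or in what order) replies are observed.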
Clock synchronization

Clock synchronization is the process of coordinating the time across multiple devices or systems. It is important for applications that require precise timing, such as logging events and triggering events in real time.
Why is clock synchronization important?
 Ensures consistency across distributed systems
 Ensures that multiple nodes in a distributed system share a common notion of time
 Ensures that time-sensitive operations run at the same time
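One common approach (not detailed in the text) is Cristian-style offset estimation: ask a time server for its clock, and use half the round-trip time to approximate the one-way network delay. A minimal sketch, with an invented `server_clock` callable standing in for a network request:

```python
import time

def estimate_offset(server_clock):
    # t0 and t1 bracket the request; half the round trip approximates
    # the one-way delay (Cristian's algorithm style).
    t0 = time.monotonic()
    server_time = server_clock()          # stand-in for a network call
    t1 = time.monotonic()
    one_way = (t1 - t0) / 2
    # Offset to add to the local clock to agree with the server.
    return (server_time + one_way) - t1

# A fake server whose clock runs 5 seconds ahead of ours.
offset = estimate_offset(lambda: time.monotonic() + 5.0)
print(round(offset, 2))
```

The estimate is only as good as the symmetry assumption: if the request and reply paths have different delays, the computed offset is biased by half the difference.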
Design Issues and Challenges in Distributed Systems
Designing and implementing distributed systems comes with
several challenges that need to be carefully addressed:
 Fault Tolerance:
o Ensuring that the system can recover from hardware or network failures and continue functioning even when some components fail.
 Consistency:
o Managing data consistency across distributed nodes is complex, particularly in systems that replicate data. Techniques like eventual consistency and strong consistency need to be carefully chosen based on application requirements.
 Scalability:
o As systems grow, they must be able to scale efficiently in terms of both performance and resource usage. Load balancing, distributed storage, and horizontal scaling are key strategies.
 Security:
o Distributed systems often involve communication over untrusted networks. Securing data, ensuring authentication, and preventing unauthorized access are important design concerns.
 Latency and Network Partitioning:
o Network latency can degrade performance, and partitioning (where parts of the system become isolated from one another) can lead to consistency and availability issues.

Global State and Distributed Transactions

1. Global State: In a distributed system, the global state refers to the overall state of the entire system at a given point in time, considering all nodes, processes, and the communication between them. Since a distributed system typically involves multiple independent components (such as servers, databases, or processes) running concurrently, each component has its own local state. The global state is a way to aggregate these local states into a comprehensive view of the system.

Challenges with Global State:

 Concurrency: The state of different components or processes can change simultaneously, making it difficult to track or define the global state at any given moment.
 Coordination: As different nodes may have different views of the system, achieving a consistent and synchronized global state can be challenging.
 Latency: Gathering information about the global state from all components in real time can introduce delays and potential inconsistencies.

Example:
Consider a distributed banking system with multiple branches,
each having its own database. The global state would
represent the total balance across all branches at any given
moment. Tracking this state accurately would be important for
ensuring the correctness of any transactions performed in the
system.

2. Distributed Transactions: A distributed transaction refers to a transaction that spans multiple nodes or systems in a distributed environment. Since a distributed system typically involves many independent machines or databases, a transaction that needs to affect more than one node (e.g., a transfer of funds between accounts stored in different databases) is called a distributed transaction.
The goal of distributed transactions is to ensure that all involved components commit or roll back the transaction in a coordinated manner, even if they are geographically or logically separated. This coordination is critical to maintain atomicity, consistency, isolation, and durability (the ACID properties).

Challenges of Distributed Transactions:

 Coordination Across Nodes: Since the transaction involves multiple nodes, ensuring that all nodes reach a consensus on whether to commit or abort the transaction can be difficult, especially if the nodes are geographically distributed.
 Network Failures: In a distributed system, network failures are common, and ensuring that transactions remain consistent in the event of a network partition (or "split brain") can be complex.
 Two-Phase Commit (2PC): This is a standard protocol used in distributed transactions to ensure that all nodes agree on whether to commit or abort the transaction. It involves two main phases:
o Phase 1 (Prepare Phase): The coordinator sends a "prepare" message to all participants, asking if they can commit the transaction.
o Phase 2 (Commit/Abort Phase): If all participants respond affirmatively, the coordinator sends a "commit" message to all participants. Otherwise, it sends an "abort" message.
While 2PC guarantees consistency and atomicity, it is vulnerable to issues like blocking (if the coordinator fails, participants might be left in an uncertain state) and performance problems.
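The two phases of 2PC can be sketched in a few lines of Python. This is an in-memory toy, not a real network protocol: the class, its fields, and the coordinator function are all invented for the illustration:

```python
class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "ready" if self.can_commit else "abort"
        return self.can_commit

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = decision

def two_phase_commit(participants):
    # Phase 1 (prepare): the coordinator collects a vote from everyone.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only if *all* participants voted yes.
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision

print(two_phase_commit([Participant(True), Participant(True)]))   # → commit
print(two_phase_commit([Participant(True), Participant(False)]))  # → abort
```

Note the blocking hazard mentioned above: a real participant that has voted yes is stuck in the "ready" state until the coordinator's decision arrives, which is what 3PC tries to mitigate.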
 Three-Phase Commit (3PC): This is an extension of the 2PC protocol that addresses some of its shortcomings, particularly around fault tolerance. The 3PC protocol introduces an additional phase to handle scenarios where participants may be left in uncertain states after a failure.

Example:

Consider an online store with a distributed system that manages inventory and order processing. When a customer places an order, a distributed transaction might need to update the inventory database and the order database simultaneously:

 Update Inventory: Reduce the number of items in stock.
 Create Order: Insert the order details into the order database.

If either part of the transaction fails (e.g., the inventory update fails), the entire transaction should be rolled back, and the order should not be created, ensuring that both databases remain in a consistent state.
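The all-or-nothing behaviour in this example can be sketched with two in-memory "databases" and an explicit rollback. This is a toy illustration; the function name and data shapes are invented:

```python
def place_order(inventory, orders, item, qty):
    # Snapshot both stores so a failure can undo any partial work.
    inv_snapshot = dict(inventory)
    orders_snapshot = list(orders)
    try:
        inventory[item] -= qty            # Update Inventory
        if inventory[item] < 0:
            raise ValueError("insufficient stock")
        orders.append((item, qty))        # Create Order
        return "commit"
    except (KeyError, ValueError):
        # Roll back: restore both stores to their pre-transaction state.
        inventory.clear(); inventory.update(inv_snapshot)
        orders.clear(); orders.extend(orders_snapshot)
        return "abort"

inv, orders = {"widget": 3}, []
print(place_order(inv, orders, "widget", 2))   # → commit (stock drops to 1)
print(place_order(inv, orders, "widget", 5))   # → abort (both stores unchanged)
```

In a real distributed setting the two stores live on different nodes, so the snapshot-and-restore step is replaced by a coordination protocol such as the 2PC described above.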
