
Distributed Shared Memory

Module 5
Distributed Shared Memory
Architecture of DSM
• The DSM abstraction presents a large shared-memory space to the processors of all nodes.
• A software memory-mapping manager routine in each node maps the local memory onto the shared virtual memory. To facilitate the mapping operation, the shared-memory space is partitioned into blocks.
• Data caching is a well-known solution to address memory access latency, and the idea is used in DSM systems to reduce network latency: the main memory of individual nodes is used to cache pieces of the shared-memory space. The memory-mapping manager of each node views its local memory as a big cache of the shared-memory space for its associated processors. The basic unit of caching is a memory block.
Design & Implementation Issues
Granularity
Granularity refers to the block size of a DSM system, that is, to the unit of sharing and the unit of data transfer across the network when a network block fault occurs. Possible units are a few words, a page, or a few pages. Selecting a proper block size is an important part of the design of a DSM system, because block size is usually a measure of the granularity of parallelism exploited and of the amount of network traffic generated by network block faults.
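As an illustration of granularity, a minimal sketch (the page-sized `BLOCK_SIZE` and the function names are assumptions for this example, not part of any particular DSM implementation) of how a shared-memory address maps to a block, the unit of sharing and transfer:

```python
# Hypothetical sketch: mapping a shared address to its block.
# BLOCK_SIZE here is one page, a common choice of granularity.
BLOCK_SIZE = 4096

def block_of(address: int) -> int:
    """Block number containing the given shared address."""
    return address // BLOCK_SIZE

def offset_in_block(address: int) -> int:
    """Byte offset of the address within its block."""
    return address % BLOCK_SIZE

# A larger block amortizes per-transfer network cost, but raises the
# chance of false sharing: unrelated data landing in the same block.
print(block_of(8200), offset_in_block(8200))  # 2 8
```

A fault on any address in a block transfers the whole block, which is why block size directly determines network traffic.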
Design & Implementation Issues
Structure of shared-memory space.
Structure refers to the layout of the shared data in memory. The structure of the shared-memory space of a
DSM system is normally dependent on the type of applications that the DSM system is intended to support.

Heterogeneity.
The DSM systems built for homogeneous systems need not address the heterogeneity issue. However, if the
underlying system environment is heterogeneous, the DSM system must be designed to take care of
heterogeneity so that it functions properly with machines having different architectures.
Design & Implementation Issues
Memory coherence and access synchronization.
In a DSM system that allows replication of shared data items, copies of shared data items may simultaneously be available in the main memories of a number of nodes. The system must then keep these copies coherent and provide synchronization for concurrent accesses to the shared data.

Thrashing.
In a DSM system, data blocks migrate between nodes on demand. Therefore, if two nodes compete for
write access to a single data item, the corresponding data block may be transferred back and forth at such
a high rate that no real work can get done. A DSM system must use a policy to avoid this situation (usually
known as thrashing).
Design & Implementation Issues
Data location and access.
To share data in a DSM system, it should be possible to locate and retrieve the data accessed by a user process.
Therefore, a DSM system must implement some form of data block locating mechanism in order to service network
data block faults to meet the requirement of the memory coherence semantics being used.

Replacement strategy.
If the local memory of a node is full, a cache miss at that node implies not only a fetch of the accessed data block from a
remote node but also a replacement. That is, a data block of the local memory must be replaced by the new data block.
Therefore, a cache replacement strategy is also necessary in the design of a DSM system.
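One common replacement strategy is least-recently-used (LRU). The sketch below is an assumption for illustration (the text does not prescribe LRU or these class names): a node's local memory holds a fixed number of block frames, and on a miss the least-recently-accessed block is evicted to make room for the fetched one.

```python
# Illustrative LRU replacement for a node's local block frames.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, in recency order

    def access(self, block_id, fetch):
        """Return the block, fetching (and possibly evicting) on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark most recently used
            return self.blocks[block_id]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # evict the LRU block
        self.blocks[block_id] = fetch(block_id)  # simulate remote fetch
        return self.blocks[block_id]

cache = BlockCache(capacity=2)
cache.access(1, lambda b: f"data{b}")
cache.access(2, lambda b: f"data{b}")
cache.access(1, lambda b: f"data{b}")  # touch 1, so 2 becomes LRU
cache.access(3, lambda b: f"data{b}")  # evicts block 2
print(sorted(cache.blocks))  # [1, 3]
```

In a real DSM, eviction of a modified block also requires writing it back to its owner before the frame is reused.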
Replication
Reasons for replication:
1. Reliability:
If a file system has been replicated it may be possible to continue working after one replica crashes by
simply switching to one of the other replicas. Also, it becomes possible to provide better protection
against corrupted data. e.g. There are 3 copies of a file & every read & write operation is performed on
each copy. We can safeguard ourselves against a single, failing write operation by considering the value
that is returned by at least two copies as being the correct one.
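The three-copy safeguard above can be sketched as a majority vote: read all replicas and accept the value reported by at least two of them, so one corrupted copy is outvoted. The function name is illustrative.

```python
# Majority-vote read over replies from all replicas of a file.
from collections import Counter

def majority_read(replies):
    """Return the value reported by a majority of replicas, else None."""
    value, count = Counter(replies).most_common(1)[0]
    majority = len(replies) // 2 + 1
    return value if count >= majority else None

print(majority_read([42, 42, 99]))  # the corrupted copy is outvoted -> 42
```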
2. Performance:
When an increasing number of processes needs to access data that are managed by a single server. In that
case performance can be improved by replicating the server & subsequently dividing the work.
3. Response Time, throughput
4. Scalability
Consistency Models
A consistency model basically refers to the degree of consistency that should be maintained
for the shared memory data. There are two types of consistency models:
1. Data centric
2. Client centric consistency models
Consistency Models
1. Strict Consistency Model
• Any read on a data item x returns a value corresponding to the result of the most recent write on x
• This is the strongest form of consistency model.
2. Sequential Consistency Model
• In this model the result of any execution is the same as if the (read & write) operations by all processes on the data store were executed in some sequential order, & the operations of each individual process appear in this sequence in the order specified by its program.
3. Linearizability Consistency Model
• The operations of each individual process appear in the sequence in the order specified by its program.
• If tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence.
Linearizable = sequential + operation ordering according to global time
4. Causal Consistency Model
• In this model, all processes see only those memory reference operations that are potentially causally related in the correct order.
• One memory reference operation is said to be causally related to another if the second operation may have been influenced by the first operation.
5. FIFO Consistency Model
• It is weaker than causal consistency.
• This model ensures that all write operations performed by a single
process are seen by all other processes in the order in which they
were performed, like a single process in a pipeline.
e.g. With FIFO consistency, writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
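A minimal sketch of one way to enforce FIFO consistency (an assumption for illustration, not a mechanism from the slides): each process tags its writes with a per-sender sequence number, and a receiver buffers out-of-order writes until the gap is filled, applying each sender's writes in issue order.

```python
# Per-sender sequence numbers enforce FIFO (PRAM) ordering at a receiver.
class FifoReceiver:
    def __init__(self):
        self.next_seq = {}  # sender -> next expected sequence number
        self.pending = {}   # sender -> {seq: write} buffered out of order
        self.applied = []   # writes applied to local memory, in order

    def deliver(self, sender, seq, write):
        self.pending.setdefault(sender, {})[seq] = write
        expected = self.next_seq.setdefault(sender, 0)
        # Apply as many in-order writes from this sender as possible.
        while expected in self.pending[sender]:
            self.applied.append(self.pending[sender].pop(expected))
            expected += 1
            self.next_seq[sender] = expected

r = FifoReceiver()
r.deliver("P1", 1, "w1")  # arrives out of order: buffered, not applied
r.deliver("P1", 0, "w0")  # fills the gap, unblocking w1
print(r.applied)  # ['w0', 'w1']
```

Note that writes from different senders carry independent counters, so their relative order is still unconstrained, exactly as FIFO consistency allows.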
6. Weak Consistency Model
• The basic idea behind the weak consistency model is enforcing
consistency on group of memory reference operations rather
than individual operations.
• It uses a special variable called a synchronization variable
which is used to synchronize memory.
• When a process accesses a synchronization variable, the
entire memory is synchronized by making visible the changes
made to the memory to all other processes.
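The weak-consistency idea above can be sketched as follows (a hypothetical toy store, assumed for illustration): writes stay in a per-process buffer and become visible to everyone only when the process accesses the synchronization variable.

```python
# Toy weak-consistency store: writes are buffered per process and
# published all at once when the synchronization variable is accessed.
class WeakStore:
    def __init__(self):
        self.shared = {}  # memory visible to all processes
        self.local = {}   # pid -> buffered (not yet visible) writes

    def write(self, pid, var, value):
        self.local.setdefault(pid, {})[var] = value  # not yet visible

    def synchronize(self, pid):
        """Accessing the synchronization variable publishes all writes."""
        self.shared.update(self.local.pop(pid, {}))

store = WeakStore()
store.write("P1", "x", 10)
print(store.shared.get("x"))  # None: P1 has not synchronized yet
store.synchronize("P1")
print(store.shared.get("x"))  # 10
```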
7. Release Consistency Model
• The release consistency model tells whether a process is entering or exiting a critical section, so that the system performs the appropriate operation when a synchronization variable is accessed by a process.
• Two synchronization variables, acquire and release, are used instead of a single synchronization variable. Acquire is used when a process enters a critical section & release is used when it exits a critical section.
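A sketch of the acquire/release split (an assumed toy model, not the slides' implementation): acquire pulls an up-to-date copy of shared memory, writes inside the critical section are buffered, and release publishes them.

```python
# Toy release-consistent store: acquire fetches, release publishes.
class ReleaseConsistentStore:
    def __init__(self):
        self.shared = {}

    def acquire(self, process):
        process["view"] = dict(self.shared)  # bring local copy up to date
        process["dirty"] = {}

    def write(self, process, var, value):
        process["view"][var] = value
        process["dirty"][var] = value        # deferred until release

    def release(self, process):
        self.shared.update(process["dirty"])  # publish buffered writes

store = ReleaseConsistentStore()
p1 = {}
store.acquire(p1)
store.write(p1, "x", 5)
print("x" in store.shared)  # False: still private to p1
store.release(p1)
print(store.shared["x"])    # 5
```

Compared with weak consistency, the split means the system need not both push and pull state at every synchronization point: acquire only pulls, release only pushes.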
8. Entry Consistency Model
• In entry consistency, every shared data item is associated with a synchronization variable.
• In order to access consistent data, each synchronization variable must be explicitly acquired.
2. Client-Centric Consistency Model
• This model concentrates on consistency from the perspective
of a single mobile client.
• Client-centric consistency models are generally used for
applications that lack simultaneous updates where most
operations involve reading the data.
Types:
• Eventual Consistency
• Monotonic reads
• Monotonic writes
• Read-your-writes
• Write-follow-reads
1. Eventual Consistency
• Eventual consistency is a consistency model used in
distributed systems where, after some time with no
updates, all data replicas will eventually converge to a
consistent state.
• This model allows for replicas of data to be
inconsistent for a short period, enabling high
availability and partition tolerance.
• Eventual consistency is suitable for scenarios where immediate consistency is not critical, such as in social media feeds or shopping carts, and provides a balance between consistency and availability.
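Convergence is typically achieved by replicas periodically exchanging state. The sketch below (an assumption for illustration; the last-writer-wins rule and the `merge` function are not from the slides) shows two replicas of a shopping cart converging on the value with the newer timestamp.

```python
# Pairwise anti-entropy with last-writer-wins: each key converges to
# the value carrying the highest timestamp seen by either replica.
def merge(replica_a, replica_b):
    """Both replicas end with the newer (timestamp, value) per key."""
    for key in set(replica_a) | set(replica_b):
        newer = max(replica_a.get(key, (0, None)),
                    replica_b.get(key, (0, None)))
        replica_a[key] = replica_b[key] = newer

r1 = {"cart": (2, ["book", "pen"])}  # (timestamp, value)
r2 = {"cart": (1, ["book"])}
merge(r1, r2)
print(r1 == r2, r1["cart"][1])  # True ['book', 'pen']
```

After enough pairwise merges with no new updates, every replica holds the same state, which is exactly the eventual-consistency guarantee.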
2. Monotonic reads Consistency
• Monotonic reads consistency ensures that once a
process reads a value, all subsequent reads by that
same process will return the same or a more recent
value, preventing a system from reverting to older
states.
• If a process has seen a value of x at time t, it will never see an older version of x at a later time.
• e.g. A user can read incoming mail while moving between locations; messages already seen never disappear from the mailbox.
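One way a client can enforce this on its own (a hypothetical sketch; the version-tracking scheme and names are assumptions) is to remember the highest version it has read and reject replies from replicas that lag behind it.

```python
# Client-side monotonic reads: remember the highest version observed
# and refuse any read that would return an older state of x.
class MonotonicClient:
    def __init__(self):
        self.last_seen = 0  # highest version of x read so far

    def read(self, replica):
        version, value = replica  # replica reports (version, value)
        if version < self.last_seen:
            return None           # stale replica: reject, try another
        self.last_seen = version
        return value

client = MonotonicClient()
print(client.read((3, "inbox-v3")))  # inbox-v3
print(client.read((2, "inbox-v2")))  # None: older state rejected
```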
3. Monotonic Writes Consistency
• Monotonic writes consistency ensures that writes to a system are committed in the order they are received. This order is crucial in maintaining data integrity across distributed systems.
• A write operation on a copy of data item x is performed only if that
copy has been brought up to date by means of any preceding
write operations, which may have taken place on other copies of x.
• E.g Updating a program on server S2 & ensuring that all
components on which compilation & linking rely are also stored on
S2.
4. Read-Your-Writes Consistency
• Read-your-writes consistency guarantees that when you make changes to data, those changes are visible in your subsequent reads. This simplifies development, enhances user experience, and ensures data accuracy.
• As a result, no matter where the read operation occurs, a write
operation is always completed before a subsequent read
operation by the same process.
• e.g. Updating your web page & ensuring that your web browser displays the most recent version rather than its cached copy.
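A client-side sketch of this guarantee (names and the version scheme are assumptions for illustration): the client records the version produced by its own last write and only accepts a read from a replica that has caught up to it.

```python
# Read-your-writes: only read from replicas that reflect our own writes.
class RywClient:
    def __init__(self, replicas):
        self.replicas = replicas      # replica_id -> (version, value)
        self.my_write_version = 0     # version of our most recent write

    def write(self, replica_id, version, value):
        self.replicas[replica_id] = (version, value)
        self.my_write_version = version

    def read(self):
        """Return a value only from a replica that has our writes."""
        for version, value in self.replicas.values():
            if version >= self.my_write_version:
                return value
        return None  # no replica is fresh enough yet

client = RywClient({"A": (1, "old page"), "B": (1, "old page")})
client.write("A", 2, "new page")
print(client.read())  # new page (replica B would be skipped)
```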
5. Writes-Follow-Reads Consistency
• A write operation by a process on a data item x
following a previous read operation on x by the same
process is guaranteed to take place on the same or a
more recent value of x that was read.
• e.g. You see reactions to a submitted article only if you have also seen the original posting.
Replication
• Data replication is the process of generating numerous copies of data, called replicas, & storing them in various locations for backup, fault tolerance & improved overall network accessibility.
• The data replicas can be stored on on-site & off-site
servers as well as cloud-based hosts or all within the
same system.
Need for Data Replication
• Higher Availability:
Data is replicated over numerous locations so that the user
can access it even if some of the copies are unavailable due
to site failures.
• Reduced Latency:
By keeping data geographically closer to a customer,
replication helps to reduce data query latency.
e.g. Netflix retains a copy of replicated data closer to the user.
• Read Scalability:
Read queries can be served from copies of the same data that
have been replicated. This increases the overall throughput of
queries.
• Fault Tolerance:
Even if some servers fail, the remaining replicas keep the data available, so the system continues to operate despite individual node failures.
Replica Placement
• The placement problem itself should be split into two
subproblems:
1. Placing Replica Server: Replica-server placement is
concerned with finding the best locations to place a server that
can host a data store
2. Placing Content: Content placement deals with finding the
best servers for placing the content.
Replica Placement
1. Content Replication & Placement:
• Permanent Replicas: These can be considered the initial set of replicas that constitute a distributed data store. In many cases the number of permanent replicas is small.
• e.g. A website: website distribution comes in two formats.
• The first kind of distribution is one in which the files that constitute a site are replicated across a limited number of servers at a single location. Whenever a request comes in, it is forwarded to one of the servers using a round-robin strategy.
• The second form is called mirroring. In this case, a website is copied to a limited number of servers, called mirror sites, which are geographically spread across the Internet. In most cases, clients simply choose one of the various mirror sites from a list offered to them.
Replica Placement
• Server-Initiated Replicas:
• In contrast to permanent replicas, server-initiated replicas are copies of a data store that exist to enhance performance and are created at the initiative of the data store.
• e.g. Consider a web server placed in New York. Normally, this server can handle incoming requests quite easily, but it may happen that over a couple of days a sudden burst of requests comes in from an unexpected location far from the server. In that case, it may be worthwhile to install a number of temporary replicas in the regions where the requests are coming from. To provide optimal service, hosting services can dynamically replicate files to servers close to the demanding clients. The algorithm for dynamic replication takes two issues into account. First, replication can take place to reduce the load on a server. Second, specific files on a server can be migrated or replicated to servers placed in the proximity of clients that issue many requests for those files.
Replica Placement
• Client-Initiated Replicas:
Client-initiated replicas are more commonly known as caches. In essence, a cache is a local storage facility that is used by a client to temporarily store a copy of the data it has just requested. In principle, managing the cache is left entirely to the client. The data store from which the data has been fetched has nothing to do with keeping cached data consistent.
Types of Data Replication
• Asynchronous vs synchronous replication
• Based on server model
o Active Replication
o Passive Replication
• Based on replication schemes
o Single Leader Architecture
o Multi Leader Architecture
o No Leader Architecture
Types of Data Replication
1. Asynchronous Replication: In this replication, the replica gets modified after the commit (save) is fired onto the database.
2. Synchronous Replication: In this replication, the replica gets modified immediately after some changes are made in the relation table.
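The difference can be sketched as follows (a toy model with assumed names, not a real database API): a synchronous commit applies the change to every replica before returning, while an asynchronous commit returns at once and lets the remaining replicas catch up later.

```python
# Toy replicated table contrasting synchronous and asynchronous commits.
class ReplicatedTable:
    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # async changes not yet applied everywhere

    def commit_sync(self, key, value):
        for replica in self.replicas:      # all replicas before returning
            replica[key] = value

    def commit_async(self, key, value):
        self.replicas[0][key] = value      # primary applies immediately
        self.pending.append((key, value))  # others are updated later

    def propagate(self):
        """Background step that brings the lagging replicas up to date."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

t = ReplicatedTable(3)
t.commit_async("x", 1)
print([r.get("x") for r in t.replicas])  # [1, None, None]: replicas lag
t.propagate()
print([r.get("x") for r in t.replicas])  # [1, 1, 1]: converged
```

Synchronous commits give stronger guarantees at the cost of commit latency; asynchronous commits are faster but leave a window where replicas disagree.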
Types of Data Replication
1. Active Replication :
• It is a non-centralized replication mechanism. The central
idea is that all replicas receive & process the same set of
client requests.
• Consistency is ensured by assuming that replicas will
generate the same output when given the same input in the
same sequence. This assumption indicates that servers
respond to queries in a deterministic manner.
• Clients do not address a single server, but rather a group of servers.
2. Passive Replication :
• Client requests are processed by just one server (the primary).
• The primary server then updates the state of the other (backup) servers.
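A minimal sketch of the two server models (the function names are assumptions; determinism of request processing is the stated requirement for active replication): in active replication every replica runs the request itself, while in passive replication only the primary runs it and then copies its resulting state to the backups.

```python
# Active vs passive replication over a deterministic state transition.
def apply(state, request):
    """Deterministic state transition shared by all replicas."""
    return state + request

def active(replicas, request):
    # Every replica processes the same request independently.
    return [apply(state, request) for state in replicas]

def passive(replicas, request):
    # Only the primary (replicas[0]) processes; backups copy its state.
    new_state = apply(replicas[0], request)
    return [new_state for _ in replicas]

print(active([0, 0, 0], 5))   # [5, 5, 5]
print(passive([0, 0, 0], 5))  # [5, 5, 5]: same result, different work
```

Because `apply` is deterministic, both models end in the same state; active replication spends CPU at every replica but masks a primary failure instantly, while passive replication does the work once but must fail over.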
Types of Data Replication
Based on Replication Schemes
1. Single Leader Architecture:
o In this architecture, one server accepts client writes &
replicas pull data from it.
o This is the most popular & traditional way.
2. Multi Leader Architecture :
o In this architecture, multiple servers can accept writes and
serve as a model for replicas.
o To avoid delay, copies should be spread out & leaders
should be near all of them.
3. No Leader Architecture :
Every server in this architecture can receive writes & function as a replica model. While it provides maximum flexibility, it makes synchronization difficult.
Based on Replication Scheme
1. Full Data Replication:
o It refers to the replication of the whole database across all sites.
o Since the results can be accessed from any local server, full
replication speeds up the execution of global queries.
o The drawback is that the updating process is often sluggish.
This makes maintaining current data copies in all locations
challenging.
2. Partial Data Replication :
o Here, only selected parts of the database are replicated based
on significance of data at each site.
o The number of copies can be anything from one to the total
number of nodes in the distributed system.
o This kind of replication can be effective for members of sales and marketing teams, where a partial database is maintained at their local sites.
Replication Models
1. Master-Slave Model
2. Client-Server Model
3. Peer-to-Peer Model
Replication Models
1. Master-Slave Model
• In this model one of the copies is the master replica and all the other copies are slaves.
• Slaves should always be identical to the master. In this model the functionality of the slaves is very limited, thus the configuration is very simple.
• The slaves are essentially read-only.
• Most master-slave services ignore all the updates or modifications performed at a slave & 'undo' the update during synchronization, making the slave identical to the master.
• Modifications or updates can be reliably performed only at the master, & the slaves must synchronize directly with the master.
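The master-slave rules above can be sketched in a few lines (class names are assumptions for this illustration): only the master accepts updates, and synchronization overwrites a slave's state with the master's, discarding any modification made at the slave.

```python
# Toy master-slave replication: slaves are read-only and are 'undone'
# to match the master at every synchronization.
class Master:
    def __init__(self):
        self.data = {}

    def update(self, key, value):
        self.data[key] = value      # updates happen only here

class Slave:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def synchronize(self, master):
        # Copy the master wholesale, discarding any local changes.
        self.data = dict(master.data)

master, slave = Master(), Slave()
master.update("x", 1)
slave.data["x"] = 99        # an update attempted at the slave...
slave.synchronize(master)   # ...is undone by synchronization
print(slave.read("x"))  # 1
```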
Replication Models
2. Client-Server Model
• The functionality of the clients in this model is more complex
than that of the slave in the master-slave model.
• It allows multiple inter-communicating servers; all types of data
modifications and updates can be generated at the client.
• One of the replication systems in which this model is
successfully implemented is Coda.
• In client-server replication all the updates must be propagated
first to the server, which then updates all the other clients.
• In this model, one replica of the data is designated as the
special server replica.
• All updates created at other replicas must be registered with the server before they can be propagated further. Since all updates pass through the server, it can impose a consistent ordering on them.
Replication Models
3. Peer-Peer Model
• Here all the replicas or the copies are of equal importance or
they are all peers.
• In this, any replica can synchronize with other replica, & any file
system modification or update can be applied at any replica.
• Peer-to-peer systems can propagate updates faster by making
use of any available connectivity.
• They provide a very rich & robust communication framework,
but they are more complex in implementation.
• One more problem with this model is scalability.
• As synchronization & communication are allowed between any replicas, this results in exceedingly large and complicated data exchange among the replicas as the system grows.