Lecture 4 (Shared Memory, "According to Access")

This lecture discusses different types of distributed systems, including shared memory systems. It describes classification by instruction and data streams as well as by the connection between memory and processor. The shared memory architectures UMA, NUMA, and COMA are explained.


Distributed Computer System

Issued by:
Dr. Ameer Mosa Thoeny Al-Sadi

Lecture 4
Classification of Distributed Systems
(1. Shared memory, "according to access")

Outlines

• Forms of parallelism.
• Classification of architectures:

A. Flynn's taxonomy: according to the dimensions of instruction and data.

1. SISD.
2. SIMD.
3. MISD.
4. MIMD.

B. According to the connection between memory and processor:

1. Shared memory:
- According to access: UMA, NUMA, COMA.
- According to connection: buses, crossbar switch, multistage network.

2. Message-passing system:
- According to network topology.

3. Hybrid system (combined access).

4. Distributed shared memory.

Dr. Ameer Mosa Al-sadi 1 | Page


Shared Memory
In today's competitive technology market, where speed and cost are so important, efficient execution of tasks is crucial to success. For most tasks people handle, teamwork is the way to go, and technology follows the same principle: multiprocessors achieve speedup through parallel programming. As with teamwork, communication is essential for maximum efficiency.
One popular way to allow multiple processors to communicate is a shared memory architecture. This lecture describes the characteristics of shared memory systems, the programming paradigm surrounding them, the hardware requirements for implementation, and a description of Altera's solution. Emphasis is placed on cache coherence and synchronization.

Shared Memory Architecture:


Shared memory systems form a major category of multiprocessors. In this category,
all processors share a global memory. Communication between tasks running on
different processors is performed through writing to and reading from the global
memory.
All interprocessor coordination and synchronization is also accomplished via the
global memory. A shared memory computer system consists of a set of independent
processors, a set of memory modules, and an interconnection network.

Two main problems need to be addressed when designing a shared memory system:
performance degradation due to contention, and coherence problems.

Performance degradation can occur when multiple processors try to access the shared memory simultaneously. A typical design uses caches to reduce this contention; however, caching creates multiple copies of the same data, which introduces the coherence problem.

Coherence problem: the copies in the caches are coherent if they are all equal to the same value. However, if one of the processors writes over the value of one of the copies, then that copy becomes inconsistent because it no longer equals the value of the other copies.



[Figure. Note: PFN (Page Frame Number), F (Flags).]

Access to shared memory


A shared memory model is one in which processors communicate by reading
and writing locations in a shared memory that is equally accessible by all processors.
Each processor may have registers, buffers, caches, and local memory banks as
additional memory resources.
A number of basic issues in the design of shared memory systems have to be taken
into consideration. These include access control, synchronization, protection, and
security.
Access control determines which processes may access which resources. The access control model checks every access request issued by a processor to the shared memory against the contents of an access control table. That table contains flags that determine the legality of each access attempt. While a permitted access to a resource is in progress, all disallowed access attempts and illegal processes are blocked until the desired access is completed. Requests from sharing processes may change the contents of the access control table during execution.
The access control flags, together with the synchronization rules, determine the system's functionality.
Synchronization constraints limit the time of accesses from sharing processes to
shared resources. Appropriate synchronization ensures that the information flows
properly and ensures system functionality.
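As a concrete illustration, the table-driven check described above might look like the following sketch. The flag layout and the process/resource names are assumptions made for illustration, not a real hardware format:

```python
# Sketch of an access-control table check for a shared memory system.
# The table maps (process, resource) pairs to permission flag bits; the
# layout and names are illustrative assumptions.

READ, WRITE = 0x1, 0x2  # permission flag bits

access_table = {
    ("P0", "mem0"): READ | WRITE,
    ("P1", "mem0"): READ,          # P1 may only read mem0
}

def check_access(process, resource, requested):
    """Return True if every requested flag is set for (process, resource)."""
    flags = access_table.get((process, resource), 0)
    return (flags & requested) == requested

assert check_access("P0", "mem0", WRITE)
assert not check_access("P1", "mem0", WRITE)   # disallowed attempt is blocked
```

A real system would perform this check in hardware or in the OS on every access; sharing processes could also update `access_table` at run time, as the text notes.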

EXAMPLE: The simplest shared memory system consists of one memory module that can be accessed by two processors. Requests arrive at the memory module through its two ports. An arbitration unit within the memory module passes requests through to a memory controller. If the memory module is not busy and a single request arrives, the arbitration unit passes that request to the memory controller and the request is granted. The module is placed in the busy state while a request is being serviced. If a new request arrives while the memory is busy servicing a previous request, the requesting processor may hold its request on the line until the memory becomes free, or it may repeat its request some time later.
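The arbitration behaviour in this example can be sketched in code. The class and method names are illustrative; held requests are modeled with a queue:

```python
# Minimal sketch of the two-port memory module from the example: an
# arbitration unit grants one request at a time; a request arriving while
# the module is busy is held on the line and serviced when it becomes free.
from collections import deque

class MemoryModule:
    def __init__(self):
        self.busy = False
        self.pending = deque()   # requests held on the line

    def request(self, processor):
        if self.busy:
            self.pending.append(processor)   # hold request until free
            return "held"
        self.busy = True                     # module enters the busy state
        return "granted"

    def complete(self):
        """Finish the current access; grant the oldest held request, if any."""
        self.busy = False
        if self.pending:
            return self.request(self.pending.popleft())
        return None

m = MemoryModule()
print(m.request("P0"))   # granted
print(m.request("P1"))   # held (module busy)
print(m.complete())      # granted (P1's held request is serviced)
```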

Depending on the interconnection network, shared memory systems can be classified as: uniform memory access (UMA), non-uniform memory access (NUMA), and cache-only memory architecture (COMA).

Access to shared memory

From the viewpoint of memory access:
UMA model (Uniform Memory Access):
- Uniform access to all of shared memory.
NUMA model (Non-Uniform Memory Access):
- Access time varies with the address.
COMA model (Cache-Only Memory Access):
- No main memory; only caches are used.
ccNUMA model (Cache-Coherent NUMA).



UMA
1. All processors have equal access, with equal access time, to all parts of shared memory.
2. Processors and memory are connected by a shared interconnect system.
3. Convenient architecture for SMP multiprocessor systems; simple to realize.
4. Poor scalability.

[Figure: UMA interconnect options: a single bus, multiple buses, or a crossbar switch.]

In the UMA system, a shared memory is accessible by all processors through an interconnection network, in the same way a single processor accesses its memory. All processors have equal access time to any memory location. The interconnection network used in UMA can be a single bus, multiple buses, or a crossbar switch. Because access to shared memory is balanced, these systems are also called SMP (symmetric multiprocessor) systems. Each processor has equal opportunity to read/write to memory, including equal access speed. Commercial examples of SMPs are Sun Microsystems multiprocessor servers and Silicon Graphics Inc. multiprocessor servers. A typical SMP computer is bus-structured.



NUMA
1. Every processor has its own local memory, and those memories together form the shared memory.
2. A processor's access to its local memory is faster than access to the shared memory of other processors.
3. Access to the shared parts belonging to other processors is used less (only when necessary), reducing bus load.
4. OS support is needed to allocate memory locations close to the data that needs them.
5. Advantages: programming as with bus-based SMP, good scalability.

In the NUMA system, each processor has part of the shared memory attached. The memory has a single address space; therefore, any processor can access any memory location directly using its real address. However, the access time to a module depends on its distance from the processor.
In the extreme, bus contention might be reduced to zero once the cache memories are loaded from the global memory, because it is possible for all instructions and data to be completely contained within the cache.
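A rough way to see the cost of remote accesses is a weighted-average latency model. The latency numbers and access fractions below are illustrative assumptions, not figures from a real NUMA machine:

```python
# Effective memory access time in a NUMA system, modeled as a weighted
# average of local and remote latencies. All numbers are illustrative
# assumptions, not measurements of a real machine.

def effective_access_ns(local_fraction, local_ns=100, remote_ns=300):
    """Average latency when local_fraction of accesses hit local memory."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

# Keeping most accesses local keeps the average close to local speed:
print(effective_access_ns(0.75))  # 150.0
print(effective_access_ns(0.5))   # 200.0
```

This is why point 3 above matters: the more accesses stay local, the closer the system runs to local memory speed.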



COMA
1. Every processor uses its own cache memory.
2. A single address space for the shared memory is created from the cache memories of all processors.
3. Can be classified as a special type of NUMA.
4. Access to the memory (cache) of another processor is realized through the address space.

Similar to NUMA, each processor in the COMA has part of the shared memory attached. However, in this case the shared memory consists of cache memory. A COMA system requires that data be migrated to the processor requesting it. There is no memory hierarchy, and the address space is made up of all the caches. A cache directory (D) helps in remote cache access. The Kendall Square Research KSR-1 machine is an example of such an architecture.



Memory hierarchy
Memory is a very important component of a computer, yet even more important is the type of memory in a computer's architecture. Having the wrong type of memory for the system can be costly.
There are many types of memory, all of which must work together in a system to perform tasks efficiently. System efficiency is gained by having different levels of memory units with different data transfer speeds: fast, small memories at the top of the hierarchy and slower, larger memories below.

Register – M0:
- On the same chip as the processor.
- Very fast (~1 ns).
- Very small (~10² B).
- Memory allocation (access control) is done by the compiler.

Cache – M1:
- Usually on the same chip as the processor.
- Speeds up access to main memory (RAM); access time ~10 ns.
- Relatively small (~10² kB).
- Transparent; access is controlled by hardware (the MMU).

Main memory – M2:
- On the motherboard; DRAM or DDR RAM technology.
- Relatively fast (~10-10² ns).
- A level bigger than cache (~GB).
- Access controlled by the OS and MMU.

Hard disk – M3:
- In the computer.
- Long access time (~10 ms).
- Big capacity (~10² GB).
- Access controlled by the OS.
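The payoff of this layering can be estimated with an average-memory-access-time calculation over the levels above. The latencies follow the rough orders of magnitude listed; the hit rates are assumed values chosen purely for illustration:

```python
# Average memory access time (AMAT) through the M1-M3 levels above.
# Latencies follow the rough orders of magnitude in the notes; the hit
# rates are assumed purely for illustration.

def amat(levels):
    """levels: list of (hit_rate, latency_ns), fastest level first."""
    total, reach = 0.0, 1.0            # reach = fraction of accesses arriving here
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns    # every access reaching this level pays it
        reach *= (1 - hit_rate)        # misses fall through to the next level
    return total

hierarchy = [
    (0.95, 10),           # M1 cache: ~10 ns
    (0.99, 100),          # M2 main memory: ~10^2 ns
    (1.00, 10_000_000),   # M3 disk: ~10 ms, services everything that reaches it
]
# Even rare disk accesses dominate the average:
print(f"{amat(hierarchy):.1f} ns")
```

The result shows why high hit rates at the fast levels are essential: the few accesses that fall through to disk account for almost all of the average time.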



Memory hierarchy characteristics

Inclusive:
- Each level contains a copy of the information held in the level above it.

Coherence:
- The copies of information at different levels of the hierarchy must be kept up to date (immediately, or when needed).

Locality:
- Memory accesses follow the locality models: sequential, temporal, and spatial.

BASIC CACHE COHERENCY METHODS:

Multiple copies of data, spread throughout the caches, lead to a coherence problem among the caches. The copies in the caches are coherent if they all equal the same value. However, if one of the processors writes over the value of one of the copies, then the copy becomes inconsistent because it no longer equals the value of the other copies. If data are allowed to become inconsistent (incoherent), incorrect results will be propagated through the system, leading to incorrect final results.
Cache coherence algorithms are needed to maintain a level of consistency
throughout the parallel system.

Cache–Memory Coherence:
In a single cache system, coherence between memory and the cache is maintained
using one of two policies: (1) write-through, and (2) write-back. When a task
running on a processor P requests the data in memory location X, for example, the
contents of X are copied to the cache, where it is passed on to P. When P updates
the value of X in the cache, the other copy in memory also needs to be updated in
order to maintain consistency. In write-through, the memory is updated every time the cache is updated, while in write-back, the memory is updated only when the block in the cache is being replaced.

[Figure: Write-Through vs. Write-Back.]

Coherence (multi-cache)
Caches play a key role in all cases:
1. They reduce average data access time.
2. They reduce bandwidth demands placed on the shared interconnect.

But private processor caches create a problem:

1. Copies of a variable can be present in multiple caches.
2. A write by one processor may not become visible to others.
3. They will keep accessing the stale value in their caches.
4. This is the cache coherence problem.
5. Actions must be taken to ensure visibility.

[Figure: multiple copies of x in the private caches of a (multi-cache) NUMA or COMA system. What if P1 updates x?]



There are two fundamental cache coherence policies:
1. Write-invalidate maintains consistency by reading from local caches until a
write occurs. When any processor updates the value of X through a write, posting
a dirty bit for X invalidates all other copies.

2. Write-update maintains consistency by immediately updating all copies in all caches. All dirty bits are set during each write operation.
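A minimal sketch of the two policies over several private caches. The dictionary-of-caches structure is an illustration, not a hardware model, and `None` stands for an invalidated copy:

```python
# Sketch of the two coherence policies over private caches that each hold
# a copy of one variable x. None marks an invalidated copy. The structure
# is an illustrative assumption, not a hardware model.

def write(caches, writer, value, policy):
    caches[writer] = value
    for p in caches:
        if p == writer:
            continue
        if policy == "write-invalidate":
            caches[p] = None      # all other copies invalidated
        else:                     # write-update
            caches[p] = value     # all other copies updated immediately

caches = {"P0": 5, "P1": 5, "P2": 5}
write(caches, "P0", 7, "write-invalidate")
print(caches)   # {'P0': 7, 'P1': None, 'P2': None}

caches = {"P0": 5, "P1": 5, "P2": 5}
write(caches, "P0", 7, "write-update")
print(caches)   # {'P0': 7, 'P1': 7, 'P2': 7}
```

Invalidate pays for a write only when another cache next reads the block; update pays on every write but keeps all copies usable.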

[Figure: Write-Invalidate vs. Write-Update.]

Cache Coherence Policies


Writing to cache in the n-processor case uses one of four policy combinations:
• Write-Update with Write-Through
• Write-Invalidate with Write-Back
• Write-Update with Write-Back
• Write-Invalidate with Write-Through



Snooping protocols

• Snooping protocols are based on watching bus activity and carrying out the appropriate coherency commands when necessary.
• Global memory is moved in blocks, and each block has a state associated with it, which determines what happens to the entire contents of the block.
• The state of a block might change as a result of the following operations:
1. Read-Miss,
2. Read-Hit,
3. Write-Miss,
4. Write-Hit.

Cache Events and Actions


The following events and actions occur on the execution of memory-access and invalidation commands:
Read-miss: When a processor wants to read a block that is not in its cache, a read-miss occurs. This initiates a bus-read operation. If no dirty copy exists, then the main memory, which has a consistent copy, supplies a copy to the requesting cache. If a dirty copy exists in a remote cache, that cache restrains the main memory and sends the copy to the requesting cache itself. In both cases, the cache copy enters the valid state after a read-miss.
Write-hit: If the copy is in the dirty or reserved state, the write is done locally and the new state is dirty. If the copy is in the valid state, a write-invalidate command is broadcast to all caches, invalidating their copies; the shared memory is written through, and the resulting state is reserved after this first write.
Write-miss: If the block to be written is not in the local cache, the copy must come either from main memory or from a remote cache with a dirty block. This is done by sending a read-invalidate command, which invalidates all cache copies. The local copy is then updated and ends in the dirty state.
Read-hit: A read-hit is always performed in the local cache without causing a state transition or using the snoopy bus for invalidation.
Block replacement: When a dirty copy is replaced, it is written back to main memory. When the copy is in the valid, reserved, or invalid state, no write-back takes place on replacement.
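The state transitions described above can be sketched as a small simulation of a write-once-style protocol. The restriction to a single block and the function names are simplifying assumptions:

```python
# Sketch of the block-state transitions described above (write-once style:
# invalid, valid, reserved, dirty), simplified to one block shared by a
# list of caches. The structure is an illustrative assumption.

INVALID, VALID, RESERVED, DIRTY = "invalid", "valid", "reserved", "dirty"

class Block:
    def __init__(self):
        self.state = INVALID

def read(caches, i):
    # Read-hit: no state change. Read-miss: a copy is supplied and the
    # block enters the valid state.
    if caches[i].state == INVALID:
        caches[i].state = VALID
    return caches[i].state

def write(caches, i):
    b = caches[i]
    if b.state in (DIRTY, RESERVED):     # write-hit on an owned block: local
        b.state = DIRTY
    else:
        for j, other in enumerate(caches):
            if j != i:
                other.state = INVALID    # broadcast invalidation
        # First write on a valid copy is written through -> reserved;
        # a write-miss ends with a dirty local copy.
        b.state = RESERVED if b.state == VALID else DIRTY
    return b.state

caches = [Block(), Block()]
read(caches, 0); read(caches, 1)   # both copies valid
print(write(caches, 0))            # reserved (first write-hit, written through)
print(caches[1].state)             # invalid (other copy invalidated)
```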

Write Invalidate -Write Through

Multiple processors can read block copies from main memory safely until one
processor updates its copy. At this time, all cache copies are invalidated and the
memory is updated to remain consistent.



Example:

[Figure: worked example of Write Invalidate - Write Through.]



Write Invalidate -Write Back

A valid block can be owned by memory and shared in multiple caches that can
contain only the shared copies of the block. Multiple processors can safely read these
blocks from their caches until one processor updates its copy. At this time, the writer
becomes the only owner of the valid block and all other copies are invalidated.

[Figure: Write Invalidate - Write Back.]



Program locality:

Memory (RAM and cache) exploits properties of the program:

1) Temporal locality: a block of data that was needed a moment ago will likely be needed again a moment later.

2) Spatial locality: reads and writes tend to go to adjacent addresses.
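Spatial locality can be illustrated by comparing the address strides of row-major and column-major traversals of a 2-D array. This is a pure address-pattern sketch; no timing is measured, since actual speedups depend on the machine:

```python
# Sketch illustrating spatial locality: a row-major traversal of a 2-D
# array touches adjacent addresses, which caches reward; a column-major
# traversal jumps across memory. Only address strides are computed here.

N = 4
addresses_row = [i * N + j for i in range(N) for j in range(N)]
addresses_col = [i * N + j for j in range(N) for i in range(N)]

def max_stride(addrs):
    """Largest distance between consecutively accessed addresses."""
    return max(abs(b - a) for a, b in zip(addrs, addrs[1:]))

print(max_stride(addresses_row))   # 1  (adjacent addresses: good spatial locality)
print(max_stride(addresses_col))   # 11 (jumps across rows: poor spatial locality)
```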

Homework

