Distributed Memory Architecture

This document discusses distributed multiprocessor architectures, including tightly coupled and loosely coupled architectures. It provides details on the key aspects of tightly coupled architectures, including models both with and without private caches. Issues like memory conflicts and solutions like adding caches are covered. Loosely coupled architectures are also summarized, including how they use local memory and inter-process communication through message passing over different modules. Specific examples like the Cm* architecture are briefly mentioned.


Unit 5

Distributed Multiprocessor
Architectures

Distributed Multiprocessor Architectures

• Loosely coupled and tightly coupled architectures
• Cluster computing as an application of loosely coupled architecture. Examples: Cm* and Hadoop.
Some Basics….
• When working on a project, several people coordinating together often produce a better solution than one person trying to piece things together alone.
• This is the idea behind multiprocessing.
• Multiprocessing means two or more processors working and operating concurrently.
• A multiprocessing system is a system configuration that contains more than one central processing unit (CPU).
Why use a multiprocessing system?
• First, a multiprocessing system increases overall system performance, i.e. the amount of work accomplished per unit time, also referred to as throughput.
• By working together, a problem can be divided up among processors for faster completion, an approach also called "divide and conquer".
• Another reason for using multiprocessing systems is to increase system availability.
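The divide-and-conquer idea above can be sketched in a few lines: split the work into chunks, process the chunks concurrently, and combine the partial results. This is an illustrative sketch, not tied to any particular multiprocessor in the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Divide and conquer: split data into one chunk per worker,
    sum the chunks concurrently, then combine the partial results."""
    chunk = (len(data) + workers - 1) // workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, parts)   # each worker sums one chunk
    return sum(partials)                  # combine the partial sums

print(parallel_sum(list(range(1000))))    # same result as sum(range(1000))
```

The answer is identical to the sequential one; only the throughput changes, which is exactly the point made on the slide.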
Introduction
• Key attributes of "multiprocessors":
  – A single computer that includes multiple processors
  – Processors may communicate at various levels
    • Message passing or shared memory
• Multiprocessor and multicomputer systems
  – A multicomputer system consists of several autonomous computers which may or may not communicate with each other.
  – A multiprocessor system is controlled by a single operating system which provides mechanisms for interaction among processors.
• Architectural models
  – Tightly coupled multiprocessor
  – Loosely coupled multiprocessor
Tightly coupled multiprocessor (Basics)
• Processors communicate via shared memory.
• There is complete connectivity between processors and memory.
• This connectivity is accomplished by an interconnection network.
• Drawback: performance degradation due to memory conflicts.
Tightly Coupled Architecture (Details)
• A tightly coupled multiprocessor system may be used in cases where speed is the main concern.
• Models:
  – Without private cache
  – With private cache
Architecture (Without Private Cache)
• This model consists of p processors, l memory modules, and d I/O channels.
• Everything is connected using a processor-memory interconnection network (PMIN).
• The PMIN is a switch that can connect every processor to every memory module.
• A memory module can satisfy only one processor's request in a given memory cycle; this conflict is arbitrated by the PMIN.
Tightly Coupled Architecture
• In this system, the best way to prevent such conflicts is to make l equal to p (i.e. the number of memory modules equal to the number of processors).
• Another way of reducing conflicts is to use unmapped local memory (ULM), a reserved memory area for each processor.
• By adding the ULM we reduce the amount of traffic through the PMIN and thereby reduce conflicts to and from memory.
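The effect of the module count l on conflicts can be seen with a small simulation. Each cycle, every processor requests a uniformly random module, and a module serves at most one request per cycle (as the PMIN arbitration above states); the simulation parameters are illustrative assumptions, not from the slides.

```python
import random

def conflict_rate(processors, modules, cycles=10_000, seed=0):
    """Fraction of requests that conflict when each processor
    issues one random memory-module request per cycle and a
    module can serve only one request per cycle."""
    rng = random.Random(seed)
    conflicts = 0
    for _ in range(cycles):
        hits = {}
        for _ in range(processors):
            m = rng.randrange(modules)
            hits[m] = hits.get(m, 0) + 1
        # every request beyond the first to a module is a conflict
        conflicts += sum(c - 1 for c in hits.values())
    return conflicts / (cycles * processors)

# More modules per processor -> fewer conflicts, matching the l = p advice
assert conflict_rate(8, 16) < conflict_rate(8, 4)
```

Even with l = p a substantial fraction of requests still collide under uniform random access, which is why the slides also add unmapped local memory and, later, private caches.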
Tightly coupled multiprocessor contd.
[Figure: p processors connected through the interrupt signal interconnection network (ISIN); the input/output interconnection network (IOPIN) linking them to d I/O channels and disks; and the processor-memory interconnection network (PMIN) linking them to l shared memory modules. Each processor also has mapped and unmapped local memory.]
Problem
• In this architecture, the memory references made by the processors usually go to main memory.
• Memory references common to all processors will cause conflicts.
• The PMIN will resolve these conflicts, but doing so delays operations, which increases instruction cycle time and therefore decreases throughput.
Solution
• The delay can be reduced by giving each processor a private cache that holds its memory references.
• But the cache coherence problem must then be taken care of.
• Refer to the diagram.
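The coherence problem mentioned above can be illustrated with a toy write-invalidate scheme: on a write, every other private cache's copy of the line is invalidated so a later read must re-fetch the fresh value from shared memory. This is a minimal sketch of one coherence policy, not the protocol of any specific machine in the slides.

```python
class Cache:
    """A trivial private cache: address -> cached value."""
    def __init__(self):
        self.lines = {}

    def read(self, mem, addr):
        if addr not in self.lines:
            self.lines[addr] = mem[addr]   # miss: fetch from shared memory
        return self.lines[addr]

def write(mem, caches, writer, addr, value):
    """Write-invalidate: update shared memory and the writer's cache,
    and invalidate the line in every other private cache."""
    mem[addr] = value
    writer.lines[addr] = value
    for c in caches:
        if c is not writer:
            c.lines.pop(addr, None)        # drop the stale copy

mem = {0x10: 1}
c0, c1 = Cache(), Cache()
c1.read(mem, 0x10)                # c1 caches the old value 1
write(mem, [c0, c1], c0, 0x10, 2) # c0 writes 2; c1's copy is invalidated
print(c1.read(mem, 0x10))         # c1 misses and re-fetches: prints 2
```

Without the invalidation step, c1 would keep returning the stale value 1, which is exactly the hazard the slide warns about.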
Tightly coupled multiprocessor contd.
[Figure: the same organization as before, with a private cache added between each processor and the PMIN, alongside the mapped and unmapped local memory.]
Tightly coupled multiprocessor
• The ISIN permits each processor to interrupt any other processor.
• The ISIN is also used by a failing processor to broadcast a message.
• The IOPIN permits processors to communicate with the I/O channels.
Tightly coupled multiprocessor contd.
• Processor types
  – Homogeneous, if all processors perform the same function
  – Heterogeneous, if processors perform different functions
Note: Two functionally identical processors may still differ along other parameters such as I/O or memory size, i.e. they are asymmetric.
Loosely Coupled Architecture
• Each processor has its own set of I/O devices and memory, where it accesses most of its instructions and data.
• Computer module: a processor (P), local memory (LM), and an input/output (I/O) interface, attached to a channel and arbiter switch (CAS).
Loosely coupled multiprocessor contd.
• Inter-process communication across modules happens by exchange of messages, using a message transfer system (MTS).
• The system is distributed; the degree of coupling is loose.
• The degree of memory conflict is lower.
[Figure: N computer modules (each with P, LM, I/O and a CAS) attached to the message transfer system (MTS).]
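The message-passing style above can be sketched with two queues standing in for the MTS: the modules share no memory and interact only by sending and receiving messages. The module names and message contents are illustrative.

```python
import queue
import threading

def module(name, inbox, outbox):
    """A computer module: no shared memory; it only exchanges
    messages over the transfer system (here, thread-safe queues)."""
    msg = inbox.get()                      # receive over the MTS
    outbox.put(f"{name} got: {msg}")       # reply over the MTS

mts_a, mts_b = queue.Queue(), queue.Queue()
t = threading.Thread(target=module, args=("Cm1", mts_a, mts_b))
t.start()
mts_a.put("hello")                         # Cm0 sends a message
t.join()
print(mts_b.get())                         # prints: Cm1 got: hello
```

Note that contention here is on the message channel, not on memory modules, which is why the slide says the degree of memory conflict is lower.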
Loosely coupled multiprocessor
• Inter-module communication
  – Channel and arbiter switch (CAS)
  – The arbiter decides when requests from two or more computer modules collide in accessing a physical segment of the MTS.
  – It is also responsible for delaying other requests until the request being serviced is completed.
Loosely coupled multiprocessor
• Message Transfer System (MTS)
  – Either a time-shared bus or shared memory.
  – The latter can be implemented with a set of memory modules and a processor-memory interconnection network, or with multiported main memory.
  – The MTS determines the performance of the multiprocessor system.
Loosely coupled multiprocessor
• For an LCS that uses a single time-shared bus, performance is limited by the message arrival rate on the bus, the message length, and the bus capacity.
• For an LCS with shared memory, the limiting factor is the memory conflict problem imposed by the processor-memory interconnection network.
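The three bus limits named above combine into a single offered-load figure: utilization = arrival rate × message length / bus capacity. The numbers plugged in below are hypothetical, chosen only to show the calculation.

```python
def bus_utilization(arrival_rate, msg_bits, capacity_bps):
    """Offered load on a time-shared bus: (messages/sec x bits/message)
    divided by bus capacity in bits/sec. Near 1.0 the bus saturates
    and message queueing delay grows sharply."""
    return arrival_rate * msg_bits / capacity_bps

# Hypothetical numbers: 10,000 msgs/s of 1 KiB each over a 100 Mb/s bus
u = bus_utilization(10_000, 8 * 1024, 100_000_000)
print(round(u, 3))   # 0.819: close to saturation
```

Doubling either the arrival rate or the message length here would push the load past 1.0, i.e. the bus, not the processors, becomes the bottleneck.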
Cm* Architecture
• A project at Carnegie Mellon University.
• What is a computer module here?
• A computer module (Cm) consists of a processor (P), an Slocal switch (S), local memory (LM), and I/O.
• The Slocal is similar to the CAS in the loosely coupled architecture.
Cluster of Computer Modules
[Figure: computer modules Cm1 … Cm10 on a shared Map Bus, each containing P, S, LM and I/O; the KMAP connects the Map Bus to the inter-cluster bus.]
Role of Slocal
• The Slocal receives and interprets the processor's requests for access to local memory and I/O.
• It allows a local processor to access resources external to its Cm: it distinguishes local references from external ones, provides translation of local addresses, and forwards non-local references to the Kmap.
Address Translation
[Figure: the Cm* address-translation data path (not reproduced).]
Kmap Components
• The Slocal uses the 4 high-order address bits along with 1 PSW bit to access a map table.
• The map table determines whether the referenced memory is local or not.
• If the memory is non-local, control is given to the Kmap via the map bus.
• Each Cm is connected to the Kmap via the map bus.
• The Kmap is responsible for routing data between Slocals.
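The routing decision just described can be sketched as a bit-field lookup: the 4 high-order address bits and the PSW bit together form a 5-bit index into a map table whose entries say "local" or "forward to Kmap". The 16-bit address width and the table layout are assumptions for illustration, not details given in the slides.

```python
def route(addr16, psw_bit, map_table):
    """Cm*-style routing sketch: 4 high-order address bits plus one
    PSW bit index a 32-entry map table; True means the reference is
    local, False means it must be forwarded to the Kmap."""
    index = ((addr16 >> 12) & 0xF) | (psw_bit << 4)   # 5-bit index
    return "local" if map_table[index] else "kmap"

table = [True] * 32
table[0x1F] = False                 # mark one entry as non-local
assert route(0x3000, 0, table) == "local"
assert route(0xF000, 1, table) == "kmap"
```

The point of the sketch is that locality is decided purely from a few address bits plus the table, so the common local case never touches the Kmap.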
Kmap Components
[Figure: the Kmap comprises the Kbus, the Pmap (with RUN and OUT queues and SERVICE/RETURN ports), and the Link, whose send ports connect to intercluster buses 1 and 2; the Kbus drives the Map Bus to the cluster's Cms.]
Kmap Components
• A request for non-local memory arrives at the Kbus via the map bus.
• The Link manages communication between this Kmap and other Kmaps.
• The Pmap is the mapping processor, which responds to requests between the Kbus and the Link.
Kmap Components
• A Kmap can handle 8 processor requests simultaneously.
• The Pmap uses queues to handle requests.
• A service request is signaled to the Kbus whenever a non-local memory reference is made; the computer module making the reference is called the master Cm.
Kmap Components
• The Kmap fetches the virtual address via the map bus and allocates a context in the Pmap.
• It places the virtual address in the Pmap's RUN queue.
• The Pmap performs virtual-to-physical address translation.
• Using the physical address, it can initiate a memory access in any Cm.
Kmap Components
• The Kmap services the OUT request by sending the physical address of the memory request via the map bus.
• When the destination Cm completes the memory access, it sends a return signal to the Kmap.
Intra-cluster Communication
[Figure: master Cm and slave Cm on the Map Bus; inside the KMAP, the Kbus and the Pmap with its RUN and OUT queues. Steps 1-5 below are numbered in the figure.]
1. The master Cm initiates a non-local memory access.
2. The Kbus fetches the master Cm's virtual address.
3. The Kbus activates a context (creating the transaction's data structure) on the Pmap's RUN queue.
4. The Pmap processes the context and performs the address translation.
5. The Pmap places on the OUT queue a request for a memory cycle in the slave Cm of the current cluster.
Intra-cluster Communication
[Figure: the same organization, with steps 6-9 numbered in the figure.]
6. The Kbus sends the physical address to the slave Cm over the Map Bus.
7. The slave Cm performs the local memory access cycle.
8. The Kbus forwards the result of the memory access operation to the master Cm.
9. The master Cm takes the data and continues execution.
Inter-cluster Communication
[Figure: master Cm → master KMAP → intercluster bus → slave KMAP → slave Cm; steps 1-5 below are numbered in the figure.]
1. The master Cm sends a transfer request to the master KMAP.
2. The master KMAP prepares and encodes the intercluster message/request packet.
3. The intercluster message is transmitted over the intercluster bus using its routing algorithms.
4. The slave KMAP decodes the incoming request and dispatches it within its cluster.
5. The memory cycle request is sent to the slave Cm.
Inter-cluster Communication
[Figure: the same path in reverse, with address formats (Cop/Segment/Offset; R/W, Cm#, Page, Offset; K/U, R/W, Cm#, Page, Offset) annotated; steps 5-10 below are numbered in the figure.]
5. The slave Cm transmits the result to the slave KMAP.
6. The slave KMAP prepares the intercluster reply message (i.e. reactivates the context).
7. The slave KMAP transmits the result to the master KMAP.
8. The master KMAP receives and interprets the message.
9. The result is sent to the master Cm.
10. The result is received by the master Cm.
BIGDATA Facts
• Data-intensive applications work with petabytes of data.
• Web pages: 20+ billion web pages × 20 KB = 400+ terabytes.
• One computer can read 30-35 MB/sec from disk: roughly four months to read the web.
• The same problem with 1000 machines: less than 3 hours.
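The back-of-the-envelope numbers above can be checked directly. The calculation below uses 32.5 MB/s (the middle of the 30-35 MB/s range) and binary terabytes; the slide's "four months" and "< 3 hours" are the same arithmetic with slightly rougher figures.

```python
def read_time_days(total_tb, mb_per_sec, machines=1):
    """Days needed to scan total_tb terabytes when each of
    `machines` machines reads at mb_per_sec from disk."""
    total_mb = total_tb * 1024 * 1024        # TB -> MB (binary units)
    seconds = total_mb / (mb_per_sec * machines)
    return seconds / 86_400                  # seconds -> days

print(round(read_time_days(400, 32.5)))             # about 149 days
print(round(read_time_days(400, 32.5, 1000) * 24, 1))  # about 3.6 hours
```

The single-machine figure lands around five months and the 1000-machine figure around three and a half hours: the same order of magnitude as the slide, and the same conclusion: parallelism, not faster single disks, is what makes the problem tractable.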
Single-thread performance doesn't matter
• We have large problems, and total throughput/price matters more than peak performance.
Stuff breaks: we need more reliability
• If you have one server, it may stay up three years (1,000 days).
• If you have 10,000 servers, expect to lose ten a day.
"Ultra-reliable" hardware doesn't really help
• At large scale, super-fancy reliable hardware still fails, albeit less often.
  – Software still needs to be fault-tolerant.
  – Commodity machines without fancy hardware give better performance/price.
What is Hadoop?
• It is a framework for running applications on large clusters of commodity hardware that produce huge amounts of data and need to process it.
• Hadoop provides distributed processing of big data that is stored at different physical locations.
• The Apache Hadoop software library is a framework that allows
for the distributed processing of large data sets across clusters
of computers using simple programming models.
• It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage.
• Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
Hadoop Includes
• HDFS: a distributed filesystem.
• MapReduce: a programming model implemented on top of HDFS; it is an offline computing engine.
Hadoop HDFS
• Hardware failure is the norm rather than the exception.
• Moving computation is cheaper than moving data.
HDFS
• Runs on commodity hardware.
• HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.
• Provides high-throughput access to application data.
• Suitable for applications that have large data sets.
NameNode and DataNodes
• HDFS has a master/slave architecture.
• NameNode: manages the file system namespace and regulates access to files by clients.
• DataNodes, usually one per node in the cluster, manage storage attached to the nodes that they run on.
• A file is split into one or more blocks.
• These blocks are stored in a set of DataNodes.
• The NameNode executes file system namespace operations like opening, closing, and renaming files and directories.
• It also determines the mapping of blocks to DataNodes.
• The DataNodes are responsible for serving read and write requests from the file system's clients.
• The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
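The block mechanics above can be sketched in a few lines: split a file into fixed-size blocks and assign each block's replicas to DataNodes. The 128 MB block size matches HDFS defaults, but the round-robin placement is a deliberate simplification (real HDFS placement is rack-aware), and the node names are made up.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Split a file into fixed-size blocks, HDFS-style; the last
    block may be shorter. Returns (offset, length) pairs."""
    blocks, off = [], 0
    while off < file_size:
        blocks.append((off, min(block_size, file_size - off)))
        off += block_size
    return blocks

def place_replicas(num_blocks, datanodes, replication=3):
    """Toy placement: round-robin each block's replicas over the
    DataNodes. Real HDFS is rack-aware; this only shows the
    block -> nodes mapping the NameNode maintains."""
    n = len(datanodes)
    return {b: [datanodes[(b + r) % n] for r in range(replication)]
            for b in range(num_blocks)}

blocks = split_into_blocks(300 * 1024 * 1024)   # 128 MB + 128 MB + 44 MB
print(len(blocks))                              # 3
```

The NameNode holds only this kind of metadata (which blocks exist and where their replicas live); the block bytes themselves never pass through it.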
HDFS Internal
[Figure: HDFS internal architecture (not reproduced).]
Hadoop MapReduce
• A software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
• A MapReduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which are then input to the reduce tasks.
• Typically the compute nodes and the storage nodes are the same.
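The map → shuffle/sort → reduce pipeline just described can be shown with a minimal in-process version. This is a sketch of the programming model only; it ignores everything that makes Hadoop useful at scale (distribution, HDFS, fault tolerance).

```python
from collections import defaultdict
from itertools import chain

def map_reduce(inputs, mapper, reducer):
    """Minimal in-process MapReduce: map each input split to (key, value)
    pairs, group the pairs by key (the shuffle/sort), then reduce each
    group to a final value."""
    pairs = chain.from_iterable(mapper(x) for x in inputs)
    groups = defaultdict(list)
    for k, v in pairs:                       # shuffle: group values by key
        groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}

# Classic word count over two input "splits"
splits = ["the quick brown fox", "the lazy dog the end"]
counts = map_reduce(splits,
                    mapper=lambda line: [(w, 1) for w in line.split()],
                    reducer=lambda k, vs: sum(vs))
print(counts["the"])    # 3
```

Because each split is mapped independently and each key is reduced independently, both phases parallelize trivially, which is exactly the property the slide's "completely parallel manner" refers to.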
Hadoop MapReduce
• The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node.
• The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing failed tasks.
• The slaves execute the tasks as directed by the master.
Hadoop MapReduce
• Applications specify the input/output locations.
• They supply map and reduce functions via implementations of appropriate interfaces and/or abstract classes.
• The Hadoop job client then submits the job and configuration to the JobTracker.
• The JobTracker assumes responsibility for distributing the software/configuration to the slaves, scheduling tasks, monitoring them, and providing status and diagnostic information.
