Distributed Memory Architecture

Distributed Multiprocessor Architectures
Some Basics
• Whenever people work on a project, several people coordinating with one another usually produce a better solution than one person trying to piece things together alone.
• This is the idea behind multiprocessing.
• Multiprocessing is n processors working and operating concurrently.
• A multiprocessing system is a system configuration that contains more than one central processing unit (CPU).
Why use a multiprocessing system?
• First, a multiprocessing system increases the overall rate at which work is accomplished, also referred to as throughput.
• By working together, a problem can be divided up among the processors for faster completion, an approach also called "divide and conquer" (sketched below).
• Another reason for using multiprocessing systems is to increase system availability.
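As a small illustration of divide and conquer (my own sketch, not from the slides), the following Python example splits a summation across several worker processes and combines the partial results:

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum one chunk of the range [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        # "Divide": each chunk goes to a separate process;
        # "conquer": the partial sums are combined at the end.
        total = sum(pool.map(partial_sum, chunks))
    print(total == n * (n - 1) // 2)  # True
```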
Introduction
• Key attributes of multiprocessors:
– A single computer that includes multiple processors
– Processors may communicate at various levels
• Message passing or shared memory
• Multiprocessor and multicomputer systems
– A multicomputer system consists of several autonomous computers that may or may not communicate with each other.
– A multiprocessor system is controlled by a single operating system that provides mechanisms for interaction among the processors.
• Architectural models
– Tightly coupled multiprocessors
– Loosely coupled multiprocessors
Tightly Coupled Multiprocessor (Basics)
Architecture (Without Private Cache)
• This model consists of p processors, l memory modules, and d I/O channels.
• Everything is connected through a processor/memory interconnection network (PMIN).
• The PMIN is a switch that can connect every processor to every memory module.
• A memory module can satisfy only one processor's request in a given memory cycle; conflicting requests are arbitrated by the PMIN (see the sketch below).
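As a rough sketch of this arbitration (an illustration with an invented first-come policy, not the PMIN's actual hardware logic), consider one memory cycle in Python:

```python
import random

def arbitrate(requests, num_modules):
    """One memory cycle: each memory module grants one pending request.

    requests: list of (processor, module) pairs issued this cycle.
    Returns the granted pairs; the rest would retry next cycle.
    """
    granted = {}
    for proc, mod in requests:
        granted.setdefault(mod, proc)  # first requester wins (simple policy)
    return [(proc, mod) for mod, proc in granted.items()]

p, l = 8, 8
reqs = [(proc, random.randrange(l)) for proc in range(p)]
served = arbitrate(reqs, l)
print(f"{len(served)} of {p} requests served; {p - len(served)} blocked by conflicts")
```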
Tightly Coupled Architecture
• In this system, a first step toward reducing these conflicts is to make l equal to p (i.e., as many memory modules as processors).
• Another way to reduce conflicts is to give each processor an unmapped local memory (ULM), a reserved memory area for that processor.
• Adding the ULM reduces the amount of traffic through the PMIN and thereby reduces conflicts to and from memory (illustrated below).
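To make the benefit concrete, here is a small Monte Carlo estimate under simplified assumptions (uniform random module selection, and a fixed fraction of references absorbed by the ULM; the numbers are illustrative only):

```python
import random

def conflict_rate(p, l, ulm_hit=0.0, cycles=10_000):
    """Average fraction of global (PMIN) requests blocked per cycle."""
    blocked = served = 0
    for _ in range(cycles):
        # References that hit the ULM never reach the PMIN.
        global_reqs = [random.randrange(l)
                       for _ in range(p) if random.random() >= ulm_hit]
        granted = len(set(global_reqs))   # one grant per module per cycle
        served += granted
        blocked += len(global_reqs) - granted
    return blocked / max(blocked + served, 1)

print(f"no ULM:  {conflict_rate(8, 8):.2%} of requests blocked")
print(f"50% ULM: {conflict_rate(8, 8, ulm_hit=0.5):.2%} of requests blocked")
```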
Tightly coupled multiprocessor contd.
[Figure: Tightly coupled multiprocessor without private caches: processors 0 to p-1 connect to the memory modules through the processor/memory interconnection network (PMIN), to I/O channels 0 to d-1 and their disks through the input/output interconnection network (IOPIN), and receive interrupts through the interrupt signal interconnection network (ISIN).]
Tightly coupled multiprocessor contd.
[Figure: Tightly coupled multiprocessor with private caches: the same organization as above, with each processor also given a private cache, a mapped local memory, and an unmapped local memory between itself and the PMIN.]
Tightly coupled multiprocessor contd.
• Processor types
– Homogeneous: all processors perform the same function
– Heterogeneous: processors perform different functions
Note: Two functionally identical processors may still differ in other parameters, such as I/O capability or memory size.
Loosely Coupled Architecture
• Each processor has its own set of I/O devices and its own memory, from which it accesses most of its instructions and data.
• Computer module: processor, I/O interface, and memory.
[Figure: A computer module: the processor (P), its local memory (LM), and input/output (I/O), attached to a channel and arbiter switch (CAS).]
Loosely coupled multiprocessor contd.
• Inter-process communication across modules happens by the exchange of messages over a message transfer system (MTS); see the sketch below.
• This is a distributed system; the degree of coupling is loose.
• The degree of memory conflict is low.
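A minimal message-passing sketch (my own illustration, not from the slides): two "computer modules" run as separate processes and communicate only through queues standing in for MTS links:

```python
from multiprocessing import Process, Queue

def module(name, inbox, outbox):
    """A computer module: local state, communicates only via messages."""
    outbox.put((name, "request: read block 7"))
    sender, msg = inbox.get()
    print(f"{name} received from {sender}: {msg}")

if __name__ == "__main__":
    a_to_b, b_to_a = Queue(), Queue()   # the "MTS" links
    m0 = Process(target=module, args=("module-0", b_to_a, a_to_b))
    m1 = Process(target=module, args=("module-1", a_to_b, b_to_a))
    m0.start(); m1.start()
    m0.join(); m1.join()
```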
[Figure: Loosely coupled multiprocessor: computer modules 0 through N-1, each containing a processor (P), local memory (LM), and I/O behind a CAS, connected by the message transfer system.]
Loosely coupled multiprocessor
• Inter-module communication
– Channel and arbiter switch (CAS)
– The arbiter decides which request proceeds when requests from two or more computer modules collide in accessing a physical segment of the MTS.
Loosely coupled multiprocessor
• Message transfer system (MTS)
– A time-shared bus or shared memory
– The latter can be implemented with a set of memory modules and a processor-memory interconnection network, or with a multiported main memory.
– The MTS largely determines the performance of the multiprocessor system.
Loosely coupled multiprocessor
• For loosely coupled systems that use a single time-shared bus, performance is limited by the message arrival rate on the bus, the message length, and the bus capacity (see the estimate below).
• For loosely coupled systems with shared memory, the limiting factor is memory contention in the processor-memory interconnection network.
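As a back-of-the-envelope illustration (the numbers are invented for the example), single-bus utilization can be estimated as arrival rate times message length divided by bus capacity:

```python
# Rough single time-shared-bus utilization estimate (illustrative numbers).
arrival_rate = 2_000        # messages per second arriving at the bus
message_length = 4_000      # bits per message
bus_capacity = 10_000_000   # bits per second

utilization = arrival_rate * message_length / bus_capacity
print(f"bus utilization: {utilization:.0%}")  # near 100% the bus saturates
```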
Cm* Architecture
• A project at Carnegie Mellon University.
• So what is a computer module here?
[Figure: A Cm computer module: processor (P), Slocal (S), local memory (LM), and I/O.]
• A computer module consists of a processor, an Slocal, local memory, and I/O.
• The Slocal plays a role similar to the CAS in the loosely coupled architecture.
Cluster of Computer Modules
[Figure: A cluster of computer modules, each with its own LM and I/O, and the cluster's connection to the intercluster bus.]
Role of Slocal
• The Slocal receives and interprets its processor's requests for access to local and non-local memory and to the I/O.
• The Slocal allows the local processor to access resources outside its own Cm: it distinguishes local from external references and provides the translation of local addresses.
Address Translation
Kmap Components
• The Slocal uses the four high-order bits of the address, together with one PSW bit, to index the map table (a toy sketch follows).
• The map table determines whether the reference is to local memory or not.
• If the memory is non-local, control is passed to the Kmap via the map bus.
• Each Cm is connected to the Kmap via the map bus.
• The Kmap is responsible for routing data between Slocals.
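A toy sketch of this lookup (my own illustration: the 16-bit address width, table layout, and page size are assumptions; only the four high-order bits plus one PSW bit come from the slides):

```python
def classify(address, psw_bit, map_table):
    """Index the map table with the 4 high-order address bits + 1 PSW bit."""
    high4 = (address >> 12) & 0xF          # 16-bit address assumed
    index = (psw_bit << 4) | high4
    entry = map_table[index]
    if entry["local"]:
        # Local reference: the Slocal relocates it to a physical page.
        return ("local", (entry["page"] << 12) | (address & 0xFFF))
    # Non-local reference: hand off to the Kmap over the map bus.
    return ("kmap", address)

# 32 entries: 2 PSW values x 16 segments; mark segment 0xF as non-local.
table = [{"local": True, "page": i % 16} for i in range(32)]
table[0x0F]["local"] = table[0x1F]["local"] = False
print(classify(0x1234, 0, table))   # ('local', relocated address)
print(classify(0xF234, 0, table))   # ('kmap', 0xF234)
```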
Kmap Components
[Figure: Kmap internals: the Kbus, the Pmap with its RUN queue and its service, return, and send ports, and the Link connecting to intercluster buses 1 and 2; the cluster's Cm's attach below over the map bus.]
Kmap Components
• A request for non-local memory arrives at the Kbus via the map bus.
• The Link manages communication between this Kmap and other Kmaps.
• The Pmap is the mapping processor; it responds to requests passed between the Kbus and the Link.
Kmap Components
• A Kmap can handle 8 processor requests simultaneously.
• The Pmap uses queues to manage outstanding requests.
• A service request is signaled to the Kbus whenever a non-local memory reference is made; the computer module that issued it is called the master Cm.
Kmap Components
• The Kmap fetches the virtual address via the map bus and allocates a context in the Pmap.
• It places the virtual address in the Pmap's RUN queue.
• The Pmap performs the virtual-to-physical address translation.
• Using the physical address, it can initiate a memory access in any Cm.
Kmap Components
• The Kmap services the out request by sending the physical address of the memory reference over the map bus (see the toy model below).
• When the destination Cm completes the memory access, it sends a return signal to the Kmap.
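Pulling these steps together, here is a toy model of the Kmap's request flow (purely illustrative; the real Kmap is a microcoded hardware engine, and the page-table layout here is invented):

```python
from collections import deque

class Kmap:
    """Toy model of the Kmap request flow for non-local references."""
    MAX_CONTEXTS = 8                      # concurrent requests a Kmap handles

    def __init__(self, page_table):
        self.run_queue = deque()          # Pmap's RUN queue
        self.page_table = page_table      # virtual page -> (cm, physical page)

    def service_request(self, master_cm, vaddr):
        if len(self.run_queue) >= self.MAX_CONTEXTS:
            return False                  # no free context; caller retries
        self.run_queue.append((master_cm, vaddr))  # context allocated
        return True

    def step(self):
        """Pmap: translate one queued request and dispatch the out request."""
        master_cm, vaddr = self.run_queue.popleft()
        cm, ppage = self.page_table[vaddr >> 12]
        paddr = (ppage << 12) | (vaddr & 0xFFF)
        # Out request: the physical address is sent over the map bus to the
        # destination Cm; its return signal completes the transaction.
        print(f"master Cm{master_cm}: vaddr {vaddr:#06x} -> Cm{cm} paddr {paddr:#06x}")

kmap = Kmap({0xF: (3, 0x2)})              # virtual page 0xF lives in Cm3
kmap.service_request(master_cm=0, vaddr=0xF234)
kmap.step()                               # master Cm0: 0xf234 -> Cm3 0x2234
```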
Intra-cluster Communication
[Figure: Intra-cluster memory reference: the master Cm's non-local request travels over the map bus to the Kbus, enters the Pmap's RUN queue, and after translation the out request is sent over the map bus to the slave Cm, which performs the access and replies.]
Inter-cluster Communication
[Figure: Inter-cluster memory reference: the master Cm's request goes over its map bus to the master Kmap, crosses the intercluster bus to the slave cluster's Kmap, and is delivered over that cluster's map bus to the slave Cm; the reply retraces the same path.]
Inter-cluster Communication
[Figure: Inter-cluster memory reference with the address formats used along the path: a (Cop | segment | offset) form on the intercluster bus, and (K/U | R/W | Cm# | page | offset) and (R/W | Cm# | page | offset) forms on the map buses.]
Big Data Facts
• Data-intensive applications work with petabytes of data.
• Hadoop is a framework for running applications on large clusters of commodity hardware that produce huge volumes of data and need to process them.
• Hadoop provides distributed processing of big data stored at different physical locations.
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
• Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
Hadoop Includes
• HDFS: a distributed filesystem
• Map/Reduce: a programming model implemented on top of HDFS; it is an offline (batch) computing engine
Hadoop HDFS
• Hardware failure is the norm rather than the exception; detecting faults and recovering from them quickly and automatically is therefore a core design goal of HDFS.
HDFS
• HDFS is designed to run on commodity hardware.
HDFS Internals
[Figure: HDFS internal architecture (diagram not reproduced).]
Hadoop MapReduce
• A software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
• The framework sorts the outputs of the maps, which are then input to the reduce tasks (see the word-count sketch below).
• Typically the compute nodes and the storage nodes are the same.
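To make the map/sort/reduce flow concrete, here is a minimal word-count pair written for the Hadoop Streaming convention (a sketch; Streaming passes tab-separated key/value lines over stdin/stdout, and the script name and invocation are assumptions for the example):

```python
import sys

def mapper():
    """Map: emit (word, 1) for every word on stdin."""
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    """Reduce: sum the counts for each word.

    Hadoop sorts the map output by key, so all lines
    for a given word arrive consecutively.
    """
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Run as: wordcount.py map   or   wordcount.py reduce
    mapper() if sys.argv[1] == "map" else reducer()
```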
Hadoop MapReduce
• The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node.
Hadoop MapReduce
• Applications specify the input/output locations.
• They supply map and reduce functions via implementations of the appropriate interfaces and/or abstract classes.
• The Hadoop job client then submits the job and its configuration to the JobTracker (an example submission follows).
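As a usage illustration, the word-count scripts above could be submitted as a Streaming job along these lines (the HDFS paths and the streaming jar name are assumptions for the example):

```
hadoop jar hadoop-streaming.jar \
    -input  /user/demo/books \
    -output /user/demo/wordcount \
    -mapper  "python wordcount.py map" \
    -reducer "python wordcount.py reduce" \
    -file wordcount.py
```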