Distributed System 2022
All content following this page was uploaded by Rafiqul Zaman Khan on 15 August 2015.
possibly different in nature, each of which contributes processing cycles to the overall system over a network.

Parallel computing needs expensive parallel hardware to coordinate many processors within the same machine, but distributed computing uses already available individual machines, which are cheap enough in today's market.

3. Terminologies Used in Distributed Computing
There are some basic terms and ideas used in distributed computing that must be defined first in order to understand the concept of distributed computing.

3.1 Job
A job is defined as the overall computing entity that needs to be executed to solve the problem at hand [11]. There are different types of jobs depending upon the nature of the computation or the algorithm itself. Some jobs are completely parallel in nature and some are only partially parallel. Completely parallel jobs are known as embarrassingly parallel problems. In an embarrassingly parallel problem, communication among the different entities is minimal, whereas in a partially parallel problem communication becomes high, because the processes running on different nodes must communicate with one another to finish the job.

3.2 Granularity
Simply put, the size of the tasks is expressed as the granularity of parallelism. The grain size of a parallel instruction is a measure of how much work each processor does compared to an elementary instruction execution time [11]. It is equal to the number of serial instructions executed within a task by one processor. There are mainly three grain sizes: fine, medium and coarse.

3.3 Node
A node is an entity that is capable of executing computing tasks. In a traditional parallel system this refers mostly to a physical processor unit within the computer system, but in distributed computing a computer is generally considered to be a computing node in a network [11]. In reality, however, trends have changed: a computer may have more than one core, as in dual-core or multi-core processors. The terms node and processor are used interchangeably in this literature.

distributed computing. Actually, topology defines how the nodes will contribute their computational power towards the tasks [11, 15].

3.6 Overheads
Overheads measure the frequency of communication among processors during execution. During execution, processors communicate with each other to complete the job as early as possible, so communication overheads inevitably take place. There are mainly three types of overhead: bandwidth, latency and response time [11]. The first two are mostly influenced by the network underlying the distributed computer system, and the last is the administrative time taken for the system to respond.

3.7 Bandwidth
Bandwidth measures the amount of data that can be transferred over a communication channel in a finite period of time [11]. It always plays a critical role in system efficiency, and it is especially crucial in the case of fine-grain problems, where more communication takes place. The bandwidth is often far more critical than the speed of the processing nodes: a slow data rate will restrict the processors and ultimately cause poor performance efficiency.

3.8 Latency
Latency refers to the interval between an action being initiated and the action actually having some effect [11]. Latency has different meanings in different situations. For the underlying network, it is the time between data being sent and the data actually being received, called network latency. For a task, it is the time between the task being submitted to a node and the node actually beginning execution of the task, called response time. Network latency is closely related to the bandwidth of the underlying network, and both are critical to the performance of a distributed computing system. Response time and network latency together are often called parallel overhead.

4. Performance Parameters in Distributed Computing
There are many performance parameters that are commonly used for measuring parallel computing performance. Some of them are discussed below.
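The interplay of bandwidth and latency described in Sections 3.7 and 3.8 can be sketched with the usual first-order communication cost model, time per message ≈ latency + message size / bandwidth. The numbers and function below are illustrative assumptions, not measurements from this paper:

```python
def transfer_time(message_bytes, latency_s, bandwidth_bps):
    """First-order cost model: delivery time for one message."""
    return latency_s + message_bytes / bandwidth_bps

# Same total data (10 MB) over a 100 MB/s link with 1 ms latency:
# fine grain sends 10,000 messages of 1 KB; coarse grain sends 10 of 1 MB.
fine   = 10_000 * transfer_time(1_000,     0.001, 100e6)
coarse = 10     * transfer_time(1_000_000, 0.001, 100e6)
print(fine, coarse)  # about 10.1 s vs 0.11 s: latency dominates fine grain
```

This is why, as noted above, bandwidth and latency rather than processor speed often bound the efficiency of fine-grain problems.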
optimize the cost of data movement among tasks. Even a decentralized dynamic mapping uses the information about the task-dependency graph structure to minimize interaction overhead [11, 14, 15].

5.3 The Work Pool Model
In this model the tasks may be assigned to any processor by a dynamic mapping technique for load balancing, in either a centralized or a decentralized fashion [11, 14, 15]. This model does not follow any pre-mapping scheme. The work may be statically available before the computation starts or may be created dynamically. Whatever processes are available or generated are added to the global (possibly distributed) work pool. When dynamic and decentralized mapping is used, it is necessary to use a termination detection algorithm that notifies the other processes of the completion of the entire work, so that the processors can stop looking for more jobs [11, 14, 15].

5.5 The Pipeline or Producer-Consumer Model
In this model the data is passed through a pipeline which has several stages; each stage (process) does some work on the data and passes it to the next stage. This concurrent execution on a data stream by different programs is called stream parallelism [15, 19, 20, 21]. The pipelines may take the form of linear or multidimensional arrays, trees or general graphs. A pipeline is a chain of producers and consumers, because in this model each process generates results for the next process. In general, static mapping is used in this model.

Larger granularity may take longer to fill up the pipeline: the first process may take longer to pass its data to the next stage, so the next process may have to wait longer. Too fine a granularity, on the other hand, may cause more overheads, so this model overlaps interaction with computation to reduce the overheads [11].

5.6 Hybrid Models
Sometimes two or more models are combined to form a hybrid model, shown in Fig. 2, to solve the problem at hand [11]. Many times an algorithm design may need features of more than one algorithm model. For example, the pipeline model may be combined with a task-dependency graph, in which the data passed through the pipeline is driven by the dependency graph [15, 19, 20].

6.3 Access to Geographically Remote Data and Resources
In many instances data cannot be replicated at each site due to its heavy size, and it may also be risky to keep vital data at each site [10, 11, 15]. For example, a banking system's data cannot be replicated everywhere because of its sensitivity, so it is instead stored on a central server which the branch offices access through remote login. Advances in mobile communication also allow the central server to be accessed remotely, which requires distributed protocols and middleware [15].
Int. J. Advanced Networking and Applications 2634
Volume: 07 Issue: 01 Pages: 2630-2635 (2015) ISSN: 0975-0290