
KING KHALID UNIVERSITY

Department of Computer Science


College of Computer Science

Parallel and Distributed Computing (482-CCS-3)

Dr. Mohammad Nadeem Ahmed

Chapter 1
Introduction to Distributed Systems: Transparency in a distributed system,
scalability problems, scaling techniques, pitfalls when developing distributed
systems, cluster computing systems, grid computing systems, transaction
processing systems, Enterprise Application Integration, distributed pervasive
systems.
CENTRALIZED SYSTEMS:

We start with centralized systems because they are the most intuitive and easy to understand and
define.

Centralized systems are systems that use client/server architecture where one or more client nodes are
directly connected to a central server. This is the most commonly used type of system in many
organizations where a client sends a request to a company server and receives the response.

Components of a Centralized System:

• Node (Computer, Mobile, etc.).

• Server.

• Communication link (Cables, Wi-Fi, etc.).
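
The client/server pattern described above can be sketched with plain TCP sockets: a central server node answers requests arriving from client nodes over the communication link. A minimal, illustrative sketch (the request text and reply format are made up):

```python
import socket
import threading

def run_server(server_sock):
    # The central node: accepts one client request and sends a response.
    conn, _ = server_sock.accept()
    request = conn.recv(1024).decode()
    conn.sendall(f"server processed: {request}".encode())
    conn.close()

# The central server listens; port 0 lets the OS pick a free port.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
port = server_sock.getsockname()[1]
threading.Thread(target=run_server, args=(server_sock,), daemon=True).start()

# A client node: sends a request over the communication link and reads the response.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"order #42")
reply = client.recv(1024).decode()
client.close()
print(reply)  # server processed: order #42
```

Note how everything depends on the one server: if it goes down, no client can be served, which is exactly the single point of failure discussed below.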

Examples:

● A DBMS that uses a centralized architecture.

● Fast-food businesses like Burger King, Pizza Hut and McDonald's use a predominantly centralized structure to ensure that control is maintained over their many thousands of outlets.
● Application development using centrally deployed test servers, which makes debugging, deployment and simulation easy.

Characteristics of a Centralized System:

● Presence of a global clock: As the entire system consists of a central node (a server/master) and many client nodes (computers/slaves), all client nodes sync up with the global clock (the clock of the central node).
● One single central unit: A single central unit serves and coordinates all the other nodes in the system.
● Dependent failure of components: Central node failure causes the entire system to fail. This makes sense because when the server is down, no other entity is there to send/receive responses/requests.
● Scaling: Only vertical scaling is possible (e.g., adding more memory or a faster CPU to the central node).
Disadvantages of a Centralized System:

● Highly dependent on network connectivity: The system can fail if the nodes lose connectivity, as there is only one central node.
● Single point of failure: Centralized systems have a single point of failure, which can cause the entire system to fail if the central node goes down.
● Abrupt failure of the entire system: The server can keep only a finite number of open connections to client nodes. So, when high traffic occurs, such as during a shopping sale, the server can effectively suffer the equivalent of a Denial-of-Service or Distributed Denial-of-Service attack.
● Less possibility of data backup: If the server node fails and there is no backup, the data is lost straight away.
● Difficult server maintenance: There is only one server node and, for availability reasons, it is inefficient and unprofessional to take the server down for maintenance. So, updates have to be done on the fly (hot updates), which is difficult, and the system could break.
● Security risks: Centralized systems are more vulnerable to security risks, as the central authority has complete access to all the data.
● Limited scalability: Centralized systems have limited scalability, as the central node can only handle a limited number of clients at a time.
● Limited innovation: Centralized systems can stifle innovation, as the central authority has complete control over the system, which can limit the scope for experimentation and creativity.

DISTRIBUTED SYSTEMS:

It is a collection of autonomous computers linked by a computer network and equipped with distributed system software (middleware).

Components of a Distributed System:

• Node (Computer, Mobile, etc.)

• A communication link (Cables, Wi-Fi, etc.)


Examples:

The Google search system: each request is worked upon by hundreds of computers that crawl the web and return the relevant results. Other examples include telephone networks, cellular networks, Facebook, etc.

Characteristics of a Distributed System:

● Simultaneous processing: Multiple machines can process the same function simultaneously.
● No physical synchronous global clock: A logical clock is used for the global ordering of events from different processes in such a system, or to keep track of time-based events.
● Error detection: Failures can be more easily detected.
● Transparency: A node can access and communicate with other nodes in the system.
● Lower latency than a centralized system: Because the system is geographically spread out, a request can often be served by a nearby node, leading to less time to get a response.
● Scalability: Distributed systems are highly scalable as they can be easily expanded by adding
new nodes to the network. This allows the system to handle a large amount of data and traffic
without compromising on performance.
● Fault tolerance: Distributed systems are fault-tolerant, meaning they can continue to function
even if some nodes fail or go offline. This is because the workload is distributed among multiple
nodes, so the system can continue to operate even if some nodes are down.
● Increased reliability: With multiple nodes in the network, distributed systems are more reliable
than centralized systems. Even if one node fails, there are still other nodes that can provide the
required service.
● Cost-effective: Distributed systems are often more cost-effective than centralized systems, as
they can be built using off-the-shelf hardware and software components. This makes them a
more affordable option for many organizations.
● Improved performance: Distributed systems can achieve higher performance by using parallel
processing techniques, where tasks are split among multiple nodes in the network, and each
node processes its share of the workload simultaneously. This can significantly reduce
processing time and improve overall system performance.
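
The logical-clock idea above (no physical synchronous global clock) is commonly illustrated with Lamport clocks: each process increments a counter on every local event, and on receiving a message advances its counter past the timestamp carried by the message. A minimal sketch:

```python
class LamportClock:
    """Logical clock for ordering events without a physical global clock."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # A send is itself a local event; the timestamp travels with the message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Advance past both our own clock and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
p1.local_event()          # p1's clock: 1
ts = p1.send()            # p1's clock: 2; the message carries timestamp 2
p2.receive(ts)            # p2's clock: max(0, 2) + 1 = 3
print(p1.time, p2.time)   # 2 3
```

The resulting timestamps give a consistent ordering of causally related events even though no two machines share a physical clock.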
Transparency in a distributed System:
In distributed systems, transparency is defined as the masking, from the user and the application programmer, of the separation of components, so that the whole system appears to be a single entity rather than a collection of individual components.

Access Transparency: Access Transparency allows the same operations to be used to access local and remote resources; the distribution of files is hidden from clients.

Location Transparency: Hides where a resource is located. A location-transparent name carries no information about the physical location of the object.

Concurrency Transparency: Concurrency Transparency permits many processes to run in parallel using shared resources without interfering with one another. It hides the fact that a resource may be shared by several competing users.

Replication Transparency: Hides the fact that copies of data exist; numerous instances of a resource are maintained to improve reliability and performance without the user having to know about the replication.

Failure Transparency: Hides the failure and recovery of a resource, allowing users and application programs to complete their tasks even when hardware or software components fail.

Performance Transparency: Performance Transparency enables the system to be reconfigured to improve or enhance performance.

Mobility (Migration) Transparency: Hides the fact that resources may move to another location. Mobility Transparency lets a system or resource move around without disrupting user or software processes.

Scaling (Size) Transparency: Scaling Transparency enables systems and applications to scale up without
requiring changes to the system architecture or application techniques.
Pitfalls when Developing Distributed Systems
It should be clear by now that developing a distributed system is a formidable task. There are so many issues to consider at the same time that it seems that only complexity can be the result. Nevertheless, by following a number of design principles, distributed systems can be developed that strongly adhere to the goals set out in this chapter. Distributed systems differ from traditional software because components are dispersed across a network. Not taking this dispersion into account at design time is what makes so many systems needlessly complex and results in flaws that need to be patched later on. Peter Deutsch, at the time working at Sun Microsystems, formulated these flaws as the following false assumptions that everyone makes when developing a distributed application for the first time:

 The network is reliable
 The network is secure
 The network is homogeneous
 The topology does not change
 Latency (delay) is zero
 Bandwidth is infinite
 Transport cost is zero
 There is one administrator
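
The first assumption, "the network is reliable", is the one most first-time designs trip over. A common defensive pattern is to wrap remote calls in a bounded retry rather than assuming they succeed; a hypothetical sketch (the flaky request here is simulated):

```python
import time

def call_with_retry(remote_call, retries=3, delay=0.01):
    """Retry a possibly-failing remote call instead of assuming the network is reliable."""
    last_error = None
    for attempt in range(retries):
        try:
            return remote_call()
        except ConnectionError as e:
            last_error = e
            time.sleep(delay)  # back off briefly before retrying
    raise last_error           # give up after the retry budget is spent

# Simulated unreliable network: the request fails twice, then succeeds.
attempts = {"n": 0}
def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("packet lost")
    return "response"

result = call_with_retry(flaky_request)
print(result, attempts["n"])  # response 3
```

Note that retries are only safe for idempotent requests; a real design must also consider duplicated effects, which is another consequence of the same false assumption.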

Cluster Computing Systems


A computer cluster is a local network of two or more homogeneous computers; a computation process runs on such a computer network, i.e. the cluster.

In virtually all cases, cluster computing is used for parallel programming in which a single (compute-intensive) program is run in parallel on multiple machines.

● Cluster is a widely used term meaning independent computers combined into a unified system through software and networking.
● Clusters are typically used for High Availability (HA), for greater reliability, and for High Performance Computing (HPC), to provide greater computational power than a single computer can.
● A cluster is composed of many commodity (cheap) computers, linked together by a high-speed dedicated network.
Cluster Categorization:

● High Availability (HA) clusters
● Load Balancing clusters
● High Performance Computing (HPC) clusters
● Grid clusters
High performance Cluster

HPC is a technology that uses clusters of powerful processors, working in parallel, to process massive multi-dimensional datasets (big data) and solve complex problems at extremely high speeds. HPC systems typically perform more than one million times faster than the fastest commodity desktop, laptop or server systems.

For decades the HPC system paradigm was the supercomputer, a purpose-built computer that embodies millions of processors or processor cores. Supercomputers are still with us; at this writing, the fastest supercomputer is the US-based Frontier, with a processing speed of 1.102 exaflops (a quintillion floating point operations per second is one exaflop). But today, more and more organizations are running HPC solutions on clusters of high-speed computer servers, hosted on premises or in the cloud.
Grid Computing

Grid computing can be defined as a network of homogeneous or heterogeneous computers, geographically distributed, working together over a long distance to perform a task that would be difficult for a single machine.

Grid size can be quite large.

Uses: to perform large tasks or solve complex problems that are difficult to handle on a single computer.
For example:

● Meteorologists use grid computing for weather modeling
● Artificial intelligence
● Life science, one of the fastest-growing application areas of grid computing
● Engineering-oriented applications
● Data-oriented applications
● Scientific research collaboration (e-Science)
● Commercial applications

A layered architecture for grid computing systems

1) Fabric Layer: The lowest layer in the grid architecture. All shareable resources are placed in this layer, such as processors, memories, sensors and actuators. In a grid network, grid protocols are responsible for resource control.

2) Connectivity Layer: This layer holds the protocols related to communication and authentication. It consists of communication protocols for supporting grid transactions that span the usage of multiple resources.

3) Resource Layer: All common actions related to individual network parts are handled in this layer, such as negotiation, initiation, monitoring, control, accounting and payment. This layer is responsible for managing a single resource. It uses the functions provided by the connectivity layer and calls directly the interfaces made available by the fabric layer.

4) Collective Layer: It deals with handling access to multiple resources and typically consists of services for resource discovery, allocation and scheduling of tasks onto multiple resources, and data replication.

5) Application Layer: The user interaction layer; users interact with the system via this layer. It consists of the applications that operate within a virtual organization and that make use of the grid computing environment.

Difference between Cluster and Grid Computing

Cluster Computing | Grid Computing

Nodes must be homogeneous, i.e. they should have the same type of hardware and operating system. | Nodes may have different operating systems and hardware; machines can be homogeneous or heterogeneous.

Computers in a cluster are dedicated to the same work and perform no other task. | Computers in a grid contribute their unused processing resources to the grid computing network.

Computers are located close to each other. | Computers may be located at a huge distance from one another.

Computers are connected by a high-speed local area network bus. | Computers are connected using a low-speed bus or the internet.

Computers are connected in a centralized network topology. | Computers are connected in a distributed or decentralized network topology.

Scheduling is controlled by a central server. | It may have servers, but mostly each node behaves independently.

The whole system has a centralized resource manager. | Every node manages its resources independently.

The whole system functions as a single system. | Every node is autonomous, and anyone can opt out anytime.

Cluster computing is used in areas such as WebLogic Application Servers, databases, etc. | Grid computing is used in areas such as predictive modeling, automation, simulations, etc.
Scaling Techniques
A system is said to be scalable if it can handle the addition of users and resources without suffering a
loss of performance or increase in administrative complexity.

The following are the challenges when designing a scalable distributed system:

 Controlling the cost of physical resources
 Controlling performance loss
 Preventing software resources from running out

Commonly used techniques for scaling:

Hiding communication latencies: Important for achieving geographical scalability. The basic idea is simple: try to avoid waiting for responses to remote (and potentially distant) service requests as much as possible.

For example, when a service has been requested at a remote machine, an alternative to waiting for a
reply from the server is to do other useful work at the requester's side.
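
This idea, issuing the remote request, continuing with useful local work, and collecting the reply only when it is actually needed, can be sketched with a thread pool (the simulated latency and workload are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_request(x):
    time.sleep(0.05)  # simulated network latency of a distant server
    return x * 2

with ThreadPoolExecutor() as pool:
    future = pool.submit(remote_request, 21)  # fire the remote request
    local_work = sum(range(1000))             # do other useful work while waiting
    answer = future.result()                  # block only when the reply is needed

print(local_work, answer)  # 499500 42
```

The latency of the remote call is overlapped with local computation instead of being paid as idle waiting time.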

Partition and Distribution:

Distribution involves taking a component, splitting it into smaller parts, and subsequently spreading those parts across the system. An excellent example of distribution is the Internet Domain Name System (DNS): the DNS name space is partitioned into zones, and the names in each zone are handled by a single name server. Without going into too many details, one can think of each path name as being the name of a host on the Internet, and thus associated with a network address of that host.

As another example, consider the World Wide Web. To most users, the Web appears to be an enormous
document-based information system in which each document has its own unique name in the form of
a URL. Conceptually, it may even appear as if there is only a single server. However, the Web is
physically partitioned and distributed across a few hundred million servers, each handling a number of
Web documents. The name of the server handling a document is encoded into that document’s URL. It
is only because of this distribution of documents that the Web has been capable of scaling to its
current size
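
The Web example above, where the server name is encoded into each URL, is a form of partitioning. The same principle can be sketched as hash-based partitioning, in which each document name deterministically maps to one of several servers (the server names below are made up):

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]

def server_for(name):
    """Deterministically map a document name to one server (its partition)."""
    digest = hashlib.sha256(name.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Every client computes the same mapping, so no central lookup table is needed.
docs = ["index.html", "about.html", "news/today.html"]
placement = {doc: server_for(doc) for doc in docs}
print(placement)

# The same name always lands on the same server:
assert server_for("index.html") == placement["index.html"]
```

Because the mapping is a pure function of the name, the system scales by adding documents without any single component holding the complete directory.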

Replication:

Since scalability problems often appear in the form of performance degradation, it is generally a good idea to replicate components across a distributed system. Replication not only increases availability, but also helps to balance the load between components, leading to better performance. Also, in geographically widely-dispersed systems, having a copy nearby can hide much of the communication latency mentioned before.
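
Replication as described here can be sketched as a small key-value store that writes each update to every copy and can serve a read from any surviving copy (an illustrative simplification: real systems must also keep the replicas consistent):

```python
class ReplicatedStore:
    """Writes go to all replicas; a read can be served by any surviving replica."""
    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]

    def write(self, key, value):
        for replica in self.replicas:   # propagate the update to every copy
            replica[key] = value

    def read(self, key, failed=()):
        # Skip failed replicas; any remaining copy can answer the read.
        for i, replica in enumerate(self.replicas):
            if i not in failed and key in replica:
                return replica[key]
        raise KeyError(key)

store = ReplicatedStore()
store.write("config", "v1")
print(store.read("config"))              # v1
print(store.read("config", failed={0}))  # v1 -- still available after a failure
```

Reads can be spread across the copies (load balancing), and the data survives the loss of individual replicas (availability), at the cost of doing every write several times.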

Transaction processing systems

A transaction is a program comprising a collection of database operations, executed as a logical unit of data processing. The operations performed in a transaction include one or more database operations such as insert, delete, update or retrieve. It is an atomic process that is either performed to completion entirely or not performed at all. A transaction involving only data retrieval, without any data update, is called a read-only transaction.
In distributed systems, transactions are often constructed as a number of subtransactions, jointly forming a nested transaction as shown in the figure below.

Fig: A nested transaction

Each high level operation can be divided into a number of low level tasks or operations.

For example, a data update operation can be divided into three tasks −

read_item() − reads a data item from storage into main memory.

modify_item() − changes the value of the item in main memory.

write_item() − writes the modified value from main memory back to storage.

The low-level operations (primitives) performed in a transaction are:

Fig: Primitives of transactions


Properties of Transactions:

Atomicity − This property states that a transaction is an atomic unit of processing: either it is performed in its entirety or it is not performed at all. No partial update should exist. (Example: if the laptop shuts down during payment while booking a ticket, the payment is rolled back and no ticket is issued.)

Consistency − A transaction should take the database from one consistent state to another consistent state. It should not adversely affect any data item in the database. (Example: account balances must reconcile before and after the transaction.)

Isolation − A transaction should be executed as if it were the only one in the system. There should not be any interference from other concurrent transactions running simultaneously; the transaction is not aware of any other transaction. With multiple concurrent transactions, no transaction should impact another.

Durability − If a committed transaction brings about a change, that change should be durable in the database and not lost in case of any failure.
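
The atomicity property in particular can be sketched as a transaction object that buffers its writes and applies them to the database only on commit; on abort, as in the ticket-booking example above, no partial update is visible (an illustrative in-memory sketch, not a real database):

```python
class Transaction:
    """All-or-nothing updates: changes are buffered until commit, discarded on abort."""
    def __init__(self, database):
        self.database = database
        self.pending = {}      # write_item buffers changes here

    def read_item(self, key):
        # A transaction sees its own pending writes, else the database value.
        return self.pending.get(key, self.database.get(key))

    def write_item(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.database.update(self.pending)  # apply all buffered changes at once

    def abort(self):
        self.pending.clear()                # no partial update remains

db = {"seats": 10, "payment": 0}

# Aborted booking: no partial update becomes visible.
t1 = Transaction(db)
t1.write_item("seats", t1.read_item("seats") - 1)
t1.abort()

# Committed booking: both updates take effect together.
t2 = Transaction(db)
t2.write_item("seats", t2.read_item("seats") - 1)
t2.write_item("payment", 100)
t2.commit()

print(db)  # {'seats': 9, 'payment': 100}
```

A real transaction system must additionally make the commit durable on stable storage and coordinate with concurrent transactions, which this sketch deliberately omits.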


Fig: The role of a TP monitor in distributed systems

In the early days of enterprise middleware systems, the component that handled distributed (or nested) transactions formed the core for integrating applications at the server or database level. This component was called a transaction-processing monitor, or TP monitor for short. Its main task was to allow an application to access multiple servers/databases by offering it a transactional programming model, as shown in the figure above. Essentially, the TP monitor coordinated the commitment of subtransactions following a standard protocol known as distributed commit.

Scalability Problems

Characteristics of decentralized algorithms:

• No machine has complete information about the system state.
• Machines make decisions based only on local information.
• Failure of one machine does not ruin the algorithm.
• There is no implicit assumption that a global clock exists.
Enterprise Application Integration
Enterprise application integration (EAI) allows an organization's separate applications to talk with one another, helping to synchronize data and the workforce and greatly improving on the old way of doing things with independent, isolated legacy systems. This section explores the pros and cons of having an EAI system and what it means for a company's customer relationship management, supply chain management, and business performance.

This integration can allow for differing financial applications to interface effectively and process
data or transactions.

Also known as enterprise app integration, this refers to the process of syncing or aligning the
various systems and databases used within a company, network or industry.

Several types of communication middleware exist. With remote procedure calls (RPC), an application component can effectively send a request to another application component by doing a local procedure call.

As the popularity of object technology increased, techniques were developed to allow calls to
remote objects, leading to what is known as remote method invocations (RMI). An RMI is
essentially the same as an RPC, except that it operates on objects instead of functions
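
Python's standard library ships a simple RPC facility (xmlrpc) that illustrates the idea: the client invokes what looks like a local procedure, and the middleware transports the call to the server. A minimal sketch:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: expose an ordinary function as a remotely callable procedure.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.handle_request, daemon=True).start()

# Client side: the remote call reads exactly like a local procedure call.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)
print(result)  # 5
```

The marshalling of arguments, the network transport and the unmarshalling of the result are all handled by the middleware, which is precisely the role depicted in the figure below.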

Figure:- Middleware as a communication facilitator in enterprise application integration.

The most common types of enterprise applications are as follows.

 Enterprise Resource Planning (ERP)


 Enterprise Messaging System
 Payment Processing
 Email marketing platform
 Service desk applications
 Customer relationship management (CRM) tools
 Content management system
 Business continuity planning
 Business analytics and intelligence platform
 Accounting system
 Automated billing system

Application Integration Tools:-

Many application integration tools are available in the market. Some important tools are:-

 Mulesoft
 Tibco Software
 Informatica
 Dell Boomi
 Workato
 Celigo
 Cloud Elements
 InterSystems
 OpenLegacy
 IBM

Distributed Pervasive Systems

Pervasive computing is also called ubiquitous (present everywhere) computing, and it is the new trend toward embedding everyday objects with microprocessors so that they can communicate information. It refers to the presence of computers in common objects found all around us, so that people are unaware of their presence. All these devices communicate with each other over wireless networks without user interaction.

Pervasive computing is a combination of three technologies, namely:

1. Microelectronic technology:
This technology gives small, powerful devices and displays with low energy consumption.

2. Digital communication technology:
This technology provides higher bandwidth and higher data transfer rates at lower costs, with worldwide roaming.

3. Internet standardization:
This standardization is done through various standardization bodies and industry to give the framework for combining all components into an interoperable system with security, service and billing systems.
The Core requirement of Distributed pervasive system

Distribution: As mentioned, a pervasive computing system is an example of a distributed system. The devices and other computers forming the nodes of the system are networked and work together to form the illusion of a single coherent system, so distribution comes naturally: there will be devices close to the user (sensors and actuators; an actuator is a device that causes a machine or other device to operate) connected to a computer that is hidden and operates remotely in the cloud.

Interaction: Interaction between the user and devices is highly unobtrusive (not easily noticed).

Context Awareness: Refers to the idea that computers can both sense and react based on their environment.

Example: - how smartphones react to ambient light, adjusting their screen brightness for optimal
readability
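
That ambient-light behaviour can be sketched as a simple context-aware rule that maps a sensed light level to a screen brightness; the thresholds below are illustrative, not taken from any real device:

```python
def adjust_brightness(ambient_lux):
    """Map an ambient light reading (in lux) to a screen brightness percentage."""
    if ambient_lux < 50:       # dark room: dim the screen
        return 20
    elif ambient_lux < 1000:   # typical indoor lighting
        return 60
    else:                      # bright daylight: maximum brightness
        return 100

readings = [10, 300, 20000]
print([adjust_brightness(lux) for lux in readings])  # [20, 60, 100]
```

The point is that the device decides by sensing its environment, with no explicit user input, which is the essence of context awareness.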
Autonomy: Enables a system to control its own actions independently. Autonomous systems are defined as systems that are self-governing and capable of their own independent decisions and actions.

Example:- Automatic updates, Adding devices, Address allocation

Intelligence: Uses methods and techniques from the field of AI. A wide range of advanced algorithms and models needs to be deployed to handle incomplete inputs and react quickly to changing environments.

Some of the key characteristics of Pervasive Computing are:

 Ultra-fast computing capacity
 Ability to be embedded and be negligibly visible or apparent as an extraneous object
 Low power consumption and battery powered
 In-built memory retention
 Capability to sense the environment and process it quickly and continuously
 Active connectivity through wireless networks and the internet
 Ability to interpret/translate different languages, written or spoken, or via gestures or other triggers
 Interoperability with other devices
 Easily portable
 Operable without any active human supervision

Applications:
There are a rising number of pervasive devices available in the market nowadays. The areas of
application of these devices include:

 Retail
 Airlines booking and check-in
 Sales force automation
 Healthcare
 Tracking
 Car information System
 Email access via WAP (Wireless Application Protocol) and voice.
