Advanced Distributed Systems
Designing a distributed system does not come for free. Several challenges need to be
overcome in order to get an ideal system. The challenges in distributed systems are:
Heterogeneity
This term refers to the diversity of a distributed system in terms of hardware, software,
platform, etc. Modern distributed systems will likely span different:
Networks: local networks, the Internet, wireless networks, satellite links, etc.
Transparency
Designers of distributed systems must hide the complexity of the system as much as they
can. Adding an abstraction layer is particularly useful in distributed systems. When users hit
search on google.com, they never notice that their query goes through a complex process
before Google shows them a result. Some forms of transparency in distributed systems
are:
Openness
If well-defined interfaces for a system are published, it is easier for developers to add
new features or replace sub-systems in the future. Example: Twitter and Facebook have
APIs that allow developers to build their own software that interacts with these platforms.
Concurrency
Distributed systems are usually multi-user environments. In order to maximize
concurrency, resource-handling components should anticipate that they will be accessed
by competing users. Concurrency is a tricky challenge: we must prevent the system
state from becoming unstable when users compete to view or update data.
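As a minimal sketch of the point above, the following Python example protects a shared resource with a lock so that competing users cannot leave it in an inconsistent state. The `Account` class and the `run_competing_withdrawals` helper are hypothetical names chosen for illustration, and threads in a single process stand in for remote users.

```python
import threading


class Account:
    """Hypothetical shared resource accessed by competing users."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        # The lock makes the check and the update one atomic step,
        # so two users can never both spend the same money.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_competing_withdrawals(start_balance: int, amount: int, users: int):
    """Let `users` threads withdraw at once; return (final balance, successes)."""
    account = Account(start_balance)
    results = []

    def user() -> None:
        results.append(account.withdraw(amount))

    threads = [threading.Thread(target=user) for _ in range(users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance, results.count(True)
```

Without the lock, two threads could both pass the balance check before either one subtracts, which is the kind of unstable state the text warns about.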
Security
Every system must adopt strong security measures. Distributed systems often
deal with sensitive information, so secure mechanisms must be in place.
Scalability
Distributed systems must remain scalable as the number of users increases.
Scalability has 3 Dimensions:
Resilience to Failure
Architectural Models
How are responsibilities distributed between system components and how are these
components placed?
Client-server model
The system is structured as a set of processes, called servers, that offer services to the users,
called clients.
The client-server model is usually based on a simple request/reply protocol, implemented with
send/receive primitives or using remote procedure calls (RPC) or remote method invocation
(RMI):
- the client sends a request (invocation) message to the server asking for some service;
- the server does the work and returns a result (e.g. the data requested) or an error code if the
work could not be performed.
A server can itself request services from other servers; thus, in this new relation, the server
itself acts like a client.
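The request/reply exchange described above can be sketched in Python with plain send/receive primitives. This is only an illustration under stated assumptions: the `UPPER` operation, the `OK`/`ERROR` reply format, and the function names are hypothetical, and a local `socket.socketpair()` stands in for a real network connection.

```python
import socket
import threading


def handle_request(request: str) -> str:
    """Hypothetical service: supports a single 'UPPER <text>' operation."""
    operation, _, argument = request.partition(" ")
    if operation == "UPPER":
        return "OK " + argument.upper()
    return "ERROR unknown operation"


def serve_once(conn: socket.socket) -> None:
    """Server side: receive one request, do the work, send one reply."""
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(handle_request(request).encode("utf-8"))


def call(request: str) -> str:
    """Client side: send a request and block until the reply arrives."""
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_once, args=(server_end,))
    server.start()
    with client_end:
        client_end.sendall(request.encode("utf-8"))
        reply = client_end.recv(1024).decode("utf-8")
    server.join()
    return reply
```

A call such as `call("UPPER hello")` returns a result, while an unsupported operation returns an error code, mirroring the two outcomes listed above.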
Peer-to-peer
All processes (objects) play a similar role.
Processes (objects) interact without particular distinction between clients and servers.
File Sharing
File sharing is the simplest and most widely deployed application in P2P systems. A file-sharing
application uses the P2P substrate to discover peers who have a requested file. Once
one or more supplying peers have been found, connections are established between the
supplier(s) and the requester. The application does nothing more than store and provide
files to requesting peers.
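The discovery step can be illustrated with a very small Python sketch. It assumes, purely for illustration, that the P2P substrate exposes a mapping from peer names to the files they hold; real systems discover this information through flooding, a central index, or a distributed hash table. The name `find_suppliers` is hypothetical.

```python
from typing import Dict, List, Set


def find_suppliers(peer_files: Dict[str, Set[str]], filename: str) -> List[str]:
    """Return the peers that hold `filename`, in a stable (sorted) order."""
    return sorted(peer for peer, files in peer_files.items() if filename in files)
```

The requester would then open connections to one or more of the returned peers to fetch the file.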
Media Streaming and High-bandwidth Content Distribution
In file-sharing P2P applications, a client has to download the entire file before starting to use
it. Consider, for example, a one-hour movie recorded at 1 Mb/s and being downloaded by a
client with an in-bound bandwidth of 1.5 Mb/s. Ignoring all protocol overhead and
retransmissions, the client will have to wait for 40 minutes to start watching the movie! Given
that most of the contents distributed over the current P2P systems are multimedia files [17],
P2P media streaming applications have been receiving increasing attention in the research
community [40]. Real-time streaming applications start playing out the requested movie after
a short (e.g., order of seconds) waiting period.
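The 40-minute figure in the example above follows from simple arithmetic: the movie is 3600 s x 1 Mb/s = 3600 Mb, and downloading 3600 Mb at 1.5 Mb/s takes 2400 s. A small Python sketch of the same calculation (the function name is hypothetical, and overhead and retransmissions are ignored as in the text):

```python
def download_wait_minutes(duration_s: float, bitrate_mbps: float,
                          bandwidth_mbps: float) -> float:
    """Minutes needed to download a whole media file before playback can start."""
    file_size_megabits = duration_s * bitrate_mbps
    return file_size_megabits / bandwidth_mbps / 60


# One-hour movie at 1 Mb/s over a 1.5 Mb/s link.
wait = download_wait_minutes(3600, 1.0, 1.5)
```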
File and Storage Systems
Distributed file systems provide logical functions similar to those provided by a centralized
file server, but they are constructed from physically distributed peers.
Some problems with client-server:
- Centralisation of service leads to poor scaling
- Limitations: capacity of the server; bandwidth of the network connecting the server
Peer-to-peer tries to solve some of the above:
- It distributes shared resources widely
- It shares computing and communication loads
Problems with peer-to-peer:
High complexity due to the need to
- cleverly place individual objects
- retrieve the objects
- maintain a potentially large number of replicas
Based on the expressiveness of the language used to represent subscribers' interests, pub/sub
systems are commonly classified into two types, namely topic-based and content-based [53].
Simply put, both topic-based and content-based pub/sub systems allow a subscriber to issue
subscriptions that declare filtering constraints on the produced publication messages. Only
publications that satisfy these constraints are delivered to a subscriber. These publications are
said to match the client's subscription. Topic-based pub/sub systems support simple
constraints that are based upon a predefined set of topics.
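The idea of subscriptions as filtering constraints can be sketched in Python as follows. This is an illustrative assumption, not a particular pub/sub system's API: publications are modelled as dictionaries, a topic-based subscription compares a `topic` field against a predefined topic name, and a content-based subscription (here, a numeric range on one attribute) inspects the content of the message itself. All names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Publication = Dict[str, Any]
Subscription = Callable[[Publication], bool]


def topic_subscription(topic: str) -> Subscription:
    """Topic-based: match only on a predefined topic name."""
    return lambda publication: publication.get("topic") == topic


def content_subscription(attribute: str, low: float, high: float) -> Subscription:
    """Content-based: match on the value of an attribute inside the message."""
    def matches(publication: Publication) -> bool:
        value = publication.get(attribute)
        return value is not None and low <= value <= high
    return matches


def deliver(publications: List[Publication],
            subscription: Subscription) -> List[Publication]:
    """Only publications that satisfy the constraints reach the subscriber."""
    return [p for p in publications if subscription(p)]
```

For instance, `deliver(quotes, topic_subscription("ACME"))` would return every quote on the hypothetical topic `ACME`, while `deliver(quotes, content_subscription("price", 10, 20))` would return only quotes whose price lies in that range, whatever their topic.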
system provides total ordering guarantee, then all traders will receive stock quotes in the exact
same order. This may be useful in order to ensure an even and unbiased playing field for all
competing traders. Alternatively, the system may provide causal ordering such that the
precedence relationships between messages are preserved. This can be useful to study how
traders react to delivery of market news, for instance.
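One common way to obtain a total order, shown here only as a hedged sketch and not as the mechanism of any specific system, is to have a single sequencer stamp every message with a sequence number and to have each subscriber hold back messages until all earlier ones have arrived. The `Sequencer` and `Subscriber` classes are hypothetical.

```python
from typing import Dict, List, Tuple

Stamped = Tuple[int, str]


class Sequencer:
    """Assigns consecutive sequence numbers to published messages."""

    def __init__(self) -> None:
        self._next = 0

    def stamp(self, message: str) -> Stamped:
        stamped = (self._next, message)
        self._next += 1
        return stamped


class Subscriber:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self) -> None:
        self._expected = 0
        self._held: Dict[int, str] = {}
        self.delivered: List[str] = []

    def receive(self, stamped: Stamped) -> None:
        sequence, message = stamped
        self._held[sequence] = message
        # Release every message whose predecessors have all been delivered.
        while self._expected in self._held:
            self.delivered.append(self._held.pop(self._expected))
            self._expected += 1
```

Even if the network hands the stamped quotes to two traders in different orders, both `delivered` lists end up identical.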
Recovery from failures: Distributed applications are commonly composed of fault-prone
processes and networking components which may cease to operate at any point in time or
become disconnected from one another. For example, service providers may instantaneously
crash at any time or the machines that they run on may be unplugged suddenly. Likewise,
communication links may be unreliable and experience long periods of disconnections.
Occurrence of such failures in an unprepared pub/sub system can significantly hinder its
operation and even permanently disrupt its availability. To recover from such failure
scenarios, the system must have built-in recovery mechanisms that ensure such disruptions are
temporary and do not impact the operation of the system in the long run. In a distributed
pub/sub system, recovery typically involves amending the pub/sub overlay (i.e., maintaining
connectivity among service providers despite failures) and updating routing tables of service
providers accordingly in order to set up new forwarding paths in the network (i.e., in order to
re-route publications).
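The re-routing step can be sketched in Python as a shortest-path search over the overlay that skips failed service providers. This is a simplified illustration that assumes the overlay already contains redundant links; the adjacency-list representation and the name `forwarding_path` are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional


def forwarding_path(overlay: Dict[str, List[str]], source: str, target: str,
                    failed: FrozenSet[str] = frozenset()) -> Optional[List[str]]:
    """Breadth-first search for a path from source to target avoiding failed nodes."""
    if source in failed or target in failed:
        return None
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in overlay.get(node, []):
            if neighbor not in failed and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # the overlay is partitioned; no forwarding path exists
```

When a provider on the current path fails, calling the function again with that provider in `failed` yields a new forwarding path, or `None` if connectivity first has to be repaired in the overlay itself.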
Cloud computing
In the simplest terms, cloud computing means storing and accessing
data and programs over the Internet instead of your computer's hard
drive. The cloud is just a metaphor for the Internet. It goes back to the
days of flowcharts and presentations that would represent the gigantic
server-farm infrastructure of the Internet as nothing but a puffy, white
cumulonimbus cloud, accepting connections and doling out information
as it floats.
What cloud computing is not about is your hard drive. When you store
data on or run programs from the hard drive, that's called local storage
and computing. Everything you need is physically close to you, which
means accessing your data is fast and easy, for that one computer, or
others on the local network. Working off your hard drive is how the
computer industry functioned for decades; some would argue it's still
superior to cloud computing, for reasons I'll explain shortly.
business, you may know all there is to know about what's on the other
side of the connection; as an individual user, you may never have any
idea what kind of massive data-processing is happening on the other
end. The end result is the same: with an online connection, cloud
computing can be done anywhere, anytime.
Common Cloud Examples
The lines between local computing and cloud computing sometimes get
very, very blurry. That's because the cloud is part of almost everything
on our computers these days. You can easily have a local piece of
software (for instance, Microsoft Office 365) that utilizes a form of
cloud computing for storage.
Some other major examples of cloud computing you're probably using:
Google Drive: This is a pure cloud computing service, with all the
storage found online so it can work with the cloud apps: Google Docs,
Google Sheets, and Google Slides. Drive is also available on more
than just desktop computers; you can use it on tablets like
the iPad or on smartphones, and there are separate
apps for Docs and Sheets, as well. In fact, most of Google's services
could be considered cloud computing: Gmail, Google Calendar, Google
Maps, and so on.
Apple iCloud: Apple's cloud service is primarily used for online storage,
backup, and synchronization of your mail, contacts, calendar, and
more. All the data you need is available to you on your iOS, Mac OS,
or Windows device (Windows users have to install the iCloud control
panel). Naturally, Apple won't be outdone by rivals: it offers cloud-based
versions of its word processor (Pages), spreadsheet (Numbers),
and presentations (Keynote) for use by any iCloud subscriber. iCloud is
also the place iPhone users go to utilize the Find My iPhone feature
that's all important when the phone goes missing.
Hybrid services like Box, Dropbox, and SugarSync all say they work in
the cloud because they store a synced version of your files online, but
most also sync those files with local storage. Synchronization to allow
all your devices to access the same data is a cornerstone of the cloud
computing experience, even if you do access the file locally.