1 Introduction

The pace at which computer systems change was, is, and continues to be
overwhelming. From 1945, when the modern computer era began, until about
1985, computers were large and expensive. Moreover, lacking a way to connect
them, these computers operated independently of one another.
Starting in the mid-1980s, however, two advances in technology began to
change that situation. The first was the development of powerful microproces-
sors. Initially, these were 8-bit machines, but soon 16-, 32-, and 64-bit CPUs
became common. With powerful multicore CPUs, we are now again facing
the challenge of adapting and developing programs to exploit parallelism. In
any case, the current generation of machines has the computing power of the
mainframes deployed 30 or 40 years ago, but at 1/1000th of the price or less.
The second development was the invention of high-speed computer net-
works. Local-area networks or LANs allow thousands of machines within a
building to be connected in such a way that small amounts of information can
be transferred in a few microseconds or so. Larger amounts of data can be
moved between machines at rates of billions of bits per second (bps). Wide-area
networks or WANs allow hundreds of millions of machines all over the earth
to be connected at speeds varying from tens of thousands to hundreds of
millions of bps and more.
Parallel to the development of increasingly powerful and networked ma-
chines, we have also been able to witness miniaturization of computer systems,
with perhaps the smartphone as the most impressive outcome. Packed with
sensors, lots of memory, and a powerful multicore CPU, these devices are
nothing less than full-fledged computers. Of course, they also have network-
ing capabilities. Along the same lines, so-called nano computers have become
readily available. These small, single-board computers, often the size of a
credit card, can easily offer near-desktop performance. Well-known examples
include Raspberry Pi and Arduino systems.
And the story continues. As digitalization of our society continues, we
become increasingly aware of how many computers are actually being used,
regularly embedded into other systems such as cars, airplanes, buildings,
bridges, the power grid, and so on. This awareness is, unfortunately, increased
when such systems suddenly turn out to be hackable. For example, in
2021, a fuel pipeline in the United States was effectively shut down by a
ransomware attack. In this case, the computer system consisted of a mix
of sensors, actuators, controllers, embedded computers, servers, etc., all
brought together into a single system. What many of us do not realize, is that
vital infrastructures, such as fuel pipelines, are monitored and controlled by
networked computer systems. Along the same lines, it may be time to start
realizing that a modern car is actually an autonomously operating, mobile
networked computer. In this case, instead of the mobile computer being
carried by a person, we need to deal with the mobile computer carrying
people.

The size of a networked computer system may vary from a handful of
devices, to millions of computers. The interconnection network may be wired,
wireless, or a combination of both. Moreover, these systems are often highly
dynamic, in the sense that computers can join and leave, with the topology
and performance of the underlying network almost continuously changing.
It is difficult to think of computer systems that are not networked. And
as a matter of fact, most networked computer systems can be accessed from
any place in the world because they are hooked up to the Internet. Studying
these systems in order to understand them can easily become exceedingly
complex. In this chapter, we start by shedding some light on what needs to be
understood to build up the bigger picture without getting lost.

1.1 From networked systems to distributed systems


Before we dive into various aspects of distributed systems, let us first consider
what distribution, or decentralization, actually entails.

1.1.1 Distributed versus decentralized systems


When considering various sources, there are quite a few opinions on dis-
tributed versus decentralized systems. Often, the distinction is illustrated by
three different organizations of networked computer systems, as shown in
Figure 1.1, where each node represents a computer system and an edge a
communication link between two nodes.
To what extent such distinctions are useful remains to be seen, especially
when discussions open on the pros and cons of each organization. For
example, it is often stated that centralized organizations do not scale well.
Likewise, distributed organizations are said to be more robust against failures.
As we shall see, none of these claims are generally true.

Figure 1.1: The organization of a (a) centralized, (b) decentralized, and
(c) distributed system, according to various popular sources. We take a
different approach, as figures such as these are really not that meaningful.

downloaded by [email protected] DS 4.01


4 CHAPTER 1. INTRODUCTION

We take a different approach. If we think of a networked computer system
as a collection of computers connected in a network, we can ask ourselves
how these computers even became connected to each other in the first place.
There are roughly two views that one can take.
The first, integrative view, is that there was a need to connect existing
(networked) computer systems to each other. Typically, this happens when
services running on a system need to be made available to users and applica-
tions that were not thought of before. This may happen, for example, when
integrating financial services with project management services, as is often the
case within a single organization. In the scientific-research domain, we have
seen efforts to connect a myriad of often expensive resources (special-purpose
computers, supercomputers, very large database systems, etc.) into what came
to be known as a grid computer.
The second, expansive view is that an existing system required an exten-
sion through additional computers. This view is the one most often related to
the field of distributed systems. It entails expanding a system with computers
to hold resources close to where those resources are needed. An expansion
may also be driven by the need to improve dependability: if one computer
fails, then there are others who can take over. An important type of expansion
is when a service needs to be made available for remote users and applications,
for example, by offering a Web interface or a smartphone application. This
last example also shows that the distinction between an integrative and an
expansive view is not clear-cut.
In both cases, we see that the networked system runs services, where each
service is implemented as a collection of processes and resources spread across
multiple computers. The two views lead to a natural distinction between two
types of networked computer systems:

• A decentralized system is a networked computer system in which processes
and resources are necessarily spread across multiple computers.
• A distributed system is a networked computer system in which processes
and resources are sufficiently spread across multiple computers.

Before we discuss why this distinction is important, let us look at a few
examples of each type of system.
Decentralized systems are mainly related to the integrative view of net-
worked computer systems. They come into being because we want to con-
nect systems, yet may be hindered by administrative boundaries. For exam-
ple, many applications in the artificial-intelligence domain require massive
amounts of data for building reliable predictive models. Normally, data is
brought to the high-performance computers that literally train models before
they can be used. But when data needs to stay within the perimeter of an
organization (and there can be many reasons why this is necessary), we need
to bring the training to the data. The result is known as federated learning,
and is implemented by a decentralized system, where the need for spreading
processes and resources is dictated by administrative policies.
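
To get a feeling for what this means in practice, the following Python sketch
illustrates the core idea of federated averaging. It is only an illustration (the
data, model, and function names are made up): each organization computes a
model update on its own data, and only the updates, never the raw data, are
sent to a coordinator that averages them into a new global model.

    import numpy as np

    def local_update(weights, local_data, local_labels, lr=0.1):
        # One gradient-descent step on a linear model, computed entirely
        # inside the organization's own perimeter: raw data never leaves.
        predictions = local_data @ weights
        gradient = local_data.T @ (predictions - local_labels) / len(local_labels)
        return weights - lr * gradient

    def federated_round(weights, clients):
        # Each client contributes only its updated weights; the coordinator
        # averages them to obtain the next version of the global model.
        updates = [local_update(weights, X, y) for (X, y) in clients]
        return np.mean(updates, axis=0)

    # Hypothetical setup: three organizations, each with private data.
    rng = np.random.default_rng(42)
    clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
    weights = np.zeros(3)
    for _ in range(10):
        weights = federated_round(weights, clients)
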
Another example of a decentralized system is that of a distributed ledger,
also known as a blockchain. In this case, we need to deal with the situation
that participating parties do not trust each other enough to set up simple
schemes for collaboration. Instead, what they do is essentially make transac-
tions among each other fully public (and verifiable) through an append-only ledger
that keeps records of those transactions. The ledger itself is fully spread across
the participants, and the participants are the ones who validate transactions
(of others) before admitting them to the ledger. The result is a decentralized
system in which processes and resources are, indeed, necessarily spread across
multiple computers, in this case due to lack of trust.
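
The essence of such a ledger can be illustrated with a small Python sketch (an
illustration only, not how any particular blockchain is implemented): every
record contains a hash of its predecessor, so tampering with an earlier
transaction invalidates everything that follows and is easily detected by the
other participants.

    import hashlib
    import json

    def record_hash(record):
        # Hash the canonical JSON encoding of a record.
        return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

    def append(ledger, transaction):
        prev = ledger[-1]["hash"] if ledger else "0" * 64
        record = {"transaction": transaction, "prev_hash": prev}
        record["hash"] = record_hash({"transaction": transaction, "prev_hash": prev})
        ledger.append(record)

    def verify(ledger):
        # Every participant can recompute the chain and detect tampering.
        prev = "0" * 64
        for record in ledger:
            expected = record_hash({"transaction": record["transaction"], "prev_hash": prev})
            if record["prev_hash"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True

    ledger = []
    append(ledger, {"from": "alice", "to": "bob", "amount": 10})
    append(ledger, {"from": "bob", "to": "carol", "amount": 4})
    assert verify(ledger)
    ledger[0]["transaction"]["amount"] = 1000   # tampering with a transaction ...
    assert not verify(ledger)                   # ... is immediately detected
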
As a last example of a decentralized system, consider systems that are
naturally geographically dispersed. This occurs typically with systems in
which an actual location needs to be monitored, for example, in the case of
a power plant, a building, a specific natural environment, and so on. The
system that controls the monitors, and where decisions are made, may easily
be placed somewhere other than the location being monitored. One obvious
example is the monitoring and controlling of satellites, but also more mundane
situations such as monitoring and controlling traffic, trains, etc. In these examples,
the necessity for spreading processes and resources comes from a spatial
argument.
As we mentioned, distributed systems are mainly related to the expansive
view of networked computer systems. A well-known example is making use
of e-mail services, such as Google Mail. What often happens is that a user logs
into the system through a Web interface to read and send mails. More often,
however, users configure their personal computer (such as a laptop) to
make use of a specific mail client. To that end, they need to configure a few
settings, such as the incoming and outgoing server. In the case of Google Mail,
these are imap.gmail.com and smtp.gmail.com, respectively. Logically, it seems
as if these two servers will handle all your mail. However, with an estimated
close to 2 billion users as of 2022, it is unlikely that only two computers
can handle all their e-mails (a volume estimated at more than 300 billion
per year, that is, some 10,000 mails per second). Behind the scenes, of course,
the entire Google Mail service has been implemented and spread across many
computers, jointly forming a distributed system. That system has been set
up to make sure that so many users can process their mails (i.e., ensures
scalability), but also that the risk of losing mail because of failures is minimal
(i.e., the system ensures fault tolerance). To the user, however, the image of
just two servers is kept up (i.e., the distribution itself is highly transparent
to the user). The distributed system implementing an e-mail service, such
as Google Mail, typically expands (or shrinks) as dictated by dependability
requirements, in turn, dependent on the number of its users.
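
How little of this a user-side program needs to know can be seen from the
following Python sketch, which uses the standard imaplib module to list
unread mail (the account name and password are placeholders). The client is
configured with nothing more than the single logical name imap.gmail.com;
which of Google's many machines actually answers remains invisible.

    import imaplib

    # One logical server name hides an entire distributed system.
    server = imaplib.IMAP4_SSL("imap.gmail.com")
    server.login("[email protected]", "app-password")   # placeholder credentials
    server.select("INBOX")
    status, message_ids = server.search(None, "UNSEEN")   # IDs of unread mails
    print(status, message_ids)
    server.logout()
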

downloaded by [email protected] DS 4.01


6 CHAPTER 1. INTRODUCTION

An entirely different type of distributed system is formed by the collection
of so-called Content Delivery Networks, or CDNs for short. A well-known
example is Akamai with, in 2022, over 400,000 servers worldwide. We will
discuss the principal workings of CDNs later in Chapter 3. What it boils
down to is that the content of an actual Website is copied and spread across
various servers of the CDN. When visiting a Website, the user is transparently
redirected to a nearby server that holds all or part of the content of that Website.
The choice for which server to direct a user to may depend on many things,
but surely when dealing with streaming content, a server is selected for which
good performance in terms of latency and bandwidth can be guaranteed.
The CDN dynamically ensures that the selected server will have the required
content readily available, as well as update that content when needed, or
remove it from the server when there are no or very few users to service there.
Meanwhile, the user knows nothing about what is going on behind the scenes
(which, again, is a form of distribution transparency). We also see in this
example that content is not copied to all servers, but only to where it makes
sense, that is, sufficiently, and for reasons of performance. CDNs also copy
content to multiple servers to provide high levels of dependability.
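
A simplified impression of server selection is given by the Python sketch below.
The replica names are hypothetical, and real CDNs typically perform the
redirection through DNS or anycast rather than in the client, but the underlying
idea of preferring the replica with the lowest observed latency is the same.

    import socket
    import time

    def latency_ms(host, port=443, timeout=1.0):
        # Rough round-trip estimate: time how long a TCP connect takes.
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (time.perf_counter() - start) * 1000
        except OSError:
            return float("inf")   # unreachable replicas are never selected

    # Hypothetical replica servers holding copies of the same content.
    replicas = ["cdn-eu.example.com", "cdn-us.example.com", "cdn-asia.example.com"]
    best = min(replicas, key=latency_ms)
    print("fetch content from", best)
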
As a final, much smaller distributed system, consider a setup based on a
Network-Attached Storage device, also called a NAS. For domestic use, a
typical NAS consists of 2–4 slots for internal hard disks. The NAS operates
as a file server: it is accessible through a (generally wireless) network for
any authorized device, and as such can offer services like shared storage,
automated backups, streaming media, and so on. The NAS itself can best be
seen as a single computer optimized for storing files, and offering the ability
to easily share those files. The latter is important, and together with multiple
users, we essentially have a setup of a distributed system. The users will
be working with a set of files that are locally (i.e., from their laptop) easily
accessible (in fact, seemingly integrated into the local file system), while also
directly accessible by and for other users. Again, where and how the shared
files are stored is hidden (i.e., the distribution is transparent). Assuming that
sharing files is the goal, then we see that indeed a NAS can provide sufficient
spreading of processes and resources.

Note 1.1 (More information: Are centralized solutions bad?)


There appears to be a stubborn misconception that centralized solutions cannot
scale. Moreover, they are almost always associated with introducing a single
point of failure. Both reasons are often seen to be enough to dismiss centralized
solutions as being a good choice when designing distributed systems.
What many people forget is that a difference should be made between logical
and physical designs. A logically centralized solution can be implemented in a
highly scalable distributed manner. An excellent example is the Domain Name
System (DNS), which we discuss extensively in Chapter 6. Logically, DNS is

organized as a huge tree, where each path from the root to a leaf node represents
a fully qualified name, such as www.distributed-systems.net. It would be a
mistake to think that the root node is implemented by just a single server. In
fact, the root node is implemented by 13 different root servers, each server, in
turn, implemented as a large cluster computer. The physical organization of DNS
also shows that the root is not a single point of failure. Being highly replicated, it
would take serious efforts to bring that root down and so far, all attempts to do
so have failed.
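
The extent to which this physical organization is hidden can be seen from a
one-line Python lookup (a sketch using the standard resolver interface): the
client simply asks for a name and receives an address, regardless of which of
the many replicated servers and caches actually produced the answer.

    import socket

    # One logical name service; physically, the answer may come from any
    # of a large number of replicated name servers and intermediate caches.
    print(socket.gethostbyname("www.distributed-systems.net"))
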
Centralized solutions are not bad just because they seem to be centralized.
In fact, as we shall encounter many times throughout this book, (logically, and
even physically) centralized solutions are often much better than distributed
counterparts for the simple reason that there is a single point of failure. It makes
them much easier to manage, for example, and certainly in comparison where
there may be multiple points of failures. Moreover, that single point of failure
can be hardened against many kinds of failures as well as many kinds of security
attacks. When it comes to being a performance bottleneck, we will also see
that many things can be done to ensure that even that cannot be held against
centralization.
In this sense, let us not forget that centralized solutions have even proven to
be extremely scalable and robust. They are called cloud-based solutions. Again,
their implementations can make use of very sophisticated distributed solutions,
yet even then, we shall see that even those solutions may sometimes need to rely
on a small set of physical machines, if only to guarantee performance.

1.1.2 Why making the distinction is relevant


Why do we make this distinction between decentralized and distributed
systems? It is important to realize that centralized solutions are generally
much simpler, and also simpler along different criteria. Decentralization,
that is, the act of spreading the implementation of a service across multiple
computers because we believe it is necessary, is a decision that needs to be
considered carefully. Indeed, distributed and decentralized solutions are
inherently difficult:

• There are many, often unexpected, dependencies that hinder understanding
the behavior of these systems.

• Distributed and decentralized systems suffer almost continuously from
partial failures: some process or resource, somewhere at one of the
participating computers, is not operating according to expectations.
Discovering such a failure may actually take some time; moreover, such
failures, as well as the recovery from them, should preferably be masked
(i.e., they go unnoticed by users and applications).

downloaded by [email protected] DS 4.01


8 CHAPTER 1. INTRODUCTION

• Much related to partial failures is the fact that in many networked
computer systems, participating nodes, processes, resources, and so
on, come and go. This makes these systems highly dynamic, in turn
requiring forms of automated management and maintenance, in turn
increasing the complexity.

• The fact that distributed and decentralized systems are networked, used
by many users and applications, and often cross multiple administrative
boundaries, makes them particularly vulnerable to security attacks.
Therefore, understanding these systems and their behavior requires
that we understand how they can be, and are, secured. Unfortunately,
understanding security is not that easy.

Our distinction is one between sufficiency and necessity for spreading processes
and resources across multiple computers. Throughout this book, we take
the standpoint that decentralization can never be a goal in itself, and that it
should focus on the sufficiency for spreading processes and resources across
computers. In principle, the less spreading, the better. Yet at the same time,
we need to realize that spreading is sometimes truly necessary, as illustrated
by the examples of decentralized systems. From this standpoint of sufficiency, the
book is truly about distributed systems, and where appropriate, we shall speak
of decentralized systems.
Along the same lines, considering that distributed and decentralized
systems are inherently complex, it is equally important to consider solutions
that are as simple as possible. Therefore, we shall hardly discuss optimizations
to solutions, firmly believing that the impact of their negative contribution to
increased complexity outweighs the importance of their positive contribution
to an increase of any type of performance.

1.1.3 Studying distributed systems


Considering that distributed systems are inherently difficult, it is important to
take a systematic approach toward studying them. One of our major concerns
is that there are so many explicit and implicit dependencies in distributed
systems. For example, there is no such thing as a separate communication
module, or a separate security module. Our approach is to take a look at
distributed systems from a limited number, yet different perspectives. Each
perspective is considered in a separate chapter.

• There are many ways in which distributed systems are organized. We
start our discussion by taking the architectural perspective: what are
common organizations, what are common styles? The architectural
perspective will help in getting a first grip on how various components
of existing systems interact and depend on each other.


• Distributed systems are all about processes. The process perspective is
all about understanding the different forms of processes that occur in
distributed systems, be they threads, their virtualization of hardware
processes, clients, servers, and so on. Processes form the software
backbone of distributed systems, and their understanding is essential
for understanding distributed systems.

• Obviously, with multiple computers at stake, communication between
processes is essential. The communication perspective concerns the
facilities that distributed systems provide to exchange data between
processes. It essentially entails mimicking procedure calls across mul-
tiple computers, high-level message passing with a wealth of semantic
options, and various sorts of communication between sets of processes.

• To make distributed systems work, what happens under the hood, beneath
the applications being executed, is that processes coordinate things.
They jointly coordinate, for example, to compensate for the lack of a global
clock, or to realize mutually exclusive access to shared resources, and so
on. The coordination perspective describes a number of fundamental
coordination tasks that need to be carried out as part of most distributed
systems.

• To access processes and resources, we need naming. In particular,
we need naming schemes that, when used, will lead to the process,
resource, or whatever other type of entity that is being named. As
simple as this may seem, naming not only turns out to be crucial in
distributed systems, there are also many ways in which naming is
supported. The naming perspective focuses entirely on resolving a
name to the access point of the named entity.

• A critical aspect of distributed systems is that they perform well in terms
of efficiency and in terms of dependability. The key instrument for both
aspects is replicating resources. The only problem with replication is
that updates may happen, implying that all copies of a resource need
to be updated as well. It is here, that keeping up the appearance of a
nondistributed system becomes challenging. The consistency and repli-
cation perspective essentially concentrates on the trade-offs between
consistency, replication, and performance.

• We already mentioned that distributed systems are subject to partial
failures. The perspective of fault tolerance dives into the means for
masking failures and their recovery. It has proven to be one of the
toughest perspectives for understanding distributed systems, mainly
because there are so many trade-offs to be made, and also because
completely masking failures and their recovery is provably impossible.

downloaded by [email protected] DS 4.01


10 CHAPTER 1. INTRODUCTION

• As also mentioned, there is no such thing as a nonsecured distributed
system. The security perspective focuses on how to ensure authorized
access to resources. To that end, we need to discuss trust in distributed
systems, along with authentication, namely verifying a claimed identity.
The security perspective comes last, yet later in this chapter we shall
discuss a few basic instruments that are needed to understand the role
of security in the previous perspectives.

1.2 Design goals


Just because it is possible to build distributed systems does not necessarily
mean that it is a good idea. In this section, we discuss four important goals
that should be met to make building a distributed system worth the effort. A
distributed system should make resources easily accessible; it should hide the
fact that resources are distributed across a network; it should be open; and it
should be scalable.

1.2.1 Resource sharing


An important goal of a distributed system is to make it easy for users (and
applications) to access and share remote resources. Resources can be virtually
anything, but typical examples include peripherals, storage facilities, data,
files, services, and networks, to name just a few. There are many reasons for
wanting to share resources. One obvious reason is that of economics. For
example, it is cheaper to have a single high-end reliable storage facility be
shared than having to buy and maintain storage for each user separately.
Connecting users and resources also makes it easier to collaborate and
exchange information, as is illustrated by the success of the Internet with
its simple protocols for exchanging files, mail, documents, audio, and video.
The connectivity of the Internet has allowed geographically widely dispersed
groups of people to work together by all kinds of groupware, that is, software
for collaborative editing, teleconferencing, and so on, as is illustrated by
multinational software-development companies that have outsourced much
of their code production to Asia, but also by the myriad of collaboration tools
that became (more easily) available due to the COVID-19 pandemic.
Resource sharing in distributed systems is also illustrated by the success
of file-sharing peer-to-peer networks like BitTorrent. These distributed sys-
tems make it simple for users to share files across the Internet. Peer-to-peer
networks are often associated with distribution of media files such as au-
dio and video. In other cases, the technology is used for distributing large
amounts of data, as in the case of software updates, backup services, and data
synchronization across multiple servers.
Seamless integration of resource-sharing facilities in a networked environ-
ment is also now commonplace. A group of users can simply place files into a
special shared folder that is maintained by a third party somewhere on the In-
ternet. Using special software, the shared folder is barely distinguishable from
other folders on a user’s computer. In effect, these services replace the use of
a shared directory on a local distributed file system, making data available
to users independent of the organization they belong to, and independent of
where they are. The service is offered for different operating systems. Where
exactly data are stored is completely hidden from the end user.

1.2.2 Distribution transparency


An important goal of a distributed system is to hide the fact that its processes
and resources are physically distributed across multiple computers, possibly
separated by large distances. In other words, it tries to make the distribution
of processes and resources transparent, that is, invisible, to end users and
applications. As we shall discuss more extensively in Chapter 2, achieving
distribution transparency is realized through what is known as middleware,
sketched in Figure 1.2 (see Gazis and Katsiri [2022] for a first introduction).
In essence, what applications get to see is the same interface everywhere,
whereas behind that interface, where and how processes and resources are
and how they are accessed is kept transparent. There are different types of
transparency, which we discuss next.

Types of distribution transparency


The concept of transparency can be applied to several aspects of a distributed
system, of which the most important ones are listed in Figure 1.3. We use the
term object to mean either a process or a resource.
Figure 1.2: Realizing distribution transparency through a middleware layer.

Transparency   Description
Access         Hide differences in data representation and how an object
               is accessed
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while
               in use
Migration      Hide that an object may move to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent
               users
Failure        Hide the failure and recovery of an object

Figure 1.3: Different forms of transparency in a distributed system (see ISO
[1995]). An object can be a resource or a process.

Access transparency deals with hiding differences in data representation
and the way that objects can be accessed. At a basic level, we want to hide
differences in machine architectures, but more important is that we reach
agreement on how data is to be represented by different machines and operating
systems. For example, a distributed system may have computer systems
that run different operating systems, each having their own file-naming
conventions. Differences in naming conventions, differences in file operations, or
differences in how low-level communication with other processes is to take
place, are examples of access issues that should preferably be hidden from
users and applications.
An important group of transparency types concerns the location of a
process or resource. Location transparency refers to the fact that users
cannot tell where an object is physically located in the system. Naming
plays an important role in achieving location transparency. In particular,
location transparency can often be achieved by assigning only logical names
to resources, that is, names in which the location of a resource is not secretly
encoded. An example of such a name is the uniform resource locator
(URL) https://www.distributed-systems.net/, which gives no clue about
the actual location of the Web server where this book is offered. The URL also
gives no clue whether files at that site have always been at their current location
or were recently moved there. For example, the entire site may have been
moved from one data center to another, yet users should not notice. The latter
is an example of relocation transparency, which is becoming increasingly
important in the context of cloud computing: the phenomenon by which
services are provided by huge collections of remote servers. We return to
cloud computing in subsequent chapters, and, in particular, in Chapter 2.
Where relocation transparency refers to being moved by the distributed
system, migration transparency is offered by a distributed system when it
supports the mobility of processes and resources initiated by users, with-
out affecting ongoing communication and operations. A typical example
is communication between mobile phones: regardless of whether two people
are actually moving, mobile phones will allow them to continue their con-
versation. Other examples that come to mind include online tracking and
tracing of goods as they are being transported from one place to another,
and teleconferencing (partly) using devices that are equipped with mobile
Internet.

As we shall see, replication plays an important role in distributed systems.
For example, resources may be replicated to increase availability or to im-
prove performance by placing a copy close to the place where it is accessed.
Replication transparency deals with hiding the fact that several copies of a
resource exist, or that several processes are operating in some form of lockstep
mode so that one can take over when another fails. To hide replication from
users, it is necessary that all replicas have the same name. Consequently,
a system that supports replication transparency should generally support
location transparency as well, because it would otherwise be impossible to
refer to replicas at different locations.

We already mentioned that an important goal of distributed systems is
to allow sharing of resources. In many cases, sharing resources is done
cooperatively, as in the case of communication channels. However, there are
also many examples of competitive sharing of resources. For example, two
independent users may each have stored their files on the same file server
or may be accessing the same tables in a shared database. In such cases, it
is important that each user does not notice that the other is making use of
the same resource. This phenomenon is called concurrency transparency.
An important issue is that concurrent access to a shared resource leaves that
resource in a consistent state. Consistency can be achieved through locking
mechanisms, by which users are, in turn, given exclusive access to the desired
resource. A more refined mechanism is to make use of transactions, but these
may be difficult to implement in a distributed system, notably when scalability
is an issue.
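
The principle of locking can be illustrated with a small Python sketch using
threads within a single process (in a truly distributed setting, the lock itself
would have to be realized by a distributed algorithm, a topic we return to in
later chapters). The shared balance stands in for any shared resource.

    import threading

    balance = 100
    lock = threading.Lock()

    def withdraw(amount):
        global balance
        # The lock serializes access, so concurrent users never observe,
        # or create, an inconsistent intermediate state.
        with lock:
            if balance >= amount:
                balance -= amount

    threads = [threading.Thread(target=withdraw, args=(30,)) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(balance)   # never becomes negative
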

Last, but certainly not least, it is important that a distributed system
provides failure transparency. This means that a user or application does not
notice that some piece of the system fails to work properly, and that the system
subsequently (and automatically) recovers from that failure. Masking failures
is one of the hardest issues in distributed systems and is even impossible
when certain apparently realistic assumptions are made, as we will discuss
in Chapter 8. The main difficulty in masking and transparently recovering
from failures lies in the inability to distinguish between a dead process and a
painfully slowly responding one. For example, when contacting a busy Web
server, a browser will eventually time out and report that the Web page is
unavailable. At that point, the user cannot tell whether the server is actually
down or that the network is badly congested.
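
The dilemma shows up in even the simplest client code. The Python sketch
below (the URL is a placeholder) masks a possible transient failure by retrying
a few times, but once the final timeout expires it still cannot tell whether the
server has crashed or is merely very slow.

    import urllib.request
    import urllib.error

    def fetch(url, attempts=3, timeout=2.0):
        for _ in range(attempts):
            try:
                with urllib.request.urlopen(url, timeout=timeout) as response:
                    return response.read()
            except (urllib.error.URLError, TimeoutError):
                # Dead server or congested network? We cannot tell; all we
                # can do is retry and, eventually, give up.
                pass
        raise RuntimeError("service unavailable (or just very slow)")

    page = fetch("https://www.example.com/")   # placeholder URL
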

downloaded by [email protected] DS 4.01


14 CHAPTER 1. INTRODUCTION

Degree of distribution transparency


Although distribution transparency is generally considered preferable for any
distributed system, there are situations in which blindly attempting to hide
all distribution aspects from users is not a good idea. A simple example is
requesting your electronic newspaper to appear in your mailbox before 7 AM
local time, as usual, while you are currently at the other end of the world
living in a different time zone. Your morning paper will not be the morning
paper you are used to.
Likewise, a wide-area distributed system that connects a process in San
Francisco to a process in Amsterdam cannot be expected to hide the fact
that Mother Nature will not allow it to send a message from one process to
the other in less than approximately 35 milliseconds. Practice shows that it
actually takes several hundred milliseconds using a computer network. Signal
transmission is not only limited by the speed of light, but also by limited
processing capacities and delays in the intermediate switches.
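
A back-of-the-envelope calculation makes this plausible. Assuming a
great-circle distance of roughly 8,800 km between the two cities, a signal
traveling at the speed of light needs at least 8,800 km / 300,000 km/s, which is
about 29 ms, one way. Signals in fiber propagate at roughly two-thirds of that
speed and rarely follow the shortest path, so the several hundred milliseconds
observed in practice should not come as a surprise.
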
There is also a trade-off between a high degree of transparency and the
performance of a system. For example, many Internet applications repeatedly
try to contact a server before finally giving up. Consequently, attempting to
mask a transient server failure before trying another one may slow down the
system as a whole. In such a case, it may have been better to give up earlier,
or at least let the user cancel the attempts to make contact.
Another example is where we need to guarantee that several replicas,
located on different continents, must be consistent all the time. In other words,
if one copy is changed, that change should be propagated to all copies before
allowing any other operation. A single update operation may now even take
seconds to complete, something that cannot be hidden from users.
Finally, there are situations in which it is not at all obvious that hiding
distribution is a good idea. As distributed systems are expanding to devices
that people carry around and where the very notion of location and context
awareness is becoming increasingly important, it may be best to actually expose
distribution rather than trying to hide it. An obvious example is making use
of location-based services, which can often be found on mobile phones, such
as finding a nearest shop or any nearby friends.
There are other arguments against distribution transparency. Recognizing
that full distribution transparency is simply impossible, we should ask our-
selves whether it is even wise to pretend that we can achieve it. It may be much
better to make distribution explicit so that the user and application developer
are never tricked into believing that there is such a thing as transparency. The
result will be that users will much better understand the (sometimes unex-
pected) behavior of a distributed system, and are thus much better prepared
to deal with this behavior.
The conclusion is that aiming for distribution transparency may be a
nice goal when designing and implementing distributed systems, but that
it should be considered together with other issues such as performance
and comprehensibility. The price for achieving full transparency may be
surprisingly high.

Note 1.2 (Discussion: Against distribution transparency)


Several researchers have argued that hiding distribution will lead to only further
complicating the development of distributed systems, exactly for the reason that
full distribution transparency can never be achieved. A popular technique for
achieving access transparency is to extend procedure calls to remote servers. How-
ever, Waldo et al. [1997] already pointed out that attempting to hide distribution
by such remote procedure calls can lead to poorly understood semantics, for
the simple reason that a procedure call does change when executed over a faulty
communication link.
As an alternative, various researchers and practitioners are now arguing for
less transparency, for example, by more explicitly using message-style commu-
nication, or more explicitly posting requests to, and getting results from remote
machines, as is done on the Web when fetching pages. Such solutions will be
discussed in detail in the next chapter.
A somewhat radical standpoint was taken by Wams [2012] by stating that
partial failures preclude relying on the successful execution of a remote service. If
such reliability cannot be guaranteed, it is then best to always perform only local
executions, leading to the copy-before-use principle. According to this principle,
data can be accessed only after they have been transferred to the machine of the
process wanting that data. Moreover, modifying a data item should not be done.
Instead, it can only be updated to a new version. It is not difficult to imagine
that many other problems will surface. However, Wams shows that many existing
applications can be retrofitted to this alternative approach without sacrificing
functionality.

1.2.3 Openness
Another important goal of distributed systems is openness. An open dis-
tributed system is essentially a system that offers components that can easily
be used by, or integrated into other systems. At the same time, an open
distributed system itself will often consist of components that originate from
elsewhere.

Interoperability, composability, and extensibility


To be open means that components should adhere to standard rules that
describe the syntax and semantics of what those components have to offer (i.e.,
which service they provide). A general approach is to define services through
interfaces using an Interface Definition Language (IDL). Interface definitions
written in an IDL nearly always capture only the syntax of services. In other
words, they specify precisely the names of the functions that are available
together with types of the parameters, return values, possible exceptions that
can be raised, and so on. The hard part is specifying precisely what those
services do, that is, the semantics of interfaces. In practice, such specifications
are given in an informal way by natural language.
If properly specified, an interface definition allows an arbitrary process
that needs a certain interface, to talk to another process that provides that
interface. It also allows two independent parties to build entirely different
implementations of those interfaces, leading to two separate components that
operate in exactly the same way.
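
A minimal impression of what an interface definition does, and does not, pin
down can be given with a Python sketch (the service and its operations are
made up for illustration): the syntax, i.e., the operation names and parameter
types, is fixed, while the semantics are only described informally in the
accompanying comments.

    from abc import ABC, abstractmethod

    class FileService(ABC):
        """Hypothetical interface definition: it fixes the syntax of the
        service; the intended semantics are described only informally."""

        @abstractmethod
        def read(self, path: str, offset: int, length: int) -> bytes:
            """Return at most `length` bytes of `path`, starting at `offset`.
            What happens if the file is written to concurrently is left open."""

        @abstractmethod
        def write(self, path: str, offset: int, data: bytes) -> int:
            """Write `data` at `offset` and return the number of bytes
            written (again, the exact failure semantics remain unspecified)."""

    # Two independent parties can now build entirely different, yet
    # interoperable, implementations against this single definition.
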
Proper specifications are complete and neutral. Complete means that
everything that is necessary to make an implementation has indeed been
specified. However, many interface definitions are not at all complete, so
that it is necessary for a developer to add implementation-specific details.
Just as important is the fact that specifications do not prescribe what an
implementation should look like; they should be neutral.
As pointed out in Blair and Stefani [1998], completeness and neutrality are
important for interoperability and portability. Interoperability characterizes
the extent to which two implementations of systems or components from
different manufacturers can co-exist and work together by merely relying
on each other’s services as specified by a common standard. Portability
characterizes to what extent an application developed for a distributed system
A can be executed, without modification, on a different distributed system B
that implements the same interfaces as A.
Another important goal for an open distributed system is that it should
be easy to configure the system out of different components (possibly from
different developers). Moreover, it should be easy to add new components or
replace existing ones without affecting those components that stay in place.
In other words, an open distributed system should also be extensible. For
example, in an extensible system, it should be relatively easy to add parts that
run on a different operating system, or even to replace an entire file system.
Relatively simple examples of extensibility are plug-ins for Web browsers, but
also those for Websites, such as the ones used for WordPress.

Note 1.3 (Discussion: Open systems in practice)


Of course, what we have just described is an ideal situation. Practice shows that
many distributed systems are not as open as we would like, and that still a lot
of effort is needed to put various bits and pieces together to make a distributed
system. One way out of the lack of openness is to simply reveal all the gory
details of a component and to provide developers with the actual source code.
This approach is becoming increasingly popular, leading to so-called open-source
projects, where large groups of people contribute to improving and debugging
systems. Admittedly, this is as open as a system can get, but whether it is the best
way is questionable.


Separating policy from mechanism


To achieve flexibility in open distributed systems, it is crucial that the system
be organized as a collection of relatively small and easily replaceable or
adaptable components. This implies that we should provide definitions of not
only the highest-level interfaces, that is, those seen by users and applications,
but also definitions for interfaces to internal parts of the system and describe
how those parts interact. This approach is relatively new. Many older and
even contemporary systems are constructed using a monolithic approach
in which components are only logically separated but implemented as one,
huge program. This approach makes it hard to replace or adapt a component
without affecting the entire system. Monolithic systems thus tend to be closed
instead of open.
The need for changing a distributed system is often caused by a component
that does not provide the optimal policy for a specific user or application.
As an example, consider caching in Web browsers. There are many different
parameters that need to be considered:

Storage: Where is data to be cached? Typically, there will be an in-memory
cache next to storage on disk. In the latter case, the exact position in the
local file system needs to be considered.

Exemption: When the cache fills up, which data is to be removed so that
newly fetched pages can be stored?

Sharing: Does each browser make use of a private cache, or is a cache to be
shared among browsers of different users?

Refreshing: When does a browser check if cached data is still up-to-date?
Caches are most effective when a browser can return pages without
having to contact the original Website. However, this bears the risk of
returning stale data. Note also that refresh rates are highly dependent
on which data is actually cached: whereas timetables for trains hardly
change, this is not the case for Web pages showing current highway-
traffic conditions, or worse yet, stock prices.

What we need is a separation between policy and mechanism. In the case
of Web caching, for example, a browser should ideally provide facilities for
only storing documents (i.e., a mechanism) and at the same time allow users
to decide which documents are stored and for how long (i.e., a policy). In
practice, this can be implemented by offering a rich set of parameters that the
user can set (dynamically). When taking this a step further, a browser may
even offer facilities for plugging in policies that a user has implemented as a
separate component.
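
The idea can be made concrete with a small Python sketch (a toy cache, not how
any actual browser is implemented): the cache class provides only the mechanism
for storing and looking up documents, while the eviction policy is a replaceable
function that the user plugs in.

    class Cache:
        """Mechanism only: store, look up, and evict documents."""

        def __init__(self, capacity, eviction_policy):
            self.capacity = capacity
            self.eviction_policy = eviction_policy   # the policy is plugged in
            self.entries = {}                        # url -> (document, metadata)

        def put(self, url, document, metadata):
            if len(self.entries) >= self.capacity:
                victim = self.eviction_policy(self.entries)
                del self.entries[victim]
            self.entries[url] = (document, metadata)

        def get(self, url):
            return self.entries.get(url)

    # Two interchangeable policies deciding which entry to remove.
    def evict_oldest(entries):
        return min(entries, key=lambda url: entries[url][1]["fetched_at"])

    def evict_largest(entries):
        return max(entries, key=lambda url: len(entries[url][0]))

    cache = Cache(capacity=100, eviction_policy=evict_oldest)
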

downloaded by [email protected] DS 4.01


18 CHAPTER 1. INTRODUCTION

Note 1.4 (Discussion: Is a strict separation really what we need?)


In theory, strictly separating policies from mechanisms seems to be the way to go.
However, there is an important trade-off to consider: the stricter the separation, the
more we need to make sure that we offer the appropriate collection of mechanisms.
In practice, this means that a rich set of features is offered, in turn leading to many
configuration parameters. As an example, the popular Firefox browser comes
with a few hundred configuration parameters. Just imagine how the configuration
space explodes when considering large distributed systems consisting of many
components. In other words, strict separation of policies and mechanisms may
lead to highly complex configuration problems.
One option to alleviate these problems is to provide reasonable defaults, and
this is what often happens in practice. An alternative approach is one in which
the system observes its own usage and dynamically changes parameter settings.
This leads to what are known as self-configurable systems. Nevertheless, the
fact alone that many mechanisms need to be offered to support a wide range of
policies often makes coding distributed systems very complicated. Hard-coding
policies into a distributed system may reduce complexity considerably, but at the
price of less flexibility.

1.2.4 Dependability
As its name suggests, dependability refers to the degree that a computer
system can be relied upon to operate as expected. In contrast to single-
computer systems, dependability in distributed systems can be rather intricate
due to partial failures: somewhere there is a component failing while the
system as a whole still seems to be living up to expectations (up to a certain
point or moment). Although single-computer systems can also suffer from
failures that do not appear immediately, having a potentially large collection
of networked computer systems complicates matters considerably. In fact, one
should assume that at any time, there are always partial failures occurring.
An important goal of distributed systems is to mask those failures, as well as
mask the recovery from those failures. This masking is the essence of being
able to tolerate faults, accordingly referred to as fault tolerance.

Basic concepts
Dependability is a term that covers several useful requirements for distributed
systems, including the following [Kopetz and Verissimo, 1993]:

• Availability
• Reliability
• Safety
• Maintainability


Availability is defined as the property that a system is ready to be used
immediately. In general, it refers to the probability that the system is operating
correctly at any given moment and is available to perform its functions on
behalf of its users. In other words, a highly available system is one that will
most likely be working at a given instant in time.
Reliability refers to the property that a system can run continuously
without failure. In contrast to availability, reliability is defined in terms of a
time interval instead of an instant in time. A highly reliable system is one
that will most likely continue to work without interruption during a relatively
long period of time. This is a subtle but important difference when compared
to availability. If a system goes down on average for one, seemingly random
millisecond every hour, it has an availability of more than 99.9999 percent,
but is still unreliable. Similarly, a system that never crashes but is shut down
for two specific weeks every August has high reliability but only 96 percent
availability. The two are not the same.
Safety refers to the situation that when a system temporarily fails to
operate correctly, no catastrophic event happens. For example, many process-
control systems, such as those used for controlling nuclear power plants or
sending people into space, are required to provide a high degree of safety. If
such control systems temporarily fail for only a very brief moment, the effects
could be disastrous. Many examples from the past (and probably many more
yet to come) show how hard it is to build safe systems.
Finally, maintainability refers to how easily a failed system can be repaired.
A highly maintainable system may also show a high degree of availability,
especially if failures can be detected and repaired automatically. However, as
we shall see, automatically recovering from failures is easier said than done.
Traditionally, fault-tolerance has been related to the following three metrics:

• Mean Time To Failure (MTTF): The average time until a component fails.
• Mean Time To Repair (MTTR): The average time needed to repair a component.
• Mean Time Between Failures (MTBF): Simply MTTF + MTTR.

Note that these metrics make sense only if we have an accurate notion of
what a failure actually is. As we will encounter in Chapter 8, identifying the
occurrence of a failure may actually not be so obvious.
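
These metrics relate directly to availability. Under the usual steady-state
interpretation, and assuming failures are detected immediately, long-run
availability can be estimated as MTTF / (MTTF + MTTR). The small computation
below applies this to the one-millisecond-per-hour example given earlier.

    def availability(mttf, mttr):
        # Fraction of time the system is up, in the long run.
        return mttf / (mttf + mttr)

    # The earlier example: down for 1 ms, on average, every hour.
    mttf = 3600.0   # seconds of correct operation between failures
    mttr = 0.001    # seconds needed to recover
    print(availability(mttf, mttr))   # ~0.9999997, i.e., more than 99.9999%
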

Faults, errors, failures


A system is said to fail when it cannot meet its promises. In particular, if a
distributed system is designed to provide its users with several services, the
system has failed when one or more of those services cannot be (completely)
provided. An error is a part of a system’s state that may lead to a failure. For
example, when transmitting packets across a network, it is to be expected that
some packets have been damaged when they arrive at the receiver. Damaged
in this context means that the receiver may incorrectly sense a bit value (e.g.,
reading a 1 instead of a 0), or may even be unable to detect that something
has arrived.
The cause of an error is called a fault. Clearly, finding out what caused an
error is important. For example, a wrong or bad transmission medium may
easily cause packets to be damaged. In this case, it is relatively easy to remove
the fault. However, transmission errors may also be caused by bad weather
conditions, such as in wireless networks. Changing the weather to reduce or
prevent errors is a bit trickier.
As another example, a crashed program is clearly a failure, which may
have happened because the program entered a branch of code containing
a programming bug (i.e., a programming error). The cause of that bug is
typically a programmer. In other words, the programmer is the cause of the
error (programming bug), in turn leading to a failure (a crashed program).
Building dependable systems closely relates to controlling faults. As
explained by Avizienis et al. [2004], a distinction can be made between pre-
venting, tolerating, removing, and forecasting faults. For our purposes, the
most important issue is fault tolerance, meaning that a system can provide
its services even in the presence of faults. For example, by applying error-
correcting codes for transmitting packets, it is possible to tolerate, to a certain
extent, relatively poor transmission lines and reduce the probability that an
error (a damaged packet) may lead to a failure.
Faults are generally classified as transient, intermittent, or permanent.
Transient faults occur once and then disappear. If the operation is repeated,
the fault goes away. A bird flying through the beam of a microwave transmitter
may cause lost bits on some network (not to mention a roasted bird). If the
transmission times out and is retried, it will probably work the second time.
An intermittent fault occurs, then vanishes on its own accord, then reap-
pears, and so on. A loose contact on a connector will often cause an inter-
mittent fault. Intermittent faults cause a great deal of aggravation because
they are difficult to diagnose. Typically, when the fault doctor shows up, the
system works fine.
A permanent fault is one that continues to exist until the faulty compo-
nent is replaced. Burnt-out chips, software bugs, and disk-head crashes are
examples of permanent faults.
Dependable systems are also required to provide security, especially in
terms of confidentiality and integrity. Confidentiality is the property that
information is disclosed only to authorized parties, while integrity relates to
ensuring that alterations to various assets can be made only in an authorized
way. Indeed, can we speak of a dependable system when confidentiality and
integrity are not in place? We return to security next.


1.2.5 Security
A distributed system that is not secure is not dependable. As mentioned,
special attention is needed to ensure confidentiality and integrity, both of
which are directly coupled to authorized disclosure and access of information
and resources. In any computer system, authorization is done by checking
whether an identified entity has proper access rights. In turn, this means
that the system should know it is indeed dealing with the proper entity. For
this reason, authentication is essential: verifying the correctness of a claimed
identity. Equally important is the notion of trust. If a system can positively
authenticate a person, what is that authentication worth if the person cannot
be trusted? For this reason alone, proper authorization is important, as it
may be used to limit any damage that a person, who could in hindsight not
be trusted, can cause. For example, in financial systems, authorization may
limit the amount of money a person is allowed to transfer between various
accounts. We will discuss trust, authentication, and authorization at length in
Chapter 9.

Key elements needed to understand security

An essential technique for making distributed systems secure is cryptography.
This is not the place in this book to extensively discuss cryptography (which
we also defer until Chapter 9), yet to understand how security fits into various
perspectives in the following chapters, we informally introduce some of its
basic elements.
Keeping matters simple, security in distributed systems is all about en-
crypting and decrypting data using security keys. The easiest way of consid-
ering a security key K is to see it as a function operating on some data data.
We use the notation K (data) to express the fact that the key K operates on
data.
There are two ways of encrypting and decrypting data. In a symmetric
cryptosystem, encryption and decryption takes place with a single key. Denot-
ing by EK (data) the encryption of data using key EK , and likewise DK (data)
for decryption with key DK , then in a symmetric cryptosystem, the same key
is used for encryption and decryption, i.e.,

if data = DK(EK(data)), then DK = EK.

Note that in a symmetric cryptosystem, the key will need to be kept secret by
all parties that are authorized to encrypt or decrypt data. In an asymmetric
cryptosystem, the keys used for encryption and decryption are different.
In particular, there is a public key PK that can be used by anyone, and a
secret key SK that is, as its name suggests, to be kept secret. Asymmetric
cryptosystems are also called public-key systems.
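
As a concrete, if minimal, illustration of a symmetric cryptosystem, consider
the following Python sketch. The use of the third-party pyca/cryptography
package and its Fernet recipe is our own choice for illustration, not part of
the book's notation:

```python
# Symmetric encryption sketch: one shared key K both encrypts and decrypts,
# matching data = DK(EK(data)). Assumes the third-party "cryptography"
# (pyca/cryptography) package is installed.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # the shared secret key K
f = Fernet(key)

ciphertext = f.encrypt(b"confidential message")        # EK(data)
assert f.decrypt(ciphertext) == b"confidential message"  # DK(EK(data)) = data
```

In an asymmetric cryptosystem, one would instead generate a key pair, as the
digital-signature sketch further below illustrates.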

Encryption and decryption in public-key systems can be used in two,
fundamentally different ways. First, if Alice wants to encrypt data that can
be decrypted only by Bob, she should use Bob’s public key, PKB , leading
to the encrypted data PKB (data). Only the holder of the associated secret
key can decrypt this information, i.e., Bob, who will apply the operation
SKB (PKB (data)), which returns data.
A second, and widely applied use case, is that of realizing digital signa-
tures. Suppose Alice makes some data available for which it is important that
any party, but let us assume it is Bob, needs to know for sure that it comes
from Alice. In that case, Alice can encrypt the data with her secret key SKA ,
leading to SK A (data). If it can be assured that the associated public key PKA
indeed belongs to Alice, then successfully decrypting SK A (data) to data, is
proof that Alice knows about data: she is the only one holding the secret key
SKA . Of course, we need to make the assumption that Alice is indeed the only
one who holds SKA . We return to some of these assumptions in Chapter 9.
As it turns out, proving that an entity has seen, or knows about, some
data returns frequently in secured distributed systems. In practice, placing
digital signatures is made more efficient by using a hash function. A hash
function H has the property that when operating on some data, i.e., H(data),
it returns a fixed-length string, regardless of the length of data. Any change
of data to data* will lead to a different hash value H(data*). Moreover, given
a hash value h, it is computationally impossible in practice to discover the
original data. What this all means is that for placing a digital signature, Alice
computes sig = SKA(H(data)) as her signature, and tells Bob about data, H,
and sig. Bob, in turn, can then verify that signature by computing PKA(sig)
and verifying that it matches the value H(data).
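
The hash-then-sign scheme sig = SKA(H(data)) can be sketched as follows in
Python. The choice of Ed25519 keys and the pyca/cryptography package are
illustrative assumptions of ours, not prescribed by the text:

```python
# Minimal hash-then-sign sketch, mirroring sig = SKA(H(data)).
# Assumes the third-party "cryptography" (pyca/cryptography) package.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

data = b"some document Alice wants to vouch for"

# Alice's key pair: SKA stays private, PKA can be handed to anyone.
ska = Ed25519PrivateKey.generate()
pka = ska.public_key()

digest = hashlib.sha256(data).digest()   # H(data)
sig = ska.sign(digest)                   # sig = SKA(H(data))

# Bob recomputes H(data) and checks the signature against PKA.
try:
    pka.verify(sig, hashlib.sha256(data).digest())
    print("signature valid: data comes from the holder of SKA")
except InvalidSignature:
    print("signature invalid")
```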

Using cryptography in distributed systems


The application of cryptography in distributed systems comes in many forms.
Besides its general use for encryption and digital signatures, cryptography
forms the basis for realizing a secure channel between two communicating
parties. Such channels basically let two parties know for sure that they are
indeed communicating to the entities that they expected to communicate with.
In other words, a communication channel that supports mutual authentication.
A practical example of a secure channel is using https when accessing Websites.
Now, many browsers demand that Websites support this protocol, and at
the very least will warn the user when this is not the case. In general, using
cryptography is necessary to realize authentication (and authorization) in
distributed systems.
Cryptography is also used to realize secure distributed data structures.
A well-known example is that of a blockchain, which is, literally, a chain of
blocks. The basic idea is simple: hash the data in a block Bi, and place that
hash value as part of the data in its succeeding block Bi+1. Any change in
Bi (for example, as the result of an attack) will require that the attacker also
changes the stored hash value in Bi+1. However, because the successor of
Bi+1 contains the hash computed over the data in Bi+1, and thus including
the original hash value of Bi, the attacker will also have to change the new
hash value of Bi as stored in Bi+1. Yet changing that value also means changing
the hash value of Bi+1, and thus the value stored in Bi+2, in turn requiring that
a new hash value is to be computed for Bi+2, and so on. In other words, by
securely linking blocks into a chain, any successful change to a block requires
that all successive blocks be modified as well. For the attack to succeed, these
modifications should go unnoticed, which is virtually impossible.
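To make the chaining concrete, here is a small, self-contained Python sketch;
the Block layout and function names are our own and not taken from any
particular blockchain system:

```python
# A toy hash chain illustrating why tampering with one block breaks
# all of its successors.
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    data: bytes
    prev_hash: bytes   # hash of the preceding block

    def hash(self) -> bytes:
        return hashlib.sha256(self.prev_hash + self.data).digest()

def make_chain(payloads):
    chain, prev = [], b"\x00" * 32        # fixed "genesis" predecessor
    for p in payloads:
        block = Block(p, prev)
        chain.append(block)
        prev = block.hash()
    return chain

def verify(chain) -> bool:
    prev = b"\x00" * 32
    for block in chain:
        if block.prev_hash != prev:
            return False
        prev = block.hash()
    return True

chain = make_chain([b"B0", b"B1", b"B2"])
print(verify(chain))          # True
chain[0].data = b"tampered"   # change B0 ...
print(verify(chain))          # False: B1's stored hash no longer matches
```
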
Cryptography is also used for another important mechanism in distributed
systems: delegating access rights. The basic idea is that Alice may want to
delegate some rights to Bob, who, in turn, may want to pass some of those
rights on to Chuck. Using appropriate means (which we discuss in Chapter 9), a
service can securely check that Chuck has indeed been authorized to perform
certain operations, without the need for that service to check with Alice
whether the delegation is in place. Note that delegation is something we are
now used to: many of us delegate access rights that we have as a user to
specific applications, such as an e-mail client.
An upcoming distributed application of cryptography is so-called multi-
party computation: the means for two or more parties to compute a value
for which the data of those parties is needed, but without having to actually
share that data. An often-used example is computing the number of votes
without having to know who voted for whom.
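A toy impression of multiparty computation, based on additive secret sharing
(our own simplification, not a hardened protocol), shows how a vote count can
be obtained without any single party seeing individual votes:

```python
# Toy multiparty vote count via additive secret sharing: each voter splits
# its vote (0 or 1) into three random shares that sum to the vote modulo a
# prime; each tallier only ever sees one share per voter, yet the summed
# shares reveal the total. Purely illustrative.
import random

P = 2**61 - 1  # a large prime modulus

def share(vote: int, n: int = 3):
    parts = [random.randrange(P) for _ in range(n - 1)]
    parts.append((vote - sum(parts)) % P)
    return parts

votes = [1, 0, 1, 1, 0]
shares_per_tallier = [[], [], []]
for v in votes:
    for tallier, s in zip(shares_per_tallier, share(v)):
        tallier.append(s)

partial_sums = [sum(t) % P for t in shares_per_tallier]
print(sum(partial_sums) % P)  # 3, the number of 'yes' votes
```
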
We will see many more examples of security in distributed systems in the
following chapters. The brief explanation of the cryptographic basics given
here should suffice to see how security is applied. We shall consistently use
the notations shown in Figure 1.4. Alternatively, security examples can be
skipped until having studied Chapter 9.

Notation     Description
KA,B         Secret key shared by A and B
PKA          Public key of A
SKA          Private (secret) key of A
EK(data)     Encryption of data using key EK (or key K)
DK(data)     Decryption of (encrypted) data using key DK (or key K)
H(data)      The hash of data computed using function H

Figure 1.4: Notations for cryptosystems used in this book.


1.2.6 Scalability
For many of us, worldwide connectivity through the Internet is as common as
being able to send a package to anyone anywhere around the world. Moreover,
where until recently, we were used to having relatively powerful desktop
computers for office applications and storage, we are now witnessing that
such applications and services are being placed in what has been coined “the
cloud,” in turn leading to an increase of much smaller networked devices such
as tablet computers or even cloud-only laptops such as Google’s Chromebook.
With this in mind, scalability has become one of the most important design
goals for developers of distributed systems.

Scalability dimensions
Scalability of a system can be measured along at least three different dimen-
sions (see [Neuman, 1994]):

Size scalability: A system can be scalable regarding its size, meaning that
we can easily add more users and resources to the system without any
noticeable loss of performance.
Geographical scalability: A geographically scalable system is one in which
the users and resources may lie far apart, but the fact that communication
delays may be significant is hardly noticed.
Administrative scalability: An administratively scalable system is one that
can still be easily managed even if it spans many independent adminis-
trative organizations.
Let us take a closer look at each of these three scalability dimensions.

Size scalability When a system needs to scale, very different types of prob-
lems need to be solved. Let us first consider scaling regarding size. If more
users or resources need to be supported, we are often confronted with the
limitations of centralized services, although often for very different reasons.
For example, many services are centralized in the sense that they are imple-
mented by a single server running on a specific machine in the distributed
system. In a more modern setting, we may have a group of collaborating
servers co-located on a cluster of tightly coupled machines physically placed
at the same location. The problem with this scheme is obvious: the server, or
group of servers, can simply become a bottleneck when it needs to process
an increasing number of requests. To illustrate how this can happen, let us
assume that a service is implemented on a single machine. In that case, there
are essentially three root causes for becoming a bottleneck:

• The computational capacity, limited by the CPUs


• The storage capacity, including the I/O transfer rate


• The network between the user and the centralized service

Let us first consider the computational capacity. Just imagine a service for
computing optimal routes taking real-time traffic information into account. It
is not difficult to imagine that this may be primarily a compute-bound service,
requiring several (tens of) seconds to complete a request. If there is only a
single machine available, then even a modern high-end system will eventually
run into problems if the number of requests increases beyond a certain point.
Likewise, but for different reasons, we will run into problems when having
a service that is mainly I/O bound. A typical example is a poorly designed
centralized search engine. The problem with content-based search queries is
that we essentially need to match a query against an entire data set. Even
with advanced indexing techniques, we may still face the problem of having
to process a huge amount of data exceeding the main-memory capacity of
the machine running the service. As a consequence, much of the processing
time will be determined by the relatively slow disk accesses and transfer of
data between disk and main memory. Simply adding more or higher-speed
disks will prove not to be a sustainable solution as the number of requests
continues to increase.
Finally, the network between the user and the service may also be the cause
of poor scalability. Just imagine a video-on-demand service that needs to
stream high-quality video to multiple users. A video stream can easily require
a bandwidth of 8 to 10 Mbps, meaning that if a service sets up point-to-point
connections with its customers, it may soon hit the limits of the network
capacity of its own outgoing transmission lines.
There are several solutions to attack size scalability, which we discuss
below after having looked into geographical and administrative scalability.

Note 1.5 (Advanced: Analyzing size scalability)

Figure 1.5: A simple model of a service as a queuing system.

Size scalability problems for centralized services can be formally analyzed
using queuing theory and making a few simplifying assumptions. At a conceptual
level, a centralized service can be modeled as the simple queuing system shown
in Figure 1.5: requests are submitted to the service, where they are queued until
further notice. As soon as the process can handle a next request, it fetches it from
the queue, does its work, and produces a response. We largely follow Menasce
and Almeida [2002] in explaining the performance of a centralized service.


Often, we may assume that the queue has an infinite capacity, meaning that
there is no restriction on the number of requests that can be accepted for further
processing. Strictly speaking, this means that the arrival rate of requests is not
influenced by what is currently in the queue or being processed. Assuming that
the arrival rate of requests is λ requests per second, and that the processing
capacity of the service is µ requests per second, one can compute that the
fraction of time p_k that there are k requests in the system is equal to:

    p_k = (1 - λ/µ) (λ/µ)^k

If we define the utilization U of a service as the fraction of time that it is busy,
then clearly,

    U = Σ_{k>0} p_k = 1 - p_0 = λ/µ   so that   p_k = (1 - U) U^k

We can then compute the average number N of requests in the system as

    N = Σ_{k≥0} k·p_k = Σ_{k≥0} k·(1 - U)U^k = (1 - U) Σ_{k≥0} k·U^k
      = (1 - U) · U/(1 - U)^2 = U/(1 - U)

What we are truly interested in is the response time R: how long does it take
for the service to process a request, including the time spent in the queue.
To that end, we need the average throughput X. Considering that the service is
"busy" when at least one request is being processed, that this then happens
with a throughput of µ requests per second, and that this is the case during a
fraction U of the total time, we have:

    X = U·µ (server at work) + (1 - U)·0 (server idle) = (λ/µ)·µ = λ

Using Little's formula [Trivedi, 2002], we can then derive the response time as

    R = N/X = S/(1 - U)   so that   R/S = 1/(1 - U)

where S = 1/µ is the actual service time. Note that if U is small, the response-
to-service time ratio is close to 1, meaning that a request is virtually instantly
processed, and at the maximum speed possible. However, as soon as the
utilization comes closer to 1, we see that the response-to-service time ratio
quickly increases to very high values, effectively meaning that the system is
coming close to a grinding halt. This is where we see scalability problems
emerge. From this simple model, we can see that the only solution is bringing
down the service time S. We leave it as an exercise to the reader to explore
how S may be decreased.
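
To get a feel for how sharply the response-to-service time ratio R/S = 1/(1 - U)
grows with the utilization, consider this small Python sketch (ours, not part of
the original analysis):

```python
# Response time relative to service time in the simple queuing model
# from the text: R/S = 1/(1 - U), with U = lambda/mu the utilization.
def response_to_service_ratio(utilization: float) -> float:
    if not 0 <= utilization < 1:
        raise ValueError("utilization must lie in [0, 1)")
    return 1.0 / (1.0 - utilization)

for u in (0.1, 0.5, 0.9, 0.99):
    print(f"U = {u:.2f}  ->  R/S = {response_to_service_ratio(u):.1f}")
# Prints ratios of roughly 1.1, 2.0, 10.0, and 100.0: close to full
# utilization, requests spend almost all their time waiting in the queue.
```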

Geographical scalability Geographical scalability has its own problems. One
of the main reasons why it is still difficult to scale existing distributed systems
that were designed for local-area networks is that many of them are based
on synchronous communication. In this form of communication, a party
requesting a service, generally referred to as a client, blocks until a reply is
sent back from the server implementing the service. More specifically, we often
see a communication pattern consisting of many client-server interactions, as
may be the case with database transactions. This approach generally works
fine in LANs, where communication between two machines is often at worst
a few hundred microseconds. However, in a wide-area system, we need to
consider that interprocess communication may be hundreds of milliseconds,
three orders of magnitude slower. Building applications using synchronous
communication in wide-area systems requires a great deal of care (and not
just a little patience), notably with a rich interaction pattern between client
and server.
Another problem that hinders geographical scalability is that communica-
tion in wide-area networks is inherently much less reliable than in local-area
networks. In addition, we generally also need to deal with limited bandwidth.
The effect is that solutions developed for local-area networks cannot always
be easily ported to a wide-area system. A typical example is streaming video.
In a home network, even when having only wireless links, ensuring a stable,
fast stream of high-quality video frames from a media server to a display is
quite simple. Simply placing that same server far away and using a standard
TCP connection to the display will surely fail: bandwidth limitations will
instantly surface, but also maintaining the same level of reliability can easily
cause headaches.
Yet another issue that pops up when components lie far apart is the
fact that wide-area systems generally have only very limited facilities for
multipoint communication. In contrast, local-area networks often support
efficient broadcasting mechanisms. Such mechanisms have proven to be
extremely useful for discovering components and services, which is essential
from a management perspective. In wide-area systems, we need to develop
separate services, such as naming and directory services, to which queries
can be sent. These support services, in turn, need to be scalable as well and
often no obvious solutions exist as we will encounter in later chapters.

Administrative scalability Finally, a difficult, and often open, question is


how to scale a distributed system across multiple, independent administrative
domains. A major problem that needs to be solved is that of conflicting
policies regarding resource usage (and payment), management, and security.
To illustrate, for many years, scientists have been looking for solutions to share
their (often expensive) equipment in what is known as a computational grid.
In these grids, a global decentralized system is constructed as a federation of
local distributed systems, allowing a program running on a computer at an
organization A to directly access resources at the organization B.

Many components of a distributed system that reside within a single
domain can often be trusted by users that operate within that same domain. In
such cases, system administration may have tested and certified applications,
and may have taken special measures to ensure that such components cannot
be tampered with. In essence, the users trust their system administrators.
However, this trust does not expand naturally across domain boundaries.
If a distributed system expands to another domain, two types of security
measures need to be taken. First, the distributed system has to protect
itself against malicious attacks from the new domain. For example, users
from the new domain may have only read access to the file system in its
original domain. Likewise, facilities such as expensive image setters or high-
performance computers may not be made available to unauthorized users.
Second, the new domain has to protect itself against malicious attacks from
the distributed system. A typical example is that of downloading programs,
such as in the case of federated learning. Basically, the new domain does not
know what to expect from such foreign code. The problem, as we shall see in
Chapter 9, is how to enforce those limitations.
As a counterexample of distributed systems spanning multiple adminis-
trative domains that apparently do not suffer from administrative scalability
problems, consider modern file-sharing peer-to-peer networks. In these cases,
end users simply install a program implementing distributed search and
download functions and within minutes can start downloading files. Other ex-
amples include peer-to-peer applications for telephony over the Internet such
as older Skype systems [Baset and Schulzrinne, 2006], and (again older) peer-
assisted audio-streaming applications such as Spotify [Kreitz and Niemelä,
2010]. A more modern application (that has yet to prove itself in terms of scal-
ability) are blockchains. What these decentralized systems have in common is
that end users, and not administrative entities, collaborate to keep the system
up and running. At best, underlying administrative organizations such as
Internet Service Providers (ISPs) can police the network traffic that these
peer-to-peer systems cause.

Scaling techniques

Having discussed some scalability problems brings us to the question of how


those problems can generally be solved. In most cases, scalability problems
in distributed systems appear as performance problems caused by limited
capacity of servers and network. Simply improving their capacity (e.g., by
increasing memory, upgrading CPUs, or replacing network modules) is often
a solution, referred to as scaling up. When it comes to scaling out, that is,
expanding the distributed system by essentially deploying more machines,
there are basically only three techniques we can apply: hiding communication
latencies, distribution of work, and replication (see also Neuman [1994]).


Figure 1.6: The difference between letting (a) a server or (b) a client check
forms as they are being filled.

Hiding communication latencies Hiding communication latencies is applicable
in the case of geographical scalability. The basic idea is simple: try to
avoid waiting for responses to remote-service requests as much as possible.
For example, when a service has been requested at a remote machine, an
alternative to waiting for a reply from the server is to do other useful work at
the requester’s side. Essentially, this means constructing the requesting appli-
cation in such a way that it uses only asynchronous communication. When a
reply comes in, the application is interrupted and a special handler is called
to complete the previously issued request. Asynchronous communication
can often be used in batch-processing systems and parallel applications, in
which independent tasks can be scheduled for execution while another task is
waiting for communication to complete. Alternatively, a new thread of control
can be started to perform the request. Although it blocks waiting for the reply,
other threads in the process can continue.
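
The following Python sketch, using asyncio and a simulated remote call (both
our own choices for illustration), shows how a requester can overlap useful
local work with a pending remote request:

```python
# Hiding communication latency by overlapping a remote request with
# other useful work. remote_call() merely simulates a slow server; in a
# real system it would be an (asynchronous) RPC.
import asyncio

async def remote_call() -> str:
    await asyncio.sleep(0.2)          # pretend network + server time
    return "reply from server"

async def other_useful_work() -> None:
    for step in range(3):
        await asyncio.sleep(0.05)     # local work proceeds meanwhile
        print("local step", step)

async def main() -> None:
    reply, _ = await asyncio.gather(remote_call(), other_useful_work())
    print(reply)

asyncio.run(main())
```
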
However, there are many applications that cannot make effective use of
asynchronous communication. For example, in interactive applications when
a user sends a request, she will generally have nothing better to do than to
wait for the answer. In such cases, a much better solution is to reduce the
overall communication, for example, by moving part of the computation that
is normally done at the server to the client process requesting the service. A
typical case where this approach works is accessing databases using forms.
Filling in forms can be done by sending a separate message for each field and
waiting for an acknowledgment from the server, as shown in Figure 1.6(a). For
example, the server may check for syntactic errors before accepting an entry.

A much better solution is to ship the code for filling in the form, and possibly
checking the entries, to the client, and have the client return a completed
form, as shown in Figure 1.6(b). This approach of shipping code is widely
supported by the Web through JavaScript.

Partitioning and distribution Another important scaling technique is partitioning
and distribution, which involves taking a component or other resource,
splitting it into smaller parts, and subsequently spreading those parts across
the system. A good example of partitioning and distribution is the Internet
Domain Name System (DNS). The DNS name space is hierarchically orga-
nized into a tree of domains, which are divided into nonoverlapping zones,
as shown for the original DNS in Figure 1.7. The names in each zone are
handled by a single name server. Without going into too many details now
(we return to DNS extensively in Chapter 6), one can think of each path name
being the name of a host on the Internet, and is thus associated with a network
address of that host. Basically, resolving a name means returning the network
address of the associated host. Consider, for example, the name flits.cs.vu.nl.
To resolve this name, it is first passed to the server of zone Z1 (see Figure 1.7)
which returns the address of the server for zone Z2, to which the rest of the
name, flits.cs.vu, can be handed. The server for Z2 will return the address of
the server for zone Z3, which is capable of handling the last part of the name,
and will return the address of the associated host.
Figure 1.7: An example of dividing the (original) DNS name space into zones.
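
The zone-by-zone resolution just described can be mimicked with a small
Python sketch; the zone table and the returned address are invented for
illustration and do not reflect the real DNS database:

```python
# Iterative name resolution over a toy zone table, mimicking how
# flits.cs.vu.nl is resolved via the servers for zones Z1, Z2, and Z3.
ZONES = {
    "Z1": {"nl": ("refer", "Z2")},
    "Z2": {"vu.nl": ("refer", "Z3")},
    "Z3": {"flits.cs.vu.nl": ("host", "130.37.20.20")},  # made-up address
}

def resolve(name: str, zone: str = "Z1") -> str:
    table = ZONES[zone]
    for suffix, (kind, value) in table.items():
        if name == suffix or name.endswith("." + suffix):
            if kind == "host":
                return value             # network address of the host
            return resolve(name, value)  # referral to the next zone's server
    raise LookupError(name)

print(resolve("flits.cs.vu.nl"))  # 130.37.20.20
```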

These examples illustrate how the naming service, as provided by DNS, is
distributed across several machines, thus avoiding that a single server has to
deal with all requests for name resolution.
As another example, consider the World Wide Web. To most users, the Web
appears to be an enormous document-based information system, in which
each document has its own unique name in the form of a URL. Conceptually,
it may even appear as if there is only a single server. However, the Web
is physically partitioned and distributed across a few hundreds of millions of
servers, each handling often a number of Websites, or parts of Websites. The
name of the server handling a document is encoded into that document’s
URL. It is only because of this distribution of documents that the Web has
been capable of scaling to its current size. Yet, note that finding out how many
servers provide Web-based services is virtually impossible: A Website today
is so much more than a few static Web documents.

Replication Considering that scalability problems often appear in the form
of performance degradation, it is generally a good idea to actually replicate
components or resources, etc., across a distributed system. Replication not
only increases availability, but also helps to balance the load between com-
ponents, leading to better performance. Moreover, in geographically widely
dispersed systems, having a copy nearby can hide much of the communication
latency problems mentioned before.
Caching is a special form of replication, although the distinction between
the two is often hard to make or even artificial. As in the case of replication,
caching results in making a copy of a resource, generally in the proximity of
the client accessing that resource. However, in contrast to replication, caching
is a decision made by the client of a resource and not by the owner of a
resource.
There is one serious drawback to caching and replication that may ad-
versely affect scalability. Because we now have multiple copies of a resource,
modifying one copy makes that copy different from the others. Consequently,
caching and replication leads to consistency problems.
To what extent inconsistencies can be tolerated depends on the usage of a
resource. For example, many Web users find it acceptable that their browser
returns a cached document of which the validity has not been checked for
the last few minutes. However, there are also many cases in which strong
consistency guarantees need to be met, such as in the case of electronic stock
exchanges and auctions. The problem with strong consistency is that an
update must be immediately propagated to all other copies. Moreover, if two
updates happen concurrently, it is often also required that updates are pro-
cessed in the same order everywhere, introducing a global ordering problem.
To make things worse, combining consistency with desirable properties such
as availability may simply be impossible, as we discuss in Chapter 8.
Replication therefore often requires some global synchronization mecha-
nism. Unfortunately, such mechanisms are extremely hard or even impossible
to implement in a scalable way, if only because network latencies have a nat-
ural lower bound. Consequently, scaling by replication may introduce other,
inherently nonscalable solutions. We return to replication and consistency
extensively in Chapter 7.


Discussion When considering these scaling techniques, one could argue that
size scalability is the least problematic from a technical perspective. Often,
increasing the capacity of a machine will save the day, although perhaps
there is a high monetary cost to pay. Geographical scalability is a much
tougher problem, as network latencies are naturally bound from below. As
a consequence, we may be forced to copy data to locations close to where
clients are, leading to problems of maintaining copies consistent. Practice
shows that combining distribution, replication, and caching techniques with
different forms of consistency generally leads to acceptable solutions. Finally,
administrative scalability seems to be the most difficult problem to solve,
partly because we need to deal with nontechnical issues, such as politics of or-
ganizations and human collaboration. The introduction and now widespread
use of peer-to-peer technology has successfully demonstrated what can be
achieved if end users are put in control [Lua et al., 2005; Oram, 2001]. How-
ever, peer-to-peer networks are obviously not the universal solution to all
administrative scalability problems.

1.3 A simple classification of distributed systems

We have discussed distributed versus decentralized systems, yet it is also useful
to classify distributed systems according to what they are being developed
and used for. We make a distinction between systems that are developed for
(high performance) computing, for general information processing, and those
that are developed for pervasive computing, i.e., for the “Internet of Things.”
As with many classifications, the boundaries between these three types are
not strict and combinations can easily be thought of.

1.3.1 High-performance distributed computing


An important class of distributed systems is the one used for high-performance
computing tasks. Roughly speaking, one can make a distinction between two
subgroups. In cluster computing the underlying hardware consists of a
collection of similar compute nodes, interconnected by a high-speed network,
often alongside a more common local-area network for controlling the nodes.
In addition, each node generally runs the same operating system.
The situation becomes very different in the case of grid computing. This
subgroup consists of decentralized systems that are often constructed as a
federation of computer systems, where each system may fall under a different
administrative domain, and may be very different when it comes to hardware,
software, and deployed network technology.


Note 1.6 (More information: Parallel processing)


High-performance computing more or less started with the introduction of multi-
processor machines. In this case, multiple CPUs are organized in such a way that
they all have access to the same physical memory, as shown in Figure 1.8(a). In
contrast, in a multicomputer system several computers are connected through
a network and there is no sharing of main memory, as shown in Figure 1.8(b).
The shared-memory model turned out to be highly convenient for improving the
performance of programs, and it was relatively easy to program.

Figure 1.8: A comparison between (a) multiprocessor and (b) multicomputer
architectures.

Its essence is that multiple threads of control are executing at the same time,
while all threads have access to shared data. Access to that data is controlled
through well-understood synchronization mechanisms like semaphores (see Ben-
Ari [2006] or Herlihy et al. [2021] for more information on developing parallel
programs). Unfortunately, the model does not easily scale: so far, machines have
been developed in which only a few tens (and sometimes hundreds) of CPUs
have efficient access to shared memory. To a certain extent, we are seeing the
same limitations for multicore processors.
To overcome the limitations of shared-memory systems, high-performance
computing moved to distributed-memory systems. This shift also meant that many
programs had to make use of message passing instead of modifying shared data as
a means of communication and synchronization between threads. Unfortunately,
message-passing models have proven to be much more difficult and error-prone
compared to the shared-memory programming models. For this reason, there
has been significant research in attempting to build so-called distributed shared-
memory multicomputers, or simply DSM systems [Amza et al., 1996].
In essence, a DSM system allows a processor to address a memory location
at another computer as if it were local memory. This can be achieved using
existing techniques available to the operating system, for example, by mapping
all main-memory pages of the various processors into a single virtual address
space. Whenever a processor A addresses a page located at another processor B,
a page fault occurs at A allowing the operating system at A to fetch the content of
the referenced page at B in the same way that it would normally fetch it locally
from disk. At the same time, the processor B would be informed that the page is
currently not accessible.


Mimicking shared-memory systems using multicomputers eventually had to
be abandoned because performance could never meet the expectations of pro-
grammers, who would rather resort to far more intricate, yet better (predictably)
performing message-passing models. An important side effect of exploring the
hardware-software boundaries of parallel processing is a thorough understanding
of consistency models, to which we return extensively in Chapter 7.

Cluster computing
Cluster computing systems became popular when the price/performance
ratio of personal computers and workstations improved. At a certain point, it
became financially and technically attractive to build a supercomputer using
off-the-shelf technology by simply hooking up a collection of relatively simple
computers in a high-speed network. In virtually all cases, cluster computing is
used for parallel programming, in which a single (compute intensive) program
is run in parallel on multiple machines. The principle of this organization is
shown in Figure 1.9.
This type of high-performance computing has evolved considerably. As
discussed extensively by Gerofi et al. [2019], the developments of supercom-
puters organized as clusters have reached a point where we see clusters with
more than 100,000 CPUs, with each CPU having 8 or 16 cores. There are mul-
tiple networks. Most important is a network formed by dedicated high-speed
interconnects between the various nodes (in other words, there is often no
such thing as a shared high-speed network for computations). A separate
management network, as well as nodes, are used to monitor and control
the organization and performance of the system as a whole. In addition, a
special high-performance file system or database is used, again with its own
local, dedicated network. Figure 1.9 does not show additional equipment,
notably high-speed I/O as well as networking facilities for remote access and
communication.

Figure 1.9: An example of a cluster computing system (adapted from [Gerofi
et al., 2019]).

A management node is generally responsible for collecting jobs from users,
to subsequently distribute the associated tasks among the various compute
nodes. In practice, several management nodes are used when dealing with
very large clusters. As such, a management node actually runs the software
needed for the execution of programs and management of the cluster, while
the compute nodes are equipped with a standard operating system extended
with typical functions for communication, storage, fault tolerance, and so on.
An interesting development, as explained in Gerofi et al. [2019], is the
role of the operating system. There has been a clear trend to minimize the
operating system to lightweight kernels, essentially ensuring the least possible
overhead. A drawback is that such operating systems become highly spe-
cialized and fine-tuned toward the underlying hardware. This specialization
affects compatibility, or openness. To compensate, we are now gradually
seeing so-called multikernel approaches, in which a full-fledged operating
system operates next to a lightweight kernel, thus achieving the best of two
worlds. This combination is also necessary given that, increasingly often, a
high-performance compute node is required to run multiple, independent
jobs simultaneously. At present, 95% of all high-performance computers run
Linux-based systems; multikernel approaches are developed for multicore
CPUs, with most cores running a lightweight kernel and the other running
a regular Linux system. In this way, new developments such as contain-
ers (which we discuss in Chapter 3) can also be supported. The effect on
computing performance remains to be seen.

Grid computing
A characteristic feature of traditional cluster computing is its homogeneity.
In most cases, the computers in a cluster are largely the same, have the
same operating system, and are all connected through the same network.
However, as we just discussed, there is a continuous trend toward more
hybrid architectures in which nodes are specifically configured for certain
tasks. This diversity is even more prevalent in grid-computing systems: no
assumptions are made concerning similarity of hardware, operating systems,
networks, administrative domains, security policies, etc. [Rajaraman, 2016].
A key issue in a grid-computing system is that resources from different
organizations are brought together to allow the collaboration of a group of
people from different institutions, indeed forming a federation of systems.
Such a collaboration is realized in the form of a virtual organization. The
processes belonging to the same virtual organization have access rights to the
resources that are provided to that organization. Typically, resources consist of
compute servers (including supercomputers, possibly implemented as cluster
computers), storage facilities, and databases. In addition, special networked
devices such as telescopes, sensors, etc., can be provided as well.
Given its nature, much of the software for realizing grid computing evolves
around providing access to resources from different administrative domains,
and to only those users and applications that belong to a specific virtual
organization. For this reason, focus is often on architectural issues. An
architecture initially proposed by Foster et al. [2001] is shown in Figure 1.10,
which still forms the basis for many grid computing systems.

Figure 1.10: A layered architecture for grid computing systems.

The architecture consists of four layers. The lowest fabric layer provides
interfaces to local resources at a specific site. Note that these interfaces are
tailored to allow sharing of resources within a virtual organization. Typically,
they will provide functions for querying the state and capabilities of a resource,
along with functions for actual resource management (e.g., locking resources).
The connectivity layer consists of communication protocols for supporting
grid transactions that span the usage of multiple resources. For example,
protocols are needed to transfer data between resources, or to simply access
a resource from a remote location. In addition, the connectivity layer will
contain security protocols to authenticate users and resources. Note that in
many cases, human users are not authenticated; instead, programs acting on
behalf of the users are authenticated. In this sense, delegating rights from
a user to programs is an important function that needs to be supported in
the connectivity layer. We return to delegation when discussing security in
distributed systems in Chapter 9.
The resource layer is responsible for managing a single resource. It uses the
functions provided by the connectivity layer and calls directly the interfaces
made available by the fabric layer. For example, this layer will offer functions
for obtaining configuration information on a specific resource, or, in general,
to perform specific operations such as creating a process or reading data. The
resource layer is thus seen to be responsible for access control, and hence will
rely on the authentication performed as part of the connectivity layer.
The next layer in the hierarchy is the collective layer. It deals with handling
access to multiple resources and typically consists of services for resource
discovery, allocation and scheduling of tasks onto multiple resources, data
replication, and so on. Unlike the connectivity and resource layer, each
consisting of a relatively small, standard collection of protocols, the collective
layer may consist of many protocols reflecting the broad spectrum of services
it may offer to a virtual organization.
Finally, the application layer consists of the applications that operate
within a virtual organization and which make use of the grid computing
environment.
Typically, the collective, connectivity, and resource layer form the heart of
what could be called a grid middleware layer. These layers jointly provide
access to and management of resources that are potentially dispersed across
multiple sites.
An important observation from a middleware perspective is that in grid
computing, the notion of a site (or administrative unit) is common. This
prevalence is emphasized by the gradual shift toward a service-oriented ar-
chitecture in which sites offer access to the various layers through a collection
of Web services [Joseph et al., 2004]. This, by now, has led to the definition of
an alternative architecture known as the Open Grid Services Architecture
(OGSA) [Foster et al., 2006]. OGSA is based upon the original ideas as for-
mulated by Foster et al. [2001], yet having gone through a standardization
process makes it complex, to say the least. OGSA implementations generally
follow Web service standards.

1.3.2 Distributed information systems


Another important class of distributed systems is found in organizations
that were confronted with a wealth of networked applications, but for which
interoperability turned out to be a painful experience. Many of the existing
middleware solutions are the result of working with an infrastructure in which
it was easier to integrate applications into an enterprise-wide information
system [Alonso et al., 2004; Bernstein, 1996; Hohpe and Woolf, 2004].
We can distinguish several levels at which integration can take place. Often,
a networked application simply consists of a server running that application
(often including a database) and making it available to remote programs,
called clients. Such clients send a request to the server for executing a specific
operation, after which a response is sent back. Integration at the lowest level
allows clients to wrap several requests, possibly for different servers, into a
single larger request and have it executed as a distributed transaction. The
key idea is that all, or none of the requests are executed.

As applications became more sophisticated and were gradually separated
into independent components (notably distinguishing database components
from processing components), it became clear that integration should also
take place by letting applications communicate directly with each other. This
has now led to an industry on enterprise application integration (EAI).

Distributed transaction processing


To clarify our discussion, we concentrate on database applications. In practice,
operations on a database are carried out in the form of transactions. Pro-
gramming using transactions requires special primitives that must either be
supplied by the underlying distributed system or by the language runtime
system. Typical examples of transaction primitives are shown in Figure 1.11.
The exact list of primitives depends on what kinds of objects are being used
in the transaction [Gray and Reuter, 1993; Bernstein and Newcomer, 2009]. In
a mail system, there might be primitives to send, receive, and forward mail.
In an accounting system, they might be quite different. READ and WRITE are
typical examples, however. Ordinary statements, procedure calls, and so on,
are also allowed inside a transaction. In particular, remote procedure calls
(RPC), that is, procedure calls to remote servers, are often also encapsulated
in a transaction, leading to what is known as a transactional RPC. We discuss
RPCs extensively in Section 4.2.

Primitive Description
BEGIN_TRANSACTION Mark the start of a transaction
END_TRANSACTION Terminate the transaction and try to commit
ABORT_TRANSACTION Kill the transaction and restore the old values
READ Read data from a file, a table, or otherwise
WRITE Write data to a file, a table, or otherwise

Figure 1.11: Example primitives for transactions.

BEGIN_TRANSACTION and END_TRANSACTION are used to delimit the
scope of a transaction. The operations between them form the body of the
transaction. The characteristic feature of a transaction is that either all of these
operations are executed or none are executed. These may be system calls,
library procedures, or bracketing statements in a language, depending on the
implementation.
This all-or-nothing property of transactions is one of the four characteristic
properties that transactions have. More specifically, transactions adhere to the
so-called ACID properties:

• Atomic: To the outside world, the transaction happens indivisibly


• Consistent: The transaction does not violate system invariants


• Isolated: Concurrent transactions do not interfere with each other


• Durable: Once a transaction commits, the changes are permanent
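
The all-or-nothing (atomicity) property can be illustrated locally with Python's
built-in sqlite3 module; the table, amounts, and simulated crash are our own
example, and a TP monitor coordinating distributed subtransactions is of
course far more involved:

```python
# All-or-nothing behavior of a transaction, using Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # BEGIN_TRANSACTION ... END_TRANSACTION
        conn.execute("UPDATE account SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        raise RuntimeError("simulated crash")  # forces ABORT_TRANSACTION
        # the matching credit to bob would have followed here
except RuntimeError:
    pass

# Neither update is visible: the transaction was rolled back as a whole.
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]
```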

In distributed systems, transactions are often constructed as a number of
subtransactions, jointly forming a nested transaction as shown in Figure 1.12.
The top-level transaction may fork off children that run in parallel with one
another, on different machines, to gain performance or simplify programming.
Each of these children may also execute one or more subtransactions, or fork
off its own children.

Figure 1.12: A nested transaction.

Subtransactions give rise to a subtle, but important, problem. Imagine
that a transaction starts several subtransactions in parallel, and one of these
commits, making its results visible to the parent transaction. After further
computation, the parent aborts, restoring the entire system to the state it
had before the top-level transaction started. Consequently, the results of
the subtransaction that committed must nevertheless be undone. Thus, the
permanence referred to above applies only to top-level transactions.
Since transactions can be nested arbitrarily deep, considerable administra-
tion is needed to get everything right. The semantics are clear, however. When
any transaction or subtransaction starts, it is conceptually given a private copy
of all data in the entire system for it to manipulate as it wishes. If it aborts,
its private universe just vanishes, as if it had never existed. If it commits,
its private universe replaces the parent’s universe. Thus, if a subtransaction
commits and then later a new subtransaction is started, the second one sees
the results produced by the first one. Likewise, if an enclosing (higher level)
transaction aborts, all its underlying subtransactions have to be aborted as
well. And if several transactions are started concurrently, the result is as if
they ran sequentially in some unspecified order.
Nested transactions are important in distributed systems, for they provide
a natural way of distributing a transaction across multiple machines. They
follow a logical division of the work of the original transaction. For example,
a transaction for planning a trip for which three different flights need to be
reserved can be logically split up into three subtransactions. Each of these
subtransactions can be managed separately and independently.


Figure 1.13: The role of a TP monitor in distributed systems.


Ever since the early days of enterprise middleware systems, the compo-
nent that handles distributed (or nested) transactions belongs to the core for
integrating applications at the server or database level. This component is
called a transaction-processing monitor or TP monitor for short. Its main
task is to allow an application to access multiple server/databases by offering
it a transactional programming model, as shown in Figure 1.13. Essentially,
the TP monitor coordinates the commitment of subtransactions following a
standard protocol known as distributed commit, which we discuss in detail
in Section 8.5.
An important observation is that applications wanting to coordinate sev-
eral subtransactions into a single transaction do not have to implement this
coordination themselves. By simply making use of a TP monitor, this coordi-
nation is done for them. This is precisely where middleware comes into play:
it implements services that are useful for many applications, avoiding that
such services have to be reimplemented over and over again by application
developers.
Enterprise application integration
As mentioned, the more applications became decoupled from the databases
they were built upon, the more evident it became that facilities were needed
to integrate applications independently of their databases. In particular, appli-
cation components should be able to communicate directly with each other
and not merely by means of the request/reply behavior that was supported
by transaction processing systems.
This need for interapplication communication led to many communication
models. The main idea was that existing applications could directly exchange
information, as shown in Figure 1.14.


Figure 1.14: Middleware as a communication facilitator in enterprise application
integration.

Several types of communication middleware exist. With remote procedure
calls (RPC), an application component can effectively send a request to another
application component by doing a local procedure call, which results in the
request being packaged as a message and sent to the callee. Likewise, the
result will be sent back and returned to the application as the result of the
procedure call.
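
As a minimal impression of the RPC style, Python's standard xmlrpc modules
let a client invoke a remote function as if it were a local procedure; the port
number and the add function are arbitrary choices for this sketch:

```python
# Minimal RPC flavor with Python's built-in xmlrpc modules: the client
# calls add() as if it were local; the request travels as a message to the
# server, and the result comes back as the procedure's return value.
import threading
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(3, 4))  # 7
```
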
As the popularity of object technology increased, techniques were devel-
oped to allow calls to remote objects, leading to what is known as remote
method invocations (RMI). An RMI is essentially the same as an RPC, except
that it operates on objects instead of functions.
RPC and RMI have the disadvantage that the caller and callee both need
to be up and running at the time of communication. In addition, they
need to know exactly how to refer to each other. This tight coupling is
often experienced as a serious drawback, and has led to what is known as
message-oriented middleware, or simply MOM. In this case, applications
send messages to logical contact points, often described by a subject. Likewise,
applications can indicate their interest for a specific type of message, after
which the communication middleware will take care that those messages are
delivered to those applications. These so-called publish-subscribe systems
form an important and expanding class of distributed systems.
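
The essence of topic-based publish-subscribe can be captured in a few lines of
Python (a single-process sketch of our own; real message-oriented middleware
adds queuing, persistence, and delivery guarantees):

```python
# Bare-bones topic-based publish-subscribe: publishers and subscribers only
# share a subject name, never a direct reference to each other.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)   # subject -> list of handlers

    def subscribe(self, subject, handler):
        self._subscribers[subject].append(handler)

    def publish(self, subject, message):
        for handler in self._subscribers[subject]:
            handler(message)

broker = Broker()
broker.subscribe("orders", lambda m: print("billing saw:", m))
broker.subscribe("orders", lambda m: print("shipping saw:", m))
broker.publish("orders", "order #42 placed")
```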

Note 1.7 (More information: On integrating applications)


Supporting enterprise application integration is an important goal for many mid-
dleware products. In general, there are four ways to integrate applications [Hohpe
and Woolf, 2004]:
File transfer: The essence of integration through file transfer is that an applica-
tion produces a file containing shared data that is subsequently read by
other applications. The approach is technically simple, making it appealing.
The drawback, however, is that there are numerous things that need to be
agreed upon:

• File format and layout: text, binary, its structure, and so on. Nowadays,
the Extensible Markup Language (XML) has become popular as its files
are, in principle, self-describing.
• File management: where are they stored, how are they named, who is
responsible for deleting files?
• Update propagation: When an application produces a file, there may
be several applications that need to read that file to provide the view
of a single coherent system. As a consequence, sometimes separate
programs need to be implemented that notify applications of file
updates.

Shared database: Many of the problems associated with integration through files
are alleviated when using a shared database. All applications will have
access to the same data, and often through a high-level database language
such as SQL. Furthermore, it is easy to notify applications when changes
occur, as triggers are often part of modern databases. There are, however,
two major drawbacks. First, there is still a need to design a common
data schema, which may be far from trivial if the set of applications that
need to be integrated is not completely known in advance. Second, when
there are many reads and updates, a shared database can easily become a
performance bottleneck.
Remote procedure call: Integration through files or a database implicitly as-
sumes that changes by one application can easily trigger other applications
to act. However, practice shows that sometimes small changes should
actually trigger many applications to take actions. In such cases, it is not
really the change of data that is important, but the execution of a series of
actions.
Series of actions are best captured through the execution of a procedure
(which may, in turn, lead to all kinds of changes in shared data). To prevent
that every application needs to know all the internals of those actions (as
implemented by another application), standard encapsulation techniques
should be used, as deployed with traditional procedure calls or object
invocations. For such situations, an application can best offer a procedure
to other applications in the form of a remote procedure call, or RPC. In
essence, an RPC allows an application A to make use of the information
available only to the application B, without giving A direct access to that
information. There are many advantages and disadvantages to remote
procedure calls, which are discussed in depth in Chapter 4.
Messaging: A main drawback of RPCs is that caller and callee need to be up
and running at the same time in order for the call to succeed. However, in
many scenarios, this simultaneous activity is often difficult or impossible
to guarantee. In such cases, offering a messaging system carrying requests
from the application A to perform an action at the application B, is what
is needed. The messaging system ensures that eventually the request is
delivered, and if needed, that a response is eventually returned as well.
Obviously, messaging is not the panacea for application integration: it also
introduces problems concerning data formatting and layout, it requires an
application to know where to send a message to, there need to be scenarios
for dealing with lost messages, and so on. Like RPCs, we will be discussing
these issues extensively in Chapter 4.
What these four approaches tell us, is that application integration will generally
not be simple. Middleware (in the form of a distributed system), however, can
significantly help in integration by providing the right facilities such as support
for RPCs or messaging. As said, enterprise application integration is an important
target field for many middleware products.

1.3.3 Pervasive systems


The distributed systems discussed so far are largely characterized by their
stability: nodes are fixed and have a more or less permanent and high-quality
connection to a network. To a certain extent, this stability is realized through
the various techniques for achieving distribution transparency. For example,
there are many ways in which we can create the illusion that components fail
only occasionally. Likewise, there are all kinds of means to hide the actual
network location of a node, effectively allowing users and applications to
believe that nodes stay put.
However, matters have changed since the introduction of mobile and
embedded computing devices, leading to what are generally referred to as
pervasive systems. As their name suggests, pervasive systems are intended
to blend naturally into our environment. Many of their components are
necessarily spread across multiple computers, making them, in our view,
arguably a type of decentralized system. At the same time, most pervasive
systems also have many components that are sufficiently spread throughout
the system, for example, to handle failures. In this sense, they are arguably
distributed systems as well. The separation between decentralized and
distributed systems is thus less strict than one might initially imagine.
What makes them unique, in comparison to the computing and infor-
mation systems described so far, is that the separation between users and
system components is much more blurred. There is often no single dedicated
interface, such as a screen/keyboard combination. Instead, a pervasive system
is often equipped with many sensors that pick up various aspects of a user’s
behavior. Likewise, it may have a myriad of actuators to provide information
and feedback, often even purposefully aiming to steer behavior.

Many devices in pervasive systems are characterized by being small,
battery-powered, mobile, and having only a wireless connection, although
not all these characteristics apply to all devices. These are not necessarily
restrictive characteristics, as is illustrated by smartphones [Roussos et al., 2005]
and their role in what is now coined as the Internet of Things [Mattern and
Floerkemeier, 2010; Stankovic, 2014]. Nevertheless, the fact that we often
need to deal with the intricacies of wireless and mobile communication
requires special solutions to make a pervasive system as transparent or
unobtrusive as possible.
In the following, we make a distinction between three different types of
pervasive systems, although there is considerable overlap between the three
types: ubiquitous computing systems, mobile systems, and sensor networks.
This distinction allows us to focus on different aspects of pervasive systems.

Ubiquitous computing systems


So far, we have been talking about pervasive systems to emphasize that
their elements are spread through many parts of our environment. In a
ubiquitous computing system, we go one step further: the system is pervasive
and continuously present. The latter means that a user will be continuously
interacting with the system, often not even being aware that interaction is
taking place. Poslad [2009] describes the core requirements for a ubiquitous
computing system roughly as follows:

1. (Distribution) Devices are networked, distributed, and accessible transparently
2. (Interaction) Interaction between users and devices is highly unobtrusive
3. (Context awareness) The system is aware of a user’s context to optimize
interaction
4. (Autonomy) Devices operate autonomously without human intervention,
and are thus highly self-managed
5. (Intelligence) The system as a whole can handle a wide range of dy-
namic actions and interactions

Let us consider these requirements from a distributed-systems perspective.

Ad. 1: Distribution As mentioned, a ubiquitous computing system is an
example of a distributed system: the devices and other computers forming
the nodes of a system are simply networked and work together to form
the illusion of a single, coherent system. Distribution also comes naturally:
there will be devices close to users (such as sensors and actuators), connected
to computers hidden from view and perhaps even operating remotely in a
cloud. Most, if not all, of the requirements regarding distribution transparency
mentioned in Section 1.2.2 should therefore hold.

Ad. 2: Interaction When it comes to interaction with users, ubiquitous
computing systems differ considerably from the systems we have been
discussing so far. End users play a prominent role in the design of ubiquitous
systems, meaning that special attention needs to be paid to how the interac-
tion between users and core system takes place. For ubiquitous computing
systems, much of the interaction by humans will be implicit, with an implicit
action being defined as one “that is not primarily aimed to interact with a com-
puterized system but which such a system understands as input” [Schmidt,
2000]. In other words, a user could be mostly unaware of the fact that input is
being provided to a computer system. From a certain perspective, ubiquitous
computing can be said to seemingly hide interfaces.
A simple example is where the settings of a car’s driver’s seat, steering
wheel, and mirrors are fully personalized. If Bob takes a seat, the system will
recognize that it is dealing with Bob and subsequently make the appropriate
adjustments. The same happens when Alice uses the car, while an unknown
user will be steered toward making his or her own adjustments (to be remem-
bered for later). This example already illustrates an important role of sensors
in ubiquitous computing, namely as input devices that are used to identify a
situation (a specific person apparently wanting to drive), the analysis of which
leads to actions (making adjustments). In turn, the actions may lead to natural
reactions, for example that Bob slightly changes the seat settings. The system
will have to take all (implicit and explicit) actions by the user into account
and react accordingly.

Ad. 3: Context awareness Reacting to the sensory input, but also the explicit
input from users, is more easily said than done. What a ubiquitous computing
system needs to do is to take into account the context in which interactions
take place. Context awareness also differentiates ubiquitous computing
systems from the more traditional systems discussed so far; context itself is
described by Dey and Abowd [2000] as “any information that can be
used to characterize the situation of entities (i.e., whether a person, place, or
object) that are considered relevant to the interaction between a user and an
application, including the user and the application themselves.” In practice,
context is often characterized by location, identity, time, and activity: the where,
who, when, and what. A system will need to have the necessary (sensory) input
to determine one or several of these context types. As discussed by Alegre
et al. [2016], developing context-aware systems is difficult, if only for the
reason that the notion of context is difficult to grasp.
What is important, from a distributed-systems perspective, is that raw
data as collected by various sensors is lifted to a level of abstraction that can
be used by applications. A concrete example is detecting where a person is,
for example in terms of GPS coordinates, and subsequently mapping that
information to an actual location, such as the corner of a street, or a specific
shop or other known facility. The question is where this processing of sensory
input takes place: is all data collected at a central server connected to a
database with detailed information on a city, or is it the user’s smartphone
where the mapping is done? Clearly, there are trade-offs to be considered.
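
As a small illustration of such lifting of raw data, the following sketch maps
a GPS fix to the nearest entry in a hypothetical table of known places; the
same code could, in principle, run on the smartphone or on a central server.
All names and coordinates are made up.

from math import radians, sin, cos, asin, sqrt

# A made-up table of known places: name -> (latitude, longitude).
KNOWN_PLACES = {
    "corner of Main St and 2nd Ave": (52.3731, 4.8925),
    "Central Station": (52.3789, 4.9003),
    "north entrance of City Park": (52.3600, 4.8850),
}

def distance_km(p, q):
    # Great-circle (haversine) distance between two (lat, lon) points.
    lat1, lon1, lat2, lon2 = map(radians, p + q)
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * 6371 * asin(sqrt(a))

def to_symbolic_location(gps_fix):
    # Lift a raw GPS reading to an application-level notion of "where".
    return min(KNOWN_PLACES, key=lambda name: distance_km(gps_fix, KNOWN_PLACES[name]))

print(to_symbolic_location((52.3740, 4.8930)))    # -> "corner of Main St and 2nd Ave"
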
Dey [2010] discusses more general approaches toward building context-
aware applications. When it comes to combining flexibility and potential
distribution, so-called shared data spaces in which processes are decoupled
in time and space are attractive, yet as we shall see in later chapters, suffer
from scalability problems. A survey on context-awareness and its relation to
middleware and distributed systems is provided by Baldauf et al. [2007].

Ad. 4: Autonomy An important aspect of most ubiquitous computing sys-
tems is that explicit systems management has been reduced to a minimum. In
a ubiquitous computing environment, there is simply no room for a systems
administrator to keep everything up and running. As a consequence, the
system as a whole should be able to act autonomously, and automatically
react to changes. This requires a myriad of techniques, of which several will
be discussed throughout this book. To give a few simple examples, think of
the following:

Address allocation: In order for networked devices to communicate, they
need an IP address. Addresses can be allocated automatically using pro-
tocols like the Dynamic Host Configuration Protocol (DHCP) [Droms,
1997] (which requires a server) or Zeroconf [Guttman, 2001].

Adding devices: It should be easy to add devices to an existing system. A
step towards automatic configuration is realized by the Universal Plug
and Play protocol (UPnP) [UPnP Forum, 2008]. Using UPnP, devices can
discover each other and make sure that they can set up communication
channels between them.

Automatic updates: Many devices in a ubiquitous computing system should
be able to regularly check through the Internet if their software should
be updated. If so, they can download new versions of their components
and ideally continue where they left off.

Admittedly, these are simple examples, but the picture should be clear that
manual intervention is to be kept to a minimum. We will be discussing many
techniques related to self-management in detail throughout the book.

Ad. 5: Intelligence Finally, Poslad [2009] mentions that ubiquitous com-
puting systems often use methods and techniques from the field of artificial
intelligence. What this means is that often a wide range of advanced algo-
rithms and models need to be deployed to handle incomplete input, quickly
react to a changing environment, handle unexpected events, and so on. The
extent to which this can or should be done in a distributed fashion is cru-
cial from the perspective of distributed systems. Unfortunately, distributed
solutions for many problems in the field of artificial intelligence are yet to
be found, meaning that there may be a natural tension between the first
requirement of networked and distributed devices, and advanced distributed
information processing.

Mobile computing systems


As mentioned, mobility often forms an important component of pervasive
systems, and many, if not all aspects that we have just discussed also apply to
mobile computing. There are several issues that set mobile computing apart
from pervasive systems in general (see also Adelstein et al. [2005] and Tarkoma
and Kangasharju [2009]).
First, the devices that form part of a (distributed) mobile system may
vary widely. Typically, mobile computing is done with devices such as
smartphones and tablet computers. However, entirely different types of
devices are now using the Internet Protocol (IP) to communicate, placing
mobile computing in a different perspective. Such devices include remote
controls, pagers, active badges, car equipment, various GPS-enabled devices,
and so on. A characteristic feature of all these devices is that they use wireless
communication. Mobile implies wireless, so it seems (although there are
exceptions to the rule).
Second, in mobile computing, the location of a device is assumed to change
over time. A changing location has its effects on many issues. For example, if
the location of a device changes regularly, so will perhaps the services that
are locally available. As a consequence, we may need to pay special attention
to dynamically discovering services, but also letting services announce their
presence. In a similar vein, we often also want to know where a device actually
is. This may mean that we need to know the actual geographical coordinates
of a device such as in tracking and tracing applications, but it may also require
that we can simply detect its network position (as in mobile IP [Perkins, 2010;
Perkins et al., 2011]).
Changing locations may also have a profound effect on communication.
For some time, researchers in mobile computing have been concentrating on
what are known as mobile ad hoc networks, also known as MANETs. The
basic idea was that a group of local mobile computers would jointly set up a
local, wireless network and subsequently share resources and services. The
idea never really became popular. Along the same lines, there are researchers
who believe that end users are willing to share their local resources for another
user’s compute, storage, or communication requirements (see, e.g., Ferrer
et al. [2019]). Practice has shown over and over again that voluntarily making
resources available is not something users are willing to do, even if they have
resources in abundance. The effect is that in the case of mobile computing,
we generally see single mobile devices setting up connections to stationary
servers. Changing locations then simply means that those connections need
to be handed over by routers on the path from the mobile device to the
server. Mobile computing is then brought back to its essence: a mobile device
connected to a server (and nothing else). In practice, this means that mobile
computing is all about mobile devices making use of cloud-based services, as
sketched in Figure 1.15(a).

Figure 1.15: (a) Mobile Cloud Computing versus (b) Mobile Edge Computing.

Despite the conceptual simplicity of this model of mobile computing, the
mere fact that so many devices make use of remote services has led to what is
known as Mobile Edge Computing, or simply MEC [Abbas et al., 2018], in
contrast to Mobile Cloud Computing (MCC). As we shall discuss further in
Chapter 2, (mobile) edge computing, as sketched in Figure 1.15(b), is becoming
increasingly important in those cases where latency, but also computational
issues play a role for the mobile device. Typical example applications that
require short latencies and computational resources include augmented reality,
interactive gaming, real-time sports monitoring, and various health applica-
tions [Dimou et al., 2022]. In these examples, the combination of monitoring,
analyses, and immediate feedback generally makes it difficult to rely on
servers that may be placed thousands of miles from the mobile devices.

Sensor networks

Our last example of pervasive systems is sensor networks. These networks
in many cases form part of the enabling technology for pervasiveness, and
we see that many solutions for sensor networks return in pervasive applica-
tions. What makes sensor networks interesting from a distributed system’s
perspective is that they are more than just a collection of input devices. In-
stead, as we shall see, sensor nodes often collaborate to process the sensed
data efficiently in an application-specific manner, making them very different
from, for example, traditional computer networks. Akyildiz et al. [2002] and
Akyildiz et al. [2005] provide an overview from a networking perspective. A
more systems-oriented introduction to sensor networks is given by Zhao and
Guibas [2004], while [Hahmy, 2021] will also prove useful.
A sensor network generally consists of tens to hundreds or thousands of
relatively small nodes, each equipped with one or more sensing devices. In
addition, nodes can often act as actuators [Akyildiz and Kasimoglu, 2004],
a typical example being the automatic activation of sprinklers when a fire
has been detected. Many sensor networks use wireless communication, and
the nodes are often battery powered. Their limited resources, restricted
communication capabilities, and constrained power consumption demand
that efficiency is high on the list of design criteria.
When zooming in on an individual node, we see that, conceptually, it does
not differ much from a “normal” computer: above the hardware there is a
software layer akin to what traditional operating systems offer, including low-
level network access, access to sensors and actuators, memory management,
and so on. Normally, support for specific services is included, such as
localization, local storage (think of additional flash devices), and convenient
communication facilities such as messaging and routing. However, similar to
other networked computer systems, additional support is needed to effectively
deploy sensor network applications. In distributed systems, this takes the form
of middleware. For sensor networks, we can, in principle, follow a similar
approach in those cases that sensor nodes are sufficiently powerful and that
energy consumption is not a hindrance for running a more elaborate software
stack. Various approaches are possible (see also [Zhang et al., 2021b]).
From a programming perspective, as extensively surveyed by Mottola
and Picco [2011], it is important to take the scope of communication primitives
into account. This scope can vary between addressing the physical neighbor-
hood of a node, and providing primitives for systemwide communication.
In addition, it may also be possible to address a specific group of nodes.
Likewise, computations may be restricted to an individual node, a group
of nodes, or affect all nodes. To illustrate, Welsh and Mainland [2004] use
so-called abstract regions allowing a node to identify a neighborhood from
where it can, for example, gather information:

1 region = k_nearest_region.create(8);
2 reading = get_sensor_reading();
3 region.putvar(reading_key, reading);
4 max_id = region.reduce(OP_MAXID, reading_key);

In line 1, a node first creates a region of its eight nearest neighbors, after which
it fetches a value from its sensor(s). This reading is subsequently written to
the previously defined region under the key reading_key. In
line 4, the node checks whose sensor reading in the defined region was the
largest, which is returned in the variable max_id.
When considering that sensor networks produce data, one can also focus
on the data-access model. Data can be accessed directly by sending messages
to and between nodes, or by moving code between nodes to access data
locally. More advanced is to make remote data directly accessible, as if variables
and such were available in a shared data space. Finally, and also quite popular,
is to let the sensor network provide a view of a single database. Such a
view is easy to understand when realizing that many sensor networks are
deployed for measurement and surveillance applications [Bonnet et al., 2002].
In these cases, an operator would like to extract information from (a part of)
the network by simply issuing queries such as “What is the northbound traffic
load on highway 1 at Santa Cruz?” Such queries resemble those of traditional
databases. In this case, the answer will probably need to be provided through
collaboration of many sensors along highway 1, while leaving other sensors
untouched.
To organize a sensor network as a distributed database, there are essentially
two extremes, as shown in Figure 1.16. First, sensors do not cooperate but
simply send their data to a centralized database located at the operator’s site.
The other extreme is to forward queries to relevant sensors and to let each
compute an answer, requiring the operator to aggregate the responses.
Neither of these solutions is very attractive. The first one requires that
sensors send all their measured data through the network, which may waste
network resources and energy. The second solution may also be wasteful,
as it discards the aggregation capabilities of sensors, which would allow
far less data to be returned to the operator. What is needed are facilities
for in-network data processing, similar to the previous example of abstract
regions.
In-network processing can be done in numerous ways. One obvious way
is to forward a query to all sensor nodes along a tree encompassing all nodes
and to subsequently aggregate the results as they are propagated back to the
root, where the initiator is located. Aggregation will take place where two
or more branches of the tree come together. As simple as this scheme may
sound, it introduces difficult questions:
• How do we (dynamically) set up an efficient tree in a sensor network?
• How does aggregation of results take place? Can it be controlled?
• What happens when network links fail?

Figure 1.16: Organizing a sensor network database, while storing and processing
data (a) only at the operator’s site or (b) only at the sensors.

These questions have been partly addressed in TinyDB, which implements
a declarative (database) interface to wireless sensor networks [Madden et al.,
2005]. In essence, TinyDB can use any tree-based routing algorithm. An
intermediate node will collect and aggregate the results from its children,
along with its own findings, and send that toward the root. To make matters
efficient, queries span a period of time, allowing for careful scheduling of
operations so that network resources and energy are optimally consumed.
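
To illustrate the principle of in-network aggregation (and not TinyDB itself),
the following sketch answers a MAX query along a spanning tree: each node
combines its own reading with the partial results of its children, so only a
single value travels across each link. The tree and the readings are made-up
example data.

# Made-up routing tree and sensor readings; each node knows only its children.
children = {"root": ["n1", "n2"], "n1": ["n3", "n4"], "n2": [], "n3": [], "n4": []}
reading = {"root": 17, "n1": 21, "n2": 35, "n3": 28, "n4": 19}

def query_max(node):
    # A node aggregates the partial answers of its children with its own reading
    # and forwards only the result, so a single value travels over each tree link.
    partial_results = [query_max(child) for child in children[node]]
    return max([reading[node]] + partial_results)

print(query_max("root"))        # -> 35, without shipping all raw readings to the root
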
However, when queries can be initiated from different points in the net-
work, using single-rooted trees such as in TinyDB may not be efficient enough.
As an alternative, sensor networks may be equipped with special nodes to
which results are forwarded, as well as the queries related to those results. To give
a simple example, queries and results related to temperature readings may be
collected at a different location than those related to humidity measurements.
This approach corresponds directly to the notion of publish-subscribe systems.
The interesting aspect of sensor networks, as discussed along these lines,
is that we really need to concentrate on the organization of sensor nodes, and
not the sensors themselves. Likewise, many sensor nodes will be equipped
with actuators, i.e., devices that directly influence an environment. A typical
actuator is one that controls the temperature in a room, or switches devices
on or off. By viewing and organizing the network as a distributed system, an
operator is provided with a higher level of abstraction to monitor and control
a situation.

Cloud, edge, things

As may have become clear by now, distributed systems span a huge range of
different networked computer systems. Many of such systems operate in a
setting in which the various computers are connected through a local-area
network. Yet with the growth of the Internet-of-Things and the connectivity
with remote services offered through cloud-based systems, new organizations
across wide-area networks are emerging. Figure 1.17 presents this more
hierarchical approach.

Figure 1.17: A hierarchical view from clouds to devices (adapted from Yousefpour
et al. [2019]).
Higher up the hierarchy, we see that typical qualities of distributed systems
improve: they become more reliable, have more capacity, and, in general,
perform better. Lower in the hierarchy, location-related aspects are more
easily facilitated, as are performance qualities related to latencies. At the
same time, the lower parts show an increase in the number of devices and
computers, whereas higher up, the number of computers decreases.

1.4 Pitfalls
It should be clear by now that developing a distributed system is a formidable
task. As we will see many times throughout this book, there are so many
issues to consider that it may seem complexity is the only possible result.
Nevertheless, by following several design principles, distributed systems can
be developed that strongly adhere to the goals we set out in this chapter.
Distributed systems differ from traditional software because components
are dispersed across a network. Not taking this dispersion into account during
design time is what makes so many systems needlessly complex and results in
flaws that need to be patched later on. Peter Deutsch, at the time working at
Sun Microsystems, formulated these flaws as the following false assumptions
that many make when developing a distributed application for the first time:

• The network is reliable
• The network is secure
• The network is homogeneous
• The topology does not change
• Latency is zero
• Bandwidth is infinite
• Transport cost is zero
• There is one administrator

Note how these assumptions relate to properties that are unique to dis-
tributed systems: reliability, security, heterogeneity, and topology of the
network; latency and bandwidth; transport costs; and finally administrative
domains. When developing nondistributed applications, most of these issues
will likely not show up.
Most of the principles we discuss in this book relate immediately to these
assumptions. In all cases, we will be discussing solutions to problems that
are caused by the fact that one or more assumptions are false. For example,
reliable networks simply do not exist, which makes it impossible to achieve
failure transparency. We devote an entire chapter to deal with the fact that
networked communication is inherently insecure. We have already argued
that distributed systems need to be open and take heterogeneity into account.
Likewise, when discussing replication for solving scalability problems, we
are essentially tackling latency and bandwidth problems. We will also touch
upon management issues at various points throughout this book.
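
To give just a taste of what dropping the first assumption means in code, the
following sketch wraps a placeholder remote call with a timeout-and-retry
policy instead of assuming that every request will arrive; the retry parameters
and the failure probability are arbitrary choices for the example.

import random, time

def flaky_remote_call():
    # Stand-in for a request over a network that is *not* reliable.
    if random.random() < 0.5:
        raise TimeoutError("request lost")
    return "reply"

def call_with_retries(call, attempts=3, backoff=0.2):
    # Retry with increasing delays instead of assuming the request always arrives.
    for i in range(attempts):
        try:
            return call()
        except TimeoutError:
            if i == attempts - 1:
                raise                      # give up and let the caller handle failure
            time.sleep(backoff * 2**i)

try:
    print(call_with_retries(flaky_remote_call))
except TimeoutError:
    print("remote side unreachable; degrade gracefully")
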

1.5 Summary
A distributed system is a collection of networked computer systems in which
processes and resources are spread across different computers. We make a
distinction between sufficiently and necessarily spread, where the latter relates
to decentralized systems. This distinction is important to make, as spreading
processes and resources cannot be considered to be a goal by itself. Instead,
most choices for building a distributed system stem from the need to im-
prove the performance of a single computer system in terms of, for example,
reliability, scalability, and efficiency. However, considering that most cen-
tralized systems are still much easier to manage and maintain, one should
think twice before deciding to spread processes and resources. There are also
cases when there is simply no choice, for example when connecting systems
belonging to different organizations, or when computers simply operate from
different locations (as in mobile computing).
Design goals for distributed systems include sharing resources and ensur-
ing openness. Increasingly important is designing secure distributed systems.
In addition, designers aim at hiding many of the intricacies related to the
distribution of processes, data, and control. However, this distribution trans-
parency not only comes at a performance price, in practical situations it can
never be fully achieved. The fact that trade-offs need to be made between
achieving various forms of distribution transparency is inherent to the design
of distributed systems, and can easily complicate their understanding. One
specific difficult design goal that does not always blend well with achieving
distribution transparency is scalability. This is particularly true for geographi-
cal scalability, in which case hiding latencies and bandwidth restrictions can
turn out to be difficult. Likewise, administrative scalability, by which a system
is designed to span multiple administrative domains, may easily conflict with
goals for achieving distribution transparency.
Different types of distributed systems exist which can be classified as
being oriented toward supporting computations, information processing, and
pervasiveness. Distributed computing systems are typically deployed for
high-performance applications, often originating from the field of parallel
computing. A field that emerged from parallel processing was initially grid
computing with a strong focus on worldwide sharing of resources, in turn
leading to what is now known as cloud computing. Cloud computing goes
beyond high-performance computing and also supports distributed systems
found in traditional office environments, where we see databases playing
an important role. Typically, transaction processing systems are deployed
in these environments. Finally, an emerging class of distributed systems is
one in which components are small, the system is composed in an ad hoc
fashion, and, most of all, is no longer managed by a system administrator. This last
class is typically represented by pervasive computing environments, including
mobile-computing systems as well as sensor-rich environments.
Matters are further complicated by the fact that many developers initially
make assumptions about the underlying network that are fundamentally
wrong. Later, when assumptions are dropped, it may turn out to be difficult
to mask unwanted behavior. A typical example is assuming that network
latency is not significant. Other pitfalls include assuming that the network is
reliable, static, secure, and homogeneous.
