Fallacies of Distributed Computing Explained
Arnon Rotem-Gal-Oz
The software industry has been building distributed systems for several
decades. Two early examples are the US Department of Defense's ARPANET
(which eventually evolved into the Internet), established back in 1969,
and the SWIFT protocol (used for money transfers), established in the
same time frame [Britton2001]. Nevertheless, the same mistaken
assumptions--the fallacies of distributed computing--keep tripping up
the designers of distributed systems.
This whitepaper looks at each of these fallacies, explains them, and
checks their relevancy for distributed systems today.
Latency is zero
The second fallacy of distributed computing is the assumption that
"latency is zero". Latency is how much time it takes for data to move
from one place to another (versus bandwidth, which is how much data
we can transfer during that time). Latency can be relatively good on a
LAN--but it deteriorates quickly when you move to WAN or internet
scenarios.
Latency is often more problematic than bandwidth. A post by Ingo
Rammer on latency vs. bandwidth [Ingo] illustrates this point well.
You may think all is okay if you only deploy your application on LANs.
However, even when you work on a LAN with Gigabit Ethernet, you
should still bear in mind that the latency is much bigger than that of
accessing local memory. Assuming latency is zero, you can easily be
tempted to assume that making a call over the wire is almost like
making a local call. This is one of the problems with approaches like
distributed objects that provide "network transparency"--alluring you
to make a lot of fine-grained calls to objects which are actually remote
and (relatively) expensive to call.
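To see why fine-grained remote calls hurt, here is a minimal Python sketch. The `remote_get_field`/`remote_get_fields` functions and the 2 ms round-trip are hypothetical stand-ins for a real remote-object call; the point is only the ratio between five chatty calls and one coarse-grained call:

```python
import time

NETWORK_LATENCY = 0.002  # assumed 2 ms round-trip per remote call

def remote_get_field(obj_id, field):
    """Stand-in for a fine-grained remote call: each invocation pays one round-trip."""
    time.sleep(NETWORK_LATENCY)
    return f"{obj_id}.{field}"

def remote_get_fields(obj_id, fields):
    """Coarse-grained variant: one round-trip returns all fields at once."""
    time.sleep(NETWORK_LATENCY)
    return {f: f"{obj_id}.{f}" for f in fields}

fields = ["name", "address", "phone", "email", "status"]

start = time.perf_counter()
fine = [remote_get_field(42, f) for f in fields]   # 5 round-trips
fine_elapsed = time.perf_counter() - start

start = time.perf_counter()
coarse = remote_get_fields(42, fields)             # 1 round-trip
coarse_elapsed = time.perf_counter() - start

print(f"fine-grained:   {fine_elapsed * 1000:.1f} ms")
print(f"coarse-grained: {coarse_elapsed * 1000:.1f} ms")
```

With "network transparency" hiding the wire, nothing in the fine-grained calling code hints that it costs five times as much as the batched version.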
Another example is AJAX. The AJAX approach allows for using the dead
time the users spend digesting data to retrieve more data--however,
you still need to consider latency. Let's say you are working on a shiny
new AJAX front-end. Everything looks just fine in your testing
environment, and it also shines in your staging environment, passing
the load tests with flying colors. The application can still fail miserably
in the production environment if you fail to test for latency problems.
Retrieving data in the background is good, but if you can't do that fast
enough the application will still stagger and be unresponsive.
(You can read more on AJAX and latency in [RichUI].)
You can (and should) use tools like Shunra Virtual Enterprise, Opnet
Modeler, and many others to simulate network conditions and
understand system behavior, thus avoiding failure in the production
system.
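Short of a full network emulator, you can at least approximate latency in unit tests. The Python sketch below is only an illustration of the idea--the `with_simulated_latency` decorator and `fetch_orders` function are made-up names, not part of any real tool--delaying each call by a random WAN-like interval:

```python
import functools
import random
import time

def with_simulated_latency(min_ms=50, max_ms=150):
    """Decorator that delays each call by a random interval in [min_ms, max_ms].

    A crude, in-process stand-in for network-emulation tools, useful for
    making latency-sensitive bugs show up in ordinary tests.
    """
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            time.sleep(random.uniform(min_ms, max_ms) / 1000.0)
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@with_simulated_latency(min_ms=80, max_ms=120)
def fetch_orders(customer_id):
    # Hypothetical data-access call; in production this would hit a remote service.
    return [{"customer": customer_id, "order": 1}]

start = time.perf_counter()
orders = fetch_orders(7)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"fetch_orders took {elapsed_ms:.0f} ms")
```

If your UI code survives tests with such a decorator applied to its data-access layer, it is far less likely to stagger when real WAN latency shows up.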
Bandwidth is infinite
The next Distributed Computing Fallacy is "Bandwidth Is Infinite." This
fallacy, in my opinion, is not as strong as the others. If there is one
thing that is constantly getting better in relation to networks it is
bandwidth.
The other force at work to lower effective bandwidth is packet loss
(along with frame size).
The network is secure
In case you just landed from another planet: the network is far from
being secure. Here are a few statistics to illustrate that.
When I tried to find some updated incident statistics, I came up with the
following [CERT]:
Lastly, Aladdin claims that the costs of malware for 2004 (viruses,
worms, Trojans, etc.) are estimated at between $169 billion and $204
billion [Aladdin].
Topology doesn't change
The fifth distributed computing fallacy is about network topology:
"Topology doesn't change." That's right, it doesn't--as long as it stays
in the test lab.
When you're talking about clients, the situation is even worse. There
are laptops coming and going, wireless ad-hoc networks, and new
mobile devices. In short, topology is changing constantly.
What does this mean for the applications we write? Simple: try not to
depend on specific endpoints or routes, and if you can't avoid that, be
prepared to renegotiate endpoints. Another implication is that you
would want to either provide location transparency (e.g. using an ESB
or multicast) or provide discovery services (e.g. Active
Directory/JNDI/LDAP).
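As a sketch of not depending on a specific endpoint, the following Python snippet resolves a service by name at call time instead of pinning an IP address. The `resolve_service` helper is illustrative, and `localhost` stands in for a real service name that would be published in DNS or a directory service:

```python
import socket

def resolve_service(hostname, port):
    """Look up the current addresses for a service by name.

    Returns a list of (address, port) candidates; a caller prepared for
    topology changes can fall back to the next candidate on failure.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry's sockaddr starts with (address, port); keep just those.
    return [info[4][:2] for info in infos]

# "localhost" keeps the example runnable anywhere; a real system would
# resolve a service name registered in DNS/Active Directory/LDAP.
candidates = resolve_service("localhost", 8080)
print(candidates)
```

Resolving at call time (rather than once at startup) means a moved or re-addressed service is picked up on the next lookup instead of requiring a redeployment.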
There is one administrator
At this point you may say, "Okay, there is more than one
administrator. But why should I care?" Well, as long as everything
works, maybe you don't care. You do care, however, when things go
astray and there is a need to pinpoint a problem (and solve it). For
example, I recently had a problem with an ASP.NET application that
required full trust on a hosting service that only allowed medium
trust--the application had to be reworked (since changing the hosting
service was not an option) in order to work.
To sum up: when there is more than one administrator (unless we are
talking about a simple system, and even that can evolve later if it is
successful), you need to remember that administrators can constrain
your options (administrators set disk quotas, limit privileges, restrict
ports and protocols, and so on), and that you need to help them
manage your applications.
Transport cost is zero
The next fallacy, "Transport cost is zero," can be interpreted in two
ways. One way is that going from the application level to the transport
level is free. This is a fallacy, since we have to do marshaling (serialize
information into bits) to get data onto the wire, which both takes
computer resources and adds to the latency. Interpreting the
statement this way emphasizes the "Latency is zero" fallacy by
reminding us that there are additional costs (both in time and
resources).
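The marshaling cost is easy to make visible. This small Python sketch uses JSON purely as an illustrative wire format (the payload is made up) and times serializing and deserializing a thousand records:

```python
import json
import time

# A hypothetical payload: 1,000 small records to send over the wire.
payload = [{"id": i, "name": f"item-{i}", "price": i * 0.5} for i in range(1000)]

start = time.perf_counter()
wire_bytes = json.dumps(payload).encode("utf-8")   # marshal: objects -> bytes
marshal_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
restored = json.loads(wire_bytes.decode("utf-8"))  # unmarshal: bytes -> objects
unmarshal_ms = (time.perf_counter() - start) * 1000

print(f"{len(wire_bytes)} bytes on the wire; "
      f"marshal {marshal_ms:.2f} ms, unmarshal {unmarshal_ms:.2f} ms")
```

The CPU time spent here is paid on every request, on both ends of the wire--on top of whatever the network itself charges in latency.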
The second way to interpret the statement is that the costs (as in cash
money) of setting up and running the network are zero. This is also far
from being true. There are costs--costs for buying the routers, costs
for securing the network, costs for leasing the bandwidth for internet
connections, and costs for operating and keeping the network running.
Someone, somewhere will have to pick up the tab and pay these costs.
The network is homogeneous
While the first seven fallacies were coined by Peter Deutsch, I read
[JDJ2004] that the eighth fallacy was added by James Gosling six
years later (in 1997).
Most architects today are not naïve enough to assume this fallacy. Any
network, except maybe a very trivial one, is not homogeneous. Heck,
even my home network has a Linux-based HTPC, a couple of
Windows-based PCs, a (small) NAS, and a Windows Mobile 2005
device--all connected by a wireless network. What's true of a home
network is almost a certainty in enterprise networks. I believe that a
homogeneous network today is the exception, not the rule. Even if you
managed to keep your internal network homogeneous, you would hit
this problem when you tried to cooperate with a partner or a supplier.
Assuming this fallacy should not cause too much trouble at the lower
network levels, as IP is pretty much ubiquitous (e.g. even a specialized
bus like InfiniBand has an IP-over-IB implementation, although it may
result in suboptimal use of the non-native IP resources).
I hope that reading this paper has both helped explain what the
fallacies mean and provided some guidance on what to do to avoid
their implications.
References
[Britton2001] C. Britton, IT Architecture & Middleware, Addison-Wesley, 2001, ISBN 0-201-70907-4
[JDJ2004] http://java.sys-con.com/read/38665.htm
[Gosling] http://blogs.sun.com/roller/page/jag
[Ingo] http://blogs.thinktecture.com/ingo/archive/2005/11/08/LatencyVsBandwidth.aspx
[RichUI] http://richui.blogspot.com/2005/09/ajax-latency-problems-myth-or-reality.html
[WareOnEarth] http://sd.wareonearth.com/~phil/jumbo.html
[Aladdin] http://www.esafe.com/home/csrt/statistics/statistics_2005.asp
[RipTech] http://www.riptech.com/
[CERT] http://www.cert.org/stats/#incidents
[Adams] http://www.dilbert.com/comics/dilbert/archive/dilbert-20060516.html