Fundamentals of Cloud Application Architecture

Contributors

Justin Y. Shi
Temple University
Philadelphia, Pennsylvania
Part I

Fundamentals of Cloud Computing
1
Fundamentals of Cloud Application Architecture

Justin Y. Shi
Temple University
CONTENTS

1.1 Introduction 3
1.2 Necessary and Sufficient Conditions 5
  1.2.1 Necessary Conditions 6
  1.2.2 NCA and Sufficient Conditions 7
1.3 Unit of Transmission (UT) 9
1.4 Mission Critical Application Architecture: A First Example 9
1.5 Maximally Survivable Transaction Processing 12
  1.5.1 Maximal Survivability and Performance Scalability 13
  1.5.2 Transaction Processing Failure Model 13
  1.5.3 Parallel Synchronous Transaction Replication 15
  1.5.4 Transaction Trust Model 17
  1.5.5 Non-Stop Resynchronization - 2PCr Protocol 17
  1.5.6 ACID Properties and RML 18
  1.5.7 Cluster Failure Model 20
  1.5.8 Lossless Transaction Processing 21
  1.5.9 Cloud-Ready VLDB Application Development 21
  1.5.10 Unlimited Performance Scalability 23
1.6 Maximally Survivable High Performance Computing 26
  1.6.1 Protection for the “Bag of Tasks” 26
  1.6.2 Data Parallel Programming Using Tuple Space 30
  1.6.3 Stateless Parallel Processing Machine 31
  1.6.4 Extreme Parallel Processing Efficiency 34
  1.6.5 Performance Scalability 37
  1.6.6 Automatic Parallel Program Generation – The Higher Dimension 38
1.7 Summary 42
1.8 Acknowledgements 44
1.1 Introduction
The economy of scale of cloud computing has ignited widespread imagination. With the promise of mighty computing power at unbelievable prices, all applications are poised to gain mission critical status. The fundamentals of networked computing have steadily improved: faster speed, better reliability and lower costs. These improvements have brought the power of networked computing to the masses.
Now we are confronted with the nagging survivability issues of networked computing applications (NCAs). Server virtualization, the key technology that has made cloud computing possible, is cost-effective but merely shifts the service locations. The technological challenges remain.

It seems that history is repeating itself. The same problems we solved four decades ago for the telecommunication industry have come back to challenge us at a higher level for NCAs.

We ask: what is the generic “DNA” sequence in the packet switching network that has made it possible to create maximally survivable systems? How can this sequence be “inherited” by networked computing applications to achieve similar results? What are the necessary and sufficient conditions for maximally survivable systems?
To date, the “store-and-forward” network has proven to be the most survivable architecture. The Internet has inherited this “DNA” sequence by wrapping packet switching circuits around circuit switching networks.
Looking deeper, the “store-and-forward” packet switching protocol is a
clever utilization of multiple redundancy types for the communication sys-
tem: with a given unit of transmission (a data packet), a packet switching
network supports transient spatial redundancy (store), temporal redundancy
(forward with re-transmission) and passive spatial redundancy (routers and
switches). These essential statistical multiplexing methods afford: a) the ability to mitigate massive component failures; b) the potential to leverage parallel processing for performance scalability; and c) provably least-cost fault tolerance.

FIGURE 1.1
Packet Switching Network
The primary winning argument for the packet switching network was its provably least-cost and maximally survivable design. Unlimited performance scalability came as a welcome after-effect, a benefit that can only be delivered after the system has reached sufficient scale.

The maximal survivability feature allowed decades of continued innovation. After quite a few years, the “world-wide-wait” network eventually became the world-wide web that is poised to change the landscape of our lives.
FIGURE 1.2
Enterprise Service Bus

The benefits of the packet-level network are not automatic for the higher layers. Therefore, the necessary conditions for gaining the same benefits at a higher level should be somewhat similar to those at the packet level.
The sufficient conditions include all those required for correct processing of the respective NCA. These are application-dependent conditions.

Unlike low-level data communications, the NCA semantic network typically carries complex application dependencies and must satisfy non-trivial semantics. Low-level data packets can be transmitted and re-transmitted without violating the semantics of data communication; it is not clear how to define application-dependent UTs that can be supported at the architecture level without breaking application processing correctness. Poor “packetization” could violate crucial application dependencies (too little state information) or incur excessive communication overheads (too much state information). It becomes a non-trivial exercise to embed the generic “DNA” sequence into any NCA.
For example, the Enterprise Service Bus (ESB) is the core of Service Oriented Architectures (SOA). It consists of a set of message servers dispatching requests and replies between clients and an assortment of heterogeneous servers (Figure 1.2).
A sustainable cloud-ready ESB should meet the following (sufficient) requirements:

1. Maximal survivability.
2. Unlimited performance scalability.
3. Lossless message processing.
FIGURE 1.3
ESB with Spatial Redundancy

Figure 1.4 shows a conceptual lossless ESB system with a re-transmission API at all clients and without the disk replication harness.
Although the differences are subtle, the benefits are non-trivial.
In Figure 1.3, the disk replication harness adds significant overheads to each message transmission, whether synchronous or asynchronous. In asynchronous mode, message losses cannot be avoided and the overall ESB speed must be throttled to avoid replication queue overflow. In synchronous mode, the two-phase-commit replication protocol imposes a severe performance penalty on each message transfer. Adding message queue managers increases the replication overheads linearly. Amongst the three operating objectives, only one can be attempted by any production system. This problem persists even if the system in Figure 1.3 is moved to a computing cloud.
In Figure 1.4, the disk replication harness is removed. There will be zero
replication overhead. This enables unlimited performance scalability: one can
add as many messaging servers as the application requires. There will be zero
message losses since all clients hold the transient storage of their own messages
until confirmed delivery. Further, the re-transmission logic will automatically
discover alternative message routes and recover lost messages. All we need to
supply is passive ESB redundancy in different location(s). A passive ESB system needs much less maintenance (similar to a network router) than an active system. Properly configured, this system can automatically survive multiple simultaneous component failures.

The critical observation in all of this is that message storage is entirely transient. Replicating transient messages on permanent storage is absolutely unnecessary.
This scheme can be further improved if we enhance the re-transmission protocol to leverage the passive ESB hardware. For example, we can build an interconnecting network of ESB servers where each server runs an enhanced message store-and-forward protocol at each stage. The result is an extreme-scale ESB infrastructure that can meet all three operating objectives simultaneously. Implementing the improved ESB in a computing cloud can result in a provably optimal messaging architecture.

FIGURE 1.4
ESB with Re-Transmission API and Passive Redundancy
It is also worth mentioning that not all messages need confirmed delivery. Non-mission-critical messages can be sent without delivery confirmation. In other words, there should be two kinds of messaging services in the API: lossless with automatic re-transmission (like TCP), and best-effort (like UDP).

Since none of the existing messaging systems is built as shown in Figure 1.4, these applications are really using UDP-like messaging without delivery guarantees.
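A minimal sketch of such a client-side lossless send is given below. The esb_send()/esb_wait_ack() primitives, the route rotation and the retry bound are illustrative assumptions, not an existing ESB API; the point is only that the client keeps its own transient copy of each message until delivery is confirmed.

    /* Sketch: lossless send with client-held transient storage and
     * automatic re-transmission over redundant routes. The transport
     * primitives below are stubs for illustration only. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_RETRIES 5
    #define N_ROUTES    2

    static int esb_send(int route, const char *msg, size_t len)
    {
        printf("sending %zu bytes via route %d\n", len, route);
        return 0;                       /* pretend the send succeeded */
    }

    static int esb_wait_ack(int route, int timeout_ms)
    {
        (void)timeout_ms;
        return route == 1 ? 0 : -1;     /* pretend only route 1 confirms */
    }

    /* The client holds the message and re-transmits, rotating routes,
     * until a delivery confirmation arrives. */
    static int send_lossless(const char *msg)
    {
        size_t len = strlen(msg);
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            int route = attempt % N_ROUTES;  /* try an alternative route */
            if (esb_send(route, msg, len) == 0 &&
                esb_wait_ack(route, 1000) == 0)
                return 0;                    /* confirmed delivery */
        }
        return -1;                           /* caller still owns its copy */
    }

    int main(void)
    {
        if (send_lossless("order #42: commit") != 0)
            fprintf(stderr, "delivery not confirmed; client keeps its copy\n");
        return 0;
    }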
This example also alludes to the potential difficulties and significant benefits of introducing statistical multiplexing to other basic NCA architectures. The next sections describe the challenges and solutions for embedding the “store-and-forward” sequence in transaction processing and high performance computing systems.
FIGURE 1.5
Conceptual Diagram of DB x

Standard SQL databases support four transaction isolation levels [8]:

1. Serializable
2. Repeatable Read
3. Read Committed
4. Read Uncommitted (dirty read)

Finer-grained locks give better performance but increase the risk of deadlocks. The discipline of cluster lock optimization is exactly the same as that of database locks.
For practical purposes, we will also need two more tags to complete RML. Figure 1.6 illustrates the automatic re-transmission pattern:

    Count = 0;
    Tid = GetTransID();
    Repeat:
        Begin Tran:
            Update1;          // Insert Tid into sys$trans$table
            Update2;
            ...               // Other operations
        End Tran;
        Count = Count + 1;
    Until committed or Count > MaxRetries;  // retry bound assumed

FIGURE 1.6
Automatic Re-transmission
FIGURE 1.7
Replicated Partitioned Database (P = 3, R = 2)

FIGURE 1.8
K-Order Shift Mirroring (K = P = 4)
1. Parallel partial joins. For example, for tables A and B, each partitioned four ways, there are 16 partial join pairs [A0 ∗ B0], [A0 ∗ B1], [A0 ∗ B2], [A0 ∗ B3], [A1 ∗ B0], [A1 ∗ B1], ..., [A3 ∗ B3]. All 16 pairs can be executed in parallel on four servers; a scheduling sketch follows below.
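The snippet below enumerates the P × P partial join pairs and assigns them round-robin to P servers. The round-robin assignment rule is an illustrative assumption; any balanced mapping works.

    /* Enumerate the P*P partial join pairs [Ai * Bj] of a P-way
     * partitioned database and assign them round-robin to P servers. */
    #include <stdio.h>

    #define P 4

    int main(void)
    {
        int pair = 0;
        for (int i = 0; i < P; i++)
            for (int j = 0; j < P; j++, pair++)
                printf("[A%d * B%d] -> server %d\n", i, j, pair % P);
        return 0;
    }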
FIGURE 1.9
Message-based “Bag of Tasks” Parallel Processing
Further, we define α1 as the master failure rate and α2 as the worker failure rate. Thus,

$$\alpha = \alpha_1 + \alpha_2.$$

Assuming failure occurs only once per check-point interval and all failures are independent, the expected running time per check-point interval is

$$E = K_0 + t_0 + \alpha\, t_0 \left(K_1 + \frac{t_0}{2}\right)$$

without worker fault tolerance, and

$$E' = K_0 + t_0 + \alpha_1 t_0 \left(K_1 + \frac{t_0}{2}\right) + \alpha_2 t_0 X$$

with worker fault tolerance, where X is the recovery time for lost worker work. (These per-interval forms follow from equations (1.3) and (1.5) below.) We can then compute the difference E − E′. Since α2 = α − α1:

$$E - E' = \alpha_2 t_0 \left(K_1 + \frac{t_0}{2} - X\right)$$
In other words, the savings equal the product of the probability of a partial (worker) failure and the sum of the check-point reading time and the lost master computation time (half an interval on average), offset by the lost worker time. Since the number of workers is typically very large, the savings are substantial.
The total expected application running time E_T without worker fault tolerance is:

$$E_T = \frac{T}{t_0}\left(K_0 + t_0 + \alpha\left(t_0 K_1 + \frac{t_0^2}{2}\right)\right) \qquad (1.3)$$

We can compute the optimal check-point interval as:

$$\frac{dE_T}{dt_0} = T\left(-\frac{K_0}{t_0^2} + \frac{\alpha}{2}\right)$$

$$0 = -\frac{K_0}{t_0^2} + \frac{\alpha}{2} \quad\Longrightarrow\quad \frac{K_0}{t_0^2} = \frac{\alpha}{2}$$

$$t_0 = \sqrt{\frac{2K_0}{\alpha}} \qquad (1.4)$$
The total application running time E_T′ with worker fault tolerance is:

$$E_T' = \frac{T}{t_0}\left(K_0 + t_0 + \alpha t_0 K_1 + \frac{\alpha t_0^2}{2} - \alpha_2 t_0 K_1 - \frac{\alpha_2 t_0^2}{2} + \alpha_2 t_0 X\right)$$
$$= T\left(1 + \frac{K_0}{t_0} + \alpha K_1 + \frac{\alpha t_0}{2} - \alpha_2 K_1 - \frac{\alpha_2 t_0}{2} + \alpha_2 X\right) \qquad (1.5)$$
The optimal check-point interval with worker fault tolerance is:

$$\frac{dE_T'}{dt_0} = T\left(-\frac{K_0}{t_0^2} + \frac{\alpha - \alpha_2}{2}\right)$$

$$0 = -\frac{K_0}{t_0^2} + \frac{\alpha - \alpha_2}{2} \quad\Longrightarrow\quad \frac{K_0}{t_0^2} = \frac{\alpha - \alpha_2}{2}$$

$$t_0 = \sqrt{\frac{2K_0}{\alpha - \alpha_2}} \qquad (1.6)$$
For example, if we set the check-point interval t0 = 60 minutes, the check-point reading and writing times K0 = K1 = 10 minutes, and the average worker failure delay X = 30 sec = 0.5 minute, the expected savings per check-point interval, given a single worker failure, is about 39.5 minutes:

$$E - E' = \alpha_2 t_0 \left(K_1 + \frac{t_0}{2} - X\right), \qquad K_1 + \frac{t_0}{2} - X = 10 + 30 - 0.5 = 39.5 \text{ minutes.}$$

If the master failure rate is α − α2 = 1/184,320 per minute (a master MTBF of 128 days), the optimal master check-point interval by (1.6) is

$$t_0 = \sqrt{\frac{2K_0}{\alpha - \alpha_2}} = \sqrt{2 \cdot 10 \cdot 184{,}320} = 1{,}920 \text{ minutes.} \qquad (1.7)$$

This means that it is not necessary to check-point the master unless the application running time T is greater than 32 hours (1,920 minutes).
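As a quick numeric check of equations (1.4) and (1.6), the sketch below evaluates both optimal intervals. The master failure rate is the example's 1/184,320 per minute; the worker failure rate is an assumed value for illustration only.

    /* Numeric check of the optimal check-point intervals (1.4), (1.6). */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double K0     = 10.0;            /* check-point writing time (minutes) */
        double alpha1 = 1.0 / 184320.0;  /* master failures per minute         */
        double alpha2 = 1.0 / 600.0;     /* assumed worker failures per minute */
        double alpha  = alpha1 + alpha2;

        /* (1.4): optimal interval without worker fault tolerance */
        double t0_all = sqrt(2.0 * K0 / alpha);
        /* (1.6): optimal interval when worker failures are handled cheaply */
        double t0_master = sqrt(2.0 * K0 / (alpha - alpha2));

        printf("t0 (no worker FT)   = %.1f minutes\n", t0_all);
        printf("t0 (with worker FT) = %.1f minutes\n", t0_master); /* 1920.0 */
        return 0;
    }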
Implementing the “worker fault tolerance” scheme requires a tuple space with three operators:

1. Put(TupleName, value);
2. Read(NamePattern, &buffer);
3. Get(NamePattern, &buffer);

The Put operator inserts a named tuple into the space. The Read operator fetches the value from a matching tuple. The Get operator extracts the value from a matching tuple and destroys the tuple. Both Read and Get are “blocking” in that they suspend the calling program indefinitely until a matching tuple is found. Tuple name matching allows the same tuple to be read by multiple workers.
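The following is a minimal, self-contained sketch of the “bag of tasks” pattern over these three operators. The in-memory store, non-blocking stubs, prefix matching and fixed buffer sizes are illustrative assumptions; a real tuple space (such as Synergy) is a distributed service with blocking semantics.

    /* Toy single-process tuple space illustrating Put/Read/Get. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_TUPLES 16

    static struct { char name[32]; char value[64]; int used; } space[MAX_TUPLES];

    static void Put(const char *name, const char *value)
    {
        for (int i = 0; i < MAX_TUPLES; i++)
            if (!space[i].used) {
                strcpy(space[i].name, name);
                strcpy(space[i].value, value);
                space[i].used = 1;
                return;
            }
    }

    /* Find a tuple whose name begins with the pattern. */
    static int Match(const char *pattern, char *buffer, int destroy)
    {
        for (int i = 0; i < MAX_TUPLES; i++)
            if (space[i].used &&
                strncmp(space[i].name, pattern, strlen(pattern)) == 0) {
                strcpy(buffer, space[i].value);
                if (destroy) space[i].used = 0;   /* Get removes the tuple */
                return 1;
            }
        return 0;  /* a real Read/Get would block here until a match appears */
    }

    static int Read(const char *p, char *b) { return Match(p, b, 0); }
    static int Get(const char *p, char *b)  { return Match(p, b, 1); }

    int main(void)
    {
        char buf[64];
        Put("task1", "multiply row block 1");   /* master posts tasks  */
        Put("task2", "multiply row block 2");
        while (Get("task", buf))                /* workers claim tasks */
            printf("worker got: %s\n", buf);
        Put("config", "G=32");
        Read("config", buf);                    /* non-destructive fetch */
        printf("worker read: %s\n", buf);
        return 0;
    }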
FIGURE 1.10
Parallel Processing Using Tuple Space
FIGURE 1.11
Stateless Parallel Processor
A change in partition loop depth requires re-programming. This is the very reason that has prevented practical HPC applications from being optimized. Once the loop depth is found, the optimal grouping factor can be calculated statically or heuristically at runtime [25][16].

Since deeper loop decomposition (finer processing grain) risks higher communication overheads, maximizing the degree of parallelism can adversely impact performance. The optimal loop depth gives the best possibility of delivering the optimal parallel performance.

Finding the optimal loop depth can be done experimentally via some back-of-envelope calculation, or aided by an analytical tool we call timing models [31][33].

The timing model for a compute-intensive loop is an equation containing estimation models for the major time-consuming elements, such as computing, communication, synchronization and disk I/O. Setting the synchronization time to zero makes it possible to find the performance upper bound, where all parallel tasks complete in time. In reality, the synchronization time may be negative when computing overlaps with communication.
For example, let

$$T_{seq}(n) = \frac{c\,f(n)}{\omega'(n)} \qquad (1.8)$$

be the running time of a program of input size n, where f(n) is the time complexity, c > 0 captures the efficiency losses (instruction to algorithmic step ratio) of the programmer, compiler and operating system scheduler, and ω′(n) is the processor speed measured in instructions processed per second. Since both c and ω′(n) are hard to obtain in practice, we introduce

$$\omega(n) = \frac{\omega'(n)}{c} \qquad (1.9)$$

measured in algorithmic steps per second. Thus ω(n) can be obtained from program instrumentation:

$$\omega(n) = \frac{f(n)}{T_{seq}(n)} \qquad (1.10)$$
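To make the instrumentation concrete, a small sketch is shown below. It times a sequential triple-loop matrix multiplication (so f(n) = n³ algorithmic steps) and reports ω(n) per equation (1.10); the fixed size N and the clock() timer are illustrative choices.

    /* Obtain w(n) by instrumentation, per equation (1.10). */
    #include <stdio.h>
    #include <time.h>

    #define N 256

    int main(void)
    {
        static double A[N][N], B[N][N], C[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 2.0; }

        clock_t start = clock();
        for (int i = 0; i < N; i++)          /* sequential reference run */
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    C[i][j] += A[i][k] * B[k][j];
        double t_seq = (double)(clock() - start) / CLOCKS_PER_SEC;

        double f_n   = (double)N * N * N;    /* algorithmic steps f(n)   */
        double omega = f_n / t_seq;          /* steps per second, (1.10) */
        printf("Tseq = %.3f s, omega(n) = %.0f steps/s\n", t_seq, omega);
        return 0;
    }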
FIGURE 1.12
Application Dependent CMSD Envelope
$$T_{par}(n, p) = \frac{n^3}{p\,\omega(n)} + \frac{\delta n^2 (p+1)}{\mu} \qquad (1.12)$$

$$T_{seq}(n) = \frac{n^3}{\omega(n)} \qquad (1.13)$$

$$S_p = \frac{T_{seq}(n)}{T_{par}(n, p)} = \frac{p}{1 + \dfrac{\delta\,\omega(n)(p^2 + p)}{n\,\mu}} \qquad (1.14)$$
FIGURE 1.13
Parallel Performance Map of Matrix Multiplication

With ω(n) measured in MOPS and the communication bandwidth μ in MBPS, Figure 1.13 shows the performance map of parallel matrix multiplication for a small number of processors and problem sizes.
The timing model helps to determine the optimal partition depth. For example, if n = 10,000 for the matrix multiplication program, with the processing environment characterized by ω(n) = 300 MOPS and μ = 120 MBPS, it is not a good idea to pursue anything deeper than the top-level parallelization, since the speedup will not be greater than 12. As shown in Figure 1.13, spreading calculations onto too many processors can have severe adverse performance effects. However, deeper loop partitioning should be re-evaluated if the processing environment changes, such as slower processors, larger problem sizes, a faster network or a combination of these. For simulation applications, the optimal processing grain size should be identified using the same process within each simulated time period. This is because the time-marching loop does not impact the computing versus communication ratio.
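A small sketch of this evaluation is shown below, under the assumption δ = 8 bytes per double (an assumption consistent with the speedup bound of about 12 quoted above):

    /* Evaluate the speedup model (1.14) for the example environment:
     * n = 10,000, w(n) = 300 MOPS, mu = 120 MBPS, delta = 8 bytes. */
    #include <stdio.h>

    int main(void)
    {
        double n = 10000.0, omega = 300e6, mu = 120e6, delta = 8.0;

        for (int p = 2; p <= 64; p *= 2) {
            /* Speedup model, equation (1.14) */
            double sp = p / (1.0 + delta * omega * ((double)p * p + p) / (n * mu));
            printf("p = %2d  predicted speedup = %5.2f\n", p, sp);
        }
        return 0;
    }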
Theoretically, the Timing Model method overcomes the impossibility of program performance prediction (due to Turing's Halting Theorem [41]) by introducing the application-dependent ω(n), which is obtained via program instrumentation. The rest is straightforward.
1. Identify the computing-intensive parts and separate them into independent sub-programs.
2. Identify data dependencies amongst all parts. This defines the tuples that connect the sub-programs.
3. For each computing-intensive sub-program, decide a partition loop depth.
<reference></reference>
<parallel>
  <reference></reference>
  <master>
    <send> or <read>
    <worker>
      <send> or <read>
      <target>
        the loop to be parallelized
      </target>
      <send> or <read>
    </worker>
    <send> or <read>
  </master>
</parallel>

FIGURE 1.14
PML Tag Structure
The resulting parallel programs will run in a Tuple Space supported environment, such as Synergy [28][32], where the worker will be automatically replicated to run on multiple compute nodes.

For performance optimization, each partition depth change requires re-coding steps 3-5. This is labor intensive. PML (Parallel Markup Language) [38] was developed to ease the search for the optimal processing granularity and, more generally, the labor of parallel programming.
Like all other parallel program markup efforts ([36], [26] and [44]), the core concept of PML is to eliminate automated dependency analysis, a common component in traditional parallel compilers. Instead, the user is required to perform the dependency analysis and mark the results in the sequential source code using PML tags.
PML is an XML-like language designed for generating data parallel programs from sequential programs. It contains seven tags (Figure 1.14).
The “reference” tag marks program segments for direct source-to-source copy in their relative positions. The “master” tag marks the range of the parallel master. The “send” or “read” tags define the master-worker interface based on their data exchange formats. The “worker” tag marks the compute-intensive segment of the program that is to be parallelized. The “target” tag defines the actual partitioning strategy based on loop subscripts, such as tiling (2D), striping (1D) or wave-front (2D). The coarse-to-fine grain-size search is to place “target” tags in an outer loop first and then gradually drill into deeper loop(s) if the timing model indicates there are speedup advantages in the deeper loops.
Figure 1.15 shows the PML-marked sequential matrix multiplication program.

In this example, the variable declarations of i, j and k will be copied exactly.
/* <reference id="123"> */
int i, j, k;
/* </reference> */
The first three lines of the “master” tag define the output data interface to the workers. The two “put” tags insert two tuples named “B” and “A” with double precision [N, N] cells into the space.
/* <master id="123"> */
/* <put var="B" type="double[N][N]" opt="ONCE" /> */
/* <put var="A" type="double[N][N]"/> */
The “worker” tags define how the tuples are to be accessed. Tuple “B” will be read “ONCE”. Tuple “A” must be retrieved along the i subscript N times.
/* <worker> */
/* <read var="B" type="double[N][N]" opt="ONCE"/> */
/* <get var="A" type="double[N(i)][N]"/> */
The “target” tag defines the partition depth and grouping factor. In this case, the partition will happen at subscript i, within the range (0, N) with step 1 and grouping factor G. This is the 1st-order partition.

The “worker” body concludes with a single output tag describing the overall dimensions and the partitioned dimension [N(i)].
/* <parallel appname="matrix"> */
#include <stdlib.h>
#define N 512   /* problem size: assumed here for completeness */
#define G 32    /* grouping factor: assumed here for completeness */
double A[N][N], B[N][N], C[N][N];   /* assumed declarations */

int main(int argc, char *argv[]) {
  /* <reference id="123"> */
  int i, j, k;
  /* </reference> */
  /* <master id="123"> */
  /* <put var="B" type="double[N][N]" opt="ONCE" /> */
  /* <put var="A" type="double[N][N]" /> */
  /* <worker> */
  /* <read var="B" type="double[N][N]" opt="ONCE"/> */
  /* <get var="A" type="double[N(i)][N]"/> */
  /* <target index="i" limits="(0,N,1)" chunk="G" order="1"> */
  for (i = 0; i < N; i++)
  /* </target> */
  {
    for (k = 0; k < N; k++)
      for (j = 0; j < N; j++)
        C[i][j] += A[i][k]*B[k][j];
  }
  /* <put var="C" type="double[N(i)][N]"/> */
  /* </worker> */
  /* <read var="C" type="double[N][N]"/> */
  /* </master> */
  exit(0);
}
/* </parallel> */
FIGURE 1.15
PML Marked Matrix Program
Finally, the “master” is responsible for collecting the results with a single
read tag.
In the “target” tag, the “type” attribute design allows complex loop (subscript) manipulations. In this example, the expression [N(i), N] indicates a one-dimension (leftmost) parallelization. The grouping factor is described by the “chunk” attribute. The expression [N(i), N(j)] indicates a two-dimension parallelization. As mentioned earlier, deeper parallelization typically requires higher communication volume; it should be used with care. Similarly, sliding windows, equal partitions and wave-fronts can all be implemented via similar mechanisms.
In the “master” tag, it is also possible to insert check-point instructions.
To show the feasibility of the PML approach, a PML compiler was con-
structed [38]. The PML compiler generates two parallel programs: a master
and a worker. We tested the manually crafted and the generated programs in
the Synergy parallel processing environment [28][31].
We also installed MPICH2-0.971 [14], compiled with the -enable-fast switch. An MPI parallel matrix application program was also obtained.

All test programs were compiled with gcc (version 2.95.3) using the -O3 switch. All tests were conducted on a Solaris cluster consisting of 25 Sun Blade 500 processors connected via 100 Mbps switches. All nodes have exactly identical configurations.

The Synergy and PML experiments were timed with worker fault tolerance turned on. The Synergy master and the MPICH2 programs had no fault tolerance protection.
Figure 1.16 shows the recorded performance results comparing Synergy hand-crafted programs, PML-generated programs and MPI parallel programs.

This study [38] also included other common computational algorithms, such as a Laplacian solver using Gauss-Seidel iteration, block LU factorization, convolution and others. The study revealed that a) the tuple space can be augmented to reduce code generation complexities; and b) the PML tags are flexible enough to accommodate arbitrarily complex partition patterns. Like other markup languages [36][44][26], practice and memorization help with coding efficiency. In comparison, the scale of effort is still much less than coding data parallel applications directly.
For extreme scale applications, we could develop new PML tags to in-
clude the support for multi-scale algorithms [1] with automatic check-point
generation. The new features would allow programmers to compose sequential
multi-scale algorithms and use existing MPI or OpenMP codes as coarse-grain
workers directly. PML tags can parallelize the multi-scale program with direct
references (via the ”reference” tag) to the legacy MPI or OpenMP codes.
FIGURE 1.16
PML Performance Comparisons
1.7 Summary
Cloud computing has brought “mission criticalness” within the reach of all applications. Reaping the full cloud benefits, however, requires non-trivial effort, although none of the challenges is new. This chapter has described the fundamentals of mission critical applications with respect to scalability, availability and information assurance goals.
Starting from the packet switching networks, we identified the generic “DNA” sequence that is essential to the robustness of extreme-scale data communication systems. Using information theory, we argued that extreme-scale networked computing applications (NCAs) are also possible if statistical multiplexing is introduced into the application semantic networks.
The necessary condition of the maximally survivable NCA contains four
elements: Unit of Transmission (UT), transient spatial redundancy, temporal
redundancy and passive spatial redundancy. The sufficient conditions include
maximal NCA service/data availability, unlimited NCA scalability and loss-free NCA processing. NCA correctness and integrity are also assumed.
This chapter also describes specific steps toward meeting the necessary and sufficient conditions of mission critical applications. We have included detailed “DNA” embedding sequences for three basic NCAs: a mission critical Enterprise Service Bus (ESB), lossless mission critical transaction processing and non-stop high performance computing.

Although the results are non-trivial and still somewhat experimental, it is hard to argue with the technological direction.
We have only scratched the surface. The basic NCA services (messaging, storage, transaction processing and computing) suggest the direction for the development of next generation networking equipment and distributed application APIs. Differentiating from the traditional silo development processes, these are integrating frameworks based on the NCA semantic networks. The next generation systems are likely to include more advanced support for the basic NCA services.
The holistic NCA development methodology has brought the essential but non-functional factors to the surface. We hope this chapter can help application API designers, system developers, application integrators and network developers to better understand the critical factors in solving some of the most difficult non-functional networked computing problems.
1.8 Acknowledgements
The author wishes to thank the many students and colleagues who contributed in different ways to the body of work described here. In particular, the contributions of Kostas Blathras, John Dougherty, David Muchler, Suntian Song, Feijian Sun, Yijian Yang and Fanfan Xiong are essential to the philosophical developments in the “decoupling” school of thought.
The projects described here are partially supported by the National Sci-
ence Foundation, the Office of Naval Research, Temple University Vice Provost
Office for Research and Parallel Computers Technology Inc. Special thanks to
Professors David Clark (MIT) and Chip Elliott (GENI) for the informal dis-
cussions after their Distinguished Speaker talks on TCP/IP protocols, trans-
action losses and information entropies.
This manuscript was particularly inspired by recent invigorating discussions with the author's new colleagues, Drs. Abdallah Khreishah and Shan Ken Lin, and Ph.D. student Moussa Taifi, on fault tolerance modeling and information theory.
Bibliography

[4] Juan M. Andrade. The TUXEDO System: Software for Constructing and Managing Distributed Business Applications, 1996.

[8] Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pages 1–10. ACM, 1995.

[15] William Gropp and Ewing Lusk. Fault tolerance in MPI programs. Special issue, Journal of High Performance Computing Applications, 18(1):363–372, 2002.

[19] Jim Gray. The dangers of replication and a solution. In ACM SIGMOD International Conference on Management of Data, pages 173–182, Montreal, Quebec, Canada, 1996.

[20] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, 1978.

[21] John Little. A proof for the queueing formula: L = λW. Operations Research: A Journal of the Institute for Operations Research and the Management Sciences, pages 383–387, 1961.

[27] Justin Shi and Suntian Song. Apparatus and method of optimizing database clustering with zero transaction loss. U.S. patent (pending), 2007.

[30] Justin Y. Shi. Decoupling as a foundation for large scale parallel processing. In Proceedings of the 2009 High Performance Computing and Communications Conference, Seoul, Korea, 2009.

[35] Suntian Song. Method and apparatus for database fault tolerance with instant transaction replication using off-the-shelf database servers and low bandwidth networks. U.S. Patent 6,421,688, 2002.

[36] Scott Spetka, Haris Hadzimujic, Stephen Peek, and Christopher Flynn. High productivity languages for parallel programming compared to MPI. In HPCMP Users Group Conference, pages 413–417, 2008.

[37] Michael Stonebraker. The case for shared nothing architecture. Database Engineering, 9(1), 1986.

[42] Fanfan Xiong. Resource Efficient Parallel VLDB with Customizable Degree of Redundancy. PhD thesis, Temple University, 2009.

[43] Yijian Yang. Fault Tolerance Protocol for Multiple Dependent Master Protection in a Stateless Parallel Processing Framework. PhD thesis, Temple University, August 2007.

[44] Yingqian Zhang, Bin Sun, and Jia Liu. A markup language for parallel programming model on multi-core system. In SCALCOM-EMBEDDEDCOM 2009 International Conference, pages 640–643, September 2009.