Evaluating the Scalability of Distributed Systems

Prasad Jogalekar and Murray Woodside
Carleton University
March 2000
Abstract
Many distributed systems must be scalable, meaning that they must be economically deployable in a wide range of sizes and configurations. This paper presents a scalability metric based on cost-effectiveness, where the effectiveness is a function of the system's throughput and its quality of service. It is part of a framework which also includes a scaling strategy for introducing changes as a function of a scale factor, and an automated virtual design optimization at each scale factor. This is an adaptation of concepts for scalability measures in parallel computing. Scalability is measured by the range of scale factors that give a satisfactory value of the metric, and good scalability is a joint property of the initial design and the scaling strategy. The results give insight into the scaling capacity of the designs, and into how to improve the design. A rapid simple bound on the metric is also described.

The metric is demonstrated in this work by applying it to some well-known idealized systems, and to real prototypes of communications software.
1 Introduction
Many distributed systems must be scalable. Typical present and future applications include web-based applications, e-commerce, multimedia news services, distance learning, remote medicine, enterprise management, and network management. They should be deployable in a wide range of scales, in terms of numbers of users and services, quantities of data stored and manipulated, rates of processing, numbers of nodes, geographical coverage, and sizes of networks and storage devices. Small scales may be just as important as large scales. Scalability means not just the ability to operate, but to operate efficiently and with adequate quality of service, over the given range of configurations. Increased capacity should be in proportion to the cost, and quality of service should be maintained.
The framework presented here, and described in a preliminary way in [1], has the following features
which are lacking in previous work on scalability metrics:
- it separates the evaluation of throughput or quantity of work from quality of service,

- it allows any suitable expression for evaluating quality of service,

- it adds to the system design a formal notion of a scaling strategy, which is a plan for scale-up. The plan can introduce different kinds of changes at different scales, since it often happens that all the components cannot be scaled simultaneously. This generalizes the notion of a scale factor, which becomes a parameter of the strategy,

- it incorporates scalability enablers, which express aspects of the design which should be tuned for efficient operation at any given scale.

(This research was supported by the Natural Sciences and Engineering Research Council of Canada, through their program of Industrial Research Chairs, and by CITO (Communications and Information Technology Ontario).)
2 Existing Scalability Analysis
A variety of scalability metrics have been developed for massively parallel computation, to evaluate the effectiveness of a given algorithm running on different sized platforms, and to compare the scalability of algorithms. These metrics assume that the program runs by itself, on a set of k processors with a given architecture, and that the completion time T measures the performance.

Three related kinds of metrics have been reported: speedup metrics, efficiency metrics, and scalability metrics. The following definitions give the flavour of the proposed metrics, although there are variations in detail among different authors:
- Speedup S measures how the rate of doing work increases with the number of processors k, compared to one processor, and has an "ideal" linear speedup value of S(k) = k.

- Efficiency E measures the work rate per processor (that is, E(k) = S(k)/k), and has an "ideal" value of unity.

- Scalability ψ(k1, k2) from one scale k1 to another scale k2 is the ratio of the efficiency figures for the two cases, ψ(k1, k2) = E(k2)/E(k1). It also has an ideal value of unity.
A typical metric is the fixed-size speedup, in which the scaled-up case has the same total computational work as the base case, and the speedup S is the ratio of the completion times (i.e., S(k) = T(1)/T(k)).
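These definitions translate directly into code. As a minimal sketch (in Python; the function names are illustrative, not from the literature), given measured completion times T(1) and T(k):

    def speedup(T1: float, Tk: float) -> float:
        """Fixed-size speedup S(k) = T(1)/T(k)."""
        return T1 / Tk

    def efficiency(T1: float, Tk: float, k: int) -> float:
        """Efficiency E(k) = S(k)/k; the ideal value is 1.0."""
        return speedup(T1, Tk) / k

    def scalability(T1: float, Tk1: float, k1: int, Tk2: float, k2: int) -> float:
        """psi(k1, k2) = E(k2)/E(k1), the ratio of efficiencies at two scales."""
        return efficiency(T1, Tk2, k2) / efficiency(T1, Tk1, k1)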
The above three metrics are described in [2], for a homogeneous distributed memory multiprocessor such as a hypercube or a mesh. The authors considered a fixed size case, a fixed time case and a memory-bounded case. In [3], a generalized speedup (taking a better account of memory access operations) is proposed for systems with shared virtual memory, and an iso-speed scalability metric for an algorithm-machine combination, which relates the workload capacity of the system at two different scales. In [4] an isoefficiency analysis is given, based on the question: at what rate should the problem size increase with respect to the number of processors in order to keep the "efficiency" fixed? In [5], a toolkit called the Modelling Kernel is described, which uses a model based on the program's parse tree. The choice of the scalability metric is left to the user. In [6], the techniques of experimentation, analytical modeling and simulation are compared, as they apply to studying parallel system performance. Scalability captures both the available and the delivered computing power, with the differences due to the overheads of parallel processing. The paper identifies overheads from different sources (hardware, software, algorithm), and summarizes metrics which include constant problem size scaling, time-constrained scaling, memory-constrained scaling and isoefficiency scaling.
The need for a new scalability metric and a methodology
Distributed systems require a more general form of scalability metric, because:
- Rather than running a single job to completion, these systems are shared by many jobs and new jobs arrive as others complete, so the behaviour should be modeled as a steady state.
- Throughput and delay should be evaluated separately as productivity factors. With a single job, the throughput is just the inverse of the job time; in a distributed system with an average of N jobs the throughput is N/(job time). The mean number of users adds a degree of freedom to the analysis.

- A greater variety of communications mechanisms may become involved, with their own scalability properties. Reliable multicast, for example, can introduce serious scalability problems.

- The productivity evaluation should be further expanded because there are more aspects to "adequate service" in distributed systems, called quality-of-service (QoS) figures. We shall use the term QoS here to include any measure of the goodness of a service (for instance, it could include a failure related or availability measure). For simplicity the examples considered in this paper are restricted to quality of service based on mean delay, but the framework includes any measure which can be evaluated.

- The "size" of the system is more complex because of the heterogeneous physical architecture of distributed systems. Instead of just a number of processors to measure size, one should consider symmetric multiprocessor nodes, replicated services, alternative networks and processors with different types and prices, and so forth. "Size" becomes multidimensional.

- Additional cost factors besides the cost of processors, storage and bandwidth should be considered, such as the cost of software licenses, and perhaps the cost of operation such as management and help desks.

- The strategy for scaling up a distributed system is more complex than simply adding processors, storage and bandwidth. It may include replicating software services and storage, for instance, and modifying the communications mechanisms. An explicit scaling strategy is needed as part of the definition of the metric. This is a counterpart of the various kinds of parallel system scaleup defined in different metrics (e.g. the fixed-time or fixed-speed scaleup metrics).
There has been a little previous work on distributed systems, in several distinct flavors. In [7], the scalability of Microsoft's Windows NT operating system is discussed using an in-memory subset of Microsoft's SQL server benchmark, to focus on the CPU performance. Scalability analysis is carried out by plotting a graph of performance figures obtained from this benchmark versus the number of processors. This study is quite close to the parallel systems studies, having homogeneous processor resources and ignoring software resources.
In [8], a scalable load monitoring service and a scalable resource query service for managing resources distributed across a network are described. In this work scalability means a linear relationship between the bandwidth requirements (i.e. the amount of traffic generated on the network) and the number of hosts on the network.
In [9], the authors argue that the existing solutions to provide transparent access to network services need to be modified for the internet paradigm, considering the scalability, fault tolerance and load balancing issues.
In [10], models are used to enable design-time modeling of complex large scale distributed applications. The authors analyze how some design parameters for the example system affect the application's QoS (defined as end-to-end mean response time) and scalability, with respect to the number of nodes and the number of domains.
In [11] a scalability metric suitable for distributed systems, called P-scalability, was examined. It employs the "power" measure P(k) of Giessler [12], and the cost of all system resources at a scale factor k, as follows:

    P-scalability(k1, k2) = [P(k2) Cost(k1)] / [P(k1) Cost(k2)]

    P(k) = Throughput / Response Time

This metric combines capacity and response time (both are present in the power P) with cost. However it has a defect (which is cured in [1] and the present work), in that it credits unbounded value to response times approaching zero. In fact, for most users there is a required response time, below which further reduction has little or no value. This metric therefore distorts the scalability by rewarding very short responses which are actually not very useful.
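The defect is easy to see in a few lines of code (a sketch; the names are illustrative): with throughput and cost held fixed, halving an already-adequate response time doubles the power P and therefore doubles the apparent scalability, even though the faster responses add no value for users.

    def power(throughput: float, response_time: float) -> float:
        # Giessler's power measure P = throughput / response time [12]
        return throughput / response_time

    def p_scalability(p1: float, cost1: float, p2: float, cost2: float) -> float:
        # P-scalability of [11]: (P(k2) * Cost(k1)) / (P(k1) * Cost(k2))
        return (p2 * cost1) / (p1 * cost2)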
In [1] there is a preliminary description of the metric presented here. The present paper adds a
more complete description of the algorithms used to compute the metric, and of a number of idealized
cases which validate its intuitive meaning. It introduces an upper bound on the metric, and gives the
details of the scalability assessment of two substantial applications. It shows that the present metric
is a generalization of some of the well-known metrics for scalability of parallel computations.
The enablers y are tuned to give the best productivity for any given k. Since k determines x by the strategy, and x influences y through the optimal tuning, the values of y are effectively determined by k.
Examples of scalability enablers are the allocation of processes to the processors, priorities, replication of processes and data, the creation of threads within processes, the memory available for buffers, tuning of the middleware parameters, network bandwidth and the choice of communication protocols. For a simple example, a database system might have a scaling strategy which defines the users, processors and the database size as functions of k:

    Nusers = k (the number of active users)

    Datasize = 10000 log10(k) (the assumed size of the database, in records, as k increases)

    Nproc = ⌈k/100⌉ (the number of processors to be provided, one per 100 users, rounded up)
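A scaling strategy of this kind is simply a function from the scale factor to a configuration; a minimal sketch in Python (the function name is ours):

    import math

    def scaled_configuration(k: int) -> dict:
        # Each value of the scale factor k fixes the whole configuration.
        return {
            "n_users": k,                       # number of active users
            "datasize": 10000 * math.log10(k),  # database size, in records
            "n_proc": math.ceil(k / 100),       # one processor per 100 users
        }

    # e.g. scaled_configuration(250) -> 250 users, about 23,979 records, 3 processors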
Figure 1 illustrates this scaling path in the space (Datasize, Nproc), over the range k = 100 to 300.
At each value of the scale factor, the scaling strategy and the optimal values of the enablers determine the scaled configuration, from which the cost, capacity and quality values can be evaluated and used in the scalability metric.
[Figure 1: the scaling path in the space (Datasize, Nproc), for scale factors k from 100 to 300.]
The productivity F(k) at scale factor k is the value delivered per unit time, divided by the cost per unit time:

    F(k) = λ(k) f(k) / C(k)    (1)

where λ(k) is the throughput, f(k) evaluates the quality of service, and C(k) is the cost of the configuration. For purposes of explanation and demonstration, we will consider only the mean response time T(k) at scale factor k, compared to a target value T̂, in the following value function:

    f(k) = 1 / (1 + T(k)/T̂)    (2)

With this value function, from (1) and (2) the scalability metric for scale k2 relative to k1 is, after a little simplification:

    ψ(k1, k2) = [λ2 C1 (T1 + T̂)] / [λ1 C2 (T2 + T̂)]    (3)
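To make the definitions concrete, Eqs. (1)-(3) can be written as a short sketch in Python (the function names are illustrative, not from the paper):

    def value(T: float, T_hat: float) -> float:
        # Value function of Eq. (2): near 1 for T << T_hat, falling as T grows.
        return 1.0 / (1.0 + T / T_hat)

    def productivity(lam: float, T: float, C: float, T_hat: float) -> float:
        # Productivity of Eq. (1): value delivered per unit time, per unit cost.
        return lam * value(T, T_hat) / C

    def psi(lam1, T1, C1, lam2, T2, C2, T_hat) -> float:
        # Scalability metric of Eq. (3): scaled productivity over base productivity.
        return productivity(lam2, T2, C2, T_hat) / productivity(lam1, T1, C1, T_hat)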
For a balanced closed system scaled in this way, with C2 = kC1, the response time is roughly constant at T = kN/(kλ) = N/λ, and ψ ≈ 1.0.
Case VI: Scaling by replicating the server

Another way to scale a closed system is to replicate the server. If there is no overhead for managing replicas, it seems intuitively clear that this is a perfectly scalable system. Consider the scaling path:

    N2 = kN1,  K2 = kK1,  D2 = D1/k,  C2 = kC1
Following Case III, but with K2 = kK1, we find that ψ > 1 for all k, and for large k the limit is almost the same:

    lim_{k→∞} ψ(k) = (N1 + K − 1)[(N1 + K − 1)D1 + T̂] / (N1 [(N1 + K)D1 + T̂])

Again if K ≪ N1, ψ → 1.0. For an unbalanced system the result is also the same.
Case VII: Effect of overhead costs

In Case VI, suppose that for k > 1 the demands D are augmented by coordination overhead, for example to maintain overall system state data. The replicas, costs and user population are all scaled by a factor k. An efficient coordination mechanism might limit the overhead cost to a slowly increasing amount of overhead, giving a scaling relationship for the server demands such as

    D2 = (D1/k) + D0 log k

Then, once again following the approach of Case III, Eq. (3) gives the metric; since the overhead term D0 log k grows while the replicated demand D1/k shrinks, the overhead eventually dominates and limits the scalability.
Case VIII: A closed system with scaled population and target response times

This case considers a scaling path in which response degradation is accepted in proportion to a rising user population in an otherwise unscaled system. This is quite a different system goal, and illustrates the flexibility of the framework. We consider the balanced closed system of Case III, but with constant values of C and D. The analysis of Case III then gives the value function

    f = 1/(1 + T/T̂) = 1/(1 + (N + K − 1)D/T̂)

If T̂2 = kT̂1 and N2 = kN1, then df/dk > 0, and it is also well known that dλ/dk ≥ 0. Then it can be deduced from Eq. (3) that ψ is an increasing function of k, and that it approaches a constant limit greater than 1. This is a symptom of the well-known fact that the response time rises at the same asymptotic rate as the population.

The conclusion is that, if response time degradation is accepted in proportion to users and is included in the scalability function, closed balanced systems are infinitely scalable in population.
Case IX: A single scaled open multiserver queue

Often one tries to scale up a system by adding servers. This case considers an ideal multiserver queue (an M/M/m queue) in which the number of servers m, the arrival rate λ, and the cost C are all scaled by a factor k. There is no server coordination overhead, and the queue shares the load in an ideal fashion, so the metric should show infinite scalability.

The solution is well-known but lengthy, so it will not be shown here, but it does show that ψ > 1 for all k, which supports the intuition.
As k → ∞, the metric approaches the form:

    ψ → [1 + (1/T̂)(S + PQ/(λ(1 − ρ)))] / [1 + S/T̂] > 1    (8)

where S is the service time at any server, λ is the arrival rate, ρ is the utilization of each server (ρ = λS/m) and PQ represents the Erlang-C formula for the probability that all m servers are busy. The fraction with PQ in the numerator approaches zero with large k, and ψ approaches 1 in the limit.
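The computation behind Case IX is easy to reproduce with the standard M/M/m formulas. The sketch below (the names are ours) uses the Erlang-B recursion to get the Erlang-C probability PQ, and the simplification of Eq. (3) for this case: with λ, m and C all scaled by k, ψ reduces to (T1 + T̂)/(Tk + T̂).

    def erlang_c(m: int, a: float) -> float:
        # Erlang-C probability that all m servers are busy, offered load a = lambda*S.
        b = 1.0
        for j in range(1, m + 1):
            b = a * b / (j + a * b)        # Erlang-B recursion
        rho = a / m
        return b / (1.0 - rho + rho * b)   # Erlang-B to Erlang-C conversion

    def psi_case_ix(k: int, lam: float, S: float, m: int, T_hat: float) -> float:
        def response(servers: int, rate: float) -> float:
            rho = rate * S / servers
            pq = erlang_c(servers, rate * S)
            return S + pq * S / (servers * (1.0 - rho))   # M/M/m mean response time
        return (response(m, lam) + T_hat) / (response(k * m, k * lam) + T_hat)

    # e.g. psi_case_ix(10, lam=0.8, S=1.0, m=2, T_hat=2.0) is about 1.06 (> 1)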
Summary

The cases I-IX cover a wide range of well-understood systems and of scaling policies, and reveal how the metric proposed in this paper will evaluate different kinds of systems, and agrees with intuitive judgement. This gives some confidence in applying it.

The parallel-system metrics surveyed in Section 2 also fit into the general framework of this paper as special cases. If we consider a steady state with one job at a time being executed, one after another, and fixed-size scalability, then a parallel computer is a closed system with replication and overhead, as in Case VII (except the user population is not scaled). The time to completion is rewarded through the throughput term, with λ = 1/T. The QoS function considered in most metrics is simply f = 1, since they only evaluate the time to completion and that is already taken into account in the throughput.
[Flowchart of the evaluation procedure: a satisfactory metric value ends the evaluation, while an unsatisfactory one raises the question of whether it is possible to provide more scaling enablers.]
Optimizing the productivity metric using simulated annealing

In order to maximize the productivity of a particular configuration by tuning the scalability enablers y, this research used the simulated annealing algorithm described in ([16], [17], [18]). It was used because it is robust, in the sense that it can handle a wide variety of relationships, including discontinuous functions and integer or categorical variables. Its disadvantages are that it can consume very many search steps and it gives no guarantees about convergence.

Simulated annealing takes random steps controlled by a parameter called the "temperature" θ. The productivity function is evaluated and the perturbation is accepted if it gives an increase, or is either accepted or rejected if it gives a decrease. The acceptance probability is smaller for a greater decrease, and as θ decreases this probability also is reduced. Termination was decided if the fraction of the accepted moves was less than 2% for two successive full iterations, or if the number of iterations exceeded a predefined limit. The best solution found was retained and used as the final result.
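A compact sketch of the search loop, with the acceptance and termination rules just described (the helper names, trial count, and geometric cooling schedule are our assumptions, not taken from [16]-[18]):

    import math, random

    def anneal(evaluate, perturb, y0, temp=1.0, cooling=0.95,
               max_rounds=200, trials=50, min_accept_frac=0.02):
        y, f = y0, evaluate(y0)
        best_y, best_f = y, f
        low_accept_rounds = 0
        for _ in range(max_rounds):
            accepted = 0
            for _ in range(trials):            # one full iteration at this temperature
                y_new = perturb(y)
                f_new = evaluate(y_new)
                # Accept any increase; accept a decrease with a probability that
                # shrinks with the size of the decrease and with the temperature.
                if f_new >= f or random.random() < math.exp((f_new - f) / temp):
                    y, f = y_new, f_new
                    accepted += 1
                    if f > best_f:
                        best_y, best_f = y, f
            low_accept_rounds = low_accept_rounds + 1 if accepted / trials < min_accept_frac else 0
            if low_accept_rounds >= 2:         # under 2% accepted, twice in a row
                break
            temp *= cooling
        return best_y, best_f                  # the best solution found is the result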
Step 2. For each scale factor k, determine the scaled system configuration from the scaling strategy. Compute the total seconds of execution of each device, averaged per response, as follows:

- Execution and overhead which is determined and assigned to each device by the scaling strategy is calculated first,

- The remaining execution demand is added up over the remaining tasks and spread (optimistically) over all the devices so as to produce the most even distribution of the total demand, expressed in seconds of execution per response. That is, it is allocated without regard to allocating entire tasks to one device, but with regard to whether the device can do the work (so, CPU demand is spread over CPUs and disk demand over disks).

Optimistic assumptions about overheads mean that they are set to the lowest value consistent with the scaling strategy; thus if two tasks included in the remaining demand should (by the scaling strategy) be allocated separately, internode communications overhead is included.

The result of this step is a set of demands which may still be unequally distributed over the devices, because of constraints in spreading the workload.
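For one device class (say, CPU demand spread over the CPUs), the optimistic spreading step is a water-filling computation. A minimal sketch (the helper name is ours), assuming demands in seconds per response:

    def spread_evenly(preassigned: list, extra: float) -> list:
        # Raise the least-loaded devices to a common level, chosen by bisection
        # so that exactly `extra` seconds of demand are absorbed.
        lo, hi = min(preassigned), max(preassigned) + extra
        for _ in range(60):
            level = (lo + hi) / 2.0
            used = sum(max(0.0, level - p) for p in preassigned)
            lo, hi = (level, hi) if used < extra else (lo, level)
        level = (lo + hi) / 2.0
        return [max(p, level) for p in preassigned]

    # e.g. spread_evenly([5.0, 1.0, 0.0], 3.0) -> approximately [5.0, 2.0, 2.0]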
Step 3. At scale k, set C(k) to the cost of the scaled system and, following ([13] chapter 5), find bounds on λ and T:

- set λ(k) to the minimum of (1) the balanced system throughput bound for a queueing network with the same servers, and (2) the asymptotic throughput bound for the given set of demands,

- set T(k) to the balanced job bound value,

- compute F(k) from Eq. (1).

Step 4. Set the scalability metric bound to ψbnd = F(k)/F(1); the bound-based scalability limit is then the first value of k giving a ψbnd that drops below the "moderate scalability" limit of 1 − ε.
The queueing network model with the evenly spread workload is constructed so that it intuitively gives a performance bound; however, the relationship is not rigorously proven. The intuitive reasons for believing it gives a bound are:

- software resource constraints are ignored, which can only improve performance,

- allocation decisions which are enablers in the strategy are represented in the bound by the greatest possible degree of load balancing, which should give better performance than the best feasible allocation that respects task granularity,

- overhead that is not explicitly required by the scaling strategy is omitted.
The bounds can show the consequences of changing demands and power with k. Suppose that the scaling strategy resulted in a total demand (in seconds of execution, adding over all nodes) of D(k) = g1(k), the number of nodes (all equally fast) is g2(k), and there is a user delay (not included in the response time) of Z0. Then the bound calculation is:

    Davg(k) = D(k)/g2(k) = g1(k)/g2(k)

    R(k) = D(k) + (N − 1) Davg/(1 + Z0/D(k))
         = g1(k) + (N − 1) (g1(k)/g2(k)) (g1(k)/(Z0 + g1(k)))

    T(k) = R(k) + Z0
The bound on the scalability metric can then be expressed as:

    ψbnd(k) = Fbnd(k)/F(1)
            = min{ kN / [Z0 + g1(k) + (N − 1)(g1(k)/g2(k))(g1(k)/(Z0 + g1(k)))],  1/Dmax }
              / { C(k) [1 + (1/T̂)(Z0 + g1(k) + (N − 1)(g1(k)/g2(k))(g1(k)/(Z0 + g1(k))))] F(1) }    (9)
When the system is saturated, both the numerator and denominator are dominated by the terms in the big round brackets multiplied by (N − 1). The direct effect of adding work (increasing g1(k)) is always to decrease ψ. The direct effect of adding nodes is to increase g2(k) and C(k) both, so as far as the bound is concerned the effect is neutral when the system is saturated, and harmful to scalability when it is not. The direct effect of causing a bottleneck node, due to a scaling path that does not allow the load to be properly balanced, is to increase Dmax and decrease scalability through the last term in the numerator. All of these effects are expected, but the equation gives a picture of the order of the relationship.
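A direct transcription of Eq. (9) in Python (the function and parameter names are ours, mirroring the symbols above):

    def psi_bound(k, N, g1, g2, Z0, Dmax, T_hat, C, F1):
        # g1(k): total demand (sec per response); g2(k): number of nodes;
        # Z0: user delay; C(k): cost; F1: base-case productivity F(1).
        D = g1(k)
        T = Z0 + D + (N - 1) * (D / g2(k)) * (D / (Z0 + D))   # T(k) = R(k) + Z0
        lam = min(k * N / T, 1.0 / Dmax)                      # throughput bound
        f = 1.0 / (1.0 + T / T_hat)                           # value function, Eq. (2)
        return lam * f / (C(k) * F1)                          # F_bnd(k) / F(1)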
A second version of the bounds analysis, which is closer to a kind of approximation, is to use the bounding value for performance and productivity in the base case also. This puts all scale factors on an equal footing as regards the looseness of the bounds; however, it reduces the certainty that the value of ψbnd is in fact a bound, since the denominator may be overestimated.
6 A connection-management system

This section analyzes the scalability of a connection management system, based on the design and parameters of a real industrial prototype. It is a design which evolved out of a connection-management design described previously in [1] and [11].

Figure 3 shows the major components in a prototype connection management system for virtual private networks, intended to support applications such as video-conferencing. The prototype was heavily influenced by standards such as G.805 [20]. It was designed to be able to:

- set up a virtual private network joining user-specified end-points, and allocating the network resources in such a manner as to meet the QoS requirement,

- manage a variety of heterogeneous switching equipment, for the purpose of setting up end-to-end connections,

- use the allocated resources of the virtual private network and let the user set up and tear down connections arbitrarily, among any of the sites.
The prototype was implemented using a network of workstations running UNIX, with DCE middleware to handle intertask communications and transparency, and a backbone network based on a SONET OC-12 (622 Mbit/s) optical fiber ring with proprietary switching equipment on which cross-connections can be made or released as required. The software tasks can be roughly classified into three logical layers:

- The topology layer, that deals with the connection topology of the virtual private network (VPN), connecting all the user-specified endpoints (e.g., the User-Network Interface identifiers, UNI's, in the case of an ATM network). Once a virtual private network is established, the objects in the topology layer can directly communicate with the lowest layer (called SONET here), in order to set up virtual channels over this VPN.
[Figure 3: tasks and interactions in the connection management prototype: the Client, the topology-layer tasks Topo_setup and Topo_delete with their VPN trail objects, the Database (read, write, create and delete operations), the connect and disconnect entries, and Subnet_connect.]
- The virtual path (VP) layer, that deals with connecting all the sites in a virtual private network with a virtual path. This corresponds to provisioning the network resources to meet user-specified bandwidth and QoS, to support future connections.

- The SONET layer, that supports a virtual path by setting up appropriate connections on the SONET ring.
Following is a brief description of the tasks in Figure 3:

- The Client tasks represent the users that set up (or dismantle) the virtual private network and set up (or dismantle) connections on an existing virtual private network. The clients could be the software tasks that manage higher level applications, e.g. a video conferencing system that uses the given connection management system. The clients interact with the topology layer to set up the virtual private network as well as the connections on it (VC's, or virtual channels). The frequency of setting up/releasing a VPN, which is like a leased line, is much lower than that of setting up/releasing temporary connections, by a ratio of 1:50.

- Topo_setup and Topo_delete: these tasks belong to the topology layer discussed above, and support setting up VPNs as well as connections within a VPN. The necessary routing functions are built into the setup entries of these tasks and of their servers.

- VP: this task sets up and deletes virtual paths (VPs) that make up a VPN.

- SONET: this task manages the fibre-level port-to-port connections required to support the setting up of the VP layer trails, which in turn help set up the VPN.

- Subnet_connect: this task directly controls the SONET network elements.

- Database: the database stores objects related to the various functional layers in the system, and provides state data to all the functions.
The database, which is accessed heavily by almost all the tasks in the system, clearly is a potential hot spot in the system. By measurement it was verified that the database indeed had the greatest demands for both VPN setup/release as well as connection setup/release, and would limit scalability if its capacity were not increased. One approach to this is database replication, which was considered as an element in the scaling strategy. As we shall see, the hazard in replication is heavy overhead.

The prototype system was instrumented and measured to obtain workload parameters for the performance model which was used to evaluate the scalability.
Scaling strategy for the connection management system

The scaling strategy was to introduce replications of the database, using the location-based replication paradigm described by Triantafillou and Taylor in [21]. For each database replica, an additional processor was also added to the system. (We note that the location-based paradigm was motivated by reliability as well as performance, and the reliability effects are not rewarded in the value function f used here.)

The scale factor was set to be the number of database replicas. A fixed number of five processors was provided to run the other tasks in a fixed configuration, and the number of users was taken as a scalability enabler. Further enablers that were not used could have been the allocation of the tasks other than the database tasks to the processors, and replicas and additional processors for the other functions.
For each scale factor a performance model was set up with the replicas and their overheads, with overhead amounts calculated from the number of replicas, and the requests sent from any client entry to the database task were equally divided among all the replicas. The fixed remote invocation overheads were incorporated in the execution demands of the task entries. The fact that the accesses to the database replicas were symmetric happens to permit a special efficient approximation for symmetric replication of subsystems to be used in the solver [22].
In order to model the consistency management overhead (in terms of extra execution), each replica of the database is associated with a transaction overhead pseudo-task on the same CPU. The transaction overhead task accounts for the synchronous and asynchronous broadcasting overheads, locking overheads, etc. for consistency management, and the calls made by the database entries to the overhead task during the operation, prepare, commit and abort phases are proportional to the number of database replicas in the system.
The number of write transactions is significant, but the granularity of the database objects is small, so the probability of conflict on locks was assumed to be negligible and lock queueing delays were not modelled. However, the execution overheads of locking were substantial and were included.
The response of the system was modelled as a cycle of effort for one conference, including setting up and tearing down 5 virtual channels for a video conference between the two sites, plus one time in ten it included setting up a VPN as well. The cycle had a target time of 15 minutes (T̂ = 15 min.). Load was generated by a number of users, who were modelled as having a "thinking time" of 10 minutes between one cycle and the next.
The provisioning cost for the base configuration, including one copy of the database server, and one processor per software task, is taken as $100,000. Each extra copy of the database server (including a new dedicated processor) is assumed to cost an additional $5000. This gives a cost per unit time of the form Constant × (1 + 0.05k).
The reference configuration of the system had a single database copy, and was also optimized with respect to the number of clients, giving a reference productivity of 702 cycles of activity per hour per unit cost, and a reference throughput of 95 cycles of activity per hour. (That is, setting up and tearing down 9.5 virtual private networks, and setting up and tearing down about 475 virtual channels per hour.)
6.1 Scalability bound

Step 1. The base configuration with 6 processors is optimized with respect to the number of clients, to obtain 23 clients, 95 operation units per hour and productivity F = 1.95 × 10^-5 units/hour.

Step 2. At each scale factor, with k database replicas and k database processors, the balanced demand is calculated, including the overheads. In this case:

    total demand: D = 14.44 + 22.11k sec
    average demand: Davg(k) = (14.44 + 22.11k)/(k + 5) sec
    Dmax = 35.08 sec
    response time: T = D + (N − 1) Davg/(1 + Z0/D)
    Z0 = 600 sec
    cost: C(k) = 1 + 0.05k units/sec
Steps 3, 4. The solution gives the response time T = D + (N − 1) Davg/(1 + Z0/D) and throughput λ = N/(Z0 + T) for the balanced system. Substituting into Eq. (9), we get the following expression for the scalability bound:

    ψbnd(k) = 5.13 × 10^4 × min{ kN / [Z0 + D + (N − 1)(D/(k+5))(D/(Z0 + D))],  1/Dmax }
              / { (1 + 0.05k) [1 + (1/T̂)(Z0 + D + (N − 1)(D/(k+5))(D/(Z0 + D)))] }    (10)
The total demand, the number of processors, and the cost all grow linearly with k. For large k, the scalability metric bound drops as k^-2. In fact it is the increasing overhead demands which cause the devices to saturate, and limit the scalability. The equation gives the plot in Figure 4. If the acceptable scalability limit is 0.8, it is reached at a scale factor of k = 8. This is similar to the conclusion obtained in the next section, which derives a limit of 5 from a more detailed analysis.
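As a quick check of Eq. (10), the sketch below evaluates the bound with the Step 2 parameters, normalized to the base case in the manner of the "second version" of the bounds analysis described above (so the unit conventions of the base case drop out). Holding the client population at the Step 1 optimum of N = 23 is our assumption:

    g1 = lambda k: 14.44 + 22.11 * k     # total demand including overheads, sec
    g2 = lambda k: k + 5                 # k database processors plus 5 fixed ones
    cost = lambda k: 1.0 + 0.05 * k      # relative cost per unit time

    def bnd(k):
        return psi_bound(k, N=23, g1=g1, g2=g2, Z0=600.0, Dmax=35.08,
                         T_hat=900.0, C=cost, F1=1.0)

    for k in range(1, 11):
        print(k, round(bnd(k) / bnd(1), 3))
    # Demand and cost both grow linearly in k, so for large k the normalized
    # bound falls off roughly as 1/k**2.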
6.2 Scalability metric: full calculation

The full calculation optimizes the productivity function with respect to the available scalability enabler (the number of clients), at each scale factor. The results are plotted in Figure 5, which compares the detailed calculation with the bound.
[Figure 4: the scalability bound for the connection management system, plotted against the scale factor.]
Figure 5: Scalability by the detailed calculation and by the bound: Connection Management System
The overall ratio of the read to write operations is approximately 2:1. A higher read-to-write ratio would give less coordination overhead, and greater scalability.
For further scale-up of this system, the results indicate some possible directions:

(1) The database schema could be re-worked to reduce the number of separate transactions.

(2) The routing algorithms at the topology, VP and SONET layers made heavy use of database transactions, and could be redesigned to reduce the database operations.

(3) It might be possible to partition the database objects, instead of replicating the whole database. For example, in this case, the ATM objects and the SONET objects could possibly be partitioned into two disjoint parts, by redesigning the database schema.
7 A call-processing system

[The test platform: IBM Power PCs (133 MHz) connected by an ATM network.]
[Figure 7: Layered model of the call processing software, including the processes and interactions: the terminal and user node, the originating-node software (Orig_AF, Orig_Call_Setup), the network interfaces (Net_IF, ATM_Net), the terminating side (Term_AF, Term_Call_Setup, Receiving_Term), the Location Server, and the Database with its operation, prepare, commit and abort entries and its DB_oheads overhead task.]
[Figure 8: Scalability analysis results for the call processing system: (a) productivity (per millisecond per unit cost) versus the scale factor, with the rule-of-thumb bound; (b) the scalability metric versus the scale factor; (c) optimized throughput (calls per hour) versus the scale factor; (d) the optimized replication levels of the database and location server versus the scale factor; (e) the maximum CPU utilization and the load imbalance (Umax, Umin) versus the scale factor; (f) the maximum remote invocation overhead incurred by any of the CPUs, as a function of the scale factor.]
TABLE 2. Scalability analysis results for the call processing system.

    Scale    Optimal no. of replicas      Productivity       Scalability     Throughput        Normalized*     System
    factor   Database    Location srv.    (optimized,        metric          (calls per        response        cost
                                          ms^-1/unit cost)   (optimized)     hour x 10^6)      time            (units)
    1        1           1                0.1600             1               1.0958            0.72            1.1
    2        1           1                0.1492             0.9325          2.1273            0.88            2.1
    3        1           6                0.1444             0.9025          3.0531            0.89            3.1
    5        4           2                0.0866             0.5415          3.6485            1.16            5.4
    10       2           4                0.0589             0.3683          4.4934            1.11            10.2
    15       2           10               0.0387             0.2421          4.4934            1.12            15.2

    *The response time is normalized to the target mean response time of 10 ms.
The factors which have to be balanced in the optimization of the scalability enablers include:

- Collocation and distribution of software tasks: collocation reduces the remote invocation overhead but may cause a load imbalance in the overall system,

- Replication of the database, which affects load balancing, remote invocation costs, overheads for coordination (and thus also latency) and system cost. (Replication of the location server, however, does not have a cost or overhead associated with it.)
The overhead latencies due to database replication, measured per response, are summarized below. These results correspond to the optimum replication levels mentioned in Table 2.

    Scale factor                                          5       10      15
    Overhead latency per response
    due to database replication (ms)                      2.10    0.7     0.7
The remote invocation overheads incurred are different for each CPU; the maximum among them is plotted in Figure 8(f).

Overall, the results show that scalability is reasonable up to a factor of 3. Beyond this, the scalability metric degrades, although capacity continues to increase up to a factor of about 10. Beyond this the system is bottlenecked and capacity is saturated.
Some details shown by the Table and by Figure 8 are:

- In Figure 8(a) the available productivity of the system drops gradually up to k = 3 and then more steeply. Figure 8(b) shows the scalability metric. It drops to 0.8 at about k = 4, indicating scalability up to a factor of 4.

- Figure 8(c) shows the throughput, with a knee at about k = 3. At this point it has been increased from about 1.09 million calls/hr to about 3.3 million calls/hr, while maintaining a good QoS. For scale factors beyond 3, the optimization gives a response time a little higher than the target value (which is encouraged by the metric, because it also gives a higher throughput).

- Figure 8(d) shows the replication of the location server and the database server, which follows a generally increasing trend with a lot of variation. The cost-benefit balance of replicas in the middle range is pretty well neutral, so the optimizer has stopped in different parts of the space in different runs. In fact, at k = 10 and 15, the optimizer chooses to leave CPUs unused (one at 10, two at 15) in order to be able to collocate some of the objects and save on remote-invocation overheads. Because of the unused CPUs, the minimum utilization is zero, so the maximum CPU utilization is the same as the imbalance Umax − Umin seen in Figure 8(e).

- Figure 8(e) gives a picture of the load balance, which becomes worse as the number of CPUs increases, because of constraints on allocation, differences in task demands, and remote invocation costs. At k = 10 and 15 the minimum utilization is zero because of unused processors.

- As the software tasks are spread out across more CPUs, they incur increasing remote invocation overheads, as seen in Figure 8(f). The worst overhead percentage among the processors is plotted; it levels off at about 50%.
Summary

Thus, we conclude that the call-processing system is scalable up to a scale factor of 4, at which point it can support about 3.3 million calls per hour. If it needs to be scaled beyond this point in a cost-effective manner, the following improvements could be tried:

- The organization of software objects into concurrent tasks could be redesigned, to make their execution and communication demands more equal.

- The database schema could be modified so the database could be partitioned in various domains rather than replicating it, and the consistency management overheads can be reduced. However, the location service faces an increased execution demand in such a scenario.

The optimization by simulated annealing took about 10 hours (on a SPARC Ultra-1 workstation) for each scale factor. This is entirely practical for a major evaluation, but it is also quite heavy, and a faster optimization technique would be preferred.
8 Conclusions

The proposed strategy-based scalability metric generalizes the well-known metrics for scalability of parallel computations, to describe heterogeneous distributed systems. In these systems a uniform increase in all types of components is usually not a reasonable scaling strategy.

The principal new features of this metric are: separating the impact of throughput and response time on the metric, formalizing the notion of a scaling strategy, introducing a quality-of-service evaluation, and introducing formal scalability enablers which are optimized at each scale factor. The metric is the ratio of the system's productivity in a scaled version, to the productivity of a base case. Relating scalability to productivity is consistent with quite general quality-of-service evaluation, and with previous work on metrics for parallel systems.

The previous scalability metrics are special cases. For example, in scalability based on fixed-size speedup, the scaling strategy is to use k processors, throughput is the inverse of completion time, cost is k, and the QoS function is f = 1. For scalability based on fixed-time speedup, the scaling strategy is also to use k processors but to also change the workload W to a value which keeps the completion time constant. Throughput is now constant, cost is k, and the QoS function is f = W.
The new strategy-based scalability metric gives reasonable results for a large collection of idealized and well-understood system models, in the form of queueing models suitable for distributed systems. While it requires substantial effort to apply it to real systems, the effort is manageable.
The contributions of this work are the new framework (including an open-ended range of possibilities for different quality-of-service evaluation functions, and for different scaling strategies), the new metric, a bounding calculation, and practical numerical techniques for evaluating the metric on real systems. These techniques are applied to two substantial problems which have not been described before, to indicate both the scaling limits and how the scalability might be improved.

The new framework is only applied here using models, to evaluate systems that have not yet been deployed, but it could also be used with measurements to evaluate live deployed scaled systems.

The paper has defined scaling only with a single scale factor, but the framework applies equally to multiple scale factors, describing independent scaling of different attributes of the system. The productivity and scalability would then be defined as functions of a vector k.
Acknowledgments
Discussions with Pankaj Garg and Jerry Rolia were helpful in the early stages of this work.
References
[1] P.P. Jogalekar and C.M. Woodside, "Evaluating the Scalability of Distributed Systems", Proc. 31st Hawaii Int. Conf. on System Sciences, vol. 7, pp. 524-524, January 1998.

[2] X.H. Sun and L.M. Ni, "Scalable Problems and Memory-Bounded Speedup", J. of Parallel and Distributed Computing, vol. 19, pp. 27-37, 1993.

[3] X.H. Sun and J. Zhu, "Performance Considerations of Shared Virtual Memory Machines", IEEE Trans. on Parallel and Distributed Systems, vol. 6, no. 11, pp. 1185-1194, November 1995.

[4] A.Y. Grama, A. Gupta and V. Kumar, "Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures", IEEE Parallel and Distributed Technology, pp. 12-21, August 1993.

[5] S.R. Sarukkai, P. Mehta and R.J. Block, "Automated Scalability Analysis of Message-Passing Parallel Programs", IEEE Parallel and Distributed Technology, Winter 1995, pp. 21-32.

[6] A. Sivasubramaniam, U. Ramachandran and H. Venkateswaran, "A Comparative Evaluation of Techniques for Studying Parallel System Performance", Technical Report GIT-CC-94/38, College of Computing, Georgia Institute of Technology, Atlanta, September 1994.

[7] O. Char, C. Evans and R. Bisbee, "Operating System Scalability: Windows NT vs. UNIX", Intergraph Corporation, available at http://www.ingr.com/ics/wkstas/ntscale.html

[8] C. Allison, P. Harrington, F. Huang and M. Livesey, "Scalable Services for Resource Management in Distributed and Networked Environments", WARP Report W1-96, Division of Computer Science, University of St. Andrews, UK. Available at http://www.warp.dcs.stand.ac.uk/warp

[9] C. Yoshikawa, B. Chun, P. Eastham, A. Vahdat, T. Anderson and D. Culler, "Using Smart Clients to Build Scalable Services", Internal report, Computer Science Division, University of California, Berkeley. Available at http://www.now.cs.berkeley/edu/SmartClients

[10] F. Sheikh, J. Rolia, P. Garg, S. Frolund and A. Shepherd, "Performance Evaluation of a Large Scale Distributed Application Design", World Congress on Systems Simulation, Singapore, September 1997.

[11] P.P. Jogalekar and C.M. Woodside, "A Scalability Metric for Distributed Computing Applications in Telecommunications", Proc. 15th International Teletraffic Congress - Teletraffic Contributions to the Information Age, eds. V. Ramaswami and P. Wirth, vol. 2(a), pp. 101-110, 1997.

[12] A. Giessler, J. Hanle, A. Konig and E. Pade, "Free Buffer Allocation - An Investigation by Simulation", Computer Networks, pp. 191-208, 1978.

[13] E.D. Lazowska, J. Zahorjan, G.S. Graham and K.C. Sevcik, "Quantitative System Performance - Computer System Analysis Using Queueing Network Models", chapter 5, Prentice-Hall, Inc., New Jersey, 1984.

[14] C.M. Woodside, J.E. Neilson, D.C. Petriu and S. Majumdar, "The Stochastic Rendezvous Network Model for Performance of Synchronous Client-Server-like Distributed Software", IEEE Trans. on Computers, vol. 44, no. 1, pp. 20-34, January 1995.

[15] J.A. Rolia and K.C. Sevcik, "The Method of Layers", IEEE Trans. on Software Engineering, vol. 21, no. 8, pp. 689-700, August 1995.

[16] P.J.M. van Laarhoven and E.J.L. Aarts, "Simulated Annealing: Theory and Applications", D. Reidel Publishing Company, Boston, 1987.

[17] L. Ingber, "Simulated Annealing: Practice Versus Theory", J. Mathl. Comput. Modelling, vol. 18, no. 11, December 1993.

[18] G.L. Bilbro and W.E. Snyder, "Optimization of Functions with Many Minima", IEEE Transactions on Systems, Man and Cybernetics, vol. 21, no. 4, July/August 1991.

[19] S. Majumdar, C.M. Woodside, J.E. Neilson and D.C. Petriu, "Performance Bounds for Concurrent Software with Rendezvous", Performance Evaluation, vol. 13, pp. 207-236, 1991.

[20] "Generic Functional Architectures for Transport Networks", International Telecommunications Union Recommendation no. G.805, November 1995.

[21] P. Triantafillou and D.J. Taylor, "The Location-Based Paradigm for Replication: Achieving Efficiency and Availability in Distributed Systems", IEEE Trans. on Software Engineering, vol. 21, pp. 1-18, January 1995.

[22] A.M. Pan, "Solving Stochastic Rendezvous Networks of Large Client-Server Systems with Symmetric Replication", Master's thesis, Dept. of Systems and Computer Engineering, Carleton University, Ottawa, September 1996.

[23] T. von Eicken, A. Basu, V. Buch and W. Vogels, "U-Net: A User-Level Network Interface for Parallel and Distributed Computing", Proc. 15th ACM Symposium on Operating Systems Principles, Colorado, pp. 1-14, December 1995.

[24] J.C. McDonald, "Fundamentals of Digital Switching", Chapter 4, Plenum Press, New York, 1990.