WAVEFORM RELAXATION: THEORY AND PRACTICE

by

J. White, F. Odeh, A. Sangiovanni-Vincentelli, and A. Ruehli

30 July 1985

College of Engineering
University of California, Berkeley, California 94720
Abstract: This paper surveys the family of Waveform Relaxation (WR) methods for solving the large systems of ordinary differential equations that arise in circuit simulation, and presents the practical techniques used to improve the efficiency of the basic WR algorithm, along with theoretical results that characterize when and how quickly these methods converge.
INTRODUCTION

The rapidly increasing availability of computing resources has made computer simulation an important and heavily used tool for both research and engineering design. Since many simulation problems are formulated as large systems of ordinary differential equations (ODE's), the efficiency of ODE solution methods is of central importance.
The standard approach to solving ODE systems is based on three techniques [1], [2]:

i) Stiffly stable implicit integration methods, such as the Backward Difference formulas, to convert the differential equations which describe the system into a sequence of nonlinear algebraic equations.

ii) Modified Newton methods to solve the algebraic equations by solving a sequence of linear problems.

iii) Sparse Gaussian Elimination to solve the systems of linear equations generated by the Newton method.
This direct approach can become very inefficient when the system's state variables are changing at very different rates. This is because the direct application of the integration method forces every differential equation in the system to be discretized identically, and this discretization must be fine enough so that the fastest changing state variable in the system is accurately represented. If it were possible to pick different discretization points, or timesteps, for each differential equation in the system, so that each could use the largest timestep that would accurately reflect the behavior of its associated state variable, then the efficiency of the simulation would be greatly improved.
Several modifications of the direct method have been used that allow the individual equations in the system to be solved with different timesteps. One such approach, and the one that will be discussed in this paper, is the family of Waveform Relaxation (WR) algorithms.
In this paper we will both survey the current state of research in WR algorithms and present new theoretical and practical results. The paper is organized in two parts. In the first part we will present the theoretical background for the basic WR algorithm. We will begin with a simple motivating example, and follow with the basic algorithm. Then a new proof of the convergence of the WR algorithm, based on showing that the iteration is a contraction in a particular norm, will be presented. Extensions to the basic algorithm that allow for modified iteration equations (nonstationary methods) and for a function-space Newton method will then be presented, and their convergence proved using lemmas from the basic theorem. Finally, discretization approximations will be considered in more detail, by comparing relaxation and explicit integration methods for a sample stiff problem.
In the second part we will analyze examples that illustrate several of the implementation techniques used to improve the efficiency of the basic WR algorithm, and prove theorems that indicate the strengths or limitations of these techniques. We will start by considering approaches for partitioning large systems into loosely coupled subsystems. We will then examine how breaking the simulation interval into pieces, called windows, can be used to reduce the number of relaxation iterations required to achieve convergence. Two techniques for reducing the iteration computation will then be presented. The first is based on performing one iteration of a Newton method with each relaxation iteration, and the second is based on exploiting piecewise linearity. Because the WR algorithm has proved to be an efficient technique for simulating MOS digital circuits, the examples used throughout this paper are drawn from this area. In order to more clearly demonstrate both the practicality of the WR algorithm, and the specific nature of its efficiencies, we will end the second section, and the paper, by examining in detail the application of the WR algorithm to the simulation of MOS digital circuits.
SECTION 1.1 - THE BASIC WR ALGORITHM

We will start this section with a simple illustrative example, and then present the basic algorithm. Consider a system of two differential equations in the two unknown waveforms x1(t) and x2(t), Eqns. (1.1.1a) and (1.1.1b). One iterative approach is to fix a guess for the waveform x2(t), solve Eqn. (1.1.1a) for x1(t), and then solve Eqn. (1.1.1b) for a new x2(t) using the just-computed x1(t). Eqn. (1.1.1a) is then re-solved using the new solution for x2(t), and the procedure is repeated.

Alternately, fix the waveform x2(t) in Eqn. (1.1.1a) and fix x1(t) in Eqn. (1.1.1b) and solve both one-dimensional differential equations simultaneously. Use the solution obtained for x2 in Eqn. (1.1.1b) and the solution obtained for x1 in Eqn. (1.1.1a) as the fixed waveforms for the next iteration.

In this fashion, two iterative algorithms have been constructed. Either replaces the solution of a coupled system by the repeated solution of single differential equations in single unknowns. These relaxation algorithms can be seen as the analogues of the Gauss-Seidel and the Gauss-Jacobi techniques for solving nonlinear algebraic equations. Here, however, the unknowns are waveforms (elements of a function space), rather than real variables. In this sense, the algorithms are techniques for time-domain decoupling of differential equations.
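To make the two iteration styles concrete, the following sketch (our own illustration, not code from the paper; the weakly coupled two-variable test system and all names are ours) applies both relaxations, solving each one-dimensional equation with the Implicit-Euler formula:

    import numpy as np

    T, h = 5.0, 0.01                       # simulation interval and timestep
    t = np.arange(0.0, T + h, h)
    N = len(t)

    def solve_scalar_be(a, c, w, x_init):
        """Implicit-Euler solve of the scalar ODE x' = a*x + c*w(t),
        where the coupling waveform w is given as samples."""
        x = np.empty(N); x[0] = x_init
        for n in range(1, N):
            x[n] = (x[n-1] + h * c * w[n]) / (1.0 - h * a)
        return x

    # Test system: x1' = -x1 + 0.5*x2,  x2' = -x2 + 0.5*x1,  x(0) = (1, -1).
    for label, seidel in (("Gauss-Seidel", True), ("Gauss-Jacobi", False)):
        x1 = np.full(N, 1.0); x2 = np.full(N, -1.0)    # initial guess waveforms
        for k in range(1, 50):
            x1_new = solve_scalar_be(-1.0, 0.5, x2, 1.0)
            # Seidel re-uses the freshly computed x1; Jacobi uses the old one
            x2_new = solve_scalar_be(-1.0, 0.5, x1_new if seidel else x1, -1.0)
            change = max(np.abs(x1_new - x1).max(), np.abs(x2_new - x2).max())
            x1, x2 = x1_new, x2_new
            if change < 1e-8:
                break
        print(f"{label}: converged in {k} waveform iterations")

Because the coupling in this toy system is weak, both variants converge in a handful of sweeps, with Gauss-Seidel slightly faster.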
Consider now a general implicit system of differential equations, F(x', x, u) = 0 (Eqn. (1.1.2)). Before relating the iterated waveforms to the original system's solution, we first must guarantee that Eqn. (1.1.2) has a solution. If we require that there exists a transformation of Eqn. (1.1.2) to the form y' = f(y, u), where f is Lipschitz continuous with respect to y for all u, then a unique solution for the system exists [22]. Although there are many sets of broad constraints on F that guarantee the existence of such a transformation, the conditions can be difficult to verify in practice. In addition, for the above system, it is difficult to determine how to assign variables to equations when applying the WR algorithm. That is, when solving the F_i equation of the system in the iteration process, which x_j variable should be solved for implicitly? If a poor choice is made, the relaxation may not converge [9B].

Rather than carefully considering the existence and assignment questions, which would complicate the analysis that follows without lending much insight, we will consider the following less general form, in which many practical problems, particularly circuit simulation problems, can be expressed:

    C(x, u) x' = f(x, u),    x(0) = x0    (1.1.3)

where C: R^n x R^r -> R^{n x n} is such that C(x, u)^{-1} exists and is uniformly bounded with respect to x, u; and f: R^n x R^r -> R^n is globally Lipschitz continuous with respect to x for all u. Note that x' = C(x, u)^{-1} f(x, u) is then a global normal form for Eqn. (1.1.3), and that x(t) in R^n is the vector of state variables for the system. Then, as f is globally Lipschitz with respect to x for all u, C(x, u)^{-1} is uniformly bounded, and u(t) is piecewise continuous, there exists a unique solution to Eqn. (1.1.3).
Algorithm 1.1.1 (Gauss-Seidel Waveform Relaxation for Eqn. (1.1.3)):

    k <- 0 ; choose an initial guess waveform x^0(t), t in [0,T], with x^0(0) = x0
    repeat {
        foreach ( i in {1, ..., n} ) {
            solve the ith equation of Eqn. (1.1.3) for x_i^{k+1}(t) on [0,T],
                with x_i^{k+1}(0) = x0_i, treating the already-computed
                x_1^{k+1}, ..., x_{i-1}^{k+1} and the previous-iteration
                waveforms x_{i+1}^k, ..., x_n^k as known
        }
        k <- k + 1
    } until ( max_{1<=i<=n} max_{t in [0,T]} | x_i^k(t) - x_i^{k-1}(t) | <= epsilon )

that is, until the iteration converges. ∎

Note that the differential equation solved in Algorithm 1.1.1 has only one unknown variable, x_i^{k+1}. The variables x_{i+1}^k, ..., x_n^k are known from the previous iteration and the variables x_1^{k+1}, ..., x_{i-1}^{k+1} have already been computed. Also, the Gauss-Jacobi version of the WR algorithm for Eqn. (1.1.3) can be obtained from Algorithm 1.1.1 by replacing the foreach statement with the forall statement and adjusting the iteration indices.
SECTION 1.2 - CONVERGENCE OF THE WR ALGORITHM

If C(x, u) is strictly diagonally dominant uniformly with respect to x and u, and f is globally Lipschitz continuous with respect to x for all u, then both the Gauss-Seidel and the Gauss-Jacobi versions of Algorithm 1.1.1 are guaranteed to converge. In [9A], it was shown that the WR method converges when C is diagonally dominant and independent of x. As many systems that are modeled in the form of Eqn. (1.1.3) have coefficient matrices C(x, u) that depend on x, the theorem presented in this section extends the original theorem to include these systems. In addition, we will prove the WR method is a contraction in a simpler norm than the one used in the original theorem.
We will prove the theorem by first showing that if C(x, u) is diagonally dominant, then there exists a bound on the x^k's generated by the WR algorithm that is independent of k. Using this bound, we will show that the assumption that C(x, u) is Lipschitz continuous implies there exists a norm, denoted || . ||_b, such that for arbitrary positive integers k and j

    || x^{k+1} - x^{j+1} ||_b <= alpha || x^k - x^j ||_b,    alpha < 1,

and therefore the sequence { x^k } converges in that norm by the contraction mapping theorem. As the || . ||_b norm is equivalent to the uniform norm, and x^k(0) = x0 for all k, { x^k } converges uniformly as well.
Before formally proving this basic WR convergence theorem we will state the well-known contraction mapping theorem [16], and a few lemmas which will be used in the proof.

Theorem (Contraction mapping): If Y is a Banach space and F: Y -> Y is such that ||F(y) - F(x)|| <= gamma ||y - x|| for all x, y in Y, for some gamma in [0,1), then F has a unique fixed point y* such that F(y*) = y*. Furthermore, for any initial guess y^0 in Y, the sequence defined by y^{k+1} = F(y^k) converges to y*.
Lemma 1.2.1: If C(x, u) in R^{n x n} is diagonally dominant uniformly over all x in R^n, u in R^r, then given any collection of vectors { x^1, ..., x^n }, x^i in R^n, and any u in R^r, the matrix C~(x^1, ..., x^n, u) in R^{n x n} defined by C~_ij(x^1, ..., x^n, u) = C_ij(x^i, u) is also diagonally dominant. In other words, let C~ be the matrix constructed by setting the ith row of C~ equal to the ith row of the given matrix C(x^i, u). Then this new matrix is also diagonally dominant.
Lemma 1.2.2: Let C in R^{n x n} be any strictly diagonally dominant matrix. Let L, strictly lower triangular, U, strictly upper triangular, and D, diagonal, be such that C = L + D + U. Then || (L + D)^{-1} U ||_infinity < 1.
Lemma 1.2.3: Let x, y in C([0,T], R^n), and suppose there exist some norm || . || on R^n, a constant gamma in [0,1), and finite constants l1, l2 such that for all t in [0,T]

    ||x(t)|| <= gamma ||y(t)|| + l1 Int_0^t ||x(tau)|| dtau + l2 Int_0^t ||y(tau)|| dtau + l2 ||y(0)||.

Then there exist a finite b > 0 and an alpha < 1 such that ||x||_b <= alpha ||y||_b + c ||y(0)|| for some finite c.

Proof: Multiplying the entire inequality by e^{-bt} and moving the norms inside the integrals yields

    e^{-bt} ||x(t)|| <= gamma e^{-bt} ||y(t)|| + l1 e^{-bt} Int_0^t ||x(tau)|| dtau
                        + l2 e^{-bt} Int_0^t ||y(tau)|| dtau + l2 e^{-bt} ||y(0)||.

Let || . ||_b be defined by ||f||_b = max_{t in [0,T]} e^{-bt} ||f(t)||. This is a norm on C([0,T], R^n) for any finite positive number b > 0 and is equivalent to the uniform norm on C([0,T], R^n). Bounding ||x(tau)|| above by e^{b tau} ||x||_b, and ||y(tau)|| by e^{b tau} ||y||_b, gives

    e^{-bt} ||x(t)|| <= gamma ||y||_b + l1 e^{-bt} Int_0^t e^{b tau} dtau ||x||_b
                        + l2 e^{-bt} Int_0^t e^{b tau} dtau ||y||_b + l2 ||y(0)||.

And since e^{-bt} Int_0^t e^{b tau} dtau <= 1/b, for b > l1 we can write

    ||x||_b <= [ (gamma + l2 b^{-1}) ||y||_b + l2 ||y(0)|| ] / (1 - l1 b^{-1}).    (1.2.5)

Because gamma is less than 1, there exists a finite B for which (gamma + l2 B^{-1}) / (1 - l1 B^{-1}) = alpha < 1. Let the b in Eqn. (1.2.5) be set equal to this B to get the result. ∎

With these lemmas in hand, we can state the basic convergence theorem.

Theorem 1.2.1 (WR convergence): If C(x(t), u(t)) in R^{n x n} in Eqn. (1.1.3) is strictly diagonally dominant uniformly over all x(t) in R^n and u(t) in R^r, and Lipschitz continuous with respect to x(t) for all u(t), then the waveforms generated by Algorithm 1.1.1 converge uniformly to the solution of Eqn. (1.1.3) on [0,T].
We will present the proof only for the Gauss-Seidel WR algorithm, as the proof for the Gauss-Jacobi case is almost identical. Let L_k(t), D_k(t) and U_k(t) denote the strictly lower triangular, diagonal, and strictly upper triangular parts of the C matrix assembled row by row along the iteration-k waveforms, as in Lemma 1.2.1, so that one iteration can be written in the normal form

    x'^{k+1} = -(L_{k+1} + D_{k+1})^{-1} U_{k+1} x'^k + (L_{k+1} + D_{k+1})^{-1} f(x^{k+1}, x^k, u).

Subtracting the corresponding equation for iterates j+1 and j and taking norms yields a bound involving l1 and l2, where l1 is the Lipschitz constant of f with respect to its first argument and l2 is the Lipschitz constant of f with respect to its second argument. That C(x, u) is uniformly diagonally dominant and Lipschitz continuous with respect to x for all u implies (L_k + D_k)^{-1} and (L_k + D_k)^{-1} U_k are also Lipschitz continuous in the same manner. It then follows that there exist some positive finite numbers k1, k2, k3, k4 such that

    || x'^{k+1}(t) - x'^{j+1}(t) || <=
        gamma || x'^k(t) - x'^j(t) ||
      + [ k1 || x^{k+1}(t) - x^{j+1}(t) || + k2 || x^k(t) - x^j(t) || ] || x'^j(t) ||
      + [ k3 || x^{k+1}(t) - x^{j+1}(t) || + k4 || x^k(t) - x^j(t) || ] || f(x^{j+1}, x^j, u) ||
      + K [ l1 || x^{k+1}(t) - x^{j+1}(t) || + l2 || x^k(t) - x^j(t) || ]    (1.2.10)

where k1 is the Lipschitz constant of (L_k + D_k)^{-1} U_k with respect to its first x argument (see the definitions of L_k, U_k and D_k above), k2 is the Lipschitz constant with respect to the second x argument, k3 and k4 are the Lipschitz constants of (L_k + D_k)^{-1} with respect to its first and second x arguments, K is the uniform bound on || (L_k + D_k)^{-1} ||, and gamma is such that || (L_k + D_k)^{-1} U_k || <= gamma < 1 independent of k (by Lemma 1.2.2).

To establish a bound on the terms in Eqn. (1.2.10) involving || x'^j(t) || and || f(x^{j+1}, x^j, u) || it is necessary to show that the x^k's, and therefore the x'^k's and f(.)'s, are bounded a priori. We prove such a bound exists in the following lemma.
Lemma 1.2.4: If C(x, u) in Eqn. (1.1.3) is strictly diagonally dominant and Lipschitz continuous, then the x^k(t)'s produced by Algorithm 1.1.1 are bounded independent of k.

Proof sketch: If || . || is the l_infinity norm on R^n, then by Lemma 1.2.1, || (L_{k+1} + D_{k+1})^{-1} U_{k+1} || < 1. From Eqn. (1.2.7),

    || x'^{k+1}(t) || <= gamma || x'^k(t) || + l1 K || x^{k+1}(t) || + l2 K || x^k(t) || + K || f(0, 0, u) ||    (1.2.13)

Eqn. (1.2.13) is in the right form to apply a slightly modified Lemma 1.2.3. Therefore there exists some || . ||_b in which the iterates are uniformly bounded, since || x^0 ||_b must be bounded given a finite x(0), and || x^{k+1} ||_b = max_{t in [0,T]} e^{-bt} || x^{k+1}(t) ||. ∎

Returning to the proof of Theorem 1.2.1: using the a priori bounds of Lemma 1.2.4 in Eqn. (1.2.10), and then applying Lemma 1.2.3, shows that the WR iteration is a contraction in one of the b norms. By the contraction mapping theorem, x^k therefore converges to some x in C([0,T], R^n) which is a fixed point of the WR iteration, and any such fixed point solves Eqn. (1.1.3), which completes the proof. ∎
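The role of the b norm is easy to observe numerically. The sketch below (our own construction; the strongly coupled two-variable test system and all parameters are illustrative) prints the ratio || x^{k+1} - x^k ||_b / || x^k - x^{k-1} ||_b for several values of b; for b = 0 (the uniform norm) the early ratios can exceed one, while for sufficiently large b the iteration is visibly a contraction, as the proof predicts:

    import numpy as np

    T, h = 2.0, 0.005
    t = np.arange(0.0, T + h, h); N = len(t)

    def solve_scalar_be(a, c, w, x_init):
        # Implicit-Euler solve of x' = a*x + c*w(t)
        x = np.empty(N); x[0] = x_init
        for n in range(1, N):
            x[n] = (x[n-1] + h * c * w[n]) / (1.0 - h * a)
        return x

    def b_norm(f, b):
        return np.max(np.exp(-b * t) * np.abs(f))

    # Strongly coupled pair: x1' = -x1 + 4*x2, x2' = -x2 + 4*x1, x(0) = (1, 1).
    x1 = np.ones(N); x2 = np.ones(N)
    d_prev = None
    for k in range(1, 6):
        x1n = solve_scalar_be(-1.0, 4.0, x2, 1.0)       # Gauss-Seidel sweep
        x2n = solve_scalar_be(-1.0, 4.0, x1n, 1.0)
        d = (x1n - x1, x2n - x2)
        if d_prev is not None:
            for b in (0.0, 2.0, 20.0):
                num = max(b_norm(d[0], b), b_norm(d[1], b))
                den = max(b_norm(d_prev[0], b), b_norm(d_prev[1], b))
                print(f"iteration {k}, b = {b:4.1f}: ratio = {num / den:.3f}")
        d_prev, (x1, x2) = d, (x1n, x2n)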
SECTION 1.3 - NONSTATIONARY WR ALGORITHMS

Algorithm 1.1.1 is stationary in the sense that the equations that define the iteration do not change from iteration to iteration. A natural generalization is to allow these iteration equations to change, and to consider under what conditions the relaxation still converges [9]. There are two major reasons for studying nonstationary algorithms. The first is that the solution of the ordinary differential equations in the inner loop of Algorithm 1.1.1 cannot be obtained exactly. Instead, numerical methods compute the solution with some error which is in general controlled, but which cannot be eliminated. However, the discrete approximation can be interpreted as the exact solution to a perturbed system. Since the approximation changes with the solutions, the perturbation changes from iteration to iteration, and any implementation that must compute the solution to the iteration equations approximately can be interpreted as a nonstationary relaxation algorithm.

The second reason for studying nonstationary methods is that they can be used to improve efficiency. One strategy would be to improve the accuracy of the computation of the iteration equations as the relaxation approaches convergence. In this way, accurate solutions to the original system would still be obtained, but unnecessarily accurate computation of the early iteration waveforms, which are usually far from the final solution, is avoided.

The theorem below formalizes these ideas. That is, given mild assumptions about the relationship between a general stationary contraction map and a nonstationary map, the nonstationary map will produce a sequence that converges to within some tolerance of the original map's fixed point. And if, in the limit as k -> infinity, the nonstationary map approaches the stationary map, then the sequence generated by the nonstationary map will converge to the fixed point of the original map. In later sections we will use this theorem to establish the convergence of several approximation-based algorithms.
Theorem 1.3.1: Let Y be a Banach space and F, F_k: Y -> Y. Define y^{k+1} = F(y^k) and y~^{k+1} = F_k(y~^k). If F is a contraction mapping with contraction factor gamma (see Section 1.2); || F(y) - F_k(y) || <= delta_k for all y in Y; and z in Y is such that z = F(z); then for any epsilon > 0 there exists a delta > 0 such that if delta_k < delta for all k, the y~^k eventually stay within epsilon of z. Moreover, if delta_k -> 0, then y~^k -> z.

Proof: Taking the norm of the difference between the kth and (k+1)st iterates of the nonstationary algorithm, and using the contraction property of F together with || F(y) - F_k(y) || <= delta_k, bounds the distance between the computed iterates and the fixed point z of F:

    lim sup_{k -> infinity} || y~^{k+1} - z || <= delta / (1 - gamma)    (1.3.8)

which completes the proof of the first statement of Theorem 1.3.1. The second statement follows by letting delta_k -> 0. ∎

Recall from the proof of Theorem 1.2.1 that the WR map satisfies || x^{k+1} - x^{j+1} ||_b <= alpha || x^k - x^j ||_b, i.e., it is a contraction in a b norm. This WR convergence result and Theorem 1.3.1 imply that using any "reasonable" approximation method to solve the WR iteration equations will not affect the convergence, provided the errors in the approximation are driven to zero. In addition, Theorem 1.3.1 indicates that it will be difficult to determine a priori how accurately the iteration equations must be solved if a given final accuracy is required.
Theorem 1.2.1 establishes that the WR iteration contracts with respect to x(t) in a B norm, so Theorem 1.3.1 implies that the WR iteration equations must be solved accurately with respect to x(t) in this B norm if the iteration is to converge to the exact solution; in other words, it is only necessary to control the errors in the computed x^k(t)'s. The derivatives x'^k(t), on the other hand, contract only in a larger B norm than the one used in the proof of Theorem 1.2.1, and since the weight e^{-Bt} with a large B discounts errors late in the interval, convergence in a larger B norm is in some sense a weaker type of convergence. So, in the case where C(x, u) has significant off-diagonal terms, a more graceful convergence is achieved if the x^k(t)'s are computed in a way that also guarantees that the x'^k(t)'s are globally accurate.
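The content of Theorem 1.3.1 can be seen in a few lines. In the sketch below (a toy scalar map of our own, not a WR computation), F is a contraction with factor gamma and the nonstationary iteration adds a perturbation delta_k; bounded perturbations leave a residual error of about delta / (1 - gamma), while vanishing perturbations give convergence to the fixed point:

    gamma, z = 0.5, 2.0
    F = lambda y: gamma * (y - z) + z          # contraction with fixed point z

    for label, deltas in (("bounded delta_k", [0.1] * 40),
                          ("vanishing delta_k", [0.1 * 0.7 ** k for k in range(40)])):
        y = 10.0
        for d in deltas:
            y = F(y) + d                       # nonstationary iterate y = F_k(y)
        print(f"{label:17s}: |y - z| = {abs(y - z):.3e}  "
              f"(bound delta/(1-gamma) = {0.1 / (1 - gamma):.3e})")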
SECTION 1.4 - THE WAVEFORM-NEWTON METHOD

The WR algorithms presented above are the function-space extensions of the relaxation methods used to solve nonlinear algebraic problems. Another popular method for solving nonlinear algebraic problems is the Newton-Raphson method, and its function-space extension also has practical applications. In this section we will derive the function-space Newton method applied to systems of the form of Eqn. (1.1.3) and prove that the method has global convergence properties, which is not true in general for the algebraic Newton-Raphson method. Writing Eqn. (1.1.3) as F(x) = C(x, u) x' - f(x, u) = 0, the function-space Newton iteration is
    x^{k+1} = x^k - J_F(x^k)^{-1} F(x^k)    (1.4.2)

where J_F(x) is the Frechet derivative of F(x) with respect to x. Note that in this case J_F(x) is a matrix-valued function on [0,T]; that is, J_F(x) is a matrix of waveforms. Carrying out the differentiation for the case C(x, u) = I reduces Eqn. (1.4.2) to solving the linear time-varying differential equation

    x'^{k+1} = f(x^k, u) + (df/dx)(x^k, u) ( x^{k+1} - x^k ),    x^{k+1}(0) = x0    (1.4.6)

for x^{k+1} (the general case adds terms involving dC/dx). We will refer to Eqn. (1.4.6) as the Waveform-Newton (WN) algorithm for solving Eqn. (1.1.3). It is, however, just the function-space extension of the classical Newton-Raphson algorithm.
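A minimal sketch of the WN iteration for a scalar problem with C(x, u) = I follows (the right-hand side f and all parameters are our own choices; each iteration solves the linear time-varying ODE of Eqn. (1.4.6), discretized here with Implicit-Euler):

    import numpy as np

    f  = lambda x: -x**3 + 1.0                 # toy right-hand side
    fp = lambda x: -3.0 * x**2                 # df/dx

    T, h = 3.0, 0.01
    t = np.arange(0.0, T + h, h); N = len(t)
    x0 = 0.0

    x = np.full(N, x0)                         # initial guess waveform
    for k in range(10):
        xn = np.empty(N); xn[0] = x0
        for n in range(1, N):
            # Implicit-Euler step of the ODE linearized about the old iterate
            rhs = f(x[n]) - fp(x[n]) * x[n]
            xn[n] = (xn[n-1] + h * rhs) / (1.0 - h * fp(x[n]))
        err = np.abs(xn - x).max()
        x = xn
        print(f"WN iteration {k}: max change {err:.2e}")
        if err < 1e-10:
            break

As with the algebraic Newton-Raphson method, the change per iteration shrinks quadratically once the waveform is near the solution.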
Newton methods converge quadratically when the iterates are sufficiently close to the correct solution, but they do not in general have global convergence properties. However, the WN algorithm represented by Eqn. (1.4.6) does converge for any initial guess, given mild assumptions on the behavior of dC(x, u)/dx, as in the following theorem.

Theorem 1.4.1: For any system of the form of Eqn. (1.1.3) in which dC(x, u)/dx is Lipschitz continuous with respect to x for all u, the sequence { x^k } generated by the WN algorithm, with x^k(0) = x0 for all k, converges to the solution of Eqn. (1.4.1) on any bounded interval. ∎

In proving the theorem we assume that C(x, u) is the identity, as the proof for the general case is much more involved, and does not provide much further insight into the nature of the convergence.
SECTION 1.5 - DISCRETIZED WR ALGORITHMS

In any practical implementation of a WR algorithm, numerical methods must be used to solve the systems of nonlinear ordinary differential equations in the inner loop. The most popular techniques for solving these systems are the multistep integration formulas (such as the Backward Difference formulas). The accuracy of the computed solution is a function of the timesteps, which are usually chosen small enough so that the estimated local truncation error stays below a user-supplied bound. With finite timesteps, the differential equations that describe the decomposed systems are not solved exactly. However, one can view the discretized solution as the exact solution of a slightly perturbed problem. The nonstationary relaxation results of Section 1.3 can then be applied to guarantee WR convergence to the solution of the given system of ODE's when the global discretization error, a function of the timesteps, is driven to zero.

If the global discretization error is not driven to zero, one may expect that the computed waveforms converge to the discretized solution of the original system of ODE's. In this section we show that unless the timesteps used in the numerical method are kept below some problem-dependent bound, the WR algorithm may not converge at all. We will also show that the discretized WR algorithm converges if the timesteps used are "small enough". Finally, we will end this section by comparing explicit and implicit integration methods in the WR setting.
Consider the two-node inverter circuit in Fig. 1. The current equations at each node were linearized about the point where the input and output voltages were equal to half the supply voltage, and time was normalized, yielding a small linear system of the form

    x1' = -x1 + lambda x2
    x2' = -lambda x1 - x2    (1.5.3)
    x1(0) = x2(0) = 0.

Note that the initial conditions given for the above example identify a stable equilibrium point.

The Gauss-Seidel WR iteration equations for the linear system example, discretized with the Implicit-Euler formula with timestep h, are:

    x1^{k+1}(n) = ( 1/(1+h) ) [ x1^{k+1}(n-1) + h lambda x2^k(n) ]
    x2^{k+1}(n) = ( 1/(1+h) ) [ x2^{k+1}(n-1) - h lambda x1^{k+1}(n) ]    (1.5.4)

As an example, let lambda = 200, h = 0.5 and as an initial guess use x2(nh) = nh, which is far from the exact solution x2(nh) = 0. The computed sequences for the initial guess and the first, second and third iterations of Eqn. (1.5.4) are presented in Table 1.

As Table 1 indicates, the WR algorithm diverges for this example. In fact, each sweep of Eqn. (1.5.4) amplifies the iteration error by a factor on the order of

    h lambda / (1 + h)    (1.5.5)

which is far greater than one for these values of h and lambda.
To understand why the WR algorithm can diverge when large timesteps are used, consider the Implicit-Euler discretized Gauss-Seidel WR algorithm applied to Eqn. (1.1.3) with C(x, u) = C, a constant matrix with splitting C = L + D + U as in Lemma 1.2.2. The WR iteration equation at the (n+1)st timepoint is

    (L + D) [ x^{k+1}(n+1) - x^{k+1}(n) ] + U [ x^k(n+1) - x^k(n) ]
        = h f( x^{k+1}(n+1), x^k(n+1), u ).    (1.5.6)

In the limit as h -> infinity, Eqn. (1.5.6) becomes equivalent to solving f( x^{k+1}(n+1), x^k(n+1), u ) = 0. Since little is assumed about f other than Lipschitz continuity, it is unlikely that this problem can be solved, in general, with a simple Gauss-Seidel relaxation. However, in the limit as the timestep becomes small, Eqn. (1.5.6) becomes

    x^{k+1}(n+1) - x^{k+1}(n) = -(L + D)^{-1} U [ x^k(n+1) - x^k(n) ]

and from the lemma in Section 1.2, || (L + D)^{-1} U || < 1, so the relaxation converges.
In effect, when the timestep is large each relaxation iteration must solve a nonlinear algebraic problem. As the timestep decreases, the problem is continuously deformed from one that may not be solvable by relaxation to one that is guaranteed to be solvable by relaxation. This observation is formalized in the following theorem.

Theorem 1.5.1: If, in addition to the assumptions of Theorem 1.2.1, the WR iteration equations are solved using a stable and consistent multistep method with a fixed timestep h, for a finite number of timepoints, then there exists an h' > 0 such that the discretized WR algorithm converges for all h < h'. ∎
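The timestep threshold is easy to observe. The sketch below (our own test system in the spirit of the example above: x1' = -x1 + lambda*x2, x2' = -lambda*x1 - x2, which is stable) runs Implicit-Euler Gauss-Seidel WR sweeps over a fixed set of timepoints; each sweep scales the iteration error roughly by h*lambda/(1+h), so small timesteps converge and large ones diverge:

    import numpy as np

    lam, Npts, sweeps = 200.0, 40, 8

    def gs_wr_last_change(h):
        # initial guess: constant waveforms at the initial condition (1, 1)
        x1 = np.ones(Npts); x2 = np.ones(Npts)
        for _ in range(sweeps):
            n1 = np.empty(Npts); n2 = np.empty(Npts)
            n1[0] = n2[0] = 1.0
            for n in range(1, Npts):
                n1[n] = (n1[n-1] + h * lam * x2[n]) / (1.0 + h)   # uses old x2
                n2[n] = (n2[n-1] - h * lam * n1[n]) / (1.0 + h)   # uses new x1
            change = max(np.abs(n1 - x1).max(), np.abs(n2 - x2).max())
            x1, x2 = n1, n2
        return change

    for h in (0.001, 0.01, 0.5):
        print(f"h = {h:5.3f}  (h*lam/(1+h) = {h * lam / (1 + h):7.2f}): "
              f"last-sweep change = {gs_wr_last_change(h):.3e}")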
Now consider solving Eqn. (1.5.3) using the computationally simpler Explicit-Euler integration formula ( x'(nh) is approximated by (1/h)[ x((n+1)h) - x(nh) ] ), again starting from the initial guess x2(nh) = nh. In this case the iterates do not diverge: the relaxation converges, one additional timepoint per iteration, for this example. In general, if explicit integration methods are used in the WR algorithm, the following convergence result holds.
Theorem 1.5.2: If, in addition to the assumptions of Theorem 1.2.1, the WR iteration equations are solved using an explicit multistep method with a fixed timestep h, for a finite number of timepoints, then the WR algorithm converges in a finite number of iterations.

The proof of this theorem follows from a simple inductive argument [21]. Let x~(n) be the exact solution to the system discretized using the explicit method, and suppose the iterate x^k satisfies x^k(m) = x~(m) for all m <= n. Since the integration method is explicit, x^{k+1}(n+1) and x~(n+1) are the same function of u and of the values at timepoints m <= n. Therefore x^{k+1}(m) = x~(m) for all m <= n+1. As x^k(0) = x~(0) for all k by assumption, each iteration extends the agreement with x~ by at least one timepoint, which completes the proof. Note that this proof guarantees that the discretized WR algorithm converges precisely to x~(n), the solution that would be computed by applying the explicit method directly to the entire system.
It should be noted that the above proof does not show that the discretized WR algorithm improves the stability of the Explicit-Euler method, or of explicit methods in general. Since these methods have small regions of absolute stability, the timestep may be limited not by accuracy considerations but by the need to insure stability. For example, consider the differential equation of Eqn. (1.5.3), but with a perturbed initial condition, x1(0) = x2(0) = 0.1. The exact solution will decay asymptotically to zero, but the numerical solution produced by the Explicit-Euler formula with a large fixed timestep will grow without bound.

If the problem is to be solved with an explicit method with fixed timesteps, WR is not a good algorithm to use. As the convergence proof indicates, the WR algorithm will converge for at least one additional timestep, and probably not many more, with each relaxation iteration. For that reason, it is inefficient to compute more than one additional timestep with each WR iteration. Given that, it follows that there is no reason to recompute the old timepoints, because the relaxation will have converged there for sure. The WR algorithm is then reduced to an explicit integration method applied directly to the original system.
Still, it is interesting to compare the timestep necessary to insure convergence of the implicitly discretized WR iterations with the timestep necessary for stability of the Explicit-Euler method, because if implicit methods do not allow use of much larger timesteps, the extra computation required to use them is not worthwhile. For the example above with lambda = 10, the Implicit-Euler timesteps are unconstrained and the Explicit-Euler timestep must be less than 1.0. If lambda = 100, then the Implicit-Euler timesteps must be less than 0.37, and the Explicit-Euler timesteps must be less than 0.18. In addition, Implicit-Euler will continue to allow larger timesteps than Explicit-Euler for very large lambda, because its timestep constraint decreases more slowly with lambda.
One can infer from the above example that the WR algorithm allows the use of larger timesteps than a direct explicit method in most cases, but constrains the timesteps more than a direct implicit method. The difference between the direct implicit method timestep constraint and that for the WR algorithm is smallest if the system to be solved is very loosely coupled. Digital integrated circuits, for which the WR algorithm was originally developed, are not always loosely coupled. The coupling can be quite strong, but is usually so only for short intervals. The WR algorithm is efficient for these problems because small timesteps are required (to insure WR convergence) only during those intervals when the coupling is strong, and, because implicit integration is used, the timestep can safely be made much larger for the rest of the interval [19].
Most of the above analysis does not extend readily to the case where different timesteps are used for different nodes of the system. Examining the multiple-timestep case is in general a very difficult problem in numerical analysis, even for standard methods, and nothing has been published examining this case. Since the major advantage of WR is that only those variables in the system that are changing rapidly use small timesteps, this is an important missing piece of the theory of WR methods.
SECTION 2.1 - PARTITIONING THE SYSTEM

In Algorithm 1.1.1, the system equations are solved as single differential equations in one unknown, and these solutions are iterated until convergence. If this kind of node-by-node decomposition strategy is used for systems with even just a few tightly coupled nodes, the WR algorithm will converge very slowly. As an example, consider the three-node circuit in Fig. 2a, a two-inverter chain separated by a resistor. The resistor models short wiring delays, so it has a large conductance compared to the other conductances in the circuit. The current equations for the system can be written down by inspection:

    C x1' + i_m1(x1, u) + i_m2(x1, vdd) + g (x1 - x2) = 0
    C x2' + g (x2 - x1) = 0
    C x3' + i_m3(x3, x2) + i_m4(x3, vdd) = 0

Linearizing and normalizing time (so that the simulation interval [0,T] is converted to [0,1]) yields a 3x3 linear system with initial conditions

    x1(0) = x2(0) = 0,    x3(0) = 5.

Algorithm 1.1.1 was used to solve the original nonlinear system. The input u(t), the exact solution for x2, and the first, fifth and tenth iteration waveforms generated by the WR algorithm for x2 are plotted in Fig. 2b. As the plot indicates, the iteration waveforms for this example are converging very slowly. The reason for this slow convergence can be seen by examining the linearized system: it is clear that x1 and x2 are tightly coupled through the large conductance g.
If Algorithm 1.1.1 is modified so that x1 and x2 are lumped together and solved as a single subsystem, the iteration equations for the linearized example take the form

    x1'^{k+1} = -10 x1^{k+1} + 9.5 x2^{k+1} + (input terms)
    x2'^{k+1} =  9.5 x1^{k+1} - 9.5 x2^{k+1}

with the x3^{k+1} equation driven only by x2^{k+1} and x3^{k+1}. The modified WR algorithm now converges in one iteration, because x3 only depends on waveforms that are computed exactly in the first iteration, and x1 and x2 do not depend on x3.
As the example above shows, lumping together tightly coupled nodes and solving them directly can greatly improve the efficiency of the WR algorithm. For this reason, the first step in almost every WR-based program is to partition the system, that is, to scan all the nodes in the system and determine which should be lumped together and solved directly. Partitioning "well" is difficult for several reasons. If too many nodes are lumped together, the advantages of using relaxation will be lost, but if any tightly coupled nodes are not lumped together then the WR algorithm will converge very slowly. And since the aim of WR is to perform the simulation rapidly, it is important that the partitioning itself be computed quickly.
Three approaches have been applied to solve this partitioning problem, and all have been used successfully in programs for solving large systems. The first approach is to require the user to partition the system [10]. This technique is reasonable for the simulation of large digital integrated circuits, because usually the large circuit has already been broken up into small, fairly independent pieces to make the design easier to manage. However, a partitioning that is natural from the design point of view may not be a good partitioning for the WR algorithm. For this reason, programs that require the user to partition the system sometimes perform a "sanity check" on the partitioning [11]: a warning is issued if there are tightly coupled nodes that have not been lumped together.
The second approach is the functional extraction method [12]. In this method the equations that describe the system are carefully examined to try to find functional blocks (i.e., a nand gate or a flip-flop). It is then assumed that nodes of the system that are members of the same functional block are tightly coupled, and they are therefore grouped together. This type of partitioning is difficult to perform, since the algorithm must recognize broad classes of functional blocks, or nonstandard blocks may not be treated properly. However, the functional extraction method can produce very good partitions, because the effective coupling between nodes is judged from the function the block performs rather than from worst-case numerical estimates.
The most general, and perhaps the most obvious, approach to the partitioning problem is the "diagonally dominant loop" method [13][12]. In this method, tightly coupled node pairs are identified by examining two-by-two loops in the Jacobians of f(x, u) and C(x, u). If the magnitude of the product of the diagonal terms is not greater than the product of the off-diagonal terms by some factor alpha (a good choice of alpha will depend on the application), then the two nodes corresponding to the submatrix are lumped together.
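A sketch of this test is below (our own rendering; the threshold alpha, the union-find bookkeeping, and the example Jacobian are illustrative). Nodes i and j are lumped whenever the two-by-two loop fails the dominance test, and subsystems are the resulting connected components:

    import numpy as np

    def partition(J, alpha=10.0):
        n = J.shape[0]
        parent = list(range(n))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]; i = parent[i]
            return i
        for i in range(n):
            for j in range(i + 1, n):
                loop = abs(J[i, j] * J[j, i])
                # dominance test fails -> nodes are tightly coupled: lump them
                if loop > 0 and abs(J[i, i] * J[j, j]) <= alpha * loop:
                    parent[find(i)] = find(j)
        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        return list(groups.values())

    # Nodes 0 and 1 joined by a large conductance, node 2 weakly coupled.
    J = np.array([[-10.0,  9.5,  0.1],
                  [  9.5, -9.5,  0.0],
                  [  0.1,  0.5, -3.0]])
    print(partition(J))    # -> [[0, 1], [2]]

In practice the same test would be applied to both C(v, u) and the Jacobian of f(v, u), evaluated at worst-case operating points.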
The diagonally dominant loop method has the advantage of simplicity and generality, but it is often too conservative in practice. Unnecessarily large subsystems can be generated, because only the worst-case coupling is considered when lumping nodes together. There are also cases for which the method is not conservative enough: a poor partitioning will be generated for systems that include sets of nodes that are extremely tightly coupled to each other and are also tightly coupled to other nodes in the system. The functional extraction method, by contrast, must capture a wide variety of functional blocks and can become a very complicated algorithm; however, functional extraction methods better estimate the effective coupling between nodes.

In the case where a functional extraction method exists but is too complicated to apply to a large system directly, a good mixed approach is to use the two methods sequentially. First partition the system by applying the diagonally dominant loop method; then apply the functional extraction method only to any overly large subsystems that result.
SECTION 2.2 - WINDOWS

The convergence theorem presented in Section 1.2 guarantees that the WR algorithm converges on any finite simulation interval, but it does not guarantee that the convergence will be rapid. In this section we will examine how to reduce the number of relaxation iterations required to achieve convergence by breaking the simulation interval into small pieces, or "windows". First we will prove WR convergence in an unweighted norm for short intervals. As this proof must take into account worst-case behavior, the estimate of the interval the proof provides is too short to be practical. This will lead us to the conclusion that an adaptive approach to choosing the windows will be more useful, and that this is a safe alternative, because the basic convergence theorem guarantees convergence regardless of the interval chosen.
Consider, as a first example, the cross-coupled nor logic gate in Fig. 3a (the approximate equations represent a normalization that converts the simulation interval to [0,1]). The Gauss-Seidel WR algorithm was used to compute the behavior of the cross-coupled nor gate circuit approximated by this small system of equations. In Fig. 3b, plots of the input u(t), the exact solution for x1(t), and the relaxation iteration waveforms for x1(t) for the 5th, 10th and 20th iterations are shown. The plots demonstrate a property typical of the WR algorithm when applied to systems with strong coupling: the difference between the iteration waveforms and the correct solution is not reduced at every time point in the waveform. Instead, each iteration lengthens the interval of time, starting from zero, over which the iteration waveforms closely approximate the solution.
As a contrasting example, consider the weakly coupled circuit in Fig. 4a, whose normalized equations include

    x2' = x1 - x2,    x(0) = 0.

The Gauss-Seidel WR algorithm given in Section 1.2 was used to solve the original system approximated by the above system of equations. The input u(t), the exact solution for x1(t), and the waveforms for x1(t) computed from the first, second, and third iterations of the WR algorithm are plotted in Fig. 4b. As the plots for this example show, the difference between the iteration waveforms and the correct solution is reduced everywhere on the interval with each iteration.
Perhaps surprisingly, the behavior of the first example is consistent with the WR convergence theorem, even though that theorem states that the iterations converge uniformly. This is because it was proved that the WR method is a contraction map in the norm

    || f ||_b = max_{t in [0,T]} e^{-bt} || f(t) ||

where b > 0, f(t) in R^n, and || . || is a norm on R^n. Note that || f(t) || can increase as e^{bt} without increasing the value of this function-space norm. If || f(t) || grows slowly, or is bounded, it is possible to reduce the function-space norm by reducing || f(t) || only on some small interval in [0,T], though it will be necessary to increase this interval to decrease the function-space norm further. The waveforms in the more slowly converging example above converge in just this way: the function-space norm is decreased after every iteration of the WR algorithm because significant errors are reduced over larger and larger intervals of time. The examples above lead to the following definition:
Definition 2.2.1: A differential system of the form given in Eqn. (1.1.3) is said to have the strict WR contractivity property on [0,T] if the WR algorithm applied to the system is a contraction map in a uniform norm on [0,T], i.e., in the norm above with b = 0, for every T in [0, infinity). ∎

Loosely, a system has the strict WR contractivity property when the errors due to the decomposition die off in time, instead of accumulating or growing. As the cross-coupled nor gate example indicates, many systems of interest do not have the strict WR contractivity property on [0,T] for all T < infinity. However, we will prove that any system that satisfies the WR convergence theorem will also have the strict WR contractivity property on some sufficiently short interval.
Theorem 2.2.1: For any system of the form of Eqn. (1.1.3) which satisfies the assumptions of the WR convergence theorem (Theorem 1.2.1), there exists a T~ > 0 such that the system has the strict WR contractivity property on [0,T~].
Proof of Theorem 2.2.1:

We will prove the theorem only for the Gauss-Seidel WR algorithm but, as before, the theorem holds for the Gauss-Jacobi case. Starting from the relation between successive iterates (Eqn. (1.2.8)),

    x'^{k+1}(t) - x'^k(t) =
        -(L_{k+1}(t) + D_{k+1}(t))^{-1} U_{k+1}(t) x'^k(t)
        + (L_k(t) + D_k(t))^{-1} U_k(t) x'^{k-1}(t)
        + (L_{k+1} + D_{k+1})^{-1} f(x^{k+1}, x^k, u)
        - (L_k + D_k)^{-1} f(x^k, x^{k-1}, u).    (1.2.8)

To simplify the notation, let A_k(t) = (L_k(t) + D_k(t))^{-1} U_k(t) and B_k(t) = (L_k(t) + D_k(t))^{-1}, and keep in mind that (L_k(t) + D_k(t))^{-1} U_k(t) and (L_k(t) + D_k(t))^{-1} are functions of x^k, so, by definition, A_k(t) and B_k(t) are as well. Expanding the above equation and integrating,

    x^{k+1}(t) - x^k(t) =
        - Int_0^t A_{k+1}(tau) [ x'^k(tau) - x'^{k-1}(tau) ] dtau
        - Int_0^t [ A_{k+1}(tau) - A_k(tau) ] x'^{k-1}(tau) dtau
        + Int_0^t B_{k+1}(tau) [ f(x^{k+1}(tau), x^k(tau), u(tau)) - f(x^k(tau), x^{k-1}(tau), u(tau)) ] dtau
        + Int_0^t [ B_{k+1}(tau) - B_k(tau) ] f(x^k(tau), x^{k-1}(tau), u(tau)) dtau.

Taking norms, using the Lipschitz continuity of f, A_k(t), and B_k(t), the uniform boundedness of B_k(t) in x (see Theorem 1.2.1), and absorbing the terms involving || x^{k+1}(t) - x^k(t) || on the right into the left-hand side, yields a bound of the form

    || x^{k+1}(t) - x^k(t) || <= gamma || x^k(t) - x^{k-1}(t) ||
        + Int_0^t ( l2 K + k1 M + 2 k2 M + k4 N ) || x^k(tau) - x^{k-1}(tau) || dtau

where l1, l2 are the Lipschitz constants of f with respect to x^{k+1} and x^k respectively; k1, k2, k3, k4 are the Lipschitz constants of A_{k+1}(t) and B_{k+1}(t) with respect to their x^{k+1} and x^k arguments respectively; K is the uniform bound on B_k(t); gamma = max || (L_k + D_k)^{-1} U_k || < 1; and M and N are the a priori bounds on x'^k and f found in the proof of Theorem 1.2.1. Pulling the max (over t) norms outside the integral and integrating yields

    max_{[0,T]} || x^{k+1}(t) - x^k(t) ||
        <= [ gamma + T ( l2 K + k1 M + 2 k2 M + k4 N ) ] max_{[0,T]} || x^k(t) - x^{k-1}(t) ||.    (2.2.7)

Since gamma < 1, the bracketed factor is less than one for all T smaller than some T~ > 0, and on such intervals the WR iteration is a contraction in the uniform norm. ∎
The above theorem demonstrates that the WR algorithm will contract in a uniform norm for any system, provided the interval of time over which the waveforms are computed is made small enough. This suggests that the simulation interval should be broken into windows small enough that the iterations contract uniformly throughout the entire window. The smaller the window is made, the faster the convergence. However, as the window size becomes smaller, the advantages of waveform relaxation are lost. Scheduling overhead increases when the windows become smaller, since each subsystem must be processed at each iteration in every window. And if the windows are made very small, the timesteps chosen to calculate the waveforms will be limited by the window size rather than by the local truncation error.

The lower bound for the region over which WR contracts uniformly given in Theorem 2.2.1 is too conservative in most cases to be of direct practical use. As mentioned above, in order for the WR algorithm to be efficient it is important to pick the largest windows over which the iterations actually contract uniformly, but these are usually far larger than the theoretical estimate. Since it is difficult to compute a priori a reasonable window size to use for a given nonlinear problem, window sizes are usually chosen adaptively, by observing the convergence of the relaxation as it proceeds. Such a scheme does not have to pick the window sizes very accurately. The only cost of a bad window choice is some extra computation, since the basic convergence theorem guarantees the iterations converge regardless of the window size.
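A sketch of such an adaptive window driver is below (the control heuristic and thresholds are our own; run_wr_window stands for any routine that runs WR to convergence on one window and reports its iteration count):

    def simulate(run_wr_window, t_end, w_init, k_fast=3, k_slow=8, w_min=1e-4):
        """Advance window by window; shrink the next window when convergence
        was slow, grow it when convergence was fast."""
        t, w = 0.0, w_init
        while t < t_end:
            w = min(w, t_end - t)
            iters = run_wr_window(t, t + w)   # WR always converges on a window
            print(f"window [{t:.3f}, {t + w:.3f}]: {iters} iterations")
            t += w
            if iters > k_slow:
                w = max(w / 2.0, w_min)       # slow convergence: shrink next window
            elif iters < k_fast:
                w = 2.0 * w                   # fast convergence: grow next window

    # toy stand-in: pretend the iteration count grows with window length
    simulate(lambda t0, t1: max(1, int(40 * (t1 - t0))), t_end=1.0, w_init=0.5)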
SECTION 2.3 - THE WAVEFORM-NEWTON-RELAXATION ALGORITHM

In Section 1.3 we discussed a general class of methods for improving the computational efficiency of WR; one idea was to only approximately solve the iteration equations when the computed x^k's were far from convergence. We proved that if the WR iteration equations are only solved approximately, but the accuracy of the approximation is improved with each iteration, then these methods have convergence properties similar to the canonical WR algorithm. The practical question is then what approximation should be used initially, and by how much should the accuracy be improved with each iteration. In this section we will present a modified WR algorithm that automatically adjusts the accuracy of the computation to how close the iterations are to convergence. The method is an extension to function space of the Newton-relaxation methods used for algebraic systems: the WR iteration equations are not solved exactly, but are solved approximately, by performing one step of a Newton method. Since the accuracy of the one Newton step improves as the x^k's approach the exact solution, the iteration equations will be solved more accurately with each iteration, as Theorem 1.3.1 requires.

Using the waveform-Newton method derived in Section 1.4, and performing one step of this Newton method with each waveform relaxation iteration, yields the following Waveform-Newton-Relaxation (WNR) algorithm:
    forall ( i in {1, ..., n} ) {
        solve
            x_i'^{k+1} = f_i(x^k, u) + (df_i/dx_i)(x^k, u) ( x_i^{k+1} - x_i^k ),
            x_i^{k+1}(0) = x0_i
        for x_i^{k+1}(t) on the window
    }

(written here, as in Section 1.4, for the case C(x, u) = I; the general case carries additional terms involving C and its derivatives). Again, each equation has only one unknown variable, x_i^{k+1}, like Algorithm 1.1.1, but each of the nonlinear equations has been replaced by a simpler time-varying linear problem.
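For concreteness, the following sketch (our own two-variable test problem, again with C(x, u) = I) performs one Implicit-Euler-discretized Newton step per variable in each Gauss-Seidel sweep:

    import numpy as np

    T, h = 2.0, 0.01
    t = np.arange(0.0, T + h, h); N = len(t)

    f1  = lambda x1, x2: -x1**3 + x2         # toy right-hand sides
    f2  = lambda x1, x2: -x2 + 0.5 * x1
    df1 = lambda x1: -3.0 * x1**2            # d f1 / d x1
    df2 = -1.0                               # d f2 / d x2 (constant here)

    x1 = np.ones(N); x2 = np.ones(N)         # initial guess = initial condition
    for k in range(20):
        n1 = np.empty(N); n1[0] = 1.0
        for n in range(1, N):                # one Newton step, linearized at x1
            a = df1(x1[n])
            n1[n] = (n1[n-1] + h * (f1(x1[n], x2[n]) - a * x1[n])) / (1.0 - h * a)
        n2 = np.empty(N); n2[0] = 1.0
        for n in range(1, N):                # uses the new x1, old x2
            n2[n] = (n2[n-1] + h * (f2(n1[n], x2[n]) - df2 * x2[n])) / (1.0 - h * df2)
        delta = max(np.abs(n1 - x1).max(), np.abs(n2 - x2).max())
        x1, x2 = n1, n2
        if delta < 1e-9:
            break
    print(f"WNR stopped after {k + 1} sweeps, last change {delta:.2e}")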
Given the global convergence properties of both the original WR and the WN algorithms, it is not surprising that the WNR algorithm has global convergence properties as well. We will state the convergence theorem, but will not present the proof, because it follows by combining the arguments used for Theorems 1.2.1 and 1.4.1.

Theorem 2.3.1: If, in addition to the assumptions of Eqn. (1.1.3), C(x, u) in R^{n x n} is strictly diagonally dominant uniformly over all x in R^n, and df(x, u)/dx is Lipschitz continuous with respect to x for all u, then the sequence { x^k } generated by the Gauss-Seidel or Gauss-Jacobi WNR algorithm will converge to the solution of Eqn. (1.1.3). ∎
The linear time-varying systems generated by the WNR algorithm are easier to solve numerically than the nonlinear iteration equations of the basic WR algorithm, but the iteration equations could be further simplified if the time-varying Jacobian were held fixed, in analogy with the modified Newton methods for algebraic equations. Freezing the Jacobian will of course destroy the local quadratic convergence of the WN method. But, as an examination of the convergence proof in Section 1.4 indicates, approximating the Jacobian will not destroy the global WN convergence. In addition, the loss of quadratic convergence may not be a significant consideration when the Newton method is used in conjunction with a relaxation method, because the relaxation converges linearly and will usually dominate the overall convergence rate.

The Modified WNR method then converts the basic WR iteration equations to much simpler linear time-invariant equations. Such systems can be solved with a variety of efficient numerical techniques other than the standard multistep methods. Since the problem is linear, Laplace transform techniques could be used (see Section 2.4). Also, it is possible to use methods based on replacing the solution of the differential equations by the evaluation of convolution integrals.
SECTION 2.4 - WR FOR PIECEWISE-LINEAR SYSTEMS

In a circuit simulator, the system to be solved is usually described as a network of nonlinear elements. Often, to reduce computation time, the nonlinear elements are not evaluated exactly, but approximated by a linearly interpolated table of values. Not only does this approach reduce the computation time needed to evaluate the nonlinear elements, but it also converts the original system into a piecewise-linear system. Inside the "bounding box" formed by the table entries, the system is linear. In this section an approach is given for solving piecewise-linear systems using the WR algorithm, in which the iterations are computed using Laplace transforms, so that the iteration waveforms are represented exactly by their poles and residues.

We will start this section by deriving the iteration equations for the WR algorithm applied to a linear differential system. Following that, the steps required to compute the iteration waveforms in the Laplace domain will be described. We will then extend the approach to piecewise-linear systems and introduce a new boundary-crossing algorithm.
Consider first the linear initial value problem

    x' = A x,    x(0) = x0    (2.4.1)

where x(t) in R^n on t in [0,T] and A in R^{n x n}. A is linear, and therefore Lipschitz continuous with respect to x, and the basic WR convergence theorem guarantees that the relaxation converges. With the splitting A = L + D + U, the Gauss-Jacobi WR iteration equations for Eqn. (2.4.1), written in the Laplace domain, are:

    x^{k+1}(s) = (sI - D)^{-1} [ (L + U) x^k(s) + x0 ].    (2.4.2)

Given that x^0(s) is a rational function with real poles (for example, if x^0(t) = x0, then x^0(s) = (1/s) x0), it is easy to compute the x^{k+1}(s) term from x^k(s). If

    x^k(s) = Sum_{i=1}^{M_k} (s - lambda_i)^{-m_i} v^i    (2.4.4)

where v^i in R^n, lambda_i in R, and m_i and M_k are positive integers, then x^{k+1} can be calculated from Eqn. (2.4.2) as a sum of the same form,

    x^{k+1}(s) = Sum_{i=1}^{M_{k+1}} (s - lambda_i)^{-m_i} w^i    (2.4.6)

whose poles are those of x^k(s) together with the diagonal entries d_j of D.
When necessary, the time-domain expression for Eqn. (2.4.6) can be obtained from

    x^{k+1}(t) = Sum_{i=1}^{M_{k+1}} [ t^{m_i - 1} / (m_i - 1)! ] e^{lambda_i t} w^i.    (2.4.7)

As indicated above, the partial fraction expansion of x^{k+1} is computed from the partial fraction expansion of x^k in two simple steps. First, M_k multiplications of the matrix (L + U) by the vectors v^i must be performed. For large systems, the (L + U) matrix is usually sparse, so the number of scalar multiplications required to perform each product is proportional to the number of nonzero terms in each row of the matrix. The next step is to compute the w^i vectors as in Eqn. (2.4.6). This involves performing the partial fraction expansion of terms of the form (s - lambda_i)^{-m_i} (s - d_j)^{-1} and s^{-1} (s - d_j)^{-1}, where d_j is the jth entry of the diagonal matrix D. The partial fraction expansion can be computed by evaluating

    Sum_{i=1}^{M_k} W_i

residues, where W_i is the number of nonzero elements in (L + U) v^i.
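The pole/residue bookkeeping can be prototyped symbolically. The sketch below (our own, assuming the sympy library is available; the 2x2 test matrix is arbitrary) applies the update of Eqn. (2.4.2) and recovers the time-domain waveform of Eqn. (2.4.7) with an inverse transform:

    import sympy as sp

    s = sp.symbols('s')
    t = sp.symbols('t', positive=True)
    A = sp.Matrix([[-2, 1],
                   [ 1, -3]])
    D = sp.diag(*A.diagonal())
    LU = A - D                              # off-diagonal part L + U
    x0 = sp.Matrix([1, 0])

    xk = x0 / s                             # transform of the constant guess x0
    for _ in range(3):
        # Gauss-Jacobi WR sweep in the Laplace domain, Eqn. (2.4.2)
        xk = sp.simplify((s * sp.eye(2) - D).inv() * (LU * xk + x0))

    x1_s = sp.apart(xk[0], s)               # pole/residue (partial fraction) form
    print(x1_s)
    print(sp.inverse_laplace_transform(x1_s, s, t))

Each sweep adds the diagonal entries of A (here -2 and -3) as poles, exactly as the residue count above suggests.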
The complication in extending this approach to piecewise-linear systems is that the solution will cross into many different linear regions. However, the points in time at which the solution passes from one linear region to the next can be thought of as defining beginning and ending points of windows in time (see Section 2.2). Inside each window the problem is linear, with initial conditions specified by the solution's value at the time it crosses the boundary of the region. The algorithm can then proceed as above inside each window, with only the additional work of locating the boundary crossings. We now restrict the form of the piecewise-linear differential system so that we can precisely define the notion of boundary crossings.

Definition 2.4.1: Let R_j, j in {1, ..., r}, be a collection of closed sets with disjoint interiors, and let U = Union_{j=1}^{r} R_j. Let A_j in R^{n x n} and b_j in R^n. Then p: U -> R^n such that p(z) = A_j z + b_j for z in R_j is piecewise-linear.

We assume the regions R_j and the A_j in R^{n x n} and b_j in R^n of p are known, and are such that p(.) is everywhere continuous. The WR algorithm for solving systems of the form x' = p(x) (Eqn. (2.4.8)) using a region-by-region approach (Algorithm 2.4.1) works with the Laplace-domain expressions for the iterations. The computation of the time-domain expression is a relatively expensive operation, so one can
improve the efficiency of Algorithm 2.4.1 by checking convergence only every few iterations. Another method for improving the efficiency is somewhat more subtle. Since the WR algorithm usually converges in a nonuniform manner (see Section 2.2), insisting that the relaxation converge all the way to the end of the interval of interest, only to then toss away the solution after the boundary crossing time, will usually require many wasted iterations. Although it is impossible to predict exactly when the boundary crossings will occur without knowing the exact solution, finding those times for the partially converged solutions can provide good approximations to the boundary crossing times. Since in many cases evaluating the boundary crossing time is very expensive, a fast approximation should be used to provide some reasonable upper bound on the boundary crossing time. This approximation can then be used to shorten the interval over which WR convergence must be assured. Hence, the number of wasted relaxation iterations is reduced.
SECTION 2.5 - APPLICATION TO MOS DIGITAL CIRCUITS

In the previous sections of this paper, we have presented several relatively general techniques for improving the efficiency of WR, along with corresponding examples related to simulating MOS circuits. We chose examples from this area because, as mentioned in the introduction, WR has proved to be an efficient technique for solving the large nonlinear ODE systems that describe MOS digital circuits. This is due, in part, to characteristics of these ODE systems that are exploited by the general properties of the WR algorithm mentioned above. Specifically, these problems are easily broken up into loosely coupled subsystems across which relaxations converge rapidly, and different state variables change at very different rates, so the ability of the WR algorithm to use different timesteps for different subsystems is particularly valuable.

There are also other properties of MOS circuits that can be exploited by the WR algorithm, and they are much more specific to the circuit simulation problem. In order to complete our presentation of the WR algorithm we will discuss some of these techniques, and will end the section with experimental results that demonstrate the practicality and efficiency of the method.
A MOS circuit is described by a system of the form of Eqn. (1.1.3), where v(t) in R^n is the vector of node voltages, C: R^n x R^r -> R^{n x n} is the matrix of node capacitances, and f: R^n x R^r -> R^n is a vector function of the voltages (the currents). For most circuits of practical interest, C(v, u) is strictly diagonally dominant, and therefore C(v, u)^{-1} exists and is uniformly bounded with respect to v, u; and f is globally Lipschitz continuous with respect to v.
To see why the ordering of the equations matters, consider first the algebraic problem of solving a linear system of the form

    A x = b    (2.5.1)

where A in R^{n x n} is lower triangular, and let L, D, and U be the strictly lower triangular, diagonal, and strictly upper triangular matrices, respectively, such that A = L + D + U. If the classical Gauss-Seidel relaxation algorithm is used to solve the above system, the iteration equation is

    (L + D) x^{k+1} + U x^k = b.    (2.5.2)

Taking the difference between iterations k+1 and k yields the following relation:

    x^{k+1} - x^k = -(L + D)^{-1} U ( x^k - x^{k-1} ).

As in the contraction mapping theorem, the relaxation converges if there exists some induced norm on R^n in which || (L + D)^{-1} U || < 1. Since A is lower triangular, U = 0, and the relaxation converges in one iteration, because the above norm is zero. Of course, the order of the single equations represented by the rows of A in Eqn. (2.5.1) is not unique. One could reverse-order the equations of the system, so that b_i becomes b_{n+1-i} and A becomes upper triangular. If the Gauss-Seidel relaxation algorithm is used to solve this new problem, the above norm will no longer be zero, and the relaxation will not converge in one iteration. In fact, it may not converge at all.
Although the above is an extreme example, it does indicate that if the Gauss-Seidel relaxation algorithm is used, it is possible to reorder the system so that the relaxation converges more rapidly. In particular, a reordering should attempt to move as many of the large elements of the matrix as possible into the lower triangular portion.
The Gauss-Seidel WR algorithm shares this property with the algebraic relaxation scheme. That is, the Gauss-Seidel WR algorithm will converge more rapidly if the ODE system can be made lower triangular. In the case of MOS digital circuits the function f(v, u) that represents the currents in the circuit can be made mostly lower triangular by a careful reordering of the equations*. This is because the MOS transistor is a highly directional device. The transistor currents at the drain and source terminals are a strong function of the voltage at the gate terminal, but the gate current is almost unaffected by the drain and source voltages. Therefore, if the differential equations of the circuit can be ordered so that the equation that is solved to determine the voltage at the gate of a transistor is placed before the equations that are solved to determine the voltages at the drain and source of that transistor, then f(v, u) will be mostly lower triangular.

Feedback loops make it impossible to avoid having some transistor's drain or source voltage equation precede its gate voltage equation. In these cases, convergence is still improved if the equations are ordered so that f(v, u) is as lower triangular as possible.
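The reordering can be sketched as a greedy topological pass over a "drives" graph in which each gate node points at the drain and source nodes of its transistors (the heuristic below is our own; a production ordering would also weigh coupling strengths):

    def order_nodes(n, drives):
        """drives[i] lists the nodes whose equations should come AFTER node i."""
        preds = [set() for _ in range(n)]
        for i, outs in drives.items():
            for j in outs:
                preds[j].add(i)
        remaining, order = set(range(n)), []
        while remaining:
            # prefer a node none of whose drivers is still unplaced;
            # inside a feedback loop, fall back to the least-violating node
            best = min(remaining, key=lambda j: len(preds[j] & remaining))
            order.append(best)
            remaining.discard(best)
        return order

    # inverter chain 0 -> 1 -> 2 with feedback 2 -> 0
    print(order_nodes(3, {0: [1], 1: [2], 2: [0]}))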
One practical difficulty of the WR algorithm, compared with direct methods, is that the entire waveform for every node in the system must be stored during the iteration process. For systems with many nodes and long waveforms, the required data storage may exceed a computer's available memory. Breaking the simulation interval into "windows" (Section 2.2) reduces the storage required for each of the individual waveforms, but very large systems will still require extra storage. One approach to solving this problem without changing the WR algorithm is to store the waveforms on a mass storage medium (e.g., a magnetic disk). Then, since only a few waveforms are used at any one time, those waveforms can be moved into a computer's memory, and then moved back out when no longer needed (in much the same way as virtual memory systems page a large address space). A second approach, which does change the WR algorithm, is to introduce a truncation approximation, which will work well for certain classes of circuits.

*When we say we wish f(v, u) to be mostly lower triangular, we mean that we would like the terms df_i/dv_j that are large for any v_j to be in the lower triangular portion.
Consider the inverter chain example in Fig. 5a. The input to the first inverter and its output, for the cases of chains with two, three and four inverters, are plotted in Fig. 5b. As the plots indicate, the impact of additional inverters on the output of the first inverter diminishes as inverters are added. This suggests that the output of each inverter in a long chain could be computed accurately by considering only a few inverters at a time, effectively truncating the computation. For example, compute the output of the first inverter by solving a reduced system which ignores all the inverters after the fifth; then compute the output of the second inverter, using the computed output from the first inverter, by solving a reduced system ignoring all the inverters after the sixth; and continue in this fashion until all the inverter outputs have been computed. The advantage of this approach is that at any point in the procedure, only the waveforms of six inverters are needed. And, as the above simple example indicates, the waveforms computed this way can still be accurate.

Of course this algorithm will only produce accurate results if the system of equations is almost unidirectional, like the inverter chain, and the truncation algorithm follows this direction. Combinational circuits, a large class of MOS digital integrated circuits, do share the mostly unidirectional property of the inverter chain. And since these circuits can be quite large (several thousand nodes), using the truncation approach can yield substantial storage and computation savings.
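The sketch below exercises the truncation idea on a stand-in linear chain (the model, coupling values, and six-node window are ours, chosen only to mimic a mostly unidirectional inverter chain); each node's waveform is computed from a reduced system containing that node and the next five, and the result is compared against solving the whole chain at once:

    import numpy as np

    n, a, eps = 12, 1.5, 0.05        # chain length, forward gain, weak backward coupling
    T, h = 4.0, 0.01
    t = np.arange(0.0, T + h, h); N = len(t)
    u = np.sin(t)                    # input driving node 0

    def solve_chain(first, last, drive):
        """Implicit-Euler solve of nodes first..last of
        x_i' = -x_i - a*x_{i-1} + eps*x_{i+1}, with 'drive' feeding node
        'first' and the backward term from node last+1 truncated to zero."""
        m = last - first + 1
        X = np.zeros((m, N))
        A = (np.diag(-np.ones(m)) + np.diag(-a * np.ones(m - 1), -1)
             + np.diag(eps * np.ones(m - 1), 1))
        Minv = np.linalg.inv(np.eye(m) - h * A)
        for k in range(1, N):
            b = X[:, k-1].copy()
            b[0] += -h * a * drive[k]
            X[:, k] = Minv @ b
        return X

    full = solve_chain(0, n - 1, u)          # reference: whole chain at once
    W = 5
    x_prev, err = u, 0.0
    for i in range(n):
        red = solve_chain(i, min(i + W, n - 1), x_prev)
        err = max(err, np.abs(red[0] - full[i]).max())
        x_prev = red[0]                       # feed the next node's reduced solve
    print(f"max truncation error over all nodes: {err:.2e}")

Because the backward coupling eps is weak, the truncation error is attenuated through several links before it can reach the node being computed.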
If the WR algorithm is used to compute the time-domain behavior of very large circuits, it is often the case that some pieces of the circuit will converge much more rapidly than others. The overall efficiency of the WR method can be improved if the waveforms that have already converged are not recomputed at every subsequent iteration. This can be accomplished by a simple modification to Algorithm 1.1.1. Before giving the exact algorithm we present the following useful definition.

Definition 2.5.1: Let

    Sum_j C_ij(v, u(t)) v_j' = f_i(v, u(t))

be the ith equation of the circuit system. We say v_j(t) is an input to this equation if there exist some alpha in R, t, and y, z in R^n such that

    Sum_j C_ij(z, u(t)) y_j  is not equal to  Sum_j C_ij(z + alpha e_j, u(t)) y_j,
    or  f_i(z, u(t))  is not equal to  f_i(z + alpha e_j, u(t)),

where e_j is the jth unit vector. The input set for the ith equation is the set of j in {1, ..., n} such that v_j(t) is an input.

The WR algorithm is then modified slightly using this notion of the set of inputs to a given ODE: in each iteration, the ith equation is re-solved only if its own waveform has not yet converged, or the waveform of some member of its input set changed during the previous iteration; otherwise the previous waveform is retained.
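A sketch of this bookkeeping is below (ours; solve stands for re-solving one subsystem's equations over the window and returning how much its waveform changed):

    def wr_with_skipping(subsystems, inputs, solve, tol, max_iter=100):
        """subsystems: list of ids; inputs[i]: ids whose waveforms feed i;
        solve(i) re-solves subsystem i, returning the max waveform change."""
        changed = set(subsystems)
        for k in range(max_iter):
            changed_now = set()
            for i in subsystems:
                if i in changed or any(j in changed for j in inputs[i]):
                    if solve(i) > tol:
                        changed_now.add(i)
            if not changed_now:
                return k + 1                   # all waveforms have converged
            changed = changed_now
        return max_iter

    # toy stand-in: each re-solve shrinks the subsystem's change by 0.3
    state = {0: 1.0, 1: 1.0, 2: 1.0}
    def fake_solve(i):
        state[i] *= 0.3
        return state[i]
    print(wr_with_skipping([0, 1, 2], {0: [], 1: [0], 2: [1]}, fake_solve, 1e-3))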
The degree to which the WR algorithm improves circuit simulation efficiency can be traced to two properties of a circuit. The first, mentioned before, is the difference in the rates of change of voltages in the system, as this determines how much is gained by solving different subsystems with different timesteps. The second is the amount of coupling between the subsystems. If the subsystems are tightly coupled, then many relaxation iterations will be required to achieve convergence, and the advantage gained by solving each subsystem with its own timestep will be lost. To show this interaction for a practical example, we will use the Relax2.2 [13] program to compare the computation time required to simulate a 141-node CMOS memory circuit using standard direct methods and using the WR algorithm. In order to demonstrate the effect of tighter coupling, the CMOS memory circuit will be simulated using several values of a parameter XQC, which is the percentage of the gate oxide capacitance assigned as gate-to-drain and gate-to-source coupling capacitance.
The results in Table 3 are exactly as expected. As the coupling increases, the number of WR iterations required increases, and the difference in simulation time between the direct method and WR narrows.

It is possible to verify for this example our claim about the nature of the efficiencies of using WR. Consider the number of timepoints computed by the direct method versus the number of timepoints computed by the WR method in the final iteration. By comparing these two numbers, a bound can be put on the maximum speed increase that can be achieved by solving different subsystems using different timesteps. (Note that we consider only the number of timepoints computed by the WR method in the final iteration, because we are only interested in the number of timepoints needed to represent the converged waveforms accurately.)

The total number of timepoints computed for each of the simulation cases of the memory circuit example is also given in Table 3. This number is the sum of the computed timepoints over all the waveforms in the circuit. If most of the efficiency of a decomposition method stems from solving each of the subsystems with its own timestep, then the maximum improvement that could be gained from a decomposition integration method would be the ratio of the number of timepoints computed using the direct method to the number of timepoints computed in the final WR iteration. As can be seen from Table 3, for the CMOS memory example this ratio is approximately six. In order to estimate the actual efficiency of the WR method, this ratio must be divided by the average number of relaxation iterations, since at each iteration the set of timepoints is recomputed. Then, if our claims above are correct, the result should be approximately the measured speedup.
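For instance (with illustrative numbers, not the measured ones from Table 3): if the direct method computes six times as many timepoints as the final WR iteration, and the relaxation takes an average of three iterations to converge, the predicted net speedup from multirate decomposition is roughly 6/3 = 2.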
This estimate ignores one additional advantage of relaxation methods: that they avoid large matrix solutions. This is a reasonable simplification for the above example, because the matrix operations account for only a small percentage of the computation, even when direct methods are used. However, for much larger problems, on the order of several thousand nodes, the time to perform the large matrix solutions becomes significant, and the WR-based methods should compare even more favorably because they avoid these large matrix solutions.

Finally, in Table 4, we present several industrial circuits that have been simulated using both direct methods and WR.

Table 4 - CPU Time for Direct Methods vs WR for Several Industrial Circuits
CONCLUSIONS

In this paper several of the WR algorithms that have been proposed in the literature have been analyzed from both a theoretical and a practical point of view. We have, however, treated several of these aspects too lightly. In particular, research is needed on the convergence properties of WR algorithms when different timesteps are used for different subsystems. In addition, theoretical and practical work needs to be continued on breaking large systems into smaller subsystems in such a way that relaxation algorithms converge rapidly. Finally, since the WR algorithm has a tremendous amount of inherent parallelism, its application to solving problems using parallel processors is also an important research direction.

ACKNOWLEDGEMENTS

The authors would like to thank R. Saleh and the other members of the CAD research group at Berkeley. In addition, the authors thank the members of the IBM Yorktown research group for many valuable discussions and suggestions. This research has been sponsored in part by DARPA under contract NESC-N39, and in part by the Air Force Office of Scientific Research (AFSC), United States Air Force. The United States Government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation hereon.
REFERENCES
[1] C. William Gear, Numerical Initial Value Problems in Ordinary Differential Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1971.

[2] L. W. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor Circuits," Memorandum ERL-M520, University of California, Berkeley, May 1975.

[3] K. Sakallah and S. W. Director, "An activity-directed circuit simulation algorithm," Proc. IEEE Int. Conf. on Circ. and Computers, October 1980, pp. 1032-1035.

[5] B. R. Chawla, H. K. Gummel, and P. Kozak, "MOTIS - an MOS timing simulator," IEEE Trans. Circuits and Systems, Vol. 22, pp. 901-909, 1975.

[7] W. N. Keepin, "Multirate Integration of Two Time-Scale Systems," Ph.D. Dissertation, University of Arizona, 1980.

[8] C. William Gear, "Automatic Multirate Methods for Ordinary Differential Equations."

[9B] E. Lelarasmee, "The waveform relaxation method for the time domain analysis of large scale MOS integrated circuits," Proc. 19th Design Automation Conference, Las Vegas, Nevada, 1982.

[11] P. Defebve, J. Beetem, W. Donath, H. Y. Hsieh, F. Odeh, A. E. Ruehli, P. K. Wolff, Sr., and J. White, "A Large-Scale Mosfet Circuit Analyzer Based on Waveform Relaxation," International Conference on Computer Design, Rye, New York, October 1984.

[12] C. H. Carlin and A. Vachoux, "On Partitioning for Waveform Relaxation Time-Domain Analysis of VLSI Circuits," Proc. 1984 Int. Symp. on Circ. and Syst., Montreal, Canada, May 1984.

[15] W. M. G. van Bokhoven, "An Activity Controlled Modified Waveform Relaxation Method," 1983 Conf. Proc. IEEE ISCAS, Newport Beach, CA, May 1983.

"... Analysis," Proc. 1984 Int. Custom Integrated Circuits Conference, Rochester, New York, June 1984.

[19] W. K. Chia, T. N. Trick, and I. N. Hajj, "Stability and Convergence Properties of Relaxation Methods for Hierarchical Simulation of VLSI Circuits," Proc. 1984 Int. Symp. on Circ. and Syst., 1984.

[20] F. Odeh and D. Zein, "A Semidirect Method for Modular Circuits," 1983 Conf. Proc. IEEE ISCAS, 1983.

[22] J. K. Hale, Ordinary Differential Equations, John Wiley and Sons, Inc., 1969.

[23] H. A. Antosiewicz, "Newton's method and boundary value problems," Journal of Computer and System Sciences, Vol. 2, 1968.

"... Relaxation Scheme," Proc. 1983 Int. Conference on Computer Design, Port Chester, New York, 1983.

"... to Gauss-Seidel Iterations," SIAM J. Numer. Anal., Vol. 9, No. 2, June 1972.
List of Figures

Fig. 1: Two-node inverter circuit used for the discretized WR example of Eqn. (1.5.3).
Fig. 2a: Two-inverter chain separated by a resistor. Fig. 2b: input u(t), exact solution, and WR iteration waveforms (iterations 1, 5 and 10) for x2.
Fig. 3a: Cross-coupled nor gate circuit. Fig. 3b: input u(t), exact solution, and WR iteration waveforms (iterations 5, 10 and 20) for x1(t).
Fig. 4a: Weakly coupled example circuit. Fig. 4b: input u(t), exact solution, and WR iteration waveforms (iterations 1, 2 and 3) for x1(t).
Fig. 5a: Inverter chain example. Fig. 5b: input and first-inverter output v(t) for chains of two, three and four inverters.