A parallel algorithm for the solution of partial differential equations on clustered workstations
Feras A. Mahmoud, Mohammad H. Al-Towaiq
Jordan University of Science and Technology, Department of Mathematics and Statistics, P.O. Box 3030, Irbid 22110, Jordan
Abstract
In this paper we propose a parallel algorithm for the solution of partial differential equations over a rectangular domain using the Crank-Nicholson method in cooperation with the DuFort-Frankel method, and apply it to a model problem, namely, the heat conduction equation. One of the well known parallel techniques for solving partial differential equations in a cluster computing environment is the domain decomposition technique. Using this technique, the whole domain is decomposed into subdomains, each of which has its own boundaries, called the interface points. Parallelization is realized by approximating the interface values using the unconditionally stable DuFort-Frankel explicit scheme; these values then serve as Neumann boundary conditions for the Crank-Nicholson implicit scheme in the subdomains. The numerical results show that our algorithm is more accurate than the algorithm based on the forward explicit method for approximating the values of the interface points, especially when a small number of time steps is used. Moreover, the numerical results show that increasing the number of processors used in the cluster yields an increase in the algorithm's speedup.
© 2007 Published by Elsevier Inc.
Keywords: Heat conduction equation; Parallel computing; Domain decomposition; Crank-Nicholson; DuFort-Frankel
1. Introduction
The mathematical formulation of most problems in science and engineering involving rates of change with respect to two or more independent variables, usually representing time, length or angle, leads either to a partial differential equation (PDE) or to a set of such equations. Among the numerical approximation methods for solving PDEs, those employing finite differences are more frequently used and more universally applicable than any others. These numerical methods often require a large number of computations, which motivates us to explore parallel methods for solving PDEs.
Finite difference solutions for PDEs can be found either explicitly or implicitly. The explicit method is easy to implement on parallel computers, but it has severe stability conditions; that is, in order to attain reasonable accuracy, the space step must be small, which necessarily forces the time step to be small too. The implicit method does not have these stability conditions, but instead a global linear system of equations needs to be solved at each time step, which is not easy to implement in parallel.

Applied Mathematics and Computation 200 (2008) 178-188. doi:10.1016/j.amc.2007.11.013
* Corresponding author. E-mail addresses: [email protected] (F.A. Mahmoud), [email protected] (M.H. Al-Towaiq).
Domain decomposition is a method widely used for solving time-dependent PDEs and a powerful tool for devising parallel PDE methods. A conventional approach [1] to parallelizing the implicit scheme is to apply domain decomposition based on preconditioning methods to the problem arising from the semidiscretization at each time step. In [2] it is proved that the preconditioned system is well conditioned when the time step is small; nevertheless, a small step size is not always desirable in situations where implicit schemes become necessary. If the original domain is decomposed into a set of non-overlapping subdomains, then the PDEs defined in different subdomains can be solved on different processors concurrently. This often requires numerical boundary conditions at the interface points between subdomains. Since these interface points are not part of the original model of the problem, we have to generate them numerically. One way to generate these numerical boundary conditions is to use the solutions from the previous time step to calculate the solution at the next time step. This is often referred to as time lagging [3]. A modified approximation scheme of mixed type was proposed by Kuznetsov [4], where the standard second order implicit scheme is used inside each subdomain, while the explicit Euler scheme is applied to obtain the interface values on the new time level. Once the interface values are available, the global problem is fully decoupled and can thus be computed in parallel. In [5] Dawson proposed a similar hybrid scheme, where instead of using the same spacing as for the interior points, where the implicit scheme is applied, a larger spacing is used at each interface point, where the explicit scheme is applied. In [1] Du, Mu and Wu proposed two new parallel finite difference methods for parabolic PDEs, focusing on a one-dimensional heat equation on the spatial interval [0, 1] as an example. For the computation on the subdomain interfaces, the first method uses a high-order scheme, while the second uses a multistep explicit scheme. They studied the stability and error analysis of the two new schemes, and addressed the parallel efficiency of these schemes. In [6] Zhang and Wan presented some new techniques for designing finite difference domain decomposition algorithms for the heat equation. The basic idea is to define the finite difference schemes at the interface grid points with smaller time steps by Saul'yev's asymmetric schemes.
In this paper, we propose a parallel finite difference scheme for solving PDEs. For simplicity, we consider as a model the heat conduction equation

∂u/∂t = s² ∂²u/∂x².   (1)

The parallel difference scheme is based on both the Crank-Nicholson (CN) implicit and DuFort-Frankel (DF) explicit schemes. In this procedure, the values of the interface points of each subdomain are calculated using the DF explicit scheme, and these values then serve as Neumann boundary conditions for the CN implicit scheme in the subdomains. The rest of the paper is organized as follows. In Section 2, we present a detailed description of the proposed algorithm. The stability of our parallel algorithm is given in Section 3. Numerical results and performance analysis are presented in Section 4. Finally, we conclude this paper in Section 5.
2. DF-CN parallel algorithm

Consider the heat conduction equation

∂u/∂t = s² ∂²u/∂x²   for 0 < x < ℓ and 0 < t < T,   (2)

with initial condition

u(x, 0) = f(x)   for 0 ≤ x ≤ ℓ,

and boundary conditions

u(0, t) = c₁,   u(ℓ, t) = c₂   for 0 ≤ t ≤ T.
We partition the space into n subintervals and the time into m subintervals, and we define the space step h = ℓ/n and the time step k = T/m. Therefore, the domain is discretized uniformly. For designing the parallel algorithm we begin by choosing primitive tasks, identifying data communication patterns among them, and looking for ways to agglomerate tasks.

Using the domain decomposition technique, the n − 1 interior points of the space are divided fairly among p processors; that is, p divides the number of space subintervals n. Denote the grid points by u(x_i, t_j) = u(ih, jk) = u_{i,j}, where i = 0, 1, …, n and j = 1, 2, …, m. The interface points then correspond to i = n/p, 2n/p, …, (p − 1)n/p, and the boundary points correspond to i = 0 and i = n. Each processor is responsible for computing n/p + 1 points, where each two neighboring processors share only one interface point at each time step, and each of them computes this interface point concurrently.
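The index bookkeeping above can be sketched in a few lines of pure Python (the helper names are ours, not from the paper; it assumes, as in the text, that p divides n):

```python
# Sketch of the grid partitioning described above. Processor q owns grid
# points q*(n/p) .. (q+1)*(n/p), so neighbors share exactly one interface point.
def partition(n, p):
    """Return (interface_indices, per-processor point blocks)."""
    assert n % p == 0, "p must divide the number of space subintervals n"
    w = n // p                                          # subintervals per processor
    interfaces = [q * w for q in range(1, p)]           # i = n/p, 2n/p, ..., (p-1)n/p
    blocks = [list(range(q * w, (q + 1) * w + 1)) for q in range(p)]  # n/p + 1 points each
    return interfaces, blocks

interfaces, blocks = partition(12, 4)
print(interfaces)                 # [3, 6, 9]
print([len(b) for b in blocks])   # [4, 4, 4, 4]  -> n/p + 1 points per processor
```

Note that adjacent blocks overlap in exactly one index, which is the shared interface point computed concurrently by both neighbors.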
Let

r = s²k/h²;

then at any time step j = 1, 2, …, m, using the approximation w_{i,j} for u_{i,j} in (2), the DF explicit scheme

(1 + 2r) w_{i,j+1} = 2r (w_{i+1,j} + w_{i−1,j}) + (1 − 2r) w_{i,j−1}   (3)

is applied at the interface points, whereas the CN implicit scheme

(2 + 2r) w_{i,j+1} − r (w_{i+1,j+1} + w_{i−1,j+1}) = (2 − 2r) w_{i,j} + r (w_{i+1,j} + w_{i−1,j})   (4)

is used to compute the interior points of each subdomain.
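As a quick consistency check on formulas (3) and (4), the following pure-Python sketch (ours, not the authors' code) plugs a known exact solution of u_t = s²u_xx into both difference formulas; the residuals should be small for small h and k. The step sizes and sample point are illustrative.

```python
import math

# Plug u(x,t) = exp(-s^2 pi^2 t) sin(pi x), an exact solution of
# u_t = s^2 u_xx, into formulas (3) and (4) and measure the residuals.
s = 1.0
h, k = 0.05, 0.001
r = s * s * k / (h * h)

def u(x, t):
    return math.exp(-s * s * math.pi ** 2 * t) * math.sin(math.pi * x)

x, t = 0.3, 0.1

# DuFort-Frankel (3): (1+2r) w_{i,j+1} = 2r (w_{i+1,j} + w_{i-1,j}) + (1-2r) w_{i,j-1}
res_df = ((1 + 2 * r) * u(x, t + k)
          - 2 * r * (u(x + h, t) + u(x - h, t))
          - (1 - 2 * r) * u(x, t - k))

# Crank-Nicholson (4):
# (2+2r) w_{i,j+1} - r (w_{i+1,j+1} + w_{i-1,j+1}) = (2-2r) w_{i,j} + r (w_{i+1,j} + w_{i-1,j})
res_cn = ((2 + 2 * r) * u(x, t + k)
          - r * (u(x + h, t + k) + u(x - h, t + k))
          - (2 - 2 * r) * u(x, t)
          - r * (u(x + h, t) + u(x - h, t)))

print(abs(res_df), abs(res_cn))  # both residuals are small
```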
Fig. 1 depicts the communications needed to compute the solution at time j + 1 given the solutions at times j and j − 1. Processor q is responsible for computing w_{i,j+1} implicitly using the CN difference scheme (4). It can compute the values of the gray cells (interior points) without any communication. However, it cannot compute the values of these gray cells until it has computed the values of the black cells (interface points). Processor q can compute the values of the black cells explicitly, using the DF scheme (3), only if it gets values from the neighboring processors. Fig. 1b shows how processor q exchanges values with the neighboring processors q − 1 and q + 1. After these values are received, the black cells can be computed. The parallel program allocates two extra points for processor q at each time step (the dotted cells); the values received from the neighboring processors are stored in these memory locations, which are called ghost points. During the iteration that computes row j + 1, each processor sends each of its neighbors the appropriate border values from row j and receives its neighbors' row j border values in turn. After the values have been received into the ghost points, every processor can compute all of its row j + 1 interior values using the CN scheme (4).
Fig. 1. Ghost points simplify parallel finite difference programs. (a) When computing row j + 1, processor q has the data values it needs to fill in the gray cells, but it needs values from neighboring processors to fill in the black cells. (b) Every processor sends its edge values to its neighbors. Every processor receives incoming values into ghost points.

3. Stability of the DF-CN algorithm

The DF-CN parallel algorithm is stable for all values of r > 0 if and only if both the DF explicit scheme and the CN implicit scheme are stable for all values of r > 0. When only one processor is used, the fully CN implicit scheme is applied and the algorithm is unconditionally stable [7]. Using two or more processors in the DF-CN algorithm leads us to approximate the values of the interface points by the DF explicit scheme; these interface values then serve as Neumann boundary conditions for the CN implicit scheme in the subdomains. So, we have to prove that the CN implicit scheme is unconditionally stable when boundary conditions of Neumann type are used for the heat conduction equation (2).
3.1. Neumann boundary conditions

Consider a thin rod of length ℓ that is thermally insulated along its length and radiates heat from the end x = 0. Then the boundary condition at x = 0 is given by

∂u/∂x = g₁ (u(0, t) − v₁).

A negative sign is associated with the normal derivative here because the outward normal to the rod at this end is in the negative direction of the x-axis. On the other hand, the boundary condition at x = ℓ is given by

∂u/∂x = −g₂ (u(ℓ, t) − v₂),

with a positive sign in the normal derivative because the outward normal to the rod at this end is in the same direction as the x-axis. Note that g₁, g₂, v₁, v₂ are constants, in which g₁ and g₂ are nonnegative. Hence, instead of the boundary conditions in (2), the boundary conditions have the form

∂u(0, t)/∂x = g₁ (u(0, t) − v₁),   ∂u(ℓ, t)/∂x = −g₂ (u(ℓ, t) − v₂)   for 0 ≤ t ≤ T.
If we wish to represent ∂u/∂x more accurately at x = 0 and x = ℓ by the central difference formula

∂u(x, t)/∂x ≈ [u(x + h, t) − u(x − h, t)] / (2h),   (5)

it is necessary to introduce the fictitious temperatures w_{−1,j} and w_{n+1,j} at the external mesh points (−h, jk) and (ℓ + h, jk), respectively, by imagining the rod to be extended very slightly. Then the boundary conditions can be represented by

w_{−1,j} = w_{1,j} − 2hg₁ (w_{0,j} − v₁),   (6)
w_{n+1,j} = w_{n−1,j} − 2hg₂ (w_{n,j} − v₂).   (7)
The temperatures w_{−1,j} and w_{n+1,j} are unknown, which necessitates another two equations. Specifically, for the CN formula at i = 0 and i = n, we have

(2 + 2r) w_{0,j+1} − r (w_{1,j+1} + w_{−1,j+1}) = (2 − 2r) w_{0,j} + r (w_{1,j} + w_{−1,j}),   (8)

and

(2 + 2r) w_{n,j+1} − r (w_{n+1,j+1} + w_{n−1,j+1}) = (2 − 2r) w_{n,j} + r (w_{n+1,j} + w_{n−1,j}).   (9)
By substituting (6) and (7) into (8) and (9), respectively, the resulting formulae are

(2 + 2r(1 + hg₁)) w_{0,j+1} − 2r w_{1,j+1} = (2 − 2r(1 + hg₁)) w_{0,j} + 2r w_{1,j} + 4rhg₁v₁,   (10)

and

(2 + 2r(1 + hg₂)) w_{n,j+1} − 2r w_{n−1,j+1} = (2 − 2r(1 + hg₂)) w_{n,j} + 2r w_{n−1,j} + 4rhg₂v₂.   (11)
At each time step we have to solve an (n + 1) × (n + 1) tridiagonal system of linear equations, using the Thomas algorithm [8], which is represented in matrix form as follows:

A w^{j+1} = B w^j + c,   for each j = 0, 1, 2, …,   (12)

where

w^j = (w_{0,j}, w_{1,j}, …, w_{n,j})^t,

and the matrices A and B and the vector c are given by

A =
⎡ 2 + 2(1 + hg₁)r   −2r                                  ⎤
⎢ −r                2 + 2r   −r                          ⎥
⎢         ⋱            ⋱         ⋱                       ⎥
⎢                   −r       2 + 2r   −r                 ⎥
⎣                            −2r      2 + 2(1 + hg₂)r    ⎦,

B =
⎡ 2 − 2(1 + hg₁)r   2r                                   ⎤
⎢ r                 2 − 2r   r                           ⎥
⎢         ⋱            ⋱         ⋱                       ⎥
⎢                   r        2 − 2r   r                  ⎥
⎣                            2r       2 − 2(1 + hg₂)r    ⎦,

and

c = (4hv₁g₁r, 0, …, 0, 4hv₂g₂r)^t.
As g₁, g₂, h and r are all nonnegative, the matrix A in (12) is strictly diagonally dominant [9]. Since a strictly diagonally dominant matrix is nonsingular [9], there is a unique solution to the tridiagonal linear system (12), given by

w^{j+1} = A^{−1}B w^j + A^{−1}c.   (13)
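Since each time step of (12) is a tridiagonal solve, the Thomas algorithm performs it in O(n) operations. A minimal pure-Python sketch of one step (the helper names are ours; it assumes the diagonals of A and B and the vector c exactly as written above):

```python
# One time step of (12): form rhs = B w^j + c, then solve A w^{j+1} = rhs
# with the Thomas algorithm (forward elimination + back substitution).
def step(w, n, r, h, g1, g2, v1, v2):
    # Diagonals of A; the first and last rows are the Neumann-modified rows (10), (11).
    a  = [0.0] + [-r] * (n - 1) + [-2 * r]                 # subdiagonal (a[0] unused)
    b  = ([2 + 2 * (1 + h * g1) * r] + [2 + 2 * r] * (n - 1)
          + [2 + 2 * (1 + h * g2) * r])                    # main diagonal
    cc = [-2 * r] + [-r] * (n - 1) + [0.0]                 # superdiagonal (cc[n] unused)
    # Right-hand side B w + c, row by row.
    rhs = [0.0] * (n + 1)
    rhs[0] = (2 - 2 * (1 + h * g1) * r) * w[0] + 2 * r * w[1] + 4 * h * v1 * g1 * r
    for i in range(1, n):
        rhs[i] = r * w[i - 1] + (2 - 2 * r) * w[i] + r * w[i + 1]
    rhs[n] = 2 * r * w[n - 1] + (2 - 2 * (1 + h * g2) * r) * w[n] + 4 * h * v2 * g2 * r
    # Thomas algorithm.
    for i in range(1, n + 1):
        m = a[i] / b[i - 1]
        b[i] -= m * cc[i - 1]
        rhs[i] -= m * rhs[i - 1]
    x = [0.0] * (n + 1)
    x[n] = rhs[n] / b[n]
    for i in range(n - 1, -1, -1):
        x[i] = (rhs[i] - cc[i] * x[i + 1]) / b[i]
    return x

# Example call with arbitrary illustrative parameters.
print(step([0.0] * 9, 8, 1.0, 0.1, 1.0, 2.0, 0.5, 0.25))
```

The solve is well defined because, as noted above, A is strictly diagonally dominant, which also makes the Thomas algorithm stable without pivoting.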
To examine the stability of the CN difference scheme (12), let us assume that an error

e^0 = (e^0_0, e^0_1, …, e^0_n)^t

is made in representing the initial data

w^0 = (w_{0,0}, w_{1,0}, …, w_{n,0})^t.

So the initial vector is actually w^0 + e^0, and hence

w^1 = A^{−1}B (w^0 + e^0) + A^{−1}c = A^{−1}B w^0 + A^{−1}c + A^{−1}B e^0,
w^2 = A^{−1}B w^1 + A^{−1}c = (A^{−1}B)² w^0 + A^{−1}B A^{−1}c + A^{−1}c + (A^{−1}B)² e^0,
  ⋮
w^k = A^{−1}B w^{k−1} + A^{−1}c = (A^{−1}B)^k w^0 + Σ_{i=0}^{k−1} (A^{−1}B)^i A^{−1}c + (A^{−1}B)^k e^0.

Hence, at the kth time step, the error in w^k due to e^0 is (A^{−1}B)^k e^0. In order for this error not to be magnified in the successive steps, we want

‖(A^{−1}B)^k e^0‖ ≤ ‖e^0‖

for all values of k. Therefore, we must have

‖(A^{−1}B)^k‖ ≤ 1,
which requires that

ρ((A^{−1}B)^k) = [ρ(A^{−1}B)]^k ≤ 1,

where ρ(·) denotes the spectral radius of a matrix [9]. The CN difference scheme (12) is therefore stable only when the modulus of every eigenvalue of the matrix A^{−1}B does not exceed one. Since the matrix B can be written as B = 4I − A, we have

A^{−1}B = 4A^{−1} − I.

Therefore, the method is stable only when

|4/μ − 1| ≤ 1,

where μ is an eigenvalue of A. This is equivalent to μ ≥ 2. Since g₁, g₂, h and r are all nonnegative, an application of Gerschgorin's circle theorem [9] to the matrix A in (12) shows that all its eigenvalues are at least 2 for any value of r ≥ 0. Hence, the CN difference scheme (12) is unconditionally stable. Note that the local truncation error of the CN implicit scheme is O(k² + h²); see [9].
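The Gerschgorin step can be illustrated numerically. The sketch below (ours, with illustrative parameter values) computes the left endpoint of every Gerschgorin disc of A; since all of them are at least 2, every eigenvalue μ of A satisfies μ ≥ 2 and hence |4/μ − 1| ≤ 1.

```python
# Left endpoints (center minus radius) of the Gerschgorin discs of the
# (n+1)x(n+1) matrix A in (12); the interior rows give exactly 2 + 2r - 2r = 2.
def gerschgorin_lower_bounds(n, r, h, g1, g2):
    rows = [(2 + 2 * (1 + h * g1) * r, 2 * r)]      # row 0: (center, radius)
    rows += [(2 + 2 * r, 2 * r)] * (n - 1)          # interior rows
    rows.append((2 + 2 * (1 + h * g2) * r, 2 * r))  # row n
    return [center - radius for center, radius in rows]

for r in (0.1, 1.0, 10.0, 73.863):   # 73.863 is the value of r quoted in Section 4
    print(r, min(gerschgorin_lower_bounds(50, r, 0.01, 1.0, 2.0)))
```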
3.2. Stability analysis of the DF method

The stability of the DF scheme can be investigated by writing the DF formula (3) in the matrix form

w^{j+1} = (2r/(1 + 2r)) A w^j + ((1 − 2r)/(1 + 2r)) w^{j−1} + c,   (14)

where

w^j = (w_{1,j}, w_{2,j}, …, w_{n−1,j})^t,   c = (1/(1 + 2r)) (2rc₁, 0, …, 0, 2rc₂)^t,

and A is the (n − 1) × (n − 1) tridiagonal matrix

A =
⎡ 0  1            ⎤
⎢ 1  0  1         ⎥
⎢    ⋱  ⋱  ⋱     ⎥
⎢       1  0  1   ⎥
⎣          1  0   ⎦.
Let

v^j = ⎡ w^j     ⎤
      ⎣ w^{j−1} ⎦;

then Eq. (14) can be written as

⎡ w^{j+1} ⎤   ⎡ (2r/(1+2r)) A   ((1−2r)/(1+2r)) I ⎤ ⎡ w^j     ⎤   ⎡ c ⎤
⎢         ⎥ = ⎢                                   ⎥ ⎢         ⎥ + ⎢   ⎥,
⎣ w^j     ⎦   ⎣ I               0                 ⎦ ⎣ w^{j−1} ⎦   ⎣ 0 ⎦

where I is the identity matrix of size n − 1. Therefore,

v^{j+1} = P v^j + d,   (15)

where

P = ⎡ (2r/(1+2r)) A   ((1−2r)/(1+2r)) I ⎤   and   d = ⎡ c ⎤
    ⎣ I               0                 ⎦             ⎣ 0 ⎦.
This technique reduces a three-level difference scheme to a two-level one. The DF scheme (15) is unconditionally stable when each eigenvalue of P has modulus less than or equal to 1. The following two theorems are useful for the stability analysis of difference schemes with three or more time levels, and they are easy to use. The proofs of these theorems can be found in [7].
Theorem 1. If the matrix A can be written in the block form

A =
⎡ A₁₁  A₁₂  ⋯  A₁ₙ ⎤
⎢ A₂₁  A₂₂  ⋯  A₂ₙ ⎥
⎢  ⋮    ⋮        ⋮  ⎥
⎣ Aₙ₁  Aₙ₂  ⋯  Aₙₙ ⎦,

where each A_{ij} is an m × m matrix and all the A_{ij} share a common set of m linearly independent eigenvectors, then the eigenvalues of A are given by the eigenvalues of the n × n matrices

⎡ λ_k^{(11)}  λ_k^{(12)}  ⋯  λ_k^{(1n)} ⎤
⎢ λ_k^{(21)}  λ_k^{(22)}  ⋯  λ_k^{(2n)} ⎥
⎢     ⋮                          ⋮      ⎥
⎣ λ_k^{(n1)}  λ_k^{(n2)}  ⋯  λ_k^{(nn)} ⎦,   k = 1, 2, …, m,

where λ_k^{(ij)} is the kth eigenvalue of A_{ij} corresponding to the kth eigenvector g_k common to all the A_{ij}.
Theorem 2. The eigenvalues of the n × n tridiagonal matrix

A =
⎡ a  b            ⎤
⎢ c  a  b         ⎥
⎢    ⋱  ⋱  ⋱     ⎥
⎢       c  a  b   ⎥
⎣          c  a   ⎦

are

λ_k = a + 2√(bc) cos(kπ/(n + 1)),   k = 1, 2, …, n.
The matrix A from (14) is tridiagonal with a = 0 and b = c = 1. So, by Theorem 2, the matrix A has the n − 1 distinct eigenvalues

λ_k = 2 cos(kπ/n),   k = 1, 2, …, n − 1,

and thus it has n − 1 linearly independent eigenvectors g_i, i = 1, 2, …, n − 1. Since the matrix I has n − 1 eigenvalues all equal to 1, it also has n − 1 linearly independent eigenvectors, which may be taken to be the g_i, i = 1, 2, …, n − 1. Hence, by Theorem 1, the eigenvalues μ of P are the eigenvalues of the 2 × 2 matrices

⎡ (2r/(1+2r)) λ_k   (1−2r)/(1+2r) ⎤
⎣ 1                 0             ⎦,

where λ_k is the kth eigenvalue of A. The values of μ can be computed by evaluating

det ⎡ (2r/(1+2r)) λ_k − μ   (1−2r)/(1+2r) ⎤ = 0,
    ⎣ 1                     −μ            ⎦

which gives

μ² − (2r/(1 + 2r)) λ_k μ − (1 − 2r)/(1 + 2r) = 0.

Therefore,

μ = [ 2r cos(kπ/n) ± √(1 − 4r² sin²(kπ/n)) ] / (1 + 2r),
and there are two cases: if 0 ≤ 1 − 4r² sin²(kπ/n) < 1, then

|μ| < (2r + 1)/(1 + 2r) = 1,

and if 1 − 4r² sin²(kπ/n) < 0, then

|μ|² = [ (2r cos(kπ/n))² + 4r² sin²(kπ/n) − 1 ] / (1 + 2r)² = (4r² − 1)/(4r² + 4r + 1) < 1.
Therefore the DF explicit difference scheme (3) is unconditionally stable for all values of r > 0. The local truncation error of this scheme is O(k² + h² + k²/h²); see [9]. Successive refinement of the values of h and k may generate a finite difference solution that is stable but that converges to the solution of a different PDE. For example, in the DF explicit difference scheme, if both h and k tend to zero at the same rate, the ratio k/h is constant, and we then solve a modified PDE rather than the original Eq. (2). The DF scheme is consistent only if k tends to zero faster than h.
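The eigenvalue bound derived above is easy to verify numerically. The sketch below (ours, with illustrative values of r and n) solves the quadratic μ² − (2r/(1+2r))λ_k μ − (1−2r)/(1+2r) = 0 for every λ_k = 2 cos(kπ/n) and tracks the largest |μ|.

```python
import cmath, math

# Largest |mu| over all eigenvalues of P, via the 2x2 reduction above.
def df_spectral_radius(r, n):
    worst = 0.0
    for k in range(1, n):
        lam = 2.0 * math.cos(k * math.pi / n)
        b = -2.0 * r * lam / (1 + 2.0 * r)       # quadratic mu^2 + b mu + c = 0
        c = -(1.0 - 2.0 * r) / (1 + 2.0 * r)
        disc = cmath.sqrt(b * b - 4 * c)         # complex sqrt covers both cases
        worst = max(worst, abs((-b + disc) / 2), abs((-b - disc) / 2))
    return worst

for r in (0.1, 0.5, 1.0, 10.0, 73.863):
    print(r, df_spectral_radius(r, 40))   # never exceeds 1
```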
4. Numerical results and performance analysis

In this section, we consider a heat conduction equation and use the DF-CN algorithm to approximate its solution. The example is implemented on the academic cluster built in the Department of Computer Science at Jordan University of Science and Technology. This cluster contains 1 management node and 18 Linux (kernel 2.4.20.8, RedHat 9) workstations connected as a star network, each of which has a single IBM Pentium IV running at 2.4 GHz with 512 KB cache, 512 MB of memory and 40 GB of disk space. These hosts are connected together by fast Ethernet, a 1 GB switch and an optical interconnection switch. We use the Message Passing Interface (MPI) with MPICH version 1.5.2 as the message passing library throughout the implementations. Barrier synchronization and blocking point-to-point communication are used. The graphs reported in the figures represent the average speedup and efficiency over many runs of the DF-CN algorithm.
Example. Consider the heat conduction equation

∂u/∂t = (4/π²) ∂²u/∂x²   for 0 < x < 4 and t > 0,

with boundary conditions

u(0, t) = u(4, t) = 0,   t > 0,

and initial condition

u(x, 0) = sin(πx/4) (1 + 2 cos(πx/4)),   0 ≤ x ≤ 4.

The exact solution of this problem is

u(x, t) = e^{−t} sin(πx/2) + e^{−t/4} sin(πx/4).

The solution at t = 0.01 is approximated using the proposed DF-CN algorithm with several values of h when k = 1 × 10⁻⁶. Figs. 2-4 show the execution time, speedup and efficiency of the DF-CN parallel difference scheme corresponding to several values of n when 1, 2, 4, 8, 12, 16 and 18 processors are used.
It is clear from Fig. 3 that the speedup of the DF-CN algorithm is not ideal, i.e., it is not linear in the number of processors. This is because the per-processor problem size decreases as the number of processors increases, which makes the communication time dominant in comparison with the computation time. Also, the high speed of the processors used in implementing our algorithm affects the parallel execution time; that is, small problem sizes take little execution time to perform a given calculation. However, in the DF-CN algorithm, as the problem size n increases, so does the height of the speedup curve. Also, for a fixed number of processors, speedup is an increasing function of the problem size.

Fig. 2. The execution time of the DF-CN algorithm using several values of h with k = 1 × 10⁻⁶ at t = 0.01.

Fig. 3. Speedup of the DF-CN parallel algorithm using several values of h with k = 1 × 10⁻⁶ at t = 0.01.
For a problem of fixed size, the efficiency of a parallel computation typically decreases as the number of processors increases; see Fig. 4. Since parallel communication increases with the number of processors, the way to maintain efficiency when increasing the number of processors is to increase the size of the problem being solved. The proposed algorithm assumes that the data structures we manipulate fit in primary memory, so the maximum problem size we can solve is limited by the amount of primary memory available. Also, in the DF-CN parallel algorithm, as the problem size n increases, the height of the efficiency curve increases.

The DF-CN parallel algorithm is accurate. For example, when n = 54,000, the value of r is approximately 73.863 and the maximum error is 5.035 × 10⁻⁹, while with this choice of n, if the forward difference scheme (see [7,9]) is used to approximate the values of the interface points instead of the DF scheme, the approximate solution diverges.
To analyze the performance, let ν represent the time needed to compute an interior point using the Thomas algorithm. Using a single processor to update the n − 1 points requires time (n − 1)ν. Because the algorithm has m time steps, the total expected execution time of the sequential algorithm is

t_s = m (n − 1) ν.   (16)

To compute the parallel execution time using p processors, suppose that each of them is responsible for an equal-sized portion containing n/p + 1 points: two boundaries and, in general, n/p − 1 interior points. The boundary points are computed by the DF explicit scheme, while the interior points are computed using the Thomas algorithm. If ω represents the time needed to compute a boundary point, the parallel computation time for each iteration is

t_p,comp = (n/p − 1) ν + 2ω.

However, the parallel algorithm involves communication that the sequential algorithm does not. In general, each processor must send values to its two neighboring processors and receive two values from them. If φ represents the time needed for a processor to send (receive) a value to (from) another processor, the necessary communications increase the parallel execution time by 2φ per iteration. Therefore,

t_p,comm = 2φ.

Combining the computation time with the communication time, the overall parallel execution time for all m iterations of the algorithm is

t_p = m (t_p,comp + t_p,comm) = m [ (n/p − 1) ν + 2ω + 2φ ].   (17)

Fig. 4. Efficiency of the DF-CN parallel algorithm using several values of h with k = 1 × 10⁻⁶ at t = 0.01.
The speedup relative to the sequential algorithm is

S = t_s / t_p = (n − 1) ν / [ (n/p − 1) ν + 2ω + 2φ ],   (18)

and the parallel efficiency is given by

E = S / p = (n − 1) ν / [ (n − p) ν + 2pω + 2pφ ].   (19)
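The performance model (16)-(19) can be evaluated directly. The sketch below (ours) uses illustrative, assumed constants for ν (interior-point time), ω (interface-point time) and φ (per-message communication time); it shows the behavior described above, with speedup growing in p while efficiency falls off as communication becomes relatively more expensive.

```python
# Evaluate the cost model of Section 4 for assumed machine constants.
def model(n, m, p, nu, omega, phi):
    t_s = m * (n - 1) * nu                               # Eq. (16)
    t_p = m * ((n / p - 1) * nu + 2 * omega + 2 * phi)   # Eq. (17)
    S = t_s / t_p                                        # Eq. (18)
    E = S / p                                            # Eq. (19)
    return t_s, t_p, S, E

for p in (1, 2, 4, 8, 16):
    _, _, S, E = model(n=54000, m=10000, p=p, nu=1e-7, omega=1e-7, phi=5e-5)
    print(p, round(S, 2), round(E, 3))
```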
5. Conclusion

In this paper, the DF-CN parallel algorithm has been discussed in depth. This algorithm uses the DF explicit scheme to approximate the solution at the interior boundaries between subdomains. For the remaining points in each subdomain, the algorithm uses the CN implicit scheme. This scheme has no stability constraint, which prevents the algorithm from drifting to worse approximations.

A numerical example is given for the proposed DF-CN parallel algorithm. From the numerical results we conclude that the DF-CN algorithm is recommended when a small number of time steps is used. A small number of time steps decreases the inter-processor communications, which decreases the communication time of this parallel algorithm. This algorithm gave more accurate results than the parallel algorithm that uses the forward difference scheme to approximate the values of the interface points, especially when a small number of time steps is used.

Furthermore, in the DF-CN parallel algorithm, as the problem size n increases, so does the height of the speedup curve. Also, for a fixed number of processors, speedup is an increasing function of the problem size. Moreover, the efficiency of the DF-CN algorithm typically decreases as the number of processors increases. The way to maintain efficiency when increasing the number of processors is to increase the size of the problem being solved. Unfortunately, the maximum problem size we can solve is limited by the amount of primary memory available.
References

[1] Q. Du, M. Mu, Z.N. Wu, Efficient parallel algorithms for parabolic problems, SIAM J. Numer. Anal. 30 (2001) 1469-1487.
[2] X. Cai, Additive Schwarz algorithms for parabolic convection-diffusion equations, Numer. Math. 50 (1991) 41-52.
[3] W. Rivera-Gallego, Stability analysis of numerical boundary conditions in domain decomposition algorithms, Appl. Math. Comput. 137 (2003) 375-385.
[4] Y.A. Kuznetsov, New algorithms for approximate realization of implicit difference schemes, Soviet J. Numer. Anal. Math. Model. 3 (1988) 99-114.
[5] C.N. Dawson, Q. Du, T.F. Dupont, A finite difference domain decomposition algorithm for numerical solution of the heat equation, Math. Comput. 57 (1991) 63-71.
[6] B.-L. Zhang, Z.-S. Wan, New techniques in designing finite-difference domain decomposition algorithm for the heat equation, Comput. Math. Appl. 45 (2003) 1695-1705.
[7] G.D. Smith, Numerical Solution of Partial Differential Equations: Finite Difference Methods, Oxford University Press, London, 1978.
[8] G.E. Karniadakis, R.M. Kirby II, Parallel Scientific Computing in C++ and MPI, Cambridge University Press, Cambridge, 2003.
[9] R.L. Burden, J.D. Faires, Numerical Analysis, seventh ed., Brooks/Cole, 2001.