High Performance Computing Using Out-of-Core Sparse Direct Solvers
Abstract—The performance of out-of-core sparse direct solvers is evaluated for finite element formulations of three dimensional problems on a Windows platform. Here three different solvers, HSL_MA78, MUMPS and PARDISO, are compared. The performance of these solvers is evaluated on a 64-bit machine with 16 GB RAM for a finite element formulation of flow through a rectangular channel. It is observed that using the out-of-core PARDISO solver, relatively large problems can be solved. The implementation of Newton and modified Newton's iteration is also discussed.

Keywords—Out-of-core, PARDISO, MUMPS, Newton.

Mandhapati P. Raju is currently with General Motors Inc., Warren, MI 48093 USA (phone: 586-986-1365; e-mail: [email protected]).
Siddhartha Khaitan is with Iowa State University, Ames, IA 50011 USA (e-mail: [email protected]).

I. INTRODUCTION

THE use of sparse direct solvers in the context of finite element discretization of the Navier-Stokes equations for three dimensional problems is limited by their huge memory requirement. Nevertheless, direct solvers are preferred due to their robustness. The development of sparse direct solvers based on algorithms like the multifrontal [1] and supernodal [2] methods has significantly reduced the memory requirements compared to the traditional frontal solvers [3]. The superior performance of multifrontal solvers has been demonstrated for different CFD applications [4]-[7] and also in power system simulations [8]-[10]. It has been identified [4]-[7] that the memory requirement is a bottleneck in solving large three-dimensional CFD problems. There are different viable alternatives for overcoming the huge memory requirements. One alternative is to run on a 64-bit machine having large RAM. The second alternative is to use an out-of-core solver, where the factors are written to the disk, thereby minimizing the in-core requirements. The third alternative is to use parallel solvers in a distributed computing environment where the memory is distributed amongst the different processors. Recent efforts by the authors show that by using a 64-bit machine with 16 GB RAM, relatively larger problems can be handled in-core. In the past there has been a lot of research to reduce the time for I/O and make out-of-core solvers efficient. The capability and performance of out-of-core solvers in the context of a finite element Navier-Stokes code is assessed in this paper. Three state-of-the-art out-of-core solvers - MUMPS, HSL_MA78 and PARDISO - are evaluated. To the best of the authors' knowledge, no such comparison of the performance of out-of-core solvers has been reported in the literature.

MUMPS [11]-[13] is a parallel direct solver with out-of-core functionality and is available in the public domain. PARDISO [2], [14]-[17] also has an out-of-core solver and is available as a part of the Intel Math Kernel Library [18]. HSL_MA78 [19], an out-of-core solver, is available as part of HSL 2007, which is available free for any UK researchers. An evaluation version of HSL_MA78 is used in this paper.

In finite element Navier-Stokes formulations, the set of linear equations generated usually has a matrix with zero diagonal entries. The penalty formulation yields non-zero diagonal entries, but it is observed that these diagonal entries are a few orders of magnitude smaller than the other non-diagonal entries. Iterative solution methods fail or pose severe convergence problems for such ill-conditioned matrices. Although iterative solvers are memory efficient, the resolution of convergence issues is not straightforward and results in a lack of robustness. The performance of a suite of iterative solvers is compared with the out-of-core direct solvers to demonstrate the superiority of direct solvers.

II. MATHEMATICAL FORMULATION

A benchmark rectangular channel flow problem is chosen for evaluating the out-of-core solvers. The governing equations for laminar flow inside a rectangular channel are presented below in non-dimensional form. In three-dimensional calculations, instead of the primitive-variable formulation, the penalty approach is used to reduce the memory requirements.
$$\frac{\partial}{\partial \hat{x}}(\hat{u}^2) + \frac{\partial}{\partial \hat{y}}(\hat{u}\hat{v}) + \frac{\partial}{\partial \hat{z}}(\hat{u}\hat{w}) = \lambda \frac{\partial}{\partial \hat{x}}\left(\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}}\right) + \frac{\partial}{\partial \hat{x}}\left(\frac{2}{Re}\frac{\partial \hat{u}}{\partial \hat{x}}\right) + \frac{\partial}{\partial \hat{y}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{y}} + \frac{\partial \hat{v}}{\partial \hat{x}}\right)\right) + \frac{\partial}{\partial \hat{z}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{x}}\right)\right), \quad (2)$$

$$\frac{\partial}{\partial \hat{x}}(\hat{u}\hat{v}) + \frac{\partial}{\partial \hat{y}}(\hat{v}^2) + \frac{\partial}{\partial \hat{z}}(\hat{v}\hat{w}) = \lambda \frac{\partial}{\partial \hat{y}}\left(\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}}\right) + \frac{\partial}{\partial \hat{x}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{y}} + \frac{\partial \hat{v}}{\partial \hat{x}}\right)\right) + \frac{\partial}{\partial \hat{y}}\left(\frac{2}{Re}\frac{\partial \hat{v}}{\partial \hat{y}}\right) + \frac{\partial}{\partial \hat{z}}\left(\frac{1}{Re}\left(\frac{\partial \hat{v}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{y}}\right)\right), \quad (3)$$

and

$$\frac{\partial}{\partial \hat{x}}(\hat{u}\hat{w}) + \frac{\partial}{\partial \hat{y}}(\hat{v}\hat{w}) + \frac{\partial}{\partial \hat{z}}(\hat{w}^2) = \lambda \frac{\partial}{\partial \hat{z}}\left(\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}}\right) + \frac{\partial}{\partial \hat{x}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{x}}\right)\right) + \frac{\partial}{\partial \hat{y}}\left(\frac{1}{Re}\left(\frac{\partial \hat{v}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{y}}\right)\right) + \frac{\partial}{\partial \hat{z}}\left(\frac{2}{Re}\frac{\partial \hat{w}}{\partial \hat{z}}\right). \quad (4)$$

The Newton iteration is continued till the infinity norm of the correction vector $\delta X^{(i)}$ converges to a prescribed tolerance of $10^{-10}$. A modified Newton's method is also used in this study. For modified Newton, eq. (9) is modified as shown in eq. (11):

$$[J]^{(0)} \{\delta X^{(i)}\} = -\{R_X\}^{(i)}. \quad (11)$$

In the modified Newton's method the Jacobian is evaluated only during the first iteration. Consequently the Jacobian is factorized only once. For all subsequent iterations, the same Jacobian (and hence its LU factors) is used repeatedly. This algorithm is referred to as modified Newton. Since factorization is the most expensive part of the computations, by using the modified Newton algorithm the expensive factorization step can be skipped after the first iteration.
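The factorize-once structure described above can be summarized in a short driver. The sketch below is illustrative only: assemble_jacobian, assemble_residual, factorize and back_substitute are hypothetical placeholders standing in for the application assembly code and the direct solver's phases, not routines from any of the packages discussed here.

```fortran
! Illustrative modified Newton driver: the Jacobian is assembled and
! factorized once, and every later iteration reuses the LU factors,
! so only a residual assembly and a back-substitution are performed.
subroutine modified_newton(x, n, tol, maxit)
  implicit none
  integer, intent(in)             :: n, maxit
  double precision, intent(in)    :: tol
  double precision, intent(inout) :: x(n)
  double precision                :: dx(n), r(n)
  integer                         :: iter

  call assemble_jacobian(x)         ! [J] at the initial guess only
  call factorize()                  ! expensive LU factorization, done once
  do iter = 1, maxit
     call assemble_residual(x, r)   ! {R_X} at the current iterate
     call back_substitute(r, dx)    ! solve [J]{dx} = -{R_X} with stored factors
     x = x + dx
     if (maxval(abs(dx)) < tol) return   ! infinity-norm convergence test
  end do
end subroutine modified_newton
```

Replacing the two calls before the loop with a per-iteration assemble-and-factorize gives the full Newton variant.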
The Picard iteration, however, is more robust in terms of its flexibility in choosing an initial guess. However, the rate of convergence will be linear and hence would result in more computational time.

The choice of modified Newton or modified Picard can further significantly reduce the computational time. In the modified Newton method, the left hand side matrix is factorized only once and the factors are reused. Since factorization is the bottleneck, avoiding the factorization in the subsequent steps reduces the computational time. The rate of convergence will no longer be quadratic but linear. Nevertheless, there will be significant savings in computational time. This paper discusses only the Newton and modified Newton implementations.

IV. RESULTS AND DISCUSSION

In this paper, flow inside a three dimensional rectangular channel is considered. Three dimensional brick elements are used for generating the grid. A weak Galerkin finite element formulation is used to discretize the Navier-Stokes equations to form a large set of non-linear equations. Newton's iteration is used to generate a set of linear algebraic equations. The matrices generated from such a discretization are usually very sparse and hence a good sparse solver is used to reduce the computational effort. It is to be noted that for three dimensional grids, the matrices generated are less sparse compared to the matrices generated from a two-dimensional grid.

Typically an interior node in a three-dimensional grid is connected to 27 nodes including itself. Since there are 3 dof's at each node, a typical row consists of 81 non-zero entries. In a two-dimensional grid, a typical row consists of 27 non-zero entries. This increases the frontal size considerably. Hence solving three-dimensional problems using direct solvers is quite challenging both in terms of computational time and memory requirements. Large problems cannot be solved on a 32-bit machine using in-core techniques [4], [7]. This paper studies the performance of out-of-core direct solvers on a 64-bit machine with 16 GB RAM. All the computations are run on a Windows machine with an Intel Xeon processor.

Before comparing the various solvers for their relative performance, each individual solver is tuned for its optimal performance, specifically the choice of the ordering package. Each solver has inbuilt ordering packages, whose choice can affect the performance of the solver. In addition there are other parameters, like the pivot tolerance, which will affect the performance of the solver.

MUMPS solver

The sequential version of the out-of-core MUMPS solver is built on a 64-bit machine. The out-of-core solver is invoked by setting the value of mumps_par%ICNTL(22) to 1. MUMPS has different inbuilt ordering packages (AMD [20], QAMD [21], AMF, PORD [22]). In addition, there is a provision to link METIS [23] as an external package. Memory relaxation is taken as 100%. The MUMPS out-of-core solver is used for all the cases. Table I shows the comparison of the performance of the various ordering methods. All the cases are run on a 30x30x30 mesh. The CPU time and memory for each ordering are compared. The CPU time reported is the CPU time for the first Newton iteration. The CPU time and memory requirement for the complete in-core solution are also included in brackets for a quick comparison. It is to be first noted that the out-of-core solution is around 3-5 times slower compared to the in-core solution. Of all the ordering packages, METIS gives the best results. Compared to AMD, METIS results in almost one-third of the floating point operations. The computational time and memory requirements are the lowest for the METIS ordering. The nested dissection algorithm of METIS is found to generate good orderings for three dimensional meshes. Based on this result, METIS ordering is used for all subsequent runs using the MUMPS solver.

TABLE I: COMPARISON OF ORDERING METHODS FOR THE MUMPS SOLVER FOR THE 30X30X30 GRID (IN-CORE VALUES IN BRACKETS)

ordering   #dof's   CPU time (sec)   in-core arrays (GB)   out-of-core files (GB)
AMD        89373    438.4 (142.8)    1.4  (4.06)           2
QAMD       89373    446.6 (142.75)   1.4  (4.04)           2
AMF        89373    352   (105.7)    1.34 (3.48)           1.67
PORD       89373    309   (86.6)     1.09 (3.18)           1.5
METIS      89373    250.3 (55.01)    0.78 (3.02)           1.28
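A minimal driver for the configuration just described might look as follows. Only ICNTL(22) = 1 is taken from the text above; the other control settings (METIS ordering via ICNTL(7) = 5, memory relaxation via ICNTL(14)) follow the MUMPS documentation, and the matrix input is elided.

```fortran
! Sketch of invoking sequential MUMPS with the out-of-core facility,
! assuming the library (and its sequential MPI stub) is linked and the
! matrix is already assembled in coordinate format.
program mumps_ooc
  implicit none
  include 'mpif.h'            ! dummy MPI header shipped with sequential MUMPS
  include 'dmumps_struc.h'
  type (dmumps_struc) :: mumps_par
  integer :: ierr

  call mpi_init(ierr)
  mumps_par%comm = mpi_comm_world
  mumps_par%sym  = 0          ! unsymmetric system
  mumps_par%par  = 1          ! host takes part in the computation
  mumps_par%job  = -1         ! initialize the instance
  call dmumps(mumps_par)

  mumps_par%icntl(7)  = 5     ! ordering: 5 selects METIS
  mumps_par%icntl(14) = 100   ! memory relaxation of 100%
  mumps_par%icntl(22) = 1     ! write factors to disk (out-of-core)

  ! ... set mumps_par%n, %nz, %irn, %jcn, %a and %rhs here ...

  mumps_par%job = 6           ! analysis + factorization + solve in one call
  call dmumps(mumps_par)

  mumps_par%job = -2          ! free the instance
  call dmumps(mumps_par)
  call mpi_finalize(ierr)
end program mumps_ooc
```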
HSL_MA78 solver

The HSL_MA78 solver does not have any internal ordering techniques; however, the HSL package has other routines which perform the ordering of the finite element entries to reduce the fill-in during factorization. HSL_MC68 is generally used for efficient ordering of finite element matrices. In addition, external ordering packages can be hooked to the HSL_MA78 solver. In this paper METIS ordering is also used by hooking the METIS library to the solver. Table II shows the effect of HSL_MC68 and METIS ordering on the CPU time and memory of the HSL_MA78 solver. It is found that METIS performs better than HSL_MC68. Hence METIS is used for all subsequent runs of the HSL_MA78 solver.

TABLE II: COMPARISON OF ORDERING METHODS FOR HSL_MA78 FOR THE 30X30X30 GRID (IN-CORE VALUES IN BRACKETS)

ordering    #dof's   CPU time (sec)   in-core arrays (GB)   out-of-core files (GB)
HSL_MC68    89373    536 (524)        1.4  (6.88)           3.5
METIS       89373    321 (318)        0.79 (4.72)           2

PARDISO solver

PARDISO has minimum degree (MD) and METIS ordering hooked internally within the solver. The user has the choice to use either of the ordering techniques. Table III compares the effect of MD and METIS ordering on the performance of the PARDISO solver. It is found that METIS performs better than the MD ordering technique. Hence METIS is used for all subsequent runs of the PARDISO solver.

TABLE III: COMPARISON OF ORDERING METHODS FOR THE PARDISO SOLVER FOR THE 30X30X30 GRID (IN-CORE VALUES IN BRACKETS)

ordering   #dof's   CPU time (sec)   in-core arrays (GB)   out-of-core files (GB)
MD         89373    284 (162)        0.5 (2.71)            2.8
METIS      89373    97  (59)         0.3 (1.42)            1.25
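For reference, the sketch below shows how the METIS ordering and the out-of-core mode are selected in the MKL version of PARDISO. The iparm meanings follow the MKL documentation rather than this paper, the tiny matrix is a stand-in for the finite element Jacobian, and the out-of-core scratch location is taken from the pardiso_ooc.cfg file or the MKL_PARDISO_OOC_PATH environment variable.

```fortran
! Sketch of MKL PARDISO with METIS ordering (iparm(2) = 2) and the
! out-of-core mode (iparm(60) = 2); a 3x3 CSR matrix stands in for
! the finite element Jacobian.
program pardiso_ooc
  implicit none
  integer, parameter :: n = 3, nrhs = 1
  integer(8) :: pt(64)                  ! internal solver handle, must start at 0
  integer    :: iparm(64), perm(n)
  integer    :: ia(n+1), ja(4)
  integer    :: maxfct, mnum, mtype, phase, msglvl, error
  double precision :: a(4), b(n), x(n)

  ia = (/ 1, 3, 4, 5 /)                 ! CSR pattern of [2 1 0; 0 3 0; 0 0 4]
  ja = (/ 1, 2, 2, 3 /)
  a  = (/ 2d0, 1d0, 3d0, 4d0 /)
  b  = 1d0

  pt = 0; perm = 0; iparm = 0
  maxfct = 1; mnum = 1; msglvl = 0
  mtype  = 11                           ! real unsymmetric matrix
  iparm(1)  = 1                         ! do not use the all-default iparm values
  iparm(2)  = 2                         ! fill-reducing ordering: METIS nested dissection
  iparm(10) = 13                        ! pivot perturbation, typical for mtype = 11
  iparm(11) = 1                         ! enable scaling, typical for mtype = 11
  iparm(60) = 2                         ! out-of-core: LU factors are written to disk

  phase = 13                            ! analysis, factorization and solve
  call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, &
               nrhs, iparm, msglvl, b, x, error)
  print *, 'error =', error, '  x =', x

  phase = -1                            ! release internal memory
  call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, &
               nrhs, iparm, msglvl, b, x, error)
end program pardiso_ooc
```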
Tables IV and V compare the time split of the different phases for the in-core and out-of-core solutions of the 30x30x30 grid. The out-of-core MUMPS solver is around four times slower than the in-core solver. The PARDISO out-of-core solver is around 1.6 times slower than its in-core solver. Another interesting observation is that out-of-core PARDISO has a relatively large solve phase compared to out-of-core MUMPS. PARDISO has a much lower in-core memory requirement than the other two solvers.

TABLE IV: COMPARISON OF TIME SPLIT FOR THE IN-CORE SOLUTION OF THE 30X30X30 GRID

                          computational time (seconds)
in-core     matrix     analysis   numeric    solve     total     memory
solver      assembly   phase      phase      phase     time      (GB)
MUMPS       4          1.51       53.3       0.58      59.39     3.02
PARDISO     4          2.19       52.6       0.47      59.26     1.42
HSL_MA78    4          0.64       313.14     -         317.78    4.72

TABLE V: COMPARISON OF TIME SPLIT FOR THE OUT-OF-CORE SOLUTION OF THE 30X30X30 GRID

                          computational time (seconds)                memory (GB)
out-of-core  matrix     analysis   numeric    solve     total     in-core   out-of-core
solver       assembly   phase      phase      phase     time      arrays    files
MUMPS        4          1.6        243.2      1.45      250.25    0.78      1.28
PARDISO      4          2.2        87.76      3.07      97.03     0.3       1.25
HSL_MA78     4          0.64       314        -         318.64    0.79      3.174

Tables VI, VII and VIII show the performance of the MUMPS, HSL_MA78 and PARDISO solvers for different grid sizes. The performance of the in-core solution is also presented for relative comparison. Table VI shows that the out-of-core MUMPS solver is always more than 4 times slower compared to the in-core solver. This shows that the out-of-core implementation of the MUMPS solver is less efficient. The in-core memory requirement is maintained low. Surprisingly, the out-of-core HSL_MA78 solver is very efficient with respect to the in-core solver. The computational times for both the in-core and out-of-core implementations are almost similar. The in-core memory for the out-of-core solver is maintained low. The out-of-core memory requirement is larger for the HSL_MA78 solver compared to the other two solvers.

TABLE VI: PERFORMANCE OF THE MUMPS SOLVER ON DIFFERENT GRID SIZES (* INDICATES THE IN-CORE SOLUTION WAS NOT FEASIBLE)

                               in-core                 out-of-core
nex   ney   nez   #dof's    cpu time   memory    cpu time   in-core      out-of-core
                            (min)      (GB)      (min)      arrays (GB)  files (GB)
100   50    50    788103    *          *         126.1      10.2         24.1
50    20    20    67473     0.45       1.81      2.36       0.47         0.78
50    50    10    85833     0.495      2.11      2.72       0.45         0.925
50    50    20    163863    2.228      5.93      8.67       1.3          2.64

TABLE VII: PERFORMANCE OF THE HSL_MA78 SOLVER ON DIFFERENT GRID SIZES (* INDICATES THE IN-CORE SOLUTION WAS NOT FEASIBLE)

                               in-core                 out-of-core
nex   ney   nez   #dof's    cpu time   memory    cpu time   in-core      out-of-core
                            (min)      (GB)      (min)      arrays (GB)  files (GB)
50    10    10    18513     0.18       0.24      0.193      0.14         0.32
100   10    10    36663     0.35       0.5       0.363      0.14         0.625
200   10    10    72963     0.59       0.79      0.612      0.14         1.15
50    20    10    35343     0.78       0.65      0.8        0.23         0.84
100   20    10    69993     1.06       1.53      1.074      0.23         1.475
100   20    20    133623    4.83       3.91      3.936      0.5          3.54
100   50    20    324513    28.9       13.94     23.52      1.56         12.67
100   50    50    788103    *          *         542.6      8.1          49.95
50    20    20    67473     2.42       1.98      2.46       0.5          1.93
50    50    10    85833     2.65       2.38      2.69       0.5          2.37
50    50    20    163863    11.9       7.12      11.3       1.56         6.45
Table VIII shows the performance of the PARDISO solver. The out-of-core solver is around two times slower compared to the in-core solver. Overall, the PARDISO out-of-core solver is much faster compared to the other two solvers. The in-core memory requirement is kept very low. For a 100x50x50 mesh, out-of-core MUMPS requires around 10 GB of in-core memory, out-of-core HSL_MA78 requires around 8 GB of in-core memory, and out-of-core PARDISO requires around 4 GB of in-core memory. Hence PARDISO can solve much finer grid sizes compared to the other two solvers. The finest grid sizes chosen to solve with PARDISO in this paper are 150x75x30 and 200x80x40, which consist of around 1 million and 2 million degrees of freedom respectively. The 150x75x30 grid requires around 5.5 GB of in-core memory. The out-of-core memory is around 31 GB. For the 200x80x40 grid, the first Newton iteration takes around 16.5 hours of CPU time. Thus we observe that out-of-core PARDISO can solve very large three dimensional problems in the context of using direct solvers on a single desktop. Both in terms of computational time and memory requirement, PARDISO is found to be the best solver.

TABLE VIII: PERFORMANCE OF THE PARDISO SOLVER ON DIFFERENT GRID SIZES (* INDICATES THE IN-CORE SOLUTION WAS NOT FEASIBLE)

                               in-core                 out-of-core
nex   ney   nez   #dof's    cpu time   memory    cpu time   in-core      out-of-core
                            (min)      (GB)      (min)      arrays (GB)  files (GB)
50    10    10    18513     0.038      0.08      0.067      0.087        0.1
100   10    10    36663     0.073      0.25      0.15       0.17         0.22
200   10    10    72963     0.157      0.57      0.304      0.34         0.47
50    20    10    35343     0.105      0.29      0.209      0.17         0.27
100   20    10    69993     0.243      0.69      0.515      0.337        0.593
100   20    20    133623    1.073      1.92      1.758      0.665        1.693
100   50    20    324513    6.517      6.62      9.5        1.65         6.035
100   50    50    788103    *          *         114.2      4.1          24.5
50    20    20    67473     0.432      0.85      0.731      0.33         0.763
50    50    10    85833     0.457      1.02      0.84       0.42         0.895
50    50    20    163863    2.233      2.86      3.4        0.83         2.6
50    50    50    397953    17.650     10.67     25.4       2.03         10.15
150   75    30    1067268   *          *         171.95     5.52         31
200   80    40    2002563   *          *         993        10.5         74.8

In Tables VI-VIII and in the correlations below, in-core and out-of-core refer to the in-core and out-of-core memory requirements of the out-of-core solver; nex, ney and nez refer to the number of grid elements in the x, y and z directions respectively; n refers to the total number of degrees of freedom; and ar1 and ar2 refer to the grid aspect ratios nex/ney and nex/nez.

MUMPS

$$T = 3.56 \times 10^{-7}\, n^{1.447}\, ar_1^{-0.127}\, ar_2^{-0.127}; \quad R^2 = 0.95 \quad (12)$$
$$M_{incore} = 8.28 \times 10^{-7}\, n^{1.214}\, ar_1^{-0.197}\, ar_2^{-0.197}; \quad R^2 = 0.98 \quad (13)$$
$$M_{outofcore} = 2.11 \times 10^{-7}\, n^{1.377}\, ar_1^{-0.127}\, ar_2^{-0.127}; \quad R^2 = 0.98 \quad (14)$$

HSL

$$T = 3.53 \times 10^{-8}\, n^{1.707}\, ar_1^{-0.362}\, ar_2^{-0.362}; \quad R^2 = 0.7 \quad (15)$$
$$M_{incore} = 2.6 \times 10^{-4}\, n^{0.736}\, ar_1^{-0.33}\, ar_2^{-0.33}; \quad R^2 = 0.98 \quad (16)$$
$$M_{outofcore} = 3.42 \times 10^{-6}\, n^{1.219}\, ar_1^{-0.155}\, ar_2^{-0.155}; \quad R^2 = 0.99 \quad (17)$$

PARDISO

$$M_{incore} = 3.84 \times 10^{-6}\, n^{1.02}\, ar_1^{-0.01}\, ar_2^{-0.01}; \quad R^2 = 0.99 \quad (19)$$
$$M_{outofcore} = 1.39 \times 10^{-7}\, n^{1.407}\, ar_1^{-0.112}\, ar_2^{-0.112}; \quad R^2 = 0.99 \quad (20)$$

The correlations give an idea of how the solver requirements vary as the grid size is modified. The exponents of n are greater than 1, indicating that as the number of degrees of freedom increases, the CPU time and memory requirements increase superlinearly. For the out-of-core solvers, the exponents of n are similar for all three solvers, with MUMPS having the lowest, around 1.45. The CPU time and memory requirements are not only a function of the number of degrees of freedom but also a function of the grid aspect ratios. The absolute values of the exponents of the aspect ratios are larger for the HSL solver compared to MUMPS and PARDISO. This indicates that the HSL solver's performance is also a strong function of the grid distribution. The memory requirement of an out-of-core solver consists of the in-core memory requirement (for holding the frontal matrices and other working arrays) and the out-of-core memory requirement (the LU factors written to the disk). Correlations are presented for both memory requirements. An interesting observation is that the in-core memory requirement for PARDISO is the least amongst the three solvers, varies almost linearly with the number of degrees of freedom, and is almost independent of the grid distribution. This behavior of out-of-core PARDISO is very conducive for choosing it for solving large three dimensional finite element problems.
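As a quick sanity check on how the fits are meant to be read (assuming T is in minutes and the memory values in GB, consistent with Tables VI-VIII), the short program below evaluates the MUMPS correlations (12)-(14) for the 50x20x20 grid, for which Table VI reports 2.36 min, 0.47 GB and 0.78 GB.

```fortran
! Evaluate the fitted MUMPS out-of-core correlations, eqs. (12)-(14),
! for the 50x20x20 grid: n = 67473, ar1 = nex/ney = 2.5, ar2 = nex/nez = 2.5.
program correlation_check
  implicit none
  double precision :: n, ar1, ar2, t, m_in, m_out

  n = 67473d0;  ar1 = 2.5d0;  ar2 = 2.5d0

  t     = 3.56d-7 * n**1.447d0 * ar1**(-0.127d0) * ar2**(-0.127d0)   ! eq. (12)
  m_in  = 8.28d-7 * n**1.214d0 * ar1**(-0.197d0) * ar2**(-0.197d0)   ! eq. (13)
  m_out = 2.11d-7 * n**1.377d0 * ar1**(-0.127d0) * ar2**(-0.127d0)   ! eq. (14)

  ! The fits land close to the measured 2.36 min, 0.47 GB and 0.78 GB.
  print '(a, f6.2, a)', 'CPU time    : ', t,     ' min (predicted)'
  print '(a, f6.2, a)', 'in-core mem : ', m_in,  ' GB  (predicted)'
  print '(a, f6.2, a)', 'OOC files   : ', m_out, ' GB  (predicted)'
end program correlation_check
```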
[Figure: semi-log plot of the residual norm ||δX||∞ (10^-10 to 10^-4) and CPU time versus iteration count (0-7) for Newton and modified Newton]
Fig. 1 Comparison of CPU time and residual norm for Newton and modified Newton's method for the 30x30x30 grid

Figure 1 shows the performance of the Newton and modified Newton algorithms using out-of-core PARDISO as the linear solver for a 30x30x30 grid. No under-relaxation is used for either Newton or modified Newton's method. Newton's method converges in 4 iterations and quadratic convergence is observed. Modified Newton's method converges in 6 iterations and linear convergence is observed. Significant computational time savings can be achieved using modified Newton's method: Newton's iterations converge in 381 seconds, whereas modified Newton's iterations converge in 128 seconds.
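The convergence behaviors seen in Fig. 1 correspond to the standard rate definitions, stated here for reference (they are not part of the original discussion):

```latex
% Quadratic convergence (full Newton): the error is roughly squared each step.
\|\delta X^{(i+1)}\|_\infty \le C \,\|\delta X^{(i)}\|_\infty^{2}
% Linear convergence (modified Newton): the error shrinks by a constant factor.
\|\delta X^{(i+1)}\|_\infty \le c \,\|\delta X^{(i)}\|_\infty, \qquad 0 < c < 1
```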
TABLE IX: COMPARISON OF CPU TIMES FOR NEWTON AND MODIFIED NEWTON FOR DIFFERENT GRIDS (PARDISO OUT-OF-CORE SOLVER)

nex   ney   nez   #dof's     Newton cpu time (min)   modified Newton cpu time (min)
50    10    10    18513      0.34                    0.152
100   10    10    36663      0.8                     0.327
200   10    10    72963      1.78                    0.77
50    20    10    35343      0.99                    0.405
100   20    10    69993      2.35                    0.937
100   20    20    133623     7.65                    2.53
100   50    20    324513     39.51                   11.83
100   50    50    788103     468.1                   122.5
50    20    20    67473      3.27                    1.09
50    50    10    85833      3.57                    1.247
50    50    20    163863     14.4                    4.37
50    50    50    397953     106.4                   27.98
150   75    30    1067268    693                     181.8

Table IX shows the comparison of CPU times for the Newton and modified Newton methods for different grid sizes. It is clearly observed that the implementation of the modified Newton method leads to significant savings in computational time. Further, it is observed that the savings grow as the number of degrees of freedom increases.

V. CONCLUSIONS

Three different out-of-core solvers (MUMPS, HSL_MA78, PARDISO) are evaluated for the solution of a finite element Navier-Stokes formulation of laminar flow in a rectangular channel. METIS is found to be the best choice of ordering algorithm for reducing the fill-in of the LU factors. Of the three solvers, PARDISO is found to be the best solver, with lower computational time and lower in-core and out-of-core memory requirements.

It is observed that out-of-core HSL_MA78 performs almost identically to the in-core HSL_MA78 solver. HSL_OF01 facilitates the efficient I/O operations for the HSL_MA78 solver. However, the out-of-core HSL_MA78 is much slower than the MUMPS and PARDISO out-of-core solvers. The out-of-core strategy can help in solving large three dimensional finite element problems. Out-of-core PARDISO could solve around 2 million equations resulting from three dimensional finite element formulations on a single desktop. Further, it is observed that the use of the modified Newton algorithm can significantly reduce the computational time as compared to Newton's algorithm.

REFERENCES

[1] T. A. Davis and I. S. Duff, "A combined unifrontal/multifrontal method for unsymmetric sparse matrices," ACM Trans. Math. Soft., vol. 25, no. 1, 1997, pp. 1-19.
[2] O. Schenk, K. Gartner, and W. Fichtner, "Efficient Sparse LU Factorization with Left-right Looking Strategy on Shared Memory Multiprocessors," BIT, vol. 40, no. 1, 2000, pp. 158-176.
[3] B. M. Irons, "A frontal solution scheme for finite element analysis," Numer. Meth. Engg., vol. 2, 1970, pp. 5-32.
[4] M. P. Raju and J. S. T'ien, "Development of Direct Multifrontal Solvers for Combustion Problems," Numerical Heat Transfer, Part B, vol. 53, 2008, pp. 1-17.
[5] M. P. Raju and J. S. T'ien, "Modelling of Candle Wick Burning with a Self-trimmed Wick," Comb. Theory Modell., vol. 12, no. 2, 2008, pp. 367-388.
[6] M. P. Raju and J. S. T'ien, "Two-phase flow inside an externally heated axisymmetric porous wick," vol. 11, no. 8, 2008, pp. 701-718.
[7] P. K. Gupta and K. V. Pagalthivarthi, "Application of Multifrontal and GMRES Solvers for Multisize Particulate Flow in Rotating Channels," Prog. Comput. Fluid Dynam., vol. 7, 2007, pp. 323-336.
[8] S. Khaitan, J. McCalley, and Q. Chen, "Multifrontal solver for online power system time-domain simulation," IEEE Transactions on Power Systems, vol. 23, no. 4, 2008, pp. 1727-1737.
[9] S. Khaitan, C. Fu, and J. D. McCalley, "Fast parallelized algorithms for online extended-term dynamic cascading analysis," PSCE, 2009, pp. 1-7.
[10] J. McCalley and S. Khaitan, "Risk of Cascading Outages," Final Report, PSerc Report S-26, August 2007. Available at http://www.pserc.org/docsa/Executive_Summary_Dobson_McCalley_Cascading_Outage_S-2626_PSERC_Final_Report.pdf
[11] P. R. Amestoy and I. S. Duff, "Vectorization of a multiprocessor multifrontal code," International Journal of Supercomputer Applications, vol. 3, 1989, pp. 41-59.
[12] P. R. Amestoy, I. S. Duff, J. Koster, and J. Y. L'Excellent, "A fully asynchronous multifrontal solver using distributed dynamic scheduling," SIAM Journal on Matrix Analysis and Applications, vol. 23, no. 1, 2001, pp. 15-41.