
World Academy of Science, Engineering and Technology
International Journal of Mathematical and Computational Sciences
Vol:3, No:9, 2009

High Performance Computing Using Out-of-Core Sparse Direct Solvers

Mandhapati P. Raju and Siddhartha Khaitan
Abstract—In-core memory requirement is a bottleneck in solving large three-dimensional Navier-Stokes finite element formulations using sparse direct solvers. An out-of-core solution strategy is a viable alternative for reducing the in-core memory requirement while solving large-scale problems. This study evaluates the performance of various out-of-core sequential solvers based on multifrontal or supernodal techniques in the context of finite element formulations for three-dimensional problems on a Windows platform. Three different solvers, HSL_MA78, MUMPS and PARDISO, are compared. The performance of these solvers is evaluated on a 64-bit machine with 16 GB RAM for a finite element formulation of flow through a rectangular channel. It is observed that relatively large problems can be solved using the out-of-core PARDISO solver. The implementation of the Newton and modified Newton iterations is also discussed.

Keywords—Out-of-core, PARDISO, MUMPS, Newton.

Mandhapati P. Raju is currently with General Motors Inc., Warren, MI 48093 USA (phone: 586-986-1365; e-mail: [email protected]).
Siddhartha Khaitan is with Iowa State University, Ames, IA 50011 USA (e-mail: [email protected]).

I. INTRODUCTION

The use of sparse direct solvers in the context of finite element discretization of the Navier-Stokes equations for three-dimensional problems is limited by their huge memory requirement. Nevertheless, direct solvers are preferred because of their robustness. The development of sparse direct solvers based on algorithms such as the multifrontal [1] and supernodal [2] methods has significantly reduced the memory requirements compared to traditional frontal solvers [3]. The superior performance of multifrontal solvers has been demonstrated for different CFD applications [4]-[7] and also in power system simulations [8]-[10]. It has been identified [4]-[7] that the memory requirement is a bottleneck in solving large three-dimensional CFD problems. There are different viable alternatives for overcoming the huge memory requirement. One alternative is to run on a 64-bit machine having large RAM. The second alternative is to use an out-of-core solver, in which the factors are written to disk, thereby minimizing the in-core requirement. The third alternative is to use parallel solvers in a distributed computing environment, where the memory is distributed amongst the different processors. Recent efforts by the authors show that, by using a 64-bit machine with 16 GB RAM, relatively large problems can be handled in-core. However, as the problem size increases, the in-core memory requirement quickly exceeds 16 GB. Increasing the amount of RAM is very expensive. On the other hand, out-of-core solvers can handle very large problems with smaller in-core memory requirements. The disadvantage of using out-of-core solvers is that the computational time increases due to the I/O operations on the disk. In the recent past there has been a lot of research on reducing the I/O time and making out-of-core solvers efficient. The capability and performance of out-of-core solvers in the context of a finite element Navier-Stokes code are assessed in this paper. Three state-of-the-art out-of-core solvers, MUMPS, HSL_MA78 and PARDISO, are evaluated. To the best of the authors' knowledge, no such comparison of the performance of out-of-core solvers has been reported in the literature.

MUMPS [11]-[13] is a parallel direct solver with out-of-core functionality and is available in the public domain. PARDISO [2], [14]-[17] also has an out-of-core solver and is available as part of the Intel Math Kernel Library [18]. HSL_MA78 [19], an out-of-core solver, is available as part of HSL 2007, which is available free of charge to UK researchers. An evaluation version of HSL_MA78 is used in this paper.

In finite element Navier-Stokes formulations, the set of linear equations generated from the primitive-variable formulation usually has a coefficient matrix with zero diagonal entries. The penalty formulation yields non-zero diagonal entries, but it is observed that these diagonal entries are a few orders of magnitude smaller than the off-diagonal entries. Iterative solution methods fail or pose severe convergence problems for such ill-conditioned matrices. Although iterative solvers are memory efficient, the resolution of their convergence issues is not straightforward and results in a lack of robustness. The performance of a suite of iterative solvers is compared with that of the out-of-core direct solvers to demonstrate the superiority of direct solvers.

II. MATHEMATICAL FORMULATION

A benchmark rectangular channel flow problem is chosen for evaluating the out-of-core solvers. The governing equations for laminar flow inside a rectangular channel are presented below in non-dimensional form. In the three-dimensional calculations, the penalty approach is used instead of the primitive-variable formulation to reduce the memory requirements.

\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}} = 0,   (1)

\frac{\partial}{\partial \hat{x}}(\hat{u}^2) + \frac{\partial}{\partial \hat{y}}(\hat{u}\hat{v}) + \frac{\partial}{\partial \hat{z}}(\hat{u}\hat{w}) = \lambda \frac{\partial}{\partial \hat{x}}\left(\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}}\right) + \frac{\partial}{\partial \hat{x}}\left(\frac{2}{Re}\frac{\partial \hat{u}}{\partial \hat{x}}\right) + \frac{\partial}{\partial \hat{y}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{y}} + \frac{\partial \hat{v}}{\partial \hat{x}}\right)\right) + \frac{\partial}{\partial \hat{z}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{x}}\right)\right),   (2)

\frac{\partial}{\partial \hat{x}}(\hat{u}\hat{v}) + \frac{\partial}{\partial \hat{y}}(\hat{v}^2) + \frac{\partial}{\partial \hat{z}}(\hat{v}\hat{w}) = \lambda \frac{\partial}{\partial \hat{y}}\left(\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}}\right) + \frac{\partial}{\partial \hat{x}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{y}} + \frac{\partial \hat{v}}{\partial \hat{x}}\right)\right) + \frac{\partial}{\partial \hat{y}}\left(\frac{2}{Re}\frac{\partial \hat{v}}{\partial \hat{y}}\right) + \frac{\partial}{\partial \hat{z}}\left(\frac{1}{Re}\left(\frac{\partial \hat{v}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{y}}\right)\right),   (3)

and

\frac{\partial}{\partial \hat{x}}(\hat{u}\hat{w}) + \frac{\partial}{\partial \hat{y}}(\hat{v}\hat{w}) + \frac{\partial}{\partial \hat{z}}(\hat{w}^2) = \lambda \frac{\partial}{\partial \hat{z}}\left(\frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}}\right) + \frac{\partial}{\partial \hat{x}}\left(\frac{1}{Re}\left(\frac{\partial \hat{u}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{x}}\right)\right) + \frac{\partial}{\partial \hat{y}}\left(\frac{1}{Re}\left(\frac{\partial \hat{v}}{\partial \hat{z}} + \frac{\partial \hat{w}}{\partial \hat{y}}\right)\right) + \frac{\partial}{\partial \hat{z}}\left(\frac{2}{Re}\frac{\partial \hat{w}}{\partial \hat{z}}\right),   (4)

where \hat{u}, \hat{v}, \hat{w} are the components of velocity, Re is the bulk flow Reynolds number and \lambda is the penalty parameter. Velocities are non-dimensionalized with respect to the inlet velocity and the coordinates are non-dimensionalized with respect to the channel length.

The boundary conditions are prescribed as follows:

(1) Along the channel inlet: \hat{u} = 1; \hat{v} = 0; \hat{w} = 0.   (5)

(2) Along the channel exit: \frac{\partial \hat{u}}{\partial \hat{x}} = 0; \frac{\partial \hat{v}}{\partial \hat{x}} = 0; \frac{\partial \hat{w}}{\partial \hat{x}} = 0.   (6)

(3) Along the walls: \hat{u} = 0; \hat{v} = 0; \hat{w} = 0.   (7)

The flow Reynolds number is taken as 50 to simulate laminar flow inside the channel.
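For readers unfamiliar with the penalty formulation, the \lambda-terms in (2)-(4) stand in for the pressure gradient: in the standard penalty method (a textbook result, stated here for completeness rather than taken from this paper) the pressure is recovered from the velocity field as

\hat{p} = -\lambda \left( \frac{\partial \hat{u}}{\partial \hat{x}} + \frac{\partial \hat{v}}{\partial \hat{y}} + \frac{\partial \hat{w}}{\partial \hat{z}} \right),

so the pressure degrees of freedom are eliminated from the discrete system. This is what reduces the memory requirement relative to the primitive-variable formulation, and it is also why the assembled matrices become ill conditioned for large \lambda, as noted in the introduction.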

III. NUMERICAL FORMULATION

The Galerkin finite element method (GFEM) is used for the discretization of the above penalty-based Navier-Stokes equations. Three-dimensional brick elements are used, and the velocity components are interpolated trilinearly. The nonlinear system of equations obtained from the GFEM is solved by Newton's method. Let X^{(i)} be the vector of field unknowns available at the i-th iteration. Then the update for the (i+1)-th iteration is obtained as

X^{(i+1)} = X^{(i)} + \alpha\, \delta X^{(i)},   (8)

where \alpha is an under-relaxation factor and \delta X^{(i)} is the correction vector obtained by solving the linearized system

[J]^{(i)} \{\delta X^{(i)}\} = -\{R_X\}^{(i)}.   (9)

Here, [J]^{(i)} is the Jacobian matrix for the i-th iteration,

[J]^{(i)} = \frac{\partial \{R_X\}^{(i)}}{\partial X^{(i)}},   (10)

and \{R_X\}^{(i)} is the residual vector. Newton's iteration is continued until the infinity norm of the correction vector \delta X^{(i)} converges to a prescribed tolerance of 10^{-10}. A modified Newton method is also used in this study. For modified Newton, eq. (9) is modified as shown in eq. (11):

[J]^{(0)} \{\delta X^{(i)}\} = -\{R_X\}^{(i)}.   (11)

In the modified Newton method the Jacobian is evaluated only during the first iteration. Consequently, the Jacobian is factorized only once. For all subsequent iterations, the same Jacobian (and hence its LU factors) is used repeatedly. This algorithm is referred to as modified Newton. Since factorization is the most expensive part of the computation, using the modified Newton algorithm allows the expensive factorization step to be skipped after the first iteration.
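The difference between eqs. (9) and (11) is only in which Jacobian is factorized. The following is a minimal sketch of the two update loops, assuming user-supplied residual(X) and jacobian(X) routines (illustrative names, not from the paper; jacobian is assumed to return a SciPy sparse matrix) and using SciPy's in-core sparse LU factorization merely as a stand-in for the out-of-core solvers discussed below:

import numpy as np
from scipy.sparse.linalg import splu

def newton(X, residual, jacobian, tol=1e-10, max_iter=50, alpha=1.0):
    """Full Newton: factorize J(X^(i)) at every iteration, eqs. (8)-(9)."""
    for _ in range(max_iter):
        lu = splu(jacobian(X).tocsc())      # expensive: new factorization each step
        dX = lu.solve(-residual(X))
        X = X + alpha * dX                  # eq. (8)
        if np.linalg.norm(dX, np.inf) < tol:
            break
    return X

def modified_newton(X, residual, jacobian, tol=1e-10, max_iter=200, alpha=1.0):
    """Modified Newton: factorize J(X^(0)) once and reuse the LU factors, eq. (11)."""
    lu = splu(jacobian(X).tocsc())          # single factorization
    for _ in range(max_iter):
        dX = lu.solve(-residual(X))         # only forward/back substitution per step
        X = X + alpha * dX
        if np.linalg.norm(dX, np.inf) < tol:
            break
    return X

With a direct solver the trade-off is exactly the one described above: full Newton converges quadratically in fewer iterations, while modified Newton replaces all but one factorization with much cheaper triangular solves.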
We can see that the discretization of the governing partial differential equations (1)-(4), together with the boundary conditions (5)-(7), by the GFEM scheme results in a set of nonlinear equations. The core of solving the resulting nonlinear equations is the solution of a sparse linear system (eq. (9)), which is the most computationally intensive part of the solver, both in terms of CPU time and memory requirement. Here three different out-of-core solvers, MUMPS, HSL_MA78 and PARDISO, are implemented and compared.

To gain maximum computational efficiency, the codes are optimized at three levels.

(a) The first is at the hardware level, by using the optimized Intel MKL BLAS library, which is highly tuned for Intel processors.

(b) The second level is the choice of an efficient state-of-the-art out-of-core solver. Three different out-of-core solvers are evaluated for their performance. The efficiency of an out-of-core solver depends not only on the factorization algorithm but also on the handling of the different I/O operations; for an out-of-core solver, the I/O operations can be a bottleneck depending on how they are performed. HSL_MA78 handles I/O efficiently using the virtual memory management package HSL_OF01, which facilitates reading and writing from direct-access files. Real and integer data have their own buffers, and each buffer can be associated with more than one direct-access file.

(c) The third level is the choice of an efficient algorithm for solving the system of non-linear equations. The choice of the non-linear algorithm can affect the rate of convergence and hence the computational time. The system of non-linear equations is solved using either Newton or Picard iteration. Newton iteration is quite popular and efficient due to its quadratic convergence behavior. If the initial guess is chosen properly, Newton iteration can give convergence in a few iterations. However, its limitation is that the formation of the Jacobian matrices involving derivatives is not always straightforward to compute. In addition, the choice of the initial guess will affect the convergence behavior.
Picard iteration, however, is more robust in terms of its flexibility in choosing an initial guess. Its rate of convergence is only linear, though, and hence it results in more computational time.

The choice of modified Newton or modified Picard can further reduce the computational time significantly. In the modified Newton method, the left hand side matrix is factorized only once and the factors are reused. Since factorization is the bottleneck, avoiding the factorization in the subsequent steps reduces the computational time. The rate of convergence is then no longer quadratic but linear. Nevertheless, there are significant savings in computational time. This paper discusses only the Newton and modified Newton implementations.

IV. RESULTS AND DISCUSSION

In this paper, flow inside a three-dimensional rectangular channel is considered. Three-dimensional brick finite elements are used for generating the grid. A weak Galerkin finite element formulation is used to discretize the Navier-Stokes equations to form a large set of non-linear equations. Newton's iteration is used to generate a set of linear algebraic equations. The matrices generated from such a discretization are usually very sparse, and hence a good sparse solver is used to reduce the computational effort. It is to be noted that for three-dimensional grids, the matrices generated are less sparse than those generated from a two-dimensional grid.

Typically, an interior node in a three-dimensional grid is connected to 27 nodes, including itself. Since there are 3 dof's at each node, a typical row consists of 81 non-zero entries; in a two-dimensional grid, a typical row consists of only 27 non-zero entries. This increases the frontal size considerably. Hence, solving three-dimensional problems using direct solvers is quite challenging both in terms of computational time and memory requirements. Large problems cannot be solved on a 32-bit machine using in-core techniques [4], [7]. This paper studies the performance of out-of-core direct solvers on a 64-bit machine with 16 GB RAM. All the computations are run on a Windows machine with an Intel Xeon processor.
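The problem sizes reported in Tables I-IX follow directly from this element layout. A small sketch of the bookkeeping (the helper name is illustrative, not from the paper; the factor-of-81 estimate applies to interior rows only, so the nonzero count is an upper bound):

def channel_problem_size(nex, ney, nez, dof_per_node=3):
    """Degrees of freedom and a rough nonzero estimate for a structured brick mesh."""
    nodes = (nex + 1) * (ney + 1) * (nez + 1)   # 8-node bricks: one node per grid vertex
    ndof = dof_per_node * nodes
    nnz_per_row = 27 * dof_per_node             # interior node couples to 27 nodes -> 81 entries
    return ndof, nnz_per_row * ndof             # upper-bound estimate of matrix nonzeros

print(channel_problem_size(30, 30, 30))    # (89373, ...)  matches the #dof's in Tables I-V
print(channel_problem_size(200, 80, 40))   # (2002563, ...) the ~2 million dof case in Table VIII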
Before comparing the various solvers for their relative performance, each individual solver is tuned for its optimal performance, specifically the choice of the ordering package. Each solver has inbuilt ordering packages, whose choice can affect the performance of the solver. In addition, there are other parameters, such as the pivot tolerance, which also affect the performance of the solver.

MUMPS solver

The sequential version of the out-of-core MUMPS solver is built on a 64-bit machine. The out-of-core option is invoked by setting the value of mumps_par%ICNTL(22) to 1. MUMPS has different inbuilt ordering packages (AMD [20], QAMD [21], AMF, PORD [22]). In addition, there is a provision to link METIS [23] as an external package. The memory relaxation is taken as 100%. The MUMPS out-of-core solver is used for all the cases. Table I shows the comparison of the performance of the various ordering methods. All the cases are run on a 30x30x30 mesh. The CPU time and memory for each ordering are compared; the CPU time reported is that of the first Newton iteration. The CPU time and memory requirement for the complete in-core solution are also included in brackets for a quick comparison. It is to be noted first that the out-of-core solution is around 3-5 times slower than the in-core solution. Of all the ordering packages, METIS gives the best results. Compared to AMD, METIS results in almost one-third of the floating point operations. The computational time and memory requirements are the lowest for the METIS ordering. The nested dissection algorithm of METIS is found to generate good orderings for three-dimensional meshes. Based on this result, METIS ordering is used for all subsequent runs with the MUMPS solver.

TABLE I: COMPARISON OF ORDERING METHODS FOR THE MUMPS SOLVER (30x30x30 GRID; IN-CORE VALUES IN BRACKETS)

Ordering   #dof's   CPU time (sec)    Memory (GB), in-core arrays   Memory (GB), out-of-core files
AMD        89373    438.4 (142.8)     1.4  (4.06)                   2
QAMD       89373    446.6 (142.75)    1.4  (4.04)                   2
AMF        89373    352   (105.7)     1.34 (3.48)                   1.67
PORD       89373    309   (86.6)      1.09 (3.18)                   1.5
METIS      89373    250.3 (55.01)     0.78 (3.02)                   1.28

HSL_MA78 solver

The HSL_MA78 solver does not have any internal ordering routines; however, the HSL package provides other routines that order the finite element entries to reduce the fill-in during factorization. HSL_MC68 is generally used for efficient ordering of finite element matrices. In addition, external ordering packages can be hooked to the HSL_MA78 solver; in this paper METIS ordering is also used by hooking the METIS library to the solver. Table II shows the effect of HSL_MC68 and METIS ordering on the CPU time and memory of the HSL_MA78 solver. It is found that METIS performs better than HSL_MC68. Hence METIS is used for all subsequent runs with the HSL_MA78 solver.

TABLE II: COMPARISON OF ORDERING METHODS FOR THE HSL_MA78 SOLVER (30x30x30 GRID; IN-CORE VALUES IN BRACKETS)

Ordering    #dof's   CPU time (sec)   Memory (GB), in-core arrays   Memory (GB), out-of-core files
HSL_MC68    89373    536 (524)        1.4  (6.88)                   3.5
METIS       89373    321 (318)        0.79 (4.72)                   2

PARDISO solver

PARDISO has minimum degree (MD) and METIS ordering hooked internally within the solver, and the user has the choice to use either of the ordering techniques. Table III compares the effect of MD and METIS ordering on the performance of the PARDISO solver. It is found that METIS performs better than the MD ordering technique. Hence METIS is used for all subsequent runs with the PARDISO solver.

TABLE III: COMPARISON OF ORDERING METHODS FOR THE PARDISO SOLVER (30x30x30 GRID; IN-CORE VALUES IN BRACKETS)

Ordering   #dof's   CPU time (sec)   Memory (GB), in-core arrays   Memory (GB), out-of-core files
MD         89373    284 (162)        0.5 (2.71)                    2.8
METIS      89373    97  (59)         0.3 (1.42)                    1.25

Table IV and Table V show the time split between the different phases of the solvers for the in-core and out-of-core solutions. Interestingly, the performance of the in-core and out-of-core HSL_MA78 solver is almost the same in terms of computational time. This may be because of the efficient I/O handling in the HSL_MA78 package: it uses virtual memory management through the HSL_OF01 package for the I/O operations. This strategy is found to be very effective in developing good out-of-core solvers. Although the out-of-core HSL_MA78 performs well in comparison to its in-core counterpart, its overall computation time is much larger than that of the other solvers. The MUMPS out-of-core solver is about 4 times slower than its in-core solver. The PARDISO out-of-core solver is around 1.6 times slower than its in-core solver. Another interesting observation is that out-of-core PARDISO has a relatively large solve phase compared to out-of-core MUMPS. PARDISO has a much smaller in-core memory requirement than MUMPS or HSL_MA78.

TABLE IV: COMPARISON OF TIME SPLIT FOR THE IN-CORE SOLUTION OF THE 30x30x30 GRID (-: NOT REPORTED)

In-core solver   Matrix assembly (s)   Analysis phase (s)   Numeric phase (s)   Solve phase (s)   Total time (s)   Memory (GB)
MUMPS            4                     1.51                 53.3                0.58              59.39            3.02
PARDISO          4                     2.19                 52.6                0.47              59.26            1.42
HSL_MA78         4                     0.64                 313.14              -                 317.78           4.72

TABLE V: COMPARISON OF TIME SPLIT FOR THE OUT-OF-CORE SOLUTION OF THE 30x30x30 GRID (-: NOT REPORTED)

Out-of-core solver   Matrix assembly (s)   Analysis phase (s)   Numeric phase (s)   Solve phase (s)   Total time (s)   In-core memory (GB)   Out-of-core files (GB)
MUMPS                4                     1.6                  243.2               1.45              250.25           0.78                  1.28
PARDISO              4                     2.2                  87.76               3.07              97.03            0.3                   1.25
HSL_MA78             4                     0.64                 314                 -                 318.64           0.79                  3.174

Tables VI-VIII show the performance of the out-of-core MUMPS, HSL_MA78 and PARDISO solvers for different grid sizes. The performance of the in-core solution is also presented for relative comparison. Table IV shows that the out-of-core MUMPS solver is more than 4 times slower than the in-core solver; this shows that the out-of-core implementation of the MUMPS solver is less efficient. Its in-core memory requirement, however, is kept low. Surprisingly, the out-of-core HSL_MA78 solver is very efficient with respect to the in-core solver: the computational times of the in-core and out-of-core implementations are almost the same, and the in-core memory of the out-of-core solver is kept low. The out-of-core memory requirement is larger for the HSL_MA78 solver than for the other two solvers.

TABLE VI: PERFORMANCE OF THE MUMPS SOLVER ON DIFFERENT GRID SIZES (OOC = OUT-OF-CORE; *: NOT RUN IN-CORE)

nex   ney   nez   #dof's    In-core CPU (min)   In-core mem (GB)   OOC CPU (min)   OOC in-core mem (GB)   OOC files (GB)
50    10    10    18513     0.047               0.23               0.31            0.075                  0.11
100   10    10    36663     0.089               0.52               0.63            0.12                   0.23
200   10    10    72963     0.177               1.1                1.25            0.2                    0.465
50    20    10    35343     0.14                0.65               0.82            0.17                   0.285
100   20    10    69993     0.278               1.4                1.74            0.27                   0.612
100   20    20    133623    1.127               3.87               5.31            0.72                   1.72
100   50    20    324513    6.4                 13.42              21.39           2.54                   6.13
100   50    50    788103    *                   *                  126.1           10.2                   24.1
50    20    20    67473     0.45                1.81               2.36            0.47                   0.78
50    50    10    85833     0.495               2.11               2.72            0.45                   0.925
50    50    20    163863    2.228               5.93               8.67            1.3                    2.64
50    50    50    397953    *                   *                  42.5            5                      10.12

TABLE VII: PERFORMANCE OF THE HSL_MA78 SOLVER ON DIFFERENT GRID SIZES (OOC = OUT-OF-CORE; *: NOT RUN IN-CORE)

nex   ney   nez   #dof's    In-core CPU (min)   In-core mem (GB)   OOC CPU (min)   OOC in-core mem (GB)   OOC files (GB)
50    10    10    18513     0.18                0.24               0.193           0.14                   0.32
100   10    10    36663     0.35                0.5                0.363           0.14                   0.625
200   10    10    72963     0.59                0.79               0.612           0.14                   1.15
50    20    10    35343     0.78                0.65               0.8             0.23                   0.84
100   20    10    69993     1.06                1.53               1.074           0.23                   1.475
100   20    20    133623    4.83                3.91               3.936           0.5                    3.54
100   50    20    324513    28.9                13.94              23.52           1.56                   12.67
100   50    50    788103    *                   *                  542.6           8.1                    49.95
50    20    20    67473     2.42                1.98               2.46            0.5                    1.93
50    50    10    85833     2.65                2.38               2.69            0.5                    2.37
50    50    20    163863    11.9                7.12               11.3            1.56                   6.45
50    50    50    397953    *                   *                  358             5.8                    24

Table VIII shows the performance of the PARDISO solver. The out-of-core solver is around two times slower than the in-core solver. Overall, the PARDISO out-of-core solver is much faster than the other two solvers, and its in-core memory requirement is kept very low. For a 100x50x50 mesh, out-of-core MUMPS requires around 10 GB of in-core memory, out-of-core HSL_MA78 requires around 8 GB of in-core memory, and out-of-core PARDISO requires around 4 GB of in-core memory. Hence PARDISO can solve much finer grids than the other two solvers. The finest grid sizes chosen to solve with PARDISO in this paper are 150x75x30 and 200x80x40, which consist of around 1 million and 2 million degrees of freedom, respectively. The 150x75x30 grid requires around 5.5 GB of in-core memory and around 31 GB of out-of-core memory; it takes around 172 minutes for one Newton iteration. The 200x80x40 grid requires around 10.5 GB of in-core memory and around 75 GB of out-of-core memory; one Newton iteration takes around 16.5 hours of CPU time. Thus we observe that out-of-core PARDISO can solve very large three-dimensional problems with direct solvers on a single desktop. Both in terms of computational time and memory requirement, PARDISO is found to be the best solver.

TABLE VIII: PERFORMANCE OF THE PARDISO SOLVER ON DIFFERENT GRID SIZES (OOC = OUT-OF-CORE; *: NOT RUN IN-CORE)

nex   ney   nez   #dof's     In-core CPU (min)   In-core mem (GB)   OOC CPU (min)   OOC in-core mem (GB)   OOC files (GB)
50    10    10    18513      0.038               0.08               0.067           0.087                  0.1
100   10    10    36663      0.073               0.25               0.15            0.17                   0.22
200   10    10    72963      0.157               0.57               0.304           0.34                   0.47
50    20    10    35343      0.105               0.29               0.209           0.17                   0.27
100   20    10    69993      0.243               0.69               0.515           0.337                  0.593
100   20    20    133623     1.073               1.92               1.758           0.665                  1.693
100   50    20    324513     6.517               6.62               9.5             1.65                   6.035
100   50    50    788103     *                   *                  114.2           4.1                    24.5
50    20    20    67473      0.432               0.85               0.731           0.33                   0.763
50    50    10    85833      0.457               1.02               0.84            0.42                   0.895
50    50    20    163863     2.233               2.86               3.4             0.83                   2.6
50    50    50    397953     17.650              10.67              25.4            2.03                   10.15
150   75    30    1067268    *                   *                  171.95          5.52                   31
200   80    40    2002563    *                   *                  993             10.5                   74.8

Correlations are generated for the CPU times and memory requirements of all the solvers with respect to the grid size. Equations (12)-(14) give the correlations for the MUMPS solver, (15)-(17) for HSL_MA78 and (18)-(20) for PARDISO. In these equations, T refers to the CPU time in minutes taken by the solver for one Newton iteration; it includes the time for the generation of the matrix, the analysis phase, the factorization phase and the solve phase. M refers to the memory requirement in gigabytes; the subscripts incore and outofcore refer to the in-core and out-of-core memory requirements of the out-of-core solver. nex, ney and nez refer to the number of grid elements in the x, y and z directions respectively, n refers to the total number of degrees of freedom, and ar_1 and ar_2 refer to the grid aspect ratios nex/ney and nex/nez.

MUMPS:
T = 3.56 \times 10^{-7}\, n^{1.447}\, ar_1^{-0.127}\, ar_2^{-0.127};\quad R^2 = 0.95   (12)
M_{incore} = 8.28 \times 10^{-7}\, n^{1.214}\, ar_1^{-0.197}\, ar_2^{-0.197};\quad R^2 = 0.98   (13)
M_{outofcore} = 2.11 \times 10^{-7}\, n^{1.377}\, ar_1^{-0.127}\, ar_2^{-0.127};\quad R^2 = 0.98   (14)

HSL_MA78:
T = 3.53 \times 10^{-8}\, n^{1.707}\, ar_1^{-0.362}\, ar_2^{-0.362};\quad R^2 = 0.7   (15)
M_{incore} = 2.6 \times 10^{-4}\, n^{0.736}\, ar_1^{-0.33}\, ar_2^{-0.33};\quad R^2 = 0.98   (16)
M_{outofcore} = 3.42 \times 10^{-6}\, n^{1.219}\, ar_1^{-0.155}\, ar_2^{-0.155};\quad R^2 = 0.99   (17)

PARDISO:
T = 4.07 \times 10^{-9}\, n^{1.757}\, ar_1^{-0.174}\, ar_2^{-0.174};\quad R^2 = 0.85   (18)
M_{incore} = 3.84 \times 10^{-6}\, n^{1.02}\, ar_1^{-0.01}\, ar_2^{-0.01};\quad R^2 = 0.99   (19)
M_{outofcore} = 1.39 \times 10^{-7}\, n^{1.407}\, ar_1^{-0.112}\, ar_2^{-0.112};\quad R^2 = 0.99   (20)

The correlations give an idea of how the solver requirements vary as the grid size is modified. The exponent of n is greater than 1 in most of the correlations, indicating that the CPU time and memory requirements generally increase superlinearly with the number of degrees of freedom. For the out-of-core solvers, the exponents of n in the CPU-time correlations are similar for all three solvers, with MUMPS having the lowest, around 1.45. The CPU time and memory requirement are not only a function of the number of degrees of freedom but also a function of the grid aspect ratios. The absolute values of the exponents of the aspect ratios are larger for the HSL_MA78 solver than for MUMPS and PARDISO; this indicates that the solver performance is also a strong function of the grid distribution. The memory requirement of an out-of-core solver consists of the in-core memory requirement (for holding the frontal matrices and other working arrays) and the out-of-core memory requirement (the LU factors written to disk); correlations are presented for both. An interesting observation is that the in-core memory requirement of PARDISO is the least amongst the three solvers, varies almost linearly with the number of degrees of freedom, and is almost independent of the grid distribution. This behavior of out-of-core PARDISO makes it very well suited for solving large three-dimensional finite element problems.
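The paper does not state how the correlations were obtained; a straightforward way to reproduce fits of this form is an ordinary least-squares fit in log space. The sketch below is an illustration of that procedure (not the authors' code), using a few out-of-core PARDISO rows from Table VIII as sample data:

import numpy as np

# (nex, ney, nez, n_dof, out-of-core CPU time in minutes) taken from Table VIII
rows = [
    (100, 20, 20, 133623, 1.758),
    (100, 50, 20, 324513, 9.5),
    (100, 50, 50, 788103, 114.2),
    (50,  50, 50, 397953, 25.4),
    (150, 75, 30, 1067268, 171.95),
]

# Model: T = C * n^a * ar1^b * ar2^b  with ar1 = nex/ney, ar2 = nex/nez.
# Taking logs gives a linear problem: log T = log C + a*log n + b*(log ar1 + log ar2).
A, y = [], []
for nex, ney, nez, n, T in rows:
    ar1, ar2 = nex / ney, nex / nez
    A.append([1.0, np.log(n), np.log(ar1) + np.log(ar2)])
    y.append(np.log(T))
(logC, a, b), *_ = np.linalg.lstsq(np.array(A), np.array(y), rcond=None)
print(f"C = {np.exp(logC):.3e}, exponent of n = {a:.3f}, aspect-ratio exponent = {b:.3f}")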

[Figure 1: semi-log plot of the correction norm ||δX||∞ (10^-2 down to 10^-10, left axis) and the cumulative CPU time in seconds (0-400, right axis) against the iteration number (0-7) for the Newton and modified Newton methods.]

Fig. 1 Comparison of CPU time and residual norm for the Newton and modified Newton methods on the 30x30x30 grid

Figure 1 shows the performance of the Newton and modified Newton algorithms using out-of-core PARDISO as the linear solver for a 30x30x30 grid. No under-relaxation is used for either the Newton or the modified Newton method. Newton's method converges in 4 iterations and quadratic convergence is observed. The modified Newton method converges in 6 iterations, and linear convergence is observed. Significant savings in computational time are achieved using the modified Newton method: the Newton iterations converge in 381 seconds, whereas the modified Newton iterations converge in 128 seconds.

TABLE IX: COMPARISON OF CPU TIMES FOR NEWTON AND MODIFIED NEWTON (OUT-OF-CORE PARDISO) FOR DIFFERENT GRIDS

nex   ney   nez   #dof's     Newton CPU time (min)   Modified Newton CPU time (min)
50    10    10    18513      0.34                    0.152
100   10    10    36663      0.8                     0.327
200   10    10    72963      1.78                    0.77
50    20    10    35343      0.99                    0.405
100   20    10    69993      2.35                    0.937
100   20    20    133623     7.65                    2.53
100   50    20    324513     39.51                   11.83
100   50    50    788103     468.1                   122.5
50    20    20    67473      3.27                    1.09
50    50    10    85833      3.57                    1.247
50    50    20    163863     14.4                    4.37
50    50    50    397953     106.4                   27.98
150   75    30    1067268    693                     181.8

Table IX shows the comparison of CPU times for the Newton and modified Newton methods for different grid sizes. It is clearly observed that the implementation of the modified Newton method leads to significant savings in computational time. Further, it is observed that as the number of degrees of freedom increases, the percentage of computational savings of the modified Newton method over the Newton method increases.
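The trend noted above can be read off Table IX directly; for instance, computing the relative saving of modified Newton over full Newton for a few grids (times copied from the table):

# (n_dof, Newton CPU time (min), modified Newton CPU time (min)) from Table IX
cases = [(18513, 0.34, 0.152), (133623, 7.65, 2.53), (788103, 468.1, 122.5), (1067268, 693.0, 181.8)]

for ndof, t_newton, t_modified in cases:
    saving = 100.0 * (1.0 - t_modified / t_newton)
    print(f"{ndof:8d} dofs: saving = {saving:.0f}%")
# Savings grow from roughly 55% on the smallest grid to about 74% on the largest one.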
V. CONCLUSIONS

Three different out-of-core solvers (MUMPS, HSL_MA78, PARDISO) are evaluated for the solution of the finite element Navier-Stokes formulation of laminar flow in a rectangular channel. METIS is found to be the best choice of ordering algorithm for reducing the fill-in of the LU factors. Of the three solvers, PARDISO is found to be the best solver, with lower computational time and lower in-core and out-of-core memory requirements.

It is observed that out-of-core HSL_MA78 performs almost identically to the in-core HSL_MA78 solver; HSL_OF01 facilitates the efficient I/O operations for the HSL_MA78 solver. However, the out-of-core HSL_MA78 is much slower than the MUMPS and PARDISO out-of-core solvers. The out-of-core strategy can help in solving large three-dimensional finite element problems: out-of-core PARDISO could solve around 2 million equations resulting from a three-dimensional finite element formulation on a single desktop. Further, it is observed that the use of the modified Newton algorithm can significantly reduce the computational time as compared to the Newton algorithm.

REFERENCES

[1] T. A. Davis and I. S. Duff, “A combined unifrontal/multifrontal method for unsymmetric sparse matrices,” ACM Trans. Math. Soft., vol. 25, no. 1, 1997, pp. 1–19.
[2] O. Schenk, K. Gartner, and W. Fichtner, “Efficient Sparse LU Factorization with Left-right Looking Strategy on Shared Memory Multiprocessors,” BIT, vol. 40, no. 1, 2000, pp. 158–176.
[3] B. M. Irons, “A frontal solution scheme for finite element analysis,” Numer. Meth. Engg., vol. 2, 1970, pp. 5–32.
[4] M. P. Raju and J. S. T’ien, “Development of Direct Multifrontal Solvers for Combustion Problems,” Numerical Heat Transfer-Part B, vol. 53, 2008, pp. 1–17.
[5] M. P. Raju and J. S. T’ien, “Modelling of Candle Wick Burning with a Self-trimmed Wick,” Comb. Theory Modell., vol. 12, no. 2, 2008, pp. 367–388.
[6] M. P. Raju and J. S. T’ien, “Two-phase flow inside an externally heated axisymmetric porous wick,” vol. 11, no. 8, 2008, pp. 701–718.
[7] P. K. Gupta and K. V. Pagalthivarthi, “Application of Multifrontal and GMRES Solvers for Multisize Particulate Flow in Rotating Channels,” Prog. Comput. Fluid Dynam., vol. 7, 2007, pp. 323–336.
[8] S. Khaitan, J. McCalley, and Q. Chen, “Multifrontal solver for online power system time-domain simulation,” IEEE Transactions on Power Systems, vol. 23, no. 4, 2008, pp. 1727–1737.
[9] S. Khaitan, C. Fu, and J. D. McCalley, “Fast parallelized algorithms for online extended-term dynamic cascading analysis,” PSCE, 2009, pp. 1–7.
[10] J. McCalley and S. Khaitan, “Risk of Cascading Outages,” Final Report, PSERC Report S-26, August 2007. Available: http://www.pserc.org/docsa/Executive_Summary_Dobson_McCalley_Cascading_Outage_S-2626_PSERC_Final_Report.pdf
[11] P. R. Amestoy and I. S. Duff, “Vectorization of a multiprocessor multifrontal code,” International Journal of Supercomputer Applications, vol. 3, 1989, pp. 41–59.
[12] P. R. Amestoy, I. S. Duff, J. Koster, and J. Y. L’Excellent, “A fully asynchronous multifrontal solver using distributed dynamic scheduling,” SIAM Journal on Matrix Analysis and Applications, vol. 23, no. 1, 2001, pp. 15–41.

[13] P. R. Amestoy, I. S. Duff, and J. Y. L’Excellent, “Multifrontal parallel distributed symmetric and unsymmetric solvers,” Comput. Methods Appl. Mech. Eng., vol. 184, 2000, pp. 501–520.
[14] O. Schenk, “Scalable Parallel Sparse LU Factorization Methods on
Shared Memory Multiprocessors,” Ph.D. dissertation, ETH Zurich,
2000.
[15] O. Schenk, and K. Gartner, “Sparse Factorization with Two-Level
Scheduling in PARDISO,” in Proc. 10th SIAM conf. Parallel Processing
for Scientific Computing, Portsmouth, Virginia, March 12-14, 2001.
[16] O. Schenk, and K. Gartner, “Two-level scheduling in PARDISO:
Improved Scalability on Shared Memory Multiprocessing Systems,”
Parallel Computing, vol. 28, 2002, pp. 187-197.
[17] O. Schenk, and K. Gartner, “Solving Unsymmetric Sparse Systems of
Linear Equations with PARDISO,” Journal Future Generation
Computer Systems, vol. 20, no. 3, 2004, pp. 475-487.
[18] Intel MKL Reference Manual, Intel® Math Kernel Library (MKL), 2007.
Available: http://www.intel.com/software/products/mkl/
[19] J. A. Scott, Numerical Analysis Group Progress Report, RAL-TR-2008-
001, Rutherford Appleton Laboratory, 2008.
[20] P. R. Amestoy, T. A. Davis, and I. S. Duff, “An approximate minimum
degree ordering algorithm,” SIAM Journal on Matrix Analysis and
Applications, vol. 17, 1996, pp. 886–905.
[21] P. R. Amestoy, “Recent progress in parallel multifrontal solvers for unsymmetric sparse matrices,” in Proc. 15th World Congress on Scientific Computation, Modelling and Applied Mathematics, IMACS, Berlin, 1997.
[22] J. Schulze, “Towards a tighter coupling of bottom-up and top-down
sparse matrix ordering methods,” BIT, vol. 41, no. 4, 2001, pp. 800–841.
[23] G. Karypis, and V. Kumar, “METIS – A Software Package for
Partitioning Unstructured Graphs, Partitioning Meshes, and Computing
Fill-Reducing Orderings of Sparse Matrices – Version 4.0,” University
of Minnesota, September 1998.
