ECSE 420 - Parallel Cholesky Algorithm - Report

The document summarizes a study comparing serial, OpenMP, and MPI implementations of the Cholesky decomposition algorithm. It describes: 1) The motivation for studying parallel Cholesky decomposition given its computational complexity and potential for parallelization. 2) How the MPI implementation distributes matrix rows across processes to minimize communication, broadcasting rows as needed. 3) How the OpenMP implementation parallelizes the inner loop using #pragma directives without needing changes to the serial algorithm. 4) Tests to check the correctness and accuracy of results from the different implementations on varying matrix sizes and numbers of threads/processes.

Uploaded by

piohm

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

284 views

ECSE 420 - Parallel Cholesky Algorithm - Report

Uploaded by

piohm

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

1

ECSE 420 - Fast Cholesky: a serial, OpenMP and

MPI comparative study
Renaud Jacques-Dagenais, B en edicte Leonard-Cannon, Payom Meshgin, Samantha Rouphael
I. INTRODUCTION
The aim of this project is to analyze different parallel
implementations of the Cholesky decomposition algorithm
to determine which is most efcient in executing the oper-
ation. In particular, we will study a shared memory-based
implementation in OpenMP as well as a message-passing
based implementation using MPI, all programmed in the C
programming language. These parallel versions will also be
compared against a plain, serial version of the algorithm.
To test our implementations of Cholesky decomposition,
we employ a series of tests to gauge the performance of the
programs under varying matrix sizes and number of parallel
threads.
II. MOTIVATION
Many mathematical computations involve solving large
sets of linear equations. In general, the LU decomposition
method applies for any square matrix. However, for certain
common classes of matrices, other methods are more efcient.
One such method commonly used to solve these problems
is Cholesky decomposition. This method is notably used in
engineering applications such as circuit simulation and nite
element analysis.
Sadly, Cholesky decomposition is relatively demanding in
terms of computational operations. Indeed, the computation
time of the algorithm scales quite poorly with respect to
matrix size, with a complexity of order O(n
3
) [Gallivan].
Luckily, much of the work can be parallelized, allowing for
higher performance in less time.
III. BACKGROUND THEORY
The Cholesky decomposition algorithm is a special version
of the LU decomposition, where the matrix to be factorized
is square symmetric and positive denite. The key difference
between LU and Cholesky decomposition is that the upper tri-
angular factor determined using the Cholesky decomposition
method is forced to be equal to the transpose of the lower
triangular matrix, rather than an arbitrary matrix. For LL
T
factorable matrices, i.e. matrices that can be represented as
a product of a lower triangular matrix L and its transpose,
the Cholesky algorithm has twice the efciency of the LU
decomposition method.
The algorithm is iterative and recursive, such that after
each major step in the algorithm, the rst column and row
of the L factor are known and the algorithm is applied to the
remainder of the matrix. Hence, the computational domain of
the algorithm successively reduces in size.
We begin with an n x n, symmetric and positive denite
matrix A. These properties ensure the matrix A is LL
T
factorable [Heath]. Given the linear system Ax = b, Cholesky
factorization decomposes the matrix A into the form A =
LL
T
, such that L is a lower triangular matrix and L
T
is
its transpose. Once L is computed, the original system can
be solved trivially by solving the matrix equation L(y) = b
(also called forward substitution), following by solving the
equation L
T
x = y (also called backward substitution). For the
scope of this project, we will strictly focus on parallelizing the
factorization algorithm rather than the entire task of solving
Ax = b.
IV. IMPLEMENTATION
A. Serial Cholesky Algorithm
There are three popular implementations of the Cholesky
decomposition algorithm, each based on which elements of
the matrix are computed rst within the innermost for loop
of the algorithm [Gallivan]:
Row-wise Cholesky (row-Cholesky)
Column-Cholesky (column Cholesky)
Block Matrix-Cholesky (block Cholesky)
In the case of the serial implemenation of the algorithm, the
column-wise version was used to limit the number of cache
misses on the system. The column-wise algorithm is shown
below:
for j = 1 to n do
for k = 1 to j-1 do
for i = j to n do
a
ij
= a
ij
a
jk
;
end
end
a
jj
=

a
jj
;
for i = j+1 to n do
a
ij
= a
ij
/a
jj
;
end
end
Algorithm 1: Column-wise Cholesky algorithm
B. MPI
In order to have an efcient MPI implementation of
Cholesky decomposition, it is necessary to distribute com-
putations in a way that minimizes communications between
processes. Before we can gure out what the best solution
is, we have to identify the dependencies. By analyzing the
algorithm, we notice two things: previous columns need to be
updated before any subsequent column and, when updating
entries of the same column, the new value of diagonal entry
has to be computed rst.
This is easier to understand with an example. Suppose we
have a 5x5 matrix and we wish to update entries in column
2
3 (entries in previous columns have already been updated).
Then, the dependencies for each entry in column 3 are the
following (Figure 1).
Fig. 1. Dependency chart for the message passing implementation
Notice that the values that are needed to update an entry are
either located on the same row as the entry or on the row that
contains the entry on the diagonal. Hence, decomposing the
domain into rows is probably the best solution to minimize
communications between processes.
Cyclic assignment of rows to processes can easily be
achieved by taking the remainder of the division of the column
number to the total number of processes (i.e. j%n
processors
).
Note that we can take the column number instead of the
row number because the matrices on which the algorithm is
applied are always square. For instance, if we take the same
5x5 matrix as above and run the algorithm with 3 processes,
rows would be assigned as follows:
Next we need to identify which parts of the algorithm can
be parallelized. Let us recapitulate how Cholesky decompo-
sition works.
For every column:
1) Replace the entries above the diagonal with ze-
roes
2) Update the entry on the diagonal
3) Update entries below the diagonal
The rst step is pointless to parallelize because it involves
no computations.
The second step could be parallelized. We could start by
broadcasting the row that contains the entry on the diagonal,
then use MPI_Reduce() to compute the new value of
the entry on the diagonal and broadcast again to send the
updated value to all processes. However, this would be terribly
inefcient because processes would spend a lot of time
sending and receiving messages. It is more efcient to simply
let the process assigned to that row compute the new value
for the entry on the diagonal and broadcast the whole row
once.
The third step is where we gain performance by paralleliz-
ing tasks. Since processes already have access to the data in
their respective row(s), we only need to broadcast the row
that contains the entry on the diagonal. As soon as a process
gets this data, it can proceed and update the entry(entries) that
correspond to its row(s). Thus, at this point in the program,
processes all compute in parallel.
We also tried to broadcast each individual entry up to the
diagonal. We thought this might yield better performances
because the total amount of data transferred is almost half.
However, it turned out to decrease performances by a factor of
almost 2! This is due to the communication overhead incurred
when transferring data between processes.
Note: The reason why we chose to parallelize the row-
Cholesky algorithm instead of the column-Cholesky algo-
rithm is because there is no simple way to broadcast the
columns of a matrix. We thought about transposing the matrix
before applying the algorithm but we gured that perfor-
mances would probably be worse than by simply applying
the row-Cholesky.
C. OpenMP
The OpenMP implementation of the Cholesky decom-
position algorithm is fairly straight-forward. Starting from
the code for the serial implementation of the Cholesky
algorithm, only a few lines of code were changed to par-
allelize the algorithm with OpenMP. Using the OpenMP
#pragma parallel for directive before the innermost
for loop splits the single thread into a certain number of
shared-memory threads. Moreover, a set of structures for
synchronization (i.e. locks) were explicitly dened in the
code to avoid any potential race conditions. The result of the
modications are shown in the listings section of the report.
V. TESTS
This section outlines the testing methodology undertaken to
analyze and compare the serial, OpenMP and MPI implemen-
tations. Each set of tests is executed on two 4-core machines
with the specications displayed in the Appendix.
A. Test 1
The purpose of the rst test is to verify the correctness
of each of the three implementations. To ensure that each
program executes the right computations, we begin by de-
composing a matrix A into its Cholesky decomposition, LL
T
.
Then, we compare these two matrices based on the fact that
A should be equivalent to LL
T
, using the least absolute error
formula:
S =
n

i=1
|y
i
f(x
i
)| (1)
B. Test 2
The second test is conducted to measure the accuracy of
the decomposed matrix generated by each implementation.
To acquire this measure, a set of matrix operations is applied
both to the original matrix A and the decomposed matrix L
to produce an experimental result x

to be compared to a
theoretical result x. More precisely, we begin by generating a
random matrix A and a unidimensional vector x of the same
length (x being the theoretical value). We then compute the
product b of these matrices using the following equation:
Ax = b (2)
Subsequently, As Cholesky decomposition, LL
T
is gener-
ated. Using these matrices and the result b previously found,
the experimental value x

is computed based on the following

equation:
LL
T
x

= b (3)
Since L and L
T
are triangular matrices, nding x is
straightforward. Indeed, one can replace L
T
x

by y and nd
for y by solving this system of linear equations:
Ly = b (4)
3
Finally, x can be computed in a similar way using:
LTx

= y (5)
The error between the theoretical and experimental vectors
x and x is computed by least absolute error :
S =
n

i=1
|y
i
f(x
i
)| (6)
C. Test 3
The third test aims at comparing the performance of
the three implementations. To acquire this information, we
measure the program execution time with respect to the matrix
size and the number of threads (the latter concerning OpenMP
and MPI exclusively). The serial, OpenMP and MPI programs
are tested with square matrices of sizes varying from 50 to
5000, 50 to 4000 and 50 to 2000/3000 (see discussion for
more details) respectively. The OpenMP and MPI programs
are executed with the number of threads ranging from 2 to
32 and 2 to 9 respectively.
The matrix size and number of threads are inputted as
arguments before running the executables, while the execution
time is obtained by comparing the system time before and
after Cholesky decomposition and printing this value to the
command prompt.
For each program execution, a new random matrix is
generated. In order to ensure consistency between different
runs, each test case is conducted ve times and the average
runtime is logged as the measurement.
VI. RESULTS
The following section summarizes the statistics acquired
after executing each three of the aforementioned tests.
TABLE II
MEASURE OF THE CORRECTNESS OF EACH ALGORITHM
Serial OpenMP MPI
Correctness Passed Passed Passed
The least absolute deviation between the Cholesky decom-
position and the original matrix is computed with a precision
of 0.00000000001 for all implementations. For the serial
algorithm and the OpenMP and MPI versions with a matrix
size from 50 to 3000 and 1 to 9 threads, the results of the
deviation are always exactly 0. Therefore, all implementations
of the algorithm are correct, although it is not certain whether
the parallelization of the task is optimal.
VII. DISCUSSION
In, this section, we discuss and analyze our ndings based
on the data collected during tests 1, 2 and 3.
The results of the rst test show that all three implemen-
tation that we designed are correct, as expected. Indeed, the
error for each one of them is equal to exactly 0. This result is
required to conduct further testing and comparison between
the different implementations since it shows that Cholesky
decomposition is properly programmed for each version.
Surprisingly, the least square error measured in the second
test is precisely 0 for all implementations when we expected
TABLE III
BEST EXECUTION TIMES IN FUNCTION OF THE MATRIX SIZE FOR EACH
IMPLEMENTATION
Matrix
size
Serial execution
time (s)
OpenMP best exe-
cution time (s)
MPI best
execution
time (s)
50 0.00043 0.00113 0.000229
100 0.00464 0.00299 0.00291
200 0.01739 0.01047 0.008708
300 0.07026 0.03178 0.023265
500 0.28296 0.14314 0.10016
1000 3.86278 0.94137 0.833415
1500 11.31204 2.98861 2.856436
2000 30.60744 6.92203 6.927265
3000 113.79156 22.94476 22.438851
4000 284.43736 54.20875 -
5000 546,4589 - -
some loss of precision in the MPI implementation. This may
be explained by the fact that test matrices were diagonally
dominant, and hence the condition number of the matrices
were very low. This property ensures that the decomposition
yields little error, even though we didnt expect it to yield no
error at all.
Fig. 2. Runtime of serial Cholesky algorithm
Using data acquired during the third testing phase, the
execution time versus the matrix size is plotted for each
implementation. For the serial version, we observe that the
execution time increases cubically in function of the matrix
size, as displayed on gure 2. This result corresponds to our
expectations since Cholesky decomposition is composed of
three nested for loops resulting in a time complexity of O(n
3
).
Similarly, we note that the OpenMP and MPI execution times
(for a xed number of threads) increase with respect to the
matrix size, but less dramatically as displayed on Figures 3
and 4. This results illustrates that parallelizing computations
improves execution time.
Second, one can observe from Figure 3 that for most
of the conducted tests, the execution time of the OpenMP
algorithm for a xed matrix size decreases with an increasing
number of threads until 5 threads are summoned. With 5
threads onwards, the execution time generally increases as
more threads are added. However, there is a notable exception;
for matrix sizes above 1000, calling 9 threads often results in
a longer execution time than that of 16 or 32 threads. This
might be due to a particularity of Cholesky decomposition, or
is more likely a result of the underlying computer architecture.
The speedup plotted on Figure 5 illustrates the ratio of
4
TABLE I
EXECUTION TIME IN FUNCTION OF THE MATRIX SIZE AND NUMBER OF THREADS FOR THE OPENMP AND MPI IMPLEMENTATIONS
Fig. 3. Average OpenMP Implementation execution time
Fig. 4. Average OpenMPI Implementation execution time
the computation time of the OpenMP implementation versus
its serial counterpart. As it can be observed, the speedup
grows linearly as the matrix size increases. Indeed, the larger
the matrix size, the more OpenMP outperforms the serial
algorithm since the former completes in O(n
2
) and the latter
in O(n
3
).
It is very interesting to discuss an intriguing problem
encountered with the OpenMP implementation. Initially, a
column Cholesky algorithm was programmed and paral-
Fig. 5. Speedups of Parallel Implementations
lelized. After executing dozens of tests, we noted that the
performance improved as the number of threads was increased
up to 2000 threads -despite the program being run on a 4-
core machine! We suspect that spawning an overwhelming
amount of threads forces the program to terminate prema-
turely, without necessarily returning an error message. As
a result, increasing the amount of threads causes prompter
termination and thus smaller execution times. To address this
problem, a row Cholesky algorithm was programmed instead,
which resulted in logical results as previously discussed. We
chose to completely change the Cholesky implementation
from the column to the row version because OpenMP may
not deliver the expected level of performance with some
algorithms [Chapman].
Figure 4 shows the execution time of the MPI implementa-
tion for varying matrix sizes and number of threads. As it can
be seen, the execution time for a xed matrix size decreases
as the number of threads increases until the number of threads
equals 4 (corresponding to the number of cores on the tested
machine). From that point on, the execution time generally
increases with the number of threads, and one can observe
that this is consistent for different matrix sizes. An interesting
observation can also be made from Table I regarding the
execution time with more than 4 threads. When the number
of threads is set to an even number, the execution time is
less than with the preceding and following odd numbers.
5
Once again, this might be a result of the underlying computer
architecture.
The ratio of the computation time of MPI over the serial
implementation is plotted in Figure 5 and we can observe
that as the matrix size increases, the speedup increases
linearly. In fact, this conrms that MPI outperforms the serial
implementation by a factor of the matrix size n. Interestingly,
we can conclude from these results that MPIs behaviour is
very similar to that of OpenMP.
As previously implied, the best OpenMP and MPI perfor-
mance for most matrix sizes is achieved with an amount of 4
threads. This is due to the fact that tests were conducted on
a 4-core machine; maximum performance is achieved when
the number of threads matches the number of cores [Jie
Chen]. Indeed, a too small amount of threads doesnt fully
exploit the computers parallelization capabilities, while a too
large amount induces important context switching overhead
between processes [Chapman].
A. Serial/MPI/OpenMP comparison
As expected, the parallel implementations result in a
much higher performance than their serial counterpart; obvi-
ously, parallelizing independent computations results in much
smaller runtimes than executing them serially. Surprisingly, in
their best cases, the OpenMP and MPI algorithms complete
Cholesky decomposition in perceptibly identical runtimes,
with MPI having an advantage of a few microseconds, as seen
on Figure 6. We were expecting OpenMP to perform better,
since we believed that message passing between processes
would cause more overhead than using a shared memory
space. However, we have not been able to explain this result
on a theoretical basis.
Fig. 6. Average OpenMPI Speedup
One can observe that the OpenMP and MPI implemen-
tations were tested with a maximum matrix size of 4000
and 3000 respectively, compared to that of 5000 for the
serial implementation. This outcome is due to the fact that
running the program with a size higher than these values
would completely freeze the computer because of memory
limitations. Therefore, the serial implementation seems to
have an advantage over the parallel algorithms when the
matrix size is signicantly large because it doesnt require
additional memory elements such extra threads, processes or
messages. Similarly, MPI was not tested with values of 16 and
32 threads unlike OpenMP because these values would freeze
the computer. We believe that this behavior is explained by the
fact that MPI requires more memory for generating not only
processes, but also messages, while OpenMP only requires
threads.
Finally, it is essential to mention that it is much simpler to
parallelize a program with OpenMP than with MPI [Mall on];
the former only requires a few extra lines of code dening
the parallelizable sections and required synchronization mech-
anisms (locks, barriers, etc), while the latter requires the code
to be restructured in its entirety before setting up the message
passing architecture.
VIII. CONCLUSION
Throughout this project, we evaluated the performance
impact of using different parallelization techniques for the
Cholesky decomposition. First, a serial implementation was
developed and used as the reference base. Second, we used
multicore programming tools, namely OpenMP and MPI, to
apply different parallelization approaches. We inspected the
results of the different implementations by varying the matrix
size (between 0 to 4000) and the number of threads used
(between 1 to 32). As expected, both parallel implementations
result in a higher performance than the serial version. Also,
we observed that MPI resulted in better execution times of a
few microseconds over OpenMP.
In the future, we would like to experiment with a mixed
mode of OpenMP and MPI in the hope of discovering an even
more efcient parallelization scheme for Cholesky decompo-
sition, since this scheme may increase the code performance.
Moreover, we would like to conduct further tests with the
current implementations by running the different programs on
computers with various amounts of CPUs, such as 8,16,32 and
64. We would expect machines with more cores to provide
better execution times.
This project improved our knowledge regarding the dif-
ferent parallelization tools that can be used to parallelize a
program. In fact, we were able to apply the parallel computing
theories learned in class.
IX. REFERENCES
D. Mall on, et al., Performance Evaluation of MPI,
UPC and OpenMP on Multicore Architectures, in Re-
cent Advances in Parallel Virtual Machine and Message
Passing Interface. vol. 5759, M. Ropo, et al., Eds., ed:
Springer Berlin Heidelberg, 2009, pp. 174-184.
J. Shen and A. L. Varbanescu, A Detailed Performance
Analysis of the OpenMP Rodinia Benchmark, Techni-
cal Report PDS-2011-011, 2011.
B. Chapman, et al., Using OpenMP: Portable Shared
Memory Parallel Programming, The MIT Press, 2008.
M.T. Heath. Parallel Numerical Algorithms course,
Chapter 7 - Cholesky Factorization. University of Illi-
nois at Urbana Champaign.
K. A. Gallivan, et al., Parallel Algorithms for Matrix
Computations: Society for Industrial and Applied Math-
ematics, 1990.
L. Smith and M. Bull, Development of mixed mode
MPI / OpenMP applications, Sci. Program., vol. 9, pp.
83-98, 2001.
6
X. APPENDIX
A. Test Machine Specications
1) Machine 1
Hardware:
CPU: Intel Core i3 M350 Quad-Core @ 2.27 GHz
Memory: 3.7 GB
Software:
Ubuntu 12.10 64-bit
Linux kernel 3.5.0-17-generic
GNOME 3.6.0
2) Machine 2
Hardware:
CPU: Intel Core i7 Q 720 Quad-Core @ 1.60 GHz
Memory: 4.00 GB
Software:
Ubuntu 13.04 64-bit
Linux kernel 3.5.0-17-generic
GNOME 3.6.0
XI. CODE
A. matrix.c - Helper Functions
# i nc l ude ma t r i x . h
/ / Pr i n t a s quar e mat r i x .
voi d p r i n t ( doubl e mat r i x , i nt ma t r i x Si z e ) {
i nt i , j ;
f or ( i = 0; i < ma t r i x Si z e ; i ++) {
f or ( j = 0; j < ma t r i x Si z e ; j ++) {
p r i n t f ( %.2 f \ t , ma t r i x [ i ] [ j ] ) ;
}
p r i n t f ( \n ) ;
}
p r i n t f ( \n ) ;
}
/ / Mu l t i p l y t wo s quar e ma t r i c e s of t he same s i z e .
doubl e ma t r i x Mu l t i p l y ( doubl e mat r i x1 , doubl e mat r i x2 , i nt ma t r i x Si z e ) {
/ / Al l o c a t e s memory f o r a mat r i x of doubl e s .
i nt i , j , k ;
doubl e mat r i xOut = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
f or ( i = 0; i < ma t r i x Si z e ; i ++){
mat r i xOut [ i ] = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
}
doubl e r e s u l t = 0;
/ / F i l l each c e l l of t he mat r i x out put .
f or ( i = 0 ; i < ma t r i x Si z e ; i ++){
f or ( j = 0; j < ma t r i x Si z e ; j ++){
/ / Mu l t i p l y each row of mat r i x 1 wi t h each col umn of mat r i x 2.
f or ( k = 0; k < ma t r i x Si z e ; k++){
r e s u l t += ma t r i x1 [ i ] [ k ] ma t r i x2 [ k ] [ j ] ;
}
mat r i xOut [ i ] [ j ] = r e s u l t ;
r e s u l t = 0; / / Re s e t ;
}
7
}
ret urn mat r i xOut ;
}
/ / Add t wo s quar e ma t r i c e s of t he same s i z e .
doubl e ma t r i xAddi t i on ( doubl e mat r i x1 , doubl e mat r i x2 , i nt ma t r i x Si z e ) {
/ / Al l o c a t e s memory f o r a mat r i x of doubl e s .
i nt i , j ;
doubl e mat r i xOut = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
f or ( i = 0; i < ma t r i x Si z e ; i ++){
mat r i xOut [ i ] = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
}
/ / F i l l each c e l l of t he mat r i x out put .
f or ( i = 0 ; i < ma t r i x Si z e ; i ++){
f or ( j = 0; j < ma t r i x Si z e ; j ++){
mat r i xOut [ i ] [ j ] = ma t r i x1 [ i ] [ j ] + ma t r i x2 [ i ] [ j ] ;
}
}
ret urn mat r i xOut ;
}
/ / Mu l t i p l y a s quar e mat r i x by a v e c t o r . Ret ur n n u l l i f f a i l u r e .
doubl e v e c t o r Mu l t i p l y ( doubl e mat r i x , doubl e ve c t or , i nt ma t r i xSi z e , i nt v e c t o r Si z e ) {
doubl e r e s u l t = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
i f ( v e c t o r Si z e ! = ma t r i x Si z e ) {
ret urn NULL;
}
i nt i , j ;
doubl e sum = 0 . 0 ;
/ / Mu l t i p l i c a t i o n .
f or ( i = 0 ; i < ma t r i x Si z e ; i ++){
f or ( j = 0; j < ma t r i x Si z e ; j ++){
sum += ma t r i x [ i ] [ j ] v e c t o r [ j ] ;
}
r e s u l t [ i ] = sum;
sum = 0; / / Re s e t .
}
ret urn r e s u l t ;
}
/ / Ret ur n t he t r a n s p o s e of a s quar e mat r i x .
doubl e t r a n s p o s e ( doubl e mat r i x , i nt ma t r i x Si z e ) {
/ / Al l o c a t e s memory f o r a mat r i x of doubl e s .
i nt i , j ;
doubl e mat r i xOut = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
f or ( i = 0; i < ma t r i x Si z e ; i ++){
mat r i xOut [ i ] = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
}
/ / Tr ans pos e t he mat r i x .
f or ( i = 0 ; i < ma t r i x Si z e ; i ++){
f or ( j = 0; j < ma t r i x Si z e ; j ++){
8
mat r i xOut [ i ] [ j ] = ma t r i x [ j ] [ i ] ;
}
}
ret urn mat r i xOut ;
}
/ / Cr eat e a r e a l p o s i t i v e d e f i n i t e mat r i x .
doubl e i n i t i a l i z e ( i nt mi nVal ue , i nt maxValue , i nt ma t r i x Si z e ) {
/ / Al l o c a t e s memory f o r a ma t r i c e s of doubl e s .
i nt i , j ;
doubl e ma t r i x = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
doubl e i d e n t i t y = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
f or ( i = 0; i < ma t r i x Si z e ; i ++){
ma t r i x [ i ] = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
i d e n t i t y [ i ] = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
}
/ / Cr e at e s an uppert r i a n g u l a r mat r i x of random numbers bet ween mi nVal ue and maxVal ue .
/ / Cr e at e s an i d e n t i t y mat r i x mu l t i p l i e d by maxVal ue .
doubl e random ;
f or ( i = 0 ; i < ma t r i x Si z e ; i ++){
i d e n t i t y [ i ] [ i ] = maxValue ma t r i x Si z e ;
f or ( j = 0 ; j < ma t r i x Si z e ; j ++){
random = ( maxValue mi nVal ue )
( ( doubl e ) r and ( ) / ( doubl e )RAND MAX) + mi nVal ue ;
i f ( random == 0 . 0 ) {
random = 1 . 0 ; / / Avoi d d i v i s i o n by 0.
}
ma t r i x [ i ] [ j ] = random ;
}
}
/ / Tr ans f or m t o p o s i t i v e d e f i n i t e .
doubl e t r a n s p o s e d = t r a n s p o s e ( mat r i x , ma t r i x Si z e ) ;
ma t r i x = ma t r i xAddi t i on ( mat r i x , t r a ns pos e d , ma t r i x Si z e ) ;
ma t r i x = ma t r i xAddi t i on ( mat r i x , i d e n t i t y , ma t r i x Si z e ) ;
ret urn ma t r i x ;
}
/ / Comput es t he sum of Abs ol ut e e r r or bet ween 2 v e c t o r s
doubl e vect or Comput eSumof AbsEr r or ( doubl e ve c t o r 1 , doubl e ve ct or 2 , i nt s i z e )
{
i nt i ;
doubl e sumOfAbsError = 0;
f or ( i = 0; i < s i z e ; i ++)
{
sumOfAbsError += f a bs ( v e c t or 2 [ i ] ve c t o r 1 [ i ] ) ;
}
ret urn sumOfAbsError ;
}
/ / Comput es t he sum of Abs ol ut e e r r or bet ween 2 ma t r i c e s
voi d ComputeSumOfAbsError ( doubl e ma t r i x1 , doubl e mat r i x2 , i nt s i z e )
9
{
p r i n t f ( Mat r i x 1 : \ n ) ;
i nt i , j ;
doubl e sumOfAbsError = 0;
f or ( i = 0; i < s i z e ; i ++)
{
f or ( j = 0; j < s i z e ; j ++)
{
sumOfAbsError += f a bs ( ma t r i x1 [ i ] [ j ] ma t r i x2 [ i ] [ j ] ) ;
}
}
p r i n t f ( The sum of a b s o l u t e e r r o r i s %10.6 f \n , sumOfAbsError ) ;
}
voi d p r i n t Ve c t o r ( doubl e ve c t or , i nt s i z e ) {
i nt i ;
f or ( i = 0; i < s i z e ; i ++){
p r i n t f ( \ t %10.6 f , v e c t o r [ i ] ) ;
p r i n t f ( \n ) ;
}
p r i n t f ( \n ) ;
}
doubl e i n i t Ma t r i x ( i nt s i z e ) {
doubl e ma t r i x = ( doubl e ) mal l oc ( s i z e s i z e o f ( doubl e ) ) ;
i nt i ;
f or ( i = 0; i < s i z e ; i ++)
ma t r i x [ i ] = ( doubl e ) mal l oc ( s i z e s i z e o f ( doubl e ) ) ;
ret urn ma t r i x ;
}
voi d t r ans Copy ( doubl e s our ce , doubl e des t , i nt s i z e ) {
i nt i , j ;
f or ( i = 0; i < s i z e ; i ++){
f or ( j = 0; j <= i ; j ++){
d e s t [ i ] [ j ] = s our c e [ i ] [ j ] ;
}
}
}
voi d copyMat r i x ( doubl e s our ce , doubl e des t , i nt s i z e ) {
i nt i , j ;
f or ( i = 0; i < s i z e ; i ++){
f or ( j = 0; j < s i z e ; j ++){
d e s t [ i ] [ j ] = s our c e [ i ] [ j ] ;
}
}
}
;
10
B. cholSerial.c - Serial Cholesky
# i nc l ude c h o l S e r i a l . h
doubl e c h o l S e r i a l ( doubl e A, i nt n ) {
/ / Copy mat r i x A and t ak e onl y l ower t r i a n g u l a r par t
doubl e L = i n i t Ma t r i x ( n ) ;
t r ans Copy (A, L, n ) ;
i nt i , j , k ;
f or ( j = 0; j < n ; j ++){
f or ( k = 0; k < j ; k++){
/ / I nne r sum
f or ( i = j ; i < n ; i ++){
L[ i ] [ j ] = L[ i ] [ j ] L[ i ] [ k ] L[ j ] [ k ] ;
}
}
L[ j ] [ j ] = s q r t ( L[ j ] [ j ] ) ;
f or ( i = j +1; i < n ; i ++){
L[ i ] [ j ] = L[ i ] [ j ] / L[ j ] [ j ] ;
}
}
ret urn L;
}
;
11
C. cholOMP.c - OpenMP Cholesky
# i nc l ude ma t r i x . h
# i nc l ude <omp . h>
doubl e cholOMP ( doubl e L, i nt n ) {
/ / Warni ng : a c t s d i r e c t l y on gi v e n mat r i x !
i nt i , j , k ;
omp l ock t wr i t e l o c k ;
omp i ni t l oc k (&wr i t e l o c k ) ;
f or ( j = 0; j < n ; j ++) {
f or ( i = 0; i < j ; i ++){
L[ i ] [ j ] = 0;
}
#pragma omp p a r a l l e l f or s ha r e d ( L) p r i v a t e ( k )
f or ( k = 0; k < i ; k++) {
omp s et l ock (&wr i t e l o c k ) ;
L[ j ] [ j ] = L[ j ] [ j ] L[ j ] [ k ] L[ j ] [ k ] ; / / Cr i t i c a l s e c t i o n .
omp uns et l ock(&wr i t e l o c k ) ;
}
#pragma omp s i n g l e
L[ i ] [ i ] = s q r t ( L[ j ] [ j ] ) ;
#pragma omp p a r a l l e l f or s ha r e d ( L) p r i v a t e ( i , k )
f or ( i = j +1; i < n ; i ++) {
f or ( k = 0; k < j ; k++) {
L[ i ] [ j ] = L[ i ] [ j ] L[ i ] [ k ] L[ j ] [ k ] ;
}
L[ i ] [ j ] = L[ i ] [ j ] / L[ j ] [ j ] ;
}
omp des t r oy l ock (&wr i t e l o c k ) ;
}
ret urn L;
}
;
12
D. cholMPI.c - OpenMPI Cholesky
# i nc l ude <mpi . h>
# i nc l ude ma t r i x . h
voi d chol MPI ( doubl e A, doubl e L, i nt n , i nt ar gc , char ar gv ) {
/ / Warni ng : chol MPI ( ) a c t s d i r e c t l y on t he gi v e n mat r i x !
i nt npes , r ank ;
MPI I ni t (&ar gc , &ar gv ) ;
MPI Comm size (MPI COMM WORLD, &npes ) ;
MPI Comm rank (MPI COMM WORLD, &r ank ) ;
doubl e s t a r t , end ;
MPI Bar r i er (MPI COMM WORLD) ; / Ti mi ng /
i f ( r ank == 0) {
s t a r t = MPI Wtime ( ) ;
/ / / Te s t
p r i n t f ( A = \n ) ;
p r i n t ( L , n ) ; /
}
/ / For each col umn
i nt i , j , k ;
f or ( j = 0; j < n ; j ++) {
/
St e p 0:
Repl ace t he e n t r i e s above t he di agonal wi t h z e r oe s
/
i f ( r ank == 0) {
f or ( i = 0; i < j ; i ++) {
L[ i ] [ j ] = 0 . 0 ;
}
}
/
St e p 1:
Updat e t he di agonal e l e me nt
/
i f ( j%npes == r ank ) {
f or ( k = 0; k < j ; k++) {
L[ j ] [ j ] = L[ j ] [ j ] L[ j ] [ k ] L[ j ] [ k ] ;
}
L[ j ] [ j ] = s q r t ( L[ j ] [ j ] ) ;
}
/ / Br oadcas t row wi t h new v al ue s t o ot he r p r o c e s s e s
MPI Bcast ( L[ j ] , n , MPI DOUBLE, j%npes , MPI COMM WORLD) ;
/
St e p 2:
Updat e t he e l e me nt s bel ow t he di agonal e l e me nt
/
/ / Di vi de t he r e s t of t he work
13
f or ( i = j +1; i < n ; i ++) {
i f ( i%npes == r ank ) {
f or ( k = 0; k < j ; k++) {
L[ i ] [ j ] = L[ i ] [ j ] L[ i ] [ k ] L[ j ] [ k ] ;
}
L[ i ] [ j ] = L[ i ] [ j ] / L[ j ] [ j ] ;
}
}
}
MPI Bar r i er (MPI COMM WORLD) ; / Ti mi ng /
i f ( r ank == 0) {
end = MPI Wtime ( ) ;
p r i n t f ( Te s t i n g OpenMpi i mpl e me nt a t i on Out put : \n ) ;
p r i n t f ( Runt i me = %l f \n , ends t a r t ) ;
p r i n t f ( Te s t i n g MPI i mpl e me nt a t i on Out put : ) ;
t e s t Ba s i c Ou t p u t (A, L, n ) ;
/ / Te s t
/ doubl e LLT = ma t r i x Mu l t i p l y ( L , t r a n s p o s e ( L , n ) , n ) ;
p r i n t f ( LL T = \n ) ;
p r i n t ( LLT , n ) ; /
}
MPI Fi nal i ze ( ) ;
}
i nt t e s t Ba s i c Ou t p u t ( doubl e A, doubl e L, i nt n )
{
doubl e LLT = ma t r i x Mu l t i p l y ( L, t r a n s p o s e ( L, n ) , n ) ;
i nt i , j ;
f l o a t p r e c i s i o n = 0. 0000001;
f or ( i = 0; i < n ; i ++){
f or ( j = 0; j < n ; j ++){
i f ( ! ( abs ( LLT[ i ] [ j ] A[ i ] [ j ] ) < p r e c i s i o n ) )
{
p r i n t f ( FAILED\n ) ;
ComputeSumOfAbsError (A, LLT, n ) ;
ret urn 0;
}
}
}
p r i n t f ( PASSED\n ) ;
ret urn 1;
}
;
14
E. tests.c - General Test Code
# i nc l ude <s t d i o . h>
# i nc l ude <s t r i n g . h>
# i nc l ude <math . h>
# i nc l ude <f l o a t . h>
# i nc l ude <t i me . h>
# i nc l ude <s t d l i b . h>
# i nc l ude <t i me . h>
# i nc l ude <omp . h>
# i nc l ude ma t r i x . h
t ypedef i nt bool ;
enum { f a l s e , t r u e };
s t r uc t t i me s pe c begi n ={0 , 0} , end ={0 , 0};
t i me t s t a r t , s t op ;
i nt main ( i nt ar gc , char ar gv )
{
/ / ge ne r at e s eed
s r a nd ( t i me (NULL) ) ;
i f ( a r gc ! = 3)
{
p r i n t f ( You di d not f e e d me ar gument s , I wi l l di e now : ( . . . \n ) ;
p r i n t f ( Usage : %s [ ma t r i x s i z e ] [ number of t h r e a d s ] \n , ar gv [ 0 ] ) ;
ret urn 1;
}
i nt ma t r i x Si z e = a t o i ( ar gv [ 1 ] ) ;
i nt t hr eads Number = a t o i ( ar gv [ 2 ] ) ;
p r i n t f ( Te s t b a s i c out p ut f o r a ma t r i x of s i z e %d : \ n , ma t r i x Si z e ) ;
/ / Gener at e random SPD mat r i x
doubl e A = i n i t i a l i z e ( 0 , 10 , ma t r i x Si z e ) ;
/ p r i n t f ( Chol mat r i x \n ) ;
p r i n t ( A, ma t r i x S i z e ) ; /
doubl e L = i n i t i a l i z e ( 0 , 10 , ma t r i x Si z e ) ;
/ / Te s t S e r i a l Program
/ / Appl y S e r i a l Chol es ky
p r i n t f ( Te s t i n g S e r i a l i mpl e me nt a t i on Out put : \n ) ;
c l o c k g e t t i me (CLOCK MONOTONIC, &begi n ) ;
L = c h o l S e r i a l (A, ma t r i x Si z e ) ;
c l o c k g e t t i me (CLOCK MONOTONIC, &end ) ; / / Get t he c u r r e n t t i me .
t e s t Ba s i c Out put Of Chol (A, L, ma t r i x Si z e ) ;
/ / Te s t e x e c u t i o n t i me
p r i n t f ( The s e r i a l comput at i on t ook %.5 f s econds \n ,
( ( doubl e ) end . t v s e c + 1. 0 e9 end . t v ns e c )
( ( doubl e ) begi n . t v s e c + 1. 0 e9 begi n . t v ns e c ) ) ;
/ / Te s t i n g OpenMP Program
p r i n t f ( Te s t i n g OpenMP i mpl e me nt a t i on Out put : \n ) ;
omp s et num t hr eads ( t hr eads Number ) ;
15
copyMat r i x (A, L, ma t r i x Si z e ) ;
c l o c k g e t t i me (CLOCK MONOTONIC, &begi n ) ;
cholOMP ( L, ma t r i x Si z e ) ;
c l o c k g e t t i me (CLOCK MONOTONIC, &end ) ; / / Get t he c u r r e n t t i me .
t e s t Ba s i c Out put Of Chol (A, L, ma t r i x Si z e ) ;
/ / Te s t e x e c u t i o n t i me
p r i n t f ( The OpenMP comput at i on t ook %.5 f s econds \n ,
( ( doubl e ) end . t v s e c + 1. 0 e9 end . t v ns e c )
( ( doubl e ) begi n . t v s e c + 1. 0 e9 begi n . t v ns e c ) ) ;
p r i n t f ( \n ) ;
ret urn 0;
}
i nt t e s t Ba s i c Out put Of Chol ( doubl e A, doubl e L, i nt n )
{
doubl e LLT = ma t r i x Mu l t i p l y ( L, t r a n s p o s e ( L, n ) , n ) ;
i nt i , j ;
f l o a t p r e c i s i o n = 0. 00000000001;
f or ( i = 0; i < n ; i ++){
f or ( j = 0; j < n ; j ++){
i f ( ! ( abs ( LLT[ i ] [ j ] A[ i ] [ j ] ) < p r e c i s i o n ) )
{
p r i n t f ( FAILED\n ) ; / / i f i t f a i l s show t he e r r or
ComputeSumOfAbsError (A, LLT, n ) ;
ret urn 0;
}
}
}
p r i n t f ( PASSED\n ) ;
ret urn 1;
}
voi d t e s t Ti me f o r Se r i a l Ch o l ( i nt n )
{
p r i n t f ( Te s t d u r a t i o n f o r s e r i a l v e r s i o n wi t h ma t r i x of s i z e %d \n , n ) ;
/ / Gener at e random SPD mat r i x
doubl e A = i n i t i a l i z e ( 0 , 10 , n ) ;
c l o c k t s t a r t = c l oc k ( ) ;
/ / Appl y Chol es ky
doubl e L = c h o l S e r i a l (A, n ) ;
c l o c k t end = c l oc k ( ) ;
f l o a t s econds = ( f l o a t ) ( end s t a r t ) / CLOCKS PER SEC;
p r i n t f ( I t t ook %f s econds \n , s econds ) ;
}
voi d t e s t Er r o r Of Li n e a r Sy s t e mAp p l i c a t i o n ( i nt ma t r i x Si z e )
{
p r i n t f ( Te s t l i n e a r s ys t em a p p l i c a t i o n of Chol esky f o r ma t r i x s i z e %d : \ n ,
ma t r i x Si z e ) ;
doubl e A = i n i t i a l i z e ( 0 , 10 , ma t r i x Si z e ) ;
doubl e xTheo = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
i nt i ndex ;
f or ( i ndex = 0; i ndex < ma t r i x Si z e ; i ndex ++)
{
16
xTheo [ i ndex ] = r and ( ) / ( doubl e ) RAND MAX 10;
}
doubl e b = v e c t o r Mu l t i p l y (A, xTheo , ma t r i xSi z e , ma t r i x Si z e ) ;
/ / Appl y Chol es ky
doubl e L = c h o l S e r i a l (A, ma t r i x Si z e ) ;
doubl e y = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
/ / Forwards u b s t i t u t i o n par t
i nt i , j ;
f or ( i = 0; i < ma t r i x Si z e ; i ++){
y [ i ] = b [ i ] ;
f or ( j = 0; j < i ; j ++){
y [ i ] = y [ i ] L[ i ] [ j ] y [ j ] ;
}
y [ i ] = y [ i ] / L[ i ] [ i ] ;
}
/ / Backs u b s t i t u t i o n par t
doubl e LT = t r a n s p o s e ( L, ma t r i x Si z e ) ;
doubl e xExpr = ( doubl e ) mal l oc ( ma t r i x Si z e s i z e o f ( doubl e ) ) ;
f or ( i = ma t r i x Si z e 1; i >=0; i ){
xExpr [ i ] = y [ i ] ;
f or ( j = i + 1 ; j < ma t r i x Si z e ; j ++){
xExpr [ i ] = xExpr [ i ] LT[ i ] [ j ] xExpr [ j ] ;
}
xExpr [ i ] = xExpr [ i ] / LT[ i ] [ i ] ;
}
p r i n t f ( x e x p e r i me n t a l i s : \n ) ;
p r i n t Ve c t o r ( xExpr , ma t r i x Si z e ) ;
p r i n t f ( The sum of abs e r r o r i s %10.6 f \n ,
vect or Comput eSumof AbsEr r or ( xTheo , xExpr , ma t r i x Si z e ) ) ;
}
;
17
F. testMPI.c - Test program for MPI implementation
# i nc l ude ma t r i x . h
i nt main ( i nt ar gc , char ar gv )
{
/ / ge ne r at e s eed
s r a nd ( t i me (NULL) ) ;
i f ( a r gc ! = 2)
{
p r i n t f ( You di d not f e e d me ar gument s , I wi l l di e now : ( . . . \n ) ;
p r i n t f ( Usage : %s [ ma t r i x s i z e ] \n , ar gv [ 0 ] ) ;
ret urn 1;
}
i nt ma t r i x Si z e = a t o i ( ar gv [ 1 ] ) ;
/ / Gener at e random SPD mat r i x
doubl e A = i n i t i a l i z e ( 0 , 10 , ma t r i x Si z e ) ;
/ p r i n t f ( Chol mat r i x \n ) ;
p r i n t ( A, ma t r i x S i z e ) ; /
doubl e L = i n i t i a l i z e ( 0 , 10 , ma t r i x Si z e ) ;
/ / Te s t i n g OpenMpi Program
copyMat r i x (A, L, ma t r i x Si z e ) ;
chol MPI (A, L, ma t r i xSi z e , ar gc , ar gv ) ;
/ / Warni ng : chol MPI ( ) a c t s d i r e c t l y on t he gi v e n mat r i x L
}
;

1 15 Powerblock Repair PDF
100% (1)
1 15 Powerblock Repair PDF
20 pages
Preservation Letter
81% (21)
Preservation Letter
4 pages
ECSE 323 - Tic Tac Toe Game On FPGA Board - Report
60% (5)
ECSE 323 - Tic Tac Toe Game On FPGA Board - Report
85 pages
Programming with MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
From Everand
Programming with MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
Peter Kattan
4.5/5 (3)
Rapport Factorisation Cholesky-1
No ratings yet
Rapport Factorisation Cholesky-1
11 pages
DSP Implementation of Cholesky Decomposition
No ratings yet
DSP Implementation of Cholesky Decomposition
4 pages
DSP Implementation of Cholesky Decomposition
No ratings yet
DSP Implementation of Cholesky Decomposition
4 pages
Cholesky Method
100% (1)
Cholesky Method
6 pages
Experiment 6
No ratings yet
Experiment 6
2 pages
CS 240A: Solving Ax B in Parallel: Dense A: Gaussian Elimination With Partial Pivoting (LU)
No ratings yet
CS 240A: Solving Ax B in Parallel: Dense A: Gaussian Elimination With Partial Pivoting (LU)
35 pages
Trifocal Tensor: Exploring Depth, Motion, and Structure in Computer Vision
From Everand
Trifocal Tensor: Exploring Depth, Motion, and Structure in Computer Vision
Fouad Sabry
No ratings yet
Optimal Ordering Schemes
No ratings yet
Optimal Ordering Schemes
26 pages
A Friendly Introduction to MATLAB Programming
From Everand
A Friendly Introduction to MATLAB Programming
Orhan Gazi
No ratings yet
Introduction to Numerical Analysis
From Everand
Introduction to Numerical Analysis
Simone Malacrida
No ratings yet
Introduction to Algorithms
From Everand
Introduction to Algorithms
S VASIST
No ratings yet
Exercises of Numerical Analysis
From Everand
Exercises of Numerical Analysis
Simone Malacrida
No ratings yet
MIT - Applied Parallel Computing - Alan Edelman
No ratings yet
MIT - Applied Parallel Computing - Alan Edelman
187 pages
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
From Everand
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
Fouad Sabry
No ratings yet
Direct Linear Transformation: Practical Applications and Techniques in Computer Vision
From Everand
Direct Linear Transformation: Practical Applications and Techniques in Computer Vision
Fouad Sabry
No ratings yet
Logic Primer, third edition
From Everand
Logic Primer, third edition
Colin Allen
No ratings yet
Nonlinear Control Feedback Linearization Sliding Mode Control
From Everand
Nonlinear Control Feedback Linearization Sliding Mode Control
Mourad Boufadene
No ratings yet
Cholesky Decomposition: 1 1 2 1 2) /2), and X 2 2 1
No ratings yet
Cholesky Decomposition: 1 1 2 1 2) /2), and X 2 2 1
7 pages
Investigation of the Usefulness of the PowerWorld Simulator Program: Developed by "Glover, Overbye & Sarma" in the Solution of Power System Problems
From Everand
Investigation of the Usefulness of the PowerWorld Simulator Program: Developed by "Glover, Overbye & Sarma" in the Solution of Power System Problems
Dr. Hidaia Mahmood Alassouli
No ratings yet
Loop-shaping Robust Control
From Everand
Loop-shaping Robust Control
Philippe Feyel
No ratings yet
Sym Band Doc
No ratings yet
Sym Band Doc
7 pages
Cholesky Decomposition: Switch Case
No ratings yet
Cholesky Decomposition: Switch Case
2 pages
Maths
No ratings yet
Maths
9 pages
Iccgi 2013 11 30 10133
No ratings yet
Iccgi 2013 11 30 10133
5 pages
Efficient Parallel Algorithm for
No ratings yet
Efficient Parallel Algorithm for
12 pages
Modified Cholesky Decomposition and Applications
No ratings yet
Modified Cholesky Decomposition and Applications
119 pages
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
From Everand
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
Fouad Sabry
No ratings yet
A fast algorithm for subspace state-space system identification via exploitation of the displacem
No ratings yet
A fast algorithm for subspace state-space system identification via exploitation of the displacem
11 pages
Implementation of Generalized Lanczos Procedure For Structural Dynamics
No ratings yet
Implementation of Generalized Lanczos Procedure For Structural Dynamics
52 pages
CHOLMOD UserGuide
No ratings yet
CHOLMOD UserGuide
153 pages
Cholesky Decomposition
No ratings yet
Cholesky Decomposition
8 pages
User Guide For CHOLMOD: A Sparse Cholesky Factorization and Modification Package
No ratings yet
User Guide For CHOLMOD: A Sparse Cholesky Factorization and Modification Package
140 pages
Simplex
No ratings yet
Simplex
147 pages
Sparse Matrix Methods: Day 1: Overview
No ratings yet
Sparse Matrix Methods: Day 1: Overview
17 pages
CHOLMOD UserGuide
No ratings yet
CHOLMOD UserGuide
152 pages
MATLAB for Beginners: A Gentle Approach
From Everand
MATLAB for Beginners: A Gentle Approach
Peter I. Kattan
No ratings yet
CO-2 (2)
No ratings yet
CO-2 (2)
22 pages
Second Midterm Exam
No ratings yet
Second Midterm Exam
11 pages
Stanford 2013
No ratings yet
Stanford 2013
36 pages
Analytical Methods of Optimization
From Everand
Analytical Methods of Optimization
D. F. Lawden
No ratings yet
Cholesky Factorization - MATLAB Chol
No ratings yet
Cholesky Factorization - MATLAB Chol
3 pages
Cholesky
No ratings yet
Cholesky
13 pages
Parallel Processing
No ratings yet
Parallel Processing
35 pages
Lishen He CEE 470 Fall 2014
No ratings yet
Lishen He CEE 470 Fall 2014
32 pages
Cholesky Factorization: EE103 (Fall 2011-12)
No ratings yet
Cholesky Factorization: EE103 (Fall 2011-12)
21 pages
Current Trends in Numerical Linear Algebra
No ratings yet
Current Trends in Numerical Linear Algebra
19 pages
Digital Circuit Simulation Using Excel
From Everand
Digital Circuit Simulation Using Excel
Anthony Mazzurco
No ratings yet
Linear Equation Faster Than Mul
No ratings yet
Linear Equation Faster Than Mul
18 pages
Content PDF
No ratings yet
Content PDF
14 pages
Project Topics
No ratings yet
Project Topics
2 pages
Linear Algebra: Assignment I
No ratings yet
Linear Algebra: Assignment I
11 pages
Algorithmic Probability: Fundamentals and Applications
From Everand
Algorithmic Probability: Fundamentals and Applications
Fouad Sabry
No ratings yet
Multiple Models Approach in Automation: Takagi-Sugeno Fuzzy Systems
From Everand
Multiple Models Approach in Automation: Takagi-Sugeno Fuzzy Systems
Mohammed Chadli
No ratings yet
MATLAB for Beginners: A Gentle Approach - Revised Edition
From Everand
MATLAB for Beginners: A Gentle Approach - Revised Edition
Peter Kattan
No ratings yet
Sparse 1
No ratings yet
Sparse 1
68 pages
2. Cholesky
No ratings yet
2. Cholesky
49 pages
Chapter 9 - Parallel Computation Problems
No ratings yet
Chapter 9 - Parallel Computation Problems
43 pages
CE 007 (Numerical Solutions To CE Problems)
No ratings yet
CE 007 (Numerical Solutions To CE Problems)
19 pages
Advanced Numerical Methods with Matlab 1: Function Approximation and System Resolution
From Everand
Advanced Numerical Methods with Matlab 1: Function Approximation and System Resolution
Bouchaib Radi
No ratings yet
ECSE 489 - Java Chat Client - Project Report
No ratings yet
ECSE 489 - Java Chat Client - Project Report
15 pages
ECSE 548 - Electronic Design and Implementation of The Sine Function On 8-Bit MIPS Processor - Report
100% (1)
ECSE 548 - Electronic Design and Implementation of The Sine Function On 8-Bit MIPS Processor - Report
4 pages
ECSE 211 - Autonomous Block Stacking Robot - Project Documentation
100% (1)
ECSE 211 - Autonomous Block Stacking Robot - Project Documentation
41 pages
Symfony5 The Fast Track
No ratings yet
Symfony5 The Fast Track
362 pages
EVO33 - Handover To Support Checklist
No ratings yet
EVO33 - Handover To Support Checklist
2 pages
Whole Microland Placement Test Paper
No ratings yet
Whole Microland Placement Test Paper
7 pages
25 Serial Peripheral Interface (SPI)
No ratings yet
25 Serial Peripheral Interface (SPI)
53 pages
Intro Event Driven Architectures Guide
No ratings yet
Intro Event Driven Architectures Guide
10 pages
Instructions, Fetch, Execution Cycle and Concept of Operand, Register and Storage
No ratings yet
Instructions, Fetch, Execution Cycle and Concept of Operand, Register and Storage
22 pages
Microcontroller Based Wheather Monitoring System: Mini Project On "
No ratings yet
Microcontroller Based Wheather Monitoring System: Mini Project On "
12 pages
Release Information IPconfig v4.2.0.0
No ratings yet
Release Information IPconfig v4.2.0.0
7 pages
The Private 5G Revolution: Dr. Andreas Mueller
No ratings yet
The Private 5G Revolution: Dr. Andreas Mueller
19 pages
SAP GUI For HTML and Web Dynpro Tiles On FLP - Troubleshooting Guide
No ratings yet
SAP GUI For HTML and Web Dynpro Tiles On FLP - Troubleshooting Guide
16 pages
10 Interrupt v21 Rev1
No ratings yet
10 Interrupt v21 Rev1
43 pages
WebLogic Server Overview
No ratings yet
WebLogic Server Overview
13 pages
Autosar Sws Lindriver
No ratings yet
Autosar Sws Lindriver
67 pages
Real Time Applications of Encoder and Decoder
No ratings yet
Real Time Applications of Encoder and Decoder
10 pages
Devika G - 9th Group - Task 8
No ratings yet
Devika G - 9th Group - Task 8
15 pages
Chapter 3 C Programming For 8051 2023
No ratings yet
Chapter 3 C Programming For 8051 2023
29 pages
Unit 3 - Week 1 Quiz
No ratings yet
Unit 3 - Week 1 Quiz
3 pages
Untitled
No ratings yet
Untitled
2 pages
Frangipani: A Scalable Distributed File System
No ratings yet
Frangipani: A Scalable Distributed File System
3 pages
Network Address Translation (NAT) : Khawar Butt Ccie # 12353 (R/S, Security, SP, DC, Voice, Storage & Ccde)
No ratings yet
Network Address Translation (NAT) : Khawar Butt Ccie # 12353 (R/S, Security, SP, DC, Voice, Storage & Ccde)
11 pages
Product Specification VIBROCONTROL 950/960: Features
No ratings yet
Product Specification VIBROCONTROL 950/960: Features
3 pages
Microphone Preamplifier
50% (2)
Microphone Preamplifier
3 pages
Network Security Modules 1 to 10 小考解答
No ratings yet
Network Security Modules 1 to 10 小考解答
19 pages
TAB E 7.0" LTE REV0.3: Samsung Confidential Samsung Confidential
No ratings yet
TAB E 7.0" LTE REV0.3: Samsung Confidential Samsung Confidential
10 pages
Functional Design Specification
No ratings yet
Functional Design Specification
6 pages
Aer Colourizer Eng
No ratings yet
Aer Colourizer Eng
12 pages
Online Student Union Voting System
50% (2)
Online Student Union Voting System
58 pages
Poweredge-C5230 - Owner's Manual - En-Us PDF
No ratings yet
Poweredge-C5230 - Owner's Manual - En-Us PDF
146 pages