An Efficient Parallel Version of The Householder-QL Matrix Diagonalisation Algorithm
Abstract
In this paper we report an effective parallelisation of the Householder routine for the reduction of a real symmetric matrix to tri-diagonal form, and of the QL algorithm for the diagonalisation of the resulting matrix. The Householder algorithm scales like $N^3/P + N^2\log_2(P)$ and the QL algorithm like $N^2 + N^3/P$ on $P$ processors.
1 Introduction
The parallelisation of methods for finding the eigenvalues and eigenvectors of real symmetric matrices has received little attention over the years, for two reasons. Firstly, for very large systems quite often only a few eigenvalues and eigenvectors are needed. Secondly, the parallelisation of the best sequential techniques for matrix diagonalisation has proved difficult and ineffective.
In the simulation of the dynamics of large structures, only the lowest few eigenvalues, corresponding to the most likely low-frequency modes of vibration, are needed. Similarly, in solid state physics, states at or near the ground state are those most likely to be occupied, at least at low temperatures, and so make the largest contribution to the material properties. For these reasons, in subjects like structural mechanics, techniques have concentrated on reducing the degrees of freedom in the system, from perhaps millions to a few hundred, enabling classical sequential techniques to be used. In physics and chemistry, where the original matrix is regularly structured and sparse, iterative techniques like the Lanczos [9] or Davidson [2] methods, which recover the lowest few eigenvalues and their corresponding eigenvectors, can be used effectively.
There are however many problems for which it is desirable to have all of the eigenvalues and eigenvectors of the system. Physical simulations of quantum mechanical systems that have a few particles present at finite temperature cannot be properly analysed by methods designed for large numbers of particles, such as statistical mechanics, and are sufficiently far from their ground states as to be influenced by all available states. Similarly, in the design of mechanical structures, acoustic resonances might occur over a broad bandwidth. In such cases all eigenvalues and eigenvectors of large symmetric matrices are needed, for which no algorithm of complexity less than $O(N^3)$, where $N$ is the dimension of the matrix, is known.
The Householder reduction itself requires $O(N^3)$ operations, and the operation count for the QL method is roughly $30N^2$ when the eigenvectors are not required and $3N^3$ when they are. These counts presume that only a small number of QL iterations is needed per eigenvalue. By comparison, the Jacobi method requires about $40N^3$ operations when the eigenvectors are required, assuming less than 10 Jacobi cycles for convergence [10]. The Jacobi method might readily parallelise [11], but requires more than 20 processors before becoming competitive with the sequential Householder-QL algorithm. Previous attempts at parallelising the Householder and QL algorithms have resulted in inefficient solutions [4] or have altered the algorithms in some way [12, 3, 1, 8].
After the first step of the reduction the matrix has the form

$$\begin{pmatrix}
d_1 & e_1 & 0 & 0 & 0 & 0 \\
e_1 & d_2 & x & x & x & x \\
0   & x   & x & x & x & x \\
0   & x   & x & x & x & x \\
0   & x   & x & x & x & x \\
0   & x   & x & x & x & x
\end{pmatrix}$$

where $d_i$ and $e_i$ are the diagonal and off-diagonal elements produced so far and $x$ marks the still unreduced block. Each step applies a Householder reflection $P_k = I - 2W_kW_k^T$, with $V_k = A_kW_k$ and $c = W_k^T V_k$.
The $N^3$ part of the QL operation count arises only when the eigenvectors are required. The code fragment of the QL algorithm that constructs the eigenvectors has the form below for the $i$th iteration:

    for(j=0;j<N;j++) {
        f = Z[i+1][j];
        /* apply the (c,s) plane rotation to rows i and i+1 of Z */
        Z[i+1][j] = s*Z[i][j] + c*f;
        Z[i][j]   = c*Z[i][j] - s*f;
    }
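Since the rotation treats each column $j$ independently, this loop parallelises naturally by splitting the columns of Z across processors, which is consistent with the $N^3/P$ eigenvector cost reported in the summary. The sketch below is our illustration of that idea rather than the paper's exact code; the column-block ownership bounds j0 and j1 are assumptions:

    /* Illustrative per-processor form of the QL eigenvector update: each
       processor applies the (c, s) plane rotation of iteration i only to
       its own block of columns [j0, j1) of Z.  The column-block
       distribution is an assumption made for this sketch. */
    void apply_rotation(double **Z, int i, int j0, int j1, double c, double s)
    {
        for (int j = j0; j < j1; j++) {
            double f = Z[i + 1][j];
            Z[i + 1][j] = s * Z[i][j] + c * f;
            Z[i][j]     = c * Z[i][j] - s * f;
        }
    }

Since every processor holds a copy of the tri-diagonal matrix, each can form c and s independently, and no communication is needed during the update itself.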
Within the tri-diagonalisation loop itself, the processor holding the current row forms W and the scalar s and broadcasts them, while every other processor posts the matching broadcast call, in the same owner-broadcasts pattern used by formZ below:

        form W and s
        MPI_Bcast(s,W)
      }
      else
        MPI_Bcast(s,W)
    }
    return(s)
As both the vectors $V_k$ and $W_k$ are held by each processor, the $Q_k$ vector (calculated in formQ) can be computed locally on each processor. The $A_k$ matrix is updated by the equation $A_{k-1} = A_k - 2W_kQ_k^T - 2Q_kW_k^T$, which involves only outer products of the locally stored $W_k$ and $Q_k$ vectors.
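As a short check of this update (assuming, consistent with the definitions above, that $V_k = A_kW_k$, $c = W_k^TV_k$ and $Q_k = V_k - cW_k$), expand the similarity transformation with $P_k = I - 2W_kW_k^T$:

$$\begin{aligned}
P_kA_kP_k &= A_k - 2W_kW_k^TA_k - 2A_kW_kW_k^T + 4W_k(W_k^TA_kW_k)W_k^T \\
          &= A_k - 2W_kV_k^T - 2V_kW_k^T + 4c\,W_kW_k^T \\
          &= A_k - 2W_k(V_k - cW_k)^T - 2(V_k - cW_k)W_k^T \\
          &= A_k - 2W_kQ_k^T - 2Q_kW_k^T.
\end{aligned}$$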
The diagonal and off-diagonal elements of the reduced part of the A matrix are stored in a local vector on the processor that holds row k. At the end of the tri-diagonalisation process these need to be gathered onto the other processors, so that all processors have a copy of the tri-diagonalised matrix as required by the QL procedure.
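A minimal sketch of this gathering step, assuming block-distributed elements; the counts and displs bookkeeping arrays are our own illustration, not the paper's data structures:

    #include <mpi.h>

    /* Gather the locally held pieces of the diagonal (or off-diagonal) of
       the tri-diagonal matrix onto every processor.  counts[r] and
       displs[r] give the number of elements owned by rank r and their
       offset in the assembled vector. */
    void gather_tridiag(const double *d_local, double *d_full,
                        const int *counts, const int *displs, MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);
        MPI_Allgatherv(d_local, counts[rank], MPI_DOUBLE,
                       d_full, counts, displs, MPI_DOUBLE, comm);
    }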
If the eigenvectors are not required then the transformation matrix $Z = P_{N-2} \cdots P_1$ is not needed and formZ can be omitted. The formation of Z begins at the bottom right-hand corner of the original A matrix, which now stores the $W_k$ row-wise. Proceeding from this corner, for which Z is the $2 \times 2$ unit matrix, we can form the $Z_k$ matrix in rows k up to N in the space formerly used to keep the $W_k$ up to $W_{N-2}$. In separating out the formation of the Z matrix from the
main iteration sequence we have opted for storage efficiency rather than execution speed. This means that we have to re-broadcast the $W_k$ vectors in order to do the $U = ZW$ matrix-vector product, and then perform a gathering operation to put a copy of the resultant vector U on every processor. The outer product $UW^T$ can then be done locally. In the following code description remember that A stores both the $W_k$ vectors and the Z matrix under construction.
formZ(A) {  Inputs A and outputs Z in the A matrix
    Make the bottom right-hand corner of A a 2 x 2 unit matrix
    for(k=N-2; k>0; k--) {
        if(ProcNoForRow[k] == MyProcNo) {
            recover W
            MPI_Bcast(W)
        }
        else {
            MPI_Bcast(W)
        }
        Do the local matrix-vector multiplication Ul = A W
        MPI_Allgather(Ul, U)
        Z = Z - 2 U W^T
        Zero row k and column k of Z
        Set Z[k][k] = 1
    }
}
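For concreteness, here is a hedged C rendering of this routine under simplifying assumptions that are ours rather than the paper's: a block row distribution with N divisible by the number of processors, the $W_k$ vectors already copied out of A into a local array w_store (indexed by local row), and the caller having initialised the local rows of Z with the bottom right-hand $2 \times 2$ unit matrix in place:

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* Sketch of formZ: the rows of Z are block-distributed, the owner of
       row k re-broadcasts W_k, every processor forms its local piece of
       U = Z W, an Allgather completes U, and the outer-product update is
       then entirely local.  Names (Zl, w_store, rows_per_proc) are ours. */
    void formZ(double *Zl, const double *w_store, int N, int rows_per_proc,
               MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        double *W  = malloc(N * sizeof(double));
        double *Ul = malloc(rows_per_proc * sizeof(double));
        double *U  = malloc(N * sizeof(double));

        for (int k = N - 2; k > 0; k--) {
            int owner = k / rows_per_proc;          /* processor holding row k */
            int lk    = k % rows_per_proc;          /* its local row index     */
            if (rank == owner)                      /* recover W_k from store  */
                memcpy(W, w_store + (size_t)lk * N, N * sizeof(double));
            MPI_Bcast(W, N, MPI_DOUBLE, owner, comm);

            for (int i = 0; i < rows_per_proc; i++) { /* local part of U = Z W */
                Ul[i] = 0.0;
                for (int j = 0; j < N; j++)
                    Ul[i] += Zl[i * N + j] * W[j];
            }
            MPI_Allgather(Ul, rows_per_proc, MPI_DOUBLE, /* full U everywhere */
                          U, rows_per_proc, MPI_DOUBLE, comm);

            for (int i = 0; i < rows_per_proc; i++) /* Z = Z - 2 U W^T locally */
                for (int j = 0; j < N; j++)
                    Zl[i * N + j] -= 2.0 * U[rank * rows_per_proc + i] * W[j];

            for (int i = 0; i < rows_per_proc; i++) /* zero column k of Z      */
                Zl[i * N + k] = 0.0;
            if (rank == owner) {                    /* zero row k, Z[k][k] = 1 */
                for (int j = 0; j < N; j++) Zl[lk * N + j] = 0.0;
                Zl[lk * N + k] = 1.0;
            }
        }
        free(W); free(Ul); free(U);
    }

With one broadcast and one all-gather of length-N vectors per step, the communication over the N-2 steps costs $O(N^2\log_2 P)$ on a spanning-tree implementation, matching the fitted term discussed in the summary.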
3 Test Cases
The algorithms were tested using matrices that occur in a many-body quantum mechanical model of crystals for which there are two states per atom. The total number of available states for the system is thus $2^N$ for an N-atom system. The nature of the problem is such that the matrices for $l = 0, 1, \ldots, N$ particles on N crystal sites can be generated separately, and the resulting matrices have dimension $C_l^N = \frac{N!}{(N-l)!\,l!}$. The general sparsity pattern of these matrices is shown in figure 1.
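As a quick check of these dimensions, $C_6^{12} = 924$ and $C_7^{14} = 3432$, the two largest test matrices used below. A small C illustration of our own (the multiplicative form keeps every intermediate value an exact integer):

    #include <stdio.h>

    /* Dimension of the l-particle sector: C(N,l) = N! / ((N-l)! l!),
       computed multiplicatively to avoid forming the huge factorials. */
    unsigned long binom(unsigned n, unsigned l)
    {
        if (l > n - l) l = n - l;       /* symmetry: C(n,l) = C(n,n-l)    */
        unsigned long c = 1;
        for (unsigned i = 1; i <= l; i++)
            c = c * (n - l + i) / i;    /* division is exact at each step */
        return c;
    }

    int main(void)
    {
        /* the test-matrix dimensions used in tables 1 and 2 */
        printf("%lu %lu\n", binom(12, 6), binom(14, 7));   /* 924 3432 */
        return 0;
    }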
4 Results
The results presented below were all carried out on an IBM SP2, a heterogeneous 23-node machine with a subset of 16 'thin1' nodes on which our code was tested. Each 'thin1' node consists of a 66 MHz Power2 processor and 128 Mbytes of RAM with a 64-bit memory bus, a 32 Kbyte instruction cache and a 64 Kbyte data cache. The latency and bandwidth for the machine, as measured by simple processor-to-processor 'ping-pong' benchmarks, are 100 microseconds and 20 Mbytes/second [7].
For the scaling test presented in figure 2 the problem size was increased from N = 70 to N = 3432 with the number of processors fixed at 16.
Figure 1: The Sparsity Pattern of the 924 × 924 Test Matrix used in the Speed-up Experiments.
P        1     2     3     4     5     6     7     8
HH     114  66.4  46.3  36.8  30.5  26.5  23.8  22.8
QL    37.1  28.2  21.7  17.8  15.4  13.5  12.1  11.3
Total  151  94.6  68.0  54.6  45.9  40.0  35.9  34.1

P        9    10    11    12    13    14    15    16
HH    21.0  19.9  18.8  18.6  17.5  17.0  16.9  16.9
QL    10.3  9.86  9.27  8.85  8.58  8.13  8.08  7.67
Total 31.3  29.8  28.1  27.5  26.1  25.1  25.0  24.6

Table 1: Execution Times (in seconds) on P Processors for the QL and Householder Segments of the Diagonalisation Algorithm for a 924 × 924 Matrix.
N               70     126     252     462     924    1716    3432
HH            0.22    0.36    1.05    3.13    16.9    79.1     550
QL            0.05    0.09    0.33    1.42    7.67    38.9     264
(Av. Iters.) (1.71)  (1.67)  (1.71)  (1.75)  (1.89)  (1.75)  (1.89)
Total Time    0.27    0.45    1.38    4.55    24.6     118     814

Table 2: Execution Times (in seconds) on 16 Processors for the QL and Householder Segments of the Diagonalisation Algorithm for varying Problem Size, N. The average number of iterations per eigenvalue for the QL algorithm is also indicated.
Ideally the execution time should behave like $t \propto N^3$ for large N, which is indeed the case, as the fitted cubic curves in figure 2 show.
Figure 2: Execution Time (seconds) against Problem Size (N) on 16 Processors for the Householder and QL Segments, with the fitted curves 550(N/3432)^3 and 264(N/3432)^3.
Figure 3: Execution Time (seconds) against Number of Processors (P) for the Householder and QL Segments on the 924 × 924 Matrix, with the fitted curves 129/P + 2 log2(P) and 4.7 + 47/P.
5 Summary
We have shown that a relatively straightforward parallelisation of the Householder tri-diagonalisation algorithm and of the QL method for finding the eigenvalues and eigenvectors of the resulting matrix shows a time complexity like $N^3/P + N^2\log_2(P)$ for the Householder stage and like $N^2 + N^3/P$ for the QL algorithm, for fixed problem size on increasing numbers of processors, P. The $\log_2(P)$ term is empirically fitted to the data, but is consistent with MPI_Bcast and MPI_Allgather being implemented on a breadth-first spanning tree. If we use the timing results from figure 2 for large N, then
we find values of $0.6 \times 10^{-6}$ and $8800/(3432)^3 \approx 0.2 \times 10^{-6}$ for the two Householder coefficients, and $5.5 \times 10^{-6}$ and $4224/(3432)^3 \approx 0.1 \times 10^{-6}$ for the two QL coefficients. The $\log_2(P)$ communication term
is less relevant for larger problems, since the time for the main body of the algorithm is $\propto N^3$. If the eigenvectors are not required the QL algorithm remains sequential and the Householder method goes twice as fast. For large problems it might still be worth using a parallel version of Householder's algorithm, since this spreads the memory requirements over all of the processors and there is still a 30% time saving.
References
[1] M.M. Chawla and D.J. Evans. Parallel Householder Method for Linear Systems. International Journal of Computer Mathematics 58 (1995) 159-167.
[2] E.R. Davidson. The Iterative Calculation of a Few of the Lowest Eigenvalues and Corresponding Eigenvectors of Large Real-Symmetric Matrices. J. Computational Physics 17 (1975) 87.
[3] F.J. Ferreira, P.B. Vasconcelos and F.D. Dalmeida. Performance of a QR Algorithm Implementation on a Multicluster of Transputers. Computer Systems in Engineering 6 (1995) 363-367.
[4] G. Fox et al. Solving Problems on Concurrent Processors (Prentice-Hall International, Englewood Cliffs, 1988).
[5] V. Kumar, A. Grama, A. Gupta and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms (Benjamin/Cummings, California).
[6] J.H. Mathews. Numerical Methods for Mathematics, Science and Engineering (Prentice-Hall International, Englewood Cliffs, 1992).
[7] P. Melas and E.J. Zaluska. High Performance Protocols for Clusters of Commodity Workstations. Lecture Notes in Computer Science 1470 (1998) 570-577.
[8] G.G. Meyer and M. Pascale. A Family of Parallel QR Factorisation Algorithms. Concurrency: Practice and Experience 8 (1996) 461-473.
[9] B.N. Parlett, H. Simon and L.M. Stringer. On Estimating the Largest Eigenvalue with the Lanczos Algorithm. Mathematics of Computation 38 (1982) 153-165.
[10] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling. Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, Cambridge, 1986).
[11] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling. Numerical Recipes in Fortran 90: The Art of Scientific Computing (Cambridge University Press, Cambridge, 1996).