TOWARDS PARALLEL TWISTED BLOCK FACTORIZATIONS


Michael Moldaschl, Research Group Theory and Applications of Algorithms, University of Vienna, Vienna, Austria
Wilfried N. Gansterer, Research Group Theory and Applications of Algorithms, University of Vienna, Vienna, Austria
Abstract. With the rise of multicore architectures, concurrency and parallelization potential of numerical algorithms have become pivotal for achieving high performance. In this paper, the parallelization of the computation of twisted block factorizations of a block tridiagonal matrix is studied and several parallelization variants are discussed. This process is an important building block for computing eigenvectors of symmetric block tridiagonal matrices without tridiagonalization.

1 Introduction

A sequential algorithm for computing eigenvectors of a block tridiagonal or banded matrix based on twisted block factorizations (TBF, [1]) has been proposed recently [2]. An important benefit of this method, which we refer to as the TBF method in this paper, is that it does not require tridiagonalization of the block tridiagonal matrix and thus no backtransformation process. Since band reduction processes which reduce a full matrix to a band matrix with a bandwidth larger than one tend to be more efficient than a full tridiagonalization, block tridiagonal or banded matrices as considered here may also arise as an intermediate step when computing spectral information of a full dense matrix. Moreover, there is also a computationally less expensive approximative method for transforming a full matrix into a closely related block tridiagonal matrix [3].

The central component of the TBF method for computing eigenvectors of a block tridiagonal matrix proposed in [2] is the computation of twisted block factorizations. For fully understanding the performance potential of this method on contemporary and future hardware architectures, efficient parallelizations for multicore architectures have to be developed and their scaling behavior with the number of cores has to be investigated. Multicore architectures will be the dominant processor architecture for the years to come, and it is expected that the number of cores will grow steadily over time.

In this paper, we focus on several parallel strategies for computing twisted block factorizations of a given block tridiagonal matrix. The other components of the eigenvector computation, i.e., the determination of a starting vector and the solution of the linear system in an inverse iteration step, are much less significant in terms of overall runtime and therefore not discussed in detail in this paper. However, we discuss the effect of two new CPU features on the performance of parallel twisted block factorizations: the Intel hyperthreading technology
and the Intel turbo boost technology. The former supports the execution of two separate threads on a single core. The latter allows one, two or four cores to increase their clock frequency if more performance is needed (see http://www.intel.com/Assets/PDF/manual/325384.pdf).

Related Work. Various parallel eigensolvers have been developed and integrated into parallel software packages such as ScaLapack [4] or PLapack [5]. These libraries contain routines primarily for tridiagonal problems, though [6, 7, 8, 9], and rely on a preceding tridiagonalization process. Moreover, the target platforms for classical libraries such as ScaLapack and PLapack are distributed memory machines, and they are not particularly well suited for modern multicore architectures. Plasma [10] is a more recent effort specifically targeting multicore architectures, but its most current version 2.4.1 does not yet contain any routines for computing eigenvectors. In our experiments, we use eigensolvers from ScaLapack for comparison. The only effort so far specifically targeting block tridiagonal eigenproblems without tridiagonalization is the block divide-and-conquer method [11, 12], which has already been parallelized [13]. However, the block divide-and-conquer method is not competitive in terms of performance if highly accurate solutions are required and the ranks of the off-diagonal blocks are high [11]. This has motivated the development of a method for computing eigenvectors of block tridiagonal matrices based on twisted block factorizations without tridiagonalization [2], [1]. Sequential implementations of this method have been developed and evaluated.

Synopsis. In Section 2 the sequential TBF method for computing eigenvectors of a symmetric block tridiagonal matrix is reviewed. In Section 3, several parallelization strategies for the most important component of the TBF method, i.e., the computation of the twisted block factorizations, are introduced and discussed. Section 4 summarizes experimental evaluations of some of these parallelization strategies, and Section 5 concludes the paper.

2 Computing eigenvectors using twisted block factorizations


The problem is defined by a symmetric block tridiagonal matrix with p diagonal blocks:

M := \begin{pmatrix}
       B_1 & A_2^T &        &        &       \\
       A_2 & B_2   & A_3^T  &        &       \\
           & A_3   & \ddots & \ddots &       \\
           &       & \ddots & \ddots & A_p^T \\
           &       &        & A_p    & B_p
     \end{pmatrix} \in \mathbb{R}^{n \times n}                                (1)

where A_i, B_i \in \mathbb{R}^{b \times b} with n = p \cdot b and B_i = B_i^T. Given an (approximation of an) eigenvalue \lambda of M, the corresponding eigenvector x of M can be computed as the solution of

(M - \lambda I)\, x = W x = 0                                                 (2)

with block tridiagonal W, e.g., using an inverse iteration process:

1. Determine starting vector x_0 with \|x_0\| = 1; i := 0
2. Solve (M - \lambda I)\, x_{i+1} = x_i for x_{i+1}
3. Normalize x_{i+1}; i := i + 1
4. If not converged, continue with Step 2
For solving the linear system in Step 2, a factorization of W can be used. For a block tridiagonal matrix, a twisted block factorization [1] can be constructed which has the following structure:

W = P \begin{pmatrix}
        L_1^+ &        &           &     &           &        &       \\
        M_2^+ & \ddots &           &     &           &        &       \\
              & \ddots & L_{i-1}^+ &     &           &        &       \\
              &        & M_i^+     & L_i & M_{i+1}^- &        &       \\
              &        &           &     & L_{i+1}^- & \ddots &       \\
              &        &           &     &           & \ddots & M_p^- \\
              &        &           &     &           &        & L_p^-
      \end{pmatrix}
      \begin{pmatrix}
        U_1^+ & N_1^+  &           &           &           &           &       \\
              & \ddots & \ddots    &           &           &           &       \\
              &        & U_{i-1}^+ & N_{i-1}^+ &           &           &       \\
              &        &           & U_i       &           &           &       \\
              &        &           & N_i^-     & U_{i+1}^- &           &       \\
              &        &           &           & \ddots    & \ddots    &       \\
              &        &           &           &           & N_{p-1}^- & U_p^-
      \end{pmatrix}                                                           (3)

The plus superscripts in Eqn. (3) indicate that these blocks result from a forward factorization, and the minus superscripts that they were calculated in a backward factorization. The combination of the forward and the backward factorization is called a twisted block factorization (TBF). One of the possible TBFs defines a suitable starting vector for the inverse iteration process and is also used in solving the linear system in Step 2 [2]. In total, there are p different twisted block factorization representations (combining i - 1 forward and p - i backward factorization steps for i = 1, ..., p). For determining the defining entities of all TBFs, an entire forward and an entire backward factorization need to be computed, and then the updated diagonal blocks need to be factorized according to the following p equations:

B_i - P_i^+ M_i^+ N_{i-1}^+ - P_{i+1}^- M_{i+1}^- N_i^- = P_i L_i U_i,   i = 1, ..., p.      (4)

The minimal diagonal entry in all U_i can be used to define the starting vector x_0 for the inverse iteration process [2].
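The following NumPy sketch makes the role of Eqn. (4) concrete. It works with the unpivoted Schur-complement form of the recurrences, so the blocks D_plus[k] and D_minus[k] below stand in for the products L_k^+ U_k^+ and L_k^- U_k^- of the forward and backward factorizations, and Delta[i] stands in for P_i L_i U_i; the permutations from pivoting and the explicit M and N factors are omitted for brevity. Function and variable names are ours, not those of the paper's Fortran implementation.

import numpy as np
from scipy.linalg import lu

def twisted_diagonal_blocks(B, A, lam):
    """Unpivoted sketch of the recurrences behind Eqn. (4) for W = M - lam*I.
    B: list of p symmetric diagonal blocks (b x b) of M.
    A: list of p-1 subdiagonal blocks; A[k] sits in block position (k+1, k)
       (0-based block indices), its transpose in (k, k+1).
    Returns the p twisted diagonal blocks Delta_i (standing in for P_i L_i U_i)."""
    p, b = len(B), B[0].shape[0]
    Bs = [Bk - lam * np.eye(b) for Bk in B]              # shifted diagonal blocks

    # forward block elimination; D_plus[k] stands in for L_k^+ U_k^+
    D_plus = [Bs[0]]
    for k in range(1, p):
        D_plus.append(Bs[k] - A[k-1] @ np.linalg.solve(D_plus[k-1], A[k-1].T))

    # backward block elimination; D_minus[k] stands in for L_k^- U_k^-
    D_minus = [None] * p
    D_minus[p-1] = Bs[p-1]
    for k in range(p - 2, -1, -1):
        D_minus[k] = Bs[k] - A[k].T @ np.linalg.solve(D_minus[k+1], A[k])

    # twisted diagonal blocks: combine i-1 forward and p-i backward steps
    Delta = []
    for i in range(p):
        D = Bs[i].copy()
        if i > 0:
            D = D - A[i-1] @ np.linalg.solve(D_plus[i-1], A[i-1].T)
        if i < p - 1:
            D = D - A[i].T @ np.linalg.solve(D_minus[i+1], A[i])
        Delta.append(D)
    return Delta

def best_twist_index(Delta):
    """Heuristic in the spirit of [2]: pick the twist index whose U factor has
    the smallest diagonal entry in magnitude (here via an LU of each Delta_i)."""
    mins = [np.min(np.abs(np.diag(lu(D)[2]))) for D in Delta]
    return int(np.argmin(mins))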

3 Parallelization strategies

In this section, we develop and discuss several possible strategies for parallelizing the computation of twisted block factorizations on multicore architectures. Computing the twisted block factorizations of W by far dominates the computational cost and runtime of the TBF method, and thus we focus on the parallelization of this component. The parallelization strategies are grouped according to the number of cores working on the computation of a single eigenvector.
3.1 One core per eigenvector

The straightforward approach would be to use a single core per eigenvector and to parallelize over different eigenvectors. Theoretically, this achieves almost perfect parallelization. However, in practice, there are some influences which cause a deviation from perfect scaling: initially replicating the block tridiagonal matrix over all cores and finally collecting the computed eigenvectors on one core causes communication overhead; and on the nodes we currently have available, up to four cores share the cache, which can cause memory delays if several processes use the cache at the same time. Last, but not least, orthogonality of computed eigenvectors based on twisted block factorizations may be insufficient for clustered eigenvalues [2], and thus separate reorthogonalization of eigenvectors may be required in some cases. Nevertheless, we use simplified versions of this straightforward approach as reference versions. Two different strategies based on the parallelization over the eigenvectors have been implemented: version0 and version3.

Version0: The eigenvalues are distributed blockwise over the available cores and the matrix is broadcast. Each core computes the eigenvectors defined by its local eigenvalues, and then the eigenvectors of all cores are merged into the eigenvector matrix. No reorthogonalization is performed. The implementation of version0 is based on MPI, including the distribution of the eigenvalues at the beginning and the merging of the eigenvectors at the end (a sketch of this scheme is shown below).

Version3: This strategy tries to exploit the shared memory in a node by using OpenMP. Every process consists of two or four threads, which are used to simultaneously compute several eigenvectors within a process. The number of processes times the number of threads is always equal to the number of cores used, and we do not consider several threads running on a single core.
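As an illustration of the version0 scheme, the following mpi4py sketch distributes the eigenvalues blockwise, broadcasts the matrix blocks, lets every rank compute the eigenvectors defined by its local eigenvalues, and merges the results on rank 0. The routine tbf_eigenvector is a hypothetical stand-in for the sequential TBF eigenvector computation; the actual implementation is written in Fortran with MPI.

from mpi4py import MPI
import numpy as np

def version0(B, A, eigenvalues, tbf_eigenvector):
    """Blockwise distribution of eigenvalues over MPI ranks (version0 sketch).
    B, A and eigenvalues only need to be valid on rank 0;
    tbf_eigenvector(B, A, lam) stands in for the sequential TBF computation."""
    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    # rank 0 splits the eigenvalues blockwise; the matrix blocks are broadcast
    chunks = np.array_split(eigenvalues, nprocs) if rank == 0 else None
    my_eigvals = comm.scatter(chunks, root=0)
    B = comm.bcast(B, root=0)     # diagonal blocks of M
    A = comm.bcast(A, root=0)     # subdiagonal blocks of M

    # each core computes the eigenvectors defined by its local eigenvalues
    my_vecs = [tbf_eigenvector(B, A, lam) for lam in my_eigvals]

    # merge the eigenvectors of all ranks into the eigenvector matrix on rank 0
    parts = comm.gather(my_vecs, root=0)
    if rank == 0:
        return np.column_stack([v for part in parts for v in part])
    return None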

3.2 Two cores per eigenvector

As outlined in Section 2, the p different twisted block factorizations can be computed as one forward and one backward factorization followed by solving the p equations (4). An obvious idea for parallelizing the computation of one eigenvector over two cores is the simultaneous computation of the forward and backward factorization with two processes. This idea leads to five different strategies outlined in the following.

Version1: In this version the computation is also parallelized over different eigenvectors, but always two processes work on the same eigenvector. In particular, the forward and backward factorizations are computed in parallel by two processes. The first process computes the forward factorization of the upper half of W, and simultaneously the second process computes the backward factorization of the lower half of W. Then, the two processes exchange the blocks of the row where the factorizations meet. Process 1 uses these blocks to compute the backward factorization of the upper part of W, and simultaneously Process 2 uses them to compute the forward factorization of the lower part of W. After that, Process 1 has all data of the forward and backward factorization of the upper half of W, and Process 2 has all data of the forward and backward factorization of the lower half of W.

Based on Eqn. (4), these data can be used to calculate all possible twisted factorizations without needing any other data except two blocks which were calculated in the upper or lower half but are necessary for the other part. Both processes exchange these necessary blocks, and Process 1 computes the twisted factorizations of the upper part of W (for all twisted blocks in the upper part), while simultaneously Process 2 computes the twisted factorizations of the lower part of W (for all twisted blocks in the lower part). As a result, Process 1 has all the information about twisted factorizations which meet in the upper half of W, and Process 2 has all the information about twisted factorizations which meet in the lower half of W. Both processes then search for the minimal diagonal entry in the twisted factorizations of their half, compare their minima, and the process with the global minimum starts the back substitution process. Depending on the location of the minimum, the corresponding process first substitutes towards the center, exchanges required information with the other process (which can then start its back substitution process), and then substitutes towards the outside. As a result, the upper half of the eigenvector is computed on Process 1, and the lower half on Process 2 (a sketch of this communication pattern is given after the version descriptions below).

Version2 is an implementation of version1 which uses OpenMP instead of MPI for the parallelization of the forward, backward and twisted factorizations. In this case, each process consists of two threads: one computes the forward factorization, the other one the backward factorization. Finally, both threads compute the twisted factorizations in parallel (exchanging the required data through the shared memory). In version2, the solution of the linear systems (forward and backward substitutions) is not parallelized, because the runtime of this part is insignificant.

Version4 is a combination of version2 and version3. MPI is used to distribute the computation of eigenvectors over different processes. Then, OpenMP is used to create two threads in each process to further distribute the computation of the eigenvectors. Finally, each thread uses OpenMP to create another thread to also work on the same eigenvector. These two threads both work on the forward and backward factorization simultaneously. Then each of the two threads computes half of the twisted factorizations. This implementation requires the nested use of OpenMP, which is not supported by all compilers.

Version5: The parallelization is done as in version4, only the lowest level (the forward, backward and twisted factorizations and the back substitutions for each eigenvector) is parallelized based on MPI instead of OpenMP. Two threads of two different processes use MPI to communicate with each other for the work at the lowest level.

Version6 is a refinement of version2. The inverse iteration and the search for the minimum are also parallelized, using the same threads which previously computed the forward and backward factorization.
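The communication pattern of version1 can be sketched with mpi4py for two ranks as follows. For brevity the sketch reuses the unpivoted Schur-complement recurrences from the earlier sketch (so the exchanged data are single b-by-b blocks rather than the full pivoted factors), and it only covers the factorization phase, not the subsequent twisted factorizations and back substitutions; all names and the data layout are ours. Version2 follows the same pattern but exchanges the meeting-row data through shared memory (OpenMP) instead of messages.

from mpi4py import MPI
import numpy as np

def half_factorizations(B, A, lam, comm):
    """Version1 pattern (sketch, no pivoting): rank 0 owns the upper half of
    W = M - lam*I, rank 1 the lower half. Each rank first runs its 'outward'
    factorization, then the ranks exchange the blocks where the factorizations
    meet and continue with the opposite factorization on their own half."""
    rank = comm.Get_rank()
    p, b = len(B), B[0].shape[0]
    m = p // 2                                   # first block row of the lower half
    Bs = [Bk - lam * np.eye(b) for Bk in B]

    if rank == 0:
        # forward recurrence on block rows 0 .. m-1
        D_plus = [Bs[0]]
        for k in range(1, m):
            D_plus.append(Bs[k] - A[k-1] @ np.linalg.solve(D_plus[-1], A[k-1].T))
        # exchange the blocks where the factorizations meet
        D_minus_m = comm.sendrecv(D_plus[-1], dest=1, source=1)
        # backward recurrence on the upper half, seeded by rank 1's result
        D_minus = {m: D_minus_m}
        for k in range(m - 1, -1, -1):
            D_minus[k] = Bs[k] - A[k].T @ np.linalg.solve(D_minus[k+1], A[k])
        return D_plus, D_minus                   # forward and backward data, upper half
    else:
        # backward recurrence on block rows p-1 .. m
        D_minus = {p - 1: Bs[p - 1]}
        for k in range(p - 2, m - 1, -1):
            D_minus[k] = Bs[k] - A[k].T @ np.linalg.solve(D_minus[k+1], A[k])
        D_plus_prev = comm.sendrecv(D_minus[m], dest=0, source=0)
        # forward recurrence on the lower half, seeded by rank 0's result
        D_plus = {m - 1: D_plus_prev}
        for k in range(m, p):
            D_plus[k] = Bs[k] - A[k-1] @ np.linalg.solve(D_plus[k-1], A[k-1].T)
        return D_plus, D_minus                   # forward and backward data, lower half

The sketch is meant to be run with exactly two MPI ranks, e.g. mpiexec -n 2 python version1_sketch.py (a hypothetical file name).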

3.3 More than two cores per eigenvector

The following strategies use more than two cores for the parallel computation of a single eigenvector.

Version7: We can use k cores to compute the p corresponding twisted factorizations concurrently. The computation of the factorizations is in the first step uniformly distributed between the cores, i.e., core i of the k cores computes TBF 1 + (i - 1)(p - 1)/(k - 1) (assuming that k and p are such that integer division is possible); see the small sketch at the end of this subsection. Then, each core communicates with its neighbors to compute the remaining twisted factorizations based on Eqn. (4). The larger the number of cores, the more computations are done redundantly, but less communication is required. We are currently investigating which number of cores reaches the best efficiency for given problem parameters n, p.

Version8: Another possibility to use up to four cores is the simultaneous computation of the twisted, the forward and the backward factorization. One process starts with the computation of the forward factorization, while another process starts with the computation of the backward factorization (in contrast to version1, each of them calculates the whole forward or backward factorization, respectively). After each factorization step, each of these two processes sends the result to another process (Process 1 sends to Process 3 and Process 2 sends to Process 4). After both have reached the center of the matrix, they change the target process for the results (Process 1 sends to Process 4 and Process 2 sends to Process 3). At the beginning, Processes 3 and 4 have no tasks except receiving the results. After the center of the matrix has been reached, each further result allows each of the two receiving processes to calculate one twisted factorization. Thus, once half of the forward and backward factorization has been finished, all four processes can work simultaneously.

Version9: Another parallelization strategy is the use of parallel basic operations inside the twisted block factorization, such as LU factorizations or the solution of systems of equations. Efficient parallel implementations of these operations are provided in PLASMA. Open questions are how many cores can be used for the parallelization of the basic operations and for which block sizes this method becomes competitive.

Version10: The last and most interesting parallelization strategy for more than two cores is the investigation of a tiled approach for the process of computing all twisted block factorizations. For this purpose, all blocks are split into smaller blocks (called tiles). The result of one tile is used in the computation of one tile of the next block while the next tile of the first block is computed. This strategy constructs a pipeline with very small blocks where multiple processes can compute on different blocks concurrently, even before the basic operations on the blocks involved have finished. Important questions for further research are how much overlap between the independent operations is possible in principle and how efficiently this strategy can be implemented. Factors which influence the performance will be the size of the tiles, the block size and the degree of overlap (which depends on the other factors and the number of cores).
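The initial work distribution of version7 follows directly from the formula above; the small sketch below (names ours) lists, for each of the k cores, the twist index it computes first.

def version7_start_indices(p, k):
    """Initial (uniform) assignment of twist indices to cores in version7:
    core i (1-based) first computes the TBF with twist index
    1 + (i - 1)*(p - 1)/(k - 1), assuming (p - 1) is divisible by (k - 1).
    The remaining p - k twisted factorizations are then filled in from the
    neighboring cores' forward/backward data via Eqn. (4)."""
    assert k >= 2 and (p - 1) % (k - 1) == 0
    return [1 + (i - 1) * (p - 1) // (k - 1) for i in range(1, k + 1)]

# example: p = 13 block rows on k = 4 cores -> twist indices [1, 5, 9, 13]
print(version7_start_indices(13, 4))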

4 Numerical experiments

So far, we have implemented versions 0, 1, 2, 3 and 6. The versions which use only one core per eigenvector are included as references for the other methods in order to quantify the parallelization overhead of using more than one core per eigenvector. We summarize evaluations of the parallel efficiency of these parallelization strategies using up to two cores per eigenvector on an Intel i7-860 CPU with 2.8 GHz and 8 GB main memory, using the GNU Fortran 4.4.3 compiler. The test system also offers the turbo boost and hyperthreading technology. In all experiments the runtimes for computing all n eigenvectors of a random symmetric block tridiagonal matrix were measured. The parallel efficiency (sometimes only called efficiency in the following) was computed by dividing the speedup (runtime of the sequential program over the runtime of the parallel program) by the number of cores used. The numerical accuracy of the parallel versions is identical to that of the sequential implementation, which has already been analysed in [2].
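For reference, the efficiency metric used in all plots can be written as a one-liner; the numbers in the example are purely illustrative and not measured values.

def parallel_efficiency(t_seq, t_par, ncores):
    """Parallel efficiency as used in the experiments: speedup (sequential
    runtime over parallel runtime) divided by the number of cores used."""
    return (t_seq / t_par) / ncores

# illustrative example: 100 s sequentially, 52 s on 2 cores -> about 0.96
print(parallel_efficiency(100.0, 52.0, 2))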

[Figure 1: Parallel efficiency of various parallelization strategies (version0, version1, version2, version3, version6) for computing all eigenvectors on two cores (b = 10, matrix sizes n vary).]

Varying the matrix dimension. The first evaluation of the performance is illustrated in Figure 1 for different matrix dimensions and fixed block size b = 10. We see that version1 achieves the highest efficiency of the versions which use two cores per eigenvector. The OpenMP implementations perform quite well (version6 is a little bit better than version2), but they do not achieve the same efficiency as the MPI version.

Varying the block size. In Figure 2 the matrix size is 8000 and different block sizes are illustrated for two cores. We can see that the efficiency strongly depends on the block size. For very small blocks the efficiency of the two-cores-per-eigenvector strategies is much lower. Version1 reaches the best efficiency for block size 10 and stays at almost the same level for all larger block sizes. The efficiency of version2 and version6 increases with the block size, and version6 becomes as efficient as version1 for block sizes greater than or equal to 60. Note that except for very small block and matrix sizes, the parallel efficiency of version1 is in the worst case only 5% lower than that of the trivial parallelizations version0 and version3.

[Figure 2: Parallel efficiency of various parallelization strategies (version0, version1, version2, version3, version6) for computing all eigenvectors on two cores (n = 8000, block sizes b vary).]

Turbo boost and hyperthreading. We also investigated the influence of the turbo boost technology and of hyperthreading on the efficiency of the parallelization strategies. On the test system each core normally runs with a clock frequency of 2.8 GHz, but with turbo boost one core can be accelerated up to 3.46 GHz, two cores up to 3.33 GHz, and four cores up to 2.93 GHz (see http://download.intel.com/newsroom/kits/embedded/pdfs/Core i7-860 Core i5-750.pdf). Using turbo boost on more cores reduces the average performance of all cores and will therefore decrease the efficiency. Eqns. (5) and (6) estimate the theoretical slow-down caused by this effect on two cores (c_2) and on four cores (c_4):

c_2 = \frac{2.8 + 0.53}{2.8 + 0.66} \approx 96.24\%                           (5)

c_4 = \frac{2.8 + 0.13}{2.8 + 0.66} \approx 84.68\%                           (6)


Eqn. (6) could be confirmed experimentally with an error tolerance of up to 6%. Because of the relatively small slow-down for two cores, the error tolerance is larger than the estimated change caused by the turbo boost. In the experiments investigating the influence of the hyperthreading technology, three cores were disabled to guarantee that only one physical core was used. In Figure 3 we compare the speedup (over the sequential version) achieved by the different parallelization strategies with hyperthreading for different matrix sizes and block size 10 to the speedup achieved on two cores without hyperthreading. We see that using hyperthreading for simulating two cores yields definitely worse performance than achieved on two physical cores, but all variants (with or without hyperthreading) achieve a speedup over the sequential version, although the latter is also a very efficient program based on Lapack and (ATLAS-)Blas. With hyperthreading, version0 is the winner, whereas version3 is not able to use the additional thread as well. Version6 is as good as version1, and for large matrix sizes it is even better. Version2 shows the worst performance in almost all cases.
[Figure 3: Parallel speedup of parallelization strategies (over the sequential variant on a single core) on two cores without hyperthreading and on one core with hyperthreading (b = 10, matrix sizes n vary); each of version0, version1, version2, version3, version6 is shown with and without HT.]


5 Conclusions

Several parallelization strategies for computing eigenvectors of block tridiagonal matrices based on twisted block factorizations have been discussed. The excellent parallel efficiency of the trivial parallelization over the eigenvectors was almost matched by more sophisticated parallelization strategies which utilize two cores per eigenvector and are based on the idea of computing the forward and backward factorizations in parallel. In particular, version1 achieved the highest parallel efficiency of all strategies which utilize two cores per eigenvector for all tested matrix and block sizes. Larger block sizes tend to lead to higher parallel efficiency.

This work has focused on the efficient parallel computation of a single eigenvector with two cores. We are currently working on the strategies mentioned above which utilize more than two cores per eigenvector; these have a higher potential for scaling with the growing numbers of cores expected on future multicore architectures.

Acknowledgements. This work has been partly supported by the Austrian Science Fund (FWF) under contract S10608 (NFN SISE).

References
[1] W. N. Gansterer and G. König, "On twisted factorizations of block tridiagonal matrices," Procedia Computer Science, vol. 1, no. 1, pp. 279-287, 2010.
[2] G. König, M. Moldaschl, and W. N. Gansterer, "Computing eigenvectors of block tridiagonal matrices based on twisted block factorizations," Journal of Computational and Applied Mathematics, 2011, in press.
[3] Y. Bai and R. C. Ward, "Parallel block tridiagonalization of real symmetric matrices," J. Parallel Distrib. Comput., vol. 68, pp. 703-715, 2008.
[4] L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. W. Demmel, I. Dhillon, J. J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLapack Users' Guide. Philadelphia, PA: SIAM Press, 1997.
[5] R. van de Geijn, Using PLapack: Parallel Linear Algebra Package. Cambridge, MA: The MIT Press, 1997.

[6] C. Vömel, "ScaLAPACK's MRRR algorithm," ACM Trans. Math. Softw., vol. 37, pp. 1:1-1:35, January 2010.
[7] P. Bientinesi, I. S. Dhillon, and R. A. van de Geijn, "A parallel eigensolver for dense symmetric matrices based on multiple relatively robust representations," SIAM J. Sci. Comput., vol. 27, pp. 43-66, 2005.
[8] F. Tisseur and J. Dongarra, "A parallel divide and conquer algorithm for the symmetric eigenvalue problem on distributed memory architectures," SIAM J. Sci. Comput., vol. 20, pp. 2223-2236, 1999.
[9] I. S. Dhillon, B. N. Parlett, and C. Vömel, "The design and implementation of the MRRR algorithm," ACM Trans. Math. Softw., vol. 32, pp. 533-560, 2006.
[10] E. Agullo et al., PLASMA Users' Guide, Version 2.0, November 10, 2009. [Online]. Available: http://icl.cs.utk.edu/projectsfiles/plasma/pdf/users_guide.pdf
[11] W. N. Gansterer, R. C. Ward, R. P. Muller, and W. A. Goddard, III, "Computing approximate eigenpairs of symmetric block tridiagonal matrices," SIAM J. Sci. Comput., vol. 25, pp. 65-85, 2003.
[12] W. N. Gansterer, R. C. Ward, and R. P. Muller, "An extension of the divide-and-conquer method for a class of symmetric block-tridiagonal eigenproblems," ACM Trans. Math. Softw., vol. 28, pp. 45-58, 2002.
[13] Y. Bai and R. C. Ward, "A parallel symmetric block-tridiagonal divide-and-conquer algorithm," ACM Trans. Math. Softw., vol. 33, 2007.
