TSQR
Abstract
The Tall and Skinny QR (TSQR) factorization algorithm computes the factorization $A = QR$ of a matrix with many more rows than columns by splitting it into row blocks. We describe the sequential algorithm, a multithreaded variant, and the cache behavior that motivates the blocking.
I. Introduction
The Tall and Skinny QR (TSQR) factorization algorithm produces the factorization $A = QR$ of a matrix $A$, where $Q$ is orthogonal and $R$ is upper triangular. It is designed for the case where $A$ is a tall and skinny matrix: if $A \in \mathbb{R}^{m \times n}$, then $A$ is considered tall and skinny when $m \gg n$. These matrices have many applications, one of which is solving a least squares problem $Ax = b$, where $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. Consider the application of this to a linear regression. In that case the matrix has one row per observation and one column per variable in the model. This matrix is very tall and skinny, since the number of observations ($m$) often exceeds several thousand whereas the number of variables in the model ($n$) is usually less than 10. The algorithm works by taking the matrix $A$ and splitting it into blocks with an equal number of rows. In the sequential case with no multithreading the algorithm is represented by the following pseudocode.

Algorithm 1. Single-threaded TSQR
    B[] = array with B[i] the i-th block of A
    M = B[1]
    for i = 2 to numBlocks
        concatenate B[i] below M
        compute the QR factorization M = QR
        M = R
    end for
    return M
The resulting M at the end of this algorithm is the R factor from the factorization $A = QR$. The algorithm lends itself well to being run concurrently. Instead of computing the R matrices sequentially from top to bottom in one thread, the algorithm is altered so that, with $k$ threads, each thread is responsible for $m/k$ rows, which it splits into blocks and processes using the sequential algorithm. At the end of the parallel phase there are $k$ R matrices that need to be combined, which is done sequentially using the original sequential TSQR algorithm.
The idea behind using the TSQR algorithm is that very large matrices may not fit in memory. To prevent the matrix from spilling onto the hard drive (which would make the QR factorizations much slower), the matrix is split into blocks. When a good block size is chosen, most of each block should be able to remain in the cache, so that the QR factorization can be computed with minimal reads and writes to slower memory and the hard drive. If the original matrix $A$ is small enough that it already fits in the cache without being sectioned into blocks, then this algorithm should not be used over simply calling the regular QR procedure in the programming language's matrix library.