
Research Article

Received: 6 August 2013, Revised: 11 December 2013, Accepted: 11 December 2013, Published online in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/cem.2589

The geometry of PLS1 explained properly: 10 key notes on mathematical properties of and some alternative algorithmic approaches to PLS1 modelling

Ulf G. Indahl*

The insights from, and conclusions of, this paper motivate efficient and numerically robust 'new' variants of algorithms for solving the single response partial least squares regression (PLS1) problem. Prototype MATLAB code for these variants is included in the Appendix. The analysis of and conclusions regarding PLS1 modelling are based on a rich and nontrivial application of numerous key concepts from elementary linear algebra. The investigation starts with a simple analysis of the nonlinear iterative partial least squares (NIPALS) PLS1 algorithm variant computing orthonormal scores and weights.
A rigorous interpretation of the squared P-loadings as the variable-wise explained sum of squares is presented. We show that the orthonormal row-subspace basis of W-weights can be found from a recurrence equation. Consequently, the NIPALS deflation steps of the centered predictor matrix can be replaced by a corresponding sequence of Gram–Schmidt steps that compute the orthonormal column-subspace basis of T-scores from the associated non-orthogonal scores.
The transitions between the non-orthogonal and orthonormal scores and weights (illustrated by an easy-to-grasp commutative diagram), respectively, are both given by QR factorizations of the non-orthogonal matrices. The properties of singular value decomposition combined with the mappings between the alternative representations of the PLS1 'truncated' X data (including $P^tW$) are taken to justify an invariance principle to distinguish between the PLS1 truncation alternatives. The fundamental orthogonal truncation of PLS1 is illustrated by a Lanczos bidiagonalization type of algorithm where the predictor matrix deflation is required to be different from the standard NIPALS deflation.
A mathematical argument concluding the PLS1 inconsistency debate (published in 2009 in this journal) is also presented. Copyright © 2014 John Wiley & Sons, Ltd.

Keywords: PLS1 algorithms; bidiagonalization; orthogonal and non-orthogonal weights; scores and projections; change of coordinates and bases; truncation; QR factorization; singular value decomposition; reorthogonalization

1. INTRODUCTION

In the chemometrics community, single response partial least squares regression (PLS1) has been a popular tool for solving regression problems with multicollinear data for more than 30 years; see [1–4]. Over the years, several papers focusing on interpretations and various theoretical aspects of PLS1 have been published (see [5–14] for a selection). However, as late as 2007, Pell et al. [15] published a paper claiming inconsistencies in the residuals of conventional PLS1 regression due to the truncation (projection) implied by the nonlinear iterative partial least squares (NIPALS) PLS1 algorithm. The claimed inconsistency resulted in a debate (published in Journal of Chemometrics 2009; 23, pages 67–77, see [16–19]) between some of the most influential contributors to the field, without reaching a unanimous conclusion.

The purpose of the present paper is to clarify unnecessary misunderstandings by pinpointing some important but often overlooked mathematical properties of PLS1, with particular focus on

(a) the extraction of orthonormal row-subspace and column-subspace bases for computing various orthogonal projections of the observed data;
(b) its close relationship to one of the fundamental problems of numerical linear algebra: the computation of singular values by bidiagonalization of the matrix subject to investigation;
(c) interpretations related to the applications of the PLS1 method by considering the various coordinate representations of the data with respect to the orthonormal row-subspace and column-subspace bases;
and last but not least
(d) a mathematical conclusion of the PLS1 inconsistency debate by considering the key orthogonal and non-orthogonal projections involved in PLS1 model building.

The main source of inspiration for the work presented subsequently was found in the two publications [12,18] by Ergon.

* Correspondence to: Ulf G. Indahl, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, N-1432 Ås, Norway. E-mail: [email protected]

Ulf G. Indahl
Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, N-1432 Ås, Norway
2. THE NIPALS PLS1 ALGORITHM WITH ORTHONORMAL SCORES

The widely applied NIPALS PLS1 algorithm with orthogonal (but not normalized) scores [1–3] is usually considered as the benchmark for comparison of other algorithmic approaches to PLS1 modelling. According to [20], the NIPALS PLS1 is relatively slow but numerically stable in most practical situations. The main reason for its lack of speed is the extensive data matrix deflation that requires computation of the outer products between each extracted component and the corresponding loadings. In the typical applications of PLS, where the number of predictors is large compared to the number of observations, deflation of the predictor matrix is a computationally expensive way to extract the desired sets of orthogonal PLS1 scores (and weights).

To make the mathematics and interpretations as transparent as possible, we will focus on the version of the NIPALS PLS1 algorithm (algorithm 1) that computes orthonormal scores (orthogonal unit vectors). As usual, we will assume that $X_0$ is the mean-centered version of the $n \times m$ predictor matrix $X$ and that $y$ is an n-dimensional response vector where the entries are associated with the corresponding rows of $X$.
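A compact MATLAB sketch of such an orthonormal-score NIPALS iteration is given below. The function name, the output conventions and the step numbering in the comments are illustrative (the numbering is chosen to match the references to steps 1 and 6 in the notes that follow):

function [b, T, W, P, q] = pls1_nipals(X0, y, A)
% NIPALS PLS1 with orthonormal scores (algorithm 1); prototype sketch.
n = size(X0, 1); m = size(X0, 2);
T = zeros(n, A); W = zeros(m, A); P = zeros(m, A); q = zeros(A, 1);
X = X0;                            % the deflated predictor matrix X_{a-1}
for a = 1:A
    v = X'*y;                      % step 1: weight direction v_a
    w = v/norm(v);                 % step 2: orthonormal weight w_a
    tau = X*w;                     % step 3: score direction tau_a
    t = tau/norm(tau);             % step 4: orthonormal score t_a
    p = X'*t;                      % step 5: X-loading p_a and y-loading q_a
    q(a) = y'*t;
    X = X - t*p';                  % step 6: deflation of the predictor matrix
    W(:, a) = w; T(:, a) = t; P(:, a) = p;
end
b = W*((P'*W)\q);                  % regression coefficients b = W(P'W)^{-1}q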
According to the conventional PLS1 terminology and properties, the matrices of scores ($T$) and weights ($W$) are both orthogonal, that is, $T^tT = W^tW = I_A$ (the $A \times A$ identity matrix). Hence, the associated vectors represent orthonormal bases for the PLS1 column and row subspaces, respectively, with $TT^t$ and $WW^t$ as the associated orthogonal projections. The column vectors of $P$ and the entries of $q$ are referred to as the corresponding X- and y-loadings of the associated PLS1 model.

By application of elementary linear algebra to the various parts of the NIPALS algorithm described above, we are going to establish a sequence of fundamental PLS1 modelling properties listed as notes in the following sections.

3. PLS1 WITHOUT THE DEFLATION STEP OF NIPALS

Note 1: explained variance by P-loadings

The X-loadings identity $P = X_0^tT$ follows from

$$X_0^tt_a = X_{a-1}^tt_a + (X_0 - X_{a-1})^tt_a = X_{a-1}^tt_a = p_a \quad (1)$$

because the column space of the matrix $(X_0 - X_{a-1})$ is spanned by the orthonormal subset $\{t_1, \ldots, t_{a-1}\}$ of scores that are all orthogonal to $t_a$.

Each row of $P$ represents the coordinates of the corresponding $X_0$-column w.r.t. the orthonormal column-subspace basis associated with $T$. Hence, $P$ can also be referred to as the projected variables coordinate matrix.

The projection of the i-th column $x_i$ of $X_0$ onto the direction of the score basis vector $t_a$ equals $x_{ia} = t_at_a^tx_i = \left(x_i^tt_a\right)t_a$, and the coordinate value $p_a(i) = x_i^tt_a$ relates directly to the amount of $x_i$-variance $\left[= \frac{1}{n}x_i^tx_{ia}\right]$ accounted for by the a-th PLS1 component (i.e. $t_a$), that is,

$$\frac{1}{n}x_{ia}^tx_{ia} = \frac{1}{n}\left(x_i^tt_a\right)^2 = \frac{1}{n}p_a(i)^2 \quad (2)$$

Hence, with orthonormal scores, the explained sum of squares corresponding to the i-th variable ($x_i$, $1 \le i \le m$) accounted for by the PLS1 model is found by squaring and adding all ($A$) entries in the i-th row of $P$.
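In MATLAB terms, Equation (2) turns this per-variable decomposition into a one-liner. A sketch, assuming the loading matrix P and the centered predictor matrix X0 from an orthonormal-score PLS1 fit are available in the workspace (the 1/n factors cancel in the ratio):

ssExplained = sum(P.^2, 2);                 % row-wise squared P entries, Equation (2)
ssTotal = sum(X0.^2, 1)';                   % total centered sum of squares per variable
fractionExplained = ssExplained./ssTotal;   % fraction of each variable explained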
Note 2: W-weights recurrence equation gives NIPALS alternative without deflation

By step 1 in algorithm 1, the vectors $v_a^t = y^tX_{a-1}$ ($a = 1, \ldots, A$). Left multiplication by $y^t$ in step 6 of algorithm 1 gives the following identities and implications:

$$y^tX_a = y^tX_{a-1} - y^tt_ap_a^t \;\Rightarrow\; v_{a+1} = v_a - \left(y^tt_a\right)p_a = v_a - \frac{\|v_a\|}{\|\tau_a\|}p_a$$

$$v_{a+1} = \|v_a\|\left(w_a - \frac{1}{\|\tau_a\|}p_a\right) \quad (3)$$

By normalization of $v_{a+1}$, we obtain $w_{a+1} = \frac{1}{\|v_{a+1}\|}v_{a+1}$. Hence, (3) followed by normalization defines a recurrence equation for computing the orthonormal PLS1 weights. The associated nested sequence of vector equations can be entirely solved from the starting vectors $w_1$, $t_1$, $p_1$ ($a = 1$) and norms $\|v_1\|$, $\|\tau_1\|$ that are all available before execution of the first deflation step in the NIPALS algorithm. Note that Equation (3) with the succeeding normalization is equivalent to the content of lemma 2 in [21].

For $a > 1$, a trivial projection argument shows that the score vector $t_a = \frac{1}{\|\tau_a\|}\tau_a$ is obtained from a Gram–Schmidt (GS) orthogonalization step of the vector $\tau_a^\ast = X_0w_a$ with respect to the preceding orthonormal scores $\{t_1, \ldots, t_{a-1}\}$ that form a basis for the column space of $(X_0 - X_{a-1})$. Because the corresponding loading vector can be found directly by $p_a = X_0^tt_a$ according to note 1, $w_{a+1}$ can (by induction) be found without executing the deflation of $X_0$.

From notes 1 and 2, we can now establish a mathematically equivalent PLS1 algorithm (algorithm 2) where the deflation of $X_0$ is replaced with a GS step to obtain the orthonormal scores; a prototype MATLAB implementation is given in Appendix A.1.
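The recurrence can be illustrated numerically. A sketch, assuming a centered predictor matrix X0 and a centered response vector y in the workspace (all names are illustrative): the weight obtained from one NIPALS deflation step and the weight obtained from Equation (3) agree up to rounding.

v1 = X0'*y; w1 = v1/norm(v1);               % first weight
tau1 = X0*w1; normtau1 = norm(tau1);
t1 = tau1/normtau1;                         % first orthonormal score
p1 = X0'*t1;                                % first loading
X1 = X0 - t1*p1';                           % NIPALS deflation step
wDefl = X1'*y; wDefl = wDefl/norm(wDefl);   % second weight from the deflated matrix
v2 = norm(v1)*(w1 - p1/normtau1);           % recurrence, Equation (3)
wRec = v2/norm(v2);                         % equals wDefl up to rounding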

Note 3: the W-weights recurrence equation shows that $P^tW$ is bidiagonal

By a re-arrangement of Equation (3), the X-loadings

$$p_a = \|\tau_a\|\left(w_a - \frac{\|v_{a+1}\|}{\|v_a\|}w_{a+1}\right), \quad a = 1, \ldots, A < m \quad (4)$$

appear as linear combinations of exactly two successive weights. These vector equations of (4) are compactly expressed by the matrix product

$$P = W_+B_2 \quad (5)$$

where the coordinate matrix

$$B_2 = \begin{bmatrix} \|\tau_1\| & & & \\ -\|\tau_1\|\frac{\|v_2\|}{\|v_1\|} & \|\tau_2\| & & \\ & -\|\tau_2\|\frac{\|v_3\|}{\|v_2\|} & \ddots & \\ & & \ddots & \|\tau_A\| \\ & & & -\|\tau_A\|\frac{\|v_{A+1}\|}{\|v_A\|} \end{bmatrix} \quad (6)$$

of $P$ is $\left((A+1) \times A\right)$ lower bidiagonal, and the corresponding orthonormal basis vectors are arranged in the augmented $m \times (A+1)$ weight matrix $W_+ = [W\ w_{A+1}]$. Consequently,

$$P^tW = B_2^t\left(W_+^tW\right) = \begin{bmatrix} \|\tau_1\| & -\|\tau_1\|\frac{\|v_2\|}{\|v_1\|} & & \\ & \|\tau_2\| & \ddots & \\ & & \ddots & -\|\tau_{A-1}\|\frac{\|v_A\|}{\|v_{A-1}\|} \\ & & & \|\tau_A\| \end{bmatrix} \quad (7)$$

is upper bidiagonal and of size $A \times A$. Note that $P^tW$ equals the transposed coordinates of $P$ truncated to the $A$ basis (column) vectors of $W$. According to (7), $P^tW$ can therefore be found directly from the normalizing constants of the orthogonal scores ($T$) and weights ($W$) only.

Note 4: some computational issues in algorithms 1 and 2

The main difference between algorithms 1 and 2 is that the computationally 'expensive' deflation (step 6) in algorithm 1 (involving the computation and subtraction of A large vector outer products) is accounted for by a considerably less expensive GS orthogonalization (step 4) in algorithm 2 (involving the indicated computationally 'moderate' matrix–vector products only). Consequently, an implementation of algorithm 2 executes considerably faster than the NIPALS PLS1. The GS orthogonalization step replacing the deflation step of algorithm 1 indicates that the numerical robustness in proper implementations of the two algorithms should be similar.

The competitiveness of algorithm 2 with the fastest 'stable' PLS1 algorithms discussed in [20] will not be investigated in detail in the present paper, but some comments on speed and precision are given in Section 5. A prototype MATLAB implementation of algorithm 2 is given in Appendix A.1.

4. ADDITIONAL NOTES ON MATHEMATICS AND ALGORITHMIC FACETS OF PLS1

The vectors $\tau_a^\ast$ of algorithm 2 coincide with the non-orthogonal scores found by Martens' alternative PLS1 algorithm (see frame 3.5 in [3]) that was shown by Helland [7] to be mathematically equivalent to the NIPALS algorithm: both algorithms share the same set of orthogonal weights.

Note 5: alternative coordinates and interpretations by elementary linear algebra

By inspection of algorithm 2, the orthonormal scores $T = [t_1\ t_2\ \cdots\ t_A]$ are successively derived from the non-orthogonal scores $T^\ast = X_0W = [\tau_1^\ast\ \tau_2^\ast\ \cdots\ \tau_A^\ast]$ by the GS-orthonormalization steps (4 and 5) establishing an orthonormal basis for the subspace spanned by the non-orthogonal scores.

It should not be ignored that the rows of the non-orthogonal score matrix $T^\ast$ represent coordinates of the observations with respect to the orthonormal row-subspace basis $W = [w_1\ w_2\ \cdots\ w_A]$ of PLS1 weights (the coordinate interpretation of the $T^\ast$ entries is valid because we compute inner products between the $X_0$ observations and the $W$ basis vectors). From the latter interpretation (focusing on the rows of observations), it makes sense to refer to $T^\ast$ as the Martens coordinate matrix w.r.t. the basis $W$.

The associated orthogonal projection of the centered data matrix $X_0$ onto the row subspace spanned by the orthonormal basis $W$ results in the truncation $X_{0tr}$ given by

$$X_{0tr} = T^\ast W^t = X_0WW^t \quad (8)$$

with respect to the original coordinates. Note that $T^\ast = X_0W = X_0WW^tW = X_{0tr}W$. Hence, the GS steps for deriving the orthonormal scores from the Martens coordinate matrix imply the existence of an invertible upper triangular ($A \times A$) matrix $D_2$ that when paired with $T$ represents a QR factorization of $T^\ast$ (compare with section 3 in [18]),

$$T^\ast\ [= X_0W] = TD_2 \quad (9)$$

Left multiplication by $T^t$ in Equation (9) solves for $D_2$ as follows:

$$T^tT^\ast = T^tX_0W = T^tTD_2 = D_2 \quad (10)$$

Thus, with respect to the orthonormal column-subspace basis $T$, an alternative set of (untruncated) coordinates for the non-orthogonal scores (the columns of $T^\ast$) is given by the corresponding columns of

$$D_2 = T^tX_0W = P^tW \quad (11)$$

which is a bidiagonal matrix according to note 3. Consequently, in PLS1 modelling, the matrix product $P^tW$ has two different coordinate interpretations.

Right multiplication of $T^\ast$ by $D_2^{-1}$ in the preceding QR factorization (9) gives

$$T = T^\ast D_2^{-1} = T^\ast\left(T^tT^\ast\right)^{-1} = X_0WD_2^{-1} = X_0W^\ast \quad (12)$$

where

$$W^\ast = WD_2^{-1} = W\left(T^tT^\ast\right)^{-1} = W\left(P^tW\right)^{-1} \quad (13)$$

is the matrix of corresponding non-orthogonal weights. $W^\ast$ coincides with the (non-orthogonal) weights matrix directly computed by the mathematically equivalent SIMPLS algorithm [22] (in the case of a single response vector $y$). Finally, a left multiplication of Equation (13) by $W^t$ shows that

$$D_2^{-1} = W^tW^\ast \quad (14)$$

The basic algebraic properties of PLS1 (just pointed out in notes 4 and 5) are illustrated by the commutative diagram shown in Figure 1.

[Figure 1. Commutative diagram showing the elementary linear algebra of PLS1 modelling. Arrows indicate multiplication from the right by the corresponding matrices. (All directed paths in the diagram with the same start and endpoints lead to the same result by composition.)]

Note 6: QR factorization of the non-orthogonal $W^\ast$-weights

The inverse of an upper triangular matrix is always upper triangular. Because $W$ is orthonormal ($W^tW = I_A$), the first identity $W^\ast = WD_2^{-1}$ of Equation (13) represents a QR factorization of the non-orthogonal weight matrix $W^\ast$ because $D_2^{-1}$ is upper triangular.
Note 7: everything can be calculated if you know $W$ or $W^\ast$

Any PLS1 algorithm (mathematically equivalent to the NIPALS algorithm) computing either the orthonormal weights $W$ or the corresponding non-orthogonal weights $W^\ast$ can easily be modified to output the complete set of relevant matrices/vectors and regression coefficients ($T$, $T^\ast$, $W$, $W^\ast$, $P$, $D_2$, $q$, $b$) by using the appropriate QR factorization (Equation 9 or 13) followed by direct computation of the required matrix products.
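For instance, starting from the orthonormal weights, the complete model can be recovered along the lines of the following sketch (assuming centered X0 and y and an orthonormal weight matrix W in the workspace; the sign fix is needed because MATLAB's qr is unique only up to column signs):

Tstar = X0*W;                       % non-orthogonal (Martens) scores, Equation (9)
[T, D2] = qr(Tstar, 0);             % economy-size QR factorization: Tstar = T*D2
s = sign(diag(D2));                 % enforce a positive diagonal in D2
T = T*diag(s); D2 = diag(s)*D2;
P = X0'*T;                          % X-loadings (note 1)
q = T'*y;                           % y-loadings
Wstar = W/D2;                       % non-orthogonal weights, Equation (13)
b = W*((P'*W)\q);                   % regression coefficients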
Note 8: orthogonal projections, a skew projection in disguise and singular value decomposition consistency

The orthonormal bases of $W$ and $T$ span subspaces of the row and column spaces of $X_0$, respectively. Correspondingly, a complete orthogonal truncation of $X_0$ includes the projections of $X_0$ onto both subspaces, that is,

$$TT^tX_0WW^t = T\left(P^tW\right)W^t = X_0W\left(P^tW\right)^{-1}\left(P^tW\right)W^t = X_0WW^t = X_{0tr} \quad (15)$$

Hence, the suggested bi-orthogonal projection of (15) is equivalent to applying the (row subspace) orthogonal projection (8), obtained by right multiplication of $X_0$ with $H_{tr} = WW^t$ (compare with section 2 in [12]), only. The truncation $X_{0tr}$ is indicated in the center of the commutative diagram shown in Figure 1. $H_{tr}$ is trivially an orthogonal projection (idempotent and symmetrical).

According to equation (15) and note 3, $T^tX_{0tr}W = P^tW$ is bidiagonal with $T\left(P^tW\right)W^t$ as the corresponding bidiagonalization of $X_{0tr}$. According to the fundamental relationships used for computing singular values in the famous paper by Golub and Kahan [23], the bidiagonal matrix $P^tW$, the Martens coordinates $T^\ast$ (alias the non-orthogonal scores) and the truncated data matrix $X_{0tr}$ all share the same set of nonzero singular values. Consequently, the singular values of the bidiagonal matrix $D_2 = P^tW$ describe (up to a scaling factor) the variances along the principal axes of both the Martens coordinates $T^\ast$ and the truncated data $X_{0tr}$, and confirm consistency between these alternative simplifications of the original data.

It is a non-intuitive fact that truncation of $X_0$ by the left orthogonal projection $TT^t$ is equivalent to an alternative double projection of $X_0$ that collapses to application of the right projection $H_{str} = W\left(P^tW\right)^{-1}P^t = W^\ast P^t$, that is,

$$TT^tX_0 = TP^t = TT^tX_0W^\ast P^t = TT^tX_0W\left(P^tW\right)^{-1}P^t = X_0W\left(P^tW\right)^{-1}P^t = X_0W^\ast P^t \quad (16)$$

The fact that $H_{str}$ is a projection (idempotent) is easily checked, that is,

$$H_{str}^2 = W\left(P^tW\right)^{-1}P^tW\left(P^tW\right)^{-1}P^t = W\left(P^tW\right)^{-1}P^t = H_{str}$$

However, in general, $H_{str}$ is not symmetrical ($H_{str} \ne H_{str}^t$) and therefore not orthogonal. Consequently, the left orthogonal projection $X_{0str} = TP^t$ of $X_0$, most commonly focused on in PLS1 modelling according to Pell et al. [15] and Wold et al. [16], turns out to be a non-orthogonal (skew) right projection 'in disguise'. Our intuition with respect to 'truncations' as obtained by the NIPALS algorithm must therefore be handled with some care. It should be noted that in Equation (29d) of the noteworthy paper by Phatak and de Jong [24], the projection matrix $(I - H_{str})$ appears and is recognized to represent a skew (oblique) projection for computing the PLS residual data matrix.

The singular values of the skew truncation $X_{0str}$ will in general differ from the singular values of $P^tW$, $T^\ast$ and the orthogonal truncation $X_{0tr}$. This is due to the fact that the first score vector $\tau_{A+1}^\ast$ (in the non-exhaustive case) not included in the PLS1 model is non-orthogonal to the subspace spanned by the orthogonal scores $T$. The difference between the two truncations is exactly accounted for by the fact that $X_{0str}$ also includes the projection of $\tau_{A+1}^\ast$ onto the T-subspace. Consequently, the nonzero singular values of $P^t = \left(P^tW_+\right)W_+^t$, $X_{0str} = TP^t$ and $TP^tW_+ = TT^tT_+^\ast$, where the last expression denotes the projection of the augmented matrix $T_+^\ast = [T^\ast\ \tau_{A+1}^\ast] = X_0W_+$ of non-orthogonal scores, are all identical and confirm consistency between these data representations.

Note 9: a recursion formula for upper bidiagonal inverses

According to a simple recursion formula for the inverse $U^{-1} = [\tilde u_1\ \tilde u_2\ \cdots\ \tilde u_A]$ of upper bidiagonal matrices of the form

$$U = \begin{bmatrix} d_1 & -b_1 & & & \\ & d_2 & -b_2 & & \\ & & \ddots & \ddots & \\ & & & d_{A-1} & -b_{A-1} \\ & & & & d_A \end{bmatrix} \quad (17)$$

we have $\tilde u_1 = \left[\frac{1}{d_1}\ 0\ \cdots\ 0\right]^t$ and, for $1 < a \le A$,

$$\tilde u_a = \frac{b_{a-1}}{d_a}\tilde u_{a-1} + \left[0\ \cdots\ \frac{1}{d_a}\ \cdots\ 0\right]^t$$

that is, the a-th (and only nonzero) entry of the last vector is equal to $\frac{1}{d_a}$; see [25]. Hence, by taking $d_a = \|\tau_a\|$ and $b_a = \|\tau_a\|\frac{\|v_{a+1}\|}{\|v_a\|}$, the desired columns in the upper triangular inverse $D_2^{-1} = \left(P^tW\right)^{-1}$ can be calculated successively while computing the PLS1 components. An explicit application of this inversion strategy is demonstrated in the MATLAB code (see Appendix A.2) implementing a slightly modified version of the bidiagonalization algorithm for PLS1 modelling introduced by Manne [5].
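In MATLAB, the recursion reads as follows (a sketch; the vectors d and bsup are assumed to hold the diagonal entries d_a and the magnitudes b_a of the negated superdiagonal of U):

A = numel(d);
Uinv = zeros(A, A);
Uinv(1, 1) = 1/d(1);
for a = 2:A
    Uinv(:, a) = (bsup(a-1)/d(a))*Uinv(:, a-1);   % scaled copy of the previous column
    Uinv(a, a) = 1/d(a);                          % the only new nonzero entry
end
% Check: with U = diag(d) - diag(bsup, 1), the product U*Uinv equals eye(A)
% up to rounding.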

Note 10: how to avoid GS in algorithm 2

In exact arithmetic, the Gram–Schmidt orthonormalization steps (4 and 5) of algorithm 2 can be avoided by a successive computation of the non-orthogonal $W^\ast$-weights from the corresponding orthogonal $W$-weights and the recursion formula in note 9 for expanding $\left(P^tW\right)^{-1}$. MATLAB code for an algorithm including this modification is given in Appendix A.3. The relationships to the PLS1 versions of Appendix A.1 and A.2 are evident.

It should be noted that when using the code in Appendix A.3, one runs into the same kind of trouble as was reported for the Bidiag2 in [20].

5. DISCUSSION

5.1. Notes 1–3

Extraction of orthonormal bases for the subspaces of interest simplifies later computations and interpretations, and is common practice in applied linear algebra. The advantages of normalization seem, however, to have been partly overlooked by the earliest pioneers of PLS modelling. Although mathematically equivalent, the original NIPALS PLS1 algorithm, with extraction of orthogonal but not normalized scores, disturbs the interpretation of the squared P-loading entries as variable-wise explained sums of squares that was shown in Equation (2) of note 1.

The missing normalization is perhaps also the reason why the simple relationship between the weights and loadings as described in Equation (3) of note 2 was overlooked when formulating the original NIPALS PLS1 algorithm. Another possible explanation is that the NIPALS PLS was initially designed to also handle multivariate responses (Y) where the possibilities of avoiding the X-deflation were not so obvious. This problem has, however, later been solved by de Jong's SIMPLS algorithm [22]. A minor modification of the direct scores PLS1 algorithm of Andersson [20] (according to its close relationship to the SIMPLS algorithm) will also solve the problem for multivariate responses without X-deflation.

The bidiagonal form of $P^tW$ is of course 'old news' in the context of PLS1 modelling. The purpose of including note 3 is to demonstrate that the bidiagonal form has a very simple and transparent interpretation as coordinates. From Equation (4), it is clear that each loading vector is a linear combination of not more than two distinct weights and hence has at most two nonzero coordinates with respect to both of the bases $W_+$ and (the projection onto the subspace spanned by) $W$.

5.2. Notes 4 and 5

The main challenges of numerical problem solving are related to numerical precision and computational efficiency. A brief comparison of the two algorithms indicates that their major difference is the deflation step (6 in algorithm 1) and the GS step (4 in algorithm 2). Application of MATLAB's profiling tool to the corresponding MATLAB prototype code confirms this to be the case. Computation of the extra matrix–vector products for finding the weights (step 1) also adds significantly to the computational cost of algorithm 1. In algorithm 2, the weights are found from the loading computations (step 6) and the weighted vector differences (step 1), indicating that the computational cost of finding the weights alone is reduced by 50%. Orthogonality of the score vectors from both algorithms appears to be numerically satisfactory. However, for the weights (when the number of extracted components is large), this is not necessarily the case. For both algorithms, this deficiency may cause precision problems in the computation of the desired regression coefficients. A possible solution to the problem is to include an additional reorthogonalization step as indicated in the MATLAB prototype code of Appendix A.1. Later, we will briefly explain the reasons for and the possible solutions to this problem.

For de Jong's SIMPLS algorithm [22] with a single response vector ($y$), computation of the non-orthogonal $W^\ast$-weights (in [22], the notation $R$ is used for the non-orthogonal weights) is driven by the orthogonality requirement for the scores:

$$I_A = T^tT = T^tX_0W^\ast = P^tW^\ast \quad (18)$$

Hence, requiring orthonormality between the scores is equivalent to requiring orthogonality between the P-loadings and $W^\ast$-weights not corresponding to the same component. De Jong found this requirement to be satisfied exactly by the residual weight vector obtained after projecting $w = X_0^ty$ onto the subspace spanned by the P-vectors found 'so far'. Appropriate scaling of the desired residual weight vector was chosen to obtain normalization of the corresponding score vector. Thus, the SIMPLS algorithm has its focus on computation of the (non-orthogonal) weights required to obtain an orthonormal basis for the desired PLS1 column subspace that exactly corresponds to the column subspace basis found by algorithm 2.

5.3. Notes 6 and 7

Although $P^tW$ is bidiagonal only in the PLS1 case, the commutative diagram of Figure 1 is valid for any right projection of $X_0$ where the row coordinates $T^\ast = X_0W$ are taken with respect to an orthonormal-subspace basis $W$ of the m-dimensional Euclidean space. By the QR factorization $T^\ast = TR$, we obtain the orthonormal column-subspace basis $T$ and the upper triangular matrix $R = P^tW = T^tT^\ast$ describing the coordinate relationships of both $P$ with respect to $W$ and $T^\ast$ with respect to $T$. In particular, this describes the situation for the multiresponse case (PLS2) as well as the various modifications of PLS1 based on extra requirements in the computation of the orthonormal weights.

For any number A of components, all algorithms mathematically equivalent to the NIPALS PLS1 must necessarily compute a set of basis vectors for the subspace spanned by the columns of $W = [w_1, \ldots, w_A]$ (or $W^\ast$). If $W$ is not found directly, the associated orthonormal basis can always be uniquely (up to the sign of each basis vector) obtained by a normalized GS post-processing. Thereafter, the entire collection of scores, weights, loadings and regression coefficients can easily be calculated. Consequently, interpretations of the resulting model are not restricted by the particular choice of algorithm. The main concerns when choosing between the mathematically equivalent PLS1 algorithms should therefore be directed towards numerical precision and computational efficiency.
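In MATLAB, such a normalized GS post-processing is conveniently expressed by an economy-size QR factorization. A sketch, assuming the non-orthogonal weights Wstar are available in the workspace (the sign fix resolves the sign ambiguity mentioned above):

[W, R] = qr(Wstar, 0);     % normalized GS post-processing of the W*-weights
s = sign(diag(R));
W = W*diag(s);             % orthonormal weight basis, unique up to column signs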
5.4. Note 8

As pointed out earlier, the orthonormal weights $W$ together with the centered data $X_0$ are sufficient to describe the entire PLS1 model. The processing of any observation $x_r$ (a row vector either present in $X_0$ or new) according to the PLS1 model can therefore be based on its orthogonal projection $x_{rtr} = x_rH_{tr} = x_rWW^t$ onto the modelled row subspace and the corresponding residual $x_{rres} = x_r - x_{rtr}$. The right multiplication of the skew truncation $TP^t$ by $WW^t$ in Equation (15) is exactly the post-processing solution, suggested by Ergon [18], to the alleged inconsistency problem of the PLS1 model space outlined by Pell et al. [15] (and later discussed in Journal of Chemometrics (2009; 23, pages 67–77), see [16–19]). The truncation alternative (corresponding to a right multiplication by the skew projection $H_{str} = W^\ast P^t$) advocated by Wold et al. [16] is illustrated in the extended commutative diagram of Figure 2.

[Figure 2. The extended commutative diagram showing the elementary linear algebra of PLS1 modelling with both types of truncation. Arrows indicate multiplication from the right.]

Consistent use of the skew truncation requires the non-orthogonal projection $x_{rstr} = x_rH_{str}$ to be considered together with the residual $x_{rsres} = x_r - x_{rstr}$. Note that the vector of regression coefficients $b = W\left(P^tW\right)^{-1}q$ is an eigenvector corresponding to the eigenvalue $\lambda = 1$ for both $H_{tr}$ and $H_{str}$:

$$H_{tr}b = WW^t\left[W\left(P^tW\right)^{-1}q\right] = W\left(W^tW\right)\left(P^tW\right)^{-1}q = W\left(P^tW\right)^{-1}q = b \quad (19)$$

and

$$H_{str}b = W\left(P^tW\right)^{-1}P^tW\left(P^tW\right)^{-1}q = W\left(P^tW\right)^{-1}q = b \quad (20)$$

Consequently,

$$\hat y_r = x_rb = x_{rtr}b = x_{rstr}b \quad (21)$$

demonstrates that both projections of $x_r$ are consistent with application of the regression coefficients $b$.

According to [15], the minimum norm regression coefficients for the NIPALS truncation $TP^t$ of $X_0$ are

$$\beta = \left(TP^t\right)^+y = P\left(P^tP\right)^{-1}q \quad (22)$$

and do not share the properties reflected by Equation (21), that is,

$$x_{rtr}\beta \ne \hat y_r = x_{rstr}\beta \ne x_r\beta \quad (23)$$

However, from the identity $TP^t = X_0H_{str}$, it follows that the corresponding $X_0$-regression coefficients $b$ are obtained by applying the skew projection $H_{str}$ to $\beta$:

$$H_{str}\beta = W\left(P^tW\right)^{-1}P^tP\left(P^tP\right)^{-1}q = W\left(P^tW\right)^{-1}q = b \quad (24)$$

By the orthogonal projection corresponding to $G = P\left(P^tP\right)^{-1}P^t$ of $b$, we obtain $\beta$:

$$Gb = P\left(P^tP\right)^{-1}P^tW\left(P^tW\right)^{-1}q = P\left(P^tP\right)^{-1}q = \beta \quad (25)$$

Consequently, the regression vector part of the inconsistency debate can be ended by concluding that we can consistently navigate between the alternative vectors of regression coefficients by using the projection matrices $H_{str}$ and $G$ according to Equations (24) and (25).

In traditional PLS1 modelling, preference for the skew truncation $TP^t$ of $X_0$ seems to be mainly justified by the deflation step of the NIPALS PLS1 algorithm. It is, however, important to keep in mind that the particular choice of deflation strategy is not at all theoretically critical. In particular, application of the right orthogonal projections (equivalent to double projections) in the corresponding deflations

$$X_a = X_{a-1} - X_{a-1}w_aw_a^t = X_{a-1} - \tau_a^\ast w_a^t \quad (26)$$

inside the Bidiag2 algorithm (demonstrated in the MATLAB code of Appendix A.4) is one way of stabilizing this algorithm. By noting that (i) this type of deflation is identical to the deflation step in Martens' alternative PLS1 algorithm [3] and (ii) the NIPALS type of deflation will not work correctly inside these algorithms, it is clear that sacrificing orthogonality (between the row-subspace residuals and the corresponding part of the PLS model) is an unnecessary price to pay. Actions should therefore be taken to review our PLS modelling tools accordingly (in particular the latent variable model view of PLS advocated in [16]). Putting the pieces together after computing the singular value decomposition (SVD) of $P^tW$ (i.e. with the orthonormal $T$ and $W$) is all that is needed to get it right.

5.5. Notes 9 and 10

The Lanczos bidiagonalization algorithm suggested by Golub and Kahan [23] was originally designed for computing the SVD of a matrix. This algorithm is much referred to as Bidiag2 according to Paige and Saunders [26] and was introduced as a PLS1 algorithm to the chemometrics community by Manne [5] under this acronym. Eldén [13] has published a related paper on the modelling properties of PLS1 regression based on the equivalence between NIPALS PLS1 and Lanczos bidiagonalization.

It should be noted that by switching signs in the off-diagonal elements of the bidiagonal matrix generated by Bidiag2, we obtain consistency with the W-signs of the NIPALS algorithm. Unfortunately, direct implementations of Lanczos algorithms are known to be computationally unstable in floating point arithmetic with potentially rapid loss of orthogonality for the computed vectors (see Golub and Kahan [23], Simon [27] and Björck [28]). A desired stabilization can be obtained by introducing a so-called full reorthogonalization or a (computationally more sophisticated and efficient) partial reorthogonalization strategy; see Simon [27] and Larsen [29]. However, for the computation of a limited number of PLS components only (typical in the majority of PLS1 applications), the computational savings of a partial reorthogonalization strategy compared to the full reorthogonalization are rarely critical (in comparison to the corresponding savings by partial reorthogonalization for a full SVD).

For the sake of completeness, an efficient and numerically stabilized MATLAB prototype version of the Bidiag2 (to be compared to the unstable version and other algorithms in Andersson [20]) is given in Appendix A.5.
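In the prototype scripts, full reorthogonalization amounts to one extra Gram–Schmidt sweep against all previously accepted vectors, applied to the weights as well as to the scores. A sketch for a new weight vector w at step a (repeating the sweep once is a common safeguard in floating point arithmetic):

w = w - W(:, 1:a-1)*(W(:, 1:a-1)'*w);   % GS sweep against the earlier weights
w = w - W(:, 1:a-1)*(W(:, 1:a-1)'*w);   % repeated sweep ('twice is enough')
w = w/norm(w);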
5.6. Final remarks

Reorthogonalization in PLS algorithms focusing on stabilization of the scores (only) has been briefly considered by Faber and Ferré [30]. To avoid problems in the subsequent computations of regression coefficients, reorthogonalization of the weights seems to be just as important.

One should note that the most delicate issues of floating point arithmetic and orthogonalization are not necessarily robustly dealt with by the MATLAB statements commented 'Included for numerical stability' in the prototype scripts presented in the Appendix.

As pointed out in several talks by Björck [31], who is a distinguished numerical analysis researcher, commercial implementations of PLS software should take more seriously the development of numerically precise algorithms (either by robust reorthogonalization strategies for the Lanczos bidiagonalization algorithm or by alternatives based on Householder transformations in the case of PLS1 regression).

In the traditional NIPALS PLS1 applications [4], the orthogonal (but not normalized) scores correspond to multiplying the T-columns of algorithm 2 with the corresponding diagonal elements of $P^tW$. One should not confuse these scores with any kind of orthogonal projection of the original $X_0$-data. de Jong [22] proved that the NIPALS PLS1 scores represent an orthogonal basis (and hence a reference system) for the A-dimensional subspace (of the space spanned by the $X_0$-columns) maximizing the $X_0$–$y$ covariances with orthogonality constraints. The corresponding theoretically important SIMPLS algorithm was designed to solve the constrained covariance maximization problem. The SIMPLS orthonormal score vectors are of course mathematically equivalent to those found by the GS step of algorithm 2 and directly in the MATLAB prototype of Appendix A.3.

REFERENCES

1. Wold S, Martens H, Wold H. The multivariate calibration method in chemistry solved by the PLS method. In Proceedings of the Conference on Matrix Pencils, Ruhe A, Kågström B (eds), Lecture Notes in Mathematics. Springer Verlag: Heidelberg, 1983; 286–293.
2. Wold S, Ruhe A, Wold H, Dunn WJ, III. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J. Sci. Stat. Comput. 1984; 5: 735–743.
3. Martens H, Naes T. Multivariate Calibration (2nd edn). Wiley: Chichester, UK, 1989.
4. Wold S, Sjöström M, Eriksson L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001; 58: 109–130.
5. Manne R. Analysis of two partial-least-squares algorithms for multivariate calibration. Chemom. Intell. Lab. Syst. 1987; 2: 187–197.
6. Höskuldsson A. PLS regression methods. J. Chemom. 1988; 2: 211–228.
7. Helland IS. On the structure of partial least squares regression. Commun. Statist. (Simul. Comput.) 1988; 17: 581–607.
8. Frank IE, Friedman JH. A statistical view of some chemometrics regression tools. Technometrics 1993; 35(2): 109–148.
9. Garthwaite PH. An interpretation of partial least squares. J. Am. Stat. Assoc. 1994; 89(425): 122–127.
10. ter Braak CJF, de Jong S. The objective function of partial least squares regression. J. Chemom. 1998; 12: 41–54.
11. Helland IS. Some theoretical aspects of partial least squares regression. Chemom. Intell. Lab. Syst. 2001; 58: 97–107.
12. Ergon R. PLS score-loading correspondence and a bi-orthogonal factorization. J. Chemom. 2002; 16: 368–373.
13. Eldén L. Partial least-squares vs. Lanczos bidiagonalization-I: analysis of a projection method for multiple regression. Comput. Stat. Data Anal. 2004; 46: 11–31.
14. Boulesteix AL, Strimmer K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief. Bioinform. 2006; 8(1): 32–44.
15. Pell RJ, Ramos LS, Manne R. The model space in partial least squares regression. J. Chemom. 2007; 21: 165–172.
16. Wold S, Høy M, Martens H, Trygg J, Westad F, MacGregor J, Wise BM. The PLS model space revisited. J. Chemom. 2009; 23: 67–68.
17. Bro R, Eldén L. PLS works. J. Chemom. 2009; 23: 69–71.
18. Ergon R. Re-interpretation of NIPALS results solves PLSR inconsistency problem. J. Chemom. 2009; 23: 72–75.
19. Manne R, Pell RJ, Ramos LS. The PLS model space: the inconsistency persists. J. Chemom. 2009; 23: 76–77.
20. Andersson M. A comparison of nine PLS1 algorithms. J. Chemom. 2009; 23: 518–529.
21. Ergon R. Finding Y-relevant part of X by use of PCR and PLSR model reduction methods. J. Chemom. 2007; 21: 537–546.
22. de Jong S. SIMPLS: an alternative approach to partial least squares regression. Chemom. Intell. Lab. Syst. 1993; 18: 251–263.
23. Golub GH, Kahan W. Calculating the singular values and pseudoinverse of a matrix. SIAM J. Numer. Anal. 1965; 2: 205–224.
24. Phatak A, de Jong S. The geometry of partial least squares. J. Chemom. 1997; 11: 311–338.
25. Higham NJ. Efficient algorithms for computing the condition number of a tridiagonal matrix. SIAM J. Sci. Stat. Comput. 1986; 7(1): 150–165.
26. Paige C, Saunders M. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 1982; 8: 43–71.
27. Simon HD. The Lanczos algorithm with partial reorthogonalization. Math. Comp. 1984; 42(165): 115–142.
28. Björck Å. Numerical Methods for Least Squares Problems. SIAM: Philadelphia, 1996.
29. Larsen RM. Lanczos bidiagonalization with partial reorthogonalization. Technical Report DAIMI PB-357, Department of Computer Science, Aarhus University, September 1998.
30. Faber NM, Ferré J. On the numerical stability of two widely used PLS algorithms. J. Chemom. 2002; 22: 101–105.
31. Björck Å. Available from: http://www.mai.liu.se/~akbjo/ (accessed Jan 7th, 2014).


APPENDIX A. MATLAB CODE FOR ALGORITHMS

A.1. The non-deflating NIPALS PLS1
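The prototype is summarized here as a minimal sketch consistent with notes 1, 2 and 4; the function name and output conventions are illustrative, and the line commented 'Included for numerical stability' is the optional reorthogonalization step referred to in Sections 5.2 and 5.6.

function [b, T, W, P, q] = pls1_gs(X0, y, A)
% PLS1 by algorithm 2: Gram-Schmidt steps replace the NIPALS deflation
% of the centered n x m predictor matrix X0; y is the centered response.
n = size(X0, 1); m = size(X0, 2);
T = zeros(n, A); W = zeros(m, A); P = zeros(m, A); q = zeros(A, 1);
v = X0'*y; normv = norm(v);
w = v/normv;                                        % first weight
for a = 1:A
    W(:, a) = w;
    tau = X0*w;                                     % non-orthogonal score tau*_a
    if a > 1
        tau = tau - T(:, 1:a-1)*(T(:, 1:a-1)'*tau); % GS step
        tau = tau - T(:, 1:a-1)*(T(:, 1:a-1)'*tau); % Included for numerical stability
    end
    normtau = norm(tau);
    t = tau/normtau; T(:, a) = t;                   % orthonormal score t_a
    p = X0'*t; P(:, a) = p;                         % loading p_a (note 1)
    q(a) = y'*t;                                    % y-loading q_a
    if a < A
        v = normv*(w - p/normtau);                  % weight recurrence, Equation (3)
        normv = norm(v); w = v/normv;
    end
end
b = W*((P'*W)\q);                                   % regression coefficients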


A.2. The modified Bidiag2
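A sketch of a Bidiag2-type PLS1 in the spirit of Manne [5], modified so that the columns of the inverse bidiagonal factor are accumulated by the recursion of note 9 (names are illustrative; the Bidiag2 sign convention gives a positive superdiagonal, so the recursion is applied with flipped sign):

function [b, T, W, q] = pls1_bidiag2(X0, y, A)
% Lanczos bidiagonalization (Bidiag2) PLS1 with successive inversion
% of the bidiagonal factor B (B = P'W in exact arithmetic).
n = size(X0, 1); m = size(X0, 2);
T = zeros(n, A); W = zeros(m, A); Binv = zeros(A, A); q = zeros(A, 1);
w = X0'*y; w = w/norm(w); W(:, 1) = w;
t = X0*w; rho = norm(t); t = t/rho; T(:, 1) = t;
Binv(1, 1) = 1/rho;                        % first column of inv(B)
q(1) = y'*t;
for a = 2:A
    w = X0'*t - rho*W(:, a-1);             % Bidiag2 recurrence (weights)
    theta = norm(w); w = w/theta; W(:, a) = w;
    t = X0*w - theta*T(:, a-1);            % Bidiag2 recurrence (scores)
    rho = norm(t); t = t/rho; T(:, a) = t;
    Binv(:, a) = -(theta/rho)*Binv(:, a-1);   % note 9 recursion, sign flipped
    Binv(a, a) = 1/rho;
    q(a) = y'*t;
end
b = W*(Binv*q);                            % b = W*inv(P'W)*q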


A.3. Direct computation of orthonormal scores
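A sketch of the GS-free variant of note 10: the orthonormal scores are obtained directly as T = X0*W*, where the columns of W* = W*inv(D2) are built from the weight recurrence of Equation (3) and the inversion recursion of note 9 (names and organization are illustrative):

function [b, T, W, Wstar, P, q] = pls1_direct(X0, y, A)
% Direct computation of orthonormal scores without Gram-Schmidt steps.
n = size(X0, 1); m = size(X0, 2);
T = zeros(n, A); W = zeros(m, A); Wstar = zeros(m, A);
P = zeros(m, A); q = zeros(A, 1); u = zeros(A, 1);
v = X0'*y; normv = norm(v); W(:, 1) = v/normv;
offd = 0;                                  % b_{a-1} of note 9 (void for a = 1)
for a = 1:A
    ubar = offd*u; ubar(a) = ubar(a) + 1;  % unscaled a-th column of inv(D2)
    z = X0*(W(:, 1:a)*ubar(1:a));          % = ||tau_a||*t_a in exact arithmetic
    d = norm(z);                           % = ||tau_a||, the a-th diagonal of D2
    t = z/d; T(:, a) = t;
    u = ubar/d;                            % a-th column of inv(D2), note 9
    Wstar(:, a) = W(:, 1:a)*u(1:a);        % non-orthogonal weight, Equation (13)
    p = X0'*t; P(:, a) = p; q(a) = y'*t;
    if a < A
        vnew = normv*(W(:, a) - p/d);      % weight recurrence, Equation (3)
        normvnew = norm(vnew);
        W(:, a+1) = vnew/normvnew;
        offd = d*normvnew/normv;           % b_a = ||tau_a||*||v_{a+1}||/||v_a||
        normv = normvnew;
    end
end
b = Wstar*q;                               % b = W*inv(P'W)*q = Wstar*q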


A.4. Bidiag2 with deflation
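A sketch of Bidiag2 where the explicit three-term recurrence for the weights is replaced by the right-projection deflation of Equation (26); in exact arithmetic the deflation makes X'*t automatically orthogonal to the earlier weights (names are illustrative):

function [b, T, W, P, q] = pls1_bidiag2_defl(X0, y, A)
% Bidiag2 PLS1 with the deflation of Equation (26).
n = size(X0, 1); m = size(X0, 2);
T = zeros(n, A); W = zeros(m, A); P = zeros(m, A); q = zeros(A, 1);
X = X0;
w = X'*y; w = w/norm(w);
for a = 1:A
    W(:, a) = w;
    tau = X*w;                             % = X0*w (the weights are orthonormal)
    X = X - tau*w';                        % deflation, Equation (26)
    if a > 1
        tau = tau - T(:, a-1)*(T(:, a-1)'*tau); % tau*_a lies in span(t_{a-1}, t_a)
    end
    t = tau/norm(tau); T(:, a) = t;
    P(:, a) = X0'*t; q(a) = y'*t;
    if a < A
        w = X'*t; w = w/norm(w);           % next weight; the -rho*w term of
    end                                    % Bidiag2 is absorbed by the deflation
end
b = W*((P'*W)\q);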


A.5. Bidiag2 with stabilization of scores and weights
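A sketch of Bidiag2 with full reorthogonalization of both scores and weights (names are illustrative; see also Section 5.5):

function [b, T, W, q] = pls1_bidiag2_stab(X0, y, A)
% Stabilized Bidiag2 PLS1; B is the upper bidiagonal factor (= P'W).
n = size(X0, 1); m = size(X0, 2);
T = zeros(n, A); W = zeros(m, A); B = zeros(A, A); q = zeros(A, 1);
w = X0'*y; w = w/norm(w); W(:, 1) = w;
t = X0*w; rho = norm(t); t = t/rho; T(:, 1) = t;
B(1, 1) = rho; q(1) = y'*t;
for a = 2:A
    w = X0'*t - rho*W(:, a-1);
    w = w - W(:, 1:a-1)*(W(:, 1:a-1)'*w);  % Included for numerical stability
    theta = norm(w); w = w/theta; W(:, a) = w;
    t = X0*w - theta*T(:, a-1);
    t = t - T(:, 1:a-1)*(T(:, 1:a-1)'*t);  % Included for numerical stability
    rho = norm(t); t = t/rho; T(:, a) = t;
    B(a-1, a) = theta; B(a, a) = rho;
    q(a) = y'*t;
end
b = W*(B\q);                               % regression coefficients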
