
A note on how to make Partial Least Squares (mode A) consistent¹

0. Introduction

Our setup will be the 'basic design', a factor model that satisfies the 'fundamental principle of soft modeling': all information between the blocks is conveyed by the latent variables. This entails that the covariance matrix between the indicators of any two blocks has rank one. We assume the availability of a consistent estimator² for the covariance matrix of the indicators, on which we apply mode A. It is well known that this mode produces weight vectors for the blocks with probability limits proportional to the loadings. If we let Σ represent the covariance matrix of any block, λ the corresponding loading vector and w the probability limit (plim) of the estimated weight vector ŵ, we have

    w = plim ŵ = λ(λ′Σλ)^{−1/2}.    (1)
Based on another property of the basic design, I have shown (Dijkstra 1981, 2010) how to estimate the proportionality factor for each weight vector consistently. With the proportionality factors we can adjust the output of mode A and obtain consistent estimators for all parameters of the factor model. The property referred to is the assumed zero correlation between the errors within each block. So

    Σ = λλ′ + Θ    (2)

where the diagonal matrix Θ stands for the covariance matrix of the errors. Since the off-diagonal elements of Σ are products of the loadings, σ_ij = λ_i λ_j, the idea was that the off-diagonal elements s_ij of the consistent estimator S for Σ can help adjust the scale of ŵ and thereby obtain a consistent estimator for λ. To this end we multiply ŵ by a scalar c and choose its value such that the difference between s_ij and (cŵ_i)(cŵ_j) is 'as small as possible'. In (Dijkstra 1981, 2010) this was simply made operational by minimizing, as a function of c,

    ∑_{i≠j} [s_ij − (cŵ_i)(cŵ_j)]².    (3)

¹ I assume familiarity with PLS, its modes and its notions. See, e.g., the Handbook of Partial Least Squares by Esposito Vinzi et al. (2010), or Systems Under Indirect Observation, part II (Jöreskog and Wold, 1982), for background and elaboration. It may be appropriate here to point out that similar approaches have been outlined for mode B (Dijkstra 1981, 2010). Our focus here is on mode A.
² An estimator S is said to be a consistent estimator for Σ when S is arbitrarily close to Σ, with arbitrarily high probability, provided the sample size is sufficiently large. This is abbreviated to plim S = Σ.

The solution ĉ is

    ĉ = [ ∑_{i≠j} ŵ_i ŵ_j s_ij / ∑_{i≠j} ŵ_i² ŵ_j² ]^{1/2}    (4)

provided the expression in brackets is positive. A special case is obtained for just two indicators. Then³

    ĉ = √( s_12 / (ŵ_1 ŵ_2) ).    (5)

Clearly, (ĉŵ_1)(ĉŵ_2) = s_12, so that plim((ĉŵ_1)(ĉŵ_2)) = plim s_12 = λ_1 λ_2 and ĉŵ is consistent for λ. We note for further reference that for the general case plim(ĉ) = (λ′Σλ)^{1/2}.

³ Typically, 'genuine' indicators can be assumed to be positively related to the underlying latent variables, so all three terms under the square root sign will be positive with high probability when the sample is sufficiently large. But loadings with opposite signs present no problem either, since numerator and denominator will tend to be of the same sign.
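To make the scale adjustment concrete, here is a minimal numerical sketch in Python of the correction factor in equation (4). It does not run the PLS algorithm itself: the weight vector is a stand-in, taken proportional to the loadings and scaled for a unit-variance composite (which is what mode A delivers in the probability limit under the setup above), and all names in the snippet are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# One block of the basic design: Sigma = lambda*lambda' + Theta, Theta diagonal.
lam = np.array([0.8, 0.7, 0.6, 0.5])         # loadings, standardized indicators
theta = 1.0 - lam**2                         # error variances
sigma = np.outer(lam, lam) + np.diag(theta)  # population covariance matrix

# A consistent estimator S: the sample covariance of simulated data.
x = rng.multivariate_normal(np.zeros(len(lam)), sigma, size=10_000)
s = np.cov(x, rowvar=False)

# Stand-in for the mode A weight vector: proportional to the loadings and
# scaled so the composite has unit variance (an illustrative assumption).
w = lam / np.sqrt(lam @ sigma @ lam)

# Correction factor of equation (4): least squares fit of (c*w_i)(c*w_j)
# to s_ij over all off-diagonal pairs.
off = ~np.eye(len(lam), dtype=bool)
num = np.sum((np.outer(w, w) * s)[off])
den = np.sum((np.outer(w, w) ** 2)[off])
c_hat = np.sqrt(num / den)

print("c-hat * w:", c_hat * w)  # close to the loadings for large samples
print("lambda   :", lam)
```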
One purpose of this note is to slightly extend the least squares approach, to the case where some of the errors are correlated and by introducing weights. Another purpose is to propose more ways of defining suitable scaling factors like ĉ, which may turn out to be useful as well.

1. Correlated errors and weighted least squares

Taking account of correlated errors is actually quite trivial, when we know which errors (which elements of ε) are correlated. Let U be the set of uncorrelated pairs, U := {(i, j) | corr(ε_i, ε_j) = 0}, assumed to be nonempty. An immediate extension is to minimize with respect to c

    ∑_{(i,j)∈U} [s_ij − c²ŵ_i ŵ_j]²    (6)

which produces

    ĉ = [ ∑_{(i,j)∈U} ŵ_i ŵ_j s_ij / ∑_{(i,j)∈U} ŵ_i² ŵ_j² ]^{1/2}.    (7)
The least squares criterion implicitly regards all differences s_ij − c²ŵ_i ŵ_j as equally informative. But one could claim that this is too simple. Arguing heuristically, using facts from the asymptotic theory of sample moments from multinormal distributions, we know that the higher the correlation ρ_ij, the more accurate s_ij will be (we will work with standardized variables as is customary in PLS). In fact, the asymptotic variance of s_ij is (1 − s_ij²)² divided by the number of observations. Perhaps one should weigh the difference s_ij − c²ŵ_i ŵ_j by the inverse of 1 − s_ij² in the criterion. If we keep treading along this path, (asymptotic) covariances between s_ij and s_kl come into the picture as well, leading us to proper weighted least squares. And while we are at it, distribution-free estimates for these covariances are available and could be used instead of the values based on an assumed normality. But it is fair to say that the latter two suggestions will be of some value only when the sample size is 'very large', if experience with these weights in WLS in SEM counts for anything. The first suggestion, where cross-products are ignored, may have some merit though. The adjustment is again straightforward. Let W_ij be the weight of the difference s_ij − c²ŵ_i ŵ_j, say 1/(1 − s_ij²), or something else. Then we minimize with respect to c

    ∑_{(i,j)∈U} [s_ij − c²ŵ_i ŵ_j]² W_ij    (8)

and get

    ĉ = [ ∑_{(i,j)∈U} ŵ_i ŵ_j s_ij W_ij / ∑_{(i,j)∈U} ŵ_i² ŵ_j² W_ij ]^{1/2}.    (9)
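To see how (7) and (9) relate: (7) is just (9) with all W_ij equal to one. Below is a small sketch of the weighted factor, continuing the illustrative s and w from the previous snippet; the pair set U and the weight choice are assumptions made up for the example.

```python
from itertools import permutations

import numpy as np

def c_hat_weighted(s, w, pairs, weights=None):
    """Correction factor of equation (9); equation (7) when weights is None.

    pairs   : the set U of index pairs (i, j) with uncorrelated errors
    weights : optional dict mapping (i, j) to W_ij
    """
    num = den = 0.0
    for i, j in pairs:
        w_ij = 1.0 if weights is None else weights[(i, j)]
        num += w[i] * w[j] * s[i, j] * w_ij
        den += w[i] ** 2 * w[j] ** 2 * w_ij
    return np.sqrt(num / den)

# Example: suppose the errors of indicators 0 and 1 are correlated,
# so the pairs (0, 1) and (1, 0) are excluded from U.
u = [(i, j) for i, j in permutations(range(len(w)), 2) if {i, j} != {0, 1}]

# Illustrative weights W_ij = 1/(1 - s_ij^2), as suggested above.
wts = {(i, j): 1.0 / (1.0 - s[i, j] ** 2) for i, j in u}
print(c_hat_weighted(s, w, u))       # equation (7)
print(c_hat_weighted(s, w, u, wts))  # equation (9)
```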
Finally, as a bridge to the next section, we note that the variance stabilizing transformation of a correlation coefficient (from a normal sample) could also help to 'equalize the importances' of the differences in the criterion: the asymptotic variance of

    (1/2) log((1 + s_ij)/(1 − s_ij))    (10)

is just one over the number of observations, independent of ρ_ij. So we could contemplate minimizing

    ∑_{(i,j)∈U} [ (1/2) log((1 + s_ij)/(1 − s_ij)) − (1/2) log((1 + c²ŵ_i ŵ_j)/(1 − c²ŵ_i ŵ_j)) ]²    (11)

which however will require an iterative procedure for its solution. In the next section we will introduce a family of nonlinear fitting functions, some of which will also allow of an explicit, easy solution.
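Criterion (11) has no closed-form minimizer, but a one-dimensional numerical search is enough. A minimal sketch, again reusing the illustrative s, w and pair set u from the snippets above and assuming scipy is available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fisher_z(r):
    # Variance stabilizing transformation of a correlation, equation (10).
    return 0.5 * np.log((1.0 + r) / (1.0 - r))

def criterion(c, s, w, pairs):
    # Sum of squared differences of Fisher z values, equation (11).
    # Valid as long as c**2 * w[i] * w[j] stays inside (-1, 1).
    return sum(
        (fisher_z(s[i, j]) - fisher_z(c**2 * w[i] * w[j])) ** 2
        for i, j in pairs
    )

res = minimize_scalar(criterion, bounds=(0.5, 2.5), args=(s, w, u),
                      method="bounded")
print("c-hat from criterion (11):", res.x)
```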

2. A large class of fitting functions suitable for ratios

Instead of looking at differences, we could take the ratios, assumed to be positive, as our raw material:

    s_ij / (c²ŵ_i ŵ_j)   or   c²ŵ_i ŵ_j / s_ij    (12)

and minimize a criterion of these ratios whose probability limit attains a unique global minimum at the point where all ratios equal one. In other words, based on probability limits {σ_ij}_{i≠j} and w, minimization would yield the correct value. As an example, consider a real function f(x) defined for positive real x as

    f(x) := x − log(x) − 1.    (13)
It is clear that f(x) ≥ 0, and f is zero when and only when x = 1. Now minimize with respect to c (or c²) the criterion

    ∑_{i≠j} f( c²ŵ_i ŵ_j / s_ij ) = ∑_{i≠j} [ c²ŵ_i ŵ_j / s_ij − log( c²ŵ_i ŵ_j / s_ij ) − 1 ].    (14)

The optimal solution ĉ is the square root of the harmonic mean of the s_ij/(ŵ_i ŵ_j)'s. Replacing sample values by probability limits, plim(s_ij) = λ_i λ_j and plim(ŵ_i ŵ_j) = λ_i λ_j (λ′Σλ)^{−1}, gives plim(ĉ) = (λ′Σλ)^{1/2}, exactly as required.
Substituting 1/x for x yields 1/x + log(x) − 1, which is also nonnegative and zero only for x = 1. Minimization of the corresponding criterion yields the square root of the arithmetic mean of the s_ij/(ŵ_i ŵ_j)'s. It also has the desired probability limit of course. If we take g(x) := (1/2)f(x) + (1/2)f(1/x) = (1/2)(x + 1/x) − 1, the induced optimal correction factor equals

    [ ∑_{i≠j} s_ij/(ŵ_i ŵ_j) / ∑_{i≠j} ŵ_i ŵ_j/s_ij ]^{1/4}.    (15)

This is the geometric mean of the previous two factors (the square roots of the arithmetic mean and the harmonic mean of the s_ij/(ŵ_i ŵ_j)'s), or equivalently the square root of the geometric mean of those two means; a neat compromise between the other solutions. The function g(x) satisfies by construction g(x) = g(1/x), so it does not matter on which of the ratios in (12) one focusses. As another example with the same property, consider a real function h(x) defined for positive real x as⁴

    h(x) := (1/2)(log(x))².    (16)
Again, h(x) ≥ 0, and h is zero when and only when x = 1. The value of c that minimizes the corresponding criterion is simply⁵

    √[ ( ∏_{i≠j} s_ij/(ŵ_i ŵ_j) )^{1/#} ],    (17)

the square root of the geometric mean of the s_ij/(ŵ_i ŵ_j)'s. Also with the correct probability limit.

⁴ The factor 1/2 is an innocent normalization, yielding f⁽²⁾(1) = 1 for the second derivative, which we will impose on all functions of x below.
⁵ The symbol '#' stands for the number of different pairs (i, j).
Incidentally, recall that the geometric, harmonic and arithmetic means are members of the family of power means. So any square root of a power mean will be a contender. Since the power means can be ordered (e.g. the harmonic mean ≤ the geometric mean ≤ the arithmetic mean), it is clear that the choice is real. We have as yet no clear guide as to which is best.
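The three closed-form factors are cheap to compute side by side. A short sketch, reusing the illustrative s and w from above; since harmonic ≤ geometric ≤ arithmetic, the three printed factors come out in the same order:

```python
import numpy as np

# Ratios s_ij / (w_i w_j) over all off-diagonal pairs.
off = ~np.eye(len(w), dtype=bool)
ratios = (s / np.outer(w, w))[off]

c_hm = np.sqrt(ratios.size / np.sum(1.0 / ratios))  # harmonic mean, eq. (14)
c_gm = np.sqrt(np.exp(np.mean(np.log(ratios))))     # geometric mean, eq. (17)
c_am = np.sqrt(np.mean(ratios))                     # arithmetic mean

# Harmonic <= geometric <= arithmetic, so the factors are ordered the same way.
print(c_hm, c_gm, c_am)
```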
The functions (1/4)(1 − x)² + (1/4)(1 − 1/x)² and [(1/2)(x^r + x^{−r}) − 1]/r² for real r ≥ 1 are obviously feasible choices as well. We can generate an infinite number of candidates by taking any smooth, strictly convex function h with h(y) = h(−y) for all real y and a unique minimum of zero at y = 0, and then take h(log(x)) for positive x. By construction, h(log(x)) = h(log(1/x)). This family of functions has been studied in another context, multicriteria decision analysis (the AHP method (Dijkstra, 2011)). Arbitrary members do not allow of explicit minimizers of the induced criteria; they typically require iterative techniques. But their probability limits all equal (λ′Σλ)^{1/2}, as can be verified with some work based on general minimum distance estimation theory.
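As an illustration of the general recipe, the sketch below picks one arbitrary member, h(y) = y²/2 + y⁴/12 (smooth, even, strictly convex, zero only at y = 0, and with h″(0) = 1 as in footnote 4; this particular h is my choice for the example, not one from the note), and minimizes the induced criterion numerically, reusing s and w from the earlier snippets.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def h(y):
    # A smooth, even, strictly convex function with a unique zero at y = 0
    # and h''(0) = 1; an arbitrary member of the family, chosen ad hoc.
    return 0.5 * y**2 + y**4 / 12.0

def criterion(c, s, w):
    off = ~np.eye(len(w), dtype=bool)
    ratios = (c**2 * np.outer(w, w) / s)[off]  # assumed positive
    return np.sum(h(np.log(ratios)))

res = minimize_scalar(criterion, bounds=(0.5, 2.5), args=(s, w),
                      method="bounded")
print("c-hat for h(log(x)):", res.x)
```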
As before we can take correlated errors into account, and introduce weights.

A final remark: which function, which approach works best is as yet an entirely open question. It may well be that the approach first tried (equation (4)), and its variations, which led to PLSc, is a dependable workhorse and good enough for most purposes, but the others may have merit also. Any guesses/advice, anyone?

Theo K. Dijkstra, April 29th 2013.

References

[1] Dijkstra, T. K. (1981). Latent Variables in Linear Stochastic Models. PhD thesis (second edition, 1985, Amsterdam: Sociometric Research Foundation).

[2] Dijkstra, T. K. (2010). Latent variables and indices. In: Esposito Vinzi, V., Chin, W. W., Henseler, J., Wang, H. (eds.), Handbook of Partial Least Squares, chapter 1, pp. 23-46. Springer-Verlag, Heidelberg.

[3] Dijkstra, T. K. (2011). On the extraction of weights from pairwise comparison matrices. Central European Journal of Operations Research, DOI 10.1007/s10100-011-0212-9.

[4] Esposito Vinzi, V., Chin, W. W., Henseler, J., Wang, H. (eds.) (2010). Handbook of Partial Least Squares. Springer-Verlag, Heidelberg.

[5] Jöreskog, K. G. & Wold, H. (eds.) (1982). Systems under Indirect Observation, part II. North-Holland, Amsterdam.

[6] Steele, J. Michael (2004). The Cauchy-Schwarz Master Class. Cambridge University Press (The Mathematical Association of America).
