JCCP

x^(g) = τ^(g) + Λ^(g) ξ^(g) + δ^(g)    (1)

μ^(g) = τ^(g) + Λ^(g) κ^(g)    (2)

Σ^(g) = Λ^(g) Φ^(g) Λ^(g)′ + Θ_δ^(g)    (3)
Assuming p items and one latent factor in each country, Equation 1 defines the relationship between the observed variables x^(g) (p × 1 vector) in the different groups or countries g (g = 1, . . . , G) and the latent variable ξ^(g) (1 × 1 vector). Λ^(g) represents the vector (p × 1) of coefficients indicating the magnitude of the expected change in the observed variable for a one-unit change in the latent variable. These coefficients are regression coefficients (factor loadings) for the effects of the latent variable on the observed variables. The vector τ^(g) (p × 1) represents the item intercepts, and δ^(g) is the vector (p × 1) of errors of measurement for x. Equation 2 defines the mean structure, with μ^(g) the vector (p × 1) of the item means and κ^(g) corresponding to the vector (here 1 × 1) of latent means. Residuals are assumed to have means of 0. Equation 3 is the covariance structure, with Σ^(g) the variance-covariance matrix of x, where Φ^(g) is the variance-covariance matrix of the latent variable and Θ_δ^(g) the variance-covariance matrix of δ^(g) (usually constrained to be a diagonal matrix). Taking these components of the measurement model into account, Steenkamp and Baumgartner (1998) specify a procedure with successive steps corresponding to different levels of equivalence across groups or countries:
0) A first preliminary analysis tests the assumption that the different samples have different covariance (Σ^(g)) and mean (μ^(g)) vectors. If this is not the case, the samples can be pooled and a test of equivalence has no purpose; basically, all cultures would stem from the same population.
Usually, equivalence cannot be assumed at this general level. Moreover, one can obtain an initial idea of the differential impact of nonequivalence in means or covariances by testing their equality across the countries separately. Thus, three models are tested here: one testing the invariance of both covariances and means (Σ^(g), μ^(g)), one testing the invariance of covariances (Σ^(g)), and one testing the invariance of means (μ^(g)) across the countries.

Spini / MEASUREMENT EQUIVALENCE OF VALUES 7
2003 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. Downloaded from http://jcc.sagepub.com at Bibliotheques de l'Universite Lumiere Lyon 2 on May 2, 2008.
1) Configural invariance: The goal here is to demonstrate that the items constituting the measurement instrument exhibit the same configuration of salient (different from zero) and nonsalient (fixed at zero) factor loadings (Λ^(g)) across the countries. Unidimensional models of values are evaluated using Schwartz's (1994) indications concerning the empirical location of each value. These first models are considered here as the basic models (H_base) for the tests of invariance. This type of model estimates factor variance, error variances, and factor loadings, with the exception of one factor loading (linking the same value item to its latent variable in all the samples) set at unity in order to anchor the scale of the latent variable.
2) Metric invariance: At this level, structural equation models can be used to test the hypothesis (H_Λ) that metrics or scale intervals are equal across countries. This requires that configural equivalence be accepted. To test the hypothesis of metric equivalence, the factor loadings are constrained to be equal across the samples (Λ^(1) = Λ^(2) = . . . = Λ^(G)). If metric invariance is confirmed, difference scores on an item can be meaningfully compared across samples, and these difference scores are indicative of similar cross-national differences in the latent variable. Metric invariance is a very important step in the evaluation of equivalence, as this level of invariance is a prerequisite for other levels to be tested.
3) Scalar invariance: In many studies, it is of primary importance to compare means across countries. To make such comparisons meaningfully, one must establish scalar invariance, which makes it possible to compare the observed means of the underlying construct across groups. This test requires providing the analyses with the observed means in addition to the covariances and imposing additional equality constraints on the intercepts (τ^(1) = τ^(2) = . . . = τ^(G)). This model (H_τ) is nested in the H_Λ model.
4) Factor variance invariance: This model (H_Φ) constrains the factor variances to be equal across the samples (Φ^(1) = Φ^(2) = . . . = Φ^(G)), in addition to the parameters already fixed in the H_Λ model. The H_Φ model is nested in the H_Λ model.
5) Error variance invariance: The last model (H_Θδ) adds to the H_Φ model the constraint of equal error variances across the samples (Θ_δ^(1) = Θ_δ^(2) = . . . = Θ_δ^(G)). These models thus form the nested sequence H_base > H_Λ > H_τ > H_Φ > H_Θδ. If two nested models fit the data equally well, the most constrained model (hence the most parsimonious) is accepted. If not, the hypothesis of equivalence is rejected and the least constrained model is retained. Subsequent model comparisons need not be evaluated (see Van de Vijver & Leung, 1997a).
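To make Equations 1-3 concrete, the group-specific moment structure can be computed directly. The following numpy sketch uses made-up parameter values for a hypothetical four-item, one-factor group; all numbers are illustrative and none are estimates from this article:

```python
import numpy as np

# Hypothetical one-factor, four-item group g (illustrative values only).
Lam = np.array([[1.0], [0.8], [1.1], [0.9]])  # Lambda: first loading fixed at 1
tau = np.array([3.2, 2.9, 3.5, 3.0])          # item intercepts
kappa = np.array([0.4])                       # latent mean
Phi = np.array([[0.6]])                       # latent variance
Theta = np.diag([0.5, 0.7, 0.4, 0.6])         # diagonal error variances

# Equation 2: model-implied item means, mu = tau + Lambda kappa
mu = tau + Lam @ kappa
# Equation 3: model-implied covariances, Sigma = Lambda Phi Lambda' + Theta_delta
Sigma = Lam @ Phi @ Lam.T + Theta
```

Constraining Λ to be equal across groups (metric invariance) leaves Φ^(g) to carry group differences in the covariance structure; additionally constraining τ (scalar invariance) attributes group differences in μ^(g) to κ^(g) alone.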
It is also important to note that other sequences could be defined on the basis of the researcher's interest. In particular, the last two tests, concerning factor variance and error variance invariance, can be performed in the inverse order (H_Θδ > H_Φ) or separately (H_Φ, H_Θδ), as they do not build on each other as the others do (Steenkamp & Baumgartner, 1998). In this article, the sequence H_base > H_Λ > H_τ > H_Φ > H_Θδ was followed.
variance for the item creativity) in the French sample. It is always difficult to understand why improper solutions appear, as they may be the consequence of different factors, such as sampling fluctuations, outliers, or a fundamental fault in the specification of the model (Bollen, 1989b). The proposed solution was to evaluate the overall fit of the models excluding the French sample, in which the Heywood case appeared. The IFI and RMSEA indicated that the four- and five-item models had a relatively better fit than the six-item model, but taking into account the SRMR and RMSEA, all three models were accepted.
The next value type concerned Tradition values. A Heywood case (negative error variance on the item humble) appeared in the Ugandan sample for the five-item model. This problem was again resolved by discarding this sample from the analysis. The results for the four- and five-item models indicated an excellent fit of both models.
Finally, for Universalism values, seven models were tested. Of these, two models were tested with five items each, because the fifth item, social justice, and the sixth item, wisdom, were equally correctly situated in the Universalism value type in Schwartz's analyses (see Table 1). Concerning the fit of the models for Universalism, the picture is somewhat fuzzy. First, on the basis of the SRMR, the nine-, seven-, and six-item models had values equal to or just above the indicated threshold of 0.08. Taken together, the RMSEA and IFI also led us to reject the nine- and seven-item models. We then had the choice between the eight-item, the two five-item, and the four-item models. Considering that the seven- and six-item models did not reach acceptable standards, it was decided to reject the eight-item model, and thus to consider that the two five-item models contained the maximum number of items defined by a single dimension of values.
Metric invariance. To test the metric invariance of the value types across the country samples, a constraint was added to the models used for configural invariance: the factor loadings were fixed to be equal across the samples (H_Λ). As these models were nested in the previous models (H_base), it was possible, in addition to the absolute fit indices, to report comparative fit indices (Δχ² and ΔRMSEA) in order to assess whether the constraints on the factor loadings significantly deteriorated the specified models (see Table 5).
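The degrees of freedom reported for Δχ² in Table 5 (120, 100, 80, and 60 for 7, 6, 5, and 4 items) follow from equating the p − 1 freely estimated loadings across the G = 21 samples. A minimal sketch of this comparative test, using scipy and the Achievement four-item Δχ² reported in Table 5:

```python
from scipy.stats import chi2

def metric_delta_df(p_items: int, n_groups: int = 21) -> int:
    """df gained by equating loadings across groups: the configural model
    estimates (p - 1) loadings per group (one loading is fixed at 1 to scale
    the factor), whereas the metric model estimates only (p - 1) loadings
    shared by all groups."""
    return (p_items - 1) * (n_groups - 1)

# Matches the note to Table 5: 120, 100, 80, 60 for 7, 6, 5, 4 items.
dfs = [metric_delta_df(p) for p in (7, 6, 5, 4)]

# Achievement H_Lambda 4-item model (Table 5): delta chi2 = 97.79 with 60 df.
p_value = chi2.sf(97.79, metric_delta_df(4))  # significant at .01 but not .001
```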
For Achievement values, three models were considered, with six, five, and four items, respectively. Considering the RMSEA and IFI overall fit indices, the five-item model was associated with unacceptable values. This led us to also reject the six-item model, which contained the five-item model, and to accept the four-item model as metrically invariant. This conclusion also considered the ΔRMSEA index, which indicated that this model was not different from the basic model.
Five models were tested for Benevolence values, comprising eight to four items. Considering the overall fit indices, in particular the SRMR and RMSEA, only the eight-item model was rejected for metric equivalence (on the basis of the SRMR and IFI).
Both overall and comparative fit indices for Conformity values indicated that the four-
item model was metrically equivalent across the national groups. As for the configural
hypothesis, this value dimension appeared to have excellent measurement properties for
cross-cultural comparisons, here of difference scores.
Hedonism was tested for the hypothesis of metric equivalence using the three-item form. This implied that no comparative test was available, as the basic model has no degrees of freedom. Surprisingly, taking into account that a small number of items usually favored the acceptance of the hypotheses of equivalence across our results, here both the RMSEA and IFI
indices were just above the defined thresholds. Therefore, the hypothesis of metric equivalence was rejected for Hedonism.
The Power four-item H_Λ model was rejected on the basis of both the RMSEA and IFI. The Power values included the items social power, authority, and wealth as best indicators (see Table 1), which together constituted a fairly consistent set of power values. This was the
TABLE 5
Absolute and Comparative Fit Indices for the Equal Factor Loadings Models (H_Λ)
and the Change in Fit From the Basic Models (H_base)

                 Absolute Fit Indices                                Comparative Fit Indices
Value Type/      Satorra-                         RMSEA;                               ΔRMSEA;
Hypothesis       Bentler χ²  df   SRMR   RMSEA   90% CI         IFI   Δχ²       ΔRMSEA  90% CI

Achievement
  H_Λ6    408.21***  289  0.064  0.048  0.037; 0.058  0.87  200.72***  0.016  0.013; 0.020
  H_Λ5    329.01***  185  0.063  0.066  0.054; 0.077  0.87  177.38***  0.018  0.014; 0.021
  H_Λ4    139.54**   102  0.076  0.045  0.024; 0.063  0.93   97.79**   0.013  0.008; 0.017
Benevolence
  H_Λ8    755.43***  560  0.081  0.044  0.036; 0.052  0.82  294.44***  0.017  0.014; 0.020
  H_Λ7    562.56***  414  0.062  0.045  0.035; 0.054  0.82  172.15**   0.011  0.007; 0.014
  H_Λ6    407.83***  289  0.050  0.048  0.037; 0.058  0.81  146.30**   0.011  0.007; 0.015
  H_Λ5    273.41***  185  0.038  0.052  0.038; 0.064  0.82   98.54     0.008  0.000; 0.013
  H_Λ4    122.31     102  0.031  0.033  0.000; 0.053  0.88   67.89     0.006  0.000; 0.012
Conformity
  H_Λ4    102.65     102  0.058  0.006  0.000; 0.040  0.97   54.74     0.000  0.000; 0.008
Hedonism
  H_Λ4    195.72***  102  0.047  0.072  0.056; 0.087  0.94  115.22***  0.016  0.011; 0.020
Security
  H_Λ7    516.31***  414  0.083  0.037  0.026; 0.047  0.87  143.94     0.000  0.000; 0.006
  H_Λ6    355.63**   289  0.091  0.036  0.021; 0.048  0.88  122.34     0.008  0.000; 0.012
  H_Λ5    237.04**   185  0.100  0.040  0.022; 0.054  0.90  102.89*    0.009  0.002; 0.013
  H_Λ4    135.23*    102  0.120  0.043  0.020; 0.061  0.92   76.44     0.009  0.000; 0.014
Self-Direction
  H_Λ6    400.17***  289  0.062  0.046  0.035; 0.057  0.82  131.18     0.009  0.004; 0.013
  H_Λ4     79.99     102  0.075  0.000  0.000; 0.046  0.97   42.34     0.000  0.000; 0.000
Universalism
  H_Λ5a   230.52*    185  0.056  0.037  0.018; 0.052  0.85   92.56     0.006  0.000; 0.012
  H_Λ5b   231.17*    185  0.056  0.037  0.019; 0.052  0.84   98.57     0.008  0.000; 0.013
  H_Λ4    142.82**   102  0.067  0.047  0.027; 0.065  0.87   84.72*    0.010  0.004; 0.015

NOTE: SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CI = confidence interval; IFI = Incremental Fit Index; NA = not available. Δχ² reports differences in χ² for nested models. Model 5a includes the item social justice; Model 5b includes the item wisdom. Degrees of freedom for Δχ² are respectively 120, 100, 80, and 60 for 7, 6, 5, and 4 items.
*p < .05. **p < .01. ***p < .001.
reason why the three-item model was also considered and tested, even if the rule of four items or more per factor was generally followed. The result was that this three-item model had a relatively good fit, as indicated by all the absolute fit indices, and may, in consequence, be considered metrically equivalent across the samples.

The four Security models, including from seven to four items, had to be rejected, as they all were associated with high SRMR values. Only the three-item model, including the items clean, national security, and social order, reached an acceptable SRMR value, and the other overall fit indices confirmed that this reduced model may be metrically equivalent across the national samples.
Three models for Self-Direction, with six, five, and four items, were evaluated for metric equivalence. The overall fit indices indicated an acceptable overall fit of all three models, especially taking into account the SRMR and RMSEA. Moreover, even if the comparative fit indices were not all available, those concerning the six-item model indicated that this constrained model was equivalent to the basic model.
For the three-item Stimulation model, the SRMR and IFI indicated a good fit, even though the value of the RMSEA upper confidence limit was a little too high. The hypothesis of metric equivalence was accepted for the three-item model.
For Tradition values, the results were clear. Both overall and comparative fit indices confirmed that the four-item model had an excellent fit, better than the five-item model (for which the SRMR and IFI indicated a lack of fit). The four-item model was accepted as metrically equivalent.
Finally, the three models of Universalism showed comparable results, with acceptable SRMR and RMSEA values for the absolute fit indices and relatively low values for the IFI. With the exception of the significant difference in chi-square for the four-item model testing configural versus metric equivalence, all other comparative evaluations indicated that the most constrained models were equivalent to the basic models and should thus be accepted as indicating metric equivalence for Universalism values measured with the two five-item models across the samples.
Scalar invariance. The hypothesis tackled here concerns equivalence in means, or scalar invariance (H_τ). The models tested, as specified before, were nested in the H_Λ models and thus were statistically compared to them. Here, as one might expect on the basis of the results presented in Table 3, all the models tested showed a very bad fit, and the results are therefore not presented in detail. The Conformity four-item model (a good-fitting model in the previous analyses) can be taken as an example: Satorra-Bentler χ²(182) = 1236.69, p < .001; SRMR = 0.058; RMSEA = 0.18 [0.17; 0.19]; IFI = 0.30; Δχ²(80) = 1134.04, p < .001; ΔRMSEA = 0.059 [0.056; 0.062]. In consequence, the hypothesis of equivalent means across the samples had to be rejected for all value types.
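As a brief sketch of how the reported statistics yield this rejection, the Conformity values above can be checked against the chi-square distribution using scipy's survival function:

```python
from scipy.stats import chi2

# Conformity four-item scalar model (values as reported in the text).
sb_chi2, df = 1236.69, 182            # Satorra-Bentler chi2 of the H_tau model
delta_chi2, delta_df = 1134.04, 80    # change in fit from the H_Lambda model

p_overall = chi2.sf(sb_chi2, df)      # p-value of the overall model chi2
p_delta = chi2.sf(delta_chi2, delta_df)  # p-value of the nested-model test
# Both p-values fall far below .001: the equality constraints on the
# intercepts clearly deteriorate the model, so scalar invariance is rejected.
```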
Factor variance invariance. Another step in the analysis of levels of equivalence involves the test of factor variance invariance across the samples (H_Φ). These models were nested in the H_Λ models (see Table 6).
Values of the SRMR for the four-item model of Achievement indicated that this model had to be rejected. This was also true for the three-item model (not reported here), which had also been tested and showed a rather bad fit (SRMR = 0.14).
Four models were tested for Benevolence values. The result was somewhat ambiguous. On one hand, all RMSEA values indicated a good fit, contrary to the IFI values, which all indicated a relatively bad fit. On the other hand, the SRMR values were all close to the
threshold value of 0.08, but only the seven- and four-item models had acceptable SRMR values. Comparative fit indices showed that all models were equivalent to the H_Λ models. As all these models were somewhat equivalent, and as the seven-item and four-item models should be accepted on the basis of the rules we are following, we suggest rejecting the seven-item model, which included the rejected six- and five-item models, and accepting the four-item model as having a unique and equal variance across the samples.
The Conformity four-item model was accepted, as indicated by the overall and comparative fit indices. Thus, the hypothesis of equal factor variance across the 21 countries was confirmed for this value type.
Both the IFI and RMSEA indicated a lack of overall fit of the three-item Power model, for which the hypothesis of factor variance equivalence across samples was rejected. All the models for the Security values, including from seven to four items, were rejected on the basis
TABLE 6
Fit Indices for the Variance Equivalence Models (H_Φ) and
the Change in Fit From the Equal Factor Loadings Models (H_Λ)

                 Overall Fit Indices                                 Comparative Fit Indices
Value Type/      Satorra-                         RMSEA;                              ΔRMSEA;
Hypothesis       Bentler χ²  df   SRMR   RMSEA   90% CI         IFI   Δχ²      ΔRMSEA  90% CI

Achievement
  H_Φ4    183.86***  122  0.12   0.053  0.037; 0.068  0.90  44.32**   0.018  0.011; 0.025
Benevolence
  H_Φ7    597.11***  434  0.078  0.046  0.036; 0.055  0.81  34.55*    0.014  0.005; 0.021
  H_Φ6    439.49***  309  0.084  0.049  0.038; 0.059  0.80  31.28     0.012  0.000; 0.020
  H_Φ5    303.04***  205  0.088  0.052  0.039; 0.064  0.80  29.63     0.011  0.000; 0.019
  H_Φ4    156.98**   122  0.073  0.040  0.018; 0.057  0.83  34.67*    0.014  0.005; 0.022
Conformity
  H_Φ4    145.27**   122  0.057  0.033  0.000; 0.051  0.95  42.62**   0.017  0.010; 0.025
Power
  H_Φ3     99.85***   60  0.049  0.061  0.039; 0.081  0.93  51.72***  0.021  0.014; 0.027
Security
  H_Φ7    552.75***  434  0.120  0.039  0.028; 0.049  0.86  36.44*    0.015  0.007; 0.022
  H_Φ6    392.93***  309  0.140  0.039  0.026; 0.050  0.87  37.30*    0.015  0.007; 0.023
  H_Φ5    275.52***  205  0.150  0.044  0.029; 0.057  0.88  46.87***  0.019  0.012; 0.029
  H_Φ4    174.90**   122  0.140  0.049  0.032; 0.065  0.89  39.67**   0.016  0.009; 0.023
Self-Direction
  H_Φ6    427.59***  309  0.066  0.046  0.035; 0.057  0.81  27.42     0.010  0.010; 0.011
  H_Φ5    256.33**   205  0.059  0.037  0.020; 0.051  0.85  27.68     0.010  0.000; 0.018
  H_Φ4    122.72     122  0.051  0.006  0.000; 0.038  0.90  23.97     0.007  0.000; 0.016
Stimulation
  H_Φ3    100.67***   60  0.120  0.061  0.040; 0.082  0.96  37.62**   0.015  0.007; 0.023
Tradition
  H_Φ4    116.75     122  0.120  0.000  0.000; 0.033  0.93  36.76*    0.015  0.007; 0.022
Universalism
  H_Φ5a   246.86*    205  0.068  0.034  0.014; 0.048  0.84  16.34     0.000  0.000; 0.011
  H_Φ5b   247.38*    205  0.067  0.034  0.014; 0.048  0.84  16.21     0.000  0.000; 0.011
  H_Φ4    159.41*    122  0.070  0.041  0.020; 0.058  0.87  16.59     0.000  0.000; 0.011

NOTE: SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CI = confidence interval; IFI = Incremental Fit Index. Model 5a includes the item social justice; Model 5b includes the item wisdom. Degrees of freedom for Δχ² = 20.
*p < .05. **p < .01. ***p < .001.
of the RMSEA and IFI. A complementary analysis with the three-item model gave the same result, with RMSEA = 0.060 [0.038; 0.081] and IFI = 0.90.

The SRMR and RMSEA and the two comparative fit indices confirmed that the hypothesis of equal variance could be accepted for the four-item Self-Direction model. This was not the case for the three-item model for the Stimulation values, which had to be rejected on the basis of both the SRMR and RMSEA.
Concerning Tradition values, the hypothesis of equal variance across the samples was rejected for the four-item model on the basis of the SRMR and IFI. The three-item model (not reported here) had also been tested but revealed a lack of fit, with almost equal SRMR and IFI values.
Finally, the three models for Universalism had acceptable overall and comparative fit indices. These indicated that the hypothesis of equal factor variance across the samples could be accepted for both five-item models, including the item social justice or the item wisdom.
Error variance invariance. The last step in the analysis of equivalence concerns the hypothesis of invariance of the items' error variances (H_Θδ).