Journal of Statistical Software: More On Multidimensional Scaling and Unfolding in R: Smacof Version 2
Abstract
The smacof package offers a comprehensive implementation of multidimensional scaling (MDS) techniques in R. Since its first publication (De Leeuw and Mair 2009b) the functionality of the package has been enhanced, and several additional methods, features and utilities were added. Major updates include a complete re-implementation of multidimensional unfolding allowing for monotone dissimilarity transformations, including row-conditional, circular, and external unfolding. Additionally, the constrained MDS implementation was extended in terms of optimal scaling of the external variables. Further package additions include various tools and functions for goodness-of-fit assessment, unidimensional scaling, gravity MDS, asymmetric MDS, Procrustes, and MDS biplots. All these new package functionalities are illustrated using a variety of real-life applications.
1. Introduction
Multidimensional scaling (MDS; Torgerson 1952; Kruskal 1964; Borg and Groenen 2005)
is a technique that represents proximities among objects as distances among points in a
low-dimensional space. Multidimensional unfolding (Coombs 1964; Busing, Groenen, and
Heiser 2005; Borg and Groenen 2005) is a related technique that represents input preference
data as distances (among individuals and objects) in a low-dimensional space. Nowadays,
MDS as well as unfolding problems are typically solved through numeric optimization. The
state-of-the-art approach is called SMACOF (Stress Majorization of a Complicated Function;
De Leeuw 1977)¹ and provides the user with a great amount of flexibility for specifying

¹Originally, the “C” in SMACOF stood for “convex”, which was later changed to “complicated” as the stress function is not convex.
Table 1: Overview of newly implemented smacof functions (and key arguments), grouped by
their purpose.
MDS and unfolding variants. Since the first publication of the smacof package in R by De
Leeuw and Mair (2009b), several additional MDS and unfolding approaches as well as various
extensions and utility functions have been implemented, as presented in this article. We keep
our elaborations fairly applied since the core technical details were already provided in the
original publication.
The first part of this paper gives the reader the key ingredients of MDS, with a special focus
on newly implemented dissimilarity transformation functions. This is followed by a section
on MDS goodness-of-fit assessment, including various ways of assessing the stability of a
solution, and a section on MDS biplots. The incorporation of optimal scaling on the external
variables, as presented in a subsequent section, makes MDS an attractive tool for confirmatory
research. What follows next is a detailed presentation of the recently implemented unfolding
function, which adds great amounts of flexibility in model specification as compared to the
original implementation. Finally, several smaller additions such as Procrustes transformation,
asymmetric MDS, gravity MDS, and unidimensional scaling are presented. Table 1 gives an
overview of these developments. Related R packages are mentioned in the respective sections.
2. SMACOF in a nutshell
MDS takes a symmetric dissimilarity matrix ∆ of dimension n×n with non-negative elements
δij as input. These dissimilarities can be either directly observed (e.g., in an experimental
setting a participant has to rate similarities between pairs of stimuli) or derived (e.g., by ap-
plying a proximity measure on a multivariate data frame). If the data are collected or derived
as similarities sij, the sim2diss function helps users convert them into dissimilarities
δij . Corresponding conversion formulas are given in Table 2. Additional technical details on
various conversions can be found in Shepard (1957), Gower and Legendre (1986), Ramsay
(1997), Esposito, Malerba, Tamma, and Bock (2000), Fleiss, Levin, and Paik (2003), Heiser
and Busing (2004), and Keshavarzi, Dehghan, and Mashinchi (2009). The resulting matrix
∆ can then be passed to the respective MDS functions.
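As a concrete illustration of one entry in Table 2, the correlation-based conversion δij = sqrt(1 − rij) can be spelled out in base R. This is a hedged sketch with a toy correlation matrix of our own (sim2diss offers a corresponding built-in conversion method):

```r
# Hedged sketch (toy data, not from the paper): converting a correlation
# matrix into dissimilarities via delta_ij = sqrt(1 - r_ij), one of the
# conversions listed in Table 2, written out in base R.
R <- matrix(c(1.0, 0.8, 0.3,
              0.8, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)
D <- sqrt(1 - R)   # element-wise; the diagonal becomes 0
round(D, 3)
```

The resulting matrix D is symmetric with a zero diagonal, as required for input to the MDS functions.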
SMACOF uses majorization (see De Leeuw and Mair 2009b, for details) to solve Kruskal’s stress target function (Kruskal 1964)

\[ \sigma(\hat{D}, X) = \sum_{i<j} w_{ij} \left( \hat{d}_{ij} - d_{ij}(X) \right)^2 \tag{1} \]

with \( \sum_{i<j} w_{ij} \hat{d}_{ij}^2 = n(n-1)/2 \) as constraint. Let us explain the components involved in this expression. The \( d_{ij}(X) \) are the Euclidean distances between the rows of the \( n \times p \) configuration matrix \( X \):

\[ d_{ij}(X) = \sqrt{\sum_{s=1}^{p} (x_{is} - x_{js})^2}. \]

Table 2: Conversions of similarities into dissimilarities: similarities sij, correlations rij, frequencies fij, proportions/probabilities pij.
The dˆij ’s are the disparities (also called d-hats), collected in the n × n matrix D̂. Disparities
are optimally scaled dissimilarities. That is, a transformation admissible on the assumed scale
level (“measurement levels as functions”; see, e.g., Jacoby 1999) is applied. The first smacof
package incarnation offered only two specification options: metric or non-metric. The new
package version implements the following bundle of transformation functions (ordered from
most restrictive to least restrictive):
• Monotone spline MDS: dˆij = f (δij ) where f is an I-spline (integrated spline) transfor-
mation (Ramsay 1988) with fixed number of knots and spline degree.
• Ordinal MDS: dˆij = f (δij ) where f is a monotone step function. Approaches for tie
handling (i.e., in case of δij = δi′ j ′ ) are the following:
– Primary approach (“break ties”): does not require that dˆij = dˆi′ j ′ .
– Secondary approach (“keep ties tied”): requires that dˆij = dˆi′ j ′ .
– Tertiary approach: requires that the means of the tie blocks are in the correct
order.
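The monotone step function at the heart of ordinal MDS is an isotonic regression of the configuration distances on the rank-ordered dissimilarities. The following is a hedged base-R sketch with toy vectors of our own, using stats::isoreg as a stand-in for smacof's internal (weighted) monotone regression engine:

```r
# Hedged sketch of the monotone step function behind ordinal MDS:
# disparities (d-hats) arise from an isotonic regression of the current
# configuration distances on the rank-ordered dissimilarities.
# stats::isoreg stands in for smacof's internal weighted implementation.
delta <- c(0.2, 0.5, 0.5, 0.9, 1.3)   # dissimilarities (note the tie)
d     <- c(0.4, 0.3, 0.6, 0.8, 0.7)   # current configuration distances
ord   <- order(delta)
dhat  <- isoreg(d[ord])$yf            # monotone non-decreasing d-hats
dhat                                  # 0.35 0.35 0.60 0.75 0.75
```

Note that under the primary approach the tied dissimilarities (both 0.5) receive different d-hats here (0.35 and 0.60); the secondary approach would force them to be equal.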
In the MDS literature, many experiments and other MDS software in mainstream statistical
packages have been using stress-1. Fortunately, there exists a simple relation between σn and
σ1 , as shown in detail in Borg and Groenen (2005, Chapter 11). They prove that at a local
minimum X*

\[ \sigma_1(\hat{D}, X^*) = \sqrt{\sigma_n(\hat{D}, X^*)}. \tag{4} \]
Therefore, without loss of generality, we report stress-1 in all MDS functions implemented in smacof.
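For concreteness, stress-1 can be computed directly from disparities, distances, and weights. The following base-R sketch uses toy vectors of our own together with the standard definition σ1 = sqrt(Σ wij (d̂ij − dij)² / Σ wij d²ij):

```r
# Hedged sketch: explicit stress-1 computation on toy vectors (not data
# from the paper). sigma_1 = sqrt(sum(w * (dhat - d)^2) / sum(w * d^2)).
dhat <- c(0.3, 0.7, 1.1, 1.4)   # disparities
d    <- c(0.4, 0.6, 1.0, 1.5)   # configuration distances
w    <- rep(1, 4)               # weights
sigma1 <- sqrt(sum(w * (dhat - d)^2) / sum(w * d^2))
round(sigma1, 3)                # 0.103
```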
To illustrate MDS with different types of transformation functions we use a simple dataset
from Guttman (1965). The data consist of an 8 × 8 matrix containing correlations of eight
items in an intelligence test. First, we need to convert these similarities into dissimilarities, as
all smacof functions operate on dissimilarities. Second, we fit four MDS versions and report
the corresponding stress values.
R> library("smacof")
R> idiss <- sim2diss(intelligence[,paste0("T", 1:8)])
R> fitrat <- mds(idiss)
R> fitint <- mds(idiss, type = "interval")
R> fitord <- mds(idiss, type = "ordinal")
R> fitspl <- mds(idiss, type = "mspline")
R> round(c(fitrat$stress, fitint$stress, fitord$stress, fitspl$stress), 3)
The variability in the stress values across the different transformations is due to the differing
amounts of flexibility provided by each of the transformations. Figure 1 shows the Shepard
diagrams involving four different transformation functions. These diagrams plot the observed
dissimilarities δij against the fitted distances dij (X), and map the disparities dˆij into the
point cloud (De Leeuw and Mair 2015).
The option to apply various dissimilarity transformations is one of the advantages of the
SMACOF framework compared to classical scaling (Torgerson 1952) as implemented in stats’
cmdscale. In smacof, these transformation functions are now also available for all kinds of
three-way MDS models (indscal and idioscal functions), as well as for confirmatory MDS
and unfolding, as described further below.
This leads to a stress value of 0.2185. Now we fit 100 additional ratio MDS models based on
different random starts, and report the lowest stress value.
R> set.seed(123)
R> stressvec <- rep(NA, 100)
R> fitbest <- mds(WishD, init = "random")
R> stressvec[1] <- fitbest$stress
R> for(i in 2:100) {
+ fitran <- mds(WishD, init = "random")
+ stressvec[i] <- fitran$stress
+ if (fitran$stress < fitbest$stress) fitbest <- fitran
+ }
R> round(fitbest$stress, 4)
[1] 0.2178
This solution leads to a slightly lower stress value than the one obtained with a classical
scaling start. From a purely statistical point of view the user would normally decide to go
with this solution. However, from a more substantive perspective, interpretability plays an
important role. For instance, there might be a solution with a reasonably low stress value
(but not the lowest) which leads to better interpretability. This issue is studied in detail in
Borg and Mair (2017) who propose the following strategy (p. 21–22):
1. Run an MDS analysis with a set of different initial configurations (e.g., using many
random configurations).
3. Use Procrustean fitting (see Section 7.4) to eliminate all meaningless differences (i.e.,
differences not driven by the data) among the MDS solutions.
5. Analyze the similarity structure of the MDS configurations with two-dimensional MDS
(to visualize the similarity structure) or cluster analysis (to identify types of MDS
configurations).
6. For each type of MDS configuration with a reasonably low stress value, plot one proto-
typical MDS solution and check its interpretability.
7. Pick the MDS solution that is acceptable in terms of stress value and gives the best
interpretation.
These steps to explore initial configurations are implemented in the icExplore function.
Again, we fit 100 ratio MDS models with random starts and save all fitted MDS objects
(returnfit argument).
R> set.seed(123)
R> icWish <- icExplore(WishD, nrep = 100, returnfit = TRUE)
R> plot(icWish, main = "IC Plot Wish")
Figure 2 shows the configuration plot of the 100 MDS solutions based on random starts
(cf. Step 5). The larger the size of the label, the larger the stress value and, therefore, the
worse the fit of the solution. Based on this plot the user can extract various solutions that fit
satisfactorily, plot the configurations, and interpret the solutions.
Figure 2: Similarity structure of 100 MDS solutions. Each label corresponds to an MDS
solution. The size of the labels (and their color shading) is proportional to the stress value.
of MDS from Section 2. As an example, we use a dataset from Lawler (1967) who studied
the performance of managers. There are three traits (T1 = quality of output, T2 = ability to
generate output, T3 = demonstrated effort to perform), and three methods (M1 = rating by
superior, M2 = peer rating, M3 = self-rating). We start the stress norm analysis by fitting a
2D ratio MDS model:
This leads to a stress value of 0.241. Let us explore the random stress values for this example
(n = 9, p = 2; 500 replications):
R> set.seed(123)
R> rstress <- randomstress(n = 9, ndim = 2, nrep = 500, type = "ratio")
This function call returns a vector of 500 stress values. Let x̄r denote the average random
stress value and σr the standard deviation. The default in the random stress literature (see,
e.g., Spence and Ogilvie 1973) is to use x̄r − 2σr as upper bound: if the observed stress value
is smaller than this cutoff, the stress can be considered as “significant”.
R> round(mean(rstress) - 2*sd(rstress), 2)

[1] 0.22
In our example the stress value of 0.241 from the original MDS fit is above this cutoff. This
suggests a “non-significant” result which implies that the 2D ratio MDS solution does not fit
satisfactorily.
There are several issues associated with such random stress norms. First, as Spence and
Ogilvie (1973) point out, the dispersion of the random stress norms is in general very small.
In most practical applications the strategy applied above leads to “significant” results; our
example is somewhat of a rare exception. Second, apart from n and p, other circumstances
such as the error in the data, missing values, as well as ties affect the stress (Mair et al. 2016;
Borg and Groenen 2005). Third, the benchmark is based on completely random configurations. Real-life data almost always have some sort of structure in them, such that the random stress strategy leads to “significant” results in most cases.
Instead of generating random dissimilarities, permutation tests can be used, as formalized in
Mair et al. (2016). They lead to “sharper” tests than random null configurations. There are
two scenarios for setting up a permutation scheme. First, in the case of directly observed
dissimilarities the elements in ∆ can be permuted. For each permutation sample an MDS
model of choice is fitted. Doing this many times yields a null distribution of stress
values. Second, for derived dissimilarities, Mair et al. (2016) propose a strategy for systematic
column-wise permutations (one variable at a time). This permutation scheme gives a more
informative null distribution compared to full column-wise permutations. For each permuta-
tion sample a dissimilarity matrix is computed, and an MDS fitted. Again, this gives a stress
distribution under the H0 of little departure from complete exchangeability of dissimilarities
in the data-generating process.
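The first scheme, permuting the elements of ∆ and refitting, can be sketched in a few lines. For self-containment this sketch is entirely our own: classical scaling (stats::cmdscale) plus an explicit raw stress-1 computation stands in for a smacof fit, and the dissimilarity matrix is simulated; smacof's permtest() implements the actual procedure:

```r
# Hedged sketch of a stress permutation null distribution for a directly
# observed dissimilarity matrix. cmdscale() + an explicit stress formula
# stand in for an MDS fit; toy data, not from the paper.
stress1 <- function(delta, X) {
  d <- dist(X)
  sqrt(sum((delta - d)^2) / sum(d^2))
}
set.seed(123)
Delta <- dist(matrix(rnorm(8 * 3), nrow = 8))     # toy dissimilarities
obs   <- stress1(Delta, cmdscale(Delta, k = 2))   # observed stress
perm  <- replicate(100, {
  Dp <- Delta
  Dp[] <- sample(Dp)                              # permute the elements
  stress1(Dp, cmdscale(Dp, k = 2))                # stress under H0
})
pval <- mean(perm <= obs)   # share of permutations fitting at least as well
```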
Let us illustrate both permutation scenarios. For directly observed dissimilarities we continue
with the Lawler example from above (500 permutations):
R> set.seed(123)
R> permLaw <- permtest(fitLaw, nrep = 500, verbose = FALSE)
R> permLaw
Figure 3: Left panel: ECDF of the permuted stress values (dashed gray line at α = 0.05,
solid gray line at the p-value). Right panel: Permutation stress histogram (red dashed line at
critical value, solid black line at observed stress value).
R> library("MPsychoR")
R> data("Wenchuan", package = "MPsychoR")
R> Wdelta <- dist(t(Wenchuan))
R> fitWen <- mds(Wdelta, type = "interval")
R> round(fitWen$stress, 3)
[1] 0.184
In the subsequent permtest call we provide the raw input data through the data argument.
This way the function knows that the permutations should be performed on the raw data
rather than on ∆. We also need to tell the function which dissimilarity measure we used
above before fitting the MDS. We perform 1000 replications.
R> set.seed(123)
R> permWen <- permtest(fitWen, data = Wenchuan, method.dat = "euclidean",
+ nrep = 1000, verbose = FALSE)
R> permWen
This time we reject H0 . Figure 3, obtained by calling plot(permWen), visualizes the results in
two ways: the left panel shows the empirical cumulative distribution function (ECDF) of the
permutation stress values, whereas the right panel shows the permutation stress histogram
including the critical value (lower 5% quantile) and the observed stress value.
Note that such permutation strategies can be applied to unfolding models (see Section 6) as
well (see Mair et al. 2016, for details).
ST can be interpreted as the ratio of between and total variance. To measure the cross-validity, that is, comparing the “predicted” configuration of object i as the i-th row in X̄* with the actual configuration (i-th row in X),

\[ CV = 1 - \frac{n\,\lVert X - \bar{X}^* \rVert^2}{\sum_{i=1}^{n} \lVert X^*_{-i} \rVert^2} \tag{6} \]
can be used. Using these two normalized measures the dispersion around the original solution
X can be simply expressed as
DI = 2 − (ST + CV ). (7)
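Assuming the jackknife solutions have already been Procrustes-fitted to the original configuration, (6) and (7) reduce to a few matrix operations. A hedged sketch of our own, using a degenerate toy case in which all jackknife solutions equal the original configuration (so that CV = 1):

```r
# Hedged sketch of cross-validity (6) and dispersion (7). Xfull is the
# full-sample configuration, Xjack a list of n jackknife solutions
# (assumed already Procrustes-fitted to Xfull); ST is taken as given.
cv_measure <- function(Xfull, Xjack) {
  n    <- length(Xjack)
  Xbar <- Reduce(`+`, Xjack) / n        # jackknife centroid configuration
  1 - n * sum((Xfull - Xbar)^2) / sum(sapply(Xjack, function(X) sum(X^2)))
}
set.seed(1)
Xfull <- matrix(rnorm(16 * 2), ncol = 2)
Xjack <- replicate(16, Xfull, simplify = FALSE)   # degenerate toy case
cv <- cv_measure(Xfull, Xjack)                    # equals 1 here
st <- 1                                           # placeholder ST value
di <- 2 - (st + cv)                               # dispersion index (7)
```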
The dataset we use to illustrate the jackknife MDS is from McNally, Mair, Mugno, and
Riemann (2017), included in the MPsychoR package. Below we scale 16 depression symptoms
reported by patients using the Quick Inventory of Depressive Symptomatology (QIDS-SR).
We fit a 2D ordinal MDS on the Euclidean distance input matrix, subject to an MDS jackknife.
Figure 4: Jackknife MDS plot. The labels are positioned at the original point coordinates,
and the stars represent the resampled solutions with the jackknife centroid at the center.
SMACOF Jackknife
Number of objects: 16
Value loss function: 0.3444
Number of iterations: 12
The print output shows the jackknife measures reported above. Figure 4 shows the jackknife
MDS plot. The points are placed at X (MDS configuration). The centers of the stars denote
the jackknife centroids, the rays the n − 1 jackknife solutions. This result suggests that the
solution is very stable.
Further options for using jackknife in MDS are presented in Vera (2017) where the distances
are subject to stability analysis.
\[ (z_j - x_i)\, S_i^{-1} (z_j - x_i)^\top = \chi^2(\alpha; p), \tag{8} \]
N denotes the number of bootstrap replications, X∗l the configuration of the l-th replication,
X̄∗ the bootstrap centroid configuration. Again, ST reflects a between/total variance ratio
and can be used to compare various MDS solutions against each other (Heiser and Meulman
1983). For instance, one could compare an unrestricted solution with a restricted solution
(see Section 5). The larger ST , the more stable the solution.
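In code, the between/total variance interpretation of ST translates into a short function. This sketch is our own illustration, assuming the decomposition ST = 1 − Σl ‖X*l − X̄*‖² / Σl ‖X*l‖² over aligned bootstrap configurations:

```r
# Hedged sketch of the bootstrap stability coefficient ST as a
# between/total variance ratio. Xboot: list of N bootstrap configurations
# (assumed already Procrustes-fitted to each other). Our own illustration,
# assuming ST = 1 - sum_l ||X*_l - Xbar*||^2 / sum_l ||X*_l||^2.
st_measure <- function(Xboot) {
  Xbar <- Reduce(`+`, Xboot) / length(Xboot)      # bootstrap centroid
  1 - sum(sapply(Xboot, function(X) sum((X - Xbar)^2))) /
      sum(sapply(Xboot, function(X) sum(X^2)))
}
set.seed(1)
X0    <- matrix(rnorm(16 * 2), ncol = 2)
Xboot <- lapply(1:10, function(i)                 # slightly perturbed copies
  X0 + matrix(rnorm(32, sd = 0.05), ncol = 2))
st_measure(Xboot)                                 # close to 1: very stable
```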
Let us apply the corresponding bootmds function on the depression data from above. We use
N = 500 bootstrap replications.
R> set.seed(123)
R> bootRogers <- bootmds(fitRogers, RogersSub, method.dat = "euclidean",
+ nrep = 500)
R> bootRogers
SMACOF Bootstrap:
Number of objects: 16
Number of replications: 500
In addition to the stability coefficient, the function also reports the stress averaged across
bootstrap samples, including the 95% confidence interval (bootstrap percentile).
R> plot(bootRogers)
Figure 5 shows the resulting bootstrap configuration with the confidence ellipsoids. There is
a fair amount of instability associated with the sleep-onset insomnia item (labeled “onset”).
The following plot function takes this object and produces the configuration plot with the
ellipsoids. Of importance is the eps argument which we set to 0.01 below. This value implies
that we look at a perturbation region where the stress value is at most 1% larger than the
local minimum we have found. Figure 6 shows the corresponding configuration plot.
Note that the scales along the axes differ from the ones in Figures 4 and 5 (apart from the
fact that ratio MDS is used). This is because the SMACOF engine for estimating pseudo-
confidence ellipsoids normalizes the coordinates differently (see De Leeuw 2019, for details).
Also, the shape differences in the confidence ellipsoids are due to different methods used to
construct the ellipsoids.
4. MDS biplots
Biplots were developed within the context of principal component analysis (PCA; Gabriel
1971). In a PCA biplot the loading vectors are mapped on top of the scatterplot of the prin-
cipal component scores. However, the concept of biplots can be applied to other multivariate
techniques as well, as elaborated in Greenacre (2010), Gower, Lubbe, and Le Roux (2011),
and Mair (2018). In MDS, biplots are often used to map external variables onto the MDS
configuration. Such covariates allow users to explore meaningful directions in the MDS space
rather than trying to interpret the dimensions directly. Note that Rabinowitz (1975) was one
of the first to suggest embedding axes representing external variables into MDS solutions in
order to facilitate substantive interpretations.
Let Y be a n × q matrix with q external variables in the columns, each of them centered
and optionally standardized (the latter simply changes the length of the biplot vector, not its
direction). To produce an MDS biplot, the following multivariate regression problem needs
to be solved:
Y = XB + E, (10)
where B is a p × q matrix containing p regression coefficients for each of the q variables, and E is
the n × q matrix of errors. The corresponding OLS estimates B̂ = (X⊤ X)−1 X⊤ Y give the
coordinates of the external variables in the MDS space. The smacof package provides the
biplotmds function which performs the regression fit. By default, the external variables are
standardized internally (default scale = TRUE; scale = FALSE does centering only).
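The OLS step in (10) is ordinary multivariate regression, so B̂ can be verified with lm(). A hedged sketch on simulated data of our own (the configuration and external variables are toy objects, not the paper's):

```r
# Hedged sketch: the biplot coefficients B_hat = (X'X)^{-1} X'Y from (10)
# coincide with a multivariate regression without intercept. Toy data.
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)                  # toy 10 x 2 configuration
Y <- X %*% matrix(c(1, 0.5, -0.3, 2), nrow = 2) + # two external variables
     matrix(rnorm(20, sd = 0.1), ncol = 2)
B <- solve(t(X) %*% X, t(X) %*% Y)                # closed-form OLS estimate
fit <- lm(Y ~ -1 + X)                             # same regression via lm()
all.equal(unname(B), unname(coef(fit)))           # TRUE
```

The rows of B give the coordinates (direction and length) of the biplot vectors in the MDS space.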
Let us start with a simple example where we map a single metric variable onto a configuration.
We use a dataset taken from Engen, Levy, and Schlosberg (1958) on facial expressions (see
also Heiser and Meulman 1983). Participants had to rate proximities of 13 facial expressions,
resulting in the dissimilarity matrix ∆. Rating scale values were collected by Abelson and
Sermat (1962) for the dimensions “pleasant-unpleasant” (PU), “attention-rejection” (AR),
and “tension-sleep” (TS).
We fit an ordinal MDS solution, and map the pleasant-unpleasant (PU) variable on top of the
configuration. We present two biplot versions. First, we focus on the vector representation.
PU
D1 -1.6214189
D2 -0.6295513
These regression coefficients determine the direction and length of the biplot vector.
Second, we use the axis representation for which the calibrate package (Graffelman 2020)
turns out to be helpful. We start by computing the regression coefficients based on the
centered external variable. In order to make sure that the ticks on the biplot axis correspond
to the original scale, some additional preliminary lines are needed.
R> library("calibrate")
R> biFace2 <- biplotmds(fitFace, extvar = ext, scale = FALSE)
R> coef(biFace2)
PU
D1 -3.865508
D2 -1.500868
The top panel uses the vector representation as advocated in Greenacre (2010). Using the
vecscale argument the biplot vector can be scaled by its length. The bottom panel uses the
axis representation as preferred by Gower et al. (2011). For the axis representation we can
do an orthogonal projection of the points on the axis, which gives the fitted values.
Let us move on with a second, more complex example involving multiple external variables
which reproduces part of the analysis presented in Mair (2018). We use the mental states
dataset from Tamir, Thornton, Contreras, and Mitchell (2016) who, for each individual,
collected a dissimilarity matrix involving 60 mental states, derived from functional magnetic
resonance imaging (fMRI) scans. The data are included in the MPsychoR package. We
average across the individuals, which leads to a single 60 × 60 dissimilarity matrix, subject to
a 2D monotone spline MDS. After the biplot computations, we print out the R2 values from
the individual regression fits.
R> data("NeuralActivity")
R> data("NeuralScales")
R> NeuralD <- Reduce("+", NeuralActivity)/length(NeuralActivity)
R> fitNeural <- mds(NeuralD, type = "mspline")
R> biNeural <- biplotmds(fitNeural, NeuralScales[,1:8])
R> round(biNeural$R2vec, 3)
The vector version of the MDS biplot is given in Figure 8. The longer a covariate vector, the
larger the corresponding R2 . That is, the more accurate the corresponding axis projections
are in relation to the raw data. The orientation of the vectors reflects the correlation patterns
among the external variables, assuming the plot gives an accurate representation of the data
(of course, we lose information here due to projecting into a low-dimensional space). Other
options such as nonlinear MDS biplots are presented in Gower et al. (2011, Chapter 5),
including corresponding R code.
Figure 7: Top panel: Vector representation of external variable. Bottom panel: Axis
representation of external variable.
Figure 8: Biplot for mental state MDS configuration. External variables are represented as
vectors.
X = ZC, (11)

which is directly incorporated into the stress formula given in (1). Z is a known covariate matrix of
dimension n × q with number of covariates q ≥ p. C is a q × p matrix of regression weights
to be estimated, subject to potential additional restrictions, as outlined below.
For practical purposes, however, this basic implementation is of limited use. For instance,
specifying a 2 × 2 ANOVA design in Z collapses point coordinates to only four points in a 2D
configuration. What makes the external restriction concept attractive in practice is to apply
an additional optimal scaling step on the external scales within each majorization iteration.
Equation 11 changes to
X = ẐC. (12)
Each predictor variable z1 , . . . , zq is subject to an optimal scaling transformation. A popular
option is to scale these vectors in an ordinal way (i.e., using monotone regression). Other
transformations such as interval or splines (with or without monotonicity constraints) are
implemented in smacof as well. Note that, from a specification point of view, these exter-
nal variable transformations are unrelated to the dissimilarity transformations introduced in
Section 2.
Let us illustrate such a constrained MDS using the face expression data from Section 4. We
include the two external variables “pleasant-unpleasant” (PU) and “tension-sleep” (TS). They
constitute the matrix Z. We restrict C to be diagonal, which performs dimensional weighting.
Note that for this diagonal restriction the number of dimensions is determined by the number
of covariates (i.e., q = p), since each covariate defines an axis (dimension). We also use the
configuration from an unrestricted ordinal fit as initial configuration. It is important that
the user provides a reasonable starting configuration for the constrained MDS computation;
using one from an unrestricted fit is in general a good option.
Let us start with the first constrained MDS model: ordinal dissimilarity transformation of ∆,
interval transformed external variables in Z, diagonal regression weights restriction in C.
R> fitFace <- mds(FaceExp, type = "ordinal")
R> Z <- FaceScale[, c(1,3)]
R> fitFaceC1 <- smacofConstraint(FaceExp, type = "ordinal",
+ constraint = "diagonal", external = Z, constraint.type = "interval",
+ init = fitFace$conf)
R> round(fitFaceC1$C, 3)
[,1] [,2]
[1,] 1.068 0.000
[2,] 0.000 1.211
The last line shows the implied diagonal restriction in C. We obtain a stress value of 0.183
which, of course, is larger than the one from the unconstrained fit (0.106).
The resulting MDS configuration is given in Figure 9. Using the calibrate package the axes
of the external variables (original scales) can be added (see supplemental code materials).
These axes are a simple form of biplot axes, resulting from the diagonal restriction in C. For
this interval transformed solution the observed values in Z can be directly read from the PU
and TS axes; the configuration coordinates reproduce these values exactly.
In a second fit we relax the interval transformation of Z in terms of an ordinal transformation.
C is still kept diagonal.
R> fitFaceC2 <- smacofConstraint(FaceExp, type = "ordinal",
+ constraint = "diagonal", external = Z, constraint.type = "ordinal",
+ init = fitFace$conf)
R> round(fitFaceC2$C, 3)
[,1] [,2]
[1,] -1.034 0.00
[2,] 0.000 -1.08
Figure 10: Transformation plots for external variables (original scores from Z on the x-axis,
transformed scores from Ẑ on the y-axis).
Due to the less restrictive nature of this specification this solution has a lower stress value
(0.159) than the interval transformed solution from above. Figure 10 gives some insight into
the ordinal transformations performed internally on each column of Z.
Figure 11 shows the configuration with the transformed axes on top and to the right. Again,
the points can be projected onto these axes. The corresponding values match the ones in Ẑ.
Figure 11: Constrained MDS configuration (C diagonal) of face expression data: ordinal
transformed external variables.
For the next constrained MDS variant we use all three external variables in the dataset (i.e.,
PU, AR, and TS). As q > p we need to relax the diagonal restriction in C: we keep C
unrestricted and use once more an interval transformation of Z.
D1 D2
[1,] -0.887 -0.231
[2,] 0.087 -0.413
[3,] -2.571 4.344
Again, the three biplot axes can be mapped onto the configuration using the calibrate package,
after computing the regressions Z = XB with Z column-centered (see supplemental materials
for the entire code chunk).
Figure 12 displays the corresponding constrained MDS configuration with the biplot axes
on top. Each point can be projected on each axis. The projections are stored in each of
the calibrate objects (value: yt). Generally, the projected values do not correspond to the
observed values in Z as these calibrated biplot axes do not reproduce Z perfectly. As far as
Figure 12: Constraint MDS configuration with C unrestricted. Three external covariates
are added as biplot axes.
the axes are concerned, the biplot suggests that PU and TS are almost orthogonal, whereas
TS and AR are highly correlated in this 2D space. Dimension 2 lines up with
the AR axis, which is useful for the interpretation of the configuration.
A general dimensional interpretation on the basis of the external variables no longer holds
since C is not diagonal: the solution is rotated/reflected followed by dimensional stretching.
By applying an SVD on C the user can get the rotation matrices and the dimension stretching
values (see Borg and Groenen 2005, p. 232, for details).
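Such a decomposition can be obtained directly in R; a sketch (C denotes the fitted restriction matrix from the constrained MDS output):

```r
## SVD of the unrestricted C: rotations/reflections plus dimensional stretches
svdC <- svd(C)
svdC$u          # rotation/reflection matrix (left)
svdC$v          # rotation/reflection matrix (right)
svdC$d          # dimensional stretching values
```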
6. Unfolding
As mentioned in the introduction, one of the major updates since the first publication of the
package was a complete re-implementation of the unfolding function. This update gives the
user the possibility to apply the usual transformations on the dissimilarities, to incorporate
circular restrictions, and to fit row-conditional and external unfolding models.
$$\sigma_1(\hat{\mathbf{D}}, \mathbf{X}_1, \mathbf{X}_2) = \sqrt{1 - \frac{\bigl(\sum_{i,j} w_{ij}\,\hat d_{ij}\, d_{ij}(\mathbf{X}_1, \mathbf{X}_2)\bigr)^2}{\sum_{i,j} w_{ij}\,\hat d_{ij}^{\,2}\;\sum_{i,j} w_{ij}\, d_{ij}^{2}(\mathbf{X}_1, \mathbf{X}_2)}}. \qquad (15)$$
This expression provides a shortcut to compute the stress-1 value, given that we allow for
an optimal dilation constant. At the same time it is a trick that allows for an interpretation
in terms of the well-known stress-1 value after all the majorization computations are done.
Details on the majorization approach in the case of ratio transformations are given in
De Leeuw and Mair (2009b). Below we elaborate on a modification that is able to handle the
general monotone dissimilarity transformations from Section 2.
Obviously this penalty term acts as a multiplicative factor in Equation 16. As ν(D̂) decreases,
the p-stress penalization increases. There are two tuning parameters involved in this p-stress
setup:
• λ ∈ (0, 1] is a lack-of-penalty parameter that controls the influence of the penalty term:
the larger λ, the smaller the penalty influence.
• ω acts as a range parameter in the penalty term: for a small ω the penalty is especially
effective if ν(D̂) is small.
Busing et al. (2005) did an extensive simulation study in order to provide suggestions on how
to fix the tuning parameters. For conditional unfolding, it is suggested to set λ = 0.5, and
ω = 1 (default settings in unfolding)5 . For unconditional unfolding, they suggest that one
uses λ = 0.5 and ω > 0.1. Further details can be found in the corresponding publication.
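In the unfolding function these tuning parameters can be set explicitly; a sketch (argument names lambda and omega are assumed from the defaults mentioned above; delta is a placeholder dissimilarity matrix):

```r
library("smacof")
## Row-conditional p-stress unfolding with explicit tuning parameters
fit <- unfolding(delta, type = "ordinal", conditionality = "row",
                 lambda = 0.5, omega = 1)
```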
The p-stress target can be minimized using majorization, for which the details are again given
in Busing et al. (2005). From a practical point of view, after obtaining a p-stress optimized
solution, users can consider the stress-1 from Equation 15 as goodness-of-fit index6 . Note that
all the dissimilarity transformation functions from MDS (i.e., ratio, interval, ordinal, spline;
cf. Section 2) are implemented for unfolding as well.
Let us illustrate an ordinal unfolding solution. We use a dataset from Dabic
and Hatzinger (2009), available in the prefmod package (Hatzinger and Dittrich 2012), in which
individuals were asked to configure a car according to their preferences. They could choose
freely from several modules such as exterior and interior design, technical equipment, brand,
price, and producing country. We use only the first 100 individuals in this analysis.
R> library("prefmod")
R> carconf1 <- carconf[1:100, 1:6]
R> head(carconf1)
Since not all individuals ranked all objects, we have the situation of “partial rankings”. The
unfolding function specifies a proper weight matrix W automatically: wij = 0 if δij is
missing; wij = 1 otherwise. This way, the corresponding missing dissimilarities are blanked
out from the optimization. For the ordinal unfolding model we are going to fit, this weight
matrix can be extracted using unf_ord$weightmat.
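Such a fit could be set up as follows (a minimal sketch consistent with the object name unf_ord used here; argument names as in smacof's unfolding function):

```r
library("smacof")
unf_ord <- unfolding(carconf1, type = "ordinal")  # ordinal unfolding
unf_ord$weightmat                                 # automatically generated W
```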
Figure 13: Left panel: Unfolding configuration of car preference data. Right panel: Shepard
diagram car preference data (ordinal transformation).
Transformation: ordinalp
Conditionality: matrix
This call prints out the stress-1 value as well as the final p-stress value. The configuration
plot and Shepard diagram shown in Figure 13 can be produced as follows:
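A sketch of the plotting calls (titles illustrative; plot.type = "Shepard" as in smacof's plot method):

```r
plot(unf_ord, main = "Unfolding Configuration Car Preferences")
plot(unf_ord, plot.type = "Shepard",
     main = "Shepard Diagram Car Preferences")
```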
The Shepard diagram shows the ordinal transformation of the input dissimilarities, whereas
the configuration plot maps the row and column coordinates into a joint space, which makes
distances between any pair of points interpretable.
The d̂i ’s are the row vectors in D̂. The raw stress term in Equation 16 remains unadjusted
since it is additive over the rows.
Let us fit a row-conditional version of the ordinal unfolding on the car characteristics data.
We use the final configuration obtained above as starting configuration. Note that for running
time purposes we set a slightly more generous convergence boundary ε than the default7 . In
general, we recommend increasing the number of iterations using the itmax argument, if
needed. For a reasonably large sample size it can take a while for the algorithm to converge.
A parallelized fit can be evoked through the parallelize argument.
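The row-conditional fit might be specified as follows (sketch; the eps value is illustrative, and argument names are assumed from the unfolding interface):

```r
## Row-conditional ordinal unfolding; eps relaxed for running time
unf_cond <- unfolding(carconf1, type = "ordinal", conditionality = "row",
                      eps = 6e-5, parallelize = TRUE)
```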
Compared to the unconditional fit, the row-conditional version clearly reduced the stress-1.
Figure 14 shows the resulting configuration plot in the left panel. The Shepard diagram in
the right panel nicely illustrates the difference between unconditional and row-conditional
unfolding. While in unconditional unfolding we fitted only a single transformation function
(see right panel of Figure 13), in row-conditional unfolding each individual gets its own trans-
formation function. Since we have missing values in our data, not all individuals have the full
six-ranking monotone trajectories.
7 In the t-th iteration the convergence criterion used in unfolding is
2(σp(X1, X2)^(t−1) − σp(X1, X2)^(t)) ≤ ε(σp(X1, X2)^(t−1) + σp(X1, X2)^(t) + 10^−15).
Figure 14: Left panel: Row-conditional unfolding configuration of car preference data. Right
panel: Shepard diagram (ordinal transformation) row-conditional unfolding.
R> plot(unf_cond,
+ main = "Conditional Unfolding Configuration Car Preferences")
R> plot(unf_cond, plot.type = "Shepard",
+ main = "Shepard Diagram Car Preferences", col.dhat = "gray",
+ xlim = c(0.9, 6.1))
$$s_1 = \frac{\mathrm{trace}(\mathbf{F}_1^\top \mathbf{X}_1)}{\|\mathbf{F}_1\|^2}, \qquad s_2 = \frac{\mathrm{trace}(\mathbf{F}_2^\top \mathbf{X}_2)}{\|\mathbf{F}_2\|^2}.$$
Based on these scaling factors the updated coordinates are X1 := s1 F1 in the case of row
restrictions, or X2 := s2 F2 in the case of column restrictions. Using this adjustment the new
coordinates are properly scaled with respect to the unconstrained column/row coordinates,
while maintaining the specified shape constraints.
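As a numerical sketch (assuming the least-squares derivation with a squared Frobenius norm in the denominator; F1 and X1 are placeholder matrices for the fixed and unconstrained row coordinates):

```r
## Optimal scaling factor for fixed row coordinates (sketch)
s1 <- sum(diag(t(F1) %*% X1)) / sum(F1^2)
X1 <- s1 * F1   # rescaled fixed row coordinates
```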
Figure 15: Left panel: Unrestricted unfolding solution. Right panel: Externally restricted
unfolding solution (fixed circular coordinates for personal values).
To illustrate, we use a dataset from Borg, Bardi, and Schwartz (2017). We focus on the
Portrait Value Questionnaire (PVQ) portion of the data which result from a questionnaire of
40 items assessing how persons rate the personal importance of ten basic values: power (PO),
achievement (AC), hedonism (HE), stimulation (ST), self-direction (SD), universalism (UN),
benevolence (BE), tradition (TR), conformity (CO), security (SE) on a scale from 0 to 6. We
use an aggregated version where the item scores belonging to the same psychological value
are averaged. As fixed coordinates we use the following value circle coordinates:
This specification is different from spherical unfolding introduced below, as we fix the value
coordinates on the circle (equidistant) instead of just forcing them to be aligned on a circle.
Of course, in external unfolding we can specify any arbitrarily fixed configuration; it does not
have to be a circle.
Below we fit two solutions: an unconstrained ordinal unfolding solution, and a constrained
ordinal unfolding solution with fixed circular column coordinates. Since smaller responses in
the PVQ data reflect larger dissimilarities, we reverse the category scores.
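The two fits might be set up as follows (sketch; PVQdis stands for the reversed, aggregated dissimilarities, the circle coordinates are equidistant points on the unit circle, and the fixed/fixed.coord argument names are assumptions based on the external-unfolding interface):

```r
library("smacof")
tt <- seq(0, 2 * pi, length.out = 11)[-11]        # 10 equidistant angles
circ_coords <- cbind(cos(tt), sin(tt))            # fixed value circle
unf_pvq    <- unfolding(PVQdis, type = "ordinal") # unconstrained
unf_pvqext <- unfolding(PVQdis, type = "ordinal",
                        fixed = "column", fixed.coord = circ_coords)
```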
The stress value of the unconstrained solution is 0.208, whereas that of the external solution is
0.274, which is clearly larger. The plots given in Figure 15 reflect the corresponding differences
in the configurations. The unrestricted solution clearly deviates from the theoretical circle.
We have a single radius r and the corresponding angles θc,i . To compute Xc , we want to
minimize the quadratic part of the majorizing function. In the last term, the best θc,i that
can be chosen is the one that maximizes cos(θc,i ) cos(θi ) + sin(θc,i ) sin(θi ) = cos(θc,i − θi ),
which implies choosing θc,i = θi .
This simple expression gives us the optimal circular projection of the row coordinates in X.
As mentioned above, the same steps can be carried out for the column coordinates (X := X2 ;
replace i by j, and n by m in these equations).
To illustrate an unfolding solution where we restrict the column coordinates to be on a circle,
we use once more a dataset from Borg et al. (2017) which builds on the Schwartz (1992)
value circle theory. The data are derived from the Schwartz Value Survey (SVS). They were
centered (row-wise) and converted from preferences into dissimilarities, hence representing a
Figure 16: Left panel: Unrestricted unfolding configuration of personal values. Right panel:
Circular unfolding solution personal values (circle superimposed).
rectangular dissimilarity matrix ∆ with 327 persons and 10 variables referring to Schwartz’
psychological values: power, achievement, hedonism, stimulation, self-direction, universalism,
benevolence, tradition, conformity, and security. We fit two (ratio) unfolding solutions: an
unrestricted one as well as one with circular restrictions on the column coordinates (values):
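A sketch of the two calls (indvalues stands for the SVS dissimilarity matrix; the circle argument name is an assumption based on the unfolding interface):

```r
library("smacof")
unf_svs  <- unfolding(indvalues)                    # unrestricted (ratio)
unf_circ <- unfolding(indvalues, circle = "column") # circular column restriction
```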
Comparing the stress-1 values we get 0.171 for the unrestricted solution, and 0.179 for the
restricted solution. This suggests that the circular solution is basically as good as the unre-
stricted one.
The reason for this becomes obvious when looking at the configuration plots in Figure 16.
The unrestricted solution in the left panel suggests that the personal values approximate a
circle, as suggested by Schwartz’ value theory. The configuration in the right panel results
from forcing the psychological values to be arranged on a circle.
The results can be visualized using a biplot, where the row scores are represented as preference
vectors (see Figure 17).
Note that ordinal versions of VMU amount to fitting an ordinal PCA (Princals; Gifi 1990; De
Leeuw and Mair 2009a).
mds is used with ndim = 1. The smacof package provides a simple implementation where
all possible n! dissimilarity permutations are considered for scaling, and the one which leads
to a minimal stress value is returned. Obviously, this strategy is applicable only to a small
number of objects, say less than 10 objects8 .
In the following example we examine seven works by Plato, map them on a single dimension,
and explore to which degree the mapping reflects the chronological order by which they are
written. The input dissimilarities are derived according to the following strategy: Cox and
Brandwood (1959) extracted the last five syllables of each sentence; each syllable is classified
as long or short which gives 32 types; and based on this classification a percentage distribution
across the 32 scenarios for each of the seven works can be computed, subject to a Euclidean
distance computation.
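A sketch of this computation (data and function names assumed: a Plato7 object holding the 32-type percentage distributions per work, and the package's uniscale function; the transpose depends on the data orientation):

```r
library("smacof")
PlatoD  <- dist(t(Plato7))   # Euclidean distances between the seven works
fit_uni <- uniscale(PlatoD)  # checks all 7! dissimilarity permutations
round(fit_uni$conf, 2)       # prints the 1D configuration
```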
The last line prints the 1D “time” configuration of Plato’s works that leads to the lowest stress
value. Note that the exact chronological order of Plato’s works is unknown; scholars only know
that “Republic” was the first work, and “Laws” his last one. Copleston (1949, p. 140) suggests
the following temporal order of these selected seven works: Republic, Sophist, Politicus,
Philebus, Timaeus, Critias, Laws. Obviously, our unidimensional scaling model advocates a
different chronological order.
8 Approximate timings: 1s for n = 7; 8s for n = 8; 75s for n = 9; for n = 10 the running time is already
exceedingly long.
Figure 18: Bubble plot for gravity MDS solution on GOP data. The larger the bubbles, the
larger the SPP.
(diagonal blanked out). The gravity model defines the following dissimilarities (for i ̸= j):
$$\delta_{ij} = \sqrt{\frac{c_{i+}\, c_{+j}}{c_{ij}}}. \qquad (19)$$
To illustrate a gravity application on text data we use a DTM similar to the one presented
in Mair et al. (2014). This DTM was created on the basis of statements of 254 Republican
voters who had to complete the sentence “I am a Republican because...”. First, let us create
the gravity dissimilarities according to the strategy outlined above.
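A sketch of this step (GOPdtm denotes the document-term matrix from the MPsychoR package, as in Mair et al. 2014; the gravdiss slot name is an assumption):

```r
library("smacof")
library("MPsychoR")
data("GOPdtm")                       # DTM based on the 254 statements
gravD <- gravity(GOPdtm)$gravdiss    # gravity dissimilarities; NA where c_ij = 0
```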
Note that with text data, C is typically sparse (i.e., many elements cij = 0). For these
elements we cannot compute Equation 19 since we would divide by 0. The gravity function sets the
corresponding entries to NA. In the subsequent MDS call, these elements are automatically
blanked out by setting the corresponding weight wij to 0 in the basic stress equation.
Figure 18 shows the bubble plot which incorporates the stress-per-point (SPP) information.
The larger a bubble, the larger the contribution of a particular object (here, word) to the
total stress. Objects with large SPP values are responsible for misfit. The closer two words
are in the configuration, the more frequently they have been mentioned together in a single
statement.
An extension of this model is presented in Mair et al. (2014), who introduce the exponent λ
in order to emphasize larger dissimilarities. The reason for this is that in text data we often
end up with little variance in the input dissimilarities which leads to a concentric, circular
representation of the configuration. Equation 19 changes to
$$\delta_{ij} = \left(\frac{c_{i+}\, c_{+j}}{c_{ij}}\right)^{\lambda/2}. \qquad (20)$$
This extension is called the power gravity model. The parameter λ needs to be chosen ad hoc
and can take values in (−∞, ∞). For λ < 1 we shrink large dissimilarities, for λ = 1 we
end up with the ordinary gravity model, and for λ > 1 we stretch large dissimilarities. Note
that there is a trade-off between the choice of λ and the stress value: the more structure we
create, the higher the stress value. This extension is relevant for metric MDS strategies such
as ratio, interval, or spline MDS. The λ parameter can be specified in the gravity function.
A recent work by Rusch, Mair, and Hornik (2021) embeds the gravity formulation into a more
general loss function which, among other things, finds an optimal λ during the optimization
process.
• For each object pair i, j compute aij = xi − xj , which results in a vector of length p.
• Norm aij to unit length, resulting in bij = aij / √(aij⊤ aij ).
• Incorporate the skew-symmetric part: cij = nij bij with nij as the corresponding element
in N (drift vectors).
• For a given point i, average the elements in cij : di = n−1 Σj cij (average drift vectors).
• For plotting, compute the vector lengths of di (root mean square of its elements, scaled
by a factor of √n/mean(M)), and the direction angle (relative to the y-axis)
αi = arccos(di⊤ u / √(di⊤ di )) with u = (0, 1)⊤ .
Figure 19: MDS solution for asymmetric Morse code data including drift vectors.
To illustrate the drift vector model, we use the classical Morse code data by Rothkopf (1957).
Rothkopf asked 598 subjects to judge whether two signals, presented acoustically one after
another, were the same or not. The values are the average percentages for which the answer
“Same!” was given in each combination of row stimulus i and column stimulus j, where
either i or j was the first signal presented. The responses were aggregated to confusion
rates and subsequently subtracted from 1, such that the values represent dissimilarities. The
driftVectors function performs the decomposition from Equation 21, fits an MDS of choice on
M, and applies the drift vector computation steps outlined above. For the Morse code data,
the resulting drift configuration plot, based on a 2D ordinal MDS fit, is given in Figure 19.
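The call might look as follows (sketch; morse2 stands for the asymmetric Morse confusion data, and function/argument names are assumed from the smacof interface):

```r
library("smacof")
fit_drift <- driftVectors(morse2, type = "ordinal")  # decomposition + 2D MDS
plot(fit_drift, main = "Drift Vectors Morse Codes")
```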
We see that the vectors tend to point in the bottom left direction; they are certainly not
random. In the bottom left quadrant we mostly have longer signals suggesting that shorter
signals are more often confused with longer ones than vice versa. Note that the plot function
has a vecscale argument by which the user can modify the length of the drift vectors by a
scaling factor.
Other approaches to scale asymmetric data implemented in R are the following. Vera and
Rivera (2014) embed MDS into a structural equation modeling framework. Their approach
is implemented in the semds package (Vera and Mair 2019). Zielman and Heiser (1993)
7.4. Procrustes
Sometimes it is of interest to compare multiple MDS configurations based on, for instance,
different experimental conditions (the objects need to be the same within each condition).
The idea of Procrustes (Hurley and Cattell 1962) is to remove “meaningless” configuration
differences such as rotation, translation, and dilation (see Commandeur 1991, for an overview
of Procrustean models). Note that Procrustes transformations do not change the fit (stress
value) of an MDS.
In brief, Procrustes works as follows. Let X and Y be two MDS configuration matrices.
X is the target configuration, and Y the configuration subject to Procrustes transformation
leading to the transformed configuration matrix Ŷ. Further, let Z be a centering matrix
(Z = I − n−1 11⊤ ). Procrustes involves the following steps:
1. Compute C = X⊤ZY.
2. Compute the SVD of C, i.e., C = PΦQ⊤.
3. Compute the rotation matrix T = QP⊤.
4. Compute the dilation factor s = trace(CT)/trace(Y⊤ZY).
5. Compute the translation vector t = n−1(X − sYT)⊤1, and set Ŷ = sYT + 1t⊤.
The matrix Ŷ contains the Procrustes transformed configuration and replaces Y. The target
configuration X and Ŷ can be plotted jointly, allowing researchers to explore potential
differences between the configurations.
The dataset we use to illustrate Procrustes is taken from Vaziri-Pashkam and Xu (2019). In
their fMRI experiment on visual object representations they used both natural and artificial
shape categories to study the activation of various brain regions (each object represents a
particular brain region). We start with fitting two ordinal MDS solutions, one for each
condition
By plotting the two configurations, Figure 20 suggests that these configurations are different.
Let us apply a Procrustes transformation with the artificial condition as target configura-
tion X, and the natural condition solution as testee configuration Y, subject to Procrustes
transformation.
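The two fits and the Procrustes call can be sketched as follows (the dissimilarity objects artificialR and naturalR are placeholder names; Procrustes as provided by smacof):

```r
library("smacof")
fit_art <- mds(artificialR, type = "ordinal")  # artificial condition (target X)
fit_nat <- mds(naturalR, type = "ordinal")     # natural condition (testee Y)
fitproc <- Procrustes(fit_art$conf, fit_nat$conf)
```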
Figure 20: Left panel: MDS configuration natural condition. Right panel: MDS configura-
tion artificial condition.
Rotation matrix:
D1 D2
D1 -0.826 -0.564
D2 0.564 -0.826
Translation vector: 0 0
Dilation factor: 0.906
The print output shows the rotation matrix, the dilation factor, and the translation vector
(which is always 0 if two MDS configurations are involved, due to normalization constraints).
In addition, it reports Tucker’s congruence coefficient for judging the similarity of two con-
figurations. This coefficient is derived from factor analysis and can be computed as follows:
$$c(\mathbf{X}, \mathbf{Y}) = \frac{\sum_{i<j} d_{ij}(\mathbf{X})\, d_{ij}(\mathbf{Y})}{\sqrt{\sum_{i<j} d_{ij}^2(\mathbf{X})}\,\sqrt{\sum_{i<j} d_{ij}^2(\mathbf{Y})}} \qquad (22)$$
A few remarks regarding the congruence coefficient. First, it is generally recommended that
one uses the congruence coefficient to judge configurational similarity and not the correlation
coefficient, since correlating distances does not properly assess the similarity of configurations
(see Borg and Groenen 2005, p. 439–440 for details). Second, there is actually no Procrustes
transformation needed to compute the congruence coefficient, since c(X, Y) = c(X, Ŷ).
Third, when applying (22) within an MDS context, the resulting value of c(X, Y) is gen-
erally high. In factor analysis, values in the range of 0.85–0.94 are considered to be “fairly
Figure 21: Procrustes on two MDS configurations: the configuration from the artificial
condition acts as target; the one from the natural condition is Procrustes transformed.
similar”, and values higher than 0.95 suggest that the two factors are considered equal (see
Lorenzo-Seva and Ten Berge 2006, for details).
As an alternative we can consider Guttman’s alienation coefficient, which is simply
$$K(\mathbf{X}, \mathbf{Y}) = \sqrt{1 - c(\mathbf{X}, \mathbf{Y})^2}. \qquad (23)$$
This measure differentiates better between two solutions than the congruence coefficient.
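Both coefficients are easy to compute directly from two configuration matrices; a self-contained sketch (function names are illustrative, not part of the package):

```r
## Congruence coefficient (Equation 22) and alienation coefficient (Equation 23)
congCoef <- function(X, Y) {
  dX <- dist(X); dY <- dist(Y)          # pairwise distances, i < j
  sum(dX * dY) / sqrt(sum(dX^2) * sum(dY^2))
}
alienCoef <- function(X, Y) sqrt(1 - congCoef(X, Y)^2)
```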
Figure 21 shows the Procrustes transformed solution (i.e., plotting X and Ŷ jointly) and
suggests that the two configurations are actually very similar, apart from a few points. The
user can request the (sorted) distances between each pair of testee and target points via
fitproc$pairdist. V4, inferior IPS, IPS3, and V3B show the biggest differences across the
two conditions.
Another option to apply Procrustes is to use a theoretical configuration as target. For in-
stance, Borg and Leutner (1983) constructed rectangles on the basis of a grid design (as
contained in rect_constr) which we use as target configuration. Participants had to rate
similarity among rectangles within this grid. Based on these ratings a dissimilarity matrix
was constructed, here subject to a 2D ordinal MDS solution. Within the context of theoretical
solutions it is sometimes interesting to determine the stress value based on the dissimilarity
matrix and an initial configuration (with 0 iterations). The stress0 function does the job.
Figure 22: Procrustes with theoretical grid as target configuration (blue dots).
Call:
stress0(delta = rectangles, init = rect_constr)
Now we fit an ordinal MDS model, using the theoretical rectangle alignment as starting value.
The resulting MDS configuration is used as testee configuration Y, subject to Procrustes.
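A sketch of these two steps (the init argument of mds is used for the theoretical start; rectangles and rect_constr as in the text):

```r
library("smacof")
fit_rect <- mds(rectangles, type = "ordinal", init = rect_constr)
fitproc2 <- Procrustes(rect_constr, fit_rect$conf)  # theoretical grid as target
```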
Figure 22 plots the rectangle grid and the Procrustes transformed configuration jointly. We
see clear differences, especially at the right end of the rectangle grid. These differences are
also reflected in the considerably high alienation coefficient of 0.35.
Note that Procrustes is not limited to MDS applications. It can be applied to any configu-
ration matrices X and Y as long as the objects involved are the same and the matrices have
the same dimensions. Other forms of generalized Procrustes analysis are given in Borg and
Groenen (2005, Section 20.9).
8. Conclusion
In this follow-up paper to De Leeuw and Mair (2009b), who introduced the smacof package,
we presented numerous updates that have been implemented over the years. It is safe to
say that these developments establish smacof as the most comprehensive implementation of
MDS and unfolding techniques in R. Still, there are several tasks on our to-do list. First,
we plan to implement a fastMDS routine entirely written in C to speed up computations for
large data settings. Second, we will work on an implementation of inverse MDS (De Leeuw
and Groenen 1997). Third, we aim to extend spherical MDS and unfolding to more general
geometric shapes such as a pre-specified polygon mesh.
Acknowledgments
We would like to thank Ingwer Borg for his contributions to the smacof package and his
critical applied remarks which enhanced the usability of the package, and Frank Busing for
his contributions to the unfolding implementation and the sim2diss function. In addition,
we would like to thank three anonymous reviewers and the associate editor for their helpful
comments.
References
Borg I, Bardi A, Schwartz SH (2017). “Does the Value Circle Exist within Persons or Only
across Persons?” Journal of Personality, 85, 151–162. doi:10.1111/jopy.12228.
Borg I, Groenen PJF (2005). Modern Multidimensional Scaling: Theory and Applications.
2nd edition. Springer-Verlag, New York.
Borg I, Groenen PJF, Mair P (2018). Applied Multidimensional Scaling and Unfolding. 2nd
edition. Springer-Verlag, New York.
Borg I, Leutner D (1983). “Dimensional Models for the Perception of Rectangles.” Perception
and Psychophysics, 34, 257–269. doi:10.3758/BF03202954.
Borg I, Lingoes JC (1980). “A Model and Algorithm for Multidimensional Scaling with External Constraints on the Distances.” Psychometrika, 45, 25–38. doi:10.1007/BF02293597.
Bove G, Okada A (2018). “Methods for the Analysis of Asymmetric Pairwise Relationships.” Advances in Data Analysis and Classification, 12, 5–31. doi:10.1007/s11634-017-0307-9.
Coombs CH (1964). A Theory of Data. John Wiley & Sons, New York.
Copleston FC (1949). A History of Philosophy. Volume I: Greece and Rome. Doubleday, New
York.
Cox DR, Brandwood L (1959). “On a Discriminatory Problem Connected with the Work of Plato.” Journal of the Royal Statistical Society B, 21, 195–200. doi:10.1111/j.2517-6161.1959.tb00329.x.
De Leeuw J, Mair P (2009a). “Gifi Methods for Optimal Scaling in R: The package homals.”
Journal of Statistical Software, 31(4), 1–20. doi:10.18637/jss.v031.i04.
DeSarbo WS, Rao VR (1984). “GENFOLD2: A Set of Models and Algorithms for the
GENeral UnFOLDing Analysis of Preference/Dominance Data.” Journal of Classification,
1, 147–186. doi:10.1007/BF01890122.
Engen T, Levy N, Schlosberg H (1958). “The Dimensional Analysis of a New Series of Facial
Expressions.” Journal of Experimental Psychology, 55, 454–458. doi:10.1037/h0047240.
Fleiss JL, Levin BA, Paik MC (2003). Statistical Methods for Rates and Proportions. 3rd
edition. John Wiley & Sons, Hoboken.
Gabriel KR (1971). “The Biplot Graphical Display of Matrices with Application to Principal
Component Analysis.” Biometrika, 58, 453–457. doi:10.1093/biomet/58.3.453.
Gifi A (1990). Nonlinear Multivariate Analysis. John Wiley & Sons, Chichester.
Gower JC, Legendre P (1986). “Metric and Euclidean Properties of Dissimilarity Coefficients.”
Journal of Classification, 3, 5–48. doi:10.1007/BF01896809.
Gower JC, Lubbe S, Le Roux N (2011). Understanding Biplots. John Wiley & Sons, Chich-
ester.
Graffelman J (2020). calibrate: Calibration of Scatterplot and Biplot Axes. R package version
1.7.7, URL http://CRAN.R-project.org/package=calibrate.
Haynes KE, Fotheringham AS (1984). Gravity and Spatial Interaction Models. Sage, Beverly
Hills.
Heiser WJ, Busing FMTA (2004). "Multidimensional Scaling and Unfolding of Symmetric and Asymmetric Proximity Relations." In D Kaplan (ed.), The Sage Handbook of Quantitative Methodology for the Social Sciences, pp. 25–48. Sage Publications, Thousand Oaks.
Hurley JR, Cattell RB (1962). "The Procrustes Program: Producing Direct Rotation to Test a Hypothesized Factor Structure." Behavioral Science, 7, 258–262. doi:10.1002/bs.3830070216.
Mair P (2020). MPsychoR: Modern Psychometrics with R. R package version 0.10-8, URL https://CRAN.R-project.org/package=MPsychoR.
Mair P, Rusch T, Hornik K (2014). "The Grand Old Party: A Party of Values?" SpringerPlus, 3(697), 1–10. doi:10.1186/2193-1801-3-697.
McNally RJ, Robinaugh DJ, Wu GWY, Wang L, Deserno MK, Borsboom D (2015). "Mental Disorders as Causal Systems: A Network Approach to Posttraumatic Stress Disorder." Clinical Psychological Science, 3, 836–849. doi:10.1177/2167702614553230.
Meulman J, Heiser WJ (1983). "The Display of Bootstrap Solutions in MDS." Technical report, Bell Laboratories, Murray Hill.
Murdoch D, Chow ED (2020). ellipse: Functions for Drawing Ellipses and Ellipse-Like Confidence Regions. R package version 0.4.2, URL https://CRAN.R-project.org/package=ellipse.
Rabinowitz GB (1975). "Introduction to Nonmetric Multidimensional Scaling." American Journal of Political Science, 19, 343–390.
Ramsay JO (1977). "Maximum Likelihood Estimation in Multidimensional Scaling." Psychometrika, 42, 241–266. doi:10.1007/BF02294052.
Ramsay JO (1982). "Some Statistical Approaches to Multidimensional Scaling Data." Journal of the Royal Statistical Society A, 145, 285–303. doi:10.2307/2981865.
Ramsay JO (1988). "Monotone Regression Splines in Action." Statistical Science, 3, 425–461. doi:10.1214/ss/1177012761.
Ramsay JO (1997). Multiscale Manual (Extended Version). McGill University, Montreal.
Rodgers JL, Young FW (1981). "Successive Unfolding of Family Preferences." Applied Psychological Measurement, 5, 51–62. doi:10.1177/014662168100500108.
Rothkopf EZ (1957). "A Measure of Stimulus Similarity and Errors in Some Paired-Associate Learning." Journal of Experimental Psychology, 53, 94–101. doi:10.1037/h0041867.
Rusch T, Mair P, Hornik K (2021). "Cluster Optimized Proximity Scaling." Journal of Computational and Graphical Statistics, 30(4), 1156–1167. doi:10.1080/10618600.2020.1869027.
Sagarra M, Busing FMTA, Mar-Molinero C, Rialp J (2018). "Assessing the Asymmetric Effects on Branch Rivalry of Spanish Financial Sector Restructuring." Advances in Data Analysis and Classification, 12, 131–153. doi:10.1007/s11634-014-0186-2.
Schwartz SH (1992). "Universals in the Content and Structure of Values: Theoretical Advances and Empirical Tests in 20 Countries." In M Zanna (ed.), Advances in Experimental Social Psychology, pp. 1–65. Academic Press, New York.
Shepard RN (1957). "Stimulus and Response Generalization: A Stochastic Model Relating Generalization to Distance in Psychological Space." Psychometrika, 22, 325–345. doi:10.1007/BF02288967.
Spence I, Ogilvie JC (1973). "A Table of Expected Stress Values for Random Rankings in Nonmetric Multidimensional Scaling." Multivariate Behavioral Research, 8, 511–517. doi:10.1207/s15327906mbr0804_8.
Spence I, Young FW (1978). "Monte Carlo Studies in Nonmetric Scaling." Psychometrika, 43, 115–117. doi:10.1007/BF02294095.
Tamir DI, Thornton MA, Contreras JM, Mitchell JP (2016). "Neural Evidence that Three Dimensions Organize Mental State Representation: Rationality, Social Impact, and Valence." Proceedings of the National Academy of Sciences of the United States of America, 113, 194–199. doi:10.1073/pnas.1511905112.
Vera JF (2017). "Distance Stability Analysis in Multidimensional Scaling Using the Jackknife Method." British Journal of Mathematical and Statistical Psychology, 70, 25–41. doi:10.1111/bmsp.12079.
Vera JF, Mair P (2019). "semds: An R Package for Structural Equation Multidimensional Scaling." Structural Equation Modeling: A Multidisciplinary Journal, 26(5), 803–818. doi:10.1080/10705511.2018.1561292.
Vera JF, Rivera CD (2014). "A Structural Equation Multidimensional Scaling Model for One-Mode Asymmetric Dissimilarity Data." Structural Equation Modeling: A Multidisciplinary Journal, 21, 54–62. doi:10.1080/10705511.2014.856696.
Weinberg SL, Carroll JD, Cohen HS (1984). "Confidence Regions for INDSCAL Using the Jackknife and Bootstrap Techniques." Psychometrika, 49, 475–491. doi:10.1007/BF02302586.
Young FW (1975). "An Asymmetric Euclidean Model for Multi-Process Asymmetric Data." Paper presented at the US-Japan Seminar on MDS, San Diego.
Affiliation:
Patrick Mair
Department of Psychology
Harvard University
33 Kirkland Street
Cambridge, MA 02138, United States of America
E-mail: [email protected]
URL: http://scholar.harvard.edu/mair