Multidimensional Scaling Handout
Workshop at Pronunciation in Second Language Learning and Teaching (PSLLT) September 2018
Definition of MDS: a means of ordinating (i.e. creating categories and clines) and visualizing data by taking
potentially complex information and arranging it into a set of points in n-dimensional space.
1-dimensional space: a line, e.g. a number line.
2-dimensional space: a plane / a surface
3-dimensional space: a volume, e.g. a cube
4- or higher dimensional space: no physical analog; can still be analyzed mathematically, but
interpretation is more challenging, and in practice, a high-dimensional model is used very rarely
- Not a statistical test. Alone, it does not test hypotheses, but can be useful for argumentation
- Primarily for “distance” data
o Concrete, real-world distances: MDS is almost purely a means of visualizing data
o Abstract, psychological distances: MDS can also aid in interpreting data
Perceptual space is warped by language experience, so sometimes it is of key concern to find out how target
language sounds or utterances are perceived. MDS could also be used on a wide variety of data types:
- L2 learners’ perception of the relative salience of segmental, suprasegmental, or even indexical
distinctions
- L1 listeners’ perception of L2 accented speech as “closer to” or “farther away” from native speakers, or
from each other in terms of “groups of accents” (also, e.g. for L2 learners perceiving different dialects)
- The diagonal of the dissimilarity matrix will be 0 because the distance of a stimulus to itself is 0.
- Typically, the matrix should be “square symmetric” (i.e. cell [1,2] is equal to [2,1], and so on). Sometimes,
however, the perceived distance might depend on order or anchor effects. “A” compared to the standard of “B”
might genuinely be different from “B” compared to the standard of “A.” That is doable in MDS, but that analysis
is much more complex and typically not done in psychology.
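The two properties above (zero diagonal, square symmetry) can be checked in R before running MDS. This is a minimal sketch that assumes a small made-up dissimilarity matrix; the stimulus names and values are hypothetical and not taken from any data set in this handout.

```r
# Hypothetical dissimilarities among three stimuli (made-up values)
d <- matrix(c(0.00, 0.30, 0.80,
              0.30, 0.00, 0.55,
              0.80, 0.55, 0.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# The diagonal should be all zeros: each stimulus is at distance 0 from itself
diag_ok <- all(diag(d) == 0)

# The matrix should be square symmetric: cell [1,2] equals cell [2,1], and so on
sym_ok <- isSymmetric(d)

print(c(diagonal_zero = diag_ok, symmetric = sym_ok))
```

If the symmetry check fails, that may reflect order or anchor effects in the data rather than a data-entry error, so it is worth inspecting before averaging the two halves.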
Warm-up Example: Flight Distances in Japan
First, we have to decide how many dimensions are necessary for the data. If everything were lined up in
a row, then 1 dimension would be enough, but this is rarely the case.
There’s no way to fit all of the points on a single straight line in a way that reproduces the distances
from the dissimilarity matrix. If instead of a line, we allowed there to be a 2-dimensional space, then we could
create a triangle, and the relationship of distances could be recreated faithfully. E.g.:
[Figure: Hiroshima, Akita, Sendai, and Sapporo plotted as points in a 2-dimensional space]
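The same point can be illustrated with a toy case in R. The sketch below assumes three hypothetical points whose distances form a 3-4-5 triangle (not the Japanese flight distances) and uses base R's cmdscale, which is classical metric MDS, swapped in here only because it makes the comparison simple.

```r
# Three hypothetical points whose pairwise distances form a 3-4-5 triangle
d <- matrix(c(0, 3, 4,
              3, 0, 5,
              4, 5, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("P", "Q", "R"), c("P", "Q", "R")))

# Classical (metric) MDS in 1 and in 2 dimensions
fit1 <- cmdscale(d, k = 1)
fit2 <- cmdscale(d, k = 2)

# Largest absolute error between the input distances and the distances
# among the fitted points
err1 <- max(abs(as.matrix(dist(fit1)) - d))
err2 <- max(abs(as.matrix(dist(fit2)) - d))

print(c(one_dim = err1, two_dim = err2))
```

On a single line the three distances cannot all be reproduced, so the 1-dimensional error is clearly above zero, while the 2-dimensional solution recovers the triangle up to rounding error.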
SPSS Walkthrough:
Select all the variables (stimuli) you want to include in the analysis, and then the model you’ll use
Model options:
“Level of measurement”: Even if using ratio data, selecting ordinal
will not reduce the meaningfulness of the result. There are almost no
conceivable situations where L2 researchers would use anything other
than ordinal assumptions.
Output:
“Group plots”: Visually displays the output in SPSS. This is not always desirable since
SPSS is limited in visual displays, but it can help to get a general idea.
“Data matrix”: Displays distances in the scaled matrix or matrices you create.
However, with only an “ordinal” assumption, this will only give relative distances, so
the values for a particular data set will not often be meaningful, and instead it’s often more useful to
calculate distances between points.
“Model and options summary”: Prints the settings you selected in the SPSS Output file.
“S-stress convergence”: MDS creates a set of points in space. The distances between
those points should match up to the distances in the original input matrix, but that might
not be possible in the model-specified number of dimensions. The degree to which the
original input distances and the MDS output distances diverge is called “stress,” and in
general, stress is bad and should be minimized.
MDS algorithms work iteratively. They produce a first-pass estimate, calculate the divergence from the original matrix, make
an adjustment, recalculate the degree of divergence, adjust again, and so on. The level of s-stress convergence specifies how
big an improvement each iteration must make in order to continue. Once a new iteration makes less improvement than the
convergence criterion, the algorithm stops. In general, there are not many reasons to change it from 0.001.
“Minimum s-stress value”: Once stress falls below this value, the model stops. There are very few reasons to alter this.
“Maximum iterations”: If there is still not a good fit after this many iterations, the model will just give up. If you try 30 and the
model is still iterating, then it is perfectly sound to increase this number, but in practice, this hasn’t been needed.
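For readers working in R rather than SPSS, the isoMDS function in MASS exposes roughly comparable controls. The sketch below is an assumption-laden analogy, not a reproduction of the SPSS settings: maxit caps the number of iterations and tol is a convergence tolerance, but they are not guaranteed to be defined the same way as the SPSS criteria. The built-in eurodist data set (road distances between European cities) stands in for the flight distances.

```r
library(MASS)  # provides isoMDS

# eurodist is a built-in R data set of road distances between European cities;
# it stands in here for the Japanese flight distances, which are not reproduced
fit <- isoMDS(eurodist, k = 2,
              maxit = 30,    # upper limit on the number of iterations
              tol = 0.001,   # convergence tolerance
              trace = TRUE)  # print the stress at intervals while iterating

fit$stress  # final stress, which isoMDS reports in percent
```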
SPSS shows each iteration, but for L2 researchers, these intermediate values are
not useful. Only the final value matters. Here, “Young’s S-Stress” for
the matrix is 0.01697, and “matrix stress” is 0.02239, with an R²
value for the matrix of 0.99843. In general, stress values of
less than 0.1 or R² of higher than 0.9 are considered “good
fit” (Clopper, 2008), but these are very rarely achieved in
psychology data. As you can see, even in only 1 dimension,
stress is very low. Putting all the cities in Japan on a simple
number line doesn’t result in huge discrepancies (assuming
ordinal data).
2-dimensional output:
The “coordinates” are the points in the combined n-dimensional
space where the stimuli have been placed.
You can plot these points in whatever software you like. SPSS also
has a plotting tool, but the scales are always off and it’s hard to modify. We get
this:
It doesn’t look much like Japan. Typically, MDS puts the dimension that explains the largest portion of the variance on
the x-axis. In our case, Japan is long and skinny, so North-South is much more important than East-West, and it looks
tilted on its side. We can take the coordinates and multiply that matrix by a rotation matrix in order to spin the points
around. This rotation is only done to make things more easily interpretable.
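The rotation step can be sketched in a few lines of R. The coordinates and the 90-degree angle below are hypothetical choices made for illustration; in practice you would use your own MDS coordinates and whatever angle makes the plot easiest to interpret.

```r
# Hypothetical 2-dimensional MDS coordinates for three stimuli (made-up values)
coords <- matrix(c( 1.0,  0.2,
                   -0.5,  0.4,
                   -0.5, -0.6), ncol = 2, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), c("dim1", "dim2")))

# Rotation matrix for a counterclockwise rotation by theta radians
theta <- 90 * pi / 180
rot <- matrix(c(cos(theta), -sin(theta),
                sin(theta),  cos(theta)), nrow = 2, byrow = TRUE)

# Each row of coords is a point; post-multiplying by t(rot) rotates every point
rotated <- coords %*% t(rot)

# Rotation leaves all inter-point distances unchanged
all.equal(as.vector(dist(coords)), as.vector(dist(rotated)))
```

Because the distances between points are unchanged, the rotated configuration fits the data exactly as well as the original one.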
Stress Plots
A stress plot is one way to see how much improvement in model fit you gain by increasing dimensionality. Unlike a scree plot in
Factor Analysis, you will be looking at the point at the bottom of the elbow. After that point, increasing dimensionality
does not drastically improve fit. Alternatively, in SPSS, it’s easy to obtain R² values, and you can choose the number of
dimensions beyond which R² does not dramatically increase. Importantly, though, there is no cut-and-dried, mathematical
cutoff for dimensionality, so researchers typically have to make an argument from interpretability of the data (Atagi &
Bent, 2013; Clopper, 2008).
(From real data on American English listeners’ grouping rates on a Free Classification task for German vowels (Daidone, Kruger, & Lidster, 2015))
This particular example is most likely best analyzed as 3-dimensional, so long as the 3d output is interpretable.
Neither SPSS nor R will make a stress plot for you automatically. You have to actually go in and run the model with k =
1, k = 2, k = 3, and so on, and record the stress values and then plot them in order to get this.
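In R, running the model at each dimensionality and collecting the stress values can be done in a short loop. This is a sketch under assumptions: it uses the built-in eurodist data set as stand-in data and tries 1 to 5 dimensions, both of which you would replace with your own matrix and range.

```r
library(MASS)  # provides isoMDS

# eurodist (built-in road distances between European cities) is used purely
# as stand-in data; substitute your own dissimilarity matrix
dims <- 1:5
stress <- sapply(dims, function(k) isoMDS(eurodist, k = k, trace = FALSE)$stress)

# Plot stress against number of dimensions and look for the elbow
plot(dims, stress, type = "b",
     xlab = "Number of dimensions", ylab = "Stress (percent)")
```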
Applied Example in R: Perception of German Vowels
Possible Data Sources:
Perceptual assimilation
- Overlap scores (Levy, 2009) yield pairwise similarities.
- 1 - overlap scores = dissimilarities.
Free classification
- How often was stimulus A grouped with stimulus B (%)?
- Grouped together more often = more similar
- 93% of the time in a group = distance of 0.07
- 15% of the time in a group = distance of 0.85
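The conversion from grouping rates to distances is a single subtraction in R. In the sketch below, the 0.93 and 0.15 rates come from the example above, while the third rate (0.40) and the stimulus names are made up to complete the matrix.

```r
# Proportion of trials on which each pair of stimuli was grouped together;
# 0.93 and 0.15 follow the example above, 0.40 is a made-up value
grouped <- matrix(c(1.00, 0.93, 0.15,
                    0.93, 1.00, 0.40,
                    0.15, 0.40, 1.00), nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Dissimilarity = 1 - proportion grouped together
dissim <- 1 - grouped
print(dissim)
```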
For SPSS:
1. Analyze -> Scale -> Multidimensional Scaling
2. In Model, initially choose 1-5 dimensions for min to max in order to evaluate stress. Record “matrix stress”
values and plot them against number of dimensions. See what makes sense and seems to fit well enough to
examine in more detail
3. Take the “Coordinates” from that output, plot them in any program and see if there is a clear pattern
4. Run correlations with the rotated points and acoustic, phonological, or indexical features to confirm whether
distances between stimuli correspond to particular features of interest. (If you test significance of the correlations,
make sure to correct for multiple comparisons.)
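Step 4 can also be carried out in R. The sketch below uses randomly generated placeholder numbers in place of real coordinates and acoustic measurements, and Holm's method is just one of several corrections available through p.adjust; the feature names (F1, F2, duration) are hypothetical.

```r
set.seed(1)  # placeholder data only; replace with your own values

# Hypothetical rotated MDS coordinates for 10 stimuli in 2 dimensions
coords <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("dim1", "dim2")))

# Hypothetical acoustic measurements for the same 10 stimuli
features <- data.frame(F1 = rnorm(10), F2 = rnorm(10), duration = rnorm(10))

# Correlate every dimension with every feature and collect the p-values
results <- expand.grid(dimension = colnames(coords), feature = names(features),
                       stringsAsFactors = FALSE)
results$r <- NA_real_
results$p <- NA_real_
for (i in seq_len(nrow(results))) {
  test <- cor.test(coords[, results$dimension[i]], features[[results$feature[i]]])
  results$r[i] <- unname(test$estimate)
  results$p[i] <- test$p.value
}

# Correct for multiple comparisons (Holm's method is one option among several)
results$p_adjusted <- p.adjust(results$p, method = "holm")
print(results)
```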
Example with Non-Native Language Perception Data (from Daidone, Kruger, & Lidster, 2015) in R:
We’ll use the isoMDS function in the library called “MASS.” The entirety of MDS can be run using 5 lines of code.
1. Set your working directory to where the dissimilarity matrix is stored as a table
a. setwd("C:/Users/rflidste/Desktop") (don’t forget the quotation marks)
2. Read in your table while giving it a variable name, and then convert it to a matrix
a. [initialtable] <- read.table("[Name of file]") (don’t forget the quotation marks)
b. [initialmatrix] <- as.matrix([initialtable])
3. Load MASS, then run isoMDS on your matrix and give the result a variable name so you can open it up
a. library(MASS)
b. [transparent.output.name] <- isoMDS([initialmatrix], k = [number of dimensions, default is 2])
That’s it. You can then view the points either by clicking on the output object directly or by appending $points to its name to get the
coordinates (i.e. locations of each stimulus).
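Put together, the workflow might look like the sketch below. To keep it self-contained, it first writes a small made-up dissimilarity table to a temporary file; the values, the stimulus labels, and the variable name mds.output are all hypothetical, and in practice you would point read.table at your own file instead.

```r
library(MASS)  # provides isoMDS

# Write a small made-up dissimilarity table to a temporary file so the example
# is self-contained; in practice you would already have this file on disk
labels <- c("A", "B", "C", "D")
toy <- matrix(c(0.00, 0.20, 0.70, 0.90,
                0.20, 0.00, 0.60, 0.80,
                0.70, 0.60, 0.00, 0.30,
                0.90, 0.80, 0.30, 0.00),
              nrow = 4, byrow = TRUE, dimnames = list(labels, labels))
path <- tempfile(fileext = ".txt")
write.table(toy, path)

initialtable  <- read.table(path)              # read the table in
initialmatrix <- as.matrix(initialtable)       # convert the data frame to a matrix
mds.output    <- isoMDS(initialmatrix, k = 2)  # run non-metric MDS in 2 dimensions
mds.output$points                              # coordinates of each stimulus
mds.output$stress                              # final stress, in percent
```

Note that isoMDS stops with an error if any two different stimuli have a distance of zero or less, so pairs with identical responses may need attention before this step.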
Note that the name for the stress is the “final value” that gets displayed in the console output, but keep in mind that SPSS
and R give stress in different units. What SPSS will report as a “matrix stress of 0.125,” R will report as a “final value of
12.5.” They mean the same thing. Less than 0.1 / 10 is great, but usually unobtainable. More important is to find where in
the “stress plot” the elbow occurs.
1. Set your working directory
3. Convert it to a matrix with a name so that the isoMDS function can operate on it
4. Call the MASS library so that you can use MDS functions
library(MASS)
5. Use the isoMDS function, but specify dimensionality to be 1, 2, 3, and so on, and record the stress
(“final”) values
7. Decide on dimensionality, and then obtain coordinates for the stimuli in that dimensional space using $points
> 3d.output$points
[,1] [,2] [,3]
fy -0.379037146 -0.597544517 0.06256909
my -0.417584600 -0.552179345 0.12950149
fY -0.405999112 -0.165208206 -0.31670512
mY -0.410951826 -0.217585926 -0.37365152
fi 0.801594222 -0.249652696 0.45407903
mi 0.795764086 -0.255615338 0.44386083
fI 0.786813164 -0.233236045 -0.26426827
…
…
8. Using your favorite plotting program, examine and rotate the points for interpretability
9. Run correlations with acoustic measurements to confirm whether distances correspond to particular features of the
stimuli themselves
- R² is not easy to obtain using R, but there are ways to calculate it “by hand” if reviewers need it. Alternatively,
there are other functions for obtaining MDS data, including “cmdscale,” which can provide goodness-of-fit values as part of the list of
objects it creates. Cmdscale, though, expects a “dist” object or a full symmetric matrix as input, and preparing that can be
slightly more involved for the R-uninitiated.
- Even though the actual numbers from R and SPSS may differ, it is only the relative positions and distances in
multidimensional space that are interpretable. They can be scaled from one to the other.
- Because the exact amount things are rotated is effectively arbitrary, it’s important not to assume that there is one
“correct” rotation amount, and to be appropriately modest in interpretations.
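One possible way to compute an R² “by hand” is sketched below. It is an assumption that this definition (the squared correlation between the fitted distances and their monotone-regression values) is what a given reviewer wants, and it may not match the value SPSS reports exactly. The built-in eurodist data set is used as stand-in data.

```r
library(MASS)  # provides isoMDS and Shepard

# eurodist (built-in road distances between European cities) is stand-in data
fit <- isoMDS(eurodist, k = 2, trace = FALSE)

# Shepard() returns the input dissimilarities (x), the distances among the
# fitted points (y), and the monotone-regression fitted values (yf)
sh <- Shepard(eurodist, fit$points)

# One possible R-squared: squared correlation between the fitted distances
# and their monotone-regression values
r2 <- cor(sh$y, sh$yf)^2
r2

# Classical MDS via cmdscale reports its own goodness-of-fit values instead
cmdscale(eurodist, k = 2, eig = TRUE)$GOF
```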
Expansions
- INDSCAL (individual differences scaling) is a way of looking at individual variation in the perceptual space. When you are able to create a full
dissimilarity matrix per participant, you will get a group plot of distances, and then a set of additional values for
each individual:
o Eigenvectors for each dimension (expand or shrink dimensions in order to convert the group plot to
something closer to that individual’s pattern of distances)
o Fit for that individual
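The idea of expanding or shrinking dimensions for one individual can be sketched as follows. This assumes the weighted Euclidean model, in which each group dimension is scaled by the square root of that individual's weight; the group coordinates and the weights are made up for illustration.

```r
# Hypothetical group coordinates for three stimuli in 2 dimensions
group <- matrix(c( 1.0,  0.5,
                  -1.0,  0.5,
                   0.0, -1.0), ncol = 2, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("dim1", "dim2")))

# Hypothetical weights for one listener: dimension 1 matters more than dimension 2
weights <- c(dim1 = 1.44, dim2 = 0.25)

# Under the weighted Euclidean model, each dimension is stretched or shrunk
# by the square root of that listener's weight
individual <- sweep(group, 2, sqrt(weights), `*`)
print(individual)
```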
References
Atagi, E., & Bent, T. (2013). Auditory free classification of nonnative speech. Journal of Phonetics, 41, 509–519. doi:
10.1016/j.wocn.2013.09.003
Clopper, C. G. (2008). Auditory free classification: Methods and analysis. Behavior Research Methods, 40, 575–581. doi:
10.3758/BRM.40.2.575
Daidone, D., Kruger, F., & Lidster, R. (2015). Perceptual assimilation and free classification of German vowels by
American English listeners. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International
Congress of Phonetic Sciences. Glasgow, UK: Glasgow University.
Perceptual Assimilation Data:
Need a way to convert the response data into a dissimilarity matrix. There are two, somewhat competing methods for this:
Faris, M. M., Best, C. T., & Tyler, M. D. (2018). Discrimination of uncategorised non-native vowel contrasts is
modulated by perceived overlap with native phonological categories. Journal of Phonetics, 70, 1-19.
doi:10.1016/j.wocn.2018.05.003
Levy, E. S. (2009). On the assimilation-discrimination relationship in American English adults’ French vowel learning.
Journal of the Acoustical Society of America, 126, 2670–2682. doi: 10.1121/1.3224715
For Levy’s model, the overlap between L2 categories a and b is defined as:
$$\mathrm{overlap}(a, b) = \sum_{k=1}^{K} \min(a_k, b_k)$$
where “min” means the minimum of two values, $a_k$ is the percent of a stimuli that were categorized as L1 category
k, $b_k$ is the percent of b stimuli that were categorized as L1 category k, and this is summed across all response options K.
Overlap scores seek to find how much the overall pattern of classification “overlaps” between any two stimuli.
Example: Percent categorizations of German /i/ and /ɪ/ by American English (AE) listeners into AE categories:
AE Category: i ɪ eɪ ɛ æ ʌ ɝ ɑ ɔ oʊ ʊ u
In total across all response categories, German /i/ and /ɪ/ were categorized the same way 12.2 percent of the time.
You’d then repeat this calculation for every pair of vowels. Note, though, that these are similarities, not “dissimilarities,”
so you would need to take 1 minus the overlap scores in order to obtain the dissimilarity matrix. The result would be a
square symmetric table with German vowels against German vowels.
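The overlap calculation is short enough to write as an R function. The percentages below are made-up values for two hypothetical L2 vowels and three hypothetical L1 categories; they are not the German data described above.

```r
# Overlap between two stimuli: sum over response categories of the smaller
# of the two categorization percentages
overlap <- function(a, b) sum(pmin(a, b))

# Hypothetical percent categorizations of two L2 vowels into three L1 categories
# (made-up values, not the German data described above)
a <- c(cat1 = 80, cat2 = 20, cat3 = 0)
b <- c(cat1 = 10, cat2 = 30, cat3 = 60)

ov <- overlap(a, b)            # 10 + 20 + 0 = 30 percent overlap
dissimilarity <- 1 - ov / 100  # convert percent similarity to a 0-1 dissimilarity
print(c(overlap = ov, dissimilarity = dissimilarity))
```

Applying the function to every pair of vowels and filling a matrix with the resulting dissimilarities gives the square symmetric table that MDS takes as input.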