Multi-Dimensional Scaling
Feature Matrices
• Formalized as a set of observations,
each containing a set of variables
Monotonic Relationship
[Figure: scatter plot of $$ against Education for “Normal People”, showing a monotonic relationship]
What then…
• Network methods allow detection of similarity clusters in feature data
• Relationship between clusters can be discontinuous
[Figure: scatter plot of $$ against Education with separate clusters labeled Lotto Winners, “Normal People”, and Ph.D’s]
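Below is a sketch of one simple network method under stated assumptions. It links every pair of observations whose Euclidean distance is at most a chosen threshold. Each connected component of the resulting similarity network is then treated as a cluster. The function name `similarity_clusters`, the threshold, and the toy (Education, $$) rows are hypothetical. They are chosen only to mirror the figure.

```python
import numpy as np

def similarity_clusters(X, threshold):
    """Hypothetical helper: link rows whose Euclidean distance is <= threshold
    and return one cluster label per row (connected components of the network)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise distances between observations
    adjacency = D <= threshold                 # edges of the similarity network
    labels = -np.ones(n, dtype=int)            # -1 means "not yet visited"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                           # depth-first search through the network
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical (Education, $$) rows: three "normal people", two Ph.D's, one lotto winner
rows = [[12, 40], [13, 45], [14, 50], [21, 60], [22, 58], [12, 900]]
print(similarity_clusters(rows, threshold=10).tolist())  # [0, 0, 0, 1, 1, 2]
```

The clusters depend on both the threshold and the units of each column. In practice, columns are often rescaled first.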
What if we have more columns?
Multi-Dimensional Scaling
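• Hierarchical clustering (hi-clustering) is a discrete model: it assigns items to clusters but does not place them in a continuous space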
MDS
• The purpose of multidimensional scaling
(MDS) is to provide a spatial representation
of the pattern of similarities
Input to MDS
• Measure of pairwise similarity among nodes
• Attribute-based
• Euclidean distances
• Graph distances
• CONCOR similarities
• Output:
• A set of coordinates in 2D or 3D space such that
• Similar nodes are closer together than dissimilar nodes
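Here is a sketch of two of the input options above, assuming NumPy. The first is attribute-based Euclidean distances computed from a feature matrix, where rows are nodes and columns are attributes. The second is graph distances computed as shortest-path hop counts by breadth-first search over an adjacency matrix. The function names are illustrative and not taken from any particular package. CONCOR similarities are not shown.

```python
from collections import deque
import numpy as np

def euclidean_distances(X):
    """Attribute-based input: Euclidean distance between rows of a feature matrix."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def graph_distances(adjacency):
    """Graph-based input: shortest-path hop counts found by BFS from each node.
    Pairs with no connecting path stay at np.inf."""
    A = np.asarray(adjacency, dtype=bool)
    n = len(A)
    D = np.full((n, n), np.inf)
    for source in range(n):
        D[source, source] = 0.0
        queue = deque([source])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(A[i]):
                if np.isinf(D[source, j]):       # first visit is the shortest path
                    D[source, j] = D[source, i] + 1
                    queue.append(j)
    return D

features = [[0, 0], [3, 4], [6, 8]]
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(euclidean_distances(features)[0].tolist())  # [0.0, 5.0, 10.0]
print(graph_distances(path)[0].tolist())          # [0.0, 1.0, 2.0, 3.0]
```

MDS needs finite distances. A disconnected graph, whose distance matrix contains `inf`, would need separate treatment before being used as input.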
Algorithm
• MDS finds a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to a function of the input matrix, according to a fitness function called stress.
• Difficulties:
• High-dimensional spaces are difficult to represent
visually
• With increasing dimensions, you must estimate an increasing number of parameters to obtain ever smaller improvements in stress.
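The algorithm above does not name a specific optimizer. One possible sketch is the SMACOF procedure, which lowers raw stress by repeatedly applying the Guttman transform X ← B(X)X / n. This sketch assumes metric MDS, so the "function of the input matrix" is taken to be the identity, and it uses unit weights. The names `smacof`, `raw_stress`, and `pairwise_distances` and the default parameters are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(delta, X):
    """Sum of squared differences between input and map distances (i < j)."""
    d = pairwise_distances(X)
    iu = np.triu_indices(len(X), k=1)
    return float(((delta[iu] - d[iu]) ** 2).sum())

def smacof(delta, p=2, n_iter=300, tol=1e-9, seed=0):
    """Hypothetical helper: metric MDS by SMACOF with unit weights."""
    delta = np.asarray(delta, dtype=float)
    n = len(delta)
    X = np.random.default_rng(seed).normal(size=(n, p))  # random starting map
    stress = raw_stress(delta, X)
    for _ in range(n_iter):
        d = pairwise_distances(X)
        # Off-diagonal ratios delta_ij / d_ij (0 on the diagonal and where points coincide)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, delta / d, 0.0)
        B = -ratio
        np.fill_diagonal(B, ratio.sum(axis=1))   # rows of B(X) sum to zero
        X = B @ X / n                            # Guttman transform never increases stress
        new_stress = raw_stress(delta, X)
        converged = stress - new_stress < tol
        stress = new_stress
        if converged:
            break
    return X, stress

# Usage: fit a 2-D map to the distances among the corners of a unit square
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
coords, final_stress = smacof(pairwise_distances(square), p=2)
print(coords.shape, final_stress)
```

The map has n × p free coordinates. Raising `p` therefore adds parameters to estimate, which is the growth mentioned under Difficulties.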
Stress function
• The degree of correspondence between the distances among points on the MDS map and the input matrix
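The slides do not give a formula. One widely used form, Kruskal's stress-1, is shown here as an illustration. In it, d_ij is the distance between points i and j on the MDS map. The term \hat{d}_ij is the corresponding transformed input value: the input itself for metric MDS, or a monotone transformation of it for nonmetric MDS.

```latex
\text{Stress-1} = \sqrt{\frac{\sum_{i<j} \left( \hat{d}_{ij} - d_{ij} \right)^2}{\sum_{i<j} d_{ij}^2}}
```

Stress is 0 for a perfect fit and grows as the map distorts the input.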
Increasing dimensionality
• As the number of dimensions increases, stress decreases
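The following is a small numerical check of this claim. It assumes classical (Torgerson) MDS rather than stress minimization, and a stress-1 form normalized by map distances. For Euclidean input data with 5 attributes, stress should fall as the number of map dimensions p grows from 1 to 5 and be essentially zero at p = 5. The random data and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(delta, p):
    """Classical (Torgerson) MDS: embed a distance matrix in p dimensions."""
    n = len(delta)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (delta ** 2) @ J            # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:p]        # indices of the p largest
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def stress1(delta, X):
    """Stress-1 with the input distances as targets (metric case)."""
    d = pairwise_distances(X)
    iu = np.triu_indices(len(X), k=1)
    return float(np.sqrt(((delta[iu] - d[iu]) ** 2).sum() / (d[iu] ** 2).sum()))

rng = np.random.default_rng(1)
items = rng.normal(size=(30, 5))               # hypothetical: 30 items, 5 attributes
delta = pairwise_distances(items)
stresses = [stress1(delta, classical_mds(delta, p)) for p in range(1, 6)]
print([round(s, 4) for s in stresses])         # decreases toward 0 as p grows
```
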
Interpretation of MDS Map
• Axes are meaningless
• We are looking at cohesiveness and
proximity of clusters, not their locations
• Infinite number of equivalent maps: any rotation, reflection, or translation preserves all distances
• If stress > 0, there is distortion
• Larger distances are less distorted than smaller ones
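This quick check shows why the axes carry no meaning. Rotating, reflecting, or shifting a map leaves every pairwise distance unchanged, and therefore leaves the stress unchanged too. The coordinates, angle, and shift below are arbitrary placeholders, not a real MDS solution.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))                   # placeholder 2-D map coordinates

theta = 0.7                                    # arbitrary rotation angle (radians)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])                    # reflection across the first axis
shift = np.array([5.0, -3.0])                  # arbitrary translation

X_moved = X @ R.T @ F + shift                  # rotate, reflect, then translate
print(np.allclose(pairwise_distances(X), pairwise_distances(X_moved)))  # True
```
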
What to look for
• Clusters
• groups of items that are closer to each other than
to other items.
• When very tight, highly separated clusters occur in perceptual data, this may suggest that each cluster is a domain or subdomain that should be analyzed individually.
• Extract clusters and re-run MDS on them for
further separation
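Here is a sketch of extracting one cluster and re-running MDS on it. It assumes cluster labels are already available, for example from a clustering step, and uses classical MDS for brevity. Slicing the input matrix with `np.ix_` keeps only the within-cluster distances. As a result, the new map is no longer dominated by the large gap between clusters. The helper name `rerun_on_cluster` and the data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(delta, p=2):
    """Classical (Torgerson) MDS of a distance matrix into p dimensions."""
    n = len(delta)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (delta ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:p]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def rerun_on_cluster(delta, labels, cluster, p=2):
    """Hypothetical helper: MDS restricted to the members of one cluster."""
    members = np.flatnonzero(np.asarray(labels) == cluster)
    sub_delta = delta[np.ix_(members, members)]   # within-cluster distances only
    return members, classical_mds(sub_delta, p)

rng = np.random.default_rng(3)
points = np.vstack([rng.normal(0.0, 1.0, size=(6, 2)),     # tight cluster near the origin
                    rng.normal(100.0, 1.0, size=(5, 2))])  # distant second cluster
labels = [0] * 6 + [1] * 5                                  # assumed known cluster labels
delta = pairwise_distances(points)

members, sub_map = rerun_on_cluster(delta, labels, cluster=0)
print(members.tolist())   # [0, 1, 2, 3, 4, 5]
print(sub_map.shape)      # (6, 2)
```
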
What to look for…
• Dimensions
• Item attributes that seem to order the items in the
map along a continuum.
• For example, an MDS of perceived similarities among
breeds of dogs may show a distinct ordering of dogs by
size.
• At the same time, an independent ordering of dogs
according to viciousness might be observed.
• Orderings may not follow the axes or be orthogonal to
each other
• The underlying dimensions are thought to
"explain" the perceived similarity between items.
• Implicit similarity function is a weighted sum of
attributes
• May “discover” non-obvious continuums
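The sketch below illustrates an implicit similarity function built from weighted attributes. Here the squared dissimilarity between two items is a weighted sum of squared attribute differences (a weighted Euclidean distance). When only size carries weight, a one-dimensional classical MDS map orders the items by size, up to sign. The attribute scores, the weights, and the choice of weighted Euclidean distance are assumptions for illustration.

```python
import numpy as np

def weighted_dissimilarity(A, weights):
    """Hypothetical helper: d_ij**2 = sum_k w_k * (a_ik - a_jk)**2."""
    A = np.asarray(A, dtype=float)
    w = np.asarray(weights, dtype=float)
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=-1))

def classical_mds(delta, p):
    """Classical (Torgerson) MDS of a distance matrix into p dimensions."""
    n = len(delta)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (delta ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:p]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical scores for five items; columns are (size, viciousness)
A = np.array([[1, 5], [2, 1], [3, 4], [4, 2], [5, 3]], dtype=float)

# All weight on size: the 1-D map orders the items by size (sign is arbitrary)
size_only = classical_mds(weighted_dissimilarity(A, [1.0, 0.0]), p=1)[:, 0]
print(abs(np.corrcoef(size_only, A[:, 0])[0, 1]))   # ≈ 1.0

# Mixed weights: both attributes shape the dissimilarities, so a 2-D map is used
mixed = classical_mds(weighted_dissimilarity(A, [1.0, 0.5]), p=2)
```

In the mixed case, the size and viciousness orderings need not line up with the map's axes, which matches the point above about orderings.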
High-dimensionality MDS
• Difficult to interpret visually; a mathematical technique is needed