
Multi Dimensional Scaling

Multi-Dimensional Scaling (MDS) is a technique that provides a spatial representation of the similarities between observations by positioning similar observations closer together in a low-dimensional space. MDS finds a configuration of points in the space that best preserves the distances in the original high-dimensional data. The stress value indicates how well the distances between points in the low-dimensional map match the original distances, with a lower stress value indicating a better representation. MDS can be used to identify clusters of similar observations and underlying dimensions in the data.

Uploaded by Maksim Tsvetovat
Copyright: Attribution Non-Commercial (BY-NC)

MDS

Multi-Dimensional Scaling

Feature Matrices
• Formalized as a set of observations, each containing a set of variables

     Name   Age   Income   Education
1    Sean   19    23,000   H.S.
2    Joe    25    90,000   MBA
…    …      …     …        …
n    Bill   32    28,000   Ph.D.


Feature Matrices
• Standard fodder of regression analysis
• Assumption:
  • For all observations, dependencies between variables are linear, log-linear, or at least monotonic
• …Generally not true in the real world
• “Outliers” may be just as important as the main trend
Monotonic Relationship
[Figure: income ($$) vs. education, with points labeled “Normal People”]
What then…
• Network methods allow detection of similarity clusters in feature data
• Relationship between clusters can be discontinuous
[Figure: income ($$) vs. education, with separate clusters labeled “Lotto Winners”, “Normal People” and “Ph.D.s”]
What if we have more columns?
Multi-Dimensional Scaling
• Hi-clustering (hierarchical clustering) is a discrete model
  • Partitions nodes into exhaustive, non-overlapping subsets
• The world is not so black-and-white
MDS
• The purpose of multidimensional scaling
(MDS) is to provide a spatial representation
of the pattern of similarities

• More similar nodes will appear closer together

• Finds non-intuitive equivalences in networks

Input to MDS
• Measure of pairwise similarity among nodes
  • Attribute-based
  • Euclidean distances
  • Graph distances
  • CONCOR similarities
• Output:
  • A set of coordinates in 2D or 3D space such that similar nodes are closer together than dissimilar nodes
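Two of the input types above can be computed directly. The sketch below is illustrative, not the deck's own code. It z-scores each numeric column before taking Euclidean distances (an assumption: the slides do not say how attributes are scaled, and without it Income would swamp Age). It also computes graph distances as hop counts by breadth-first search on an unweighted graph. The function names and the path graph are hypothetical, the categorical Education column is left out, and CONCOR is not shown.

```python
from collections import deque

import numpy as np


def euclidean_from_attributes(features):
    """Pairwise Euclidean distances between rows of a numeric feature matrix.

    Columns are z-scored first (assumption) so no single unit dominates.
    """
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def graph_distances(adjacency):
    """All-pairs shortest-path (hop) distances via breadth-first search.

    `adjacency` maps each node 0..n-1 to a list of neighbours.
    Unreachable pairs stay at np.inf.
    """
    n = len(adjacency)
    D = np.full((n, n), np.inf)
    for source in range(n):
        D[source, source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if np.isinf(D[source, v]):  # first visit = shortest path
                    D[source, v] = D[source, u] + 1
                    queue.append(v)
    return D


# Age and Income for Sean, Joe and Bill from the feature-matrix slide.
people = [[19, 23_000], [25, 90_000], [32, 28_000]]
print(euclidean_from_attributes(people).round(2))

# A hypothetical 4-node path graph: 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(graph_distances(path))
```

Either matrix (or any other symmetric dissimilarity matrix) can then be handed to MDS as the input D.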

Algorithm
• MDS finds a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to a function of the input matrix, according to a fitness function called stress.

1. Assign points to arbitrary coordinates in p-dimensional space.
2. Compute Euclidean distances among all pairs of points, to form the D’ matrix.
3. Compare the D’ matrix with the input D matrix by evaluating the stress function. The smaller the value, the greater the correspondence between the two.
4. Adjust the coordinates of each point in the direction that most reduces stress.
5. Repeat steps 2 through 4 until stress won't get any lower.
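A minimal sketch of the five steps, assuming metric scaling (f(x_ij) = x_ij) and raw, unnormalized stress. For step 4 it takes a gradient step on stress with step size 1/(2n). For centered coordinates this step is exactly the Guttman transform used by the SMACOF algorithm, which guarantees that stress never increases. The function names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances among all rows of X (the D' matrix of step 2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(D, X):
    """Sum of squared differences between input and map distances (i < j)."""
    iu = np.triu_indices(len(D), k=1)
    return float(((D[iu] - pairwise_distances(X)[iu]) ** 2).sum())


def metric_mds(D, p=2, max_iter=500, tol=1e-10, seed=0):
    """Metric MDS by repeated gradient steps of size 1/(2n) on raw stress."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))        # step 1: arbitrary coordinates
    X -= X.mean(axis=0)                    # centre so the step equals SMACOF's
    stress = raw_stress(D, X)
    for _ in range(max_iter):
        D_map = pairwise_distances(X)      # step 2: current D'
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D_map > 0, D / D_map, 0.0)
        B = -ratio
        np.fill_diagonal(B, ratio.sum(axis=1))  # rows of B sum to zero
        X = B @ X / n                      # step 4: move down the stress gradient
        new_stress = raw_stress(D, X)      # steps 2-3 for the adjusted map
        converged = stress - new_stress < tol
        stress = new_stress
        if converged:                      # step 5: stop once stress stalls
            break
    return X, stress


square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = pairwise_distances(square)
X, s = metric_mds(D, p=2)
print(s)
```

Because the start is random, different seeds can settle in different local minima; running several seeds and keeping the lowest-stress map is a common precaution.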
Dimensionality
• Normally, MDS is used in 2D space for optimal visual impact
  • But a 2D map may be a very poor, highly distorted representation of your data
  • Symptom: a high stress value
  • Remedy: increase the number of dimensions
• Difficulties:
  • High-dimensional spaces are difficult to represent visually
  • With increasing dimensions, you must estimate an increasing number of parameters to obtain a decreasing improvement in stress
Stress function
• The degree of correspondence between the distances among points on the MDS map and the input matrix
• d_ij = Euclidean distance, across all dimensions, between points i and j on the map
• f(x_ij) is some function of the input data
• scale = a constant scaling factor, used to keep stress values between 0 and 1
• When the MDS map perfectly reproduces the input data, f(x_ij) = d_ij for all i and j, so stress is zero
• Thus, the smaller the stress, the better the representation
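The stress formula itself did not survive extraction. Written out from the definitions above, it has the form below. The slide only calls scale "a constant scaling factor"; taking it as the sum of squared map distances, which gives Kruskal's stress-1, is an assumption about which normalization is meant.

```latex
\text{stress} = \sqrt{\frac{\sum_{i<j} \bigl(f(x_{ij}) - d_{ij}\bigr)^{2}}{\text{scale}}},
\qquad
\text{scale} = \sum_{i<j} d_{ij}^{2} \quad \text{(Kruskal's stress-1)}
```

If f(x_ij) = d_ij for every pair, the numerator is zero, which is the zero-stress case described above.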
Stress Function, cont.
• The transformation f(x_ij) applied to the input values depends on whether metric or non-metric scaling is used
• Metric scaling:
  • f(x_ij) = x_ij
  • Raw input data is compared directly to the map distances
  • For similarity (rather than dissimilarity) input, the relationship is inverted: larger similarities should correspond to smaller map distances
• Non-metric scaling:
  • f(x_ij) is a weakly monotonic transformation of the input data that minimizes the stress function
  • Computed using a regression method (monotone regression)
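For non-metric scaling, the usual regression method is monotone (isotonic) regression. The sketch below, assuming dissimilarity input and the stress-1 normalization, computes the non-metric f(x_ij) (often called disparities) with the pool-adjacent-violators algorithm and compares metric and non-metric stress for the same map. The function names and the four data values are hypothetical.

```python
import numpy as np


def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return np.array(fitted)


def disparities(x, d):
    """Non-metric f(x_ij): non-decreasing in the input dissimilarities x,
    and as close as possible (least squares) to the map distances d."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    order = np.argsort(x, kind="stable")
    f = np.empty_like(d)
    f[order] = pava(d[order])
    return f


def stress1(f, d):
    """Stress with scale = sum of squared map distances (Kruskal's stress-1)."""
    f = np.asarray(f, dtype=float)
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(((f - d) ** 2).sum() / (d ** 2).sum()))


# Hypothetical upper-triangle values for four pairs of points.
x = np.array([1.0, 2.0, 3.0, 4.0])   # input dissimilarities
d = np.array([1.0, 3.0, 2.0, 4.0])   # distances on the current map
print(stress1(x, d))                  # metric:     f(x_ij) = x_ij
print(stress1(disparities(x, d), d))  # non-metric: monotone f(x_ij)
```

Here the non-metric stress (about 0.13) is half the metric stress (about 0.26): only the rank order of the input must be respected, so the out-of-order middle pair is pooled instead of penalized in full.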
Non-zero stress
• Caused by measurement error or insufficient dimensionality
• Stress levels:
  • < 0.15 = acceptable
  • < 0.1 = excellent
• Any MDS map with stress > 0 is distorted
Increasing dimensionality
• As the number of dimensions increases, stress decreases
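The slide's plot of this trend is not preserved. As a hedged illustration, the sketch below swaps in classical (Torgerson) MDS, an eigendecomposition-based method rather than the iterative algorithm described earlier. It is used here because, for Euclidean input, its p-dimensional map is a projection of the full solution: every map distance can only grow toward its input value as p increases, so stress cannot rise. The data are synthetic 3-D points and the function names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, p):
    """Classical MDS: top-p eigenvectors of the double-centred squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)              # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:p]               # p largest
    lengths = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * lengths


def kruskal_stress(D, X):
    """Stress-1 of map X against input distances D (pairs i < j)."""
    iu = np.triu_indices(len(D), k=1)
    d = pairwise_distances(X)[iu]
    return float(np.sqrt(((D[iu] - d) ** 2).sum() / (d ** 2).sum()))


rng = np.random.default_rng(1)
points = rng.standard_normal((10, 3)) * [3.0, 1.0, 0.3]  # synthetic 3-D data
D = pairwise_distances(points)
stresses = [kruskal_stress(D, classical_mds(D, p)) for p in (1, 2, 3)]
print([round(s, 4) for s in stresses])
```

With three true dimensions, the 3-D map reproduces the input and its stress is essentially zero; the drop from 2-D to 3-D is small here because the third axis has little spread, echoing the diminishing improvement noted under Dimensionality.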

Interpretation of MDS Map
• Axes are meaningless
  • We are looking at the cohesiveness and proximity of clusters, not their locations
  • There are infinitely many equivalent maps: any rotation, reflection, or translation gives the same distances
• If stress > 0, there is distortion
  • Larger distances are less distorted than smaller ones
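Stress depends only on the distances between points, and rotating, reflecting, or translating a map leaves every one of those distances unchanged; that is why the axes themselves carry no meaning. A small check on a hypothetical random 2-D map:

```python
import numpy as np


def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))  # a hypothetical 2-D MDS map

theta = np.deg2rad(37.0)
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
reflect = np.array([[-1.0, 0.0], [0.0, 1.0]])  # mirror across the vertical axis
shift = np.array([5.0, -2.0])

variants = {
    "rotated": X @ rotate.T,
    "reflected": X @ reflect.T,
    "translated": X + shift,
}
for name, Y in variants.items():
    same = np.allclose(pairwise_distances(X), pairwise_distances(Y))
    print(name, "preserves all distances:", same)
```

Each variant prints `True`, even though every coordinate has changed.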

What to look for
• Clusters
• groups of items that are closer to each other than
to other items.
• When really tight, highly separated clusters occur
in perceptual data, it may suggest that each
cluster is a domain or subdomain which should be
analyzed individually.
• Extract clusters and re-run MDS on them for
further separation
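One simple way to extract clusters from a map, assumed here rather than taken from the slides, is to cut a single-linkage hierarchy at a distance threshold: points joined by any chain of steps shorter than the threshold share a cluster. Each cluster's rows and columns are then sliced out of the original input matrix so MDS can be re-run on it alone. The function names and the threshold value are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def threshold_clusters(X, threshold):
    """Single-linkage clusters of map coordinates X cut at `threshold`."""
    D = pairwise_distances(X)
    n = len(X)
    labels = -np.ones(n, dtype=int)  # -1 means "not yet assigned"
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # flood-fill everything reachable by short steps
            i = stack.pop()
            for j in np.flatnonzero((D[i] < threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def sub_matrix(D_input, labels, cluster):
    """Rows/columns of the input matrix for one cluster, ready for a new MDS run."""
    idx = np.flatnonzero(labels == cluster)
    return D_input[np.ix_(idx, idx)], idx


# A hypothetical map with two tight, well-separated groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9]])
labels = threshold_clusters(X, threshold=1.0)
D_input = pairwise_distances(X)  # stand-in for the original input matrix
sub, idx = sub_matrix(D_input, labels, cluster=1)
print(labels, idx)
```

Slicing the input matrix, rather than the map distances, keeps the re-run from inheriting the distortion of the first map.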

What to look for…
• Dimensions
• Item attributes that seem to order the items in the
map along a continuum.
• For example, an MDS of perceived similarities among
breeds of dogs may show a distinct ordering of dogs by
size.
• At the same time, an independent ordering of dogs
according to viciousness might be observed.
• Orderings may not follow the axes or be orthogonal to
each other
• The underlying dimensions are thought to
"explain" the perceived similarity between items.
• Implicit similarity function is a weighted sum of
attributes
• May “discover” non-obvious continuums
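One way to test a suspected continuum, not described in the slides, is property fitting: regress an item attribute on the map coordinates. The fitted weight vector points in the direction along which the attribute increases, which need not line up with either axis, and R² indicates how well the map orders the items by that attribute. The function name, the map, and the "size" scores below are hypothetical.

```python
import numpy as np


def fit_property(X, attribute):
    """Least-squares fit attribute ~ X @ w + c over map coordinates X.

    Returns the unit direction of w and the R^2 of the fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(attribute, dtype=float)
    A = np.column_stack([X, np.ones(len(X))])  # intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = coef[:-1]
    residuals = y - A @ coef
    r2 = 1.0 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return w / np.linalg.norm(w), float(r2)


rng = np.random.default_rng(0)
coords = rng.standard_normal((8, 2))                    # hypothetical 2-D map
size = 2.0 * coords[:, 0] + 3.0 * coords[:, 1] + 1.0    # hypothetical scores
direction, r2 = fit_property(coords, size)
print(direction, r2)
```

Here the "size" continuum runs diagonally, along (2, 3)/√13, with R² = 1; a second attribute fitted the same way may point in a direction that is not orthogonal to the first.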
High-dimensionality MDS
• Difficult to interpret visually; needs a mathematical technique
• Feed MDS coordinates into another discriminator function
  • HiClus may work well
• May be easier to tease apart than the original attribute vectors
