Word embedding has been a frequent theme of this blog, but the original embedding methods were algorithms that perform a non-linear mapping of higher-dimensional data down to a lower-dimensional space. In this entry I will talk about one of the oldest and most widely used of them: Sammon Embedding, published in 1969. It is an embedding algorithm that tries to preserve the pairwise distances between all points. How is this achieved?
Assume there are $N$ high-dimensional data points described by $d$-dimensional vectors $X_i$, where $i = 1, 2, \ldots, N$. They will be mapped into vectors $Y_i$ of dimension 2 or 3. Denote the distances to be $d_{ij}^{*} = \| X_i - X_j \|$ in the original space and $d_{ij} = \| Y_i - Y_j \|$ in the embedded space. In this problem, the $Y_i$ are the variables to be learned. The cost function to minimize is

$$E = \frac{1}{c} \sum_{i<j} \frac{\left( d_{ij}^{*} - d_{ij} \right)^2}{d_{ij}^{*}},$$

where $c = \sum_{i<j} d_{ij}^{*}$. To minimize this, use Newton's method by

$$Y_{pq}(m+1) = Y_{pq}(m) - \alpha \, \Delta_{pq}(m),$$

where $\Delta_{pq}(m) = \dfrac{\partial E(m)}{\partial Y_{pq}(m)} \bigg/ \left| \dfrac{\partial^2 E(m)}{\partial Y_{pq}(m)^2} \right|$, and $\alpha$ is the learning rate.
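Before turning to the Theano implementation below, here is a minimal NumPy sketch of the stress $E$ itself, just to make the formula concrete for small data sets (the function name sammon_stress is mine, not part of the code that follows):

import numpy as np

def sammon_stress(X, Y):
    # X: N x d original vectors, Y: N x 2 (or N x 3) embedded vectors
    N = X.shape[0]
    s, c = 0.0, 0.0
    for i in range(N):
        for j in range(i+1, N):
            dstar = np.linalg.norm(X[i] - X[j])   # d*_ij in the original space
            d = np.linalg.norm(Y[i] - Y[j])       # d_ij in the embedded space
            c += dstar
            s += (dstar - d)**2 / dstar
    return s / c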
To implement it, I use the Theano package in Python to define the cost function (so that it can be compiled for efficient evaluation), and then implement the iterative learning with numpy. Define the cost function following the outline above:
import theano
import theano.tensor as T
import numerical_gradients as ng

# define variables
mf = T.dscalar('mf')   # magic factor / learning rate

# coordinate variables
Xmatrix = T.dmatrix('Xmatrix')
Ymatrix = T.dmatrix('Ymatrix')

# number of points and dimensions (inferred from the user-supplied matrices)
N, d = Xmatrix.shape
_, td = Ymatrix.shape

# grid indices running over all pairs of points
n_grid = T.mgrid[0:N, 0:N]
ni = n_grid[0].flatten()
nj = n_grid[1].flatten()

# normalizing constant c: sum of pairwise distances d*_ij in the original space
c_terms, _ = theano.scan(lambda i, j: T.switch(T.lt(i, j),
                                               T.sqrt(T.sum(T.sqr(Xmatrix[i]-Xmatrix[j]))),
                                               0),
                         sequences=[ni, nj])
c = T.sum(c_terms)

# weighted sum of squared differences between d*_ij and d_ij
s_term, _ = theano.scan(lambda i, j: T.switch(T.lt(i, j),
                                              T.sqr(T.sqrt(T.sum(T.sqr(Xmatrix[i]-Xmatrix[j])))-T.sqrt(T.sum(T.sqr(Ymatrix[i]-Ymatrix[j]))))/T.sqrt(T.sum(T.sqr(Xmatrix[i]-Xmatrix[j]))),
                                              0),
                         sequences=[ni, nj])
s = T.sum(s_term)

# Sammon stress E
E = s / c

# function compilation and optimization
Efcn = theano.function([Xmatrix, Ymatrix], E)
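Once compiled, Efcn can be sanity-checked on a tiny made-up example (the arrays below are arbitrary, only meant to show the call signature):

import numpy as np

X0 = np.random.rand(5, 3)   # five points in three dimensions
Y0 = np.random.rand(5, 2)   # a random two-dimensional initial embedding
print(Efcn(X0, Y0))         # a single non-negative scalar: the stress E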
And implement the update algorithm with the following function:
import numpy as np

# training
def sammon_embedding(Xmat, initYmat, alpha=0.3, tol=1e-8, maxsteps=500, return_updates=False):
    N, d = Xmat.shape
    NY, td = initYmat.shape
    if N != NY:
        raise ValueError('Number of vectors in Ymat ('+str(NY)+') is not the same as Xmat ('+str(N)+')!')

    # iteration
    Efcn_X = lambda Ymat: Efcn(Xmat, Ymat)
    step = 0
    oldYmat = initYmat
    oldE = Efcn_X(initYmat)
    update_info = {'Ymat': [initYmat], 'cost': [oldE]}
    converged = False
    while (not converged) and step <= maxsteps:
        # Newton-like update: first derivative divided by second derivative
        newYmat = oldYmat - alpha*ng.tensor_gradient(Efcn_X, oldYmat, tol=tol)/ng.tensor_divgrad(Efcn_X, oldYmat, tol=tol)
        newE = Efcn_X(newYmat)
        if np.all(np.abs(newE-oldE) < tol):
            converged = True
        oldYmat = newYmat
        oldE = newE
        step += 1
        if return_updates:
            print('Step ', step, '\tCost = ', oldE)
            update_info['Ymat'].append(oldYmat)
            update_info['cost'].append(oldE)

    # return results
    if return_updates:
        update_info['num_steps'] = step
        return oldYmat, update_info
    else:
        return oldYmat
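The numerical_gradients module imported as ng above is not shown in this post. A minimal sketch of the two functions it is assumed to provide, element-wise first and second derivatives estimated by central finite differences, could look like the following; the step sizes derived from tol and the exact signatures are my assumptions:

import numpy as np

def tensor_gradient(f, X, tol=1e-8):
    # element-wise first partial derivatives of the scalar function f at the
    # matrix X, estimated by central finite differences
    h = np.sqrt(tol)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += h
        Xm[idx] -= h
        grad[idx] = (f(Xp) - f(Xm)) / (2.*h)
    return grad

def tensor_divgrad(f, X, tol=1e-8):
    # element-wise second partial derivatives (no mixed terms) by the
    # three-point central difference; the absolute value is returned here
    # because Sammon's update rule divides by |d^2 E / dY_pq^2|
    h = np.sqrt(np.sqrt(tol))
    secondderiv = np.zeros_like(X)
    fX = f(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += h
        Xm[idx] -= h
        secondderiv[idx] = (f(Xp) - 2.*fX + f(Xm)) / (h*h)
    return np.abs(secondderiv)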
The above code is stored in the file sammon.py. We can test the algorithm with an example. Recall that a tetrahedron is a three-dimensional object with four vertices equidistant from one another. We sample points around these four vertices and expect the two-dimensional embedding to reflect this structure. With the code tetrahedron.py, we implemented it this way:
import argparse
import numpy as np
import matplotlib.pyplot as plt
import sammon as sn

argparser = argparse.ArgumentParser('Embedding points around tetrahedron.')
argparser.add_argument('--output_figurename',
                       default='embedded_tetrahedron.png',
                       help='file name of the output plot')
args = argparser.parse_args()

# the four vertices of a regular tetrahedron with unit edge length
tetrahedron_points = [np.array([0., 0., 0.]),
                      np.array([1., 0., 0.]),
                      np.array([np.cos(np.pi/3), np.sin(np.pi/3), 0.]),
                      np.array([0.5, 0.5/np.sqrt(3), np.sqrt(2./3.)])]

# sample 10 points from a tight Gaussian around each vertex (3-D data), and
# initialize the 2-D embedding around the projections of the vertices
sampled_points = np.concatenate([np.random.multivariate_normal(point, np.eye(3)*0.0001, 10)
                                 for point in tetrahedron_points])
init_points = np.concatenate([np.random.multivariate_normal(point[:2], np.eye(2)*0.0001, 10)
                              for point in tetrahedron_points])

embed_points = sn.sammon_embedding(sampled_points, init_points, tol=1e-4)
X, Y = embed_points.transpose()
plt.plot(X, Y, 'x')
plt.savefig(args.output_figurename)
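The script can be run from the command line; the output file name is optional and defaults to embedded_tetrahedron.png:

python tetrahedron.py --output_figurename embedded_tetrahedron.png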
It outputs a scatter plot of the embedded points, saved to the file given by --output_figurename.

There are other non-linear mapping algorithms of this kind, such as t-SNE (t-distributed stochastic neighbor embedding) and Kohonen's self-organizing map (SOM).


