AssertionError: Tree consistency failed in TSNE #8992

Closed
yanshuaicao opened this issue Jun 6, 2017 · 10 comments

@yanshuaicao

yanshuaicao commented Jun 6, 2017

Description

TSNE learning on digit dataset throws error for perplexity=8

Steps/Code to Reproduce

from sklearn import manifold, datasets
data = datasets.load_digits()
X = data['data']
n_components = 2
tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8, 
                         random_state=0, verbose=0)
Y = tsne.fit_transform(X)

Expected Results

No error.

Actual Results

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-553b1868b3c8> in <module>()
     11 tsne = manifold.TSNE(n_components=n_components, init='pca', perplexity=8, 
     12                          random_state=0, verbose=0)
---> 13 Y = tsne.fit_transform(X)

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    895             Embedding of the training data in low-dimensional space.
    896         """
--> 897         embedding = self._fit(X)
    898         self.embedding_ = embedding
    899         return self.embedding_

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    792                           X_embedded=X_embedded,
    793                           neighbors=neighbors_nn,
--> 794                           skip_num_points=skip_num_points)
    795 
    796     @property

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _tsne(self, P, degrees_of_freedom, n_samples, random_state, X_embedded, neighbors, skip_num_points)
    868         opt_args['it'] = it + 1
    869         params, kl_divergence, it = _gradient_descent(obj_func, params,
--> 870                                                       **opt_args)
    871 
    872         if self.verbose:

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _gradient_descent(objective, p0, it, n_iter, objective_error, n_iter_check, n_iter_without_progress, momentum, learning_rate, min_gain, min_grad_norm, min_error_diff, verbose, args, kwargs)
    386 
    387     for i in range(it, n_iter):
--> 388         new_error, grad = objective(p, *args, **kwargs)
    389         grad_norm = linalg.norm(grad)
    390 

/home/owner/py27_env/local/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _kl_divergence_bh(params, P, neighbors, degrees_of_freedom, n_samples, n_components, angle, skip_num_points, verbose)
    289     error = _barnes_hut_tsne.gradient(sP, X_embedded, neighbors,
    290                                       grad, angle, n_components, verbose,
--> 291                                       dof=degrees_of_freedom)
    292     c = 2.0 * (degrees_of_freedom + 1.0) / degrees_of_freedom
    293     grad = grad.ravel()

sklearn/manifold/_barnes_hut_tsne.pyx in sklearn.manifold._barnes_hut_tsne.gradient (sklearn/manifold/_barnes_hut_tsne.c:8155)()

AssertionError: Tree consistency failed: unexpected number of points=1796 at root node=1797

Versions

Linux-4.4.0-78-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Jul 1 2016, 15:12:24) \n[GCC 5.4.0 20160609]')
('NumPy', '1.12.1')
('SciPy', '0.18.1')
('Scikit-Learn', '0.19.dev0')

Note:

No error encountered for perplexity = 7 or 9, or any other values tried.

@amueller
Member

amueller commented Jun 6, 2017

Thanks for the report. That's... odd... to say the least.

@lesteve
Member

lesteve commented Jun 6, 2017

Hmmm I can reproduce on master. Note this works fine on 0.18.1.

@jnothman
Member

jnothman commented Jun 6, 2017 via email

@lesteve lesteve added the Sprint label Jun 6, 2017
@lesteve lesteve added this to the 0.19 milestone Jun 7, 2017
@lesteve lesteve added the Blocker label Jun 7, 2017
@Sentient07
Contributor

Sentient07 commented Jun 7, 2017

Hi, I am not sure what the "Blocker" label means, but can I work on this? If so, could someone please give me an idea of where to start looking?
Thanks,

@jnothman
Member

jnothman commented Jun 7, 2017 via email

@amueller
Member

@jnothman do you think this is realistic to tackle for 0.19?

@jnothman
Member

jnothman commented Jun 20, 2017 via email

@ogrisel
Member

ogrisel commented Jun 23, 2017

Actually I am pretty confident that #9032 fixes this issue.

@ogrisel
Member

ogrisel commented Jun 28, 2017

seeing as it's not been diagnosed fully

@tomMoral and I diagnosed that this was caused by the master implementation of the QuadTree data structure. Contiguous cells (or tiles) did not always have exactly matching boundaries because of floating-point rounding. For deep enough trees with small cells, you get a non-zero chance of inserting a point in between consecutive tiles in the QuadTree, which is consistent with the root node reporting 1796 points instead of 1797.
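To make the failure mode concrete, here is a minimal one-dimensional sketch in plain Python. It is not the scikit-learn QuadTree code: the function names, the half-open containment test, and the way the two child boundaries are derived (`lo + width` for the left cell, `hi - width` for the right cell) are all assumptions chosen only to illustrate how two independently rounded boundaries can leave a gap.

```python
import random

def split_independently(lo, hi):
    """Split [lo, hi) into two children, computing each boundary separately.

    Hypothetical scheme for illustration; not taken from scikit-learn.
    """
    width = (hi - lo) / 2
    left = (lo, lo + width)    # left cell's max is derived from lo
    right = (hi - width, hi)   # right cell's min is derived from hi
    return left, right

def contains(cell, p):
    """Half-open containment test: cell_min <= p < cell_max."""
    return cell[0] <= p < cell[1]

def find_gap(trials=10000, seed=0):
    """Search random parent cells for one whose children leave a gap."""
    rng = random.Random(seed)
    for _ in range(trials):
        lo, hi = sorted((rng.uniform(-50, 50), rng.uniform(-50, 50)))
        left, right = split_independently(lo, hi)
        if left[1] < right[0]:  # rounding made the boundaries disagree
            return lo, hi, left, right
    return None

found = find_gap()
if found is not None:
    lo, hi, left, right = found
    p = left[1]  # a point sitting exactly on the left cell's max boundary
    # Inside the parent cell, yet inside neither child.
    print(lo <= p < hi, contains(left, p), contains(right, p))
```

Whenever a gap is found, the point `p` lies inside the parent cell but in neither child, so a tree built this way would hold one point fewer than expected.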

The reimplementation of the QuadTree data structure in #9032 no longer has this issue: the max boundary of a cell is exactly the min boundary of the following cell.
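The shared-boundary idea can be sketched in the same one-dimensional setting. Again, this is only an illustration under assumptions: `split_shared`, `contains`, and `count_misplaced` are hypothetical names, and the actual code in #9032 may be organised quite differently. The point is that when the midpoint is computed once and reused as both the left cell's max and the right cell's min, every point of the parent cell falls in exactly one child, whatever the rounding.

```python
import random

def split_shared(lo, hi):
    """Split [lo, hi) using a single midpoint value for both children."""
    mid = lo + (hi - lo) / 2       # computed once
    return (lo, mid), (mid, hi)    # left max and right min are the same float

def contains(cell, p):
    """Half-open containment test: cell_min <= p < cell_max."""
    return cell[0] <= p < cell[1]

def count_misplaced(trials=10000, seed=0):
    """Count points of the parent cell that are not in exactly one child."""
    rng = random.Random(seed)
    misplaced = 0
    for _ in range(trials):
        lo, hi = sorted((rng.uniform(-50, 50), rng.uniform(-50, 50)))
        if lo == hi:
            continue  # degenerate cell, nothing to split
        left, right = split_shared(lo, hi)
        # Probe the parent's min, the shared boundary, and a random point.
        for p in (lo, left[1], rng.uniform(lo, hi)):
            in_parent = lo <= p < hi
            if in_parent and contains(left, p) + contains(right, p) != 1:
                misplaced += 1
    return misplaced

print(count_misplaced())  # prints 0
```

With a shared boundary, `p < mid` and `p >= mid` are complementary tests on the same float, so no point can fall between the two cells or be counted twice.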

@jnothman
Member

Should be fixed in #9032


6 participants