StatQuest Statistics
[Figure: scatter plots with axes PCA 1 / PCA 2 and t-SNE 1 / t-SNE 2, etc.]
Using a normal distribution means that distant points have very low similarity values… and close points have high similarity values.
Ultimately, we measure the distances between all of the points and the point of interest…
No big deal!
Hooray!!! We’re done calculating similarity scores for the scatter plot!
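To make that concrete, here is a minimal Python sketch of the similarity calculation, using made-up 2-D points and a hand-picked sigma (real t-SNE tunes sigma for every point through its “perplexity” setting):

```python
import numpy as np
from scipy.stats import norm

# Made-up scatter-plot points; the first row is the point of interest.
points = np.array([[0.0, 0.0],
                   [0.5, 0.2],
                   [0.6, 0.1],
                   [5.0, 5.0]])
poi = points[0]

# Measure the distances between all of the points and the point of interest.
dists = np.linalg.norm(points[1:] - poi, axis=1)

# Similarity = height of the normal curve at each distance.
# sigma is hand-picked here; real t-SNE tunes it per point via "perplexity".
sigma = 1.0
similarities = norm.pdf(dists, loc=0.0, scale=sigma)

# Scale the scores so they sum to 1.
similarities /= similarities.sum()
print(similarities)  # the two close points score high, the far point near 0
```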
Now we randomly project the data onto the number line… and calculate similarity scores for the points on the number line.
Just like before, that means picking a point…
…measuring a distance… and lastly, drawing a line from the point to a curve. However, this time we’re using a “t-distribution”.
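Here is the same sketch for this step, with random 1-D positions standing in for the random projection, and df=1 for the t-distribution (one degree of freedom, i.e. the heavy-tailed Cauchy curve used in the t-SNE paper):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)

# Randomly project four points onto the number line: random 1-D positions.
line_positions = rng.normal(size=4)
poi = line_positions[0]

# Distances along the number line from the point of interest.
dists = np.abs(line_positions[1:] - poi)

# This time the similarity is the height of a t-distribution curve
# (df=1, i.e. a Cauchy curve, as in the t-SNE paper).
similarities = t_dist.pdf(dists, df=1)
similarities /= similarities.sum()  # scale to sum to 1, as before
print(line_positions, similarities)
```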
t-SNE uses small steps because it’s a little bit like a chess game and can’t be solved all at once. Instead, it goes one move at a time.
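For the curious, here is a toy sketch of what one such “move” could look like for points on the number line, using the KL-divergence gradient from the t-SNE paper; it assumes a symmetric high-dimensional similarity matrix P that sums to 1, and it skips refinements like momentum and early exaggeration:

```python
import numpy as np

def tsne_step(y, P, learning_rate=0.1):
    """One 'move': nudge each 1-D point downhill on the KL-divergence cost.

    y : (n,) current positions on the number line
    P : (n, n) symmetric high-dimensional similarity matrix summing to 1
    """
    diff = y[:, None] - y[None, :]        # pairwise differences y_i - y_j
    inv = 1.0 / (1.0 + diff ** 2)         # t-distribution kernel (df=1)
    np.fill_diagonal(inv, 0.0)            # a point has no similarity to itself
    Q = inv / inv.sum()                   # number-line similarity scores
    # Gradient of KL(P || Q) from the t-SNE paper.
    grad = 4.0 * ((P - Q) * inv * diff).sum(axis=1)
    return y - learning_rate * grad

# One move at a time: call it in a loop,
# e.g. for _ in range(1000): y = tsne_step(y, P)
```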
Now to finally tell you why the “t-distribution” is used…
…originally, the “SNE” algorithm just used a normal distribution throughout, and the clusters clumped up in the middle and were harder to see.
The t-distribution forces some space between the points.
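A quick numeric check of that claim, comparing curve heights at a few distances (df=1 matches the paper’s choice):

```python
from scipy.stats import norm, t

# Compare the normal curve's height with the t curve's at a few distances.
for d in [0.0, 1.0, 3.0]:
    print(d, norm.pdf(d), t.pdf(d, df=1))
# At d=3 the normal curve is ~0.004 while the t curve is ~0.032: to
# reproduce a given moderate similarity, points must sit farther apart
# under the t curve, which spreads the clusters out instead of clumping them.
```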
You should look up “the curse of dimensionality” to understand the need for this additional ‘space’ provided by the t-distribution. It is really useful to know in general:
https://en.wikipedia.org/wiki/Curse_of_dimensionality
To understand how the two matrices are compared, read about Kullback-Leibler divergence :)
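A minimal sketch of that comparison, treating the two similarity matrices as probability distributions (the eps clip is just a guard against log(0)); this divergence is the quantity those small steps try to shrink:

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """Kullback-Leibler divergence sum(P * log(P / Q)) between two
    similarity matrices treated as probability distributions."""
    P = np.clip(P, eps, None)
    Q = np.clip(Q, eps, None)
    return np.sum(P * np.log(P / Q))
```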
You can also read about how the randomly initialized arrangement of the points is optimized by looking into the original paper on t-SNE:
https://lvdmaaten.github.io/tsne/
Now some actual t-SNE examples
MNIST digits dataset

[Figure: each handwritten digit image shown as a matrix of pixel intensities; background pixels are 0 and ink pixels take values such as 120, 170, 190, 200, 210. Every 28x28 image flattens into a 784-dimensional vector, so each digit becomes a single high-dimensional data point.]
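Here is a sketch of such an example with scikit-learn, using its built-in 8x8 digits dataset as a small stand-in for MNIST (the parameter values are illustrative, not tuned):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

digits = load_digits()                # 1,797 images, each flattened to 64 pixels
X, y = digits.data, digits.target

# Squash the 64-dimensional points down to 2 dimensions.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

# Color each projected point by its digit label; the digits form clusters.
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.title("t-SNE projection of the scikit-learn digits dataset")
plt.show()
```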
More t-SNE resources and examples:
- sklearn.manifold.TSNE: http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
- https://distill.pub/2016/misread-tsne/
- https://lvdmaaten.github.io/tsne/
- http://jotterbach.github.io/2016/05/23/TSNE/ (curse of dimensionality)
- http://colah.github.io/posts/2014-10-Visualizing-MNIST/
- https://www.oreilly.com/learning/an-illustrated-introduction-to-the-t-sne-algorithm
- https://github.com/oreillymedia/t-SNE-tutorial
- https://www.youtube.com/watch?v=NEaUSP4YerM&t=618s