ML Clustering2
KMeans.inertia_ is the sum of squared distances between each instance and its closest centroid
KMeans.score() returns the negative of the inertia
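A minimal sketch of the relationship between the two, assuming scikit-learn's KMeans and a small made-up dataset (the two blobs and the variable names are illustrative only). It recomputes the inertia by hand and compares it with inertia_ and score():

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated blobs (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Inertia by hand: squared distance from each instance to its assigned centroid.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = ((X - assigned_centroids) ** 2).sum()

print("inertia_      :", kmeans.inertia_)
print("manual inertia:", manual_inertia)
print("score(X)      :", kmeans.score(X))  # same magnitude, opposite sign
```

The score is negative because scikit-learn follows a "greater is better" convention for score() methods, while a lower inertia means a tighter clustering.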
K-Means++
In their 2006 paper, David Arthur and Sergei Vassilvitskii
proposed a smarter initialization step that tends
to select centroids that are distant from one
another. This improvement makes the K-Means
algorithm much less likely to converge to a
suboptimal solution.
They showed that even though this method
requires additional computation for the smarter
initialization, it is worth it because it makes it
possible to drastically reduce the number of times
the algorithm needs to be run to find the optimal
solution.
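One way to see this effect is to run K-Means a single time per seed with each initialization and compare the resulting inertias. The sketch below assumes scikit-learn; the dataset from make_blobs, the number of seeds, and the helper name single_run_inertias are illustrative choices, not part of the original material:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 5 clusters (illustrative only).
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=0.6, random_state=42)

def single_run_inertias(init, seeds):
    """Fit K-Means once per seed (n_init=1) and collect the final inertias."""
    return np.array([
        KMeans(n_clusters=5, init=init, n_init=1, random_state=s).fit(X).inertia_
        for s in seeds
    ])

seeds = range(20)
random_inertias = single_run_inertias("random", seeds)
kpp_inertias = single_run_inertias("k-means++", seeds)

print("random    : mean", random_inertias.mean(), "worst", random_inertias.max())
print("k-means++ : mean", kpp_inertias.mean(), "worst", kpp_inertias.max())
```

If single runs with k-means++ land on a low inertia more consistently than single runs with random initialization, fewer restarts are needed to reach a good solution. The exact numbers depend on the data and the seeds.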
k-means++ initialization algorithm
1. Take one centroid c(1), chosen uniformly at random
from the dataset.
2. Take a new centroid c(i), choosing an instance x(i)
with probability D(x(i))^2 / sum_j D(x(j))^2, where D(x(i))
is the distance between x(i) and the closest centroid
already chosen.
3. Repeat step 2 until all k centroids have been chosen.
This probability distribution ensures that instances farther
from the already chosen centroids are much more likely
to be selected as centroids.
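The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a library implementation: the function name kmeans_pp_init, its parameters, and the toy data are hypothetical, and it assumes the dataset contains at least k distinct instances.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Pick k initial centroids from the rows of X using k-means++ seeding."""
    n = X.shape[0]
    # Step 1: first centroid chosen uniformly at random from the dataset.
    centroids = [X[rng.integers(n)]]
    # Squared distance from each instance to its closest chosen centroid.
    d2 = ((X - centroids[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Step 2: sample an instance with probability proportional to D(x)^2.
        probs = d2 / d2.sum()
        idx = rng.choice(n, p=probs)
        centroids.append(X[idx])
        # Update the closest-centroid distances with the new centroid.
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centroids)

# Toy data: three blobs (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(30, 2))
    for center in (0.0, 4.0, 8.0)
])

centroids = kmeans_pp_init(X, k=3, rng=rng)
print(centroids)
```

An instance that is already a centroid has distance zero, so it cannot be picked again. In scikit-learn, KMeans uses this kind of seeding by default through init="k-means++".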