K-Means Clustering Tutorial - Matlab Code



K Mean Clustering Code in Matlab

By Kardi Teknomo, PhD.


Purchase the latest e-book with complete code of this k means clustering tutorial here (purchase.html)

K Means Algorithm in Matlab
For those who like to use Matlab, the Matlab Statistical Toolbox (http://www.mathworks.com/access/helpdesk/help/toolbox/stats/multiv16.html)
contains a function named kmeans (http://www.mathworks.com/access/helpdesk/help/toolbox/stats/kmeans.html). If you do not have the
Statistical Toolbox, you may use my generic code below. The latest code of kMeanCluster (purchase.html) and distMatrix (purchase.html)
can be downloaded here (purchase.html). The updated code (purchase.html) extends to N dimensions. Alternatively, you may use the
old code below (limited to two dimensions). For more information about what k means clustering is (WhatIs.htm), how the
algorithm works (Algorithm.htm), a numerical example (NumericalExample.htm) of this code, its application to machine learning
(Application.htm), and other resources (Resources.htm) on k means clustering, you may visit the Contents of this tutorial (index.html).
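For readers who do have the toolbox, a call to the built-in kmeans might look like the minimal sketch below. The data matrix X and the variable names idx and C are illustrative choices, not part of the original tutorial; the four points are the ones used in the header example of the generic code. Because kmeans chooses its starting centroids randomly, the numeric labels 1 and 2 may be swapped from run to run.

```matlab
% Assumption: the Statistics Toolbox is installed, so kmeans is on the path.
X = [1 1; 2 1; 4 3; 5 4];   % objects in rows, attributes in columns
k = 2;                      % number of groups

% idx(i) is the group assigned to row i of X; C holds one centroid per row.
[idx, C] = kmeans(X, k);

disp([X, idx]);             % data with the group label appended as a last column
disp(C);                    % centroid coordinates
```

The generic code, for readers without the toolbox, follows.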
function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%
% kMeansCluster  Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column representing the group of each object
%
% Example: m = [1 1; 2 1; 4 3; 5 4] or in a nice form
%          m = [1 1;
%               2 1;
%               4 3;
%               5 4]
%          k = 2
% kMeansCluster(m,k) produces m = [1 1 1;
%                                  2 1 1;
%                                  4 3 2;
%                                  5 4 2]
% Input:
%   m       required, matrix data: objects in rows and attributes in columns
%   k       optional, number of groups (default = 1)
%   isRand  optional, nonzero (e.g. isRand=1) for random initialization;
%           otherwise 0 (default), which assigns the first k data as initial centroids
%
% Local Variables
%   f       row number of data that belong to group i
%   c       centroid coordinate size (1:k, 1:maxCol)
%   g       current iteration group matrix size (1:maxRow)
%   i       scalar iterator
%   maxCol  scalar number of columns in the data matrix m = number of attributes
%   maxRow  scalar number of rows in the data matrix m = number of objects
%   temp    previous iteration group matrix size (1:maxRow)
%   z       minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if nargin<3, isRand=0; end
if nargin<2, k=1; end

[maxRow,maxCol]=size(m);
if maxRow<=k,
    y=[m,(1:maxRow)'];    % no more objects than groups: each object is its own group
else

    % initial value of centroid
    if isRand,
        p=randperm(size(m,1));    % random initialization
        for i=1:k
            c(i,:)=m(p(i),:);
        end
    else
        for i=1:k
            c(i,:)=m(i,:);    % sequential initialization
        end
    end

    temp=zeros(maxRow,1);    % initialize as zero vector

    while 1,
        d=DistMatrix(m,c);    % calculate objects-centroid distances
        [z,g]=min(d,[],2);    % find group matrix g
        if isequal(g,temp),
            break;    % stop the iteration
        else
            temp=g;    % copy group matrix to temporary variable
        end
        for i=1:k
            f=find(g==i);
            if ~isempty(f)    % only compute centroid if f is not empty
                c(i,:)=mean(m(f,:),1);
            end
        end
    end

    y=[m,g];

end
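To see how the loop behaves on the example from the header comment, the following self-contained sketch traces it with sequential initialization. It is an illustration rather than the author's code: the distance computation is inlined (and uses squared Euclidean distance, which yields the same nearest centroid) so that the script runs without DistMatrix, and the names diffs, gNew, and members are introduced here only for this purpose.

```matlab
% Data from the header comment: four objects with two attributes each.
m = [1 1; 2 1; 4 3; 5 4];
k = 2;
c = m(1:k, :);                 % sequential initialization: first k rows
g = zeros(size(m, 1), 1);      % previous group assignment (none yet)

while true
    % Squared Euclidean distance from every object to every centroid.
    d = zeros(size(m, 1), k);
    for i = 1:k
        diffs = m - repmat(c(i, :), size(m, 1), 1);
        d(:, i) = sum(diffs .^ 2, 2);
    end
    [~, gNew] = min(d, [], 2); % index of the nearest centroid per object
    if isequal(gNew, g)
        break;                 % assignments unchanged: stop iterating
    end
    g = gNew;
    for i = 1:k
        members = (g == i);
        if any(members)        % leave the centroid alone if the group is empty
            c(i, :) = mean(m(members, :), 1);
        end
    end
end

disp([m, g]);                  % data with the group column appended
disp(c);                       % final centroids
```

With this data the assignments settle after three passes: the first two objects end up in group 1 and the last two in group 2, with centroids (1.5, 1) and (4.5, 3.5), which agrees with the output listed in the header comment.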

The Matlab function kMeansCluster above calls the function DistMatrix, shown in the code below. The code below works only for two
dimensions. If you want to use it for multi-dimensional Euclidean distance, you may purchase the tutorial and the code here
(purchase.html). Learn about other types of distance here (../Similarity/index.html).

function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX returns the distance matrix between points A=[x1 y1] and B=[x2 y2]
% Author: Kardi Teknomo, Ph.D.
% see http://people.revoledu.com/kardi/
%
% The number of points in A and B are not necessarily the same.
% It can be used for distance-in-a-slice (Spacing) or distance-between-slice (Headway).
%
% A and B must contain two columns:
%   first column is the X coordinates
%   second column is the Y coordinates
% The distance matrix holds the distances between points in A (as rows)
% and points in B (as columns).
% example: Spacing = DistMatrix(A,A)
%          Headway = DistMatrix(A,B), with hA ~= hB or hA=hB
%          A=[1 2; 3 4; 5 6]; B=[4 5; 6 2; 1 5; 5 8]
%          DistMatrix(A,B) = [ 4.24 5.00 3.00 7.21;
%                              1.41 3.61 2.24 4.47;
%                              1.41 4.12 4.12 2.00 ]
%%%%%%%%%%%%%%%%%%%%%%%%%%%
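[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 && hB==1
    d=sqrt(dot((A-B),(A-B)));    % single point versus single point
else
    C=[ones(1,hB);zeros(1,hB)];
    D=flipud(C);
    E=[ones(1,hA);zeros(1,hA)];
    F=flipud(E);
    G=A*C;    % X coordinates of A repeated across hB columns
    H=A*D;    % Y coordinates of A repeated across hB columns
    I=B*E;    % X coordinates of B repeated across hA columns
    J=B*F;    % Y coordinates of B repeated across hA columns
    d=sqrt((G-I').^2+(H-J').^2);
end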
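For comparison, a distance matrix for any number of attributes can be sketched with a simple loop over the points in B. This is a generic illustration and not the author's updated N-dimensional code, which is not reproduced in this tutorial; the variable name diffs is introduced here for the example. The data are the A and B from the DistMatrix header comment.

```matlab
% Points in rows, attributes in columns; any number of columns works,
% as long as A and B have the same number of columns.
A = [1 2; 3 4; 5 6];
B = [4 5; 6 2; 1 5; 5 8];

hA = size(A, 1);
hB = size(B, 1);
d = zeros(hA, hB);             % d(i,j) = distance from A(i,:) to B(j,:)

for j = 1:hB
    % Subtract point j of B from every row of A, then take row-wise norms.
    diffs = A - repmat(B(j, :), hA, 1);
    d(:, j) = sqrt(sum(diffs .^ 2, 2));
end

disp(round(d * 100) / 100);    % rounded to two decimals for display
```

Rounded to two decimals, the result matches the matrix given in the DistMatrix header comment.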

For a more interactive example, you may use the K means program that I made using VB (download.htm).

Do you have a question regarding this k means tutorial? Ask your question here (../../Service/index.html).


Copyright 2017 Kardi Teknomo

