Mahalanobis Distance
Mahalanobis distance is also called quadratic distance. It measures the separation of two groups of objects. Suppose we have two groups with means $\bar{x}_1$ and $\bar{x}_2$; the Mahalanobis distance between the groups is given by the formula

$d(\bar{x}_1, \bar{x}_2) = \sqrt{(\bar{x}_1 - \bar{x}_2)^T S_{pooled}^{-1} (\bar{x}_1 - \bar{x}_2)}$

where $S_{pooled}$ is the pooled covariance matrix of the two groups. The data of the two groups must have the same number of variables (the same number of columns) but not necessarily the same number of observations (each group may have a different number of rows).
For example: Suppose we have two groups of data, each group consisting of two variables (x, y). The scatter plot of the data is shown below.
The pooled covariance matrix of the two groups is computed as the weighted average of the group covariance matrices. The weighted average takes this form

$S_{pooled} = \frac{n_1 S_1 + n_2 S_2}{n_1 + n_2}$

where $n_1$, $n_2$ are the number of objects in each group and $S_1$, $S_2$ are the covariance matrices of the groups.
The Mahalanobis distance is simply the quadratic multiplication of the mean difference and the inverse of the pooled covariance matrix. To perform the quadratic multiplication, check again the formula of the Mahalanobis distance above: take the mean difference, transpose it, and multiply it by the inverse pooled covariance matrix. After that, multiply the result by the mean difference again, and the square root of this quadratic form is the final Mahalanobis distance.
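The computation just described can be sketched in a few lines of NumPy. This is a minimal illustration under the definitions above, not the tutorial's own code; the function name and the two small data groups are invented for the example.

```python
import numpy as np

def mahalanobis_between_groups(g1, g2):
    """Mahalanobis distance between two groups of observations.

    Each group is an (n_i, p) array: rows are objects, columns are the
    p variables. Both groups must share the same number of columns p.
    """
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    n1, n2 = len(g1), len(g2)
    # Covariance matrix of each group (rows are observations).
    s1 = np.cov(g1, rowvar=False)
    s2 = np.cov(g2, rowvar=False)
    # Pooled covariance: weighted average of the group covariances.
    pooled = (n1 * s1 + n2 * s2) / (n1 + n2)
    # Quadratic form: (mean difference)^T * inverse pooled cov * (mean difference),
    # then the square root.
    diff = g1.mean(axis=0) - g2.mean(axis=0)
    return float(np.sqrt(diff @ np.linalg.inv(pooled) @ diff))

# Two made-up groups with two variables (x, y) each.
group1 = [[2, 2], [2, 5], [6, 5], [7, 3], [4, 7]]
group2 = [[6, 5], [7, 4], [8, 7], [5, 6], [5, 4]]
print(mahalanobis_between_groups(group1, group2))
```

Note that the distance is symmetric in the two groups, and the groups may have different numbers of rows.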
Example: Suppose we have 4 objects as our training data points, and each object has 2 attributes. Each attribute represents a coordinate of the object.

Object       Attribute 1 (X)   Attribute 2 (Y)
Medicine A   1                 1
Medicine B   2                 1
Medicine C   4                 3
Medicine D   5                 4
We also know beforehand that these objects belong to two groups of medicine (cluster 1 and cluster 2). The problem now is to determine which medicines belong to cluster 1 and which belong to the other cluster.
The basic steps of k-means clustering are simple. In the beginning, we determine the number of clusters K and assume the centroids or centers of these clusters. We can take any random objects as the initial centroids, or the first K objects in sequence can also serve as the initial centroids.

Then the k-means algorithm will repeat the three steps below until convergence:

1. Determine the centroid coordinates.
2. Determine the distance of each object to the centroids.
3. Group the objects based on minimum distance.
The numerical example below is given to help understand this simple iteration. You may download the implementation of this numerical example as Matlab code here. Another example of interactive k-means clustering using Visual Basic (VB) is also available here. An MS Excel file for this numerical example can be downloaded at the bottom of this page.
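Besides the Matlab and VB versions mentioned above, the iteration can be sketched in plain Python/NumPy. This is a rough illustration, not the downloadable code; the function name `kmeans` and the choice of the first K objects as initial centroids are assumptions for the sketch.

```python
import numpy as np

def kmeans(points, k, max_iter=100):
    """Plain k-means: the first k objects serve as the initial centroids."""
    pts = np.asarray(points, float)
    centroids = pts[:k].copy()                  # initial centroids
    assignment = np.zeros(len(pts), dtype=int)
    for it in range(max_iter):
        # Euclidean distance of every object to every centroid.
        dist = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each object to its nearest centroid.
        new_assignment = dist.argmin(axis=1)
        if it > 0 and np.array_equal(new_assignment, assignment):
            break                               # no object moved: converged
        assignment = new_assignment
        # Recompute each centroid as the mean of its group's members.
        for j in range(k):
            members = pts[assignment == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assignment, centroids

# The four medicines from the table, grouped into K = 2 clusters.
labels, centers = kmeans([[1, 1], [2, 1], [4, 3], [5, 4]], k=2)
print(labels)   # A and B end up in one group, C and D in the other
```

Running this on the four medicines reproduces the grouping worked out step by step in the numerical example below.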
Suppose we have several objects (4 types of medicines), and each object has two attributes or features, as shown in the table above. Our goal is to group these objects into K = 2 groups of medicine based on the two features (pH and weight index).
Each medicine represents one point with two attributes (X, Y) that we can plot as a coordinate in an attribute space, as shown in the figure below.
1. Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids. Let $c_1$ and $c_2$ denote the coordinates of the centroids, then $c_1 = (1, 1)$ and $c_2 = (2, 1)$.

2. Objects-Centroids distance: We calculate the Euclidean distance of each object to each centroid. For example, the distance from medicine C = (4, 3) to the first centroid $c_1 = (1, 1)$ is $\sqrt{(4-1)^2 + (3-1)^2} = \sqrt{13} \approx 3.61$, and its distance to the second centroid $c_2 = (2, 1)$ is $\sqrt{(4-2)^2 + (3-2)^2} = \sqrt{8} \approx 2.83$, etc. Collecting these distances gives the distance matrix at iteration 0:

$D^0 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 1 & 0 & 2.83 & 4.24 \end{bmatrix}$

Each column of the distance matrix corresponds to one object (A, B, C, D); the first row holds the distance of each object to the first centroid and the second row its distance to the second centroid.
3. Objects clustering: We assign each object to the group with the minimum distance. Thus, medicine A is assigned to group 1, medicine B to group 2, medicine C to group 2, and medicine D to group 2. The element of the Group matrix below is 1 if and only if the object (column) is assigned to that group (row):

$G^0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$
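The distances and the grouping from the steps above can be checked with a short NumPy sketch (Euclidean distance, objects A through D as columns, as in the text):

```python
import numpy as np

# Objects A-D and the two initial centroids (medicines A and B).
points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], float)
centroids = np.array([[1, 1], [2, 1]], float)

# Distance matrix: row i holds the Euclidean distance of every object
# to centroid i; columns are the objects A, B, C, D.
D = np.linalg.norm(points[None, :, :] - centroids[:, None, :], axis=2)
print(np.round(D, 2))

# Group matrix: entry (i, j) is 1 iff object j is nearest to centroid i.
G = (D == D.min(axis=0)).astype(int)
print(G)
```

The printed distance matrix rounds to [[0, 1, 3.61, 5], [1, 0, 2.83, 4.24]], and the Group matrix assigns A to group 1 and B, C, D to group 2.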
4. Iteration-1, determine centroids: Knowing the members of each group, we now compute the new centroid of each group based on these new memberships. Group 1 has only one member, thus its centroid remains at $c_1 = (1, 1)$. Group 2 now has three members, thus its centroid is the average coordinate of the three members: $c_2 = \left(\frac{2+4+5}{3}, \frac{1+3+4}{3}\right) = \left(\frac{11}{3}, \frac{8}{3}\right)$.
5. Iteration-1, Objects-Centroids distances: The next step is to compute the distance of all objects to the new centroids. Similar to step 2, the distance matrix at iteration 1 is

$D^1 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 3.14 & 2.36 & 0.47 & 1.89 \end{bmatrix}$
6. Iteration-1, Objects clustering: Similar to step 3, we assign each object based on the minimum distance. Based on the new distance matrix, we move medicine B to group 1 while all other objects remain in place. The Group matrix becomes

$G^1 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$
7. Iteration-2, determine centroids: Now we repeat step 4 to calculate the new centroid coordinates based on the clustering of the previous iteration. Groups 1 and 2 both have two members, thus $c_1 = \left(\frac{1+2}{2}, \frac{1+1}{2}\right) = (1.5, 1)$ and $c_2 = \left(\frac{4+5}{2}, \frac{3+4}{2}\right) = (4.5, 3.5)$.
Repeating steps 5 and 6 with these new centroids, we obtain the result that $G^2 = G^1$. Comparing the grouping of the last iteration and this iteration reveals that the objects no longer move between groups. Thus, the computation of the k-means clustering has reached stability and no further iteration is needed. We get the final grouping as the result: medicines A and B belong to group 1, while medicines C and D belong to group 2.
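The iteration can be replayed programmatically to confirm the convergence. The NumPy sketch below (illustrative, not the tutorial's code) starts from the step-3 assignment and alternates the centroid update and reassignment until the grouping stops changing:

```python
import numpy as np

pts = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], float)  # medicines A-D
groups = np.array([0, 1, 1, 1])                          # assignment after step 3

while True:
    # Determine the centroid of each group from the current membership.
    centroids = np.array([pts[groups == g].mean(axis=0) for g in (0, 1)])
    # Object-centroid distances and reassignment by minimum distance.
    dist = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
    new_groups = dist.argmin(axis=1)
    if np.array_equal(new_groups, groups):
        break                                            # grouping is stable
    groups = new_groups

print(groups + 1)   # prints [1 1 2 2]: A, B in group 1 and C, D in group 2
```

The loop stops in the second pass with centroids (1.5, 1) and (4.5, 3.5), matching the hand computation above.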