FDA Lecture 5
Clustering
Classification
LUT University
2 Clustering
   Cluster analysis
   Fuzzy clustering
3 Classification
   Fuzzy K-nearest Neighbor classifier
   Similarity based classifier
   Adaptive neuro-fuzzy inference system (ANFIS)
Cluster analysis
Specifying a similarity relation over a given universe induces a special similarity structure. Pattern recognition can then consist of partitioning the whole data set into a number of subsets, demanding that the data within each subset are similar to a high degree, whereas pairs of data taken from different subsets are allowed to be similar only to a very low degree. Subsets with this property are called clusters.
Definition
A subset $C(j, s_0)$ of $\{O_1, \dots, O_N\}$ is called an $s_0$-neighbourhood of $O_j$ iff
$$C(j, s_0) = \{O_k,\ k \in \{1, \dots, N\} : \mu_R(j, k) \ge s_0\}$$
Procedure 1
a) Choose a decreasing sequence of similarity degrees $1 \ge s_1 > s_2 > \dots > s_r \ge 0$.
b) $O = \{O_1, \dots, O_N\}$.
c) Set $s = s_1$ and $O_o = O_1$.
d) For $s$ and $O_o$: $C_1^0 = \{O_k,\ k \in O : \mu_R(o, k) \ge s\}$ forms the first cluster.
e) $O^1 = O \setminus C_1^0$.
f) If $O^1 \ne \emptyset$, choose an object $O_v$ with $O_v \notin C_1^0$ and $\mu_R(o, v) = \min_{O^1}$.
g) For $s$ and $O_v$: $C_1^1 = \{O_k,\ k \in O^1 : \mu_R(v, k) \ge s\}$ forms the second cluster.
h) $O^2 = O \setminus (C_1^0 \cup C_1^1)$.
i) If $O^2 \ne \emptyset$, choose an object $O_w \notin (C_1^0 \cup C_1^1)$ with $\max\{\mu_R(w, o), \mu_R(w, v)\} = \min_{O^2}$, and continue until some $O^m = \emptyset$.
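The procedure can be sketched compactly in Python for a fixed similarity level $s$. This is a minimal sketch only: the function name is ours, and the tie-breaking when several objects attain the minimum in steps (f) and (i) is an assumption not fixed by the procedure above.

```python
import numpy as np

def similarity_clusters(S, s):
    """Crisp clustering at similarity level s (sketch of Procedure 1).

    S : (N, N) symmetric similarity matrix with entries in [0, 1].
    s : similarity threshold; an object joins a cluster when its
        similarity to the cluster's seed object is >= s.
    """
    N = S.shape[0]
    remaining = set(range(N))
    seeds = [0]                       # start from the first object O1
    clusters = []
    while remaining:
        seed = seeds[-1]
        # all still-unclustered objects similar enough to the seed
        cluster = {k for k in remaining if S[seed, k] >= s}
        clusters.append(sorted(cluster))
        remaining -= cluster
        if remaining:
            # next seed: the object whose largest similarity to the
            # previous seeds is smallest, i.e. least similar to them all
            next_seed = min(remaining,
                            key=lambda k: max(S[j, k] for j in seeds))
            seeds.append(next_seed)
    return clusters
```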
Example
In this example we again consider the wine data set and aim to find the most similar countries by clustering the data.
Example
The data are again first normalized to the unit interval, and then the similarity matrix between the samples is computed.
$$S = \begin{pmatrix}
1.00 & 0.94 & 0.90 & 0.84 & 0.72 & 0.73 & 0.42 & 0.46 & 0.76 & 0.68 \\
0.94 & 1.00 & 0.94 & 0.90 & 0.79 & 0.78 & 0.41 & 0.55 & 0.83 & 0.79 \\
0.90 & 0.94 & 1.00 & 0.94 & 0.87 & 0.87 & 0.51 & 0.67 & 0.90 & 0.83 \\
0.84 & 0.90 & 0.94 & 1.00 & 0.91 & 0.87 & 0.48 & 0.75 & 0.84 & 0.79 \\
0.72 & 0.79 & 0.87 & 0.91 & 1.00 & 0.95 & 0.61 & 0.84 & 0.88 & 0.87 \\
0.73 & 0.78 & 0.87 & 0.87 & 0.95 & 1.00 & 0.67 & 0.82 & 0.89 & 0.90 \\
0.42 & 0.41 & 0.51 & 0.48 & 0.61 & 0.67 & 1.00 & 0.54 & 0.48 & 0.58 \\
0.46 & 0.55 & 0.67 & 0.75 & 0.84 & 0.82 & 0.54 & 1.00 & 0.66 & 0.79 \\
0.76 & 0.83 & 0.90 & 0.84 & 0.88 & 0.89 & 0.48 & 0.66 & 1.00 & 0.86 \\
0.68 & 0.79 & 0.83 & 0.79 & 0.87 & 0.90 & 0.58 & 0.79 & 0.86 & 1.00
\end{pmatrix}$$
Example
Selecting $s_0 = 0.8$ and the first object $O_1$, we can form the first cluster $C_1^0 = \{O_1, O_2, O_3, O_4\}$.
Next we form the set $O^1 = \{O_5, O_6, O_7, O_8, O_9, O_{10}\}$.
Since $O^1 \ne \emptyset$, we select the $O_v$ which is in $O^1$ and has the smallest similarity to $O_1$. Now $O_v = O_7$.
Next we form the second cluster from those objects $O_k$ which have $S(O_7, O_k) \ge 0.8$ and $O_k \in O^1$. In this case $C_1^1 = \{O_7\}$.
To form $O^2$ we need to select those objects which are neither in $C_1^0$ nor in $C_1^1$ $\Rightarrow$ $O^2 = \{O_5, O_6, O_8, O_9, O_{10}\}$.
Since $O^2 \ne \emptyset$, we select the $O_w$ which is in $O^2$ and has the smallest similarity to $O_7$ and $O_1$. Now $O_w = O_9$.
Example
From $O^2$, the objects having similarity greater than or equal to 0.8 to $O_9$ are selected into the next cluster $C_1^2 = \{O_5, O_6, O_9, O_{10}\}$.
To form $O^3$ we select those objects which are not in $C_1^0$, $C_1^1$ or $C_1^2$ $\Rightarrow$ $O^3 = \{O_8\}$.
Since this set contains only one object, it forms its own cluster $C_1^3 = \{O_8\}$. After this $O^4 = \emptyset$ and all objects are clustered.
The data were thus divided into four clusters: $[\{O_1, O_2, O_3, O_4\}, \{O_7\}, \{O_5, O_6, O_9, O_{10}\}, \{O_8\}]$.
Procedure 3
a) Choose an increasing sequence of natural numbers $n_1, n_2, \dots, n_m \in \{2, 3, \dots, N-1\}$, a distance $d$ among objects and a distance $D$ among clusters.
b) For each $n = n_l$, $l = 1, \dots, m$, choose $n$ objects and renumber the object set so that the chosen objects are $O_1, \dots, O_n$.
c) Construct $n$ clusters $C_1, \dots, C_n$ according to the following principle:
Fuzzy clustering
The problem of cluster analysis consists in partitioning a given set of objects, say $O_1, \dots, O_N$, into a number of clusters, say $C_1, \dots, C_n$. Starting from a distance matrix $((d(j, k)))$, the objects are to be allocated to the clusters such that each object belongs to one and only one cluster. This can be expressed by a partition matrix $M = ((\mu_i(j)))$, where $\mu_i(j)$ denotes the degree of membership of object $O_j$ in cluster $C_i$, and
$$\forall j \in \{1, \dots, N\} : \sum_{i=1}^{n} \mu_i(j) = 1 \tag{9}$$
$$J_R(M) = \sum_{j=1}^{N} \sum_{l=1}^{N} \left\{ c_0 \left[ \sum_{i=1}^{n} \bigl(\mu_i(j) - \mu_i(l)\bigr)^2 \right] - d^2(j, l) \right\}^2 \tag{12}$$
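To make the functional concrete, here is a small Python evaluation of $J_R(M)$ for a given fuzzy partition matrix and distance matrix. A minimal sketch, assuming $M$ is stored cluster-by-row; the function name is ours.

```python
import numpy as np

def J_R(M, D, c0):
    """Evaluate the clustering functional of Eq. (12).

    M  : (n, N) fuzzy partition matrix, columns summing to 1 (Eq. (9)).
    D  : (N, N) distance matrix ((d(j, l))).
    c0 : positive scaling constant.
    """
    # squared Euclidean distance between membership columns j and l
    diff = M[:, :, None] - M[:, None, :]       # shape (n, N, N)
    memb_dist2 = (diff ** 2).sum(axis=0)       # shape (N, N)
    return ((c0 * memb_dist2 - D ** 2) ** 2).sum()
```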
Procedure 1
a) Choose a distance $d$ and a natural number $n \in \{2, \dots, N-1\}$ which the number of clusters should not exceed.
b) Choose a matrix norm $\|\cdot\|$ on $M_{fn}^o$ and a small number $\varepsilon > 0$.
c) Choose a matrix $M^{(0)} \in M_{fn}^o$, i.e. from the set of non-degenerate fuzzy $n$-partitions, for initializing the procedure. For $l = 0, 1, \dots$:
d) Calculate $c_0^{(l)}$ by minimizing $J_R(M^{(l)})$ with respect to $c_0$.
e) By some method of optimization (e.g. an adaptation of the gradient method), calculate $M^{(l+1)}$ for the fixed $c_0^{(l)}$.
f) If $\|M^{(l)} - M^{(l+1)}\| < \varepsilon$ then stop and suggest $M^{(l+1)}$ as the partition; otherwise return to (d) with $l + 1$ instead of $l$.
In particular, any positive-definite matrix $T \in M_{t \times t}$ induces such a norm via a weighted inner product.

Let $I_j^{(l)} = \{\, i \in \{1, \dots, n\} : d_{ij}^{(l)} = 0 \,\}$ denote the set of clusters whose centers coincide with sample $j$ at step $l$, and let $\bar{I}_j^{(l)} = \{1, \dots, n\} \setminus I_j^{(l)}$.
f) If $I_j^{(l)} = \emptyset$ then
$$\mu_i^{(l+1)}(j) = \left[ \sum_{s=1}^{n} \left( \frac{d_{ij}^{(l)}}{d_{sj}^{(l)}} \right)^{2/(q-1)} \right]^{-1}$$
If $I_j^{(l)} \ne \emptyset$ then
$$\forall i \in \bar{I}_j^{(l)} : \mu_i^{(l+1)}(j) = 0 \qquad \text{and} \qquad \sum_{i \in I_j^{(l)}} \mu_i^{(l+1)}(j) = 1$$
Example
Consider the following five samples in $x$ with a first, arbitrary partition matrix $M^1$, and let the number of clusters be two. Assume the fuzziness parameter $q = 2$ and take the distance to be the Minkowski distance with $p = 1$.
$$x = \begin{pmatrix} 1 & 1 \\ 3 & 2 \\ 1 & 4 \\ 1 & 2 \\ 3 & 2 \end{pmatrix} \qquad M^1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0.9 & 0.1 \\ 0.8 & 0.2 \end{pmatrix}$$
Example
The cluster centers, computed as membership-weighted means of the samples, are $v_1 \approx [1.59, 1.63]$ and $v_2 \approx [2.04, 2.87]$. Next we calculate the distances between the cluster centers and the data samples by using $d_{ij} = \sum_{k=1}^{t} |x_{jk} - v_{ik}|$:
$$d = \begin{pmatrix} 1.22 & 2.91 \\ 1.78 & 1.83 \\ 2.96 & 2.17 \\ 0.96 & 1.91 \\ 1.78 & 1.83 \end{pmatrix}$$
Example
After computing the distance matrix, we next need to update the partition matrix by
$$\mu_{ij} = \left[ \sum_{s=1}^{n} \left( \frac{d_{ij}}{d_{sj}} \right)^{2/(q-1)} \right]^{-1}$$
$$M^2 = \begin{pmatrix} 0.85 & 0.15 \\ 0.51 & 0.49 \\ 0.35 & 0.65 \\ 0.80 & 0.20 \\ 0.51 & 0.49 \end{pmatrix}$$
Next we compare the current partition matrix $M^2$ with the previous one, $M^1$, and compute the distance between them. If $\|M^{(l)} - M^{(l+1)}\| < \varepsilon$ we stop; otherwise we start the process again by computing new cluster centers, now using the current partition matrix. Choosing the distance as Minkowski with $p = 1$ and $\varepsilon = 0.00001$, we see that the stopping criterion is not met, and we repeat the computations.
Example
Repeating this process leads to the cluster centers $v_1 = [2.62, 2.08]$ and $v_2 = [1.06, 2.31]$ and the partition matrix
$$M^{26} = \begin{pmatrix} 0.20 & 0.79 \\ 0.96 & 0.04 \\ 0.20 & 0.80 \\ 0.04 & 0.96 \\ 0.96 & 0.04 \end{pmatrix}$$
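The whole iteration can be sketched in Python. Note the hedges: the slides do not write out the center update, so the membership-weighted mean below is an assumption (it reproduces the distance matrix of the worked example; standard fuzzy c-means would weight by $\mu^q$ instead), memberships are updated only for the regular case $d_{ij} > 0$, and all identifiers are ours.

```python
import numpy as np

def fuzzy_cluster(x, M, q=2.0, eps=1e-5, max_iter=100):
    """Alternating fuzzy clustering sketch: centers <-> memberships.

    x : (N, t) data matrix; M : (N, n) partition matrix, rows sum to 1.
    Distances are Minkowski with p = 1 (city-block).
    """
    for _ in range(max_iter):
        # centers as membership-weighted means (assumption, see above)
        v = (M.T @ x) / M.sum(axis=0)[:, None]              # (n, t)
        # d[j, i] = sum_k |x_jk - v_ik|
        d = np.abs(x[:, None, :] - v[None, :, :]).sum(axis=2)
        # membership update, assuming all d[j, i] > 0
        M_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (q - 1.0))).sum(axis=2)
        if np.abs(M - M_new).sum() < eps:   # ||M^(l) - M^(l+1)|| with p = 1
            return v, M_new
        M = M_new
    return v, M

# the five-sample example from the slides
x = np.array([[1, 1], [3, 2], [1, 4], [1, 2], [3, 2]], dtype=float)
M1 = np.array([[1, 0], [0, 1], [0, 1], [0.9, 0.1], [0.8, 0.2]])
v, M_final = fuzzy_cluster(x, M1)
# the first iteration gives d ≈ [[1.22, 2.91], [1.78, 1.83], ...]
```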
Classification
The problem of classification is basically one of partitioning
the feature space into regions, one region for each
category. Ideally, one would like to arrange this partitioning
so that none of the decisions are ever wrong.
When this cannot be done, one would like to minimize the
probability of error.
Fuzzy K-nearest Neighbor classifier
The membership of a sample $x$ in class $i$ is computed from its $K$ nearest labeled neighbors $x_1, \dots, x_K$ as
$$\mu_i(x) = \frac{\sum_{j=1}^{K} u_{ij} \left( 1 / \|x - x_j\|^{2/(m-1)} \right)}{\sum_{j=1}^{K} \left( 1 / \|x - x_j\|^{2/(m-1)} \right)}$$
where $u_{ij}$ is the membership in the $i$th class of the $j$th pattern of the labeled pattern set.
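A minimal Python sketch of this decision rule; crisp one-hot labels are used for $u_{ij}$ (one common initialization), and the function name and the small constant guarding against zero distances are ours.

```python
import numpy as np

def fuzzy_knn(test, learn_X, learn_y, K=3, m=2.0):
    """Fuzzy K-nearest-neighbor class memberships for one test sample.

    learn_X : (N, t) labeled patterns; learn_y : (N,) classes 0..n-1.
    """
    dists = np.linalg.norm(learn_X - test, axis=1)
    nn = np.argsort(dists)[:K]                           # K nearest neighbors
    w = 1.0 / (dists[nn] ** (2.0 / (m - 1.0)) + 1e-12)   # inverse-distance weights
    n_classes = int(learn_y.max()) + 1
    u = np.eye(n_classes)[learn_y[nn]]                   # (K, n) one-hot u_ij
    mu = (w @ u) / w.sum()                               # class memberships
    return int(mu.argmax()), mu                          # decide by max membership
```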
Similarity based classifier
The total similarity between a sample $x$ and a class ideal vector $v_i$ is obtained by aggregating the featurewise similarities, e.g. as their (weighted) mean, where the power value $m$ is fixed for all $i$, $d$. Other popular ways are applying the Bonferroni mean, OWA operators, or optimizing $v_i$. We decide that $x \in C_m$ if
$$S\langle x, v_m \rangle = \max_{1 \le i \le n} S\langle x, v_i \rangle$$
Algorithm
The method starts with normalization of the data. After this, an ideal vector (e.g. the mean vector, which is used in this case) is calculated for every class. Samples are classified by computing the similarity value between each ideal vector and the test vector; the sample is classified into the class with the highest similarity value. The method gets the test element test (dimension dim), the learning set learn (dimension dim, n different classes) and the dimension (dim) of the data as its parameters. In addition, different weights for the features can be set in weights, and the values p and m can be set for the similarity measure.
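A runnable sketch of this classifier with the parameters just listed, using the featurewise similarity $1 - |x_k^p - v_k^p|$ on max-normalized data and a weighted mean with power $m$; with $p = m = 1$ and equal weights it reproduces the worked example below. The identifiers are ours.

```python
import numpy as np

def similarity_classifier(test, learn, labels, weights=None, p=1.0, m=1.0):
    """Classify one sample by its total similarity to class ideal vectors.

    test   : (dim,) test element.
    learn  : (N, dim) learning set; labels : (N,) class indices 0..n-1.
    """
    scale = learn.max(axis=0)                  # normalize to the unit interval
    X, xs = learn / scale, test / scale
    dim = learn.shape[1]
    w = np.full(dim, 1.0 / dim) if weights is None else np.asarray(weights)
    sims = []
    for c in np.unique(labels):
        v = X[labels == c].mean(axis=0)        # ideal vector: class mean
        sims.append((w * (1.0 - np.abs(xs**p - v**p)) ** m).sum())
    sims = np.array(sims)
    return int(sims.argmax()), sims

# data of the example below: three classes, two samples each
learn = np.array([[64, 32, 45, 15], [69, 31, 49, 15],
                  [74, 28, 61, 19], [79, 38, 64, 20],
                  [51, 35, 14, 2],  [49, 30, 14, 2]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])
cls, sims = similarity_classifier(np.array([69, 31, 51, 20.0]), learn, labels)
# sims ≈ [0.9087, 0.9119, 0.5605] -> class index 1, i.e. class two
```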
f1 f2 f3 f4 c
64 32 45 15 1
69 31 49 15 1
74 28 61 19 2
79 38 64 20 2
51 35 14 2 3
49 30 14 2 3
Example
All values are positive. The maximum values for the features are [79, 38, 64, 20]; these are used to normalize the data. We classify the test sample $x = [69, 31, 51, 20]$.
Computing the mean vectors $v_1, v_2, v_3$ for the classes:
$$v_1 = \left[ \tfrac{64+69}{2}, \tfrac{32+31}{2}, \tfrac{45+49}{2}, \tfrac{15+15}{2} \right] = [66.5, 31.5, 46.5, 15]$$
$$v_2 = \left[ \tfrac{74+79}{2}, \tfrac{28+38}{2}, \tfrac{61+64}{2}, \tfrac{19+20}{2} \right] = [76.5, 33, 62.5, 19.5]$$
$$v_3 = \left[ \tfrac{51+49}{2}, \tfrac{35+30}{2}, \tfrac{14+14}{2}, \tfrac{2+2}{2} \right] = [50, 32.5, 14, 2]$$
Compute the total similarity values between the sample and the mean vectors representing the classes, assuming $p = 1$:
$$S\langle x, v_1 \rangle = \tfrac{1}{4}\left(1 - \left|\tfrac{69}{79} - \tfrac{66.5}{79}\right| + 1 - \left|\tfrac{31}{38} - \tfrac{31.5}{38}\right| + 1 - \left|\tfrac{51}{64} - \tfrac{46.5}{64}\right| + 1 - \left|\tfrac{20}{20} - \tfrac{15}{20}\right|\right) = 0.9087$$
$$S\langle x, v_2 \rangle = \tfrac{1}{4}\left(1 - \left|\tfrac{69}{79} - \tfrac{76.5}{79}\right| + 1 - \left|\tfrac{31}{38} - \tfrac{33}{38}\right| + 1 - \left|\tfrac{51}{64} - \tfrac{62.5}{64}\right| + 1 - \left|\tfrac{20}{20} - \tfrac{19.5}{20}\right|\right) = 0.9119$$
$$S\langle x, v_3 \rangle = \tfrac{1}{4}\left(1 - \left|\tfrac{69}{79} - \tfrac{50}{79}\right| + 1 - \left|\tfrac{31}{38} - \tfrac{32.5}{38}\right| + 1 - \left|\tfrac{51}{64} - \tfrac{14}{64}\right| + 1 - \left|\tfrac{20}{20} - \tfrac{2}{20}\right|\right) = 0.5605$$
Example
Now our sample has a similarity value of 0.9087 to class one, 0.9119 to class two and 0.5605 to class three. We decide to which class the sample belongs according to the highest similarity value. The highest similarity value is now 0.9119, which was obtained when the new sample was compared to the second class vector, so we decide that this sample belongs to class two. Notice that if you do this using e.g. the Minkowski metric, you will end up classifying this sample wrongly.
[Figure: classification accuracy as a function of the power value for the class pairs H−H, A−H, G−G and A−G.]
[Figures: surface plots of the classification % as a function of the P value and the mean value, each with panels (a) and (b).]
Example
Yu's t-norm and s-norm are as follows:
$$T_n\langle x, v \rangle = \max(0, (1 + \lambda)(x + v - 1) - \lambda x v)$$
$$S_n\langle x, v \rangle = \min(1, x + v + \lambda x v) \tag{26}$$
and negation is denoted as $\bar{x} = 1 - x$ for $x, v \in [0, 1]^d$.
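These norms are straightforward to compute componentwise; a minimal sketch follows. Eq. (26) appears above; the form of Yu's t-norm is the standard one from the literature and is stated here as an assumption, since the slide text is incomplete.

```python
import numpy as np

def yu_s_norm(x, v, lam):
    """Yu's s-norm, Eq. (26), componentwise on vectors in [0, 1]^d."""
    return np.minimum(1.0, x + v + lam * x * v)

def yu_t_norm(x, v, lam):
    """Yu's t-norm (standard literature form; an assumption here)."""
    return np.maximum(0.0, (1.0 + lam) * (x + v - 1.0) - lam * x * v)

def negation(x):
    """Standard negation: x_bar = 1 - x."""
    return 1.0 - x
```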