FDA Lecture 5
Fuzzy data analysis 5

Qualitative analysis, introduction

Clustering
Classification

Fuzzy Data Analysis

LUT University

February 17, 2021


1 Qualitative analysis, introduction

2 Clustering
Cluster analysis
Fuzzy clustering

3 Classification
Fuzzy K-nearest Neighbor classifier
Similarity based classifier
Adaptive neuro-fuzzy inference system (ANFIS)


As outlined earlier, the main aim of data analysis is pattern cognition, e.g. the detection of untypical items, clusters, structures and functional relationships. The data used here will be either crisp or simple fuzzy data. Fuzziness occurs either in the data themselves, which are then modelled as simple fuzzy data, or in the methods applied to the data, whose origin lies in fuzzy set theory (e.g. functional relationships being uncertain and modelled by fuzzy methods). When going through applications, the specification of the fuzzy data will be part of the consideration. The "something" in question will be referred to, traditionally, as an "object" (situation, experiment), and the description of its present "state" is characterized by a set of "features" (attributes, properties, statements).


There are, essentially, two attitudes towards the incorporation of fuzziness into data analysis.
The first attitude consists in adapting the methods used in data analysis with crisp data by introducing some aspects of fuzzy set theory.
The second attitude consists in consistently treating the data as fuzzy, and hence handling them along the lines of fuzzy set theory.


Cluster analysis
Specification of some similarity relation over a given universe
induces a special similarity structure. Pattern recognition can
then consist of a partition of the whole data set into a number of
subsets demanding that the data within each subset are similar
to a high degree, whereas pairs, where the data are taken from
different subsets, are allowed to be similar only to a very low
degree. Subsets with such a property are called clusters.


As usual in cluster analysis, the data set is represented in the following form. Each object O_j of a finite set, say {O_1, ..., O_N}, is evaluated by the same complex of t features taking values from a given universe U. As a rule, U is a subset of some t-dimensional set, aggregated from t sub-universes U_1, ..., U_t. Hence the feature values belonging to the j-th object form a "vector" (x_1j, ..., x_tj) in a t-fold set, with x_ij ∈ U_i. The starting point of cluster analysis is the data matrix

((x_ij)) ∈ M_{t×N}    (1)


The aim of cluster analysis is the cognition of qualitative diversities within the given set of objects. A unique solution cannot be expected at all, because the problem of mathematical cluster analysis is itself posed fuzzily. Moreover, the result is obviously also influenced by the choice of the similarity relation (or, equivalently, of the distance function).
Suppose the similarity structure of the object set is given directly by a similarity matrix

((µ_R(j, k))) ∈ M_{N×N}    (2)

Definition
A subset C(j, s_0) of {O_1, ..., O_N} is called an s_0-neighbourhood of O_j, iff
C(j, s_0) = {O_k, k ∈ {1, ..., N} : µ_R(j, k) ≥ s_0}
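This definition translates directly into code; a minimal sketch (the small similarity matrix S here is illustrative, not from the lecture data):

```python
def neighbourhood(S, j, s0):
    """Return the s0-neighbourhood C(j, s0) of object O_j: the indices k
    with similarity mu_R(j, k) >= s0 (O_j itself is always included,
    since mu_R(j, j) = 1)."""
    return {k for k in range(len(S)) if S[j][k] >= s0}

# Illustrative symmetric similarity matrix with 1.0 on the diagonal.
S = [[1.0, 0.9, 0.3],
     [0.9, 1.0, 0.4],
     [0.3, 0.4, 1.0]]

print(neighbourhood(S, 0, 0.8))  # {0, 1}
```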


Procedure 1
a) Choose a decreasing sequence of similarity degrees 1 ≥ s_1 > s_2 > ... > s_r ≥ 0
b) O = {O_1, ..., O_N}
c) Set s = s_1 and the starting object O_o = O_1
d) For s and O_o,
   C_1^0 = {O_k, k ∈ O : µ_R(o, k) ≥ s} forms the first cluster
e) O^1 = O \ C_1^0
f) If O^1 ≠ ∅ then choose an object O_v with O_v ∉ C_1^0 and µ_R(o, v) = min over O^1
g) For s and O_v,
   C_1^1 = {O_k, k ∈ O^1 : µ_R(v, k) ≥ s} forms the second cluster
h) O^2 = O \ (C_1^0 ∪ C_1^1)
i) If O^2 ≠ ∅ then choose an object O_w ∉ (C_1^0 ∪ C_1^1) with max{µ_R(w, o), µ_R(w, v)} = min over O^2, and continue until some O^m = ∅ is reached

Then C_1^0, C_1^1, ..., C_1^{m−1} is a suggestion for a cluster partition, and the procedure is repeated setting O_o = O_2 in (c), etc. When the procedure has been run for O_o = O_N, it is repeated setting s = s_2 in (c), etc., until, finally, s_r and O_N are reached.


Usually, in the first step for s_1, each object forms a cluster by itself, regardless of the starting object. In the course of running through s_l, l = 1, ..., r, some clusters will obtain more and more elements, and finally, for s_r, as a rule, all objects form only one joint cluster. Such cluster procedures are called hierarchical. The final choice of a cluster partition from the large set of suggestions then has to be made by cluster validation in the given practical context.


Example
In this example we again consider the wine data set and aim to find the most similar countries by clustering the data.

Table: Wine data set

Country          Liquor   Wine   Beer    Life Expectancy   Heart disease rate
France             2.5    63.5    40.1         78                 61.1
Italy              0.9    58.0    25.1         78                 94.1
Switzerland        1.7    46.0    65.0         78                106.4
Australia          1.2    46.0   102.1         78                173.0
Great Britain      1.5    12.2   100.0         77                199.7
United States      2.0     8.9    87.8         76                176.0
Russia             3.8     2.7    17.1         69                373.6
Czech Republic     1.0     1.7   140.0         73                283.7
Japan              2.1     1.0    55.0         79                 34.7
Mexico             0.8     0.2    50.4         73                 36.4


Example
The data is again first normalized to the unit interval, and then the similarity matrix between samples is computed.

S = | 1.00 0.94 0.90 0.84 0.72 0.73 0.42 0.46 0.76 0.68 |
    | 0.94 1.00 0.94 0.90 0.79 0.78 0.41 0.55 0.83 0.79 |
    | 0.90 0.94 1.00 0.94 0.87 0.87 0.51 0.67 0.90 0.83 |
    | 0.84 0.90 0.94 1.00 0.91 0.87 0.48 0.75 0.84 0.79 |
    | 0.72 0.79 0.87 0.91 1.00 0.95 0.61 0.84 0.88 0.87 |
    | 0.73 0.78 0.87 0.87 0.95 1.00 0.67 0.82 0.89 0.90 |
    | 0.42 0.41 0.51 0.48 0.61 0.67 1.00 0.54 0.48 0.58 |
    | 0.46 0.55 0.67 0.75 0.84 0.82 0.54 1.00 0.66 0.79 |
    | 0.76 0.83 0.90 0.84 0.88 0.89 0.48 0.66 1.00 0.86 |
    | 0.68 0.79 0.83 0.79 0.87 0.90 0.58 0.79 0.86 1.00 |


Example
Selecting s_0 = 0.8 and the first object O_1, we can form the first cluster C_1^0 = {O_1, O_2, O_3, O_4}.
Next we form the set O^1 = {O_5, O_6, O_7, O_8, O_9, O_10}.
Since O^1 ≠ ∅, we select the O_v which is in O^1 and has the smallest similarity to O_1. Now O_v = O_7.
Next we form the second cluster from those objects O_k which have S(O_7, O_k) ≥ 0.8 and O_k ∈ O^1. In this case C_1^1 = {O_7}.
To form O^2 we select those objects which are neither in C_1^0 nor in C_1^1 ⇒ O^2 = {O_5, O_6, O_8, O_9, O_10}.
Since O^2 ≠ ∅, we select the O_w which is in O^2 and has the smallest similarity to O_7 and O_1. Now O_w = O_9.


Example
From O^2, the objects with similarity greater than or equal to 0.8 to O_9 are selected into the next cluster C_1^2 = {O_5, O_6, O_9, O_10}.
To form O^3 we select those objects which are not in C_1^0, C_1^1 or C_1^2 ⇒ O^3 = {O_8}.
Since this set contains only one object, it forms its own cluster C_1^3 = {O_8}. After this O^4 = ∅ and all objects are clustered.
The data has thus been divided into four clusters: [{O_1, O_2, O_3, O_4}, {O_7}, {O_5, O_6, O_9, O_10}, {O_8}].
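The steps of Procedure 1, as traced in this example, can be sketched in code. A caveat: the seed-selection rule used below (the unassigned object whose largest similarity to the previous seeds is smallest) is one literal reading of steps (f) and (i); with it the third seed becomes O_8 rather than O_9, so the resulting partition differs slightly from the one traced above — as the notes that follow point out, the partition is sensitive to such choices. Function and variable names are ours.

```python
def similarity_clusters(S, s0, start=0):
    """Procedure 1 sketch: grow a cluster around a seed from all
    unassigned objects with similarity >= s0 to it, then pick the next
    seed as the unassigned object whose largest similarity to the
    previous seeds is smallest."""
    remaining = set(range(len(S)))
    seeds, clusters = [], []
    seed = start
    while remaining:
        seeds.append(seed)
        cluster = {k for k in remaining if S[seed][k] >= s0}
        clusters.append(sorted(cluster))
        remaining -= cluster
        if remaining:
            seed = min(remaining, key=lambda w: max(S[w][v] for v in seeds))
    return clusters

# Similarity matrix of the wine data (rows/columns 0..9 = O_1..O_10).
S = [
    [1.00, 0.94, 0.90, 0.84, 0.72, 0.73, 0.42, 0.46, 0.76, 0.68],
    [0.94, 1.00, 0.94, 0.90, 0.79, 0.78, 0.41, 0.55, 0.83, 0.79],
    [0.90, 0.94, 1.00, 0.94, 0.87, 0.87, 0.51, 0.67, 0.90, 0.83],
    [0.84, 0.90, 0.94, 1.00, 0.91, 0.87, 0.48, 0.75, 0.84, 0.79],
    [0.72, 0.79, 0.87, 0.91, 1.00, 0.95, 0.61, 0.84, 0.88, 0.87],
    [0.73, 0.78, 0.87, 0.87, 0.95, 1.00, 0.67, 0.82, 0.89, 0.90],
    [0.42, 0.41, 0.51, 0.48, 0.61, 0.67, 1.00, 0.54, 0.48, 0.58],
    [0.46, 0.55, 0.67, 0.75, 0.84, 0.82, 0.54, 1.00, 0.66, 0.79],
    [0.76, 0.83, 0.90, 0.84, 0.88, 0.89, 0.48, 0.66, 1.00, 0.86],
    [0.68, 0.79, 0.83, 0.79, 0.87, 0.90, 0.58, 0.79, 0.86, 1.00],
]

print(similarity_clusters(S, 0.8))  # first cluster is [0, 1, 2, 3]
```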


Note: The similarity degree s_0 used to form the partitioning affects the resulting clusters considerably. The closer s_0 is to 1, the more the objects are partitioned so that each forms its own cluster; the closer it is to zero, the more we end up with just one cluster.
Note: In this example we started from the first object O_1 when forming the first cluster. Selecting the starting object randomly can also change the partitioning somewhat.


If the similarity structure is specified by a distance, with the distance matrix

((d(j, k))) ∈ M_{N×N}    (3)

then the possibilities for constructing cluster analysis methods become much richer. This is of importance also for the case of clustering on the basis of similarity degrees, since there are ways of introducing distances via similarity degrees, e.g.

d(j, k) = 1 − µ_R(j, k)    (4)

defines an ultrametric, if R is max-min-transitive.


The simplest hierarchical clustering procedure (using distance):

Procedure 2
a) Choose a special distance d between objects and, with that, a distance D between clusters
b) Each object forms its own one-object cluster
c) The two clusters with the shortest distance are joined together
d) Step (c) is repeated for the updated set of clusters until either all objects have been joined into one cluster, or a prescribed threshold level, say d_0, is smaller than all remaining inter-cluster distances.
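A minimal sketch of Procedure 2, with the cluster distance D chosen as single linkage (the minimum pairwise object distance — one common choice; the names here are ours):

```python
from itertools import combinations

def D(A, B, d):
    """Single-linkage cluster distance: shortest object distance."""
    return min(d(a, b) for a in A for b in B)

def agglomerate(objects, d, d0):
    """Procedure 2 sketch: start from one-object clusters and repeatedly
    join the two closest clusters until all remaining inter-cluster
    distances exceed the threshold d0 (or one cluster is left)."""
    clusters = [[o] for o in objects]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: D(clusters[p[0]], clusters[p[1]], d))
        if D(clusters[i], clusters[j], d) > d0:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

d = lambda a, b: abs(a - b)
print(agglomerate([0.0, 0.1, 0.2, 5.0, 5.1], d, 1.0))
```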


Cluster procedures become more flexible if splitting of clusters is allowed as well. A rather elaborate procedure, usually given for a Euclidean distance, is called ISODATA, an abbreviation of Iterative Self-Organizing Data Analysis Technique. It was developed, and its name coined, by Ball and Hall in 1965.


Procedure 3
a) Choose an increasing sequence of natural numbers n_1, n_2, ..., n_m ∈ {2, 3, ..., N − 1}, a distance d among objects and a distance D among clusters
b) For each n = n_l, l = 1, ..., m, choose n objects and renumber the object set so that the chosen objects are O_1, ..., O_n
c) Construct n clusters C_1, ..., C_n according to the following principle:

∀j ∈ {n + 1, ..., N} : d(j, k) = min_{l ∈ {1,...,n}} d(j, l)    (5)

⇒ O_j belongs to the cluster C_k initiated by O_k
d) For each C_k compute a focal object, say Q_k with the fictive index q_k, according to the chosen distance d


e) Two clusters, C_k1 and C_k2, are joined together if their distance D(C_k1, C_k2) is smaller than a given threshold level D_0
f) A cluster C_k is partitioned if the inner dissimilarity

Σ_{O_j ∈ C_k} d(j, q_k) / card C_k    (6)

exceeds a given upper bound b_0 (this can be managed by the same procedure with n = 2)
g) For the clusters obtained according to (e) and (f), the focal objects are updated and the performance is repeated, running (e) and (f) until some stabilization of the clusters is observed.


Fuzzy clustering
The problem of cluster analysis consists in partitioning a given set of objects, say O_1, ..., O_N, into a number of clusters, say C_1, ..., C_n. Starting from a distance matrix ((d(j, k))), the objects are to be allocated to the clusters such that each object belongs to one and only one cluster. This can be expressed by a partition matrix

M = ((µ_i(j))) ∈ M_{n×N}    (7)

where µ_i(j) is the membership value of O_j in C_i, i.e.

∀i ∈ {1, ..., n} ∀j ∈ {1, ..., N} : µ_i(j) ∈ {0, 1}    (8)


and

∀j ∈ {1, ..., N} : Σ_{i=1}^{n} µ_i(j) = 1    (9)

Usually these conditions are completed by

∀i ∈ {1, ..., n} : 0 < Σ_{j=1}^{N} µ_i(j) < N    (10)

A partition matrix M satisfying the conditions (8), (9) and (10) is called a hard n-partition.
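Conditions (8)-(10) are easy to check mechanically; a small sketch (names ours):

```python
def is_hard_n_partition(M):
    """Check (8)-(10) for an n x N partition matrix M (list of rows):
    crisp entries, each object in exactly one cluster, and no cluster
    empty or containing all objects."""
    n, N = len(M), len(M[0])
    crisp = all(M[i][j] in (0, 1) for i in range(n) for j in range(N))  # (8)
    one = all(sum(M[i][j] for i in range(n)) == 1 for j in range(N))    # (9)
    size = all(0 < sum(M[i]) < N for i in range(n))                     # (10)
    return crisp and one and size

M = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
print(is_hard_n_partition(M))  # True
```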


Frequently the allocation of some object to one of the clusters is rather arbitrary, e.g. in transitional regions or if the object fits several clusters equally well. Hence allowing for gradual membership leads, in general, to more realistic statements. We therefore turn to fuzzy partitions:

∀i ∈ {1, ..., n} ∀j ∈ {1, ..., N} : µ_i(j) ∈ [0, 1]    (11)

It seems desirable that the hard n-partition be a particular case of fuzzy clustering. Hence Bezdek called a matrix M satisfying (11), (9) and (10) a fuzzy n-partition. When (10) is deleted, i.e. empty clusters are allowed, M is a degenerate fuzzy n-partition. With regard to the properties of the respective matrix sets, see the book by Bezdek (1981).


Since the membership can now vary continuously within [0, 1], it is possible to choose a suitable objective function, called a clustering criterion, and search for optimum partitions. These functions will be differentiable, as a rule, so we can use well-known numerical optimization methods (mainly gradient-based methods).
Ruspini (1972) considered, among others, the following criterion, which is said to work successfully in many real situations. Let M_fn^o be the set of degenerate fuzzy n-partitions; then Ruspini's objective function J_R : M_fn^o → R_+ has the form, for all M ∈ M_fn^o,

J_R(M) = Σ_{j=1}^{N} Σ_{l=1}^{N} { c_0 [Σ_{i=1}^{n} (µ_i(j) − µ_i(l))^2] − d^2(j, l) }^2    (12)

where c_0 > 0 is a fitting parameter and n ∈ {2, ..., N − 1} is to be fixed beforehand.

Ruspini interprets J_R as a measure of cluster quality based on local density, because J_R will be small when the terms in (12) are individually small; in turn, this will occur when close pairs of points have nearly equal fuzzy cluster memberships µ_i in M.
For J_R, Ruspini suggested an algorithm to compute optimum fuzzy n-partitions. This algorithm is an adaptation of the usual gradient method for finding approximations of local minima. The following procedure gives only an outline of this algorithm. For details and more comments see Ruspini (1972), especially with respect to the methods applied in (e).


Procedure 1
a) Choose a distance d and a natural number n ∈ {2, ..., N − 1}, which the number of clusters should not exceed.
b) Choose a matrix norm ||.|| in M_fn^o and a small number ε > 0.
c) Choose a matrix M^(0) ∈ M_fn, i.e. from the set of non-degenerate fuzzy n-partitions, for initializing the procedure.
For l = 0, 1, ...
d) Calculate c_0^(l) by minimizing J_R(M^(l)) with respect to c_0
e) By some method of optimization (e.g. an adaptation of the gradient method), calculate M^(l+1) for the fixed c_0^(l).
f) If ||M^(l) − M^(l+1)|| < ε then stop and suggest M^(l+1) as the partition; otherwise return to (d) with l + 1 instead of l


Fuzzy C-means (FCM) clustering

Pointing to some provisos to be made with respect to the Ruspini approach, Bezdek (1981) suggested an alternative criterion, starting from ISODATA and taking advantage of the widespread predilection for least-squares methods. Later on this came to be called fuzzy C-means clustering. The procedure requires the existence of an inner product norm metric over the whole feature space; usually this is a Euclidean norm. Let x_j denote the feature vector of O_j and µ_i(j) the membership value of O_j in C_i. Then for q ∈ (1, ∞)

v_i = Σ_{j=1}^{N} (µ_i(j))^q x_j / Σ_{j=1}^{N} (µ_i(j))^q    (13)

is called the q-mean of C_i (often later referred to as the c-mean).


In particular, any positive-definite matrix T ∈ M_{t×t} induces such a norm via a weighted inner product

⟨x, y⟩ = x^T T y ;  x, y ∈ R^t    (14)

Relative to this special class of norms, the Bezdek criterion for the objective function J_B can be written as

J_B(M, V, T) = Σ_{j=1}^{N} Σ_{i=1}^{n} (µ_i(j))^q ||v_i − x_j||_T^2    (15)

where V = (v_1, ..., v_n) and

d_ij^2 = ||v_i − x_j||_T^2 = (x_j − v_i)^T T (x_j − v_i)    (16)


In essence the procedure consists of three main steps, which are iterated until the membership matrix changes by less than ε or a maximum number of iterations is reached.
1. Compute the distances between the cluster centres v_i and the objects x_j, e.g. by d_ij = (Σ_{k=1}^{t} (x_jk − v_ik)^2)^{1/2}.
2. Update the membership matrix according to

µ_ij = [ Σ_{s=1}^{n} (d_ij / d_sj)^{2/(q−1)} ]^{−1}

3. Update the cluster centres

v_i = Σ_{j=1}^{N} (µ_ij)^q x_j / Σ_{j=1}^{N} (µ_ij)^q
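The three steps can be sketched as follows; a minimal sketch assuming a Euclidean distance and a random initial membership matrix, and ignoring the degenerate case d_ij = 0. Names are ours.

```python
import math
import random

def fcm(X, n, q=2.0, eps=1e-6, max_iter=100, seed=0):
    """Fuzzy C-means sketch: iterate centre update, distance computation
    and membership update until the membership matrix changes by less
    than eps (or max_iter is reached).  Euclidean distance is assumed,
    and the degenerate case d_ij = 0 is not handled."""
    rng = random.Random(seed)
    N, t = len(X), len(X[0])
    # Random initial fuzzy partition: columns sum to 1.
    M = [[rng.random() for _ in range(N)] for _ in range(n)]
    for j in range(N):
        col = sum(M[i][j] for i in range(n))
        for i in range(n):
            M[i][j] /= col
    for _ in range(max_iter):
        # cluster centres, as in (13)
        w = [[M[i][j] ** q for j in range(N)] for i in range(n)]
        V = [[sum(w[i][j] * X[j][k] for j in range(N)) / sum(w[i])
              for k in range(t)] for i in range(n)]
        # step 1: distances d_ij
        d = [[math.dist(V[i], X[j]) for j in range(N)] for i in range(n)]
        # step 2: membership update
        M_new = [[1.0 / sum((d[i][j] / d[s][j]) ** (2 / (q - 1))
                            for s in range(n)) for j in range(N)]
                 for i in range(n)]
        change = max(abs(M_new[i][j] - M[i][j])
                     for i in range(n) for j in range(N))
        M = M_new
        if change < eps:
            break
    return V, M

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
V, M = fcm(X, 2)
labels = [max(range(2), key=lambda i: M[i][j]) for j in range(len(X))]
print(labels)  # the two well-separated groups get distinct labels
```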


Procedure 2, a more detailed algorithm for FCM

a) Choose an inner product norm metric d and a natural number n ∈ {2, ..., N − 1}, which the number of clusters should not exceed.
b) Choose a number q ∈ (1, ∞)
c) Choose a matrix norm ||.|| in M_fn^o and a small number ε > 0
d) Choose a matrix M^(0) ∈ M_fn for initializing the procedure.
For l = 0, 1, 2, ...
e) Calculate the n cluster centres {v_i^(l)} with (13) for M = M^(l).
f) Update M^(l) using {v_i^(l)}:
Define

I_j^(l) = {i ∈ {1, ..., n} : d_ij^(l) = d(v_i^(l), x_j) = 0}

Ī_j^(l) = {1, ..., n} \ I_j^(l)

f) If I_j^(l) = ∅ then

µ_i^(l+1)(j) = [ Σ_{s=1}^{n} (d_ij^(l) / d_sj^(l))^{2/(q−1)} ]^{−1}

If I_j^(l) ≠ ∅ then

∀i ∈ Ī_j^(l) : µ_i^(l+1)(j) = 0  and  Σ_{i ∈ I_j^(l)} µ_i^(l+1)(j) = 1

g) If ||M^(l) − M^(l+1)|| < ε then stop and suggest M^(l+1) as the partition; otherwise return to (e) with l + 1 instead of l.


Example
Consider the following five samples in x, with a first arbitrary partitioning matrix M, and let the number of clusters be two. Assume fuzziness parameter q = 2 and the distance to be the Minkowski distance with p = 1.

x = | 1 1 |     M = | 1.0 0.0 |
    | 3 2 |         | 0.0 1.0 |
    | 1 4 |         | 0.0 1.0 |
    | 1 2 |         | 0.9 0.1 |
    | 3 2 |         | 0.8 0.2 |

Computing the first iteration of the cluster centres by v_i = Σ_{j=1}^{N} (µ_ij)^q x_j / Σ_{j=1}^{N} (µ_ij)^q, we get v_1 = [1.59, 1.63] and v_2 = [2.04, 2.87].

Example
Next we calculate the distances between the cluster centres and the data samples by using d_ij = Σ_{k=1}^{t} |x_jk − v_ik|.

d = | 1.22 2.91 |
    | 1.78 1.83 |
    | 2.96 2.17 |
    | 0.96 1.91 |
    | 1.78 1.83 |

Example
After computing the distance matrix, we next update the partitioning matrix by µ_ij = [ Σ_{s=1}^{n} (d_ij / d_sj)^{2/(q−1)} ]^{−1}

M^2 = | 0.85 0.15 |
      | 0.51 0.49 |
      | 0.35 0.65 |
      | 0.80 0.20 |
      | 0.51 0.49 |

Next we compare the current partitioning matrix M^2 with the previous one, M^1, and compute the distance between them. If ||M^(l) − M^(l+1)|| < ε we stop; otherwise we start the process again by computing new cluster centres using the current partitioning matrix. Choosing the distance as Minkowski with p = 1 and ε = 0.00001, we see that the stopping criterion is not met, so we repeat the computations.

Example
Repeating this process leads to the cluster centres v_1 = [2.62, 2.08] and v_2 = [1.06, 2.31] and the partitioning matrix

M^26 = | 0.20 0.79 |
       | 0.96 0.04 |
       | 0.20 0.80 |
       | 0.04 0.96 |
       | 0.96 0.04 |

Based on this, samples {2, 5} have the highest membership degree in the first cluster and samples {1, 3, 4} in the second cluster.


As theoretical considerations show, the larger q is, the "fuzzier" the membership assignments are; conversely, as q converges to 1, fuzzy c-means partitions become hard ones.
Although the procedures presented do work well in many practical problems, when controlled by useful knowledge from the applying branch of science or engineering, difficulties have also been pointed out, e.g. in finding suitable initializations and parameters for the algorithm. Researchers have also commented on the optimality of the suggested partitions: stationary points of an objective function are not necessarily local minima; there is no assurance that even a global optimum of an objective function is a "good" clustering; and different choices of algorithmic parameters may yield different "optimum" partitions.


Classification
The problem of classification is basically one of partitioning
the feature space into regions, one region for each
category. Ideally, one would like to arrange this partitioning
so that none of the decisions are ever wrong.
When this cannot be done, one would like to minimize the
probability of error.


Usually, when we start to classify, we merely have some vague, general knowledge about the situation, together with a number of design samples, particular representatives of the patterns we want to classify. The problem, then, is to find some way to use this information to design the classifier.
It is difficult to say sharply to which category different classifiers belong. Many use several different methods, and it might be difficult to say into which category each method falls. In classification, learning means that the algorithm usually learns from samples of how the classification should be done, and can then classify data sets with similar problems.


One way to try to categorize the different types of classification methods would be:
1. Supervised and unsupervised learning methods
2. Parametric and nonparametric approaches
3. Mathematical learning methods
   Learning methods that use probability
   Learning methods that use logic
   Topological spaces
4. Structural classification
   Using trees in classification
   Hidden Markov Models
5. Generalized classification methods from nature and real life
   Neural network classification
   Functional networks
   Fuzzy classification

Learning or adaptation is supervised when there is a desired response that can be used by the system to guide the learning. If the learner is given a set of examples, and each example shows what output will be returned for a given input, then this type of learning can be classified as supervised learning.
Notice that the categorization is not strict: one classification method can easily belong to several groups. E.g. the similarity classifier is a supervised learning method and a parametric approach, is a learning method that uses logic, and can also be considered a generalized classification method originating from nature and real life.


The classification process in general:

In classification we have the information about which class each sample belongs to.
We use this information to fit our classification method to perform the classification task on the data set at hand as well as possible.
To do this we need to divide our data into a training set and a testing set and apply a proper cross-validation technique.
The most popular cross-validation techniques are
   n-fold cross-validation
   the leave-one-out technique
   the hold-out technique.


The training set is used to fit our classifier to our data set as well as possible; we use the class information of the training samples to do this.
The testing set is used to test how well our method managed to classify the particular problem (without using the class information of the samples).
After we have the classification results from the classifier (class values for the test set samples), we can use the test set class information to measure how good the actual classification is.


Evaluation measures for classification results.

The most often used measure is classification accuracy (Accuracy = no. of correctly classified samples / no. of all samples). Besides this, especially for binary classification tasks, several other measures exist.
For example, assume a binary classification task with two classes {Positive, Negative}, i.e. {Sick, Healthy}. We can divide the classification result versus the actual state into four parts:
True Positive (TP): Correctly identified positive condition, i.e. sick people correctly identified as sick.
False Positive (FP): Healthy people incorrectly identified as sick.
True Negative (TN): Healthy people correctly identified as healthy.
False Negative (FN): Sick people incorrectly identified as healthy.

From {TP, FP, TN, FN} we can construct several measures to evaluate how good the classification results are. Besides accuracy, the two most widely used ones are:
Sensitivity: Sensitivity refers to the classifier's ability to correctly detect patients who do have the condition.
Sensitivity = number of true positives / (number of true positives + number of false negatives) = TP / (TP + FN) = probability of a positive test given that the patient has the disease.
Specificity: Specificity relates to the classifier's ability to correctly detect patients without the condition.
Specificity = number of true negatives / (number of true negatives + number of false positives) = TN / (TN + FP) = probability of a negative test given that the patient is well.

Other often applied measures:

Precision or positive predictive value (PPV): PPV = TP / (TP + FP)
Negative predictive value (NPV): NPV = TN / (TN + FN)
False positive rate (FPR): FPR = FP / (FP + TN)
False negative rate (FNR): FNR = FN / (TP + FN)
False discovery rate (FDR): FDR = FP / (TP + FP)
Accuracy: Acc = (TP + TN) / (TP + FP + FN + TN)
One other measure, often computed to get a better understanding of the data at hand, is
Prevalence = no. of condition positive / total population
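These count-based measures can be collected into one small helper (names ours):

```python
def classification_measures(TP, FP, TN, FN):
    """Compute the evaluation measures above from the four counts of a
    binary confusion matrix (all counts assumed positive)."""
    return {
        "accuracy":    (TP + TN) / (TP + FP + FN + TN),
        "sensitivity": TP / (TP + FN),
        "specificity": TN / (TN + FP),
        "PPV":         TP / (TP + FP),
        "NPV":         TN / (TN + FN),
        "FPR":         FP / (FP + TN),
        "FNR":         FN / (TP + FN),
        "FDR":         FP / (TP + FP),
        "prevalence":  (TP + FN) / (TP + FP + FN + TN),
    }

m = classification_measures(TP=40, FP=10, TN=45, FN=5)
print(m["accuracy"])  # 0.85
```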


Fuzzy classification methods

Next, as this course is on fuzzy data analysis, our focus turns to classifiers related to fuzzy set theory.
The range of fuzzy classifiers is quite wide nowadays. To give some kind of overview, the following is a list of a few of the more widely used classification methods that use fuzzy logic:
Fuzzy K-nearest Neighbor
Fuzzy ARTMAP
Fuzzy OARTMAP
Evidential distance based classifier
NEFCLASS - Neuro-fuzzy classification
Fuzzy Discrimination analysis
Fuzzy rule based classification system
Fuzzy classification using self-organizing maps and learning vector quantization
Fuzzy Support Vector Machine

Fuzzy K-nearest Neighbor classifier

The conventional K-NN is a simple algorithm that assigns a pattern of unknown classification to the class of the majority of its K nearest neighbors of known classification, according to a distance measure. One drawback of the method is that each of the patterns of known classification is considered equally important in the assignment of the pattern to be classified. This can cause difficulties in regions where pattern data overlap. To overcome this drawback, a fuzzy version of the K-NN was proposed in 1985. In this method, the assigned memberships play a role in the amount of contribution of a pattern during the classification process. As a result, a selected pattern that has low membership makes only a small contribution to the classification of a pattern. Therefore, this can allow the misclassification rate to decrease even for class pattern data that overlap. However, the result may still be sensitive to the selection of K.


Nearest neighbor classifiers require no preprocessing of the labeled pattern set prior to their use. The crisp 1-nearest neighbor classification rule assigns a given input pattern, which is of unknown classification, to the class of its nearest neighbor. This idea can be extended to K nearest neighbors, with the given pattern being assigned to the class that is represented by a majority among the K nearest neighbors.


This can be summarized as follows:

Let W = {x_1, ..., x_n}
BEGIN
  Input y, of unknown classification
  Set K, 1 ≤ K ≤ n
  Initialize i = 1
  DO UNTIL (K-nearest neighbors are found)
    Compute distance from y to x_i
    IF (i ≤ K) THEN
      Include x_i in the set of K-nearest neighbors
    ELSE IF (x_i is closer to y than any previous nearest neighbor) THEN
      Delete the farthest pattern in the set of K-nearest neighbors
      Include x_i in the set of K-nearest neighbors
    END IF
    Increment i
  END DO UNTIL


Determine the majority class represented in the set of K-nearest
neighbors
IF (a tie exists) THEN
Compute sum of distances of neighbors in each class that tied
IF (no tie occurs) THEN
Classify y in the class of minimum sum
ELSE
Classify y in the class of last minimum found
END IF
ELSE
Classify y in the majority class
END IF
END


The fuzzy K-NN algorithm


The fuzzy K-nearest neighbor algorithm assigns class
membership to a pattern rather than assigning the pattern to a
particular class. The membership values for the pattern should
provide a level of assurance to accompany the resultant
classification. The basis of the algorithm is to assign
membership as a function of the pattern's distance from its
K-nearest neighbors and those neighbors' memberships in the
possible classes. The assigned membership of the pattern x is
computed as

$$u_i(x) = \frac{\sum_{j=1}^{K} u_{ij}\left(1/\|x - x_j\|^{2/(m-1)}\right)}{\sum_{j=1}^{K}\left(1/\|x - x_j\|^{2/(m-1)}\right)} \qquad (17)$$

where uij is the membership in the ith class of the jth pattern of
the labeled pattern set.

As seen in (17), the assigned memberships of pattern x are
influenced by the inverse of the distances from the nearest
neighbors and their class memberships. The inverse distance
gives more weight to a neighbor's membership if it is closer to,
and less if it is farther from, the pattern under consideration. In
addition, the labeled patterns can be assigned class
memberships in several ways. One reasonable membership
assignment in each class can be computed as

$$u_j = \begin{cases} 0.51 + (n_j/K)\cdot 0.49 & \text{if } j = i \\ (n_j/K)\cdot 0.49 & \text{if } j \neq i \end{cases} \qquad (18)$$

where nj denotes the number of neighbors which belong to the
jth class.


Let W = {x1 , · · · , xn } be a set of n labeled patterns.


BEGIN
INPUT x, of unknown classification
Set K, 1 ≤ K ≤ n
Initialize i = 1
DO UNTIL (K-nearest neighbors to x found)
Compute distance from x to xi
IF (i ≤ K ) THEN
Include xi in the set of K-nearest neighbors
ELSE IF (xi is closer to x than any previous nearest neighbor) THEN
Delete the farthest pattern in the set of K-nearest neighbors;
Include xi in the set of K-nearest neighbors;
END IF
Increment i
END DO UNTIL
Initialize i = 1
DO UNTIL (x assigned membership in all classes)
Compute ui (x) using (17)
Increment i
END DO UNTIL
END
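A minimal sketch of the fuzzy K-NN assignment, assuming Euclidean distance, m > 1, and the crisp-count initialization of the neighbor memberships from Eq. (18) (the names are illustrative):

```python
import math

def fuzzy_knn(x, patterns, labels, classes, k, m=2):
    """Fuzzy K-NN: distance-weighted class memberships for x, Eq. (17)."""
    # k nearest labeled neighbors of x (Euclidean distance assumed)
    neigh = sorted(
        (math.dist(x, p), lbl) for p, lbl in zip(patterns, labels)
    )[:k]
    # Crisp-count initialization of neighbor memberships, Eq. (18)
    counts = {c: sum(1 for _, lbl in neigh if lbl == c) for c in classes}
    def u(c, own):
        base = 0.49 * counts[c] / k
        return 0.51 + base if c == own else base
    # Eq. (17): inverse-distance weighting with exponent 2/(m-1), m > 1
    memberships = {}
    for c in classes:
        num = den = 0.0
        for d, own in neigh:
            w = (1.0 / max(d, 1e-12)) ** (2.0 / (m - 1))
            num += u(c, own) * w
            den += w
        memberships[c] = num / den
    return memberships
```

The returned dictionary carries a membership in every class, so a hard decision, when needed, is simply the class of maximum membership.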


Similarity based classifier


We would like to classify a set X of objects into N different
classes C1 , . . . , CN by their features. We suppose that D is the
number of different kinds of features f1 , . . . , fD that we can
measure from the objects. We suppose that the value of each
feature is normalized, so that it can be presented as a value in
[0, 1]. So, the objects we want to classify are basically vectors
that belong to [0, 1]^D.
First one must determine for each class the ideal vector
vi = (vi (1), . . . , vi (D)) that represents class i as well as
possible. This vector can be user defined or calculated from
some sample set Xi of vectors x = (x(1), . . . , x(D)) which are
known to belong to class Ci .


We can use e.g. the generalized mean for calculating vi , which
is

$$v_i(d) = \left(\frac{1}{\sharp X_i}\sum_{x \in X_i} x(d)^m\right)^{1/m}, \quad \forall d = 1, \ldots, D \qquad (19)$$

where the power value m is fixed for all i, d and ♯Xi denotes the
cardinality of Xi . Other popular ways are applying the Bonferroni
mean, OWA operators, or optimizing vi .


Once the ideal vectors have been determined, the decision as to
which class an arbitrarily chosen x ∈ X belongs is made by
comparing it to each ideal vector. The comparison can be done
e.g. by using similarity in the generalized Łukasiewicz structure

$$S\langle x, v_i\rangle = \left(\frac{1}{D}\sum_{d=1}^{D} w_d\left(1 - |x(d)^p - v_i(d)^p|\right)^{m/p}\right)^{1/m} \qquad (20)$$

for x, vi ∈ [0, 1]^D. Here p is a parameter coming from the
generalized similarity measure and wd are feature weights.


We decide that x ∈ Cm if

$$S\langle x, v_m\rangle = \max_{i=1,\ldots,N} S\langle x, v_i\rangle. \qquad (21)$$


Algorithm
The method starts with normalization of the data. After this, an
ideal vector (here, the mean vector) is calculated for every class.
To classify a sample, the similarity value between each ideal
vector and the test vector is calculated, and the sample is
classified into the class with the highest similarity value. The
method gets the test element test (dimension dim), the learning
set learn (dimension dim, n different classes) and the dimension
(dim) of the data as its parameters. In addition, different weights
for the features can be set in weights, and the values p and m
can be set for the similarity measure.


The method is presented in pseudo-code form in the following:


Require: test, learn[1, . . . , n], weights, dim
scale test between [0, 1]
scale learn between [0, 1]
for i = 1 to n do
    idealvec[i] = IDEAL[learn[i]]
    maxsim[i] = $\left(\frac{1}{dim}\sum_{j=1}^{dim} weights[j]\left(1 - |idealvec[i][j]^p - test[j]^p|\right)^{m/p}\right)^{1/m}$
end for
class = arg maxi maxsim[i]
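Equations (19)–(21) together can be sketched as follows (the names are assumptions, and the inputs are assumed already scaled to [0, 1]):

```python
def ideal_vectors(samples_by_class, m=1.0):
    """Class prototypes via the generalized mean, Eq. (19)."""
    ideals = {}
    for c, xs in samples_by_class.items():
        dim = len(xs[0])
        ideals[c] = [
            (sum(x[d] ** m for x in xs) / len(xs)) ** (1.0 / m)
            for d in range(dim)
        ]
    return ideals

def similarity(x, v, weights=None, p=1.0, m=1.0):
    """Generalized Lukasiewicz similarity, Eq. (20); x, v in [0, 1]^D."""
    D = len(x)
    w = weights or [1.0] * D
    s = sum(
        w[d] * (1.0 - abs(x[d] ** p - v[d] ** p)) ** (m / p)
        for d in range(D)
    )
    return (s / D) ** (1.0 / m)

def classify(x, ideals, **kw):
    """Decision rule, Eq. (21): the class of the most similar ideal vector."""
    return max(ideals, key=lambda c: similarity(x, ideals[c], **kw))
```

With m = p = 1 and unit weights this reduces to the plain mean of feature-wise Łukasiewicz similarities, which is the case used in the worked example later in this section.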


The next, very simplified example will show the basic idea of comparing
samples and ideal vectors.
Example
As a simplified example to demonstrate the basic idea: we have the following
data of six samples, each having four measured values (f1 ,...,f4 ), in Table 10,
and we know which class they belong to (class indicated in the fifth column c). We
get a new sample x = [69 31 51 20] without knowledge of which
class it belongs to. Compute, using similarity, to which class the sample
belongs.

f1 f2 f3 f4 c
64 32 45 15 1
69 31 49 15 1
74 28 61 19 2
79 38 64 20 2
51 35 14 2 3
49 30 14 2 3


Example
All values are positive. Maximum values for the features:
[79, 38, 64, 20]
Computing the mean vectors v1 , v2 , v3 for the classes:
v1 = [(64+69)/2, (32+31)/2, (45+49)/2, (15+15)/2] = [66.5, 31.5, 47, 15]
v2 = [(74+79)/2, (28+38)/2, (61+64)/2, (19+20)/2] = [76.5, 33, 62.5, 19.5]
v3 = [(51+49)/2, (35+30)/2, (14+14)/2, (2+2)/2] = [50, 32.5, 14, 2]
Compute the total similarity values between the sample and the
mean vectors that represent the classes:
Assuming p = 1 (and m = 1, equal weights)
S(x, v1 ) = 1/4(1 − |69/79 − 66.5/79| + 1 − |31/38 − 31.5/38| + 1 −
|51/64 − 47/64| + 1 − |20/20 − 15/20|) = 0.9107
S(x, v2 ) = 1/4(1 − |69/79 − 76.5/79| + 1 − |31/38 − 33/38| + 1 −
|51/64 − 62.5/64| + 1 − |20/20 − 19.5/20|) = 0.9119
S(x, v3 ) = 1/4(1 − |69/79 − 50/79| + 1 − |31/38 − 32.5/38| + 1 −
|51/64 − 14/64| + 1 − |20/20 − 2/20|) = 0.5605

Example
Now our sample has a similarity value of 0.9107 to class one,
0.9119 to class two and 0.5605 to class three. We make the
decision as to which class the sample belongs according to the
highest similarity value. The highest similarity value is 0.9119,
which was obtained when the new sample was compared to the
second class vector, so we decide that this sample belongs to
class two. Notice that if you do this using e.g. the Minkowski
metric you will end up classifying this sample wrongly.
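The arithmetic of the example can be checked with a few lines, computing the class means directly from Table 10 (p = m = 1, equal weights; the variable names are illustrative):

```python
data = {
    1: [[64, 32, 45, 15], [69, 31, 49, 15]],
    2: [[74, 28, 61, 19], [79, 38, 64, 20]],
    3: [[51, 35, 14, 2], [49, 30, 14, 2]],
}
x = [69, 31, 51, 20]
maxima = [79, 38, 64, 20]           # feature-wise maxima used for scaling

def sim(a, b):                      # Eq. (20) with p = m = 1, unit weights
    return sum(1 - abs(ai / M - bi / M)
               for ai, bi, M in zip(a, b, maxima)) / len(a)

# Mean vector per class, then similarity of x to each mean
means = {c: [sum(col) / len(col) for col in zip(*xs)]
         for c, xs in data.items()}
sims = {c: sim(x, v) for c, v in means.items()}
winner = max(sims, key=sims.get)    # class with the highest similarity
```

Running this reproduces the decision: class two wins, with class one a close second and class three clearly less similar.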


Notice also that in this example we had a three-class
classification problem and only six samples. Still we managed
to get it right. This indicates that the classifier is capable of
working with data sets having a very small number of samples.
This was also shown in a study related to bladder cancer
classification, where the similarity classifier managed to classify
bladder cancer data 100% correctly using only four samples:
two patients with cancer and two healthy patients. The
similarity classifier is a fast and accurate tool for medical
diagnosis, and it is capable of accurate performance already
with a limited amount of data. This is quite important because
there is a very limited number of techniques available to
deal with such small sample sizes, and especially in the
diagnosis of cancer, high diagnostic accuracy is most important.


Desirable properties that a pattern classifier should possess according to (Simpson 1992):
1. Learn the required task quickly
2. Learn new data without having to retrain with old data (on-line
adaptation)
3. Solve non-linearly separable problems
4. Provide the capability for soft and hard decisions regarding the
degree of membership of the data within each class
5. Offer explanations of how the data are classified, and why the
data are classified as such
6. Exhibit performance that is independent of parameter tuning
7. Function without knowledge of the distributions of the data in
each class
8. For overlapping pattern classes, create regions in the space of
the input parameters that exhibit the least possible overlap

If we go into the details of these properties and compare the
classifier against the list, we can see that the classifier
satisfies most of the properties.
The method is capable of learning the required task quickly.
On-line adaptation is not included, but it is one area of future
work.
The classifier is able to solve non-linearly separable problems.
It provides a partial membership for each class through the
similarity concept, but so far the capability for both soft and
hard decisions is not implemented in the classifier.
It offers explanations of how the data are classified,
and why the data are classified as such. This is again
thanks to the similarity concept.


As can be seen from the results, the classifier can exhibit
performance that is largely independent of parameter tuning:
simply setting suitable p and m values seems to work well.
In the classifier algorithm, the distribution of the data is not
needed.
For overlapping pattern classes, regions in the space of
the input parameters that exhibit the least possible overlap
are created by finding the suitable p and m values for each
task separately.


[Figure 1: Iris data similarity values. Average similarity value (y-axis, 0.7–1) plotted against the power value (x-axis, 0–50) for the class pairs H−H, A−H, G−G, A−G and A−A.]

Figure: Average similarities with generalized similarity measure
without weight optimization

In Figure 1, average similarity values for the different combinations
are shown.

In Figure 2 one can see how changing the parameters p and
m affects the classification results.

[Figure 2: Surface plots of classification % against the p value and the mean value m, for a) Iris data and b) Thyroid data.]

Figure: Classification results plotted with respect to mean and p
values: a) Iris b) Thyroid


[Figure 3: Surface plots of classification % against the p value and the mean value m, for a) Dermatology data and b) BC1 (breast cancer) data.]

Figure: Classification results plotted with respect to mean and p
values: a) Dermatology b) Breast cancer data


Other ways to construct similarity measures


Lowen introduced several ways in which to generalize the
classical truth functions to the fuzzy case. He introduced the
following way to derive equivalence relations from T-norms,
S-norms and negation:

E(x, y ) = T (Sn(x̄, y ), Sn(x, ȳ )) (22)

where T denotes a T-norm, Sn denotes a T-conorm (S-norm)
and x̄ the complement of x.


Example
Yu’s T-norm and S-norm are as follows:

T (x, y ) = max[0, (1 + λ) (x + y − 1) − λxy ] (23)


and

Sn(x, y ) = min[1, x + y + λxy ] (24)


where λ > −1.


Example

Shx, vi = max(0, (1 + λ)(Sn(x̄, v) + Sn(x, v̄) − 1) − λSn(x̄, v)Sn(x, v̄)) (25)

where
Snhx, vi = min(1, x + v + λxv) (26)
and negation is denoted x̄ = 1 − x, for x, v ∈ [0, 1].
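Equations (22)–(26) can be sketched as follows (λ = 0.5 is an arbitrary choice for illustration):

```python
def yu_t(a, b, lam):
    """Yu's T-norm, Eq. (23); requires lam > -1."""
    return max(0.0, (1 + lam) * (a + b - 1) - lam * a * b)

def yu_s(a, b, lam):
    """Yu's S-norm (T-conorm), Eq. (24)."""
    return min(1.0, a + b + lam * a * b)

def equivalence(x, v, lam=0.5):
    """E(x, v) = T(Sn(1-x, v), Sn(x, 1-v)), Eq. (22) with Yu's norms."""
    return yu_t(yu_s(1 - x, v, lam), yu_s(x, 1 - v, lam), lam)
```

The measure behaves as expected for an equivalence: identical values give high similarity, opposite values give zero.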


Adaptive neuro-fuzzy inference system (ANFIS)


An adaptive neuro-fuzzy inference system is an artificial
neural network that is based on the Takagi-Sugeno fuzzy
inference system.
It is a universal approximator that can be used not only in
control problems, but also in classification and regression
analysis.


The ANFIS architecture is composed of five layers.


The first layer takes the input values and determines the
membership functions belonging to them. It is commonly
called the fuzzification layer.
The second layer (rule layer) is responsible for generating the
firing strengths of the rules.
The role of the third layer is to normalize the computed
firing strengths, by dividing each value by the total firing
strength.
The fourth layer takes as input the normalized values and
the consequent parameter set. The values returned by
this layer are the defuzzified ones, and they are passed
to the fifth layer.
The fifth layer's task is to return the final output.
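The five layers can be traced in a minimal forward pass, assuming a first-order Sugeno model with Gaussian membership functions and a product T-norm (the parameters and names are illustrative; training of the premise and consequent parameters is not shown):

```python
import math

def anfis_forward(x1, x2, mfs1, mfs2, consequents):
    """One forward pass through the five ANFIS layers (first-order Sugeno).

    mfs1/mfs2: (center, sigma) Gaussian MF parameters per input;
    consequents: (p, q, r) per rule; rules are formed as all MF pairs.
    """
    gauss = lambda x, c, s: math.exp(-((x - c) ** 2) / (2 * s ** 2))
    # Layer 1: fuzzification - membership degrees of each input
    mu1 = [gauss(x1, c, s) for c, s in mfs1]
    mu2 = [gauss(x2, c, s) for c, s in mfs2]
    # Layer 2: rule firing strengths (product T-norm over all MF pairs)
    w = [a * b for a in mu1 for b in mu2]
    # Layer 3: normalization by the total firing strength
    total = sum(w)
    wn = [wi / total for wi in w]
    # Layer 4: weighted first-order consequents f_i = p*x1 + q*x2 + r
    f = [wni * (p * x1 + q * x2 + r)
         for wni, (p, q, r) in zip(wn, consequents)]
    # Layer 5: the overall output is the sum
    return sum(f)
```

Because layer 3 normalizes the firing strengths to sum to one, constant consequents simply reproduce that constant at the output, which is a handy sanity check.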


Figure: Adaptive neuro-fuzzy inference system (ANFIS)
