Machine Learning - Conceptual Clustering - 4/27/2019

Cobweb is an incremental conceptual clustering algorithm that builds a classification tree by adding examples one at a time. It uses category utility as a heuristic measure to determine whether to classify an example into an existing class, create a new class, or modify the tree structure through merging or splitting nodes. The algorithm aims to maximize intra-class similarity while maximizing inter-class dissimilarity at each step.

Uploaded by

Hamza Jutt

Conceptual Clustering

• Unsupervised, spontaneous - categorizes or postulates
concepts without a teacher
• Conceptual clustering forms a classification tree - all
initial observations in root - create new children using
single attribute (not good), attribute combinations (all),
information metrics, etc. - Each node is a class
• Should decide quality of class partition and significance
(noise)
• Many models use search to discover hierarchies which
fulfill some heuristic within and/or between clusters -
similarity, cohesiveness, etc.
Cobweb

• Cobweb is an incremental hill-climbing strategy with bidirectional
operators - it does not backtrack, but the operators could in theory
return it to an earlier state
• Starts empty. Creates a full concept hierarchy (classification tree) with
each leaf representing a single instance/object. You can choose how deep
in the tree hierarchy you want to go for the specific application at hand
• Objects described as nominal attribute-value pairs
• Each created node is a probabilistic concept (a class) which stores the
probability of being matched (count/total) and, for each attribute, the
probability of each value, P(a=v|C); only counts need be stored.
• Arcs in tree are just connections - nodes store info across all attributes
(unlike ID3, etc.)
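
As a minimal sketch of the node description above, the following shows one way a probabilistic concept could keep only counts and derive its probabilities on demand. The class name `ConceptNode` and its methods are hypothetical and chosen for illustration; they are not taken from any Cobweb implementation.

```python
class ConceptNode:
    """Hypothetical probabilistic concept; all names here are illustrative."""

    def __init__(self):
        self.count = 0        # number of instances matched to this node
        self.av_counts = {}   # attribute -> {value -> count}
        self.children = []    # arcs are plain links to child nodes

    def add(self, instance):
        """Update the counts with one instance (a dict of attribute -> value)."""
        self.count += 1
        for attr, value in instance.items():
            by_value = self.av_counts.setdefault(attr, {})
            by_value[value] = by_value.get(value, 0) + 1

    def p_match(self, total):
        """Probability of being matched: count / total."""
        return self.count / total

    def p_value(self, attr, value):
        """P(attr = value | this concept), derived from the stored counts."""
        if self.count == 0:
            return 0.0
        return self.av_counts.get(attr, {}).get(value, 0) / self.count


node = ConceptNode()
node.add({"color": "red", "size": "big"})
node.add({"color": "red", "size": "small"})
print(node.p_value("color", "red"))  # 1.0
print(node.p_value("size", "big"))   # 0.5
```

Because every node keeps counts for all attributes, adding an instance only increments counters along the path it follows down the tree.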


Category Utility: Heuristic Measure

• Tradeoff between intra-class similarity and inter-class
dissimilarity - sums measures from each individual
attribute
• Intra-class similarity is a function of P(Ai = Vij|Ck), the
predictability of V given C - larger P means if the class is
C, A is likely to be V. Objects within a class should have
similar attributes.
• Inter-class dissimilarity is a function of P(Ck|Ai = Vij), the
predictiveness of V for C - larger P means A=V
suggests the instance is a member of class C rather than some
other class. A=V is then a stronger predictor of class C.


Category Utility Intuition

• Both should be high over all (most) attributes for a
good class breakdown
– Predictability: P(V|C) could be high for multiple
classes, giving a relatively low P(C|V), thus not
good for discrimination
– Predictiveness: P(C|V) could be high for a class,
while P(V|C) is relatively low, due to V occurring
rarely, thus good for discrimination, but not intra-
class similarity
– When both are high, get best categorization balance
between discrimination and intra-class similarity
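
The two cases above can be checked on a small made-up data set. This is only an illustrative sketch: the class labels, attribute values, and function names are assumptions, and each list holds (class, value) pairs for a single attribute.

```python
def predictability(pairs, cls, value):
    """P(value | cls): how often members of cls have this value."""
    in_class = [v for c, v in pairs if c == cls]
    return in_class.count(value) / len(in_class)


def predictiveness(pairs, cls, value):
    """P(cls | value): how often this value indicates cls."""
    with_value = [c for c, v in pairs if v == value]
    return with_value.count(cls) / len(with_value)


# Every bird and every mammal is warm-blooded (hypothetical data).
blood = [("bird", "warm")] * 4 + [("mammal", "warm")] * 4
print(predictability(blood, "bird", "warm"))   # 1.0  high P(V|C)
print(predictiveness(blood, "bird", "warm"))   # 0.5  low P(C|V)

# A rare value that occurs in only one class (hypothetical data).
color = [("bird", "brown")] * 4 + [("mammal", "brown")] * 3 + [("mammal", "white")]
print(predictiveness(color, "mammal", "white"))  # 1.0   high P(C|V)
print(predictability(color, "mammal", "white"))  # 0.25  low P(V|C)
```

The first attribute says nothing about which class an instance belongs to, and the second describes only a quarter of its class, so neither alone makes a good category.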
Category Utility

• For each category, sum predictability times predictiveness for
each attribute value, weighted by P(Ai = Vij), with k proposed categories,
i attributes, j values/attribute:

$$\sum_{k=1}^{K} \sum_i \sum_j P(A_i = V_{ij})\, P(C_k \mid A_i = V_{ij})\, P(A_i = V_{ij} \mid C_k)$$

Bayes Rule - $P(A_i = V_{ij})\, P(C_k \mid A_i = V_{ij}) = P(C_k)\, P(A_i = V_{ij} \mid C_k)$

Thus,

$$\sum_{k=1}^{K} P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$$

The expected number of attribute values one could guess given C_k:

$$\sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$$
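
The Bayes-rule rewrite can be sanity-checked numerically. The sketch below estimates every probability from counts on a made-up four-instance data set (labels and attributes are assumptions) and confirms that the weighted predictability-times-predictiveness sum equals the P(Ck)-weighted sum of squared conditional probabilities.

```python
from collections import Counter
from math import isclose

# Hypothetical toy data: (class label, attribute -> value)
data = [
    ("C1", {"color": "red", "size": "big"}),
    ("C1", {"color": "red", "size": "small"}),
    ("C2", {"color": "blue", "size": "small"}),
    ("C2", {"color": "red", "size": "small"}),
]
n = len(data)
class_n = Counter(c for c, _ in data)                              # count of C_k
av_n = Counter((a, v) for _, x in data for a, v in x.items())      # count of A_i = V_ij
cav_n = Counter((c, a, v) for c, x in data for a, v in x.items())  # joint count

# Sum of P(A=V) * P(C|A=V) * P(A=V|C); zero-count terms contribute nothing.
lhs = sum(
    (av_n[a, v] / n) * (cnt / av_n[a, v]) * (cnt / class_n[c])
    for (c, a, v), cnt in cav_n.items()
)

# Sum of P(C) * P(A=V|C)^2 over the same terms.
rhs = sum(
    (class_n[c] / n) * (cnt / class_n[c]) ** 2
    for (c, a, v), cnt in cav_n.items()
)

assert isclose(lhs, rhs)
print(lhs, rhs)
```

Both sums reduce term by term to count² / (n × class count), which is why they agree for any data set.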


Category Utility

• Category Utility is the increase in the expected number of attribute
values that could be guessed, given a partitioning of categories - leaf
nodes - over guessing with no category knowledge.
• CU({C1, C2, ... CK}) =

$$\frac{1}{K} \sum_{k=1}^{K} P(C_k) \left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2 \right]$$

• K normalizes CU for different numbers of categories in the
candidate partition
• Since incremental, there is a limited number of possible
categorization partitions
• If Ai = Vij is independent (irrelevant) of class membership, CU = 0
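
A minimal sketch of category utility, assuming the standard form in which the class-conditional expected guesses are compared against the unconditional ones and the result is divided by K. The function names are hypothetical; a partition is a list of non-empty clusters, each a list of attribute-value dicts with no missing values.

```python
from collections import Counter


def expected_guesses(instances):
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `instances`."""
    n = len(instances)
    counts = Counter((a, v) for x in instances for a, v in x.items())
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(partition):
    """CU of a partition: mean over clusters of P(C_k) * (gain in expected guesses)."""
    everything = [x for cluster in partition for x in cluster]
    baseline = expected_guesses(everything)  # guessing without class knowledge
    gain = sum(
        len(cluster) / len(everything) * (expected_guesses(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)


red, blue = {"color": "red"}, {"color": "blue"}
print(category_utility([[red, red], [blue, blue]]))  # 0.25
print(category_utility([[red, blue], [red, blue]]))  # 0.0
```

The second partition shows the last bullet in action: when the attribute value is independent of class membership, knowing the class does not help and CU is 0.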
Cobweb Learning Algorithm

1. Incrementally add a new training example
2. Recurse down the tree from the root until a new node containing just this
example is added. Update the appropriate probabilities at each level.
3. At each level of the tree calculate the scores for all valid modifications
using category utility (CU)
4. Depending on which of the following gives the best score:
– Classify into an existing class - then recurse
– Create a new class node – done, can get next example
– Combine two classes into a single class (Merging) - then recurse
– Divide a class into multiple classes (Splitting) - then recurse


Cobweb Learning Mechanisms

• Classifying (Matching) - calculate overall CU for each
case of putting the example in a node at the current level
• New Class - calculate overall CU for putting the example
into a single new class. Note the greedy (hill-climbing)
nature: it does not go back and try all possible new
partitions.
– If created from an internal node, simply add it
– If created from a leaf node, split into two, one for the new example and one for the old

• These alone are quite order dependent - splitting and
merging allow bi-directionality - ability to undo
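
The classify and new-class operators can be sketched for a single level of the tree as follows. This is an illustration under assumptions, not the full algorithm: clusters are plain lists of attribute-value dicts, the helper names are hypothetical, and category utility is taken in its standard form (divided by the number of clusters).

```python
from collections import Counter


def expected_guesses(instances):
    """Sum over attribute values of P(A = V)^2 within `instances`."""
    n = len(instances)
    counts = Counter((a, v) for x in instances for a, v in x.items())
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(partition):
    everything = [x for cluster in partition for x in cluster]
    baseline = expected_guesses(everything)
    gain = sum(
        len(c) / len(everything) * (expected_guesses(c) - baseline)
        for c in partition
    )
    return gain / len(partition)


def best_placement(children, instance):
    """Score each way of placing `instance` at this level; return the best label."""
    options = []
    for i in range(len(children)):
        # Classifying: tentatively put the instance into child i.
        trial = [c + [instance] if j == i else c for j, c in enumerate(children)]
        options.append((category_utility(trial), f"classify into child {i}"))
    # New class: the instance becomes its own child.
    options.append((category_utility(children + [[instance]]), "new class"))
    return max(options, key=lambda option: option[0])[1]


red, blue, green = {"color": "red"}, {"color": "blue"}, {"color": "green"}
level = [[red, red], [blue, blue]]
print(best_placement(level, red))    # classify into child 0
print(best_placement(level, green))  # new class
```

Only the options for the current instance are scored, which is the greedy behavior noted above: earlier placements are never revisited by these two operators.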


Cobweb Learning Mechanisms

• Merging - For the best matching node (the one that would
be chosen for classification) and the second best
matching node at that level, calculate CU when both are
merged into one node, with two children
• Splitting - For the best matching node, calculate CU if that
node were deleted and its children added to the current
level.
• Both schemes could be extended to test other nodes, at
the cost of increased computational complexity
• Can overcome initial “misconceptions”
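
A rough sketch of how the merge and split candidates could be built and scored at one level. The `Node` class and helper names are hypothetical, the new example is left out to keep the sketch short (a full implementation would score these candidates with the incoming example included), and category utility is taken in its standard form.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    instances: list                       # all instances under this node
    children: list = field(default_factory=list)


def expected_guesses(instances):
    n = len(instances)
    counts = Counter((a, v) for x in instances for a, v in x.items())
    return sum((c / n) ** 2 for c in counts.values())


def cu(level):
    """Category utility of the partition formed by the nodes at one level."""
    everything = [x for node in level for x in node.instances]
    baseline = expected_guesses(everything)
    gain = sum(
        len(node.instances) / len(everything)
        * (expected_guesses(node.instances) - baseline)
        for node in level
    )
    return gain / len(level)


def merged_level(level, i, j):
    """Replace nodes i and j by one node that has them as its two children."""
    a, b = level[i], level[j]
    rest = [node for k, node in enumerate(level) if k not in (i, j)]
    return rest + [Node(a.instances + b.instances, [a, b])]


def split_level(level, i):
    """Delete node i and promote its children to this level."""
    rest = [node for k, node in enumerate(level) if k != i]
    return rest + level[i].children


red, blue, green = {"color": "red"}, {"color": "blue"}, {"color": "green"}

# Two red nodes that an unlucky input order kept apart: merging helps.
apart = [Node([red, red]), Node([red]), Node([blue, blue])]
print(cu(apart), cu(merged_level(apart, 0, 1)))

# A mixed node whose children are cleaner than it is: splitting helps.
mixed = Node([red, red, blue, blue], [Node([red, red]), Node([blue, blue])])
lumped = [mixed, Node([green, green])]
print(cu(lumped), cu(split_level(lumped, 0)))
```

In both cases the restructured level scores higher, which is how these operators can undo the effect of an early poor placement.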
Cobweb Comments

• Generalization done by just executing the recursive classification step
• Could use different variations on CU and search strategy
• Complexity: O(AVB^2 log K) for each example, where B is the branching
factor, A the number of attributes, V the average number of values per
attribute, and K the number of classes
• Empirically, B usually between 2 and 5
• Does not directly handle noise - no defined significance
mechanism
• Tends to make “bushy” trees, however high levels should be most
important class categories (because of merge/split causing best
breaks to float up, though no optimal guarantee), and one could
just use nodes highest in the tree for classification
• Does not support continuous values
Extensions - Classit

• Cobweb cannot store probability counts for continuous data
• Classit uses a scheme similar to Cobweb, but assumes a
normal distribution for each attribute's values and thus can just
store a mean and variance - not always a reasonable
assumption
• Also uses a formal cut-off (significance) mechanism to
better support generalization and noise handling (a class
node can then include outliers)
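
To illustrate storing just a mean and variance per attribute, the sketch below keeps a running summary that can be updated one example at a time. It uses Welford's online update, which is an assumption made here for illustration; the slides do not say how Classit maintains these statistics, and the class name is hypothetical.

```python
from math import isclose


class NumericAttribute:
    """Running mean and variance of one continuous attribute within a class node."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        """Incorporate one value using Welford's online update."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.n if self.n else 0.0


height = NumericAttribute()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    height.add(value)
print(height.mean, height.variance)
assert isclose(height.mean, 5.0) and isclose(height.variance, 4.0)
```

Like the counts in Cobweb, this summary is updated incrementally, so no past examples need to be kept at the node.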
• More work needed
