Data Mining Using Conceptual Clustering
Clustering
By Trupti Kadam
What is Data Mining?
• Many definitions
– Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
– Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
• The COBWEB algorithm considers merging the
two existing child nodes with the highest
scores.

[Figure: Merge — children A and B of parent P are combined under a single new node N, which becomes their parent]
• The COBWEB algorithm considers splitting
the existing child node with the highest
score.
[Figure: Split — node N is removed and its children A and B are promoted to become children of parent P]
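Both operators can be sketched on a bare tree-node structure (a minimal illustration, not the slides' implementation; the `Node` class and the function signatures are assumptions):

```python
class Node:
    """A concept node holding a list of child nodes."""
    def __init__(self, children=None):
        self.children = children if children is not None else []

def merge(parent, a, b):
    """Merge children a and b of parent: create a new node n,
    make a and b its children, and put n in their place."""
    n = Node(children=[a, b])
    parent.children = [c for c in parent.children if c not in (a, b)]
    parent.children.append(n)
    return n

def split(parent, p):
    """Split child p of parent: remove p and promote its
    children to be direct children of parent."""
    parent.children.remove(p)
    parent.children.extend(p.children)
    return parent
```

Note that `split` is the inverse of `merge`: splitting the node produced by a merge restores the original children.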
The COBWEB Algorithm

Cobweb(N, I)

Input: The current node N in the concept hierarchy.
       An unclassified (attribute-value) instance I.
Results: A concept hierarchy that classifies the instance.
Top-level call: Cobweb(Top-node, I).
Variables: C, P, Q, and R are nodes in the hierarchy.
           W, X, Y, and Z are clustering (partition) scores.

If N is a terminal node,
Then Create-new-terminals(N, I)
     Incorporate(N, I).
Else Incorporate(N, I).
     For each child C of node N,
          Compute the score for placing I in C.
     Let P be the node with the highest score W.
     Let Q be the node with the second highest score.
     Let X be the score for placing I in a new node R.
     Let Y be the score for merging P and Q into one node.
     Let Z be the score for splitting P into its children.
     If W is the best score,
     Then Cobweb(P, I) (place I in category P).
     Else if X is the best score,
     Then initialize R’s probabilities using I’s values
          (place I by itself in the new category R).
     Else if Y is the best score,
     Then let O be Merge(P, Q, N).
          Cobweb(O, I).
     Else if Z is the best score,
     Then Split(P, N).
          Cobweb(N, I).
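The scores above are conventionally computed with the category-utility function. The slides do not define it, so the following is a sketch of the standard formulation for nominal attributes, with each cluster represented as a list of attribute-value dicts (these representation choices are assumptions):

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a partition: the mean, over clusters, of
    P(C_k) * (expected attribute predictions correct given C_k
              - expected correct without the partition)."""
    instances = [inst for c in clusters for inst in c]
    n = len(instances)
    attrs = list(instances[0].keys())

    def expected_correct(insts):
        # sum over attributes i and values j of P(A_i = V_ij)^2
        total = 0.0
        for a in attrs:
            counts = Counter(inst[a] for inst in insts)
            total += sum((cnt / len(insts)) ** 2 for cnt in counts.values())
        return total

    baseline = expected_correct(instances)
    return sum(len(c) / n * (expected_correct(c) - baseline)
               for c in clusters) / len(clusters)
```

A partition that groups identical instances together scores higher than one that mixes values, which is exactly the property COBWEB's placement, merge, and split decisions exploit.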
• Limitations of COBWEB
– The assumption that the attributes are independent of
each other is often too strong, because correlations may
exist
– Not suitable for clustering large databases: the tree can
become skewed, and maintaining the probability
distributions is expensive
ITERATE
The algorithm has three primary steps:
1. Derive a classification tree using category utility
as a criterion function for grouping instances.
2. Extract a good initial partition of data from the
classification tree as a starting point to focus the
search for desirable groupings or clusters.
3. Iteratively redistribute data objects among the
groupings to achieve maximally separable clusters.
Derivation of classification tree

Extraction of a good initial partition
The initial partition structure is extracted by
comparing the CU values of classes (nodes)
along a path in the classification tree. For any
path from root to leaf of a classification tree, this
value initially increases, and then drops.
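This rise-then-drop rule can be sketched as a tree walk that descends while some child improves on the current node's CU and cuts where the score peaks (a minimal sketch; the `CNode` class with a precomputed `cu` score per node is an assumption, not from the slides):

```python
class CNode:
    """A classification-tree node with a precomputed CU score (assumed)."""
    def __init__(self, cu, children=()):
        self.cu = cu
        self.children = list(children)

def extract_partition(node, partition=None):
    """Descend while some child scores higher than the current node;
    cut (emit the node into the partition) where CU stops rising."""
    if partition is None:
        partition = []
    if any(c.cu > node.cu for c in node.children):
        for c in node.children:
            extract_partition(c, partition)
    else:
        partition.append(node)  # CU has peaked on this path: cut here
    return partition
```

The emitted nodes form the initial partition that the redistribution step then refines.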
Iteratively redistribute data objects
• The iterative redistribution operator is applied
to maximize the cohesion measure for
individual classes in the partition.
• The redistribution operator assigns object d to
the class k for which the category match measure
CM_dk is maximum.
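The redistribution loop can be sketched as follows. Since the slides do not give the CM_dk formula, the score here is a simplified stand-in (the fraction of the object's attribute values that match the candidate cluster's most frequent value), not ITERATE's actual measure:

```python
from collections import Counter

def category_match(obj, cluster):
    """Simplified stand-in for CM_dk: fraction of obj's attribute
    values equal to the cluster's modal (most frequent) value."""
    if not cluster:
        return 0.0
    score = 0
    for a, v in obj.items():
        mode, _ = Counter(inst[a] for inst in cluster).most_common(1)[0]
        if v == mode:
            score += 1
    return score / len(obj)

def redistribute(clusters, max_iters=10):
    """Repeatedly move each object to the class with the highest
    category-match score until assignments stabilize."""
    for _ in range(max_iters):
        moved = False
        for i, cluster in enumerate(clusters):
            for obj in list(cluster):
                # Score obj against every class, excluding obj itself
                # from its current class so it cannot vote for itself.
                scores = [category_match(obj, [o for o in c if o is not obj])
                          for c in clusters]
                best = max(range(len(clusters)), key=lambda j: scores[j])
                if best != i and scores[best] > scores[i]:
                    cluster.remove(obj)
                    clusters[best].append(obj)
                    moved = True
        if not moved:
            break
    return clusters
```

Requiring a strictly better score before moving an object is what lets the loop terminate once the classes are maximally separated.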
Evaluating Cluster Partitions
• To assess the result of a clustering
operation, we adopt a measure
known as cohesion, which measures the
degree of intraclass similarity between objects
in the same class.
• The increase in predictability for an object d
assigned to cluster k, M_dk, is
defined as
THANK YOU