Graph-based cluster labeling using Growing Hierarchical SOM
The Second International Conference of Applied Science & Natural
Prepared by:
Mahmoud Rafeek Alfarra, College of Science & Technology, m.farra@cst.ps
Ayman Shehda Ghabayen, College of Science & Technology, a.ghabayen@cst.ps
Outline
• Labeling: what and why?
• Graph-based representation
• Growing Hierarchical SOM
• Extracting cluster labels
Labeling: what and why?
• Cluster labeling is the process of selecting descriptive labels (keywords) for the clusters produced by a clustering algorithm.
Labeling: what and why?
Cluster labeling is an increasingly important task because:
1. Document collections keep growing larger.
2. Labels help users work with news, email threads, blogs, reviews, and search results.
Labeling: what and why?
[Figure: system overview. Labels: documents collection, document, preprocessing step, DIG model, clustering process + labeling (SOM, growing hierarchical SOM with graph neurons G0, G1, ..., Gs on levels 0 and 1 and G1, G2 on level 2), labeled clusters.]
Graph-based representation
[Figure: example graph with nodes A, B, X, D, NC, S and phrases ph1 to ph5, next to numeric tables; the values are not recoverable from the extracted text.]
Graph-based representation
• Captures the salient features of the data.
• DIG (Document Index Graph) model: a directed graph.
• A document is represented as a vector of sentences.
• Phrase indexing information is stored in the graph nodes themselves, in the form of document tables.
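[Figure: DIG fragment with the nodes "river", "rafting", "adventures" and "fishing" joined by edges e0, e1 and e2. Each node carries a document table with the columns Doc, TF and ET; the TF entries shown are {0,0,3}, {0,0,2} and {0,0,1} for documents 1, 2 and 3. Edge entries shown: e0: S1(1), S2(2), S3(1); e0: S2(1); e2: S1(2); e1: S4(1). In an entry such as S1(2), S1 is the sentence number and 2 is the position of the term in that sentence.]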
Graph-based representation
Example
Document 1
  River rafting
  Mild river rafting
  River rafting trips
Document 2
  Wild river adventures
  River rafting vacation plan
  Fishing trips
  Fishing vacation plan
  Booking fishing trips
  River fishing
[Figure: the graph grows as sentences are added. The first graph has the nodes mild, river, rafting, trips; the second adds wild, adventures, vacation, plan; the cumulative graph has the nodes mild, river, rafting, trips, wild, adventures, vacation, plan, booking, fishing.]
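The construction in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `build_dig` and the dictionary layout are assumptions, "vocation" in the slide is read as "vacation", and the sentences are grouped under two documents exactly as the slide lists them. Each edge is an ordered pair of consecutive terms and records the sentence number and the position of the first term, mirroring entries such as S1(2) in the document table.

```python
from collections import defaultdict

def build_dig(documents):
    """Build a small DIG-like structure (hypothetical layout).

    Nodes are unique terms; edges are ordered pairs of consecutive terms.
    Each edge stores (doc_id, sentence_no, position_of_first_term) entries.
    """
    nodes = set()
    edges = defaultdict(list)
    for doc_id, sentences in documents.items():
        for s_no, sentence in enumerate(sentences, start=1):
            terms = sentence.lower().split()
            nodes.update(terms)
            # Consecutive term pairs become directed edges.
            for pos, (a, b) in enumerate(zip(terms, terms[1:]), start=1):
                edges[(a, b)].append((doc_id, s_no, pos))
    return nodes, dict(edges)

documents = {
    1: ["River rafting", "Mild river rafting", "River rafting trips"],
    2: ["Wild river adventures", "River rafting vacation plan",
        "Fishing trips", "Fishing vacation plan",
        "Booking fishing trips", "River fishing"],
}

nodes, edges = build_dig(documents)
print(sorted(nodes))
print(edges[("river", "rafting")])
```

For Document 1 the edge river -> rafting is recorded in sentence 1 at position 1, sentence 2 at position 2 and sentence 3 at position 1, which agrees with the entry e0: S1(1), S2(2), S3(1) in the document table.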
Growing Hierarchical SOM
• Determining the winning node
[Figure: the input document graph (Gi) is compared with each of the n neuron graphs in the SOM (Gs). The similarity measure is built from the phrase significance of Gi and Gs and the length of Gi; the exact formula is not recoverable from the extracted text.]
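As a rough illustration of winner selection in the graph domain, the sketch below scores each neuron graph by the share of the input graph's phrases (edges) that it also contains, then picks the best-scoring neuron. The overlap ratio is a stand-in assumed here, since the slide's own measure could not be recovered, and the names `phrase_similarity` and `find_winner` are hypothetical.

```python
def phrase_similarity(input_edges, neuron_edges):
    """Fraction of the input graph's edges that also occur in the neuron graph.

    Assumed stand-in for the slide's measure (matching phrases relative to
    the size of Gi); not the authors' exact formula.
    """
    if not input_edges:
        return 0.0
    return len(input_edges & neuron_edges) / len(input_edges)

def find_winner(input_edges, neurons):
    """Return the index of the neuron graph most similar to the input graph."""
    return max(range(len(neurons)),
               key=lambda i: phrase_similarity(input_edges, neurons[i]))

# Input document graph Gi and two neuron graphs, each given as a set of edges.
gi = {("river", "rafting"), ("rafting", "trips")}
neurons = [
    {("fishing", "trips"), ("booking", "fishing")},
    {("river", "rafting"), ("mild", "river")},
]

winner = find_winner(gi, neurons)
print(winner, phrase_similarity(gi, neurons[winner]))
```

Here the second neuron wins because it shares one of the two input phrases, while the first shares none.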
Growing Hierarchical SOM
Neuron updating in the graph domain
[Figure: two graphs, G1 and G2, over nodes such as A, B, C, D, E, X, Y and edges e0 to e5, illustrating the update.]
We update a neuron graph by increasing its matching phrases, because this has a stronger effect than increasing its terms (nodes). Adding a matching phrase can also be seen as adding an ordered pair of nodes.
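One way to picture this update is the sketch below, which moves a neuron graph toward an input graph by copying over phrases (edges) the neuron is missing. Because each phrase is an ordered pair of nodes, adding it brings its two terms along. The function name `update_neuron` and the `max_new` limit (a crude analogue of a learning rate) are assumptions made for illustration, not details taken from the slides.

```python
def update_neuron(neuron_edges, input_edges, max_new=1):
    """Add up to max_new phrases from the input graph that the neuron lacks.

    Returns the updated node set and edge set; inputs are left unchanged.
    """
    missing = sorted(input_edges - neuron_edges)  # sorted for determinism
    updated_edges = set(neuron_edges) | set(missing[:max_new])
    # Each edge is an ordered pair of nodes, so nodes follow from the edges.
    updated_nodes = {term for edge in updated_edges for term in edge}
    return updated_nodes, updated_edges

neuron = {("A", "B"), ("B", "C")}
document = {("A", "B"), ("C", "D")}

new_nodes, new_edges = update_neuron(neuron, document)
print(sorted(new_nodes))
print(sorted(new_edges))
```

After the update the neuron contains the phrase (C, D), so it shares both of the document's phrases instead of one.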
Overall document clustering process
Extracting cluster labels
• To extract the keywords, we build a table for each cluster with the following columns:
  Term | TF by location {T, L, B, b} | No. of matching phrases (MP) | Weight
Weight = (f1*T + f2*L + f3*B + f4*b) * 0.4 + MP * 0.6
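The weight formula translates directly into code. In this minimal sketch T, L, B and b are the term's frequencies at the four locations named in the table, and f1 to f4 are the location factors. The slides do not give values for the factors, so the default of 1.0 for each, like the function name `term_weight`, is a placeholder assumption.

```python
def term_weight(T, L, B, b, mp, factors=(1.0, 1.0, 1.0, 1.0)):
    """Weight = (f1*T + f2*L + f3*B + f4*b) * 0.4 + MP * 0.6.

    factors holds (f1, f2, f3, f4); the defaults are placeholders because
    the slides do not state the actual values.
    """
    f1, f2, f3, f4 = factors
    location_score = f1 * T + f2 * L + f3 * B + f4 * b
    return 0.4 * location_score + 0.6 * mp

# Made-up counts for one term: 2, 1, 0 and 3 occurrences at the four
# locations, and 2 matching phrases.
print(term_weight(T=2, L=1, B=0, b=3, mp=2))
```

The 0.4 and 0.6 coefficients mean that one extra matching phrase adds more to the weight than one extra unit of the location score.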
Extracting cluster labels
[Figure: cluster graph with nodes T1 to T11.]
Term | F-weight | # MP | Matching phrases          | Net weight
T2   | 12.4     | 2    | (T2,T3), (T2,T5)          | 4.96 + 1.2 = 6.16
T3   | 10.2     | 2    | (T2,T3), (T5,T3)          | 4.08 + 1.2 = 5.28
T5   | 16.6     | 3    | (T2,T5), (T8,T5), (T5,T3) | 6.64 + 1.8 = 8.44
T8   | 14.4     | 1    | (T8,T5)                   | 5.76 + 0.6 = 6.36
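The net weights can be recomputed from the F-weight values and the listed phrase pairs. The sketch below counts how many matching phrases each term takes part in, applies net weight = 0.4 * F-weight + 0.6 * MP, and ranks the terms. The variable names are illustrative, and taking the top-ranked terms as label candidates is an assumption about how the table is used.

```python
# F-weight values and matching phrases taken from the table.
f_weight = {"T2": 12.4, "T3": 10.2, "T5": 16.6, "T8": 14.4}
phrases = [("T2", "T3"), ("T2", "T5"), ("T8", "T5"), ("T5", "T3")]

# MP: number of matching phrases in which each term appears.
mp = {term: sum(term in pair for pair in phrases) for term in f_weight}

# Net weight = 0.4 * F-weight + 0.6 * MP, rounded for display.
net = {term: round(0.4 * f_weight[term] + 0.6 * mp[term], 2)
       for term in f_weight}

# Highest-weighted terms first; these are the label candidates.
ranking = sorted(net, key=net.get, reverse=True)
print(mp)
print(net)
print(ranking)
```

With these inputs T5 scores 0.4 * 16.6 + 0.6 * 3 = 6.64 + 1.8 = 8.44 and ranks first, followed by T8, T2 and T3.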
Thank you. Questions?
