Performance Analysis of Embedded Tree-Based Pattern Mining Algorithm With Sentiment Analysis
Abstract - Human interaction is one of the most important characteristics of group social dynamics in meetings. The sequence of human interactions is generally represented as a tree. The tree structure is used to capture how persons interact in meetings and to discover the interaction flow. Human interactions include proposing an idea, giving comments, asking for opinions, acknowledging, etc. The frequent interaction tree pattern mining algorithm and the frequent interaction subtree pattern mining algorithm are used to analyze the structure and to extract interaction flow patterns, in which only co-occurring tags are considered.
The Tree-Based Pattern Mining Algorithm (TBPMA) does not consider verbal behaviors. To address this, a Sentiment Analysis (SA) algorithm is integrated with TBPMA for classifying the document. The sentiment analysis approach extracts the positive or negative sentiments associated with opinions on specific subjects in the document, instead of classifying the whole document as positive or negative. The Embedded Tree-Based Pattern Mining Algorithm (ETBPMA) is used to compute the frequency of embedded tree orders. It is used to find the correct position of each node in the tree representation and also improves the accuracy of the tree orders.
Keywords - Tree mining, sentiment analysis, frequent subtree mining, embedded subtree mining, interaction mining
I. INTRODUCTION
Human interaction is a vital event for understanding communicative information. Understanding human behavior is essential in applications including automated surveillance, video archival/retrieval, medical diagnosis, and human-computer interaction. Smart meeting systems have emerged that automatically record a meeting [1] and analyze the generated audio-visual content for future viewing. While most current smart meeting systems analyze the meeting content to understand what conclusion was made, it is more interesting and important to know how a conclusion was made. For example, did all members agree on the outcome? Who did not give an opinion? Who spoke a little or a lot? Such group social dynamics can be useful for determining whether a meeting was well organized and whether the conclusion was rational.

Human interaction plays an important role in understanding communicative information. Unlike physical interactions (e.g., turn-taking and addressing), the human interactions here are defined as behaviors among meeting participants with respect to the current topic, such as proposing an idea, giving comments, expressing a positive opinion, and requesting information. When incorporated with semantics (i.e., user intention or attitude towards a topic), interactions are more meaningful for understanding how conclusions are drawn and how a meeting is organized. Existing studies of interaction issues such as turn-taking, gaze behavior, influence and talkativeness, as well as analyses of user interactions during poster presentations in an exhibition room, mainly focus on detecting physical interactions between participants without any relation to topics.
The context information is gathered through multiple sensors, e.g., video cameras, microphones, and motion sensors. The various interactions imply different user roles, attitudes, and intentions about a topic during a discussion [3] [4]. We create a set of human interactions that includes seven categories: propose, comment, acknowledgement, requestInfo, askOpinion, posOpinion, and negOpinion. Their detailed meanings are: propose – a user proposes an idea with respect to a topic; comment – a user gives comments on a proposal; acknowledgement – a user confirms someone else's comment or explanation, e.g., "yeah" and "OK"; requestInfo – a user requests information about a proposal; askOpinion – a user asks someone else's opinion about a proposal; posOpinion – a user expresses a positive opinion, i.e., supports a proposal; and negOpinion – a user expresses a negative opinion, i.e., is against a proposal.

The context used in our interaction detection includes head motion, attention from others, speech tone, speaking time, interaction juncture, and information about the previous interaction. Head motion (e.g., nodding) is very common and is often used in detecting human responses (acknowledgement or agreement). For example, when a user is proposing an idea, he or she is usually being looked at by most of the participants. Attention from others can therefore be treated as the number of persons looking at the target user during the interaction, so the problem can be roughly turned into the detection of face direction. The face orientation is determined as the one whose vector makes the smallest angle. Speech tone refers to whether a statement is a question or a normal statement. Speaking time is another important indicator for detecting the type of human interaction.

II. RELATED WORK
Frequent-pattern mining has been studied extensively in data mining, with many algorithms proposed and implemented (for example, Apriori [Agrawal & Srikant 1994], FP-growth [Han, Pei, & Yin 2000], CLOSET [Pei, Han, & Mao 2000], and CHARM [Zaki & Hsiao 2002]). Frequent pattern mining and its associated methods have been widely used in association rule mining [Agrawal & Srikant 1994], sequential pattern mining [Agrawal & Srikant 1995], structured pattern mining [Kuramochi & Karypis 2001], iceberg cube computation [Beyer & Ramakrishnan 1999], cube gradient analysis [Imielinski, Khachiyan, & Abdulghani 2002], associative classification [Liu, Hsu, & Ma 1998], frequent pattern-based clustering [Wang et al. 2002], and so on.

Data mining is useful for discovering implicit, previously unknown, and potentially valuable information or knowledge from large datasets. For instance, frequent pattern mining [1], [2], [3] is helpful for finding frequently occurring patterns, such as interaction patterns from meetings. The discovered interaction patterns help to (i) estimate the effectiveness of decisions made in meetings, (ii) determine whether a meeting discussion is fruitful, (iii) compare two meeting discussions using the interaction flow as a key feature [4], and (iv) index meetings for easier access in a database.

There have been several works on discovering human behavior patterns using stochastic techniques. Bakeman and Gottman [5] applied sequential analysis to observe and analyze human interactions. Magnusson [6] proposed a pattern detection method, called T-pattern, to discover hidden time patterns in human behavior. T-patterns have been adopted in several applications such as interaction analysis and sports research. Although the purpose of these techniques is similar to that of our work, we conduct analysis of human interaction in meetings and address the problem of discovering interaction patterns from the perspective of data mining.

Casas-Garriga [7] proposed algorithms to mine unbounded episodes (those with unfixed window width or interval) from a sequence of events on a time line. The work is generally used to extract frequent episodes, i.e., collections of events occurring frequently together. Morita et al. [8] proposed a pattern mining method for the interpretation of human interactions in a poster exhibition. It extracts simultaneously occurring patterns of primitive actions such as gaze and speech. Sawamoto et al. [8] presented a method for extracting important interaction patterns in medical interviews (i.e., doctor-patient communication) using non-verbal information.

Sasa Junuzovic et al. proposed that capturing the relevant aspects of a meeting is more important for offline meeting viewing than for remotely attending a meeting. The reason is that during an ongoing meeting, remote attendees can interrupt the conversation to ask for clarifications, which is not possible for a person watching a recorded meeting. The aspects of a meeting that are important are meeting-dependent. In general, meetings can be roughly classified into two types. In one type of meeting, there are a large number of attendees, but only a few of them are active. An example of such a meeting is a lecture in which there is one lecturer and a large audience. In the other type of meeting, there are a small number of attendees, but the majority of them are active.

Motivated by concepts in social psychology, which highlight the group and multimodal nature of communication, recent work has viewed meetings as sequences of non-overlapping multimodal actions performed by the group of participants, thus implying that such actions are relevant to segment and recognize. A key aspect of interactive meetings is the current speaker, who is, by definition, changing frequently. Thus, traditional meeting viewing interfaces for such meetings have an automatic speaker view, which always shows the current speaker.

III. TREE BASED PATTERN MINING
The Tree-Based Pattern Mining algorithm is designed for the interaction flow. For each tree in the tree dataset TD, the frequent tree pattern mining algorithm exchanges the places of siblings (i.e., performs commutation processing) to generate the full set of isomorphic trees, ITD [8] [9]. The tree dataset is constructed using the string encoding method explained below.
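The commutation processing step, which exchanges the places of siblings to obtain the full set of isomorphic trees ITD, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the nested-tuple tree representation and the names `isomorphic_variants` and `canonical` are hypothetical. The canonical form (children recursively sorted) is one common way to recognise that two sibling orders describe the same tree.

```python
from itertools import permutations, product

# Hypothetical representation: a tree is (label, children), where
# children is a tuple of trees, e.g. ("PRO", (("REQ", ()), ("ACK", ()))).

def isomorphic_variants(tree):
    """All ordered trees obtained by exchanging the places of siblings
    (commutation) at any node of `tree`, without duplicates."""
    label, children = tree
    if not children:
        return {tree}
    # Variants of each child subtree, computed recursively.
    child_variants = [isomorphic_variants(c) for c in children]
    results = set()
    for order in permutations(range(len(children))):
        # Pick one variant per child, with the children in permuted order.
        for combo in product(*(child_variants[i] for i in order)):
            results.add((label, combo))
    return results


def canonical(tree):
    """A form shared by all isomorphic variants: children are recursively
    canonicalised and then sorted."""
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))


if __name__ == "__main__":
    # The tree PRO-(COM-ACK)*REQ*ACK written as nested tuples.
    t = ("PRO", (("COM", (("ACK", ()),)), ("REQ", ()), ("ACK", ())))
    variants = isomorphic_variants(t)
    print(len(variants))                           # 6 sibling orders
    print(len({canonical(v) for v in variants}))   # 1 canonical form
```

Enumerating every variant grows factorially with the number of siblings, so in practice comparing canonical forms is usually cheaper than materialising ITD in full.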
A. String Encoding
String Encoding defines an Application Programming Interface (API) for encoding strings to binary data and decoding strings from binary data. The string encoding method is used to represent the Tree Dataset (TD). The link from a node to its children, starting at the root, is represented using "-", a subtree is represented using "()", and sibling relationships are represented using "*". An example string code for a tree is PRO-(COM-ACK)*REQ*ACK, i.e., a root PRO with three children, COM, REQ and ACK, where COM itself has the child ACK.

First, the tree dataset (TD) is constructed. Then the tree-based mining algorithm calculates the support of each tree in ITD. Next, it selects the trees whose supports are larger than the minimum support (minsupport). Finally, it outputs the frequent patterns as frequent trees. The Frequent Interaction Tree Pattern Mining algorithm (FITPM) is presented in Fig 1.
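The encoding and the support-counting steps above can be sketched together in Python. This is a hedged sketch, not the FITPM implementation: the grammar in `parse` is inferred from the single example PRO-(COM-ACK)*REQ*ACK, support is assumed to be the number of trees in the dataset that equal a pattern up to sibling order, and the names `parse`, `canonical` and `frequent_trees` are hypothetical.

```python
import re
from collections import Counter

# Tokens of the string code: labels such as PRO, and the symbols - * ( ).
TOKEN = re.compile(r"[A-Za-z]+|[-*()]")


def parse(code):
    """Parse a string code such as 'PRO-(COM-ACK)*REQ*ACK' into nested
    (label, children) tuples. Assumed grammar, inferred from the example:
        node := LABEL [ '-' item ( '*' item )* ]
        item := '(' node ')' | node
    """
    tokens = TOKEN.findall(code)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos]
        if not label.isalpha():
            raise ValueError(f"expected a label, got {label!r}")
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            children.append(item())
            while pos < len(tokens) and tokens[pos] == "*":
                pos += 1
                children.append(item())
        return (label, tuple(children))

    def item():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            sub = node()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return sub
        return node()

    tree = node()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def canonical(tree):
    """Same form for every sibling order (commutation): sort the children."""
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))


def frequent_trees(codes, min_support):
    """Count each tree up to sibling order and keep the patterns whose
    support (number of trees) is larger than min_support."""
    counts = Counter(canonical(parse(c)) for c in codes)
    return {tree: n for tree, n in counts.items() if n > min_support}
```

For example, with `min_support = 1`, the codes "PRO-(COM-ACK)*REQ*ACK" and "PRO-REQ*ACK*(COM-ACK)" are counted as one pattern with support 2, while "PRO-COM" (support 1) is discarded. Without parentheses, a child's own children extend greedily over the following "*" items, which is why the subtree COM-ACK must be enclosed in "()".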
Mining frequent subtrees from databases of labelled trees is a new research field with many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement to capture the complex relations among data entities. In this work, an efficient algorithm is introduced for mining frequent, ordered, embedded subtrees [8] in tree-like databases. Using a new data structure called scope-list, which is a canonical representation of the tree node, the algorithm first generates all candidate trees and then enumerates them into embedded ordered trees. The Embedded Tree Based Interaction Pattern Mining [9] with Sentiment Algorithm is presented in Fig 3, and its flow chart is shown in Fig 4.
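The scope-list idea can be sketched as follows. The paper does not spell out its scope-list layout, so this sketch assumes the scope definition used by Zaki's TreeMiner: nodes are numbered in preorder, and the scope of a node is [l, r], where l is its own position and r is the position of its last descendant. A node x is then an ancestor (at any distance) of y exactly when x.l < y.l and y.r <= x.r. The function names and the (tree_id, scope) entry format are hypothetical.

```python
from collections import defaultdict


def preorder_scopes(tree):
    """Number the nodes of a (label, children) tree in preorder and return
    [(label, (l, r))], where l is the node's position and r the position
    of its last descendant."""
    out = []

    def visit(node):
        label, children = node
        l = len(out)
        out.append(None)              # reserve this node's slot
        for child in children:
            visit(child)
        out[l] = (label, (l, len(out) - 1))

    visit(tree)
    return out


def scope_lists(trees):
    """Map each label to its scope-list: (tree_id, scope) entries."""
    lists = defaultdict(list)
    for tid, tree in enumerate(trees):
        for label, scope in preorder_scopes(tree):
            lists[label].append((tid, scope))
    return lists


def embedded_pair_support(lists, anc, desc):
    """Number of trees in which some `anc` node is an ancestor of some
    `desc` node, i.e. the desc scope lies strictly inside the anc scope."""
    tids = set()
    for tid_a, (la, ra) in lists.get(anc, []):
        for tid_d, (ld, rd) in lists.get(desc, []):
            if tid_a == tid_d and la < ld and rd <= ra:
                tids.add(tid_a)
    return len(tids)
```

In the tree PRO-(COM-ACK)*REQ*ACK, for instance, the scopes are PRO [0, 4], COM [1, 2], ACK [2, 2], REQ [3, 3] and ACK [4, 4], so the pair PRO to ACK is found both through the direct child and through COM, which is the extra relation an embedded (rather than induced) subtree may use.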
Fig 4. Flow chart for ETPM with SA Algorithm (blocks: Start, Dataset, Review Text, End)
The goal of the experimental analysis is to calculate the accuracy of the proposed algorithms for interaction in meetings. The dataset used to conduct the experiments and the discussion of the results are given below.

A. Data sets
This work involves real meetings lasting about 15 minutes on average. Video cameras, microphones and motion sensors are used to capture the interaction in the meetings. Each meeting has four participants seated around a table. In order to use correct data for mining, the interaction types such as Positive Content, Negative Content, Comments and Acknowledgement are tuned. The six text documents used for conducting the experiments are given in Table 1.

S.No   Document Name
1.     Ext.txt
2.     Inp1.txt
3.     Inp2.txt
4.     Inp3.txt
5.     Inp4.txt
6.     Inp5.txt
TABLE 1. TEXT DOCUMENTS
The accuracy is a parameter used to evaluate the efficiency of the proposed algorithms and is defined as follows.

B. Performance Analysis
The performance of the Tree-Based Pattern Mining Algorithm (TBPMA), the Embedded Tree Based Pattern Mining algorithm (ETBPM), TBPMA with SA and ETBPM with SA is evaluated and analyzed thoroughly.
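The paper's own accuracy formula is not reproduced in this copy, so the following sketch assumes the common definition: the percentage of items whose predicted interaction type matches the reference label. The function name `accuracy` and the label strings are hypothetical.

```python
def accuracy(predicted, actual):
    """Percentage of items whose predicted label equals the reference label.
    Assumed standard definition, not necessarily the one used in the paper."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("empty input")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

For example, predicting POS, NEG, COM, ACK against the reference labels POS, NEG, ACK, ACK gives three correct items out of four, i.e. 75.0.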
Fig 6. Representation of Tree
Fig 7. Representation of Interaction Tree
Figure: Sentiment analytics → Classifier → Positive / Negative
Embedded Tree Based Interaction Pattern Mining Algorithm
Embedded subtrees are used to find the correct position of a node in a tree representation. In such subtrees, the distances of the nodes with respect to the root of the subtree are taken into account during the candidate enumeration phase. A sample interaction tree based on the embedded subtree is shown in Fig 9.
Fig 9. Interaction Tree based on Embedded Sub Tree
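To make the role of node distances concrete, the sketch below lists every ancestor-descendant pair of a tree together with its distance. Distance 1 is an ordinary parent-child edge; larger distances are the additional relations an embedded subtree may use. This is an illustrative assumption about how distances feed candidate enumeration, not the ETBPM procedure itself, and `embedded_edges` is a hypothetical name.

```python
def embedded_edges(tree):
    """Return (ancestor_label, descendant_label, distance) for every
    ancestor-descendant pair in a (label, children) tree, in preorder."""
    edges = []

    def visit(node, ancestors):
        label, children = node
        # ancestors[-1] is the parent (distance 1), ancestors[-k] is k levels up.
        for k, anc in enumerate(reversed(ancestors), start=1):
            edges.append((anc, label, k))
        for child in children:
            visit(child, ancestors + [label])

    visit(tree, [])
    return edges
```

For PRO-(COM-ACK)*REQ*ACK this yields the parent-child edges PRO-COM, COM-ACK, PRO-REQ and PRO-ACK, plus the embedded pair (PRO, ACK, 2) through COM.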
Inp5.txt 6.4 5.8 5.6 5.8 18.5 18.9 19.7 17 69 69.9 70 69 3.1 3.1 3.2 3.2
Five text files were taken from the internet and given as input to pre-processing in order to classify the words as verb, subject, object and so on.
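The sentence-level idea of the SA step (assigning polarity to opinions about a specific subject rather than to the whole document) can be illustrated with a toy lexicon-based sketch. It deliberately swaps in simple keyword matching for the syntactic pre-processing (verb, subject, object tagging) described above; the word lists `POSITIVE` and `NEGATIVE` and both function names are hypothetical.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated resource.
POSITIVE = {"good", "agree", "great", "yes", "support", "like"}
NEGATIVE = {"bad", "disagree", "poor", "no", "against", "dislike"}


def sentence_polarity(sentence):
    """'positive', 'negative' or 'neutral' for one sentence, from the number
    of positive lexicon hits minus the number of negative ones."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def subject_sentiments(document, subject):
    """Polarity of each sentence that mentions `subject`, instead of a
    single label for the whole document."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [(s, sentence_polarity(s)) for s in sentences
            if subject.lower() in s.lower()]
```

For the document "The new schedule is good. I disagree with the budget. The budget is poor.", the subject "schedule" is labelled positive while both sentences about "budget" are labelled negative, even though the document as a whole mixes both.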
Fig 10. Performance analysis chart of TBPMA, ETBPM, SA and ETB with SA algorithm (categories: Positive Content, Negative Content, Comments, Acknowledgement; series: TBPMA, ETBPM, TBPMSA, ETBSA; y-axis: 0 to 70)
In this work, TBPM with Sentiment Analysis and Embedded Sub-Tree Based Pattern Mining with SA are proposed for mining the frequent interactions in meetings. The TBPM with SA approach automatically identifies the contextual polarity of sentiment expressions. The experimental results show that the ETBPM with SA algorithm discovers frequent interactions for the Positive Content, Comments and Acknowledgement interaction types. ETBPM with SA produces the highest accuracy, 9.7%, 64.66% and 7.3% for Positive Content, Comments and Acknowledgement, respectively. This approach uses a new data structure called scope-list to compute the frequency of embedded tree orders in the ETBPM algorithm. The TBPM with SA algorithm is better at discovering frequent Negative Content interactions in meetings, achieving 18.3% accuracy compared to the other approaches. In the future, a different type of pattern mining method will be used for several applications based on the discovered patterns.

References
[1] C.F. Ahmed, S.K. Tanbeer, B.-S. Jeong, and Y.-K. Lee, "An Efficient Candidate Pruning Technique for High Utility Pattern Mining," Proc. PAKDD 2009, LNAI 5476, pp. 749-756, Springer, 2009.
[2] C.K.-S. Leung, M.A.F. Mateo, and D.A. Brajczuk, "A Tree-Based Approach for Frequent Pattern Mining from Uncertain Data," Proc. PAKDD 2008, LNAI 5012, pp. 653-661, Springer, 2008.
[3] S.K. Tanbeer, C.F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Discovering Periodic-Frequent Patterns in Transactional Databases," Proc. PAKDD 2009, LNAI 5476, pp. 242-253, Springer, 2009.
[4] Z. Yu, Z. Yu, X. Zhou, C. Becker, and Y. Nakamura, "Tree-Based Mining for Discovering Patterns of Human Interaction in Meetings," IEEE Trans. Knowledge and Data Eng., vol. 24, no. 4, pp. 759-768, 2012.
[5] R. Bakeman and J.M. Gottman, Observing Interaction: An Introduction to Sequential Analysis, Cambridge Univ. Press, 1997.
[6] M.S. Magnusson, "Discovering Hidden Time Patterns in Behavior: T-Patterns and Their Detection," Behavior Research Methods, Instruments and Computers, vol. 32, no. 1, pp. 93-110, 2000.
[7] G. Casas-Garriga, "Discovering Unbounded Episodes in Sequential Data," Proc. European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '03), pp. 83-94, 2003.
[8] T. Morita, Y. Hirano, Y. Sumi, S. Kajita, and K. Mase, "A Pattern Mining Method for Interpretation of Interaction," Proc. Int'l Conf. Multimodal Interfaces (ICMI '05), pp. 267-273, 2005.
[9] G. Grahne and J. Zhu, "Efficiently Using Prefix-Trees in Mining Frequent Itemsets," Proc. VLDB '94, pp. 487-499, 2001.
[10] C. Wang, M. Hong, J. Pei, H. Zhou, W. Wang, and B. Shi, "Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining," Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '04), pp. 441-451, 2004.
[11] A. Fariha, "Mining Frequent Patterns from Human Interactions in Meetings Using Directed Acyclic Graphs," Proc. ACM SIGMOD '98, pp. 85-93.
[12] J.-L. Koh, "A Tree-Based Approach for Efficiently Mining Approximate Frequent Itemsets," under Contract No. 98-2221-E-003-017 and NSC98-2631-S-003-002.
[13] Z. Yu, H. Aoyama, M. Ozeki, and Y. Nakamura, "Collaborative Capturing and Detection of Human Interactions in Meetings," Kyoto University, Japan.
[14] J. Pei and J. Han, "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth," Engineering Research Council of Canada (grant NSERC-A3723).
[15] K.K. Reddi, "Generating Optimized Decision Tree Based on Discrete Wavelet Transform," International Journal of Engineering Science and Technology, vol. 2, no. 3, pp. 157-164, 2010.