IJRET: International Journal of Research in Engineering and Technology
eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org

IMPROVED METHOD FOR PATTERN DISCOVERY IN TEXT MINING

Bharate Laxman¹, D. Sujatha²
¹Student, ²Assistant Professor, Department of CSE, ATRI, Andhra Pradesh, India

Abstract
Digital data in the form of text documents is growing rapidly, and analyzing such data manually is tedious. Data mining techniques can analyze such data and reveal interesting patterns. Many existing methods are term-based and cannot deal with synonymy and polysemy. Moreover, they lack the ability to use and update the discovered patterns. Zhong et al. proposed an effective pattern discovery technique. It discovers patterns and then computes their specificities in order to evaluate term weights according to the distribution of terms in the discovered patterns. It also updates patterns that exhibit ambiguity, a feature known as pattern evolution. In this paper we implement that technique and build a prototype application to test its efficiency. The empirical results indicate that the solution is useful in the text mining domain.

Keywords: Text mining, pattern discovery, text classification, pattern evolving

1. INTRODUCTION
Knowledge discovery has become indispensable in recent years due to the rapid increase in digital data, and it has attracted a lot of attention in academic and scientific circles. Many real-world applications need to mine data in order to discover trends or patterns. These trends or patterns lead to business intelligence (BI), which helps in making well-informed decisions. Many data mining techniques came into existence in the past ten years. They include closed pattern mining, maximum pattern mining, sequential pattern mining, itemset mining, and association rule mining. The algorithms developed for these techniques are capable of producing a huge number of patterns. However, how to use those patterns and how to update them in the future needs more research. In text mining in particular, patterns are discovered from text documents, and it is challenging to use and update them. Earlier, term-based methods were provided by Information Retrieval (IR) techniques. The term-based methods are classified into rough set models [1], SVM-based models [2], and probabilistic models [3]. All term-based methods suffer from the problems of synonymy and polysemy. When a word has many meanings, it is known as polysemy. When multiple words have a similar meaning, it is called synonymy. Thus the semantic meaning of patterns discovered with term-based techniques is uncertain, and answering the exact user query is difficult.
For this reason, it was believed for many years that phrase-based techniques are better than term-based ones. However, experiments in the field of data mining [4], [5], [6] have not confirmed this. The possible reasons are that phrases have inferior statistical properties compared with terms, that their frequency of occurrence is low, and that there are many noisy and redundant phrases [6].
Despite some drawbacks, sequential patterns became promising alternatives to phrases [7], [8], because, like terms, sequential patterns have the required statistical properties. Pattern Taxonomy Models (PTMs) [8], [9] came into existence to overcome the drawbacks of phrase-based mining approaches. Pattern-based approaches became alternatives, but few improvements have been made to make them more effective for text mining. With regard to effectiveness there are two issues: misinterpretation and low frequency. When patterns are infrequent, they lack the required support and cannot be used for decision making. If the support threshold is decreased, the results may not be useful for business decisions. When terms or patterns are misinterpreted, the result will not be reliable.
Over the years, Information Retrieval (IR) has also developed many techniques that use features of text documents. They retrieve content from a huge number of documents based on terms and their weights. Terms may have different weights depending on the context, and there may be semantic meanings that have to be considered in IR. Therefore it is not sufficient to consider only the weights of terms for document analysis or evaluation. In this paper we implement a novel pattern discovery technique proposed by Zhong et al. [10]. It first computes the specificities of the discovered patterns and then evaluates the weights of terms based on their distribution in those patterns. Thus it is capable of avoiding the misinterpretation problem. It also considers the influence of negative training examples in order to address the low-frequency problem. Moreover, ambiguous patterns are updated; this is known as pattern evolution.
Thus the proposed approach improves the accuracy of the discovered patterns.
The remainder of this paper is organized as follows. Section 2 reviews prior work. Section 3 describes the pattern taxonomy model. Section 4 presents the prototype implementation together with its results, and the final section concludes the paper.

2. PRIOR WORK
Textual documents are increasingly being added to the World Wide Web and to the electronic databases of organizations. One well-known representation is the bag-of-words approach, which makes use of keywords. The tf*idf weighting scheme is presented in [11] for representing text. In [12], entropy weighting and global IDF are used for text representation in addition to tf*idf. Various weighting schemes were developed for the bag-of-words approach [13], [14], [15]. The difficulty with bag of words is selecting a limited number of features from an enormous set of words so as to avoid overfitting [6]. To reduce the number of features, other approaches came into existence. They include Odds ratio, Chi-Square, Mutual Information, and Information Gain [4], [6]. Though there are many representations, the choice of representation depends on the requirement and on the rules of natural language [6].
Some researchers used phrases instead of words. A combination of unigrams and bigrams is also used in the text categorization process. A phrase-based approach is explored in [16], and data mining techniques are used for phrase extraction in [17]. There was no significant improvement in text mining when phrases were used, because phrases suffer from low frequency and misinterpretation problems [18]. Some insights were provided on ontology mining, which is again term-based [19], [20]. In [21] a technique known as pattern evolution was introduced. In the data mining community, pattern mining has been used extensively for a number of years. Algorithms such as GST [22], SLPMiner [23], and SPADE [24] are used for this purpose. However, finding interesting patterns is still an open research problem [25], [26]. Pattern mining is also used in the text mining domain. Frequent itemsets are used in text mining for various decision-making applications, and closed sequential patterns are also explored in text mining [9]. In [10] a model known as the Pattern Taxonomy Model is proposed in order to improve the discovered patterns in text mining. In [27] a two-stage model was developed that combines term-based methods and pattern-based methods. Natural Language Processing concepts are also used for text mining. Recently a new concept-based model came into existence [28], [29]. A Conceptual Ontology Graph is also explored in order to use semantic knowledge in the discovery of patterns. This model provides effective discrimination between unimportant terms and meaningful terms. In this paper the Pattern Taxonomy Model is used for text mining.

3. PATTERN TAXONOMY MODEL
The pattern taxonomy model described briefly here follows Zhong et al. [10]. The PTM approach assumes that all text documents are split into paragraphs, so any given document is a set of paragraphs. By using the is-a relation, the patterns discovered from these paragraphs can be structured into a taxonomy. Consider the following table.

Table 1: Frequent patterns and covering sets (excerpt from [10])

As can be seen in table 1, frequent patterns are shown in the left column, while the right column shows the covering sets, that is, the paragraphs in which these patterns occur. This is the basis for structuring the pattern taxonomy. The pattern taxonomy constructed for the values in table 1 is shown in fig. 1.

Fig. 1 Pattern taxonomy (excerpt from [10])

As can be seen in fig. 1, many terms are part of the pattern taxonomy. This information is used in text mining to produce closed patterns, and the performance of text mining improves with this model. More details and the derived equations of this model are given in [10].
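
As an illustration of how frequent patterns, covering sets, and the is-a relation fit together, the following Python sketch may help. It is not the code of the prototype or of [10]: the toy paragraphs are invented for this example, the function names `covering_set`, `frequent_patterns`, `closed_patterns`, and `is_a_edges` are hypothetical, and the sketch assumes that a pattern is a set of terms, that its covering set is the set of paragraphs containing all of its terms, and that a pattern is closed when no proper superpattern has the same covering set.

```python
from itertools import combinations

# Toy document invented for illustration: paragraph id -> set of terms.
PARAGRAPHS = {
    "p1": {"text", "mining"},
    "p2": {"pattern", "mining", "text"},
    "p3": {"pattern", "mining", "text", "term"},
    "p4": {"pattern", "term"},
    "p5": {"text", "mining", "weight"},
}


def covering_set(pattern, paragraphs):
    """Return the ids of all paragraphs that contain every term of the pattern."""
    return frozenset(pid for pid, terms in paragraphs.items() if pattern <= terms)


def frequent_patterns(paragraphs, min_sup):
    """Brute-force search for termsets covered by at least min_sup paragraphs."""
    vocabulary = sorted(set().union(*paragraphs.values()))
    result = {}
    for size in range(1, len(vocabulary) + 1):
        found = False
        for combo in combinations(vocabulary, size):
            pattern = frozenset(combo)
            cover = covering_set(pattern, paragraphs)
            if len(cover) >= min_sup:
                result[pattern] = cover
                found = True
        if not found:
            # No frequent pattern of this size, so no larger one can be frequent.
            break
    return result


def closed_patterns(patterns):
    """Keep patterns for which no proper superpattern has the same covering set."""
    return {
        p: cover
        for p, cover in patterns.items()
        if not any(p < q and cover == q_cover for q, q_cover in patterns.items())
    }


def is_a_edges(patterns):
    """Return (subpattern, superpattern) pairs with no other pattern in between."""
    edges = []
    for sub in patterns:
        for sup in patterns:
            if sub < sup and not any(sub < mid < sup for mid in patterns):
                edges.append((sub, sup))
    return edges


if __name__ == "__main__":
    closed = closed_patterns(frequent_patterns(PARAGRAPHS, min_sup=2))
    for sub, sup in is_a_edges(closed):
        print(sorted(sub), "is a subpattern of", sorted(sup))
```

With this toy data and a minimum support of 2, nine termsets are frequent and four of them are closed, so the taxonomy built from the closed patterns is much smaller than the full set of frequent patterns. The exhaustive enumeration is only suitable for tiny examples; a real implementation would use a dedicated mining algorithm.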


4. PROTOTYPE IMPLEMENTATION
We implemented the pattern discovery technique proposed by Zhong et al. [10] in the Java programming language. The environment used for the implementation is a PC with 4 GB of RAM and a Core 2 Duo processor, running the Windows operating system, with NetBeans as the IDE. The Java Swing API is used to build the GUI (Graphical User Interface). The main UI of the application is shown in fig. 2.

Fig. 2 The main UI of the prototype

As seen in fig. 2, the application facilitates preprocessing before the actual discovery of patterns. The selected dataset is shown in the text area. Before proceeding further, the text needs to be preprocessed with operations such as stop-word removal and stemming. On choosing preprocessing, the UI shown in fig. 3 is rendered.

Fig. 3 UI showing preprocessing operations

As can be seen in fig. 3, there is provision for stop-word removal and stemming. These two are the fundamental preprocessing operations required before actually processing the text documents. The PIM button helps to build a pattern taxonomy model. The discovered patterns are shown in fig. 4.

Fig. 4 Discovered patterns

Fig. 5 Terms in the discovered patterns

As can be seen in fig. 5, the terms of each discovered pattern are presented. For example, pattern1 has the terms t6, t7, t8 and t9. The terms are shown in this way for all discovered patterns. Groups of terms involved in different patterns are extracted and presented in fig. 6.

Fig. 6 Distribution of Patterns and Terms


As can be seen in fig. 6, the distribution of the discovered patterns is shown, with the terms grouped according to the patterns to which they belong. The results in fig. 6 also include terms that do not belong to any pattern. Such terms are pruned, and the results are presented in fig. 7.

Fig. 7 Final results showing distribution of terms and patterns

As can be seen in fig. 7, the results reveal the distribution of terms and the corresponding discovered patterns.
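
The grouping and pruning steps described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Java code of the prototype: apart from pattern1, whose terms are quoted above, the pattern contents and the vocabulary are made up, the function names `group_terms_by_pattern`, `prune_unused_terms`, and `deploy_term_weights` are hypothetical, and the weighting rule (each pattern shares a unit weight equally among its terms, and the totals are then normalized) is a simplified stand-in for the specificity-based term weighting defined in [10].

```python
from collections import defaultdict

# pattern1 follows the text above; the other patterns are hypothetical.
PATTERNS = {
    "pattern1": ["t6", "t7", "t8", "t9"],
    "pattern2": ["t1", "t2"],
    "pattern3": ["t2", "t6"],
}

# Hypothetical vocabulary that remains after stop-word removal and stemming.
VOCABULARY = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"]


def group_terms_by_pattern(patterns):
    """Map every term to the sorted list of patterns in which it occurs."""
    groups = defaultdict(list)
    for name, terms in patterns.items():
        for term in set(terms):
            groups[term].append(name)
    return {term: sorted(names) for term, names in groups.items()}


def prune_unused_terms(vocabulary, groups):
    """Drop the terms that do not belong to any discovered pattern."""
    return [term for term in vocabulary if term in groups]


def deploy_term_weights(patterns):
    """Spread a unit weight over the terms of each pattern, then normalize."""
    raw = defaultdict(float)
    for terms in patterns.values():
        unique_terms = set(terms)
        for term in unique_terms:
            raw[term] += 1.0 / len(unique_terms)
    total = sum(raw.values())
    return {term: weight / total for term, weight in raw.items()}


if __name__ == "__main__":
    groups = group_terms_by_pattern(PATTERNS)
    kept = prune_unused_terms(VOCABULARY, groups)
    weights = deploy_term_weights(PATTERNS)
    for term in kept:
        print(term, groups[term], round(weights[term], 3))
```

In this sketch a term that occurs in several short patterns, such as t2, ends up with a larger weight than a term that occurs once in a long pattern, which is the intuition behind weighting terms by their distribution in the discovered patterns.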

CONCLUSIONS
Data mining techniques have been around for a long time. The techniques used to discover knowledge include sequential pattern mining, frequent itemset mining, closed pattern mining, and maximum pattern mining. Applying these techniques directly to text mining is not effective, because the patterns they discover often lack high specificity. Not all frequent patterns discovered by a mining algorithm are useful. Moreover, they can be misinterpreted, which makes the problem worse. To overcome the problems of misinterpretation and low frequency, we implemented the effective pattern discovery technique of Zhong et al. [10]. Pattern deploying and pattern evolving are the two parts of the technique. The empirical results indicate that the technique is effective.

REFERENCES
[1] Y. Li, C. Zhang, and J.R. Swan, "An Information Filtering Model on the Web and Its Application in Jobagent," Knowledge-Based Systems, vol. 13, no. 5, pp. 285-296, 2000.
[2] S. Robertson and I. Soboroff, "The TREC 2002 Filtering Track Report," TREC, 2002, trec.nist.gov/pubs/trec11/papers/OVER.FILTERING.ps.gz
[3] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
[4] D.D. Lewis, "Feature Selection and Feature Extraction for Text Categorization," Proc. Workshop Speech and Natural Language, pp. 212-217, 1992.

[5] S. Scott and S. Matwin, "Feature Engineering for Text Classification," Proc. 16th Int'l Conf. Machine Learning (ICML '99), pp. 379-388, 1999.
[6] F. Sebastiani, "Machine Learning in Automated Text Categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[7] N. Jindal and B. Liu, "Identifying Comparative Sentences in Text Documents," Proc. 29th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '06), pp. 244-251, 2006.
[8] S.-T. Wu, Y. Li, and Y. Xu, "Deploying Approaches for Pattern Refinement in Text Mining," Proc. IEEE Sixth Int'l Conf. Data Mining (ICDM '06), pp. 1157-1161, 2006.
[9] S.-T. Wu, Y. Li, Y. Xu, B. Pham, and P. Chen, "Automatic Pattern-Taxonomy Extraction for Web Mining," Proc. IEEE/WIC/ACM Int'l Conf. Web Intelligence (WI '04), pp. 242-248, 2004.
[10] N. Zhong, Y. Li, and S.-T. Wu, "Effective Pattern Discovery for Text Mining," IEEE Trans. Knowledge and Data Eng., vol. 24, no. 1, Jan. 2012.
[11] X. Li and B. Liu, "Learning to Classify Texts Using Positive and Unlabeled Data," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI '03), pp. 587-594, 2003.
[12] S.T. Dumais, "Improving the Retrieval of Information from External Sources," Behavior Research Methods, Instruments, and Computers, vol. 23, no. 2, pp. 229-236, 1991.
[13] K. Aas and L. Eikvil, "Text Categorisation: A Survey," Technical Report Raport NR 941, Norwegian Computing Center, 1999.
[14] T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization," Proc. 14th Int'l Conf. Machine Learning (ICML '97), pp. 143-151, 1997.
[15] G. Salton and C. Buckley, "Term-Weighting Approaches in Automatic Text Retrieval," Information Processing and Management: An Int'l J., vol. 24, no. 5, pp. 513-523, 1988.
[16] R. Sharma and S. Raman, "Phrase-Based Text Representation for Managing the Web Document," Proc. Int'l Conf. Information Technology: Computers and Comm. (ITCC), pp. 165-169, 2003.
[17] H. Ahonen, O. Heinonen, M. Klemettinen, and A.I. Verkamo, "Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections," Proc. IEEE Int'l Forum on Research and Technology Advances in Digital Libraries (ADL '98), pp. 2-11, 1998.
[18] D.D. Lewis, "An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task," Proc. 15th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '92), pp. 37-50, 1992.
[19] A. Maedche, Ontology Learning for the Semantic Web. Kluwer Academic, 2003.
[20] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[21] Y. Li and N. Zhong, "Mining Ontology for Automatically Acquiring Web User Information Needs," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 4, pp. 554-568, Apr. 2006.
[22] Y. Huang and S. Lin, "Mining Sequential Patterns Using Graph Search Techniques," Proc. 27th Ann. Int'l Computer Software and Applications Conf., pp. 4-9, 2003.
[23] M. Seno and G. Karypis, "SLPMiner: An Algorithm for Finding Frequent Sequential Patterns Using Length-Decreasing Support Constraint," Proc. IEEE Second Int'l Conf. Data Mining (ICDM '02), pp. 418-425, 2002.
[24] M. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, vol. 40, pp. 31-60, 2001.
[25] Y. Li, W. Yang, and Y. Xu, "Multi-Tier Granule Mining for Representations of Multidimensional Association Rules," Proc. IEEE Sixth Int'l Conf. Data Mining (ICDM '06), pp. 953-958, 2006.
[26] Y. Xu and Y. Li, "Generating Concise Association Rules," Proc. ACM 16th Conf. Information and Knowledge Management (CIKM '07), pp. 781-790, 2007.
[27] Y. Li, X. Zhou, P. Bruza, Y. Xu, and R.Y. Lau, "A Two-Stage Text Mining Model for Information Filtering," Proc. ACM 17th Conf. Information and Knowledge Management (CIKM '08), pp. 1023-1032, 2008.
[28] S. Shehata, F. Karray, and M. Kamel, "Enhancing Text Clustering Using Concept-Based Mining Model," Proc. IEEE Sixth Int'l Conf. Data Mining (ICDM '06), pp. 1043-1048, 2006.
[29] S. Shehata, F. Karray, and M. Kamel, "A Concept-Based Model for Enhancing Text Categorization," Proc. 13th Int'l Conf. Knowledge Discovery and Data Mining (KDD '07), pp. 629-637, 2007.

