Effective Pattern Discovery For Text Mining

The document presents an innovative and effective pattern discovery technique for text mining. It includes processes for deploying and evolving discovered patterns to improve their effectiveness. Substantial experiments on Reuters newswire data and TREC topics demonstrate encouraging performance of the proposed solution. The technique aims to address issues with term-based approaches like polysemy and synonymy by using pattern-based approaches.

Uploaded by

Swathi Manthena
Copyright
© Attribution Non-Commercial (BY-NC)
Effective Pattern Discovery for Text Mining

ABSTRACT

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than the term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

Contact: 040-40274843, 8143615322 Email id: [email protected], www.logicsystems.org.in

SYSTEM ARCHITECTURE:


EXISTING SYSTEM:

* The existing system uses a term-based approach to extract information from text.
* Term-based ontology methods provide some text representations.
* For example, a hierarchical method is used to determine synonymy and hyponymy relations between keywords.
* A pattern evolution technique is used to improve the performance of the term-based approach.

DISADVANTAGES OF EXISTING SYSTEM:

* The term-based approach suffers from the problems of polysemy and synonymy.
* A term with a higher (tf.idf) value could be meaningless in some d-patterns (some important parts in documents).

PROPOSED SYSTEM:

* An effective pattern discovery technique is proposed.
* It evaluates the specificities of patterns and then evaluates term weights according to the distribution of terms in the discovered patterns.
* It solves the misinterpretation problem.
* It considers the influence of patterns from the negative training examples to find ambiguous (noisy) patterns and tries to reduce their influence for the low-frequency problem.
* The process of updating ambiguous patterns is referred to as pattern evolution.
* The proposed approach can improve the accuracy of evaluating term weights because discovered patterns are more specific than whole documents.
* In general there are two phases: training and testing.
* In the training phase, the d-patterns in positive documents (D+) are found based on a min_sup, and term supports are evaluated by deploying d-patterns to terms.
* In the testing phase, term supports are revised using noise negative documents in D based on an experimental coefficient.
* The incoming documents can then be sorted based on these weights.
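
The final step, sorting incoming documents by weight, can be illustrated with a minimal Java sketch. It assumes that a document's weight is the sum of the supports of the learned terms it contains, each term counted once; the class name DocumentRanker and the sample supports are hypothetical and not taken from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks incoming documents by learned term supports.
final class DocumentRanker {

    // Assumed scoring rule: add up the support of every distinct
    // learned term that occurs in the document.
    static double weight(List<String> docTerms, Map<String, Double> termSupport) {
        double w = 0.0;
        for (String term : new HashSet<>(docTerms)) {
            w += termSupport.getOrDefault(term, 0.0);
        }
        return w;
    }

    // Returns a copy of the documents ordered from highest to lowest weight.
    static List<List<String>> rank(List<List<String>> docs, Map<String, Double> termSupport) {
        List<List<String>> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator
                .comparingDouble((List<String> d) -> weight(d, termSupport))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative supports only; real values come from the training phase.
        Map<String, Double> termSupport = new HashMap<>();
        termSupport.put("pattern", 0.6);
        termSupport.put("mining", 0.4);

        List<List<String>> incoming = Arrays.asList(
                Arrays.asList("oil", "price"),
                Arrays.asList("pattern", "mining", "text"));
        System.out.println(rank(incoming, termSupport));
    }
}
```

Documents that share no term with the learned supports receive weight 0 and fall to the end of the ranking.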

ADVANTAGES OF PROPOSED SYSTEM:

* The proposed approach improves the accuracy of evaluating term weights.
* This is because the discovered patterns are more specific than whole documents.
* It avoids the issues of the phrase-based approach by using the pattern-based approach.
* Pattern mining techniques can be used to find various text patterns.

LIST OF MODULES:

1. Loading document
2. Text Preprocessing
3. Pattern taxonomy process
4. Pattern deploying
5. Pattern evolving

MODULES DESCRIPTION:

1. Loading document
This module loads the list of all documents. The user retrieves one of the documents, which is then passed to the next process, preprocessing.

2. Text Preprocessing
This module preprocesses the retrieved document. Two processes are carried out: 1) stop word removal and 2) text stemming. Stop words are words which are filtered out prior to or after processing of natural language data. Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form.
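
The two preprocessing steps can be sketched in Java as follows. This is only an illustration: the stop list is a tiny assumed sample, and the stemmer is a crude suffix stripper standing in for a real algorithm such as Porter's, which the text does not name. The class TextPreprocessor is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical preprocessing helper: stop word removal followed by stemming.
final class TextPreprocessor {

    // Tiny illustrative stop list (assumption); real lists are much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "are", "of", "in", "to", "and", "or", "for", "on"));

    // Crude suffix stripping: removes the first matching suffix as long as
    // at least three characters remain. Not a linguistically correct stemmer.
    static String stem(String word) {
        String[] suffixes = {"ing", "ed", "es", "s"};
        for (String suffix : suffixes) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Lower-cases the text, splits on non-letters, drops stop words, stems the rest.
    static List<String> preprocess(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stem(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(preprocess(
                "The system is loading the documents and extracting patterns"));
    }
}
```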

3. Pattern taxonomy process
In this module the documents are split into paragraphs, and each paragraph is treated as an individual document. The set of terms is extracted from each of these documents. The terms are extracted from the set of positive documents.
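
A minimal Java sketch of this step is given below. It assumes that paragraphs are separated by blank lines and that a pattern is a set of terms occurring together in at least minSup paragraphs, found with a simple level-wise search. The class ParagraphPatternMiner and this exact search strategy are illustrative assumptions, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical miner: frequent termsets over the paragraphs of one document.
final class ParagraphPatternMiner {

    // Splits on blank lines (assumption) and turns each paragraph into a term set.
    static List<Set<String>> paragraphs(String document) {
        List<Set<String>> result = new ArrayList<>();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            Set<String> terms = new TreeSet<>();
            for (String token : paragraph.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
            if (!terms.isEmpty()) {
                result.add(terms);
            }
        }
        return result;
    }

    // Number of paragraphs containing every term of the candidate set.
    static int support(Set<String> termset, List<Set<String>> paragraphs) {
        int count = 0;
        for (Set<String> p : paragraphs) {
            if (p.containsAll(termset)) {
                count++;
            }
        }
        return count;
    }

    // Level-wise search: keep candidates with support >= minSup, then join
    // kept sets of size k into candidates of size k + 1.
    static Map<Set<String>, Integer> frequentTermsets(List<Set<String>> paragraphs, int minSup) {
        Map<Set<String>, Integer> frequent = new LinkedHashMap<>();
        Set<Set<String>> level = new LinkedHashSet<>();
        for (Set<String> p : paragraphs) {
            for (String term : p) {
                Set<String> single = new TreeSet<>();
                single.add(term);
                level.add(single);
            }
        }
        while (!level.isEmpty()) {
            List<Set<String>> kept = new ArrayList<>();
            for (Set<String> candidate : level) {
                int s = support(candidate, paragraphs);
                if (s >= minSup) {
                    frequent.put(candidate, s);
                    kept.add(candidate);
                }
            }
            Set<Set<String>> next = new LinkedHashSet<>();
            for (int i = 0; i < kept.size(); i++) {
                for (int j = i + 1; j < kept.size(); j++) {
                    Set<String> union = new TreeSet<>(kept.get(i));
                    union.addAll(kept.get(j));
                    if (union.size() == kept.get(i).size() + 1) {
                        next.add(union);
                    }
                }
            }
            level = next;
        }
        return frequent;
    }

    public static void main(String[] args) {
        String doc = "text mining pattern\n\npattern mining text data\n\ndata base";
        System.out.println(frequentTermsets(paragraphs(doc), 2));
    }
}
```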

4. Pattern deploying
The discovered patterns are summarized. The d-pattern algorithm composes all patterns discovered in the positive documents into d-patterns. Term supports are then calculated for all terms in the d-patterns; the support of a term is its evaluated weight.
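
One plausible reading of this step is sketched below in Java. It assumes that a d-pattern maps each term to the summed support of the patterns containing it, that each d-pattern is normalized to sum to 1, and that a term's support is its total normalized weight over all positive documents. These rules and the class PatternDeployer are assumptions made for illustration; the text does not spell out the exact formula.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical deployer: turns per-document patterns into term supports.
final class PatternDeployer {

    // Composes one document's patterns (termset -> support) into a d-pattern
    // (term -> summed support of the patterns containing that term).
    static Map<String, Double> compose(Map<Set<String>, Integer> patterns) {
        Map<String, Double> dPattern = new TreeMap<>();
        for (Map.Entry<Set<String>, Integer> e : patterns.entrySet()) {
            for (String term : e.getKey()) {
                dPattern.merge(term, e.getValue().doubleValue(), Double::sum);
            }
        }
        return dPattern;
    }

    // Scales the weights of one d-pattern so that they sum to 1.
    static Map<String, Double> normalize(Map<String, Double> dPattern) {
        double total = 0.0;
        for (double v : dPattern.values()) {
            total += v;
        }
        Map<String, Double> out = new TreeMap<>();
        if (total == 0.0) {
            return out;
        }
        for (Map.Entry<String, Double> e : dPattern.entrySet()) {
            out.put(e.getKey(), e.getValue() / total);
        }
        return out;
    }

    // Term support = sum of the term's normalized weight over all positive documents.
    static Map<String, Double> termSupports(List<Map<Set<String>, Integer>> positiveDocs) {
        Map<String, Double> support = new TreeMap<>();
        for (Map<Set<String>, Integer> patterns : positiveDocs) {
            for (Map.Entry<String, Double> e : normalize(compose(patterns)).entrySet()) {
                support.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        return support;
    }

    public static void main(String[] args) {
        // Illustrative patterns only.
        Map<Set<String>, Integer> doc1 = new HashMap<>();
        doc1.put(new HashSet<>(Arrays.asList("text")), 3);
        doc1.put(new HashSet<>(Arrays.asList("text", "mining")), 1);
        Map<Set<String>, Integer> doc2 = new HashMap<>();
        doc2.put(new HashSet<>(Arrays.asList("mining")), 2);

        List<Map<Set<String>, Integer>> positives = new ArrayList<>();
        positives.add(doc1);
        positives.add(doc2);
        System.out.println(termSupports(positives));
    }
}
```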


5. Pattern evolving
This module identifies the noisy patterns in documents. Sometimes the system falsely identifies a negative document as positive, which introduces noise into the positive documents. A noisy pattern is called an offender. If the positive documents contain a partial conflict offender, the reshuffle process is applied.
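
The reshuffle idea can be sketched in Java under stated assumptions. Here an offender is a d-pattern that shares terms with a noise negative document; a partial conflict means only some of its terms are shared. The redistribution rule (divide the weight of shared terms by a coefficient mu greater than 1 and spread the freed weight proportionally over the other terms) and the class PatternEvolver are assumptions made for illustration, since the text gives no formula.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical evolver: reduces the influence of terms shared with a noise document.
final class PatternEvolver {

    enum Conflict { NONE, PARTIAL, COMPLETE }

    // Classifies a d-pattern against the terms of a noise negative document.
    static Conflict classify(Set<String> patternTerms, Set<String> noiseDoc) {
        int shared = 0;
        for (String term : patternTerms) {
            if (noiseDoc.contains(term)) {
                shared++;
            }
        }
        if (shared == 0) {
            return Conflict.NONE;
        }
        return shared == patternTerms.size() ? Conflict.COMPLETE : Conflict.PARTIAL;
    }

    // Assumed reshuffle rule for partial conflicts: shared terms keep 1/mu of
    // their weight; the removed weight is given to the remaining terms in
    // proportion to their current weight, so the total is preserved.
    static Map<String, Double> shuffle(Map<String, Double> dPattern, Set<String> noiseDoc, double mu) {
        Map<String, Double> out = new TreeMap<>(dPattern);
        if (classify(dPattern.keySet(), noiseDoc) != Conflict.PARTIAL) {
            return out; // nothing to reshuffle
        }
        double offering = 0.0;
        double base = 0.0;
        for (Map.Entry<String, Double> e : dPattern.entrySet()) {
            if (noiseDoc.contains(e.getKey())) {
                offering += e.getValue() * (1.0 - 1.0 / mu);
            } else {
                base += e.getValue();
            }
        }
        for (Map.Entry<String, Double> e : dPattern.entrySet()) {
            if (noiseDoc.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue() / mu);
            } else {
                out.put(e.getKey(), e.getValue() * (1.0 + offering / base));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative weights and noise document only.
        Map<String, Double> dPattern = new TreeMap<>();
        dPattern.put("oil", 0.5);
        dPattern.put("price", 0.3);
        dPattern.put("text", 0.2);
        Set<String> noiseDoc = new HashSet<>(Arrays.asList("oil", "war"));
        System.out.println(shuffle(dPattern, noiseDoc, 2.0));
    }
}
```

In this sketch a complete conflict offender is left unchanged; a full implementation would need its own rule for that case, for example removing the pattern.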

SYSTEM CONFIGURATION:-

HARDWARE REQUIREMENTS:-

Processor - Pentium III
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Key Board - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA


SOFTWARE REQUIREMENTS:-

Operating System : Windows 95/98/2000/XP
Front End : Java
Tool : Netbeans IDE

