Key-Phrase Extraction For Classification
TEI of Athens/Dept. of Informatics, Professor, Aegaleo, Greece
Abstract: In this paper we consider the problem of extracting key-phrases from a bilingual text collection and using them for text classification. A key-phrase could be defined as a sequence of words of a given size, in a given partial order, that occurs within a sentence. We describe an algorithm for the discovery of key-phrases. Then, a framework for handling multilingual texts / documents is described which combines the use of the traditional vector space model with a new similarity measure based on the key-phrases. This framework is used for finding the most similar documents of a training set to any new document and selecting the classes of the similar documents as the most plausible ones for the new document. Some experimental results are also presented.

Introduction

Phrase extraction is the subject of interesting research accompanied by various experimental and operational tools. It is worth mentioning here that these tools are usually oriented towards the extraction of information from Web applications. As an example we can mention the case of the KEA system [1], which implements in Java a rather simple algorithm for extracting phrases from English text.

In the case of the Greek language there is a rich syntactic and inflectional (grammatical) system that implies further difficulties in the extraction process. Hence, the use of a stop-word list, some morphological analysis, stemming, etc. are prerequisites for handling Greek-Latin text. It is also interesting to see the problem of extracting phrases from texts written in other languages with a rich inflectional system [2].

We must also stress the importance of using these extracted phrases as terms characterizing the document, and of their storage and organization as a basis for effective free-text searching.

Key-phrase extraction

Most machine learning and text mining techniques are adapted towards the analysis of text collections. Texts are composed of words or phrases and have an inherent sequential structure. Such a text can be viewed as a sequence of words, stop-words, punctuation marks, parentheses and key-phrases, where each key-phrase has an associated frequency, e.g. the number of occurrences. Let us see a bilingual portion of a discharge letter covering past history and presentation: "71 years old male patient with a history of a AAA (Abdominal Aortic Aneurysm) repair 13 months ago presented with RUQ (Right Upper Quadrant) pain and a palpable mass of two months duration".

Stop-words could be words such as with, a, of, which have no implication in retrieving texts. Key-phrases could be sequences of words such as the following:

  AAA  Abdominal Aortic Aneurysm
  RUQ  Right Upper Quadrant pain

The basic problem in analyzing such a text collection is to find key-phrases, i.e., sequences of words occurring frequently close to each other. For example, a phrase "A (followed by) B (followed by) C (followed by) D", where A, B, C, D are four distinct words, must occur a specific number of times in order to be considered a key-phrase. Note that in the sentence there can be other words occurring between these ones. The user must define how close is close enough by giving the width of the word window within which the key-phrase must occur. The user must also specify how often a key-phrase has to occur to be considered frequent.
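As an illustration of the window and frequency parameters, the following is a minimal sketch (our own, not part of the original system; the greedy matching strategy is an assumption) of counting the occurrences of an ordered word sequence within a word window:

  def occurs_in_window(words, phrase, width):
      # Count occurrences of `phrase` (an ordered tuple of words) in the
      # tokenized sentence `words`, allowing other words in between, as
      # long as the whole match fits inside a window of `width` words.
      count = 0
      for start in range(len(words)):
          if words[start] != phrase[0]:
              continue
          matched = 1
          # Greedily match the remaining phrase words inside the window.
          for j in range(start + 1, min(start + width, len(words))):
              if matched < len(phrase) and words[j] == phrase[matched]:
                  matched += 1
          if matched == len(phrase):
              count += 1
      return count

  sentence = ("71 years old male patient with a history of a AAA "
              "Abdominal Aortic Aneurysm repair").lower().split()
  print(occurs_in_window(sentence, ("abdominal", "aortic", "aneurysm"), 5))  # 1

A phrase would then be frequent if such counts, aggregated over the documents under consideration, reach the user-given threshold.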
Discovery of frequent key-phrases

Mannila et al. [6] describe an algorithm for discovering frequent episodes in a telecommunication network alarm database. The essential points of their method are the following:

a. Input data is a flat sequence of ordered episodes (faults) and is not organized into other structural levels. This fact implies that the situation in our case is different: we have document / text collections where words and phrases compose sentences, sentences compose documents and documents compose the collection.

b. Their main idea is that for every frequent sequence of events all the subsequences are at least equally frequent. Therefore, the construction of candidate sequences of width n+1 (Cn+1) can be based on the frequent sequences of width n (Ln). The implication is that the search space could be reduced.
c. The episodes of a sequence are successive, but in the sequence there can be other events occurring between the episodes.

We think that the choice of sequences of events based on high frequencies is a reliable method for forecasting in general. An adaptation of such a method in a text processing environment can be helpful, especially in the creation of a type-ahead wizard.

However, if key-phrases (sequences of words) are used as indexing terms for information retrieval, it is better to choose phrases that exist in a few texts (if the candidate phrases exist in many texts then they are useless for retrieval purposes) and are quite frequent within these texts. Therefore, it is better to use appropriate measures that prefer / choose such phrases. A simple measure in this category is the following:

  freq(P,D) \times \frac{N}{docfreq(P)}

where freq(P,D) is the frequency of phrase P in document D, docfreq(P) is the number of documents in the collection that include phrase P, and N is the number of documents in the collection.

A variation of the above measure is used in the KEA system [3] to build a prediction model based on a training set of documents. The following equation describes this measure:

  TF \times IDF = \frac{freq(P,D)}{size(D)} \times \left( -\log_2 \frac{docfreq(P)}{N} \right)

where size(D) is the size of document D in words.
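For concreteness, both measures can be computed as follows (a sketch under our own naming; the counts are assumed to be given):

  import math

  def simple_measure(freq_p_d, n_docs, docfreq_p):
      # freq(P,D) x N / docfreq(P): prefers phrases that are frequent
      # within a document but occur in few documents of the collection.
      return freq_p_d * n_docs / docfreq_p

  def tf_idf(freq_p_d, size_d, docfreq_p, n_docs):
      # The KEA-style measure: (freq(P,D) / size(D)) x -log2(docfreq(P) / N).
      return (freq_p_d / size_d) * -math.log2(docfreq_p / n_docs)

  # A phrase occurring 3 times in a 200-word letter and in 2 of 29 documents:
  print(simple_measure(3, 29, 2))   # 43.5
  print(tf_idf(3, 200, 2, 29))      # ~0.058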
If key-phrases are used as features for text / document classification [4, 5] then the frequent key-phrases are inappropriate for such a task. If the choice of key-phrases for text classification is based on measures such as the one used by the KEA system, then there are some potential problems:

1. A candidate key-phrase that exists in many documents of only one class (and not in another class) could be erroneously rejected if the number of documents of this class is greater than the number of documents of other classes.

2. A candidate key-phrase could be erroneously chosen in case it exists in a small subset of texts of a numerous / dense class and all these texts are dedicated to a specific subtopic of the topic of the class.

3. A candidate key-phrase is chosen because it exists only in a few texts in the collection and is quite frequent within these texts, but these few texts that contain the candidate key-phrase are spread over a lot of classes.

As a first conclusion, the choice of the key-phrases must not be based on candidate phrases that are frequent within the whole text collection, or on measures like the one used in the KEA system. Instead of using these phrases, frequent in the whole collection, for text classification, we can use key-phrases which are frequent within the documents of one or few classes but are not frequent in the documents of the rest of the classes in the training set. We also estimate that the selection of key-phrases based on some syntactic structures poses some extra complexity (use of morphological part-of-speech taggers and syntactic analyzers) and works restrictively in the selection of key-phrases for classification.

The above discussion, combined with the analysis of the method in [6], has influenced us in the construction of a new algorithm for key-phrase extraction for classification.

We formalize the problem of phrase extraction for classification in the following way. Given a collection of documents subdivided into classes, a window width and a frequency threshold, find all key-phrases that occur frequently enough in one or few classes but do not occur frequently enough in other classes. We describe an algorithm for solving this task. The algorithm has two alternating phases: building new candidate key-phrases, and evaluating how often these occur in a class of the collection.

The idea of building candidate patterns from smaller ones is incorporated into the algorithm. Such an idea has been profitably used in the discovery of association rules and occurs also in other contexts [6, 7]. Adapting the main ideas discussed in [6], we can claim that the efficiency of our algorithm is based on the following fact: potentially, a very large number of candidate key-phrases has to be checked, and we can reduce the search space by building larger key-phrases from smaller ones. In other words, it is only necessary to test the occurrences of key-phrases all of whose sub key-phrases are frequent.

Algorithm

We give an algorithm for finding key-phrases that occur frequently enough in one or few classes but do not occur frequently enough in many classes:

  1  For every class of the training set do
  2    For every document of the class do
  3      Stemming
  4      Stop-word removal
  5    End {For every document of the class}
  6    Choose the most frequent stems of the class (P0 - parameter)
  7    Form the candidate double-word phrases (C2) from the frequent stems (L1)
  8    Choose the most frequent double-word phrases (L2) (W1 and P1 - parameters)
  9    For x = 3, 4 do
  10     Form the candidate x-width word phrases (Cx) from the frequent
         (x-1)-width word phrases (Lx-1)
  11     Choose the most frequent x-width word phrases (Lx)
         (P2 and W2, P3 and W3 - parameters)
  12   End {For x = 3, 4 do}
  13   Compose an integrated list by joining Lx (for x = 2, 3, 4). This join
       forms the frequent word phrases of the class (LFC)
  14 End {For every class of the training set}
  15 Integrate / join the lists of frequent word phrases of all classes of
     the training set
  16 Reject the frequent word phrases that exist in many classes
     (Pt - parameter). The rest of the frequent word phrases form the
     set of key-phrases or «Authority list»
  17 Form the dictionary of «Terms». It is the list of stems that are
     components of the key-phrases of the «Authority list».

Parameters:
  P0  percentage of texts of the class that must contain a stem,
  W1  width of window that covers 2-word phrases,
  P1  percentage of texts of the class that must contain a 2-word phrase,
  W2  width of window that covers 3-word phrases,
  P2  percentage of texts of the class that must contain a 3-word phrase,
  W3  width of window that covers 4-word phrases,
  P3  percentage of texts of the class that must contain a 4-word phrase,
  Pt  percentage of classes that can contain a key-phrase.
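The pipeline of steps 1-16 could be organized along the following lines. This is a structural sketch under our own assumptions, not the authors' implementation: stem, remove_stopwords and windowed_freq are assumed helpers (windowed_freq in the spirit of occurs_in_window above), build_candidates is sketched after Figure 1, and the P parameters are passed as fractions:

  from collections import Counter

  def frequent_stems(docs, p0):
      # Step 6: stems contained in at least a fraction p0 of the class's documents.
      df = Counter(s for d in docs for s in set(d))
      return {s for s, c in df.items() if c / len(docs) >= p0}

  def extract_authority_list(training_set, params):
      # `training_set` maps a class label to its documents (token lists).
      per_class = {}
      for label, docs in training_set.items():
          docs = [remove_stopwords([stem(w) for w in d]) for d in docs]  # steps 2-5
          lx = {(s,) for s in frequent_stems(docs, params["P0"])}        # step 6
          lfc = set()
          for x in (2, 3, 4):                                            # steps 7-12
              cx = build_candidates(lx, x)                               # building phase
              width, p = params["W%d" % (x - 1)], params["P%d" % (x - 1)]
              lx = {ph for ph in cx
                    if sum(windowed_freq(d, ph, width) > 0
                           for d in docs) / len(docs) >= p}              # recognition phase
              lfc |= lx                                                  # step 13
          per_class[label] = lfc
      # Steps 15-16: reject phrases occurring in too large a share of classes.
      class_freq = Counter(ph for lfc in per_class.values() for ph in lfc)
      return {ph for ph, c in class_freq.items()
              if c / len(per_class) <= params["Pt"]}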
Adopting the idea proposed in Mannila et al., our algorithm works iteratively, alternating between building (steps 7 and 10) and recognition phases (steps 8 and 11). In the building phase of an iteration i, a collection Ci of new candidate key-phrases of distinct words is built, using the information available from smaller frequent key-phrases. Then, these candidate key-phrases are recognized in the documents of the class and their frequencies are computed. The collection Li consists of the frequent key-phrases in Ci. In the next iteration i+1, candidate key-phrases in Ci+1 are built using the information about the frequent key-phrases in Li. The algorithm starts by constructing C1 to contain all key-phrases consisting of single words. At the end of each step, the list of frequent key-phrases of the processed class is built (step 13). At the end, the algorithm composes the Authority list (steps 15 and 16).

Steps 7 and 10 are based on the second (b) essential point of Mannila's paper [6]. The set of candidate 2-word phrases (C2) must contain key-phrases of length 2 (key-phrases including two stems of distinct words). To construct this set, we form the Cartesian product and then remove all the tuples that have the same elements. Figure 1 illustrates an example of the application of step 10 of the algorithm: in this case, the construction of the sixth class of key-phrases (C6) from the frequent 5-word phrases (L5) is depicted.

[Figure 1: construction of C6 based on L5.]
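Steps 7 and 10 could be rendered as follows; a sketch, where the pruning rule is the subsequence property (b) of [6] and the x = 2 branch is the Cartesian-product construction described above:

  from itertools import product

  def build_candidates(frequent, x):
      # Build the candidate x-word phrases Cx from the frequent
      # (x-1)-word phrases L(x-1), kept as tuples of stems.
      if x == 2:
          stems = {p[0] for p in frequent}
          # Cartesian product minus the tuples with two equal elements.
          return {(a, b) for a, b in product(stems, stems) if a != b}
      # Join phrases overlapping in x-2 words, then keep only candidates
      # whose every (x-1)-word subsequence is itself frequent.
      joined = {p + (q[-1],) for p in frequent for q in frequent
                if p[1:] == q[:-1]}
      def subseqs(p):
          return (p[:i] + p[i + 1:] for i in range(len(p)))
      return {p for p in joined if all(s in frequent for s in subseqs(p))}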
Similarity based classification

Karanikolas and Skourlas [5] presented the idea that the classification of new medical documents can be based on their similarity to existing documents (of a "training" set). Such an instance-based learning method assumes that similar documents must be classified in the same category or (in other words) must share the same classification code (e.g. the same ICD code). According to this approach the text collection is divided into a number of classes and each document of each class is characterized by a number of key-phrases. For each document in the collection the existing key-phrases in the document can have a frequency, etc.

The vector space model

In the popular vector space model a data set of n unique terms is specified, called the index terms of the document collection, and every text can be represented by a vector of weights of the terms in the document. In our case we use a set of m key-phrases instead of simple terms and the vector representation of each document can be:

  (kp1, kp2, …, kpm)

where kpj = 1 if the key-phrase j is present in the text, and 0 otherwise.

A query is a new (unclassified) document (text) and can be represented in the same manner. The text and query vectors can be envisioned as an m-dimensional vector space. A vector matching operation, based on the cosine correlation used to measure the cosine of the angle between vectors, can be used to compute the similarity. Hence, the following equation (adapted from Lucarella D., 1988, [8]) gives us a well-known method to measure the similarity of a text Di of the training set against a new text Dnew (or query Q):

  S(D_i, D_{new}) = \frac{\sum_{j=1}^{m} q_j \, kp_{ij}}{L_{D_{new}} \cdot L_{D_i}},
  \quad L_{D_{new}} = \sqrt{\sum_{j=1}^{m} q_j^2},
  \quad L_{D_i} = \sqrt{\sum_{j=1}^{m} kp_{ij}^2}

where m is the number of key-phrases used in the collection, kpij is equal to 1 if the key-phrase j exists in document Di (of the training set) and equal to 0 otherwise, and qj is the weight of key-phrase j in the new document. The following equation can be used to measure the term qj:

  q_j = \log_2 \frac{ClassCount}{ClassFreq_j}

where ClassCount is the number of classes of the training set, and ClassFreqj is the number of classes that include the key-phrase j.
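Under the stated model (binary kp vectors, log2 class-based weights, cosine denominator), the classification step could look like this; the function and field names are our own illustrative assumptions:

  import math

  def q_weight(class_count, class_freq_j):
      # q_j = log2(ClassCount / ClassFreq_j)
      return math.log2(class_count / class_freq_j)

  def similarity(q, kp_i):
      # S(Di, Dnew): q is the weight vector of the new text, kp_i the
      # 0/1 key-phrase vector of training document Di.
      num = sum(qj * kpj for qj, kpj in zip(q, kp_i))
      l_new = math.sqrt(sum(qj * qj for qj in q))
      l_i = math.sqrt(sum(kpj * kpj for kpj in kp_i))
      return num / (l_new * l_i) if l_new and l_i else 0.0

  def most_plausible_classes(q, training_vectors, top_k=5):
      # Rank the training documents and read off the classes of the
      # most similar ones, as in the instance-based approach above.
      ranked = sorted(training_vectors, reverse=True,
                      key=lambda doc: similarity(q, doc["kp"]))
      return [doc["label"] for doc in ranked[:top_k]]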
Experimental results

In the next table (Table 1) we depict the classes of texts of the training set, classified by ICD-9 codes.

Table 1: Training Set

  Class    ICD9 code that            Number of documents
  number   characterizes the class   (discharge letters) in the class
  1        0010                      4
  2        122.8                     4
  3        151                       4
  4        153                       4
  5        153.3                     5
  6        154.1                     4
  7        155.0                     4

First, we applied our algorithm to the training set (29 discharge letters) and the «Set of key-phrases» / «Authority list» was constructed. Then, every text of the training set was submitted as a new text for classification (for assigning the appropriate ICD-9 code). The similarity of the «new» text with all the texts of the training set was calculated using the proposed measure of similarity. In the next table (Table 2) we present the number of documents of the same class as the «new» document. More precisely, we present the number of retrieved documents of the same class as the «new» document that belong to the first five most similar ones and the first ten most similar ones, respectively. It seems that we have promising / encouraging results.
Table 2: Results
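The evaluation just described could be scripted along these lines (a sketch; whether the resubmitted text is excluded from its own ranking is our assumption, and similarity and the vector fields are as in the previous sketch):

  def evaluate(training_vectors):
      # For every training text, resubmitted as a «new» text, count the
      # same-class documents among its 5 and 10 most similar neighbours.
      rows = []
      for i, doc in enumerate(training_vectors):
          others = training_vectors[:i] + training_vectors[i + 1:]
          ranked = sorted(others, reverse=True,
                          key=lambda d: similarity(doc["q"], d["kp"]))
          same5 = sum(d["label"] == doc["label"] for d in ranked[:5])
          same10 = sum(d["label"] == doc["label"] for d in ranked[:10])
          rows.append((doc["id"], same5, same10))
      return rows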
Funds

References

[3] Eibe Frank et al. Domain-Specific Keyphrase Extraction. International Joint Conference on Artificial Intelligence, 1999.

[4] N. Karanikolas and C. Skourlas. Automatic Diagnosis Classification of patient discharge letters. MIE 2002: XVIIth International Congress of the European Federation for Medical Informatics, August 25-29, 2002, Budapest, Hungary.

[5] N. Karanikolas, C. Skourlas, A. Christopoulou and T. Alevizos. Medical Text Classification based on Text Retrieval techniques. MEDINF 2003: 1st International Conference on Medical Informatics & Engineering, October 9-11, 2003, Craiova, Romania.

[6] Heikki Mannila, Hannu Toivonen and A. Inkeri Verkamo. Discovering frequent episodes in sequences. KDD-95: First International Conference on Knowledge Discovery and Data Mining, August 20-21, 1995, Montreal, Canada.

[7] Heikki Mannila and Hannu Toivonen. Discovering generalized episodes using minimal occurrences. KDD-96: Second International Conference on Knowledge Discovery and Data Mining, August 1996, Portland, Oregon. AAAI Press.

[8] Lucarella, D. A document retrieval system based on nearest neighbour searching. Journal of Information Science, 14, 25-33, 1988.