Zhang 2015


A Review on Text Mining

Yu Zhang, Mengdong Chen and Lianzhong Liu


School of Computer Science and Engineering
Beihang University
Beijing, China
{yzhang & cmd & lz_liu}@buaa.edu.cn

978-1-47-- /1/$31.00 ©201 IEEE

Abstract: Because large amounts of unstructured text data are generated on the Internet, text mining is believed to have high commercial value. Text mining is the process of extracting previously unknown, understandable, potentially useful and practical patterns or knowledge from a collection of text data. This paper introduces the research status of text mining. Several general models are then described to give an overall view of the field. Finally, we classify text mining work by application into text categorization, text clustering, association rule extraction and trend analysis.

Keywords: text mining; data mining; knowledge discovery

I. INTRODUCTION

With the rapid development of information technology and the widespread use of networks, the Internet has gradually become an indispensable part of people's lives. Web pages and social network sites generate large amounts of unstructured text data, such as blogs, forum posts and technical documentation. These data directly reflect people's behavior and thoughts and contain a great deal of information, but they are extremely difficult to process because of their volume and variety of forms. Meanwhile, the demand for analyzing text data is rising. How to acquire the information people need from large amounts of unstructured text data has therefore become a research hotspot in data mining and information research, and text mining emerged in response.

Text mining [1], also known as knowledge discovery in textual databases (KDT) [2] or text data mining [3], creates new and interesting knowledge. It is defined as the process of extracting previously unknown, understandable, potentially useful and practical patterns or knowledge from a massive collection of unstructured text data, or corpus.

As a branch of data mining, text mining is believed to have higher commercial value than data mining because 80% of a company's information is contained in text documents [4]. However, text mining is also more complex because text data are unstructured. It is a comprehensive research area that draws on artificial intelligence, machine learning, mathematical statistics, database systems and other fields.

Section II of this paper introduces the history and research status of text mining. Section III describes some general models. Section IV classifies text mining work by application. Section V concludes the paper.

II. HISTORY AND RESEARCH STATUS

In 1958, Hans Peter Luhn [5] published an article in the IBM Journal describing a business intelligence system that uses data-processing machines to extract and encode documents automatically and classifies documents by means of word frequency statistics. It is recognized as the earliest definition of Business Intelligence (BI) and as the prototype of text mining.

Subsequently, many scholars carried out fruitful research in this field; some representative work is listed here. In 1960, Maron and Kuhns published a paper reporting a novel technique for literature indexing and searching in a mechanized library system [6], which is the first paper on automatic classification. KDT was first proposed by Feldman et al. [2] at the 1st International Conference on Knowledge Discovery and Data Mining in 1995. Larsen and Aone [7] describe an unsupervised, near-linear-time text clustering system that is fast and effective and offers a number of algorithm choices for each phase. They used the F-measure (a combination of precision and recall) to gauge the quality of the generated hierarchies.

There is also much other outstanding research, including text representation [8] and model construction [1][9]-[12]; dimensionality reduction in feature extraction [13]-[14]; mining algorithms for text classification [15]-[17] and clustering [18]-[20]; deep semantic mining based on natural language processing [21]-[22]; and text mining applications in different fields, such as literature mining in molecular biology [23]-[24], stock prediction in finance and securities [25], web mining on the Internet [26]-[28], digital libraries [29] and so on.

Text mining has now moved from the experimental stage to the practical stage. There are many successful text mining systems, such as IBM Intelligent Miner for Text [30], Text Miner [31] and VisualText [32].

III. TEXT MINING MODELS

Text mining generally consists of three steps: text preprocessing, text mining operations and postprocessing. Text preprocessing tasks, including data selection, classification and feature extraction, convert the documents into intermediate forms suitable for different mining purposes. Text mining operations are the central part of a text mining system and include clustering, association rule discovery, trend analysis, pattern discovery and other knowledge discovery algorithms. Postprocessing tasks operate on the data or knowledge produced by the text mining operations, for example the evaluation, selection, interpretation and visualization of knowledge.
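As a minimal sketch of the three steps (not the design of any system surveyed here), the following Python code chains a toy preprocessing step, a simple mining operation and a postprocessing filter. The function names, the stop-word list and the sample documents are all assumptions made for illustration; document-frequency counting stands in for a real mining algorithm.

```python
import re
from collections import Counter

# Toy stop-word list; an assumption for this sketch, not a standard resource.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def preprocess(documents):
    """Text preprocessing: turn raw documents into an intermediate form (token lists)."""
    return [
        [tok for tok in re.findall(r"[a-z]+", doc.lower()) if tok not in STOPWORDS]
        for doc in documents
    ]

def mine(token_lists):
    """Mining operation: count in how many documents each term occurs."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # set(): count each term once per document
    return doc_freq

def postprocess(doc_freq, min_docs=2):
    """Postprocessing: keep terms found in at least min_docs documents, sorted for display."""
    kept = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical sample documents.
docs = [
    "Text mining extracts patterns from text.",
    "Clustering is a text mining operation.",
    "Trend analysis tracks topics over time.",
]
patterns = postprocess(mine(preprocess(docs)))
print(patterns)  # [('mining', 2), ('text', 2)]
```

Keeping the intermediate form (here, token lists) separate from the mining step is what lets the same preprocessed data feed different mining operations.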
So far, many common text mining models have been proposed. The earliest is the KDT (Knowledge Discovery in Text) system [10] proposed by Feldman et al. The general architecture of the KDT system is shown in Fig. 1. The system takes two inputs: a collection of keyword-labeled documents, and a keyword hierarchy, which is a directed acyclic graph (DAG) of terms in which each term is identified by a unique name. In general, the keyword hierarchy gives the hierarchical relationships between the main concepts of the application domain and is part of the domain background knowledge. The discovery operation module mines the collection of keyword-labeled documents using this background knowledge and obtains the patterns people need, which the presentation module displays in a user-friendly form such as graphics.

Figure 1. KDT system architecture

Tan [1] put forward a general framework consisting of two components, which is a representative text mining model. The two components are text refining, which transforms free-form text documents into a chosen intermediate form, and knowledge distillation, which deduces patterns or knowledge from the intermediate form (cf. Fig. 2). The intermediate form (IF) can be semi-structured, such as a conceptual graph representation, or structured, such as a relational data representation. It can also be document-based, wherein each entity represents a document, or concept-based, wherein each entity represents an object or concept of interest in a specific domain. Mining a document-based IF deduces patterns and relationships across documents; document clustering, visualization and categorization are examples. Knowledge distillation from a concept-based IF derives patterns and relationships across objects or concepts; text mining operations such as predictive modeling and associative discovery fall into this category. A document-based IF can be transformed into a concept-based IF by realigning or extracting the relevant information according to the objects of interest in a specific domain. It follows that a document-based IF is usually domain-independent, whereas a concept-based IF is domain-dependent.

Figure 2. A text mining framework

Mothe et al. [11] add a document warehouse to the general model. Figure 3 provides an overview of the steps needed to create a document warehouse. Information selection gathers the information related to the domain of interest, which is described through an information need. Information reformatting modifies the format of the information, because documents are not necessarily structured. Information cleaning decides what information has to be kept; this includes the dimensions to take into account and possibly the values to be considered, as well as resolving some syntactic and semantic problems (e.g. synonyms). Information summarization corresponds mainly to aggregation functions applied to numerical data according to the values of some of the attributes. Classification, clustering and factorial analysis (principal component analysis, correspondence analysis) are easily performed on a document warehouse.

Figure 3. Overview of the document warehouse creation

In addition to general text mining models, many models have been proposed for specific application fields. For instance, Shehata et al. propose a novel concept-based mining model [33] for text clustering. The model, whose input is a raw text document, consists of concept-based term analysis and a concept-based similarity measure. They extend the model in [34]. The extended concept-based mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis and a concept-based similarity measure, as depicted in Fig. 4. The model can efficiently find significant matching concepts between documents according to the semantics of their sentences.
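To make the two KDT inputs described at the start of this section more concrete, the following Python sketch represents a keyword hierarchy as a small DAG and a collection of keyword-labeled documents, then computes a simple keyword distribution over the children of one node. This is an illustrative simplification under stated assumptions, not the KDT implementation: the hierarchy, the documents and the function keyword_distribution are hypothetical, and the distribution is taken here to be the share of selected documents labeled with each child keyword.

```python
from collections import Counter

# Hypothetical keyword hierarchy: parent term -> child terms (a small DAG).
hierarchy = {
    "topics": ["commodities", "countries"],
    "commodities": ["gold", "oil"],
    "countries": ["china", "usa"],
}

# Hypothetical keyword-labeled documents: document id -> set of keywords.
documents = {
    "d1": {"gold", "usa"},
    "d2": {"oil", "usa"},
    "d3": {"oil", "china"},
    "d4": {"oil"},
}

def keyword_distribution(parent, docs, dag):
    """Share of documents carrying each child keyword of `parent`,
    among the documents that carry at least one such child."""
    children = dag.get(parent, [])
    counts = Counter()
    selected = 0
    for labels in docs.values():
        hits = labels.intersection(children)
        if hits:
            selected += 1
            counts.update(hits)
    if selected == 0:
        return {}
    return {child: counts[child] / selected for child in children}

dist = keyword_distribution("commodities", documents, hierarchy)
print(dist)  # {'gold': 0.25, 'oil': 0.75}
```

Comparing such distributions between two document collections, for example from different time periods, is the kind of operation a discovery module could build on.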


Figure 4. Concept-based mining model

IV. THE CLASSIFICATION OF TEXT MINING

Although text mining is an emerging field, a great deal of research has already been done across many application areas. According to these application areas, text mining can be classified into text categorization, text clustering, association rule extraction and trend analysis.

A. Text Categorization

As an important text mining application, text categorization (TC) is a supervised learning process: it automatically determines the category of a text from its content under a given classification system. The study of TC dates back to the 1960s, when it was used mainly for indexing scientific literature. In the 1990s, the growing volume of text data and the strong demand for text processing led TC to develop fully. In recent years, TC has been applied in many fields, from automatic or semi-automatic text indexing to spam filtering, metadata generation, word sense disambiguation, hierarchical categorization of web pages, genre detection, etc.

Word sense disambiguation may be seen as a TC task [35]-[36] once we view word occurrence contexts as documents and word senses as categories. Gale et al. [35] propose a method to disambiguate senses that are usually associated with different topics. They use the class of Bayesian decision models that has been applied successfully in related tasks such as author identification and information retrieval.

Sentiment analysis is also an important research area of text categorization. Li et al. [37] calculate the orientation similarity of the words in a phrase based on sentiment weight priority and put forward the concept of a center word to calculate the orientation of the phrase from the combination of its words. Pang et al. [38] use a standard bag-of-features framework and employ three machine learning methods (Naive Bayes, maximum entropy classification and support vector machines) to classify sentiment, using movie reviews as data.

Other applications include e-mail filters that discard "junk" mail [39], language identification for texts of unknown language [40], the organization of patents into categories to make them easier to search [41], and so on.

B. Text Clustering

Text clustering is one of the earliest and most mature fields in text mining. It is an unsupervised process through which objects are classified into groups without predefined categories. Text clustering is based on the well-known cluster hypothesis: relevant documents tend to be more similar to each other than to nonrelevant ones. Clustering is useful in a wide range of data analysis fields, including data mining, document retrieval, image segmentation and pattern classification. It is mostly used to improve the precision and recall of information retrieval systems [42]-[44].

Text clustering can find topics in large-scale text data and is a powerful tool for text theme analysis. Clifton and Cooley [45] give a topic analysis method. It first extracts the named entities from the documents, then looks for frequent itemsets, i.e., groups of named entities that commonly occur together, and finally clusters the named entities grouped by the frequent itemsets using a hypergraph-based method [46]. Each cluster is represented as a set of named entities and corresponds to an ongoing topic in the corpus.

Topic tracking of dynamic text data is also an interesting subject of text clustering. Montes-y-Gómez et al. [47] propose a text mining method to obtain topics from online news and analyze the influence of peak news topics on other current news topics. A common phenomenon in news reports is the influence of a peak news topic, i.e., a topic with a one-time, short-term peak in frequency, on the other news topics. This kind of influence is called an ephemeral association. They propose a technique with which the observable associations are detected by simple statistical methods.

C. Association Rule Extraction

Association rule extraction, proposed by Agrawal et al. [48], is an important topic in text mining. It aims to find association relationships between different feature words in a text collection. The formal description of association rule extraction given in [49] is as follows.

Given a collection of indexed documents $D = \{d_1, d_2, \ldots, d_n\}$ and a set of items $A = \{w_1, w_2, \ldots, w_n\}$ composed of keywords, terms, phrases or concepts, let $W_i$ be a set of items. A document $d_i$ is said to contain $W_i$ if and only if $W_i \subseteq d_i$. An association rule is an implication of the form $W_i \Rightarrow W_j$, where $W_i \subset A$, $W_j \subset A$ and $W_i \cap W_j = \emptyset$. There are two important basic measures for association rules: support and confidence. The rule $W_i \Rightarrow W_j$ has support $s$ in the collection


of documents $D$ if $s\%$ of the documents in $D$ contain $W_i \cup W_j$. The support is calculated by the following formula:

$$\mathrm{Support}(W_i \cup W_j) = \frac{\text{Support count of } W_i \cup W_j}{\text{Total number of documents in } D} \qquad (1)$$

The rule $W_i \Rightarrow W_j$ holds in the collection of documents $D$ with confidence $c$ if, among the documents that contain $W_i$, $c\%$ also contain $W_j$. The confidence is calculated by the following formula:

$$\mathrm{Confidence}(W_i \Rightarrow W_j) = \frac{\mathrm{Support}(W_i \cup W_j)}{\mathrm{Support}(W_i)} \qquad (2)$$

Association rule extraction is broken into two steps: 1) generate all the itemsets whose support is greater than a user-specified minimum support (called minsupp); such sets are called frequent itemsets; and 2) use the identified frequent itemsets to generate the rules that satisfy a user-specified minimum confidence (called minconf).

A text mining system often produces a large number of association rules, but few of them are knowledge the user is interested in. Evaluating and selecting association rules is therefore very important for a practical text mining system. Reference [50] estimates the novelty of text-mined rules using semantic distance measures based on WordNet [51]: if the semantic distance is short, the rule is unlikely to be novel and may be of little use.

D. Trend Analysis

When the time dimension of text data is taken into account, the data can reflect how text topics change or be used to predict the development trend of objects [52]. Current research on trend analysis mainly targets time-ordered text data such as current news, financial reports, scientific literature and business reports [53].

Mei and Zhai [54] propose general probabilistic approaches to discover and summarize the evolutionary patterns of themes in a text stream by discovering latent themes from text, constructing an evolution graph of themes, and analyzing the life cycles of themes. To discover the evolutionary theme graph, their method first generates word clusters (i.e., themes) for each time period and then uses the Kullback-Leibler divergence measure to discover coherent themes over time. The evolution graph can reveal how themes change over time and how a theme in one time period has influenced other themes in later periods. They also propose a method based on hidden Markov models for analyzing the life cycle of each theme. This method first discovers the globally interesting themes and then computes the strength of each theme in each time period. This makes it possible not only to see the trends in the strength of themes, but also to compare the relative strengths of different themes over time.

Lent et al. [55] find how various kinds of patents change over the years by analyzing the relevant patent database. Lavrenko et al. [56] predict stock price trends based on news about the relevant listed companies and historical stock price data. Montes-y-Gómez et al. [57] present a method for the trend analysis of news to find current social hot spots and how they are changing.

At present, work in this area mainly adopts statistical methods [10][54]. Feldman et al. [10] use keyword distributions to label documents and calculate the distance between the keyword distributions of collections from different points in time to find the changing trend of the text topics.

V. CONCLUSION

We have provided a very brief introduction to text mining and its research status. Several general models were then described to give an overall view of the field. Finally, we classified text mining work by application into text categorization, text clustering, association rule extraction and trend analysis. Text mining is a new direction in artificial intelligence, and with the continuous improvement of text mining technology, its application areas will keep growing.

REFERENCES

[1] Tan, Ah Hwee, et al. "Text Mining: The state of the art and the challenges." Proceedings of the PAKDD Workshop on Knowledge Discovery from Advanced Databases (2000): 65-70.
[2] Feldman, Ronen, and I. Dagan. "Knowledge Discovery in Textual Databases (KDT)." In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95) (1995): 112-117.
[3] Hearst, Marti A. "Untangling text data mining." University of Maryland, 1999: 3-10.
[4] S. Grimes. "Unstructured data and the 80 percent rule." Clarabridge Bridgepoints, 2008.
[5] Luhn, H. P. "A Business Intelligence System." IBM Journal of Research & Development 2.4 (1958): 314-319.
[6] Maron, M. E., and J. L. Kuhns. "On Relevance, Probabilistic Indexing and Information Retrieval." Journal of the ACM 7.3 (1960): 216-244.
[7] Larsen, Bjornar, and C. Aone. "Fast and effective text mining using linear-time document clustering." Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1999: 16-22.
[8] Salton, G., A. Wong, and C. S. Yang. "A vector space model for automatic indexing." In Communications of the ACM 18(11), 1975: 613-620.
[9] Steinheiser, R., and C. Clifton. "Data Mining on Text." 2012 IEEE 36th Annual Computer Software and Applications Conference. IEEE Computer Society, 1998: 630.
[10] Feldman, Ronen, I. Dagan, and H. Hirsh. "Mining Text Using Keyword Distributions." Journal of Intelligent Information Systems 10.3 (1998): 281-300.
[11] Mothe J., Chrisment C., Dkaki T., Dousset B., Egret D. (2001). "Information mining: use of the document dimensions to analyse interactively a document set." European Colloquium on IR Research: ECIR, 66-77.
[12] Ghanem, M., Chortaras, A., Guo, Y., Rowe, A., & Ratcliffe, J. (2005). "A Grid Infrastructure For Mixed Bioinformatics Data And Text Mining." Computer Systems and Applications, 2005. The 3rd ACS/IEEE International Conference on (Vol. 29, pp. 41-I).
[13] Karanikas, Haralampos, C. Tjortjis, and B. Theodoulidis. "An Approach to Text Mining using Information Extraction." Proc. Workshop Knowledge Management Theory Applications (KMTA 00) (2000).
[14] Hu, Qinghua, et al. "A novel weighting formula and feature selection for text classification based on rough set theory." Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on. IEEE, 2003: 638-645.


[15] Tan, Songbo, et al. "Using dragpushing to refine centroid text classifiers." Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2005.
[16] Masoud Makrehchi and Mohamed S. Kamel. "Text Classification Using Small Number of Features." Lecture Notes in Computer Science (2005): 580-589.
[17] Jiang, Chuntao, et al. "Text Classification using Graph Mining-based Feature Extraction." Knowledge-Based Systems 23.4 (2010): 302-308.
[18] Hotho, Andreas, Steffen Staab, and Gerd Stumme. "Ontologies improve text document clustering." Data Mining, 2003. ICDM 2003. Third IEEE International Conference on. IEEE, 2003.
[19] Beil, Florian, Martin Ester, and Xiaowei Xu. "Frequent term-based text clustering." Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2002.
[20] Luo, Congnan, Y. Li, and S. M. Chung. "Text document clustering based on neighbors." Data & Knowledge Engineering 68.11 (2009): 1271-1288.
[21] Berendt, Bettina, Andreas Hotho, and Gerd Stumme. "Towards semantic web mining." The Semantic Web - ISWC 2002. Springer Berlin Heidelberg, 2002. 264-278.
[22] Dave, Kushal, Steve Lawrence, and David M. Pennock. "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews." Proceedings of the 12th international conference on World Wide Web. ACM, 2003.
[23] de Bruijn, Lambertus, and Joel Martin. "Literature mining in molecular biology." Proceedings of the EFMI Workshop on Natural Language Processing in Biomedical Applications. 2002.
[24] Tanabe, L., et al. "MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling." Biotechniques 27.6 (1999): 1210-4.
[25] Mittermayer, Marc-André. "Forecasting intraday stock price trends with text mining techniques." System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on. IEEE, 2004.
[26] Cooley, Robert, Bamshad Mobasher, and Jaideep Srivastava. "Web mining: Information and pattern discovery on the world wide web." Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on. IEEE, 1997.
[27] Kosala, Raymond, and Hendrik Blockeel. "Web mining research: A survey." ACM SIGKDD Explorations Newsletter 2.1 (2000): 1-15.
[28] Grace, L. K., V. Maheswari, and Dhinaharan Nagamalai. "Analysis of web logs and web user in web mining." arXiv preprint arXiv:1101.5668 (2011).
[29] Witten, Ian H., et al. "Text mining in a digital library." International Journal on Digital Libraries 4.1 (2004): 56-59.
[30] http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&htmlfid=897/ENUS298-061&appname=isource&language=enus#3pb
[31] http://www.sas.com/en_us/software/analytics/text-miner.html
[32] http://www.textanalysis.com/Products/VisualText/visualtext.html
[33] Shehata, S., F. Karray, and M. Kamel. "Enhancing Text Clustering Using Concept-based Mining Model." IEEE 13th International Conference on Data Mining. IEEE Computer Society, 2006: 1043-1048.
[34] Shehata, Shady, F. Karray, and M. S. Kamel. "An Efficient Concept-Based Mining Model for Enhancing Text Clustering." IEEE Transactions on Knowledge & Data Engineering 22.10 (2010): 1360-1371.
[35] Gale, W. A., Church, K. W., and Yarowsky, D. (1993). "A Method for Disambiguating Word Senses in a Large Corpus." Computers and the Humanities 26(5): 415-439.
[36] Escudero, Gerard, L. Marquez, and G. Rigau. "Boosting Applied to Word Sense Disambiguation." In Proceedings of the 12th European Conference on Machine Learning, 2000: 129-141.
[37] Dun Li, Fu-Yuan Cao, Yuan-Da Cao, Yue-Liang Wan. "Text Sentiment Classification Based on Phrase Patterns." Computer Science 35.4 (2008): 132-134. DOI:10.3969/j.issn.1002-137X.2008.04.037.
[38] Pang, Bo, L. Lee, and S. Vaithyanathan. "Thumbs up? Sentiment Classification using Machine Learning Techniques." Proceedings of EMNLP (2002): 79-86.
[39] Androutsopoulos, Ion, et al. "An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages." Proceedings of Annual International ACM SIGIR Conference on Research & Development in Information Retrieval (2000): 160-167.
[40] Cavnar, William B., and J. M. Trenkle. "N-Gram-Based Text Categorization." Proceedings of Int'l Symposium on Document Analysis & Information Retrieval, Las Vegas, NV (2001): 161-175.
[41] Larkey, Leah S. "A Patent Search and Classification System." Digital Libraries the Fourth ACM Conference on Digital Libraries (1999): 79-87.
[42] Van Rijsbergen, C. J. "Information Retrieval." 14th International Symposium on Methodologies for Intelligent Systems. Volume 2871, Maebashi City, Japan, LNCS, Springer-Verlag 12.2-3 (1989): 95.
[43] Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey. "Scatter/Gather: a cluster-based approach to browsing large document collections." Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1992: 318-329.
[44] Oren Zamir, Oren Etzioni, Omid Madani, Richard M. Karp. "Fast and Intuitive clustering of Web documents." Proceedings of International Conference on Knowledge Discovery & Data Mining (1997): 287-290.
[45] Clifton, Chris, and R. Cooley. "TopCat: Data Mining for Topic Identification in a Text Corpus." Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 1999: 174-183.
[46] E.-H. S. Han, G. Karypis, and V. Kumar. "Clustering Based on Association Rule Hypergraphs." Proc. SIGMOD'97 Workshop Research Issues in Data Mining and Knowledge Discovery, 1997.
[47] M. Montes-y-Gómez, A. Gelbukh, and A. López-López. "Discovering Ephemeral Associations among News Topics." 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Workshop on Adaptive Text Mining, 2001.
[48] Agrawal, Rakesh, T. Imieliński, and A. Swami. "Mining Association Rules Between Sets Of Items In Large Databases." SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data, 1993: 207-216.
[49] Mahgoub, Hany, et al. "A Text Mining Technique Using Association Rules Extraction." International Journal of Computational Intelligence 1 (2008): 21.
[50] Basu, Sugato, et al. "Using lexical knowledge to evaluate the novelty of rules mined from text." Proceedings of the NAACL workshop and other Lexical Resources: Applications, Extensions and Customizations. 2001.
[51] Fellbaum, C., and G. Miller. WordNet: An Electronic Lexical Database. MIT Press, 1998.
[52] Zhi-Qun Chen and Guo-Xuan Zhang. "A Survey of Text Mining." Journal of Pattern Recognition and Artificial Intelligence 18.1 (2005): 65-74. DOI:10.3969/j.issn.1003-6059.2005.01.012.
[53] Zhi-Qun Chen. "A Survey of Trend Mining for Texts." Journal of Information Science 2 (2010): 316-320.
[54] Mei, Qiaozhu, and C. X. Zhai. "Discovering evolutionary theme patterns from text: an exploration of temporal text mining." Proceedings of KDD (2005): 198-207.
[55] Lent, Brian, Rakesh Agrawal, and Ramakrishnan Srikant. "Discovering Trends in Text Databases." KDD. Vol. 97. 1997.
[56] Lavrenko, Victor, et al. "Mining of Concurrent Text and Time Series." Proceedings of ACM SIGKDD Intl Conference on Knowledge Discovery & Data Mining Workshop on Text Mining (2000): 37-44.
[57] Montes-y-Gómez, López-López and Gelbukh. "Text Mining as a Social Thermometer." Proc. of the Workshop on Text Mining: Foundations, Techniques and Applications, IJCAI-99, Stockholm, 1999.


