A Detailed Study On Text Mining Techniques

This document summarizes a research paper on text mining techniques. It discusses how text mining is used to extract hidden information from unstructured or semi-structured data. It also describes various techniques for mining plain text, including text summarization, document retrieval, information retrieval, and assessing document similarity. The document provides examples of applications of text mining such as analyzing survey responses, warranty claims, and medical interviews. It also distinguishes between text mining and data mining as well as text mining and web mining.

International Journal of Soft Computing and Engineering (IJSCE)

ISSN: 2231-2307, Volume-2, Issue-6, January 2013

A Detailed Study on Text Mining Techniques

Rashmi Agrawal, Mridula Batra

Abstract - Text mining is an important step in the knowledge
discovery process. It is used to extract hidden information
from unstructured or semi-structured data. This is fundamental
because most Web information is semi-structured, owing to the
nested structure of HTML code, and is also linked and redundant.
Web text mining supports the whole knowledge mining process by
mining, extracting and integrating useful data, information and
knowledge from Web page contents. It can discover knowledge in a
distributed and heterogeneous multi-organization environment. In
this paper, our focus is on the concept of text mining and its
various techniques. We examine how to mine plain as well as
structured text, and describe the major ways in which text is
mined when the input is plain natural language rather than
partially structured Web documents.
Keywords: Plain, Structured, Text Mining, Web Documents.

I. INTRODUCTION TO TEXT MINING

Text mining processes unstructured information, extracts
meaningful numeric indices from the text, and thereby makes the
information contained in the text accessible to the various data
mining (statistical and machine learning) algorithms. Information
can be extracted from summaries of the words contained in the
documents, so that the words can be analyzed, the similarities
between words and documents can be determined, and their
relationship to other variables in the data mining project can be
examined. In essence, text mining converts text into numbers that
can then be included in other analyses such as predictive data
mining projects or clustering. Text mining is also known as text
data mining, which refers to the process of deriving high-quality
information from text, typically through statistical pattern
learning. It involves structuring the input text (usually by
parsing it and inserting the result into a database), deriving
patterns within the structured data, and finally evaluating and
interpreting the output. Typical text mining tasks include text
categorization, text clustering, sentiment analysis, document
summarization, and entity relation modeling. In short, text mining
employs a set of algorithms for converting unstructured text into
structured data objects, together with the quantitative methods
used to analyze these data objects.
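
As an illustration of how text can be converted into numbers, the following minimal sketch builds a document-term count matrix. The function names and the sample sentences are assumptions made for this example; the paper does not prescribe a particular representation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample documents.
docs = [
    "Text mining converts text into numbers.",
    "Data mining works on numbers in databases.",
]
vocabulary, matrix = document_term_matrix(docs)
for row in matrix:
    print(row)
```

Each row of the matrix can then be passed to clustering or predictive algorithms like any other numeric record.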

Manuscript received on January, 2013.

Rashmi Agrawal, Department of Computer Applications, Manav Rachna
International University, Faridabad, India.
Mridula Batra, Department of Computer Applications, Manav Rachna
International University, Faridabad, India.

II. APPLICATIONS OF TEXT MINING

Text mining has many applications. One is the automatic
processing of messages and emails. For example, junk email can be
filtered out automatically on the basis of certain terms, so that
such messages are discarded without human intervention. Automatic
systems for classifying electronic messages are also useful where
messages need to be routed automatically to the most appropriate
department.
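
A term-based filter and router of the kind described above might be sketched as follows. The term lists, the department names and the threshold are all hypothetical values chosen for illustration; a practical system would normally learn them from labeled messages.

```python
import re

# Hypothetical term lists; real systems would derive these from data.
JUNK_TERMS = {"lottery", "winner", "prize"}
DEPARTMENT_TERMS = {
    "billing": {"invoice", "refund", "payment"},
    "support": {"error", "crash", "broken"},
}


def route_message(text, junk_threshold=2):
    """Discard junk based on term hits, otherwise pick the best-matching department."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if len(words & JUNK_TERMS) >= junk_threshold:
        return "junk"
    scores = {dept: len(words & terms) for dept, terms in DEPARTMENT_TERMS.items()}
    best = max(scores, key=scores.get)
    # If no department term occurs, leave the message for manual routing.
    return best if scores[best] > 0 else "unrouted"


print(route_message("You are a lottery winner, claim your prize"))
print(route_message("Please refund my last payment"))
```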
Another application is analyzing warranty or insurance claims and
diagnostic interviews. In some business domains, the majority of
information is collected in textual form. For example, warranty
claims or initial medical (patient) interviews can be summarized
in brief narratives. Similarly, when you take your automobile to a
service station for repairs, the attendant will typically write
some notes about the problems that you report and what you believe
needs to be fixed. Increasingly, those notes are collected
electronically, so these narratives are readily available as input
to text mining algorithms.
A further application is analyzing open-ended survey responses.
Survey questionnaires typically contain two broad types of
questions: open-ended and closed-ended. Closed-ended questions
present a discrete set of responses from which to choose; such
responses are easily quantified and analyzed. Open-ended questions
allow respondents to answer in their own words. Such unstructured
responses often provide richer and more valuable information than
closed-ended questions and are an important source of insight,
since they can generate information that was not anticipated.
Despite this added value, researchers often avoid including
open-ended questions in their surveys because reading and coding
the responses is tedious, time-consuming and expensive, especially
when there are more than a few hundred written responses.
III. VARIOUS TERMINOLOGIES OF TEXT MINING
A. Text Mining Vs. Data Mining
In text mining, patterns are extracted from natural language
text, whereas in data mining patterns are extracted from databases.
B. Text Mining Vs. Web Mining
In text mining, the input is free, unstructured text, whereas in
Web mining the Web sources are structured.
IV. WHY TEXT MINING?
Text mining is data mining applied to textual data.
Text is "unstructured, vague and difficult to deal with but it is
the most common method for formal exchange of information.
Whereas data mining belongs in the corporate world because
that's where most databases are, text mining promises to move
machine learning technology out of the companies and into the
home" as an increasingly necessary Internet adjunct, i.e., as
"web data mining". Current reviews of web data extraction tools
are available in the literature.
Text mining can be characterized as a set of "nontraditional
information retrieval strategies." The goal of these strategies is
to reduce the effort required of users to obtain useful information
from large computerized text data sources. Traditional information
retrieval strategies often retrieve too little and too much
information at the same time: they miss relevant material while
returning much that is irrelevant. A useful system based on
nontraditional strategies must therefore go beyond simple
retrieval.
A. How does Mining Work

Traditional keyword search retrieves documents containing
pre-defined keywords. Text mining extracts precise information
based on much more than just keywords, such as entities or
concepts, relationships, phrases, sentences and even numerical
information in context.

Text mining software tools often use computational algorithms
based on Natural Language Processing (NLP) to enable a computer to
read and analyze textual information. Such a tool interprets the
meaning of the text and identifies, extracts, synthesizes and
analyzes relevant facts and relationships that directly answer the
question.

Text can be mined in a systematic, comprehensive and reproducible
way, and business-critical information can be captured
automatically.

Powerful NLP-based queries can be run in real time across millions
of documents. These can be pre-written queries.

Using wildcards, one can ask questions without even knowing the
exact keywords being sought and still get back high-quality,
structured results.

One can switch in any vocabularies or thesauri to take advantage
of the terminology used in one's own specific domain.
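
The wildcard querying mentioned above can be approximated with shell-style patterns over word tokens. This is only a rough sketch using Python's standard `fnmatch` module; it does not reflect how any particular text mining product implements wildcards, and the sample documents are invented.

```python
import re
from fnmatch import fnmatchcase


def wildcard_search(pattern, documents):
    """Return {doc_id: sorted matching tokens} for documents with at least one match."""
    pattern = pattern.lower()
    results = {}
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        hits = sorted(t for t in tokens if fnmatchcase(t, pattern))
        if hits:
            results[doc_id] = hits
    return results


# Hypothetical documents.
docs = {
    "d1": "The cardiologist reviewed the cardiac scan.",
    "d2": "The engine was repaired.",
}
hits = wildcard_search("cardi*", docs)
print(hits)
```

The structured result (document identifier plus the matching terms) shows how a query can succeed even when the user does not know the exact keyword in advance.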
V. METHODS OF MINING TEXT

A. Mining Plain Text

This section describes the major ways in which text is mined
when the input is plain natural language, rather than partially
structured Web documents. We begin with problems that involve
extracting information for human consumption. The techniques for
mining plain text covered here are text summarization, document
retrieval, information retrieval, assessing document similarity
and text categorization.
A1. Text summarization
A text summarizer produces a compressed representation of its
input that is intended for human consumption. It may condense
individual documents or groups of documents. Text compression is a
related area, but the output of text summarization is specifically
meant to be human-readable. The output of text compression
algorithms is not human-readable, and it is not actionable either:
it only supports decompression, that is, automatic reconstruction
of the original text. Summarization differs from many other forms
of text mining in that there are people, namely professional
abstractors, who are skilled in the art of producing summaries and
carry out the task as part of their professional life.
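
One simple way to produce a human-readable summary is extractive: score each sentence by the frequency of its words and keep the top-scoring sentences in their original order. The sketch below assumes this word-frequency heuristic and a small hand-picked stopword list; the paper does not name a specific summarization algorithm.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "that", "for", "on"}


def content_words(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


text = ("Text mining finds patterns in text. The weather was pleasant. "
        "Mining text needs text preprocessing.")
print(summarize(text, max_sentences=2))
```

Unlike compressed output, the result is made of sentences taken directly from the input, so it remains readable by a person.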
A2. Document Retrieval
Document retrieval is the task of identifying and returning
the documents most relevant to a user's need. Traditional libraries
provide catalogues that allow users to identify documents based on
metadata. Metadata is a highly structured form of document
summary, and successful methodologies have been developed for
manually extracting metadata and for identifying relevant
documents based on it; these methodologies are widely taught in
library schools. Automatic extraction of metadata (e.g. subjects,
language, author, key-phrases) is a prime application of text
mining techniques. Another idea is to index every individual word
in the document collection; this full-text indexing underlies many
effective and popular document retrieval techniques.
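
Indexing every individual word is commonly done with an inverted index, which maps each word to the set of documents containing it. The following is a minimal sketch with boolean AND retrieval; the function names and the three sample documents are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map every word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def retrieve(index, query):
    """Return ids of documents containing every query word (boolean AND)."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    postings = [index.get(w, set()) for w in words]
    return set.intersection(*postings)


# Hypothetical collection.
docs = {
    1: "library catalogue of metadata",
    2: "metadata extraction by text mining",
    3: "text compression",
}
index = build_inverted_index(docs)
print(retrieve(index, "text metadata"))
```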
A3. Information retrieval
Information retrieval can be considered an extension of
document retrieval in which the documents that are returned are
processed to condense or extract the particular information sought
by the user. Thus document retrieval is followed by a text
summarization stage that focuses on the query posed by the user, or
by an information extraction stage. The granularity of documents
may be adjusted so that each individual subsection or paragraph
comprises a unit in its own right, in an attempt to focus results
on individual nuggets of information rather than lengthy
documents.
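
Treating each paragraph as a retrieval unit can be sketched as follows: the document is split on blank lines and the paragraph sharing the most words with the query is returned. The word-overlap score is a simplifying assumption made for this example, as is the sample text.

```python
import re


def best_passage(document, query):
    """Return the paragraph that shares the most distinct words with the query."""
    passages = [p.strip() for p in document.split("\n\n") if p.strip()]
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def overlap(passage):
        return len(query_words & set(re.findall(r"[a-z]+", passage.lower())))

    return max(passages, key=overlap) if passages else ""


# Hypothetical three-paragraph document.
document = (
    "Warranty claims are stored as short narratives.\n\n"
    "Survey responses may be open-ended or closed-ended.\n\n"
    "Service notes describe reported problems."
)
print(best_passage(document, "open-ended survey responses"))
```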
A4. Assessing document similarity
Many text mining problems involve assessing the similarity
between different documents; examples are assigning documents to
pre-defined categories and grouping documents into natural
clusters. These are basic problems in data mining too, and they
have been a focus of research in text mining, perhaps because the
success of different techniques can be evaluated and compared
using standard, objective measures of success.
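
A widely used measure of document similarity is the cosine of the angle between term-count vectors. The paper does not name a specific measure, so the choice of cosine similarity here is an assumption; the sketch uses raw counts without any term weighting.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    va, vb = term_vector(text_a), term_vector(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("text mining", "mining text data"))
print(cosine_similarity("text mining", "web pages"))
```

Values close to 1 indicate documents that use the same words in similar proportions; 0 means the documents share no words.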
A5. Text categorization
Text categorization is the assignment of natural language
documents to predefined categories according to their content.
The set of categories is often called a controlled vocabulary.
Document categorization is a long-standing traditional technique
for information retrieval in libraries, where subjects rival
authors as the predominant gateway to library contents, although
they are far harder to assign objectively than authorship.
Automatic text categorization has many practical applications,
including indexing for document retrieval, automatically
extracting metadata, word sense disambiguation by detecting the
topics a document covers, and organizing and maintaining large
catalogues of Web resources. As in other areas of text mining,
until the 1990s text categorization was dominated by ad hoc
techniques of knowledge engineering that sought to elicit
categorization rules from human experts and code them into a
system that could apply them automatically to new documents. Since
then, and particularly in the research community, the dominant
approach has been to use techniques of machine learning to infer
categories automatically from a training set of pre-classified
documents. Indeed, text categorization is a hot topic in machine
learning today. The pre-defined categories are symbolic labels
with no additional semantics. When classifying a document, no
information is used except for the document's content itself.
Some tasks constrain documents to a single category, whereas in
others each document may have many categories. Sometimes category
labeling is probabilistic rather than deterministic, or the
objective is to rank the categories by their estimated relevance
to a particular document. Sometimes documents are processed one by
one, with a given set of classes; alternatively there may be a
single class (perhaps a new one that has been added to the set),
and the task is to determine which documents it contains. Many
machine learning techniques have been used for text
categorization.
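
As one example of inferring categories from pre-classified documents, the sketch below implements a multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is chosen here only as a familiar representative of the machine learning approach; the paper does not single out a method, and the class name and training sentences are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per category
        self.word_counts = defaultdict(Counter)   # word counts per category
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def rank(self, document):
        """Return categories sorted by log-posterior, most likely first."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)
            for word in tokenize(document):
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)

    def predict(self, document):
        return self.rank(document)[0]


# Hypothetical training set of pre-classified documents.
train_docs = [
    "cheap prize lottery winner",
    "win a cheap prize now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
train_labels = ["junk", "junk", "work", "work"]
classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(classifier.predict("lottery prize"))
print(classifier.rank("monday meeting"))
```

The `rank` method corresponds to the case where the objective is to order categories by estimated relevance, while `predict` constrains each document to a single category.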
B. Mining structured text
Much of the text on the Internet contains explicit structural
markup and so differs from traditional plain text. Some markup is
internal and indicates document structure or format; some is
external and gives explicit hypertext links between documents.
These information sources give additional leverage for mining Web
documents. Both sources of information are extremely noisy: they
involve arbitrary and unpredictable choices by individual page
designers. However, these disadvantages are offset by the sheer
amount of data that is available, which is relatively unbiased
because it is aggregated over many different information
providers. Thus Web mining is emerging as a new subfield, similar
to text mining but taking advantage of the extra information
available in Web documents, particularly hyperlinks, and even
capitalizing on the existence of topic directories in the Web
itself to improve results. We briefly review three techniques for
mining structured text. The first, wrapper induction, uses
internal markup information to increase the effectiveness of text
mining in marked-up documents. The remaining two, document
clustering and determining the "authority" of Web documents,
capitalize on the external markup information that is present in
hypertext in the form of explicit links to other documents.
B1. Wrapper Induction
Internet resources that contain relational data (telephone
directories, product catalogs, etc.) use formatting markup to
present the information they contain clearly to users. However,
with standard HTML it is quite difficult to extract data from such
resources automatically. The XML markup language is designed to
overcome these problems by encouraging page authors to mark up
their content in a way that reflects document structure at a
detailed level; but it is not clear to what extent users will be
prepared to share the structure of their documents fully in XML,
and even if they do, huge numbers of legacy pages abound. Many
software systems use external online resources by hand-coding
simple parsing modules, commonly called "wrappers", to analyze the
page structure and extract the requisite information. This is a
kind of text mining, but one that depends on the input having a
fixed, predetermined structure from which information can be
extracted algorithmically. When this assumption is satisfied, the
information extraction problem is relatively trivial. But this is
rarely the case. Page structures vary; errors that are
insignificant to human readers throw automatic extraction
procedures off completely; Web sites evolve. There is a strong
case for automatic induction of wrappers, both to reduce these
problems when small changes occur and to make it easier to produce
new sets of extraction rules when structures change completely.
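
To make the notion of a wrapper concrete, here is a small hand-coded one built on Python's standard `html.parser` module. It illustrates the hand-written kind of wrapper described above, not wrapper induction itself, and it assumes a fixed page layout (one record per table row, one field per `<td>` cell). The sample page and names are invented.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which contain no <td> cells
                self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def extract_directory(html):
    """Return one tuple of cell texts per data row of the page."""
    wrapper = TableWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.rows


# Hypothetical telephone directory page.
page = """<table>
<tr><th>Name</th><th>Phone</th></tr>
<tr><td>Ada Example</td><td>555-0100</td></tr>
<tr><td><b>Bob Sample</b></td><td>555-0101</td></tr>
</table>"""
records = extract_directory(page)
print(records)
```

If the site switched from a table to a list layout, this wrapper would silently return nothing, which is the kind of brittleness that motivates inducing wrappers automatically.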
B2. Document clustering with links
Document clustering techniques are normally based on the
documents' textual similarity. However, the hyperlink structure of
Web documents, encapsulated in the "link graph" in which nodes are
Web pages and links are hyperlinks between them, can be used as a
different basis for clustering. Many standard graph clustering and
partitioning techniques are applicable. Link-based clustering
schemes typically use factors such as (a) the number of hyperlinks
that must be followed to travel in the Web from one document to
the other; (b) the number of common ancestors of the two
documents, weighted by their ancestry distance; and (c) the number
of common descendants of the documents, similarly weighted. These
can be combined into an overall similarity measure between
documents. In practice, a textual similarity measure is usually
incorporated as well, to yield a hybrid clustering scheme that
takes account of both the documents' content and their linkage
structure. The overall similarity may then be determined as the
weighted sum of four factors: the three link-based factors above
and the textual similarity. Such a measure will be sensitive to
the characteristics of the documents and their linkage structure,
and given the number of parameters involved there is considerable
scope for tuning to maximize performance on particular data sets.
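
The weighted sum of four factors can be sketched as below. Several simplifications are assumptions of this example rather than part of the paper: only direct parents and children are counted (no weighting by ancestry distance), overlap is normalized with the Jaccard coefficient, path length is turned into a score via 1 / (1 + distance), and the four weights are equal by default.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search over directed links; None if target is unreachable."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def jaccard(x, y):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def link_similarity(graph, a, b, text_sim=0.0, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of path, common-parent, common-child and textual factors."""
    parents = lambda n: {p for p, outs in graph.items() if n in outs}
    children = lambda n: set(graph.get(n, ()))
    dists = [d for d in (shortest_path_length(graph, a, b),
                         shortest_path_length(graph, b, a)) if d is not None]
    path_factor = 1.0 / (1 + min(dists)) if dists else 0.0
    ancestor_factor = jaccard(parents(a), parents(b))
    descendant_factor = jaccard(children(a), children(b))
    w_path, w_anc, w_desc, w_text = weights
    return (w_path * path_factor + w_anc * ancestor_factor
            + w_desc * descendant_factor + w_text * text_sim)


# Hypothetical link graph: page -> pages it links to.
graph = {"hub": ["a", "b"], "a": ["c"], "b": ["c"], "c": [], "z": []}
print(link_similarity(graph, "a", "b"))
print(link_similarity(graph, "a", "z"))
```

Pages `a` and `b` are not linked to each other but share a parent and a child, so they score well above the isolated page `z`. The `weights` tuple is where the tuning discussed above would take place.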
B3. Determining authority of Web documents
The Web's linkage structure is a valuable source of information
that reflects the popularity, sometimes interpreted as
"importance", "authority" or "status", of Web pages. For each
page, a numeric rank is computed. The basic premise is that
highly ranked pages are ones that are cited, or pointed to, by
many other pages. Consideration is also given to (a) the rank of
the citing page, to reflect the fact that a citation by a highly
ranked page is a better indication of quality than one from a
lesser page, and (b) the number of out-links from the citing page,
to prevent a highly ranked page from artificially magnifying its
influence simply by containing a large number of pointers. This
leads to a simple algebraic equation to determine the rank of each
member of a set of hyperlinked pages. Complications arise from the
fact that some links are "broken" in that they lead to nonexistent
pages, and from the fact that the Web is not fully connected;
these are easily overcome. Such techniques are widely used by
search engines (e.g. Google) to determine how to sort the hits
associated with any given query. They provide a social measure of
status that relates to standard techniques developed by social
scientists for measuring and analyzing social networks.
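
The rank computation described above can be sketched as an iterative calculation in the style of PageRank. The damping factor of 0.85 and the even redistribution of rank from pages without out-links are common conventions assumed here; the paper itself gives no equation or parameter values, and the three-page graph is invented.

```python
def page_rank(links, damping=0.85, iterations=100):
    """Iteratively compute a rank for every page in a directed link graph.

    links maps each page to the list of pages it points to. Pages that are
    only pointed to (for example targets with no known out-links) are included
    and treated as having no out-links.
    """
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # Each citing page divides its rank among its out-links.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # No out-links: spread this page's rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: a and b both cite c, and c cites a.
ranks = page_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

Page `c` ends up with the highest rank because two pages cite it, and `a` outranks `b` because its single citation comes from the highly ranked `c`.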
VI. APPROACHES TO TEXT MINING
Using well-tested methods and understanding the results of text
mining. Once a data matrix has been computed from the input
documents and the words found in those documents, various
well-known analytic techniques can be used for further processing,
including methods for clustering, factoring, or predictive data
mining.

Black-box approaches to text mining and extraction of concepts.
Some text mining applications use black-box methods to extract
detailed meaning from documents with little human effort. These
applications summarize large numbers of text documents
automatically, retaining the core and most important meaning of
those documents.

Text mining as document search. Another approach is the automatic
search of large numbers of documents based on key words or key
phrases. This provides efficient access to Web pages with certain
content and makes it possible to search very large document
repositories based on varying criteria.
VII. CONCLUSION
In this paper our major focus has been on how text is mined,
whether it is plain text or structured text. For structured text
we have discussed how the internal structure of documents is
mined, as well as the external structure given by explicit
hypertext links between documents. We have also discussed how text
mining works in practice: for example, one can switch in any
vocabularies or thesauri to take advantage of the terminology used
in one's own specific domain, and NLP-based queries can be run in
real time across millions of documents.

Rashmi Agrawal is an Associate Professor in the Department of
Computer Applications, Manav Rachna International University,
Faridabad. Her qualifications are M.Tech, MBA and M.Phil (Computer
Science), and she has more than 11 years of teaching experience.
She is pursuing a Ph.D. in Computer Science at Manav Rachna
International University, Faridabad. Her research area is
Artificial Intelligence. She has published 5 papers in
national/international journals and 9 papers in
national/international conferences. Her areas of interest are Data
Structures, Artificial Intelligence and Software Testing. She has
also written a book on Artificial Intelligence for Manav Rachna
International Publication House. She is an active member of the
Computer Society of India and a reviewer/member of IJRPES.

Mridula Batra is an Assistant Professor in the Department of
Computer Applications, Manav Rachna International University,
Faridabad. Her qualifications are MCA and M.Phil (Computer
Science), and she has more than 9 years of teaching experience.
Her research area is Data Mining. Her areas of interest are
Database Systems and Computer Networks.
