0% found this document useful (0 votes)
10 views

Data Mining Assignment

Uploaded by

2313721033054
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Data Mining Assignment

Uploaded by

2313721033054
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

DATA MINING ASSIGNMENT

TOPIC:TEXT MINING
INTRODUCTION:
Text mining in data mining is mostly used for, the unstructured text
data that can be transformed into structured data that can be used for
data mining tasks such as classification, clustering, and association
rule mining. This allows organizations to gain insights from a wide
range of data sources, such as customer feedback, social media posts,
and news articles
Text mining is a component of data mining that deals specifically
with unstructured text data. It involves the use of natural language
processing (NLP) techniques to extract useful information and
insights from large amounts of unstructured text data. Text mining
can be used as a preprocessing step for data mining or as a standalone
process for specific tasks.

LITERATURE REVIEW: This review of current literature


explores text mining techniques and industry-specific
applications. Selecting and using the right techniques
and tools according to the domain helps make the text-
mining process easier and more efficient. As you read
this article, understand this includes applying specific
sequences and patterns to extract useful information
by removing irrelevant details for predictive analysis.
Of course, major issues that may arise during the text
mining process include domain knowledge integration,
varying concepts of granularity, multilingual text
refinement, and natural language processing
ambiguity.

To implement text mining for an automated literature review, start


by defining your research objectives and keywords. This step ensures
your text mining efforts align with your research goals. Next, gather a
comprehensive corpus of relevant academic papers, articles, and
reports using databases like Google Scholar or PubMed.Once you’ve
assembled your corpus, employ natural language processing
techniques to preprocess the text data. This involves tokenization,
removing stop words, and stemming or lemmatization. Then, apply
text mining algorithms to extract key information, such as frequent
terms, topic models, and sentiment analysis. Tools like Python’s NLTK
or R’s tm package can be invaluable for this process. Finally,
synthesize the extracted information to identify patterns, trends, and
gaps in the existing literature. This automated approach can
significantly streamline your literature review process, allowing for
more efficient and comprehensive analysis of large volumes of
research material. Time-saving
Text mining can help researchers save time by automating the
process of identifying key themes and patterns.
Comprehensiveness
Text mining can help researchers produce more comprehensive
reviews by uncovering patterns and connections that might be
missed manually.
Identifying trends
Text mining can help researchers identify emerging trends and gaps
in existing research.
Filtering and categorization
Researchers can set parameters to filter and categorize information
based on their research questions.

PERSONAL MODEL:
There are many models and techniques used in text mining,
including:
Topic modeling
Uses unsupervised machine learning to identify groups of similar
words in a text. It can help understand the main topics in a collection
of documents. Latent Dirichlet Allocation (LDA) is a popular algorithm
for topic modeling.
Information extraction (IE)
Extracts relevant data from documents, such as keywords, addresses,
or emails. IE can save time by avoiding the need to manually sort
data.
Information retrieval (IR)
Uses algorithms to extract relevant patterns based on a set of words
or phrases. IR systems can track user behavior to discover relevant
data.
K-Nearest Neighbor (KNN)
Uses similarity measures to categorize data.
Decision trees
Uses a tree-like data structure to classify data. Decision trees can be
used to analyze customer feedback, classify sentiment, and identify
topics.
Random forest algorithm
Uses multiple decision trees to classify high-dimensional data.
Neural networks (NN)
Different types of neural networks can be used for text mining,
including convolutional neural networks (CNNs) and recurrent neural
networks (RNNs).
Clustering
Identifies intrinsic structures in textual information and organizes
them into subgroups or clusters.
Text mining also involves the following steps:
Data collection: Gathering text data from various sources
Preprocessing: Cleaning and preparing the data for analysis
Transformation: Transforming the text into a structured forma

CONCLUSION:

The availability of huge volume of text based data need to be


examined to extract valuable information. Text mining techniques
are used to analyze the interesting and relevant information
effectively and efficiently from large amount of unstructured data.
This paper presents a brief overview of text mining techniques that
help to improve the text mining process. Specific patterns and
sequences are applied in order to extract useful information by
eliminating irrelevant details for predictive analysis. Selection and
use of right techniques and tools according to the domain help to
make the text mining process easy and efficient. Domain knowledge
integration, varying concepts granularity, multilingual text
refinement, and natural language processing ambiguity are major
issues and challenges that arise during text mining process. In future
research work, we will focus to design algorithms which will help to
resolve issues presented in this work.

You might also like