
IMTC634 - Data Science - Chapter 7

Uploaded by

msmakkar.chief19


Chapter 7: Text Mining and Analytics

Chapter Index
1. Learning Objectives
2. Topic 1: Differences between Text Mining and Text Analytics
3. Topic 2: Text Mining Techniques
4. Topic 3: Text Mining Technologies
5. Topic 4: Methods and Approaches in Text Analytics
6. Topic 5: Applications of Text Analytics
7. Let’s Sum Up

Learning Objectives
• Understand text mining and analytics
• Describe the text mining techniques
• Explain the text mining technologies
• Elucidate the methods and approaches in text analytics
• Describe the applications of text analytics

1. Differences between Text Mining and Text Analytics

• Text mining is the first step before analysing text data. It involves cleaning the data so that it is ready for text analytics.
• The steps involved in the text mining process are, in order:
  1. Identification of a corpus
  2. Preprocessing the text
  3. Bag of words in R
  4. Verify words in data frame
• Text analytics uses techniques to infer, prescribe or predict information from the mined data.
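
The four text mining steps above can be sketched in a few lines. The slide refers to R; this sketch uses Python with only the standard library instead, and the sample corpus, the stop-word list and the `preprocess` helper are assumptions made for illustration.

```python
import re
from collections import Counter

# Step 1: identify a corpus (three made-up documents, for illustration only).
corpus = [
    "Text mining cleans the data.",
    "Text analytics predicts from the mined data!",
    "Cleaning the text is the first step.",
]

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "from", "a", "an", "of"}

def preprocess(document):
    """Step 2: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Step 3: build the bag of words, one word-count mapping per document.
bags = [Counter(preprocess(doc)) for doc in corpus]

# Step 4: verify the words in a document-term table (rows = documents).
vocabulary = sorted(set().union(*bags))
table = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
for row in table:
    print(row)
```

Each row of `table` corresponds to one document and each column to one word of `vocabulary`, which is roughly what a document-term data frame would hold.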

2. Text Mining Techniques

• Various techniques have been developed in recent years to make computers analyze, understand and generate text. Some of these techniques are as follows:
  • Sentiment analysis
  • Topic modeling
  • Term frequency
  • Named entity recognition
  • Event extraction

Sentiment Analysis
• Sentiment analysis is one of the most significant and popular techniques for describing textual data and drawing inferences from it.
• It is used to derive emotions from text such as tweets, Facebook posts or YouTube comments.
• Sentiments such as good, bad, anger, neutral and anxiety are inferred from the given text.
• For example, how people feel about a movie, a topic or a government decision can be analyzed using a sentiment analysis tool.
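
One simple way to infer sentiment is to look up each word in a polarity lexicon and add up the scores. The sketch below assumes a tiny hypothetical lexicon (`LEXICON`) and made-up reviews; it ignores negation ("not good") and sarcasm, which practical tools have to handle.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score.
# Real lexicons hold thousands of entries.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "boring": -1, "hate": -2}

def sentiment(text):
    """Sum the polarity of known words and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score

reviews = [
    "I love this movie, the cast is great",
    "Boring plot and bad acting",
    "The film opens on Friday",
]
for review in reviews:
    print(sentiment(review), review)
```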

Topic Modeling
• Topic modeling is a statistical approach for discovering the topic(s) in a collection of text documents based on the statistics of the words they contain.
• Latent Dirichlet Allocation (LDA) is one of the most common algorithms for topic modeling.
• The LDA algorithm groups the corpus into topics automatically: without labelled examples, it learns to assign probabilities to all terms in the corpus for each topic.

Term Frequency
• Term frequency (TF) indicates the importance of a word in a document: it is the number of times the word occurs relative to the total number of terms in the document.
• TF is usually measured along with inverse document frequency (IDF), and the two are combined as TF-IDF.
• ‘TF-IDF’ stands for ‘Term Frequency-Inverse Document Frequency’. It is a statistical measure of how important a word is to a given document in a collection: the weight rises with the word's frequency in the document and falls as the word appears in more documents.
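
A minimal sketch of the calculation follows. Several TF-IDF variants exist; this one assumes TF = count / document length and IDF = log(N / df), and the tokenized documents are made up for illustration.

```python
import math
from collections import Counter

# Made-up, already tokenized documents for illustration.
documents = [
    ["text", "mining", "cleans", "text"],
    ["text", "analytics", "predicts"],
    ["topic", "modeling", "finds", "topics"],
]

def term_frequency(term, document):
    """Occurrences of the term divided by the number of terms in the document."""
    return Counter(document)[term] / len(document)

def inverse_document_frequency(term, documents):
    """log(N / df): N documents in total, df of them contain the term."""
    df = sum(1 for document in documents if term in document)
    return math.log(len(documents) / df) if df else 0.0

def tf_idf(term, document, documents):
    """Weight of the term in one document relative to the whole collection."""
    return term_frequency(term, document) * inverse_document_frequency(term, documents)

for term in ("text", "mining"):
    print(term, round(tf_idf(term, documents[0], documents), 3))
```

In the first document "text" occurs twice and "mining" once, yet "mining" receives the higher weight because it appears in only one of the three documents.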

Named Entity Recognition
• A named entity is a real-world object denoted by a proper noun or a specific expression, such as a place, person, product, organization, quantity, percentage or time.
• Named entity recognition is a text analytics technique that classifies the named entities in a given corpus into predefined classes, such as place, person, product, organization, quantity, percentage and time.
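
As a toy illustration of assigning entities to predefined classes, the sketch below combines a lookup list (gazetteer) with regular expressions. The names in `GAZETTEER`, the patterns and the sample sentence are all hypothetical; practical systems usually rely on trained statistical models rather than hand-written rules.

```python
import re

# Hypothetical gazetteer: known names mapped to predefined classes.
GAZETTEER = {
    "Mumbai": "PLACE",
    "Asha Rao": "PERSON",
    "Acme Foods": "ORGANIZATION",
}

# Patterns for classes that follow a regular shape.
PATTERNS = {
    "PERCENTAGE": re.compile(r"\b\d+(?:\.\d+)?%"),
    "TIME": re.compile(r"\b\d{1,2}:\d{2}\b"),
}

def recognize_entities(text):
    """Return (entity, class, start offset) tuples sorted by position."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.group(), label, match.start()))
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.group(), label, match.start()))
    return sorted(found, key=lambda item: item[2])

sentence = "Asha Rao said Acme Foods grew 12.5% in Mumbai, speaking at 10:30."
for entity, label, _ in recognize_entities(sentence):
    print(label, entity)
```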

Event Extraction
• Suppose we want information about an event that has happened, and online news sources have published this information as long text. Deriving detailed, structured information about the event from this text is called event extraction.
• With event extraction, we identify who, when, where, to whom, why and how.
• In other words, event extraction identifies the relationships between entities.
• For example, when analyzing information on a joint venture, we would extract the partners, products, place, capital and profits of that joint venture.
3. Text Mining Technologies

• Text mining is used to retrieve potentially useful information from the available data.
• Different technologies are required to extract this information, some of which are as follows:
  • Information Retrieval
  • Information Extraction
  • Clustering
  • Categorization
  • Summarization

Information Retrieval
• Information retrieval (IR) is finding documents that satisfy an information need from within large collections. These documents may be unstructured or semi-structured and are usually in text format. They are classified or clustered according to their content or the similarity of their content.
• It is a very broad term; data retrieved from different sources is further processed as required for decision-making.
• In simple terms, information retrieval gets sets of relevant documents from a corpus or a large mass of documents.
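
A minimal sketch of retrieving relevant documents: each text becomes a bag-of-words vector and documents are ranked by cosine similarity to the query. Cosine ranking is one common choice, not the only one; the collection, the query and the `retrieve` helper are assumptions for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a text as a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, collection, top_k=2):
    """Return up to top_k (score, document) pairs with a non-zero score, best first."""
    query_vector = vectorize(query)
    scored = [(cosine(query_vector, vectorize(doc)), doc) for doc in collection]
    relevant = [pair for pair in scored if pair[0] > 0]
    return sorted(relevant, key=lambda pair: pair[0], reverse=True)[:top_k]

collection = [
    "Quarterly sales report for the retail division",
    "Customer complaints about late delivery",
    "Delivery schedule and customer feedback summary",
]
for score, doc in retrieve("customer delivery", collection):
    print(round(score, 3), doc)
```

The shorter matching document ranks first because the same two query words make up a larger share of its content.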

Information Extraction
• The extraction of structured information from unstructured and/or semi-structured documents is known as information extraction.
• In most cases, this activity involves processing human-language texts by means of Natural Language Processing (NLP).
• Information extraction also covers processing documents through automatic annotation and extracting content from images, audio and video.
• For example, the Internet Movie Database (IMDb) is an online database of information about films, TV programs, home videos and video games from around the world.

Clustering
• When you search for something on a web search engine, you get a huge number of documents in response to the search phrase you entered, which makes it difficult to browse or to identify the relevant information.
• Clustering helps to group the retrieved documents into meaningful categories. The grouping is based on the descriptors (sets of words) in each document. It is an unsupervised knowledge discovery technique.
• One common example of clustering is hierarchical clustering.
• In hierarchical clustering, each data point starts as its own cluster and is then merged with the nearest cluster; merging repeats until the desired grouping is reached.
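
The merge-the-nearest-clusters idea can be sketched as bottom-up (agglomerative) clustering. The choices below are assumptions for illustration: Jaccard distance on word sets, single linkage (distance between the closest members), and a fixed target number of clusters.

```python
def jaccard_distance(a, b):
    """One minus (shared words / all distinct words) for two word sets."""
    return 1 - len(a & b) / len(a | b)

def single_link(cluster_a, cluster_b, word_sets):
    """Distance between two clusters = distance between their closest members."""
    return min(jaccard_distance(word_sets[i], word_sets[j])
               for i in cluster_a for j in cluster_b)

def agglomerate(documents, target_clusters):
    """Merge the closest pair of clusters until target_clusters remain."""
    word_sets = [set(doc.lower().split()) for doc in documents]
    clusters = [[i] for i in range(len(documents))]  # every document starts alone
    while len(clusters) > target_clusters:
        pairs = [(single_link(clusters[x], clusters[y], word_sets), x, y)
                 for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        _, x, y = min(pairs)  # closest pair; y is always greater than x
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

documents = [
    "cheap flights to paris",
    "paris flights and hotels",
    "python list sorting",
    "sorting a python dictionary",
]
print(agglomerate(documents, target_clusters=2))
```

The result lists document indices per cluster; here the two travel queries end up together and the two programming queries end up together.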

Categorization
 ‘Categorization’ refers to assigning the given document to a specific
category. A common example is segregating the application forms on
the basis of age, discipline, class, etc.
 The categorization can be done on the basis of topics or its
attributes, such as type of document, author, year of printing,
subject, etc.
 Categorization is also called ‘classification’ when you want to assign
instances of the appropriate class of your known types. If you are
using Gmail for handling emails, you find folders with names
Primary, Promotion, Social, Updates and Forum. Your emails are
being categorized into the previous mentioned categories.
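
One way to assign documents to known categories is a multinomial Naive Bayes classifier trained on labelled examples. The sketch below is an assumption for illustration, not how Gmail works; the training emails are made up, and add-one (Laplace) smoothing is used to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict

def train(labelled_documents):
    """Count words per category and documents per category."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in labelled_documents:
        category_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, category_counts, vocabulary

def categorize(text, model):
    """Pick the category with the highest log-probability (add-one smoothing)."""
    word_counts, category_counts, vocabulary = model
    total_documents = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in category_counts.items():
        score = math.log(count / total_documents)  # prior
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Made-up training emails with known categories.
training = [
    ("big sale discount offer today", "Promotions"),
    ("discount coupon offer ends soon", "Promotions"),
    ("your friend tagged you in a photo", "Social"),
    ("new friend request and photo comment", "Social"),
]
model = train(training)
print(categorize("special discount offer", model))
print(categorize("photo from a friend", model))
```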

Summarization
• A summary is a shorter form of text, derived from one or more texts, that conveys the important knowledge in the original document(s).
• The most important advantage of using a summary is that it reduces reading time.
• Text summarization methods can be classified into the following types:
  • Extractive summarization
  • Abstractive summarization
  • Indicative summarization
  • Informative summarization
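
Of these types, extractive summarization is the simplest to sketch: it selects existing sentences instead of writing new ones. The version below is one possible heuristic, assuming that a sentence matters more when its words are frequent in the whole text; the stop-word list and the sample text are hypothetical.

```python
import re
from collections import Counter

# A short, hypothetical stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "from", "that"}

def content_words(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text, sentence_count=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(frequency[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(sentences, key=score, reverse=True)[:sentence_count]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in ranked)

text = ("Text mining extracts information from text. "
        "The weather was pleasant. "
        "Mining text at scale needs tools.")
print(summarize(text, sentence_count=1))
```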
4. Methods and Approaches in Text Analytics

• In text mining, there are two mining approaches: one based on keywords and another based on intelligent technologies.
• The keyword-based approach identifies repetitive patterns among the elements of the text and establishes relationships between these elements using statistical techniques.
• Text analytics is based on retrieval according to user requirements. For information retrieval, the following methods are used in text analytics:
  • Term-based method
  • Phrase-based method
  • Concept-based method
  • Pattern taxonomy method

Content Analysis
• Content analysis is a method for summarizing any form of content by counting various aspects of the content.
• Content analysis is a quantitative method: although it analyzes terms, its results are expressed as numbers and percentages. It has six main stages, which are as follows:
  1. Selecting content for analysis
  2. Units of content
  3. Preparing content for coding
  4. Coding the content
  5. Counting and weighing
  6. Drawing conclusions
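
The stages above can be sketched on a small scale as follows. The customer comments, the choice of one comment as the unit of content, and the keyword-based `CODING_SCHEME` are assumptions made for illustration; in practice coding is often done or checked by human coders.

```python
from collections import Counter

# Stages 1-2: the selected content, with one customer comment as the unit of content.
# The comments are made up for illustration.
units = [
    "The price is too high for this plan",
    "Support staff were rude on the phone",
    "Great price and friendly staff",
    "The app crashes every morning",
]

# Stages 3-4: a hypothetical coding scheme (category -> indicator words).
CODING_SCHEME = {
    "pricing": {"price", "cost", "expensive"},
    "service": {"staff", "support", "rude", "friendly"},
    "reliability": {"crashes", "bug", "slow"},
}

def code_unit(unit):
    """Return every category whose indicator words occur in the unit."""
    words = set(unit.lower().split())
    return {category for category, keywords in CODING_SCHEME.items() if words & keywords}

# Stage 5: count how many units received each code and express it as a percentage.
counts = Counter(category for unit in units for category in code_unit(unit))
percentages = {category: 100 * n / len(units) for category, n in counts.items()}

# Stage 6: the numbers are the basis for drawing conclusions.
for category, share in sorted(percentages.items()):
    print(f"{category}: {counts[category]} units ({share:.0f}%)")
```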

Natural Language Processing
• Programming computers to process and analyze natural language is called Natural Language Processing (NLP).
• The NLP process is broken down into three parts. The first task is to understand the natural language received by the computer.
• The next task is part-of-speech (POS) tagging, also called word-category disambiguation.
• The third step is text-to-speech conversion. At this stage, the computer's output is converted into an audible or textual format for the user.

Simple Predictive Modeling
• Predictive modeling is a statistical technique for making predictions based on past occurrences or data.
• It involves creating, testing and validating a model to best predict the outcome. This is done by running one or more algorithms on the data set on which the prediction is to be carried out.
• The seven steps involved in predictive modeling are:
  1. Data Mining: The relevant data is mined from the available pool of data.
  2. Understanding the Data: The data is studied and understood before the model is prepared.
  3. Preprocessing the Data: The data is preprocessed so that it is ready for modeling.
  4. Modeling the Data: The model is created after preprocessing.
  5. Evaluate the models and select the best fit: The models created are evaluated, and the best-fit model is selected for deployment.
  6. Deploy the model: The best-fit model is then deployed in the business.
  7. Monitor and improve: The deployed model is monitored and improved regularly.
5. Applications of Text Analytics

• Text analytics is used to analyze unstructured text, extract important information from it and transform it into useful information.
• Because of this benefit, text analytics finds applications in various fields, some of which are as follows:
  • Sentiment Analysis
  • Emotion Detection
  • Scholarly Communication
  • Health
  • Visualization
Let’s Sum Up

• Sentiment analysis is one of the most significant and popular techniques for describing textual data and drawing inferences from it.
• Topic modeling is a statistical approach for discovering the topic(s) in a collection of text documents based on the statistics of the words they contain.
• The LDA algorithm groups the corpus into topics automatically by learning to assign probabilities to all terms in the corpus.
• Term frequency indicates the importance of a word relative to the total number of terms in the document.
• Deriving detailed and structured information about an event from text is called event extraction.
THANK YOU
