
TEXT ANALYTICS With Python

The document discusses text mining and sentiment analysis. It covers topics like the process of text analytics, techniques used in text analytics like natural language processing and sentiment analysis, practical applications of text analytics, and use case discussions. The document provides details on each step of text analytics like data collection, preprocessing, feature extraction, feature selection and different analysis methods.

Uploaded by

ignacio.pelirojo


TEXT MINING AND SENTIMENT ANALYSIS
Extracting textual information to draw insights

Jeroen VK Rombouts

Topics for the Session

1. Introduction

2. Process of Text Analytics

3. Text Analytics Techniques

4. Practical Applications

5. Use Case Discussion

1. INTRODUCTION


What is Text Mining?

• Text mining is the process of deriving high-quality information from text through statistical pattern learning.
• Types: text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modelling.


Need for Text Mining (1/2)

• The global text analytics market was valued at USD 3.95 billion and is expected to reach USD 10.38 billion by 2023, a compound annual growth rate (CAGR) of 17.3% over the 2018–2023 forecast period.
• Organizations increasingly use text analytics tools to support their decision-making, drawing actionable insights from text sources such as client interactions, emails, blogs, product reviews and tweets.
• The primary objective of text analytics is to gather data in different forms, both structured and unstructured, and analyse it to inform the organization's business decisions.


Need for Text Mining (2/2)

• In marketing: analytical customer relationship management, predictive models for customer attrition, sentiment analysis of a brand (benchmarking, market analysis, competitive analysis, …).
• Determine the identity of a brand, the way it communicates with its audience, and which emotional triggers it uses in its marketing campaigns.
• Ultimately, text mining allows a brand to readjust its communication and strategy by identifying how its audience, partners and competitors perceive it.

User      Valence   Volume   …
Peter     +5        500
Sarah     +3        400
Comp. 1   -10       5000
…         …         …

Text mining & Social Media Data – The Questions

• Volume:
    • How much?
    • Examples of metrics?
• Valence:
    • How to measure?
    • Examples of metrics?
• Heterogeneity:
    • How different?
    • Examples of metrics?


Text mining & Social Media Data – The Answers

• Volume:
    • Amount of data scraped, measured in kilobytes/gigabytes
    • Number of records in the given data
• Valence:
    • Measure the amount of positivity or negativity of a sentence
    • Polarity and subjectivity
• Heterogeneity:
    • Similarity of words in the text corpus
    • Clustering based on the term frequencies


What is our Prime Focus?

• External and non-structured data: network, UGC, etc.
• Internalized data: data warehouse, ERP, CRM, etc.
• External structured data: panel, survey, tests, etc.
• Data for organizations and businesses directly usable for business solutions

2. PROCESS OF TEXT ANALYTICS


Process of Text Analytics

• Collection of Text Data
• Pre-processing
• Feature Extraction
• Feature Selection
• Text Analysis and Modelling
• Natural Language Processing
• Sentiment Analysis
• Text Grouping and Classification


Text Mining – Classification tree


Text Data

Data for text analytics can take many forms, such as:
• Structured – Survey forms, Tests, Word docs
• Semi-structured – Job listings, Retail invoices, Reports
• Unstructured – Blogs, Tweets, Comments


Pre-processing

• Case conversion
• Punctuation removal
• Stopword removal – dropping common words that carry little meaning on their own
• Rare word removal – dropping words that occur too rarely to be informative
• Spelling correction
• Tokenization – breaking a sentence down into a list of words
• Stemming – trimming word endings to obtain a root form (the result need not be a dictionary word)
• Lemmatization – mapping each inflected form of a word to its dictionary form (lemma)
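
Several of the steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's code: the stopword list is a tiny hypothetical one, and `crude_stem` is a simplified stand-in for a real stemmer such as the ones shipped with NLP libraries.

```python
import string

# Hypothetical, deliberately tiny stopword list; real projects use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "on", "for"}


def crude_stem(word):
    """Strip a few common English suffixes (illustrative stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords, stem."""
    text = text.lower()                                               # case conversion
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    tokens = text.split()                                             # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]                # stopword removal
    return [crude_stem(t) for t in tokens]                            # crude stemming


print(preprocess("The cats are chasing the mice, and the dog barked!"))
# ['cat', 'chas', 'mice', 'dog', 'bark']
```

The output shows why stemming and lemmatization differ: the crude stemmer produces "chas" rather than "chase", and leaves "mice" untouched, whereas a lemmatizer would be expected to return "chase" and "mouse".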


Feature Extraction

• Number of words
• Number of characters
• Average word length
• Number of stopwords
• Number of special characters
• Number of numeric characters
• Number of uppercase words
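
As a rough sketch, the counts listed above can be computed for a single text as follows. The function name `basic_features`, the dictionary keys and the small stopword set are assumptions made for this illustration, and "special characters" is taken here to mean ASCII punctuation.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}


def basic_features(text):
    """Return simple surface-level features of a text as a dictionary."""
    words = text.split()
    return {
        "n_words": len(words),
        "n_chars": len(text),  # includes spaces
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_stopwords": sum(w.lower() in STOPWORDS for w in words),
        "n_special": sum(ch in string.punctuation for ch in text),
        "n_numeric": sum(ch.isdigit() for ch in text),
        "n_upper": sum(w.isupper() for w in words),
    }


features = basic_features("I LOVE the new phone #happy 100%")
print(features["n_words"], features["n_special"], features["n_upper"])  # 7 2 2
```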


Feature Selection

• Feature selection refers to filtering the useful information out of the features extracted with the methods discussed before.
• Feature selection can be done either with the ‘Bag of Words’ method or with machine learning.
• Other feature selection techniques include N-grams, Term Frequency – Inverse Document Frequency (TF-IDF) and word embeddings.
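
To make two of these terms concrete, the sketch below builds a bag of words (word counts that ignore order) and a list of bigrams (N-grams with N = 2) from one tokenized sentence. The helper name `ngrams` and the sample sentence are assumptions for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, each as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "text mining turns raw text into insight".split()

bag_of_words = Counter(tokens)  # unigram counts; word order is discarded
bigrams = ngrams(tokens, 2)     # consecutive word pairs; local order is kept

print(bag_of_words["text"])  # 2
print(bigrams[:2])           # [('text', 'mining'), ('mining', 'turns')]
```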

3. TEXT ANALYTICS TECHNIQUES


Natural Language Processing

• The foremost function of NLP in text mining is part-of-speech tagging (commonly referred to as POS tagging), which identifies the grammatical role of each word in a sentence and tags it accordingly.
• Other features of NLP include:
    • Text summarization
    • Machine translation
    • Optical character recognition
    • Document to information


Sentiment Analysis

• Brand perception among customers is one of the key factors to consider before making any critical decision in the current market.
• Sentiment analysis of text data that has been collected, cleaned and processed helps us to better understand the consumer market.
• The data for sentiment analysis is usually tweets, social media posts, blog comments, product reviews, etc.
• Sentiment analysis can also be carried out on longer passages to gauge the emotion of the text.
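
One simple way to score sentiment is a lexicon lookup: each known word carries a polarity, and the text's score is the average over the words that match. The sketch below is a toy version of that idea with a hypothetical six-word lexicon and an arbitrary threshold; it is not the method used by any particular library, and real lexicons are far larger and handle negation and intensity.

```python
# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {"love": 1.0, "great": 0.8, "good": 0.5, "bad": -0.5, "hate": -1.0, "awful": -0.8}


def polarity(text):
    """Average lexicon score of the words found in the lexicon; 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(text, threshold=0.1):
    """Map a polarity score to 'positive', 'negative' or 'neutral'."""
    score = polarity(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(label("I love this phone, great camera!"))  # positive
print(label("I hate the awful battery."))         # negative
print(label("The parcel arrived today."))         # neutral
```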


Text Classification

• Types:
    • Supervised document classification
    • Unsupervised document classification
    • Semi-supervised document classification
• Techniques:
    • K-nearest neighbour algorithms
    • Naïve Bayes classifier
    • Support Vector Machines
    • Artificial Neural Networks
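
As an example of supervised classification, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, written from scratch so that every step is visible. The function names and the four-document spam/ham training set are made up for illustration; in practice a tested library implementation would normally be used.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[y].values())
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[y][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy training data.
train_docs = [
    "win free prize now".split(),
    "free money click now".split(),
    "meeting agenda for monday".split(),
    "lunch with the project team".split(),
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, "free prize money".split()))        # spam
print(predict_nb(model, "project meeting monday".split()))  # ham
```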


POS Tagging – Parts Of Speech

• POS tagging is the process of assigning a single part-of-speech tag to each word (and each symbol or punctuation mark) in a text.
• It is useful for finding grammatical patterns in N-grams and for calculating distance metrics between different POS tags.


TF-IDF (1/2)

• TF-IDF stands for Term Frequency – Inverse Document Frequency. It measures how important a particular word is to a document within a text corpus.
• The TF-IDF value increases in proportion to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word.
• The formula for Term Frequency is given by:


TF-IDF (2/2)
• The Inverse Document Frequency is given by:

• Finally TF-IDF:
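
Several variants of these quantities are in use. One common formulation, assumed here, is: TF(t, d) = (number of times t occurs in d) / (number of terms in d); IDF(t) = log(N / number of documents containing t), where N is the number of documents in the corpus; and TF-IDF(t, d) = TF(t, d) × IDF(t). The sketch below implements exactly these assumed definitions on a made-up three-document corpus; libraries often apply additional smoothing and normalization, so their values will differ.

```python
import math
from collections import Counter


def tf(term, doc):
    """Share of the document's tokens equal to term (assumes a non-empty document)."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """log(N / document frequency); 0.0 if the term occurs in no document."""
    df = sum(term in doc for doc in corpus)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)


# Hypothetical toy corpus of tokenized documents.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

print(tf_idf("the", corpus[0], corpus))  # 0.0, since "the" occurs in every document
print(tf_idf("mat", corpus[0], corpus) > tf_idf("cat", corpus[0], corpus))  # True
```

A word found in every document scores zero, while a word concentrated in one document ("mat") outranks one shared across documents ("cat").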


Similarity – Levenshtein Distance

• The minimum number of edits (insertions, deletions, substitutions) needed to change one string of characters into another.
• For example, the Levenshtein distance between kitten and sitting is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
    kitten → sitten (substitution of "s" for "k")
    sitten → sittin (substitution of "i" for "e")
    sittin → sitting (insertion of "g" at the end)
• Applications: spell checkers, fuzzy string searching, and assisting natural language translation based on translation memory.
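
The distance can be computed with the standard dynamic-programming recurrence. The sketch below keeps only two rows of the table at a time; the function name `levenshtein` is chosen for this illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```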

4. PRACTICAL APPLICATIONS


Practical Applications
• Spam mail classification
• Brand perception in the current market
• Competitor analysis
• Contextual advertising
• Business intelligence
• Prediction and prevention of crime
• Customer care services
• Fraud detection by insurance companies

5. USE CASE DISCUSSION


Hands on Text Analytics session

• Open the “Text Analytics - Accenture Strategic Business Analytics Chair” Python notebook.
• Type ‘pip install’ followed by the library name to download the required packages or dependencies, for example:
    pip install textblob
• Set the working directory to the location of the “train_E6oV3lV” CSV file.


Motivation behind the Use Case

• Hate speech is an unfortunately common occurrence on the Internet. Social media sites like Facebook and Twitter often face the problem of identifying and censoring problematic posts while weighing the right to freedom of speech.
• The importance of detecting and moderating hate speech is evident from the strong connection between hate speech and actual hate crimes.
• Early identification of users promoting hate speech could enable outreach programs that attempt to prevent an escalation from speech to action.


About the Data set

• The data consists of tweets extracted from Twitter and is publicly available through the Analytics Vidhya contest “Twitter Sentiment Analysis”.
• The data is a CSV file containing 31,962 unique tweets scraped from Twitter, with a mix of hate, neutral and positive tweets.
• Each tweet has a corresponding tweet ID and a sentiment label.
• Hate tweets are labelled ‘1’ and all others ‘0’.
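
A first look at such a file usually starts with the share of each label. The sketch below assumes the columns are named `id`, `label` and `tweet`, which is not confirmed by the slides, and runs on a four-row made-up sample rather than the real data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the assumed layout: id, label, tweet.
SAMPLE_CSV = """id,label,tweet
1,0,what a lovely morning
2,1,an example of a hateful tweet
3,0,just landed in paris
4,0,coffee time with friends
"""


def label_share(csv_file):
    """Return the share of each label value among the rows of an open CSV file."""
    counts = Counter(row["label"] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = label_share(io.StringIO(SAMPLE_CSV))
print(shares)  # {'0': 0.75, '1': 0.25}
```

For the real data, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.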


Let’s explore the data


Text Pre-processing


Feature Selection
TF-IDF N-grams


Sentiment Analysis - Output

This analysis gives us a general opinion about the set of tweets we took into consideration. From the pie chart, we can see that around 80% of the tweets are either neutral or positive, and hence there is very little hate/negative content in this text corpus.

Word Cloud
What conclusions can we draw based on the resulting word cloud?

We can refine the graph by removing certain words from the original corpus, e.g.:
• Remove “go”
• Use spelling checks


K-means Clustering
Through K-means clustering we can now identify the group of people who have a higher positive sentiment than the rest, which is cluster 2.

By clustering the tweets on their sentiment instead, we can classify the users according to the emotions expressed in their posts.
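
To show the mechanics, here is a minimal sketch of K-means (Lloyd's algorithm) on one-dimensional sentiment scores. The scores are made up, the deterministic initialisation is a simplification chosen so the example is reproducible, and the cluster numbering has no relation to the clusters reported above.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values; returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the starting centroids over the sorted values.
    if k > 1:
        centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    else:
        centroids = [ordered[0]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical polarity scores, one per user.
scores = [-0.8, -0.7, -0.1, 0.0, 0.1, 0.7, 0.9]
centroids, assignments = kmeans_1d(scores, k=3)

print(assignments)  # [0, 0, 1, 1, 1, 2, 2]
most_positive = max(range(3), key=lambda c: centroids[c])
print(most_positive)  # 2
```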

Conclusion and Future Scope

• From the above analysis, we have obtained insights into the overall sentiment of the people whose tweets were examined.
• This sentiment analysis provides the basis for hate/love speech recognition.
• Going further, we can train a model on our newly tagged tweets and predict the occurrence of hate speech in a new set of tweets.
