Text Analytics

Presentation by Mr. Challapalli Sudhakar at the GSIB National Conference on Business Analytics at GITAM University in Visakhapatnam


TEXT ANALYTICS

C Sudhakar
CEO

Raskey Software Solutions Ltd

Email:sudhakar@raskeysoft.com

Web:www.raskeysoft.com

SMART ANALYTICS

Start with strategy

Measure Metrics and Data
Apply analytics
Report results
Transform Business

TYPES OF ANALYTICS
Data analytics
Compete on Analytics
Text analytics
Video analytics
Social networking analytics
Web analytics
Speech analytics

TEXT ANALYTICS
Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence.
Text analysis is now capable of telling us things we did not already know and, perhaps more importantly, had no way of knowing before.
Access to huge text data sets and improved technical capability mean we can now mine the text for patterns and trends that can be incredibly useful in business.

TEXT ANALYTICS TASKS INCLUDE

Text categorization
Text clustering
Concept extraction
Sentiment analysis
Document summarization

TEXT CATEGORIZATION
Text categorization applies some structure to the text, which can then be used for analysis or query.
It assigns a document to one or more classes or categories according to its subject or to other attributes such as document type, author, or creation date.
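As a rough illustration of assigning documents to categories by subject, the sketch below trains a small multinomial Naive Bayes classifier in plain Python. The technique, the class name `NaiveBayesCategorizer`, and the four training sentences are assumptions made for illustration; the slides do not say which method is used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, labelled_docs):
        for text, category in labelled_docs:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def categorize(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in tokenize(text):
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data, invented for this sketch.
training = [
    ("quarterly revenue and profit grew", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
]
model = NaiveBayesCategorizer()
model.train(training)
print(model.categorize("the bank reported profit"))
```

Categorizing by attributes such as author or creation date would normally read document metadata rather than the text itself, so it is not covered by this sketch.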

TEXT CLUSTERING

As the name suggests, text clustering allows you to automatically cluster huge repositories of text into meaningful topics or categories for fast information retrieval or filtering.
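One simple way to picture clustering is a single pass over the documents in which each document joins the first cluster whose seed document shares enough words with it. The sketch below uses Jaccard similarity on word sets; the similarity measure, the `threshold` value, and the sample documents are all assumptions chosen for illustration.

```python
import re


def word_set(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.2):
    """Single-pass clustering: a document joins the first cluster whose seed
    document is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_word_set, [document indices])
    for index, doc in enumerate(docs):
        words = word_set(doc)
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((words, [index]))
    return [members for _, members in clusters]


# Hypothetical documents, invented for this sketch.
docs = [
    "network security policy update",
    "football match result",
    "new network security policy",
    "football match report",
]
print(cluster_documents(docs))
```

Production systems usually use weighted term vectors and an algorithm such as k-means or hierarchical clustering; the single-pass version is only meant to show the idea of grouping by similarity.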

CONCEPT EXTRACTION
Concept extraction allows you to extract concepts from text.
Meaning varies with concept.

SENTIMENT ANALYSIS

Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.
The basic purpose of sentiment analysis is to classify the polarity of a given text as positive, negative or neutral. Alternatively, the output can be a star rating or a position on a scale.
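Polarity classification can be sketched with a small sentiment lexicon: count positive and negative words and compare the totals. The word lists and the function name `classify_polarity` below are assumptions invented for this example, not a lexicon from the presentation.

```python
import re

# Hypothetical mini-lexicon; real lexicons contain thousands of entries.
POSITIVE_WORDS = {"nice", "cool", "clear", "good", "great"}
NEGATIVE_WORDS = {"mad", "expensive", "bad", "poor"}


def classify_polarity(text):
    """Label text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


print(classify_polarity("It was such a nice phone."))
```

A plain word count like this ignores negation ("not nice") and context, which is one reason practical systems combine lexicons with trained models.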

EXAMPLE

(1) I bought an iPhone 2 days ago.
(2) It was such a nice phone.
(3) The touch screen was really cool.
(4) The voice quality was clear too.
(5) However, my mother was mad with me as I did not tell her before I bought it.
(6) She also thought the phone was too expensive, and wanted me to return it to the shop.

The first thing that we may notice is that there are several opinions in this review.

ANALYSIS

Sentences (2), (3) and (4) express three positive opinions, while sentences (5) and (6) express negative opinions.
Then we also notice that the opinions all have some targets on which they are expressed.
The opinion in sentence (2) is on iPhone as a whole, and the opinions in sentences (3) and (4) are on the touch screen and voice quality features of iPhone respectively.
The opinion in sentence (6) is on the price of iPhone, but the opinion/emotion in sentence (5) is on me, not iPhone.
This is an important point.
In an application, the user may be interested in opinions on certain targets, but not on all (e.g., unlikely on me).
Finally, we may also notice the sources or holders of opinions.
The source or holder of the opinions in sentences (2), (3) and (4) is the author of the review (I), but in sentences (5) and (6) it is my mother. With this example in mind, we can define sentiment

OBJECT AND FEATURE

In general, opinions can be expressed on
any target entity, e.g., a product, a service,
an individual, an organization, or an event.
We use the term object to denote the target
entity that has been commented on.
An object can have a set of components (or
parts) and a set of attributes (or properties)
[1, 4], which we collectively call the features
of the object.
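The object and feature terminology can be made concrete with a small record type that stores, for each opinion in the iPhone review, the object, the feature, the holder and the polarity. The `Opinion` class, its field names, and the use of "GENERAL" for an opinion on the object as a whole are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    sentence: int   # sentence number in the review
    obj: str        # target entity (the "object")
    feature: str    # component or attribute; "GENERAL" means the object as a whole
    holder: str     # who holds the opinion
    polarity: str   # "positive" or "negative"


# The opinions described in the analysis of the iPhone review.
review_opinions = [
    Opinion(2, "iPhone", "GENERAL", "author", "positive"),
    Opinion(3, "iPhone", "touch screen", "author", "positive"),
    Opinion(4, "iPhone", "voice quality", "author", "positive"),
    Opinion(5, "me", "GENERAL", "mother", "negative"),
    Opinion(6, "iPhone", "price", "mother", "negative"),
]


def opinions_on(opinions, obj):
    """Keep only the opinions whose target object matches."""
    return [opinion for opinion in opinions if opinion.obj == obj]


for opinion in opinions_on(review_opinions, "iPhone"):
    print(opinion.sentence, opinion.feature, opinion.holder, opinion.polarity)
```

Filtering by object reflects the point that an application is usually interested in opinions on certain targets only.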

TECHNICAL CHALLENGES
Object Identification
Feature grouping and synonym grouping
Opinion orientation classification
Integration
Identification of spam reviews/ documents

CLASSIFICATION
Document-level sentiment analysis
Sentence-level sentiment analysis
Aspect-based sentiment analysis
Comparative sentiment analysis
Sentiment lexicon acquisition

DOCUMENT SUMMARIZATION
Again, as the name suggests, this text analytics tool allows you to automatically summarize documents, retaining the most important points from the original document.
Extraction
Abstraction
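Of the two approaches, extraction is the easier one to sketch: score each sentence by how frequent its words are in the whole document and keep the top-scoring sentences in their original order. The scoring rule, the stop-word list, and the function name `summarize` are assumptions for illustration. Abstraction, which generates new sentences, is not covered here.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "for", "on", "was"}


def content_words(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(frequency[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(document_text, max_sentences=2)` on a longer document returns its two most representative sentences under this scoring rule.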

SUMMARY

Text Analytics is particularly useful for

information retrieval, pattern recognition,
tagging and annotation, information
extraction, sentiment assessment and
predictive analytics.

A REAL TIME PROCESS

SMALL EXAMPLE IN AI

THIS APPROACH WORKS IN CASE OF BOUNDED GROUND

CURATOR ENGINE INTELLIGENCE ENGINES

Domain Intelligence
Extraction Engine
Context Intelligence
Keyword Intelligence
Intent Analysis Engine

Lead Validity Intelligence
Positive

Opportunity

DOMAIN INTELLIGENCE
Input: document URL and name
Checks: URL / name pattern; Dmoz / Jigsaw data
Outcomes: Positive; Negative; Unsure (both positive and negative, or neither positive nor negative)

Challenges

Insufficient domain knowledge: more elimination can be achieved with more domain knowledge from the source.

Solution

Insufficient domain knowledge: the SLED crawler and domain classification should provide more knowledge.
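The three outcomes of domain intelligence can be sketched as a pattern check on the document URL and name: positive evidence only gives "positive", negative evidence only gives "negative", and both or neither gives "unsure". The pattern lists below are hypothetical placeholders; the actual patterns and the Dmoz / Jigsaw lookups are not described in the slides.

```python
from urllib.parse import urlparse

# Hypothetical patterns, invented for this sketch.
POSITIVE_PATTERNS = (".edu", ".gov", "school")
NEGATIVE_PATTERNS = ("jobs", "careers", "news")


def classify_domain(url, name=""):
    """Classify a document source as positive, negative or unsure."""
    host = urlparse(url).netloc.lower()
    haystack = f"{host} {name.lower()}"
    positive = any(pattern in haystack for pattern in POSITIVE_PATTERNS)
    negative = any(pattern in haystack for pattern in NEGATIVE_PATTERNS)
    if positive and not negative:
        return "positive"
    if negative and not positive:
        return "negative"
    # Both positive and negative, or neither: leave it to later engines.
    return "unsure"


print(classify_domain("https://www.example.edu/agenda.pdf"))
```

Documents marked "unsure" are the ones where more domain knowledge would help, which matches the challenge noted on the slide.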

EXTRACTION ENGINE
Input: document (old document, new document)
Parsers: Tika and Pdf2Xml; PdfMiner
Output: text, XML and metadata

Challenges

Non-visible characters: raise exceptions or cause misinterpretation (2%). Example: schools is extracted as schools and changes the meaning.
Parser failures: PdfMiner is an accurate parser but fails at times (10%).

Solutions

Parser failures: using Tika and Pdf2Xml in combination reduces context leakage.
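Both challenges can be sketched in a few lines: strip control and format characters from the extracted text, and try a second parser when the first one fails. The fallback order and the parser callables are assumptions; the slides only say that the parsers are used in combination, and the sketch does not call the real Tika, Pdf2Xml or PdfMiner APIs.

```python
import unicodedata


def strip_non_visible(text):
    """Drop control (Cc) and format (Cf) characters, keeping ordinary whitespace."""
    kept = []
    for ch in text:
        if ch in "\n\t ":
            kept.append(ch)
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            continue
        else:
            kept.append(ch)
    return "".join(kept)


def parse_with_fallback(document, parsers):
    """Try each parser callable in order and return the first cleaned result."""
    errors = []
    for parser in parsers:
        try:
            return strip_non_visible(parser(document))
        except Exception as exc:  # a real pipeline would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all parsers failed: {errors}")


# A zero-width space and a soft hyphen hidden inside a word.
print(strip_non_visible("scho\u200bols\u00ad"))
```

Each entry in `parsers` is expected to be a function that takes the document and returns its text, so wrappers around actual parsing libraries could be plugged in without changing the fallback logic.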

CONTEXT INTELLIGENCE
Input: document titles and headers from the parser
Outcomes: Positive; Unsure; Negative

Challenges

Ambiguous context: misleads the decision, e.g. a job posting inside an agenda.
Insufficient context: the context is away from the keyword location or missing.

Solutions

Insufficient context: extract context from various locations, such as information from the source, directory information, and domain intelligence.

KEYWORD INTELLIGENCE
Input: context around the keyword from the parser (paragraphs, bullet points, tables)

Challenges

Identification of keyword phrases: reduces data leakage.
Keyword-specific intelligence: negative extensions, support words etc., e.g. free wifi, wireless mouse, network security policy.

Solutions

Keyword-specific intelligence: manually collected for popular keywords. Use a statistical bigram approach for other keywords.
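A statistical bigram approach can be sketched by counting the word pairs in which a keyword appears, so that frequent extensions such as "free wifi" stand out as candidates for keyword-specific rules. The function name `keyword_bigrams` and the sample corpus are assumptions made for this example.

```python
import re
from collections import Counter


def keyword_bigrams(texts, keyword):
    """Count adjacent word pairs that contain the keyword."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for left, right in zip(tokens, tokens[1:]):
            if keyword in (left, right):
                counts[(left, right)] += 1
    return counts


# Hypothetical corpus, invented for this sketch.
corpus = [
    "Guests enjoy free wifi in the lobby.",
    "The hotel offers free wifi and parking.",
    "The district will upgrade wifi access points.",
]
for bigram, count in keyword_bigrams(corpus, "wifi").most_common(3):
    print(bigram, count)
```

Bigrams that occur far more often than the rest can then be reviewed and marked as negative extensions or support words.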

INTENT ANALYSIS

Context around keyword:
Paragraph: direct relation, indirect relation
Bullet point: header analysis, bullet point analysis
Table: row analysis, header analysis

INTENT ANALYSIS CHALLENGES

Human Ambiguity

"Improved productivity and streamlined IT infrastructure through file storage capabilities"
"The plan includes providing sufficient network capacity" (this sentence is present in an analysis document from a writer)

Machine Ambiguity

"Authorize a purchase of storage area network equipment": the keyword is "network equipment"
"The technology director shall enhance awareness regarding network security"

Solution

Experimenting by building probabilistic language models.

INTENT ANALYSIS CHALLENGES

Stanford Mistakes

The Stanford software we are using sometimes builds wrong relations.

Example: in "IT Infrastructure", "IT" is identified as "it".

Solution

Replace the keyword with a generic keyword before parsing it with Stanford. The generic keyword shouldn't spoil the relations.
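The replacement step can be sketched with a regular expression that swaps each whole-phrase occurrence of the keyword for a neutral noun before the sentence is handed to the parser. The placeholder word "product" and the function name `mask_keyword` are assumptions; the slides do not say which generic keyword is used.

```python
import re


def mask_keyword(sentence, keyword, placeholder="product"):
    """Replace each whole-phrase, case-sensitive occurrence of the keyword."""
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)")
    return pattern.sub(placeholder, sentence)


print(mask_keyword("Upgrade the IT Infrastructure soon, it is old.", "IT Infrastructure"))
```

Matching is case-sensitive on purpose, so the pronoun "it" elsewhere in the sentence is left alone while the keyword "IT Infrastructure" is masked.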

Indirect buying decision

"Information security is recognized as a top management challenge for the department"

OTHER CHALLENGES

Noisy Keywords

Keywords like vmware, firewall and gis contribute a lot of noise.
This is unavoidable: these keywords also contribute towards positives.

Noisy Domains

Domains like itdashboard.gov contribute a lot of noise.
Contributed 22% noise to Tegile leads in June.

Duplicates

Documents from the same domain appear multiple times, contributing duplicate documents.
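Duplicate documents from the same domain can be filtered with a fingerprint made of the domain and a hash of the normalized text. This is a minimal sketch that only catches exact duplicates after whitespace and case normalization; the function names and sample documents are assumptions for illustration.

```python
import hashlib
import re
from urllib.parse import urlparse


def fingerprint(url, text):
    """Return (domain, content hash) for a document."""
    domain = urlparse(url).netloc.lower()
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return domain, digest


def drop_duplicates(documents):
    """Keep the first document for each (domain, content hash) pair."""
    seen = set()
    unique = []
    for url, text in documents:
        key = fingerprint(url, text)
        if key not in seen:
            seen.add(key)
            unique.append((url, text))
    return unique


# Hypothetical documents, invented for this sketch.
documents = [
    ("https://a.example.gov/1.pdf", "Network  upgrade plan"),
    ("https://a.example.gov/2.pdf", "network upgrade plan"),
    ("https://b.example.gov/1.pdf", "network upgrade plan"),
]
print(len(drop_duplicates(documents)))
```

Near-duplicates with small wording differences would need a fuzzier comparison, such as shingling, which this sketch does not attempt.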

POSITIVE MARKED DOCUMENTS

[Bar chart: percentage of positive-marked documents for May and June, broken down into Lost Business, Wrong Context, Rejected by reviewer, and Approved.]

LEAD VALIDITY INTELLIGENCE

False positives: Lost Business, Low Budget, Wrong Industry, Too Early, Others

Challenges

Duplicates
Company-specific constraints: Campus Management requires only higher education leads.
Identifying budget constraints, e.g. < $10k

Solution

Implemented patterns to identify lost businesses.

AFTER APPLYING LOST BUSINESS PATTERNS

[Bar chart: percentage of lost business (L.B) identified vs. not identified for Juniper (55/120), Google (30/150) and Tegile (19/55).]

CURRENTLY COMPANIES ARE WORKING ON

Probabilistic Language Models

Build semi-supervised language models to handle machine ambiguity.
Develop a diversified language-based dataset for training.

Driver Based Patterns

Develop patterns specific to the driver word.

E.g.:

Provide: specifies intent of an action

Provides: specifies intent of a solution/service
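The driver-word idea can be sketched as a lookup from the exact word form to the kind of intent it signals. Only the two forms from the slide are included; the dictionary name `DRIVER_INTENTS` and the function `driver_intents` are hypothetical.

```python
import re

# The two driver forms named on the slide and the intent each one signals.
DRIVER_INTENTS = {
    "provide": "action",
    "provides": "solution/service",
}


def driver_intents(sentence):
    """Return (driver word, intent) pairs found in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [(word, DRIVER_INTENTS[word]) for word in words if word in DRIVER_INTENTS]


print(driver_intents("The district will provide wireless access"))
```

Other inflections such as "providing" are deliberately left unmatched, since the slide distinguishes intent by the exact form of the driver word.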

Keyword Intelligence

Methodologies to derive and handle keyword phrases.

Start by manually adding keyword phrases and slowly move towards an automated system.

THANK YOU
