10 - Session 10 - Text Analytics, Text Mining and Sentiment Analysis
Dealing with Text
• Data are represented in ways natural to problems from which they were
derived
• If we want to apply the many data mining tools that we have at our disposal,
we must
• either engineer the data representation to match the tools
(representation engineering), or
• build new tools to match the data
Text is “unstructured”
• Linguistic structure is intended for human communication, not for
computers
Context matters
Text Representation
• Goal: Take a set of documents, each of which is a relatively free-form
sequence of words, and turn it into our familiar feature-vector form
• Straightforward representation
• Inexpensive to generate
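As a rough illustration of the goal above, the sketch below turns a few documents into term-count feature vectors over a shared vocabulary. The sample documents, the `tokenize` helper, and the lowercase/regex tokenization are assumptions made for illustration, not taken from the slides.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, vectors): one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))  # all distinct terms, sorted
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors

# Hypothetical sample documents
docs = ["Jazz is music", "Music is art and jazz is art"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a fixed-length vector, so standard data mining tools can be applied to it; word order is discarded.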
TFIDF(t, d) = TF(t, d) × IDF(t)
Source: Data Science for Business; Fundamental Principles of Data Mining and Data-Analytic Thinking.
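A minimal sketch of the TFIDF weighting above. It assumes TF is the raw count of a term in a document and IDF(t) = 1 + ln(N / n_t), where N is the number of documents and n_t the number containing t; other variants exist. The tiny corpus and the function names are hypothetical.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Term frequency: raw count of the term in one document."""
    return Counter(doc_tokens)[term]

def idf(term, corpus):
    """Inverse document frequency: 1 + ln(N / number of docs containing term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: a term absent from the corpus gets weight 0
    return 1.0 + math.log(len(corpus) / containing)

def tfidf(term, doc_tokens, corpus):
    """TFIDF(t, d) = TF(t, d) * IDF(t)."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical, already-tokenized corpus
corpus = [
    ["jazz", "saxophone", "jazz"],
    ["jazz", "piano"],
    ["trumpet", "blues"],
]
print(tfidf("jazz", corpus[0], corpus))       # frequent in doc, common in corpus
print(tfidf("saxophone", corpus[0], corpus))  # appears once, rare in corpus
```

A term that appears in few documents gets a higher IDF, so it counts for more per occurrence than a term that appears almost everywhere.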
Example: Jazz Musicians
• 15 prominent jazz musicians and excerpts of their biographies from
Wikipedia
Beyond “Bag of Words”
• 𝑁-gram Sequences
• Topic Models
N-gram Sequences
• In some cases, word order is important and you want to preserve
some information about it in the representation
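One way to keep some word-order information is to add adjacent word pairs (bigrams) as extra features next to the single words. The sketch below assumes whitespace tokenization and an underscore-joined naming scheme for the pairs; the sample phrase and the `ngrams` helper are hypothetical.

```python
def ngrams(tokens, n):
    """Return all runs of n adjacent tokens as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample phrase
tokens = "quick brown fox jumps".split()
bigrams = ngrams(tokens, 2)

# Feature set: single words plus adjacent pairs
features = tokens + ["_".join(gram) for gram in bigrams]
print(features)
```

The cost of this choice is a much larger feature set, since the number of distinct pairs grows far faster than the number of distinct words.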
Text Mining Example
Task: predict the stock market based on the stories that appear on the
news wires
Mining News Stories to Predict Stock Price Movement
Text Mining
In general, the difference between text mining and text analytics is that text analytics
is the broader concept: it includes information retrieval (for example, searching for and
identifying documents relevant to a given set of key terms) as well as information
extraction, data mining, and Web mining.
Text Analytics = Information Retrieval + Information Extraction + Data Mining + Web Mining
or
Text Analytics = Information Retrieval + Text Mining
Text mining is especially useful in fields that produce very large amounts of textual
data, such as law (court orders), academic research (research articles), and finance
(quarterly reports).
Information extraction. Identify key phrases and relationships in text by searching for
predefined objects and sequences in text through pattern matching. The most common
form of information extraction
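A minimal sketch of pattern-based information extraction, assuming two kinds of predefined objects (monetary amounts and four-digit years) described by regular expressions. The patterns, the `extract` helper, and the sample sentence are hypothetical and far simpler than a production extractor.

```python
import re

# Hypothetical patterns for two kinds of predefined objects
PATTERNS = {
    "money": re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}

def extract(text):
    """Return every match of each pattern, keyed by the pattern's label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

# Hypothetical sample sentence
text = "In 2013 the firm reported revenue of $4.2 billion, up from $900 million in 2012."
found = extract(text)
print(found)
```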
NLP is closely related to text mining because NLP allows feature extraction
from unstructured text so that various data mining techniques can be used
to extract knowledge (new and useful patterns and relationships) from the
text. In simple terms, text mining is a combination of NLP and data mining,
where NLP provides the foundation for understanding and analyzing text in
depth.
NATURAL LANGUAGE PROCESSING
The benefits of NLP include the ability to generate automatic summaries of text,
translate text from one language to another, recognize sentiment in text, and
more. However, NLP is also faced with several challenges, such as:
• Text Segmentation: Languages such as Chinese, Japanese, and Thai do not mark
word boundaries with spaces, making identification of word boundaries difficult.
• Word Sense Disambiguation: Many words have more than one meaning, so
choosing the correct meaning requires consideration of context.
• Syntactic Ambiguity: Grammar in natural languages is often ambiguous,
requiring the incorporation of semantic and contextual information to select
appropriate sentence structures.
• Imperfect Input: Foreign accents and speech impediments in spoken input, or
typographical and grammatical errors in text, make language processing more difficult.
• Speech Acts: A sentence can often be considered an action by the speaker, and
sentence structure alone may not contain enough information to identify that action.
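The text segmentation challenge can be seen with plain whitespace tokenization, which works tolerably for English but returns a Chinese sentence as a single token. The two sample strings are hypothetical.

```python
# Hypothetical sample strings
english = "text mining is useful"
chinese = "文本挖掘很有用"

english_tokens = english.split()  # splits on whitespace
chinese_tokens = chinese.split()  # no spaces, so nothing to split on

print(len(english_tokens), english_tokens)
print(len(chinese_tokens), chinese_tokens)
```

Segmenting such text into words needs a dictionary or a trained segmentation model rather than a simple split.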
TEXT MINING
Some applications of text mining in marketing include:
Sentiment analysis, now a popular application of text analytics, has a broad impact in various fields. Compared with
traditional sentiment analysis methods, which are expensive and time-consuming, approaches based on text analytics can
automate data collection, filtering, classification, and clustering on a large scale. They draw on a variety of data sources, such
as social media, product reviews, and service center call records.
Some key applications of sentiment analysis include:
Voice of the Customer (VOC): Uses sentiment analysis to understand and manage customer complaints and compliments, helping
companies improve their products and services.
Voice of the Market (VOM): Uses sentiment analysis to understand aggregate market opinions and trends, helping companies develop
product strategies and position themselves in competitive markets.
Voice of the Employee (VOE): Uses sentiment analysis to assess employee satisfaction, which can influence efforts to improve
customer satisfaction.
Brand Management: Uses sentiment analysis to monitor opinions on social media in order to maintain or improve brand reputation.
Financial Markets: Applies sentiment analysis to data from social media, news, and online discussions to predict financial market
movements.
Politics: Analyzes sentiment in political discussions to predict election outcomes and understand the issues that matter to voters.
Government Intelligence: Uses sentiment analysis to monitor public opinion on government policies and to identify potential
threats based on negative communications.
In addition, sentiment analysis can be used in e-commerce site design, ad placement, search engine management, and email
filtering and analysis. Across this wide range of applications, it helps organizations understand and respond to opinions
and trends in various contexts.
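To make the idea of automated sentiment classification concrete, the sketch below scores short texts against a small word lexicon. This is only one simple approach, not a method prescribed by the slides; the lexicon, the scoring rule, and the sample reviews are all hypothetical.

```python
import re

# Tiny hypothetical lexicon; real systems use much larger resources or trained models
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def polarity(text):
    """Return (positive hits - negative hits) / total hits, in [-1, 1]; 0.0 if no hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def label(text):
    """Map the polarity score to a coarse sentiment class."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical sample reviews
reviews = [
    "Great product, I love it",
    "Terrible support and slow delivery",
    "It arrived on Tuesday",
]
labels = [label(review) for review in reviews]
print(labels)
```

A lexicon count ignores negation and context ("not good" scores as positive), which is why practical systems usually combine NLP features with trained classifiers.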
SENTIMENT ANALYSIS PROCESS
SENTIMENT ANALYSIS AND SPEECH ANALYTICS
Speech analytics is a field that enables the analysis and extraction of information from live and recorded
conversations. It is used to gather intelligence for security purposes, to improve media applications, and to
provide business intelligence through the analysis of customer calls worldwide.
In speech analytics, sentiment analysis focuses on assessing the emotional states expressed in a conversation and
measuring the presence and strength of positive or negative feelings. At its core, automated sentiment
analysis builds models that relate features and content in the audio to the sentiment that is perceived and
expressed.
❑ Provost, F., Fawcett, T. (2013). Data Science for Business; Fundamental Principles of
Data Mining and Data-Analytic Thinking. O'Reilly.
❑ Sharda, R., Delen, D., Turban, E. (2018). Business Intelligence, Analytics, and
Data Science: A Managerial Perspective, 4th Edition. Pearson.
Thank You