Chapter 5 Predictive Analytics II: Text, Web, and Social Media Analytics

Chapter 5 discusses the significance of text analytics and text mining in extracting valuable insights from unstructured text data, which can lead to better decision-making and competitive advantages for businesses. It differentiates between text analytics, which focuses on quantitative results and pattern recognition, and text mining, which aims to discover qualitative knowledge from textual sources. The chapter also covers the process of sentiment analysis, its applications in various fields, and the steps involved in analyzing sentiments expressed in text.

Uploaded by

stanspatch

CHAPTER 5

PREDICTIVE ANALYTICS II: TEXT, WEB, AND SOCIAL MEDIA ANALYTICS
LO 1 Describe text analytics and understand the need for text mining.
Knowledge is power in today’s business world, and knowledge is derived from data and
information. Businesses that effectively and efficiently tap into their text data sources
will therefore have the knowledge they need to make better decisions, giving them a
competitive advantage over businesses that lag behind.
TEXT ANALYTICS is the automated process of translating large volumes of unstructured text
into quantitative data to uncover insights, trends, and patterns. Combined with data
visualization tools, this technique enables companies to understand the story behind the
numbers and make better decisions.
Text analysis uses many linguistic, statistical, and machine learning techniques. Text
analytics involves retrieving information from unstructured data, structuring the input
text to derive patterns and trends, and evaluating and interpreting the output data. It
also involves lexical analysis, categorisation, clustering, pattern recognition, tagging,
annotation, information extraction, link and association analysis, and visualisation.
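
As a minimal sketch of what "translating unstructured text into quantitative data" can
look like, the Python snippet below counts word frequencies across a few comments. The
comments and the function name term_frequencies are invented for illustration, and the
tokenisation is deliberately crude (lowercase letters and apostrophes only).

```python
import re
from collections import Counter

def term_frequencies(documents):
    """Count how often each lowercase word appears across all documents."""
    counts = Counter()
    for text in documents:
        # Lexical analysis in its simplest form: split the text into word tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Hypothetical customer comments, invented for this example.
comments = [
    "Delivery was late and the box was damaged.",
    "Great product, but delivery was late again.",
    "Late delivery. Support was helpful.",
]

freq = term_frequencies(comments)
print(freq.most_common(3))
```

Counts like these are the kind of quantitative output that can then be fed into a chart or
table to reveal a recurring theme, here late deliveries.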

@study_ingmadesimple Luca du Toit


Application areas of text mining:
►information extraction – text mining can identify key phrases and relationships within text
by looking for predefined objects and sequences in text by way of pattern matching
►topic tracking – based on a user profile and documents that a user views, text mining
can predict other documents of interest to the user
►summarisation – text mining can summarise a document, which saves the reader time
►categorisation – text mining can identify the main themes of a document and then
place the document into a predefined set of categories based on those themes
►clustering – text mining can group similar documents without having a predefined set of
categories
►concept linking – text mining can connect related documents by identifying their shared
concepts, thereby helping users find information that they perhaps would not have found
using traditional search methods
►question answering – text mining can find the best answer to a given question through
knowledge-driven pattern matching
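
To make the first application area concrete, here is a small Python sketch of information
extraction by pattern matching. The pattern, the sample sentence, and the company names
are all hypothetical; real systems use far richer patterns than one regular expression.

```python
import re

# Hypothetical predefined sequence: <Company> acquired <Company>.
ACQUISITION = re.compile(r"\b([A-Z][a-zA-Z]+) acquired ([A-Z][a-zA-Z]+)\b")

def extract_acquisitions(text):
    """Return (buyer, target) pairs found by matching the predefined sequence."""
    return ACQUISITION.findall(text)

# Invented sample text for illustration.
sample = "Last year Acme acquired Globex. Analysts say Initech acquired Hooli in March."
pairs = extract_acquisitions(sample)
print(pairs)
```

Each pair is a relationship pulled out of free text, which can then be stored in a
structured table for further analysis.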

Many organisations are realising the importance of extracting knowledge from their
document-based data repositories through the use of text mining tools. Text mining
benefits are obvious in the areas where very large amounts of textual data are being
generated, such as law (court orders), academic research (research articles), finance
(quarterly reports), medicine (discharge summaries), biology (molecular interactions),
technology (patent files), and marketing (customer comments).
LO 2 Differentiate among text analytics, text mining, and data mining.

Both text analytics and text mining aim to solve the same problem (automatically
analysing raw text data), but they use different techniques. Text mining identifies
relevant information within a text and therefore provides qualitative results. Text
analytics, however, focuses on finding patterns and trends across large sets of data,
resulting in more quantitative results. Text analytics is usually used to create graphs,
tables, and other sorts of visual reports.

TEXT ANALYTICS is a broad concept that includes information retrieval, as well as
information extraction, data mining, and web mining. It is the process of converting
unstructured text data into meaningful data for analysis in order to measure customer
opinions, product reviews, and feedback, and to provide search facilities, sentiment
analysis, and entity modelling that support fact-based decision making.
TEXT MINING is primarily focused on discovering new and useful knowledge from textual
data sources. It is the semiautomated process of identifying and extracting facts,
relationships, and patterns (useful information and knowledge) from large amounts of
unstructured data that would otherwise remain buried in the mass of textual big data.
Natural language processing (NLP) is an important component of text mining and is a
subfield of artificial intelligence and computational linguistics. It studies the problem of
understanding the natural human language with the view of converting depictions of
human language into more formal representations that are easier for computer programs
to manipulate.

Challenges associated with the implementation of NLP:
►part-of-speech tagging – It is difficult to mark up terms in a text as corresponding to a
particular part of speech (such as nouns, verbs, adjectives, or adverbs) because the part
of speech depends not only on the definition of the term but also on the context within
which it is used.
►text segmentation – Some written languages, such as Chinese, Japanese, and Thai, do
not have single-word boundaries, so the text-parsing task requires the identification of
word boundaries, which is often difficult. Similar challenges in speech segmentation
emerge when analysing spoken language because sounds representing successive letters
and words blend into each other.
►word sense disambiguation – Many words have more than one meaning. Selecting the
meaning that makes the most sense can only be accomplished by taking into account
the context within which the word is used.
►syntactic ambiguity – The grammar for natural languages is ambiguous because
multiple possible sentence structures often need to be considered. Choosing the most
appropriate structure usually requires a fusion of semantic and contextual information.
►imperfect or irregular input – Foreign or regional accents and vocal impediments in
speech and typographical or grammatical errors in texts make the processing of the
language an even more difficult task.
►speech acts – A sentence can often be considered an action by the speaker. The
sentence structure alone may not contain enough information to define this action. For
example, “Can you pass the class?” requests a simple yes/no answer, whereas “Can you
pass the salt?” is a request for a physical action to be performed.
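
The word sense disambiguation challenge above can be illustrated with a toy Python sketch
that picks the sense whose description shares the most words with the sentence, in the
spirit of the simplified Lesk approach. The two sense descriptions for "bank" are
hand-written assumptions, not taken from any real dictionary.

```python
# Hypothetical, hand-written sense descriptions for the word "bank".
SENSES = {
    "financial institution": "money deposit loan account cash interest",
    "river edge": "river water shore fishing mud stream",
}

def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose description shares the most words with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    overlap = {
        sense: len(context & set(description.split()))
        for sense, description in senses.items()
    }
    return max(overlap, key=overlap.get)

print(disambiguate("She opened an account at the bank to deposit cash."))
print(disambiguate("They sat on the bank of the river fishing."))
```

The same word receives a different sense in each sentence purely because of the
surrounding context, which is exactly why context has to be taken into account.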

TEXT MINING PROCESS:

Task 1: establish the corpus – The main purpose of the first task is to collect all the
documents related to the context being studied. Once collected, the text documents,
e-mails, web pages, short notes, recordings, XML files, etc. are transformed and organised
so that they are all in the same representational form for computer processing.
Task 2: create the term-document matrix – In this task, the digitised and organised
documents (the corpus) are used to create the term-document matrix (TDM), where rows
represent the documents and columns represent the terms. The goal is to convert the list
of organised documents into a TDM where the cells are filled with the most appropriate
indices.
Task 3: extract the knowledge – Novel patterns are extracted in the context of the
specific problem being addressed.
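
Tasks 1 and 2 can be sketched in a few lines of Python. This is an illustration under
stated assumptions: the three-document corpus is invented, build_tdm is a hypothetical
helper, and the cells hold raw term counts, which is only the simplest choice of index
(binary flags or weighted frequencies such as TF-IDF are common alternatives).

```python
import re

def build_tdm(corpus):
    """Return (terms, matrix) with one row per document and one column per term."""
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in corpus]
    terms = sorted({token for doc in tokenised for token in doc})
    # Each cell holds a raw count of the term in the document (assumed index).
    matrix = [[doc.count(term) for term in terms] for doc in tokenised]
    return terms, matrix

# Task 1: a tiny, invented corpus already in one representational form (plain strings).
corpus = [
    "Project risk and project cost",
    "Software cost estimation",
    "Risk management in software",
]

# Task 2: create the term-document matrix.
terms, tdm = build_tdm(corpus)
print(terms)
for row in tdm:
    print(row)
```

Task 3 would then apply a mining method (for example classification or clustering) to the
rows of this matrix. A real corpus would normally also have stop words such as "and" and
"in" removed before the matrix is built.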

‘Text mining’ is the same as ‘data mining’ in that it has the same purpose and uses the
same processes, but with ‘text mining’ the input to the process is a collection of
unstructured data files.

DATA MINING is the process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data stored in structured databases, where the data are
organised in records structured by categorical, ordinal, or continuous variables.

Structured data is data that is standardized into a tabular format with numerous rows and
columns, making it easier to store and process for analysis and machine learning
algorithms. Structured data can include inputs such as names, addresses, and phone
numbers.

Unstructured data is data that does not have a predefined data format. It can include
text from sources like social media or product reviews, or rich media formats like video
and audio files.
LO 3 Describe sentiment analysis.

SENTIMENT refers to a settled opinion reflective of one’s feelings. It is a view or opinion that
is held or expressed.
SENTIMENT ANALYSIS deals with the automatic extraction of opinions, feelings, and
subjectivity in text. Sentiment analysis is the process of computationally identifying and
categorising opinions expressed in a piece of text, especially in order to determine
whether the writer's attitude towards a particular topic, product, etc. is positive, negative,
or neutral.

Sentiment analysis is often used by businesses to detect sentiment in social data, gauge
brand reputation, and understand customers.
Sentiment analysis applications:
►voice of the customer (VOC) – Sentiment analysis can access a company’s product
and service reviews to better understand and better manage customer complaints and
praises.
►voice of the market (VOM) – Sentiment analysis can help understand aggregate
opinions and trends of stakeholders to help companies with competitive intelligence and
product development and positioning.
►voice of the employee (VOE) – Sentiment analysis uses rich, opinionated textual data in
an effective and efficient way to listen to what employees are saying. Happy employees
empower customer experience efforts and improve customer satisfaction.
►brand management – Sentiment analysis helps shape perceptions rather than just
managing experiences.
►financial markets – Sentiment analysis of social media, news, blogs, and discussion
groups can be used to gauge and anticipate market movements.
►politics – Sentiment analysis can help understand what voters are thinking and can
clarify a candidate’s position on issues. It can also help political organisations, campaigns,
and news analysts to better understand which issues and positions matter the most to
voters.
►government intelligence – Sentiment analysis can allow the automatic analysis of the
opinions that people submit about pending policy or government-regulation proposals.

Sentiment analysis process:
(1) sentiment detection / O-S (objective-subjective) polarity calculation – This step aims
to differentiate between a fact and an opinion, which may be viewed as classification of
text as objective or subjective. If the objectivity value is close to 1, then there is no
opinion and the process goes back and grabs the next text data to analyse.

(2) N-P (negative-positive) polarity classification – This step takes the opinionated
piece of text and classifies the opinion as falling under one of two opposing sentiment
polarities (positive or negative), or locates its position on the continuum between these
two polarities. This step also involves identifying the strength of the sentiment (mildly,
moderately, strongly, or very strongly). This classification may need to be done at
several levels: term, phrase, sentence, and document level.

(3) target identification – This step aims to accurately identify the target of the expressed
sentiment, such as a person, product, or event.

(4) collection and aggregation – Once the sentiments of all text data points in the
document are identified and calculated, they are aggregated and converted to a single
sentiment measure for the whole document.
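
The steps above can be sketched with a tiny lexicon-based example in Python. Everything
here is an assumption made for illustration: the word scores in LEXICON are invented,
"contains no opinion word" is a crude stand-in for a real objectivity score, and step 3
(target identification) is left out entirely.

```python
# Hypothetical mini-lexicon: word -> polarity score between -1 and 1.
LEXICON = {
    "great": 0.8, "love": 0.9, "good": 0.5,
    "slow": -0.4, "terrible": -0.9, "broken": -0.7,
}

def sentence_polarity(sentence):
    """Steps 1 and 2: return None for objective text, else the mean polarity."""
    words = sentence.lower().replace(",", " ").split()
    scores = [LEXICON[word] for word in words if word in LEXICON]
    if not scores:
        # No opinion words found: treat the sentence as a fact and skip it.
        return None
    return sum(scores) / len(scores)

def document_sentiment(text):
    """Step 4: aggregate sentence scores into one measure for the document."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    scores = [p for p in map(sentence_polarity, sentences) if p is not None]
    return sum(scores) / len(scores) if scores else 0.0

# Invented review for illustration.
review = "The phone arrived on Monday. The screen is great, I love it. The charger is broken."
score = document_sentiment(review)
print(round(score, 3))
```

The first sentence is skipped as objective, the second scores positive, the third scores
negative, and the document-level measure is the average of the two opinionated sentences.
A score near zero therefore signals mixed sentiment rather than no sentiment.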
REFERENCES – the above summary is made using the following textbook:
R. Sharda, et al. 2018. Business Intelligence, Analytics, and Data Science: A Managerial
Perspective. Fourth Edition. Pearson.
PLEASE NOTE: I am selling the service provided in summarising this chapter and not the
intellectual property provided.
