Chapter 5 Predictive Analytics II: Text, Web, and Social Media Analytics
Many organisations are realising the importance of extracting knowledge from their
document-based data repositories through the use of text mining tools. Text mining
benefits are obvious in the areas where very large amounts of textual data are being
generated, such as law (court orders), academic research (research articles), finance
(quarterly reports), medicine (discharge summaries), biology (molecular interactions),
technology (patent files), and marketing (customer comments).
LO 2 Differentiate among text analytics, text mining, and data mining.
Text analytics and text mining address the same problem (automatically analysing raw
text data) but use different techniques. Text mining identifies relevant information
within a text and therefore provides more qualitative results. Text analytics,
however, focuses on finding patterns and trends across large sets of data, resulting in
more quantitative results. Text analytics is usually used to create graphs, tables, and
other sorts of visual reports.
Task 1: establish the corpus – The main purpose of the first task is to collect all the
documents related to the context being studied. Once collected, the text documents,
e-mails, web pages, short notes, recordings, XML files, etc. are transformed and organised
so that they are all in the same representational form for computer processing.
Task 2: create the term-document matrix (TDM) – In this task, the digitised and organised
documents (the corpus) are used to create the TDM, where rows represent the documents
and columns represent the terms. The goal is to convert the list of organised documents
into a TDM where the cells are filled with the most appropriate indices.
Task 3: extract the knowledge – Novel patterns are extracted in the context of the
specific problem being addressed.
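As a rough illustration of Task 2, the sketch below builds a small TDM with rows as documents and columns as terms. The three-document corpus, the stop-word set, and the function names are made up for this example, and raw term counts are only one possible choice of cell index (the chapter summary does not prescribe one).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_tdm(corpus, stop_words=frozenset()):
    """Return (terms, matrix) with one row per document and one column per term.

    Cells hold raw term counts, an assumption made for this sketch.
    """
    counts = [
        Counter(tok for tok in tokenize(doc) if tok not in stop_words)
        for doc in corpus
    ]
    # Column order: every distinct term in the corpus, sorted alphabetically.
    terms = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in terms] for doc_counts in counts]
    return terms, matrix


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "The court issued the order.",
    "Customer comments praise the product.",
    "The product order was delayed.",
]
terms, tdm = build_tdm(corpus, stop_words={"the", "was"})

print(terms)
for row in tdm:
    print(row)
```

Each printed row corresponds to one document, and each position in the row corresponds to the term at the same position in `terms`.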
DATA MINING is the process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data stored in structured databases, where the data are
organised in records structured by categorical, ordinal, or continuous variables.
Structured data is data that is standardized into a tabular format with numerous rows and
columns, making it easier to store and process for analysis and machine learning
algorithms. Structured data can include inputs such as names, addresses, and phone
numbers.
Unstructured data is data that does not have a predefined data format. It can include
text from sources like social media or product reviews, or rich media formats like video
and audio files.
LO 3 Describe sentiment analysis.
SENTIMENT refers to a settled opinion reflective of one’s feelings. It is a view or opinion that
is held or expressed.
SENTIMENT ANALYSIS deals with the automatic extraction of opinions, feelings, and
subjectivity in text. Sentiment analysis is the process of computationally identifying and
categorising opinions expressed in a piece of text, especially in order to determine
whether the writer's attitude towards a particular topic, product, etc. is positive, negative,
or neutral.
Sentiment analysis is often used by businesses to detect sentiment in social data, gauge
brand reputation, and understand customers.
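To make the positive/negative/neutral categorisation concrete, here is a minimal lexicon-based sketch. The word lists and the function name `classify_polarity` are hypothetical and chosen only for illustration; practical systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def classify_polarity(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(tok in POSITIVE for tok in tokens) - sum(
        tok in NEGATIVE for tok in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_polarity("I love this product, the battery is great"))
print(classify_polarity("Terrible support and slow delivery"))
print(classify_polarity("The parcel arrived on Tuesday"))
```

A simple word count like this ignores negation and sarcasm, so it should be read as a starting point rather than a reliable classifier.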
Sentiment analysis applications:
►voice of the customer (VOC) – Sentiment analysis can access a company’s product
and service reviews to better understand and better manage customer complaints and
praises.
►voice of the market (VOM) – Sentiment analysis can help understand aggregate
opinions and trends of stakeholders to help companies with competitive intelligence and
product development and positioning.
►voice of the employee (VOE) – Sentiment analysis uses rich, opinionated textual data in
an effective and efficient way to listen to what employees are saying. Happy employees
empower customer experience efforts and improve customer satisfaction.
►brand management – Sentiment analysis helps shape perceptions rather than just
managing experiences.
►financial markets – Sentiment analysis can be used to gauge market movements using
social media, news, blogs, and discussion groups.
►politics – Sentiment analysis can help understand what voters are thinking and can
clarify a candidate’s position on issues. It can also help political organisations, campaigns,
and news analysts to better understand which issues and positions matter the most to
voters.
►government intelligence – Sentiment analysis can allow the automatic analysis of the
opinions that people submit about pending policy or government-regulation proposals.
(3) target identification – This step aims to accurately identify the target of the expressed
sentiment, such as a person, product, or event.
(4) collection and aggregation – Once the sentiments of all text data points in the
document are identified and calculated, they are aggregated and converted to a single
sentiment measure for the whole document.
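The aggregation step can be sketched as follows, assuming each text data point (for example, each sentence) has already been given a polarity score between -1 and 1. The simple average, the 0.1 threshold, and the function names are assumptions made for this example; the summary does not specify how the single measure is computed.

```python
def aggregate_sentiment(scores):
    """Combine per-sentence polarity scores into one document-level score.

    Uses a plain average, which is an assumption made for this sketch.
    """
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def label_document(score, threshold=0.1):
    """Map a document score to a label; the threshold is a hypothetical cut-off."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical per-sentence scores for one document.
sentence_scores = [1.0, -0.5, 0.5, 0.0]
doc_score = aggregate_sentiment(sentence_scores)
print(doc_score, label_document(doc_score))
```

Other aggregation choices, such as weighting sentences by length or by how strongly they relate to the identified target, would fit the same step.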
REFERENCES – the above summary is made using the following textbook:
R. Sharda, et al. 2018. Business Intelligence, Analytics, and Data Science: A Managerial
Perspective. Fourth Edition. Pearson.