Module 6 - Social Media Analytics and Text Mining.
Stages of the Text Mining Process
• Text preprocessing―This involves identifying all the unique words in a
document. Non-informative words (stop words), such as the, and, or,
and when, are filtered out of the document text before word stemming
is applied. Word stemming is the process of reducing inflected or
derived words to their stem, or base form. For example, words such as
cat, cats, catlike, and catty are all mapped to the same stem, ‘cat’.
Programs that perform stemming are called stemmers or stemming
algorithms; the two terms are used interchangeably. Affix stemmers
trim both prefixes and suffixes, such as the suffixes ed, ly, and ing,
from a given word. Popular stemmers include the Brute Force
algorithm and the Suffix Stripping algorithm.
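The preprocessing stage described above can be sketched in a few lines of Python. This is only a toy illustration, not a production stemmer: the stop-word list, the suffix list, and the function names strip_suffix and preprocess are assumptions made for this example, and real suffix-stripping stemmers apply many more rules.

```python
# Hypothetical stop-word and suffix lists, chosen only for illustration.
STOP_WORDS = {"the", "and", "or", "when", "a", "of"}
SUFFIXES = ("like", "ing", "ed", "ly", "ty", "s")  # longer suffixes are tried first


def strip_suffix(word):
    """Remove the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, then stem each word."""
    tokens = [t.strip(".,;:!?\"'") for t in text.lower().split()]
    return [strip_suffix(t) for t in tokens if t and t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The cats and the catlike dog."))
```

With this toy suffix list, cat, cats, catlike, and catty all reduce to the same stem, which mirrors the example on the slide.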
• Document representation―A document is represented by the words and
terms it contains.
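One common way to represent a document by its words and terms is a bag-of-words count. The sketch below assumes simple whitespace tokenization; the function names bag_of_words and to_vector are hypothetical and used only for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase term occurs in the text."""
    return Counter(text.lower().split())


def to_vector(counts, vocabulary):
    """Turn term counts into a fixed-order list of numbers for a given vocabulary."""
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    counts = bag_of_words("text mining finds patterns in text")
    vocabulary = sorted(counts)
    print(vocabulary)
    print(to_vector(counts, vocabulary))
```

Representing every document over the same vocabulary makes documents directly comparable, which the retrieval and clustering stages rely on.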
• Document retrieval―This involves retrieving documents that match a given
query. Text indexing and accuracy measures are used to keep the results accurate.
Text indexing and searching capabilities can be incorporated in an application
using Lucene, which is a Java library.
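The idea behind text indexing can be illustrated with a small inverted index. The sketch below is plain Python and does not use or imitate the Lucene API; the functions build_index and search and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "text mining tools", 2: "sentiment analysis of text", 3: "web analysis"}
    index = build_index(docs)
    print(search(index, "text analysis"))
```

Because the index is built once, a query only has to look up its own terms instead of scanning every document.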
• Document clustering―This involves grouping conceptually related
documents to ensure fast retrieval. A query term can be found
faster in well-clustered documents. Document clustering can be
implemented using the following techniques:
• Hierarchical clustering
• One-pass clustering
• Buckshot clustering
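As an illustration of one of the techniques listed above, the following is a minimal sketch of one-pass clustering. It assumes bag-of-words vectors, cosine similarity, and a similarity threshold of 0.3; these choices and the function names are assumptions for this example, not part of the slide.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def one_pass_cluster(texts, threshold=0.3):
    """Assign each document, in order, to the most similar cluster or start a new one."""
    clusters = []  # each cluster holds a summed term vector and member indices
    for i, text in enumerate(texts):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # summed counts act as the cluster centroid
        else:
            clusters.append({"centroid": vec, "members": [i]})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    texts = ["text mining tools", "text mining software",
             "sentiment of tweets", "sentiment of reviews"]
    print(one_pass_cluster(texts))
```

Each document is read exactly once, so the result depends on the order of the documents and on the chosen threshold.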
Text Mining Process
• Text mining involves both structured and unstructured data.
Unstructured data comes from sources such as reviews and summaries,
while structured data is obtained from organized spreadsheets.
• Text mining tools identify themes, patterns, and insights hidden in
structured as well as unstructured data. Organizations employ various
text mining software packages for different data mining applications.
Text Mining Software
• The following are some commonly used text mining software tools:
• R―Used for statistical data analysis, text processing, and sentiment
analysis
• ActivePoint―Applied for natural language processing and online
catalog-based contextual search
• Attensity―Used for extraction of facts including who, what, where,
and why and then identifying people, places, and events and how
they are related
• Crossminder―Applied for cross-lingual text analytics
• Compare Suite―Used for comparing texts by keywords and
highlighting common and unique keywords
• IBM SPSS Predictive Analytics Suite―Applied for data
and text mining
• Monarch―Applied for analysis and transformation of reports into
live data
• SAS Text Miner―Provides a rich suite of text processing and
analysis tools
• Textalyzer―Used for online text analysis
Apart from these, some other text mining
Sentiment Analysis
• Sentiment analysis is one of the most important components of text mining. Also
called opinion mining, it involves the careful analysis of people’s opinions,
sentiments, attitudes, appraisals, and evaluations.
• This is accomplished by examining large amounts of unstructured data obtained
from the Internet and classifying it by the positive, negative, or neutral view of the end user.
• Sentiment analysis involves the analysis of the following kinds of sentences:
• Facts―Product A is better than product B.
• Opinions―I don’t like A. I think B is better in terms of durability.
• Similar to Web analysis, specific queries are applied in sentiment analysis to
retrieve and rank relevant content.
• However, sentiment analysis also differs from Web analysis in certain respects.
Sentiment analysis can determine whether the content expresses an opinion on
the topic, and also whether that opinion is positive or negative.
• Ranking in Web analysis is done on the basis of the frequency of keywords.
• On the other hand, ranking in sentiment analysis is done on the basis of polarity of
the attitude.
• With the widespread use of Web 2.0 technologies, a huge volume of opinionated
data is available on social media.
• People using social media post reviews and comments about the products they use
and also share their feedback, opinions, and experiences with others in their
network.
• These reviews and feedback are utilized by organizations to improve and upgrade
their products and services, and enhance their brand equity.
• Sentiment analysis draws on other domains such as linguistics, digital technologies,
text analysis tools, artificial intelligence, and Natural Language Processing (NLP) for
the identification and extraction of useful information.
• Sentiment analysis in turn greatly influences various domains ranging from
politics and science to social science.
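The difference between ranking by keyword frequency (Web analysis) and ranking by polarity (sentiment analysis) can be sketched as follows. The polarity lexicon, the sample reviews, and the function names are hypothetical and serve only to illustrate the contrast.

```python
# Hypothetical polarity lexicon: +1 for positive words, -1 for negative words.
POLARITY = {"good": 1, "great": 1, "durable": 1, "bad": -1, "poor": -1, "broken": -1}


def keyword_frequency(text, keyword):
    """Number of times the keyword occurs in the text."""
    return text.lower().split().count(keyword.lower())


def polarity_score(text):
    """Sum of the polarity values of the words in the text."""
    return sum(POLARITY.get(token, 0) for token in text.lower().split())


def rank(texts, key):
    """Order texts from the highest to the lowest score."""
    return sorted(texts, key=key, reverse=True)


if __name__ == "__main__":
    reviews = ["camera camera camera bad poor", "great camera", "camera good durable great"]
    print(rank(reviews, lambda t: keyword_frequency(t, "camera")))
    print(rank(reviews, polarity_score))
```

The review that mentions the keyword most often ranks first by frequency but last by polarity, which is the distinction the slide draws.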
• The process of sentiment analysis begins by tagging words with Part-of-Speech
(POS) and phrase labels such as subject, verb phrase, verb, noun phrase, determiner,
and preposition.
• Defined patterns are then filtered to identify their sentiment orientation. For
example, ‘beautiful room’ has an adjective followed by a noun.
• The adjective ‘beautiful’ indicates a positive perspective about the noun ‘room’.
• At this stage, the emotional factor in the phrase is also examined and analyzed.
After that, the average sentiment orientation of all the phrases is computed and
analyzed to conclude whether a user recommends the product.
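The steps above (tagging, pattern filtering, and averaging the orientation) can be sketched in Python. To keep the example self-contained, the input is assumed to be already tagged by hand with the labels "ADJ" and "NOUN"; the orientation lexicon and the function names are hypothetical.

```python
# Hypothetical orientation lexicon for adjectives: +1 positive, -1 negative.
ORIENTATION = {"beautiful": 1.0, "clean": 1.0, "noisy": -1.0, "dirty": -1.0}


def adjective_noun_phrases(tagged):
    """Return (adjective, noun) pairs where an adjective is directly followed by a noun."""
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "ADJ" and t2 == "NOUN"
    ]


def average_orientation(tagged):
    """Average the orientation of all adjective-noun phrases; 0.0 if there are none."""
    phrases = adjective_noun_phrases(tagged)
    if not phrases:
        return 0.0
    return sum(ORIENTATION.get(adj, 0.0) for adj, _ in phrases) / len(phrases)


if __name__ == "__main__":
    tagged = [("beautiful", "ADJ"), ("room", "NOUN"), ("but", "CONJ"),
              ("noisy", "ADJ"), ("street", "NOUN"), ("and", "CONJ"),
              ("clean", "ADJ"), ("bathroom", "NOUN")]
    score = average_orientation(tagged)
    print("recommended" if score > 0 else "not recommended")
```

Treating an average above zero as a recommendation is an assumption of this sketch; a real system would choose the cutoff from labeled data.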
• The following parameters may be applied to classify a given text during
sentiment analysis:
• Polarity, which can be positive, negative, or neutral
• Emotional states, which can be sad, angry, or happy
• Scaling system or numeric values
• Subjectivity or objectivity
• Features based on key entities, such as the durability of furniture, the screen
size of a cell phone, and the lens quality of a camera
Automated sentiment
Online Tools for Sentiment Analysis
• Topsy―This tool is used to measure the success of a website on Twitter. It tracks
the occurrence of given and related keywords, the website name, and the website
URL in tweets.
• BackTweets―This tool is applied to improve the search engine ranking of a website. It
tracks tweets that link back to a website.
• Twitterfall―It locates tweets that are important for a website. It can be used to
stay in touch with the customers and consumers, and respond to their queries and
suggestions in real time.
• TweetBeep―This tool is used to send timely updates or alerts for topics of
interest.
• Reachli―Designed especially for Pinterest, a content-sharing website, this tool
helps in tracking data and in scheduling and organizing pins (the updates posted
on Pinterest) in advance.