
Section 2 Text Analytics and Text Mining Overview

The document discusses text analytics, text mining, and natural language processing (NLP). It defines text analytics and text mining, explaining that text analytics includes information retrieval while text mining focuses on discovering new knowledge. NLP is important for text mining as it converts unstructured text into structured representations. Popular text mining applications include information extraction, topic tracking, summarization, categorization, clustering, and question answering. The main steps in text mining are establishing a corpus, creating a term-document matrix, and extracting knowledge from the matrix.

Uploaded by

Noemer Orsolino

Section 2 Text Analytics and Text Mining Overview

 What is text analytics? How does it differ from text mining?

o Text analytics is a concept that includes

 information retrieval (e.g., searching and identifying relevant documents

for a given set of key terms) as well as

 information extraction,

 data mining, and

 Web mining.

o By contrast, text mining is primarily focused on discovering new and useful

knowledge from textual data sources.

o The overarching goal for both text analytics and text mining is to turn

unstructured textual data into actionable information through the application of

natural language processing (NLP) and analytics.

o However, text analytics is a broader term because of its inclusion of information

retrieval.

o You can think of text analytics as a combination of information retrieval plus text

mining.

 What is text mining? How does it differ from data mining?

o Text mining is the application of data mining to unstructured, or less structured, text

files.

o As the names indicate, text mining analyzes words, whereas data mining analyzes numeric

data.

 Why is the popularity of text mining as a BI tool increasing?


o The popularity of text mining as a BI tool is increasing because of the rapid growth in

text data and the availability of sophisticated BI tools.

o The benefits of text mining are obvious in the areas where very large amounts of

textual data are being generated, such as

 law (court orders),

 academic research (research articles),

 finance (quarterly reports),

 medicine (discharge summaries),

 biology (molecular interactions),

 technology (patent files), and

 marketing (customer comments).

 What are some popular application areas of text mining?

o Information extraction.

 Identification of key phrases and relationships within text by looking for

predefined sequences in text via pattern matching.

o Topic tracking.

 Based on a user profile and documents that a user views, text mining can

predict other documents of interest to the user.

o Summarization.

 Summarizing a document to save time on the part of the reader.

o Categorization.

 Identifying the main themes of a document and then placing the document

into a predefined set of categories based on those themes.


o Clustering.

 Grouping similar documents without having a predefined set of categories.

o Concept linking.

 Connects related documents by identifying their shared concepts and, by

doing so, helps users find information that they perhaps would not have

found using traditional search methods.

o Question answering.

 Finding the best answer to a given question through knowledge-driven

pattern matching.
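As a minimal sketch of information extraction by pattern matching, the snippet below looks for one predefined sequence ("X acquired Y" or "X bought Y") with a regular expression. The pattern, the function name extract_acquisitions, and the company names are hypothetical and chosen only for illustration; real systems use far richer patterns.

```python
import re

# Hypothetical predefined sequence: "<Buyer> acquired|bought <Target>".
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z][A-Za-z]+) (?:acquired|bought) (?P<target>[A-Z][A-Za-z]+)"
)

def extract_acquisitions(text):
    """Return (buyer, target) pairs found by the predefined pattern."""
    return [(m.group("buyer"), m.group("target")) for m in ACQUISITION.finditer(text)]

sample = "Last year Acme acquired Globex. Analysts say Initech bought Hooli in March."
pairs = extract_acquisitions(sample)
print(pairs)
```

Each match yields a key phrase pair plus the relationship implied by the pattern, which is the kind of structured output information extraction aims for.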

Section 3 Natural Language Processing (NLP)

 What is NLP?

o Natural language processing (NLP) is an important component of text mining and is a

subfield of artificial intelligence and computational linguistics.

o It studies the problem of "understanding" the natural human language, with the view

of converting depictions of human language (such as textual documents) into more

formal representations (in the form of numeric and symbolic data) that are easier for

computer programs to manipulate.

 How does NLP relate to text mining?

o Text mining uses natural language processing to induce structure into the text

collection and then uses data mining algorithms such as classification, clustering,

association, and sequence discovery to extract knowledge from it.

 What are some of the benefits and challenges of NLP?


 Benefits

o NLP moves beyond syntax-driven text manipulation (which is often called "word

counting") to a true understanding and processing of natural language that considers

grammatical and semantic constraints as well as the context.

o The challenges include:

 Part-of-speech tagging.

 It is difficult to mark up terms in a text as corresponding to a

particular part of speech because the part of speech depends not

only on the definition of the term but also on the context within

which it is used.

 Text segmentation.

 Some written languages, such as Chinese, Japanese, and Thai, do

not have single-word boundaries.

 Word sense disambiguation.

 Many words have more than one meaning.

 Selecting the meaning that makes the most sense can only be

accomplished by considering the context within which the word is

used.

 Syntactic ambiguity.

 The grammar for natural languages is ambiguous; that is, multiple

possible sentence structures often need to be considered.

 Choosing the most appropriate structure usually requires a fusion

of semantic and contextual information.


 Imperfect or irregular input.

 Foreign or regional accents and vocal impediments in speech and

typographical or grammatical errors in texts make the processing

of the language an even more difficult task.

 Speech acts.

 A sentence can often be considered an action by the speaker. The

sentence structure alone may not contain enough information to

define this action.

 What are the most common tasks addressed by NLP?

o Following are among the most popular tasks:

 Question answering

 Automatic summarization

 Natural language generation

 Natural language understanding

 Machine translation

 Foreign language reading

 Foreign language writing

 Speech recognition

 Text-to-speech

 Text proofing

 Optical character recognition

Section 4: Text Mining Applications

 List and briefly discuss some of the text mining applications in marketing.
o Text mining can be used to increase cross-selling and up-selling by analyzing the

unstructured data generated by call centers.

o Text mining has become invaluable for customer relationship management.

Companies can use text mining to analyze rich sets of unstructured text data,

combined with the relevant structured data extracted from organizational databases,

to predict customer perceptions and subsequent purchasing behavior.

 How can text mining be used in security and counterterrorism?

o In 2007, EUROPOL developed an integrated system capable of accessing, storing,

and analyzing vast amounts of structured and unstructured data sources in order to

track transnational organized crime.

o Another security-related application of text mining is in the area of deception

detection.

 What are some promising text mining applications in biomedicine?

o As in any other experimental approach, it is necessary to analyze the vast amount of

data in the context of previously known information about the biological entities

under study. The literature is a particularly valuable source of information for

experiment validation and interpretation. Therefore, the development of automated

text mining tools to assist in such interpretation is one of the main challenges in

current bioinformatics research.

Section 5: Text Mining Process

 What are the main steps in the text mining process?

o Text mining entails three tasks:

 Establish the Corpus:


 Collect and organize the domain-specific unstructured data

 Create the Term-Document Matrix:

 Introduce structure to the corpus

 Extract Knowledge:

 Discover novel patterns from the T-D matrix
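The first two tasks above can be sketched as follows: a tiny corpus is collected, then turned into a term-document matrix of raw counts (documents as rows, terms as columns). The stop-word list, the tokenizer, and the names tokenize and build_tdm are assumptions made for this illustration only.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}  # illustrative only

def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_tdm(corpus):
    """Return (terms, matrix); matrix[d][t] is the raw count of term t in document d."""
    counts = [Counter(tokenize(doc)) for doc in corpus]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

corpus = [
    "Text mining discovers knowledge in text",
    "Data mining analyzes numeric data",
]
terms, tdm = build_tdm(corpus)
print(terms)
for row in tdm:
    print(row)
```

Knowledge extraction methods then operate on the numeric matrix rather than on the original text.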

 What is the reason for normalizing word frequencies? What are the common methods for

normalizing word frequencies?

o The raw indices need to be normalized to have a more consistent TDM for further

analysis.

o Common methods are log frequencies, binary frequencies, and inverse document

frequencies.
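A minimal sketch of the three normalization methods, applied to a small matrix of raw counts (documents as rows, terms as columns). The exact formulas vary between tools; the ones below (1 + log(wf) for log frequencies, and that value multiplied by log(N / df) for inverse document frequencies) are one common formulation and should be read as an assumption, not as the only definition.

```python
import math

def log_frequency(wf):
    """Dampen raw counts: 1 + ln(wf) for wf > 0, else 0."""
    return 1 + math.log(wf) if wf > 0 else 0.0

def binary_frequency(wf):
    """Record only whether the term occurs in the document."""
    return 1 if wf > 0 else 0

def inverse_document_frequency(matrix):
    """Weight each log frequency by ln(N / df), where df is the number of documents containing the term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[t] > 0) for t in range(n_terms)]
    return [
        [(1 + math.log(wf)) * math.log(n_docs / df[t]) if wf > 0 else 0.0
         for t, wf in enumerate(row)]
        for row in matrix
    ]

raw = [[0, 0, 1, 1, 1, 0, 2],
       [1, 2, 0, 0, 1, 1, 0]]
weighted = inverse_document_frequency(raw)
```

Under this formulation a term that occurs in every document (the fifth column here) gets weight 0, because it does nothing to distinguish one document from another.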

 What is SVD? How is it used in text mining?

o Singular value decomposition (SVD), which is closely related to principal

components analysis, reduces the overall dimensionality of the input matrix (number

of input documents by number of extracted terms) to a lower dimensional space,

where each consecutive dimension represents the largest degree of variability

(between words and documents) possible.
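In standard notation (the symbols are not from the original notes and are introduced here only for illustration), let A be the m-by-n matrix of m documents by n terms. SVD factors it as

```latex
A = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0,
```

where the columns of U and V are orthonormal and r is the rank of A. Keeping only the k largest singular values gives the reduced representation

```latex
A_k = U_k \Sigma_k V_k^{\top}, \qquad k < r,
```

in which the rows of U_k \Sigma_k are the coordinates of the documents and the rows of V_k \Sigma_k are the coordinates of the terms in the k-dimensional space.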

 What are the main knowledge extraction methods from corpus?

o The main categories of knowledge extraction methods are

 classification,

 clustering,

 association, and

 trend analysis.
Section 6: Sentiment Analysis

 What is sentiment analysis? How does it relate to text mining?

o Sentiment analysis tries to answer the question, "What do people feel about a certain

topic?" by digging into opinions of many using a variety of automated tools. It is also

known as opinion mining, subjectivity analysis, and appraisal extraction.

o Sentiment analysis shares many characteristics and techniques with text mining.

However, unlike text mining, which categorizes text by conceptual taxonomies of

topics, sentiment classification generally deals with two classes (positive versus

negative), a range of polarity (e.g., star ratings for movies), or a range in strength of

opinion.

 What are the most popular application areas for sentiment analysis? Why?

o Customer relationship management (CRM) and customer experience management are

popular "voice of the customer (VOC)" applications. Other application areas include

"voice of the market (VOM)" and "voice of the employee (VOE)."

 What would be the expected benefits and beneficiaries of sentiment analysis in politics?

o Opinions matter a great deal in politics. Because political discussions are dominated

by quotes, sarcasm, and complex references to persons, organizations, and ideas,

politics is one of the most difficult, and potentially fruitful, areas for sentiment

analysis. By analyzing the sentiment on election forums, one may predict who is

more likely to win or lose. Sentiment analysis can help understand what voters are

thinking and can clarify a candidate's position on issues. Sentiment analysis can help

political organizations, campaigns, and news analysts to better understand which

issues and positions matter the most to voters. The technology was successfully
applied by both parties to the 2008 and 2012 American presidential election

campaigns.

 What are the main steps in carrying out sentiment analysis projects?

o The first step when performing sentiment analysis of a text document is called

sentiment detection, during which text data is differentiated between fact and opinion

(objective vs. subjective).

o This is followed by negative-positive (N-P) polarity classification, where a subjective

text item is classified on a bipolar range.

o Following this comes target identification (identifying the person, product, event, etc.

that the sentiment is about).

o Finally come collection and aggregation, in which the overall sentiment for the

document is calculated based on the calculations of sentiments of individual phrases

and words from the first three steps.
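The polarity classification and aggregation steps can be sketched with a small word-level lexicon. The lexicon entries, their ratings, the one-token negation rule, and the use of a simple mean for aggregation are all assumptions made for this illustration; production systems use much larger lexicons or trained models.

```python
import re

# Hypothetical lexicon: word -> rating on the N-P polarity scale in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.9, "good": 0.5, "slow": -0.4, "terrible": -0.9}
NEGATORS = {"not", "never", "no"}

def phrase_scores(text):
    """Score each lexicon word, flipping the sign when the previous token is a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            sign = -1 if i > 0 and tokens[i - 1] in NEGATORS else 1
            scores.append(sign * LEXICON[tok])
    return scores

def document_polarity(text):
    """Aggregate word-level scores into one document score (mean; 0.0 if no opinion words)."""
    scores = phrase_scores(text)
    return sum(scores) / len(scores) if scores else 0.0

review = "The screen is great but the battery is not good"
print(phrase_scores(review), document_polarity(review))
```

A document with no lexicon words scores 0.0 here, which loosely mirrors the sentiment detection step: text without opinion words is treated as objective.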

 What are the two common methods for polarity identification? Explain.

o Polarity identification can be done via a lexicon (as a reference library) or by using a

collection of training documents and inductive machine learning algorithms.

o The lexicon approach

 uses a catalog of words, their synonyms, and their meanings, combined

with numerical ratings indicating the position on the N-P polarity

associated with these words. In this way, affective, emotional, and

attitudinal phrases can be classified according to their degree of positivity

or negativity.

o By contrast, the training-document approach


 uses statistical analysis and machine learning algorithms, such as neural

networks, clustering approaches, and decision trees to ascertain the

sentiment for a new text document based on patterns from previous

"training" documents with assigned sentiment scores.

Section 7: Web Mining

 What are some of the main challenges the Web poses for knowledge discovery?

o The Web is too big for effective data mining.

o The Web is too complex.

o The Web is too dynamic.

o The Web is not specific to a domain.

o The Web has everything.

 What is Web mining? How does it differ from regular data mining or text mining?

o Web mining is the discovery and analysis of interesting and useful information from

the Web and about the Web, usually through Web-based tools.

o By comparison, text mining works on less structured data than regular data mining because it is based on words instead of numeric data.

 What are the three main areas of Web mining?

o The three main areas of Web mining are

 Web content mining,

 Web structure mining, and

 Web usage (or activity) mining.

 What is Web content mining? How can it be used for competitive advantage?

o Web content mining refers to the extraction of useful information from Web pages.
o The documents may be extracted in some machine-readable format so that automated

techniques can generate some information about the Web pages.

o Collecting and mining Web content can be used for competitive intelligence

(collecting intelligence about competitors' products, services, and customers), which

can give your organization a competitive advantage.

 What is Web structure mining? How does it differ from Web content mining?

o Web structure mining is the process of extracting useful information from the links

embedded in Web documents.

o By contrast, Web content mining involves analysis of the specific textual content of

web pages. So, Web structure mining is more related to navigation through a website,

whereas Web content mining is more related to text mining and the document

hierarchy of a particular web page.

Section 8: Search Engines

 What is a search engine? Why are they important for today's businesses?

o A search engine is a software program that searches for documents (Internet sites or

files) based on the keywords (individual words, multi-word terms, or a complete

sentence) that users have provided that have to do with the subject of their inquiry.

o This is the most prominent type of information retrieval system for finding relevant

content on the Web.

o Search engines have become the centerpiece of most Internet-based transactions and

other activities.
o Because people use them extensively to learn about products and services, it is very

important for companies to have prominent visibility on the Web; hence the major

effort of companies to enhance their search engine optimization (SEO).

 What is a web crawler? What is it used for? How does it work?

o A Web crawler (also called a spider or a Web spider) is a piece of software that

systematically browses (crawls through) the World Wide Web for the purpose of

finding and fetching Web pages.

o It starts with a list of "seed" URLs, goes to the pages of those URLs, and then follows

each page's hyperlinks, adding them to the search engine's database. Thus, the Web

crawler navigates through the Web in order to construct the database of websites.
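The crawl loop described above can be sketched as a breadth-first traversal. To keep the example runnable offline, the Web is replaced by a hypothetical in-memory link graph (FAKE_WEB), and fetch_links stands in for downloading a page and parsing its hyperlinks; a real crawler would also respect robots.txt, rate limits, and politeness rules.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: URL -> hyperlinks on that page.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}

def fetch_links(url):
    """Stand-in for fetching a page and extracting its hyperlinks."""
    return FAKE_WEB.get(url, [])

def crawl(seeds, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns pages in the order fetched."""
    frontier = deque(seeds)
    seen = set(seeds)
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        fetched.append(url)
        for link in fetch_links(url):
            if link not in seen:  # avoid re-queuing pages already discovered
                seen.add(link)
                frontier.append(link)
    return fetched

index = crawl(["http://a.example/"])
print(index)
```

The seen set is what keeps the crawler from looping forever on pages that link back to each other.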

 What is "search engine optimization"? Who benefits from it?

o Search engine optimization (SEO) is the intentional activity of affecting the visibility

of an e-commerce site or a website in a search engine's natural (unpaid or organic)

search results.

o It involves editing a page's content, HTML, metadata, and associated coding to both

increase its relevance to specific keywords and to remove barriers to the indexing

activities of search engines.

o In addition, SEO efforts include promoting a site to increase its number of inbound

links.

o SEO primarily benefits companies with e-commerce sites by making their pages

appear toward the top of search engine lists when users query.

 What things can help Web pages rank higher in the search engine results?

o Cross-linking between pages of the same website


 to provide more links to the most important pages may improve its

visibility.

o Writing content that includes frequently searched keyword phrases,

 so as to be relevant to a wide variety of search queries, will tend to

increase traffic.

o Updating content

 so as to keep search engines crawling back frequently can give additional

weight to a site.

o Adding relevant keywords to a Web page's metadata, including the title tag and

metadescription,

 will tend to improve the relevancy of a site's search listings, thus

increasing traffic.

o URL normalization of Web pages

 that are accessible via multiple URLs, using canonical link elements and

redirects, can help make sure links to different versions of the URL all

count toward the page's link popularity score.
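A minimal sketch of URL normalization using Python's standard urllib.parse module. The specific rules (lowercase scheme and host, drop default ports, use "/" for an empty path, drop the fragment) are illustrative assumptions; they do not cover every case (for example, user credentials and IPv6 hosts are not handled), and the function name normalize_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Map common variants of a URL to one canonical string (illustrative rules only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    default_ports = {"http": 80, "https": 443}
    if parts.port and parts.port != default_ports.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop the fragment; it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))

variants = ["HTTP://Example.COM:80", "http://example.com/#top", "http://example.com/"]
canonical = {normalize_url(u) for u in variants}
print(canonical)
```

When all variants collapse to one string, inbound links to any of them can be credited to the same page.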

Section 9: Web Usage Mining (Web Analytics)

 What are the three types of data generated through Web page visits?

o Automatically generated data stored in server access logs, referrer logs, agent logs,

and client-side cookies

o User profiles

o Metadata, such as page attributes, content attributes, and usage data

 What is clickstream analysis? What is it used for?


o Analysis of the information collected by Web servers can help us better understand

user behavior. Analysis of this data is often called clickstream analysis.

o By using the data and text mining techniques, a company might be able to discern

interesting patterns from the clickstreams.
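As a small sketch of clickstream analysis, the snippet below recovers click paths, page views, and the most common page-to-page transitions from a toy log. The record layout (visitor id, timestamp in seconds, page) and all function names are hypothetical; real server logs need parsing and session splitting first.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed access-log records: (visitor_id, timestamp_seconds, page).
LOG = [
    ("v1", 0, "/home"), ("v1", 30, "/products"), ("v1", 95, "/cart"),
    ("v2", 10, "/home"), ("v2", 50, "/products"),
    ("v3", 20, "/home"),
]

def click_paths(log):
    """Group records by visitor and order them by time to recover each click path."""
    paths = defaultdict(list)
    for visitor, _, page in sorted(log, key=lambda r: (r[0], r[1])):
        paths[visitor].append(page)
    return dict(paths)

def page_views(log):
    """Count how many times each page was requested."""
    return Counter(page for _, _, page in log)

def transitions(paths):
    """Count page-to-page transitions across all click paths."""
    return Counter(pair for p in paths.values() for pair in zip(p, p[1:]))

paths = click_paths(LOG)
views = page_views(LOG)
moves = transitions(paths)
print(paths, views.most_common(1), moves.most_common(1))
```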

 What are the main applications of Web mining?

o Determine the lifetime value of clients.

o Design cross-marketing strategies across products.

o Evaluate promotional campaigns.

o Target electronic ads and coupons at user groups based on user access patterns.

o Predict user behavior based on previously learned rules and users' profiles.

o Present dynamic information to users based on their interests and profiles.

 What are commonly used Web analytics metrics? What is the importance of metrics?

o There are four main categories of Web analytic metrics:

 Website usability:

 How were they using my website?

 These involve page views, time on site, downloads, click map, and

click paths.

 Traffic sources:

 Where did they come from?

 These include referral websites, search engines, direct, offline

campaigns, and online campaigns.

 Visitor profiles:

 What do my visitors look like?


 These include keywords, content groupings, geography, time of

day, and landing page profiles.

 Conversion statistics:

 What does all this mean for the business?

 Metrics include new visitors, returning visitors, leads,

sales/conversions, and abandonments.

o These metrics are important because they provide access to a lot of valuable

marketing data, which can be leveraged for better insights to grow your business and

better document your ROI. The insight and intelligence gained from Web analytics

can be used to effectively manage the marketing efforts of an organization and its

various products or services.

Section 10: Social Analytics

 What is meant by social analytics? Why is it an important business topic?

o From a philosophical perspective, social analytics focuses on a theoretical object

called a "socius," a kind of "commonness" that is neither a universal account nor a

communality shared by every member of a body. Thus, social analytics in this sense

attempts to articulate the differences between philosophy and sociology.

o From a BI perspective, social analytics involves "monitoring, analyzing, measuring

and interpreting digital interactions and relationships of people, topics, ideas and

content."

 In this perspective, social analytics involves mining the textual content

created in social media (e.g., sentiment analysis, natural language


processing) and analyzing socially established networks (e.g., influencer

identification, profiling, prediction).

 This is an important business topic because it helps companies gain insight

about existing and potential customers' current and future behaviors, and

about the likes and dislikes toward a firm's products and services.

 What is a social network? What is social network analysis?

o A social network is a social structure composed of individuals/people (or groups of

individuals or organizations) linked to one another with some type of

connections/relationships.

o Social network analysis (SNA) is the systematic examination of social networks.

Dating back to the 1950s, social network analysis is an interdisciplinary field that

emerged from social psychology, sociology, statistics, and graph (network) theory.
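One basic social network analysis measure is degree centrality, sketched below on a toy network. The people, the edge list, and the choice of degree centrality as the measure are assumptions made for illustration; SNA offers many other measures.

```python
from collections import defaultdict

# Hypothetical undirected relationships between individuals.
EDGES = [("ann", "bob"), ("ann", "cy"), ("ann", "dee"), ("bob", "cy")]

def degree_centrality(edges):
    """Return each node's degree divided by (n - 1), the maximum possible degree."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1:
        return {}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

centrality = degree_centrality(EDGES)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality)
```

A node with high centrality is directly connected to many others, which is one simple signal used when looking for influential members of a network.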

 What is social media? How does it relate to Web 2.0?

o Social media refers to the enabling technologies of social interactions among people

in which they create, share, and exchange information, ideas, and opinions in virtual

communities and networks.

o It is a group of Internet-based software applications that build on the ideological and

technological foundations of Web 2.0, and that allow the creation and exchange of

user-generated content.

 What is social media analytics? What are the reasons behind its increasing popularity?

o Social media analytics refers to the systematic and scientific ways to consume the

vast amount of content created by Web-based social media outlets, tools, and
techniques for the betterment of an organization's competitiveness. Data includes

anything posted in a social media site.

o The increasing popularity of social media analytics stems largely from the similarly

increasing popularity of social media together with exponential growth in the

capacities of text and Web analytics technologies.

 How can you measure the impact of social media analytics?

o First, determine what your social media goals are. From here, you can use

analysis tools such as descriptive analytics, social network analysis, and advanced

analytics (predictive analytics and text analytics that examine content in online

conversations), and ultimately prescriptive analytics tools.


1) Section 2
a) What is text analytics? How does it differ from text mining?
i) Text analytics
(1) Turning unstructured text data into actionable information through applying
natural language processing and analytics. Text analytics is a broader concept that
includes the following
(a) Information retrieval
(b) Web mining
(c) Natural language processing
(d) Data mining
(i) Process of identifying relevant patterns in data stored in structured
databases
b) What is text mining? How does it differ from data mining?
i) Text mining
(1) Process of extracting patterns from large amounts of unstructured text sources.
Imposes structure on text sources then applies data mining techniques to extract
relevant information. Text mining is primarily focused on discovering new and
useful knowledge from the textual data sources.
c) Why is the popularity of text mining as an analytics tool increasing?
i) Areas with large amounts of text data immensely benefit from text mining
d) What are some of the most popular application areas of text mining?
i) Information extraction
(1) Identifying key phrases and relationships by looking for predefined sequences in text through pattern matching
ii) Topic tracking
(1) Predicts other documents of interest to a user based on the user profile and the documents the user views
iii) Summarization
iv) Categorization
(1) Identifies main themes of a doc then places doc into categories based on the
identified themes
v) Clustering
(1) Grouping similar docs without the need of predefined categories
vi) Concept linking
(1) Connects related documents with their shared concepts
vii) Question answering
(1) Finding best answer through knowledge driven pattern matching
2) Section 3
a) What is NLP?
i) Natural language processing (NLP) is an important component of text mining and is a
subfield of artificial intelligence and computational linguistics. It studies the problem
of “understanding” the natural human language, with the view of converting
depictions of human language (such as textual documents) into more formal
representations (in the form of numeric and symbolic data) that are easier for
computer programs to manipulate. The goal of NLP is to move beyond syntax-driven
text manipulation (which is often called “word counting”) to a true understanding and
processing of natural language that considers grammatical and semantic constraints as
well as the context.
b) How does NLP relate to text mining?
i) It is a component of text mining
c) What are some of the benefits and challenges of NLP?
i) Benefits
(1) Perform large-scale analysis
(2) Get a more objective and accurate analysis
(3) Streamline processes and reduce costs
(4) Improve customer satisfaction
(5) Better understand your market
(6) Empower your employees
(7) Gain real, actionable insights
ii) Challenges
(1) Part-of-speech tagging
(a) Hard to tell whether a word is a noun or a verb because its part of speech depends on context
(2) Text segmentation
(a) Some languages do not have single word boundaries
(3) Word sense disambiguation
(a) Words with more than one meaning require context
(4) Syntactic ambiguity
(a) Most appropriate structure requires both semantic and contextual information
(5) Imperfect or irregular input
(a) Accents and vocal impediments in speech, and typographical or grammatical errors in text, make input harder to process
(6) Speech acts
(a) A sentence can often be considered an action by the speaker, and the sentence structure
alone may not contain enough information to define that action
d) What are the most common tasks addressed by NLP?
i) Question answering
(1) Producing a human language answer when asked a human language question
through a prestructured database or collection of natural language docs
ii) Auto summary
(1) Shortened version of the original document that keeps its most important points
iii) Natural language generation
(1) Convert info to human language
iv) Natural language understanding
(1) Convert human language into representation easier for computers to understand
v) Machine translation
(1) Translate one human language to another
vi) Foreign language reading
(1) Helps a nonnative speaker read a foreign language
vii) Foreign language writing
(1) Helps a nonnative speaker write in a foreign language
viii) Speech recognition
(1) Text dictation. Converts spoken words to machine readable input
ix) Text to speech
(1) Converts text into human speech
x) Text proofing
(1) Detect and correct text errors
xi) Optical character recognition
(1) Translating images of text into editable docs
3) Section 4
a) List and briefly discuss some of the text mining applications in marketing.
i) Text mining can be used to increase cross-selling and up-selling by analyzing the
unstructured data generated by call centers. Text generated by call center notes as
well as transcriptions of voice conversations with customers can be analyzed by text
mining algorithms to extract novel, actionable information about customers’
perceptions toward a company’s products and services. In addition, blogs, user
reviews of products at independent Web sites, and discussion board postings are a
gold mine of customer sentiments. This rich collection of information, once properly
analyzed, can be used to increase satisfaction and the overall lifetime value of the
customer (Coussement & Van den Poel, 2008).
b) How can text mining be used in security and counterterrorism?
i) Another security-related application of text mining is in the area of deception
detection. Applying text mining to a large set of real-world criminal (person-of-
interest) statements, Fuller, Biros, and Delen (2008) developed prediction models to
differentiate deceptive statements from truthful ones. Using a rich set of cues
extracted from the textual statements, the model predicted the holdout samples with
70% accuracy, which is believed to be a significant success considering that the cues
are extracted only from textual statements (no verbal or visual cues are present).
Furthermore, compared to other deception-detection techniques, such as polygraph,
this method is nonintrusive and widely applicable to not only textual data, but also
(potentially) to transcriptions of voice recordings.
c) What are some promising text mining applications in biomedicine?
i) Chun et al. (2006) described a system that extracts disease–gene relationships from
literature accessed via MEDLINE. They constructed a dictionary for disease and gene
names from six public databases and extracted relation candidates by dictionary
matching. Because dictionary matching produces a large number of false positives,
they developed a method of machine-learning–based named entity recognition (NER)
to filter out false recognitions of disease/gene names. They found that the success of
relation extraction is heavily dependent on the performance of NER filtering and that
the filtering improved the precision of relation extraction by 26.7%, at the cost of a
small reduction in recall.
4) Section 5
a) What are the main steps in the text mining process?
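i) Establish the corpus: collect and organize the domain-specific unstructured data
ii) Create the term-document matrix: introduce structure to the corpus
iii) Extract knowledge: discover novel patterns from the term-document matrix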
b) What is the reason for normalizing word frequencies?
i) Increase consistency of term-document matrix (TDM)
c) What are the common methods for normalizing word frequencies?
i) log frequencies, binary frequencies, and inverse document frequencies, among others.
d) What is SVD? How is it used in text mining?
i) Singular value decomposition (SVD), which is closely related to principal
components analysis, reduces the overall dimensionality of the input matrix (number
of input documents by number of extracted terms) to a lower-dimensional space,
where each consecutive dimension represents the largest degree of variability
(between words and documents) possible (Manning & Schutze, 1999). Ideally, the
analyst might identify the two or three most salient dimensions that account for most
of the variability (differences) between the words and documents, thus identifying the
latent semantic space that organizes the words and documents in the analysis. Once
such dimensions are identified, the underlying “meaning” of what is contained
(discussed or described) in the documents has been extracted.
e) What are the main knowledge extraction methods from corpus?
i) Classification
(1) Given a set of categories and group of text documents, the documents are matched
with the correct category using models developed with a training data set that
includes both documents and categories
ii) Clustering
(1) Grouping an unlabeled collection into meaningful clusters without prior
knowledge
(2) Great for web content searches
(3) Improves search recall
(4) Improves search precision
(5) Most popular clustering methods
(a) Scatter/gather
(i) Dynamically generates table of contents for the collection and adapts and
modifies it in response to user selection
iii) Association
(1) Direct relationships between terms or sets of concepts
iv) Trend analysis
(1) Comparing two document collections from different points in time
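The normalization methods and the SVD idea from this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three sample documents, all function names, and the whitespace tokenizer are made up for the example; the inverse document frequency formula shown is one common variant; and the power iteration recovers only the first (largest) SVD dimension, where a real project would call a library SVD routine.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, rows); rows[d][t] is the raw count of term t in document d."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for t in terms] for c in counts]


def log_frequency(rows):
    """Dampen large counts: 1 + ln(tf) when tf > 0, otherwise 0."""
    return [[1 + math.log(tf) if tf > 0 else 0.0 for tf in row] for row in rows]


def binary_frequency(rows):
    """Record only whether a term occurs in a document (1) or not (0)."""
    return [[1 if tf > 0 else 0 for tf in row] for row in rows]


def inverse_document_frequency(rows):
    """One common variant: (1 + ln(tf)) * ln(N / df) when tf > 0, otherwise 0."""
    n_docs, n_terms = len(rows), len(rows[0])
    # df[j] = number of documents that contain term j
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    return [
        [(1 + math.log(tf)) * math.log(n_docs / df[j]) if tf > 0 else 0.0
         for j, tf in enumerate(row)]
        for row in rows
    ]


def leading_singular_triplet(rows, iterations=100):
    """Approximate the largest singular value and its vectors by power iteration.

    Assumes a non-empty, nonnegative, nonzero matrix (true for a weighted TDM).
    """
    n_terms = len(rows[0])
    v = [1.0 / math.sqrt(n_terms)] * n_terms
    for _ in range(iterations):
        av = [sum(a * x for a, x in zip(row, v)) for row in rows]        # A v
        atav = [sum(rows[i][j] * av[i] for i in range(len(rows)))        # A^T (A v)
                for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    av = [sum(a * x for a, x in zip(row, v)) for row in rows]
    sigma = math.sqrt(sum(x * x for x in av))
    return sigma, [x / sigma for x in av], v


# Hypothetical three-document corpus.
docs = [
    "text mining finds patterns in text",
    "data mining finds patterns in data",
    "web mining uses links",
]
terms, tdm = term_document_matrix(docs)
weighted = inverse_document_frequency(tdm)
sigma, doc_scores, term_loadings = leading_singular_triplet(weighted)
```

Note that "mining" occurs in every document, so its inverse document frequency weight is zero: a term that appears everywhere does not help tell documents apart.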
5) Section 6
a) What is sentiment analysis? How does it relate to text mining?
i) Sentiment analysis
(1) Technique used to detect favorable and unfavorable opinions toward specific
products and services using a large number of text sources.
ii) Text mining is one of the tools sentiment analysis uses to identify people’s opinions
b) What are the most popular application areas for sentiment analysis? Why?
i) Voice of the Customer
(1) Sentiment analysis can access a company’s product/service reviews to better
understand and better manage customer opinions
ii) Voice of the Market
(1) Understanding aggregate opinions and trends
(2) Helps companies with competitive intelligence and product development and
positioning
iii) Voice of the Employee
(1) Using rich, opinionated textual data is an effective and efficient way to listen to
what employees are saying
iv) Brand Management
(1) Sentiment analysis helps brand management move toward shaping perception
from managing experiences
v) Financial Markets
(1) Using sentiment analysis through social media, news, blogs, and discussion
groups to compute market movements
vi) Politics
(1) Predicting election results
(2) Helps understand what voters are thinking and clarify candidates’ position on
issues
(3) Helps political organizations understand which issues and positions matter most to voters
vii) Government Intelligence
(1) Allows automatic analysis of opinions that people submit about pending policy or
government-regulation proposals
(2) Monitoring spikes in negative sentiment
c) What would be the expected benefits and beneficiaries of sentiment analysis in politics?
i) As we all know, opinions matter a great deal in politics. Because political discussions
are dominated by quotes, sarcasm, and complex references to persons, organizations,
and ideas, politics is one of the most difficult, and potentially fruitful, areas for
sentiment analysis. By analyzing the sentiment on election forums, one may predict
who is more likely to win or lose. Sentiment analysis can help understand what voters
are thinking and can clarify a candidate’s position on issues. Sentiment analysis can
help political organizations, campaigns, and news analysts to better understand which
issues and positions matter the most to voters. The technology was successfully
applied by both parties to the 2008 and 2012 American presidential election
campaigns.
d) What are the main steps in carrying out sentiment analysis projects?
i) Step 1: Sentiment Detection
(1) Differentiating between fact and opinion
ii) Step 2: N-P Polarity Classification
(1) Grouping opinions in the spectrum of positive or negative
iii) Step 3: Target Identification
(1) Identifying the target of the expressed sentiment (person, product, event, etc.)
iv) Step 4: Collection & Aggregation
(1) All sentiments are aggregated and converted into a single measure of sentiment
e) What are the two common methods for polarity identification? Explain.
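i) Using a lexicon
(1) Words in the text are looked up in a catalog of terms with known polarity
ii) Using a collection of training documents
(1) A classifier is trained on documents already labeled with their polarity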
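The sentiment analysis steps and the lexicon method above can be sketched as follows. This is a rough illustration, not a production approach: the six-word lexicon and its scores are invented for the example, a sentence with no lexicon word is simply treated as fact (a crude stand-in for Step 1), and Step 3 (target identification) is left out entirely.

```python
import re

# Hypothetical toy lexicon: word -> polarity score between -1 and 1.
LEXICON = {
    "great": 1.0, "love": 1.0, "good": 0.5,
    "slow": -0.5, "bad": -0.5, "terrible": -1.0,
}


def sentence_polarity(sentence, lexicon=LEXICON):
    """Steps 1 and 2: return the mean score of the opinion words found,
    or None when the sentence contains no lexicon word (treated as fact)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else None


def document_sentiment(text, lexicon=LEXICON):
    """Step 4: aggregate sentence polarities into a single document measure."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    polarities = [sentence_polarity(s, lexicon) for s in sentences]
    opinions = [p for p in polarities if p is not None]
    return sum(opinions) / len(opinions) if opinions else 0.0


review = "The screen is great. It ships in a box. The battery is terrible and slow."
score = document_sentiment(review)
```

In the sample review, the first sentence scores positive, the second is skipped as factual, and the third scores negative, so the document ends up only slightly positive. The training-document method would replace the lexicon lookup with a classifier learned from labeled examples.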
6) Section 7
a) What are some of the main challenges the Web poses for knowledge discovery?
i) The Web is too big for effective data mining.
ii) The Web is too complex.
iii) The Web is too dynamic.
iv) The Web is not specific to a domain.
v) The Web has everything.
b) What is Web mining? How does it differ from regular data mining or text mining?
i) Web mining is the discovery and analysis of interesting and useful information from
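the Web and about the Web, usually through Web-based tools. Regular data mining works
mostly on structured, numeric data, and text mining works on unstructured text; Web mining
draws on page content as well as the Web's link structure and usage logs.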
c) What are the three main areas of Web mining?
i) The three main areas of Web mining are Web content mining, Web structure mining,
and Web usage (or activity) mining.
d) What is Web content mining? How can it be used for competitive advantage?
i) Web content mining refers to the extraction of useful information from Web pages.
The documents may be extracted in some machine-readable format so that automated
techniques can generate some information about the Web pages. Collecting and
mining Web content can be used for competitive intelligence (collecting intelligence
about competitors' products, services, and customers), which can give your
organization a competitive advantage.
e) What is Web structure mining? How does it differ from Web content mining?
i) Web structure mining is the process of extracting useful information from the links
embedded in Web documents. By contrast, Web content mining involves analysis of
the specific textual content of web pages. So, Web structure mining is more related to
navigation through a website, whereas Web content mining is more related to text
mining and the document hierarchy of a particular web page.
7) Section 8: Search Engines
a) What is a search engine? Why are they important for today’s businesses?
i) A search engine is a software program that searches for documents (Internet sites or
files) based on the keywords (individual words, multi-word terms, or a complete
sentence) that users have provided that have to do with the subject of their inquiry.
This is the most prominent type of information retrieval system for finding relevant
content on the Web. Search engines have become the centerpiece of most Internet-
based transactions and other activities. Because people use them extensively to learn
about products and services, it is very important for companies to have prominent
visibility on the Web, hence the major effort of companies to enhance their search
engine optimization (SEO).
b) What is a Web crawler? What is it used for? How does it work?
i) A Web crawler (also called a spider or a Web spider) is a piece of software that
systematically browses (crawls through) the World Wide Web for the purpose of
finding and fetching Web pages. It starts with a list of "seed" URLs, goes to the pages
of those URLs, and then follows each page's hyperlinks, adding them to the search
engine's database. Thus, the Web crawler navigates through the Web in order to
construct the database of websites.
c) What is “search engine optimization?” Who benefits from it?
i) Search engine optimization (SEO) is the intentional activity of affecting the visibility
of an e-commerce site or a website in a search engine's natural (unpaid or organic)
search results. It involves editing a page's content, HTML, metadata, and associated
coding to both increase its relevance to specific keywords and to remove barriers to
the indexing activities of search engines. In addition, SEO efforts include promoting a
site to increase its number of inbound links. SEO primarily benefits companies with
e-commerce sites by making their pages appear toward the top of search engine lists
when users query.
d) What things can help Web pages rank higher in the search engine results?
i) Cross-linking between pages of the same website to provide more links to the most
important pages may improve its visibility. Writing content that includes frequently
searched keyword phrases, so as to be relevant to a wide variety of search queries,
will tend to increase traffic. Updating content so as to keep search engines crawling
back frequently can give additional weight to a site. Adding relevant keywords to a
Web page's metadata, including the title tag and meta description, will tend to
improve the relevancy of a site's search listings, thus increasing traffic. For Web pages
that are accessible via multiple URLs, URL normalization using canonical link elements and
redirects can help make sure links to different versions of the URL all count toward the
page's link popularity score.
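The crawler behavior described in this section (start from seed URLs, fetch each page, follow its hyperlinks) can be sketched as a breadth-first traversal. To keep the example runnable without network access, the "Web" here is a made-up dictionary, and `fetch_links` is a hypothetical callable standing in for downloading a page and parsing out its links; a real crawler would also respect robots.txt, rate limits, and URL normalization.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages breadth-first from the seeds, following each link once."""
    frontier = deque(dict.fromkeys(seed_urls))  # de-duplicated, order preserved
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)                     # "fetch" the page
        for link in fetch_links(url):           # follow its hyperlinks
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory Web: page -> outgoing links.
TOY_WEB = {
    "a.example/": ["a.example/products", "b.example/"],
    "a.example/products": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

pages = crawl(["a.example/"], lambda url: TOY_WEB.get(url, []))
```

The `seen` set is what keeps the crawler from looping forever on pages that link back to each other, such as the home page and the products page in the toy data.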
8) Section 9: Web Usage Mining
a) What are the three types of data generated through Web page visits?
i) Automatically generated data stored in server access logs, referrer logs, agent logs,
and client-side cookies
ii) User profiles
iii) Metadata, such as page attributes, content attributes, and usage data.
b) What is clickstream analysis? What is it used for?
i) Analysis of the information collected by Web servers can help us better understand
user behavior. Analysis of this data is often called clickstream analysis. By applying
data and text mining techniques to it, a company might be able to discern interesting
patterns from the clickstreams.
c) What are the main applications of Web mining?
i) Determine the lifetime value of clients.
ii) Design cross-marketing strategies across products.
iii) Evaluate promotional campaigns.
iv) Target electronic ads and coupons at user groups based on user access patterns.
v) Predict user behavior based on previously learned rules and users' profiles.
vi) Present dynamic information to users based on their interests and profiles.
d) What are commonly used Web analytics metrics? What is the importance of metrics?
i) There are four main categories of Web analytic metrics:
(1) Website usability: How were they using my website? These involve pageviews,
time on site, downloads, click map, and click paths.
(2) Traffic sources: Where did they come from? These include referral websites,
search engines, direct, offline campaigns, and online campaigns.
(3) Visitor profiles: What do my visitors look like? These include keywords, content
groupings, geography, time of day, and landing page profiles.
(4) Conversion statistics: What does all this mean for the business? Metrics include
new visitors, returning visitors, leads, sales/conversions, and abandonments.
ii) These metrics are important because they provide access to a lot of valuable
marketing data, which can be leveraged for better insights to grow your business and
better document your ROI. The insight and intelligence gained from Web analytics
can be used to effectively manage the marketing efforts of an organization and its
various products or services.
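A few of the usability and conversion metrics above can be computed directly from clickstream records. The sketch below is illustrative only: the session IDs, page paths, and the choice of `/cart` and `/checkout-complete` as the cart and goal pages are all assumptions made for the example, and real Web analytics tools define these metrics in more detail.

```python
from collections import defaultdict

# Hypothetical clickstream records: (session_id, page), in time order.
CLICKS = [
    ("s1", "/home"), ("s1", "/product"), ("s1", "/cart"), ("s1", "/checkout-complete"),
    ("s2", "/home"), ("s2", "/product"), ("s2", "/cart"),
    ("s3", "/home"),
    ("s4", "/product"), ("s4", "/cart"), ("s4", "/checkout-complete"),
]


def summarize(clicks, cart_page="/cart", goal_page="/checkout-complete"):
    """Group clicks into sessions and compute simple usage and conversion metrics."""
    sessions = defaultdict(list)
    for session_id, page in clicks:
        sessions[session_id].append(page)
    if not sessions:
        raise ValueError("no clickstream records")

    with_cart = [pages for pages in sessions.values() if cart_page in pages]
    converted = [pages for pages in sessions.values() if goal_page in pages]
    abandoned = [pages for pages in with_cart if goal_page not in pages]
    return {
        "sessions": len(sessions),
        "pageviews_per_session": len(clicks) / len(sessions),
        "conversion_rate": len(converted) / len(sessions),
        "cart_abandonment_rate": len(abandoned) / len(with_cart) if with_cart else 0.0,
    }


metrics = summarize(CLICKS)
```

In the toy data, two of the four sessions reach the goal page, and one of the three sessions that reach the cart leaves without completing the order.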
9) Section 10: Social Analytics
a) What is meant by social analytics? Why is it an important business topic?
i) From a philosophical perspective, social analytics focuses on a theoretical object
called a "socius," a kind of "commonness" that is neither a universal account nor a
communality shared by every member of a body. Thus, social analytics in this sense
attempts to articulate the differences between philosophy and sociology. From a BI
perspective, social analytics involves "monitoring, analyzing, measuring and
interpreting digital interactions and relationships of people, topics, ideas and content."
In this perspective, social analytics involves mining the textual content created in
social media (e.g., sentiment analysis, natural language processing) and analyzing
socially established networks (e.g., influencer identification, profiling, prediction).
This is an important business topic because it helps companies gain insight about
existing and potential customers' current and future behaviors, and about the likes and
dislikes toward a firm's products and services.
b) What is a social network? What is the need for SNA?
i) A social network is a social structure composed of individuals/people (or groups of
individuals or organizations) linked to one another with some type of
connections/relationships. Social network analysis (SNA) is the systematic
examination of social networks. Dating back to the 1950s, social network analysis is
an interdisciplinary field that emerged from social psychology, sociology, statistics,
and graph (network) theory.
c) What is social media? How does it relate to Web 2.0?
i) Social media refers to the enabling technologies of social interactions among people
in which they create, share, and exchange information, ideas, and opinions in virtual
communities and networks. It is a group of Internet-based software applications that
build on the ideological and technological foundations of Web 2.0, and that allow the
creation and exchange of user-generated content.
d) What is social media analytics? What are the reasons behind its increasing popularity?
i) Social media analytics refers to the systematic and scientific ways to consume the
vast amount of content created by Web-based social media outlets, tools, and
techniques for the betterment of an organization's competitiveness. Data includes
anything posted in a social media site. The increasing popularity of social media
analytics stems largely from the similarly increasing popularity of social media
together with exponential growth in the capacities of text and Web analytics
technologies.
e) How can you measure the impact of social media analytics?
i) First, determine what your social media goals are. From here, you can use analysis
tools such as descriptive analytics, social network analysis, advanced analytics
(predictive analytics and text analytics that examine the content of online
conversations), and ultimately prescriptive analytics tools.
OPENING VIGNETTE: Machine versus Men on Jeopardy!: The Story of Watson
 What is Watson? What is special about it?

o IBM Research created Watson, an extraordinary computer system (a novel

combination of advanced hardware and software) designed to answer questions posed

in natural human language.

o Watson was capable of listening, understanding, responding, and winning in real time

on the Jeopardy! quiz show.

o Watson proved that computer systems can do things that require human creativity and

intelligence.

 What technologies were used in building Watson (both hardware and software)?

o Watson is built on the DeepQA framework. The hardware for this system involves a

massively parallel processing architecture. In terms of software, Watson uses a

variety of AI-related QA technologies, including text mining, natural language

processing, question classification and decomposition, automatic source acquisition

and evaluation, entity and relation detection, logical form generation, and knowledge

representation and reasoning.

 Why did IBM spend all that time and money to build Watson? Where is the ROI?

o IBM's goal was to advance computer science by exploring new ways for computer

technology to affect science, business, and society. If successful, this could give IBM

a distinct competitive advantage in this important technological application area

Application Case 5.1: Insurance Group Strengthens Risk Management with Text Mining
Solution
 How can text analytics and mining be used to keep up with changing business needs of

insurance companies?
o The purpose was to expand and automate the analysis of unstructured accident

reports, witness statements, and claim narratives for the automobile insurance

company.

 What were the challenges, the proposed solution, and the obtained results with Insurance

case?

o The largest challenge is the unstructured nature of the documents and their variability.

 Can you think of other uses of text analytics and text mining for insurance companies?

o There are many possible uses; this type of system could be applied in other
insurance areas. For example, it could be used to help evaluate the potential risks
involved in insuring personal property.

Application Case 5.2: AMC Networks Is Using Analytics to Capture New Viewers, Predict
Ratings, and Add Value for Advertisers in a Multichannel World
 What are the common challenges broadcasting companies are facing nowadays? How can

analytics help to alleviate these challenges?

o To remain competitive, television broadcasting companies need to develop new

content that appeals to their viewers. AMC developed original, hit shows such as

Breaking Bad, Better Call Saul, Mad Men, and The Walking Dead. Understanding

what type of content will be appealing requires the analysis of a large quantity and

variety of data from multiple sources. Analytic systems can help with this task by

speeding the evaluation, as well as aggregating information

 How did AMC leverage analytics to enhance their business performance?

o The company used analytic systems to help aggregate information across a wide

variety of platforms. After the information was aggregated, it was easier to evaluate
customer use trends, as well as to identify potential markets and submarkets that new

content could be produced for.

 What were the types of text analytics and text mining solutions developed by AMC Networks?

Can you think of other potential uses of text mining applications in the broadcasting

industry?

o An example of the type of analysis used would be to look at customer viewing trends,

and identify particular markets for types of content.

o This type of analysis may help the industry drive towards more personalization in

potential content that is offered.

Application Case 5.3: Mining for Lies


o Person-of-interest statements completed by people involved in crimes on military

bases were analyzed using text mining techniques to determine which statements

were truthful or deceptive. The study analyzed text-based testimonies of persons of

interest in crimes. The deception detection used only text-based features (cues) and

did NOT analyze the observed behavior of the witnesses during their testimony

 Why is it difficult to detect deception?

o Humans tend to perform poorly at deception-detection tasks. This phenomenon is

exacerbated in text-based communications.

 How can text/data mining be used to detect deception in text?

o Statements are first labeled as truthful or deceptive (e.g., by law enforcement
personnel); classification models are then trained and tested on cues quantified from
those statements.


o In the Mining for Lies case study, a text-based deception-detection method used by

the researchers was based on a process known as message feature mining, which

relies on elements of data and text mining techniques.

 What do you think are the main challenges for such an automated system with mining lies

case?

o One challenge is that the training system depends on humans to ascertain the

truthfulness of statements in the training data itself.

Application Case 5.4 Bringing the Customer into the Quality Equation: Lenovo Uses
Analytics to Rethink Its Redesign
 How did Lenovo use text analytics and text mining to improve quality and design of their

products and ultimately improve customer satisfaction?

o Lenovo is a leading computing product manufacturer and uses text mining to better

understand their current and potential customers' needs and wants related to product

quality and product design.

o Lenovo has been able to use text analytics and text mining to better understand

customer likes and dislikes across its product line.

o By using advanced systems, the company is able to identify, collect, and process a

large amount of diverse information to better understand customer feelings about

specific product lines, and Lenovo technology as a whole.

 What were the challenges, the proposed solution, and the obtained results for Lenovo?

o There are many challenges in this type of system.

 Chief among those challenges are the ability to identify sources of information (in this
case, reviews and comments from users) and the ability to analyze such a large and
diverse data set.


o By using analytic and text mining systems, Lenovo has been able to better

characterize user sentiment, and use this information to drive both customer

service and product development.

o These systems have been very successful, and there are plans to grow their use in

the company in the future.

Application Case 5.5 Research Literature Survey with Text Mining


 How can text mining be used to ease the insurmountable task of literature review?

o In the research literature case study, the researchers analyzing academic papers

extracted information from the paper abstract.

o Text mining enables a semiautomated analysis of large volumes of published

literature.

o Clustering was used in this study to identify the natural groupings of the articles, and

list the most descriptive terms that characterized those clusters.

o Use of text and data mining can thus speed up and simplify the literature review

process for academic researchers.

 What are the common outcomes of a text mining project on a specific collection of journal

articles? Can you think of other potential outcomes not mentioned in this case?

o Common outcomes include identifying natural clusters of similar articles, helping to

identify the optimal number of cluster classifications.

o Text mining also has other possible applications in literature reviews. For example,

sentiment analysis can help to identify positive and negative judgments. Text mining

can be used to build taxonomies of concepts and terms within and between research

articles. You can find common themes by author as well as by journal.


Application Case 5.6 Creating a Unique Digital Experience to Capture the Moments That
Matter at Wimbledon
 How did Wimbledon use analytics capabilities to enhance viewers' experience?

o Wimbledon used analytics to help improve the viewer experience by leveraging data

that was already available, but under-used.

o For example, the system analyzed data coming from all the matches in real time, and

flagged when important milestones or events occurred.

o In the Wimbledon case study, the tournament used data for each tennis match in real

time to highlight significant events happening at The Championships.

 What were the challenges, the proposed solution, and the obtained results on Wimbledon?

o One of the challenges that the tournament faced was providing services to viewers

across a wide variety of platforms.

 While the number of mobile users was growing, most users still used desktop
computers to access the tournament website.

o This meant that a hybrid solution needed to be undertaken, that provided the best

responsive viewing for mobile users, while integrating more in-depth and high-

resolution features for desktop users.

o In the Wimbledon case study, designers balanced the needs of mobile and desktop

computer users.

Application Case 5.7 Understanding Why Customers Abandon Shopping Carts Results in a
$10 Million Sales Increase
 How did Lotte.com use analytics to improve sales?
o Lotte.com is the leading Internet shopping mall in Korea and has developed its

integrated Web traffic analysis system using the SAS for Customer Experience

Analytics solution.

o This information enables Lotte.com to better understand customers and their behavior

online, and conduct sophisticated, cost-effective targeted marketing.

o It is false to assume that little can be done about visitor Web site abandonment rates.

 What were the challenges, the proposed solution, and the obtained results on Lotte.com?

o In the Lotte.com retail case, the company deployed SAS for Customer Experience

Analytics to better understand the quality of customer traffic on their Web site,

classify order rates, and see which channels had the most visitors. The results were
heightened customer loyalty and optimized channels.

 Do you think e-commerce companies are in better position to leverage benefits of analytics?

Why? How?

o To the degree that e-commerce companies integrate analytics into their systems, they

can take advantage of these technologies to make improvements in the user

experience on their e-commerce systems and thereby increase customer satisfaction.

Application Case 5.8 Tito’s Vodka Establishes Brand Loyalty with an Authentic Social
Strategy
 How can social media analytics be used in the consumer products industry?

o In the case, Tito's Vodka uses social media analytics to help identify trends in the

overall market, as well as the interests of its own customers.

o The social media team actively uses Twitter and Instagram to have one-on-one

conversations and connect with brand enthusiasts.


o Tito's Vodka claims they are spreading the word on Twitter and capturing the party

on Instagram.

o Trends in cocktails were studied to create a quarterly recipe for customers.

o In the Tito's Vodka case, it was important that social media users all had a consistent

brand experience.

 What do you think are the key challenges, potential solutions, and probable results in

applying social media analytics in consumer products and services firms?

o The largest challenge in this area will be collecting and analyzing such a diverse set

of information.

o This type of activity will require advanced analytics systems to help marketers

understand customer preferences.

o Firms that engage in these practices successfully will be able to meet customer needs,

and by doing this build their brand acceptance as well as revenues.


Extra Notes
 ________ is a connections metric for social networks that measures the ties that actors in a

network have with others that are geographically close.

o Propinquity

 ________ is a segmentation metric for social networks that measures the strength of the

bonds between actors in a social network.

o Cohesion

 ________ is a technique used to detect favorable and unfavorable opinions toward specific

products and services using large numbers of textual data sources.

o Sentiment analysis

 ________ is mostly driven by sentiment analysis and is a key element of customer

experience management initiatives, where the goal is to create an intimate relationship with

the customer.

o Voice of the customer (VOC)

 ________ statistics help you understand whether your specific marketing objective for a Web

page is being achieved.

o Conversion

 ________ Web analytics refers to measurement and analysis of data relating to your

company that takes place outside your Web site.

o Off-site

 ________, also called homonyms, are syntactically identical words with different meanings.

o Polysemes
 A(n) ________ engine is a software program that searches for Web sites or files based on

keywords.

o search

 A(n) ________ is one or more Web pages that provide a collection of links to authoritative

Web pages.

o hub

 A(n) ________ Web site contains links that send traffic directly to your Web site.

o referral

 All of the following are challenges associated with natural language processing EXCEPT

o dividing up a text into individual words in English.

 Articles and auxiliary verbs are assigned little value in text mining and are usually filtered

out. True

 At a very high level, the text mining process can be broken down into three consecutive

tasks, the first of which is to establish the ________.

o Corpus

 Because the term-document matrix is often very large and rather sparse, an important

optimization step is to reduce the ________ of the matrix.

o dimensionality

 Breaking up a Web page into its components to identify worthy words/terms and indexing

them using a set of rules is called

o parsing the documents.

 Categorization and clustering of documents during text mining differ only in the preselection

of categories.
o True

 Clickstream analysis does not need users to enter their perceptions of the Web site or other

feedback directly to be useful in determining their preferences.

o True

 Companies understand that when their product goes "viral," the content of the online

conversations about their product does not matter, only the volume of conversations.

o False

 Consistent high quality, higher publishing frequency, and longer time lag are all attributes of

industrial publishing when compared to Web publishing.

o False

 Current use of sentiment analysis in voice of the customer applications allows companies to

change their products or services in real time in response to customer sentiment.

o True

 Describe the query-specific clustering method as it relates to clustering.

o This method employs a hierarchical clustering approach where the most relevant

documents to the posed query appear in small tight clusters that are nested in larger

clusters containing less similar documents, creating a spectrum of relevance levels

among the documents.

 Descriptive analytics for social media feature such items as your followers as well as the

content in online conversations that help you to identify themes and sentiments.

o False

 How would you describe information extraction in text mining?


o It's the process of extracting structured information from unstructured or
semi-structured documents.

 IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-

based computational architecture called ________.

o DeepQA

 Identify, with a brief description, each of the four steps in the sentiment analysis process.

 Sentiment Detection: Here the goal is to differentiate between a fact and an opinion,
which may be viewed as classification of text as objective or subjective.

 N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify
the opinion as falling under one of two opposing sentiment polarities, or locate its
position on the continuum between these two polarities.

 Target Identification: The goal of this step is to accurately identify the target of the
expressed sentiment.

 Collection and Aggregation: In this step all text data points in the document are
aggregated and converted to a single sentiment measure for the whole document.
 In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but

easier to classify others, e.g., movie reviews, in the same way.

o True

 In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's

feelings.

o False

 In sentiment analysis, which of the following is an implicit opinion?

o The customer service I got for my TV was laughable.

 In text analysis, what is a lexicon?

o a catalog of words, their synonyms, and their meanings

 In text mining, if an association between two concepts has 7% support, it means that 7% of

the documents had both concepts represented in the same document.

o True

 In text mining, tokenizing is the process of

o categorizing a block of text in a sentence.

 In the car insurance case study, text mining was used to identify auto features that caused

injuries.

o False

 In the evolution of social media user engagement, the largest recent change is the growth of

creators.

o False
 In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics

to better understand the quality of customer traffic on their Web site, classify order rates, and

see which ________ had the most visitors.

o channels

 In the Mining for Lies case study, a text based deception-detection method used by Fuller

and others in 2008 was based on a process known as ________, which relies on elements of

data and text mining techniques.

o message feature mining

 In the opening vignette, the architectural system that supported Watson used all the following

elements EXCEPT

o a core engine that could operate seamlessly in another domain without changes.

 In the research literature case study, the researchers analyzing academic papers extracted

information from which source?

o the paper abstract

 In the security domain, one of the largest and most prominent text mining applications is the

highly classified ECHELON surveillance system. What is ECHELON assumed to be capable

of doing?

o Identifying the content of telephone calls, faxes, e-mails, and other types of data and

intercepting information sent via satellites, public switched telephone networks, and

microwave links

 In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe

for customers.

o True
 In the Tito's Vodka case, it was important that social media users all had a(n) ________

brand experience.

o consistent

 In the Wimbledon case study, designers balanced the needs of mobile and desktop computer

users.

o True

 In the Wimbledon case study, the tournament used data for each match in real time to

highlight

o significant events.

 In what ways does the Web pose great challenges for effective and efficient knowledge

discovery through data mining?

o The Web is too big for effective data mining.

- It is difficult even to quantify the size of the Web.

o The Web is too complex.

- Content varies widely from page to page.

o The Web is too dynamic.

- Content is constantly being updated.

o The Web is not specific to a domain.

- Its users come from different backgrounds.

o The Web has everything.

- Only a small percentage of it is useful to any one person; the other 99% is not.

 Natural language processing (NLP) is associated with which of the following areas?

o all of these
 Natural language processing (NLP), a subfield of artificial intelligence and computational

linguistics, is an important component of text mining. What is the definition of NLP?

o NLP is a discipline that studies the problem of understanding the natural human

language, with the view of converting depictions of human language into more formal

representations in the form of numeric and symbolic data that are easier for computer

programs to manipulate.

 Regional accents present challenges for natural language processing.

o True

 Search engine optimization (SEO) is a means by which

o Web site developers can increase Web site search rankings.

 Search engine optimization (SEO) techniques play a minor role in a Web site's search

ranking because only well-written content matters.

o False

 Search engines are only used in the context of the World Wide Web (WWW).

o False

 Sentiment analysis projects require a lexicon for use. If a project in English is undertaken,

you must generally make sure to

o use an English lexicon appropriate to the project at your discretion.

 Since little can be done about visitor Web site abandonment rates, organizations have to

focus their efforts on increasing the number of new visitors.

o False

 Text analytics is the subset of text mining that handles information retrieval and extraction,

plus data mining.


o False

 Understanding which keywords your users enter to reach your Web site through a search

engine can help you understand

o how well visitors understand your products.

 Web ________ are used to automatically read through the contents of Web sites.

o crawlers/spiders

 Web pages contain both unstructured information and ________, which are connections to

other Web pages.

o hyperlinks
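
The sketch below illustrates the two kinds of content on a Web page: it separates the unstructured text from the hyperlinks, which is the first thing a crawler/spider needs in order to move on to other pages. The class name `PageParser` and the sample HTML are assumptions made up for this example; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects hyperlinks and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []       # href values of <a> tags
        self.text_parts = []  # non-empty pieces of text content

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page content for illustration.
SAMPLE_HTML = (
    "<html><body><p>Read our "
    '<a href="https://example.com/reviews">reviews</a> and '
    '<a href="/contact">contact us</a>.</p></body></html>'
)

parser = PageParser()
parser.feed(SAMPLE_HTML)
parser.close()

links = parser.links
text = " ".join(parser.text_parts)
print("Links:", links)
print("Text:", text)
```

A crawler would repeat this step for every link it collects, which is how it reads through the contents of a site automatically.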

 Web site usability may be rated poor if

o Web site visitors download few of your offered PDFs and videos.

 Web-based media has nearly identical cost and scale structures as traditional media.

o False

 What are the three categories of social media analytics technologies and what do they do?

o Descriptive analytics: Uses simple statistics to identify activity characteristics and

trends, such as how many followers you have, how many reviews were generated on

Facebook, and which channels are being used most often.

o Social network analysis: Follows the links between friends, fans, and followers to

identify connections of influence as well as the biggest sources of influence.

o Advanced analytics: Includes predictive analytics and text analytics that examine the

content in online conversations to identify themes, sentiments, and connections that would not

be revealed by casual surveillance.
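
The social network analysis category can be illustrated with a very small sketch: given "who follows whom" links, count followers per account to find the biggest sources of influence. The account names, the link data, and the function name `rank_by_followers` are hypothetical, and follower count is only one simple stand-in for influence.

```python
from collections import Counter

# Hypothetical (follower, followed) links for illustration.
follows = [
    ("ann", "bob"),
    ("cara", "bob"),
    ("dev", "bob"),
    ("bob", "ann"),
    ("dev", "ann"),
    ("ann", "cara"),
]


def rank_by_followers(edges):
    """Return (account, follower_count) pairs, most-followed first."""
    counts = Counter(followed for _, followed in edges)
    return counts.most_common()


ranking = rank_by_followers(follows)
for account, follower_count in ranking:
    print(account, follower_count)
```

Descriptive analytics would stop at totals such as the number of followers; social network analysis uses the links themselves to see who influences whom.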

 What are the two main types of Web analytics?

o off-site and on-site Web analytics

 What do voice of the market (VOM) applications of sentiment analysis do?

o They examine customer sentiment at the aggregate level.

 What does advanced analytics for social media do?

o It examines the content of online conversations.

 What does Web content mining involve?

o analyzing the unstructured content of Web pages

 What is one major way in which Web-based social media differs from traditional publishing

media?

o They have different costs to own and operate.

 What is search engine optimization (SEO) and why is it important for organizations that own

Web sites?

o Search engine optimization (SEO) is the intentional activity of affecting the visibility

of an e-commerce site or a Web site in a search engine's natural (unpaid or organic)

search results.

o In general, the higher a site is ranked on the search results page and the more frequently it

appears in the search results list, the more visitors it will receive from the search

engine's users.

o Being indexed by search engines like Google, Bing, and Yahoo! is not good enough

for businesses.
o Getting ranked on the most widely used search engines and getting ranked higher

than your competitors are what make the difference.

 What is the difference between white hat and black hat SEO activities?

o The main difference is the intended audience. Black hat SEO focuses on techniques and

strategies aimed at search engines, solely to get a higher search ranking. In contrast, white hat

SEO focuses on the use of techniques and strategies that are targeted to a human audience.

 What types of documents are BEST suited to semantic labeling and aggregation to determine

sentiment orientation?

o small- to medium-sized documents

 When a word has more than one meaning, selecting the meaning that makes the most sense

can only be accomplished by taking into account the context within which the word is used.

This concept is known as ________.

o word sense disambiguation
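
A rough sketch of the idea: pick the sense whose typical context words overlap most with the words around the ambiguous term. This is a simplified overlap-counting approach, not a production method, and the `SENSES` table, its cue words, and the function name `disambiguate` are all invented for this example.

```python
# Hypothetical sense inventory: each sense lists words that tend to appear near it.
SENSES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account", "cash"},
        "river edge": {"river", "water", "shore", "fishing", "mud"},
    }
}


def disambiguate(word, sentence):
    """Return the sense of `word` whose cue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("bank", "she opened an account at the bank to deposit money"))
print(disambiguate("bank", "we sat on the bank of the river fishing"))
```

The same word receives a different meaning in each sentence purely because of the surrounding context, which is the point of the definition above.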

 When viewed as a binary feature, ________ classification is the binary classification task of

labeling an opinionated document as expressing either an overall positive or an overall

negative opinion.

o polarity
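
A minimal sketch of polarity classification using a lexicon, assuming a tiny hand-made English word list: count positive and negative words and label the whole document by the sign of the difference. The word sets and the function name `classify_polarity` are hypothetical, and a real project would use a lexicon suited to its domain.

```python
import re

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "friendly"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def classify_polarity(document):
    """Label a document 'positive' or 'negative' from lexicon word counts."""
    words = re.findall(r"[a-z']+", document.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # The task is binary, so ties are assigned to 'positive' here (an arbitrary choice).
    return "positive" if score >= 0 else "negative"


print(classify_polarity("Great food and friendly staff, but slow service."))
print(classify_polarity("Terrible room and poor service."))
```

The first review contains two positive words and one negative word, so its overall label is positive even though it mentions a complaint.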

 Which of the following statements about Web site conversion statistics is FALSE?

o Visitors who begin a purchase on most Web sites must complete it.

 Why are the users' page views and time spent on your Web site important metrics?

o They are important metrics because low page views or little time spent on the site may

indicate issues with the site's design or structure, or a disconnect between the marketing

message and the content on the page.
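
As a small illustration of these two metrics, the sketch below computes average page views per visit and average time on site from a list of visits. The visit records, field names, and the function name `summarize_visits` are assumptions; real Web analytics tools derive these figures from server logs or page tags.

```python
# Hypothetical visit records for illustration.
visits = [
    {"pages_viewed": 1, "seconds_on_site": 10},
    {"pages_viewed": 5, "seconds_on_site": 300},
    {"pages_viewed": 3, "seconds_on_site": 140},
]


def summarize_visits(records):
    """Return average page views per visit and average seconds spent on the site."""
    n = len(records)
    return {
        "avg_pages_per_visit": sum(r["pages_viewed"] for r in records) / n,
        "avg_seconds_on_site": sum(r["seconds_on_site"] for r in records) / n,
    }


summary = summarize_visits(visits)
print(summary)
```

Visits like the first record (one page, ten seconds) are the ones worth investigating for design, structure, or messaging problems.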
