Section 2 Text Analytics and Text Mining Overview
information extraction,
Web mining.
o The overarching goal for both text analytics and text mining is to turn
retrieval.
o You can think of text analytics as a combination of information retrieval plus text
mining.
o Text mining is the application of data mining to unstructured, or less structured, text
files.
o As the names indicate, text mining analyzes words; and data mining analyzes numeric
data.
o The benefits of text mining are obvious in the areas where very large amounts of
o Information extraction.
o Topic tracking.
Based on a user profile and documents that a user views, text mining can
o Summarization.
o Categorization.
Identifying the main themes of a document and then placing the document
o Concept linking.
doing so, helps users find information that they perhaps would not have
o Question answering.
pattern matching.
What is NLP?
o It studies the problem of "understanding" the natural human language, with the view
formal representations (in the form of numeric and symbolic data) that are easier for
o Text mining uses natural language processing to induce structure into the text
collection and then uses data mining algorithms such as classification, clustering,
o NLP moves beyond syntax-driven text manipulation (which is often called "word
Part-of-speech tagging.
only on the definition of the term but also on the context within
which it is used.
Text segmentation.
Selecting the meaning that makes the most sense can only be
used.
Syntactic ambiguity.
Speech acts.
Question answering
Automatic summarization
Machine translation
Speech recognition
Text-to-speech
Text proofing
List and briefly discuss some of the text mining applications in marketing.
o Text mining can be used to increase cross-selling and up-selling by analyzing the
Companies can use text mining to analyze rich sets of unstructured text data,
combined with the relevant structured data extracted from organizational databases,
and analyzing vast amounts of structured and unstructured data sources in order to
detection.
data in the context of previously known information about the biological entities
text mining tools to assist in such interpretation is one of the main challenges in
Extract Knowledge:
What is the reason for normalizing word frequencies? What are the common methods for
o The raw indices need to be normalized to have a more consistent TDM for further
analysis.
o Common methods are log frequencies, binary frequencies, and inverse document
frequencies.
components analysis, reduces the overall dimensionality of the input matrix (number
classification,
clustering,
association, and
trend analysis.
Section 6: Sentiment Analysis
o Sentiment analysis tries to answer the question, "What do people feel about a certain
topic?" by digging into opinions of many using a variety of automated tools. It is also
o Sentiment analysis shares many characteristics and techniques with text mining.
topics, sentiment classification generally deals with two classes (positive versus
negative), a range of polarity (e.g., star ratings for movies), or a range in strength of
opinion.
What are the most popular application areas for sentiment analysis? Why?
popular "voice of the customer (VOC)" applications. Other application areas include
What would be the expected benefits and beneficiaries of sentiment analysis in politics?
o Opinions matter a great deal in politics. Because political discussions are dominated
by quotes, sarcasm, and complex references to persons, organizations, and ideas,
politics is one of the most difficult, and potentially fruitful, areas for sentiment
analysis. By analyzing the sentiment on election forums, one may predict who is
more likely to win or lose. Sentiment analysis can help understand what voters are
thinking and can clarify a candidate's position on issues. Sentiment analysis can help
political organizations, campaigns, and news analysts to better understand which
issues and positions matter the most to voters. The technology was successfully
applied by both parties to the 2008 and 2012 American presidential election
campaigns.
What are the main steps in carrying out sentiment analysis projects?
o The first step when performing sentiment analysis of a text document is called
sentiment detection, during which text data is differentiated between fact and opinion
o Following this comes target identification (identifying the person, product, event, etc.
o Finally come collection and aggregation, in which the overall sentiment for the
What are the two common methods for polarity identification? Explain.
o Polarity identification can be done via a lexicon (as a reference library) or by using a
or negativity.
What are some of the main challenges the Web poses for knowledge discovery?
What is Web mining? How does it differ from regular data mining or text mining?
o Web mining is the discovery and analysis of interesting and useful information from
the Web and about the Web, usually through Web-based tools.
o Text mining is less structured because it's based on words instead of numeric data.
What is Web content mining? How can it be used for competitive advantage?
o Web content mining refers to the extraction of useful information from Web pages.
o The documents may be extracted in some machine-readable format so that automated
o Collecting and mining Web content can be used for competitive intelligence
What is Web structure mining? How does it differ from Web content mining?
o Web structure mining is the process of extracting useful information from the links
o By contrast, Web content mining involves analysis of the specific textual content of
web pages. So, Web structure mining is more related to navigation through a website,
whereas Web content mining is more related to text mining and the document
What is a search engine? Why are they important for today's businesses?
o A search engine is a software program that searches for documents (Internet sites or
files) based on the keywords (individual words, multi-word terms, or a complete
sentence) that users have provided that have to do with the subject of their inquiry.
o This is the most prominent type of information retrieval system for finding relevant
o Search engines have become the centerpiece of most Internet-based transactions and
other activities.
o Because people use them extensively to learn about products and services, it is very
important for companies to have prominent visibility on the Web; hence the major
o A Web crawler (also called a spider or a Web spider) is a piece of software that
systematically browses (crawls through) the World Wide Web for the purpose of
o It starts with a list of "seed" URLs, goes to the pages of those URLs, and then follows
each page's hyperlinks, adding them to the search engine's database. Thus, the Web
crawler navigates through the Web in order to construct the database of websites.
o Search engine optimization (SEO) is the intentional activity of affecting the visibility
of an e-commerce site or a website in a search engine's natural (unpaid or organic)
search results.
o It involves editing a page's content, HTML, metadata, and associated coding to both
increase its relevance to specific keywords and to remove barriers to the indexing
o In addition, SEO efforts include promoting a site to increase its number of inbound
links.
o SEO primarily benefits companies with e-commerce sites by making their pages
appear toward the top of search engine lists when users query.
What things can help Web pages rank higher in the search engine results?
visibility.
increase traffic.
o Updating content
weight to a site.
o Adding relevant keywords to a Web page's metadata, including the title tag and
metadescription,
increasing traffic.
so that they are accessible via multiple URLs and using canonical link
elements and redirects can help make sure links to different versions of the
What are the three types of data generated through Web page visits?
o Automatically generated data stored in server access logs, referrer logs, agent logs,
o User profiles
o By using the data and text mining techniques, a company might be able to discern
o Target electronic ads and coupons at user groups based on user access patterns.
o Predict user behavior based on previously learned rules and users' profiles.
What are commonly used Web analytics metrics? What is the importance of metrics?
Website usability:
These involve page views, time on site, downloads, click map, and
click paths.
Traffic sources:
Visitor profiles:
Conversion statistics:
o These metrics are important because they provide access to a lot of valuable
marketing data, which can be leveraged for better insights to grow your business and
better document your ROI. The insight and intelligence gained from Web analytics
can be used to effectively manage the marketing efforts of an organization and its
communality shared by every member of a body. Thus, social analytics in this sense
and interpreting digital interactions and relationships of people, topics, ideas and
content."
about existing and potential customers' current and future behaviors, and
about the likes and dislikes toward a firm's products and services.
connections/relationships.
Dating back to the 1950s, social network analysis is an interdisciplinary field that
emerged from social psychology, sociology, statistics, and graph (network) theory.
o Social media refers to the enabling technologies of social interactions among people
in which they create, share, and exchange information, ideas, and opinions in virtual
technological foundations of Web 2.0, and that allow the creation and exchange of
user-generated content.
What is social media analytics? What are the reasons behind its increasing popularity?
o Social media analytics refers to the systematic and scientific ways to consume the
vast amount of content created by Web-based social media outlets, tools, and
techniques for the betterment of an organization's competitiveness. Data includes
o The increasing popularity of social media analytics stems largely from the similarly
o First, determine what your social media goals are. From here, you can use
analysis tools such as descriptive analytics, social network analysis, and advanced
b) What is the reason for normalizing word frequencies?
i) Increase consistency of term-document matrix (TDM)
c) What are the common methods for normalizing word frequencies?
i) log frequencies, binary frequencies, and inverse document frequencies, among others.
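The three normalizations named above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the natural logarithm is an assumed choice of base, and the inverse document frequency shown is one common formulation (log-damped term frequency multiplied by log of N over document frequency).

```python
import math

def log_frequency(tf):
    """Dampen a raw count: 1 + log(tf) when the term occurs, else 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0

def binary_frequency(tf):
    """Record only whether the term occurs in the document."""
    return 1 if tf > 0 else 0

def inverse_document_frequency(tdm):
    """Weight a term-document matrix given as rows of raw counts
    (one row per document, one column per term)."""
    n_docs = len(tdm)
    n_terms = len(tdm[0])
    # Number of documents in which each term appears at least once.
    doc_freq = [sum(1 for row in tdm if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in tdm:
        weighted.append([
            log_frequency(tf) * math.log(n_docs / doc_freq[j]) if tf > 0 else 0.0
            for j, tf in enumerate(row)
        ])
    return weighted
```

Under this weighting, a term that appears in every document gets a weight of zero, since it does nothing to distinguish one document from another.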
d) What is SVD? How is it used in text mining?
i) Singular value decomposition (SVD), which is closely related to principal
components analysis, reduces the overall dimensionality of the input matrix (number
of input documents by number of extracted terms) to a lower-dimensional space,
where each consecutive dimension represents the largest degree of variability
(between words and documents) possible (Manning & Schutze, 1999). Ideally, the
analyst might identify the two or three most salient dimensions that account for most
of the variability (differences) between the words and documents, thus identifying the
latent semantic space that organizes the words and documents in the analysis. Once
such dimensions are identified, the underlying “meaning” of what is contained
(discussed or described) in the documents has been extracted.
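In standard matrix notation, the reduction described above can be written as follows. The symbols are assumptions introduced here for illustration: A is the m-by-n matrix of m documents by n terms, r is its rank, and k is the small number of dimensions the analyst keeps.

```latex
A = U \Sigma V^{\top}, \qquad
A \in \mathbb{R}^{m \times n}, \quad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad k \ll r

\text{share of variability retained} = \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sum_{i=1}^{r} \sigma_i^{2}}
```

Because the singular values are ordered from largest to smallest, each consecutive dimension accounts for the largest remaining share of variability, which is why keeping only the first two or three can be enough.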
e) What are the main knowledge extraction methods from corpus?
i) Classification
(1) Given a set of categories and group of text documents, the documents are matched
with the correct category using models developed with a training data set that
includes both documents and categories
ii) Clustering
(1) Grouping an unlabeled collection into meaningful clusters without prior
knowledge
(2) Great for web content searches
(3) Improves search recall
(4) Improves search precision
(5) Most popular clustering methods
(a) Scatter/gather
(i) Dynamically generates table of contents for the collection and adapts and
modifies it in response to user selection
iii) Association
(1) Direct relationships between terms or sets of concepts
iv) Trend analysis
(1) Analyzing two collections but from different points in time
5) Section 6
a) What is sentiment analysis? How does it relate to text mining?
i) Sentiment analysis
(1) Technique used to detect favorable and unfavorable opinions toward specific
products and services using a large number of text sources.
ii) Text mining is one of the tools sentiment analysis uses to identify people’s opinions
b) What are the most popular application areas for sentiment analysis? Why?
i) Voice of the Customer
(1) Sentiment analysis can access a company’s product/service reviews to better
understand and better manage customer opinions
ii) Voice of the Market
(1) Understanding aggregate opinions and trends
(2) Helps companies with competitive intelligence and product development and
positioning
iii) Voice of the Employee
(1) Using rich, opinionated textual data is an effective and efficient way to listen to
what employees are saying
iv) Brand Management
(1) Sentiment analysis helps brand management move toward shaping perception
from managing experiences
v) Financial Markets
(1) Using sentiment analysis through social media, news, blogs, and discussion
groups to compute market movements
vi) Politics
(1) Predicting election results
(2) Helps understand what voters are thinking and clarify candidates’ position on
issues
(3) Helps political organizations identify the issues and positions that matter most to voters
vii) Government Intelligence
(1) Allows automatic analysis of opinions that people submit about pending policy or
government-regulation proposals
(2) Monitoring spikes in negative sentiment
c) What would be the expected benefits and beneficiaries of sentiment analysis in politics?
i) As we all know, opinions matter a great deal in politics. Because political discussions
are dominated by quotes, sarcasm, and complex references to persons, organizations,
and ideas, politics is one of the most difficult, and potentially fruitful, areas for
sentiment analysis. By analyzing the sentiment on election forums, one may predict
who is more likely to win or lose. Sentiment analysis can help understand what voters
are thinking and can clarify a candidate’s position on issues. Sentiment analysis can
help political organizations, campaigns, and news analysts to better understand which
issues and positions matter the most to voters. The technology was successfully
applied by both parties to the 2008 and 2012 American presidential election
campaigns.
d) What are the main steps in carrying out sentiment analysis projects?
i) Step 1: Sentiment Detection
(1) Differentiating between fact and opinion
ii) Step 2: N-P Polarity Classification
(1) Grouping opinions in the spectrum of positive or negative
iii) Step 3: Target Identification
(1) Identifying the target of the expressed sentiment (person, product, event, etc.)
iv) Step 4: Collection & Aggregation
(1) All sentiments are aggregated and converted into a single measure of sentiment
e) What are the two common methods for polarity identification? Explain.
i) Using a Lexicon
ii) Using a collection of training docs
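The lexicon approach, together with the aggregation step listed under d), can be sketched as follows. This is a minimal illustration under stated assumptions: the four-word lexicon and its scores are hypothetical, scoring is a plain average of matched words, and negation, sarcasm, and target identification are ignored.

```python
# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {"great": 1.0, "good": 0.5, "poor": -0.5, "terrible": -1.0}

def polarity(text, lexicon=LEXICON):
    """Average polarity of the lexicon words found in the text.
    Returns 0.0 when no lexicon word is present."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

def aggregate(texts, lexicon=LEXICON):
    """Combine per-document polarities into a single overall measure."""
    scores = [polarity(t, lexicon) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0
```

The second method, using a collection of training documents, would replace the fixed lexicon with a classifier learned from documents already labeled positive or negative.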
6) Section 7
a) What are some of the main challenges the Web poses for knowledge discovery?
i) The Web is too big for effective data mining.
ii) The Web is too complex.
iii) The Web is too dynamic.
iv) The Web is not specific to a domain.
v) The Web has everything.
b) What is Web mining? How does it differ from regular data mining or text mining?
i) Web mining is the discovery and analysis of interesting and useful information from
the Web and about the Web, usually through Web-based tools. Text mining is less
structured because it's based on words instead of numeric data.
c) What are the three main areas of Web mining?
i) The three main areas of Web mining are Web content mining, Web structure mining,
and Web usage (or activity) mining.
d) What is Web content mining? How can it be used for competitive advantage?
i) Web content mining refers to the extraction of useful information from Web pages.
The documents may be extracted in some machine-readable format so that automated
techniques can generate some information about the Web pages. Collecting and
mining Web content can be used for competitive intelligence (collecting intelligence
about competitors' products, services, and customers), which can give your
organization a competitive advantage.
e) What is Web structure mining? How does it differ from Web content mining?
i) Web structure mining is the process of extracting useful information from the links
embedded in Web documents. By contrast, Web content mining involves analysis of
the specific textual content of web pages. So, Web structure mining is more related to
navigation through a website, whereas Web content mining is more related to text
mining and the document hierarchy of a particular web page.
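The contrast between the two can be sketched with Python's standard html.parser module: the links are the raw material for structure mining, and the visible text is the raw material for content mining. The class and function names are hypothetical, and this is only an illustration of the split, not a full mining pipeline.

```python
from html.parser import HTMLParser

class PageSplitter(HTMLParser):
    """Collect hyperlink targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []       # input for Web structure mining
        self.text_parts = []  # input for Web content mining

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)

def split_page(html):
    """Return (links, text) for an HTML string."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace in the collected text.
    text = " ".join("".join(parser.text_parts).split())
    return parser.links, text
```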
7) Section 8: Search Engines
a) What is a search engine? Why are they important for today’s businesses?
i) A search engine is a software program that searches for documents (Internet sites or
files) based on the keywords (individual words, multi-word terms, or a complete
sentence) that users have provided that have to do with the subject of their inquiry.
This is the most prominent type of information retrieval system for finding relevant
content on the Web. Search engines have become the centerpiece of most Internet-
based transactions and other activities. Because people use them extensively to learn
about products and services, it is very important for companies to have prominent
visibility on the Web, hence the major effort of companies to enhance their search
engine optimization (SEO).
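Keyword lookup of this kind is commonly built on an inverted index, which maps each word to the documents that contain it. The sketch below assumes that structure (the notes do not name it) and uses hypothetical function names. It returns documents containing every keyword and does no ranking, which real search engines add on top.

```python
from collections import defaultdict

def build_index(docs):
    """docs: dict of doc_id -> text. Returns word -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```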
b) What is a Web crawler? What is it used for? How does it work?
i) A Web crawler (also called a spider or a Web spider) is a piece of software that
systematically browses (crawls through) the World Wide Web for the purpose of
finding and fetching Web pages. It starts with a list of "seed" URLs, goes to the pages
of those URLs, and then follows each page's hyperlinks, adding them to the search
engine's database. Thus, the Web crawler navigates through the Web in order to
construct the database of websites.
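The seed-and-follow procedure described above can be sketched as a breadth-first traversal. To keep the example self-contained, page fetching is abstracted into a get_links function supplied by the caller; that function, the crawl name, and the max_pages limit are all assumptions made for illustration.

```python
from collections import deque

def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    get_links(url) is a caller-supplied (hypothetical) function that
    returns the URLs linked from the page at url.
    Returns the URLs in the order they were visited.
    """
    queue = deque(seeds)
    seen = set(seeds)  # URLs already queued, so each is visited once
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For example, with a toy link graph held in a dictionary named web, crawl(["a"], lambda u: web.get(u, [])) visits every page reachable from "a" exactly once.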
c) What is “search engine optimization?” Who benefits from it?
i) Search engine optimization (SEO) is the intentional activity of affecting the visibility
of an e-commerce site or a website in a search engine's natural (unpaid or organic)
search results. It involves editing a page's content, HTML, metadata, and associated
coding to both increase its relevance to specific keywords and to remove barriers to
the indexing activities of search engines. In addition, SEO efforts include promoting a
site to increase its number of inbound links. SEO primarily benefits companies with
e-commerce sites by making their pages appear toward the top of search engine lists
when users query.
d) What things can help Web pages rank higher in the search engine results?
i) Cross-linking between pages of the same website to provide more links to the most
important pages may improve its visibility. Writing content that includes frequently
searched keyword phrases, so as to be relevant to a wide variety of search queries,
will tend to increase traffic. Updating content so as to keep search engines crawling
back frequently can give additional weight to a site. Adding relevant keywords to a
Web page's metadata, including the title tag and meta description, will tend to
improve the relevancy of a site's search listings, thus increasing traffic. Normalizing
the URLs of Web pages that are accessible via multiple URLs, using canonical link
elements and redirects, can help make sure links to different versions of the URL all
count toward the page's link popularity score.
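The idea behind URL normalization can be sketched with Python's standard urllib.parse module: several spellings of the same address are reduced to one canonical form. The specific rules chosen here (lowercase scheme and host, drop default ports, drop fragments, drop a trailing slash) are assumptions for illustration; which variants are truly equivalent depends on the site.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Reduce common variants of a URL to one canonical form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"
    path = parts.path or "/"
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")
    # The fragment (text after #) is dropped; the query string is kept.
    return urlunsplit((scheme, host, path, parts.query, ""))
```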
8) Section 9: Web Usage Mining
a) What are the three types of data generated through Web page visits?
i) Automatically generated data stored in server access logs, referrer logs, agent logs,
and client-side cookies
ii) User profiles
iii) Metadata, such as page attributes, content attributes, and usage data.
b) What is clickstream analysis? What is it used for?
i) Analysis of the information collected by Web servers can help us better understand
user behavior. Analysis of this data is often called clickstream analysis. By using
data and text mining techniques, a company might be able to discern interesting
patterns from the clickstreams.
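A first step in clickstream analysis can be sketched as grouping logged page requests into per-session click paths and counting page-to-page transitions. The event format (timestamp, session id, page) and the function names are assumptions for illustration; real server logs need parsing and session identification first.

```python
from collections import Counter, defaultdict

def click_paths(events):
    """events: iterable of (timestamp, session_id, page) tuples.
    Returns session_id -> list of pages in the order they were viewed."""
    paths = defaultdict(list)
    for _, session_id, page in sorted(events):
        paths[session_id].append(page)
    return dict(paths)

def transition_counts(paths):
    """Count how often each page-to-page step occurs across all sessions."""
    counts = Counter()
    for pages in paths.values():
        counts.update(zip(pages, pages[1:]))
    return counts
```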
c) What are the main applications of Web mining?
i) Determine the lifetime value of clients.
ii) Design cross-marketing strategies across products.
iii) Evaluate promotional campaigns.
iv) Target electronic ads and coupons at user groups based on user access patterns.
v) Predict user behavior based on previously learned rules and users' profiles.
vi) Present dynamic information to users based on their interests and profiles.
d) What are commonly used Web analytics metrics? What is the importance of metrics?
i) There are four main categories of Web analytic metrics:
(1) Website usability: How were they using my website? These involve page views,
time on site, downloads, click map, and click paths.
(2) Traffic sources: Where did they come from? These include referral websites,
search engines, direct, offline campaigns, and online campaigns.
(3) Visitor profiles: What do my visitors look like? These include keywords, content
groupings, geography, time of day, and landing page profiles.
(4) Conversion statistics: What does all this mean for the business? Metrics include
new visitors, returning visitors, leads, sales/conversions, and abandonments.
ii) These metrics are important because they provide access to a lot of valuable
marketing data, which can be leveraged for better insights to grow your business and
better document your ROI. The insight and intelligence gained from Web analytics
can be used to effectively manage the marketing efforts of an organization and its
various products or services.
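Two of the conversion statistics above reduce to simple ratios. The sketch below assumes the usual definitions (conversions per visit, and carts that did not become orders per cart created); the function names and argument names are hypothetical.

```python
def conversion_rate(visits, conversions):
    """Share of visits that ended in a sale or other conversion."""
    return conversions / visits if visits else 0.0

def cart_abandonment_rate(carts_created, orders_completed):
    """Share of shopping carts that were created but never checked out."""
    if not carts_created:
        return 0.0
    return (carts_created - orders_completed) / carts_created
```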
9) Section 10: Social Analytics
a) What is meant by social analytics? Why is it an important business topic?
i) From a philosophical perspective, social analytics focuses on a theoretical object
called a "socius," a kind of "commonness" that is neither a universal account nor a
communality shared by every member of a body. Thus, social analytics in this sense
attempts to articulate the differences between philosophy and sociology. From a BI
perspective, social analytics involves "monitoring, analyzing, measuring and
interpreting digital interactions and relationships of people, topics, ideas and content."
In this perspective, social analytics involves mining the textual content created in
social media (e.g., sentiment analysis, natural language processing) and analyzing
socially established networks (e.g., influencer identification, profiling, prediction).
This is an important business topic because it helps companies gain insight about
existing and potential customers' current and future behaviors, and about the likes and
dislikes toward a firm's products and services.
b) What is a social network? What is the need for SNA?
i) A social network is a social structure composed of individuals/people (or groups of
individuals or organizations) linked to one another with some type of
connections/relationships. Social network analysis (SNA) is the systematic
examination of social networks. Dating back to the 1950s, social network analysis is
an interdisciplinary field that emerged from social psychology, sociology, statistics,
and graph (network) theory.
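Two basic SNA measures from graph theory can be sketched as follows: degree centrality (how connected each member is) and density (what share of all possible ties exist). The network is assumed to be undirected and given as a list of pairs; the function names are hypothetical.

```python
def adjacency(edges):
    """Build node -> set of neighbors from undirected (a, b) pairs."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-ties
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors

def degree_centrality(edges):
    """Each node's number of ties divided by the maximum possible (n - 1)."""
    neighbors = adjacency(edges)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

def density(edges):
    """Share of all possible undirected ties that are actually present."""
    neighbors = adjacency(edges)
    n = len(neighbors)
    if n < 2:
        return 0.0
    # Each tie is counted once from each end, hence n * (n - 1) below.
    return sum(len(adj) for adj in neighbors.values()) / (n * (n - 1))
```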
c) What is social media? How does it relate to Web 2.0?
i) Social media refers to the enabling technologies of social interactions among people
in which they create, share, and exchange information, ideas, and opinions in virtual
communities and networks. It is a group of Internet-based software applications that
build on the ideological and technological foundations of Web 2.0, and that allow the
creation and exchange of user-generated content.
d) What is social media analytics? What are the reasons behind its increasing popularity?
i) Social media analytics refers to the systematic and scientific ways to consume the
vast amount of content created by Web-based social media outlets, tools, and
techniques for the betterment of an organization's competitiveness. Data includes
anything posted in a social media site. The increasing popularity of social media
analytics stems largely from the similarly increasing popularity of social media
together with exponential growth in the capacities of text and Web analytics
technologies.
e) How can you measure the impact of social media analytics?
i) First, determine what your social media goals are. From here, you can use analysis
tools such as descriptive analytics, social network analysis, and advanced (predictive
and text analytics that examine content in online conversations), and ultimately
prescriptive analytics tools.
OPENING VIGNETTE: Machine versus Men on Jeopardy!: The Story of Watson
What is Watson? What is special about it?
o Watson was capable of listening, understanding, responding, and winning in real time
o Watson proved that computer systems can do things that require human creativity and
intelligence.
What technologies were used in building Watson (both hardware and software)?
o Watson is built on the DeepQA framework. The hardware for this system involves a
and evaluation, entity and relation detection, logical form generation, and knowledge
Why did IBM spend all that time and money to build Watson? Where is the ROI?
o IBM's goal was to advance computer science by exploring new ways for computer
technology to affect science, business, and society. If successful, this could give IBM
Application Case 5.1: Insurance Group Strengthens Risk Management with Text Mining
Solution
How can text analytics and mining be used to keep up with changing business needs of
insurance companies?
o The purpose was to expand and automate the analysis of unstructured accident
reports, witness statements, and claim narratives for the automobile insurance
company.
What were the challenges, the proposed solution, and the obtained results with Insurance
case?
o The largest challenge is the unstructured nature of the documents and their variability.
Can you think of other uses of text analytics and text mining for insurance companies?
o There are many possible solutions, but this type of system could be used in other
insurance areas. For example, it could be used to help evaluate the potential risks
Application Case 5.2: AMC Networks Is Using Analytics to Capture New Viewers, Predict
Ratings, and Add Value for Advertisers in a Multichannel World
What are the common challenges broadcasting companies are facing nowadays? How can
content that appeals to their viewers. AMC developed original, hit shows such as
Breaking Bad, Better Call Saul, Mad Men, and The Walking Dead. Understanding
what type of content will be appealing requires the analysis of a large quantity and
variety of data from multiple sources. Analytic systems can help with this task by
o The company used analytic systems to help aggregate information across a wide
variety of platforms. After the information was aggregated, it was easier to evaluate
customer use trends, as well as to identify potential markets and submarkets that new
What were the types of text analytics and text mining solutions developed by AMC Networks?
Can you think of other potential uses of text mining applications in the broadcasting
industry?
o An example of the type of analysis used would be to look at customer viewing trends,
o This type of analysis may help the industry drive towards more personalization in
bases were analyzed using text mining techniques to determine which statements
interest in crimes. The deception detection used only text-based features (cues) and
did NOT analyze the observed behavior of the witnesses during their testimony
o Classification models are trained and tested on quantified cues, and based on this,
the researchers was based on a process known as message feature mining, which
What do you think are the main challenges for such an automated system with mining lies
case?
o One challenge is that the training system depends on humans to ascertain the
Application Case 5.4 Bringing the Customer into the Quality Equation: Lenovo Uses
Analytics to Rethink Its Redesign
How did Lenovo use text analytics and text mining to improve quality and design of their
o Lenovo is a leading computing product manufacturer and uses text mining to better
understand their current and potential customers' needs and wants related to product
o Lenovo has been able to use text analytics and text mining to better understand
o By using advanced systems, the company is able to identify, collect, and process a
What were the challenges, the proposed solution, and the obtained results for Lenovo?
information, in this case, reviews and comments from users, and the
characterize user sentiment, and use this information to drive both customer
o These systems have been very successful, and there are plans to grow their use in
o In the research literature case study, the researchers analyzing academic papers
literature.
o Clustering was used in this study to identify the natural groupings of the articles, and
o Use of text and data mining can thus speed up and simplify the literature review
What are the common outcomes of a text mining project on a specific collection of journal
articles? Can you think of other potential outcomes not mentioned in this case?
o Text mining also has other possible applications in literature reviews. For example,
sentiment analysis can help to identify positive and negative judgments. Text mining
can be used to build taxonomies of concepts and terms within and between research
o Wimbledon used analytics to help improve the viewer experience by leveraging data
o For example, the system analyzed in real-time data coming from all the matches, and
o In the Wimbledon case study, the tournament used data for each tennis match in real
What were the challenges, the proposed solution, and the obtained results on Wimbledon?
o One of the challenges that the tournament faced was providing services to viewers
While the growth of mobile users was increasing, most users still utilize
o This meant that a hybrid solution needed to be undertaken, that provided the best
responsive viewing for mobile users, while integrating more in-depth and high-
o In the Wimbledon case study, designers balanced the needs of mobile and desktop
computer users.
Application Case 5.7 Understanding Why Customers Abandon Shopping Carts Results in a
$10 Million Sales Increase
How did Lotte.com use analytics to improve sales?
o Lotte.com is the leading Internet shopping mall in Korea and has developed its
integrated Web traffic analysis system using the SAS for Customer Experience
Analytics solution.
o This information enables Lotte.com to better understand customers and their behavior
o It is false to assume that little can be done about visitor Web site abandonment rates.
What were the challenges, the proposed solution, and the obtained results on Lotte.com
o In the Lotte.com retail case, the company deployed SAS for Customer Experience
Analytics to better understand the quality of customer traffic on their Web site,
classify order rates, and see which channels had the most visitors. Heightened
Do you think e-commerce companies are in better position to leverage benefits of analytics?
Why? How?
o To the degree that e-commerce companies integrate analytics into their systems, they
Application Case 5.8 Tito’s Vodka Establishes Brand Loyalty with an Authentic Social
Strategy
How can social media analytics be used in the consumer products industry?
o In the case, Tito's Vodka uses social media analytics to help identify trends in the
o The social media team actively uses Twitter and Instagram to have one-on-one
on Instagram.
o In the Tito's Vodka case, it was important that social media users all had a consistent
brand experience.
What do you think are the key challenges, potential solutions, and probable results in
o The largest challenge in this area will be collecting and analyzing such a diverse set
of information.
o This type of activity will require advanced analytics systems to help marketers
o Firms that engage in these practices successfully will be able to meet customer needs,
o Propinquity
________ is a segmentation metric for social networks that measures the strength of the
o Cohesion
________ is a technique used to detect favorable and unfavorable opinions toward specific
o Sentiment analysis
experience management initiatives, where the goal is to create an intimate relationship with
the customer.
________ statistics help you understand whether your specific marketing objective for a Web
o Conversion
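As a rough illustration of conversion statistics, the sketch below computes a conversion rate and a cart abandonment rate. The formulas are the commonly used definitions (conversions divided by visits, and one minus completed orders divided by carts started); the function names and the sample figures are hypothetical and do not come from any case in these notes.

```python
def conversion_rate(visits, conversions):
    """Share of visits that ended in the desired action (e.g., a purchase)."""
    if visits == 0:
        return 0.0
    return conversions / visits


def cart_abandonment_rate(carts_started, orders_completed):
    """Share of started shopping carts that never became an order."""
    if carts_started == 0:
        return 0.0
    return 1 - orders_completed / carts_started


# Made-up traffic figures, for illustration only.
print(conversion_rate(visits=2000, conversions=50))
print(cart_abandonment_rate(carts_started=200, orders_completed=50))
```

A high abandonment rate alongside a low conversion rate points to the checkout steps as the place to look first.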
________ Web analytics refers to measurement and analysis of data relating to your
o Off-site
________, also called homonyms, are syntactically identical words with different meanings.
o Polysemes
A(n) ________ engine is a software program that searches for Web sites or files based on
keywords.
o search
A(n) ________ is one or more Web pages that provide a collection of links to authoritative
Web pages.
o hub
A(n) ________ Web site contains links that send traffic directly to your Web site.
o referral
All of the following are challenges associated with natural language processing EXCEPT
Articles and auxiliary verbs are assigned little value in text mining and are usually filtered
out.
o True
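A minimal sketch of this filtering step is shown below. It drops articles and auxiliary verbs from a tokenized sentence. The `STOP_WORDS` set and the `remove_stop_words` helper are illustrative assumptions, not a list taken from the textbook; real projects usually rely on a much larger, domain-specific stop list.

```python
import re

# Hypothetical, deliberately tiny stop list (articles and auxiliary verbs only).
STOP_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been",
    "has", "have", "had", "do", "does", "did", "will", "would",
}


def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


filtered = remove_stop_words("The customer has been waiting for a refund")
print(filtered)
```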
At a very high level, the text mining process can be broken down into three consecutive
o Corpus
Because the term document matrix is often very large and rather sparse, an important
o dimensionality
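To make the sparsity problem concrete, the sketch below builds a small term-document matrix and then shrinks it by dropping terms that occur in fewer than two documents. Term pruning is only one simple way to reduce dimensionality and is used here as an assumption for illustration; the sample documents and all function names are hypothetical.

```python
from collections import Counter

# Made-up mini corpus, for illustration only.
docs = [
    "battery life is great",
    "battery drains fast",
    "great screen and great battery",
]


def term_document_matrix(documents):
    """Return (sorted vocabulary, matrix) with one row of counts per term."""
    counts = [Counter(d.split()) for d in documents]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]


def sparsity(matrix):
    """Fraction of cells in the matrix that are zero."""
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


def prune_rare_terms(vocab, matrix, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    kept = [
        (term, row)
        for term, row in zip(vocab, matrix)
        if sum(1 for v in row if v > 0) >= min_docs
    ]
    return [term for term, _ in kept], [row for _, row in kept]


vocab, tdm = term_document_matrix(docs)
small_vocab, small_tdm = prune_rare_terms(vocab, tdm)
print(len(vocab), "terms reduced to", len(small_vocab))
print("share of zero cells before pruning:", round(sparsity(tdm), 2))
```

Even in this three-document corpus more than half of the cells are zero, which is why reduction is applied before running mining algorithms on realistic collections.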
Breaking up a Web page into its components to identify worthy words/terms and indexing
Categorization and clustering of documents during text mining differ only in the preselection
of categories.
o True
Clickstream analysis does not need users to enter their perceptions of the Web site or other
o True
Companies understand that when their product goes "viral," the content of the online
conversations about their product does not matter, only the volume of conversations.
o False
Consistent high quality, higher publishing frequency, and longer time lag are all attributes of
o False
Current use of sentiment analysis in voice of the customer applications allows companies to
o True
o This method employs a hierarchical clustering approach where the most relevant
documents to the posed query appear in small tight clusters that are nested in larger
Descriptive analytics for social media feature such items as your followers as well as the
content in online conversations that help you to identify themes and sentiments.
o False
structured documents.
o DeepQA
Identify, with a brief description, each of the four steps in the sentiment analysis process.
objective or subjective.
sentiment.
whole document.
In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but
o True
feelings.
o False
In text mining, if an association between two concepts has 7% support, it means that 7% of
o True
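Under the usual association-rule definition, support is the share of documents in which both concepts occur together. The sketch below computes it for a made-up corpus in which each document has already been reduced to a set of concepts; the corpus and the `support` helper are assumptions for illustration.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents that mention both concepts."""
    both = sum(
        1 for concepts in documents
        if concept_a in concepts and concept_b in concepts
    )
    return both / len(documents)


# Hypothetical corpus: each document is represented by its set of concepts.
corpus = [
    {"airbag", "injury"},
    {"airbag", "recall"},
    {"brake", "injury"},
    {"airbag", "injury", "recall"},
]

print(support(corpus, "airbag", "injury"))
```

Here two of the four documents contain both concepts, so the association has 50% support.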
In the car insurance case study, text mining was used to identify auto features that caused
injuries.
o False
In the evolution of social media user engagement, the largest recent change is the growth of
creators.
o False
In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics
to better understand the quality of customer traffic on their Web site, classify order rates, and
o channels
In the Mining for Lies case study, a text based deception-detection method used by Fuller
and others in 2008 was based on a process known as ________, which relies on elements of
In the opening vignette, the architectural system that supported Watson used all the following
elements EXCEPT
o a core engine that could operate seamlessly in another domain without changes.
In the research literature case study, the researchers analyzing academic papers extracted
In the security domain, one of the largest and most prominent text mining applications is the
of doing?
o Identifying the content of telephone calls, faxes, e-mails, and other types of data and
intercepting information sent via satellites, public switched telephone networks, and
microwave links
In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe
for customers.
o True
In the Tito's Vodka case, it was important that social media users all had a(n) ________
brand experience.
o consistent
In the Wimbledon case study, designers balanced the needs of mobile and desktop computer
users.
o True
In the Wimbledon case study, the tournament used data for each match in real time to
highlight
o significant events.
In what ways does the Web pose great challenges for effective and efficient knowledge
o Different backgrounds
Natural language processing (NLP) is associated with which of the following areas?
o all of these
Natural language processing (NLP), a subfield of artificial intelligence and computational
o NLP is a discipline that studies the problem of understanding the natural human
language, with the view of converting depictions of human language into more formal
representations in the form of numeric and symbolic data that are easier for computer
programs to manipulate.
o True
Search engine optimization (SEO) techniques play a minor role in a Web site's search
o False
Search engines are only used in the context of the World Wide Web (WWW).
o False
Sentiment analysis projects require a lexicon for use. If a project in English is undertaken,
Since little can be done about visitor Web site abandonment rates, organizations have to
o False
Text analytics is the subset of text mining that handles information retrieval and extraction,
Understanding which keywords your users enter to reach your Web site through a search
Web ________ are used to automatically read through the contents of Web sites.
o crawlers/spiders
Web pages contain both unstructured information and ________, which are connections to
o hyperlinks
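A crawler discovers new pages by reading a page's hyperlinks and following them. The sketch below shows only the link-discovery step, using Python's standard `html.parser` module on a made-up HTML string; the `LinkCollector` class is a hypothetical helper, and no network access or crawl scheduling is included.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up page content, for illustration only.
page = '<p>See <a href="https://example.com/a">A</a> and <a href="/b">B</a>.</p>'

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```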
o Web site visitors download few of your offered PDFs and videos.
Web-based media has nearly identical cost and scale structures as traditional media.
o False
What are the three categories of social media analytics technologies and what do they do?
trends, such as how many followers you have, how many reviews were generated on
o Social network analysis: Follows the links between friends, fans, and followers to identify
o Advanced analytics: Includes predictive analytics and text analytics that examine the content in
online conversations to identify themes, sentiments, and connections that would not
What is one major way in which Web-based social media differs from traditional publishing
media?
What is search engine optimization (SEO) and why is it important for organizations that own
Web sites?
o Search engine optimization (SEO) is the intentional activity of affecting the visibility
search results.
o In general, the higher a site ranks on the search results page, and the more frequently it
appears in the search results list, the more visitors it will receive from the search
engine's users.
o Being indexed by search engines like Google, Bing, and Yahoo! is not good enough
for businesses.
o Getting ranked on the most widely used search engines and getting ranked higher
What is the difference between white hat and black hat SEO activities?
o The main difference is the intended audience. Black hat SEO uses techniques and strategies
aimed at the search engines themselves in order to get a higher search ranking. In contrast,
white hat SEO uses techniques and strategies that are targeted to a human audience.
What types of documents are BEST suited to semantic labeling and aggregation to determine
sentiment orientation?
When a word has more than one meaning, selecting the meaning that makes the most sense
can only be accomplished by taking into account the context within which the word is used.
When viewed as a binary feature, ________ classification is the binary classification task of
negative opinion.
o polarity
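One simple way to approach polarity classification is to count words from a positive and a negative lexicon and compare the totals. The sketch below does exactly that; the two tiny word lists and the `polarity` function are hypothetical, and a tie is reported as "undecided" because a strictly binary classifier would need an explicit tie-break rule. Production systems generally use trained models or far larger lexicons.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def polarity(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


print(polarity("Great phone, I love it"))
print(polarity("Terrible battery and slow screen"))
```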
Which of the following statements about Web site conversion statistics is FALSE?
o Visitors who begin a purchase on most Web sites must complete it.
Why are the users' page views and time spent on your Web site important metrics?
o These are important metrics because low values may indicate that the Web site has issues
with its design or structure, or that there is a disconnect between the marketing message
and the content on the page.