Resolving Ambiguity in Sentiment Classification: The Role of Dependency Features

SHUYUAN DENG, ATISH P. SINHA, and HUIMIN ZHAO, University of Wisconsin-Milwaukee
Sentiment analysis has become popular in business intelligence and analytics applications due to the great
need to learn insights from the vast amounts of user-generated content on the Internet. One major
challenge of sentiment analysis, like most text classification tasks, is finding structures from unstructured
texts. Existing sentiment analysis techniques employ the supervised learning approach and the lexicon
scoring approach, both of which largely rely on the representation of a document as a collection of words
and phrases. The semantic ambiguity (i.e., polysemy) of single words and the sparsity of phrases negatively
affect the robustness of sentiment analysis, especially in the context of short social media texts. In this study,
we propose to represent texts using dependency features. We test the effectiveness of dependency features
in supervised sentiment classification. We compare our method with the current standard practice using
a labeled data set containing 170,874 microblogging messages. The combination of unigram features and
dependency features significantly outperformed other popular types of features.
CCS Concepts: Computing methodologies → Machine learning → Learning paradigms → Supervised learning → Supervised learning by classification
Additional Key Words and Phrases: Sentiment analysis, text mining, dependency, feature extraction, supervised learning
ACM Reference Format:
Shuyuan Deng, Atish P. Sinha, and Huimin Zhao. 2017. Resolving ambiguity in sentiment classification:
The role of dependency features. ACM Trans. Manage. Inf. Syst. 8, 2–3, Article 4 (June 2017), 13 pages.
DOI: http://dx.doi.org/10.1145/3046684
1. INTRODUCTION
Posting on social media platforms has become one of the most popular activities on the
Internet. Social media messages contain rich user opinions and are being generated
in high volume and velocity, providing businesses with a great opportunity to monitor
their environments in real time [Bifet and Frank 2010; Yu et al. 2013]. Sentiment
analysis, also known as opinion mining, has emerged as a useful tool for extracting
subjective information from different types of texts [Liu 2012; Pang and Lee 2008],
such as blogs, reviews, comments, and tweets.
Sentiment analysis typically classifies the directional emotions in texts into different
categories, such as positive, negative, and neutral [Abbasi et al. 2011; Chen et al.
2012]. It relies on natural language processing (NLP) and text-mining techniques.
Since text data are usually unstructured, the biggest challenge of sentiment analysis is
finding meaningful structures from texts. A standard practice is to represent texts as a
collection of words and/or phrases. However, single words can have multiple meanings,
known as polysemy, and their meanings can vary greatly by context. Social media texts are
typically short, which makes contextual information even scarcer. Phrases are much less
ambiguous than single words; however, they lack flexibility since they only capture fixed
word sequences.
This study addresses the polysemy issue in sentiment analysis by introducing dependency
features as sentiment indicators. Dependencies are pairwise word relations [De Marneffe
and Manning 2008]. We argue that a simple dependency representation of texts is more
effective than phrases in sentiment analysis, especially for short documents, if the analysis
is conducted at the document level. Compared to using single words and phrases, using
dependencies has at least two advantages. First, dependencies incorporate contextual
information by using word relations. Second, a word relation can still be established even
if two words are not adjacent. In this study, we introduce dependency features into
supervised sentiment classification. We compare the classification effectiveness of
dependencies with that of the current standard practice of using n-grams and
part-of-speech tagged words on a large test set.
The remainder of this article is organized as follows. In the second section, we review
the standard practices in sentiment analysis and the types of representation of sentence
structures. In the third section, we describe the advantage of dependency features
and propose the dependency-based text representation in sentiment classification. In
the fourth section, we compare the effectiveness of dependency features with that of
different baseline approaches. Then, we review related studies using dependencies and
discuss the contribution of this article. The last section identifies the limitations of our
study and future research directions.
2. BACKGROUND
There are two major approaches to sentiment analysis: supervised learning and lexicon
scoring [Liu 2012; Pang and Lee 2008]. The supervised learning approach represents a
document as a set of linguistic features and trains a machine-learning classifier using
a large annotated corpus in which the sentiment category of each document is known.
The trained model is subsequently used to classify the sentiment of other documents
[Hu and Liu 2004; Pang et al. 2002]. The most popular method used to represent texts
is the word n-gram model [Abbasi et al. 2011; Chou et al. 2010]. Unigram models
represent a document as a vector of word frequencies (i.e., a vector space model). This
is also known as the bag-of-words (BOW) model. Term frequency-inverse document
frequency (TF-IDF), which assigns more weight to words that occur in only a few
documents, has been used as an improvement over plain frequency [Chou et al. 2010;
Ngo-Ye and Sinha 2012]. For short documents, such as social media texts, the binary
value of word presence (i.e., whether a word occurs in a document) has also been used.
In an effort to resolve word semantics, part-of-speech (POS) tagged words have also been
experimented with [Blitzer et al. 2006; Tsai et al. 2016; Zimbra et al. 2015]. A major
drawback of the BOW model is the strong assumption that the order of words (i.e.,
syntax) does not matter. To incorporate syntactical information, existing studies have
also attempted to use phrase patterns, bigrams, and trigrams. However, they have
not found consistent performance improvement from these features.
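For reference, the TF-IDF weight mentioned above is commonly computed as follows (this is one common variant; implementations differ in smoothing and normalization):

\[ \mathrm{tfidf}(t,d) = \mathrm{tf}(t,d) \times \log\frac{N}{\mathrm{df}(t)} \]

where tf(t, d) is the frequency of term t in document d, N is the total number of documents, and df(t) is the number of documents containing t.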
An important type of information that previous research in sentiment analysis has
not effectively captured is syntax, the principles of constructing sentences [Chomsky
1965]. In natural language processing, there are two types of representations of
sentence structures, the constituency grammar and the dependency grammar [Covington
2001]. Constituency grammar, also known as phrase structure grammar, describes a
sentence as a set of constituency relations [Chomsky 2002]. Single words (i.e., leaves)
are the constituents of phrases, which, in turn, are constituents of more complicated
phrases, the eventual constituents of the sentence (i.e., root). The phrase features used
in sentiment classification are a simplified case of constituency features.
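To make the contrast concrete, consider the sentence “The camera is great.” A constituency parse groups words into nested phrases, whereas a dependency parse links word pairs directly (the labels below follow common conventions; the exact output depends on the parser and representation version):

Constituency: (S (NP (DT The) (NN camera)) (VP (VBZ is) (ADJP (JJ great))))
Dependency: det(camera, The), nsubj(great, camera), cop(great, is)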
3. RESEARCH METHOD
In this study, we propose the use of dependency features to improve supervised
sentiment classification. We compare the proposed dependency features with the
features used in previous studies in terms of classification accuracy.
The dependency dobj(stealing, thunder) means that thunder is the direct object of stealing.
Although the two words are not adjacent, their close relationship can still be captured
by this dependency. The last dependency, det(the, thunder), means that the is the
determiner of thunder, which carries little meaning beyond the word thunder itself.
Without loss of generality, we suppose a classifier is trained using Sentence 1 (among
others) and is used to classify two other sentences, both of which are real-world examples:
Sentence 2: Apple keeps stealing Samsung’s thunder.
Sentence 3: Apple stealing user information via Face Time.
We show the different representations of both sentences in Table II. In these
representations, if an element also appears in Sentence 1, we display it in bold.
It is clear that Sentences 1 and 2 both express positive sentiment toward Apple, while
Sentence 3 expresses negative sentiment. Among the different types of feature
representation, only unigrams, POS-tagged words, and dependency relations can reveal the
similarity between Sentences 1 and 2. However, unigrams and POS-tagged words also
capture some similarity between Sentences 1 and 3, which have completely opposite
sentiment polarity. Among all types of representation, dependency is the only one that
can accurately detect the major similarities and differences between Sentence 1 and
the other sentences. We summarize the similar elements of the different representations
for the three sentences in Table III.
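As an illustration of how such dependency triples can be extracted in practice, the following is a minimal sketch using spaCy as a stand-in for the Stanford pipeline used in this study (spaCy's label inventory differs slightly from Stanford typed dependencies):

```python
# Minimal sketch of dependency-triple extraction, using spaCy as a
# stand-in parser. Requires: pip install spacy, then
# python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

for text in ["Apple keeps stealing Samsung's thunder.",
             "Apple stealing user information via Face Time."]:
    doc = nlp(text)
    # Each token's relation to its head yields one triple,
    # e.g. dobj(stealing, thunder)
    triples = [f"{t.dep_}({t.head.text}, {t.text})"
               for t in doc if t.dep_ != "ROOT"]
    print(triples)
```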
3.2. Research Design
To use dependency relations as features, a document is parsed into dependencies. The
occurrence of each dependency in the corpus is measured for each document to generate
a vector space model. The dependency features may be quantified using frequency
(continuous) or presence (binary) values. In addition, inverse document frequency (IDF)
weighting may be applied. Then, a training data set containing documents represented by
dependency vectors and their sentiment categories is used to build a classifier.
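The following is a minimal sketch of this representation step, assuming each document has already been parsed and its dependency triples serialized as single whitespace-separated tokens (the toy documents below are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical pre-parsed documents: each dependency triple is serialized
# as one token so that a standard vectorizer can index it like a word.
docs = [
    "nsubj(stealing,Apple) dobj(stealing,thunder) det(the,thunder)",
    "nsubj(keeps,Apple) xcomp(keeps,stealing) dobj(stealing,thunder)",
]

# Two switches cover the four weighting settings evaluated later:
# binary=True yields presence features; use_idf=False approximates plain
# term frequency (TfidfVectorizer still applies length normalization).
vectorizer = TfidfVectorizer(
    tokenizer=str.split,   # triples are already whitespace-delimited
    token_pattern=None,    # suppress the unused default pattern
    lowercase=False,       # keep word forms inside the triples intact
    binary=False,
    use_idf=True,
)
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
```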
4. EVALUATION
To evaluate the effectiveness of dependency features for sentiment classification, we
conducted experiments on a large social media data set. The data set contains all user
messages posted on Stocktwits between July 2009 and April 2014. Stocktwits is a leading
social media platform on which investors share opinions about the financial market.
Similar to tweets, Stocktwits messages are limited to 140 characters. Instead of officially
supporting hashtags, Stocktwits uses cashtags (e.g., $AAPL) to track stocks and other
financial assets mentioned in a message. On Stocktwits, users can mark the sentiment of
their postings as bullish or bearish. Our data set contains 87,776 bearish messages and
265,452 bullish messages. We did not use any unmarked messages since the sentiment
in these messages is uncertain. The experimental task is to classify each message
as bullish or bearish. For benchmarking purposes, we sampled an equal number of
bullish and bearish messages for cross validation. Our final data set contains 170,874
messages, half of which are bullish and the other half bearish.
We used Stanford typed dependencies in this study [De Marneffe and Manning 2008].
This representation defines approximately 50 grammatical relations. To generate
dependency features, we first used the CMU ARK Tweet POS Tagger [Owoputi et al.
2013] to tag the Stocktwits messages. Next, we used the Stanford Parser [De Marneffe
et al. 2006], a Java library published by the Stanford NLP group, to parse the tagged
messages into dependencies. The dependency parser was trained using the Wall Street
Journal (WSJ) section of the Penn Treebank, which consists of about one million words of
manually annotated sentences [Marcus et al. 1993]. Each sentence in the treebank
is represented as a constituency tree. The parser first decomposes the constituency
trees into rules representing a context-free grammar [Charniak 1996]. For example, a
sentence (S) consisting of a noun phrase (NP) and a verb phrase (VP) is represented
as the rule S → NP VP; a noun phrase consisting of a determiner (DT) and a noun (NN)
is represented as the rule NP → DT NN. Each rule is assigned a probability based on
how often it occurs in the training corpus. Given a POS-tagged sentence, its parse
tree is constructed by maximizing the joint likelihood of the rules [Johnson 1998].
The search for the maximum-likelihood parse is accomplished using the CKY algorithm
[Martin and Jurafsky 2000]. Next, the dependency relations are extracted using
predefined patterns [De Marneffe et al. 2006]. A previous study has shown that the parser
can achieve about 90% accuracy in parsing English texts [Chen and Manning 2014].
The time complexity of the parser is O(n^3) [Klein and Manning 2003]. Parsing all of
the messages took approximately six hours using a single core of an Intel i7-6700HQ
processor. We then constructed the features and performed the sentiment classification
in Python. Each run (training and testing) took less than one minute.
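For readers who wish to reproduce a similar pipeline entirely in Python, the following rough sketch uses stanza, a more recent library from the Stanford NLP group, as a stand-in for the ARK tagger and the Java parser used here (note that stanza outputs Universal Dependencies rather than Stanford typed dependencies):

```python
# Rough sketch of a tagging-plus-parsing pipeline using stanza as a
# stand-in for the ARK tagger and the Java Stanford Parser.
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("Apple stealing user information via Face Time.")
for sent in doc.sentences:
    for word in sent.words:
        # word.head is a 1-based index into sent.words; 0 denotes the root
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.deprel}({head}, {word.text})")
```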
For each model, we performed 10-fold cross validation 10 times. We used accuracy to
evaluate the performance of the different models. We conducted five groups of sentiment
classification experiments with different feature settings. Group 1 (1G) uses only word unigrams; Group 2
(1G+2G) uses word unigrams and bigrams; Group 3 (1G+2G+3G) uses word unigrams,
bigrams, and trigrams; Group 4 (POS) uses POS-tagged words; Group 5 (1G+DEP) uses
word unigrams and dependencies. Each dependency is treated as a single term.
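A minimal sketch of this evaluation harness is shown below, with a random stand-in feature matrix in place of the real Stocktwits features:

```python
# Sketch of the 10x10-fold cross-validation harness; X and y are random
# stand-ins for the real feature matrix and bullish/bearish labels.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))  # toy term-count matrix
y = rng.integers(0, 2, size=200)        # toy labels (0 = bearish, 1 = bullish)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
for name, clf in [("SVM", LinearSVC()), ("NB", MultinomialNB())]:
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```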
Prior research proposed the use of back-off dependency features in sentence-level
subjectivity detection, i.e., classifying sentences as opinion or non-opinion [Joshi
and Penstein-Rosé 2009]. A back-off dependency is a dependency triplet with the head
or the modifier replaced by its POS tag. They found that the combination of unigrams
and back-off dependency features significantly outperformed the aforementioned
baselines and even the combination of unigrams and lexicalized dependencies (the
counterpart of our Group 5). It would be interesting to examine the usefulness of
back-off dependencies. Thus, we created two additional baseline groups. Group 6
(1G+M-BO) combines unigrams and back-off dependencies with modifier words replaced
by their POS tags. Group 7 (1G+H-BO) consists of unigrams and back-off dependencies
with head words replaced by their POS tags.
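To make the two back-off variants concrete, the following hypothetical helpers illustrate the transformation, following the convention of writing the head word first and the modifier second:

```python
# Hypothetical helpers illustrating the two back-off variants: replacing
# the head word (H-BO) or the modifier word (M-BO) with its POS tag.
def h_back_off(rel, head_pos, modifier):
    return f"{rel}({head_pos}, {modifier})"   # e.g. amod(NN, great)

def m_back_off(rel, head, modifier_pos):
    return f"{rel}({head}, {modifier_pos})"   # e.g. amod(camera, JJ)

# For the dependency amod(camera, great) with POS tags camera/NN, great/JJ:
print(h_back_off("amod", "NN", "great"))   # -> amod(NN, great)
print(m_back_off("amod", "camera", "JJ"))  # -> amod(camera, JJ)
```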
To ensure the robustness of the results, we used two popular text classification
methods, linear Support Vector Machine (SVM) and Naïve Bayes (NB). We also quantified
the features using both frequency (continuous) and presence (binary) values, with
and without inverse document frequency. Because text classification often needs to
deal with high-dimensional data, feature selection may help improve classification
accuracy. Combining dependencies and unigrams would significantly increase the feature
space. Thus, we ran each experiment again with an additional feature selection
procedure to examine the sensitivity of the results to feature selection. In this study,
we chose Chi-squared-based feature selection and information gain-based feature
selection. Both have been shown to be effective in text classification [Chou et al. 2010]. In
Chi-squared-based feature selection, the Chi-squared value between each feature and
the sentiment class was calculated and ranked. In information gain-based selection, the
information gain (or reduction in uncertainty) between each feature and the sentiment
class was calculated and ranked. For each run, the top 10% features were retained to
train and test the model. Although the choice of the threshold, 10%, is arbitrary, it was
used for all groups and settings. Thus, this choice will not cause consistent bias for the
purpose of benchmarking.
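The following sketch illustrates this selection step with scikit-learn, using mutual_info_classif as an information-gain-style criterion and a random stand-in feature matrix:

```python
# Sketch of the top-10% feature selection step; chi2 requires non-negative
# features, and mutual_info_classif stands in for information gain.
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))  # toy term-count matrix
y = rng.integers(0, 2, size=200)        # toy sentiment labels

for name, score_fn in [("Chi-squared", chi2),
                       ("information gain", mutual_info_classif)]:
    selector = SelectPercentile(score_func=score_fn, percentile=10)
    X_top = selector.fit_transform(X, y)
    print(name, X_top.shape)  # 50 features -> top 5 retained
```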
Table IV shows the classification accuracy without feature selection. The highest
accuracy in each setting is displayed in bold. SVM results were generally better than
NB results, likely because the regularization used in SVM allows it to work well with
high-dimensional data. Using continuous features did not differ significantly from
using binary features. This can be attributed to the fact that many terms occur only
once in a short microblog message. Using only unigram features gave the lowest
accuracy across all settings. Adding bigrams significantly improved the accuracy, by
about 2%, across the different settings. Further adding trigrams led to only slight
improvement (<0.5%). This supports our conjecture that capturing syntactical
information helps in classifying the sentiment of short texts. The performance of POS
features was better than that of unigrams but worse than that of the combination of
unigrams and bigrams. The combination of unigrams and back-off dependencies did not
consistently outperform the combination of unigrams, bigrams, and trigrams.
The combination of unigrams and dependencies gave the best performance. All SVM
results using this feature set achieved over 1% improvement compared to the second
best (1G+2G+3G). The difference between the accuracy of all runs in the two
groups is statistically significant (p < 0.001). This difference translates to more than
1,709 correctly predicted messages in our data set. Given the large message volume on
microblogging websites, this improvement is also practically significant.
Table V shows the results of Groups 1–7 when using only the top 10% of features
based on Chi-squared values. The highest accuracy in each setting is displayed in bold.
The simple Chi-squared feature selection procedure did not improve the classification
accuracy. Nonetheless, our purpose is to show that dependency features improve
classification accuracy over the other baselines even after feature selection.
Table VII. Classification Accuracy (%) of 2G, H-BO, M-BO, and DEP

Classifier  Measure     IDF  2G     H-BO   M-BO   DEP
NB          Continuous  No   75.50  74.25  71.18  76.06
NB          Continuous  Yes  75.43  74.40  71.45  75.97
NB          Binary      No   75.56  74.24  71.26  76.10
NB          Binary      Yes  75.47  74.39  71.50  76.00
SVM         Continuous  No   73.55  73.50  68.40  74.50
SVM         Continuous  Yes  74.92  75.00  70.51  76.08
SVM         Binary      No   73.58  73.48  68.56  74.52
SVM         Binary      Yes  74.92  74.99  70.53  76.13
Consistent with the results in Table IV, using only unigram features yielded the lowest
accuracy. Adding bigrams led to significant improvement, and further adding trigrams
led to only slight improvement. The performance of POS fell between that of 1G and
that of 1G+2G, except in two settings,
where it outperformed all other groups. The combination of unigrams and back-off
dependencies did not consistently outperform the combination of unigrams, bigrams,
and trigrams. Similar patterns have been observed in the results using information
gain-based feature selection (Table VI).
After feature selection, using either Chi-squared values or information gain, the
combination of unigrams and dependencies outperformed all other groups in six of
eight settings. The improvement in the best-performing group over the second best is
over 2%; the improvement is statistically significant (p < 0.001). This shows that the
usefulness of dependency features is robust to feature selection.
Dependency is most similar to bigram since both of them can capture the syntactical
relation between two words. As we mentioned earlier, dependency can further identify
remote word relations. To compare the usefulness of dependency directly against that of
bigram in sentiment classification, we conducted two additional groups of experiments,
using bigram only (2G) and using dependency only (DEP), respectively. We also included
the back-off dependency features (H-BO and M-BO) as baselines. Table VII shows the
classification accuracy without feature selection. The highest accuracy in each setting
is displayed in bold. DEP improved accuracy by about 1% in most settings, compared
to 2G, H-BO, and M-BO (p < 0.001). Table VIII shows the classification accuracy using
the top 10% features based on Chi-squared values. DEP outperformed the baselines
in all settings (p < 0.001). Table IX shows the results using information gain-based
feature selection. DEP outperformed all baselines except in two settings.
5. RELATED STUDIES
There have been a number of studies that explored the usefulness of dependency
structure in text classification (summarized in Table X). Wilson et al. [2004] proposed
syntactical features to classify the strength of opinions as neutral, low, medium, or high.
Table VIII. Classification Accuracy (%) of 2G, H-BO, M-BO, and DEP
(Using top 10% features based on Chi-squared values)

Classifier  Measure     IDF  2G     H-BO   M-BO   DEP
NB          Continuous  No   73.47  73.60  70.57  75.10
NB          Continuous  Yes  73.49  73.75  70.73  75.07
NB          Binary      No   73.52  73.57  70.61  75.11
NB          Binary      Yes  73.52  73.72  70.76  75.11
SVM         Continuous  No   73.08  73.86  69.52  73.96
SVM         Continuous  Yes  74.26  74.95  70.95  75.93
SVM         Binary      No   73.16  73.88  69.50  74.09
SVM         Binary      Yes  74.30  74.94  70.95  76.00
Table IX. Classification Accuracy (%) of 2G, H-BO, M-BO, and DEP
(Using top 10% features based on information gain)

Classifier  Measure     IDF  2G     H-BO   M-BO   DEP
NB          Continuous  No   71.14  72.40  59.51  73.06
NB          Continuous  Yes  71.16  72.34  69.60  73.11
NB          Binary      No   71.22  72.37  69.54  73.08
NB          Binary      Yes  71.22  72.31  69.59  73.13
SVM         Continuous  No   69.54  72.08  67.61  70.68
SVM         Continuous  Yes  70.91  73.21  69.12  73.24
SVM         Binary      No   69.60  72.13  67.73  70.72
SVM         Binary      Yes  70.97  73.22  69.17  73.29
The feature set they proposed includes POS-tagged words, dependencies, and word location
in a dependency tree. The experiments on a manually annotated news corpus showed
about 5% improvement over the baselines. However, it is not clear whether the improvement
can be attributed to the dependency features. Ng et al. [2006] proposed using three
types of dependency relations as features in classifying customer reviews as positive
or negative: adjective-noun, subject-verb, and verb-object. However, these features did
not help with classification accuracy. In Wilson et al.
[2009], dependency information related to a word was used to classify the sentiment of
the word. However, the features cannot be applied to document-level classification.
Joshi and Penstein-Rosé [2009] used dependency features to detect whether a sentence
contains an opinion. They proposed back-off dependency features, which replace one
or both words in a dependency with the corresponding POS tags. One example is amod(NN,
great), which indicates a dependency in which the word “great” modifies a noun.
This example feature works well for identifying the similar sentiment in the following
two sentences:
The camera is great.
The MP3 player is great.
However, such a back-off dependency fails to distinguish between the following
two phrases:
Cure cancer.
Have cancer.
Both phrases are represented as dobj(VB, cancer), i.e., a dependency in which the word
“cancer” is the direct object of a verb, yet they have very different sentiment
polarities. Joshi and Penstein-Rosé [2009] also proposed full back-off features
(e.g., amod(NN, ADJ)) and n-gram back-off features. They found that the back-off
features outperformed the unigram baseline. However, they did not find additional
usefulness of the back-off features beyond the simple dependency features (with words).
Nor did their study attempt to classify positive against negative sentiment.
Pak and Paroubek [2010] were among the first to explore the usefulness of dependency
in sentiment classification. The features they used include two-node and three-node
dependency subgraphs. These subgraphs were selected using manually created
rules. Moreover, they replaced words other than adjectives and verbs with a wildcard.
The idea is similar to the back-off dependencies in Joshi and Penstein-Rosé [2009].
They tested the features on a data set of movie reviews. The experimental results did not
show improvement over the bag-of-words baseline reported in Matsumoto et al. [2005].
Nakagawa et al. [2010] modeled each dependency in a sentence with a hidden variable.
REFERENCES
A. Abbasi, S. France, Z. Zhang, and H. Chen. 2011. Selecting attributes for sentiment classification using
feature relation networks. IEEE Trans. Knowl. Data Eng. 23, 3. 447–462.
A. Bifet and E. Frank. 2010. Sentiment knowledge discovery in Twitter streaming data. In Proceedings of
the 13th International Conference on Discovery Science. Springer. 1–15.
J. Blitzer, R. McDonald, and F. Pereira. 2006. Domain adaptation with structural correspondence learning. In
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association
for Computational Linguistics. 120–128.
E. Charniak. 1996. Tree-bank grammars. In Proceedings of the National Conference on Artificial Intelligence.
1031–1036.
D. Chen and C. D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Pro-
ceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14).
740–750.
H. Chen, R. H. Chiang, and V. C. Storey. 2012. Business intelligence and analytics: From big data to big
impact, MIS Quart. 36, 4. 1165–1188.
N. Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press.
N. Chomsky. 2002. Syntactic Structures. Walter de Gruyter.
C.-H. Chou, A. P. Sinha, and H. Zhao. 2010. A hybrid attribute selection approach for text classification. J.
Assoc. Informat. Syst. 11, 9. 491–518.
M. A. Covington. 2001. A fundamental algorithm for dependency parsing. In Proceedings of the 39th Annual
ACM Southeast Conference. Citeseer. 95–102.
M.-C. De Marneffe, B. MacCartney, and C. D. Manning. 2006. Generating typed dependency parses
from phrase structure parses. In Proceedings of the Language Resources and Evaluation Conference
(LREC’06). 449–454.
M.-C. De Marneffe and C. D. Manning. 2008. The Stanford typed dependencies representation. In Coling 2008:
Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for
Computational Linguistics. 1–8.
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 168–177.
M. Johnson. 1998. PCFG models of linguistic tree representations. Computat. Linguist. 24, 4. 613–632.
M. Joshi and C. Penstein-Rosé. 2009. Generalizing dependency features for opinion mining. In Proceedings of
the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics. 313–316.
D. Klein and C. D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting
on Association for Computational Linguistics, Vol. 1. Association for Computational Linguistics. 423–
430.
B. Liu. 2012. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 5, 1. 1–167.
M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini. 1993. Building a large annotated corpus of English:
The Penn Treebank. Computat. Linguist. 19, 2. 313–330.
J. H. Martin and D. Jurafsky. 2000. Speech and Language Processing. International Edition.
S. Matsumoto, H. Takamura, and M. Okumura. 2005. Sentiment classification using word sub-sequences
and dependency sub-trees. In Advances in Knowledge Discovery and Data Mining. Springer. 301–311.
T. Nakagawa, K. Inui, and S. Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs
with hidden variables. In Human Language Technologies: The 2010 Annual Conference of the North
American Chapter of the Association for Computational Linguistics. Association for Computational Lin-
guistics. 786–794.
V. Ng, S. Dasgupta, and S. Arifin. 2006. Examining the role of linguistic knowledge sources in the automatic
identification and classification of reviews. In Proceedings of the COLING/ACL on Main Conference
Poster Sessions. Association for Computational Linguistics. 611–618.
T. L. Ngo-Ye and A. P. Sinha. 2012. Analyzing online review helpfulness using a regressional relieff-enhanced
text mining method. ACM Trans. Manag. Inform. Syst. 3, 2. 1–20.
O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith. 2013. Improved part-of-
speech tagging for online conversational text with word clusters. In Proceedings of the 2013 Conference
of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies (NAACL-HLT’13). 380–390.
A. Pak and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings
of the Language Resources and Evaluation Conference (LREC’10).
B. Pang and L. Lee. 2008. Opinion Mining and Sentiment Analysis. Now Publishers.
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learn-
ing techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language
Processing, Vol. 10. Association for Computational Linguistics. 79–86.
M.-F. Tsai, C.-J. Wang, and P.-C. Chien. 2016. Discovering finance keywords via continuous-space language
models. ACM Trans. Manag. Inform. Syst. 7, 3. 1–17.
D. Vilares, M. A. Alonso, and C. Gómez-Rodríguez. 2015. On the usefulness of lexical and syntactic processing
in polarity classification of Twitter messages. J. Assoc. Inform. Sci. Technol. 66, 9. 1799–1816.
T. Wilson, J. Wiebe, and P. Hoffmann. 2009. Recognizing contextual polarity: An exploration of features for
phrase-level sentiment analysis. Comput. Linguist. 35, 3. 399–433.
T. Wilson, J. Wiebe, and R. Hwa. 2004. Just how mad are you? Finding strong and weak opinion clauses. In
Proceedings of the 19th National Conference on Artificial Intelligence. 761–767.
Y. Yu, W. Duan, and Q. Cao. 2013. The impact of social and conventional media on firm equity value: A
sentiment analysis approach. Decis. Supp. Syst. 55, 4. 919–926.
D. Zimbra, H. Chen, and R. F. Lusch. 2015. Stakeholder analyses of firm-related web forums: Applications
in stock return prediction. ACM Trans. Manag. Inform. Syst. 6, 1.