Wang 2016
978-1-5090-4171-8/16/$31.00 ©2016 IEEE
FTC 2016 - Future Technologies Conference 2016
6-7 December 2016 | San Francisco, United States

Abstract—Social media is arguably the richest source of human-generated text input. Opinions, feedback and critiques provided by internet users reflect their attitudes and sentiments towards certain topics, products, or services. The sheer volume of such information makes it effectively impossible for any group of people to read through. Thus, social media sentiment analysis has become an important area of work for making sense of social media conversations. However, most existing sentiment analysis techniques focus only on the aggregate level, classifying sentiments broadly into positive, neutral or negative, and lack the capability to perform fine-grained sentiment analysis. This paper describes a social media analytics engine that employs a social adaptive fuzzy similarity-based classification method to automatically classify text messages into sentiment categories (positive, negative, neutral and mixed), with the ability to identify their prevailing emotion categories (e.g., satisfaction, happiness, excitement, anger, sadness, and anxiety). The engine is also embedded within an end-to-end social media analysis system that can collect, filter, classify, and analyze social media text data and display a descriptive and predictive analytics dashboard for a given concept. The proposed method has been developed and is ready to be licensed to users.

Keywords—sentiment classification; sentiment analysis; opinion mining; social media; social adaptive fuzzy similarity; emotion

The work is supported by A*STAR Joint Council Office Development Programme “Social Technologies+ Programme”.

I. INTRODUCTION

Social media, such as Twitter, Facebook and Chinese Weibo, is overwhelmingly the go-to platform for internet users to share their comments on, or experiences with, certain products, services or policies. It is a gold mine for those who appreciate the value of understanding public sentiment.

There are various compelling use cases of social media sentiment analysis: consumers referring to online reviews to help them make better purchase decisions; businesses eager to understand market preferences in order to improve their offerings; politicians aspiring to gauge public response to their policies or speeches. Not surprisingly, sentiment analysis is one of the hottest areas of research in social analytics.

Sentiment analysis aims to understand the sentiment polarity of data [1]. Many social media analysis tools are now available to perform such analysis, such as Stanford NLP's natural language processing tool [2], Facebook Insights on Facebook and TweetStats on Twitter. However, these existing analysis technologies focus on aggregate-level sentiment, such that the sentiment polarity is typically one of two categories (“positive” and “negative”) or three categories (with the addition of “neutral”) [2]. If finer-grained sentiment analysis can be achieved, it will yield more specific and more actionable results, with detailed negative emotion subcategories such as anger, sadness, and anxiety, or positive emotion subcategories such as happiness and excitement [3].

In this paper, we describe a new method for fine-grained classification of social media sentiment. The actual sentiments as well as detailed emotions were identified in accordance with industry needs. The method is based on a series of patent applications [4] [5] [6].

The rest of this paper is organized as follows. Section II discusses existing sensing technologies. Section III presents the proposed methodology for fine-grained sentiment analysis. Section IV examines the performance of the proposed method using real-world social media data. Lastly, Section V concludes this study.

II. EXISTING TECHNOLOGIES

Sentiment analysis methods can be broadly categorized into two types: learning-based and lexical-based [7] [8]. Learning-based methods use known properties derived from labelled training data to make predictions about new, unlabelled data. For text data, they derive the relationship between the features of a text segment and its sentiment label. Some examples of learning-based methods are the Naïve Bayes (NB) classifier [9] [10], the Maximum Entropy (MaxEnt) classifier [11], the support vector machine (SVM) [12] [13] and the Extreme Learning Machine (ELM) [14] [15].

To be effective, models using such learning-based methods typically require a sufficiently large labelled training dataset [15] [16] to achieve acceptable classification accuracy [16] [17]. However, in most social media contexts, it is difficult to determine what size of labelled dataset qualifies as sufficient, because the diversity of the social discussion is not known a priori [3] [12]. In addition, the labelling task can be costly or even prohibitive [3] [7] [12], not to mention wasteful, because the training results cannot readily be applied to other datasets.

On the other hand, lexical-based methods typically search a text for sentiment or emotion indicators specified in the lexicons used [7] [18] [19] [20]. The effects of the
indicators are then aggregated to derive the dominant polarity of the text. Compared to learning-based methods, lexical-based methods are easier to apply across different datasets, and costly labelling is not required (no training is needed).

However, current lexicon-based methods have shortcomings. It is hard to create a single lexicon-based dictionary that can be used and tested across different applications. Hence, existing methods use cleaned samples that are manually created. However, such data differs from real-world social media data, and only real-world social media data can produce true insight for organizations. The other shortcoming, as mentioned in Section I, is the lack of a fine-grained sensing capability [3] that provides detailed emotion identification. These shortcomings of lexicon-based methods are also limitations of current learning-based methods.

Emotion research has a long history, and research activity in this area has increased significantly over the past two decades. One of the earlier efforts in emotion research was that of Shaver et al. [21], who grouped emotions into prototypes on the assumption that different parts of emotion knowledge tend to make up an organized whole [21]. In their experiment, they first selected a group of words and had them rated according to whether each word was an emotion. Using the typical prototyping approach, they developed an abstract-to-concrete emotion hierarchy.

Psychologists Ortony and Turner argued against the view that basic emotions are psychologically primitive [22]. They proposed that all emotions are discrete and independent and are related to each other through a hierarchical structure.

Ekman's emotion model is based on the argument that there are distinctive facial expressions [23]. In this model, emotions are treated as discrete, measurable, and physiologically distinct. Each emotion is a family of related states, which is consistent with Shaver's model [21].

Plutchik extended Ekman's biologically driven perspective and developed the “wheel of emotions” [24]. He constructed a wheel-like diagram to visualize the basic emotions and grouped the primary emotions into opposing pairs: joy versus sadness, anger versus fear, trust versus disgust, and surprise versus anticipation [24] [25].

Neviarouskaya et al. also took the typical lexicon approach, leveraging and enhancing the above emotion models [26]. They had each emotion word annotated by expert annotators and compiled the words into an emotion dictionary [26].

The above efforts have contributed greatly to emotion research and identification; however, there have been few research efforts that use them to integrate emotion analysis into sentiment analysis and enhance the capability of sensing technologies.

In this paper, we leverage the above emotion research to develop fine-grained sentiment analysis technologies and implement a fine-grained emotion sensing method that addresses the limitations of existing technologies. In addition, we applied the method to a real-world case that addresses the key question of whether public sentiments are useful in supporting effective social sensing and policy management. The resulting insights enable decision-makers to formulate strategies and fine-tune the quality of their products, services, and policies.

III. PROPOSED TECHNIQUE

Detecting the attitude or emotions of a user with respect to certain topics or domains is the aim of sentiment analysis [2] [4] [5] [6]. The design of the proposed method leverages various techniques [3], including the linguistic inquiry and word count (LIWC) method [27], the affective norms for English words (ANEW) approach for assigning normative emotional ratings to text [28], fuzzy logic [29] and emotion theories [21] [22] [23] [24].

The proposed method pays special attention to the challenges of real-world datasets. It uses an innovative social adaptive fuzzy rule inference technique with linguistic processors designed to minimize semantic ambiguity. This is combined with multi-source lexicon integration and development to derive the dominant valence (positive, negative, neutral, mixed) as well as prominent emotions (e.g., anger, sadness, anxiety, satisfaction, happiness, excitement).

A. Design Features and Components of the Proposed Method

The backbone of the proposed method is a social adaptive fuzzy inference algorithm that mimics human interpretation of the attitudes and emotions expressed in online social network contexts. There is also a built-in advanced linguistic processing unit that contains the following sub-modules: sentence decomposers, negation handlers, amplifier and diminisher handlers, etc. [4] [5] [6]. In addition, the proposed method is empowered by built-in linguistic lexicons from a variety of sources, including a dictionary of emotion words and phrases from Standard English, Internet/social media slang and local languages, as well as emoticons. Because it uses linguistics-enhanced fuzzy similarity rules for sentiment classification and does not rely on any training data, it is able to achieve the same level of measurement accuracy as simple lexicon-based and learning-based methods with less human input.

Domain knowledge was obtained using the domain lexicon knowledge extraction algorithm [30] to form domain lexicon dictionaries. In addition, to enhance domain adaptability, an expert user can further configure the domain knowledge by specifying a seed lexicon. For example, the expert user can add the phrase “salary lower” to the lexicon (in the company review domain) and remove the word “smart” from it (as in “smart watch” in the smart phone domain). This can achieve higher measurement accuracy than simple lexicon-based and learning-based methods.

B. Social Media Analysis System

To make the proposed method useful for real-world datasets, we implement it within an end-to-end social media analysis system. The system consists of six modules: social data collectors, noise filters, sentiment & emotion
analysis engine module, predictive analyzer, results viewer and database. Fig. 1 shows the system’s architecture.

The “Data Collector” crawls raw data from various Internet sites, including forums, Twitter, and blogs. Depending on whether a data source provides a programmatic interface for reading data (such as Twitter’s keyword-based REST API and its Streaming API, which reads data continuously), the module comprises a collection of code that collects the data and passes it to the “Noise Filter” module before further processing.

The “Noise Filter/Smart Filter” removes noisy, “meaningless” data, such as advertisements, content that does not include any comment information, and other content-specific noise. The “Noise Filter” pre-processes raw data to determine whether they are relevant or irrelevant. The relevant data are passed to an optional sub-module, the “User-defined filter”, which allows the user to define rules to trim the data further. This filtering ensures that the data passed to the “sentiment analysis engine” module are relevant to the intended concept for further analysis.

The “Predictive Analyzer” performs predictive analysis of important outcomes, such as sales volumes and reputation crises, so that its results can support important business activities such as forecasting, monitoring and action strategizing. It includes two key components: 1) the predictor/feature set and 2) the predictive algorithm pool. The output of sentiment and emotion analysis (e.g., positive, negative, neutral and mixed sentiments, and anger, sadness and anxiety emotions) serves as a new predictor/feature on top of existing predictors/features.

Consumer preference analysis, anomaly identification and time-series analysis for sales forecasting will be realized by leveraging the output of the sentiment and emotion analysis engine combined with the other results obtained through the predictive algorithm pool.

IV. A REAL-WORLD CASE STUDY THROUGH THE SOCIAL MEDIA ANALYSIS SYSTEM

While understanding the valence of sentiments helps to assess overall public reactions, understanding emotion further improves the assessment of the situation, particularly for negative emotions that require attention from decision-makers and crisis managers.

As shown in Fig. 2, the final output for any text consists of its sentiment category and fine-grained emotions [2] [23] [24] [25]. Fig. 2 (a) shows the sentiments and Fig. 2 (b) shows the fine-grained emotions the system outputs.

Fig. 2. Positive or negative sentiment can be further broken down into fine-grained emotions. (a) Sentiments; (b) Breakdown of negative sentiment into fine-grained emotions

For real-time testing of the proposed method, the interface for real-time data analysis is illustrated in Fig. 3. Tweets are used as a test case to illustrate real-time data collection, analysis and visualization. Data containing geographic information are displayed in the form of a map.

Fig. 3. Part of the interface of the social media analytics system

V. CONCLUSION

This research describes a social media analytics method that is able to perform fine-grained sentiment and emotion analysis. It offers new ideas for designing a robust method that leverages adaptive learning capabilities, fuzzy logic, and social science concepts to handle fine-grained sensing classification (sentiments as well as emotions) in textual datasets. There are ample opportunities to apply the proposed method in other sectors, such as healthcare, corporate, leisure, and the public and private sectors, to help organizations understand their customers better, identify relevant risks, and improve their products and services.
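The lexicon-side processing described in Section III.A (negation, amplifier and diminisher handlers, an expert-configurable seed lexicon, and a four-way valence of positive, negative, neutral or mixed with an emotion tally) can be illustrated with a short Python sketch. It is not the authors' social adaptive fuzzy inference algorithm, whose rules the paper does not specify. The toy lexicons `SENTIMENT` and `EMOTION`, the modifier weights, the two-word phrase matching, the helper names `customize`, `tokenize` and `classify`, and the crisp `threshold`/`mixed_ratio` decision rule standing in for fuzzy inference are all hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical toy lexicons: the paper's multi-source lexicons are not published,
# so these words and weights are illustrative assumptions only.
SENTIMENT = {
    "happy": 1.0, "great": 1.0, "love": 1.0, "satisfied": 0.8, "smart": 0.6,
    "angry": -1.0, "sad": -1.0, "worried": -0.8, "terrible": -1.0,
}
EMOTION = {
    "happy": "happiness", "love": "happiness", "satisfied": "satisfaction",
    "angry": "anger", "sad": "sadness", "worried": "anxiety",
}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"very": 1.5, "extremely": 2.0}
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.7}


def customize(lexicon, add=None, remove=()):
    """Return a copy of a lexicon with expert seed entries added or removed."""
    updated = dict(lexicon)
    updated.update(add or {})
    for term in remove:
        updated.pop(term, None)
    return updated


def tokenize(text, lexicon):
    """Lower-case word tokens, merging adjacent word pairs found in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in lexicon:
            tokens.append(pair)  # two-word phrase such as "salary lower"
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def classify(text, lexicon=SENTIMENT, threshold=0.5, mixed_ratio=0.5):
    """Return (valence, emotion counts); valence is positive/negative/neutral/mixed."""
    pos = neg = 0.0
    emotions = Counter()
    negate, weight = False, 1.0
    for tok in tokenize(text, lexicon):
        if tok in NEGATORS:
            negate = True
        elif tok in AMPLIFIERS:
            weight *= AMPLIFIERS[tok]
        elif tok in DIMINISHERS:
            weight *= DIMINISHERS[tok]
        elif tok in lexicon:
            score = lexicon[tok] * weight * (-1.0 if negate else 1.0)
            if score > 0:
                pos += score
            else:
                neg -= score  # store negative evidence as a positive magnitude
            if not negate and tok in EMOTION:
                emotions[EMOTION[tok]] += 1  # a negated emotion word is not counted
            negate, weight = False, 1.0  # modifiers only affect the next lexicon term

    # Crisp decision rule standing in for the (unspecified) fuzzy inference step.
    strongest, weakest = max(pos, neg), min(pos, neg)
    if strongest < threshold:
        valence = "neutral"
    elif weakest >= threshold and weakest >= mixed_ratio * strongest:
        valence = "mixed"
    else:
        valence = "positive" if pos > neg else "negative"
    return valence, dict(emotions)


if __name__ == "__main__":
    print(classify("I love the food but the staff were terrible"))  # ('mixed', {'happiness': 1})
    company = customize(SENTIMENT, add={"salary lower": -1.0}, remove=["smart"])
    print(classify("salary lower than expected", company))  # ('negative', {})
```

Under these assumptions, pruning “smart” from the seed lexicon means “my new smart watch” is classified as neutral rather than positive, and adding “salary lower” makes that phrase count as negative evidence, mirroring the seed-lexicon example in Section III.A. A closer approximation of a fuzzy approach would replace the crisp thresholds with membership functions over the positive and negative totals.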
ACKNOWLEDGMENT

The “SentiMo-Advanced Social Media Analytics” team provided great support in discussions on issues related to emotion and on the algorithm development and implementation. The authors would also like to thank Dr. Kenneth Kwok, Dr. Paul Yang and their team for discussions on issues related to emotion and knowledge building.

The proposed method and system have been developed, and the API version is ready to be licensed: https://www.etpl.sg/innovation-offerings/technologies-for-license/tech-offers/2087.

REFERENCES

[1] A. Trilla and F. Alías, “Sentence-based sentiment analysis for expressive text-to-speech,” IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 223–233, 2013.
[2] R. Socher, A. Perelygin, and J. Wu, “Recursive deep models for semantic compositionality over a sentiment treebank,” Proc. Conf. Empir. Methods Nat. Lang. Process., pp. 1631–1642, 2013.
[3] Z. Wang, J. C. Tong, and D. Chan, “Issues of social data analytics with a new method for sentiment analysis of social media data,” in 2014 IEEE 6th International Conference on Cloud Computing Technology and Science, 2014, pp. 899–904.
[4] Z. Wang, R. S. M. Goh, and Y. Yang, “A method and system for sentiment classification and emotion classification,” Patent Cooperation Treaty (PCT) Application, PCT/SG2015/050469, 2014.
[5] Z. Wang, R. S. M. Goh, and Y. Yang, “SentiMo-A method and system for fine-grained classification of social media sentiment and emotion patterns,” Singapore Patent Application 10201407766R, 2014.
[6] Z. Wang and J. C. Tong, “ChiEFS-A method and system for Chinese hybrid multilingual emotion fine-grained sensing of text data,” Singapore Patent Application No. 10201601413Q, 2015.
[7] P. Gonçalves and M. Araújo, “Comparing and combining sentiment analysis methods,” Proc. First ACM Conf. Online Soc. Netw., ACM, pp. 27–38, 2013.
[8] B. Yuan, Y. Liu, and H. Li, “Sentiment classification in Chinese microblogs: Lexicon-based and learning-based approaches,” Int. Proc. Econ. Dev. Res., vol. 68, pp. 1–6, 2013.
[9] J. Ortigosa-Hernández, J. D. Rodríguez, L. Alzate, M. Lucania, I. Inza, and J. A. Lozano, “Approaching sentiment analysis by using semi-supervised learning of multi-dimensional classifiers,” Neurocomputing, vol. 92, pp. 98–115, Sep. 2012.
[10] X. Glorot, A. Bordes, and Y. Bengio, “Domain adaptation for large-scale sentiment classification: A deep learning approach,” Proc. 28th Int. Conf. Mach. Learn., pp. 513–520, 2011.
[11] H. Ji, H. Deng, and J. Han, “Uncertainty reduction for knowledge discovery and information extraction on the World Wide Web,” Proc. IEEE, vol. 100, no. 9, pp. 2658–2674, Sep. 2012.
[12] B. Gokaraju, S. S. Durbha, R. L. King, and N. H. Younan, “A machine learning based spatio-temporal data mining approach for detection of harmful algal blooms in the Gulf of Mexico,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 4, no. 3, pp. 710–720, 2011.
[13] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis,” Assoc. Comput. Linguist., vol. 35, no. 3, 2009.
[14] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
[15] Z. Wang and Y. Parth, “Extreme Learning Machine for Multi-class Sentiment Classification of Tweets,” Proc. ELM-2015, Springer Int. Publ. 2016, vol. 1, pp. 1–11, 2016.
[16] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” Proc. ACL-02 Conf. Empir. Methods Nat. Lang. Process., Assoc. Comput. Linguist., vol. 10, pp. 79–86, 2002.
[17] E. Haddi, X. Liu, and Y. Shi, “The role of text pre-processing in sentiment analysis,” Procedia Comput. Sci., vol. 17, pp. 26–32, Jan. 2013.
[18] R. Feldman, “Techniques and applications for sentiment analysis,” Commun. ACM, vol. 56, no. 4, p. 82, Apr. 2013.
[19] I. Maks and P. Vossen, “A lexicon model for deep sentiment analysis and opinion mining applications,” Decis. Support Syst., vol. 53, no. 4, pp. 680–688, Nov. 2012.
[20] Y. Rao, J. Lei, L. Wenyin, Q. Li, and M. Chen, “Building emotional dictionary for sentiment analysis of online news,” World Wide Web, vol. 17, pp. 723–742, Jun. 2014.
[21] P. Shaver, J. Schwartz, D. Kirson, and C. O’Connor, “Emotion knowledge: Further exploration of a prototype approach,” J. Pers. Soc. Psychol., vol. 52, no. 6, pp. 1061–1086, 1987.
[22] A. Ortony and T. J. Turner, “What’s basic about basic emotions?,” Psychol. Rev., vol. 97, no. 3, pp. 315–331, 1990.
[23] P. Ekman, “An argument for basic emotions,” Cognition & Emotion, vol. 6, pp. 169–200, 1992.
[24] D. Chafale and A. Pimpalkar, “Review on developing corpora for sentiment analysis using Plutchik’s wheel of emotions with fuzzy logic,” Int. J. Comput. Sci. Eng., vol. 2, no. 10, 2014.
[25] R. Plutchik, “The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice,” Am. Sci., vol. 89, no. 4, pp. 344–350, 2001.
[26] A. Neviarouskaya, H. Prendinger, and M. Ishizuka, “Textual affect sensing for sociable and expressive online communication,” Affect. Comput. Intell. Interact., pp. 218–229, 2007.
[27] Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning of words: LIWC and computerized text analysis methods,” J. Lang. Soc. Psychol., vol. 29, no. 1, pp. 24–54, Dec. 2010.
[28] A. P. Soares, M. Comesaña, A. P. Pinheiro, A. Simões, and C. S. Frade, “The adaptation of the Affective Norms for English Words (ANEW) for European Portuguese,” Behav. Res. Methods, vol. 44, pp. 256–269, 2012.
[29] J. M. Mendel and D. Wu, “Challenges for perceptual computer applications and how they were overcome,” IEEE Comput. Intell. Mag., vol. 7, no. 3, pp. 36–47, 2012.
[30] Z. Wang, J. C. Tong, P. Ruan, and F. Li, “Lexicon knowledge extraction with sentiment polarity computation,” IEEE Int. Conf. Data Min. Ser. (ICDM), SENTIRE, accepted, 2016.