Issues and Challenges in Opinion
resources and tools for automatic opinion mining systems. Opinion mining is a language- and domain-dependent task: resources built for one language do not give the same results for another. To analyze opinions, it is therefore necessary to study how opinions are expressed in the target language, and every person has his own way of expressing views. The two major tasks in opinion mining are subjectivity analysis and sentiment analysis. Subjectivity classification separates subjective text (opinions) from objective text (facts). Sentiment analysis decides the overall polarity of subjective text, i.e. whether the text is positive, negative or neutral.

Many approaches in opinion mining systems rely on the existence of a sentiment lexicon. Opinions are normally expressed using subjective (sentiment) words, at the word, sentence, paragraph or document level. A sentiment lexicon consists of words that carry subjective information, and it enables the construction of efficient rule-based subjectivity and sentiment classifiers that rely on the presence of lexicon entries in the text [3]. The lexicon may also contain polarity information, either as a sentiment category for each term (e.g. highly positive, positive, etc.) or as a polarity label (positive/negative) together with a numeric value indicating the intensity of that polarity.

In this paper we explore how opinions are expressed in the Marathi language and discuss various issues that need to be addressed to analyze opinions in Marathi sentences efficiently. The examples used in the paper are taken from the movie reviews domain; the reviews were collected from the online archives of various Marathi newspapers.

2. Related Works

Although English is the dominant language in the field of opinion mining, non-English languages are also being studied to a great extent. Many researchers opt to translate English resources into another language. The translation approach is simple and easy to implement, but translation may lead to ambiguity and loss of accuracy. It therefore becomes necessary to study how opinions are expressed in the target language itself.
The lexicon-based approach is the most widely used approach in opinion mining, and sentiment lexicons can be constructed for any language. It has been observed that adjectives and adverbs contribute most to subjectivity [4], [17]. Sentiment lexicons have been developed for many languages, such as Romanian [15], Korean [8] and Arabic [11].
For Japanese, Kanamaru, Murata, & Isahara (2007) implemented opinion extraction, opinion holder identification, topic relevance detection and polarity classification components using machine learning, with a support vector machine as the basis [16]. They used 1-gram to 10-gram strings, words and part-of-speech information as features for the classification task. In their work on language-independent opinion sentence detection, Zubaryeva & Savoy (2009) used a statistical approach to opinion detection and evaluated it on English, Chinese and Japanese corpora against three baselines, namely a Naïve Bayes classifier, a language model and an approach based on significant collocations [14].
Indian languages are less explored in the field of opinion mining. A study of sentiment analysis in Bengali was initiated by Das & Bandyopadhyay (2009) using a news and blog corpus of the Bengali language [2]. To construct their sentiment lexicon, they used a word-level translation process followed by an error reduction technique. Bakliwal, Arora, & Varma (2012) developed a sentiment lexicon for Hindi using a WordNet-based approach [1].
Although the sentiment lexicon provides an efficient way to identify opinions in text, various linguistic features other than sentiment words also contribute to subjectivity and affect the polarity of text. Polanyi & Zaenen (2006) discuss various sentence-level valence shifters and discourse-based contextual valence shifters that affect the polarity orientation of text [10]. Valence shifters are constructs that alter the polarity of text in various ways; because they strongly affect classification results, they must be studied for opinion mining in any language.

3. Opinion Indicators

To analyze opinions, we first need to understand how opinions are expressed in the target language. Opinions are mostly expressed using specific opinion words that explicitly convey sentiment information. A collection of such words, i.e. a sentiment lexicon, can be used to identify and extract subjective text. Sentiments can be expressed at various levels such as word, phrase, sentence and paragraph.
As in other languages, sentiment-bearing words exist in Marathi. Most research in opinion mining has focused on adjectives and adverbs as the major indicators of subjectivity [4], [17]. However, we observe that nouns and verbs are also used to express opinions, along with adjectives and adverbs. Table 1 lists some of the subjective words that occur in Marathi.

Table 1. List of subjective words in Marathi
POS Category | Positive | Negative
Nouns | आनंद, उत्साह, कौतुक, चांगुलपणा | अतिशयोक्ती, अपयश, दुर्लक्ष
Adjectives | अप्रतिम, छान, अनुपम, उत्तम | कमकुवत, तद्दन, निकृष्ट
Adverbs | आनंदाने, चांगलं | जेमतेम, विनाकारण
Verbs | भावणे, आवडणे | खटकणे

These words are used in sentences to convey sentiments about various aspects. Although a sentiment lexicon is a good resource for identifying sentiments, the richness of language allows a single word to be used in different contexts with different senses, so the presence of sentiment words does not always indicate an opinion, e.g.

Text 1. सिनेमातले संवाद खूपच साधारण आहेत. (The dialogs in the movie are too ordinary)
Text 2. गोष्टीची पार्श्वभूमी साधारण २५ वर्षांपूर्वीची आहे. (The background of the story is approximately 25 years ago)

In Text 1 the word ‘साधारण’ means ‘ordinary’, which is a negative term, while in Text 2 the same word means ‘approximately’, which is objective.
A single word or phrase may also have different polarities in different domains or contexts. As shown in Texts 3 and 4, the term ‘वेड’ (madness) can be used as a positive opinion indicator in the movie reviews domain, whereas it has negative valence in the health domain.

Text 3. जादूई संगीतानं तरुणाईला वेड लावणारा संगीतकार ए. आर. रेहमान बॉलिवूडबरोबरच परदेशी सिनेसृष्टीतही ठसा उमटवतोय. (A. R. Rahman, the composer whose magical music drives the youth crazy, is making his mark in foreign film industries as well as in Bollywood) – A statement from a movie review praising a composer's music as ‘maddening’.
Text 4. सर्वसाधारण भाषेत ज्याला वेड लागणे म्हणतात त्या विकृतीला ‘चित्तभ्रम’ किंवा ‘बुद्धिभ्रष्टता’ म्हणतात. (The disorder that is commonly called ‘वेड लागणे’ (going mad) is termed ‘चित्तभ्रम’ or ‘बुद्धिभ्रष्टता’) – A statement from the health domain describing a mental disorder.

The use of multiword expressions and sayings is also very common in natural language text. In such expressions the group of words needs to be analyzed as a single unit, so multiword expressions can be added to the sentiment lexicon as separate entries.

Indirect Opinions
Identifying indirect opinions is one of the most challenging issues in opinion mining systems. Sentiment words are not the only way to express opinions; sentiments can also be expressed without using any sentiment word. Such opinions are indirect opinions. When writing short comments, writers use specific words, but when writing descriptive reviews they are free to extend their views over multiple sentences. In such descriptive texts, sentiments can span multiple sentences, where each sentence does not necessarily contain sentiment words but the sentences are related to each other as a single unit, e.g.

Text 5. संगीत, छायांकन, संकलन या सर्वच बाजूंनी चित्रपट परिपूर्ण बनला आहे. एकच अपवाद तो मध्यंतराचा. (The movie is perfect in all aspects like music, photography, editing, etc. The interval is the only exception)

In this example, the first sentence is a direct positive opinion about the music, photography and editing aspects of the movie. The second sentence provides information about another aspect, the interval, stating that it is an exception to the first sentence; it is thus an indirect negative opinion about the interval of the movie. The second sentence does not convey any sentiment explicitly. It is meaningful only together with the first sentence, and its polarity depends on the polarity of the first sentence.
Indirect opinions are also encountered through common beliefs and indicators in the target domain, e.g.

Text 6. या चित्रपटाने पहिल्याच दिवशी १०० कोटी कमावले. (The movie earned 100 crore on the first day)

This is a factual sentence giving information about the box-office collection of a movie. However, earning a large amount is a positive indicator for movies, so this statement conveys a positive sentiment about the particular movie.
Many such situations can be found in natural language texts where authors express implicit opinions. When analyzing indirect opinions, it becomes necessary to study the domain for different contexts and situations, and to extend the sentiment unit beyond a single sentence.

4. Sentence Structure

Every natural language processing system relies on the basic grammar of the target language. Grammar gives us insight into how text is constructed from words. Words are the basic units in opinion mining systems. Although single sentiment words are capable of expressing opinions, words are normally used within a bigger unit, i.e. a sentence.
In the case of direct opinions, sentiment words are part of the sentence, and the position of the sentiment word depends on the sentence structure. The principal word order in Marathi is SOV (subject–object–verb). Marathi grammar has three types of sentences:
a. केवलवाक्य (Simple Sentence)
   A single principal clause.
   सिनेमा तांत्रिकदृष्ट्या उत्कृष्ट झाला आहे. (The movie is technically excellent)
b. मिश्रवाक्य (Complex Sentence)
   A single principal clause and one or more subordinate clauses.
   उत्तम तंत्रकौशल्य, चांगले कलावंत असूनही हा चित्रपट फारसे समाधान करत नाही. (In spite of good technical skills and actors, the movie doesn't satisfy)
c. संयुक्तवाक्य (Compound Sentence)
   Two or more sentences of the above types connected by various connectors.
   सिनेमाची कथा रोचक आहे आणि काही ठिकाणी ती मजेशीरसुद्धा वाटते. (The story of the movie is interesting and at some places it is funny too)

Most sentences follow the SOV order and belong to one of the above types, which makes it possible to design extraction patterns for opinions. However, not all sentences in a document follow the correct grammatical order. Every writer has his own writing style, which causes similar opinions to be expressed in different ways, e.g.

Text 7. संवाद प्रभावी आहेत. (The dialogs are effective)
Text 8. संवादांमधून मनोरंजनही होते. (The dialogs entertain us too)
Text 9. चित्रपटातील प्रसंगांवर संवादांचे प्रभुत्व दिसते. (The dialogs dominate the scenes of the movie)

Incorrect grammar causes even more problems in social media and blog content. On social media, users mostly write short comments instead of full sentences, and these statements need not be grammatically correct. Handling such data is a difficult task.
Simple sentences are comparatively easy to analyze. Complex and compound sentences are difficult, especially when contradictory opinions are expressed in the same sentence. Complex and compound sentences use different types of connectors to join the clauses of a sentence.

Connectors
Connectors must be studied when deciding the polarity of sentences. They join clauses in compound and complex sentences and can combine multiple sentiment clauses of the same polarity or of different polarities. When contradictory opinions are joined, the polarity of the whole sentence depends on the type of connector used. The types of connectors are:
a. Coordinating (प्रधानत्वसूचक): links two words, phrases, clauses or sentences that are grammatically equivalent. In this case the clauses can be separated and the polarity of each part can be calculated separately.
   सिनेमाची कथा रोचक आहे आणि काही ठिकाणी ती मजेशीरसुद्धा वाटते. (The story of the movie is interesting and at some places it is funny too)
b. Subordinating (गौणत्वसूचक): conjunctions that link a dependent clause to an independent clause. The polarity of the sentence can be set to that of the independent clause.
   सिनेमाचं कथानकच एवढं जबरदस्त आहे की, सिनेमाला संगीताची फारशी गरज जाणवली नाही. (The story of the movie is so strong that it does not need music)

Marathi is a free word order language, so the principal word order is not always followed in texts. We often encounter compound and complex sentences constructed without explicit connectors. Consider the example in Text 10:

Text 10. उत्तम तंत्रकौशल्य, चांगले कलावंत असूनही हा चित्रपट फारसे समाधान करत नाही. (In spite of good technical skills and actors, the movie doesn't satisfy)

This sentence expresses opinions about three features: तंत्रकौशल्य (technical aspects), कलावंत (actors) and चित्रपट (movie). The first two features have positive polarity, and the scope of the negation word नाही is limited to the last feature. The word असूनही (in spite of) splits the sentence into two clauses, one of positive polarity and the other negative, and makes the second clause dominant over the first. The overall polarity of the sentence can therefore be set to negative. When processing complex and compound sentences, the position and type of connectors play an important role in deciding the overall polarity of the sentence.

5. Negation

Negation is another challenging aspect of opinion mining, and it needs to be addressed in an opinion mining system for any language. Negation is a grammatical construction that contradicts or negates part or all of a sentence's meaning. In English, negative clauses and sentences commonly include the negative particle “not” or the contracted negative “n’t”. Similarly, in Marathi, words can be negated using prefixes such as “अ” and “गैर”:
अ : अप्रतिष्ठा (disrepute), अप्रासंगिक (irrelevant)
गैर : गैरवर्तन (misbehavior), गैरसमज (misunderstanding)
Negating terms such as न, नसून, नसल्याने, etc. are also used to negate word polarities. Negation can be applied at the word or sentence level, e.g.

Text 11. हा संपूर्ण चित्रपट मनोरंजनात्मक आहे. (The entire movie is entertaining)
Text 12. हा चित्रपट मनोरंजनात्मक नसून माहितीपर आहे. (The movie is not entertaining but informative)

Here the negation ‘नसून’ is applied to the subjective word ‘मनोरंजनात्मक’ (entertaining).
When applied at the sentence level, the negating term reverses the polarity of the whole sentence, e.g.

Table 2. Negation applied at sentence level
Sentence | संपादन चांगले नाही. (Editing is not good)
Subjective word | चांगले – Positive
Sentence polarity | Negative

After identifying the negation indicators, it is necessary to determine the scope of the negation, i.e. the part of the subjective text that is negated by the negation word. A negation word may contradict part of the sentence or the entire sentence. Jia et al. (2009) show that identifying the scope of negation improves both the accuracy of sentiment analysis and the effectiveness of opinion retrieval [9]; they used the parse tree and typed dependencies along with heuristic rules. Councill, McDonald, & Velikovich (2010) presented a negation detection system based on a conditional random field, modeled using features from an English dependency parser, including the lowercased token string, token POS, token-wise distance from explicit negation cues, POS information from dependency heads, and dependency distance to explicit negation cues [7].

6. Grammatical Moods

The grammatical mood conveys the speaker's attitude toward the state of being of what the sentence describes. Moods play an important role in opinion mining systems. Marathi has four moods: स्वार्थ (indicative), आज्ञार्थ (imperative), विध्यर्थ (potential) and संकेतार्थ. Here we discuss only the संकेतार्थ (subjunctive) mood, which is relevant to opinion mining. With this mood, the author can set up a context of possibility or necessity; it is used to explore conditions that are contrary to fact, e.g.

Text 13. जर कथा आणि पटकथा चांगली असती तर, चित्रपट आणखी चांगला बनू शकला असता. (If the story and screenplay had been good, the movie would have been better)

This sentence contains two clauses:
First clause: “कथा आणि पटकथा चांगली असती” indirectly expresses negative sentiment about the story and screenplay, suggesting that they are not good.
Second clause: “चित्रपट आणखी चांगला बनू शकला असता” suggests that the movie would have been better if the first clause were true.
The second clause depends on the first, and both clauses are contrary to the real situation. Thus the polarity of the whole sentence can be set to negative. The subjunctive mood can be identified by specific constructs such as जर – तर and तथापि, or by verb phrases such as हवे होते, झाले असते, etc.

7. Irony

People do not always use formal or plain words to express their opinions, especially in informal text. Newspaper articles use formal and grammatically correct language, but the language of reviews and comments is not always formal or grammatically correct. In informal text we often encounter ironic statements, which are difficult to analyze because words are used to express the opposite of their literal meaning, e.g.

Text 14. कथाच नसल्याने सारा आनंदीआनंद आहे. (Since there is no story, everything is out of order)

Here the expression आनंदीआनंद (all is well) basically has positive polarity, but the way it is used in this sentence gives it a negative sense.
Situational irony statements also express contradictory sentiments in a single sentence. Such statements are used when the situation is contrary to the author's expectations, e.g.

Text 15. ‘…’सारखा चांगला चित्रपट बनवणारा दिग्दर्शक या चित्रपटात मात्र घोर निराशा करतो. (The director who has made good movies like ‘…’ disappoints in this movie)

In this sentence the first clause, ‘…’सारखा चांगला चित्रपट बनवणारा दिग्दर्शक, suggests a positive opinion about a director who has made a good movie, so the author expects other movies by the same director to be good as well. The second clause, however, expresses negative polarity about the director because he has failed to fulfill these expectations. In other words, the sentence expresses a negative opinion about a director who is otherwise considered very good.

8. Possibilities

Different people may have different opinions on the same topic, and the author cannot always be sure that the reader will agree with his assessments. Sometimes the author
may not be sure of a particular opinion. In such cases, the author may express the possibility of an opinion, e.g.

Text 16. कदाचित हा सिनेमा तुम्हाला निराश करू शकतो. (Perhaps this movie will disappoint you)

Here the author indicates that the movie might disappoint the reader. It is not clear whether the author himself is disappointed, so the outcome depends on the reader's experience. Such sentences can be classified as neutral.

9. Conclusion

Opinion mining is a language- and domain-dependent task, so constructing opinion mining resources requires studying the target language. In this paper we discussed various issues and challenges that need to be addressed when building an automatic opinion mining system for Marathi. This study helps to efficiently identify and extract subjective text that can later be analyzed for polarity orientation. We observe that the mere presence of sentiment words does not always indicate the presence of opinions, and that other linguistic features also contribute to subjectivity and affect the polarity of text. We intend to develop an automated rule-based opinion detection system for Marathi. The study of linguistic features will help to develop rules and extraction patterns for extracting subjective information from text and classifying its polarity.

Acknowledgments

The authors are thankful to the University Grants Commission, New Delhi, India for supporting this research under the Special Assistance Program (SAP) at the level of DRS-I.

References
[1] A. Bakliwal, P. Arora and V. Varma, “Hindi Subjective Lexicon: A Lexical Resource for Hindi Polarity Classification”, Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012, pp. 1189-1196
[2] A. Das and S. Bandyopadhyay, “Subjectivity Detection in English and Bengali: A CRF-based Approach”, 7th International Conference on Natural Language Processing, Macmillan Publishers, 2009
[3] C. Banea, R. Mihalcea and J. Wiebe, “A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources”, Proceedings of the Sixth International Conference on Language Resources and Evaluation, 2008, pp. 2764-2767
[4] F. Benamara, C. Cesarano, A. Picariello, D. Reforgiato and V. Subrahmanian, “Sentiment Analysis: Adjectives and Adverbs are better than Adjectives Alone”, Proceedings of the International Conference on Weblogs and Social Media, 2007
[5] H.B. Patil, A.S. Patil and B.V. Pawar, “Part-of-Speech Tagger for Marathi Language using Limited Training Corpora”, IJCA Proceedings on National Conference on Recent Advances in Information Technology NCRAIT(4), 2014, pp. 33-37
[6] H.B. Patil, A.S. Patil and B.V. Pawar, “A Comprehensive Analysis of Stemmers Available for Indic Languages”, International Journal on Natural Language Computing (IJNLC) Vol. 5, No. 1, 2016, pp. 45-55
[7] I.G. Councill, R. McDonald and L. Velikovich, “What’s Great and What’s Not: Learning to Classify the Scope of Negation for Improved Sentiment Analysis”, Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, 2010, pp. 51-59
[8] J. Kim, H.Y. Jung, S. Nam, Y. Lee and J. Lee, “Found in Translation: Conveying Subjectivity of a Lexicon of One Language into Another Using a Bilingual Dictionary and a Link Analysis Algorithm”, ICCPOL 2009, LNAI 5459, 2009, pp. 112-121
[9] L. Jia, C. Yu and W. Meng, “The Effect of Negation on Sentiment Analysis and Retrieval Effectiveness”, CIKM’09, ACM, 2009, pp. 1827-1830
[10] L. Polanyi and A. Zaenen, “Contextual Valence Shifters”, Computing Attitude and Affect in Text: Theory and Applications, Springer, 2006, pp. 1-10
[11] M. Abdul-Mageed and M.T. Diab, “Subjectivity and Sentiment Annotation of Modern Standard Arabic Newswire”, Proceedings of the Fifth Linguistic Annotation Workshop, 2011, pp. 110-118
[12] N.V. Patil, A.S. Patil and B.V. Pawar, “Survey of Named Entity Recognition Systems with respect to Indian and Foreign Languages”, International Journal of Computer Applications 134(16), 2016, pp. 21-26
[13] N.V. Patil, A.S. Patil and B.V. Pawar, “Issues and Challenges in Marathi Named Entity Recognition”, International Journal on Natural Language Computing (IJNLC) Vol. 5, No. 1, 2016, pp. 15-30
[14] O. Zubaryeva and J. Savoy, “Investigation in Statistical Language-Independent Approaches for Opinion Detection in English, Chinese and Japanese”, Third International Cross Lingual Information Access Workshop, 2009, pp. 38-45
[15] R. Mihalcea, C. Banea and J. Wiebe, “Learning Multilingual Subjective Language via Cross-Lingual Projections”, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007, pp. 976-983
[16] T. Kanamaru, M. Murata and H. Isahara, “Japanese Opinion Extraction System for Japanese Newspapers Using Machine-Learning Method”, Proceedings of NTCIR-6 Workshop Meeting, 2007, pp. 301-307