Data Mining of Public Opinion: An Overview

Gloria Hristova1, a), Boryana Bogdanova1, b), and Nikolay Netov1, c)


1 Sofia University “St. Kliment Ohridski” – Faculty of Economics and Business Administration, Department of
Statistics and Econometrics, 125 Tsarigradsko Shose Blvd., Block 3, 1113 Sofia, Bulgaria.
a) Corresponding author: [email protected]
b) [email protected]
c) [email protected]

Abstract. The United Nations recently published the “E-government survey 2020” with the main aim of assessing the
e-government development status of all United Nations member states. The survey identifies 14 leading countries in
e-government development (out of 193 member states), some of which claim to utilize technologies such as artificial
intelligence (AI), big data and blockchain. Moreover, with the outbreak of the COVID-19 pandemic, the development
and implementation of e-government services has become an even more pressing topic. However, alongside research on
the digitalization of public services, it is important to develop tools that measure how these rapid changes are
perceived by users. Consequently, this paper examines the most recent research devoted to public opinion data
mining. On the basis of an extensive literature review, we outline the latest developments and trends in the field of
public opinion data mining, with a special focus on sentiment analysis. Our main goal is to provide a self-contained,
comprehensive summary that might be used as a basis for the design and development of AI systems aimed at mining
public opinion.

INTRODUCTION

The big data era has brought considerable research interest and attention to the scientific fields of data mining and
machine learning. The combination of available data and advanced analytics techniques leads to smart, data-driven
solutions for business, the economy, and many other aspects of modern life. More and more researchers and
practitioners are adopting data mining methods in the governmental domain as well (1). This emerging field has many
new applications which facilitate the digitalization of government services and the development of e-government.
We should note that the United Nations recently published the “E-government survey 2020” with the main aim of
assessing the e-government development status of all United Nations member states (2). The survey identifies 14
leading countries in e-government development (out of 193 member states), some of which claim to utilize
technologies such as artificial intelligence (AI), big data and blockchain. Moreover, with the outbreak of the
COVID-19 pandemic, the development and implementation of e-government services has become an even more pressing
topic. However, researchers should consider not only the digitalization of public services but also how these rapid
changes are perceived by users.
Consequently, this paper examines the most recent research devoted to public opinion data mining. On the basis
of an extensive literature review, we outline the latest developments and trends in the field of public opinion data
mining, with a special focus on sentiment analysis. Our main goal is to provide a self-contained, comprehensive
summary that might be used as a basis for the design and development of AI systems aimed at mining public opinion.
For this purpose, we deliver a comparative tabular summary of the reviewed papers in terms of text processing
techniques, machine learning (ML) algorithms and data sources, as well as language resources used for text
preprocessing and analysis.
Our paper contributes to the body of literature devoted to data mining applications in the governmental domain
in the following ways. We outline the general methodologies and techniques used for analyzing public opinions
and sentiments, as well as the main sources of such data used by researchers. Furthermore, our analysis sheds light
on the general topics of public interest (digital services, political issues, healthcare etc.) analyzed by means of
data mining and machine learning techniques. Last but not least, our review places a special focus on sentiment
analysis applications for the evaluation of public opinion; hence, in the surveyed papers we also pay attention to the
language of the text data, the language resources used for text processing and analysis, and other important aspects
of studies in the field.
The rest of the paper is organized as follows. Section 2 describes the research methodology. Section 3
provides an outline of the general methodologies for sentiment analysis and opinion mining. Section 4 presents a
review of recent research devoted to data mining and sentiment analysis of public opinion in the governmental
domain. Section 5 discusses the main findings of the current research.

RESEARCH METHODOLOGY

The research methodology is chosen to address the specific goals set in the study. We first shed light on the
general methodologies for sentiment analysis and opinion mining. Second, we pay special attention to recent
approaches for the analysis of public opinion in the governmental domain through computational methods.
Many studies apply qualitative approaches to the task (3), (4), (5). However, the application of machine
learning and natural language processing (NLP) to opinions freely expressed in social networks may lead to major
advances in public opinion analysis. The current research is part of a larger project for mining citizen
opinions on e-government in Bulgaria; hence, in our review we also investigate the applications of sentiment
analysis and citizen opinion mining to text data in Bulgarian. Figure 1 displays a diagram of the research
methodology.
In a nutshell, the current overview focuses on the following strands of papers:
1. General methodologies for opinion mining and sentiment analysis - we examine general trends in the field
and approaches to the task.
2. Sentiment analysis of public opinions in the governmental domain - we examine the most recent research
published in the last 5 years.

FIGURE 1. Research methodology - workflow

GENERAL METHODOLOGIES FOR OPINION MINING AND SENTIMENT ANALYSIS

Opinion mining and sentiment analysis have a central role in the analysis of public opinion on various topics (6).
Furthermore, as stated by Alexopoulos et al. (1), sentiment analysis is a method which improves the communication
and relationship between modern governments and citizens. Sentiment analysis is a scientific field which combines
NLP and statistical techniques with the main aim of identifying, extracting and analyzing the opinions, emotions, and
subjectivity expressed in text. It focuses on the feelings, evaluations, attitudes, and emotions that people express
towards various topics (7): goods, services, events, topics of social importance, individuals etc.
Sentiment analysis has numerous applications in various industries: business and finance (see (8) and (9)),
healthcare (10), politics (11), education (12) etc. Since the beginning of the 21st century there has been an
ever-increasing research interest in the field. The studies of Turney (13) and Pang et al. (14) are fundamental for
its development. In 2012 Liu (15) provided a comprehensive study that outlines the state-of-the-art approaches,
application areas and sub-tasks of sentiment analysis. The study laid important groundwork for future
advancements and new applications in the field. Liu described sentiment analysis as “the most active
research area in NLP”. According to Liu, the field quickly started to expand beyond computer science into the social
and management sciences owing to its significance and its numerous applications in the business,
political and social aspects of our lives.
In the context of sentiment analysis, the task most often tackled is determining the sentiment polarity of a given
text. As the name suggests, polarity analysis categorizes opinions/emotions as positive, negative, or neutral (16).
Depending on the task and the data under study, different levels of these three general sentiment categories might be
considered. For example, the analysis might differentiate between these three polarity levels plus a
“mixed” sentiment category (17). Sentiments might also be considered on a finer scale, for example “very
negative, negative, neutral, positive, very positive” (see (11) and (18)). Some of the pioneering research in the field
of sentiment analysis is focused on determining sentiment polarity (13), (14), (19). Recent research sheds light
on the development of “emotion sentiment analysis” (20). Yadollahi et al. provide a refined taxonomy of sentiment
analysis, dividing it into two main types: opinion mining and emotion mining (20). While polarity classification is
defined as a subtask of “opinion mining”, emotion classification falls into the “emotion mining” category of
sentiment analysis. The latter deals with the task of determining which emotions (from a defined set) are expressed
in text. An example application is the study of Gupta et al. (21), which analyzes 8 different emotions
(anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) expressed by citizens in the context of the
COVID-19 pandemic and other socially important topics. Emotion mining has received more and more attention in
recent years since it enables a deeper analysis of human experiences.
Three main approaches can be used for sentiment analysis (22). The first approach involves the application
of sentiment lexicons (the lexicon-based approach). Lexicons contain lists of sentiment words and phrases, that is,
words/phrases that convey positive or negative sentiments. Such language resources are generated by a
dictionary-based or corpus-based approach (23). It should be pointed out that lexicons are domain-specific since
words have different meanings when used in different contexts. This means that not every lexicon can be
applied to every dataset. There are also some “general-purpose” lexicons but, still, attention must be paid to the
special characteristics of the data under study. The advantage of using such language resources (compared to other
approaches for sentiment analysis) is that applying them to text data is quick, interpretable, and straightforward.
Furthermore, such resources do not require training data. The lack of training data is a huge obstacle to the
application of purely statistical approaches for sentiment analysis, and for this reason the lexicon-based approach
is preferred in many use cases. Among the most frequently used sentiment lexicons and lexicon-based tools are
SentiWordNet (24), AFINN (25), VADER (26), SocialSent (a collection of domain-specific lexicons) (27) and
TextBlob (28). Most lexicons are suitable mainly for social media texts. The languages supported by the
above-mentioned lexicons do not include Bulgarian. There is one sentiment lexicon for Bulgarian which is available
for research purposes. It was developed by Kapukaranov and Nakov (29) and is based on movie review data.
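As a minimal illustration of the lexicon-based approach, the following Python sketch sums word valences and maps the
total to a polarity label. The toy lexicon, the one-word negation rule and the function name are assumptions made for
this example; they are not taken from AFINN, VADER or any other resource cited above.

```python
import re

# Toy lexicon invented for illustration; real lexicons hold thousands of
# scored entries and are often domain-specific.
TOY_LEXICON = {"good": 2, "great": 3, "useful": 2, "slow": -2, "bad": -2, "broken": -3}
NEGATORS = {"not", "no", "never"}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Return (score, label): sum word valences, flipping the sign of a
    sentiment word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        valence = lexicon.get(tok, 0)
        if valence and i > 0 and tokens[i - 1] in NEGATORS:
            valence = -valence
        score += valence
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label
```

The sketch needs no training data, which is the property that makes the approach attractive, but its output is only as
good as the match between the lexicon and the domain of the text.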
The second approach for sentiment analysis involves machine learning methods, while the third is a hybrid: a
combination of the lexicon-based and machine learning approaches. The most frequently tackled task
in the field, sentiment polarity analysis, is usually treated as a text classification task. There are various
machine learning models that can be applied to such tasks (30). The main idea behind such predictive models is to
learn how to detect sentiment directly from data. The classical methodology in machine learning
involves feature engineering prior to the application of a prediction model. Some of the most popular text
classification models are SVM (support vector machines), Naïve Bayes and Logistic Regression (31). However,
classical methods have some limitations. Feature engineering is crucial for obtaining good performance, and this
phase can be quite time-consuming. Furthermore, as data size increases, methods relying on hand-crafted
features might become unreliable due to the increasing volume of new observations. If the text is in a
“low-resource language” (like Bulgarian, for example), another problem is the reliance on language
resources for some feature engineering and text pre-processing tasks such as part-of-speech (POS) tagging,
stemming and lemmatization. Without claiming to be exhaustive, we mention some recent research employing a
machine learning or hybrid approach for sentiment analysis: (9), (32), (33).
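To make the classical ML approach concrete, the sketch below implements a multinomial Naïve Bayes classifier over
bag-of-words counts with add-one smoothing, using only the Python standard library. The class name, the tokenizer and
the four training sentences are assumptions made for illustration; none of the cited studies is claimed to use this
exact setup.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple feature engineering step: lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are ignored
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical, manually labelled training sample.
train_texts = [
    "the online service is fast and helpful",
    "great portal, easy to use",
    "the app keeps crashing and is slow",
    "terrible experience, nothing works",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)
```

In contrast to the lexicon-based approach, everything the model knows comes from the labelled sample, which is why the
availability of annotated data matters so much for this family of methods.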
To overcome some of the problems of classical machine learning methodologies, neural models can be used
for sentiment analysis. Deep learning addresses some of the described limitations and does not rely on hand-crafted
features; instead, text is represented by embeddings, which are low-dimensional, learned, continuous vector
representations. Embeddings capture the semantic relationships in texts and might remove the need to perform feature
engineering. One of the most popular word embedding techniques is Word2Vec, proposed by Mikolov et al. in 2013 and
useful in many NLP tasks (34). Nowadays neural models receive more and more attention in the field of NLP since they
are applicable in many use cases. A major breakthrough in NLP, introduced in 2017, is the Transformer.
The Transformer is a network architecture based on attention mechanisms and useful in various natural
language understanding and generation tasks. It was introduced by Vaswani et al. (35) and opened a whole new path
for advancements in the field of text mining and NLP. The simple idea behind the Transformer is to avoid
convolutions and time-consuming recurrent neural networks and to build a neural architecture that depends
entirely on attention-based mechanisms.
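The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined by
Vaswani et al. (35). The following sketch computes it for small lists of vectors in plain Python. It is meant only to
show the arithmetic: the function names are chosen for this example, and the learned projections and multiple heads
of a full Transformer layer are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """queries: n x d_k, keys: m x d_k, values: m x d_v (lists of lists).
    Returns (outputs, weights), where each output row is a weighted
    average of the value vectors."""
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        )
    return outputs, weights
```

Each query is compared with every key at once, so no recurrence over token positions is needed; this is the property
that lets the architecture dispense with recurrent networks.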
The Transformer architecture combined with transfer learning (36) led to the construction of pretrained models
that enable the development of models for various NLP tasks such as machine translation, text summarization,
question answering, text classification etc. Transformer models have also become a popular method for sentiment
analysis in different domains (8), (37), (38). Among the most popular examples of pretrained model architectures are
BERT (39), Facebook’s XLM model (40) and OpenAI GPT. Several models build on or extend the ideas behind Google’s
BERT, for example XLNet (41), RoBERTa (42) and DistilBERT (43). The “HuggingFace” library is extremely popular in
the deep learning community since it is specifically designed to enable the development and deployment of such
state-of-the-art models for natural language processing (44). According to the official documentation for version
4.7.0, there are over 62 supported model architectures and over 960 datasets. The library covers over 190 different
languages, among which English and Spanish are the most represented. Resources for Bulgarian are also available but,
of course, to a far lesser degree than for other European languages. There are 38 datasets and 41 models applicable to
Bulgarian in version 4.7.0. However, few of them are language-specific; most are multilingual.

SENTIMENT ANALYSIS OF PUBLIC OPINION

This section provides a summary of recent research focused specifically on sentiment analysis of public
opinion in the governmental domain. We focus on computational methods rather than qualitative approaches
to the task. Table 1 provides a self-contained comparative summary of all reviewed papers in terms of
data sources, text processing techniques and machine learning algorithms, as well as language
resources used for text preprocessing and analysis. We consider all these aspects of the studies important in the
analyzed domain. The first subsection summarizes studies with an explicit focus on digital public services, while the
second subsection summarizes studies applying sentiment analysis in other governmental domains (for example,
healthcare and politics).
TABLE 1. Summary of recent research devoted to citizen opinion mining

Research paper | Year | Are digital public services being analyzed explicitly? | General methodology for sentiment analysis | Data source of public opinion | Text data language | Are language resources for text processing and analysis used?
--- | --- | --- | --- | --- | --- | ---
(45) | 2020 | Yes | Lexicon-based approach + rule-based model | User app reviews in Apple Store and Google Play | Arabic | Yes - the authors generated the lexicons utilized in the study.
(46) | 2020 | Yes | Lexicon-based approach + ML-based approach (SVM) | User app reviews in Apple Store and Google Play | English | Yes - the authors generated the lexicons utilized in the study.
(47) | 2020 | Yes | Manual annotation | Twitter | English, Hindi | No
(48) | 2020 | Yes | Manual annotation + ML-based approach for sentiment classification | Facebook, Twitter | Jordanian | No
(49) | 2019 | Yes | ML-based approach for sentiment classification | Questionnaire | Arabic | No
(50) | 2017 | No | Random Forest for sentiment prediction (ML-based approach); Topic modeling with LDA (Latent Dirichlet Allocation). | Reviews in National Health Service in England | English | No
(51) | 2020 | No | Random Forest for sentiment prediction (ML-based approach); Topic modeling with STM (Structural Topic Model). | Reviews in National Health Service in England | English | No
(52) | 2019 | - | Lexicon-based approach + ML-based approach (Hybrid) | Twitter | - | Yes
(53) | 2018 | No | Lexicon-based approach + Visual analysis - Event Drops, Word Clouds, etc. | Twitter | Spanish | Yes
(54) | 2019 | No | Manual annotation + Lexicon-based sentiment analysis; Topic modeling. | Twitter | Spanish | Yes
(55) | 2020 | No | ML-based approach (deep learning) | Twitter, YouTube, Instagram, official press websites, and internet forums | Spanish | Yes - IBM Watson system.
(56) | 2019 | No | Manual annotation + ML-based approach (k-nearest neighbor, logistic regression, SVM) | Twitter | English | Yes - the authors generated the lexicons utilized in the study.
(57) | 2021 | No | Manual annotation + ML-based approach (Random Forest, SVM, Naïve Bayes) | Twitter | English | No
(58) | 2019 | Yes | ML-based approach (deep learning) | Product reviews | Arabic | No
(59) | 2019 | Yes | ML-based approach | Comments in social media | - | No
(60) | 2021 | No | Manual annotation + ML-based approach | Twitter | Arabic | No

Focus on Digital Public Services


In (45) and (46) an aspect-based sentiment analysis of mobile government app reviews is performed. The app is
developed as part of a smart government initiative. The authors emphasize that in recent years more and more
government services have become digital and that governments spend large amounts to understand their app users and
adapt services to their needs. The main goal of both studies is to analyze citizens’ sentiments, needs and
expectations by applying natural language processing techniques. In (45) the authors rely on manual
annotation of app reviews in combination with a sentiment lexicon to perform aspect-based sentiment
analysis. The sentiment lexicon is developed by the authors within the study. The proposed methodologies are
valuable for understanding citizens’ sentiments toward, and expectations of, services provided by the government.
Furthermore, findings from such analyses can be used to monitor the performance of government apps and to identify
areas for improvement.
Chakraborty (47) aims to identify public sentiment expressed towards “Digital India”, a campaign launched by the
government of India to ensure that essential government services are made available to citizens
electronically. The author analyzes public posts and comments on Twitter, which are manually annotated as
“supportive”, “neutral”, “criticizing” etc. The researcher justifies the choice of social media by describing Twitter
as a medium that strongly influences public opinion. Descriptive analyses are performed in terms of
comment categories, main Twitter accounts involved, language used, and other characteristics. However, the study
does not apply any advanced techniques for sentiment analysis.
Al-Qudah et al. (48) focus on capturing citizens’ sentiments towards an online payment service
operated by the Central Bank of Jordan, which citizens use to pay their bills digitally for education, healthcare,
telecommunications, water, electricity and other services. The digitalization of this public service aims to replace
traditional methods of payment and is crucial for building a digital society in Jordan. The authors use comments and
reviews from Facebook and Twitter, which are manually annotated with the following sentiment
categories: positive, negative, neutral. The manually annotated sample is used to develop a sentiment
classification model. Several algorithms are tested, including XGBoost, k-NN and J48. A neutrality detector model is
used to filter out neutral words and retain only opinionated comments/words.
In (49) a road map which introduces KPI measurements for services in the e-government of Kuwait is proposed.
The study is closely associated with Kuwait’s aim to enhance ICT services for citizens and increase citizen
satisfaction by promoting the digital transformation of the public sector. The road map is aimed at
supporting ICT development in the educational sector of the e-government of Kuwait. To measure citizens’
satisfaction with the provided public services (which is one of the KPIs), the authors rely only on questionnaire data.
Compared to other approaches, the sample of analyzed opinions is rather small (291 respondents). A sentiment
polarity classification model is built on a set of reviews collected from the questionnaire data. The authors apply a
classical ML approach involving feature engineering and a logistic regression model. The results of
the statistical analysis have important implications for e-government services: despite the huge
investments in ICT, the e-government efforts do not lead to an acceptable level of citizen satisfaction. The authors
suggest that the e-government authorities should make the information from citizen comments and reviews publicly
available so that researchers could use it to derive meaningful insights aimed at accelerating ICT
development.
In (58) an innovative methodology for automating e-government services is proposed, with the main goal of improving
customer satisfaction and reducing time and financial costs. The methodology is based mainly on
artificial intelligence techniques. The authors mention some of the challenges e-governments face in adopting deep
learning technologies for service automation. The novelty of the research lies in the
architecture of a smart e-government platform that aims to support the implementation of AI in e-government. One of
the components of the proposed framework is devoted to sentiment analysis. A deep learning approach for
sentiment polarity analysis is proposed. The approach is based on recurrent neural networks, and product reviews are
used as the main source of training data. It is debatable whether such data are suitable for developing a sentiment
analysis model that will be applied to citizens’ comments regarding e-government or to
similar data in this domain.
Alguliyev et al. (59) study an interesting problem concerning modern e-governments. The research is focused on
the detection of hidden social networks and propaganda against the e-government. In recent years, studies aimed
at detecting trolls, propaganda and other types of deceptive information have become popular in the field of text
mining due to the increasing number of such behaviors in social networks. Propaganda threatens the security and
reputation of e-governments. The authors propose a conceptual approach based on social network extraction with
the main aim of detecting such behaviors. The approach includes several stages of analysis, one of which is devoted
to sentiment analysis of citizens’ comments. The sentiment analysis operates at the sentence level and considers three
sentiment categories: positive, negative, neutral. Social network extraction is applied to the sample of users posting
negative comments, and the relationships between these users are analyzed in further detail.
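The last stage described above, restricting the analysis to users who post negative comments and then examining how
they are connected, can be sketched as follows. The record format, the use of reply links as the relationship between
users, and the function name are all assumptions made for illustration; the summary above does not specify how (59)
defines the relationships between users.

```python
from collections import defaultdict

# Hypothetical records: (author, sentiment label, user replied to or None).
comments = [
    ("u1", "negative", None),
    ("u2", "negative", "u1"),
    ("u3", "positive", "u1"),
    ("u4", "negative", "u2"),
    ("u5", "negative", None),
    ("u6", "neutral", "u5"),
]


def negative_user_groups(records):
    """Link users who posted negative comments whenever one replies to
    another, then return the connected components of that graph."""
    negative_users = {author for author, label, _ in records if label == "negative"}
    graph = defaultdict(set)
    for author, label, target in records:
        if label == "negative" and target in negative_users:
            graph[author].add(target)
            graph[target].add(author)
    seen, groups = set(), []
    for user in sorted(negative_users):
        if user in seen:
            continue
        stack, component = [user], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups
```

On the sample records, users u1, u2 and u4 form one connected group while u5 remains isolated; groups of this kind
would be the candidates for closer inspection.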
Our literature review of sentiment analysis of citizens’ opinion on digital public services provides
evidence that currently there are no such studies analyzing the opinion of Bulgarian citizens by means of
advanced analytics techniques. This finding is not surprising since the digitalization of public services in Bulgaria
is still developing. Of course, there are studies aimed at analyzing public opinion on other important topics in the
political and economic life of Bulgaria (61), (62), (63). Many social and political surveys in Bulgaria are carried
out by the Gallup International Association. Existing studies rely mainly on in-depth interviews and surveys as
tools for public opinion analysis, as opposed to modern data mining and advanced analytics techniques. Our review
reveals that currently there are no studies applying NLP and ML techniques for public opinion mining in Bulgaria
regarding the state of the e-government or related issues. Our conclusions are also supported by findings in (64).
More research effort should be put into the application of computational methods for mining public opinion, as
opposed to the frequently used qualitative approaches.

Focus on Other Topics of Public Interest

In (50) and (51) Kowalski et al. outline the benefits of text mining and machine learning for analyzing public
opinion and the implications for public administration. Both studies focus on public healthcare: the authors
analyze a sample of citizen reviews of primary care practices in England. The authors frame determinants of user
satisfaction in several dimensions of service quality by applying LDA (Latent Dirichlet Allocation) (50) or
structural topic modelling (STM) (51) in combination with a Random Forest model for predicting the level of user
satisfaction. The studies provide important insights into the key drivers of patient satisfaction and recommend
concrete government actions for improving the provided services. The results suggest that patient
satisfaction levels are influenced by factors which are not captured by surveys. The authors emphasize the important
role of comments/reviews in free textual format, noting that “while surveys are reliable, they cover narrow
sub-samples of citizen experiences” (51).
Another study aimed at citizen opinion mining is that of Dandannavar et al. (52). The authors argue that
comments in social networks could be used efficiently to help and guide governments in developing
successful and sustainable government initiatives and innovations. According to them, social sentiment analysis
could help identify citizens’ feelings and concerns about government programs and policies and their various
aspects. A framework for social sentiment analysis is proposed. The system has several phases: data collection,
preprocessing, feature extraction, sentiment analysis and polarity classification. For sentiment analysis, the authors
propose a combination of sentiment lexicons and machine learning methods (a hybrid approach). As in other related
work, the main data source of public opinion is Twitter.
Hubert et al. (53) propose a methodology for analyzing government-citizen interactions on Twitter. The focus
is on interactions through official government accounts in the social network. The study aims to analyze
government activity, the resources shared between government and citizens as part of these interactions, citizen
responses to and sentiments about government announcements, etc. The interactions studied cover healthcare, social
development, education, environment, and other fields. The authors’ methodology consists of various visualization
tools used to reveal patterns and trends in government-citizen interactions on Twitter. For sentiment analysis the
authors use the NRC Affect Intensity Lexicon (the Spanish version, since the data is in Spanish) and examine eight
primary emotions: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation. The aim is to assess the
general mood of citizens in response to government tweets. The authors regard the restriction to Twitter data as a
limitation and plan to include other social networks.
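Emotion analysis with an affect lexicon differs from polarity scoring in that each word may contribute to several
emotions at once. The sketch below aggregates per-emotion intensities over a set of replies. The lexicon entries are
invented English placeholders, not values from the NRC Affect Intensity Lexicon, and the function name is
hypothetical.

```python
import re
from collections import defaultdict

EMOTIONS = ("joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation")

# Toy word -> {emotion: intensity in [0, 1]} entries, invented for illustration.
TOY_AFFECT_LEXICON = {
    "thanks": {"joy": 0.6, "trust": 0.5},
    "worried": {"fear": 0.7, "sadness": 0.4},
    "outrage": {"anger": 0.9, "disgust": 0.6},
    "soon": {"anticipation": 0.4},
}


def emotion_profile(replies, lexicon=TOY_AFFECT_LEXICON):
    """Sum the intensity of each emotion over all words in all replies."""
    totals = defaultdict(float)
    for text in replies:
        for tok in re.findall(r"[a-z]+", text.lower()):
            for emotion, intensity in lexicon.get(tok, {}).items():
                totals[emotion] += intensity
    return {emotion: round(totals[emotion], 3) for emotion in EMOTIONS}
```

Applied to the replies to a single government tweet, the resulting profile gives a rough picture of the general mood
that the tweet provoked.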
Another study using Twitter data as the main source of citizens’ opinions is that of Mendez et al. (54). The
study focuses on the public transportation system in Santiago, Chile. The authors’ aim is to overcome the limitations
of traditional surveys and examine citizens’ opinions freely expressed in social networks. One of the interesting
research questions posed in the study is whether satisfaction surveys might be replaced by information reported on
Twitter, which has massive coverage and is free. To answer this question, the authors combine
sentiment analysis techniques with topic modeling. For sentiment analysis, the authors first experiment with the
Spanish version of SentiStrength. However, after manual annotation and comparison with the results of the
dictionary-based approach, it becomes clear that only 41% of tweets are correctly classified by SentiStrength.
Empirical results suggest that the level of detail and the variety of answers in surveys are higher than those
obtained by analyzing comments in free textual format. However, the latter cover many topics and can be used to
diagnose problems effectively and in a timely manner. The authors suggest combining the proposed methodology
with surveys as an effective way of mining public opinion.
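A figure such as the 41% reported above is obtained by comparing the labels produced by a tool with manually assigned
labels on the same items. A minimal sketch of that comparison is given below; the function name and the label values
in the usage note are placeholders, and the data of (54) are not reproduced here.

```python
from collections import Counter


def agreement(manual_labels, predicted_labels):
    """Return (share of items where both labels match, Counter of
    (manual, predicted) label pairs)."""
    if len(manual_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("at least one labelled item is required")
    pairs = Counter(zip(manual_labels, predicted_labels))
    correct = sum(n for (gold, pred), n in pairs.items() if gold == pred)
    return correct / len(manual_labels), pairs
```

For instance, `agreement(["neg", "neg", "pos"], ["neg", "neu", "pos"])` yields an agreement of two thirds, and the
returned pair counts show which categories the tool confuses most often.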
The focus of (55) is risk communication management carried out by the government and the main health
organizations during the COVID-19 pandemic in Spain. Using web scraping techniques, the authors analyze
citizens’ interactions in various social media: Twitter, YouTube, Instagram, official press websites, and internet
forums. The study also investigates citizens’ emotions expressed in social media during the pandemic. The authors
use the Natural Language Understanding service of the IBM Watson system to mine the following emotions:
anger, fear, disgust, and sadness. The IBM platform enables an analysis of syntactic characteristics and provides
information on concepts, emotions, entities, keywords, relationships, and semantic roles found in text data. The
study reveals interesting insights into public emotions regarding different aspects of the COVID-19 pandemic and
highlights the main issues in the government’s handling of the crisis (for example, a lack of dialogue between the
government and other social actors, as well as contradictory communication).
In (56) an ML framework for mining public sentiments from microblogs is proposed. The authors study public
opinions regarding the China Pakistan Economic Corridor (CPEC). The contributions of the study include the
development of: 1. a database of tweets on CPEC; 2. an ML-based sentiment analysis system for classifying public
tweets regarding CPEC; 3. a domain-specific sentiment lexicon. As in many other studies in the field, Twitter data is
utilized. Manual annotation is performed to categorize tweets as positive, negative, or neutral. An algorithm for
automatic generation of a domain-specific lexicon is provided. During the construction of the sentiment lexicon, the
authors make use of POS tagging to extract adjectives and adverbs from raw textual data. The authors motivate
the choice of these parts of speech by claiming that they are more likely to convey public sentiments. After manual
annotation and sentiment lexicon generation, the sentiment analysis task is approached as a supervised problem and
three popular algorithms are utilized for polarity classification: k-NN, SVM, and Logistic Regression. As future work,
the authors plan to include data from other social networks and to incorporate topic models into the sentiment
analysis system.
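The algorithm of (56) is not reproduced here. As a rough illustration of the general idea of deriving a lexicon from the adjectives and adverbs of annotated tweets, the sketch below assumes that the tweets have already been POS-tagged. The Penn Treebank-style tags, the toy tweets, and the scoring rule (mean label value of the tweets containing a word) are all assumptions made for illustration.

```python
# Sketch: derive a toy domain lexicon from POS-tagged, manually labelled tweets.
from collections import defaultdict

# Penn Treebank-style tag prefixes: JJ* = adjectives, RB* = adverbs.
KEPT_TAG_PREFIXES = ("JJ", "RB")

# Hypothetical pre-tagged tweets: (list of (token, tag), manual label).
tagged_tweets = [
    ([("project", "NN"), ("is", "VBZ"), ("beneficial", "JJ")], "positive"),
    ([("hugely", "RB"), ("beneficial", "JJ"), ("corridor", "NN")], "positive"),
    ([("costly", "JJ"), ("and", "CC"), ("unfair", "JJ"), ("deal", "NN")], "negative"),
    ([("costly", "JJ"), ("route", "NN"), ("opens", "VBZ")], "neutral"),
]

LABEL_VALUE = {"positive": 1, "neutral": 0, "negative": -1}


def build_lexicon(tweets):
    """Score each adjective/adverb by the mean label value of tweets containing it."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for tokens, label in tweets:
        # Count each candidate word once per tweet.
        words = {tok.lower() for tok, tag in tokens if tag.startswith(KEPT_TAG_PREFIXES)}
        for word in words:
            totals[word] += LABEL_VALUE[label]
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in counts}


lexicon = build_lexicon(tagged_tweets)
print(sorted(lexicon.items()))
```

Nouns and verbs such as "project" or "opens" never enter the lexicon, which mirrors the restriction to adjectives and adverbs described above. A word seen in tweets with mixed labels, such as "costly" here, receives an intermediate score.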
In (57) a straightforward approach for ML-based sentiment analysis is proposed by Andoh et al. Tweets
regarding the political life in Ghana are collected and manually annotated as positive, negative, or neutral. Three
popular algorithms for text classification are tested: Random Forest, SVM, and Naïve Bayes.
In (60) a sentiment tracking system using data generated from verified Twitter news accounts (news agencies,
newspapers, organizations, etc.) is developed. One of the possible applications of the system is to facilitate the
decision-making processes of governments. After manual annotation of a subset of the sample followed by automatic
annotation of the rest of the sample, an ML-based approach for sentiment classification is utilized. Several text
classification algorithms are tested on both the manually and automatically annotated samples: Logistic Regression,
Multilayer Perceptron, Naïve Bayes, and SVM.
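Naïve Bayes appears in both of the studies above. A minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written with the Python standard library only, is given below. It is not the implementation used in (57) or (60); the training texts, labels, and function names are invented for illustration.

```python
# Sketch: multinomial Naive Bayes with add-one smoothing, standard library only.
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior for the given text."""
    word_counts, class_counts, vocabulary = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for tok in tokenize(text):
            if tok not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training examples standing in for manually annotated tweets.
train_texts = [
    "great new road project",
    "great progress on jobs",
    "corruption scandal again",
    "terrible corruption and waste",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_texts, train_labels)
prediction_a = predict("great jobs", model)
prediction_b = predict("more corruption", model)
print(prediction_a, prediction_b)
```

In a setting like the one described for (60), the same training function could be run once on the manually annotated subset and once on the automatically annotated remainder, so that the two resulting models can be compared on a common held-out set.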

DISCUSSION

The field of citizen opinion mining has received a lot of research attention in recent years. In the current
review of studies in this field, 75% of the papers were published in the last two years. Undoubtedly, the field
will continue to expand as a result of government digitalization and the introduction of artificial intelligence
technologies in this domain. In terms of the main areas of public sentiment under analysis, the current review
mainly, but not exclusively, includes papers in the area of e-government. Our review reveals that other areas in the
governmental domain which could benefit from citizen opinion mining include healthcare, policy making, political
issues, and transportation. Studies in the area of e-government cover a wide range of topics, among them mining
opinions regarding mobile government apps and online payment public services, introducing KPI
measurements for e-government services, tracking opinions regarding digitalization campaigns in the public
sector, and detecting hidden social networks and propaganda against the e-government. Some of the studies
are devoted to a particular digital public service, while others focus on capturing “the general picture” in
e-government development. Our review, supported by findings in other research, reveals a research gap in the field:
there are no studies utilizing NLP and machine learning techniques for analysis of Bulgarian citizens’ opinions on
the provision of electronic public services in Bulgaria.
From the review, we observe that sentiment analysis has been applied most frequently at the document level.
However, there are examples of studies carrying out the analysis at the sentence or even aspect level: (46), (59). It
is not surprising that sentiment analysis of citizen opinions has been applied mainly at the document level, since the
main data source of such opinions appears to be Twitter, a social network in which people express
themselves by posting short messages called “tweets”. More than half of the reviewed articles utilize Twitter data,
although some studies consider other social media: (46), (55). Some authors point to the exclusive use of
Twitter data as a research limitation. More attention should be paid to discussion forums as a source of public
opinions. However, such data has a rather noisy structure and poses challenges in text processing and analysis.
Only one article in the current review claims to use forum data: (55). In terms of text data language, along with the
most frequently used English and Spanish, there is an interest in public opinion mining applications for low-resource
languages such as Arabic, Hindi and Jordanian.
In terms of general methodology for sentiment analysis, our study reveals that the usage of sentiment lexicons or
manual annotation (or both) is almost inevitable. Among the sentiment analysis tools utilized in the domain of
citizen opinion mining are the NRC Affect Intensity Lexicon, SentiStrength, and the IBM Watson system. Some authors
even develop domain-specific lexicons. Many studies apply hybrid approaches to the task of public opinion
mining because of the lack of labeled data. Among the most frequently used machine learning algorithms are
logistic regression and SVM. Classical machine learning methodologies are preferred, and only two of the reviewed
articles apply deep learning techniques for public opinion mining. This might be due to several factors. The first
is the lack of labeled data and the usually small samples: such problems are not well suited to deep learning
applications. Another reason is that the field is still emerging, and researchers prefer to test more
straightforward, interpretable, and well-studied methodologies for sentiment analysis, since all these aspects are
important in the governmental domain. However, in the future we expect more research effort to be put into the
application of deep learning and transfer learning for public opinion mining. Finally, it is important to mention that
some studies suggest that the most effective approach for public opinion mining is a combination of
traditional methods such as surveys and sentiment analysis of comments posted in social media: (51), (54).

ACKNOWLEDGMENTS

The presentation and dissemination of these research results are supported in part by National Science Fund
Project КП-06-Н45/3/30.11.2020 “Identifying citizens' attitudes and assessments about access, quality, and usage of
electronic public services”.

REFERENCES

1. C. Alexopoulos, Z. Lachana, A. Androutsopoulou, V. Diamantopoulou, Y. Charalabidis and M. A. Loutsaris,
“How machine learning is changing e-government,” in Proceedings of the 12th International Conference on
Theory and Practice of Electronic Governance, pp. 354-363, (2019).
2. United Nations Department of Economic and Social Affairs, “E-Government Survey 2020: Digital
Government in the Decade of Action for Sustainable Development,” (2020).
3. G. Beirão and J. S. Cabral, “Understanding attitudes towards public transport and private car: A qualitative
study,” Transport policy, vol. 14, no. 6, pp. 478-489, (2007).
4. M. Reuchamps, H. Boerjan, C. Niessen and F. Randour, “More or less regional autonomy? A qualitative
analysis of citizen arguments towards (de) centralization in Belgium,” Comparative European Politics, vol. 19,
no. 2, pp. 225-247, (2021).
5. A. M. Chaudhry and M. Bilal, “Migration, Diaspora and Citizenship: A Qualitative Study of the Perceptions of
Pakistani Nationals towards the Political Rights of Pakistani Dual Citizens,” International Migration, vol. 59,
no. 1, pp. 58-73, (2021).
6. P. Balaji, D. Haritha and O. Nagaraju, “An overview on opinion mining techniques and sentiment
analysis,” Int. J Pure and Appl. Math, vol. 118, no. 19, pp. 61-69, (2018).
7. B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and Trends in Information Retrieval,
vol. 2, no. 1-2, pp. 1-135, (2008).
8. K. Mishev, A. Gjorgjevikj, I. Vodenska, L. T. Chitkushev and D. Trajanov, “Evaluation of sentiment analysis
in finance: from lexicons to transformers,” IEEE Access, vol. 8, pp. 131662-131682, (2020).
9. R. S. Jagdale, V. S. Shirsat and S. N. Deshmukh, “Sentiment analysis on product reviews using machine
learning techniques,” Cognitive Informatics and Soft Computing, pp. 639-647, (2019).
10. F. B. Hamzah, C. Lau, H. Nazri, D. V. Ligot, G. Lee, C. L. Tan, M. K. B. M. Shaib, U. H. B. Zaidon, A. B.
Abdullah, M. H. Chung, C. H. Ong, P. Y. Chew and R. E. Salunga, “CoronaTracker: worldwide COVID-19
outbreak data analysis and prediction,” Bull World Health Organ, vol. 1, no. 32, (2020).
11. N. Öztürk and S. Ayvaz, “Sentiment analysis on Twitter: A text mining approach to the Syrian refugee crisis,”
Telematics and Informatics, vol. 35, no. 1, pp. 136-147, (2018).
12. S. Rani and P. Kumar, “A sentiment analysis system to improve teaching and learning,” Computer, vol. 50, no.
5, pp. 36-43, (2017).
13. P. D. Turney, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of
Reviews,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL),
pp. 417-424, (2002).
14. B. Pang, L. Lee and S. Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning
Techniques,” in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing
(EMNLP 2002), pp. 79-86, (2002).
15. B. Liu, “Sentiment analysis and opinion mining,” Synthesis lectures on human language technologies, vol. 5,
no. 1, pp. 1-167, (2012).
16. D. M. E. D. M. Hussein, “A survey on sentiment analysis challenges,” Journal of King Saud University-
Engineering Sciences, vol. 30, no. 4, pp. 330-338, (2018).
17. B. R. Chakravarthi, R. Priyadharshini, V. Muralidaran, S. Suryawanshi, N. Jose, E. Sherly and J. P. McCrae,
“Overview of the track on sentiment analysis for dravidian languages in code-mixed text,” Forum for
Information Retrieval Evaluation, pp. 21-24, (2020).
18. A. H. Shapiro, M. Sudhof and D. J. Wilson, “Measuring news sentiment,” Journal of Econometrics, (2020).
19. K. Dave, S. Lawrence and D. M. Pennock, “Mining the peanut gallery: Opinion extraction and semantic
classification of product reviews,” in Proceedings of the 12th international conference on World Wide Web,
pp. 519-528, (2003).
20. A. Yadollahi, A. G. Shahraki and O. R. Zaiane, “Current state of text sentiment analysis from opinion to
emotion mining,” ACM Computing Surveys (CSUR), vol. 50, no. 2, pp. 1-33, (2017).
21. V. Gupta, N. Jain, P. Katariya, A. Kumar, S. Mohan, A. Ahmadian and M. Ferrara, “An emotion care model
using multimodal textual analysis on COVID-19,” Chaos, Solitons & Fractals, vol. 144, no. 110708, pp. 1-9,
(2021).
22. W. Medhat, A. Hassan and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain
Shams Engineering Journal, vol. 5, no. 4, pp. 1093-1113, (2014).
23. B. Liu, “Sentiment analysis: Mining opinions, sentiments, and emotions,” (Cambridge university press, 2020).
24. S. Baccianella, A. Esuli and F. Sebastiani, “Sentiwordnet 3.0: an enhanced lexical resource for sentiment
analysis and opinion mining,” in Lrec, vol. 10, no. 2010, pp. 2200-2204, (2010).
25. F. Å. Nielsen, “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs,” preprint
arXiv:1103.2903, (2011).
26. C. Hutto and E. Gilbert, “Vader: A parsimonious rule-based model for sentiment analysis of social media text,”
in Proceedings of the International AAAI Conference on Web and Social Media, vol. 8, no. 1, (2014).
27. W. L. Hamilton, K. Clark, J. Leskovec and D. Jurafsky, “Inducing domain-specific sentiment lexicons from
unlabeled corpora,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing,
vol. 2016, pp. 595-605, (2016).
28. S. Loria, “textblob Documentation. Release 0.16.0,” (2018).
29. B. Kapukaranov and P. Nakov, “Fine-grained sentiment analysis for movie reviews in Bulgarian,” in
Proceedings of the International Conference Recent Advances in Natural Language Processing, pp. 266-274,
(2015).
30. K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes and D. Brown, “Text classification
algorithms: A survey,” Information, vol. 10, no. 4, p. 150, (2019).
31. K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: tasks, approaches and
applications,” Knowledge-Based Systems, vol. 89, pp. 14-46, (2015).
32. F. Rustam, M. Khalid, W. Aslam, V. Rupapara, A. Mehmood and G. S. Choi, “A performance comparison of
supervised machine learning models for Covid-19 tweets sentiment analysis,” Plos one, vol. 16, no. 2, (2021).
33. N. O. F. Daeli and A. Adiwijaya, “Sentiment analysis on movie reviews using Information gain and K-nearest
neighbor,” Journal of Data Science and Its Applications, vol. 3, no. 1, pp. 1-7, (2020).
34. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado and J. Dean, “Distributed representations of words and
phrases and their compositionality,” in Advances in neural information processing systems, pp. 3111–3119,
(2013).
35. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin,
“Attention is all you need,” preprint arXiv:1706.03762, (2017).
36. C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang and C. Liu, “A survey on deep transfer learning,” in International
conference on artificial neural networks, pp. 270-279, (2018).
37. T. Zhang, B. Xu, F. Thung, S. A. Haryono, D. Lo and L. Jiang, “Sentiment Analysis for Software Engineering:
How Far Can Pre-trained Transformer Models Go?,” in 2020 IEEE International Conference on Software
Maintenance and Evolution (ICSME), pp. 70-80, (2020).
38. M. G. Sousa, K. Sakiyama, L. de Souza Rodrigues, P. H. Moraes, E. R. Fernandes and E. T. Matsubara,
“BERT for Stock Market Sentiment Analysis,” in 2019 IEEE 31st International Conference on Tools with
Artificial Intelligence (ICTAI), pp. 1597-1601, (2019).
39. J. Devlin, M. W. Chang, K. Lee and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for
language understanding,” preprint arXiv:1810.04805, (2018).
40. G. Lample and A. Conneau, “Cross-lingual language model pretraining,” preprint arXiv:1901.07291, (2019).
41. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov and Q. V. Le, “Xlnet: Generalized autoregressive
pretraining for language understanding,” Advances in neural information processing systems, vol. 32, (2019).
42. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer and V. Stoyanov,
“Roberta: A robustly optimized bert pretraining approach,” preprint arXiv:1907.11692, (2019).
43. V. Sanh, L. Debut, J. Chaumond and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster,
cheaper and lighter,” preprint arXiv:1910.01108, (2019).
44. T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J.
Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q.
Lhoest and A. M. Rush, “HuggingFace's Transformers: State-of-the-art natural language processing,” preprint
arXiv:1910.03771, (2019).
45. S. Areed, O. Alqaryouti, B. Siyam and K. Shaalan, “Aspect-based sentiment analysis for Arabic government
reviews,” Recent Advances in NLP: The Case of Arabic Language, pp. 143-162, (2020).
46. O. Alqaryouti, N. Siyam, A. A. Monem and K. Shaalan, “Aspect-based sentiment analysis using smart
government review data,” Applied Computing and Informatics, (2020).
47. A. Chakraborty, “IDENTIFICATION OF PUBLIC SENTIMENT OVER COMMENTS THROUGH TWEETS
BY DIGITAL INDIA,” PalArch's Journal of Archaeology of Egypt/Egyptology, vol. 17, no. 7, pp. 9661-9694,
(2020).
48. D. A. Al-Qudah, A. Z. Ala’M, P. A. Castillo-Valdivieso and H. Faris, “Sentiment Analysis for e-Payment
Service Providers Using Evolutionary eXtreme Gradient Boosting,” IEEE Access, vol. 8, pp. 189930-189944,
(2020).
49. S. A. Gaber and B. Kazim, “A Proposed Road Map To Enhance E-Government Services: Kuwait Case Study,”
International Journal of Advanced Research and Publications, vol. 3, no. 12, (2019).
50. R. Kowalski, M. Esteve and S. J. Mikhaylov, “Application of Natural Language Processing to Determine User
Satisfaction in Public Services,” preprint arXiv:1711.08083, (2017).
51. R. Kowalski, M. Esteve and S. J. Mikhaylov, “Improving public services by mining citizen feedback: An
application of natural language processing,” Public Administration, vol. 98, no. 4, pp. 1011-1026, (2020).
52. P. S. Dandannavar, S. R. Mangalwede and S. B. Deshpande, “A proposed framework for evaluating the
performance of government initiatives through sentiment analysis,” Cognitive informatics and soft computing,
pp. 321-330, (2019).
53. R. B. Hubert, E. Estevez, A. Maguitman and T. Janowski, “Examining government-citizen interactions on
Twitter using visual and sentiment analysis,” in Proceedings of the 19th annual international conference on
digital government research: governance in the data age, pp. 1-10, (2018).
54. J. T. Méndez, H. Lobel, D. Parra and J. C. Herrera, “Using Twitter to infer user satisfaction with public
transport: the case of Santiago, Chile,” IEEE Access, vol. 7, pp. 60255-60263, (2019).
55. C. de Las Heras-Pedrosa, P. Sánchez-Núñez and J. I. Peláez, “Sentiment analysis and emotion understanding
during the COVID-19 pandemic in Spain and its impact on digital ecosystems,” International Journal of
Environmental Research and Public Health, vol. 17, no. 15, (2020).
56. B. Amina and T. Azim, “SCANCPECLENS: A framework for automatic lexicon generation and sentiment
analysis of micro blogging data on China Pakistan economic corridor,” IEEE Access, vol. 7, pp. 133876-
133887, (2019).
57. J. Andoh, L. Asiedu, A. Lotsi and C. Chapman-Wardy, “Statistical Analysis of Public Sentiment on the
Ghanaian Government: A Machine Learning Approach,” Advances in Human-Computer Interaction, vol.
2021, (2021).
58. O. S. Al-Mushayt, “Automating E-Government services with artificial intelligence,” IEEE Access, vol. 7, pp.
146821-146829, (2019).
59. R. M. Alguliyev, R. M. Aliguliyev and G. Y. Niftaliyeva, “Extracting social networks from e-government by
sentiment analysis of users' comments,” Electronic Government, an International Journal, vol. 15, no. 1, pp.
91-106, (2019).
60. A. Al-Laith and M. Shahbaz, “Tracking sentiment towards news entities from Arabic news on social media,”
Future Generation Computer Systems, vol. 118, pp. 467-484, (2021).
61. H. Bogdanov, B. Andreeva and G. Marinov, “How Does Tax Administration in Bulgaria Works-a Study of
Public's Opinion,” Izvestia Journal of the Union of Scientists-Varna. Economic Sciences Series, vol. 7, no. 2,
pp. 115-124, (2018).
62. G. I. Pavlova, “SURVEY ON PUBLIC ATTITUDES TOWARDS CHANGES OF THE HEALTH
INSURANCE MODEL IN BULGARIA,” Евразийский Союз Ученых, vol. 12, no. 3, (2019).
63. J. Yang and C. Williams, “Illegitimate economic practices in Bulgaria: Findings from a representative survey
of 2,005 citizens,” available at SSRN, (2017).
64. G. Hristova, “Text Analytics in Bulgarian: An Overview and Future Directions,” Cybernetics and Information
Technologies, vol. 21, no. 3 (in press).
