Advances in Data Science: Methodologies and Applications
Series Editors
Janusz Kacprzyk
Polish Academy of Sciences, Warsaw, Poland
Lakhmi C. Jain
Faculty of Engineering and Information Technology, Centre for Artificial
Intelligence, University of Technology, Sydney, NSW, Australia, KES
International, Shoreham-by-Sea, UK; Liverpool Hope University,
Liverpool, UK
Anna Esposito
Dipartimento di Psicologia, Università della Campania “Luigi Vanvitelli”,
and IIASS, Caserta, Italy
Lakhmi C. Jain
University of Technology Sydney, Broadway, Australia
Liverpool Hope University, Liverpool, UK
KES International, Shoreham-by-Sea, UK
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Anna Esposito
Email: [email protected]
Email: [email protected]
Lakhmi C. Jain
Email: [email protected]
Email: [email protected]
Abstract
Big data and data science are transforming our world today in ways we
could not have imagined at the beginning of the twenty-first century.
The accompanying wave of innovation has sparked advances in
healthcare, engineering, business, science, and human perception,
among others. In this chapter we discuss big data and data science to
establish a context for the state-of-the-art technologies and
applications in this book. In addition, to provide a starting point for
new researchers, we present an overview of big data management and
analytics methods. Finally, we suggest opportunities for future
research.
1.1 Introduction
Big data and data science are transforming our world today in ways we
could not have imagined at the beginning of the twenty-first century.
Although the underlying enabling technologies were present in 2000—
cloud computing, data storage, internet connectivity, sensors, artificial intelligence, global positioning systems (GPS), CPU power, parallel
computing, machine learning—it took the acceleration, proliferation
and convergence of these technologies to make it possible to envision
and achieve massive storage and data analytics at scale. The
accompanying wave of innovation has sparked advances in healthcare,
engineering, business, science, and human perception, among others.
This book offers a snapshot of state-of-the-art technologies and
applications in data science that can provide a foundation for future
research and development.
‘Data science’ is a broad term that can be described as “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data” [20], p. 52, to inform decision making. Closely affiliated with data science is ‘data mining’, which can be defined as the process of extracting knowledge from large datasets by finding patterns, correlations and anomalies. Thus, data mining is often used to develop predictions of the future based on the past as interpreted from the data.
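To make pattern mining concrete, here is a minimal, self-contained Python sketch in the spirit of association-rule learning [1]; the transactions, item names and thresholds are invented purely for illustration, and the snippet performs only simple support/confidence counting rather than reproducing any particular published algorithm.

```python
from itertools import combinations
from collections import Counter

# Toy market-basket transactions (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

n = len(transactions)
item_counts = Counter(i for t in transactions for i in t)
pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))

# Report rules A -> B that clear minimum support and confidence thresholds.
for (a, b), c in pair_counts.items():
    support = c / n
    if support < 0.4:
        continue
    for lhs, rhs in ((a, b), (b, a)):
        confidence = c / item_counts[lhs]
        if confidence >= 0.6:
            print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```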
‘Big data’ make possible more refined predictions and non-obvious patterns due to a larger number of potential variables for prediction and more varied types of data. In general, ‘big data’ can be defined as having one or more of the characteristics of the 3 V’s of Volume, Velocity and Variety [19]. Volume refers to the massive amount of data; Velocity refers to the speed of data generation; Variety refers to the many types of data from structured to unstructured. Structured data are organized and can reside within a fixed field, while unstructured data do not have clear organizational patterns. For example, customer order history can be represented in a relational database, while multimedia files such as audio, video, and textual documents do not have formats that can be pre-defined. Semi-structured data such as email fall between these two, since there are tags or markers to separate semantic elements. In practice, for example, continual earth satellite imagery is big data with all 3 V’s, and it poses unique challenges to data scientists for knowledge extraction.
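The three degrees of structure can be illustrated in a few lines of Python; the values and file name below are hypothetical. The point is only that structured data carry a fixed schema, semi-structured data expose tags (here, email headers) around free text, and unstructured data remain opaque bytes until a model interprets them.

```python
import io
import pandas as pd
from email import message_from_string

# Structured: customer order history fits a fixed, tabular schema.
orders = pd.read_csv(io.StringIO(
    "customer_id,order_date,amount\n42,2021-03-01,19.99\n42,2021-03-15,5.49\n"
))
print(orders.dtypes)

# Semi-structured: headers act as tags separating semantic elements,
# while the body remains free text.
msg = message_from_string(
    "From: [email protected]\nSubject: Order status\n\nWhere is my parcel?"
)
print(msg["Subject"], "->", msg.get_payload())

# Unstructured: a media file has no pre-defined fields at all.
# audio_bytes = open("call_recording.wav", "rb").read()  # hypothetical file
```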
Besides data and methods to handle data, at least two other
ingredients are necessary for data science to yield valuable knowledge.
First, after potentially relevant data are collected from various sources,
data must be cleaned. Data cleaning or cleansing is the process of
detecting, correcting and removing inaccurate and irrelevant data
related to the problem to be solved. Sometimes new variables need to
be created or data put into a form suitable for analysis. Secondly, the
problem must be viewed from a “data-science perspective [of] …
structure and principles, which gives the data scientist a framework to
systematically treat problems of extracting useful knowledge from
data” [20]. Data visualization, domain knowledge for interpretation,
creativity, and sound decision making are all part of a data-science
perspective. Thus, advances in data science require the unique expertise of the authors whom we are proud to present in the following pages.
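As a concrete illustration of the cleaning step described above, the following sketch (pandas-based, with hypothetical column names and defects) removes duplicates, coerces types, drops impossible values, and derives a new variable suitable for analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical raw order data exhibiting typical defects.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "order_date": ["2021-01-03", "2021-01-03", "bad-date", "2021-02-10", None],
    "amount": [10.0, 10.0, -5.0, 200.0, 35.0],
})

clean = (
    raw.drop_duplicates()                      # remove exact duplicate records
       .assign(order_date=lambda d: pd.to_datetime(d["order_date"],
                                                   errors="coerce"))
       .dropna(subset=["order_date"])          # drop rows with unparseable dates
       .query("amount > 0")                    # remove impossible amounts
)

# New derived variable, often more suitable for analysis than the raw value.
clean["log_amount"] = np.log1p(clean["amount"])
print(clean)
```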
The chapters in this book are briefly summarized in Sect. 3 of this
article.
However, before proceeding with a description of the chapters, we
present an overview of big data management and analytics methods in
the following section. The purpose of this section is to provide an
overview of algorithms and techniques for data science to help place
the chapters in context and to provide a starting point for new
researchers who want to participate in this exciting field.
1.5 Conclusions
This chapter presented an overview of big data and data science to
provide a context for the chapters in this book. To provide a starting
point for new researchers, we also provided an overview of big data
management and analytics methods. Finally, we pointed out
opportunities for future research.
We want to sincerely thank the contributing authors for sharing
their deep research expertise and knowledge of data science. We also
thank the publishers and editors who helped us achieve this book. We
hope that both young and established researchers find inspiration in
these pages and, perhaps, connections to a new research stream in the
emerging and exciting field of data science.
Acknowledgements
The research leading to these results has received funding from the EU
H2020 research and innovation program under grant agreement N.
769872 (EMPATHIC) and N. 823907 (MENHIR), the project
SIROBOTICS that received funding from Italian MIUR, PNR 2015-2020,
D. D. 1735, 13/07/2017, and the project ANDROIDS funded by the
program V: ALERE 2019 Università della Campania “Luigi Vanvitelli”, D.
R. 906 del 4/10/2019, prot. n. 157264, 17/10/2019.
References
1. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large
databases. ACM SIGMOD Rec. 22, 207–216 (1993)
[Crossref]
2. Chong, A.Y.L., Li, B., Ngai, E.W.T., Ch’ng, E., Lee, F.: Predicting online product sales via online
reviews, sentiments, and promotion strategies: a big data architecture and neural network
approach. Int. J. Oper. Prod. Manag 36(4), 358–383 (2016)
[Crossref]
3. Cui, B., Mondal, A., Shen, J., Cong, G., Tan, K.L.: On effective e-mail classification via neural
networks. In: International Conference on Database and Expert Systems Applications (pp. 85–
94). Springer, Berlin, Heidelberg (2005, August)
4. Dang, T., Stasak, B., Huang, Z., Jayawardena, S., Atcheson, M., Hayat, M., Le, P., Sethu, V., Goecke,
R., Epps, J.: Investigating word affect features and fusion of probabilistic predictions
incorporating uncertainty in AVEC 2017. In: Proceedings of the 7th Annual Workshop on
Audio/Visual Emotion Challenge, Mountain View, CA. 27–35, (2017)
5. Epasto, A., Lattanzi, S., Mirrokni, V., Sebe, I.O., Taei, A., Verma, S.: Ego-net community mining
applied to friend suggestion. Proc. VLDB Endowment 9, 324–335 (2015)
[Crossref]
6. Erlandsson, F., Bródka, P., Borg, A., Johnson, H.: Finding influential users in social media using association rule learning. Entropy 18(164), 1–15 (2016). https://doi.org/10.3390/e1805016
7. Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 40(2), 121–144 (2010)
8. Gandomi, A., Haider, M.: Beyond the hype: big data concepts, methods, and analytics. Int. J. Inf.
Manage. 35, 137–144 (2015)
[Crossref]
9. Gong, Y., Poellabauer, C.: Topic modeling based on multi-modal depression detection. In:
Proceeding of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View,
CA, pp. 69–76, (2017)
10. Güneş, I., Gündüz-Öğüdücü, Ş., Çataltepe, Z.: Link prediction using time series of neighborhood-based node similarity scores. Data Min. Knowl. Disc. 30, 147–180 (2016)
[MathSciNet][Crossref]
11. Gupta, B., Rawat, A., Jain, A., Arora, A., Dhami, N.: Analysis of various decision tree algorithms for classification in data mining. Int. J. Comput. Appl. 163(8), 15–19 (2017)
12. Koc, Y., Eyduran, E., Akbulut, O.: Application of regression tree method for different data from
animal science. Pakistan J. Zool. 49(2), 599–607 (2017)
[Crossref]
13. Linden, A., Yarnold, P.R.: Modeling time-to-event (survival) data using classification tree analysis. J. Eval. Clin. Pract. 23(6), 1299–1308 (2017)
[Crossref]
14. Liu, C., Wang, J., Zhang, H., Yin, M.: Mapping the hierarchical structure of the global shipping
network by weighted ego network analysis. Int. J. Shipping Transp. Logistics 10, 63–86
(2018)
[Crossref]
15. Mowlaei, M.F., Abadeh, M.S., Keshavarz, H.: Aspect-based sentiment analysis using adaptive
aspect-based lexicons. Expert Syst. Appl. 148, 113234 (2020)
16. Nisbet R., Elder J., Miner G.: The three most common data mining software tools. In: Handbook
of Statistical Analysis and Data Mining Applications, Chapter 10, pp. 197–234, (2009)
17. Pang-Ning T., Steinbach M., Vipin K.: Association analysis: basic concepts and algorithms. In:
Introduction to Data Mining, Chap. 6, Addison-Wesley, pp. 327–414, (2005). ISBN 978-0-321-
32136-7
18. Park, S., Lee, J., Kim, K.: Semi-supervised distributed representations of documents for
sentiment analysis. Neural Networks 119, 139–150 (2019)
[Crossref]
19. Phillips-Wren G., Iyer L., Kulkarni U., Ariyachandra T.: Business analytics in the context of big
data: a roadmap for research. Commun. Assoc. Inf. Syst. 37, 23 (2015)
20. Provost, F., Fawcett, T.: Data science and its relationship to big data and data-driven decision
making. Big Data 1(1), 51–59 (2013)
[Crossref]
21. Rout, J.K., Choo, K.K.R., Dash, A.K., Bakshi, S., Jena, S.K., Williams, K.L.: A model for sentiment
and emotion analysis of unstructured social media text. Electron. Commer. Res. 18(1), 181–
199 (2018)
[Crossref]
22. Tiefenbacher K., Olbrich S.: Applying big data-driven business work schemes to increase
customer intimacy. In: Proceedings of the International Conference on Information Systems,
Transforming Society with Digital Innovation, (2017)
23. Tsai, C.-F., Eberle, W., Chu, C.-Y.: Genetic algorithms in feature and instance selection. Knowl. Based Syst. 39, 240–247 (2013)
[Crossref]
24. Yadav, A., Jha, C.K., Sharan, A., Vaish, V.: Sentiment analysis of financial news using unsupervised approach. Procedia Comput. Sci. 167, 589–598 (2020)
[Crossref]
25. Zheng, L., Hongwei, W., Song, G.: Sentimental feature selection for sentiment analysis of
Chinese online reviews. Int. J. Mach. Learn. Cybernet. 9(1), 75–84 (2018)
[Crossref]
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_2
Giovanni Diraco
Email: [email protected]
Abstract
Nowadays, smart living technologies are increasingly used to support
older adults so that they can live independently for longer with minimal support from caregivers. In this regard, there is a demand for technological solutions able to spare caregivers a continuous, daily check of the care recipient. In the age of big data, sensor data collected by smart-living environments are constantly increasing in the dimensions of volume, velocity and variety, enabling continuous monitoring of the elderly with the aim of notifying caregivers of gradual behavioral changes and/or detectable anomalies (e.g., illnesses, wanderings, etc.). The aim of this study is to compare the main state-of-the-art approaches for abnormal behavior detection based on change prediction that are suitable for big data. Some of the main challenges are the lack of “real” data for model training and the lack of regularity in the everyday life of the care recipient. For this purpose, specific synthetic data are generated, including activities of daily living,
home locations in which such activities take place, as well as
physiological parameters. All techniques are evaluated in terms of
abnormality-detection performance and lead-time of prediction, using
the generated datasets with various kinds of perturbation. The
achieved results show that unsupervised deep-learning techniques outperform traditional supervised/semi-supervised ones, with a detection accuracy greater than 96% and a prediction lead-time of about 14 days.
2.1 Introduction
Sensing and assisted living technologies available nowadays, installed in smart-living environments, are able to collect huge amounts of data over days, months and even years, yielding meaningful information useful for the early detection of changes in behavioral and/or physical state that, if left undetected, may pose a high risk to frail subjects (e.g., elderly or disabled people) whose health conditions are liable to change. Early detection, indeed, makes it possible to alert relatives, caregivers, or health-care personnel in advance when significant changes or anomalies are detected, and above all before critical levels are reached. The “big” data collected from smart homes therefore offer a significant opportunity to assist people through early recognition of symptoms that might lead to more serious disorders, and so to help prevent chronic diseases. The huge amounts of data collected by
different devices require automated analysis, and thus it is of great
interest to investigate and develop automatic systems for detecting
abnormal activities and behaviors in the context of elderly monitoring
[1] and smart living [2] applications.
Moreover, long-term health monitoring and assessment can benefit from knowledge held in long-term time series of daily activities
and behaviors as well as physiological parameters [3]. From the big
data perspective, the main challenge is to process and automatically
interpret—obtaining quality information—the data generated, at high
velocity (i.e., high sample rate) and volume (i.e., long-term datasets), by
a great variety of devices and sensors (i.e., structural heterogeneity of
datasets), becoming more common with the rapid advance of both
wearable and ambient sensing technologies [4].
A lot of research has been done in the general area of human behavior understanding, and more specifically in the area of daily activity/behavior recognition and its classification as normal or abnormal [5, 6]. However, very little work is reported in the literature regarding the evaluation of machine learning (ML) techniques suitable for data analytics in the context of long-term elderly monitoring in smart living environments. The purpose of this paper is to conduct a preliminary study of the most representative machine/deep learning techniques, comparing them on the detection of abnormal behaviors and on change prediction (CP).
The rest of this paper is organized as follows. Section 2.2 covers related work, background and the state of the art in abnormal activity and behavior detection and CP, with special attention paid to elderly monitoring through big data collection and analysis. Section 2.3 describes the materials and methods used in this study, providing an overview of the system architecture, long-term data generation and the compared ML techniques. The findings and related discussion are presented in Sect. 2.4. Finally, Sect. 2.5 draws some conclusions and final remarks.
Activity of daily living (ADL) | Home location (LOC) | Heart-rate level (HRL)
Eating (AE)                    | Bedroom (BR)        | Very low (VL) [<50 beats/min]
Housekeeping (AH)              | Kitchen (KI)        | Low (LO) [65–80 beats/min]
Physical exercise (AP)         | Living room (LR)    | Medium (ME) [80–95 beats/min]
Resting (AR)                   | Toilet (TO)         | High (HI) [95–110 beats/min]
Sleeping (AS)                  |                     | Very high (VH) [>110 beats/min]
Toileting (AT)                 |                     |
The HMM output is a sequence of triples $o_t = (a_t, l_t, h_t)$, where $a_t \in \mathcal{A} = \{\mathrm{AE, AH, AP, AR, AS, AT}\}$, $l_t \in \mathcal{L} = \{\mathrm{BR, KI, LR, TO}\}$ and $h_t \in \mathcal{H} = \{\mathrm{VL, LO, ME, HI, VH}\}$ represent, respectively, all possible ADLs, home locations and HR levels (see Table 2.2). In general, a state can produce a triple from a distribution over all possible triples. Hence, the probability that the triple $(a, l, h)$ is emitted when the system is in state $s_j$ (with $q_t$ denoting the state active at time $t$), i.e., the so-called emission probability, is defined as follows:

$b_j(a, l, h) = P\big(o_t = (a, l, h) \mid q_t = s_j\big)$  (2.7)
Since the HMM does not represent the temporal dependency of activity states, a hierarchical approach is proposed here by subdividing the day into N time intervals and modeling the activities in each time interval with a dedicated HMM sub-model, namely M1, M2, …, MN, as depicted in Fig. 2.2. For each sub-model Mi, the first state being activated starts at a time Ti modeled as a Gaussian process, while the other states within the same sub-model Mi start in consecutive time slots whose durations are also modeled as Gaussian processes.
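A minimal generative sketch of this idea is given below; the two-state sub-model, its factorized emission distributions and all numeric parameters are invented for illustration and do not reproduce the chapter's actual model, which emits joint (ADL, LOC, HRL) triples.

```python
import numpy as np

rng = np.random.default_rng(0)

ADLS = ["AE", "AH", "AP", "AR", "AS", "AT"]   # codes from Table 2.2
LOCS = ["BR", "KI", "LR", "TO"]
HRLS = ["VL", "LO", "ME", "HI", "VH"]

# One illustrative sub-model M_i ("early morning") with two states.
states = [
    {"dur": (1.0, 0.2),                       # Gaussian duration (mean, std), hours
     "adl": [0, 0, 0, 0.1, 0.9, 0], "loc": [0.9, 0, 0.1, 0],
     "hrl": [0.3, 0.6, 0.1, 0, 0]},           # mostly sleeping in the bedroom
    {"dur": (0.5, 0.1),
     "adl": [0.9, 0, 0, 0.1, 0, 0], "loc": [0, 0.9, 0.1, 0],
     "hrl": [0, 0.3, 0.6, 0.1, 0]},           # mostly eating in the kitchen
]

t = rng.normal(6.0, 0.3)                      # Gaussian start time T_i, in hours
day = []
for s in states:
    dur = max(1 / 12, rng.normal(*s["dur"]))  # Gaussian duration, >= one slot
    for _ in range(round(dur * 12)):          # 12 five-minute slots per hour
        triple = (rng.choice(ADLS, p=s["adl"]),
                  rng.choice(LOCS, p=s["loc"]),
                  rng.choice(HRLS, p=s["hrl"]))
        day.append((round(t, 2), *triple))
        t += 1 / 12
print(day[:2], "...", day[-1])
```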
Fig. 2.2 State diagram of the suggested hierarchical HMM, able to model the temporal
dependency of daily activities
Fig. 2.3 Example of normal dataset, represented as an image of 365 × 288 pixels and 120 levels
Fig. 2.4 Same normal dataset shown in Fig. 2.3 but represented with different images for a ADLs,
b LOCs and c HRLs
Fig. 2.5 Example of abnormal dataset, due to change in “Starting time of activity” (St). The change gradually takes place from the 90th day on
Fig. 2.6 Example of abnormal dataset, due to change in “Duration of activity” (Du). The change gradually takes place from the 90th day on
Fig. 2.7 Example of abnormal dataset, due to change in “Disappearing of activity” (Di). The change gradually takes place from the 90th day on
Fig. 2.8 Example of abnormal dataset, due to “Swap of two activities” (Sw). The change gradually takes place from the 90th day on
Fig. 2.9 Example of abnormal dataset, due to change in “Location of activity” (Lo). The change gradually takes place from the 90th day on
Fig. 2.10 Example of abnormal dataset, due to change in “Heart-rate during activity” (Hr). The change gradually takes place from the 90th day on
Fig. 2.11 Workflow of supervised and semi-supervised detection methods. Both normal and
abnormal labels are needed in the supervised training phase, whereas only normal labels are
required in the semi-supervised training
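Since one year of 5-minute samples gives a 365 × 288 grid, and 6 ADLs × 4 LOCs × 5 HRLs = 120 joint codes, the 120 intensity levels of Figs. 2.3 and 2.4 plausibly correspond to a joint encoding of each triple. The sketch below implements such an encoding under that assumption; it is not necessarily the chapter's exact mapping.

```python
import numpy as np

ADLS = ["AE", "AH", "AP", "AR", "AS", "AT"]        # 6 activities
LOCS = ["BR", "KI", "LR", "TO"]                    # 4 locations
HRLS = ["VL", "LO", "ME", "HI", "VH"]              # 5 heart-rate levels

def code(a, l, h):
    """Joint code in [0, 120): 6 x 4 x 5 = 120 levels."""
    return (ADLS.index(a) * len(LOCS) + LOCS.index(l)) * len(HRLS) + HRLS.index(h)

# year[d][s] = (adl, loc, hrl) for day d and 5-minute slot s (288 per day).
# A constant triple is used here purely to exercise the encoding.
year = [[("AS", "BR", "VL")] * 288 for _ in range(365)]

img = np.array([[code(*t) for t in day] for day in year], dtype=np.uint8)
print(img.shape)   # (365, 288): one behaviour-year as a single image
```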
From Table 2.3, it is evident that with the change type Sw there are only small differences in detection accuracy, which become more marked with other kinds of change such as Lo and Hr. In particular, the supervised techniques exhibit poor detection accuracy with change types such as Lo and Hr, while the semi-supervised and unsupervised techniques based on DL maintain good performance for those change types as well. Similar considerations can be drawn by observing the other performance metrics in Tables 2.4, 2.5, 2.6 and 2.7.
The change types Lo (Fig. 2.9) and Hr (Fig. 2.10) influence only a narrow region of the intensity values. More specifically, only location values (Fig. 2.9b) are affected in Lo-type datasets, and only heart-rate values (Fig. 2.10b) in the Hr case. On the other hand, other change types like Di (Fig. 2.7) or Sw (Fig. 2.8) involve all values, i.e., ADL, LOC and HRL, and so they are simpler to detect and predict. However, the ability of DL techniques to capture spatio-temporal local features (i.e., spatio-temporal relations between activities) allowed good performance to be achieved even with change types whose intensity variations were confined to narrow regions.
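The semi-supervised and unsupervised detectors share the reconstruction-error principle: a model trained only on normal data reconstructs abnormal inputs poorly. The following sketch shows this with a small stacked autoencoder in PyTorch; the random tensors stand in for real day rows of the behaviour image, and the architecture and threshold rule are illustrative assumptions, not the chapter's exact SAE.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Each example is one day: 288 five-minute slots, codes scaled to [0, 1].
normal_days = torch.rand(500, 288)           # stand-in for normal behaviour data

model = nn.Sequential(                        # small stacked autoencoder
    nn.Linear(288, 64), nn.ReLU(),
    nn.Linear(64, 16), nn.ReLU(),             # bottleneck of "normal" features
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 288), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                          # unsupervised: normal days only
    opt.zero_grad()
    loss = loss_fn(model(normal_days), normal_days)
    loss.backward()
    opt.step()

with torch.no_grad():
    err = ((model(normal_days) - normal_days) ** 2).mean(dim=1)
threshold = err.mean() + 3 * err.std()        # days above this are flagged
print(float(threshold))
```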
The prediction lead-times reported in Table 2.8 were obtained at the operating points corresponding to the performance metrics discussed above and reported in Tables 2.3, 2.4, 2.5, 2.6 and 2.7. In other words, such times refer to the average number of days before day 180 (since from this day on the new behavior becomes stable) at which the change can be detected with the performance reported in Tables 2.3, 2.4, 2.5, 2.6 and 2.7. The longer the lead-time of prediction, the earlier the change can be predicted. Also in this case, better lead-times were achieved with the change types Di and Sw (i.e., those characterized by wider regions of intensity variation) and with the SAE and DC techniques, since these are able to learn discriminative features more effectively than traditional ML techniques.
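Given a per-day abnormality flag from any detector, the lead-time can be computed as in this small sketch; the flag series is invented so that the result matches the roughly 14-day lead-time reported in the abstract.

```python
import numpy as np

def lead_time(daily_flags, change_complete_day=180):
    """Days before `change_complete_day` (when the new behavior is stable)
    at which the gradual change is first flagged."""
    flags = np.asarray(daily_flags, dtype=bool)
    flagged_days = np.flatnonzero(flags) + 1           # 1-based day indices
    early = flagged_days[flagged_days <= change_complete_day]
    return int(change_complete_day - early[0]) if early.size else 0

# Toy run: a change starting around day 90 is first flagged on day 166.
flags = [False] * 165 + [True] * 200
print(lead_time(flags))   # 14
```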
2.5 Conclusions
The contribution of this study is twofold. First, a common data model is presented that can simultaneously represent and process ADLs, the home locations in which such ADLs take place (LOCs), and physiological parameters (HRLs) as image data. Second, the performance of state-of-the-art ML-based and DL-based detection techniques is evaluated on synthetically generated big data sets that include both normal and abnormal behaviors. The achieved results are promising and show the superiority of DL-based techniques in dealing with big data characterized by different kinds of data distribution. Future and ongoing activities focus on evaluating the prescriptive capabilities of big data analytics, aiming to optimize the time and resources involved in elderly monitoring applications.
References
1. Gokalp, H., Clarke, M.: Monitoring activities of daily living of the elderly and the potential for its use in telecare and telehealth: a review. Telemed. e-Health 19(12), 910–923 (2013)
[Crossref]
2. Sharma, R., Nah, F., Sharma, K., Katta, T., Pang, N., Yong, A.: Smart living for elderly: design and human-computer interaction considerations. Lect. Notes Comput. Sci. 9755, 112–122 (2016)
[Crossref]
3. Rashidi, P., Mihailidis, A.: A survey on ambient-assisted living tools for older adults. IEEE J.
Biomed. Health Informat. 17(3), 579–590 (2013)
[Crossref]
4. Vimarlund, V., Wass, S.: Big data, smart homes and ambient assisted living. Yearb. Med. Inform. 9(1), 143–149 (2014)
5. Mabrouk, A.B., Zagrouba, E.: Abnormal behavior recognition for intelligent video surveillance
systems: a review. Expert Syst. Appl. 91, 480–491 (2018)
[Crossref]
6. Bakar, U., Ghayvat, H., Hasan, S.F., Mukhopadhyay, S.C.: Activity and anomaly detection in
smart home: a survey. Next Generat. Sens. Syst. 16, 191–220 (2015)
[Crossref]
7. Diraco, G., Leone, A., Siciliano, P., Grassi, M., Malcovati, P.: A multi-sensor system for fall detection in ambient assisted living contexts. In: IEEE SENSORNETS, pp. 213–219 (2012)
8. Taraldsen, K., Chastin, S.F.M., Riphagen, I.I., Vereijken, B., Helbostad, J.L.: Physical activity
monitoring by use of accelerometer-based body-worn sensors in older adults: a systematic
literature review of current knowledge and applications. Maturitas 71(1), 13–19 (2012)
[Crossref]
9. Min, C., Kang, S., Yoo, C., Cha, J., Choi, S., Oh, Y., Song, J.: Exploring current practices for battery
use and management of smartwatches. In: Proceedings of the 2015 ACM International
Symposium on Wearable Computers, pp. 11–18, September (2015)
10. Stara, V., Zancanaro, M., Di Rosa, M., Rossi, L., Pinnelli, S.: Understanding the interest toward
smart home technology: the role of utilitaristic perspective. In: Italian Forum of Ambient
Assisted Living, pp. 387–401. Springer, Cham (2018)
11. Droghini, D., Ferretti, D., Principi, E., Squartini, S., Piazza, F.: A combined one-class SVM and template-matching approach for user-aided human fall detection by means of floor acoustic features. In: Computational Intelligence and Neuroscience (2017)
12. Hussmann, S., Ringbeck, T., Hagebeuker, B.: A performance review of 3D TOF vision systems in
comparison to stereo vision systems. In: Stereo Vision. InTech (2008)
13. Diraco, G., Leone, A., Siciliano, P.: Radar sensing technology for fall detection under near real-
life conditions. In: IET Conference Proceedings, pp. 5–6 (2016)
14. Lazaro, A., Girbau, D., Villarino, R.: Analysis of vital signs monitoring using an IR-UWB radar.
Progress Electromag. Res. 100, 265–284 (2010)
[Crossref]
15. Diraco, G., Leone, A., Siciliano, P.: A radar-based smart sensor for unobtrusive elderly
monitoring in ambient assisted living applications. Biosensors 7(4), 55 (2017)
16. Dong, H., Evans, D.: Data-fusion techniques and its application. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 2, pp. 442–445. IEEE (2007)
17. Caroppo, A., Diraco, G., Rescio, G., Leone, A., Siciliano, P.: Heterogeneous sensor platform for circadian rhythm analysis. In: IEEE International Workshop on Advances in Sensors and Interfaces, 10 August 2015, pp. 187–192 (2015)
18. Miao, Y., Song, J.: Abnormal event detection based on SVM in video surveillance. In: IEEE
Workshop on Advance Research and Technology in Industry Applications, pp. 1379–1383
(2014)
19. Forkan, A.R.M., Khalil, I., Tari, Z., Foufou, S., Bouras, A.: A context-aware approach for long-
term behavioural change detection and abnormality prediction in ambient assisted living.
Pattern Recogn. 48(3), 628–641 (2015)
[Crossref]
20. Hejazi, M., Singh, Y.P.: One-class support vector machines approach to anomaly detection. Appl. Artif. Intell. 27(5), 351–366 (2013)
[Crossref]
21. Otte, F.J.P., Rosales Saurer, B., Stork, W.: Unsupervised learning in ambient assisted living for pattern and anomaly detection: a survey. In: Communications in Computer and Information Science 413 CCIS, pp. 44–53 (2013)
22. Chen, X.W., Lin, X.: Big data deep learning: challenges and perspectives. IEEE Access 2, 514–
525 (2014)
[Crossref]
23. Ribeiro, M., Lazzaretti, A.E., Lopes, H.S.: A study of deep convolutional auto-encoders for
anomaly detection in videos. Pattern Recogn. Lett. 105, 13–22 (2018)
[Crossref]
24. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
25. Krizhevsky, A., Hinton, G.E.: Using very deep autoencoders for content-based image retrieval.
In: ESANN, April (2011)
26. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: International Conference on Artificial Neural Networks, pp. 52–59. Springer, Berlin, Heidelberg (2011)
27. Guo, X., Liu, X., Zhu, E., Yin, J.: Deep clustering with convolutional autoencoders. In:
International Conference on Neural Information Processing, November, pp. 373–382. Springer
(2017)
28. Diraco, G., Leone, A., Siciliano, P.: Big data analytics in smart living environments for elderly
monitoring. In: Italian Forum of Ambient Assisted Living Proceedings, pp. 301–309. Springer
(2018)
29. Cheng, H., Liu, Z., Zhao, Y., Ye, G., Sun, X.: Real world activity summary for senior home monitoring. Multimedia Tools Appl. 70(1), 177–197 (2014)
[Crossref]
30. Almas, A., Farquad, M.A.H., Avala, N.R., Sultana, J.: Enhancing the performance of decision tree:
a research study of dealing with unbalanced data. In: Seventh International Conference on
Digital Information Management, pp. 7–10. IEEE ICDIM (2012)
31. Hu, W., Liao, Y., Vemuri, V.R.: Robust anomaly detection using support vector machines. In:
Proceedings of the International Conference on Machine Learning, pp. 282–289 (2003)
32. Pradhan, M., Pradhan, S.K., Sahu, S.K.: Anomaly detection using artificial neural network. Int. J.
Eng. Sci. Emerg. Technol. 2(1), 29–36 (2012)
33. Sabokrou, M., Fayyaz, M., Fathy, M., Moayed, Z., Klette, R.: Deep-anomaly: fully convolutional
neural network for fast anomaly detection in crowded scenes. Comput. Vis. Image Underst.
172, 88–97 (2018)
[Crossref]
34. Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010)
35. De Amorim, R.C., Mirkin, B.: Minkowski metric, feature weighting and anomalous cluster
initializing in K-Means clustering. Pattern Recogn. 45(3), 1061–1075 (2012)
[Crossref]
36. Chiang, M.M.T., Mirkin, B.: Intelligent choice of the number of clusters in k-means clustering:
an experimental study with different cluster spreads. J. Classif. 27(1), 3–40 (2010)
[MathSciNet][Crossref]
37. Varewyck, M., Martens, J.P.: A practical approach to model selection for support vector
machines with a Gaussian kernel. IEEE Trans. Syst. Man Cybernet., Part B (Cybernetics) 41(2),
330–340 (2011)
39. Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I.J., Bergeron, A., Bouchard, N.,
Bengio, Y.: Theano: new features and speed improvements. In: Deep Learning and
Unsupervised Feature Learning NIPS Workshop (2012)
40. Matlab R2014, The MathWorks, Inc., Natick, MA, USA. https://it.mathworks.com
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_3
Abstract
Affective computing has been an active area of research for the past two decades. One of the major components of affective computing is automatic emotion recognition. This chapter gives a detailed overview
of different emotion recognition techniques and the predominantly
used signal modalities. The discussion starts with the different emotion
representations and their limitations. Given that affective computing is
a data-driven research area, a thorough comparison of standard
emotion labelled databases is presented. Based on the source of the
data, feature extraction and analysis techniques are presented for
emotion recognition. Further, applications of automatic emotion
recognition are discussed along with current and important issues such
as privacy and fairness.
3.1 Introduction to Emotion Recognition
Understanding one’s emotional state is a vital step in day-to-day communication. It is interesting to note that human beings are able to interpret others’ emotions with great ease using different cues such as facial movements, speech and gesture. Analyzing emotions helps one to understand others’ state of mind. Emotional state information is used for intelligent Human Computer/Robot Interaction (HCI/HRI) and for efficient, productive and safe human-centered interfaces. The information about the emotional state of a person can also be used to enhance the learning environment so that students can learn better from their teacher. Such information is also found to be beneficial in surveillance, where the overall mood of a group can be detected to prevent destructive events [47].
The term emotion is often used interchangeably with affect. Thoits [133] argued that affect is a non-conscious evaluation of an emotional event, whereas emotion is a culturally biased reaction to a particular affect. Emotion is an ambiguous term, as it has different interpretations in different domains like psychology, cognitive science and sociology. Relevant to affective computing, emotion can be explained as a combination of three components: subjective experience, which is biased towards a subject; emotion expressions, which include all visible cues like facial expressions, speech patterns, posture and body gesture; and physiological response, which is the reaction of a person’s nervous system during an emotion [5, 133].
A basic cue for identifying a person’s emotional state is to detect his/her facial expressions. Various psychological theories help one to understand a person’s emotion from their facial expressions. The introduction of the Facial Action Coding System (FACS) [44] has helped researchers to understand the relationship between facial muscles and facial expressions. For example, one can distinguish two different types of smiles using this coding system. After years of research in this area, it has become possible to identify facial expressions with great accuracy. Still, a question arises: are expressions alone sufficient to identify emotions? Some people are good at concealing their emotions. It is easier to identify an expression; it is much harder to understand a person’s emotion, i.e. the state of mind, or what a person is actually feeling.
Along with facial expressions, we humans also rely on other non-verbal cues such as gestures and on verbal cues such as speech. In the affective computing community, along with the analysis of facial expressions, researchers have also used speech properties like pitch and volume, physiological signals like Electroencephalogram (EEG) signals, heart rate, blood pressure and pulse rate, and the flow of words in written text to understand a person’s affect with more accuracy. Hence, the use of different modalities can improve a machine’s ability to identify emotions, similar to how human beings perform the task.
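A common way to combine modalities is late (decision-level) fusion of per-modality posteriors, as in the sketch below, which averages them with validation-tuned weights. All probabilities, weights and the emotion set are hypothetical.

```python
import numpy as np

# Hypothetical posteriors over {angry, happy, neutral, sad} for one sample,
# as produced by separate face, speech, and EEG classifiers.
p_face   = np.array([0.10, 0.70, 0.15, 0.05])
p_speech = np.array([0.20, 0.40, 0.30, 0.10])
p_eeg    = np.array([0.15, 0.50, 0.20, 0.15])

weights = np.array([0.5, 0.3, 0.2])          # e.g., tuned on a validation set
fused = weights @ np.stack([p_face, p_speech, p_eeg])

emotions = ["angry", "happy", "neutral", "sad"]
print(emotions[int(fused.argmax())], fused.round(3))
```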
The area of affective computing, though not very old, has seen a sharp increase in the number of contributing researchers. This impact is due to the interest in developing human-centered artificial intelligence, which is in high demand these days. Various emotion-based challenges are organized by researchers, such as Aff-Wild [152], the Audio/Visual Emotion Challenge (AVEC) [115] and Emotion Recognition in the Wild (EmotiW) [33]. These challenges provide an opportunity for researchers to benchmark their automatic methods against prior work and each other.
3.2.4 Micro-Expressions
Apart from understanding facial expressions and AUs for emotion detection, there exists another line of work, which focuses on the subtle and brief facial movements present in a video that are difficult for humans to recognise. Such facial movements are termed micro-expressions, as they last less than approximately 500 ms, compared to normal facial expressions (macro-expressions), which may last for a second [150]. The concept of micro-expressions was introduced by Haggard and Isaacs [53], and it has gained much attention because micro-expressions are involuntary acts that are difficult to control voluntarily.
3.4 Challenges
As the domain of emotion recognition has a high number of possible applications, research is ongoing to make the process more automatic and applied. Thanks to the adoption of benchmarking challenges such as Aff-Wild, AVEC and EmotiW, a few obstacles are being successfully addressed. Major challenges are discussed below.
Data Driven—Currently, the success of emotion recognition techniques is partly due to the advancements of different deep neural networks, which have made it possible to extract complex and discriminative information. However, neural networks require a large amount of data to learn useful representations for any given task. For automatic emotion recognition, obtaining data corresponding to real-world emotions is non-trivial; one may record a person’s facial expressions or speech to some extent, although these expressions may differ between real and faked emotions. For many years, posed facial expressions of professional actors were used to train models; however, such models perform poorly when applied to data from real-world settings. Currently, many databases exist that contain spontaneous audio-visual emotions. Most of these temporal databases are limited in size and in the number of samples per emotion category. It is non-trivial to create a balanced database, since some emotions, like fear and disgust, are more difficult to induce than happy and angry; a simple countermeasure is sketched below.
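The class-weighting sketch referenced above: a simple inverse-frequency heuristic that up-weights rare emotion classes in the training loss. The label counts are invented, but real corpora are often skewed in exactly this way.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-class loss weights: rare classes (e.g., fear, disgust) receive
    larger weights so an imbalanced corpus does not dominate training."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = counts.sum() / (len(classes) * counts)   # "balanced" heuristic
    return dict(zip(classes.tolist(), weights.round(3).tolist()))

# Hypothetical label distribution of a spontaneous-emotion corpus.
labels = ["happy"] * 500 + ["angry"] * 300 + ["fear"] * 40 + ["disgust"] * 20
print(inverse_frequency_weights(labels))
# {'angry': 0.717, 'disgust': 10.75, 'fear': 5.375, 'happy': 0.43}
```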
Table 3.1 Comparison of commonly used emotion detection databases. Online readers can access
the website of these databases by clicking on the name of the database for more information.
Number of samples for text databases is in words. Number of samples in each database is an
approximate count
Feature        | AAM [24] | LBP [2]       | LPQ [106]     | HOG [27]     | PHOG [14] | SIFT [87] | Gabor
Geo/App        | Geo.     | App.          | App.          | App.         | App.      | App.      | App.
Temporal       | –        | LBP-TOP [159] | LPQ-TOP [65]  | HOG-TOP [21] | –         | –         | Motion energy [146]
Local/Holistic | Global   | Local         | Local         | Local        | Local     | Local     | Holistic
Studies        | [142]    | [149, 158]    | [28, 30]      | [126]        | [30, 126] | [126]     | [107]
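To make one of these descriptors concrete, the sketch below extracts a uniform LBP [2] histogram with scikit-image; the face crop is random stand-in data, and in practice the histogram would be computed over a detected, aligned face region.

```python
import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in crop

P, R = 8, 1                                    # 8 neighbours on a radius-1 circle
lbp = local_binary_pattern(face, P, R, method="uniform")

# Uniform LBP with P=8 yields P + 2 = 10 distinct codes; the normalized
# histogram is the texture descriptor fed to a classifier (e.g., an SVM).
hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
print(hist.shape)   # (10,)
```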
3.13 Conclusion
The progress of deep learning based methods has changed the way automatic emotion recognition methods work. However, it is important to understand the different ways of extracting features in order to create a suitable model for emotion detection. Advancements in face detection, face tracking and facial landmark prediction have made it possible to preprocess the data efficiently. Feature extraction methods for visual, speech, text and physiological data can easily be used in real time. Both deep learning and traditional machine learning methods have been used successfully to learn emotion-specific information, depending on the complexity of the available data. All these techniques have greatly improved the emotion detection process over the last decade. The current challenge lies in making the process more generalized, so that machines can identify emotions on par with humans. Ethics related to affect prediction need to be defined and followed, so that automatic emotion recognition systems do not compromise human sentiment and privacy.
References
1. Agrafioti, F., Hatzinakos, D., Anderson, A.K.: ECG pattern analysis for emotion detection. IEEE
Trans. Affect. Comput. 3(1), 102–115 (2012)
2. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
[zbMATH]
3. Alarcao, S.M., Fonseca, M.J.: Emotions recognition using EEG signals: a survey. IEEE Trans.
Affect. Comput. (2017)
4. Albanie, S., Nagrani, A., Vedaldi, A., Zisserman, A.: Emotion recognition in speech using cross-modal transfer in the wild. arXiv preprint arXiv:1808.05561 (2018)
5. Ali, M., Mosa, A.H., Al Machot, F., Kyamakya, K.: Emotion recognition involving physiological
and speech signals: a comprehensive review. In: Recent Advances in Nonlinear Dynamics and
Synchronization, pp. 287–302. Springer (2018)
6. Asghar, N., Poupart, P., Hoey, J., Jiang, X., Mou, L.: Affective neural response generation. In:
European Conference on Information Retrieval, pp. 154–166. Springer (2018)
7. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Incremental face alignment in the wild. In:
Computer Vision and Pattern Recognition, pp. 1859–1866. IEEE (2014)
8. Bachorowski, J.A.: Vocal expression and perception of emotion. Curr. Direct. Psychol. Sci.
8(2), 53–57 (1999)
9. Baltrusaitis, T., Zadeh, A., Lim, Y.C., Morency, L.P.: Openface 2.0: Facial behavior analysis
toolkit. In: 13th International Conference on Automatic Face & Gesture Recognition (FG
2018), pp. 59–66. IEEE (2018)
10. Bänziger, T., Mortillaro, M., Scherer, K.R.: Introducing the Geneva multimodal expression
corpus for experimental research on emotion perception. Emotion 12(5), 1161 (2012)
11. Barber, S.J., Lee, H., Becerra, J., Tate, C.C.: Emotional expressions affect perceptions of
younger and older adults’ everyday competence. Psychol. Aging 34(7), 991 (2019)
12. Basbrain, A.M., Gan, J.Q., Sugimoto, A., Clark, A.: A neural network approach to score fusion
for emotion recognition. In: 10th Computer Science and Electronic Engineering (CEEC), pp.
180–185 (2018)
13. Batliner, A., Hacker, C., Steidl, S., Nöth, E., D’Arcy, S., Russell, M.J., Wong, M.: “You Stupid Tin
Box” Children Interacting with the AIBO Robot: A Cross-linguistic Emotional Speech Corpus.
Lrec (2004)
14. Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: 6th ACM International Conference on Image and Video Retrieval, pp. 401–408. ACM (2007)
15. Bou-Ghazale, S.E., Hansen, J.H.: A comparative study of traditional and newly proposed
features for recognition of speech under stress. IEEE Trans. Speech Audio Process. 8(4),
429–442 (2000)
16. Busso, C., Bulut, M., Lee, C.C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J.N., Lee, S.,
Narayanan, S.S.: IEMOCAP: interactive emotional dyadic motion capture database. Lang.
Resour. Eval. 42(4), 335 (2008)
17. Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzadeh, A., Lee, S., Neumann, U.,
Narayanan, S.: Analysis of emotion recognition using facial expressions, speech and
multimodal information. In: 6th International Conference on Multimodal Interfaces, pp. 205–
211. ACM (2004)
18. Busso, C., Parthasarathy, S., Burmania, A., AbdelWahab, M., Sadoughi, N., Provost, E.M.: MSP-
IMPROV: an acted corpus of dyadic interactions to study emotion perception. IEEE Trans.
Affect. Comput. 8(1), 67–80 (2017)
19. Cairns, D.A., Hansen, J.H.: Nonlinear analysis and classification of speech under stressed
conditions. J. Acoust. Soc. Am. 96(6), 3392–3400 (1994)
20. Cambria, E.: Affective computing and sentiment analysis. Intell. Syst. 31(2), 102–107 (2016)
21. Chen, J., Chen, Z., Chi, Z., Fu, H.: Dynamic texture and geometry features for facial expression
recognition in video. In: International Conference on Image Processing (ICIP), pp. 4967–
4971. IEEE (2015)
22. Chen, W., Picard, R.W.: Eliminating physiological information from facial videos. In: 12th
International Conference on Automatic Face and Gesture Recognition (FG 2017), pp. 48–55.
IEEE (2017)
23. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.:
Learning phrase representations using RNN encoder-decoder for statistical machine
translation. arXiv preprint arXiv:1406.1078 (2014)
24. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
25. Correa, J.A.M., Abadi, M.K., Sebe, N., Patras, I.: AMIGOS: A dataset for affect, personality and
mood research on individuals and groups. IEEE Trans. Affect. Comput. (2018)
26. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.:
Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18(1), 32–
80 (2001)
27. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: International
Conference on Computer Vision & Pattern Recognition (CVPR’05), vol. 1, pp. 886–893. IEEE
Computer Society (2005)
28. Davison, A., Merghani, W., Yap, M.: Objective classes for micro-facial expression recognition.
J. Imaging 4(10), 119 (2018)
29. Davison, A.K., Lansley, C., Costen, N., Tan, K., Yap, M.H.: SAMM: a spontaneous micro-facial
movement dataset. IEEE Trans. Affect. Comput. 9(1), 116–129 (2018)
30. Dhall, A., Asthana, A., Goecke, R., Gedeon, T.: Emotion recognition using PHOG and LPQ features.
In: Face and Gesture 2011, pp. 878–883. IEEE (2011)
31. Dhall, A., Goecke, R., Gedeon, T.: Automatic group happiness intensity analysis. IEEE Trans.
Affect. Comput. 6(1), 13–26 (2015)
32. Dhall, A., Goecke, R., Lucey, S., Gedeon, T., et al.: Collecting large, richly annotated facial-
expression databases from movies. IEEE Multimedia 19(3), 34–41 (2012)
33. Dhall, A., Kaur, A., Goecke, R., Gedeon, T.: Emotiw 2018: audio-video, student engagement and
group-level affect prediction. In: International Conference on Multimodal Interaction, pp.
653–656. ACM (2018)
34. Du, S., Tao, Y., Martinez, A.M.: Compound facial expressions of emotion. Proc. Natl. Acad. Sci.
111(15), E1454–E1462 (2014)
35. Ekman, P., Friesen, W.V.: Unmasking the face: a guide to recognizing emotions from facial
clues. Ishk (2003)
36. Ekman, P., Friesen, W.V., Hager, J.C.: Facial Action Coding System: The Manual on CD ROM, pp.
77–254. A Human Face, Salt Lake City (2002)
37. El Ayadi, M., Kamel, M.S., Karray, F.: Survey on speech emotion recognition: features,
classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011)
[zbMATH]
38. Ertugrul, I.O., Cohn, J.F., Jeni, L.A., Zhang, Z., Yin, L., Ji, Q.: Cross-domain au detection: domains,
learning approaches, and measures. In: 14th International Conference on Automatic Face &
Gesture Recognition, pp. 1–8. IEEE (2019)
39. Eyben, F., Scherer, K.R., Schuller, B.W., Sundberg, J., André, E., Busso, C., Devillers, L.Y., Epps, J., Laukka, P., Narayanan, S.S., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2016)
40. Eyben, F., Weninger, F., Gross, F., Schuller, B.: Recent developments in Opensmile, the Munich
open-source multimedia feature extractor. In: 21st ACM international conference on
Multimedia, pp. 835–838. ACM (2013)
41. Fabian Benitez-Quiroz, C., Srinivasan, R., Martinez, A.M.: Emotionet: An accurate, real-time
algorithm for the automatic annotation of a million facial expressions in the wild. In:
Computer Vision and Pattern Recognition, pp. 5562–5570. IEEE (2016)
42. Fan, Y., Lu, X., Li, D., Liu, Y.: Video-based emotion recognition using CNN-RNN and C3D hybrid
networks. In: 18th ACM International Conference on Multimodal Interaction, pp. 445–450.
ACM (2016)
43. Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with
facial expressions for joint recognition of affect in child-robot interaction. arXiv preprint
arXiv:1901.01805 (2019)
44. Friesen, E., Ekman, P.: Facial action coding system: a technique for the measurement of facial
movement. Palo Alto 3, (1978)
45. Ganchev, T., Fakotakis, N., Kokkinakis, G.: Comparative evaluation of various MFCC
implementations on the speaker verification task. SPECOM 1, 191–194 (2005)
46. Ghimire, D., Lee, J., Li, Z.N., Jeong, S., Park, S.H., Choi, H.S.: Recognition of facial expressions
based on tracking and selection of discriminative geometric features. Int. J. Multimedia
Ubiquitous Eng. 10(3), 35–44 (2015)
47. Ghosh, S., Dhall, A., Sebe, N.: Automatic group affect analysis in images via visual attribute and
feature networks. In: 25th IEEE International Conference on Image Processing (ICIP), pp.
1967–1971. IEEE (2018)
48. Girard, J.M., Chu, W.S., Jeni, L.A., Cohn, J.F.: Sayette group formation task (GFT) spontaneous
facial expression database. In: 12th International Conference on Automatic Face & Gesture
Recognition (FG 2017), pp. 581–588. IEEE (2017)
49. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.
deeplearningbook.org
50. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang,
Y., Thaler, D., Lee, D.H., et al.: Challenges in representation learning: a report on three machine
learning contests. Neural Netw. 64, 59–63 (2015)
51. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and
other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
52. Gunes, H., Pantic, M.: Automatic, dimensional and continuous emotion recognition. Int. J.
Synth. Emotions (IJSE) 1(1), 68–99 (2010)
53. Haggard, E.A., Isaacs, K.S.: Micromomentary facial expressions as indicators of ego mechanisms in psychotherapy. In: Methods of research in psychotherapy, pp. 154–165. Springer (1966)
54. Han, J., Zhang, Z., Ren, Z., Schuller, B.: Implicit fusion by joint audiovisual training for emotion
recognition in mono modality. In: International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 5861–5865. IEEE (2019)
55. Han, J., Zhang, Z., Schmitt, M., Ren, Z., Ringeval, F., Schuller, B.: Bags in bag: generating context-
aware bags for tracking emotions from speech. Interspeech 2018, 3082–3086 (2018)
56. Happy, S., Patnaik, P., Routray, A., Guha, R.: The Indian spontaneous expression database for
emotion recognition. IEEE Trans. Affect. Comput. 8(1), 131–142 (2017)
57. Harvill, J., AbdelWahab, M., Lotfian, R., Busso, C.: Retrieving speech samples with similar
emotional content using a triplet loss function. In: International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pp. 7400–7404. IEEE (2019)
58. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer
vision and pattern recognition, pp. 770–778. IEEE (2016)
59. Hu, P., Ramanan, D.: Finding tiny faces. In: Computer vision and pattern recognition, pp. 951–
959. IEEE (2017)
60. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. In: Computer vision and pattern recognition, pp. 4700–4708. IEEE (2017)
61. Huang, Y., Yang, J., Liu, S., Pan, J.: Combining facial expressions and electroencephalography to
enhance emotion recognition. Future Internet 11(5), 105 (2019)
62. Hussein, H., Angelini, F., Naqvi, M., Chambers, J.A.: Deep-learning based facial expression
recognition system evaluated on three spontaneous databases. In: 9th International
Symposium on Signal, Image, Video and Communications (ISIVC), pp. 270–275. IEEE (2018)
63. Jack, R.E., Blais, C., Scheepers, C., Schyns, P.G., Caldara, R.: Cultural confusions show that facial
expressions are not universal. Curr. Biol. 19(18), 1543–1548 (2009)
64. Jack, R.E., Sun, W., Delis, I., Garrod, O.G., Schyns, P.G.: Four not six: revealing culturally
common facial expressions of emotion. J. Exp. Psychol. Gen. 145(6), 708 (2016)
65. Jiang, B., Valstar, M.F., Pantic, M.: Action unit detection using sparse appearance descriptors in
space-time video volumes. In: Face and Gesture, pp. 314–321. IEEE (2011)
66. Joshi, J., Goecke, R., Alghowinem, S., Dhall, A., Wagner, M., Epps, J., Parker, G., Breakspear, M.:
Multimodal assistive technologies for depression diagnosis and monitoring. J. Multimodal
User Interfaces 7(3), 217–228 (2013)
67. Jyoti, S., Sharma, G., Dhall, A.: Expression empowered residen network for facial action unit
detection. In: 14th International Conference on Automatic Face and Gesture Recognition, pp.
1–8. IEEE (2019)
68. Kaiser, J.F.: On a Simple algorithm to calculate the ‘Energy’ of a Signal. In: International
Conference on Acoustics, Speech, and Signal Processing, pp. 381–384. IEEE (1990)
69. King, D.E.: Dlib-ML: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
70. Knyazev, B., Shvetsov, R., Efremova, N., Kuharenko, A.: Convolutional neural networks
pretrained on large face recognition datasets for emotion classification from video. arXiv
preprint arXiv:1711.04598 (2017)
71. Koelstra, S., Muhl, C., Soleymani, M., Lee, J.S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A.,
Patras, I.: DEAP: a database for emotion analysis; using physiological signals. IEEE Trans.
Affect. Comput. 3(1), 18–31 (2012)
72. Kratzwald, B., Ilić, S., Kraus, M., Feuerriegel, S., Prendinger, H.: Deep learning for affective
computing: text-based emotion recognition in decision support. Decis. Support Syst. 115,
24–35 (2018)
73. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classi ication with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
74. Latif, S., Rana, R., Khalifa, S., Jurdak, R., Epps, J.: Direct modelling of speech emotion from raw
speech. arXiv preprint arXiv:1904.03833 (2019)
75. Lee, C.M., Narayanan, S.S., et al.: Toward detecting emotions in spoken dialogs. IEEE Trans.
Speech Audio Process. 13(2), 293–303 (2005)
76. Lee, J., Kim, S., Kim, S., Park, J., Sohn, K.: Context-aware emotion recognition networks. In: The
IEEE International Conference on Computer Vision (ICCV) (2019)
77. Li, S., Deng, W.: Deep facial expression recognition: a survey. arXiv preprint arXiv:1804.08348 (2018)
78. Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for
expression recognition in the wild. In: Computer Vision and Pattern Recognition, pp. 2852–
2861. IEEE (2017)
79. Li, W., Xu, H.: Text-based emotion classification using emotion cause extraction. Expert Syst.
Appl. 41(4), 1742–1749 (2014)
80. Lian, Z., Li, Y., Tao, J.H., Huang, J., Niu, M.Y.: Expression analysis based on face regions in real-world conditions. Int. J. Autom. Comput. 1–12
81. Liao, S., Jain, A.K., Li, S.Z.: A fast and accurate unconstrained face detector. IEEE Trans.
Pattern Anal. Mach. Intell. 38(2), 211–223 (2016)
82. Lienhart, R., Maydt, J.: An extended set of haar-like features for rapid object detection. In:
Proceedings of International Conference on Image Processing, vol. 1, p. I. IEEE (2002)
83. Liu, X., Zou, Y., Kong, L., Diao, Z., Yan, J., Wang, J., Li, S., Jia, P., You, J.: Data augmentation via
latent space interpolation for image classification. In: 24th International Conference on
Pattern Recognition (ICPR), pp. 728–733. IEEE (2018)
84. Livingstone, S.R., Russo, F.A.: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English. PloS One 13(5), e0196391 (2018)
85. Lotfian, R., Busso, C.: Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Trans. Affect. Comput. (2017)
86. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis.
60(2), 91–110 (2004)
87. Lowe, D.G., et al.: Object recognition from local scale-invariant features. ICCV 99, 1150–1157
(1999)
88. Lucas, B.D., Kanade, T., et al.: An iterative image registration technique with an application to
stereo vision (1981)
89. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn-
kanade dataset (ck+): a complete dataset for action unit and emotion-speci ied expression.
In: Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 94–101. IEEE
(2010)
90. Macías, E., Suárez, A., Lacuesta, R., Lloret, J.: Privacy in affective computing based on mobile
sensing systems. In: 2nd International Electronic Conference on Sensors and Applications,
p. 1. MDPI AG (2015)
91. Makhmudkhujaev, F., Abdullah-Al-Wadud, M., Iqbal, M.T.B., Ryu, B., Chae, O.: Facial expression
recognition with local prominent directional pattern. Signal Process. Image Commun. 74, 1–
12 (2019)
92. Mandal, M., Verma, M., Mathur, S., Vipparthi, S., Murala, S., Deveerasetty, K.: RADAP: regional adaptive affinitive patterns with logical operators for facial expression recognition. IET Image Processing (2019)
93. Martin, O., Kotsia, I., Macq, B., Pitas, I.: The eNTERFACE’05 audio-visual emotion database. In:
22nd International Conference on Data Engineering Workshops (ICDEW’06), pp. 8–8. IEEE
(2006)
94. Mavadati, S.M., Mahoor, M.H., Bartlett, K., Trinh, P., Cohn, J.F.: DISFA: a spontaneous facial
action intensity database. IEEE Trans. Affect. Comput. 4(2), 151–160 (2013)
95. McDuff, D., Amr, M., El Kaliouby, R.: AM-FED+: an extended dataset of naturalistic facial
expressions collected in everyday settings. IEEE Trans. Affect. Comput. 10(1), 7–17 (2019)
96. McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., Stroeve, S.:
Approaching automatic recognition of emotion from voice: a rough benchmark. In: ISCA
Tutorial and Research Workshop (ITRW) on Speech and Emotion (2000)
97. McKeown, G., Valstar, M., Cowie, R., Pantic, M., Schroder, M.: The SEMAINE database:
annotated multimodal records of emotionally colored conversations between a person and a
limited agent. IEEE Trans. Affect. Comput. 3(1), 5–17 (2012)
100. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words
and phrases and their compositionality. In: Advances in Neural Information Processing
Systems, pp. 3111–3119 (2013)
101. Moffat, D., Ronan, D., Reiss, J.D.: An evaluation of audio feature extraction toolboxes (2015)
102. Mollahosseini, A., Hasani, B., Mahoor, M.H.: Affectnet: A database for facial expression,
valence, and arousal computing in the wild. arXiv preprint arXiv:1708.03985 (2017)
103. Munezero, M.D., Montero, C.S., Sutinen, E., Pajunen, J.: Are they different? Affect, feeling,
emotion, sentiment, and opinion detection in text. IEEE Trans. Affect. Comput. 5(2), 101–
111 (2014)
104. Murray, I.R., Arnott, J.L.: Toward the simulation of emotion in synthetic speech: a review of
the literature on human vocal emotion. J. Acoust. Soc. Am. 93(2), 1097–1108 (1993)
105. Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden Markov models.
Speech Commun. 41(4), 603–623 (2003)
106. Ojansivu, V., Heikkilä , J.: Blur insensitive texture classi ication using local phase quantization.
In: International Conference on Image and Signal Processing, pp. 236–243. Springer (2008)
107. Ou, J., Bai, X.B., Pei, Y., Ma, L., Liu, W.: Automatic facial expression recognition using gabor
ilter and expression analysis. In: 2nd International Conference on Computer Modeling and
Simulation, vol. 2, pp. 215–218. IEEE (2010)
108. Pan, X., Guo, W., Guo, X., Li, W., Xu, J., Wu, J.: Deep temporal-spatial aggregation for video-
based facial expression recognition. Symmetry 11(1), 52 (2019)
109. Parkhi, O.M., Vedaldi, A., Zisserman, A., et al.: Deep face recognition. BMVC 1, 6 (2015)
110. Rabiner, L., Schafer, R.: Digital Processing of Speech Signals. Prentice Hall, Englewood Cliffs
(1978)
111. Rassadin, A., Gruzdev, A., Savchenko, A.: Group-level emotion recognition using transfer
learning from face identi ication. In: 19th ACM International Conference on Multimodal
Interaction, pp. 544–548. ACM (2017)
112. Reynolds, C., Picard, R.: Affective sensors, privacy, and ethical contracts. In: CHI’04 Extended
Abstracts on Human Factors in Computing Systems, pp. 1103–1106. ACM (2004)
113. Rhue, L.: Racial in luence on automated perceptions of emotions. Available at SSRN 3281765,
(2018)
114. Ringeval, F., Eyben, F., Kroupi, E., Yuce, A., Thiran, J.P., Ebrahimi, T., Lalanne, D., Schuller, B.:
Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological
data. Pattern Recogn. Lett. 66, 22–30 (2015)
115.
Ringeval, F., Schuller, B., Valstar, M., Cummins, N., Cowie, R., Tavabi, L., Schmitt, M., Alisamir, S.,
Amiriparian, S., Messner, E.M., et al.: AVEC 2019 workshop and challenge: state-of-mind,
detecting depression with AI, and cross-cultural affect recognition. In: 9th International on
Audio/Visual Emotion Challenge and Workshop, pp. 3–12. ACM (2019)
116. Ringeval, F., Sonderegger, A., Sauer, J., Lalanne, D.: Introducing the RECOLA multimodal
corpus of remote collaborative and affective interactions. In: 10th International Conference
and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1–8. IEEE (2013)
117. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161 (1980)
118. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. Adv. Neural Inform.
Process. Syst. 3856–3866 (2017)
119. Saragih, J.M., Lucey, S., Cohn, J.F.: Face alignment through subspace constrained mean-shifts.
In: 12th International Conference on Computer Vision, pp. 1034–1041. IEEE (2009)
120. Sariyanidi, E., Gunes, H., Cavallaro, A.: Automatic analysis of facial affect: a survey of
registration, representation, and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(6),
1113–1133 (2015)
121. Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., Chetouani, M.,
Weninger, F., Eyben, F., Marchi, E., et al.: The INTERSPEECH 2013 computational
paralinguistics challenge: social signals, con lict, emotion, Autism. In: 14th Annual
Conference of the International Speech Communication Association (2013)
122. Sebe, N., Cohen, I., Gevers, T., Huang, T.S.: Emotion recognition based on joint visual and audio
cues. In: 18th International Conference on Pattern Recognition, vol. 1, pp. 1136–1139. IEEE
(2006)
123. Seyeditabari, A., Tabari, N., Zadrozny, W.: Emotion detection in text: a review. arXiv preprint
arXiv:1806.00674 (2018)
124. Shi, J., Tomasi, C.: Good Features to Track. Tech. rep, Cornell University (1993)
125. Siddharth, S., Jung, T.P., Sejnowski, T.J.: Multi-modal approach for affective computing. arXiv
preprint arXiv:1804.09452 (2018)
126. Sikka, K., Dykstra, K., Sathyanarayana, S., Littlewort, G., Bartlett, M.: Multiple Kernel learning
for emotion recognition in the wild. In: 15th ACM on International Conference on
Multimodal Interaction, pp. 517–524. ACM (2013)
127. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556 (2014)
128. Sneddon, I., McRorie, M., McKeown, G., Hanratty, J.: The Belfast induced natural emotion
database. IEEE Trans. Affect. Comput. 3(1), 32–41 (2012)
129. Soleymani, M., Lichtenauer, J., Pun, T., Pantic, M.: A multimodal database for affect
recognition and implicit tagging. IEEE Trans. Affect. Comput. 3(1), 42–55 (2012)
130. Strapparava, C., Mihalcea, R.: Learning to identify emotions in text. In: ACM Symposium on
Applied Computing, pp. 1556–1560. ACM (2008)
131. Strapparava, C., Valitutti, A., et al.: Wordnet affect: an affective extension of wordnet. In: Lrec,
vol. 4, p. 40. Citeseer (2004)
132. Teager, H.: Some observations on oral air low during phonation. IEEE Trans. Acoust. Speech
Signal Process. 28(5), 599–601 (1980)
133. Thoits, P.A.: The sociology of emotions. Annu. Rev. Sociol. 15(1), 317–342 (1989)
134. Tomasi, C., Detection, T.K.: Tracking of point features. Tech. rep., Tech. Rep. CMU-CS-91-132,
Carnegie Mellon University (1991)
135. Torres, J.M.M., Stepanov, E.A.: Enhanced face/audio emotion recognition: video and instance
level classi ication using ConvNets and restricted boltzmann machines. In: International
Conference on Web Intelligence, pp. 939–946. ACM (2017)
136. Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M.A., Schuller, B., Zafeiriou, S.:
Adieu features? End-to-end speech emotion recognition using a deep convolutional
recurrent network. In: International Conference on Acoustics, Speech and Signal Processing
(ICASSP), pp. 5200–5204. IEEE (2016)
137. Tulyakov, S., Liu, M.Y., Yang, X., Kautz, J.: MoCoGAN: decomposing motion and content for
video generation. In: Computer Vision and Pattern Recognition, pp. 1526–1535. IEEE (2018)
138. Verma, G.K., Tiwary, U.S.: Multimodal fusion framework: a multiresolution approach for
emotion classi ication and recognition from physiological signals. NeuroImage 102, 162–
172 (2014)
139. Viola, P., Jones, M., et al.: Rapid object detection using a boosted cascade of simple features.
CVPR 1(1), 511–518 (2001)
140. Wagner, J., Andre, E., Lingenfelser, F., Kim, J.: Exploring fusion methods for multimodal
emotion recognition with missing data. IEEE Trans. Affect. Comput. 2(4), 206–218 (2011)
141. Wagner, J., Vogt, T., André , E.: A systematic comparison of different HMM designs for emotion
recognition from acted and spontaneous speech. In: International Conference on Affective
Computing and Intelligent Interaction, pp. 114–125. Springer (2007)
142. Wang, S., Liu, Z., Lv, S., Lv, Y., Wu, G., Peng, P., Chen, F., Wang, X.: A natural visible and infrared
facial expression database for expression recognition and emotion inference. IEEE Trans.
Multimedia 12(7), 682–691 (2010)
143. Warriner, A.B., Kuperman, V., Brysbaert, M.: Norms of valence, arousal, and dominance for
13,915 English lemmas. Behav. Res. Methods 45(4), 1191–1207 (2013)
144. Wiles, O., Koepke, A., Zisserman, A.: Self-supervised learning of a facial attribute embedding
from video. arXiv preprint arXiv:1808.06882 (2018)
145. Wu, S., Falk, T.H., Chan, W.Y.: Automatic speech emotion recognition using modulation
spectral features. Speech Commun. 53(5), 768–785 (2011)
146. Wu, T., Bartlett, M.S., Movellan, J.R.: Facial expression recognition using gabor motion energy
ilters. In: Computer Vision and Pattern Recognition-Workshops, pp. 42–47. IEEE (2010)
147.
Wu, Y., Kang, X., Matsumoto, K., Yoshida, M., Kita, K.: Emoticon-based emotion analysis for
Weibo articles in sentence level. In: International Conference on Multi-disciplinary Trends in
Arti icial Intelligence, pp. 104–112. Springer (2018)
148. Xingjian, S., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.c.: Convolutional LSTM
network: a machine learning approach for precipitation nowcasting. In: Advances in Neural
Information Processing Systems, pp. 802–810 (2015)
149. Yan, W.J., Li, X., Wang, S.J., Zhao, G., Liu, Y.J., Chen, Y.H., Fu, X.: CASME II: an improved
spontaneous micro-expression database and the baseline evaluation. PloS One 9(1), e86041
(2014)
150. Yan, W.J., Wu, Q., Liang, J., Chen, Y.H., Fu, X.: How fast are the leaked facial expressions: the
duration of micro-expressions. J. Nonverbal Behav. 37(4), 217–230 (2013)
151. Yin, L., Wei, X., Sun, Y., Wang, J., Rosato, M.J.: A 3D facial expression database for facial
behavior research. In: 7th International Conference on Automatic Face and Gesture
Recognition, pp. 211–216. IEEE (2006)
152. Zafeiriou, S., Kollias, D., Nicolaou, M.A., Papaioannou, A., Zhao, G., Kotsia, I.: Aff-wild: valence
and arousal’In-the-wild’challenge. In: Computer Vision and Pattern Recognition Workshops,
pp. 34–41. IEEE (2017)
153. Zamil, A.A.A., Hasan, S., Baki, S.M.J., Adam, J.M., Zaman, I.: Emotion detection from speech
signals using voting mechanism on classi ied frames. In: International Conference on
Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 281–285. IEEE (2019)
154. Zhalehpour, S., Onder, O., Akhtar, Z., Erdem, C.E.: BAUM-1: a spontaneous audio-visual face
database of affective and mental states. IEEE Trans. Affect. Comput. 8(3), 300–313 (2017)
155. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask
cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
156. Zhang, Z., Girard, J.M., Wu, Y., Zhang, X., Liu, P., Ciftci, U., Canavan, S., Reale, M., Horowitz, A.,
Yang, H., et al.: Multimodal spontaneous emotion corpus for human behavior analysis. In:
Computer Vision and Pattern Recognition, pp. 3438–3446. IEEE (2016)
157. Zhang, Z., Luo, P., Loy, C.C., Tang, X.: From facial expression recognition to interpersonal
relation prediction. Int. J. Comput. Vis. 126(5), 550–569 (2018)
[MathSciNet]
158. Zhao, G., Huang, X., Taini, M., Li, S.Z., Pietikä Inen, M.: Facial expression recognition from near-
infrared videos. Image Vis. Comput. 607–619 (2011)
159. Zhao, G., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an
application to facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 6, 915–928 (2007)
160. Zhong, P., Wang, D., Miao, C.: An affect-rich neural conversational model with biased
attention and weighted cross-entropy loss. arXiv preprint arXiv:1811.07078 (2018)
161.
Zhou, G., Hansen, J.H., Kaiser, J.F.: Nonlinear feature based classi ication of speech under
stress. IEEE Trans. Speech Audio Process. 9(3), 201–216 (2001)
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_4
Julia Krüger
Email: [email protected]
Abstract
Nowadays, a diverse set of addressee detection methods is discussed. Typically, wake words are used, but these force an unnatural interaction and are error-prone, especially in the case of false positive classifications (the user says the wake word without intending to interact with the device). Therefore, technical systems should be enabled to detect device-directed speech. In order to enrich research in the field of speech analysis in HCI, we conducted studies with a commercial voice assistant, Amazon's ALEXA (Voice Assistant Conversation Corpus, VACC), and complemented objective speech analysis with subjective self- and external reports on possible differences between speaking with the voice assistant and speaking with another person. The analysis revealed a set of specific features for device-directed speech. It can be concluded that speech-based addressing of a technical system is a mainly conscious process that includes individual modifications of the speaking style.
4.1 Introduction
Voice assistant systems have recently received increased attention. The market for commercial voice assistants is growing rapidly: Microsoft Cortana had 133 million active users in 2016 (cf. [37]), and the Echo Dot was the best-selling product on all of Amazon in the 2017 holiday season (cf. [11]). Furthermore, 72% of people who own a voice-activated speaker say their devices are often used as part of their daily routine (cf. [25]). Already in 2018, approximately 10% of the internet population used voice control according to [23]. The ease of use is largely responsible for the attractiveness of today's voice assistant systems. By simply using speech commands, users can play music, search the web, create to-do and shopping lists, shop online, get instant weather reports, and control popular smart-home products.
Besides enabling the simplest possible operation of the technical system, voice assistants should allow a natural interaction. A natural interaction is characterized by the understanding of natural actions and the engagement of people in a dialog, while allowing them to interact naturally with each other and the environment. Furthermore, users do not need to use additional devices or learn any instructions, as the interaction respects human perception. Correspondingly, the interaction with such systems is easy and seductive for everyone (cf. [63]). To fulfill these properties, cognitive systems are needed that are able to perceive their environment and work on the basis of gathered knowledge and model-based recognition. In contrast, the functionality of today's voice assistants is still very limited and is not seen as a natural interaction. Especially when navigating the nuances of human communication, today's voice assistants still have a long way to go. They are still incapable of handling semantically similar expressions, are still based on the evaluation of pre-defined keywords, and are still unable to interpret prosodic variations.
Another important aspect on the way towards a natural interaction with voice assistants is the interaction initiation. Nowadays, two solutions have become established for initiating an interaction with a technical system: push-to-talk and wake words. In research, other methods have also been evaluated, e.g., look-to-talk [36].
In push-to-talk systems, the user has to press a button, wait for a (mostly acoustic) signal, and can then start to talk. The conversation set-up time can be reduced using buffers and contextual analyses of the initial speech burst [65]. Push-to-talk systems are mostly used in environments where an error-free conversation initiation is needed, e.g., telecommunication systems or cars [10]. The false acceptance rate is nearly zero; only rare cases of wrong button pushes have to be taken into account. But this high robustness comes at the expense of the naturalness of the interaction initiation. Therefore, the wake-word method is more common in voice assistants.
For the wake-word technique, the user has to say a pre-defined keyword to activate the voice assistant, after which the speech command can be uttered. Each voice assistant has its own unique wake word1 which can sometimes be selected from a short list of (pre-defined) alternatives. This approach of calling a device by name is more natural than the push-to-talk solution, but it is far from a human-like interaction, as every dialog has to be initiated with the wake word. Only in a few exceptions can the wake word be omitted. To this end, developers use a simple trick and extend the time span during which the device keeps listening after a dialog [56]. But the currently preferred wake-word method is still error-prone. The voice assistant is still not able to detect when it is addressed and when it is merely being talked about. This can result in user confusion, e.g., when the wake word has been said but no interaction with the system was intended. Especially for voice assistant systems that are already able to buy products automatically, and that in the future should be enabled to make decisions autonomously, it is crucial to react only when truly intended by the user.
The following examples show how wake words have already led to errors. The first example went through the news in January 2017. At the end of a news story, the presenter remarked: "I love the little girl, saying 'ALEXA order me a dollhouse.'" Amazon Echo owners who were watching the broadcast found that the remark triggered orders on their own devices (cf. [31]). Another wake-word failure highlights the privacy issues of voice assistants. According to the KIRO7 news channel, a private conversation of a family was recorded by Amazon's ALEXA and sent to the phone of a random person from the family's contact list. Amazon explained this misconduct as follows: ALEXA woke up due to a word in the background conversation sounding like 'ALEXA'; the subsequent conversation was heard as a "send message" request, followed by the customer's contact name and the confirmation to send the message (cf. [21]). A third example illustrates the malfunctions of smart-home services using Apple's Siri. A neighbor of a house owner who had equipped his house with a smart lock and the Apple HomeKit was able to let himself in by shouting, "Hey Siri, unlock the front door." [59]. These examples illustrate that today's solution of using a wake word is in many ways insufficient. Additional techniques are needed to detect whether the voice assistant is (properly) addressed (by the owner) or not. One possibility is the development of a reliable Addressee Detection (AD) technique implemented in the system itself. Such a system will only react when the (correct) user addresses the voice assistant with the intention to talk to the device.
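To make this concrete, the following is a schematic sketch (not the authors' implementation) of such an AD component: a binary classifier that maps per-utterance acoustic feature vectors to device-directed (DD) versus human-directed (HD) decisions. The feature dimensionality, the linear SVM, and all data below are illustrative assumptions.

```python
# Schematic AD sketch: per-utterance acoustic features -> DD/HD decision.
# Feature extraction is assumed to happen upstream (e.g. openSMILE
# functionals); the arrays here are random stand-ins, not VACC data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 88))  # 88-dim feature vectors (stand-in)
y_train = rng.integers(0, 2, 200)     # 0 = HD, 1 = DD (stand-in labels)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
clf.fit(X_train, y_train)
print(clf.predict(rng.normal(size=(3, 88))))  # DD/HD decisions for new utterances
```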
Various aspects of AD research have already been investigated, cf. Sect. 4.2. But previous research concentrated on the analysis of observable speech characteristics of users in the recorded data and the subsequent analysis of external ratings. The question whether users themselves recognize differences, or perhaps even deliberately change their speaking style when interacting with a technical system (and the potential factors influencing this change), has not been evaluated so far. Furthermore, a comparison between self-reported modifications in speech behavior and externally as well as automatically identified modifications seems promising for fundamental research.
In this chapter, an overview of recent advances in AD research will be given. Furthermore, changes in speaking style will be identified by analyzing modifications of conversation factors during a multi-party human-computer interaction (HCI). The remainder of the chapter is structured as follows: In Sect. 4.2 previous work on related AD research is presented and discussed. In Sect. 4.3 the experimental setup of the utilized dataset and the participant description are presented. In Sect. 4.4 the dimensions under analysis, "automatic", "self", and "external", are introduced. The results regarding these dimensions are then presented in Sect. 4.5. Finally, Sect. 4.6 concludes the chapter and presents an outlook.
order of the scenarios (Calendar Module and Quiz Module) is fixed. A and C denote the experimental conditions alone and together with a confederate, respectively
Fig. 4.3 Overview of the three utilized perspectives and the relation of their different analyses
Fig. 4.4 Mean and standard deviation of the UAR values for the German- and non-German-speaking annotators according to the two modules of VACC
4.5 Results
4.5.1 Addressee Annotation and Addressee Recognition
Task
4.5.1.1 Human AD Annotation
To first test the quality of the annotations in terms of inter-rater reliability (IRR), Krippendorff's alpha is calculated. For German-speaking annotators, the differences between the Calendar and the Quiz Module are marginal, with values around 0.55. For non-German-speaking annotators, the IRR is only 0.168 for the Calendar Module and only 0.255 for the Quiz Module. According to the interpretation scheme of [28], this means a slight to fair IRR for the non-German-speaking annotators and a moderate IRR for the German-speaking annotators. These numbers already show that the task leaves room for interpretation by the annotators. Especially some of the non-German-speaking annotators are faced with difficulties.
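As a hedged illustration of the IRR computation described above, the sketch below estimates Krippendorff's alpha for nominal addressee labels with the third-party `krippendorff` Python package; the ratings matrix is invented for demonstration and is not the VACC annotation data.

```python
# Krippendorff's alpha for nominal addressee labels (pip install krippendorff).
import numpy as np
import krippendorff

# Hypothetical ratings: 3 annotators x 8 utterances,
# 0 = device-directed (DD), 1 = human-directed (HD), np.nan = missing rating.
ratings = np.array([
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, np.nan, 0, 1, 1, 1],
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```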
Regarding the human-annotated AD task, the results are presented in Fig. 4.4. It can be seen that, in general, German-speaking annotators are roughly 10% better at correctly identifying the addressee than non-German-speaking annotators. This underlines to a certain degree the importance of the speech content. Furthermore, the variance between the German-speaking annotators regarding the two modules, Calendar and Quiz, is much smaller than for the non-German-speaking ones, approx. 6% versus 14%. Regarding the two modules of VACC representing different conversational styles, it can be seen that the more formal Calendar task complicates the correct annotation for the non-German-speaking annotators; their averages are 65.39% and 70.61% for the Calendar and Quiz task, respectively. The German-speaking annotators did not show these difficulties.
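For reference, the UAR (unweighted average recall) reported in these figures is the recall averaged over the DD and HD classes with equal weight, which scikit-learn exposes as macro-averaged recall; a minimal sketch on made-up labels follows.

```python
# UAR = recall averaged over classes with equal weight (macro recall).
# The labels below are illustrative, not taken from VACC.
from sklearn.metrics import recall_score

y_true = ["DD", "DD", "HD", "HD", "DD", "HD", "HD", "DD"]
y_pred = ["DD", "HD", "HD", "HD", "DD", "DD", "HD", "DD"]

uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.2%}")  # 75.00% for these toy labels
```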
Fig. 4.5 Mean and standard deviation of the UAR values for the automatic AD classification according to the two different modules and the influence of the confederate speaker on the DD utterances of the participants. For comparison, the best annotation results are indicated (German-speaking and non-German-speaking)
Characteristic                             | R     | N   | K   | I
Choice of words                            | 24/24 | 3/0 | 0/3 | 0/0
Sentence length                            | 18/19 | 5/3 | 3/4 | 1/1
Monotony                                   | 19/19 | 6/6 | 2/2 | 0/0
Intonation (word/syllable accentuation)    | 16/17 | 7/5 | 4/4 | 0/1
Speaking rate                              | 17/20 | 8/4 | 1/2 | 1/1
Melody                                     | 10/11 | 8/7 | 7/7 | 2/2

                            | Calendar (DD versus HD)                  | Quiz (DD versus HD)
Identified distinctive LLDs | pcm intensity, lspFreq[0-6], mfcc[2,4],  | pcm intensity, pcm loudness, pcm zcr,
                            | pcm loudness                             | alphaRatio, F0semitone, F2amplitude, F3amplitude
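The low-level descriptors (LLDs) listed above are openSMILE-style features (cf. the GeMAPS parameter set [15] and the openSMILE toolkit [16]). As a hedged pointer, the sketch below shows how such LLDs could be extracted with the `opensmile` Python wrapper; the chosen feature set and file name are assumptions, and the chapter's own extraction configuration may differ.

```python
# Extracting openSMILE low-level descriptors (pip install opensmile).
# eGeMAPSv02 covers loudness, alpha ratio, F0, MFCCs, etc.; the chapter's
# exact LLD configuration may differ. The file name is illustrative.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
llds = smile.process_file("utterance_001.wav")  # returns a pandas DataFrame
print(llds.columns.tolist())  # e.g. Loudness_sma3, alphaRatio_sma3, ...
```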
References
1. Akhtiamov, O., Sidorov, M., Karpov, A., Minker, W.: Speech and text analysis for multimodal
addressee detection in human-human-computer interaction. In: Proceedings of the
INTERSPEECH-2017, pp. 2521–2525 (2017)
2. Akhtiamov, O., Siegert, I., Minker, W., Karpov, A.: Cross-corpus data augmentation for acoustic
addressee detection. In: 20th Annual SIGdial Meeting on Discourse and Dialogue (2019)
3. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist.
34, 555–596 (2008)
4. Baba, N., Huang, H.H., Nakano, Y.I.: Addressee identification for human-human-agent
multiparty conversations in different proxemics. In: Proceedings of the 4th Workshop on Eye
Gaze in Intelligent Human Machine Interaction, pp. 6:1–6:6 (2012)
5. Batliner, A., Hacker, C., Nöth, E.: To talk or not to talk with a computer. J. Multimodal User
Interfaces 2, 171–186 (2008)
6. Bertero, D., Fung, P.: Deep learning of audio and language features for humor prediction. In:
Proceedings of the 10th LREC, Portorož, Slovenia (2016)
7. Beyan, C., Carissimi, N., Capozzi, F., Vascon, S., Bustreo, M., Pierro, A., Becchio, C., Murino, V.:
Detecting emergent leader in a meeting environment using nonverbal visual features only. In:
Proceedings of the 18th ACM ICMI, pp. 317–324. ICMI 2016 (2016)
8. Böck, R., Siegert, I., Haase, M., Lange, J., Wendemuth, A.: ikannotate—a tool for labelling,
transcription, and annotation of emotionally coloured speech. In: Affective Computing and
Intelligent Interaction, LNCS, vol. 6974, pp. 25–34. Springer (2011)
9. Böck, R., Egorow, O., Siegert, I., Wendemuth, A.: Comparative study on normalisation in
emotion recognition from speech. In: Horain, P., Achard, C., Mallem, M. (eds.) Proceedings of
the 9th IHCI 2017, pp. 189–201. Springer International Publishing, Cham (2017)
10. DaSilva, L.A., Morgan, G.E., Bostian, C.W., Sweeney, D.G., Midkiff, S.F., Reed, J.H., Thompson, C.,
Newhall, W.G., Woerner, B.: The resurgence of push-to-talk technologies. IEEE Commun. Mag.
44(1), 48–55 (2006)
[Crossref]
11. Dickey, M.R.: The Echo Dot was the best-selling product on all of Amazon this holiday season.
TechCrunch (December 2017). Accessed 26 Dec 2017
12. Dowding, J., Clancey, W.J., Graham, J.: Are you talking to me? Dialogue systems supporting
mixed teams of humans and robots. In: AAAI Fall Symposium Aurally Informed
Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems,
Washington, DC, USA (2006)
13. Eggink, J., Bland, D.: A large scale experiment for mood-based classification of TV
programmes. In: Proceedings of ICME, pp. 140–145 (2012)
14. Egorow, O., Siegert, I., Wendemuth, A.: Prediction of user satisfaction in naturalistic human-
computer interaction. Kognitive Systeme 1 (2017)
15. Eyben, F., Scherer, K.R., Schuller, B.W., Sundberg, J., André, E., Busso, C., Devillers, L.Y., Epps, J.,
Laukka, P., Narayanan, S.S., Truong, K.P.: The Geneva minimalistic acoustic parameter set
(GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–
202 (2016)
[Crossref]
16. Eyben, F., Wöllmer, M., Schuller, B.: openSMILE—the Munich versatile and fast open-source
audio feature extractor. In: Proceedings of the ACM MM-2010 (2010)
17. Gwet, K.L.: Intrarater reliability, pp. 473–485. Wiley, Hoboken, USA (2008)
18. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining
software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
[Crossref]
19. Hassenzahl, M., Burmester, M., Koller, F.: AttrakDiff: Ein Fragebogen zur Messung
wahrgenommener hedonischer und pragmatischer Qualität. In: Szwillus, G., Ziegler, J. (eds.)
Mensch & Computer 2003, Berichte des German Chapter of the ACM, vol. 57, pp. 187–196.
Vieweg+Teubner, Wiesbaden, Germany (2003)
[Crossref]
20. Hoffmann-Riem, C.: Die Sozialforschung einer interpretativen Soziologie - Der Datengewinn.
Kölner Zeitschrift für Soziologie und Sozialpsychologie 32, 339–372 (1980)
21. Horcher, G.: Woman says her Amazon device recorded private conversation, sent it out to
random contact. KIRO7 (2018). Accessed 25 May 2018
22. Höbel-Müller, J., Siegert, I., Heinemann, R., Requardt, A.F., Tornow, M., Wendemuth, A.: Analysis
of the influence of different room acoustics on acoustic emotion features. In: Elektronische
Sprachsignalverarbeitung 2019. Tagungsband der 30. Konferenz, pp. 156–163, Dresden,
Germany (2019)
23. Jeffs, M.: OK Google, Siri, Alexa, Cortana; can you tell me some stats on voice search? The Editr
Blog (2017). Accessed 8 Jan 2018
24. Jovanovic, N., op den Akker, R., Nijholt, A.: Addressee identification in face-to-face meetings.
In: Proceedings of the 11th EACL, pp. 169–176 (2006)
25. Kleinberg, S.: 5 ways voice assistance is shaping consumer behavior. Think with Google
(2018). Accessed Jan 2018
26. Konzelmann, J.: Chatting up your Google Assistant just got easier. The Keyword, blog.google
(2018). Accessed 21 June 2018
28. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data.
Biometrics 33, 159–174 (1977)
29. Lange, J., Frommer, J.: Subjektives Erleben und intentionale Einstellung in Interviews zur
Nutzer-Companion-Interaktion. Proceedings der 41. GI-Jahrestagung, Lecture Notes in
Computer Science, vol. 192, pp. 240–254. Bonner Köllen Verlag, Berlin, Germany (2011)
30. Lee, H., Stolcke, A., Shriberg, E.: Using out-of-domain data for lexical addressee detection in
human-human-computer dialog. In: Proceedings of NAACL, Atlanta, USA, pp. 221–229 (2013)
31. Liptak, A.: Amazon's Alexa started ordering people dollhouses after hearing its name on TV.
The Verge (2017). Accessed 7 Jan 2017
32. Lunsford, R., Oviatt, S.: Human perception of intended addressee during computer-assisted
meetings. In: Proceedings of the 8th ACM ICMI, Banff, Alberta, Canada, pp. 20–27 (2006)
33. Mallidi, S.H., Maas, R., Goehner, K., Rastrow, A., Matsoukas, S., Hoffmeister, B.: Device-directed
utterance detection. In: Proceedings of the INTERSPEECH’18, pp. 1225–1228 (2018)
34. Marchi, E., Tonelli, D., Xu, X., Ringeval, F., Deng, J., Squartini, S., Schuller, B.: Pairwise
decomposition with deep neural networks and multiscale kernel subspace learning for
acoustic scene classification. In: Proceedings of the Detection and Classification of Acoustic
Scenes and Events 2016 Workshop (DCASE2016), pp. 543–547 (2016)
35. Mayring, P.: Qualitative Content Analysis: Theoretical Foundation, Basic Procedures and
Software Solution. SSOAR, Klagenfurt (2014)
36. Oh, A., Fox, H., Kleek, M.V., Adler, A., Gajos, K., Morency, L.P., Darrell, T.: Evaluating look-to-talk.
In: Proceedings of the Extended Abstracts on Human Factors in Computing Systems (CHI EA
’02), pp. 650–651 (2002)
37. Osborne, J.: Why 100 million monthly Cortana users on Windows 10 is a big deal. TechRadar
(2016). Accessed 20 July 2016
38. Oshrat, Y., Bloch, A., Lerner, A., Cohen, A., Avigal, M., Zeilig, G.: Speech prosody as a biosignal for
physical pain detection. In: Proceedings of Speech Prosody, pp. 420–424 (2016)
39. Prylipko, D., Rösner, D., Siegert, I., Günther, S., Friesen, R., Haase, M., Vlasenko, B., Wendemuth,
A.: Analysis of significant dialog events in realistic human-computer interaction. J. Multimodal
User Interfaces 8, 75–86 (2014)
[Crossref]
40. Ramanarayanan, V., Lange, P., Evanini, K., Molloy, H., Tsuprun, E., Qian, Y., Suendermann-Oeft,
D.: Using vision and speech features for automated prediction of performance metrics in
multimodal dialogs. ETS Res. Rep. Ser. 1 (2017)
41. Raveh, E., Siegert, I., Steiner, I., Gessinger, I., Möbius, B.: Three's a crowd? Effects of a second
human on vocal accommodation with a voice assistant. In: Proceedings of Interspeech 2019,
pp. 4005–4009 (2019). https://doi.org/10.21437/Interspeech.2019-1825
42. Raveh, E., Steiner, I., Siegert, I., Gessinger, I., Möbius, B.: Comparing phonetic changes in
computer-directed and human-directed speech. In: Elektronische Sprachsignalverarbeitung
2019. Tagungsband der 30. Konferenz, Dresden, Germany, pp. 42–49 (2019)
43. Rösner, D., Frommer, J., Friesen, R., Haase, M., Lange, J., Otto, M.: LAST MINUTE: a multimodal
corpus of speech-based user-companion interactions. In: Proceedings of the 8th LREC,
Istanbul, Turkey, pp. 96–103 (2012)
44. Schuller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M.,
Seidl, A., Soderstrom, M., Warlaumont, A.S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W.,
Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G., Tzirakis, P., Zafeiriou, S.: The
INTERSPEECH 2017 computational paralinguistics challenge: Addressee, Cold & Snoring. In:
Proceedings of the INTERSPEECH-2017, Stockholm, Sweden, pp. 3442–3446 (2017)
45. Shriberg, E., Stolcke, A., Hakkani-Tür, D., Heck, L.: Learning when to listen: detecting system-
addressed speech in human-human-computer dialog. In: Proceedings of the
INTERSPEECH’12, Portland, USA, pp. 334–337 (2012)
46. Shriberg, E., Stolcke, A., Ravuri, S.: Addressee detection for dialog systems using temporal and
spectral dimensions of speaking style. In: Proceedings of the INTERSPEECH’13, Lyon, France,
pp. 2559–2563 (2013)
47. Siegert, I., Lotz, A., Egorow, O., Wendemuth, A.: Improving speech-based emotion recognition
by using psychoacoustic modeling and analysis-by-synthesis. In: Proceedings of SPECOM
2017, 19th International Conference Speech and Computer, pp. 445–455. Springer
International Publishing, Cham (2017)
48. Siegert, I., Bö ck, R., Wendemuth, A.: Inter-rater reliability for emotion annotation in human-
computer interaction—comparison and methodological improvements. J. Multimodal User
Interfaces 8, 17–28 (2014)
[Crossref]
49. Siegert, I., Jokisch, O., Lotz, A.F., Trojahn, F., Meszaros, M., Maruschke, M.: Acoustic cues for the
perceptual assessment of surround sound. In: Karpov, A., Potapova, R., Mporas, I. (eds.)
Proceedings of SPECOM 2017, 19th International Conference Speech and Computer, pp. 65–
75. Springer International Publishing, Cham (2017)
50. Siegert, I., Krüger, J.: How do we speak with Alexa—subjective and objective assessments of
changes in speaking style between HC and HH conversations. Kognitive Systeme 1 (2019)
51. Siegert, I., Krüger, J., Egorow, O., Nietzold, J., Heinemann, R., Lotz, A.: Voice assistant
conversation corpus (VACC): a multi-scenario dataset for addressee detection in human-
computer-interaction using Amazon’s ALEXA. In: Proceedings of the 11th LREC, Paris, France
(2018)
52. Siegert, I., Lotz, A.F., Egorow, O., Wolff, S.: Utilizing psychoacoustic modeling to improve
speech-based emotion recognition. In: Proceedings of SPECOM 2018, 20th International
Conference Speech and Computer, pp. 625–635. Springer International Publishing, Cham
(2018)
53. Siegert, I., Nietzold, J., Heinemann, R., Wendemuth, A.: The restaurant booking corpus—
content-identical comparative human-human and human-computer simulated telephone
conversations. In: Berton, A., Haiber, U., Wolfgang, M. (eds.) Elektronische
Sprachsignalverarbeitung 2019. Tagungsband der 30. Konferenz. Studientexte zur
Sprachkommunikation, vol. 90, pp. 126–133. TUDpress, Dresden, Germany (2019)
54. Siegert, I., Shuran, T., Lotz, A.F.: Acoustic addressee-detection – analysing the impact of age,
gender and technical knowledge. In: Berton, A., Haiber, U., Wolfgang, M. (eds.) Elektronische
Sprachsignalverarbeitung 2018. Tagungsband der 29. Konferenz. Studientexte zur
Sprachkommunikation, vol. 90, pp. 113–120. TUDpress, Ulm, Germany (2018)
55. Siegert, I., Wendemuth, A.: ikannotate2—a tool supporting annotation of emotions in audio-
visual data. In: Trouvain, J., Steiner, I., Möbius, B. (eds.) Elektronische
Sprachsignalverarbeitung 2017. Tagungsband der 28. Konferenz. Studientexte zur
Sprachkommunikation, vol. 86, pp. 17–24. TUDpress, Saarbrücken, Germany (2017)
56. Statt, N.: Amazon adds follow-up mode for Alexa to let you make back-to-back requests. The
Verge (2018). Accessed 8 Mar 2018
57. Terken, J., Joris, I., De Valk, L.: Multimodal cues for addressee-hood in triadic communication
with a human information retrieval agent. In: Proceedings of the 9th ACM ICMI, Nagoya, Aichi,
Japan, pp. 94–101 (2007)
58. Tesch, R.: Qualitative Research: Analysis Types and Software Tools. Falmer Press, New York
(1990)
59. Tilley, A.: Neighbor unlocks front door without permission with the help of Apple's Siri. Forbes
(2017). Accessed 17 Sept 2017
60. Toyama, S., Saito, D., Minematsu, N.: Use of global and acoustic features associated with
contextual factors to adapt language models for spontaneous speech recognition. In:
Proceedings of the INTERSPEECH’17, pp. 543–547 (2017)
61. Tsai, T., Stolcke, A., Slaney, M.: Multimodal addressee detection in multiparty dialogue systems.
In: Proceedings of the 40th ICASSP, Brisbane, Australia, pp. 2314–2318 (2015)
62. van Turnhout, K., Terken, J., Bakx, I., Eggen, B.: Identifying the intended addressee in mixed
human-human and human-computer interaction from non-verbal features. In: Proceedings of
the 7th ACM ICMI, Trento, Italy, pp. 175–182 (2005)
63. Valli, A.: Notes on natural interaction. Technical Report, University of Florence, Italy (2007)
64. Vinyals, O., Bohus, D., Caruana, R.: Learning speaker, addressee and overlap detection models
from multimodal streams. In: Proceedings of the 14th ACM ICMI, Santa Monica, USA, pp. 417–
424 (2012)
65. Weinberg, G.: Contextual push-to-talk: a new technique for reducing voice dialog duration. In:
MobileHCI (2009)
66. Zhang, R., Lee, H., Polymenakos, L., Radev, D.R.: Addressee and response selection in multi-
party conversations with speaker interaction RNNs. In: Proceedings of the 2016 Conference
on Empirical Methods in Natural Language Processing, pp. 2133–2143 (2016)
Footnotes
1 The wake word to activate Amazon’s ALEXA from its “inactive” state to be able to make a
request is ‘Alexa’ by default.
Garrett Goodman
Email: [email protected]
Cogan Shimizu
Email: [email protected]
5.1 Introduction
The world is inundated with data. Under any definition of data, the amount generated per second is staggering. With the explosion of the Internet through the World Wide Web in the 1990s and early 2000s, as well as the more recent exponential explosion of the Internet of Things, making sense of this data is without a doubt a primary research question of this century.
In answer to that, Data Science emerged as a field, a new sub-discipline of Computer Science. This new profession is expected to make sense of those vast stores of data. However, due to the field's nascent nature, exactly what it means to "make sense" of data, the techniques to do so, and the body of curricula that comprises the field are ill-defined. That is not to say it is not a rich field of study with a common basis among definitions. During its initial conceptualization, it was perhaps most accurate to say that Data Science is a coupling of statistics and computer science. In fact, in 2001, William S. Cleveland published "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics", where he describes several areas of statistics that could be enhanced by applying data processing methods from computer science.
Nowadays, Data Science means more than just statistics, instead referring to anything that has to do with data. While the statistical rigor remains, the field has grown to encompass much more, from collecting data, to analyzing it to produce a model, to drawing from other fields to impart context to the data (e.g. business intelligence and analytics).
Data scientists combine entrepreneurship with patience, an immense amount of exploration, the willingness to build data products, and an ability to iterate over a solution. This has grown the field such that there is now a multitude of interdisciplinary application areas that can reasonably fall under the purview of Data Science.
Unfortunately, the field has been outgrowing its ability to provide guidance for developing applications [1]. This leads to teams carrying out data analysis in an ad hoc fashion and to a time-consuming process of trial and error for identifying the right tool for the job [2]. Thus, in order to preserve its internal coherence, data scientists have placed an increased emphasis on developing methodologies and best practices, and on how they interact in order to provide solutions.
In response, a methodology for Data Science is described by John Rollins, a data scientist at IBM Analytics [3]. Rollins outlines an iterative process of 10 stages, starting with solution conception and concluding with solution deployment, feedback solicitation, and refinement. Figure 5.1 provides a graphical overview of this methodology.
Fig. 5.1 Foundational methodology for data science courtesy of [3]
5.2 Background
In this section, a brief background is given on FIS, GAs, and FKR. A brief literature review is also presented to show how different methodologies have been used to evaluate and improve the FIS. As mentioned earlier, FIS is a methodology widely used to solve problems involving vague data while producing consistent results.
3.
IF (ABV is Moderate) AND (IBU is Extreme) THEN (Beer is IPA)
4.
IF (ABV is High) AND (IBU is Extreme) THEN (Beer is IPA)
5.
IF (ABV is Moderate) AND (IBU is Low) THEN (Beer is ABA)
6.
IF (ABV is Low) AND (IBU is Moderate) THEN (Beer is ABA)
7.
IF (ABV is Moderate) AND (IBU is Moderate) THEN (Beer is APA).
Now that steps 1 and 2 as listed in Sect. 5.2.1.1 are completed, we can focus on the remaining 5 steps. We continued by setting the FIS to use Zadeh's method (the minimum operator) for calculating the logical AND. Also, our FIS used the bisector defuzzification method to produce a crisp value for comparison against the data. Fortunately, the MATLAB Fuzzy Logic Designer toolbox handles the calculations of steps 3 through 7.
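For readers without MATLAB, a minimal open-source sketch of an analogous Mamdani FIS can be written with the scikit-fuzzy package, whose rules use the min operator (Zadeh's AND) by default and which also supports bisector defuzzification. The Gaussian membership parameters below are illustrative placeholders, not the chapter's tuned values, and only rules 3 and 4 are shown.

```python
# Mamdani FIS sketch with scikit-fuzzy (pip install scikit-fuzzy),
# mirroring rules 3 and 4 above. Membership parameters are illustrative.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

abv = ctrl.Antecedent(np.arange(0.0, 0.15, 0.001), "abv")
ibu = ctrl.Antecedent(np.arange(0, 151, 1), "ibu")
beer = ctrl.Consequent(np.arange(0.0, 1.001, 0.001), "beer")
beer.defuzzify_method = "bisector"  # bisector defuzzification, as in the text

abv["moderate"] = fuzz.gaussmf(abv.universe, 0.055, 0.010)
abv["high"] = fuzz.gaussmf(abv.universe, 0.085, 0.010)
ibu["extreme"] = fuzz.gaussmf(ibu.universe, 120, 15)
beer["ipa"] = fuzz.gaussmf(beer.universe, 0.9, 0.1)

rules = [
    ctrl.Rule(abv["moderate"] & ibu["extreme"], beer["ipa"]),  # rule 3
    ctrl.Rule(abv["high"] & ibu["extreme"], beer["ipa"]),      # rule 4
]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["abv"] = 0.065
sim.input["ibu"] = 110
sim.compute()
print(sim.output["beer"])  # crisp value for comparison against the data
```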
5.3.3 GA Construction
An FIS for predicting three different beers has now been heuristically constructed. However, due to the heuristic construction of the fuzzy rules and membership functions, optimization can be performed to improve the results. Manually performing the changes is not practical, though, as there are 11 membership functions with 2 parameters each. This is where the GA can be of assistance. In this example, we will utilize the GA to update only the parameters of the membership functions, though it is also possible to apply the GA to the AND/OR conjunctions as well as the NOT operators of the fuzzy rule set. Once again, we utilize a MATLAB toolbox, in this case the Optimization toolbox with the solver set to GA.
There is a total of 22 membership function parameters to be updated, so a chromosome of size 22 is created. For positions 1–6, the lower and upper bounds are 0.025 and 0.105 (ABV parameters). Positions 7–16 have lower and upper bounds of 1 and 150 (IBU parameters). Positions 17–22 have lower and upper bounds of 0.0001 and 1 (Beer parameters). We set the initial population size to 50 and use the following options for how the GA performs selection, mutation, etc. (a minimal code sketch follows the list):
Creation Function = Uniform: The initial population is created by randomly sampling from a uniform distribution.
Scaling Function = Rank: Each chromosome in the population is scaled with respect to a list sorted by fitness, removing the clustering of raw scores and relying on an integer rank instead.
Selection Function = Stochastic Uniform: Selects a subset of chromosomes from the population by stepping through the rank-scaled population and randomly selecting based on a uniform probability.
Mutation Function = Adaptive Feasible: Mutation is applied to positions of each surviving chromosome that are feasible with respect to the constraints placed on the chromosome.
Crossover Function = Scattered: Randomly creates a vector of 1's and 0's of the same size as the chromosome. The 1's take the position from the first parent and the 0's take the position from the second parent.
Function to Optimize = Sum of Squared Error (SSE).
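A minimal GA sketch mirroring this setup (population 50, a 22-gene chromosome with the stated per-position bounds, SSE as the fitness to minimize) is given below. The operators are simplified stand-ins: scattered crossover as described, but truncation selection instead of stochastic uniform and bounded Gaussian mutation instead of MATLAB's adaptive feasible; `evaluate_fis_sse` is a placeholder for rebuilding the FIS from a chromosome and scoring it on the dataset.

```python
# Simplified GA over the 22 membership-function parameters.
import numpy as np

rng = np.random.default_rng(0)
LOWER = np.array([0.025] * 6 + [1.0] * 10 + [0.0001] * 6)  # per-position bounds
UPPER = np.array([0.105] * 6 + [150.0] * 10 + [1.0] * 6)

def evaluate_fis_sse(chrom):
    # Placeholder fitness: in the chapter this would rebuild the 11
    # membership functions from `chrom` and return the FIS's SSE on the data.
    return float(np.sum((chrom - (LOWER + UPPER) / 2) ** 2))

def crossover(p1, p2):
    mask = rng.integers(0, 2, p1.size).astype(bool)  # scattered crossover
    return np.where(mask, p1, p2)

def mutate(chrom, rate=0.1):
    noise = rng.normal(0.0, 0.05 * (UPPER - LOWER))  # bounded Gaussian mutation
    flips = rng.random(chrom.size) < rate
    return np.clip(np.where(flips, chrom + noise, chrom), LOWER, UPPER)

pop = rng.uniform(LOWER, UPPER, (50, 22))  # uniform creation function
for _ in range(100):
    fitness = np.array([evaluate_fis_sse(c) for c in pop])
    parents = pop[np.argsort(fitness)[:25]]  # keep the fittest half
    children = np.array([
        mutate(crossover(parents[rng.integers(25)], parents[rng.integers(25)]))
        for _ in range(25)
    ])
    pop = np.vstack([parents, children])

best = min(pop, key=evaluate_fis_sse)
print("Best SSE:", evaluate_fis_sse(best))
```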
5.3.4 Results
We present the results of the heuristically created FIS and of the GA-optimized version separately. We compare the improvements by examining the precision and recall of each target class, the prediction surface before and after GA optimization, and the SSE, as this is the function we are optimizing. Beginning with the heuristically created FIS, we can see from the Heuristic FIS column of Table 5.2 that the precision and recall for the ABA (0.91 and 0.77) and IPA (0.78 and 0.83) predictions are performing quite well. However, the precision and recall for APA (0.59 and 0.55) could use improvement. We also note that the SSE of the heuristic FIS is 30. From Fig. 5.2, we can see an uneven surface accompanied by unnecessary valleys. We, the authors, are not craft beer experts and thus do not know the exact ranges of ABV or IBU that constitute, say, an IPA. Still, based on the precision and recall in Table 5.1, it was a good attempt. These discrepancies in the membership functions are what cause the abnormalities found in the surface of the heuristic FIS.
Table 5.2 Precision, recall, and SSE of the heuristic FIS compared against the GA-optimized FIS
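A brief sketch of how such a comparison could be computed: per-class precision and recall via scikit-learn's classification_report, plus the SSE between the crisp FIS outputs and numeric class targets. All values below are illustrative stand-ins, not the chapter's data.

```python
# Per-class precision/recall and SSE for comparing two FIS variants.
import numpy as np
from sklearn.metrics import classification_report

y_true = ["ABA", "IPA", "APA", "ABA", "IPA", "APA", "ABA", "IPA"]
y_pred = ["ABA", "IPA", "ABA", "ABA", "IPA", "APA", "APA", "IPA"]
print(classification_report(y_true, y_pred, digits=2))

# SSE between crisp FIS outputs and numeric targets (illustrative values).
targets = np.array([0.2, 0.9, 0.5, 0.2, 0.9, 0.5, 0.2, 0.9])
outputs = np.array([0.25, 0.85, 0.30, 0.22, 0.88, 0.52, 0.45, 0.91])
print("SSE:", float(np.sum((targets - outputs) ** 2)))
```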
4.
Reify the f-SWRL rules to create a KG in OWL.
In this case, Steps 3 and 4 are the open research questions. Second, we may start from a KG to construct an optimized FIS.
1.
Find or construct a KG.
2.
Mine rules from the KG.
3.
Convert the mined rules into f-SWRL rules.
4.
Convert the rule base into an FIS.
5.
Optimize the FIS via GA.
The third scenario is similar—we instead start with a Fuzzy Ontology and initially attempt to mine f-SWRL rules. The pipeline would continue from Step 3, as above. For all three of the scenarios so far, it is also an open research question whether all information contained in the KG can be represented via rules, as f-SWRL is an extension of SWRL and SWRL is a subset of OWL. Given that many ontologies contain axioms with existential quantifiers in the consequent, SWRL may not be wholly sufficient, but this will need further investigation.
Finally, as an FIS excels in assisting a user in making an informed decision in the face of uncertainty or fuzziness, we imagine a clear intersection between FIS, the above scenarios, and the nascent field of Stream Reasoning. Stream Reasoning is the study of applying inference techniques to highly dynamic data, that is, data that might change on a second-to-second (or faster) basis. In particular, this data may be triples carrying information collected from sensors. Such sensor data will have uncertainty and fuzziness. A pertinent and open avenue of research would investigate how the use of an FIS (handcrafted or optimized) might complement the technologies available to the stream reasoning community.
5.5 Conclusions
In this book chapter, we analyzed different methodologies for optimizing a broadly recognized and used fuzzy inference system. First, crucial components of how data science has affected the scientific community were presented. The application of data science, and how different methodologies have played a crucial role over the years in the development of the area, was demonstrated. Then a background was given on fuzzy inference systems, genetic algorithms, and knowledge graphs, from both technical and literature perspectives.
Accordingly, a dataset was used and rules were created for an FIS. The output of the FIS was optimized with the use of a genetic algorithm, and the results of this procedure were presented. The results showed an improvement in recall and precision as well as a smoother prediction surface. Even though we are not experts in beer crafting, the results after the use of the GA are improved. In other words, our attempt at improving an FIS with the use of a GA worked.
Then, several routes were proposed for how, with the use of knowledge graphs, we can further improve the outputs of our optimized system. The proposed methodologies are future work targeting the integration of three different systems into one, with the main goal of optimizing an FIS.
References
1. Saltz, J.S.: The need for new processes, methodologies and tools to support big data teams and
improve big data project effectiveness. In: 2015 IEEE International Conference on Big Data
(Big Data), pp. 2066–2071. IEEE (2015)
2. Bhardwaj, A., Bhattacherjee, S., Chavan, A., Deshpande, A., Elmore, A.J., Madden, S.,
Parameswaran, A.G.: Datahub: Collaborative Data Science & Dataset Version Management at
Scale (2014). arXiv preprint arXiv:1409.0798
3. Rollins, J.: Why we need a methodology for data science (2015). https://www.ibmbigdatahub.com/blog/why-we-need-methodology-data-science. Accessed 06 Mar 2019
4. Papadakis Ktistakis, I.: An autonomous intelligent robotic wheelchair to assist people in need:
standing-up, turning-around and sitting-down. Doctoral dissertation, Wright State University
(2018)
5. Lee, C.C.: Fuzzy logic in control systems: fuzzy logic controller. II. IEEE Trans. Syst. Man
Cybern. 20(2), 419–435 (1990)
6. Abraham, A.: Adaptation of fuzzy inference system using neural learning. In: Fuzzy Systems
Engineering, pp. 53–83. Springer, Berlin, Heidelberg (2005)
8. Whitley, D.: A genetic algorithm tutorial. Stat. Comput. 4(2), 65–85 (1994)
[Crossref]
10. Zadeh, L.A.: Outline of a new approach to the analysis of complex systems and decision
processes. IEEE Trans. Syst. Man Cybern. 1, 28–44 (1973)
[MathSciNet][Crossref]
11. Rao, J.B., Zakaria, A.: Improvement of the switching of behaviours using a fuzzy inference
system for powered wheelchair controllers. In: Engineering Applications for New Materials
and Technologies, pp. 205–217. Springer, Cham (2018)
12. Bourbakis, N., Ktistakis, I.P., Tsoukalas, L., Alamaniotis, M.: An autonomous intelligent
wheelchair for assisting people at need in smart homes: a case study. In: 2015 6th
International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–
7. IEEE (2015)
13. Ktistakis, I.P., Bourbakis, N.G.: Assistive intelligent robotic wheelchairs. IEEE Potentials 36(1),
10–13 (2017)
[Crossref]
14. Ktistakis, I.P., Bourbakis, N.: An SPN modeling of the H-IRW getting-up task. In: 2016 IEEE
28th International Conference on Tools with Arti icial Intelligence (ICTAI), pp. 766–771. IEEE
(2016)
15. Ktistakis, I.P., Bourbakis, N.: A multimodal human-machine interaction scheme for an
intelligent robotic nurse. In: 2018 IEEE 30th International Conference on Tools with Arti icial
Intelligence (ICTAI), pp. 749–756. IEEE (2018)
16. Mohamed, S.R., Shohaimay, F., Ramli, N., Ismail, N., Samsudin, S.S.: Academic poster evaluation
by Mamdani-type fuzzy inference system. In: Regional Conference on Science, Technology and
Social Sciences (RCSTSS 2016), pp. 871–879. Springer, Singapore (2018)
17. Pourjavad, E., Mayorga, R.V.: A comparative study and measuring performance of
manufacturing systems with Mamdani fuzzy inference system. J. Intell. Manuf. 1–13 (2017)
18. Jain, V., Raheja, S.: Improving the prediction rate of diabetes using fuzzy expert system. IJ Inf.
Technol. Comput. Sci. 10, 84–91 (2015)
19. Danisman, T., Bilasco, I.M., Martinet, J.: Boosting gender recognition performance with a fuzzy
inference system. Expert Syst. Appl. 42(5), 2772–2784 (2015)
[Crossref]
20. Thakur, S., Raw, S.N., Sharma, R.: Design of a fuzzy model for thalassemia disease diagnosis:
using Mamdani type fuzzy inference system (FIS). Int. J. Pharm. Pharm. Sci. 8(4), 356–361
(2016)
22. Gong, M., Yang, Y.H.: Multi-resolution stereo matching using genetic algorithm. In: Proceedings
IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001), pp. 21–29. IEEE (2001)
23. Brown, C., Barnum, P., Costello, D., Ferguson, G., Hu, B., Van Wie, M.: Quake II as a robotic and
multi-agent platform. Robot. Vis. Tech. Rep. (2004). Available at http://hdl.handle.net/1802/1042
24. Yasuda, G.I., Takai, H.: Sensor-based path planning and intelligent steering control of
nonholonomic mobile robots. In: IECON’01 27th Annual Conference of the IEEE Industrial
Electronics Society, vol. 1, pp. 317–322 (Cat. No. 37243). IEEE (2001)
25. Sandstrom, K., Norstrom, C.: Managing complex temporal requirements in real-time control
systems. In: Proceedings Ninth Annual IEEE International Conference and Workshop on the
Engineering of Computer-Based Systems, pp. 103–109. IEEE (2002)
26. Uz, M.E., Hadi, M.N.: Optimal design of semi-active control for adjacent buildings connected by
MR damper based on integrated fuzzy logic and multi-objective genetic algorithm. Eng. Struct.
69, 135–148 (2014)
[Crossref]
27. Bobillo, F., Straccia, U.: The fuzzy ontology reasoner fuzzyDL. Knowl.-Based Syst. 95, 12–34
(2016)
[Crossref]
28. Di Noia, T., Mongiello, M., Nocera, F., Straccia, U.: A fuzzy ontology-based approach for tool-
supported decision making in architectural design. Knowl. Inf. Syst. 1–30 (2018)
29. Groth, W3C.: PROV-O: The PROV Ontology. https://www.w3.org/TR/prov-o/. Accessed 6 Apr
2019
30. Shimizu, C., Hitzler, P., Paul, C.: Ontology design patterns for Winston's taxonomy of
part-whole relationships. In: Proceedings of WOP (2018)
31. Straccia, U.: Fuzzy semantic web languages and beyond. In: International Conference on
Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 3–8.
Springer, Cham (2017)
32. Straccia, U.: An Introduction to Fuzzy & Annotated Semantic Web Languages (2018). arXiv
preprint arXiv:1811.05724
33. Straccia, U.: A minimal deductive system for general fuzzy RDF. In: International Conference
on Web Reasoning and Rule Systems, pp. 166–181. Springer, Berlin, Heidelberg (2009)
34. Straccia, U.: Towards a fuzzy description logic for the semantic web (preliminary report). In:
European Semantic Web Conference, pp. 167–181. Springer, Berlin, Heidelberg (2005)
35. Pan, J.Z., Stamou, G., Tzouvaras, V., Horrocks, I.: f-SWRL: a fuzzy extension of SWRL.
In: International Conference on Arti icial Neural Networks, pp. 829–834. Springer, Berlin,
Heidelberg (2005)
36. Lopes, N., Polleres, A., Straccia, U., Zimmermann, A.: AnQL: SPARQLing up annotated RDFS. In:
International Semantic Web Conference, pp. 518–533. Springer, Berlin, Heidelberg (2010)
37. Nguyen, V.T.K.: Semantic Web Foundations for Representing, Reasoning, and Traversing
Contextualized Knowledge Graphs (2017)
38. Bonatti, P.A., Decker, S., Polleres, A., Presutti, V.: Knowledge Graphs: New Directions for
Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371). Schloss Dagstuhl-
Leibniz-Zentrum fuer Informatik (2019)
Footnotes
1 https://www.youtube.com/watch?v=J_Q5X0nTmrA.
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_6
Benedetta Muzii
Email: [email protected]
Anna Esposito
Email: [email protected]
Abstract
Over a century ago, psychoanalysis created an unprecedented challenge: to show that the effects of the unconscious are more powerful than those of consciousness. In an inverted scheme at the present time, the neurosciences challenge psychoanalysis with experimental and clinical models that are clarifying crucial aspects of the human mind. Freud himself loved to say that psychological facts do not fluctuate in the air and that perhaps one day biologists and psychoanalysts would give a common explanation for psychic processes. Today, the rapid development of neuroimaging methods has ushered in a new season of research. Crucial questions are becoming more apparent. For instance, how can the brain generate conscious states? Does consciousness only involve limited areas of the brain? These are insistent questions at a time when the tendency of neuroscience to naturalize our relational life is ever more urgent. Consequently, these questions are also pressing: Does morality originate in the brain? Can we still speak of "being free" or of freedom? Why does morality even exist? Lastly, is there a biologically founded universal morality? This paper will try to demonstrate how neurophysiology itself shows the implausibility of a universal morality.
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_7
Giansalvo Cirrincione
Email: [email protected]
Eros Pasero
Email: [email protected]
Abstract
Dealing with time-varying high dimensional data is a big problem for
real time pattern recognition. Non-stationary topological
representation can be addressed in two ways, according to the
application: life-long modeling or by forgetting the past. The G-EXIN
neural network addresses this problem by using life-long learning. It
uses an anisotropic convex polytope, which models the shape of the
neuron neighborhood, and employs a novel kind of edge, called a bridge,
which carries information on the extent of the change of the distribution
over time.
In order to take into account the high dimensionality of data, a novel
neural network, named GCCA, which embeds G-EXIN as the basic
quantization tool, allows a real-time non-linear dimensionality
reduction based on the Curvilinear Component Analysis. If, instead, a
hierarchical tree is required for the interpretation of data clustering,
the new network GH-EXIN can be used. It uses G-EXIN for the clustering
of each tree node dataset. This chapter illustrates the basic ideas of this
family of neural networks and shows their performance by means of
synthetic and real experiments.
7.2 G-EXIN
G-EXIN [27] is an online, self-organizing, incremental neural network
whose number of neurons is determined by the quantization of the
input space. It uses seeds to colonize a new region of the input space,
and two distinct types of links (edges and bridges), to track data non-
stationarity. Each neuron is equipped with a weight vector to quantize
the input space and with a threshold to represent the average shape of
its region of influence. In addition, it employs a new anisotropic
threshold idea, based on the shape (convex hull) of the neuron
neighborhood, to better match the data topology. G-EXIN is incremental,
i.e. it can increase or decrease (pruning by age) the number of neurons.
It is also online: data taken directly from the input stream are fed only
once to the network. The training is never stopped, and the network
keeps adapting itself to each new datum; that is, it is stochastic in
nature.
(7.1)
(7.2a)
(7.2b)
where, as in online k-means [18], the winner learning rate decays with
the number of wins. Here, Ni is the total number of times wi has been
the first winner, and α and σ are two user-dependent parameters.
(b)
if xi is outside NGw1, only (2a) is used (Hard Competitive Learning,
HCL).
Next, for all the neurons that have been moved, i.e. whose weight
vector has changed, say φ-neurons, their thresholds are recomputed
and their activation flags are set to true.
Finally, all the φ-neurons' bridges, both ingoing and outgoing, are
checked, and all those which have the neurons at both their ends with
activation flags equal to true become edges.
2.
If there is a bridge, it is checked whether w1 is the bridge tail; in this
case, step 1 is performed and the bridge becomes an edge. Otherwise, a
seed is created by means of the neuron doubling:
(a)
a virtual adaptation of the w1 weight is estimated by HCL (only
(2a) is used) and considered as the weight of a new neuron
(doubling).
(b)
w1 and the new neuron are linked with an edge (age set to zero)
and their thresholds are computed (they correspond to their
Euclidean distance).
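Since the update equations (7.1)-(7.2b) are not reproduced above, the following is a minimal, illustrative Python sketch of one G-EXIN training iteration. It assumes a standard soft-competitive-learning (SCL) law in which the winner moves towards the datum and its edge neighbors follow with a Gaussian weight of width σ; all names and the exact update rule are our assumptions, not the reference implementation (pruning by age is omitted for brevity).

import numpy as np

# Minimal sketch of one G-EXIN training iteration (illustrative only).
class GEXINSketch:
    def __init__(self, alpha=1.0, sigma=0.03):
        self.alpha, self.sigma = alpha, sigma
        self.w = []            # neuron weight vectors
        self.thr = []          # per-neuron thresholds (region of influence)
        self.edges = set()     # links within a stationary region
        self.bridges = set()   # links signalling non-stationarity

    def step(self, x):
        # assumes the network has already been seeded with >= 1 neuron
        dists = [np.linalg.norm(x - wi) for wi in self.w]
        i1 = int(np.argmin(dists))                 # first winner
        if dists[i1] <= self.thr[i1]:
            moved = self._scl_update(i1, x)        # x inside region of influence
            # bridges whose two end neurons were both moved become edges
            for b in [b for b in self.bridges if b <= moved]:
                self.bridges.remove(b)
                self.edges.add(b)
        else:
            # novelty: colonize a new region by neuron doubling (HCL move)
            w_new = self.w[i1] + self.alpha * (x - self.w[i1])
            self.w.append(w_new)
            j = len(self.w) - 1
            self.edges.add(frozenset((i1, j)))     # new edge, age zero
            t = float(np.linalg.norm(self.w[i1] - w_new))
            self.thr[i1] = t                       # thresholds = mutual distance
            self.thr.append(t)

    def _scl_update(self, i1, x):
        # winner moves towards x; edge neighbors follow with a Gaussian factor
        moved = {i1}
        self.w[i1] = self.w[i1] + self.alpha * (x - self.w[i1])
        for e in self.edges:
            if i1 in e:
                (j,) = set(e) - {i1}
                g = np.exp(-np.linalg.norm(x - self.w[j]) ** 2 / self.sigma ** 2)
                self.w[j] = self.w[j] + self.alpha * g * (x - self.w[j])
                moved.add(j)
        return moved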
Fig. 7.2 GCCA flowchart: black blocks deal with G-EXIN quantization while red ones, specifically,
with GCCA projection
(7.3)
(7.4)
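To give a concrete flavor of the projection step that GCCA performs on top of the G-EXIN quantization, here is a toy online update in the style of the Curvilinear Component Analysis it builds upon [22, 25]: a new point is projected so that its output-space distances to the already-projected prototypes match the input-space ones within a neighborhood λ. The constants, names and iteration schedule are our own assumptions, not the GCCA algorithm itself.

import numpy as np

# Toy online-CCA projection of a new point x, given prototypes W (input
# space) and their current projections Y (output space); illustrative only.
def project_new_point(x, W, Y, alpha=0.3, lam=1.0, n_iter=20):
    y = Y[np.argmin(np.linalg.norm(W - x, axis=1))].copy()  # start at winner's image
    for _ in range(n_iter):
        for j in range(len(W)):
            X_j = np.linalg.norm(x - W[j])             # input-space distance
            Y_j = np.linalg.norm(y - Y[j]) + 1e-12     # output-space distance
            if Y_j < lam:                              # step-function neighborhood
                # pull/push y so that Y_j approaches X_j
                y += alpha * (X_j - Y_j) * (y - Y[j]) / Y_j
    return y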
7.4 GH-EXIN
Hierarchical clustering is an important technique to retrieve multi-
resolution information from data. It creates a tree of clusters, whose
levels correspond to different resolutions of data analysis. Generally,
e.g. in data mining, the outcome is richer information compared with
plain clustering.
The growing hierarchical GH-EXIN [28, 29] neural network builds a
hierarchical tree based on a stationary variant (i.e. without bridges) of
G-EXIN, called sG-EXIN. As before, the network is both incremental
(data-driven) and self-organized. It is a top-down, divisive technique, in
which all data start in a single cluster and, then, splits are done
recursively until all clusters satisfy certain conditions.
The algorithm starts from a single root node, to which the whole
dataset is fictitiously associated; then, using vertical and horizontal
growths, it builds a hierarchical tree (see Fig. 7.3). Vertical growth
refers to the addition of further layers below leaf nodes as long as a
higher resolution is needed; it always implies the creation of a seed, i.e.
a pair of neurons, which represents the starting structure of a new
sG-EXIN neural network. On the other hand, horizontal growth is the
process of adding further neurons to the seed. This characteristic is
important in order to be able to create complex hierarchical structures;
indeed, without it, only binary trees could be built. This process
is performed by the neuron creation mechanism during the sG-EXIN
training. Like G-EXIN, GH-EXIN uses the convex hull to define the
neuron neighborhood, which yields an anisotropic region of influence
for the horizontal growth. In addition, over time, it performs outlier
detection and, when needed, reallocates the associated data by using a
novel simultaneous approach on all the leaves.
The GH-EXIN training algorithm starts, as already mentioned, from
a single root node whose Voronoi set is the whole input dataset. It is
considered as the initial father node. A father neuron Ψ is the basis for
a further growth of the tree; indeed, new leaves are created (vertical
growth), whose father is Ψ and whose Voronoi sets are a partition (i.e. a
grouping of a set's elements into non-empty subsets whose pairwise
intersections are empty) of the Ψ one. More specifically, for each
father neuron Ψ which does not satisfy the vertical growth stop
criterion, a new seed is created as in G-EXIN and, then, an sG-EXIN
neural network is trained using the father Voronoi set as training set.
The neurons yielded by the training, which define a so-called neural
unit, become the sons of Ψ in the tree, determining a partition of its
Voronoi set. If the resulting network does not satisfy the horizontal
growth stop criterion, the training is repeated for further epochs (i.e.
presentations of the whole Ψ dataset) until the criterion is fulfilled.
At the end of each training epoch, if a neuron remains unconnected
(no neighbors) or is still lonely, it is pruned, but the associated data are
analyzed and possibly reassigned as explained later in this section.
At the end of each horizontal growth, the topology abstraction check
is performed to search for connected components within the graph of
the resulting neural unit. If more than one connected component is
detected, the algorithm tries to extract an abstract representation of
the data; for this purpose, each connected component, representing a
cluster of data, is associated with a novel abstract neuron, which
becomes the father node of the connected component neurons,
determining a double simultaneous vertical growth. The centroids of
the clusters the abstract neurons represent are used as their weight
vectors.
Then, each leaf at the same level of the hierarchy as Ψ that does not
satisfy the vertical growth stop criterion is considered as a father node
and the growth algorithm is repeated, until no more leaves are available
at that specific level.
Finally, the overall above procedure is repeated on all the leaves of
the novel, deeper level yielded by the previous vertical growth;
therefore, the tree can keep growing until the needed resolution is
reached, that is, until the vertical growth stop criterion is satisfied for
all the leaves of the tree.
It is worth noticing that such a mechanism allows a
simultaneous vertical and horizontal growth; indeed, the creation of a
seed below a father adds a further level to the tree
(i.e. vertical growth) and, at the same time, thanks to the sG-EXIN
training, several nodes are added to the same level (i.e. horizontal growth).
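As a compact illustration of this divisive scheme, the following runnable toy stands in scikit-learn's KMeans for the sG-EXIN training (so it only splits binarily, unlike the real horizontal growth) and uses the within-cluster variance as a stand-in for the H heterogeneity index; all names and thresholds are our assumptions.

import numpy as np
from sklearn.cluster import KMeans

# Toy divisive tree building in the spirit of GH-EXIN (KMeans replaces
# sG-EXIN, variance replaces the H index; illustrative only).
def grow(data, h_max, min_card):
    h = float(np.mean(np.sum((data - data.mean(axis=0)) ** 2, axis=1)))
    if h <= h_max or len(data) < min_card:           # vertical growth stop
        return {"cardinality": len(data), "H": h}    # leaf node
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(data)  # "seed" split
    return {"cardinality": len(data), "H": h,
            "children": [grow(data[labels == k], h_max, min_card)
                         for k in range(2)]}

rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(c, 0.1, size=(300, 2)) for c in (0.0, 1.0, 2.0)])
tree = grow(toy, h_max=0.05, min_card=50)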
The novelty test (Semi-Isotropic Region of Influence), the weight
update (SCL) and the pruning mechanism (pruning by age) are the
same as in G-EXIN. The difference is that GH-EXIN is based on sG-EXIN
which, as stated above, does not have bridges; as a consequence, each
time a new neuron is created along the GH-EXIN training process, it is
created as a lonely neuron, that is, a neuron with no edges. Then, in the
next iterations connections may be created according to the
Competitive Hebbian Rule; if, at the end of the epoch, the neuron is still
lonely, it is removed according to the pruning rule.
When a neuron is removed, its Voronoi set data remain orphans and
are labelled as potential outliers to be checked at the end of each epoch;
for each potential outlier x, i.e. each such datum, GH-EXIN seeks a
possible new candidate among all leaf nodes. If the closest neuron w
among the remaining ones, i.e. the new winner, belongs to the same
neural unit as x but the datum is outside its region of influence (the
hypersphere and the convex hull), x is not reassigned; otherwise, if x is
within the winner's region of influence within the same neural unit, or
in case the winner belongs to another neural unit, it is reassigned to the
winner's Voronoi set.
The growth stop criteria are used to drive, in an adaptive way, the
quantization process; for this reason, they are both based on the H
index, which depends on the application at hand and is used to
measure cluster heterogeneity and purity, i.e. their quality. For the
horizontal growth, the idea is to check whether the average estimated
H value of the neurons of the neural unit being built falls below a
percentage of the value of the father node. On the other hand, the
vertical growth stop criterion uses a global user-dependent threshold
for H; at the same time, to avoid too small, meaningless clusters, a
mincard parameter is used to establish the minimum cardinality of
Voronoi sets, i.e. the maximum meaningful resolution.
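In code, the two criteria reduce to a pair of tests of the following form (a sketch; h_max, h_perc and the H estimator itself are application-dependent, as stated above):

import numpy as np

# Vertical stop: a global threshold on H plus a minimum-cardinality guard.
def stop_vertical(h_leaf, h_max, cardinality, min_card):
    return h_leaf <= h_max or cardinality < min_card

# Horizontal stop: the average H of the unit under construction falls
# below a percentage of the father node's H value.
def stop_horizontal(h_neurons, h_father, h_perc):
    return float(np.mean(h_neurons)) <= h_perc * h_father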
7.5 Experiments
The performance of the above-mentioned neural networks has been
tested on both synthetic and real experiments. The aim has been to
check their clustering capabilities and to assess their specific abilities
(e.g. projection).
7.5.1 G-EXIN
The first experiment deals with data drawn uniformly from a 5000-
point square distribution, which, after an initial steady state
(stationary phase), starts to move vertically (non-stationary phase).
Indeed, in the beginning, the network is trained with data randomly
extracted (without repetition) from the 5000-point square. Then, after
the presentation of the whole training set, the (support of the)
distribution starts to move monotonically, with constant velocity, along
the y-axis in the positive direction. The results of G-EXIN (agemax = 2, α
= 1, σ = 0.03) are presented in Figs. 7.4 and 7.5 for the stationary
and non-stationary phases, respectively. Firstly, the network is able to
properly quantize the input distribution even along its borders; then, it
is able to fully capture the data evolution over time and to track it
after the end of the steady state. The importance of the density of
bridges as a signal of non-stationarity is also revealed in Fig. 7.6, which
shows how the number of bridges changes in time. In particular, the
growth is linear, which is a consequence of the constant velocity of the
distribution. G-EXIN correctly judges the data stream as drawn from a
single distribution with fully connected support, thanks to its links (i.e.,
edges and bridges). Figure 7.5 also shows that G-EXIN performs life-long
learning, in the sense that the previous quantization is not forgotten.
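A data stream of this kind is easy to reproduce; the sketch below generates the stationary phase followed by the drifting one (the drift velocity per step is our own choice, as it is not reported here):

import numpy as np

# Synthetic stream: a 5000-point uniform square, presented once in random
# order (stationary phase), then re-presented while its support drifts
# upward at constant velocity (non-stationary phase).
rng = np.random.default_rng(0)
square = rng.uniform(0.0, 1.0, size=(5000, 2))
stationary = rng.permutation(square)          # drawn without repetition
velocity = 0.002                              # illustrative drift per step
drifting = np.array([p + [0.0, velocity * t]
                     for t, p in enumerate(rng.permutation(square))])
stream = np.vstack([stationary, drifting])    # feed to the network once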
Fig. 7.4 G-EXIN: vertical moving square, stationary phase. Neurons (circles) and their links:
edges (green), bridges (red)
Fig. 7.5 G-EXIN: vertical moving square, non-stationary phase. Neurons (circles) and their links:
edges (green), bridges (red)
Fig. 7.6 G-EXIN: vertical moving square, number of bridges (Y-axis) over time (X-axis)
Summing up, the use of different, specific, anisotropic links has
proved to be an appropriate solution to track non-stationary
changes in the input distribution.
The second experiment deals with data drawn uniformly from a
5000-point square distribution whose support changes abruptly
(jumps) three times (from NW to NE, then from NE to SW and, finally,
from SW to SE), in order to test the behavior on abrupt changes. Figure
7.7 shows the results of G-EXIN (agemax = 9, α = 1, σ = 0.06) on this
dataset, where neuron weights are represented as small dots and links
as green (edges) and red (bridges) segments; the same color is used for
all neurons because the network does not perform any classification task.
Fig. 7.7 G-EXIN: three jumps moving square. Neurons (circles) and their links: edges (green),
bridges (red)
Not only does G-EXIN learn the data topology and preserve all the
information without forgetting the previous history, as in the previous
experiment, but it is also able to track an abrupt change in the
distribution by means of a single, long bridge. The length of the bridges
is proportional to the extent of the distribution change.
Figure 7.7 also shows that the G-EXIN graph is able to represent well
the borders of the squares because of its anisotropic threshold. On the
contrary, this is not possible with a simpler isotropic technique.
The third experiment deals with a more challenging problem: data
drawn from a dataset coming from the bearing failure diagnostic and
prognostic platform [30], which provides access to accelerated bearing
degradation tests. In particular, the test is based on a non-stationary
framework that evolves from an initial transient to its healthy state to a
double fault. Figure 7.8 shows G-EXIN (agemax = 3, α = 0.2, σ = 0.01) on
the experiment dataset during the complete life of the bearing: the
initial transient, the healthy state and the following deterioration (the
structure and color legend are the same as in the previous figures).
The transient phase is visible as the small cluster in the bottom left part
of the figure. Then, the long vertical bridge signals the onset of the
healthy state, which is represented as the central region made of
neurons connected by green and red links. Finally, to the right of and
above this region, longer and longer bridges form, which detect the
deterioration of the bearing.
Fig. 7.8 G-EXIN: bearing fault experiment. Neurons (circles) and their links: edges (green),
bridges (red)
7.5.3 GH-EXIN
Considering that GH-EXIN has been conceived for hierarchical
clustering, a dataset composed of two Gaussian mixture models has
been devised: the first model is made of three Gaussians, the second
one of four Gaussians, as shown in Fig. 7.11.
Fig. 7.11 GH-EXIN: Gaussian dataset. Data (blue points) and contours
The results, visualized in Figs. 7.12 and 7.13, clearly show that
GH-EXIN (Hmax = 0.001, Hperc = 0.9, αγ0 = 0.5, αi0 = 0.05, agemax = 5,
mincard = 300) builds the correct hierarchy (the tree is visualized in
Fig. 7.14): two nodes in the first layer (level), which represent the two
mixtures, and as many leaves as Gaussians in the second layer, which
represent the individual Gaussians. Neurons are also positioned
correctly w.r.t. the centers of the Gaussians.
Fig. 7.12 GH-EXIN: Gaussian dataset, first level of the hierarchy. Data (yellow points) and
neurons (blue points)
Fig. 7.13 GH-EXIN: Gaussian dataset, second level of the hierarchy. Data (yellow points) and
neurons (blue points)
Fig. 7.14 GH-EXIN: Gaussian dataset, final tree and cardinality of nodes and leaves
7.6 Conclusions
This chapter addresses the problem of inferring information from
unlabeled data drawn from stationary or non-stationary distributions.
To this aim, a family of novel unsupervised neural networks has been
introduced. The basic ideas are implemented in the G-EXIN neural
network, which is the basic tool of the family. The other neural
networks, GCCA and GH-EXIN, are extensions of G-EXIN, for
dimensionality reduction and hierarchical clustering, respectively. All
these networks exploit new peculiar tools: bridges, which are links for
detecting changes in the data distribution; anisotropic threshold for
taking into account the shape of the distribution; seed and associated
neuron doubling for the colonization of new distributions; soft-
competitive learning with the use of a Gaussian to represent the winner
neighborhood.
The experiments show that these neural networks work well in both
synthetic and real experiments. In particular, they perform life-long
learning, build a quantization of the input space, represent the data
topology with edges and the non-stationarity with bridges, perform the
CCA non-linear dimensionality reduction with an accuracy comparable
to that of the offline CCA, and yield the correct tree in the case of
hierarchical clustering. These are fast algorithms that require only a
few user-dependent parameters.
Future work will deal with the search for new automatic variants,
which self-calibrate their parameters, and with more challenging
applications.
References
1. Linde, Y., Buzo, A., Gray, R.: An algorithm for vector quantizer design. IEEE Trans. Commun.
28, 84–95 (1980)
2. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In:
Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability.
Berkeley (USA) (1967)
3. Martinetz, T., Schulten, K.: A “neural-gas” network learns topologies. Artif. Neural Netw. 397–
402 (1991)
4. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43,
59–69 (1982)
5. White, R.H.: Competitive Hebbian learning: algorithm and demonstrations. Neural Netw.
5(2), 261–275 (1992)
6. Martinetz, T., Schulten, K.: Topology representing networks. Neural Netw. 7(3), 507–522
(1994)
7. Prudent, Y., Ennaji, A.: An incremental growing neural gas learns topologies. In: Proceedings of
the IEEE International Joint Conference on Neural Networks. Montréal, Quebec, Canada (2005)
8. Furao, S., Ogura, T., Hasegawa, O.: An enhanced self-organizing incremental neural network
for online unsupervised learning. Neural Netw. 20, 893–903 (2007)
[Crossref]
9. Bouguelia, M.R., Belaïd, Y., Belaïd, A.: An adaptive incremental clustering method based on the
growing neural gas algorithm. In: 2nd International Conference on Pattern Recognition
Applications and Methods ICPRAM 2013. Barcelona, Spain (2013)
10. Bouguelia, M.R., Belaïd, Y., Belaïd, A.: Online unsupervised neural-gas learning method for
infinite data streams. In: Pattern Recognition Applications and Methods, pp. 57–70 (2015)
11.
Rougier, N.P., Boniface, Y.: Dynamic self-organizing map. Neurocomputing 74(11), 1840–1847
(2011)
12. Carpenter, G., Grossberg, S.: The ART of adaptive pattern recognition by a self-organizing
neural network. IEEE Comput. Soc. 21, 77–88 (1988)
13. Fritzke, B.: A growing neural gas network learns topologies. In: Advances in Neural
Information Processing System, vol. 7, pp. 625–632 (1995)
14. Fritzke, B.: A self-organizing network that can follow non-stationary distributions. In:
Proceedings of ICANN 97, International Conference on Artificial Neural Networks. Lausanne,
Switzerland (1997)
15. Ghesmoune, M., Lebbah, M., Azzag, H.: State-of-the-art on clustering data streams. In: Big Data
Analytics, pp. 1–13 (2016)
16. Ghesmoune, M., Azzag, H., Lebbah, M.: G-stream: growing neural gas over data stream. In:
Neural Information Processing, 21st International Conference, ICONIP, Kuching, Malaysia
(2014)
17. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very
large databases. In: SIGMOD Conference. New York (1996)
18. Kranen, P., Assent, I., Baldauf, C., Seidl, T.: The ClusTree: indexing micro-clusters for anytime
stream mining. Knowl. Inf. Syst. 29(2), 249–272 (2011)
19. Aggarwal, C.C., Philip, S.Y., Han, J., Wang, J.: A framework for clustering evolving data streams.
In: VLDB2003 Proceedings of the VLDB Endowment. Berlin (2003)
20. Cao, F., Estert, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream
with noise. In: SIAM International Conference on Data Mining (SDM06). Maryland (2006)
21. Isaksson, C., Dunham, M.H., Hahsler, M.: SOStream: self organizing density-based clustering
over data stream. In: 8th International Conference on Machine Learning and Data Mining
MLDM 2012. Berlin (2012)
22. Cirrincione, G., Hérault, J., Randazzo, V.: The on-line curvilinear component analysis (onCCA)
for real-time data reduction. In: Proceedings of the IEEE International Joint Conference on
Neural Networks. Killarney (Ireland) (2015)
23. Cirrincione, G., Randazzo, V., Pasero, E.: Growing curvilinear component analysis (GCCA) for
dimensionality reduction of nonstationary data. In: Multidisciplinary Approaches to Neural
Computing. Springer International Publishing, pp. 151–160 (2018)
24. Kumar, R.R., Randazzo, V., Cirrincione, G., Cirrincione, M., Pasero, E.: Analysis of stator faults in
induction machines using growing curvilinear component analysis. In: International
Conference on Electrical Machines and Systems ICEMS2017. Sydney (Australia) (2017)
25. Cirrincione, G., Randazzo, V., Pasero, E.: The growing curvilinear component analysis (GCCA)
neural network. Neural Netw. 103, 108–117 (2018)
26.
Cirrincione, G., Randazzo, V., Kumar, R.R., Cirrincione, M., Pasero, E.: Growing curvilinear
component analysis (GCCA) for stator fault detection in induction machines. In: Neural
Approaches to Dynamics of Signal Exchanges. Springer International Publishing (2019)
27. Randazzo, V., Cirrincione, G., Ciravegna, G., Pasero, E.: Nonstationary topological learning with
bridges and convex polytopes: the G-EXIN neural network. In: 2018 International Joint
Conference on Neural Networks (IJCNN). Rio de Janeiro (2018)
28. Barbiero, P., Bertotti, A., Ciravegna, G., Cirrincione, G., Pasero, E., Piccolo, E.: Unsupervised
gene identi ication in colorectal cancer. In: Quantifying and Processing Biomedical and
Behavioral Signals. Springer International Publishing, pp. 219–227 (2018)
29. Barbiero, P., Bertotti, A., Ciravegna, G., Cirrincione, G., Cirrincione, M., Piccolo, E.: Neural
biclustering in gene expression analysis. In: 2017 International Conference on Computational
Science and Computational Intelligence (CSCI). Las Vegas (2017)
30. NASA Ames Research Center: FEMTO Bearing Data Set, NASA Ames Prognostics Data
Repository. http://ti.arc.nasa.gov/project/prognostic-data-repository
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_8
A. Leone
Email: [email protected]
L. Giampetruzzi
Email: [email protected]
P. Siciliano
Email: [email protected]
Abstract
Electromyography (EMG) signals are widely used for monitoring joint
movements and muscle contractions in several healthcare
applications. Recent progress in surface EMG (sEMG)
technologies has allowed the development of minimally invasive and
reliable sEMG-based wearable devices for this purpose. These devices
enable long-term monitoring; however, they are often very expensive
and not easy to position appropriately. Moreover, they employ
single-use pre-gelled electrodes that can cause skin redness. To
overcome these issues, a prototype of a new smart sock has been
realized. It is equipped with reusable, stretchable and non-adhesive
hybrid polymer electrolyte-based electrodes and can send sEMG data
through a low-energy wireless connection. The developed
device detects EMG signals coming from the Gastrocnemius-Tibialis
muscles of the legs and is suitable for the assessment of lower-limb
related pathologies, such as age-related changes in gait, sarcopenia,
fall risk, etc. As a case study, the paper describes the use of the socks
to detect the risk of falling. A Machine Learning scheme has
been chosen in order to overcome the well-known drawbacks of the
threshold approaches widely used in pre-fall systems, in which the
algorithm parameters have to be set according to the users' specific
physical characteristics. The supervised classification phase has been
carried out through Linear Discriminant Analysis, which combines a
low computational cost with a high classification accuracy. The
developed system shows high performance in terms of sensitivity and
specificity (about 80%) in controlled conditions, with a mean lead-time
before the impact of about 700 ms.
8.1 Introduction
Recently, bio-signal measurements, among which electromyography
(EMG) and electroencephalography (EEG), have been increasingly in
demand. In particular, EMG is a medical procedure that provides the
acquisition of the electric potentials produced by the voluntary
contraction of the skeletal muscle fibers. These potentials are bio-
electric signals, acquired from the human body and then filtered to
reduce the noise produced by other electrical activities of the body or
by inappropriate contact of the sensors, namely artifacts. Then the
signals are processed in a control system to acquire information
regarding the anatomical and physiological characteristics of the
muscles and to make a diagnosis. In recent years, several works in the
literature have focused on the use of EMG signals in the medical context [1, 2].
These works record and analyze the intramuscular or surface EMG (sEMG)
signals in order to study the human body's behavior under normal and
pathological conditions. The sEMG measurement method is safer and
less invasive than the intramuscular technique, and it presents good
performance in monitoring muscle action potentials. It uses non-
invasive, skin-surface electrodes, realized with pre-gelled, textile or
hydrogel materials, located near the muscles of interest [3]. Medical
applications of electromyography analysis appear relevant for the
assessment of age-related changes in gait, and for diagnosis of
Sarcopenia Pathology (SP), Amyotrophic Lateral Sclerosis (ALS),
Multiple Sclerosis (MS) and other neuropathies, postural anomalies,
fall risk, etc. [1, 4]. For the considered diseases, the lower limb muscles
are mainly monitored through medical wired stations or portable and
wearable technologies. The latest progress in EMG technology has
allowed the development of minimally invasive and reliable EMG-based
wearable devices. They may be used in the monitoring of the elderly
during their normal activities for the detection of dangerous events in
healthcare. In this work the attention has been focused on the leg
muscles assessment for fall risk evaluation.
Fall events represent the second leading cause of accidental death
brought about by preventable injury. This rate mostly refers to people
over 60 years of age [5]. To date, several automatic integrated wearable
devices and ambient sensor devices capable of fall detection have been
constructed [6–9]. They present a good performance in terms of
sensitivity and specificity and can alert the caregiver, allowing a quick
medical intervention and the reduction of fall consequences. Although
these devices are remarkable, they cannot prevent injuries resulting
from the impact on the floor. To overcome this limitation, advanced
technologies should be developed for the timely recognition of
imbalance and fall events, thereby not only reducing the time to
probable medical intervention, but also allowing the activation of an
impact protection system (e.g., an airbag) before the impact. The current solutions for the
assessment of patient physical instability, presented in the literature,
primarily monitor the users’ body movements and their muscle
behaviors [1–10]. While the kinematic analysis of human movements is
mainly accomplished through context-aware systems, such as motion
capture systems, and wearable inertial sensors, implantable and
surface electromyography sensors have been used to conduct the
analysis of muscle behavior. The wearable devices are
more invasive than the context-aware systems, but they present some
important advantages: the re-design of the environments is not
required, outdoor operation is possible, and ethical constraints (e.g.
privacy) are more easily satisfied. For these reasons, this paper focuses
on fall risk detection systems based on wearable technologies. The
majority of the studies presented in the literature for wearable-based
fall risk assessment use accelerometer and gyroscope systems. These
mainly measure the acceleration, velocity and posture of
the user's body and appear to be a promising solution to reduce the fall
effect. Another strategy to evaluate human imbalance is provided by
the use of the electromyography technique, which measures the
electrical potentials produced by the lower limb muscles. These mainly
describe the changes in the muscle reaction sequence and contractile
force during an imbalance event. The related studies suggest that the
lack of balance causes a sudden modification of the EMG patterns
brought about by the reactive/corrective neuromuscular response
[11, 12]. This could indicate that imbalance detection systems based on
EMG signals may represent a very responsive and effective strategy for
fall prevention. In this kind of analysis, wired probes or wireless
devices integrating pre-gelled silver/silver chloride (Ag/AgCl)
electrodes are mainly used. However, these electrodes are single-use,
uncomfortable and unsuitable for long-term monitoring due to their
encumbrance and the skin irritation they cause. In Fig. 8.1 some
examples of wearable sEMG-based devices are reported. Although they
are minimally invasive and wireless, their placement is not very simple
and they use pre-gelled single-use electrodes.
Fig. 8.1 Examples of wearable and wireless sEMG-based devices
Fig. 8.4 HPE-based electrodes have been cast incorporating the clip, at the site where the
Myoware muscle sensor board is placed
One board was sewn onto each sock and connected to the Myoware
device through conductive wires. The whole system was supplied by
a rechargeable LiPo battery (3.7 V, 320 mAh), with dimensions of 26.5 ×
25 × 4 mm and a weight of 4 g. It was placed and glued on the rear part
of the Beetle board. Figure 8.7 shows the realized prototype. Each
electronic component was insulated with an acrylic resin lacquer; in
the future, non-invasive packaging will be provided to make the system
washable. The total current consumption was measured to evaluate the
lifetime of the battery. Based on the results, the whole system consumes
about 40 mA in data transmission mode. So, considering the employed
battery, the system is able to monitor the lower limb muscles and to
send data to a smartphone/embedded PC for about 8 h (320 mAh / 40
mA). Future improvements should aim to increase the system autonomy,
optimizing the hardware and its power management logic. The
prototype was realized by using an elastic sock to enhance the adhesion
between the electrodes and the skin. The sensors were located on the
socks in correspondence with the antagonist Gastrocnemius-Tibialis
muscles. The algorithmic framework for the processing of the EMG
signals coming from the sensorized socks was deployed and tested on
an embedded PC equipped with a Bluetooth module.
Fig. 8.10 Functioning scheme of the movable platform used to induce imbalance conditions to
perform falling events
Fig. 8.12 Example of pre-processed signals for the four sEMG channels, obtained during a
bending action simulation
where lowEMGi is the EMG signal value of the less active muscle,
while highEMGi is the corresponding value of the more active
muscle. The other time-domain features employed are the mean
absolute value (MAV) and the variance (VAR).
where Pτ(Ci) is the prior probability of class Ci, usually set to 1/M
under the assumption of equal priors; μ is the overall mean vector; and
Σi is the average scatter of the sample vectors of each class Ci around
its representative mean vector μi, i.e.
Σi = (1/ni) Σ_{x∈Ci} (x − μi)(x − μi)^T.
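As an illustration of this feature-plus-LDA pipeline, here is a runnable sketch on synthetic windowed sEMG data; the window size, channel count and toy labels are our assumptions, and the CCI feature is omitted because its exact formula is not reproduced above.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# MAV and VAR per channel, as in the feature table above.
def features(window):                      # window: (samples, channels)
    mav = np.mean(np.abs(window), axis=0)  # mean absolute value
    var = np.var(window, axis=0, ddof=1)   # variance
    return np.concatenate([mav, var])

# Toy dataset: 4-channel sEMG windows; 0 = ADL, 1 = imbalance event.
rng = np.random.default_rng(1)
windows = [rng.normal(scale=1.0 + label, size=(200, 4))
           for label in (0, 1) for _ in range(50)]
y = np.array([0] * 50 + [1] * 50)
X = np.array([features(w) for w in windows])

clf = LinearDiscriminantAnalysis().fit(X, y)   # low-cost supervised classifier
print("training accuracy:", clf.score(X, y))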
8.2.4 Results
To evaluate the performance of the system, the CCI and IEMG features
have been calculated for all the ADLs and imbalance events simulated
during the aforementioned acquisition campaign. Table 8.2 reports the
values of the chosen features obtained by considering the whole dataset.
Table 8.2 Mean and standard deviation of the features for the actions simulated during the data
collection phase
8.3 Conclusion
In this paper, new, minimally invasive surface electromyography-based
smart socks for the monitoring of the antagonist Gastrocnemius-
Tibialis muscles have been presented. The system is suitable for the
evaluation of several diseases related to lower limb movements and
activities, such as age-related changes in gait, fall risk, sarcopenia,
amyotrophic lateral sclerosis and other peripheral neuropathies. The
performance of the developed hardware-software system in terms of
sensitivity, specificity and lead-time before the impact is high, and the
level of users' acceptability is expected to be higher than that of the
sEMG/EMG-based wearable systems present in the literature and on
the market. The realized wearable sEMG-based system may play a
relevant role in healthcare applications aimed at monitoring the
elderly during their normal day-to-day activities in an easy and
effective way. Moreover, it may be used in the long-term monitoring of
muscular behavior for fall event recognition and for the activation of
impact protection systems. The adopted Machine Learning scheme is
computationally light; nevertheless, it shows high performance in
detection rate and generalization while ensuring a low detection time.
It thus increases the decision-making time available before a
wearable airbag device is activated. This may provide a significant
contribution to enhancing the effectiveness and reliability of wearable
pre-fall systems. Future improvements could be addressed to improve
the performance of the hardware system, increasing the lifetime of the
battery and the level of impermeability of the system.
References
1. Joyce, N.C., Gregory, G.T.: Electrodiagnosis in persons with amyotrophic lateral sclerosis. PM &
R: J. Injury Funct. Rehabil. 5(5 Suppl), S89–95 (2013)
[Crossref]
2. Chowdhury, R.H., Reaz, M.B., Ali, M.A., Bakar, A.A., Chellappan, K., Chang, T.G.: Surface
electromyography signal processing and classification techniques. Sensors (Basel). 13(9),
12431–12466 (2013)
[Crossref]
3. Ghasemzadeh, H., Jafari, R., Prabhakaran, B.: A body sensor network with electromyogram and
inertial sensors: multimodal interpretation of muscular activities. IEEE Trans. Inf. Technol.
Biomed. 14(2), 198–206 (2010)
4. Leone, A., Rescio, G., Caroppo, A., Siciliano, P.: A wearable EMG-based system pre-fall detector.
Procedia Eng. 120, 455–458 (2015)
[Crossref]
5. Chung, T., Prasad, K., Lloyd, T.E.: Peripheral neuropathy: clinical and electrophysiological
considerations. Neuroimaging Clin. N. Am. 24(1), 49–65 (2013)
6. Andò, B., Baglio, S., Marletta, V.: A neurofuzzy approach for fall detection. In: 23rd ICE/IEEE
ITMC Conference, Madeira Island, Portugal, 27–29 June 2017
7. Andò, B., Baglio, S., Marletta, V.: An inertial microsensor-based wearable solution for the
assessment of postural instability. In: ISOCS-MiNaB-ICT-MNBS, Otranto, Lecce, 25–29 June
2016
8. Bagalà, F., Becker, C., Cappello, A., Chiari, L., Aminian, K., Hausdorff, J.M., Zijlstra, W., Klenk, J.:
Evaluation of accelerometer-based fall detection algorithms on real-world falls. PLoS ONE 7,
e37062 (2012)
[Crossref]
9. Siciliano, P., Leone, A., Diraco, G., Distante, C., Malfatti, M., Gonzo, L., Grassi, M., Lombardi, A.,
Rescio, G., Malcovati, P.: A networked multisensor system for ambient assisted living
application. Advances in sensors and interfaces. In: IWASI, pp. 139–143 (2009)
10. Rescio, G., Leone, A., Siciliano, P.: Supervised expert system for wearable MEMS
accelerometer-based fall detector. J. Sens. 2013, Article ID 254629, 11 (2013)
11. Blenkinsop, G.M., Pain, M.T., Hiley, M.J.: Balance control strategies during perturbed and
unperturbed balance in standing and handstand. R. Soc. Open Sci. 4(7), 161018 (2017)
12. Galeano, D., Brunetti, F., Torricelli, D., Piazza, S., Pons, J.L.: A tool for balance control training
using muscle synergies and multimodal interfaces. BioMed Res. Int. 565370 (2014)
13. Park, S., Jayaraman, S.: Smart textiles: wearable electronic systems. MRS Bull. 28, 585–591
(2013)
[Crossref]
14. Matsuhisa, N., Kaltenbrunner, M., Yokota, T., Jinno, H., Kuribara, K., Sekitani, T., Someya, T.:
Printable elastic conductors with a high conductivity for electronic textile applications. Nat.
Commun. 6, 7461 (2015)
15. Colyer, S.L., McGuigan, P.M.: Textile electrodes embedded in clothing: a practical alternative to
traditional surface electromyography when assessing muscle excitation during functional
movements. J. Sports Sci. Med. 17(1), 101–109 (2018)
16. Posada-Quintero, H., Rood, R., Burnham, K., Pennace, J., Chon, K.: Assessment of
carbon/salt/adhesive electrodes for surface electromyography measurements. IEEE J. Transl.
Eng. Health Med. 4, 2100209 (2016)
[Crossref]
17. Kim, D., Abidian, M., Martin, D.C.: Conducting polymers grown in hydrogel scaffolds coated on
neural prosthetic devices. J. Biomed. Mater. Res. 71A, 577–585 (2004)
[Crossref]
18.
Mahmud, H.N., Kassim, A., Zainal, Z., Yunus, W.M.: Fourier transform infrared study of
polypyrrole–poly(vinyl alcohol) conducting polymer composite films: evidence of film
formation and characterization. J. Appl. Polym. Sci. 100, 4107–4113 (2006)
[Crossref]
19. Li, Y., Zhu, C., Fan, D., Fu, R., Ma, P., Duan, Z., Chi, L.: Construction of porous sponge-like PVA-
CMC-PEG hydrogels with pH-sensitivity via phase separation for wound dressing. Int. J.
Polym. Mater. Polym. Biomater. 1–11 (2019)
20. Green, R.A., Baek, S., Poole-Warren, L.A., Martens, P.J.: Conducting polymer-hydrogels for
medical electrode applications. Sci. Technol. Adv. Mater. 11(1), 014107 (2010)
[Crossref]
21. Dai, W.S., Barbari, T.A.: Hydrogel membranes with mesh size asymmetry based on the gradient
crosslinking of poly (vinyl alcohol). J. Membr. Sci. 156(1), 67–79 (1999)
[Crossref]
22. Li, Y., Zhu, C., Fan, D., Fu, R., Ma, P., Duan, Z., Chi, L.: A bi-layer PVA/CMC/PEG hydrogel with
gradually changing pore sizes for wound dressing. Macromol. Biosci. 1800424 (2019)
23. Saadiah, M.A., Samsudin, A.S.: Study on ionic conduction of solid bio-polymer hybrid
electrolytes based carboxymethyl cellulose (CMC)/polyvinyl alcohol (PVA) doped NH4NO3.
In: AIP Conference Proceedings, vol. 2030, no. 1. AIP Publishing (2018)
24. Vieira, M.G.A., da Silva, M.A., dos Santos, L.O., Beppu, M.M.: Natural-based plasticizers and
biopolymer films: a review. Eur. Polymer J. 47(3), 254–263 (2011)
[Crossref]
25. Mali, K.K., Dhawale, S.C., Dias, R.J., Dhane, N.S., Ghorpade, V.S.: Citric acid crosslinked
carboxymethyl cellulose-based composite hydrogel films for drug delivery. Indian J. Pharm.
Sci. 80(4), 657–667 (2018)
[Crossref]
26. http://www.advancertechnologies.com
27. https://www.dfrobot.com
28. De Luca, C.J., Gilmore, L.D., Kuznetsov, M., Roy, S.H.: Filtering the surface EMG signal:
movement artifact and baseline noise contamination. J. Biomech. 43(8), 1573–1579 (2010)
[Crossref]
29. Phinyomark, A., Chujit, G., Phukpattaranont, P., Limsakul, C., Huosheng, H.: A preliminary study
assessing time-domain EMG features of classifying exercises in preventing falls in the elderly.
In: 9th International Conference on Electrical Engineering/Electronics, Computer,
Telecommunications and Information Technology (ECTI-CON), pp. 1, 4, 16–18 (2012)
30. Horsak, B., et al.: Muscle co-contraction around the knee when walking with unstable shoes.
J. Electromyogr. Kinesiol. 25 (2015)
31. Mansor, M.N., Syam, S.H., Rejab, M.N., Syam, A.H.: Automatically infant pain recognition based
on LDA classifier. In: 2012 International Symposium on Instrumentation & Measurement,
Sensor Network and Automation (IMSNA), Sanya, pp. 380–382 (2012)
32.
Rescio, G., Leone, A., Siciliano, P.: Supervised machine learning scheme for electromyography-
based pre-fall detection system. Expert Syst. Appl. 100, 95–105 (2018)
[Crossref]
33. Wu, G., Xue, S.: Portable preimpact fall detector with inertial sensors. IEEE Trans. Neural Syst.
Rehabil. Eng. 16(2), 178–183 (2008)
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_9
Stefano Marrone
Email: [email protected]
Abstract
Modern material and immaterial infrastructures have seen a growth in
their complexity as well as in the criticality of the role they play in this
interconnected society. Such growth has brought a need for protection,
in particular of vital services (e.g., electricity, water supply, computer
networks). This chapter introduces the problem of stating in
mathematical terms a useful definition of vulnerability for distributed
and networked systems; this definition is then mapped onto the
well-known formalism of Bayesian Networks. A demonstration of the
applicability of this theoretical framework is given by describing the
distributed car plate recognition problem, one of the possible facets of
the smart city model.
9.1 Introduction
The availability of a massive amount of data has enabled the massive
application of machine learning and deep learning techniques across
the domain of computer-based critical systems. A huge set of automatic
learning frameworks is now available, able to tackle different kinds of
systems and enabling the diffusion of Big Data analysis, cloud
computing systems and the (Industrial) Internet of Things (IoT). As
such applications become more and more widespread, data analysis
techniques have shown their capability to identify operational patterns
and to predict future behavior, anticipating possible problems. A
widespread example is constituted by the installation of cameras inside
urban areas; such cameras are used for different purposes, ranging
from traffic to city light management, and the images they produce can
be easily and automatically reused for other purposes.
On the other hand, models have been extensively used in many
computer-intensive activities: one above all, the formal dependability
assessment of critical infrastructures (CIs).
One of the main challenges of CI design and operation management
is the quantification of critical aspects such as resilience [6, 7] and
security [41], in order to support evidence-driven protection
mechanisms against several known and unknown threats. Modern
infrastructures are required to realize more and more critical
functions (i.e., to guarantee that their security level fits the
requirements set by customers and/or international standards). These
infrastructures are characterized by internal complexities as well as a
high degree of inter-dependency among them. This results in an under-
specified nature of operations in complex systems that generates
potential for unforeseeable failures and cascading effects [8]. Thus, the
higher the complexity, the more credible it is that protection systems
present exploits and vulnerabilities.
The advantages of the integration between data-driven and explicit
knowledge are numerous: (a) to scale up the complexity of data
analysis by reducing the size of real-world problems; (b) to boost
human activities in the supervision of complex system operations; (c)
to improve the trustworthiness of the system models built manually;
(d) to enhance the accuracy of the results predicted by the analysis; (e)
to support the creation of models-at-runtime, that is, to align models
with data logged by the system in operation; (f) to enable the automatic
validation of models extracted by data mining.
This chapter describes one of these modeling approaches, formally
defining the concept of distributed vulnerability.
After recalling and formalizing the main concepts of distributed
vulnerability, the chapter defines a mapping between this formalism
and languages that could ease the analysis and evaluation of the
distributed vulnerability. The chapter focuses on Bayesian Networks
(BNs) as a tool to easily implement the defined mathematical approach.
The third objective of the chapter is to discuss the application of
such a framework to a Smart City problem of image processing and
computer vision: License Plate Clone Recognition.
The structure of the chapter is the following: this section introduces
the problem and motivates the chapter. Section 9.2 discusses related
work, while Sect. 9.3 provides some background needed to understand
the chapter. Section 9.4 gives a formal definition of the distributed
vulnerability concept. Section 9.5 presents the mapping from this
language to BNs. Section 9.6 describes the case study with its functions
and components; Sect. 9.7 applies the modeling approach to such a
problem. Section 9.8 ends the chapter, discussing the results and
addressing further research.
(9.3)
(9.4)
(9.5)
(9.6)
(9.7)
Equation (9.2) defines the set of events that may occur in the system (EV);
Eq. (9.3) defines the set of assessment functions (AS); Eq. (9.4) defines
the set of sensor devices (SE); elements of the relation, see Eq. (9.5), are
tuples saying that the event a triggers the activation of the
(9.9)
The semantics of the two node states is the following. For an attack
phase node: ok = the attack phase is successful; ko = the attack phase
fails or has not been attempted. For an assessment rule node: ok = the
rule has been activated and the threat detected; ko = the threat has not
been detected. For a sensor node: ok = the sensor has raised a warning;
ko = the sensor is not producing any alarm.
(9.10)
(9.11)
In summary, the vulnerability is the probability of successful attack
given the occurrence of a threat:
(9.12)
so, the vulnerability of the system for the i-th alarm according to the j-
th attack pattern is the following:
(9.13)
that becomes
(9.14)
Here, the concept of the distributed vulnerability, in response to an
attack pattern ap, can be defined as follows:
(9.15)
ok ko
0 1
0 1
Sensors
Say d is the sensor under consideration, s the service (of another
infrastructure) triggering d with probability , and a the attack triggering
ok ko
0 1
Assessment Functions
Say rl is the assessment rule under consideration and d the sensing
device triggering rl: CPTs are built according to Table 9.4.
Table 9.4 CPT of bn(rl)
ok ko
1 0
0 1
All of these cases can be extended when there is more than one
occurrence per parent type. For example, if there is more than one
sensor as input to an assessment rule, all of them must be ok in order
to activate the rule.
Computing the posterior probability on the BN model means
calculating the probability of a malfunction of the component in case of
attack. According to the given definitions, it represents the vulnerability
function. BN analysis algorithms allow the posterior probabilities of all
the nodes of the model to be evaluated efficiently: thus, this formalism
is well suited to computing the distributed vulnerability function.
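As a minimal sketch of this computation, the following toy model chains a single attack, sensor and rule node with the CPT patterns shown above (e.g., Table 9.4) and queries the rule's posterior given that the attack occurred. It uses the pgmpy library; the node names and the 0.9 detection probability are our own illustrative choices, not values from the case study.

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# State index 0 = ok, 1 = ko. Toy chain: attack -> sensor -> rule.
model = BayesianNetwork([("attack", "sensor"), ("sensor", "rule")])
model.add_cpds(
    TabularCPD("attack", 2, [[0.3], [0.7]]),           # prior on the attack phase
    TabularCPD("sensor", 2, [[0.9, 0.0],               # warns with prob. 0.9 if attacked
                             [0.1, 1.0]],
               evidence=["attack"], evidence_card=[2]),
    TabularCPD("rule", 2, [[1.0, 0.0],                 # deterministic rule (Table 9.4)
                           [0.0, 1.0]],
               evidence=["sensor"], evidence_card=[2]),
)
# Posterior of the rule node given that the attack phase succeeded:
result = VariableElimination(model).query(["rule"], evidence={"attack": 0})
print(result)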
Pros and cons of the LPR architectures:
Naive centralized — Pros: extra simple; sites do not need extra
hardware. Cons: demands high bandwidth and high computational
power of the central server.
Centralized — Pros: quite simple; it does not need a large-bandwidth
network. Cons: the LPR server is replicated on each site.
Decentralized — Pros: it scales with the number and the size of the
sites. Cons: a single point of failure (not for performance but for fault
tolerance and security) is still present (HLR); quite complex.
Distributed — Pros: fully scales with the growth of the system; simple
system architecture. Cons: complex software architecture; complex
computing paradigm.
Figure 9.6 depicts the order in which the phases of the LPR
process are accomplished: Reliability Estimation, Car Plate Detection
and Number Extraction.
and that the same car cannot have two different colors within the same
time interval ( )
(9.21)
(9.22)
(9.23)
Let us suppose that the LPR devices are all of the same kind—i.e., have
the same performance—and the same for the COL sensors. As for the
relation, this set has three kinds of elements:
(9.24)
where is the probability of detecting a plate x, or
(9.25)
where is the probability of detecting the RED color, or
(9.26)
where is the probability of detecting the BLUE color. Furthermore,
elements of are:
(9.27)
with s a generic sensor, r a generic rule and the probability that the
sensor s is working. Let us suppose for simplicity that the rules are
deterministic, i.e., all the rules have probability 1 of succeeding when
their preconditions are met.
(9.28)
According to this formalization, it is possible to generate a BN model as
depicted in Fig. 9.8, where gray nodes are present but their arcs are not
reported, to keep the drawing readable. Up to now, there is no tool in
charge of automating such a translation process: implementing one is a
straightforward task and, as future research work, an automatic
translation and analysis tool will be provided and made publicly
available.
x ok ko
ko ko 0 1
ko ok 0 1
ok ko 0 1
ok ok 1 0
ok ko
ko ko ko ko 0 1
ko ko ko ok 0 1
ko ko ok ko 0 1
ko ko ok ok 1 0
ko ok ko ko 0 1
ko ok ko ok 1 0
ko ok ok ko 1 0
ko ok ok ok 1 0
ok ko ko ko 0 1
ok ko ko ok 1 0
ok ko ok ko 1 0
ok ko ok ok 1 0
ok ok ko ko 1 0
ok ok ko ok 1 0
ok ok ok ko 1 0
ok ok ok ok 1 0
ko ko ko ko 0 1
ko ko ko ok 0 1
ko ko ok ko 0 1
ko ko ok ok 1 0
ko ok ko ko 0 1
ko ok ko ok 0 1
ko ok ok ko 1 0
ko ok ok ok 1 0
ok ko ko ko 0 1
ok ko ko ok 1 0
ok ko ok ko 0 1
ok ko ok ok 1 0
ok ok ko ko 1 0
ok ok ko ok 1 0
ok ok ok ko 1 0
ok ok ok ok 1 0
SamePlate DiffColor ok ko
ko ko 0 1
ko ok 0 1
ok ko 0 1
ok ok 1 0
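Deterministic CPTs like the ones above need not be written by hand: they can be generated from a boolean predicate over the parent states. The first four-parent table matches an "at least two parents ok" condition, while the SamePlate/DiffColor table is a plain AND (the second four-parent table encodes a different, position-dependent combination of parent pairs). The following helper is our own sketch (True = ok) and produces the first and last patterns.

from itertools import product

# Generate the (P(ok), P(ko)) column of a deterministic CPT row by row,
# enumerating all combinations of parent states (True = ok, False = ko).
def deterministic_cpt(n_parents, predicate):
    rows = []
    for states in product([False, True], repeat=n_parents):
        ok = predicate(states)
        rows.append((states, (1, 0) if ok else (0, 1)))
    return rows

at_least_two_ok = deterministic_cpt(4, lambda s: sum(s) >= 2)
same_plate_and_diff_color = deterministic_cpt(2, all)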
References
1. Allam, Z., Dhunny, Z.A.: On big data, artificial intelligence and smart cities. Cities 89, 80–91
(2019)
[Crossref]
2.
Anagnostopoulos, C.-N.E., Anagnostopoulos, I.E., Psoroulas, I.D., Loumos, V., Kayafas, E.:
License plate recognition from still images and video sequences: a survey. IEEE Trans. Intell.
Transp. Syst. 9(3), 377–391 (2008)
[Crossref]
3. Bagheri, E., Ghorbani, A.A.: UML-CI: a reference model for profiling critical infrastructure
systems. Inf. Syst. Front. 12(2), 115–139 (2010)
[Crossref]
4. Bapin, Y., Zarikas, V.: Smart building’s elevator with intelligent control algorithm based on
Bayesian networks. Int. J. Adv. Comput. Sci. Appl. 10(2), 16–24 (2019)
5. Barrere, M., Badonnel, R., Festor, O.: Towards the assessment of distributed vulnerabilities in
autonomic networks and systems. In: 2012 IEEE Network Operations and Management
Symposium (NOMS), pp. 335–342 (2012)
6. Bellini, E., Ceravolo, P., Nesi, P.: Quantify resilience enhancement of UTS through exploiting
connected community and internet of everything emerging technologies. ACM Trans. Internet
Technol. 18(1) (2017)
7. Bellini, E., Coconea, L., Nesi, P.: A functional resonance analysis method driven resilience
quantification for socio-technical systems. IEEE Syst. J. 1–11 (2019)
8. Bellini, E., Nesi, P., Coconea, L., Gaitanidou, E., Ferreira, P., Simoes, A., Candelieri, A.: Towards
resilience operationalization in urban transport system: the resolute project approach. In:
Proceedings of the 26th European Safety and Reliability Conference on Risk, Reliability and
Safety: Innovating Theory and Practice, ESREL 2016, p. 345 (2017)
9. Bellini, E., Nesi, P., Pantaleo, G., Venturi, A.: Functional resonance analysis method-based
decision support tool for urban transport system resilience management. In: IEEE 2nd
International Smart Cities Conference: Improving the Citizens Quality of Life, ISC2 2016,
Proceedings (2016)
10. Bobbio, A., Ciancamerla, E., Franceschinis, G., Gaeta, R., Minichino, M., Portinale, L.: Sequential
application of heterogeneous models for the safety analysis of a control system: a case study.
Reliab. Eng. Syst. Saf. 81(3), 269–280 (2003)
[Crossref]
11. Boreiko, O., Teslyuk, V.: Model of a controller for registering passenger flow of public
transport for the “smart” city system. In: 2017 14th International Conference The Experience
of Designing and Application of CAD Systems in Microelectronics, CADSM 2017, Proceedings,
pp. 207–209 (2017)
12. Chang, S.-L., Chen, L.-S., Chung, Y.-C., Chen, S.-W.: Automatic license plate recognition. IEEE
Trans. Intell. Transp. Syst. 5(1), 42–53 (2004)
[Crossref]
13. Chen, B., Cheng, H.H.: A review of the applications of agent technology in traffic and
transportation systems. IEEE Trans. Intell. Transp. Syst. 11(2), 485–497 (2010)
[Crossref]
14.
Dolinina, O., Pechenkin, V., Gubin, N., Kushnikov, V.: A petri net model for the waste disposal
process system in the “smart clean city” project. In: ACM International Conference Proceeding
Series (2018)
15. Fanti, M.P., Mangini, A.M., Roccotelli, M.: A petri net model for a building energy management
system based on a demand response approach. In: 2014 22nd Mediterranean Conference on
Control and Automation, MED 2014, pp. 816–821 (2014)
16. Flammini, F., Marrone, S., Mazzocca, N., Pappalardo, A., Pragliola, C., Vittorini, V.:
Trustworthiness evaluation of multi-sensor situation recognition in transit surveillance
scenarios. In: Proceedings of SECIHD Conference. LNCS, vol. 8128 (2013)
17. Flammini, F., Marrone, S., Mazzocca, N., Vittorini, V.: A new modeling approach to the safety
evaluation of n-modular redundant computer systems in presence of imperfect maintenance.
Reliab. Eng. Syst. Saf. 94(9), 1422–1432 (2009)
[Crossref]
18. Flammini, F., Marrone, S., Mazzocca, N., Vittorini, V.: Petri net modelling of physical
vulnerability. Critical Information Infrastructure Security. LNCS, vol. 6983, pp. 128–139.
Springer (2013)
19. Flammini, F., Vittorini, V., Pappalardo, A.: Challenges and emerging paradigms for augmented
surveillance. Effective Surveillance for Homeland Security. Chapman and Hall/CRC (2013)
20. Frigault, M., Wang, L., Singhal, A., Jajodia, S.: Measuring network security using dynamic
Bayesian network. In: Proceedings of the 4th ACM Workshop on Quality of Protection, QoP
’08, New York, NY, USA, pp. 23–30. ACM (2008)
21. Gentile, U., Marrone, S., De Paola, F., Nardone, R., Mazzocca, N., Giugni, M.: Model-based water
quality assurance in ground and surface provisioning systems. In: Proceedings—2015 10th
International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 3PGCIC 2015,
pp. 527–532
22. Gentile, U., Marrone, S., Mazzocca, N., Nardone, R.: Cost-energy modelling and profiling of
smart domestic grids. Int. J. Grid Utility Comput. 7(4), 257–271 (2016)
[Crossref]
23. Ghahramani, Z., Kim, H.C.: Bayesian classifier combination (2003)
24. Häring, I., Sansavini, G., Bellini, E., Martyn, N., Kovalenko, T., Kitsak, M., Vogelbacher, G., Ross, K.,
Bergerhausen, U., Barker, K., Linkov, I.: Towards a generic resilience management,
quantification and development process: general definitions, requirements, methods,
techniques and measures, and case studies. NATO Science Peace Secur. Ser. C Environ. Secur.
Part F1, 21–80 (2017)
[Crossref]
25. Huang, C., Wu, X., Wang, D.: Crowdsourcing-based urban anomaly prediction system for smart
cities. In: International Conference on Information and Knowledge Management, Proceedings,
24–28-Oct 2016, pp. 1969–1972 (2016)
26. Ismagilova, E., Hughes, L., Dwivedi, Y.K., Raman, K.R.: Smart cities: advances in research—an
information systems perspective. Int. J. Inf. Manag. 47, 88–100 (2019)
[Crossref]
27. Jürjens, J.: UMLsec: extending UML for secure systems development. In: Proceedings of the
5th International Conference on The Unified Modeling Language, UML ’02, London, UK,
pp. 412–425. Springer (2002)
28. Kasaei, S.H.M., Kasaei, S.M.M.: Extraction and recognition of the vehicle license plate for
passing under outside environment. In: 2011 European Intelligence and Security Informatics
Conference (EISIC), pp. 234–237 (2011)
29. Korb, K.B., Nicholson, A.E.: Bayesian Arti icial Intelligence, 2nd edn. CRC Press Inc., Boca
Raton, FL, USA (2010)
[Crossref]
30. Langseth, H., Portinale, L.: Bayesian networks in reliability. Reliab. Eng. Syst. Saf. 92(1), 92–
108 (2007)
[Crossref]
31. Latorre-Biel, J.-I., Faulin, J., Jiménez, E., Juan, A.A.: Simulation model of traffic in smart cities
for decision-making support: case study in Tudela (Navarre, Spain). Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics). LNCS, vol. 10268, pp. 144–153 (2017)
32. Lund, M.S., Solhaug, B., Stølen, K.: Risk analysis of changing and evolving systems using CORAS.
In: Aldini, A., Gorrieri, R. (eds.) Foundations of Security Analysis and Design VI, pp. 231–274.
Springer, Berlin, Heidelberg (2011)
33. Marrone, S., Rodríguez, R.J., Nardone, R., Flammini, F., Vittorini, V.: On synergies of cyber and
physical security modelling in vulnerability assessment of railway systems. Comput. Electr.
Eng. 47, 275–285 (2015)
[Crossref]
34. Mauw, S., Oostdijk, M.: Foundations of attack trees. In: 8th International Conference on
Information Security and Cryptology—ICISC 2005, Seoul, Korea, 1–2 Dec 2005, pp. 186–198.
Revised Selected Papers (2005)
35. Pederson, P., Dudenhoeffer, D., Hartley, S., Permann, M.: Critical infrastructure
interdependency modeling: a survey of U.S. and international research. Technical Report,
Idaho National Laboratory (2006)
36. Pettet, G., Nannapaneni, S., Stadnick, B., Dubey, A., Biswas, G.: Incident analysis and prediction
using clustering and Bayesian network. In: 2017 IEEE SmartWorld Ubiquitous Intelligence
and Computing, Advanced and Trusted Computed, Scalable Computing and Communications,
Cloud and Big Data Computing, Internet of People and Smart City Innovation,
SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI 2017—Conference Proceedings, pp. 1–8
(2018)
37. Quaglietta, E., D’Acierno, L., Punzo, V., Nardone, R., Mazzocca, N.: A simulation framework for
supporting design and real-time decisional phases in railway systems. In: IEEE Conference on
Intelligent Transportation Systems, Proceedings, ITSC, pp. 846–851 (2011)
38. Ranawana, R., Palade, V.: Multi-classifier systems: review and a roadmap for developers. Int. J.
Hybrid Intell. Syst. 3(1) (2006)
39. Räty, T.D.: Survey on contemporary remote surveillance systems for public safety. Trans. Syst.
Man Cyber Part C 40(5), 493–515 (2010)
[Crossref]
40. Rinaldi, S.M., Peerenboom, J.P., Kelly, T.K.: Identifying, understanding, and analyzing critical
infrastructure interdependencies. IEEE Control Syst. Mag. 21(6), 11–25 (2001)
[Crossref]
41. Sha, L., Gopalakrishnan, S., Liu, X., Wang, Q.: Cyber-physical systems: a new frontier. In: IEEE
International Conference on Sensor Networks, Ubiquitous and Trustworthy Computing, 2008,
SUTC ’08, pp. 1–9 (2008)
42. Sharifi, H., Shahbahrami, A.: A comparative study on different license plate recognition
algorithms. In: Cherifi, H., Zain, J.M., El-Qawasmeh, E. (eds.) Digital Information and
Communication Technology and Its Applications. Communications in Computer and
Information Science, vol. 167, pp. 686–691. Springer, Berlin, Heidelberg (2011)
[Crossref]
43. Simpson, E., Roberts, S., Psorakis, I., Smith, A.: Dynamic Bayesian combination of multiple
imperfect classifiers. In: Guy, T.V., Karny, M., Wolpert, D. (eds.) Decision Making and
Imperfection. Studies in Computational Intelligence, vol. 474. Springer (2013)
44. Skinner, S.C., Stracener, J.T.: A graph theoretic approach to modeling subsystem dependencies
within complex systems. In: WMSCI 2007, ISAS 2007, Proceedings, vol. 3, pp. 41–46 (2007)
45. Sun, F., Wu, C., Sheng, D.: Bayesian networks for intrusion dependency analysis in water
controlling systems. J. Inf. Sci. Eng. 33(4), 1069–1083 (2017)
[MathSciNet]
46. Tang, K., Zhou, M.-T., Wang, W.-Y.: Insider cyber threat situational awareness framework using
dynamic Bayesian networks. In: Proceedings of the 4th International Conference on Computer
Science Education (ICCSE), July 2009, pp. 1146–1150
47. Vaniš, M., Urbaniec, K.: Employing Bayesian networks and conditional probability functions
for determining dependences in road traffic accidents data. In: 2017 Smart Cities Symposium
Prague, SCSP 2017—IEEE Proceedings (2017)
48. Xie, P., Li, J.H., Ou, X., Liu, P., Levy, R.: Using Bayesian networks for cyber security analysis. In:
2010 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June
2010, pp. 211–220
49. Yousef, R., Liginlal, D., Fass, S., Aoun, C.: Combining morphological analysis and Bayesian belief
networks: a DSS for safer construction of a smart city. In: 2015 Americas Conference on
Information Systems, AMCIS 2015 (2015)
50. Zhao, J., Ma, S., Han, W., Yang, Y., Wang, X.: Research and implementation of license plate
recognition technology. In: 2012 24th Chinese Control and Decision Conference (CCDC), pp.
3768–3773 (2012)
51. Zonouz, S.A., Khurana, H., Sanders, W.H., Yardley, T.M.: RRE: a game-theoretic intrusion
response and recovery engine. IEEE Trans. Parallel Distrib. Syst. 25(2), 395–406 (2014)
[Crossref]
Footnotes
1 We suppose the reader is acquainted with this computing paradigm: for further details see [13].
C. Kanagal-Balakrishna
Email: [email protected]
Z. Callejas
Email: [email protected]
Abstract
In recent years, sentiment analysis has attracted a lot of research
attention due to the explosive growth of online social media usage and
the abundant user data they generate. Twitter is one of the most
popular online social networks and a microblogging platform where
users share their thoughts and opinions on various topics. Twitter
enforces a character limit on tweets, which makes users find creative
ways to express themselves using acronyms, abbreviations, emoticons,
etc. Additionally, communication on Twitter does not always follow
standard grammar or spelling rules. These peculiarities can be used as
features for performing sentiment classification of tweets. In this
chapter, we propose a Maximum Entropy classifier that uses an
ensemble of feature sets that encompass opinion lexicons, n-grams and
word clusters to boost the performance of the sentiment classifier. We
also demonstrate that using several opinion lexicons as feature sets
provides better performance than using just one, while adding word
cluster information further enriches the feature space.
10.1 Introduction
Due to the explosive growth of online social media in the last few years,
people are increasingly turning to social media platforms such as
Facebook, Twitter, Instagram, Tumblr, LinkedIn, etc., to share their
thoughts, views and opinions on products, services, politics, celebrities,
events, and companies. This has resulted in a massive amount of user-
generated data [24].
As the usage of online social media has grown, so has the interest in
the field of sentiment analysis [17, 25, 27]. For the scientific community,
sentiment analysis is a challenging and complex field of study with
applications in multiple disciplines and has become one of the most
active research areas in Natural Language Processing, data mining, web
mining and management sciences. For industry, the massive amount of
user-generated data is fertile ground for extracting consumer opinion
and sentiment towards brands. In recent years, we have seen how
social media has helped reshape businesses and sway public opinion
and sentiment, sometimes with a single viral post or tweet. Therefore,
monitoring public sentiment towards products and services enables
companies to cater to their customers better.
In the last few years, Twitter has become a hugely popular
microblogging platform with over 500 million tweets a day. However,
Twitter only allows short messages of up to 140 characters, which leads
users to rely on abbreviations, acronyms, emoticons, etc., to better
express themselves. The field of sentiment analysis in Twitter
therefore includes the various complexities brought by this form of
communication using short informal text. The main motivation for
studying sentiment analysis in Twitter is the immense academic and
commercial value it provides [1, 3, 26].
Beyond these commercial applications, the number of application-
oriented research papers published on sentiment analysis has been
steadily increasing. For example, several researchers have used
sentiment information to predict movie success and box-office revenue.
Mishne and Glance showed that positive sentiment is a better predictor
of movie success than simple buzz count [15]. Researchers have also
analyzed sentiments of public opinions in the context of electoral
politics. For example, in [20], a sentiment score was computed based
simply on counting positive and negative sentiment words, which was
shown to correlate well with presidential approval, political election
polls, and consumer confidence surveys. Market prediction is also
another popular research area for sentiment analysis [13].
The main research question that we address in this chapter is:
Can we combine different feature extraction methods to boost the
performance of sentiment classification of tweets?
Raw data cannot be fed directly to the algorithms, since most
algorithms expect numerical feature vectors of a fixed size rather than
raw text documents of variable length. Feature
extraction is the process of transforming text documents into numerical
feature vectors. There are many standard feature extraction methods
for sentiment analysis of text data such as Bag of Words representation,
tokenization, etc. Since feature extraction usually results in high
dimensionality of features, it is important to use features that provide
useful information to the machine learning algorithm.
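To make this concrete, here is a minimal sketch of bag-of-words feature
extraction, using scikit-learn as a stand-in for the Weka-based tooling used
in this chapter; the example tweets are illustrative only:

    from sklearn.feature_extraction.text import CountVectorizer

    # Three illustrative tweets (not from the chapter's dataset).
    tweets = [
        "I love this phone :)",
        "worst service ever, never again",
        "meh, it's okay I guess",
    ]

    # Map variable-length texts to fixed-size count vectors.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(tweets)  # sparse matrix, 3 x |vocabulary|
    print(vectorizer.get_feature_names_out())
    print(X.toarray())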
We see from Fig. 10.1 that the class distribution is not balanced. For
model training and classification, a balanced class distribution is
important to ensure that the learned prior probabilities are not biased
by the imbalance.
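One common remedy, sketched below under the assumption of a
scikit-learn-style pipeline (the chapter does not specify this step), is to
reweight classes inversely to their frequency so that the majority class
does not dominate the learned priors; the label counts are invented for
illustration:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # Invented counts standing in for a skewed distribution like Fig. 10.1.
    labels = np.array(["positive"] * 700 + ["negative"] * 200 + ["neutral"] * 100)
    classes = np.unique(labels)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
    print(dict(zip(classes, weights)))
    # approximately {'negative': 1.67, 'neutral': 3.33, 'positive': 0.48}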
The classification accuracy of the baseline classifier, FS2A, which
uses the Maximum Entropy algorithm, is 73.63%, whereas the LibLinear
SVM algorithm provides an accuracy of 73.13%. While Maximum
Entropy performs slightly better, the difference is not significant. When
we compare the baseline classifiers with the models from feature set 1,
we see that none of the latter perform as well as the baseline
classifiers for either algorithm. Feature set 1, which uses various
combinations of opinion lexicons, provides the highest classification
accuracy when we combine the opinion lexicons AFINN, Bing Liu
Lexicon, NRC-10 Word Emotion Association Lexicon, NRC-10 Expanded
Lexicon, NRC Hashtag Emotion Lexicon and NRC Hashtag Sentiment
Lexicon, with accuracies of 67.58% and 70.15% for Maximum Entropy
and LibLinear SVM respectively. LibLinear SVM consistently
outperforms Maximum Entropy in feature set 1.
Feature set 2 includes models built using various word n-gram
combinations. The FS2C Maximum Entropy classifier achieves the
highest overall accuracy with 79.64%. We observe that the classification
accuracy rises when we add bigrams, and then bigrams and trigrams, to
the baseline classifier which uses only unigrams. While this is true of
both Maximum Entropy and LibLinear SVM, the performance
improvement is more apparent with Maximum Entropy, which shows a
significant improvement over the baseline when the n-gram
combination of unigrams, bigrams and trigrams is used. While LibLinear
SVM shows an improvement over the unigram model, the difference
between the unigram-bigram and unigram-bigram-trigram models is
not significant.
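The n-gram feature sets compared above can be reproduced in spirit
with a vectorizer's n-gram range; this hedged sketch (scikit-learn,
illustrative data) contrasts a unigram-only baseline with a combined
unigram-bigram-trigram configuration:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["not good at all", "really very good"]  # illustrative only

    unigrams = CountVectorizer(ngram_range=(1, 1)).fit(docs)
    uni_bi_tri = CountVectorizer(ngram_range=(1, 3)).fit(docs)

    print(len(unigrams.vocabulary_))    # unigram features only
    print(len(uni_bi_tri.vocabulary_))  # unigrams + bigrams + trigrams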
Feature set 3 includes models built using various cluster n-gram
combinations. The FS3C Maximum Entropy classifier achieves the
highest overall accuracy with 76.87%. We observe that the classification
accuracy rises when we add bigrams, and then bigrams and trigrams, to
the baseline classifier which uses only unigrams. This is the case for
both Maximum Entropy and LibLinear SVM, although the performance
improvement is more apparent with Maximum Entropy, which shows a
significant improvement over the baseline when the n-gram
combination of unigrams, bigrams and trigrams is used. While both
algorithms show an improvement over the unigram model, the
difference between the unigram-bigram and unigram-bigram-trigram
models is not large.
The proposed feature set combines the best performing models from
the 3 feature sets. Therefore, we combine the models FS1D, FS2C and
FS3C to generate the proposed classifier model. The LibLinear SVM
model achieves an accuracy of 78.32%, which is better than the
performance of all the other LibLinear SVM classifiers built using the 3
feature sets. Maximum Entropy, however, shows a significant
improvement in performance: it achieves the highest classification
accuracy of 84.3%, which is also the highest overall classification
accuracy of all the models used. The Kappa statistic of the models built
using feature sets 1, 2, and 3, as well as that of the proposed feature set,
is illustrated in Fig. 10.4.
Fig. 10.4 Kappa statistic values obtained for the set of models
For Maximum Entropy, the precision, recall and F-score of the
baseline model, FS2A, are 0.75, 0.738 and 0.739 respectively, giving a
slightly better precision than recall. For feature set 1, the precision
ranges from 0.67 to 0.681, recall ranges from 0.632 to 0.676 and F-score
ranges from 0.614 to 0.675. Thus, none of the models perform as well as
the baseline model in terms of these metrics. FS1D achieves the highest
precision, recall and accuracy among the feature set 1 models. For
feature set 2, FS2C performs the best in terms of accuracy, precision and
recall, achieving values of 0.803, 0.796 and 0.798 respectively. For
feature set 3, FS3C performs better than the baseline, achieving 0.772,
0.769 and 0.769. PFS, the model from the proposed feature set, which
includes the cluster unigram-bigram-trigram and word unigram-
bigram-trigram combinations, achieves the highest overall performance
metrics compared to the baseline model, with precision, recall and
F-score values of 0.844, 0.843 and 0.843.
Figure 10.6 indicates the performance metrics of precision, recall
and F-score for feature sets 1, 2, 3 and the proposed feature set for
LibLinear SVM classifier.
For LibLinear SVM, the precision, recall and F-score of the baseline
model, FS2A, are 0.748, 0.732 and 0.733 respectively, again giving a
slightly better precision than recall. For feature set 1, the precision
ranges from 0.68 to 0.701, recall ranges from 0.677 to 0.701 and F-score
ranges from 0.676 to 0.704. Thus, none of the models perform as well as
the baseline model in terms of these metrics. FS1D achieves the highest
precision, recall and accuracy among the feature set 1 models. For
feature set 2, FS2C performs the best in terms of accuracy, precision and
recall, achieving values of 0.777, 0.764 and 0.765 respectively. For
feature set 3, FS3C performs better than the baseline, achieving 0.762,
0.757 and 0.758. However, we do not see a significant improvement in
the metrics for the proposed feature set model which uses LibLinear
SVM compared to the other high-performing LibLinear models such as
FS2C.
From our discussion, it appears that using opinion lexicons alone as
features to train machine learning algorithms such as Maximum
Entropy and Support Vector Machines does not raise classification
accuracy significantly. However, using multiple opinion lexicons to
generate features provides better performance than using them
individually. Although a standard word n-gram representation such as
unigrams already provides better performance than opinion lexicons,
adding higher order word n-grams as features significantly improves
performance. However, we observed during our experimentation that
this effect only carries up to trigrams.
Generating features with word n-grams of higher order than
trigrams does not improve performance and is computationally
expensive, since it generates a large number of features and increases
sparsity. When cluster n-grams are used as features by themselves, they
too provide better performance with higher order n-grams. As with the
word n-grams, higher order cluster n-grams provided better
performance than cluster unigrams alone, and, as before, this effect was
only observed up to trigrams. Using cluster n-grams of higher order not
only increased the time taken for feature extraction, feature selection
and model training, it also did not continue the pattern of increased
performance seen with the addition of cluster bigrams and trigrams.
When the opinion lexicons, word n-grams and cluster n-grams from the
high performing models of the three feature sets were combined, the
Maximum Entropy classifier showed a marked improvement in
performance while LibLinear SVM did not show any significant
improvement.
From the different experiments, it can be concluded that a
combination of word unigrams-bigrams-trigrams and cluster
unigrams-bigrams-trigrams, together with a combination of six opinion
lexicons, used as features and ranked using the Information Gain
algorithm and the Ranker Search method, provided the best
performance in terms of accuracy, precision, recall, F-score and Kappa
statistic when used with the Maximum Entropy classifier with the
conjugate gradient descent method.
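The ranking step above uses Weka's Information Gain evaluator with
the Ranker search; as a rough, hedged analogue in Python, one can score
features by mutual information with the class label and keep the
top-ranked ones (the synthetic matrix below is only for illustration):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 50))   # stand-in document-term counts
    y = (X[:, 0] + X[:, 1] > 2).astype(int)  # label driven by features 0 and 1

    selector = SelectKBest(score_func=mutual_info_classif, k=10)
    X_top = selector.fit_transform(X, y)
    ranking = np.argsort(selector.scores_)[::-1]  # best features first
    print(ranking[:5])  # features 0 and 1 should appear near the top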
Acknowledgements
This research has received funding from the European Union’s Horizon
2020 research and innovation programme under grant agreement No
823907 (MENHIR: Mental health monitoring through interactive
conversations https://menhir-project.eu).
References
1. Abid, F., Alam, M., Yasir, M., Li, C.: Sentiment analysis through recurrent variants latterly on
convolutional neural network of twitter. Future Gener. Comput. Syst. 95, 292–308 (2019)
[Crossref]
2. Aggarwal, C., Zhai, C.: Mining Text Data. Springer Science and Business Media (2012)
3. Ankit, Saleena, N.: An ensemble classification system for twitter sentiment analysis. Procedia
Comput. Sci. 132, 937–946 (2018)
[Crossref]
4. Balazs, J.A., Velásquez, J.D.: Opinion mining and information fusion: a survey. Inf. Fusion 27,
95–110 (2016)
[Crossref]
5. Bradley, M., Lang, P.: Affective norms for English words (ANEW): instruction manual and
affective ratings. Technical Report, Center for Research in Psychophysiology, University of
Florida (1999)
6. Bravo-Marquez, F., Mendoza, M., Poblete, B.: Combining strengths, emotions and polarities for
boosting twitter sentiment analysis. In: Proceedings of Second International Workshop on
Issues of Sentiment Discovery and Opinion Mining, pp. 1–9. Chicago, USA (2013)
7. Garg, A., Roth, D.: Understanding probabilistic classifiers. In: Proceedings of
12th European Conference on Machine Learning (ECML’01), pp. 179–191. Freiburg, Germany
(2001)
8. Hatzivassiloglou, V., McKeown, K.: Predicting the semantic orientation of adjectives. In:
Proceedings of ACL’98, pp. 174–181 (1998)
9. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of 10th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pp.
168–177. Seattle, WA, USA (2004)
10. Kumar, A., Sebastian, T.: Sentiment analysis: a perspective on its past, present and future. Int. J.
Intell. Syst. Appl. 4(10), 1–14 (2012)
11. Landis, J., Koch, G.: The measurement of observer agreement for categorical data. Biometrics
33(1), 159–174 (1977)
[Crossref]
12. Liu, B.: Sentiment Analysis and Opinion Mining. Synthesis Digital Library of Engineering and
Computer Science. Morgan & Claypool (2012)
13. Liu, B.: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University
Press (2016)
14. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applications: a survey.
Ain Shams Eng. J. 5(4), 1093–1113 (2014)
[Crossref]
15. Mishne, G., Glance, N.: Predicting movie sales from blogger sentiments. In: Proceedings of
Computational Approaches to Analyzing Weblogs, Papers from the 2006 AAAI Spring
Symposium, pp. 1–4. Stanford, California, USA (2006)
16. Mohammad, S., Turney, P.: Crowdsourcing a word–emotion association lexicon. Comput.
Intell. 29(3), 436–465 (2013)
[MathSciNet][Crossref]
17. Montoyo, A., Martínez-Barco, P., Balahur, A.: Subjectivity and sentiment analysis: an overview
of the current state of the area and envisaged developments. Decis. Support Syst. 53(4), 675–
679 (2012)
[Crossref]
18. Nakov, P., Kozareva, Z., Ritter, A., Rosenthal, S., Stoyanov, V., Wilson, T.: SemEval-2013 Task 2:
sentiment analysis in Twitter. In: Proceedings of 7th International Workshop on Semantic
Evaluation (SemEval’13), pp. 312–320. Atlanta, Georgia, USA (2013)
19. Nielsen, F.: A new ANEW: evaluation of a word list for sentiment analysis in microblogs. In:
Proceedings of the ESWC2011 Workshop on ’Making Sense of Microposts’: Big Things Come in
Small Packages, pp. 93–98. Crete, Greece (2011)
20. O’Connor, B., Balasubramanyan, R., Routledge, B., Smith, N.: From tweets to polls: Linking text
sentiment to public opinion time series. In: Proceedings of AAAI Conference on Weblogs and
Social Media, pp. 122–129. Stanford, California, USA (2010)
21. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Now Publishers (2008)
22. Pozzi, F., Fersini, E., Messina, E., Liu, B.: Sentiment Analysis in Social Networks. Morgan
Kaufmann (2017)
23. Schuller, B., Batliner, A.: Computational Paralinguistics: Emotion, Affect and Personality in
Speech and Language Processing. Wiley (2013)
24. Thai, M.T., Wu, W., Xiong, H.: Big Data in Complex and Social Networks. Chapman and Hall/CRC
(2016)
25. Wang, D., Zhu, S., Li, T.: SumView: a Web-based engine for summarizing product reviews and
customer opinions. Expert Syst. Appl. 40(1), 27–33 (2013)
[Crossref]
26. Xiong, S., Lv, H., Zhao, W., Ji, D.: Towards twitter sentiment classi ication by multi-level
sentiment-enriched word embeddings. Neurocomputing 275, 2459–2466 (2018)
[Crossref]
27. Yu, L.C., Wu, J.L., Chang, P.C., Chu, H.S.: Using a contextual entropy model to expand emotion
words and their intensity for the sentiment classi ication of stock market news. Knowl.-Based
Syst. 41, 89–97 (2013)
[Crossref]
Footnotes
1 Amazon Mechanical Turk, https://www.mturk.com/.
Matthias L. Hemmje
Email: [email protected]
Abstract
Handling Big Data requires new techniques with regard to data access,
integration, analysis, information visualization, perception, interaction,
and insight within innovative and successful information strategies
supporting informed decision making. After deriving and qualitatively
evaluating the conceptual IVIS4BigData Reference Model as well as
defining a Service-Oriented Architecture, this book chapter outlines two
prototypical reference applications for demonstrations and hands-on
exercises for previously identified e-Science user stereotypes, with
special attention to the overall user experience in order to meet users'
expectations and ways of working. On this basis, and drawing on the
requirements, data know-how and other expert know-how of a leading
international automotive original equipment manufacturer and a
leading international player in industrial automation, two specific
industrial Big Data analysis application scenarios (anomaly detection
on car-to-cloud data and predictive maintenance analysis on robotic
sensor data) are utilized to demonstrate the practical applicability of
the IVIS4BigData Reference Model and to prove this applicability
through a comprehensive evaluation.
By instantiation of an IVIS4BigData infrastructure and its exemplary
prototypical proof-of-concept reference implementation, both
application scenarios aim at performing anomaly detection on real-
world data that empowers different end user stereotypes in the
automotive and robotics application domain to gain insight from car-to-
cloud as well as from robotic sensor data.
Fig. 11.1 Langren’s graph of determinations of the distance from Toledo to Rome [34, 70]
11.2.3 IVIS4BigData
In 2016, Bornschlegl et al. systematically performed the Road Mapping
of Infrastructures for Advanced Visual Interfaces Supporting Big Data
workshop [14], where academic and industrial researchers and
practitioners working in the area of Big Data, Visual Analytics, and
Information Visualization were invited to discuss and validate future
visions of Advanced Visual Interface infrastructures supporting Big
Data applications. Within that context, the conceptual IVIS4BigData
reference model (cf. Fig. 11.5) was derived, presented [11], and
qualitatively evaluated [7] within the workshop’s road mapping and
validation activities [13]. Afterwards, a set of conceptual end-user-
empowering use cases was modeled and published [9]; these use cases
serve as a base for a functional, i.e. conceptual as well as technical,
IVIS4BigData system specification that supports end users, domain
experts, and software architects in utilizing IVIS4BigData.
Fig. 11.5 IVIS4BigData reference model [11]
Fig. 11.9 Configuration support in IVIS4BigData use case perception and effectuation [10]
Fig. 11.11 Conceptual anomaly detection on car-to-cloud and robotic sensor data model [15]
Fig. 11.19 Robotic sensor data analysis result—parameter PCMD, PFBK, IPHC, and IPHA [56]
2. Albert, W., Tullis, T.: Measuring the User Experience: Collecting, Analyzing, and Presenting
Usability Metrics. Newnes (2013)
3. Apache Software Foundation.: Apache Hadoop (Version: 2.6.3) (2014). Last accessed 10 Jan
2016
4. Apache Software Foundation.: Apache Spark (Version: 1.6.1) (2016). Last accessed 18 April
2016
5. Ardito, C., Buono, P., Costabile, M.F., Lanzilotti, R., Piccinno, A.: End users as co-designers of
their own tools and products. J. Vis. Lang. Comput. 23(2), 78–90 (2012). Special issue
dedicated to Prof. Piero Mussio
6. Berwind, K.: A Cross Industry Standard Process to support Big Data Applications in Virtual
Research Environments (forthcoming). Ph.D. thesis, University of Hagen, Faculty of
Mathematics and Computer Science, Chair of Multimedia and Internet Applications, Hagen,
Germany (2019)
8. Bornschlegl, M.X.: Advanced Visual Interfaces Supporting Distributed Cloud-Based Big Data
Analysis (forthcoming). Ph.D. thesis, University of Hagen, Faculty of Mathematics and
Computer Science, Chair of Multimedia and Internet Applications, Hagen, Germany (2019)
9. Bornschlegl, M.X., Berwind, K., Hemmje, M.L.: Modeling end user empowerment in big data
applications. In: 26th International Conference on Software Engineering and Data Engineering
(SEDE), San Diego, CA, USA, 2–4 Oct 2017, pp. 47–54. International Society for Computers and
Their Applications, Winona, MN, USA (2017)
10. Bornschlegl, M.X., Berwind, K., Hemmje, M.L.: Modeling end user empowerment in big data
analysis and information visualization applications. In: International Journal of Computers
and Their Applications, pp. 30–42. International Society for Computers and Their
Applications, Winona, MN, USA (2018)
11. Bornschlegl, M.X., Berwind, K., Kaufmann, M., Engel, F.C., Walsh, P., Hemmje, M.L., Riestra, R.,
Werkmann, B.: Ivis4bigdata: a reference model for advanced visual interfaces supporting big
data analysis in virtual research environments. In: Advanced Visual Interfaces. Supporting Big
Data Applications. Lecture Notes in Computer Science, vol. 10084, pp. 1–18. Springer
International Publishing (2016)
12. Bornschlegl, M.X., Dammer, D., Lejon, E., Hemmje, M.L.: Ivis4bigdata infrastructures
supporting virtual research environments in industrial quality assurance. In: Proceedings of
the Joint Conference on Data Science, JCDS 2018, 22–23 May 2018. Edinburgh, UK (2018)
13. Bornschlegl, M.X., Engel, F.C., Bond, R., Hemmje, M.L.: Advanced Visual Interfaces. Supporting
Big Data Applications (2016)
14. Bornschlegl, M.X., Manieri, A., Walsh, P., Catarci, T., Hemmje, M.L.: Road mapping
infrastructures for advanced visual interfaces supporting big data applications in virtual
research environments. In: Proceedings of the International Working Conference on
Advanced Visual Interfaces, AVI 2016, Bari, Italy, 7–10 June 2016. pp. 363–367 (2016)
15. Bornschlegl, M.X., Reis, T., Hemmje, M.L.: A prototypical reference application of an
ivis4bigdata infrastructure supporting anomaly detection on car-to-cloud data. In: 27th
International Conference on Software Engineering and Data Engineering (SEDE), New Orleans,
LA, USA, 8–10 Oct 2017, pp. 108–115. International Society for Computers and Their
Applications, Winona, MN, USA (2018)
16. Brownlee, J.: Supervised and unsupervised machine learning algorithms (2016). Last accessed
23 Aug 2018
17. Card, S.K., Mackinlay, J.D., Shneiderman, B.: Information visualization. In: Card, S.K., Mackinlay,
J.D., Shneiderman, B. (eds.) Readings in Information Visualization, pp. 1–34. Morgan Kaufmann
Publishers Inc., San Francisco, CA, USA (1999)
18. Chang, R., Ziemkiewicz, C., Green, T., Ribarsky, W.: Defining insight for visual analytics. IEEE
Comput. Graph. Appl. 29(2), 14–17 (2009)
[Crossref]
19. Costabile, M.F., Mussio, P., Parasiliti Provenza, L., Piccinno, A.: Supporting end users to be co-
designers of their tools. In: End-User Development. Lecture Notes in Computer Science,
vol. 5435, pp. 70–85. Springer Berlin Heidelberg (2009)
20. Dammer, D.: Big data visualization framework for cloud-based big data analysis to support
business intelligence. Master’s thesis, University of Hagen, Faculty of Mathematics and
Computer Science, Chair of Multimedia and Internet Applications, Hagen, Germany (2018)
21. Dendelion Blu Ltd.: Big data visualization: review of the 20 best tools (2015). Last accessed 13
Sept 2016
22. Doctrine Team.: Doctrine (Version 2.5.4) (2016). Last accessed 07 Feb 2018
23. ECMA International. Standard ECMA-404, the JSON data interchange format
24. European Commission.: Scalable semantic product data stream management for collaboration
and decision making in engineering. FP7-ICT-2009-5, Proposal Number: 257899, Proposal
Acronym: SMART VORTEX (2009)
27. European Commission.: Virtual environment for research interdisciplinary exchange. EINFRA-
9-2015, Proposal Acronym: VERTEX (2015)
28. Fischer, G.: In defense of demassification: empowering individuals. Hum.-Comput. Interact.
9(1), 66–70 (1994)
29. Fischer, G.: Meta-design: empowering all stakeholders as co-designers. In: Handbook on Design
in Educational Computing. pp. 135–145. Routledge, London (2013)
30. Fischer, G., Nakakoji, K.: Beyond the macho approach of artificial intelligence: empower
human designers - do not replace them. Knowl.-Based Syst. 5(1), 15–30 (1992)
[Crossref]
31. Fraunhofer Institute for Computer Graphics Research IGD.: X3DOM (Version: 1.2) (2009). Last
accessed 11 Aug 2017
32. Fraunhofer Institute for Computer Graphics Research IGD.: Visual business analytics (2015).
Last accessed 02 Dec 2015
33. Freiknecht, J.: Big Data in der Praxis. Carl Hanser Verlag GmbH & Co. KG, München,
Deutschland (2014)
34. Friendly, M.: Milestones in the history of data visualization: a case study in statistical
historiography. In: Weihs, C., Gaul, W. (eds.) Classification: The Ubiquitous Challenge, pp. 34–
52. Springer, New York (2005)
35. Harris, H., Murphy, S., Vaisman, M.: Analyzing the Analyzers: An Introspective Survey of Data
Scientists and Their Work. O’Reilly Media, Inc. (2013)
37. Illich, I.: Tools for Conviviality. World Perspectives. Harper & Row (1973)
38. Internet Engineering Task Force.: Common Format and MIME Type for Comma-Separated
Values (CSV) Files (2005). Last accessed 07 Feb 2018
39. Johnson, A., Parmer, J., Parmer, C., Sundquist, M.: Plotly.js (Version: 1.31.2) (2012). Last
accessed 29 Oct 2017
40. Keim, D., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., Melançon, G.: Visual analytics:
definition, process, and challenges. In: Kerren, A., Stasko, J., Fekete, J.-D., North, C. (eds.)
Information Visualization. Lecture Notes in Computer Science, vol. 4950, pp. 154–175.
Springer Berlin Heidelberg (2008)
41. Keim, D., Mansmann, F., Schneidewind, J., Ziegler, H.: Challenges in visual data analysis. In:
Information Visualization, 2006. IV 2006. Tenth International Conference on Information
Visualisation (IV’06), pp. 9–16 (2006)
42. Keim, D.A., Mansmann, F., Thomas, J.: Visual analytics: how much visualization and how much
analytics? SIGKDD Explor. Newsl. 11(2), 5–8 (2010). May
[Crossref]
43. Khronos Group Inc.: WebGL (Version: 2.0) (2011). Last accessed 08 Feb 2018
44. Kuhlen, R.: Informationsethik: Umgang mit Wissen und Information in elektronischen
Räumen. UTB / UTB. UVK-Verlag-Ges. (2004)
45. Machine Learning Group at the University of Waikato.: Weka (Version 3.7) (1992). Last
accessed 01 Aug 2018
46. Manieri, A., Demchenko, Y., Wiktorski, T., Brewer, S., Hemmje, M., Ferrari, T., Riestra, R., Frey, J.:
Data science professional uncovered: how the EDISON project will contribute to a widely
accepted profile for data scientists
47. Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., Hung Byers, A.: Big data: the
next frontier for innovation, competition, and productivity. McKinsey Global Institute (2011)
48. Microsoft Corporation.: Microsoft Office Excel (Version 2016) (1985). Last accessed 07 Feb
2018
49. Bostock, M.: d3.js (Version: 4.2.3) (2011). Last accessed 16 Sept 2016
50. Ng, A.: User friendliness? user empowerment? how to make a choice? Technical report,
Graduate School of Library and Information Science, University of Illinois at Urbana-
Champaign (2004)
51. Nieke, S.: d3-3d (Version 0.0.7) (2017). Last accessed 27 Feb 2018
53. Potencier, F.: Symfony (Version: 4.0.1) (2005). Last accessed 09 Dec 2017
54. Potencier, F.: Twig (Version: 2.4.4) (2009). Last accessed 09 Dec 2017
55. Prajapati, V.: Big Data Analytics with R and Hadoop. Packt Publishing (2013)
57. Reis, T.: Anomaly detection in car-to-cloud data based on a prototype reference application of
the ivis4bigdata infrastructure. Master’s thesis, University of Hagen, Faculty of Mathematics
and Computer Science, Chair of Multimedia and Internet Applications, Hagen, Germany (2018)
58. Robert Bosch GmbH.: Stress test for robots (2014). Last accessed 03 Dec 2018
59. Robertson, S.E.: Information Retrieval Experiment. In: The Methodology of Information
Retrieval Experiment, pp. 9–31. Butterworth-Heinemann, Newton, MA, USA (1981)
60. Ryza, S., Laserson, U., Owen, S., Wills, J.: Advanced Analytics with Spark, vol. 1. O’Reilly Media,
Inc., Sebastopol, CA, USA, 3 (2015)
61. Salman, M., Star, K., Nussbaumer, A., Fuchs, M., Brocks, H., Vu, B., Heutelbeck, D., Hemmje, M.:
Towards social media platform integration with an applied gaming ecosystem. In: SOTICS
2015 : The Fifth International Conference on Social Media Technologies, Communication, and
Informatics, pp. 14–21. IARIA (2015)
62. SAS Institute Inc.: Data visualization: what it is and why it is important (2012). Last accessed
21 Dec 2015
63. Shneiderman, B.: The eyes have it: a task by data type taxonomy for information
visualizations. In: Proceedings, IEEE Symposium on Visual Languages, pp. 336–343 (1996)
64. Staubli International AG.: Staubli tx2-40 6-axis industrial robot (2018)
65. The jQuery Foundation.: jQuery (Version 3.2.1) (2006). Last accessed 27 Feb 2018
66. The R Foundation.: The R Project for Statistical Computing (Version 3.2.5) (1993). Last
accessed 28 April 2016
67. Thomas, J.J., Cook, K., et al.: A visual analytics agenda. IEEE Comput. Graph. Appl. 26(1), 10–
13 (2006). Jan
[Crossref]
68. Thomas, J.J., Cook, K.A.: Illuminating the Path: The Research and Development Agenda for
Visual Analytics. National Visualization and Analytics Ctr (2005)
70. Tufte, E.: Visual Explanations: Images and Quantities, Evidence, and Narrative. Graphics Press
(1997)
71. Twitter, Inc.: Bootstrap (Version 4.0.0) (2011). Last accessed 27 Feb 2018
72. Vu, D.B.: Realizing an applied gaming ecosystem: extending an education portal suite towards
an ecosystem portal. Master’s thesis, Technische Universität Darmstadt (2016)
74. Wang, W.: Big data, big challenges. In: Semantic Computing (ICSC), 2014 IEEE International
Conference on Semantic Computing, p. 6 (2014)
75. Wiederhold, G.: Mediators in the architecture of future information systems. Computer 25(3),
38–49 (1992). March
[Crossref]
76. Wong, P.C., Thomas, J.: Visual analytics. IEEE Comput. Graph. Appl. 5, 20–21 (2004)
[Crossref]
77. Wood, J., Andersson, T., Bachem, A., Best, C., Genova, F., Lopez, D.R., Los, W., Marinucci, M.,
Romary, L., Van de Sompel, H., Vigen, J., Wittenburg, P., Giaretta, D., Hudson, R.L.: Riding the
wave: how Europe can gain from the rising tide of scientific data. Final report of the high level
expert group on scientific data; a submission to the European commission
78. World Wide Web Consortium (W3C).: HTML (Version 5) (2014). Last accessed 12 Sept 2016
79. World Wide Web Consortium (W3C).: SVG (Version 2) (2015). Last accessed 16 Sept 2016
Footnotes
1 Document Object Model.
6 Position Command.
7 Position Feedback.
9 Sample 3475 − sample 3200 = 275 samples = 2750 min = 45.83 h = 1.91 days.
10 Sample 3475 − sample 3300 = 175 samples = 1750 min = 29.17 h = 1.22 days.
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_12
Lakhmi C. Jain
Email: [email protected]
Email: [email protected]
Abstract
Revolutionary growth in technology has changed the way humans
interact with machines. This can be seen in every area, including air
transport. For example, countries such as United States are planning to
deploy NextGen technology in all fields of air transport. The main goals
of NextGen are to enhance safety, performance and to reduce impacts
on environment by combining new and existing technologies. Loss of
Situation Awareness (SA) in pilots is one of the human factors that
affects aviation safety. There has been significant research on SA
indicating that pilot perception errors leading to loss of SA are one of
the major causes of accidents in aviation. However, there is no system
in place to detect these errors. Monitoring visual attention is one of the
best mechanisms to determine a pilot’s attention and hence perception
of a situation. Therefore, this research implements computational
models to detect a pilot's attentional behavior using ocular data during
instrument flight scenarios and to classify overall attention behavior
across those scenarios.
12.1 Introduction
Air travel is a common mode of transport in the modern era and
considered one of the safest. Even though aviation accidents are not as
common as road accidents, associated losses have a greater impact. One
civil aircraft accident can claim the lives of hundreds of people and
cause millions of dollars of economic loss. Therefore, airlines are bound
to abide by strict safety policies and guidelines. Safety breaches by
airlines are just one of the causes of aviation accidents.
Other causes include technical faults, human error and
environmental conditions [1]. Past investigations have shown that
more than 70% of accidents are caused by human error [2]. Given their
devastating effects, research into improving safety is a priority in
aviation. In order to enhance safety, performance, and to reduce
impacts on the environment, countries like the United States are
planning to deploy Next Generation (NextGen) technologies in all fields of air
transport. This research investigates the feasibility of improving
aviation safety by designing a novel system to monitor pilot visual
behaviour and detect possible errors in instrument scan pattern that
could potentially lead to loss of pilot Situation Awareness (SA).
Previous research has shown that ocular measures are effective in
determining attentional behavior [3]. Identified attentional behaviours
can further be used to detect potential pilot errors.
With the ongoing research in embedded eye trackers and
technology growth, it can be foreseen that aircraft will include such
advanced recording devices in the near future [4].
In this research study, a knowledge discovery in databases process
was used to collect ocular data and extract attention patterns. Flight
simulator experiments were conducted with trainee pilots and ocular
data were collected using eye trackers. In the absence of readily
available classi ications of existing data, we developed a feature
extraction and decision model based on the observed data and inputs
from the subject matter experts. Different attributes from the
instrument scan sequence are also used to aggregate and devise models
for scoring attention behaviors.
This is a significant step towards the detection of perceptual errors in
aviation human factors. Based on this model, further applications can
be developed to assess the performance of trainee pilots by flight
instructors during simulator training. The model can also be further
developed into an SA monitoring and alerting system for future aircraft,
thereby reducing the risk of accidents due to loss of SA.
3. Simulator Configuration: The Prepar3D simulator is configured to
launch the aircraft in Instrument Flight Rules (IFR) mode, with
different departure and destination airports. The participant was
asked to perform the instrument flying using just the instrument
panel. Weather conditions and failures were preconfigured for
different scenarios without the participant's knowledge.
4. Gaze Tracking: Gaze tracking was commenced from the EyeTribe
tracker console immediately after the scenario started. Gaze
records were saved into a file named after the time stamp. The end
result (crash or successful landing) and the simulator
configurations for each scenario were also recorded.
The eye tracker provides data on the gaze coordinates for each
frame, the time stamp and the pupil diameter in JavaScript Object
Notation (JSON) format, as shown in Fig. 12.7.
Fig. 12.7 Sample readings in JSON format from EyeTribe tracker
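A hedged sketch of parsing one such gaze record follows; the field
names ("timestamp", "avg", "psize") follow the public EyeTribe API
convention but are assumptions here, since Fig. 12.7 only shows sample
output:

    import json

    # One illustrative frame, shaped like the EyeTribe console output.
    record = json.loads(
        '{"timestamp": "2016-03-14 10:15:03.120",'
        ' "avg": {"x": 512.4, "y": 301.7}, "psize": 21.3}'
    )

    x, y = record["avg"]["x"], record["avg"]["y"]  # gaze coordinates
    pupil = record["psize"]                        # pupil diameter
    print(record["timestamp"], x, y, pupil)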
Behavior Evaluation
The final step in this experiment process is to evaluate the recognised
attention indicators as behaviours. To achieve this, the repeated
attention patterns are awarded scores, and the scores are aggregated to
relatively rank each pattern as poor, average or good. However, the
study refrains from classifying scan patterns as good or bad in a general
context because of the lack of decisive measures in aviation human
factors.
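The scoring model itself was implemented in Java; the fragment below
is only a minimal Python sketch of the idea: count how often an
indicator pattern occurs among the transitions of a scan sequence,
normalise by sequence length so sequences of different lengths are
comparable, and bin the result into a relative rating (the AOI labels and
thresholds are invented for illustration):

    def indicator_score(scan_sequence, pattern):
        """Fraction of consecutive AOI transitions matching an indicator pattern."""
        transitions = list(zip(scan_sequence, scan_sequence[1:]))
        if not transitions:
            return 0.0
        return sum(1 for t in transitions if t == pattern) / len(transitions)

    # Illustrative AOIs: AI = attitude indicator, ALT = altimeter, ASI = airspeed.
    sequence = ["AI", "ALT", "AI", "ASI", "AI", "AI", "AI"]
    score = indicator_score(sequence, ("AI", "AI"))  # e.g. an attention-focusing cue
    rating = "good" if score < 0.1 else "average" if score < 0.3 else "poor"
    print(score, rating)  # 0.333..., "poor"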
12.5.1 Results
This section covers attention scores and rating of instrument scan
sequences recorded from trainee pilots. The attention rating model
developed in Java traversed individual scan sequences and computed
the attention error indicator scores and the attention distribution
scores for each sequence. The score for each indicator was computed
over the sequence of transitions as specified earlier in the section.
Because the sequences are of varying lengths, scores were calculated
and standardised for each transition sequence. Table 12.2 shows the
computed attention error indicators and attention distribution.
Table 12.2 Attention indicator scores as certainty factor
12.6 Conclusions
The motivation for the experiments discussed in this chapter was to
arrive at a reliable measure and method that provide a better
mechanism to identify a pilot's attention distribution and attention
error indicators such as Attention Blurring (AB), Attention Focusing
(AF) and Misplaced Attention (MA). The research demonstrated that
ocular measures are effective in determining attentional behaviour.
The study also highlighted the importance of sequential
representation of gaze data and not only the aggregated fixation
distribution on AOIs. Attention indicator score models were designed
and applied to the sequences to identify various attentional behaviours.
It has been observed from the results that attention indicators can
overlap during instrument scan. However, using the scoring model
helps to determine the frequently exhibited attention indicators. The
computation of attention provides a comparative rating of attention
within the data set. The attention scores from the data set were
categorised as good, average or poor relative to other participants in
the group. However, the study refrains from labelling the behaviour as
good or poor in general scenarios because, so far in aviation, there has
been no clear distinction between expected good attention behaviour
and poor attentional behaviour.
There were a few challenges that arose during this study. Currently,
there is no standard definition of expected patterns during instrument
scan. In addition, there are no real-time data or known classifications
available in the aviation literature. Therefore, the study was based on
the recommended instrument scans in instrument flying manuals and
input from aviation Subject Matter Experts (SMEs). The six primary
instrument scans during instrument flying were used as the case for this
study. However, the system could be easily extended to include other
instruments and additional AOIs. One future extension could involve
the development of an expert system that includes other scenarios
during instrument scan and integrates the attention scoring and rating
algorithms for the purpose of analysis of pilot behaviour. The scope of
this study included only ocular measures, as eye tracking is a proven
method of detecting visual attention. Along with ocular measures,
integration of speech processing or other physiological measures such
as facial expressions recognition systems may help in developing a
robust futuristic SA monitoring system.
This research investigated the possibility of identifying attention
errors but did not attempt to provide feedback to the pilot. However, in
the future, a system based on this research could be developed that
could monitor pilots’ behaviour in real time, and provide timely
feedback and alerts to the pilots, which could prove to be lifesaving.
References
1. Ancel, E., Shih, A.T., Jones, S.M., Reveley, M.S., Luxhøj, J.T., Evans, J.K.: Predictive safety
analytics: inferring aviation accident shaping factors and causation. J. Risk Res. 18(4), 428–
451 (2015)
2. Shappell, S.A., Wiegmann, D.A.: Human factors analysis of aviation accident data: developing a
needs-based, data-driven, safety program. In: 3rd Workshop on Human Error, Safety, and
System Development (HESSD’99) (1999)
3. Thatcher, S., Kilingaru, K.: Intelligent monitoring of flight crew situation awareness. Adv. Mater.
Res. 433(1), 6693–6701 (2012). Trans Tech Publications
4. Kilingaru, K., Tweedale, J.W., Thatcher, S., Jain, L.C.: Monitoring pilot “situation awareness”. J.
Intell. Fuzzy Syst. 24(3), 457–466 (2013)
5. Regal, D.M., Rogers, W.H., Boucek, G.P.: Situational awareness in the commercial flight deck:
definition, measurement, and enhancement. SAE Technical Paper (1988)
6. Sarter, N.B., Woods, D.D.: Situation awareness: a critical but ill-defined phenomenon. Int. J.
Aviat. Psychol. 1(1), 45–57 (1991)
7. Oakley, T.: Attention and cognition. J. Appl. Attention 17(1), 65–78 (2004)
[MathSciNet]
9. Lamme, V.A.: Why visual attention and awareness are different. Trends Cognitive Sci. 7(1), 12–
18 (2003)
10. Underwood, G., Chapman, P., Brocklehurst, N., Underwood, J., Crundall, D.: Visual attention
while driving: sequences of eye fixations made by experienced and novice drivers.
Ergonomics 46(6), 629–646 (2003)
11. Smith, P., Shah, M., da Vitoria Lobo, N.: Determining driver visual attention with one camera.
IEEE Trans. Intell. Transp. Syst. 4(4), 205–218 (2003)
12. Ji, Q., Yang, X.: Real-time eye, gaze, and face pose tracking for monitoring driver vigilance.
Real-time imaging. 8(5), 357–377 (2002)
[zbMATH]
13. Yu, C.S., Wang, E.M., Li, W.C., Braithwaite, G.: Pilots’ visual scan patterns and situation
awareness in flight operations. Aviat. Space Environ. Med. 85(7), 708–714 (2014)
14. Haslbeck, A., Bengler, K.: Pilots’ gaze strategies and manual control performance using
occlusion as a measurement technique during a simulated manual flight task. Cogn. Technol.
Work 18(3), 529–540 (2016)
15. Ho, H.F., Su, H.S., Li, W.C., Yu, C.S., Braithwaite, G.: Pilots’ latency of first fixation and dwell
among regions of interest on the flight deck. In: International Conference on Engineering
Psychology and Cognitive Ergonomics. Springer, Cham (2016)
16. Roscoe, A.H.: Heart rate as an in-flight measure of pilot workload. Royal Aircraft
Establishment Farnborough (United Kingdom) (1982)
17. Hankins, T.C., Wilson, G.F.: A comparison of heart rate, eye activity, EEG and subjective
measures of pilot mental workload during flight. Aviat. Space Environ. Med. 69(4), 360–367
(1998)
18. Craig, A., Tran, Y., Wijesuriya, N., Nguyen, H.: Regional brain wave activity changes associated
with fatigue. Psychophysiology 49(44), 574–582 (2012)
19. Diez, M., Boehm-Davis, D.A., Holt, R.W., Pinney, M.E., Hansberger, J.T., Schoppek, W.: Tracking
pilot interactions with flight management systems through eye movements. In: Proceedings of
the 11th International Symposium on Aviation Psychology, vol. 6, issue 1. The Ohio State
University, Columbus (2001)
20. Van De Merwe, K., Van Dijk, H., Zon, R.: Eye movements as an indicator of situation awareness
in a light simulator experiment. Int. J. Aviat. Psychol. 22(1), 78–95 (2012)
21. Fitts, P.M., Jones, R.E., Milton, J.L.: Eye movements of aircraft pilots during instrument-landing
approaches. Ergon. Psychol. Mech. Models Ergon. 3(1), 56 (2005)
22. de Greef, T., Lafeber, H., van Oostendorp, H., Lindenberg, J.: Eye movement as indicators of
mental workload to trigger adaptive automation. In: International Conference on Foundations
of Augmented Cognition, pp. 219–228. Springer, Berlin, Heidelberg (2009)
23. Gibb, R., Gray, R., Scharff, L.: Aviation Visual Perception: Research, Misperception and Mishaps.
Routledge (2016)
24. Rayner, K., Pollatsek, A.: Eye movements and scene perception. Can. J. Psychol. 46(3), 342
(1992)
25. Instrument flying handbook: FAA-H-8083-15A. United States Department of Transportation,
Federal Aviation Administration (2012)
26. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in
databases. AI Mag. 17(3), 37 (1996)
27. Ackoff, R.L.: From data to wisdom. J. Appl. Syst. Anal. 16(1), 3–9 (1989)
28. Bellinger, G., Castro, D., Mills, A.: Data, information, knowledge, and wisdom (2004)
30. Zeleny, M.: Management support systems: towards integrated knowledge management. Hum.
Syst. Manage. 7(1), 59–70 (1987)
33. Mill, E.: JSON to CSV tool. Online: https://konklone.io/json/. Last accessed 02 April 2018
34. Burch, M., Kull, A., Weiskopf, D.: AOI rivers for visualizing dynamic eye gaze frequencies.
Comput. Graph. Forum 32(3), 281–290 (2013)
35. Kurzhals, K., Weiskopf, D.: AOI transition trees. In: Proceedings of the 41st Graphics Interface
Conference, pp. 41–48. Canadian Information Processing Society (2015)
36. Abbott, A., Hrycak, A.: Measuring resemblance in sequence data: An optimal matching analysis
of musicians’ careers. Am. J. Sociol. 96(1), 144–185 (1990)
37. Kinnebrew, J.S., Biswas, G.: Comparative action sequence analysis with hidden Markov models
and sequence mining. In: Proceedings of the Knowledge Discovery in Educational Data
Workshop at the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD 2011). San Diego, CA (2011)
38. Power BI: [Available online], https://powerbi.microsoft.com/en-us/. Last accessed 26 August
2019
39. Kübler, T., Eivazi, S., Kasneci, E.: Automated visual scanpath analysis reveals the expertise level
of micro-neurosurgeons. In: MICCAI Workshop on Interventional Microscopy, pp. 1–8 (2015)
40. Dewhurst, R., Nyström, M., Jarodzka, H., Foulsham, T., Johansson, R., Holmqvist, K.: It depends
on how you look at it: Scanpath comparison in multiple dimensions with MultiMatch, a
vector-based approach. Behav. Res. Methods 44(4), 1079–1100 (2012)
41. Li, H.: A short introduction to learning to rank. IEICE Trans. Inform. Syst. 94(10), 1854–1862
(2011)
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_13
Antonino Staiano
Email: [email protected]
Abstract
Music is a language of emotions and music emotional recognition has
been addressed by different disciplines (e.g., psychology, cognitive
science and musicology). Nowadays, the music fruition mechanism is
evolving, focusing on the music content. In this work, a framework for
processing, classification and clustering of songs on the basis of their
emotional content is explained. On one hand, the main emotional
features are extracted after a pre-processing phase where both Sparse
Modeling and Independent Component Analysis based methodologies
are applied. The approach makes it possible to summarize the main
sub-tracks of an acoustic music song (e.g., information compression and
filtering) and to extract the main features from these parts (e.g., music
instrumental features). On the other hand, a system for music emotion
recognition based on Machine Learning and Soft Computing techniques
is introduced. A user can submit a target song, representing his
conceptual emotion, and obtain a playlist of audio songs with similar
emotional content. In the case of classification, the playlist is retrieved
from songs belonging to the same class. Otherwise, the playlist is
suggested by the system by exploiting the content of the audio songs,
and it could also contain songs of different classes. Experimental results
are presented to show the performance of the developed framework.
13.1 Introduction
One of the main channels for accessing reality and information about
people and their social interaction is multimedia content [1]. One
special medium is music, which is essential for independent child and
adult life [2] given its extraordinary ability to evoke powerful emotions
[3]. Recently, music emotion recognition has been studied in different
disciplines such as psychology, physiology, cognitive science and
musicology [4], where emotion usually has a short duration (seconds to
minutes) while mood has a longer duration (hours or days). Several
studies in neuroscience, exploiting current neuroimaging techniques,
have found interesting biological properties triggered in specific areas
of the brain when listening to emotional music. While the authors in [5]
demonstrated that the amygdala plays an important role in the
recognition of fear when scary music is played, the authors in [3] found
that music creating highly pleasurable emotions stimulates
dopaminergic pathways in the human brain, such as the mesolimbic
pathway, which is involved in reward and motivation.
Another study [6], for example, exploits the Electroencephalography
(EEG) data, for the emotional response of terminally ill cancer patients
to a music therapy intervention and a recent study con irms an anti-
epileptic effect of Mozart music on the EEG in children, suggesting the
“Mozart therapy” as a treatment for drug-resistant epilepsy warrants
[7]. In these last years, several websites have tried to combine social
interaction with music and entertainment. For example Stereomood [8]
is a free emotional internet radio. Moreover, in [9] the authors
introduced a framework for mood detection of acoustic music data,
based on a music psychological theory in western cultures. In [10] the
authors proposed and compared two fuzzy classi iers determining
emotion classes by using a Arousal and Valence (AV) scheme. While, in
[4] the authors focus on a music emotion recognition system based on
fuzzy inferences. Recently, a system for music emotion recognition
based on machine learning and computational intelligence techniques
has been introduced in [11]. In that system, one user formulates a
query providing a target audio song with similar emotions to the ones
he wishes to retrieve while the authors use supervised techniques on
labeled data or unsupervised techniques on unlabeled data. The
emotional classes are a subset of the model, proposed by Russell,
showed in Fig. 13.1. According to it, emotions are explained as
combinations of arousal and valence, where arousal measures the level
of activation and valence measures pleasure/displeasure.
Moreover, in [12] a robust approach to feature extraction from music recordings was introduced. The approach permits the extraction of representative sub-tracks through compression and filtering.
The aim of this chapter is to describe a robust framework for processing, classification and clustering of musical audio songs by their emotional content. The main emotional features are obtained after a pre-processing of the sub-tracks of an acoustic music song using both Sparse Modeling [13] and Independent Component Analysis [14]. This mechanism makes it possible to compress and filter the main music features corresponding to their content (i.e., music instruments). The framework takes as input a target song, representing a conceptual emotion, and returns a playlist of audio songs with similar emotional content. In the case of classification, the playlist is obtained from the songs belonging to the same class. In the other case, the playlist is suggested by the system by exploiting the content of the audio songs, and it may also contain songs of different classes.
The chapter is organized as follows. In Sect. 13.2 the music emotional features are described. In Sects. 13.3 and 13.4 the overall system and the techniques used are described. Finally, in Sects. 13.5 and 13.6 several experimental results and concluding considerations are presented, respectively.
13.2.2 Intensity
This feature is related to sound sensation and to the amplitude of the audio waves [17]. Low intensity is associated with sensations of sadness, melancholy, tenderness or peacefulness. Positive emotions such as joy, excitement or triumph are correlated with high intensity, while anger or fear are associated with very high intensity and many variations. The intensity of the sound is expressed by the regularity of the volume in the song. In particular, the mean energy of the wave is extracted as

$$AE = \frac{1}{N}\sum_{t=1}^{N} x(t)^{2} \tag{13.1}$$

where $x(t)$ is the value of the amplitude at time $t$ and $N$ is the length of the signal.
The standard deviation of $AE$ is then calculated:

$$\sigma_{AE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(x(t)^{2}-AE\right)^{2}} \tag{13.2}$$

This value expresses the regularity of the volume in the song: high volume, regular loudness, and loudness frequency.
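As an illustration, the two descriptors of Eqs. (13.1) and (13.2) can be computed directly from the raw waveform; the following minimal Python sketch assumes a mono signal stored in a NumPy array (the function name is chosen here for illustration):

```python
import numpy as np

def intensity_features(x):
    """Mean energy AE (Eq. 13.1) and its standard deviation (Eq. 13.2)."""
    energy = np.asarray(x, dtype=float) ** 2   # instantaneous energy x(t)^2
    ae = energy.mean()                         # AE = (1/N) * sum_t x(t)^2
    sigma_ae = np.sqrt(((energy - ae) ** 2).mean())
    return ae, sigma_ae
```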
13.2.3 Rhythm
The rhythm of a song is described by beat and tempo. The beat is the
regularly occurring pattern of rhythmic stresses in music and tempo is
the speed of the beat, expressed in Beats Per Minute (BPM).
Regular beats make listeners peaceful or even melancholic, while irregular beats can make some listeners feel aggressive or unsteady. The approach used in our framework permits tracking beats by estimating the beat locations [18, 19].
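For illustration, an off-the-shelf beat tracker provides comparable information; the sketch below uses librosa's onset-based tracker as a stand-in for the beat-tracking approach of [18, 19] ("song.wav" is a placeholder file name):

```python
import librosa

# Estimate tempo (BPM) and beat locations from an audio file.
y, sr = librosa.load("song.wav", mono=True)
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(f"Estimated tempo: {float(tempo):.1f} BPM over {len(beat_times)} beats")
```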
13.2.4 Key
In a song, a group of pitches in ascending order forms a scale spanning an octave. In our framework we adopt a key detection system that, for each key change, estimates the key associated with the maximum duration in the song [20].
The sparse modeling step is formalized in Eqs. (13.4)–(13.10), whose objective minimizes the sum of the $\ell_2$ norms of the rows of the coefficient matrix, so that each song is summarized by a few representative sub-tracks [13, 22].
We note that, for inverting the convolutive mixtures, a set of similar FIR filters should be used (Eqs. (13.11)–(13.14)).
In the rough set formalism, each class is delimited by an upper approximate limit, which characterizes the border of all objects possibly belonging to the i-th class. If some objects do not belong to the range defined by the upper approximate limit, then they belong to the negative domain of this class, namely, they do not belong to this class.
The objective function of the RFCM clustering algorithm is given in Eqs. (13.15)–(13.18), where the fuzzy membership of a training sample in class c is computed over its k nearest samples, with each weight inversely proportional to the distance between the sample and its neighbor.
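Since RFCM builds on the standard fuzzy c-means update rules, a minimal sketch of that core loop may help fix ideas; the rough lower/upper-approximation logic of RFCM (Eqs. (13.15)–(13.18)) would be layered on top of it and is not reproduced here:

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]   # cluster centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))             # inverse-distance weights
        U /= U.sum(axis=1, keepdims=True)          # renormalize each row
    return U, V
```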
Fig. 13.5 Hierarchical clustering on the dataset of 28 songs applying three criteria: a overall song
elaboration; b sparse modeling; c sparse modeling and CICA
Fig. 13.6 Waveform of song 4
13.6 Conclusions
In this chapter we introduced a framework for processing, classification and clustering of songs on the basis of their emotional content. The main emotional features are extracted after a pre-processing phase in which both Sparse Modeling and Independent Component Analysis based methodologies are used. The approach makes it possible to summarize the main sub-tracks of an acoustic music song and to extract the main features from these parts. The musical features taken into account were intensity, rhythm, scale, harmony and spectral centroid. The core of the query engine takes as input a target audio song provided by the user and returns a playlist of the most similar songs. A classifier is used to identify the class of the target song, and then the most similar songs belonging to the same class are retrieved. This is achieved by using a fuzzy similarity measure based on the Łukasiewicz product. In the case of classification, the playlist is obtained from the songs of the same class. In the other cases, the playlist is suggested by the system by exploiting the content of the audio songs, and it may also contain songs of different classes. The results obtained with clustering are not directly comparable with those obtained with the supervised techniques; we stress, however, that in the first case the playlist is obtained from songs contained in the same class, while in the second case the emotional information is suggested by the system. The approach can be considered a real alternative to human-based classification systems (e.g., Stereomood). In the near future the authors will focus their attention on a larger database of songs, further musical features and the use of semi-supervised approaches. Moreover, they will experiment with new approaches such as the Fuzzy Relational Neural Network [28], which allows the automatic extraction of memberships and IF-THEN reasoning rules.
Acknowledgements
This work was partially funded by the University of Naples Parthenope
(Sostegno alla ricerca individuale per il triennio 2017–2019 project).
References
1. Vinciarelli, A., Pantic, M., Heylen, D., Pelachaud, C., Poggi, I., D'Errico, F., Schroeder, M.: Bridging the gap between social animal and unsocial machine: a survey of social signal processing. IEEE Trans. Affect. Comput. (2011)
2. Barrow-Moore, J.L.: The Effects of Music Therapy on the Social Behavior of Children with
Autism. Master of Arts in Education College of Education California State University San
Marcos, November 2007
3. Blood, A.J., Zatorre, R.J.: Intensely pleasurable responses to music correlate with activity in
brain regions implicated in reward and emotion. Proc. Natl. Acad. Sci. 98(20), 11818–11823
(2001)
[Crossref]
4. Jun, S., Rho, S., Han, B.-J., Hwang, E.: A fuzzy inference-based music emotion recognition
system. In: 5th International Conference on In Visual Information Engineering—VIE (2008)
5. Koelsch, S., Fritz, T., v. Cramon, D.Y., Mü ller, K., Friederici, A.D.: Investigating emotion with
music: an fMRI study. Hum. Brain Mapp. 27(3), 239–250 (2006)
6. Ramirez, R., Planas, J., Escude, N., Mercade, J., Farriols, C.: EEG-based analysis of the emotional
effect of music therapy on palliative care cancer patients. Front. Psychol. 9, 254 (2018)
7. Grylls, E., Kinsky, M., Baggott, A., Wabnitz, C., McLellan, A.: Study of the Mozart effect in
children with epileptic electroencephalograms. Seizure—Eur. J. Epilepsy 59, 77–81 (2018)
[Crossref]
8. Stereomood Website
9. Lu, L., Liu, D., Zhang, H.-J.: Automatic mood detection and tracking of music audio signals. IEEE Trans. Audio Speech Lang. Process. 14(1) (2006)
10. Yang, Y.-H., Liu, C.-C., Chen, H.H.: Music emotion classification: a fuzzy approach. Proc. ACM
Multimed. 2006, 81–84 (2006)
11. Ciaramella, A., Vettigli, G.: Machine learning and soft computing methodologies for music
emotion recognition. Smart Innov. Syst. Technol. 19, 427–436 (2013)
[Crossref]
12. Iannicelli, M., Nardone, D., Ciaramella, A., Staiano, A.: Content-based music agglomeration by
sparse modeling and convolved independent component analysis. Smart Innov. Syst. Technol.
103, 87–96 (2019)
[Crossref]
13. Ciaramella, A., Gianfico, M., Giunta, G.: Compressive sampling and adaptive dictionary learning
for the packet loss recovery in audio multimedia streaming. Multimed. Tools Appl. 75(24),
17375–17392 (2016)
[Crossref]
14. Ciaramella, A., De Lauro, E., De Martino, S., Falanga, M., Tagliaferri, R.: ICA based identification
of dynamical systems generating synthetic and real world time series. Soft Comput. 10(7),
587–606 (2006)
[Crossref]
15. Thayer, R.E.: The Biopsychology of Mood and Arousal. Oxford University Press, New York
(1989)
16. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. (1980)
17. Revesz, G.: Introduction to the Psychology of Music. Courier Dover Publications (2001)
18. Bello, J.P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., Sandler, M.B.: Tutorial on onset
detection in music signals. IEEE Trans. Speech Audio Process. (2005)
19. Davies, M.E.P., Plumbley, M.D.: Context-dependent beat tracking of musical audio. IEEE Trans.
Audio, Speech Lang. Process. 15(3), 1009–1020 (2007)
[Crossref]
20. Noland, K., Sandler, M.: Signal processing parameters for tonality estimation. In: Proceedings
of Audio Engineering Society 122nd Convention, Vienna (2007)
21. Grey, J.M., Gordon, J.W.: Perceptual effects of spectral modi ications on musical timbres. J.
Acoust. Soc. Am. 63(5), 1493–1500 (1978)
[Crossref]
22. Elhamifar, E., Sapiro, G., Vidal, R.: See all by looking at a few: sparse modeling for finding
representative objects. In: Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, art. no. 6247852, pp. 1600–1607 (2012)
23. Hyvarinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, Hoboken, N. J.
(2001)
24. Ciaramella, A., De Lauro, E., Falanga, M., Petrosino, S.: Automatic detection of long-period
events at Campi Flegrei Caldera (Italy). Geophys. Res. Lett. 38(18) (2013)
25. Ciaramella, A., De Lauro, E., De Martino, S., Di Lieto, B., Falanga, M., Tagliaferri, R.:
Characterization of Strombolian events by using independent component analysis. Nonlinear
Process. Geophys. 11(4), 453–461 (2004)
[Crossref]
26. Ciaramella, A., Tagliaferri, R.: Amplitude and permutation indeterminacies in frequency
domain convolved ICA. Proc. Int. Joint Conf. Neural Netw. 1, 708–713 (2003)
27. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience (2000)
28. Ciaramella, A., Tagliaferri, R., Pedrycz, W., Di Nola, A.: Fuzzy relational neural network. Int. J.
Approx. Reason. 41, 146–163 (2006)
[MathSciNet][Crossref]
29. Sessa, S., Tagliaferri, R., Longo, G., Ciaramella, A., Staiano, A.: Fuzzy similarities in stars/galaxies
classification. In: Proceedings of IEEE International Conference on Systems, Man and
Cybernetics, pp. 494–4962 (2003)
30. Turunen, E.: Mathematics behind fuzzy logic. Adv. Soft Comput. Springer (1999)
31. Ciaramella, A., Cocozza, S., Iorio, F., Miele, G., Napolitano, F., Pinelli, M., Raiconi, G., Tagliaferri,
R.: Interactive data analysis and clustering of genomic data. Neural Netw. 21(2–3), 368–378
(2008)
[Crossref]
32. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
33. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press,
New York (1981)
34. Wang, D., Wu, M.D.: Rough fuzzy c-means clustering algorithm and its application to image. J.
Natl. Univ. Def. Technol. 29(2), 76–80 (2007)
Footnotes
1 In the experiment we used .
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_14
Miltiadis Alamaniotis
Email: [email protected]
Abstract
In the smart cities of the future, artificial intelligence (AI) will have a dominant role, given that AI will accommodate the utilization of intelligent analytics for the prediction of critical parameters pertaining to city operation. In this chapter, a new data analytics paradigm is presented and applied to energy demand forecasting in smart cities. In particular, the presented paradigm integrates a group of kernel machines by utilizing a deep architecture. The goal of the deep architecture is to exploit the strong capabilities of deep learning across various abstraction levels and subsequently identify patterns of interest in the data. In particular, a deep feedforward neural network is employed in which every network node implements a kernel machine. This deep architecture, named the neuro-kernel machine network, is then applied to predicting the energy consumption of groups of residents in smart cities. The obtained results exhibit the capability of the presented method to provide adequately accurate predictions regardless of the form of the energy consumption data.
14.1 Introduction
Advancements in information and communication technologies have served as the vehicle to move forward and implement the vision of smart and interconnected societies. In the last decade, this vision has been shaped and defined as the "smart city" [28]. A smart city is a fully connected community where the exchange of information aims at improving the operation of the city and the daily life of its citizens [18]. In particular, the exploitation of information may lead to greener, less polluted and more humane cities [4, 16]. The latter is of high concern and importance because the population of cities is expected to increase in the near future [21].
In general, the notion of a smart city may be considered as the assembly of a set of service groups [1]. The coupling of city services with information technologies has also accommodated the characterization of those groups with the term "smart." In particular, a smart city is comprised of the following service groups: smart energy, smart healthcare, smart traffic, smart farming, smart transportation, smart buildings, smart waste management, and smart mobility [25].
Among those groups, smart energy is of high interest [8, 10]. Energy is the cornerstone of modern civilization, upon which the modern way of life is built [12]. Thus, it is natural to assume that smart energy is of high priority compared to the rest of the smart city components; in a visual analogy, Fig. 14.1 depicts smart energy as the fundamental component of smart cities [6]. Therefore, the optimization of the distribution and the utilization of electrical energy within the premises of the city is essential to move toward self-sustainable cities.
Fig. 14.1 Visualization of a smart city as a pyramid, with smart energy as the fundamental component
Energy (load) prediction has been identified as the basis for implementing smart energy services [9]. Accurate prediction of the energy demand promotes the efficient utilization of energy generation and distribution by enabling optimal decisions. Those optimal decisions are made by taking into consideration the current state of the energy grid and the anticipated demand [13]. Thus, energy demand prediction accommodates fast and smart decisions with regard to the operation of the grid [5]. However, the integration of information technologies and the use of smart meters by each consumer have added further uncertainty and volatility to the demand pattern. Hence, intelligent tools are needed that can provide highly accurate forecasts [20].
In this chapter, the goal is to introduce a new demand prediction methodology that is applicable to smart cities. The extensive use of information technologies in smart cities, as well as the heterogeneous behavior of consumers even in close geographic vicinity, will further complicate the forecasting of the energy demand [27]. Furthermore, predicting the demand of a smart city partition (e.g. a neighborhood) that includes a specific number of consumers will impose high challenges in energy forecasting [5]. For that reason, the new forecasting methodology adopts a set of various kernel machines that are equipped with different kernel functions [7]. In addition, it assembles the kernel machines into a deep neural network architecture called the neuro-kernel-machine network (NKMN). The goal of the NKMN is to analyze the historical data, aiming at capturing the energy consumption behavior of the citizens by using a set of kernel machines, with each machine modeling a different set of data properties [2]. Then, the kernel machines interact via a deep neural network that accommodates the interconnection of kernel machines via a set of weights. This architecture models the "interplay" of the data properties, in the hope that the neural-driven architecture will identify the best combination of kernel machines that captures the citizens' stochastic energy behavior [11].
The current chapter is organized as follows. In the next section, kernel machines, and more specifically kernel-modeled Gaussian processes, are presented, while Sect. 14.3 presents the newly developed NKMN architecture. Section 14.4 provides the test results obtained on a set of data from smart meters, whereas Sect. 14.5 concludes the chapter and summarizes its main points.
(14.5)
with I being the identity matrix. It should be noted that selecting the mean equal to zero is a convenient choice that does not affect the derivation of the GPR framework [31].
Driven by Eqs. (14.3) and (14.4), a Gaussian process is obtained whose parameters, i.e., mean and covariance values, are given by:

$$\mathbb{E}[\mathbf{y}] = \mathbf{0} \tag{14.6}$$

$$\operatorname{cov}(y_n, y_m) = k(\mathbf{x}_n, \mathbf{x}_m) \tag{14.7}$$

where K stands for the so-called Gram matrix, whose entry at position i, j is given by:

$$K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) \tag{14.8}$$

and thus the Gaussian process is expressed as:

$$p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K) \tag{14.9}$$

However, in practice the observed values consist of the aggregation of the target value with some noise:

$$t_n = y_n + \varepsilon_n \tag{14.10}$$

with $\varepsilon_n$ being random noise following a normal distribution:

$$p(\varepsilon_n) = \mathcal{N}(\varepsilon_n \mid 0, \sigma_{\varepsilon}^{2}) \tag{14.11}$$

where $\sigma_{\varepsilon}^{2}$ denotes the variance of the noise [31]. By using Eqs. (14.9) and (14.10), we conclude that the prior distribution over the targets $t_n$ also follows a normal distribution (in vector form):

$$p(\mathbf{t}) = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, K + \sigma_{\varepsilon}^{2} I) \tag{14.12}$$

For a new input $\mathbf{x}_{N+1}$, with $C_N = K + \sigma_{\varepsilon}^{2} I$ and $\mathbf{k} = [k(\mathbf{x}_1, \mathbf{x}_{N+1}), \ldots, k(\mathbf{x}_N, \mathbf{x}_{N+1})]^{T}$, the predictive distribution is Gaussian with mean and covariance:

$$m(\mathbf{x}_{N+1}) = \mathbf{k}^{T} C_N^{-1} \mathbf{t} \tag{14.14}$$

$$\sigma^{2}(\mathbf{x}_{N+1}) = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \sigma_{\varepsilon}^{2} - \mathbf{k}^{T} C_N^{-1} \mathbf{k} \tag{14.16}$$

where the dependence of both the mean and covariance functions on the selected kernel is apparent [32].
Overall, the form of Eqs. (14.14) and (14.16) implies that the modeler can control the output of the predictive distribution by selecting the form of the kernel [14, 31].
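A minimal NumPy sketch of the predictive equations (14.14) and (14.16), using a Gaussian kernel as an example (the function names and the illustrative values of sigma2 and theta are assumptions, not the chapter's settings):

```python
import numpy as np

def rbf(a, b, theta=1.0):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 theta^2))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * theta ** 2))

def gpr_predict(X, t, X_new, sigma2=0.1, theta=1.0):
    """Predictive mean and variance of GPR (Eqs. 14.14 and 14.16)."""
    C = rbf(X, X, theta) + sigma2 * np.eye(len(X))   # C_N = K + sigma^2 I
    k = rbf(X, X_new, theta)                         # cross-covariances
    mean = k.T @ np.linalg.solve(C, t)               # k^T C_N^{-1} t
    var = rbf(X_new, X_new, theta).diagonal() + sigma2 - np.einsum(
        "ij,ij->j", k, np.linalg.solve(C, k))        # minus k^T C_N^{-1} k
    return mean, var
```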
14.3 Neuro-Kernel-Machine-Network
In this section the newly developed network for conducting predictive analytics is presented [30]. The developed network implements a deep learning approach [22, 26] in order to learn the historical consumption patterns of city citizens and subsequently provide a prediction of energy over a predetermined time interval [3].
The idea behind the NKMN is the adoption of kernel machines as the nodes of a neural network [23]. In particular, a deep architecture is adopted that is comprised of one input layer, L hidden layers (with L larger than 3) and one output layer, as shown in Fig. 14.2. Notably, the L hidden layers are comprised of three nodes each, with each node implementing a GP equipped with a different kernel function. The input layer is not a computing layer and hence does not perform any information processing; it only forwards the input to the hidden layers [29]. The last layer, i.e. the output, implements a linear function of the inputs coming from the last hidden layer. The presented deep network architecture is a feedforward network with a set of weights connecting each layer to the next one [24].
Matérn Kernel

$$k(\mathbf{x}, \mathbf{x}') = \frac{2^{1-\theta_1}}{\Gamma(\theta_1)} \left( \frac{\sqrt{2\theta_1}\,\|\mathbf{x}-\mathbf{x}'\|}{\theta_2} \right)^{\theta_1} K_{\theta_1}\!\left( \frac{\sqrt{2\theta_1}\,\|\mathbf{x}-\mathbf{x}'\|}{\theta_2} \right) \tag{14.17}$$

where θ1, θ2 are two positive-valued parameters; in the present work, θ1 is taken equal to 3/2 (see [31] for details), whereas Kθ1(·) is a modified Bessel function.
Gaussian Kernel

$$k(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\frac{\|\mathbf{x}-\mathbf{x}'\|^{2}}{2\theta^{2}} \right) \tag{14.18}$$

Neural Net Kernel

$$k(\mathbf{x}, \mathbf{x}') = \frac{2}{\pi} \sin^{-1}\!\left( \frac{2\,\tilde{\mathbf{x}}^{T}\Sigma\,\tilde{\mathbf{x}}'}{\sqrt{(1+2\,\tilde{\mathbf{x}}^{T}\Sigma\,\tilde{\mathbf{x}})(1+2\,\tilde{\mathbf{x}}'^{T}\Sigma\,\tilde{\mathbf{x}}')}} \right) \tag{14.19}$$

where $\tilde{\mathbf{x}}$ denotes the augmented input vector and Σ a covariance matrix of the kernel parameters [31].
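For reference, the Matérn and Gaussian kernels above can be evaluated as functions of the distance r; a sketch assuming SciPy's modified Bessel function (the closed form is valid for r > 0, with k → 1 as r → 0):

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function K_nu

def matern(r, theta1=1.5, theta2=1.0):
    """Matern kernel of Eq. (14.17) as a function of the distance r > 0."""
    s = np.sqrt(2 * theta1) * r / theta2
    return (2 ** (1 - theta1) / gamma(theta1)) * s ** theta1 * kv(theta1, s)

def gaussian(r, theta=1.0):
    """Gaussian kernel of Eq. (14.18)."""
    return np.exp(-r ** 2 / (2 * theta ** 2))
```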
Then, the citizens are connected to each other via the hidden layer weights. The role of the weights is to express the degree to which the specific behavior of a citizen is realized in the overall city demand [15]. The underlying idea is that in smart cities the overall demand is the result of the interacting demands of the various citizens, since they have the opportunity to exchange information and morph their final demand [3, 8].
The training of the presented NKMN is performed as follows. In the first stage, the training set of each citizen is put together and subsequently the nodes of the respective hidden layer are trained. Once the node training is completed, a training set of city demand data is put together (denoted as "city demand data" in Fig. 14.5). This newly formed training set consists of the historical demand patterns of the city (or a partition of the city) and reflects the final demand and the interactions among the citizens. The training is performed using the backpropagation algorithm.
Overall, the two-stage process utilized for training the NKMN is comprised of two supervised learning stages: the first at the individual node level, and the second at the overall deep neural network level. To make it clearer, the individual citizen historical data are utilized for the evaluation of the GP parameters at each hidden layer, while the aggregated data of the participating citizens are utilized to evaluate the parameters of the network.
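A minimal sketch of how this two-stage flow might be orchestrated; the class layout and the fit/backpropagation helpers below are hypothetical placeholders, not the authors' code:

```python
# Hypothetical orchestration of the two-stage NKMN training flow.
def train_nkmn(citizen_datasets, city_demand_data, network):
    # Stage 1: fit the three GP nodes of each citizen's hidden layer
    # on that citizen's own historical demand patterns.
    for layer, dataset in zip(network.hidden_layers, citizen_datasets):
        for gp_node in layer.nodes:   # Matern, Gaussian, Neural Net GPs
            gp_node.fit(dataset.inputs, dataset.targets)
    # Stage 2: learn the inter-layer weights on the aggregated
    # city demand data via backpropagation.
    network.backpropagation(city_demand_data.inputs, city_demand_data.targets)
    return network
```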
Finally, once the training of the network has been completed, the NKMN is able to make predictions over the demand of that specific group of citizens, as shown at the bottom of Fig. 14.5. Notably, the group might be a neighborhood of 2–20 citizens or a larger area with thousands of citizens. In the latter case, it is anticipated that the training process will last a long time.
14.5 Conclusion
In this chapter a new deep architecture for data analytics applied to smart city operation was presented. In particular, a deep feedforward neural network is introduced in which the nodes of the network are implemented by kernel machines. In more detail, the deep network is comprised of a single input layer, L hidden layers and a single output layer. The number of hidden layers is equal to the number of citizens participating in the shaping of the energy demand under study. The aim of the deep learning architecture is to model the energy (load) behavior and the interactions among the citizens that affect the overall demand shaping. In order to capture citizen behavior, each hidden layer is comprised of three different nodes, with each node implementing a kernel-based Gaussian process with a different kernel, namely the Matérn, Gaussian and Neural Net kernels. The three nodes of each layer are trained on the same dataset, which contains the historical demand patterns of the respective citizen. The interactions among the citizens are modeled in the form of the neural network weights.
With the above deep learning architecture, we are able to capture the new dynamics in energy demand that emerge from the introduction of smart city technologies. Therefore, the proposed method is applicable to smart cities, and more specifically to partitions (or subgroups) within the smart city. The proposed method was tested on a set of real-world data, morphed as in [3], obtained from a set of smart meters deployed in Ireland. Results exhibited that the presented deep learning architecture has the potency to analyze the past behavior of the citizens and provide highly accurate group demand predictions.
Future work will move in two directions. The first is to test the presented method on a higher number of citizens, whereas the second will move toward testing various kernel machines other than GPs as the network nodes.
References
1. Al-Hader, M., Rodzi, A., Sharif, A.R., Ahmad, N.: Smart city components architecture. In: 2009
International Conference on Computational Intelligence, Modelling and Simulation, pp. 93–97.
IEEE (2009, September)
3. Alamaniotis, M., Gatsis, N.: Evolutionary multi-objective cost and privacy driven load
morphing in smart electricity grid partition. Energies 12(13), 2470 (2019)
[Crossref]
4. Alamaniotis, M., Bourbakis, N., Tsoukalas, L.H.: Enhancing privacy of electricity consumption
in smart cities through morphing of anticipated demand pattern utilizing self-elasticity and
genetic algorithms. Sustain. Cities Soc. 46, 101426 (2019)
[Crossref]
5.
Alamaniotis, M., Gatsis, N., Tsoukalas, L.H.: Virtual Budget: Integration of electricity load and
price anticipation for load morphing in price-directed energy utilization. Electr. Power Syst.
Res. 158, 284–296 (2018)
[Crossref]
6. Alamaniotis, M., Tsoukalas, L.H., Bourbakis, N.: Anticipatory driven nodal electricity load
morphing in smart cities enhancing consumption privacy. In 2017 IEEE Manchester
PowerTech, pp. 1–6. IEEE (2017, June)
7. Alamaniotis, M., Tsoukalas, L.H.: Multi-kernel assimilation for prediction intervals in nodal
short term load forecasting. In: 2017 19th International Conference on Intelligent System
Application to Power Systems (ISAP), pp. 1–6. IEEE, (2017)
8. Alamaniotis, M., Tsoukalas, L.H., Buckner, M.: Privacy-driven electricity group demand
response in smart cities using particle swarm optimization. In: 2016 IEEE 28th International
Conference on Tools with Arti icial Intelligence (ICTAI), pp. 946–953. IEEE, (2016a)
9. Alamaniotis, M., Tsoukalas, L.H.: Implementing smart energy systems: Integrating load and
price forecasting for single parameter based demand response. In: 2016 IEEE PES Innovative
Smart Grid Technologies Conference Europe (ISGT-Europe), pp. 1–6. IEEE (2016, October)
10. Alamaniotis, M., Bargiotas, D., Tsoukalas, L.H.: Towards smart energy systems: application of
kernel machine regression for medium term electricity load forecasting. SpringerPlus 5(1), 58
(2016b)
11. Alamaniotis, M., Tsoukalas, L.H., Fevgas, A., Tsompanopoulou, P., Bozanis, P.: Multiobjective
unfolding of shared power consumption pattern using genetic algorithm for estimating
individual usage in smart cities. In: 2015 IEEE 27th International Conference on Tools with
Arti icial Intelligence (ICTAI), pp. 398–404. IEEE (2015, November)
12. Alamaniotis, M., Tsoukalas, L.H., Bourbakis, N.: Virtual cost approach: electricity consumption
scheduling for smart grids/cities in price-directed electricity markets. In: IISA 2014, The 5th
International Conference on Information, Intelligence, Systems and Applications, pp. 38–43.
IEEE (2014, July)
13. Alamaniotis, M., Ikonomopoulos, A., Tsoukalas, L.H.: Evolutionary multiobjective optimization
of kernel-based very-short-term load forecasting. IEEE Trans. Power Syst. 27(3), 1477–1484
(2012)
[Crossref]
14. Alamaniotis, M., Ikonomopoulos, A., Tsoukalas, L.H.: A Pareto optimization approach of a
Gaussian process ensemble for short-term load forecasting. In: 2011 16th International
Conference on Intelligent System Applications to Power Systems, pp. 1–6. IEEE, (2011,
September)
15. Alamaniotis, M., Gao, R., Tsoukalas, L.H.: Towards an energy internet: a game-theoretic
approach to price-directed energy utilization. In: International Conference on Energy-
Ef icient Computing and Networking, pp. 3–11. Springer, Berlin, Heidelberg (2010)
16. Belanche, D., Casaló , L.V., Orú s, C.: City attachment and use of urban services: bene its for
smart cities. Cities 50, 75–81 (2016)
[Crossref]
17. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
18.
Bourbakis, N., Tsoukalas, L.H., Alamaniotis, M., Gao, R., Kerkman, K.: Demos: a distributed
model based on autonomous, intelligent agents with monitoring and anticipatory responses
for energy management in smart cities. Int. J. Monit. Surveill. Technol. Res. (IJMSTR) 2(4), 81–
99 (2014)
19. Commission for Energy Regulation (CER).: CER Smart Metering Project—Electricity
Customer Behaviour Trial, 2009–2010 [dataset]. 1st (edn.) Irish Social Science Data Archive.
SN: 0012-00, (2012). www.ucd.ie/issda/CER-electricity
20. Feinberg, E.A., Genethliou, D.: Load forecasting. In: Applied Mathematics for Restructured
Electric Power Systems, pp. 269–285. Springer, Boston, MA (2005)
21. Kraas, F., Aggarwal, S., Coy, M., Mertins, G. (eds.): Megacities: our global urban future. Springer
Science & Business Media, (2013)
22. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep neural network
architectures and their applications. Neurocomputing 234, 11–26 (2017)
[Crossref]
23. Mathew, J., Grif in, J., Alamaniotis, M., Kanarachos, S., Fitzpatrick, M.E.: Prediction of welding
residual stresses using machine learning: comparison between neural networks and neuro-
fuzzy systems. Appl. Soft Comput. 70, 131–146 (2018)
[Crossref]
24. Mohammadi, M., Al-Fuqaha, A.: Enabling cognitive smart cities using big data and machine
learning: approaches and challenges. IEEE Commun. Mag. 56(2), 94–101 (2018)
[Crossref]
25. Mohanty, S.P., Choppali, U., Kougianos, E.: Everything you wanted to know about smart cities:
the internet of things is the backbone. IEEE Consum. Electron. Mag. 5(3), 60–70 (2016)
[Crossref]
26. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.: Deep
learning applications and challenges in big data analytics. J. Big Data 2(1), 1 (2015)
[Crossref]
27. Nasiakou, A., Alamaniotis, M., Tsoukalas, L.H.: Power distribution network partitioning in big
data environment using k-means and fuzzy logic. In: Proceedings of the Medpower 2016
Conference, Belgrade, Serbia, pp. 1–7, (2016)
28. Nam, T., Pardo, T.A.: Conceptualizing smart city with dimensions of technology, people, and
institutions. In: Proceedings of the 12th Annual International Digital Government Research
Conference: Digital Government Innovation in Challenging Times, pp. 282–291. ACM, (2011)
29. Tsoukalas, L.H., Uhrig, R.E.: Fuzzy and Neural Approaches in Engineering. Wiley, New York (1997)
30. Waller, M.A., Fawcett, S.E.: Data science, predictive analytics, and big data: a revolution that
will transform supply chain design and management. J. Bus. Logistics 34(2), 77–84 (2013)
[Crossref]
31. Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA (2006)
32. Williams, C.K., Rasmussen, C.E.: Gaussian processes for regression. In: Advances in Neural
Information Processing Systems, pp. 514–520, (1996)
© Springer Nature Switzerland AG 2021
G. Phillips-Wren et al. (eds.), Advances in Data Science: Methodologies and Applications,
Intelligent Systems Reference Library 189
https://doi.org/10.1007/978-3-030-51870-7_15
Alessandro Leone
Email: [email protected]
Pietro Siciliano
Email: [email protected]
Abstract
Average life expectancy has increased steadily in recent decades. This phenomenon, considered together with the aging of the population, will inevitably produce deep social changes in the coming years that lead to the need for innovative services for elderly people, focused on improving their wellbeing and quality of life. In this context many potential applications would benefit from the ability to automatically recognize facial expressions, with the purpose of reflecting the mood, the emotions and also the mental activities of an observed subject. Although facial expression recognition (FER) is widely investigated by many recent scientific works, it still remains a challenging task owing to a number of important factors, among which one of the most discriminating is age. In the present work an optimized Convolutional Neural Network (CNN) architecture is proposed and evaluated on two benchmark datasets (FACES and Lifespan) containing expressions performed also by aging adults. As a baseline, and with the aim of making a comparison, two traditional machine learning approaches based on a handcrafted feature extraction process are evaluated on the same datasets. Experimentation confirms the efficiency of the proposed CNN architecture, with an average recognition rate higher than 93.6% for expressions performed by aging adults when a proper set of CNN parameters was used. Moreover, the experimentation stage showed that the deep learning approach significantly improves on the baseline approaches considered, and the most noticeable improvement was obtained when considering facial expressions of aging adults.
15.1 Introduction
The constant increase in life expectancy and the consequent aging phenomenon will inevitably produce in the next 20 years deep social changes that lead to the need for innovative services for elderly people, focused on maintaining independence and autonomy and, in general, improving the wellbeing and quality of life of aging adults [1]. It is obvious how in this context many potential applications, such as robotics, communications, security, medical and assistive technology, would benefit from the ability to automatically recognize facial expressions [2–4], because different facial expressions can reflect the mood, the emotions and also the mental activities of an observed subject.
Facial expression recognition (FER) refers to systems that aim to automatically analyse facial movements and facial feature changes in visual information in order to recognize a facial expression. It is important to mention that FER is different from emotion recognition, which requires a higher level of knowledge: although a facial expression can indicate an emotion, the analysis of the emotion also requires information such as context, body gesture, voice and cultural factors [5]. A classical automatic facial expression analysis usually employs three main stages: face acquisition, facial data extraction and representation (feature extraction), and classification.
Ekman's initial research [6] determined that there are six basic classes in FER: anger, disgust, fear, happiness, sadness and surprise. Proposed solutions for the classification of the aforementioned facial expressions can be divided into two main categories: the first includes solutions that perform the classification by processing a set of consecutive images, while the second includes approaches that carry out FER on each single image.
By working on image sequences, much more information is available for the analysis. Usually, the neutral expression is used as a reference and some characteristics of the facial traits are tracked over time in order to recognize the evolving expression. The major drawback of these approaches is the inherent assumption that the sequence content evolves from the neutral expression to another one that has to be recognized. This constraint strongly limits their use in real-world applications, where the evolution of facial expressions is completely unpredictable. For this reason, the most attractive solutions are those performing facial expression recognition on a single image.
For static images, various types of features might be used in the design of a FER system. Generally, they are divided into the following categories: geometric-based, appearance-based and hybrid-based approaches. More specifically, geometric-based features depict the shape and locations of facial components such as mouth, nose, eyes and brows, using the geometric relationships between facial points to extract facial features. Three typical geometric feature-based extraction methods are active shape models (ASM) [7], active appearance models (AAM) [8] and the scale-invariant feature transform (SIFT) [9]. Appearance-based descriptors use the whole face or specific regions of a face image to reflect the underlying information in the image. There are three main representative appearance-based feature extraction methods, i.e. Gabor wavelet representation [10], Local Binary Patterns (LBP) [11] and the Histogram of Oriented Gradients (HOG) [12]. Hybrid-based approaches combine the two previous feature types in order to enhance the system's performance, which can be achieved either at the feature extraction or at the classification level.
Geometric-based, appearance-based and hybrid-based approaches have been widely used for the classification of facial expressions, even if it is important to emphasize that all the aforementioned methodologies require a very daunting feature definition and extraction process. Extracting geometric or appearance-based features usually requires an accurate feature point detection technique, which is generally difficult to implement against real-world complex backgrounds. In addition, this category of methodologies easily ignores changes in skin texture such as wrinkles and furrows, which are usually accentuated by the age of the subject. Moreover, the task often requires the development and subsequent analysis of complex models with a further fine-tuning process over several parameters, which can nonetheless show large variances depending on the individual characteristics of the subject performing the facial expressions. Last but not least, recent studies have pointed out that classical approaches used for the classification of facial expressions do not perform well in real contexts, where face pose and lighting conditions are broadly different from the ideal ones used to capture the face images within the benchmark datasets.
Among the factors that make FER very difficult, one of the most discriminating is age [13, 14]. In particular, the expressions of older individuals appear harder to decode, owing to age-related structural changes in the face, which supports the notion that the wrinkles and folds in older faces actually resemble emotions. Consequently, state-of-the-art approaches based on handcrafted feature extraction may be inadequate for the classification of facial expressions performed by aging adults.
It seems therefore very important to investigate automatic systems that make the recognition of facial expressions of aging adults more efficient, considering that the facial expressions of the elderly, as highlighted above, are broadly different from those of young or middle-aged people for a number of reasons. For example, in [15] researchers found that the expressions of aging adults (women in this case) were more telegraphic, in the sense that their expressive behaviours tended to involve fewer regions of the face, and yet more complex, in that they used blended or mixed expressions when recounting emotional events. These changes, in part, account for why the facial expressions of aging adults are more difficult to read. Another study showed that, when emotional memories were prompted and subjects were asked to relate their experiences, aging adults were more facially expressive, in terms of the frequency of emotional expressions, than younger individuals across a range of emotions, as detected by an objective facial affect coding system [16]. One of the other changes that comes with age, making an aging facial expression difficult to recognize, involves the wrinkling of the facial skin and the sagging of the facial musculature. Of course, part of this is due to biologically based aspects of aging, but individual differences also appear linked to personality processes, as demonstrated in [17].
To the best of our knowledge, only a few works in the literature address the problem of FER in aging adults. In [13] the authors perform a computational study within and across different age groups and compare the FER accuracies, finding that the recognition rate is influenced significantly by human aging. The major issue of this work is related to the feature extraction step: the authors manually labelled the facial fiducial points and, given these points, Gabor filters are used to extract features for subsequent FER. Consequently, this process is inapplicable in the application context under consideration, where the objective is to provide new technologies able to function automatically and without human intervention.
On the other hand, the application described in [18] recognizes emotions of aging adults using an Active Shape Model [7] for feature extraction. To train the model the authors employ three benchmark datasets that do not contain adult faces, getting an average accuracy of 82.7% on the same datasets. Tests performed on older faces acquired with a webcam reached an average accuracy of 79.2%, without any verification of how the approach works, for example, on a benchmark dataset with older faces.
Analysing the results achieved, it seems appropriate to investigate new methodologies that make the feature extraction process less difficult, while at the same time strengthening the classification of facial expressions.
Recently, a viable alternative to the traditional feature design approaches is represented by deep learning (DL) algorithms, which straightforwardly lead to automated feature learning [19]. Research using DL techniques can build better representations and create innovative models to learn these representations from unlabelled data. These approaches became computationally feasible thanks to the availability of powerful GPU processors, allowing high-performance numerical computation in graphics cards. Some of the DL techniques, like Convolutional Neural Networks (CNNs), Deep Boltzmann Machines, Deep Belief Networks and Stacked Auto-Encoders, are applied to practical applications like pattern analysis, audio recognition, computer vision and image recognition, where they produce challenging results on various tasks [20].
It comes as no surprise that CNNs, for example, have worked very well for FER, as evidenced by their use in a number of state-of-the-art algorithms for this task [21–23], as well as in winning related competitions [24], particularly previous years' EmotiW challenges [25, 26]. The problem with CNNs is that this kind of neural network has a very high number of parameters and, moreover, achieves better accuracy with big data. Because of that, it is prone to overfitting if the training is performed on a small-sized dataset. Another non-negligible problem is that there are no publicly available datasets with sufficient data for facial expression recognition with deep architectures.
In this paper, an automatic FER approach that employs a supervised machine learning technique derived from DL is introduced and compared with two traditional approaches selected among the most promising and effective ones present in the literature. Indeed, a CNN inspired by a popular architecture proposed in [27] was designed and implemented. Moreover, in order to tackle the problem of overfitting, this work also proposes, in the pre-processing step, standard methods for synthetic data generation (techniques indicated in the literature as "data augmentation") to cope with the limitation inherent in the amount of data.
The structure of the paper is as follows. Section 15.2 reports some details about the implemented pipeline for FER in aging adults, emphasizing theoretical details of the pre-processing steps. The same section also describes the implemented CNN architecture and both traditional machine learning approaches used for comparison. Section 15.3 presents the results obtained, while discussion and conclusion are summarized in Sect. 15.4.
15.2 Methods
Figure 15.1 shows the structure of our FER system. First, the implemented pipeline performs a pre-processing task on the input images (data augmentation, face detection, cropping and down-sampling, normalization). Once the images are pre-processed, they can be used either to train the implemented deep network or to extract handcrafted features (both geometric and appearance-based).
Fig. 15.1 Pipeline of the proposed system. First, a pre-processing task on the input images is performed. The obtained normalized face image is used to train the deep neural network architecture. Moreover, both geometric and appearance-based features are extracted from the normalized image. Finally, each image is classified by associating it with the label of the most probable facial expression
15.2.1 Pre-processing
This section gives some details about the blocks that perform the pre-processing algorithmic procedure, whereas the next sub-sections illustrate the theoretical details of the DL methodology and of the two classical machine learning approaches used for comparison. It is well known that one of the main problems of deep learning methods is that they need a lot of data in the training phase to perform properly.
In the present work the problem is accentuated by the availability of very few datasets containing images of facial expressions performed by aging subjects. So, before training the CNN model, we need to augment the data with various transformations to generate small changes in appearance and pose.
The number of available images has been increased with three data augmentation strategies, sketched below. The first strategy is flip augmentation: mirroring images about the y-axis produces two samples from each image. The second strategy is to change the lighting conditions of the images; in this work the lighting condition is varied by adding Gaussian noise to the available face images. The last strategy consists in rotating the images by a specific angle: each facial image has been rotated through 7 angles randomly generated in the range [−30°, 30°] with respect to the y-axis. Summarizing, starting from each image present in the datasets, and through the combination of the previously described data augmentation techniques, 32 facial images have been generated.
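A minimal sketch of the three augmentation strategies with OpenCV and NumPy (the noise standard deviation is an illustrative assumption, not the chapter's exact setting):

```python
import cv2
import numpy as np

def augment(img, rng=np.random.default_rng(0)):
    """Flip, Gaussian-noise and rotation augmentations of one face image."""
    out = [img, cv2.flip(img, 1)]                  # mirror about the y-axis
    noisy = img + rng.normal(0, 10, img.shape)     # vary lighting conditions
    out.append(np.clip(noisy, 0, 255).astype(np.uint8))
    h, w = img.shape[:2]
    for angle in rng.uniform(-30, 30, size=7):     # 7 random rotations
        M = cv2.getRotationMatrix2D((w / 2, h / 2), float(angle), 1.0)
        out.append(cv2.warpAffine(img, M, (w, h)))
    return out
```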
The next step consists in the automatic detection of the facial region. Here, the facial region is automatically identified on the original image by means of the Viola–Jones face detector [28]. Once the face has been detected by the Viola–Jones algorithm, a simple routine crops the face image. This is achieved by detecting the coordinates of the top-left corner and the height and width of the face-enclosing rectangle, removing in this way all background information and image patches that are not related to the expression. Since the facial region can be of different sizes after cropping, in order to remove the variation in face size and keep the facial parts in the same pixel space, the algorithmic pipeline provides a down-sampling step that generates face images of a fixed dimension using linear interpolation. It is important to stress that this pre-processing task helps the CNN to learn which regions are related to each specific expression. Next, the cropped and down-sampled RGB face image is converted into grayscale by eliminating the hue and saturation information while retaining the luminance. Finally, since image brightness and contrast can vary even in images that represent the same facial expression performed by the same subject, an intensity normalization procedure was applied in order to reduce these issues. Generally, histogram equalization is applied to enhance the contrast of an image by transforming its intensity values, since images that have been contrast-enhanced are easier to recognize and classify. However, the noise can also be amplified by histogram equalization when enhancing the contrast, since a number of pixels fall inside the same gray-level range. Therefore, instead of applying histogram equalization, in this work the method introduced in [29], called "contrast limited adaptive histogram equalization" (CLAHE), was used. This algorithm is an improvement of the histogram equalization algorithm and essentially consists in the division of the original image into contextual regions, with histogram equalization performed on each of these sub-regions, called tiles. The neighboring tiles are then combined by using bilinear interpolation to eliminate artificially induced boundaries. This gives much better contrast and provides accurate results.
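The whole chain can be sketched with standard OpenCV primitives; the cascade file and the 32 × 32 target size are assumptions (the latter consistent with the 28 × 28 maps produced by the 5 × 5 convolution described in the next sub-section):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def preprocess(img, size=(32, 32)):
    """Viola-Jones detection, crop, down-sampling, grayscale, CLAHE."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                 # top-left corner, width, height
    face = gray[y:y + h, x:x + w]         # crop away the background
    face = cv2.resize(face, size, interpolation=cv2.INTER_LINEAR)
    return clahe.apply(face)              # contrast-limited equalization
```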
kernel which connects the i-th and j-th feature maps, b is a bias term and f is the activation function. In the present work the widely used Rectified Linear Unit (ReLU) function was applied, because it has been demonstrated that this kind of nonlinearity has better fitting abilities than the hyperbolic tangent or the logistic sigmoid function [31].
The first convolution layer applies a 5 × 5 convolution kernel and outputs 32 images of 28 × 28 pixels. It aims to extract elementary visual features, like oriented edges, end-points, corners and shapes in general. In the FER problem, the detected features are mainly the shapes, corners and edges of eyes, eyebrows and lips. Once a feature is detected, its exact location is not so important, just its position relative to the other features.
For example, the absolute position of the eyebrows is not important, but their distance from the eyes is, because a large distance may indicate, for instance, the surprise expression. The precise position is not only irrelevant but can also pose a problem, because it naturally varies across different subjects showing the same expression.
The first convolution layer is followed by a sub-sampling (pooling) layer, which is used to reduce the image to half of its size and to control overfitting. This layer takes small square blocks (2 × 2) from the convolutional layer and subsamples each block to produce a single output. The operation reduces the precision with which the positions of the features extracted by the previous layer are encoded in the new map. The most common pooling forms are average pooling and max pooling. In the present paper the max-pooling strategy has been employed, which keeps the maximum activation within each block.
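As a sketch, the front end just described could be written as follows in Keras; only the first convolution (32 maps, 5 × 5 kernel, ReLU) and the 2 × 2 max pooling are taken from the text, while the input size, the trailing layers and the training settings are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),            # grayscale face image
    layers.Conv2D(32, 5, activation="relu"),    # 32 maps of 28 x 28 pixels
    layers.MaxPooling2D(pool_size=2),           # 2 x 2 blocks -> 14 x 14
    # ... further conv/pool layers of the full architecture ...
    layers.Flatten(),
    layers.Dense(6, activation="softmax"),      # six basic expressions
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```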
The last step provides a classification module that uses a Support Vector Machine (SVM) to analyse the obtained feature vector in order to get a prediction in terms of facial expression (Fig. 15.3).
Fig. 15.3 FER based on the geometric feature extraction methodology: a facial landmark localization, b extraction of 32 geometric features (linear, elliptical and polygonal) using the obtained landmarks
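A minimal scikit-learn sketch of this baseline classification step; the feature matrix X (32 geometric features per image) and the labels y are synthetic placeholders standing in for the output of the extraction stage above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))      # placeholder: 32 geometric features
y = rng.integers(0, 6, size=200)    # placeholder: six expression labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(f"FER accuracy: {clf.score(X_test, y_test):.3f}")
```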
Fig. 15.6 Some examples of expressions performed by aging adults from the Lifespan database
The training and testing phases were performed on an Intel i7 3.5 GHz workstation with 16 GB of DDR3 RAM, equipped with an NVidia Titan X GPU, using TensorFlow, the Python library for machine learning developed for implementing, training, testing and deploying deep learning models [38].
For the performance evaluation of the methodologies, all the images of the FACES dataset were pre-processed, whereas only the facial images of Lifespan showing the four facial expressions considered in the present work were used. Consequently, applying the data augmentation techniques previously described (see Sect. 15.2), in total 65,664 facial images of FACES (equally distributed among the facial expression classes) and 31,360 facial images of Lifespan were used, a sufficient number for using a deep learning technique.
On the other hand, the final accuracy obtained by the proposed CNN for each age group of the FACES and Lifespan datasets is reported in Tables 15.4 and 15.5. It was computed using the network weights of the best run out of 20 runs, with a validation set used for accuracy measurement.
Table 15.4 FER accuracy on the FACES dataset evaluated for different age groups with the proposed CNN and traditional machine learning approaches
Age group Proposed CNN (%) ASM + SVM (%) LBP + SVM (%)
Young (19–31 years) 92.43 86.42 87.22
Middle-aged (39–55 years) 92.16 86.81 87.47
Older (69–80 years) 93.86 84.98 85.61
Overall accuracy 92.81 86.07 86.77
Table 15.5 FER accuracy on the Lifespan dataset evaluated for different age groups with the proposed CNN and traditional machine learning approaches
Age group Proposed CNN (%) ASM + SVM (%) LBP + SVM (%)
Young (18–29 years) 93.01 90.16 90.54
Middle-aged (30–49 years) 93.85 89.24 90.01
Older (50–69 years) 95.48 86.12 86.32
Very old (70–93 years) 95.78 85.28 86.01
Overall accuracy 94.53 87.70 88.22
Table 15.7 Average classification accuracy obtained for the FACES and Lifespan datasets with four different combinations of pre-processing steps using ASM + SVM, at varying age groups
Table 15.8 Average classification accuracy obtained for the FACES and Lifespan datasets with four different combinations of pre-processing steps using LBP + SVM, at varying age groups
Estimated (%)
Anger Disgust Fear Happy Sad Neutral
Actual (%) Anger 96.8 0 0 0 2.2 1.0
Disgust 3.1 93.8 0 0.7 1.8 0.6
Fear 0 0 95.2 1.5 3.3 0
Happy 0.7 2.8 1.1 94.3 0 1.1
Sad 0.6 0 4.1 0 90.2 5.1
Neutral 2.5 2.0 2.6 0 0 92.9
Table 15.11 Confusion matrix of the four basic expressions on the Lifespan dataset (performed by older and very old adults) using the proposed CNN architecture
Estimated (%)
Happy Neutral Surprise Sad
Actual (%) Happy 97.7 0.3 1.8 0.2
Neutral 2.1 96.4 0.6 0.9
Surprise 4.6 0.1 93.8 1.5
Sad 0.6 3.8 1.1 94.5
References
1. United Nations Programme on Ageing. The ageing of the world’s population, December 2013.
http://www.un.org/en/development/desa/population/publications/pdf/ageing/
WorldPopulationAgeing2013.pdf. Accessed July 2018
2. Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S.: A survey of affect recognition methods: audio,
visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58
(2009). https://doi.org/10.1109/tpami.2008.52
[Crossref]
3. Pantic, M., Rothkrantz, L.J.M.: Automatic analysis of facial expressions: the state of the art.
IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1424–1445 (2000). https://doi.org/10.1109/
34.895976
[Crossref]
4. Fasel, B., Luettin, J.: Automatic facial expression analysis: a survey. Pattern Recogn. 36(1),
259–275 (2003). https://doi.org/10.1016/s0031-3203(02)00052-3
[Crossref][zbMATH]
5. Carroll, J.M., Russell, J.A.: Do facial expressions signal specific emotions? Judging emotion from
the face in context. J. Pers. Soc. Psychol. 70(2), 205 (1996). https://doi.org/10.1037//0022-
3514.70.2.205
[Crossref]
6. Ekman, P., Rolls, E.T., Perrett, D.I., Ellis, H.D.: Facial expressions of emotion: an old controversy and new findings [and discussion]. Philos. Trans. R. Soc. B Biol. Sci. 335(1273), 63–69
(1992). https://doi.org/10.1098/rstb.1992.0008
[Crossref]
7. Shbib, R., Zhou, S.: Facial expression analysis using active shape model. Int. J. Sig. Process. Image Process. Pattern Recogn. 8(1), 9–22 (2015). https://doi.org/10.14257/ijsip.2015.8.1.02
8. Cheon, Y., Kim, D.: Natural facial expression recognition using differential-AAM and manifold learning. Pattern Recogn. 42(7), 1340–1350 (2009). https://doi.org/10.1016/j.patcog.2008.10.010
9. Soyel, H., Demirel, H.: Facial expression recognition based on discriminative scale invariant feature transform. Electron. Lett. 46(5), 343–345 (2010). https://doi.org/10.1049/el.2010.0092
10. Gu, W., Xiang, C., Venkatesh, Y.V., Huang, D., Lin, H.: Facial expression recognition using radial encoding of local Gabor features and classifier synthesis. Pattern Recogn. 45(1), 80–91 (2012). https://doi.org/10.1016/j.patcog.2011.05.006
11. Shan, C., Gong, S., McOwan, P.W.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009). https://doi.org/10.1016/j.imavis.2008.08.005
12. Chen, J., Chen, Z., Chi, Z., Fu, H.: Facial expression recognition based on facial components detection and HOG features. In: International Workshops on Electrical and Computer Engineering Subfields, pp. 884–888 (2014)
13. Guo, G., Guo, R., Li, X.: Facial expression recognition influenced by human aging. IEEE Trans. Affect. Comput. 4(3), 291–298 (2013). https://doi.org/10.1109/t-affc.2013.13
14. Wang, S., Wu, S., Gao, Z., Ji, Q.: Facial expression recognition through modeling age-related spatial patterns. Multimedia Tools Appl. 75(7), 3937–3954 (2016). https://doi.org/10.1007/s11042-015-3107-2
15. Malatesta, C.Z., Izard, C.E.: The facial expression of emotion: young, middle-aged, and older adult expressions. In: Malatesta, C.Z., Izard, C.E. (eds.) Emotion in Adult Development, pp. 253–273. Sage Publications, London (1984)
16. Malatesta-Magai, C., Jonas, R., Shepard, B., Culver, L.C.: Type A behavior pattern and emotion expression in younger and older adults. Psychol. Aging 7(4), 551 (1992). https://doi.org/10.1037/0882-7974.7.4.551
17. Malatesta, C.Z., Fiore, M.J., Messina, J.J.: Affect, personality, and facial expressive characteristics of older people. Psychol. Aging 2(1), 64 (1987). https://doi.org/10.1037/0882-7974.2.1.64
18. Lozano-Monasor, E., López, M.T., Vigo-Bustos, F., Fernández-Caballero, A.: Facial expression recognition in ageing adults: from lab to ambient assisted living. J. Ambient Intell. Humaniz. Comput. 1–12 (2017). https://doi.org/10.1007/s12652-017-0464-x
19. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015). https://doi.org/10.1038/nature14539
20. Yu, D., Deng, L.: Deep learning and its applications to signal and information processing [exploratory DSP]. IEEE Signal Process. Mag. 28(1), 145–154 (2011). https://doi.org/10.1109/msp.2010.939038
21. Xie, S., Hu, H.: Facial expression recognition with FRR-CNN. Electron. Lett. 53(4), 235–237 (2017). https://doi.org/10.1049/el.2016.4328
22. Li, Y., Zeng, J., Shan, S., Chen, X.: Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans. Image Process. 28(5), 2439–2450 (2018). https://doi.org/10.1109/TIP.2018.2886767
23. Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression recognition with convolutional neural networks: coping with few data and the training sample order. Pattern Recogn. 61, 610–628 (2017). https://doi.org/10.1016/j.patcog.2016.07.026
24. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., …, Zhou, Y.: Challenges in representation learning: a report on three machine learning contests. In: International Conference on Neural Information Processing, pp. 117–124. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1016/j.neunet.2014.09.005
25. Kahou, S.E., Pal, C., Bouthillier, X., Froumenty, P., Gülçehre, Ç., Memisevic, R., …, Mirza, M.: Combining modality specific deep neural networks for emotion recognition in video. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 543–550. ACM (2013)
26. Liu, M., Wang, R., Li, S., Shan, S., Huang, Z., Chen, X.: Combining multiple kernel methods on Riemannian manifold for emotion recognition in the wild. In: Proceedings of the 16th International Conference on Multimodal Interaction, pp. 494–501. ACM (2014). https://doi.org/10.1145/2663204.2666274
27. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
28. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vision 57(2), 137–154 (2004). https://doi.org/10.1023/b:visi.0000013087.49260.fb
29. Zuiderveld, K.: Contrast limited adaptive histogram equalization. In: Graphics Gems IV, pp. 474–485. Academic Press (1994). https://doi.org/10.1016/b978-0-12-336156-1.50061-6
30. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195(1), 215–243 (1968). https://doi.org/10.1113/jphysiol.1968.sp008455
31. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
32. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010, pp. 177–186. Physica-Verlag HD (2010). https://doi.org/10.1007/978-3-7908-2604-3_16
33. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
34. Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE (2017). https://doi.org/10.1109/wacv.2017.58
35. Milborrow, S., Nicolls, F.: Active shape models with SIFT descriptors and MARS. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 2, pp. 380–387. IEEE (2014). https://doi.org/10.5220/0004680003800387
36. Ebner, N.C., Riediger, M., Lindenberger, U.: FACES—a database of facial expressions in young, middle-aged, and older women and men: development and validation. Behav. Res. Methods 42(1), 351–362 (2010). https://doi.org/10.3758/brm.42.1.351
37. Minear, M., Park, D.C.: A lifespan database of adult facial stimuli. Behav. Res. Methods Instrum. Comput. 36(4), 630–633 (2004). https://doi.org/10.3758/bf03206543
38. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., …, Kudlur, M.: TensorFlow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)
39. Zhang, C., Zhang, Z.: A survey of recent advances in face detection. Technical report MSR-TR-2010-66, Microsoft Research (2010)
40. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
41. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016). https://doi.org/10.1109/cvpr.2016.90
42. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., …, Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015). https://doi.org/10.1109/cvpr.2015.7298594