
Studies in Computational Intelligence 856

Boris Kryzhanovsky
Witali Dunin-Barkowski
Vladimir Redko
Yury Tiumentsev Editors

Advances in Neural
Computation, Machine
Learning, and
Cognitive Research III
Selected Papers from the XXI
International Conference on
Neuroinformatics, October 7–11, 2019,
Dolgoprudny, Moscow Region, Russia
Studies in Computational Intelligence

Volume 856

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
The books of this series are submitted to indexing to Web of Science,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/7092


Boris Kryzhanovsky · Witali Dunin-Barkowski · Vladimir Redko · Yury Tiumentsev
Editors

Advances in Neural Computation, Machine Learning, and Cognitive Research III

Selected Papers from the XXI International Conference on Neuroinformatics,
October 7–11, 2019, Dolgoprudny, Moscow Region, Russia
Editors

Boris Kryzhanovsky
Scientific Research Institute for System Analysis of Russian Academy of Sciences
Moscow, Russia

Witali Dunin-Barkowski
Scientific Research Institute for System Analysis of Russian Academy of Sciences
Moscow, Russia

Vladimir Redko
Scientific Research Institute for System Analysis of Russian Academy of Sciences
Moscow, Russia

Yury Tiumentsev
Moscow Aviation Institute (National Research University)
Moscow, Russia

Studies in Computational Intelligence
ISSN 1860-949X          ISSN 1860-9503 (electronic)
ISBN 978-3-030-30424-9  ISBN 978-3-030-30425-6 (eBook)
https://doi.org/10.1007/978-3-030-30425-6
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The international conference “Neuroinformatics” is an annual multidisciplinary
scientific forum dedicated to the theory and applications of artificial neural networks,
the problems of neuroscience and biophysical systems, artificial intelligence,
adaptive behavior, and cognitive studies.
The scope of the conference is wide, ranging from theory of artificial neural
networks, machine learning algorithms, and evolutionary programming to
neuroimaging and neurobiology.
Main topics of the conference cover theoretical and applied research from the
following fields:
Neurobiology and neurobionics: cognitive studies, neural excitability, cellular
mechanisms, cognition and behavior, learning and memory, motivation and emotion,
bioinformatics, adaptive behavior and evolutionary modeling, brain–computer
interface;
Neural networks: neurocomputing and learning, paradigms and architectures,
biological foundations, computational neuroscience, neurodynamics, neuroinformatics,
deep learning networks, neuro-fuzzy systems, hybrid intelligent systems;
Machine learning: pattern recognition, Bayesian networks, kernel methods,
generative models, information theoretic learning, reinforcement learning, relational
learning, dynamical models, classification and clustering algorithms,
self-organizing systems;
Applications: medicine, signal processing, control, simulation, robotics, hardware
implementations, security, finance and business, data mining, natural language
processing, image processing, and computer vision.
More than 100 reports were presented at the Neuroinformatics 2019 Conference.
Of these, 50 papers, including 3 invited papers, were selected; the corresponding
articles were prepared and published in this volume.

Boris Kryzhanovsky
Witali Dunin-Barkowski
Vladimir Redko
Yury Tiumentsev
Organization

Editorial Board
Boris Kryzhanovsky Scientific Research Institute for System Analysis
of Russian Academy of Sciences
Witali Dunin-Barkowsky Scientific Research Institute for System Analysis
of Russian Academy of Sciences
Vladimir Red’ko Scientific Research Institute for System Analysis
of Russian Academy of Sciences
Yury Tiumentsev Moscow Aviation Institute
(National Research University)

Advisory Board

Prof. Alexander N. Gorban (Tentative Chair of the International Advisory Board)


Department of Mathematics
University of Leicester
Email: [email protected]
Homepage: http://www.math.le.ac.uk/people/ag153/homepage/
Google Scholar profile:
http://scholar.google.co.uk/citations?user=D8XkcCIAAAAJ&hl=en
Tel. +44 116 223 14 33
Address: Department of Mathematics
University of Leicester
Leicester LE1 7RH
UK
Prof. Nicola Kasabov
Professor of Computer Science and Director KEDRI
Phone: +64 9 921 9506
Email: [email protected]
http://www.kedri.info


Physical Address:
KEDRI
Auckland University of Technology
AUT Tower, Level 7
Corner Rutland and Wakefield Street
Auckland
Postal Address:
KEDRI
Auckland University of Technology
Private Bag 92006
Auckland 1142
New Zealand
Prof. Jun Wang, PhD, FIEEE, FIAPR
Chair Professor of Computational Intelligence
Department of Computer Science
City University of Hong Kong
Kowloon Tong, Kowloon, Hong Kong
+852 34429701 (tel.)
+852-34420503 (fax)
[email protected]

Program Committee of the XXI International Conference “Neuroinformatics-2019”
General Chair

Vedyakhin A. A. Sberbank and Moscow Institute of Physics


and Technology, Dolgoprudny,
Moscow Region

Co-chairs

Kryzhanovskiy Boris Scientific Research Institute for System Analysis,


Moscow
Dunin-Barkowski Witali Scientific Research Institute for System Analysis,
Moscow
Gorban Alexander Nikolaevich University of Leicester, Great Britain

Program Committee

Ajith Abraham Machine Intelligence Research Labs (MIR Labs),


Scientific Network for Innovation and
Research Excellence, Washington, USA
Anokhin Konstantin National Research Centre “Kurchatov Institute,”
Moscow
Baidyk Tatiana The National Autonomous University of Mexico,
Mexico
Balaban Pavel Institute of Higher Nervous Activity
and Neurophysiology of RAS, Moscow
Borisyuk Roman Plymouth University, UK
Burtsev Mikhail National Research Centre “Kurchatov Institute,”
Moscow
Cangelosi Angelo Plymouth University, UK
Chizhov Anton Ioffe Physical Technical Institute, Russian
Academy of Sciences, St. Petersburg
Dolenko Sergey Skobeltsyn Institute of Nuclear Physics,
Lomonosov Moscow State University
Dolev Shlomi Ben-Gurion University of the Negev, Israel
Dosovitskiy Alexey Albert-Ludwigs-Universität, Freiburg, Germany
Dudkin Alexander United Institute of Informatics Problems, Minsk,
Belarus
Ezhov Alexander State Research Center of Russian Federation
“Troitsk Institute for Innovation and Fusion
Research,” Moscow
Frolov Alexander Institute of Higher Nervous Activity
and Neurophysiology of RAS, Moscow
Golovko Vladimir Brest State Technical University, Belarus
Hayashi Yoichi Meiji University, Kawasaki, Japan
Husek Dusan Institute of Computer Science, Czech Republic
Ivanitsky Alexey Institute of Higher Nervous Activity
and Neurophysiology of RAS, Moscow
Izhikevich Eugene Brain Corporation, San Diego, USA
Jankowski Stanislaw Warsaw University of Technology, Poland
Kaganov Yuri Bauman Moscow State Technical University
Kazanovich Yakov Institute of Mathematical Problems of Biology
of RAS, Pushchino, Moscow Region
Kecman Vojislav Virginia Commonwealth University, USA
Kernbach Serge Cybertronica Research, Research Center
of Advanced Robotics and Environmental
Science, Stuttgart, Germany
Koprinkova-Hristova Petia Institute of Information and Communication
Technologies, Bulgaria

Kussul Ernst The National Autonomous University of Mexico,


Mexico
Litinsky Leonid Scientific Research Institute for System Analysis,
Moscow
Makarenko Nikolay The Central Astronomical Observatory
of the Russian Academy of Sciences
at Pulkovo, Saint Petersburg
Mishulina Olga National Research Nuclear University (MEPhI),
Moscow
Narynov Sergazy Alem Research, Almaty, Kazakhstan
Nechaev Yuri Honored Scientist of the Russian Federation,
Academician of the Russian Academy
of Natural Sciences, St. Petersburg
Pareja-Flores Cristobal Complutense University of Madrid, Spain
Prokhorov Danil Toyota Research Institute of North America,
USA
Vladimir Red’ko Scientific Research Institute for System Analysis
of Russian Academy of Sciences, Moscow
Rudakov Konstantin Dorodnicyn Computing Centre of RAS, Moscow
Rutkowski Leszek Czestochowa University of Technology, Poland
Samarin Anatoly A. B. Kogan Research Institute
for Neurocybernetics Southern Federal
University, Rostov-on-Don
Samsonovich Alexei George Mason University, USA
Sandamirskaya Yulia Institute of Neuroinformatics, UZH/ETHZ,
Switzerland
Shumskiy Sergey P. N. Lebedev Physical Institute of the Russian
Academy of Sciences, Moscow
Sirota Anton Ludwig Maximilian University of Munich,
Germany
Snasel Vaclav Technical University Ostrava, Czech Republic
Terekhov Serge JSC Svyaznoy Logistics, Moscow
Tikidji-Hamburyan Ruben Louisiana State University, USA
Tiumentsev Yury Moscow Aviation Institute
(National Research University)
Trofimov Alexander National Research Nuclear University (MEPhI),
Moscow
Tsodyks Misha Weizmann Institute of Science, Rehovot, Israel
Tsoy Yury Institut Pasteur Korea, Republic of Korea
Ushakov Vadim National Research Centre “Kurchatov Institute,”
Moscow
Velichkovsky Boris National Research Centre “Kurchatov Institute,”
Moscow
Vvedensky Viktor National Research Centre “Kurchatov Institute,”
Moscow

Yakhno Vladimir The Institute of Applied Physics of the Russian


Academy of Sciences, Nizhny Novgorod
Zhdanov Alexander Lebedev Institute of Precision Mechanics
and Computer Engineering, Russian Academy
of Sciences, Moscow
Contents

Invited Papers
Deep Learning a Single Photo Voxel Model Prediction from Real
and Synthetic Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Vladimir V. Kniaz, Peter V. Moshkantsev, and Vladimir A. Mizginov
Tensor Train Neural Networks in Retail Operations . . . . . . . . . . . . . . . 17
Serge A. Terekhov
Semi-empirical Neural Network Based Modeling and Identification
of Controlled Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Yury Tiumentsev and Mikhail Egorchev

Artificial Intelligence
Photovoltaic System Control Model on the Basis of a Modified
Fuzzy Neural Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Ekaterina A. Engel and Nikita E. Engel
Impact of Assistive Control on Operator Behavior Under High
Operational Load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Mikhail Kopeliovich, Evgeny Kozubenko, Mikhail Kashcheev,
Dmitry Shaposhnikov, and Mikhail Petrushan
Hierarchical Actor-Critic with Hindsight for Mobile Robot
with Continuous State Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Staroverov Aleksey and Aleksandr I. Panov
The Hybrid Intelligent Information System for Music Classification . . . 71
Aleksandr Stikharnyi, Alexey Orekhov, Ark Andreev,
and Yuriy Gapanyuk
The Hybrid Intelligent Information System for Poems Generation . . . . 78
Maria Taran, Georgiy Revunkov, and Yuriy Gapanyuk


Cognitive Sciences and Brain-Computer Interface, Adaptive Behavior
and Evolutionary Simulation
Is Information Density a Reliable Universal Predictor of Eye
Movement Patterns in Silent Reading? . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Valeriia A. Demareva and Yu. A. Edeleva
Bistable Perception of Ambiguous Images – Analytical Model . . . . . . . . 95
Evgeny Meilikov and Rimma Farzetdinova
Video-Computer Technology of Real Time Vehicle Driver
Fatigue Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Y. R. Muratov, M. B. Nikiforov, A. S. Tarasov, and A. M. Skachkov
Consistency Across Functional Connectivity Methods and Graph
Topological Properties in EEG Sensor Space . . . . . . . . . . . . . . . . . . . . . 116
Anton A. Pashkov and Ivan S. Dakhtin
Evolutionary Minimization of Spin Glass Energy . . . . . . . . . . . . . . . . . . 124
Vladimir G. Red’ko and Galina A. Beskhlebnova
Comparison of Two Models of a Transparent
Competitive Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Zarema B. Sokhova and Vladimir G. Red’ko
Spectral Parameters of Heart Rate Variability as Indicators
of the System Mismatch During Solving Moral Dilemmas . . . . . . . . . . . 138
I. M. Sozinova, K. R. Arutyunova, and Yu. I. Alexandrov
The Role of Brain Stem Structures in the Vegetative Reactions
Based on fMRI Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Vadim L. Ushakov, Vyacheslav A. Orlov, Yuri I. Kholodny,
Sergey I. Kartashov, Denis G. Malakhov, and Mikhail V. Kovalchuk
Ordering of Words by the Spoken Word Recognition Time . . . . . . . . . 151
Victor Vvedensky, Konstantin Gurtovoy, Mikhail Sokolov,
and Mikhail Matveev

Neurobiology and Neurobionics


A Novel Avoidance Test Setup: Device and Exemplary Tasks . . . . . . . . 159
Alexandra I. Bulava, Sergey V. Volkov, and Yuri I. Alexandrov
Direction Selectivity Model Based on Lagged
and Nonlagged Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Anton V. Chizhov, Elena G. Yakimova, and Elena Y. Smirnova
Wavelet and Recurrence Analysis of EEG Patterns of Subjects
with Panic Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Olga E. Dick

Two Delay-Coupled Neurons with a Relay Nonlinearity . . . . . . . . . . . . 181


Sergey D. Glyzin and Margarita M. Preobrazhenskaia
Brain Extracellular Matrix Impact on Neuronal Firing Reliability
and Spike-Timing Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Maiya A. Rozhnova, Victor B. Kazantsev, and Evgeniya V. Pankratova
Contribution of the Dorsal and Ventral Visual Streams
to the Control of Grasping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Irina A. Smirnitskaya

Deep Learning
The Simple Approach to Multi-label Image Classification Using
Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Yuriy S. Fedorenko
Application of Deep Neural Network for the Vision System
of Mobile Service Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Nikolay Filatov, Vladislav Vlasenko, Ivan Fomin,
and Aleksandr Bakhshiev
Research on Convolutional Neural Network for Object
Classification in Outdoor Video Surveillance System . . . . . . . . . . . . . . . 221
I. S. Fomin and A. V. Bakhshiev
Post-training Quantization of Deep Neural Network Weights . . . . . . . . 230
E. M. Khayrov, M. Yu. Malsagov, and I. M. Karandashev
Deep-Learning Approach for McIntosh-Based Classification
of Solar Active Regions Using HMI and MDI Images . . . . . . . . . . . . 239
Irina Knyazeva, Andrey Rybintsev, Timur Ohinko,
and Nikolay Makarenko
Deep Learning for ECG Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Viktor Moskalenko, Nikolai Zolotykh, and Grigory Osipov
Competitive Maximization of Neuronal Activity in Convolutional
Recurrent Spiking Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Dmitry Nekhaev and Vyacheslav Demin
A Method of Choosing a Pre-trained Convolutional Neural
Network for Transfer Learning in Image Classification Problems . . . . . 263
Alexander G. Trofimov and Anastasia A. Bogatyreva
The Usage of Grayscale or Color Images for Facial Expression
Recognition with Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 271
Dmitry A. Yudin, Alexandr V. Dolzhenko, and Ekaterina O. Kapustina

Applications of Neural Networks


Use of Wavelet Neural Networks to Solve Inverse Problems
in Spectroscopy of Multi-component Solutions . . . . . . . . . . . . . . . . . . . . 285
Alexander Efitorov, Sergey Dolenko, Tatiana Dolenko, Kirill Laptinskiy,
and Sergey Burikov
Automated Determination of Forest-Vegetation Characteristics
with the Use of a Neural Network of Deep Learning . . . . . . . . . . . . . . . 295
Daria A. Eroshenkova, Valeri I. Terekhov, Dmitry R. Khusnetdinov,
and Sergey I. Chumachenko
Depth Mapping Method Based on Stereo Pairs . . . . . . . . . . . . . . . . . . . 303
Vasiliy E. Gai, Igor V. Polyakov, and Olga V. Andreeva
Semantic Segmentation of Images Obtained by Remote Sensing
of the Earth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
Dmitry M. Igonin and Yury V. Tiumentsev
Diagnostics of Water-Ethanol Solutions by Raman Spectra
with Artificial Neural Networks: Methods to Improve Resilience
of the Solution to Distortions of Spectra . . . . . . . . . . . . . . . . . . . . . . . . . 319
Igor Isaev, Sergey Burikov, Tatiana Dolenko, Kirill Laptinskiy,
and Sergey Dolenko
Metaphorical Modeling of Resistor Elements . . . . . . . . . . . . . . . . . . . . . 326
Vladimir B. Kotov, Alexandr N. Palagushkin, and Fedor A. Yudkin
Semi-empirical Neural Network Models of Hypersonic Vehicle
3D-Motion Represented by Index 2 DAE . . . . . . . . . . . . . . . . . . . . . . . . 335
Dmitry S. Kozlov and Yury V. Tiumentsev
Style Transfer with Adaptation to the Central Objects of the Scene . . . 342
Alexey Schekalev and Victor Kitov
The Construction of the Approximate Solution of the Chemical
Reactor Problem Using the Feedforward Multilayer
Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Dmitriy A. Tarkhov and Alexander N. Vasilyev
Linear Prediction Algorithms for Lossless Audio Data Compression . . . 359
L. S. Telyatnikov and I. M. Karandashev

Neural Network Theory, Concepts and Architectures


Approach to Forecasting Behaviour of Dynamic System Beyond
Borders of Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
A. A. Brynza and M. O. Korlyakova

Towards Automatic Manipulation of Arbitrary Structures


in Connectivist Paradigm with Tensor Product Variable Binding . . . . . 375
Alexander V. Demidovskij
Astrocytes Organize Associative Memory . . . . . . . . . . . . . . . . . . . . . . . . 384
Susan Yu. Gordleeva, Yulia A. Lotareva, Mikhail I. Krivonosov,
Alexey A. Zaikin, Mikhail V. Ivanchenko, and Alexander N. Gorban
Team of Neural Networks to Detect the Type of Ignition . . . . . . . . . . . . 392
Alena Guseva and Galina Malykhina
Chaotic Spiking Neural Network Connectivity Configuration
Leading to Memory Mechanism Formation . . . . . . . . . . . . . . . . . . . . . . 398
Mikhail Kiselev
The Large-Scale Symmetry Learning Applying Pavlov Principle . . . . . . 405
Alexander E. Lebedev, Kseniya P. Solovyeva,
and Witali L. Dunin-Barkowski
Bimodal Coalitions and Neural Networks . . . . . . . . . . . . . . . . . . . . . . . 412
Leonid Litinskii and Inna Kaganowa
Building Neural Network Synapses Based on Binary Memristors . . . . . 420
Mikhail S. Tarkov

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427


Invited Papers
Deep Learning a Single Photo Voxel
Model Prediction from Real
and Synthetic Images

Vladimir V. Kniaz 1,2, Peter V. Moshkantsev 1,3, and Vladimir A. Mizginov 1

1 State Research Institute of Aviation Systems (GosNIIAS), Moscow, Russia
  {vl.kniaz,vl.mizginov}@gosniias.ru, [email protected]
2 Moscow Institute of Physics and Technology (MIPT), Moscow, Russia
3 Moscow Aviation Institute, Moscow, Russia

Abstract. Reconstruction of a 3D model from a single image is challenging.
Nevertheless, recent advances in deep learning methods demonstrated
exciting progress toward single-view 3D object reconstruction. However,
successful training of a deep learning model requires an extensive dataset
with pairs of geometrically aligned 3D models and color images. While
manual dataset collection using photogrammetry or laser scanning is
challenging, 3D modeling provides a promising method
for data generation. Still, a deep model should be able to generalize from
synthetic to real data. In this paper, we evaluate the impact of the syn-
thetic data in the dataset on the performance of the trained model. We
use a recently proposed Z-GAN model as a starting point for our research.
The Z-GAN model leverages generative adversarial training and a frustum
voxel model to provide the state-of-the-art results in the single-view voxel
model prediction. We generated a new dataset with 2k synthetic color
images and voxel models. We train the Z-GAN model on synthetic, real,
and mixed images. We compare the performance of the trained models
on real and synthetic images. We provide a qualitative and quantitative
evaluation in terms of the Intersection over Union between the ground
truth and predicted voxel models. The evaluation demonstrates that the
model trained only on the synthetic data fails to generalize to real color
images. Nevertheless, a combination of synthetic and real data improves
the performance of the trained model. We made our training dataset
publicly available (http://www.zefirus.org/SyntheticVoxels).

Keywords: Generative adversarial networks · Deep learning ·
Voxel model prediction · 3D object reconstruction

1 Introduction

Prediction of a 3D model from an image requires an estimation of the camera


pose related to the object and reconstruction of the object’s shape. While tradi-
tional multi-view stereo approaches [22,23,25] provide a robust solution for 3D
© Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 3–16, 2020.
https://doi.org/10.1007/978-3-030-30425-6_1

Fig. 1. Results of our image-to-voxel translation based on generative adversarial
network (GAN) and frustum voxel model. Input color image (left). Ground truth frustum
voxel model slices colored as a depth map (middle). The voxel model output (right).

reconstruction, prediction of a 3D model from a monocular camera is required
in such applications as mobile robotics, augmented reality for smartphones, and
reconstruction of lost cultural heritage [21]. Single image 3D reconstruction is
ambiguous. Firstly, a single image doesn’t provide enough data to estimate the
distance to the object’s surface. Secondly, back surfaces are not visible on a sin-
gle photo. Therefore, a priori knowledge about the object’s shape is required for
an accurate single-view reconstruction.
Recent advances of deep learning methods demonstrated impressive progress
in single-view 3D reconstruction [13,35,41,43]. Modern voxel model prediction
methods fall into two categories: object-centered and view-centered [35]. Object-
centered methods [13,41] predict the same voxel model for any camera pose
relative to an object. They aim to recognize object class in the input photo
and to predict its voxel model in the object-centered coordinate system. For
example, an object-centered method will generate the same voxel model for the
front facing car and the car captured from the rear side.
In contrast to object-centered methods, view-centered models provide differ-
ent outputs for different camera poses. They aim to generate a voxel model of
the object in the camera’s coordinate system. While a training dataset for an
object-centered method requires only a single voxel model for all images of a
single object class, each image in the training dataset for a view-centered app-
roach requires a geometrically aligned voxel model. Thus, generation of a view-
centered dataset is challenging. Nevertheless, view-centered methods generally
outperform object-centered methods [24,35].
A research project has been recently started by the authors with the aim
of developing a low-cost driver assistance system with a monocular camera. An
efficient training dataset generation technique is required to train a single-view
3D reconstruction model successfully. The technique should provide means for
modeling various traffic and weather conditions.
Recently a new kind of a view-centered 3D object representation was pro-
posed [24]. It is commonly called a frustum voxel model (fruxel model). Unlike
ordinary voxel models with cubic elements, fruxel models have trapezium-shaped

elements that represent slices of the camera’s frustum. Each fruxel is aligned with
the pixel of the input color image (see Fig. 1). Fruxel models facilitate robust
training of a view-centered model as the contour alignment between the input
image and the fruxel model is preserved.
To the best of our knowledge, there are no results in the literature regarding
view-centered voxel model dataset generation using synthetic images and 3D
modeling. In this paper, we explore the impact of the synthetic data in the
performance of a view-centered model. We use a recently proposed generative
adversarial model Z-GAN [24] as a starting point for our research. We prepared
an extensive SyntheticVoxels dataset with 2k synthetic images of three object
classes and corresponding ground truth fruxel models. We made our dataset
publicly available. We compare the performance of the Z-GAN model trained on
real, synthetic, and mixed data.
The results of joint training on the synthetic and real data are encouraging
and show that synthetic data allows the model to generalize to previously unseen
objects. The developed view-centered dataset generation technique allows mod-
eling challenging 3D object configurations and traffic situations that can not be
reconstructed online using laser scanning or similar approaches.

2 Related Work
Generative Adversarial Networks. Development of a new type of neural
networks known as Generative Adversarial Networks (GANs) [14] made it possi-
ble to provide a mapping from a random noise vector to a domain of the desired
outputs (e.g., images, voxel models). GANs have received a lot of scholar atten-
tion in recent years. These networks provide inspiring results in such tasks as
image-to-image translation [20] and the voxel model generation [42].

Single-Photo 3D Model Reconstruction. Accurate 3D reconstruction is
challenging if only a single color image is used as an input. This problem has been
studied intensively [10,31,32]. Recently, some authors proposed new
methods that leverage deep learning [7,13,19,33,35,39,41,42,45]. Although some
methods were proposed for prediction of unobserved voxels from a single depth
map [12,37,46–48], prediction of the voxel model of a complex scene from a
single color (RGB) image is more ambiguous. The 3D shape of an object should
be known for the accurate performance of the method. Therefore, the solution of
the problem occurs in two steps: object recognition and 3D shape reconstruction.
In [13] a deep learning method for a single image voxel model reconstruction
was proposed. The method leverages an auto-encoder architecture for a voxel
model prediction. The method showed encouraging results, but the resolution of
the model was only 20 × 20 × 20 elements. A combined method for 3D model
reconstruction was proposed in [7]. In [33] a new voxel decoder architecture
was proposed that uses voxel tube and shape layers to increase the resulting
voxel model resolution. A comparison of surface-based and volumetric 3D model
prediction is performed in [35].

Methods that leverage a latent space for 3D shape synthesis were developed
recently [5,13,42]. Wu et al. have proposed a GAN model [42] for a voxel model
generation (3D-GAN). This made it possible to predict models with a resolution
64 × 64 × 64 elements from a randomly sampled noise vector. The developed
method was used for a single-image 3D reconstruction using an approach pro-
posed in [13]. Despite the fact that 3D-GAN increased the number of elements in
the model compared to [13], the generalization ability of this method was low,
especially for previously unseen objects.

3D Shape Datasets. Several 3D shape datasets were designed [6,27,38,44]


for deep learning. Semantic segmentation was made for the Pascal VOC dataset
[11] to align a set of CAD models with color photos. The extended dataset was
named Pascal 3D+ [44]. However, the models trained with this dataset showed
a rough match between a 3D model and a photo. ShapeNet dataset [6] was
used to solve the problem of 3D shape recognition and prediction. However, the
ShapeNet provides only synthetic images and the exact reconstruction of the
model using single image is possible only with synthetic data. Hinterstoisser et
al. have generated a large Linemod dataset [15] with aligned RGB-D data. The
Linemod dataset was intensively used for training 6D pose estimation algorithms
[1–4,8,17,18,26,28,30,36,40]. In [16] a large dataset for 6D pose estimation of
texture-less objects was developed. An MVTec ITODD dataset [9] addresses the
challenging problem of 6D pose prediction in industrial application.

3 Method
The aim of the present research is to compare the performance of a single photo
voxel model prediction method trained on synthetic, real and mixed data. In
our research we use a generative adversarial network Z-GAN [24] that performs
color image-to-voxel model translation. Z-GAN model uses a special kind of voxel
model in which the voxel model is aligned with an input image.
While a depth map that present distances only to the object surface from
a given viewpoint, the voxel model includes information about the entire 3D
scene. The proposed frustum voxel models combines features of a depth map
and a voxel model. We use a hypothesis made by [41] as the starting point
for our research. To provide the aligned voxel model, we combine depth map
representation with a voxel grid. We term the resulting 3D model as a Frustum
Voxel model (Fruxel model).

3.1 Frustum Voxel Model


The main idea of the fruxel model is to provide a precise alignment of voxel slices
with contours of a color image. Such alignment can be achieved with a common
voxel model if the camera has an orthographic projection and its optical axis
coincides with the Z-axis of the voxel model. As the camera frustum no longer
corresponds to cubic voxel elements, we use sections of a pyramid instead.

Fruxel model representation provides multiple advantages. Firstly, each XY


slice of the model is aligned with some contours on a corresponding color photo
(some parts of them can be invisible). Secondly, a fruxel model encodes a shape
of both visible and invisible surfaces. Hence, unlike the depth map, it contains
complete information about the 3D shapes. In other words, the fruxel model
imitates perspective space. It is important to note that all slices of the fruxel
model have the same number of fruxel elements (e.g., 128 × 128 × 1).
A fruxel model is characterized by the following set of parameters: {z_n, z_f, d, α},
where z_n is the distance to the near clipping plane, z_f is the distance to the far
clipping plane, d is the number of frustum slices, and α is the camera field of view.
The fruxel model is a special kind of voxel model optimized for the training of
conditional adversarial networks. At the same time, a fruxel model can be converted
into three common data types: (1) a voxel model, (2) a depth map, (3) an object annotation.
A voxel model can be generated from the fruxel model by scaling each consecutive
layer slice by the coefficient k defined as

    k = \frac{z_n}{z_n + s_z},    (1)

where s_z = (z_f - z_n)/d is the size of the fruxel element along the Z-axis.
To generate a depth map P from the fruxel model, we multiply the indices of the
frontmost non-empty elements by the step s_z:

    P(x, y) = \operatorname{argmin}_{i}\,[F(x, y, i) = 1] \cdot s_z + z_n,    (2)

where P(x, y) is an element of the depth map and F(x, y, i) is an element of the fruxel
model at slice i with coordinates (x, y).
An object annotation is equal to the product of all elements with given (x, y)
coordinates:

    A(x, y) = \prod_{i=0}^{d} F(x, y, i).    (3)
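These conversions are straightforward to implement. The following is a minimal NumPy sketch added here for illustration (not the authors' code), assuming the fruxel model F is stored as a binary array of shape (w, h, d) with slice index 0 at the near clipping plane; all function names are hypothetical.

```python
import numpy as np

def slice_scale_coefficient(z_near, s_z):
    # Eq. (1): coefficient k used to rescale consecutive fruxel slices
    # when converting the frustum-shaped grid back to a cubic voxel grid
    return z_near / (z_near + s_z)

def fruxel_to_depth(F, z_near, z_far):
    # Eq. (2): depth of the frontmost occupied slice in every (x, y) column
    w, h, d = F.shape
    s_z = (z_far - z_near) / d                 # fruxel size along the Z-axis
    occupied = F.any(axis=2)
    first = np.argmax(F, axis=2)               # index of the first non-empty slice
    depth = first * s_z + z_near
    return np.where(occupied, depth, np.inf)   # empty columns carry no depth

def fruxel_to_annotation(F):
    # Eq. (3): product of all elements along the Z-axis for every (x, y)
    return F.prod(axis=2)
```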

3.2 Conditional Adversarial Networks

Generative adversarial networks generate a signal B̂ for a given random noise
vector z, G : z → B̂ [14,20]. A conditional GAN transforms an input image
A and the vector z to an output B̂, G : {A, z} → B̂. The input A can be
an image that is transformed by the generator network G. The discriminator
network D is trained to distinguish “real” signals from the target domain B from the
“fakes” B̂ produced by the generator. Both networks are trained simultaneously.
The discriminator provides the adversarial loss that forces the generator to produce
“fakes” B̂ that cannot be distinguished from the “real” signal B.
We train a generator G : {A} → B̂ to synthesize a fruxel model B̂ ∈ R^{w×h×d}
conditioned by a color image A ∈ R^{w×h×3}.
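For reference (this equation is added here for clarity and is not printed in the original text), the standard conditional adversarial objective of the pix2pix framework [20], on which Z-GAN builds, can be written as

    \mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{A,B}\,[\log D(A, B)] + \mathbb{E}_{A}\,[\log(1 - D(A, G(A)))],

where G tries to minimize this objective against an adversarial D that tries to maximize it; pix2pix additionally adds an L1 reconstruction term \lambda\,\mathbb{E}_{A,B}[\lVert B - G(A)\rVert_1].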

[Figure 2 diagram: a 2D convolutional encoder with 4 × 4 kernels maps the 256 × 256 × 3
input down to a 1 × 1 × 512 bottleneck; a 3D deconvolutional decoder expands it to a
128 × 128 × 128 fruxel output; “copy inflate” skip connections link encoder and decoder
layers of the same depth.]

Fig. 2. The architecture of the generator.

3.3 Z-GAN Framework

We use the pix2pix [20] framework as a base to develop our Z-GAN model. We keep
the encoder part of the generator unchanged. We replace the 2D deconvolution layers
with 3D deconvolution layers to encode the correlation between neighboring slices
along the Z-axis.
We keep the skip connections between the layers of the same depth that
were proposed in the U-Net model [34]. We believe that skip connections help
to transfer high-frequency components of the input image to the high-frequency
components of the 3D shape.

3.4 Z-GAN Model

The main idea of our volumetric generator G is to use the correspondence


between silhouettes in a color image and slices of a fruxel model. The original
U-Net generator leverages skip connections between convolutional and deconvo-
lutional layers of the same depth to transfer fine details from the source to the
target domain effectively.
We made two contributions to the original U-Net model. Firstly, we replaced
the 2D deconvolutional filters with 3D deconvolutional filters. Secondly, we mod-
ified the skip connections to provide the correspondence between shapes of 2D
and 3D features. The outputs of the 2D convolutional filters in the left (encoder)
side of our generator are F_2D ∈ R^{w×h×c} tensors, where w and h are the width and
height of a feature map and c is the number of channels. The outputs of the 3D
deconvolutional filters in the right (decoder) side are F_3D ∈ R^{w×h×d×c} tensors.

Fig. 3. Synthetic dataset generation technique: (a) virtual camera, (b) slice of
fruxel model, (c) cutting plane, (d) low-poly 3D model, (e) synthetic color image.

We use d copies of each channel of F_2D to fill the third dimension of F_3D. We
term this operation “copy inflate”. The architecture of the generator is presented
in Fig. 2.
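As an illustration of this operation (a hedged sketch added here, not the released Z-GAN code), the “copy inflate” skip connection can be expressed in PyTorch as follows, assuming the usual (N, C, H, W) layout for 2D encoder features and (N, C, D, H, W) for 3D decoder features:

```python
import torch

def copy_inflate(f2d: torch.Tensor, depth: int) -> torch.Tensor:
    # repeat a 2D feature map d times along a new depth axis:
    # (N, C, H, W) -> (N, C, 1, H, W) -> (N, C, depth, H, W)
    return f2d.unsqueeze(2).expand(-1, -1, depth, -1, -1)

def skip_connect(f2d: torch.Tensor, f3d: torch.Tensor) -> torch.Tensor:
    # concatenate the inflated encoder features with the decoder features
    # along the channel dimension, as in U-Net-style skip connections
    inflated = copy_inflate(f2d, f3d.shape[2])
    return torch.cat([inflated, f3d], dim=1)
```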

3.5 Synthetic Dataset Generation Technique


We developed a synthetic dataset generation technique to create our Synthet-
icVoxels dataset (see Fig. 3). We use low poly 3D models of objects to render
both realistic synthetic images and generate frustum voxel models. We use 360◦
panoramic textures to provide a variety of realistic backgrounds. For each object,
we sample random points on a hemisphere around the object and use them as
virtual camera (a) locations. For each frame, we point the camera’s optical axis
at the object and select a random background texture. We randomly select the
color of the object’s texture for each frame.
When camera locations and background textures are prepared for all frames,
we perform dataset generation in two steps. Firstly, we render a synthetic color image.
Secondly, we move a cutting plane object (c) normal to the camera optical axis
from the distance z_n to the distance z_f of the target fruxel model with the step
s_z. Therefore, for each synthetic color image, we render d slices of the fruxel
model. We use a Boolean intersection between the cutting plane and the low-
poly 3D model to get all slices (b) of the fruxel model. Such an approach allows
us to keep contours in color images (e) and slices (b) geometrically aligned. We
stack all d slices along the camera’s optical axis to obtain the resulting fruxel
model with dimensions w × h × d.
Fig. 4. Examples of color images and corresponding fruxel models (off-road vehicle
class) from our SyntheticVoxels dataset. Fruxel models are presented as depth maps in
pseudo-colors.

We generate our dataset using the Blender 3D creation suite. We automate
background and object color randomization, the camera movement, and the
cutting plane movement using the Blender Python API. We use an additional
ground plane to provide realistic object shadows. We render the plane with
shadows separately and use alpha-compositing to obtain the final synthetic image.
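The camera placement part of this automation might look roughly as follows in the Blender Python API (an illustrative sketch under the assumption that the scene contains objects named "Camera" and "Car"; background randomization and the cutting-plane sweep are omitted):

```python
import math
import random

import bpy
from mathutils import Vector

cam = bpy.data.objects["Camera"]      # hypothetical object names
target = bpy.data.objects["Car"]

def sample_hemisphere(radius):
    # random point on a hemisphere above the ground plane
    theta = random.uniform(0.0, 2.0 * math.pi)        # azimuth
    phi = random.uniform(0.05, 0.5 * math.pi)         # elevation
    return Vector((radius * math.cos(theta) * math.cos(phi),
                   radius * math.sin(theta) * math.cos(phi),
                   radius * math.sin(phi)))

for frame in range(2000):
    cam.location = target.location + sample_hemisphere(radius=8.0)
    # point the camera's optical axis (its -Z direction) at the object
    direction = target.location - cam.location
    cam.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    bpy.context.scene.render.filepath = f"//renders/color_{frame:05d}.png"
    bpy.ops.render.render(write_still=True)
```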
SyntheticVoxels Dataset. Examples of synthetic images with ground truth
fruxel models from our SyntheticVoxels dataset are presented in Figs. 4 and 5.
The dataset includes images and fruxel models for four object classes: car, truck,
off-road vehicle, and van.

4 Experiments
4.1 Network Training
Our Z-GAN framework was trained on the VoxelCity [24] and SyntheticVoxels
datasets using the PyTorch library [29]. We use independent test splits of the Synthet-
icVoxels and VoxelCity datasets for evaluation, with fruxel model parameters
{z_n = 2, z_f = 12, d = 128, α = 40°}. The training was performed on an
NVIDIA 1080 Ti GPU and took 20 hours for the whole framework. For network
optimization, we use minibatch stochastic gradient descent with an Adam
solver. We set the learning rate to 0.0002 with momentum parameters β_1 = 0.5,
β_2 = 0.999, similar to [20].
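In PyTorch, these optimizer settings correspond to the following minimal sketch (the stand-in modules below only make the snippet self-contained; the real generator and discriminator are described in Sect. 3.4, and the adversarial losses are omitted):

```python
import torch
import torch.nn as nn

# stand-in modules, not the actual Z-GAN networks
generator = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)
discriminator = nn.Conv2d(3, 1, kernel_size=4, stride=2, padding=1)

# optimizer settings reported in Sect. 4.1
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
```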

4.2 Qualitative Evaluation


We show results of single-view voxel model generation in Figs. 6 and 7. We use
three object classes: car, off-road vehicle, and van. The Z-GAN model trained
Fig. 5. Examples of color images and corresponding fruxel models (car and van classes)
from our SyntheticVoxels dataset. Fruxel models are presented as depth maps in
pseudo-colors.

only on synthetic data fails to generalize to real images. Nevertheless, it successfully
predicts realistic fruxel models for the synthetic input. The real data from the
VoxelCity dataset [24] contains images of only nine models of cars. Therefore,
the Z-GAN model trained only on real data fails to predict fruxel models for cars
with a new 3D shape or color. The Z-GAN model trained on the union of real and
synthetic data produces voxel models of complex objects with fine details.

Fig. 6. Qualitative evaluation on synthetic images from the SyntheticVoxels dataset
(columns: input, ground truth, and Z-GAN trained on real, synthetic, and real + synthetic
data). Fruxel models are presented as depth maps in pseudo-colors.

Fig. 7. Qualitative evaluation on real images from the VoxelCity dataset (columns:
input, ground truth, and Z-GAN trained on real, synthetic, and real + synthetic data).
Fruxel models are presented as depth maps in pseudo-colors.

4.3 Quantitative Evaluation

We present results of the quantitative evaluation in terms of Intersection over
Union (IoU) in Table 1. The Z-GAN model predicts the probability p of each element
of the fruxel model being occupied by an object. We use a threshold p > 0.99 to
compare a predicted fruxel model with the ground truth model. The Z-GAN model
trained on synthetic and real data provides the best IoU for all object classes
except the van class. Most of the images for the van class in our SyntheticVoxels
dataset do not provide backgrounds similar to the van in the VoxelCity dataset.
We believe that this is the reason for the slightly lower performance on the van
class. Nevertheless, the Z-GAN model trained on synthetic and real data provides
the highest mean IoU.

Table 1. IoU metric for different object classes for the Z-GAN model trained on real,
synthetic, and mixed data.

Method                  car    van    off-road vehicle   mean
Z-GAN synthetic         0.06   0.15   0.07               0.34
Z-GAN real              0.71   0.84   0.53               0.73
Z-GAN real + synthetic  0.76   0.79   0.79               0.78
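For clarity, the evaluation procedure described above (binarization at p > 0.99 followed by IoU) can be sketched in a few lines of NumPy; this is an illustrative snippet, not the authors' evaluation code:

```python
import numpy as np

def fruxel_iou(pred_prob, gt, threshold=0.99):
    # binarize predicted occupancy probabilities and compare with ground truth
    pred = np.asarray(pred_prob) > threshold
    gt = np.asarray(gt).astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 1.0
```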

5 Conclusions

We demonstrated that augmentation of the dataset with synthetic data
improves the performance of the image-to-frustum voxel model translation method.
While methods trained on purely synthetic data fail to generalize to real images,
joint training on synthetic and real images allows our model to achieve higher
IoU and to generalize to previously unseen objects. Our main observation is
that the variety of background textures aids the model’s generalization ability.
In our experiments, we use the Z-GAN generative adversarial network. To train
the Z-GAN model, we generated a new SyntheticVoxels dataset with 2k synthetic
images of three object classes and view-centered frustum voxel models. We devel-
oped a technique for the automatic generation of a view-centered dataset using
low-poly 3D models and 360◦ panoramic background textures. Our technique
and dataset can be used to train single-view 3D reconstruction models. The
Z-GAN model trained on our SyntheticVoxels dataset achieves state-of-the-art
results in single photo voxel model prediction.

Acknowledgments. The reported study was funded by Russian Foundation for Basic
Research (RFBR) according to the project No 17-29-04410, and by the Russian Science
Foundation (RSF) according to the research project No 19-11-11008.

References
1. Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.:
Pose guided RGBD feature learning for 3d object pose estimation. In: IEEE Inter-
national Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October
2017, pp. 3876–3884 (2017). https://doi.org/10.1109/ICCV.2017.416
2. Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.K.:
Pose guided RGBD feature learning for 3D object pose estimation. In: The IEEE
International Conference on Computer Vision (ICCV) (2017)
3. Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S.,
Rother, C.: DSAC - differentiable RANSAC for camera localization. In: The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
4. Brachmann, E., Rother, C.: Learning less is more - 6d camera localization via
3d surface regression. In: The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (2018)
5. Brock, A., Lim, T., Ritchie, J., Weston, N.: Generative and discriminative voxel
modeling with convolutional neural networks, pp. 1–9 (2016). https://nips.cc/
Conferences/2016. Workshop contribution; Neural Information Processing Con-
ference : 3D Deep Learning, NIPS, 05–12 Dec 2016
6. Chang, A.X., Funkhouser, T.A., Guibas, L.J., Hanrahan, P., Huang, Q.X., Li, Z.,
Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., Yu, F.: Shapenet: an
information-rich 3d model repository (2015). CoRR arXiv:abs/1512.03012
7. Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3d-r2n2: a unified approach
for single and multi-view 3d object reconstruction. In: Proceedings of the European
Conference on Computer Vision (ECCV) (2016)
8. Doumanoglou, A., Kouskouridas, R., Malassiotis, S., Kim, T.: Recovering 6d object
pose and predicting next-best-view in the crowd. In: 2016 IEEE Conference on
Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA,
27–30 June 2016, pp. 3583–3592 (2016). https://doi.org/10.1109/CVPR.2016.390
9. Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., Steger, C.: Introducing mvtec
itodd - a dataset for 3d object recognition in industry. In: The IEEE International
Conference on Computer Vision (ICCV) Workshops (2017)
10. El-Hakim, S.: A flexible approach to 3d reconstruction from single images. In: ACM
SIGGRAPH, vol. 1, pp. 12–17 (2001)
11. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The
pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338
(2009)
12. Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of
unobserved voxels from a single depth image. In: The IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR) (2016)
13. Girdhar, R., Fouhey, D.F., Rodriguez, M., Gupta, A.: Learning a predictable and
generative vector representation for objects, chap. 34, pp. 702–722. Springer, Cham
(2016)
14. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural
Information Processing Systems, pp. 2672–2680 (2014)
15. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab,
N.: Model based training, detection and pose estimation of texture-less 3d objects
in heavily cluttered scenes. In: Asian Conference on Computer Vision, pp. 548–562.
Springer, Heidelberg (2012)

16. Hodaň, T., Haluza, P., Obdržálek, Š., Matas, J., Lourakis, M., Zabulis, X.: T-
LESS: an RGB-D dataset for 6d pose estimation of texture-less objects. In: IEEE
Winter Conference on Applications of Computer Vision (WACV) (2017)
17. Hodan, T., Haluza, P., Obdrzálek, S., Matas, J., Lourakis, M.I.A., Zabulis, X.:
T-LESS: an RGB-D dataset for 6d pose estimation of texture-less objects. In: 2017
IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa
Rosa, CA, USA, 24–31 March 2017, pp. 880–888 (2017). https://doi.org/10.1109/
WACV.2017.103
18. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6d object pose estimation.
In: European Conference on Computer Vision Workshops (ECCVW) (2016)
19. Huang, Q., Wang, H., Koltun, V.: Single-view reconstruction via joint analysis of
image and shape collections. ACM Trans. Graph. 34(4), 87:1–87:10 (2015)
20. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with con-
ditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 5967–5976. IEEE (2017)
21. Kniaz, V.V., Remondino, F., Knyaz, V.A.: Generative adversarial networks
for single photo 3d reconstruction. ISPRS - International Archives of the
Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-
2/W9, 403–408 (2019). https://doi.org/10.5194/isprs-archives-XLII-2-W9-403-
2019. https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-2-
W9/403/2019/
22. Knyaz, V.: Deep learning performance for digital terrain model generation. In:
Proceedings SPIE Image and Signal Processing for Remote Sensing XXIV, vol.
10789, p. 107890X (2018). https://doi.org/10.1117/12.2325768
23. Knyaz, V.A., Chibunichev, A.G.: Photogrammetric techniques for road surface
analysis. ISPRS - Int. Arch. Photogram. Remote Sens. Spatial Inf. Sci. XLI(B5),
515–520 (2016)
24. Knyaz, V.A., Kniaz, V.V., Remondino, F.: Image-to-voxel model translation with
conditional adversarial networks. In: Leal-Taixé, L., Roth, S. (eds.) Computer
Vision - ECCV 2018 Workshops, pp. 601–618. Springer, Cham (2019)
25. Knyaz, V.A., Zheltov, S.Y.: Accuracy evaluation of structure from motion surface
3D reconstruction. In: Proceedings SPIE Videometrics, Range Imaging, and Appli-
cations XIV, vol. 10332, p. 103320 (2017). https://doi.org/10.1117/12.2272021
26. Krull, A., Brachmann, E., Nowozin, S., Michel, F., Shotton, J., Rother, C.:
Poseagent: budget-constrained 6d object pose estimation via reinforcement learn-
ing. In: The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (2017)
27. Lim, J.J., Pirsiavash, H., Torralba, A.: Parsing IKEA objects: fine pose estimation.
In: Proceedings of the IEEE International Conference on Computer Vision ICCV
(2013)
28. Ma, M., Marturi, N., Li, Y., Leonardis, A., Stolkin, R.: Region-sequence based
six-stream CNN features for general and fine-grained human action recognition in
videos. Pattern Recogn. 76, 506–521 (2017)
29. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z.,
Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
30. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method
for predicting the 3d poses of challenging objects without using depth. In: IEEE
International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29
October 2017, pp. 3848–3856 (2017). https://doi.org/10.1109/ICCV.2017.413
31. Remondino, F., El-Hakim, S.: Image-based 3D modelling: a review. Photogram.
Rec. 21(115), 269–291 (2006)

32. Remondino, F., Roditakis, A.: Human figure reconstruction and modeling from
single image or monocular video sequence. In: Fourth International Conference on
3-D Digital Imaging and Modeling, 2003 (3DIM 2003), pp. 116–123. IEEE (2003)
33. Richter, S.R., Roth, S.: Matryoshka networks: predicting 3D geometry via nested
shape layers. arXiv.org (2018)
34. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedi-
cal image segmentation. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
35. Shin, D., Fowlkes, C., Hoiem, D.: Pixels, voxels, and views: a study of shape rep-
resentations for single view 3d object shape prediction. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) (2018)
36. Sock, J., Kim, K.I., Sahin, C., Kim, T.K.: Multi-task deep networks for depth-based
6D object pose and joint registration in crowd scenarios. arXiv.org (2018)
37. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene
completion from a single depth image. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (2017)
38. Sun, X., Wu, J., Zhang, X., Zhang, Z., Zhang, C., Xue, T., Tenenbaum, J.B.,
Freeman, W.T.: Pix3d: dataset and methods for single-image 3d shape modeling.
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
39. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Multi-view 3D Models from single
images with a convolutional network. arXiv.org (2015)
40. Tejani, A., Kouskouridas, R., Doumanoglou, A., Tang, D., Kim, T.: Latent-class
hough forests for 6 DoF object pose estimation. IEEE Trans. Pattern Anal. Mach.
Intell. 40(1), 119–132 (2018). https://doi.org/10.1109/TPAMI.2017.2665623
41. Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, W.T., Tenenbaum, J.B.: MarrNet:
3D shape reconstruction via 2.5D sketches. arXiv.org (2017)
42. Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J.: Learning a probabilistic
latent space of object shapes via 3D generative-adversarial modeling. In: Advances
in Neural Information Processing Systems, pp. 82–90 (2016)
43. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D ShapeNets: a
deep representation for volumetric shapes. In: 2013 IEEE Conference on Computer
Vision and Pattern Recognition, Princeton University, Princeton, United States,
pp. 1912–1920. IEEE (2015)
44. Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3d object
detection in the wild. In: IEEE Winter Conference on Applications of Computer
Vision (WACV) (2014)
45. Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learn-
ing single-view 3d object reconstruction without 3d supervision. papers.nips.cc
(2016)
46. Yang, B., Rosa, S., Markham, A., Trigoni, N., Wen, H.: 3D object dense recon-
struction from a single depth view. arXiv preprint arXiv:1802.00411 (2018)
47. Yang, B., Wen, H., Wang, S., Clark, R., Markham, A., Trigoni, N.: 3D object
reconstruction from a single depth view with adversarial learning. In: The IEEE
International Conference on Computer Vision (ICCV) Workshops (2017)
48. Zheng, B., Zhao, Y., Yu, J.C., Ikeuchi, K., Zhu, S.C.: Beyond point clouds: scene
understanding by reasoning geometry and physics. In: The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) (2013)
Tensor Train Neural Networks in Retail
Operations

Serge A. Terekhov

SC Svyaznoy, Moscow, Russian Federation
[email protected]

Abstract. A neural network generalization of the Tensor Train decomposition
for multidimensional datasets of censored Poisson counts is presented.
The model is successfully applied to two important classes of
retail operations: the sales process under controlled stock distribution
over the retail network, and the optimization of active retailer decisions,
such as pricing policy, marketing actions, and discounts. The advantage
of the proposed Tensor Train Neural Network model is in its ability to capture
non-linear relations between similar retail stores and similar consumer
goods, as well as to jointly estimate the sales potential of commodities
with a wide dynamic range of popularity.

Keywords: Tensor Train Neural Network · Statistical estimation ·
Poisson counts process · Censored samples · Context bandits ·
Retail operations

1 Introduction
Stable statistical estimation of performance indicators and activity responses is
critical for control tasks in modern retail operations. Corporate data is not only
very noisy because of the intrinsic stochasticity of market processes, but observations
are also subject to truncation and censoring, e.g., due to endogenous control
decisions and stock availability. Decision makers need truthful, disturbance-
free measures both for the operations (such as sales) under the current conditions
and for the potential value of business in some new or alternative contexts.
In the case of retail network operations, a living example is the estimation of
prospective sales of a certain commodity in alternative stores, where it has not been
offered before.
This paper addresses two important classes of retail operations: the process
of sales over the retail network, and the optimization of active decisions, such
as pricing policy, marketing actions, and discounts. Optimal sales require the
control of stock distribution, while actions need valuable mix of their parameters.
The resulting value of operation depends on several context dimensions,
which will be treated as exogenous factors. For example, the estimated intensity
of buyers flow is identified at certain location (retail store), for particular item
to be sold, and at certain period of time. These discrete context variables are

considered as dimensions of multi-dimensional tables, or modes of a tensor [1]. The number of tensor modes is usually limited, but the size of the discrete dictionary along each mode is rather high. The entire count of data cells can quickly grow to several billion and more. The data is sparse and self-similar, so it can be modelled using low-rank tensor decompositions [1–4]. Among them, the Tensor Train decomposition [4] is in the focus of our research, since this formulation allows us to introduce useful generalizations.
As a complement to the standard Tensor Train formulation [2,4], this paper presents the following extensions:
(1) The sparse data tensor contains realizations of random event counts rather than well-defined numerical values. Each value is observed for its own combination of indices of the context tensor modes;
(2) Observations follow right-censored Poisson distributions, with individual rates β and upper censoring bounds. The censoring indicators are reported together with the counts truncated at their upper bounds;
(3) The basic Tensor Train matrix multiplications are generalized to layered neural operations. For each modelled tensor cell, the corresponding set of neural layers is applied layer by layer instead of chaining linear multiplications matrix by matrix. The elements of the original Tensor Train matrices (i.e. reduced 3-mode tensors) serve as adjustable synaptic weights for the neural units.
The neural extension of the Tensor Train model, called the Tensor Train Neural Network (TTNN), was proposed in the author's lectures [6–8]. In this paper the TTNN is treated as a non-parametric variational probe function that estimates the de-censored Poisson distribution parameter β. The TTNN is a specific neural network architecture comprising a large set of small neural layers dynamically combined for each set of tensor indices. This approach avoids training one huge neural network with a sparse representation of the input context variables. On the other hand, it preserves the full richness of nonlinear neural approximations, not available e.g. in recent linear variable grouping methods [21].
The idea of representing a regular deep neural network layer by a Tensor Train decomposition was proposed earlier [5]. Here, in contrast, the neural architecture is ab initio designed as a Tensor Train set of small elements, with no assumption about any underlying large neural model.
In the described retail operations the TTNN model provides a statistical estimate of Poisson event rates from passively observed censored counts. The application of the TTNN model to reward estimation for actively collected samples in the context bandit formulation is also discussed.

2 Formulation
Consider the basic operation of daily sales of a commodities portfolio in large retail network stores over an extended period. The collected data is counts of items sold for each commodity, at each store, and for every business day. Each store sells just a fraction of the whole list of available commodity types, limited both physically

by the store capacity and by the marketing plan. The practical problem is to estimate the intensity of sales of the total set of portfolio items over the whole set of stores.
Daily sales in a particular context are usually limited by the availability of stock, thus some of the observed counts are censored by truncation. The probability to observe a ≥ 0 counts at the end of a time period t, given the initial available stock r > 0, is defined by the truncated Poisson distribution:

P(a | r, β, t) = ((β·t)^a / a!) · exp(−β·t),   a = 0, 1, . . . , r − 1,

P(a = r | r, β, t) = 1 − Σ_{k=0}^{r−1} P(k | r, β, t),

where β is the intensity of the Poisson flow. This distribution can be easily derived from the general queueing birth-death process [16].
The observations can be represented as a tensor with d = 3 modes for the time periods, the set of stores, and the portfolio items. To simplify the notation, let us introduce the tensor enumeration index s = (i1, i2, . . . , id). The set of index combinations of all available observations in the data set A = {A(s)} is denoted as S. Then each observation is independently generated from the distribution with its individual unknown parameter β = β(s).

Fig. 1. Schematic representation of Tensor Train Neural Network assembly. For each
set of tensor indices (filled), the linked chain of corresponding neural layers is composed
to compute the output β̂.

Obtaining stable estimates β̂ of the complete tensor from sparse, noisy and censored data is a challenging task [1], in some sense similar to the tensor design of recommender systems [9]. We consider the variational approach, in which β̂ is approximated by a member of a non-parametric low-rank Tensor Train family, where all matrix elements are treated as free variational parameters. In the classical formulation [2]:
β̂(i1, . . . , id) ≈ Σ_{j1,...,jd−1} g(i1, j1) · G(j1, i2, j2) · . . . · G(jd−2, id−1, jd−1) · g(jd−1, id)

with a log statistical link function for the unrestricted model, or an identity link with the restriction of non-negativity of the matrix elements. The algebraic multiplications h(j) = g(i1, :) · G(:, i2, j) are a special case of the more general neural layer functions h(i2, j) = Fj(h(i1, :) · G(:, i2, j)), where h(i1, :) = g(i1, :) and F is a vector of sigmoids. Neural transformations are applied mode by mode, with a single neuron for the last mode id. The resulting model is called the Tensor Train Neural Network [6–8]. Conceptually, the diagram of TTNN functioning is shown in Fig. 1.
Instead of using a single large neural network, the TTNN model comprises many tiny neural networks, with the number of units defined by the tensor decomposition rank. To estimate every element β̂(i1, i2, . . . , id), the chain of neural layers with indices (i1, i2, . . . , id) is dynamically composed. The gradient of the modelled function is computed via the standard backpropagation chain rule.
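A minimal sketch of this per-cell assembly (NumPy; the common rank R for all modes and the softplus output used to keep the estimate positive are our assumptions):

    import numpy as np

    def ttnn_forward(idx, g_first, cores, g_last):
        """Compose the chain of small neural layers selected by the index tuple idx.
        Shapes: g_first is (n_1, R), cores[k] is (R, n_{k+2}, R), g_last is (R, n_d).
        Sigmoid layers replace the plain matrix products of the classical TT model."""
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        h = g_first[idx[0], :]                    # layer for the first mode, length R
        for k, G in enumerate(cores):             # middle modes i_2 .. i_{d-1}
            h = sigmoid(h @ G[:, idx[k + 1], :])  # one tiny neural layer per mode
        z = float(h @ g_last[:, idx[-1]])         # single output unit for the last mode
        return np.log1p(np.exp(z))                # softplus keeps beta_hat positive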

3 Estimation
The pattern of observed data samples follows some stable distribution as defined
by retail operational practice. Samples are independent, since at every location
and time period the outcome is produced by different customers. The logarithm
of observed data likelihood depends on variational tensor parameters:

L(A | β) = Σ_{s∈S} log P(A(s) | β(s), r(s))

where s is the tensor index and r are the censoring indicators. The target Poisson intensities β are given by the TTNN neural model, as described above.
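Reusing the two helper sketches above, the corresponding training objective over the sparse observations could look as follows (the unit exposure t and the plain Python loop are illustrative; priors and sample weighting discussed below are omitted):

    def neg_log_likelihood(samples, g_first, cores, g_last, t=1.0):
        """samples: iterable of (idx, a, r) triples for the observed tensor cells,
        where idx is the index tuple s, a the observed count, r the stock bound."""
        total = 0.0
        for idx, a, r in samples:
            beta = ttnn_forward(idx, g_first, cores, g_last)
            total -= float(censored_poisson_logpmf(a, r, beta, t))
        return total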

Fig. 2. Likelihoods (arbitrary units) of predictions from bagging ensembles versus varying model rank (decomposition matrix dimension, same for all tensor modes). Circles are off-samples, dots - training samples for each committee member.

The estimates of the variational matrices g(i1), G(i2), . . . , G(id−1), g(id) could be obtained by direct maximization of the total likelihood. It should be noticed, however, that the elements with purely censored observations are formally unbounded (as follows from the truncated Poisson distribution). Also, different tensor modes can be re-scaled with proper multipliers satisfying (C1 · C2 · . . . · Cd)^(1/d) = 1, leading to non-uniqueness similar to the one in the canonical CP tensor decomposition [1]. This justifies the use of a Bayesian formulation, in which the random model parameters are regularized with L1/L2 prior terms, and the random Poisson rate β is enriched with a Gamma prior Lg = ξβ − η · log(β) with small low-informative parameters.
The resulting maximization of the posterior is performed by means of stochastic gradient heuristics (Adam [10], ADADELTA [11]). The more traditional Rprop [12] was also used, with extended random batch pages (1M samples and more). The attraction of Rprop is the clearer evidence of convergence, with no external step-size control schedules. There are no known sequential tensor update methods (like the classic [13]) for censored likelihoods.
Applications of this kind scale up to 100 × 5000 × 5000 tensors with hundreds of millions of sparse observations.
All estimation models are intended to be used for new, unseen combinations of tensor indices. Thus, in the discussed retail application, some commodity sales are planned for alternative stores. A statistical measure of the uncertainty of the estimated β is required, with low and high quantiles for risk assessment. To meet these requirements it is proposed to apply the bagging committee methodology [14]. A bootstrap ensemble of TTNN models is trained in parallel, with control of on- and off-sample likelihoods. The resulting estimates of β are aggregated into one median estimate, with individual quantile ranges for each estimated value. The learning curve for a bagging ensemble with incremented model complexity (TT matrix rank) is shown in Fig. 2. The lowest off-sample negative log-likelihood indicates the best ensemble rank.
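A sketch of the aggregation step for such a bootstrap ensemble (the 25%/75% quantile band is our choice, matching the risk level quoted for the report row below):

    import numpy as np

    def aggregate_ensemble(beta_members):
        """beta_members: array of shape (n_models, n_cells) with per-member
        estimates of beta for the same tensor cells."""
        beta_med = np.median(beta_members, axis=0)
        beta_lo, beta_hi = np.percentile(beta_members, [25, 75], axis=0)
        return beta_med, beta_lo, beta_hi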
Bagging estimators have many attractive properties. In the context of tensor data, they correctly report large IQR [16] uncertainties for rare index combinations. The accuracy is constantly validated using new field data available fresh after each business day. Table 1 presents a typical example of a report row from the overnight computation.

Table 1. Example of report row with β estimates.

index i0    index i1    index i2    beta.med    beta.lo    beta.hi
91          3053        471         0.138       0.1        0.145

It reads that at store 3053, SKU 471 on date 91 is expected to sell at a rate of one item per 7 ≈ 1/0.138 days, with a 25% risk of selling as slowly as one item per 10 days or slower.
The adequate complexity of the tensor neural model can also be assessed with ordered rank statistics of false neighbors [15]. Consider a series of models with growing tensor matrix dimensions M = 1, 2, . . .. For a particular tensor mode (the last one, with indices id, is picked in our applications) the pairwise distances between all vectors g(id) are computed, and the ordered neighbor set U(id) for each vector is collected. The set of neighbors tends to stabilize when the model approaches the correct dimension. Rank correlations between these sets for decompositions of varying complexity M and M + 1 are compared, and the recommended model complexity is the one with a low number of false neighbors. Formal rank-based hypothesis testing criteria can be utilized [16].
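A minimal sketch of such a neighbor-stability check between models of rank M and M + 1 (the neighborhood size k and the Euclidean distance are our assumptions; formal rank-correlation tests are an alternative):

    import numpy as np

    def shared_neighbor_fraction(g_small, g_large, k=5):
        """g_small, g_large: matrices of mode-d vectors g(i_d) from models of
        rank M and M + 1; returns the mean fraction of shared k nearest neighbors."""
        def knn_sets(g):
            d2 = ((g[:, None, :] - g[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
            np.fill_diagonal(d2, np.inf)                         # exclude self-neighbors
            return [set(row.argsort()[:k]) for row in d2]
        small, large = knn_sets(g_small), knn_sets(g_large)
        return np.mean([len(a & b) / k for a, b in zip(small, large)])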

4 Active Operations and Context Bandits

The estimation problem discussed in the previous section assumes a fixed exogenous distribution of data samples collected in the passive observation regime. This is the usual situation for routine retail operations. No special treatment of the data, except regular data quality checking, is required in this case.
Another important class of applications is active operations that change the data generation conditions. These may include varying pricing decisions, marketing and discount actions, and other dynamic control technologies common in retail practice. In these cases the data generation process becomes partly endogenous, i.e. it starts depending on the performance under previously taken decisions and internal system variables. This phenomenon is somewhat similar to the "self-selection bias" known in the economics literature.
A well-established way to optimize the retail system performance is balanced stratified sampling under conditions of randomized designed experiments, such as Latin hypercubes [17]. This approach, developed mostly for technical systems, is very limited in retail practice, where any necessary experimentation under non-profitable conditions should be justified by additional gains in advance.
Tensor decomposition models lead to optimal utilization of all designed data, since experimentation with different actions can be performed at different locations (and even with different commodities). The collected data is then fused into one reward-estimating tensor model.
In pure active settings the problem reduces to the common contextual bandit formulation [18]. Consider a set of available controls or actions V, also called "bandit arms". In retail applications these are pricing level decisions, discount packages, or corporate KPIs targeted at sales optimization. The actions are applied in different contexts defined by the tensor modes. Only one particular action can be tried for particular tensor indices s = (i1, i2, . . . , id), and the feedback estimate is revealed only for the chosen option.
Let us extend the set of tensor modes with an additional mode {i_{d+1} ∈ V} for the available actions v ∈ V, including the no-action option. The tensor TTNN model is still directly applicable to the estimation of event flow intensities β, provided that the selection of actions is randomized.
To actively choose the more profitable action for each context, the exploration process [18,19] is used, with a constant exploration factor γ. Given the context s, the estimates of the potentials β̂s are computed from the TTNN model for each action v

(last coordinate). Let v ∗ (s) is best believed action for s, estimated from TTNN
ensemble median, or from upper confidence bound (UCB), as defined by esti-
mated quantiles. The probability to select an action v in context s is given by:
P(v ∈ V, s) = (1 − γ) · δ_{v,v*(s)} + γ/|V|

where δ is the Kronecker delta, and |V| is the total count of available actions. In the case of active selection, the TTNN model is trained with samples weighted [20] by inverse propensity, ws ∼ 1/P(v, s).
The optimal action selection control is guided by the periodically updated tensor neural model. The exploration rate γ is usually limited by economic considerations. A practical rule, e.g. for pricing decisions, is to confine the potential loss β · γ · (vmax − v*) within the experimentation budget.
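A sketch of this randomized selection rule, returning both the chosen action and its propensity for the inverse-propensity weighting (names are illustrative):

    import numpy as np

    def select_action(beta_hat, gamma, rng=None):
        """beta_hat: estimated reward intensities for each action in a given context;
        returns the chosen action index and its selection probability P(v, s)."""
        if rng is None:
            rng = np.random.default_rng()
        n = len(beta_hat)
        v_star = int(np.argmax(beta_hat))   # best believed action v*(s)
        probs = np.full(n, gamma / n)       # uniform exploration mass
        probs[v_star] += 1.0 - gamma        # exploitation mass on v*(s)
        v = int(rng.choice(n, p=probs))
        return v, probs[v]                  # training samples are weighted by 1/probs[v]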

5 Conclusion

Contemporary tensor decomposition models and algorithms have been the subject of intensive study for more than a decade, both from the point of view of theory and of academic proof-of-concept applications. During the recent couple of years, "early birds" of industrial and commercial applications have become more frequent. This paper is one of this kind; it discusses two important classes of operations research problems in the very traditional offline retail industry.
For the case of sales planning and control, the general sales potential estimation problem has been considered. It is shown that the available sales and stock statistics can be utilized to predict the potential sales of new commodities and in new locations, including estimates of the probabilistic risk of the operation outcome. The advantage of the proposed Tensor Train Neural Network model is its ability to capture non-linear relations between similar stores and similar goods, as well as to jointly estimate the sales potential of commodities with a wide dynamic range of popularity.
In the area of active operations such as marketing actions and pricing decisions, neural tensor models are extremely helpful in aggregating actively collected information into common self-sustained estimates. In the bandit setting, different control actions can be applied in different contexts, such as different discounts in different stores. Tensor estimates of sales potential then provide guidance for the selection of the most profitable actions for each store location.

References
1. Acar, E., Dunlavy, D.M., Kolda, T.G., Mørup, M.: Scalable tensor factoriza-
tions with missing data. (2010). http://www.cs.sandia.gov/dmdunla/publications/
AcDuKoMo10.pdf
2. Oseledets, I.V., Tyrtyshnikov, E.E.: TT-cross approximation for multidimensional
arrays. Linear Algebra Appl. 432, 70–88 (2010)

3. Tensor Decompositions: Applications and Efficient Algorithms at SIAM CSE 2017. http://perso.ens-lyon.fr/bora.ucar/tensors-cse17/index.html. Accessed 10 Oct 2019
4. Oseledets, I.V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33, 2295–2317
(2011)
5. Novikov, A., Podoprikhin, D., Osokin, A., Vetrov, D.: Tensorizing neural networks.
In: Advances in Neural Information Processing Systems 28, NIPS, pp. 442–450
(2015)
6. Terekhov, S.A.: Tensor decompositions in statistical estimation. In: XIX Interna-
tional Conference Neuroinformatics–2017, Moscow, 2–6 October 2017 (2017). (in
Russian)
7. Terekhov, S.A.: Tensor decompositions in estimation and statistical decision mak-
ing. In: Conference OpenTalks.ai, Moscow, 7–9 February 2018 (2018). (in Russian)
8. Terekhov, S.A.: Tensor decompositions in statistical decisions. In: Conference on
Artificial Intelligence Problems and Approaches, Moscow, 14 March 2018, pp. 53–
58 (2018). http://raai.org/library/books/Konf II problem--2018/book1 intellect.
pdf. Accessed 10 Oct 2019. (in Russian)
9. Frolov, E., Oseledets, I.: Tensor methods and recommender systems.
arXiv:1603.06038 [cs.LG], 19 March 2016
10. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980
[cs.LG] (2014)
11. Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv:1212.5701 [cs.LG]
(2012)
12. Igel, C., Husken, M.: Improving the Rprop learning algorithm. In: 2nd
ICSC International Symposium Neural Computation, NC 2000, pp. 115–
121. ICSC Academic Press (2000). https://pdfs.semanticscholar.org/df9c/
6a3843d54a28138a596acc85a96367a064c2.pdf
13. Cichocki, A., Zdunek, R., Amari, S.: Hierarchical ALS algorithms for nonnegative
matrix and 3D tensor factorization. In: Davies, M.E., James, C.J., Abdallah, S.A.,
Plumbley, M.D. (eds.) Independent Component Analysis and Signal Separation,
pp. 169–176. Springer, Heidelberg (2007)
14. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
15. Rhodes, C., Morari, M.: The false nearest neighbors algorithm: an overview. Comput.
Chem. Eng. 21, S1149 (1997). https://doi.org/10.1016/S0098-1354(97)87657-0
16. Ivchenko, G.I., Medvedev, Yu.I.: Mathematical statistics. URSS, Moscow (2018).
(in Russian)
17. Montgomery, D.C.: Design and Analysis of Experiments, 9th edn. Wiley, New
Jersey (2017)
18. Langford, J., Zhang, T.: The epoch-greedy algorithm for contextual multi-armed
bandits. In: Advances in Neural Information Processing Systems 20, NIPS, pp.
1096–1103 (2008)
19. Allesiardo, R., Feraud, R., Bouneffouf, D.: A neural networks committee for the
contextual bandit problem. arXiv:1409.8191 [cs.NE], 29 September 2014
20. Chu, W., Li, L., Reyzin, L., Schapire, R.E.: Contextual bandits with linear payoff
functions. In: 14th International Conference on Artificial Intelligence and Statistics,
AISTATS, Fort Lauderdale, FL, USA (2011)
21. Tay, J.K., Friedman, J., Tibshirani, R.: Principal component-guided sparse regres-
sion. arXiv:1810.04651 [stat.ME], 24 October 2018
Semi-empirical Neural Network Based Modeling and Identification of Controlled Dynamical Systems

Yury Tiumentsev and Mikhail Egorchev

Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected]

Abstract. One of the critical elements of the process of creating new engineering systems is the formation of mathematical and computer models that provide solutions to the problems of creating and using such systems. For such systems, a high level of complexity of the objects and processes being modeled is typical, as are their multidimensionality, non-linearity and non-stationarity, and the diversity and complexity of the functions implemented by the simulated object. The solution of modeling problems for objects of this kind is significantly complicated by the fact that the corresponding models have to be formed in the presence of multiple and diverse uncertainties, such as incomplete and inaccurate knowledge of the characteristics and properties of the object being modeled, as well as of the conditions in which the object will operate. Besides, during operation, the properties of the object being modeled may change, sometimes sharply and significantly, for example due to equipment failures and/or structural damage. An approach to the formation of gray box models (semi-empirical models) for systems of this kind, based on combining theoretical knowledge about the object of modeling with the methods and tools of neural network modeling, is considered. As an example, we demonstrate the formation of a model for the longitudinal angular motion of a maneuverable aircraft, as well as the identification of the aerodynamic characteristics of the aircraft included in this model.

Keywords: Nonlinear dynamical system · Semi-empirical model · Grey box model · Neural network · Aircraft motion simulation

1 Introduction
In the processes of development and operation of technical systems, including
aircraft, a significant place is occupied by the solution of such problems as the
analysis of the behavior of dynamical systems, the synthesis of control algorithms
for them, and the identification of their unknown or inaccurately known charac-
teristics. A crucial role in solving the problems of these three classes belongs to
mathematical and computer models of dynamic systems [1,2].
Traditional classes of mathematical models for technical systems are ordi-
nary differential equations (for systems with lumped parameters) and partial

differential equations (for systems with distributed parameters). As applied to


controlled dynamical systems, ordinary differential equations are most widely
used as a modeling tool [3–5].
Methods of forming and using models of the traditional type are by now suffi-
ciently developed and successfully used to solve a wide range of tasks. However,
concerning modern and advanced technical systems, some problems arise, the
solution of which cannot be provided by the traditional methods. These prob-
lems are caused by the presence of various and numerous uncertainties in the
properties of the corresponding system and in its operational conditions, which
can be parried only if the system in question has the property of adaptability
[6–8], i.e., there are means of operational adjustment of the system and its model
to the changing current situation.
As the experience shows [9,10], the modeling tool that is adequate to this sit-
uation is an approach based on the concept of an artificial neural network (ANN).
We can consider such an approach as an alternative to traditional methods of
modeling dynamic systems, which provides, among other things, the possibility
of obtaining adaptive models. At the same time, conventional neural network
models of dynamical systems, in particular, the models of the NARX and NAR-
MAX classes [10], which are most often used for the simulation of controlled
dynamic systems, do not fully meet the requirements. One of the most impor-
tant reasons for the insufficiently high efficiency of traditional-type ANN-models
concerning the class of problems under consideration is the formation of a purely
empirical model (that is, a model of the black box type), which should cover all
the nuances of the behavior of dynamic systems. For this, it is necessary to build
an ANN-model of a sufficiently high dimension (that is, with a large number of
adjustable parameters in it). At the same time, we know from the experience of
ANN-modeling that the larger the dimension of the ANN -model, the higher the
amount of training data required to configure it. As a result, with the amount
of experimental data that we can obtain for complex technical systems, it is not
possible to train such models, providing a given level of accuracy.
We propose a combined approach to overcome these difficulties, which are typical for traditional models both in the differential equation and ANN-model forms [11–16]. We base this approach on ANN-modeling because only in this variant can we obtain adaptive models. Theoretical knowledge about the object
of modeling, existing in the form of ordinary differential equations (these are, for
example, traditional models of aircraft motion), we embed into the ANN-model
of the combined type (so-called semi-empirical ANN-model). At the same time,
a part of the ANN-model is formed based on the available theoretical knowledge
and does not require further adjustment (learning). Only those elements that
contain uncertainties, such as the aerodynamic characteristics of the aircraft, are
subject to adjustment and/or structural reorganization in the learning process
of the generated ANN-model.
The results of this approach are semi-empirical ANN-models, which allow solving problems inaccessible to traditional ANN-methods: they dramatically reduce the dimension of the ANN-model, which enables us to achieve the required accuracy using training sets that are insufficient for traditional ANN-models, and they provide the ability to identify the characteristics of dynamic systems described by nonlinear functions of many variables (for example, the coefficients of aerodynamic forces and moments).
The following sections discuss the implementation of this approach, as well
as an example of its application for modeling the aircraft motion and identifying
the aerodynamic characteristics of the aircraft.

2 Dynamical System as an Object of Study


Let there be some dynamic system S, which is the object of modeling (Fig. 1).

Fig. 1. General structure of the simulated dynamical system

The system S perceives controlled effects u(t) and uncontrolled effects ξ(t). Under these influences, S changes its state x(t) according to its transformation (mapping) F(u(t), ξ(t)). At the initial time instant t = t0 the state of the system S takes the value x(t0) = x0.
The state x(t) is perceived by a sensor (observer) implementing the transformation G(x(t), ζ(t)) and is given as the output of the system S, i.e. as the results of observation y(t) of its state x(t). The imperfection of the state sensors of the system S is taken into account by the introduction of an additional uncontrolled effect ζ(t) ("measurement noise"). The composition of the mappings F(·) and G(·) describes the relationship between the controlled input u(t) ∈ U of the system S and its output y(t) ∈ Y, taking into account the influence of the uncontrolled effects ξ(t) and ζ(t) on the system under consideration:
effects ξ(t) and ζ(t) on the system under consideration:

y = Φ(u(t), ξ(t), ζ(t)) = G(F (u(t), ξ(t)), ζ(t)).

Let NP observations be made for the system S:

{yi } = Φ(ui , ξ, ζ), i = 1, . . . , NP , (1)

each of which recorded the current value of the controlled input ui = u(ti ) and
the corresponding output yi = y(ti ). The results y(ti ), ti ∈ [t0 , tf ] of these
observations together with the corresponding values of the controlled inputs ui
form a set of NP ordered pairs:

{ui , yi }, ui ∈ U, yi ∈ Y, i = 1, . . . , NP . (2)




It is required to find, using the data (2), such an approximation Φ̂(·) to the mapping Φ(·) implemented by the system S that the following condition is fulfilled:

‖Φ̂(u(t), ξ(t), ζ(t)) − Φ(u(t), ξ(t), ζ(t))‖ ≤ ε,
∀u(ti) ∈ U, ∀ξ(ti) ∈ Ξ, ∀ζ(ti) ∈ Z,  t ∈ [t0, tf],  x(t0) = x0.     (3)

Thus, as it follows from (3), it is necessary that the sought approximate mapping Φ̂(·) has the required accuracy not only when reproducing the observations (2), but also for all valid values of ui ∈ U and all valid initial conditions x(t0) = x0. We will call this property of the mapping Φ̂(·) its generalization ability. The entries ∀ξ(ti) ∈ Ξ and ∀ζ(ti) ∈ Z in (3) mean that the approximation Φ̂(·) will have the required accuracy provided that at any time instant t ∈ [t0, tf] the uncontrolled impacts ξ(t) on S and the measurement noises ζ(t) do not exceed the permissible limits.
The mapping Φ(·) corresponds to the considered modeling object (the dynamical system S), and the mapping Φ̂(·) will further be named the model of this object. We will also further assume that for the system S we have data of the form (2), and possibly some knowledge of the "design" of the mapping Φ(·) implemented by the considered system. The presence of data of this type is required in any case: at the least, they are needed to test the model Φ̂(·) being created. Knowledge about the mapping Φ(·) may not be available, or it may be available but not used in the formation of the model Φ̂(·).
Since the available number of experiments generating the set (2) is finite, the
norm  ·  in the expression (3) will be treated as the standard deviation of the
form
‖Φ̂(u, ξ, ζ) − Φ(u, ξ, ζ)‖ = (1/NP) Σ_{i=0}^{NP} [Φ̂(ui, ξ, ζ) − Φ(ui, ξ, ζ)]²     (4)

or of the form

‖Φ̂(u, ξ, ζ) − Φ(u, ξ, ζ)‖ = √( (1/NP) Σ_{i=0}^{NP} [Φ̂(ui, ξ, ζ) − Φ(ui, ξ, ζ)]² ).     (5)

Testing the mapping Φ̂(·) to evaluate its generalizing properties is performed on a set of ordered pairs similar to (2):

{ũj, ỹj}, ũ ∈ U, ỹ ∈ Y, j = 1, . . . , NT,     (6)

which requires that the condition ui ≠ ũj, ∀i ∈ {1, . . . , NP}, ∀j ∈ {1, . . . , NT}, is met, i.e. all pairs in the sets

{ui, yi}_{i=1}^{NP},   {ũj, ỹj}_{j=1}^{NT}

should be non-matching.

The error on the test set (6) is calculated in the same way as for the training
set (2)
‖Φ̂(ũ, ξ, ζ) − Φ(ũ, ξ, ζ)‖ = (1/NT) Σ_{j=0}^{NT} [Φ̂(ũj, ξ, ζ) − Φ(ũj, ξ, ζ)]²,     (7)

and it can also be represented as

‖Φ̂(ũ, ξ, ζ) − Φ(ũ, ξ, ζ)‖ = √( (1/NT) Σ_{j=0}^{NT} [Φ̂(ũj, ξ, ζ) − Φ(ũj, ξ, ζ)]² ).     (8)

Now we can formulate the problem of forming a model of the dynamical system S. We need to build a model Φ̂(·) that reproduces, with the required level of accuracy, the mapping Φ(·) realized by the system S, i.e., a model Φ̂(·) for which the magnitude of the modeling error (7) or (8) on the test set (6) does not exceed the specified permissible value ε in (3). This formation should be based on the data (2) used to train the model, on the data (6) used to test it, and, possibly, on knowledge about the system S.
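As a small illustration, the error measures (5) and (8) reduce to an ordinary root-mean-square deviation over the corresponding data set (a minimal sketch; names are ours):

    import numpy as np

    def rmse(y_model, y_ref):
        """Root-mean-square modeling error over a training or test set, cf. (5) and (8)."""
        y_model, y_ref = np.asarray(y_model), np.asarray(y_ref)
        return float(np.sqrt(np.mean((y_model - y_ref) ** 2)))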

3 The Main Problems that Need to Be Solved in the Formation of a Dynamical System Model
When developing a model of a dynamic system, several problems arise that need
to be solved. Namely, we need to form:
– a set of quantities characterizing the object being modeled;
– class (family) of models, which includes the desired model;
– representative (informative) set of experimental data for the formation and
testing of the model;
– tools for selecting a specific model from the given class (criterion of the ade-
quacy of the model and the algorithm for its search).
Briefly, these problems we can characterize as follows.
Formation of a Set of Quantities Characterizing the Modeled Object.
The first thing that needs to be done when forming a model of a dynamic system
is to reveal a set of quantities characterizing the system under consideration. This
problem refers to the statement of the modeling problem and is not considered
further. We consider that the task has already been stated, i.e., a decision has
already been made as to which quantities should be taken into account in the
simulation.
Formation of a Family of Models that Includes the Desired Model. To solve the problem of modeling a dynamical system, we first have to form some set of variants (a family) Φ̂^(F) = {Φ̂j(·)}, j = 1, 2, . . .. Then we need to choose the best, in a certain sense, variant of the model Φ̂*(·). As already noted, when solving this part of the problem of modeling a dynamical system, it is necessary to answer the following two questions:

– What is the desired family of variants Φ (F ) = {Φj (·)}, j = 1, 2, . . .?


– How to choose from the family Φ (F )
the variant Φ ∗ (·) that satisfies the con-

dition Φ(u(t), ξ, ζ) − Φ(u(t), ξ, ζ)  ε, t ∈ [t0 , tf ], ∀u ∈ U , ξ ∈ Ξ, ζ ∈ Z?
The main ideas, which are further used to answer these questions, are as follows:
– when forming the set of options Φ (F ) the key is efficient structurization and
parameterization of this family of models;
– when choosing some variant Φ ∗ (·) from the Φ (F ) family, the key is machine
learning.
Generation of a Representative Set of Experimental Data for the For-
mation and Testing of the Model. One of the essential components of the
process of forming a dynamical system model is the acquisition of a data set,
which completely characterizes the behavior of the considered system. The suc-
cess of solving a simulation problem to a considerable extent depends on how
informative the existing training set is.
Forming a Tool for Selecting a Particular Model from the Given Class.
After a family of models has been formed for the dynamical system under con-
sideration, as well as a representative set of data describing its behavior, it is
necessary to define a tool that allows us to “extract” from this family a specific
model that satisfies a prescribed set of requirements. As such a tool within the
framework of the approach under consideration, it is quite natural to use the
means of neural network learning.

4 Neural Network Semi-empirical Models of Controllable Dynamical Systems
As already noted, empirical ANN-models have severe limitations on the complexity level of the problems they can solve. We propose to address this problem in the class of modular semi-empirical dynamic models combining the capabilities of theoretical and neural network modeling. We discuss the formation and step-by-step adjustment of such ANN-models in more detail in [11,12], which also provide a comparison of the accuracy characteristics of semi-empirical and empirical models.
The Relationship Between Empirical and Semi-empirical Models of
Dynamical Systems. Purely empirical models (black box models) are based
only on experimental data obtained by observing the behavior of the simulated
system [10,17,18]. This approach is typical for traditional neural network model-
ing. In some cases, it may be the only possible one if there is no a priori knowledge
about the nature of the system being modeled, as well as about mechanisms of
its functioning. However, this kind of knowledge is often present. In particular,
there are numerous models of motion for objects of various types (aircraft, ships,
cars, etc.) based on the laws of mechanics, and in some cases, on laws from other
fields of science. For example, when modeling the motion of aircraft with high

supersonic speeds, when thermal phenomena begin to play a significant role,


we need to include in the motion model not only relations based on the laws of
mechanics but also relations from the field of thermodynamics and heat transfer.
Models of this kind, especially those derived directly from the fundamental
laws of nature (“from first principles”), play a crucial role in all areas of sci-
ence and technology. The formation of such models, however, is associated with
severe difficulties. We need to have appropriate knowledge about the object being
modeled, but it is not always possible. Besides, even if such a model exists, for
example, an aircraft motion model, it may be unsuitable for solving some spe-
cific task. Firstly, this model may contain quantities and dependencies, in the
values of which there are significant uncertainties, which, accordingly, prevents
from obtaining an accurate and reliable solution. Secondly, even if the model is
fully formed and there are no uncertainties in it, it may be unsuitable for solving
real-world applied problems. For example, if we want to simulate the motion of
some object in real time with high accuracy, the traditional model of motion in
the form of a system of differential equations (ordinary or partial derivatives),
which is solved using appropriate methods of numerical integration, may require
an unacceptably long time to obtain solutions.
As was shown above, approximate empirical models are formed to overcome
these difficulties. One of the most effective ways to obtain such models is the
neural network approach.
The models, called theoretical (“white box”), are directly opposed to purely
empirical models (“black box”) according to the principles of their formation.
Empirical data are involved in the process of obtaining a theoretical model only
indirectly as a source of information about the system, the nature of its behav-
ior. This information makes it possible to choose the appropriate class of rela-
tionships that describe the modeled system behavior, but these empirical data
themselves are not used when forming the relationships themselves. In contrast,
empirical models are based solely on experimental data. They are formed in such
a way as to respond to this data in the best possible way, i.e., reproduce them
with the least error.
Empirical models, while allowing us to overcome the difficulties associated with theoretical models, cause, in turn, new challenges that did not exist for models of the theoretical type. In particular, learning of these models requires the presence of
appropriate training data sets, the acquisition of which can be a complicated
task. In the case when there is no data on the object being modeled, other than
experimental ones, characterizing its behavior, nothing remains but to try to
obtain an empirical model for the object. But in the case when, in addition to
empirical data, there is also some knowledge about it, for example, in the form
of equations of motion, albeit with uncertain factors in them, we have to try
to use theoretical data as complementary to the available experimental data.
Such a combined, compromise approach can be called a semi-empirical simu-
lation (gray box simulation) [19–22]. In comparison with a purely theoretical
approach, its application allows to increase the accuracy of modeling due to the
fact that the negative impact of elements of a theoretical model that cannot be

adequately described due to a lack of relevant knowledge can be compensated for by converting this model into a semi-empirical form and refining it by training on the available experimental data. As applied to purely empirical models, taking into
account existing theoretical knowledge through the transition to semi-empirical
models allows us to simplify the process of forming models that meet the spec-
ified requirements, and also, which is very important, to reduce the amount of
experimental data required to train the model. Moreover, the higher the amount
of theoretical knowledge involved, the smaller the amount of experimental data
necessary.
The General Scheme of the Formation of Semi-empirical ANN-
Models. The proposed approach consists of using to improve the model being
formed the theoretical knowledge about the simulated dynamical system jointly
with structural transformations and the learning of the theoretical model trans-
formed into a neural network form.
We take into account theoretical knowledge of two types: about the object
of modeling and appropriate computational methods. Model refinement is per-
formed through neural network learning. As a result, we form a dynamic ANN-
model, the architecture of which takes into account the existing knowledge about
the object of modeling. Traditional neural network models, as has been repeat-
edly noted, are purely empirical (black box); they are based only on experimental data on the behavior of the system [23]. The dynamic modular networks considered below, which rely on both the available experimental data and theoretical knowledge, can be classified as semi-empirical models (gray box) [19,20].
The formation of dynamic networks with a modular architecture in the form
of semi-empirical ANN-models consists of the following stages [11,12]:
(1) the formation of a theoretical model with continuous time for the studied
dynamical system, the acquisition of available experimental data on the
behavior of this system;
(2) evaluation of the accuracy for the theoretical model of a dynamical system
on available data, in the case of insufficient accuracy of it, hypothesizing the
reasons for this and possible ways to eliminate them;
(3) conversion of the source model with continuous time to a model with discrete
time;
(4) formation of a neural network representation for the obtained model with
discrete time;
(5) learning neural network model;
(6) assessment of the accuracy of the trained neural network model;
(7) adjustment, in case of insufficient accuracy, of the neural network model by
introducing structural changes into it.
The issues of structural formation of semi-empirical ANN-models, as well as
the comparison of their properties with the properties of traditional (empirical)
ANN-models, are discussed in more detail in [24].
An Example of a Semi-empirical ANN-Model. An assessment of the per-
formance of the ANN-model under consideration was carried out regarding the

aircraft angular longitudinal motion, which is described using a mathematical


model traditional for flight dynamics [25,26]:
α̇ = q − (q̄S/(mV)) · CL(α, q, ϕ) + g/V,

q̇ = (q̄Sc/Jy) · Cm(α, q, ϕ),                                        (9)

T² ϕ̈ = −2Tζ ϕ̇ − ϕ + ϕact,
where α is the angle of attack, deg; q is the pitch angular velocity, deg/sec; ϕ is the deflection angle of the elevator or all-moving horizontal tail, deg; CL is the lift coefficient; Cm is the pitching moment coefficient; m is the mass of the aircraft, kg; V is the airspeed, m/sec; q̄ = ρV²/2 is the dynamic pressure; ρ is the air density, kg/m³; g is the acceleration of gravity, m/sec²; S is the wing area of the aircraft, m²; c is the mean aerodynamic chord, m; Jy is the pitching moment of inertia, kg·m²; T, ζ are the time constant and relative damping factor of the elevator actuator; ϕact is the command signal value for the elevator (or all-moving horizontal tail) actuator, limited to ±25°.
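As a hedged sketch of the discretization used below (Euler scheme), one step of model (9) can be written as follows; CL and Cm are passed in as callables, which in the semi-empirical model are the small trainable network blocks, and the parameter dictionary keys are our naming:

    import numpy as np

    def euler_step(state, phi_act, dt, p, CL, Cm):
        """One Euler step of model (9); state = (alpha, q, phi, phi_dot)."""
        alpha, q, phi, phi_dot = state
        phi_cmd = np.clip(phi_act, -25.0, 25.0)   # actuator command limit of +/-25 deg
        alpha_dot = q - p["qbar"] * p["S"] / (p["m"] * p["V"]) * CL(alpha, q, phi) + p["g"] / p["V"]
        q_dot = p["qbar"] * p["S"] * p["c"] / p["Jy"] * Cm(alpha, q, phi)
        phi_ddot = (-2.0 * p["T"] * p["zeta"] * phi_dot - phi + phi_cmd) / p["T"] ** 2
        return (alpha + dt * alpha_dot,
                q + dt * q_dot,
                phi + dt * phi_dot,
                phi_dot + dt * phi_ddot)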

Fig. 2. Structure of semi-empirical ANN-model for dynamical system (9) according to the Euler difference scheme

In the model (9) the values α, q, ϕ and ϕ̇ are the states of the controlled
object, the variable ϕact is the control. We consider maneuverable aircraft F-16
as an example of a specific object of modeling. The source data for this aircraft
were taken from [27].
A block diagram of a semi-empirical model based on (9) is shown in Fig. 2.
Here, the Euler method of integrating ordinary differential equations was used

Fig. 3. Structure of empirical NARX-type ANN-model for dynamical system (9)

to transform the original model with continuous time into a model with discrete
time. For comparison, Fig. 3 shows a block diagram for the same model based on the NARX network. In both of these schemes, the links whose synaptic weights are the adjustable parameters of the model are highlighted in red.

5 Generation of Training Sets for ANN-Modeling of Dynamical Systems
To obtain training data, we use an approach based on a set of specially orga-
nized test control actions applied to a dynamical system. With this approach,
the actual motion of the dynamical system (x(t), u(t)) consists of the program
motion (x∗ (t), u∗ (t)) (test maneuver) caused by the control signal u∗ (t), as well
as the motion (x̃(t), ũ(t)) generated by the additional excitation ũ(t):

x(t) = x∗ (t) + x̃(t), u(t) = u∗ (t) + ũ(t). (10)

As examples of test maneuvers for the aircraft, we can name:

– straight horizontal flight at a constant speed;


– flying with a monotonically increasing angle of attack;
– turn in a horizontal plane;
– ascending/descending spiral.

The type of test maneuver (x*(t), u*(t)) in (10) determines the resulting ranges of values of the state and control variables, while the type of excitation ũ(t) specifies the variety of examples within these ranges.

Fig. 4. Test disturbances as functions of time used in studying the dynamics of con-
trolled systems: a is a random signal; b is a polyharmonic signal. Here φact is the
actuator command signal for the all-moving horizontal tail of the maneuverable air-
craft from the example (9)

As was shown in the work of Schröder [28] (also in [29,30]) in this case, it is
advisable to use the polyharmonic signal as an excitation. An example of such
a signal is shown in Fig. 4a. The mathematical model of such a signal uj acting
on the j-th control is a harmonic polynomial
  
uj = Σ_{k∈Ik} Ak sin(2πkt/T + ϕk),   Ik ⊂ K,  K = {1, 2, . . . , M},     (11)

which is a finite linear combination of the main harmonic A1 sin(ωt + ϕ1) and the higher-order harmonics A2 sin(2ωt + ϕ2), A3 sin(3ωt + ϕ3), etc.
If the phase angles ϕk in (11) are randomly selected in the interval (−π, π], then the individual harmonic components, being summed up, can give at certain points t(i) an amplitude value of the total signal uj(i) that violates the conditions of proximity of the perturbed motion to the reference one. This undesirable phenomenon is prevented by an appropriate selection of the phase shift values ϕk.
One more typical excitation signal is random. An example of it is shown in
Fig. 4b. The values of this signal are kept constant at all time intervals [ti , ti+1 ),
i = 0, 1, . . . , n − 1. At time instants ti , these values may change randomly.
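Minimal sketches of both excitation signals (the harmonic set, amplitudes, switching grid and ranges below are illustrative, not the values used in the experiments):

    import numpy as np

    def polyharmonic_signal(t, T, harmonics, amplitudes, phases):
        """Finite sum of harmonics of the base period T, cf. Eq. (11)."""
        u = np.zeros_like(t, dtype=float)
        for k, A, phi in zip(harmonics, amplitudes, phases):
            u += A * np.sin(2.0 * np.pi * k * t / T + phi)
        return u

    def piecewise_constant_signal(t, switch_times, levels):
        """Random-step signal: constant on [t_i, t_{i+1}), new random level at each t_i."""
        idx = np.searchsorted(switch_times, t, side="right") - 1
        return np.asarray(levels)[np.clip(idx, 0, len(levels) - 1)]

    # example: a 20 s record sampled every 0.02 s
    t = np.arange(0.0, 20.0, 0.02)
    rng = np.random.default_rng(0)
    u_poly = polyharmonic_signal(t, 20.0, [1, 2, 3], [1.0, 0.5, 0.25],
                                 rng.uniform(-np.pi, np.pi, 3))
    u_rand = piecewise_constant_signal(t, np.arange(0.0, 20.0, 2.0),
                                       rng.uniform(-5.0, 5.0, 10))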

6 Algorithms for Learning ANN-Models


A number of problems arise when learning dynamic ANN-models in the form of recurrent neural networks. The main sources of difficulties are the following:

– bifurcations of the operation modes of the network when changing the values
of model tunable parameters (synaptic weights, biases, internal parameters
of neurons) in the process of learning the ANN-model [31];
– the presence of long-term dependencies of the network outputs on the inputs
and states of the ANN-model at previous time instants [32,33];
– a very complicated landscape of the error function, rugged by numerous deep,
narrow and curved gorges, and often having a plateau [34].
Bifurcation of Network Dynamics. In the theory of nonlinear dynamical
systems, bifurcation is a qualitative restructuring of the functioning modes of a
dynamical system with a small change in its parameters [35]. The bifurcation
of the network dynamics is a qualitative change in the dynamic properties and
behavior of the ANN-model with small changes in its adjustable parameters
(synaptic weights, biases and, in some cases, internal parameters of neurons).
In terms of neural network learning, this means that the landscape of the error
function changes abruptly and significantly.
Long-Term Dependencies. When learning dynamic networks, there is a so-
called problem of long-term dependencies, because the output of the ANN-model
depends on its inputs and states at previous time instants, including those far
from the current point in time. Gradient methods of searching the minimum
of the error function behave unsatisfactorily in this case. The reason for this
behavior is clarified by the analysis of the asymptotic behavior of the learning
error and its gradient in the backpropagation process [32,33], which shows that
the values of these quantities rapidly (exponentially, as a rule) decrease.
Complicated Landscape of the Error Function. One of the most important
reasons for the emergence of difficulties in learning dynamic ANN-models is a
very complicated relief of the error function, carved by numerous deep, narrow,
and curved gorges. This reason is the most difficult one for the implementation of the ANN-model learning process. In this case, the determining factor is the number of examples in the training set. In this situation, we can only rely on an approach to working with the training data that allows the number of examples used to be increased consistently.
The problem of learning the recurrent ANN-model, taking into account the
complicated relief of the error function, can be solved for it in various ways
[36–38]. These methods include the following:
– regularization

J(w) = SSE + α · SSW,

where SSE is total mean square error of the network, SSW is the sum of the
squares of the weights;

– random variation of starting weights;


– combination of regular and genetic search;
– segmentation of the training sequence (changing the input network data
changes the location of the cavities on the relief of the error function).
Of the above approaches, in complex problems (a nonlinear multi-parameter mapping implemented by the network in combination with a large training set characterizing the complex behavior of a dynamic system) only the latter, based on the segmentation of the training sequence, has sufficient efficiency.
The essence of this approach is as follows. Due to the very complicated relief
of the error function, only for a small set of initial values of the network param-
eters, we can find a global minimum using gradient optimization methods. If we
proceed to solve the problem of finding the initial values of parameters that are
sufficiently close to the minimum, then we can assume that they are solutions
of similar problems. That is, we need to generate a sequence of tasks such that:
– the first task is quite simple, and we can find its solution for any initial
parameter values;
– each subsequent task is similar to the previous one - their solutions are close
in the parameter value space;
– the sequence converges to the original, required task.
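One possible reading of this strategy, as a hedged sketch with purely hypothetical model and fit interfaces (the paper does not fix the implementation details):

    def train_with_segmentation(u_seq, y_seq, init_weights, fit, n_stages=10):
        """Fit a recurrent model on progressively longer prefixes of the training
        trajectory, warm-starting each stage from the previous solution; fit() is
        any of the learning algorithms listed below (BPTT, RTRL, EKF) and its
        interface here is hypothetical."""
        n = len(u_seq)
        weights = init_weights
        for stage in range(1, n_stages + 1):
            k = max(1, n * stage // n_stages)            # length of the current segment
            weights = fit(u_seq[:k], y_seq[:k], weights)
        return weights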
The solution of individual subtasks from this sequence can be performed
using such algorithms, most often used to solve the problem of learning dynamic
networks [17,18,39], such as:
– Back Propagation Through Time (BPTT);
– Real-Time Recurrent Learning (RTRL);
– Extended Kalman Filter (EKF).
The main features of these algorithms are analyzed in [17].
This approach demonstrated its high efficiency in a series of computer exper-
iments and was successfully applied to solve several problems of modeling and
identification of dynamical systems. We discuss an example of such an applica-
tion in the next section.

7 Neural Network Semi-empirical Modeling of Aircraft Motion
In this section, using the example of longitudinal angular motion of a maneu-
verable aircraft, the high efficiency of semi-empirical ANN-models (gray box
models) when solving applied problems is demonstrated. The theoretical model
is the corresponding traditional motion model of aircraft in the form of a system
of ordinary differential equations (9). The semi-empirical ANN-model formed in
this particular example includes two elements of the “black box” type (see Fig. 2)
describing the dependencies of the lift and pitching moment coefficients on the state variables (angle of attack, pitch angular velocity and deflection angle of the controlled stabilizer). We need to restore these coefficients based on the available experimental data for the observed state variables of the dynamical system.

Fig. 5. Estimation of the restoration accuracy for the dependencies CL(α) and Cm(α) based on the results of testing the ANN-model (point mode, identification and testing using a polyharmonic signal). The output values of the object (9) and ANN-models are shown by a blue line and a green line, respectively

As an example of a specific object of simulation, the F-16 maneuverable


aircraft was considered, the source data for which were taken from [27]. A com-
putational experiment with the model (9) to obtain a training set was conducted
for the time interval t ∈ [0; 20] sec with discretization step Δt = 0.02 sec for a
partially observable state vector y(t) = [α(t); ωz (t)]T , with additive white noise
with a standard deviation of σ = 0.01, affecting the output of the system y(t).
Test excitations used in studying the dynamics of the system (9) are shown in
Fig. 4.
As the target value of the simulation error, we will further use the standard
deviation of the additive noise acting on the system output. Training on the sample {yi}, i = 1, . . . , N, obtained using the source model (9), is carried out in Matlab for networks in the form of LDDN (Layered Digital Dynamic Networks) using the Levenberg–Marquardt algorithm with the mean square error of the model as the criterion. The Jacobian matrix is calculated using the RTRL (Real-Time Recurrent Learning) [27] algorithm.
An extensive series of computational experiments was carried out comparing
the effectiveness of test signals for two types of test maneuvers: straight-line
horizontal flight at a constant speed and flight with a monotonically increasing
angle of attack. As a typical example, Fig. 5 shows how accurately unknown

Table 1. Simulation error on the training set (polyharmonic signal)

Problem            Point mode                  Monotonous mode
                   RMSEα         RMSEq         RMSEα         RMSEq
Adjusting Cy       1.02 · 10⁻³   1.24 · 10⁻⁴   1.02 · 10⁻³   1.24 · 10⁻⁴
Learning CL        1.02 · 10⁻³   1.23 · 10⁻⁴   1.02 · 10⁻³   1.24 · 10⁻⁴
Learning Cy, Cm    1.02 · 10⁻³   1.19 · 10⁻⁴   1.02 · 10⁻³   1.27 · 10⁻⁴
NARX simulation    1.85 · 10⁻³   3.12 · 10⁻³   1.12 · 10⁻³   7.36 · 10⁻⁴

Table 2. Simulation error on the test set (polyharmonic signal)

Problem            Point mode                  Monotonous mode
                   RMSEα         RMSEq         RMSEα         RMSEq
Adjusting CL       1.02 · 10⁻³   1.59 · 10⁻⁴   1.02 · 10⁻³   1.17 · 10⁻⁴
Learning CL        1.02 · 10⁻³   1.59 · 10⁻⁴   1.02 · 10⁻³   1.17 · 10⁻⁴
Learning CL, Cm    1.02 · 10⁻³   1.32 · 10⁻⁴   1.02 · 10⁻³   1.59 · 10⁻⁴
NARX simulation    2.32 · 10⁻²   4.79 · 10⁻²   3.16 · 10⁻²   5.14 · 10⁻²

Table 3. Simulation error on the test set for the semi-empirical model and three types of excitation signals

Signal             Point mode                  Monotonous mode
                   RMSEα         RMSEq         RMSEα         RMSEq
Doublet            0.0202        0.0417        8.6723        34.943
Random             0.0041        0.0071        0.0772        0.2382
Polyharmonic       0.0029        0.0076        0.0491        0.1169

dependencies are restored (nonlinear functions CL (α), Cm (α)). The accuracy of


the obtained semi-empirical ANN-model using these dependencies was evaluated
in comparison with the original system (9), which used exact representations of
the functions CL (α), Cm (α). The results for these two models are so close that
the curves in the graphs almost coincide. Numerical estimates of the accuracy
of the models obtained are given in Table 1 (an estimate of the accuracy on
the training set) and Table 2 (an estimate of the generalizing properties of the
ANN-model). It also gives a comparison of the models obtained with the NARX
models.
Table 3 compares the values of the simulation error depending on the type
of excitation signal for the considered semi-empirical model of the longitudinal
angular motion of the aircraft. We can see that similar results for the empirical
NARX model are much less accurate, in particular, for the polyharmonic signal,
RMSEα = 1.3293, RMSEq = 2.7445.

In Tables 1, 2 and 3 we denote straight-line horizontal flight at a constant


speed as a point mode, and flight with a monotonically increasing angle of attack
as a monotonic mode. In addition, the term “learning” for the corresponding
aerodynamic coefficients denotes the problem of restoring the corresponding
unknown function “from scratch”, i.e. assuming no information about the possi-
ble values of these coefficients. The term “adjusting” denotes the task of refining
the values of the corresponding coefficient, known, for example, from the results
of wind tunnel tests.

8 Conclusions

The obtained results allow us to conclude that the methods of semi-empirical


neural network modeling, combining knowledge and experience from the rele-
vant subject area, as well as from traditional computational modeling, are a
powerful and promising tool potentially suitable for solving complex problems
of describing and analyzing the controlled motion of aircraft. Comparison of
the results obtained within the framework of the semi-empirical approach with
those obtained by traditional ANN-modeling (NARX-type models) shows the
undeniable advantages of semi-empirical models.

Acknowledgments. This research is supported by the Ministry of Science and Higher


Education of the Russian Federation as Project No. 9.7170.2017/8.9.

Artificial Intelligence
Photovoltaic System Control Model
on the Basis of a Modified Fuzzy Neural Net

Ekaterina A. Engel and Nikita E. Engel

Katanov State University of Khakassia,


Shetinkina 61, 655017 Abakan, Russian Federation
[email protected]

Abstract. This paper represents the photovoltaic system control model on the
basis of a modified fuzzy neural net. Based on the photovoltaic system condi-
tion, the modified fuzzy neural net provides a maximum power point tracking
under random perturbations. The architecture of the modified fuzzy neural net
was evolved using a neuro-evolutionary algorithm. The validity and advantages
of the proposed photovoltaic system control model on the basis of a modified
fuzzy neural net are demonstrated using numerical simulations. The simulation
results show that the proposed photovoltaic system control model on the basis of
a modified fuzzy neural net achieves real-time control speed and competitive
performance, as compared to a classical control scheme with a PID controller
based on perturbation & observation, or incremental conductance algorithm.

Keywords: Modified fuzzy neural net · Random perturbations · Photovoltaic system · Maximum power point tracking

1 Introduction

The Republic of Khakassia is one of the most promising regions for the development of solar power systems in the Russian Federation. The annual average solar insolation for the town of Abakan is about 1450 kWh/sq.m [1], which exceeds the values for the European part of the Russian Federation (about 1200–1450 kWh/sq.m). However, photovoltaic (PV) systems are not stable due to the complex dynamics of solar irradiance fluctuations. Therefore, maximum power point tracking (MPPT) algorithms play an important role in solar power generation. We consider a non-linear MPPT problem for PV systems. A PV system is non-linear and commonly suffers from restrictions imposed by sudden variations in the solar irradiance level. Within the research literature, a whole array of differing MPPT algorithms has been proposed [2]. Among them, the perturbation & observation (P&O) and incremental conductance (IC) algorithms are the most common due to their simplicity and easy implementation. However, P&O- and IC-based controllers for PV systems have slow response times to changing reference commands, take considerable time to settle down from oscillating around the target reference state, and must often be designed by hand. Moreover, the PV system control model should be robust to different environmental conditions in order to reliably generate maximum power. Therefore, automatic intelligent algorithms such as fuzzy neural networks are promising alternatives [3].


Real-life PV systems have complex dynamics due to random variation of the system parameters and fluctuations of the solar irradiance. Thus, neural-network-based solutions have been proposed to approximate these complex dynamics [3]. However, the neural network needs to become more adaptive. Adaptive behavior can be enabled by modifying the network into a recurrent neural network with fuzzy units. This forms the motivation for the development of a PV system control model on the basis of a modified fuzzy neural net (MFNN), as presented in this paper. Compared to existing fuzzy neural nets, including ANFIS, the MFNN includes recurrent neural networks and fuzzy units. The function approximation capabilities of a neural net are exploited to approximate a membership function.

2 The PV System Control on the Basis of a MFNN

In this article, the function approximation capabilities of a MFNN are exploited to


approximate a nonlinear control law of PV system. This paper considers the devel-
opment of an effective maximum PV power point tracking algorithm on the basis of a
MFNN that remains easy to implement. The proposed modified fuzzy neural net is
capable of handling uncertainties in both the PV system parameters and in the
environment.

2.1 Mathematical Modelling of a PV System


We design and simulate a 20 kW PV module in the Octave environment by implementing the following mathematical models of its electrical characteristics. The open-circuit voltage is the maximum voltage available from a PV cell at zero current. We calculate the open-circuit voltage as follows:
V = \frac{N K T}{Q} \ln\left(\frac{I_L - I_o}{I_o} + 1\right)    (1)

where V is the open-circuit voltage, N is the diode ideality constant, K is the Boltzmann constant (1.381 · 10^-23 J/K), T is the temperature in Kelvin, Q is the electron charge (1.602 · 10^-19 C), I_L is the light-generated current, the same as I_ph (A), and I_o is the saturation diode current (A). We calculate the light-generated current as follows:

I_L = \frac{G}{G_{ref}} \left(I_{Lref} + \alpha_{Isc} (T_c - T_{c\,ref})\right)    (2)

where G is the radiation (W/m^2), G_ref is the radiation under standard conditions (1000 W/m^2), I_Lref is the photoelectric current under standard conditions (0.15 A), T_c ref is the module temperature under standard conditions (298 K), α_Isc is the temperature coefficient of the short-circuit current (A/K), equal to 0.0065 A/K, and I_L is the light-generated current. We calculate the reverse saturation current as follows:

I_o = I_{or} \left(\frac{T}{T_{ref}}\right)^{3} \exp\left(\frac{Q E_g}{K N} \left(\frac{1}{T_{ref}} - \frac{1}{T}\right)\right)    (3)

where I_or = I_sh / exp(V_ocn / (N · V_tn)) is the saturation current, I_o is the reverse saturation current, N is the ideality factor (1.5), and E_g is the band gap for silicon (1.10 eV). We calculate the short-circuit current as follows:

I_{sh} = I_L - I_o \left(\exp\left(\frac{Q (V - I R_S)}{N K T}\right) - 1\right)    (4)

This PV module provides 20 kW under standard conditions (irradiance of 1000 W/m^2, temperature of 20 °C). This Octave model uses a MPPT system with a duty cycle that generates the required voltage to extract maximum power.
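As a cross-check of Eqs. (1)–(4), the single-diode relations above can be sketched in a few lines of Python (the paper's own implementation is in Octave). The defaults taken from the text are I_Lref = 0.15 A, T_c ref = T_ref = 298 K, α_Isc = 0.0065 A/K, N = 1.5, and E_g = 1.10 eV; the series resistance R_s and all remaining defaults are illustrative assumptions only.

    import numpy as np

    K = 1.381e-23   # Boltzmann constant, J/K
    Q = 1.602e-19   # electron charge, C

    def light_generated_current(G, T_c, G_ref=1000.0, I_L_ref=0.15,
                                T_c_ref=298.0, alpha_Isc=0.0065):
        # Eq. (2): photocurrent scaled by irradiance and cell temperature
        return (G / G_ref) * (I_L_ref + alpha_Isc * (T_c - T_c_ref))

    def open_circuit_voltage(I_L, I_o, T, N=1.5):
        # Eq. (1): open-circuit voltage of the cell
        return (N * K * T / Q) * np.log((I_L - I_o) / I_o + 1.0)

    def reverse_saturation_current(I_or, T, T_ref=298.0, N=1.5, E_g=1.10):
        # Eq. (3): temperature dependence of the diode saturation current
        # (E_g is given in eV, so Q * E_g converts it to Joules)
        return I_or * (T / T_ref) ** 3 * np.exp((Q * E_g / (K * N)) * (1.0 / T_ref - 1.0 / T))

    def cell_current(V, I, I_L, I_o, T, R_s=0.01, N=1.5):
        # Eq. (4): single-diode current-voltage relation (R_s is an assumed value)
        return I_L - I_o * (np.exp(Q * (V - I * R_s) / (N * K * T)) - 1.0)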

2.2 The PV System Control Model on the Basis of a MFNN


The MFNN is trained based on the data

Z^i = \left(x^i = (Ir^i, V^i, P^i, dI/dV^i),\; s^i = (\Delta Ir^i, dI/dV^i),\; D^i\right),    (5)


 
where i ∈ {1, ..., 10^6}, I and V represent the current and voltage respectively, D^i is the duty cycle of the boost converter, dI and dV represent (respectively) the current error and voltage error before and after the increment, Ir represents the solar irradiance, ΔIr^i = Ir_0^i − Ir_1^i, where Ir_0^i is the irradiance before the increment and Ir_1^i is the irradiance after the increment, P is the PV system power, x^i is the input signal of the MFNN, and D^i is the control signal. The data (5) comprise a training set of 8·10^5 examples and a test set of 2·10^5 examples. The construction of the MFNN can be briefly described as follows.
Step 1. All samples s^i of the data (5) were classified into two groups according to the speed of change of the PV system conditions: A_1 is a sudden change (C_1^i = 1); A_2 is a smooth change (C_2^i = 1). This classification generates a vector with elements C^i.
Step 2. We trained a two-layer network Y(s^i) (the number of hidden neurons is 2). The vector s^i was the network's input and the vector C^i was the network's target. We formed the membership functions μ_j(s) based on the two-layer network Y(s^i) as follows:
 
\mu_1(s^i) = \begin{cases} Y(s^i), & \text{if } Y(s^i) \ge 0 \\ 0, & \text{if } Y(s^i) < 0 \end{cases}, \qquad \mu_2(s^i) = \begin{cases} |Y(s^i)|, & \text{if } Y(s^i) < 0 \\ 0, & \text{if } Y(s^i) \ge 0 \end{cases}    (6)

This step provides the fuzzy sets Aj , (A1 is sudden change of the PV system
conditions, A2 is smooth change of the PV system conditions) with membership
function lj ðsÞÞ; j ¼ 1::2.
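A minimal Python sketch of Eq. (6), splitting the scalar output Y(s) of the classifier network into the two membership degrees; the function name and the use of NumPy are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def membership_functions(y):
        # Eq. (6): mu1 corresponds to the "sudden change" set A1,
        # mu2 to the "smooth change" set A2.
        y = np.asarray(y, dtype=float)
        mu1 = np.where(y >= 0.0, y, 0.0)
        mu2 = np.where(y < 0.0, np.abs(y), 0.0)
        return mu1, mu2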
Step 3. We created the MFNN based on the data (5). The MFNN includes two recurrent neural networks F_j (the number of delays is 2), j = 1..2. The MFNN architecture's parameters (the number of nodes in the hidden layer and the corresponding weights and biases) have been coded into particles X. The dimension component of a particle X is d_h = 12·h + 2 ∈ {D_min = d_1 = 14, D_max = d_10 = 122}. To make the PV system control adaptive, it needs to have some idea of how the actual PV system behavior differs from its expected behavior, so that the recurrent neural network F_j can recalibrate its behavior intelligently at run time and try to eliminate the constant tracking error. We therefore give the recurrent neural network F_j(μ_j(s), x) an extra input μ_j(s), which corresponds to the value of the membership function μ_j(s). This input signal of the recurrent neural networks F_j(μ_j(s), x) gives useful feedback for providing the maximum PV power under dynamically changing PV system conditions. This control approach provides a more intelligent algorithm for generating the control signal u on the basis of a MFNN. We evaluated the fitness function as follows:

f(D, u) = \frac{1}{H} \sum_{l=1}^{H} |D - u|    (7)

where H is the number of evaluated samples. We used the modified ALO presented in [4] as the optimization algorithm, with function (7) as its fitness function. This step provides the trained MFNN best(d_h), which generates the control signal u(best(d_h)), where best(d_h) is the best solution X created by the modified ALO. The if-then rules are defined as:
 
P_j: \text{IF } x \text{ is } A_j \text{ THEN } u = F_j(\mu_j(s), x), \quad j = 1..2.    (8)

Simulation of the trained MFNN can be briefly described as follows.

Step 1. Aggregation of the antecedents of the rules (8) maps the input data x into their membership functions and matches the data with the conditions of the rules. These mappings then activate the k-th rule, which indicates the k-th PV system mode and the corresponding k-th recurrent neural network F_k(μ_k(s), x), k ∈ 1..2.

Step 2. According to the k-th mode, the corresponding k-th recurrent neural network F_k(μ_k(s), x) (trained based on the data (5)) generates the control signal u = F_k(μ_k(s), x).
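A minimal Python sketch of this two-step inference, assuming classifier_net stands for the trained two-layer network Y(s) and expert_nets is a pair of callables standing in for the trained recurrent networks F_1, F_2; all names are illustrative placeholders, not the paper's implementation.

    import numpy as np

    def mfnn_control(x, s, classifier_net, expert_nets):
        # Step 1: map s to membership degrees of the two modes (Eq. (6))
        y = float(classifier_net(s))
        mu = (max(y, 0.0), abs(y) if y < 0.0 else 0.0)
        k = int(np.argmax(mu))             # index of the activated rule (8)
        # Step 2: the corresponding recurrent network generates the duty cycle
        u = expert_nets[k](mu[k], x)       # u = F_k(mu_k(s), x)
        return u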

2.3 Simulation and Results


We revisited the numerical examples from the previous Subsects. 2.1 and 2.2 to illustrate the benefits of the proposed photovoltaic system control model based on a modified fuzzy neural net. All the simulations for this study are implemented in Octave. Figure 1 shows the solar irradiance during the simulation time. For the purpose of this simulation study, four solar irradiance scenarios were adopted:

(1) From time = 0 s to 0.4 s the graph demonstrates a slowly varying shadow cast by an obstacle, which causes a smooth change of irradiation;
(2) From time = 0.5 s to 1 s the graph demonstrates a smooth and steady decline in solar irradiance, which simulates a cloud covering;
(3) From time = 1.1 s to 2.1 s the irradiation changes to the exact target values with a smooth change;
(4) From time = 2.1 s to 2.5 s the graph demonstrates a sudden change in irradiation from sunshine conditions.
We built the MFNN based on the training set of the data (5) and trained it using the modified ALO [4]. To obtain statistical results, we performed 120 modified ALO runs with the following parameters: n = 50 (we use 50 ants and ant lions), T = 100 (we terminate at the end of 100 iterations), and the dimension d_h = 12·h + 2 ∈ {D_min = d_1 = 14, D_max = d_10 = 122}. The vector f(best(d_h)) = (4.2e-3, 3.7e-4, 1.5e-5, 1.1e-5, 2.5e-6, 1.1e-7, 3.4e-8, 4.2e-7, 1.7e-5, 2.4e-4) shows that only one MFNN architecture, with d_7 = 86, achieves a fitness (7) below 4e-8 on the data (5).

Fig. 1. Plot of solar irradiance.

This MFNN includes two recurrent neuronets F_k(μ_k(s), x), k = 1..2. The aforementioned recurrent neuronets are two-layered networks with seven hidden neurons. In this comparison study, the performance of the proposed PV system control model on the basis of a MFNN is compared against the standard model with the PID controller (based on the P&O or IC algorithm) under the same conditions. Figures 2 and 3 show the simulation results.

Fig. 2. Plot of the PV system power provided by control model with PID controller based on
P&O algorithm and the control model on the basis of a MFNN respectively.

According to Fig. 3, the response time using the IC algorithm is not better than the one using the proposed algorithm in the first 0.5 s. This means that the control signal created by the IC algorithm within the transient mode overshoots. From time = 2.2 s to 3 s the PV system energy produced by the control model with the PID controller based on the IC algorithm drops to zero.

Fig. 3. Plot of the PV system power provided by the control model with PID controller based on
the IC algorithm and the control model on the basis of a MFNN respectively.

The proposed PV system control model is more robust and provides more power (Figs. 2 and 3) in comparison with the control models with the PID controller (based on the P&O or IC algorithm). Figure 2 shows the misjudgment phenomenon for the P&O algorithm when the solar irradiance continuously increases (time t ∈ T = [0.3 s, 0.4 s] ∪ [0.8 s, 1 s] ∪ [1.7 s, 2.1 s]). In such situations, the proposed PV system control model, which is based on a modified fuzzy neural net, produces on average 8.6% more energy than the standard model based on the perturbation and observation algorithm: 100% · (Σ_{t∈T} P_MFNN^t − Σ_{t∈T} P_P&O^t) / Σ_{t∈T} P_P&O^t = 8.6%, where P_MFNN is the energy provided by the proposed PV system control model based on a modified fuzzy neural net and P_P&O is the energy provided by the standard model based on the P&O algorithm.
During time t ∈ [1.1 s, 1.3 s] ∪ [1.5 s, 1.7 s] ∪ [2.2 s, 3 s] the PID controller based on the IC algorithm generates a huge numerical value of the control signal (the value of the control signal u ∈ [5.0706e+32, 5.6385e+33]) as a result of sudden fluctuations in the solar irradiance, while the proposed PV system control model provides the maximum PV power (Figs. 1, 3 and 4).

Fig. 4. Plot of the control signal provided by the PID controller based on the IC algorithm.

The MFNN provides a more suitable approach to the MPPT problem, with good tracking accuracy. Extensive simulation studies on the Octave model have been carried out with different initial conditions, different disturbance profiles, and variations in the photovoltaic system and solar irradiation level parameters. The results show that consistent performance has been achieved for the proposed PV system control model, with good stability and robustness as compared with the standard model with a PID controller.

3 Conclusions

It is shown that the PV system control model on the basis of a MFNN is robust to PV
system uncertainties. Unlike popular approaches to nonlinear control, a MFNN is used
to approximate the control law and not the system nonlinearities, which makes it
suitable over a wide range of nonlinearities. Compared to standard MPPT algorithms,
including P&O and IC, the PV system control model on the basis of a MFNN produces
good response time, low overshoot, and, in general, good performance. Simulation
comparison results for a PV system demonstrate the effectiveness of the PV system
control model on the basis of a MFNN as compared with the standard model with a
PID controller (based on P&O, or IC algorithm). It is our contention that the proposed
modified fuzzy neural net architecture can have generic control applications to other
kinds of systems, and produce a competitive alternative algorithm to neural networks
and PID controllers.

Acknowledgement. The reported study was funded by RFBR and Republic of Khakassia
according to the research project №. 19-48-190003.

References
1. Beta-energy official page. https://www.betaenergy.ru/insolation/abakan. Accessed 27 Apr
2019
2. Tavares, C.A.P., Leite, K.T.F., Suemitsu, W.I., Bellar, M.D.: Performance evaluation of PV
solar system with different MPPT methods. In: 35th Annual Conference of IEEE Industrial
Electronics IECON 2009, pp. 719–724 (2009)

3. Kumar, A., Chaudhary, P., Rizwan, M.: Development of fuzzy logic based MPPT controller
for PV system at varying meteorological parameters. In: 2015 Annual IEEE India Conference
(INDICON), New Delhi, pp. 1–6 (2015)
4. Engel, E.A., Engel, N.E.: Temperature forecasting based on the multi-agent adaptive fuzzy
neuronet. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V. (eds.) Advances in Neural
Computation, Machine Learning, and Cognitive Research. Neuroinformatics 2018. Studies in
Computational Intelligence, vol. 736. Springer, Cham (2019)
Impact of Assistive Control on Operator
Behavior Under High Operational Load

Mikhail Kopeliovich, Evgeny Kozubenko, Mikhail Kashcheev,


Dmitry Shaposhnikov, and Mikhail Petrushan

Research Center of Neurotechnologies, Southern Federal University,


Rostov-on-Don, Russia
[email protected]

Abstract. This work describes the impact on the operator's performance of an artificial assistant that is applied to correct the operator's actions in case of unsafe or ineffective behavior. In order for assistive control to be effective, a method to evaluate and predict operator performance should be applied. This paper presents a model of operator activity based on histograms of the distribution of reaction times to particular stimuli. The model is then applied to the task of monitoring operator activity in a controlled environment designed to emulate certain actions performed by an aircraft pilot. For each subject, an individual behavioral portrait is made. Then, performance changes under high operational load conditions and the impact of assistive control are evaluated.

Keywords: Adaptive behavior · Assistive control ·


Model of operator activity

1 Introduction

Safety and performance of different processes (driving, flying, manufacturing,


building and others) depend on the efficiency of the operator’s behavior and
accuracy of their actions. Therefore, the operator’s activity should be monitored
to forecast process performance [1,7]; in a case of unsafe behavior, the activity
should be corrected by an artificial assistant. The problem is considered with a focus on the activities of an aircraft pilot, although it is formulated in a universal manner that allows the monitoring and control methods to be transferred to other types of operator activity. A universal model (Fig. 1) of operator activity
is formalized, including components of monitoring and control of the perfor-
mance and safety of behavior. This model is applied to the task of controlling
operator activity in a developed test experiment, where particular elements emu-
late certain piloting actions. In particular, one of the tasks in the test experiment
is “alignment of the artificial horizon” (see Sect. 3).
To implement the procedure for monitoring and controlling operator activ-
ities, it is necessary to determine the criteria of the effectiveness and safety of

behavior and methods of their evaluations. The following approach was imple-
mented: step 1—a list of events and situations is determined which may occur
or take place in the process of operator activity and to which a specific oper-
ator response is required, consisting of a specific set of actions. Further, step
2—determining the list of expected actions and methods of their registration.
Actions are characterized by a number of parameters that can be recorded. Thus,
step 3—registration of action parameters. Video surveillance system and feed-
back from the onboard system of the experimental setup (from the object of
operator activity) are used to capture the operator’s actions and their parame-
ters. From the onboard system, information is obtained about the presence (or
absence) of the action (pressing the key, switching the toggle switch, etc., see
Sect. 3) and the characteristics of the action (latency relative to the event for
which this action is expected).
Some actions or characteristics allow direct evaluation of the performance and
safety of behavior. For example, the absence of response to a particular stimulus
or a long reaction time (RT) can be interpreted by the control system as a
failed task (or an inefficiently performed task). Such control is the “first control
layer”. If the RT (or another parameter of the behavioral response) falls within
the permissible range, the behavior characteristics are analyzed in the “second
control layer” (or the “differential control layer”). In the second control layer, the
current observing behavior is compared with that typical for the operator at a
specific event. Variation of the characteristics of the operator’s actions within the
acceptable range depends on his physiological and psychological capabilities, on
the current state, on distractions, and it is difficult to directly interpret certain
values of such characteristics in terms of effectiveness and safety of behavior.
The project verifies the hypothesis that deviations of the characteristics of the
operator’s actions from those typical for a particular event are a correlate of the
efficiency and safety of operator activity and (or) allow us to make a forecast of
the effectiveness and safety of future behavior. Safety behavior can be formalized
as the probability of making a critical control error. Efficiency—in the form of the
number of non-critical errors per unit of time, possibly weighted by the degree of
significance. We identify the deviation from typical behavior by trying to classify
the feature vector comprised of behavioral parameters. If it fails to classify as a
model, which belongs to “typical behavior” class of particular operator, we treat
it as a possible correlate of non-optimal performance (see Sect. 4).
The overall scheme of the operational cycle with assistive control is presented
in Fig. 1. In general, it is similar to schemata of control systems in works [2–
4,8–10]. Assistive support components are described in works [3,11] and are
implemented in our approach in a similar manner.
This work is a summary of selected results of the project no. 2.955.2017/4.6,
supported by the Russian Ministry of Science and Higher Education.

Fig. 1. Operational cycle with assistive control

2 Problem Statement
We evaluated the operator's behavior changes under high operational load conditions, which we model by positioning stimuli densely in time, where each stimulus requires a certain response. A particular case of assistance is tested which
involves: (a) recognition of potentially non-optimal performance by classification
of features vector comprised of behavioral parameters, and (b) adding latency
to particular stimuli visualization.
To select the characteristics that make up the “individual portrait of behav-
ior” (a schema of actions and a list of ranges of their characteristics under certain
events), their variability is analyzed in a series of experiments. It makes sense to
carry out an analysis of the characteristics of behavioral reactions only after the
skill is established. The skill is considered established when the initial stage of
learning a new type of operator activity ends and performance indicators reach
a quasi-constant level.
It is assumed (and confirmed in our test experiment) that operational failure
is caused by perception conflict and, thus, serialization of visualization of quasi-
simultaneous stimuli may lead to more optimal performance despite the fact that
artificial latency of stimulus visualization itself increases RT.

3 Methods

We applied the general model of operational activity with assistive control


(Fig. 1) to a particular case of the test experiment, which involves high-load operator activity. The task for a test subject is to react, as quickly and precisely as possible, to stimuli from the output devices (Fig. 2).

Fig. 2. Experimental setup. Output devices: left (1) and right (2) monitors, LED panel
(3), speakers (4). Input devices: switch panel (5), keyboard (6), joystick (7)

According to Fig. 2, the stimuli come from monitors (1) and (2) in front of
the subject, the LED panel (3) and speakers (4). The image on the left monitor
(1) imitates artificial horizon (white line at the center), continuously changing
its tilt and height in random directions at a frequency of 10 Hz. On the right
monitor (2) there are the timer (upper-left corner) that starts from 15 s, and
restarts after the correct reaction is received, the penalty counter (bottom-left
corner) and the shape (circle, square, triangle, pentagonal star or hexagonal
star), which is changing randomly with 15–20 s interval. LED panel contains 6
rows of 3 diodes in each. During the experiment, a pattern of 1 to 9 randomly
chosen diodes is active, changing after random intervals of 2–12 s. Sound signal
with length about 100 ms rings after random intervals of 5–5.3 s.
Subjects were asked to react to the stimuli using the following input devices: the switch panel (5) with 5 toggles corresponding to the possible shapes appearing on the right monitor (2), the computer keyboard (6) and the Thrustmaster T.16000M joystick (7).
During the experiment, the subjects were instructed to react quickly to the following four stimuli:
1. Timer: when the timer on the right monitor (2) ends, to press a key on the
keyboard corresponding to the number of active diodes on the LED panel (3).
2. Shape: to switch toggles on the switch panel (5) corresponding to shape on
the right monitor.
3. Sound: to press the joystick (7) trigger on the sound signal.
4. Horizon: to hold the artificial horizon (1) aligned with two black horizontal bars using the joystick (7), while the horizon line randomly changes the height of its center with a speed of 0%–2.5% of the monitor width per second and randomly changes its rotation angle with a speed of 0–5° per second.
The experiment goes on for 3 min (success) or until reaching a threshold of 100
penalty points (failure). Table 1 illustrates permissible RT to the stimuli and the
corresponding penalties.

Table 1. Permissible reaction time to external stimuli and penalty points in case of a late/erroneous reaction

Stimulus  | Permissible RT (s) | Penalty for late reaction (per second)       | Penalty for wrong reaction
Timer     | 3                  | 5                                            | 3
Shape     | 3                  | 5                                            | 2
Sound     | 2                  | 5                                            | 3
Horizon   | 0                  | Proportional to the deviation of the horizon | None

Five male subjects aged from 23 to 40 took part in the experiments. During one session, each subject took part in 5 experiments; each subject participated in one or more sessions with a 1- to 2-week interval in between. The Bioethics Committee of
SFedU approved the experimental protocol. Each volunteer signed the agreement
to participate in the experiment.
After the described experiments (will be further referred to as Normal Mode),
the subjects took part in two types of experiments with increased operational
load caused by the increased temporal density of stimuli (Hard Mode 1 and 2).
For Hard Mode 1, the Timer, Shape and Sound stimuli occur 30% to 60% more often; in addition, there is a 50% chance for the shape to change about 0–2 s before or after the Timer ends.
Hard Mode 2 features the same changes as Hard Mode 1; in addition, when the shape changes 0–1.5 s before or 0–2 s after the Timer ends, the new shape is shown only after a correct reaction to the Timer stimulus or after 5 s have passed.
Hard Mode experiments were performed starting with Hard Mode 1, consist-
ing of 3 sessions of 4 to 5 experiments in each with a 1-week interval between
sessions.
We assume that the histogram of the distribution of the RT may be the basis for building a model of the subject's behavior, which allows identifying the subject by matching reaction characteristics and determining the deviation of behavior from the subject's typical one in subsequent experiments. Reaction time is widely used for evaluating operator performance, for failure prediction [5], and for modeling human behavior [6].
In the study, the histogram of RT distribution on a certain stimulus is con-
sidered as a model of the subject’s behavior. Each stimulus is considered inde-
pendently. The problem of verification of the subject’s identity is based on an
analysis of his RT distributions obtained in one or more experiments.
Identity verification algorithm:

1. Create a model of the subject response for a certain stimulus: according to


the data available in the dataset calculate the histogram of the distribution
of the RT for the stimulus.
2. Get a subsample of RT values of the subject’s reactions from one or more
experiments.
3. Calculate the distance (or measure of proximity) between the subsample and
the subject’s response model. The subject is considered to be successfully
verified if the distance is less than a fixed threshold (or more, in case of a
proximity measure).
To determine the thresholds for each stimulus and different subsample size, a
histogram of the RT distribution of the subject is chosen from the available
dataset of experimental results, which is compared with the histogram-model of
the subject (d1 value) and with histograms-models of other subjects (d2 values).
Subsamples are generated by randomly selecting K values from the subject’s RT
to the stimulus. Considered subsample sizes K are 4, 8, 16, 32, 64, 128. The set
of RT values of the subject for generating the subsample is contained in the set
of values for generating the model of the subject, which may affect the results
of comparing the subsample with the model. This will be discussed further.
For each of the five subjects, 100 independent calculations are performed for
each sample size, resulting in a set of 500 d1 values (for five subjects, 100 com-
parisons with their model) and 2000 d2 values (for five subjects, 100 comparisons
with the model of each of the other four subjects). The following functions of
histogram comparison are considered: chi-square and correlation.
The chi-square distance function is as follows:

d_{\chi}(H_1, H_2) = \sum_I \frac{(H_1(I) - H_2(I))^2}{H_1(I)},    (1)

where H1 is the histogram of the model distribution, H2 is the histogram of the


subsample distribution.
The correlation function is equal to the correlation coefficient:

d_{corr}(H_1, H_2) = \frac{\sum_I (H_1(I) - \bar{H}_1)(H_2(I) - \bar{H}_2)}{\sqrt{\sum_I (H_1(I) - \bar{H}_1)^2 \, \sum_I (H_2(I) - \bar{H}_2)^2}},    (2)

where \bar{H}_1, \bar{H}_2 are the average values of the corresponding histograms.


The value of chi-square function represents distance: the smaller its value,
the “closer” the corresponding histograms are. The value of correlation func-
tion represents proximity: the greater its value, the “closer” the corresponding
histograms are.
The identity verification algorithm requires a threshold for histogram com-
parison. The choice of the threshold in real applications should be made taking
into account the specifics of a task. In this paper, we use such a threshold that
leads to no more than 5% False Rejection Rate (FRR) and minimizes the False
Acceptance Rate (FAR).
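A minimal Python sketch of the verification step, assuming the reaction times are already collected. The bin edges, the histogram normalization, the small eps guard against empty model bins, and the way the threshold is passed in are illustrative assumptions rather than the authors' exact implementation.

    import numpy as np

    def chi_square_distance(h_model, h_sample, eps=1e-12):
        # Eq. (1): smaller value means the histograms are closer
        return float(np.sum((h_model - h_sample) ** 2 / (h_model + eps)))

    def correlation(h_model, h_sample):
        # Eq. (2): larger value means the histograms are closer
        a = h_model - h_model.mean()
        b = h_sample - h_sample.mean()
        return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

    def verify_identity(rt_model, rt_subsample, bins, threshold, use_correlation=False):
        # Compare the RT histogram of a subsample with the subject's model histogram.
        h1, _ = np.histogram(rt_model, bins=bins, density=True)
        h2, _ = np.histogram(rt_subsample, bins=bins, density=True)
        if use_correlation:
            return correlation(h1, h2) >= threshold        # proximity measure
        return chi_square_distance(h1, h2) <= threshold    # distance measure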

4 Results
The FAR (for the operator identification task) for the correlation and chi-square functions decreases with the increase in the subsample size. The FAR of the selected thresholds for the Timer and Shape stimuli (Table 2) is low as compared to the Sound and Horizon stimuli. This is due to two factors: first, the number of RT
values for these stimuli in the dataset for each test is about 300 values, making
the subsample of 128 values close to the full model of the subject, substantially
reducing the distance (or increasing proximity) between them; second, the cor-
rect response to these stimuli is more difficult for the subject than to the Sound
and Horizon stimuli, which can lead to visible differences in the behavior of the
subjects.

Table 2. The FAR for the Timer and Shape stimuli calculated at a fixed FRR of 5%

Subsample size | Chi-square          | Correlation
               | Timer     Shape     | Timer     Shape
4              | 93%       84%       | 67%       54%
8              | 91%       85%       | 51%       29%
16             | 83%       73%       | 33%       20%
32             | 73%       53%       | 12%       5%
64             | 12%       12%       | 0%        1%
128            | 0%        0%        | 0%        0%

We define the subject’s error as an average penalty per second value within an
experiment. An average error is defined as error averaged among all experiments
for a certain Mode. Table 3 illustrates the subject’s average error in different
scenarios and the portion of average error reduction, which is calculated as the
proportion of experiments in the Hard Mode 2 in which the subject’s error was
lower than the average error in the Hard Mode 1. It can be seen that for most
subjects, the addition of a visual delay in the appearance of the stimulus on
average increased efficiency. There are special cases with Subject 3, where the
efficiency has increased significantly, and with Subject 4, where the efficiency,
on the contrary, has decreased. Such changes indicate the individual nature of
the perception of simultaneous stimuli and different behaviors that are optimal
for different subjects.

Table 3. Average errors and error reduction coefficients when adding visual delay to the Sound stimulus. Explanations in the text

Subject | Average error in Hard Mode 1 | Average error in Hard Mode 2 | Portion of average error reduction
1       | 0.81                         | 0.78                         | 0.73
2       | 0.63                         | 0.60                         | 0.71
3       | 1.11                         | 0.87                         | 1.00
4       | 0.58                         | 0.65                         | 0.33
5       | 0.70                         | 0.70                         | 0.71

In the case of Normal Mode experiments, the correlation coefficients between


the set of error values of all subjects in all experiments and the set of values of the
distances or proximity of the subsamples obtained from the relevant experiments
to the model of the subject are calculated.

Table 4. The correlation coefficients calculated for the metrics under consideration between the error and the correspondence of the subject's model

Metrics     | Sound | Timer | Horizon | Shape
Chi-square  | 0.35  | 0.14  | 0.25    | 0.21
Correlation | −0.26 | −0.18 | −0.08   | −0.29

Table 4 indicates low correlation for any metric and any stimulus. The highest
values of the coefficient are achieved on the Sound stimulus. Chi-square function
represents distance, so the correlation is positive: the greater the distance to
the own model, the greater the error. Similarly, we can explain the negative
correlation for the correlation function.
Normal Mode experiments, where the subject’s error was more than twice
their average error are referred to as failed experiments (experiments with critical
errors). There were three such experiments: one for each of the Subjects 3, 4,
and 5. We consider the hypothesis that the model of the subject’s behavior in
these experiments is significantly different from the subject general model. To
test the hypothesis, an analysis similar to the problem of recognition of subjects
is carried out, except for only three subjects participating in the analysis, and
only values in the selected experiments being used as subsamples of RT values.

5 Conclusion
A behavioral model of the operator was built based on histograms of the dis-
tribution of reaction times (RT) for particular stimuli in the test experiment
which imitates certain pilot’s actions. It was shown that such RT distribution
is unique to an individual, and therefore the operator can be identified based
only on behavioral model matching. Accuracy of such identification depends on
a number of reaction times registered for the person being identified. For exam-
ple, having 64 measurements of RT for Timer or Shape stimuli, FAR was 0% at
fixed FRR = 5% when identifying a particular operator among 5 possible ones.
According to our results, deviation of RT in particular experiment from typical
distribution for the operator is a weak correlate of performance; strong devia-
tion may indicate worse performance. Failure of behavioral model identification
of particular operator (besides indicating possible operator replacement) is a
strong indicator of unsafe or inefficient behavior, especially for the Shape stim-
ulus. A particular case of assistive control was considered which involves adding
latency to stimuli visualization for suppression of perceptive or cognitive conflict
which occurs if an operator’s reaction is required for multiple stimuli appearing


simultaneously. This assistive control had an individual impact on an operator’s
performance; it increases the performance of most of the subjects in the test
experiment.
The behavior model of the subject in a failed experiment differs from the
typical behavior model by some metrics. For example, only 8% of such samples
have been attributed to the general model of the operator behavior if using
correlation as proximity measure, hence in 92% of cases, the model behavior
would be recognized as atypical for this metric.

Acknowledgments. This work is supported by the Russian Ministry of Science and


Higher Education, project no. 2.955.2017/4.6.

References
1. Aloui, Z., Ahamada, N., Denoulet, J., Pierre, F., Rayrole, M., Gatti, M., Granado,
B.: Embedded real-time monitoring using SystemC in IMA Network. In: SAE 2016
Aerospace Systems and Technology Conference, September 2016, Hartford, United
States, pp. 1–4 (2016)
2. Didactic, F.: Process Control. Pressure, Flow, and Level. Legal Deposit — Library
and Archives Canada (2010)
3. Dittmeier, C., Casati, P.: Evaluating internal control systems. In: IIARF (2014)
4. Fotopoulos, J.: Process Control and Optimization Theory. Application to Heat
Treating Processes. Air Products and Chemicals Inc, Allentown (2006)
5. Kim, B., Bishu, R.: On assessing operator response time in human reliability anal-
ysis (HRA) using a possibilistic fuzzy regression model. Reliab. Eng. Syst. Saf.
52(1), 27–34 (1996)
6. Mahmud, J., Chen, J., Nichols, J.: When will you answer this? Estimating response
time in Twitter. In: Proceedings of the Seventh International AAAI Conference on
Weblogs and Social Media, pp. 697–700 (2013)
7. Martin, S., Vora, S., Yuen, K., Trivedi, M.: Dynamics of driver’s gaze: explorations
in behavior modeling and maneuver prediction. IEEE Trans. Intell. Veh. 3(2),
141–150 (2018)
8. O’Connor, D.: A Process Control Primer. Honeywell, Charlotte (2000)
9. Olum, Y.: Modern management theories and practices. In: East African Central
Banking Course, vol. 1, No. 11, pp. 5–6 (2004)
10. Rao, G.P.: Basic elements of control system. Control Syst. Robot. Autom. 1 (2009)
11. Stouffer, K., Pillitteri, V., Lightman, S., Abrams, M., Hahn, A.: Guide to industrial
control systems (ICS) security (2015)
Hierarchical Actor-Critic with Hindsight
for Mobile Robot with Continuous State Space

Staroverov Aleksey1 and Aleksandr I. Panov2,3

1 Bauman Moscow State University, Moscow, Russia
[email protected]
2 Artificial Intelligence Research Institute, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia
[email protected]
3 Moscow Institute of Physics and Technology, Moscow, Russia

Abstract. Hierarchies are used in reinforcement learning to increase learning speed in sparse reward tasks. In this kind of task, the main problem is the time required for the initial policy to reach the goal during the first steps. Hierarchies can split a problem into a set of subproblems that can be reached in less time. In order to implement this idea, Hierarchical Reinforcement Learning (HRL) algorithms need to be able to learn the multiple levels within a hierarchy in parallel, so these smaller subproblems can be solved at the same time. The most prominent existing HRL algorithms that can learn multi-level hierarchies are not able to efficiently learn levels of policies simultaneously, especially in environments with continuous state and action spaces. To address this problem, we analyzed the recent Hierarchical Actor-Critic with Hindsight (HAC) framework, tested it in a simulated mobile robot environment, and determined the optimal configuration of parameters and ways to encode information about the environment states.

Keywords: Hierarchical Actor-Critic · Hindsight Experience Replay · Reinforcement learning

1 Introduction

Hierarchy has the potential to greatly accelerate reinforcement learning in sparse


reward tasks because hierarchical agents can decompose problems into smaller sub-
problems. In order to take advantage of the sample efficiency benefits of multi-level
hierarchies, an HRL algorithm must be able to learn the multiple levels within the
hierarchy in parallel. That is, hierarchical agents need to be able to simultaneously learn
both the appropriate subtasks and the sequences of primitive actions that achieve each
subtask. Yet the existing HRL algorithms that are capable of automatically learning
hierarchies in continuous domains [1–5] do not efficiently learn the multiple levels
within the hierarchy in parallel [10–12]. Instead, these algorithms often resort to
learning the hierarchy one level at a time. HAC can learn multiple levels of policies in
parallel. The hierarchies produced by HAC framework are comprised of a set of nested,

goal-conditioned policies that use the state space to decompose a task into short subtasks. Its authors demonstrate experimentally in both grid world and simulated robotics domains that the HAC approach can significantly accelerate learning relative to other non-hierarchical and hierarchical methods. Thus, the HAC framework [6] is the first to successfully learn 3-level hierarchies in parallel in tasks with continuous state and action spaces (Fig. 1).

Fig. 1. Results of HAC framework with one, two and three level of hierarchy.

As a more realistic example, we tested the HAC method using a simulated mobile robot. The simulated environment and robot are shown in Fig. 2. The environment has dimensions of 10 units by 10 units, with walls and two visible rooms. The robot has nine sonar sensors, a simple vision system with a 1-D ‘retina’ having a 135° field of view, and a gripper with a sensor that signals when a sphere is being gripped. The vision system was modeled as a pinhole camera. Each sonar had a 15° field of view. The sonar reading is the distance to the closest object within that 15° field of view, but it does not contain the exact heading to the object. This is consistent with the observed behavior of sonars on physical robotic platforms.

Fig. 2. Simulated mobile robot environment (left) and HAC hierarchy (right).

The primitive action set consists of five actions: moving forward by a distance, moving forward with a rotation to the left or right, and rotating left or right without moving forward. If the robot reaches the yellow sphere, the simulation is over and the goal is achieved.

2 Hierarchical Actor-Critic Algorithm

The hierarchies produced by HAC framework have a specific architecture consisting of


a set of nested, goal-conditioned policies that use the state space as the mechanism for
breaking down a task into subtasks. The hierarchy of nested policies works as follows.
The highest-level policy takes as input the current state and goal state provided by the
task and outputs a subgoal state. This state is used as the goal state for the policy at the
next level down. The policy at that level takes as input the current state and the goal
state provided by the level above and outputs its own subgoal state for the next level
below to achieve. This process continues until the lowest level is reached. The lowest
level then takes as input the current state and the goal state provided by the level above
and outputs a primitive action (Fig. 2). Further, each level has a certain number of attempts to achieve its goal state. When a level either runs out of attempts or achieves its goal state, execution at that level ceases and the level above outputs another subgoal.
The purpose of the HAC framework is to efficiently learn a k-level hierarchy Π_{k-1} consisting of k individual policies π_0, ..., π_{k-1}, in which k is a hyperparameter chosen by the user. In order to learn π_0, ..., π_{k-1} in parallel, the HAC framework transforms the original Universal Markov Decision Process (UMDP) U_original = (S, G, A, T, R, γ) into a set of k UMDPs U_0, ..., U_{k-1}, in which U_i = (S_i, G_i, A_i, T_i, R_i, γ_i).
In our example with the mobile robot, we chose HAC with 3 levels of policies (π_3, π_2, π_1) as the most successful method tested in the original article. The green sphere is the goal of the level-2 policy (π_2), and the purple sphere is the goal of the level-1 policy (π_1) (Fig. 2). When the algorithm is fully trained, the agent must first reach the purple sphere within at most 20 ticks of time, then reach the green sphere within at most 20 attempts of generating the purple sphere, and finally reach the goal yellow sphere within at most 20 attempts of generating the green sphere.
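The nested execution described above can be summarized by a short recursive Python sketch. Here policies, env.step and env.goal_reached are hypothetical placeholders for the trained goal-conditioned policies and the simulator interface; the sketch only illustrates the control flow, not the authors' implementation.

    def run_level(level, state, goal, policies, env, max_attempts=20):
        # Each level proposes subgoals for the level below; level 0 acts directly.
        for _ in range(max_attempts):
            action = policies[level](state, goal)   # subgoal state, or primitive action at level 0
            if level == 0:
                state = env.step(action)            # execute the primitive action
            else:
                state = run_level(level - 1, state, action, policies, env, max_attempts)
            if env.goal_reached(state, goal):       # level succeeded, return control upward
                break
        return state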
The HAC approach enables agents to learn multiple policies in parallel using only sparse reward functions because of two types of hindsight transitions. An example of such transitions is shown in the following simple toy environment (Fig. 3).

Fig. 3. An example episode trajectory of the toy environment.



The tic marks along the trajectory show the next states for the robot after each
primitive action is executed. The pink circles show the original subgoal actions. The
gray circles show the subgoal states reached in hindsight after at most H actions by the
low-level policy.
Hindsight action transitions help agents learn multiple levels of policies simulta-
neously by training each subgoal policy with respect to a transition function that
simulates the optimal lower level policy hierarchy.
For the toy example, the hindsight action transitions for the states s0 and s1 would be:
• [initial state = s0, action = s1, reward = −1, next state = s1, goal = yellow flag, discount rate = γ]
• [initial state = s1, action = s2, reward = −1, next state = s2, goal = yellow flag, discount rate = γ]
The second type of hindsight transition, hindsight goal transitions, helps each level
learn a goal-conditioned policy in sparse reward tasks by extending the idea of
Hindsight Experience Replay [7] to the hierarchical setting.
The hindsight goal transition created by the fifth primitive action that achieved the
hindsight goal would be:
• [initial state = 4th tick mark, action = joint torques, reward = 0, next state = s1,
goal = s1, discount rate = 0]
Assuming the last state reached s5 is used as the hindsight goal, the first and the last
hindsight goal transition for the high level would be:
• [initial state = s0, action = s1, reward = −1, next state = s1, goal = s5, discount rate = γ]
• [initial state = s4, action = s5, reward = 0, next state = s5, goal = s5, discount rate = 0]
Hindsight goal transitions should significantly help each level learn an effective
goal-conditioned policy because it guarantees that after every sequence of actions, at
least one transition will be created that contains the sparse reward (in our case a reward
and discount rate of 0). These transitions containing the sparse reward will in turn
incentivize the UVFA critic function to assign relatively high Q-values to the (state,
action, goal) tuples described by these transitions. The UVFA can then potentially
generalize these high Q-values to the other actions that could help the level solve its
tasks.
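The two transition types can be sketched as follows in Python; the dictionary keys, the default discount value, and the simplification to a single (start state, reached state) pair are illustrative assumptions, not the exact data layout of the HAC implementation.

    def hindsight_transitions(start_state, reached_state, original_goal, gamma=0.98):
        # Hindsight action transition: replace the proposed subgoal action with the
        # state actually reached, as if the lower level were already an optimal policy.
        action_transition = dict(state=start_state, action=reached_state,
                                 reward=-1.0, next_state=reached_state,
                                 goal=original_goal, discount=gamma)
        # Hindsight goal transition: relabel the goal with the reached state, so at
        # least one transition carries the sparse reward (reward and discount of 0).
        goal_transition = dict(state=start_state, action=reached_state,
                               reward=0.0, next_state=reached_state,
                               goal=reached_state, discount=0.0)
        return action_transition, goal_transition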
Technically, HAC builds off three techniques from the reinforcement learning lit-
erature [2]:
• the Deep Deterministic Policy Gradient (DDPG) learning algorithm [8]
• Universal Value Function Approximators (UVFA) [9]
• Hindsight Experience Replay (HER) [7].

2.1 DDPG: An Actor-Critic Algorithm


DDPG serves as the key learning infrastructure within Hierarchical Actor-Critic. It is an
actor–critic algorithm and thus uses two neural networks to enable agents to learn from
experience.
The actor network learns a deterministic policy that maps from states to actions

\pi : S \to A    (1)

The critic network approximates the Q-function or the action-value function of the
current policy.

Q^{\pi}(s_t, a_t) = E[R_t \mid s_t, a_t]    (2)

where R_t is the discounted sum of future rewards.


Thus, the critic network maps from (state, action) pairs to expected long-term
reward:

Q : S \times A \to R    (3)

The agent first interacts with the environment for a period using a noisy policy π(s) + N(0, 1). The transitions experienced are stored as (s_t, a_t, r_t, s_{t+1}, g_t).
The agent then updates its approximation of the Q-function of the current policy by performing mini-batch gradient descent on the loss function:

L = (Q(s_t, a_t) - y_t)^2    (4)

y_t = r_t + \gamma Q(s_{t+1}, \pi(s_{t+1}))    (5)

where y_t is the Bellman estimate of the Q-function.


The agent modifies its policy based on the updated approximation of the action-value function. The actor is trained by moving its parameters in the direction of the gradient of Q with respect to the actor's parameters. The hierarchical policy is composed of multiple goal-based policies, or actor networks (Fig. 4).
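A minimal PyTorch sketch of one goal-conditioned DDPG update along the lines of Eqs. (4)–(5). The network and optimizer objects, the batch layout, and the use of a per-transition discount (as in the hindsight transitions above) are assumptions made for illustration; this is not the authors' exact training code.

    import torch
    import torch.nn.functional as F

    def ddpg_update(actor, critic, target_actor, target_critic,
                    actor_opt, critic_opt, batch):
        # batch: tensors (state, action, reward, next_state, goal, discount)
        s, a, r, s_next, g, discount = batch
        with torch.no_grad():
            a_next = target_actor(torch.cat([s_next, g], dim=-1))
            # Eq. (5): Bellman target for the goal-conditioned critic
            y = r + discount * target_critic(torch.cat([s_next, a_next, g], dim=-1))
        # Eq. (4): critic regression loss
        q = critic(torch.cat([s, a, g], dim=-1))
        critic_loss = F.mse_loss(q, y)
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()
        # Actor update: ascend the critic's estimate of Q w.r.t. the actor parameters
        a_pred = actor(torch.cat([s, g], dim=-1))
        actor_loss = -critic(torch.cat([s, a_pred, g], dim=-1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()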

2.2 Universal Value Function Approximator


Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ.

The main idea of a Universal Value Function Approximator (UVFA) [9], compared to an ordinary value function approximator, is to generalize not just over states s but also over goals g. UVFAs improve learning by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors (Fig. 5).
HAC with Hindsight for Mobile Robot with Continuous State Space 67

Fig. 4. Actor-Critic networks for hierarchical policy with 1 sub-goal layer.

Fig. 5. Diagram of the function approximation architectures.

On the left side of the figure is the concatenated architecture. In the center is the two-stream architecture with two separate sub-networks combined at h. On the right is a decomposed view of the two-stream architecture trained in two stages, where target embedding vectors are formed by matrix factorization (right sub-diagram) and two embedding networks are trained with those as multi-variate regression targets (left and center sub-diagrams).
Thus, instead of Q^{\pi}(s_t, a_t) = E[R_t \mid s_t, a_t], we use Q^{\pi}(s_t, a_t, g_t) = E[R_t \mid s_t, a_t, g_t].
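For concreteness, a goal-conditioned critic in the concatenated UVFA form can be sketched in PyTorch as below; the layer sizes and the plain feed-forward structure are illustrative assumptions rather than the architecture used in the experiments.

    import torch
    import torch.nn as nn

    class GoalConditionedCritic(nn.Module):
        # Concatenated UVFA architecture: Q(s, a, g) instead of Q(s, a).
        def __init__(self, state_dim, action_dim, goal_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim + goal_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state, action, goal):
            return self.net(torch.cat([state, action, goal], dim=-1))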

2.3 Hindsight Experience Replay


As the toy robot example illustrates, it can be difficult for any level in our framework to
receive the sparse reward. In the Actor-Critic algorithm, a buffer of past experiences is
used to stabilize training by decorrelating the training examples in each batch used to
update the neural network. This buffer records past states, the actions taken at those
states, the reward received, the next state that was observed, and the goal that we wanted to
achieve. As we have seen, the data in the experience replay buffer can originate from an
exploration policy, which raises an interesting possibility: what if we could add fictitious
data by imagining what would happen had the circumstances been different? This
is exactly what Hindsight Experience Replay (HER) [7] does. Even though an
agent may have failed to achieve its given goal in an episode, the agent did learn a
sequence of actions to achieve a different objective in hindsight – the state in which the
agent finished.
Thus, learning how to achieve different goals in the goal space should help the
agent better determine how to achieve the original goal. This is done by creating a separate copy of
the transitions that occurred in an episode and replacing (as sketched below):
• the original goal with the goal achieved in hindsight
• the original reward with the appropriate value given the new goal.
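
The following sketch illustrates this relabelling step; the tuple layout, the helper names and the 0/−1 reward convention of the sparse-reward setting are assumptions of the sketch, not the authors' code.

def her_relabel(episode, goal_from_state, reward_fn):
    """episode: list of (state, action, reward, next_state, goal) tuples."""
    # The goal achieved in hindsight is the state the agent actually finished in.
    achieved_goal = goal_from_state(episode[-1][3])
    relabelled = []
    for state, action, _, next_state, _ in episode:
        # e.g. 0 if next_state reaches achieved_goal, -1 otherwise
        new_reward = reward_fn(next_state, achieved_goal)
        relabelled.append((state, action, new_reward, next_state, achieved_goal))
    return relabelled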

3 Experiment

For the mobile robot environment, we choose as features the x, y position, the angle of rotation, the x, y components of velocity, and the sensor data. The combination of all of these features gives better performance than using only the x, y components and the rotation angle. Sensor data alone lead to serious problems, because at different agent positions the sensors can return the same data; as a result, higher levels of the hierarchy fail to give a proper goal to the lower level. Without the velocity vector, the agent also has trouble with the learning process, because the next state starts to depend on the previous state, which breaks the MDP assumption (Fig. 6).

Fig. 6. Figure compares the performance of HAC with sensors data features (right) and without
sensors data features (left). The charts show the average success rate.

4 Results

Hierarchy has the potential to accelerate learning in sparse-reward tasks because it can decompose tasks into short-horizon subtasks. The HAC framework can solve those simpler subtasks simultaneously. For our mobile robot environment with a continuous state space, HAC outperforms the baseline algorithms by 20–30% on a relatively small environment area. As the size of the environment increases, this gap is expected to keep growing thanks to hindsight actions. One issue with this approach is that we cannot assign rewards other than those used in hindsight. Because of that, we cannot penalize actions that lead the agent into wall collisions, which could be harmful if we trained the algorithm in the real world with a real agent. The biggest advantage of this approach is its hierarchical neural network structure: we can transfer the weights of higher-level networks to another agent or another environment, which dramatically decreases the training time.

Acknowledgements. The reported study was supported by RFBR, research project No. 17-29-07079.

References
1. Schmidhuber, J.: Learning to generate sub-goals for action sequences. In: Kohonen, T.,
Mäkisara, K., Simula, O., Kangas, J. (eds.) Artificial Neural Networks, pp. 967–972.
Elsevier Science Publishers B.V., North-Holland (1991)
2. Konidaris, G.D., Barto, A.G.: Skill discovery in continuous reinforcement learning domains
using skill chaining. Adv. Neural. Inf. Process. Syst. 22, 1015–1023 (2009)
3. Bacon, P.-L., Harb, J., Precup, D.: The option-critic architecture. In: Proceedings of the
Thirty-First AAAI Conference on Artificial Intelligence, pp. 1726–1734 (2017)
4. Vezhnevets, A., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., Kavukcuoglu,
K.: FeUdal networks for hierarchical reinforcement learning. In: Proceedings of the 34th
International Conference on Machine Learning, pp. 3540–3549 (2017)
5. Nachum, O., Gu, S., Lee, H., Levine, S.: Data-efficient hierarchical reinforcement learning.
Adv. Neural. Inf. Process. Syst. 31, 3303–3313 (2018)
6. Levy, A., Konidaris, G., Platt, R., Saenko, K.: Learning multi-level hierarchies with
hindsight. arXiv:1712.00948. [cs.AI], March 2019
7. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B.,
Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay. Adv. Neural. Inf. Process.
Syst. 30, 5048–5058 (2017)
8. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.:
Continuous control with deep reinforcement learning. CoRR (2015). arXiv:1509.02971
9. Silver, D., Schaul, T., Horgan, D., Gregor, K.: Universal value function approximators. In:
International Conference on Machine Learning (July 2015)
10. Shikunov, M., Panov, A.I.: Hierarchical reinforcement learning approach for the road
intersection task. In: Samsonovich, A.V. (ed.) Biologically Inspired Cognitive Architectures
2019. Springer, Cham (2019)

11. Kuzmin, V., Panov, A.I.: Hierarchical reinforcement learning with options and united neural
network approximation. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov,
A. (eds.) Proceedings of the Third International Scientific Conference “Intelligent
Information Technologies for Industry” (IITI 2018), pp. 453–462. Springer, Cham (2018)
12. Ayunts, E., Panov, A.I.: Task planning in “Block World” with deep reinforcement learning.
In: Samsonovich, A.V., Klimov, V.V. (eds.) Biologically Inspired Cognitive Architectures
(BICA) for Young Scientists, pp. 3–9. Springer, Cham (2017)
The Hybrid Intelligent Information
System for Music Classification

Aleksandr Stikharnyi, Alexey Orekhov, Ark Andreev, and Yuriy Gapanyuk(B)

Bauman Moscow State Technical University, Moscow, Russia


[email protected]

Abstract. The article proposes an approach for music classification


problem using hybrid intelligent information systems (HIIS). The HIIS
consists of two main components: the subconsciousness module and the
consciousness module. The subconsciousness module is implemented as
a set of binary classifiers based on the LSTM network. The output of the
subconsciousness module is the metadata stored in the metadata buffer.
The consciousness module is implemented using decision trees approach.
The implementation is based on the CART algorithm from the scikit–
learn library. The output of the consciousness module is the predicted
class of the music classification problem. The experiments were conducted using a custom dataset. Algorithms of three levels of complexity were used in the experiments: the logistic regression approach (the simplest model), the multilayer perceptron approach (the model of medium complexity), and the HIIS approach (the model of high complexity). The results of the experiments confirm the validity of the proposed HIIS approach.

Keywords: Music classification problem ·


Hybrid intelligent information system (HIIS) ·
Subconsciousness module · Consciousness module · LSTM ·
Decision trees

1 Introduction
Recently, the use of machine learning in the field of music processing has been increasing significantly. One of the tasks of such processing is the problem of music classification. A review of existing methods for this problem is given in [1,2]. We propose an approach based on the concept of hybrid intelli-
gent information systems (HIIS). In this article, we will consider the details of
the HIIS-based system implementation for music classification and discuss the
results of experiments.

2 The HIIS-Based Approach for Music Classification


The music classification problem may be described as follows. Let S be an arbi-
trary set of musical compositions consisting of vectors x ∈ S; {1, 2, . . . , N } is a

set of N user classes. Then the classification problem is reduced to the construc-
tion of the mapping algorithm:

f ∗ : {x|x ∈ S} ⇒ {1, 2, . . . , N }, (1)

consistent with the real users of the system. In other words, it is necessary to
build an algorithm that assigns to each music track of an arbitrary set one of
the predefined class labels according to users’ real music preferences. For this
purpose, the HIIS-based approach is used.
The HIIS approach is described in detail in the paper [3]. According to the
[3] the HIIS consists of two main components: the subconsciousness module (MS)
and the consciousness module (MC).
The subconsciousness module is related to the environment in which a HIIS
operates. Because the environment can be represented as a set of continuous
signals, the data processing techniques of the MS are mostly based on neural
networks, fuzzy logic, and combined neuro-fuzzy methods.
The consciousness module performs logical processing of information. It may
be based on traditional programming or workflow technology, and in particular,
the rule-based programming approach is gaining popularity.
In the proposed approach, both MS and MC are based on machine learning
algorithms. The generalized structure of the intelligent system is represented in
Fig. 1.

Fig. 1. The generalized structure of the intelligent system

It should be noted that the hybrid system as a whole is implemented as an


intelligent agent using the experience replay approach [4,5]. The metadata buffer
is used for replay purposes. This approach allows us to move from sequential to
parallel training. After receiving the first metadata, we generate mini-batches from the metadata buffer and train the subconsciousness and the consciousness modules at the same time.
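
A minimal sketch of such a metadata buffer is given below; the class and method names are illustrative assumptions rather than the authors' implementation.

import random
from collections import deque

class MetadataBuffer:
    """Stores metadata produced by the subconsciousness module for replay."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, metadata, label):
        self.buffer.append((metadata, label))

    def sample(self, batch_size=32):
        # Mini-batches drawn from here feed both modules in parallel.
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        features, labels = zip(*batch)
        return list(features), list(labels)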
The implementation of the subconsciousness and the consciousness modules is discussed in detail in the following sections.

3 The Implementation of the Subconsciousness Module


The subconsciousness module is implemented as a set of binary classifiers [6]
based on the LSTM network [7].
The classifier receives as input a sequence of vectors describing small time intervals. To produce them, the audio signal is broken down into N snapshots of 4096 samples each, and each snapshot is converted to its amplitude-frequency representation using the FFT [8]. This transformation, in effect, turns the time series into a set of magnitudes of the different frequencies that make up the signal. This snapshot length provides a reasonable trade-off between frequency and time resolution: for a file with a sampling frequency of 44100 Hz it corresponds to a frequency resolution of about 11 Hz and a time window of about 93 ms.
The set of values obtained after the transformation is converted into a vector using a convolution function (moving average) and statistical estimates. As a rule, intervals of 1–10 s are used as the working interval. Since it is important for us to trace the dynamics of changes in these parameters, each training vector includes sequences of the parameters in time. It is worth noting that we have a restriction on the length of the training vector dictated by computational capabilities; for this reason, increasing the time resolution forces us to reduce the frequency resolution. Thus, each vector describes 20 frequency corridors (500 Hz each) and their dynamics over an interval of 60 s, taking overlaps into account.
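
A NumPy sketch of this feature-extraction step is given below. The frame length, the sampling frequency and the 20 corridors of 500 Hz follow the text; the 10 kHz upper bound, the non-overlapping framing and the simple per-corridor averaging are assumptions of the sketch.

import numpy as np

def extract_features(signal, sample_rate=44100, frame_len=4096,
                     n_corridors=20, max_freq=10000):
    bin_width = sample_rate / frame_len        # ~10.8 Hz per FFT bin
    n_bins = int(max_freq / bin_width)         # bins covering 0..max_freq (20 * 500 Hz)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        magnitude = np.abs(np.fft.rfft(frame))[:n_bins]    # amplitude-frequency view
        corridors = np.array_split(magnitude, n_corridors)
        frames.append([c.mean() for c in corridors])       # average magnitude per corridor
    return np.array(frames)    # shape (n_frames, n_corridors), fed to the LSTM classifiers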
The structure of the LSTM network is represented in Fig. 2.
Thus, the output of the subconsciousness module is the metadata stored in
the metadata buffer.

4 The Implementation of the Consciousness Module


The goal of the consciousness module is to create a model that predicts the value
of a target variable by learning simple decision rules inferred from the metadata
(stored in the metadata buffer). The metadata was received from a large number
of binary classifiers.
The metadata is a set of relative probabilities that an object belongs to two randomly selected classes. We assume that any object belongs to all classes, but with different probabilities. The task is then to “fit” the mixture of distributions to the data and to determine the probabilities of the observations belonging to each class. Obviously, an observation should be assigned to the class for which this probability is the highest.
To solve this task, we propose to use the decision trees as a kind of rule-based
approach. The practical implementation of the system is based on the CART
algorithm [9] from the scikit-learn library [10].
As an impurity measure the Gini impurity was used [11]:

H(X_m) = Σ_k p_mk (1 − p_mk)    (2)

where p_mk is the proportion of class k observations in node m.
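
A minimal scikit-learn sketch of this consciousness module is shown below; the toy metadata values stand in for rows taken from the metadata buffer.

from sklearn.tree import DecisionTreeClassifier

# X_meta: rows of binary-classifier outputs taken from the metadata buffer,
# y_class: the corresponding user class labels (toy values for illustration).
X_meta = [[0.9, 0.1, 0.7], [0.2, 0.8, 0.4], [0.1, 0.3, 0.9]]
y_class = [0, 1, 2]

consciousness = DecisionTreeClassifier(criterion="gini")   # CART with Gini impurity, Eq. (2)
consciousness.fit(X_meta, y_class)
print(consciousness.predict([[0.85, 0.2, 0.6]]))           # predicted music class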



Fig. 2. The structure of the LSTM network

Fig. 3. The structure of the consciousness decision tree

The structure of the consciousness decision tree is represented in Fig. 3.


Thus, the output of the consciousness module is the predicted class of the
music classification problem.

5 The Experiments
For our research, we used a custom dataset, which includes 1378 tracks divided into three classes, with a sampling frequency of 44,100 Hz; the average track length was 137.3 s.
Since we used a custom dataset, we did not have the opportunity to compare the obtained quality metrics with those reported by other researchers.
Therefore, to ensure the validity of the proposed approach, we have conducted
experiments with algorithms of three levels of complexity: the logistic regression
approach (the simplest model); the multilayer perceptron approach (the model
of medium complexity) [12]; the HIIS approach (the model of high complexity).
Precision, Recall, and F1-score (F-measure) were used as classification metrics [13]. The experiment results are presented in Table 1.

Table 1. The experiments results



Approach                      Class0  Class1  Class2  Total  Precision  Recall  F1-score
The HIIS approach        TP      129     108     140    377      0.984   0.926      0.95
                         FP        1       0       5      6
                         FN        7      20       3     30
                         TN        1       3       3      7
The multilayer           TP      115      98     133    346      0.797   0.813      0.8
perceptron approach      FP       27      29      32     88
                         FN       19      35      34     80
                         TN        2       3       1      6
The logistic             TP      119     103     127    349      0.751   0.691      0.72
regression approach      FP       17      42      56    115
                         FN       47      71      38    156
                         TN        0       4       2      6

The results of the experiments turned out as expected. The logistic regression approach (the simplest model) shows the worst results. The multilayer perceptron approach (the model of medium complexity) shows intermediate results. The HIIS approach (the model of high complexity) shows the best results. Thus, the results of the experiments confirm the validity of the proposed approach.
To assess the quality of the classifier (HIIS model), ROC-curves were built
[14]. The ROC function can also be used in multiclass classification if the pre-
dicted outputs have been binarized. For this reason, ROC-curves are plotted for
each class. There are then a number of ways to average binary metric calcula-
tions across the set of classes, each of which may be useful in some scenario (we
use micro-average and macro-average metrics). AUC (Area Under Curve) is an
aggregated quality characteristic of a classification, independent of the cost ratio of errors. The higher the AUC value, the better the classification model. The following results were obtained for the HIIS model: AUC_micro-average = 0.87, AUC_macro-average = 0.86. The ROC curves for the HIIS model are represented in Fig. 4.

Fig. 4. ROC curves for the HIIS model
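
For illustration, such micro- and macro-averaged AUC values can be computed with scikit-learn after binarizing the labels, as in the following sketch; the toy arrays y_true and y_score are assumptions standing in for the real HIIS outputs.

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 1, 0, 2])                      # true classes (toy data)
y_score = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1],      # per-class probabilities
                    [0.1, 0.2, 0.7], [0.3, 0.5, 0.2],
                    [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])

y_true_bin = label_binarize(y_true, classes=[0, 1, 2])     # one column per class
auc_micro = roc_auc_score(y_true_bin, y_score, average="micro")
auc_macro = roc_auc_score(y_true_bin, y_score, average="macro")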

6 Conclusions

The article proposes an approach for music classification problem using hybrid
intelligent information systems (HIIS). The hybrid system as a whole is imple-
mented as an intelligent agent using the experience replay approach.
The subconsciousness module is related to the environment in which a HIIS
operates. Because the environment can be represented as a set of continuous
signals, the data processing techniques of the MS are mostly based on neural
networks, fuzzy logic, and combined neuro-fuzzy methods. In the proposed app-
roach, it is implemented as a set of the binary classifier based on the LSTM
network.
The consciousness module performs logical processing of information. It may
be based on traditional programming, workflow technology, rule-based program-
ming. In the proposed approach it is implemented using decision trees.
The experiments were conducted using a custom dataset. The results of the experiments confirm the validity of the proposed approach.

References
1. Fu, Z., Lu, G., Ting, K.M., Zhang, D.: A survey of audio-based music classification
and annotation. IEEE Trans. Multimedia 13(2), 303–319 (2011). https://doi.org/
10.1109/TMM.2010.2098858
2. Goienetxea, I., Martı́nez-Otzeta, J.M., Sierra, B., Mendialdua, I.: Towards the use
of similarity distances to music genre classification: a comparative study. PloS one
13(2), e0191417 (2018). https://doi.org/10.1371/journal.pone.0191417
3. Chernenkiy, V., Gapanyuk, Yu., Terekhov, V., Revunkov, G., Kaganov, Y.: The
hybrid intelligent information system approach as the basis for cognitive architec-
ture. Procedia Comput. Sci. 145, 143–152 (2018). http://www.sciencedirect.com/
science/article/pii/S187705091832307X
4. Zhang, S., Sutton, R.S.: A deeper look at experience replay. arXiv preprint
arXiv:1712.01275 (2017)
5. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P.,
McGrew, B., Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay.
arXiv preprint arXiv:1707.01495 (2017)
6. Koyejo, O.O., Natarajan, N., Ravikumar, P.K., Dhillon, I.S.: Consistent binary
classification with generalized performance metrics. In: Proceedings of the 27th
International Conference on Neural Information Processing Systems – vol. 2, pp.
2744–2752 (2014)
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
8. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
9. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applica-
tions, 2nd edn. World Scientific Publishing Co., New Jersey (2014)
10. The Scikit-Learn Library: Decision trees. https://scikit-learn.org/stable/modules/
tree.html. Accessed 24 May 2019
11. Modarres, R., Gastwirth, J.L.: A cautionary note on estimating the standard error
of the Gini index of inequality. Oxf. Bull. Econ. Stat. 68(3), 385–390 (2006)
12. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, New York
(1994)
13. Powers, D.: Evaluation: from precision, recall and F-measure to ROC, informed-
ness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
14. Pepe, M.S.: The Statistical Evaluation of Medical Tests for Classification and Pre-
diction. Oxford University Press, Oxford (2004)
The Hybrid Intelligent Information System
for Poems Generation

Maria Taran, Georgiy Revunkov, and Yuriy Gapanyuk(&)

Bauman Moscow State Technical University, Moscow, Russia


[email protected]

Abstract. Any generated text must have “form” and “content” components. It
is the “content” that is the main component of the generated text, but the “form”
component is no less important. It may be necessary to generate texts in different
linguistic styles among them the poetic linguistic style. The article proposes an
approach for poems generation problem using hybrid intelligent information
systems (HIIS). The HIIS consists of two main components: the subcon-
sciousness module and the consciousness module. In the case of poems gen-
eration, the subconsciousness module consists of two submodules: the stress
placement module and the rhyme and rhythm module. These modules use
machine learning techniques. The consciousness module includes the poem
synthesis module, which is rule-based. The stress placement module is based on
the convolutional neural network. On the test dataset, the accuracy of the
classifier is 97.66%. The rhyme and rhythm module based on neural networks
with a depth of 5–7 layers. On the test dataset, the accuracy of the classifier is
91.63%.

Keywords: Natural-Language Generation (NLG) 


Hybrid Intelligent Information System (HIIS)  Subconsciousness module 
Consciousness module  LSTM  Convolutional neural network

1 Introduction

According to Gartner’s report on BI Tools 2018 [1], “By 2020, natural language generation and artificial intelligence will be a standard feature of 90% of modern business intelligence platforms”. Thus, taking into account the needs of the industry, natural-language generation (NLG) is a very important area of software engineering development [2].
The business intelligence platform may be considered as a special case of an
intelligent assistant agent. The concepts of such assistants have long been developed,
and there is no doubt that such software systems or hardware-software devices will be
used more and more. The natural-language generation (NLG) module should also be
part of such assistants.
It should be noted that at present, the area of text-based speech synthesis is also
actively developing. Therefore, we can assume that solving the problem of generating
text, we simultaneously create both a writing and a speaking agent.


Any generated text must have “form” and “content” components. There is no doubt that it is the “content” that is the main component of the generated text. But if such assistants are to be widely used, the question of the “form” in which the text is conveyed becomes no less important. In the process of generating text, the assistant should be guided by the age of the interlocutor, their level of knowledge, and other aspects.
Depending on the context of the situation and the peculiarities of the interlocutor, it
may be necessary to generate texts in different linguistic styles.
This article is devoted to text generation in the poetic linguistic style. On the one hand, from the point of view of industry needs, the task of poem generation can be viewed as a “toy” one. Indeed, it is difficult to imagine that even in the distant future financial statements will be produced in poetic form. On the other hand, the task of poem generation is simply a special case of the task of generating texts in different “forms”.
According to [3], traditional approaches to the generation of poems include:
1. Template Based Poetry Generation: templates of poetry forms are filled with words
that suit the defined constraints (either syntactic, rhythmic, or both).
2. Generate and Test Approaches: random word sequences are produced according to
formal requirements, that may involve metric, other formal constraints, and
semantic constraints.
3. Case-Based Reasoning Approaches: existing poems are retrieved, considering a
targeted message provided by the user, and are then adapted to fit the required
content.
4. Evolutionary Approaches: poetry generation is based on evolutionary computation.
Obviously, only the evolutionary approach takes full advantage of the methods of artificial intelligence. Within the framework of the evolutionary approach, one of the most detailed works is the dissertation [4].
Now, according to [5], the methods of generating poems are increasingly using
artificial intelligence methods, especially deep neural networks. The example of such
an approach is an interactive poetry generation system “Hafez” [6, 7]. The “Hafez”
system generates poems in three steps:
1. Search for related rhyme words given user-supplied topic.
2. Create a finite-state acceptor (FSA) that incorporates the rhyme words and controls
meter.
3. Use a recurrent neural network (RNN) to generate the poem string, guided by the
FSA.
The features of the “Hafez” system are that it is, firstly, focused on dialogue with the user and, secondly, generates poems in English.
The proposed approach does not involve a dialogue with the user and is focused on
the Russian language.
Thus, despite the many years of effort, the generation of poems remains an open
problem. And the authors hope that this article will be a small step in the direction of
poems generation.

2 The HIIS-Based Approach for Poems Generation

To solve the poems generation problem, we propose to use the approach based on the
hybrid intelligent information system (HIIS). The HIIS-based approach is described in detail in the paper [8]. In this section, we will briefly review the HIIS-based approach
and consider its application for poems generation. The generalized structure of a hybrid
intelligent information system is represented in Fig. 1.

Fig. 1. The generalized structure of a hybrid intelligent information system

According to [8], the HIIS structure consists of two main components: the sub-
consciousness module (MS) and the consciousness module (MC).
The MS (subconsciousness module) is related to the environment in which a HIIS
operates. Because the environment can be represented as a set of continuous signals,
the data processing techniques of the MS are mostly based on neural networks, fuzzy
logic, combined neuro-fuzzy methods, and machine learning techniques.
The MC (consciousness module) is traditionally based on conventional data and
knowledge processing, which may be based on traditional programming, workflow
technology, rule-based programming approach.
The advantages of a rules-based approach include flexibility. In this case, the
program is not hardcoded but forward chained with rules based on the data. The
disadvantages include the possibility of rules cycling and the complexity of processing
a large set of rules. Nowadays, for the processing of a large set of rules, the Rete
algorithm and its modifications are used.
To build the module of consciousness, it is possible to use machine learning
techniques, for example, building a set of rules in the form of a decision tree.

From the interaction point of view, the following options or their combinations are
possible in a HIIS:
1. Interaction is implemented through the environment. The MS reads the data from
the environment, converts them, and transmits them to the MC. The MC performs
logic processing and returns the results to the MS (if transformation is required) or
directly to the environment. The MS transforms the results and writes them into the
environment, where they can be read by another HIIS.
2. The MI (Module of Interaction) is used for the interaction with another HIIS.
Depending on the tasks to be solved, the MI can interact with the MC (which is
typical for conventional information systems) or with the MS (which is typical for
systems based on soft computing).
3. User interaction can be carried out using the MC (which is typical for conventional
information systems) or through the MS (which can be used, for example, in
automated simulators).
In the case of poems generation, the subconsciousness module consists of two
submodules: the stress placement module and the rhyme and rhythm module. These
modules use machine learning techniques. The consciousness module includes the
poem synthesis module, which is rule-based. The generalized structure of the HIIS for
poems generation is represented in Fig. 2.

[Diagram: the prose text is fed to the stress placement module and the rhyme and rhythm module (the subconsciousness module, MS); their output goes to the poem synthesis module (the consciousness module, MC), which produces the text in the poetic form.]

Fig. 2. The generalized structure of the HIIS for poems generation

The interaction is implemented through the environment. In this case, the text in
prose or poetic form is considered as the environment.
The proposed approach is implemented for the Russian language. The implementation of the modules is discussed in detail in the following sections.

3 The Stress Placement Module

The input of the module is a Russian word without stress marking, and the output is the same word with the stress marked.

The module is built on a hybrid approach, combining both rule processing (for
simple cases) and machine learning (for more complex cases). The module operation
algorithm contains the following steps:
1. The input word is converted to the required format, morphological analysis is
performed, and the initial dataset is formed for further processing.
2. In order to detect simple cases, the generated data is processed using a set of rules. An example of such a rule: the Russian letter “Ё” is always stressed.
3. If none of the rules fires, then the machine learning model is used.
4. At the output of the module, a stressed word is formed in a human-readable format
as well as in the form of a dataset for further processing.
From the point of view of machine learning, the stress placement problem may be
considered as a problem of multi-class classification. The features of the model are the
word itself and additional data that are extracted after morphological analysis. The
target feature is the position of the stressed letter in the word.

Fig. 3. The neural network architecture for the stress placement module

Experiments with a convolutional neural network and an LSTM network were carried out during the construction of the classifier. With comparable quality, the model based on the convolutional neural network is trained much faster. Categorical cross-entropy was used as the loss function, and accuracy was used as the metric.
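
A Keras sketch of a convolutional classifier of this kind is given below; the padding length, the alphabet size and the layer widths are assumptions and do not reproduce the exact architecture of Fig. 3.

from tensorflow import keras
from tensorflow.keras import layers

max_word_len, alphabet_size = 20, 34      # assumed padded word length and character set

model = keras.Sequential([
    layers.Input(shape=(max_word_len,)),
    layers.Embedding(alphabet_size, 16),                   # one vector per character
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(max_word_len, activation="softmax"),      # position of the stressed letter
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])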

The neural network architecture was chosen experimentally. The final architecture
is shown in Fig. 3. On the test dataset, the accuracy of the classifier was 97.66%. The
neural network was trained for 16 epochs. The results are shown in Fig. 4.

Fig. 4. The metrics for the stress placement module

The example of stress placement module output (in Russian, stressed letters are
capitalized), “нa ocнOвe Этиx дAнныx тpEбyeтcя вoccтaнoвИть нeЯвнyю зaвИ-
cимocть тo ecть пocтpOить aлгopИтм cпocOбный для любOгo вoзмOжнoгo
вxOднoгo oбъEктa вЫдaть дocтAтoчнo тOчный клaccифицИpyющий oтвEт”.

4 The Rhyme and Rhythm Module

One or several sentences can be submitted to the module input, depending on the total number of words. This behavior is due to the fact that the models were trained on four-line stanzas.
First, the input text is divided into words. The stress placement module is used to
determine the stress for each word. Then the features for machine learning models are
created. These features include the selected syllables and stresses, as well as the last
few letters in words.
Different machine learning methods were used to determine the appropriate words for rhyme, the metre, the presence or absence of alliteration, and other target features. A separate model was trained to predict each individual target feature.
The search for words for rhyme is performed on the basis of a pre-formed dic-
tionary. The dictionary contains both possible word endings and their alternation.
Based on empirically selected rules, only the most probable word sequences for a given
text are left in the dictionary. Neural networks with a depth of 5–7 layers were used to
determine other target features.

A separate task is the formation of a dataset for model training. A dataset was prepared from poems by well-known authors satisfying specific conditions: each stanza contains four lines, and blank (unrhymed) verse is removed from the corpus.
Since the resulting dataset contained several thousand examples, it was decided to assign the values of the target features automatically. For this purpose, dimensionality reduction [9] (the PCA algorithm) and hierarchical clustering [10] were used. As a result, seven separate clusters were identified. The visualization of the clusters obtained by the t-SNE [11] algorithm is shown in Fig. 5.
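
A scikit-learn sketch of this automatic labelling step is shown below; the feature matrix X and the chosen dimensionalities are placeholder assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE

X = np.random.rand(1000, 50)                               # per-stanza feature matrix (placeholder)

X_reduced = PCA(n_components=10).fit_transform(X)          # dimensionality reduction
labels = AgglomerativeClustering(n_clusters=7).fit_predict(X_reduced)
X_2d = TSNE(n_components=2).fit_transform(X_reduced)       # 2-D projection for the cluster plot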

Fig. 5. The clusters visualization results

On the test dataset, the accuracy of the classifier was 91.63%. The neural network
was trained for ten epochs. The results are shown in Fig. 6.

Fig. 6. The metrics for the rhyme and rhythm module



To improve the quality of classification, it is planned to enhance the quality of the


dataset and work out the neural network architecture in more detail.

5 The Poem Synthesis Module

The poem synthesis module is rule-based. The module input is the input prose text and
the set of features received from the rhyme and rhythm module. The module output is
the generated text in the poetic form.
A stanza is not formed if the deviation from the template is two or more. Declension and conjugation of words are also not performed in the current version, which leaves room for further improvement in the quality of the system’s work.
The example of the module output (in Russian, the input text is a fragment of the
cookbook): “Heoбxoдим cыp пoxoжий нa бpынзy. \Oтличнo пoдoйдeт пapмeзaн.
\Eгo нaдo пopeзaть кycoчкaми. \Bce cмeшaть и зaпpaвить мacлoм”.

6 Conclusions

The article proposes an approach for poems generation problem using hybrid intelligent
information systems (HIIS). HIIS consists of two main components: the subcon-
sciousness module (MS) and the consciousness module (MC).
In the case of poems generation, the subconsciousness module consists of two
submodules: the stress placement module and the rhyme and rhythm module. These
modules use machine learning techniques. The consciousness module includes the
poem synthesis module, which is rule-based.
The stress placement module is based on the convolutional neural network. On the
test dataset, the accuracy of the classifier is 97.66%.
The rhyme and rhythm module is based on neural networks with a depth of 5–7
layers. On the test dataset, the accuracy of the classifier is 91.63%.
The task of poems generation is simply a special case of the task of generating texts
in different “forms”. The proposed approach allows generating the text in the poetic
form from the prose text.

References
1. Gartner Report on BI Tools 2018. https://systelligent.com/gartner-report-on-bitools-2018.
Accessed 24 May 2019
2. Khurana, D., Koli, A., Khatter, K., Singh, S.: Natural language processing: state of the art,
current trends and challenges. arXiv preprint. arXiv:1708.05148 (2017)
3. Gervas, P.: Exploring quantitative evaluations of the creativity of automatic poets. In:
Workshop on Creative Systems, Approaches to Creativity in Artificial Intelligence and
Cognitive Science, 15th European Conference on Artificial Intelligence (2002)
4. Manurung, H.M.: An evolutionary algorithm approach to poetry generation. Ph.D. thesis,
Institute for Communicating and Collaborative Systems, School of Informatics, University
of Edinburgh (2003)

5. Pandya, M.: NLP based poetry analysis and generation. Technical report. https://doi.org/10.
13140/rg.2.2.35878.73285 (2016)
6. Ghazvininejad, M., Shi, X., Choi, Y., Knight, K.: Generating topical poetry. In: Proceedings
of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1183–
1191 (2016). https://doi.org/10.18653/v1/d16-1126
7. Ghazvininejad, M., Shi, X., Priyadarshi, J., Knight, K.: Hafez: an interactive poetry
generation system. In: Proceedings of ACL 2017, System Demonstrations, pp. 43–48
(2017). https://doi.org/10.18653/v1/p17-4008
8. Chernenkiy, V., Gapanyuk, Y., Terekhov, V., Revunkov, G., Kaganov, Y.: The hybrid
intelligent information system approach as the basis for cognitive architecture. Procedia
Comput. Sci. 145, 143–152 (2018). http://www.sciencedirect.com/science/article/pii/
S187705091832307X
9. Maaten, L.V., Postma, E.O., Herik, J.V.: Dimensionality reduction: a comparative review.
J. Mach. Learn. Res. 10(66–71), 13 (2009)
10. Mishra, H., Tripathi, S.: A comparative study of data clustering techniques. Int. Res. J. Eng.
Technol. (IRJET) 4(5), 1392–1398 (2017)
11. Linderman, G.C., Steinerberger, S.: Clustering with t-SNE, provably. arXiv preprint. arXiv:
1706.02582 (2017)
Cognitive Sciences and Brain-Computer
Interface, Adaptive Behavior and
Evolutionary Simulation
Is Information Density a Reliable Universal
Predictor of Eye Movement Patterns
in Silent Reading?

Valeriia A. Demareva1(&) and Yu. A. Edeleva2


1
Lobachevsky State University, Nizhny Novgorod 603950, Russia
[email protected]
2
Technical University of Braunschweig, 38106 Brunswick, Germany

Abstract. The role of information density as a reliable universal predictor of


eye movement patterns in silent reading is considered. Density differences
between Russian and English are taken to explain the difference in eye move-
ment patterns for readers with Russian as a native language compared to
English-speaking readers. An empirical eye-tracking study shows that only one of four expectations was confirmed. Presumably, the eye-movement pattern observed for Russian could be influenced by some additional language-specific properties of Russian other than information density. We conclude that a universal algorithm that would predict eye movement patterns during silent reading based on language density alone hardly exists.

Keywords: Eye movements  Information density  Reading  Prediction 


Modeling

1 Introduction

Today, many studies in the field of computer vision focus, among other things, on eye
movement recognition [1, 2]. As a result, many computational models of eye movements have appeared that help not only to make inferences about the linguistic processes involved in reading, but also to diagnose neurocognitive disorders [3, 4]. These models
are usually developed for specific languages. To make a model cross-linguistically
applicable it is important to define universal factors that determine eye-movement
reading patterns. This paper investigates information density as such a universal factor.
Variability between languages remains a key issue in psychology and linguistics, as
understanding of universal patterns of reading can feed models of information pro-
cessing. Frost et al. speak of the necessity to define independent cross-linguistic
parameters that underlie theory-motivated models of reading [5]. Yet a number of
scholars deny universality in reading patterns across languages [6]. If such universals
exist, they represent general principles by which information from print is extracted by
the writing processing system. Moreover, the most obvious prediction that can be made
based on Universality Theory by Frost et al. is that there are different ways of visual
information encoding in different writing systems. However, the time it takes to extract


encoded meaning should remain comparable across languages regardless of the type of
encoding.
Humans with unimpaired visual system sample their environment by making a
series of fixations and saccades [7]. During fixations information intake from the
encoded input takes place while saccades do not supply any useful data. However, the
eye movements are largely under cognitive control and the analysis of temporal and
spatial characteristics of saccades during reading can reflect cognitive processing [8].
A number of studies show that during reading the upcoming visual input is partially
pre-processed in the parafovea [9, 10]. Thus, saccadic movements are common to all humans irrespective of language and culture; saccadic sampling and retinal make-up determine the speed at which visual information is encoded and made available to the linguistic processor.
Any language makes use of reading to extract information from written texts.
However, the reading process itself may differ widely across languages at different levels, from single words to phrases and the text as a whole. For example, while reading the same text translated into different languages, participants show a range of eye-tracking patterns that vary in the number and length of fixations and the length of forward saccades [11].
patterns observed in reading [12, 13]. One such peculiarity is the so-called language
density reported in various studies [14, 15]. Density in a language is the amount of
information conveyed by one structural unit, for instance, a word or a character.
Different types of density are distinguished in the literature. Lexical density is
defined by the number of lexical items such as nouns, verbs, adjectives, and adverbs
used in the text [16]. It can be used to estimate lexical variability in L2 speech
production [15, 16] or to study properties of texts from different corpora [17]. Semantic
density is the number of semantic features associated with one verb. It is used in studies on language development and on language impairment in aphasia. Neighborhood density
is defined as the number of words that differ from a given word in only one phoneme in
any word position [19]. This type of language density can significantly influence oral
language decoding [20]. Propositional density is a measure of content richness in
language production [21].
For the study of written language decoding, two more kinds of density may be
informative. Visual density is the amount of visual information that is available per unit
of text [12]. Another type of density – information density - represents the amount of
information per word, depending on research goals and context [12]. Information
density is exploited to define the difference between language, for instance, German
and English. Visual density has been shown to influence the length of forward sac-
cades, and information density influences fixation durations cross-linguistically [12].
Thus, visual and information density may account for cross-linguistic differences in
patterns observed for written language decoding.
Letters, phonemes, and syllables cross-linguistically have different information
density. For example, single letters or syllables of English except for single-letter
pronouns (I), articles (a) or inflectional morphology (-s; -ed; -ing) are not usually
syntactically informative. In a language like Russian, however, which has a relatively

transparent orthography and a rich inflectional paradigm, letters and syllables, espe-
cially at word offset, bear semantic and syntactic information. As a result, when it
comes to higher-level processing, cross-linguistic differences may emerge in the rela-
tive utility of allocating attention to various features of the input [13]. To this extent,
words of equal length can be considered visually denser in Russian than in English.
Based on the assumption that writing systems differ as to their density, Liversedge
et al. in [12] defined universal and language specific eye-movement patterns for Fin-
nish, English and Chinese [12]. However, no such investigation has been made yet with
regard to Russian language. To address the issue of language universality in reading,
the study replicates the experiment in [12] for Russian, whose writing system differs from English as to a number of parameters (alphabet, agglutination, etc.). A systematic
comparative analysis of how information density affects the reading pattern for Russian
and English might be revealing for cross-linguistic modelling of reading patterns. Both
Russian and English are alphabetic languages that have vowels and consonants.
Therefore, their information density can be collocated: words in Russian are longer
than in English. Thus, the information density should be greater for English than for
Russian. In line with the study [12], we should expect the number of fixations and
saccade size to be greater and the fixation length to be shorter for Russian texts that are
equivalent to original English texts used in [12].

2 Methods

To test the participants’ proficiency in their native language (Russian) a C-test was
compiled. The C-test allows to assess different types of linguistic knowledge at the
micro- and macrolevel and requires mastery of grammar and vocabulary. It, thus, can
be used to assess “global” language proficiency [22]. The C-test is also reported as
highly reliable and valid [23]. The C-Test used in the study was a version of the story
“Who is called Mowgli?” that was adapted as to the guidelines in [24]. The C-test
included 40 words whose second half was deleted and had to be restored by the participants. Every response was scored on a scale from 0 to 3: 3 points – correctly recovered word; 2 points – the stem of the word is correctly chosen, but the word form is erroneous; 1 point – the stem of the word is correctly chosen, but the initial form is used; 0 points – wrong word stem or no answer. The maximum possible score is 120.
27 Russian-speaking students took part in the experiment. All of them scored high
on the C-test (more than 95% of the maximum score). The eye movements were recorded with the help of the SMI-Hi Speed Tracker 1250. The sampling rate was set to 500 Hz. The experiment began with a 9-point calibration. After that the participants had to read eight texts in Russian and answer comprehension questions. The texts used in the study were translated from the stimuli used in [12]. They were split into 2 to 4 slides, so that each slide contained 1–8 sentences. Courier New with a character subtension of 0.46° of visual angle was used.

English and Russian text corpora are compared in Table 1.

Table 1. Stimulus descriptives: number of words in total; average number of words in a


sentence; average word length (in characters).
Descriptives Russian English [12]
Total number of words 1676 1762
Average number of words in a sentence 11.72 14.68
Average word length (in characters) 6.1 5.63

The sentence was selected as a unit of analysis. Four measures reflecting global
properties of eye-movements were computed: (1) Total Sentence Reading Time,
(2) Average Number of Fixations, (3) Average Forward Saccade Size, and (4) Average
Fixation Duration. Data points beyond three standard deviations and fixations shorter than 60 ms or longer than 800 ms were removed from further statistical analysis. Statistical analysis was performed in MS Excel and Statistica 10.0. One-way analysis of variance (ANOVA) was used. Study design and procedures were approved by the
Ethical Committee of Lobachevsky State University, and all participants provided
written informed consent in accordance with the Declaration of Helsinki.
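
For illustration, the outlier-removal step described above can be expressed as the following pandas sketch (the original analysis was performed in MS Excel and Statistica 10.0; the column name fixation_ms is an assumption).

import pandas as pd

def clean_fixations(df: pd.DataFrame) -> pd.DataFrame:
    # Drop fixations shorter than 60 ms or longer than 800 ms.
    df = df[(df["fixation_ms"] >= 60) & (df["fixation_ms"] <= 800)]
    # Drop data points beyond three standard deviations of the remaining durations.
    mean, std = df["fixation_ms"].mean(), df["fixation_ms"].std()
    return df[(df["fixation_ms"] - mean).abs() <= 3 * std]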

3 Results and Discussion

Global eye movement measures for Russian texts as obtained in the study as well as for
English texts are provided in Table 2.

Table 2. Global eye-movement measures: total sentence reading time (in ms), average number
of fixations, average forward saccade size (in characters), and average fixation duration (in ms).
Standard deviations are provided in Parentheses.
Eye-movement measures Russian English [12]
Total sentence reading time 4302 (1865) 3093 (777)
Average number of fixations 8.6 (2.38) 14.81 (2.93)
Average forward saccade size 7.78 (1.79) 8.53 (1.55)
Average fixation duration 195 (23) 207 (32)

Compared to the results in the study [12], our results for Russian texts show longer average reading times, a smaller number of fixations, shorter forward saccades, and shorter fixations. The texts themselves had a significant influence on the eye movement measures. For instance, there was a significant effect of text type on the total sentence reading duration (F(6, 3701) = 27.5, p < 0.001), which could partially account for the
observed results [25].

The study fully reproduced the experimental design and analysis algorithm used in [12]; however, only one of the expectations (shorter fixation durations) was confirmed.
Supposedly, the eye-movement pattern observed for Russian could be influenced by
some additional language-specific properties of Russian other than information density.
Russian and English both belong to the Indo-European family [26]. Russian belongs to
the East-Slavic group [27], and English is a language of the West-Germanic group [26].
Compared to English, Russian is a highly inflectional language [28]. English orthog-
raphy with its 26 letters is considered irregular and morphophonemic where a word’s
sound pattern depends on its meaning. Russian uses Cyrillic script. The alphabet
contains 33 letters. Compared to English, Russian orthography is considered fairly
regular [29]. Moreover, different types of linguistic density (lexical [15, 16], semantic
[18]) should also be considered.

4 Conclusion

In this paper we studied information density as a possible universal predictor of eye


movement patterns during silent reading. Based on the results of the study, we conclude that a universal algorithm that would predict eye movement patterns based on information density alone hardly exists. Additional factors that underlie such predictions should be investigated.

Acknowledgment. This work was supported by the Russian Foundation for Basic Research
(grant No. 18-013-01169).

References
1. Leroux, M., Raison, M., Adadja, T., Achiche, S.: Combination of eye tracking and computer
vision for robotics control. In: Proceedings of 2015 IEEE International Conference on
Technologies for Practical Robot Applications (TePRA), Woburn, pp. 1–6 (2015)
2. George, A., Routray, A.: Fast and accurate algorithm for eye localization for gaze tracking in
low-resolution images. Comput. Vis. 10(7), 660–669 (2016)
3. Beltrán, J., García-Vázquez, M.S., Benois-Pineau, J., Gutierrez-Robledo, L.M., Dartigues,
J.-F.: Computational techniques for eye movements analysis towards supporting early
Diagnosis of Alzheimer’s disease: a review. Computational and Mathematical Methods in
Medicine 2018. https://www.hindawi.com/journals/cmmm/2018/2676409/cta/. Accessed 26
May 2019
4. Heinzle, J., Aponte, E.A., Stephan, K.E.: Computational models of eye movements and their
application to schizophrenia. Curr. Opin. Behav. Sci. 11, 21–29 (2016)
5. Frost, R.: Towards a universal model of reading. Behav. Brain Sci. 35(5), 263–279 (2012)
6. Coltheart, M., Crain, S.: Are there universals of reading? We don’t believe so. Behav. Brain
Sci. 35(5), 20–21 (2012). Invited commentary on ‘‘Towards a universal model of reading”
7. Findlay, J.M., Gilchrist, I.D.: Active Vision: The Psychology of Looking and Seeing. Oxford
University Press, Oxford (2003)
8. Liversedge, S.P., Findlay, J.M.: Saccadic eye movements and cognition. Trends Cogn. Sci. 4
(1), 6–14 (2000)

9. McConkie, G.W., Rayner, K.: The span of the effective stimulus during a fixation in reading.
Percept. Psychophysics 17, 578–586 (1975)
10. Rayner, K.: Eye movements and attention in reading, scene perception, and visual search.
Q. J. Exp. Psychol. 62, 1457–1506 (2009). The thirty-fifth Sir Frederick Bartlett Lecture
11. Rahaman, J., Agrawal, H., Srivastava, N., Chandrasekharan, S.: Recombinant enaction:
manipulatives generate new procedures in the imagination, by extending and recombining
action spaces. Cogn. Sci. 42, 370–415 (2018)
12. Liversedge, S.P., Drieghe, D., Li, X., Yan, G., Bai, X., Hyönä, J.: Universality in eye
movements and reading: a trilingual investigation. Cognition 147(3), 1–20 (2016)
13. Stoops, A., Christianson, K.: Parafoveal processing of inflectional morphology on Russian
nouns. J. Cogn. Psychol. 29(6), 653–669 (2017)
14. Crocker, M.W., Demberg, V., Teich, E.: Information density and linguistic encoding
(IDeaL). Künstl. Intell. 30, 77 (2016)
15. Gregori-Signes, C., Clavel-Arroitia, B.: Analyzing lexical density and lexical diversity in
university students’ written discourse. Procedia – Soc. Behav. Sci. 198, 546–556 (2015)
16. Reza, K., Gholami, J.: Lexical complexity development from dynamic systems theory
perspective: lexical density, diversity, and sophistication. Int. J. Instr. 10(4), 1–18 (2017)
17. Méndez, D., Ángeles, A.: Titles of scientific letters and research papers in astrophysics: a
comparative study of some linguistic aspects and their relationship with collaboration issues.
Adv. Lang. Literary Stud. 8(5), 128–139 (2017)
18. Borovsky, A., Ellis, E.M., Evans, J.L., Elman, J.L.: Semantic structure in vocabulary
knowledge interacts with lexical and sentence processing in infancy. Child Dev. 87(6),
1893–1908 (2016)
19. Nair, V., Biedermann, B., Nickels, L.: Understanding bilingual word learning: the role of
phonotactic probability and phonological neighborhood density. J. Speech Lang. Hear. Res.
60(12), 1–10 (2017)
20. Rispens, J., Baker, A., Duinmeijer, I.: Word recognition and nonword repetition in children
with language disorders: the effects of neighborhood density, lexical frequency, and
phonotactic probability. J. Speech Lang. Hear. Res. 58(1), 78–92 (2015)
21. Smolík, F., Stepankova, H., Vyhnálek, M., Nikolai, T., Horáková, K., Matejka, Š.:
Propositional density in spoken and written language of Czech-speaking patients with mild
cognitive impairment. J. Speech Lang. Hear. Res. 56(6), 1461–1470 (2016)
22. Eckes, T., Grotjahn, R.: A closer look at the construct validity of C-tests. Lang. Test. 23(3),
290–325 (2006)
23. Babaii, E., Ansary, H.: The C-test: a valid operationalization of reduced redundancy
principle? System 29, 209–219 (2001)
24. Cook, S.V., Pandža, N.B., Lancaster, A.K., Gor, K.: Fuzzy nonnative phonolexical
representations lead to fuzzy form-to-meaning mappings. Front. Psychol. 7, 1–17 (2016)
25. Demareva, V.A., Polevaia, A.V., Kushina, N.V.: The influence of language density on eye
movements in silent reading: an eye tracking study in Russian vs. English. Int.
J. Psychophysiol. 131S, S75–S76 (2018)
26. Baldi, P.H.: Indo-European languages. In: International Encyclopedia of the Social and
Behavioral Sciences, 2nd edn, Oxford, Pergamon (2015)
27. Zaprudski, S.: In the grip of replacive Bilingualism: the Belarusian language in contact with
Russian. Int. J. Sociol. Lang. 183, 97–118 (2007)
28. Maučec, M.S., Donaj, G.: Morphology in statistical machine translation from English to
highly inflectional language. Int. Test Conf. 47(1), 63–74 (2018)
29. Boulware-Gooden, R., Joshi, R.M., Grigorenko, E.: The role of phonology, morphology,
and orthography in English and Russian spelling. Dyslexia 21(2), 142–161 (2015)
Bistable Perception of Ambiguous
Images – Analytical Model

Evgeny Meilikov1,2(B) and Rimma Farzetdinova1


1
National Research Centre “Kurchatov Institute”, 123182 Moscow, Russia
[email protected]
2
Moscow Institute of Physics and Technology, 141707 Dolgoprudny, Russia

Abstract. Watching an ambiguous image leads to the bistability of its perception, which randomly oscillates between two possible interpretations. The relevant evolution of the neuron system is usually described
with the equation of its “movement” over the nonuniform energy land-
scape under the action of the stochastic force. We utilize the alterna-
tive approach suggesting that the system is in the quasi-stationary state
being described by the Arrhenius equation. The latter determines the
probability of the dynamical variation of the image (for example, the
left and right Necker cubes [1]) along one scenario or another. Probabili-
ties of transitions from one perception to another are defined by the barriers that separate the corresponding wells of the energy landscape and by the relative value of the noise influencing this process. The mean noise value could be estimated from experimental data. The model predicts a logarithmic dependence of the perception hysteresis width on the period of cyclically sweeping the parameter controlling the perception (for instance, the contrast of the presented object). It agrees with experiment and allows estimating the time interval between the two alternative perceptions.

Keywords: Ambiguous images · Bistable perception

1 Introduction
Bistable perception is manifested when an ambiguous image, admitting two interpretations, is presented to the subject. In that case the image perception oscillates with time in a random manner between those two possible interpretations [2]. Such a bistability arises for different modalities [3] – ambiguous geometrical figures [1], figure-ground processes [4], etc. (cf. [5, 6]).
Why do those oscillations occur? The concrete “microscopic” mechanism of the phenomenon is not known (see [7]), but various formal models have been suggested, based mainly on the idea of competition between distinct neuron populations (engrams) [8]. The fundamental attribute of most of such models is the existence of fluctuations (noise), which lead to random switching between different perceptions.
We exploit the popular model according to which the dynamical process of bistable recognition can be reduced to the motion of a ball over an energy
landscape in the presence of high enough “noise” [8]. Relatively deep wells of that landscape correspond to old neuronal patterns (“long-stored” in memory), while new images subjected to identification correspond to shallower wells. Image recognition is analogous to moving the ball into the nearest deeper well corresponding to some known engram. The possible perception bistability is then due to the fact that the probabilities of transitions into different wells, corresponding to different images, differ weakly, while in the usual situation (unambiguous image recognition) one of these probabilities significantly outweighs the other. Now, the main problem is to establish which details of the system dynamics define the characteristics of bistable image recognition.

2 Energy Function
Due to fluctuations, the system state changes randomly, which results in the perception bistability. It is suggested [3] that two neuron populations (two different neuron graphs, or two engrams) represent the two possible interpretations of the stimulus. Those two populations “compete” with one another, changing the activity of their neurons. Such a model is based on introducing some energy function U with two local minima, corresponding to the two different image perceptions, and a barrier between these two states.
The temporal evolution of the neuron system is usually described with the equation of its “movement” over the nonuniform energy landscape under the action of a stochastic force representing noise perturbations [9, 10]. We utilize the alternative (and simpler) approach suggesting that the system is in a quasi-stationary state which could be described by the Arrhenius equation [11]. That would be true if the average energy Φ of noise fluctuations were less than the height of the barrier separating the two system states. Below, we will see that this assumption is valid. It is the aim of this work to show that this “limited” model, though much simpler, gives no less (in some cases, more) information than the more complicated models of the type of [3] for describing bistable perception. In addition, our approach is an analytical one, while other models yield only numerical calculations and results.
Usually, the energy function is written, by analogy with the phenomenological theory of phase transitions [12], as a power function of some state parameter whose change corresponds to the dynamic transition of the system from one state to another. However, such a power form is justified only by the possibility of expanding the function U, in the neighborhood of its minima, in powers of the state parameter. Therefore, the form of that function could be selected arbitrarily (mainly, for convenience) from the class of preferably simple functions that describe the needed evolution of the two-well potential with changing state parameter. Specifically, we write that function in the form

U(θ) = −U0 (sin²θ + Jθ),  (1)

where θ is the generalized coordinate of the system state (the dynamical variable, or the order parameter) and U0 is the typical system “energy”. Here J(t) is the control parameter, generally time-dependent, that defines the system state. For
instance, in the case of the Necker cube (see below) the image contrast could play the role of such a control parameter. We will be interested in the interval of variation of the parameter θ that corresponds to those minima of the function U(θ) which are closest to the point θ = 0. At J = 0 these extrema are located at the points θ1 = −π/2, θ2 = π/2 (minima) and θ0 = 0 (maximum). If J ≠ 0, then the maximum shifts to the point where sin 2θ0 = J, and the minima to the points θ1 = −π/2 + θ0, θ2 = π/2 + θ0 (see Fig. 1).

Fig. 1. Extrema of the energy function (1) at J = −0.2.
With rising parameter J, the tilt of the energy landscape changes – the first minimum becomes shallower, the second one deeper, and the barrier between them diminishes. Let, for instance, J = −1 in the original state, and let the system reside in the first, deep minimum. Then, with rising control parameter J, the system will move (due to fluctuations) from the state θ1 (where it existed at J = −1) to the state θ2, clearing the reduced barrier whose top is at the point θ0. That barrier disappears completely at J = +1 (see Fig. 2).

Fig. 2. Energy landscape U = U(θ) at various values of the control parameter J. Arrows indicate the system evolution under cyclic changing of that parameter within the limits −1 < J < 1. The cycle corresponds to the hysteresis loop shown in the inset. It is accepted that jumps between states occur at J = ±0.5 by crossing a barrier of height 0.5U0.
Under cyclic variation of the parameter J, the system does not have time to follow it, and, due to such an “inertia”, the hysteretic dependence θ(J) arises, shown in the inset of Fig. 2 and associated with system transitions from one well to another over the separating barrier of finite height. In the example considered, the transition occurs at J = ±0.5.
The barrier heights Δ12, Δ21, which obstruct system transitions from the minimum θ1 to the minimum θ2 and back, are readily found from Eq. (1):

Δ12/U0 = √(1 − J²) + J·(arcsin J − π/2),   Δ21/U0 = √(1 − J²) + J·(arcsin J + π/2).

In the linear approximation,

Δ12/U0 ≈ 1 − πJ/2,   Δ21/U0 ≈ 1 + πJ/2.  (2)
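As a numerical cross-check of these expressions, the short Python sketch below (our illustration, not part of the original analysis; U0 = 1 is an assumed energy unit) evaluates U(θ) from Eq. (1) on a dense grid, locates the two minima nearest θ = 0 and the maximum between them, and compares the resulting barrier heights with the analytical formulas above and with the linear approximation (2).

# Numerical check of the two-well potential (1) and the barriers Delta_12, Delta_21.
# U0 = 1 is an arbitrary energy unit (our assumption).
import numpy as np

def U(theta, J, U0=1.0):
    """Energy function U(theta) = -U0*(sin^2(theta) + J*theta), Eq. (1)."""
    return -U0 * (np.sin(theta) ** 2 + J * theta)

def barriers(J, U0=1.0):
    """Analytical barrier heights for the transitions theta1 -> theta2 and back."""
    a = np.arcsin(J)
    d12 = U0 * (np.sqrt(1 - J ** 2) + J * (a - np.pi / 2))
    d21 = U0 * (np.sqrt(1 - J ** 2) + J * (a + np.pi / 2))
    return d12, d21

J = -0.2
theta = np.linspace(-np.pi, np.pi, 200001)
u = U(theta, J)
i1 = np.argmin(np.where(np.abs(theta + np.pi / 2) < 1.0, u, np.inf))   # left minimum
i2 = np.argmin(np.where(np.abs(theta - np.pi / 2) < 1.0, u, np.inf))   # right minimum
i0 = np.argmax(np.where((theta > theta[i1]) & (theta < theta[i2]), u, -np.inf))  # barrier top
print("numeric :", u[i0] - u[i1], u[i0] - u[i2])
print("analytic:", barriers(J))
print("linear  :", 1 - np.pi * J / 2, 1 + np.pi * J / 2)   # Eq. (2)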
The dependencies Δ12(J), Δ21(J) of those barriers on the control parameter are shown in Fig. 3, which demonstrates that, under monotonic variation (J = −1) → (J = +1), they are also monotonic and cross at the point J = 0. Somewhere in the vicinity of that point the transition from one minimum to another occurs. This is a phase transition with hysteresis whose width, as usual, depends on the relation between the time T of sweeping the control parameter and the characteristic time τ (see Eq. (3)) of the phase transition.

Fig. 3. Dependencies of barriers heights for transitions θ1 → θ2 and θ2 → θ1 on the


control parameter J.

Instead of explicitly accounting for the noise influence, we will use the well-known Arrhenius-Kramers formula [13] for the mean lifetime τ of the system in a certain quasi-stationary state, which is determined by the relation between the height Δ of the “energy” barrier and the mean value Φ of the noise fluctuation energy (that value could be called the chemical temperature)1:
1 By fluctuations we mean the deviations of ion or neurotransmitter concentrations in synaptic contacts. That is why we call this noise chemical. This term is purely phenomenological; different processes could be grouped together under this heading. But, nevertheless, the electric potential of a membrane fluctuates in a random manner (see [14]).
τ = τ0 exp(Δ/Φ),  (3)

where τ0 is a constant to be estimated (see below), which, by its general meaning, is the time between two successive attempts to clear the barrier. In fact, this relationship defines the probability of the system transition into one or another state. The chemical, or noise, temperature Φ is the chemical analog of the temperature of thermal fluctuations (to which the thermal energy corresponds in chemical kinetics).

3 Hysteresis

To estimate the width of the hysteresis loop for the dependence θ(J) (for instance, when the control parameter J(t) varies with time), we will rely on the assumption that the transitions θ1 → θ2 and θ2 → θ1 between the minima of the energy U(θ) occur not at the moment when the barrier between these two states disappears, but under the condition that the lifetime τ of the current state (see Eq. (3)) diminishes (due to the reduced barrier height) so that it becomes much less than the time T of sweeping the J-parameter, that is, under the condition

τ = τ0 exp(Δ/Φ) = γT,  where γ ≪ 1.  (4)

It follows from Eqs. (2), (4) that the transition θ1 → θ2 occurs at

J = J1→2 = (2/π) [1 − (Φ/U0) ln(γT/τ0)].  (5)

By the symmetry of the system with respect to the transitions θ1 → θ2 and θ2 → θ1, the reverse transition occurs at J = J2→1 = −J1→2, so that the whole width of the hysteresis loop equals

h = J1→2 − J2→1 = (4/π) [1 − (Φ/U0) ln(γT/τ0)].  (6)
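To make the quasi-stationary picture behind Eqs. (4)–(6) concrete, the following illustrative Python sketch (our addition, not the authors' code) sweeps J linearly forward and back, lets the system escape its current well with the Arrhenius rate 1/τ = (1/τ0) exp(−Δ/Φ) using the linearized barriers (2), and records the contrast at which the first switch occurs in each direction. The parameter values (Φ/U0 = 0.2, τ0 = 1 s, the time step) are assumptions chosen only for illustration; repeating the sweep for growing T shows the loop narrowing roughly logarithmically, as in Eq. (6).

import numpy as np

def sweep_once(T, phi=0.2, u0=1.0, tau0=1.0, dt=1e-2, seed=0):
    """One forward (J: -1 -> 1) and one backward (J: 1 -> -1) sweep of duration T each."""
    rng = np.random.default_rng(seed)
    state, j_fwd, j_bwd = 1, None, None            # state 1 = well theta1, 2 = well theta2
    for step in range(int(2 * T / dt)):
        t = step * dt
        J = -1 + 2 * t / T if t < T else 3 - 2 * t / T
        # barrier out of the current well, linear approximation (2)
        delta = u0 * (1 - np.pi * J / 2) if state == 1 else u0 * (1 + np.pi * J / 2)
        if rng.random() < dt * np.exp(-max(delta, 0.0) / phi) / tau0:
            state = 3 - state                       # Arrhenius escape within dt
            if t < T and j_fwd is None:
                j_fwd = J
            elif t >= T and j_bwd is None:
                j_bwd = J
    return j_fwd, j_bwd

for T in (10, 30, 100, 300):
    print(T, sweep_once(T, seed=7))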

4 Necker Cube: Perception Bistability

In the experiment [10], the Necker cube has been presented as an ambiguous figure (see Fig. 4) with the contrast of the three neighboring cube edges meeting in its left middle corner as the control parameter, −1 < J < 1. The values J = −1 and J = +1 correspond, respectively, to luminosities j = 0 and j = 255 for the pixels of those edge images on an 8-bit gray scale. Thus, the contrast J (the control parameter) has been defined by the relation J = 2j/255 − 1, where j is the luminosity of those lines on the given scale. In this case, the contrast of the three middle cube edges meeting in the right middle corner equals 1 − 2J, and the contrast of the six visible outer cube edges equals 1. In the symmetrical case J = 0, so that the parameter J defines the deviation from symmetry. For the pure left cube J = −1, and for the pure right cube J = 1.
Fig. 4. Images of Necker cubes with different contrasts defined by the control parameter J (from left to right: J = −1, −0.42, 0, 0.56, 1) [10].

In the course of the experiment, cube images with N random values Ji of the control parameter (i = 1, 2, . . . , N) have been presented many times. Subjects have been requested to press buttons on the control panel according to their initial impression – whether the cube is “left” (the leftmost image in Fig. 4) or “right” (the rightmost image in Fig. 4). Each cube with a fixed value of the control parameter Ji has been randomly presented many times.
For each value Ji of the control parameter, the probability

PL(Ji) = l(Ji)/[l(Ji) + r(Ji)]  (7)

of observing the left cube has been calculated. Here l(Ji) and r(Ji) are, respectively, the numbers of presses of the left or the right button after presenting cubes with the value Ji of the control parameter.
The experimental results shown in Fig. 5 are qualitatively similar for all subjects but differ quantitatively. For some observers, the perception of the images as left cubes transforms steeply into their perception as right cubes (near the “symmetry point” J = 0, where PL = 0.5; see the upper panel of Fig. 5), while for others this conversion is smeared (see the lower panel of Fig. 5).
In [10] those results are associated with the competition of different neuron populations near the cusp point of catastrophe theory with noise included [15]. Our approach is much simpler – we use the Arrhenius relation (3) for the system lifetime in a metastable state, which permits us to describe correctly not only the dependence PL(J) but also the hysteresis of the image perception under cyclic variation of the control parameter (see below).
We could identify the memorized patterns of the left and the right cubes with some long-formed wells of the energy landscape, and the new image to be recognized with a virtual (recently formed) well. Recognizing the image in this model is the transfer of the system from the new well of the energy landscape, corresponding to the presented image, into one of the two other wells, corresponding in our case to the engrams of the left and the right cubes. The direction of such a (to some extent random) transfer is defined by the fact that the barriers between the initial and the two final wells have different heights. The barrier between the wells of more similar images is lower, and that leads to the preferred transfer from the well of the presented image into the well of the more similar memorized one.
Let ΔL and ΔR be the heights of the barriers indicated. If the presented image is more similar to the left cube image, then ΔL < ΔR, and conversely. It is clear that the more the contrast of the presented cube differs from the zero contrast
of the symmetrical image (J = 0, where ΔL = ΔR), the greater the difference between the barriers.

Fig. 5. Typical experimental dependencies [10] of the probability PL(J) of perceiving the image as a left cube on the control parameter J (points in the three panels relate to three different observers). Solid curves are theoretical dependencies (10) with the c-parameter values specified in each panel.

Then the simplest linear relation between the barrier heights and the contrast J of the new image has the form

(ΔL − ΔR)/Φ = c·J,  (8)

where c is an individual constant to be determined experimentally.
The probability PL of recognizing the cube as the left one (or the right one) depends on the probabilities pL (pR) of transfer from the well corresponding to the presented image into the well of the left (right) cube. According to (3),

pL ∝ exp(−ΔL/Φ),   pR ∝ exp(−ΔR/Φ).  (9)

Hence, the total probability to see the left cube equals

PL = pL/(pL + pR) = 1/[1 + exp(cJ)].  (10)
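In practice, the individual constant c can be obtained by fitting Eq. (10) to the measured frequencies PL(Ji). The sketch below is our illustration of one possible least-squares fit; the data arrays are made-up placeholders, not values taken from [10].

import numpy as np
from scipy.optimize import curve_fit

def P_left(J, c):
    """Eq. (10): probability of perceiving the left cube."""
    return 1.0 / (1.0 + np.exp(c * J))

# placeholder observations: contrasts J_i and measured frequencies l/(l + r)
J_obs = np.array([-0.8, -0.4, -0.2, 0.0, 0.2, 0.4, 0.8])
P_obs = np.array([0.99, 0.93, 0.75, 0.52, 0.27, 0.08, 0.01])

c_fit, c_cov = curve_fit(P_left, J_obs, P_obs, p0=[5.0])
print("fitted c =", c_fit[0], "+/-", np.sqrt(c_cov[0, 0]))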

Figure 5 shows (together with the experimental data [10]) the theoretical dependencies PL(J) calculated by Eq. (10) which, apparently, match the experiment well when the numerical value of the parameter c is properly chosen. The latter varies within the limits from c ≈ 20 (the upper panel of Fig. 5) down to c ≈ 2 (the lower panel of Fig. 5). It follows therefrom that in the first case the noise is rather weak,

Φ/(ΔL − ΔR) = 1/(cJ) ∼ 0.1,

while in the second case the noise intensity is high enough, Φ/(ΔL − ΔR) ∼ 1, i.e., comparable with the difference of the barrier heights.
5 Necker Cube: Hysteresis of Perception


In [9], experiments with the Necker cube are discussed which relate to the statistics of switching between two possible perceptions of the relevant image when the control parameter varies in time. At first, that parameter has been gradually changed (over the time T) in the forward direction (from J = −1 to J = 1), and then, over the same time, in the reverse direction (from J = 1 to J = −1), wherein the time T of the contrast sweeping has been varied. The moments τf, τb (for the forward and backward sweeping of the control parameter, respectively) have been registered at which the observer switched for the first time from one image perception to the other. In the bistable system, such an over-switching takes place twice – at the forward and at the backward variation of that parameter. Such a hysteresis phenomenon depends on the rate of variation of the control parameter and is observed in different bistable systems.
Hysteresis is the system property which lies in the fact that under varying external conditions the system state differs, more or less, from the state that is in equilibrium at the current conditions. The latter state is the one which could be reached in infinite time after the onset of certain (further unchanged) conditions. In reality, to arrive at a state which is close enough to the equilibrium one, a finite characteristic relaxation time τ is needed, so that the existence (or nonexistence) of hysteretic phenomena is defined by the relation of two times – the relaxation time and the experiment duration T: there is hysteresis if T ≪ τ, and hysteresis is absent if T ≫ τ.
The hysteresis (more exactly, the hysteresis width) could be conveniently characterized by the parameter

h = (τf + τb)/T − 1,  (11)

which goes to zero (and even becomes negative) when τf, τb < T/2, and is distinct from zero at low T, when τf, τb > T/2 and h > 0. Hysteresis loops for these two cases are traversed in opposite directions – clockwise (h < 0) and anticlockwise (h > 0). As is seen from (6), the case h < 0 is realized under the condition

Φ/U0 > 1/ln(γT/τ0),  (12)

that corresponds to a high enough (other factors being equal) intensity of fluctuations, Φ/U0 ≳ 1, provoking “advanced” transitions between the energy minima over high barriers.
The logarithmic dependence predicted by our model agrees with the experiment [9], which allows us to estimate numerically some model parameters. Figure 6 presents two typical experimental dependencies of the hysteresis width h on T (for two different subjects), which are well approximated by straight lines on the logarithmic scale. For numerical estimates, it is convenient to introduce the dimensional constant τ1 = 1 s and rewrite Eq. (6) in the dimensionless form h = A − B ln(T/τ1), where

A = (4/π) {1 − [Φ/U0] ln(γτ1/τ0)},   B = 4Φ/(πU0).  (13)
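A possible way to extract A and B (and hence Φ/U0 = πB/4) from measured hysteresis widths is an ordinary linear fit of h against ln(T/τ1), as sketched below; the data points are placeholders chosen only to roughly reproduce the quoted values A ≈ 1, B ≈ 0.5.

import numpy as np

T = np.array([5.0, 10.0, 20.0, 40.0, 80.0])        # sweep times, s (tau1 = 1 s)
h = np.array([0.22, -0.13, -0.46, -0.82, -1.17])   # measured widths, Eq. (11)

slope, A = np.polyfit(np.log(T), h, 1)             # h = A - B*ln(T/tau1)
B = -slope
print("A =", A, "B =", B, "Phi/U0 =", np.pi * B / 4)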


Fig. 6. Typical experimental dependencies (points) of the hysteresis width (11) on the duration T of scanning the control parameter [9]. Straight lines are linear fits.

Fig. 7. Experimental dependencies τf,b(T) (• – τf, ◦ – τb) [9].

Parameters A and B are determined from the linear dependencies of Fig. 6. For example, for the upper of those dependencies A ≈ 1, B ≈ 0.5. From this it follows at once that

Φ/U0 ≈ 0.4,   τ0/τ1 ≈ 2γ.  (14)

Thus, the relative intensity of fluctuations (for this particular subject) is rather high. For comparison, the lower dependency in Fig. 6 gives Φ/U0 ≈ 0.15. We see that in all cases the noise is relatively small and, hence, the Arrhenius equation could be used. As for the time τ0, or the parameter γ directly coupled with it (see (14)), if one chooses, for instance, γ = 0.3, then τ0 ∼ 1 s.
We could also consider a simpler model of transferring the system from one state to another, suggesting that this transition always occurs (independently of the sweeping time T) at the moment when the difference between the initial contrast (J = −1 at the moment t = 0) and the contrast at the
switching moment (t = tf) reaches some critical value Jc. For the linear sweeping in the forward direction J = −1 + t/T, so that Jc = tf/T, or tf = Jc·T, which corresponds to the simple rule: the switching time is proportional to the sweeping time. That rule is to some extent confirmed by the experiment [9] – see Fig. 7, where the experimental dependencies τf,b(T) are presented. One could see that, in spite of the high data scattering, those dependencies could in fact be considered as linear ones. They correspond to the value Jc ≈ 0.15. Hence, in that model the switching should happen every time the contrast difference reaches ∼15%. However, this over-simplified model predicts a constant hysteresis width h ≈ −0.7 (see (11)), which contradicts the experiment.

6 Conclusions

The bistability models described above consider, in fact, the dynamical processes of switching between different perceptions of an ambiguous image and the hysteresis of such a perception. On the other hand, no dynamical equations (such as θ̇ = −∂U/∂θ) are used in our scheme. It is based on the Arrhenius-Boltzmann relation (3), which defines the probability of a dynamical change of the perceived image type under some scenario.
In the considered model, probabilities of transitions from one perception type
to another are calculated. They are determined by barriers separating respective
wells (of the depth U0 ) of the energy landscape, and the noise level influencing
that process. The latter is represented by the parameter Φ whose relative value
could be estimated from experimental data: Φ/U0 ≈ 0.1 − 1 (individually for
various observers).
The logarithmic dependence of the perception hysteresis width on the period of cyclic sweeping of the parameter controlling the perception (for instance, the contrast of the presented image), predicted by the model, agrees with the experiment and allows us to estimate the time τ of switching between the two potentially possible perceptions of the ambiguous image: τ ∼ 1 s for T = 30 s.
Thus, in the framework of the described “non-dynamical” approach one could obtain definite conclusions on the dynamical characteristics of the bistable perception of ambiguous images.

References
1. Necker, L.: Observations on some remarkable phenomenon which occurs on viewing
a figure of a crystal of geometrical solid. London Edinb. Philos. Mag. J. Sci. 3, 329–
337 (1832)
2. Huguet, G., Rinzel, J., Hupé, J.-M.: Noise and adaptation in multistable percep-
tion: noise drives when to switch, adaptation determines percept choice. J. Vis.
14(3), 19 (2014). 14
3. Moreno-Bote, R., Rinzel, J., Rubin, N.: Noise-induced alternations in an attractor
network model of perceptual bistability. J. Neurophysiol. 98, 1125–1139 (2007)
Perception of Ambiguous Images 105

4. Pressnitzer, D., Hupé, J.M.: Temporal dynamics of auditory and visual bistability
reveal common principles of perceptual organization. Curr. Biol. 16, 1351–1357
(2006)
5. Leopold, D.A., Logothetis, N.K.: Multistable phenomena: changing views in per-
ception. Trends Cogn. Sci. (Regul. Ed.) 3, 254–264 (1999)
6. Long, G.M., Toppino, T.C.: Enduring interest in perceptual ambiguity: alternating
views of reversible figures. Psychol. Bull. 130, 748–768 (2004)
7. Sterzer, P., Kleinschmidt, A., Rees, G.: The neural bases of multistable perception.
Trends Cogn. Sci. 13(7), 310–318 (2009)
8. Haken, H.: Principles of Brain Functioning. Springer, Cham (1996)
9. Pisarchik, A.N., Jaimes-Reátegui, R., Alejandro Magallón-Garcia, C.D., Obed
Castillo-Morales, C.: Critical slowing down and noise-induced intermittency in
bistable perception: bifurcation analysis. Biol. Cybern. 108(4), 397–404 (2014).
https://doi.org/10.1007/s00422-014-0607-5
10. Runnova, A.E., Hramov, A.E., Grubov, V.V., Koronovskii, A.E., Kurovskaya,
M.K., Pisarchik, A.N.: Chaos. Solitons Fractals 93, 201–206 (2016)
11. Stiller, W.: Arrhenius Equation and Non-Equilibrium Kinetics. BSB B.G. Teubner
Verlagsgesellschaft, Leipzig (1989)
12. Toledano, J.-C., Toledano, P.: The Landau Theory of Phase Transitions. World
Science, Singapore (1987)
13. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of
chemical reactions. Physica 7, 284–304 (1940)
14. Burns, B.D.: The Uncertain Nervous System. Edward Arnold (Publishers) Ltd.,
London (1968)
15. Poston, T., Stewart, I.: Catastrophe Theory and its Applications. Pitman, London
(1978)
Video-Computer Technology of Real Time Vehicle Driver Fatigue Monitoring

Y. R. Muratov(&), M. B. Nikiforov, A. S. Tarasov, and A. M. Skachkov

Ryazan State Radio Engineering University Named After V.F. Utkin, Ryazan, Russia
[email protected]

Abstract. This article is devoted to the topical problem of controlling human fatigue and reduced attention concentration in transport. The authors consider the most efficient, in their opinion, method of assessing a person's psycho-emotional condition, namely video control based on eye condition analysis. It is based on a convolutional neural network with its own topology. The problems of choosing the optimal network depth for real-time operation and of achieving high accuracy on a single-board computer with ARM processor architecture were analyzed. As a research result, a prototype of the software and hardware complex is presented. This prototype allows detecting human fatigue by means of eye video image analysis. The system makes it possible to reduce the number of car accidents associated with the vehicle driver falling asleep. In conclusion, short-term project development prospects are proposed. Fatigue of a person engaged in control, management, or decision-making, and a decrease of attention concentration on the object, can lead to critical consequences. The most efficient control of a person's physiological state is video control based on eye condition analysis. An algorithm based on a convolutional neural network and its hardware implementation, providing face search in the image, eye detection, and analysis of the eye condition by the “open-closed” principle, is proposed.

Keywords: Convolutional neural network · Single-board computer · Human fatigue control · Accident reduction

1 Introduction

The development of automated operator monitoring systems is an integral part of reliability improvement in the “human-machine” system. Such factors as emotional stress and fatigue can lead to a performance decrease, which is described by the following main characteristics: attention state and emergency action readiness [1]. The most important is the state control of operators whose action errors pose a direct threat to human lives: nuclear power plant operators [2], military equipment operators [3], and public and private transport drivers [4].
One of the topical problems associated with decreasing attention concentration is the driver falling asleep while driving. In 2017, Ford conducted a survey among Russian drivers. According to the statistics, 32% of respondents had fallen asleep while driving, and 3.8% of them
admitted that they woke up after a collision or after exiting the roadway. About 20% of all accidents are caused by falling asleep while driving [5]. A sleepy driver, like a drunk one, is extremely dangerous on the road. Every year the number of car accidents caused by the driver falling asleep increases worldwide. A survey conducted in Norway found that 1 of 12 car drivers fell asleep at least once while driving during a year.
The main signs of a decrease in driver concentration are the following:
• difficulty focusing vision;
• frequent eye blinking;
• a feeling of heavy eyelids;
• difficulty keeping the head straight;
• frequent yawning;
• the driver can hardly remember the last traveled kilometers;
• the driver passes road signs without paying attention to them;
• the car often drifts out of its lane;
• difficulties with keeping distance;
• the car touches the rumble strip on the road side.
In addition, human physiological reactions of the cardiovascular, respiratory, and central nervous systems change in a state of fatigue and drowsiness.
Behavioral indicators such as yawning, blinking, head tilting, and a long look distracted from the road are often used to automatically detect the signs of a decrease in the driver's attention concentration [13].
Observation systems and video processing integrated into the vehicle can significantly improve transport safety. The most informative sign of decreased concentration is the dynamics of the eye condition [6]. Developments of this type, implemented in hardware, can be mentioned [2]. Despite the apparent simplicity of detecting open and closed eyes in a video frame, the presented systems are far from perfect. Eye detection difficulties arise when the driver turns his head, for example, looking at the side window or the rear-view mirror, or at night, with variable road illumination and oncoming lights. There are also many other factors that make the operation of the proposed systems difficult [7].
Recently, systems for monitoring the driver falling asleep, based on the analysis of the face and eyes in the video camera image, have begun to appear actively. Such systems include CoDriver made by Jungo, devices integrated into the steering wheel provided by the Johnson safety system, the Driver Alert Sleep Warning Device, and others. Existing developments assume that the camera should be in a position to “see” the whole face. This arrangement can be inconvenient and also makes it difficult to integrate such systems into many vehicles where this optical sensor arrangement is not acceptable due to design features.
An acute problem of many systems is the use of eyeglasses. Specialized lenses can create flares and distort the appearance of the eyes, making the operation of similar systems impossible.
The developed complex also should have the possibility of fast adjustment in case of sharp brightness variation (front lighting, driving in a tunnel, light reflection, etc.), a broad range of working temperatures, low power consumption, and a reasonable price.

2 Investigation of Methods for Face and Eye Detection in Images

The system requires determining the position of the driver's eyes in the image. Before the system begins to detect the eye position, it is necessary to localize the driver's face.
One of the best known methods of face detection is the Viola-Jones method [9, 10]. Its basic principle is the representation of the image in integral form, which allows counting the total brightness of any rectangle in the image. The integral representation is used to calculate features based on Haar primitives [9], and the output is obtained using boosting [10]. Training is very slow; however, the search for an object (face) is very fast, but it is insufficiently accurate for some head positions. Another relevant detection algorithm is the Single Shot MultiBox Detector (SSD) based on convolutional neural networks such as MobileNet. Such algorithms have higher accuracy than the Viola-Jones method (more than 90%). However, the implementation of a convolutional network on the ARM CPU shows poor performance. One more way of detecting objects in the image is the histogram of oriented gradients (HOG). The method is based on calculating the directions of the image brightness gradients and finding the area where the majority of them match a template. In other words, it is necessary to find the section of the image whose HOG representation is most similar to the HOG representation of the face structure. HOG allows detecting the face with the ability to trade off between performance and accuracy. Therefore, for example, the authors of the DLib library [11] could achieve a detection accuracy of 99.38%.
Under the constraints imposed by the ARM processor architecture and camera angles, HOG was chosen. The result of face localization by the HOG algorithm is the coordinates of a square frame containing the whole face or a large part of it.
Eye detection is also possible with several different algorithms. The first and most commonly used are Haar cascades. The algorithm gives correct results in 80% of cases when the full face is visible. In poor lighting and night driving conditions, the algorithm works unsatisfactorily. The low performance of the algorithm implementation is also a disadvantage of Haar cascades. The most productive of all the algorithms is the analytical algorithm for determining landmark facial points. Versions of this algorithm using different datasets are able to determine from 5 to 68 facial points at a speed of about 7000 FPS on the classical x86-x64 CPU architecture. In Fig. 1, the results of both methods, Haar cascades and the analytical one, are presented for a case when the eyes were not localized by Haar cascades. Eyes selected by Haar cascades are highlighted by black rectangles. Data obtained by the analytical algorithm are highlighted by grey rectangles.
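A minimal sketch of the pipeline described above (HOG face localization followed by landmark-based eye extraction) is given below. This is our illustration rather than the authors' C++ implementation; the model file name is the standard publicly distributed dlib 68-point predictor, and the eye landmark indices 36–41/42–47 refer to the common 68-point annotation scheme.

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()           # HOG-based face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_eyes(frame_bgr, margin=5):
    """Return cropped eye regions (right eye, left eye) for the first detected face."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return []
    shape = predictor(gray, faces[0])
    pts = np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)])
    crops = []
    for idx in (range(36, 42), range(42, 48)):        # right eye, left eye landmarks
        eye = pts[list(idx)]
        x0, y0 = eye.min(axis=0) - margin
        x1, y1 = eye.max(axis=0) + margin
        crops.append(frame_bgr[max(y0, 0):y1, max(x0, 0):x1])
    return crops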
Fig. 1. Results of the eye search on faces

To detect whether the eye is open or closed, one can analyze the coordinates of the eye points obtained by the analytical algorithm. In our implementation, each eye is described by six points. However, the accuracy of this analysis depends on the size of the eyes and eyelids of people of different races, as well as on lighting conditions and head positions. For certain head positions the analytical algorithm gives a frontal point of the eye that does not belong to the eye. Therefore, it was decided to use a neural network algorithm to detect the state of the eye. Keras was used as the training platform [8]. The models obtained with this framework have small weight and integrate well into OpenCV.
The input of the neural network is the image of an eye and its neighborhood of several pixels. The image was scaled to the size 96*96; such an image size proved optimal. After that the image is fed to the input of the neural network. The output has 3 values:
1. The probability that the image contains an open eye;
2. The probability that the image contains a closed eye;
3. The probability that the image contains no eye at all.
The third value allows excluding false operation of the complex when the eye (face) was not found, for example, in the case of a head turn. One of the main problems was the choice of the network architecture.
Small networks have low accuracy; however, they make a decision very quickly. Large networks, such as Xception [7, 12], on the contrary, have low operating speed but give high accuracy.
The optimal implementation was reached by creating a new variant of the network architecture that has the highest accuracy among those capable of working in real-time mode on the RockChip RK3399 CPU (less than 30 ms per frame).
Eight variants of neural network architectures were analyzed. The experiment results are shown in Fig. 2.
As a result, the optimal network has the structure P3C32P2C64P2C128P2C256P2D1024D3, where:
Cn – convolution of the image with extraction of n feature maps;
Pm – subsampling operation (max pooling) with an m*m kernel;
Dk – fully connected layer of k neurons.
Fig. 2. Comparison of different neural network models depending on speed and accuracy

As a result of the research, the following architecture was obtained (Fig. 3):

Fig. 3. Neural network architecture

This model works with square color images of the eye area of size 96*96 and represents a consecutive execution of convolution and pooling operations, increasing the number of features until an array of only 1024 features is obtained. After that, the obtained features go to a fully connected layer of 1024
neurons, after which the images are separated into the three required classes: BAD, OPEN and CLOSE. The ReLU function was selected as the activation function:

f(x) = 0 if x < 0,   f(x) = x if x ≥ 0.  (1)
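One possible Keras reading of the structure P3C32P2C64P2C128P2C256P2D1024D3 is sketched below; the 3*3 convolution kernels, "same" padding, optimizer and softmax output are our assumptions, since the paper fixes only the feature counts, the pooling windows and the layer order. For a 96*96 input this layout indeed flattens to exactly 1024 features before the fully connected layers.

from tensorflow.keras import layers, models

def build_eye_state_net(input_shape=(96, 96, 3), n_classes=3):
    model = models.Sequential([
        layers.MaxPooling2D(pool_size=3, input_shape=input_shape),    # P3
        layers.Conv2D(32, 3, padding="same", activation="relu"),      # C32
        layers.MaxPooling2D(pool_size=2),                             # P2
        layers.Conv2D(64, 3, padding="same", activation="relu"),      # C64
        layers.MaxPooling2D(pool_size=2),                             # P2
        layers.Conv2D(128, 3, padding="same", activation="relu"),     # C128
        layers.MaxPooling2D(pool_size=2),                             # P2
        layers.Conv2D(256, 3, padding="same", activation="relu"),     # C256
        layers.MaxPooling2D(pool_size=2),                             # P2
        layers.Flatten(),                                             # 2*2*256 = 1024 features
        layers.Dense(1024, activation="relu"),                        # D1024
        layers.Dense(n_classes, activation="softmax"),                # D3: BAD / OPEN / CLOSE
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model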

Training was performed on a specially labeled set of more than 150,000 images of open and closed eyes.
The network was trained over 6 iterations. Figure 4 shows the dependence of false positives on the training iteration.

Fig. 4. Quality of work depending on the iteration of training

The training was carried out on our own datasets, which included samples of eye images of university students of different nationalities, under different lighting conditions and with different facial attributes. This made it possible to provide high-quality operation regardless of the conditions.
The result of the trained network is shown in Fig. 5. The test sample shows that the presented model demonstrates high accuracy under various conditions: glasses, glare, sudden changes in brightness, etc.

Fig. 5. The eye condition detection result

The algorithms are implemented in C++, using the OpenCV library under the ARM
architecture.
3 Experiment Results

To assess the quality of the proposed complex, an experiment was conducted. During the experiment, about 10,000 different images prepared in different situations were analyzed:
• Different lighting conditions and camera locations;
• Different age, ethnic and sex composition;
• Presence of limiting factors such as headdress, mustache, beard, etc.;
• A driver wearing glasses: sunglasses, anti-reflective, correcting hyperopia and myopia.
In Figs. 6 and 7 the results of face detection by the HOG algorithm and by the neural network are presented.
Experimentally, the face detection algorithms demonstrated high detection rates (98.21%). A small percentage of incorrect results occurs only in the case of bright lighting of half of the face and a large head rotation angle of more than 50°. However, errors at large angles are compensated by the presence of two cameras that complement each other when the driver looks in the side mirrors.

Fig. 6. The result of the face selection module using HOG

In Fig. 8 the result of the neural network algorithm for detecting the eye state is presented.
The presented group of algorithms makes it possible to find and select the eye areas regardless of the shooting conditions and the presence of glasses.
Testing the system prototype made it possible to confidently determine the moment
of closing and opening the eyes with the following shooting parameters:
• daylight and head positions ±45° horizontally;
Fig. 7. The result of the face extraction module using a neural network

Fig. 8. The eye detection result. Eyes closed on the left and open on the right

• night mode (illumination by IR diodes with a wavelength of 840 nm) and head positions ±35° horizontally;
• glasses with diopters ±5, day and night lighting, for cases when the glasses temple does not cover the eyes in the image (head rotation angles ±35° horizontally);
• safety glasses with a light degree of shading, daylight, for cases when the glasses do not cover the eye in the image (head rotation angles ±35° horizontally).

4 Conclusion

The proposed video-computer technology includes two successive stages of determining the approaching-sleep state of the driver: first of all, search for and selection of the facial area in the frame. Once this operation is performed, the algorithm uses a neural network that
determines the state of both eyes. The hardware implementation is focused on a low-cost evaluation board based on the RK3399 SoC. The RK3399 includes a CPU with big.LITTLE architecture (dual-core Cortex-A72 and quad-core Cortex-A53) and a Mali-T864 GPU. One or two video cameras are used as image registration sensors. The solution is not tied to a specific camera location; on the contrary, it allows the driver to choose the place of attachment. The tests showed that in the case of one camera, the best place to install it is the space above the dashboard between the windshield and the driver's face. In the case of two cameras, the best place to install them is the front side pillars. Two cameras make it possible to monitor the driver, e.g., when he looks in the side mirrors.
The immediate prospects for the project development are the addition of new driver fatigue assessment metrics. The ability to measure heart rhythm from the video stream is proposed as one of these metrics. This addition will increase the accuracy of driver fatigue determination.
In addition, the developed complex makes it possible to set a limit on continuous driving of the car. This feature is highly relevant for vehicles requiring the installation of tachographs. Tachographs have a common disadvantage – the possibility to cheat the device. When the developed complex is used as a “smart” tachograph, there is no opportunity to cheat it, because the device “remembers” the driver's face.
Thus, the developed complex will reduce the number of road transport incidents that occur due to a lack of concentration caused by driver fatigue or distraction.

References
1. Dushkov, B.A., et al.: Fundamentals of Engineering Psychology, p. 576, Moscow-
Yekaterinburg (2002)
2. Alyushin, M.V., Alyushin, A.V., Belopolsky, V.M., Kolobashkina, L.V., Ushakov, V.L.:
Optical technologies for monitoring systems of the current functional state of the operational
composition of the management of nuclear power facilities. Global Nucl. Saf. 6, 9–77
(2003). Moscow
3. Melnik, O.V., Demidova, K.A., Nikiforov, M.B., Ustyukov, D.I.: Continuous monitoring of
blood pressure of the vehicle crew and decision makers. Defense Technol. Sci. Tech.
Collect./FSUE “NIISU” 9, 77–80 (2016)
4. Sahayadhas, A., Sundaraj, K., Murugappan, M.: Detecting driver drowsiness based on sensors: a review. Sensors (Basel) 12(12), 16937–16953 (2012)
5. Ovcharenko, M.S.: Analysis and forecast of the state and level of accidents on the roads of
the Russian Federation and ways to reduce it. Sci. Methodical Electron. J. Concept 15,
1661–1665 (2002)
6. Dimov, I.S., Derevyanko, R.E., Kotin, D.A.: Automated system for preventing the driver
from falling asleep while driving. Vestn. MGTU 20(4), 659–664 (2017)
7. Image Processing in Aviation Vision Systems. Kostyashkin, L.N., Nikiforov, M.B. (eds.),
p. 240. Fizmatlit, Moscow (2016)
8. Chollet, F.: Keras. https://github.com/fchollet/keras. Accessed 21 Nov 2015
9. Viola, P., Jones, M.: Robust Real-time Object Detection. Cambridge Research Laboratory,
Cambridge (2001)
Video-Computer Technology of Real Time Vehicle Driver 115

10. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. Conf. Comput. Vis. Pattern Recogn. 1, I-511–I-518 (2001)
11. King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
12. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR, vol.
2 (2017)
13. Furman, G., Baharav, A., Cahan, C., Akselrod, S.: Early detection of falling asleep at the
wheel: a heart rate variability approach. Comput. Cardiol. 35, 1109–1112 (2008)
Consistency Across Functional Connectivity Methods and Graph Topological Properties in EEG Sensor Space

Anton A. Pashkov(&) and Ivan S. Dakhtin

South Ural State University (National Research University), Chelyabinsk, Russia
[email protected]

Abstract. One of the most widely used topological properties of brain graphs is
small-worldness. However, different functional connectivity methods can gen-
erate quantitatively different results, particularly when they are applied to EEG
sensor space. In this manuscript, we sought to evaluate the consistency of values
derived from pairwise correlation between selected functional connectivity
methods. We showed that the alpha band yielded maximal values of correlation
coefficients between small-worldness indices obtained with different methods. In
contrast, delta and gamma bands demonstrated the least consistent results.

Keywords: EEG · Brain graphs · Functional connectivity · Small-world network

1 Introduction

The recent progress in neuroscience has made it possible to frame the brain functioning
in terms of graph theory. There are many metrics to evaluate topological features of the
complex networks. Watts and Strogatz defined a generative model for graphs with two
key properties: clustering coefficient and characteristic path length [1]. The generated
graphs having hybrid properties, short path length and high clustering coefficient, were
called small-world networks [2]. Their characteristic, small-worldness (SW), was found
to be ubiquitous and universal across both living and non-living complex systems (e.g.
C. elegance connectome, social networks, Internet) [2].
The mainstream standpoint in neuroscience is that these complex brain networks
are organized through synchronization of multiple brain areas. Neural oscillations may
play a causal role in forming brain activity and behavior [3]. Functional connectivity is
intended to characterize such patterns of synchronization.
It has repeatedly been demonstrated that topological properties of EEG-based brain
graphs can be useful in constituting novel biomarkers of psychiatric and neurological
disorders [4–6]. However, ultimate results of SW coefficient computations are highly
dependent on the method being used. For example, M. Lai and colleagues, comparing
scalp- and source-based measures of functional connectivity, found strong correlation
for the global connectivity between the scalp and source levels, but argued that the network topology was only weakly correlated [7].

Thus, it is of critical importance to evaluate the differences between FC methods and determine the influence these differences have on the final outcome.
In this study, taking the first step toward this aim, we sought to compare different functional connectivity methods (and the topological properties of the graphs they give).

2 Methods

One hundred and seven healthy volunteers participated in the experiment. High-density
EEG recordings in the resting state with eyes open were analyzed. These recordings are part of a publicly available EEG dataset [8–10]. The EEG was recorded from 64 electrodes as per the international 10-10 system (excluding electrodes Nz, F9, F10, FT9, FT10, A1, A2, TP9, TP10, P9, and P10). We defined frequency ranges of EEG activity according to the conventional division: delta (1–3.5 Hz), theta (4–7.5 Hz), alpha (8–12.5 Hz), beta (13–29.5 Hz), gamma (30–45 Hz). Two reference electrodes were positioned at the left and right mastoids. The data were re-referenced offline to the common average reference.
In the present study, we used six functional connectivity measures.
1. Coherence [11],

Coh = |E[Sxy]| / √(E[Sxx] E[Syy]),  (1)

a widely used FC method estimating the relation between two signals.
2. Imaginary part of coherency [12],

iCoh = Im(E[Sxy]) / √(E[Sxx] E[Syy]),  (2)

a method in which the imaginary part of the cross-spectral density is taken instead of the magnitude. The method is considered to be insensitive to volume conduction bias.
3. PLI [13],

PLI = |E[sign(Im(Sxy))]|,  (3)

the phase lag index, a method reducing the influence of common sources.
4. wPLI [14],

wPLI = |E[Im(Sxy)]| / E[|Im(Sxy)|],  (4)

an extension of PLI, the weighted PLI. The weight is the magnitude of the imaginary part of the cross-spectral density. The method is less sensitive to small perturbations in phase lag.
5. PLV [15],

PLV = |E[Sxy / |Sxy|]|,  (5)

the phase locking value, another method measuring phase synchrony. Originally tailored to study evoked activity, it can still be applied to the resting state.
6. PPC [16],

PPC = 2 / [N(N − 1)] · Σ_{j=1}^{N−1} Σ_{k=j+1}^{N} cos(θj − θk),  (6)

the pairwise phase consistency, an unbiased estimator of the squared PLV.
Here Sxy, Sxx, Syy are the cross-spectral and autospectral densities of the signals, E[·] denotes averaging over epochs, N is the epoch count, and θj, θk are the relative phases at the jth and kth epochs.
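For concreteness, a minimal NumPy sketch of the phase-based measures (3)–(6) is given below. It is our illustration (not the authors' code) and assumes that, for a given channel pair and frequency, the per-epoch cross-spectra Sxy are available as complex numbers; coherence and iCoh would additionally require the autospectra Sxx and Syy.

import numpy as np

def pli(sxy):                       # Eq. (3)
    return np.abs(np.mean(np.sign(np.imag(sxy))))

def wpli(sxy):                      # Eq. (4)
    return np.abs(np.mean(np.imag(sxy))) / np.mean(np.abs(np.imag(sxy)))

def plv(sxy):                       # Eq. (5)
    return np.abs(np.mean(sxy / np.abs(sxy)))

def ppc(sxy):                       # Eq. (6)
    theta = np.angle(sxy)           # relative phase in each epoch
    n = theta.size
    diff = theta[:, None] - theta[None, :]
    return (np.sum(np.cos(diff)) - n) / (n * (n - 1))

# toy check: unit-amplitude cross-spectra with a noisy but non-zero mean phase lag
rng = np.random.default_rng(0)
sxy = np.exp(1j * (0.3 + 0.4 * rng.standard_normal(200)))
print(pli(sxy), wpli(sxy), plv(sxy), ppc(sxy))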
The raw data were bandpass filtered from 1 to 45 Hz and epoched (5-s segments) in the open-source Python software MNE [17]. Then, we ran automated artifact rejection using the Autoreject library [18]. Next, the preprocessed epochs were used to compute all-to-all connectivity matrices; these matrices were thresholded (the threshold being the mean value) and set as adjacency matrices of graphs. Then, the largest (by node count) connected component of every graph was taken into consideration, and for each of them the average clustering coefficient C and the average shortest path length L were computed. These computations were then applied to a set of ten random graphs with the same number of nodes and edges, giving the averages Cr and Lr, respectively. The SW values were calculated as

SW = (C/Cr) / (L/Lr),  (7)

and stacked into a 1 × 107 array, in accordance with the number of participants.
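The graph part of this pipeline can be sketched with NetworkX as follows. This is a hedged illustration of Eq. (7) rather than the authors' exact code; for instance, the random surrogates are also reduced to their largest connected component here, which the paper does not specify.

import networkx as nx
import numpy as np

def small_worldness(conn, n_random=10, seed=0):
    """SW of a symmetric connectivity matrix thresholded at its mean value, Eq. (7)."""
    a = conn > conn.mean()
    np.fill_diagonal(a, False)
    g = nx.from_numpy_array(a.astype(int))
    g = g.subgraph(max(nx.connected_components(g), key=len)).copy()
    c = nx.average_clustering(g)
    l = nx.average_shortest_path_length(g)
    rng = np.random.default_rng(seed)
    cr, lr = [], []
    for _ in range(n_random):
        r = nx.gnm_random_graph(g.number_of_nodes(), g.number_of_edges(),
                                seed=int(rng.integers(1 << 30)))
        r = r.subgraph(max(nx.connected_components(r), key=len))
        cr.append(nx.average_clustering(r))
        lr.append(nx.average_shortest_path_length(r))
    return (c / np.mean(cr)) / (l / np.mean(lr))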
Statistical analysis was performed using IBM SPSS version 25 (IBM Corp, Armonk, NY, USA). Normality of the data distribution was assessed with the Kolmogorov-Smirnov test. As a proxy measure of consistency, the values of the correlation coefficients between different functional connectivity metrics across different frequency bands were used.
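The consistency measure itself then reduces to a rank correlation between two 1 × 107 arrays of SW values. The paper used IBM SPSS; the SciPy call below is an equivalent illustration with placeholder data.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
sw_wpli = rng.random(107)                          # placeholder SW values (wPLI)
sw_pli = 0.8 * sw_wpli + 0.2 * rng.random(107)     # placeholder SW values (PLI)

rho, p = spearmanr(sw_wpli, sw_pli)
print(f"rho = {rho:.2f}, p = {p:.3g}")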
3 Results

The assumption of normal distribution was violated, so the non-parametric Spearman's rank correlation coefficient was computed.
The main results are graphically displayed in Figs. 1 through 6. In the delta frequency range the only statistically significant correlation for wPLI was with PLI (ρ = 0.72, p < 0.01). The phase lag index, in turn, showed negative correlations with three of the five measured functional connectivity methods, namely coherence, PLV and PPC (ρ = −0.28, ρ = −0.22, ρ = −0.22, respectively). wPLI as well as PLI demonstrated a null correlation coefficient with the iCoh metric. Coherence correlated maximally with PLV and PPC at levels of ρ = 0.77 and ρ = 0.82, respectively. The maximal positive correlation in the delta range was observed in the relationship between PLV and PPC (ρ = 0.96, p < 0.01) (Figs. 2, 3, 4 and 5).

Fig. 1. Correlation coefficients of small-worldness values between wPLI-to-all other functional connectivity methods across different EEG frequency bands

Fig. 2. Correlation coefficients of small-worldness values between PLI-to-all other functional connectivity methods across different EEG frequency bands
Fig. 3. Correlation coefficients of small-worldness values between Coherence-to-all other functional connectivity methods across different EEG frequency bands

Fig. 4. Correlation coefficients of small-worldness values between PLV-to-all other functional connectivity methods across different EEG frequency bands

Fig. 5. Correlation coefficients of small-worldness values between iCoh-to-all other functional connectivity methods across different EEG frequency bands
Fig. 6. Correlation coefficients of small-worldness values between PPC-to-all other functional connectivity methods across different EEG frequency bands

In a similar vein, the analysis of the methods' consistency in the theta frequency band resulted in a high positive correlation between wPLI and PLI (ρ = 0.84, p < 0.01). The maximal correlation coefficient was found in the PLV-PPC pair (ρ = 0.97, p < 0.01). The imaginary part of coherency had moderate to low correlation coefficient values, none of which exceeded ρ = 0.55.
In the EEG alpha range, wPLI and PLI had a correlation strength of 0.938 (p < 0.01). Coherence values highly correlated with PLV and PPC ones (ρ = 0.87 for both, p < 0.01). There were no correlation coefficient values (for the relationship between coherence and the other methods) that went below ρ = 0.6. Strong correlation was observed in the pairs iCoh-wPLI (ρ = 0.79, p < 0.01) and iCoh-PLI (ρ = 0.78, p < 0.01). PLV had maximal correlations with Coh (ρ = 0.87, p < 0.01) and PPC (ρ = 0.98, p < 0.01). The peak value in this frequency range was in the PPC-PLV pair (ρ = 0.98, p < 0.01).
The wPLI-PLI correlation coefficient in the beta range was ρ = 0.93 (p < 0.01). The minimal value of correlation was found between wPLI and coherence (ρ = 0.29, p < 0.01). Coherence, in turn, had maximal correlation values with PLV (ρ = 0.72, p < 0.01) and PPC (ρ = 0.83, p < 0.01). PLV highly correlated with PPC (ρ = 0.96, p < 0.01). Coherence and the imaginary part of coherency had a correlation coefficient of 0.39 (p < 0.01).
Results in the gamma range showcase an absence of statistically significant correlation between wPLI and coherence, while demonstrating a strong link between wPLI and PLI (ρ = 0.85, p < 0.01). wPLI had a low correlation with PPC (ρ = 0.2, p < 0.05). The correlation between coherence and PPC took the value of 0.75 (p < 0.01). Aside from PPC and PLV, the other methods were shown to have correlations with coherence not significantly different from zero. PLV had a strong relationship with PPC values (ρ = 0.9, p < 0.01). The imaginary part of coherency had no correlation values with the other methods that exceeded the level of ρ = 0.46.
4 Discussion and Conclusions

In this paper, we strived to provide a brief and concise illustration of how consistent measures of functional connectivity across different EEG frequency ranges are. The major finding of the study is that the alpha range gives the highest correlation coefficients and therefore allows one to get more similar estimations of the topological properties of brain graphs across the functional connectivity methods being tested. Predominance of activity in the alpha band is a distinguishing feature of the brain resting state. Moreover, EEG studies have shown that alpha power fluctuations in brain areas directly point to the level of inhibition a region is exposed to [19]. Thus, the alpha band, being a conspicuous and reproducible feature of brain activity at rest, provides us with the most consistent measures of the topological properties of brain networks.
The least consistent values of correlation strength between FC methods were found in the delta and gamma frequency ranges. The delta and gamma bands are extreme examples of the EEG frequency continuum, representing different modes of neural information processing, with delta being mostly involved in coordinating distantly located areas, while the gamma rhythm is engaged in local information processing [3]. However, it is currently unclear to what extent this relates to the results observed in this paper.
wPLI has high correlation values with PLI in all frequency ranges. This may be attributed to the fact that wPLI is an extension of the PLI. Both measures are insensitive to volume conduction, which represents the major issue for FC computed on EEG sensor space data. The significance of this issue for functional connectivity analysis may also be evidenced by considering the iCoh-Coh pair. The correlation values between iCoh and Coh did not surpass ρ = 0.39 (except for the alpha range with ρ = 0.7), indicating the possible presence of volume conduction effects.
It is worth noticing, however, that our study has a number of limitations. Firstly, we used sensor- but not source-space data for analyzing the SW of brain graphs. Therefore, the obtained results should be taken with caution. Secondly, we did not verify the results on directed and weighted graphs, which also may give a different pattern of results. Finally, all the computations in sensor space are reference-dependent, which implies the need to reexamine these results using different reference techniques. As a possible extension of the current paper, correlations between the different connectivity approaches [20], namely time-domain methods and frequency-domain ones, may be considered.
Space limitations prevent us from including an exhaustive list of all pairwise comparisons between the selected functional connectivity methods.
In conclusion, taking into account all the abovementioned issues of the extant data, it is highly warranted to direct our efforts to a critical and thorough revision of the currently used brain graph topological metrics and their clinical applications.

References
1. Watts, D.J., Strogatz, S.H.: Collective dynamics of “small-world” networks. Nature 393
(6684), 440–442 (1998)
2. Fornito, A., Zalesky, A., Bullmore, E.T.: Fundamentals of brain network analysis, p. 476.
Academic press, Cambridge (2016)
Consistency Across Functional Connectivity Methods 123

3. Thut, G., Miniussi, C., Gross, J.: The functional importance of rhythmic activity in the brain.
Curr. Biol. 22(16), R658–R663 (2012)
4. Jhung, K., Cho, S.-H., Jang, J.-H., Park, J.Y., Shin, D., Kim, K.R., An, S.K.: Small-world
networks in individuals at ultra-high risk for psychosis and first-episode schizophrenia
during a working memory task. Neurosci. Lett. 535, 35–39 (2013)
5. Stam, C., Jones, B., Nolte, G., Breakspear, M., Scheltens, P.: Small-world networks and
functional connectivity in Alzheimer's disease. Cereb. Cortex 17(1), 92–99 (2006)
6. Wei, L., Li, Y., Yang, X., Xue, Q., Wang, Y.: Altered characteristic of brain networks in
mild cognitive impairment during a selective attention task: an EEG study. Int.
J. Psychophysiol. 98(1), 8–16 (2015)
7. Lai, M., Demuru, M., Hillebrand, A., Fraschini, M.: A comparison between scalp- and
source-reconstructed EEG networks. Sci. Rep. 8(1), 12269 (2018)
8. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.C., et al.:
PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for
complex physiologic signals. Circulation 101(23), e215–e220 (2000)
9. Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R.: BCI2000: a
general-purpose brain-computer interface (BCI) system. IEEE Trans. Biomed. Eng. 51(6),
1034–1043 (2004)
10. http://www.schalklab.org/research/bci2000
11. Bowyer, S.M.: Coherence a measure of the brain networks: past and present. Neuropsychiatr. Electrophysiol. 2(1), 1 (2016)
12. Nolte, G., et al.: Identifying true brain interaction from EEG data using the imaginary part of
coherency. Clin. Neurophysiol. 115(10), 2292–2307 (2004)
13. Stam, C.J., et al.: Phase lag index: assessment of functional connectivity from multi-channel
EEG and MEG with diminished bias from common sources. Hum. Brain Mapp. 28(11),
1178–1193 (2007)
14. Vinck, M., et al.: An improved index of phase-synchronization for electro-physiological data
in the presence of volume-conduction, noise and sample-size bias. NeuroImage 55(4), 1548–
1565 (2011)
15. Lachaux, J.P., et al.: Measuring phase synchrony in brain signals. Hum. Brain Mapp. 8(4),
194–208 (1999)
16. Vinck, M., et al.: The pairwise phase consistency: a bias-free measure of rhythmic neuronal
synchronization. NeuroImage 51(1), 112–122 (2010)
17. Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., et al.: MEG and EEG
data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013). ISSN 1662-453X
18. Jas, M., Engemann, D., Bekhti, Y., Raimondo, F., Gramfort, A.: Autoreject: automated
artifact rejection for MEG and EEG data. NeuroImage 159, 417–429 (2017)
19. Bazanova, O.M., Vernon, D.: Interpreting EEG alpha activity. Neurosci. Biobehav. Rev. 44,
94–110 (2014)
20. Bastos, A.M., Schoffelen, J.-M.: A tutorial review of functional connectivity analysis
methods and their interpretational pitfalls. Front. Syst. Neurosci. 9, 175 (2016)
Evolutionary Minimization
of Spin Glass Energy

Vladimir G. Red’ko and Galina A. Beskhlebnova

Scientific Research Institute for System Analysis, Russian Academy of Sciences,


Moscow 117218, Russia
[email protected], [email protected]

Abstract. The current work describes the model of evolutionary minimization


of energy of spin glasses. The population of agents (modeled organisms) is
considered. The genotypes of agents are coded by a large number of spins of the
spin glass. The energy of the spin glass is calculated in accordance with the
Sherrington-Kirkpatrick model. This energy determines the fitness of agents.
The process of evolutionary minimization of the spin glass energy is analyzed
by means of computer simulation. Several properties of spin glasses that are
related to the model of evolutionary search are analyzed. In particular, the global
energy minima of spin glasses and the variation of energy at one-spin mutation
are estimated. The process of the gradual decrease of the spin glass energy is
also analyzed. The gradual decrease is performed by sequential changes of signs
of separate spins of spin glass. The computer simulation demonstrates that
evolutionary optimization results in the finding of essentially deeper energy
minima as compared with the gradual decrease. The rate and efficiency of
evolutionary minimization of energy of spin glasses have been estimated and
checked by computer simulation.

Keywords: Evolutionary optimization · Energy of spin glass · Agents · Rate and efficiency of evolutionary process

1 Introduction

The current work is a development of our previous article [1]. The new features of the
current paper are the following: we consider here a more detailed model of the
evolutionary minimization of spin glass energy and additionally analyze several
properties of spin glasses that are related to the considered evolutionary search. This
additional analysis includes: (1) estimation of the global energy minima of spin glasses
by computer simulation, (2) estimation of energy variation at changing the sign of one
spin (this variation can be considered as the one-spin mutation), (3) the study of the
gradual decrease of the spin glass energy. The gradual decrease is performed as follows:
the signs of the spins are changed sequentially, and only the changes that decrease the
energy are kept. The analysis is performed by means of computer
simulation.


The most essential result of the current work is the analytical estimation of the rate
and efficiency of evolutionary minimization of the spin glass energy. Using computer
simulation, we have checked this analytical estimation.
Our evolutionary model is similar to the quasispecies model [2, 3]. In the current
article, we use an analogy with the quasispecies model and our previous estimations for the
quasispecies model [4, 5] with the Hamming distance between agent genotypes.

2 Model of Evolutionary Minimization of Spin Glass Energy

2.1 Formal Model of Spin Glass


Using the well-known Sherrington-Kirkpatrick model of spin glasses [6, 7], we can
construct an evolutionary model with a very large number of local maxima of the
fitness function. The spin-glass model describes a system of pairwise interacting spins.
Interactions between the spins are random. A formal model of the spin glass is the
following.
(1) There is a system S of spins S_i, i = 1, …, N (the number of spins N is supposed to
be large, N ≫ 1), S_i = +1 or −1.
(2) The exchange interactions between spins are random. The energy of the spin
system is defined as:

E(S) = -\sum_{i,j=1,\ i<j}^{N} J_{ij} S_i S_j,   (1)

where J_{ij} are the elements of the exchange interaction matrix. The J_{ij} are normally
distributed random values, with probability density

P(J_{ij}) = \sqrt{(N-1)/(2\pi)} \exp[ -J_{ij}^2 (N-1)/2 ].   (2)

The model (1), (2) was intensively investigated. For further consideration, the
following spin-glass features are essential.
The number of local energy minima M is very large [8]:

M \approx e^{\alpha N}, \quad \alpha \approx 0.2.   (3)

A local energy minimum is defined as a spin glass state S_L at which the change of
sign of any one spin (S_i → −S_i) increases the energy E.
The global energy minimum E0 equals approximately –0.8 N [9]:

E_0 \approx -0.8 N.   (4)

From (1), (2) one can obtain that the mean value of the spin-glass energy is zero:

\langle E \rangle = 0,   (5)

and the mean magnitude of the energy variation at the change of sign of any one
spin (S_i → −S_i) is of the order of 1 [1]:
\langle \Delta E \rangle = \sqrt{8/\pi}.   (6)

Using computer simulation, we have checked the estimations (4), (6). Figure 1
shows the dependence of the global energy minimum E0 on the number of spins N in
the spin glass. Almost all results are averaged over a number of independent
calculations. The numbers of independent calculations n_av are as follows: for N = 5 and 10,
n_av = 10^6; for N = 15, n_av = 10^4; for N = 20, n_av = 10^3; for N = 25, n_av = 10. For
N = 30, there was only a single calculation.
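Both checks can be reproduced with a short brute-force extension of the sketch above (it reuses the illustrative energy helper defined there; exhaustive enumeration is feasible only for small N, which is why the averaging here stops at N = 30):

    from itertools import product
    import numpy as np

    def global_minimum(j, n):
        # Exhaustive search over all 2**n configurations; feasible only for small n.
        return min(energy(np.array(bits), j) for bits in product([-1, 1], repeat=n))

    def mean_flip_magnitude(j, n, rng, trials=10000):
        # Average magnitude of the energy change for one random spin flip
        # S_i -> -S_i; for large N it approaches sqrt(8/pi) ~ 1.60 (Eq. 6).
        deltas = []
        for _ in range(trials):
            s = rng.choice([-1, 1], size=n)
            i = rng.integers(n)
            s2 = s.copy()
            s2[i] = -s2[i]
            deltas.append(energy(s2, j) - energy(s, j))
        return float(np.mean(np.abs(deltas)))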


Fig. 1. The dependence of the global energy minimum E0 on the number of spins N.

We also calculated the mean magnitude of the energy variation at the change
of sign of one spin (S_i → −S_i); this result was averaged over 10000 independent
calculations. The calculated estimate was ⟨ΔE⟩ ≈ 1.60. The results of
these calculations agree with the estimations (4), (6).

2.2 Model of Evolutionary Process


Let us construct the spin-glass model of evolution. We suppose that the genotype of the
agent (modeled organism) is the set of N spins of the spin-glass system S. The fitness of
the agent, which has the genotype Sk, is:

f(S_k) = e^{-\beta E(S_k)},   (7)

where β is the parameter of selection intensity, β > 0.



The population is the set of n agents with genotypes S_k, k = 1, …, n. We suppose
that (1) the evolutionary process consists of consecutive generations, and (2) new generations
are obtained by the selection and the mutation of agents. An agent is selected
into the population of the new generation in accordance with the fitness (7). At the
mutations, the signs of genotype symbols are changed (S_ki → −S_ki) with the probability
P_m for each symbol. The selection of agents into the new population is probabilistic: an
agent is selected into the new population with a probability proportional to
its fitness f(S_k); namely, the well-known roulette-wheel (fitness-proportionate)
selection method is used. The genotypes of agents of the initial population are
random.
Similar to the quasispecies model with the Hamming distance between the genotypes
of agents [4, 5], we suppose the following natural relationships between the
parameters of the model: N, n ≫ 1, 2^N ≫ n, β ≳ P_m N, P_m N ≲ 1, n ~ N. The
inequality 2^N ≫ n means that the evolutionary process is essentially stochastic: the
number of genotype kinds present in the population is relatively small, and some kinds of
genotypes S are absent from the population. The relation β ≳ P_m N means that the
intensity of selection is sufficiently large. The relation P_m N ≲ 1 means that the intensity
of mutations is relatively small. The relation n ~ N means that the role of neutral
selection (independent of the fitness) is sufficiently small [5]. Under these relations, the
evolutionary rate is mainly determined by two processes, mutation and selection: at mutations,
new agents with new kinds of genotypes appear in the population; at selection,
the agents with large fitness are selected into the population of the new generation.
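A compact sketch of one generation of this evolutionary process, again reusing the illustrative energy helper from Sect. 2.1, could look as follows (the names and the NumPy-based implementation are our own assumptions):

    import numpy as np

    def next_generation(pop, j, beta, p_m, rng):
        # One generation: roulette-wheel (fitness-proportionate) selection
        # with fitness f = exp(-beta * E) as in Eq. (7), followed by
        # independent one-spin mutations with probability p_m per spin.
        energies = np.array([energy(s, j) for s in pop])
        fitness = np.exp(-beta * energies)
        probs = fitness / fitness.sum()
        idx = rng.choice(len(pop), size=len(pop), p=probs)
        new_pop = pop[idx].copy()
        flips = rng.random(new_pop.shape) < p_m
        new_pop[flips] *= -1
        return new_pop

Iterating this step over consecutive generations, starting from a random population, yields the evolutionary dynamics analyzed in Sect. 2.4.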

2.3 Estimations of the Rate and Efficiency of Evolutionary Search


Let us estimate the rate and efficiency of the evolutionary process in the spin-glass
model. For large N (N ≫ 1, 2^N ≫ n) and a sufficiently large population size n (when
the role of neutral selection is small), the total number of generations of the evolutionary
process G_T can be estimated as follows. The emergence of new agents with lower
energy in the population is the result of mutations; these agents are then selected
into the population of the new generation. The characteristic number of generations
G_{-1}, during which the average energy in the population ⟨E⟩_P decreases by 1, can be
estimated as:

G_{-1} \sim (G_M + G_S)/\Delta E,   (8)

where ΔE is the characteristic value of the energy variation at one mutation,
G_M ~ (N P_m)^{-1} is the characteristic number of generations required for a single
mutation in a genotype, and G_S ~ (β ΔE)^{-1} is the typical number of generations during which
agents with the energy ⟨E⟩_P − ΔE replace agents with the energy ⟨E⟩_P in the
population. P_m is the probability of one mutation. According to expression (6),
ΔE ~ 1.

From these relations, we have:

G_{-1} \sim (1/\Delta E)\,[\,1/(N P_m) + 1/(\beta \Delta E)\,], \quad \Delta E \approx 1.   (9)

The total change of the energy in the population during the evolutionary search of
energy minima according to (4), (5) is of order N, hence the characteristic number of
generations of the whole process of the evolutionary minimization of spin glass energy
G_T for the considered model is G_T ~ G_{-1} N. Therefore, we have:

G_T \sim 1/P_m + N/\beta.   (10)

The total number of agents involved in the evolution is n_total = n G_T.


Let us estimate the values GT and ntotal at a sufficiently high intensity of selection
(when it is possible to neglect the second term in (10)) and for a sufficiently large
population size (when the role of a neutral selection is small). Similar to the model of
quasispecies with Hamming distance between the genotypes [4, 5], we suppose that
P_m ~ N^{-1} and n ~ N. Finally, we obtain:

G_T \sim N, \qquad n_{total} \sim N^2.   (11)

The expressions (11) characterize the main results of our estimations. These
expressions have been checked by means of computer simulation.

2.4 Checking Estimations of the Rate and Efficiency of Evolutionary Search


The process of evolutionary search was analyzed by means of computer simulation.
The number of spins in the spin glass in the computer simulation was sufficiently large: N = 100.


Fig. 2. The dependence of spin glass energy of agents E on the generations G of evolutionary
search. 1 – the average energy of agents in the population, 2 – the minimal energy of agents in the
population. The parameters of the simulation were the following: the number of spins N = 100, the
population size n = N = 100, the mutation intensity P_m = N^{-1} = 0.01, and the parameter of selection
intensity β = 1. Results are averaged over 1000 different calculations.

Figure 2 shows the dependence of the spin glass energy of agents on the generations of evolutionary search.
Figure 2 shows that the characteristic number of generations of the evolutionary search
G_T is of the order of the number of spins N. This is in accordance with the estimations
(11); indeed, for the parameters of Fig. 2 (P_m = 0.01, β = 1, N = 100), expression (10)
gives G_T ~ 1/P_m + N/β = 200 generations.
It should be underlined that the evolutionary search results in one of the local
energy minima of a spin glass. These minima are rather close to the global minimum of
the energy of the spin glass.
We also considered the gradual decrease of the spin glass energy, which is performed as
follows. The signs of the spins are changed sequentially (S_i → −S_i,
i = 1, …, N), and only the successful sign changes (those resulting in a decrease of the spin glass
energy) are kept. This sequential search needs a smaller number of participants
as compared with the evolutionary search. Using computer simulation, we have
analyzed the sequential search. The process of energy minimization at the sequential search
is characterized by Fig. 3.


Fig. 3. The dependence of the spin glass energy E on the searching time t at the sequential
search. Results are averaged for 1000 different calculations.
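For comparison, the sequential search just described can be sketched as follows (again an illustration that reuses the energy helper from Sect. 2.1):

    import numpy as np

    def sequential_descent(s, j, sweeps=10):
        # Greedy single-spin descent: try S_i -> -S_i in order and keep only
        # the flips that lower the energy; approaches a local minimum where
        # no single flip decreases E.
        s = s.copy()
        e = energy(s, j)
        for _ in range(sweeps):
            for i in range(len(s)):
                s[i] = -s[i]
                e_new = energy(s, j)
                if e_new < e:
                    e = e_new           # keep the successful change
                else:
                    s[i] = -s[i]        # revert
        return s, e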

Comparison of Figs. 2 and 3 shows that the evolutionary search provides significantly
deeper local energy minima E_L than the sequential search, because
different valleys of the energy landscape are explored simultaneously in the
evolutionary process as it approaches the energy minima. Moreover, the evolutionary
search ensures the finding of sufficiently deep local minima that are close to the global
minimum (see expression (4) and Fig. 1, which characterize the value of the global minimum
quantitatively). Therefore, in the spin-glass case, the evolutionary search has a definite
advantage over the sequential search: the evolutionary minimization ensures
the finding of deeper energy minima.

3 Conclusion

Thus, the model of evolutionary minimization of spin glass energy has been developed.
The rate and efficiency of evolutionary minimization of energy of spin glasses have
been analytically estimated and checked by computer simulation. It has been demon-
strated that the evolutionary search ensures the finding of sufficiently deep local energy
minima that are close to the global minimum.

Acknowledgments. The work was financially supported by State Program of SRISA RAS.
Project number is No. 0065-2019-0003 (AAA-A19-119011590090-2).

References
1. Red’ko, V.G.: Spin glasses and evolution. Biofizika (Biophys.) 35(5), 831–834 (1990). (in
Russian)
2. Eigen, M.: Molekulare selbstorganisation und evolution (selforganization of matter and the
evolution of biological macromolecules). Naturwissenschaften 58(10), 465–523 (1971)
3. Eigen, M., Schuster, P.: The Hypercycle: A Principle of Natural Self-Organization. Springer,
Berlin (1979)
4. Red’ko, V.G., Tsoy, Y.R.: Estimation of the efficiency of evolution algorithms. Doklady
Math. (Rep. Math.) 72(2), 810–813 (2005)
5. Red’ko, V.G.: Modeling of cognitive evolution. Toward the Theory of Evolutionary Origin of
Human Thinking. KRASAND/URSS, Moscow (2018)
6. Sherrington, D., Kirkpatrick, S.: Solvable model of spin-glass. Phys. Rev. Lett. 35(26), 1792–
1796 (1975)
7. Kirkpatrick, S., Sherrington, D.: Infinite range model of spin-glass. Phys. Rev. B. 17(11),
4384–4403 (1978)
8. Tanaka, F., Edwards, S.F.: Analytic theory of the ground state of a spin glass: I. Ising spin
glass. J. Phys. F: Metal Phys. 10(12), 2769–2778 (1980)
9. Young, A.P., Kirkpatrick, S.: Low-temperature behavior of the infinite-range Ising spin-glass:
Exact statistical mechanics for small samples. Phys. Rev. B. 25(1), 440–451 (1982)
Comparison of Two Models of a Transparent
Competitive Economy

Zarema B. Sokhova and Vladimir G. Red’ko

Scientific Research Institute for System Analysis, Russian Academy of Sciences,


Moscow 117218, Russia
[email protected], [email protected]

Abstract. The article compares two models of a transparent competitive


economy. In both models, the interaction between investors and producers is
considered. In the first model, the producers do not take into account their own
contributions to their capitals; in the second model, the producers take their
contributions to their own capitals into account, i.e., the producers themselves
also play the role of investors. These two models were analyzed by computer
simulation. It is shown that in the first model, when the producers
give half of their profits to investors, the capital in the producer community is
redistributed by investors more efficiently.

Keywords: Autonomous agents · Transparent competitive economy · Investors · Producers

1 Introduction

This paper develops our previous works [1–3], in which the basic model of interaction
between two communities of agents has been constructed and investigated. The basic
model considers agents-producers and agents-investors. In the basic model, the producers
do not take into account their contributions to their own capitals at the distribution
of profits. In this paper, in addition to the basic model, a new model has been
constructed, in which producers take into account their own contributions to their
capitals at the distribution of their profits. This means that producers can be considered
as some kind of investors that contribute the capital into themselves. By computer
simulation, the results obtained in these two models are compared for two regimes:
(1) without taking the producers' own contributions into account (the basic model)
and (2) taking the producers' own contributions into account (the new model).

2 Description of Models

2.1 Basic Model


In the basic model, two communities of agents are considered: agent-investors and
agent-producers [1–3]. The number of investors is N and the number of producers is M;
their capitals are equal to K_inv and K_pro, respectively. Agents function during N_T periods.


At the end of each period T, the investors determine the values of the contributions that
they will make to producers in the next period T + 1. To find these values, t_max
iterations are performed. During the iterations, the investors and producers exchange
information by means of light agents: searching agents and intention agents. These
light agents are similar to those used in the works [4, 5].
At the beginning of the period, the i-th producer has the capital C_i:

C_i = C_{i0} + \sum_{j=1}^{N} C_{ij},   (1)

where Ci0 is the own initial capital of the i-th producer, Cij is the capital invested by the
j-th investor into the i-th producer at the beginning of the period. The dependence of
the i-th producer’s profit on its capital Ci is determined by the formula:

P_i(C_i) = k_i F(C_i),   (2)

where the function F(x) is the same for all producers, and the coefficient k_i
characterizes the efficiency of the i-th producer. The function F(x) has the form:

F(x) = a x, if x ≤ Th;   F(x) = Th, if x > Th,   (3)

where a is a positive parameter and Th is the threshold of the function F(x), Th > 0.


At the end of the period, the producer returns the invested capital to the investors. In
addition, the producer pays the investors a portion of its profit. At this payment, the j-th
investor obtains a part of the profit that is proportional to the investment made by this
investor in the i-th producer:

P_{inv,ij} = k_{repay} P_i(C_i) \, C_{ij} / \sum_{l=1}^{N} C_{il},   (4)

where Ci is the current capital (at the beginning of the period) of the i-th producer,
krepay is the payment parameter that characterizes the part of profits paid to investors,
0 < k_repay < 1. Note that in this basic model, the producers do not take into account the
size of their own contribution Ci0 and give the part of their profits to the investors
according to the parameter krepay (see the expression (4)). The producer itself obtains
the remaining part of the profit:

P_{pro,i} = P_i(C_i) - \sum_{j=1}^{N} P_{inv,ij}.   (5)

Let’s characterize the iterative process, during which the contributions of investors
into producers are determined. At the first iteration, the investors send the searching
agents to all producers and determine the current capital of each producer. Further, the

investors estimate the values Aij , which characterize the profit expected from the i-th
producer in the period. The values Aij are equal to:

A_{ij} = d_{ij} P'_{inv,ij} = d_{ij} k_{repay} k_i F(C'_i) \, C_{ij} / \sum_{l=1}^{N} C_{il},   (6)

where d_ij is the current degree of confidence of the j-th investor in the i-th producer, C_il
is the capital invested by the l-th investor into the i-th producer, and C'_i is the estimated capital
of the i-th producer at the beginning of the period (in the first iteration, the investments of
other investors are not taken into account). The current degree of confidence d_ij is equal
to d_test or d_untest, d_test > d_untest > 0. The parameters d_test and d_untest take into account the fact that
the investor prefers already tested producers. In the computer simulation, we set d_test = 1,
d_untest = 0.5.
Then the j-th investor forms the intention to distribute its capital K_inv,j among the
producers proportionally to the values A_ij; namely, it is planned that the contribution
of the j-th investor to the i-th producer, C_ij, will be equal to:

C_{ij} = K_{inv,j} \, A_{ij} / \sum_{l=1}^{M} A_{lj}.   (7)

At the second iteration, each investor sends the intention agents to all producers
and informs them about the planned values of the capital investments C_ij. Based on these
data, the producers estimate their new capitals, which they expect after receiving
capitals from all investors. These capitals are calculated in accordance with the
expression (1).
Then the investors again send the searching agents to all producers and evaluate the
new capitals of the producers C'_i (taking into account the planned investments C_ij of
the other investors), as well as the sums \sum_{l=1}^{N} C_{il}. The investors estimate new values
A_ij in accordance with expression (6), which now takes into account the sum of
the intended contributions of all investors. Further, each investor forms a new intention
to distribute the capital K_inv,j according to expression (7). Then the investors send
intention agents to the producers and inform them about the new intended values of the
contributions C_ij. After a sufficiently large number of such iterations, each investor
makes the final decision on investments for the next period. The final contributions are
equal to the values C_ij obtained by the investors at the last iteration.
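A condensed sketch of one iteration of this planning loop for the basic model is given below (an illustrative NumPy version with our own variable names; the special handling of the very first iteration, when no contributions are planned yet, is omitted):

    import numpy as np

    def iterate_contributions(c, c0, k, k_inv, d, k_repay, a, th):
        # One planning iteration of the basic model (Eqs. 1, 3, 6, 7).
        #   c     : (M, N) planned contributions C_ij of investor j to producer i
        #   c0    : (M,)   own capitals of producers, k: (M,) efficiencies k_i
        #   k_inv : (N,)   investor capitals, d: (M, N) confidence degrees d_ij
        def profit_fn(x):                       # Eq. (3)
            return np.where(x <= th, a * x, th)

        cap = c0 + c.sum(axis=1)                # Eq. (1): expected producer capital
        denom = c.sum(axis=1, keepdims=True)
        denom[denom == 0] = 1.0                 # guard against empty plans
        aij = d * k_repay * (k * profit_fn(cap))[:, None] * c / denom   # Eq. (6)
        col = aij.sum(axis=0, keepdims=True)
        col[col == 0] = 1.0
        return k_inv[None, :] * aij / col       # Eq. (7): new planned C_ij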
At the end of each period, the capitals of the producers are reduced:
K_pro(T + 1) = k_amr K_pro(T), where k_amr is the amortization coefficient (0 < k_amr ≤ 1).
The investors' capitals are reduced analogously: K_inv(T + 1) = k_inf K_inv(T), where k_inf is the
inflation coefficient (0 < k_inf ≤ 1).
If the capital of an investor or producer becomes greater than a certain large threshold
Th_max,inv or Th_max,pro, and the number of agents in the community is less than the

possible maximum, then this investor or producer is divided into two agents. When the
investor or producer is divided, the "parent" gives half of its capital to the "descendant".
The "producer-child" inherits the efficiency k_i of its parent. The "investor-child"
inherits the confidence factors d_ij of the parent investor. The confidence factor d_ij
for the "descendant" of a producer is set equal to d_untest, since this new producer has
not been tested yet.
If the capital of an investor or producer becomes less than a certain small threshold
Th_min,inv or Th_min,pro, then this investor or producer dies.

2.2 New Model


In the basic model described above, the distribution of profits by producers and the
estimations of profits (see expressions (4), (6)) do not consider the contribution of
the producers: independently of the producer's contribution C_i0, the profit is distributed
between the producer and the investors according to the payment parameter k_repay.
In the new model, we take the contribution of the producer into account at the distribution of
profits and modify expressions (4) and (6) as follows:

P_{inv,ij} = P_i(C_i) \, C_{ij} / \big( \sum_{l=1}^{N} C_{il} + C_{i0} \big),   (8)

A_{ij} = d_{ij} P'_{inv,ij} = d_{ij} k_i F(C'_i) \, C_{ij} / \big( \sum_{l=1}^{N} C_{il} + C_{i0} \big).   (9)

Thus, at the distribution of profits, each agent (both the producer and the investor)
receives a profit that is proportional to the contribution of this agent.
The other elements of the new model are the same as in the basic model.
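In terms of the illustrative sketch given at the end of Sect. 2.1, the modification of Eqs. (8), (9) amounts to two changed lines (again only an illustration):

    # New model (Eqs. 8, 9): the producer's own capital C_i0 enters the
    # denominator and the k_repay factor disappears.
    denom = c.sum(axis=1, keepdims=True) + c0[:, None]
    aij = d * (k * profit_fn(cap))[:, None] * c / denom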

3 Results of Computer Simulation

In the computer simulation, we compared the basic model and the new model.
The main parameters of the simulation were the following: the number of periods
N_T = 1 or 100; the maximal number of iterations within a period t_max = 10; the maximal
capital thresholds for investors and producers Th_max,inv = 100.0 and Th_max,pro = 100.0;
the minimal capital thresholds for investors and producers Th_min,inv = 0.01 and Th_min,pro = 0.01;
the maximal possible numbers of producers and investors in the community
M_max = 2 or 100 and N_max = 1 or 100; the initial numbers of producers and investors
M_0 = 2 or 100 and N_0 = 1 or 100; the maximal number of producers in which an investor
can invest its capital m = 2 or 100; the parameter of the profit function a = 0.1; the
threshold of the profit function Th = 100 (see expression (3)); the payment
parameter k_repay = 0.5; the amortization and inflation coefficients k_amr = 1.0 and k_inf = 1.0;

the characteristic value of the random variation of the efficiency of producers at the
transition to a new period Δk = 0.01.
For a clearer understanding of how the profit-distribution scheme used by the
producer influences the process of capital investment, a separate simulation was carried out
for the particular case of one investor and two producers. The efficiencies of the producers
were k_1 = 0.34 and k_2 = 0.94; the capital of the investor was K_inv = 0.54; the capitals of
the producers were K_pro,1 = 0.48 and K_pro,2 = 0.26. Figure 1 shows the processes of
redistribution of the capital by the investor during iterations for the two considered models.

Fig. 1. Distribution of the investor’s contributions during iterations in the period T = 0.

Figure 1 demonstrates that in the basic model, the investor makes contributions into
two producers, and in the new model, the investor selects only one, the most efficient
producer. That is, in the basic model, the investor, when planning the contributions to
producers, pays attention to both the efficiency and the capital of the producers, whereas in
the new model the investor takes into account only the efficiency of the producers (see also
expressions (6), (9) and (2), (3)).
Let us consider the case of a large community: N = M = 100. The simulation
results for the considered models are presented in Fig. 2.
Analysis of the results for this case shows that in the basic model, when the
producers pay half of their profits to investors (k_repay = 0.5), the capital of the producer
community is redistributed by investors more effectively. That is, in the next period, the
investor gives the obtained capital to more efficient producers. This is the important
effect of the basic model: the efficient redistribution of capital within the producer
community (by means of investors). Indeed, in the basic model, the total profit (and the
total capital) of the producer community is greater as compared with the new model
(Fig. 2).

Fig. 2. Dynamics of the total capital of investors and producers in the two models. N = M = 100 (lines
for producers and investors in the basic model coincide).

On the other hand, the regime of the new model is more profitable for investors. In
this model, investors choose the most efficient producers, and the profit depends only
on the size of the investments and the efficiency of the producer. The following point of
the new model should be noted. The investor exploits the efficiency of the producer and
receives the main part of the profit, corresponding to the investor's contribution. The
producer receives only a rather small part of the profit, corresponding to the producer's
own contribution. Therefore, in the new model, the profits of producers grow more slowly
than in the basic model (Fig. 2). From an economic point of view, the regime
of the new model is rather unnatural, since intensive investment by investors
is not very useful for the producers. Therefore, the interaction between agents
is rather ineffective in the new model. Thus, the regime of the basic model is more
interesting for further research.

4 Conclusion

It can be concluded that the behavior of the investors depends on the rules for the
estimation and distribution of profits. Although the regime of the new model is
beneficial for the investor community, it is not profitable for the producers. The
producer community is developing more efficiently if the regime of the basic model is
used. Thus, the regime of the basic model is more effective for the total development of
the whole economic community.

Acknowledgments. The work was financially supported by State Program of SRISA RAS.
Project number is No. 0065-2019-0003 (AAA-A19-119011590090-2).

References
1. Red’ko, V.G., Sokhova, Z.B.: Model of collective behavior of investors and producers in
decentralized economic system. Procedia Comput. Sci. 123, 380–385 (2018)
2. Red’ko, V.G., Sokhova, Z.B.: Iterative method for distribution of capital in transparent
economic system. Opt. Mem. Neural Netw. (Inf. Opt.) 26(3), 182–191 (2017)
3. Sokhova, Z.B., Red’ko, V.G.: Agent-based model of interactions in the community of
investors and producers, In: Samsonovich, A.V., Klimov, V.V., Rybina, G.V. (eds.)
Biologically Inspired Cognitive Architectures (BICA) for Young Scientists. Proceedings of
the First International Early Research Career Enhancement School (FIERCES 2016), pp. 235–
240. Springer, Switzerland (2016)
4. Claes, R., Holvoet, T., Weyns, D.: A decentralized approach for anticipatory vehicle routing
using delegate multiagent systems. IEEE Trans. Intell. Transp. Syst. 12(2), 364–373 (2011)
5. Holvoet, T., Valckenaers, P.: Exploiting the environment for coordinating agent intentions.
In: Environments for Multi-Agent Systems III, Lecture Notes in Artificial Intelligence, vol.
4389, pp. 51–66. Springer. Berlin (2007)
Spectral Parameters of Heart Rate Variability
as Indicators of the System Mismatch During
Solving Moral Dilemmas

I. M. Sozinova1,2, K. R. Arutyunova2, and Yu. I. Alexandrov1,2,3


1
Moscow State University of Psychology and Education, Moscow, Russia
[email protected]
2
Institute of Psychology, Russian Academy of Sciences, Moscow, Russia
3
Department of Psychology, National Research University Higher School
of Economics, Moscow, Russia

Abstract. Variability in beat-to-beat heart activity reflects the dynamics of


heart-brain interactions. From the positions of the system evolutionary theory,
any behaviour is based on simultaneous actualization of functional systems
formed at different stages of phylo- and ontogenesis. Each functional system is
comprised by neurons and other body cells, the activity of which contributes to
achieving an adaptive outcome for the whole organism. In this study we
hypothesized that the dynamics of spectral parameters of heart rate variability
(HRV) can be used as an indicator of the system mismatch observed when
functional systems with contradictory characteristics are actualized simultane-
ously. We presented 4–11-year-old children (N = 34) with a set of moral
dilemmas describing situations where an in-group member achieved optional
benefits by acting unfairly and endangering lives of out-group members. The
results showed that LF/HF ratio of HRV was higher in children with developed
moral attitudes for fairness toward out-groups as compared to children who
showed preference for in-group members despite the unfair outcome for the out-
group. Thus, the system mismatch in situations with a moral conflict is shown to
be reflected in the dynamics of heart activity.

Keywords: System evolutionary theory · Heart–brain interactions · Spectral parameters of heart rate variability · Moral dilemmas · In-group · Out-group

1 Introduction

Changes in heart rate variability (HRV) reflect the brain – heart interactions (e.g., [10,
14, 22, 24]). HRV indexes have previously been considered as indicators of changes in
brain activation [24]. The baseline HRV is different in people in a state of coma as
compared to healthy people, and some authors have suggested that HRV can serve as an
indicator of the intensity of brain activity [17]. Thayer and colleagues [23] argued that
changes in HRV reflect the hierarchy in the organization of an organism and are usually
observed in response to indeterminacy and mismatch. The authors suggested that HRV
could indicate the “vertical” integration of the brain mechanisms controlling an


organism. It was noted that research into the relationship between heart and brain
activity could open new horizons for the study of psychophysiological bases of indi-
vidual behaviour [12].
Considered from the positions of the system evolutionary theory [2, 5, 21], any
behaviour is based on simultaneous actualization of functional systems [3] formed at
different stages of phylo- and ontogenesis. Each functional system is comprised by
neurons and other body cells, including those of the heart, the joint activity of which
contributes to achieving an adaptive outcome for the whole organism. From these
positions, “HRV originates in cooperation of the heart with the other components of
actualized functional systems” and reflects the system organization of behaviour (see
[6]: p. 2).
Our previous studies have found that in the process of individual development
children gradually shift from supporting in-group members, even when they behave
unfairly towards out-group members, to prioritizing fairness towards all other
individuals, irrespective of what group they belong to [19, 20]. We argued that learning to
support fairness towards out-groups is associated with forming new functional systems
enabling this more complex behaviour. However, fairness towards outgroups can be
contradictory to earlier formed unconditional in-group preference. Situations like this
can be described as the system mismatch, when functional systems with contradictory
characteristics are actualized simultaneously. Here we hypothesize that in a situation
of a conflict between in- and out-group members, fairness towards out-groups would
predetermine the occurrence of a system mismatch reflected in HRV. To test this
hypothesis, we analyzed the spectral parameters of HRV in children solving moral
dilemmas with a conflict between in- and out-group members.

2 Materials and Methods

Thirty-four children participated in the study: 4–5-year-old pre-schoolers (N = 19;


Mean = 5,14; Med = 5; S.D. = 0,43; 25% = 4,48; 75% = 5,35) and 10–11-year-old
school children (N = 15; Mean = 10,62; Med = 10,92; S.D. = 0,52; 25% = 10;
75% = 11). The experimental protocols were approved by the Ethics Committee of the
Institute of Psychology Russian Academy of Sciences. Parents of all participants were
provided with detailed information about procedures of the study and signed informed
consent forms to allow their children to participate. Each child was individually
interviewed in a separate room. All children were presented with a set of moral
dilemmas describing situations when a limited resource was essential for the survival of
an out-group member and beneficial, but not vital, for the well-being of an in-group
member. In each dilemma, an in-group member took away the resource, putting an out-
group member’s life at risk, and children had to choose who to support in this situation.
Heart rate was recorded during the entire experiment using a photoplethysmograph
RB-16CPS (Neurolab) and wireless sensor Zephyr HxM BT. BMInput (A.K. Krylov)
and HR-reader (V.V. Kozhevnikov) software were used.
Pulsograms were converted into sequences of RR intervals by the “Neuru” program
(A.K. Krylov). The spectral parameters of HRV were calculated using the RRv7 software
(I.S. Shishalov) (window length — 100 s; step — 10 s). We analysed the following

spectral parameters of HRV: low frequency power of HRV (LF), high frequency power
of HRV (HF), total power of HRV (TP), and LF/HF ratio [13].
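For reference, a minimal NumPy/SciPy sketch of how such spectral parameters can be obtained from one window of RR intervals is given below (an illustration with standard band limits, not the RRv7 implementation used in this study):

    import numpy as np
    from scipy.interpolate import interp1d
    from scipy.signal import welch

    def hrv_spectral(rr_ms, fs=4.0):
        # LF, HF, TP and LF/HF for one analysis window of RR intervals (ms),
        # using standard band limits (LF 0.04-0.15 Hz, HF 0.15-0.40 Hz).
        t = np.cumsum(rr_ms) / 1000.0                      # beat times, s
        grid = np.arange(t[0], t[-1], 1.0 / fs)            # evenly resampled tachogram
        rr_even = interp1d(t, rr_ms, kind='cubic')(grid)
        f, psd = welch(rr_even - rr_even.mean(), fs=fs,
                       nperseg=min(256, len(grid)))

        def band_power(lo, hi):
            m = (f >= lo) & (f < hi)
            return np.trapz(psd[m], f[m])

        lf = band_power(0.04, 0.15)
        hf = band_power(0.15, 0.40)
        tp = band_power(0.0, 0.40)
        return lf, hf, tp, lf / hf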
Responses to dilemmas were coded as “1”, if a child chose to support an out-group
member, and “0”, if a child chose to support an in-group member. Average scores
characterising individual responses to all dilemmas were also calculated. For the
analyses, all participants were subdivided into two groups: those who supported out-
group members in more than a half of the dilemmas (“out-group supporters”) and those
who supported in-group members in more than a half of the dilemmas (“in-group
supporters”).
Statistical analyses were performed with IBM SPSS Statistics 17. Significance was set at
p < 0.05.
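The group comparison can be sketched with SciPy as an illustrative equivalent of the SPSS procedure (the arrays below contain hypothetical values, not the study data):

    from scipy.stats import mannwhitneyu

    # Hypothetical LF/HF values, for illustration only (not the study data).
    lf_hf_out = [2.1, 1.8, 2.4, 1.9, 2.6]   # "out-group supporters"
    lf_hf_in = [1.1, 0.9, 1.3, 1.0, 1.2]    # "in-group supporters"

    u, p = mannwhitneyu(lf_hf_out, lf_hf_in, alternative='two-sided')
    print(f'U = {u:.1f}, p = {p:.4f}')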

3 Results

Average scores characterising individual responses to all dilemmas were different


between pre-schoolers and school age children, with pre-schoolers supporting out-
group members less often (Mann-Whitney U test: U = 73.5, z = –2.43; p = 0.015). No
significant difference between the “in-group supporters” and “out-group supporters”
was observed in LF, HF or TP. Higher values of LF/HF ratio were shown in “out-group
supporters” as compared to “in-group supporters” (Mann-Whitney U test: U = 1.0,
z = –2.939; p = 0.003, for 4–5-year-olds; and U = 4.0, z = –2.66; p = 0.0008, for all
children). No difference in LF/HF ratio was observed between the groups of 4–5-year-
old and 10–11-year-old children within the subgroup of “out-group supporters” (see
Fig. 1).

Fig. 1. Higher values of LF/HF ratio in children supporting out-group members as compared to
children supporting in-group members in situations with a conflict where out-group members
were treated unfairly by in-group members. * Mann-Whitney U test, p < 0.05.

There was an insufficient number of “in-group supporters” among the 10–11-year-


old children for such statistical comparison.

4 Discussion

In this study we tested the hypothesis that in a situation of a conflict between in- and
out-group members, fairness towards out-groups would predetermine the occurrence of
a system mismatch, which is observed when functional systems with contradictory
characteristics are actualized simultaneously; and such a mismatch would be reflected
in HRV.
As mentioned above, any behaviour, including moral dilemma solving, is sup-
ported by simultaneous actualization of functional systems formed at different stages of
individual development. Our previous work [19, 20] demonstrated that young pre-
school age children tended to exhibit unconditional in-group preference, which is
considered a behavioural strategy based on actualization of functional systems formed
early in individual development, including those associated with parochial altruism
(unconditional in-group preference with aggressive behaviour toward out-groups [1, 9,
11]). Older children were shown to develop a more complex behavioural strategy to
support those treated unfairly, including members of out-groups, which requires
actualisation of later-formed functional systems. This is consistent with the view that
reciprocal altruism toward out-group members requires higher cognitive complexity
[16]. It is possible that the whole structure of individual experience is reorganised
through the formation of “new” systems enabling a new type of behaviour, which may
require some time. The development of moral attitudes towards out-groups occurs
gradually and requires accumulation of a sufficient number of episodes associated with
the “new” moral behaviour. The conflict between the earlier and later formed systems
activated simultaneously can be described as an instance of the system mismatch,
because these systems have contradictory characteristics.
The results of this study showed that in situations involving a conflict where out-
group members are treated unfairly by in-group members, the decision to support out-
group members was associated with higher values of LF/HF ratio of HRV. Higher
values of LF/HF ratio are usually observed during stress [7, 8, 15, 18], which is also
considered as a situation of the system mismatch [4]. Thus, the results of this study
indicate that characteristics of social behaviour and its development, as observed in
case of moral attitudes toward in- and out-group members, can be manifested in the
dynamics of individual psychophysiological states.

Acknowledgements. The reported study was funded by RFBR, the research project № 18-313-
20003_mol_a_ved.

References
1. Abbink, K., Brandts, J., Herrmann, B., Orzen, H.: Parochial altruism in inter-group conflicts.
Econ. Lett. 117(1), 45–48 (2012)
2. Alexandrov, Yu.I.: How we fragment the world: the view from inside versus the view from
outside. Soc. Sci. Inf. 47(3), 419–457 (2008)
3. Alexandrov, Yu.I.: Cognition as systemogenesis. In: Anticipation: Learning from the Past,
pp. 193–220. Springer, Cham (2015)
4. Alexandrov, Yu.I., Svarnik, O.E., Znamenskaya, I.I., Kolbeneva, M.G., Arutynova, K.R.,
Krylov, A.K., Bulava, A.I.: Regression as stage of development [Regressiya kak etap
razvitiya]. M.: Institute of Psychology Ras [Institut Psikhologii RAN] (2017) [in Russian]
5. Alexandrov, YuI, Grechenko, T.N., Gavrilov, V.V., Gorkin, A.G., Shevchenko, D.G.,
Grinchenko, Y.V., Bodunov, M.V.: Formation and realization of individual experience.
Neurosci Behav Physiol 27(4), 441–454 (1997)
6. Anokhin, P.K.: Biology and Neurophysiology of Conditioned Reflex and Its Role in
Adaptive Behavior, 1st edn. Pergamon Press, Oxford (1974)
7. Bakhchina, A.V., Arutyunova, K.R., Sozinov, A.A., Demidovsky, A.V., Alexandrov, Y.I.:
Sample entropy of the heart rate reflects properties of the system organization of behaviour.
Entropy 20(6), 449 (2018)
8. Bakhchina, A.V., Shishalov, I.S., Parin, S.B., Polevayam, S.A.: The dynamic cardiovascular
markers of stress. Int. J. Psychophysiol. 94(2), 230 (2014)
9. Bernhard, H., Fischbacher, U., Fehr, E.: Parochial altruism in humans. Nature 442(7105),
912 (2006)
10. Billman, G.E.: The effect of heart rate on the heart rate variability response to autonomic
interventions. Front. Physiol. 4, 222 (2013)
11. Choi, J.K., Bowles, S.: The coevolution of parochial altruism and war. Science 318(5850),
636–640 (2007)
12. Lane, R.D., Wager, T.D.: The new field of Brain-Body Medicine: What have we learned and
where are we headed? NeuroImage 47(3), 135–1140 (2009)
13. Lombardi, F.: Clinical implications of present physiological understanding of HRV
components. Card. Electrophysiol. Rev. 6(3), 245–249 (2002)
14. McCraty, R., Atkinson, M., Tomasino, D., Bradley, R.T.: The coherent heart heart-brain
interactions, psychophysiological coherence, and the emergence of system-wide order.
Integr. Rev. A Transdisc. Transcult. J. New Thought Res. Prax. 5(2) (2009)
15. Polevaya, S.A., Eremin, E.V., Bulanov, N.A., Bakhchina, A.V., Kovalchuk, A.V., Parin, S.
B.: Event-related telemetry of heart rate for personalized remote monitoring of cognitive
functions and stress under conditions of everyday activity. Sovremennye tekhnologii v
medicine 11(1 (eng)) (2019)
16. Reznikova, Z.: Altruistic behavior and cognitive specialization in animal communities. In:
Encyclopedia of the Sciences of Learning, pp. 205–208 (2012)
17. Riganello, F., Candelieri, A., Quintieri, M., Conforti, D., Dolce, G.: Heart rate variability: an
index of brain processing in vegetative state? An artificial intelligence, data mining study.
Clin. Neurophysiol. 121, 2024–2034 (2010)
18. Runova, E.V., Grigoreva, V.N., Bakhchina, A.V., Parin, S.B., Shishalov, I.S., Kozhevnikov,
V.V., Nekrasova, M.M., Karatushina, D.I., Grigoreva, K.A., Polevaya, S.A.: Vegetative
correlates of conscious representation of emotional stress. CTM 5(4), 69–77 (2013)
19. Sozinova, I.M., Znamenskaya, I.I.: Dynamics of Russian children’s moral attitudes toward
out-group members. In: The Sixth International Conference On Cognitive Science, p. 94
(2014)

20. Sozinova, I.M., Sozinov, A.A., Laukka, S.J., Alexandrov, Yu.I.: The prerequisites of
prosocial behavior in human ontogeny. Int. J. Cogn. Res. Sci. Eng. Educ. (IJCRSEE) 5(1),
57–63 (2017)
21. Shvyrkov, V.B.: Behavioral specialization of neurons and the system-selection hypothesis of
learning. In: Human Memory and Cognitive Capabilities, pp. 599–611. Elsevier, Amsterdam
(1986)
22. Stefanovska, A.: Coupled oscillators: complex but not complicated cardiovascular and brain
interactions. In: 2006 International Conference of the IEEE Engineering in Medicine and
Biology Society, pp. 437–440. IEEE (2006)
23. Thayer, J.F., Lane, R.D.: Claude Bernard and the heart–brain connection: Further
elaboration of a model of neurovisceral integration. Neurosci. Biobehav. Rev. 33, 81–88
(2009)
24. Van der Wall, E.E., Van Gilst, W.H.: Neurocardiology: close interaction between heart and
brain. Netherlands Heart J. 21(2), 51–52 (2013)
The Role of Brain Stem Structures
in the Vegetative Reactions Based
on fMRI Analysis

Vadim L. Ushakov1,2, Vyacheslav A. Orlov1, Yuri I. Kholodny1,3,


Sergey I. Kartashov1,2, Denis G. Malakhov1,
and Mikhail V. Kovalchuk1
1
National Research Center “Kurchatov Institute”, Moscow, Russia
[email protected]
2
National Research Nuclear University “MEPhI”, Moscow, Russia
3
Bauman Moscow State Technical University, Moscow, Russia

Abstract. This work was aimed at studying the role of brain stem structures in
vegetative responses upon presentation of self-significant stimuli (a personal
name) using the functional MRI method. The subjects, based on the data of the
MRI compatible polygraph, were divided into three groups with different degrees
of vegetative reactions to personality-related stimuli: with strong galvanic skin
reactions (GSR) only—7 subjects; with medium GSR and cardiovascular
response (CR)—6 subjects; and with low reactivity of GSR and CR—5 subjects.
The obtained statistical maps of brain neural network activities showed high
activation of the brain stem structures upon presentation of personality-related
stimuli in the second group (medium GSR and CR); low activation of the stem
structures in the first group (strong GSR); and complete absence of activation of
the stem structures in subjects of the third group (with low reactivity of the GSR
and CR). It was shown that the use of MRI compatible polygraph for selection
of fMRI data to subsequent statistical analysis is effective.

Keywords: MRI compatible polygraph · fMRI · Vegetative reactions · Traces of memory

1 Introduction

In the study of operation of brain neural networks and the determination of their exact
spatial-temporal characteristics, objective monitoring of the current condition of sub-
jects during functional magnetic resonance imaging (fMRI) is necessary. For this
purpose, an MRI compatible polygraph (MRIcP) has been developed at NRC “Kurchatov
Institute”, which allows monitoring the dynamics of human vegetative reactions
during MRI examination (earlier, for this purpose, we used MRI compatible elec-
troencephalograph [1] and eye-tracker [2–4]). The data obtained with the use of MRIcP
could serve as correlates of important neurophysiological processes in the brain and
could be used to determine activation of neural networks involved in these processes.
In this work, a study was carried out using an MRIcP to reveal the relationship between
the dynamics of vegetative reactions—galvanic skin response (GSR) and
cardiovascular response (CR)—in response to presentation of stimuli that are


personality-related for subjects, and activity of the brain stem areas potentially
responsible for the regulation of the human cardiovascular system.

2 Materials and Methods

Experiments were performed on a homogeneous group of 20 healthy subjects (men


aged 22–25 years). This study was approved by the ethics committee of the National
Research Centre Kurchatov Institute, ref. no. 5 (from April 5, 2017). All subjects
signed an informed consent for participation in the study.
The experiment was conducted using a Siemens Magnetom Verio 3T MRI scanner
at the NRC Kurchatov Institute. To obtain anatomical MRI images, a three-dimensional
T1-weighted sequence was used in the sagittal plane with high spatial
resolution (176 slices, TR = 2530 ms, TE = 3.31 ms, slice thickness = 1 mm, flip angle = 7°,
inversion time = 1200 ms and FOV = 256 × 256 mm²). Functional data were obtained
using a standard echo-planar sequence (32 slices, TR = 2000 ms, TE = 24 ms and
isotropic voxels of 2 × 2 × 2 mm³). Preprocessing of MRI data was carried out on the
basis of the freely distributed software package SPM8 [5], and specially adapted and
developed terminal scripts for the MacOS system. The coordinate origin of the structural
and functional data was set to the anterior commissure. Next, motion artifacts were
estimated and corrected. With the help of separately recorded magnetic
field inhomogeneity maps, the functional data were corrected in order to remove magnetic
susceptibility artifacts. Structural and functional MRI volumes were normalized to the
MNI (Montreal Neurological Institute) space. In order to suppress random outliers,
a Gaussian filter with a 6 × 6 × 6 mm³ kernel was applied to the functional data.
The preprocessing procedure was carried out according to the above scheme for each of
the 20 subjects. Student's t-test was used for statistical analysis. For the calculation of the
connectivity between brain areas, the CONN toolbox for SPM was used.
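A simplified sketch of the voxel-wise contrast underlying such statistical maps (personality-related versus neutral stimuli) is given below; it uses a plain NumPy general linear model with a canonical two-gamma HRF and is only an illustration of the principle, not the SPM8 pipeline used in the study.

    import numpy as np
    from scipy.stats import gamma

    def hrf(t):
        # Canonical two-gamma haemodynamic response function (arbitrary scale).
        return gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)

    def contrast_t(bold, onsets_self, onsets_neutral, tr=2.0):
        # t-value of the (self-significant - neutral) contrast for one voxel.
        n = len(bold)

        def regressor(onsets):
            stick = np.zeros(n)
            stick[(np.asarray(onsets) / tr).astype(int)] = 1.0   # onsets within scan
            return np.convolve(stick, hrf(np.arange(0.0, 32.0, tr)))[:n]

        x = np.column_stack([regressor(onsets_self),
                             regressor(onsets_neutral),
                             np.ones(n)])                 # design matrix + constant
        beta = np.linalg.lstsq(x, bold, rcond=None)[0]
        c = np.array([1.0, -1.0, 0.0])                    # self > neutral
        dof = n - np.linalg.matrix_rank(x)
        sigma2 = np.sum((bold - x @ beta) ** 2) / dof
        var_c = sigma2 * c @ np.linalg.pinv(x.T @ x) @ c
        return (c @ beta) / np.sqrt(var_c)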
During the experiments, the so-called “test with a concealed name” (TCN), widely
used in forensic studies using a polygraph (SUP), was applied. In this test, the person
under study (hereinafter referred to as the subject) concealed his own name, presented
along with five other names, from the polygraph examiner; the series of names was
presented to the subject five times during the test. With the exception of one name, which stood under
the number “0”, all the others were presented in a random, unknown order to the subject
with the phrase “Your passport name is…”. The names were presented by the experi-
menter with an interval of about 20 s with the obligatory account of the current
dynamics of the physiological parameters, recorded using MRIcP. The cumulative
graphical representation of the physiological parameters during the TCN was visualized
on the computer screen in the form of a polygram. The dynamics of the electrical
properties of the skin, i.e. galvanic skin reactions, as well as reactions in the cardio-
vascular system manifested in the change of heart rate and narrowing of the blood
vessels of fingers (the so-called vascular spasm) were analyzed. Registered physiolog-
ical reactions were expertly evaluated on a 3-point scale widely used in SUP practice [6].
This test and the principle of classification of subjects (20 persons) into subgroups—
high-reactive subjects (15 persons) and low-reactive subjects (5 persons)—are described

in detail in [7]. In the subgroup of high-reactive subjects (15 persons), the degree of GSR
was in the range of 60–100% (that is, the subjects have according to the GSR in TCN
from 6 to 10 points out of 10 possible). In low-reactive subjects (5 persons), the degree
of GSR was 40% or less (i.e., subjects received 4 or less out of 10 possible).
It should be noted that a subgroup of 15 highly reactive subjects, according to
MRIcP data, also turned out to be heterogeneous (as described in [7]) and was divided
into two parts. The subjects, based on the MRIcP data, were divided into three groups with
different degrees of autonomic reactions to personality-related stimuli: with strong GSR
only—7 subjects (group 1), with mean GSR and CR (measured by photoplethysmo-
gram signal)—6 subjects (group 2), and with low reactivity of the GSR and CR—5
subjects (group 3). Two people were excluded from the analysis because they had no
signs of this gradation.
The obtained statistical maps of brain neural networks activity (see below) showed
high activation of brain stem structures upon personality-related stimuli presentation in
the second group (mean GSR and CR), low activation of stem structures in the first
group (strong GSR), and total absence of stem structure activation in subjects of the
third group (low-reactive GSR and CR).
The first group included the subjects in whom only the GSR was highly informative in
identifying the concealed name, while the subjects in the second group had the GSR and
vascular spasm (Fig. 1) as informative parameters.

Fig. 1. Polygram of TCN of a highly reactive subject. 8 channels correspond to: 1—sound of
presented stimuli; 2—sound of subject responses (along with the sound of MRI scanner); 3—
subject head movement; 4,5—upper and lower pneumogram sensors, 6—GSR; 7—HR; 8—
photoplethysmogram.

Figure 1 shows the fifth (last) presentation of the TCN. Concealing
meaningful information (the subject's own name, Alexander, highlighted by a rectangle in
Fig. 1) causes the subject to show the maximum GSR (channel 6), a decrease in heart rate
(channel 7; the moving “lens” shows 85 beats per minute) and a pronounced narrowing of
the finger vessels, minimal in this presentation (channel 8).
It was very difficult to identify the concealed name in low-reactive subjects from the
reactions recorded by the MRIcP, due to their physiological characteristics: low reactivity and
instability of the GSR, heart rate and vascular spasm (Fig. 2).

Fig. 2. Polygram of TCN of a low-reactive subject. 8 channels correspond to: 1—sound of


presented stimuli; 2—sound of subject responses (along with the sound of MRI scanner); 3—
subject head movement; 4,5—upper and lower pneumogram sensors, 6—GSR; 7—HR; 8—
photoplethysmogram.

Figure 2 shows the chaotic appearance of the GSR during the third (out of five)
presentation of the TCN. Concealing his own name (Andrew, highlighted by a rectangle)
among other names causes only a very weak GSR in this subject (channel 6), which is
not accompanied by a drop in heart rate (channel 7) or by narrowing of the
finger vessels (channel 8).

3 Results

Figure 3 shows fMRI results obtained for the three groups of subjects, divided on the
basis of the MRIcP data: with strong GSR only (group 1); with mean GSR and CR
(group 2); with low reactivity of the GSR and CR (group 3).

Fig. 3. The results of the group statistical analysis (p < 0.001) for the comparison of personality-related
stimuli perception in relation to neutral stimuli. The figure shows a group statistical map
underlaid with a high-resolution T1 image at levels x = −8, −6, −4: A—group 1; B—group 2; C
—group 1 with removal of some of the fMRI samples of perception of neutral names in the cases
when there was high reactivity in the MRIcP signal; D—group 2 with removal of some of the
fMRI samples of perception of neutral names in the cases when there was high reactivity in
MRIcP signal; E—group 3; F—group 3 with removal of some of the fMRI samples of perception
of neutral names in the cases when there was high reactivity in the MRIcP signal.


On the basis of the obtained data on brain stem activation upon presentation of
self-significant stimuli, the connectivity between this zone and other parts of the brain was
reconstructed separately for the groups with pronounced physiological reactions (15 subjects)
and with low physiological reactions (5 subjects). As a result, it was shown that for the
group of subjects with pronounced physiological reactions, a statistically significant
(p < 0.001) negative correlation was observed between the activity of the brain stem
and the hippocampus when perceiving personality-related stimuli with respect to
neutral ones.

4 Discussion

As can be seen from the results shown in Fig. 3, for the group with mean GSR and HR changes a pronounced activation of the brain stem structures is observed upon presentation of self-significant stimuli (see Fig. 3A and C), a significantly lower level of activity is seen in the group with strong GSR (see Fig. 3E and F), and stem activations are completely absent in the group with low reactivity of the GSR and CR (see Fig. 3B and D). When the neutral words were removed from the sample of fMRI signals in the cases when high reactivity was observed in the MRIcP data, more extensive activity was observed in the brain stem in groups 1 and 3, which is consistent with the operation of autonomous regulation systems [8]. Thus, we can conclude that using the MRIcP to select fMRI data for subsequent statistical analysis is effective. The revealed hidden negative correlation between the activity of the brain stem and the hippocampus in the perception of personality-related stimuli with respect to neutral ones shows the promise of using the method of constructing connectomes to visualize
the processes of neural network interactions with each other, which will be used in further work. The experiments confirmed the promise of the joint use of fMRI technology and SUP for studying neurocognitive processes. In the course of the study, a criterion for classifying subjects according to the dynamics of their vegetative reactions was identified; this criterion allows a more focused approach to the study of neurocognitive processes and may contribute to improving the quality of fMRI research for various purposes.

Acknowledgements. This study was partially supported by the National Research Centre
Kurchatov Institute (MRI compatible polygraphy), by RFBR Grant ofi-m 17-29-02518 (the
cognitive-effective structures of the human brain), and by the Russian Foundation for Basic Research,
grant RFBR 18-29-23020 mk (method and approaches for fMRI analyses). The authors are
grateful to the MEPhI Academic Excellence Project for providing computing resources and
facilities to perform experimental data processing.

References
1. Dorokhov, V.B., Malakhov, D.G., Orlov, V.A., Ushakov, V.L.: Experimental model of study
of consciousness at the awakening: fMRI, EEG and behavioral methods. In: BICA 2018,
Proceedings of the Ninth Annual Meeting of the BICA Society. Advances in Intelligent
Systems and Computing, vol. 848, pp. 82–87 (2019)
2. Korosteleva, A., Mishulina, O., Ushakov, V.: Information approach in the problems of data
processing and analysis of cognitive experiments. In: BICA 2018, Proceedings of the Ninth
Annual Meeting of the BICA Society. Advances in Intelligent Systems and Computing, vol.
848, pp. 180–186 (2019)
3. Korosteleva, A., Ushakov, V., Malakhov, D., Velichkovsky, B.: Event-related fMRI analysis
based on the eye tracking and the use of ultrafast sequences. In: BICA for Young Scientists,
Proceedings of the First International Early Research Career Enhancement School on BICA
and Cybersecurity (FIERCES 2017). Advances in Intelligent Systems and Computing, vol.
636, pp. 107–112 (2017)
4. Orlov, V.A., Kartashov, S.I., Ushakov, V.L., Korosteleva, A.N., Roik, A.O., Velichkovsky,
B.M., Ivanitsky, G.A.: "Cognovisor" for the human brain: towards mapping of thought
processes by a combination of fMRI and eye-tracking. In: Advances in Intelligent Systems
and Computing, vol. 449, pp. 151–157. Springer (2016)
5. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.B., Frith, C.D., Frackowiak, R.S.:
Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain
Mapp. 2, 189–210 (1995)
6. The accuracy and utility of polygraph testing (Department of Defense, DC). Polygraph 13, 1–
143 (1984)
7. Orlov, V.A., Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Kovalchuk, M.V., Ushakov,
V.L.: Application of registration of human vegetative reactions in the process of functional
magnetic resonance imaging. In: Advances in Intelligent Systems and Computing (2019), in
Press
8. Sclocco, R., Beissner, F., Bianciardi, M., Polimeni, J.R., Napadow, V.: Challenges and
opportunities for brainstem neuroimaging with ultrahigh field MRI. NeuroImage 168, 412–
426 (2018)
Ordering of Words by the Spoken Word
Recognition Time

Victor Vvedensky1(&), Konstantin Gurtovoy2, Mikhail Sokolov2, and Mikhail Matveev3
1 NRC Kurchatov Institute, Moscow, Russia
[email protected]
2 Children's Technology Park of NRC Kurchatov Institute, Moscow, Russia
3 Moscow State Institute of International Relations, Moscow, Russia

Abstract. We measured the time needed to recognize spoken words in a group of 12 subjects. We see that recognition time varies for different words with the same sound duration, and the words can be ordered from the word perceived most quickly to the "slowest" one. Every subject "generates" his own ordered list of the 24 words used. The individual lists are similar to some extent, so that a robust average list can be compiled. Presumably, it reflects the distribution of the word representations in the cortex, and the time required to retrieve any word depends on its position.

Keywords: Spoken word recognition time  Word ordering  Network science

1 Introduction

Selecting operators for voice control of freely moving devices we encountered the
phenomenon which was not explicitly reported in the spoken word recognition studies
[1, 2]. Despite decades of intensive research the field of spoken word recognition still
remains open for the study of the underlying cognitive and linguistic processes. With
new technologies available it is worth to revisit simple experimental approaches used to
explore the process of human perception of the spoken words. Before setting up
complex study of speech perception by humans, which involves the use of sophisti-
cated equipment such as functional magnetic resonance imaging fMRI, magnetic
encephalography MEG or brain-computer interfaces BCI, one has to select proper
linguistic material for the experiments. This requires a set of compact preliminary tests
which on the one hand can assess the ability of the candidate subjects to perform
smoothly the proposed task and on the other hand can sort the suggested linguistic
material. One has to select these words or their combinations which would allow lucid
interpretation of the experimental data. We believe that the spoken word should be
selected as the basic stimulus, since visual presentation of words implies the study of
the language for literate. The latter presumably involves other brain mechanisms than
the “language for illiterate” or the basic language do. We hope that the cortical pro-
cesses which activate smaller number of different activities will be easier to describe
and may be even to understand. Starting from this background we designed our
experiments described below.

2 Methods

24 Russian nouns were presented in random order, each word three times. The words
were pronounced by the same male speaker. Age range of our 12 listeners (5 women)
was quite broad: 10, 16, 17, 31, 32, 45, 61, 61, 62, 63, 70, 80 years. All subjects gave
informed consent to participate in the experiments. The study was approved by the
local ethics committee for biomedical research of RNC Kurchatov Institute. Each
session lasted about 20 min. The subjects were instructed to press "Enter" on the keyboard at the moment they recognized the word they heard. Before the next trial, they repeated the word. The task is reasonably simple, so practically no errors occurred. This is the list of the words used: эффект, кулак, песок, мост, спорт, глаз, книжка, народ, порог, вагон, жизнь, вход, живот, сапог, мастер, мечта, костюм, осень, группа, село, время, жена, число, трубка (in English: effect, fist, sand, bridge, sport, eye, book, people, door-step, carriage, life, entrance, belly, boot, master, dream, suit, autumn, group, village, time, wife, number, pipe). Sound duration of the words is
nearly equal despite different number of letters (4 to 6) in the selected words.

3 Results

The scatter of recognition times is shown in Fig. 1 for three subjects; the others display the same behavior. The scatter is considerable and at first glance looks noise-like. One should not think that such a large scatter is somehow special for the experiment with words.

Fig. 1. Time when the subjects pressed the key, indicating that they understood the word they hear. 24 words and two repetitions for each were presented in random order. Average reaction time for these subjects is somewhat different. In this plot reaction time is referenced to the sound offset.

Quite the opposite, this phenomenon always complicates measurements of
the reaction time to simple stimuli, especially relevant for pilots and sportsmen.
However, in our case the stimulus is quite complex and different each time. We analyze human reactions to different words separately. Recognition time is referenced to the sound offset point, since the majority of the key presses fall in the post-word period. It turns out that the recognition times for different words of the same sound duration can be ordered, so that each listener generates an ordered list of the 24 perceived words. Two examples are shown in Fig. 2.

Fig. 2. 24 words heard by two listeners (Subject 1 and Subject 12 in Fig. 4) and ordered by their
recognition times. In this plot reaction time is referenced to the sound onset. Time scale is in
milliseconds. Each word was presented three times. Ends of the scatter bars correspond to the
longest and shortest recognition times, while the third time lies in the middle. One can see
similarity of these ordered word lists.

It is difficult to compare performance of different people using reaction time, because it is highly variable. One needs more robust characteristics describing experi-
mental data. The test object for our subjects is the list of words. We see that each
subject perceives the list in its own manner: some words quickly, some words slowly.
We see that these individual lists are similar to some extent. We ascribe a rank to each word in the individual list, transforming it into a vector with 24 components. The
vectors for different subjects can be compared.
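As a sketch of how such a rank-vector comparison can be computed (the array of recognition times here is hypothetical, and the use of the median over repetitions and of Spearman correlation are our illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# hypothetical recognition times in ms: (n_subjects, n_words, n_repetitions)
rng = np.random.default_rng(0)
times = 600 + 150 * rng.random((12, 24, 3))

# one robust value per word and listener: the median over the three repetitions
median_times = np.median(times, axis=2)                    # shape (12, 24)

# rank vector of each listener: 1 = fastest word, 24 = slowest word
ranks = np.apply_along_axis(rankdata, 1, median_times)     # shape (12, 24)

# the average ordered list of the group
mean_ranks = ranks.mean(axis=0)

# similarity of each individual list to the average list
for s, rank_vector in enumerate(ranks, start=1):
    rho, _ = spearmanr(rank_vector, mean_ranks)
    print(f"listener {s}: correlation with the average list = {rho:.2f}")
```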
Figure 3 displays the average word list for the 12 subjects. The plot also indicates the scatter of each word's position in the individual lists. This scatter reflects the individuality of each listener and can be used to assess the ability of the subjects to perform the task. Each one recognizes words in a slightly different way. It turns out that the correlation of the individual rank vector with the average vector specifies each listener in a sufficiently robust way. This correlation is shown in Fig. 4.

Fig. 3. List of 24 Russian words ordered by 12 listeners. The word below is recognized most quickly while the recognition time gradually increases for the words above. Each listener generates a personal ordered list of the words with gradually growing recognition time. The ordered lists are basically similar for the subjects and the error bar represents the standard deviation of the rank for each word.

Linear order emerging in a group of
subjects performing some cognitive task is common – the most obvious example is the
ranking of chess players. Ranking in the same group is not universal, though: it depends on the specific task. In the same group of tennis players the rankings for singles and doubles can differ considerably. It is worth mentioning that words also tend to be ordered into linear lists: the Zipf law is the most spectacular example.
Earlier we observed the same ordering of both nouns and listeners for another group of 24 words: каша, леди, пони, мина, груша, туша, сито, пиво, сети, тема, кома, вилы, бусы, муха, тина, зона, стая, лоси, дура, уши, дама, доля, сажа, лыжи (in English: porridge, lady, pony, mine, pear, carcass, sieve, beer, net, theme, coma, hayfork, beads, fly, ooze, zone, flock, moose, fool, ears, dame, share, soot, ski). These words are presented in the order of decreasing recognition time. In this early experiment another group of listeners was tested.

Fig. 4. Correlation of ranked lists of 24 words, generated by 12 listeners, with the average list.
Trend line demonstrates ranking of the subjects.

4 Discussion

We analyze only a small group of words out of the several thousand used in the language. However, this is a common feature of all linguistic experiments. We intend to develop an approach which, in an evolutionary way, will select proper groups of words for a particular linguistic task. The choice of a proper group of listeners is also quite important, since different people use variable strategies in speech communication; this is how dialects emerge.
Our data show the directions in which we shall proceed. We have to generate new lists of words around the "quick" and "slow" words in the analyzed list; there are plenty of words in the thesaurus. The same list has to be presented to several clearly distinct groups of listeners, which emerge from previous experimentation. In this way we expect to cover a considerable part of the language thesaurus and to find directions where the experimental data will produce crucial information for the understanding of speech perception.
Neuroimaging experimental data on the perception of words indicate broad scatter
of cortical activity, related to individual words, over the considerable part of both
cerebral hemispheres [3]. Locations for different word groups are detected using fMRI
machines, so that “across the cortex, semantic representation is organized along smooth
gradients that seem to be distributed systematically” [4]. It seems likely that we see
these local gradients in our experiments with groups of words. The observed linearity is certainly local (for just a group of words), though we believe that these linear segments can be woven into the complete network of words, possibly similar to a fishnet. We believe that our simple though careful testing of word groups which can be represented in the same cortical area can shed light on the mechanisms people use for language communication. The tests described here can be easily combined with MEG
measurements which have long shown that the word heard evokes neuronal activity in
many places throughout the cortex [5].
The author V.L.V. is supported by the Russian Foundation for Basic Research, grant 18-00-00575 comfi.

References
1. Pisoni, D.B., McLennan, C.T.: Spoken word recognition: historical roots, current theoretical
issues, and some new directions. In: Neurobiology of Language, Chap. 20. Elsevier Inc.,
Amsterdam (2016). https://doi.org/10.1016/B978-0-12-407794-2.00093-6
2. Vitevitch, M.S., Luce, P.A.: Phonological neighborhood effects in spoken word perception
and production. Annu. Rev. Linguist. 2(7), 1–7.20 (2016)
3. Huth, A.G., de Heer, W.A., Griffiths, T.L., Theunissen, F.E., Gallant, J.L.: Natural speech
reveals the semantic maps that tile human cerebral cortex. Nature 532(7600), 453–458 (2016).
PMID: 27121839
4. Huth, A.G., Nishimoto, S., Vu, A.T., Gallant, J.L.: A continuous semantic space describes the
representation of thousands of object and action categories across the human brain. Neuron
76, 1210–1224 (2012). https://doi.org/10.1016/j.neuron.2012.10.01499-110
5. Vvedensky V.L., Korshakov A.V.: Observation of many active regions in the right and left
hemispheres of the human brain which simultaneously and independently respond to word.
In: Proceedings Part 1 XV Russian Conference Neuroinformatics-2013, MEPhI, Moscow,
pp. 43–52 (2013). (in Russian)
Neurobiology and Neurobionics
A Novel Avoidance Test Setup:
Device and Exemplary Tasks

Alexandra I. Bulava1(&), Sergey V. Volkov2,3, and Yuri I. Alexandrov1,4,5
1 Shvyrkov Lab of Neuronal Bases of Mind, Institute of Psychology, Russian Academy of Sciences, Moscow, Russia
[email protected]
2 Lab for Behaviour of Lower Vertebrates, Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, Russia
3 Ocean Acoustics Lab, Shirshov Institute of Oceanology, Russian Academy of Sciences, Moscow, Russia
4 Moscow State University of Psychology and Education, Moscow, Russia
5 Department of Psychology, National Research University Higher School of Economics, Moscow, Russia

Abstract. This paper presents a novel rodent avoidance test. We have devel-
oped a specialized device and procedures that expand the possibilities for
exploration of the processes of learning and memory in a psychophysiological
experiment. The device consists of a current stimulating electrode-platform and
custom software that allows to control and record real-time experimental pro-
tocols as well as reconstructs animal movement paths. The device can be used to
carry out typical footshock-avoidance tests, such as passive, active, modified
active and pedal-press avoidance tasks. It can also be utilized in the studies of
prosocial behavior, including cooperation, competition, emotional contagion and
empathy. This novel footshock-avoidance test procedure allows flexible current-
stimulating settings. In our work, we have used slow-rising current. A test animal
can choose between the current rise and time-out intervals as a signal for action in
footshock avoidable tasks. This represents a choice between escape and avoid-
ance. This method can be used to explore individual differences in decision-
making and choice of avoidance strategies. It has been shown previously that a
behavioral act, for example, pedal-pressing is ensured by motivation-dependent
brain activity (avoidance or approach). We have created an experimental design
based on tasks of instrumental learning: pedal-pressing in an operant box results
in a reward, which is either a piece of food in a feeder (food-acquisition behavior)
or an escape-platform (footshock-avoidance behavior). Data recording and
analysis were performed using custom software, the open source Accord.NET
Framework was used for real-time object detection and tracking.

Keywords: Engineering  Learning  Footshock  Avoidance task 


Appetitive task  Approach/Withdrawal  Behavioral analysis


1 Introduction

Animal models are used by researchers all over the world. Rodent passive/active
avoidance tests are the typical models not only in experimental psychology but also in
clinical psychology, psychiatry and behavioral neuroscience. Recent years have
brought rapid advances in our understanding of the brain processes involved in the
avoidance-learning, along with their clinical implications for anxiety disorders, PTSD
etc. [7, 10]. Avoidance behavior in rodents has predominantly been studied using lever-
press signaled avoidance task, which requires animals to press a tool upon presentation
of a warning signal in order to prevent or escape punishment [10]. The development of
new techniques capable of modeling multidimensional cognitive activity could be a
valuable contribution to psychophysiological studies. The system organization of
human and animal behavior, including the processes of systemogenesis, can be studied
in a variety of situations, such as learning and performing behavioral tasks,
acute/chronic stress, psychotrauma, alcohol intoxication, etc. This paper presents a
novel rodent avoidance test designed to expand the possibilities for exploration of
learning and memory processes.

2 Device

The device we developed consists of a current stimulating electrode-platform and custom software that allows one to control and record the real-time behavioral protocol, which can be used to reconstruct trajectories of the animal's movement. The size and
number of the electrodes provide stable contact with the animal's skin (see Fig. 1e). The
device can be used for the typical footshock-avoidance tests, including passive, active
and modified active (see Fig. 1a–c). This is achieved by combining separate sectors of
electrodes (Patent RU2675174C1, Fig. 1). The device is equipped with partitions and sound/light signals, which make it possible to implement a broad range of behavioral tasks in various situations and conditions, such as learning, helplessness,
stress in the studies of anxiety, stress disorders and memory, etc.
Finally, the device can be used to study prosocial behavior in rodents, including
cooperation, competition, willingness to help a conspecific, emotional contagion and
empathy. For instance, we have used a previously established model of emotional
contagion [4, 6] in which an animal observes a conspecific experience painful elec-
troshocks. This model is illustrated in Fig. 1d.
It is known that the electrical resistance of rodent skin depends on such factors as
age, sex and weight. Indeed, experiments revealed wide differences in the skin resis-
tance of animals [5, 8]. In addition, our study showed that skin resistance in rats
decreases after 5 min of electrostimulation. Therefore, we have applied an electrical circuit of a voltage-controlled current source to compensate for this change in the operation of the device.
A user can apply automatic settings for task-dependent stimulation or control stimulation manually, using both AC (alternating current) and DC (direct current). Slow-rising stimulation can be regulated by a microcontroller. Elimination of impulse noise (artifacts) is provided by the use of alternating current.

Fig. 1. Typical footshock-avoidance tests: (a) passive, (b) active, (c) modified active,
(d) “emotional contagion” - observer (left) and pain-demonstrator (right). (e) Device controller
(left) and a photograph illustrating the stable contact between electrodes (the arrow indicates one
of the electrodes) and animal’s skin.

3 A Novel Avoidance Test Procedure

The novel footshock-avoidance test procedure allows flexible current-stimulating settings with variable trial durations, currents (from 0 to 3 mA) and intervals between trials. In our work we have used a slow-rising current. A typical trial consists of three intervals: (1) current rise; (2) maximum value; (3) time-out (pause between trials). In order to avoid footshock, a test animal learns to press a pedal during either the current rise period or the time-out period. This experimental procedure allows one to explore individual differences in decision-making and in the choice of avoidance strategies, when an animal makes a choice between escape and avoidance.
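A minimal sketch of this trial logic is given below (the numerical values follow the example in Fig. 2 — 0 to 1 mA with a 5 s rise and a 10 s trial; the function names and the labeling of a press by interval are our illustrative assumptions, not part of the device software):

```python
def trial_current(t, rise_end=5.0, max_end=10.0, i_max=1.0):
    """Slow-rising current of one trial, in mA, as a function of time from trial onset."""
    if t < rise_end:
        return i_max * t / rise_end      # interval 1: current rise
    if t < max_end:
        return i_max                     # interval 2: maximum value
    return 0.0                           # interval 3: time-out until the next trial

def classify_press(t_press, rise_end=5.0, max_end=10.0):
    """Label a pedal press by the interval in which it occurred."""
    if t_press < rise_end:
        return "escape (press during the current rise)"
    if t_press < max_end:
        return "escape (press at the maximum current)"
    return "avoidance (press during the time-out)"

# example: a press 3.2 s after trial onset
print(trial_current(3.2), classify_press(3.2))   # 0.64 mA, escape during the rise
```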
Figure 2 illustrates the “learned helplessness” experiment, when unavoidable high-
intensity footshock is applied to an animal.

Fig. 2. An example of a real-time protocol of footshock-avoidance behavior. Footshock is applied in all 4 sectors (A, B, C, D). Three trials are illustrated here. The rat is given a current of 0 to 1 mA, interval settings: current rise from 0 to 5 s, followed by the maximum value from 5 to 10 s, and the current stops after 10 s (bottom, right). Then the next trial begins. The top right corner shows the real-time video recording.

4 Exemplary Tasks of Instrumental Learning


4.1 Approach/Withdrawal Paradigm
The most general division of behavior is considered to be approach and withdrawal.
Studies demonstrated motivation-dependent brain activity (avoidance- or approach-
goal) during behavioral acts, such as pedal pressing [1–3, 9]. A typical model of
approach behavior is a food-acquisition task, while the typical model of withdrawal
behavior is an avoidance task.
We have created an experimental design based on tasks of instrumental learning.
The operant box is equipped with automated feeders, an escape-platform and pedal bars located in the opposite corners of the box. Pedal-pressing results in a reward, which is
either a piece of food in a feeder (food-acquisition behavior, see Fig. 3b), or an escape-
platform (footshock-avoidance behavior, see Fig. 3a). The action of pedal-pressing is
the same in both cases, but its result is variable: escape-platform or feeder.

Fig. 3. (a) Instrumental footshock-avoidance behavior. (b) Instrumental food-acquisition behavior. (c) Movement paths of a representative rat. (d) Exemplary learning curve during appetitive bar-pressing behavior.

4.2 Behavioral Data Recording and Analysis


Data recording and analysis were performed using custom software developed by S.V. Volkov. Figure 4 shows an exemplary real-time protocol for behavioral analysis (provided by the device).

Fig. 4. Exemplary real-time protocol for behavioral analysis (food-acquisition task). The
behavioral cycle: 1 - pedal (bar) pressing; 2 - start of the feeder motor; 3 - lowering rat head and
taking food from the feeder. Frame from the actual video recording during operant food-
acquisition behavior (right). The object is identified (rectangle), coordinates are recorded in PC.

The food-acquisition behavioral cycle was divided into several acts (Fig. 4 left):
pedal (bar) pressing (mechanosensor); moving to pedal corner; lowering head (pho-
tosensor) and taking food from the feeder. The moving object is identified (Fig. 4 right,
rectangle) by custom software using the open source Accord.NET Framework [11].
The signal coordinates are recorded on a PC, and the animals' movement paths are reconstructed from these coordinates (see Fig. 3c).
The Accord.NET Framework is a .NET machine learning framework combined with audio and image processing libraries, completely written in C#. It provides real-time object detection and tracking, as well as general methods for detection and tracking, and it is a convenient open-source solution.
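The authors' tracker is written in C# on top of Accord.NET; the sketch below is not that code but a generic NumPy illustration of the same idea: detect the moving animal by frame differencing and log its centroid for later path reconstruction (the threshold value and array layout are arbitrary assumptions).

```python
import numpy as np

def track_centroids(frames, threshold=30):
    """Return the (x, y) centroid of the moving object in each grayscale frame.

    frames: array of shape (n_frames, height, width) with integer intensities.
    A pixel is treated as 'moving' when it differs from the previous frame
    by more than `threshold`; the centroid of the moving pixels approximates
    the animal's position.
    """
    path = []
    prev = frames[0].astype(np.int16)          # int16 avoids uint8 wrap-around
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        moving = np.abs(cur - prev) > threshold
        ys, xs = np.nonzero(moving)
        if xs.size:                            # object detected in this frame
            path.append((xs.mean(), ys.mean()))
        prev = cur
    return np.array(path)                      # movement path, one point per frame

# usage: path = track_centroids(video); then plot path[:, 0] against path[:, 1]
```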

5 Conclusion

We have compiled and debugged a novel rodent avoidance task procedure that allows one to obtain a new type of data about individual differences in decision-making and in the choice of avoidance strategies. For example, experiments in the active non-instrumental avoidance test (see Fig. 1b) showed that female rats choose to minimize the risks and avoid shock during the low-intensity current (a signal for avoidance), while male rats do it during the pause (between trials), which allows them to avoid the shock completely but with a risk of a high-intensity shock on rare occasions.
We have created an experimental design based on tasks of instrumental learning
that allows one to explore motivation-dependent brain activity (avoidance or approach).
The novel rodent avoidance test that we developed expands the possibilities for
exploration of learning and memory processes.

Acknowledgments. This research was performed in the framework of the state assignment of
Ministry of Science and Higher Education of Russia (No. 0159-2019-0001 by Institute of Psy-
chology RAS - learning procedures; No. 0149-2019-0011 by Shirshov Institute of Oceanol-
ogy RAS - designed device).

References
1. Alexandrov, Y.I., Sams, M.: Emotion and consciousness: ends of a continuity. Cogn. Brain
Res. 25, 387–405 (2005)
2. Bulava, A.I., Grinchenko, Y.V.: Patterns of hippocampal activity during appetitive and
aversive learning. Biomed. Radioelectron. 2, 5–8 (2017)
3. Bulava, A.I., Svarnik, O.E., Alexandrov, Y.I.: Reconsolidation of the previous memory:
decreased cortical activity during acquisition of an active avoidance task as compared to an
instrumental operant food-acquisition task. In: 10th FENS Forum of Neuroscience, Abstracts
P044609, p. 3493 (2016)
4. Carrillo, M., Han, Y., Migliorati, F., Liu, M., Gazzola, V., Keysers, C.: Emotional mirror
neurons in the rat’s anterior cingulate cortex. Curr. Biol. 29(8), 1301–1312 (2019)
5. Cheng, N., Van Hoof, H., Bockx, E., Hoogmartens, M.J., et al.: The effects of electric
currents on ATP generation, protein synthesis, and membrane transport in rat skin. Clin.
Orthop. 171, 264–272 (1982)
6. Keum, S., Shin, H.-S.: Rodent models for studying empathy. Neurobiol. Learn. Mem. 135,
22–26 (2016)
7. Krypotos, A.-M., Effting, M., Kindt, M., Beckers, T.: Avoidance learning: a review of
theoretical models and recent developments. Front. Behav. Neurosci. 9, 189 (2015)
8. Muenzinger, K.F., Mize, R.H.: The sensitivity of the white rat to electric shock: threshold
and skin resistance. J. Comp. Psychol. 15(1), 139–148 (1933)
9. Shvyrkova, N.A., Shvyrkov, V.B.: Visual cortical unit activity during feeding and avoidance
behavior. Neurophysiology 7, 82–83 (1975)
10. Urcelay, G.P., Prevel, A.: Extinction of instrumental avoidance. Curr. Opin. Behav. Sci. 26,
165–171 (2019)
11. Accord.NET Framework. http://accord-framework.net/index.html. Accessed 14 May 2019
Direction Selectivity Model Based
on Lagged and Nonlagged Neurons

Anton V. Chizhov1,2(B), Elena G. Yakimova3, and Elena Y. Smirnova1,2
1 Ioffe Institute, Politekhnicheskaya str., 26, 194021 St.-Petersburg, Russia
[email protected]
2 Sechenov Institute of Evolutionary Physiology and Biochemistry of RAS, Torez pr., 44, 194223 St.-Petersburg, Russia
3 Pavlov Institute of Physiology, Makarova emb. 6, 199034 St.-Petersburg, Russia

Abstract. Direction selectivity (DS) of visual cortex neurons is modelled with a filter-based description of retino-thalamic pathway and a conductance-based population model of the cortex as a 2-d continuum. The DS mechanism is based on a pinwheel-dependent asymmetry of projections from lagged and non-lagged thalamic neurons to the cortex. The model realistically reproduces responses to drifting gratings. The model reveals the role of the cortex in sharpening DS, keeping interneurons non-selective.

Keywords: Visual cortex · Direction selectivity · Lagged and non-lagged cells · Conductance-based refractory density model

1 Introduction
Primary visual cortex neurons are selective to various characteristics of the stim-
ulus: orientation, direction of motion, color, etc. [1]. Most of the DS models
include a time delay between the spatially separated inputs into a cortical cell
[2]. The physiological mechanism of this delay formation has been revealed in [3]
and further, in more details, in [4], where with the help of intracellular in vivo
registrations it was demonstrated that the lateral geniculate nucleus (LGN) neu-
rons fall into two classes: lagged and non-lagged cells; and a delay of the lagged
neurons is determined by the effects of the inhibitory-excitatory synaptic com-
plexes formed on synaptic axonal terminals of retinal ganglion cells in LGN. In
[5], a complex schematic model of DS was proposed, based on specific convergent projections of the signals from lagged and non-lagged LGN cells, as well as on the intracortical interactions. Later, a reduced rate model of a hypercolumn was proposed that exploits lagged and non-lagged LGN cells and feedforward inhibition [6]. However, no detailed and comprehensive model has been reported yet. In our biophysically detailed model of V1 we use a conductance-
based refractory density (CBRD) approach [7], which allows us to benefit from
the advantages of population models and keep the precision of biophysically
detailed models.

2 Methods
Lag-Nonlag Mechanism of Direction Selectivity. The LGN neurons differ
in their delayed reaction to visual stimuli and split into two populations of lagged
and non-lagged cells (Fig. 1). These populations are equally and homogeneously
distributed across LGN (Fig. 1, middle). The lagged/non-lagged cells have round,
center-surround receptive fields (RF) (Fig. 1, left). We consider only so-called
on-cells; they respond strongly to a bright stimulus in the center of the RF and are inhibited in the surround of the RF. The center-surround structure is described by an axisymmetric difference of Gaussians (DOG), as in [8], with the RF's temporal component set as a double-exponential function. The firing rate of an LGN neuron at any given time is expressed as a convolution of the RF with the stimulus, rectified at zero. The model of LGN cells is described in detail in [9]. Lagged cell activity is delayed by 40 ms, according to estimations from [4].
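A hedged sketch of this filter stage is given below: a DOG spatial field, a double-exponential temporal kernel, convolution with the stimulus, rectification at zero, and a 40 ms shift for the lagged population. All numerical parameters are placeholders, not values taken from the paper.

```python
import numpy as np

def dog_rf(size=21, sigma_c=1.0, sigma_s=3.0, k_s=0.5):
    """Axisymmetric difference of Gaussians (centre minus weighted surround)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    centre = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return centre - k_s * surround

def temporal_kernel(t, tau1=0.02, tau2=0.06):
    """Double-exponential temporal component of the receptive field."""
    return np.exp(-t / tau1) - np.exp(-t / tau2)

def lgn_rate(stimulus, dt=0.001, lagged=False, lag=0.040):
    """Firing rate of an on-centre LGN cell: space-time RF applied to the stimulus.

    stimulus: array (n_t, n, n) — a square patch assumed to match the RF size.
    """
    rf = dog_rf(size=stimulus.shape[1])
    # spatial stage: project every frame onto the receptive field
    drive = np.tensordot(stimulus, rf, axes=([1, 2], [0, 1]))
    # temporal stage: convolve the spatial drive with the temporal kernel
    t = np.arange(0.0, 0.3, dt)
    drive = np.convolve(drive, temporal_kernel(t), mode="full")[: stimulus.shape[0]]
    rate = np.maximum(drive, 0.0)            # rectification at zero
    shift = int(round(lag / dt)) if lagged else 0
    if shift:                                # lagged cells respond ~40 ms later
        rate = np.concatenate([np.zeros(shift), rate[:-shift]])
    return rate
```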

Fig. 1. Schematic representation of the proposed model reproducing direction selectivity in V1.

V1 consists of orientation hypercolumns (Fig. 1, right). V1 neurons receive inputs from LGN cells that are located within the elongated footprint (Fig. 1,
middle). The elongation determines the orientation preference (horizontal). Neu-
rons near the border of two neighboring hypercolumns prefer the same orienta-
tion but opposite directions and have similar footprints. A preferred direction
depends on the asymmetry of connections with lagged and non-lagged cells, i.e.
the footprint (Fig. 1, middle), which is split into two halves along the axis of elon-
gation. V1 neurons preferring a certain direction (upward) receive non-lagged
input from one (top) side of the footprint and lagged input from the other (bot-
tom) side, and vice versa for V1 neurons preferring the opposite (downward)
direction. The mathematical description of the LGN-to-V1 projection is expressed in terms of firing rates and convolutions.
A kernel expression that determines the direction selectivity bias is the thalamic input into a V1 neuron, ϕ_Th,E(x, y, t), which is given below. The pinwheels with clockwise progression of orientation columns are adjacent to the ones with counterclockwise progression. The pinwheel centers are distributed on a rectangular grid with the pinwheel radius R and indexed by i_PW and j_PW. The adjacent columns owing to different pinwheels have the same orientation preferences. The coordinates of a pinwheel center are x_PW = (2i_PW − 1)R, y_PW = (2j_PW − 1)R. The orientation angle for a point (x, y) of V1 which belongs to the pinwheel (i_PW, j_PW) is defined as θ(x, y) = arctan((y − y_PW)/(x − x_PW)). The progression is determined by the factor (−1)^(i_PW + j_PW). Finally, the input firing rate is
$$\varphi_{Th,E}(x, y, t) = \iint d\tilde{x}\, d\tilde{y}\; D_{LGN-V1}(x, y, \tilde{x}, \tilde{y})\, L_{LGN}\!\left(\tilde{x}, \tilde{y}, t - \delta(x, y, \tilde{x}, \tilde{y})\right),$$

where

$$D_{LGN-V1}(x, y, \tilde{x}, \tilde{y}) = \frac{1}{\pi \sigma_{pref} \sigma_{orth}} \exp\!\left(-\frac{x'^2}{\sigma_{pref}^2} - \frac{y'^2}{\sigma_{orth}^2}\right),$$
$$x' = (\tilde{x} - x_{cf}) \cos\theta - (\tilde{y} - y_{cf}) \sin\theta,$$
$$y' = (\tilde{x} - x_{cf}) \sin\theta + (\tilde{y} - y_{cf}) \cos\theta,$$
$$\delta(x, y, \tilde{x}, \tilde{y}) = \begin{cases} 40\ \text{ms}, & \text{if } (-1)^{i_{PW}+j_{PW}}\, x' > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Here D_LGN−V1(x, y, x̃, ỹ) is the LGN-to-V1 footprint with the width across preferred orientation σ_pref and the width across orthogonal orientation σ_orth; δ(x, y, x̃, ỹ) is the delay that determines contributions of either lagged or non-lagged cells.
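A minimal sketch of this kernel under the notation above (pinwheel radius R, widths σ_pref and σ_orth); the grid-indexing convention, the retinotopic choice of the footprint centre (x_cf, y_cf) = (x, y) and the parameter values are our assumptions:

```python
import numpy as np

def footprint_weight_and_delay(x, y, xt, yt, R=0.5,
                               sigma_pref=0.3, sigma_orth=0.1, lag=0.040):
    """Weight D and delay delta between a V1 point (x, y) and an LGN point (xt, yt).

    theta is taken from the position of (x, y) inside its pinwheel, the footprint
    is an elongated Gaussian in coordinates rotated by theta, and LGN cells on one
    side of the elongation axis are read through the lagged (40 ms) population,
    with the side alternating between clockwise and counterclockwise pinwheels.
    """
    # indices and centre of the pinwheel containing (x, y)
    i_pw = int(np.floor(x / (2 * R))) + 1
    j_pw = int(np.floor(y / (2 * R))) + 1
    x_pw, y_pw = (2 * i_pw - 1) * R, (2 * j_pw - 1) * R
    theta = np.arctan2(y - y_pw, x - x_pw)       # orientation preference

    # footprint coordinates rotated to the preferred / orthogonal axes
    xp = (xt - x) * np.cos(theta) - (yt - y) * np.sin(theta)
    yp = (xt - x) * np.sin(theta) + (yt - y) * np.cos(theta)
    weight = np.exp(-xp ** 2 / sigma_pref ** 2 - yp ** 2 / sigma_orth ** 2) / (
        np.pi * sigma_pref * sigma_orth)

    # lagged half of the footprint
    delay = lag if ((-1) ** (i_pw + j_pw)) * xp > 0 else 0.0
    return weight, delay
```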
Biophysically Detailed Mathematical Model of V1. V1 is modeled as a
continuum in 2-d cortical space. Each point contains 2 populations of neurons,
excitatory (E) and inhibitory (I), connected by AMPA, NMDA and GABA-
A-mediated synapses for recurrent interactions and only AMPA and NMDA
for LGN input. The strengths of the external connections correspond to the
pinwheel architecture, thus neurons receive inputs according to their orientation
and direction preferences. The strengths of the intracortical connections, i.e.
maximum conductances, are isotropic and distributed according to locations of
pre- and postsynaptic populations. The modeled area of the cortex was as large
as 1 mm × 1.5 mm and included 6 orientation hypercolumns.
The mathematical description of each population is based on the CBRD app-
roach [10,11], where neurons within each population are distributed according
to their phase variable, the time elapsed since their last spikes, t∗. Single-population dynamics is governed by the equations for the neuronal density, the voltage averaged over noise realizations, and the gating variables. The CBRD for interacting
adaptive regular spiking pyramidal cells and fast spiking interneurons is given
in [7,12]. The model of an E-neuron takes into account two compartments and
a set of voltage-gated ionic currents, including the adaptation currents.

3 Results
We have tested the mechanism of DS by comparison of spatio-temporal activity
patterns (Fig. 2) in response to horizontal gratings moving up (a) and down (b)
with temporal frequency 8 Hz and spatial frequency 0.25 cycle/grad. The bright
spots correspond to high activity. They appear in columns that prefer orientation
similar to that of the stimulus. The patterns are not symmetrical in respect to
the central vertical axis, which is due to DS, i.e. different direction preferences
for neurons of the left and right columns with the same orientation preferences,
as clear from the averaged over first 1600 ms activity of E-neurons (Fig. 2c).
The peaks of the E-cell activity locate in different hypercolumns, depending
on the direction of the grating movement. The plots for the excitatory firing
rate (Fig. 2c) are comparable to the optical imaging data, for example, the ones
obtained in cat visual cortex [13] (see their Figs. 4A-B).
For the location marked in Fig. 2c, the LGN input, mean voltage, synaptic
conductances, firing rate, voltage-sensitive dye (VSD) signal and voltage of rep-
resentative neurons are shown in Fig. 2d,e. These simulated signals are similar
to experimental recordings, for instance, those from [14] (their Fig. 5). The fir-
ing rates of E and I populations correlate in time. The amplitude of firing rate
oscillations strongly depends on the direction of gratings movement (compare
panels d and e). The voltage-sensitive dye (VSD) signal (bottom trace) was cal-
culated as a sum of three quarters of the E mean voltage and one quarter of the
I mean voltage. It is comparable to the experimentally recorded VSD-signals,
for instance, from [15].
The input signals for the neurons of the populations are the synaptic con-
ductances (Fig. 2d,e). Modulations in time of the excitatory and inhibitory com-
ponents are in-phase. When comparing with experiments, it should be noted that we present separate AMPA, NMDA and GABA conductances, whereas the known experimental studies reported anti-phase estimates of the summed (AMPA+NMDA) and inhibitory conductances [14,16–18]; these should not be directly compared with our results because of a recently revealed underestimation inherent in the experimental method [19]. That is why our observation of in-phase modulations of the AMPA and GABA conductances should not be considered untrue when compared with the experimental estimates of anti-phase excitatory and inhibitory conductances.
The CBRD model enables one to reconstruct the behavior of a representative neuron if the input variables of a population are known. As seen from the voltage traces, such a representative E-neuron generates spikes when the direction of grating movement is the preferred one. When the direction is opposite, only sub-threshold depolarization is observed. As to an I-neuron, it shows weaker direction specificity. The voltage traces recorded in response to moving gratings are consistent with the ones presented in electrophysiological works in vivo, such as [14,18], if one compares the shape and the amplitude of the voltage oscillations. The mean
voltage shown in Fig. 2d,e is the mean across noise realizations and across input
weights. Membrane potentials of individual neurons generally differ from this


Fig. 2. Activity of V1 domain in response to the moving gratings. (a,b) Distribution of


the E-population firing-rate (bottom) across the modeled area of V1 at different time
moments in response to the gratings (top) moving up (a) and down (b). The modeled
area of V1 includes 6 orientation hypercolumns with the centers marked by small white
dots. The white circle is the location of the representative population. (c) The firing
rate of E-populations, averaged over 1600 ms. (d,e) Activity characteristics in one
point of the modeled V1 area (big white point in b) in response to the grating stimuli
moving up (d) and down (e): the LGN input to the E-population; the mean voltage
of E (solid line) and I (dashed) populations; the AMPA (solid), NMDA (long-dashed)
and GABA (dashed) recurrent synaptic conductances; the firing rate of E (solid) and
I (dashed) populations; the voltage of representative E (solid) and I (dashed) neurons;
voltage-sensitive dye (VSD) signal.

mean voltage due to an individual input weight obeying the lognormal distribu-
tion, different noise realizations and different refractory state t∗ , as seen from
the example for the representative neuron.

4 Discussion
In our model, the average activity patterns (Fig. 2c) are comparable with the
optical imaging data [13]. The scales and contrast of the modeled and experi-
mental spots of activity are similar. Also, the displacement of the spots after the
change of the stimulus direction is similar.
We have found that E-neurons are directionally selective, whereas I-neurons
are not, for two reasons: I-neurons do not receive direct LGN input, and
the characteristic length of E-cell connections to I-cells is 5 times bigger than
that of E-to-E connections.
The voltage traces registered in [14,17,18] have the same degree of DS as
our model. The Lag-Nonlag mechanism is principally similar to that based on
transient and sustained cells [20]. Alternatively, recently reported experimental
data obtained with the help of optogenetics [21] and multielectrode electrophysi-
ological recordings [22] suggest that DS in V1 is determined by a displacement of
on- and off- subzones of the receptive fields of V1 neurons. Here we did not take
into account the off-signals; instead, we considered only on-center off-surround
neurons in LGN and their pure excitatory projections to V1. Introduction of
feedforward inhibition and/or off-center on-surround LGN neurons and on-off
separation at the level of V1 is expected to produce stronger DS. This issue is
to be considered in our future study.
In conclusion, the proposed model is quite realistic in its construction and behavior. Simulations confirm that the suggested mechanism is consistent with the known experimental constraints.

Acknowledgment. The reported study was supported by the Russian Foundation for Basic Research (RFBR) research project 19-015-00183.

References
1. Hubel, D.H., Wiesel, T.N.: Receptive fields of single neurones in the cat’s striate
cortex. J. Physiol. 148, 574–591 (1959)
2. Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of
motion. J. Opt. Soc. Am. A. 2, 284–299 (1985)
3. Cai, D., DeAngelis, G.C., Freeman, R.D.: Spatiotemporal receptive field organiza-
tion in the lateral geniculate nucleus of cats and kittens. J. Neurophysiol. 78(2),
1045–1061 (1997)
4. Vigeland, L.E., Contreras, D., Palmer, L.A.: Synaptic mechanisms of temporal
diversity in the lateral geniculate nucleus of the thalamus. J. Neurosci. 33(5),
1887–1896 (2013)
5. Saul, A.B., Humphrey, A.L.: Evidence of input from lagged cells in the lateral
geniculate nucleus to simple cells in cortical area 17 of the cat. J. Neurophysiol.
68(4), 1190–1208 (1992)
6. Ursino, M., La Cara, G.E., Ritrovato, M.: Direction selectivity of simple cells in
the primary visual cortex: comparison of two alternative mathematical models. I:
response to drifting gratings. Comput. Biol. Med. 37(3), 398–414 (2007)

7. Chizhov, A.V.: Conductance-based refractory density model of primary visual cor-


tex. J. Comput. Neurosci. 36, 297–319 (2014)
8. Dayan, P., Abbott, L.F.: Theoretical Neuroscience: Computational and Mathemat-
ical Modeling of Neural Systems. The MIT Press, Cambridge (2001)
9. Yakimova, E.G., Chizhov, A.V.: Experimental and modeling studies of orienta-
tional sensitivity of neurons in the lateral geniculate nucleus. Neurosci. Behav.
Physiol. 45(4), 465–475 (2015)
10. Chizhov, A.V., Graham, L.J., Turbin, A.A.: Simulation of neural population
dynamics with a refractory density approach and a conductance-based threshold
neuron model. Neurocomputing 70(1), 252–262 (2006)
11. Chizhov, A.V., Graham, L.J.: Population model of hippocampal pyramidal neu-
rons, linking a refractory density approach to conductance-based neurons. Phys.
Rev. E 75, 011924 (2007)
12. Chizhov, A., Amakhin, D., Zaitsev, A.: Computational model of interictal dis-
charges triggered by interneurons. PLoS ONE 12(10), e0185752 (2017)
13. Shmuel, A., Grinvald, A.: Functional organization for direction of motion and its
relationship to orientation maps in cat area 18. J. Neurosci. 16, 6945–6964 (1996)
14. Monier, C., Fournier, J., Fregnac, Y.: In vitro and in vivo measures of evoked
excitatory and inhibitory conductance dynamics in sensory cortices. J. Neurosci.
Methods 169, 323–365 (2008)
15. Grinvald, A., Lieke, E.E., Frostig, R.D., Hildesheim, R.: Cortical point-spread function and
long-range lateral interactions revealed by real-time optical imaging of macaque monkey
primary visual cortex. J. Neurosci. 14(5), 2545–2568 (1994)
16. Anderson, J.S., Carandini, M., Ferster, D.: Orientation tuning of input conduc-
tance, excitation, and inhibition in cat primary visual cortex. J. Neurophysiol.
84(2), 909–926 (2000)
17. Priebe, N.J., Ferster, D.: Direction selectivity of excitation and inhibition in simple
cells of the cat primary visual cortex. Neuron 45(1), 133–145 (2005)
18. Baudot, P., Levy, M., Marre, O., Monier, C., Pananceau, M., Fregnac, Y.: Anima-
tion of natural scene by virtual eye-movements evokes high precision and low noise
in V1 neurons. Front. Neural Circ. 7, 206 (2013)
19. Chizhov, A.V., Amakhin, D.V.: Method of experimental synaptic conductance esti-
mation: limitations of the basic approach and extension to voltage-dependent con-
ductances. Neurocomputing 275, 2414–2425 (2017)
20. Lien, A.D., Scanziani, M.: Cortical direction selectivity emerges at convergence of
thalamic synapses. Nature 558, 80–86 (2018)
21. Adesnik, H., Bruns, W., Taniguchi, H., Huang, J., Scanziani, M.: A neural circuit
for spatial summation in visual cortex. Nature 490, 226–231 (2012)
22. Kremkow, J., Jin, J., Wang, Y., Alonso, J.: Principles underlying sensory map
topography in primary visual cortex. Nature 533(7601), 52–57 (2016)
Wavelet and Recurrence Analysis of EEG
Patterns of Subjects with Panic Attacks

Olga E. Dick(&)

Pavlov Institute of Physiology of RAS, St. Petersburg, Russia


[email protected]

Abstract. The task of analyzing the reactive patterns of the electroencephalogram (EEG) in individuals with panic attacks before and after non-drug therapy associated with the activation of artificial stable functional connections of the human brain is considered. The quantitative measures of the photic driving reaction at the suggested frequency are estimated by the increase in the energy of the wavelet spectrum during the photostimulation and by the parameters of the joint recurrence plot of the light stimulus and the EEG pattern.

Keywords: EEG  Panic attacks  Wavelet analysis  Joint recurrence plot

1 Introduction

Panic attacks include a complex of symptoms characterized by paroxysmal fear [1, 2].
The importance of the problem of treating this disorder is due to the lack of effec-
tiveness of drug therapy. That is why there is still a need to find safe non-drug
therapies. One of these methods is the activation of artificial stable functional con-
nections (ASFC) of the human brain. The ASFC method is based on the intracerebral
phenomenon of long-term memory, which is a special kind of functional connections of
the brain that are formed under conditions of activation of subcortical structures and
impulse stimulation, and associated with the regulatory systems of the brain [3–5].
The aim of the work is to show the ability to identify quantitative indicators of the
improvement of the functional state of the brain of patients with panic attacks after
ASFC trials.

2 Materials and Methods

Artifact-free EEG patterns were analyzed in 10 patients aged from 26 to 45 years with a
disease duration of an average of 10 years and a diagnosis of panic disorder. The course
of correction was performed at the clinic of the Institute of the Human Brain of the
Russian Academy of Sciences and consisted of 10 trials of the formation of ASFC.
Each trial included 6 series of photostimulation with a frequency of 20 Hz and duration
of 10 s on the background of the medication of ethimizol, the intervals between the
stimuli were 60 s. The photostimulation was carried out using the functional brain
activity simulator “Mirage” (St. Petersburg). This device has proven itself in the


programs of non-drug correction in earlier studies [3–5]. Before and after these trials,
the brain bioelectrical activity was recorded on a 21-channel electroencephalograph
with a sampling rate of 256 Hz. The study was approved by the local Ethics Com-
mittee. Written informed consent was obtained from all the subjects. The stimulation
lasted 10 s for each frequency, with a resting interval between frequencies of 30 s.
Since the signals reproducing the light rhythm have maximal amplitude in the occipital lobes, the patterns at the O1, Oz and O2 sites were estimated.
The photic driving reaction in EEG patterns was estimated by the continuous
wavelet transform method [6] and the method of the joint recurrence analysis [7].
In the first method the complex Morlet wavelet was used as the basic wavelet:

$$\psi_0(t) = \pi^{-1/4} \exp(-0.5\, t^2)\, \exp(i \omega_0 t),$$

where the value ω₀ = 2π gives the simple relation between the scale a of the wavelet transform and the real frequency f of the analyzed signal [6]:

$$f = \frac{\omega_0 + \sqrt{2 + \omega_0^2}}{4\pi a} \approx \frac{1}{a}.$$

Due to the relation between a and f, the continuous wavelet transform of the signal
x(t) is determined by the function:

$$W(f, t_0) = \pi^{-1/4} \sqrt{f} \int_{-\infty}^{+\infty} x(t)\, \exp\!\left(-0.5 (t - t_0)^2 f^2\right) \exp\!\left(-i 2\pi (t - t_0) f\right) dt,$$

where t0 gives the shift of the wavelet function along the time axis.
The value |W(f, t0)|² determines the instantaneous distribution of the energy over the frequencies f, and the integral

$$E(f) = \int_{t_1}^{t_2} |W(f, t_0)|^2\, dt_0$$

describes the global wavelet spectrum, i.e., the integral distribution of the wavelet spectrum energy over frequencies on the time interval [t1, t2].
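A straightforward (unoptimized) discretization of these two formulas might look as follows; the frequency grid and the function names are our choices, not part of the original analysis code:

```python
import numpy as np

def morlet_cwt(x, fs, freqs):
    """W(f, t0) for the Morlet wavelet defined above (omega_0 = 2*pi, f ~ 1/a)."""
    t = np.arange(len(x)) / fs
    W = np.empty((len(freqs), len(t)), dtype=complex)
    for k, f in enumerate(freqs):
        for i, t0 in enumerate(t):
            win = np.exp(-0.5 * (t - t0) ** 2 * f ** 2)        # Gaussian envelope
            osc = np.exp(-1j * 2 * np.pi * (t - t0) * f)       # complex oscillation
            W[k, i] = np.pi ** -0.25 * np.sqrt(f) * np.sum(x * win * osc) / fs
    return W

def global_spectrum(W, fs, t1, t2):
    """E(f): the integral of |W(f, t0)|**2 over the time window [t1, t2] (seconds)."""
    i1, i2 = int(round(t1 * fs)), int(round(t2 * fs))
    return np.sum(np.abs(W[:, i1:i2]) ** 2, axis=1) / fs
```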
The light time series was approximated by a sequence of k Gaussian impulses following each other with the frequency fC:

$$p(t) = \sum_{j=0}^{k-1} \frac{0.5}{\sigma_0 \sqrt{\pi}} \exp\!\left(-\frac{(t - t_j)^2}{4\sigma_0^2}\right),$$

where σ₀ = 10 ms is the width of the impulse and tj are the centers of the impulses: tj = tA + j/fC, j = 0, ..., k − 1; tA is the time of the beginning of the first impulse in the sequence [8].

The wavelet transform of the light series p(t) was found in the form [9]:

$$W(f, t_0) = \pi^{-1/4}\, \frac{\sqrt{f}}{\sqrt{g}} \sum_{j=0}^{k-1} \exp\!\left(-\frac{f^2 (t_j - t_0)^2/2 + (2\pi\sigma_0 f)^2}{g} + i\, \frac{2\pi f (t_j - t_0)}{g}\right),$$

where g = 1 + 2(σ₀ f)².
The presence of the photic driving reaction was estimated by the value of the coefficient of photic driving (kR) in the narrow range [fC − Δf, fC + Δf] around each applied stimulation frequency fC, where Δf = 0.5 Hz [9].
The coefficient of photic driving (kR) was determined as the ratio of the maxima of the global wavelet spectra during the photic stimulation and before it. The value kR < 1 means that the energy of the global wavelet spectrum during the light stimulation is less than the energy of the spectrum before stimulation, i.e., the absence of the photic driving reaction at the given frequency.
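With E(f) computed before and during stimulation, the coefficient reduces to a ratio of band-limited maxima; the sketch below continues the previous one, and the variable names are ours:

```python
import numpy as np

def photic_driving_coefficient(E_during, E_before, freqs, f_c, df=0.5):
    """kR: ratio of the maxima of the global wavelet spectra in [f_c - df, f_c + df]."""
    band = (freqs >= f_c - df) & (freqs <= f_c + df)
    return E_during[band].max() / E_before[band].max()

# usage with the previous sketch, for 20 Hz photostimulation:
# freqs = np.arange(18.0, 22.05, 0.1)
# W = morlet_cwt(eeg, fs=256, freqs=freqs)
# kR = photic_driving_coefficient(global_spectrum(W, 256, t_stim_start, t_stim_end),
#                                 global_spectrum(W, 256, t_rest_start, t_rest_end),
#                                 freqs, f_c=20.0)
```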
The second method of the analysis of the photic driving reaction in the EEG
patterns is connected with the construction of joint recurrence plots of the EEG and the
light series.
A joint recurrence plot is a graphical representation of the matrix

$$R_{i,j}(\varepsilon) = \begin{cases} 1, & y_i \approx y_j \ \text{and}\ z_i \approx z_j, \\ 0, & \text{otherwise}, \end{cases}$$

in which the values 1 and 0 correspond to black and white points; a black point means a recurrence and a white point corresponds to a non-recurrence [7]. A joint recurrence, within an accuracy ε, is determined as the repetition of the state yj of the EEG phase trajectory to the state yi together with the simultaneous repetition of the state zj of the light signal phase trajectory to the state zi [7].
The phase trajectories of states z(t) and y(t) were obtained from the initial time
series {x(t)} and {p(t)} by using the delay coordinate embedding method [10]:

$$y(t) = \big(x(t),\, x(t + d),\, \ldots,\, x(t + (m-1)d)\big),$$

where d is the delay time and m is the embedding dimension, i.e., the minimal dimension of the space in which the reconstructed trajectory reproduces the properties of the initial trajectory. The optimal time delay d was chosen on the basis of the first minimum of the mutual information function [11]. The optimal embedding dimension m was found by the false nearest neighbors method [12]. Extraction of the signal in the narrow band of frequencies around the photostimulation frequency allowed us to find the value of the optimal embedding dimension m < 5. The value of ε was equal to 1% of the standard deviation of the analyzed signal.
Using the recurrence analysis we determined the quantitative measures of joint
recurrence plots such as

(1) the mean length of diagonal lines, L, in the joint recurrence plot;
(2) the recurrence time, τ, i.e., the time necessary for the signal value to return into the ε-neighborhood of a previous point, estimated as the vertical distance between the onset and the end of a subsequent recurrence structure in the recurrence plot;
(3) the recurrence rate, RR:

$$RR = \frac{1}{N^2} \sum_{i,j}^{N} R_{i,j}(\varepsilon);$$

(4) the measure of determinism of the signal, DET, defined as the ratio of the recurrence points that form diagonal structures of at least length l_min to all recurrence points:

$$DET = \frac{\sum_{l = l_{min}}^{N} l\, P(\varepsilon, l)}{\sum_{i,j}^{N} R_{i,j}(\varepsilon)},$$

where P(ε, l) = {l_i; i = 1, ..., N_l} is the frequency distribution of the diagonal lines of length l in the recurrence plot and N_l is the number of all the diagonal lines.
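A compact sketch of these measures for two embedded trajectories; the Euclidean norm, the per-signal thresholds and l_min = 2 are our illustrative choices, and both input series are assumed to have the same length:

```python
import numpy as np

def embed(x, m, d):
    """Delay embedding: rows are (x(t), x(t + d), ..., x(t + (m - 1)d))."""
    n = len(x) - (m - 1) * d
    return np.column_stack([x[i * d: i * d + n] for i in range(m)])

def joint_recurrence(y, z, eps_y, eps_z):
    """Joint recurrence matrix: 1 where both trajectories recur within their eps."""
    ry = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2) < eps_y
    rz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2) < eps_z
    return (ry & rz).astype(int)

def rr_and_det(R, l_min=2):
    """Recurrence rate RR and determinism DET of a (joint) recurrence matrix."""
    n = R.shape[0]
    rr = R.sum() / n ** 2
    diag_points = 0                       # sum of l * P(l) over lines with l >= l_min
    for k in range(-(n - 1), n):
        line = 0
        for v in np.diagonal(R, offset=k):
            if v:
                line += 1
            else:
                if line >= l_min:
                    diag_points += line
                line = 0
        if line >= l_min:
            diag_points += line
    det = diag_points / R.sum() if R.sum() else 0.0
    return rr, det

# usage: R = joint_recurrence(embed(eeg, 3, 3), embed(light, 3, 3), eps1, eps2)
#        RR, DET = rr_and_det(R)
```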
To compare the mean parameters obtained for the different electrode sites of one
tested subject, the non-parametric Friedman ANOVA test (p < 0.05) was applied. The
averaging was performed over five trials of EEG recordings for each subject. To
examine the differences between the mean values of the parameters in the group of the
patients obtained before and after ASFC, the non-parametric Mann-Whitney test
(p < 0.05) was used.

3 Results

Before the ASFC trials, high-amplitude activity of the θ range dominated in the background EEG of 69% of the patients, and the EEG of 31% of the patients showed low-amplitude polymorphic activity in the δ, θ and α ranges. The ASFC trials resulted in a significant decrease in the amplitude of the θ activity, the disappearance of the polymorphic activity and an increase in the activity in the α range.
The reactive EEG patterns before the ASFC trials were characterized by asymmetry
of the responses of the occipital lobes of the brain to the photostimulus. It was manifested in different values of the maxima of the local wavelet spectra of the EEG patterns recorded at the O1 and O2 sites (Fig. 1a, c).
After 10 trials of ASFC, all patients reported a significant decrease or complete disappearance of panic attacks and a decrease in general and situational anxiety. The asymmetry of the photic driving reaction decreased (Fig. 1b, d).
Table 1 shows the average values of the photic driving coefficient (kR) for the
reactive EEG patterns before and after the ASFC trials. For 9 out of 10 patients with
panic attacks the value of the photic driving coefficient kR < 1 for frequencies of the θ-range, which means the absence of the photic driving reaction for the given rhythm.

[Fig. 1 panels: local wavelet spectra in sites O1 and O2, before ASFC (a, c) and after ASFC (b, d); axes: frequency f, Hz versus time t, s.]

Fig. 1. A decrease of maxima of local wavelet spectra of EEG patterns in O1 and O2 sites after
ASFC trials. The beginning and end of the photostimulation is indicated by arrows.

The minor photic driving reaction is revealed for frequencies of the α-range (kR = 1.9
± 0.2 for 12 Hz and kR = 1.1 ± 0.1 for 8 Hz). A large reaction is found for frequencies of the β-range, for example, kR = 101 ± 11 for 20 Hz. At the same time, the value of kR for the O2 site is almost five times higher than that for
the O1 site. Thus, there are statistically significant differences in the mean values of the
coefficient kR calculated for the occipital sites O1 and O2 (p < 0.05), which testifies to
the asymmetry of the photic driving reaction in the β-range in most of the patients tested.
After the ASFC trials the asymmetry of the responses of the occipital lobes of the
brain became statistically insignificant (p > 0.05), and kR < 1 for the α-range. The photic driving reaction in the β-range decreased significantly (kR = 5.5
± 0.5 for the O1 site at 20 Hz).
The dynamics of the rhythm driving in EEG patterns in patients with panic attacks
after the ASFC trials was also confirmed by a change in simultaneous recurrences in the
joint recurrence plots of these patterns and light time series. Examples of such plots are
presented in Fig. 2b and d, respectively. The plots are constructed at 20 Hz for the
delay time d = 3 and the embedding dimension m = 3; the value of the neighborhood
size ε is equal to 1% of the standard deviation of the analyzed time series. The
corresponding EEG patterns during photostimulation with this frequency are shown in
Fig. 2a, with a bold line, and a photostimulus with a thin dash-dotted line.
The left recurrence plot (Fig. 2b) has recurrent structures containing long diagonal
lines. This testifies to the emergence of simultaneous recurrences in the EEG pattern
and the light signal. During the increase in the amplitude of the brain response to the
photostimulation of the proposed frequency (within the range of nL values from 600 to
1800), the number of simultaneous recurrences increases, which is reflected in an
increase in the length of the diagonal lines in the recurrence plot.

Table 1. The mean values of the photic driving coefficient (kR), the recurrence rate (RR) and the
recurrence time (τ) in joint recurrence plots of the EEG patterns and the light time series
            before ASFC (N = 9 from 10)      after ASFC (N = 9 from 10)
f (Hz)      O1              O2               O1              O2
Coefficient of photic driving (kR):
6 <1 <1 <1 <1
12 1.9 ± 0.2 2.7 ± 0.2 <1 <1
14 5.4 ± 0.5 122 ± 18 2.1 ± 0.2 3.5 ± 0.3
18 35 ± 3.7 147 ± 15 11 ± 1.2 17 ± 1.8
20 22 ± 1.9 101 ± 11 5.5 ± 0.5 7.1 ± 0.7
Recurrence time (τ):
6 39 ± 3.1 33 ± 3.1 35 ± 3.3 31 ± 3.0
12 28 ± 2.7 24 ± 2.3 39 ± 3.9 41 ± 4.1
14 13 ± 1.1 8 ± 0.8 44 ± 4.3 36 ± 3.5
18 7 ± 0.6 4 ± 0.3 25 ± 2.4 30 ± 2.9
20 9 ± 0.8 7 ± 0.6 37 ± 3.6 46 ± 4.5
Recurrence rate (RR):
6 0.05 ± 0.005 0.04 ± 0.004 0.06 ± 0.006 0.03 ± 0.003
12 0.08 ± 0.008 0.07 ± 0.007 0.05 ± 0.005 0.06 ± 0.006
14 0.11 ± 0.01 0.13 ± 0.01 0.02 ± 0.002 0.03 ± 0.003
18 0.13 ± 0.01 0.15 ± 0.01 0.03 ± 0.003 0.02 ± 0.002
20 0.32 ± 0.02 0.27 ± 0.02 0.04 ± 0.003 0.02 ± 0.002

[Fig. 2 panels: (a, c) EEG, μV versus sample number nL, before and after ASFC; (b, d) the corresponding joint recurrence plots, index i versus index j.]

Fig. 2. Examples of EEG patterns during the photostimulation with a frequency of 20 Hz before
(a) and after the ASFC trials (c) (site O2). b, d are the joint recurrence plots of these patterns and
light time series

By contrast, the right recurrence plot (Fig. 2d) has only short diagonal lines, which indicates weak joint recurrence between the given light time series and the analyzed EEG
pattern.
Figure 3 shows in detail the dynamics of changes in the values of the measures of the
recurrence plot given in Fig. 2b. The abscissa is the time calculated according to the rule t = nL · dt, where dt = 1/Fs and Fs is the sampling frequency of the
recorded signal. Figure 3a depicts a gradual increase and decrease in the amplitude of
the EEG pattern in the time interval from 2 to 7 s in response to photostimulation
with a frequency of 20 Hz, which lasted 10 s. Within this interval the measures of the
recurrence plot increase, namely, the recurrence rate (RR) (Fig. 3b), the determinism
(DET) (Fig. 3c), and the mean length of diagonal lines (L) (Fig. 3d); the recurrence time (τ)
changes only slightly (Fig. 3e).

[Fig. 3 panels: (a) EEG, μV versus t, s; (b) RR, (c) DET, (d) L and (e) τ versus t, s.]

Fig. 3. The EEG pattern during the photostimulation (solid line) and the light time series with a
frequency of 20 Hz (dash-dotted line) (a), and the time changes of the recurrence plot measures:
the recurrence rate (RR) (b), the determinism (DET) (c), the mean length of diagonal lines (L) (d)
and the recurrence time (τ) (e).

The mean values of the recurrence rate and the recurrence time for the EEG patterns
before and after the ASFC trials are presented in Table 1. The data of Table 1
point to the increase of the mean recurrence times and the decrease of the mean recurrence
rates after the ASFC trials at the frequencies of the β-range (τ = 9 ± 0.8, RR = 0.32
± 0.02 at f = 20 Hz, site O1, before ASFC, and τ = 37 ± 3.6, RR = 0.04 ± 0.003 after
ASFC).
The decrease in the recurrence rate as well as the increase in the recurrence time and
the decrease in the photic driving coefficient during the photostimulation with the
frequencies of the β-range after the ASFC trials indicate that these trials lead to a
significantly reduced response of the brain to the external stimulus.
As is known, a strong photic driving reaction in the β-range and interhemispheric asymmetry of the rhythm driving are associated with increased psychoemotional excitability of a subject [13]. Therefore, the decline of the photic driving
reaction in the β-range found with both methods used in this work, together with the obtained
decrease of the asymmetry of the occipital lobe responses to the photostimulus, proves that
the ASFC trials decrease the degree of neurotization of subjects with panic
attacks. Psychological testing of the patients before and after the ASFC trials confirmed
a significant reduction of the leading symptoms in the clinical picture of the disease,
namely, a decrease in the level of total anxiety (Table 2).

Table 2. Changes in psychological parameters in patients with panic attacks after the ASFC
trials (N = 10, p < 0.05)
Indicators                                Before ASFC    After ASFC
Short term memory index (double test)     5.1 ± 1.7      8 ± 1.6
Anxiety (Taylor test)                     27.7 ± 2.9     21.5 ± 2.3
Depression scale                          80.4 ± 8       61.3 ± 6

4 Conclusions

The analysis of reactive EEG patterns carried out before and after non-drug therapy
associated with the activation of artificial stable functional connections of the brain has
shown that ASFC trials decrease the asymmetry of the occipital lobe responses to the
photostimulus and reduce the degree of neurotization of subjects with panic attacks.
This is reflected in decreasing values of the photic driving coefficient and
the recurrence rate and in an increasing recurrence time. The improvement of the
subjects' psychophysiological state after the ASFC trials has been confirmed by the
positive dynamics of the psychophysiological testing data.

Acknowledgments. This study was supported by the Program of Fundamental Scientific Research
of State Academies for 2013–2020 (GP-14, section 64). The author thanks T. N. Reznikova, Prof. of
St. Petersburg Human Brain Institute for her help with data recordings.

References
1. Cosci, F.: The psychological development of panic disorder: implications for neurobiology
and treatment. Braz. J. Psychiatry 34, 9–19 (2012)
2. Wilson, K.A., Hayward, C.: A prospective evaluation of agoraphobia and depression
symptoms following panic attacks in a community sample of adolescents. J. Anxiety Disord.
19, 87–103 (2005)
3. Smirnov, V.M., Borodkin, Y.S.: Artificial Stable Functional Connections. Medicine,
Moscow, 192 p. (1979)

4. Smirnov, V.M., Reznikova, T.N.: Artificial stable functional connections as a method of


research and treatment of pathological state. Vestn. AMS USSR 9, 18–23 (1985)
5. Reznikova, T.N., Krasnov, A.A., Seliverstova, N.A., et al.: The study of the “internal picture
of the disease” in patients with organic and functional pathology of the central nervous
system in the process of therapeutic activations using the method of artifactual stable
functional connections of the human brain. Vestn. Clin. Psychol. 2, 76–82 (2004)
6. Torrence, C., Compo, G.P.: A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc.
79, 61–78 (1998)
7. Marwan, N., Romano, M.C., Thiel, M., et al.: Recurrence plots for the analysis of complex
systems. Phys. Rep. 438, 237–329 (2007)
8. Bozhokin, S.V.: Wavelet analysis of dynamics of reproducing and forgetting the rhythms of
photostimulation for nonstationary EEG. J. Tech. Phys. 80, 16–24 (2010)
9. Dick, O.E., Svyatogor, I.A.: Wavelet and multifractal estimation of the intermittent photic
stimulation response in the electroencephalogram of patients with dyscirculatory
encephalopathy. Neurocomputing 165, 361–374 (2015)
10. Takens, F.: Detecting strange attractors in turbulence. In: Rand, D., Young, L.S. (eds.)
Dynamical Systems and Turbulence. Lecture Notes in Mathematics, pp. 366–381. Springer,
Berlin (1981)
11. Fraser, A.M., Swinney, H.L.: Independent coordinates for strange attractors from mutual
information. Phys. Rev. 33, 1134–1140 (1986)
12. Kennel, M.B., Brown, R., Abarbanel, H.D.: Determining embedding dimension for phase-
space reconstruction using a geometrical construction. Phys. Rev. A 45, 3403–3411 (1992)
13. Fedotchev, I., Bondar, A.T., Akoev, I.G.: Dynamic characteristics of the human resonance
EEG responses to rhythmic photostimulation. Human Physiol. 26, 64–72 (2000)
Two Delay-Coupled Neurons
with a Relay Nonlinearity

Sergey D. Glyzin and Margarita M. Preobrazhenskaia(B)

Yaroslavl State University, Yaroslavl, Russia


[email protected], [email protected]

Abstract. The models of association of two coupled neural oscillators


with synaptic coupling are considered. An important feature of this asso-
ciation is an additional delay in the chain of connections. The system of
relay differential-difference equations was chosen as the mathematical
model of this association. The synaptic connection was modeled based
on the modified idea of fast threshold modulation (FTM). The delay
gives rise to new effects: an essential complication of the
system dynamics and the appearance of coexisting attractors of a special form.
We show that there coexist asymptotically orbitally stable solutions
with a total of N ∈ N spikes per period. Moreover, the first oscillator
has m spikes and the second one has N − m spikes per period
(m = 1, 2, . . . , N − 1). We conclude that the additional delay leads to an
accumulation of coexisting attractors with a given number of spikes per
period.

Keywords: Dynamic systems · Time delay · Multistability ·


Bursting · Coupled oscillators

1 Introduction
Modeling the dynamics of changes in the electrical potential of nerve cells is
associated primarily with the works of Hodgkin and Huxley. These authors in
the article [1] for the first time managed to present a phenomenological model
based on balanced type relationships, such that its dynamics, with an appropri-
ate choice of parameters, have the basic qualitative properties characteristic of
the nerve cells observed in the experiment. The Hodgkin-Huxley model is quite
complex and contains a large number of parameters, the dependence on which
is very significant. It should be noted that in many cases the Hodgkin-Huxley
model gives completely satisfactory agreement with experimental data, not only
qualitative but also quantitative. Since the advent of this model, numerous
attempts have been made to simplify it while preserving the main effects spe-
cific to the dynamics of neurons. In the summarizing articles [2,3] a number of
criteria are given which the model of a pulse neuron must comply with, and a
large number of model systems are listed. Naturally, the simplest of them satisfy
not all the requirements. Among these requirements, the most important is the

condition for the existence of a stable periodic pulse-type regime for the corre-
sponding system. To build a model of a single pulse neuron, let us reproduce
the line of reasoning from [4,5]. First of all, note that in [5] only potassium and
sodium currents are taken into account, the level of the greatest polarization of
the membrane is taken as the zero point and the potential deviation from this
level is denoted u(t). The equation of current balance, provided that leakage
currents are neglected, is written as:

cu̇ = −INa − IK . (1)

Further, it is assumed that the currents INa , IK can be represented as follows:

INa = χNa (u) · u, IK = χK (u) · u, (2)

where χNa (u), χK (u) are smooth functions that determine sodium and potassium
conductivity.
For the potassium conductivity, the book [5] accepted the hypothesis that it is
delayed compared with the value of the membrane potential. We take this delay
to be the unit of time and assume that χK = χK (u(t − 1)). To simplify the depen-
dency χNa (u) in book [5] it is noted that the areas of relative stabilization of
conductivities χNa (u) and χK (u) are large enough, so from (1) we can go to the
following equation with delay

u̇ = λ [−1 + αf (u(t − 1)) − βg(u)] u. (3)

Here the parameter λ > 0 characterizes the speed of the electrical processes in
the system and is assumed to be large, the functions f (u) and g(u) characterize
the conductivities of the ion channels and satisfy the conditions

f(0) = g(0) = 1,   0 < βg(u) + 1 < 1 (u > 0),
f(u), g(u), uf′(u), ug′(u) = O(u⁻¹) for u → ∞.     (4)

Next, we consider a system of two coupled nerve cells modelled by equation


(3). The connection between the cells will be considered synaptic with time delay.
Let us dwell on the method of modelling such a connection.
In this paper, we develop an approach, firstly proposed in [6], to model chem-
ical synapses. Our approach is based on a modified idea of fast threshold mod-
ulation (FTM). It allows a system of two synaptic coupled neurons to have two
dynamical effects at the same time: multistability and bursting.
FTM, which was originally described in [7,8], is a special way of coupling of
dynamical systems. Its characteristic feature is that the right-hand sides of the
corresponding differential equations experience jumps as some control variables
cross critical values. FTM is usually implemented in neuron systems as follows.
Now consider a network of two neurons with synaptic interaction. According
to modern views (e.g., see [9]), this network satisfy the system of equations

u̇1 = λ [−1 + αf (u1 (t − 1)) − βg(u1 )] u1 + γs2 (u2 )(u∗ − u1 ),


(5)
u̇2 = λ [−1 + αf (u2 (t − 1)) − βg(u2 )] u2 + γs1 (u1 )(u∗ − u2 ).

Here γ is a positive parameter characterizing the maximum conductivity of a


synapse, u∗ is the equilibrium potential (the Nernst potential), and the functions
sj (uj ), j = 1, 2 are the postsynaptic conductivities depending on the presynaptic
potentials uj . There exist several distinct ways, described in [9], of choosing the
functions sj (uj ). Following the FTM idea, we use the simplest of them. Namely,
we assume that

$$s_j(u_j) = H(u_j - u_{**}), \qquad H(x) = \begin{cases} 0, & x < 0,\\ 1, & x > 0, \end{cases} \qquad (6)$$

where u∗∗ is the threshold starting from which one neuron influences the other.
For example, if u1 < u∗∗ , then the first neuron does not affect the second one;
but if u1 > u∗∗ , then it does.
Our main goal is to adapt the above-represented method for modelling chem-
ical synapses to differential-difference equations of Volterra type (see [6]). In this
case, one should reject universally accepted concepts and take a slightly different
system for the mathematical model of this neural network, namely,

u̇1 = λ [−1 + αf (u1 (t − 1)) − βg(u1 ) + γs(u2 (t − h)) ln(u∗ /u1 )] u1 ,


(7)
u̇2 = λ [−1 + αf (u2 (t − 1)) − βg(u2 ) + γs(u1 (t − h)) ln(u∗ /u2 )] u2 ,

where b = const > 0, u∗ = exp(c λ), c = const ∈ R and the functions s(u) satisfy
the conditions

s(0) = 0,   s(u) − 1, us′(u) = O(u⁻¹) as u → ∞.     (8)

An important feature of the system (7) is the presence of an additional time delay
h > 1 in the coupling between oscillators. The reasons for choosing the system
(7) are as follows. Firstly, the general qualitative character of a synaptic link is
preserved when passing from (5) to (7), because in both cases the corresponding
coupling terms

γ sj−1 (uj−1 )(u∗ − uj ) and γ s(uj−1 )uj ln(u∗ /uj ) (j = 1, 2, u0 = u2 )

change their sign from plus to minus as the potentials uj increase and cross
the critical value u∗ . Secondly, which is the most important, there exists a well-
defined limit object for system (7), which is a relay-type delay system.
Indeed, after the passage to the new variables

xj = (1/λ) ln uj , j = 1, 2 (9)

and as parameter λ tends to infinity, system (7) can be represented in the form
   
ẋ1 = −1 + αR(x1(t − 1)) − βR(x1) + γ(c − x1)H(x2(t − h)),
ẋ2 = −1 + αR(x2(t − 1)) − βR(x2) + γ(c − x2)H(x1(t − h)),     (10)

where

$$R(x) \overset{\text{def}}{=} \begin{cases} 1, & x \le 0,\\ 0, & x > 0, \end{cases} \qquad H(x) \overset{\text{def}}{=} \begin{cases} 0, & x \le 0,\\ 1, & x > 0. \end{cases} \qquad (11)$$
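Because the right-hand side of (10) is piecewise affine, the limit system can be integrated by a plain Euler scheme with history buffers for the two delays. The sketch below is only an illustration of (10)-(11); the parameter values and the constant initial functions are arbitrary choices, not the regimes analysed in the paper.

```python
import numpy as np

def R(x):  # relay nonlinearity from (11)
    return 1.0 if x <= 0 else 0.0

def H(x):  # Heaviside-type function from (11)
    return 0.0 if x <= 0 else 1.0

def simulate(alpha=4.0, gamma=0.6, c=-10.0, h=6.0, T=200.0, dt=1e-3):
    beta = alpha - 2.0                   # the case beta = alpha - 2 used below
    n = int(T / dt)
    lag1, lagh = int(1.0 / dt), int(h / dt)
    x1 = np.full(n + lagh, -1.0)         # constant negative initial functions (arbitrary)
    x2 = np.full(n + lagh, -3.0)
    for i in range(lagh, n + lagh - 1):
        f1 = -1 + alpha * R(x1[i - lag1]) - beta * R(x1[i]) \
             + gamma * (c - x1[i]) * H(x2[i - lagh])
        f2 = -1 + alpha * R(x2[i - lag1]) - beta * R(x2[i]) \
             + gamma * (c - x2[i]) * H(x1[i - lagh])
        x1[i + 1] = x1[i] + dt * f1
        x2[i + 1] = x2[i] + dt * f2
    return x1[lagh:], x2[lagh:]

# segments with positive values correspond, via (9), to spikes of (7) and (12)
x1, x2 = simulate()
```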
As it turned out, the system (10) has a rather complex dynamics. As will be
shown in the next section, in this system, by introducing a delay in the chain of
relations between the equations, two fundamentally important phenomena can
be achieved at once. The first of these consists in the coexistence of several stable
periodic regimes in the system (10). In this case, a mechanism for increasing
the number of such regimes can be indicated. This phenomenon is often called
multistability. The second important property of the system (10) solutions is that
they have some preassigned number of consecutive positive sections, followed
by a large section of negativity. Taking into account the replacement (9), such
cycles of the system (7) correspond to periodic solutions with the same number
of consecutive asymptotically high spikes, alternating with the section where the
potentials uj (t) are close to zero. Periodic solutions with this property are called
bursting-cycles (see [2,3,10,11]).
The proof of the theorem on the correspondence between the solutions of the
system (7) and the limit system (10) is a technically rather complicated task
(see, for example, [12,13]). It is connected with the construction of asymptotic
approximations of the solution of the system (7). To avoid this, one can replace
the system (7) with a relay type system

u̇1 = λ [−1 + αF (u1 (t − 1)) − βG(u1 ) + γG(u2 (t − h)) ln(u∗ /u1 )] u1 ,


(12)
u̇2 = λ [−1 + αF (u2 (t − 1)) − βG(u2 ) + γG(u1 (t − h)) ln(u∗ /u2 )] u2 ,

where

$$F(u) \overset{\text{def}}{=} \begin{cases} 1, & 0 < u \le 1,\\ 0, & u > 1, \end{cases} \qquad G(u) \overset{\text{def}}{=} \begin{cases} 0, & 0 < u \le 1,\\ 1, & u > 1. \end{cases} \qquad (13)$$
Note that the substitution (9) turns (13) into the relay functions (11); in particular,

F(exp(λx)) = R(x),   G(exp(λx)) = H(x)   for λ > 0.

Thus, all the properties of the relay system (10) are automatically transferred to the
system (12).

2 Relay Model Analysis

We need the following definitions in the sequel. Let’s fix a sufficiently small


constant σ > 0 and consider the space E = C([−h − σ, −σ]; R2 ) of continuous
vector functions ϕ(t) = colon (ϕ1 (t), ϕ2 (t)) defined for t ∈ [−h − σ, −σ]. We will
set the norm in E in the usual way, i.e. by the formula

$$\|\varphi\| = \max_{j=1,2}\ \max_{-h-\sigma \le t \le -\sigma} |\varphi_j(t)|. \qquad (14)$$

Let’s introduce constants:


$$\xi \overset{\text{def}}{=} \exp(-\gamma\alpha), \qquad \eta \overset{\text{def}}{=} \frac{\alpha - \beta - 1}{1 - \xi} + c. \qquad (15)$$

Further, in order to determine the set of initial functions S(m) ⊂ E, we fix a natural N; fix constants q1, q2 such that q1 ∈ (0, σ), q2 > σ; and fix constants q3, q4, depending on the index m, such that

q3 ∈ (0, (⌊N/2⌋ − m)T0 + ξη + σ),   q4 > (⌊N/2⌋ − m)T0 − ξη + σ.

Here m = 1, . . . , ⌊N/2⌋, where ⌊·⌋ denotes the integer part of a number. We define two function
sets:
S1 = {ϕ1 ∈ C[−h − σ, −σ] : ϕ1(−σ) = −σ, −q2 ≤ ϕ1(t) ≤ −q1 ∀t ∈ [−h − σ, −σ]},
S2(m) = {ϕ2 ∈ C[−h − σ, −σ] : ϕ2(−σ) = −d − σ, −q4 ≤ ϕ2(t) ≤ −q3 ∀t ∈ [−h − σ, −σ]},

where
(n − m)T0 + ξη ≤ d ≤ (n − m)T0 − ξη, m = 1, . . . , n. (16)
Firstly, let us consider the single relay equation which we obtain for xj from (10)
if γ = 0:
ẋ = −1 + αR(x(t − 1)) − βR(x).     (17)
The following statement was proved in the article [12].

Lemma 1 ([12]). Let α > β + 1 and σ < β + 1. Then equation (17) with initial
function ϕ1 ∈ S1 for t ∈ [−1 − σ, −σ] admits a unique stable periodic solution
given by equality


$$x_0(t) \overset{\text{def}}{=} \begin{cases} (\alpha - 1)t, & t \in [0, 1],\\ -t + \alpha, & t \in [1, \alpha],\\ -(\beta + 1)(t - \alpha), & t \in [\alpha, \alpha + 1],\\ (\alpha - \beta - 1)(t - T_0), & t \in [\alpha + 1, T_0], \end{cases} \qquad (18)$$

$$x_0(t + T_0) \equiv x_0(t), \qquad T_0 \overset{\text{def}}{=} \alpha + 1 + \frac{\beta + 1}{\alpha - \beta - 1}. \qquad (19)$$
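For reference, the explicit solution (18)-(19) can be evaluated directly; the short sketch below (illustrative only, not part of the original paper) implements x0(t) and the period T0.

```python
def x0(t, alpha, beta):
    """Periodic solution (18) of the relay equation (17); requires alpha > beta + 1."""
    T0 = alpha + 1 + (beta + 1) / (alpha - beta - 1)   # period, formula (19)
    t = t % T0
    if t <= 1:
        return (alpha - 1) * t
    if t <= alpha:
        return -t + alpha
    if t <= alpha + 1:
        return -(beta + 1) * (t - alpha)
    return (alpha - beta - 1) * (t - T0)

# example: alpha = 4, beta = 2 gives T0 = 4 + 1 + 3/1 = 8
print(x0(0.5, 4, 2), x0(7.9, 4, 2))
```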
Secondly, we consider an auxiliary problem.

Lemma 2. For any l ∈ N and τ ∈ [(l − 1)T0 + α + 1, lT0], the solution of the problem

ẋ = −1 + α − β + γ(c − x)H(x0(t)),   x|t=0 = x0(τ)     (20)

is described by the following formula for k ∈ {0} ∪ N



$$y_0(\tau, t) = \begin{cases} \xi^k\,(x_0(\tau) - \eta)\exp(-\gamma(t - kT_0)) + \eta, & t \in [kT_0,\ \alpha + kT_0],\\ (\alpha - \beta - 1)(t - \alpha - kT_0) + \xi^{k+1}(x_0(\tau) - \eta) + \eta, & t \in [\alpha + kT_0,\ (k+1)T_0]. \end{cases}$$

Now let us formulate for (10) a theorem about the coexistence of bursting-cycles. A set of initial functions for (10) is defined as follows:

S(m) = S1 × S2(m),   m = 1, . . . , n.     (21)

For simplicity, further suppose that β = α − 2. Taking into account the exponential dependence of ξ on α and γ (see (15)), suppose that

η + α − 1 < 0, η + N T0 /2 < 0, (N/2 − 1)T0 + α + 1 − ξη ≤ h ≤ N T0 /2 + ξη. (22)

By definition put

$$x_1^{(m)}(t) \overset{\text{def}}{=} \begin{cases} x_0(t), & t \in [0,\ h + d_*],\\ y_0(\alpha + h,\ t - d_* - h), & t \in [h + d_*,\ h + d_* + \alpha + (m-1)T_0],\\ t - T_1^{(m)}, & t \in [h + d_* + \alpha + (m-1)T_0,\ T_1^{(m)}], \end{cases} \qquad (23)$$

$$x_2^{(m)}(t) \overset{\text{def}}{=} \begin{cases} t - d_*, & t \in [0,\ d_*],\\ x_0(t), & t \in [d_*,\ h],\\ y_0(h - d_*,\ t - h), & t \in [h,\ h + \alpha + (N - m - 1)T_0],\\ t - T_2^{(m)}, & t \in [h + \alpha + (N - m - 1)T_0,\ T_1^{(m)}], \end{cases} \qquad (24)$$
where

$$T_1^{(m)} \overset{\text{def}}{=} h + d_* + \alpha + (m-1)T_0 - \xi^m\big(h + d_* - (n-m)T_0 - \eta\big) - \eta, \qquad (25)$$

$$T_2^{(m)} \overset{\text{def}}{=} h + \alpha + (N-m-1)T_0 - \xi^{N-m}\big(h - d_* - mT_0 - \eta\big) - \eta, \qquad (26)$$

$$d_* \overset{\text{def}}{=} \frac{(N - 2m)T_0 + \xi^{m}\big(h - (N-m)T_0 - \eta\big) - \xi^{N-m}\big(h - mT_0 - \eta\big)}{2 - \xi^{m} - \xi^{N-m}}. \qquad (27)$$

Theorem 1. Let β = α − 2, γ, h satisfy (22). Then there exists σ > 0 such that
system (10) with initial condition from (21) admits N − 1 periodic modes
colon(x1(m)(t), x2(m)(t))   (m = 1, . . . , N − 1).
Here x1(m)(t) and x2(m)(t) are T1(m)-periodic functions which have N − m and m
relatively short alternating segments of positivity and negativity, which follow a
long enough segment where the function values are negative.
A possible view of the periodic mode is illustrated in Fig. 1.
The following statement is about a stability of the solutions from Theorem 1.

Theorem 2. The solution of (10), described in Theorem 1, is asymptotically


orbitally stable.

A proof scheme is the same as, for example, in [10,12–14]. Let us introduce
some notation for its presentation.
Denote a function of S(m) by ϕ = colon(ϕ1, ϕ2), where ϕ1 ∈ S1, ϕ2 ∈ S2(m).
For an arbitrary function ϕ(t) from (21), denote by x(t) = colon(x1(t), x2(t))
a solution of (10) such that x1(t) ≡ ϕ1(t), x2(t) ≡ ϕ2(t) when t ∈ [−h − σ, −σ].
Suppose that the equation
x1 (t − σ) = −σ (28)

Fig. 1. A solution of (10). Here N = 8, m = 1.

has 2N − 2m or more positive roots. We denote the one with number 2N − 2m by T1(m). Finally, we define the Poincaré operator

Π : S → S

by the formula

Π(ϕ) = x(t + T1(m)),   −h − σ ≤ t ≤ −σ.     (29)

The first step of the proof is the construction of a solution on the segment [−σ, T1(m)]. It is possible to show that here the solution is described by (23), (24). We skip the technical details.
Similarly to T1(m), denote the root of x2(t − σ) = −σ with number 2m + 1 by T2(m). From the construction of the solution, it follows that T1(m) equals (25) and T2(m) is described by (26).
By (22), (25) and (26), the distance between the (2N − 2m − 1)-th and (2N −
2m)-th roots of (28) is greater than the length of the segment where S(m) is defined.
Hence the operator Π is defined on the set S(m) and transforms it into itself. Thus,
for any m = 1, . . . , n there exists a periodic solution (23), (24) of the relay system.
From the explicit formulas (23), (24), it follows that all functions from S(m)
are mapped to a unique function. Therefore, Π is a contraction operator. According to
the contraction mapping principle, Π has a unique fixed point in S(m). Thus, the
periodic solution of (10) with initial condition from S(m) is unique. Its period
is (25). Moreover, the contraction property of Π means that the stability spectrum
of the periodic solution contains a multiplier μ2 ≠ 0 in addition to μ1 = 1; all
other multipliers are equal to zero. At the same time, the multiplier μ2 is a multiplier
of the map −d → −d̄, where d̄ is a number such that

x2(T1(m)(ϕ) − σ) = −d̄ − σ.     (30)

Let us find d̄. By (22), the value T1(m) − σ belongs to the segment [h + (N − m − 1)T0 + α, T2(m)], where x2(t) = t − T2(m). Hence, using (25), (26) and (22), we obtain

$$\bar{d} = T_2^{(m)} - T_1^{(m)} = \left(-1 + \xi^{m} + \xi^{N-m}\right)d + (N - 2m)T_0 - \xi^{N-m}\big(h - mT_0 - \eta\big) + \xi^{m}\big(h - (N-m)T_0 - \eta\big). \qquad (31)$$

A fixed point of the map is (27). Formula (31) implies that μ2 = −1 + ξ^m + ξ^(N−m).
Thus, we obtain the following statement about the multipliers of the periodic solution
of (10).

Lemma 3. The solution (23), (24) of (10) has a countable set of zero multipliers, one unit multiplier μ1 = 1 and the multiplier

μ2 = −1 + ξ^m + ξ^(N−m).     (32)

Lemma 3 implies Theorem 2.
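Since ξ = exp(−γα) ∈ (0, 1), the nontrivial multiplier (32) always lies strictly inside the unit interval, in agreement with Theorem 2. A quick numerical illustration (with arbitrary ξ and N, chosen only for this example):

```python
xi, N = 0.05, 8
mus = [-1 + xi**m + xi**(N - m) for m in range(1, N)]
print(all(abs(mu) < 1 for mu in mus))   # True: each coexisting cycle is stable
```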

Fig. 2. A solution of (7). Here N = 8, m = 2.

The proved theorems claim that (10) has N − 1 asymptotically orbitally stable
solutions with a total of N ∈ N positivity segments per period. Moreover, the
first oscillator has m segments of solution positivity, and the second one has
N − m (m = 1, . . . , N − 1) segments of solution positivity per period. By (9),
the segments with positive solution values correspond to spikes of the solutions
of the systems (7) and (12). The spike amplitudes are of order exp(λ). One of
the stable solutions of (7) and (12) is illustrated in Fig. 2 for the case N = 8, m = 1.

3 Conclusion
We have proposed and studied a mathematical model of a pair of synaptically
coupled impulse neurons with a relay nonlinearity and a delay in the connection chain.
Let us point out the most important results.
The first important feature is that the system (12) is an independent phenomenological model of two synaptically coupled neurons. The presented approach
allows us to consider only the relay system (12), which is given a well-defined biological meaning. This avoids a laborious proof of the correspondence theorems
which one has to prove if the right-hand sides of (12) are continuous and the parameter λ is
large (see, for example, [6,10,12–14]).
Secondly, an analysis of (12) shows that the introduction of a delay in the coupling between oscillators implies new effects which are not typical for systems
without delay. In particular, for any even N we find a mechanism of occurrence
of (N − 1) stable relaxation periodic regimes. The components of the solutions
have a total of N spikes per period. Thus, both the multistability phenomenon and the bursting effect are present.
Finally, thirdly, the set of coexisting attractors of (12) contains not only the solutions described in the present paper. For example, there are antiphase and
impulse-refractive modes which are not considered here.
The reported study was funded by RFBR according to the research project
18-29-10055.

References
1. Hodgkin, A.L., Huxley, A.: A quantitative description of membrane current and its
application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952)
2. Izhikevich, E.: Neural excitability, spiking and bursting. Int. J. Bifurcat. Chaos
10(6), 1171–1266 (2000). https://doi.org/10.1142/S0218127400000840
3. Rabinovich, M.I., Varona, P., Selverston, A.I., Abarbanel, H.D.I.: Dynamical prin-
ciples in neuroscience. Rev. Mod. Phys. 78, 1213–1265 (2006). https://doi.org/10.
1103/RevModPhys.78.1213
4. Kashchenko, S.A., Maiorov, V.V., Myshkin, I.Y.: Wave distribution in simplest
ring neural structures. Matem. Mod. 7(12), 3–18 (1995). http://mi.mathnet.ru/
mm1392
5. Kashchenko, S.: Models of Wave Memory. Springer, Switzerland (2015). https://
doi.org/10.1007/978-3-319-19866-8
6. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: On a method for mathematical modeling
of chemical synapses. Differ. Equ. 49(10), 1193–1210 (2013). https://doi.org/10.
1134/S0012266113100017
7. Somers, D., Kopell, N.: Rapid synchronization through fast threshold modulation.
Biol. Cybern. 68, 393–407 (1993). https://doi.org/10.1007/BF00198772
8. Somers, D., Kopell, N.: Anti-phase solutions in relaxation oscillators coupled
through excitatory interactions. J. Math. Biol. 33, 261–280 (1995). https://doi.
org/10.1007/BF00169564
9. Terman, D.: An introduction to dynamical systems and neuronal dynamics. In:
Tutorials in Mathematical Biosciences I: Mathematical Neuroscience, pp. 21–68.
Springer, Berlin (2005). https://doi.org/10.1007/978-3-540-31544-5 2
10. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Modeling the bursting effect in
neuron systems. Math. Notes. 93(5), 676–690 (2013). https://doi.org/10.1134/
S0001434613050040
11. Chay, T.R., Rinzel, J.: Bursting, beating, and chaos in an excitable mem-
brane model. Biophys. J. 47(3), 357–366 (1985). https://doi.org/10.1016/S0006-
3495(85)83926-6
12. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Relaxation self-oscillations in neu-
ron systems: I. Differ. Equ. 47(7), 927–941 (2011). https://doi.org/10.1134/
S0012266111070020
13. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Relaxation self-oscillations in neu-
ron systems: II. Differ. Equ. 47(12), 1697–1713 (2011). https://doi.org/10.1134/
S0012266111120019
14. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Discrete autowaves in neural systems.
Comput. Math. Math. Phys. 52(5), 702–719 (2012). https://doi.org/10.1134/
S0965542512050090
Brain Extracellular Matrix Impact
on Neuronal Firing Reliability
and Spike-Timing Jitter

Maiya A. Rozhnova(B) , Victor B. Kazantsev, and Evgeniya V. Pankratova

Lobachevsky State University of Nizhni Novgorod,


23 Gagarin Ave., 603950 Nizhny Novgorod, Russia
[email protected]

Abstract. In this work, the role of the brain extracellular matrix (ECM)
in signal processing by a neuronal system is examined. For excitatory
postsynaptic currents in the form of Poisson signal, we study the changes
of the interspike intervals duration, spike-timing jitter and coefficient
of variation in the presence of a background noise with varied inten-
sity. Without ECM impacts, a noise-delayed spiking phenomenon reflecting worsening of both the reliability and the precision of signal processing is
revealed. It is shown that the ECM-neuron feedback mechanism allows
enhancing the robustness of neuronal firing in the presence of noise.

Keywords: Brain extracellular matrix · Neuronal activity ·


Reliability and precision of signal transmission

1 Introduction
Information about any changes in external environment is transmitted by neu-
ronal systems via changes of their membrane potential activity. Despite the
presence of huge number of background noise sources, a lot of experimental
data show that repeated identic signals provoke outputs with similar character-
istics [1,2]. This amazing neuronal ability to process signals with high reliability
and precision is still poorly understood, and, therefore, is of particular interest.
Recently, based on experimental observations, new mathematical model for
neuronal activity in the presence of ECM was introduced in [3], where the authors
studied the role of ECM-neuron feedback mechanisms activation in sustaining
of homeostatic balance in neuronal firing network as well as its possible role in
memory function implementation. In this study, within the frame of this model
we discuss one possible mechanism for neuronal activity regulation that can
enhance the reliability and precision of signal transmission in the presence of
background noise.


2 Mathematical Model
2.1 Postsynaptic Neuronal Dynamics

We assume that the membrane potential of the postsynaptic cell evolves according


to the following current balance equation of the Hodgkin-Huxley model:

C V̇ = Iapp − Iion + Isyn , (1)

where Iion = INa + IK + Il is the sum of the transmembrane currents, with

INa = gNa m³(V)h(V)(V − ENa),   IK = gK n⁴(V)(V − EK),   Il = gl(V − El)

being the sodium (INa) and potassium (IK) ionic currents passing through the cell
membrane and the current Il through an unspecific leakage channel, respectively.
The dynamics of potential-dependent gating variables is described by the
following kinetic equations:

ẋ = αx (V )(1 − x) − βx (V )x, (2)

where x is m(V, t), h(V, t) (that are responsible for the activation and inactivation
of the Na+ -current), or n(V, t) (that controls the K+ -current activation). The
mean transition rates αx (V ), βx (V ), and the parameters of the model are taken
as in the classical work of Hodgkin and Huxley [4].
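For orientation, a compact sketch of the right-hand sides of (1)-(2) is given below. The rate functions and constants are quoted from the standard Hodgkin-Huxley 1952 formulation (membrane potential measured as the deviation from rest); they are a reasonable stand-in for, not a copy of, the authors' implementation.

```python
import numpy as np

C, g_Na, g_K, g_l = 1.0, 120.0, 36.0, 0.3      # uF/cm^2, mS/cm^2
E_Na, E_K, E_l = 115.0, -12.0, 10.6            # mV, deviation from rest

def rates(V):
    # classical HH transition rates (removable singularities at V = 10, 25 ignored)
    a_m = 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
    b_m = 4.0 * np.exp(-V / 18)
    a_h = 0.07 * np.exp(-V / 20)
    b_h = 1.0 / (np.exp((30 - V) / 10) + 1)
    a_n = 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
    b_n = 0.125 * np.exp(-V / 80)
    return (a_m, b_m), (a_h, b_h), (a_n, b_n)

def hh_rhs(V, m, h, n, I_app, I_syn):
    """Right-hand sides of the current balance (1) and gating kinetics (2)."""
    I_ion = (g_Na * m**3 * h * (V - E_Na)
             + g_K * n**4 * (V - E_K)
             + g_l * (V - E_l))
    (a_m, b_m), (a_h, b_h), (a_n, b_n) = rates(V)
    dV = (I_app - I_ion + I_syn) / C
    dm = a_m * (1 - m) - b_m * m
    dh = a_h * (1 - h) - b_h * h
    dn = a_n * (1 - n) - b_n * n
    return dV, dm, dh, dn
```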

2.2 Synaptic Currents Modeling


We assume that the synaptic current is Isyn = IEPSCs(k) + ξ(t), where the first term
is determined as follows:

$$I_{EPSCs}(k) = \begin{cases} A, & t_j < t < t_j + \tau,\\ 0, & \text{otherwise}, \end{cases} \qquad (3)$$

where tj is the occurrence time of a pulse with amplitude A in the input signal.
These times follow a Poisson process with average time interval τin between
subsequent pulses. The duration of each pulse in the input is assumed to be
constant, τ = 1 ms. For each pulse the amplitude A has a random value
that satisfies the probability distribution
$$P(A) = \frac{2A}{b^2}\, e^{-A^2/b^2} \qquad (4)$$
with the scaling factor b = b0(1 + γZb Z), where γZb is the gain parameter that
modifies the amplitude of IEPSCs [3].
The second term of Isyn is white Gaussian noise with zero mean, ⟨ξ(t)⟩ = 0,
and with the correlation function ⟨ξ(t)ξ(t + τG)⟩ = Dδ(τG).
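One minimal way to generate the input described by (3)-(4) is sketched below: pulse onsets form a Poisson process with mean interval τin, amplitudes are drawn from the density (4) (a Rayleigh density with scale b/√2), and the white Gaussian noise ξ(t) is discretized in the usual Euler sense. Function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def epsc_train(t_end=10_000.0, dt=0.01, tau_in=4.0, tau=1.0, b=1.0,
               D=0.5, rng=np.random.default_rng(0)):
    """Poisson EPSC pulse train (3)-(4) plus discretized white Gaussian noise (ms units)."""
    n = int(t_end / dt)
    I = np.zeros(n)
    t = 0.0
    while True:
        t += rng.exponential(tau_in)          # Poisson onsets with mean interval tau_in
        if t >= t_end:
            break
        A = rng.rayleigh(b / np.sqrt(2))      # amplitude drawn from density (4)
        i0, i1 = int(t / dt), int((t + tau) / dt)
        I[i0:min(i1, n)] = A                  # rectangular pulse of duration tau
                                              # (overlapping pulses simply overwrite here)
    noise = np.sqrt(D / dt) * rng.standard_normal(n)   # Euler discretization of xi(t)
    return I + noise
```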
Additionally we assume that Iapp = Idc (1 + γZ Z), where γZ is the feedback
gain parameter that modifies the applied current. Thus, both the currents in the
input of the neuron (1) depend on the variable Z whose value should be taken
from the following system of equations describing ECM dynamics:
 
$$\dot{Z} = -(\alpha_Z + \gamma_P P)\,Z + \beta_Z\left(Z_0 - \frac{Z_0 - Z_1}{1 + \exp(-(Q - \theta_Z)/k_Z)}\right),$$
$$\dot{P} = -\alpha_P P + \beta_P\left(P_0 - \frac{P_0 - P_1}{1 + \exp(-(Q - \theta_P)/k_P)}\right), \qquad (5)$$

where Q is an average neuronal activity variable that changes in time in accor-


dance with the following differential equation

Q̇ = −αQ Q + βQ / (1 + exp(−V /kQ )). (6)

In Eqs. (5) and (6), αZ = 0.001 ms−1 , γP = 0.1, βZ = 0.01 ms−1 , Z0 = 0,


Z1 = 1, θZ = 1.1, kZ = 0.15, αP = 0.001 ms−1 , βP = 0.01 ms−1 , P0 = 0,
P1 = 1, θP = 6, kP = 0.05, αQ = 0.0001 ms−1 , βQ = 0.01 ms−1 , kQ = 0.01.
In our numerical calculations, to avoid influence of transients the analysis of
interspike interval durations is carried out for t > 5 s. For all the averagings,
n = 10000 sampling values were used.
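A direct Euler discretization of the ECM subsystem (5)-(6), using the parameter values listed above, might look as follows; this is a sketch of the equations, not the authors' code.

```python
import numpy as np

P_ECM = dict(aZ=0.001, gP=0.1, bZ=0.01, Z0=0.0, Z1=1.0, thZ=1.1, kZ=0.15,
             aP=0.001, bP=0.01, P0=0.0, P1=1.0, thP=6.0, kP=0.05,
             aQ=0.0001, bQ=0.01, kQ=0.01)

def ecm_step(Z, P, Q, V, dt, p=P_ECM):
    """One Euler step of equations (5)-(6), driven by the membrane potential V."""
    sigZ = 1.0 / (1.0 + np.exp(-(Q - p['thZ']) / p['kZ']))
    sigP = 1.0 / (1.0 + np.exp(-(Q - p['thP']) / p['kP']))
    dZ = -(p['aZ'] + p['gP'] * P) * Z + p['bZ'] * (p['Z0'] - (p['Z0'] - p['Z1']) * sigZ)
    dP = -p['aP'] * P + p['bP'] * (p['P0'] - (p['P0'] - p['P1']) * sigP)
    dQ = -p['aQ'] * Q + p['bQ'] / (1.0 + np.exp(-V / p['kQ']))
    return Z + dt * dZ, P + dt * dP, Q + dt * dQ
```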

3 Neuronal Firing Without Impact of ECM

The dynamics of the membrane potential depends on characteristics of an input


current. In Fig. 1, two time series V(t) for low- and high-frequency Poisson input
with random amplitudes are shown. To study the role of the input signal param-
eters in neuronal activity, we further calculate the mean of the interspike interval
duration
$$\tau_{id} = \frac{1}{n}\sum_{i=1}^{n}\tau_{id}^{(i)}, \qquad (7)$$

Fig. 1. (a), (b) EPSCs-Poisson pulse trains for two values of interpulse duration τin =
10 ms and τin = 2 ms, and (c), (d) evoked oscillations of the membrane potential in
the absence of ECM, D = 0, Idc = 5.7 μA/cm2 , b0 = 3.

the spike-timing jitter (the mean square deviation of τid^(i)) as

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\tau_{id}^{(i)}\right]^{2} - \tau_{id}^{2}}, \qquad (8)$$

and the coefficient of variation β = σ/τid that illustrates the degree of coherence
in the neuronal output.
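Given spike times extracted from V(t) (e.g. as upward threshold crossings), the quantities (7), (8) and β are straightforward to compute; a short illustrative sketch (times assumed in milliseconds, transients of 5 s discarded as in the text):

```python
import numpy as np

def isi_statistics(spike_times, t_transient=5000.0):
    """Mean ISI (7), spike-timing jitter (8) and coefficient of variation beta."""
    st = np.asarray(spike_times, dtype=float)
    isi = np.diff(st[st > t_transient])              # interspike intervals after transients
    tau_mean = isi.mean()                            # formula (7)
    sigma = np.sqrt((isi**2).mean() - tau_mean**2)   # formula (8)
    return tau_mean, sigma, sigma / tau_mean         # beta = sigma / tau_mean
```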

3.1 Output ISI-Statistics in the Absence of Gaussian Noise


For D = 0, three parameters define the change of the input, namely, Idc, τin and
b0. Since for the Hodgkin-Huxley (HH) model the parameter Idc can lead to one of
three possible regimes of neuronal behavior, we focus on three of its values:
(a) Idc = 5 μA/cm² (for the dc-injected HH model this corresponds to a monostable regime with
a stable steady state), (b) Idc = 7 μA/cm² (a bistable regime with co-existence of
both a stable steady state and a limit cycle) and (c) Idc = 10 μA/cm² (a monostable
regime with a stable limit cycle) [5]. As seen from Fig. 2, for all of these cases the
decrease of the input frequency (as well as the decrease of Idc) leads to an increase
of all the considered characteristics of the output, while the increase of b0 for
large Idc currents can lead either to an increase (for large τin) or a decrease (for
small τin) of τid.

Fig. 2. The mean of the interspike interval duration, the spike-timing jitter, and the
coefficient of variation as functions of input interpulse duration for three values of the
parameter b0 without ECM for (a) Idc = 5 μA/cm2 , (b) Idc = 7 μA/cm2 and (c)
Idc = 10 μA/cm2

3.2 Neuronal Firing in the Presence of Noise: Noise-Delayed Spiking
For D ≠ 0, fluctuations can either suppress some of the spikes or, on the contrary, provide the appearance of additional pulses in the output. For the con-
sidered values of Idc, Fig. 3(a) shows that almost all curves have a similar non-monotonic behavior with a maximum at some value of the noise intensity. A
small amount of fluctuations impedes the spiking: noise with small intensity D
provokes an increase of the mean interspike interval duration. Such a noise-delayed
spiking phenomenon was observed in [7–13] for the mean latency time. Here, we
demonstrate that this phenomenon also takes place for the interspike intervals:
the neuronal cell sensitivity to noise is particularly high within a certain interval
of noise intensities (where the increase of τid is observed). The degree of such
sensitivity to noise also depends on b0 and τin. From Figs. 3(b), (c) it follows
that the increase of b0 as well as the decrease of τin lead to a decrease of the noise
sensitivity: the maximum becomes less pronounced. As we can see, noise-delayed
spiking is observed only for large enough values of Idc. For Idc = 7 μA/cm² (blue
curve in Fig. 3(a), upper panel) we observe another dependence. For this parameter, in the noise-free case the system spends a lot of time near the resting state,
which leads to the appearance of large values in the τid^(i) statistics. Fluctuations drive the
system out to the oscillatory mode and lead to a decrease of τid. Obviously, a similar
behavior can also be observed for any Idc < 7 μA/cm². Taking the above-mentioned
differences into account, we further focus on two cases (Idc = 7 μA/cm² and
Idc = 8.5 μA/cm²) and consider the role of ECM in the cell's sensitivity to external
fluctuations.

Fig. 3. White Gaussian noise-induced changes: the mean of the interspike interval
duration, the spike-timing jitter, and the coefficient of variation as functions of noise
intensity D for (a) for four values of Idc , τin = 4 ms, b0 = 1, (b) for three values of
b0 , Idc = 8.5 μA/cm2 , τin = 4 ms, (c) for three values of the input interpulse duration
τin , Idc = 8.5 μA/cm2 , b0 = 1.

4 ECM-Induced Changes of Neuronal Firing


Dynamical regimes of ECM activity within the considered model (5) were studied in detail in [14]. It was shown that various bistable (when switching
between two different steady states is possible or a stable stationary level co-exists
with oscillations) and monostable modes can be observed for various parame-
ters. In this study, the average activity variable Q is assumed to be changeable
in time in accordance with (6). The parameters of the ECM model provide the transition to some stationary level of the concentration of ECM molecules Z as a result
of a high level of the averaged neuronal activity Q. Taking into account the gain of
IEPSCs (Fig. 4(a)) and Iapp (Fig. 4(b)) due to the establishment of this Z-level leads
to an increase of the system's reliability: ECM-induced elimination of the noise-delayed spiking effect is observed.

Fig. 4. ECM-induced changes: the mean of the interspike interval duration, the spike-
timing jitter, and the coefficient of variation as functions of noise intensity D for two
values of Idc (Idc = 7 μA/cm2 (blue curves) and Idc = 8.5 μA/cm2 (green curves)) for
(a) γZb = 0.3, (b) γZ = 0.1, τin = 4 ms, b0 = 1.

5 Conclusions

Neuronal firing activity was studied within the frame of the Hodgkin-Huxley model
driven by synaptic currents accounting for the existence of background noise
and the impacts of the ECM, whose concentration of molecules can be modified via a
feedback mechanism of neuron-ECM interaction. In the absence of the ECM, the
phenomenon of noise-delayed spiking is observed: both the reliability and the precision
of signal transmission in the presence of noise become worse. Introducing
ECM impacts into the model eliminates these negative noise-induced
effects, which allows more reliable and precise signal processing by
the neuronal systems.

Acknowledgments. The work was supported by the Ministry of Education and Sci-
ence of Russia (Project No. 14.Y26.31.0022).

References
1. Rodriguez-Molina, V.M., Aertsen, A., Heck, D.H.: Spike timing and reliability
in cortical pyramidal neurons: effects of epsc kinetics, input synchronization and
background noise on spike timing. PLoS ONE 2(3), e319 (2007). https://doi.org/
10.1371/journal.pone.0000319
2. Tiesinga, P., Fellous, J.-M., Sejnowski, T.J.: Regulation of spike timing in visual
cortical circuits. Nat. Rev. Neurosci. 9(2), 97–107 (2008). https://doi.org/10.1038/
nrn2315
3. Kazantsev, V., Gordleeva, S., Stasenko, S., Dityatev, A.: A homeostatic model of
neuronal firing governed by feedback signals from the extracellular matrix. PLoS
ONE 7(7), e41646 (2012). https://doi.org/10.1371/journal.pone.0041646
4. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and
its application to conduction and excitation in nerve. J. Physiol. 117, 500–544
(1952)
5. Lee, S.-G., Neiman, A., Kim, S.: Coherence resonance in a Hodgkin-Huxley neuron.
Phys. Rev. E 57(3), 3292–3297 (1998). https://doi.org/10.1103/PhysRevE.57.3292
6. Parmananda, P., Mena, C.H., Baier, G.: Resonant forcing of a silent Hodgkin-
Huxley neuron. Phys. Rev. E 66, 047202 (2002). https://doi.org/10.1103/
PhysRevE.66.047202
7. Pankratova, E.V., Polovinkin, A.V., Mosekilde, E.: Resonant activation in a
stochastic Hodgkin-Huxley model: interplay between noise and suprathreshold
driving effects. Eur. Phys. J. B 45(3), 391–397 (2005). https://doi.org/10.1140/
epjb/e2005-00187-2
8. Gordeeva, A.V., Pankratov, A.L.: Minimization of timing errors in reproduction
of single flux quantum pulses. Appl. Phys. Lett. 88, 022505 (2006)
9. Pankratova, E.V., Belykh, V.N., Mosekilde, E.: Role of the driving frequency in
a randomly perturbed Hodgkin-Huxley neuron with suprathreshold forcing. Eur.
Phys. J. B 53(4), 529–536 (2006). https://doi.org/10.1140/epjb/e2006-00401-9
10. Ozer, M., Graham, L.J.: Impact of network activity on noise delayed spiking for
a Hodgkin-Huxley model. Eur. Phys. J. B 61, 499–503 (2008). https://doi.org/10.
1140/epjb/e2008-00095-y
11. Gordeeva, A.V., Pankratov, A.L., Spagnolo, B.: Noise induced phenomena in point
Josephson junctions. Int. J. Bifurcat. Chaos 18, 2825–2831 (2008)
12. Uzuntarla, M., Ozer, M., Ileri, U., Calim, A., Torres, J.J.: Effects of dynamic
synapses on noise-delayed response latency of a single neuron. Phys. Rev. E 92(6),
062710 (2015). https://doi.org/10.1103/PhysRevE.92.062710
13. Uzuntarla, M.: Inverse stochastic resonance induced by synaptic background activ-
ity with unreliable synapses. Phys. Lett. A 377(38), 2585–2589 (2013). https://
doi.org/10.1016/j.physleta.2013.08.009
14. Lazarevich, I.A., Stasenko, S.V., Rozhnova, M.A., Pankratova, E.V., Dityatev,
A.E., Kazantsev, V.B.: Dynamics of the brain extracellular matrix governed by
interactions with neural cells. arxiv:1807.05740
Contribution of the Dorsal and Ventral Visual
Streams to the Control of Grasping

Irina A. Smirnitskaya(&)

Scientific Research Institute for System Analysis, Russian Academy of Sciences,


Nakhimovsky Prospect, 36/1, Moscow 117218, Russia
[email protected]

Abstract. Since Ungerleider and Mishkin's 1982 paper about the different
roles of the dorsal and ventral visual streams, the first as “where” and the second as
“what”, there has been no consensus on what these pathways really do and whether they really
exist. In this review the contribution of parietal, premotor and prefrontal cortical
regions to the control of grasping is discussed in the context of the existence of two visual
streams. There is evidence that each of the two streams consists of
two subdivisions. The roles of the subdivisions in the control of grasping, such as
memorizing the features of the object to be grasped, calculating the value of
the object for grasping, controlling the movement's precision, retaining
the movement's goal in working memory, and so on, are analyzed. The complementarity of the dorsal and ventral regions of the visual pathways in motion
control is shown. A separate problem is the coherency of the execution of all
these tasks. Each of the pathways performs its part by interchanging signals and
ensuring coordinated execution of the work.

Keywords: Dorsal visual stream · Ventral visual stream · Grasping · Premotor area · Prefrontal area · Value of action

1 Introduction

In 1982 the article by Ungerleider and Mishkin [1] introduced the “space versus object”
principle in interpretation of functions of different visual areas during perception. The
authors discovered that the processing of visual information starting in visual areas V1,
V2 divides in two streams: the first, dorsal stream goes to the posterior parietal regions
through visual areas V5 and V6, the second, ventral stream proceeds to the temporal
lobe through area V4. The dorsal pathway is responsible for space perception, and the
ventral pathway is related with object perception. The authors called them “Where” and
“What” systems. The results were obtained in monkeys, but the information flow
division is true for humans too [2].
Let us take the well-studied process of grasping as an example of manipulative
actions to determine the roles of different visual streams and their interconnections.


2 The Separation of the Dorsal Visual Stream into the “Dorso-Dorsal” and “Dorso-Ventral” Sub-streams

Patients with posterior parietal cortex lesion can omit some operations of grasping [3].
A patient with optic ataxia disorder has difficulty in directing his arm towards an object
to be grasped. He can see the object and tell its location, but fails to get hold of it at
once, finding it as if by chance.
A patient suffering neglect has another type of malfunction: he can’t see an object at
all, but keeps implicit perception [4]. The difference is that in the first case the lesion is
in the superior parietal lobule and in the second, the lesion is centered in inferior
parietal lobule.
Both the superior parietal lobule (Brodmann area 5, SPL) and inferior parietal
lobule (Brodmann area 7, IPL) belong to the dorsal visual pathway and are the superior
and inferior parts of the intraparietal sulcus (IPS) (see Fig. 1). The SPL receives a
visual signal from the visual area V5 and sends the output signal to the dorsal premotor
area. The area is responsible for directing the hand and the eyes towards the object.
The IPL receives a signal from the visual area V6, its motor-region destination being
the ventral premotor area which controls the grasp motions of the hand and fingers.
The authors of paper [3] proposed a model of visual information processing in
dorsal visual stream that highlights two parts in it: the dorso-dorsal stream that goes
through the SPL to the dorsal premotor areas and the dorso-ventral stream that runs
through the IPL to the ventral premotor areas.

3 What Generally Should Be Done Before and During Grasping
1. To sight the object and determine whether the object is familiar or not. The latter
implies scanning the whole dataset of images stored in the memory.
2. If the thing is familiar, it is necessary to determine its value by finding the object in
another dataset which stores the values of objects. If the object has a negative value,
i.e., it is dangerous, it should not be touched – instead it may be best to act quite
differently, e.g. to run away or to freeze. If the value is positive, it is necessary to
examine more general behavioral characteristics of the subject before grasping the
object. Specifically, it is necessary to decide whether to stop the current action and
occupy oneself with another job (e.g. leave the meal and attend to a toy). In the
latter case the grasping starts.
3. If the object is novel, not familiar, the grasping program is triggered to investigate
the object and to memorize its sensory characteristics and values.

3.1 The Different Behavioral Tasks of the Subdivisions of Dorsal and Ventral Visual Streams
Figure 1 gives a rough delineation of ventral and dorsal visual streams. The common
pathway starts in the occipital visual areas. Then it parts: the dorsal stream goes to the
parietal regions and proceeds to the motor, premotor and prefrontal areas of the neocortex, while the ventral stream runs to the inferior temporal areas and finally to the ventrolateral prefrontal cortex [2], which is considered the destination of the ventral
pathway. A detailed inspection of the pathways from inferotemporal area TE to prefrontal,
orbitofrontal and medial temporal regions points to the engagement of TE with the
network related to behavioral choice [6], determined by the values of objects and
possible actions.
The ventral visual pathway decides whether or not to respond to the input stimulus.
This means that it solves two problems: (a) it determines the value of the stimulus and
(b) it memorizes its sensory representation.
To cope with the first problem, the interpretation of the visual signal is made in the
temporal and prefrontal areas. As a result, the object value is computed and the
behavioral choice is made. For this purpose, the inferior temporal area TE interchanges
signals with the amygdala, the orbitofrontal cortex, and the hippocampal formation. In turn, the
amygdala, orbitofrontal and insular cortical areas are interconnected [7, 8] and jointly
calculate the value of objects [9]. The destination of the ventral pathway is the ventrolateral prefrontal cortex holding the response pattern.

Fig. 1. Two ways of interpretation of the visual signal: the ventral way (bottom part of the
figure) and dorsal way (the top part of the figure). The dorsal pathway divides in two ways: the
dorso-dorsal and dorso-ventral way. V1 – V6 are the occipital visual areas, TEO, TE stand for
inferior temporal areas, PMd, PMv are the dorsal and ventral pre-motor regions. Areas 46d and
46v are dorsolateral and ventrolateral prefrontal cortical regions.

The second problem is memorizing, and it is solved by the network consisting of
the inferior temporal area, the hippocampus, and the perirhinal, postrhinal and entorhinal cortical
areas.

The dorsal visual stream is the system that controls the action: it is responsible for
reaching the object by the arm and grasping by the fingers.

4 The Dorsal Pathway. The Role of the Parietal, Motor and Premotor Areas in the Control of Grasping

The visual and somatosensory features of objects are represented in the parietal cortex.
It consists of the primary somatosensory region S1 and higher-order areas that store a
combined visual and somatosensory representation of the object. These representations
are transmitted to the motor and premotor areas [5] to perform the action.
The somatosensory information that comes to the parietal cortex from the thalamus
is of two types: tactile and proprioceptive. The tactile information arrives from cutaneous mechanoreceptors embedded in the skin, which convert the mechanical deformation of the skin into neural signals. The proprioceptive information comes from deep
receptors reporting the degree of compression and stretching of muscles, tendons, ligaments and joints. For a motion that has already started, both types of
information are feedback signals. That is, with respect to the somatosensory signal the
visual signal is the primary signal that triggers the motion. As the action proceeds, the motion
is corrected; the tactile characteristics of the object such as its form, texture and weight are
analyzed and memorized; the patterns of the joint activity of hand and finger muscles
that secure proper grasp motions are also stored. For these purposes all areas participating in the initiation and execution of the motion (both primary and high-order areas)
send feedforward and feedback projections.
Four subareas can be selected in the primary somatosensory area S1. These are
called Brodmann’s areas 1, 2, 3a and 3b. Area 3b is the primary area for tactile
reception, area 3a is the primary proprioceptive area. Area 1 is secondary for tactile
reception: its removal turns off texture recognition; area 2 has equal amounts of
tactile and proprioceptive secondary inputs, and it deals with the coordination of the fingers in
grasping and with the recognition of the form and size of objects being grasped.
The higher-order parietal areas form 2 clusters: lateral parietal areas and posterior
parietal areas.
In the previous paragraph, it was pointed out that the ventral visual pathway
determines the value of the object and the value of the manipulations with the object,
while the dorsal pathway is responsible for arranging the action. The examination of the pathways for sensory information in the parietal cortex (the dorsal
pathway) shows that the somatosensory characteristics of the object received during
manipulation with it arrive at the secondary somatosensory area S2 (the cluster of
lateral parietal areas), and this area sends signals to the insular cortex [10], which is a
part of the network storing the values of objects and interacting with the ventral stream. We
see the joint activity of the dorsal and ventral pathways here.
The posterior parietal areas serve as the beginning of dorso-dorsal (SPL) and dorso-
ventral (IPL) pathways [3].
The dorso-ventral pathway starts in the inferior parietal lobule (Brodmann’s area 7)
which sends signals to the motor area M1 and ventral premotor area PMv. As a result
of interaction with S1, M1 and PMv, a distributed representation of sensory signals
initiating the grasping is formed in the posterior parietal areas: the visual object to be
grasped, the direction towards the object and the handling characteristics of the object
(the form, size, weight and texture) found by referring to the previously investigated
and accumulated information [11]. Additionally, these areas exchange information with the inferior temporal area TEa/m, which is a part of the ventral pathway: this area sends a permission to act to the posterior parietal areas. That is, the interaction of the dorsal and ventral pathways also occurs here.
The dorso-dorsal pathway, originating in the superior parietal lobule (Brodmann's area 5), sends signals to the dorsal premotor area PMd and is responsible for directing the eyes and the arm towards the object. Though it does not exchange signals with the inferior temporal areas and does not receive signals about the value of the object from them, the end point of the prefrontal cortex that receives the signal from the dorso-dorsal pathway is the dorsolateral area 46d, which is considered to be the center of the working memory. So, the dorso-dorsal pathway holds the holistic representation of the current motor task.

5 The Pattern of Visual Pathways Interaction for the Control of Grasping

In its formalized treatment of trial-and-error learning, the classical textbook Reinforcement Learning by Sutton and Barto [12] begins with the description of a gambling machine that has n options with different winning probabilities (multi-armed bandits). The game with this timeless device comes down to the repetition of the same event: each time we, as though anew, come to the machine, activate an arm and hope for a win. Only our memory keeps the different outcomes, adding each one-time outcome to the sequence of previous results. Having used this static example to introduce the concepts of the value function and the prediction problem, the authors quickly turn to the main objective: a sequence of actions where each step can be different and where the desire to get the greatest reward necessitates the optimization of the whole sequence.
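To make the bandit example concrete, here is a minimal sketch of estimating action values by incremental averaging of observed outcomes; the ε-greedy choice rule and the win probabilities are illustrative assumptions, not taken from [12]:

```python
import random

def run_bandit(win_probs, steps=1000, eps=0.1):
    """Estimate the value of each arm of an n-armed bandit by sample averages."""
    n = len(win_probs)
    q = [0.0] * n      # current value estimates
    counts = [0] * n   # how many times each arm was pulled
    for _ in range(steps):
        # epsilon-greedy: mostly exploit the best current estimate, sometimes explore
        arm = random.randrange(n) if random.random() < eps else max(range(n), key=lambda a: q[a])
        reward = 1.0 if random.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]  # incremental average of outcomes
    return q

print(run_bandit([0.2, 0.5, 0.8]))  # estimates approach the true win probabilities
```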
If the transformation of external sensory (visual) signals into motor commands is regarded as either a discrete action or a succession of actions, then grasping is a discrete act, while reaching with the arm followed by grasping the object with the fingers is a sequence of actions. As regards the necessary calculation of the action value, the
grasping value is equal to the value of the object to be grasped. The value of the action
sequence consisting of outstretching of the hand and grasping of the object is also equal
to the value of the object. However, there are many exceptions, e.g. when experi-
menters put different obstacles in the course of the hand, the values of action sequences
vary.
The ventral visual stream calculates the value of actions (in the discrete case, it is
equal to the value of the object being manipulated), and for an action sequence the
value can be found differently. It is important that the ventral visual stream engages the
hippocampus which is responsible for remembering new objects and corresponding
action sequences. In dealing with sequences the working memory plays an important
role. The dorsolateral prefrontal cortex (area 46d) is regarded as a substratum of this
kind of memory. This cortical area interacts with the hippocampus.
It is widely accepted that the ventral stream endpoint is the ventrolateral prefrontal cortex. This is true for discrete actions which, as mentioned above, are governed by the dorso-ventral stream.
The dorso-dorsal stream, whose endpoint is the dorsolateral cortex, is responsible for the representation of sequences of actions, in other terms, for the holistic representation of the motor task.

Fig. 2. The dorsal and ventral visual streams and their subsystems (prefrontal cortex: DLPFC, VLPFC, OFC; premotor areas: PMd, PMv; parietal cortex: SPL, IPL; insula; visual areas V1, V2, V4; inferotemporal areas TE, TEa/m; hippocampus and entorhinal cortex)

6 Conclusion

There are four visual pathways (Fig. 2): the dorso-dorsal and dorso-ventral pathways, which belong to the dorsal stream, and, within the ventral stream, the pathway from the visual areas to the inferior temporal areas TE, which splits to go to the orbitofrontal cortex and to the hippocampal areas. Though these pathways execute their own tasks, they interact and function conjointly.

Acknowledgement. The review was done within the 2019 state task 0065-2019-0003 Research
into Neuromorphic Big-Data Processing Systems and Technologies of Their Creation.

References
1. Ungerleider, L.G., Mishkin, M.: Two cortical visual systems. In: Ingle, D.J., Goodale, M.A.,
Mansfield, R.J.W. (eds.) Analysis of Visual Behavior, pp. 549–586. MIT Press, Cambridge
(1982)
2. Kravitz, D.J., Saleem, K.S., Baker, C.I., Ungerleider, L.G., Mishkin, M.: The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn. Sci. 17(1), 26–49 (2013)
3. Rizzolatti, G., Matelli, M.: Two different streams form the dorsal visual system: anatomy and
functions. Exp. Brain Res. 153, 146–157 (2003)
4. Rizzolatti, G., Berti, A., Gallese, V.: Spatial neglect: neurophysiological bases, cortical
circuits and theories. In: Boller, F., Grafman, J., Rizzolatti, G. (eds.) Handbook of
neuropsychology 2nd edn, vol. I, pp 503–537. Elsevier Science, Amsterdam (2000)
5. Delhaye, B.P., Long, K.H., Bensmaia, S.J.: Neural basis of touch and proprioception in
primate cortex. Compr. Physiol. 8(4), 1575–1602 (2019)
6. Murray, E.A., Rudebeck, P.H.: The drive to strive: goal generation based on current needs.
Front. Neurosci. 7, 1 (2013). Article112
7. Höistad, M., Barbas, H.: Sequence of information processing for emotions through pathways linking temporal and insular cortices with the amygdala. Neuroimage 40(3), 1016–1033 (2008)
8. Ghashghaei, H.T., Hilgetag, C.C., Barbas, H.: Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. Neuroimage 34(3), 905–923 (2007)
9. Smirnitskaya, I.A.: How the cingular cortex, basolateral amygdala and hippocamp contribute
to retraining. In: Proceedings of the XV All-Russia Conference Neuroinformatics (2013)
10. Friedman, D.P., Murray, E.A., O’Neill, J.B., Mishkin, M.: Cortical connections of the
somatosensory fields of the lateral sulcus of macaques: evidence for a corticolimbic pathway
for touch. J. Comp. Neurol. 252, 323–347 (1986)
11. Borra, E., Gerbella, M., Rozzi, S., Luppino, G.: The macaque lateral grasping network: a
neural substrate for generating purposeful hand actions. Neurosci. Biobehav. Rev. 75, 65–90
(2017)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press,
Cambridge (2018)
Deep Learning
The Simple Approach to Multi-label Image
Classification Using Transfer Learning

Yuriy S. Fedorenko(&)

Bauman Moscow State Technical University, Baumanskaya 2-Ya, 5,


105005 Moscow, Russia
[email protected]

Abstract. The article deals with the problem of image classification on a relatively small dataset. Training a deep convolutional neural net from scratch requires a large amount of data. In many cases, the solution to this problem is to use a network pretrained on another big dataset (e.g. ImageNet) and fine-tune it on the available data. In the article, we apply this approach to classify advertising banner images. Initially, we reset the weights of the last layer and change its size to match the number of classes in our dataset. Then we train the whole network, but the learning rate for the last layer is several times larger than for the other layers. We use the Adam optimization algorithm with some modifications. Firstly, applying weight decay instead of L2 regularization (for Adam they are not the same) improves the result. Secondly, dividing the learning rate by the maximum of the accumulated average of squared gradients instead of the current average makes the training process more stable. Experiments have shown that this approach is appropriate for classifying relatively small datasets. The metrics used and test time augmentation are discussed. In particular, we find that the confusion matrix is very useful because it gives an understanding of how to modify the training set to increase model quality.

Keywords: Image recognition · Transfer learning · Adam · One cycle policy · Weight decay · Amsgrad · Test time augmentation · Confusion matrix

1 Introduction

Deep convolutional neural networks are very effective for solving the image classification task. However, training such networks from scratch (with random initialization) is not always possible because it requires a large amount of data. Therefore, transfer learning has become common in many applied tasks [1]. Deep learning frameworks already provide common convolutional neural networks (VGG [2], ResNet [3], Inception [4]) pretrained on ImageNet, so there is no need to train models on this dataset yourself. But in practice there are several issues that need to be solved. The first problem is connected with proper learning rate selection. Too small a value may result in a very long training process which stops in a flat valley. Too large a value may lead to learning a sub-optimal set of weights. Besides, the learning rate on the last layers of the network should be greater than on the first layers, because the earlier layers of the network learn generic features that may be useful in many tasks. The second problem is connected with an unstable training process when using the Adam algorithm.


2 Problem Definition

In this article, we consider the classification of advertising banner images. The user's interest in the banner depends on the banner image, so it is important to determine the banner image topic. The banner image is fed to the input of the model. The model output is one or several classes of the image in our specialized taxonomy. But there are several problems. Firstly, the number of labeled images is relatively small: it is measured in hundreds, not thousands of samples. This amount of data is not enough to train the model from scratch. Secondly, images of advertising banners are quite specific, so we cannot use a model pretrained on ImageNet directly. And thirdly, each image can belong to several classes. For example, it may be an advertisement of a mobile application to call a taxi. In such a case, the model should detect two classes: mobile app and taxi.

3 The Training Procedure

To deal with the first two problems we use transfer learning. We take a pretrained neural network, reset the last layer weights and change the last layer size to match the number of classes in our taxonomy. We train the whole network, but for the last layer the learning rate is five times larger than for the other layers. Also, we use an adaptive learning rate [5]. Initially, the upper limit of the learning rate is searched for. To find it, we increase the learning rate step by step from a small value and train the neural net at each step. The whole procedure takes only about 10–20 epochs, so the classical overfitting after multiple passes through the training set does not have time to happen. The minimum learning rate at which the validation set error starts to increase is the required upper limit. An example is presented in Fig. 1. After each epoch, the learning rate was increased by one step (0.0001), and the loss value on the training and validation sets was plotted on the graph.

Fig. 1. Searching for the upper limit of learning rate
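A minimal sketch of this learning-rate search (assuming PyTorch purely for illustration — the paper does not name its framework; the model, data loaders, loss function and the 0.0001 step are placeholders):

```python
import torch

def find_lr_upper_limit(model, train_loader, val_loader, loss_fn,
                        lr_step=1e-4, n_steps=20, device="cpu"):
    """Increase the learning rate by lr_step after every epoch and record the losses."""
    history = []
    model.to(device)
    for i in range(1, n_steps + 1):
        lr = i * lr_step
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for x, y in train_loader:                      # one training epoch at this lr
            optimizer.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():                          # validation loss for this lr
            val_loss = sum(loss_fn(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / len(val_loader)
        history.append((lr, val_loss))
    # the smallest lr at which val_loss starts to grow is the required upper limit
    return history
```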



We start training with a learning rate of 1/10 of the upper limit value. Then the learning rate gradually increases to the upper limit, after which it decreases back (Fig. 2). This method, called the one cycle policy, has a simple motivation. At the start, the small learning rate provides more accurate convergence. Then, when the optimizer traverses a flat valley, increasing the learning rate allows speeding up training. In the final stages, the optimizer falls into a local minimum, and the learning rate is again reduced to provide more accuracy. Besides, it is argued that a relatively high learning rate in the middle of the training process is a form of regularization, because it helps the network to avoid steep areas of the loss function which correspond to overfitted configurations [6].

Fig. 2. One cycle for learning rate
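A sketch of such a schedule as a simple triangular rise and fall between 1/10 of the upper limit and the upper limit itself; this is an illustration of the idea rather than the authors' exact schedule (PyTorch, for example, ships a similar ready-made torch.optim.lr_scheduler.OneCycleLR):

```python
def one_cycle_lr(step, total_steps, lr_max):
    """Triangular one-cycle schedule: rise from lr_max/10 to lr_max, then back down."""
    lr_min = lr_max / 10.0
    half = total_steps // 2
    if step < half:
        return lr_min + (lr_max - lr_min) * step / half
    return lr_max - (lr_max - lr_min) * (step - half) / (total_steps - half)

# usage at every optimizer step:
# for g in optimizer.param_groups:
#     g["lr"] = one_cycle_lr(step, total_steps, lr_max=upper_limit)
```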

For training, we use the Adam algorithm with modifications. Many researchers became disappointed with Adam after its introduction in 2014, claiming that SGD with momentum performs better. But in 2017 the AdamW algorithm was proposed in [7]. It uses weight decay instead of L2 regularization. As known, L2 regularization implies adding the sum of the squared model weights to the loss function:

$$J_r = J + \frac{c}{2} \sum_{k=1}^{n} w_k^2$$

where $J$ is the loss function, $J_r$ is the loss function with the regularization term, $c$ is the regularization coefficient and $w_k$ are the weights of the neural net. For simple SGD this leads to weight decay, because the updating rule is as follows:

$$w_k = w_k - \alpha \cdot \frac{\partial J}{\partial w_k} - \alpha \cdot c \cdot w_k$$

where $\alpha$ is the learning rate. But for more sophisticated optimizers such as Adam, this is not true, because the regularization term in the loss function affects the value of the accumulated gradients and squared gradients. So, Adam with L2 regularization and Adam with weight decay (AdamW) are two different approaches. In [7] the authors argue that we should use AdamW instead of Adam with L2 regularization implemented in classic deep learning frameworks. Our experiments show that AdamW leads to a better result, so we have used it.

One more modification is the Amsgrad technique. In the article [8] an error was found in the Adam update rule: it could cause the algorithm to converge to a suboptimal point. The problem is that the convergence proof of Adam requires that the step size $\alpha/(\sqrt{E[g^2]} + \epsilon)$ does not increase over the training process. But this is not satisfied in many cases, because the exponential moving average of squared gradients $E[g^2]$ may decrease in the last epochs of training. So, the authors of Amsgrad suggested using the maximum value of this quantity, because it is guaranteed to be non-decreasing. In practice, the effect of such a modification is controversial. But in our experiments using Amsgrad allows achieving better and more stable results compared to plain Adam. So, we use Adam with weight decay and the Amsgrad technique.
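Putting the described pieces together, a hedged PyTorch sketch (ResNet18 as in Sect. 4; the framework choice, the weight-decay value and the base learning rate are assumptions, while the 5× factor for the last layer comes from the text):

```python
import torch
from torchvision import models

# Pretrained backbone; the last layer is reset and resized to our taxonomy.
model = models.resnet18(pretrained=True)
num_classes = 20  # placeholder: the number of classes in the taxonomy
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

base_lr = 1e-3  # assumed to be 1/10 of the upper limit found by the range test
head_params = list(model.fc.parameters())
body_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]

optimizer = torch.optim.AdamW(
    [
        {"params": body_params, "lr": base_lr},
        {"params": head_params, "lr": 5 * base_lr},  # the last layer trains 5 times faster
    ],
    weight_decay=1e-2,  # decoupled weight decay (AdamW), not L2 regularization
    amsgrad=True,       # keep the running maximum of the squared-gradient average
)
```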
As mentioned above, each sample may belong to multiple classes. In such a case, the sample is passed to the model several times, separately with each label. This allows handling multi-label images in a simple way. Also, we use data augmentation during training to improve network generalization.

4 Experiments

In the experimental analysis, we use the ResNet18 network pretrained on ImageNet. We use the ResNet18 model because it is simple relative to other deep convolutional neural networks (so it requires less memory and training time), but its results are comparable with other, more complex models. We use the ImageNet dataset because it contains a wide variety of classes. It is also convenient in practice that deep learning frameworks have models pretrained on this dataset. Then we split our dataset into train, validation and test parts in the ratio 3:1:1 and train the model as described above. Also, to improve the result, we use test time augmentation (TTA) [9]. The main idea of this approach is to perform random transformations on the test set. Images from the test set are augmented several times, and predictions are calculated for each of them. Then they are averaged. This technique works because, after averaging the predictions, the errors are averaged too. If there is an error on one augmented sample leading to a wrong answer, it may disappear after averaging over several samples, because the errors on each sample differ, and only the correct answer stands out.
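A minimal sketch of test time augmentation; the `model` and `augment` callables are assumed to return class probabilities (as a NumPy array) and a randomly transformed copy of the image, respectively:

```python
import numpy as np

def predict_tta(model, image, augment, n_aug=10):
    """Average class probabilities over several randomly augmented copies of the image."""
    preds = [model(augment(image)) for _ in range(n_aug)]
    return np.mean(np.stack(preds), axis=0)  # argmax of the averaged vector gives the final class
```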
To evaluate model quality, we use the confusion matrix and precision-recall graphs. The first one is a matrix which shows mutual errors between classes. By analyzing it, one can conclude which classes have many false positives or false negatives. It gives an insight into how to modify the train and validation sets (in our task we prepare the dataset ourselves). Proper dataset preparation has a strong effect on the result. The second one shows the precision and recall values for each class (for better readability, 1 – precision is shown instead of precision). It allows visually identifying good classes and classes with many false positives or false negatives. A fragment of the confusion matrix and the precision-recall graphs are shown in Figs. 3 and 4, respectively.
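These metrics can be computed, for example, with scikit-learn; this is shown only as an illustration, since the paper does not specify its tooling:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Confusion matrix (rows: true class, columns: predicted) and per-class precision/recall."""
    cm = confusion_matrix(y_true, y_pred)
    precision, recall, _, _ = precision_recall_fscore_support(y_true, y_pred, zero_division=0)
    return cm, precision, recall

# toy labels only to show the call signature
cm, precision, recall = evaluate(["auto", "taxi", "auto", "taxi"],
                                 ["auto", "auto", "auto", "taxi"])
```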

Fig. 3. Confusion matrix for “Auto” category

Fig. 4. Precision-recall graphs for “Auto” category (the top chart without TTA, the bottom chart
with TTA by 10 samples)

So, we can see that using test time augmentation slightly improves the result.
Examples of correct and wrong images classification are presented in Fig. 5.

Fig. 5. Examples of images from “Auto” category with model answers

5 Conclusion

Thus, specific image classification tasks can be solved by transfer learning. It solves the problem of a relatively small dataset and eliminates the need for a computationally time-consuming procedure of training the model from scratch. Using the Adam optimization algorithm with its recent modifications, along with proper learning rate selection, improves the training process and makes it more stable. Also, the dataset preparation is crucial. Analyzing the confusion matrix and viewing misclassified samples gives an understanding of how to modify the training dataset. Several iterations of dataset enhancement usually yield an acceptable practical result.

References
1. Karpathy, A.: Convolutional neural networks for visual recognition. https://cs231n.github.io/transfer-learning/. Accessed 1 Apr 2019
2. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition (2015). arXiv preprint, arXiv:1409.1556v6 [cs.CV]
3. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE
Conference on Computer Vision and Pattern Recognition, CVPR, pp. 770–778. IEEE, New
Jersey (2016)
4. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens J., Wojna, Z.: Rethinking the inception
architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern
Recognition, CVPR, pp. 2818–2826. IEEE, New Jersey (2016)
5. Smith, L.: Cyclical learning rates for training neural networks. In: IEEE Winter Conference on
Applications of Computer Vision, WACV, pp. 464–472. IEEE, New Jersey (2017)
6. Gupta, A.: Super-convergence: very fast training of neural networks using large learning rates.
https://towardsdatascience.com/https-medium-com-super-convergence-very-fast-training-of-neural-networks-using-large-learning-rates-decb689b9eb0. Accessed 10 Apr 2019
7. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization (2019). arXiv preprint,
arXiv:1711.05101v3 [cs.LG]
8. Reddi, S., Kale, S., Kumar, S.: On the convergence of Adam and beyond. In: International Conference on Learning Representations, ICLR, Vancouver, BC, Canada, pp. 186–208 (2018)
9. Ayhan, M., Berens, P.: Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks. In: Medical Imaging with Deep Learning Conference, MIDL, Amsterdam, Netherlands, pp. 278–286 (2018)
Application of Deep Neural Network
for the Vision System of Mobile Service Robot

Nikolay Filatov1(&), Vladislav Vlasenko1, Ivan Fomin1,


and Aleksandr Bakhshiev1,2
1
The Russian State Scientific Center for Robotics and Technical Cybernetics,
Tikhoretsky Prospect 21, 194064 Saint-Petersburg, Russia
[email protected]
2
Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29,
195251 Saint-Petersburg, Russia

Abstract. The solution of the object detection task is valuable in many fields of robotics. However, the application of neural networks on mobile robots requires the use of high-performance architectures with low power consumption. In search of a suitable model, a comparative analysis of the YOLO and SqueezeDet architectures was conducted. The task of detecting wooden cubes with the camera of a mobile robot, with the aim of collecting them, was solved. A specific dataset was constructed for the training purposes. The applied SqueezeDet neural network has reached a precision of 89% and a recall of 82% for IOU ≥ 0.5.

Keywords: Convolutional neural network · SqueezeDet · Object detection · Service robot

1 Introduction

With the development of deep neural networks used for the classification, segmentation and detection of objects, the area of their application is also growing [1].
The use of neural networks to increase the level of autonomy of vehicles is a popular and urgent task. Neural network methods are also often used to improve the accuracy of orientation of mobile robots in the environment [2]. In general, the task of object detection in video images is extremely promising in robotics: its solution allows scout robots to increase their level of autonomy when searching for objects of interest, which is important when working in extreme conditions. It will also be useful to apply these technologies in the service robotics industry to create more intelligent systems capable of finding certain items.
The main limitation for the implementation of neural network algorithms is the high requirements on computing hardware. This problem is being widely addressed by the community, and at the moment there is a set of methods that provide improved speed. It is relevant to compare and integrate these methods in real robotics tasks.


2 SqueezeDet and YOLO Architectures Comparison

Single-stage neural network detectors have the highest speed: in them, hypotheses about the locations of objects and the probabilities of their belonging to certain classes are produced simultaneously by one convolutional neural network. Such neural networks are YOLO [3] and SqueezeDet [4]. The principle of operation is to extract multidimensional feature maps from the image and use them to train one (or more) layers whose output is a tensor containing the estimated coordinates of objects and the indices of their classes. In the case of SqueezeDet, feature maps are retrieved using the high-performance SqueezeNet neural network [5]. Coordinates are predicted in accordance with a specified sampling grid and object templates (anchors). The templates are deformed and shifted relative to the sampling grid, and each is assigned a confidence value, according to which they are then filtered using non-maximum suppression.
Neural networks SqueezeDet and YOLO have the same principle of operation, but the SqueezeDet architecture was created specifically to be embedded in low-power platforms, which causes differences in the structure and performance of these neural networks.
Consider the layers responsible for object detection using the input feature maps. In SqueezeDet, the detection layer is a convolutional layer called ConvDet; for simplicity, we denote the block responsible for detection in YOLO as FcDet, since it consists of two fully connected layers.
Assume that the input feature map width is Wf, its height is Hf, and the number of input channels is Chf. Denote ConvDet's filter width as Fw and its height as Fh. With proper striding, the output of ConvDet keeps the initial size of the input feature map. Thus, to compute K × (4 + 1 + C) outputs for each cell of the reference grid, ConvDet requires FwFhChfK(5 + C) parameters (Fig. 1).

Fig. 1. ConvDet layer.

Using the same notation and designating the number of neurons in the first layer of the FcDet block as Ffc1, it can be determined that the number of parameters in the first fully connected layer is WfHfChfFfc1. The second fully connected layer, which generates C class probabilities and K(4 + 1) bounding box coordinates for the Wo × Ho sampling grid, contains Ffc1WoHo(5K + C) parameters (Fig. 2). The total number of parameters in these two fully connected layers is Ffc1(WfHfChf + WoHo(5K + C)).

Fig. 2. FcDet layers.

A tensor of size 7 × 7 × 1024 is taken as the input feature map in YOLO, with Ffc1 = 4096, K = 2, C = 20, Wo = Ho = 7. Thus, the total number of parameters required for the two fully connected layers is approximately 212 × 10^6. If the same configuration parameters are used for a 3 × 3 ConvDet, it would only require 3 × 3 × 1024 × 2 × 25 ≈ 0.46 × 10^6 parameters, which is about 460 times smaller than FcDet.
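The arithmetic above can be reproduced directly (Python; the numbers are those given in the text):

```python
# FcDet: the two fully connected layers of YOLO
Wf, Hf, Chf = 7, 7, 1024     # input feature map
Ffc1, K, C = 4096, 2, 20     # hidden neurons, anchors per cell, classes
Wo = Ho = 7                  # output sampling grid
fcdet = Ffc1 * (Wf * Hf * Chf + Wo * Ho * (5 * K + C))   # ≈ 212e6 parameters

# ConvDet: a single 3x3 convolution producing K*(5 + C) outputs per grid cell
Fw = Fh = 3
convdet = Fw * Fh * Chf * K * (5 + C)                    # = 3*3*1024*2*25 ≈ 0.46e6

print(fcdet, convdet, fcdet // convdet)                  # the ratio is about 460
```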
A small number of parameters certainly allows the neural network to require less memory and provides a higher speed. However, due to the different computational complexity of the layers, the speed of an architecture is not directly proportional to its size; therefore, it is important to check the speed of the studied architectures on identical hardware. For the YOLO neural network detector there is a lightweight version, tiny-YOLO. The model sizes of the SqueezeDet, YOLOv3 and tiny-YOLO architectures are shown in Table 1.

Table 1. Comparison of model sizes for selected architectures.

Architecture   SqueezeDet   Tiny-YOLO   YOLOv3
Memory, MB     7            34          243

The speed of a neural network detector also depends on the size of the input image, which makes it possible to adjust the size of the processed image in order to achieve the optimal trade-off between accuracy and processing speed. Two series of experiments were performed for three resolutions, using different hardware. In the first experiment (Table 2) computations were carried out without a graphics processing unit, on an AMD A10 9600p CPU (2.4 GHz, 4 cores). In the second experiment (Table 3), computations were performed using an Nvidia GeForce GTX 1070 graphics processor (8 GB, 1683 MHz) and an Intel Core i7 8700 CPU (3.2 GHz, 6 cores).
Taking into account the speed and model sizes of the compared architectures, the SqueezeDet neural network was chosen for the object detection task of the mobile robot.

Table 2. Comparing the speed of neural networks to detect objects using CPU AMD A10 9600p.

Input image resolution, pix   Frame processing time, s
                              SqueezeDet   Tiny-YOLO   YOLOv3
320 × 240                     0.10         0.32        1.99
640 × 480                     0.39         1.04        5.72
1280 × 1024                   1.78         5.08        32.12

Table 3. Comparing the speed of neural networks to detect objects using GPU Nvidia GeForce GTX 1070, CPU Intel Core i7 8700.

Input image resolution, pix   Frame processing time, s
                              SqueezeDet   Tiny-YOLO   YOLOv3
320 × 240                     0.006        0.016       0.050
640 × 480                     0.009        0.026       0.088
1280 × 1024                   0.027        0.054       0.253

3 Problem Statement and Dataset Construction

It is required to develop a vision system for a service robot that can collect objects of interest recognized in video camera images. The collected objects are wooden cubes with a side of 33 mm and with magnetic inserts in the centers of the faces.
For the application of the neural network detector, data sets were made; the annotations to the images contain the coordinates of the cubes. The constructed data sets can be divided into two: «office» and «hall». The first one contains 640 images of cubes in various scenes inside office premises, with an arbitrary shooting angle. The hall data set consists of photographs obtained directly from the mobile robot in a large hall which is convenient for experiments. The resolution of all images in the data sets was limited to 640 × 480 pixels to ensure a high speed of the neural network.
It was decided to test the vision system in the hall for a better specification of the task. Thus, the hall dataset was the main one, and the office dataset was made for initial and additional experiments.

4 Experimental Research

We studied the effect of adding non-target scenes to the training set, as well as the effect
of the choice of anchor boxes on the detection range and the accuracy of localization of
objects.
The correct setting of the anchors is crucial in the SqueezeDet detector, since they serve as templates and initial approximations of the objects of interest. It is recommended to find the values of the anchors by clustering the annotations with the k-means method [6]. However, due to problems with the multiple detection of an object, a second set of anchors was obtained by increasing the scale of the first set. Denote the anchors obtained by clustering as "precise" and the others as "enlarged", and consider the inference peculiarities of the neural network when using these anchors. The values of the anchor boxes are shown in Table 4.

Table 4. Anchor boxes used in training.

Name       Anchor 1   Anchor 2   Anchor 3
Precise    20 × 20    36 × 36    64 × 64
Enlarged   36 × 36    64 × 64    100 × 100
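A hedged sketch of obtaining the "precise" anchors by clustering annotated box sizes (scikit-learn is assumed; the method referenced in [6] may use an IOU-based distance, while plain k-means on widths and heights is used here purely for illustration, and the demo data and scale factor are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(boxes_wh, n_anchors=3):
    """boxes_wh: array of shape (N, 2) with (width, height) of the annotated boxes."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(boxes_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors[:, 0])]  # sorted from the smallest to the largest

boxes_wh = np.abs(np.random.randn(500, 2)) * 30 + 20  # stand-in for annotated box sizes
precise = cluster_anchors(boxes_wh)
enlarged = precise * 1.8                              # "enlarged" set: scaled-up copy (factor assumed)
```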

A typical neural network prediction error when using the "precise" anchors is the multiple detection of a single object, which leads to additional errors, since the extra bounding boxes usually have a low intersection-over-union (IOU) with the annotation. A good feature is the detection of small-scale objects (Fig. 3b). In contrast, with the "enlarged" anchors, repeated detections occur rarely, but objects at long distances are not detected (Fig. 3a).

Fig. 3. Typical inference errors when using different anchors: (a) small-scale object is not detected, enlarged anchors, (b) multiple detection of a single object, precise anchors.

Such properties can be explained by the fact that large objects stand out from the background more strongly and the loss function for them converges faster; therefore, bounding boxes based on small anchors can acquire relatively high confidence on fragments of a large object.
An experiment comparing the precision and recall of three trained models was conducted. Key features of the models' learning process are shown in Table 5. The experimental results are shown in Fig. 4. In all cases, the weights obtained by training on the Kitti dataset [7] were used as the initial weights of the neural network. An erroneous detection is any bounding box that intersects with the annotation less than the specified IOU threshold.

Table 5. Description of experiments.

Name                  Anchors    Train dataset                Test dataset
Hall                  enlarged   Hall, 910 images             Hall, 220 images
Hall + office         enlarged   Hall + office, 1400 images   Hall, 220 images
Hall precise anchors  precise    Hall, 910 images             Hall, 220 images

Analyzing these graphs, it is clear that, despite periodic multiple detections, the model with precise anchors has better characteristics. It is also seen that the stable omission of distant objects leads to a decrease of recall for the «hall» and «hall + office» models. At the same time, the characteristics of the last two models are almost the same, but the model trained on hall and office photos may be considered better because it works well in a larger variety of scenes.
Despite the high accuracy of one of the models, this quality assessment cannot be final because the model allows multiple detections of a single object, which is unacceptable when planning a route for a mobile robot. To exclude multiple object detections, an additional stage of filtering the predictions was added. The implemented algorithm keeps only the bounding box with the greatest confidence in the area of one detection. The recalculation of the quality metrics with the additional filtration is shown in Fig. 5.
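A sketch of such a filtering stage as greedy suppression that keeps only the most confident box among overlapping detections; the IOU threshold value is an assumption, not taken from the paper:

```python
def filter_detections(boxes, scores, iou_threshold=0.3):
    """boxes: list of (x1, y1, x2, y2); keep only the best box per overlapping group."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / float(area(a) + area(b) - inter + 1e-9)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:                       # from the most to the least confident box
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept                           # indices of the surviving detections
```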
The use of additional filtering not only made the technical vision system convenient to use, but also improved the F1 score defined as:

$$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} \quad (1)$$

The maximum value of F1 before filtering was 0.80, and the maximum value after filtering was 0.84.

Fig. 4. Precision - recall curves.



Fig. 5. Precision – recall curves after additional filtration.

5 Conclusion

A high-performance neural network detector has been applied; it can run on a wide range of hardware, including low-power platforms.
For the task of searching for and collecting wooden cubes by a mobile robot, training datasets were created. The features of the trained neural network were analyzed; the trained model achieved high precision and recall on the test dataset.
A probable direction of further research is the analysis of ways to increase the object detection range for a given camera and neural network detector, as well as research on the detection precision for small-scale objects depending on the resolution of the input image and the applied preprocessing.

Acknowledgment. This work was done as the part of the state task of the Ministry of Education
and Science of Russia No. 075-00924-19-00 “Cloud services for automatic synthesis and vali-
dation of datasets for training deep neural networks in pattern recognition tasks”.

References
1. Nielsen, M.A.: Neural Networks and Deep Learning, vol. 25. Determination Press, San
Francisco (2015)
2. Asadi, K., et al.: Real-time scene segmentation using a light deep neural network architecture
for autonomous robot navigation on construction sites. arXiv preprint arXiv:1901.08630
(2019)
3. Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
4. Wu, B., et al.: Squeezedet: unified, small, low power fully convolutional neural networks for
real-time object detection for autonomous driving. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp. 129–137 (2017)
5. Iandola, F.N., et al.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)
6. Zhao, Z.Q., et al.: Object detection with deep learning: a review. IEEE Trans. Neural Netw.
Learn. Syst. (2019)
7. Geiger, A., et al.: Vision meets robotics: the KITTI dataset. Int. J. Rob. Res. 32(11), 1231–
1237 (2013)
Research on Convolutional Neural Network
for Object Classification in Outdoor Video
Surveillance System

I. S. Fomin(&) and A. V. Bakhshiev

The Russian State Scientific Center for Robotics and Technical Cybernetics (RTC), Tikhoretsky Prospect 21, 194064 Saint Petersburg, Russia
{i.fomin,alexab}@rtc.ru

Abstract. Nowadays indoor and outdoor video surveillance systems are very widespread. Earlier, in the days of the first surveillance systems, the processing power allowed only monitoring and recording the surveillance footage, but now it has become possible to use various methods of video analysis; in this article, we investigate the application of convolutional neural networks to object classification. In our previous work we developed an outdoor video surveillance system for detecting objects using fixed and PTZ cameras. The system provides detection of moving objects with low computational cost and high accuracy. This paper summarizes the results of work on the existing outdoor video surveillance system for detecting objects and our new convolutional neural network object classifier based on the Keras and TensorFlow packages. Reliable determination of the object type allows the system to make decisions on processing the object information. The considered classifiers allow performing both simple classification (a person/not a person) and a more complex one (error/person/car/animal) with insignificantly lower reliability. Object tracking in consecutive video frames can remarkably reduce the number of classification operations, because there is no need to perform them for each frame once the object class has been identified with sufficient reliability. In addition, the integration of the developed networks into the existing video surveillance system is briefly described.

Keywords: Outdoor video surveillance · Video analysis · Object classification · Convolutional neural networks · Keras · TensorFlow

1 Introduction

Nowadays video surveillance systems are widespread enough to have become a standard way to ensure the security of various types of areas - from government and industrial buildings to private or company facilities. Video cameras have been used for monitoring for a long time, and usually special human personnel are required to analyze the camera footage. But it is quite difficult for people to remain constantly attentive; moreover, one person is not enough in the case of multiple cameras (there may be more than a hundred cameras in a large industrial area). Also, a person can get distracted and miss an important event. A logical solution in this situation is the use of intelligent systems. Since the working conditions are very diverse, such systems may produce some false object detections. Integration of a classifier based on a convolutional neural network into the video analysis system allows us to decrease the number of detection errors and expand the software functionality.
In our previous works [1] and [2], an algorithm for the detection of moving objects was introduced; it includes a set of object detection methods and performs with low computational cost and high accuracy. The main distinctive feature of the proposed algorithm is the ability to detect and track objects under very difficult conditions. There can be snow, rain, swaying grass, branches and bushes, or changing lighting (in partly cloudy weather) in the frame, and objects can have ultra-low resolution (single pixels or dozens of pixels). When properly configured, this system performs well with both fixed and pan-tilt-zoom (PTZ) cameras.
The system is capable of classifying objects by their position on the reference plane in front of the camera, by the pixel and metric object sizes, and by the pixel or metric movement speed. This helps to filter out some false positives, but sometimes it is not enough.
Recently the idea of convolutional neural networks has become very popular for developing various computer vision systems. The working principle of neural networks has a lot in common with the human vision process. At the moment, classification methods based on neural networks significantly outperform the classic (non-neural-network) methods. The neural network architectures now go far from the biological prototype, but still produce outstanding results.
There are two fundamental problems with the use of the previously developed algorithms based on the space-time filtering of the video stream in the video surveillance system. The first problem is to separate correctly detected objects from the false ones. To cope with this issue, we propose to design a binary classifier based on a convolutional neural network. Since in most cases the objects of interest for the system are people, the classifier must determine with very high accuracy (more than 99%) whether a person is within the selected area or it is another object. This will eliminate the system's false positives for non-human objects.
The second problem is more complex and relates to the use case where the system responds in a specific way depending on the type of the detected and classified object. To solve this task, we propose to design a convolutional neural network based classifier for detecting several basic object types which are assumed to be treated in specific ways. Since we need to determine the object type, this classifier must predict the object class with more than 90% accuracy. A multiclass classification is always more complicated than a binary one, especially when the image quality is not stable and the object classes are unevenly distributed in the training dataset.
The paper considers original approaches to the synthesis of neural network architectures that require minimal computing resources. These architectures are able to solve the problems of classification of low-resolution objects extracted from the results of the video analytics system.

2 Related Work

As an approach, convolutional neural networks have been known for a long time. For example, paper [3] considered a model of the visual system that was later generalized into a convolutional neural network.
In 1998, LeCun presented his work [4], in which the architecture of a convolutional neural network for object classification was suggested for the first time; for a long time it was considered to be classic and almost the only one suitable for this task. The main architectural idea is convolution operations over images with varying numbers of convolution kernels of different sizes, alternating with pooling operations, where the strongest activation is selected from each 2 × 2 square on the feature map. The number of operations and parameters differ depending on the system settings, but the idea is still the same.
After about a decade, computer performance reached a level at which not only individual scientists with access to special resources could try to solve object recognition tasks using neural networks: video cards with large processing power became available to a wide range of people, and packages for direct use of the graphics card memory appeared. In 2012, Krizhevsky, a follower of LeCun and today also one of the most famous people in the field, proposed a new architecture [5] which became the classic neural network structure for object classification (see Fig. 1). Only after his work did it become clear that deep learning requires a huge amount of data, computing power and time. The famous AlexNet model includes more than 60 million parameters and uses two graphics accelerators for training. The network ideologically inherits LeNet, but is about three times larger. Convolutional layers were added, and the size of the convolution kernels was proposed to be reduced from the input to the output. Some approaches were proposed to avoid overtraining; they are still popular and will be used further in the proposed architectures.

Fig. 1. The AlexNet architecture

These are DropOut (excluding some neural connections on each iteration of training), data augmentation (adding artificial data and thereby artificially extending the training dataset), as well as the modified type of neuron activation, ReLU (the neuron is activated only to the right of zero).

After the publication of paper [5], the complexity of neural networks for the object classification problem began to grow even more rapidly; some of the architectures are worth noting briefly.
In CCCP pooling [6], the use of fully connected single-layer perceptrons included in ordinary convolutional layers for some parts of the network is proposed; this reduces the number of features, but increases the number of network parameters. In some cases it works better.
Significant results were achieved by the various VGG networks [7]. VGG-16 is one of the most well-proven architectures, very stable and showing good results. VGG-19 reaches the limit of deepening development for this architecture type. The well-known GoogleNet [8] was created on the basis of the VGG-19 network, and the even more complex Inception model [9] was designed on the GoogleNet basis. This model also had some modifications, and after adding several ways to normalize weights between the epochs, simplifying, rearranging and changing the structure of individual parts, it reached its limit in the Inception V3 network [10].
The last significant improvement, which led to the creation of several new models on its basis (in particular, Inception V4) and still continues to evolve, stems from the observation that simply increasing the number of layers in the classical VGG architecture soon reaches its limit. But if we allow the network to skip individual layers during the transfer of information and assume that the weights of individual layers can be equal to zero, then different layers and different depths will be involved in solving various recognition tasks during the training process, and in this case increasing the number of layers produces the desired result. The ResNet architecture [11], based on this principle, is one of the most popular at the moment, although expensive GPU computing is still required for training the very complex architectures described above.
Most of the object classification architectures described above are designed for conditions different from those found in video surveillance systems. The input image in AlexNet is 228 × 228 pixels, which is good for object recognition on large color images similar to those presented in the ImageNet competition. Examples of image fragments that need to be classified by the video surveillance system are shown in Fig. 2 below.

Fig. 2. Examples of images with people (left) and without (right)

Small fragments may sometimes require a slightly more complicated version of LeNet for object classification, but definitely not AlexNet or anything more complex.

For the classification of objects similar to those represented in the figure above, we designed new original architectures. We then trained the neural networks using the labeled data from the video surveillance system.

3 Training and Test Datasets

To train and test the neural networks for object classification, we first need to prepare datasets specific to the task to be solved: decide whether a detected object belongs to the person or error class in the first case, or to the error, person, car, dog or cat class in the second case, based on the result obtained from the video analytics system.
First of all, we should provide more information about the video analytics parameters, which affect the dataset and the network architecture. The surveillance cameras are set up in the following way: at night, the survey is performed in black-and-white mode with special infra-red illumination. Thereby, we decided to always use black-and-white images, or convert the color ones to shades of gray, for training and verification.
The image resolution at the system input is 1280 × 720, but in order to increase the speed we decrease it twice, to a resolution of 640 × 360. Accordingly, all examples of result areas with objects are extracted from the frame of the reduced resolution.
The sizes of the rectangles that the algorithm allocates differ from each other, from 10–20 to 100–200 or slightly more pixels on the larger side; in addition, the center of the rectangle is not always located exactly on the object, it is often shifted to the side. Examples of why it is difficult to select an object based only on the coordinates of the rectangle, and of how the same frame can appear at different times of the day, are shown in Fig. 3. On the left frame there is an example of how the system detects a dog: if you cut out this bounding box, only a small and hardly recognizable part will remain. There is a different problem in the case of cars, because they are often too big in the frame and the system is able to find only a part of the car.

Fig. 3. Examples of difficulties in finding objects

The first set of data contains frames from two CCTV cameras that monitor the private territory of one of the authors. The parameters of the video analytics system on both cameras were intentionally limited and worsened, which results in a rising number of false detections of different sizes. Since most false detections have very small sizes (1–3 × 5–7 pixels on the larger side), it was decided to allocate a 100 × 100 pixel square around the center of the rectangle as a detection, while staying within the image boundaries.
The training set is manually marked and contains objects of 5 classes: error, person, car, dog, cat. The parameters of the dataset are presented in Table 1. The last class is very poorly represented due to the low resolution and position of the cameras. This set is a training set for the network that determines the object class, as well as a test set for the binary classification network (person or non-person).

Table 1. Number of object samples in the first dataset

Class name                 Error   Person   Car   Dog   Cat
Number of object samples   28651   1169     876   843   18

In addition to the detections on real data from the CCTV cameras, there were also two previously used videos with only people marked. More details about the images extracted from these videos are presented in Table 2.

Table 2. Number of object samples in the second dataset

Class name                            Error   Person
Number of object samples (record 1)   10063   2012
Number of object samples (record 2)   18601   3720
All                                   28664   5732

As negative examples, we randomly selected squares of 100 × 100 pixels that do not intersect with any of the bounding boxes marked with the person class. This set was used as the training set for the binary classification problem.

4 Experiments

4.1 The Network Architectures


In this section we consider the neural network architectures that were developed by the authors and used in our experiments, and the results. In total, four different architectures were used in the binary classification problem and five different architectures in the multiclass classification problem.
The first architecture for binary classification (A1–1) is shown in Fig. 4 (1). It consists of two convolutional layers, two fully connected layers and one decision layer, to which dropout has been added (throwing out a part of the connections between the training epochs), because without it the quality did not rise above 90%.
The second architecture (A1–2) is a more complex version of A1–1: one convolutional and one fully connected layer have been added, following the idea that the quality increases with the number of layers; its scheme is also shown in Fig. 4 (2). The third architecture (A1–3) repeats the A1–2 scheme, but the weight normalization function applied between the training epochs was changed to the L2 norm, which should improve the training quality.
In the fourth architecture (A1–4), in addition to the changes made for A1–3, a new rule for initializing the network layers before training was introduced, the He uniform initialization. It is assumed that initialization according to this special scheme, rather than with plain random values, should improve the quality and reliability of network training.
The first architecture for multiclass classification (A2–1), shown in Fig. 4 (3), represents an experiment with completely new sizes of the convolution kernels while the overall network structure remains the same: the first kernel is 20 × 20, the second is 5 × 5, and the third is 3 × 3. Moreover, the number of neurons in the output layer of the network is changed to 5, which corresponds to the 5 classes.
The second (A2–2), third (A2–3), fourth (A2–4) and fifth (A2–5) architectures correspond to the A1–1, …, A1–4 architectures from the two-class classification.
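For illustration, a hedged Keras sketch in the spirit of the A2–1 configuration: the kernel sizes and the 5-class output come from the text, while the numbers of filters, the pooling layers and the dense layer sizes are assumptions, since the paper does not list them:

```python
from tensorflow.keras import layers, models

def build_a2_1_like(input_shape=(100, 100, 1), n_classes=5):
    """Three convolutional layers with 20x20, 5x5 and 3x3 kernels and a 5-class softmax output."""
    model = models.Sequential([
        layers.Conv2D(16, (20, 20), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (5, 5), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```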

4.2 Results
In this work we trained and tested the binary error-person classifier and the multiclass error-person-car-dog-cat classifier. The datasets used in both experiments are described in the previous section. The results obtained in the first experiment (the binary classifier) are presented in Table 3. The architectures are labeled as described in the previous section. A 20% part of the training set was used as the validation set. All architectures showed good results both on the test and validation datasets. The best result was obtained with the A1–2 architecture.

Fig. 4. Schemes of the selected architectures



Table 3. Results of the binary classifier

Architecture           A1–1    A1–2    A1–3    A1–4
Validation precision   98.78   98.87   98.67   98.91
Test precision         99.35   99.49   99.27   99.35

We tested five architectures in the multiclass classifier experiments. The best result was shown by the A2–4 network, which repeats the A1–3 architecture from the first experiment (taking into account the increased number of outputs), with 93.85% accuracy on the test dataset (Table 4).

Table 4. Results of the multiclass classifier

Architecture           A2–1    A2–2    A2–3    A2–4    A2–5
Validation precision   91.23   91.81   89.99   93.59   89.97
Test precision         92.35   92.61   90.91   93.85   90.82

5 Integration into the Video Surveillance Software

On the basis of the experiment results, we selected the binary classification network for integration into the existing video surveillance system based on space-time filtering algorithms. The networks described in the previous section are implemented with the Keras framework, which uses the TensorFlow system for performing multithreaded matrix computations. Both of these systems use Python as the programming language.
Integration is performed by calling the Python code from the C++ program (the video monitoring system modules use the C++ programming language). Fortunately, most of the interaction problems are solved in the boost::python library, so we use it to design our program module.
The module works as follows. A black-and-white image is sent to the input of the module; it is used to obtain the fragments with objects, together with the coordinates of all objects detected on the image by the system. The module cuts out the parts of the image with objects in accordance with the rules chosen in Sect. 3. The fragments are sequentially transferred to the Python code for classification, after which the module forms a vector with the object classes as well as a debug image.
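A hedged sketch of the kind of Python-side function such a module could call; the model path, function name, preprocessing and class encoding are assumptions consistent with Sects. 3 and 4, not the authors' actual code:

```python
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("binary_classifier.h5")  # assumed path to the trained binary network

def classify_objects(gray_frame, centers, crop=100):
    """gray_frame: 2-D uint8 array; centers: list of (x, y) detection centers."""
    h, w = gray_frame.shape
    classes = []
    for x, y in centers:
        x0 = int(min(max(x - crop // 2, 0), w - crop))  # keep the 100x100 square inside the frame
        y0 = int(min(max(y - crop // 2, 0), h - crop))
        patch = gray_frame[y0:y0 + crop, x0:x0 + crop].astype("float32") / 255.0
        prob = model.predict(patch[np.newaxis, ..., np.newaxis], verbose=0)[0, 0]
        classes.append(int(prob > 0.5))                  # 1 - person, 0 - error (assumed encoding)
    return classes
```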

6 Conclusions

We prepared two sets of data to classify the objects detected by the video analytics system: one to separate people from any other objects, and the other one to solve the problem of classifying objects by type (error-person-car-dog-cat). We trained and tested several neural network architectures on these datasets to determine the most effective one and to find out how the number of layers and the layer parameters affect the detection quality in both cases.
The experiments were successfully completed with good results in both cases, which allowed us to choose the most suitable architecture for integration into the video analytics system. Then the software module of the video analytics system was created on the basis of existing libraries to ensure the correct interaction between the C++ and Python programs. This module can classify the detected objects as people and other classes.
For further research we plan to train more architectures on this dataset - AlexNet with preliminary layer scaling, LeNet with images scaled up to 256 × 256, as well as more complex ones from the VGG family or simplified ResNet analogs. In addition, we are going to increase the number of detectable object classes for special cases and to perform long-term testing of the module as part of the video surveillance system.

Acknowledgement. This work was done as the part of the state task of the Ministry of Edu-
cation and Science of Russia No. 075-00924-19-00 “Cloud services for automatic synthesis and
validation of datasets for training deep neural networks in pattern recognition tasks”

References
1. Stepanov, D.N.: Detection of moving objects by a digital-signal-processor-based automatic
video surveillance system. Perception ECVP abstract, 35, 0–0
2. Bakhshiev, A.V., Polovko, S.A., Stepanov, D.N., Smirnova, E.Yu.: Multichannel computer
vision systems for evaluation and estimation of complex dynamic environments for the tasks
of situation analysis. Rob. Tech. Cybern. 3, 59–63 (2014)
3. Hubel, D.H., Wiesel, T.N.: Brain mechanisms of vision. Sci. Am. 241(3), 150–163 (1979)
4. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
6. Lin, M., Chen, Q., Yan, S.: Network in network. https://arxiv.org/pdf/1312.4400v3.pdf.
Accessed 28 July 2018
7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details:
delving deep into convolutional nets. In: British Machine Vision Conference (2014)
8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going
deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp 1–9 (2015)
9. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception
architecture for computer vision. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 2818–2826 (2016)
10. Scheme of inspection V3 neural network. http://josephpcohen.com/w/wp-content/uploads/
inception-v3.pdf. Accessed 28 July 2018
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–
778 (2016)
Post-training Quantization of Deep Neural
Network Weights

E. M. Khayrov, M. Yu. Malsagov, and I. M. Karandashev

SRISA RAS, Moscow, Russia


[email protected],
{malsagov,karandashev}@niisi.ras.ru

Abstract. The paper considers the quantization of weights as a tool for


reducing the original size of an already trained neural net without having to
perform the retraining. We have examined the methods based on uniform and
exponential weight quantization and compared the results. Besides, we
demonstrate the use of the quantization algorithm in three neural nets: VGG16,
VGG19 and ResNet50.

Keywords: Quantization of weights · Uniform quantization · Exponential quantization · Neural network · Number of bits · Precision

1 Introduction

Today’s neural networks have millions of parameters. The constantly growing network
size discourages the installation of neural nets on mobile devices and gadgets. For
example, the trained VGG16 network [1] designed for classification of ImageNet
patterns [2] takes up about 553 MB of memory.
There are different methods to reduce the size of a trained network. Quantization is
one of these methods. The trick employs the division of the weight distribution interval
into discrete subintervals to reduce the weight word lengths. The use of quantization for
pretrained networks allows a significant decrease of the network size, which makes the
network more compact and suitable for installation on mobile systems.
Most implementations of neural net weights quantization involve the retraining of
the network, which requires a great deal of additional computations [3–5].
Unlike those approaches, we investigate the quantization methods that don’t call for
retraining; instead, they operate on weights of already trained networks. By now we
managed to find only one research where the similar approach is used [6]. The paper
offers the logarithmic quantization, which implies the division of weights into lengths
whose logarithms to base 2 is an integer from the interval [−7, 0]. Our method suggests
a more universal approach where both the base and the initial value can be varied. We
called the method the exponential quantization.


2 The Methods

When examining trained neural networks such as VGG-16, VGG-19, ResNet-50, etc., we have discovered a clear general trend in the distribution of weights of these networks. In particular, the weights almost always comply with a symmetric Gaussian or Laplace distribution.
Initially, the weights can be regarded as distributed symmetrically in the interval [−M, M], where M is the largest magnitude of the weights. Storing the signs of the weights in a separate array (allocating one data bit per sign), we reduce the distribution interval to [0, M]. For simplicity, let us divide all the values by M to deal with the interval x ∈ [0, 1].
In quantization we reduce the variety of numbers x ∈ [0, 1] by dividing the interval [0, 1] into n segments whose ends are at the points

$$0 \le x_0 < x_1 < x_2 < \ldots < x_{n-2} < x_{n-1} = 1. \qquad (1)$$

Note that the first segment is [0, x0], the last is [x_{n-2}, 1], and the others are defined as [x_{k-1}, x_k], k = 1, ..., n − 1. The reason for the selection of the first segment will be explained a bit later. The number of segments n is usually determined by the number of bits B allocated for the quantization:

$$n = 2^{B-1}. \qquad (2)$$

The minus one reflects the allocation of one bit for the sign of a value.
Let us consider the two most popular quantization methods: uniform and exponential quantization (Fig. 1).

2.1 Uniform Quantization


The uniform quantization divides the distribution interval into equal segments:

$$x_k = x_0 + kq, \quad \text{i.e. } x_k = x_{k-1} + q, \qquad k = 1, \ldots, n-1. \qquad (3)$$

Here x0 is the first point (a variable parameter), n is the number of segments, and the segment length q is determined as

$$q = \frac{M - x_0}{n - 1}, \qquad M = \max\left|W_{ij}\right|. \qquad (4)$$

Fig. 1. The uniform (to the left) and exponential (to the right) distribution.

2.2 Exponential Quantization


In exponential quantization the subsequent segments increase exponentially with respect to the first segment:

$$x_k = x_0 q^k, \quad \text{i.e. } x_k = q\,x_{k-1}, \qquad k = 1, \ldots, n-1. \qquad (5)$$

As before, x0 is the first point, which is a variable parameter. The common ratio q of the segments is defined as

$$q = \sqrt[n-1]{M / x_0}, \qquad M = \max\left|W_{ij}\right|. \qquad (6)$$

2.3 Variable x0
Given the number of bits B, the quantization procedure is determined uniquely by the end point of the first segment (the variable x0) in both the uniform (3) and the exponential (5) approach.
We assume that the parameter x0 should be chosen so that the distributions of the original weights and of the quantized weights have the highest correlation.
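To make the construction concrete, the following sketch builds the quantization points for both schemes and picks x0 by maximizing the correlation between the original and the quantized values over a grid of candidates. The grid, the helper names and the nearest-lower-level mapping are our own illustration of the procedure, not the authors' code.

```python
import numpy as np

def levels(x0, M, n, mode):
    """Quantization points x_0 < x_1 < ... < x_{n-1} = M, Eqs. (3)-(6)."""
    k = np.arange(n)
    if mode == "uniform":
        q = (M - x0) / (n - 1)                # Eq. (4)
        return x0 + k * q                     # Eq. (3)
    q = (M / x0) ** (1.0 / (n - 1))           # Eq. (6)
    return x0 * q ** k                        # Eq. (5)

def quantize(w, x0, n, mode):
    """Map each weight to the nearest lower level; |w| < x0 becomes zero, the sign is kept."""
    M = np.abs(w).max()
    pts = levels(x0, M, n, mode)
    idx = np.searchsorted(pts, np.abs(w), side="right") - 1
    mag = np.where(idx < 0, 0.0, pts[np.maximum(idx, 0)])
    return np.sign(w) * mag

def best_x0(w, n, mode, grid=np.linspace(1e-3, 0.3, 60)):
    """Choose x0 (as a fraction of M) maximizing the correlation with the original weights."""
    M = np.abs(w).max()
    corr = [np.corrcoef(w, quantize(w, f * M, n, mode))[0, 1] for f in grid]
    return grid[int(np.argmax(corr))] * M

# Example: Laplace-distributed values, n = 8 segments.
w = np.random.laplace(scale=0.05, size=10000)
x0 = best_x0(w, n=8, mode="exponential")
print(x0, np.corrcoef(w, quantize(w, x0, 8, "exponential"))[0, 1])
```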

3 The Results

3.1 Quantization of Laplace-Distribution Quantities


Figures 2 and 3 show examples of the quantization methods under discussion. In the experiments we take random quantities (N = 10000 values) from a Laplace distribution, normalize them to the maximum magnitude and subject them to quantization. The number of quantization steps n is taken to be a power of two.

Fig. 2. Original quantities (yellow histogram), uniformly quantized (left histogram) and exponentially quantized (right histogram) quantities for a Laplace distribution. The number of quantization steps n = 8, x0 = 0.05, the number of quantities N = 10000.

Fig. 3. The result of uniform (left) and exponential (right) quantization for a Laplace distribution. The number of quantization steps n = 8, x0 = 0.05, the number of quantities N = 10000.

We tracked the correlation between the quantized and original sets of quantities. Figures 4 and 5 show how the correlation depends on the length of the first segment x0. It is seen that the highest correlation nearly always corresponds to some optimal value of x0. The best values of the correlation ρ and of x0 as functions of the word length are shown in Fig. 6.

3.2 Quantization of Normally Distributed Quantities


Similar experiments were carried out for random quantities obeying a normal distribution (Fig. 7). From the qualitative point of view, the results are similar to those for the Laplace distribution.

Fig. 4. Correlation coefficient ρ as a function of x0 (left plot) and the value 1 − ρ as a function of x0 (right plot) for a Laplace distribution. The number of quantization steps n = 8.

Fig. 5. The x0-dependence of 1 − ρ for different numbers of bits B = 1, 2, ..., 8 used in uniform and exponential quantization for a Laplace distribution. The top curves correspond to smaller numbers of bits. The right plot differs only in the logarithmic scale of the X-axis.

Fig. 6. Parameter 1 − ρ (left) and the optimal value of x0 (right) as functions of the number of bits B assigned for uniform and exponential quantization (Laplace distribution).

3.3 Main Conclusions on Quantization of Random Quantities


The main results for both kinds of distributions and both quantization methods are
presented in Table 1.
They allow the following conclusions. First, the correlation parameter approaches 1 as the number of bits grows, and the rate of this approach depends exponentially on that number (see Fig. 6).

Fig. 7. Original quantities (yellow histogram), uniformly quantized (left histogram) and exponentially quantized (right histogram) quantities for a Gaussian distribution. The number of quantization steps n = 8, x0 = 0.05, the number of quantities N = 10000.

Second, the parameter x0 influences the correlation parameter ρ: choosing a suitable value of x0 allows a significant decrease of 1 − ρ.
Third, the optimal value of x0 also depends on the number of bits: x0 approaches zero exponentially as the number of bits grows.
Finally, the uniform quantization proves to lag behind the exponential quantization by one bit (see Fig. 6): a one-bit shift to the left makes the uniform-quantization points turn into the exponential-quantization points. We have not found out the cause of this yet, but it means that the exponential quantization has a one-bit gain.

Table 1. The optimal values of the parameters x0 and ρ for the uniform and exponential quantization methods in the case of the Laplace and Gaussian distributions.
              Laplace                                  Gauss
Bits number   Unif. ρ  Exp. ρ  Unif. x0  Exp. x0      Unif. ρ  Exp. ρ  Unif. x0  Exp. x0
1 0.8538 0.8538 0.1010 0.1010 0.8575 0.8575 0.1194 0.1194
2 0.8881 0.9540 0.0980 0.0551 0.9075 0.9592 0.1133 0.0612
3 0.9531 0.98797 0.0643 0.0276 0.9634 0.9892 0.0673 0.0367
4 0.9841 0.9967 0.0367 0.0184 0.9886 0.9968 0.0367 0.0245
5 0.9955 0.99899 0.0184 0.0122 0.9968 0.9991 0.0153 0.0153
6 0.9988 0.9997 0.0122 0.0092 0.9992 0.9997 0.0092 0.0092
7 0.9997 0.99992 0.0061 0.0061 0.9998 0.9999 0.0061 0.0061
8 0.99992 0.99998 0.0031 0.0031 0.9999 1.0000 0.0031 0.0031

3.4 Finding Zero Segment x0 for the VGG16 Neural Network


In quantization, a part of the original weights whose values are close to zero also turns into zero. The length x0 of this range is an important adjustable parameter, the right choice of which provides good correlation between the original weight distribution and its quantized counterpart. To find the best value of x0, it is natural to consider the correlation between the original values and the results of their quantization.
Let f(x) be the distribution function of the weights in a neural network layer. The examination of the statistical characteristics of the VGG16 layers showed that the mean of the weight coefficients is close to zero, which is why it can be neglected in the computations. Then the expression for the correlation takes the form

$$\rho = \frac{\sqrt{2}}{\sigma_W} \cdot \frac{\sum_{k=0}^{n} x_k \int_{x_k}^{x_{k+1}} z f(z)\,dz}{\sqrt{\sum_{k=0}^{n} x_k^2 \int_{x_k}^{x_{k+1}} f(z)\,dz}}, \qquad (7)$$

where σW is the standard deviation of the weights. The weights in the VGG16 network usually comply with two kinds of distribution: the convolutional layers obey the Laplace distribution, while the fully connected layers follow the normal distribution.
The best value of x0 corresponds to the correlation maximum. Formula (7) does not describe the dependence perfectly, yet it gives good agreement in the vicinity of the correlation maximum.
Table 2 presents the comparison of the optimal characteristics found with the aid of the above expressions and by numerical search. The examination of Table 2 yields an estimate of x0 as a function of the standard deviation of the weight coefficients:

$$x_0 \approx 0.15\,\sigma_W. \qquad (8)$$

At first we find the best value of the parameter x0 for each layer of the neural net. Then we use this parameter to determine the other division points x1, ..., xq according to (5). In the end we carry out the quantization by setting the actual weights equal to the following values:

$$W_d = \begin{cases} 0, & -x_0 < W < x_0 \\ x_k, & x_k \le W < x_{k+1} \\ -x_k, & -x_{k+1} < W \le -x_k \end{cases} \qquad k = 1, \ldots, q-1. \qquad (9)$$
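For illustration, a minimal per-layer sketch of this procedure is given below. It uses the heuristic x0 ≈ 0.15 σW from (8) instead of the full correlation search, assumes Keras-style get_weights/set_weights access, and the function name is ours.

```python
import numpy as np

def quantize_layer(weights, bits=6):
    """Exponential quantization of one weight tensor following Eqs. (5), (8), (9)."""
    w = weights.ravel()
    n = 2 ** (bits - 1)                        # Eq. (2): one bit reserved for the sign
    M = np.abs(w).max()
    x0 = 0.15 * w.std()                        # Eq. (8) heuristic instead of the full search
    q = (M / x0) ** (1.0 / (n - 1))            # Eq. (6)
    pts = x0 * q ** np.arange(n)               # Eq. (5): x0, ..., x_{n-1} = M
    idx = np.searchsorted(pts, np.abs(w), side="right") - 1
    wq = np.where(idx < 0, 0.0, pts[np.maximum(idx, 0)]) * np.sign(w)   # Eq. (9)
    return wq.reshape(weights.shape)

# Sketch of applying it to a trained Keras model (hypothetical variable `model`):
# for layer in model.layers:
#     layer.set_weights([quantize_layer(w) if w.ndim > 1 else w
#                        for w in layer.get_weights()])
```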

We used a set of patterns from the ImageNet database to test the algorithm. The set has 50000 patterns corresponding to 1000 classes. To simplify the computations, we took a sample of 1000 patterns from the set. The test allowed us to draw general conclusions about the algorithm.
The results of the computations for different numbers of quantization sets n are given in Tables 3 and 4.

Table 2. The best parameters for quantization of the VGG16 neural net weights. The number of quantization steps n = 16.
Layer  x0  ρ  Number of weights  Mean  σW  x0/σW
1 0.03100 0.99825 1728 −0.00244 0.20670 0.150
2 0.00531 0.997 36864 0.00491 0.04248 0.125
3 0.00403 0.9961 73728 0.00020 0.03222 0.125
4 0.00294 0.9962 147456 −0.00028 0.02354 0.125
5 0.00261 0.9947 294912 −0.00013 0.01738 0.150
6 0.00185 0.9945 589824 −0.00024 0.01235 0.150
7 0.0019 0.9948 589824 −0.00067 0.01267 0.150
8 0.00151 0.9949 1179648 −0.00045 0.01005 0.150
9 0.00114 0.9943 2359296 −0.00047 0.00762 0.150
10 0.00119 0.9948 2359296 −0.00081 0.00796 0.150
11 0.0013 0.9955 2359296 −0.00058 0.00869 0.150
12 0.00131 0.9954 2359296 −0.00074 0.00876 0.150
13 0.00127 0.9947 2359296 −0.00108 0.00848 0.150
14 0.00035 0.9965 102760448 −0.00014 0.00231 0.152
15 0.00066 0.9971 16777216 −0.00037 0.00438 0.151
16 0.00124 0.9972 4096000 0.00000 0.00828 0.150

Table 3. Accuracy Top1 for different numbers of quantization sets n for networks VGG16,
VGG19 and ResNet50.
VGG16 VGG19 ResNet50
Original (32 bits) 70.3 70.3 75.7
n = 64 (6 + 1 bit) 70.6 69.9 72.6
n = 32 (5 + 1 bit) 69.5 68.4 65.1
n = 16 (4 + 1 bit) 60.5 49.7 29.2
n = 8 (3 + 1 bit) 15.1 0.7 0.2
n = 4 (2 + 1 bit) 0.1 0.1 0.0

Table 4. Accuracy Top5 for different numbers of quantization sets n for networks VGG16,
VGG19 and ResNet50.
VGG16 VGG19 ResNet50
Original (32 bits) 90.9 90.4 93.3
n = 64 (6 + 1 bit) 90.5 89.7 90.6
n = 32 (5 + 1 bit) 90.2 88.7 85.9
n = 16 (4 + 1 bit) 85.2 77.4 53.8
n = 8 (3 + 1 bit) 33.8 9.6 0.8
n = 4 (2 + 1 bit) 0.8 0.5 0.4

As seen from the tables, the algorithm copes well with 6-bit quantization and provides almost the same accuracy as the original weights. This means that we can make the network about five times smaller with almost no penalty in accuracy.

4 Conclusion

We have developed a quantization method that does not require retraining. Without much computation, the approach gives good results for 6-bit quantization and allows us to reduce the neural net size by about five times, while the classification accuracy remains practically unchanged.
However, the more layers a network contains, the worse the algorithm works. For instance, the ResNet50 network with 50 layers performs notably worse than the VGG-type networks, which have fewer layers, whereas the VGG16 network shows good results even with 5-bit quantization. This difference is the reason why research in this field should continue.

Funding. The work was financially supported by the State Program of SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2).

References
1. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition (2014). https://arxiv.org/abs/1409.1556
2. ImageNet – huge image dataset. http://www.image-net.org
3. Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. https://arxiv.org/abs/
1612.01064
4. Zhou, S., Ni, Z., Zhou, X., Wen, H., Wu, Y., Zou, Y.: Dorefa-net: training low bitwidth
convolutional neural networks with low bitwidth gradients. https://arxiv.org/pdf/1606.06160.
pdf
5. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. https://arxiv.org/abs/1510.00149 (2015)
6. Cai, J., Takemoto, M., Nakajo, H.: A deep look into logarithmic quantization of model
parameters in neural networks. In: The 10th International Conference on Advances in
Information Technology (IAIT2018), Bangkok, Thailand, 10–13 December 2018, 8 pages.
ACM, New York (2018). https://doi.org/10.1145/3291280.3291800
Deep-Learning Approach for McIntosh-Based Classification of Solar Active Regions Using HMI and MDI Images

Irina Knyazeva1,2,3, Andrey Rybintsev1, Timur Ohinko2, and Nikolay Makarenko1,3

1 Pulkovo Observatory, Saint-Petersburg, Russia
[email protected]
2 Saint-Petersburg State University, Saint-Petersburg, Russia
3 Institute of Information and Computational Technologies, Almaty, Kazakhstan

Abstract. Solar active regions (ARs) are the primary source of solar flares. There are plenty of studies showing a statistical relationship between the magnetic field complexity of ARs and solar flares. Usually, the complexity of ARs is described with different numerical magnetic field parameters and characteristics calculated on top of them. There is also the well-known and widely adopted McIntosh classification scheme of sunspot groups, consisting of a three-letter abbreviation. Solar Monitor's flare prediction system is based on this classification. To date, the classification is done manually once a day by a specialist. In this paper, we describe an automatic system based on convolutional neural networks. For neural network training, we used images from two big magnetogram databases (HMI and MDI images), together covering the period from 1996 to 2018. Our results show that the automated classification of solar ARs is possible with a moderate success rate, which allows using it in practical tasks.

Keywords: Solar active regions · McIntosh classification ·


Deep learning · Neural networks · Image classification

1 Introduction
The observation, analysis, and classification of solar active regions is an essential part of space weather research. Sunspots have been investigated by scientists for more than 400 years [5,6]. It was established that they appear mostly not in isolation but in groups. These groups differ significantly in size, configuration, number of spots, and other parameters. The sunspots on the Earth-facing side of the Sun are observed by space weather specialists, who give each sunspot region a magnetic classification and a spot classification. The modern standard of classification is the 3-component McIntosh classification system [4], introduced in 1966. The classification depends on the size, shape, and spot density of ARs. It is a modified version of the Zürich classification system, with seven broad categories characterizing the number of polarities and the distribution of spots in the group. The general form of the McIntosh classification is ZPC, where “Z” is the modified Zürich class, “P” is the type of penumbra on the largest spot, and “C” is the degree of compactness in the interior of the group. A simplified picture of active region configurations according to the McIntosh classification can be found at the site SpaceWeatherLive.com and is represented in Fig. 1.

Fig. 1. McIntosh classification schema

Sunspot classification is mostly carried out manually by experts on a daily basis. Even though the classification rules are well defined, it is a subjective and time-consuming process, with controversy between solar physicists even when they work together. That is why an automated system could be useful not only for practical reasons (for example, for forecasting purposes) but also for decision making. Figure 2 shows four examples of ARs with different McIntosh classes. In this case, ARs with a rather clear assignment of classes were taken, but even looking at these examples, it is clear that the task is challenging.

Fig. 2. Examples of ARs with different McIntosh classes

Here we present an automated machine learning model for McIntosh-based classification of ARs built on the deep convolutional neural network approach. There are several papers connected directly to this task [1]. Their authors worked on a project of automated solar flare forecasting, with McIntosh classification as a subproject. In their work a feature-based classification approach is presented: at the first step, different features based on expert knowledge, such as the diameter of the groups, the number of spots, etc., were computed; after that, these features were used as input to the classification system. Within this approach, a human-designed preprocessing step has to be performed to extract the features. This approach was standard in image processing tasks before the deep learning era. In our paper, we decided to skip this step with the help of convolutional neural networks. Yann LeCun, in collaboration with Yoshua Bengio and others, first introduced the architecture of deep convolutional neural networks, called LeNet, for the handwriting recognition problem [3]. In the following years it was improved and generalized. The architecture for the first time showed superior accuracy on this task, and the data of this problem have since become a common benchmark for small neural networks. The next breakthrough was the work that presented the AlexNet network [2], which showed the best results on the ImageNet competition data. Further, the approach of convolutional neural networks developed and became practically the standard for recognition, detection, image segmentation, and other computer vision tasks. That is why we also decided to try it for magnetogram analysis.

2 Data Collection and Description


In our study, we use information about solar active regions provided by the Space Weather Prediction Center of the National Oceanic and Atmospheric Administration (SWPC NOAA). Data could be collected directly from the SWPC archive, but there is much additional information, so we used the aggregated information about each active region from SpaceWeatherLive.com. Magnetogram data are available since 1996, so we collected information from 1996 till 2018. These data include the solar active region identification number (NOAA number), the date, the location on the solar disc, and the corresponding McIntosh class. The Python code for data collection can be found in the Github repository. We used magnetogram data from two instruments: the Michelson Doppler Imager launched on the Solar and Heliospheric Observatory (http://soi.stanford.edu/), which operated in 1996–2010, and its successor, the Helioseismic and Magnetic Imager launched on the Solar Dynamics Observatory (http://hmi.stanford.edu/), operating from 2010 to the present. Full-disc magnetograms hmi.m_720s from HMI/SDO with resolution 4096 × 4096 and mdi.fd_m_96m_lev182 from SOHO/MDI with resolution 1024 × 1024 were downloaded for each day presented in the final table. After that, based on the information from the table about the date and the location of the active region center, all active regions presented on the full-disc magnetogram were cropped. We take a size of 500 × 400 for HMI, which corresponds to 125 × 100 for MDI due to the lower resolution of the latter. The Python code for cropping is also provided in the Github repository. An example of a full-disc magnetogram and cropped fragments is shown in Fig. 3. As a result, a total of 19565 fragments were cropped; this number is less than the total number of records in the table because we did not consider regions close to the limb. These fragments were used as inputs to the neural network. As targets we take the letters of the McIntosh classification system. The McIntosh classification is a three-letter abbreviation; the first letter has 7 classes, the second 6, and the third 4. The distribution of examples for each letter is given in Fig. 4. As shown in the picture, the class distribution is not balanced; we take this into account at the neural network learning step.

Fig. 3. Part of full disk magnetogram with cropped Active Regions



Fig. 4. Class distribution for each letter in McIntosh classification system

3 Results

As described previously, we used data from two instruments with different resolution: the size of the fragments is 125 × 100 for the MDI data and 500 × 400 for HMI. It is possible to build different models for each type of data and then concatenate the models, but as a starting point we decided to resize all images to a single size of 125 × 100, aware that we lose some information. For neural network processing, the data should be normalized. Usually, max-min or standard deviation normalization is used for images or matrix data, but in our case it is important to preserve information about the global magnetic field strength, so we find the maximum value over all fragments and divide each pixel by this value. As a result, each pixel in a fragment is a signed value in the range [−1, 1]. As a baseline, we use a simple convolutional neural network with three convolutional layers and the “tanh” activation function. As targets we used the separate letters of the McIntosh codes; a model was trained for each separate letter. After that, we experimented with different architectures well proven in other image classification tasks, such as DenseNet and ResNet. The ResNet architecture was even worse than the simple baseline; DenseNet gives a small improvement at the cost of increased computational time. Additionally, we add information about the statistical distribution of the signed logarithm of the data, sign(data) ∗ log(abs(data)), in the form of 8 percentile values. This information is fed into the dense layer next to the convolutional layers. The schema of the resulting model is presented in Fig. 5.
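As a small illustration, the percentile features of the signed logarithm for one normalized fragment could be computed as follows (NumPy only; the particular percentile grid and the epsilon guarding log(0) are our assumptions):

```python
import numpy as np

def signed_log_percentiles(fragment, n_features=8, eps=1e-6):
    """Percentile values of sign(x) * log(|x|) for one fragment normalized to [-1, 1].

    These scalar features are concatenated with the convolutional features
    in the dense layer of the classification network.
    """
    x = fragment.ravel()
    slog = np.sign(x) * np.log(np.abs(x) + eps)    # signed logarithm of the data
    qs = np.linspace(0, 100, n_features)           # assumed percentile grid
    return np.percentile(slog, qs)

# Example on a random fragment of the working size 125 x 100
frag = np.random.uniform(-1, 1, size=(125, 100))
print(signed_log_percentiles(frag))
```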
The accuracy metrics obtained with the three different architectures are presented in Table 1 below.
The distribution of correctly and wrongly predicted classes for each letter is presented in the confusion matrices in Fig. 6. This result slightly outperforms the results obtained in [1].

Table 1. Accuracy of different neural network architectures for each letter prediction. The overall accuracy versus the random-prediction accuracy is given (Net vs Rand).

           Letter 1        Letter 2        Letter 3
           (Net vs Rand)   (Net vs Rand)   (Net vs Rand)
Baseline   0.52 (0.14)     0.56 (0.16)     0.68 (0.25)
ResNet     0.49 (0.14)     0.52 (0.16)     0.67 (0.25)
DenseNet   0.54 (0.14)     0.57 (0.16)     0.70 (0.25)

Fig. 5. Neural network model used in ARs classification. Each convolutional layer consists of 64 filters of size 5 × 5 with the tanh activation function, followed by a BatchNormalization layer.

Fig. 6. Results of the classification of each of the three letters in the form of confusion matrices.

4 Conclusion and Discussion

The result of the classification can hardly be called outstanding, but this is primarily due to the complexity of the data. Also, the criteria for assigning a region to a particular class are quite complex, and even experts may disagree, not to mention an automatic system. Nevertheless, in our opinion, information about the McIntosh class of ARs could be an important block in modern systems, since it makes it possible to use knowledge accumulated even before the era of regular observations. Besides, the quality of the classification is much better than random guessing (for seven-class classification the quality of random guessing is 15%, while our model reaches 51%). Various steps are possible to improve the model. We did not use the white-light data, although historically they became available much earlier and it was for them that the classification was initially carried out; we considered that the magnetograms contain much more information. Besides, for training neural networks, especially deep architectures, a lot of data is needed. In our case the data problem can be solved by extra data labeling, which requires additional human resources, and by using data at small time intervals, assuming that during this time the McIntosh class does not change. In the latter case, additional information is needed on the position of the region center after a specific time period, so some active region tracking system should be implemented.

Acknowledgments. We gratefully acknowledge financial support of Institute of Infor-


mation and Computational Technologies (Grant AR05134227, Kazakhstan).

References
1. Colak, T., Qahwaji, R.: Automated McIntosh-based classification of sunspot groups
using MDI images. In: Solar Image Analysis and Visualization, pp. 67–86. Springer,
New York (2007). https://doi.org/10.1007/978-0-387-98154-3 7
2. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Systems,
pp. 1097–1105 (2012)
3. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.,
Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural
Comput. 1(4), 541–551 (1989). https://doi.org/10.1162/neco.1989.1.4.541
4. McIntosh, P.S.: The classification of sunspot groups. Sol. Phys. 125(2), 251–267
(1990). https://doi.org/10.1007/BF00158405
5. Severny, A., Lüst, R.: Stellar and solar magnetic fields. In: International Astronom-
ical Union. Symposium (1965)
6. Yazev, S.: Kompleksi activnosti na Solnze. Nauka v Rossii, pp. 4–12 (2009)
Deep Learning for ECG Segmentation

Viktor Moskalenko, Nikolai Zolotykh, and Grigory Osipov

Lobachevsky University of Nizhni Novgorod,


Gagarin Avenue 23, Nizhni Novgorod 603950, Russia
{viktor.moskalenko,nikolai.zolotykh,grigory.osipov}@itmm.unn.ru

Abstract. We propose an algorithm for electrocardiogram (ECG) seg-


mentation using a UNet-like full-convolutional neural network. The algo-
rithm receives an arbitrary sampling rate ECG signal as an input, and
gives a list of onsets and offsets of P and T waves and QRS complexes as
output. Our method of segmentation differs from others in speed, a small
number of parameters and a good generalization: it is adaptive to differ-
ent sampling rates and it is generalized to various types of ECG monitors.
The proposed approach is superior to other state-of-the-art segmentation
methods in terms of quality. In particular, F 1-measures for detection of
onsets and offsets of P and T waves and for QRS-complexes are at least
97.8%, 99.5%, and 99.9%, respectively.

Keywords: Electrocardiography · UNet · ECG segmentation ·


ECG delineation

1 Introduction

The electrocardiogram (ECG) is a recording of the electrical activity of the


heart, obtained with the help of electrodes located on the human body. This
is one of the most important methods for the diagnosis of heart diseases. The
ECG is usually interpreted by a doctor. Recently, automatic ECG analysis has been of great
interest.
The ECG analysis includes detection of QRS complexes, P and T waves,
followed by an analysis of their shapes, amplitudes, relative positions, etc. (see
Fig. 1). The detection of onsets and offsets of QRS complexes and P and T waves
is also called segmentation or delineation of the ECG signal.
Accurate ECG automatic segmentation is a difficult problem for the follow-
ing reasons. For example, the P wave has a small amplitude and can be difficult
to identify due to interference arising from the movement of electrodes, muscle
noise, etc. P and T waves can be biphasic, which makes it difficult to accu-
rately determine their onsets and offsets. Some cardiac cycles may not contain
all standard segments, for example, the P wave may be missing, etc.
Among the methods of automatic ECG segmentation, methods using wavelet
transforms have proven to be the best [3,4,6–9]. In [11], a neural network app-
roach for ECG segmentation is proposed. The segmentation quality turned out to

Fig. 1. An example of medical segmentation. Yellow corresponds to P waves, red to QRS complexes, green to T waves. The markers denote the onset of a wave, the wave peak (◦), and the offset of a wave.

be close to the quality obtained by state-of-the-art algorithms based on wavelet


transform, but still, as a rule, lower. In this paper, we suggest using the UNet-like
[10] neural network. As a result, using the neural network approach, it is pos-
sible to achieve and even exceed the quality of segmentation obtained by other
algorithms. In terms of quality, the proposed approach is superior to its analogues. In particular, F1-measures for detection of onsets and offsets of P and T waves and for QRS complexes are at least 97.8%, 99.5%, and 99.9%, respectively.
In addition, the proposed segmentation method differs from its analogues in speed, a small number of parameters in the neural network, and good generalization: it is adaptive to different sampling rates and is generalized to various types of ECG monitors.
The main differences of the proposed approach from the paper [11] are as follows:

– in [11], an ensemble of 12 convolutional neural networks is used; here we use one fully convolutional neural network with skip connections;
– in contrast to the present work, [11] does not use postprocessing;
– in [11], preprocessing is used to remove the isoline drift; we process signals as is; in Sect. 3.3, we will see that the quality of ECG segmentation is high even in the case of isoline drift.

2 Algorithm

2.1 Preprocessing

The neural network described below was trained on a dataset of ECG signals
with the sampling frequency 500 Hz and the duration 10 s (see Sect. 3.1). In
order to use this network for signals of a different frequency or/and a different

duration, we propose the following preprocessing. Let the frequency of an input


signal x = (x1 , x2 , . . . , xn ) be ν, and the network is trained on signals with the
frequency μ. Then T = n/ν is the signal duration. Convert the input signal as
follows.
(2i − 1)T
1. Form an array of time samples t = (t1 , t2 , . . . , tn ), where ti = are
2n
the midpoints of the time intervals formed by dividing the segment [0, T ] into
n equal parts (i = 1, 2, . . . , n).
2. On the set of points {(t1 , x1 ), (t2 , x2 ), . . . , (tn , xn )}, construct the cubic spline
[2].
3. Form the array of new time samples t = (t1 , t2 , . . . , tm ), where

  (2i − 1)T
m = μT , ti = .
2m
4. Using the cubic spline, find the signal values at t . The resulting array will be
the input to the neural network.
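A minimal sketch of this resampling step is shown below; SciPy's CubicSpline is our choice of implementation, since the paper does not name a particular library.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample(x, nu, mu):
    """Resample signal x recorded at nu Hz to the training frequency mu Hz (steps 1-4)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    T = n / nu                                            # signal duration, s
    t = (2 * np.arange(1, n + 1) - 1) * T / (2 * n)       # midpoints of the original grid
    m = int(np.ceil(mu * T))
    t_new = (2 * np.arange(1, m + 1) - 1) * T / (2 * m)   # midpoints of the new grid
    return CubicSpline(t, x)(t_new)

# Example: a 10-s signal at 250 Hz brought to the 500 Hz the network was trained on.
sig = np.random.randn(2500)
print(resample(sig, nu=250, mu=500).shape)                # (5000,)
```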

2.2 The Neural Network Architecture


The architecture of the neural network (see Fig. 2) is similar to the UNet archi-
tecture [10]. The input of the neural network is a vector of length l, where l is
the length of the ECG signal received from one lead. Each lead is fed to the
input of the neural network separately.

Fig. 2. Neural network architecture

The output size is (4, l). Each column of the output matrix contains 4 scores,
that characterize the confidence degree of the neural network that the current
value of the signal belongs to the segments P, QRS, T or none of the above. The
proposed neural network includes the following layers:

(i) 4 blocks, each of which includes two convolutional layers with batch nor-
malization and the Relu activation function; these blocks are connected
sequentially with MaxPooling layers;
(ii) the output from the previous layer through the MaxPooling layer is fed to
the input of another block containing two convolutional layers with batch
normalization and the Relu activation function;
(iii) the output from the previous layer through the deconvolution and zero
padding layers is concatenated with the output from the layer (ii) and is
fed to the input of the block that includes two convolutional layers each
with the batch normalization and the Relu activation function;
(iv) the output from the previous layer through the deconvolution and zero
padding layers is sequentially fed to the input of another 4 blocks containing
two convolutional layers each with batch normalization and Relu activation
function; each time the output is concatenated with the output from the
corresponding layers (i) in the reverse order;
(v) the output from the previous layer is fed to the input of another convolu-
tional layer.
All convolutional layers have the following characteristics: kernel-size = 9, padding = 4. All deconvolution layers have kernel-size = 8, stride = 2, padding = 3. For the last convolutional layer, kernel-size = 1.
The main differences between the proposed network and UNet follow:
– we use 1d convolutions instead of 2d convolutions;
– we use a different number of channels and different parameters in the convo-
lutions;
– we use copy + zero pad layers instead of copy + crop layers; as a result, in the proposed method the dimension of the output is the same as that of the input; in contrast, at the output of the UNet network, we obtain a segmentation of only a part of the image.
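To make the layer specification above concrete, here is a sketch of one down-sampling block and the matching up-sampling deconvolution in PyTorch; the framework and the channel counts are our illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two Conv1d -> BatchNorm -> ReLU stages, as in blocks (i)-(iv) of the network."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=9, padding=4),
            nn.BatchNorm1d(out_ch), nn.ReLU(inplace=True),
            nn.Conv1d(out_ch, out_ch, kernel_size=9, padding=4),
            nn.BatchNorm1d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Down path: ConvBlock followed by MaxPooling; up path uses a deconvolution
# with kernel_size=8, stride=2, padding=3, which exactly doubles the length.
down = nn.Sequential(ConvBlock(1, 8), nn.MaxPool1d(2))
up = nn.ConvTranspose1d(8, 8, kernel_size=8, stride=2, padding=3)

x = torch.randn(1, 1, 4000)          # one ECG lead, 8 s at 500 Hz
print(up(down(x)).shape)             # torch.Size([1, 8, 4000])
```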

2.3 Postprocessing
The output of the neural network is a matrix of size (4, l), where l is the input signal length. Applying the argmax function to the columns of the matrix, we obtain a vector of length l. We then form an array of waves by finding all continuous segments with the same label.
For processing multi-lead ECGs (a typical number of leads is 12), we propose to process each lead independently and then average the resulting scores. As we will see in Sect. 3.2, such an analysis improves the quality of the prediction.
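A short sketch of this postprocessing in NumPy follows; the array shapes mirror the description above, and the function name is ours.

```python
import numpy as np

def segments_from_scores(scores):
    """Turn per-lead network outputs into labeled wave segments.

    scores: array of shape (leads, 4, l) with scores for P, QRS, T and 'none'.
    Returns a list of (label, start, end) for every maximal run of equal labels.
    """
    labels = scores.mean(axis=0).argmax(axis=0)     # average over leads, then argmax per sample
    waves, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            waves.append((int(labels[start]), start, i - 1))
            start = i
    return waves

# Example with random scores for 12 leads and a 5000-sample signal
print(segments_from_scores(np.random.rand(12, 4, 5000))[:3])
```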

3 Experimental Results
3.1 LUDB Dataset
The training of the neural network and the experiments were performed on the extended LUDB dataset [5]. The dataset consists of 455 12-lead ECGs of 10 s duration recorded at a sampling rate of 500 Hz. For the comparison of algorithms, the dataset was divided into a training and a test set, where the test set consists of 200 ECG signals borrowed from the original LUDB dataset. Since the proposed neural network processes the leads independently, 255 × 12 = 3060 signals of length 500 × 10 = 5000 were used for training. To prevent overfitting, data augmentation was performed: at each batch iteration, a random continuous ECG fragment of 4 s was fed to the input of the neural network.
The LUDB dataset has the following feature. One (sometimes two) first and
last cardiac cycles are not annotated. At the same time, the first and last marked
segments are necessarily QRS (see an example in Fig. 1). To implement a correct
comparison with the reference segmentation, the following modifications were
made in the algorithm:

– during augmentation, the first and last 2 s were not taken, i. e. subsequences
of the length of 4 s were chosen starting from the 2-nd to the 4-th (ending
from the 6-th to the 8-th s);
– in order to avoid a large number of false positives, the first and the last cardiac
cycles were removed during the validation of the algorithm.

3.2 Comparison of the Algorithms

Table 1 contains results of the experiment and the comparison of the results
with one of the best segmentation algorithm using wavelets [4] and the neural
network segmentation algorithm [11]. The last line shows the characteristics of
our algorithm that analyses the leads independently for a test set consisting of
200 × 12 = 2400 ECG.
The quality of the algorithms is determined using the following procedure.
According to the recommendations of the Association for Medical Instrumen-
tation [1], it is considered that an onset or an offset are detected correctly, if
their deviation from the doctor annotations does not exceed in absolute value
the tolerance of 150 ms.
If an algorithm correctly detects a significant point (an onset or an offset of
one of the P, QRS, T segments), then a true positive result (TP) is counted and
the time deviation (error) of the automatic determined point from the manually
marked point is measured. If there is no corresponding significant point in the
test sample in the neighborhood of ±tolerance of the detected significant point,
then the I type error is counted (false positive – FP). If the algorithm does not
detect a significant point, then the II type error is counted (false negative – FN).
Following [3,6,8,9], we measure the following quality metrics:

– the mean error m;


– the standard deviation σ of the mean error;
– the sensitivity, or recall, Se = TP/(TP + FN);
– the positive predictive value, or precision, PPV = TP/(TP + FP).

Here TP, FP, FN denote the total numbers of correct detections, type I errors, and type II errors, respectively. We also give the value of

– the F1-measure: $F1 = \dfrac{2\,\mathrm{Se} \cdot \mathrm{PPV}}{\mathrm{Se} + \mathrm{PPV}}$.

Table 1. The comparison of ECG segmentation algorithms

P onset P offset QRS onset QRS offset T onset T offset


Kalyakulina Se (%) 98.46 98.46 99.61 99.61 − 98.03
et al. [4] PPV (%) 96.41 96.41 99.87 99.87 98.84
F1 (%) 97.42 97.42 99.74 99.74 98.43
−2.7±10.2 0.4 ± 11.4 −8.1 ± 7.7 3.8 ± 8.8 5.7 ± 15.5
m ± σ(ms)
Sereda et al. [11] Se (%) 95.20 95.39 99.51 99.50 97.95 97.56
PPV (%) 82.66 82.59 98.17 97.96 94.81 94.96
F1 (%) 88.49 88.53 98.84 98.72 96.35 96.24
2.7 ± 21.9 −7.4±28.6 2.6 ± 12.4 −1.7±14.1 8.4 ± 28.2 −3.1±28.2
m ± σ(ms)
This work Se (%) 98.05 98.01 100.00 100.00 99.68 99.77
PPV (%) 97.73 97.69 99.93 99.93 99.37 99.46
F1 (%) 97.89 97.85 99.97 99.97 99.52 99.61
−0.6±17.5 −2.4±18.4 1.5 ± 11.1 2.0 ± 10.6 2.9 ± 23.7 −2.4±30.4
m ± σ(ms)
This work (only Se (%) 98.61 98.59 99.99 99.99 99.32 99.40
lead II) PPV (%) 95.61 95.59 99.99 99.99 99.02 99.10
F1 (%) 97.09 97.07 99.99 99.99 99.17 99.25
−4.1±20.4 3.7 ± 19.6 1.8 ± 13.0 −0.2±11.4 −3.6±28.0 −4.1±35.3
m ± σ(ms)
This work (each Se (%) 97.38 97.36 99.96 99.96 99.43 99.48
lead is used PPV (%) 95.53 95.52 99.84 99.84 98.88 98.94
separately) F1 (%) 96.47 96.43 99.90 99.90 99.15 99.21
0.9 ± 14.1 −3.5±15.7 2.1 ± 9.8 1.6 ± 9.8 1.3 ± 20.9 −0.3±22.9
m ± σ(ms)

Analyzing the results, we can draw the following conclusions:


– the indicators Se and PPV for the proposed algorithm are the highest or almost the highest for all types of ECG segments;
– averaging the answer over all 12 leads helps to detect the complexes better: it improves both Se and PPV; however, the detection of the onsets and the offsets worsens, which is indicated by the growth of σ in all indicators;
– to detect the QRS-complexes, it is enough to use only lead II, since it gives
the highest quality of their determination; such an approach will reduce the
time of the algorithm 12 times, without passing the other leads through the
neural network;
– the best σ values are given by the algorithm [4];
– the results of the proposed approach for all indicators surpassed the other
neural network approach [11].

3.3 Examples of the Resulting Segmentations


Examples of segmentations obtained by the proposed algorithm are shown in
Figs. 3, 4, 5, 6 and 7.

The experiments show that the proposed algorithm confidently copes with
noise of different frequencies. An example with low frequency noise (breathing)
is shown in Fig. 3. An example with high frequency noise is presented in Fig. 4.
An example of the segmentation of an ECG with a pathology (ventricular
extrasystole) is shown in Fig. 5. An example of segmentation of an ECG obtained
from another type of ECG monitor is shown in Fig. 6. It is characterized by high
T waves and a strong degree of smoothness. Figure 7 presents an example of
segmentation of an ECG with the frequency of 50 Hz, reduced using a cubic
spline to the frequency of 500 Hz.

Fig. 3. An example of low frequency noise ECG segmentation (breathing)

Fig. 4. An example of high frequency noise ECG segmentation

Fig. 5. An example of ECG segmentation with pathology (ventricular extrasystole)



Fig. 6. An example of segmentation of an ECG obtained from another type of ECG


monitor. It is characterized by high T waves and a strong degree of smoothness.

Fig. 7. An example of segmentation of an ECG with the frequency of 50 Hz, reduced


using a cubic spline to the frequency of 500 Hz

4 Conclusion and Future Work

The paper describes an algorithm based on a UNet-like neural network, which is capable of quickly and efficiently constructing the ECG segmentation. Our method uses a small number of parameters and has good generalization. In particular, it is adaptive to different sampling rates and it is generalized to various types of ECG monitors. The proposed approach is superior to other state-of-the-art segmentation methods in terms of quality. F1-measures for detection of onsets and offsets of P and T waves and for QRS complexes are at least 97.8%, 99.5%, and 99.9%, respectively.
In the future, this can be used with diagnostic purposes. Using segmentation,
one can compute useful signal characteristics or use the neural network output
directly as a new network input for automated diagnostics with the hope of
improving the quality of classification.
In addition, one can try to improve the algorithm itself. In particular, the loss
function used in the proposed neural network probably does not quite reflect the
quality of segmentation. For example, it does not take into account some features
of the ECG (e. g. two adjacent QRS complexes cannot be too close to each other
or too far from each other).

Acknowledgement. The authors are grateful to the referee for valuable suggestions
and comments. The work is supported by the Ministry of Education and Science of
Russian Federation (project 14.Y26.31.0022).

References
1. Association for the Advancement of Medical Instrumentation. NSI/AAMI
EC57:1998/(R)2008 (Revision of AAMI ECAR:1987) (1999)
2. De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)
3. Bote, J.M., Recas, J., Rincon, F., Atienza, D., Hermida, R.: A modular low-
complexity ECG delineation algorithm for real-time embedded systems. IEEE J.
Biomed. Health Inform. 22, 429–441 (2017)
4. Kalyakulina, A.I., Yusipov, I.I., Moskalenko, V.A., Nikolskiy, A.V., Kozlov,
A.A., Zolotykh, N.Y., Ivanchenko, M.V.: Finding morphology points of
electrocardiographic-signal waves using wavelet analysis. Radiophys. Quantum
Electron. 61(8–9), 689–703 (2019)
5. Kalyakulina, A.I., Yusipov, I.I., Moskalenko, V.A., Nikolskiy, A.V., Kozlov,
A.A., Kosonogov, K.A., Zolotykh, N.Yu., Ivanchenko, M.V.: LU electrocardio-
graphy database: a new open-access validation tool for delineation algorithms.
arXiv:1809.03393 (2018)
6. Di Marco, L.Y., Lorenzo, C.: A wavelet-based ECG delineation algorithm for 32-bit
integer online processing. Biomed. Eng. Online 10(1), 23 (2011)
7. Li, C., Zheng, C., Tai, C.: Detection of ECG characteristic points using wavelet
transforms. IEEE Trans. Biomed. Eng. 42(1), 21–28 (1995)
8. Martinez, A., Alcaraz, R., Rieta, J.J.: Automatic electrocardiogram delineator
based on the phasor transform of single lead recordings. In: Computing in Car-
diology, pp. 987–990. IEEE (2010)
9. Rincon, F., Recas, J., Khaled, N., Atienza, D.: Development and evaluation of
multilead wavelet-based ECG delineation algorithms for embedded wireless sensor
nodes. IEEE Trans. Inf. Technol. Biomed. 15, 854–863 (2011)
10. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedi-
cal image segmentation. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
11. Sereda, I., Alekseev, S., Koneva, A., Kataev, R., Osipov G.: ECG segmentation by
neural networks: errors and correction. arXiv:1812.10386 (2018)
Competitive Maximization of Neuronal
Activity in Convolutional Recurrent Spiking
Neural Networks

Dmitry Nekhaev and Vyacheslav Demin

National Research Center “Kurchatov Institute”, Moscow, Russia


[email protected]

Abstract. Spiking neural networks (SNNs) are a promising algorithm for specific neurochip hardware real-time solutions. SNNs are believed to be highly energy and computationally efficient. We focus on developing local learning rules that are capable of providing both supervised and unsupervised learning. We suppose that each neuron in a biological neural network tends to maximize its activity in competition with other neurons. This principle was put at the basis of the SNN learning algorithm called FEELING. Here we introduce an efficient Convolutional Recurrent Spiking Neural Network architecture that uses the FEELING rules and provides better results than a fully connected SNN on the MNIST benchmark while having 55 times fewer learnable weight parameters.

Keywords: Spiking neural networks · Local learning rules

1 Introduction

Spiking neural networks (SNNs) have some remarkable advantages in comparison to


formal neural networks: significantly reduced energy consumption of neural network
algorithms realized in specific multi-core hardware (neurochips) [1], an ability to model
computing in continuous real time, an ability to test and use different bio-inspired local
training rules (Hebb’s, Spike-Timing Dependent Plasticity (STDP), metabolic, home-
ostatic, etc.) and others.
State-of-the-art results were obtained by training SNNs with gradient based methods
(back-propagation) for different machine learning tasks [2]. However, local training
rules have a great potential for the coming generation of SNN algorithms for intelligent
information processing. One of the most widely used and investigated local learning rule
for SNNs is STDP [2–5], we consider it as a baseline for developing other local learning
rules for SNNs. We proposed local leaning rule called FEELING [6] that allows training
of inhibitory and reciprocal connections inserted into the network architecture.

2 Materials and Methods

In this section we describe the process of data encoding, FEELING learning rules and
introduced Convolutional Recurrent Spiking Neural Network (CRSNN) architecture.


2.1 Data Encoding


The height and width of the neural network input layer were chosen according to the MNIST dataset [7], which contains 60000 training images of handwritten digits and 10000 test images of 28 × 28 pixels (784 input neurons). Labels from 0 to 9 are present in the dataset, and the total number of different digits is balanced. 3000 training images were used for validation. Each image was fed to the input layer of the network for 100 ms using a Poisson spike train encoding scheme. The firing rate of each input neuron was distributed between 0 Hz and 250 Hz in proportion to the pixel intensity.
Here we introduce encoding with two channels for grey-scale images. The first channel encodes the standard image x with the method described above. For constructing the second channel we apply the same encoding method to the inverse image of the digit. The inverse image x^{inverse} is the result of subtracting the original image from the maximum pixel intensity of the image:

$$x^{inverse}_{ij} = \max(x) - x_{ij}. \qquad (1)$$

The additional inverse image helps to convey to the network information about the regions that do not have high-intensity pixels in terms of the presence of spikes (instead of their absence, as for the original image). It also provides input signals that are automatically l1-normalized, which is essential for the convergence of the algorithm.
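A compact sketch of this two-channel Poisson encoding is given below (NumPy); the 100 ms duration and the 250 Hz maximum rate follow the text, while the time step and the per-channel normalization are our assumptions.

```python
import numpy as np

def encode_two_channels(image, duration_ms=100, dt_ms=1.0, max_rate_hz=250.0):
    """Poisson spike trains for a grey-scale image and its inverse, Eq. (1).

    Returns a binary array of shape (2, H*W, T): channel 0 encodes the image,
    channel 1 encodes max(x) - x; firing rates are proportional to intensity.
    """
    steps = int(duration_ms / dt_ms)
    x = image.astype(float).ravel()
    channels = np.stack([x, x.max() - x])                  # original and inverse image
    rates = max_rate_hz * channels / channels.max()        # 0..250 Hz
    p_spike = rates * dt_ms / 1000.0                       # spike probability per time step
    return (np.random.rand(2, x.size, steps) < p_spike[..., None]).astype(np.uint8)

# Example on a random 28 x 28 "digit"
spikes = encode_two_channels(np.random.randint(0, 256, (28, 28)))
print(spikes.shape)            # (2, 784, 100)
```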

2.2 Neural Architecture


We use LIF [8] neurons with a leaky threshold in our experiments. Additionally, we compute exponential traces of each neuron with two different decay parameters τa and τh. The first one tracks the instant activity a(t) of the neuron, the second one tracks the average activity h(t). The neuron model can be summarized in the equations:

$$
\begin{cases}
\tau_m \dfrac{dV_j}{dt} = -V_j + I_j(t) \\[4pt]
\dfrac{dV_{th,j}}{dt} = \Delta V_{th} \cdot N \cdot \delta\!\left(t - t_j\right) - \dfrac{V_{th,j}}{\tau_{th}} \\[4pt]
I_j(t) = \sum_i w_{ij}\, a_i(t) \\[4pt]
a(t) = \left(1 - \dfrac{\Delta t}{\tau_a}\right) a(t - \Delta t) + \dfrac{s(t)}{\tau_a} \\[4pt]
h(t) = \left(1 - \dfrac{\Delta t}{\tau_h}\right) h(t - \Delta t) + \dfrac{s(t)}{\tau_h}
\end{cases}
\qquad (2)
$$

Here Vj is the membrane potential of neuron j, τm the characteristic membrane time, Ij(t) the input current of neuron j, Vth,j the threshold potential of neuron j, ΔVth the constant increment of the threshold, N the total number of neurons in the layer, τth the decay constant of the threshold, a(t) and h(t) are the instant and average activities of the neuron, s(t) is a binary spike variable equal to 1 if a spike occurs at the moment t and 0 otherwise, and τa and τh are the decay parameters for a(t) and h(t) (τa ≪ τh).
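For illustration, a discrete-time Euler step of this neuron model might look as follows (NumPy); the reset of the membrane potential to zero after a spike and all numeric constants are our assumptions.

```python
import numpy as np

def lif_step(V, Vth, a, h, I, dt=1.0, tau_m=20.0, tau_th=50.0,
             tau_a=5.0, tau_h=500.0, dVth=0.05, N=25):
    """One Euler step of Eq. (2) for a layer of LIF neurons with leaky thresholds."""
    V = V + dt / tau_m * (-V + I)                           # membrane potential
    spikes = (V >= Vth).astype(float)                       # s(t): threshold crossings
    V = np.where(spikes > 0, 0.0, V)                        # reset after a spike (assumption)
    Vth = Vth + dVth * N * spikes - dt / tau_th * Vth       # leaky adaptive threshold
    a = (1 - dt / tau_a) * a + spikes / tau_a               # instant activity trace
    h = (1 - dt / tau_h) * h + spikes / tau_h               # average activity trace
    return V, Vth, a, h, spikes
```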

Here we introduce the Convolutional Recurrent Spiking Neural Network (CRSNN) architecture. This is a three-layer architecture, where the first, feature-extracting layer is convolutional, the second layer is a fixed spike-pooling operator, and the third layer is the classifier.
Convolutional Feature Extraction. There are 25 different convolutional kernels that have 2 input channels. The size of each convolutional filter is 16 × 16. Each filter is applied with stride 4 without padding. This convolutional structure provides 25 feature maps of size 4 × 4 (400 hidden neurons in total). Neurons corresponding to the same feature map share weights.
Spike-Pooling. To reduce the number of weights of subsequent fully connected layer
we apply pooling operation to 4  4 feature maps obtained with convolutional filters.
All 16 neurons in each feature map are connected to one LIF neuron with equal
constant weights to perform operation similar to average pooling in rate-based artificial
neural networks. The difference is that in our case pooling operation integrates input
spikes during time and produces one spike only after integrating several input spikes
that should occur in a short period of time. Otherwise, the potential of the pooling LIF
neuron will not reach the threshold. We call this operation spike-pooling. Spike-
pooling neurons can be assumed as hubs for each convolutional filter.
According to [6], we insert inhibitory connections into the hidden layer. In the proposed CRSNN architecture, competition is introduced between spike-pooling neurons but not between convolutional layer neurons. This allows us to (1) restrict competition to different convolutional filters (there is no competition between neurons of the same feature map), and (2) reduce the total number of inhibitory connections from 400 × 399 to 25 × 24 (self-inhibition is not allowed).
Classifier. To classify input images into 10 classes we add a 25 × 10 fully connected layer at the end of the CRSNN. An additional supervised current is injected into the corresponding classifying neuron while an input image of the given class is presented.
Learnable inhibitory weights are also introduced between classifying neurons to pro-
vide distinguishable output of the network during test phase (supervised currents are
removed in test and validation phases).
Reciprocal learnable weights are added from classifying to hidden layer to provide
backward signal from classifying neurons to the hidden spike-pooling layer. Additional
backward weights allow to send reinforcing signal to the hidden layer from highly
activated classifying neurons. This type of connection is especially useful during
supervised learning.
Initialization. All forward weight values are initialized from uniform distribution in
the range [0, 1]. Forward weight values are clipped between 0 and 1 during training.
All inhibitory weights were initialized as −1 and clipped between −1 and 0 during
training. Initial reciprocal connections weights were set to 0.

2.3 Learning Rules


In our recent work [6] we proposed a local learning rule called Family-Engaged Execution and Learning of Induced Neuron Groups, or FEELING, that provides competitive learning of recurrent spiking neural networks. The core idea of our learning rules is that every single neuron strives to maximize its activity in competition with other neurons to justify its biological role in the whole network. We summarize the FEELING rules in the following equations:
dw_ij/dt  = α · (a_i(t) − h_i(t)) · δ(t − t_j) − w_ij/τ
dw_kj/dt  = β · (a_j(t) − h_j(t)) · w_kj · δ(t − t_k) − w_kj/τ
dw_kk'/dt = γ · (a_k(t) − h_k(t)) · δ(t − t_k') − w_kk'/τ
dw_jj'/dt = η · (a_j(t) − h_j(t)) · (a_j'(t) − h_j'(t)) · δ(t − t_j) − w_jj'/τ        (3)

Here w_ij are forward weights, w_kj reciprocal weights, w_jj' inhibitory weights between classifying neurons, and w_kk' are inhibitory weights between neurons in the spike-pooling layer. Weights in the convolutional layer are shared, so the final update for w_ij is averaged between connections within each convolutional filter. In the equations above, δ(·) stands for Dirac's delta-function, meaning the weight updates occur only with spikes of the corresponding neurons, α, β, γ, η are the learning rate parameters, and the last terms serve for the weight decays with time constant τ to forget inactive patterns.
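To illustrate how such an event-driven rule can be applied, below is a schematic Python sketch of the first rule (the forward weights w_ij); the learning rate and decay constant are placeholder values, and the bookkeeping of a real spiking simulator is omitted.

```python
import numpy as np

alpha, tau, dt = 0.01, 1e4, 1.0   # placeholder learning rate and decay constant

def update_forward_weights(w, a_post, h_post, spiked_pre):
    """w[i, j]: weight from presynaptic neuron j to postsynaptic neuron i.
    spiked_pre: boolean vector, True where neuron j fired at this time step."""
    w[:, spiked_pre] += alpha * (a_post - h_post)[:, None]  # spike-triggered update
    w -= dt * w / tau                                       # slow decay of inactive weights
    np.clip(w, 0.0, 1.0, out=w)                             # forward weights stay in [0, 1]
    return w
```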

3 Results

In this section we report our results of training the CRSNN with the FEELING learning rules. We compare our CRSNN to the RSNN proposed in [6]. We get better accuracy in both supervised and semi-supervised training regimes while having 56 times fewer learning parameters. We also analyze the learned convolutional filters and plot maximizing images [9] for classifying neurons for both normal and inverse input channels.

3.1 Learning Curves


Here we compare the convergence of the CRSNN to the RSNN proposed in [6]. Both architectures use the same data encoding method with inverse images on the input. The number of hidden layer neurons is equal to 400 for both architectures (400 fully connected hidden neurons for the RSNN and 25 feature maps of size 4 × 4, i.e. 400 convolutional neurons, for the CRSNN).
Supervised Learning. Learning curves for CRSNN and RSNN are presented in
Fig. 1. RSNN converges faster, but CRSNN provides better accuracy - 97.25% against
96.40% for RSNN. Also, we analyze the impact of reciprocal weights in CRSNN and
RSNN. Removing reciprocal weights from the training setup drops CRSNN accuracy
to 96.33% (against 94.15% for RSNN).

Fig. 1. Learning curves for CRSNN and RSNN on MNIST dataset.

Semi-supervised Learning. Learning curves in semi-supervised mode for CRSNN


and RSNN are presented in Fig. 2. Unsupervised learning starts after the first 400 training MNIST digit images have passed through the network with the teacher current (in supervised mode). Unsupervised learning for the convolutional network achieves better accuracy (76.9%) than for the fully connected network (72.1%).

Fig. 2. Learning curves for supervised and semi-supervised modes. RSNN converges faster but
CRSNN provides better accuracy for both supervised and semi-supervised modes.

3.2 Weight Visualization


Weight visualization is useful for interpreting how neural network processes the input
information. Here we analyze convolutional filters, inhibitory weights and plot maxi-
mizing images for every classifying neuron.
Convolutional Filters. The convolutional layer consists of 25 filters of size 16 × 16 for each input channel (50 filters in total). Filters for the first and the second input channels
are presented in Fig. 3A and B respectively. Filters in the same row and column on
Fig. 3A and B correspond to the same output feature map. We emphasize that visu-
alizations for pairs of normal and inverse filters almost do not overlap and, moreover,
correspond to each other as the parts of one puzzle.

Fig. 3. Convolutional filters obtained after training CRSNN with the FEELING learning rule.
(A) 25 filters on the left-side stand for the first (normal) input channel. (B) 25 filters on the right-
side stand for the second (inverse) input channel. We notice that areas that have high values of
weights of the first channel have small values of corresponding weights of the second channel.

Inhibitory Weights. Inhibitory weights between spike-pooling neurons can be viewed as a 25 × 25 matrix, as demonstrated in Fig. 4. Self-inhibition is not allowed, so we set all diagonal elements to 0. Despite the fact that we initialized all inhibitory weights to −1, the final weight distribution is far enough from total inhibition. Strong inhibition is essential mostly at the very early stages of training to provide learning of different filters. However, after some period of simulation time, some filters start to cooperate (yellow points on non-diagonal elements). For example, filters 24 and 25 have very weak inhibitory connections because they have quite similar convolutional weights (see the last two filters in Fig. 3A and B).

Fig. 4. Inhibitory weights obtained while training with the FEELING learning rule. These weights provide competition between different convolutional filters in the CRSNN. We notice that the final inhibitory weight matrix looks symmetric, as should be anticipated naturally.

Maximizing Images. In this work we also have applied our original method of
reconstructing maximizing images [9] to convolutional spiking architecture. The main
idea of this method is that for each classifying neuron we have to (1) find the image that
provides the highest activation of this neuron, (2) compute the gradient of the activity
of this neuron with respect to the input, (3) perform one step in the direction of
gradient, (4) iteratively repeat steps 2 and 3 for a fixed number of epochs, and (5) pass

the result through the threshold filter to binarize maximizing image. Resulting maxi-
mizing images are presented in Fig. 5 for both normal and inverse input channels.
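A hedged sketch of this gradient-ascent loop for a differentiable stand-in model in TensorFlow (the authors apply their deconvolutional optimization method [9] to the spiking network itself, so the model object, the step size and the number of iterations below are illustrative assumptions):

```python
import tensorflow as tf

def maximizing_image(model, neuron_idx, shape=(1, 28, 28, 2),
                     steps=200, lr=0.1, threshold=0.5):
    """Iteratively adjust the input to maximize the activity of one output neuron."""
    x = tf.Variable(tf.random.uniform(shape))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activity = model(x)[0, neuron_idx]   # activity of the chosen classifying neuron
        grad = tape.gradient(activity, x)
        x.assign_add(lr * grad)                  # one step in the gradient direction
        x.assign(tf.clip_by_value(x, 0.0, 1.0))
    return tf.cast(x > threshold, tf.float32)    # threshold filter to binarize the result
```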

Fig. 5. Reconstructed maximizing images of the output neurons. The first row corresponds to
the normal input channel, the second row corresponds to the inverse input channel.

3.3 Comparison with Other Methods


We compare our results with other training methods for different SNN architectures in Table 1. The CRSNN architecture achieves an improvement of 0.85% in accuracy over our previous work [6] with the FEELING learning rule, while having 55 times fewer parameters than the RSNN.

Table 1. Recognition accuracies of different algorithms on the MNIST dataset.


Architecture Hidden layers Learning rule Accuracy (%)
Spiking RBM [10] 500-500 CD 91.3
Fully connected SNN [4] 100 STDP 82.9
Fully connected SNN [4] 6400 STDP 95.0
Fully connected SNN [2] 800 Back-prop 98.56
Convolutional SNN [11] Convolutional coding Tempotron 91.3
Convolutional SNN [2] conv (20)-conv (50)-200 Back-prop 99.3
Convolutional SNN [5] conv (30)-conv (100)-100 STDP 98.4
RSNN [6] 100 FEELING 95.40
RSNN [6] 400 FEELING 96.40
CRSNN (this work) conv (25) FEELING 97.25
CRSNN + LogReg (this work) conv (25) FEELING 98.35

To compare our results with those of [5], we have trained a non-spiking linear classifier on top of the spike-pooling layer. We recorded the activities of the 25 pooling neurons, fed them to the input of a logistic regression classifier and obtained an accuracy of 98.35%. So, we have got the same accuracy with a much shorter architecture, using a convolutional feature extractor trained with FEELING instead of STDP.
Deeper architectures trained with a back-propagation technique (adapted to SNN) [2] still outperform our results by 0.95% at best, while having approximately 3 times more trainable parameters than the proposed CRSNN.

4 Conclusion

The introduced CRSNN is a lightweight architecture that can be trained with the FEELING rules using only 20000 MNIST images and provides high accuracy. An important advantage of the FEELING as well as the STDP rules is that they are local, i.e. they use only locally accessible data (the activities and weight values of interconnected neurons). This property is believed to be the key for successful hardware realizations of learning algorithms in prospective high-performance and energy-efficient neuromorphic systems.

Acknowledgements. This work has been carried out using computing resources of the federal
collective usage center Complex for Simulation and Data Processing for Mega-science Facilities
at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/. Development of convolutional spiking
architecture and learning experiments has been supported by Russian Science Foundation grant
№. 17-71-20111, research and development of learning rules for spiking convolutional layers has
been supported by scientific grant of NRC “Kurchatov Institute” №. 1713.

References
1. Merolla, P.A., et al.: A million spiking-neuron integrated circuit with a scalable
communication network and interface. Science 345(6197), 668–673 (2014)
2. Lee, J.H., Delbruck, T., Pfeiffer, M.: Training deep spiking neural networks using
backpropagation. Front. Neurosci. 10, 508 (2016)
3. Bi, G., Poo, M.: Synaptic modifications in cultured hippocampal neurons: dependence on
spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18(24), 10464–
10472 (1998)
4. Diehl, P., Cook, M.: Unsupervised learning of digit recognition using spike-timing-
dependent plasticity. Front. Comput. Neurosci. 9, 99 (2015)
5. Kheradpisheh, S.R., Ganjtabesh, M., Thorpe, S.J., Masquelier, T.: STDP-based spiking deep
convolutional neural networks for object recognition. Neural Netw. 99, 56–57 (2018)
6. Demin, V., Nekhaev, D.: Recurrent spiking neural network learning based on a competitive
maximization of neuronal activity. Front. Neuroinf. 12, 79 (2018)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Maass, W., Bishop, C.M.: Pulsed Neural Networks, p. 275. MIT Press, Massachusetts
(1999)
9. Nekhaev, D., Demin, V.: Visualization of maximizing images with deconvolutional
optimization method for neurons in deep neural networks. Procedia Comput. Sci. 119, 174–
181 (2017)
10. O’Connor, P., Neil, D., Liu, S., Delbruck, T., Pfeiffer, M.: Real-time classification and
sensor fusion with a spiking deep belief network. Front. Neurosci. 7, 178 (2013)
11. Bo, Z., et al.: Feedforward categorization on AER motion events using cortex-like features in
a spiking neural network. IEEE Trans. Neural Netw. Learn. Syst. 26, 1963–1978 (2015)
A Method of Choosing a Pre-trained
Convolutional Neural Network for Transfer
Learning in Image Classification Problems

Alexander G. Trofimov(&) and Anastasia A. Bogatyreva

National Research Nuclear University “MEPhI” (Moscow Engineering Physics


Institute), Kashirskoye Hwy 31, Moscow 115409, Russian Federation
[email protected]

Abstract. A method of choosing a pre-trained convolutional neural network


(CNN) for transfer learning on the new image classification problem is pro-
posed. The method can be used for quick estimation of which of the CNNs
trained on the ImageNet dataset images (AlexNet, VGG16, VGG19, GoogLe-
Net, etc.) will be the most accurate after its fine tuning on the new sample of
images. It is shown that there is high correlation (q  0.74, p < 0.01) between
the characteristics of the features obtained at the output of the pre-trained CNN’s
convolutional part and its accuracy on the test sample after fine tuning. The
proposed method can be used to make recommendations for researchers who
want to apply the pre-trained CNN and transfer learning approach to solve their
own classification problems and don’t have sufficient computational resources
and time for multiple fine tunings of available free CNNs with consequent
choosing the best one.

Keywords: Image classification · Convolutional neural network · ImageNet · Transfer learning

1 Introduction

After the tremendous success of convolutional neural networks at the international


competitions ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) since
2012 more and more researchers tend to apply them to solve their image classification
problems. In each case the researcher must make a choice – either build his own CNN
from scratch or take some pre-trained model as a starting point and adapt it to solve his
problem.
The CNN design from scratch is a difficult and time-consuming task involving the
choice of sequence and types of network’s layers, number and dimension of convo-
lutional layers, parameters of convolutions, pooling, transfer functions, etc. Due to the
high computational complexity of the CNNs their design must take into account the
available computational capabilities and RAM. Many articles are dedicated to practical
recommendations for the CNN design. In [1], a method for estimating the resources
required for the CNN under design is proposed. In [2, 3], it was proposed to use
reinforcement learning and in [4] genetic algorithms in CNN design.

© Springer Nature Switzerland AG 2020


B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 263–270, 2020.
https://doi.org/10.1007/978-3-030-30425-6_31

Another approach to image classification with deep neural networks is transfer


learning [5–8]. This approach consists in choosing some pre-trained deep model as a
starting point and its additional training (fine tuning) on the new sample. The key
advantage of transfer learning in image classification is the ability to apply deep models
with small sample sizes [9].
CNNs obtained as a result of transfer learning were used in many practical appli-
cations such as image classification in medicine [10], text recognition [11], object
classification within x-ray imagery at airports [12], etc.
At the moment several dozen of CNNs trained for image classification are available
in open access (GoogLeNet, Alexnet, VGG16, VGG19, ResNet, etc.). Most of them
are trained on ImageNet dataset images [13]. For the new image classification problem
the question arises whether the transfer learning is applicable to solve it and if so,
which of the pre-trained models is best suited for this? In [14], it is noted that the
efficiency of transfer learning depends on the distance between the samples of features
formed by the CNN’s convolutional part for the original and new problems. In [15, 16],
it was shown that the features of CNN trained on ImageNet dataset can be successfully
transferred to solve a significantly different problem (biological data classification).
Despite the fact that most of the free pre-trained CNNs were trained on ImageNet
dataset images they all have different characteristics both in accuracy and in size and
hence in required computational resources and performance time. According to the
official results of the ILSVRC 2012–2015 competition there is a direct relationship
between the classification accuracy and the number of CNN layers [17]. However,
there is no reason to believe that the more complex the pre-trained network is, the more
accurate it will be for a new classification problem after fine tuning especially if the
new problem is very different from the original ImageNet problem.
In this paper we propose an approach to choose a pre-trained CNN for fine tuning
on the new sample of images. The approach is based on the estimation of pre-trained
CNN’s features separability for the considered image classification problem and the
choosing CNN with the greatest one.

2 Problem Statement

Let C_1, ..., C_M be the convolutional parts of pre-trained CNNs (AlexNet, VGG16, etc.), where M is the number of considered deep models. The convolutional part C_m receives an image x and forms the corresponding feature map at the output, which is a tensor that can be transformed into an L_m-dimensional vector z_m = (z_m1, ..., z_mL_m)^T: z_m = C_m(x), m = 1, ..., M.
The dimension L_m is determined by the CNN architecture. Let D = {(x^(i), r^(i)), i = 1, ..., n} be a sample of n labeled images, where x^(i) is the i-th image, r^(i) is the corresponding class label, r^(i) ∈ {1, ..., K}, i = 1, ..., n, and K is the number of classes.
We pose the problem of determining which of the models C_1, ..., C_M is most suitable for the transfer learning, i.e. for its fine tuning on the sample D. One of the approaches to solve this problem is exhaustive search consisting in the fine tuning of each of the models C_1, ..., C_M on the sample D and then the selection of the best

accurate model. However, this approach has an obvious drawback in its high com-
putational complexity. Fine tuning of just one model can take up to several hours or
even days on modern GPUs. At the same time, a single run of the models C_1, ..., C_M on the sample D is a much less computationally expensive procedure.
In this paper we propose an approach based on the estimation of the separability of
features formed by the models C_1, ..., C_M. It is assumed that the more separable the data
observed at the output of the pre-trained CNN’s convolutional part for some sample of
images, the more accurate the CNN will be after its fine tuning on this sample. This
assumption is based on the fact that the fully connected CNN’s layers located after the
convolutional part tend to adapt mostly during fine tuning and the CNN’s convolu-
tional layers particularly the earlier ones adapt their weights much less or insignifi-
cantly [18]. In other words, the accuracy of the CNN after fine tuning is highly
determined by the quality of the features formed by the pre-trained CNN’s convolution
part.
Let D_m = {(z_m^(i), r^(i)), i = 1, ..., n} be the labeled sample of CNN's features, where the L_m-dimensional vector z_m^(i) = C_m(x^(i)) is obtained at the output of the CNN's convolutional part C_m, m = 1, ..., M, as a result of its simulation on the image x^(i) from sample D. We estimate the separabilities c_1, ..., c_M of the data in samples D_1, ..., D_M and select the model that is characterized by the highest separability. This model is assumed to be the most suitable for transfer learning as soon as it has a priori the most efficient features among the considered pre-trained CNNs for the given image classification problem.
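For illustration, the feature vectors z_m = C_m(x) can be obtained from a pre-trained network in Keras as sketched below; VGG16 is used only as an example of one model C_m, and the global-average pooling used to flatten the output tensor into a vector is an assumption.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Convolutional part C_m of a pre-trained CNN, without the fully connected head.
conv_part = VGG16(weights='imagenet', include_top=False, pooling='avg')

def extract_features(images):
    """images: float array of shape (n, 224, 224, 3); returns the (n, L_m) feature matrix."""
    return conv_part.predict(preprocess_input(np.copy(images)))
```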

3 Estimation of CNN’s Features Quality

A direct method of estimating the data separability is to train some classifier on this
data. Accuracy of the trained classifier on the test sample will be the measure of data
separability. The CNN’s fully connected layers can be chosen as such classifiers. Its
drawback is dependency on the initial weights, training method and its hyperparam-
eteres, high computational cost and possible overfitting. In this regard we will use
robust and high-speed indirect estimation methods.
Existing metrics for data separability are usually based on the assumption that the
data of one class are spatially close while the classes themselves are far from each
other, i.e. that classes form clusters. Thus, some cluster indices are used (the Dunn index, the Davies–Bouldin index, etc.) as a measure of class separability [19]. But in practice the
assumption of class compactness can be violated.
We propose a “naive” method for assessing the quality of CNN’s features. Its idea
is to assess the separability for each feature independently and construct the overall
index of separability based on separabilities of single features.
It is known that the binary separability of one-dimensional data is characterized by the ROC-curve, and the ROC AUC can be used as a separability measure. The micro-averaged and macro-averaged ROC AUC are generalizations of ROC AUC to multiclass data [20]. Due to the fact that in practice these multiclass measures are usually very similar to each other, we use the macro-averaged ROC AUC as the simpler one to calculate.

Let z_jm^(1), ..., z_jm^(n) be the sample obtained at the j-th output of model C_m, j = 1, ..., L_m, m = 1, ..., M, as a result of its simulation on the images x^(1), ..., x^(n) from sample D. This sample is characterized by the macro-averaged ROC AUC a_jm calculated using the corresponding class labels r^(1), ..., r^(n). Thus, the outputs of model C_m are characterized by a vector of macro-averaged ROC AUCs a_m = (a_{1m}, ..., a_{L_m m})^T, m = 1, ..., M. The overall quality measure of the model C_m's features is some function f of the vector a_m: c_m = f(a_m), m = 1, ..., M.
It is argued that the model with the highest quality measure will be the most accurate after fine tuning on sample D. In order for this statement to be valid, it is necessary to find a transformation f that maximizes the correlation ρ between the model's quality measure and its accuracy after fine tuning:

ρ = corr((c_1, ..., c_M), (p_1, ..., p_M)) → max over f,    (1)

where p_m is the accuracy of the m-th CNN after fine tuning on the sample D, m = 1, ..., M.
Problem (1) is a variational problem in the space of functions. Its exact solution can be very difficult. We calculate some statistics (in particular, the mean, variance, etc. of the elements of the vector a_m) as the quality c_m and choose the statistic that provides the maximum correlation ρ. The statistics used in this paper are discussed in Sect. 4.
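One plausible way to compute the per-feature macro-averaged ROC AUC in Python is sketched below; the exact averaging scheme in the paper follows [20], so the one-vs-rest averaging and the symmetrization max(a, 1 − a) used here are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def macro_auc_per_feature(Z, r):
    """Z: (n, L_m) matrix of features of model C_m; r: (n,) class labels.
    Returns the vector a_m with one macro-averaged ROC AUC per feature."""
    classes = np.unique(r)
    aucs = []
    for j in range(Z.shape[1]):
        per_class = [roc_auc_score((r == k).astype(int), Z[:, j]) for k in classes]
        per_class = [max(a, 1.0 - a) for a in per_class]  # a feature may separate with either sign
        aucs.append(np.mean(per_class))
    return np.array(aucs)
```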

4 Experimental Results

We used the convolutional parts of CNNs trained on the ImageNet dataset: AlexNet, VGG19, GoogLeNet, ResNet18, etc. as the models C_1, ..., C_M; in total, M = 10 models. The sample of images (n = 5000) of handwritten digits from the MNIST test dataset [21] was used as the sample D. The number of classes is K = 10.
The features observed at the output of each model C_1, ..., C_M were calculated for the images from sample D, consequently the corresponding samples D_1, ..., D_M were constructed and the vectors a_1, ..., a_M of macro-averaged ROC AUCs were calculated. Each of the models C_1, ..., C_M was fine tuned on the training part of the MNIST sample and the test accuracies p_1, ..., p_M were obtained. The stopping criterion was to achieve 99% classification accuracy on the training sample or early stopping. To improve the reliability of the results, the training of each model was carried out 10 times and as a result 10 different samples p_1, ..., p_M were calculated. Further, for simplicity, the mean (over training processes) accuracy is understood. Figure 1 shows box-and-whisker diagrams of the samples a_{1m}, ..., a_{L_m m}, m = 1, ..., M, and the mean accuracy p_m achieved after fine tuning, m = 1, ..., M, together with its standard deviation.
It can be noted in Fig. 1 that the relationship between the statistics of the sample a_{1m}, ..., a_{L_m m} and the corresponding accuracy p_m, m = 1, ..., M, is unclear. Thus, the SqueezeNet and AlexNet networks have the best accuracies after fine tuning, but the statistical characteristics of their features' AUCs are not the highest. Moreover, the average AUC of the features formed by the convolutional part of the SqueezeNet is lower than that of all other networks. The maximum AUC (0.74) is observed for a feature formed by the DenseNet201, however this network doesn't demonstrate the best accuracy after fine tuning.

0.98

0.7
0.96

0.65

Accuracy
AUC 0.94
0.6

0.55 0.92

0.5
0.9

t18 101 et 6 1 t50 c7 c6


v3 g1 et
on resne snet len vg et20 esne netF en netF
pti re go
og se
n r x ez
ue alex
ince en ale sq
d

Fig. 1. Box-and-whisker diagrams of the samples a_{1m}, ..., a_{L_m m}, m = 1, ..., M, and accuracies achieved after fine tuning of each model C_1, ..., C_M.

Figure 2 shows scatter plots on the plane (c, p). Different statistical characteristics of the samples a_{1m}, ..., a_{L_m m}, m = 1, ..., M, were used to calculate the measures c_1, ..., c_M. The highest correlation (ρ ≈ 0.5) is observed between the fine-tuned CNN's test accuracy and the maximum AUC of its features. In addition, there is an interesting relation between the accuracy and the AUC averaged among all of a CNN's features. As the averaged AUC grows, the accuracy at first decreases and then begins to increase. At the same time, the networks with the minimum and maximum averaged AUCs (SqueezeNet and AlexNet, respectively) have almost the same classification accuracy after fine tuning (97.5%).

Fig. 2. Scatter plots on the plane (c, p). Different statistical characteristics of the samples a_{1m}, ..., a_{L_m m}, m = 1, ..., M, were used to calculate the measures c_1, ..., c_M – mean (left), standard deviation (center), maximum value (right). Each point corresponds to some CNN.

Other statistical characteristics (asymmetry coefficient, kurtosis, number of principal components, etc.) of the samples a_{1m}, ..., a_{L_m m}, m = 1, ..., M, were also calculated, but the corresponding correlation coefficient ρ for them was less than 0.5.
Choosing the CNN to fine tune based only on the greatest AUC of its features seems unreliable. It is known that a more robust statistic can be obtained as a result of some averaging. Thus, we average among the q greatest AUCs to calculate c_m:

c_m(q) = (1/q) Σ_{j=1}^{q} a_mj,   q ≤ L_m,   m = 1, ..., M,    (2)

where the AUCs a_m1, ..., a_{m L_m} are sorted in descending order. The greatest AUC is max{a_m1, ..., a_{m L_m}} = c_m(1), m = 1, ..., M.
Figure 3 shows the dependency of the correlation coefficient ρ(q) = corr(c(q), p) on the number of CNN's features q used in the averaging in (2).

Fig. 3. A plot of the correlation coefficient ρ(q) versus the number of CNN's features q used in averaging (left) and a scatter plot on the plane (c, p) for q = 100 (right).

The plot shows that the maximum correlation (ρ_max ≈ 0.74, p-value < 0.01) corresponds to the number of features q ≈ 100. It means that the accuracy of the fine-tuned network can be predicted more precisely based on the AUC averaged over the 100 best features formed by the pre-trained CNN.
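A compact sketch of the statistic (2) and of the correlation curve ρ(q), assuming that the AUC vectors a_1, ..., a_M and the fine-tuned accuracies p_1, ..., p_M have already been computed:

```python
import numpy as np

def c_m(a_m, q):
    """Equation (2): mean of the q greatest feature AUCs of one model."""
    return np.sort(a_m)[::-1][:q].mean()

def rho(q, auc_vectors, accuracies):
    """Correlation rho(q) = corr(c(q), p) over the M pre-trained models."""
    c = [c_m(a, q) for a in auc_vectors]
    return np.corrcoef(c, accuracies)[0, 1]
```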

5 Conclusion

It is shown that the accuracy of a fine-tuned CNN on the test images is strongly correlated with the AUC averaged over those features formed by the CNN's convolutional part that have the greatest discrimination capability. This makes it possible to predict the accuracy of a fine-tuned CNN on a new sample of images before carrying out the expensive fine tuning procedure.
The proposed method can be used to make recommendations for researchers who
want to apply the pre-trained CNN and transfer learning to solve their own

classification problems and don’t have sufficient computational resources for multiple
fine tunings of available free CNNs and choosing the best one.
A possible direction for further research is the construction of more precise characteristics of CNNs' features to estimate their capability for transfer learning, i.e. to more accurately predict CNNs' error after fine tuning on a new sample of images. Another and more ambitious direction is the development of a method for quickly assessing the capability of a CNN for transfer learning based only on the descriptors of the given sample of images, without the calculation and statistical analysis of the features formed by the pre-trained CNN.

References
1. Ma, N., et al.: Shufflenet v2: practical guidelines for efficient CNN architecture design. In:
Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
2. Baker, B., et al.: Designing neural network architectures using reinforcement learning. arXiv
preprint. arXiv:1611.02167 (2016)
3. Mortazi, A., Bagci, U.: Automatically designing CNN architectures for medical image
segmentation. In: International Workshop on Machine Learning in Medical Imaging, pp. 98–
106. Springer, Cham (2018)
4. Sun, Y., et al.: Automatically designing CNN architectures using genetic algorithm for
image classification. arXiv preprint. arXiv:1808.03818 (2018)
5. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image
representations using convolutional neural networks. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
6. Huang, Z., Pan, Z., Lei, B.: Transfer learning with deep convolutional neural network for
SAR target classification with limited labeled data. Remote Sens. 9(9), 907 (2017)
7. Weiss, K., Khoshgoftaar, T.M., Wang, D.: A survey of transfer learning. J. Big data 3(1), 9
(2016)
8. Kulik, S.: Neural network model of artificial intelligence for handwriting recognition.
J. Theor. Appl. Inf. Technol. 73(2), 202–211 (2015)
9. Larsen-Freeman, D.: Transfer of learning transformed. Lang. Learn. 63, 107–129 (2013)
10. Ghafoorian, M., et al.: Transfer learning for domain adaptation in MRI: application in brain
lesion segmentation. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention, pp. 516–524. Springer, Cham (2017)
11. Tang, Y., Peng, L., Xu, Q., Wang, Y., Furuhata, A.: CNN based transfer learning for
historical Chinese character recognition. In: 2016 12th IAPR Workshop on Document
Analysis Systems (DAS), pp. 25–29 (2016)
12. Akcay, S., et al.: Transfer learning using convolutional neural networks for object
classification within x-ray baggage security imagery. In: 2016 IEEE International
Conference on Image Processing (ICIP), pp. 1057–1061 (2016)
13. ImageNet. http://www.image-net.org
14. Yosinski, J., et al.: How transferable are features in deep neural networks? In: Advances in
Neural Information Processing Systems, pp. 3320–3328 (2014)
15. Zhang, W., et al.: Deep model based transfer and multi-task learning for biological image
analysis. IEEE Trans. Big Data 99, 1 (2016)

16. Trofimov, A.G., Velichkovskiy, B.M., Shishkin, S.L.: An approach to use convolutional
neural network features in eye-brain-computer-interface. In: International Conference on
Neuroinformatics, pp. 132–137. Springer, Cham (2017)
17. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
18. Reyes, A.K., Caicedo, J.C., Camargo, J.E.: Fine-tuning deep convolutional networks for
plant recognition. CLEF (Working Notes), p. 1391 (2015)
19. Desgraupes, B.: Clustering Indices. University of Paris Ouest-Lab Modal’X, Paris (2013)
20. Tsoumakas, G., Vlahavas, I.: Random k-labelsets: an ensemble method for multilabel
classification. In: European Conference on Machine Learning, pp. 406–417. Springer,
Berlin, Heidelberg (2007)
21. The MNIST database. http://yann.lecun.com/exdb/mnist/
The Usage of Grayscale or Color Images
for Facial Expression Recognition
with Deep Neural Networks

Dmitry A. Yudin1(&) , Alexandr V. Dolzhenko2 ,


and Ekaterina O. Kapustina2
1
Moscow Institute of Physics and Technology (National Research University),
Institutsky Per. 9, Dolgoprudny, Moscow Region 141700, Russia
[email protected]
2
Belgorod State Technological University named after V.G. Shukhov,
Kostukova Str. 46, Belgorod 308012, Russia

Abstract. The paper describes usage of modern deep neural network archi-
tectures such as ResNet, DenseNet and Xception for the classification of facial
expressions on color and grayscale images. Each image may contain one of
eight facial expression categories: “Neutral”, “Happiness”, “Sadness”, “Sur-
prise”, “Fear”, “Disgust”, “Anger”, “Contempt”. As the dataset was used
AffectNet. The most accurate architecture is Xception. It gave classification
accuracy on training sample 97.65%, on cleaned testing sample 57.48% and top-
2 accuracy on cleaned testing sample 76.70%. The category “Contempt” is
worst recognized by all the types of neural networks considered, which indicates
its ambiguity and similarity with other types of facial expressions. Experimental
results show that for the considered task it does not matter, the color or grayscale
image is fed to the input of the algorithm. This fact can save a significant amount
of memory when storing data sets and training neural networks. The computing
experiments was performed using graphics processor using NVidia CUDA
technology with Keras and Tensorflow deep learning frameworks. It showed
that the average processing time of one image varies from 4 ms to 30 ms for
different architectures. Obtained results can be used in software for neural
network training for face recognition systems.

Keywords: Image recognition · Classification · Facial expression · Emotion · Face · Deep learning · Convolutional neural network

1 Introduction

Currently, significant progress has been made in creating efficient image recognition
algorithms based on the use of deep neural networks [1–3]. As a rule, such algorithms
require the presence of a large number of images that are obtained in different lighting
and noise conditions. They need huge amounts of memory for storage as well as for
training. There are subject areas for which it is advisable to study the possibility of
using grayscale images instead of color during training of recognition algorithms. This
can reduce by three times the need for RAM or hard disk space.

© Springer Nature Switzerland AG 2020


B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 271–281, 2020.
https://doi.org/10.1007/978-3-030-30425-6_32

One of these areas is the task of person’s facial expression recognition. The stan-
dard for determining the type of facial expressions is the Emotional facial action coding
system (EMFACS-7), proposed by Friesen and Ekman in 1983 [4]. This generally
accepted standard identifies seven basic types of emotions: (1) anger, (2) contempt,
(3) disgust, (4) fear, (5) happiness, (6) sadness, (7) surprise. Additionally, it is con-
sidered a neutral facial expression.
At the initial stage, methods for emotions recognition on human face images were
associated with manual selection of features: Gabor wavelets [5], local binary patterns
[6], geometric deformation features on image sequences [7], 3D Surface Features [8],
etc. Modern approaches are based on the automatic generation of image features based
on deep convolutional neural networks, some of them use the prior alignment technique
[9], some recognize facial expressions on images as they are [1, 10], including tuning
of pre-trained networks on the task of face identification [11]. Also deep spatial-
temporal networks were proposed for emotion recognition on video sequences [12].
Deep learning is also actively used to analyze facial expressions from face three-
dimensional model [13].
There are a number of commercial services that implement proprietary emotion recognition methods: Face API from Microsoft Azure [14], Amazon Emotion API [15], Affectiva Emotion SDK [16], etc.
However, the recognition of facial expressions on images in complex conditions of
variable light, noise, uncomfortable perspective is still an important topic for further
research.
For the study of approaches based on neural networks, there are many different data
sets that differ in shooting conditions, the variety of people photographed, and the
number of images per class. Some popular datasets and their features are listed in
Table 1:
– Cohn-Kanade AU-Coded Expression Database [17] (CK), it is shown the statistics
for the case with the first two and last two frames of image sequences from
database,
– The Japanese Female Facial Expression Database [5] (JAFFE),
– Facial Expression Recognition Challenge [18] (FER2013),
– Facial expressions Repository [19] (FE),
– SoF dataset [20] (SoF),
– AffectNet [21].
The largest of them is AffectNet dataset (a total of more than 1 million images). In
addition to manually labeled data, it contains automatically annotated images that
researchers or developers can label and check on their own if necessary.
This paper discusses the issue of facial expression recognition on static images
using modern deep learning methods, as well as choosing the format of the input data.
On the one hand, color images provide additional information about a person’s face, on
the other hand, using grayscale images reduces the effect of shooting conditions: light
level, type of light source, etc. To choose one of these forms of image representation, it
is necessary to conduct experiments with various architectures of neural networks with
different sizes of input images.

Table 1. Datasets for facial expression recognition

Database: CK | JAFFE | FER2013 | FE | SoF | AffectNet
Image size: 640 × 490 – 720 × 480 | 256 × 256 | 48 × 48 | 23 × 29 – 355 × 536 | 640 × 480 | 129 × 129 – 4706 × 4706
Image style: Portrait | Portrait | Cropped face | Cropped face | Portrait | Cropped face
Image type: Grayscale, color | Grayscale | Grayscale | Grayscale, color | Color | Color
Facial expression categories:
Neutral: 324 | 30 | 6194 | 6172 | 667 | 75374
Happy: 138 | 31 | 8989 | 5693 | 1042 | 134915
Sad: 56 | 31 | 6077 | 220 | 237 (sad/anger/disgust) | 25959
Surprise: 166 | 30 | 4002 | 364 | 145 (surprise/fear) | 14590
Fear: 50 | 32 | 5121 | 21 | 0 | 6878
Disgust: 118 | 29 | 547 | 208 | 0 | 4303
Anger: 90 | 30 | 4953 | 240 | 0 | 25382
Contempt: 36 | 0 | 0 | 9 | 0 | 4250
Total: 978 | 213 | 35883 | 12927 | 2091 | 291651

2 Task Formulation

In this paper we will solve the task of determining one of the eight facial expression
categories (“Neutral”, “Happiness”, “Sadness”, “Surprise”, “Fear”, “Disgust”, “Anger”,
“Contempt”) on grayscale or color images with cropped faces, see Fig. 1.
We have taken a modern and the biggest open-source dataset – AffectNet, which contains 287651 images as the training sample and 4000 images (500 images per class) as the testing sample [21]. The samples include images of different sizes, from 129 × 129 to 4706 × 4706 pixels, obtained from different cameras in different shooting conditions.


Fig. 1. Examples of labeled images with facial expressions from AffectNet Dataset: 0 – Neutral,
1 – Happiness, 2 – Sadness, 3 – Surprise, 4 – Fear, 5 – Disgust, 6 – Anger, 7 – Contempt

To solve the task it is necessary to develop various variants of deep neural network
architectures and to test them on the available data set with 1-channel (grayscale) and
3-channel (color) image representation. We must determine which image representation
is best used for the task of facial expression recognition. Also, we need to select the
best architecture that will provide best performance and the highest quality measures of
image classification: accuracy, precision and recall [22].

3 Dataset Preparation

AffectNet [21] was chosen as the main dataset, which is one of the largest modern
datasets for facial expression recognition. However, it contains relatively few images
for the “Fear”, “Disgust” and “Contempt” categories compared to other categories. To
conduct experiments for learning neural networks, augmentation of images was carried
out, and a balanced training sample was formed with 10,000 images per class.
For image augmentation we have used 5 sequential steps (a code sketch of this pipeline is given after the list):
1. Coarse Dropout – setting rectangular areas within images to zero. We have generated a dropout mask at 2 to 25 percent of the image's size. In that mask, 0 to 2 percent of all pixels were dropped (random per image).
2. Affine transformation – image rotation by a random angle from −15 to 15 degrees.
3. Flipping of the image along the vertical axis with 0.9 probability.
4. Addition of Gaussian noise to the image with a standard deviation of the normal distribution from 0 to 15.
5. Cropping away (cutting off) a random number of pixels on each side of the image, from 0 to 10% of the image height/width.
Results of this augmentation procedure are shown on Fig. 2.

Fig. 2. Examples of augmented images for Training sample 2 (balanced)

As with most open-source datasets, AffectNet contains wrong ground truth labels for cropped faces (Fig. 3). We have cleaned the testing sample for a more correct evaluation of the classifiers. As a result we created Testing sample 2, which has 3210 images.


Fig. 3. Examples of wrong ground truth labels in testing sample of AffectNet Dataset

Details of the datasets used in this research are given in Table 2.

Table 2. Training and testing samples of used dataset

Facial expression category | Training sample 1 | Training sample 2 (balanced) | Testing sample 1 | Testing sample 2 (cleaned)
0 - Neutral | 74874 | 10000 | 500 | 490
1 - Happiness | 134415 | 10000 | 500 | 451
2 - Sadness | 25459 | 10000 | 500 | 473
3 - Surprise | 14090 | 10000 | 500 | 453
4 - Fear | 6348 | 10000 | 500 | 477
5 - Disgust | 3803 | 10000 | 500 | 359
6 - Anger | 24882 | 10000 | 500 | 351
7 - Contempt | 3749 | 10000 | 500 | 156
Total | 287621 | 80000 | 4000 | 3210

4 Classification of Emotion Categories Using Deep


Convolutional Neural Networks

In this paper, to solve the formulated task we investigate the application of deep convolutional neural networks of three architectures:
– ResNetM architecture, inspired by ResNet [23] and implemented by the authors in previous works [24]. It has an input tensor of 120 × 120 × 3 for color images and 120 × 120 × 1 for grayscale images. Its structure is shown in Fig. 4 and contains 3 convolutional blocks, 5 identity blocks, 2 max pooling layers, 1 average pooling layer and one output dense layer. The first 11 layers and blocks provide automatic feature extraction, and the last fully connected layer allows us to find one of the eight image classes corresponding to the input image. The ResNetM net was trained on the full Training sample 1.

Fig. 4. ResNetM architecture.

– DenseNet architecture is based on the DenseNet169 model [25] with an input tensor of 224 × 224 × 3 for color images and 224 × 224 × 1 for grayscale images. Its structure uses alternating Dense and Transition blocks (Fig. 5). A dataset from Training sample 2 containing 4000 images per class was prepared for DenseNet training.

Fig. 5. DenseNet architecture.

– Xception architecture [26] with the input tensor changed to 120 × 120 × 3 for color images and 120 × 120 × 1 for grayscale images. This structure is a development of Inception [27] and is based on prospective separable convolution blocks (see Fig. 6). The Xception net was trained on the balanced Training sample 2.

Fig. 6. Xception architecture.

The output layer in all architectures has 8 neurons with the “Softmax” activation function. All input images are pre-scaled to a size of 60 × 60 pixels for the ResNetM architecture, 120 × 120 pixels for the Xception architecture and 224 × 224 pixels for DenseNet169. The neural networks work with color (three-channel) and grayscale (one-channel) images. To train the neural networks we have used the “categorical crossentropy” loss function and Stochastic Gradient Descent (SGD) as the training method with a 0.001 learning rate. Accuracy is used as the classification quality metric during training. Each batch consists of 5 images.
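A hedged Keras sketch of this training configuration, using the Xception variant with a 120 × 120 × 1 input as an example (the way the stock architecture was adapted by the authors is not reproduced exactly; x_train/y_train and x_val/y_val stand for the prepared image arrays and one-hot labels):

```python
from tensorflow.keras.applications import Xception
from tensorflow.keras.optimizers import SGD

# Xception with a grayscale 120x120 input and 8 output categories.
model = Xception(weights=None, input_shape=(120, 120, 1), classes=8)

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.001),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=5, epochs=50,
          validation_data=(x_val, y_val))
```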
The training process of the deep neural networks is shown in Fig. 7. The training experiment was carried out for 50 learning epochs using our developed software tool implemented in the Python 3.5 programming language with the Keras + Tensorflow frameworks [28]. We can see that the DenseNet and Xception networks have similar speed and accuracy, while ResNetM achieves much lower accuracy rates on the test samples compared to them.
The calculations were performed using NVidia CUDA technology on the graphics processor of a GeForce GTX 1060 graphics card with 6.00 GB of memory, an Intel Core i5-8300H central processor (4 cores at 2.3 GHz) and 24 GB RAM.

Fig. 7. Training of deep neural networks with ResNetM, DenseNet and Xception architectures.

Table 3 shows the results of the facial expression recognition on the training and test samples with color or grayscale images using the ResNetM, DenseNet169 and Xception architectures.
Analysis of the obtained results shows that the highest accuracy on all samples is achieved by the Xception architecture with grayscale input images: 97.65% on the training sample, 57.48% on testing sample 2 and a top-2 accuracy of 76.70%. It also has the greatest and most balanced values of precision and recall for almost all categories (classes) of facial expression except for “Anger” and “Contempt”.
ResNetM is significantly faster than all other architectures: about 4 ms for processing a single image against 12 ms for Xception and 30 ms for DenseNet. Also, this architecture has the highest recognition recall for the “Happiness” category.
DenseNet surpasses all other architectures in “Anger” category recognition and is better in terms of recognition recall of the “Fear” and “Contempt” categories. Also it has the highest precision for the “Neutral” category.
The category “Contempt” is poorly recognized by all the types of neural networks considered, which speaks primarily of its ambiguity and similarity with other types of facial expressions, in particular “Neutral”.

Table 3. Quality of facial expression recognition on AffectNet Dataset

Metric | ResNetM (color) | ResNetM (grayscale) | DenseNet (color) | DenseNet (grayscale) | Xception (color) | Xception (grayscale)
Accuracy on train sample | 0.9283 | 0.9139 | 0.9168 | 0.9428 | 0.9686 | 0.9765
Accuracy on test sample 2 | 0.4844 | 0.4781 | 0.5520 | 0.5427 | 0.5654 | 0.5748
Top-2 acc. on test sample 2 | 0.6748 | 0.6766 | 0.7467 | 0.7371 | 0.7355 | 0.7670
Classif. time per image, s | 0.0042 | 0.0047 | 0.0305 | 0.0299 | 0.0120 | 0.0123
Weights number | 2613392 | 2607120 | 12656200 | 12649928 | 20877872 | 20877296
Size of model, Mb | 10.654 | 10.629 | 51.933 | 51.908 | 83.826 | 83.823
Size of train sample on HDD, Mb | 26373.7 | 9520.9 | 3432.7 | 1210.6 | 7624.1 | 2670.2
Size of train sample in operative memory, Mb | 12425.2 | 4141.7 | 19267.6 | 6422.5 | 13824.0 | 4608.0
Quality metrics on test sample 2 (cleaned):
Neutral (0): precision | 0.375 | 0.4083 | 0.4838 | 0.5644 | 0.5422 | 0.5223
Neutral (0): recall | 0.6061 | 0.5000 | 0.5490 | 0.3755 | 0.4592 | 0.5735
Happiness (1): precision | 0.5214 | 0.5325 | 0.7363 | 0.7701 | 0.7363 | 0.7973
Happiness (1): recall | 0.9468 | 0.9268 | 0.7428 | 0.7428 | 0.8049 | 0.7849
Sadness (2): precision | 0.5184 | 0.4103 | 0.6070 | 0.5617 | 0.5221 | 0.6099
Sadness (2): recall | 0.4165 | 0.5370 | 0.4735 | 0.4715 | 0.6490 | 0.4693
Surprise (3): precision | 0.4810 | 0.4708 | 0.4977 | 0.5177 | 0.5455 | 0.5000
Surprise (3): recall | 0.3907 | 0.3377 | 0.4966 | 0.5475 | 0.5033 | 0.6137
Fear (4): precision | 0.5880 | 0.5951 | 0.5867 | 0.5665 | 0.6181 | 0.5864
Fear (4): recall | 0.3501 | 0.3542 | 0.5744 | 0.5898 | 0.5597 | 0.5765
Disgust (5): precision | 0.6510 | 0.5679 | 0.5287 | 0.5600 | 0.5912 | 0.6655
Disgust (5): recall | 0.2702 | 0.3259 | 0.6156 | 0.5070 | 0.5599 | 0.5097
Anger (6): precision | 0.4645 | 0.4550 | 0.5552 | 0.4456 | 0.4941 | 0.4802
Anger (6): recall | 0.5413 | 0.4900 | 0.4587 | 0.5954 | 0.4758 | 0.5869
Contempt (7): precision | 0.3333 | 0.5384 | 0.3103 | 0.2827 | 0.3065 | 0.3407
Contempt (7): recall | 0.0192 | 0.0448 | 0.4038 | 0.5128 | 0.3654 | 0.2949

As for the size of the networks, the smallest amount of memory is occupied by the weights of ResNetM (about 10.6 MB) and the largest by the weights of the Xception network (83.8 MB).
For all considered types of neural networks, the representation of the input images in grayscale or color format did not lead to any significant difference in the values of the metrics: accuracy, top-2 accuracy, processing time per image, and number of weights. Thus, it can be concluded that for the facial expression recognition task it does not matter whether a color or grayscale image is fed to the algorithm. This fact can save a significant amount of memory when storing datasets (about 65% of HDD space) and training neural networks (about 67% of operative memory).

5 Conclusions

It follows from Table 3 that the applied deep neural network architectures for facial expression recognition on the AffectNet dataset show high quality indicators on the training set, but significantly worse results on the testing sample. This can be explained by the ambiguity of certain emotions on a person's face, the variety of shooting angles and the presence of conflicting data in the training sample. The most accurate architecture is Xception. It gave a classification accuracy of 97.65% on the training sample, 57.48% on testing sample 2 and a top-2 accuracy of 76.70% on testing sample 2.
The category “Contempt” is worst recognized by all the types of neural networks considered, which indicates its ambiguity and similarity with other types of facial expressions.
Experimental results show that for the considered task it does not matter whether a color or grayscale image is fed to the input of the algorithm. This fact can save a significant amount of memory when storing data sets and training neural networks.
An important aspect for the further application of the considered approaches is the average classification time per image. It varies from 4 ms for ResNetM to 30 ms for DenseNet. This suggests that all the described approaches can be integrated into real-time face recognition software.
For further studies on the paper topic it is necessary to expand the training and test samples to cover more images in the “Fear”, “Disgust” and “Contempt” categories. Also, it will be promising to explore emotion recognition on images with face alignment based on key points, in order to reduce the impact of the choice of bounding box by face detection algorithms.

Acknowledgment. The research was made possible by Government of the Russian Federation
(Agreement №. 075-02-2019-967).

References
1. Zeng, N., Zhang, H., Song, B., Liu, W., Li, Y., Dobaie, A.M.: Facial expression recognition
via learning deep sparse autoencoders. Neurocomputing 273, 643–649 (2018)
2. Yudin, D., Knysh, A.: Vehicle recognition and its trajectory registration on the image
sequence using deep convolutional neural network. In: The International Conference on
Information and Digital Technologies, pp. 435–441 (2017)
3. Yudin, D., Naumov, A., Dolzhenko, A., Patrakova, E.: Software for roof defects recognition
on aerial photographs. J. Phys. Conf. Ser. 1015(3), 032152 (2018)
4. Friesen, W., Ekman, P.: EMFACS-7: emotional facial action coding system. Unpublished
Manuscript Univ. Calif. San Francisco 2(36), 1 (1983)
5. Lyons, M.J., Akemastu, S., Kamachi, M., Gyoba, J.: Coding facial expressions with gabor
wavelets. In: 3rd IEEE International Conference on Automatic Face and Gesture
Recognition, pp. 200–205 (1998)
6. Shan, C., Gong, S., McOwan, P.W.: Facial expression recognition based on local binary
patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
7. Kotsia, I., Pitas, I.: Facial expression recognition in image sequences using geometric
deformation features and support vector machines. IEEE Trans. Image Process. 16(1), 172–
187 (2006)

8. Wang, J., Yin, L., Wei, X., Sun, Y.: 3D facial expression recognition based on primitive
surface feature distribution. In: IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR 2006) (2006)
9. Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression
recognition with convolutional neural networks: coping with few data and the training
sample order. Pattern Recogn. 61, 610–628 (2017)
10. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition
using deep neural networks. In: IEEE Winter Conference on Applications of Computer
Vision (WACV) (2016)
11. Ding, H., Zhou, S.K., Chellappa, R.: FaceNet2ExpNet: regularizing a deep face recognition
net for expression recognition. In: 12th IEEE International Conference on Automatic Face &
Gesture Recognition (FG 2017) (2017)
12. Zhang, K., Huang, Y., Du, Y., Wang, L.: Facial expression recognition based on deep
evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017)
13. Zhang, T., Zheng, W., Cui, Z., Zong, Y., Yan, J., Yan, K.: A deep neural network-driven
feature learning method for multi-view facial expression recognition. IEEE Trans.
Multimedia 18(12), 2528–2536 (2016)
14. Face API from Microsoft Azure. https://azure.microsoft.com/ru-ru/services/cognitive-services/
face/#detection. Accessed 26 May 2019
15. Amazon Emotion API. https://docs.aws.amazon.com/rekognition/latest/dg/API_Emotion.
html. Accessed 26 May 2019
16. Affectiva Emotion SDK. https://www.affectiva.com/product/emotion-sdk/. Accessed 26
May 2019
17. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended
cohn-kanade dataset (CK+): a complete expression dataset for action unit and emotion-
specified expression. In: Proceedings of the Third International Workshop on CVPR for
Human Communicative Behavior Analysis (CVPR4HB 2010), pp. 94–101 (2010)
18. Carrier, P.-L., Courville, A.: Challenges in representation learning: facial expression
recognition challenge (2013). https://www.kaggle.com/c/challenges-in-representation-
learning-facial-expression-recognition-challenge/data. Accessed 26 May 2019
19. Facial expressions. A set of images for classifying facial expressions. https://github.com/
muxspace/facial_expressions. Accessed 26 May 2019
20. Afifi, M., Abdelhamed, A.: AFIF4: deep gender classification based on an AdaBoost-based
fusion of isolated facial features and foggy faces. J. Vis. Commun. Image Represent. 62, 77–
86 (2019)
21. Mollahosseini, A., Hasani, B., Mahoor, M.H.: AffectNet: a database for facial expression,
valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2017)
22. Olson, D.L., Delen, D.: Advanced Data Mining Techniques, 1st edn. Springer, Cham (2008)
23. Kaiming, H., Xiangyu, Z., Shaoqing, R., Jian, S.: Deep residual learning for image
recognition. ECCV. arXiv:1512.03385 (2015)
24. Yudin, D., Kapustina, E.: Deep learning in vehicle pose recognition on two-dimensional
images. Adv. Intell. Syst. Comput. 874, 434–443 (2019)
25. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. CVPR 2017. arXiv:1608.06993 (2017)
26. Chollet, F.: Xception: deep learning with depthwise separable convolutions. CVPR 2017.
arXiv:1610.02357 (2017)
27. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception
architecture for computer vision. ECCV. arXiv:1512.00567 (2016)
28. Chollet, F.: Keras: deep learning library for theano and tensorflow. https://keras.io/.
Accessed 26 May 2019
Applications of Neural Networks
Use of Wavelet Neural Networks to Solve
Inverse Problems in Spectroscopy
of Multi-component Solutions

Alexander Efitorov1(&) , Sergey Dolenko1 ,


Tatiana Dolenko1,2 , Kirill Laptinskiy1,2, and Sergey Burikov1,2
1
D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov
Moscow State University, Moscow 119991, Russia
[email protected], [email protected]
2
Physical Department, M.V. Lomonosov Moscow State University,
Moscow 119991, Russia

Abstract. Wavelet neural networks (WNN) are a family of approximation


algorithms that use wavelet functions to decompose the approximated function.
They are more flexible than conventional multi-layer perceptrons (MLP), but
they are more computationally expensive, and require more effort to find optimal
parameters. In this study, we solve the inverse problems of determination of
concentrations of components in multi-component solutions by their Raman
spectra. The results demonstrated by WNN are compared to those obtained by
MLP and by the linear partial least squares (PLS) method. It is shown that
properly used WNN are a powerful method to solve multi-parameter inverse
problems.

Keywords: Wavelet neural networks · Inverse problems · Raman spectroscopy · Partial least squares · Multi-layer perceptron

1 Introduction

The inverse problems of determination of concentrations of components in multi-


component solutions by processing optical spectra have been successfully solved using
multilayer perceptrons (MLP) [1–3]. It should be noted that such a solution is based on
the properties of the MLP as a universal approximator, and the resulting solution is an
approximation of a multi-parameter inverse function that maps the spectrum to the set
of determined parameters of the problem. Approximation is carried out by decom-
posing the approximated function over the basis of transfer functions of the hidden
layer of MLP, with the adjustment of the parameters of the basis functions in the
process of network training. Initially, all basis functions have approximately the same
parameters, including the same characteristic scale in the space of input features. With a
sufficient amount of training patterns, the parameters will eventually be adjusted
optimally, in particular, will take into account the characteristics of the data – for
example, the width of the spectral bands characteristic of certain components of the
object of study. However, for real spectroscopic problems, the amount of available data
is usually too small (at best, several thousand patterns, with the dimension of the
input data of the order of hundreds). This means that when training the network, it is
almost inevitable that a local (rather than global) minimum of error functional will be
found, even if it is deep enough, and the solution found will be only quasi-optimal.
A possible solution of the problem is such a change in the decomposition basis, in
which the functions of both the same and multiple different scales in the space of input
features are already initially present in the basis. Such a basis is provided by wavelet
neural networks (WNN) [4]. At the same time there is reason to believe [5, 6], that in
the transition to the wavelet basis the form of the error functional will change in such a
way that the local minima will become deeper and will approach in their depth to the
global minimum, which will lead to a decrease in the average error of the approxi-
mation solution of the desired problem of modeling the inverse function. In this case,
the network will be able to work more efficiently with data that includes simultaneously
spectral bands of multiple different widths.
The classical approach to the formation and training of WNN has already been
worked out in detail, due to the fact that historically it appeared earlier. In particular,
the algorithm of error backpropagation by the method of stochastic gradient descent
(SGD) was used for training [7], as well as its combinations with the least squares
method [8], Kalman filter [9], genetic algorithms (GA) [10]. In addition to a com-
parative analysis of the use of GA and SGD backpropagation, it is interesting to
consider new popular methods of optimization Adam [11] and AdaGrad [12]. This
analysis should be carried out for WNN with linear and nonlinear activation functions
in the output layer, as well as with various families of wavelets.

2 Statement of Inverse Problem and Experimental Data

The inverse problem (IP) considered in this study is determination of types and con-
centrations of components in multi-component water solution of inorganic salts by
Raman spectra.
The possibility, in principle, of solving this problem is due to the fact that the bands of the Raman spectrum of an aqueous solution are very sensitive to the presence of dissolved salts/ions in it (Fig. 1). Complex ions (SO42− – sulphates, NO3− – nitrates,
CO32− – carbonates etc.) have their own Raman lines in the region of 500–1500 cm−1
(the area of the so-called “fingerprints”), which makes it possible to uniquely determine
the type of ion and its concentration. The presence of simple ions that do not have their
own Raman lines, however, is also manifested in the Raman spectra of aqueous
solutions. Ions such as Cl−, I−, Br−, Na+, K+ etc. affect the shape and position of the
most intense band of the spectrum – the band of stretching vibrations of water mole-
cules in the region 3000–4000 cm−1 [13–15]. Different ions have different effects. In
addition, the behavior of the quantitative characteristics of Raman spectral bands of
water depends significantly on the state of the solutes: the presence of associates,
contact and non-contact ion pairs, etc. in the solution also appears in its spectrum.
This problem is inherently a complex IP. This nature of the task involves deter-
mining the concentration of a large set of simultaneously dissolved substances in a
wide range of concentrations – from tenths to units of mole per liter. Such tasks are

Fig. 1. Raman spectra of distilled water and of multi-component water solutions of inorganic
salts (left – “fingerprint” area, right – Raman valence band of water). 1 – distilled water, 2 –
KNO3 – 0.6M, Li2SO4 – 0.75 M; 3 – NaCl-0.5 M, NH4Br – 1.75 M, CsI – 0.25 M; 4 – NaCl –
0.2 M, NH4Br – 0.2 M, Li2SO4 – 0.4 M, KNO3 – 1 M, CsI – 0.6 M.

relevant in the diagnosis of wastewater and process water, mineral water, sea and river
reservoirs. It is obvious that the components of the solution interact both with the
solvent molecules and with each other, and these interactions are of a complex non-
linear nature. Formation of associates, ion pairs, etc. is also possible. This leads to the
fact that it is impossible to create a model that adequately describes the molecular
interactions in the solution. In addition, it should be borne in mind that the information
content of different spectral channels varies. The spectral regions in which the lines of
complex ions and the valence band of water are located are obviously the ones most
sensitive to the type and concentration of dissolved substances. The area of the
deformation band of water (1600–1700 cm−1) and the area of the associative band
(2000–2400 cm−1) are much less informative. The presence of the above factors leads
to the fact that the dependence of the signal intensity in different spectral channels on
the concentration of solutes is significantly nonlinear. The situation is complicated by
the fact that the spectral bands that need to be analyzed simultaneously differ signifi-
cantly from each other both in intensity (for example, the valence band of water is
about 100 times more intense than the deformation band) and in width (for example,
the width of the lines of nitrate anions is units of cm−1, and the width of the valence
band of water at half-height is about 500 cm−1). In addition, the specificity of spec-
troscopic methods from the point of view of data processing is such that it implies the
solution of inverse problems for extracting the necessary information from high-
dimensional data, since the recorded spectra contain thousands of channels.
Previous experience of the authors showed that such multi-parameter IP are quite
effectively solved with the help of MLP. The developed methods, as well as the use of a
number of methods to reduce the dimension of the input data, allowed simultaneous
determination of the concentrations of 5 salts in water – NaCl, NH4Br, Li2SO4, KNO3,
CsI – with an average error of 0.02 M in concentration measurement when operating in the concentration range 0…2.5 M [1]. (Here we shall call this IP “5 salts problem”.)
However, at present, many applications require greater accuracy of salt identification
and determination of their concentration in multicomponent media, while increasing
the number of components in solutions, for example, diagnostics of process and
wastewater. Therefore, it is necessary to develop new methods and approaches that will
take into account the increasing complexity of interactions in solutions and the specifics
of spectroscopic methods.
The 5 salts problem provided the determination of individual concentrations of
components in a solution of inorganic salts, implying the component salt as a whole
(cation and anion together). It is clear that in the solution at concentrations far from the
solubility limit of the salt, it is in a completely dissociated state, and cations and anions
exist in the solution independently of each other and have an independent effect on the
Raman spectrum. That is, it is correct to understand as the component of the solution not
the salt as a whole, but a particular ion. Thus, a more correct formulation of the problem
is the identification and determination of the concentration of individual ions in multi-
component solutions. In this case, the number of components to be determined in
solutions increases dramatically. Moreover, the number/molarity of one ion does not
always correspond exactly to the molarity of the counter-ion, as in the case of 5 salts
problem, when salts with non-repeating ions were dissolved. Therefore, the problem of
diagnostics of multicomponent ion solutions is much more complicated. This second IP
(for solutions of 10 salts with repeated 10 ions Na+, NH4+, Li+, K+, Cs+, Cl−, Br−, SO42−,
NO3−, I−, we shall call it “10 ions problem”) was solved with the help of MLP [3]. The
accuracy of the determination of complex ions was 10−4 M, simple ions 10−3 M.
However, while such accuracy is quite satisfactory for monitoring of discharge and
formation waters, the diagnosis of e.g. mineral waters requires higher accuracy – down to
10−5 – 10−6 M. The 5 salts and 10 ions problems were first compared from the point of
view of their solution by MLP in [16]. The data array for the 5 salts problem consisted of
9144 patterns (spectra) with 1535 input features (channels), for 10 ions problem – 4445
patterns with 1824 features.

3 Results: Feature Extraction

First we present the results of solving the 10 ions problem with the partial least squares
(projection to latent structures) (PLS) method [17] and with MLP. The dataset was
randomly divided into training, validation and test sets in a ratio of 70:20:10.
PLS and MLP were applied both to the initial data and to the data processed by
various compression methods. Data compression is used to reduce the dimension of the
input data. For inverse problems of spectroscopy, the spectra contain thousands of channels, making any approximation method prone to overtraining. At the same time, it
is clear that not all spectral channels are equally informative. Very often, reducing the
input dimension allows increasing the accuracy of the solution. In this case, only the
most informative input features remain, and the construction of the PLS model or MLP
training is carried out on patterns with a smaller number of input features extracted by
some algorithm.
The input data can be compressed in different ways. The simplest method is the
aggregation of spectral channels, consisting in summation of intensities in some
number of neighboring channels and averaging over these channels. In this study, in
addition to channel aggregation, the input data compression using discrete and con-
tinuous wavelet transform (DWT and CWT) was used. In this case, the initial spectrum
is considered as some scale space with the best resolution, and for some given basis of
orthogonal functions there is a set of subspaces with less detail. Calculations of the
DWT were carried out in the R language using the wavethresh library (Wavelet Statistics and Transforms) [18]. The wavelets of the family Daubechies 10 [19] were
used. The CWT was calculated using our own code implementation in Python lan-
guage, supporting parallel computations on GPU through the use of library functions of
the tensorflow library [20]. Computational experiments with MLP training were carried
out by means of Python language on the basis of machine learning libraries scikit-learn
[21] and tensorflow.
Construction of the PLS model was stopped when convergence was achieved on
the training set. The results of the application of the PLS method are shown in Fig. 2.
As algorithms for compression of the input data, we used aggregation by 8 adjacent
input features, DWT for 4th, 5th, 6th, and 7th levels, and CWT with convolution width of
8, 16, 32 and 64 channels.
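To make the compression step concrete, a minimal sketch of channel aggregation and of a single-scale CWT is given below. It is only an illustration: the Ricker (Mexican-hat) wavelet, the kernel normalization and the subsampling stride are assumptions of this sketch, whereas in the study the DWT used Daubechies 10 wavelets (wavethresh in R) and the CWT was computed with the authors' own GPU implementation.

```python
import numpy as np

def aggregate(spectra, k=8):
    """Compress spectra by averaging every k adjacent spectral channels."""
    n_channels = spectra.shape[1] // k * k               # drop the incomplete tail
    return spectra[:, :n_channels].reshape(len(spectra), -1, k).mean(axis=2)

def ricker_kernel(width):
    """Ricker (Mexican-hat) wavelet sampled on `width` channels (unnormalized)."""
    a = width / 4.0                                       # scale chosen so the kernel fits the window
    t = np.arange(width) - (width - 1) / 2.0
    return (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_features(spectrum, width=16):
    """Single-scale CWT of one spectrum: convolution with a wavelet of the given
    width, subsampled to reduce the number of input features."""
    coeffs = np.convolve(spectrum, ricker_kernel(width), mode="same")
    return coeffs[::width]

# Example (shapes are illustrative):
# spectra = np.load("raman_spectra.npy")                 # (n_patterns, n_channels)
# x_aggr = aggregate(spectra, k=8)
# x_cwt = np.array([cwt_features(s, width=16) for s in spectra])
```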

Fig. 2. Application of the PLS method to data with different compression of input features:
mean absolute error on the test dataset. The methods used are: Aggr – aggregation, DWT –
discrete wavelet transform, CWT – continuous wavelet transform; the number of input features is
separated by a space.

As can be seen, use of some methods of compression of the input data improves the
result of application of the PLS method to solve the 10 ions IP, compared with use of
the initial data. In the case of DWT, the best result is achieved using level 5 (32
approximation and 32 detail coefficients). Aggregation by 8 features provides the result
better than DWT. The best result is achieved when using the CWT with a window 16
channels wide (190 input features). On the average, the best accuracy of determination
of salt concentration is 0.034 M.
When solving the 10 ions IP using MLP, a perceptron with two hidden layers (120
neurons in the first hidden layer and 60 in the second) was used. Each network was
trained 5 times with various initial weights; the results of all 5 networks were averaged.
The results of the application of ANN method are shown in Fig. 3. The initial data used
was the same as in the PLS method.
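A minimal scikit-learn sketch of this experimental setup is shown below. The 70:20:10 split and the (120, 60) hidden-layer configuration are taken from the text; the number of PLS components, the maximum number of iterations and the random seed are assumptions of the sketch, and the averaging over 5 differently initialized networks is indicated only by a comment.

```python
from sklearn.model_selection import train_test_split
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def run_baselines(X, Y, n_pls_components=20, seed=0):
    """X: (compressed) spectra, Y: concentrations of the determined components."""
    # 70:20:10 split into training, validation and test sets
    X_tr, X_rest, Y_tr, Y_rest = train_test_split(X, Y, test_size=0.3, random_state=seed)
    X_val, X_te, Y_val, Y_te = train_test_split(X_rest, Y_rest, test_size=1/3, random_state=seed)

    pls = PLSRegression(n_components=n_pls_components).fit(X_tr, Y_tr)

    # two hidden layers with 120 and 60 neurons; in the study, 5 networks with
    # different initial weights are trained and their outputs are averaged
    mlp = MLPRegressor(hidden_layer_sizes=(120, 60), max_iter=2000,
                       random_state=seed).fit(X_tr, Y_tr)

    # the validation set (X_val, Y_val) would be used for model selection
    for name, model in (("PLS", pls), ("MLP", mlp)):
        print(name, "test MAE:", mean_absolute_error(Y_te, model.predict(X_te)))
```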

Fig. 3. Application of the MLP to data with different compression of input features: mean
absolute error on the test dataset. The legend is the same as in Fig. 2.

The results of application of MLP without compression of the input data are worse
than the results obtained using DWT. In this case, DWT provides a greater error than
aggregation, and the best result is provided by CWT. The smallest error is achieved
using CWT with a window 32 channels wide (94 input features). On the average, the
mean absolute error of determination of salt concentration is 0.023 M.
Of the three methods of extraction of informative features considered above, the
method of CWT is the best. It demonstrates the lowest values of the mean absolute
error on the test dataset when using MLP or PLS. MLP shows significantly better
results than PLS, indicating a significant nonlinearity of the problem.

4 Results: Use of Wavelet Neural Networks

As a possible alternative to the two methods described above, we investigated the classical wavelet neural network trained by gradient methods using the error backpropagation algorithm. We created a software implementation of classical wavelet neural networks using modern techniques of parallel programming.
First, an implementation of the classical WNN scheme was created on the basis of the Python programming language and a number of its libraries. This initial implementation was plain object-oriented Python code. It made it possible to verify all the computational operations performed during the training and application of the WNN, and to observe the evolution of the parameters throughout the training.
The following problems were identified: saturation of the weights, and drift of their values (the shift parameter) out of the domain of definition of the wavelet functions during training. In combination with the multiplication performed inside the wavelon, this leads to the wavelon producing a zero value both in the forward pass and in the backward propagation of the error. This makes further adjustment of the weights by the gradient method impossible.
The second parameter, the values of which also need to be artificially limited, is the
scale parameter. If the value of this parameter is very large, the function definition area
will suffer again, actually leading to the Delta function behavior, which will lead to
negative consequences similar to those mentioned above. At the same time, these
parameters are interrelated, so simply establishing hard constraints on their values
limits the definition area too much, and often does not allow finding optimal solutions
by the method of stochastic gradient descent (SGD).
The main way to deal with these problems was use of special effective approaches
to determining the initial values of weights (parameters) of the WNN.
In this study, new gradient descent algorithms were tested in the training of WNN:
Adam and Adadelta, in comparison with the classical SGD. As expected, SGD
demonstrated slower convergence and a high degree of dependence on the initialization
of the weights, and in case of an unfortunate coincidence – the problems described
above: going beyond the definition area of wavelet functions and the need to interrupt
training. However, when SGD was run many times, it was usually possible to obtain a
model comparable in properties to that trained by Adam algorithm. The Adadelta
method did not allow obtaining the best solutions; however, it should be noted that it often tended toward large values of the learning rate parameters. This method may require
a more thorough search for the optimal parameters of the training algorithm. Note that
for several problems tested, and for the three methods of reducing the data dimension
for each of the problems, in only one scenario, SGD surpassed Adam. In all other cases
it was WNN trained by Adam that showed the best results.
Therefore, the most effective approach has been the combination of setting limits on
the values taken by the parameters and using the Adam method for training.
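A minimal TensorFlow sketch of this combination is given below. It is not the authors' implementation: the Mexican-hat wavelet, the particular clipping bounds and the assumption of standardized, reduced-dimension inputs are illustrative choices; only the general ideas follow the text — wavelons that multiply one-dimensional wavelets with trainable shift and scale parameters, hard limits on those parameters, and training with Adam.

```python
import tensorflow as tf

class ClipTo(tf.keras.constraints.Constraint):
    """Project a parameter back into a fixed interval after every update,
    keeping it inside the effective domain of the wavelet function."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __call__(self, w):
        return tf.clip_by_value(w, self.lo, self.hi)

class WaveletLayer(tf.keras.layers.Layer):
    """Hidden layer of wavelons: each unit multiplies one-dimensional
    Mexican-hat wavelets over all inputs, with trainable shift t and scale s."""
    def __init__(self, n_wavelons):
        super().__init__()
        self.n = n_wavelons

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.t = self.add_weight(name="shift", shape=(self.n, d),
                                 initializer=tf.keras.initializers.RandomUniform(-1.0, 1.0),
                                 constraint=ClipTo(-3.0, 3.0))
        self.s = self.add_weight(name="scale", shape=(self.n, d),
                                 initializer=tf.keras.initializers.RandomUniform(0.5, 2.0),
                                 constraint=ClipTo(0.2, 5.0))

    def call(self, x):
        u = (x[:, None, :] - self.t) / self.s              # (batch, n_wavelons, n_features)
        psi = (1.0 - u ** 2) * tf.exp(-0.5 * u ** 2)        # Mexican-hat wavelet
        return tf.reduce_prod(psi, axis=-1)                 # multiplication inside the wavelon

# e.g. 94 CWT features -> 32 wavelons -> 10 ion concentrations (linear output)
model = tf.keras.Sequential([tf.keras.Input(shape=(94,)),
                             WaveletLayer(32),
                             tf.keras.layers.Dense(10)])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mae")
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=300)
```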
The next stage was the implementation of training and application of the WNN on
the basis of tensorflow high-performance machine learning library, which allowed use
of multithreaded calculations on CPU and GPU, greatly reducing calculation time.
Writing control scripts for the heterogeneous computing cluster allowed making
calculations simultaneously on more than 150 processor cores, managing all proce-
dures for storing and processing data from the cluster control terminal.
Finally, the results of solving 5 salts and 10 ions IPs using the classical WNN were
compared with the results obtained using the classical MLP and the PLS method. The
comparison is presented in Fig. 4 (5 salts) and Fig. 5 (10 ions).
Fig. 4. Comparison of the results of solving the 5 salts problem by the algorithms of WNN,
MLP and PLS on the initial data and after compression by PCA, DWT and CWT methods with
the best parameters (Figs. 2, 3). The configuration optimal for WNN in all cases was: 32
wavelons, Adam.

Fig. 5. Comparison of the results of solving the 10 ions problem by the algorithms of WNN,
MLP and PLS on the initial data and after compression by PCA, DWT and CWT methods with
the best parameters (Figs. 2, 3). The optimal configurations for WNN were the following: 32
wavelons, SGD (PCA); 16 wavelons, Adam (DWT); 32 wavelons, Adam (CWT).
On the basis of the performed experiments, it can be concluded that at this stage,
WNN occupies an intermediate position between MLP and PLS, in some scenarios
even surpassing the result of MLP. This result can be considered partly successful,
since there are directions of improvement of the WNN training technology.
As it was mentioned above, the WNN has difficulties working with data of high
dimension. For this reason, although the obtained results were somewhat worse than
expected, they also showed good potential and prospects for WNN. At the same time,
the problem of working with high-dimensional data remains urgent and requires further
development of study in this direction.
Finally, this study has confirmed the results of our preceding studies regarding
comparison of the two IPs. The 10 ions IP is much more complex and non-linear,
requiring maximum of information available to achieve the best results. Therefore,
feature selection worsens the result for the 10 ions IP in all cases, and MLP turns out to
be the ML algorithm providing the best results for any number of input features.

5 Conclusions

In this study, we considered use of wavelet neural networks to solve the inverse
problems of determination of the composition of multi-component solutions of inor-
ganic salts by the method of Raman spectroscopy combined with machine learning.
The results of WNN were compared to the results demonstrated by multi-layer per-
ceptrons and by the method of partial least squares (projection to latent structures).
As WNN is very sensitive to the number of input features, the solution of the
studied problems was preceded with feature extraction. The best result among the
feature extraction methods was demonstrated by continuous wavelet transformation.
At present stage of research, WNN usually performs better than the linear PLS
algorithm, but worse than an MLP. However, it has several problems in performing
efficient training. Directions of possible improvement of the WNN training algorithm
have been formulated.

Acknowledgement. This study has been performed with financial support from Russian
Foundation for Basic Research, projects 17-07-01479 and 19-01-00738.

References
1. Burikov, S.A., Dolenko, S.A., Dolenko, T.A., Persiantsev, I.G.: Application of artificial
neural networks to solve problems of identification and determination of concentration of
salts in multi-component water solutions by Raman spectra. Opt. Mem. Neural Netw. (Inf.
Opt.) 19(2), 140–148 (2010)
2. Dolenko, S.A., Burikov, S.A., Dolenko, T.A., Persiantsev, I.G.: Adaptive methods for
solving inverse problems in laser Raman spectroscopy of multi-component solutions. Pattern
Recogn. Image Anal. 22(4), 551–558 (2012)
3. Efitorov, A., Dolenko, T., Burikov, S., Laptinskiy, K., Dolenko, S.: Neural network solution
of an inverse problem in Raman spectroscopy of multi-component solutions of organic salts.
In: Samsonovich, A.V. et al. (eds.) FIERCES 2016, Advances in Intelligent Systems and
Computing, vol. 449, pp. 273–279. Springer, Heidelberg (2016)
4. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 6, 889–898 (1992)
5. Li, S., Chen, S.: Function approximation using robust wavelet neural networks. In: 14th
IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2002),
Proceedings, Washington, DC, USA, pp. 483–488 (2002)
6. Bellil, W., Ben Amar, C., Alimi, A.: Comparison between beta wavelet neural networks,
RBF neural networks and polynomial approximation for 1D, 2D functions approximation.
Int. J. Appl. Sci. Eng. Technol. 13, 33–37 (2006)
7. Zhang, J., Walter, G., Miao, Y.: Wavelet neural networks for function learning. IEEE Trans.
Signal Process. 43(6), 1485–1496 (1995)
8. Zhang, Q.: Using wavelet network in nonparameters estimation. IEEE Trans. Neural Netw.
8, 227–236 (1997)
9. Sui, Q., Gao, Y.: A stepwise updating algorithm for multiresolution wavelet neural networks.
In: International Conference on Wavelet Analysis and its Applications (WAA), Proceedings,
Chongqing, China, pp. 633–638 (2003)
10. Lim, C.G., Kim, K., Kim, E.: Modeling for an adaptive wavelet network parameter learning
using genetic algorithms. In: Fifteenth IASTED International Conference on Modeling and
Simulation, Proceedings, California, USA, pp. 55–59 (2004)
11. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. arXiv (2015). https://
arxiv.org/pdf/1412.6980v8.pdf. Accessed 09 June 2019
12. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and
stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
13. Rull, F., De Saja, J.A.: Effect of electrolyte concentration on the Raman spectra of water in
aqueous solutions. J. Raman Spectrosc. 17(2), 167–172 (1986)
14. Dolenko, T.A., Churina, I.V., et al.: Valence band of liquid water Raman scattering: some
peculiarities and applications in the diagnostics of water media. J. Raman Spectrosc. 31,
863–870 (2000)
15. Burikov, S.A., Dolenko, T.A., Velikotnyi, P.A., Sugonyaev, A.V., Fadeev, V.V.: The effect
of hydration of ions of inorganic salts on the shape of the Raman stretching band of water.
Opt. Spectrosc. 98(2), 235–239 (2005)
16. Efitorov, A., Dolenko, T., Burikov, S., Laptinskiy, K., Dolenko, S.: Solution of an inverse
problem in Raman spectroscopy of multi-component solutions of inorganic salts by artificial
neural networks. In: Villa, A.E.P. et al. (eds.) ICANN 2016, Part II, LNCS, vol. 9887,
pp. 355–362. Springer, Heidelberg (2016)
17. Esbensen, K.H.: Multivariate Data Analysis—In Practice, An Introduction to Multivariate
Data Analysis and Experimental Design, 5th edn. CAMO Software AS, US (2006)
18. Wavelet Statistics and Transforms. https://cran.r-project.org/package=wavethresh. Accessed
09 June 2019
19. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Pennsylvania (1992)
20. TensorFlowTM: An open source machine learning framework for everyone. https://www.
tensorflow.org/. Accessed 09 June 2019
21. scikit-learn: Machine Learning in Python. http://scikit-learn.org/stable/index.html
Automated Determination of Forest-Vegetation
Characteristics with the Use of a Neural
Network of Deep Learning
Daria A. Eroshenkova, Valeri I. Terekhov, Dmitry R. Khusnetdinov(&), and Sergey I. Chumachenko

Bauman Moscow State Technical University (BMSTU), Moscow, Russia
[email protected], [email protected], [email protected]

Abstract. The article proposes a method for the automated determination of the species composition, stock coefficient and other characteristics of forest plantations with the use of deep learning. The analysis of existing approaches and methods of forest inventory, which include the use of LiDAR systems and machine learning methods, is carried out. An algorithm is proposed for solving this problem, and features of its implementation are given. The problem of combining the data of a “dense cloud” and a lidar survey is considered, and a possible solution is proposed. The problem of segmentation of tree crowns among many other objects in these data is also considered. For the segmentation of crowns, it is proposed to use the PointNet deep learning neural network, which performs segmentation of objects given a point cloud as input. The description of the architecture and the main features of the use of this neural network are briefly given. The direction of further research is outlined.

Keywords: Forest inventory · Unmanned aerial vehicle · LiDAR · Segmentation · Deep learning · Neural network · PointNet

1 Introduction

Recently, there has been an intensive development of various unmanned vehicles (robots, cars, aircraft and underwater vehicles, etc.) intended for automating the work
processes of human activity. Their use eliminates the human factor, increases pro-
ductivity, and also most effectively solves the tasks. Unmanned vehicles are already
used in such areas as: transportation of goods, medicine, agriculture, construction,
communications, weapons, monitoring of objects, etc.
In some cases, unmanned vehicles work with lidar systems (Light Detection and
Ranging - LiDAR) [1], which is used to automatically construct a three-dimensional
map (scene) of the surrounding space and spatial orientation of the device. Such
systems generate huge arrays of data that can be used not only to visualize the resulting
image, but also to analyze individual objects located on the image, for example, using
machine learning methods [2–6].


Using this approach, it is possible to improve already working solutions, and also to
develop more effective ways of solving existing problems in various fields. Based on
this, the article will consider the method that will determine the species composition,
planting stock and other coefficients of forest plantations with use of machine learning
methods based on lidar data.
This direction is extremely important, since today the use of unmanned vehicles
becomes more popular in forestry. This is primarily due to the fact that the unmanned
aerial vehicle (UAV) with the LiDAR system makes long-distance flights to take
pictures of hard-to-reach forest areas, monitors large areas, and also receives data on
the characteristics of forest stands in a short time. By analyzing the data obtained by
various methods, including methods of machine learning, it is possible to assess the
dynamics of the development of the forest fund in the studied area.
There are a number of studies and ready-made solutions that solve similar prob-
lems. For example, the Finnish company Arbonaut Oy Ltd [7] specializes in devel-
oping solutions for geographic information systems and processing data of remote
sensing for different areas. They use LiDAR systems for forest inventory with the
method based on sparse Bayesian regression for modeling forest characteristics [8].
This method is superior in accuracy to the traditional inventory methods based on field
measurements. However, it should be noted that forest plantations in Finland have a
fairly strict order and a homogeneous structure, and the variety of species is not large.
Such plantings are easier to analyze, unlike forest plantations in the Russian Federation,
where the order and structure of forests is more chaotic, and the diversity of species is
much greater [9, 10]. Therefore, the solution proposed by Arbonaut Oy Ltd is not
suitable for the inventory of forest plantations in the Russian Federation.
The task of forest inventory is extremely relevant and requires its speedy resolution,
taking into account the development of big data technologies, artificial intelligence,
robotics, as well as complex digital transformation of the economy and social sphere of
the Russian Federation by 2024.

2 Problem Definition and Algorithm Description

We analyzed the known approaches and methods for forest inventory and their
implementations and noticed the absence of acceptable, in terms of quality, ready-made
solutions [11]. In this regard, to solve this problem, we need to develop our own
method, based on the data of UAV shooting, LiDAR systems and the use of deep
learning, which is implemented using a neural network, as one of the methods of
machine learning. For this, we propose the following algorithm:
1. Combine LiDAR data (Fig. 1a) and « dense cloud » data (Fig. 1b) [12], which is a
kind of terrain plan on a precise geodetic basis:

A \cup B = C, \qquad (1)

where A = \{(a_{11}, a_{12}, \ldots, a_{1m}), \ldots, (a_{n1}, a_{n2}, \ldots, a_{nm})\} is the LiDAR data set, B = \{(b_{11}, b_{12}, \ldots, b_{1k}), \ldots, (b_{p1}, b_{p2}, \ldots, b_{pk})\} is the «dense cloud» data set, C = \{(c_{11}, c_{12}, \ldots, c_{1r}), \ldots, (c_{(n+p)1}, c_{(n+p)2}, \ldots, c_{(n+p)r})\} is the combined data set; a_{ij}, b_{ij} and c_{ij} are attributes of points of a three-dimensional scene according to the data specification of each type of survey, including positional values; n is the number of points in the set A, p is the number of points in the set B; m, k and r are the numbers of point attributes.

Fig. 1. Survey data: a - lidar survey of a strip of forest, b - forest «dense cloud»
This operation is performed in the ArcMap software [13] by spatial reference.
ArcMap allows you to create, view, edit, and publish maps. When using the spatial
reference function, it is necessary to find clearly expressed objects on the image -
crown tops. The result of this alignment will be data sets containing point clouds. In
addition to the positional values x, y and z, the system also stores additional infor-
mation. The following attributes are recorded and saved for each laser pulse of the
LiDAR system: intensity, reflection number, number of reflected signals, point clas-
sification values, extreme points of the flight line, RGB values, time, GPS, scan angle
and scan direction. A detailed description of each of the attributes can be found in the
specifications of the lidar data given in [14].
2. Segmentation of tree crowns in the combined images. Crown segmentation means
that each point in the picture must belong to a particular tree, if this point is indeed a
point of the tree, since there may be other objects in the pictures. Thus, it is
necessary to solve the segmentation problem:
F(C) = \{(c_{11}, c_{12}, \ldots, c_{1r}, l_1), \ldots, (c_{(n+p)1}, c_{(n+p)2}, \ldots, c_{(n+p)r}, l_{(n+p)})\},

where F is the segmentation function, C is the result of operation (1), c_{ij} are point attributes, and l_i is the variable indicating to which tree a point belongs.
To solve this problem, we carried out a review of existing methods of 3D seg-
mentation [15], based on the results of which we propose to use PointNet convolutional
neural network (CNN) [16].
3. After segmentation of tree crowns, it is necessary to find the diameter of the crowns (a simple way of estimating it from the segmented points is sketched after this list). From the crown diameter, using the known dependencies [17], it is possible to determine the diameter of the stem. This parameter is important when analyzing tree stands of the studied area.
4. Summarizing the results of paragraphs 1-3, one can determine the characteristics of
forest plantations, such as: the predominant species, tree species in the studied area,
the height of tree stands, the crown diameter and the stem diameter. The values of
these parameters can be used to calculate the stand density and the growing stock of plantings in a given territory. The result of the work will be a forest plantation map with a database
attached to it.
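As an illustration of step 3, the crown diameter of one segmented tree can be estimated directly from its points, for example as the largest horizontal distance between points of its crown. The sketch below is only such a geometric estimate; it does not reproduce the dependencies [17] used to pass from the crown diameter to the stem diameter.

```python
import numpy as np
from scipy.spatial import ConvexHull

def crown_diameter(tree_points):
    """Crown diameter of one segmented tree, estimated as the largest
    horizontal (x, y) distance between points of its 2-D convex hull."""
    xy = np.asarray(tree_points)[:, :2]                 # at least 3 non-collinear points expected
    hull = xy[ConvexHull(xy).vertices]
    pairwise = np.linalg.norm(hull[:, None, :] - hull[None, :, :], axis=-1)
    return pairwise.max()

def tree_height(tree_points, ground_z=0.0):
    """Tree height as the highest point above the (locally estimated) ground level."""
    return np.asarray(tree_points)[:, 2].max() - ground_z
```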

3 Features of the Implementation of the Proposed Approach

The advantage of using LiDAR in the task of determining the species composition,
stock coefficient and other characteristics of forest plantations is that the obtained data
give the correct height of plantations, which can be used in further analysis. The
drawbacks of the LiDAR data are sparse measurements (about 30 points/m2), as well as the absence of a color scale (it is impossible to visually distinguish forest species). The «dense cloud» data is different. Its advantages are the high resolution of the system, i.e. dense location of points (up to 1000 points/m2), RGB images, and the presence of an infrared channel, which is used for additional research of forests. But its limitation is inaccurate measurement of the heights of forest plantations.
When combining LiDAR and « dense cloud » data we get:
– correct values of the height of forest stands;
– dense location of points;
– the presence of an infrared channel.
We should note that it is difficult to combine two scenes into one without the
presence of common points. Due to the different points of the survey, the angles of the
points, their slant ranges and other parameters differ. One possible solution is to shoot
from a UAV equipped with two LiDAR systems. Knowing the fixed distance between
the cameras, we get the difference in the locations of the points of the two scenes
relative to each other. Taking into account this distance, it is possible to carry out the
operation of combining two scenes into one scene.
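A sketch of this combination step is given below under the simplifying assumption that the offset between the two sensors reduces to a known translation; in the general case, a full rigid transformation (rotation and translation) estimated from common reference objects such as crown tops would have to be applied before stacking the clouds according to operation (1).

```python
import numpy as np

def merge_scenes(lidar_xyz, dense_xyz, sensor_offset):
    """Combine a LiDAR point cloud and a 'dense cloud' into one scene
    (operation (1)), assuming the clouds differ only by a known offset."""
    shifted = np.asarray(dense_xyz) + np.asarray(sensor_offset)   # move into the LiDAR frame
    return np.vstack([np.asarray(lidar_xyz), shifted])

# Example with a hypothetical 0.5 m baseline along the x axis of the UAV:
# combined = merge_scenes(lidar_xyz, dense_xyz, sensor_offset=[0.5, 0.0, 0.0])
```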
An important point in the work is the use of the PointNet CNN [16]. PointNet is a unified deep learning network architecture that learns both global and local point features, providing a simple, effective approach for solving a number of 3D-recognition tasks, such as classification, part segmentation, and semantic segmentation.
The network architecture is shown in Fig. 2. The network has three key modules:
– max pooling layer (max pool) as a symmetric function for aggregating information
from all points;

Fig. 2. PointNet neural network architecture

– structure of combining information on local and global point objects (global


feature);
– two integrated alignment networks (T-Net) [18], which are used in the input
transform and feature transform blocks. T-Net is used to align the feature space of
input points and point features in geometric transformations. Thus, the studied set of
points remains invariant to these transformations.
The Classification Network accepts n input points as input data, applies transfor-
mations to input data and objects using the feature transform layer, and then aggregates
the point objects by max pooling. The output is the evaluation of classification by k
output scores.
Segmentation Network is an extension of the Classification Network. It combines
global and local point features, and output estimates by categories of the Classification
Network. In this case: mlp is a multilayer perceptron, where the numbers in brackets
denote the dimensions of the layers. For all layers with the ReLU activation function,
batch normalization is used. Dropout layers are used for the last mlp of the Classification Network (shown in Fig. 2 as mlp (512, 256, k)).
PointNet directly uses unordered sets of points from Euclidean space as input; these
sets have the following three properties.
1. Unordered nature. Unlike arrays of image pixels, a cloud of points is a set without a specific order. In other words, a network that receives as input a set of N points in three-dimensional space must be invariant to the N! permutations of the order of data input.
2. Interaction between points, which means that the points are not isolated and form
subsets with adjacent points.
3. Invariance under transformations. The studied set of points must be invariant to certain transformations. For example, neither the global category of the point cloud, nor the segmentation of points, should change when the points are rotated and translated together.
A cloud of points is represented as a set of points of a 3-dimensional space \{P_i \mid i = 1, \ldots, n\}, where each point P_i is a vector of coordinates (x, y, z) with additional functional channels, such as color, normal, etc.
For the process of semantic segmentation, the input can be a single object intended
for the segmentation of details of the object. This may be some area from the three-
dimensional scene for its segmentation into objects. The model will output n × m estimates, one for each of the n points and each of the m semantic subcategories.
Today, PointNet creators offer to test the network on a set of data that represent
point clouds of pieces of furniture and other interior items (table, door, sofa, board,
ceiling, floor, etc.) in the auditoriums of Stanford University [16]. Data sets are stored
in HDF5-files [19], therefore, in order to submit user data to the network input, it is
necessary to generate files of this type from them. Before proceeding with the PointNet
learning phase, we should additionally perform the algorithms for preprocessing,
dividing, and forming a data set. Metrics and additional information about the course of
training and testing are recorded in log-files. After learning on a training and test
sample, network recognition accuracy reaches 86% [16]. This result is encouraging and
is sufficient to solve the problem of determining the species composition and the stock
coefficient of forest plantations.
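A possible way to prepare such files with the h5py library is sketched below. The dataset names 'data' and 'label' and the block shapes follow the publicly available PointNet semantic segmentation examples and should be checked against the particular version of the code; the choice of 9 channels per point is an assumption of this sketch.

```python
import numpy as np
import h5py

def write_pointnet_h5(filename, blocks, labels):
    """Store point blocks and per-point class labels in an HDF5 file
    in the layout used by the public PointNet semantic segmentation code."""
    with h5py.File(filename, "w") as f:
        f.create_dataset("data", data=np.asarray(blocks, dtype=np.float32))   # (B, N, 9)
        f.create_dataset("label", data=np.asarray(labels, dtype=np.uint8))    # (B, N)

# Example: B blocks of N = 4096 points, 9 channels each (x, y, z, r, g, b, ...)
# write_pointnet_h5("forest_train_0.h5", blocks, block_labels)
```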

4 Conclusion

The article proposes a method for the automated determination of the species com-
position, stock coefficient and other characteristics of forest plantations. It includes
shooting from a UAV with a LiDAR system installed on it and using PointNet deep
learning neural network to process the received data. We described the working pro-
cedure and the algorithm for solving this problem. The article contains the description
and experimental results obtained by the authors of PointNet. It is shown that the
method proposed in the article is a promising, but not an easy scientific and technical
challenge. This is due to the fact that the main difficulties in analyzing the data obtained
are caused by the following problems:
1. Incompleteness and possible distortions of information about objects of interest due
to different types of surveys - lidar and « dense cloud »;
2. The lack of a correct method of combining data from a lidar survey and a «dense cloud» survey in one scene;
3. The impossibility of training the neural network without a sufficient number of labeled data sets.
Based on this, in the next stage of work, it is planned to create a labeled set of data
representing clouds of points of trees. Then it is necessary to split data into sets of
HDF5-files and train CNN PointNet with their help. Depending on the obtained
training results, the network may be modified so as to achieve the required accuracy of tree crown segmentation. After solving this problem, it
is necessary to consistently solve the problem of determining the diameters of crowns
and stems, as well as other related parameters that are used in the analysis of forest
stands of the studied area. The result of this work is a map of forest plantations of the
region with a base of the main characteristics of forest stands attached to it.

References
1. Weitkamp, C. (ed.).: Lidar: Range-Resolved Optical Remote Sensing of the Atmosphere.
vol. 102. Springer (2006)
2. Chernenkiy, V., Gapanyuk, Y., Revunkov, G., Kaganov, Y., Fedorenko, Y.: Metagraph
approach as a data model for cognitive architecture. In: Biologically Inspired Cognitive
Architectures Meeting, pp. 50–55. Springer, Cham, August 2018
3. Lychkov I.I., Alfimtsev A.N., Sakulin S.A.: Tracking of moving objects with regeneration of
object feature points. In: 2018 Global Smart Industry Conference (GloSIC), pp. 1–6. IEEE
(2018)
4. Neusypin, K.A., et al.: Algorithm for building models of INS/GNSS integrated navigation
system using the degree of identifiability. In: 2018 25th Saint Petersburg International
Conference on Integrated Navigation Systems (ICINS), pp. 1–5. IEEE (2018)
5. Serov, V.A., Voronov, E.M.: Evolutionary algorithms of stable-effective compromises
search in multi-object control problems. In: Smart Electromechanical Systems, pp. 19–29.
Springer, Cham (2019)
6. Knyazev, B., Barth, E., Martinetz, T.: Recursive autoconvolution for unsupervised learning
of convolutional neural networks. In: 2017 International Joint Conference on Neural
Networks (IJCNN), pp. 2486–2493. IEEE (2017)
7. https://www.arbonaut.com/en/
8. Tipping, M.E., et al.: Fast marginal likelihood maximisation for sparse Bayesian models. In:
AISTATS (2003)
9. Alexeyev, V.A., et al.: Statistical data on forest fund of Russia and changing of forest
productivity in the second half of XX century. St. Petersburg Forest Ecological Center,
p. 272 (2004)
10. http://www.iiasa.ac.at/web/home/research/researchPrograms/EcosystemsServicesandManagement/RussianForests.en.html
11. Hyyppä, J., et al.: Review of methods of small-footprint airborne laser scanning for
extracting forest inventory data in boreal forests. Int. J. Remote Sens. 29(5), 1339–1366
(2008)
12. Thrower, N.J.W., Jensen, J.R.: The orthophoto and orthophotomap: characteristics,
development and application. Am. Cartogr. 3(1), 39–56 (1976)
13. https://desktop.arcgis.com/en/arcmap/
14. Heidemann, H.K.: Lidar base specification. US Geol. Surv. (11-B4) (2012)
15. Nguyen, A., Le, B.: 3D point cloud segmentation: a survey. In: 2013 6th IEEE Conference
on Robotics, Automation and Mechatronics (RAM), pp. 225–230. IEEE (2013)
16. Qi, C.R., et al.: Pointnet: deep learning on point sets for 3d classification and segmentation.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 652–660 (2017)
17. Chumachenko, S.I., et al.: Simulation modelling of long-term stand dynamics at different
scenarios of forest management for coniferous–broad-leaved forests. Ecol. Model. 170(2–3),
345–361 (2003)
18. Ishiguro, H., Miyashita, T., Tsuji, S.: T-net for navigating a vision-guided robot in a real
world. In: Proceedings of 1995 IEEE International Conference on Robotics and Automation,
vol. 1, pp. 1068–1073. IEEE, (1995)
19. Folk, M., et al.: An overview of the HDF5 technology suite and its applications. In:
Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases, pp. 36–47. ACM,
(2011)
Depth Mapping Method Based on Stereo Pairs

Vasiliy E. Gai(&), Igor V. Polyakov, and Olga V. Andreeva

Nizhny Novgorod State Technical University n.a. R.E. Alekseev, Minin St., 24, Nizhny Novgorod, Russia
[email protected]

Abstract. The paper proposes a new method for solving the problem of con-
structing a depth map based on a stereo pair of images. The result of the depth
information recovery can be used to capture the reference points of objects in the
film industry when creating special effects, as well as in computer vision sys-
tems used on vehicles to warn the driver about a possible collision. Proposed
method consists in using the theory of active perception at the stage of seg-
mentation and image matching.
To implement the proposed method, a software product in the C# language
was developed. The developed algorithm was tested on various sets of input
data. The results obtained during the experiment indicate the correct operation of
the proposed method in solving the problem of constructing a depth map.
The accuracy of depth mapping using the described method turned out to be
comparable with the accuracy of the methods considered in the review. This
suggests that this method is competitive and usable in practice.

Keywords: Theory of active perception · Depth map · Stereo pair of images

1 Introduction

One of the important tasks of the computer vision is the transformation of a stereo pair
of images into a three-dimensional scene. As a result of this process, the depth
information of each image point is restored. Obtaining an accurate depth map is the
ultimate goal of three-dimensional image recovery.
The depth information received as the result of this process can be used in many
other areas. For example, depth maps are used to capture the reference points of objects
in film production when creating special effects, as well as in computer vision systems
used on vehicles to warn the driver about a possible collision.
Based on this, we can conclude that the development of new models and methods
for solving the problem of constructing a depth map based on a stereo pair is relevant.

2 The General Principle of the Methods for Constructing Depth Maps Using a Stereo Pair

The general algorithm of depth mapping using stereo images includes the following
steps [1]: camera calibration, image rectification, image segmentation, search for
matches between points of a pair of images, conversion of a discrepancy map into a
depth map. In this paper, the first two stages are not considered, since these stages are
simple geometric transformations and they are solved at the hardware level in most
computer vision systems.
When analyzing the algorithms that implement the steps described above, the
following problems were identified. The problem of segmentation. This problem lies in
the fact that part of the segmentation algorithms does not have sufficient accuracy, and
therefore, at the stage of the search for correspondences, there are multiple errors
associated with incorrect segmentation. The other part of the algorithms provides
sufficient accuracy, but has a high computational complexity. There is also a problem
with the correlation of segments of two images [2]. The problem of finding matches.
This problem lies in the imperfection of matching algorithms, as a result of which the
accuracy of depth map construction is reduced [3]. The problem of handling errors after
matching. This problem stems from the fact that, after the stage of the search for correspondences, the discrepancy map usually contains a number of erroneously determined
points and their additional processing is necessary.

3 Methods of Depth Mapping

The proposed method for solving the problem of depth mapping applies the theory of
active perception (TAP) at the stage of segmentation and the search for correspondence
of points [4].
To solve the problem of depth mapping in this paper, the following algorithm is
proposed:
1. Image input – receiving images from cameras or from files;
2. Pre-processing – converting images to a brightness function.
3. Segmentation – the selection of objects in the first image in order to reduce the
search area in the future;
4. Search for matching segments – search for segments of the left image on the right
image;
5. Discrepancy mapping – the formation of a matrix containing information on how
much each point of the first image differs in its position in space from the same
point in the second image;
6. Depth mapping – the final stage of the restoration of depth information with the
subsequent visualization of the results.

4 Segmentation

The next step is to divide the image into segments. This is necessary to reduce the
search area at the stage of matching.
Since matching points are searched for on the same objects, the best solution is to divide the image into a set of objects, thereby narrowing the search area to the inner regions of objects. Also, the two source images are epipolar (rectified), which allows the image to be divided into horizontal segments without loss of accuracy.
Based on this, it was decided to produce segmentation in two stages:


1. Division into horizontal segments;
2. Selection in horizontal segments, segments based on the boundaries of objects.
Since the images included in the system are epipolar, all the horizontal lines of one
image coincide with the horizontal lines of the other. Therefore, the image can be divided into horizontal segments without compromising the accuracy of the search.
After the image is divided into horizontal segments, it is necessary to select the
boundaries of objects inside each horizontal segment. For this process, we will use
filters that allow us to find the change in brightness in different directions (see Fig. 1).

Fig. 1. Filters used to select borders.

Filter F1 is used to select vertical borders, F2 – to select horizontal borders, F3 – to select diagonal borders. These filters are applied to each point in the horizontal segment. From the obtained values, the greatest in absolute value is selected, i.e. the one denoting the greatest difference in brightness (see Fig. 2).

Fig. 2. Segments

For the subsequent use it is necessary to form a segment model. It consists of the
following elements:
1. The starting point of the segment and its description with the help of TAP.
2. The end point of the segment and its description using TAP.
TAP filters are used to describe points. The description of the points is formed by
applying to them all 16 TAP filters.
5 Segment Matching

At the moment, one image is divided into segments. The next step is to search for the
segments of the first image on the second one. To do this, the second image searches
for the most similar points for the beginning and end of the segment using the fol-
lowing algorithm:
1. The response for the reference point is calculated for all 16 filters.
2. A 4 × 4 window is passed through the pixels of the horizontal segment of the second image. The current pixel is the coordinate of the upper left corner of the 4 × 4 window. As the window passes through the image, the response is calculated for all
16 filters.
3. The difference modulus (“delta”) of each response is found with the reference
response that was found at the beginning.
4. All sixteen differences are summed up and saved together with the coordinates of
the current position of the window.
5. From all the obtained differences the minimum difference is found, which deter-
mines the minimum difference of the found point from the original one.
6. This point is set in accordance with the original.
This algorithm is performed for the starting and ending points of the segment. Thus,
pairs of segments of the first and second images are formed.
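The authors' implementation is written in C#; the sketch below restates steps 1–6 in Python/NumPy for illustration only. The 16 TAP filters themselves are not listed in the paper, so `filters` is merely a placeholder for sixteen 4 × 4 masks taken from the theory of active perception [4].

```python
import numpy as np

def tap_responses(img, x, y, filters):
    """Responses of all 16 TAP filters on the 4x4 window with top-left corner (x, y)."""
    window = img[y:y + 4, x:x + 4].astype(float)
    return np.array([np.sum(window * f) for f in filters])

def match_point(left, right, x_ref, y, x_lo, x_hi, filters):
    """Find in row y of the right image the window whose filter responses are
    closest (by summed absolute difference, the 'delta') to the reference point."""
    ref = tap_responses(left, x_ref, y, filters)
    best_x, best_delta = x_lo, np.inf
    for x in range(x_lo, min(x_hi, right.shape[1] - 3)):
        delta = np.abs(tap_responses(right, x, y, filters) - ref).sum()
        if delta < best_delta:
            best_x, best_delta = x, delta
    return best_x

# filters: list of sixteen 4x4 numpy arrays (the TAP basis from [4]); applying
# match_point to the start and end points of a segment yields the matched segment.
```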

6 Discrepancy Mapping

The first main stage of the algorithm is discrepancy mapping – a matrix containing
information about how much each point of the first image differs in position in space
from the same point in the second image. For each point of the segment, the corre-
sponding point is searched for in the second image. The scope of the search in this case
is limited by the size of the segment.
When the desired point is found, its discrepancy is calculated by the formula:

D = |X_1 - X_2|, \qquad (1)

where X_1 – coordinates of the point on the first image, X_2 – coordinates of the point on the second image.

7 Depth Mapping

Depth mapping is the final stage of solving the problem. At this stage, the discrepancy
map is converted to a depth map. It is also necessary to solve the problem of possible
errors made at the stage of the search for matches. Therefore, it was decided to apply
the following formula to all points of the discrepancy map:
D_{x,y} = \begin{cases} D_{x,y}, & D \le Max, \\ \dfrac{1}{n} \sum_{i=x-n/2}^{x+n/2} D_{i,y}, & D > Max, \end{cases} \qquad (2)

where D_{x,y} – depth map value at the point (x, y), Max – maximum possible depth map value, n – the size of the area on which the average value is calculated.
This formula is a filter. In other words, if the value of a point is greater than
expected, replace its value with an average value from neighboring points. This
completes the depth recovery. The following formula is used to visualize the results:
G_{x,y} = 255 \cdot \left( D_{x,y} / D_{max} \right), \qquad (3)

where D_{max} – maximum depth map value, G_{x,y} – point value in grayscale.
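A compact NumPy illustration of formulas (2) and (3) is given below; the treatment of points near the image border (the averaging window is simply shrunk) is an assumption of this sketch, since the paper does not specify it.

```python
import numpy as np

def filter_and_visualize(disparity, max_value, n):
    """Apply formula (2): replace implausible values by the mean of the n
    horizontal neighbours; then convert the result to grayscale by formula (3)."""
    depth = disparity.astype(float).copy()
    h, w = depth.shape
    half = n // 2
    for y in range(h):
        for x in range(w):
            if disparity[y, x] > max_value:
                lo, hi = max(0, x - half), min(w, x + half + 1)   # window shrinks at the border
                depth[y, x] = disparity[y, lo:hi].mean()
    gray = np.uint8(255.0 * depth / depth.max())                  # formula (3)
    return depth, gray
```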

8 Computational Experiment

To conduct a computational experiment, a database of stereo images was formed. The database consists of 2000 different pairs of images. For each of the pairs of images in
the database there is also a reference depth map (see Fig. 3).

Fig. 3. An example of images used in a computational experiment (left, right and depth map)

During the experiment, each point of the reference depth map is compared with the
corresponding points of the depth map obtained by the algorithm proposed in this
paper.
The proposed method for solving the problem of depth mapping has a set of input
parameters. Therefore, in the course of the experiment, different sets of values of the
input parameters of the algorithm were investigated in order to identify the set that
allows depth mapping with the greatest accuracy. As a result of a combination of all the
specified values of the input parameters of the algorithm, nine launch configurations
were obtained. For each configuration of the launch of the algorithm, the following
values were obtained: the accuracy of depth mapping, the average processing time of a
single image. The test results of the algorithm are given in Table 1.
Table 1. Algorithm testing results


Maximum amount of segments | Minimum size of segments | Accuracy, % | Average processing time, s
1 | 10 | 82.4 | 10
1 | 50 | 83.7 | 9
1 | 70 | 82.8 | 9
4 | 10 | 90.2 | 7
4 | 50 | 90.4 | 6
4 | 70 | 90.6 | 6
8 | 10 | 90.3 | 5
8 | 50 | 90.7 | 5
8 | 70 | 90.7 | 5

Table 2 presents the results of the known methods for depth mapping [1].

Table 2. The results of the known depth mapping methods

Method | Accuracy, %
SAD without segmentation | 87.6
MeanShift and SAD | 90.7
Belief propagation algorithm and SSD | 91.8

Comparing the data from Table 2 and the obtained results of testing the algorithm
(see Table 1), we can conclude that the developed method has a depth map con-
struction accuracy, which is quite comparable with the accuracy of the known methods
considered. As a result of testing the algorithm in normal conditions, the accuracy of
constructing a depth map equal to 90.7% was obtained.

References
1. Kamencay, P., Breznan, M., Jarina, R., Lukac, P., Zachariasova, M.: Improved depth map
estimation from stereo images based on hybrid method. Radioeng. J. 21(1), 70–78 (2012)
2. Comaniciu, D., Meer, P.: Mean shift: a robust approach towards feature space analysis. IEEE
Trans. Patt. Anal. Mach. Intell. 24(5), 603–619 (2002)
3. Hisham, M.B.: Template matching using sum of squared difference and normalized cross
correlation. In: 2015 IEEE Student Conference Research and Development (SCOReD) (2015)
4. Utrobin, V.A.: Physical interpretations of the elements of image algebra. Uspekhi
Fizicheskikh Nauk (UFN) 174(10), 1089–1104 (2004)
Semantic Segmentation of Images Obtained
by Remote Sensing of the Earth

Dmitry M. Igonin(&) and Yury V. Tiumentsev

Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected], [email protected]

Abstract. In the last decade, computer vision algorithms, including those


related to the problem of understanding images, have developed a lot. One of the
tasks within the framework of this problem is semantic segmentation of images,
which provides the classification of objects available in the image at the pixel
level. This kind of segmentation is essential as a source of information for
robotic UAV behavior control systems. One of the types of pictures that are used
in this case is the images obtained by remote sensing of the earth’s surface.
A significant number of neuroarchitectures based on convolutional neural
networks have been proposed for solving problems of semantic segmentation of
images. However, for various reasons, not all of them are suitable for working
with pictures of the earth’s surface obtained using remote sensing. Neuroar-
chitectures that are potentially suitable for solving the problem of semantic
segmentation of images of the earth’s surface are identified, and a comparative
analysis of their effectiveness as applied to this task is carried out.

Keywords: Earth remote sensing · Aerial and satellite imaging · 2D image ·
Semantic segmentation · Convolutional neural networks · Comparative analysis

1 Introduction

One of the most challenging scientific and applied problems of our time is the
development of behavior control systems for highly autonomous robotic unmanned
aerial vehicles (UAVs) that can perform complex missions under uncertainty condi-
tions [1, 2]. Such a control system, to support decision-making processes, requires
information about the current situation in which the UAV operates. In obtaining such
information, the most crucial role belongs to computer vision, which is an interdisci-
plinary scientific and applied area focused on solving problems related to the per-
ception, analysis, and understanding of images [3–5]. Understanding of images is
ception, analysis, and understanding of images [3–5]. It should be emphasized that it is the
understanding of images that is the basis for obtaining the information necessary for
making decisions when controlling the behavior of UAVs.
In the last decade, computer vision techniques have been actively developed,
including image understanding methods based on the use of deep learning and deep
neural networks, in particular, convolutional neural networks (CNN) [3, 6–10].


Concerning various types of images and problems to be solved, a significant number of
varieties of CNN-based neuroarchitectures were proposed [6, 10].
One of the kinds of images that are of great importance in solving the problems
mentioned above is the images obtained by remote sensing of the earth’s surface [11,
12, 20]. Working with such pictures, especially in real time, causes severe demands for
both hardware and algorithms for solving the appropriate problems. For this reason, not
all neuroarchitectures based on the use of convolutional networks are suitable for
solving the problem of on-line semantic segmentation of images obtained by remote
sensing of the earth’s surface. In this regard, this article attempts to exclude the neuro-
architectures that are, for one reason or another, unsuitable for solving the con-
sidered problem, and to conduct a comparative analysis of the effectiveness of the
potentially suitable neuroarchitectures. We discuss the results of this analysis in the
following sections.

2 Semantic Segmentation as a Part of the Image Understanding Problem

We can solve the task of image understanding at several levels of granularity [3–5]:
1. Image classification. In this case, we assume that the image contains a single object
(the “main object”) that needs to be assigned to one of a finite set of prescribed
classes. The answer, in this case, is the label of the corresponding class.
2. Object classification and localization. In this case, in addition to the task of clas-
sifying an object, as in the previous granularity level, it is also required to localize it
on the image. Such localization is carried out by enclosing this object in some
bounding box. The answer, in this case, is the label of the corresponding class
together with the parameters of the bounding box.
3. Object detection. The task is similar to the one that is solved at the previous level,
but for the case when there are more than one classified objects in the image. The
answer, in this case, is a set of class labels in combination with a set of parameters
of the bounding box for all objects detected in the image.
4. Semantic segmentation. In this case, we solve the problem at the pixel level of the
analyzed image, that is, by assigning a label of the corresponding class to each of
the pixels of the given image. In general, the answer at this granularity level will be
an image of the same size as the original image with the corresponding class labels
assigned to each pixel. At the same time, for clarity, the image areas corresponding
to different classes are marked by different conditional colors.
5. Instance segmentation. This level provides additional granularity compared to
image segmentation. In this case, we require not only to mark each of the image
pixels with a corresponding label, but also to select individual instances of each of
the recognized classes in this image, as is the case in the “object detection” task. In
this case, we assign various conditional colors in the picture not to separate classes
of objects, but to separate instances of these classes. For example, in the semantic
segmentation task, pixels that correspond to all objects of the “person” class will be
marked with the same color, and in the case of instance segmentation, each of the
found objects of this type will be marked with its own specific conditional color.
The following sections discuss the solution of one of these tasks, namely, the
problem of semantic image segmentation, which is critical for providing the UAV
behavior control system with source data.
A tool that has proven itself in solving problems of semantic image segmentation,
including under conditions of uncertainty, is a convolutional neural network (CNN) in
combination with deep learning methods [6, 7, 10]. During the last decade, a significant
number of neuroarchitectures based on this class of neural networks have been
proposed.
As experience in solving semantic image segmentation problems shows, such
CNN-based neuroarchitectures as U-Net [14], SegNet [15], MultiNet [16] demonstrate
the best results. There are attempts to use other networks for semantic segmentation, in
particular, DenseNet [17], DeepLab [8], ICNet [18], FRRN [19], as well as several
others. These networks, however, for various reasons do not meet the requirements
arising when working with images obtained by remote sensing methods. The analysis
of these reasons is beyond the scope of this article.
The following sections provide a comparative analysis of the neuroarchitectures U-
Net, SegNet, and MultiNet in terms of their efficiency in solving problems of semantic
segmentation of images obtained during remote sensing of the earth’s surface. This
analysis is carried out using source data from the WorldView-3 image gallery [11].

3 The Source Data Used for Analysis

The training data required to solve the problem of semantic segmentation was formed
using the gallery of multispectral images obtained by the WorldView-3 satellite [11].
This database contains tagged images of the earth’s surface that can be used to rec-
ognize objects of various types on them. Examples of such images can be seen in Fig. 1.
Classes of objects that are labelled in the WorldView-3 database are presented in
Table 1.
All images in the gallery are presented in GeoTiff format [12] in 3-band and 16-band
form. We obtained 25 color images in RGB format with a resolution of 3396 × 3349
pixels using the 16-band pictures. We plan to use the multispectral nature of the photos in the
WorldView-3 gallery in our future research as a source of additional information about
the objects in these images.
We divide each of these 25 images into patches of 128 × 128 pixels in size to reduce
the required computational resources. Examples of the reduced pictures can be seen
in Fig. 2. As a result of this operation, about 1.6 × 10^4 patterns were obtained:
9752 training patterns, 1300 validation patterns, and 5202 test patterns.
These sets of patterns are sufficient, as shown by the results of computational
experiments, for training the analyzed convolutional networks.
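The patch extraction described above can be sketched as follows; the non-overlapping tiling with discarded border remainders is an assumption about how the 128 × 128 patterns were produced, and the function name is illustrative.

```python
import numpy as np

def tile_image(image, mask, size=128):
    """Cut an RGB image and its label mask into non-overlapping size x size patches.
    Border remainders that do not fill a whole patch are discarded (assumption)."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            patches.append((image[top:top + size, left:left + size],
                            mask[top:top + size, left:left + size]))
    return patches

# A 3396 x 3349 image yields 26 x 26 = 676 patches; 25 such images give roughly
# 1.7e4 patterns, close to the ~1.6e4 figure quoted above.
image = np.zeros((3396, 3349, 3), dtype=np.uint8)
mask = np.zeros((3396, 3349), dtype=np.uint8)
print(len(tile_image(image, mask)))  # 676
```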

Fig. 1. Examples of images of the earth’s surface from the WorldView-3 image gallery

Fig. 2. Training patterns and their masks obtained using images from WorldView-3 gallery

As we know [13], there is a rigid relationship between the number of tunable
parameters of a neural network and the number of training examples required for its
training. Bearing this factor in mind, we can say that the number of examples in the
generated training set is sufficient to train such neuroarchitectures as U-Net, SegNet,
MultiNet.

Table 1. Object classes tagged on images from the WorldView-3 database

Class number   Content                                                     Label            Percentage in database
0              Background                                                  Background       0.53
1              Large building, residential, non-residential,
               fuel storage facility, fortified building                   Buildings        0.05
2              Misc manmade structures                                     Misc
3              Roads                                                       Road             0.04
4              Poor, dirt, cart track, footpath, trail                     Track
5              Woodland, hedgerows, groups of trees, standalone trees      Trees            0.09
6              Contour ploughing, cropland, grain (wheat) crops,
               row (potatoes, turnips) crops                               Crops            0.28
7              River, channel et al.                                       Waterway         0.007
8              Lake, pond, pool                                            Standing water
9              Large vehicle (e.g. lorry, truck, bus), logistics vehicle   Vehicle Large    0.0002
10             Small vehicle (car, van), motorbike                         Vehicle Small

4 Comparative Efficiency Analysis of Selected Neuroarchitectures

The neuroarchitectures selected for comparative analysis of their effectiveness in
solving the problem of semantic segmentation of remote sensing images can be briefly
described as follows.
MultiNet (Fig. 3a) [16] is a neuroarchitecture, which consists of an encoder, a
decoder, and a segmentation decoder. This architecture was developed for use as part of
a behavior control system for unmanned cars.
SegNet [15] is an autoencoder based on a convolutional neural network. The
SegNet network architecture (Fig. 3b) consists of consecutive blocks, each of which
contains convolution layers, UpSampling layers, as well as ReLU activation layers and
BatchNorm normalization layers.
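As an illustration of the block structure described above, a schematic Keras sketch of a SegNet-like encoder–decoder is given below. The numbers of blocks and filters are assumptions chosen for illustration and do not reproduce the exact configuration of Fig. 3b; in particular, the original SegNet passes pooling indices to the decoder, which this simplified sketch replaces with plain upsampling.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    # Conv -> BatchNorm -> ReLU, then downsampling, as in a SegNet-style encoder
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D(2)(x)

def decoder_block(x, filters):
    # Upsampling, then Conv -> BatchNorm -> ReLU, as in a SegNet-style decoder
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = layers.Input(shape=(128, 128, 3))
x = inputs
for f in (64, 128, 256):          # encoder part
    x = encoder_block(x, f)
for f in (256, 128, 64):          # decoder part
    x = decoder_block(x, f)
outputs = layers.Conv2D(11, 1, activation="softmax")(x)  # 11 object classes of Table 1
model = tf.keras.Model(inputs, outputs)
```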

Fig. 3. Neuroarchitectures: (a) – MultiNet; (b) – SegNet; (c) – U-Net

Fig. 4. Learning curves of the selected neuroarchitectures: (a) – MultiNet; (b) – SegNet;
(c) – U-Net

U-Net (Fig. 3c) [14] is a standard CNN architecture for image segmentation tasks.
In the ISBI competition in 2015, U-Net ranked first by a large margin. The U-Net
network architecture yielded the best results in biomedical applications, as well as in
solving problems for which there is a limited amount of source data.
The quality of training is checked on the validation set, which was not involved in
the learning (Fig. 4). The recognition results for the test examples are presented
in the form of probability matrices (Fig. 5). Each diagonal element of such a
matrix is a probabilistic estimate of a class being recognized as itself, and each non-
diagonal element is an estimate of the probability of the corresponding classes being
confused with each other.
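A probability matrix of the kind shown in Fig. 5 can be obtained from pixel-level predictions roughly as follows; the row-wise normalization over the true classes is an assumed interpretation of how the probabilities were computed.

```python
import numpy as np

def class_probability_matrix(true_labels, predicted_labels, n_classes=11):
    """Row-normalized confusion matrix: element (i, j) estimates the probability
    that a pixel of true class i is recognized as class j."""
    counts = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(true_labels.ravel(), predicted_labels.ravel()):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for absent classes
    return counts / row_sums

# Example on random labels for a batch of 128 x 128 masks.
rng = np.random.default_rng(0)
true = rng.integers(0, 11, size=(4, 128, 128))
pred = rng.integers(0, 11, size=(4, 128, 128))
matrix = class_probability_matrix(true, pred)
print(matrix.diagonal())  # per-class recognition probabilities
```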

Fig. 5. Test patterns results: (a) – MultiNet; (b) – SegNet; (c) – U-Net

5 Conclusions

With a fixed size of the training set, neuroarchitectures with a smaller number of
adjustable parameters have an advantage, due to the tight connection between this
number and the number of training examples.
Under these conditions, the best results were shown by the SegNet network, for
which the average value of the diagonal elements of the probability matrix is higher
than that of the MultiNet and U-Net networks. It should be noted, however, that the
recognition of objects belonging to the Vehicle class, which is essential for the
applications in question, is a difficult task for all analyzed networks.

Acknowledgement. This research is supported by the Ministry of Science and Higher Educa-
tion of the Russian Federation as Project No. 9.7170.2017/8.9.

References
1. Finn, A., Scheding, S.: Developments and Challenges for Autonomous Unmanned Vehicles.
Springer, Heildelberg (2010)
2. Valavanis, K.P.: Advances in Unmanned Aerial Vehicles: State of the Art and the Road to
Autonomy. Springer, Netherlands (2007)
3. Favorskaya, M.N., Jain, L.C. (eds.): Computer vision in control systems. Aerial and satellite
image processing, vol. 3. Springer, Heidelberg (2018)
4. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, London (2011)
5. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall, New Jersey
(2002)
6. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge
(2017)
7. Zhao, Z.-Q., et al.: Object detection with deep learning: a review. arXiv:1807.05511v2 [cs.
CV]. Accessed 16 Apr 2019
8. Chen, L.-C., et al.: DeepLab: semantic image segmentation with deep convolutional nets,
Atrous convolution, and fully connected CRFs. arXiv:1606.00915v2 [cs.CV]. Accessed 12
May 2017
9. Hu, R., et al.: Learning to segment everything. arXiv:1711.10370v2 [cs.CV]. Accessed 27
March 2018
10. Gu, J., et al.: Recent advances in convolutional neural networks. arXiv:1512.07108v6 [cs.
CV]. Accessed 19 Oct 2017
11. WorldView-3 Satellite Imagery, DigitalGlobe, Inc. (2017)
12. Qu, J.J., et al.: Earth Science Satellite Remote Sensing: Data, Computational Processing, and
Tools, vol. 2. Springer, Heidelberg (2006)
13. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall, New York
(2009). Pearson
14. Ronneberger, O, Fischer, P, Brox, T.: U-Net: convolutional networks for biomedical image
segmentation. arXiv:1505.04597v1 [cs.CV]. Accessed 18 May 2015
15. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder
architecture for image segmentation. arXiv:1511.00561v3. [cs.CV]. Accessed 10 Oct 2016
16. Teichmann, M., et al.: MultiNet: real-time joint semantic reasoning for autonomous driving.
arXiv:1612.07695v2 [cs.CV]. Accessed 8 May 2018
17. Huang, G., et al.: Densely connected convolutional networks. arXiv:1608.06993v5 [cs.CV].
Accessed 28 Jan 2018
18. Zhao, H., et al.: ICNet for real-time semantic segmentation on high-resolution images. arXiv:
1704.08545v2 [cs.CV]. Accessed 20 Aug 2018
19. Pohlen, T., et al.: Full-resolution residual networks for semantic segmentation in street
scenes. arXiv:1611.08323v2 [cs.CV]. Accessed 6 Dec 2016
20. Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: benchmark and state
of the art. Proc. IEEE 105(10), 1865–1883 (2017)
Diagnostics of Water-Ethanol Solutions
by Raman Spectra with Artificial Neural
Networks: Methods to Improve Resilience
of the Solution to Distortions of Spectra

Igor Isaev1,2(&), Sergey Burikov1,2, Tatiana Dolenko1,2 ,


Kirill Laptinskiy1,2, and Sergey Dolenko1
1
D.V. Skobeltsyn Institute of Nuclear Physics,
M.V. Lomonosov Moscow State University, Moscow, Russia
[email protected], [email protected]
2
Physical Department, M.V. Lomonosov Moscow State University,
Moscow, Russia

Abstract. In this study, we consider adding noise during training of a neural
network as a method of improving the stability of its solution to noise in the
data. We tested this method in solving the inverse problem of Raman spec-
troscopy of aqueous ethanol solutions, for a special type of distortion caused by
changes in the power of laser pump leading to compression or stretching of the
spectrum. In addition, we tested the method on the spectra of real alcoholic
beverages.

Keywords: Neural networks · Inverse problems · Raman spectroscopy ·
Water-ethanol solutions

Study performed at the expense of the Russian Science Foundation, project no. 19-11-00333.

1 Introduction

The problem of quality control of alcoholic beverages considered in this paper is to
detect toxic impurities (methanol, fusel oils, etc.) and to determine their concentrations.
The methods of solving this problem must be accurate, cheap, fast and non-contact.
Currently, there are a number of methods that allow solving this problem with a
sufficiently high accuracy: chromatography [1], NMR [2, 3], chemical methods.
However, they are expensive and time-consuming, and are not contactless, i.e. require
opening the container and extracting a certain amount of sample. As an alternative, the
method of Raman spectroscopy [4–6] was proposed, which is fast and non-contact, and
does not require complex sample preparation and expensive reagents.
Unfortunately, currently there is no analytical solution for the inverse problem
(IP) of Raman spectroscopy, and empirical methods based on the measurement of the
intensity of characteristic lines [4] are not applicable in the case of a large number of
components. Therefore, machine learning methods are actively used to solve IP in

spectroscopy. For example, artificial neural networks (ANN) have been successfully
used for determination of the concentrations of salts dissolved in water by Raman
spectra [5, 6], for rapid determination of wine components by absorption spectra [7], to
determine the content of glucose in urine by IR absorption spectra [8].
The IP considered in this paper, like many other IPs, is characterized by ill-
posedness and poor conditioning, resulting in high sensitivity of the solution to noise in
the data. Despite the fact that ANNs by themselves have the ability to work with noisy
data, in the case of IPs this ability is not enough, requiring the development of special
approaches to improve the stability of the neural network solution.
In the previous studies of the authors [9–11], it was proposed to use addition of
noise during training to improve the stability of the neural network solution of IP. The
basis for this is a number of studies, where it was shown that this method could
improve the generalizing capabilities of the network [12, 13], prevent overtraining [14–
16], as well as increase the speed of training [17], and that its use was equivalent to
Tikhonov regularization [18].
In this paper, this method was tested in relation to the IP of spectroscopy of
aqueous ethanol solutions. In this case, a special type of distortion affecting the entire
spectrum at once was considered.

2 Problem Statement

2.1 Data Preparation

Experimental Setup. A spectrometer consisting of an argon laser (wavelength


488 nm, power 200 mW), a monochromator, and a CCD detector was used. The
spectra were recorded in the range of 200–3800 cm−1, with a resolution of 2 cm−1. For
each sample, 10 spectra were taken, which were then averaged.
The fundamental possibility of determining the concentrations of ethanol and the
impurities is due to the fact that each of the components has specific lines in the ranges
200–1600 cm−1 and 2600–3800 cm−1 (Fig. 1). Concentrations of single-component
aqueous solutions can be determined by the intensity of these lines [4]. However, for
multi-component solutions, this approach is not applicable, since the lines of the
components under consideration overlap (Figs. 1 and 2, left).
Simulation Alcohol Drinks Set. Alcoholic beverages of different strength were
modeled, and the following ethanol concentrations were considered: 35, 38, 40, 42, 45,
49, 53, 57%. Impurity concentrations varied from zero to lethal dose and were as
follows: methanol – 0, 0.05, 0.14, 0.4, 1.1, 3.1, 8.6, 24%; fusel oil – 0, 0.025, 0.07,
0.22, 0.66, 2, 6, 18%; ethyl acetate – 0, 0.17, 0.35, 0.7, 1.4, 2.8, 5.6, 11.2%. Fusel oil
was modeled by a mixture of isoamyl and isopropyl alcohols in a ratio of 70/30.
4043 spectra with various combinations of the considered components were recorded
(Fig. 2).
Real Alcohol Drinks Set. To test the results of the work, a data set containing spectra
of 69 real alcoholic beverages: vodka, gin, tequila, liqueurs etc. (Fig. 2, right) was
recorded. In addition, spectra of pure alcohol and distilled water were also included.
There were a total of 73 patterns.

Fig. 1. Raman spectra of pure substances.

Fig. 2. Raman spectra of water-ethanol solutions.

2.2 Description of Distortions


Experimental data of spectroscopy IP are subject to distortions of the following types:
(a) Deviations in the concentrations of the solution components due to inaccuracies in
the preparation of solutions (the “true” concentrations of the components for each
spectrum were not measured by an alternative method, but were set during the
preparation of each sample).
(b) Random errors in determining the intensity of the spectra channels by the CCD-
detector.
(c) Frequency shift of the spectrum channels, which may be due to uncontrolled
change of adjustment of the experimental setup when replacing the sample.
(d) Spectra distortions caused by a change in the laser power or by a change in the
absorption coefficient of the sample container, which leads to stretching or con-
traction of the spectrum.
(e) Variable pedestal caused by light scattering on inhomogeneities of the medium density
(Fig. 2).
The purpose of this study was to verify the applicability of the previously devel-
oped methods of improving the resilience of the neural network solution of IP to noise
in the data to the problem of spectroscopy of aqueous ethanol solutions in relation to
the fourth type of distortion (stretching/contraction).

3 Solving the Problem

3.1 Data Preprocessing


In order to compensate distortions such as stretching/contraction, normalization is
usually used. For example, in the problem of ion concentration determination [11],
Raman spectra are normalized to the area (or maximum) of valence band of water. The
basis for this is the assumption that water concentration is approximately the same in all
samples. In the case of aqueous ethanol solutions, this assumption is not fulfilled: with
an increase in the proportion of ethanol in the solution, the proportion of water
decreases, due to which the intensity of the ethanol bands increases, and the intensity of
the valence band of water decreases (Fig. 2, center). In view of this, in the present
study, normalization was not performed, thus increasing the complexity of the problem
being solved.

3.2 Using Neural Networks


In this study, we used a multilayer perceptron with 32 neurons in the single hidden
layer. Activation function was logistic in the hidden layer, and linear in the output
layer. Training was carried out by the method of stochastic gradient descent. Each
network was trained 5 times with various initializations of weights, the statistics of
application of these 5 networks were averaged.
To prevent overtraining, the early stopping method was used. To do this, the initial
array of spectra was randomly divided into training, validation, and test sets, which
contained 2799, 779, 445 patterns, respectively. The training was stopped after 500
epochs without improving the result on the validation set.
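A perceptron with the configuration described above can be reproduced, for example, with scikit-learn. This is only a sketch: the original work uses its own training code, and here early stopping is driven by scikit-learn's internal validation split rather than by the separate validation set of 779 patterns; the synthetic data merely stand in for the spectra.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# 32 logistic neurons in a single hidden layer, linear output,
# stochastic gradient descent, stop after 500 epochs without improvement.
model = MLPRegressor(
    hidden_layer_sizes=(32,),
    activation="logistic",
    solver="sgd",
    learning_rate_init=0.01,
    early_stopping=True,        # internal validation split (see note above)
    validation_fraction=0.2,
    n_iter_no_change=500,
    max_iter=5000,
)

# Synthetic stand-in for the data: 1000 patterns, 300 channels, 4 target concentrations.
rng = np.random.default_rng(0)
X = rng.random((1000, 300))
y = X[:, :4] + 0.01 * rng.standard_normal((1000, 4))
model.fit(X, y)
print(model.score(X, y))
```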

3.3 Method of Training with Noise Addition


In [9] it was shown that the optimal method of training was the one when the training
of the ANN was carried out on a training set with addition of noise, and stop of the
training on a validation set without noise. In this case, the quality of the solution was
higher, and the training time – less. This approach was used in the present study.
The type of distortion (stretching/contraction) considered in this paper was modeled
as multiplicative noise. Two statistics were considered – Gaussian and uniform. The
noise levels considered were 1, 3, 5, 10, 20%. Thus, including the initial data sets
without noise, 11 training sets and 11 test sets, as well as 1 validation set were used.
Each initial pattern of the training and test sets was presented in 10 noise real-
izations. Networks trained on a training set with a certain noise level were applied to
test sets of all noise levels of the same statistics.
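The generation of the noisy training realizations can be sketched as follows. A single multiplicative factor (1 + ε) applied to all channels of a spectrum is an assumed model of the stretching/contraction distortion, and the interpretation of the noise level as the standard deviation (Gaussian) or half-width (uniform) of ε is also an assumption.

```python
import numpy as np

def noisy_realizations(spectra, level=0.20, statistics="gaussian",
                       n_realizations=10, seed=0):
    """For each spectrum, generate n_realizations copies multiplied by a random
    factor (1 + eps) common to all channels, modeling stretching or contraction
    of the whole spectrum; 'level' is the relative noise level (assumed meaning)."""
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    n = spectra.shape[0]
    if statistics == "gaussian":
        eps = rng.normal(0.0, level, size=(n_realizations, n, 1))
    else:
        eps = rng.uniform(-level, level, size=(n_realizations, n, 1))
    noisy = (1.0 + eps) * spectra[None, :, :]
    return noisy.reshape(-1, spectra.shape[1])

# Example: 2799 training spectra (the channel count here is illustrative)
# give 27990 noisy training patterns.
train = np.random.default_rng(1).random((2799, 1801))
print(noisy_realizations(train, level=0.05, statistics="uniform").shape)
```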

4 Results

4.1 Simulation Alcohol Drinks Set


For the first data set that modeled alcoholic beverages, the results for ethanol are shown
in Fig. 3. One can see that the resilience of the solution to distortions in the data is
higher for distortions that have uniform statistics (Fig. 3, right) than for distortions
having Gaussian statistics (Fig. 3, left).

Fig. 3 The dependence of the quality of the solution (mean absolute error, MAE) for ethanol on
the distortion level in the test set for various distortion statistics: left – multiplicative Gaussian
distortion (mgd), right – multiplicative uniform distortion (mud). Various lines represent various
distortion levels in the training set.

For the method of adding noise during training, one can see that the higher is the
noise level in the training set, the slower is the deterioration of the solution when the
noise level in the test set increases. For the other components under consideration, the
nature of the dependencies is completely similar.
The low level of error may indirectly indicate that the dataset is representative.

4.2 Real Alcohol Drinks Set


To solve the problem on the data containing spectra of real alcoholic beverages, the
basic version of ANN was trained without adding distortions to the training set. In this
case, the results for almost the entire data set went into saturation, i.e., showed the lower or
upper limit of the concentrations in the training sample (Fig. 4, left). This fact indicates a
high degree of difference between the two data sets.

Fig. 4 Results of application of neural networks to determine ethanol concentration in real


alcohol drinks. Left – network trained without adding distortions, right – networks trained with
addition of 20% Gaussian distortions to training set. Markers represent the determined
concentrations of ethanol; lines represent concentrations declared by drink producers.

Therefore, in the second case, the networks were trained at the maximum (20%)
level of Gaussian noise. The results are shown in Fig. 4, right.
When using the networks trained with noise, the results of determination of the
concentrations were close to those stated by the manufacturers. The average deviation
was 2.07% vol.

5 Conclusion

The following conclusions can be drawn from the results of the work:
• When using this method, the following effect has been confirmed: the higher is the
noise level in the training set, the slower the solution quality decreases with increase
of the noise level in the test set.
• The resilience of the solution to distortions in the data is higher for distortions
having uniform statistics than for distortions having Gaussian statistics.
• A dataset of spectra of real alcoholic beverages differs significantly from the dataset,
simulating alcoholic beverages. As a result, the networks trained without adding
noise, failed to give reasonable results.
• Networks trained with the addition of Gaussian noise with the level of 20% showed
an average deviation of 2.07% vol.
Thus, the effectiveness of the method of training with noise to improve the resi-
lience of the neural network solution of the inverse problem of spectroscopy of aqueous
ethanol solutions was confirmed.

References
1. Leary, J.: A quantitative gas chromatographic ethanol determination. J. Chem. Educ. 60(8),
675 (1983)
2. Isaac-Lam, M.: Determination of alcohol content in alcoholic beverages using 45 MHz
benchtop NMR spectrometer. Int. J Spectrosc. 2016(2526946), 8 (2016)
3. Zuriarrain, A., Zuriarrain, J., Villar, M., Berregi, I.: Quantitative determination of ethanol in
cider by 1H NMR spectrometry. Food Control 50, 758–762 (2015)
4. Boyaci, I., Genis, H., et al.: A novel method for quantification of ethanol and methanol in
distilled alcoholic beverages using Raman spectroscopy. J. Raman Spectrosc. 43(8), 1171–
1176 (2012)
5. Dolenko, S., Burikov, S., et al.: Adaptive methods for solving inverse problems in laser
Raman spectroscopy of multi-component solutions. Patt. Recogn. Image Anal. 22(4), 551–
558 (2012)
6. Dolenko, S., Burikov, S., et al.: Neural network approaches to solution of the inverse
problem of identification and determination of partial concentrations of salts in multi-
component water solutions. LNCS, vol. 8681, pp. 805–812 (2014)
7. Martelo-Vidal, M., Vázquez, M.: Application of artificial neural networks coupled to UV–
VIS–NIR spectroscopy for the rapid quantification of wine compounds in aqueous mixtures.
CyTA J. Food 13(1), 32–39 (2015)
8. Liu, W., Wang, W., et al.: Use of artificial neural networks in near-infrared spectroscopy
calibrations for predicting glucose concentration in urine. LNCS, vol. 5226, pp. 1040–1046
(2008)
9. Isaev, I.V., Dolenko, S.A.: Training with noise as a method to increase noise resilience of
neural network solution of inverse problems. Opt. Mem. Neural Netw. (Inf. Opt.) 25(3),
142–148 (2016)
10. Isaev, I.V., Dolenko, S.A.: Adding noise during training as a method to increase resilience of
neural network solution of inverse problems: test on the data of magnetotelluric sounding
problem. Studies in Computational Intelligence, vol. 736, pp. 9–16 (2018)
11. Isaev, I., Burikov, S., Dolenko, T., Laptinskiy, K., Vervald, A., Dolenko, S.: Joint application
of group determination of parameters and of training with noise addition to improve the
resilience of the neural network solution of the inverse problem in spectroscopy to noise in
data. LNCS, vol. 11139, pp. 435–444. Springer, Cham (2018)
12. Holmstrom, L., Koistinen, P.: Using additive noise in back-propagation training. IEEE
Trans. Neural Netw. 3(1), 24–38 (1992)
13. Matsuoka, K.: Noise injection into inputs in back-propagation learning. IEEE Trans. Syst.
Man Cybern. 22(3), 436–440 (1992)
14. An, G.: The effects of adding noise during backpropagation training on a generalization
performance. Neural Comput. 8(3), 643–674 (1996)
15. Zur, R.M., Jiang, Y., Pesce, L.L., Drukker, K.: Noise injection for training artificial neural
networks: a comparison with weight decay and early stopping. Med. Phys. 36(10), 4810–
4818 (2009)
16. Piotrowski, A.P., Napiorkowski, J.J.: A comparison of methods to avoid overfitting in neural
networks training in the case of catchment runoff modeling. J. Hydrol. 476, 97–111 (2013)
17. Wang, C., Principe, J.C.: Training neural networks with additive noise in the desired signal.
IEEE Trans. Neural Netw. 10(6), 1511–1517 (1999)
18. Bishop, C.M.: Training with noise is equivalent to Tikhonov regularization. Neural comput.
7(1), 108–116 (1995)
Metaphorical Modeling of Resistor Elements

Vladimir B. Kotov, Alexandr N. Palagushkin,


and Fedor A. Yudkin(&)

Scientific Research Institute of System Analysis, Moscow, Russia


[email protected]

Abstract. Variable resistors that change their resistance during operation
may become the basis for creating neural network elements
(synapses, neurons, etc.). The processes leading to the resistance change are
extremely complicated and are not yet amenable to correct description. To
master the possibilities of using variable resistors, it is reasonable to use
metaphorical modeling, i.e. to replace a complex physical system with a simple
mathematical system with a small number of parameters, reproducing the
important features of real system’s behavior. A simple (elementary) resistor
element with state determined by a single scalar variable is considered as the
modeling unit. The equations describing the change of the state variable are
written down. The choices of functions and parameters in equations, as well as
the methods of such elements combination with traditional electronic compo-
nents (fixed resistors, capacitors, diodes, etc.) are discussed. The selection of
these functions from a small set and the adjustment of several parameters allow
us to obtain the characteristics close to real ones. The scheme of measuring the
“volt-ampere characteristics” is considered. An example of specific selection of
functions determining the resistor element behavior is given.

Keywords: Variable resistor · State of resistor · Equation of the state change ·
Volt-ampere characteristics

1 Introduction

One of the most promising directions in developing the elemental base of neuromorphic
devices is mastering the possibilities of applying variable resistors [1, 2]. Such
resistors change their resistance in the process of functioning and can become
the basis for creating analogs of neural network elements (synapses, neurons, etc.) [2].
The special term “memristors” was even coined for these elements. However,
different authors understand this term differently. Moreover, the term itself implies the
presence of non-volatile memory in “memristors”, which is not necessary at all
for the implementation of neural elements. Therefore, in order to avoid misunderstandings,
we will not use this term.
The functioning of variable resistors is based on various physical processes [3–6],
which are not yet fully understood due to their complexity. The constructed “physical”
models are not actually physical and require the adjustment of parameters. From the
point of practical development of neuromorphic devices it would be much more useful
to have the simplest model reproducing the main features of behavior, although unable
to approximate the characteristics of devices with high accuracy due to small number of
parameters. At the same time, the model construction is based on general principles, its
specification is aimed at maximum simplification (provided that the required charac-
teristic features are preserved).

2 Equations and Assumptions

The resistor element obeys Ohm’s law

U = R I,                                                                (1)

where U is the voltage on the resistor, I is the current flowing through the resistor, and
R is the resistance. As a result of the current flow (and/or under the action of voltage),
changes occur in the resistor, and this is expressed in a change of its resistance. The
state of the resistor can be described by state variables. We assume that one scalar
variable x is enough to describe the state, so R = R(x). Then the general equation
describing the change of state has the form

dx/dt = F(x, U, I, t).                                                  (2)

Under common conditions the dependence of the right-hand side on time can be
ignored. By means of Ohm’s law it is possible to exclude one of the quantities U or I.
As a result, we obtain the equation

dx/dt = F(x, I)                                                         (3)

or a similar equation with I → U. Which of these equations to use is a matter of
convenience. In many cases it can be assumed that the change in resistance is mainly
due to the flowing current, with the current-dependent function having a simpler form.
Let us assume that the state variable lies between 0 and 1: 0 ≤ x ≤ 1. This can always
be achieved by converting the variable. We also consider that in the state x = 0 the
resistor has the maximum resistance, and in the state x = 1 the minimum resistance.
The equations of state change are written for 0 < x < 1. In order to avoid going beyond
the range of permissible values, the right-hand side of the equations should be considered
equal to zero at x ≤ 0 and x ≥ 1. The accepted assumptions do not fix the choice of
the state variable; it can be done in different ways. We can bind the state variable to the
resistance R:

R(x) = R_0 − ΔR·x,  where R_0 = R(0) > 0,  ΔR = R(0) − R(1) > 0,        (4)

or to the conductivity G = 1/R:

G(x) = G_0 + ΔG·x,  where G_0 = G(0) > 0,  ΔG = G(1) − G(0) > 0.        (5)

Let us proceed to concretization (simplification) of the function F from Eq. (3). We use
the expression

F(x, I) = F_x^+(x) F_I^+(I) + F_x^-(x) F_I^-(I) + F_0(x).               (6)

The first term describes the effect of positive current on the change of the resistor
state. The effect of negative current is described by the second term. The splitting into
positive and negative (relative to the current direction) parts is due to the fact that for
the most interesting types of resistors the processes during positive and negative
currents are different. Meanwhile, currents of different directions usually tend to
change the state variable in opposite directions. This is true in particular for structures
of the metal–dielectric/semiconductor–metal type [6]. In this case it is convenient to
define the current direction in accordance with the resistor orientation: the positive
current tends to increase the state variable x, and the negative current to reduce it. Here we
can assume that the functions F_I^+(I), F_I^-(I), describing the dependence of the rate of
change of x on the magnitude of the positive and negative currents, have the following properties:

F_I^+(I) = 0 at I ≤ 0;   F_I^+(I) > 0,  dF_I^+(I)/dI > 0 at I > 0;
F_I^-(I) = 0 at I ≥ 0;   F_I^-(I) > 0,  dF_I^-(I)/dI < 0 at I < 0.      (7)

We note that the properties (7) are not universal. Thus, if the state variable is the
normalized temperature, heating occurs regardless of the current direction and both
summands have the same sign. However, this case is not very interesting in practice.
In addition, in that case a restriction to current of one direction is possible, so it is
sufficient to take only the first term on the right-hand side of Eq. (6).
As functions with the properties (7) one can take the family of power functions on
the semi-axis, that is,

F_I^+(I) = B_+ I^{β_+} at I > 0,   F_I^-(I) = B_- (−I)^{β_-} at I < 0,  (8)

where B_+, B_- are positive coefficients and β_+, β_- are positive exponents. Many
models in use assume unit exponents [1, 4], but other exponents may also
be useful. For example, sufficiently high exponents (not less than about 2)
provide functions that are good replacements for threshold functions.
The functions F_x^+(x), F_x^-(x) describe the inhomogeneity in x of the rate of state
change. It is natural to assume that

F_x^+(x) ≥ 0,   F_x^-(x) ≤ 0 at 0 < x < 1.                              (9)

These functions, like the function F_0(x), depend on the method of state variable
determination.
The function F_0(x) is used to describe the evolution of the resistor state in the absence
of current. The change of state has the character of an approach to a stationary state, which
either coincides with one of the boundary states (x = 0 or x = 1) or corresponds to a zero
of the function F_0(x). Let us assume for definiteness that there is only one stationary
(basic) state x = 0. This is the most typical case. In this case we must have

F_0(x) < 0 at 0 < x < 1.                                                (10)

The most convenient are the power functions

F_0(x) = −f_0 x^α   (f_0 > 0, α ≥ 0).                                   (11)

Equation (3) for I = 0 with a function F_0(x) of the form (11) has the solution

x(t) = x(t_0) − f_0 (t − t_0)                                for α = 0,
x(t) = x(t_0) exp{−f_0 (t − t_0)}                            for α = 1,            (12)
x(t) = [x(t_0)^{1−α} + (α − 1) f_0 (t − t_0)]^{1/(1−α)}      for α ≠ 0, 1

(t_0 is the initial time). At α < 1, the basic state is reached in the finite time
t − t_0 = x(t_0)^{1−α}/((1 − α) f_0), after which the state is unchanged. At α = 1, the
variable x tends to zero exponentially. Although the basic state is not reached,
the approach to it is very fast. In both cases there is no sense in talking about long-term
memory. At α > 1, the approach to the basic state follows the power law
x ∝ (t − t_0)^{1/(1−α)}. The higher the exponent α, the slower the relaxation proceeds. The
memory of the initial state is retained long enough. Hence, the function (11) with a
sufficiently high exponent α allows us to model long-term memory.
If one needs to model memory with an infinite storage time, the function F_0(x)
should be equal to zero for x in a continuous interval.
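The relaxation laws (12) are easy to verify numerically. The sketch below integrates dx/dt = −f_0 x^α by the forward Euler method and compares the result with the closed-form solution for α > 1; the parameter values are illustrative.

```python
def relax_numeric(x0, f0, alpha, t, steps=200000):
    """Forward-Euler integration of dx/dt = -f0 * x**alpha on [0, t]."""
    x, h = x0, t / steps
    for _ in range(steps):
        x = max(x - h * f0 * x**alpha, 0.0)
    return x

def relax_exact(x0, f0, alpha, t):
    """Closed-form solution (12) for alpha != 0, 1."""
    return (x0**(1 - alpha) + (alpha - 1) * f0 * t) ** (1.0 / (1 - alpha))

x0, f0, alpha, t = 0.9, 1.0, 3.0, 50.0   # illustrative values
print(relax_numeric(x0, f0, alpha, t), relax_exact(x0, f0, alpha, t))
# Slow power-law decay: after t = 50 the state variable is still about 0.1,
# whereas exponential relaxation (alpha = 1) would have reached ~1e-22.
```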

3 Circuit to Measure the Electrical Characteristics

To measure the electrical characteristics of a variable resistor, a fixed resistor with
resistance r is connected in series with it, and a given voltage u(t) is applied to the
resulting pair. Measuring the voltage on the fixed resistor allows us to find the current
through the resistors, the voltage on the variable resistor, and its resistance. The fixed
resistor is also needed to limit the current. At R(1) ≪ R(0), the dynamic range of the
current is very large, and the high current at x → 1 in the absence of the fixed resistor
could damage the variable resistor. The resistance r is usually chosen according to the
condition R(1) ≪ r ≪ R(0).
For this circuit

I = u / (R + r),                                                        (13)

therefore Eq. (3) with the representation (6) is written as

dx/dt = F_x^+(x) F_I^+(u / (R(x) + r)) + F_x^-(x) F_I^-(u / (R(x) + r)) + F_0(x).      (14)

At u ≤ 0, the first term on the right-hand side of Eq. (14) is equal to zero, and the two
other summands are negative, taking into account the properties (7), (9), (10). This
means that an accelerated relaxation to the ground state x = 0 takes place. The
negative voltage u can be used for fast erasure of information.
The second term on the right-hand side of Eq. (14) vanishes at u ≥ 0. The remaining
summands have opposite signs. Their sum can be either positive or negative, depending
on x and u. Considering the right-hand side F(x, I) of Eq. (14) as a function of x and u, we
obtain a partitioning of the region of permissible values 0 ≤ x ≤ 1, u ≥ 0 into the regions
F > 0 and F < 0. At F > 0, the state variable x increases over time, and at F < 0 it decreases.
The regions F > 0 and F < 0 are separated by the curve F = 0. Above the curve F = 0
(i.e., at larger u values) lies the region F > 0, below it the region F < 0. The equation
F = 0 at a given value of u determines the stationary point x_st(u) corresponding to the
stationary state of the resistor at a constant source voltage u. The stationary point is a
stable equilibrium point (a stable stationary point) if to its left (x < x_st) we have
F > 0 and to its right F < 0. Otherwise we have an unstable stationary point.
Besides the stationary points determined by the equation F = 0, boundary stationary
points are possible. The point x = 0 is a stable stationary point if at small positive
values of x we have F < 0. The point x = 1 is a stable stationary point when to its left
F > 0.
It is the stable stationary points that play the determining role at direct (constant)
voltage u, since in this case Eq. (14) describes the approach to a stationary point. In most
cases the approach to the stationary point is fast enough (exponential), or the stationary
point is even reached in finite time. The conclusions for the case of direct voltage u can
be extended to the case of a quasi-stationary voltage change, when the state of the
resistor has time to adjust to the current voltage.
In this case (i.e., at u > 0), the equation of the curve F = 0 can be written in the form

u = P(x),                                                               (15)

where

P(x) = (R(x) + r) h(−F_0(x) / F_x^+(x));                                (16)
Fig. 1. Increasing function P(x) and one-dimensional trajectories at different u

Fig. 2. Nonmonotonous function P(x) and one-dimensional trajectories

here h(z) is the function inverse to the function F_I^+(I). Provided that the inequalities
(7) hold and that the function F_I^+(I) is unbounded as I → +∞, the function h(z) maps
the positive semi-axis one-to-one and monotonically onto the positive semi-axis.
Obviously, P(x) > 0 at 0 < x < 1. Let us denote by P_s and P_i the exact upper and
lower bounds of the function P(x) at 0 < x < 1. If the function P(x) is unbounded, we
set P_s = ∞. For u < P_i, Eq. (15) has no solutions for the variable x, and the only
stationary (stable) point is the boundary point x = 0. At u > P_s, Eq. (15) also has no
solutions, and the only stationary point is the boundary point x = 1.
For P_i < u < P_s, Eq. (15) has at least one solution. If the function P(x) is
increasing, then the solution of Eq. (15) is unique. This solution determines the
sole stationary (stable) point. Figure 1 shows such a curve F = 0 together with the one-
dimensional trajectories of the imaging point movement at different voltages u. If
the function P(x) is monotonically decreasing, then the only solution of Eq. (15)
determines an unstable stationary point. In this case both boundary points x = 0 and
x = 1 are stable stationary points.
For a nonmonotonous function P(x), in a certain range of voltage u values
Eq. (15) has more than one solution. A solution x_st corresponding to a positive slope of
the curve u = P(x) gives a stable stationary point, and if at x = x_st the slope of the
curve is negative, we get an unstable stationary point. Additional stable stationary
points can be located at the interval boundaries. For a given value of u, the number of
stable stationary points must be one more than the number of unstable stationary points.
In typical cases there may be two stable stationary points and one unstable point
(Fig. 2).
Fig. 3. Switching branches of the function x_st(u) at quasistationary change of voltage u

Fig. 4. Function P(x) from formula (17) with one maximum, and one-dimensional trajectories

If there are several roots of Eq. (15), the dependence of the stationary (stable) point
on voltage, x_st(u), is multivalued (usually double-valued) within a certain range of volt-
ages u. Under a quasi-stationary change of the source voltage, the change of the resistor
state corresponds to movement along one of the branches of the function x_st(u). If this
branch ends, a transition to another branch inevitably occurs (Fig. 3). A sharp change of
the resistor state, accompanied by sharp changes of the resistance, current, and voltage of
the variable resistor, is the most obvious manifestation of multistability (bistability in
the case of two stable stationary states).

4 Example

Let us take F_I^+(I) in the form (8) (writing B_+ = B, β_+ = β), F_0(x) as in (11), R(x) as
in (4), and F_x^+(x) = 1. Then

P(x) = (r + R_0 − ΔR·x) (f_0/B)^{1/β} x^{α/β}.                          (17)

In this case P_i = P(0) = 0, P_s < ∞. The function P(x) on the positive semi-axis has
a maximum at

x = x_m ≡ [α/(α + β)] · (r + R_0)/ΔR.                                   (18)

If x_m ≥ 1, then the function P(x) is monotonically increasing at 0 < x < 1,
Eq. (15) has only one solution at 0 < u < P_s = P(1), representing the stable stationary
state. At u ≥ P(1), the only (stable) stationary state is the boundary state x = 1.
If x_m < 1, the maximum lies within the permissible range of the variable x (Fig. 4).
For 0 < u < P(1), Eq. (15) has a single solution representing the single stationary
state. For P(1) < u < P_s = P(x_m), Eq. (15) has two solutions: the smaller corresponds
to a stable stationary state and the larger to an unstable stationary state.
Fig. 5. “Volt-ampere characteristic” at triangular voltage feeding

Fig. 6. Time dependence x(t) with the graph of the normalized voltage of the source

The second stable stationary state is the boundary state x = 1. The same boundary state is
the only stable state at u > P(x_m).
Thus, the inequality x_m < 1 is the condition for the presence of bistability. Taking into
account that usually ΔR ≈ R_0 and r ≪ R_0, so that the second factor on the right-hand
side of (18) is of the order of one, we find that fulfillment of the bistability condition is
quite realistic. However, if the inequality x_m < 1 holds with an insufficient margin, the
range of bistability becomes rather narrow and the bistability is difficult to detect.
In practice, a periodically changing voltage of standard form (triangular, sawtooth,
sinusoidal) is used as the source voltage. The condition of quasistationarity often is not
met. In that case, the state of the resistor does not manage to get close enough to the
“stationary” state for the current value of the voltage u, so the state of the resistor tends
to a “stationary” state which is constantly changing. The resistance jumps arising due to
the bistability can be strongly smoothed due to incomplete relaxation towards the
stationary state.
Figure 5 shows the “volt-ampere characteristic” (more precisely, the trajectory of the
point with coordinates U, I) for a triangular voltage u (of positive polarity), obtained
as a result of numerical solution of Eq. (14) at α = 2, β = 1, R_0/r = 50,
R_0/(R_0 − ΔR) = 1000. The three loops correspond to three periods of the source
voltage. The difference between the loops is explained by the fact that at the completion
of a period of the voltage change, the state variable does not return to its initial value.
This is clearly seen in Fig. 6, where the graph of the dependence x(t) is presented along
with the graph of the normalized source voltage.
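The behavior shown in Figs. 5 and 6 can be reproduced qualitatively by direct integration of Eq. (14). The sketch below uses the functions of this section; all parameter values are illustrative assumptions and are not the values used to compute the figures.

```python
# Model functions from Sect. 4: F_x^+(x) = 1, F_I^+(I) = B*I**beta,
# F_0(x) = -f0*x**alpha, R(x) = R0 - dR*x. Parameter values are illustrative.
R0, dR, r = 1.0, 0.999, 0.02          # R(1) << r << R(0)
f0, B, alpha, beta = 4.0, 1.0, 2.0, 1.0

def dxdt(x, u):
    I = u / (R0 - dR * x + r)
    drive = B * I**beta if I > 0 else 0.0   # positive polarity only
    return drive - f0 * x**alpha

def triangular(t, period=1.0, amplitude=1.0):
    phase = (t / period) % 1.0
    return amplitude * (2 * phase if phase < 0.5 else 2 * (1 - phase))

h, x, t = 1e-4, 0.0, 0.0
U, I = [], []
for _ in range(30000):                 # three periods of the source voltage
    u = triangular(t)
    x = min(max(x + h * dxdt(x, u), 0.0), 1.0)   # explicit Euler step, clipped to [0, 1]
    i = u / (R0 - dR * x + r)
    I.append(i)
    U.append(u - r * i)                # voltage on the variable resistor
    t += h
# Plotting U against I traces loops resembling the "volt-ampere characteristic" of Fig. 5;
# the loops differ because x does not return to its initial value after each period.
```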

5 Instead of Conclusion. Combinatorics

The considered model of simple resistor element can explain a lot and predict some-
thing. But not everything. It’s natural. The world of resistor elements is diverse and
cannot be covered by one simple model. Nevertheless the capabilities of model can be
significantly expanded if we are not limited to one element and build on its basis the
various combinations.

The real variable resistor has two poles (contacts). Any of contacts can be a source
of variable (controlled) resistance. This is in any case true for metal-dielectric-metal,
metal-semiconductor-metal and other similar structures. To model such structures it is
necessary to use not one resistor element, but two parallel oppositely directed resistor
elements. The resulting combination has much richer capabilities (and is more com-
plex) than one resistor element.
At the contacts between different materials (for example, a metal and a dielectric),
diverse diode structures characterized by nonlinear volt-ampere characteristics may
occur. If such a characteristic can be considered constant, with no memory, then the
effect of the structure reduces to a series connection of a diode or another similar non-
linear element. If the volt-ampere characteristic depends on previous events, then for
such an additional diode with memory it is possible to use a model similar to the one
considered above, but with a nonlinear Ohm’s law. In many cases it is convenient to
consider the diode and resistor elements as one.
The simple resistor element can act as a memory element – an analog of a synapse – if
the rate of resistor relaxation is somehow limited, for example if the exponent α in
formula (11) is large enough. The resistor element can also be used as a nonlinear
element – an analog of a neuron – since the resulting characteristics of the resistor are
essentially nonlinear, and even bistability is possible. If a capacitor is connected in
parallel to a resistor element (simple or combined), it is possible to implement the
“leaky integration” found in many neural networks.
The considered model is not only useful for describing existing resistor ele-
ments; it can also indicate directions for improving prospective elements.

Funding. The work was financially supported by the State Program of SRISA RAS No. 0065-2019-
0003 (AAA-A19-119011590090-2).

References
1. Adamatzky, A., Chua, L.: Memristor Networks. Springer, Heidelberg (2014)
2. Vaidyanathan, S., Volos, C.: Advances in Memristors. Memristive Devices and Systems.
Springer, Heidelberg (2017)
3. Yang, J.J., Strukov, D.B., Stewart, D.R.: Memristive devices for computing. Nat.
Nanotechnol. 8, 13 (2013)
4. Radwan, A.G., Fouda, M.E.: On the Mathematical Modeling of Memristor, Memcapacitor
and Meminductor. Springer, Heidelberg (2015)
5. Yang, Y., Lu, W.: Nanoscale resistive switching devices: mechanisms and modeling.
Nanoscale 4, 10076 (2013)
6. Palagushkin, A.N., et al.: Aspects of the a-TiOx Memristor Active Medium Technology.
J. Appl. Phys. 124, 205109 (2018)
Semi-empirical Neural Network Models
of Hypersonic Vehicle 3D-Motion
Represented by Index 2 DAE

Dmitry S. Kozlov1,2(B) and Yury V. Tiumentsev1


1
Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected], [email protected]
2
Federal State Unitary Enterprise “State Research Institute of Aviation Systems”,
Moscow, Russia

Abstract. We consider a problem of mathematical modeling and com-


puter simulation of nonlinear controlled dynamical systems represented
by differential-algebraic equations of index 2. The solution of the prob-
lem is proposed within the framework of a neural network based semi-
empirical approach that combines theoretical knowledge of the model-
ing object with training tools applied to artificial neural networks. We
propose a particular form of semi-empirical models implementing implicit
Runge-Kutta integration formulas inside the activation function. The
training of the semi-empirical model makes it possible to refine
the models of aerodynamic coefficients implemented as part of it. We
present a semi-empirical model that uses as theoretical knowledge the
equations of a full model of hypersonic vehicle motion in the specific
phase of descent in the atmosphere. The simulation results for the prob-
lem of identifying the aerodynamic coefficient, implemented as an ANN-
module of a semi-empirical model of the movement of a hypersonic vehi-
cle, are presented.

Keywords: Dynamical system · Differential-algebraic equations ·


Semi-empirical model · Neural network based simulation

1 Introduction

The semi-empirical approach assumes the generation of gray-box models using


theoretical knowledge about the simulated object in the form of a system of
ordinary differential equations (ODE) [1]. We transform the initial theoretical
model into a semi-empirical one taking into account the methods of integrat-
ing the ODE so that the neural network methods could modify parts of the
model. In [1], we present the simulation results, confirming the high efficiency of
the semi-empirical approach compared with the traditional black-box dynamic
neural network models, such as NARX (Nonlinear AutoRegressive network with
eXogeneous inputs). The difference between the semi-empirical approach and
the NARX approach lies in the fact that in the first case when generating a
model, some of the connections between state variables and control variables
of the source system of ODEs are embedded into the model without changing.
That allows us to reduce the number of adjusting parameters of the model and
improves its generalization properties.
In [2] Runge-Kutta neural networks (RKNN) are proposed for building mod-
els of dynamic systems represented in the form of ODE. This approach also
assumes the use of theoretical knowledge about the modeling object in the form
of the explicit Runge-Kutta integration formulas implemented in the network
architecture. RKNN has layers that, taking into account the connections between
state variables, implement the right parts of the ODE system. With this app-
roach, when training RKNN, the models of the right-hand side are refined.
In some problems, in addition to ODE, the theoretical model includes alge-
braic equality-type constraints, that is, the system of differential-algebraic equa-
tions (DAE) is the basis of the theoretical model. For DAE systems, the concept
of the index of the DAE system [3] is introduced. An example of such a problem
is the controlling of vehicle descending in the upper atmosphere.
In [1,4] the semi-empirical approach based on the explicit conditionally stable
methods of the numerical integration is considered. It is not possible to use this
approach directly to modeling the systems represented by DAE. A modification
is needed that takes into account the specific character of DAE systems.

2 Semi-empirical Models for DAE Systems

Let us examine the system of the differential-algebraic equations of index 2 in


the semi-explicit form

ẏ = f (t, y, z, u), 0 = g(t, y), (1)

where y = y(t) is the vector of state variables of the system, z = z(t) is the
algebraic variable of the DAE (1), and u = u(t) are the control variables.
We reduce the index of the system (1) by differentiating the algebraic constraint
[3]. The new algebraic constraint takes the form 0 = 2ġ+g. For index 1 DAE sys-
tems, the use of one-step s-stage methods for numerical integration is promising.
The implicit Runge-Kutta (IRK) method is often used. We propose to use the
IRK method based on the quadrature formula Radau IIA [3,5,6]. Applying IRK
method to the DAE system, we get (2)–(3). Using an implicit scheme involves
solving the system of nonlinear equations (2) by Newton’s method at each step
of integration:


Y_ni = y_n + h Σ_{j=1}^{s} a_ij f(t_n + c_j h, Y_nj, Z_nj),   0 = g̃(t_n + c_i h, Y_ni, Z_ni),    (2)

y_{n+1} = y_n + h Σ_{j=1}^{s} b_j f(t_n + c_j h, Y_nj, Z_nj),   z_{n+1} = R(∞) z_n + Σ_{i,j=1}^{s} b_i ω_ij Z_nj,    (3)

Fig. 1. The structural scheme of the semi-empirical model

where h is the integration step, (a_ij), b_i, c_j are Butcher table coefficients, y_n is the vector of the state variables at step t_n, ω_ij are the elements of the matrix inverse to (a_ij), and R(∞) = 1 − Σ_{i,j=1}^{s} b_i ω_ij.
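As a concrete illustration of the quantities entering (2)–(3), the short sketch below computes ω = (a_ij)^(−1) and R(∞) for the 2-stage, order-3 Radau IIA tableau (the specific tableau is our assumption; an order-3 Radau IIA scheme is the one used in Sect. 3):

import numpy as np

# Butcher tableau of the 2-stage Radau IIA method (order 3), assumed here as
# the concrete IRK instance behind (2)-(3).
A = np.array([[5/12, -1/12],
              [3/4,   1/4]])
b = np.array([3/4, 1/4])
c = np.array([1/3, 1.0])

omega = np.linalg.inv(A)              # elements omega_ij used in (3)
R_inf = 1.0 - b @ omega @ np.ones(2)  # R(inf) = 1 - sum_{i,j} b_i omega_ij

print(omega)   # matrix inverse to (a_ij)
print(R_inf)   # 0 for Radau IIA: the scheme is stiffly accurate

For Radau IIA the value R(∞) is zero, so the z_{n+1} update in (3) is driven entirely by the stage values Z_nj.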
The structural scheme of the semi-empirical model is shown in Fig. 1. We
described the network structure in [6]. The neural network is trained using the
RTRL (Real-Time Recurrent Learning) algorithm. We form the training set as
a sequence of observed outputs for a given control and initial conditions. It
uses a random input control signal (U (t)) of a specific type [5,6]. In contrast,
to [1,4], which implements the integration scheme in the network structure, we
propose the approach whereby the procedure containing the integration scheme
is specified inside the activation function of the network layer [7]. This approach
allows us to implement both explicit and implicit integration schemes in a model.
Let us consider the RKNN [2], which implements, to simplify the calculations,
2-stage Heun’s method. The method is explicit, and the network architecture
implements a cascade scheme with a single input and output (4). The values at
each stage of the method (K1) are calculated using the values of the previous
stage (K0), followed by their composition:

K0 = Nf (yn , W), K1 = Nf (yn + hK0 , W), yn+1 = yn + h/2(K0 + K1 ), (4)

where yn , yn+1 are network input and output respectively, Nf , W are ANN-
modules that implement the right-hand sides of the ODE system and their
weights. When training the network, the delta rule (5) is used to modify the
weights of the ANN-modules. The derivatives are calculated by the chain rule
considering the fact that the network error propagates through the cascade cir-
cuit to the network input and the same ANN-modules are used at each stage of
the method:
 
∂E/∂w_j = −2(o_{n+1} − y_{n+1}) ∂y_{n+1}/∂w_j,   ∂y_{n+1}/∂w_j = (h/2)(∂K_0/∂w_j + ∂K_1/∂w_j),
∂K_0/∂w_j = (∂N_f(y_n, W)/∂W)(∂W/∂w_j),    (5)
∂K_1/∂w_j = h (∂N_f(y_n + hK_0, W)/∂y)(∂K_0/∂w_j) + (∂N_f(y_n + hK_0, W)/∂W)(∂W/∂w_j),
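For reference, a minimal sketch of the cascade forward pass (4) is given below. The single-hidden-layer perceptron N_f and its parameter layout are our own illustrative assumptions, not the architecture used in [2] or in this paper:

import numpy as np

def N_f(y, W):
    # Hypothetical one-hidden-layer tanh perceptron standing in for the
    # ANN-module that approximates the right-hand sides of the ODE system.
    W1, b1, W2, b2 = W
    return W2 @ np.tanh(W1 @ y + b1) + b2

def rknn_heun_step(y_n, W, h):
    # Cascade scheme (4): the same ANN-module is reused at both stages.
    K0 = N_f(y_n, W)
    K1 = N_f(y_n + h * K0, W)
    return y_n + 0.5 * h * (K0 + K1)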

For the proposed semi-empirical models that realize implicit integration


schemes, it is not possible to implement a cascade scheme in the network archi-
tecture. When calculating the derivative in accordance with equation (3), the
error propagates only through hbi f (tn + cj h, Ynj , Znj ). We perform calculations
taking into account that Yni, Zni are known. Since we use the same ANN-modules
for each stage, the delta rule yields several candidate values for each weight
coefficient. For the weight modification, we use the smallest of these values.

3 Simulation Results

The proposed semi-empirical models can be used in the algorithms of trajec-


tory prognosis for an aircraft descending in the upper atmosphere. During the
descent, the flight trajectory can be divided into separate parts. The motion
along each part is performed when the state variables satisfy a specific constraint
in the form of an algebraic equality [5–7]. The training of the semi-empirical model
allows elaborating the models of aerodynamic coefficients implemented in it as
a separate artificial neural network (ANN) modules.
Let us consider the identification task for the aerodynamic pitching moment
coefficient Cm within a model of the hypersonic vehicle motion. The hypersonic
vehicle model from [8] and the standard model of the atmosphere are used in the
simulation. A full model of the vehicle motion, containing differential equations
describing the trajectory and angular motion, as well as the equations of the
actuators of control surfaces (6)–(7) is considered. This model is for the zero-
thrust phase of the flight. We use the right/left elevons and the rudder as control
surfaces.

μ̇ = V cos γ sin ψ_W / (r cos λ),   λ̇ = (V/r) cos γ cos ψ_W,   sin φ_W = (cos θ sin φ + sin γ sin β) / (cos γ cos β),
Ḣ = V sin γ,   V̇ = −f_xW/m − g sin γ − ω_E² r cos λ (sin λ cos ψ_W cos γ − cos λ sin γ),
ψ̇_W = f_yW/(m V cos γ) + (V/r) cos γ sin ψ_W tan λ − 2ω_E (cos λ cos ψ_W tan γ − sin λ) + ω_E² r cos λ sin λ sin ψ_W / (V cos γ),
γ̇ = −f_zW/(m V) + (cos γ/V)(V²/r − g) + 2ω_E cos γ sin ψ_W + (ω_E² r cos λ/V)(sin λ cos ψ_W sin γ + cos λ cos γ),
T_r² d̈_r = −2T_r ξ_r ḋ_r − d_r + d_r,act,   T_a² d̈_a = −2T_a ξ_a ḋ_a − d_a + d_a,act,   T_e² d̈_e = −2T_e ξ_e ḋ_e − d_e + d_e,act,
M = B · M_E,   I_x ṗ + (I_z − I_y) r q = L̄,   I_y q̇ + (I_x − I_z) p r = M̄,   I_z ṙ + (I_y − I_x) q p = N̄,
[ψ̇; θ̇; φ̇] = [0, sin φ/cos θ, cos φ/cos θ; 0, cos φ, −sin φ; 1, sin φ tan θ, cos φ tan θ] · [p − M_x; q − M_y; r − M_z],
[M_xE; M_yE; M_zE] = [cos λ, 0; 0, −1; −sin λ, 0] · [ω_E + μ̇; λ̇],    (6)

α̇ = q − tan β (p cos α + r sin α) + (G_WZ + m a_cWZ)/(m V cos β),   d_a,act = d_pitch + d_roll,
β̇ = p sin α − r cos α + (G_WY + m a_cWY)/(m V),   d_e,act = d_pitch − d_roll,    (7)
f_xW = −D,   f_yW = Y cos φ_W + L sin φ_W,   f_zW = Y sin φ_W − L cos φ_W,
where μ is the longitude, λ is the geocentric latitude, γ is the relative flight path
angle, ψW is the relative azimuth, H is the altitude, r is the distance from the
Earth center to the center of mass of the vehicle, V is the relative velocity, φW
is the bank angle, α is the angle of attack, β is the angle of sideslip, [ψ, θ, φ]
are Euler angles, [p, q, r]T are components of angular velocity vector, B is a
matrix transforming vectors from vehicle-carried local Earth reference frame to
body-fixed, D, L, Y are total aerodynamic drag, lift and side forces respectively,
L̄, M̄ , N̄ are aerodynamic rolling, pitching and yawing moments respectively,
Ix , Iy , Iz are the roll, pitch and yaw moments of inertia respectively, da , de , dr
are the deflections of the right and left elevons and the rudder, da,act , de,act , dr,act
are control signals for right and left elevons and the rudder actuators, dpitch , droll
are pitch and roll motion control signals, T = 0.02 sec are the time constants
for right/left elevons and rudder actuators, ξ = 0.707 are the right/left elevons
and rudder actuators damping ratios, ωE is the Earth rotational rate, g is the
geopotential function, m = 191902 lb is the mass of the vehicle, acW , GW are the
vectors of the Coriolis acceleration and force of gravity in wind-axes reference
frame respectively.
In the DAE system H, μ, λ, V, ψW , γ, ψ, θ, φ, p, q, r, α, β, da , de , dr are state
variables, droll is an algebraic variable. Pitch and roll motion control signals
dpitch , droll are control variables. The rudder control law is given in [6]. We cal-
culate values droll at each step of the numerical integration of the DAE system
following the (α–φW )-technique for control of aircraft descending in the upper
atmosphere [5–7]. To ensure movement along a given trajectory, the model (6)–(7)
is augmented with an algebraic equality (8) describing the variation of the relative
flight path angle γ in the range [−4.2385◦, −10◦]. The resulting system of
equations is an index-2 DAE system. Equation (8) is transformed for calculations:
for the variable γ̇, the index reduction by differentiation and the substitution
of the right-hand sides of (6)–(7) are performed.

0 = γ + 4.2385 + 9(t/200)²,   0 = γ̇ + 18t/40000.    (8)
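The index reduction by differentiation applied to the first constraint in (8) can be reproduced symbolically; a minimal SymPy sketch (symbol names are illustrative):

import sympy as sp

t = sp.symbols('t')
gamma = sp.Function('gamma')(t)

# Algebraic constraint (8) on the relative flight path angle.
g = gamma + 4.2385 + 9 * (t / 200)**2

# One differentiation gives the hidden constraint 0 = gamma' + 18t/40000
# (printed by SymPy in the equivalent form 9*t/20000), which is then used
# together with the substituted right-hand sides of (6)-(7).
print(sp.diff(g, t))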

We generate the semi-empirical model in the form of the modular neural


network. During the simulation, the ANN-module that implements the pitching
moment coefficient Cm is retrained. As the new moment coefficient, the Cm model
corresponding to a Mach number greater by 5 than that at which the maneuver is
performed (M + 5) is used. The training set uses dpitch sequences of a particular
form as input data [5,6]. We use the values of the pitch rate q as the output data.
The weight coefficients are changed to reproduce the new relationship during the
training procedure.
We used MATLAB system and Neural Network Toolbox package when imple-
menting the semi-empirical models and in the course of the computer simula-
tions. We used t0 = 419 s and 1000 iterations were performed with the integration


Fig. 2. The semi-empirical model output for values from the test set

step t = 0.2 s. The initial values were H = 1.272e+5 ft, V = 6.922e+3 ft/sec,
γ = −4.2385◦ , ψW = 55.316◦ , μ = 183.8◦ , λ = 34.4◦ , ψ = 69.767◦ , θ = 9.64◦ ,
φ = 46.69◦ , α = 20◦ , β = 0◦ , ω = 0 rad/sec, droll = 0◦ , d˙ = 0, da = de = 0◦ ,
dr = 1◦ . The hypersonic vehicle characteristics Ix , Iy , Iz , xcg , S, c̄, b and aerody-
namic force and moment coefficient models (D, L, Y, L̄, M̄ , N̄ ) are taken from
[8]. To implement the model of the hypersonic vehicle motion a semi-empirical
model was used that realizes order 3 IRK method of numerical integration based
on Radau IIA quadrature formulas. A perceptron type network with 12 neurons
in the hidden layer was used as an ANN-module for Cm . In Fig. 2 we show the
values of the pitch control signal (dpitch ) from the test set, the values of the pitch
rate q calculated using the semi-empirical model, the values of the algebraic vari-
able (droll ) and the relevant absolute error (Eq ) of the q values reproduced by
the semi-empirical model. The root mean square deviations for the training, the
validation, and the test sets are respectively 6.6207e−4, 7.8975e−4, 0.0014.

4 Conclusions

The semi-empirical model was implemented using the equations of the full model
of the hypersonic vehicle motion in the specific part of the descent in the atmo-
sphere as theoretical knowledge. We present this system of equations as a DAE
system of index 2. The aerodynamic pitching moment coefficient implemented as
an ANN-module of a semi-empirical model has been identified to verify the train-
ing properties of this model. The obtained results demonstrate the efficiency of
the semi-empirical approach for neural network modeling of complex dynamical
systems.

Acknowledgments. This research is supported by the Ministry of Science and Higher


Education of the Russian Federation as Project No. 9.7170.2017/8.9.

References
1. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Y.V., Chernyshev, A.V.: Neural net-
work based semi-empirical models for controlled dynamical systems. J. Comput.
Inf. Technol. 9, 3–10 (2013). (in Russian)
2. Wang, Y.J., Lin, C.T.: Runge-Kutta neural network for identification of dynamical
systems in high accuracy. IEEE Trans. Neural Netw. 9(2), 294–307 (1998)
3. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and
Differential-Algebraic Problems, 2nd edn. Springer, Heidelberg (2002)
4. Egorchev, M.V., Tiumentsev, Y.V.: Learning of semi-empirical neural network
model of aircraft three-axis rotational motion. Opt. Mem. Neural Netw. (Inf. Opt.)
24(3), 201–208 (2015)
5. Kozlov, D.S., Tiumentsev, Y.V.: In: Proceedings of 8th Annual International Con-
ference on Biologically Inspired Cognitive Architectures, BICA 2017, vol. 128, pp.
252–257 (2018)
6. Kozlov, D.S., Tiumentsev, Y.V.: Neural network based semi-empirical models of
3D-motion of hypersonic vehicle. In: Advances in Neural Computation, Machine
Learning, and Cognitive Research II, pp. 196–201. Springer, Cham (2019)
7. Kozlov, D.S., Tiumentsev, Y.V.: Neural network based semi-empirical models for
dynamical systems described by differential-algebraic equations. Opt. Mem. Neural
Netw. (Inf. Opt.) 24(4), 279–287 (2015)
8. Shaughnessy, J.D., et al.: Hypersonic vehicle simulation model: winged-cone config-
uration. Technical report, NASA (1990)
Style Transfer with Adaptation
to the Central Objects of the Scene

Alexey Schekalev1 and Victor Kitov1,2(B)


1 Lomonosov Moscow State University, Moscow, Russia
[email protected]
2 Plekhanov Russian University of Economics, Moscow, Russia
[email protected]
https://victorkitov.github.io

Abstract. Style transfer is a problem of rendering an image with some


content in the style of another image, for example a family photo in
the style of a painting of some famous artist. The drawback of classical
style transfer algorithm is that it imposes style uniformly on all parts of
the content image, which perturbs central objects on the content image
(such as face and body in case of a picture with a person), and makes
them unrecognizable. This work proposes a novel style transfer algorithm
which automatically detects central objects on the content image, gen-
erates spatial importance mask and imposes style non-uniformly: central
objects are stylized less to preserve their recognizability and other parts
of the image are stylized as usual to preserve the style. Three meth-
ods of automatic central object detection are proposed and evaluated
qualitatively and via a user evaluation study. Both comparisons demon-
strate higher quality of stylization compared to the classical style transfer
method.

Keywords: Computer vision · Image processing · Style transfer ·


Image classification

1 Introduction
Non-photorealistic rendering or image stylization [5] is a classical problem in
computer vision, where the task is to render a content image in a given style.
Early methods [3,7,9] perform reproduction of specific styles (e.g. oil paintings
or pencil drawings) and use hard-coded features and algorithms for that.
Style transfer is a problem of transferring any style from arbitrary image,
representing that style, to any content image, as shown on Fig. 1. It is found
by Gatys et al. [2] that this task can be performed surprisingly well using deep
convolutional neural networks. Their main idea is to find in the space of images a
picture semantically reflecting content from the content image and style from the
style image. These two contradicting goals are regulated by minimizing simulta-
neously content loss and style loss:
y = arg min_x {L_content(x, x_c, α) + L_style(x, x_s)}    (1)

Fig. 1. Style transfer task

where xc is the content image, xs —the style image, y—the resulting stylized
image and parameter α is a weight factor (multiplier) inside the content loss
function, controlling the strength of stylization (Fig. 2a). Lower α imposes more
style and vice versa. The shortcoming of this approach is that style is imposed
uniformly onto the whole content image, distorting important central objects of
the image, which are critical for perception. For example, it is hard to say what
kind of birds sit on the tree (Fig. 2b), because small details of bird silhouettes
are lost during stylization.


Fig. 2. (a) Style transfer for different α. (b) Problem case

One may improve preservation of content by increasing α coefficient in (1).


However this solution decreases stylization strength globally, thus giving less
expressive stylization.
The paper proposes a new solution to this problem. First, central objects are
detected and selected using automatically generated spatial importance mask

for the content image. Next, this mask is used to impose style with spatially
varying strength, controlled by the importance mask. This allows to achieve two
contradicting goals. Stylization is gentle on the central objects of the image,
critical for perception, such as human faces, houses, cars, etc. And stylization is
strong for the rest of the image, thus expressing a vivid style.
The paper is organized as follows. Section 2 gives a description of the pro-
posed method and provides qualitative comparisons with the baseline stylization
method of Gatys et al. [2]. Section 3 provides the details of the user evaluation
study and summarizes its results, highlighting the superiority of the proposed
solution. Section 4 concludes.

2 Method
2.1 Non-uniform Stylization
Consider the loss function in the optimization problem (1). In the original paper
[2] content loss is formalized as follows:
L_content(x, x_c, α) = α Σ_{i,j,c} ( F^l_{i,j,c}(x) − F^l_{i,j,c}(x_c) )²    (2)

where F l (z) ∈ RWl ×Hl ×Cl denotes inner tensor representation of image z on the
l-th layer of the convolutional neural network, (i, j) are spatial coordinates and
c is the number of the channel. Instead of using scalar α, we propose to use a
matrix αi,j ∈ RWl ×Hl with different values for each spatial location (i, j):
L̃_content(x, x_c, α) = Σ_{i,j,c} α_{i,j} ( F^l_{i,j,c}(x) − F^l_{i,j,c}(x_c) )²    (3)

Making α spatially varying allows spatial control of the stylization strength.


In particular, it allows to impose less style on central objects of the scene, critical
for perception, and more style on all other areas of the image.
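A minimal sketch of the spatially weighted content loss (3) in PyTorch-like code is given below; the tensor shapes and names are our own assumptions:

import torch

def weighted_content_loss(F_x, F_xc, alpha):
    # F_x, F_xc: layer-l feature tensors of shape (C_l, W_l, H_l) for the
    # generated image x and the content image x_c; alpha: importance matrix
    # of shape (W_l, H_l) playing the role of alpha_{i,j} in (3).
    diff2 = (F_x - F_xc) ** 2
    return (alpha.unsqueeze(0) * diff2).sum()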

2.2 Automatic Central Objects Detection


Consider a convolutional neural network pre-trained for image classification. We
use VGG [8]. Such a model takes an input image of size 224 × 224 × 3 (images of a
different size are rescaled) and outputs a probability distribution for each class
from ImageNet dataset [1]. We detect central objects by filling different parts
of the input image with uniform color and measuring change in the output
class probabilities. If key object of the image is filled, one observes a drastic
change in resulting class probabilities. On the contrary, if background is changed,
class probabilities change only slightly. Overall, the magnitude of change of class
probabilities determines the importance of the filled region. This approach was
used to visualize convolutional neural networks in classification problems [10],
but in the domain of style transfer, to our knowledge, it is used for the first

time. We split the whole image into a set of regions and fill each region one by
one, evaluating its importance, using the above principle. This way we construct
an importance map αi,j , measuring semantic significance of each location (i, j)
on the image. This importance map is used as matrix α in the spatially aware
content loss (3) of the style transfer algorithm (1).


Fig. 3. (a) The probability distribution for input. (b) Changing the probability distri-
bution when the patch is overwritten

Fixed Patch-Based Mask Generation. In this approach we propose to


divide the image by a uniform grid into regular square patches p1 , ...pK (like
the input image on Fig. 3b). Denote input image with I, and Ik - input image
with k-th patch filled with constant color. We use pretrained classification con-
volutional neural network cnn(·), that takes image as input and outputs a vector
of class probabilities, corresponding to the image. We estimate the importance
of each patch k by calculating the L2 distance between the vectors of class probabilities
for the original and modified images: ‖cnn(I) − cnn(I_k)‖. Visualization of the results shows
that proposed algorithm can find central object of the scene and separate it
from the background—the muzzle of a dog on Fig. 4a. We rescale a map of patch
importances to the spatial size Wl × Hl of intermediate image representation in
convolutional neural network on layer l, where content loss (3) is calculated, to
obtain weights αi,j . Next we apply style transfer procedure (1) with spatially
varying content loss (3).
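A sketch of the fixed patch-based importance estimation is given below; the patch size, the fill value and the exact interface of the classifier cnn are our assumptions:

import numpy as np

def patch_importance(image, cnn, patch=32, fill=0.5):
    # Occlusion-based importance: fill each square patch with a constant color
    # and measure the L2 change of the class-probability vector.
    h, w, _ = image.shape
    p_orig = cnn(image)                       # vector of class probabilities
    imp = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            occluded = image.copy()
            occluded[i*patch:(i+1)*patch, j*patch:(j+1)*patch, :] = fill
            imp[i, j] = np.linalg.norm(p_orig - cnn(occluded))
    return imp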
At Figs. 4b and c the difference is shown between the baseline approach (style
transfer with the standard content loss (2)) and the proposed model correspond-

ingly. There are a lot of small details on the dog’s muzzle that are lost in the
baseline approach and preserved by our algorithm.


Fig. 4. (a) Patch importance. (b) Baseline. (c) Patch stylisation

Average Patch-Based Mask Generation. It is found that a fixed patch grid
produces a step-like boundary, consisting of horizontal and vertical edges, which is
not flexible enough to surround a central object of arbitrary shape. For example,
in Fig. 4a the important patch covers not only the central object, but also the
background in a step-like manner.
tionally propose to use previous fixed patch grid algorithm for different positions
of the grid mesh and combine results together by pixel-wise averaging, as shown
on Fig. 5a. Resulting stylizations for the baseline and the proposed method are
shown on Figs. 5b and c. Averaging of different matrices allows to obtain smooth
distribution of weights with a smooth gradual boundary of elliptical shape.

Superpixel-Based Mask Generation. If central objects have complicated,


especially non-convex, boundaries, the proposed method becomes unsuitable. To
improve the results, instead of using a uniform patch grid, we suggest to split
the image into superpixels [6].
Superpixel extraction algorithm divides the image into small segments
(superpixels), the boundaries of which are the regions of sharp color change,
which reflect very accurately the true boundaries of the objects on the image
(Fig. 6a). The importance of each superpixel is approximated by the average
importance of the square patches, belonging to the superpixel, which in turn
can be estimated using fixed or average patch-based mask generation algorithm
described above. Superpixel algorithm has two main parameters responsible for
the number of segments and the shape of the boundaries. We run the algorithm
over a set of typical values for these parameters and then average the obtained
masks for better quality, see Fig. 6b.
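A sketch of this superpixel averaging (using the SLIC implementation from scikit-image; the parameter grid and the upsampling of the patch map are our assumptions):

import numpy as np
from skimage.segmentation import slic
from skimage.transform import resize

def superpixel_importance(image, patch_imp, n_segments_list=(100, 200), compactness_list=(10, 20)):
    # Average the patch-level importance inside each SLIC superpixel, then
    # average the resulting masks over several SLIC parameter settings.
    h, w = image.shape[:2]
    dense = resize(patch_imp, (h, w), order=1)   # upsample patch importances to the pixel grid
    masks = []
    for n_seg in n_segments_list:
        for comp in compactness_list:
            segments = slic(image, n_segments=n_seg, compactness=comp)
            mask = np.zeros((h, w))
            for s in np.unique(segments):
                region = (segments == s)
                mask[region] = dense[region].mean()
            masks.append(mask)
    return np.mean(masks, axis=0)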


Fig. 5. (a) Averaging α matrices. (b) Baseline. (c) Averaging patch stylization.


Fig. 6. (a) Superpixels. (b) Averaging α matrices

Figure 7 shows the qualitative difference between uniform stylization (a),
averaging patch-based (b) and superpixel-based (c) spatially varying stylization.
The boundaries of the central object – the glass – are non-convex, thus the
superpixel-based approach extracts the boundary of such an object better, which
improves the quality of the final stylization.

Segmentation-Based Mask Generation. Deep learning models are good at


image segmentation tasks [11]. To select the boundaries of central objects more
accurately we can split the image into obtained semantic segments. The impor-
tance of each segment can be approximated by the average importance of the
square patches, belonging to the segment, which in turn can be estimated by
the fixed or average patch-based mask generation algorithm described above.
This approach allows increasing the quality of stylization when it is easy to
separate the central object from the background using segmentation algorithms.
The illustrative example in Fig. 8 shows that the stylization algorithm with
segmentation locates the car exactly along its border, which allows building an
accurate importance map that is very consistent with the actual border of the
central object. In contrast, the superpixel-based algorithm affects some pixels
near the car, which makes the final style transfer less sharp along the border of
the central object.


Fig. 7. (a) Baseline. (b) Averaging patch stylization. (c) Averaging super-pixel styliza-
tion


Fig. 8. (a) Superpixel stylization. (b) Segmentation stylization

3 User Evaluation Study

To evaluate quantitatively the advantage of the proposed methods, compared


to the algorithm of Gatys et al. [2], we conduct user evaluation studies. In the
study a user is shown a pair of stylizations—by our method and by the baseline
method, and he is asked to select a stylization he likes more. Stylizations are
shown in random order to avoid location bias. This procedure is repeated for a
set of six users and a representative set of content and style images, forming
together twenty-nine stylization outputs. We conduct three surveys, comparing
the baseline stylization algorithm of Gatys et al. [2] with our method with average
patch-based, superpixel-based and segmentation-based importance mask gener-
ation. Results, reporting how often our method with each kind of modification
is preferred in comparison to the uniform baseline, are shown in Table 1.
It can be seen that our method in all its modifications outperforms the base-
line stylization method. Image segmentation modification gives maximum bene-
fit, which can be attributed to the fact that it extracts the boundaries of central
objects more accurately.

Table 1. Frequencies with which each of the proposed methods are preferred compared
to the baseline of Gatys et al. [2].

Frequency
Patches-based importance generation 66%
Superpixel-based importance generation 72%
Segmentation-based importance generation 80%

4 Conclusion

A new style transfer method with spatially varying strength is proposed in this
work. Stylization strength is controlled for each pixel by automatically gener-
ated importance mask. Three methods—patch-based, segmentation-based and
superpixel-based—are proposed to generate importance mask. Qualitative com-
parisons and conducted user evaluation studies demonstrate superiority of the
proposed method compared to the classical style transfer method of Gatys et al.
[2] due to strong and expressive style transfer for the background and more gentle
style transfer for the central objects of the content image, allowing to minimize
distortions of the important details. Among three proposed importance mask
generation approaches, segmentation-based method showed the highest quality
which may be attributed to more accurate boundary estimation of the central
objects of the image.

References
1. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale
hierarchical image database. In: 2009 IEEE Conference on Computer Vision and
Pattern Recognition. pp. 248–255. IEEE (2009)
2. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional
neural networks. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 2414–2423 (2016)
3. Gooch, B., Gooch, A.: Non-photorealistic rendering. AK Peters/CRC Press, Natick
(2001)
4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
5. Adobe Research: Image stylization: history and future. https://research.adobe.com/news/image-stylization-history-and-future/. Accessed 2 July 2019
6. Rosebrock, A.: Segmentation: a slic superpixel tutorial using python. https://
www.pyimagesearch.com/2014/07/28/a-slic-superpixel-tutorial-using-python/.
Accessed 2 July 2019
7. Rosin, P., Collomosse, J.: Image and video-based artistic stylisation, vol. 42.
Springer, Heidelberg (2012)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)

9. Strothotte, T., Schlechtweg, S.: Non-photorealistic Computer Graphics: Modeling,


Rendering, and Animation. Morgan Kaufmann, Burlington (2002)
10. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In:
European Conference on Computer Vision, pp. 818–833. Springer, Cham (2014)
11. Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., Torralba, A.:
Semantic understanding of scenes through the ade20k dataset. Int. J. Comput.
Vis. 127(3), 302–321 (2019)
The Construction of the Approximate Solution
of the Chemical Reactor Problem Using
the Feedforward Multilayer Neural Network

Dmitriy A. Tarkhov(&) and Alexander N. Vasilyev

Peter the Great St. Petersburg Polytechnical University,


29 Politechnicheskaya Street, 195251 Saint-Petersburg, Russia
[email protected], [email protected]

Abstract. A significant proportion of phenomena and processes in physical and


technical systems is described by boundary value problems for ordinary dif-
ferential equations. Methods of solving these problems are the subject of many
works on mathematical modeling. In most works, the end result is a solution in
the form of an array of numbers, which is not the best for further research. In the
future, we move from the table of numbers to more suitable objects, for
example, functions based on interpolation, graphs, etc. We believe that such an
artificial division of the problem into two stages is inconvenient. We and some
other researchers used the neural network approach to construct the solution
directly as a function. This approach is based on finding an approximate solution
in the form of an artificial neural network trained on the basis of minimizing
some functional which formalizing the conditions of the problem. The disad-
vantage of this traditional neural network approach is the time-consuming
procedure of neural network training. In this paper, we propose a new approach
that allows users to build a multi-layer neural network solution without the use
of time-consuming neural network training procedures based on that mentioned
above functional. The method is based on the modification of classical formulas
for the numerical solution of ordinary differential equations, which consists in
their application to the interval of variable length. We demonstrated the effi-
ciency of the method by the example of solving the problem of modeling
processes in a chemical reactor.

Keywords: Ordinary differential equations · Boundary value problems ·
Multilayer neural networks · Chemical reactor model

1 Introduction

Our version of the neural network approach to solving differential equations turned out
to be quite universal [1–8]. At the same time, it was not devoid of several drawbacks
compared to the classical methods of meshes, finite elements, etc.
First, neural network training is a very resource-intensive procedure. Secondly, the
required size of the neural network and the time of its training increase dramatically
with strengthening the requirements for the accuracy of the model. In this paper, we
consider the methods of formation of multilayer functional approximations proposed by


us in [9] without a time-consuming learning procedure. The result is an analog of deep


learning [10, 13, 14].
The essence of the approach is to apply the known recurrent formulas of numerical
integration of differential equations [11] to the interval with a variable upper limit. The
result is an approximate solution in the form of a function of this upper limit.

2 Materials and Methods

Let us consider the Cauchy problem for a system of ordinary differential equations

y′(x) = f(x, y(x)),   y(x_0) = y_0    (1)

on the interval D = [x_0, x_0 + a]. Here x ∈ D ⊂ R, y ∈ R^p, f: R^{p+1} → R^p.

For the numerical solution of the Cauchy problem (1) on the interval [x_0, x_0 + a], a
wide palette of numerical methods is developed [11]. A significant part of them consists
in dividing the given interval by points x_k into intervals of length h_k, k = 1, ..., n
and applying the recurrent formula:

y_{k+1} = y_k + F(f, h_k, x_k, y_k).    (2)

Here the operator F defines a specific method. A polyline (Euler’s polyline) or a


spline is drawn according to the obtained point approximations to get an approximate
solution in the form of a function.
We propose to apply the formula (2) n times to the interval with a variable upper
limit [x_0, x] ⊆ [x_0, x_0 + a] (herewith h_k = h_k(x), y_0(x) = y_0, y_k = y_k(x)). The result is a
function y_n(x), which can be considered as an approximate solution of Eq. (1). In the
simplest case of uniform partitioning, we obtain h_k = x/n, x_k = x_0 + xk/n.
For the explicit Euler method, we have F(f, h_k, x_k, y_k) = h_k f(x_k, y_k). The estimation
of the resulting approximations in the form of the inequality

‖y(x_k) − y_k‖ ≤ C max(h_k)    (3)

is known.
The constant C depends on the estimates of the function f and its derivatives in the
region in which the solution is found [11].
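A minimal sketch of this construction for the explicit Euler variant of (2) (function names are illustrative; for x_0 = 0 the step reduces to h = x/n as above):

def multilayer_euler(f, x0, y0, n):
    # Return an approximate solution y_n(x) of y' = f(x, y), y(x0) = y0,
    # built by applying the explicit Euler step n times on [x0, x].
    def y_n(x):
        h = (x - x0) / n             # uniform partition of the variable interval
        xk, yk = x0, y0
        for _ in range(n):
            yk = yk + h * f(xk, yk)  # explicit Euler form of (2)
            xk = xk + h
        return yk
    return y_n

# Usage example: y2 = multilayer_euler(lambda x, y: -y, 0.0, 1.0, n=2); y2(0.5)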
More accurate formulas are obtained by applying second-order methods [11], for
which the estimate (3) is replaced by the estimate ‖y(x_k) − y_k‖ ≤ C max(h_k)².
One such method is the corrected Euler method, which works according to the
formula:

F(f, h_k, x_k, y_k, y_{k+1}) = h_k [f(x_k, y_k) + (h_k/2)(f′_x(x_k, y_k) + f′_y(x_k, y_k) f(x_k, y_k))]    (4)

For the second-order equation of the form y″(x) = f(x, y), the Störmer method is
even more accurate [11]:

y_{k+1} = 2y_k − y_{k−1} + h_k² f(x_k, y_k).    (5)

Quite often in practice, there are cases when the formulation of the problem (1)
includes parameters.

y′(x) = f(x, y(x), λ),   y(x_0) = y_0(λ).    (6)

Here the vector of the mentioned parameters is denoted by λ. In this situation, the
problem (6) is usually solved numerically for a sufficiently representative set of
parameter values. Our approach automatically gives an approximate version of the
required dependence, for which y_n(x, λ) is taken.
Another common complication of the problem (1) is the boundary value problem,
which has the form

y′(x) = f(x, y(x)),   u(x_0) = u_0,   v(x_0 + a) = v_0.

Here vectors u, v are composed of coordinates of the vector y; their total dimension is
equal to the dimension of the vector y. The boundary value problem can be reduced to a
problem with a parameter

y′(x) = f(x, y(x)),   u(x_0) = u_0,   w(x_0) = λ.    (7)

The vector w contains the coordinates of the vector y that are not included in the vector u. As
before, we construct a multilayer solution y_n(x, λ) of the problem (7). This allows us to
obtain an equation from the condition on the right end of the interval, v_n(x_0 + a, λ) = v_0;
solving this equation, we find λ. This approach of ours can be considered as a functional
variant of the shooting method. Next, we consider its application to a specific
applied problem.
A promising direction for developing our approach is connected with the use of
a neural network approximation of the function f(x, y) in formula (2) rather than the
function itself. As a result, even for single-layer neural network functions f(x, y), we
obtain a solution in the form of a multilayer neural network. We have obtained such a
solution for the above-mentioned specific task.
We consider the stationary problem of thermal explosion in the plane-parallel case
[12] under the assumption that the reaction is one-stage, irreversible, not accompanied
by phase transitions, and it occurs in a stationary medium.

We have built an approximate solution of the boundary value problem:

d²y/dx² + δ exp(y) = 0,   (dy/dx)(0) = 0,   y(1) = 0.    (8)

This problem is interesting because we know the exact solution, the domain of
existence of the solution, and the parameter values at which the solution of the problem
does not exist (δ > δ* ≈ 0.878458).

3 Calculation

According to the above considerations, at the first step, we approximate the exponent
from Eq. (8) by the perceptron exp(y) ≈ 4.09 − 3.71 tanh[1.19 − 0.794y] on the
interval [0, 1] (it is known [12] that the sought solution lies in this interval).
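The quality of this one-neuron approximation is easy to check numerically; a small sketch (the grid density is arbitrary):

import numpy as np

y = np.linspace(0.0, 1.0, 1001)
approx = 4.09 - 3.71 * np.tanh(1.19 - 0.794 * y)   # perceptron approximation of exp(y)
print(np.max(np.abs(np.exp(y) - approx)))          # maximum deviation on [0, 1]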
In constructing the multilayer solution, we used our modification of the corrected
Euler method (4) as the first step and our modification of the Störmer method (5) as the
next. For two layers, we obtain an approximate solution:

y_2(x, δ) ≈ y_0 − 2.04x²δ + 0.92x²δ tanh[1.19 − 0.794y_0]
 + 0.928x²δ tanh[1.19 − 0.794(y_0 + 0.464x²δ(−1.10 + tanh[1.19 − 0.794y_0]))].

Here y_0 is the unknown initial value of the desired function at the left end of the
interval [0, 1]. To define the parameter y_0, we use the condition on the right end of the
interval, y(1) = 0, acting in one of two ways. The first method is to define the value y_0
for fixed values of the parameter δ.
The maximum difference between the exact solution and the approximate solution
y_2(x, δ) was 0.00041 at the parameter value δ = 0.1, 0.0046 at δ = 0.5, and 0.14 at
δ = 0.8 (Fig. 1).


Fig. 1 The exact solution and the approximate two-layer solution y_2(x, δ) at the parameter value (a) δ = 0.5, (b) δ = 0.8.

The results showed that for small values of δ the approximate solution is close to
the exact solution. However, as the parameter δ approaches the value δ*, the accuracy
deteriorates significantly.
For three layers, we obtain an approximate solution:

y_3(x, δ) ≈ y_0 − 2.04x²δ + 0.619x²δ tanh[1.19 − 0.795y_0]
 + 0.825x²δ tanh[1.19 − 0.795y_0 + 0.180x²δ − 0.164x²δ tanh[1.19 − 0.795y_0]]
 + 0.413x²δ tanh[1.19 − 0.795y_0 + 0.722x²δ − 0.328x²δ tanh[1.19 − 0.795y_0]
   − 0.328x²δ tanh[1.19 − 0.795(y_0 + 0.206x²δ(−1.1 + tanh[1.19 − 0.795y_0]))]].

The exact solution and the approximate three-layer solution y_3(x, δ) at δ = 0.1 and
at δ = 0.5 practically merge, so we do not give the corresponding graphs. The maximum
difference between the exact solution and the approximate solution y_3(x, δ) was
0.00037 at the parameter value δ = 0.1, 0.0016 at δ = 0.5, and 0.026 at δ = 0.8.
As the number of layers increases, the accuracy improves, but the formulas become
more cumbersome.
The maximum difference between the exact solution and the approximate four-layer
solution y_4(x, δ) was 0.00032 at δ = 0.1, 0.00044 at δ = 0.5, and 0.015 at δ = 0.8.
We present graphs of the exact solution, the approximate three-layer solution
y_3(x, δ), and the four-layer solution y_4(x, δ) at the parameter value δ = 0.8 in Fig. 2.


Fig. 2 The exact solution and the approximate solution at the parameter value δ = 0.8: (a) three-layer y_3(x, δ), (b) four-layer y_4(x, δ).

The second way to determine the parameter y_0 is to build a neural network
dependency y_0(δ). To do this, we use the condition on the right end, y_n(1, δ) = 0,
minimizing the functional

Σ_{i=1}^{m} y_n²(1, δ_i).    (9)

Further, we present a result obtained using the three-layer solution. When
optimizing the functional (9) for m = 100 and δ_i = i δ*/m, we got the dependence

y_0(δ) = 1.52 − 1.65 tanh[1.54 − 1.28δ].

In this case, we obtain an approximate solution

y*(x, δ) ≈ 1.52 − 2.04x²δ − 1.65 tanh[1.54 − 1.28δ]
 − 0.619x²δ tanh[0.0201 − 1.31 tanh[1.54 − 1.28δ]]
 − 0.825x²δ tanh[0.0202 − 0.18x²δ − 1.31 tanh[1.54 − 1.28δ]
   − 0.164x²δ tanh[0.0202 − 1.31 tanh[1.54 − 1.28δ]]]
 − 0.413x²δ tanh[0.0202 − 0.722x²δ − 1.31 tanh[1.54 − 1.28δ]
   − 0.328x²δ tanh[0.0202 − 1.31 tanh[1.54 − 1.28δ]]
   − 0.328x²δ tanh[0.0202 − 0.18x²δ − 1.31 tanh[1.54 − 1.28δ]
     − 0.164x²δ tanh[0.0202 − 1.31 tanh[1.54 − 1.28δ]]]].

The maximum difference between the exact solution and the approximate solution
y*(x, δ) was 0.0055 at δ = 0.1, 0.0069 at δ = 0.5, and 0.014 at δ = 0.8.
To illustrate the accuracy of the obtained solution, we give the following graphs
(Fig. 3).


Fig. 3 The exact solution and the approximate solution y*(x, δ) at the parameter value: (a) δ = 0.5, (b) δ = 0.8.

We compared the results with the classical method of obtaining approximate
solutions of differential equations, namely, with the expansion in powers of the
parameter δ. We present the results for the expansion up to the third degree:

u_3(x, δ) = δ(1 − x²)/2 + δ²(5/24 − x²/4 + x⁴/24)
 + δ³(127/720 − 11x²/48 + x⁴/16 − 7x⁶/720).

The maximum difference between the exact and approximate solution u_3(x, δ) was
0.000035 at the parameter value δ = 0.1, 0.0048 at δ = 0.5, and 0.12 at δ = 0.8.
To illustrate the accuracy of the obtained solution, we give the following graphs
(Fig. 4).


Fig. 4 The exact solution and approximate solution u_3(x, δ) at the parameter value: (a) δ = 0.5 and (b) δ = 0.8.

As we expected, our method gives a more uniform approximation over the entire
interval of the parameter δ change.

4 Conclusion

We have studied new methods for constructing approximate neural network solutions
of differential equations. The methods do not require the use of resource-intensive
training procedures and allow building solutions with guaranteed accuracy. As a test
problem, we considered the solution of the boundary value problem (8), which sim-
ulates the processes in a chemical reactor [12]. As a result, we obtained the above
explicit solutions, which are more accurate than approximate solutions [3], in which a
network with 100 neurons was used.

Acknowledgment. This paper is based on research carried out with the financial support of the
grant of the Russian Scientific Foundation (project №18-19-00474).

References
1. Tarkhov, D., Vasilyev, A.: New neural network technique to the numerical solution of
mathematical physics problems. I Simple Probl. Opt. Mem. Neural Netw. (Inf. Opt.) 14, 59–
72 (2005)
2. Tarkhov, D., Vasilyev, A.: New neural network technique to the numerical solution of
mathematical physics problems. II Complicated Nonstand. Probl. Opt. Mem. Neural Netw.
(Inf. Opt.) 14, 97–122 (2005)

3. Shemyakina, T.A., Tarkhov, D.A., Vasilyev, A.N.: neural network technique for processes
modeling in porous catalyst and chemical reactor. In: Cheng, L. et al. (eds.) Advances in
Neural Networks – ISNN 2016. Lecture Notes in Computer Science, vol. 9719, pp. 547–554.
Springer, Cham (2016)
4. Budkina, E.M., Kuznetsov, E.B., Lazovskaya, T.V., Leonov, S.S., Tarkhov, D.A., Vasilyev,
A.N.: Neural network technique in boundary value problems for ordinary differential
equations. In: Cheng, L. et al. (eds.) Advances in Neural Networks – ISNN 2016. Lecture
Notes in Computer Science, vol. 9719, pp. 277–283. Springer, Cham (2016)
5. Lozhkina, O., Lozhkin, V., Nevmerzhitsky, N., Tarkhov, D., Vasilyev, A.: Motor transport
related harmful PM2.5 and PM10: from on-road measurements to the modeling of air
pollution by neural network approach on street and urban level. In: Journal of Physics
Conference Series, vol. 772 (2016). http://iopscience.iop.org/article/10.1088/1742-6596/772/1/012031
6. Kaverzneva, T., Lazovskaya, T., Tarkhov, D., Vasilyev, A.: Neural network modeling of air
pollution in tunnels according to indirect measurements. In: Journal of Physics Conference
Series, vol. 772 (2016). http://iopscience.iop.org/article/10.1088/1742-6596/772/1/012035
7. Lazovskaya, T.V., Tarkhov, D.A., Vasilyev, A.N.: Parametric Neural Network Modeling in
Engineering. Recent Pat. Eng. 11(1), 10–15 (2017)
8. Antonov, V., Tarkhov, D., Vasilyev, A.: Unified approach to constructing the neural
network models of real objects. Part 1 Math. Models Meth. Appl. Sci. 41(18), 9244–9251
(2018)
9. Lazovskaya, T., Tarkhov, D.: Multilayer neural network models, based on grid methods. In:
IOP Conference Series: Materials Science and Engineering, vol. 158 (2016). http://
iopscience.iop.org/article/10.1088/1757-899X/158/1/01206
10. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117
(2015)
11. Hairer, E., Norsett, S. P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff
Problem, xiv, p. 480. Springer, Berlin (1987)
12. Hlavacek, V., Marek, M., Kubicek, M.: Modelling of chemical reactors Part X. Chem. Eng.
Sci. 23 (1968)
13. Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Sig. Process. 7
(3–4), 1–199 (2014)
14. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127
(2009)
Linear Prediction Algorithms for Lossless
Audio Data Compression

L. S. Telyatnikov(&) and I. M. Karandashev

Scientific Research Institute for System Analysis of Russian Academy


of Sciences, Moscow, Russia
[email protected], [email protected]

Abstract. The paper considers the use of such linear prediction algorithms
as LPC, FLPC, and Wise-LPC in lossless audio data compression. In
addition to the prediction methods, the problems of best coding and optimal
sampling window selection are investigated. The Wise-LPC algorithm is shown
to allow a 1–5% improvement of audio signal compression against conventional
LPC and FLPC approaches. The prediction error has a Laplace distribution, its
variance decreasing smoothly and reaching “saturation” with the growing
window width.

Keywords: LPC · FLPC · Codec · Sampling · Compression

1 Introduction

Neural net algorithms provide new tools for different fields of science and technology.
They have recently helped to make a breakthrough in pattern and speech recognition,
text translation, and intellectual multi-move games such as Go, chess, etc. On the other
hand, the data compression, storage and transmission still use the algorithms developed
in the 80s and 90s or in early 2000s at best. These are such well-known lossless data
compression algorithms and data formats as zip, png, flac, exe, and many other lossy
compression techniques, e.g. mp3, jpeg, mpeg.
Here we would like to elaborate on the FLAC data format [1–3] once again. Today
this data format is most popular in lossless audio data compression. Article [2], which
gives the basics of the algorithm, was taken as a starting point for further consideration.
The FLAC format is the combination of linear predictive coding (LPC) [4] and
Huffman-Golomb prediction error coding [5, 6]. Below we discuss the features of
prediction and compression algorithms and present the experimental results.

2 Setting the Problem

2.1 Linear Predictive Coding (LPC)


We consider the amplitude x_t of an audio signal taken at an instant of time t. In the LPC
method, the value x̃_t, which is a linear combination of p readings at preceding instants:


x̃_t = Σ_{i=1}^{p} a_i x_{t−i} = a_1 x_{t−1} + a_2 x_{t−2} + ... + a_p x_{t−p}    (1)

is formed to estimate amplitude xt . Instead of storing signal amplitudes it is sufficient to


keep the coefficients of the linear model and corresponding prediction errors:

e_t = x_t − x̃_t    (2)

The nearer to zero the value of the error (2) is, the fewer data bits are needed for
storage. For this reason the unknown coefficients {a_i}_{i=1}^p are determined by minimizing
the mean square deviation of the estimate from the actual amplitude:

E = Σ_{t=0}^{w} ( x_t − Σ_{i=1}^{p} a_i x_{t−i} )²    (3)

where x_t are signal amplitudes at moments t ∈ [0, w], and w is the sample length. Though
the sample length w is not defined strictly and is often a mere standard requirement, the
usual number of readings in a sample w is much larger than the order p of the linear
model (w ≫ p). For example, the standard LPC10 used in speech compression has the
prediction order p = 10 and the number of readings w = 120.
It can be shown [4] that the minimization of (3) reduces to the set of p linear
equations with a Toeplitz matrix consisting only of the autocorrelation coefficients:

R_l = Σ_{t=0}^{w} x_t x_{t−l}    (4)

The Levinson-Durbin algorithm [4] with computational complexity O(p²) can be
used to solve the set. Linear with respect to the sample length and quadratic with
respect to the model complexity, the computational complexity of the whole LPC
algorithm, O(wp²), is mostly determined by the computation of the autocorrelation
coefficients R_l. The greater the sample length is, the more accurately the autocorrelation
coefficients R_l can be computed and the better the results of compression are.
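A compact sketch of this LPC fitting (NumPy/SciPy; the window handling and the absence of any windowing or normalization are simplifying assumptions):

import numpy as np
from scipy.linalg import toeplitz

def lpc_coefficients(x, p):
    # Estimate a_1..a_p of (1) for one sample window x by solving the
    # Toeplitz normal equations built from the autocorrelations (4).
    R = np.array([np.dot(x[l:], x[:len(x) - l]) for l in range(p + 1)])
    return np.linalg.solve(toeplitz(R[:p]), R[1:p + 1])

def prediction_errors(x, a):
    # Prediction errors (2) for t >= p.
    p = len(a)
    pred = np.array([np.dot(a, x[t - p:t][::-1]) for t in range(p, len(x))])
    return x[p:] - pred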

2.2 Fixed Linear Predictive Coding (FLPC)


The FLPC algorithm is another method for determining the coefficients {a_i}_{i=1}^p. Here the
coefficients are not calculated; they are constants derived from the expansion of the
signal in its derivatives. The first three linear estimates of the signal are:

x̃_1(t) = x_{t−1},
x̃_2(t) = 2x_{t−1} − x_{t−2},    (5)
x̃_3(t) = 3x_{t−1} − 3x_{t−2} + x_{t−3}.

The FLPC algorithm has the advantage over the LPC method that it does not require
computing the autocorrelation coefficients (4). Since all coefficients {a_i}_{i=1}^p are fixed in the
FLPC algorithm, there is no need to code and store anything but the errors.

2.3 Wise-LPC
It can be easily shown that FLPC of the p-th order gives the p-th derivative of the input
signal. We suggest a new algorithm Wise-LPC which is a combination of FLPC and
LPC algorithms. The idea is to determine how many derivatives of the signal (the order
of FLPC) should be taken before the use of the LPC method. The Wise-LPC algorithm
includes three steps:
1. Consecutive differentiation of the signal and computation of the error.
2. If the variance of the error for the n-th derivative is smaller than that for the (n + 1)-th
derivative, the process is stopped and the n-th derivative is chosen.
3. The application of the p-th-order LPC to the n-th derivative.
The time complexity remains linear when the Wise-LPC method is used.
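A sketch of the Wise-LPC procedure (reusing lpc_coefficients and prediction_errors from the LPC sketch above; the stopping rule follows step 2):

import numpy as np

def wise_lpc(x, p):
    # Differentiate the signal while the variance of the finite-difference
    # error keeps decreasing, then apply p-th order LPC to the result.
    n, signal = 0, np.asarray(x, dtype=float)
    while True:
        candidate = np.diff(signal)             # next finite difference
        if candidate.var() >= signal.var():     # variance stopped falling: keep n
            break
        signal, n = candidate, n + 1
    a = lpc_coefficients(signal, p)             # defined in the LPC sketch above
    return n, a, prediction_errors(signal, a)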

2.4 Coding of the Remainder


Simple Huffman code [5, 6] is used to encode errors. It includes the following stages:
1. The sign of the number is determined: if it is positive, the code starts from 0,
otherwise, from 1.
2. The variance of the error is used to choose parameter N (see formula (6)). N least
significant bits of the number are written in the code.
3. The remaining bits define how many zeros are to be written in the code.
4. 1 is written at the end of the code.
When the number of bits in the binary representation is less than N, zeros are added
to the left end to make up the N-bit binary. The decoding involves the same operations
as encoding, but in the reverse order. This kind of simple Huffman code does not require
the frequency table that is used in the usual Huffman code.
It is necessary to determine the appropriate value of N to make the simple Huffman
code work effectively. As we show below, the error has the Laplace distribution. This
assumption (as it is shown in [5]) gives us the formula for optimal value of N:

N = ⌈log₂(σ ln 2) − 0.5⌉    (6)

where σ is the error variance.
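A sketch of this coding scheme under our reading of steps 1–4 (the high-order part of the magnitude is written as a run of zeros terminated by a one):

import math

def rice_parameter(sigma):
    # Formula (6); sigma is the error spread estimate of the current window.
    return max(0, math.ceil(math.log2(sigma * math.log(2)) - 0.5))

def encode_error(e, N):
    bits = '1' if e < 0 else '0'                 # step 1: sign bit
    mag = abs(e)
    if N > 0:                                    # step 2: N least significant bits
        bits += format(mag & ((1 << N) - 1), '0{}b'.format(N))
    bits += '0' * (mag >> N)                     # step 3: remaining value as zeros
    return bits + '1'                            # step 4: terminating one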

3 Results

3.1 Window Width w


The first conclusions have to do with the window width (sample width) w. The division
of the signal into samples is most important in realization of different codecs. Such

division is always made before processing and compression of the signal. The smaller
the sample length is, the simpler the transmission of this portion of the signal is and the
less risk that it gets distorted or lost in transmission. On the other hand, it was men-
tioned in paragraph 2.1 that the realization of the LPC algorithm requires that the
sample length shouldn’t be too small because it affects the precision of determination of
autocorrelation coefficients. As of now, there is no mathematically proven recommendation
about which window width is best for which kind of signal.
Figure 1 illustrates the spread of the error for p = 3 when the window width varies.
Figure 2a shows the relation between the variance of the error, the degree of
approximation, and the window width w for the same audio signal. It is seen that with p = 3
the widening of the window beyond w = 4096 makes no sense because it doesn’t lead
to notable improvement, i.e. “saturation” sets in.

Fig. 1. The spread of the error with varying window width w and p = 3.

3.2 Comparing LPC, FLPC and Wise-LPC Algorithms in Signal


Compression
The compression results of LPC, FLPC and Wise-LPC are shown in Fig. 2b and
Table 1. The compression efficiency is evaluated as C = I_p/I, where I is the size of the
original file (wav file) in bytes and I_p is the size of the compressed file. We discovered that

Fig. 2. (a) The relation between the variance of the error and the degree of approximation p and
window width w; (b) Comparison of audio signal compression using LPC, FLPC and Wise-LPC
methods. The optimal degree of differentiation is n = 2, the order of LPC changes from 0 to 9.

the best differentiation degree in Wise-LPC is dependent on the signal spectrum. The
higher the upper frequency of the signal is, the less the differentiation degree. The
upper frequency Fupper is determined by the threshold of 20 dB.
It is seen from Table 1 that if we deal with high-frequency signals (Fupper ¼ 12. . .20
kHz), the differentiation degree is n ¼ 1 or n ¼ 2. In the case of low-frequency signals
(Fupper ¼ 0. . .10 kHz), n ¼ 3 or n ¼ 4. The compression results for low-frequency
signals are significantly better. In particular, the Wise-LPC algorithm should work well
in compression of speech because the human speech frequency spectrum extends from
0:3 to 3:4 kHz.

Table 1. The compression ratio for fifteen different audio files


№ | LPC compression of order p = 9 | FLPC compression of order n = n* | Best degree of differentiation n* | Wise-LPC compression with n = n* and p = 9 | Fupper (kHz)
1 0.502 0.407 4 0.370 3
2 0.421 0.409 3 0.351 3
3 0.509 0.424 4 0.387 5
4 0.544 0.449 3 0.395 5
5 0.606 0.611 3 0.597 8
6 0.602 0.68 3 0.565 10
7 0.542 0.56 2 0.519 12
8 0.729 0.796 2 0.725 13
9 0.634 0.656 2 0.607 13
10 0.684 0.779 1 0.669 15
11 0.742 0.810 2 0.725 15
12 0.693 0.811 1 0.679 16
13 0.747 0.786 1 0.739 18
14 0.719 0.726 1 0.718 20
15 0.749 0.782 1 0.748 20

4 Conclusions

The research allows the following conclusions. The variance of the error smoothly falls
and the width of the Laplace distribution approaches “saturation” when the window
width grows.
The Wise-LPC algorithm permits better compression retaining the linear time
complexity. On the average, the Wise-LPC algorithm improves the compression by 1–
5% for broadband high-frequency signals and 5-10% for low-frequency signals. It
allows the conclusion that the algorithm should work well in speech encoding.
The FLAC format combines linear prediction with Golomb–Rice coding of the prediction
errors. Note that the division of the compression procedure into two
unrelated stages is a popular trick in compression algorithms: first the extrapolation
algorithm is built, and then a second algorithm is constructed that takes the remnants
(prediction errors) and stores them in a compact form. The approach is also popular
in modern neural-net-based compression techniques, where neural nets are usually used
only in the first stage (data prediction) [7]. We expect to witness soon the advent of
end-to-end systems in which neural nets are engaged in both stages concurrently [8];
systems of this kind are our next goal.
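
A toy sketch of this two-stage scheme is given below: a trivial first-difference predictor produces residuals, and a Rice/Golomb coder [5, 6] stores them as a bitstring. This is only our illustration of the principle, not the FLAC or Wise-LPC implementation; the fixed Rice parameter k is an assumption made for simplicity.

```python
def rice_encode(residuals, k):
    """Toy Rice/Golomb coder: unary quotient + k-bit binary remainder per residual."""
    bits = []
    for r in residuals:
        u = 2 * r if r >= 0 else -2 * r - 1            # zigzag map: signed -> non-negative
        q, rem = divmod(u, 1 << k)
        bits.append("1" * q + "0" + format(rem, "0{}b".format(k)))
    return "".join(bits)

# Stage 1: a predictor (here just a first difference) turns samples into small residuals.
samples = [10, 12, 13, 13, 11, 9]
residuals = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
# Stage 2: the residuals, concentrated around zero, are stored compactly.
bitstream = rice_encode(residuals, k=2)
```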

Acknowledgements. The research was supported by the State Program SRISA RAS No. 0065-
2019-0003 (AAA-A19-119011590090-2).

References
1. FLAC format. https://xiph.org/flac/format.html
2. Robinson, T.: SHORTEN: Simple lossless and near-lossless waveform compression.
Technical Report 156, Cambridge University Engineering Department, Trumpington Street,
Cambridge, CB2 1PZ UK, December 1994
3. Hans, M., Schafer, R.W.: Lossless compression of digital audio. IEEE Sign. Process. Mag. 18
(4), 21–32 (2001). https://doi.org/10.1109/79.939834
4. Collomb, C.: Linear prediction and Levinson-Durbin algorithm (2009). https://www.
academia.edu/8479430/Linear_Prediction_and_Levinson-Durbin_Algorithm_Contents
5. Golomb, S.W.: Run-length Encodings. IEEE Trans. Inf. Theory 12, 399–401 (1966)
6. Rice, R.F.: Some Practical Universal Noiseless Coding Techniques. Technical Report 79/22,
Jet Propulsion Laboratory (1979)
7. Kleijn, W.B., Lim, F.S.C., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q., Walters, T.C.:
Wavenet based low rate speech coding. In: 2018 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) (2018). https://arxiv.org/abs/1712.01120
8. Kankanahalli, S.: End-to-end optimized speech coding with deep neural networks. In: 2018
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
pp. 2521–2525 (2018). https://doi.org/10.1109/icassp.2018.8461487. https://arxiv.org/abs/
1710.09064
Neural Network Theory, Concepts and
Architectures
Approach to Forecasting Behaviour
of Dynamic System Beyond Borders
of Education

A. A. Brynza(&) and M. O. Korlyakova

Bauman Moscow State Technical University - Kaluga Branch,


Kaluga 248000, Russia
[email protected]

Abstract. The problem of forecasting the behavior of a complex dynamic system is
considered. Approaches are analyzed that, under limited information and parametric
uncertainty about the character of the behavior, allow predicting with high accuracy the
behavior of systems in situations where the values of the control parameters go beyond
the limits of the training set. The forecasting results are evaluated, the corresponding
graphs are presented, and conclusions are drawn.

Keywords: Training of neural networks · Forecasting · Recurrent neural networks · Decision trees · LSTM networks

1 Introduction

In the diagnostics of technical systems, forecasting of the condition of an object is
carried out on the basis of telemetric data obtained during operation. The obtained
information is analyzed, and the changes that arise over time under the influence of
external factors and irreversible wear of different components of the system are
identified. Forecasting the development of defects and timely assessment of the technical
condition over an upcoming period allow increasing the control efficiency of systems in
general [1, 2]. Thus, it is necessary to provide mechanisms for predicting the development
of system conditions under various operating modes and taking into account the specific
features of each system.

2 Methods of Prediction of a System Behavior

Nowadays one of the effective forecasting methods for sophisticated systems is the
creation of their "digital twins". Such models are effectively used in different applied
processes and in systems with different functions. An example is modeling the CO2
emissions of a system with many time-varying energy sources [3], where a partially
equilibrium model of a balanced power system can be used as the digital twin. The main
task to be solved in the analysis of the information consists in determining the dynamics
of changes in the functioning of the formed information model,


which allows describing the behavior of the objects that make up a complex system in the
present and in the future [4–6]. Thus, it is necessary to create models of systems that
allow predicting the behavior of complex technical objects in stable and changing
environments, at rated loads and beyond them.
For complex technical objects various approaches to building mathematical models with
different degrees of detail can be used [7]:
– creation of nominal functional descriptions of a system (static or dynamic), which
demands an understanding of the processes taking place in the system;
– creation of simulation models on the basis of the known properties and functions of
the system (the nature of the connections between input and output parameters);
– creation of models on the basis of training and analysis of experimental data
without a known functional connection, which requires a huge number of examples
of system operating states.
The purpose of any variant of modeling is a sufficiently exact description of the
processes taking place in the modeled object for the prediction of consequences. However,
it should be noted that the nominal modes are usually well studied, whereas the emergency
modes have no full description. As a result, the formed model of the object has to provide
forecasting of behavior not only within nominal situations but also beyond their boundaries.
Let us consider possible ways of solving the task of modeling systems based on
training from examples. Among them one can single out neural network models [3], which
allow, from multiple examples, constructing not only the connections between input and
dependent parameters but also, to a certain degree, estimating the structure of these
connections. Let us review several examples of models of dynamic systems used to predict
their behavior beyond the borders of the training range.

2.1 Example 1. Vibration Gyroscope


We consider a model of a vibration gyroscope with a control system [8] based on the
principle of adaptive control in real time, where the quasistationary angular speed of the
oscillatory gyroscope is considered an unknown parameter that must be estimated. The
inputs, i.e. the control impacts (forces) on both axes of the gyroscope, are calculated so
that the dynamics of the gyroscope reaches the quality set by the reference model for the
internal coordinates (x, y, x', y', x'', y'') and the control impacts (u, v).
The model is limited to the range of stabilization of angular speeds Ωz from 3 to
7 rad/s (see Fig. 1(a)). Angular speeds outside the specified range lead to the appearance
of an oscillatory process (see Fig. 1(b)) that is not stabilized by the control system.

Fig. 1. (a) Stabilization of the angular velocity, (b) There is no stabilization.

Let us consider as the object of modeling the process of occurrence of such phenomena,
i.e. we will actually predict, from the behavior of the model, whether the process of
stabilization of the angular speed is successful. Thus we solve a classification problem
of the following type:
– at the distance of the prediction window (nw) from the current time moment (t),
predict whether the angular velocity will be stabilized (i.e., the behavior of the system
is analyzed at moments from t − n to t + nw);
– the input dataset is the values of the gyroscope state vector at time t and n previous
states X = ⟨x(t−n), y(t−n), x'(t−n), y'(t−n), x''(t−n), y''(t−n), ux(t−n), uy(t−n), …,
x(t), y(t), x'(t), y'(t), x''(t), y''(t), ux(t), uy(t)⟩;
– the dependent variable T ∈ {−1, 1}. We set T = −1 (class 'off' in Fig. 2) for the lack
of stabilization of the angular velocity and T = 1 (class 'on' in Fig. 2) for stabilization
areas; in fact, the difference between the reference model and the result of the adaptation
and control loop operation on the interval [t, t + nw] is estimated.
Determination of the signal type refers to temporal signal classification (TSC)
problems, for which most various approaches have been offered on the basis of classical
feedforward and recurrent networks [4], convolutional networks [5] and LSTM networks [6].
We train the classifiers for the area Ωz = [3, 7] rad/s.
As a result of training several types of classifiers, the best results are shown by the
model based on the LSTM network [9]. The quality of decisions in the area of nominal values
of the angular speed on the test selection amounted to 94% of correctly assessed situations.
Application of this model to the test is shown in Fig. 1a. Besides, modeling at Ωz ∈ [7, 10]
rad/s and Ωz ∈ [0, 3] rad/s shows high quality of prediction (70–75%) for examples beyond
the borders of the Ωz range used for training. The generated classifier allows specifying
the area of stabilization nw = 0.004 s before the stable entrance of the model into this zone.

Table 1. Assessment of the quality of training of networks


Network type | Window size n, s | Training time, min | Time of one prediction, s | Type I error in the range [0, 3] and [7, 10] rad/s, %
2-layer perceptron, 10 neurons | Δt = 0.004 | 10 | 0.013 | 49.60
2-layer perceptron, 100 neurons | Δt = 0.004 | 30 | 0.014 | 73.5
LSTM, 10 neurons | Δt = 0.004 | 0.6 | 0.041 | 29.0
LSTM, 100 neurons | Δt = 0.004 | 6 | 0.043 | 25.5

Results of modeling are given in Fig. 2(a) for the area inside the training range and in
Fig. 2(b) beyond the borders of training. Practically all timepoints in Fig. 2(b)
correspond to the lack of stabilization of the angular speed, T = −1 (class 'off'); the
area of type I errors is highlighted with color. The assessment of the quality of training
of several types of networks is given in Table 1 (sample size: 4000 examples).
The efficiency of the LSTM network can be explained by the formation of a model of
temporal behavior, whereas the perceptron-class network produced only a description of the
known part of the data. Thus, when training an LSTM, a dynamic system model appears that is
more suitable for the formation of the digital twin.
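
A minimal tf.keras sketch of such an LSTM classifier is given below. The window construction, the exact labeling over [t, t + nw] and the training schedule are simplified assumptions on our part; the helper names are ours, and the paper's quantitative results are those of Table 1.

```python
import numpy as np
from tensorflow.keras import layers, models

def make_windows(states, labels, n):
    """Cut the simulated trajectory into overlapping windows of n time steps.

    states: (T, 8) array with (x, y, x', y', x'', y'', ux, uy) at every step of dt = 0.004 s;
    labels: (T,) array of +1/-1 flags saying whether stabilization holds on [t, t + nw].
    """
    X = np.stack([states[t - n:t] for t in range(n, len(states))])
    y = (labels[n:] > 0).astype("float32")              # map {-1, +1} -> {0, 1}
    return X, y

model = models.Sequential([
    layers.Input(shape=(None, 8)),                      # n steps of the 8-component state
    layers.LSTM(10),                                    # 10 units, as in the LSTM row of Table 1
    layers.Dense(1, activation="sigmoid"),              # probability of class 'on'
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=...)
```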

Fig. 2. Classifier solution 'on' (T = 1) / 'off' (T = −1): (a) for the nominal area of the model,
(b) beyond the borders of the nominal area.

2.2 Example 2. Synchronous Motor Base


Let us consider the second object: prediction of the development of a process using the
model of a brushless DC (valve) motor [10], which belongs to the class of time series
prediction problems. The valve motor is a brushless synchronous motor with three stator
windings and a rotor with permanent magnets creating the magnetic field. A mathematical
model of the motor with the state variables (x̃1, x̃2, x̃3)^T = (i_d, i_q, ω)^T and
(ũ1, ũ2)^T = (u_d, u_q)^T allows [10] obtaining a system of equations in dimensionless
variables:

\dot{x}_1 = -x_1 + x_2 x_3 + u_1,
\dot{x}_2 = -x_2 - x_1 x_3 + \gamma x_3 + u_2,          (1)
\dot{x}_3 = \sigma (x_2 - x_3) - M_H \cdot \mathrm{sign}(x_3).

Parameters u1 ; u2 ; MH are control parameters. The development of the process in


time relative to the coordinates x1 ; x2 ; x3 takes the form shown in Fig. 3.

Fig. 3. Changing the state of a system.

The area of stability of the attractor is reached in the interval of the control parameter
u1 ∈ [−9.3; −0.1]. For forecasting we use the sequence of the system state over the previous
n modeling steps. The predicted value is taken nw modeling steps ahead. Therefore, the
examples for training contain the following elements:
- the input dataset is the values of the state vector at the timepoint t and n previous
states X = ⟨x1(t−n), x2(t−n), x3(t−n), …, x1(t), x2(t), x3(t)⟩;
- the dependent variable T = ⟨x1(t+nw), x2(t+nw), x3(t+nw)⟩.
We have considered several different architectures for solving the forecasting task: an
ensemble of decision trees and a multilayer nonlinear perceptron. Besides, for the
regression of the target coordinates (x1, x2, x3) different methods of forming the training
pairs were applied. For models within the training range all created solvers show good
quality of description of the system trajectory. Let us take the parameters u1, u2, MH beyond the
training borders and consider the result of generating the trajectory of the system state
vector, presented in Fig. 4.
A two-layer network with sigmoidal neurons in the hidden layer and linear output
neurons is used; the learning algorithm is Levenberg–Marquardt. The training sample is fed
in the form of a vector of size 3n × 1, where the intervals ⟨(1, n), (n+1, 2n), (2n+1, 3n)⟩
are put in correspondence with the target values of the coordinates x1, x2, x3 at time
t + 1. The network is thus built with 3 outputs (the coordinate values), i.e. the vector
modeling mode is realized. When the control parameter reaches the instability zone, it
follows from the modeling results that the network is capable of building a high-quality
forecast of the attractor behavior only in the neighborhood of the training value of the
control parameter. When going beyond the limits of stability, the network is not capable of
building a reliable forecast of the behavior.

Fig. 4. The initial trajectory of the system at u1 = −6 and the result of forecasting using
(a) decision trees, (b) the vector output of the neural network, (c) the serial output of the
neural network.

The most successful model for forecasting the trajectory was obtained after a radical
revision of the way the training pairs are described. The training selection is formed as a
mix of pairs (⟨x1(t−n), …, x1(t)⟩, x1(t+1)), (⟨x2(t−n), …, x2(t)⟩, x2(t+1)),
(⟨x3(t−n), …, x3(t)⟩, x3(t+1)), where the examples are located consecutively one after
another and are put in correspondence with the target values of the corresponding
coordinate at timepoint t + 1. The network is built with a single output (a coordinate
value), but in fact all coordinates are passed through the sample sequentially, each
timepoint giving three examples. As can be seen from Fig. 5, such a network is able to take
into account the patterns of behavior even without having in the training sample any data
on the trajectory of motion at the moment of loss of stability, and at the same time to
build a forecast of very high quality.
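
A sketch of this "sequential" way of forming the training pairs and of a single-output regressor is given below (tf.keras). The window length n, the hidden-layer size and the optimizer are our assumptions; the original work used Levenberg–Marquardt training.

```python
import numpy as np
from tensorflow.keras import layers, models

def sequential_pairs(traj, n):
    """Build the 'mixed' training set: one (window, next value) pair per coordinate.

    traj: (T, 3) array of the state (x1, x2, x3) at each modeling step.
    Each timepoint t contributes three examples, one for every coordinate.
    """
    X, y = [], []
    for t in range(n, traj.shape[0] - 1):
        for i in range(3):                              # x1, x2, x3 follow one another
            X.append(traj[t - n:t + 1, i])              # <x_i(t-n), ..., x_i(t)>
            y.append(traj[t + 1, i])                    # target x_i(t+1)
    return np.array(X), np.array(y)

n = 10                                                  # window length (our assumption)
model = models.Sequential([
    layers.Input(shape=(n + 1,)),
    layers.Dense(20, activation="sigmoid"),             # hidden-layer size is an assumption
    layers.Dense(1, activation="linear"),               # a single output shared by x1, x2, x3
])
model.compile(optimizer="adam", loss="mse")
```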
To assess the quality of the generated models, we consider the errors (mean square
deviation from the target trajectory) over the analysis interval of the control parameter
u1 ∈ [−13; −1] in Fig. 6, where experiment 1 is the ensemble of trees, experiment 2 is the
vector neural network, and experiment 3 is the sequential neural network.

Fig. 5. (a) The initial trajectory of the system at u1 = −11 (unstable behavior); (b) the
result of prediction using a feedforward network.

Fig. 6. Error graphs of the considered experiments over the range of values of the control
parameter u1 ∈ [−13; −1].

Based on the graph (see Fig. 6), the best forecasting quality is shown by the network from
experiment 3. The network from experiment 2 achieves good prediction quality only in the
vicinity of the control parameter value at which the training was performed. The ensemble
of trees is able to predict the pattern of behavior, however not as smoothly as the results
obtained using the other approaches.

3 Conclusions

The formation of digital twins of real objects allows solving the problem of predicting
the behavior of complex objects, but it is necessary to take into account that the
resulting dataset cannot reflect all features of the operation of a real system, only a
number of key patterns.
As the preliminary computer-modeling experiments showed, prediction of the behavior of a
complex technical system with good quality on the basis of preliminary training is possible
if the modeling not only maps input-output reactions but also forms a model of the dynamic
system. This fact was noted both for classification and for time series forecasting.
Separately it is worth mentioning the possibility of additional training of the digital
twin in the course of operation, which will eventually allow adjusting the forecasts of
working capacity.

References
1. Bazhenov, Yu., Kaleno, V.P.: Prediction of residual life of electronic engine control systems.
Gazette SibADI 2(56) (2017)
2. Tonoyan, S., Baldin, A., Eliseev, D.: Forecasting of the technical condition of electronic
systems with adaptive parametric models. Gazette BMSTU, Series "Instrumentation" 6(111)
(2016)
3. Ripp, C., Steinke, F.: Modeling time-dependent CO2 intensities in multi-modal energy
systems with storage. https://arxiv.org/pdf/1806.04003.pdf
4. Katsuba, Yu., Grigorieva, L.: Application of artificial neural networks to predict the
technical condition of products. Int. Res. J. 3(45), 19–21 (2016)
5. Fawaz, H.I., Forestier, G., Weber, J., doumghar, L.I., Muller, P.-A.: Transfer learning for
time series classification. In: 2018 IEEE International Conference on Big Data (2018)
6. RuBwurm, M., Körner, M.: Temporal vegetation modelling using long short-term memory
networks for crop identification from medium-resolution multi-spectral satellite images. In:
2017 IEEE Conference On Computer Vision And Pattern Recognition Workshops
(CVPRW) (2017)
7. Chucheva, I.: models and methods of prediction. In: Mathematical Bureau. Forecasting on
OREM (2011)
8. Myshlyaev, Y., Finoshin, A., Myo, T.Y.: Sliding Mode with Tuning Surface Control for
MEMS Vibratory Gyroscope. 6th International Congress on Ultra Modern Telecommuni-
cations and Control Systems and Workshops (2014)
9. Tai, K.S., et al.: Improved semantic representations from tree-structured long short-term
memory network. arXiv:1503.00075 [cs.CL] (2015)
10. Chu, J., Hu, W.: Control chaos for pernament magnet synchronous motor base on adaptive
backstepping of error compensation. Int. J. Control Autom. 9(3), 163–174 (2016)
Towards Automatic Manipulation of Arbitrary
Structures in Connectivist Paradigm
with Tensor Product Variable Binding

Alexander V. Demidovskij(&)

Higher School of Economics, ul. Bolshaya Pecherskaya 25/15,


Nizhny Novgorod, Russia
[email protected]

Abstract. Building a bridge between symbolic and connectionist level of


computations requires constructing a full pipeline that accepts symbolic struc-
tures as an input, translates them to distributed representation, performs
manipulations with this representation equivalent to symbolic manipulations and
translates it back to the symbolic structure. This work proposes a neural
architecture that is capable of joining two structures, which is an essential part of
the structure manipulation step in the connectionist pipeline. Verification of the
architecture demonstrates the scalability of the solution; a set of recommendations for
engineering practitioners is elaborated.

Keywords: Connectionism · Tensor computations · Neural networks · Unsupervised learning

1 Introduction

For a long period, Artificial Intelligence (AI) community investigates two important
paradigms about computations: symbolic and sub-symbolic or connectionist approa-
ches. Although, those two ideas can be considered drastically different, it is likely for
them to become partners rather than competitors. Symbolic level is defined by methods
that manipulate symbols and explicit representations. Connectionist approach [1, 2] is
built around the idea of massive parallelism and mostly characterized by artificial
neural networks. The potential symbiosis of two paradigms can bring robust and
flexible solutions that produce understandable results that are easy to validate.
Symbolic structures can be encoded in a distributed representation by many
means: First-Order Logic (FOL) [3, 4], Holographic Reduced Representations
(HRRs), Binary Spatter Codes and so on [5]. One of the key contributions to the field
is presented in the Tensor Product Variable Binding approach proposed by
Smolensky [6] and further applied in Vector Symbolic Architectures (VSA) [7]. Dis-
tributed representations taken by this method are used in multiple domains, especially
in Natural Language Processing (NLP) [8], where a sentence plays a role of structure.
In order to describe the task and the proposed solution it is essential to give several key
definitions of the Tensor Product Variable Binding (TPVB).


Definition 1. Filler – a particular instance of the given structural type.


Definition 2. Role – a function that a filler fulfills in a structure.
Definition 3. Tensor multiplication is an operation over two tensors a of rank x and b
of rank y that produces a tensor z of rank x + y consisting of the pairwise products of all
elements of a and b.
Definition 4. Tensor product of a structure. A structure is perceived as a set of pairs of
fillers {f_i} and roles {r_i}, and its tensor product is found as (1):

\psi = \sum_i f_i \otimes r_i     (1)

There are already solutions that can translate simple structures to tensor repre-
sentations and back to the symbolic structures [9]. However, there is a gap in making
operations over structures on the tensor level. Indeed, there are multiple routine
operations over structures: adding or removing nodes, joining structures together etc. In
this paper the task of joining structures together is considered and thoroughly analyzed.

2 Task Description

There is a structure S presented in Fig. 1. It consists of two levels of nesting (the root
is not considered as a first level). This structure contains 3 fillers, A, B, C, and only
two elementary roles: r0 (left child) and r1 (right child). Each filler and role should be
transformed to a vector representation. There is only one strong requirement: the fillers,
defined on the vector space VF, should be linearly independent of each other, as should the
roles, defined on the vector space VR. At the same time, the assignment of fillers and
roles can be arbitrary as long as the aforementioned condition is satisfied (2):

A = [8 0 0], B = [0 15 0], C = [0 0 10], r_0 = [10 0], r_1 = [0 5].     (2)

According to Definition 4 the given structure S can be translated to the distributed


representation (3).

Fig. 1. Sample structure



\psi = \sum_i f_i \otimes r_i = A \otimes r_0 \otimes r_0 + C \otimes r_1 \otimes r_0 + B \otimes r_1     (3)

It is easier to first calculate the compound roles (4) and then apply them to (3) in order
to find the corresponding tensor representation (5):

r_{00} = r_0 \otimes r_0 = [10\ 0] \otimes [10\ 0] = \begin{bmatrix} 100 & 0 \\ 0 & 0 \end{bmatrix},\qquad
r_{10} = r_1 \otimes r_0 = [0\ 5] \otimes [10\ 0] = \begin{bmatrix} 0 & 0 \\ 50 & 0 \end{bmatrix}     (4)

\psi = A \otimes r_{00} + C \otimes r_{10} + B \otimes r_1
     = [8\ 0\ 0] \otimes \begin{bmatrix} 100 & 0 \\ 0 & 0 \end{bmatrix}
     + [0\ 0\ 10] \otimes \begin{bmatrix} 0 & 0 \\ 50 & 0 \end{bmatrix}
     + [0\ 15\ 0] \otimes [0\ 5],     (5)

where the first two terms form the rank-3 (depth-2) part of the representation, with the
only nonzero entries 800 and 500, and the last term
B \otimes r_1 = \begin{bmatrix} 0 & 0 \\ 0 & 75 \\ 0 & 0 \end{bmatrix} forms its rank-2 (depth-1) part.

It is extremely important to note that the resulting tensor representation contains
tensors of different rank that cannot be summed as plain matrices. Instead there is a
direct sum operation. The idea is that a tensor of rank N can be represented as a list of
tensors of rank 1..N, with the tensors of rank 1..N−1 simply filled with zeros. Therefore,
when a sum of tensor representations is performed, tensors are summed according to
their rank.
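
The arithmetic of (3)–(5) can be reproduced with a few lines of numpy; the snippet below is our illustration of the calculation (variable names are ours), not the authors' implementation.

```python
import numpy as np

A, B, C = np.array([8, 0, 0]), np.array([0, 15, 0]), np.array([0, 0, 10])
r0, r1 = np.array([10, 0]), np.array([0, 5])

# Compound roles of Eq. (4): np.tensordot with axes=0 is the tensor (outer) product.
r00 = np.tensordot(r0, r0, axes=0)        # left child of the left subtree
r10 = np.tensordot(r1, r0, axes=0)        # right child of the left subtree

# Eq. (5): tensors of different rank are kept apart (direct sum) and summed per rank.
rank3_part = np.tensordot(A, r00, axes=0) + np.tensordot(C, r10, axes=0)   # shape (3, 2, 2)
rank2_part = np.tensordot(B, r1, axes=0)                                   # shape (3, 2)
assert rank3_part[0, 0, 0] == 800 and rank3_part[2, 1, 0] == 500 and rank2_part[1, 1] == 75
```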
At this moment, it is clear how to build a binary tree of the predefined height using
sub-symbolic operations. In order to better understand the requirements of the task of
the paper it is necessary to analyze the algorithm that is used to construct the considered
example (Fig. 2).

Fig. 2. Possible stages of building structure from subtrees. (a) There are independent fillers.
(b) A and C are joined as left and right children of root accordingly. B is still an independent
filler. (c) A subtree from (b) is taken as a left subtree and a free filler B is taken as a right subtree.

From Fig. 2 it is clear that building a structure inherently means joining subtrees. In
case of binary tree there are one or two subtrees that can be joined. Also, it is vital that
at the beginning each filler is considered as a separate tree that can participate in the
joining procedure.
This brings us to the formulation of the task. The target task of the current paper is to
propose a robust neural architecture for performing dynamic construction of the tensor
representation of an arbitrary structure via joining subtrees, and to investigate the
engineering aspects of its implementation.

3 Theoretical Method of Building Shift Matrix

Joining two subtrees as direct children of the new root and by that constructing the new
tree is by nature a simple operation that makes a whole subtree play a new role in terms
of Tensor Product Variable Binding. It is extremely clear from Fig. 2b, where instead
of taking big trees, there are only two fillers that play a role of left and right subtree
correspondingly. In order to achieve the same result on tensor level it is enough to
perform tensor multiplication of the filler and corresponding role. Generalizing it to the
case when instead of a filler there is a representation of a tree, there is still a need to
perform tensor multiplication of the tree distributed representation and the assigned
role. The complexity in this case lies in the fact that tensor representation of the
structure is the multi-component list of tensors of different depth and it is no longer a
plain vector-vector multiplication.
Definition 5. Joining operation cons(p, q) is an action over two structures (trees) so
that the tree p is sliding as a whole ‘down to the left’ so that its root is moved to the left-
child-of-the-root position and tree q is sliding ‘down to the right’.
Operation cons can be expressed for binary trees as:

cons(p, q) = p \otimes r_0 + q \otimes r_1,
cons_0(p) \equiv cons(p, \emptyset),          (6)
cons_1(q) \equiv cons(\emptyset, q),

where r_0 and r_1 are the roles and \emptyset is the empty tree.


It was proved [10] that this operation can be expressed in matrix form given that it
operates over the tensor representation of structures (7).

cons(p, q) = W_{cons_0}\, p + W_{cons_1}\, q     (7)

The matrix exposes a shifting mechanism over a tensor representation of a structure that
contains tensors of different rank. Technically, to shift the tree 'down to the left'
('down to the right') means to apply the role r_0 (r_1) to each tensor of the tensor
representation. This is what the W_{cons_0} (W_{cons_1}) matrices perform. These matrices take
symbols at depth d of p and put them at depth d + 1. The form of these matrices is the
following: all elements are zeros except the elements under the main diagonal. This is
true because the cons operation just shifts the tree one level down. As both
matrices are constructed in the same manner, only Wcons0 is considered in this section.
Matrix is computed from the role vector and identity matrices (8).

W_{cons_0} = 1_A \otimes r_0 + 1_R \otimes 1_A \otimes r_0 + 1_R^2 \otimes 1_A \otimes r_0 + \ldots + 1_R^d \otimes 1_A \otimes r_0 + \ldots,     (8)

where d is the depth of the representation, 1_A is an identity matrix whose width and
height equal the number of elements in the filler vector, and 1_R is an analogous identity
matrix whose size depends on the role vector.
The key point in constructing the matrix is to keep the order of tensor multipli-
cations. This is not so obvious because the way tensor representation is considered in
TPVB is rather unbounded – TPVB only recognizes the feature that resulting tensor
contains all multiplications of input tensors elements. However, for Wcons0 it is very
important to keep dimensions of roles first. Finally, we get the following matrix for
depth = 2, role vector with 2 elements, filler vector with 3 elements (9).
W_{cons_0} = \begin{bmatrix} 1_A \otimes r_0 & 0 \\ 0 & 1_R \otimes 1_A \otimes r_0 \end{bmatrix},
\qquad
1_A \otimes r_0 = \begin{bmatrix} r_{0,0} & 0 & 0 \\ r_{0,1} & 0 & 0 \\ 0 & r_{0,0} & 0 \\ 0 & r_{0,1} & 0 \\ 0 & 0 & r_{0,0} \\ 0 & 0 & r_{0,1} \end{bmatrix},     (9)

where r_{0,0} and r_{0,1} are the components of the role vector r_0; the first diagonal
block (6 × 3) moves the depth-0 symbols to depth 1, and the second diagonal block
1_R \otimes 1_A \otimes r_0 (12 × 6, two copies of the first block placed block-diagonally)
moves the depth-1 symbols to depth 2.

During the computation phase the matrix is flattened and does not contain the block
structure present in (9). Blocks are shown for better visualization of the matrix
structure.
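
A possible numpy sketch of this construction is given below: each diagonal block is assembled with Kronecker products, mirroring Eqs. (8)–(9). The function name and the use of the scipy helper are our choices, not the authors' code.

```python
import numpy as np
from scipy.linalg import block_diag

def cons_shift_matrix(role, filler_dim, depth):
    """Assemble the shift matrix for one role as the block matrix of Eqs. (8)-(9).

    Block k (k = 0..depth-1) equals 1_R^k (x) 1_A (x) role and moves the symbols
    stored at depth k one level down, binding them to `role`.
    """
    role_dim = len(role)
    blocks = [np.kron(np.eye(role_dim ** k),
                      np.kron(np.eye(filler_dim), np.asarray(role).reshape(-1, 1)))
              for k in range(depth)]
    return block_diag(*blocks)

r0 = np.array([10.0, 0.0])
W_cons0 = cons_shift_matrix(r0, filler_dim=3, depth=2)   # maps a (3+6)-vector to a (6+12)-vector
```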

4 Proposed Neural Architecture

The overall scheme of the proposed neural architecture for joining structures is
demonstrated in Fig. 3. The neural network is designed to accept multiple inputs of
two types, constant and variable ones, which are described later. After that, each
subtree representation is flattened to a vector format, while a shifting matrix is prepared
based on the role that is chosen for this subtree. Finally, each subtree vector
representation is multiplied by the shifting matrix, all the resulting vectors are summed,
and thereby the tensor representation of the structure that contains the input structures as
direct children of the new root is produced. All the layer details are covered below.
Input Layers. As it was stated in (4) tensor representation is by definition a list of
tensors. Number of elements in the list hugely depends on the depth of the structures
that should be joined. Each variable input corresponds to the tensor of the particular
rank. Also, there can be multiple structures that we are going to join, that is why the
number of inputs can drastically grow with the demand of the original task. The second
type of inputs is constant inputs. Those inputs are filled with roles vectors. On the
Fig. 2 it is clear that there are only two roles taken for simplicity of description. In
reality there can be plenty of roles and Neural Network is designed to be easily
extended to a larger case.

Fig. 3. Overall scheme of the neural architecture

Reshaping Layers. Those layers are part of the subtree flattening branch (Fig. 4) and
exist for input tensors or rank 1 and 2. It is a technical requirement of the imple-
mentation in the Keras1 framework due to the fact that Flatten layer can work only with
tensors of rank bigger than two. So, Reshaping layers expand dimensions of such
inputs with fake dimension of 1 to satisfy Flatten layer requirements.

1
https://keras.io/.

Flattening Layers. Those layers are part of the subtree flattening branch (Fig. 4) and
exist for all input tensors. Those layers transform tensors of different rank to a simple
vector format according to the ordinary rules of flattening multi-dimensional tensors.
Concatenate Layers. Those layers are part of the subtree flattening branch (Fig. 4).
Those layers join vectors that correspond to each level of the tensor representation in
one vector. The order is very important here: from vectors representing zero depth level
to N.
Transpose Layers. Those layers are part of the subtree flattening branch (Fig. 4). Due
to the fact that next operation is matrix-vector multiplication it is required to transform
a vector into a column vector. Transpose layers enclose the subtree flattening branch
and their output is used in the final part of the network.

Fig. 4. Subtree flattening branch of the proposed architecture

ShiftMatrix Layers. Those layers are part of the role propagating branch (Fig. 5). The
primary and only purpose of this layer is the production of the shift matrix that was
discussed in Section "Theoretical method of building shift matrix". In practice it is a
tensor of rank 2, i.e. an ordinary matrix. It is interesting to estimate its dimensions. The width
of the matrix (the shift operator) equals the size of the vector representing the tree that
should be assigned to the given role, while the height of the matrix equals the size of the
vector representing the structure assigned to the new role.

Fig. 5. Role propagating branch

MulVec Layers. Those layers are part of the neural network tail (Fig. 3). Those layers
perform ordinary matrix-vector multiplication and the resulting vector contains tensor
representation of the current subtree assigned to a new role.
Add Layer. This layer is an output of the network (Fig. 3). All the subtrees are now
assigned to new roles and it is required to join them together and the sum vector would
represent the resulting structure after joining all subtrees on the tensor level.
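
For illustration, a compact tf.keras functional-API sketch of the network tail is given below. It assumes the per-rank reshaping, flattening and concatenation have already turned each subtree into a plain vector; it is our sketch, not the authors' released implementation (see the repository footnote in the conclusion).

```python
from tensorflow.keras import layers, Model

def joining_network(w_cons0, w_cons1):
    """Join two already-flattened subtree representations p and q on the tensor level."""
    in_dim, out_dim = w_cons0.shape[1], w_cons0.shape[0]
    p = layers.Input(shape=(in_dim,), name="left_subtree")
    q = layers.Input(shape=(in_dim,), name="right_subtree")
    shift0 = layers.Dense(out_dim, use_bias=False, trainable=False, name="shift_left")
    shift1 = layers.Dense(out_dim, use_bias=False, trainable=False, name="shift_right")
    joined = layers.Add(name="joined_tree")([shift0(p), shift1(q)])
    model = Model([p, q], joined)
    shift0.set_weights([w_cons0.T])        # Dense computes x @ W, hence the transpose
    shift1.set_weights([w_cons1.T])
    return model
```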

5 Conclusion

A novel neural architecture that solves the task of joining structures was proposed and
implemented in the Keras framework. The implementation is open-source and available
online2. Several conceptual gaps of the original works devoted to the same topic were
closed, in particular the mechanics of building the shift matrix. The elaborated network
is robust and is designed to work with an arbitrary number of roles and with existing tensor
representations of different depth. This result provides an essential brick in the bridge
between the symbolic and sub-symbolic levels of computations.
However, there is still an open question of performing other operations over
arbitrary structures on the tensor level, for example adding or removing nodes or
moving nodes to other positions in the structure. Also, the current proposal requires an
initial definition of the maximum structure depth, which can be an obstacle in edge cases, as well

2
https://github.com/demid5111/ldss-tensor-structures.

as constructing the shifting matrix depending on the number of roles. Thus, there is a
relevant direction for further development of Tensor Product Variable Binding methods.

References
1. Rumelhart, D.E., Hinton, G.E., McClelland, J.L.: A general framework for parallel
distributed processing. Parallel Distrib. Process. Explor. Microstruct. Cogn. 1, 26 (1986)
2. Rumelhart, D.E., McClelland, J.L.: PDP Research Group: Parallel Distributed Processing,
1st edn, p. 184. MIT press, Cambridge (1988)
3. Serafini, L., Garcez, A.D.A.: Logic tensor networks: deep learning and logical reasoning
from data and knowledge. arXiv preprint. arXiv:1606.04422 (2016)
4. Teso, S., Sebastiani, R., Passerini, A.: Structured learning modulo theories. Artif. Intell. 244,
166–187 (2017)
5. Browne, A., Sun, R.: Connectionist inference models. Neural Netw. 14(10), 1331–1355
(2001)
6. Smolensky, P.: Tensor product variable binding and the representation of symbolic
structures in connectionist systems. Artif. Intell. 46(1), 159–216 (1990)
7. Gallant, S.I., Okaywe, T.W.: Representing objects, relations, and sequences. Neural Comput.
25(8), 2038–2078 (2013)
8. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In:
International Conference on Machine Learning, pp. 1188–1196 (2014)
9. Demidovskij, A.: Considering selected aspects of tensor product variable binding in
connectionist systems. In: Proceedings of the 2019 Intelligent Systems Conference
(IntelliSys). The conference will be held in September, pp. 5–6. Springer, Cham (2019)
10. Smolensky, P., Legendre, G.: The Harmonic Mind: From Neural Computation to
Optimality-Theoretic Grammar (Cognitive Architecture), 1st edn. MIT press, Cambridge
(2006)
Astrocytes Organize Associative Memory

Susan Yu. Gordleeva1(&), Yulia A. Lotareva1,


Mikhail I. Krivonosov1, Alexey A. Zaikin1,2,
Mikhail V. Ivanchenko1, and Alexander N. Gorban1,3
1
Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
[email protected]
2
University College London, London, UK
3
University of Leicester, Leicester, UK

Abstract. We investigate one aspect of the functional role played by astrocytes


in neuron-astrocyte networks present in the mammal brain. To highlight the
effect of neuron-astrocyte interaction, we consider simplified networks with
bidirectional neuron-astrocyte communication and without any connections
between neurons. We show that the fact that an astrocyte covers several neurons,
together with the different time scale of calcium events in the astrocyte, can alone lead to the
appearance of neural associative memory. Without any doubt, this mechanism
makes the neuron networks more flexible in learning and, hence, may contribute
to the explanation of why astrocytes have been evolutionarily needed for the
development of the mammalian brain.

Keywords: Astrocyte · Associative memory · Neural network

1 Introduction

The functional role of astrocyte calcium signaling in brain information processing was
intensely debated in recent decades. Astrocytes play crucial roles in brain homeostasis
and are emerging as regulatory elements of neuronal and synaptic physiology by
responding to neurotransmitters with Ca2 þ elevations and releasing gliotransmitters
that activate neuronal receptors [1]. The characteristic times of calcium signals (1–2 s)
are three orders of magnitude longer than the duration of spikes in neurons (1 ms). It
was shown that astrocyte can act as temporal and spatial integrator, hence, detecting the
level of spatio-temporal coherence in the activity of accompanying neuronal network.
A currently actively discussed hypothesis is that the astrocytic calcium activity can
induce spatial synchronization in neuronal circuits defined by the morphological
territory of the astrocyte [2–4]. In other words, one can draw an analogy with the Hopfield
network: calcium events in astrocytes that induce synchronization in the surrounding
neural ensembles work as a temporal Hopfield network and, hence, can be interpreted
as an associative memory model.
In this paper, we consider one of the simplest models of the neuron-astrocyte network
(NAN), where we implement a kind of Hopfield network with forgetting.
There are just a few previous works studying the role of astrocytes in learning tasks.
Porto-Pazos and collaborators investigated the performance of an astrocyte-inspired learning

rule to train deep learning networks in data classification and found that the neuron-
astrocyte networks were able to outperform identical networks without astrocytes in all
classification tasks they implemented [5–7]. In those studies only the temporal features of
the astrocytic modulation of signal transmission in the neural network were taken into
account. In contrast to this approach, we concentrate on the local spatial synchronization
organized by the astrocyte, which, due to its different time scale, works as a kind of
neural associative memory.

2 Model and Architecture of Neuron-Astrocyte Network

The proposed neuron-astrocyte network consists of two layers: the first layer of neurons
with dimensions 40 × 40 and the second layer of astrocytes with dimensions 13 × 13. To
focus only on associative learning, the elements within each layer are not interconnected.
We consider bidirectional neuron-astrocytic communication between the layers. Each
astrocyte interacts with a neuronal ensemble of dimensions 4 × 4, with an overlap of one
row (see Fig. 1). Experiments show that astrocytes and neurons communicate via a
special mechanism modulated by neurotransmitters from both sides. The model is
designed so that when the calcium level inside an astrocyte exceeds a threshold, the
astrocyte releases a neuromodulator (e.g., glutamate) that may affect the release
probability (and thus the synaptic strength) at neighboring connections in a tissue volume.
A single astrocyte can regulate the synaptic strength of several neighboring synapses.
The membrane potential of a single neuron is described by the Izhikevich model and
evolves according to the following equations [8]:
dV/dt = 0.04 V^2 + 5V + 140 - U + I_{app} + I_{astro},
dU/dt = a (bV - U).                                   (1)
If V \ge 30 mV, then V \leftarrow c, U \leftarrow U + d.

We use the following parameter values: a = 0.1, b = 0.25, c = −65, d = 2. The
applied current I_app simulates the input signal: I_app = 5 if the input signal is presented.
The astrocytic modulation of synaptic activity is modeled by the current I_astro, which takes
the value I_astro = 30 if the Ca2+ level in the astrocyte exceeds 0.15 μM and more than 50% of
the neurons corresponding to this astrocyte are activated.
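
For illustration, a minimal explicit-Euler sketch of model (1) with its reset rule is given below; the integration step dt is our choice and is not specified in the paper.

```python
def izhikevich_step(v, u, i_app, i_astro, dt=0.5, a=0.1, b=0.25, c=-65.0, d=2.0):
    """One explicit Euler step of the neuron model (1) together with its reset rule."""
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + i_app + i_astro
    du = a * (b * v - u)
    v, u = v + dt * dv, u + dt * du
    fired = v >= 30.0
    if fired:                                   # spike: reset the membrane variables
        v, u = c, u + d
    return v, u, fired

# Drive one input-layer neuron for 100 ms of model time with I_app = 5 and no feedback.
v, u = -65.0, -65.0 * 0.25
for _ in range(200):                            # 200 steps of dt = 0.5 ms (step is our choice)
    v, u, fired = izhikevich_step(v, u, i_app=5.0, i_astro=0.0)
```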
Calcium dynamics in astrocyte is described by the Li-Rinzel model. State variables
of each cell include IP3 concentration IP3 , Ca2 þ concentration Ca, and the fraction of
activated IP3 receptors h. They evolve according to the following equations [9]:

dCa/dt = I_{ER} - I_{pump} + I_{leak},
dh/dt = (H - h) / \tau_n,                               (2)
dIP_3/dt = (IP_{3s} - IP_3) / \tau_r + I_{PLC} + I_{neuro},

I_{ER} = c_1 v_1 \left( \frac{IP_3}{IP_3 + d_1} \right)^3 \left( \frac{Ca}{Ca + d_5} \right)^3 h^3 \left( \frac{c_0 - Ca}{c_1} - Ca \right),

I_{leak} = c_1 v_2 \left( \frac{c_0 - Ca}{c_1} - Ca \right),

I_{pump} = v_3 \frac{Ca^2}{Ca^2 + k_3^2},

H = \frac{d_2 (IP_3 + d_1)/(IP_3 + d_3)}{d_2 (IP_3 + d_1)/(IP_3 + d_3) + Ca},
\qquad
\tau_n = \frac{1}{a_2 \left( d_2 (IP_3 + d_1)/(IP_3 + d_3) + Ca \right)},

I_{PLC} = v_4 \frac{Ca + (1 - \alpha) k_4}{Ca + k_4}.

Biophysical meaning of all parameters in Eq. (2) and their values determined
experimentally can be found in Ref. [6]. For our purpose we use the following
parameter values [6].

c0 = 2.0 μM, c1 = 0.185, v1 = 6 s⁻¹, v2 = 0.11 s⁻¹, v3 = 2.2 μM·s⁻¹, v5 = 0.025 μM·s⁻¹,
v6 = 0.2 μM·s⁻¹, k1 = 0.5 s⁻¹, k2 = 1.0 μM, k3 = 0.1 μM, a2 = 0.14 μM⁻¹·s⁻¹,
d1 = 0.13 μM, d2 = 1.049 μM, d3 = 0.9434 μM, d5 = 0.082 μM, α = 0.8, τ_r = 7.143 s,
IP3s = 0.16 μM, k4 = 1.1 μM.

The current I_neuro describes the production of IP3 due to the synaptic activity of
neighboring neurons. The current I_neuro is modeled as a rectangular pulse with
amplitude 5 μM and duration 60 ms. I_neuro ≠ 0 if more than 50% of the neurons
interacting with this astrocyte are activated.
Note that the time unit in the neuronal model Eq. (1) is 1 ms. Due to a slower time-
scale, in the astrocytic model Eq. (2) all empirical constants are indicated using sec-
onds as time units. When integrating the joint system of differential equations, the
astrocytic model time is rescaled so that the units in both models match up.

Fig. 1. A network structure. Input images of 40 × 40 pixels are fed into the neuronal network
containing 40 × 40 neurons. Red fields correspond to the astrocytes, which overlap by a
one-neuron-wide layer.

3 Results

We have used as input signals the black and white images of digit 0 or 1, with size
40 × 40 pixels as shown in Fig. 2. The training set included 10 samples for each image
with 10% of salt and pepper noise added to every sample fed into the NAN (see
Fig. 3a).

Fig. 2. Patterns for network training.

A 40 × 40 pixel input is processed by a 40 × 40 neuron layer (1600 neurons),
yielding the applied currents I_app in Eq. (1) for each input, which are further
converted into spikes. The neural response, shown in Fig. 3b, is the membrane
potential map, further converted into spike trains. Each sample was presented to the
network for 4 ms with a period of 40 ms between samples. In Fig. 4, the membrane
potential change is shown. During the training, each astrocyte monitored the activity
of the 16 neurons associated with it in a time window of 400 ms. If more than 8 neurons were
spiking with a spiking frequency of more than 17.5 s⁻¹, the astrocyte received an input
signal, I_neuro (see Eq. (2)), inducing an increase in intracellular calcium concentration
(see Fig. 3c).
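
The gating rules can be summarized by a short sketch; this is our reading of the thresholds (an astrocyte is driven when more than 8 of its 16 neurons fire above 17.5 s⁻¹ in the 400 ms window, and the feedback current follows the condition stated for Eq. (1)), with function names of our own.

```python
import numpy as np

def astro_input(spike_counts, window=0.4, rate_thr=17.5, active_thr=8):
    """Decide whether an astrocyte receives the IP3-producing pulse I_neuro.

    spike_counts: spikes fired by each of its 16 neurons during the last 400 ms window.
    Returns the pulse amplitude in uM (the 60 ms duration is handled by the caller).
    """
    rates = np.asarray(spike_counts) / window            # firing rates in 1/s
    n_active = np.sum(rates > rate_thr)
    return 5.0 if n_active > active_thr else 0.0

def astro_feedback(ca, n_active, ca_thr=0.15, n_thr=8):
    """Feedback current I_astro injected into the covered neurons, as used in Eq. (1)."""
    return 30.0 if (ca > ca_thr and n_active > n_thr) else 0.0
```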

Fig. 3. (a) The training sample with 10% of salt and pepper noise. (b) The response of the
neuronal network; the values of the membrane potentials are shown. (c) The intracellular Ca2+
concentrations in the astrocytic layer.

After training, our neuron-astrocyte network remembers the pattern for a period of
time that is determined by the duration of the calcium pulse in the astrocyte. The testing
sample was presented to the network for 20 ms. While the Ca2+ concentration in the astrocyte
exceeded the threshold of 0.15 μM and more than 8 neurons were still active, the
feedback from astrocytes to neurons was turned on. This feedback is determined by the
biophysical mechanisms of astrocytic modulation of synaptic transmission and is modeled
as the additional current I_astro in Eq. (1). An example of this test is shown in Fig. 5.

Fig. 4. (a–c) Membrane potentials of neurons during and after training. (a) A neuron in the
target pattern that interacted with an active astrocyte. (b) A neuron not in the target pattern
that interacted with an active astrocyte. (c) A neuron not in the target pattern that interacted
with a quiet astrocyte. (d) The intracellular Ca2+ concentration in the active astrocyte.

Fig. 5. The testing sample with 40% of salt and pepper noise (a); the response of the neuronal
network after an input of 4.4 ms (b) and 11.6 ms (c) duration; (d) the intracellular Ca2+
concentrations in the astrocytic layer.

Tests showed that the network can not only clean noise inside the target pattern
(Fig. 5b), as expected, but can also separate in time the pattern from the surrounding noise
(Fig. 5c). The latter is due to the fact that the neuronal spiking frequency is proportional
to the value of the applied current.

Fig. 6. The dependence of the accuracy on the noise level. The dotted line corresponds to a
manually selected accuracy threshold.

To test the robustness of the proposed network to noise, we calculated the dependence
of the accuracy on the noise level (see Fig. 6). Here the accuracy is not equal to 100% even
for the ideal sample without noise, because the resolution of our system is determined by
the interaction radius of astrocytes with neurons. The capacity of the proposed network is
determined by the orthogonality of the images, the number of astrocytes, and the radius of
overlap between the territories of the astrocytes. In Fig. 7 we present an example of
training the proposed network on two patterns, represented by the digits 1 and 0.

Fig. 7. (a) and (d) The training samples with 10% of salt and pepper noise. (b) and (e) The
responses of the neuronal network; the values of the membrane potentials are shown. (c) and
(f) The intracellular Ca2+ concentrations in the astrocytic layer. (g) The testing sample with
40% of salt and pepper noise. The response of the neuronal network after the 4.4 ms (h) and
11.6 ms (j) input.

4 Conclusions

In this paper, we describe a simple neuron-astrocyte network architecture having the
capability of associative memory. The proposed neuron-astrocyte network works as a
temporal Hopfield network. The considered effect occurs because of the local spatial
synchronization organized by the astrocyte and working on a different time scale. No
links between cells have been required. Astrocytic modulation of the activity of nearby
neurons during the elevation of calcium concentration imitates a temporary Hebbian
synapse. In the future, the proposed neuron-astrocyte network will be developed by
incorporating a Hebbian learning algorithm.
As we know from working with artificial intelligence algorithms, the flexibility of
learning strongly depends on the complexity of the network. As we have demonstrated,
astrocytes increase the complexity of the neural network through the coordination induced
by calcium events, and this mechanism alone can lead to the organization of neural
associative memory. Without any doubt, it would be extremely interesting to investigate
how this learning mechanism will work together with deep learning.
Another important direction of future research will include the identification of
conceptual markers of malfunction associated either with age-related diseases or with
developmental disorders. In both these situations the brain loses the ability to learn
properly; hence, the question arises whether we could model these processes with our
simple conceptual model and, probably, shed light on the methodology of how to identify
pathology markers in real medical applications.

Acknowledgments. This work was supported by the Ministry of Science and Education of
Russian Federation (Grant No. 075-15-2019-871).

References
1. Verkhratsky, A., Butt, A.: Glial Neurobiology. Wiley, Chichester (2007)
2. Bazargani, N., Attwell, D.: Astrocyte calcium signaling: the third wave. Nat. Neurosci. 19(2),
182–189 (2016)
3. Araque, A., Carmignoto, G., Haydon, P.G., Oliet, S.H., Robitaille, R., Volterra, A.:
Gliotransmitters travel in time and space. Neuron 81, 728–739 (2014)
4. Gordleeva, S.Y., Ermolaeva, A.V., Kastalskiy, I.A., Kazantsev, V.B.: Astrocyte as
spatiotemporal integrating detector of neuronal activity. Front. Physiol. 10, 294 (2019)
5. Porto-Pazos, A.B., Veiguela, N., Mesejo, P., Navarrete, M., Alvarellos, A., Ibáñez, O., Pazos,
A., Araque, A.: Artificial astrocytes improve neural network performance. PLoS ONE 6(4),
e19109 (2011)
6. Alvarellos-González, A., Pazos, A., Porto-Pazos, A. B.: Computational models of neuron-
astrocyte interactions lead to improved efficacy in the performance of neural networks.
Computational and Mathematical Methods in Medicine (2012)
7. Mesejo, P., Ibáñez, O., Fernández-Blanco, E., Cedrón, F., Pazos, A., Porto-Pazos, A.B.:
Artificial neuron–glia networks learning approach based on cooperative coevolution. Int.
J. Neural Syst. 25(4), 1550012 (2015)
8. Izhikevich, E.: Simple model of spiking neurons. IEEE Trans. Neural Netw. 14(6), 1569–
1572 (2003)
9. Li, Y.X., Rinzel, J.: Equations for InsP3 receptor-mediated [Ca2 +]i oscillations derived from
a detailed kinetic model: a Hodgkin-Huxley like formalism. Theor. Biol. 166(4), 461–473
(1994)
Team of Neural Networks to Detect
the Type of Ignition

Alena Guseva1(B) and Galina Malykhina1,2


1
Peter the Great St. Petersburg Polytechnic University,
29 Polytechnicheskaya, St.Petersburg, Russia
[email protected]
2
Russian State Scientific Center for Robotics and Technical Cybernetics,
21 Tikhoretsky Prospect, St.Petersburg, Russia
[email protected]
https://www.spbstu.ru/

Abstract. The article is devoted to the development of a modern multisensor
fire system, which has sensors of temperature, CO concentration
and smoke concentration. The presence of several different types of sensors
allows determining the type of ignition source, which makes it possible
to automatically select the means of fire extinguishing at the very
beginning of the ignition process. The study was carried out on the basis
of simulation results obtained in a supercomputer center: the processes of
ignition in the ship's rooms were simulated for various sources of fire:
paper, household waste containing plastic, gasoline, alcohol-containing
substances and electrical cables. As the study showed, a good result
can be obtained with the help of a specially organized team of neural
networks. A team of neural networks divided into two levels is proposed
to solve this problem: at the first level, neural networks with
partial (semi-supervised) training are used; at the second level, a probabilistic
neural network. The fire system is highly flexible at the hardware level because it
has a wireless interface that allows quick reconfiguration. The software
of the fire system also has high flexibility and allows simple expansion,
contraction or modification of software modules when the sources of
ignition in the room change.

Keywords: Fire system · Source of ignition ·
Team of neural networks · Semi-supervised learning · Bayesian NN

1 Introduction
The ship's premises have different fire hazards. Moreover, even inside a single room,
for example an engine room or a room with electrical equipment, the probabilities
and types of ignition can differ significantly. Automatic extinguishing means
can eliminate the fire most quickly, especially if they are applied locally. To
use these tools, one needs to know what substance has ignited and where the
fire is located. In this case, local application of a suitable fire extinguishing
agent is possible. In the considered multi-sensor fire system, which has sensors
for temperature, CO concentration and smoke concentration, it is possible to
determine the type of fire. With a sufficient number of sensors and their optimal
placement, it is possible to determine the area of ignition. A better result can
be obtained using neural networks or a team of neural networks. Therefore,
the aim of our study is to develop a neural network data processing algorithm
of a multisensory fire system with the goal of the most rapid fire detection,
localization and classification.

2 Simulation of Fire
Consider the following sources of ignition and their respective classes in accordance
with the classification given in the NFPA 10 (National Fire Protection
Association) standard [1] (Table 1). Depending on the type of ignition, the readings of the
three types of sensors (temperature, carbon monoxide concentration and smoke
concentration) vary with time, as shown in Fig. 1. The data were obtained
using simulations on a supercomputer in the FDS environment [2]. The inertia of
the sensors was taken into account at the preprocessing stage using the impulse
response of each sensor. The analysis of the dependencies in Fig. 1 showed that the
source of ignition affects the change of the fire factors received from the sensors.

Table 1. Sources of fire and their corresponding class

Ignition source Class


Paper A1
Household waste containing plastic A2
Gasoline B1
Alcohol-containing substances B2
Electrical cable E

Fig. 1. Changes in fire factors for five sources of ignition, in case of a fire starting at zero
time: (a) data received from the temperature sensor; (b) data obtained from the carbon
monoxide concentration sensor; (c) data obtained from the sensor measuring the concentration
of smoke.

This distinction can be used to identify ignition sources. The recognition result
depends on the number of sensors and their relative location. Three sensors were
selected for the investigated room measuring 5 by 7 m, which corresponds to the
standard SP 5.13130.2009 [3]. The location of the sensors was optimized using
a variant of the genetic algorithm proposed by the authors in the articles [4–6].
The importance of temporal dependencies for the recognition problem leads to
the application of temporal signal processing using dynamic ANNs with short-term
memory. Short-term memory is implemented as a delay line at the input of the ANN.
Data from the sensors are received once per second (Table 1).

3 Architecture of Detecting Fire Type System


The fire system is built on the basis of a wireless interface and allows for quick
reconfiguration at the hardware level when the situation changes in a particular
room or when moving from one room to another. The software of the fire system
should also have high flexibility and allow simple expansion, contraction or modification
when the fire conditions in the room change. A team of neural networks divided into two
levels is proposed to solve this problem: at the first level, neural networks with
partial training are used; at the second level, a probabilistic neural network. Let
us consider each part of the algorithmic support in more detail (Fig. 2).

Fig. 2. Team of the neural network for fire type recognition

– Input parameters. The number of input parameters n depends on the number
of sensors located in the room and on the length d of the short-term
memory delay line: n = q·d·K, where K is the number of multisensors and q is the
number of channels in one multisensor record. In our case q = 3, since sensors of
temperature, carbon monoxide concentration and smoke concentration are used; the
smoke concentration is measured indirectly through visibility.
– Delay lines. Since the type of fire is characterized by the dynamics of changes of
the fire factors, the current and several previous readings are fed to the NN input
(see the sketch after this list). To determine the required amount of short-term memory,
we analyzed the effect of the length of the delay line, which is selected from the series 3,
5, 10.
– Neural networks trained with partial involvement of a teacher (semi-supervised learning). Five first-level neural networks with the same architecture are organized, each of which is designed to identify one type of fire. The input data for the neural networks have the same form, so it is advisable to use structurally identical neural networks. Having several identical neural networks reduces the total number of parameters for training. In this case, less data is required for training and, under equal conditions, the network is less prone to overfitting.
– Bayesian neural network. The neural network of the second level is designed
to estimate the probability of one of the five types of ignition.
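
The following minimal Python sketch (not the authors' code; the values of q, d and K below are illustrative assumptions) shows how the delay-line input vector of a first-level network can be assembled from a stream of multisensor readings arriving once per second.

```python
import numpy as np

q, d, K = 3, 5, 3                      # assumed: channels per multisensor, delay taps, multisensors
n = q * d * K                          # input size of a first-level network
stream = np.random.rand(60, K, q)      # 60 s of simulated readings, one record per second

def delay_line_input(stream, t, d):
    """Stack the readings at seconds t, t-1, ..., t-d+1 into one flat input vector."""
    window = stream[t - d + 1 : t + 1]   # shape (d, K, q)
    return window.reshape(-1)            # shape (d * K * q,) = (n,)

x = delay_line_input(stream, t=10, d=d)
assert x.shape == (n,)
print(n, x.shape)
```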

The structure of the five identical neural networks is shown in Fig. 3: X is the data vector of size n obtained from the multisensors, V is the output value in the range from 0 to 1. The input layer of each of these neural networks receives the current and delayed normalized data from the sensors. Normalization maps the data to the interval [0, 1]. The neural networks have an input, a hidden and an output layer. Sigmoid activation functions are used for the hidden and output layers. During partial training of each network, only the data related to the corresponding type of fire are labeled, while the remaining data are treated as “other”. The training was performed using the Levenberg–Marquardt backpropagation algorithm. The amount of data for training is 25025 samples, for testing 5005.

Fig. 3. The structure of identical neural networks.

4 Evaluation of Ignition Source Recognition Results


The results of verification of neural networks are shown in Table 2. Analysis of the
effect of short-term memory, taking into account the training time and network
error, showed that the number of delayed samples of sensor signals can be taken equal to 5. A number of delays greater than 5 leads to an increase in training
time and a decrease in accuracy. The Bayesian network is used to calculate the
probability that a fire belongs to the appropriate class. The structure of the
Bayesian network, shown in Fig. 4, is characterized by the input layer, hidden
layers and output layer. The input layer is the outputs of five neural networks
designed to determine each type of fire. As the activation function of the hidden
layer, a normalized exponent was used, which is a generalization of the logistic

Table 2. Training of the identical first-level networks for different numbers of delayed samples at the input

Artificial neural network to        Number     Training time,  Network
determine the type of fire          of delays  minutes         error, %
Electrical cable                    3          01:48           87.1
Electrical cable                    5          13:10           93.5
Electrical cable                    10         19:18           44.5
Paper                               3          02:05           91.6
Paper                               5          12:17           95.8
Paper                               10         20:24           50.1
Gasoline                            3          02:09           97.7
Gasoline                            5          12:24           98.8
Gasoline                            10         20:51           52.4
Alcohol-containing substances       3          02:07           95.4
Alcohol-containing substances       5          12:43           97.6
Alcohol-containing substances       10         21:02           51.8
Household waste containing plastic  3          02:10           87.5
Household waste containing plastic  5          12:29           87.6
Household waste containing plastic  10         20:47           43.8

Fig. 4. The structure of the Bayesian network

function. The output layer of the Bayesian network represents the probability
that one of five types of fires occurs or there is no fire at all. The sum of all values
of the output vector is equal to one. As a result of learning the Bayesian neural
network, the resulting error of determining the type of fire was 93.7%. Moreover,
the main error is related to the work of the previous five neural networks. The
time required for training is 18 s.
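
As an illustration of the second-level combiner described above, the sketch below (a simplified assumption-based example, not the trained network from the paper) uses the softmax ("normalized exponent") function to turn the outputs of the five first-level networks into probabilities of the five fire types plus "no fire", summing to one. The weight matrices are random placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
v = np.array([0.9, 0.1, 0.2, 0.05, 0.15])   # outputs of the five first-level networks
W_h = rng.normal(size=(8, 5))               # hypothetical hidden layer (8 units)
W_o = rng.normal(size=(6, 8))               # 5 fire classes + "no fire"

hidden = softmax(W_h @ v)                   # "normalized exponent" activation of the hidden layer
probs = softmax(W_o @ hidden)               # output probabilities
print(np.round(probs, 3), probs.sum())      # the probabilities sum to 1
```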

5 Conclusion
The proposed two-tier architecture has several advantages:

– It allows a simple restructuring of the system when some ignition sources are absent in the room. To do this, the first-level neural network responsible for detecting this type of source is removed and the number of neurons in the hidden layer of the Bayesian neural network is reduced.
– It allows a simple expansion of the number of fire types in a given room. To add a new type of ignition, it is enough to add a first-level neural network and train it to recognize the new type of ignition. In the Bayesian network, one neuron is added and only this network is retrained; its training time is very short.
– It allows a quick reconfiguration of the fire system when its units are moved to another room. To do this, it is necessary to determine the optimal location of the sensors and change the system so that it can determine the types of ignition sources possible in this room. The set of first-level neural networks is changed as follows: the networks responsible for fire sources absent in the new room are removed, and networks for detecting new fire types are added and trained.

References
1. NFPA 10: Standard for Portable Fire Extinguishers. https://www.nfpa.org/cod
es-and-standards/all-codes-and-standards/list-of-codes-and-standards/detail?code
=10
2. McGrattan, K., Hostikka, S., Floyd, J., Baum, H., Rehm, R., Mell, W., McDermott,
R.: Fire Dynamics Simulator (Version 5) Technical Reference Guide. National Insti-
tute of Standards and Technology, Gaithersburg (2010). http://code.google.com/
p/fds-smv
3. SP 5.13130.2009 Fire protection systems: Installation of fire alarm and fire extin-
guishing automatic. Norms and rules of design (with Amendment N 1). http://docs.
cntd.ru/document/1200071148
4. Malykhina, G.F., Guseva, A.I., Militsyn, A.V., Nevelskii, A.S.: Developing an intel-
ligent fire detection system on the ships. In: Sukhomlin, V., Zubareva, E., Shneps-
Shneppe, M. (eds.) The International Scientific Conference on II Convergent Cog-
nitive Information Technologies (Convergent’2017), vol. 2064, pp. 289–296. Russia,
Moscow (2017)
5. Militsyn, A.V., Malykhina, G.F., Guseva, A.I.: Early fire prevention in the plant.
In: International Conference on Industrial Engineering, Applications and Manufac-
turing (ICIEAM), Saint Petersburg, Russia, vol. 2, pp. 1–4. IEEE Explore (2017)
6. Guseva, A.I., Malykhina, G.F., Nevelskiy, A.S.: Neural network based algorithm for
the measurements of fire factors processing. In: Kryzhanovsky, B., Dunin-Barkowski,
W., Redko, V., Tiumentsev, Y. (eds.) Neural Computation, Machine Learning, and
Cognitive Research II. Studies in Computational Intelligence, vol. 799, pp. 160–166. Springer, Cham (2019)
Chaotic Spiking Neural Network Connectivity
Configuration Leading to Memory Mechanism
Formation

Mikhail Kiselev(&)

Chuvash State University, Cheboksary, Russia


[email protected]

Abstract. Chaotic spiking neural network serves as a main component (a


“liquid”) in liquid state machines (LSM) – a very promising approach to
application of neural networks to online analysis of dynamic data streams.
The LSM ability to recognize complex dynamic patterns is based on “memory”
of its liquid component – prolonged reaction of its neural network to input
stimuli. A generalization of LSM called self-organizing LSM (LSM including
spiking neural network with synaptic plasticity switched on) is studied. It is
demonstrated that memory appears in such networks under certain locality
conditions on their connectivity. Genetic algorithm is utilized to determine
parameters of neuron model, synaptic plasticity rule and connectivity optimal
from point of view of memory characteristics.

Keywords: Spiking neural network · Liquid state machine · Chaotic neural network · Synaptic plasticity · Neural network self-organization · Memory mechanism

1 Introduction

The recently proposed neural network paradigms such as spiking neural networks
(SNN), convolutional and deep learning networks are considered by many researchers
as a potential basis for the break-through IT technologies of the near future. Since
SNNs are complex non-linear dynamic systems, their specific application area is
processing of dynamic signals such as video streams, sensory data in robotics or signals
from technological sensors.
The most common form of SNN architecture used for solution of this kind of
problems is the so called liquid state machine (LSM) [1]. LSM is a computational
model consisting of the two main parts. The first part is a large chaotic spiking neural
network. It is chaotic in the sense that it has no predefined structure (layers etc.).
Instead, its connectivity is random – presence of synaptic connection between two
given neurons, weight of this connection and its delay are random variables obeying
certain statistical distributions. Input data streams represented in form of spike
sequences (let us remind that spiking neurons communicate by spikes – short pulses of
the constant amplitude and negligible duration) are injected into the network via special
afferent synapses. The network responds to stimulation by complex activity of its


neurons which may depend on recent history of the input signal. Activity of the
neurons (in form of spike counts in equal time intervals) is monitored by the second
part of LSM, the read-out mechanism. This mechanism implements supervised learning
– it learns to use LSM neuron activity data to classify input stimuli, to make predic-
tions, to recognize exceptional situations and to perform other data analysis and pre-
diction tasks. Nature of the read-out mechanism may be very diverse. It may be any
suitable data mining algorithm – logistic regression, support vector machine, decision
tree, naïve Bayesian classifier or anything else – it is required only that it should be fast
and could work with very multi-dimensional data. It is assumed that valuable predictive
features are hidden in the multi-dimensional and diverse reaction of the large SNN to
input signal, and the job of the read-out layer is to mine them in the seeming chaos of
the SNN activity.
In the original version of LSM which is used now by the majority of researchers,
neurons are not plastic – the synaptic plasticity is switched off. However, there are
many reasons to believe that it can play positive role. Indeed, the strong feature of LSM
is its randomness. It makes possible to implement all kinds of computations on input
data (provided that the SNN is sufficiently large). But at the same time, randomness is
an evident weakness of the LSM concept – small number of useful circuits in the
networks is neighbored by plenty of random network subsets performing senseless or
trivial operations. Thus, there is tempting opportunity to preserve computation gen-
erality provided by chaotic connectivity while eliminating senseless circuits in the
process of guided self-organization implemented in the form of synaptic plasticity. It
leads us to concept of self-organizing LSM (SOLSM). Test of this hypothesis and
creation of the practically usable SOLSM are among aims of the research project ArNI
(Artificial NeuroIntelligence).
The crucial feature of LSM explaining its efficiency for processing of dynamic data
is its memory ability (the transient working memory is meant here, not to be confused
with the constant long-term memory fixed in synaptic weights). If the spatio-temporal
pattern to be recognized spans significant time interval, the network should memorize
its beginning until its final part is presented. It is true for SOLSM, as well. However,
appearance of the memory mechanism in evolving chaotic SNN is very poorly
explored process. Some of the earlier works of the author were devoted to this subject
[2, 3]. However, the structured SNNs were studied in these works. At present, the
majority of working memory models in SNN is based on short-term plasticity, an
additional process modifying synaptic weights which acts together with the conven-
tional long-term STDP plasticity (see, for example, [4]). Different researchers include
this mechanism in their models in different forms. For example, in the pioneering work
of Izhikevich [5], short-term plasticity enables formation of the so-called polychronous
neuronal groups (PNG), whose sporadic activation indicates recent appearance of the
stimulus specific for the given PNG. Other approaches utilize the notion of attractors
[6], meta-stable states of the network preserving information expressed by the attractor
in time. Further extension of this idea called continuous attractors explains how con-
tinuous values can be stored in memory [7]. However, most of these approaches cannot
be directly applied to SOLSM because either cannot be implemented in chaotic net-
works (like continuous attractors) or use complicated synaptic plasticity models
(especially, keeping in mind that LSM does not use synaptic plasticity at all).

Thus, our aim is to study how working memory can appear in chaotic SNN with
Hebbian long-term synaptic plasticity.

2 Model of Neuron and Synaptic Plasticity

The simplest but functional leaky integrate-and-fire (LIF) neuron model with current-
based excitatory synapses and conductance-based inhibitory synapses was used in this
study. Upon receiving a spike at the moment t_ij^+, the i-th excitatory synapse instantly increments the neuron membrane potential u by a value equal to its weight w_i^+. The k-th inhibitory synapse, receiving a spike, instantly increments the inhibitory membrane conductance c by the value of its weight w_k^−. In the absence of input spikes, u and c decay to 0 with time constants τ_u and τ_c, respectively. When u reaches a threshold value, the neuron emits a spike. After that, the neuron cannot emit a new spike during the refractory period τ_R. Values of the membrane potential are selected such that its resting value equals 0 and its threshold value equals 1. While the value of c is not equal to zero, the membrane potential falls exponentially to the inverse inhibitory potential U_I (which is negative) with the time constant 1/c. Thus, the used neuron model is
described by the following equations:
$$\begin{cases} \dfrac{du}{dt} = -\dfrac{u}{\tau_u} - c\,(u - U_I) + \sum_{i,j} w_i^{+}\,\delta\!\left(t - t_{ij}^{+}\right), \\[6pt] \dfrac{dc}{dt} = -\dfrac{c}{\tau_c} + \sum_{i,j} w_i^{-}\,\delta\!\left(t - t_{ij}^{-}\right) \end{cases} \qquad (1)$$

and the condition that if u > 1 and t > T_a + τ_R, where T_a is the moment when this neuron fired last time, then the neuron fires and u is reset to 0.
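
The following Euler-step sketch illustrates the LIF model of Eq. (1). The time constants, weights and spike times are illustrative assumptions, not the values found by the genetic algorithm in Sect. 6; with these numbers the first pair of excitatory spikes drives the neuron above threshold, while the inhibitory spike at 122 ms keeps the second pair below it.

```python
import numpy as np

dt, T = 0.1, 200.0                         # simulation step and duration, ms
tau_u, tau_c, tau_R, U_I = 14.0, 5.0, 3.0, -0.5
exc_spikes = {20.0: 0.6, 25.0: 0.7, 120.0: 0.6, 124.0: 0.6}   # time (ms) -> w+
inh_spikes = {122.0: 0.8}                                     # time (ms) -> w-

u, c, last_fire = 0.0, 0.0, -np.inf
for step in range(int(T / dt)):
    t = step * dt
    u += (-u / tau_u - c * (u - U_I)) * dt      # leak and inhibitory drive toward U_I
    c += -c / tau_c * dt                        # decay of inhibitory conductance
    for ts, w in exc_spikes.items():            # excitatory spike: jump of u by w+
        if abs(t - ts) < dt / 2:
            u += w
    for ts, w in inh_spikes.items():            # inhibitory spike: jump of c by w-
        if abs(t - ts) < dt / 2:
            c += w
    if u > 1.0 and t > last_fire + tau_R:       # threshold crossing outside refractory period
        print(f"spike at t = {t:.1f} ms")
        u, last_fire = 0.0, t
```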
The plasticity rule used in this work is based on the spike timing dependent
plasticity (STDP) model. As in our previous works [8, 9], the lower and upper limits
(w_min and w_max) on synaptic weight values are set by using the so-called synaptic
resource W, whose value depends monotonically on the weight value w in accordance
with the following formula:

$$w = w_{\min} + \frac{(w_{\max} - w_{\min})\,\max(W, 0)}{w_{\max} - w_{\min} + \max(W, 0)}. \qquad (2)$$

Each long-term potentiation (LTP) or long-term depression (LTD) act increases or


decreases W by a certain value, but the value of w always remains in the interval [w_min, w_max). LTP occurs when the neuron fires a short time after arrival of a presynaptic spike and has its classic form W → W + Δw⁺ exp(−Δt/τ_w), where Δt is the time interval between post- and pre-synaptic spikes, and Δw⁺ and τ_w are constants (we select τ_w equal to τ_u). The rule for LTD is different and much simpler: the synapse is depressed by the value Δw⁻ every time it receives a spike. Besides that, the total value of W in all synapses of
one neuron is kept constant – when some synapse is depressed or potentiated, all the
rest are modified in the respective direction by an equal value.
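
A small sketch of Eq. (2) and of the LTP/LTD updates just described is given below (the constants are illustrative assumptions, not the values used in the experiments).

```python
import numpy as np

w_min, w_max = 0.0, 1.0
tau_w, dw_plus, dw_minus = 14.0, 0.2, 0.05     # assumed constants

def weight(W):
    """Eq. (2): map the synaptic resource W to a weight in [w_min, w_max)."""
    r = max(W, 0.0)
    return w_min + (w_max - w_min) * r / (w_max - w_min + r)

def apply_and_rebalance(W_vec, i, delta):
    """Change the resource of synapse i by delta and spread -delta over the other
    synapses, keeping the neuron's total resource constant, as described above."""
    W_vec = W_vec.copy()
    W_vec[i] += delta
    others = np.arange(len(W_vec)) != i
    W_vec[others] -= delta / others.sum()
    return W_vec

W = np.array([0.5, 0.5, 0.5])
dt_post_pre = 3.0                                                        # ms
W = apply_and_rebalance(W, 0, dw_plus * np.exp(-dt_post_pre / tau_w))    # LTP on synapse 0
W = apply_and_rebalance(W, 1, -dw_minus)                                 # LTD on synapse 1
print(np.round(W, 3), [round(weight(x), 3) for x in W])
```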

All postsynaptic connections of excitatory neurons (E) have non-negative weights,


weights of postsynaptic connections of inhibitory neurons (I) are negative (and constant
in time).

3 External Signal and Memory Ability Tests

Now let us describe how the memory ability of the SNN is evaluated. Informational
input of the SNN is represented as a certain number of nodes – sources of spikes (in our
experiments this number was equal to 600). These nodes emit low intensity Poissonian
noise (mean spike frequency 0.1 Hz). Besides that, every 100 ms, some group of input
nodes begins to emit high intensity (100 Hz) Poissonian noise. This high frequency
signal lasts 40 ms (below, it will be also called pattern). These groups do not intersect.
We used 30 groups (patterns), 20 nodes per group. Order, in which these groups
became active, was random. The task was to predict which group was active during the
preceding time interval using the network activity (spike counts of each neuron)
measured in the current interval. Successful prediction would mean that the network
memorizes properties of input signal during at least 60 ms and that this memory is
sufficiently stable – it is not destroyed immediately by activity of the next input node
group.
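
The input signal described above can be generated, for example, as in the following sketch (a simplified assumption-based reconstruction, not the ArNI code): 600 sources emit 0.1 Hz Poissonian background noise, and every 100 ms one of 30 disjoint groups of 20 sources emits 100 Hz noise for 40 ms.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_groups, group_size = 600, 30, 20
dt = 0.001                                   # 1 ms time step
T = 2.0                                      # seconds simulated in this toy example
groups = np.arange(n_groups * group_size).reshape(n_groups, group_size)

spikes = rng.random((int(T / dt), n_nodes)) < 0.1 * dt      # background noise, 0.1 Hz
labels = []
for start in range(0, int(T / dt), 100):                    # a new pattern every 100 ms
    g = rng.integers(n_groups)
    labels.append(g)
    burst = rng.random((40, group_size)) < 100.0 * dt       # 100 Hz burst lasting 40 ms
    spikes[start:start + 40, groups[g]] |= burst
print(spikes.shape, labels[:5])
```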
Random forest data mining algorithm [10] was chosen as a read-out mechanism
because of its speed and stability in case of very numerous predictors.
In the described series of experiments, the whole simulation lasted 1600 s. It was
assumed that during first 800 s the network reaches a certain equilibrium state. If it
really does then during last 800 s no significant synaptic weight modifications should
be observed. In this case, this second half of simulation period was used for mea-
surement of the previous pattern prediction accuracy, as was said above.
Interneuron connections have non-zero delays. Inhibitory connections are always
fast (have 1 ms delay).

4 Network Connectivity

Since it is not clear a priori which connectivity configuration could lead to formation of
memory in SNN, the three following variants were tested:
• Neural gas. All neurons have identical number of synapses of each kind - exci-
tatory, inhibitory and afferent, connecting a neuron with input nodes (they are
always excitatory) but the set of presynaptic neurons is selected randomly for every
neuron. Synaptic weights and delays are also random and selected using the same
distribution law for all neurons, but different for connections E → E, E → I, I → E and I → I.
• Bottleneck. The same as above but only a small fraction of all neurons have afferent
links.
• Sphere. Let us imagine that all neurons correspond to randomly selected points of a
sphere with radius equal to 1. The synaptic delays of excitatory links are
proportional to the length of the links. Network connectivity obeys the “small
world” law – all neurons have the same numbers of long and short links. Long links
are created by the same rule as in the two previous schemas. Postsynaptic neurons
for short links are selected using the probability distribution p(r) ∝ exp(−(r − a)²/2b²), where r is the distance to the postsynaptic neuron, and a and b are constants (for excitatory links a = 0).
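
A short sketch of the "sphere" wiring rule follows (assumptions: neurons are random points on a unit sphere; b is scaled up for this 1000-neuron toy so that the Gaussian does not underflow, whereas the optimum b ≈ 0.00315 reported in Sect. 6 refers to a 10000-neuron network).

```python
import numpy as np

rng = np.random.default_rng(1)
n, a, b = 1000, 0.0, 0.03                     # toy values; a = 0 for excitatory links

pts = rng.normal(size=(n, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # random points on the unit sphere

def sample_short_links(pre, n_links=6):
    """Choose postsynaptic neurons for the short links of neuron `pre` with
    probability proportional to exp(-(r - a)^2 / 2 b^2)."""
    r = np.linalg.norm(pts - pts[pre], axis=1)      # distances to all neurons
    p = np.exp(-(r - a) ** 2 / (2 * b ** 2))
    p[pre] = 0.0                                    # no self-connection
    p /= p.sum()
    return rng.choice(n, size=n_links, replace=False, p=p)

print(sample_short_links(0))
```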

5 Genetic Algorithm Finding Network with the Best Memory

Thus, three kinds of chaotic SNNs were explored. Each one is characterized by more than 30 parameters (constants entering the neuron model and the plasticity rule, and structural properties of the network). The criterion for evaluation of their memory ability was described in Sect. 3. Therefore, finding the best SNN is an optimization problem. This type of optimization problem is solved efficiently by the genetic algorithm (GA), and it was
selected as an optimization technique in this study.
Optimization was performed for networks of the same size (10000 neurons). The
population size in all cases was 300. The mutation probability per individual was 0.5;
elitism – 10%. Optimization was stopped when 3 consecutive populations had not
shown progress.
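
The sketch below shows a generic GA loop with the parameters quoted above (population 300, per-individual mutation probability 0.5, 10% elitism, stop after three generations without progress). The fitness function is a placeholder; in the study it is the previous-pattern prediction accuracy of a simulated SNN, and the crossover/mutation details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
pop_size, n_params, elite_frac, p_mut = 300, 30, 0.10, 0.5

def fitness(x):                                   # placeholder objective
    return -np.sum((x - 0.3) ** 2)

pop = rng.random((pop_size, n_params))
best_hist = []
for gen in range(200):
    f = np.array([fitness(x) for x in pop])
    order = np.argsort(f)[::-1]
    best_hist.append(f[order[0]])
    if len(best_hist) >= 4 and best_hist[-1] <= best_hist[-4]:
        break                                     # three generations without progress
    elite = pop[order[: int(elite_frac * pop_size)]]
    parents = pop[order[: pop_size // 2]]
    children = []
    while len(children) < pop_size - len(elite):
        pa, pb = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_params) < 0.5, pa, pb)   # uniform crossover
        if rng.random() < p_mut:                               # mutate one gene
            child[rng.integers(n_params)] = rng.random()
        children.append(child)
    pop = np.vstack([elite, children])
print("generations:", len(best_hist), "best fitness:", round(best_hist[-1], 4))
```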

6 Results

The GA optimization performed in this study showed that the connectivity configu-
rations “neural gas” and “bottleneck” show almost no signs of emerging memory
mechanism. The best accuracy obtained for “neural gas” was 6.34%, for “bottleneck” –
7.25%. It is too low accuracy, close to the baseline lazy classifier accuracy which
equals approximately to 3.3% for 30 equally frequent patterns. Interestingly, synaptic
plasticity was found to be a definitely positive factor – without it the accuracy fell to
4.29%. At the same time, formation of memory mechanism in a “sphere” SNN was
reliably demonstrated (accuracy 25.7%). The best network is characterized by very
sparse and local connectivity – excitatory neurons have 7 excitatory synapses such that
6 of them are connections with the closest neurons and only 1 link is “far”. Number of
inhibitory synapses is only 3, all inhibitory links are “local” (a = 0.00653,
b = 0.00315). The optimum percent of inhibitory neurons was 7.82%. Another inter-
esting feature of the best network is significant difference of time constant su for
excitatory and inhibitory neurons (14/4 ms).
Dependence of the accuracy on the network size was studied (for fixed optimum values of the other parameters). It is shown in Fig. 1. We see that the dependence is almost linear on a logarithmic scale.
The computations were performed on three GPU servers using the high perfor-
mance SNN simulation package ArNI. A SNN consisting of 100000 neurons is sim-
ulated at the speed 7 times slower than real time on a powerful PC with 4
NVIDIA TITAN Xp cards provided for this project by Kaspersky Lab.

[Plot: accuracy (%) versus network size for 6000, 24000 and 96000 neurons.]

Fig. 1. Dependence of the previous pattern determination accuracy on the SNN size.

7 Conclusion

The results obtained in this work let us make the following conclusions:
• The connectivity scheme used in the traditional LSM is not optimal from the
viewpoint of LSM memory characteristics and therefore may limit its ability to
produce valuable predictive features from dynamic data. To reach higher perfor-
mance the “small world” connectivity scheme described above should be used.
• SOLSM (LSM with plastic neurons) can outperform traditional LSM due to fuller
usage of network resources (restructuring silent or constantly active neuronal
groups).
• Network size is very important. It is possible that the power of SOLSM will be
unveiled in full only in case of very large SNN still unavailable for commonly used
hardware platforms (such as GPU servers).
The type of SNNs studied in this work is very hard for theoretical and empirical
exploration. This scientific problem requires significant research efforts. The presented
results while being significant and valid should still be considered as preliminary.
Systematic study of SOLSM is being carried out now as a part of the research project
ArNI supported by Kaspersky Lab; its results will be reported in further publications.

Acknowledgements. I would like to thank Andrey Lavrentyev and Artyom Nechiporuk for
valuable discussion. I am grateful to Kaspersky Lab for the powerful GPU computer provided.

References
1. Maass, W.: Liquid state machines: motivation, theory, and applications. In: Computability in
Context: Computation and Logic in the Real World. World Scientific, pp. 275–296 (2011)
2. Kiselev, M.: Self-organization process in large spiking neural networks leading to formation
of working memory mechanism. In: Rojas, I., Joya, G., Cabestany, J. (eds.) Proceedings of
IWANN 2013. LNCS, vol. 7902, Part I, pp. 510–517 (2013)
3. Kiselev, M.: Self-organized short-term memory mechanism in spiking neural network. In:
Proceedings of ICANNGA 2011 Part I, Ljubljana, pp. 120–129 (2011)
4. Fiebig, F., Lansner, A.: A spiking working memory model based on Hebbian short-term
potentiation. J. Neurosci. 37(1), 83–96 (2016)
5. Szatmary, B., Izhikevich, E.: Spike-timing theory of working memory. PLoS Comput. Biol.
6(8), e1000879 (2010)
6. Lansner, A., Marklund, P., Sikström, S., Nilsson, L.-G.: Reactivation in working memory:
an attractor network model of free recall. PLoS ONE 8(8), e73776 (2013). https://doi.org/10.
1371/journal.pone.0073776
7. Seeholzer, A., Deger, M., Gerstner, W.: Stability of working memory in continuous attractor
networks under the control of short-term plasticity. PLoS Comput. Biol. 15(4), e1006928
(2019). https://doi.org/10.1371/journal.pcbi.1006928
8. Kiselev, M.: Rate coding vs. temporal coding – is optimum between? In: Proceedings of
IJCNN-2016, pp. 1355–1359 (2016)
9. Kiselev, M., Lavrentyev, A.: A preprocessing layer in spiking neural networks – structure,
parameters, performance criteria, accepted for publication. In: Proceedings of IJCNN-2019
(2019)
10. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:
1010933404324
The Large-Scale Symmetry Learning Applying
Pavlov Principle

Alexander E. Lebedev(&), Kseniya P. Solovyeva,


and Witali L. Dunin-Barkowski

Scientific Research Institute for System Analysis of Russian Academy


of Sciences, Moscow, Russia
[email protected]

Abstract. Symmetry detection task in the domain of 100-dimension binary


vectors is considered. This task is characterized by practically infinite number of
training samples. We train an artificial neural network with binary neurons to
solve the symmetry detection task. Weight changing of hidden neurons is per-
formed according to Pavlov Principle. In the presence of error, synaptic weights
are adjusted considering a matrix of random weights. After training on a rela-
tively small number of data samples our network obtained generalization ability
and detects symmetry on data not present at the training set. The obtained
averaged percentage of correct recognition of our network is better than those of
classic perceptron with fixed weights of synapses of neurons of hidden layer.
We also compare performance of different modifications of the architecture
including different number of hidden layers, different number of neurons in
hidden layer, different number of neurons’ synapses.

Keywords: Symmetry detection · Pavlov Principle · Artificial neural nets · Feedback alignment · Biologically plausible learning

1 Introduction

1.1 History of Symmetry Detection Task


The symmetry detection task is a traditional benchmark in the field of Artificial Intelligence
and Artificial Neural Networks research. In [1], one of the first works, introducing
backpropagation for training artificial neural nets, mirror symmetry detection was one
of the first tasks to test the algorithm. In this example the net had six input neurons
divided into two groups. The answer should be considered positive if the activity value
(which can be either 0 or 1) of each input neuron of the first group is equal to the value
of corresponding neuron of the second group. The learning required about 100000
presentations of input vectors, with the weights being adjusted on the basis of the
accumulated gradient after each sweep.
Another example of symmetry detection was presented in [2]. In this work a
Boltzmann machine was utilized. The task was formulated as to recognize whether a
binary square image was symmetric. Different types of symmetry were considered like
horizontal, vertical or diagonal. Input data was represented by 4 × 4 and 10 × 10


binary images, corresponding to input vectors of size 16 and 100, respectively. A Boltzmann machine, trained on a set of randomly selected images of this kind, obtained 98.2% accuracy on the 4 × 4 problem and 90% accuracy on the 10 × 10 problem.

1.2 Pavlov Principle and Biologically Plausible Learning Algorithms


Since its introduction in 1986 backpropagation of errors became a dominant algorithm
for training multi-layer neural nets. However, many neuroscientists argue that it is
impossible in a real brain due to several considerations. For example, it seems unre-
alistic that axons have mechanisms to propagate back complex information needed to
calculate all corresponding derivatives.
After the pioneering work [3] of Timothy Lillicrap, many studies proposed alternative algorithms for training neural networks that were intended to be more biologically plausible. In [3] a result comparable in efficiency to deep learning by backpropagation was achieved without backpropagation of derivative-based errors.
Intuition, gained from such experiments and neurobiological studies, can be gen-
eralized. In 2015 Pavlov principle was introduced. It was formulated in [4] as follows:
PAVLOV PRINCIPLE (PP): The network of neurons, such that the strength of each
of the connection between neurons is gradually changing as a function of locally
available error signal components and activity states of the neurons connected, comes
in the process of network functioning to error-free operation.
One of the closest implementations resembling this principle (although probably developed unaware of it) was presented by Nokland [5]. In this study a random matrix of weights
was used to propagate error information to neurons of hidden layers. This architecture
was efficient to successfully solve MNIST problem.

2 The Model

2.1 Our Formulation of Symmetry Problem


In this work we investigate the symmetry problem, applying the Pavlov principle; this setting had been briefly introduced in [4]. Our formulation of the symmetry recognition task is the following.
The input vector consists of fixed even number of binary variables divided into two
subsets. Each variable in the first subset has a corresponding “pair” variable in the
second subset. The input data sample is considered symmetric if the values of variables
in each “pair” are equal to each other. If a data sample doesn’t fit to the symmetry class,
it is considered as non-symmetric, or belonging to class 0.
In our experiments we consider input vectors of size 100. In this case there are 2^100 possible different data samples, which is practically infinite. For each symmetry class there are 2^50 symmetric samples, and the number of non-symmetric samples is vastly greater. This makes including every possible data sample in a training set impossible in practice. The classifier should have a strong generalization ability to successfully solve this problem.
To generate a data sample we use the following strategy. First, the class number is
selected randomly from the given set of classes, which includes the symmetric classes and the non-
symmetric class. For a symmetry class we assign a random binary value to each
variable of the first subset and the same value to each corresponding variable of the
second subset. This guarantees the data sample to be symmetric in the chosen way. If
the data sample should belong to non-symmetric class, each variable is assigned to a
random value. Then we check the sample for symmetry. If the data sample happened to
belong to the symmetric class, we select randomly one variable and change its value to
the opposite. Since symmetric vectors never have an odd number of 0s or 1s, this procedure guarantees a non-symmetric data sample. However, such a precaution is practically unnecessary, since there is only a vanishingly small chance (of the order of 2^−50) that a randomly generated 100-digit binary vector will accidentally happen to be symmetric.
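
The following sketch implements this generation procedure for one mirror-symmetry class on 100-bit vectors (the pairing of bit i with bit i + 50 is an assumption for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, half = 100, 50

def make_sample(symmetric):
    if symmetric:
        first = rng.integers(0, 2, half)
        return np.concatenate([first, first])      # every bit equals its pair
    x = rng.integers(0, 2, n_bits)
    if np.array_equal(x[:half], x[half:]):         # accidentally symmetric (very unlikely)
        x[rng.integers(n_bits)] ^= 1               # flip one bit to break the symmetry
    return x

X = np.array([make_sample(s) for s in (True, False, True, False)])
print(X.shape, [bool(np.array_equal(r[:half], r[half:])) for r in X])
```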

2.2 Learning Procedure and Experiments


To solve the symmetry recognition problem we train a neural net applying Pavlov
principle. The network consists of input layer, one or several hidden layers and an
output layer. The usage of a hidden layer is crucial, because the activity in an individual
input unit, considered alone, provides no evidence about symmetry or non-symmetry of
the whole input vector, so simply adding up the evidence from the individual input
units is insufficient.
The interpretation of Pavlov Principle, used in this study, includes using error
signals, derived from comparison of actual output values (taken from neurons of output
layer) with desired ones, explicitly to train neurons of hidden layers. Like in [4] these
signals are weighted by randomly chosen but fixed weights. However as long as we
don’t need to compute derivatives of error function, we use binary McCulloch–Pitts-
like neurons with threshold activation function. More formally, the neurons of hidden
and output layers perform the following operation:

$$Y(t) = S\!\left(\sum_{i=1}^{N} w_i(t)\, X_i(t) - b\right) \qquad (1)$$

Here X_i(t) are the components of the binary input vector X at step t, presented at the i-th synapse, and Y(t) is the corresponding output value. Since the architecture used in our study doesn’t imply recurrent connections, the layerwise successive computation of the neurons’ outputs can be viewed as performed at the same step. S is a binary threshold activation function: it equals 1 if its argument is greater than 0, and it is 0 otherwise. w_i(t) is the weight of the input variable with index i, and b is a threshold value.
The learning rule of a hidden neuron can be formalized as follows:

$$w_i(t+1) = w_i(t) + \varepsilon\, F\!\left(\sum_{k=1}^{K} E_k(t)\, e_{k,i},\; Y(t),\; X_i(t)\right) \qquad (2)$$

Here, ε is a learning rate factor which determines the speed of weight changing. E is a K-component error vector, where K is the number of output values multiplied by 2. Each component o_j(t) of the output vector corresponds to two error components, E_2j(t) and E_{2j+1}(t). They both equal 0 if o_j(t) matches the desired value. E_2j(t) equals 1 only if o_j(t) is greater than its desired value (i.e., it equals 1 when 0 is desired), and E_{2j+1}(t) equals 1 only if o_j(t) is less than desired (i.e., it equals 0 when 1 is desired). e_{k,i} is a fixed weight associated with the k-th error component and propagated to the i-th synapse. F sets the learning rule. In most cases in this study we use the following learning formula:
$$F\!\left(\sum_{k=1}^{K} E_k(t)\, e_{k,i},\; Y(t),\; X_i(t)\right) = 4\left(\sum_{k=1}^{K} E_k(t)\, e_{k,i}\right)\bigl(Y(t) - 0.5\bigr)\bigl(X_i(t) - 0.5\bigr) \qquad (3)$$

For training output neurons we use a similar formula, but the e_{k,i} are not selected randomly. Instead we use e_{2j,i} = −1 and e_{2j+1,i} = 1 for all i, where j is the index of the corresponding output class. The other e-factors are equal to 0, so the total impact of the error is equal to the difference between the desired output value and the actual output. This makes the learning rule of output neurons similar to the delta rule for the classic perceptron. It increases the weights of inputs that are positively correlated with the desired output and decreases the weights of inputs that are negatively correlated with the desired output.
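
A compact numpy sketch of Eqs. (1)–(3) is given below. It is an assumption-based illustration, not the authors' code: the random feedback weights e_{k,i} are drawn independently for every synapse of every hidden neuron, the output layer is trained with the plain delta rule the text compares it to, and all sizes and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 100, 40, 2
K = 2 * n_out                                       # two error components per output neuron
eps, b = 0.01, 0.5

W_h = rng.normal(0.0, 0.1, (n_hid, n_in))           # hidden-layer weights (trained)
W_o = rng.normal(0.0, 0.1, (n_out, n_hid))          # output-layer weights (trained)
e_fb = rng.choice([-1.0, 1.0], (K, n_hid, n_in))    # fixed random e_{k,i} per hidden synapse

def forward(x):                                     # Eq. (1), layer by layer
    h = (W_h @ x - b > 0).astype(float)
    o = (W_o @ h - b > 0).astype(float)
    return h, o

def train_step(x, target):                          # target: 0/1 vector of length n_out
    global W_h, W_o
    h, o = forward(x)
    E = np.zeros(K)
    E[0::2] = (o > target)                          # E_{2j} = 1: output higher than desired
    E[1::2] = (o < target)                          # E_{2j+1} = 1: output lower than desired
    err = np.einsum('k,khi->hi', E, e_fb)           # sum_k E_k(t) e_{k,i}
    W_h += eps * 4.0 * err * (h - 0.5)[:, None] * (x - 0.5)[None, :]   # Eqs. (2)-(3)
    W_o += eps * np.outer(target - o, h - 0.5)      # simplified delta rule for output neurons
    return o

x = rng.integers(0, 2, n_in).astype(float)
print(train_step(x, np.array([1.0, 0.0])))
```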

[Plot: percentage of correct answers versus number of training steps (in thousands) for learning rate 0.001, learning rate 0.01, and the perceptron with fixed hidden-layer weights at learning rate 0.001.]

Fig. 1. The evolution of the percentage of correct symmetry recognition for different values of the learning rate and for the perceptron with fixed weights of the neurons in the hidden layer. The
vertical axis corresponds to percentage of correct answers. The horizontal axis corresponds to the
number of training steps (in thousands).

We tested our neural net on the symmetry detection problem with different settings.
In the primary one we used one hidden layer with 400 neurons and 2 output neurons,
corresponding to symmetric and non-symmetric classes. Each neuron was connected to
each neuron of the previous layer. In Fig. 1 we present two examples of the evolution of the average (over the last 1000 steps) percentage of correct answers for
symmetry class during training. The learning process lasted 1000000 steps in this
experiment. Since the learning process stabilizes after 500000 steps for learning rate
0.01, we reduced the number of steps to this number for next experiments. The final
average percentage of correct answers was 94.80%.
We compared the obtained results with the performance of a classic perceptron which has neurons in the hidden layer with fixed random weights. With a similar configuration
(400 neurons in one hidden layer, 1000000 training steps, learning rate 0.001) its
average obtained performance was 59.38% of correct symmetry recognition which is
slightly better than a random guess. The history of changing of percentage of correct
answers for symmetry class for perceptron with fixed weights in hidden layer is also
shown in Fig. 1 with dotted line.
Next we investigated architectures with more than one hidden layer. We tested
configurations with 1, 2, 3 and 5 hidden layers. Table 1 shows percentage of correct
answers for symmetry class, obtained after 500000 steps of training. These percentages
were measured during special test phase with fixed weights and lasted for 10000 steps.
The results were averaged over several independent runs. The obtained accuracy
decreases with the increase of the number of hidden layers. However, it was still better than that of a perceptron with fixed weights in the hidden layer.

Table 1. Comparison of performance of configurations with different number of hidden layers.


Number of hidden layers   Correct answers for symmetry class (averaged over runs)   Standard deviation   Number of runs
1                         94.80%                                                    1.41%                5
2                         82.14%                                                    2.20%                3
3                         74.45%                                                    2.37%                3
5                         62.23%                                                    7.76%                3

We also investigated the impact of the number of neurons in the hidden layer. We tested configurations with 200, 400 and 800 neurons in the hidden layer (with only one hidden layer). As can be observed from Table 2, increasing the number of neurons in the hidden layer improves the performance of the neural network.

Table 2. Comparison of performance of configurations with different number of neurons in hidden layer.

Number of neurons in hidden layer   Correct answers for symmetry class (averaged over runs)   Standard deviation   Number of runs
200                                 88.82%                                                    5.86%                3
400                                 94.80%                                                    1.41%                5
800                                 98.28%                                                    0.53%                3

We also investigated architectures where the network was not fully connected.
Instead, each neuron in the hidden layer was randomly connected to a fixed number of neurons in the previous layer. Neurons of the output layer remained connected to all neurons of the previous layer. Table 3 presents the obtained performance for different numbers of connections per neuron.

Table 3. Comparison of performance of configurations with different number of connections of neurons of hidden layer.

Number of connections per neuron   Correct answers for symmetry class (averaged over runs)   Standard deviation   Number of runs
25                                 89.48%                                                    3.69%                3
50                                 95.92%                                                    0.77%                4

The learning procedure of the experiments, mentioned above, included no weight


normalization procedure. We performed a series of separate experiments to investigate
the impact of different normalization strategies. We examined normalization by sum
(which adjusts weights to preserve the sum of weights) and normalization by sum of
squares (which adjusts weights to preserve the sum of squares of weights). The result is
presented in Table 4. These experiments were carried out with the basic configuration parameters, i.e., 400 neurons in one hidden layer. The training lasted for 500000 steps. As can be seen, normalization by preserving the sum of weights completely destroys the learning process, showing 51.32% accuracy, which is not better than a random guess. On the other hand, normalization by preserving the sum of squares managed to slightly improve the performance. In the last setting we also used a lower learning rate, since the normalization procedure leads to the appearance of weights with low absolute values, and a high learning rate can influence them too much.
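
The two normalization strategies can be illustrated as follows (a sketch under the assumption that the adjustment is a multiplicative rescaling of one neuron's weight vector; the exact procedure is not specified in the text).

```python
import numpy as np

def normalize_by_sum(w, target_sum):
    """Rescale the weights so that their sum is preserved."""
    return w * (target_sum / w.sum())

def normalize_by_sum_of_squares(w, target_ss):
    """Rescale the weights so that the sum of their squares is preserved."""
    return w * np.sqrt(target_ss / np.sum(w ** 2))

w = np.array([0.2, -0.1, 0.4, 0.3])
s, ss = w.sum(), np.sum(w ** 2)
w_new = w + 0.05 * np.array([1.0, -1.0, 0.5, 0.0])          # some learning update
print(normalize_by_sum(w_new, s).sum(),                      # sum is restored
      np.sum(normalize_by_sum_of_squares(w_new, ss) ** 2))   # sum of squares is restored
```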

Table 4. Comparison of performance with different normalization strategies.


Normalization strategy type                                         Correct answers for symmetry class   Standard deviation   Number of runs
No normalization                                                    94.80%                               1.41%                5
Normalization by preserving sum                                     50.32%                               0.69%                3
Normalization by preserving sum of squares (learning rate 0.001)    98.76%                               0.80%                3

3 Conclusion

In this work we investigated one of the possible implementations of the Pavlov Principle and applied it to the symmetry detection problem in the domain of 100-dimensional binary vectors. Our implementation uses error signals from output neurons to
adjust weights of hidden neurons by weighting them by fixed random weights. Unlike
[5] we use binary neurons with threshold activation function. Although only a tiny
fraction of all possible data samples were used to train the neural network, it managed
to obtain generalization ability and detect symmetry of previously unpresented data
samples. This proves the general plausibility of Pavlov Principle. However the per-
formance didn’t improve by adding extra hidden layers. We suggest that using some
modifications of learning algorithm will allow overcoming this problem and further
improving the performance while remaining consistent with Pavlov Principle. Even-
tually this approach will lead toward appearance of more biologically plausible learning
algorithms and utilizing them to create general artificial intelligence.

Acknowledgements. The work is financially supported by State Program of SRISA RAS


No. 0065-2019-0003 (AAA-A19-119011590090-2).

References
1. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating
errors. Nature 323, 533–534 (1986)
2. Sejnowski, T.J., Kienker, P.K., Hinton, G.E.: Learning symmetry groups with hidden units:
beyond the perceptron. Phys. D Nonlinear Phenom. 22(1–3), 260–275 (1986)
3. Lillicrap, T., Cownden, D., Tweed, D.B., Akerman C.J.: Random feedback weights support
learning in deep neural networks. arXiv:1411.0247 (2014)
4. Dunin-Barkowski, W.L., Solovyeva, K.P.: Pavlov principle and brain reverse engineering. In:
2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational
Biology, Saint Lois, Missouri, USA, 30 May–2 June 2018, vol. 37, pp. 1–5 (2018)
5. Nokland, A.: Direct feedback alignment provides learning in deep neural networks. arXiv:
1609.01596 (2016)
Bimodal Coalitions and Neural Networks

Leonid Litinskii(&) and Inna Kaganowa

Scientific Research Institute for System Analysis RAS,


Nakhimov Ave 36-1, Moscow 108840, Russia
[email protected]

Abstract. We give an account of the Axelrod–Bennett model that describes


formation of a bimodal coalition. We present its initial formalism and applica-
tions and reformulate the problem in terms of the Hopfield model. This allowed
us to analyze a system of two homogeneous groups of agents, which interact
with each other. We obtained a phase diagram describing the dependence of the
bimodal coalition on external parameter.

Keywords: Bimodal coalition · Hopfield model · Homogeneous groups

1 Introduction

In the early 1990s R. Axelrod and D. Bennett proposed an approach for a formal
description of splitting of a set of interacting agents into two competing groups [1, 2].
Their results have found applications in social, political, and management sciences. Then
Galam [3] reformulated this approach in terms of the Ising model. Afterwards he
complicated the initial scheme and proposed a number of new models (see references in
[4]). The following development of this approach led to the appearance of the
econophysics and sociophysics.
In this paper, we solve the same problem using the ideas and concepts of the
discrete dynamics of the Hopfield model. We analyze analytically an idealized case of
two equally interacting homogeneous groups of the agents and construct a phase
diagram that describes completely how the decomposition of the agents into two
groups depends on the intra-group interaction and cross-interaction between groups.
Following tradition, the decomposition of the agents into two groups will be called
a bimodal coalition.

2 Bimodal Coalition Problem

1. The original setting of the problem. We have n agents that are connected with each
other. By w_i, i = 1, ..., n, we define the weight of the i-th agent. The connections of the
agents we interpret in terms of their mutual propensity and suppose that propensities
are symmetrical:



$$p_{ij} \begin{cases} > 0, & \text{if agents } i \text{ and } j \text{ are prone to cooperate,} \\ < 0, & \text{if agents } i \text{ and } j \text{ are prone to conflict,} \end{cases} \qquad p_{ij} = p_{ji}.$$

Two lists A and Ā define a bimodal coalition C = (A, Ā) or, in other words, a decomposition into two groups. Each of these lists contains the numbers of all agents assigned to the given group:

$$A = \{i_1, i_2, \ldots, i_p\}, \qquad \bar{A} = I \setminus A, \quad \text{where } I = \{1, 2, \ldots, n\} \text{ is the full list.}$$

Each grouping C = (A, Ā) provides a proximity relation d_ij between the agents:

$$d_{ij}(C) = \begin{cases} 1, & \text{if agents } i \text{ and } j \text{ belong to the same list,} \\ 0, & \text{if agents } i \text{ and } j \text{ belong to different lists.} \end{cases}$$

Let us define the productivity of the grouping C = (A, Ā) for the i-th agent as

$$U_i(C) = \sum_{j=1}^{n} w_j\, p_{ij}\, d_{ij}(C).$$

The productivity of the grouping C for the i-th agent is maximal if all the other
agents with which the given agent is prone to cooperate belong to his group and the
group does not contain agents with which it is prone to conflict.
In the Axelrod–Bennett model it is stated that a system of agents tends to the grouping for which the weighted sum of the productivities is maximal:

$$U(C) = \sum_{i=1}^{n} w_i\, U_i(C) \to \max. \qquad (1)$$

2. Applications. The described approach was applied when analyzing composi-


tions of the belligerent coalitions during the World War II. The agents were 17
European countries. An integral index defined the weight of each country. It was
calculated as a combination of different demographic, industrial, and military
characteristics.
The mutual propensities were calculated using the data for 1936 and criteria that
included ethnic conflicts, religion, frontier incidents, political regime and so on. The
maximization of the sum U(C) led to two maxima. The global maximum corre-
sponded to the following decomposition:
A = {Britain, France, USSR, Czechoslovakia, Yugoslavia, Greece, and Denmark};
~ = {Germany, Italy, Poland, Romania, Hungary, Portugal, Finland, Latvia,
A
Lithuania, and Estonia}.
We see that only Poland found itself in the improper camp (as well as Portugal,
which was a neutral nation during the war). Let us make it clear that the block to which
the given country belonged was determined by taking into account who occupied the
country or who declared war on it.

In another paper the same authors used this method to describe alliances of producers of UNIX operating system standards. Nine companies involved in the
UNIX production were regarded as agents. They are
AT&T, Sun, Apollo, DEC, HP, Intergraph, SGI, IBM and Prime.
In the course of cumbersome calculations of the connections p_ij, some parameters
of the problem played the role of weight coefficients. By varying the parameters within
reasonable limits they discovered only a weak dependence of the result on the values of
the parameters. The authors found that there were two decompositions providing the same global maximum of the functional (1):
• {Sun, DEC, HP} and {AT&T, Apollo, Intergraph, SGI, IBM, Prime};
• {Sun, AT&T, IBM, Prime} and {DEC, HP, Apollo, Intergraph, SGI}.
The second grouping corresponded to the existing associations of the companies in
UNIX International and OPEN Software Foundation and only IBM was identified
incorrectly.
3. Ising model. In the second half of the 1990s Serge Galam recognized that it was convenient to formulate the Axelrod–Bennett model in terms of the Ising model.
Let us introduce a matrix

$$J = (J_{ij}), \qquad J_{ij} = p_{ij}\, w_i w_j\,(1 - \delta_{ij}), \qquad i, j = 1, \ldots, n,$$

where δ_ij is the Kronecker delta and the diagonal elements of the matrix J are equal to zero. To each bimodal coalition C we assign a configuration vector s = (s_1, s_2, ..., s_n):

$$C = (A, \bar{A}) \;\Leftrightarrow\; s = (s_1, s_2, \ldots, s_n): \quad s_i = 1 \text{ for } i \in A, \qquad s_i = -1 \text{ for } i \in \bar{A}.$$

Then the maximization of the sum (1) is equivalent to the determination of the state
s corresponding to the global minimum of the energy E(s):

$$E(s) = -(Js, s) = -\sum_{i,j=1}^{n} J_{ij}\, s_i s_j \to \min. \qquad (2)$$

The problem (2) is a well-known minimization problem of a quadratic form of


binary variables. This problem arises in various scientific fields.
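
The mapping just described can be illustrated by a short numpy sketch (the agent weights and propensities below are made-up numbers).

```python
import numpy as np

w = np.array([1.0, 0.8, 1.2, 0.5])                 # agent weights w_i
p = np.array([[ 0.0,  1.0, -0.5,  0.3],            # symmetric propensities p_ij
              [ 1.0,  0.0, -0.8,  0.6],
              [-0.5, -0.8,  0.0, -0.2],
              [ 0.3,  0.6, -0.2,  0.0]])

J = p * np.outer(w, w)
np.fill_diagonal(J, 0.0)                           # J_ij = p_ij w_i w_j (1 - delta_ij)

def energy(s):
    """E(s) = -(Js, s), Eq. (2)."""
    return -(s @ J @ s)

s = np.array([1, 1, -1, 1])                        # coalition with agent 3 in the other camp
print(energy(s), energy(-s))                       # E(s) = E(-s): the two-fold degeneracy
```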
4. Hopfield model. Let us analyze the described system in terms of a neural
network of the Hopfield type. As the context may require, in what follows we refer to
the binary variables s_i = ±1 as binary agents or spins. We describe the state of the system by a configuration vector s = (s_1, s_2, ..., s_n).
Let us introduce a dynamic procedure on which the Hopfield model is based. Let
s(t) be the state of the system at time t. At this moment a local field h_i(t) = Σ_{j=1}^{N} J_ij s_j(t) acts on the i-th spin. At the next moment t + 1 the state of the spin changes if its sign does not coincide with the sign of the field h_i(t), and it remains unchanged otherwise:

si ðtÞ; when si ðtÞhi ðtÞ  0
si ðt þ 1Þ ¼ , si ðt þ 1Þ ¼ signðhi ðtÞÞ: ð3Þ
si ðtÞ; when si ðtÞhi ðtÞ\0

In what follows, an unsatisfied spin is a spin whose sign does not coincide with the
sign of the field acting on it. If the state of the i-th spin changes, then its contribution to
the local fields acting on the other spins also changes. As a result, the state of some
other spins can also change etc. The evolution of the system consists of subsequent
turns of unsatisfied spins. Each step of the evolution is accompanied by a decrease of
the energy of the state, and sooner or later the system reaches a state that corresponds to
an energy minimum (it may be a local minimum). At that moment, the evolution of the
system will stop, since all the spins will be satisfied. However, according the setting of
the problem, we have to find the global minimum. For this purpose we can use
improved procedures of minimization [5, 6]. The formulation of problem (3) in terms
of neural networks allows us to illustrate the problem of bimodal coalition formation.
Concluding this section, let us note that all the energies are two-fold degenerate: E(s) = −(Js, s) = E(−s). To remove the degeneracy we need an external field.
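
The asynchronous dynamics of Eq. (3) can be sketched as follows (a toy example with a random symmetric matrix, not a specific coalition problem): unsatisfied spins are flipped one at a time, the energy decreases at every step, and the process stops at a (possibly local) minimum.

```python
import numpy as np

rng = np.random.default_rng(0)

def hopfield_descent(J, s):
    s = s.copy()
    while True:
        h = J @ s                                  # local fields h_i = sum_j J_ij s_j
        unsatisfied = np.where(s * h < 0)[0]
        if unsatisfied.size == 0:
            return s                               # every spin is satisfied: a minimum
        i = rng.choice(unsatisfied)                # flip one unsatisfied spin
        s[i] = -s[i]

n = 8
J = rng.normal(size=(n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
s0 = rng.choice([-1, 1], n)
s_min = hopfield_descent(J, s0)
print(s_min, -(s_min @ J @ s_min))                 # final state and its energy
```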

3 Homogeneous Groups of Agents

1. One homogeneous group. A homogeneous group is a group where all the agents
interact identically. In this case the interaction matrix has the form
$$J = \begin{pmatrix} 0 & a & \cdots & a \\ a & 0 & \cdots & a \\ \vdots & \vdots & \ddots & \vdots \\ a & a & \cdots & 0 \end{pmatrix}, \qquad a > 0. \qquad (4)$$

The network with such a connection matrix has only one global minimum of the energy, s⁰ = (1, 1, ..., 1), and there are no other minima. (We do not take into account the second minimum that appears due to the equality E(s) = E(−s).) In other words,
the states of all the agents are the same. It can be said that all the agents behave “as one
person”.
If we turn to Eq. (2) we see that for the system with the connection matrix (4), not a
bimodal coalition but a consolidation of all the agents into one group is profitable.
2. Two homogeneous groups. Let us examine a spin system consisting of two
homogeneous groups. We suppose that in the first group there are p agents and the
interactions between these agents are identical and equal to A. The interactions between
the remaining q agents (that constitute the second group) are also identical and equal to
C. We suppose that all the interactions between the agents from the first and second
groups are equal to B. We assume that C is positive and larger than A and B, and factor
out C. Now the connection matrix has the form

$$J = \begin{pmatrix}
0 & a & \cdots & a & b & b & \cdots & b \\
a & 0 & \cdots & a & b & b & \cdots & b \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
a & a & \cdots & 0 & b & b & \cdots & b \\
b & b & \cdots & b & 0 & 1 & \cdots & 1 \\
b & b & \cdots & b & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
b & b & \cdots & b & 1 & 1 & \cdots & 0
\end{pmatrix}, \qquad a = A/C, \quad b = B/C,$$

where the first p rows and columns correspond to the agents of the first group and the last q rows and columns to the agents of the second group.

For a neural network with such a connection matrix we will describe the depen-
dence of the set of the minima upon the parameters a, b, p, and q, where p + q = n.
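
The matrix above is easy to construct numerically, as in the following sketch (toy values of p, q, a and b).

```python
import numpy as np

def two_group_matrix(p, q, a, b):
    """Connection matrix of two homogeneous groups with C factored out."""
    n = p + q
    J = np.empty((n, n))
    J[:p, :p] = a                  # intra-group couplings of the first group
    J[p:, p:] = 1.0                # intra-group couplings of the second group
    J[:p, p:] = J[p:, :p] = b      # cross-couplings between the groups
    np.fill_diagonal(J, 0.0)
    return J

print(two_group_matrix(p=3, q=4, a=0.5, b=-0.2))
```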
One can show that if a configuration corresponds to a minimum of the energy (2) its
last q coordinates have to be identical:

$$s \equiv (s_1, s_2, \ldots, s_p, 1, 1, \ldots, 1). \qquad (5)$$

Consequently, there are at most 2^p configurations that can be minima of the functional (2), and the last q coordinates of all the configurations are identical.
Let us divide the set of 2^p configurations into classes R_k in such a way that the configurations where among the first p coordinates exactly k coordinates are equal to −1 belong to the class R_k. Since k takes the values k = 0, 1, 2, ..., p, there are p + 1 such
classes. Let us write down these classes in the explicit form indicating how many
configurations belong to a given class (see Table 1).

Table 1. Classes R_k and numbers of configurations in these classes



It turns out that for some values of the parameters all the configurations from one class R_k simultaneously provide the minimum of the functional (2). For all the configurations from R_k the energies (2) are the same. In other words, they are minima, and if the inequality 0 < k < p is fulfilled then there are no other local minima of the functional (2).
In Fig. 1 we show the partition of the (a, b)-plane into regions where one or another class of the configurations R_k provides a minimum of the functional (2). Below we interpret this diagram in terms of bimodal coalitions.

Fig. 1. The phase diagram for the problem (2).

3. Sensible interpretation. From Eq. (5) for the coordinates of the global mini-
mum it follows that the second group always acts “as one person” – the last q spins are
equal to +1.
At first, let us examine the case when the agents of the first homogeneous group are prone to cooperate with each other (a > 0). Then they also act ‘as one person’ (see the upper half-plane of the diagram). If both groups of agents are prone to cooperate with each other, that is, when b > 0, then all the agents of the first group are in the same state as the agents of the second group, i.e. the first p coordinates of the vector R_0 are equal to +1. However, if the groups conflict with each other, that is, b < 0, then all the agents of the first group are in the state opposite to the state of the agents belonging to the second group.
Let us summarize. When all the agents inside each group are prone to cooperate (a > 0), the sign of the cross-interaction defines the state of the whole system. If b > 0, and, consequently, the groups are prone to cooperate with each other, it is more profitable for them to be together. In this case, the vector R_0 provides the global minimum. If b < 0 and the groups are conflicting, it is more profitable for the groups to be separate: in this case the global minimum corresponds to R_p.
Inside the symmetric strip along the axis of ordinates, both configurations R_0 and R_p are minima simultaneously. This strip is a unique region on the plane where the
functional (2) has both global and local minima simultaneously. To the right of the axis of ordinates, where b > 0, the vectors R_0 and R_p provide the global and the local minima, respectively. On the other hand, to the left of the axis of ordinates, R_p corresponds to the global minimum and R_0 to the local minimum. It is easy to explain why such quasi-instability takes place. Indeed, let us suppose that the cross-interaction between the groups is equal to zero: b = 0. In other words, the two groups of agents are completely independent. Then the problem (2) has two equivalent solutions R_0 and R_p that correspond to the same value of energy. When |b| increases slightly, at the beginning the second configuration continues to be a minimum, but now a local minimum. When the value of |b| becomes sufficiently large, the additional local
minimum disappears.
The narrow strip along the axis of ordinates is the result of removing the random
degeneracy of the global minimum when the external parameter b = 0. It is interesting to understand whether local minima always appear for the same reason, or whether there are other mechanisms for their appearance.
Finally, let us briefly discuss the situation when the agents inside the first group
conflict with each other (a < 0). The lower half of the phase diagram shows that in this
case the first group of agents splits into two opposing groups. This conclusion is rather
reasonable. Other intrinsic interpretations are more speculative.

4 Conclusions

We have shown that in a system with a great number of interacting binary agents, the
known problem of the formation of two competing groups, or the problem of the
bimodal coalition, can be formulated in terms of neural networks of the Hopfield type.
The neural network dynamics is convenient when describing the influence of the agents
on each other. We analyzed theoretically an idealized case of interaction between two
homogeneous groups of agents. The obtained results allowed us to present a sensible
interpretation of the bimodal coalition problem.
We determined the mechanism of the formation of the local minima for the energy
functional. It is interesting to find out whether there are other possibilities for their
appearance. We think that our analysis is promising and deserves further examination.

Acknowledgement. The work was financially supported by State Program of SRISA RAS
No. 0065-2019-0003 (AAA-A19-119011590090-2).
We are grateful to Ben Rozonoer for his help in preparation of this paper.

Building Neural Network Synapses Based
on Binary Memristors

Mikhail S. Tarkov

Rzhanov Institute of Semiconductor Physics SB RAS, Novosibirsk, Russia


[email protected]

Abstract. The design of an analog multilevel memory cell based on resistors and binary memristors is proposed. This design provides a greater number of resistance levels with a smaller number of elements than the known multilevel memory devices. The cell is designed to set the synapse weights in hardware-implemented neural networks. The neuron weight vector can be represented by a crossbar of binary memristors and a set of resistors. An algorithm is proposed for mapping the neuron weights onto the proposed multilevel memory cell. The approach is illustrated by an example of constructing a neuron that partitions a set of vectors into two classes.

Keywords: Neural networks · Binary memristors · Multilevel memory cell · Crossbar · LTSPICE

1 Introduction

Hardware implementation of a neural network requires a lot of memory to store the weight matrix of a neuron layer, and such memory is expensive. The solution of this problem is simplified by using a device called a memristor (a resistor with memory) as a memory cell. The memristor was predicted theoretically in 1971 by Leon Chua [1]. The first physical realization of the memristor was demonstrated in 2008 by the Hewlett Packard laboratory as a thin-film TiO2 structure [2]. The memristor behaves like a synapse: it “remembers” the total electrical charge that has passed through it. Memory based on memristors can reach an integration density of 100 Gbit/cm², several times higher than that achievable with flash memory technology. These unique properties make the memristor a promising device for creating massively parallel neuromorphic systems.
Binary memristors realize two conductivity values. Multilevel memristors realize a set of discrete conductivity levels (the number of levels can reach tens or hundreds). Binary and multilevel memristors [3–8] are based on the filament switching mechanism and are more widespread than analog memristors, whose conductivities can be changed continuously. Materials for analog memristors are encountered much less often and require a more complex fabrication process. Multilevel memristors are also more robust to statistical fluctuations than analog memristors. The use of binary memristors to set the weighting coefficients of neural networks makes it important to create multilevel memory cells based on them.


2 Adjustable Multilevel Memory Cell Based on Binary Memristors

The memory cell consists of parallel-connected circuits, each of which contains a binary memristor and a resistor connected in series (Fig. 1). The cell output is the current, whose value is proportional to the product of the input voltage of the cell and its conductivity. Each of the parallel-connected circuits implements one binary digit of the cell. The high resistance M of the memristor corresponds to the value zero of the corresponding binary digit of the cell weight, and the low resistance m ≪ M corresponds to the value one of that digit. The resistance of the resistor is determined by the position of the corresponding binary digit of the cell weight: the lowest digit corresponds to the maximum resistance value R of the resistor, and then the resistance decreases according to the law R/2^i, i = 0, 1, ..., n − 1, where n is the number of binary digits. In contrast to [6, 7], we do not require quantizing the input signal.
The resistance R must meet the constraints m ≪ R/2^n and R ≪ M/n.

For n binary digits, the cell resistance takes 2^n values, from R/(2^0 + 2^1 + ... + 2^{n−1}) = R/(2^n − 1) (we neglect the value m) up to approximately M/n ≫ R (all memristors in the high-resistance state). For example, we get 32 values using 10 elements (5 memristors and 5 resistors with resistances R_i = R/2^i, i = 0, 1, ..., n − 1; Fig. 1). For comparison: in the cell proposed in [9], 27 resistance values were obtained using 15 elements (3 memristors and 12 resistors).

Fig. 1. Example of multilevel memory cell
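As a quick numerical illustration of the cell just described, the following sketch (our own Python illustration, not part of the paper; the function name and the simple series-parallel model are assumptions) enumerates all memristor states of a 5-digit cell with the component values used in the example of Sect. 5 (R = 20 kΩ, m = 100 Ω, M = 1000 kΩ) and confirms that 10 elements yield 2^5 = 32 distinct resistance levels.

```python
# Minimal sketch, assuming the model described above: branch i is a binary memristor
# (resistance m when "on", M when "off") in series with a resistor R / 2**i, and the
# cell conductance is the sum of the branch conductances.
from itertools import product

def cell_resistance(bits, R=20e3, m=100.0, M=1000e3):
    """Cell resistance for a tuple of binary memristor states (1 = on, 0 = off)."""
    g = sum(1.0 / ((m if b else M) + R / 2**i) for i, b in enumerate(bits))
    return 1.0 / g

n = 5
levels = sorted({cell_resistance(bits) for bits in product((0, 1), repeat=n)})
print(len(levels))            # 32 distinct levels from 5 memristors and 5 resistors
print(levels[0], levels[-1])  # roughly R/(2**n - 1) ... roughly M/n
```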

3 Specifying an Array of Weights

To calculate the neuron activation, it is required to store an array of weights. As the number of memory cells in such an array increases, only the number of binary memristors increases; the number of resistors remains the same, since they are common to all cells (Fig. 2). It can be said that these resistors form the basis of the cell resistances, and the binary memristors realize the decomposition coefficients of the cell resistance
according to this basis. The set of binary memristors forms a crossbar in which the number of rows equals the number n of decomposition digits and the number of columns equals the number of neuron inputs. In the general case, two crossbars are required to realize the weight vector: the first realizes the positive weights and the second realizes the negative weights.
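To make the row and column bookkeeping concrete, the sketch below (our own illustration; the function, its arguments, and the idealization of negligible m and very large M are assumptions, not taken from the paper) computes the output current of one neuron: an "on" memristor in row j and column i contributes a conductance of about 2^j/R to the i-th weight, and the two crossbars are subtracted to obtain signed weights.

```python
# Sketch under the stated idealization: output current of one neuron driven by a
# "positive" and a "negative" binary-memristor crossbar sharing the resistors R / 2**j.
import numpy as np

def neuron_current(v_in, K_pos, K_neg, R=20e3):
    """v_in: input voltages (length L); K_pos, K_neg: n x L binary state matrices."""
    n = K_pos.shape[0]
    g_row = (2.0 ** np.arange(n)) / R     # conductance added by an "on" memristor in row j
    g_pos = g_row @ K_pos                 # per-input conductance of the positive crossbar
    g_neg = g_row @ K_neg                 # per-input conductance of the negative crossbar
    return float(v_in @ (g_pos - g_neg))  # signed synaptic current

# Encoding of the weight vector (3, 1, -1) from the example of Sect. 5 with n = 2 digits:
K_pos = np.array([[1, 1, 0],              # row of the 2**0 digit
                  [1, 0, 0]])             # row of the 2**1 digit -> columns encode 3, 1, 0
K_neg = np.array([[0, 0, 1],
                  [0, 0, 0]])             # columns encode 0, 0, 1
print(neuron_current(np.array([0.3, 0.3, 0.3]), K_pos, K_neg))  # positive: first class
```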

Fig. 2. Construction of a neuron weight array

Fig. 3. Circuit for setting the memristor resistance

In Fig. 2, the circuits designed to set the memristor resistances of the crossbar are not shown. The corresponding circuit is presented in Fig. 3. It allows us to set an arbitrary crossbar memristor to the minimum resistance m or the maximum resistance M, depending on the sign of the voltage that is fed to the input In and significantly exceeds the binary memristor voltage threshold. For setting the memristor resistance, the transistor T is opened by the voltage source V. In the crossbar operating mode, this transistor is closed.

4 The Weight Array Binary Digits Calculation

Suppose that the neural network is trained, i.e. the network weights have been calculated. To implement the neuron weights on the basis of the multilevel memory cell, we propose the following algorithm (a code sketch is given after the list).
1. Among the neuron weight coefficients w_1, w_2, ..., w_L (L is the number of weights), choose a coefficient w_min ≠ 0 such that |w_min| ≤ |w_i| for all i = 1, ..., L. Put the coefficient w_min in correspondence to the resistor with the minimum conductivity 1/R, where R ≫ m and R ≪ M.
2. Normalize the weights: w_i ← w_i/|w_min|, i = 1, ..., L.
3. Set the number of binary digits n = 1.
4. For the normalized weights w_i, i = 1, 2, ..., L, select a set of binary coefficients k_ji ∈ {0, 1} providing a minimum of the sum

$$ S_n = \sum_{i=1}^{L} \Bigl( |w_i| - \sum_{j=0}^{n-1} k_{ji}\, 2^{j} \Bigr)^{2}. $$

5. If S_n > ε, where ε is the permissible error value, increase the number of digits (n ← n + 1) and go to step 4.
6. End.
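The Python sketch below is one possible reading of this algorithm (illustrative only; the function name and the stopping parameters eps and n_max are our choices). In step 4 the coefficients k_ji are obtained by rounding each normalized |w_i| to the nearest integer representable with n bits, which minimizes every term of S_n and hence S_n itself; the sign of each weight is kept separately, since negative weights go to the second crossbar.

```python
# Sketch of the weight-quantization procedure of this section (illustrative code).
import numpy as np

def quantize_weights(w, eps=1e-6, n_max=16):
    w = np.asarray(w, dtype=float)
    w_min = np.min(np.abs(w[w != 0]))          # step 1: smallest non-zero |w_i|
    w_norm = np.abs(w) / w_min                 # step 2: normalize the weights
    for n in range(1, n_max + 1):              # steps 3 and 5: grow the digit count
        ints = np.clip(np.rint(w_norm), 0, 2**n - 1).astype(int)
        k = np.array([[(v >> j) & 1 for j in range(n)] for v in ints])  # k_ji, j = 0..n-1
        s_n = np.sum((w_norm - k @ (2.0 ** np.arange(n))) ** 2)         # step 4
        if s_n <= eps:                         # step 5: stop when the error is small enough
            return k, np.sign(w), n            # signs select the positive/negative crossbar
    raise ValueError("required accuracy not reached within n_max digits")

k, signs, n = quantize_weights([3, 1, -1])     # weights of the example in Sect. 5
print(n)   # 2 binary digits suffice
print(k)   # rows give (k_0, k_1) per weight: 3 -> (1, 1), 1 -> (1, 0), 1 -> (1, 0)
```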

5 A Neuron Example with Synapses Based on Binary Memristors

Consider an example of developing a neuron with a crossbar on binary memristors. Let the vectors x_1 = (1, 1, 1), x_2 = (1, 1, 0), x_3 = (1, 0, 0) belong to the first class (the required neuron output signal is d_i = 1, i = 1, 2, 3), and the vectors x_4 = (0, 0, 0), x_5 = (0, 0, 1), x_6 = (0, 1, 1) belong to the second class (d_i = −1, i = 4, 5, 6). Then, according to the mass-center method, the neuron weight vector is equal to
$$ w = \sum_{i=1}^{3} x_i - \sum_{i=4}^{6} x_i = (3,\, 1,\, -1). \qquad (1) $$

We assume that the neuron activation function is
$$ f(a) = \begin{cases} 1, & a > 0, \\ -1, & a \le 0, \end{cases} \qquad a = (w, x), \qquad (2) $$
where x is the neuron input vector. The activation function (2) can be implemented on the basis of an operational amplifier operating in comparator mode.
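Before turning to the circuit values, the arithmetic of (1) and (2) is easy to verify with a few lines of code (a plain dot-product model written for this illustration; circuit-level details such as the bias source mentioned below are ignored).

```python
# Numerical check of Eqs. (1) and (2) for the six example vectors (illustrative only).
import numpy as np

X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0],   # first class  (d_i =  1)
              [0, 0, 0], [0, 0, 1], [0, 1, 1]])  # second class (d_i = -1)
d = np.array([1, 1, 1, -1, -1, -1])

w = X[:3].sum(axis=0) - X[3:].sum(axis=0)        # mass-center rule, Eq. (1)
print(w)                                         # [ 3  1 -1]

f = np.where(X @ w > 0, 1, -1)                   # activation (2): the sign of (w, x)
print(np.array_equal(f, d))                      # True: all six vectors classified correctly
```

Note that x_4 and x_6 both give (w, x) = 0, i.e. they fall exactly on the decision boundary; rule (2) assigns this case to the second class, which is presumably what the small negative bias discussed below ensures in the hardware realization.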

The weight w_1 = 3 (see (1)) can be represented as the sum w_1 = 2 + 1. This means that this weight can be given by the conductivity g_1 = 2/R + 1/R of two parallel resistors with resistances R/2 and R, respectively.
The weight w_2 = 1 can be set by the conductivity g_2 = 1/R, and the weight w_3 = −1 is set by the conductivity g_3 = 1/R of a resistor connected to the negative input of the operational amplifier. Assuming that R = 20 kΩ, m = 100 Ω, and M = 1000 kΩ, we get the neuron circuit shown in Fig. 4.

Fig. 4. A neuron example based on binary memristors

For clarity, only memristors in the “on” state are shown here, that is, memristors with the minimal resistance m = 100 Ω. Table 1 shows the results of the experiment in the LTSPICE modeling system [10]. The output voltage value 3.2 V means that the vectors x_1, x_2, x_3 belong to the first class, and the value −3.2 V means that the vectors x_4, x_5, x_6 belong to the second class (supply voltage V = 5 V).

Table 1. Results of the experiment in LTSPICE

Input             x1     x2     x3     x4     x5     x6
Output (volts)    3.2    3.2    3.2    −3.2   −3.2   −3.2

In order for the operational amplifier to implement the activation function (2) at the input x_4 = (0, 0, 0), a small negative bias based on the source V2 is added to the circuit. The input value 0 corresponds to zero voltage, and the input value 1 corresponds to the voltage 0.3 V, which does not change the memristor resistance.

6 Conclusion

An analog multilevel memory cell design based on resistors and binary memristors has been proposed. This design provides a greater number of resistance levels with a smaller number of elements than the cell proposed previously. The cell is designed to set the neuron synapse weights in hardware-implemented neural networks.
The neuron weights can be represented by a crossbar of binary memristors and a set of resistors. The number of resistors used to implement the neuron weight vector does not depend on the number of weights.
An algorithm has been proposed for mapping the neuron weights onto the multilevel memory cells with binary memristors. The approach is illustrated by an example of constructing a neuron that partitions a set of vector patterns into two classes. The example is implemented in the LTSPICE simulation environment.

References
1. Chua, L.: Memristor – the missing circuit element. IEEE Trans. Circ. Theor. 18, 507–519
(1971)
2. Strukov, D.B., Snider, G.S., Stewart, D.R., Williams, R.S.: The missing memristor found.
Nature 453, 80–83 (2008)
3. He, W., Sun, H., Zhou, Y., Lu, K., Xue, K., Miao, X.: Customized binary and multi-level
HfO2−x-based memristors tuned by oxidation conditions. Sci. Rep. 7, 10070 (2017)
4. Yu, S., Gao, B., Fang, Z., Yu, H., Kang, J., Wong, H.-S.P.: A low energy oxide-based
electronic synaptic device for neuromorphic visual systems with tolerance to device
variation. Adv. Mater. 25, 1774–1779 (2013)
5. Tarkov, M.S.: Crossbar-based Hamming associative memory with binary memristors. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds.) Advances in Neural Networks – ISNN 2018. Lecture Notes in Computer Science, vol. 10878. Springer, Cham (2018). https://link.springer.com/chapter/10.1007/978-3-319-92537-0_44. Accessed 25 Apr 2019
6. Truong, S.N., Ham, S.-J., Min, K.-S.: Neuromorphic crossbar circuit with nanoscale
filamentary-switching binary memristors for speech recognition. Nanoscale Res. Lett. 9
(629), 1–9 (2014)
7. Nguyen, T.V., Vo, M.-H.: New binary memristor crossbar architecture based neural
networks for speech recognition. Int. J. Eng. Sci. Invent. 5(5), 1–7 (2016)
8. Yakopcic, C., Taha, T.M., Subramanyam, G., Pino, R.E.: Memristor SPICE model and crossbar simulation based on devices with nanosecond switching time. In: Proceedings of the International Joint Conference on Neural Networks, Dallas, Texas, USA, 4–9 August 2013, pp. 158–160. IEEE (2013). https://ieeexplore.ieee.org/abstract/document/6706773. Accessed 25 Apr 2019
9. Irmanova, A., James, A.P.: Neuron inspired data encoding memristive multi-level memory
cell. Analog Integr. Circ. Sign. Process. 95, 429–434 (2018)
10. LTspice XVII. http://www.linear.com/designtools/software/#LTspice
