Advances in Neural Computation, Machine Learning, and Cognitive Research 2020
Boris Kryzhanovsky
Witali Dunin-Barkowski
Vladimir Redko
Yury Tiumentsev Editors
Advances in Neural
Computation, Machine
Learning, and
Cognitive Research III
Selected Papers from the XXI
International Conference on
Neuroinformatics, October 7–11, 2019,
Dolgoprudny, Moscow Region, Russia
Studies in Computational Intelligence
Volume 856
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
The books of this series are submitted for indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar, and SpringerLink.
Editors
Boris Kryzhanovsky
Scientific Research Institute for System Analysis of Russian Academy of Sciences
Moscow, Russia

Witali Dunin-Barkowski
Scientific Research Institute for System Analysis of Russian Academy of Sciences
Moscow, Russia
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Boris Kryzhanovsky
Witali Dunin-Barkowski
Vladimir Redko
Yury Tiumentsev
Organization
Editorial Board
Boris Kryzhanovsky Scientific Research Institute for System Analysis
of Russian Academy of Sciences
Witali Dunin-Barkowski Scientific Research Institute for System Analysis
of Russian Academy of Sciences
Vladimir Red’ko Scientific Research Institute for System Analysis
of Russian Academy of Sciences
Yury Tiumentsev Moscow Aviation Institute
(National Research University)
Advisory Board
Physical Address:
KEDRI
Auckland University of Technology
AUT Tower, Level 7
Corner Rutland and Wakefield Street
Auckland
Postal Address:
KEDRI
Auckland University of Technology
Private Bag 92006
Auckland 1142
New Zealand
Prof. Jun Wang, PhD, FIEEE, FIAPR
Chair Professor of Computational Intelligence
Department of Computer Science
City University of Hong Kong
Kowloon Tong, Kowloon, Hong Kong
+852 34429701 (tel.)
+852-34420503 (fax)
[email protected]
Co-chairs
Program Committee
Invited Papers
Deep Learning a Single Photo Voxel Model Prediction from Real
and Synthetic Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Vladimir V. Kniaz, Peter V. Moshkantsev, and Vladimir A. Mizginov
Tensor Train Neural Networks in Retail Operations . . . . . . . . . . . . . . . 17
Serge A. Terekhov
Semi-empirical Neural Network Based Modeling and Identification
of Controlled Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Yury Tiumentsev and Mikhail Egorchev
Artificial Intelligence
Photovoltaic System Control Model on the Basis of a Modified
Fuzzy Neural Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Ekaterina A. Engel and Nikita E. Engel
Impact of Assistive Control on Operator Behavior Under High
Operational Load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Mikhail Kopeliovich, Evgeny Kozubenko, Mikhail Kashcheev,
Dmitry Shaposhnikov, and Mikhail Petrushan
Hierarchical Actor-Critic with Hindsight for Mobile Robot
with Continuous State Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Aleksey Staroverov and Aleksandr I. Panov
The Hybrid Intelligent Information System for Music Classification . . . 71
Aleksandr Stikharnyi, Alexey Orekhov, Ark Andreev,
and Yuriy Gapanyuk
The Hybrid Intelligent Information System for Poems Generation . . . . 78
Maria Taran, Georgiy Revunkov, and Yuriy Gapanyuk
Deep Learning
The Simple Approach to Multi-label Image Classification Using
Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Yuriy S. Fedorenko
Application of Deep Neural Network for the Vision System
of Mobile Service Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Nikolay Filatov, Vladislav Vlasenko, Ivan Fomin,
and Aleksandr Bakhshiev
Research on Convolutional Neural Network for Object
Classification in Outdoor Video Surveillance System . . . . . . . . . . . . . . . 221
I. S. Fomin and A. V. Bakhshiev
Post-training Quantization of Deep Neural Network Weights . . . . . . . . 230
E. M. Khayrov, M. Yu. Malsagov, and I. M. Karandashev
Deep-Learning Approach for McIntosh-Based Classification
of Solar Active Regions Using HMI and MDI Images . . . . . . . . . . . . . . 239
Irina Knyazeva, Andrey Rybintsev, Timur Ohinko,
and Nikolay Makarenko
Deep Learning for ECG Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Viktor Moskalenko, Nikolai Zolotykh, and Grigory Osipov
Competitive Maximization of Neuronal Activity in Convolutional
Recurrent Spiking Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Dmitry Nekhaev and Vyacheslav Demin
A Method of Choosing a Pre-trained Convolutional Neural
Network for Transfer Learning in Image Classification Problems . . . . . 263
Alexander G. Trofimov and Anastasia A. Bogatyreva
The Usage of Grayscale or Color Images for Facial Expression
Recognition with Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 271
Dmitry A. Yudin, Alexandr V. Dolzhenko, and Ekaterina O. Kapustina
1 Introduction
elements that represent slices of the camera's frustum. Each fruxel is aligned with a pixel of the input color image (see Fig. 1). Fruxel models facilitate robust training of a view-centered model, as the contour alignment between the input image and the fruxel model is preserved.
To the best of our knowledge, there are no results in the literature regarding view-centered voxel model dataset generation using synthetic images and 3D modeling. In this paper, we explore the impact of synthetic data on the performance of a view-centered model. We use a recently proposed generative adversarial model, Z-GAN [24], as a starting point for our research. We prepared an extensive SyntheticVoxels dataset with 2k synthetic images of three object classes and corresponding ground-truth fruxel models. We made our dataset publicly available. We compare the performance of the Z-GAN model trained on real, synthetic, and mixed data.
The results of joint training on the synthetic and real data are encouraging and show that synthetic data allows the model to generalize to previously unseen objects. The developed view-centered dataset generation technique allows modeling of challenging 3D object configurations and traffic situations that cannot be reconstructed online using laser scanning or similar approaches.
2 Related Work
Generative Adversarial Networks. The development of a new type of neural networks known as Generative Adversarial Networks (GANs) [14] made it possible to provide a mapping from a random noise vector to a domain of desired outputs (e.g., images, voxel models). GANs have received a lot of scholarly attention in recent years. These networks provide inspiring results in such tasks as image-to-image translation [20] and voxel model generation [42].
Methods that leverage a latent space for 3D shape synthesis were developed recently [5,13,42]. Wu et al. proposed a GAN model [42] for voxel model generation (3D-GAN). It made it possible to predict models with a resolution of 64 × 64 × 64 elements from a randomly sampled noise vector. The developed method was used for single-image 3D reconstruction using an approach proposed in [13]. Although 3D-GAN increased the number of elements in the model compared to [13], the generalization ability of this method was low, especially for previously unseen objects.
3 Method
The aim of the present research is to compare the performance of a single photo voxel model prediction method trained on synthetic, real, and mixed data. In our research we use a generative adversarial network, Z-GAN [24], that performs color image-to-voxel model translation. The Z-GAN model uses a special kind of voxel model that is aligned with the input image.
While a depth map presents distances only to the object surface from a given viewpoint, a voxel model includes information about the entire 3D scene. The proposed frustum voxel model combines features of a depth map and a voxel model. We use a hypothesis made by [41] as the starting point for our research. To provide the aligned voxel model, we combine the depth map representation with a voxel grid. We term the resulting 3D model a Frustum Voxel model (Fruxel model).
Fig. 2. Architecture of the Z-GAN generator: a chain of deconv3D layers with 4×4×4 and 2×4×4 kernels expands the latent code through feature maps of 1×1×1×1024, 2×2×2×1024, 4×4×4×1024, 8×8×8×1024, 16×16×16×1024, 32×32×32×512, 64×64×64×256, and 128×128×128×128 up to a 128×128×128 fruxel output.
We use the pix2pix [20] framework as a base to develop our Z-GAN model. We keep the encoder part of the generator unchanged. We replace the 2D convolution layers with 3D deconvolution layers to encode the correlation between neighboring slices along the Z-axis.
We keep the skip connections between the layers of the same depth that
were proposed in the U-Net model [34]. We believe that skip connections help
to transfer high-frequency components of the input image to the high-frequency
components of the 3D shape.
Fig. 3. Synthetic dataset generation technique: (a) virtual camera, (b) slice of
fruxel model, (c) cutting plane, (d) low-poly 3D model, (e) synthetic color image.
We use d copies of each channel of F2D to fill the third dimension of F3D. We term this operation “copy inflate”. The architecture of the generator is presented in Fig. 2.
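As an illustration, the “copy inflate” operation can be sketched in NumPy; the (channels, height, width) layout of F2D is an assumption, since this excerpt does not specify the actual tensor layout:

```python
import numpy as np

def copy_inflate(f2d: np.ndarray, d: int) -> np.ndarray:
    """Replicate a 2D feature map d times along a new depth axis,
    turning (C, H, W) into (C, d, H, W)."""
    return np.repeat(f2d[:, np.newaxis, :, :], d, axis=1)

f2d = np.arange(12, dtype=float).reshape(3, 2, 2)  # 3 channels of 2x2 (toy sizes)
f3d = copy_inflate(f2d, d=4)
print(f3d.shape)  # (3, 4, 2, 2)
```

Every depth slice of the result equals the original 2D map, which is what allows the 3D decoder to start from image-aligned features.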
Fig. 4. Examples of color images and corresponding fruxel models from our SyntheticVoxels dataset. Fruxel models are presented as depth maps in pseudo-colors.
cutting plane movement using the Blender Python API. We use an additional ground plane to provide realistic object shadows. We render the plane with shadows separately and use alpha compositing to obtain the final synthetic image.
SyntheticVoxels Dataset. Examples of synthetic images with ground truth
fruxel models from our SyntheticVoxels dataset are presented in Figs. 4 and 5.
The dataset includes images and fruxel models for four object classes: car, truck,
off-road vehicle, and van.
4 Experiments
4.1 Network Training
Our Z-GAN framework was trained on the VoxelCity [24] and SyntheticVoxels datasets using the PyTorch library [29]. We use independent test splits of the SyntheticVoxels and VoxelCity datasets for evaluation, with fruxel model parameters {zn = 2, zf = 12, d = 128, α = 40◦}. The training was performed on an NVIDIA 1080 Ti GPU and took 20 hours for the whole framework. For network optimization, we use minibatch stochastic gradient descent with the Adam solver. We set the learning rate to 0.0002 with momentum parameters β1 = 0.5, β2 = 0.999, similar to [20].
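As an aside, a single generic Adam update with these hyperparameters can be sketched in NumPy. This is the standard Adam rule, not the authors' training code; the parameter vector and gradient below are purely illustrative:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update with lr = 0.0002, beta1 = 0.5, beta2 = 0.999."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction, step t >= 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
grad = np.array([1.0, -1.0, 0.5])
theta, m, v = adam_step(theta, grad, m, v, t=1)
```

After bias correction, the very first step moves each parameter by roughly lr in the direction opposite to the gradient sign, regardless of gradient magnitude.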
Fig. 5. Examples of color images (classes: car and van) and corresponding fruxel models from our SyntheticVoxels dataset. Fruxel models are presented as depth maps in pseudo-colors.
Fig. 7. Qualitative evaluation on real images from VoxelCity dataset. Fruxel models
are presented as depth maps in pseudo-colors.
Table 1. IoU metric for different object classes for the Z-GAN model trained on real, synthetic, and mixed data.
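For reference, IoU over binary voxel/fruxel grids can be computed with the standard definition (a sketch; the authors' exact evaluation code is not shown in this excerpt):

```python
import numpy as np

def voxel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union of two binary voxel grids of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both grids empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum()) / float(union)

a = np.zeros((4, 4, 4), dtype=bool); a[:2] = True   # 32 occupied voxels
b = np.zeros((4, 4, 4), dtype=bool); b[1:3] = True  # 32 voxels, 16 shared with a
print(voxel_iou(a, b))  # 16 / 48 = 0.333...
```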
5 Conclusions
Acknowledgments. The reported study was funded by Russian Foundation for Basic
Research (RFBR) according to the project No 17-29-04410, and by the Russian Science
Foundation (RSF) according to the research project No 19-11-11008.
References
1. Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.:
Pose guided RGBD feature learning for 3d object pose estimation. In: IEEE Inter-
national Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October
2017, pp. 3876–3884 (2017). https://doi.org/10.1109/ICCV.2017.416
2. Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.K.:
Pose guided RGBD feature learning for 3D object pose estimation. In: The IEEE
International Conference on Computer Vision (ICCV) (2017)
3. Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S.,
Rother, C.: DSAC - differentiable RANSAC for camera localization. In: The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
4. Brachmann, E., Rother, C.: Learning less is more - 6d camera localization via
3d surface regression. In: The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (2018)
5. Brock, A., Lim, T., Ritchie, J., Weston, N.: Generative and discriminative voxel
modeling with convolutional neural networks, pp. 1–9 (2016). https://nips.cc/
Conferences/2016. Workshop contribution; Neural Information Processing Con-
ference : 3D Deep Learning, NIPS, 05–12 Dec 2016
6. Chang, A.X., Funkhouser, T.A., Guibas, L.J., Hanrahan, P., Huang, Q.X., Li, Z.,
Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., Yu, F.: Shapenet: an
information-rich 3d model repository (2015). CoRR arXiv:abs/1512.03012
7. Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3d-r2n2: a unified approach
for single and multi-view 3d object reconstruction. In: Proceedings of the European
Conference on Computer Vision (ECCV) (2016)
8. Doumanoglou, A., Kouskouridas, R., Malassiotis, S., Kim, T.: Recovering 6d object
pose and predicting next-best-view in the crowd. In: 2016 IEEE Conference on
Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA,
27–30 June 2016, pp. 3583–3592 (2016). https://doi.org/10.1109/CVPR.2016.390
9. Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., Steger, C.: Introducing mvtec
itodd - a dataset for 3d object recognition in industry. In: The IEEE International
Conference on Computer Vision (ICCV) Workshops (2017)
10. El-Hakim, S.: A flexible approach to 3d reconstruction from single images. In: ACM
SIGGRAPH, vol. 1, pp. 12–17 (2001)
11. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The
pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338
(2009)
12. Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of
unobserved voxels from a single depth image. In: The IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR) (2016)
13. Girdhar, R., Fouhey, D.F., Rodriguez, M., Gupta, A.: Learning a predictable and
generative vector representation for objects, chap. 34, pp. 702–722. Springer, Cham
(2016)
14. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural
Information Processing Systems, pp. 2672–2680 (2014)
15. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab,
N.: Model based training, detection and pose estimation of texture-less 3d objects
in heavily cluttered scenes. In: Asian Conference on Computer Vision, pp. 548–562.
Springer, Heidelberg (2012)
16. Hodaň, T., Haluza, P., Obdržálek, Š., Matas, J., Lourakis, M., Zabulis, X.: T-
LESS: an RGB-D dataset for 6d pose estimation of texture-less objects. In: IEEE
Winter Conference on Applications of Computer Vision (WACV) (2017)
17. Hodan, T., Haluza, P., Obdrzálek, S., Matas, J., Lourakis, M.I.A., Zabulis, X.:
T-LESS: an RGB-D dataset for 6d pose estimation of texture-less objects. In: 2017
IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa
Rosa, CA, USA, 24–31 March 2017, pp. 880–888 (2017). https://doi.org/10.1109/
WACV.2017.103
18. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6d object pose estimation.
In: European Conference on Computer Vision Workshops (ECCVW) (2016)
19. Huang, Q., Wang, H., Koltun, V.: Single-view reconstruction via joint analysis of
image and shape collections. ACM Trans. Graph. 34(4), 87:1–87:10 (2015)
20. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with con-
ditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 5967–5976. IEEE (2017)
21. Kniaz, V.V., Remondino, F., Knyaz, V.A.: Generative adversarial networks
for single photo 3d reconstruction. ISPRS - International Archives of the
Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-
2/W9, 403–408 (2019). https://doi.org/10.5194/isprs-archives-XLII-2-W9-403-
2019. https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-2-
W9/403/2019/
22. Knyaz, V.: Deep learning performance for digital terrain model generation. In:
Proceedings SPIE Image and Signal Processing for Remote Sensing XXIV, vol.
10789, p. 107890X (2018). https://doi.org/10.1117/12.2325768
23. Knyaz, V.A., Chibunichev, A.G.: Photogrammetric techniques for road surface
analysis. ISPRS - Int. Arch. Photogram. Remote Sens. Spatial Inf. Sci. XLI(B5),
515–520 (2016)
24. Knyaz, V.A., Kniaz, V.V., Remondino, F.: Image-to-voxel model translation with
conditional adversarial networks. In: Leal-Taixé, L., Roth, S. (eds.) Computer
Vision - ECCV 2018 Workshops, pp. 601–618. Springer, Cham (2019)
25. Knyaz, V.A., Zheltov, S.Y.: Accuracy evaluation of structure from motion surface
3D reconstruction. In: Proceedings SPIE Videometrics, Range Imaging, and Appli-
cations XIV, vol. 10332, p. 103320 (2017). https://doi.org/10.1117/12.2272021
26. Krull, A., Brachmann, E., Nowozin, S., Michel, F., Shotton, J., Rother, C.:
Poseagent: budget-constrained 6d object pose estimation via reinforcement learn-
ing. In: The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (2017)
27. Lim, J.J., Pirsiavash, H., Torralba, A.: Parsing IKEA objects: fine pose estimation.
In: Proceedings of the IEEE International Conference on Computer Vision ICCV
(2013)
28. Ma, M., Marturi, N., Li, Y., Leonardis, A., Stolkin, R.: Region-sequence based
six-stream CNN features for general and fine-grained human action recognition in
videos. Pattern Recogn. 76, 506–521 (2017)
29. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z.,
Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
30. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method
for predicting the 3d poses of challenging objects without using depth. In: IEEE
International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29
October 2017, pp. 3848–3856 (2017). https://doi.org/10.1109/ICCV.2017.413
31. Remondino, F., El-Hakim, S.: Image-based 3D modelling: a review. Photogram.
Rec. 21(115), 269–291 (2006)
32. Remondino, F., Roditakis, A.: Human figure reconstruction and modeling from
single image or monocular video sequence. In: Fourth International Conference on
3-D Digital Imaging and Modeling, 2003 (3DIM 2003), pp. 116–123. IEEE (2003)
33. Richter, S.R., Roth, S.: Matryoshka networks: predicting 3D geometry via nested
shape layers. arXiv.org (2018)
34. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedi-
cal image segmentation. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
35. Shin, D., Fowlkes, C., Hoiem, D.: Pixels, voxels, and views: a study of shape rep-
resentations for single view 3d object shape prediction. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) (2018)
36. Sock, J., Kim, K.I., Sahin, C., Kim, T.K.: Multi-task deep networks for depth-based
6D object pose and joint registration in crowd scenarios. arXiv.org (2018)
37. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene
completion from a single depth image. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (2017)
38. Sun, X., Wu, J., Zhang, X., Zhang, Z., Zhang, C., Xue, T., Tenenbaum, J.B.,
Freeman, W.T.: Pix3d: dataset and methods for single-image 3d shape modeling.
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
39. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Multi-view 3D Models from single
images with a convolutional network. arXiv.org (2015)
40. Tejani, A., Kouskouridas, R., Doumanoglou, A., Tang, D., Kim, T.: Latent-class
hough forests for 6 DoF object pose estimation. IEEE Trans. Pattern Anal. Mach.
Intell. 40(1), 119–132 (2018). https://doi.org/10.1109/TPAMI.2017.2665623
41. Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, W.T., Tenenbaum, J.B.: MarrNet:
3D shape reconstruction via 2.5D sketches. arXiv.org (2017)
42. Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J.: Learning a probabilistic
latent space of object shapes via 3D generative-adversarial modeling. In: Advances
in Neural Information Processing Systems, pp. 82–90 (2016)
43. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D ShapeNets: a
deep representation for volumetric shapes. In: 2013 IEEE Conference on Computer
Vision and Pattern Recognition, Princeton University, Princeton, United States,
pp. 1912–1920. IEEE (2015)
44. Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3d object
detection in the wild. In: IEEE Winter Conference on Applications of Computer
Vision (WACV) (2014)
45. Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learn-
ing single-view 3d object reconstruction without 3d supervision. papers.nips.cc
(2016)
46. Yang, B., Rosa, S., Markham, A., Trigoni, N., Wen, H.: 3D object dense recon-
struction from a single depth view. arXiv preprint arXiv:1802.00411 (2018)
47. Yang, B., Wen, H., Wang, S., Clark, R., Markham, A., Trigoni, N.: 3D object
reconstruction from a single depth view with adversarial learning. In: The IEEE
International Conference on Computer Vision (ICCV) Workshops (2017)
48. Zheng, B., Zhao, Y., Yu, J.C., Ikeuchi, K., Zhu, S.C.: Beyond point clouds: scene
understanding by reasoning geometry and physics. In: The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) (2013)
Tensor Train Neural Networks in Retail
Operations
Serge A. Terekhov(B)
1 Introduction
Stable statistical estimation of performance indicators and activity responses is critical for control tasks in modern retail operations. Corporate data is not only very noisy because of the intrinsic stochasticity of market processes, but observations are also subject to truncation and censoring, e.g. due to endogenous control decisions and stock availability. Decision makers need truthful, disturbance-free measures both for the operations (such as sales) under the current conditions and for the potential value of the business in some new or alternative contexts. In the case of retail network operations, a living example is the estimation of prospective sales of a certain commodity in alternative stores, where it has not been offered before.
This paper addresses two important classes of retail operations: the process of sales over the retail network, and the optimization of active decisions, such as pricing policy, marketing actions, and discounts. Optimal sales require control of the stock distribution, while actions need a valuable mix of their parameters.
The resulting value of an operation depends on several context dimensions, which will be treated as exogenous factors. For example, the estimated intensity of the buyer flow is identified at a certain location (retail store), for a particular item to be sold, and at a certain period of time. These discrete context variables are
c Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 17–24, 2020.
https://doi.org/10.1007/978-3-030-30425-6_2
2 Formulation
Consider the basic operation of daily sales of a commodities portfolio in large retail network stores over an extended period. The collected data are counts of items sold for each commodity, at each store, and for every business day. Each store sells just a fraction of the whole list of available commodity types, limited both physically by the store capacity and by the marketing plan. The practical problem is to estimate the intensity of sales of the total set of portfolio items over the whole set of stores.
Daily sales in a particular context are usually limited by the availability of stock; thus some of the observed counts are censored by truncation. The probability of observing a ≥ 0 counts at the end of a time period t, given an initial available stock r > 0, is defined by a truncated Poisson distribution:
$$P(a \mid r, \beta, t) = \frac{(\beta t)^a}{a!} \exp(-\beta t), \qquad a = 1, 2, \ldots, r-1$$
$$P(a = 0 \mid r, \beta, t) = \exp(-\beta t)$$
$$P(a = r \mid r, \beta, t) = 1 - \sum_{k=0}^{r-1} P(k \mid r, \beta, t)$$
where β is the intensity of the Poisson flow. This distribution can be easily derived from the general queueing birth-death process [16].
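This censored distribution can be sketched directly in Python, assuming the standard Poisson pmf for counts below the stock level r:

```python
import math

def truncated_poisson_pmf(a: int, r: int, beta: float, t: float) -> float:
    """P(a | r, beta, t): Poisson counts with all mass above r-1 piled at a = r."""
    lam = beta * t
    if a < 0 or a > r:
        return 0.0
    if a < r:  # covers a = 0 too, since lam**0 / 0! = 1
        return lam ** a / math.factorial(a) * math.exp(-lam)
    # a == r: stock sold out, so this absorbs the remaining probability mass
    return 1.0 - sum(truncated_poisson_pmf(k, r, beta, t) for k in range(r))

probs = [truncated_poisson_pmf(a, r=3, beta=0.5, t=2.0) for a in range(4)]
print(sum(probs))  # sums to 1 by construction
```

The piling of residual mass at a = r is exactly the censoring effect: observing sales equal to the initial stock says only that demand was at least that large.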
The observations can be represented as a tensor with d = 3 modes for the time periods, the set of stores, and the portfolio items. To simplify the notation, let us introduce the tensor enumeration index s = (i1, i2, .., id). The set of index combinations of all available observations in the data set A = {A(s)} is denoted by S. Then each observation is independently generated from the distribution with an individual unknown parameter β = β(s).
Fig. 1. Schematic representation of Tensor Train Neural Network assembly. For each
set of tensor indices (filled), the linked chain of corresponding neural layers is composed
to compute the output β̂.
Obtaining stable estimates β̂ of the complete tensor from sparse, noisy, and censored data is a challenging task [1], in some sense similar to the tensor design of recommender systems [9]. We consider a variational approach, in which β̂ is approximated by a member of a non-parametric low-rank Tensor Train model, where all matrix elements are treated as free variational parameters. In the classical formulation [2]:
$$\hat{\beta}(i_1, \ldots, i_d) \approx \sum_{j_1, \ldots, j_{d-1}} g(i_1, j_1) \cdot G(j_1, i_2, j_2) \cdots G(j_{d-2}, i_{d-1}, j_{d-1}) \cdot g(j_{d-1}, i_d)$$
20 S. A. Terekhov
with a log statistical link function for the unrestricted model, or an identity link with the restriction of non-negativity of the matrix elements. The algebraic multiplications h(j) = g(i1, :) · G(:, i2, j) are a special case of the more general neural layer functions h(i2, j) = Fj(h(i1, :) · G(:, i2, j)), where h(i1, :) = g(i1, :) and F is a vector of sigmoids. Neural transformations are applied mode by mode, with a single neuron for the last mode id. The resulting model is called a Tensor Train Neural Network [6–8]. Conceptually, the diagram of TTNN functioning is shown in Fig. 1.
Instead of using a single large neural network, the TTNN model comprises many tiny neural networks, with the number of units defined by the tensor decomposition rank. To estimate each element β̂(i1, i2, .., id), the chain of neural layers with indices (i1, i2, .., id) is dynamically composed. The gradient of the modelled function is computed via the standard backpropagation chain rule.
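The modewise evaluation described above can be sketched as follows. The rank, mode sizes, and random factor values are illustrative assumptions; the final mode uses a single linear output neuron, as stated in the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ttnn_forward(g_first, cores, g_last, idx):
    """Evaluate one tensor element along a chain of tiny neural layers.

    g_first : (n1, R) first-mode factor matrix
    cores   : list of (R, n_k, R) TT cores for the middle modes
    g_last  : (R, nd) last-mode factor (one linear output neuron per index)
    idx     : (i1, ..., id) multi-index of the element to estimate
    """
    h = g_first[idx[0], :]                  # h(i1, :) = g(i1, :)
    for core, i in zip(cores, idx[1:-1]):
        h = sigmoid(h @ core[:, i, :])      # h = F(h @ G(:, i_k, :)), modewise
    return float(h @ g_last[:, idx[-1]])    # single linear neuron for the last mode

rng = np.random.default_rng(0)
R, shape = 3, (5, 6, 7)                     # rank and mode sizes (illustrative)
g1 = rng.normal(size=(shape[0], R))
cores = [rng.normal(size=(R, shape[1], R))]
gd = rng.normal(size=(R, shape[-1]))
beta_hat = ttnn_forward(g1, cores, gd, (2, 4, 1))
```

Only one chain of small matrix products is composed per index tuple, which is what keeps the number of free parameters linear in d rather than exponential.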
3 Estimation
The pattern of observed data samples follows some stable distribution, as defined by retail operational practice. The samples are independent, since at every location and time period the outcome is produced by different customers. The logarithm of the observed data likelihood depends on the variational tensor parameters:
$$L(A \mid \beta) = \sum_{s \in S} \log P(A(s) \mid \beta(s), r(s))$$
where s is the tensor index and r are the censoring indicators. The target Poisson intensities β are given by the TTNN neural model, as described above.
Fig. 2. Likelihoods (arbitrary units) of predictions from bagging ensembles versus varying model rank (decomposition matrix dimension, the same for all tensor modes). Circles are off-samples; dots are training samples for each committee member.
It reads that at the particular store 3053, SKU 471 for date 91 is expected to sell at a rate of one item in 7 ≈ 1/0.138 days, with a 25% risk of selling as slowly as one item in 10 days or slower.
The adequate complexity of the tensor neural model can also be assessed with ordered rank statistics of false neighbors [15]. Let us consider a series of models with growing tensor matrix dimensions M = 1, 2, .... For a particular tensor mode (the last one, with indices id, is picked in our applications), pairwise distances between all vectors g(id) are computed. An ordered neighbor set U(id) is collected for each vector. The set of neighbors tends to stabilize when the model approaches the correct dimension. Rank correlations between these sets for decompositions of varying complexity M and M + 1 are compared, and the recommended model complexity is the one with a low number of false neighbors. Formal rank-based hypothesis testing criteria can be utilized [16].
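The false-neighbor check can be sketched as a nearest-neighbor overlap between the factor matrices of two consecutive ranks. This is a simplified stand-in for the rank-correlation statistics of [15,16]; the sizes and the perturbation below are illustrative:

```python
import numpy as np

def neighbor_overlap(gm: np.ndarray, gm1: np.ndarray, k: int = 3) -> float:
    """Mean fraction of shared k-nearest-neighbor indices between the rows
    of two factor matrices (ranks M and M + 1)."""
    def knn(g):
        d = np.linalg.norm(g[:, None, :] - g[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)        # exclude each vector itself
        return np.argsort(d, axis=1)[:, :k]
    a, b = knn(gm), knn(gm1)
    shared = [len(set(r1) & set(r2)) for r1, r2 in zip(a, b)]
    return float(np.mean(shared)) / k

rng = np.random.default_rng(1)
base = rng.normal(size=(10, 4))
# appending a small extra column barely perturbs the neighbor structure,
# mimicking a rank increase past the correct dimension
stable = neighbor_overlap(base, np.hstack([base, 0.01 * rng.normal(size=(10, 1))]))
print(stable)  # close to 1 when the neighbor sets have stabilized
```

When the extra rank carries little signal, the neighbor sets barely change and the overlap approaches one; a low overlap flags false neighbors and an under-sized model.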
The estimation problem discussed in the previous section assumes a fixed exogenous distribution of data samples collected in a passive observation regime. This is the usual situation for routine retail operations. No special treatment of the data, except regular data quality checking, is required in this case.
Another important class of applications is active operations that change the data generation conditions. These may include varying pricing decisions, marketing and discount actions, and other dynamic control technologies common in retail practice. In these cases the data generation process becomes partly endogenous, i.e. it starts to depend on performance under previously taken decisions and on internal system variables. This phenomenon is somewhat similar to the “self-selection bias” known in the economics literature.
A well-established way to optimize retail system performance is balanced stratified sampling under the conditions of randomized designed experiments, such as Latin hypercubes [17]. This approach, developed mostly for technical systems, is very limited in retail practice, where any necessary experimentation under non-profitable conditions must be justified in advance by additional gains.
Tensor decomposition models lead to optimal utilization of all designed data, since experimentation with different actions can be performed at different locations (and even with different commodities). The collected data are then fused into one reward-estimating tensor model.
In pure active settings the problem reduces to the common contextual bandits formulation [18]. Consider a set of available controls or actions V, also called “bandit arms”. In retail applications these are pricing-level decisions, discount packages, or corporate KPIs targeted at sales optimization. The actions are applied in different contexts defined by the tensor modes. Only one particular action can be tried for particular tensor indices s = (i1, i2, .., id), and the feedback estimate is revealed only for the chosen option.
Let us extend the set of tensor modes with an additional mode {id+1 ∈ V} for the
available actions v ∈ V, including the no-action option. The tensor TTNN model is
still directly applicable to the estimation of event flow intensities β, provided that
the selection of actions is randomized.
To actively choose the more profitable action for each context, an exploration
process [18,19] is used with a constant exploration factor γ. Given the context s,
the estimates of the potentials β̂s are computed from the TTNN model for each action v
TTNN in Retail 23
(last coordinate). Let v∗(s) be the best believed action for s, estimated from the
TTNN ensemble median or from an upper confidence bound (UCB) defined by estimated
quantiles. The probability of selecting an action v in context s is given by

P(v ∈ V, s) = (1 − γ) · δ_{v, v∗(s)} + γ/|V|.
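A sketch of this selection rule (helper names are ours; the paper gives only the probability formula):

```python
import random

def action_probability(v, best, n_actions, gamma):
    """P(v | s) = (1 - gamma) * delta(v, v*(s)) + gamma / |V|."""
    return (1.0 - gamma) * (1.0 if v == best else 0.0) + gamma / n_actions

def select_action(actions, best_action, gamma, rng=random):
    """Sample an action: exploit v*(s) with probability (1 - gamma),
    otherwise explore uniformly over all of V."""
    if rng.random() < gamma:
        return rng.choice(actions)
    return best_action
```

With γ = 0 the rule always exploits the currently best believed action; with γ > 0 every action retains probability at least γ/|V|, which preserves the randomized-selection assumption of the TTNN estimator.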
5 Conclusion
References
1. Acar, E., Dunlavy, D.M., Kolda, T.G., Mørup, M.: Scalable tensor factorizations with missing data (2010). http://www.cs.sandia.gov/dmdunla/publications/AcDuKoMo10.pdf
2. Oseledets, I.V., Tyrtyshnikov, E.E.: TT-cross approximation for multidimensional
arrays. Linear Algebra Appl. 432, 70–88 (2010)
24 S. A. Terekhov
1 Introduction
In the development and operation of technical systems, including aircraft, a
significant place is occupied by the solution of such problems as the analysis of
the behavior of dynamical systems, the synthesis of control algorithms for them,
and the identification of their unknown or inaccurately known characteristics. A
crucial role in solving the problems of these three classes belongs to mathematical
and computer models of dynamical systems [1,2].
Traditional classes of mathematical models for technical systems are ordinary
differential equations (for systems with lumped parameters) and partial
c Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 25–42, 2020.
https://doi.org/10.1007/978-3-030-30425-6_3
26 Y. Tiumentsev and M. Egorchev
from it using training sets that are insufficient for traditional ANN-models; pro-
vide the ability to identify the characteristics of dynamic systems described by
nonlinear functions of many variables (for example, the coefficients of aerody-
namic forces and moments).
The following sections discuss the implementation of this approach, as well
as an example of its application for modeling the aircraft motion and identifying
the aerodynamic characteristics of the aircraft.
The system S perceives controlled u(t) and uncontrolled ξ(t) effects. Under
these influences, S changes its state x(t) according to its transformation (map-
ping) F (u(t), ξ(t)). At the initial time instant t = t0 the system state S takes
the value x(t0 ) = x0 .
The state x(t) is perceived by the sensor (observer) implementing the trans-
formation G(x(t), ζ(t)), and is given as the output of the system S, i.e. as the
results of observation y(t) for its state x(t). The imperfection of the state sensors
of the system S is taken into account by introducing an additional uncontrolled
effect ζ(t) (“measuring noise”). The composition of the mappings F (·) and
G(·) describes the relationship between the controlled input u(t) ∈ U of the sys-
tem S and its output y(t) ∈ Y , taking into account the influence of uncontrolled
effects ξ(t) and ζ(t) on the system under consideration:
each of which recorded the current value of the controlled input ui = u(ti ) and
the corresponding output yi = y(ti ). The results y(ti ), ti ∈ [t0 , tf ] of these
observations together with the corresponding values of the controlled inputs ui
form a set of NP ordered pairs:
It is required to find, using the data (2), such an approximation Φ̂(·) to the
mapping Φ(·) implemented by the system S that the condition

‖Φ̂(u(t), ξ(t), ζ(t)) − Φ(u(t), ξ(t), ζ(t))‖ ≤ ε,
∀u(ti) ∈ U, ∀ξ(ti) ∈ Ξ, ∀ζ(ti) ∈ Z, t ∈ [t0, tf], x(t0) = x0   (3)

is fulfilled.
Thus, as follows from (3), the sought approximate mapping Φ̂(·) must have
the required accuracy not only when reproducing the observations (2) but also
for all valid values ui ∈ U and all valid initial conditions x(t0) = x0.
We will call this property of the mapping Φ̂(·) generalization. The entries
∀ξ(ti) ∈ Ξ and ∀ζ(ti) ∈ Z in (3) mean that the approximation Φ̂(·) will have
the required accuracy provided that at any time instant t ∈ [t0, tf] the
uncontrolled impacts ξ(t) on the S and the measurement noise ζ(t) do not exceed
the permissible limits.
The mapping Φ(·) corresponds to the considered modeling object (the dynamical
system S), and the mapping Φ̂(·) will further be called the model of this object. We
will also assume that for the system S we have data of the form (2) and, possibly,
some knowledge of the “design” of the mapping Φ(·) implemented by the considered
system. The presence of data of the form (2) is mandatory: at the very least, they
are required to test the Φ̂(·) model being created. Knowledge about the mapping Φ(·)
may be unavailable, or it may be available but not used in the formation of the
model Φ̂(·).
Since the available number of experiments generating the set (2) is finite, the
norm ‖·‖ in the expression (3) will be treated as the standard deviation of the
form

‖Φ̂(u, ξ, ζ) − Φ(u, ξ, ζ)‖ = (1/NP) Σ_{i=0}^{NP} [Φ̂(ui, ξ, ζ) − Φ(ui, ξ, ζ)]²   (4)

or of the form

‖Φ̂(u, ξ, ζ) − Φ(u, ξ, ζ)‖ = ( (1/NP) Σ_{i=0}^{NP} [Φ̂(ui, ξ, ζ) − Φ(ui, ξ, ζ)]² )^{1/2}.   (5)
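Forms (4) and (5) are simply the mean squared and root-mean-squared deviations between model and system outputs; a minimal sketch (function names are ours):

```python
import math

def mse(model_out, system_out):
    """Mean squared deviation between model and system outputs, form (4)."""
    n = len(model_out)
    return sum((m - s) ** 2 for m, s in zip(model_out, system_out)) / n

def rmse(model_out, system_out):
    """Root mean squared deviation, form (5)."""
    return math.sqrt(mse(model_out, system_out))
```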
The training set {ui, yi}_{i=1}^{NP} of (2) and the test set {ũj, ỹj}_{j=1}^{NT} of (6)
should be non-matching.
Semi-empirical ANN-Modeling of Controlled Dynamical Systems 29
The error on the test set (6) is calculated in the same way as for the training
set (2)
‖Φ̂(ũ, ξ, ζ) − Φ(ũ, ξ, ζ)‖ = (1/NT) Σ_{j=0}^{NT} [Φ̂(ũj, ξ, ζ) − Φ(ũj, ξ, ζ)]²,   (7)
Now we can formulate the problem of forming a model of the dynamical system
S. We need to build a model Φ̂(·) that reproduces with the required level of
accuracy the mapping Φ(·) realized by the system S, i.e., a model Φ̂(·) for
which the magnitude of the modeling error (7) or (8) on the test set (6) does not
exceed the specified permissible value ε in (3). This formation should be based
on the data (2) used to train the model and the data (6) used to test it and,
possibly, on knowledge about the system S.
In the model (9) the values α, q, ϕ, and ϕ̇ are the states of the controlled
object, and the variable ϕact is the control. We consider the F-16 maneuverable
aircraft as an example of a specific modeling object. The source data for this
aircraft were taken from [27].
A block diagram of a semi-empirical model based on (9) is shown in Fig. 2.
Here, the Euler method of integrating ordinary differential equations was used
to transform the original continuous-time model into a discrete-time model. For
comparison, Fig. 3 shows a block diagram based on the NARX network for the same
model. In both of these schemes, the links whose synaptic weights are adjustable
parameters of the model are highlighted in red.
The type of test maneuver (x∗(t), u∗(t)) in (10) determines the resulting ranges
of values of the state and control variables; the type of excitation ũ(t) specifies
the variety of examples within these ranges.
Fig. 4. Test disturbances as functions of time used in studying the dynamics of con-
trolled systems: a is a random signal; b is a polyharmonic signal. Here φact is the
actuator command signal for the all-moving horizontal tail of the maneuverable air-
craft from the example (9)
As was shown in the work of Schroeder [28] (see also [29,30]), in this case it is
advisable to use a polyharmonic signal as the excitation. An example of such
a signal is shown in Fig. 4a. The mathematical model of such a signal uj acting
on the j-th control is a harmonic polynomial
uj = Σ_{k∈Ik} Ak sin(2πkt/T + ϕk),   Ik ⊂ K, K = {1, 2, . . . , M},   (11)
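Equation (11) translates directly into code; the sketch below (argument names are ours) sums the chosen harmonics of the base period T:

```python
import math

def polyharmonic(t, T, harmonics, amplitudes, phases):
    """Polyharmonic excitation u_j(t) = sum over k in I_k of
    A_k * sin(2*pi*k*t / T + phi_k), with I_k a subset of {1, ..., M}."""
    return sum(A * math.sin(2.0 * math.pi * k * t / T + phi)
               for k, A, phi in zip(harmonics, amplitudes, phases))
```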
– bifurcations of the operation modes of the network when changing the values
of model tunable parameters (synaptic weights, biases, internal parameters
of neurons) in the process of learning the ANN-model [31];
– the presence of long-term dependencies of the network outputs on the inputs
and states of the ANN-model at previous time instants [32,33];
– a very complicated landscape of the error function, carved by numerous deep,
narrow, and curved gorges, and often having plateaus [34].
Bifurcation of Network Dynamics. In the theory of nonlinear dynamical
systems, bifurcation is a qualitative restructuring of the functioning modes of a
dynamical system with a small change in its parameters [35]. The bifurcation
of the network dynamics is a qualitative change in the dynamic properties and
behavior of the ANN-model with small changes in its adjustable parameters
(synaptic weights, biases and, in some cases, internal parameters of neurons).
In terms of neural network learning, this means that the landscape of the error
function changes abruptly and significantly.
Long-Term Dependencies. When learning dynamic networks, there is a so-
called problem of long-term dependencies, because the output of the ANN-model
depends on its inputs and states at previous time instants, including those far
from the current point in time. Gradient methods for searching for the minimum
of the error function behave unsatisfactorily in this case. The reason for this
behavior is clarified by an analysis of the asymptotic behavior of the learning
error and its gradient in the backpropagation process [32,33], which shows that
these quantities decrease rapidly (exponentially, as a rule).
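This decay is already visible in the simplest scalar recurrence x[k+1] = w · x[k]: the sensitivity of the state after T steps to the initial state is w**T, vanishing exponentially for |w| < 1 (and exploding for |w| > 1). A toy illustration (ours, not from the chapter):

```python
def gradient_through_time(w, steps):
    """For x[k+1] = w * x[k], the derivative dx[T]/dx[0] is the product of
    the per-step Jacobians, i.e. w ** steps; this is the quantity that
    backpropagation through time accumulates."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one Jacobian factor per unrolled time step
    return grad
```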
Complicated Landscape of the Error Function. One of the most important
reasons for the emergence of difficulties in learning dynamic ANN-models is a
very complicated relief of the error function, carved by numerous deep, narrow,
and curved gorges. This factor makes the ANN-model learning process the most
difficult to implement. Here the determining factor is the number of examples
in the training set, so we can only rely on an approach to working with the
training data that allows the number of examples actually used to be increased
consistently.
The problem of learning a recurrent ANN-model, taking into account the
complicated relief of the error function, can be solved in various ways
[36–38]. These methods include the following:
– regularization (a penalized error function combining SSE, the total mean square
error of the network, and SSW, the sum of the squares of the weights);
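The regularized objective presumably has the usual penalized form F = β·SSE + α·SSW; the exact weighting scheme is our assumption, since the chapter only names SSE and SSW:

```python
def regularized_objective(errors, weights, alpha, beta):
    """Penalized training objective F = beta * SSE + alpha * SSW, where
    SSE is the sum of squared network errors and SSW is the sum of squared
    weights (the alpha/beta weighting is an illustrative assumption)."""
    sse = sum(e * e for e in errors)
    ssw = sum(w * w for w in weights)
    return beta * sse + alpha * ssw
```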
[Plot panels: φact, deg; q, deg/sec; α, deg; Eq, %; Eα, %; each versus t, sec, over 0–20 s.]
Fig. 5. Estimation of the restoration accuracy for the dependencies CL (α) and Cm (α)
based on the results of testing the ANN-model (point mode, identification and testing
using a polyharmonic signal). The output values of the object (9) and ANN-models
are shown by a blue line and a green line, respectively
Table 3. Simulation error on the test set for the semi-empirical model and three types
of excitation signals
8 Conclusions
References
1. Hangos, K.M., Bokor, J.: Analysis and Control of Nonlinear Process Systems.
Springer, Berlin (2004)
2. Kulakowski, B.T., Gardner, J.F., Shearer, J.L.: Dynamic Modeling and Control of
Engineering Systems, 3rd edn. Cambridge University Press, Oxford (2007)
3. Scott, L.R.: Numerical Analysis. Princeton University Press, New Jersey (2011)
4. Hairer, E., Norsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I:
Nonstiff Problems, 2nd edn. Springer, Berlin (2008)
5. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and
Differential-Algebraic Problems, 2nd edn. Springer, Berlin (2002)
6. Tao, G.: Adaptive Control Design and Analysis. Wiley, New York (2003)
7. Ioannou, P.A., Sun, J.: Robust Adaptive Control. Prentice Hall, New Jersey (1995)
8. Astolfi, A., Karagiannis, D., Ortega, R.: Nonlinear and Adaptive Control with
Applications. Springer, Berlin (2008)
9. Nelles, O.: Nonlinear System Identification: From Classical Approaches to Neural
Networks and Fuzzy Models. Springer, Berlin (2001)
10. Billings, S.A.: Nonlinear System Identification: NARMAX Methods in the Time,
Frequency and Spatio-temporal Domains. Wiley, New York (2013)
11. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Yu.V., Chernyshev, A.V.: Neural net-
work based semi-empirical models for controlled dynamical systems. J. Comput.
Inf. Technol. 9, 3–10 (2013). [in Russian]
12. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Yu.V.: Neural network adaptive semi-
empirical models for aircraft controlled motion. In: Proceedings of the 29th
Congress of the International Council of the Aeronautical Sciences, vol. 4 (2014)
13. Egorchev, M.V., Tiumentsev, Yu.V.: Learning of semi-empirical neural network
model of aircraft three-axis rotational motion. Opt. Mem. Neural Netw. 24(3),
201–208 (2015)
14. Kozlov, D.S., Tiumentsev, Yu.V.: Neural network based semi-empirical models for
dynamical systems described by differential-algebraic equations. Opt. Mem. Neural
Netw. 24(4), 279–287 (2015)
15. Egorchev, M.V., Tiumentsev, Yu.V.: Semi-empirical neural network based app-
roach to modelling and simulation of controlled dynamical systems. Procedia Com-
put. Sci. 123, 134–139 (2018)
16. Egorchev, M.V., Tiumentsev, Yu.V.: Neural network semi-empirical modeling of
the longitudinal motion for maneuverable aircraft and identification of its aero-
dynamic characteristics. In: Advances in Neural Computation, Machine Learning,
and Cognitive Research. Studies in Computational Intelligence, vol. 736, pp. 65–71
(2018)
17. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall
PTR, New Jersey (2006)
18. Hagan, M.T., Demuth, H.B., Beale, M.: Neural Network Design. PWS Publishing
Company, New Orleans (1996)
19. Oussar, Y., Dreyfus, G.: How to be a gray box: dynamic semi-physical modeling.
Neural Netw. 14(9), 1161–1172 (2001)
20. Dreyfus, G.: Neural Networks - Methodology and Applications. Springer, Berlin
(2005)
21. Bohlin, T.: Practical Grey-Box Identification: Theory and Applications. Springer,
Berlin (2006)
22. Chen, Z., Wei, J., Jiang, R.: A gray-box neural network based model identification
and fault estimation scheme for nonlinear dynamic systems. Int. J. Neural Syst.
23(6), 1–15 (2013)
23. Rivals, I., Personnaz, L.: Black-box modeling with state-space neural networks.
In: Zbikowski, R., Hint, K.J. (Eds.) Neural Adaptive Control Technology, World
Scientific, pp. 237–264 (1996)
24. Brusov, V.S., Tiumentsev, Yu.V.: Neural Network Based Modeling of Aircraft
Motion. The MAI Publishing House, Moscow (2016). [in Russian]
25. Cook, M.V.: Flight Dynamics Principles. Elsevier, Amsterdam (2007)
26. Hull, D.G.: Fundamentals of Airplane Flight Mechanics. Springer, Berlin (2007)
27. Nguyen, L.T., Ogburn, M.E., Gilbert, W.P., Kibler, K.S., Brown, P.W., Deal, P.L.:
Simulator study of stall/post-stall characteristics of a fighter airplane with relaxed
longitudinal static stability. Technical Report, TP-1538, NASA, December 1979
28. Schroeder, M.R.: Synthesis of low-peak-factor signals and binary sequences with
low autocorrelation. IEEE Trans. Inf. Theory 16(1), 85–89 (1970)
29. Morelli, E.A., Klein, V.: Real-time parameter estimation in the frequency domain.
J. Guidance Control Dyn. 23(5), 812–818 (2000)
30. Smith, M.S., Moes, T.R., Morelli, E.A.: Flight investigation of prescribed simultaneous independent surface excitations for real-time parameter identification. AIAA Paper No. 2003-5702 (2003)
31. Doya, K.: Bifurcations in the learning of recurrent neural networks. IEEE Int.
Symp. Circuits Syst. 6, 2777–2780 (1992)
32. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient
descent is difficult. Trans. Neur. Netw. 5(2), 157–166 (1994)
33. Schaefer, A.M., Udluft, S., Zimmermann, H.-G.: Learning long-term dependencies
with recurrent neural networks. Neurocomputing 71(13–15), 2481–2488 (2008)
34. De Jesus, O., Horn, J.M., Hagan, M.T.: Analysis of recurrent network training
and suggestions for improvements. In: Proceedings of IJCNN, vol. 4, pp. 2632–
2637 (2001)
35. Seydel, R.: From Equilibrium to Chaos: Practical Bifurcation and Stability Anal-
ysis. Elsevier, Amsterdam (1988)
36. Phan, M.C., Hagan, M.T.: Error surface of recurrent neural networks. IEEE Trans.
Neural Netw. 24(11), 1709–1721 (2009)
37. Horn, J., De Jesús, O., Hagan, M.T.: Spurious valleys in the error surface of recur-
rent networks – analysis and avoidance. IEEE Trans. Neural Netw. 20(4), 686–700
(2009)
38. Elman, J.L.: Learning and development in neural networks: the importance of
starting small. Cognition 48(1), 71–99 (1993)
39. Mandic, D.P., Chambers, J.A.: Recurrent Neural Networks for Prediction: Learning
Algorithms, Architectures and Stability. Wiley, New York (2001)
Artificial Intelligence
Photovoltaic System Control Model
on the Basis of a Modified Fuzzy Neural Net
Abstract. This paper represents the photovoltaic system control model on the
basis of a modified fuzzy neural net. Based on the photovoltaic system condi-
tion, the modified fuzzy neural net provides a maximum power point tracking
under random perturbations. The architecture of the modified fuzzy neural net
was evolved using a neuro-evolutionary algorithm. The validity and advantages
of the proposed photovoltaic system control model on the basis of a modified
fuzzy neural net are demonstrated using numerical simulations. The simulation
results show that the proposed photovoltaic system control model on the basis of
a modified fuzzy neural net achieves real-time control speed and competitive
performance, as compared to a classical control scheme with a PID controller
based on perturbation & observation, or incremental conductance algorithm.
1 Introduction
The Republic of Khakassia is one of the most promising regions for the development
of solar power systems in the Russian Federation. The annual average solar insolation
for the town of Abakan is about 1450 kWh/sq.m [1], which exceeds the values for the
European part of the Russian Federation (about 1200–1450 kWh/sq.m). But photovoltaic
(PV) systems are not stable due to the complex dynamics of solar irradiance fluctuations.
Therefore, maximum power point tracking (MPPT) algorithms play an important role
in solar power generation. We consider a non-linear MPPT problem for PV systems.
A PV system is non-linear and commonly suffers from restrictions imposed by sudden
variations in the solar irradiance level. Within the research literature, a whole array of
differing MPPT algorithms has been proposed [2]. Among them, the perturbation &
observation (P&O) and incremental conductance (IC) algorithms are the most common
due to their simplicity and easy implementation. But controllers for PV systems based
on the P&O or IC algorithm have slow response times to changing reference commands,
take considerable time to settle down from oscillating around the target reference state,
and must often be designed by hand. Moreover, the PV system control model should be
robust to different environmental conditions in order to reliably generate maximum
power. Therefore, automatic intelligent algorithms such as fuzzy neural networks are
promising alternatives [3].
Real-life PV systems have complex dynamics due to random variation of the
system parameters and fluctuation of the solar irradiance. Thus, neural-network-based
solutions have been proposed to approximate these complex dynamics [3]. But the neural
network needs to become more adaptive. Adaptive behavior can be enabled by
modifying the network into a recurrent neural network with fuzzy units. This forms the
motivation for the development of a PV system control model on the basis of a
modified fuzzy neural net (MFNN) as presented in this paper. Compared to existing
fuzzy neural nets, including ANFIS, the MFNN includes recurrent neural networks and
fuzzy units. The function approximation capabilities of a neural net are exploited to
approximate a membership function.
where Voc is the open-circuit voltage, N is the diode ideality constant, K is the
Boltzmann constant (1.381 × 10⁻²³ J/K), T is the temperature in kelvin, Q is the
electron charge (1.602 × 10⁻¹⁹ C), IL is the light-generated current, the same as
Iph (A), and Io is the saturation diode current (A). We calculate the light-generated
current as follows
where G is the radiation (W/m²), Gref is the radiation under standard conditions
(1000 W/m²), ILref is the photoelectric current under standard conditions (0.15 A),
Tc ref is the module temperature under standard conditions (298 K), αIsc is the
temperature coefficient of the short-circuit current (A/K), equal to 0.0065 A/K,
and IL is the light-generated current. We calculate the reverse saturation current
as follows

Io = Ior (T/Tref)³ exp[Q Eg/(K N) (1/Tr − 1/T)].   (3)
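The quantities above can be sketched in code. Note that the explicit form of Eq. (2) did not survive extraction, so the textbook relation IL = (G/Gref)·(IL,ref + αIsc·(Tc − Tc,ref)) suggested by the listed symbols is our assumption; Eq. (3) is transcribed as printed:

```python
import math

def light_generated_current(G, T_c, G_ref=1000.0, I_L_ref=0.15,
                            T_c_ref=298.0, alpha_Isc=0.0065):
    """Assumed form of Eq. (2): scale the standard-condition photocurrent
    by irradiance and correct it for module temperature."""
    return (G / G_ref) * (I_L_ref + alpha_Isc * (T_c - T_c_ref))

def saturation_current(T, I_or, T_ref, E_g, N, Q=1.602e-19, K=1.381e-23):
    """Eq. (3): I_o = I_or * (T/T_ref)**3 * exp(Q*E_g/(K*N) * (1/T_ref - 1/T))."""
    return I_or * (T / T_ref) ** 3 * math.exp(
        Q * E_g / (K * N) * (1.0 / T_ref - 1.0 / T))
```

At standard conditions (G = 1000 W/m², T = 298 K) both relations reduce to their reference values, which is a quick consistency check.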
Photovoltaic System Control Model on the Basis of a MFNN 47
This step provides the fuzzy sets Aj (A1 is a sudden change of the PV system
conditions, A2 is a smooth change of the PV system conditions) with membership
functions μj(s), j = 1..2.
Step 3. We created the MFNN based on the data (5). The MFNN includes two
recurrent neural networks Fj (the number of delays is 2), j = 1..2. The MFNN
architecture’s parameters (the number of nodes in the hidden layer and the
corresponding weights and biases) have been coded into particles X. The dimension
of a particle X is dh = 12h + 2 ∈ {Dmin = d1 = 14, Dmax = d10 = 122}. To make the
PV system control adaptive, it needs some idea of how the actual PV system behavior
differs from its expected behavior, so that the recurrent neural network Fj can
recalibrate its behavior intelligently at run time and try to eliminate the
constant tracking error. We give the recurrent neural network Fj(μj(s), x) an
extra input μj(s)
48 E. A. Engel and N. E. Engel
which corresponds to the value of the membership function μj(s). This input signal
of the recurrent neural network Fj(μj(s), x) will give useful feedback for providing
the maximum PV power under dynamically changing PV system conditions. This
control approach provides a more intelligent algorithm for generating the control
signal u on the basis of the MFNN. We evaluated the fitness function as follows:

f(D, u) = (1/H) Σ_{l=1}^{H} |D − u|.   (7)
This MFNN includes two recurrent neuronets Fk(μk(s), x), k = 1..2. The
aforementioned recurrent neuronets are two-layered networks with seven hidden
neurons. In this comparison study, the performance of the proposed PV system
control model on the basis of a MFNN is compared against the standard model with
the PID controller (based on the P&O or IC algorithm) under the same conditions.
Figures 2 and 3 show the simulation results.
Fig. 2. Plot of the PV system power provided by control model with PID controller based on
P&O algorithm and the control model on the basis of a MFNN respectively.
According to Fig. 3, the response time of the IC algorithm is no better than that
of the proposed algorithm in the first 0.5 s. Moreover, the control signal created
by the IC algorithm within the transient mode exhibits overshoot. From
time = 2.2 s to 3 s the PV system energy produced by the control model with the
PID controller based on the IC algorithm drops to zero.
Fig. 3. Plot of the PV system power provided by the control model with PID controller based on
the IC algorithm and the control model on the basis of a MFNN respectively.
The proposed PV system control model is more robust and provides more power
(Figs. 2 and 3) in comparison with the control models with the PID controller (based
on the P&O or the IC algorithm). Figure 2 shows the misjudgment phenomenon of the
P&O algorithm when the solar irradiance continuously increases (time t ∈ T = [0.3 s,
0.4 s] ∪ [0.8 s, 1 s] ∪ [1.7 s, 2.1 s]). In such situations, the proposed PV system
control model, which is based on a modified fuzzy neural net, produces on average
8.6% more energy than the standard model based on the perturbation & observation
algorithm (100% · (Σ_{t∈T} P_MFNN^t / Σ_{t∈T} P_P&O^t − 1) = 8.6%, where
P_MFNN is the energy provided by the proposed PV system control model based on a
modified fuzzy neural net, and P_P&O is the energy provided by the standard model
based on the P&O algorithm).
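The 8.6% figure is the ratio of summed energies over the interval set T; as an arithmetic sanity check with illustrative numbers (not measured data):

```python
def energy_gain_percent(P_mfnn, P_baseline):
    """100 * (sum(P_MFNN) / sum(P_baseline) - 1): relative extra energy of
    the MFNN controller over a baseline on the same time samples."""
    return 100.0 * (sum(P_mfnn) / sum(P_baseline) - 1.0)

# illustrative per-sample powers chosen to reproduce the quoted ratio
gain = energy_gain_percent([5.43, 5.43], [5.0, 5.0])
```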
During time t ∈ [1.1 s, 1.3 s] ∪ [1.5 s, 1.7 s] ∪ [2.2 s, 3 s] the PID controller based
on the IC algorithm generates a huge numerical value of the control signal (value of
the control signal u ∈ [5.0706e+32, 5.6385e+33]) as a result of sudden fluctuations in
the solar irradiance, while the proposed PV system control model provided the
maximum PV power (Figs. 1, 3 and 4).
Fig. 4. Plot of the control signal provided by the PID controller based on the IC algorithm.
The MFNN provides a more suitable approach to the MPPT problem, with good
tracking accuracy. Extensive simulation studies on the Octave model have been carried
out with different initial conditions, different disturbance profiles, and variations in the
photovoltaic system and solar irradiation level parameters. The results show that
consistent performance has been achieved by the proposed PV system control model,
with good stability and robustness as compared with the standard model with a PID
controller.
3 Conclusions
It is shown that the PV system control model on the basis of a MFNN is robust to PV
system uncertainties. Unlike popular approaches to nonlinear control, a MFNN is used
to approximate the control law and not the system nonlinearities, which makes it
suitable over a wide range of nonlinearities. Compared to standard MPPT algorithms,
including P&O and IC, the PV system control model on the basis of a MFNN produces
good response time, low overshoot, and, in general, good performance. Simulation
comparison results for a PV system demonstrate the effectiveness of the PV system
control model on the basis of a MFNN as compared with the standard model with a
PID controller (based on P&O, or IC algorithm). It is our contention that the proposed
modified fuzzy neural net architecture can have generic control applications to other
kinds of systems, and produce a competitive alternative algorithm to neural networks
and PID controllers.
Acknowledgement. The reported study was funded by RFBR and the Republic of Khakassia
under research project no. 19-48-190003.
References
1. Beta-energy official page. https://www.betaenergy.ru/insolation/abakan. Accessed 27 Apr
2019
2. Tavares, C.A.P., Leite, K.T.F., Suemitsu, W.I., Bellar, M.D.: Performance evaluation of PV
solar system with different MPPT methods. In: 35th Annual Conference of IEEE Industrial
Electronics IECON 2009, pp. 719–724 (2009)
3. Kumar, A., Chaudhary, P., Rizwan, M.: Development of fuzzy logic based MPPT controller
for PV system at varying meteorological parameters. In: 2015 Annual IEEE India Conference
(INDICON), New Delhi, pp. 1–6 (2015)
4. Engel, E.A., Engel, N.E.: Temperature forecasting based on the multi-agent adaptive fuzzy
neuronet. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V. (eds.) Advances in Neural
Computation, Machine Learning, and Cognitive Research. Neuroinformatics 2018. Studies in
Computational Intelligence, vol. 736. Springer, Cham (2019)
Impact of Assistive Control on Operator
Behavior Under High Operational Load
1 Introduction
behavior and methods of their evaluation. The following approach was implemented.
Step 1: a list of events and situations is determined that may occur in the process
of operator activity and that require a specific operator response consisting of a
specific set of actions. Step 2: the list of expected actions and the methods of
their registration are determined; actions are characterized by a number of
parameters that can be recorded. Step 3: the action parameters are registered. A
video surveillance system and feedback from the onboard system of the experimental
setup (from the object of operator activity) are used to capture the operator’s
actions and their parameters. From the onboard system, information is obtained
about the presence (or absence) of an action (pressing a key, switching a toggle
switch, etc., see Sect. 3) and about the characteristics of the action (latency
relative to the event for which this action is expected).
Some actions or characteristics allow direct evaluation of the performance and
safety of behavior. For example, the absence of response to a particular stimulus
or a long reaction time (RT) can be interpreted by the control system as a
failed task (or an inefficiently performed task). Such control is the “first control
layer”. If the RT (or another parameter of the behavioral response) falls within
the permissible range, the behavior characteristics are analyzed in the “second
control layer” (or the “differential control layer”). In the second control layer, the
currently observed behavior is compared with the behavior typical for the operator at
a specific event. Variation of the characteristics of the operator’s actions within the
acceptable range depends on the operator’s physiological and psychological
capabilities, the current state, and distractions, and it is difficult to directly
interpret particular values of such characteristics in terms of effectiveness and
safety of behavior.
The project verifies the hypothesis that deviations of the characteristics of the
operator’s actions from those typical for a particular event are a correlate of the
efficiency and safety of operator activity and/or allow a forecast of the
effectiveness and safety of future behavior. Behavioral safety can be formalized
as the probability of making a critical control error; efficiency, as the number of
non-critical errors per unit of time, possibly weighted by their degree of
significance. We identify deviation from typical behavior by trying to classify
the feature vector comprised of behavioral parameters. If the vector fails to be
classified as belonging to the “typical behavior” class of a particular operator,
we treat it as a possible correlate of non-optimal performance (see Sect. 4).
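As a toy stand-in for this second-layer check (the actual classifier is not specified here; a per-parameter z-score gate is our illustrative assumption):

```python
def is_typical(features, means, stds, z_threshold=3.0):
    """Flag a behavior feature vector as typical if every parameter lies
    within z_threshold standard deviations of the operator's per-event
    baseline; anything outside is a candidate correlate of non-optimal
    performance (illustrative rule, not the chapter's classifier)."""
    return all(abs(x - m) <= z_threshold * s
               for x, m, s in zip(features, means, stds))
```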
The overall scheme of the operational cycle with assistive control is presented
in Fig. 1. In general, it is similar to schemata of control systems in works [2–
4,8–10]. Assistive support components are described in works [3,11] and are
implemented in our approach in a similar manner.
This work is a summary of selected results of the project no. 2.955.2017/4.6,
supported by the Russian Ministry of Science and Higher Education.
Impact of Assistive Control on Operator Behavior 55
[Fig. 1 block labels: Operational Cycle — operator’s actions, uncontrollable events, surveillance systems; Artificial Assistant — data transfer, behavior model building, causal relationships.]
2 Problem Statement
We evaluated operator behavior changes under high-operational-load conditions,
which we model by positioning stimuli densely in time, with each stimulus requiring
a certain response. A particular case of assistance is tested, which involves:
(a) recognition of potentially non-optimal performance by classification of a feature
vector comprised of behavioral parameters, and (b) adding latency to the visualization
of particular stimuli.
To select the characteristics that make up the "individual portrait of behav-
ior" (a schema of actions and a list of ranges of their characteristics under certain
events), their variability is analyzed in a series of experiments. It makes sense to
analyze the characteristics of behavioral reactions only after the skill is estab-
lished, i.e., when the initial stage of learning a new type of operator activity ends
and performance indicators reach a quasi-constant level.
It is assumed (and confirmed in our test experiment) that operational failure
is caused by perception conflict; thus, serializing the visualization of quasi-
simultaneous stimuli may lead to more optimal performance, despite the fact that
the artificial latency of stimulus visualization itself increases reaction time (RT).
3 Methods
Fig. 2. Experimental setup. Output devices: left (1) and right (2) monitors, LED panel
(3), speakers (4). Input devices: switch panel (5), keyboard (6), joystick (7)
According to Fig. 2, the stimuli come from the monitors (1) and (2) in front of
the subject, the LED panel (3), and the speakers (4). The image on the left monitor
(1) imitates an artificial horizon (a white line at the center), continuously changing
its tilt and height in random directions at a frequency of 10 Hz. On the right
monitor (2) there are the timer (upper-left corner), which starts from 15 s and
restarts after a correct reaction is received, the penalty counter (bottom-left
corner), and the shape (circle, square, triangle, pentagonal star or hexagonal
star), which changes randomly at 15–20 s intervals. The LED panel contains 6
rows of 3 diodes each. During the experiment, a pattern of 1 to 9 randomly
chosen diodes is active, changing at random intervals of 2–12 s. A sound signal
about 100 ms long plays at random intervals of 5–5.3 s.
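For illustration, the randomly timed stimuli described above can be sketched as a simple event scheduler (a hypothetical reconstruction, not the actual experimental software; the Timer and Horizon stimuli are omitted because they are reaction-driven or continuous):

```python
import random

def schedule_stimuli(duration_s: float, seed: int = 0):
    """Generate (time, stimulus) events with the random intervals
    described above: Shape every 15-20 s, LED pattern every 2-12 s,
    Sound every 5-5.3 s."""
    rng = random.Random(seed)
    intervals = {
        "shape": lambda: rng.uniform(15.0, 20.0),
        "led": lambda: rng.uniform(2.0, 12.0),
        "sound": lambda: rng.uniform(5.0, 5.3),
    }
    events = []
    for name, draw in intervals.items():
        t = draw()
        while t < duration_s:
            events.append((t, name))
            t += draw()
    return sorted(events)

events = schedule_stimuli(180.0)  # a 3-minute experiment
```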
Subjects were asked to react to stimuli using the following input devices: the
switch panel (5) with 5 toggles corresponding to the possible shapes appearing on
the right monitor (2), the computer keyboard (6), and the Thrustmaster T.16000M
joystick (7).
During the experiment, the subjects were instructed to react quickly to the
following four stimuli:
1. Timer: when the timer on the right monitor (2) ends, to press a key on the
keyboard corresponding to the number of active diodes on the LED panel (3).
2. Shape: to switch toggles on the switch panel (5) corresponding to shape on
the right monitor.
3. Sound: to press the joystick (7) trigger on the sound signal.
4. Horizon: to hold the artificial horizon (1) aligned with two black horizontal bars
using the joystick (7), while the horizon line randomly changes the height of
its center at a speed of 0%–2.5% of the monitor width per second and its
rotation angle at a speed of 0–5° per second.
The experiment lasts for 3 min (success) or until a threshold of 100 penalty
points is reached (failure). Table 1 lists the permissible RT to the stimuli and the
corresponding penalties.
Table 1. Permissible reaction time to external stimuli and penalty points in case of a
late/erroneous reaction
Five male subjects aged 23 to 40 took part in the experiments. Each session
comprised 5 experiments, and each subject participated in one or more sessions
spaced 1 to 2 weeks apart. The Bioethics Committee of SFedU approved the
experimental protocol. Each volunteer signed an agreement to participate in the
experiment.
After the described experiments (further referred to as Normal Mode), the
subjects took part in two types of experiments with increased operational load
caused by an increased temporal density of stimuli (Hard Mode 1 and 2).
For Hard Mode 1, the Timer, Shape and Sound stimuli occur 30% to 60% more
often; in addition, there is a 50% chance for the shape to change about 0–2 s
before or after the Timer ends.
Hard Mode 2 features the same changes as Hard Mode 1; in addition, when
the shape changes 0–1.5 s before or 0–2 s after the Timer ends, the new shape
is shown only after a correct reaction to the Timer stimulus or after 5 s have
passed.
Hard Mode experiments were performed starting with Hard Mode 1 and con-
sisted of 3 sessions of 4 to 5 experiments each, with a 1-week interval between
sessions.
We assume that the histogram of the RT distribution can serve as the
basis for a model of the subject's behavior, which makes it possible to identify the
subject by matching reaction characteristics and to determine deviations of
behavior from the subject's typical one in subsequent experiments. Reaction time is
widely used for evaluating operator performance, for predicting failures [5],
and for modeling human behavior [6].
In this study, the histogram of the RT distribution for a certain stimulus is con-
sidered as a model of the subject's behavior. Each stimulus is considered inde-
pendently. Verification of the subject's identity is based on an analysis of the RT
distributions obtained in one or more experiments.
Identity verification algorithm:
3. Calculate the distance (or measure of proximity) between the subsample and
the subject's response model. The subject is considered successfully verified
if the distance is less than a fixed threshold (or greater, in the case of a
proximity measure).
To determine the thresholds for each stimulus and each subsample size, a
histogram of the subject's RT distribution is chosen from the available dataset of
experimental results and compared with the histogram-model of the same subject
(the d1 value) and with the histogram-models of other subjects (the d2 values).
Subsamples are generated by randomly selecting K values from the subject's RTs
to the stimulus. The considered subsample sizes K are 4, 8, 16, 32, 64, and 128. The
set of the subject's RT values used for generating the subsample is contained in the
set of values used for generating the subject's model, which may affect the results
of comparing the subsample with the model. This will be discussed further.
For each of the five subjects, 100 independent calculations are performed for
each sample size, resulting in a set of 500 d1 values (five subjects, 100 com-
parisons with their own model) and 2000 d2 values (five subjects, 100 comparisons
with the model of each of the other four subjects). The following histogram
comparison functions are considered: chi-square and correlation.
The chi-square distance function is as follows:

$$d_{\mathrm{chi}}(H_1, H_2) = \sum_{I} \frac{(H_1(I) - H_2(I))^2}{H_1(I)}, \qquad (1)$$
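As an illustration, the distance computation and the verification step above can be sketched as follows (a minimal sketch; the binning, threshold value, and toy data are our assumptions, not taken from the experiments):

```python
import numpy as np

def chi_square_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """d_chi(H1, H2) = sum_I (H1(I) - H2(I))^2 / H1(I), per Eq. (1).
    Bins where H1(I) = 0 are skipped to avoid division by zero."""
    mask = h1 > 0
    return float(np.sum((h1[mask] - h2[mask]) ** 2 / h1[mask]))

def verify(rt_subsample, model_hist, bins, threshold):
    """Identity verification: histogram the RT subsample, compare it
    with the subject's model, accept if the distance is below the
    (assumed) threshold."""
    h, _ = np.histogram(rt_subsample, bins=bins, density=True)
    return chi_square_distance(model_hist, h) < threshold

# toy usage with assumed bins, threshold and synthetic RT data
bins = np.linspace(0.0, 2.0, 21)  # RT in seconds
model_rt = np.random.default_rng(0).normal(0.8, 0.1, 300)
model_hist, _ = np.histogram(model_rt, bins=bins, density=True)
subsample = np.random.default_rng(1).choice(model_rt, size=32)
accepted = verify(subsample, model_hist, bins, threshold=5.0)
```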
4 Results
The FRR (false rejection rate, for the operator identification task) for the
correlation and chi-square functions decreases as the subsample size increases. The
FRR at the selected thresholds for the Timer and Shape stimuli (Table 2) is low
compared to the Sound and Horizon stimuli. This is due to two factors: first, the
number of RT values for these stimuli in the dataset for each test is about 300,
making a subsample of 128 values close to the full model of the subject and
substantially reducing the distance (or increasing the proximity) between them;
second, the correct response to these stimuli is more difficult for the subject than to
the Sound and Horizon stimuli, which can lead to visible differences in the
behavior of the subjects.
Table 2. The FRR for the Timer and Shape stimuli calculated for the FAR of 5%
We define the subject's error as the average penalty per second within an
experiment. The average error is defined as the error averaged over all experiments
for a certain Mode. Table 3 shows the subject's average error in different
scenarios and the portion of average error reduction, calculated as the proportion
of Hard Mode 2 experiments in which the subject's error was lower than the
average error in Hard Mode 1. It can be seen that for most subjects, adding a
visual delay to the appearance of the stimulus on average increased efficiency.
There are special cases: Subject 3, whose efficiency increased significantly, and
Subject 4, whose efficiency, on the contrary, decreased. Such changes indicate the
individual nature of the perception of simultaneous stimuli and that different
behaviors are optimal for different subjects.
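The two error measures can be written out directly (our sketch; the numbers are toy values, not experimental data):

```python
def average_error(penalties, durations):
    """Average error for a mode: mean over experiments of
    (penalty points accumulated) / (experiment length in seconds)."""
    per_experiment = [p / d for p, d in zip(penalties, durations)]
    return sum(per_experiment) / len(per_experiment)

def error_reduction_portion(hard2_errors, hard1_avg):
    """Portion of Hard Mode 2 experiments whose error is below the
    subject's average error in Hard Mode 1."""
    below = sum(1 for e in hard2_errors if e < hard1_avg)
    return below / len(hard2_errors)

# toy usage: three Hard Mode 1 experiments, four Hard Mode 2 errors
hard1_avg = average_error([60, 90, 75], [180, 150, 180])
portion = error_reduction_portion([0.30, 0.55, 0.25, 0.40], hard1_avg)
```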
Table 3. Average errors and error reduction coefficients when adding visual delay to
the Sound stimulus. Explanations in the text
Table 4. The correlation coefficients calculated for the metrics under consideration
between the error and the correspondence of the subject’s model
Table 4 indicates low correlation for every metric and every stimulus. The highest
values of the coefficient are achieved on the Sound stimulus. The chi-square
function represents a distance, so the correlation is positive: the greater the
distance to the subject's own model, the greater the error. The negative correlation
for the correlation function is explained similarly.
Normal Mode experiments in which the subject's error was more than twice
their average error are referred to as failed experiments (experiments with critical
errors). There were three such experiments: one for each of Subjects 3, 4,
and 5. We consider the hypothesis that the model of the subject's behavior in
these experiments differs significantly from the subject's general model. To
test the hypothesis, an analysis similar to the subject recognition problem is
carried out, except that only three subjects participate in the analysis and only
the values from the selected experiments are used as RT subsamples.
5 Conclusion
A behavioral model of the operator was built based on histograms of the dis-
tribution of reaction times (RT) for particular stimuli in a test experiment
that imitates certain pilot actions. It was shown that such an RT distribution
is unique to an individual, and therefore the operator can be identified based
solely on behavioral model matching. The accuracy of such identification depends
on the number of reaction times registered for the person being identified. For
example, with 64 measurements of RT for the Timer or Shape stimuli, the FAR
was 0% at a fixed FRR = 5% when identifying a particular operator among 5
possible ones. According to our results, the deviation of RT in a particular
experiment from the operator's typical distribution is a weak correlate of
performance; a strong deviation may indicate worse performance. Failure of
behavioral model identification of a particular operator (besides indicating
possible operator replacement) is a strong indicator of unsafe or inefficient
behavior, especially for the Shape stimulus. A particular case of assistive control
was considered which involves adding latency to stimuli visualization to suppress
perceptive or cognitive conflict.
References
1. Aloui, Z., Ahamada, N., Denoulet, J., Pierre, F., Rayrole, M., Gatti, M., Granado,
B.: Embedded real-time monitoring using SystemC in IMA Network. In: SAE 2016
Aerospace Systems and Technology Conference, September 2016, Hartford, United
States, pp. 1–4 (2016)
2. Didactic, F.: Process Control. Pressure, Flow, and Level. Legal Deposit — Library
and Archives Canada (2010)
3. Dittmeier, C., Casati, P.: Evaluating internal control systems. In: IIARF (2014)
4. Fotopoulos, J.: Process Control and Optimization Theory. Application to Heat
Treating Processes. Air Products and Chemicals Inc, Allentown (2006)
5. Kim, B., Bishu, R.: On assessing operator response time in human reliability anal-
ysis (HRA) using a possibilistic fuzzy regression model. Reliab. Eng. Syst. Saf.
52(1), 27–34 (1996)
6. Mahmud, J., Chen, J., Nichols, J.: When will you answer this? Estimating response
time in Twitter. In: Proceedings of the Seventh International AAAI Conference on
Weblogs and Social Media, pp. 697–700 (2013)
7. Martin, S., Vora, S., Yuen, K., Trivedi, M.: Dynamics of driver’s gaze: explorations
in behavior modeling and maneuver prediction. IEEE Trans. Intell. Veh. 3(2),
141–150 (2018)
8. O’Connor, D.: A Process Control Primer. Honeywell, Charlotte (2000)
9. Olum, Y.: Modern management theories and practices. In: East African Central
Banking Course, vol. 1, No. 11, pp. 5–6 (2004)
10. Rao, G.P.: Basic elements of control system. Control Syst. Robot. Autom. 1 (2009)
11. Stouffer, K., Pillitteri, V., Lightman, S., Abrams, M., Hahn, A.: Guide to industrial
control systems (ICS) security (2015)
Hierarchical Actor-Critic with Hindsight
for Mobile Robot with Continuous State Space
1 Introduction
goal-conditioned policies that use the state space to decompose a task into short sub-
tasks. The authors demonstrate experimentally, in both grid-world and simulated
robotics domains, that the HAC approach can significantly accelerate learning
relative to other non-hierarchical and hierarchical methods. Thus, the HAC
framework [6] is the first to successfully learn 3-level hierarchies in parallel in tasks
with continuous state and action spaces (Fig. 1).
Fig. 1. Results of the HAC framework with one, two and three levels of hierarchy.
Fig. 2. Simulated mobile robot environment (left) and HAC hierarchy (right).
64 S. Aleksey and A. I. Panov
The primitive action set consists of five actions: moving forward a given distance,
moving forward with rotation to the left or to the right, and rotating left or right
without moving forward. If the robot reaches the yellow sphere, the simulation
ends and the goal is achieved.
The tick marks along the trajectory show the robot's next states after each
primitive action is executed. The pink circles show the original subgoal actions. The
gray circles show the subgoal states reached in hindsight after at most H actions by the
low-level policy.
Hindsight action transitions help agents learn multiple levels of policies simulta-
neously by training each subgoal policy with respect to a transition function that
simulates the optimal lower level policy hierarchy.
For a toy example, the hindsight action transitions for the states s0 and s1 would be:
• [initial state = s0, action = s1, reward = −1, next state = s1, goal = yellow flag,
discount rate = γ]
• [initial state = s1, action = s2, reward = −1, next state = s2, goal = yellow flag,
discount rate = γ]
The second type of hindsight transition, hindsight goal transitions, helps each level
learn a goal-conditioned policy in sparse reward tasks by extending the idea of
Hindsight Experience Replay [7] to the hierarchical setting.
The hindsight goal transition created by the fifth primitive action that achieved the
hindsight goal would be:
• [initial state = 4th tick mark, action = joint torques, reward = 0, next state = s1,
goal = s1, discount rate = 0]
Assuming the last state reached s5 is used as the hindsight goal, the first and the last
hindsight goal transition for the high level would be:
• [initial state = s0, action = s1, reward = −1, next state = s1, goal = s5, discount
rate = γ]
• [initial state = s4, action = s5, reward = 0, next state = s5, goal = s5, discount
rate = 0]
Hindsight goal transitions should significantly help each level learn an effective
goal-conditioned policy because they guarantee that after every sequence of actions,
at least one transition will be created that contains the sparse reward (in our case,
a reward and discount rate of 0). These transitions containing the sparse reward
will in turn incentivize the UVFA critic function to assign relatively high Q-values
to the (state, action, goal) tuples described by these transitions. The UVFA can
then potentially generalize these high Q-values to other actions that could help the
level solve its tasks.
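The two transition types can be sketched as follows (our illustration of the HAC/HER relabeling idea; the tuple fields mirror the examples above, and the function and field names are our assumptions):

```python
def hindsight_action_transition(state, reached_subgoal, goal, gamma):
    """Replace the proposed subgoal action with the subgoal state the
    lower level actually reached, keeping the original goal."""
    return dict(state=state, action=reached_subgoal, reward=-1,
                next_state=reached_subgoal, goal=goal, discount=gamma)

def hindsight_goal_transitions(trajectory, gamma):
    """Relabel a (state, action, next_state) trajectory with its final
    state as the goal; only the step reaching it earns the sparse
    reward 0 (and discount 0), all other steps get reward -1."""
    final_goal = trajectory[-1][2]
    out = []
    for state, action, next_state in trajectory:
        done = next_state == final_goal
        out.append(dict(state=state, action=action,
                        reward=0 if done else -1,
                        next_state=next_state, goal=final_goal,
                        discount=0 if done else gamma))
    return out

# toy usage on a three-step trajectory ending in s3
traj = [("s0", "a0", "s1"), ("s1", "a1", "s2"), ("s2", "a2", "s3")]
relabeled = hindsight_goal_transitions(traj, gamma=0.98)
```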
Technically, HAC builds off three techniques from the reinforcement learning lit-
erature [2]:
• the Deep Deterministic Policy Gradient (DDPG) learning algorithm [8]
• Universal Value Function Approximators (UVFA) [9]
• Hindsight Experience Replay (HER) [7].
The actor network approximates the policy:

$$\pi : S \rightarrow A \qquad (1)$$
The critic network approximates the Q-function or the action-value function of the
current policy.
$$Q : S \times A \rightarrow \mathbb{R} \qquad (3)$$
The agent first interacts with the environment for a period using a noisy policy
$\pi(s) + \mathcal{N}(0, 1)$. The transitions experienced are stored as $(s_t, a_t, r_t, s_{t+1}, g_t)$.
The agent then updates its approximation of the Q-function of the current policy by
performing mini-batch gradient descent on the loss function:

$$L = (Q(s_t, a_t) - y_t)^2 \qquad (4)$$
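As an illustration, the mini-batch loss of Eq. (4), extended with the UVFA goal argument, can be sketched with a toy stand-in for the critic (a linear Q-function instead of the actual DDPG network; all names and values are our assumptions):

```python
import numpy as np

def critic_loss(q_fn, batch):
    """Mean of (Q(s_t, a_t, g_t) - y_t)^2 over a mini-batch, per
    Eq. (4), with a goal argument added in the UVFA style."""
    losses = [(q_fn(s, a, g) - y) ** 2 for (s, a, g, y) in batch]
    return float(np.mean(losses))

# toy usage: a hypothetical linear Q over scalar state/action/goal
w = np.zeros(3)
def q_fn(s, a, g):
    return w @ np.array([s, a, g])

batch = [(0.1, 0.2, 1.0, -1.0), (0.5, 0.1, 1.0, -0.5)]
loss_before = critic_loss(q_fn, batch)

# one mini-batch gradient-descent step on the loss
grad = np.zeros_like(w)
for s, a, g, y in batch:
    x = np.array([s, a, g])
    grad += 2 * (q_fn(s, a, g) - y) * x / len(batch)
w -= 0.1 * grad
loss_after = critic_loss(q_fn, batch)
```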
On the left side of the figure, the concatenated architecture is shown. At the center
is the two-stream architecture, with two separate sub-networks combined at h. On
the right is a decomposed view of the two-stream architecture when trained in two
stages: target embedding vectors are formed by matrix factorization (right
sub-diagram), and two embedding networks are trained with those as multi-variate
regression targets (left and center sub-diagrams).
Thus, instead of $Q^{\pi}(s_t, a_t) = \mathbb{E}[R_t \mid s_t, a_t]$, we use $Q^{\pi}(s_t, a_t, g_t) = \mathbb{E}[R_t \mid s_t, a_t, g_t]$.
3 Experiment
Fig. 6. Comparison of the performance of HAC with sensor-data features (right) and
without sensor-data features (left). The charts show the average success rate.
4 Results
Hierarchy has the potential to accelerate learning in sparse-reward tasks because
it can decompose tasks into short-horizon subtasks, and the HAC framework
can solve those simpler subtasks simultaneously. In our mobile robot environment
with a continuous state space, HAC outperforms the baseline algorithms by
20–30% for a relatively small environment area. As the size increases, this gap
will continue to grow due to hindsight actions. One issue with this approach is
that we cannot set rewards other than those used in hindsight. As a result, we
cannot penalize actions that lead the agent into wall collisions, which could be
harmful when training the algorithm in the real world with a real agent. The
biggest advantage of this approach is its hierarchical neural network structure:
the weights of the higher-level networks can be transferred to another agent or
another environment, which dramatically decreases training time.
Acknowledgements. The reported study was supported by RFBR, research project
no. 17-29-07079.
References
1. Schmidhuber, J.: Learning to generate sub-goals for action sequences. In: Kohonen, T.,
Mäkisara, K., Simula, O., Kangas, J. (eds.) Artificial Neural Networks, pp. 967–972.
Elsevier Science Publishers B.V., North-Holland (1991)
2. Konidaris, G.D., Barto, A.G.: Skill discovery in continuous reinforcement learning domains
using skill chaining. Adv. Neural. Inf. Process. Syst. 22, 1015–1023 (2009)
3. Bacon, P.-L., Harb, J., Precup, D.: The option-critic architecture. In: Proceedings of the
Thirty-First AAAI Conference on Artificial Intelligence, pp. 1726–1734 (2017)
4. Vezhnevets, A., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., Kavukcuoglu,
K.: FeUdal networks for hierarchical reinforcement learning. In: Proceedings of the 34th
International Conference on Machine Learning, pp. 3540–3549 (2017)
5. Nachum, O., Gu, S., Lee, H., Levine, S.: Data-efficient hierarchical reinforcement learning.
Adv. Neural. Inf. Process. Syst. 31, 3303–3313 (2018)
6. Levy, A., Konidaris, G., Platt, R., Saenko, K.: Learning multi-level hierarchies with
hindsight. arXiv:1712.00948. [cs.AI], March 2019
7. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B.,
Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay. Adv. Neural. Inf. Process.
Syst. 30, 5048–5058 (2017)
8. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.:
Continuous control with deep reinforcement learning. CoRR (2015). arXiv:1509.02971
9. Silver, D., Schaul, T., Horgan, D., Gregor, K.: Universal value function approximators. In:
International Conference on Machine Learning (July 2015)
10. Shikunov, M., Panov, A.I.: Hierarchical reinforcement learning approach for the road
intersection task. In: Samsonovich, A.V. (ed.) Biologically Inspired Cognitive Architectures
2019. Springer, Cham (2019)
11. Kuzmin, V., Panov, A.I.: Hierarchical reinforcement learning with options and united neural
network approximation. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov,
A. (eds.) Proceedings of the Third International Scientific Conference “Intelligent
Information Technologies for Industry” (IITI 2018), pp. 453–462. Springer, Cham (2018)
12. Ayunts, E., Panov, A.I.: Task planning in “Block World” with deep reinforcement learning.
In: Samsonovich, A.V., Klimov, V.V. (eds.) Biologically Inspired Cognitive Architectures
(BICA) for Young Scientists, pp. 3–9. Springer, Cham (2017)
The Hybrid Intelligent Information
System for Music Classification
1 Introduction
Recently, the use of machine learning in the field of music processing has been
increasing significantly. One such task is the problem of music classification.
Existing methods for this problem are reviewed in detail in [1,2]. We propose an
approach based on the concept of hybrid intelligent information systems (HIIS). In
this article, we consider the details of the HIIS-based system implementation for
music classification and discuss the results of experiments.
set of N user classes. Then the classification problem is reduced to the construc-
tion of the mapping algorithm:
consistent with the real users of the system. In other words, it is necessary to
build an algorithm that assigns one of the predefined class labels to each music
track of an arbitrary set according to the users' real music preferences. For this
purpose, the HIIS-based approach is used.
The HIIS approach is described in detail in [3]. According to [3], the HIIS
consists of two main components: the subconsciousness module (MS) and the
consciousness module (MC).
The subconsciousness module is related to the environment in which a HIIS
operates. Because the environment can be represented as a set of continuous
signals, the data processing techniques of the MS are mostly based on neural
networks, fuzzy logic, and combined neuro-fuzzy methods.
The consciousness module performs logical processing of information. It may
be based on traditional programming or workflow technology, and in particular,
the rule-based programming approach is gaining popularity.
In the proposed approach, both MS and MC are based on machine learning
algorithms. The generalized structure of the intelligent system is represented in
Fig. 1.
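A skeleton of this two-module split might look as follows (a hypothetical sketch, not the authors' implementation; the paper's MS is a set of LSTM-based binary classifiers and its MC uses decision trees, which are replaced here by trivial stand-ins):

```python
class SubconsciousnessModule:
    """MS: turns a raw signal into a score vector. Here a stand-in:
    one 'binary classifier' score per class."""
    def __init__(self, classifiers):
        self.classifiers = classifiers  # one callable per class
    def process(self, signal):
        return [clf(signal) for clf in self.classifiers]

class ConsciousnessModule:
    """MC: logical processing of the MS output. Here a trivial
    rule: pick the class with the highest score."""
    def decide(self, scores):
        return max(range(len(scores)), key=lambda i: scores[i])

class HIIS:
    """Pipeline: signal -> MS scores -> MC decision (class label)."""
    def __init__(self, ms, mc):
        self.ms, self.mc = ms, mc
    def classify(self, signal):
        return self.mc.decide(self.ms.process(signal))

# toy usage: three stand-in per-class scorers over a scalar "signal"
ms = SubconsciousnessModule([
    lambda x: 1.0 - abs(x - 0.2),
    lambda x: 1.0 - abs(x - 0.5),
    lambda x: 1.0 - abs(x - 0.9),
])
system = HIIS(ms, ConsciousnessModule())
label = system.classify(0.55)
```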
5 The Experiments
For our research, we used a custom dataset of 1378 tracks divided into three
classes, with a sampling frequency of 44,100 Hz; the average track length was
137.3 s.
Since we used a custom dataset, we did not have the opportunity to compare
the obtained quality metrics with those obtained by other researchers.
Therefore, to ensure the validity of the proposed approach, we have conducted
experiments with algorithms of three levels of complexity: the logistic regression
approach (the simplest model); the multilayer perceptron approach (the model
of medium complexity) [12]; the HIIS approach (the model of high complexity).
The Precision, Recall and F1-score (F-measure) were used as classification
metrics [13]. The experimental results are presented in Table 1.
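For reference, the per-class metrics can be computed as follows (standard definitions, shown as a minimal sketch rather than the authors' evaluation code):

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Per-class Precision, Recall and F1 from true/predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# toy usage on five labeled tracks, scoring class 1
p, r, f1 = precision_recall_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], cls=1)
```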
The results of the experiments matched expectations. The logistic regression
approach (the simplest model) shows the worst results, the multilayer perceptron
approach (the model of medium complexity) shows medium results, and the HIIS
approach (the model of high complexity) shows the best results. Thus, the results
of the experiments confirm the validity of the proposed approach.
To assess the quality of the classifier (the HIIS model), ROC curves were built
[14]. The ROC function can also be used in multiclass classification if the pre-
dicted outputs have been binarized; for this reason, ROC curves are plotted for
each class. There are then a number of ways to average binary metric calcula-
tions across the set of classes, each of which may be useful in some scenario (we
use micro-average and macro-average metrics). AUC (Area Under Curve) is an
76 A. Stikharnyi et al.
6 Conclusions
The article proposes an approach to the music classification problem using hybrid
intelligent information systems (HIIS). The hybrid system as a whole is imple-
mented as an intelligent agent using the experience replay approach.
The subconsciousness module is related to the environment in which a HIIS
operates. Because the environment can be represented as a set of continuous
signals, the data processing techniques of the MS are mostly based on neural
networks, fuzzy logic, and combined neuro-fuzzy methods. In the proposed
approach, it is implemented as a set of binary classifiers based on LSTM
networks.
The consciousness module performs logical processing of information. It may
be based on traditional programming, workflow technology, rule-based program-
ming. In the proposed approach it is implemented using decision trees.
The experiments were conducted using a custom dataset. The results of the
experiments confirm the validity of the proposed approach.
References
1. Fu, Z., Lu, G., Ting, K.M., Zhang, D.: A survey of audio-based music classification
and annotation. IEEE Trans. Multimedia 13(2), 303–319 (2011). https://doi.org/
10.1109/TMM.2010.2098858
2. Goienetxea, I., Martı́nez-Otzeta, J.M., Sierra, B., Mendialdua, I.: Towards the use
of similarity distances to music genre classification: a comparative study. PloS one
13(2), e0191417 (2018). https://doi.org/10.1371/journal.pone.0191417
3. Chernenkiy, V., Gapanyuk, Yu., Terekhov, V., Revunkov, G., Kaganov, Y.: The
hybrid intelligent information system approach as the basis for cognitive architec-
ture. Procedia Comput. Sci. 145, 143–152 (2018). http://www.sciencedirect.com/
science/article/pii/S187705091832307X
4. Zhang, S., Sutton, R.S.: A deeper look at experience replay. arXiv preprint
arXiv:1712.01275 (2017)
5. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P.,
McGrew, B., Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay.
arXiv preprint arXiv:1707.01495 (2017)
6. Koyejo, O.O., Natarajan, N., Ravikumar, P.K., Dhillon, I.S.: Consistent binary
classification with generalized performance metrics. In: Proceedings of the 27th
International Conference on Neural Information Processing Systems – vol. 2, pp.
2744–2752 (2014)
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
8. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
9. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applica-
tions, 2nd edn. World Scientific Publishing Co., New Jersey (2014)
10. The Scikit-Learn Library: Decision trees. https://scikit-learn.org/stable/modules/
tree.html. Accessed 24 May 2019
11. Modarres, R., Gastwirth, J.L.: A cautionary note on estimating the standard error
of the Gini index of inequality. Oxf. Bull. Econ. Stat. 68(3), 385–390 (2006)
12. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, New York
(1994)
13. Powers, D.: Evaluation: from precision, recall and F-measure to ROC, informed-
ness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
14. Pepe, M.S.: The Statistical Evaluation of Medical Tests for Classification and Pre-
diction. Oxford University Press, Oxford (2004)
The Hybrid Intelligent Information System
for Poems Generation
Abstract. Any generated text must have both "form" and "content" components. It
is the "content" that is the main component of the generated text, but the "form"
component is no less important. It may be necessary to generate texts in different
linguistic styles, among them the poetic linguistic style.
approach for poems generation problem using hybrid intelligent information
systems (HIIS). The HIIS consists of two main components: the subcon-
sciousness module and the consciousness module. In the case of poems gen-
eration, the subconsciousness module consists of two submodules: the stress
placement module and the rhyme and rhythm module. These modules use
machine learning techniques. The consciousness module includes the poem
synthesis module, which is rule-based. The stress placement module is based on
the convolutional neural network. On the test dataset, the accuracy of the
classifier is 97.66%. The rhyme and rhythm module is based on neural networks
with a depth of 5–7 layers. On the test dataset, the accuracy of this classifier is
91.63%.
1 Introduction
According to Gartner's report on BI Tools 2018 [1], "By 2020, natural language
generation and artificial intelligence will be a standard feature of 90% of modern
business intelligence platforms". Thus, taking into account the needs of the industry,
natural-language generation (NLG) is a very important area of software engineering
development [2].
The business intelligence platform may be considered as a special case of an
intelligent assistant agent. The concepts of such assistants have long been developed,
and there is no doubt that such software systems or hardware-software devices will be
used more and more. The natural-language generation (NLG) module should also be
part of such assistants.
It should be noted that the area of text-based speech synthesis is also actively
developing at present. Therefore, we can assume that by solving the problem of
generating text, we simultaneously create both a writing and a speaking agent.
Any generated text must have both "form" and "content" components. There is no
doubt that the "content" is the main component of the generated text. But if such
assistants are to be widely used, the "form" in which the text is conveyed becomes
no less important. In the process of generating text, the assistant should take into
account the age of the interlocutor, their level of knowledge, and other aspects.
Depending on the context of the situation and the peculiarities of the interlocutor, it
may be necessary to generate texts in different linguistic styles.
This article is devoted to text generation in the poetic linguistic style. On the
one hand, from the point of view of industry needs, the task of poem generation can
be viewed as a "toy" one; indeed, it is hard to imagine that even in the distant future
financial statements will be written in poetic form. But on the other hand, the task of
poem generation is simply a special case of the task of generating texts in different
"forms".
According to [3], traditional approaches to the generation of poems include:
1. Template Based Poetry Generation: templates of poetry forms are filled with words
that suit the defined constraints (either syntactic, rhythmic, or both).
2. Generate and Test Approaches: random word sequences are produced according to
formal requirements, that may involve metric, other formal constraints, and
semantic constraints.
3. Case-Based Reasoning Approaches: existing poems are retrieved, considering a
targeted message provided by the user, and are then adapted to fit the required
content.
4. Evolutionary Approaches: poetry generation is based on evolutionary computation.
Obviously, only the evolutionary approach takes full advantage of the methods of
artificial intelligence. Within the framework of the evolutionary approach, one of the
most detailed works is the dissertation [4].
Now, according to [5], methods of generating poems increasingly use artificial
intelligence techniques, especially deep neural networks. An example of such
an approach is the interactive poetry generation system "Hafez" [6, 7]. The "Hafez"
system generates poems in three steps:
1. Search for related rhyme words given user-supplied topic.
2. Create a finite-state acceptor (FSA) that incorporates the rhyme words and controls
meter.
3. Use a recurrent neural network (RNN) to generate the poem string, guided by the
FSA.
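A minimal sketch of steps 2–3 of such a pipeline, assuming illustrative `propose` (standing in for the RNN's next-word proposals) and `fits_meter` (standing in for the FSA's meter check) callbacks, neither of which is specified in [6, 7]:

```python
def generate_line(rhyme_word, fits_meter, propose, max_len=8):
    """Greedy sketch of FSA-constrained generation: at each step take the
    first proposed word that keeps the partial line metrically valid;
    the final position is reserved for the rhyme word."""
    line = []
    while len(line) < max_len:
        # The RNN stands behind `propose`; the FSA behind `fits_meter`.
        candidates = [rhyme_word] if len(line) == max_len - 1 else propose(line)
        word = next((w for w in candidates if fits_meter(line + [w])), None)
        if word is None:
            break  # no candidate fits the meter
        line.append(word)
    return line
```

In the real system the FSA encodes syllable-stress patterns rather than a simple length check, and the RNN scores many candidates at once.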
The features of the "Hafez" system are that, first, it is focused on dialogue with the
user and, second, it generates poems in English.
The proposed approach does not involve a dialogue with the user and is focused on
the Russian language.
Thus, despite many years of effort, the generation of poems remains an open
problem. The authors hope that this article will be a small step in the direction of
poem generation.
80 M. Taran et al.
To solve the poem generation problem, we propose to use an approach based on the
hybrid intelligent information system (HIIS). The HIIS-based approach is described in
detail in [8]. In this section, we briefly review the HIIS-based approach
and consider its application for poems generation. The generalized structure of a hybrid
intelligent information system is represented in Fig. 1.
According to [8], the HIIS structure consists of two main components: the sub-
consciousness module (MS) and the consciousness module (MC).
The MS (subconsciousness module) is related to the environment in which a HIIS
operates. Because the environment can be represented as a set of continuous signals,
the data processing techniques of the MS are mostly based on neural networks, fuzzy
logic, combined neuro-fuzzy methods, and machine learning techniques.
The MC (consciousness module) is traditionally based on conventional data and
knowledge processing, which may rely on traditional programming, workflow
technology, or a rule-based programming approach.
The advantages of a rule-based approach include flexibility: the program logic is not
hardcoded but forward-chained from rules that fire on the data. The disadvantages
include the possibility of rule cycling and the complexity of processing a large set of
rules. Nowadays, the Rete algorithm and its modifications are used for processing
large rule sets.
To build the module of consciousness, it is possible to use machine learning
techniques, for example, building a set of rules in the form of a decision tree.
The Hybrid Intelligent Information System for Poems Generation 81
From the interaction point of view, the following options or their combinations are
possible in a HIIS:
1. Interaction is implemented through the environment. The MS reads the data from
the environment, converts them, and transmits them to the MC. The MC performs
logic processing and returns the results to the MS (if transformation is required) or
directly to the environment. The MS transforms the results and writes them into the
environment, where they can be read by another HIIS.
2. The MI (Module of Interaction) is used for the interaction with another HIIS.
Depending on the tasks to be solved, the MI can interact with the MC (which is
typical for conventional information systems) or with the MS (which is typical for
systems based on soft computing).
3. User interaction can be carried out using the MC (which is typical for conventional
information systems) or through the MS (which can be used, for example, in
automated simulators).
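Interaction option 1 (environment-mediated) can be sketched as follows; this is only an illustration, and the class and method names are ours, not taken from [8]:

```python
class Environment:
    """Shared environment through which HIIS instances interact."""
    def __init__(self):
        self.data = {}

class HIIS:
    """A hybrid intelligent information system with an MS and an MC."""
    def __init__(self, ms_transform, mc_logic):
        self.ms_transform = ms_transform  # MS: converts environment signals
        self.mc_logic = mc_logic          # MC: rule-based logic processing

    def step(self, env, key):
        features = self.ms_transform(env.data[key])  # MS reads and converts
        result = self.mc_logic(features)             # MC processes logically
        env.data[key] = result  # MS writes back; another HIIS may read it

# Example: the MS normalizes a text signal, the MC applies its logic.
env = Environment()
env.data["text"] = "  some prose  "
HIIS(str.strip, str.upper).step(env, "text")
```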
In the case of poems generation, the subconsciousness module consists of two
submodules: the stress placement module and the rhyme and rhythm module. These
modules use machine learning techniques. The consciousness module includes the
poem synthesis module, which is rule-based. The generalized structure of the HIIS for
poems generation is represented in Fig. 2.
Fig. 2. The generalized structure of the HIIS for poems generation: the stress placement module, the rhyme and rhythm module, and the poem synthesis module.
The interaction is implemented through the environment. In this case, the text in
prose or poetic form is considered as the environment.
The proposed approach is implemented for the Russian language. The implementation
of the modules is discussed in detail in the following sections.
The input of the module is a word in Russian without stress marks, and the output is
the same word with the stress marked.
The module is built on a hybrid approach, combining both rule processing (for
simple cases) and machine learning (for more complex cases). The module operation
algorithm contains the following steps:
1. The input word is converted to the required format, morphological analysis is
performed, and the initial dataset is formed for further processing.
2. In order to detect simple cases, the generated data is processed using a set of rules.
An example of such a rule: the Russian letter "Ё" is always stressed.
3. If none of the rules fires, then the machine learning model is used.
4. At the output of the module, a stressed word is formed in a human-readable format
as well as in the form of a dataset for further processing.
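Steps 2–3 can be sketched as follows; the single-vowel rule and the `model_predict` callback are our illustrative assumptions, not taken from the paper:

```python
RUSSIAN_VOWELS = set("аеёиоуыэюя")

def place_stress(word, model_predict):
    """Return the index of the stressed letter: rules first, ML as fallback."""
    # Rule from the paper: the Russian letter "ё" is always stressed.
    if "ё" in word:
        return word.index("ё")
    # Illustrative extra rule: a word with a single vowel is stressed on it.
    vowel_positions = [i for i, ch in enumerate(word) if ch in RUSSIAN_VOWELS]
    if len(vowel_positions) == 1:
        return vowel_positions[0]
    # If none of the rules fires, the machine learning model is used.
    return model_predict(word)
```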
From the point of view of machine learning, the stress placement problem may be
considered as a problem of multi-class classification. The features of the model are the
word itself and additional data that are extracted after morphological analysis. The
target feature is the position of the stressed letter in the word.
Fig. 3. The neural network architecture for the stress placement module
The neural network architecture was chosen experimentally. The final architecture
is shown in Fig. 3. On the test dataset, the accuracy of the classifier was 97.66%. The
neural network was trained for 16 epochs. The results are shown in Fig. 4.
An example of the stress placement module output (in Russian, stressed letters are
capitalized): "на оснОве Этих дАнных трЕбуется восстановИть неЯвную завИ-
симость то есть пострОить алгорИтм спосОбный для любОго возмОжного
вхОдного объЕкта вЫдать достАточно тОчный классифицИрующий отвЕт"
(roughly: "based on these data, it is required to reconstruct the implicit dependency,
that is, to build an algorithm capable of producing a sufficiently accurate classifying
answer for any possible input object").
One or several sentences can be submitted to the module input, depending on the total
number of words. This behavior is caused by training on four-line stanzas.
First, the input text is divided into words. The stress placement module is used to
determine the stress for each word. Then the features for machine learning models are
created. These features include the selected syllables and stresses, as well as the last
few letters in words.
Different machine learning methods were used to determine the appropriate words
for rhyme, size, presence or absence of alliteration and other target features. A separate
model was trained to predict each individual target feature.
The search for rhyming words is performed on the basis of a pre-formed dictionary.
The dictionary contains both possible word endings and their alternations. Based on
empirically selected rules, only the most probable word sequences for a given text are
left in the dictionary. Neural networks with a depth of 5–7 layers were used to
determine the other target features.
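A simplified sketch of the dictionary-based rhyme lookup (ending alternations and the probability filtering are omitted; the function name is ours):

```python
def rhyme_candidates(word, dictionary, ending_len=3):
    """Return dictionary words whose ending matches that of `word`.
    A crude stand-in for the pre-formed endings dictionary described above."""
    ending = word[-ending_len:]
    return [w for w in dictionary if w != word and w.endswith(ending)]
```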
A separate task is the formation of a dataset for model training. A dataset was
prepared from poems by well-known authors satisfying specific conditions: each verse
contains four lines, and blank (unrhymed) verse is removed from the corpus.
Since the resulting dataset contained several thousand examples, it was decided to
set the values of the target features automatically. For this purpose, methods of
dimensionality reduction [9] (PCA algorithm) and hierarchical clustering [10] were
used. As a result, seven separate clusters were identified. The visualization of clusters
obtained by the t-SNE [11] algorithm is shown in Fig. 5.
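A minimal single-linkage sketch of the hierarchical clustering step, on one-dimensional feature values (the paper used library implementations on PCA-reduced data; this only illustrates the principle):

```python
from itertools import combinations

def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Single linkage: cluster distance = minimum pairwise point distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: min(abs(a - b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters
```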
On the test dataset, the accuracy of the classifier was 91.63%. The neural network
was trained for ten epochs. The results are shown in Fig. 6.
The poem synthesis module is rule-based. The module input is the prose text and
the set of features received from the rhyme and rhythm module. The module output is
the generated text in poetic form.
A stanza is not formed if the deviation from the template is two or more. Declension
and conjugation of words are also not performed in the current version, which
leaves room for further improvement in the quality of the system.
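The acceptance rule above can be sketched as follows (the per-line feature representation is our assumption):

```python
def stanza_accepted(line_features, template):
    """A stanza is rejected if it deviates from the template in two or more
    positions, i.e., accepted only with at most one deviation."""
    deviation = sum(1 for got, want in zip(line_features, template)
                    if got != want)
    return deviation < 2
```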
An example of the module output (in Russian; the input text is a fragment of a
cookbook): "Необходим сыр похожий на брынзу. \Отлично подойдет пармезан.
\Его надо порезать кусочками. \Все смешать и заправить маслом" (roughly:
"A cheese similar to brynza is needed. Parmesan will do fine. It must be cut into
pieces. Mix everything and dress with oil.").
6 Conclusions
The article proposes an approach to the poem generation problem using hybrid intelligent
information systems (HIIS). A HIIS consists of two main components: the subcon-
sciousness module (MS) and the consciousness module (MC).
In the case of poems generation, the subconsciousness module consists of two
submodules: the stress placement module and the rhyme and rhythm module. These
modules use machine learning techniques. The consciousness module includes the
poem synthesis module, which is rule-based.
The stress placement module is based on the convolutional neural network. On the
test dataset, the accuracy of the classifier is 97.66%.
The rhyme and rhythm module is based on neural networks with a depth of 5–7
layers. On the test dataset, the accuracy of the classifier is 91.63%.
The task of poem generation is simply a special case of the task of generating texts
in different "forms". The proposed approach allows generating text in poetic
form from prose text.
References
1. Gartner Report on BI Tools 2018. https://systelligent.com/gartner-report-on-bitools-2018.
Accessed 24 May 2019
2. Khurana, D., Koli, A., Khatter, K., Singh, S.: Natural language processing: state of the art,
current trends and challenges. arXiv preprint. arXiv:1708.05148 (2017)
3. Gervas, P.: Exploring quantitative evaluations of the creativity of automatic poets. In:
Workshop on Creative Systems, Approaches to Creativity in Artificial Intelligence and
Cognitive Science, 15th European Conference on Artificial Intelligence (2002)
4. Manurung, H.M.: An evolutionary algorithm approach to poetry generation. Ph.D. thesis,
Institute for Communicating and Collaborative Systems, School of Informatics, University
of Edinburgh (2003)
5. Pandya, M.: NLP based poetry analysis and generation. Technical report. https://doi.org/10.
13140/rg.2.2.35878.73285 (2016)
6. Ghazvininejad, M., Shi, X., Choi, Y., Knight, K.: Generating topical poetry. In: Proceedings
of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1183–
1191 (2016). https://doi.org/10.18653/v1/d16-1126
7. Ghazvininejad, M., Shi, X., Priyadarshi, J., Knight, K.: Hafez: an interactive poetry
generation system. In: Proceedings of ACL 2017, System Demonstrations, pp. 43–48
(2017). https://doi.org/10.18653/v1/p17-4008
8. Chernenkiy, V., Gapanyuk, Y., Terekhov, V., Revunkov, G., Kaganov, Y.: The hybrid
intelligent information system approach as the basis for cognitive architecture. Procedia
Comput. Sci. 145, 143–152 (2018). http://www.sciencedirect.com/science/article/pii/
S187705091832307X
9. Maaten, L.V., Postma, E.O., Herik, J.V.: Dimensionality reduction: a comparative review.
J. Mach. Learn. Res. 10(66–71), 13 (2009)
10. Mishra, H., Tripathi, S.: A comparative study of data clustering techniques. Int. Res. J. Eng.
Technol. (IRJET) 4(5), 1392–1398 (2017)
11. Linderman, G.C., Steinerberger, S.: Clustering with t-SNE, provably. arXiv preprint. arXiv:
1706.02582 (2017)
Cognitive Sciences and Brain-Computer
Interface, Adaptive Behavior and
Evolutionary Simulation
Is Information Density a Reliable Universal
Predictor of Eye Movement Patterns
in Silent Reading?
1 Introduction
Today, many studies in the field of computer vision focus, among other things, on eye
movement recognition [1, 2]. As a result, many computational models of eye move-
ments have appeared that help not only to make inferences about the linguistic processes
involved in reading, but also to diagnose neurocognitive disorders [3, 4]. These models
are usually developed for specific languages. To make a model cross-linguistically
applicable it is important to define universal factors that determine eye-movement
reading patterns. This paper investigates information density as such a universal factor.
Variability between languages remains a key issue in psychology and linguistics, as
understanding of universal patterns of reading can feed models of information pro-
cessing. Frost et al. speak of the necessity to define independent cross-linguistic
parameters that underlie theory-motivated models of reading [5]. Yet a number of
scholars deny universality in reading patterns across languages [6]. If such universals
exist, they represent general principles by which information from print is extracted by
the writing processing system. Moreover, the most obvious prediction that can be made
based on Universality Theory by Frost et al. is that there are different ways of visual
information encoding in different writing systems. However, the time it takes to extract
encoded meaning should remain comparable across languages regardless of the type of
encoding.
Humans with an unimpaired visual system sample their environment by making a
series of fixations and saccades [7]. During fixations, information intake from the
encoded input takes place, while saccades do not supply any useful data. However,
eye movements are largely under cognitive control, and the analysis of the temporal and
spatial characteristics of saccades during reading can reflect cognitive processing [8].
A number of studies show that during reading the upcoming visual input is partially
pre-processed in the parafovea [9, 10]. Thus, saccadic movements are common to all
humans irrespective of language and culture; saccadic sampling and retinal make-up
determine the speed at which visual information is encoded and made available to the
linguistic processor.
Any language makes use of reading to extract information from written texts.
However, reading process itself may differ widely across languages at different levels,
from single words to phrases and the text as a whole. For example, while reading the
same text translated into different languages the participants show the range of eye
tracking patterns that vary in the number and length of fixations, and the length of
forward saccades [11]. Thus, cross-language differences may affect the eye-tracking
patterns observed in reading [12, 13]. One such peculiarity is the so-called language
density reported in various studies [14, 15]. Density in a language is the amount of
information conveyed by one structural unit, for instance, a word or a character.
Different types of density are distinguished in the literature. Lexical density is
defined by the number of lexical items such as nouns, verbs, adjectives, and adverbs
used in the text [16]. It can be used to estimate lexical variability in L2 speech
production [15, 16] or to study properties of texts from different corpora [17]. Semantic
density is the number of semantic features associated with one verb [18]. It is used in
studies of language development and of language impairment in aphasia. Neighborhood density
is defined as the number of words that differ from a given word in only one phoneme in
any word position [19]. This type of language density can significantly influence oral
language decoding [20]. Propositional density is a measure of content richness in
language production [21].
For the study of written language decoding, two more kinds of density may be
informative. Visual density is the amount of visual information that is available per unit
of text [12]. Another type of density, information density, represents the amount of
information per word, depending on research goals and context [12]. Information
density has been exploited to characterize differences between languages, for instance,
German and English. Visual density has been shown to influence the length of forward
saccades, and information density influences fixation durations cross-linguistically [12].
Thus, visual and information density may account for cross-linguistic differences in
patterns observed for written language decoding.
Letters, phonemes, and syllables cross-linguistically have different information
density. For example, single letters or syllables of English except for single-letter
pronouns (I), articles (a) or inflectional morphology (-s; -ed; -ing) are not usually
syntactically informative. In a language like Russian, however, which has a relatively
transparent orthography and a rich inflectional paradigm, letters and syllables, espe-
cially at word offset, bear semantic and syntactic information. As a result, when it
comes to higher-level processing, cross-linguistic differences may emerge in the rela-
tive utility of allocating attention to various features of the input [13]. To this extent,
words of equal length can be considered visually denser in Russian than in English.
Based on the assumption that writing systems differ as to their density, Liversedge
et al. in [12] defined universal and language specific eye-movement patterns for Fin-
nish, English and Chinese [12]. However, no such investigation has been made yet with
regard to Russian language. To address the issue of language universality in reading,
the study replicates the experiment in [12] for Russian, whose writing system differs
from English as to a number of parameters (alphabet, agglutination, etc.). A systematic
comparative analysis of how information density affects the reading pattern for Russian
and English might be revealing for cross-linguistic modelling of reading patterns. Both
Russian and English are alphabetic languages that have vowels and consonants.
Therefore, their information densities can be compared: words in Russian are longer
than in English, so the information density should be greater for English than for
Russian. In line with the study [12], we should expect the number of fixations and the
saccade size to be greater, and the fixation length to be shorter, for Russian texts that
are equivalent to the original English texts used in [12].
2 Methods
To test the participants' proficiency in their native language (Russian), a C-test was
compiled. The C-test makes it possible to assess different types of linguistic knowledge
at the micro- and macrolevel and requires mastery of grammar and vocabulary. It can
thus be used to assess "global" language proficiency [22]. The C-test is also reported to
be highly reliable and valid [23]. The C-test used in the study was a version of the story
"Who is called Mowgli?", adapted according to the guidelines in [24]. The C-test
included 40 words whose second half was deleted and had to be restored by the
participants. Every response was scored on a scale from 0 to 3: 3 points for a correctly
recovered word; 2 points when the word stem is chosen correctly but the word form is
erroneous; 1 point when the word stem is chosen correctly but the initial form is used;
0 points for a wrong word stem or no answer. The maximum score is 120.
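The scoring scheme can be transcribed directly (the function name is ours):

```python
def c_test_total(item_scores):
    """Sum the per-item C-test scores: 40 items, each 0-3 points, max 120."""
    assert len(item_scores) == 40
    assert all(score in (0, 1, 2, 3) for score in item_scores)
    return sum(item_scores)
```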
27 Russian-speaking students took part in the experiment. All of them scored high
on the C-test (more than 95% of the maximum score). Eye movements were recorded
with an SMI Hi-Speed 1250 tracker. The sampling rate was set to 500 Hz. The
experiment began with a 9-point calibration. After that, the participants read eight
texts in Russian and answered comprehension questions. The texts used in the study
were translated from the stimuli used in [12]. They were split into 2 to 4 slides, so
that each slide contained 1–8 sentences. The Courier New font was used, with a
character subtension of 0.46° of visual angle.
92 V. A. Demareva and Yu. A. Edeleva
The sentence was selected as the unit of analysis. Four measures reflecting global
properties of eye movements were computed: (1) Total Sentence Reading Time,
(2) Average Number of Fixations, (3) Average Forward Saccade Size, and (4) Average
Fixation Duration. Data points beyond three standard deviations, as well as fixations
shorter than 60 ms or longer than 800 ms, were removed from further statistical
analysis. Statistical analysis was performed in MS Excel and Statistica 10.0 using
one-way analysis of variance. The study design and procedures were approved by the
Ethical Committee of Lobachevsky State University, and all participants provided
written informed consent in accordance with the Declaration of Helsinki.
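The outlier-removal step can be sketched as follows (the order of the two filters is our assumption):

```python
from statistics import mean, stdev

def clean_fixations(durations_ms):
    """Drop fixations shorter than 60 ms or longer than 800 ms, then
    remove data points beyond three standard deviations of the remainder."""
    kept = [d for d in durations_ms if 60 <= d <= 800]
    m, sd = mean(kept), stdev(kept)
    return [d for d in kept if abs(d - m) <= 3 * sd]
```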
Global eye-movement measures for Russian texts obtained in the study, as well as for
English texts, are provided in Table 2.
Table 2. Global eye-movement measures: total sentence reading time (in ms), average number
of fixations, average forward saccade size (in characters), and average fixation duration (in ms).
Standard deviations are provided in parentheses.
Eye-movement measures Russian English [12]
Total sentence reading time 4302 (1865) 3093 (777)
Average number of fixations 8.6 (2.38) 14.81 (2.93)
Average forward saccade size 7.78 (1.79) 8.53 (1.55)
Average fixation duration 195 (23) 207 (32)
Compared to the results of the study [12], our results for Russian texts show longer
average reading times, a smaller number of fixations, shorter forward saccades, and
shorter fixations. The texts themselves had a significant influence on the eye-movement
measures. For instance, there was a significant effect of text type on the total sentence
reading duration (F(6, 3701) = 27.5, p < 0.001), which could partially account for the
observed results [25].
The study fully reproduced the experimental design and analysis algorithm used in
[12]; however, only one of the expectations (shorter fixation durations) was confirmed.
Supposedly, the eye-movement pattern observed for Russian could be influenced by
additional language-specific properties of Russian other than information density.
Russian and English both belong to the Indo-European family [26]. Russian belongs to
the East Slavic group [27], and English is a language of the West Germanic group [26].
Compared to English, Russian is a highly inflectional language [28]. English orthog-
raphy, with its 26 letters, is considered irregular and morphophonemic: a word's
sound pattern depends on its meaning. Russian uses the Cyrillic script, whose alphabet
contains 33 letters. Compared to English, Russian orthography is considered fairly
regular [29]. Moreover, different types of linguistic density (lexical [15, 16], semantic
[18]) should also be considered.
4 Conclusion
Acknowledgment. This work was supported by the Russian Foundation for Basic Research
(grant No. 18-013-01169).
References
1. Leroux, M., Raison, M., Adadja, T., Achiche, S.: Combination of eye tracking and computer
vision for robotics control. In: Proceedings of 2015 IEEE International Conference on
Technologies for Practical Robot Applications (TePRA), Woburn, pp. 1–6 (2015)
2. George, A., Routray, A.: Fast and accurate algorithm for eye localization for gaze tracking in
low-resolution images. Comput. Vis. 10(7), 660–669 (2016)
3. Beltrán, J., García-Vázquez, M.S., Benois-Pineau, J., Gutierrez-Robledo, L.M., Dartigues,
J.-F.: Computational techniques for eye movements analysis towards supporting early
Diagnosis of Alzheimer’s disease: a review. Computational and Mathematical Methods in
Medicine 2018. https://www.hindawi.com/journals/cmmm/2018/2676409/cta/. Accessed 26
May 2019
4. Heinzle, J., Aponte, E.A., Stephan, K.E.: Computational models of eye movements and their
application to schizophrenia. Curr. Opin. Behav. Sci. 11, 21–29 (2016)
5. Frost, R.: Towards a universal model of reading. Behav. Brain Sci. 35(5), 263–279 (2012)
6. Coltheart, M., Crain, S.: Are there universals of reading? We don’t believe so. Behav. Brain
Sci. 35(5), 20–21 (2012). Invited commentary on ‘‘Towards a universal model of reading”
7. Findlay, J.M., Gilchrist, I.D.: Active Vision: The Psychology of Looking and Seeing. Oxford
University Press, Oxford (2003)
8. Liversedge, S.P., Findlay, J.M.: Saccadic eye movements and cognition. Trends Cogn. Sci. 4
(1), 6–14 (2000)
9. McConkie, G.W., Rayner, K.: The span of the effective stimulus during a fixation in reading.
Percept. Psychophysics 17, 578–586 (1975)
10. Rayner, K.: Eye movements and attention in reading, scene perception, and visual search.
Q. J. Exp. Psychol. 62, 1457–1506 (2009). The thirty-fifth Sir Frederick Bartlett Lecture
11. Rahaman, J., Agrawal, H., Srivastava, N., Chandrasekharan, S.: Recombinant enaction:
manipulatives generate new procedures in the imagination, by extending and recombining
action spaces. Cogn. Sci. 42, 370–415 (2018)
12. Liversedge, S.P., Drieghe, D., Li, X., Yan, G., Bai, X., Hyönä, J.: Universality in eye
movements and reading: a trilingual investigation. Cognition 147(3), 1–20 (2016)
13. Stoops, A., Christianson, K.: Parafoveal processing of inflectional morphology on Russian
nouns. J. Cogn. Psychol. 29(6), 653–669 (2017)
14. Crocker, M.W., Demberg, V., Teich, E.: Information density and linguistic encoding
(IDeaL). Künstl. Intell. 30, 77 (2016)
15. Gregori-Signes, C., Clavel-Arroitia, B.: Analyzing lexical density and lexical diversity in
university students’ written discourse. Procedia – Soc. Behav. Sci. 198, 546–556 (2015)
16. Reza, K., Gholami, J.: Lexical complexity development from dynamic systems theory
perspective: lexical density, diversity, and sophistication. Int. J. Instr. 10(4), 1–18 (2017)
17. Méndez, D., Ángeles, A.: Titles of scientific letters and research papers in astrophysics: a
comparative study of some linguistic aspects and their relationship with collaboration issues.
Adv. Lang. Literary Stud. 8(5), 128–139 (2017)
18. Borovsky, A., Ellis, E.M., Evans, J.L., Elman, J.L.: Semantic structure in vocabulary
knowledge interacts with lexical and sentence processing in infancy. Child Dev. 87(6),
1893–1908 (2016)
19. Nair, V., Biedermann, B., Nickels, L.: Understanding bilingual word learning: the role of
phonotactic probability and phonological neighborhood density. J. Speech Lang. Hear. Res.
60(12), 1–10 (2017)
20. Rispens, J., Baker, A., Duinmeijer, I.: Word recognition and nonword repetition in children
with language disorders: the effects of neighborhood density, lexical frequency, and
phonotactic probability. J. Speech Lang. Hear. Res. 58(1), 78–92 (2015)
21. Smolík, F., Stepankova, H., Vyhnálek, M., Nikolai, T., Horáková, K., Matejka, Š.:
Propositional density in spoken and written language of Czech-speaking patients with mild
cognitive impairment. J. Speech Lang. Hear. Res. 56(6), 1461–1470 (2016)
22. Eckes, T., Grotjahn, R.: A closer look at the construct validity of C-tests. Lang. Test. 23(3),
290–325 (2006)
23. Babaii, E., Ansary, H.: The C-test: a valid operationalization of reduced redundancy
principle? System 29, 209–219 (2001)
24. Cook, S.V., Pandža, N.B., Lancaster, A.K., Gor, K.: Fuzzy nonnative phonolexical
representations lead to fuzzy form-to-meaning mappings. Front. Psychol. 7, 1–17 (2016)
25. Demareva, V.A., Polevaia, A.V., Kushina, N.V.: The influence of language density on eye
movements in silent reading: an eye tracking study in Russian vs. English. Int.
J. Psychophysiol. 131S, S75–S76 (2018)
26. Baldi, P.H.: Indo-European languages. In: International Encyclopedia of the Social and
Behavioral Sciences, 2nd edn, Oxford, Pergamon (2015)
27. Zaprudski, S.: In the grip of replacive Bilingualism: the Belarusian language in contact with
Russian. Int. J. Sociol. Lang. 183, 97–118 (2007)
28. Maučec, M.S., Donaj, G.: Morphology in statistical machine translation from English to
highly inflectional language. Int. Test Conf. 47(1), 63–74 (2018)
29. Boulware-Gooden, R., Joshi, R.M., Grigorenko, E.: The role of phonology, morphology,
and orthography in English and Russian spelling. Dyslexia 21(2), 142–161 (2015)
Bistable Perception of Ambiguous
Images – Analytical Model
1 Introduction
Bistable perception is manifested when an ambiguous image, admitting two
interpretations, is presented to the subject. In that case, the image perception
oscillates in time in a random manner between those two possible interpreta-
tions [2]. Such bistability arises for different types of modality [3]: ambiguous
geometrical figures [Necker, 1832], figure-ground processes [4], etc. (cf. [5,6]).
Why do those oscillations occur? The concrete "microscopic" mechanism of the
phenomenon is not known (see [7]), but various formal models have been suggested,
based mainly on the idea of competition between distinct neuron populations
(engrams) [8]. The fundamental attribute of most such models is
the existence of fluctuations (noise), which lead to random switching between
different perceptions.
We exploit the popular model according to which the dynamical process of
bistable recognition may be reduced to a ball traveling along the energy
c Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 95–105, 2020.
https://doi.org/10.1007/978-3-030-30425-6_10
96 E. Meilikov and R. Farzetdinova
landscape in the presence of sufficiently strong "noise" [8]. Relatively deep wells
of that landscape correspond to old neuronal patterns ("long-stored" in
memory), while new images subjected to identification are shallower
wells. Image recognition is analogous to moving the ball into the nearest
deeper well corresponding to some known engram. The possible perception
bistability is then due to the fact that the probabilities of transitions into different
wells, corresponding to different images, differ weakly, while in the usual situation
(with unambiguous image recognition) one of these probabilities significantly
outweighs the other. The main problem now is to establish which details of
the system dynamics define the characteristics of bistable image recognition.
2 Energy Function
Due to fluctuations, the system state changes randomly, which results in the per-
ception bistability. It is suggested [3] that two neuron populations (two differ-
ent neuron graphs, or two engrams) represent the two possible interpretations of
the stimulus. These two populations "compete" with one another, changing the
activity of their neurons. Such a model is based on introducing some energy func-
tion U with two local minima, corresponding to the two different image perceptions,
and a barrier between these two states.
The temporal evolution of the neuron system is usually described with the
equation of its "movement" over the nonuniform energy landscape under the
action of a stochastic force representing noise perturbations [9,10]. We utilize
an alternative (and simpler) approach, suggesting that the system is in a
quasi-stationary state which can be described by the Arrhenius equation [11].
That would be true if the average energy Φ of noise fluctuations were less
than the height of the barrier separating the two system states. Below, we will see
that this assumption is valid. It is the aim of this work to show that this
"limited" model, though much simpler, gives no less (in some cases, more)
information than the more complicated models of [3]-type for describing bistable
perception. In addition, our approach is an analytical one, while other models
yield only numerical results.
Usually, the energy function is written, by analogy with the phenomenological
theory of phase transitions [12], in the form of a power function of some state
parameter whose change corresponds to the dynamic transition of the system
from one state to another. However, such a power form is justified only by
the possibility of expanding the function U, in the neighborhood of its minima,
in powers of the state parameter. Therefore, the form of that function could be
selected arbitrarily (mainly, for convenience) from the class of preferably simple
functions that describe the needed evolution of the two-well potential as the
state parameter changes. Specifically, we write that function in the form
U(θ) = −U0 (sin² θ + Jθ), (1)
where θ is the generalized coordinate of the system state (the dynamical variable,
or the order parameter), U0 is the typical system “energy”. Here J(t) is the
control parameter, generally time-dependent, that defines the system state. For
instance, in the case of the Necker cube (see below) the image contrast could
Perception of Ambiguous Images 97
play the role of such a control parameter. We will be interested in the range of the parameter θ that corresponds to those minima of the function U(θ) which are closest to the point θ = 0. At J = 0 these extrema are located at the points θ1 = −π/2, θ2 = π/2 (minima) and θ0 = 0 (maximum). If J ≠ 0, then the maximum shifts to the point where sin 2θ0 = J, and the minima to the points θ1 = −π/2 + θ0, θ2 = π/2 + θ0 (see Fig. 1).
As the parameter J rises, the tilt of the energy landscape changes: the first minimum becomes shallower, the second one deeper, and the barrier between them diminishes. Let, for instance, J = −1 in the original state, with the system residing in the first, deep minimum. Then, as the control parameter J rises, the system will move (due to fluctuations) from the state θ1 (where it resided at J = −1) to the state θ2, clearing the reduced barrier with its top at the point θ0. The barrier disappears entirely at J = +1 (see Fig. 2).
Under cyclic variation of the parameter J, the system does not have time to follow it, and, due to such an "inertia", the hysteretic dependence θ(J) arises, shown in the inset of Fig. 2 and associated with system transitions from one well to another over the separating barrier of finite height. In the example case, the transition occurs at J = ±0.5.
The barrier heights Δ12 and Δ21, obstructing system transitions from the minimum θ1 to the minimum θ2 and back, are readily found from Eq. (1):

Δ12/U0 = √(1 − J²) + J · (arcsin J − π/2), Δ21/U0 = √(1 − J²) + J · (arcsin J + π/2). (2)
Instead of explicitly accounting for the noise influence, we will use the well-known Arrhenius-Kramers formula [13] for the mean lifetime τ of the system in a certain quasi-stationary state, which is determined by the relation between the height Δ of the "energy" barrier and the mean value Φ of the noise fluctuation energy (that value could be called the chemical temperature)¹:

¹ By fluctuations we mean deviations of ion or neurotransmitter concentrations in synaptic contacts; that is why we call this noise chemical. The term is purely phenomenological, and different processes may group together under this heading. But, in any case, the electric potential of a membrane fluctuates in a random manner (see [14]).
τ = τ0 exp(Δ/Φ), (3)
where τ0 is a constant to be estimated (see below), which by its general meaning is the time between two successive attempts to clear the barrier. In fact, this relationship defines the probability of the system transition into one or the other state. The chemical, or noise, temperature Φ is the chemical analog of the temperature of thermal fluctuations (to which the thermal energy corresponds in chemical kinetics).
3 Hysteresis
To estimate the width of the hysteresis loop for the dependence θ(J) (for instance, when the control parameter J(t) varies with time), we rely on the assumption that the transitions θ1 → θ2 and θ2 → θ1 between the minima of the energy U(θ) occur not at the moment when the barrier between these two states disappears, but under the condition that the lifetime τ of the current state (see Eq. (3)) diminishes (due to the reducing barrier height) so that it becomes much less than the time T of the J-parameter sweep, that is, under the condition τ ≪ T.
In the experiment [10], the Necker cube has been presented as an ambiguous figure (see Fig. 4), with the contrast of the three neighboring cube edges meeting in its left middle corner as the control parameter, −1 < J < 1. The values J = −1 and J = +1 correspond, respectively, to luminosities j = 0 and j = 255 for the pixels of those edge images on an 8-bit gray scale. Thus, the contrast J (the control parameter) has been defined by the relation J = 2j/255 − 1, where j is the luminosity of those lines on the given scale. In such a case, the contrast of the three middle cube edges meeting in the right middle corner equals 1 − 2J, and the contrast of the six visible outer cube edges equals 1. In the symmetrical case J = 0, so that the parameter J defines the deviation from symmetry. For the pure left cube J = −1, and for the pure right cube J = 1.
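The contrast definition above is a simple affine map of the 8-bit luminosity; as a one-line sketch:

```python
def contrast(j):
    # J = 2j/255 - 1: maps the 8-bit edge luminosity j (0..255)
    # to the control parameter J in [-1, 1].
    return 2.0 * j / 255.0 - 1.0
```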
100 E. Meilikov and R. Farzetdinova
Fig. 4. Images of Necker cubes with different contrasts being defined by the control
parameter J [10].
In the course of the experiment, cube images with N random values Ji of the control parameter (i = 1, 2, . . . , N) have been presented. Subjects have been requested to press buttons on the control panel according to their initial impression: whether the cube is "left" (Fig. 4a) or "right" (Fig. 4e). Each cube with a fixed value Ji of the control parameter has been randomly presented many times. For each value Ji of the control parameter, the probability PL(Ji) = l(Ji)/[l(Ji) + r(Ji)] of observing the left cube has been calculated. Here l(Ji) and r(Ji) are, respectively, the numbers of presses of the left or the right button after presenting cubes with the value Ji of the control parameter.
The experimental results shown in Fig. 5 are qualitatively similar for all subjects but differ quantitatively. For some observers, the perception of images as left cubes transforms steeply into their perception as right cubes (near the "symmetry point" J = 0, where PL = 0.5; see the upper panel of Fig. 5), while for others this conversion is smeared (see the lower panel of Fig. 5).
In [10] those results are associated with the competition of different neuron populations near the cusp point of catastrophe theory with noise included [15]. Our approach is much simpler: we use the Arrhenius relation (3) for the system lifetime in a metastable state, which permits describing correctly not only the dependency PL(J) but also the hysteresis of image perception under cyclic variation of the control parameter (see below).
We can identify the memorized patterns of the left and right cubes with long-formed wells of the energy landscape, and the new image to be recognized with a virtual (recently formed) well. Recognizing the image in this model is the transfer of the system from the new well of the energy landscape, corresponding to the presented image, into one of the two other wells, corresponding in our case to the engrams of the left and right cubes. The direction of such a (to some extent random) transfer is defined by the fact that the barriers between the initial well and the two final wells have different heights. The barrier between the wells of more similar images is lower, which leads to the preferred transfer from the well of the presented image into the well of the more similar memorized one.
Let ΔL and ΔR be the heights of the indicated barriers. If the presented image is more similar to the left cube image, then ΔL < ΔR, and conversely. It is clear that the more the contrast of the presented cube differs from the zero contrast
while in the latter case the noise intensity is high enough, Φ/(ΔL − ΔR) ∼ 1, being comparable with the barrier heights.
h = (τf + τb ) /T − 1, (11)
which goes to zero (and even becomes negative) when τf, τb < T/2, and is distinct from zero at low T, when τf, τb > T/2 and h > 0. The hysteresis loops for these two cases run in opposite directions: clockwise (h < 0) and anticlockwise (h > 0). As seen from (6), the case h < 0 is realized under the condition

Φ/U0 > 1/ln(γT/τ0), (12)

which corresponds to a high enough (other factors being equal) intensity of fluctuations Φ/U0, provoking "advanced" transitions between the energy minima over high barriers.
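The width of Eq. (11) itself is a one-liner; a sketch with the sign convention of the text (positive h for delayed, anticlockwise loops):

```python
def hysteresis_width(tau_f, tau_b, T):
    # Eq. (11): h > 0 when tau_f, tau_b > T/2 (anticlockwise loop),
    # h < 0 when tau_f, tau_b < T/2 (clockwise loop).
    return (tau_f + tau_b) / T - 1.0
```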
The logarithmic dependence predicted by our model agrees with the experiment [9], which allows estimating some model parameters numerically. Figure 6 presents two typical experimental dependences of the hysteresis width h on T (for two different subjects), which are well approximated by straight lines on the logarithmic scale. For numerical estimates, it is convenient to introduce the dimensional constant τ1 = 1 s and rewrite Eq. (6) in the dimensionless form h = A − B ln(T/τ1), where
switching moment (t = tf) reaches some critical value Jc. For linear sweeping in the forward direction J = −1 + t/T, so that Jc = tf/T, or tf = Jc · T, which corresponds to the simple rule: the switching time is proportional to the sweeping time. That rule is to some extent confirmed by the experiment [9]; see Fig. 7, where the experimental dependences τf,b(T) are presented. One can see that, in spite of the high data scatter, those dependences can in fact be considered linear. They correspond to the value Jc ≈ 0.15. Hence, in that model the switching should happen every time the contrast difference reaches ∼15%. However, this over-simplified model predicts the constant hysteresis width h ≈ −0.7 (see (11)), which contradicts the experiment.
6 Conclusions
References
1. Necker, L.: Observations on some remarkable phenomena which occur on viewing a figure of a crystal or geometrical solid. London Edinb. Philos. Mag. J. Sci. 3, 329–337 (1832)
2. Huguet, G., Rinzel, J., Hupé, J.-M.: Noise and adaptation in multistable perception: noise drives when to switch, adaptation determines percept choice. J. Vis. 14(3), 19 (2014)
3. Moreno-Bote, R., Rinzel, J., Rubin, N.: Noise-induced alternations in an attractor
network model of perceptual bistability. J. Neurophysiol. 98, 1125–1139 (2007)
4. Pressnitzer, D., Hupé, J.M.: Temporal dynamics of auditory and visual bistability
reveal common principles of perceptual organization. Curr. Biol. 16, 1351–1357
(2006)
5. Leopold, D.A., Logothetis, N.K.: Multistable phenomena: changing views in per-
ception. Trends Cogn. Sci. (Regul. Ed.) 3, 254–264 (1999)
6. Long, G.M., Toppino, T.C.: Enduring interest in perceptual ambiguity: alternating
views of reversible figures. Psychol. Bull. 130, 748–768 (2004)
7. Sterzer, P., Kleinschmidt, A., Rees, G.: The neural bases of multistable perception.
Trends Cogn. Sci. 13(7), 310–318 (2009)
8. Haken, H.: Principles of Brain Functioning. Springer, Berlin (1996)
9. Pisarchik, A.N., Jaimes-Reátegui, R., Alejandro Magallón-Garcia, C.D., Obed
Castillo-Morales, C.: Critical slowing down and noise-induced intermittency in
bistable perception: bifurcation analysis. Biol. Cybern. 108(4), 397–404 (2014).
https://doi.org/10.1007/s00422-014-0607-5
10. Runnova, A.E., Hramov, A.E., Grubov, V.V., Koronovskii, A.E., Kurovskaya,
M.K., Pisarchik, A.N.: Chaos. Solitons Fractals 93, 201–206 (2016)
11. Stiller, W.: Arrhenius Equation and Non-Equilibrium Kinetics. BSB B.G. Teubner Verlagsgesellschaft, Leipzig (1989)
12. Toledano, J.-C., Toledano, P.: The Landau Theory of Phase Transitions. World Scientific, Singapore (1987)
13. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of
chemical reactions. Physica 7, 284–304 (1940)
14. Burns, B.D.: The Uncertain Nervous System. Edward Arnold (Publishers) Ltd.,
London (1968)
15. Poston, T., Stewart, I.: Catastrophe Theory and its Applications. Pitman, London
(1978)
Video-Computer Technology of Real Time
Vehicle Driver Fatigue Monitoring
Abstract. This article is devoted to the topical problem of monitoring human fatigue and reduced attention concentration in transport. The authors consider what is, in their view, the most efficient method of assessing a person's psycho-emotional condition: video control based on eye state analysis. It is based on a convolutional neural network with its own topology. The problems of choosing the optimal network depth for real-time operation and of achieving high accuracy on a single-board computer with an ARM processor architecture are analyzed. As a research result, a prototype of the software and hardware complex is presented. This prototype detects human fatigue by means of eye video image analysis, and the system can reduce the number of car accidents associated with the vehicle driver falling asleep. In conclusion, short-term project development prospects are proposed. Fatigue of a person engaged in control, management or decision-making, and a decrease of attention concentration on the object, can lead to critical consequences. The most efficient control of a person's physiological state is video control based on eye state analysis. An algorithm based on a convolutional neural network, and its hardware implementation, providing face search in the image, eye detection and analysis of the eye state by the "open-closed" principle, is proposed.
1 Introduction
admitted that they woke up after a collision or after exiting the roadway. About 20% of all accidents are caused by falling asleep while driving [5]. A sleepy driver, like a drunk one, is extremely dangerous on the road. Every year the number of car accidents caused by drivers falling asleep increases worldwide. A survey conducted in Norway found that 1 in 12 car drivers fell asleep while driving at least once during a year.
The main signs of decreased driver concentration are the following:
• difficulty focusing vision;
• frequent eye blinking;
• a feeling of heavy eyelids;
• difficulty keeping the head straight;
• frequent yawning;
• the driver can hardly remember the last traveled kilometers;
• the driver passes road signs without paying attention to them;
• the car often drifts out of its lane;
• difficulty keeping distance;
• the car touches the rumble strip on the road side.
In addition, the physiological reactions of the cardiovascular, respiratory and central nervous systems change in a state of fatigue and drowsiness. Behavioral indicators such as yawning, blinking, head tilting and a long gaze away from the road are often used to detect signs of decreased driver attention concentration automatically [13].
Observation and video processing systems integrated into a vehicle can significantly improve transport security. The most informative sign of decreased concentration is the dynamics of the eye state [6]. Developments of this type [2], implemented in hardware, can be mentioned. Despite the apparent simplicity of detecting open and closed eyes in a video frame, the presented systems are far from perfect. Eye detection difficulties arise when the driver turns his head, for example, looking at the side window or the rear-view mirror, or at night, with variable road illumination and oncoming lights. There are also many other factors that make operation of the proposed systems difficult [7].
Recently, systems for monitoring a driver falling asleep, based on analysis of the face and eyes in the video camera image, have begun to appear actively. Such systems include CoDriver made by Jungo, devices integrated into the steering wheel provided by Johnson safety system, the Driver Alert Sleep Warning Device, and others. Existing developments assume that the camera is in a position to "see" the whole face. This arrangement can be inconvenient and also makes it difficult to integrate such systems into many vehicles where this optical sensor arrangement is not acceptable due to design features. An acute problem for many systems is the use of eyeglasses: specialized lenses can create flares and distort the appearance of the eyes, making the operation of such systems impossible.
108 Y. R. Muratov et al.
The developed complex should also be capable of fast adaptation in the case of sharp brightness variation (front lighting, driving in a tunnel, light reflection, etc.), and should have a broad range of working temperatures, low power consumption and a reasonable price. The system requires determining the position of the driver's eyes in the image. Before the system can detect the eye position, it is necessary to localize the driver's face.
One of the best-known methods of face detection is the Viola-Jones method [3, 4]. Its basic principle is the integral representation of the image, which allows counting the total brightness of any rectangle in the image. The integral characteristics are used to calculate features based on Haar primitives [9], and the output is produced using boosting [10]. Training proceeds very slowly; however, the search for an object (a face) runs very quickly, though it is insufficiently accurate for some head positions. Another relevant image search algorithm is the Single Shot MultiBox Detector (SSD), based on convolutional neural networks such as MobileNet. Such algorithms have higher accuracy than Viola-Jones (more than 90%). However, implementation of a convolutional network on an ARM CPU shows poor performance. One more way of detecting objects in the image is the histogram of oriented gradients (HOG). The method is based on calculating the directions of image brightness gradients and finding the area where the majority of them match a template. In other words, it is necessary to find the section of the image whose HOG representation is most similar to the HOG representation of the face structure. HOG allows detecting a face with the ability to trade off performance against accuracy. Thus, for example, in the DLib library [11] the authors achieved a detection accuracy of 99.38%. Under the constraints imposed by the ARM processor architecture and camera angles, HOG was chosen. The result of face localization by the HOG algorithm is the coordinates of a square frame containing the whole face or a large part of it.
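The integral (summed-area) representation underlying the Viola-Jones method can be sketched in a few lines; this is a generic illustration of the technique, not the authors' implementation:

```python
def integral_image(img):
    # img: 2-D list of pixel values. Returns a summed-area table with
    # an extra zero row and column, so rectangle sums need no edge cases.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, top, left, height, width):
    # Total brightness of any rectangle from four table lookups,
    # independent of the rectangle size.
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])
```

A Haar feature is then a signed combination of two or three such rectangle sums, which is why detection runs quickly once the table is built.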
Eye detection is also possible with several different algorithms. The first and most commonly used are Haar cascades. The algorithm gives correct results in 80% of cases when the face is fully visible. In poor lighting and night driving conditions, the algorithm works unsatisfactorily. The low performance of the algorithm's implementation is another disadvantage of Haar cascades. The most productive of all the algorithms is the analytical algorithm for determining facial landmark points. Versions of this algorithm using different datasets are able to determine from 5 to 68 facial points at a speed of about 7000 FPS on the classical x86-x64 CPU architecture. Figure 1 presents the results of both methods, Haar cascades and the analytical one, in a case where the eyes were not localized by Haar cascades. Eyes selected by Haar cascades are highlighted by black rectangles; data obtained by the analytical algorithm are highlighted by grey rectangles.
Video-Computer Technology of Real Time Vehicle Driver 109
To detect the eye state, either open or closed, one can analyze the coordinates of the eye points obtained by the analytical algorithm. In our implementation, each eye is described by six points. However, the accuracy of this analysis depends on the size of the eyes and eyelids of people of different races, as well as on lighting conditions and head positions. For certain head positions the analytical algorithm gives a front point of the eye that does not actually belong to the eye. Therefore, it was decided to use a neural network algorithm to detect the state of the eye. Keras was used as the training platform [8]. Models produced with this framework are lightweight and integrate well with OpenCV.
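One common heuristic for the six-point analysis mentioned above is the eye aspect ratio; the landmark ordering below follows the widespread 68-point convention and is an assumption for illustration, not necessarily the authors' exact rule:

```python
import math

def eye_aspect_ratio(p):
    # p: six (x, y) eye landmarks; p[0], p[3] are the horizontal corners,
    # p[1], p[2] the upper lid, p[5], p[4] the lower lid (assumed order).
    # The ratio falls towards zero as the eye closes.
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2.0 * dist(p[0], p[3]))
```

As the text notes, a fixed threshold on such a ratio is fragile across eye shapes and head poses, which motivates the neural network classifier instead.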
The input of the neural network is the image of an eye and its neighborhood of several pixels. The image was scaled to a size of 96×96, which proved optimal. The scaled image is then fed to the input of the neural network. The output has 3 values:
1. the probability that the image contains an open eye;
2. the probability that the image contains a closed eye;
3. the probability that the image contains no eye at all.
The third value allows excluding false operation of the complex when the eye (face) was not found, for example, when the head is turned. One of the main problems was the choice of the network architecture.
Small networks have low accuracy; however, they make a decision very quickly. Large networks, such as Xception [7, 12], on the contrary, work slowly but give high accuracy. An optimal implementation was reached by creating a new variant of network architecture that has the highest accuracy among those capable of working in real-time mode on a RockChip RK3399 CPU (less than 30 ms per frame). Eight variants of neural network architectures were analyzed. The experiment results are shown in Fig. 2. As a result, the optimal network has the structure P3C32P2C64P2C128P2C256P2D1024D3, where:
Cn – convolution operation on the image extracting n features;
Pm – subsampling operation (max pooling) with an m×m kernel;
Dk – fully connected layer of k neurons.
Fig. 2. Comparison of different neural network models depending on speed and accuracy
This model works with square color images of the eye area of size 96×96 and represents the consecutive execution of convolution and pooling operations, increasing the number of characteristics until an array consisting of only 1024 characteristics is obtained. After that, the received characteristics go to a fully connected layer of 1024 neurons, and then the images are separated into the three required classes: BAD, OPEN and CLOSE. The ReLU function was selected as the activation function:
f(x) = 0 if x < 0; f(x) = x if x ≥ 0. (1)
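The reduction to 1024 characteristics can be checked by tracing tensor shapes through the P3C32...P2 stack. This is a sketch assuming 'same'-padded convolutions and non-overlapping pooling (an assumption: the paper does not state the padding), not the authors' code:

```python
def trace_shapes(side=96, channels=3):
    # Layer sequence P3 C32 P2 C64 P2 C128 P2 C256 P2, then flatten.
    layers = [("P", 3), ("C", 32), ("P", 2), ("C", 64), ("P", 2),
              ("C", 128), ("P", 2), ("C", 256), ("P", 2)]
    for kind, n in layers:
        if kind == "P":
            side //= n          # max pooling with an n x n window
        else:
            channels = n        # 'same' convolution keeps the spatial size
    return side, channels, side * side * channels
```

Tracing from 96×96×3 gives a final 2×2×256 map, i.e. exactly 2·2·256 = 1024 features entering the D1024 dense layer, consistent with the text.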
Training was performed on a specially labeled set of more than 150,000 images of open and closed eyes. The network was trained for 6 iterations. Figure 4 shows the dependence of false positives on the training iterations.
The training used our own datasets, which included eye images of university students of different nationalities, under different lighting conditions and with different facial attributes. This ensured high-quality operation regardless of conditions. The result of the trained network is shown in Fig. 5. The test sample shows that the presented model demonstrates high accuracy under various conditions: glasses, glare, sudden changes in brightness, etc.
The algorithms are implemented in C++ using the OpenCV library on the ARM architecture.
3 Experiment Results
Figure 8 presents the result of the neural network algorithm for detecting the eye state. The presented group of algorithms allows finding and selecting the required areas regardless of the shooting conditions and the presence of glasses. Testing of the system prototype made it possible to confidently determine the moments of closing and opening the eyes with the following shooting parameters:
• daylight and head positions ±45° horizontally;
• night mode (illumination by IR diodes with a wavelength of 840 nm) and head positions ±35° horizontally;
• glasses with diopters ±5, day and night lighting, for cases when the glasses' temple does not cover the eyes in the image (head rotation angles ±35° horizontally);
• safety glasses with a light degree of shading, daylight, for cases when the glasses do not cover the eye in the image (head rotation angles ±35° horizontally).

Fig. 7. The result of the face extraction module using a neural network

Fig. 8. The eye detection result. Eyes closed on the left and open on the right
4 Conclusion
References
1. Dushkov, B.A., et al.: Fundamentals of Engineering Psychology, p. 576, Moscow-
Yekaterinburg (2002)
2. Alyushin, M.V., Alyushin, A.V., Belopolsky, V.M., Kolobashkina, L.V., Ushakov, V.L.:
Optical technologies for monitoring systems of the current functional state of the operational
composition of the management of nuclear power facilities. Global Nucl. Saf. 6, 9–77
(2003). Moscow
3. Melnik, O.V., Demidova, K.A., Nikiforov, M.B., Ustyukov, D.I.: Continuous monitoring of
blood pressure of the vehicle crew and decision makers. Defense Technol. Sci. Tech.
Collect./FSUE “NIISU” 9, 77–80 (2016)
4. Sahayadhas, A., Sundaraj, K., Murugappan, M.: Detecting driver drowsiness based on sensors: a review. Sensors (Basel) 12(12), 16937–16953 (2012)
5. Ovcharenko, M.S.: Analysis and forecast of the state and level of accidents on the roads of
the Russian Federation and ways to reduce it. Sci. Methodical Electron. J. Concept 15,
1661–1665 (2002)
6. Dimov, I.S., Derevyanko, R.E., Kotin, D.A.: Automated system for preventing the driver
from falling asleep while driving. Vestn. MGTU 20(4), 659–664 (2017)
7. Image Processing in Aviation Vision Systems. Kostyashkin, L.N., Nikiforov, M.B. (eds.),
p. 240. Fizmatlit, Moscow (2016)
8. Chollet, F.: Keras. https://github.com/fchollet/keras. Accessed 21 Nov 2015
9. Viola, P., Jones, M.: Robust Real-time Object Detection. Cambridge Research Laboratory,
Cambridge (2001)
10. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features.
Conf. Comput. Vis. Pattern Recogn. 1, l-511–l-518 (2001)
11. King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
12. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR, vol.
2 (2017)
13. Furman, G., Baharav, A., Cahan, C., Akselrod, S.: Early detection of falling asleep at the
wheel: a heart rate variability approach. Comput. Cardiol. 35, 1109–1112 (2008)
Consistency Across Functional Connectivity
Methods and Graph Topological Properties
in EEG Sensor Space
Abstract. One of the most widely used topological properties of brain graphs is
small-worldness. However, different functional connectivity methods can gen-
erate quantitatively different results, particularly when they are applied to EEG
sensor space. In this manuscript, we sought to evaluate the consistency of values
derived from pairwise correlation between selected functional connectivity
methods. We showed that the alpha band yielded maximal values of correlation
coefficients between small-worldness indices obtained with different methods. In
contrast, delta and gamma bands demonstrated the least consistent results.
1 Introduction
The recent progress in neuroscience has made it possible to frame brain functioning in terms of graph theory. There are many metrics to evaluate the topological features of complex networks. Watts and Strogatz defined a generative model for graphs with two key properties: the clustering coefficient and the characteristic path length [1]. The generated graphs having hybrid properties, a short path length and a high clustering coefficient, were called small-world networks [2]. Their characteristic, small-worldness (SW), was found to be ubiquitous and universal across both living and non-living complex systems (e.g. the C. elegans connectome, social networks, the Internet) [2].
The mainstream standpoint in neuroscience is that these complex brain networks
are organized through synchronization of multiple brain areas. Neural oscillations may
play a causal role in forming brain activity and behavior [3]. Functional connectivity is
intended to characterize such patterns of synchronization.
It has repeatedly been demonstrated that the topological properties of EEG-based brain graphs can be useful in constituting novel biomarkers of psychiatric and neurological disorders [4–6]. However, the ultimate results of SW coefficient computations are highly dependent on the method being used. For example, M. Lai and colleagues, comparing scalp- and source-based measures of functional connectivity, found a strong correlation between scalp- and source-level global connectivity, but argued that network topology was only weakly correlated [7].
2 Methods
One hundred and seven healthy volunteers participated in the experiment. High-density EEG recordings in the resting state with eyes open were analyzed. These recordings are part of a publicly available EEG dataset [8–10]. The EEG was recorded from 64 electrodes as per the international 10-10 system (excluding electrodes Nz, F9, F10, FT9, FT10, A1, A2, TP9, TP10, P9, and P10). We defined the frequency ranges of EEG activity according to the conventional division: delta (1–3.5 Hz), theta (4–7.5 Hz), alpha (8–12.5 Hz), beta (13–29.5 Hz), gamma (30–45 Hz). Two reference electrodes were positioned at the left and right mastoids. The data were re-referenced offline to the common average reference.
In the present study, we used six functional connectivity measures.
1. Coherence [11],

Coh = |E[Sxy]| / √(E[Sxx] · E[Syy]). (1)
4. wPLI [14],

wPLI = |E[Im(Sxy)]| / E[|Im(Sxy)|]. (4)
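As a sketch, wPLI can be computed from per-epoch cross-spectral estimates for one electrode pair at one frequency; this illustrates Eq. (4), not the authors' processing pipeline:

```python
def wpli(sxy):
    # sxy: complex cross-spectrum estimates, one per epoch/segment.
    # wPLI = |E[Im(Sxy)]| / E[|Im(Sxy)|]; defined as 0 when the
    # imaginary parts vanish identically (pure zero-lag coupling).
    im = [z.imag for z in sxy]
    denom = sum(abs(v) for v in im) / len(im)
    num = abs(sum(im) / len(im))
    return num / denom if denom else 0.0
```

Weighting by the magnitude of the imaginary component is what makes wPLI insensitive to zero-lag (volume-conduction) contributions, a point the Discussion returns to.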
5. Phase locking value, another method measuring phase synchrony. Originally tailored to studying evoked activity, it can still be applied to the resting state.
6. PPC [16],
2 XN 1 XN
PPC ¼ cos hj hk : ð6Þ
N ðN 1Þ j¼1 k¼j þ 1
SW = (C/Cr) / (L/Lr). (7)
and stacked into 1 x 107 array, in accordance with the number of participants.
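The SW index of Eq. (7) needs the clustering coefficient C and characteristic path length L of the binarized graph; a self-contained sketch for small unweighted graphs follows (the reference values Cr and Lr would come from matched random graphs, which are not generated here):

```python
from collections import deque
from itertools import combinations

def clustering(adj):
    # Mean local clustering coefficient of an undirected graph given
    # as {node: set_of_neighbours}.
    total = 0.0
    for v, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def char_path_length(adj):
    # Average shortest-path length over all ordered node pairs, via BFS.
    dists = []
    for src in adj:
        seen = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    q.append(w)
        dists += [d for v, d in seen.items() if v != src]
    return sum(dists) / len(dists)

def small_worldness(adj, c_rand, l_rand):
    # Eq. (7): SW = (C / Cr) / (L / Lr).
    return (clustering(adj) / c_rand) / (char_path_length(adj) / l_rand)
```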
Statistical analysis was performed using IBM SPSS version 25 (IBM Corp., Armonk, NY, USA). Normality of the data distribution was assessed with the Kolmogorov-Smirnov test. As a proxy measure for consistency, the values of correlation coefficients between different functional connectivity metrics across different frequency bands were used.
Consistency Across Functional Connectivity Methods 119
3 Results
In this paper, we strived to provide a brief and concise illustration of how consistent measures of functional connectivity are across different EEG frequency ranges. The major finding of the study is that the alpha range gives the highest correlation coefficients and therefore allows one to obtain more similar estimates of the topological properties of brain graphs across the functional connectivity methods being tested. Predominance of activity in the alpha band is a distinguishing feature of the brain's resting state. Moreover, EEG studies have shown that alpha power fluctuations in a brain area directly indicate the level of inhibition that region is exposed to [19]. Thus, the alpha band, being a conspicuous and reproducible feature of brain activity at rest, provides us with the most consistent measures of the topological properties of brain networks.
The least consistent values of correlation strength between FC methods were found in the delta and gamma frequency ranges. The delta and gamma bands are extreme examples of the EEG frequency continuum, representing different modes of neural information processing, with delta being mostly involved in coordinating distantly located areas, while the gamma rhythm is engaged in local information processing [3]. However, it is currently unclear to what extent this may relate to the results observed in this paper.
wPLI has high correlation values with PLI in all frequency ranges. This may be attributed to the fact that wPLI is an extension of PLI. Both measures are insensitive to volume conduction, which represents the major issue for FC computed on EEG sensor space data. The significance of this issue for functional connectivity analysis may also be evidenced by considering the iCoh-Coh pair. Correlation values between iCoh and Coh did not surpass ρ = 0.39 (except for the alpha range, with ρ = 0.7), indicating the possible presence of volume conduction effects.
It is worth noticing, however, that our study has a number of limitations. Firstly, we
used sensor but not source space data for analyzing SW of brain graphs. Therefore, the
obtained results should be taken with caution. Secondly, we did not verify the results
on directional and weighted graphs which also may give different pattern of results.
Finally, all the computations in sensor space are reference-dependent which implies the
need to reexamine these results by using different reference techniques. As a possible
extension of current paper, correlations between the different connectivity approaches
[20], namely time domain methods and frequency domain ones may be considered.
Space limitations prevent us from including an exhaustive list of all pairwise
comparisons between selected functional connectivity methods.
In conclusion, taking into account all the abovementioned issues with the extant data,
a critical and thorough revision of currently used brain graph topological metrics and
their clinical applications is highly warranted.
References
1. Watts, D.J., Strogatz, S.H.: Collective dynamics of “small-world” networks. Nature 393
(6684), 440–442 (1998)
2. Fornito, A., Zalesky, A., Bullmore, E.T.: Fundamentals of Brain Network Analysis, p. 476.
Academic Press, Cambridge (2016)
Consistency Across Functional Connectivity Methods 123
3. Thut, G., Miniussi, C., Gross, J.: The functional importance of rhythmic activity in the brain.
Curr. Biol. 22(16), R658–R663 (2012)
4. Jhung, K., Cho, S.-H., Jang, J.-H., Park, J.Y., Shin, D., Kim, K.R., An, S.K.: Small-world
networks in individuals at ultra-high risk for psychosis and first-episode schizophrenia
during a working memory task. Neurosci. Lett. 535, 35–39 (2013)
5. Stam, C., Jones, B., Nolte, G., Breakspear, M., Scheltens, P.: Small-world networks and
functional connectivity in Alzheimer's disease. Cereb. Cortex 17(1), 92–99 (2006)
6. Wei, L., Li, Y., Yang, X., Xue, Q., Wang, Y.: Altered characteristic of brain networks in
mild cognitive impairment during a selective attention task: an EEG study. Int.
J. Psychophysiol. 98(1), 8–16 (2015)
7. Lai, M., Demuru, M., Hillebrand, A., Fraschini, M.: A comparison between scalp- and
source-reconstructed EEG networks. Sci. Rep. 8(1), 12269 (2018)
8. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.C., et al.:
PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for
complex physiologic signals. Circulation 101(23), e215–e220 (2000)
9. Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R.: BCI2000: a
general-purpose brain-computer interface (BCI) system. IEEE Trans. Biomed. Eng. 51(6),
1034–1043 (2004)
10. http://www.schalklab.org/research/bci2000
11. Bowyer, S.M.: Coherence a measure of the brain networks: past and present. Neuropsy-
chiatr. Electrophysiol. 2(1), 1 (2016)
12. Nolte, G., et al.: Identifying true brain interaction from EEG data using the imaginary part of
coherency. Clin. Neurophysiol. 115(10), 2292–2307 (2004)
13. Stam, C.J., et al.: Phase lag index: assessment of functional connectivity from multi-channel
EEG and MEG with diminished bias from common sources. Hum. Brain Mapp. 28(11),
1178–1193 (2007)
14. Vinck, M., et al.: An improved index of phase-synchronization for electro-physiological data
in the presence of volume-conduction, noise and sample-size bias. NeuroImage 55(4), 1548–
1565 (2011)
15. Lachaux, J.P., et al.: Measuring phase synchrony in brain signals. Hum. Brain Mapp. 8(4),
194–208 (1999)
16. Vinck, M., et al.: The pairwise phase consistency: a bias-free measure of rhythmic neuronal
synchronization. NeuroImage 51(1), 112–122 (2010)
17. Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., et al.: MEG and EEG
data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013). ISSN 1662-453X
18. Jas, M., Engemann, D., Bekhti, Y., Raimondo, F., Gramfort, A.: Autoreject: automated
artifact rejection for MEG and EEG data. NeuroImage 159, 417–429 (2017)
19. Bazanova, O.M., Vernon, D.: Interpreting EEG alpha activity. Neurosci. Biobehav. Rev. 44,
94–110 (2014)
20. Bastos, A.M., Schoffelen, J.-M.: A tutorial review of functional connectivity analysis
methods and their interpretational pitfalls. Front. Syst. Neurosci. 9, 175 (2016)
Evolutionary Minimization
of Spin Glass Energy
1 Introduction
The current work is a development of our previous article [1]. The new features of the
current paper are the following: we consider a more detailed model of the
evolutionary minimization of spin glass energy and additionally analyze several
properties of spin glasses that are related to the considered evolutionary search. This
additional analysis includes: (1) estimation of the global energy minima of spin glasses
by computer simulation, (2) estimation of the energy variation at the change of sign of one
spin (this variation can be considered as a one-spin mutation), and (3) the study of the
gradual decrease of spin glass energy. The gradual decrease is performed by the following
method: the spins of the spin glass are sequentially changed, and the changes
which decrease the energy are fixed. The analysis is performed by means of computer
simulation.
The most essential result of the current work is the analytical estimation of the rate
and efficiency of evolutionary minimization of the spin glass energy. Using computer
simulation, we have checked this analytical estimation.
Our evolutionary model is similar to the quasispecies model [2, 3]. In the current
article, we use the analogy with the quasispecies model and our previous estimations for the
quasispecies model [4, 5] with the Hamming distance between agent genotypes.
The spin glass state is the set of N spins S = (S_1, …, S_N), S_i = ±1. The energy of the state S is:

E(S) = Σ_{i,j=1, i<j}^{N} J_ij S_i S_j,   (1)
where J_ij are the elements of the matrix of exchange interactions. The J_ij are normally
distributed random values with probability density P(J_ij):

P(J_ij) = [(N−1)/(2π)]^{1/2} exp[−J_ij² (N−1)/2].   (2)
The model (1), (2) was intensively investigated [6, 7]. For further consideration, the
following spin-glass features are essential.
The number of local energy minima M is very large [8]:

M ∼ exp(0.2N).   (3)
A local energy minimum is defined as a spin glass state S_L at which the change of
sign of any one spin (S_i → −S_i) increases the energy E.
The global energy minimum E_0 equals approximately −0.8N [9]:

E_0 ≈ −0.8N.   (4)
126 V. G. Red’ko and G. A. Beskhlebnova
From (1), (2) one can obtain that the mean value of the spin-glass energy is zero:
⟨E⟩ = 0,   (5)
and the mean absolute value of the energy variation at the change of sign of any one
spin (S_i → −S_i) is of the order of 1 [1]:

⟨ΔE⟩ = (8/π)^{1/2}.   (6)
Using computer simulation, we have checked the estimations (4), (6). Figure 1
shows the dependence of the global energy minimum E_0 on the number of spins N in
the spin glass. Almost all results are averaged over different numbers of independent
calculations. The numbers of independent calculations n_av are as follows: for N = 5, 10,
n_av = 10^6; for N = 15, n_av = 10^4; for N = 20, n_av = 10^3; for N = 25, n_av = 10. For
N = 30, there was only a single calculation.
Fig. 1. The dependence of the global energy minimum E0 on the number of spins N.
We also calculated the mean absolute value of the energy variation at the change
of sign of any one spin (S_i → −S_i); this result was averaged over 10000 independent
calculations. The calculated estimate was ⟨ΔE⟩ ≈ 1.60. The results of
these calculations agree with the estimations (4), (6).
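Estimate (6) is easy to check numerically. A minimal sketch follows, with couplings generated as in Eq. (2); the variable names are ours, not taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_flip_delta(j, s, k):
    # Energy change for S_k -> -S_k under Eq. (1); j is the full
    # symmetric coupling matrix with zero diagonal
    return -2.0 * s[k] * (j[k] @ s)

n, trials = 100, 2000
deltas = np.empty(trials)
for t in range(trials):
    a_rand = rng.normal(0.0, 1.0 / np.sqrt(n - 1), size=(n, n))
    j = np.triu(a_rand, 1)
    j = j + j.T                                  # couplings as in Eq. (2)
    s = rng.choice([-1.0, 1.0], size=n)          # random spin state
    deltas[t] = abs(one_flip_delta(j, s, rng.integers(n)))
print(deltas.mean())  # close to sqrt(8 / pi) ~ 1.60
```

The local field acting on a spin is a sum of N − 1 terms of variance 1/(N − 1), i.e. a standard normal variable h, so the mean magnitude of a single-flip energy change is 2·E|h| = (8/π)^{1/2}, independent of N.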
The population is the set of n agents with genotypes S_k, k = 1, …, n. We suppose
that (1) the evolutionary process consists of consecutive generations, and (2) new
generations are obtained by selection and mutations of agents. An agent is selected
into the population of the new generation in accordance with the fitness (7). At
mutations, the signs of genotype symbols are changed (S_ki → −S_ki) with the probability
P_m for any symbol. The selection of agents into the new population is probabilistic: any
agent is selected into the new population with a probability proportional to
its fitness f(S_k); namely, the well-known method of roulette wheel selection (fitness
proportionate selection) is used. The genotypes of agents of the initial population are
random.
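The selection-mutation scheme above can be sketched as follows. Since the fitness (7) is not reproduced in this excerpt, the exponential form f(S) = exp(−βE(S)) is our assumption, chosen to match the selection-intensity parameter β; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def energy(j, s):
    # E(S) of Eq. (1); j is the full symmetric coupling matrix, zero diagonal
    return 0.5 * s @ j @ s

def evolve(j, n_agents, p_m, beta, generations):
    n = j.shape[0]
    pop = rng.choice([-1.0, 1.0], size=(n_agents, n))   # random initial genotypes
    for _ in range(generations):
        e = np.array([energy(j, s) for s in pop])
        f = np.exp(-beta * (e - e.min()))               # assumed fitness form
        idx = rng.choice(n_agents, size=n_agents, p=f / f.sum())  # roulette wheel
        pop = pop[idx]
        flips = rng.random(pop.shape) < p_m             # per-symbol sign mutations
        pop = np.where(flips, -pop, pop)
    return pop

n = 30
a_rand = rng.normal(0.0, 1.0 / np.sqrt(n - 1), size=(n, n))
j = np.triu(a_rand, 1)
j = j + j.T                                             # couplings as in Eq. (2)
pop = evolve(j, n_agents=n, p_m=1.0 / n, beta=1.0, generations=100)
best = min(energy(j, s) for s in pop)
print(best)  # well below zero, approaching E0 ~ -0.8 * n
```

Shifting the energies by their minimum before exponentiation only rescales the selection probabilities and avoids floating-point overflow.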
Similar to the quasispecies model with the Hamming distance between the genotypes
of agents [4, 5], we suppose the following natural relationships between the
parameters of the model: N, n >> 1, 2^N >> n, β ≳ P_m N, P_m N ≲ 1, n ~ N. The
inequality 2^N >> n means that the evolutionary process is essentially stochastic: the
number of possible genotypes in populations is relatively small, and some kinds of
genotypes S are absent from the population. The relation β ≳ P_m N means that the
intensity of selection is large enough. The relation P_m N ≲ 1 means that the intensity
of mutations is relatively small. The relation n ~ N means that the role of neutral
selection (independent of the fitness) is sufficiently small [5]. Under these relations, the
evolutionary rate is mainly due to two processes: mutations and selection; at mutations,
new agents with new kinds of genotypes appear in the population; at selection,
the agents with large fitness are selected into the population of the new generation.
G_1 ≈ (G_M + G_S)/ΔE,   (8)

where ΔE is the characteristic value of the variation of energy at one mutation and G_1 is
the characteristic number of generations per unit decrease of the energy.
G_M ~ (N P_m)^−1 is the characteristic number of generations required for a single
mutation in a genotype. G_S ~ (β ΔE)^−1 is the typical number of generations over which
agents with the energy ⟨E⟩_P − ΔE replace agents with the energy ⟨E⟩_P in the
population. P_m is the probability of one mutation. According to the expression (6),
ΔE ~ 1.
The total change of the energy in the population during the evolutionary search of
energy minima is, according to (4), (5), of order N; hence, the characteristic number of
generations of the whole process of the evolutionary minimization of spin glass energy
for the considered model is G_T ~ G_1 N. Therefore, we have:

G_T ≈ 1/P_m + N/β.   (10)
G_T ~ N,   n_total ~ N².   (11)
The expressions (11) characterize the main results of our estimations. These
expressions have been checked by means of computer simulation.
Fig. 2. The dependence of the spin glass energy of agents E on the generation G of the
evolutionary search: 1 – the average energy of agents in the population; 2 – the minimal energy
of agents in the population. The parameters of the simulation were the following: the number of
spins N = 100, the population size n = N = 100, the mutation intensity P_m = N^−1 = 0.01, the
selection intensity parameter β = 1. Results are averaged over 1000 different calculations.
The simulation used N = 100. Figure 2 shows the dependence of the spin glass energy of
agents on the generations of the evolutionary search.
Figure 2 shows that the characteristic number of generations of the evolutionary search
G_T is of the order of the number of spins N. This is in accordance with the estimations
(11).
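For the parameters of Fig. 2 (N = 100, P_m = 0.01, β = 1), estimate (10) can be checked by simple arithmetic; a quick sketch:

```python
# Plugging the Fig. 2 parameters into estimate (10): G_T ~ 1/P_m + N/beta,
# and n_total = n * G_T with the population size n = N.
n_spins, p_m, beta = 100, 0.01, 1.0
g_t = 1.0 / p_m + n_spins / beta     # characteristic number of generations
n_total = n_spins * g_t              # total number of agents processed
print(g_t, n_total)
```

The resulting G_T of about 200 generations matches the horizontal scale of Fig. 2, and n_total ~ 2·10^4 ~ N² illustrates estimate (11).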
It should be underlined that the evolutionary search results in one of the local
energy minima of a spin glass. These minima are rather close to the global minimum of
the energy of the spin glass.
We also considered the gradual decrease of the spin glass energy, which is performed as
follows. The signs of the spins of the spin glass are sequentially changed (S_i → −S_i,
i = 1, …, N), and only successful sign changes (resulting in a decrease of the spin glass
energy) are fixed. The considered sequential search needs a smaller number of participants
as compared with the evolutionary search. Using computer simulation, we have analyzed
the sequential search. The process of energy minimization at the sequential search
is characterized by Fig. 3.
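A minimal sketch of this sequential search follows; the couplings are generated as in Eq. (2), and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def sequential_descent(j, s):
    # Sweep the spins in order; keep only sign changes that lower the
    # energy E(S) of Eq. (1). Repeat until no flip helps (a local minimum).
    s = s.copy()
    improved = True
    while improved:
        improved = False
        for k in range(len(s)):
            d_e = -2.0 * s[k] * (j[k] @ s)   # energy change for S_k -> -S_k
            if d_e < 0:
                s[k] = -s[k]
                improved = True
    return s

n = 30
a_rand = rng.normal(0.0, 1.0 / np.sqrt(n - 1), size=(n, n))
j = np.triu(a_rand, 1)
j = j + j.T                                  # symmetric couplings, Eq. (2)
s = sequential_descent(j, rng.choice([-1.0, 1.0], size=n))
e_final = 0.5 * s @ j @ s
print(e_final)  # a local minimum, typically above the global E0 ~ -0.8 * n
```

The loop terminates because each accepted flip strictly lowers the energy and the state space is finite; on exit, every single-spin flip is non-decreasing, which is exactly the definition of a local minimum given earlier.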
Fig. 3. The dependence of the spin glass energy E on the search time t for the sequential
search. Results are averaged over 1000 different calculations.
Comparison of Figs. 2 and 3 shows that the evolutionary search provides significantly
deeper local energy minima E_L as compared with the sequential search, because
different valleys of the energy landscape are examined simultaneously in the
evolutionary process while approaching the energy minima. Moreover, the evolutionary
search ensures the finding of sufficiently deep local minima that are close to the global
minimum (see the expression (4) and Fig. 1, which characterize the value of the global
minimum quantitatively). Therefore, in the spin-glass case, the evolutionary search has a
definite advantage with respect to the sequential search: the evolutionary minimization
ensures the finding of deeper energy minima.
3 Conclusion
Thus, the model of evolutionary minimization of spin glass energy has been developed.
The rate and efficiency of evolutionary minimization of energy of spin glasses have
been analytically estimated and checked by computer simulation. It has been demon-
strated that the evolutionary search ensures the finding of sufficiently deep local energy
minima that are close to the global minimum.
Acknowledgments. The work was financially supported by the State Program of SRISA RAS,
project No. 0065-2019-0003 (AAA-A19-119011590090-2).
References
1. Red’ko, V.G.: Spin glasses and evolution. Biofizika (Biophys.) 35(5), 831–834 (1990). (in
Russian)
2. Eigen, M.: Molekulare selbstorganisation und evolution (selforganization of matter and the
evolution of biological macromolecules). Naturwissenschaften 58(10), 465–523 (1971)
3. Eigen, M., Schuster, P.: The Hypercycle: A Principle of Natural Self-Organization. Springer,
Berlin (1979)
4. Red’ko, V.G., Tsoy, Y.R.: Estimation of the efficiency of evolution algorithms. Doklady
Math. (Rep. Math.) 72(2), 810–813 (2005)
5. Red’ko, V.G.: Modeling of cognitive evolution. Toward the Theory of Evolutionary Origin of
Human Thinking. KRASAND/URSS, Moscow (2018)
6. Sherrington, D., Kirkpatrick, S.: Solvable model of spin-glass. Phys. Rev. Lett. 35(26), 1792–
1796 (1975)
7. Kirkpatrick, S., Sherrington, D.: Infinite range model of spin-glass. Phys. Rev. B. 17(11),
4384–4403 (1978)
8. Tanaka, F., Edwards, S.F.: Analytic theory of the ground state of a spin glass: I. Ising spin
glass. J. Phys. F: Metal Phys. 10(12), 2769–2778 (1980)
9. Young, A.P., Kirkpatrick, S.: Low-temperature behavior of the infinite-range Ising spin-glass:
Exact statistical mechanics for small samples. Phys. Rev. B. 25(1), 440–451 (1982)
Comparison of Two Models of a Transparent
Competitive Economy
1 Introduction
This paper develops our previous works [1–3], in which the basic model of interaction
between two communities of agents was constructed and investigated. The basic
model considers agent-producers and agent-investors. In the basic model, the producers
do not take into account their contributions to their own capitals at the distribution
of profits. In this paper, in addition to the basic model, a new model has been
constructed, in which the producers do take into account their own contributions to their
capitals at the distribution of their profits. This means that the producers can be considered
as a kind of investors that contribute capital into themselves. By computer
simulation, the results obtained in these two models are compared for two regimes:
(1) without taking into account the producers' own contributions (the basic model)
and (2) taking into account the producers' own contributions (the new model).
2 Description of Models
At the end of each period T, the investors determine the values of the contributions that
they will make into producers in the next period T + 1. To find these values, t_max
iterations are performed. During the iterations, the investors and producers exchange
information by means of light agents: searching agents and intention agents. These
light agents are similar to those used in the works [4, 5].
At the beginning of the period, the i-th producer has a capital C_i:
C_i = C_i0 + Σ_{j=1}^{N} C_ij,   (1)
where C_i0 is the own initial capital of the i-th producer and C_ij is the capital invested by
the j-th investor into the i-th producer at the beginning of the period. The dependence of
the i-th producer's profit on its capital C_i is determined by the formula:

P_i(C_i) = k_i F(C_i),   (2)

where the function F(x) is the same for all producers, and the coefficient k_i characterizes
the efficiency of the i-th producer. The function F(x) has the form:

F(x) = a x, if x ≤ Th;  F(x) = Th, if x > Th.   (3)
The part of the profit paid by the i-th producer to the j-th investor is:

P_inv,ij = k_repay P_i(C_i) C_ij / Σ_{l=1}^{N} C_il,   (4)
where C_i is the current capital (at the beginning of the period) of the i-th producer and
k_repay is the payment parameter that characterizes the part of the profits paid to investors,
0 < k_repay < 1. Note that in this basic model, the producers do not take into account the
size of their own contribution C_i0 and give the part of their profits to the investors
according to the parameter k_repay (see the expression (4)). The producer itself obtains
the remaining part of the profit:
P_pro,i = P_i(C_i) − Σ_{j=1}^{N} P_inv,ij.   (5)
Let us characterize the iterative process during which the contributions of investors
into producers are determined. At the first iteration, the investors send the searching
agents to all producers and determine the current capital of each producer. Further, the
investors estimate the values A_ij, which characterize the profit expected from the i-th
producer in the period. The values A_ij are equal to:
A_ij = d_ij P_inv,ij = d_ij k_repay k_i F(C′_i) C_ij / Σ_{l=1}^{N} C_il,   (6)
where d_ij is the current degree of confidence of the j-th investor in the i-th producer,
C_il is the capital invested by the l-th investor into the i-th producer, and C′_i is the
estimated capital of the i-th producer at the beginning of the period (in the first iteration,
the investments of the other investors are not taken into account). The current degree of
confidence d_ij is equal to d_test or d_untest, with d_test > d_untest > 0. The parameters
d_test and d_untest take into account the fact that an investor prefers tested producers.
In the computer simulation, we set d_test = 1, d_untest = 0.5.
Then the j-th investor forms the intention to distribute its capital K_inv,j among the
producers proportionally to the values A_ij. Namely, it is planned that the contribution
of the j-th investor into the i-th producer, C_ij, will be equal to:
C_ij = K_inv,j A_ij / Σ_{l=1}^{M} A_lj.   (7)
At the second iteration, each investor sends the intention agents to all producers
and informs them about the planned values of the capital investments C_ij. Based on these
data, the producers estimate their new capitals, which they expect after receiving
the capitals from all investors. These capitals are calculated in accordance with the
expression (1).
Then the investors again send the searching agents to all producers and evaluate the
new capitals of the producers C′_i (taking into account the planned investments C_ij of
the other investors), as well as the sums Σ_{l=1}^{N} C_il. The investors estimate new values
A_ij in accordance with the expression (6), which now takes into account the sum of
the intended contributions of all investors. Further, each investor forms a new intention
to distribute the capital K_inv,j according to the expression (7). Then the investors send
the intention agents to the producers and inform them about the new intended values of
the contributions C_ij. After a sufficiently large number of such iterations, each investor
makes the final decision on investments for the next period. The final contributions are
equal to the values C_ij obtained by the investors at the last iteration.
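The planning loop above can be sketched in a few lines. This is a minimal sketch of the basic model following Eqs. (3)–(7); the first-iteration convention (each investor initially ignores the money of the others) and all names are our reading of the description, not the authors' code:

```python
import numpy as np

a, th, k_repay = 0.1, 100.0, 0.5                 # parameters of Eqs. (3), (4)

def profit_rate(x):
    # F(x) of Eq. (3): linear below the threshold Th, constant above it
    return np.where(x <= th, a * x, th)

def plan_contributions(k_eff, c_own, k_inv, d=1.0, t_max=10):
    """Iterative planning of the contributions C_ij in the basic model."""
    m, n = len(k_eff), len(k_inv)
    # First iteration of Eq. (6): other investors' money is not counted yet
    a_ij = d * k_repay * k_eff[:, None] * profit_rate(c_own)[:, None] * np.ones((m, n))
    for _ in range(t_max):
        # Eq. (7): investor j splits K_inv,j proportionally to A_ij
        c = a_ij / a_ij.sum(axis=0, keepdims=True) * k_inv[None, :]
        cap = c_own + c.sum(axis=1)              # expected capitals, Eq. (1)
        share = c / np.maximum(c.sum(axis=1), 1e-12)[:, None]
        # Eq. (6) with the intended contributions of all investors
        a_ij = d * k_repay * k_eff[:, None] * profit_rate(cap)[:, None] * share
    return c

# One investor, two producers, with the values of the simulation section
c = plan_contributions(np.array([0.34, 0.94]), np.array([0.48, 0.26]),
                       np.array([0.54]))
print(c.ravel())  # in the basic model, both producers receive a contribution
```

With these numbers the iterations settle on a split between the two producers, consistent with the basic-model behaviour described for Fig. 1.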
At the end of each period, the capitals of the producers are reduced:
K_pro(T+1) = k_amr K_pro(T), where k_amr is the amortization coefficient (0 < k_amr ≤ 1).
The investors' capitals are reduced analogously: K_inv(T+1) = k_inf K_inv(T), where k_inf is
the inflation coefficient (0 < k_inf ≤ 1).
134 Z. B. Sokhova and V. G. Red’ko
If the capital of an investor or producer becomes greater than a certain large threshold
Th_max_inv or Th_max_pro, and the number of agents in the community is less than the
possible maximum, then this investor or producer is divided into two agents. When an
investor or producer is divided, the “parent” gives half of its capital to the “descendant”.
The “producer-child” inherits the effectiveness k_i of its parent. The “investor-child”
inherits the confidence factors d_ij of the parent investor. The confidence factor d_ij in
the “descendant” of a producer is set equal to d_untest, since this new producer has not
been tested yet.
If the capital of an investor or producer becomes less than a certain small threshold
Th_min_inv or Th_min_pro, then this investor or producer dies.
In the new model, the producers take into account their own contributions C_i0 at the
distribution of profits:

P_inv,ij = P_i(C_i) C_ij / (Σ_{l=1}^{N} C_il + C_i0),   (8)
and the expected values A_ij become:

A_ij = d_ij P_inv,ij = d_ij k_i F(C′_i) C_ij / (Σ_{l=1}^{N} C_il + C_i0).   (9)
Thus, at the distribution of profits, each agent (both the producer and the investor)
receives a profit that is proportional to the contribution of this agent.
The other elements of the new model are the same as in the basic model.
By computer simulation, we compared the basic model and the new model.
The main parameters of the simulation were the following: the number of periods
N_T = 1 or 100; the maximal number of iterations within the period t_max = 10; the maximal
capital thresholds for investors and producers Th_max_inv = 100.0, Th_max_pro = 100.0;
the minimal capital thresholds for investors and producers Th_min_inv = 0.01, Th_min_pro =
0.01; the maximal possible number of producers and investors in the community
M_max = 2 or 100 and N_max = 1 or 100; the initial number of producers and investors
M_0 = 2 or 100, N_0 = 1 or 100; the maximal number of producers in which an investor
can invest its capital m = 2 or 100; the parameter a of the profit function a = 0.1; the
threshold of the profit function Th = 100 (see the expression (3)); the payment
parameter k_repay = 0.5; the amortization and inflation coefficients k_amr = 1.0, k_inf = 1.0;
the characteristic value of the random variation of the efficiency of producers at the
transition to a new period Δk = 0.01.
For a clearer understanding of the influence of the scheme used by the producer on
the process of capital investment, a simulation was carried out for the particular case of
one investor and two producers. The efficiencies of the producers were k_1 = 0.34,
k_2 = 0.94; the capital of the investor was K_inv = 0.54; the capitals of the producers
were K_pro,1 = 0.48, K_pro,2 = 0.26. Figure 1 shows the processes of redistribution of
the capital by the investor during the iterations for the two considered models.
Figure 1 demonstrates that in the basic model, the investor makes contributions into
both producers, while in the new model, the investor selects only one, the most efficient,
producer. That is, in the basic model, when planning the contributions to producers, the
investor pays attention to both the efficiency and the capital amount of the producers,
whereas in the new model, the investor takes into account only the efficiency of the
producers (see also the expressions (6), (9) and (2), (3)).
Let us consider the case of a large community: N = M = 100. The simulation
results for the considered models are presented in Fig. 2.
Analysis of the results for this case shows that in the basic model, when the
producers pay half of their profits to investors (k_repay = 0.5), the capital of the producer
community is redistributed by investors more effectively. That is, in the next period, the
investor gives the obtained capital to more efficient producers. This is the important
effect of the basic model: the efficient redistribution of capital within the producer
community (by means of investors). Indeed, in the basic model, the total profit (and the
total capital) of the producer community is greater as compared with the new model
(Fig. 2).
Fig. 2. Dynamics of the total capital of investors and producers in the two models,
N = M = 100 (the lines for producers and investors in the basic model coincide).
On the other hand, the regime of the new model is more profitable for investors. In
this model, investors choose the most efficient producers, and the profit depends only
on the size of the investments and the efficiency of the producer. The following point of
the new model should be noted. The investor uses the efficiency of the producer and
receives the main part of the profits, corresponding to the investor's contribution. The
producer receives only a rather small part of the profits, corresponding to the producer's
contribution. Therefore, in the new model, the profits of producers grow more slowly as
compared with the basic model (Fig. 2). From an economic point of view, the regime
of the new model is rather unnatural, since attracting intensive contributions of
investors is not very useful for producers. Therefore, the interaction between agents
is rather ineffective in the new model. Thus, the regime of the basic model is more
interesting for further research.
4 Conclusion
It can be concluded that the behavior of the investors depends on the rules for esti-
mations and distributions of profits. And although the regime of the new model is
beneficial for the investor community, this regime is not profitable for producers. The
producer community is developing more efficiently if the regime of the basic model is
used. Thus, the regime of the basic model is more effective for the total development of
the whole economic community.
Acknowledgments. The work was financially supported by the State Program of SRISA RAS,
project No. 0065-2019-0003 (AAA-A19-119011590090-2).
References
1. Red’ko, V.G., Sokhova, Z.B.: Model of collective behavior of investors and producers in
decentralized economic system. Procedia Comput. Sci. 123, 380–385 (2018)
2. Red’ko, V.G., Sokhova, Z.B.: Iterative method for distribution of capital in transparent
economic system. Opt. Mem. Neural Netw. (Inf. Opt.) 26(3), 182–191 (2017)
3. Sokhova, Z.B., Red’ko, V.G.: Agent-based model of interactions in the community of
investors and producers. In: Samsonovich, A.V., Klimov, V.V., Rybina, G.V. (eds.)
Biologically Inspired Cognitive Architectures (BICA) for Young Scientists. Proceedings of
the First International Early Research Career Enhancement School (FIERCES 2016), pp. 235–
240. Springer, Switzerland (2016)
4. Claes, R., Holvoet, T., Weyns, D.: A decentralized approach for anticipatory vehicle routing
using delegate multiagent systems. IEEE Trans. Intell. Transp. Syst. 12(2), 364–373 (2011)
5. Holvoet, T., Valckenaers, P.: Exploiting the environment for coordinating agent intentions.
In: Environments for Multi-Agent Systems III, Lecture Notes in Artificial Intelligence, vol.
4389, pp. 51–66. Springer, Berlin (2007)
Spectral Parameters of Heart Rate Variability
as Indicators of the System Mismatch During
Solving Moral Dilemmas
1 Introduction
Changes in heart rate variability (HRV) reflect brain-heart interactions (e.g., [10,
14, 22, 24]). HRV indexes have previously been considered as indicators of changes in
brain activation [24]. Since baseline HRV differs between people in a state of coma and
healthy people, some authors have suggested that HRV can serve as an indicator
of the intensity of brain activity [17]. Thayer and colleagues [23] argued that
changes in HRV reflect the hierarchy in the organization of an organism and are usually
observed in response to indeterminacy and mismatch. The authors suggested that HRV
could indicate the “vertical” integration of the brain mechanisms controlling an
organism. It has been noted that research into the relationship between heart and brain
activity could open new horizons for the study of the psychophysiological bases of
individual behaviour [12].
Considered from the positions of the system evolutionary theory [2, 5, 21], any
behaviour is based on the simultaneous actualization of functional systems [3] formed at
different stages of phylo- and ontogenesis. Each functional system comprises
neurons and other body cells, including those of the heart, whose joint activity
contributes to achieving an adaptive outcome for the whole organism. From these
positions, “HRV originates in cooperation of the heart with the other components of
actualized functional systems” and reflects the system organization of behaviour (see
[6]: p. 2).
Our previous studies have found that in the process of individual development
children gradually shift from supporting in-group members, even when they behave
unfairly towards out-group members, to prioritizing fairness towards all other indi-
viduals, irrespective of what group they belong to [19, 20]. We argued that learning to
support fairness towards out-groups is associated with forming new functional systems
enabling this more complex behaviour. However, fairness towards out-groups can
contradict the earlier formed unconditional in-group preference. Situations like this
can be described as the system mismatch, when functional systems with contradictory
characteristics are actualized simultaneously. Here we hypothesize that in a situation
of a conflict between in- and out-group members, fairness towards out-groups would
predetermine the occurrence of a system mismatch reflected in HRV. To test this
hypothesis, we analyzed the spectral parameters of HRV in children solving moral
dilemmas with a conflict between in- and out-group members.
The following spectral parameters of HRV were calculated: low frequency power of HRV
(LF), high frequency power of HRV (HF), total power of HRV (TP), and the LF/HF ratio [13].
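The spectral parameters listed above can be sketched as follows. The band limits (LF 0.04–0.15 Hz, HF 0.15–0.40 Hz), the 4 Hz resampling rate, and the Welch settings are standard choices of ours, not taken from the paper:

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def hrv_spectra(rr_s, fs=4.0):
    """LF, HF, TP and the LF/HF ratio from a sequence of RR intervals (s)."""
    t = np.cumsum(rr_s)                          # beat times
    grid = np.arange(t[0], t[-1], 1.0 / fs)      # evenly resampled tachogram
    rr_even = interp1d(t, rr_s, kind="cubic")(grid)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs,
                   nperseg=min(256, len(grid)))
    df = f[1] - f[0]
    def band(lo, hi):
        m = (f >= lo) & (f < hi)
        return pxx[m].sum() * df                 # integrated band power
    lf, hf = band(0.04, 0.15), band(0.15, 0.40)  # standard LF/HF bands
    tp = band(0.0, 0.40)
    return lf, hf, tp, lf / hf
```

The RR series is resampled onto an even grid because Welch's method assumes uniformly sampled data; a stronger low-frequency modulation of the tachogram then shows up directly as a larger LF/HF ratio.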
Responses to dilemmas were coded as “1” if a child chose to support an out-group
member, and “0” if a child chose to support an in-group member. Average scores
characterising individual responses to all dilemmas were also calculated. For the
analyses, all participants were subdivided into two groups: those who supported out-
group members in more than half of the dilemmas (“out-group supporters”) and those
who supported in-group members in more than half of the dilemmas (“in-group
supporters”).
Statistical analyses were performed with IBM SPSS Statistics 17. Significance was set at
p < 0.05.
3 Results
Fig. 1. Higher values of the LF/HF ratio in children supporting out-group members as compared to
children supporting in-group members in situations with a conflict where out-group members
were treated unfairly by in-group members. * Mann-Whitney U test, p < 0.05.
4 Discussion
In this study we tested the hypothesis that in a situation of a conflict between in- and
out-group members, fairness towards out-groups would predetermine the occurrence of
a system mismatch, which is observed when functional systems with contradictory
characteristics are actualized simultaneously, and that such a mismatch would be
reflected in HRV.
As mentioned above, any behaviour, including moral dilemma solving, is sup-
ported by simultaneous actualization of functional systems formed at different stages of
individual development. Our previous work [19, 20] demonstrated that young pre-
school age children tended to exhibit unconditional in-group preference, which is
considered a behavioural strategy based on actualization of functional systems formed
early in individual development, including those associated with parochial altruism
(unconditional in-group preference with aggressive behaviour toward out-groups [1, 9,
11]). Older children were shown to develop a more complex behavioural strategy to
support those treated unfairly, including members of out-groups, which requires
actualisation of later-formed functional systems. This is consistent with the view that
reciprocal altruism toward out-group members requires higher cognitive complexity
[16]. It is possible that the whole structure of individual experience is reorganised
through the formation of “new” systems enabling a new type of behaviour, which may
require some time. The development of moral attitudes towards out-groups occurs
gradually and requires accumulation of a sufficient number of episodes associated with
the “new” moral behaviour. The conflict between the earlier and later formed systems
activated simultaneously can be described as an instance of the system mismatch,
because these systems have contradictory characteristics.
The results of this study showed that in situations involving a conflict where out-
group members are treated unfairly by in-group members, the decision to support out-
group members was associated with higher values of LF/HF ratio of HRV. Higher
values of LF/HF ratio are usually observed during stress [7, 8, 15, 18], which is also
considered as a situation of the system mismatch [4]. Thus, the results of this study
indicate that characteristics of social behaviour and its development, as observed in
the case of moral attitudes toward in- and out-group members, can be manifested in the
dynamics of individual psychophysiological states.
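The LF/HF ratio discussed above can be estimated from an RR-interval series in a few lines. The sketch below is generic, not the authors' pipeline: it assumes the conventional LF (0.04–0.15 Hz) and HF (0.15–0.4 Hz) bands and uses Welch's method on a uniformly resampled tachogram.

```python
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d

def lf_hf_ratio(rr_ms, fs=4.0):
    """Estimate the LF/HF ratio of HRV from an RR-interval series (ms)."""
    rr_ms = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr_ms) / 1000.0                     # beat times, seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)           # uniform time grid
    rr_even = interp1d(t, rr_ms, kind="cubic")(grid)  # resampled tachogram
    f, pxx = welch(rr_even, fs=fs, nperseg=min(256, len(grid)))
    df = f[1] - f[0]
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df     # low-frequency power
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df     # high-frequency power
    return lf / hf
```

A series modulated mainly at respiratory frequencies (~0.3 Hz) yields a ratio well below 1, while slow (~0.1 Hz) modulation drives it above 1, matching the usual interpretation of the two bands.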
Acknowledgements. The reported study was funded by RFBR, the research project № 18-313-
20003_mol_a_ved.
142 I. M. Sozinova et al.
References
1. Abbink, K., Brandts, J., Herrmann, B., Orzen, H.: Parochial altruism in inter-group conflicts.
Econ. Lett. 117(1), 45–48 (2012)
2. Alexandrov, Yu.I.: How we fragment the world: the view from inside versus the view from
outside. Soc. Sci. Inf. 47(3), 419–457 (2008)
3. Alexandrov, Yu.I.: Cognition as systemogenesis. In: Anticipation: Learning from the Past,
pp. 193–220. Springer, Cham (2015)
4. Alexandrov, Yu.I., Svarnik, O.E., Znamenskaya, I.I., Kolbeneva, M.G., Arutynova, K.R.,
Krylov, A.K., Bulava, A.I.: Regression as stage of development [Regressiya kak etap
razvitiya]. M.: Institute of Psychology Ras [Institut Psikhologii RAN] (2017) [in Russian]
5. Alexandrov, Yu.I., Grechenko, T.N., Gavrilov, V.V., Gorkin, A.G., Shevchenko, D.G.,
Grinchenko, Y.V., Bodunov, M.V.: Formation and realization of individual experience.
Neurosci. Behav. Physiol. 27(4), 441–454 (1997)
6. Anokhin, P.K.: Biology and Neurophysiology of Conditioned Reflex and Its Role in
Adaptive Behavior, 1st edn. Pergamon Press, Oxford (1974)
7. Bakhchina, A.V., Arutyunova, K.R., Sozinov, A.A., Demidovsky, A.V., Alexandrov, Y.I.:
Sample entropy of the heart rate reflects properties of the system organization of behaviour.
Entropy 20(6), 449 (2018)
8. Bakhchina, A.V., Shishalov, I.S., Parin, S.B., Polevaya, S.A.: The dynamic cardiovascular
markers of stress. Int. J. Psychophysiol. 94(2), 230 (2014)
9. Bernhard, H., Fischbacher, U., Fehr, E.: Parochial altruism in humans. Nature 442(7105),
912 (2006)
10. Billman, G.E.: The effect of heart rate on the heart rate variability response to autonomic
interventions. Front. Physiol. 4, 222 (2013)
11. Choi, J.K., Bowles, S.: The coevolution of parochial altruism and war. Science 318(5850),
636–640 (2007)
12. Lane, R.D., Wager, T.D.: The new field of brain-body medicine: what have we learned and
where are we headed? NeuroImage 47(3), 1135–1140 (2009)
13. Lombardi, F.: Clinical implications of present physiological understanding of HRV
components. Card. Electrophysiol. Rev. 6(3), 245–249 (2002)
14. McCraty, R., Atkinson, M., Tomasino, D., Bradley, R.T.: The coherent heart: heart-brain
interactions, psychophysiological coherence, and the emergence of system-wide order.
Integr. Rev.: A Transdisc. Transcult. J. New Thought Res. Prax. 5(2) (2009)
15. Polevaya, S.A., Eremin, E.V., Bulanov, N.A., Bakhchina, A.V., Kovalchuk, A.V., Parin, S.B.:
Event-related telemetry of heart rate for personalized remote monitoring of cognitive
functions and stress under conditions of everyday activity. Sovremennye Tekhnologii v
Medicine 11(1) (2019)
16. Reznikova, Z.: Altruistic behavior and cognitive specialization in animal communities. In:
Encyclopedia of the Sciences of Learning, pp. 205–208 (2012)
17. Riganello, F., Candelieri, A., Quintieri, M., Conforti, D., Dolce, G.: Heart rate variability: an
index of brain processing in vegetative state? An artificial intelligence, data mining study.
Clin. Neurophysiol. 121, 2024–2034 (2010)
18. Runova, E.V., Grigoreva, V.N., Bakhchina, A.V., Parin, S.B., Shishalov, I.S., Kozhevnikov,
V.V., Nekrasova, M.M., Karatushina, D.I., Grigoreva, K.A., Polevaya, S.A.: Vegetative
correlates of conscious representation of emotional stress. CTM 5(4), 69–77 (2013)
19. Sozinova, I.M., Znamenskaya, I.I.: Dynamics of Russian children’s moral attitudes toward
out-group members. In: The Sixth International Conference on Cognitive Science, p. 94
(2014)
20. Sozinova, I.M., Sozinov, A.A., Laukka, S.J., Alexandrov, Yu.I.: The prerequisites of
prosocial behavior in human ontogeny. Int. J. Cogn. Res. Sci. Eng. Educ. (IJCRSEE) 5(1),
57–63 (2017)
21. Shvyrkov, V.B.: Behavioral specialization of neurons and the system-selection hypothesis of
learning. In: Human Memory and Cognitive Capabilities, pp. 599–611. Elsevier, Amsterdam
(1986)
22. Stefanovska, A.: Coupled oscillators: complex but not complicated cardiovascular and brain
interactions. In: 2006 International Conference of the IEEE Engineering in Medicine and
Biology Society, pp. 437–440. IEEE (2006)
23. Thayer, J.F., Lane, R.D.: Claude Bernard and the heart–brain connection: Further
elaboration of a model of neurovisceral integration. Neurosci. Biobehav. Rev. 33, 81–88
(2009)
24. Van der Wall, E.E., Van Gilst, W.H.: Neurocardiology: close interaction between heart and
brain. Netherlands Heart J. 21(2), 51–52 (2013)
The Role of Brain Stem Structures
in the Vegetative Reactions Based
on fMRI Analysis
Abstract. This work aimed to study the role of brain stem structures in
vegetative responses upon presentation of self-significant stimuli (the subject’s
personal name) using functional MRI. Based on the data of an MRI-compatible
polygraph, the subjects were divided into three groups with different degrees
of vegetative reactions to personality-related stimuli: with strong galvanic skin
reactions (GSR) only—7 subjects; with medium GSR and cardiovascular
response (CR)—6 subjects; and with low reactivity of GSR and CR—5 subjects.
The obtained statistical maps of brain neural network activity showed high
activation of the brain stem structures upon presentation of personality-related
stimuli in the second group (medium GSR and CR); low activation of the stem
structures in the first group (strong GSR); and a complete absence of activation of
the stem structures in subjects of the third group (low reactivity of the GSR
and CR). The use of an MRI-compatible polygraph to select fMRI data for
subsequent statistical analysis was shown to be effective.
1 Introduction
When studying the operation of brain neural networks and determining their exact
spatio-temporal characteristics, objective monitoring of the subjects’ current condition
during functional magnetic resonance imaging (fMRI) is necessary. For this purpose,
an MRI-compatible polygraph (MRIcP) has been developed at NRC “Kurchatov
Institute”, which allows monitoring the dynamics of human vegetative reactions
during an MRI examination (earlier, we used an MRI-compatible electroencephalograph
[1] and an eye-tracker [2–4] for this purpose). The data obtained with the MRIcP
can serve as correlates of important neurophysiological processes in the brain and
can be used to determine the activation of the neural networks involved in these processes.
In this work, a study was carried out using the MRIcP to reveal the relationship between
the dynamics of vegetative reactions—galvanic skin response (GSR) and
© Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 144–150, 2020.
https://doi.org/10.1007/978-3-030-30425-6_16
in detail in [7]. In the subgroup of high-reactive subjects (15 persons), the degree of GSR
was in the range of 60–100% (that is, these subjects scored from 6 to 10 points out of 10
possible according to the GSR in the TCN). In low-reactive subjects (5 persons), the degree
of GSR was 40% or less (i.e., they scored 4 points or fewer out of 10).
It should be noted that the subgroup of 15 highly reactive subjects also turned out to be
heterogeneous according to the MRIcP data (as described in [7]) and was divided
into two parts. Based on the MRIcP data, the subjects were divided into three groups with
different degrees of autonomic reactions to personality-related stimuli: with strong GSR
only—7 subjects (group 1); with medium GSR and CR (measured by the photoplethysmogram
signal)—6 subjects (group 2); and with low reactivity of the GSR and CR—5
subjects (group 3). Two subjects were excluded from the analysis because their reactions
fitted none of these gradations.
The obtained statistical maps of brain neural network activity (see below) showed
high activation of brain stem structures upon personality-related stimuli presentation in
the second group (medium GSR and CR), low activation of stem structures in the first
group (strong GSR), and a total absence of stem structure activation in subjects of the
third group (low reactivity of the GSR and CR).
The first group included subjects in whom only the GSR was highly informative for
identifying the concealed name; for subjects in the second group, both the GSR and
vascular spasm (Fig. 1) were informative parameters.
Fig. 1. Polygram of the TCN of a highly reactive subject. The 8 channels correspond to: 1—sound of
presented stimuli; 2—sound of subject responses (along with the sound of the MRI scanner); 3—
subject head movement; 4, 5—upper and lower pneumogram sensors; 6—GSR; 7—HR; 8—
photoplethysmogram.
Figure 1 shows the fifth and last presentation of the TCN. Concealing
meaningful information (the subject’s own name, Alexander, is highlighted by a rectangle in
Fig. 1) elicits the subject’s maximal GSR (channel 6), a decrease in heart rate
(channel 7; the moving “lens” shows 85 beats per minute), and the most pronounced
narrowing of the finger vessels in this presentation (channel 8).
It was very difficult to isolate the concealed name of low-reactive subjects from the
MRIcP-recorded reactions, due to their physiological characteristics: low reactivity and
instability of the GSR, heart rate and vascular spasm (Fig. 2).
Figure 2 shows the chaotic appearance of the GSR during the third (of five)
presentations of the TCN. Concealing his own name (Andrew, highlighted by a
rectangle) among the other names elicits only a very weak GSR (channel 6),
accompanied by neither a drop in heart rate (channel 7) nor a narrowing of the
finger vessels (channel 8).
3 Results
Figure 3 shows the fMRI results obtained for the three groups of subjects, divided on the
basis of the MRIcP data: with strong GSR only (group 1); with medium GSR and CR
(group 2); and with low reactivity of the GSR and CR (group 3).
148 V. L. Ushakov et al.
Fig. 3. The results of the group statistical analysis (p < 0.001) comparing perception of
personality-related stimuli with neutral stimuli. The figure shows a group statistical map
underlaid with a high-resolution T1 image at levels x = −8, −6, −4: A—group 1; B—group 2;
C—group 1 with removal of some of the fMRI samples of perception of neutral names in the cases
when there was high reactivity in the MRIcP signal; D—group 2 with removal of some of the
fMRI samples of perception of neutral names in the cases when there was high reactivity in the
MRIcP signal; E—group 3; F—group 3 with removal of some of the fMRI samples of perception
of neutral names in the cases when there was high reactivity in the MRIcP signal.
On the basis of the obtained data on brain stem activation upon presentation of self-significant
stimuli, connectivity between this zone and other parts of the brain was
reconstructed separately for the group with pronounced physiological reactions (15 subjects)
and the group with low physiological reactions (5 subjects). As a result, it was shown that, for
the group of subjects with pronounced physiological reactions, a statistically significant
(p < 0.001) negative correlation was observed between the activity of the brain stem
and the hippocampus when perceiving personality-related stimuli relative to
neutral ones.
4 Discussion
As can be seen from the results shown in Fig. 3, pronounced activation of the brain stem
structures upon presentation of self-significant stimuli is observed in the group with
medium GSR and CR changes (see Fig. 3B and D), a significantly lower level of
activity in the group with strong GSR only (see Fig. 3A and C), and a complete absence of
stem activation in the group with low reactivity of the GSR and CR (see Fig. 3E and F).
When the neutral words were removed from the sample of fMRI signals in cases
where high reactivity was observed in the MRIcP data, more extensive activity was
observed in the brain stem in groups 1 and 3, which is consistent with the operation of
autonomic regulation systems [8]. Thus, we conclude that using an MRIcP to select
fMRI data for subsequent statistical analysis is effective. The
revealed negative correlation between the activity of the brain stem and the
hippocampus during perception of personality-related stimuli relative to neutral
ones shows the promise of constructing connectomes to visualize
the processes of neural network interactions, an approach that will be used in
further work. The experiments confirmed the promise of the joint use of
fMRI technology and the MRIcP to study neurocognitive processes. In the course of the
study, a criterion for classifying subjects according to the dynamics of their vegetative
reactions was discovered; this criterion allows a more focused approach to the
study of neurocognitive processes and may help improve the quality of fMRI
research for various purposes.
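The fMRI sample-selection step described above can be illustrated with a short sketch. The names `labels` and `reactive` are hypothetical data structures, not the authors' actual pipeline: given per-volume condition labels and boolean reactivity flags derived from the polygraph, neutral-stimulus volumes acquired under high autonomic reactivity are dropped before statistical analysis.

```python
import numpy as np

def select_volumes(labels, reactive, drop_label="neutral"):
    """Return indices of fMRI volumes to keep for statistical analysis.

    labels:   per-volume condition labels (e.g. "neutral" / "self").
    reactive: per-volume bool flags marking high polygraph reactivity.
    Drops neutral-condition volumes acquired while the polygraph showed
    high autonomic reactivity; keeps everything else.
    """
    labels = np.asarray(labels)
    reactive = np.asarray(reactive, dtype=bool)
    keep = ~((labels == drop_label) & reactive)
    return np.flatnonzero(keep)
```

The kept indices can then be passed to any GLM-based analysis so that "neutral" regressors are not contaminated by trials that were, physiologically, not neutral at all.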
Acknowledgements. This study was partially supported by the National Research Centre
“Kurchatov Institute” (MRI-compatible polygraphy), by RFBR grant ofi-m 17-29-02518 (the
cognitive-affective structures of the human brain), and by RFBR grant 18-29-23020 mk
(methods and approaches for fMRI analyses). The authors are
grateful to the MEPhI Academic Excellence Project for providing computing resources and
facilities to perform experimental data processing.
References
1. Dorokhov, V.B., Malakhov, D.G., Orlov, V.A., Ushakov, V.L.: Experimental model of study
of consciousness at the awakening: fMRI, EEG and behavioral methods. In: BICA 2018,
Proceedings of the Ninth Annual Meeting of the BICA Society. Advances in Intelligent
Systems and Computing, vol. 848, pp. 82–87 (2019)
2. Korosteleva, A., Mishulina, O., Ushakov, V.: Information approach in the problems of data
processing and analysis of cognitive experiments. In: BICA 2018, Proceedings of the Ninth
Annual Meeting of the BICA Society. Advances in Intelligent Systems and Computing, vol.
848, pp. 180–186 (2019)
3. Korosteleva, A., Ushakov, V., Malakhov, D., Velichkovsky, B.: Event-related fMRI analysis
based on the eye tracking and the use of ultrafast sequences. In: BICA for Young Scientists,
Proceedings of the First International Early Research Career Enhancement School on BICA
and Cybersecurity (FIERCES 2017). Advances in Intelligent Systems and Computing, vol.
636, pp. 107–112 (2017)
4. Orlov, V.A., Kartashov, S.I., Ushakov, V.L., Korosteleva, A.N., Roik, A.O., Velichkovsky,
B.M., Ivanitsky, G.A.: “Cognovisor” for the human brain: Towards mapping of thought
processes by a combination of fMRI and eye-tracking. In: Book Advances in Intelligent
Systems and Computing. Springer, vol. 449, pp. 151–157 (2016)
5. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.B., Frith, C.D., Frackowiak, R.S.:
Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain
Mapp. 2, 189–210 (1995)
6. The accuracy and utility of polygraph testing (Department of Defense, DC). Polygraph 13, 1–
143 (1984)
7. Orlov, V.A., Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Kovalchuk, M.V., Ushakov,
V.L.: Application of registration of human vegetative reactions in the process of functional
magnetic resonance imaging. In: Advances in Intelligent Systems and Computing (2019,
in press)
8. Sclocco, R., Beissner, F., Bianciardi, M., Polimeni, J.R., Napadow, V.: Challenges and
opportunities for brainstem neuroimaging with ultrahigh field MRI. NeuroImage 168, 412–
426 (2018)
Ordering of Words by the Spoken Word
Recognition Time
1 Introduction
While selecting operators for voice control of freely moving devices, we encountered a
phenomenon that has not been explicitly reported in spoken word recognition studies
[1, 2]. Despite decades of intensive research, the field of spoken word recognition still
remains open for the study of the underlying cognitive and linguistic processes. With
new technologies available, it is worth revisiting simple experimental approaches to
exploring human perception of spoken words. Before setting up a
complex study of speech perception by humans, involving sophisticated
equipment such as functional magnetic resonance imaging (fMRI), magnetoencephalography
(MEG) or brain-computer interfaces (BCI), one has to select proper
linguistic material for the experiments. This requires a set of compact preliminary tests
which, on the one hand, can assess the ability of the candidate subjects to perform
the proposed task smoothly and, on the other hand, can sort the suggested linguistic
material. One has to select those words or word combinations which allow lucid
interpretation of the experimental data. We believe that the spoken word should be
selected as the basic stimulus, since visual presentation of words implies the study of
the “language for the literate”. The latter presumably involves other brain mechanisms than
the “language for the illiterate”, i.e. the basic language, does. We hope that cortical
processes which activate a smaller number of different activities will be easier to describe,
and perhaps even to understand. Starting from this background, we designed the
experiments described below.
© Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 151–156, 2020.
https://doi.org/10.1007/978-3-030-30425-6_17
152 V. Vvedensky et al.
2 Methods
24 Russian nouns were presented in random order, each word three times. The words
were pronounced by the same male speaker. The age range of our 12 listeners (5 women)
was quite broad: 10, 16, 17, 31, 32, 45, 61, 61, 62, 63, 70, 80 years. All subjects gave
informed consent to participate in the experiments. The study was approved by the
local ethics committee for biomedical research of the NRC Kurchatov Institute. Each
session lasted about 20 min. The subjects were instructed to press “Enter” on the
keyboard at the moment they recognized the word they heard. Before the next trial,
they repeated the word. The task is reasonably simple, so practically no errors
occurred. This is the list of the words used: эффект, кулак, песок, мост, спорт, глаз,
книжка, народ, порог, вагон, жизнь, вход, живот, сапог, мастер, мечта, костюм,
осень, группа, село, время, жена, число, трубка (in English: effect, fist, sand, bridge,
sport, eye, book, people, door-step, carriage, life, entrance, belly, boot, master, dream,
suit, autumn, group, village, time, wife, number, pipe). The sound durations of the words are
nearly equal despite the different numbers of letters (4 to 6) in the selected words.
3 Results
The scatter of recognition times is shown in Fig. 1 for three subjects; the others display the
same behavior. The scatter is considerable and at first glance looks noise-like. One
should not think that such a large scatter is somehow special for just the experiment
Fig. 1. Times at which the subjects pressed the key, indicating that they understood the word they
heard. 24 words, with two repetitions each, were presented in random order. The average reaction
time differs somewhat between these subjects. In this plot, reaction time is referenced to the sound offset.
with words. Quite the opposite: this phenomenon always complicates measurements of
reaction time to simple stimuli, which is especially relevant for pilots and sportsmen.
However, in our case the stimulus is quite complex and different each time. We analyze
human reactions to different words separately. Recognition time is referenced to the
sound offset, since the majority of key presses fall in the post-word period. It
turns out that the recognition times for different words of the same sound duration can
be ordered, so that each listener generates an ordered list of the 24 perceived words. Two
examples are shown in Fig. 2.
Fig. 2. 24 words heard by two listeners (Subjects 1 and 12 in Fig. 4) and ordered by their
recognition times. In this plot, reaction time is referenced to the sound onset. The time scale is in
milliseconds. Each word was presented three times. The ends of the scatter bars correspond to the
longest and shortest recognition times, while the third time lies in the middle. One can see the
similarity of these ordered word lists.
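The per-listener ordering described above can be sketched in a few lines. This is a minimal illustration, assuming recognition times arrive as (word, time) pairs and taking the middle of the three presentations as the word's representative time, as the caption of Fig. 2 suggests.

```python
from statistics import median

def ordered_word_list(trials):
    """Order words by recognition time.

    trials: iterable of (word, rt_ms) pairs, three presentations per word.
    Returns the words sorted by the median of their recognition times,
    fastest first.
    """
    by_word = {}
    for word, rt in trials:
        by_word.setdefault(word, []).append(rt)
    return sorted(by_word, key=lambda w: median(by_word[w]))
```

Applying this to each of the 12 listeners yields 12 personal ordered lists that can then be compared with one another, as in Figs. 3 and 4.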
Fig. 3. List of 24 Russian words ordered by 12 listeners. The word at the bottom is recognized most
quickly, and the recognition time gradually increases for the words above. Each listener
generates a personal ordered list of the words with gradually growing recognition time. The
ordered lists are basically similar across subjects; the error bar represents the standard deviation of
the rank for each word.
robust way. This correlation is shown in Fig. 4. A linear order emerging in a group of
subjects performing some cognitive task is common: the most obvious example is the
ranking of chess players. Ranking within the same group is not universal, however, but
depends on the specific task. In the same group of tennis players, the rankings for singles and
doubles can differ considerably. It is worth mentioning that words also tend to be
ordered into linear lists: the Zipf law is the most spectacular example.
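The comparison in Fig. 4 can be reproduced schematically: each listener's ranking is correlated with the group-average ranking. Spearman's rank correlation is a plausible choice for ranked lists (the paper does not name the exact statistic), so the sketch below should be read as an illustration, not the authors' method.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation for two tie-free rankings (0..n-1)."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    d2 = np.sum((a - b) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def listener_vs_average(rank_lists):
    """Correlate each listener's word ranking with the group-average ranking.

    rank_lists: array of shape (listeners, words), each row a permutation
    of 0..words-1 giving that listener's rank for every word.
    """
    rank_lists = np.asarray(rank_lists)
    # Convert mean rank scores back to an integer ranking of the words.
    avg_rank = rank_lists.mean(axis=0).argsort().argsort()
    return [spearman(row, avg_rank) for row in rank_lists]
```

Sorting the listeners by their correlation with the average list reproduces the subject ranking shown by the trend line in Fig. 4.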
Earlier we observed the same ordering of both nouns and listeners for another
group of 24 words: каша, леди, пони, мина, груша, туша, сито, пиво, сети, тема,
кома, вилы, бусы, муха, тина, зона, стая, лоси, дура, уши, дама, доля, сажа, лыжи
(in English: porridge, lady, pony, mine, pear, carcass, sieve, beer, net, theme, coma,
hayfork, beads, fly, ooze, zone, flock, moose, fool, ears, dame, share, soot, ski). These
words are presented in order of decreasing recognition time. In that earlier experiment,
another group of listeners was tested.
Ordering of Words by the Spoken Word Recognition Time 155
Fig. 4. Correlation of the ranked lists of 24 words, generated by 12 listeners, with the average list.
The trend line demonstrates the ranking of the subjects.
4 Discussion
We analyze only a small group of words out of the several thousand used in the language.
However, this is a common feature of all linguistic experiments. We intend to
develop an approach that will, in an evolutionary way, select proper groups of
words for a particular linguistic task. The choice of a proper group of listeners is also quite
important, since different people use different strategies in speech communication;
this is how dialects emerge.
Our data show the directions in which we shall proceed. We have to generate new lists
of words around the “quick” and “slow” words in the analyzed list; there are plenty of
words in the thesaurus. The same list has to be presented to several clearly distinct
groups of listeners, which emerge from previous experimentation. In this way we
expect to cover a considerable part of the language thesaurus and to find directions where
the experimental data will produce crucial information for the understanding of speech
perception.
Neuroimaging data on the perception of words indicate a broad scatter
of cortical activity, related to individual words, over a considerable part of both
cerebral hemispheres [3]. Locations for different word groups are detected using fMRI
machines, so that “across the cortex, semantic representation is organized along smooth
gradients that seem to be distributed systematically” [4]. It seems likely that we see
these local gradients in our experiments with groups of words. The observed linearity is
certainly local (for just a group of words), though we believe that these linear segments
can be woven into a complete network of words, possibly similar to a fishnet. We
believe that our simple though careful testing of word groups that may be
represented in the same cortical area can shed light on the mechanisms people use for
language communication. The tests described here can easily be combined with MEG
measurements, which have long shown that a heard word evokes neuronal activity in
many places throughout the cortex [5].
Acknowledgements. The author VLV is supported by the Russian Foundation for Basic
Research, grant 18-00-00575 comfi.
References
1. Pisoni, D.B., McLennan, C.T.: Spoken word recognition: historical roots, current theoretical
issues, and some new directions. In: Neurobiology of Language, Chap. 20. Elsevier Inc.,
Amsterdam (2016). https://doi.org/10.1016/B978-0-12-407794-2.00093-6
2. Vitevitch, M.S., Luce, P.A.: Phonological neighborhood effects in spoken word perception
and production. Annu. Rev. Linguist. 2(7), 1–7.20 (2016)
3. Huth, A.G., de Heer, W.A., Griffiths, T.L., Theunissen, F.E., Gallant, J.L.: Natural speech
reveals the semantic maps that tile human cerebral cortex. Nature 532(7600), 453–458 (2016).
PMID: 27121839
4. Huth, A.G., Nishimoto, S., Vu, A.T., Gallant, J.L.: A continuous semantic space describes the
representation of thousands of object and action categories across the human brain. Neuron
76, 1210–1224 (2012). https://doi.org/10.1016/j.neuron.2012.10.014
5. Vvedensky, V.L., Korshakov, A.V.: Observation of many active regions in the right and left
hemispheres of the human brain which simultaneously and independently respond to a word.
In: Proceedings, Part 1, XV Russian Conference Neuroinformatics-2013, MEPhI, Moscow,
pp. 43–52 (2013). (in Russian)
Neurobiology and Neurobionics
A Novel Avoidance Test Setup:
Device and Exemplary Tasks
Abstract. This paper presents a novel rodent avoidance test. We have developed
a specialized device and procedures that expand the possibilities for
exploring the processes of learning and memory in a psychophysiological
experiment. The device consists of a current-stimulating electrode platform and
custom software that allows controlling and recording experimental protocols
in real time, as well as reconstructing animal movement paths. The device can be used to
carry out typical footshock-avoidance tests, such as passive, active, modified
active and pedal-press avoidance tasks. It can also be utilized in studies of
prosocial behavior, including cooperation, competition, emotional contagion and
empathy. This novel footshock-avoidance test procedure allows flexible
current-stimulation settings. In our work, we used a slow-rising current. A test animal
can choose between the current-rise and time-out intervals as a signal for action in
footshock-avoidance tasks; this represents a choice between escape and avoidance.
This method can be used to explore individual differences in decision-making
and the choice of avoidance strategies. It has been shown previously that a
behavioral act, for example pedal-pressing, is ensured by motivation-dependent
brain activity (avoidance or approach). We have created an experimental design
based on instrumental learning tasks: pedal-pressing in an operant box results
in a reward, which is either a piece of food in a feeder (food-acquisition behavior)
or an escape platform (footshock-avoidance behavior). Data recording and
analysis were performed using custom software; the open-source Accord.NET
Framework was used for real-time object detection and tracking.
1 Introduction
Animal models are used by researchers all over the world. Rodent passive/active
avoidance tests are typical models not only in experimental psychology but also in
clinical psychology, psychiatry and behavioral neuroscience. Recent years have
brought rapid advances in our understanding of the brain processes involved in
avoidance learning, along with their clinical implications for anxiety disorders, PTSD,
etc. [7, 10]. Avoidance behavior in rodents has predominantly been studied using the
lever-press signaled avoidance task, which requires animals to press a tool upon presentation
of a warning signal in order to prevent or escape punishment [10]. The development of
new techniques capable of modeling multidimensional cognitive activity could be a
valuable contribution to psychophysiological studies. The system organization of
human and animal behavior, including the processes of systemogenesis, can be studied
in a variety of situations, such as learning and performing behavioral tasks,
acute/chronic stress, psychotrauma, alcohol intoxication, etc. This paper presents a
novel rodent avoidance test designed to expand the possibilities for exploring
learning and memory processes.
2 Device
Fig. 1. Typical footshock-avoidance tests: (a) passive, (b) active, (c) modified active,
(d) “emotional contagion”—observer (left) and pain-demonstrator (right); (e) device controller
(left) and a photograph illustrating the stable contact between the electrodes (the arrow indicates one
of the electrodes) and the animal’s skin (right).
Fig. 4. Exemplary real-time protocol for behavioral analysis (food-acquisition task). The
behavioral cycle: 1—pedal (bar) pressing; 2—start of the feeder motor; 3—lowering the rat’s head and
taking food from the feeder. Right: frame from the actual video recording during operant food-
acquisition behavior. The object is identified (rectangle) and its coordinates are recorded on a PC.
The food-acquisition behavioral cycle was divided into several acts (Fig. 4, left):
pedal (bar) pressing (mechanosensor); moving to the pedal corner; lowering the head
(photosensor) and taking food from the feeder. The moving object is identified (Fig. 4, right,
rectangle) by custom software using the open-source Accord.NET Framework [11].
The signal coordinates are recorded on a PC, and the animals’ movement paths are restored
from the coordinates (see Fig. 3c).
The Accord.NET Framework is a .NET machine learning framework, combined
with audio and image processing libraries, completely written in C#. It provides real-time
object detection and tracking, as well as general methods for detection and tracking,
and is conveniently open source.
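Once per-frame centroid coordinates have been recorded, path restoration reduces to simple geometry. The sketch below is a minimal illustration in Python (not the actual Accord.NET-based software) of how a track of (x, y) positions yields the travelled distance and per-frame speeds.

```python
import math

def path_length(coords):
    """Total distance travelled along per-frame (x, y) centroid positions."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def speeds(coords, fps):
    """Instantaneous speed (units per second) between consecutive frames."""
    return [math.dist(a, b) * fps for a, b in zip(coords, coords[1:])]
```

With pixel coordinates and a known camera calibration, the same quantities convert directly to centimetres and cm/s, which is what is usually reported for movement paths like those in Fig. 3c.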
5 Conclusion
We have built and debugged a novel rodent avoidance task procedure that yields
a new type of data on individual differences in decision-making and the choice
of avoidance strategies. For example, experiments in the active non-instrumental avoidance
test (see Fig. 1b) showed that female rats chose to minimize risk and avoid the shock
during the low-voltage current rise (a signal for avoidance), while male rats did so during the
pause between trials, which avoids the shock completely but carries a risk of a
high-voltage shock on rare occasions.
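The escape/avoidance distinction described above can be made explicit with a toy classifier over trial timing. The ramp boundaries `ramp_start` and `ramp_end` are hypothetical parameters for illustration, not values taken from the device.

```python
def classify_response(t_action, ramp_start, ramp_end):
    """Label an escape-platform entry by when it occurred within a trial.

    Before the current ramp begins -> "avoidance" (shock never arrives);
    during the slow current rise   -> "escape";
    after the ramp ends            -> "failure" (full shock received).
    """
    if t_action < ramp_start:
        return "avoidance"
    if t_action <= ramp_end:
        return "escape"
    return "failure"
```

Tallying these labels per animal gives a simple per-subject strategy profile of the kind that distinguished the female and male rats in the example above.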
We have created an experimental design based on instrumental learning tasks
that allows exploring motivation-dependent brain activity (avoidance or approach).
The novel rodent avoidance test that we developed expands the possibilities for
exploring learning and memory processes.
164 A. I. Bulava et al.
Acknowledgments. This research was performed in the framework of the state assignment of the
Ministry of Science and Higher Education of Russia (No. 0159-2019-0001, Institute of
Psychology RAS—learning procedures; No. 0149-2019-0011, Shirshov Institute of
Oceanology RAS—device design).
References
1. Alexandrov, Y.I., Sams, M.: Emotion and consciousness: ends of a continuum. Cogn. Brain
Res. 25, 387–405 (2005)
2. Bulava, A.I., Grinchenko, Y.V.: Patterns of hippocampal activity during appetitive and
aversive learning. Biomed. Radioelectron. 2, 5–8 (2017)
3. Bulava, A.I., Svarnik, O.E., Alexandrov, Y.I.: Reconsolidation of the previous memory:
decreased cortical activity during acquisition of an active avoidance task as compared to an
instrumental operant food-acquisition task. In: 10th FENS Forum of Neuroscience, Abstracts
P044609, p. 3493 (2016)
4. Carrillo, M., Han, Y., Migliorati, F., Liu, M., Gazzola, V., Keysers, C.: Emotional mirror
neurons in the rat’s anterior cingulate cortex. Curr. Biol. 29(8), 1301–1312 (2019)
5. Cheng, N., Van Hoof, H., Bockx, E., Hoogmartens, M.J., et al.: The effects of electric
currents on ATP generation, protein synthesis, and membrane transport in rat skin. Clin.
Orthop. 171, 264–272 (1982)
6. Keum, S., Shin, H.-S.: Rodent models for studying empathy. Neurobiol. Learn. Mem. 135,
22–26 (2016)
7. Krypotos, A.-M., Effting, M., Kindt, M., Beckers, T.: Avoidance learning: a review of
theoretical models and recent developments. Front. Behav. Neurosci. 9, 189 (2015)
8. Muenzinger, K.F., Mize, R.H.: The sensitivity of the white rat to electric shock: threshold
and skin resistance. J. Comp. Psychol. 15(1), 139–148 (1933)
9. Shvyrkova, N.A., Shvyrkov, V.B.: Visual cortical unit activity during feeding and avoidance
behavior. Neurophysiology 7, 82–83 (1975)
10. Urcelay, G.P., Prevel, A.: Extinction of instrumental avoidance. Curr. Opin. Behav. Sci. 26,
165–171 (2019)
11. Accord.NET Framework. http://accord-framework.net/index.html. Accessed 14 May 2019
Direction Selectivity Model Based
on Lagged and Nonlagged Neurons
1 Introduction
Primary visual cortex neurons are selective to various characteristics of the stimulus: orientation, direction of motion, color, etc. [1]. Most of the direction selectivity (DS) models include a time delay between the spatially separated inputs into a cortical cell [2]. The physiological mechanism of the formation of this delay was revealed in [3] and further, in more detail, in [4], where with the help of intracellular in vivo recordings it was demonstrated that lateral geniculate nucleus (LGN) neurons fall into two classes, lagged and non-lagged cells, and that the delay of the lagged neurons is determined by the effects of the inhibitory-excitatory synaptic complexes formed on the synaptic axonal terminals of retinal ganglion cells in the LGN. In
[5], a complex schematic model of DS was proposed, based on specific convergent projections of the signals from lagged and non-lagged LGN cells, as well as on intracortical interactions. Later, a reduced rate model of a hypercolumn that exploits lagged and non-lagged LGN cells and feedforward inhibition was proposed [6]. However, no detailed and comprehensive model has been reported yet. In our biophysically detailed model of V1 we use a conductance-based refractory density (CBRD) approach [7], which allows us to benefit from the advantages of population models while keeping the precision of biophysically detailed models.
© Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 165–171, 2020.
https://doi.org/10.1007/978-3-030-30425-6_19
166 A. V. Chizhov et al.
2 Methods
Lag-Nonlag Mechanism of Direction Selectivity. The LGN neurons differ
in their delayed reaction to visual stimuli and split into two populations of lagged
and non-lagged cells (Fig. 1). These populations are equally and homogeneously
distributed across LGN (Fig. 1, middle). The lagged/non-lagged cells have round,
center-surround receptive fields (RF) (Fig. 1, left). We consider only so-called on-cells, which respond strongly to a bright stimulus in the center of the RF and are inhibited in its surround. The center-surround structure is described by an axisymmetric difference of Gaussians (DOG), as in [8], with the RF's temporal component set as a double-exponential function. The firing rate of an LGN neuron at any given time is expressed as the convolution of the RF with the stimulus, rectified at zero. The model of LGN cells is described in detail in [9]. Lagged cell activity is delayed by 40 ms, according to estimations from [4].
where

$$D_{LGN-V1}(x, y, \tilde{x}, \tilde{y}) = \frac{1}{\pi \sigma_{pref} \sigma_{orth}} \exp\left(-\frac{x^2}{\sigma_{pref}^2} - \frac{y^2}{\sigma_{orth}^2}\right),$$

$$x = (\tilde{x} - x_{cf})\cos\theta - (\tilde{y} - y_{cf})\sin\theta, \qquad y = (\tilde{x} - x_{cf})\sin\theta + (\tilde{y} - y_{cf})\cos\theta,$$

$$\delta(x, y, \tilde{x}, \tilde{y}) = \begin{cases} 40~\text{ms}, & \text{if } (-1)^{i_{PW} + j_{PW}}\, x > 0, \\ 0, & \text{otherwise}. \end{cases}$$
Here $D_{LGN-V1}(x, y, \tilde{x}, \tilde{y})$ is the LGN-to-V1 footprint, with the width $\sigma_{pref}$ across the preferred orientation and the width $\sigma_{orth}$ across the orthogonal orientation; $\delta(x, y, \tilde{x}, \tilde{y})$ is the delay that determines the contributions of either lagged or non-lagged cells.
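The receptive-field and delay definitions above can be sketched numerically. The following Python fragment is illustrative only: the DOG widths and grid are assumed values, not those of the paper, and a plain inner product stands in for the full space-time convolution with the double-exponential temporal component.

```python
import numpy as np

def dog(x, y, sigma_c=0.5, sigma_s=1.5):
    """Axisymmetric difference of Gaussians (on-center, off-surround RF)."""
    r2 = x ** 2 + y ** 2
    center = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return center - surround

def lgn_rate(stimulus, xs, ys):
    """Instantaneous rate: RF-stimulus inner product, rectified at zero."""
    X, Y = np.meshgrid(xs, ys)
    drive = np.sum(dog(X, Y) * stimulus) * (xs[1] - xs[0]) * (ys[1] - ys[0])
    return max(drive, 0.0)

def delta(x, i_pw, j_pw):
    """Lag rule: a 40 ms delay on one half of the footprint, 0 on the other."""
    return 40.0 if ((-1) ** (i_pw + j_pw)) * x > 0 else 0.0

xs = ys = np.linspace(-3, 3, 61)
bright = (dog(*np.meshgrid(xs, ys)) > 0).astype(float)  # bright spot on the RF center
print(lgn_rate(bright, xs, ys) > 0)   # True: the on-cell is driven
print(delta(-1.0, i_pw=0, j_pw=1))    # 40.0: lagged side of the footprint
```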
Biophysically Detailed Mathematical Model of V1. V1 is modeled as a
continuum in 2-d cortical space. Each point contains 2 populations of neurons,
excitatory (E) and inhibitory (I), connected by AMPA, NMDA and GABA-
A-mediated synapses for recurrent interactions and only AMPA and NMDA
for LGN input. The strengths of the external connections correspond to the
pinwheel architecture, thus neurons receive inputs according to their orientation
and direction preferences. The strengths of the intracortical connections, i.e.
maximum conductances, are isotropic and distributed according to locations of
pre- and postsynaptic populations. The modeled area of the cortex was as large
as 1 mm × 1.5 mm and included 6 orientation hypercolumns.
The mathematical description of each population is based on the CBRD app-
roach [10,11], where neurons within each population are distributed according
to their phase variable, the time elapsed since their last spikes, t∗ . Single popu-
lation dynamics is governed by the equations for the neuronal density, the mean
over noise realizations voltage and gating variables. The CBRD for interacting
adaptive regular spiking pyramidal cells and fast spiking interneurons is given
in [7,12]. The model of an E-neuron takes into account two compartments and
a set of voltage-gated ionic currents, including the adaptation currents.
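The full CBRD equations [7,10-12] are too long to reproduce here; the sketch below illustrates only the coupling structure described above: an E and an I population interacting through AMPA-, NMDA- and GABA-A-like conductances, with the I population receiving no direct LGN drive. A threshold-linear firing-rate rule replaces the refractory-density dynamics, and all time constants and weights are assumptions chosen for illustration, not the paper's parameters.

```python
import numpy as np

def simulate(T=200.0, dt=0.5, lgn=lambda t: 5.0):
    """Two coupled populations; returns a list of (rE, rI) rate pairs (a.u.)."""
    tau_m, tau_ampa, tau_nmda, tau_gaba = 10.0, 2.0, 50.0, 5.0   # ms, assumed
    vE = vI = -70.0                                              # mV
    g = dict(ampa_E=0.0, nmda_E=0.0, gaba_E=0.0, ampa_I=0.0, nmda_I=0.0)
    rates = []
    for step in range(int(T / dt)):
        rE = max(vE + 60.0, 0.0)         # threshold-linear rate stand-in
        rI = max(vI + 60.0, 0.0)
        inp = lgn(step * dt)
        # first-order conductance kinetics; only E receives direct LGN drive
        g['ampa_E'] += dt * (-g['ampa_E'] / tau_ampa + 0.3 * (inp + rE))
        g['nmda_E'] += dt * (-g['nmda_E'] / tau_nmda + 0.05 * (inp + rE))
        g['gaba_E'] += dt * (-g['gaba_E'] / tau_gaba + 0.4 * rI)
        g['ampa_I'] += dt * (-g['ampa_I'] / tau_ampa + 0.3 * rE)
        g['nmda_I'] += dt * (-g['nmda_I'] / tau_nmda + 0.05 * rE)
        # membrane equations with 0 mV excitatory and -80 mV inhibitory reversals
        vE += dt / tau_m * (-(vE + 70.0)
                            + (g['ampa_E'] + g['nmda_E']) * (0.0 - vE) * 0.01
                            + g['gaba_E'] * (-80.0 - vE) * 0.01)
        vI += dt / tau_m * (-(vI + 70.0)
                            + (g['ampa_I'] + g['nmda_I']) * (0.0 - vI) * 0.01)
        rates.append((rE, rI))
    return rates

print(simulate(lgn=lambda t: 0.0)[-1])   # (0.0, 0.0): no input, no activity
```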
3 Results
We have tested the mechanism of DS by comparing spatio-temporal activity patterns (Fig. 2) in response to horizontal gratings moving up (a) and down (b) with a temporal frequency of 8 Hz and a spatial frequency of 0.25 cycles/degree. The bright spots correspond to high activity. They appear in columns that prefer an orientation similar to that of the stimulus. The patterns are not symmetrical with respect to the central vertical axis, which is due to DS, i.e. different direction preferences of neurons in the left and right columns with the same orientation preferences, as is clear from the activity of E-neurons averaged over the first 1600 ms (Fig. 2c). The peaks of the E-cell activity are located in different hypercolumns, depending on the direction of the grating movement. The plots of the excitatory firing rate (Fig. 2c) are comparable to optical imaging data, for example, those obtained in cat visual cortex [13] (see their Figs. 4A-B).
For the location marked in Fig. 2c, the LGN input, mean voltage, synaptic conductances, firing rate, voltage-sensitive dye (VSD) signal and voltage of representative neurons are shown in Fig. 2d, e. These simulated signals are similar to experimental recordings, for instance, those from [14] (their Fig. 5). The firing rates of the E and I populations correlate in time. The amplitude of the firing rate oscillations strongly depends on the direction of grating movement (compare panels d and e). The VSD signal (bottom trace) was calculated as the sum of three quarters of the E mean voltage and one quarter of the I mean voltage. It is comparable to experimentally recorded VSD signals, for instance, from [15].
The input signals for the neurons of the populations are the synaptic conductances (Fig. 2d, e). The temporal modulations of the excitatory and inhibitory components are in phase. When comparing with experiments, it should be noted that we present separate AMPA, NMDA and GABA conductances, whereas the known experimental studies reported anti-phase estimates of the summed excitatory (AMPA+NMDA) and inhibitory conductances [14,16-18]. These should not be compared directly, because of an underestimation inherent in the experimental method that was recently revealed [19]. That is why our observation of in-phase modulations of the AMPA and GABA conductances should not be considered untrue when compared with experimental estimates of anti-phase excitatory and inhibitory conductances.
The CBRD model enables one to reconstruct the behavior of a representative neuron if the input variables of its population are known. As seen from the voltage traces, such a representative E-neuron generates spikes when the direction of grating movement is the preferred one. When the direction is opposite, only sub-threshold depolarization is observed. An I-neuron shows weaker direction specificity. Voltage traces recorded in response to moving gratings are consistent with those presented in electrophysiological works in vivo, such as [14,18], in the shape and amplitude of the voltage oscillations. The mean voltage shown in Fig. 2d, e is the mean across noise realizations and across input weights. Membrane potentials of individual neurons generally differ from this
mean voltage due to an individual input weight obeying the lognormal distribution, different noise realizations and a different refractory state t∗, as seen from the example for the representative neuron.

[Fig. 2. (a, b) Spatio-temporal activity patterns (100-225 ms, firing rates 0-30 Hz) in response to gratings moving up and down; (c) activity maps averaged over time for the "up" and "down" directions; (d, e) for the preferred and non-preferred directions: LGN input, mean voltage of the E and I populations, synaptic conductances (AMPA, NMDA, GABA), E firing rate, voltage of representative E- and I-neurons, and the VSD signal.]
4 Discussion
In our model, the average activity patterns (Fig. 2c) are comparable with the
optical imaging data [13]. The scales and contrast of the modeled and experi-
mental spots of activity are similar. Also, the displacement of the spots after the
change of the stimulus direction is similar.
We have found that E-neurons are directionally selective, whereas I-neurons are not, for two reasons: I-neurons do not receive direct LGN input, and the characteristic length of E-to-I connections is five times larger than that of E-to-E connections.
The voltage traces registered in [14,17,18] show the same degree of DS as our model. The Lag-Nonlag mechanism is in principle similar to that based on transient and sustained cells [20]. Alternatively, recently reported experimental
data obtained with the help of optogenetics [21] and multielectrode electrophysi-
ological recordings [22] suggest that DS in V1 is determined by a displacement of
on- and off- subzones of the receptive fields of V1 neurons. Here we did not take
into account the off-signals; instead, we considered only on-center off-surround
neurons in LGN and their pure excitatory projections to V1. Introduction of
feedforward inhibition and/or off-center on-surround LGN neurons and on-off
separation at the level of V1 is expected to produce stronger DS. This issue is
to be considered in our future study.
In conclusion, the proposed model is quite realistic in both construction and behavior. Simulations confirm that the suggested mechanism is consistent with known experimental constraints.
References
1. Hubel, D.H., Wiesel, T.N.: Receptive fields of single neurones in the cat’s striate
cortex. J. Physiol. 148, 574–591 (1959)
2. Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of
motion. J. Opt. Soc. Am. A. 2, 284–299 (1985)
3. Cai, D., DeAngelis, G.C., Freeman, R.D.: Spatiotemporal receptive field organiza-
tion in the lateral geniculate nucleus of cats and kittens. J. Neurophysiol. 78(2),
1045–1061 (1997)
4. Vigeland, L.E., Contreras, D., Palmer, L.A.: Synaptic mechanisms of temporal
diversity in the lateral geniculate nucleus of the thalamus. J. Neurosci. 33(5),
1887–1896 (2013)
5. Saul, A.B., Humphrey, A.L.: Evidence of input from lagged cells in the lateral
geniculate nucleus to simple cells in cortical area 17 of the cat. J. Neurophysiol.
68(4), 1190–1208 (1992)
6. Ursino, M., La Cara, G.E., Ritrovato, M.: Direction selectivity of simple cells in
the primary visual cortex: comparison of two alternative mathematical models. I:
response to drifting gratings. Comput. Biol. Med. 37(3), 398–414 (2007)
Olga E. Dick
1 Introduction
Panic attacks comprise a complex of symptoms characterized by paroxysmal fear [1, 2]. The importance of the problem of treating this disorder stems from the low effectiveness of drug therapy, which is why there is still a need for safe non-drug therapies. One of these methods is the activation of artificial stable functional connections (ASFC) of the human brain. The ASFC method is based on the intracerebral phenomenon of long-term memory: a special kind of functional connection of the brain that is formed under conditions of activation of subcortical structures and impulse stimulation and is associated with the regulatory systems of the brain [3-5].

The aim of this work is to show the possibility of identifying quantitative indicators of the improvement of the functional state of the brain in patients with panic attacks after ASFC trials.
2 Methods

Artifact-free EEG patterns were analyzed in 10 patients aged from 26 to 45 years, with an average disease duration of 10 years and a diagnosis of panic disorder. The course of correction was performed at the clinic of the Institute of the Human Brain of the Russian Academy of Sciences and consisted of 10 trials of the formation of ASFC. Each trial included 6 series of photostimulation with a frequency of 20 Hz and a duration of 10 s against the background of the medication ethimizol; the intervals between the stimuli were 60 s. The photostimulation was carried out using the functional brain activity simulator "Mirage" (St. Petersburg). This device has proven itself in programs of non-drug correction in earlier studies [3-5]. Before and after these trials, the brain bioelectrical activity was recorded on a 21-channel electroencephalograph with a sampling rate of 256 Hz. The study was approved by the local Ethics Committee. Written informed consent was obtained from all the subjects. The stimulation lasted 10 s for each frequency, with a resting interval between frequencies of 30 s. Since the signals reproducing the light rhythm have maximal amplitude in the occipital lobes, the patterns at the O1, Oz and O2 sites were estimated.
The photic driving reaction in EEG patterns was estimated by the continuous
wavelet transform method [6] and the method of the joint recurrence analysis [7].
In the first method, the complex Morlet wavelet was used as the basic wavelet:

$$\psi(t) = \pi^{-1/4} \exp(-0.5\,t^2) \exp(i \omega_0 t),$$

where the value $\omega_0 = 2\pi$ gives the simple relation between the scale $a$ of the wavelet transform and the real frequency $f$ of the analyzed signal [6]:

$$f = \frac{\omega_0 + \sqrt{2 + \omega_0^2}}{4\pi a} \approx \frac{1}{a}.$$
Due to the relation between $a$ and $f$, the continuous wavelet transform of the signal $x(t)$ is determined by the function

$$W(f, t_0) = \pi^{-1/4} \sqrt{f} \int_{-\infty}^{+\infty} x(t)\, \exp\left(-0.5 (t - t_0)^2 f^2\right) \exp\left(-i 2\pi (t - t_0) f\right) dt,$$

where $t_0$ gives the shift of the wavelet function along the time axis.
The value $|W(f, t_0)|^2$ determines the instantaneous distribution of the energy over the frequencies $f$, and the integral

$$E(f) = \int_{t_1}^{t_2} |W(f, t_0)|^2 \, dt_0$$

describes the global wavelet spectrum, i.e. the integral distribution of the wavelet spectrum energy over frequencies on the time interval $[t_1, t_2]$.
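A direct numerical transcription of the two formulas above can serve as a sketch (a brute-force discretized integral is used for clarity, not efficiency; the test signal and window are assumed values):

```python
import numpy as np

def cwt_morlet(x, t, f):
    """W(f, t0) on the sampling grid t, as a discretized integral."""
    dt = t[1] - t[0]
    W = np.empty(len(t), dtype=complex)
    for k, t0 in enumerate(t):
        kernel = np.exp(-0.5 * (t - t0) ** 2 * f ** 2) \
               * np.exp(-2j * np.pi * (t - t0) * f)
        W[k] = np.pi ** -0.25 * np.sqrt(f) * np.sum(x * kernel) * dt
    return W

def global_spectrum(x, t, f, t1, t2):
    """E(f): |W(f, t0)|^2 integrated over t0 in [t1, t2]."""
    dt = t[1] - t[0]
    W = cwt_morlet(x, t, f)
    mask = (t >= t1) & (t <= t2)
    return np.sum(np.abs(W[mask]) ** 2) * dt

fs = 256.0                          # sampling rate used in the study
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 20 * t)      # a pure 20 Hz test signal
e20 = global_spectrum(x, t, 20.0, 1.0, 3.0)
e05 = global_spectrum(x, t, 5.0, 1.0, 3.0)
print(e20 > 10 * e05)               # True: energy concentrates at 20 Hz
```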
The light time series was approximated by a sequence of $k$ Gaussian impulses following each other with frequency $f_C$:

$$p(t) = \sum_{j=0}^{k-1} \frac{0.5}{r_0 \sqrt{\pi}} \exp\left(-\frac{(t - t_j)^2}{4 r_0^2}\right),$$

where $r_0 = 10$ ms is the width of the impulse and $t_j$ are the centers of the impulses: $t_j = t_A + j/f_C$, $j = 0, \ldots, k - 1$; $t_A$ is the time of the beginning of the first impulse in the sequence [8].
174 O. E. Dick
The wavelet transform of the light series $p(t)$ was found in the form [9]:

$$W(f, t_0) = \pi^{-1/4} \frac{\sqrt{f}}{\sqrt{g}} \sum_{j=0}^{k-1} \exp\left( -\frac{f^2 (t_j - t_0)^2}{2g} - \frac{(2\pi r_0)^2 f^2}{g} + i\,\frac{2\pi f (t_j - t_0)}{g} \right),$$

where $g = 1 + 2(r_0 f)^2$.
The presence of the photic driving reaction was estimated by the value of the coefficient of photic driving (kR) in the narrow range [fC − Δf, fC + Δf] around each applied stimulation frequency fC, where Δf = 0.5 Hz [9]. The coefficient kR was determined as the ratio of the maxima of the global wavelet spectra during the photic stimulation and before it. A value kR < 1 means that the energy of the global wavelet spectrum during the light stimulation is less than the energy of the spectrum before stimulation, and indicates the absence of the photic driving reaction at the given frequency.
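Given the global wavelet spectra before and during stimulation, the coefficient kR can be computed exactly as defined above. In this sketch the toy spectra are assumptions used only to exercise the function:

```python
import numpy as np

def driving_coefficient(E_stim, E_rest, freqs, fc, df=0.5):
    """kR: ratio of global-spectrum maxima in the band [fc - df, fc + df]."""
    band = (freqs >= fc - df) & (freqs <= fc + df)
    return np.max(E_stim[band]) / np.max(E_rest[band])

freqs = np.linspace(15, 25, 101)
# toy spectra: the 20 Hz peak is 8x stronger during stimulation
E_rest = 1.0 + np.exp(-(freqs - 20.0) ** 2)
E_stim = 1.0 + 8.0 * np.exp(-(freqs - 20.0) ** 2)
kR = driving_coefficient(E_stim, E_rest, freqs, 20.0)
print(round(kR, 2))   # 4.5 > 1: a photic driving reaction is present
```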
The second method of the analysis of the photic driving reaction in the EEG
patterns is connected with the construction of joint recurrence plots of the EEG and the
light series.
A joint recurrence plot is a graphical representation of the matrix

$$R_{i,j}(\varepsilon) = \begin{cases} 1, & \|y_i - y_j\| \le \varepsilon \ \text{and} \ \|z_i - z_j\| \le \varepsilon, \\ 0, & \text{otherwise}, \end{cases}$$

in which the values 1 and 0 correspond to black and white points, where a black point means a recurrence and a white point corresponds to a nonrecurrence [7]. A joint recurrence, to within an error ε, is defined as the return of the state y_j of the EEG phase trajectory to the state y_i together with the simultaneous return of the state z_j of the light-signal phase trajectory to the state z_i [7].
The phase trajectories of the states z(t) and y(t) were obtained from the initial time series {x(t)} and {p(t)} by the delay coordinate embedding method [10]:

$$y_i = (x_i, x_{i+d}, \ldots, x_{i+(m-1)d}), \qquad z_i = (p_i, p_{i+d}, \ldots, p_{i+(m-1)d}),$$

where d is the delay time and m is the embedding dimension, i.e. the minimal dimension of the space in which the reconstructed trajectory reproduces the properties of the initial trajectory.
The optimal time delay d was chosen at the first minimum of the mutual information function [11]. The optimal embedding dimension m was found by the false nearest neighbors method [12]. Extracting the signal in a narrow band of frequencies around the photostimulation frequency allowed us to find an optimal embedding dimension m < 5. The value ε was set equal to 1% of the standard deviation of the analyzed signal.
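The embedding and joint recurrence definitions above can be sketched as follows (illustrative Python; the toy signal and the 1% ε fraction follow the text, while the Euclidean norm is an assumed distance choice):

```python
import numpy as np

def embed(x, d, m):
    """Delay-coordinate embedding: row i is (x_i, x_{i+d}, ..., x_{i+(m-1)d})."""
    n = len(x) - (m - 1) * d
    return np.column_stack([x[i * d:i * d + n] for i in range(m)])

def joint_recurrence(x, p, d=3, m=3, eps_frac=0.01):
    """Joint recurrence matrix: 1 where both embedded trajectories recur."""
    x, p = np.asarray(x, float), np.asarray(p, float)

    def rp(series):
        traj = embed(series, d, m)
        eps = eps_frac * np.std(series)   # 1% of the standard deviation
        dist = np.linalg.norm(traj[:, None, :] - traj[None, :, :], axis=-1)
        return (dist <= eps).astype(int)

    return rp(x) * rp(p)

t = np.arange(0, 4, 1 / 256)              # 256 Hz sampling, as in the study
sig = np.sin(2 * np.pi * 20 * t)          # 20 Hz test "EEG"
JR = joint_recurrence(sig, sig)
print(JR.trace() == JR.shape[0])          # True: the main diagonal always recurs
```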
Using the recurrence analysis we determined the quantitative measures of joint
recurrence plots such as
Wavelet and Recurrence Analysis of EEG Patterns 175
(1) the mean length of the diagonal lines, L, in the joint recurrence plot;
(2) the recurrence time, τ, needed for the signal to return into the ε-neighborhood of a previous point, measured as the vertical distance between the onset and the end of a sequent recurrence structure in the recurrence plot;
(3) the recurrence rate, RR:

$$RR = \frac{1}{N^2}\sum_{i,j}^{N} R_{i,j}(\varepsilon);$$

(4) the measure of the determinism of the signal, DET, the ratio of the recurrence points that form diagonal structures of at least length $l_{min}$ to all recurrence points:

$$DET = \frac{\sum_{l=l_{min}}^{N} l\,P(\varepsilon, l)}{\sum_{i,j}^{N} R_{i,j}(\varepsilon)},$$

where $P(\varepsilon, l)$ is the number of diagonal lines of length $l$.
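The measures RR and DET can be computed from a recurrence matrix directly from their definitions (a sketch; the small matrix below is a made-up example):

```python
import numpy as np

def recurrence_rate(R):
    """RR: fraction of recurrent points in the matrix."""
    return R.sum() / R.size

def determinism(R, lmin=2):
    """DET: share of recurrence points on diagonals of length >= lmin."""
    n = R.shape[0]
    det_points = 0
    for k in range(-(n - 1), n):            # scan every diagonal
        run = 0
        for v in list(np.diagonal(R, k)) + [0]:  # trailing 0 flushes the run
            if v:
                run += 1
            else:
                if run >= lmin:
                    det_points += run
                run = 0
    return det_points / R.sum() if R.sum() else 0.0

R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
print(recurrence_rate(R))   # 0.625
print(determinism(R))       # 1.0: every recurrence point lies on a diagonal
```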
3 Results
In the background EEG of 69% of the patients, high-amplitude activity of the θ range dominated, while the EEG of 31% of the patients showed low-amplitude polymorphic activity in the δ, θ and α ranges before the ASFC trials. The ASFC trials resulted in a significant decrease in the amplitude of the θ activity, the disappearance of the polymorphic activity and an increase in the activity in the α range.

The reactive EEG patterns before the ASFC trials were characterized by asymmetry of the responses of the occipital lobes of the brain to the photostimulus. It was manifested in different values of the maxima of the local wavelet spectra of the EEG patterns recorded at the O1 and O2 sites (Fig. 1a, c).

After 10 trials of ASFC, all patients reported a significant decrease or complete disappearance of panic attacks and a decrease in general and situational anxiety. The asymmetry of the photic driving reaction decreased (Fig. 1b, d).
Table 1 shows the average values of the photic driving coefficient (kR) for the reactive EEG patterns before and after the ASFC trials. For 9 out of 10 patients with panic attacks, the value of the photic driving coefficient kR < 1 for frequencies of the θ range, which means the absence of the photic driving reaction at this rhythm. A minor photic driving reaction is revealed for frequencies of the α range (kR = 1.9 ± 0.2 for 12 Hz and kR = 1.1 ± 0.1 for 8 Hz). A large reaction is found for frequencies of the β range, for example, kR = 101 ± 11 for 20 Hz. At the same time, the value of kR for the O2 site is almost five times higher than that for the O1 site. Thus, there are statistically significant differences in the mean values of the coefficient kR calculated for the occipital sites O1 and O2 (p < 0.05), which testifies to the asymmetry of the photic driving reaction in the β range in most patients tested.

Fig. 1. A decrease of the maxima of the local wavelet spectra of EEG patterns at the O1 and O2 sites after the ASFC trials. The beginning and end of the photostimulation are indicated by arrows.
After the ASFC trials, the asymmetry of the responses of the occipital lobes of the brain became statistically insignificant (p > 0.05), and kR < 1 for the α range. The photic driving reaction in the β range decreases significantly (kR = 5.5 ± 0.5 for the O1 site at 20 Hz).
The dynamics of the rhythm driving in the EEG patterns of patients with panic attacks after the ASFC trials was also confirmed by a change in simultaneous recurrences in the joint recurrence plots of these patterns and the light time series. Examples of such plots are presented in Fig. 2b and d, respectively. The plots are constructed at 20 Hz for the delay time d = 3 and the embedding dimension m = 3; the value of the neighborhood size ε is equal to 1% of the standard deviation of the analyzed time series. The corresponding EEG patterns during photostimulation with this frequency are shown in Fig. 2a, with a bold line, and the photostimulus with a thin dash-dotted line.
The left recurrence plot (Fig. 2b) has recurrent structures containing long diagonal lines, which testifies to the emergence of simultaneous recurrences in the EEG pattern and the light signal. During the increase in the amplitude of the brain response to the photostimulation of the proposed frequency (within the range of nL values from 600 to 1800), the number of simultaneous recurrences increases, which is reflected in an increase in the length of the diagonal lines in the recurrence plot.
Table 1. The mean values of the photic driving coefficient (kR), the recurrence rate (RR) and the recurrence time (τ) in the joint recurrence plots of the EEG patterns and the light time series (N = 9 of 10) before and after the ASFC trials.

f (Hz)   before ASFC: O1   before ASFC: O2   after ASFC: O1   after ASFC: O2
Coefficient of photic driving (kR):
6        <1                <1                <1               <1
12       1.9 ± 0.2         2.7 ± 0.2         <1               <1
14       5.4 ± 0.5         122 ± 18          2.1 ± 0.2        3.5 ± 0.3
18       35 ± 3.7          147 ± 15          11 ± 1.2         17 ± 1.8
20       22 ± 1.9          101 ± 11          5.5 ± 0.5        7.1 ± 0.7
Recurrence time (τ):
6        39 ± 3.1          33 ± 3.1          35 ± 3.3         31 ± 3.0
12       28 ± 2.7          24 ± 2.3          39 ± 3.9         41 ± 4.1
14       13 ± 1.1          8 ± 0.8           44 ± 4.3         36 ± 3.5
18       7 ± 0.6           4 ± 0.3           25 ± 2.4         30 ± 2.9
20       9 ± 0.8           7 ± 0.6           37 ± 3.6         46 ± 4.5
Recurrence rate (RR):
6        0.05 ± 0.005      0.04 ± 0.004      0.06 ± 0.006     0.03 ± 0.003
12       0.08 ± 0.008      0.07 ± 0.007      0.05 ± 0.005     0.06 ± 0.006
14       0.11 ± 0.01       0.13 ± 0.01       0.02 ± 0.002     0.03 ± 0.003
18       0.13 ± 0.01       0.15 ± 0.01       0.03 ± 0.003     0.02 ± 0.002
20       0.32 ± 0.02       0.27 ± 0.02       0.04 ± 0.003     0.02 ± 0.002
Fig. 2. Examples of EEG patterns during the photostimulation with a frequency of 20 Hz before
(a) and after the ASFC trials (c) (site O2). b, d are the joint recurrence plots of these patterns and
light time series
By contrast, the right recurrence plot (Fig. 2d) has only short diagonal lines, which indicates weak joint recurrence between the given light time series and the analyzed EEG pattern.
Figure 3 shows in detail the dynamics of the changes in the values of the measures of the recurrence plot given in Fig. 2b. The abscissa axis is the time calculated according to the rule t = nL·dt, where dt = 1/Fs and Fs is the sampling frequency of the recorded signal. Figure 3a depicts a gradual increase and decrease in the amplitude of the EEG pattern in the time interval from 2 to 7 s in response to photostimulation with a frequency of 20 Hz, which lasted 10 s. Within this interval the measures of the recurrence plot increase, namely, the recurrence rate (RR) (Fig. 3b), the determinism (DET) (Fig. 3c) and the mean length of the diagonal lines (L) (Fig. 3d); the recurrence time (τ) changes only slightly (Fig. 3e).
Fig. 3. The EEG pattern during the photostimulation (solid line) and the light time series with a frequency of 20 Hz (dash-dotted line) (a), and the time changes of the recurrence plot measures: the recurrence rate (RR) (b), the determinism (DET) (c), the mean length of the diagonal lines (L) (d) and the recurrence time (τ) (e).
The mean values of the recurrence rate and the recurrence time for the EEG patterns before and after the ASFC trials are presented in Table 1. The data of Table 1 point to the increase of the mean recurrence times and the decrease of the mean recurrence rates after the ASFC trials at the frequencies of the β range (τ = 9 ± 0.8, RR = 0.32 ± 0.02 at f = 20 Hz, site O1, before ASFC, and τ = 37 ± 3.6, RR = 0.04 ± 0.003 after ASFC).
The decrease in the recurrence rate, the increase in the recurrence time and the decrease in the photic driving coefficient during photostimulation at frequencies of the β range after the ASFC trials indicate that these trials significantly reduce the response of the brain to the external stimulus.

As is known, a strong photic driving reaction in the β range and interhemispheric asymmetry of the rhythm driving are associated with increased psychoemotional excitability of a subject [13]. Therefore, the decline of the photic driving reaction in the β range found with both methods used in this work, together with the obtained decrease in the asymmetry of the occipital lobe responses to the photostimulus, shows that the ASFC trials reduce the degree of neurotization of subjects with panic attacks. Psychological testing of the patients before and after the ASFC trials confirmed a significant reduction of the leading symptoms in the clinical picture of the disease, namely, a decrease in the level of total anxiety (Table 2).
Table 2. Changes in psychological parameters in patients with panic attacks after the ASFC trials (N = 10, p < 0.05)

Indicators                               Before ASFC   After ASFC
Short-term memory index (double test)    5.1 ± 1.7     8 ± 1.6
Anxiety (Taylor test)                    27.7 ± 2.9    21.5 ± 2.3
Depression scale                         80.4 ± 8      61.3 ± 6
4 Conclusions
The analysis of reactive EEG patterns carried out before and after non-drug therapy based on the activation of artificial stable functional connections of the brain has shown that the ASFC trials reduce the asymmetry of the occipital lobe responses to the photostimulus and decrease the degree of neurotization of subjects with panic attacks. This is reflected in decreased values of the photic driving coefficient and the recurrence rate and in an increased recurrence time. The improvement of the subjects' psychophysiological state after the ASFC trials has been confirmed by the positive dynamics of the psychophysiological testing data.
Acknowledgments. This study was supported by the Program of Fundamental Scientific Research
of State Academies for 2013–2020 (GP-14, section 64). The author thanks T. N. Reznikova, Prof. of
St. Petersburg Human Brain Institute for her help with data recordings.
References
1. Cosci, F.: The psychological development of panic disorder: implications for neurobiology
and treatment. Braz. J. Psychiatry 34, 9–19 (2012)
2. Wilson, K.A., Hayward, C.: A prospective evaluation of agoraphobia and depression
symptoms following panic attacks in a community sample of adolescents. J. Anxiety Disord.
19, 87–103 (2005)
3. Smirnov, V.M., Borodkin, Y.S.: Artificial Stable Functional Connections. Medicine,
Moscow, 192 p. (1979)
1 Introduction
Modeling the dynamics of changes in the electrical potential of nerve cells is associated primarily with the works of Hodgkin and Huxley. In [1], these authors were the first to present a phenomenological model based on balance-type relations whose dynamics, with an appropriate choice of parameters, possess the basic qualitative properties of nerve cells observed in experiments. The Hodgkin-Huxley model is quite complex and contains a large number of parameters, on which its behavior depends significantly. It should be noted that in many cases the Hodgkin-Huxley model gives not only qualitatively but also quantitatively satisfactory agreement with experimental data. Since the advent of this model, numerous attempts have been made to simplify it while preserving the main effects specific to the dynamics of neurons. The survey articles [2,3] give a number of criteria that a spiking neuron model must satisfy and list a large number of model systems. Naturally, the simplest of them do not satisfy all the requirements. Among these requirements, the most important is the
© Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 181–189, 2020.
https://doi.org/10.1007/978-3-030-30425-6_21
182 S. D. Glyzin and M. M. Preobrazhenskaia
condition for the existence of a stable periodic pulse-type regime for the corre-
sponding system. To build a model of a single pulse neuron, let us reproduce
the line of reasoning from [4,5]. First of all, note that in [5] only potassium and
sodium currents are taken into account, the level of the greatest polarization of
the membrane is taken as the zero point and the potential deviation from this
level is denoted u(t). The equation of current balance, provided that leakage
currents are neglected, is written as:
where χNa (u), χK (u) are smooth functions that determine sodium and potassium
conductivity.
For the potassium conductivity, the hypothesis accepted in [5] is that it is delayed relative to the membrane potential. We take this delay to be the unit of time and assume that χK = χK(u(t−1)). To simplify the dependence χNa(u), it is noted in [5] that the areas of relative stabilization of the conductivities χNa(u) and χK(u) are large enough, so that from (1) we can pass to the following equation with delay
Here the parameter λ > 0 characterizes the speed of the electrical processes in
the system and is assumed to be large, the functions f (u) and g(u) characterize
the conductivities of the ion channels and satisfy the conditions
where u∗∗ is the threshold starting from which one neuron influences the other.
For example, if u1 < u∗∗ , then the first neuron does not affect the second one;
but if u1 > u∗∗ , then it does.
Our main goal is to adapt the above method for modelling chemical
synapses to differential-difference equations of Volterra type (see [6]). In this
case, one should depart from the universally accepted concepts and take a slightly
different system as the mathematical model of this neural network, namely,
where b = const > 0, u∗ = exp(cλ), c = const ∈ R, and the functions s(u) satisfy
the conditions
An important feature of the system (7) is the presence of an additional time delay
h > 1 in the coupling between oscillators. The reasons for choosing the system
(7) are as follows. Firstly, the general qualitative character of a synaptic link is
preserved when passing from (5) to (7), because in both cases the corresponding
coupling terms
change their sign from plus to minus as the potentials uj increase and cross
the critical value u∗. Secondly, and most importantly, there exists a well-defined
limit object for system (7), which is a relay-type delay system.
Indeed, after the passage to the new variables
xj = (1/λ) ln uj , j = 1, 2 (9)
and as the parameter λ tends to infinity, system (7) can be represented in the form
ẋ1 = −1 + αR(x1(t − 1)) − βR(x1) + γ(c − x1)H(x2(t − h)),
ẋ2 = −1 + αR(x2(t − 1)) − βR(x2) + γ(c − x2)H(x1(t − h)),   (10)
where
R(x) def= {1, x ≤ 0; 0, x > 0},   H(x) def= {0, x ≤ 0; 1, x > 0}.   (11)
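The limit dynamics governed by (10), (11) can be explored numerically. Below is a minimal sketch (not the authors' code): Euler integration with history buffers for the two delays, with illustrative parameter values chosen only to satisfy α > β + 1.

```python
import numpy as np

# Relay functions from (11)
def R(x):
    return 1.0 if x <= 0 else 0.0

def H(x):
    return 0.0 if x <= 0 else 1.0

def simulate(alpha=4.0, beta=1.0, gamma=0.5, c=-1.0, h=2.0,
             T=60.0, dt=1e-3, init=(-0.5, -0.6)):
    """Euler integration of the relay system (10) with delays 1 and h.

    Parameter values are illustrative, not taken from the paper."""
    n = int(round(T / dt))
    d1 = int(round(1.0 / dt))   # own delayed feedback, delay = 1
    dh = int(round(h / dt))     # coupling delay h > 1
    hist = max(d1, dh)
    x1 = np.full(n + hist, init[0])   # constant prehistory
    x2 = np.full(n + hist, init[1])
    for k in range(hist, n + hist - 1):
        dx1 = -1 + alpha * R(x1[k - d1]) - beta * R(x1[k]) \
              + gamma * (c - x1[k]) * H(x2[k - dh])
        dx2 = -1 + alpha * R(x2[k - d1]) - beta * R(x2[k]) \
              + gamma * (c - x2[k]) * H(x1[k - dh])
        x1[k + 1] = x1[k] + dt * dx1
        x2[k + 1] = x2[k] + dt * dx2
    return x1[hist:], x2[hist:]

x1, x2 = simulate()
```

The resulting trajectories alternate between positive sections (spikes in the original variables uj) and negative sections, as described in the text.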
As it turns out, the system (10) has rather complex dynamics. As will be
shown in the next section, in this system, by introducing a delay in the chain of
relations between the equations, two fundamentally important phenomena can
be achieved at once. The first of these consists in the coexistence of several stable
periodic regimes in the system (10). In this case, a mechanism for increasing
the number of such regimes can be indicated. This phenomenon is often called
multistability. The second important property of the system (10) solutions is that
they have some preassigned number of consecutive positive sections, followed
by a large section of negativity. Taking into account the substitution (9), such
cycles of the system (7) correspond to periodic solutions with the same number
of consecutive asymptotically high spikes, alternating with the section where the
potentials uj (t) are close to zero. Periodic solutions with this property are called
bursting-cycles (see [2,3,10,11]).
The proof of the theorem on the correspondence between the solutions of the
system (7) and the limit system (10) is a technically rather complicated task
(see, for example, [12,13]). It is connected with the construction of asymptotic
approximations of the solution of the system (7). To avoid this, one can replace
the system (7) with the relay-type system
where
F(u) def= {1, 0 < u ≤ 1; 0, u > 1},   G(u) def= {0, 0 < u ≤ 1; 1, u > 1}.   (13)
Note that the substitution (9) transforms the relay functions (13) into the relay
functions (11); in particular,
Thus, all the properties of the relay system (10) are automatically carried over
to the system (12).
where
(n − m)T0 + ξη ≤ d ≤ (n − m)T0 − ξη, m = 1, . . . , n. (16)
Firstly, let us consider the single relay equation obtained for xj from (10)
when γ = 0:
ẋ = −1 + αR x(t − 1) − βR(x). (17)
The following statement was proved in the article [12].
Lemma 1 ([12]). Let α > β + 1 and σ < β + 1. Then equation (17) with initial
function ϕ1 ∈ S1 for t ∈ [−1 − σ, −σ] admits a unique stable periodic solution
given by the equality
x0(t) def=
  (α − 1)t,             t ∈ [0, 1],
  −t + α,               t ∈ [1, α],
  −(β + 1)(t − α),      t ∈ [α, α + 1],
  (α − β − 1)(t − T0),  t ∈ [α + 1, T0],   (18)

x0(t + T0) ≡ x0(t),   T0 def= α + 1 + (β + 1)/(α − β − 1).   (19)
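The cycle (18), (19) is explicit and easy to check numerically. The following sketch evaluates its T0-periodic extension and verifies continuity at the breakpoints; the parameter values are illustrative, chosen only so that α > β + 1.

```python
def make_x0(alpha, beta):
    """Periodic solution x0(t) of (17) given by (18), with period T0 from (19)."""
    T0 = alpha + 1 + (beta + 1) / (alpha - beta - 1)
    def x0(t):
        t = t % T0                      # T0-periodic extension, x0(t + T0) = x0(t)
        if t <= 1:
            return (alpha - 1) * t      # rising branch
        if t <= alpha:
            return -t + alpha           # slow decay above zero
        if t <= alpha + 1:
            return -(beta + 1) * (t - alpha)   # drop below zero
        return (alpha - beta - 1) * (t - T0)   # return to zero at t = T0
    return x0, T0

x0, T0 = make_x0(alpha=4.0, beta=1.0)   # here T0 = 6
```

The four branches agree at the breakpoints t = 1, α, α + 1 and T0, so x0(t) is continuous, as the piecewise formula requires.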
Secondly, we consider an auxiliary problem.
Lemma 2. For any l ∈ N and τ ∈ [(l − 1)T0 + α + 1, lT0], a solution of the problem
ẋ = −1 + α − β + γ(c − x)H(x0(t)),   x|t=0 = x0(τ)   (20)
By definition, put

x1^(m)(t) def=
  x0(t),                   t ∈ [0, h + d∗],
  y0(α + h, t − d∗ − h),   t ∈ [h + d∗, h + d∗ + α + (m − 1)T0],
  t − T1^(m),              t ∈ [h + d∗ + α + (m − 1)T0, T1^(m)],   (23)

x2^(m)(t) def=
  t − d∗,                  t ∈ [0, d∗],
  x0(t),                   t ∈ [d∗, h],
  y0(h − d∗, t − h),       t ∈ [h, h + α + (N − m − 1)T0],
  t − T2^(m),              t ∈ [h + α + (N − m − 1)T0, T1^(m)],   (24)

where

T1^(m) def= h + d∗ + α + (m − 1)T0 − ξ^m (h + d∗ − (n − m)T0 − η) − η,   (25)

T2^(m) def= h + α + (N − m − 1)T0 − ξ^(N−m) (h − d∗ − mT0 − η) − η,   (26)

d∗ def= [(N − 2m)T0 + ξ^m (h − (N − m)T0 − η) − ξ^(N−m) (h − mT0 − η)] / (2 − ξ^m − ξ^(N−m)).   (27)
Theorem 1. Let β = α − 2, and let γ, h satisfy (22). Then there exists σ > 0 such that
system (10) with initial condition from (21) admits N − 1 periodic modes
colon(x1^(m)(t), x2^(m)(t)), m = 1, . . . , N − 1.
Here x1^(m)(t) and x2^(m)(t) are T1^(m)-periodic functions which have N − m and m
relatively short alternating segments of positivity and negativity, which follow a
sufficiently long segment where the function values are negative.
A possible view of the periodic mode is illustrated in Fig. 1.
The following statement concerns the stability of the solutions from Theorem 1.
The proof scheme is the same as, for example, in [10,12–14]. Let us introduce
some notation for its presentation.
Denote a function of S^(m) by ϕ def= colon(ϕ1, ϕ2), where ϕ1 ∈ S1, ϕ2 ∈ S2^(m).
For an arbitrary function ϕ(t) from (21), denote by x(t) def= colon(x1(t), x2(t))
a solution of (10) such that x1(t) ≡ ϕ1(t), x2(t) ≡ ϕ2(t) when t ∈ [−h − σ, −σ].
Suppose that the equation
x1 (t − σ) = −σ (28)
Two Delay-Coupled Neurons with a Relay Nonlinearity 187
Π:S→S
by the formula
Π(ϕ) def= x(t + T1^(m)),   −h − σ ≤ t ≤ −σ.   (29)
The first step of the proof is the construction of a solution on the segment
[−σ, T1^(m)]. It is possible to show that here the solution is described by (23),
(24). We skip the technical details.
Similarly to T1^(m), denote the root of x2(t − σ) = −σ with number 2m + 1 by
T2^(m). From the construction of the solution, it follows that T1^(m) equals (25)
and T2^(m) is described by (26).
By (22), (25) and (26), the distance between the (2N − 2m − 1)-th and (2N −
2m)-th roots of (28) is greater than the length of the segment on which S^(m) is
defined. Hence, the operator Π is defined on the set S^(m) and transforms it into
itself. Thus, for any m = 1, . . . , n there exists a periodic solution (23), (24) of
the relay system.
From the explicit formulas (23), (24), it follows that all functions from S^(m)
are mapped to a single function. Therefore, Π is a contraction operator. According to
the contraction mapping principle, Π has a unique fixed point in S^(m). Thus, the
periodic solution of (10) with initial condition from S^(m) is unique, and its period
is (25). Moreover, the contraction property of Π means that the stability spectrum
of the periodic solution contains, in addition to μ1 = 1, a multiplier μ2; all
other multipliers equal zero. At the same time, the multiplier μ2 is the multiplier
of the map −d → −d̄, where d̄ is a number such that
x2(T1^(m)(ϕ) − σ) = −d̄ − σ.   (30)
Lemma 3. The solution (23), (24) of (10) has a countable set of zero multipliers,
one unit multiplier μ1 = 1, and the multiplier
μ2 = −1 + ξ^m + ξ^(N−m).   (32)
3 Conclusion
We have proposed and studied a mathematical model of a pair of synaptically
coupled impulse neurons with a relay nonlinearity and a delay in the coupling
chain. Let us point out the most important results.
The first important feature is that the system (12) is an independent phe-
nomenological model of two synaptically coupled neurons. The presented approach
allows us to consider only the relay system (12), which has a well-defined bio-
logical meaning. This avoids the laborious proof of the correspondence theorems
which one has to prove when the right-hand sides of (12) are continuous and the
parameter λ is large (see, for example, [6,10,12–14]).
Secondly, an analysis of (12) shows that the introduction of a delay in the cou-
pling between oscillators implies new effects which are not typical for systems
without delay. In particular, for any even N we find a mechanism for the occurrence
of N − 1 stable relaxation periodic regimes. The components of the solutions
have a total of N spikes per period. Thus, both the multistability phe-
nomenon and the bursting effect are present.
Finally, thirdly, the set of coexisting attractors of (12) contains not only the solu-
tions described in the present paper. For example, there are antiphase and
impulse-refractive modes, which are not considered here.
The reported study was funded by RFBR according to the research project
18-29-10055.
References
1. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its
application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952)
2. Izhikevich, E.: Neural excitability, spiking and bursting. Int. J. Bifurcat. Chaos
10(6), 1171–1266 (2000). https://doi.org/10.1142/S0218127400000840
3. Rabinovich, M.I., Varona, P., Selverston, A.I., Abarbanel, H.D.I.: Dynamical prin-
ciples in neuroscience. Rev. Mod. Phys. 78, 1213–1265 (2006). https://doi.org/10.1103/RevModPhys.78.1213
4. Kashchenko, S.A., Maiorov, V.V., Myshkin, I.Y.: Wave distribution in simplest
ring neural structures. Matem. Mod. 7(12), 3–18 (1995). http://mi.mathnet.ru/mm1392
5. Kashchenko, S.: Models of Wave Memory. Springer, Switzerland (2015). https://doi.org/10.1007/978-3-319-19866-8
6. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: On a method for mathematical modeling
of chemical synapses. Differ. Equ. 49(10), 1193–1210 (2013). https://doi.org/10.1134/S0012266113100017
7. Somers, D., Kopell, N.: Rapid synchronization through fast threshold modulation.
Biol. Cybern. 68, 393–407 (1993). https://doi.org/10.1007/BF00198772
8. Somers, D., Kopell, N.: Anti-phase solutions in relaxation oscillators coupled
through excitatory interactions. J. Math. Biol. 33, 261–280 (1995). https://doi.org/10.1007/BF00169564
9. Terman, D.: An introduction to dynamical systems and neuronal dynamics. In:
Tutorials in Mathematical Biosciences I: Mathematical Neuroscience, pp. 21–68.
Springer, Berlin (2005). https://doi.org/10.1007/978-3-540-31544-5_2
10. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Modeling the bursting effect in
neuron systems. Math. Notes 93(5), 676–690 (2013). https://doi.org/10.1134/S0001434613050040
11. Chay, T.R., Rinzel, J.: Bursting, beating, and chaos in an excitable mem-
brane model. Biophys. J. 47(3), 357–366 (1985). https://doi.org/10.1016/S0006-3495(85)83926-6
12. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Relaxation self-oscillations in neu-
ron systems: I. Differ. Equ. 47(7), 927–941 (2011). https://doi.org/10.1134/S0012266111070020
13. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Relaxation self-oscillations in neu-
ron systems: II. Differ. Equ. 47(12), 1697–1713 (2011). https://doi.org/10.1134/S0012266111120019
14. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Discrete autowaves in neural systems.
Comput. Math. Math. Phys. 52(5), 702–719 (2012). https://doi.org/10.1134/S0965542512050090
Brain Extracellular Matrix Impact
on Neuronal Firing Reliability
and Spike-Timing Jitter
Abstract. In this work, the role of the brain extracellular matrix (ECM)
in signal processing by a neuronal system is examined. For excitatory
postsynaptic currents in the form of a Poisson signal, we study the changes
of the interspike interval duration, the spike-timing jitter, and the coefficient
of variation in the presence of background noise of varied intensity.
Without ECM impact, a noise-delayed spiking phenomenon reflecting
the worsening of both the reliability and the precision of signal processing is
revealed. It is shown that the ECM-neuron feedback mechanism allows
enhancing the robustness of neuronal firing in the presence of noise.
1 Introduction
Information about any changes in the external environment is transmitted by neu-
ronal systems via changes of their membrane potential activity. Despite the
presence of a huge number of background noise sources, a lot of experimental
data show that repeated identical signals provoke outputs with similar character-
istics [1,2]. This amazing neuronal ability to process signals with high reliability
and precision is still poorly understood and is therefore of particular interest.
Recently, based on experimental observations, a new mathematical model for
neuronal activity in the presence of the ECM was introduced in [3], where the authors
studied the role of ECM-neuron feedback mechanisms in sustaining homeostatic
balance in a neuronal firing network, as well as their possible role in the
implementation of memory function. In this study, within the framework of this model,
we discuss one possible mechanism for neuronal activity regulation that can
enhance the reliability and precision of signal transmission in the presence of
background noise.
2 Mathematical Model
2.1 Postsynaptic Neuronal Dynamics
where x is m(V, t), h(V, t) (which are responsible for the activation and inactivation
of the Na+ current), or n(V, t) (which controls the K+ current activation). The
mean transition rates αx(V), βx(V) and the parameters of the model are taken
as in the classical work of Hodgkin and Huxley [4].
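The gating kinetics referred to above obey the standard first-order equation ẋ = αx(V)(1 − x) − βx(V)x. A minimal sketch for the K+ activation gate n, using one common parameterization of the classical rate functions (the paper itself takes its parameters from [4]):

```python
import numpy as np

# One common parameterization of the classical Hodgkin-Huxley rate
# functions for the n-gate (V in mV); the exact form used in the paper
# is the one from [4].
def alpha_n(V): return 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
def beta_n(V):  return 0.125 * np.exp(-(V + 65) / 80)

def gate_step(x, V, alpha_x, beta_x, dt):
    """One Euler step of dx/dt = alpha_x(V)(1 - x) - beta_x(V) x."""
    return x + dt * (alpha_x(V) * (1 - x) - beta_x(V) * x)

def x_inf(V, alpha_x, beta_x):
    """Steady-state value of a gating variable at fixed V."""
    a, b = alpha_x(V), beta_x(V)
    return a / (a + b)
```

Held at a fixed voltage, the gating variable relaxes exponentially toward x_inf(V) with time constant 1/(αx + βx), which is the behavior the first-order equation encodes.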
where tj is the occurrence time of a pulse with amplitude A in the input signal.
These times follow a Poisson distribution with average time interval τin between
subsequent pulses. The duration of each pulse in the input is assumed to be
constant, τ = 1 ms. For each pulse the amplitude A has a random value
that satisfies the probability distribution
P(A) = (2A/b²) exp(−A²/b²)   (4)
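Distribution (4) is a Rayleigh distribution with scale b/√2, so amplitudes can be drawn by inverse-transform sampling. A small sketch (the value of b here is arbitrary and only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_amplitudes(b, size):
    """Draw pulse amplitudes from P(A) = (2A/b^2) exp(-A^2/b^2), Eq. (4).

    The CDF is 1 - exp(-A^2/b^2), so inverse-transform sampling gives
    A = b * sqrt(-ln(1 - U)) for uniform U in [0, 1)."""
    u = rng.random(size)
    return b * np.sqrt(-np.log1p(-u))

A = sample_amplitudes(b=3.0, size=200_000)
# the analytical mean of (4) is b * sqrt(pi) / 2
```

The empirical mean of a large sample should match the analytical mean b√π/2 of distribution (4).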
with the scaling factor b = b0 (1 + γZb Z), where γZb is the gain parameter that
modifies the amplitude of IEPSCs [3].
The second term of Isyn is white Gaussian noise with zero mean, ⟨ξ(t)⟩ = 0,
and with the correlation function ⟨ξ(t)ξ(t + τG)⟩ = Dδ(τG).
Additionally we assume that Iapp = Idc (1 + γZ Z), where γZ is the feedback
gain parameter that modifies the applied current. Thus, both the currents in the
192 M. A. Rozhnova et al.
input of the neuron (1) depend on the variable Z whose value should be taken
from the following system of equations describing ECM dynamics:
Ż = −(αZ + γP P) Z + βZ [Z0 − (Z0 − Z1)/(1 + exp(−(Q − θZ)/kZ))],
Ṗ = −αP P + βP [P0 − (P0 − P1)/(1 + exp(−(Q − θP)/kP))].   (5)
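The ECM dynamics (5) can be integrated with a simple Euler scheme; a minimal sketch follows, where all parameter values are illustrative placeholders and not the values used in [3]:

```python
import numpy as np

def ecm_step(Z, P, Q, dt, *,
             alphaZ=0.001, betaZ=0.01, gammaP=0.1,
             alphaP=0.01, betaP=0.01,
             Z0=0.0, Z1=1.0, P0=0.0, P1=1.0,
             thetaZ=0.5, kZ=0.1, thetaP=0.5, kP=0.1):
    """One Euler step of the ECM system (5).

    Parameter values are illustrative placeholders, not those of [3]."""
    sigZ = 1.0 / (1.0 + np.exp(-(Q - thetaZ) / kZ))   # sigmoid of activity Q
    sigP = 1.0 / (1.0 + np.exp(-(Q - thetaP) / kP))
    dZ = -(alphaZ + gammaP * P) * Z + betaZ * (Z0 - (Z0 - Z1) * sigZ)
    dP = -alphaP * P + betaP * (P0 - (P0 - P1) * sigP)
    return Z + dt * dZ, P + dt * dP
```

With the average activity Q held high, both Z (ECM molecule concentration) and P (protease) relax to stationary levels, the transition described in the text.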
Fig. 1. (a), (b) EPSCs-Poisson pulse trains for two values of interpulse duration τin =
10 ms and τin = 2 ms, and (c), (d) evoked oscillations of the membrane potential in
the absence of ECM, D = 0, Idc = 5.7 μA/cm2 , b0 = 3.
the spike-timing jitter (the mean square deviation of τid^(i)) as

σ = [ (1/n) Σ_{i=1}^{n} (τid^(i))² − ⟨τid⟩² ]^(1/2)   (8)
and the coefficient of variation β = σ/⟨τid⟩, which illustrates the degree of
coherence of the neuronal output.
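These three statistics can be computed directly from recorded spike times; a small sketch, where the spike times are arbitrary illustrative values:

```python
import numpy as np

def isi_statistics(spike_times):
    """Mean interspike interval, spike-timing jitter per Eq. (8),
    and the coefficient of variation beta = sigma / <tau_id>."""
    tau = np.diff(np.asarray(spike_times, dtype=float))   # interspike intervals
    mean_isi = tau.mean()
    sigma = np.sqrt((tau ** 2).mean() - mean_isi ** 2)    # Eq. (8)
    return mean_isi, sigma, sigma / mean_isi

# arbitrary illustrative spike times (ms)
mean_isi, sigma, cv = isi_statistics([0.0, 10.0, 21.0, 30.0, 41.0])
```

A small cv indicates a nearly regular (coherent) spike train; a cv near 1 is characteristic of Poisson-like firing.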
Fig. 2. The mean of the interspike interval duration, the spike-timing jitter, and the
coefficient of variation as functions of input interpulse duration for three values of the
parameter b0 without ECM for (a) Idc = 5 μA/cm2 , (b) Idc = 7 μA/cm2 and (c)
Idc = 10 μA/cm2
sidered values of Idc, Fig. 3(a) shows that almost all curves have a similar non-
monotonic behavior with a maximum at some value of the noise intensity. Notably,
a small amount of fluctuations impedes the spiking: noise with small intensity D
provokes an increase of the mean interspike interval duration. Such a noise-delayed
spiking phenomenon was observed in [7–13] for the mean latency time. Here, we
demonstrate that this phenomenon also takes place for the interspike intervals:
the neuronal cell sensitivity to noise is particularly high within a certain interval
of noise intensities (where the increase of ⟨τid⟩ is observed). The degree of such
sensitivity to noise also depends on b0 and τin. From Figs. 3(b), (c) it follows
that an increase of b0 as well as a decrease of τin lead to a decrease of noise
sensitivity: the maximum becomes less pronounced. As we can see, noise-delayed
spiking is observed only for large enough values of Idc. For Idc = 7 μA/cm² (blue
curve in Fig. 3(a), upper panel) we observe another dependence. For this param-
eter, in the noise-free case the system spends a lot of time near the resting state,
which leads to the appearance of large values in the τid^(i)-statistics. Fluctuations
drive the system out to the oscillatory mode and lead to a decrease of ⟨τid⟩.
Obviously, a similar behavior can also be observed for any Idc < 7 μA/cm².
Taking the above-mentioned differences into account, we further focus on two
cases (Idc = 7 μA/cm² and Idc = 8.5 μA/cm²) and consider the role of the ECM
in the cell's sensitivity to external fluctuations.
Fig. 3. White Gaussian noise-induced changes: the mean of the interspike interval
duration, the spike-timing jitter, and the coefficient of variation as functions of noise
intensity D (a) for four values of Idc, τin = 4 ms, b0 = 1; (b) for three values of
b0, Idc = 8.5 μA/cm², τin = 4 ms; (c) for three values of the input interpulse duration
τin, Idc = 8.5 μA/cm², b0 = 1.
with oscillations) and monostable modes can be observed for various parame-
ters. In this study, the average activity variable Q is assumed to change
in time in accordance with (6). The parameters of the ECM model provide the tran-
sition to some stationary level of the concentration of ECM molecules Z as a result
of a high level of averaged neuronal activity Q. Taking into account the gain of
IEPSCs (Fig. 4(a)) and Iapp (Fig. 4(b)) due to the establishment of the Z-level leads
to an increase of the system's reliability: ECM-induced elimination of the noise-
delayed spiking effect is observed.
Fig. 4. ECM-induced changes: the mean of the interspike interval duration, the spike-
timing jitter, and the coefficient of variation as functions of noise intensity D for two
values of Idc (Idc = 7 μA/cm2 (blue curves) and Idc = 8.5 μA/cm2 (green curves)) for
(a) γZb = 0.3, (b) γZ = 0.1, τin = 4 ms, b0 = 1.
5 Conclusions
Neuronal firing activity was studied within the framework of the Hodgkin-Huxley
model driven by synaptic currents that account for the existence of background
noise and the impact of the ECM, whose molecule concentration can be modified
via the feedback mechanism of neuron-ECM interaction. In the absence of the ECM,
the phenomenon of noise-delayed spiking is observed: both the reliability and the
precision of signal transmission in the presence of noise become worse. Introducing
ECM impact into the model eliminates this negative noise-induced effect, allowing
more reliable and precise signal processing by the neuronal system.
Acknowledgments. The work was supported by the Ministry of Education and
Science of Russia (Project No. 14.Y26.31.0022).
References
1. Rodriguez-Molina, V.M., Aertsen, A., Heck, D.H.: Spike timing and reliability
in cortical pyramidal neurons: effects of EPSC kinetics, input synchronization and
background noise on spike timing. PLoS ONE 2(3), e319 (2007). https://doi.org/10.1371/journal.pone.0000319
2. Tiesinga, P., Fellous, J.-M., Sejnowski, T.J.: Regulation of spike timing in visual
cortical circuits. Nat. Rev. Neurosci. 9(2), 97–107 (2008). https://doi.org/10.1038/nrn2315
3. Kazantsev, V., Gordleeva, S., Stasenko, S., Dityatev, A.: A homeostatic model of
neuronal firing governed by feedback signals from the extracellular matrix. PLoS
ONE 7(7), e41646 (2012). https://doi.org/10.1371/journal.pone.0041646
4. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and
its application to conduction and excitation in nerve. J. Physiol. 117, 500–544
(1952)
5. Lee, S.-G., Neiman, A., Kim, S.: Coherence resonance in a Hodgkin-Huxley neuron.
Phys. Rev. E 57(3), 3292–3297 (1998). https://doi.org/10.1103/PhysRevE.57.3292
6. Parmananda, P., Mena, C.H., Baier, G.: Resonant forcing of a silent Hodgkin-
Huxley neuron. Phys. Rev. E 66, 047202 (2002). https://doi.org/10.1103/PhysRevE.66.047202
7. Pankratova, E.V., Polovinkin, A.V., Mosekilde, E.: Resonant activation in a
stochastic Hodgkin-Huxley model: interplay between noise and suprathreshold
driving effects. Eur. Phys. J. B 45(3), 391–397 (2005). https://doi.org/10.1140/epjb/e2005-00187-2
8. Gordeeva, A.V., Pankratov, A.L.: Minimization of timing errors in reproduction
of single flux quantum pulses. Appl. Phys. Lett. 88, 022505 (2006)
9. Pankratova, E.V., Belykh, V.N., Mosekilde, E.: Role of the driving frequency in
a randomly perturbed Hodgkin-Huxley neuron with suprathreshold forcing. Eur.
Phys. J. B 53(4), 529–536 (2006). https://doi.org/10.1140/epjb/e2006-00401-9
10. Ozer, M., Graham, L.J.: Impact of network activity on noise delayed spiking for
a Hodgkin-Huxley model. Eur. Phys. J. B 61, 499–503 (2008). https://doi.org/10.1140/epjb/e2008-00095-y
11. Gordeeva, A.V., Pankratov, A.L., Spagnolo, B.: Noise induced phenomena in point
Josephson junctions. Int. J. Bifurcat. Chaos 18, 2825–2831 (2008)
12. Uzuntarla, M., Ozer, M., Ileri, U., Calim, A., Torres, J.J.: Effects of dynamic
synapses on noise-delayed response latency of a single neuron. Phys. Rev. E 92(6),
062710 (2015). https://doi.org/10.1103/PhysRevE.92.062710
13. Uzuntarla, M.: Inverse stochastic resonance induced by synaptic background activ-
ity with unreliable synapses. Phys. Lett. A 377(38), 2585–2589 (2013). https://doi.org/10.1016/j.physleta.2013.08.009
14. Lazarevich, I.A., Stasenko, S.V., Rozhnova, M.A., Pankratova, E.V., Dityatev,
A.E., Kazantsev, V.B.: Dynamics of the brain extracellular matrix governed by
interactions with neural cells. arXiv:1807.05740
Contribution of the Dorsal and Ventral Visual
Streams to the Control of Grasping
Irina A. Smirnitskaya
Abstract. Since Ungerleider and Mishkin's 1982 paper on the different
roles of the dorsal and ventral visual streams, the former as "where" and the latter as
"what", there has been no consensus on what these pathways really do and whether
they really exist. In this review, the contribution of parietal, premotor and prefrontal
cortical regions to the control of grasping is discussed in the context of the existence
of two visual streams. There is evidence that each of the two streams consists of
two subdivisions. The roles of these subdivisions in the control of grasping, such as
memorizing the features of the object to be grasped, calculating the value of
the object for grasping, controlling the movement's precision, retaining
the movement's goal in working memory, and so on, are analyzed. The com-
plementarity of the dorsal and ventral regions of the visual pathways in motion
control is shown. A separate problem is the coherency of the execution of all
these tasks: each of the pathways performs its part by interchanging signals and
ensuring coordinated execution of the work.
1 Introduction
In 1982, the article by Ungerleider and Mishkin [1] introduced the "space versus object"
principle in the interpretation of the functions of different visual areas during perception.
The authors discovered that the processing of visual information starting in visual areas V1
and V2 divides into two streams: the first, dorsal stream goes to the posterior parietal regions
through visual areas V5 and V6; the second, ventral stream proceeds to the temporal
lobe through area V4. The dorsal pathway is responsible for space perception, and the
ventral pathway is related to object perception. The authors called them the "Where" and
"What" systems. The results were obtained in monkeys, but the division of the information
flow holds for humans too [2].
Let us take the well-studied process of grasping as an example of manipulative
actions to determine the roles of the different visual streams and their interconnections.
Patients with posterior parietal cortex lesions can omit some operations of grasping [3].
A patient with optic ataxia has difficulty in directing his arm towards an object
to be grasped: he can see the object and tell its location, but fails to get hold of it at
once, finding it as if by chance.
A patient suffering from neglect has another type of malfunction: he cannot see an object at
all, but retains implicit perception [4]. The difference is that in the first case the lesion is
in the superior parietal lobule and in the second, the lesion is centered in the inferior
parietal lobule.
Both the superior parietal lobule (Brodmann area 5, SPL) and inferior parietal
lobule (Brodmann area 7, IPL) belong to the dorsal visual pathway and are the superior
and inferior parts of the intraparietal sulcus (IPS) (see Fig. 1). The SPL receives a
visual signal from the visual area V5 and sends the output signal to the dorsal premotor
area, which is responsible for directing the hand and the eyes towards the object.
The IPL receives a signal from the visual area V6, its motor-region destination being
the ventral premotor area which controls the grasp motions of the hand and fingers.
The authors of paper [3] proposed a model of visual information processing in
dorsal visual stream that highlights two parts in it: the dorso-dorsal stream that goes
through the SPL to the dorsal premotor areas and the dorso-ventral stream that runs
through the IPL to the ventral premotor areas.
parietal regions and proceeds to the motor, premotor and prefrontal areas of the neo-
cortex; the ventral stream runs to the inferior temporal areas and finally to the ven-
trolateral prefrontal cortex [2], which is considered the destination of the ventral
pathway. A detailed inspection of the pathways from inferotemporal area TE to prefrontal,
orbitofrontal and medial temporal regions points to the engagement of TE with the
network related to behavioral choice [6], determined by the values of objects and
possible actions.
The ventral visual pathway decides whether to respond to the input stimulus or not.
This means that it solves two problems: (a) it determines the value of the stimulus and
(b) it memorizes the stimulus's sensory representation.
To cope with the first problem, the interpretation of the visual signal is made in the
temporal and prefrontal areas. As a result, the object value is computed and the
behavioral choice is made. For this purpose, the inferior temporal area TE interchanges
signals with the amygdala, the orbitofrontal cortex and the hippocampal formation. In turn,
the amygdala, orbitofrontal and insular cortical areas are interconnected [7, 8] and jointly
calculate the value of objects [9]. The destination of the ventral pathway is the ven-
trolateral prefrontal cortex holding the response pattern.
Fig. 1. Two ways of interpretation of the visual signal: the ventral way (bottom part of the
figure) and the dorsal way (top part of the figure). The dorsal pathway divides into two ways:
the dorso-dorsal and the dorso-ventral way. V1–V6 are the occipital visual areas, TEO and TE
stand for inferior temporal areas, PMd and PMv are the dorsal and ventral premotor regions.
Areas 46d and 46v are the dorsolateral and ventrolateral prefrontal cortical regions.
The second problem is memorization, and it is solved by the network consisting of
the inferior temporal area, hippocampus, and the perirhinal, postrhinal and entorhinal
cortical areas.
The dorsal visual stream is the system that controls the action: it is responsible for
reaching the object with the arm and grasping it with the fingers.
The visual and somatosensory features of objects are represented in the parietal cortex.
It consists of the primary somatosensory region S1 and higher-order areas that store a
combined visual and somatosensory representation of the object. These representations
are transmitted to the motor and premotor areas [5] to perform the action.
The somatosensory information that comes to the parietal cortex from the thalamus
is of two types: tactile and proprioceptive. The tactile information arrives from cuta-
neous mechanoreceptors embedded in the skin, which convert the mechanical defor-
mation of the skin into neural signals. The proprioceptive information comes from deep
receptors reporting the degree of compression and stretching of muscles, ten-
dons, ligaments and joints. For a motion that has already started, both types of
information are feedback signals. That is, with respect to the somatosensory signal, the
visual signal is a primary signal that triggers the motion. As the action lasts, the motion
is corrected; the tactile characteristics of the object, such as its form, texture and weight,
are analyzed and memorized; the patterns of the joint activity of hand and finger muscles
that secure proper grasp motions are also stored. For these purposes, all areas partici-
pating in the initiation and execution of the motion (both primary and higher-order areas)
send feedforward and feedback projections.
Four subareas can be distinguished in the primary somatosensory area S1. These are
called Brodmann's areas 1, 2, 3a and 3b. Area 3b is the primary area for tactile
reception, and area 3a is the primary proprioceptive area. Area 1 is secondary for tactile
reception: its removal turns off texture recognition. Area 2 has equal amounts of
tactile and proprioceptive secondary inputs; it deals with the coordination of fingers in
grasping and with recognition of the form and size of objects being grasped.
The higher-order parietal areas form two clusters: the lateral parietal areas and the
posterior parietal areas.
In the previous paragraph, it was pointed out that the ventral visual pathway
determines the value of the object and the value of the manipulations with the object,
while the dorsal pathway is responsible for the arrangement of the action. An exami-
nation of the pathways for sensory information in the parietal cortex (the dorsal
pathway) shows that the somatosensory characteristics of the object, received during
manipulation with it, arrive at the secondary somatosensory area S2 (the cluster of
lateral parietal areas), and this area sends signals to the insular cortex [10], which is
part of the network storing values of objects and interacting with the ventral stream. We
see the joint activity of the dorsal and ventral pathways here.
The posterior parietal areas serve as the beginning of the dorso-dorsal (SPL) and dorso-
ventral (IPL) pathways [3].
The dorso-ventral pathway starts in the inferior parietal lobule (Brodmann's area 7),
which sends signals to the motor area M1 and the ventral premotor area PMv. As a result
of interaction with S1, M1 and PMv, a distributed representation of the sensory signals
initiating the grasping is formed in the posterior parietal areas: the visual object to be
grasped, the direction towards the object, and the handling characteristics of the object
(form, size, weight and texture) found by referring to the previously investigated
and accumulated information [11]. Additionally, these areas interchange information
with the inferior temporal area TEa/m, which is part of the ventral pathway: this area
sends a permission to act to the posterior parietal areas. That is, the interaction of the
dorsal and ventral pathways also occurs in this place.
The dorso-dorsal pathway, originating in the superior parietal lobule (Brodmann's
area 5), sends signals to the dorsal premotor area PMd and is responsible for
directing the eyes and the arm towards the object. Though it does not interchange
signals with the inferior temporal areas and does not receive signals about the value of the
object from them, the end point in the prefrontal cortex that receives the signal from the
dorso-dorsal pathway is the dorsolateral area 46d, which is considered to be the center
of working memory. So, the dorso-dorsal pathway holds the holistic representation
of the current motor task.
Giving a formalized treatment of trial-and-error learning, the classical textbook
Reinforcement Learning by Sutton and Barto [12] begins with the description of a
gambling machine which has n arms with different winning probabilities (the multi-armed
bandit). The game with this stateless device comes down to the repetition of the same
event: each time we come to the machine as though anew, pull an arm and hope for a win.
Only our memory keeps the different outcomes, appending each one-time outcome to the
sequence of previous results. Having used this static example to introduce the concepts
of the value function and the prediction problem, the authors quickly turn to the main
objective: a sequence of actions where each step can be different and where the desire
to get the greatest reward necessitates optimizing the whole sequence.
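The bandit setting described above can be sketched in a few lines of Python. This is a hypothetical illustration (the arm probabilities, the ε value and the function names are our own example, not taken from [12]); it uses the incremental sample-average update of the value estimate discussed in the textbook.

```python
import random

def run_bandit(probs, steps=10000, eps=0.1, seed=0):
    """Simulate an epsilon-greedy player on a multi-armed bandit.

    probs -- true (unknown to the player) win probability of each arm.
    Returns the estimated value of each arm and the pull counts.
    """
    rng = random.Random(seed)
    n = len(probs)
    values = [0.0] * n   # running estimate of each arm's value
    counts = [0] * n     # how many times each arm was pulled
    for _ in range(steps):
        # explore with probability eps, otherwise exploit the best estimate
        if rng.random() < eps:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: values[a])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental sample-average update: V <- V + (r - V) / N
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

est, pulls = run_bandit([0.2, 0.5, 0.8])
```

With enough pulls, the estimate of the best arm converges to its true winning probability, which is exactly the prediction problem in this static setting.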
If the transformation of external sensory (visual) signals into motor commands is
regarded as either a discrete action or succession of actions, the grasping is a discrete
act, while reaching by the arm followed by grasping an object by the fingers is a
sequence of actions. As regards the necessary calculation of the action value, the
grasping value is equal to the value of the object to be grasped. The value of the action
sequence consisting of outstretching of the hand and grasping of the object is also equal
to the value of the object. However, there are many exceptions, e.g. when experi-
menters put different obstacles in the course of the hand, the values of action sequences
vary.
The ventral visual stream calculates the value of actions (in the discrete case, it is
equal to the value of the object being manipulated), and for an action sequence the
value can be found differently. It is important that the ventral visual stream engages the
hippocampus which is responsible for remembering new objects and corresponding
action sequences. In dealing with sequences the working memory plays an important
202 I. A. Smirnitskaya
role. The dorsolateral prefrontal cortex (area 46d) is regarded as a substratum of this
kind of memory. This cortical area interacts with the hippocampus.
It is widely accepted that the ventral stream endpoint is the ventrolateral prefrontal
cortex. It is true for discrete actions which, as mentioned above, are governed by the
dorso-ventral stream.
The dorso-dorsal stream, whose endpoint is the dorsolateral cortex, is responsible for
the representation of sequences of actions, in other terms, the holistic representation
of the motor task.
[Figure 2 appears here: a block diagram of the prefrontal cortex (DLPFC, VLPFC, OFC, insula), premotor areas (PMd, PMv), parietal cortex (SPL, IPL), visual areas (V1, V2, V4), inferotemporal areas (TE, TEa/m) and the hippocampus with entorhinal cortex, linked by the dorso-dorsal, dorso-ventral and ventral streams; the ventral stream performs the value calculation for the object.]
Fig. 2. The dorsal and ventral visual streams and their subsystems
6 Conclusion
There are four visual pathways (Fig. 2): the dorso-dorsal and dorso-ventral pathways,
which belong to the dorsal stream, and, within the ventral stream, the pathway from
the visual areas to the inferior temporal areas TE, which splits to go to the
orbitofrontal cortex and to the hippocampal areas. Though these pathways execute their
own tasks, they interact and function conjointly.
Acknowledgement. The review was done within the 2019 state task 0065-2019-0003 "Research
into Neuromorphic Big-Data Processing Systems and Technologies of Their Creation".
References
1. Ungerleider, L.G., Mishkin, M.: Two cortical visual systems. In: Ingle, D.J., Goodale, M.A.,
Mansfield, R.J.W. (eds.) Analysis of Visual Behavior, pp. 549–586. MIT Press, Cambridge
(1982)
2. Kravitz, D.J., Saleem, K.S., Baker, C.I., Ungerleider, L.G., Mishkin, M.: The
ventral visual pathway: an expanded neural framework for the processing of object quality.
Trends Cogn. Sci. 17(1), 26–49 (2013)
3. Rizzolatti, G., Matelli, M.: Two different streams form the dorsal visual system: anatomy and
functions. Exp. Brain Res. 153, 146–157 (2003)
4. Rizzolatti, G., Berti, A., Gallese, V.: Spatial neglect: neurophysiological bases, cortical
circuits and theories. In: Boller, F., Grafman, J., Rizzolatti, G. (eds.) Handbook of
neuropsychology 2nd edn, vol. I, pp 503–537. Elsevier Science, Amsterdam (2000)
5. Delhaye, B.P., Long, K.H., Bensmaia, S.J.: Neural basis of touch and proprioception in
primate cortex. Compr. Physiol. 8(4), 1575–1602 (2019)
6. Murray, E.A., Rudebeck, P.H.: The drive to strive: goal generation based on current needs.
Front. Neurosci. 7, Article 112 (2013)
7. Höistad, M., Barbas, H.: Sequence of information processing for emotions through
pathways linking temporal and insular cortices with the amygdala. Neuroimage 40(3), 1016–
1033 (2008)
8. Ghashghaei, H.T., Hilgetag, C.C., Barbas, H.: Sequence of information processing for
emotions based on the anatomic dialogue between prefrontal cortex and amygdala.
Neuroimage 34(3), 905–923 (2007)
9. Smirnitskaya, I.A.: How the cingulate cortex, basolateral amygdala and hippocampus
contribute to retraining. In: Proceedings of the XV All-Russia Conference Neuroinformatics (2013)
10. Friedman, D.P., Murray, E.A., O’Neill, J.B., Mishkin, M.: Cortical connections of the
somatosensory fields of the lateral sulcus of macaques: evidence for a corticolimbic pathway
for touch. J. Comp. Neurol. 252, 323–347 (1986)
11. Borra, E., Gerbella, M., Rozzi, S., Luppino, G.: The macaque lateral grasping network: a
neural substrate for generating purposeful hand actions. Neurosci. Biobehav. Rev. 75, 65–90
(2017)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press,
Cambridge (2018)
Deep Learning
The Simple Approach to Multi-label Image
Classification Using Transfer Learning
Yuriy S. Fedorenko
Abstract. The article deals with the problem of image classification on a rel-
atively small dataset. Training a deep convolutional neural network from scratch
requires a large amount of data. In many cases, the solution to this problem is to
take a network pretrained on another big dataset (e.g. ImageNet) and fine-tune
it on the available data. In the article, we apply this approach to classify advertising
banner images. Initially, we reset the weights of the last layer and change its
size to match the number of classes in our dataset. Then we train the whole network,
but the learning rate for the last layer is several times higher than for the other
layers. We use the Adam optimization algorithm with some modifications. Firstly, applying
weight decay instead of L2 regularization (for Adam they are not the same)
improves the result. Secondly, dividing the learning rate by the maximum of the
accumulated sums of squared gradients, instead of the current sum, makes the training
process more stable. Experiments have shown that this approach is appropriate
for classifying relatively small datasets. The metrics used and test-time augmentation
are also discussed. In particular, we find the confusion matrix very useful because it
gives an understanding of how to modify the training set to increase model quality.
1 Introduction
Deep convolutional neural networks are very effective at solving the image classification
task. However, training such networks from scratch (with random initialization) is not
always possible because it requires a large amount of data. Therefore, transfer learning
has become common in many applied tasks [1]. Deep learning frameworks already
provide common convolutional neural networks (VGG [2], ResNet [3], Inception [4])
pretrained on ImageNet, so there is no need to train models on this dataset yourself.
But in practice there are several issues that need to be solved. The first problem is
connected with proper learning rate selection. Too small a value may result in a very long
training process which stalls in a flat valley. Too large a value may lead to learning a sub-
optimal set of weights. Besides, the learning rate on the last layers of the network
should be greater than on the first layers, because the earlier layers of the network learn
fairly generic features that may be useful in many tasks. The second problem is
connected with an unstable training process when using the Adam algorithm.
2 Problem Definition
In this article, we consider the classification of advertising banner images. The user's
interest in a banner depends on the banner image, so it's important to determine the
banner image topic. The banner image is fed to the input of the model. The model
output is one or several classes of the image in our specialized taxonomy. But there are
several problems. Firstly, the number of labeled images is relatively small: it is mea-
sured in hundreds, not thousands, of samples. This amount of data is not enough to train
the model from scratch. Secondly, images of advertising banners are quite specific,
so we can't use a model pretrained on ImageNet directly. And thirdly, each image can
belong to several classes. For example, it may be an advertisement for a mobile application
to call a taxi. In such a case, the model should detect two classes: mobile app and taxi.
To deal with the first two problems we use transfer learning. We take a pretrained neural
network, reset the weights of the last layer and change its size to match the number of
classes in our taxonomy. We train the whole network, but for the last layer the learning rate is
five times higher than for the other layers. Also, we use an adaptive learning rate [5].
Initially, the upper limit of the learning rate is found: we increase the
learning rate step by step from a small value and train the neural net at each step. The
whole procedure takes only about 10–20 epochs, so classical overfitting after
multiple passes through the training set does not have time to happen. The minimum
learning rate at which the validation set error starts to increase is the required upper
limit. An example is presented in Fig. 1: after each epoch, the learning rate was
increased by one step (0.0001), and the loss on the training and validation sets was
marked on the graph.
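The upper-limit search just described can be sketched as the following loop. `toy_train_and_eval` is a stand-in quadratic problem (our own illustration, not the actual banner classifier): gradient descent on f(w) = w² plays the role of "train for some epochs and return the validation loss", and too large a step makes that loss grow.

```python
def lr_upper_limit(lrs, train_and_eval):
    """Return the minimum learning rate at which the validation loss
    starts to increase relative to the previous step -- the required
    upper limit for the schedule."""
    prev_loss = None
    for lr in lrs:
        loss = train_and_eval(lr)
        if prev_loss is not None and loss > prev_loss:
            return lr
        prev_loss = loss
    return lrs[-1]

def toy_train_and_eval(lr, steps=20):
    """Stand-in for 'train, then return validation loss':
    gradient descent on f(w) = w^2; a large step overshoots."""
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w^2 is 2w
    return w * w

limit = lr_upper_limit([i / 10 for i in range(1, 11)], toy_train_and_eval)
```

On this toy problem the loss keeps shrinking up to lr = 0.5 and starts growing afterwards, so the loop reports 0.6 as the first rate where the loss increased.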
We start training with a learning rate of 1/10 of the upper limit value. The learning
rate then gradually increases to the upper limit, after which it decreases back
(Fig. 2). This method, called the one-cycle policy, has a simple motivation. At the start,
the small learning rate provides more accurate convergence. Then, while the optimizer
traverses a flat valley, increasing the learning rate allows training to speed up. In the
final stages, the optimizer falls into a local minimum, and the learning rate is again
reduced to provide more accuracy. Besides, it is argued that a relatively high learning
rate in the middle of the training process is a form of regularization, because it helps the
network avoid steep areas of the loss function which correspond to overfitted con-
figurations [6].
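A minimal sketch of such a schedule is given below. This is a simplified piecewise-linear variant (the policy in [5, 6] also has cosine variants and a final annihilation phase); the function name and `div_factor` are our own notation.

```python
def one_cycle_lr(step, total_steps, max_lr, div_factor=10):
    """Piecewise-linear one-cycle schedule: start at max_lr/div_factor,
    ramp up to max_lr at mid-training, then anneal back down."""
    base_lr = max_lr / div_factor
    half = total_steps / 2.0
    if step <= half:
        return base_lr + (max_lr - base_lr) * (step / half)
    return max_lr - (max_lr - base_lr) * ((step - half) / half)

# learning rate at every step of a 100-step run with upper limit 0.01
schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.01) for s in range(101)]
```

The schedule starts and ends at 1/10 of the upper limit and peaks exactly in the middle of training, matching the shape in Fig. 2.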
For training, we use the Adam algorithm with modifications. Many researchers became
disappointed in Adam after its introduction in 2014, claiming that SGD with momentum
performs better. But in 2017 the AdamW algorithm was proposed in [7]. It uses weight
decay instead of L2 regularization. As is known, L2 regularization implies adding the
sum of squared model weights to the loss function:

$$J_r = J + \frac{\gamma}{2} \sum_{k=1}^{n} \omega_k^2,$$

where $J$ is the loss function, $J_r$ is the loss function with the regularization term,
$\gamma$ is the regularization coefficient, and $\omega_k$ are the weights of the neural
net. For plain SGD this leads to weight decay, because the update rule is as follows:
$$\omega_k = \omega_k - \alpha \frac{\partial J}{\partial \omega_k} - \alpha \gamma \omega_k,$$

where $\alpha$ is the learning rate. But for more sophisticated optimizers such as Adam, this
is not true, because the regularization term in the loss function affects the values of the
accumulated gradients and squared gradients. So, Adam with L2 regularization and Adam
with weight decay (AdamW) are two different approaches. In [7] the authors argue that we
should use AdamW instead of the Adam with L2 regularization implemented in classic
deep learning frameworks. Our experiments show that AdamW leads to a better result,
so we have used it.
One more modification is the Amsgrad technique. In the article [8] an error was found
in the Adam update rule: it could cause the algorithm to converge to a suboptimal point.
The problem is that the convergence proof of Adam requires that the step size
$\alpha/\sqrt{E[g^2]+\epsilon}$ does not increase over the training process. This is
violated in many cases, because the exponential moving average of squared gradients
$E[g^2]$ may decrease in the last epochs of training. So, the authors of Amsgrad
suggested dividing by the maximum value of this quantity accumulated so far, because it
is guaranteed to be non-decreasing. In practice, the effect of this modification is
controversial, but in our experiments Amsgrad gives better and more stable results
compared to plain Adam. So, we use Adam with weight decay and the Amsgrad technique.
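The two modifications can be summarized in a single-parameter sketch of one optimizer step (a toy illustration of the update rules, not the framework implementation; the function and state names are our own):

```python
import math

def adamw_amsgrad_step(w, grad, state, lr=1e-3, betas=(0.9, 0.999),
                       eps=1e-8, weight_decay=1e-2):
    """One optimizer step for a single scalar weight, combining
    decoupled weight decay (AdamW, [7]) with the Amsgrad maximum
    over squared-gradient averages ([8])."""
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    # bias correction as in the original Adam
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    # Amsgrad: divide by the maximum v_hat seen so far,
    # so the effective step size never grows
    state["v_max"] = max(state["v_max"], v_hat)
    w = w - lr * m_hat / (math.sqrt(state["v_max"]) + eps)
    # decoupled weight decay: applied to the weight directly, not added
    # to the gradient -- this distinguishes AdamW from Adam with L2
    w = w - lr * weight_decay * w
    return w

state = {"t": 0, "m": 0.0, "v": 0.0, "v_max": 0.0}
w = 1.0
for _ in range(100):
    w = adamw_amsgrad_step(w, grad=2 * w, state=state)  # minimize w^2
```

Note that the decay term never enters `m` or `v`, so the accumulated gradient statistics stay independent of the regularization strength.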
As mentioned above, each sample may belong to multiple classes. In such a case,
the sample is passed to the model several times, once with each label. This handles
multi-label images in a simple way. Also, we use data augmentation during
training to improve network generalization.
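The per-label replication can be expressed as a one-line dataset transformation (the file names below are hypothetical placeholders for banner images):

```python
def expand_multilabel(samples):
    """Turn (image, [labels]) pairs into single-label (image, label)
    pairs, so a multi-label sample is passed to the model once per label."""
    return [(img, lab) for img, labels in samples for lab in labels]

pairs = expand_multilabel([("banner_1.jpg", ["mobile_app", "taxi"]),
                           ("banner_2.jpg", ["auto"])])
```

A banner advertising a taxi mobile application thus contributes one training sample for each of its two classes.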
4 Experiments
Fig. 4. Precision-recall graphs for “Auto” category (the top chart without TTA, the bottom chart
with TTA by 10 samples)
So, we can see that using test time augmentation slightly improves the result.
Examples of correct and wrong image classification are presented in Fig. 5.
5 Conclusion
Thus, concrete image classification tasks can be solved by transfer learning. It
solves the problem of a relatively small dataset and eliminates the need for a com-
putationally time-consuming procedure of training a model from scratch. Using the
Adam optimization algorithm with its recent modifications, along with proper learning
rate selection, improves the training process and makes it more stable. Also, dataset
preparation is crucial. Analyzing the confusion matrix and viewing misclassified
samples gives an understanding of how to modify the training dataset. Several iterations
of dataset enhancement usually yield an acceptable practical result.
References
1. Karpathy, A.: Convolutional neural networks for visual recognition. https://cs231n.github.io/
transfer-learning/. Accessed 1 Apr 2019
2. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition (2015). arXiv preprint, arXiv:1409.1556v6 [cs.CV]
3. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE
Conference on Computer Vision and Pattern Recognition, CVPR, pp. 770–778. IEEE, New
Jersey (2016)
4. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception
architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern
Recognition, CVPR, pp. 2818–2826. IEEE, New Jersey (2016)
5. Smith, L.: Cyclical learning rates for training neural networks. In: IEEE Winter Conference on
Applications of Computer Vision, WACV, pp. 464–472. IEEE, New Jersey (2017)
6. Gupta, A.: Super-convergence: very fast training of neural networks using large learning rates.
https://towardsdatascience.com/https-medium-com-super-convergence-very-fast-training-of-
neural-networks-using-large-learning-rates-decb689b9eb0. Accessed 10 Apr 2019
7. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization (2019). arXiv preprint,
arXiv:1711.05101v3 [cs.LG]
8. Reddi, S., Kale, S., Kumar, S.: On the convergence of adam and beyond. In: International
Conference on Learning Representations, ICLR, Vancouver, BC, Canada, pp. 186–208
(2018)
9. Ayhan, M., Berens, P.: Test-time data augmentation for estimation of heteroscedastic aleatoric
uncertainty in deep neural networks. In: Medical Imaging with Deep Learning Conference,
MIDL, Amsterdam, Netherlands, pp. 278–286 (2018)
Application of Deep Neural Network
for the Vision System of Mobile Service Robot
1 Introduction
With the development of deep neural networks used for the classification, segmentation
and detection of objects, the area of their application is also growing [1].
The use of neural networks to increase the level of autonomy of vehicles is a
popular and urgent task. Neural network methods are also often used to improve the
accuracy of orientation of mobile robots in the environment [2]. In general, the task of
object detection in video images is extremely promising in robotics: its solution allows
scout robots to increase their level of autonomy when searching for objects of interest,
which is important when working in extreme conditions. It will also be useful to apply
these technologies in the service robotics industry to create more intelligent systems
capable of finding certain items.
The main limitation for the implementation of neural network algorithms is the high
demand on computing hardware. This problem is being actively addressed by the
community, and at the moment there is a set of methods that provide improved speed. It
is relevant to compare and integrate these methods in real robotics tasks.
Single-stage neural network detectors have the highest speed: in them, hypotheses
about the locations of objects and the probabilities of their belonging to certain classes
are produced simultaneously by a convolutional neural network. Such neural networks are
YOLO [3] and SqueezeDet [4]. The principle of operation is to extract multidimensional
feature maps from the image and use them to train one (or more) layers whose output
is a tensor containing the estimated coordinates of objects and the indices of their
classes. In the case of SqueezeDet, feature maps are retrieved using the high-performance
SqueezeNet neural network [5]. Coordinates are predicted relative to a specified
sampling grid and object templates, the anchors. The templates are deformed and
shifted relative to the sampling grid, and each is assigned a confidence value, according
to which they are then filtered using non-maximum suppression.
Neural networks SqueezeDet and YOLO have the same principle of operation, but
the architecture of SqueezeDet was created more specifically to be embedded in low-
power platforms, which causes differences in the structure and performance of these
neural networks.
Consider the layers responsible for object detection using the input feature maps.
In SqueezeDet, the detection layer is a convolutional layer called ConvDet; for sim-
plicity, we denote the block responsible for detection in YOLO as FcDet, since it
consists of two fully connected layers.
Assume that the input feature map width is Wf, its height is Hf, and the number of
input channels is Cf. Denote ConvDet's filter width as Fw and its height as Fh. With
proper striding, the output of ConvDet keeps the initial size of the input feature map.
Thus, to compute K(4 + 1 + C) outputs for each reference grid position, ConvDet requires
Fw·Fh·Cf·K(5 + C) parameters (Fig. 1).
Using the same notation and designating the number of neurons in the first layer of
the FcDet block as Ffc1, it can be determined that the number of parameters in the first
fully connected layer is Wf·Hf·Cf·Ffc1. The second fully connected layer, which
generates C class probabilities and K(4 + 1) bounding box coordinates for the Wo×Ho
sampling grid, contains Ffc1·Wo·Ho·(5K + C) parameters (Fig. 2). The total number of
parameters in these two fully connected layers is Ffc1(Wf·Hf·Cf + Wo·Ho·(5K + C)).
216 N. Filatov et al.
In YOLO, a 7×7×1024 tensor is taken as the input feature map, Ffc1 = 4096,
K = 2, C = 20, Wo = Ho = 7. Thus, the total number of parameters required for the two
fully connected layers is approximately 212·10^6. If the same configuration
parameters are used for a 3×3 ConvDet, it would only require 3·3·1024·2·25 ≈
0.46·10^6 parameters, which is about 460 times smaller than FcDet.
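The figures above are easy to verify with the formulas just derived:

```python
def fcdet_params(Wf, Hf, Cf, Ffc1, Wo, Ho, K, C):
    """Two fully connected detection layers (YOLO's FcDet):
    Ffc1 * (Wf*Hf*Cf + Wo*Ho*(5K + C))."""
    return Ffc1 * (Wf * Hf * Cf + Wo * Ho * (5 * K + C))

def convdet_params(Fw, Fh, Cf, K, C):
    """One convolutional detection layer (SqueezeDet's ConvDet):
    Fw*Fh*Cf*K*(5 + C)."""
    return Fw * Fh * Cf * K * (5 + C)

fc = fcdet_params(Wf=7, Hf=7, Cf=1024, Ffc1=4096, Wo=7, Ho=7, K=2, C=20)
conv = convdet_params(Fw=3, Fh=3, Cf=1024, K=2, C=20)
# fc is about 212 million parameters, conv is 460,800 -- roughly a
# 460x reduction, as stated in the text
```
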
A small number of parameters certainly allows a neural network to require less
memory and provides higher speed. However, due to the different computational
complexity of the layers, the speed of an architecture is not directly proportional
to its size; therefore, it is important to check the speed of operation of the
studied architectures on identical hardware. For the YOLO neural network detector,
there is a lightweight version, tiny-YOLO; the sizes of the SqueezeDet, YOLOv3 and
tiny-YOLO architectures are shown in Table 1.
The speed of a neural network detector also depends on the size of the input
image, which makes it possible to adjust the size of the processed image in order to
achieve optimal accuracy and processing speed. Two series of experiments were performed
for three resolutions, using different hardware. In the first experiment (Table 2),
computations were carried out without a graphics processing unit, on an
AMD A10 9600p CPU (2.4 GHz, 4 cores). In the second experiment (Table 3),
computations were performed using an Nvidia GeForce GTX 1070 graphics processor
(8 GB, 1683 MHz) and an Intel Core i7 8700 CPU (3.2 GHz, 6 cores).
Taking into account the speed and model sizes of the compared architectures, the
SqueezeDet neural network was chosen for the object detection task of the mobile robot.
Table 2. Comparing the speed of neural networks to detect objects using CPU AMD A10
9600p.
Input image resolution, pix   Frame processing time, s
                              SqueezeDet   Tiny-YOLO   YOLOv3
320×240                       0.10         0.32        1.99
640×480                       0.39         1.04        5.72
1280×1024                     1.78         5.08        32.12
Table 3. Comparing the speed of neural networks to detect objects using GPU Nvidia
GeForce GTX 1070, CPU Intel core I7 8700.
Input image resolution, pix   Frame processing time, s
                              SqueezeDet   Tiny-YOLO   YOLOv3
320×240                       0.006        0.016       0.050
640×480                       0.009        0.026       0.088
1280×1024                     0.027        0.054       0.253
It is required to develop a vision system for a service robot that can collect objects of
interest recognized in video camera images. The collected objects are wooden cubes
with a side of 33 mm and magnetic inserts in the centers of the faces.
To apply the neural network detector, datasets were made; the annotations
to the images contain the coordinates of the cubes. The constructed datasets can be
divided into two: «office» and «hall». The first one contains 640 images of cubes in
various scenes inside office premises; the shooting angle is arbitrary. The hall dataset
consists of photographs obtained directly from the mobile robot in a large hall which is
convenient for experiments. The resolution of all images in the datasets was limited to
640×480 pixels to ensure high speed of the neural network.
It was decided to test the vision system in the hall for a better specification of the
task. Thus, the hall dataset was the main one, and the office dataset was made for
initial and additional experiments.
4 Experimental Research
We studied the effect of adding non-target scenes to the training set, as well as the effect
of the choice of anchor boxes on the detection range and the accuracy of localization of
objects.
The correct setting of the anchors is crucial in the SqueezeDet detector, since they
are the templates and initial approximations of the objects of interest. It is recommended
to find the values of the anchors by clustering the annotations with the k-means method
[6]. However, due to problems with the multiple detection of an object, a second set of
anchors was obtained by increasing the scale of the first set. Denote the anchors
obtained by clustering as "precise" and the others as "enlarged", and consider the
inference peculiarities of the neural network when using these anchors. The values of
the anchor boxes are shown in Table 4.
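A minimal version of such clustering is sketched below: plain Euclidean k-means on annotation box widths and heights, with deterministic initialization (a common variant clusters by IOU distance instead; the box sizes here are made-up examples, not the actual cube annotations).

```python
def kmeans_anchors(boxes, k, iters=20):
    """Cluster (width, height) annotation boxes to pick k anchor shapes."""
    # initialize centroids with k boxes spread over the sorted list
    boxes = sorted(boxes)
    step = max(1, len(boxes) // k)
    centroids = [boxes[i * step] for i in range(k)]
    for _ in range(iters):
        # assign each box to its nearest centroid
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            j = min(range(k), key=lambda i: (w - centroids[i][0]) ** 2 +
                                            (h - centroids[i][1]) ** 2)
            clusters[j].append((w, h))
        # move each centroid to the mean of its cluster
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = (sum(w for w, _ in cl) / len(cl),
                                sum(h for _, h in cl) / len(cl))
    return centroids

# two obvious size groups: near-20 px and near-60 px boxes
anchors = kmeans_anchors([(18, 19), (21, 20), (22, 23),
                          (58, 60), (61, 59), (63, 64)], k=2)
```

The resulting centroids are the "precise" anchor shapes; scaling them up would produce the "enlarged" set discussed below.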
A typical neural network prediction error when using "precise" anchors is the mul-
tiple detection of a single object, which leads to additional errors, since the extra
bounding boxes usually have a low intersection-over-union (IOU) with the annotation.
A good feature is the detection of small-scale objects (Fig. 3b). In contrast, with the
"enlarged" anchors, repeated detections occur rarely, but objects at long distances
are not detected (Fig. 3a).
Fig. 3. Typical inference errors when using different anchors: (a) small-scale object is not
detected, enlarged anchors; (b) multiple detection of a single object, precise anchors.
Such properties can be explained by the fact that large objects stand out from the
background more strongly and the loss function for them converges faster, therefore
bounding boxes based on small anchors can acquire relatively high confidence on
fragments of a large object.
An experiment was conducted in which we compared the precision and recall of the three
trained models. Key features of the models' learning process are shown in Table 5. The
experimental results are shown in Fig. 4. In all cases, the weights of a neural network
trained on the KITTI dataset [7] were used as the starting weights. An erroneous
detection is any bounding box that intersects with the annotation by less than the
specified IOU threshold.
Analyzing these graphs, it is clear that, despite periodic multiple detections, the
model with precise anchors has better characteristics. It is also seen that the stable
omission of distant objects leads to a decrease of recall for the «hall» and
«hall + office» models. At the same time, the characteristics of the last two models are
almost the same, but the model trained on hall and office photos may be considered
better because it works well in a larger variety of scenes.
Despite the high precision of one of the models, this quality assessment cannot be
final, because the model allows multiple detections of a single object, which is
unacceptable when planning a route for a mobile robot. To exclude multiple object
detections, an additional stage of filtering the predictions was added. The implemented
algorithm keeps only the one bounding box with the greatest confidence in the area of
one detection. The recalculation of quality metrics with the additional filtering is
shown in Fig. 5.
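A sketch of such a filter is shown below (the exact grouping criterion in the system may differ; here, within each group of boxes overlapping above a threshold, only the most confident one is kept, a non-maximum-suppression-style pass):

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def keep_best_per_area(detections, overlap_thr=0.3):
    """Greedy filter: keep only the most confident box in each area
    of overlapping detections. `detections` is a list of (box, conf)."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in detections:
        if all(iou(box, kb) < overlap_thr for kb, _ in kept):
            kept.append((box, conf))
    return kept

# two overlapping detections of one cube plus one distant cube
dets = [((10, 10, 40, 40), 0.9), ((12, 12, 42, 42), 0.6),
        ((100, 100, 130, 130), 0.8)]
filtered = keep_best_per_area(dets)
```

The duplicate box with confidence 0.6 is suppressed, while the two genuinely distinct detections survive.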
The use of additional filtering not only made the technical vision system convenient
to use, but also improved the F1 score, defined as:

$$F_1 = 2\,\frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (1)$$

The maximum value of F1 before filtering was 0.80, and the maximum value of F1
after filtering is 0.84.
5 Conclusion
A high-performance neural network detector suitable for a wide range of hardware,
including low-power platforms, has been used.
For the task of searching for and collecting wooden cubes by a mobile robot, training
datasets were created. The features of the trained neural network were analyzed; the
trained model achieved high precision and recall on the test dataset.
A probable direction for further development is the analysis of ways to increase
the range of object detection for a given camera and neural network detector, as well
as research on the detection precision of small-scale objects depending on the
resolution of the input image and the applied preprocessing.
Acknowledgment. This work was done as the part of the state task of the Ministry of Education
and Science of Russia No. 075-00924-19-00 “Cloud services for automatic synthesis and vali-
dation of datasets for training deep neural networks in pattern recognition tasks”.
References
1. Nielsen, M.A.: Neural Networks and Deep Learning, vol. 25. Determination Press, San
Francisco (2015)
2. Asadi, K., et al.: Real-time scene segmentation using a light deep neural network architecture
for autonomous robot navigation on construction sites. arXiv preprint arXiv:1901.08630
(2019)
3. Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
4. Wu, B., et al.: SqueezeDet: unified, small, low power fully convolutional neural networks for
real-time object detection for autonomous driving. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp. 129–137 (2017)
5. Iandola, F.N., et al.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)
6. Zhao, Z.Q., et al.: Object detection with deep learning: a review. IEEE Trans. Neural Netw.
Learn. Syst. (2019)
7. Geiger, A., et al.: Vision meets robotics: the KITTI dataset. Int. J. Rob. Res. 32(11), 1231–
1237 (2013)
Research on Convolutional Neural Network
for Object Classification in Outdoor Video
Surveillance System
The Russian State Scientific Center for Robotics and Technical Cybernetics
(RTC), Tikhoretsky Prospect 21, 194064 Saint Petersburg, Russia
{i.fomin,alexab}@rtc.ru
Abstract. Nowadays indoor and outdoor video surveillance systems are very
widespread. Earlier, in the days of the first surveillance systems, the processing
power allowed only monitoring and recording the surveillance footage, but now
it has become possible to use various methods of video analysis; in this article, we
investigate the application of convolutional neural networks to object classi-
fication. In our previous work we developed an outdoor video surveillance
system for detecting objects using fixed and PTZ cameras. The system provides
detection of moving objects with low computational cost and high accuracy.
This paper summarizes the results of work on the existing outdoor video
surveillance system for detecting objects and our new convolutional neural
network object classifier based on the Keras and TensorFlow packages. Reliable
determination of the object type allows the system to make decisions on pro-
cessing the object information. The considered classifiers allow performing both
simple classification (a person/not a person) and a more complex one (error/
person/car/animal) with insignificantly lower reliability. Object tracking across
consecutive video frames can remarkably reduce the number of classification
operations, because there is no need to perform them for each frame once
the object class has been identified with enough reliability. In addition, the
integration of the developed networks into the existing video surveillance sys-
tem is briefly described.
1 Introduction
Nowadays video surveillance systems are widespread enough to be a standard
way of protecting various types of areas, from government and industrial buildings
to private or company facilities. Video cameras have been used for monitoring for a
long time, and usually special human personnel are required to analyze the camera
footage. But it is quite difficult for people to remain constantly attentive; moreover,
one person is not enough in the case of multiple cameras (there may be more than a
hundred cameras in a large industrial area). Also, a person can get distracted and miss
an important event. A logical solution in this situation is the use of intelligent
systems. Since the working conditions are very diverse, such systems may produce some
false object detections. Integrating a classifier based on a convolutional neural
network into the video analysis system allows us to decrease the number of detection
errors and expand the software functionality.
In our previous works [1, 2], an algorithm for moving object detection was intro-
duced; it includes a set of object detection methods and performs with low computational
cost and high accuracy. The main distinctive feature of the proposed algorithm is the
ability to detect and track objects under very difficult conditions: there can be snow,
rain, swaying grass, branches and bushes, or changing lighting (in partly cloudy
weather) in the frame, and objects can have ultra-low resolution (single pixels or
dozens of pixels). When properly configured, this system performs well with both fixed
and pan-tilt-zoom (PTZ) cameras.
The system is capable of classifying objects by their position on the reference plane
in front of the camera, by the pixel and metric object sizes, and by the pixel or metric
movement speed. This helps to filter out some false positives, but sometimes it is not
enough.
Recently, convolutional neural networks have become very popular for developing
various computer vision systems. Their working principle has much in common with
the human vision process. At the moment, classification methods based on neural
networks significantly outperform classic (non-neural-network) methods. Modern
neural network architectures have moved far from the biological prototype, but still
produce outstanding results.
There are two fundamental problems with using the previously developed algorithms,
based on space-time filtering of the video stream, in the video surveillance system.
The first problem is to separate correctly detected objects from false ones. To cope
with this issue, we propose a binary classifier based on a convolutional neural
network. Since in most cases the objects of interest are people, the classifier must
determine with very high accuracy (more than 99%) whether a person or some other
object is within the selected area. This will eliminate the system's false positives for
non-human objects.
The second problem is more complex and relates to use cases in which the system
must respond differently depending on the type of the detected and classified object.
To solve this task, we propose a convolutional-neural-network classifier that
distinguishes several basic object types, each assumed to be treated in a specific way.
Since we need to determine the object type, this classifier must predict the object class
with more than 90% accuracy. Multiclass classification is always harder than binary
classification, especially when the image quality is unstable and the object classes are
unevenly distributed in the training dataset.
This paper considers original approaches to the synthesis of neural network
architectures that require minimal computing resources. These architectures can solve
classification problems for low-resolution objects extracted from the output of the
video analytics system.
Research on Convolutional Neural Network for Object Classification 223
2 Related Work
Convolutional neural networks as an approach have been known for a long time. For
example, paper [3] considered a model of the visual system that was later generalized
into the convolutional neural network.
In 1998, LeCun presented his work [4], which for the first time suggested a
convolutional neural network architecture for object classification; for a long time it
was considered classic and almost the only one suitable for this task. The main
architectural idea is to alternate convolution operations (with varying numbers of
convolution kernels of different sizes) with pooling operations, in which the strongest
activation is selected from each 2 × 2 square of the feature map. The numbers of
operations and parameters differ depending on the settings, but the idea remains
the same.
After about a decade, computer performance reached a level at which not only
individual scientists with access to special resources could try to solve object
recognition tasks with neural networks: video cards with large processing power
became available to a wide range of people, and packages for direct use of graphics
card memory appeared. In 2012, Krizhevsky, a follower of LeCun and today one of
the most famous people in the field, proposed a new architecture [5] that became the
classic neural network structure for object classification (see Fig. 1). Only after his
work did it become clear that deep learning requires a huge amount of data,
computing power and time. The famous AlexNet model includes more than 60 million
parameters and uses two graphics accelerators for training. The network ideologically
inherits from LeNet but is about three times larger: convolutional layers were added,
and the convolution kernel size was proposed to be reduced from the input to the
output. Several approaches for avoiding overfitting were also proposed; they are still
popular and will be used in the architectures presented below.
After the publication of [5], the complexity of neural networks for object
classification grew even more rapidly; some of the architectures are worth noting
briefly.
In CCCP (cascaded cross-channel parametric) pooling [6], fully connected single-
layer perceptrons are embedded into ordinary convolutional layers in parts of the
network; this reduces the number of features but increases the number of network
parameters. In some cases it works better.
Very significant results were achieved by the VGG networks [7]. VGG-16 is one of
the best-proven architectures, very stable and showing good results; VGG-19 reaches
the limit of deepening for this architecture type. The well-known GoogLeNet [8] was
created on the basis of the VGG-19 network, and the even more complex Inception
model [9] was designed on the basis of GoogLeNet. This model had several
modifications and, after ways to normalize the weights between epochs were added
and the structure of individual parts was simplified, rearranged and changed, reached
its limit in the Inception V3 network [10].
The last significant improvement, which led to several new models (in particular,
Inception V4) and still continues to evolve, starts from the observation that simply
increasing the number of layers in the classical VGG architecture soon hits a limit.
But if the network is allowed to skip individual layers when passing information
forward, with the weights of the skipped layers effectively set to zero, then different
layers and different depths become involved in solving various recognition tasks
during training, and increasing the number of layers again produces the desired result.
The ResNet architecture [11], based on this principle, is one of the most popular at the
moment, although expensive GPU computing is still required to train the very
complex architectures described above.
Most of the object classification architectures described above are designed for
conditions different from those found in video surveillance systems. The input image
in AlexNet is 228 × 228, which suits object recognition on large color images such as
those presented in the ImageNet competition. Examples of image fragments that need
to be classified by the video surveillance system are shown in Fig. 2 below.
Small fragments can sometimes require a slightly complicated version of LeNet for
object classification, but definitely not AlexNet or anything more complex.
3 Dataset Preparation
To train and test neural networks for object classification, we first need to prepare
datasets specific to the task to be solved: deciding, from the result produced by the
video analytics system, whether the detected object belongs to the person or error
class in the first case, and to the error, person, car, dog or cat class in the second case.
First of all, we should describe the video analytics parameters that affect the dataset
and the network architecture. The surveillance cameras operate as follows: at night,
shooting is performed in black-and-white mode using special infrared illumination.
We therefore decided to always use black-and-white images, converting color ones to
shades of gray for training and verification.
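The conversion to shades of gray can be sketched as follows; the paper does not state which formula the authors use, so the standard ITU-R BT.601 luminance weights below are an assumption:

```python
def to_grayscale(rgb_pixels):
    """Convert an image given as rows of (R, G, B) tuples to shades of gray.

    Uses the standard ITU-R BT.601 luminance weights; this is only an
    illustration, since the exact conversion used by the authors is not
    specified in the paper.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_pixels
    ]

# A tiny 2x2 "frame": pure red, pure green, pure blue, and mid gray.
frame = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (128, 128, 128)]]
gray = to_grayscale(frame)
```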
The image resolution at the system input is 1280 × 720, but in order to increase the
processing speed we downscale it by a factor of two to 640 × 360. Accordingly, all
example regions with objects are extracted from the reduced-resolution frame.
The sizes of the rectangles produced by the algorithm vary from 10–20 to 100–200 or
slightly more pixels on the larger side; in addition, the center of the rectangle is not
always located exactly on the object and is often shifted to the side. Examples of why
it is difficult to select an object based only on the rectangle coordinates, and of how
the same frame can look at different times of the day, are shown in Fig. 3. The left
frame shows the system detecting a dog: if this bounding box is cut out, only a small
and hardly recognizable part of the dog remains. Cars pose the opposite problem: they
are often too big in the frame, and the system finds only a part of the car.
The first dataset contains frames from two CCTV cameras that monitor the private
territory of one of the authors. The parameters of the video analytics system on both
cameras were intentionally limited and worsened, which results in an increased
number of false detections
226 I. S. Fomin and A. V. Bakhshiev
of different sizes. Since most false detections are very small (1–3 × 5–7 pixels on the
larger side), we decided to take as each detection a 100 × 100 pixel square around the
center of its rectangle, shifting it where necessary to stay within the image boundaries.
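The cropping rule can be sketched as follows (a plain-Python illustration; the authors' exact behaviour at the frame borders is not specified beyond staying inside the image):

```python
def crop_origin(center_x, center_y, frame_w=640, frame_h=360, size=100):
    """Top-left corner of a size x size square centred on the detection,
    shifted where necessary so that the whole square stays inside the
    frame. An illustrative sketch of the rule described in the text; the
    default frame size matches the reduced 640 x 360 resolution.
    """
    left = min(max(center_x - size // 2, 0), frame_w - size)
    top = min(max(center_y - size // 2, 0), frame_h - size)
    return left, top
```

For a detection centred near a corner the square is simply pushed back inside the frame rather than truncated.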
The training set is manually labeled and contains objects of 5 classes: error, person,
car, dog, cat. The parameters of the dataset are given in Table 1. The last class is very
poorly represented due to the low resolution and the positions of the cameras. This set
serves as the training set for the network that determines the object class, and as the
test set for the binary classification network (person or non-person).
In addition to the detections on real data from the CCTV cameras, we also used two
previously recorded videos in which only people were marked. Details of the images
extracted from these videos are presented in Table 2.
As negative examples, we randomly selected 100 × 100 pixel squares that do not
intersect any of the bounding boxes labeled as the person class. This set was used as
the training set for the binary classification problem.
4 Experiments
scheme, but the weight normalization function applied between the training epochs
was changed to the L2 norm, which should improve the training quality.
In the fourth architecture (A1–4), in addition to the changes made for A1–3, a new
rule for initializing the network layers before training was introduced: the He uniform
scheme, in which the range of the uniform initialization is scaled to the layer size.
Such specially scaled initialization, rather than a generic random one, should improve
the quality and reliability of network training.
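For reference, the two ingredients can be sketched in plain Python; in Keras they correspond to `kernel_initializer='he_uniform'` and an L2 kernel regularizer, and the `lam` value below is illustrative rather than the authors' setting:

```python
import math
import random

def he_uniform(fan_in, fan_out, seed=0):
    """Draw a fan_out x fan_in weight matrix from the He-uniform
    distribution U(-limit, limit) with limit = sqrt(6 / fan_in), as used
    for layer initialization in architecture A1-4 (a sketch)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def l2_penalty(weights, lam=1e-4):
    """L2 weight-decay term added to the loss (the L2-norm constraint on
    the weights mentioned for architecture A1-3)."""
    return lam * sum(w * w for row in weights for w in row)
```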
The first multiclass classification architecture (A2–1), shown in Fig. 4, represents an
experiment with completely new convolution kernel sizes while the overall network
structure remains the same: the first kernel is 20 × 20, the second 5 × 5, and the third
3 × 3. Moreover, the number of neurons in the output layer of the network is changed
to 5, corresponding to the 5 classes.
The second (A2–2), third (A2–3), fourth (A2–4) and fifth (A2–5) architectures
correspond to the A1–1, …, A1–4 architectures from the two-class classification.
4.2 Results
In this work we trained and tested the binary error-person classifier and the multiclass
error-person-car-dog-cat classifier. The datasets used in both experiments are
described in the previous section. The results obtained in the first experiment (the
binary classifier) are presented in Table 3; the architectures are labeled as described
above. A 20% part of the training set was used as the validation set. All architectures
showed good results on both the test and validation datasets; the best result was
obtained with the A1–2 architecture.
We tested five architectures in the multiclass classifier experiment. The best result
was shown by the network named A2–4, which repeats the A1–3 architecture from the
first experiment (with the increased number of outputs); it reached 93.85% accuracy
on the test dataset (Table 4).
5 Integration into the Video Surveillance System
Based on the experimental results, we selected the binary classification network for
integration into the existing video surveillance system, which is based on space-time
filtering algorithms. The networks described in the previous section are implemented
with the Keras framework, which uses the TensorFlow system for multithreaded
matrix computations; both of these systems use Python as the programming language.
Integration is performed by calling the Python code from the C++ program (the
video monitoring system modules are written in C++). Fortunately, most of the
interoperability problems are solved by the boost::python library, so we use it to build
our program module.
The module works as follows. A black-and-white image, together with the coordinates
of all objects detected on it by the system, is sent to the module input. The module
cuts out the parts of the image containing objects according to the rules chosen in
Sect. 3. The fragments are sequentially passed to the Python code for classification,
after which the module forms a vector of object classes as well as a debug image.
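The Python side of this pipeline might look as follows; `predict` stands in for the Keras model's prediction call, and all names are hypothetical rather than the module's actual API:

```python
def classify_fragments(image, detections, predict, size=100):
    """Cut one size x size fragment per detection and classify each.

    `image` is a nested list of pixels, `detections` a list of (cx, cy)
    centres received from the C++ side, and `predict` a stand-in for the
    trained classifier. A sketch of the module described in the text, not
    its actual interface.
    """
    classes = []
    for (cx, cy) in detections:
        # Keep the fragment inside the frame, as in Sect. 3.
        left = min(max(cx - size // 2, 0), len(image[0]) - size)
        top = min(max(cy - size // 2, 0), len(image) - size)
        fragment = [row[left:left + size] for row in image[top:top + size]]
        classes.append(predict(fragment))
    return classes
```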
6 Conclusions
We prepared two datasets for classifying the objects detected by the video analytics
system: one to separate people from all other objects, and one to classify objects by
type (error-person-car-dog-cat). We trained and tested several neural network
architectures on these datasets to determine the most effective one and to find out
how the number of layers and the layer parameters affect the classification quality in
both cases.
Both experiments were completed successfully with good results, allowing us to
choose the most suitable architecture for integration into the video analytics system.
A software module of the video analytics system was then created on the basis of
existing libraries to ensure correct interaction between the C++ and Python programs.
This module can classify the detected objects as people or other classes.
For further research we plan to train more architectures on this dataset: AlexNet
with preliminary layer scaling, LeNet with images scaled up to 256 × 256, as well as
more complex members of the VGG family or simplified ResNet analogs. In addition,
we are going to increase the number of detectable object classes for special cases and
to perform long-term testing of the module as part of the video surveillance system.
Acknowledgement. This work was done as part of the state task of the Ministry of Education
and Science of Russia No. 075-00924-19-00, “Cloud services for automatic synthesis and
validation of datasets for training deep neural networks in pattern recognition tasks”.
References
1. Stepanov, D.N.: Detection of moving objects by a digital-signal-processor-based automatic
video surveillance system. Perception ECVP abstract, 35, 0–0
2. Bakhshiev, A.V., Polovko, S.A., Stepanov, D.N., Smirnova, E.Yu.: Multichannel computer
vision systems for evaluation and estimation of complex dynamic environments for the tasks
of situation analysis. Rob. Tech. Cybern. 3, 59–63 (2014)
3. Hubel, D.H., Wiesel, T.N.: Brain mechanisms of vision. Sci. Am. 241(3), 150–163 (1979)
4. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
6. Lin, M., Chen, Q., Yan, S.: Network in network. https://arxiv.org/pdf/1312.4400v3.pdf.
Accessed 28 July 2018
7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details:
delving deep into convolutional nets. In: British Machine Vision Conference (2014)
8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going
deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp 1–9 (2015)
9. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception
architecture for computer vision. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 2818–2826 (2016)
10. Scheme of Inception V3 neural network. http://josephpcohen.com/w/wp-content/uploads/
inception-v3.pdf. Accessed 28 July 2018
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–
778 (2016)
Post-training Quantization of Deep Neural
Network Weights
1 Introduction
Today’s neural networks have millions of parameters. The constantly growing
network size hinders the deployment of neural nets on mobile devices and gadgets.
For example, the trained VGG16 network [1], designed for classification of ImageNet
patterns [2], takes up about 553 MB of memory.
There are different methods to reduce the size of a trained network, and quantization
is one of them. It divides the weight distribution interval into discrete subintervals to
reduce the word length of the weights. Applying quantization to pretrained networks
significantly decreases the network size, making the network more compact and
suitable for installation on mobile systems.
Most implementations of neural net weight quantization involve retraining the
network, which requires a great deal of additional computation [3–5].
Unlike those approaches, we investigate quantization methods that do not call for
retraining; instead, they operate on the weights of already trained networks. So far we
have managed to find only one study using a similar approach [6]. That paper offers
logarithmic quantization, in which the weights are snapped to values whose base-2
logarithms are integers from the interval [−7, 0]. Our method is more universal: both
the base and the initial value can be varied. We call it exponential quantization.
2 The Methods
When examining trained neural networks such as VGG-16, VGG-19 and ResNet-50,
we discovered a clear general trend in the distribution of the weights of these
networks: the weights almost always comply with a symmetrical Gaussian or Laplace
distribution.
Initially, the weights can be regarded as distributed symmetrically in the interval
[−M, M], where M is the largest magnitude of the weights. Storing the signs of the
weights in a separate array (allocating one data bit per sign), we reduce the
distribution interval to [0, M]. For simplicity, let us divide all the values by M and
deal with the interval x ∈ [0, 1].
In quantization we reduce the variety of the numbers x ∈ [0, 1] by dividing the
interval into n segments whose ends are at points x_k. Note that the first segment is
[0, x_0], the last is [x_{n−2}, 1], and the others are [x_{k−1}, x_k], k = 1, …, n − 1.
The reason for the choice of the first segment will be explained a bit later. The number
of segments n is usually determined by the number of bits B allocated for
quantization:

n = 2^{B−1}.     (2)

The minus one corresponds to the one bit allocated for the sign of a value.
Let us consider two most popular quantization methods: uniform and exponential
quantization (Fig. 1).
Here x_0 is the first point (a variable parameter), n is the number of segments, and the
length q of the segments is determined as

q = (M − x_0) / (n − 1),   M = max |W_ij|.     (4)
232 E. M. Khayrov et al.
Fig. 1. The uniform (to the left) and exponential (to the right) distribution.
x_k = x_0 q^k,   i.e.   x_k = q x_{k−1},   k = 1, …, n − 1.     (5)
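For illustration, the two families of division points can be generated as follows (a sketch in plain Python; fixing the exponential ratio so that the last point reaches M is our assumption, since the text does not state how q is chosen in the exponential case):

```python
def uniform_points(x0, M, n):
    """Uniform division points x_k = x0 + k*q with q = (M - x0)/(n - 1),
    cf. (4); the last point equals M."""
    q = (M - x0) / (n - 1)
    return [x0 + k * q for k in range(n)]

def exponential_points(x0, M, n):
    """Exponential division points x_k = x0 * q**k, cf. (5). The ratio q
    is fixed here (an assumption) so that the last point reaches M:
    q = (M/x0)**(1/(n-1))."""
    q = (M / x0) ** (1.0 / (n - 1))
    return [x0 * q ** k for k in range(n)]
```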
2.3 Variable x0
Given the number of bits B, the quantization procedure is determined uniquely by the
end point of the first segment (the variable x_0) in both the uniform (3) and the
exponential (5) approach.
We assume that the parameter x_0 should be chosen so that the distributions of the
original weights and of the quantized weights have the highest correlation.
3 The Results
Fig. 2. Original quantities (yellow histogram) and uniformly (left) and exponentially
(right) quantized quantities for a Laplace distribution. The number of quantization
steps n = 8, x_0 = 0.05, the number of quantities N = 10000.
Fig. 3. The result of uniform (left) and exponential (right) quantization for a Laplace
distribution. The number of quantization steps n = 8, x_0 = 0.05, the number of
quantities N = 10000.
We examined the correlation between the quantized and original sets of quantities.
Figures 4 and 5 show how the correlation depends on the length of the first segment
x_0. The highest correlation nearly always corresponds to some optimal value of x_0.
The best values of the correlation ρ and of x_0 as functions of the word length are
shown in Fig. 6.
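The search for the best x_0 can be reproduced with a simple grid search (plain Python; quantization here snaps each weight to the nearest division point, a simplification of the paper's rounding rule, and all numerical values are illustrative):

```python
import math

def pearson(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def nearest_quantize(w, points):
    """Snap |w| to 0 below x0, otherwise to the nearest division point,
    keeping the sign (a simplification of the rule used in the paper)."""
    a = abs(w)
    if a < points[0]:
        return 0.0
    return math.copysign(min(points, key=lambda p: abs(p - a)), w)

def best_x0(weights, n=8, candidates=(0.01, 0.02, 0.05, 0.1, 0.15, 0.2)):
    """Grid-search the x0 that maximizes the correlation between the
    original weights and their uniformly quantized version."""
    M = max(abs(w) for w in weights)
    def score(x0):
        pts = [x0 + k * (M - x0) / (n - 1) for k in range(n)]
        return pearson(weights, [nearest_quantize(w, pts) for w in weights])
    return max(candidates, key=score)
```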
[Figs. 4 and 5: the parameter 1 − ρ versus x_0 for word lengths b = 1, …, 8, for
uniform and exponential quantization.]
Fig. 6. The parameter 1 − ρ (left) and the optimal value of x_0 (right) as functions of
the number of bits B assigned for uniform and exponential quantization; Laplace
distribution.
Fig. 7. Original quantities (yellow histogram) and uniformly (left) and exponentially
(right) quantized quantities for a Gaussian distribution. The number of quantization
steps n = 8, x_0 = 0.05, the number of quantities N = 10000.
Table 1. The optimal values of the parameters x_0 and ρ for the uniform and
exponential quantization methods, for the Laplace and Gaussian distributions.

               Laplace                                Gauss
Bits   Unif. ρ   Exp. ρ   Unif. x0   Exp. x0   Unif. ρ   Exp. ρ   Unif. x0   Exp. x0
1 0.8538 0.8538 0.1010 0.1010 0.8575 0.8575 0.1194 0.1194
2 0.8881 0.9540 0.0980 0.0551 0.9075 0.9592 0.1133 0.0612
3 0.9531 0.98797 0.0643 0.0276 0.9634 0.9892 0.0673 0.0367
4 0.9841 0.9967 0.0367 0.0184 0.9886 0.9968 0.0367 0.0245
5 0.9955 0.99899 0.0184 0.0122 0.9968 0.9991 0.0153 0.0153
6 0.9988 0.9997 0.0122 0.0092 0.9992 0.9997 0.0092 0.0092
7 0.9997 0.99992 0.0061 0.0061 0.9998 0.9999 0.0061 0.0061
8 0.99992 0.99998 0.0031 0.0031 0.9999 1.0000 0.0031 0.0031
236 E. M. Khayrov et al.
ρ = √2 · Σ_{k=0}^{n} x_k ∫_{x_k}^{x_{k+1}} z f(z) dz
    / ( σ_W · √( Σ_{k=0}^{n} x_k² ∫_{x_k}^{x_{k+1}} f(z) dz ) ),     (7)
where σ_W is the standard deviation of the weights. The weights of the VGG16
network usually comply with two sorts of distribution: the convolutional layers obey
the Laplace distribution, while the fully connected layers follow the normal
distribution.
The best value of x_0 corresponds to the correlation maximum. Formula (7) does
not describe the dependence perfectly, yet it gives good agreement in the vicinity of
the correlation maximum.
Table 2 compares the optimal characteristics found with the aid of the above
expressions and by numerical search. Examination of Table 2 yields an estimate of
x_0 as a function of the deviation of the weight coefficients:

x_0 ≈ 0.15 σ_W.     (8)
First we find the best value of the parameter x_0 for each layer of the neural net. Then
we use this parameter to determine the other division points x_1, …, x_q from (5).
Finally, we carry out quantization by setting the actual weights equal to the following
values:
W_d = 0,     if −x_0 < W < x_0;
W_d = x_k,   if x_k ≤ W < x_{k+1},   k = 1, …, q − 1;     (9)
W_d = −x_k,  if −x_{k+1} < W ≤ −x_k.
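A minimal sketch of this rounding rule in plain Python, assuming the division points are already computed from (5):

```python
import math

def quantize_weight(w, points):
    """Apply the quantization rule of Eq. (9): weights inside (-x0, x0)
    become 0; otherwise |w| is floored to the largest division point
    x_k <= |w|, and the sign is kept (in the paper's scheme the sign is
    stored in a separate bit array)."""
    a = abs(w)
    if a < points[0]:
        return 0.0
    # Find the largest x_k with x_k <= |w|.
    k = 0
    while k + 1 < len(points) and points[k + 1] <= a:
        k += 1
    return math.copysign(points[k], w)
```

The layer-wise choice of the first point then follows the estimate x_0 ≈ 0.15 σ_W from (8).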
We used a set of patterns from the ImageNet database to test the algorithm. The set
has 50000 patterns corresponding to 1000 classes. To simplify the computations, we
took a sample of 1000 patterns from the set; the test allowed us to draw general
conclusions about the algorithm.
The results of the computations for different numbers of quantization sets n are
given in Tables 3 and 4.
Table 2. The best parameters for quantization of the VGG16 neural net weights. The
number of quantization steps n = 16.

Layer   x0   ρ   Number of weights   Mean   σW   x0/σW
1 0.03100 0.99825 1728 −0.00244 0.20670 0.150
2 0.00531 0.997 36864 0.00491 0.04248 0.125
3 0.00403 0.9961 73728 0.00020 0.03222 0.125
4 0.00294 0.9962 147456 −0.00028 0.02354 0.125
5 0.00261 0.9947 294912 −0.00013 0.01738 0.150
6 0.00185 0.9945 589824 −0.00024 0.01235 0.150
7 0.0019 0.9948 589824 −0.00067 0.01267 0.150
8 0.00151 0.9949 1179648 −0.00045 0.01005 0.150
9 0.00114 0.9943 2359296 −0.00047 0.00762 0.150
10 0.00119 0.9948 2359296 −0.00081 0.00796 0.150
11 0.0013 0.9955 2359296 −0.00058 0.00869 0.150
12 0.00131 0.9954 2359296 −0.00074 0.00876 0.150
13 0.00127 0.9947 2359296 −0.00108 0.00848 0.150
14 0.00035 0.9965 102760448 −0.00014 0.00231 0.152
15 0.00066 0.9971 16777216 −0.00037 0.00438 0.151
16 0.00124 0.9972 4096000 0.00000 0.00828 0.150
Table 3. Accuracy Top1 for different numbers of quantization sets n for networks VGG16,
VGG19 and ResNet50.
VGG16 VGG19 ResNet50
Original (32 bits) 70.3 70.3 75.7
n = 64 (6 + 1 bit) 70.6 69.9 72.6
n = 32 (5 + 1 bit) 69.5 68.4 65.1
n = 16 (4 + 1 bit) 60.5 49.7 29.2
n = 8 (3 + 1 bit) 15.1 0.7 0.2
n = 4 (2 + 1 bit) 0.1 0.1 0.0
Table 4. Accuracy Top5 for different numbers of quantization sets n for networks VGG16,
VGG19 and ResNet50.
VGG16 VGG19 ResNet50
Original (32 bits) 90.9 90.4 93.3
n = 64 (6 + 1 bit) 90.5 89.7 90.6
n = 32 (5 + 1 bit) 90.2 88.7 85.9
n = 16 (4 + 1 bit) 85.2 77.4 53.8
n = 8 (3 + 1 bit) 33.8 9.6 0.8
n = 4 (2 + 1 bit) 0.8 0.5 0.4
As seen from the tables, the algorithm copes well with 6-bit quantization and provides
almost the same accuracy as the original weights. This means we can reduce the
network size roughly fivefold with no considerable penalty in accuracy.
4 Conclusion
We have developed a quantization method that does not require retraining. Without
much computation, the approach gives good results for 6-bit quantization and allows
us to reduce the neural net size by about five times, while the classification accuracy
remains high.
However, the more layers a network contains, the worse the algorithm works. For
instance, the 50-layer ResNet50 network performs notably worse than VGG-type
networks, which have fewer layers, while the VGG16 network still shows good
results with 5-bit quantization. This difference is a reason to continue research in the
field.
Funding. The work was financially supported by the State Program of SRISA RAS No. 0065-
2019-0003 (AAA-A19-119011590090-2).
References
1. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition (2014). https://arxiv.org/abs/1409.1556
2. ImageNet – huge image dataset. http://www.image-net.org
3. Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. https://arxiv.org/abs/
1612.01064
4. Zhou, S., Ni, Z., Zhou, X., Wen, H., Wu, Y., Zou, Y.: Dorefa-net: training low bitwidth
convolutional neural networks with low bitwidth gradients. https://arxiv.org/pdf/1606.06160.
pdf
5. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with
pruning, trained quantization and Huffman coding. CoRR. https://arxiv.org/abs/1510.00149,
February 2015
6. Cai, J., Takemoto, M., Nakajo, H.: A deep look into logarithmic quantization of model
parameters in neural networks. In: The 10th International Conference on Advances in
Information Technology (IAIT2018), Bangkok, Thailand, 10–13 December 2018, 8 pages.
ACM, New York (2018). https://doi.org/10.1145/3291280.3291800
Deep-Learning Approach
for McIntosh-Based Classification
of Solar Active Regions Using HMI
and MDI Images
Abstract. Solar active regions (ARs) are the primary source of solar
flares. There are plenty of studies showing a statistical relationship
between the magnetic field complexity of ARs and solar flares. Usually,
the complexity of ARs is described with different numerical magnetic
field parameters and characteristics calculated on top of them. There is
also the well-known and widely adopted McIntosh classification scheme
of sunspot groups, which consists of a three-letter abbreviation; Solar
Monitor's flare prediction system is based on this classification. To date,
the classification is done manually, once a day, by a specialist. In this
paper, we describe an automatic system based on convolutional neural
networks. For neural network training, we used images from two big
magnetogram databases (HMI and MDI images) that together cover the
period from 1996 to 2018. Our results show that automated classification
of solar ARs is possible with a moderate success rate, which allows its
use in practical tasks.
1 Introduction
The observation, analysis, and classification of solar active regions is an essen-
tial part of space weather research. Sunspots have been investigated by scientists
for more than 400 years [5,6]. It was established that they appear mostly not in
isolation but in groups. These groups differ significantly in size, configuration,
number of spots, and other parameters. The sunspots on the Earth-facing side of
the Sun are observed by space weather specialists, who give each sunspot region
a magnetic classification and a spot classification. The modern standard of classi-
fication is the 3-component McIntosh classification system [4]. It was introduced
© Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 239–245, 2020.
https://doi.org/10.1007/978-3-030-30425-6_28
240 I. Knyazeva et al.
in 1966. The classification depends on the size, shape, and spot density of ARs.
It is a modified version of the Zürich classification system, with seven broad
categories characterized by the number of polarities and the distribution of spots
in the group. The general form of the McIntosh classification is ZPC, where “Z”
is the modified Zürich class, “P” is the type of penumbra on the largest spot, and
“C” is the degree of compactness in the interior of the group. A simplified picture
of active region configurations according to the McIntosh classification can be
found at the site SpaceWeatherLive.com and is represented in Fig. 1.
istration (SWPC NOAA). The data could be collected directly from the SWPC
archive, but much additional information is available, so we used aggregated
information about each active region from SpaceWeatherLive.com. Magnetogram
data are available since 1996, so we collected information from 1996 to 2018.
These data include the solar active region identification number (NOAA
number), date, location on the solar disc, and the corresponding McIntosh class.
Python code for data collection can be found at
the GitHub repository. We used magnetogram data from two instruments: the
Michelson Doppler Imager (MDI) on board the Solar and Heliospheric
Observatory (http://soi.stanford.edu/), which operated in 1996–2010, and its
successor, the Helioseismic and Magnetic Imager (HMI) on board the Solar
Dynamics Observatory (http://hmi.stanford.edu/), operating from 2010 to the
present. Full-disc magnetograms hmi.M_720s from HMI/SDO with resolution
4096 × 4096 and mdi.fd_M_96m_lev182 from SOHO/MDI with resolution
1024 × 1024 were downloaded for each day present in the final table. After
that, based on the tabulated date and the location of the center of each active
region, all active regions present on the full-disc magnetogram were cropped.
We took a size of 500 × 400 pixels for HMI, which corresponds to 125 × 100 for
MDI due to the lower resolution of the latter. Python code for cropping is also
provided in the GitHub repository. An example of a full-disc magnetogram and
cropped fragments is shown in Fig. 3. As a result, a total of 19565 fragments
were cropped; this number is less than the total number of records in the table
because we did not consider regions close to the limb. These fragments were used
as inputs to the neural network. As targets we take the letters of the McIntosh
classification system. A McIntosh classification is a three-letter abbreviation; the
first letter has 7 classes, the second 6, and the third 4. The distribution of examples
by each letter is provided in Fig. 4. As the figure shows, the class distribution is
not balanced; we take this into account at the neural network training step.
3 Results
As described previously, we used data from two instruments with different resolutions: the fragment size is 125 × 100 for MDI data and 500 × 400 for HMI data. It would be possible to build separate models for each type of data and then combine them, but as a starting point we decided to resize all images to a single size of 125 × 100, aware that some information is lost. For neural network processing, the data should be normalized. Usually, min-max or standard deviation normalization is used for images or matrix data, but in our case it is important to preserve information about the global magnetic field strength, so we found the maximum value over all fragments and divided each pixel by it. As a result, each pixel in a fragment is a signed value in the range [−1, 1]. As a baseline, we used a simple convolutional neural network with three convolutional layers and the tanh activation function. As targets, we used the separate letters of the McIntosh codes; a separate model was trained for each letter. After that, we experimented with different architectures well proven in other image classification tasks, such as DenseNet and ResNet. The ResNet architecture performed even worse than the simple baseline; DenseNet gave a small improvement at the cost of increased computational time. Additionally, we added information about the statistical distribution of the signed logarithm of the data (sign(data) · log(abs(data))) in the form of 8 percentile values. This information was fed into the dense layer following the convolutional layers. The schema of the resulting model is presented in Fig. 5.
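The global normalization and the auxiliary percentile features described above can be sketched as follows (a minimal sketch; the epsilon inside the logarithm and the uniform percentile grid are our assumptions, not specified in the paper):

```python
import numpy as np

def normalize_global(fragments):
    """Divide all fragments by the global maximum absolute pixel value,
    so each pixel falls in [-1, 1] while the relative field strength
    across fragments is preserved."""
    scale = max(np.abs(f).max() for f in fragments)
    return [f / scale for f in fragments]

def signed_log_percentiles(fragment, n=8):
    """n percentile values of sign(x) * log(|x|), the auxiliary
    statistical features fed to the dense layer.  The epsilon inside
    the log and the uniform percentile grid are our assumptions."""
    x = fragment.ravel()
    slog = np.sign(x) * np.log(np.abs(x) + 1e-12)
    return np.percentile(slog, np.linspace(0, 100, n))
```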
The accuracy achieved with the three different architectures is presented in Table 1 below.
The distribution of correctly and incorrectly predicted classes for each letter is presented as confusion matrices in Fig. 6. These results slightly outperform the results obtained in [1].
244 I. Knyazeva et al.
Table 1. Different neural network architectures accuracy for each letter prediction.
Overall accuracy versus random prediction accuracy is given
Fig. 5. Neural network model used in ARs classification. Each convolutional layer consists of 64 5 × 5 filters with the tanh activation function, followed by a BatchNormalization layer
Fig. 6. Results of classifying each of the three letters in the form of confusion matrices.
The result of the classification can hardly be called outstanding, but this is primarily due to the complexity of the data. Also, the criteria for assigning a region to a particular class are quite complex, and even experts may disagree, not to mention an automatic system. Nevertheless, in our opinion, information about the McIntosh class of ARs could be an important block in modern systems, since it makes it possible to use knowledge accumulated even before the era of regular observations. Besides, the quality of classification is much better
than random guessing (for seven-class classification the accuracy of random guessing is about 15%, while our model achieves 51%). Various steps are possible to improve the model. For example, we did not use white-light data, although historically they became available much earlier and the classification was originally developed for them; we considered that the magnetograms contain much more information. Besides, training neural networks, especially deep architectures, requires a lot of data. In our case the data problem can be addressed by extra labeling, which requires additional human resources, or by using data sampled at short intervals, assuming that the McIntosh class does not change during such an interval. In the latter case, additional information is needed on the position of the region center after a specific time period, so some active-region tracking system should be implemented.
References
1. Colak, T., Qahwaji, R.: Automated McIntosh-based classification of sunspot groups
using MDI images. In: Solar Image Analysis and Visualization, pp. 67–86. Springer,
New York (2007). https://doi.org/10.1007/978-0-387-98154-3_7
2. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
3. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.,
Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural
Comput. 1(4), 541–551 (1989). https://doi.org/10.1162/neco.1989.1.4.541
4. McIntosh, P.S.: The classification of sunspot groups. Sol. Phys. 125(2), 251–267
(1990). https://doi.org/10.1007/BF00158405
5. Severny, A., Lüst, R.: Stellar and solar magnetic fields. In: International Astronom-
ical Union. Symposium (1965)
6. Yazev, S.: Kompleksi activnosti na Solnze. Nauka v Rossii, pp. 4–12 (2009)
Deep Learning for ECG Segmentation
1 Introduction
2 Algorithm
2.1 Preprocessing
The neural network described below was trained on a dataset of ECG signals with a sampling frequency of 500 Hz and a duration of 10 s (see Sect. 3.1). In order to use this network for signals of a different frequency and/or a different
248 V. Moskalenko et al.
m = μT,   t_i = (2i − 1)T / (2m).
4. Using the cubic spline, find the signal values at the points t_i. The resulting array will be the input to the neural network.
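Step 4 above can be sketched as follows (a minimal sketch, assuming μ denotes the target sampling rate of 500 Hz and T the signal duration; the function name `resample_ecg` is ours):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_ecg(signal, fs, target_fs=500.0):
    """Resample a signal to target_fs by fitting a cubic spline and
    evaluating it at the midpoints t_i = (2i - 1) T / (2 m)."""
    T = len(signal) / fs            # duration in seconds
    m = int(round(target_fs * T))   # number of output samples, m = mu * T
    t_orig = np.arange(len(signal)) / fs
    spline = CubicSpline(t_orig, signal)
    i = np.arange(1, m + 1)
    t_i = (2 * i - 1) * T / (2 * m)
    return spline(t_i)
```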
The output size is (4, l). Each column of the output matrix contains 4 scores that characterize the degree of confidence of the neural network that the current signal value belongs to the P, QRS, or T segment, or to none of the above. The proposed neural network includes the following layers:
(i) 4 blocks, each of which includes two convolutional layers with batch normalization and the ReLU activation function; these blocks are connected sequentially with MaxPooling layers;
(ii) the output from the previous layer is fed through a MaxPooling layer to the input of another block containing two convolutional layers with batch normalization and the ReLU activation function;
(iii) the output from the previous layer is passed through deconvolution and zero padding layers, concatenated with the output of layer (ii), and fed to the input of a block that includes two convolutional layers, each with batch normalization and the ReLU activation function;
(iv) the output from the previous layer is sequentially fed through deconvolution and zero padding layers to the input of another 4 blocks containing two convolutional layers each with batch normalization and the ReLU activation function; each time the output is concatenated with the output from the corresponding layers of (i) in reverse order;
(v) the output from the previous layer is fed to the input of a final convolutional layer.
All convolutional layers have the following characteristics: kernel-size = 9, padding = 4. All deconvolution layers have kernel-size = 8, stride = 2, padding = 3. The last convolutional layer has kernel-size = 1.
The main differences between the proposed network and UNet are the following:
– we use 1d convolutions instead of 2d convolutions;
– we use a different number of channels and different parameters in the convolutions;
– we use copy + zero pad layers instead of copy + crop layers; as a result, in the proposed method the dimension of the output is the same as that of the input, whereas at the output of the UNet network we obtain a segmentation of only a part of the image.
2.3 Postprocessing
The output of the neural network is a matrix of size (4, l), where l is the input signal length. Applying the argmax function to the columns of the matrix, we obtain a vector of length l. An array of waves is then formed by finding all continuous segments with the same label.
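The argmax and run-finding steps can be sketched as follows (an illustrative sketch; the row order P, QRS, T, background is our assumption):

```python
import numpy as np

def segment_waves(scores):
    """scores: (4, l) network output, rows assumed to be P, QRS, T, none.
    Returns a list of (label, start, end) tuples for maximal runs of
    equal labels; end is exclusive."""
    labels = scores.argmax(axis=0)
    waves = []
    start = 0
    for k in range(1, len(labels) + 1):
        # close the current run at the end of the signal or on a label change
        if k == len(labels) or labels[k] != labels[start]:
            waves.append((int(labels[start]), start, k))
            start = k
    return waves
```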
For processing multi-lead ECGs (a typical number of leads is 12), we propose to process each lead independently and then average the resulting scores. As we will see below, such an analysis improves the quality of the prediction.
3 Experimental Results
3.1 LUDB Dataset
The training of the neural network and the experiments were performed on the extended LUDB dataset [5]. The dataset consists of 455 12-lead ECGs of 10 s duration recorded at a sampling rate of 500 Hz. For comparison of the algorithms, the dataset was divided into a train and a test set, where the test set consists of 200 ECG signals borrowed from the original LUDB dataset. Since the proposed neural network processes the leads independently, 255 × 12 = 3060 signals of length 500 × 10 = 5000 were used for training. To prevent overfitting, data augmentation was performed: at each batch iteration, a random continuous ECG fragment of 4 s was fed to the input of the neural network.
The LUDB dataset has the following feature: one (sometimes two) of the first and last cardiac cycles are not annotated. At the same time, the first and last marked segments are always QRS (see the example in Fig. 1). To implement a correct comparison with the reference segmentation, the following modifications were made in the algorithm:
– during augmentation, the first and last 2 s were not taken, i.e. subsequences of 4 s length were chosen starting between the 2nd and the 4th second (ending between the 6th and the 8th second);
– in order to avoid a large number of false positives, the first and the last cardiac cycles were removed during the validation of the algorithm.
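The augmentation constraint above can be sketched as follows (a sketch; the function name and the random-generator handling are our choices):

```python
import numpy as np

def augment_fragment(signal, fs=500, frag_s=4, lo_s=2, hi_s=4, rng=None):
    """Pick a random continuous 4-s fragment whose start lies between the
    2nd and 4th second, so the unannotated edge cycles are never used."""
    if rng is None:
        rng = np.random.default_rng()
    start = int(rng.uniform(lo_s, hi_s) * fs)
    return signal[start:start + frag_s * fs]
```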
Table 1 contains the results of the experiment and a comparison with one of the best wavelet-based segmentation algorithms [4] and with the neural network segmentation algorithm [11]. The last line shows the characteristics of our algorithm analysing the leads independently on a test set consisting of 200 × 12 = 2400 ECGs.
The quality of the algorithms is determined using the following procedure. According to the recommendations of the Association for the Advancement of Medical Instrumentation [1], an onset or an offset is considered to be detected correctly if its deviation from the doctor's annotation does not exceed the tolerance of 150 ms in absolute value.
If an algorithm correctly detects a significant point (an onset or an offset of one of the P, QRS, T segments), then a true positive (TP) is counted and the time deviation (error) of the automatically determined point from the manually marked point is measured. If there is no corresponding significant point in the test sample within ±tolerance of the detected significant point, then a type I error is counted (false positive, FP). If the algorithm does not detect a significant point, then a type II error is counted (false negative, FN).
Following [3,6,8,9], we measure the sensitivity Se = TP / (TP + FN) and the positive predictive value PPV = TP / (TP + FP). Here TP, FP, FN denote the total numbers of correct detections, type I errors, and type II errors, respectively. We also give the F1-measure: F1 = 2 · Se · PPV / (Se + PPV).
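The counting procedure and the metrics can be sketched as follows (a sketch; the greedy one-to-one matching is our assumption, since the text does not specify how multiple candidates within the tolerance are resolved):

```python
def match_points(detected, reference, tol):
    """Greedily match detected significant points to reference annotations
    within +/- tol; returns (TP, FP, FN) counts."""
    ref = sorted(reference)
    used = [False] * len(ref)
    tp = 0
    for d in sorted(detected):
        for j, r in enumerate(ref):
            if not used[j] and abs(d - r) <= tol:
                used[j] = True
                tp += 1
                break
    fp = len(detected) - tp   # detections with no reference point nearby
    fn = len(ref) - tp        # reference points that were missed
    return tp, fp, fn

def quality_metrics(tp, fp, fn):
    """Sensitivity Se = TP/(TP+FN), positive predictive value
    PPV = TP/(TP+FP), and F1 = 2*Se*PPV/(Se+PPV)."""
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return se, ppv, 2 * se * ppv / (se + ppv)
```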
The experiments show that the proposed algorithm confidently copes with noise of different frequencies. An example with low-frequency noise (breathing) is shown in Fig. 3, and an example with high-frequency noise in Fig. 4. An example of the segmentation of an ECG with a pathology (ventricular extrasystole) is shown in Fig. 5. An example of the segmentation of an ECG obtained from another type of ECG monitor is shown in Fig. 6; it is characterized by high T waves and a strong degree of smoothing. Figure 7 presents an example of the segmentation of an ECG with a frequency of 50 Hz, resampled to 500 Hz using a cubic spline.
The paper describes an algorithm based on a UNet-like neural network, which is capable of quickly and efficiently constructing the ECG segmentation. Our method uses a small number of parameters and generalizes well. In particular, it adapts to different sampling rates and generalizes to various types of ECG monitors. The proposed approach is superior to other state-of-the-art segmentation methods in terms of quality: the F1-measures for detecting the onsets and offsets of P and T waves and of QRS complexes are at least 97.8%, 99.5%, and 99.9%, respectively.
In the future, this can be used for diagnostic purposes. Using the segmentation, one can compute useful signal characteristics or feed the neural network output directly into a new network for automated diagnostics, in the hope of improving the quality of classification.
In addition, one can try to improve the algorithm itself. In particular, the loss function used in the proposed neural network probably does not fully reflect the quality of segmentation. For example, it does not take into account some features of the ECG (e.g. two adjacent QRS complexes cannot be too close to, or too far from, each other).
Acknowledgement. The authors are grateful to the referee for valuable suggestions
and comments. The work is supported by the Ministry of Education and Science of
Russian Federation (project 14.Y26.31.0022).
References
1. Association for the Advancement of Medical Instrumentation. NSI/AAMI
EC57:1998/(R)2008 (Revision of AAMI ECAR:1987) (1999)
2. De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)
3. Bote, J.M., Recas, J., Rincon, F., Atienza, D., Hermida, R.: A modular low-
complexity ECG delineation algorithm for real-time embedded systems. IEEE J.
Biomed. Health Inform. 22, 429–441 (2017)
4. Kalyakulina, A.I., Yusipov, I.I., Moskalenko, V.A., Nikolskiy, A.V., Kozlov,
A.A., Zolotykh, N.Y., Ivanchenko, M.V.: Finding morphology points of
electrocardiographic-signal waves using wavelet analysis. Radiophys. Quantum
Electron. 61(8–9), 689–703 (2019)
5. Kalyakulina, A.I., Yusipov, I.I., Moskalenko, V.A., Nikolskiy, A.V., Kozlov,
A.A., Kosonogov, K.A., Zolotykh, N.Yu., Ivanchenko, M.V.: LU electrocardio-
graphy database: a new open-access validation tool for delineation algorithms.
arXiv:1809.03393 (2018)
6. Di Marco, L.Y., Lorenzo, C.: A wavelet-based ECG delineation algorithm for 32-bit
integer online processing. Biomed. Eng. Online 10(1), 23 (2011)
7. Li, C., Zheng, C., Tai, C.: Detection of ECG characteristic points using wavelet
transforms. IEEE Trans. Biomed. Eng. 42(1), 21–28 (1995)
8. Martinez, A., Alcaraz, R., Rieta, J.J.: Automatic electrocardiogram delineator
based on the phasor transform of single lead recordings. In: Computing in Car-
diology, pp. 987–990. IEEE (2010)
9. Rincon, F., Recas, J., Khaled, N., Atienza, D.: Development and evaluation of
multilead wavelet-based ECG delineation algorithms for embedded wireless sensor
nodes. IEEE Trans. Inf. Technol. Biomed. 15, 854–863 (2011)
10. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedi-
cal image segmentation. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
11. Sereda, I., Alekseev, S., Koneva, A., Kataev, R., Osipov, G.: ECG segmentation by neural networks: errors and correction. arXiv:1812.10386 (2018)
Competitive Maximization of Neuronal
Activity in Convolutional Recurrent Spiking
Neural Networks
Abstract. Spiking neural networks (SNNs) are a promising approach for real-time solutions on specialized neurochip hardware. SNNs are believed to be highly energy and computationally efficient. We focus on developing local learning rules that are capable of providing both supervised and unsupervised learning. We suppose that each neuron in a biological neural network tends to maximize its activity in competition with other neurons. This principle was put at the basis of the SNN learning algorithm called FEELING. Here we introduce an efficient Convolutional Recurrent Spiking Neural Network architecture that uses the FEELING rules and provides better results than a fully connected SNN on the MNIST benchmark while having 55 times fewer learnable weight parameters.
1 Introduction
In this section we describe the process of data encoding, the FEELING learning rules, and the introduced Convolutional Recurrent Spiking Neural Network (CRSNN) architecture.
x_ij^inverse = max(x) − x_ij    (1)
The additional inverse image helps convey to the network information about regions that do not have high-intensity pixels in terms of the presence of spikes (instead of their absence, as for the original image). It also provides input signals that are automatically l1-normalized, which is essential for the convergence of the algorithm.
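Equation (1) can be sketched as a two-channel encoding (a minimal sketch):

```python
import numpy as np

def encode_with_inverse(x):
    """Stack an image with its inverse channel x_inv = max(x) - x (Eq. 1).
    The per-pixel sum of the two channels is the constant max(x), so the
    total input intensity is automatically normalized."""
    x_inv = x.max() - x
    return np.stack([x, x_inv])
```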
Here w_ij are the forward weights, w_kj the reciprocal weights, w_jj′ the inhibitory weights between classifying neurons, and w_kk′ the inhibitory weights between neurons in the spike-pooling layer. The weights in the convolutional layer are shared, so the final update for w_ij is averaged over the connections within each convolutional filter. In the equations above, δ(·) stands for the Dirac delta function, meaning that the weight updates occur only at the spikes of the corresponding neurons; α, β, γ, η are learning rate parameters, and the last terms serve as weight decay with time constant τ to forget inactive patterns.
3 Results
In this section we report the results of training the CRSNN with the FEELING learning rules. We compare our CRSNN to the RSNN proposed in [6]. We obtain better accuracy in both supervised and semi-supervised training regimes with 56 times fewer learnable parameters. We also analyze the learned convolutional filters and plot maximizing images [9] for the classifying neurons for both the normal and inverse input channels.
Fig. 2. Learning curves for supervised and semi-supervised modes. RSNN converges faster but
CRSNN provides better accuracy for both supervised and semi-supervised modes.
Fig. 3. Convolutional filters obtained after training the CRSNN with the FEELING learning rule. (A) The 25 filters on the left correspond to the first (normal) input channel. (B) The 25 filters on the right correspond to the second (inverse) input channel. Note that areas with high weight values in the first channel have small values of the corresponding weights in the second channel.
Fig. 4. Inhibitory weights obtained during training with the FEELING learning rule. These weights provide competition between the different convolutional networks in the CRSNN. Note that the final inhibitory weight matrix looks symmetric, as should naturally be expected.
Maximizing Images. In this work we also applied our original method of reconstructing maximizing images [9] to the convolutional spiking architecture. The main idea of this method is that for each classifying neuron we (1) find the image that provides the highest activation of this neuron, (2) compute the gradient of the activity of this neuron with respect to the input, (3) perform one step in the direction of the gradient, (4) iteratively repeat steps 2 and 3 for a fixed number of epochs, and (5) pass the result through a threshold filter to binarize the maximizing image. The resulting maximizing images are presented in Fig. 5 for both the normal and inverse input channels.
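Steps 2–5 of this procedure can be sketched with a toy differentiable neuron (a sketch; the gradient function, learning rate, clipping range, and threshold are illustrative assumptions):

```python
import numpy as np

def maximizing_image(grad_fn, x0, lr=0.1, epochs=100, thr=0.5):
    """Gradient-ascent reconstruction of a maximizing image: repeatedly
    step along the gradient of a neuron's activity w.r.t. the input,
    then binarize with a threshold filter."""
    x = x0.copy()
    for _ in range(epochs):
        x = x + lr * grad_fn(x)       # one step in the gradient direction
        x = np.clip(x, 0.0, 1.0)      # keep pixels in a valid range
    return (x > thr).astype(float)    # threshold filter
```

For a toy linear neuron with activity w·x, the gradient is simply w, and the reconstruction saturates the pixels with positive weights.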
Fig. 5. Reconstructed maximizing images of the output neurons. The first row corresponds to
the normal input channel, the second row corresponds to the inverse input channel.
To compare our results with those of [5], we trained a non-spiking linear classifier on top of the spike-pooling layer. We recorded the activities of the 25 pooling neurons, fed them to the input of a logistic regression classifier, and obtained an accuracy of 98.35%. Thus, we achieved the same accuracy with a much shallower architecture, using a convolutional feature extractor trained with FEELING instead of STDP.
Deeper architectures trained with a back-propagation technique (adapted to SNNs) [2] still outperform our results by 0.95% at best, while having approximately 3 times more trainable parameters than the proposed CRSNN.
262 D. Nekhaev and V. Demin
4 Conclusion
The introduced CRSNN is a lightweight architecture that can be trained with the FEELING rules using only 20,000 MNIST images and provides high accuracy. An important advantage of the FEELING rules, as well as of the STDP rules, is that they are local, i.e. they use only locally accessible data (the activities and weight values of interconnected neurons). This property is believed to be key for successful hardware realizations of learning algorithms in prospective high-performance and energy-efficient neuromorphic systems.
Acknowledgements. This work has been carried out using computing resources of the federal
collective usage center Complex for Simulation and Data Processing for Mega-science Facilities
at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/. Development of convolutional spiking
architecture and learning experiments has been supported by Russian Science Foundation grant
№. 17-71-20111, research and development of learning rules for spiking convolutional layers has
been supported by scientific grant of NRC “Kurchatov Institute” №. 1713.
References
1. Merolla, P.A., et al.: A million spiking-neuron integrated circuit with a scalable
communication network and interface. Science 345(6197), 668–673 (2014)
2. Lee, J.H., Delbruck, T., Pfeiffer, M.: Training deep spiking neural networks using
backpropagation. Front. Neurosci. 10, 508 (2016)
3. Bi, G., Poo, M.: Synaptic modifications in cultured hippocampal neurons: dependence on
spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18(24), 10464–
10472 (1998)
4. Diehl, P., Cook, M.: Unsupervised learning of digit recognition using spike-timing-
dependent plasticity. Front. Comput. Neurosci. 9, 99 (2015)
5. Kheradpisheh, S.R., Ganjtabesh, M., Thorpe, S.J., Masquelier, T.: STDP-based spiking deep convolutional neural networks for object recognition. Neural Netw. 99, 56–67 (2018)
6. Demin, V., Nekhaev, D.: Recurrent spiking neural network learning based on a competitive
maximization of neuronal activity. Front. Neuroinf. 12, 79 (2018)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Maass, W., Bishop, C.M.: Pulsed Neural Networks, p. 275. MIT Press, Massachusetts
(1999)
9. Nekhaev, D., Demin, V.: Visualization of maximizing images with deconvolutional
optimization method for neurons in deep neural networks. Procedia Comput. Sci. 119, 174–
181 (2017)
10. O’Connor, P., Neil, D., Liu, S., Delbruck, T., Pfeiffer, M.: Real-time classification and
sensor fusion with a spiking deep belief network. Front. Neurosci. 7, 178 (2013)
11. Bo, Z., et al.: Feedforward categorization on AER motion events using cortex-like features in
a spiking neural network. IEEE Trans. Neural Netw. Learn. Syst. 26, 1963–1978 (2015)
A Method of Choosing a Pre-trained
Convolutional Neural Network for Transfer
Learning in Image Classification Problems
1 Introduction
2 Problem Statement
accurate model. However, this approach has an obvious drawback: high computational complexity. Fine tuning of just one model can take up to several hours or even days on modern GPUs. At the same time, a single run of the models C_1, …, C_M on the sample D is a much less computationally expensive procedure.
In this paper we propose an approach based on estimating the separability of the features formed by the models C_1, …, C_M. It is assumed that the more separable the data observed at the output of a pre-trained CNN's convolutional part for some sample of images, the more accurate the CNN will be after fine tuning on this sample. This assumption is based on the fact that the fully connected layers located after the convolutional part tend to adapt the most during fine tuning, while the convolutional layers, particularly the earlier ones, adapt their weights much less or insignificantly [18]. In other words, the accuracy of the CNN after fine tuning is largely determined by the quality of the features formed by the pre-trained CNN's convolutional part.
Let D_m = {(z_m^(i), r^(i)), i = 1, …, n} be the labeled sample of CNN features, where the L_m-dimensional vector z_m^(i) = C_m(x^(i)) is obtained at the output of the CNN's convolutional part C_m, m = 1, …, M, as a result of its simulation on the image x^(i) from sample D. We estimate the separabilities c_1, …, c_M of the data in the samples D_1, …, D_M and select the model characterized by the highest separability. This model is assumed to be the most suitable for transfer learning, since a priori it has the most efficient features among the considered pre-trained CNNs for the given image classification problem.
A direct method of estimating data separability is to train some classifier on the data; the accuracy of the trained classifier on a test sample is then the measure of separability. The CNN's fully connected layers can be chosen as such a classifier. The drawbacks of this method are its dependence on the initial weights, the training method and its hyperparameters, its high computational cost, and possible overfitting. For this reason we use robust and fast indirect estimation methods.
Existing metrics for data separability are usually based on the assumption that the data of one class are spatially close while the classes themselves are far from each other, i.e. that the classes form clusters. Thus, some cluster indices (the Dunn index, the Davies–Bouldin index, etc.) are used as measures of class separability [19]. In practice, however, the assumption of class compactness can be violated.
We propose a “naive” method for assessing the quality of CNN features. Its idea is to assess the separability of each feature independently and to construct an overall separability index from the separabilities of the single features.
It is known that the binary separability of one-dimensional data is characterized by the ROC curve, and the ROC AUC can be used as a separability measure. The micro-averaged and macro-averaged ROC AUC are generalizations of the ROC AUC to multiclass data [20]. Since in practice these multiclass measures are usually very similar to each other, we use the macro-averaged ROC AUC as the simpler one to calculate.
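The per-feature macro-averaged ROC AUC can be sketched via the Mann–Whitney statistic (a sketch; taking max(AUC, 1 − AUC) per class makes the one-vs-rest measure direction-invariant for a single feature, which is our assumption rather than the paper's exact definition):

```python
import numpy as np

def binary_auc(scores, labels):
    """ROC AUC of a 1-D feature via the Mann-Whitney U statistic
    (ties counted as 0.5)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

def macro_auc(scores, classes):
    """Macro-averaged one-vs-rest ROC AUC of a single 1-D feature."""
    aucs = []
    for c in np.unique(classes):
        y = (classes == c).astype(int)
        a = binary_auc(scores, y)
        aucs.append(max(a, 1.0 - a))   # direction-invariant (our choice)
    return float(np.mean(aucs))
```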
266 A. G. Trofimov and A. A. Bogatyreva
Let z_jm^(1), …, z_jm^(n) be the sample obtained at the j-th output of model C_m, j = 1, …, L_m, m = 1, …, M, as a result of its simulation on the images x^(1), …, x^(n) from sample D. This sample is characterized by the macro-averaged ROC AUC a_jm calculated using the corresponding class labels r^(1), …, r^(n). Thus, the outputs of model C_m are characterized by a vector of macro-averaged ROC AUCs a_m = (a_1m, …, a_Lmm)^T, m = 1, …, M. The overall quality measure of model C_m's features is some function f of the vector a_m: c_m = f(a_m), m = 1, …, M.
It is argued that the model with the highest quality measure will be the most accurate after fine tuning on sample D. In order for this statement to be valid, it is necessary to find a transformation f that maximizes the correlation ρ between the model's quality measure and its accuracy after fine tuning:

ρ = corr((c_1, …, c_M), (p_1, …, p_M)) → max over f,    (1)

where p_m is the accuracy of the m-th CNN after fine tuning on the sample D, m = 1, …, M.
Problem (1) is a variational problem in the space of functions, and its exact solution can be very difficult. Instead, we compute several statistics of the elements of the vector a_m (in particular, the mean, variance, etc.) as the quality c_m and choose the statistic that provides the maximum correlation ρ. The statistics used in this paper are discussed in Sect. 4.
4 Experimental Results
[Figure: plot residue removed; the axes showed “AUC” and “Accuracy” values.]
Figure 2 shows scatter plots on the plane (c, p). Different statistical characteristics of the samples a_1m, …, a_Lmm, m = 1, …, M, were used to calculate the measures c_1, …, c_M. The highest correlation (ρ ≈ 0.5) is observed between the fine-tuned CNN's test accuracy and the maximum AUC of its features. In addition, there is an interesting relation between the accuracy and the AUC averaged over all CNN features: as the averaged AUC grows, the accuracy at first decreases and then begins to increase. At the same time, the networks with the minimum and maximum averaged AUCs (SqueezeNet and AlexNet, respectively) have almost the same classification accuracy after fine tuning (97.5%).
Fig. 2. Scatter plots on the plane (c, p). Different statistical characteristics of the samples a_1m, …, a_Lmm, m = 1, …, M, were used to calculate the measures c_1, …, c_M: mean (left), standard deviation (center), maximum value (right). Each point corresponds to a CNN.
c_m(q) = (1/q) Σ_{j=1}^{q} a_mj,   q ≤ L_m,   m = 1, …, M,    (2)

where the AUCs a_m1, …, a_mLm are sorted in descending order. The greatest AUC is max{a_m1, …, a_mLm} = c_m(1), m = 1, …, M.
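Formula (2) can be sketched numerically as:

```python
import numpy as np

def top_q_auc_mean(aucs, q):
    """Quality measure c_m(q): the mean of the q largest per-feature
    AUCs (Eq. 2); c_m(1) is simply the greatest AUC."""
    a = np.sort(np.asarray(aucs))[::-1]   # descending order
    return float(a[:q].mean())
```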
Figure 3 shows the dependence of the correlation coefficient ρ(q) = corr(c(q), p) on the number q of CNN features used in the averaging in (2).
Fig. 3. A plot of the correlation coefficient ρ(q) versus the number q of CNN features used in the averaging (left), and a scatter plot on the plane (c, p) for q = 100 (right).
The plot shows that the maximum correlation (ρ_max ≈ 0.74, p-value < 0.01) corresponds to a number of features q ≈ 100. This means that the accuracy of the fine-tuned network can be predicted more precisely from the AUC averaged over the 100 best features formed by the pre-trained CNN.
5 Conclusion
It is shown that the accuracy of a fine-tuned CNN on test images is strongly correlated with the AUC averaged over those features, formed by the CNN's convolutional part, that have the greatest discriminative capability. This makes it possible to predict the accuracy of a fine-tuned CNN on a new sample of images before carrying out the expensive fine-tuning procedure.
The proposed method can be used to make recommendations for researchers who want to apply a pre-trained CNN and transfer learning to their own classification problems but do not have sufficient computational resources for multiple fine tunings of the available free CNNs to choose the best one.
A possible direction for further research is the construction of more precise characteristics of CNN features to estimate their suitability for transfer learning, i.e. to predict more accurately the CNN error after fine tuning on a new sample of images. Another, more ambitious, direction is the development of a method for quickly assessing the suitability of a CNN for transfer learning based only on descriptors of the given sample of images, without the calculation and statistical analysis of the features formed by the pre-trained CNN.
270 A. G. Trofimov and A. A. Bogatyreva
The Usage of Grayscale or Color Images
for Facial Expression Recognition
with Deep Neural Networks
Abstract. The paper describes the usage of modern deep neural network archi-
tectures such as ResNet, DenseNet and Xception for the classification of facial
expressions on color and grayscale images. Each image may contain one of
eight facial expression categories: “Neutral”, “Happiness”, “Sadness”, “Sur-
prise”, “Fear”, “Disgust”, “Anger”, “Contempt”. AffectNet was used as the
dataset. The most accurate architecture is Xception. It gave a classification
accuracy of 97.65% on the training sample, 57.48% on the cleaned testing sample,
and a top-2 accuracy of 76.70% on the cleaned testing sample. The category
“Contempt” is recognized worst by all the types of neural networks considered,
which indicates its ambiguity and similarity to other types of facial expressions.
Experimental results show that for the considered task it does not matter whether
a color or grayscale image is fed to the input of the algorithm. This fact can save
a significant amount of memory when storing data sets and training neural
networks. The computing experiments were performed on a graphics processor
using NVIDIA CUDA technology with the Keras and TensorFlow deep learning
frameworks. They showed that the average processing time of one image varies
from 4 ms to 30 ms for the different architectures. The obtained results can be
used in software for neural network training for face recognition systems.
1 Introduction
Currently, significant progress has been made in creating efficient image recognition
algorithms based on deep neural networks [1–3]. As a rule, such algorithms require
a large number of images obtained in different lighting and noise conditions, and
they need huge amounts of memory both for storage and for training. There are
subject areas for which it is advisable to study the possibility of using grayscale
images instead of color ones when training recognition algorithms: this can reduce
the need for RAM or hard disk space by a factor of three.
One of these areas is the recognition of a person’s facial expression. The stan-
dard for determining the type of facial expression is the Emotional Facial Action
Coding System (EMFACS-7), proposed by Friesen and Ekman in 1983 [4]. This
generally accepted standard identifies seven basic types of emotions: (1) anger,
(2) contempt, (3) disgust, (4) fear, (5) happiness, (6) sadness, (7) surprise.
A neutral facial expression is considered additionally.
At the initial stage, methods for emotion recognition on human face images were
associated with manual selection of features: Gabor wavelets [5], local binary patterns
[6], geometric deformation features on image sequences [7], 3D surface features [8],
etc. Modern approaches are based on automatic generation of image features by
deep convolutional neural networks; some of them use a prior alignment technique
[9], while others recognize facial expressions on images as they are [1, 10], including
tuning of networks pre-trained on the task of face identification [11]. Deep spatial-
temporal networks have also been proposed for emotion recognition on video
sequences [12]. Deep learning is also actively used to analyze facial expressions
from a three-dimensional face model [13].
There are a number of commercial services that implement closed-source emotion
recognition methods: the Face API from Microsoft Azure [14], the Amazon Emotion
API [15], the Affectiva Emotion SDK [16], etc.
However, the recognition of facial expressions on images in complex conditions of
variable lighting, noise and unfavorable viewing angles is still an important topic for
further research.
For the study of approaches based on neural networks, there are many different
datasets that differ in shooting conditions, the variety of people photographed, and
the number of images per class. Some popular datasets and their features are listed
in Table 1:
– Cohn-Kanade AU-Coded Expression Database [17] (CK); the statistics are shown
for the case using the first two and last two frames of the image sequences from
the database,
– The Japanese Female Facial Expression Database [5] (JAFFE),
– Facial Expression Recognition Challenge [18] (FER2013),
– Facial Expressions Repository [19] (FE),
– SoF dataset [20] (SoF),
– AffectNet [21].
The largest of them is the AffectNet dataset (more than 1 million images in total). In
addition to manually labeled data, it contains automatically annotated images that
researchers or developers can label and check on their own if necessary.
This paper discusses the issue of facial expression recognition on static images
using modern deep learning methods, as well as the choice of the input data format.
On the one hand, color images provide additional information about a person’s face;
on the other hand, using grayscale images reduces the effect of shooting conditions:
light level, type of light source, etc. To choose one of these forms of image
representation, it is necessary to conduct experiments with various neural network
architectures and different sizes of input images.
The Usage of Grayscale or Color Images for FER 273
2 Task Formulation
In this paper we solve the task of determining one of eight facial expression
categories (“Neutral”, “Happiness”, “Sadness”, “Surprise”, “Fear”, “Disgust”, “Anger”,
“Contempt”) on grayscale or color images with cropped faces, see Fig. 1.
We took the largest modern open-source dataset, AffectNet, which contains
287,651 images as the training sample and 4,000 images (500 images per class) as
the testing sample [21]. The samples include images of different sizes, from 129 × 129
to 4706 × 4706 pixels, obtained from different cameras under different shooting
conditions.
Fig. 1. Examples of labeled images with facial expressions from the AffectNet dataset: 0 – Neutral,
1 – Happiness, 2 – Sadness, 3 – Surprise, 4 – Fear, 5 – Disgust, 6 – Anger, 7 – Contempt
274 D. A. Yudin et al.
To solve the task it is necessary to develop various variants of deep neural network
architectures and to test them on the available dataset with 1-channel (grayscale) and
3-channel (color) image representations. We must determine which image representation
is better suited to the task of facial expression recognition. We also need to select the
architecture that provides the best performance and the highest image classification
quality measures: accuracy, precision and recall [22].
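The per-class quality measures mentioned above can be computed directly from integer class labels; the following is a minimal NumPy sketch (the toy labels are illustrative, not from the dataset):

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, n_classes=8):
    """Precision and recall for each class from integer label vectors."""
    prec, rec = np.zeros(n_classes), np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        prec[c] = tp / (tp + fp) if tp + fp else 0.0
        rec[c] = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Toy example with 3 classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
prec, rec = per_class_precision_recall(y_true, y_pred, n_classes=3)
```

Overall accuracy is simply `np.mean(y_true == y_pred)`.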
3 Dataset Preparation
AffectNet [21], one of the largest modern datasets for facial expression recognition,
was chosen as the main dataset. However, it contains relatively few images for the
“Fear”, “Disgust” and “Contempt” categories compared to the other categories. To
conduct the neural network training experiments, image augmentation was carried
out and a balanced training sample was formed with 10,000 images per class.
For image augmentation we used 5 sequential steps:
1. Coarse dropout – setting rectangular areas within the image to zero. We generated
a dropout mask covering 2 to 25 percent of the image’s size; in that mask, 0 to
2 percent of all pixels were dropped (random per image).
2. Affine transformation – rotation of the image by a random angle from −15 to 15
degrees.
3. Flipping of the image about the vertical axis with probability 0.9.
4. Addition of Gaussian noise to the image, with the standard deviation of the normal
distribution drawn from 0 to 15.
5. Cropping away (cutting off) a random number of pixels on each side of the image,
from 0 to 10% of the image height/width.
The results of this augmentation procedure are shown in Fig. 2.
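The five augmentation steps can be sketched in plain NumPy as follows. This is a simplified illustration, not the paper's implementation (the library actually used is not stated in the text, and the rotation here uses crude nearest-neighbour resampling):

```python
import numpy as np

def augment(img, rng):
    """Apply the five augmentation steps described above (grayscale sketch)."""
    h, w = img.shape[:2]
    out = img.astype(np.float32).copy()
    # 1. Coarse dropout: zero out a few random rectangles (2-25% of image size).
    for _ in range(int(rng.integers(1, 4))):
        rh = max(1, int(h * rng.uniform(0.02, 0.25)))
        rw = max(1, int(w * rng.uniform(0.02, 0.25)))
        y = rng.integers(0, h - rh + 1); x = rng.integers(0, w - rw + 1)
        out[y:y + rh, x:x + rw] = 0.0
    # 2. Affine rotation by a random angle in [-15, 15] degrees
    #    (nearest-neighbour resampling, for brevity).
    a = np.deg2rad(rng.uniform(-15, 15)); c, s = np.cos(a), np.sin(a)
    yy, xx = np.mgrid[0:h, 0:w]
    yc, xc = yy - h / 2, xx - w / 2
    sy = np.clip((c * yc + s * xc + h / 2).round().astype(int), 0, h - 1)
    sx = np.clip((-s * yc + c * xc + w / 2).round().astype(int), 0, w - 1)
    out = out[sy, sx]
    # 3. Horizontal flip with probability 0.9.
    if rng.uniform() < 0.9:
        out = out[:, ::-1]
    # 4. Additive Gaussian noise with sigma drawn from [0, 15].
    out = out + rng.normal(0.0, rng.uniform(0, 15), out.shape)
    # 5. Crop a random 0-10% strip from each side.
    t, b = (int(h * rng.uniform(0, 0.1)) for _ in range(2))
    l, r = (int(w * rng.uniform(0, 0.1)) for _ in range(2))
    out = out[t:h - b, l:w - r]
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (120, 120), dtype=np.uint8)
aug = augment(img, rng)
```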
Fig. 3. Examples of wrong ground truth labels in the testing sample of the AffectNet dataset
To solve the formulated task, we investigate in this paper the application of deep con-
volutional neural networks of three architectures:
– The ResNetM architecture, inspired by ResNet [23] and implemented by the authors
in previous works [24]. It has an input tensor of 120 × 120 × 3 for color images and
120 × 120 × 1 for grayscale images. Its structure is shown in Fig. 4 and contains 3
convolutional blocks, 5 identity blocks, 2 max pooling layers, 1 average pooling
layer and one output dense layer. The first 11 layers and blocks provide automatic
feature extraction, and the last, fully connected layer allows us to find the image
class corresponding to the input image. The ResNetM net was trained on the full
Training sample 1.
– The Xception architecture [26] with the input tensor changed to 120 × 120 × 3 for
color images and 120 × 120 × 1 for grayscale images. This structure is a development
of Inception [27] and is based on separable convolution blocks (see Fig. 6).
The Xception net was trained on the balanced Training sample 2.
The output layer in all architectures has 8 neurons with the softmax activation function.
All input images are pre-scaled to a size of 60 × 60 pixels for the ResNetM architecture,
120 × 120 pixels for the Xception architecture and 224 × 224 pixels for DenseNet169.
The neural networks work with color (three-channel) and grayscale (one-channel) images.
To train the neural networks we used the categorical cross-entropy loss function and
stochastic gradient descent (SGD) as the training method, with a learning rate of 0.001.
Accuracy is used as the classification quality metric during training. The batch consists
of 5 images.
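The softmax output layer and the categorical cross-entropy loss used above can be written out explicitly; a small NumPy sketch (the logits and labels below are illustrative only):

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, as in the 8-neuron output layer."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=-1))

# One sample with 8 class logits; true class 0 ("Neutral").
logits = np.array([[2.0, 0.5, -1.0, 0.0, 0.1, -0.5, 0.3, -2.0]])
probs = softmax(logits)
y_true = np.zeros((1, 8)); y_true[0, 0] = 1.0
loss = categorical_crossentropy(y_true, probs)
```

SGD then updates every weight by the negative gradient of this loss scaled by the 0.001 learning rate.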
The training process of the deep neural networks is shown in Fig. 7. The training
experiment was carried out for 50 learning epochs using our software tool
implemented in the Python 3.5 programming language with the Keras + TensorFlow
frameworks [28]. We can see that the DenseNet and Xception networks have similar
speed and accuracy, while ResNetM achieves much lower accuracy on the test
samples compared to them.
The calculations were performed using NVIDIA CUDA technology on the graphics
processor of a GeForce GTX 1060 graphics card with 6.00 GB of memory, an Intel
Core i5-8300H central processor (4 cores at 2.3 GHz) and 24 GB of RAM.
Fig. 7. Training of deep neural networks with ResNetM, DenseNet and Xception architectures.
Table 3 shows the results of facial expression recognition on the training and test
samples with color or grayscale images using the ResNetM, DenseNet169 and
Xception architectures.
Analysis of the obtained results shows that the Xception architecture with grayscale
input images has the highest accuracy on all samples: 97.65% on the training sample,
57.48% on testing sample 2 and a top-2 accuracy of 76.70%. It also has the greatest
and most balanced values of precision and recall for almost all categories (classes)
of facial expression except “Anger” and “Contempt”.
ResNetM is significantly faster than all the other architectures: about 4 ms to process
a single image versus 12 ms for Xception and 30 ms for DenseNet. This architecture
also has the highest recognition recall for the “Happiness” category.
DenseNet surpasses all other architectures in “Anger” category recognition and has
better recognition recall for the “Fear” and “Contempt” categories. It also has the
highest precision for the “Neutral” category.
The category “Contempt” is poorly recognized by all the types of neural networks
considered, which speaks primarily of its ambiguity and similarity to other types of
facial expressions, in particular “Neutral”.
As for network size, the smallest amount of memory is occupied by the weights of
ResNetM (about 10.6 MB), and the largest by the weights of the Xception network
(83.8 MB).
For all the considered types of neural networks, representing the input images in
gray or color format did not lead to any significant difference in the values of
accuracy, top-2 accuracy, processing time per image, or number of weights.
Thus, it can be concluded that for the facial expression recognition task it does not
matter whether a color or grayscale image is fed to the algorithm. This fact can save
a significant amount of memory when storing datasets (about 65% of HDD space)
and training neural networks (about 67% of operative memory).
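The quoted savings follow directly from the 1:3 channel ratio of 8-bit grayscale versus color storage; a quick check (the 120 × 120 resolution is assumed here for illustration — the ratio is independent of resolution):

```python
import numpy as np

# Storage for the 287,651 training images as uint8 arrays.
n, h, w = 287_651, 120, 120
color_bytes = np.zeros((h, w, 3), dtype=np.uint8).nbytes * n  # 3-channel
gray_bytes = np.zeros((h, w, 1), dtype=np.uint8).nbytes * n   # 1-channel
saving = 1.0 - gray_bytes / color_bytes  # fraction of memory saved, ~2/3
```

In practice on-disk savings are somewhat lower than 2/3 (hence the ~65% figure) because compressed image formats do not scale linearly with channel count.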
5 Conclusions
It follows from Table 3 that the applied deep neural network architectures for facial
expression recognition on the AffectNet dataset show high quality indicators on the
training set but significantly worse results on the testing sample. This can be explained
by the ambiguity of certain emotions on a person’s face, the variety of shooting angles
and the presence of conflicting data in the training sample. The most accurate archi-
tecture is Xception. It gave a classification accuracy of 97.65% on the training sample,
57.48% on testing sample 2 and a top-2 accuracy of 76.70% on testing sample 2.
The category “Contempt” is recognized worst by all the types of neural networks
considered, which indicates its ambiguity and similarity to other types of facial
expressions.
Experimental results show that for the considered task it does not matter whether a
color or grayscale image is fed to the input of the algorithm. This fact can save a
significant amount of memory when storing data sets and training neural networks.
An important aspect for further application of the considered approaches is the
average classification time per image. It varies from 4 ms for ResNetM to 30 ms for
DenseNet. This suggests that all the described approaches can be integrated into
real-time face recognition software.
For further studies on the paper’s topic it is necessary to expand the training and test
samples to cover more images in the “Fear”, “Disgust” and “Contempt” categories. It
will also be promising to explore emotion recognition on images with faces aligned
by key points, in order to reduce the impact of the bounding box chosen by the face
detection algorithm.
Acknowledgment. The research was made possible by the Government of the Russian
Federation (Agreement No. 075-02-2019-967).
References
1. Zeng, N., Zhang, H., Song, B., Liu, W., Li, Y., Dobaie, A.M.: Facial expression recognition
via learning deep sparse autoencoders. Neurocomputing 273, 643–649 (2018)
2. Yudin, D., Knysh, A.: Vehicle recognition and its trajectory registration on the image
sequence using deep convolutional neural network. In: The International Conference on
Information and Digital Technologies, pp. 435–441 (2017)
3. Yudin, D., Naumov, A., Dolzhenko, A., Patrakova, E.: Software for roof defects recognition
on aerial photographs. J. Phys. Conf. Ser. 1015(3), 032152 (2018)
4. Friesen, W., Ekman, P.: EMFACS-7: emotional facial action coding system. Unpublished
Manuscript Univ. Calif. San Francisco 2(36), 1 (1983)
5. Lyons, M.J., Akemastu, S., Kamachi, M., Gyoba, J.: Coding facial expressions with gabor
wavelets. In: 3rd IEEE International Conference on Automatic Face and Gesture
Recognition, pp. 200–205 (1998)
6. Shan, C., Gong, S., McOwan, P.W.: Facial expression recognition based on local binary
patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
7. Kotsia, I., Pitas, I.: Facial expression recognition in image sequences using geometric
deformation features and support vector machines. IEEE Trans. Image Process. 16(1), 172–
187 (2006)
8. Wang, J., Yin, L., Wei, X., Sun, Y.: 3D facial expression recognition based on primitive
surface feature distribution. In: IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR 2006) (2006)
9. Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression
recognition with convolutional neural networks: coping with few data and the training
sample order. Pattern Recogn. 61, 610–628 (2017)
10. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition
using deep neural networks. In: IEEE Winter Conference on Applications of Computer
Vision (WACV) (2016)
11. Ding, H., Zhou, S.K., Chellappa, R.: FaceNet2ExpNet: regularizing a deep face recognition
net for expression recognition. In: 12th IEEE International Conference on Automatic Face &
Gesture Recognition (FG 2017) (2017)
12. Zhang, K., Huang, Y., Du, Y., Wang, L.: Facial expression recognition based on deep
evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017)
13. Zhang, T., Zheng, W., Cui, Z., Zong, Y., Yan, J., Yan, K.: A deep neural network-driven
feature learning method for multi-view facial expression recognition. IEEE Trans.
Multimedia 18(12), 2528–2536 (2016)
14. Face API from Microsoft Azure. https://azure.microsoft.com/ru-ru/services/cognitive-services/
face/#detection. Accessed 26 May 2019
15. Amazon Emotion API. https://docs.aws.amazon.com/rekognition/latest/dg/API_Emotion.
html. Accessed 26 May 2019
16. Affectiva Emotion SDK. https://www.affectiva.com/product/emotion-sdk/. Accessed 26
May 2019
17. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended
cohn-kanade dataset (CK+): a complete expression dataset for action unit and emotion-
specified expression. In: Proceedings of the Third International Workshop on CVPR for
Human Communicative Behavior Analysis (CVPR4HB 2010), pp. 94–101 (2010)
18. Carrier, P.-L., Courville, A.: Challenges in representation learning: facial expression
recognition challenge (2013). https://www.kaggle.com/c/challenges-in-representation-
learning-facial-expression-recognition-challenge/data. Accessed 26 May 2019
19. Facial expressions. A set of images for classifying facial expressions. https://github.com/
muxspace/facial_expressions. Accessed 26 May 2019
20. Afifi, M., Abdelhamed, A.: AFIF4: deep gender classification based on an AdaBoost-based
fusion of isolated facial features and foggy faces. J. Vis. Commun. Image Represent. 62, 77–
86 (2019)
21. Mollahosseini, A., Hasani, B., Mahoor, M.H.: AffectNet: a database for facial expression,
valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2017)
22. Olson, D.L., Delen, D.: Advanced Data Mining Techniques, 1st edn. Springer, Cham (2008)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image
recognition. CVPR 2016. arXiv:1512.03385 (2015)
24. Yudin, D., Kapustina, E.: Deep learning in vehicle pose recognition on two-dimensional
images. Adv. Intell. Syst. Comput. 874, 434–443 (2019)
25. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. CVPR 2017. arXiv:1608.06993 (2017)
26. Chollet, F.: Xception: deep learning with depthwise separable convolutions. CVPR 2017.
arXiv:1610.02357 (2017)
27. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception
architecture for computer vision. CVPR 2016. arXiv:1512.00567 (2016)
28. Chollet, F.: Keras: deep learning library for theano and tensorflow. https://keras.io/.
Accessed 26 May 2019
Applications of Neural Networks
Use of Wavelet Neural Networks to Solve
Inverse Problems in Spectroscopy
of Multi-component Solutions
1 Introduction
is usually too small (at best, several thousand patterns, with the dimension of the
input data on the order of hundreds). This means that when training the network, it is
almost inevitable that a local (rather than global) minimum of the error functional will
be found, even if a deep one, and the solution found will be only quasi-optimal.
A possible solution to this problem is a change of the decomposition basis such that
functions of both the same and of multiple different scales in the space of input
features are present in the basis from the start. Such a basis is provided by wavelet
neural networks (WNN) [4]. At the same time, there is reason to believe [5, 6] that in
the transition to the wavelet basis the shape of the error functional will change in such
a way that the local minima become deeper and approach the global minimum in
depth, which will decrease the average error of the approximate solution of the
desired inverse-function modeling problem. In this case, the network will be able to
work more efficiently with data that simultaneously includes spectral bands of
multiple different widths.
The classical approach to the formation and training of WNN has already been
worked out in detail, since historically it appeared earlier. In particular, error
backpropagation by the method of stochastic gradient descent (SGD) has been used
for training [7], as well as its combinations with the least squares method [8], the
Kalman filter [9] and genetic algorithms (GA) [10]. In addition to a comparative
analysis of GA and SGD backpropagation, it is interesting to consider the new
popular optimization methods Adam [11] and AdaGrad [12]. This analysis should
be carried out for WNN with linear and nonlinear activation functions in the output
layer, as well as with various families of wavelets.
The inverse problem (IP) considered in this study is the determination of the types
and concentrations of components in a multi-component water solution of inorganic
salts by Raman spectra.
The principal possibility of solving this problem is due to the fact that the bands of
the Raman spectrum of an aqueous solution are very sensitive to the presence of
dissolved salts/ions in it (Fig. 1). Complex ions (SO42− – sulphates, NO3− – nitrates,
CO32− – carbonates, etc.) have their own Raman lines in the region of 500–1500 cm−1
(the area of the so-called “fingerprints”), which makes it possible to uniquely determine
the type of ion and its concentration. The presence of simple ions that do not have
their own Raman lines also manifests itself in the Raman spectra of aqueous
solutions. Ions such as Cl−, I−, Br−, Na+, K+, etc. affect the shape and position of the
most intense band of the spectrum – the band of stretching vibrations of water mole-
cules in the region 3000–4000 cm−1 [13–15]. Different ions have different effects. In
addition, the behavior of the quantitative characteristics of the Raman spectral bands
of water depends significantly on the state of the solutes: the presence of associates,
contact and non-contact ion pairs, etc. in the solution also appears in its spectrum.
This problem is inherently a complex IP. The nature of the task involves deter-
mining the concentrations of a large set of simultaneously dissolved substances in a
wide range of concentrations – from tenths of a mole per liter to several moles per
liter. Such tasks are
Use of Wavelet Neural Networks to Solve Inverse Problems 287
Fig. 1. Raman spectra of distilled water and of multi-component water solutions of inorganic
salts (left – “fingerprint” area, right – Raman valence band of water). 1 – distilled water, 2 –
KNO3 – 0.6M, Li2SO4 – 0.75 M; 3 – NaCl-0.5 M, NH4Br – 1.75 M, CsI – 0.25 M; 4 – NaCl –
0.2 M, NH4Br – 0.2 M, Li2SO4 – 0.4 M, KNO3 – 1 M, CsI – 0.6 M.
relevant in the diagnosis of wastewater and process water, mineral water, sea and river
reservoirs. It is obvious that the components of the solution interact both with the
solvent molecules and with each other, and these interactions have a complex
nonlinear nature. The formation of associates, ion pairs, etc. is also possible. As a
result, it is impossible to create a model that adequately describes the molecular
interactions in the solution. In addition, it should be borne in mind that the information
content of different spectral channels varies. The spectral regions in which the lines of
complex ions and the valence band of water are located are obviously the ones most
sensitive to the type and concentration of dissolved substances. The area of the
deformation band of water (1600–1700 cm−1) and the area of the associative band
(2000–2400 cm−1) are much less informative. The presence of the above factors leads
to the fact that the dependence of the signal intensity in different spectral channels on
the concentration of solutes is significantly nonlinear. The situation is complicated by
the fact that the spectral bands to be analyzed simultaneously differ significantly
from each other both in intensity (for example, the valence band of water is about
100 times more intense than the deformation band) and in width (for example, the
width of the lines of nitrate anions is a few cm−1, while the width of the valence band
of water at half-height is about 500 cm−1). In addition, the specificity of spectroscopic
methods from the data processing point of view is that it implies solving inverse
problems to extract the necessary information from high-dimensional data, since the
recorded spectra contain thousands of channels.
The authors’ previous experience showed that such multi-parameter IPs are quite
effectively solved with the help of an MLP. The developed methods, together with a
number of methods for reducing the dimension of the input data, allowed simultaneous
determination of the concentrations of 5 salts in water – NaCl, NH4Br, Li2SO4, KNO3,
288 A. Efitorov et al.
First we present the results of solving the 10-ion problem with the partial least squares
(projection to latent structures, PLS) method [17] and with the MLP. The dataset was
randomly divided into training, validation and test sets in the ratio 70:20:10.
PLS and the MLP were applied both to the initial data and to data processed by
various compression methods. Data compression is used to reduce the dimension of
the input data. In inverse problems of spectroscopy, the spectra contain thousands of
channels, which makes any approximation method tend to overtrain. At the same time,
it is clear that not all spectral channels are equally informative. Very often, reducing
the input dimension increases the accuracy of the solution. In this case, only the
most informative input features remain, and the construction of the PLS model or the
training of the MLP is carried out on patterns with a smaller number of input features
extracted by some algorithm.
The input data can be compressed in different ways. The simplest method is
aggregation of spectral channels, which consists in summing the intensities over some
number of neighboring channels and averaging over these channels. In this study, in
addition to channel aggregation, input data compression using the discrete and con-
tinuous wavelet transforms (DWT and CWT) was used. In this case, the initial spectrum
is considered as a scale space with the best resolution, and for some given basis of
orthogonal functions there is a set of subspaces with less detail. Calculations of the
DWT were carried out in the R language using the wavethresh library: Wavelet
Statistics and Transforms [18]. Wavelets of the Daubechies 10 family [19] were
used. The CWT was calculated using our own code implementation in the Python
language, supporting parallel computations on the GPU through library functions of
the tensorflow library [20]. Computational experiments with MLP training were carried
out in the Python language on the basis of the machine learning libraries scikit-learn
[21] and tensorflow.
Construction of the PLS model was stopped when convergence was achieved on
the training set. The results of applying the PLS method are shown in Fig. 2.
As algorithms for compression of the input data, we used aggregation over 8 adjacent
input features, DWT at the 4th, 5th, 6th and 7th levels, and CWT with convolution
widths of 8, 16, 32 and 64 channels.
Fig. 2. Application of the PLS method to data with different compression of input features:
mean absolute error on the test dataset. The methods used are: Aggr – aggregation, DWT –
discrete wavelet transform, CWT – continuous wavelet transform; the number of input features is
separated by a space.
As can be seen, some methods of input data compression improve the result of the
PLS method on the 10-ion IP compared with using the initial data. In the case of
DWT, the best result is achieved using level 5 (32 approximation and 32 detail
coefficients). Aggregation over 8 features provides a result better than DWT. The
best result is achieved when using the CWT with a window 16 channels wide
(190 input features). On average, the best accuracy of determination of salt
concentration is 0.034 M.
When solving the 10-ion IP using the MLP, a perceptron with two hidden layers
(120 neurons in the first hidden layer and 60 in the second) was used. Each network
was trained 5 times with different initial weights, and the results of all 5 networks
were averaged. The results of applying the ANN method are shown in Fig. 3. The
initial data were the same as for the PLS method.
Fig. 3. Application of the MLP to data with different compression of input features: mean
absolute error on the test dataset. The legend is the same as in Fig. 2.
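The perceptron configuration described above (two hidden layers of 120 and 60 neurons) can be sketched with scikit-learn, which the paper reports using. The data below are synthetic stand-ins; the real inputs would be compressed spectra and the targets salt concentrations:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 94))   # e.g. 94 CWT-compressed input features
W = rng.normal(size=(94, 10))
y = np.tanh(X @ W)               # synthetic stand-in for 10 ion concentrations

# Two hidden layers: 120 neurons, then 60, as in the paper's setup.
model = MLPRegressor(hidden_layer_sizes=(120, 60), max_iter=500, random_state=0)
model.fit(X, y)
pred = model.predict(X)
```

In the paper each such network was trained 5 times from different initial weights and the outputs averaged; here a single fit is shown for brevity.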
The results of applying the MLP without compression of the input data are worse
than the results obtained using DWT. In this case, DWT gives a greater error than
aggregation, and the best result is provided by the CWT. The smallest error is achieved
using the CWT with a window 32 channels wide (94 input features). On average, the
mean absolute error of determination of salt concentration is 0.023 M.
Of the three methods of informative feature extraction considered above, the CWT
method is the best: it demonstrates the lowest values of the mean absolute error on
the test dataset both with the MLP and with PLS. The MLP shows significantly better
results than PLS, indicating a significant nonlinearity of the problem.
First, an implementation of the classical WNN scheme was created on the basis of the
Python programming language and a number of its libraries. This initial
implementation was plain object-oriented Python code. It made it possible to work
through all the computational operations performed during training and application of
the WNN, and to observe the evolution of the parameters throughout training.
The following problems were identified: saturation of the weights and, for the shift
parameter, drift of their values outside the domain of definition of the wavelet
functions during training. In combination with the multiplication procedure inside the
wavelon, this leads to the wavelon producing a zero value both in the forward pass and
in the backward propagation of the error. This makes further adjustment of the weights
by the gradient method impossible.
The second parameter whose values also need to be artificially limited is the scale
parameter. If the value of this parameter is very large, the domain of definition of the
function again suffers, effectively producing delta-function behavior, with negative
consequences similar to those mentioned above. At the same time, these parameters
are interrelated, so simply establishing hard constraints on their values limits the
domain too much and often does not allow finding optimal solutions by the method of
stochastic gradient descent (SGD).
The main way to deal with these problems was the use of special, effective approaches
to setting the initial values of the weights (parameters) of the WNN.
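The failure mode described above can be illustrated with a toy wavelon (a minimal numpy sketch under assumptions: a "Mexican hat" mother wavelet and illustrative clipping bounds; the authors' actual network and limits may differ):

```python
import numpy as np

def mexican_hat(t):
    """'Mexican hat' mother wavelet (second derivative of a Gaussian)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

class Wavelon:
    """Toy wavelon: a product of 1-D wavelets over the input features.

    The shift (b) and scale (a) parameters are clipped so that inputs stay
    inside the effective support of the wavelet; otherwise the product
    collapses to zero in both the forward and the backward pass.
    """

    def __init__(self, n_inputs, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.b = rng.uniform(-1.0, 1.0, n_inputs)  # shift parameters
        self.a = rng.uniform(0.5, 2.0, n_inputs)   # scale parameters

    def constrain(self, b_max=4.0, a_min=0.1, a_max=10.0):
        """Hard limits on shift and scale (illustrative bounds)."""
        np.clip(self.b, -b_max, b_max, out=self.b)
        np.clip(self.a, a_min, a_max, out=self.a)

    def forward(self, x):
        # A single vanishing factor zeroes the whole product, which is
        # exactly the failure mode described in the text.
        return np.prod(mexican_hat((x - self.b) / self.a))
```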
In this study, new gradient descent algorithms, Adam and Adadelta, were tested for
training the WNN, in comparison with the classical SGD. As expected, SGD
demonstrated slower convergence and a high degree of dependence on the
initialization of the weights, and, with unfortunate initializations, the problems
described above: going beyond the domain of definition of the wavelet functions and
the need to interrupt training. However, when SGD was run many times, it was usually
possible to obtain a model comparable in properties to one trained by the Adam
algorithm. The Adadelta method did not yield the best solutions; note, however, that it
was often inclined toward large effective learning rates, and it may require a more
thorough search for the optimal parameters of the training algorithm. Note that across
the several problems tested, with three dimensionality reduction methods for each,
SGD surpassed Adam in only one scenario. In all other cases, the WNN trained by
Adam showed the best results.
Therefore, the most effective approach has been the combination of setting limits on
the values taken by the parameters and using the Adam method for training.
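For reference, the Adam update rule tested here follows Kingma and Ba [11]; the sketch below combines one such update with the parameter clipping discussed above (numpy; the learning rate and clipping bounds are illustrative, not the values used in the study):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Return updated (theta, m, v); m and v are running moment estimates,
    t is the 1-based step counter used for bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def project(theta, lo=-4.0, hi=4.0):
    """Clip parameters back to their admissible range after each step."""
    return np.clip(theta, lo, hi)
```

In this scheme every Adam step is followed by a projection, so the shift and scale parameters never leave the wavelet's domain of definition.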
The next stage was the implementation of training and application of the WNN on
the basis of the TensorFlow high-performance machine learning library, which
allowed the use of multithreaded calculations on CPU and GPU, greatly reducing
computation time. Control scripts written for a heterogeneous computing cluster made
it possible to run calculations simultaneously on more than 150 processor cores and to
manage all data storage and processing procedures from the cluster control terminal.
Finally, the results of solving 5 salts and 10 ions IPs using the classical WNN were
compared with the results obtained using the classical MLP and the PLS method. The
comparison is presented in Fig. 4 (5 salts) and Fig. 5 (10 ions).
Fig. 4. Comparison of the results of solving the 5 salts problem by the algorithms of WNN,
MLP and PLS on the initial data and after compression by PCA, DWT and CWT methods with
the best parameters (Figs. 2, 3). The configuration optimal for WNN in all cases was: 32
wavelons, Adam.
Fig. 5. Comparison of the results of solving the 10 ions problem by the algorithms of WNN,
MLP and PLS on the initial data and after compression by PCA, DWT and CWT methods with
the best parameters (Figs. 2, 3). The optimal configurations for WNN were the following: 32
wavelons, SGD (PCA); 16 wavelons, Adam (DWT); 32 wavelons, Adam (CWT).
Use of Wavelet Neural Networks to Solve Inverse Problems 293
On the basis of the performed experiments, it can be concluded that at this stage
WNN occupies an intermediate position between MLP and PLS, in some scenarios
even surpassing the result of MLP. This result can be considered partly successful,
since there are clear directions for improving the WNN training technology.
As mentioned above, the WNN has difficulties working with high-dimensional data.
For this reason, although the obtained results were somewhat worse than expected,
they also showed good potential and prospects for WNN. At the same time, the
problem of working with high-dimensional data remains pressing and requires further
study.
Finally, this study has confirmed the results of our preceding studies regarding the
comparison of the two IPs. The 10 ions IP is much more complex and non-linear,
requiring the maximum of available information to achieve the best results. Therefore,
feature selection worsens the result for the 10 ions IP in all cases, and MLP turns out to
be the ML algorithm providing the best results for any number of input features.
5 Conclusions
In this study, we considered use of wavelet neural networks to solve the inverse
problems of determination of the composition of multi-component solutions of inor-
ganic salts by the method of Raman spectroscopy combined with machine learning.
The results of WNN were compared to the results demonstrated by multi-layer per-
ceptrons and by the method of partial least squares (projection to latent structures).
As WNN is very sensitive to the number of input features, the solution of the
studied problems was preceded by feature extraction. The best result among the
feature extraction methods was demonstrated by the continuous wavelet transform.
At the present stage of research, WNN usually performs better than the linear PLS
algorithm, but worse than an MLP; however, it presents several difficulties for
efficient training. Directions of possible improvement of the WNN training algorithm
have been formulated.
Acknowledgement. This study has been performed with financial support from Russian
Foundation for Basic Research, projects 17-07-01479 and 19-01-00738.
References
1. Burikov, S.A., Dolenko, S.A., Dolenko, T.A., Persiantsev, I.G.: Application of artificial
neural networks to solve problems of identification and determination of concentration of
salts in multi-component water solutions by Raman spectra. Opt. Mem. Neural Netw. (Inf.
Opt.) 19(2), 140–148 (2010)
2. Dolenko, S.A., Burikov, S.A., Dolenko, T.A., Persiantsev, I.G.: Adaptive methods for
solving inverse problems in laser Raman spectroscopy of multi-component solutions. Pattern
Recogn. Image Anal. 22(4), 551–558 (2012)
3. Efitorov, A., Dolenko, T., Burikov, S., Laptinskiy, K., Dolenko, S.: Neural network solution
of an inverse problem in Raman spectroscopy of multi-component solutions of organic salts.
In: Samsonovich, A.V. et al. (eds.) FIERCES 2016, Advances in Intelligent Systems and
Computing, vol. 449, pp. 273–279. Springer, Heidelberg (2016)
4. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 3(6), 889–898 (1992)
5. Li, S., Chen, S.: Function approximation using robust wavelet neural networks. In: 14th
IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2002),
Proceedings, Washington, DC, USA, pp. 483–488 (2002)
6. Bellil, W., Ben Amar, C., Alimi, A.: Comparison between beta wavelet neural networks,
RBF neural networks and polynomial approximation for 1D, 2D functions approximation.
Int. J. Appl. Sci. Eng. Technol. 13, 33–37 (2006)
7. Zhang, J., Walter, G., Miao, Y.: Wavelet neural networks for function learning. IEEE Trans.
Signal Process. 43(6), 1485–1496 (1995)
8. Zhang, Q.: Using wavelet network in nonparameters estimation. IEEE Trans. Neural Netw.
8, 227–236 (1997)
9. Sui, Q., Gao, Y.: A stepwise updating algorithm for multiresolution wavelet neural networks.
In: International Conference on Wavelet Analysis and its Applications (WAA), Proceedings,
Chongqing, China, pp. 633–638 (2003)
10. Lim, C.G., Kim, K., Kim, E.: Modeling for an adaptive wavelet network parameter learning
using genetic algorithms. In: Fifteenth IASTED International Conference on Modeling and
Simulation, Proceedings, California, USA, pp. 55–59 (2004)
11. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. arXiv (2015). https://arxiv.org/pdf/1412.6980v8.pdf. Accessed 09 June 2019
12. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and
stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
13. Rull, F., De Saja, J.A.: Effect of electrolyte concentration on the Raman spectra of water in
aqueous solutions. J. Raman Spectrosc. 17(2), 167–172 (1986)
14. Dolenko, T.A., Churina, I.V., et al.: Valence band of liquid water Raman scattering: some
peculiarities and applications in the diagnostics of water media. J. Raman Spectrosc. 31,
863–870 (2000)
15. Burikov, S.A., Dolenko, T.A., Velikotnyi, P.A., Sugonyaev, A.V., Fadeev, V.V.: The effect
of hydration of ions of inorganic salts on the shape of the Raman stretching band of water.
Opt. Spectrosc. 98(2), 235–239 (2005)
16. Efitorov, A., Dolenko, T., Burikov, S., Laptinskiy, K., Dolenko, S.: Solution of an inverse
problem in Raman spectroscopy of multi-component solutions of inorganic salts by artificial
neural networks. In: Villa, A.E.P. et al. (eds.) ICANN 2016, Part II, LNCS, vol. 9887,
pp. 355–362. Springer, Heidelberg (2016)
17. Esbensen, K.H.: Multivariate Data Analysis—In Practice, An Introduction to Multivariate
Data Analysis and Experimental Design, 5th edn. CAMO Software AS, US (2006)
18. Wavelet Statistics and Transforms. https://cran.r-project.org/package=wavethresh. Accessed
09 June 2019
19. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Pennsylvania (1992)
20. TensorFlow™: An open source machine learning framework for everyone. https://www.tensorflow.org/. Accessed 09 June 2019
21. scikit-learn: Machine Learning in Python. http://scikit-learn.org/stable/index.html
Automated Determination of Forest-Vegetation
Characteristics with the Use of a Neural
Network of Deep Learning
1 Introduction
Using this approach, it is possible to improve already working solutions and to
develop more effective ways of solving existing problems in various fields. Based on
this, the article considers a method for determining the species composition, planting
stock, and other coefficients of forest plantations with the use of machine learning
methods based on lidar data.
This direction is extremely important, since the use of unmanned vehicles in forestry
is becoming ever more popular today. This is primarily because an unmanned aerial
vehicle (UAV) with a LiDAR system can make long-distance flights to photograph
hard-to-reach forest areas, monitor large areas, and obtain data on the characteristics
of forest stands in a short time. By analyzing the data obtained with various methods,
including machine learning methods, it is possible to assess the dynamics of the
development of the forest fund in the studied area.
There are a number of studies and ready-made solutions that solve similar prob-
lems. For example, the Finnish company Arbonaut Oy Ltd [7] specializes in devel-
oping solutions for geographic information systems and processing data of remote
sensing for different areas. They use LiDAR systems for forest inventory with the
method based on sparse Bayesian regression for modeling forest characteristics [8].
This method is superior in accuracy to the traditional inventory methods based on field
measurements. However, it should be noted that forest plantations in Finland have a
fairly strict order and a homogeneous structure, and the variety of species is not large.
Such plantings are easier to analyze, unlike forest plantations in the Russian Federation,
where the order and structure of forests is more chaotic, and the diversity of species is
much greater [9, 10]. Therefore, the solution proposed by Arbonaut Oy Ltd is not
suitable for the inventory of forest plantations in the Russian Federation.
The task of forest inventory is extremely relevant and requires speedy resolution,
taking into account the development of big data technologies, artificial intelligence,
and robotics, as well as the complex digital transformation of the economy and social
sphere of the Russian Federation by 2024.
We analyzed the known approaches and methods for forest inventory and their
implementations and noted the absence of ready-made solutions of acceptable quality
[11]. In this regard, to solve this problem, we need to develop our own method based
on UAV survey data, LiDAR systems, and the use of deep learning implemented by a
neural network as one of the methods of machine learning. For this, we propose the
following algorithm:
1. Combine LiDAR data (Fig. 1a) and «dense cloud» data (Fig. 1b) [12], which is a
kind of terrain plan on a precise geodetic basis:

A ∪ B = C, (1)

where A = {(a11, a12, ..., a1m), ..., (an1, an2, ..., anm)} is the LiDAR data set,
B = {(b11, b12, ..., b1k), ..., (bp1, bp2, ..., bpk)} is the «dense cloud» data set, and
C = {(c11, c12, ..., c1r), ..., (c(n+p)1, c(n+p)2, ..., c(n+p)r)} is the combined data set;
aij, bij and cij are attributes of points of a three-dimensional scene according to the
data specification of each type of survey, including positional values; n is the number
of points in the set A, p is the number of points in the set B; m, k and r are the numbers
of point attributes.
Automated Determination of Forest-Vegetation Characteristics 297
Fig. 1. Survey data: a - lidar survey of a strip of forest, b - forest «dense cloud»
This operation is performed in the ArcMap software [13] by spatial reference.
ArcMap allows you to create, view, edit, and publish maps. When using the spatial
reference function, it is necessary to find clearly expressed objects on the image -
crown tops. The result of this alignment will be data sets containing point clouds. In
addition to the positional values x, y and z, the system also stores additional infor-
mation. The following attributes are recorded and saved for each laser pulse of the
LiDAR system: intensity, reflection number, number of reflected signals, point clas-
sification values, extreme points of the flight line, RGB values, time, GPS, scan angle
and scan direction. A detailed description of each of the attributes can be found in the
specifications of the lidar data given in [14].
2. Segmentation of tree crowns in the combined images. Crown segmentation means
that each point in the picture must be assigned to a particular tree, provided the
point indeed belongs to a tree, since there may be other objects in the pictures.
Thus, it is necessary to solve the segmentation problem:
298 D. A. Eroshenkova et al.
F(C) = {(c11, c12, ..., c1r, l1), ..., (c(n+p)1, c(n+p)2, ..., c(n+p)r, l(n+p))},

where F is the segmentation function, C is the result of operation (1), cij are point
attributes, and li is the label indicating to which tree a point belongs.
To solve this problem, we carried out a review of existing methods of 3D
segmentation [15], based on the results of which we propose to use the PointNet
convolutional neural network (CNN) [16].
3. After segmentation of tree crowns, it is necessary to find the diameter of the crowns.
Calculating the diameter of crowns by the known dependencies [17], it is possible
to determine the diameter of the stem. This parameter is important when analyzing
tree stands of the studied area.
4. Summarizing the results of items 1-3, one can determine the characteristics of
forest plantations, such as the predominant species, the tree species in the studied
area, the height of tree stands, the crown diameter, and the stem diameter. The
values of these parameters can be used to calculate the fullness and the stock of
plantings in a given territory. The result of the work will be a forest plantation map
with a database attached to it.
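Steps 1 and 2 of the algorithm can be outlined as follows (a hypothetical numpy sketch: the attribute-padding scheme and the `segment_fn` placeholder for a trained PointNet-style model are assumptions, not the authors' implementation):

```python
import numpy as np

def merge_clouds(lidar_pts, dense_pts):
    """Step 1: C = A ∪ B. Both arrays are (N, attrs) with x, y, z first;
    attributes missing in one specification are padded with NaN so the
    two point sets can coexist in a single array."""
    r = max(lidar_pts.shape[1], dense_pts.shape[1])

    def pad(pts):
        filler = np.full((pts.shape[0], r - pts.shape[1]), np.nan)
        return np.hstack([pts, filler])

    return np.vstack([pad(lidar_pts), pad(dense_pts)])

def label_points(cloud, segment_fn):
    """Step 2: F(C) appends a tree label l_i to every point; `segment_fn`
    stands in for a trained PointNet-style segmentation model."""
    labels = segment_fn(cloud[:, :3])  # one label per point
    return np.hstack([cloud, labels[:, None]])
```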
The advantage of using LiDAR in the task of determining the species composition,
stock coefficient, and other characteristics of forest plantations is that the obtained
data give the correct height of plantations, which can be used in further analysis. The
drawbacks of the LiDAR data are sparse measurements (about 30 points/m2) and the
absence of a color scale (it is impossible to visually distinguish forestland species).
The «dense cloud» data is different. Its advantages are the high resolution of the
system, i.e., dense spacing of points (up to 1000 points/m2), RGB images, and the
presence of an infrared channel, which is used for additional studies of forests. But its
limitation is inaccurate measurement of the heights of forest plantations.
When combining LiDAR and «dense cloud» data we get:
– a correct coefficient of the height of forest stands;
– dense spacing of points;
– the presence of an infrared channel.
We should note that it is difficult to combine two scenes into one without common
points. Because of the different survey positions, the viewing angles of the points,
their slant ranges, and other parameters differ. One possible solution is to shoot from a
UAV equipped with two LiDAR systems. Knowing the fixed distance between the
cameras, we obtain the difference in the locations of the points of the two scenes
relative to each other. Taking this distance into account, it is possible to combine the
two scenes into one.
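Under the assumption of a known, fixed baseline between the two sensors, this combination step can be sketched as (illustrative numpy code; the baseline vector is a placeholder, not a measured offset):

```python
import numpy as np

def align_clouds(cloud_a, cloud_b, baseline=(0.5, 0.0, 0.0)):
    """Translate cloud_b (x, y, z points) by the known sensor offset and
    stack both clouds into a single scene."""
    shifted = cloud_b + np.asarray(baseline)
    return np.vstack([cloud_a, shifted])
```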
An important point in the work is the use of the CNN PointNet [16]. PointNet is a
unified deep learning network architecture that learns both global and local point
features.
4 Conclusion
The article proposes a method for the automated determination of the species com-
position, stock coefficient and other characteristics of forest plantations. It includes
shooting from a UAV with a LiDAR system installed on it and using PointNet deep
learning neural network to process the received data. We described the working
procedure and the algorithm for solving this problem. The article contains the
description and the experimental results obtained by the authors of PointNet. The
method proposed in the article is shown to be promising, but it poses a difficult
scientific and technical challenge, since the main difficulties in analyzing the obtained
data are caused by the following problems:
1. Incompleteness and possible distortions of information about objects of interest due
to the different types of surveys - lidar and «dense cloud»;
2. The lack of a correct method for combining data from a lidar survey and a
«dense cloud» survey in one scene;
3. The impossibility of training the neural network without a sufficient number of
labeled data sets.
Based on this, at the next stage of work it is planned to create a labeled data set
representing point clouds of trees. Then the data are to be split into sets of HDF5 files
and used to train the CNN PointNet. Depending on the learning results obtained, the
network may be modified so as to achieve the required accuracy of tree crown
segmentation. After solving this problem, it
is necessary to consistently solve the problem of determining the diameters of crowns
and stems, as well as other related parameters that are used in the analysis of forest
stands of the studied area. The result of this work is a map of forest plantations of the
region with a base of the main characteristics of forest stands attached to it.
References
1. Weitkamp, C. (ed.): Lidar: Range-Resolved Optical Remote Sensing of the Atmosphere,
vol. 102. Springer (2006)
2. Chernenkiy, V., Gapanyuk, Y., Revunkov, G., Kaganov, Y., Fedorenko, Y.: Metagraph
approach as a data model for cognitive architecture. In: Biologically Inspired Cognitive
Architectures Meeting, pp. 50–55. Springer, Cham, August 2018
3. Lychkov, I.I., Alfimtsev, A.N., Sakulin, S.A.: Tracking of moving objects with regeneration of
object feature points. In: 2018 Global Smart Industry Conference (GloSIC), pp. 1–6. IEEE
(2018)
4. Neusypin, K.A., et al.: Algorithm for building models of INS/GNSS integrated navigation
system using the degree of identifiability. In: 2018 25th Saint Petersburg International
Conference on Integrated Navigation Systems (ICINS), pp. 1–5. IEEE (2018)
5. Serov, V.A., Voronov, E.M.: Evolutionary algorithms of stable-effective compromises
search in multi-object control problems. In: Smart Electromechanical Systems, pp. 19–29.
Springer, Cham (2019)
6. Knyazev, B., Barth, E., Martinetz, T.: Recursive autoconvolution for unsupervised learning
of convolutional neural networks. In: 2017 International Joint Conference on Neural
Networks (IJCNN), pp. 2486–2493. IEEE (2017)
7. https://www.arbonaut.com/en/
8. Tipping, M.E., et al.: Fast marginal likelihood maximisation for sparse Bayesian models. In:
AISTATS (2003)
9. Alexeyev, V.A., et al.: Statistical data on forest fund of Russia and changing of forest
productivity in the second half of XX century. St. Petersburg Forest Ecological Center,
p. 272 (2004)
10. http://www.iiasa.ac.at/web/home/research/researchPrograms/EcosystemsServicesandManagement/RussianForests.en.html
11. Hyyppä, J., et al.: Review of methods of small-footprint airborne laser scanning for
extracting forest inventory data in boreal forests. Int. J. Remote Sens. 29(5), 1339–1366
(2008)
12. Thrower, N.J.W., Jensen, J.R.: The orthophoto and orthophotomap: characteristics,
development and application. Am. Cartogr. 3(1), 39–56 (1976)
13. https://desktop.arcgis.com/en/arcmap/
14. Heidemann, H.K.: Lidar base specification. US Geol. Surv. (11-B4) (2012)
15. Nguyen, A., Le, B.: 3D point cloud segmentation: a survey. In: 2013 6th IEEE Conference
on Robotics, Automation and Mechatronics (RAM), pp. 225–230. IEEE (2013)
16. Qi, C.R., et al.: Pointnet: deep learning on point sets for 3d classification and segmentation.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 652–660 (2017)
17. Chumachenko, S.I., et al.: Simulation modelling of long-term stand dynamics at different
scenarios of forest management for coniferous–broad-leaved forests. Ecol. Model. 170(2–3),
345–361 (2003)
18. Ishiguro, H., Miyashita, T., Tsuji, S.: T-net for navigating a vision-guided robot in a real
world. In: Proceedings of 1995 IEEE International Conference on Robotics and Automation,
vol. 1, pp. 1068–1073. IEEE (1995)
19. Folk, M., et al.: An overview of the HDF5 technology suite and its applications. In:
Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases, pp. 36–47. ACM (2011)
Depth Mapping Method Based on Stereo Pairs
Abstract. The paper proposes a new method for solving the problem of con-
structing a depth map based on a stereo pair of images. The result of the depth
information recovery can be used to capture the reference points of objects in the
film industry when creating special effects, as well as in computer vision sys-
tems used on vehicles to warn the driver about a possible collision. Proposed
method consists in using the theory of active perception at the stage of seg-
mentation and image matching.
To implement the proposed method, a software product in the C# language
was developed. The developed algorithm was tested on various sets of input
data. The results obtained during the experiment indicate the correct operation of
the proposed method in solving the problem of constructing a depth map.
The accuracy of depth mapping using the described method turned out to be
comparable with the accuracy of the methods considered in the review. This
suggests that this method is competitive and usable in practice.
1 Introduction
One of the important tasks of computer vision is the transformation of a stereo pair
of images into a three-dimensional scene. As a result of this process, the depth
information of each image point is restored. Obtaining an accurate depth map is the
ultimate goal of three-dimensional image recovery.
The depth information received as the result of this process can be used in many
other areas. For example, depth maps are used to capture the reference points of objects
in film production when creating special effects, as well as in computer vision systems
used on vehicles to warn the driver about a possible collision.
Based on this, we can conclude that the development of new models and methods
for solving the problem of constructing a depth map based on a stereo pair is relevant.
The general algorithm of depth mapping using stereo images includes the following
steps [1]: camera calibration, image rectification, image segmentation, search for
matches between points of a pair of images, conversion of a discrepancy map into a
depth map. In this paper, the first two stages are not considered, since these stages are
simple geometric transformations and they are solved at the hardware level in most
computer vision systems.
When analyzing the algorithms that implement the steps described above, the
following problems were identified. The segmentation problem: some segmentation
algorithms do not have sufficient accuracy, so that at the correspondence search stage
multiple errors arise from incorrect segmentation; other algorithms provide sufficient
accuracy but have high computational complexity. There is also the problem of
correlating the segments of two images [2]. The matching problem: matching
algorithms are imperfect, as a result of which the accuracy of depth map construction
is reduced [3]. The problem of handling errors after matching: after the
correspondence search stage, the discrepancy map usually contains a number of
erroneously determined points, and their additional processing is necessary.
The proposed method for solving the problem of depth mapping applies the theory of
active perception (TAP) at the stages of segmentation and the search for
correspondence of points [4].
To solve the problem of depth mapping in this paper, the following algorithm is
proposed:
1. Image input – receiving images from cameras or from files;
2. Pre-processing – converting images to a brightness function;
3. Segmentation – the selection of objects in the first image in order to reduce the
search area later;
4. Search for matching segments – finding the segments of the left image in the right
image;
5. Discrepancy mapping – the formation of a matrix containing information on how
much each point of the first image differs in its position in space from the same
point in the second image;
6. Depth mapping – the final stage of restoring the depth information with subsequent
visualization of the results.
4 Segmentation
The next step is to divide the image into segments. This is necessary to reduce the
search area at the stage of matching.
Since matching points are searched for on the same objects, the best solution is to
divide the image into a set of objects, thereby narrowing the search area to the inner
regions of the objects. Also, the two source images are epipolar, which allows the
image to be divided into horizontal segments without loss of accuracy.
Depth Mapping Method Based on Stereo Pairs 305
Fig. 2. Segments
For subsequent use, it is necessary to form a segment model. It consists of the
following elements:
1. The starting point of the segment and its description with the help of TAP.
2. The end point of the segment and its description with the help of TAP.
TAP filters are used to describe the points: the description of a point is formed by
applying all 16 TAP filters to it.
306 V. E. Gai et al.
5 Segment Matching
At this point, one image has been divided into segments. The next step is to search
for the segments of the first image in the second one. To do this, the most similar
points for the beginning and end of each segment are searched for in the second
image using the following algorithm:
1. The response for the reference point is calculated for all 16 filters.
2. A 4 × 4 window is passed through the pixels of the horizontal segment of the
second image. The current pixel is the coordinate of the upper left corner of the
4 × 4 window. As the window passes through the image, the response is calculated
for all 16 filters.
3. The absolute difference ("delta") between each response and the corresponding
reference response found at the beginning is computed.
4. All sixteen differences are summed and saved together with the coordinates of
the current position of the window.
5. Among all the obtained sums, the minimum is found, which determines the point
that differs least from the original one.
6. This point is put in correspondence with the original one.
This algorithm is performed for the starting and ending points of the segment. Thus,
pairs of segments of the first and second images are formed.
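The matching steps 1-6 can be sketched as follows (a hypothetical numpy sketch; the `tap_filters` array stands in for the 16 TAP filters, whose actual kernels are defined in [4] and are not reproduced here):

```python
import numpy as np

def tap_response(patch, tap_filters):
    """Responses of all 16 filters to a 4x4 patch (step 1)."""
    return np.tensordot(tap_filters, patch, axes=([1, 2], [0, 1]))

def match_point(ref_patch, row_strip, tap_filters):
    """Slide a 4x4 window along the horizontal strip of the second image
    and return the column whose summed absolute response difference
    ("delta") from the reference point is minimal (steps 2-6)."""
    ref = tap_response(ref_patch, tap_filters)
    best_col, best_delta = -1, np.inf
    for col in range(row_strip.shape[1] - 3):
        window = row_strip[:4, col:col + 4]
        delta = np.sum(np.abs(tap_response(window, tap_filters) - ref))
        if delta < best_delta:
            best_col, best_delta = col, delta
    return best_col
```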
6 Discrepancy Mapping
The first main stage of the algorithm is discrepancy mapping: forming a matrix
containing information about how much each point of the first image differs in its
position in space from the same point in the second image. For each point of a
segment, the corresponding point is searched for in the second image; the scope of
the search is limited by the size of the segment.
When the desired point is found, its discrepancy is calculated by the formula:
D = |X1 - X2|, (1)
7 Depth Mapping
Depth mapping is the final stage of solving the problem. At this stage, the discrepancy
map is converted to a depth map. It is also necessary to solve the problem of possible
errors made at the stage of the search for matches. Therefore, it was decided to apply
the following formula to all points of the discrepancy map:

D'x,y = Dx,y, if Dx,y ≤ Max;
D'x,y = (1/n²) · Σ Dx+i,y+j (sum over the n × n neighborhood), if Dx,y > Max, (2)

where Dx,y is the depth map value at a point, Max is the maximum possible depth map
value, and n is the size of the area over which the average value is calculated.
This formula is a filter. In other words, if the value of a point is greater than
expected, replace its value with an average value from neighboring points. This
completes the depth recovery. The following formula is used to visualize the results:
Gx,y = 255 · Dx,y / Dmax, (3)

where Dmax is the maximum depth map value and Gx,y is the point value in grayscale.
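Formulas (2) and (3) can be sketched together as follows (an illustrative numpy reading of the reconstructed formulas; the threshold and window size in the example are placeholders):

```python
import numpy as np

def filter_depth(depth, max_val, n=3):
    """Formula (2): replace points whose value exceeds Max with the mean
    of the valid values in an n x n neighborhood."""
    out = depth.astype(float).copy()
    h = n // 2
    for y, x in np.argwhere(depth > max_val):
        patch = depth[max(0, y - h):y + h + 1, max(0, x - h):x + h + 1]
        good = patch[patch <= max_val]
        out[y, x] = good.mean() if good.size else max_val
    return out

def to_grayscale(depth):
    """Formula (3): G = 255 * D / Dmax."""
    return np.round(255.0 * depth / depth.max()).astype(np.uint8)
```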
8 Computational Experiment
Fig. 3. An example of images used in a computational experiment (left, right and depth map)
During the experiment, each point of the reference depth map is compared with the
corresponding points of the depth map obtained by the algorithm proposed in this
paper.
The proposed method for solving the problem of depth mapping has a set of input
parameters. Therefore, in the course of the experiment, different sets of values of the
input parameters of the algorithm were investigated in order to identify the set that
allows depth mapping with the greatest accuracy. Combining all the specified values
of the input parameters of the algorithm yielded nine launch configurations. For each
launch configuration, the following values were obtained: the accuracy of depth
mapping and the average processing time of a single image. The test results of the
algorithm are given in Table 1.
308 V. E. Gai et al.
Table 2 presents the results of the known methods for depth mapping [1].
Comparing the data from Table 2 with the obtained results of testing the algorithm
(see Table 1), we can conclude that the depth map construction accuracy of the
developed method is quite comparable with the accuracy of the known methods
considered. Testing the algorithm under normal conditions yielded a depth mapping
accuracy of 90.7%.
References
1. Kamencay, P., Breznan, M., Jarina, R., Lukac, P., Zachariasova, M.: Improved depth map
estimation from stereo images based on hybrid method. Radioeng. J. 21(1), 70–78 (2012)
2. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE
Trans. Patt. Anal. Mach. Intell. 24(5), 603–619 (2002)
3. Hisham, M.B.: Template matching using sum of squared difference and normalized cross
correlation. In: 2015 IEEE Student Conference Research and Development (SCOReD) (2015)
4. Utrobin, V.A.: Physical interpretations of the elements of image algebra. Uspekhi
Fizicheskikh Nauk (UFN) 174(10), 1089–1104 (2004)
Semantic Segmentation of Images Obtained
by Remote Sensing of the Earth
1 Introduction
One of the most challenging scientific and applied problems of our time is the
development of behavior control systems for highly autonomous robotic unmanned
aerial vehicles (UAVs) that can perform complex missions under uncertainty condi-
tions [1, 2]. Such a control system, to support decision-making processes, requires
information about the current situation in which the UAV operates. In obtaining such
information, the most crucial role belongs to computer vision, which is an interdisci-
plinary scientific and applied area focused on solving problems related to the per-
ception, analysis, and understanding of images [3–5]. It should be emphasized that it is
precisely the understanding of images that provides the information necessary for
decision making when controlling the behavior of a UAV.
In the last decade, computer vision techniques have been actively developed,
including image understanding methods based on the use of deep learning and deep
neural networks, in particular, convolutional neural networks (CNN) [3, 6–10].
We can solve the task of image understanding at several levels of granularity [3–5]:
1. Image classification. In this case, we assume that the image contains a single object
(the “main object”) that needs to be assigned to one of the finite set of prescribed
classes. The answer, in this case, is the label of the corresponding class.
2. Object classification and localization. In this case, in addition to the task of clas-
sifying an object, as in the previous granularity level, it is also required to localize it
on the image. Such localization is carried out by enclosing this object in some
bounding box. The answer, in this case, is the label of the corresponding class
together with the parameters of the bounding box.
3. Object detection. The task is similar to the one that is solved at the previous level,
but for the case when there are more than one classified objects in the image. The
answer, in this case, is a set of class labels in combination with a set of parameters
of the bounding box for all objects detected in the image.
4. Semantic segmentation. In this case, we solve the problem at the pixel level of the
analyzed image, that is, by assigning a label of the corresponding class to each of
the pixels of the given image. In general, the answer at this granularity level will be
an image of the same size as the original image with the corresponding class labels
assigned to each pixel. At the same time, for clarity, the image areas corresponding
to different classes are marked by different conditional colors.
5. Instance segmentation. This level provides additional granularity compared to
image segmentation. In this case, we require not only to mark each of the image
pixels with a corresponding label, but also to select individual instances of each of
the recognized classes in this image, as is the case in the “object detection” task. In
this case, we assign various conditional colors in the picture not to separate classes
of objects, but to separate instances of these classes. For example, in the semantic
segmentation task, pixels that correspond to all objects of the “person” class will be
marked with the same color, while in the case of instance segmentation each of the
found objects of this type will be marked with its own conditional color.
The following sections discuss the solution of one of these tasks, namely, the
problem of semantic image segmentation, which is critical for providing the UAV
behavior control system with source data.
A tool that has proven itself in solving problems of semantic image segmentation,
including under conditions of uncertainty, is a convolutional neural network (CNN) in
combination with deep learning methods [6, 7, 10]. During the last decade, a significant
number of neuroarchitectures based on this class of neural networks have been
proposed.
As experience in solving semantic image segmentation problems shows, such
CNN-based neuroarchitectures as U-Net [14], SegNet [15], MultiNet [16] demonstrate
the best results. There are also attempts to use other networks for semantic
segmentation, in particular DenseNet [17], DeepLab [8], ICNet [18], and FRRN [19].
These networks, however, do not for a number of reasons meet the requirements
arising when working with images obtained by remote sensing methods. The analysis
of these reasons is beyond the scope of this article.
The following sections provide a comparative analysis of the U-Net, SegNet, and
MultiNet neuroarchitectures in terms of their efficiency in solving problems of semantic
segmentation of images obtained during remote sensing of the earth’s surface. This
analysis is carried out using source data from the WorldView-3 image gallery [11].
The training data required to solve the problem of semantic segmentation was formed
using the gallery of multispectral images obtained by the WorldView-3 satellite [11].
This database contains tagged images of the earth’s surface that can be used to rec-
ognize objects of various types on them. Examples of such images we can see in Fig. 1.
Classes of objects that are labelled in the WorldView-3 database are presented in
Table 1.
All images in the gallery are presented in GeoTiff format [12] in three-band and
16-band formats. Using the 16-band pictures, we obtained 25 color images in RGB
format with a resolution of 3396 × 3349 pixels. We plan to use the multispectral nature
of the photos in the WorldView-3 gallery in our future research as a source of
additional information about the objects in these images.
We divide each of these 25 images into patches of 128 × 128 pixels to reduce the
required computational resources. Examples of the reduced pictures can be seen in
Fig. 2. As a result of this operation, about 1.6 × 10^4 patterns were obtained: 9752
training patterns, 1300 validation patterns, and 5202 test patterns. These sets of
patterns are sufficient, as shown by the results of computational experiments, for
training the analyzed convolutional networks.
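The tiling step described above can be sketched as follows. How the ragged border is treated is our assumption, since 3396 × 3349 is not a multiple of 128 and the paper does not say how the remainder is handled.

```python
import numpy as np

def tile(image, size=128):
    """Cut an image into non-overlapping size x size patches,
    discarding the ragged border (an assumed policy)."""
    h, w = image.shape[:2]
    patches = [image[y:y + size, x:x + size]
               for y in range(0, h - size + 1, size)
               for x in range(0, w - size + 1, size)]
    return np.stack(patches)
```

With this policy a 3396 × 3349 image yields 26 × 26 = 676 patches, i.e. 16 900 for 25 images, the same order as the reported ~1.6 × 10^4 (the paper's exact total of 16 254 suggests some patches were discarded).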
312 D. M. Igonin and Y. V. Tiumentsev
Fig. 1. Examples of images of the earth’s surface from the WorldView-3 image gallery
Fig. 2. Training patterns and their masks obtained using images from WorldView-3 gallery
(Fig. 3, panels (a) MultiNet and (b) SegNet: network diagrams, not reproduced here.
Panel (c), the U-Net used in this work, is the encoder-decoder sequence: Conv2D
128×128×3; Conv2D 128×128×32; MaxPooling 64×64×32; Conv2D 64×64×64 (twice);
MaxPooling 32×32×64; Conv2D 32×32×128 (twice); MaxPooling 16×16×128; Conv2D
16×16×256 (twice); MaxPooling 8×8×256; Conv2D 8×8×512 (twice); Conv2DTranspose
16×16×256; Concatenate 16×16×512; Conv2D 16×16×256 (twice); Conv2DTranspose
32×32×128; Concatenate 32×32×256; Conv2D 32×32×128 (twice); Conv2DTranspose
64×64×64; Concatenate 64×64×128; Conv2D 64×64×64 (twice); Conv2DTranspose
128×128×32; Concatenate 128×128×64; Conv2D 128×128×32 (twice); Conv2D 128×128×3.)
Fig. 3. Neuroarchitectures: (a) – MultiNet; (b) – SegNet; (c) – U-Net
Fig. 4. Learning curves of the selected neuroarchitectures: (a) – MultiNet; (b) – SegNet;
(c) – U-Net
U-Net (Fig. 3c) [14] is a standard CNN architecture for image segmentation tasks.
In the ISBI competition in 2015, U-Net ranked first by a large margin. The U-Net
network architecture yielded the best results in biomedical applications, as well as in
solving problems for which there is a limited amount of source data.
The quality of training is checked on the validation set that was not involved in
the learning (Fig. 4). The recognition results for the test examples are presented
in the form of probability matrices (Fig. 5). The value of each element of the
matrix is a probabilistic assessment of the correspondence of a class to itself (the
diagonal values) or of the probability of classes being confused with each other (the
off-diagonal values).
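A row-normalised confusion matrix of this kind can be computed from flattened pixel labels as a minimal sketch (the function name and exact normalisation are our assumptions):

```python
import numpy as np

def probability_matrix(true_labels, pred_labels, n_classes):
    """Row-normalised confusion matrix: entry [i, j] estimates the
    probability that a pixel of true class i is predicted as class j.
    Diagonal entries give per-class accuracy, off-diagonal entries
    the pairwise confusion."""
    m = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        m[t, p] += 1
    row_sums = m.sum(axis=1, keepdims=True)
    # Divide each row by its total; empty rows stay zero.
    return np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums > 0)
```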
Fig. 5. Test patterns results: (a) – MultiNet; (b) – SegNet; (c) – U-Net
5 Conclusions
With a fixed size of the training set, neuroarchitectures with a smaller number of
adjustable parameters have an advantage, due to the tight connection between this
number and the number of training examples.
Under these conditions, the best results were shown by the SegNet network, for
which the average value of the diagonal elements of the probability matrix is higher
than that of the MultiNet and U-Net networks. It should be noted, however, that the
recognition of objects belonging to the Vehicle class, which is essential for the
applications in question, is a difficult task for all analyzed networks.
Acknowledgement. This research is supported by the Ministry of Science and Higher Educa-
tion of the Russian Federation as Project No. 9.7170.2017/8.9.
References
1. Finn, A., Scheding, S.: Developments and Challenges for Autonomous Unmanned Vehicles.
Springer, Heildelberg (2010)
2. Valavanis, K.P.: Advances in Unmanned Aerial Vehicles: State of the Art and the Road to
Autonomy. Springer, Netherlands (2007)
3. Favorskaya, M.N., Jain, L.C. (eds.): Computer vision in control systems. Aerial and satellite
image processing, vol. 3. Springer, Heidelberg (2018)
4. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, London (2011)
5. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall, New Jersey
(2002)
6. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge
(2017)
7. Zhao, Z.-Q., et al.: Object detection with deep learning: a review. arXiv:1807.05511v2 [cs.
CV]. Accessed 16 Apr 2019
8. Chen, L.-C., et al.: DeepLab: semantic image segmentation with deep convolutional nets,
Atrous convolution, and fully connected CRFs. arXiv:1606.00915v2 [cs.CV]. Accessed 12
May 2017
9. Hu, R., et al.: Learning to segment everything. arXiv:1711.10370v2 [cs.CV]. Accessed 27
March 2018
10. Gu, J., et al.: Recent advances in convolutional neural networks. arXiv:1512.07108v6 [cs.
CV]. Accessed 19 Oct 2017
11. WorldView-3 Satellite Imagery, DigitalGlobe, Inc. (2017)
12. Qu, J.J., et al.: Earth Science Satellite Remote Sensing: Data, Computational Processing, and
Tools, vol. 2. Springer, Heidelberg (2006)
13. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Pearson Prentice Hall,
New York (2009)
14. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image
segmentation. arXiv:1505.04597v1 [cs.CV]. Accessed 18 May 2015
15. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder
architecture for image segmentation. arXiv:1511.00561v3. [cs.CV]. Accessed 10 Oct 2016
16. Teichmann, M., et al.: MultiNet: real-time joint semantic reasoning for autonomous driving.
arXiv:1612.07695v2 [cs.CV]. Accessed 8 May 2018
1 Introduction
spectroscopy. For example, artificial neural networks (ANN) have been successfully
used for determination of the concentrations of salts dissolved in water by Raman
spectra [5, 6], for rapid determination of wine components by absorption spectra [7], to
determine the content of glucose in urine by IR absorption spectra [8].
The inverse problem (IP) considered in this paper, like many other IPs, is ill-posed
and poorly conditioned, resulting in a high sensitivity of the solution to noise in
the data. Although ANNs by themselves have some ability to work with noisy data, in
the case of IPs this ability is not sufficient, requiring the development of special
approaches to improve the stability of the neural network solution.
In the previous studies of the authors [9–11], it was proposed to use addition of
noise during training to improve the stability of the neural network solution of IP. The
basis for this is a number of studies, where it was shown that this method could
improve the generalizing capabilities of the network [12, 13], prevent overtraining [14–
16], as well as increase the speed of training [17], and that its use was equivalent to
Tikhonov regularization [18].
In this paper, this method was tested in relation to the IP of spectroscopy of
aqueous ethanol solutions. In this case, a special type of distortion affecting the entire
spectrum at once was considered.
2 Problem Statement
recorded. In addition, spectra of pure alcohol and of distilled water were also
included. There were 73 patterns in total.
(b) Random errors in determining the intensity of the spectra channels by the CCD-
detector.
(c) Frequency shift of the spectrum channels, which may be due to uncontrolled
change of adjustment of the experimental setup when replacing the sample.
(d) Spectra distortions caused by a change in the laser power or by a change in the
absorption coefficient of the sample container, which leads to stretching or con-
traction of the spectrum.
(e) Variable pedestal caused by light scattering on inhomogeneities of the medium
density (Fig. 2).
The purpose of this study was to verify the applicability of the previously devel-
oped methods of improving the resilience of the neural network solution of IP to noise
in the data to the problem of spectroscopy of aqueous ethanol solutions in relation to
the fourth type of distortion (stretching/contraction).
training on a validation set without noise. In this case, the quality of the solution was
higher, and the training time – less. This approach was used in the present study.
The type of distortion (stretching/contraction) considered in this paper was modeled
as multiplicative noise. Two statistics were considered – Gaussian and uniform. The
noise levels considered were 1, 3, 5, 10, 20%. Thus, including the initial data sets
without noise, 11 training sets and 11 test sets, as well as 1 validation set were used.
Each initial pattern of the training and test sets was presented in 10 noise real-
izations. Networks trained on a training set with a certain noise level were applied to
test sets of all noise levels of the same statistics.
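The augmentation described above can be sketched as follows. We assume one multiplicative factor per spectrum (the distortion affects the entire spectrum at once) and interpret the noise level as the standard deviation (Gaussian) or half-width (uniform) of that factor; this exact parameterisation is our assumption.

```python
import numpy as np

def add_multiplicative_noise(spectra, level, statistics="gaussian",
                             n_realizations=10, rng=None):
    """Produce n_realizations noisy copies of each spectrum.
    Every spectrum is multiplied by (1 + eps), with one eps per copy,
    modelling the stretching/contraction distortion."""
    rng = np.random.default_rng(rng)
    spectra = np.asarray(spectra, dtype=float)
    n, k = spectra.shape
    if statistics == "gaussian":
        eps = rng.normal(0.0, level, size=(n_realizations, n, 1))
    elif statistics == "uniform":
        eps = rng.uniform(-level, level, size=(n_realizations, n, 1))
    else:
        raise ValueError(statistics)
    noisy = spectra[None] * (1.0 + eps)
    return noisy.reshape(n_realizations * n, k)
```

Called with level values 0.01, 0.03, 0.05, 0.10 and 0.20, this would generate the 1, 3, 5, 10 and 20% training and test sets of the paper.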
4 Results
Fig. 3. The dependence of the quality of the solution (mean absolute error, MAE) for ethanol on
the distortion level in the test set for various distortion statistics: left – multiplicative Gaussian
distortion (mgd), right – multiplicative uniform distortion (mud). Various lines represent various
distortion levels in the training set.
For the method of adding noise during training, one can see that the higher is the
noise level in the training set, the slower is the deterioration of the solution when the
noise level in the test set increases. For the other components under consideration, the
nature of the dependencies is completely similar.
The low level of error may indirectly indicate that the dataset is representative.
case, the results of almost the entire data set went into saturation – showed the lower or
upper limit of concentrations in the training sample (Fig. 4, left). This fact indicates a
high degree of difference of the sets.
Therefore, in the second case, the networks were trained at the maximum (20%)
level of Gaussian noise. The results are shown in Fig. 4, right.
When using the networks trained with noise, the results of determination of the
concentrations were close to those stated by the manufacturers. The average deviation
was 2.07% vol.
5 Conclusion
The following conclusions can be drawn from the results of the work:
• When using this method, the following effect has been confirmed: the higher is the
noise level in the training set, the slower the solution quality decreases with increase
of the noise level in the test set.
• The resilience of the solution to distortions in the data is higher for distortions
having uniform statistics than for distortions having Gaussian statistics.
• A dataset of spectra of real alcoholic beverages differs significantly from the
dataset simulating alcoholic beverages. As a result, the networks trained without
adding noise failed to give reasonable results.
• Networks trained with the addition of Gaussian noise with the level of 20% showed
an average deviation of 2.07% vol.
Thus, the effectiveness of the method of training with noise to improve the resi-
lience of the neural network solution of the inverse problem of spectroscopy of aqueous
ethanol solutions was confirmed.
Diagnostics of Water-Ethanol Solutions by Raman Spectra with ANN 325
References
1. Leary, J.: A quantitative gas chromatographic ethanol determination. J. Chem. Educ. 60(8),
675 (1983)
2. Isaac-Lam, M.: Determination of alcohol content in alcoholic beverages using 45 MHz
benchtop NMR spectrometer. Int. J. Spectrosc. 2016(2526946), 8 (2016)
3. Zuriarrain, A., Zuriarrain, J., Villar, M., Berregi, I.: Quantitative determination of ethanol in
cider by 1H NMR spectrometry. Food Control 50, 758–762 (2015)
4. Boyaci, I., Genis, H., et al.: A novel method for quantification of ethanol and methanol in
distilled alcoholic beverages using Raman spectroscopy. J. Raman Spectrosc. 43(8), 1171–
1176 (2012)
5. Dolenko, S., Burikov, S., et al.: Adaptive methods for solving inverse problems in laser
Raman spectroscopy of multi-component solutions. Patt. Recogn. Image Anal. 22(4), 551–
558 (2012)
6. Dolenko, S., Burikov, S., et al.: Neural network approaches to solution of the inverse
problem of identification and determination of partial concentrations of salts in multi-
component water solutions. LNCS, vol. 8681, pp. 805–812 (2014)
7. Martelo-Vidal, M., Vázquez, M.: Application of artificial neural networks coupled to UV–
VIS–NIR spectroscopy for the rapid quantification of wine compounds in aqueous mixtures.
CyTA J. Food 13(1), 32–39 (2015)
8. Liu, W., Wang, W., et al.: Use of artificial neural networks in near-infrared spectroscopy
calibrations for predicting glucose concentration in urine. LNCS, vol. 5226, pp. 1040–1046
(2008)
9. Isaev, I.V., Dolenko, S.A.: Training with noise as a method to increase noise resilience of
neural network solution of inverse problems. Opt. Mem. Neural Netw. (Inf. Opt.) 25(3),
142–148 (2016)
10. Isaev, I.V., Dolenko, S.A.: Adding noise during training as a method to increase resilience of
neural network solution of inverse problems: test on the data of magnetotelluric sounding
problem. Studies in Computational Intelligence, vol. 736, pp. 9–16 (2018)
11. Isaev, I., Burikov, S., Dolenko, T., Laptinskiy, K., Vervald, A., Dolenko, S.: Joint application
of group determination of parameters and of training with noise addition to improve the
resilience of the neural network solution of the inverse problem in spectroscopy to noise in
data. LNCS, vol. 11139, pp. 435–444. Springer, Cham (2018)
12. Holmstrom, L., Koistinen, P.: Using additive noise in back-propagation training. IEEE
Trans. Neural Netw. 3(1), 24–38 (1992)
13. Matsuoka, K.: Noise injection into inputs in back-propagation learning. IEEE Trans. Syst.
Man Cybern. 22(3), 436–440 (1992)
14. An, G.: The effects of adding noise during backpropagation training on a generalization
performance. Neural Comput. 8(3), 643–674 (1996)
15. Zur, R.M., Jiang, Y., Pesce, L.L., Drukker, K.: Noise injection for training artificial neural
networks: a comparison with weight decay and early stopping. Med. Phys. 36(10), 4810–
4818 (2009)
16. Piotrowski, A.P., Napiorkowski, J.J.: A comparison of methods to avoid overfitting in neural
networks training in the case of catchment runoff modeling. J. Hydrol. 476, 97–111 (2013)
17. Wang, C., Principe, J.C.: Training neural networks with additive noise in the desired signal.
IEEE Trans. Neural Netw. 10(6), 1511–1517 (1999)
18. Bishop, C.M.: Training with noise is equivalent to Tikhonov regularization. Neural Comput.
7(1), 108–116 (1995)
Metaphorical Modeling of Resistor Elements
Abstract. The variable resistors changing their resistance during the process of
functioning may become the basis for creation of neural networks elements
(synapses, neurons, etc.). The processes leading to resistance change are
extremely complicated and are not yet amenable to correct description. To
master the possibilities of using the variable resistors it is reasonable to use the
metaphorical modeling, i.e. to replace a complex physical system with a simple
mathematical system with a small number of parameters, reproducing the
important features of real system’s behavior. A simple (elementary) resistor
element with state determined by a single scalar variable is considered as the
modeling unit. The equations describing the change of the state variable are
written down. The choices of functions and parameters in equations, as well as
the methods of such elements combination with traditional electronic compo-
nents (fixed resistors, capacitors, diodes, etc.) are discussed. The selection of
these functions from a small set and the adjustment of several parameters allow
us to obtain the characteristics close to real ones. The scheme of measuring the
“volt-ampere characteristics” is considered. An example of specific selection of
functions determining the resistor element behavior is given.
1 Introduction
One of the most promising directions of the neuromorphic devices’ elemental base
development is mastering the possibilities of variable resistors application [1, 2]. Such
resistors change their resistance in the process of functioning and are able to become
the basis for creation of neural networks elements analogs (synapses, neurons, etc.) [2].
Even a special term, “memristors”, was coined for these elements. However, different
authors understand this term differently. Moreover, the term itself implies the
presence of non-volatile memory in “memristors”, which is not necessary at all for the
implementation of neural elements. Therefore, in order to avoid misunderstandings,
we will not use this term.
The functioning of variable resistors is based on various physical processes [3–6],
which are not yet fully understood due to their complexity. The constructed “physical”
models are not actually physical and require the adjustment of parameters. From the
point of view of the practical development of neuromorphic devices, it would be much
more useful to have the simplest model reproducing the main features of the behavior,
even if it is unable to approximate the characteristics of devices with high accuracy
due to its small number of parameters. At the same time, the model construction is
based on general principles, and its specification is aimed at maximum simplification
(provided that the required characteristic features are preserved).
U = RI,   (1)
dx/dt = F(x, U, I, t).   (2)
Under common conditions, the dependence of the right-hand side of the equation on
time can be ignored. By means of Ohm’s law, one of the quantities U or I can be
excluded. As a result, we obtain the equation
dx/dt = F(x, I),   (3)
or a similar equation with I → U. Which of these equations to use is a matter of
convenience. In many cases it can be assumed that the change in resistance is mainly
due to the flowing current, with the current-dependent function having a simpler form.
Let us assume that the state variable lies between 0 and 1: 0 ≤ x ≤ 1. This can always
be achieved by converting the variable. We also consider that in the state x = 0 the
resistor has the maximum resistance, and in the state x = 1 the minimum resistance.
The equations of state change are written for 0 < x < 1. In order to avoid going
beyond the range of permissible values, the right-hand side of the equations should be
considered equal to zero at x ≤ 0 and x ≥ 1. The accepted assumptions do not fix the
choice of the state variable; it can be made in different ways. We can bind the state
variable to the resistance R:
or to the conductivity G = 1/R:
328 V. B. Kotov et al.
The first term describes the effect of positive current on the change of the resistor
state. The effect of negative current is described by the second term. The splitting into
positive and negative (relative to the current direction) parts is due to the fact that for
the most interesting types of resistors the processes during the positive and negative
currents are different. Meanwhile usually the currents of different directions tend to
change the state variable in opposite directions. This is true in particular for structures
of the metal-dielectric/semiconductor-metal type [6]. In this case it is convenient to
determine the current direction in accordance with the resistor direction: the
positive current tends to increase the state variable x, and the negative current to
reduce it. Here we can assume that the functions F_I^+(I), F_I^-(I), describing the
dependence of the rate of change of x on the magnitude of the positive and negative
currents, have the following properties:
F_I^+(I) = 0 at I ≤ 0;  F_I^+(I) > 0 and dF_I^+(I)/d|I| > 0 at I > 0;
F_I^-(I) = 0 at I ≥ 0;  F_I^-(I) > 0 and dF_I^-(I)/d|I| > 0 at I < 0.   (7)
We note that the properties (7) are not universal. Thus, if the state variable is the
normalized temperature the heating occurs regardless of the current direction and both
summands have the same sign. However, this case is not very interesting for practice.
In addition, a limitation to current of one direction is possible here, so it is
sufficient to take only the first term on the right-hand side of Eq. (6).
As functions with the properties (7), one can take the family of power functions on a
semiaxis, that is,
These functions, like the F0(x) function, depend on the method of state variable
determination.
Metaphorical Modeling of Resistor Elements 329
The function F_0(x) is used to describe the evolution of the resistor state in the
absence of current. The change of state has the character of an approach to a
stationary state, which either coincides with one of the boundary states (x = 0 or
x = 1) or corresponds to a zero of the function F_0(x). Let us assume for certainty
that there is only one stationary (basic) state, x = 0. This is the most typical case.
Then we should have
For I = 0, Eq. (3) with a function F_0(x) of the form (11) has an explicit solution
(t_0 being the initial time). At a < 1, the basic state is achieved in the finite time
t - t_0 = x(t_0)^(1-a) / ((1 - a) f_0), after which the state does not change. At
a = 1, the variable x tends to zero exponentially; although the basic state is not
reached, the approach to it is very fast. In both cases there is no sense in talking
about long-term memory. At a > 1, the approach to the basic state follows the power
law x ∝ (t - t_0)^(1/(1-a)). The higher the index a, the slower the relaxation
proceeds, and the memory of the initial state is retained long enough. Hence, the
function (11) with a sufficiently high index a allows us to model long-term memory.
If memory with an infinite storage time needs to be modeled, the function F_0(x)
should be equal to zero on a continuous interval of x.
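The three relaxation regimes can be checked numerically. We assume F_0(x) = f_0·x^a, the power form implied by the quoted finite-time solution; the parameter values are illustrative, not from the paper.

```python
import numpy as np

def relax(a, f0=1.0, x0=0.5, t_end=20.0, dt=1e-3):
    """Euler integration of dx/dt = -f0 * x**a from x(0) = x0,
    clamping x to the permissible range x >= 0."""
    x = x0
    for _ in range(int(t_end / dt)):
        x = max(x - dt * f0 * x ** a, 0.0)
    return x
```

With x_0 = 0.5 and f_0 = 1, a = 0.5 reaches the basic state x = 0 in finite time, a = 1 decays exponentially to a negligible value by t = 20, while a = 3 still retains x ≈ 0.15, illustrating the long-term memory for high indices a.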
At u ≤ 0, the first term on the right-hand side of Eq. (14) is equal to zero, and the
two other summands are negative, taking into account the properties (7), (9), (10).
This means that an accelerated relaxation to the ground state x = 0 takes place. The
negative voltage u can be used for fast erasure of information.
The second term on the right-hand side of Eq. (14) vanishes at u ≥ 0. The remaining
summands have opposite signs; their sum can be either positive or negative depending
on x and u. Considering the right-hand side of Eq. (14), F(x, I), as a function of x
and u, we obtain a partitioning of the region of permissible values 0 ≤ x ≤ 1, u ≥ 0
into the regions F > 0 and F < 0. At F > 0, the state variable x increases over time,
and at F < 0 it decreases.
The areas F > 0 and F < 0 are separated by the curve F = 0. Above the curve F = 0
(i.e., at larger u values) lies the region F > 0, below it the region F < 0. The
equation F = 0 at a given value of u determines the stationary point x_st(u)
corresponding to the stationary state of the resistor at a constant source voltage u.
The stationary point is a stable equilibrium point (a stable stationary point) if to
its left (x < x_st) we have F > 0 and to its right F < 0. Otherwise we have an
unstable stationary point.
Besides the stationary points determined by the equation F = 0, boundary stationary
points are possible. The point x = 0 is a stable stationary point if at small positive
values of x we have F < 0. The point x = 1 is a stable stationary point when to its
left F > 0.
It is the stable stationary points that play the determining role at constant voltage
u, since in this case Eq. (14) describes the approach to a stationary point. In most
cases the approach to the stationary point is fast enough (exponential), or the
stationary point is even reached in finite time. The conclusions for the
constant-voltage case can be extended to the case of a quasi-stationary voltage
change, when the state of the resistor has time to adjust to the current voltage.
In this case (i.e., at u > 0), the equation of the curve F = 0 can be written in the form
u = P(x),   (15)
where
P(x) = (R(x) + r) · h(F_0(x) / F_x^+(x)),   (16)
(Figs. 1 and 2: the curve u = P(x) in the (x, u) plane for 0 < x < 1, with the bounds
P_i and P_s marked on the u axis.)
and h(z) is the function inverse to F_I^+(I). Provided that the inequalities (7) hold
and that the function F_I^+(I) is unbounded as I → +∞, the function h(z) maps the
positive semi-axis one-to-one and monotonically onto the positive semi-axis.
Obviously, P(x) > 0 at 0 < x < 1. Let us denote by P_s and P_i the exact upper and
lower bounds of the function P(x) at 0 < x < 1. If the function P(x) is unbounded, we
set P_s = ∞. For u < P_i, Eq. (15) has no solutions for x, and the only stationary
(stable) point is the boundary point x = 0. At u > P_s, Eq. (15) also has no
solutions; here the only stationary point is the boundary point x = 1.
For P_i < u < P_s, Eq. (15) has at least one solution. If the function P(x) is
increasing, then the solution of Eq. (15) is unique, and it determines the sole
stationary (stable) point. Fig. 1 shows such a curve F = 0 together with the
one-dimensional trajectories of the imaging point movement at different voltages u.
If the function P(x) is monotonically decreasing, then the unique solution of
Eq. (15) determines an unstable stationary point; in this case both boundary points
x = 0 and x = 1 are stable stationary points.
For a nonmonotonic function P(x), Eq. (15) has more than one solution in a certain range of voltages u. A solution x_st corresponding to a positive slope of the curve u = P(x) gives a stable stationary point; if the slope of the curve at x = x_st is negative, the stationary point is unstable. Additional stable stationary points can be located at the interval boundaries. For a given value of u, the number of stable stationary points is one more than the number of unstable ones. In typical cases there are two stable stationary points and one unstable point (Fig. 2).
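The counting rule just stated (one more stable point than unstable ones) is easy to check numerically for any concrete P(x). The sketch below uses an assumed toy curve P(x) = (1.2 − x)x², not taken from the paper, and classifies all stationary points on [0, 1] from the single fact used above: F > 0 exactly where u > P(x).

```python
import numpy as np

def stationary_points(P, u, n=20001):
    """Stationary points of the state equation on [0, 1], using F > 0 iff u > P(x):
    interior roots of u = P(x) plus the boundary points x = 0 and x = 1."""
    x = np.linspace(1e-6, 1 - 1e-6, n)
    d = P(x) - u
    stable, unstable = [], []
    if d[0] > 0:          # F < 0 near x = 0: the boundary x = 0 attracts
        stable.append(0.0)
    if d[-1] < 0:         # F > 0 near x = 1: the boundary x = 1 attracts
        stable.append(1.0)
    for k in np.nonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0]:
        root = 0.5 * (x[k] + x[k + 1])
        # stable where P(x) crosses u from below (positive slope)
        (stable if d[k + 1] > d[k] else unstable).append(root)
    return sorted(stable), sorted(unstable)

P = lambda x: (1.2 - x) * x**2          # assumed toy curve, maximum at x_m = 0.8
s, un = stationary_points(P, u=0.22)    # u inside the bistable range (0.2, 0.256)
print(s, un)
```

For u = 0.22, which lies between P(1) = 0.2 and P(x_m) = 0.256 for this toy curve, the function returns two stable points (the rising-branch root and the boundary x = 1) and one unstable point, matching the counting rule.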
332 V. B. Kotov et al.
[Two-panel figure: the dependence x_st(u) and the curve u = P(x) with a maximum at x = x_m]
If Eq. (15) has several roots, the dependence x_st(u) of the stationary (stable) point on voltage is multivalued (usually double-valued) within a certain range of voltages u. Under a quasi-stationary change of the source voltage, the change of the resistor state corresponds to movement along one of the branches of the function x_st(u). If this branch ends, a transition to another branch inevitably occurs (Fig. 3). A sharp change of the resistor state, accompanied by sharp changes of the resistance, current and voltage of the variable resistor, is the most obvious manifestation of multistability (bistability in the case of two stable stationary states).
4 Example
Let us take F_I^+(I) in the form (8), F_0(x) as in (11), R(x) as in (4), and F_x^+(x) = 1. Then

P(x) = (r + R_0 − ΔR·x) (f_0/B_+)^{1/b} x^{a/b}.  (17)

In this case P_i = P(0) = 0 and P_s < ∞. The function P(x) on the positive semi-axis has a maximum at

x = x_m = [a/(a + b)] · (r + R_0)/ΔR.  (18)
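Formula (18) can be sanity-checked numerically: the constant prefactor (f_0/B_+)^{1/b} does not move the maximum, so a grid argmax of (17) must land on a/(a + b) · (r + R_0)/ΔR. The parameter values below are illustrative assumptions, not the paper's.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's values)
a, b = 2.0, 1.0
r, R0, dR = 0.02, 1.0, 0.85
f0, Bp = 1.0, 1.0

def P(x):
    # Eq. (17): P(x) = (r + R0 - dR*x) * (f0/Bp)**(1/b) * x**(a/b)
    return (r + R0 - dR * x) * (f0 / Bp) ** (1.0 / b) * x ** (a / b)

x = np.linspace(1e-6, (r + R0) / dR, 400001)  # P(x) > 0 on this interval
x_num = x[np.argmax(P(x))]                    # grid argmax
x_m = a / (a + b) * (r + R0) / dR             # Eq. (18)
print(x_num, x_m)                             # both are 0.8 (up to grid spacing)
```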
stable stationary state is the boundary state x = 1. The same boundary state is the only stable state at u > P(x_m).
Thus, the inequality x_m < 1 is the condition for the presence of bistability. Taking into account that usually ΔR ≈ R_0 and r ≪ R_0, so that the second factor on the right-hand side of (18) is of order unity, we find that fulfillment of the bistability condition is quite realistic. However, if the inequality x_m < 1 holds with an insufficient margin, the bistability range becomes rather narrow and the bistability is difficult to detect.
In practice, a periodically changing voltage of standard form (triangular, notched, sinusoidal) is used as the source voltage. The condition of quasi-stationarity is often not met: the state of the resistor does not have time to get close enough to the "stationary" state for the current value of the voltage u, so the state of the resistor tends toward a "stationary" state that is constantly changing. The resistance jumps arising due to the bistability can be strongly smoothed by this incomplete relaxation toward the stationary state.
Figure 5 shows the "volt-ampere characteristic" (more precisely, the trajectory of the point with coordinates U, I) for a triangular voltage u (of positive polarity), obtained by numerical solution of Eq. (6) at a = 2, b = 1, R_0/r = 50, R_0/(R_0 − ΔR) = 1000. The three loops correspond to three periods of the source voltage. The difference between the loops is explained by the fact that at the completion of a period of the voltage change the state variable does not return to its initial value. This is clearly seen in Fig. 6, where the graph of the dependence x(t) is presented along with the graph of the normalized source voltage.
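This behavior is easy to reproduce with a toy integration. The sketch below assumes a state equation of the concrete form ẋ = B_+ I^b − f_0 x^a with Ohm's law I = u/(r + R_0 − ΔR·x); this is consistent with the notation of this section but is not Eq. (6) copied verbatim, and all parameter values are assumptions. One period of a triangular positive voltage leaves x far from its initial value, which is exactly why successive loops of the volt-ampere trajectory differ.

```python
# Assumed toy parameters: a = 2, b = 1, R0/r = 50, R0/(R0 - dR) = 1000
a, b, Bp, f0 = 2.0, 1.0, 1.0, 1.0
r, R0 = 0.02, 1.0
dR = R0 - R0 / 1000.0                 # = 0.999

def u_tri(t, T=2.0, amp=0.3):
    """Triangular source voltage of positive polarity with period T."""
    s = (t % T) / T
    return amp * (2 * s if s < 0.5 else 2 * (1 - s))

dt, T = 1e-3, 2.0
x = 0.01                              # initial state
for k in range(int(T / dt)):          # one full period
    u = u_tri(k * dt)
    I = u / (r + R0 - dR * x)         # Ohm's law with state-dependent resistance
    x += dt * (Bp * I**b - f0 * x**a) # assumed relaxation equation
    x = min(max(x, 0.0), 1.0)         # the state variable lives on [0, 1]
print(x)  # noticeably above the initial 0.01: the loop does not close
```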
The considered model of a simple resistor element can explain much and predict some behavior, but not everything. That is natural: the world of resistor elements is diverse and cannot be covered by one simple model. Nevertheless, the capabilities of the model can be significantly expanded if we are not limited to a single element but build various combinations on its basis.
A real variable resistor has two poles (contacts). Either contact can be a source of variable (controlled) resistance. This is true in particular for metal-dielectric-metal, metal-semiconductor-metal and other similar structures. To model such structures it is necessary to use not one resistor element, but two oppositely directed resistor elements connected in parallel. The resulting combination has much richer capabilities (and is more complex) than a single resistor element.
At contacts between different materials (for example, a metal and a dielectric), diverse diode structures characterized by nonlinear volt-ampere characteristics may arise. If such a characteristic can be considered constant, with no memory, the effect of the structure reduces to a series connection of a diode or a similar nonlinear element. If the volt-ampere characteristic depends on previous events, then for such an additional diode with memory one can use a model similar to the one considered above, but with a nonlinear Ohm's law. In many cases it is convenient to treat the diode and resistor elements as one.
A simple resistor element can act as a memory element, an analog of a synapse, if the rate of resistor relaxation is somehow limited, for example if the index a in formula (11) is large enough. The resistor element can also be used as a nonlinear element, an analog of a neuron, since the resulting characteristics of the resistor are essentially nonlinear, and even bistability is possible. If a capacitor is connected in parallel to a resistor element (simple or combined), it is possible to implement the "leaky integration" found in many neural networks.
The considered model is useful not only for describing existing resistor elements; it can also indicate directions for improving prospective elements.
Funding. The work was financially supported by the State Program of SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2).
References
1. Adamatzky, A., Chua, L.: Memristor Networks. Springer, Heidelberg (2014)
2. Vaidyanathan, S., Volos, C.: Advances in Memristors, Memristive Devices and Systems. Springer, Heidelberg (2017)
3. Yang, J.J., Strukov, D.B., Stewart, D.R.: Memristive devices for computing. Nat.
Nanotechnol. 8, 13 (2013)
4. Radwan, A.G., Fouda, M.E.: On the Mathematical Modeling of Memristor, Memcapacitor
and Meminductor. Springer, Heidelberg (2015)
5. Yang, Y., Lu, W.: Nanoscale resistive switching devices: mechanisms and modeling.
Nanoscale 4, 10076 (2013)
6. Palagushkin, A.N., et al.: Aspects of the a-TiOx Memristor Active Medium Technology.
J. Appl. Phys. 124, 205109 (2018)
Semi-empirical Neural Network Models
of Hypersonic Vehicle 3D-Motion
Represented by Index 2 DAE
1 Introduction
the NARX approach lies in the fact that in the first case when generating a
model, some of the connections between state variables and control variables
of the source system of ODEs are embedded into the model unchanged. This allows us to reduce the number of adjustable parameters of the model and improves its generalization properties.
In [2], Runge-Kutta neural networks (RKNN) are proposed for building models of dynamic systems represented in the form of ODEs. This approach also assumes the use of theoretical knowledge about the modeled object, in the form of explicit Runge-Kutta integration formulas implemented in the network architecture. RKNNs have layers that, taking into account the connections between state variables, implement the right-hand sides of the ODE system. With this approach, the models of the right-hand sides are refined when the RKNN is trained.
In some problems, in addition to ODE, the theoretical model includes alge-
braic equality-type constraints, that is, the system of differential-algebraic equa-
tions (DAE) is the basis of the theoretical model. For DAE systems, the concept
of the index of a DAE system [3] is introduced. An example of such a problem is the control of a vehicle descending in the upper atmosphere.
In [1,4] a semi-empirical approach based on explicit, conditionally stable numerical integration methods is considered. This approach cannot be used directly for modeling systems represented by DAEs; a modification is needed that takes the specific character of DAE systems into account.
where y = y(t) is the vector of state variables of the system, z = z(t) is the state variable that serves as the algebraic variable of DAE (1), and u = u(t) are the control variables. We reduce the index of system (1) by differentiating the algebraic constraint [3]; the new algebraic constraint takes the form 0 = 2ġ + g. For index-1 DAE systems, the use of one-step s-stage methods of numerical integration is promising. The implicit Runge-Kutta (IRK) method is often used; we propose an IRK method based on the Radau IIA quadrature formula [3,5,6]. Applying the IRK method to the DAE system, we obtain (2)-(3). Using an implicit scheme involves solving the system of nonlinear equations (2) by Newton's method at each step of integration:
Y_{ni} = y_n + h ∑_{j=1}^{s} a_{ij} f(t_n + c_j h, Y_{nj}, Z_{nj}),    0 = g̃(t_n + c_i h, Y_{ni}, Z_{ni}),  (2)

y_{n+1} = y_n + h ∑_{i=1}^{s} b_i f(t_n + c_i h, Y_{ni}, Z_{ni}),    z_{n+1} = R(∞) z_n + ∑_{i,j=1}^{s} b_i ω_{ij} Z_{nj},  (3)
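A minimal sketch of scheme (2)-(3) for a semi-explicit index-1 DAE, using the 2-stage (order 3) Radau IIA tableau. Radau IIA is stiffly accurate (b equals the last row of A, and R(∞) = 0), so y_{n+1} and z_{n+1} are simply the last stage values, which replaces the general update (3). The stage equations (2) are solved by Newton's method with a finite-difference Jacobian; the test DAE y′ = z, 0 = y² + z² − 1 (solution y = sin t, z = cos t from y(0) = 0, z(0) = 1) is an assumption chosen purely for illustration.

```python
import numpy as np

# 2-stage Radau IIA tableau (order 3), stiffly accurate: b = A[-1], c = [1/3, 1]
A = np.array([[5/12, -1/12],
              [3/4,   1/4]])

def radau_step(f, g, yn, zn, h):
    """One step of (2) for the semi-explicit DAE y' = f(y, z), 0 = g(y, z).
    Unknowns: the stage values (Y1, Z1, Y2, Z2), solved by Newton's method."""
    def residual(u):
        Y, Z = u[0::2], u[1::2]
        F = np.array([f(Y[0], Z[0]), f(Y[1], Z[1])])
        return np.array([Y[0] - yn - h * (A[0] @ F), g(Y[0], Z[0]),
                         Y[1] - yn - h * (A[1] @ F), g(Y[1], Z[1])])
    u = np.array([yn, zn, yn, zn], dtype=float)
    for _ in range(20):
        r = residual(u)
        if np.max(np.abs(r)) < 1e-12:
            break
        J = np.empty((4, 4))
        for k in range(4):            # finite-difference Jacobian
            du = np.zeros(4)
            du[k] = 1e-7
            J[:, k] = (residual(u + du) - r) / 1e-7
        u = u - np.linalg.solve(J, r)
    # stiff accuracy: y_{n+1}, z_{n+1} are the last stage values
    return u[2], u[3]

f = lambda y, z: z                  # differential part: y' = z
g = lambda y, z: y**2 + z**2 - 1.0  # constraint (index 1, since dg/dz != 0 here)
y, z, h = 0.0, 1.0, 0.05
for _ in range(20):                 # integrate to t = 1
    y, z = radau_step(f, g, y, z, h)
print(y)                            # close to sin(1)
```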
Semi-empirical ANN Index 2 DAE Models of Hypersonic Vehicle 3D-Motion 337
where y_n, y_{n+1} are the network input and output, respectively, and N_f, W are the ANN modules that implement the right-hand sides of the ODE system and their weights. When training the network, the delta rule (5) is used to modify the weights of the ANN modules. The derivatives are calculated by the chain rule, taking into account that the network error propagates through the cascade circuit to the network input and that the same ANN modules are used at each stage of the method:
∂E/∂w_j = −2 (o_{n+1} − y_{n+1}) ∂y_{n+1}/∂w_j,    ∂y_{n+1}/∂w_j = (h/2) (∂K_0/∂w_j + ∂K_1/∂w_j),

∂K_0/∂w_j = (∂N_f(y_n, W)/∂W) (∂W/∂w_j),  (5)

∂K_1/∂w_j = h (∂N_f(y_n + hK_0, W)/∂y) (∂K_0/∂w_j) + (∂N_f(y_n + hK_0, W)/∂W) (∂W/∂w_j),
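The chain rule (5) can be verified on a toy scalar instance of the two-stage scheme. Below, the ANN module N_f is replaced by the hypothetical one-parameter function N_f(y; w) = w·tanh(y) (an assumption purely for illustration); the derivatives prescribed by (5), specialized to a single scalar weight, are compared with a finite-difference estimate of ∂y_{n+1}/∂w.

```python
import math

# Hypothetical one-parameter "ANN module": Nf(y; w) = w * tanh(y)
Nf   = lambda y, w: w * math.tanh(y)
Nf_y = lambda y, w: w * (1 - math.tanh(y)**2)   # dNf/dy
Nf_w = lambda y, w: math.tanh(y)                # dNf/dw

def step(yn, w, h):
    """Two-stage scheme: K0 = Nf(yn), K1 = Nf(yn + h*K0), y_{n+1} = yn + h/2*(K0+K1)."""
    K0 = Nf(yn, w)
    K1 = Nf(yn + h * K0, w)
    return yn + 0.5 * h * (K0 + K1)

def grad(yn, w, h):
    """dy_{n+1}/dw by the chain rule (5), specialized to a scalar weight."""
    K0 = Nf(yn, w)
    dK0 = Nf_w(yn, w)
    y1 = yn + h * K0
    dK1 = h * Nf_y(y1, w) * dK0 + Nf_w(y1, w)
    return 0.5 * h * (dK0 + dK1)

yn, w, h = 0.7, 1.3, 0.1
eps = 1e-6
fd = (step(yn, w + eps, h) - step(yn, w - eps, h)) / (2 * eps)
print(grad(yn, w, h), fd)   # the two values agree
```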
338 D. S. Kozlov and Y. V. Tiumentsev
3 Simulation Results
α̇ = q − tan β (p cos α + r sin α) + (G_{WZ} + m a_{cW,Z})/(m V cos β),    d_{a,act} = d_{pitch} + d_{roll},

β̇ = p sin α − r cos α + (G_{WY} + m a_{cW,Y})/(m V),    d_{e,act} = d_{pitch} − d_{roll},  (7)

f_{xW} = −D,   f_{yW} = Y cos φ_W + L sin φ_W,   f_{zW} = Y sin φ_W − L cos φ_W,
where μ is the longitude, λ is the geocentric latitude, γ is the relative flight path
angle, ψW is the relative azimuth, H is the altitude, r is the distance from the
Earth center to the center of mass of the vehicle, V is the relative velocity, φW
is the bank angle, α is the angle of attack, β is the angle of sideslip, [ψ, θ, φ]
are Euler angles, [p, q, r]T are components of angular velocity vector, B is a
matrix transforming vectors from vehicle-carried local Earth reference frame to
body-fixed, D, L, Y are total aerodynamic drag, lift and side forces respectively,
L̄, M̄ , N̄ are aerodynamic rolling, pitching and yawing moments respectively,
Ix , Iy , Iz are the roll, pitch and yaw moments of inertia respectively, da , de , dr
are the deflections of the right and left elevons and the rudder, da,act , de,act , dr,act
are control signals for right and left elevons and the rudder actuators, dpitch , droll
are pitch and roll motion control signals, T = 0.02 sec are the time constants
for right/left elevons and rudder actuators, ξ = 0.707 are the right/left elevons
and rudder actuators damping ratios, ωE is the Earth rotational rate, g is the
geopotential function, m = 191902 lb is the mass of the vehicle, acW , GW are the
vectors of the Coriolis acceleration and force of gravity in wind-axes reference
frame respectively.
In the DAE system, H, μ, λ, V, ψ_W, γ, ψ, θ, φ, p, q, r, α, β, d_a, d_e, d_r are the state variables and d_roll is the algebraic variable; the pitch motion control signal d_pitch is a control variable. The rudder control law is given in [6]. We calculate the values of d_roll at each step of the numerical integration of the DAE system, following the (α–φ_W) technique for control of aircraft descent in the upper atmosphere [5–7]. To ensure movement along a given trajectory, the model (6)–(7) is closed by an algebraic equality (8) describing the variation of the relative flight path angle γ in the range [−4.2385°, −10°]. The resulting system of equations is an index-2 DAE system. Equation (8) is transformed for the calculations: for the variable γ̇, the index reduction by differentiation and the substitution of the right-hand sides of (6)–(7) are performed.
[Four panels vs. t ∈ [0, 18] sec: the pitch control signal d_pitch (deg); the absolute error E_q (×10^-3); the pitch rate q (rad/sec); the algebraic variable d_roll (deg)]
Fig. 2. The semi-empirical model output for values from the test set
step t = 0.2 s. The initial values were H = 1.272e+5 ft, V = 6.922e+3 ft/sec, γ = −4.2385°, ψ_W = 55.316°, μ = 183.8°, λ = 34.4°, ψ = 69.767°, θ = 9.64°, φ = 46.69°, α = 20°, β = 0°, ω = 0 rad/sec, d_roll = 0°, ḋ = 0, d_a = d_e = 0°, d_r = 1°. The hypersonic vehicle characteristics I_x, I_y, I_z, x_cg, S, c̄, b and the aerodynamic force and moment coefficient models (D, L, Y, L̄, M̄, N̄) are taken from [8]. To implement the model of the hypersonic vehicle motion, a semi-empirical model was used that realizes the order-3 IRK method of numerical integration based on Radau IIA quadrature formulas. A perceptron-type network with 12 neurons in the hidden layer was used as the ANN module for C_m. Figure 2 shows the values of the pitch control signal d_pitch from the test set, the values of the pitch rate q calculated using the semi-empirical model, the values of the algebraic variable d_roll, and the corresponding absolute error E_q of the q values reproduced by the semi-empirical model. The root mean square deviations for the training, validation, and test sets are 6.6207e−4, 7.8975e−4, and 0.0014, respectively.
4 Conclusions
The semi-empirical model was implemented using, as theoretical knowledge, the equations of the full model of the hypersonic vehicle motion in a specific part of the descent into the atmosphere. We present this system of equations as a DAE system of index 2. The aerodynamic pitching moment coefficient, implemented as an ANN module of the semi-empirical model, was identified to verify the training properties of this model. The obtained results demonstrate the efficiency of the semi-empirical approach for neural network modeling of complex dynamical systems.
References
1. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Y.V., Chernyshev, A.V.: Neural network based semi-empirical models for controlled dynamical systems. J. Comput. Inf. Technol. 9, 3–10 (2013). (in Russian)
2. Wang, Y.J., Lin, C.T.: Runge-Kutta neural network for identification of dynamical
systems in high accuracy. IEEE Trans. Neural Netw. 9(2), 294–307 (1998)
3. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and
Differential-Algebraic Problems, 2nd edn. Springer, Heidelberg (2002)
4. Egorchev, M.V., Tiumentsev, Y.V.: Learning of semi-empirical neural network
model of aircraft three-axis rotational motion. Opt. Mem. Neural Netw. (Inf. Opt.)
24(3), 201–208 (2015)
5. Kozlov, D.S., Tiumentsev, Y.V.: In: Proceedings of 8th Annual International Con-
ference on Biologically Inspired Cognitive Architectures, BICA 2017, vol. 128, pp.
252–257 (2018)
6. Kozlov, D.S., Tiumentsev, Y.V.: Neural network based semi-empirical models of
3D-motion of hypersonic vehicle. In: Advances in Neural Computation, Machine
Learning, and Cognitive Research II, pp. 196–201. Springer, Cham (2019)
7. Kozlov, D.S., Tiumentsev, Y.V.: Neural network based semi-empirical models for
dynamical systems described by differential-algebraic equations. Opt. Mem. Neural
Netw. (Inf. Opt.) 24(4), 279–287 (2015)
8. Shaughnessy, J.D., et al.: Hypersonic vehicle simulation model: winged-cone configuration. Technical report, NASA (1990)
Style Transfer with Adaptation
to the Central Objects of the Scene
1 Introduction
Non-photorealistic rendering or image stylization [5] is a classical problem in
computer vision, where the task is to render a content image in a given style.
Early methods [3,7,9] reproduce specific styles (e.g., oil paintings or pencil drawings) using hard-coded features and algorithms.
Style transfer is the problem of transferring a style from an arbitrary image representing that style to any content image, as shown in Fig. 1. Gatys et al. [2] found that this task can be performed surprisingly well using deep convolutional neural networks. Their main idea is to find, in the space of images, a picture semantically reflecting the content of the content image and the style of the style image. These two contradicting goals are balanced by simultaneously minimizing a content loss and a style loss:
y = arg min_x { L_content(x, x_c, α) + L_style(x, x_s) }  (1)
c Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 342–350, 2020.
https://doi.org/10.1007/978-3-030-30425-6_40
Style Transfer with Adaptation to the Central Objects of the Scene 343
where x_c is the content image, x_s the style image, y the resulting stylized image, and the parameter α is a weight factor (multiplier) inside the content loss function, controlling the strength of stylization (Fig. 2a); lower α imposes more style and vice versa. The shortcoming of this approach is that the style is imposed uniformly onto the whole content image, distorting important central objects of the image that are critical for perception. For example, it is hard to say what kind of birds sit on the tree (Fig. 2b), because the small details of the bird silhouettes are lost during stylization.
for the content image. Next, this mask is used to impose the style with spatially varying strength controlled by the importance mask. This allows two contradicting goals to be achieved: stylization is gentle on the central objects of the image that are critical for perception, such as human faces, houses, cars, etc., and stylization is strong for the rest of the image, thus expressing a vivid style.
The paper is organized as follows. Section 2 gives a description of the proposed method and provides qualitative comparisons with the baseline stylization method of Gatys et al. [2]. Section 3 provides the details of the user evaluation study and summarizes its results, highlighting the superiority of the proposed solution. Section 4 concludes.
2 Method
2.1 Non-uniform Stylization
Consider the loss function in the optimization problem (1). In the original paper
[2] content loss is formalized as follows:
L_content(x, x_c, α) = α ∑_{i,j,c} ( F^l_{i,j,c}(x) − F^l_{i,j,c}(x_c) )²  (2)
where F^l(z) ∈ R^{W_l×H_l×C_l} denotes the inner tensor representation of image z on the l-th layer of the convolutional neural network, (i, j) are spatial coordinates and c is the channel index. Instead of using a scalar α, we propose to use a matrix α ∈ R^{W_l×H_l} with a different value α_{i,j} for each spatial location (i, j):
L_content(x, x_c, α) = ∑_{i,j,c} α_{i,j} ( F^l_{i,j,c}(x) − F^l_{i,j,c}(x_c) )²  (3)
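Loss (3) differs from (2) only by the per-location weights, which the following numpy sketch makes explicit (framework-agnostic; a real style-transfer implementation would apply it to CNN feature tensors and backpropagate through it). The feature arrays and the importance map here are illustrative assumptions.

```python
import numpy as np

def content_loss_uniform(F, Fc, alpha):
    """Eq. (2): scalar weight alpha, features F, Fc of shape (W, H, C)."""
    return alpha * np.sum((F - Fc) ** 2)

def content_loss_spatial(F, Fc, alpha_map):
    """Eq. (3): per-location weights alpha_map of shape (W, H)."""
    return np.sum(alpha_map[:, :, None] * (F - Fc) ** 2)

rng = np.random.default_rng(0)
F, Fc = rng.normal(size=(8, 8, 4)), rng.normal(size=(8, 8, 4))

# An importance map that protects a central region (more content, less style)
alpha_map = np.full((8, 8), 1.0)
alpha_map[2:6, 2:6] = 10.0
loss = content_loss_spatial(F, Fc, alpha_map)

# Sanity check: a constant map reproduces the uniform loss (2)
uniform = content_loss_spatial(F, Fc, np.full((8, 8), 3.0))
print(np.isclose(uniform, content_loss_uniform(F, Fc, 3.0)))  # True
```

The sanity check confirms that a constant importance map reduces (3) exactly to the uniform loss (2).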
time. We split the whole image into a set of regions and fill each region one by one, evaluating its importance using the above principle. In this way we construct an importance map α_{i,j} measuring the semantic significance of each location (i, j) in the image. This importance map is used as the matrix α in the spatially aware content loss (3) of the style transfer algorithm (1).
Fig. 3. (a) The probability distribution for input. (b) Changing the probability distribution when the patch is overwritten
There are a lot of small details on the dog's muzzle that are lost in the baseline approach and preserved in our algorithm.
Fig. 5. (a) Averaging α matrices. (b) Baseline. (c) Averaging patch stylization.
Fig. 7. (a) Baseline. (b) Averaging patch stylization. (c) Averaging super-pixel stylization
Table 1. Frequencies with which each of the proposed methods is preferred over the baseline of Gatys et al. [2].

Method | Frequency
Patches-based importance generation | 66%
Superpixel-based importance generation | 72%
Segmentation-based importance generation | 80%
4 Conclusion
A new style transfer method with spatially varying strength is proposed in this work. The stylization strength is controlled for each pixel by an automatically generated importance mask. Three methods (patch-based, superpixel-based and segmentation-based) are proposed to generate the importance mask. Qualitative comparisons and the conducted user evaluation study demonstrate the superiority of the proposed method over the classical style transfer method of Gatys et al. [2]: strong and expressive style transfer for the background together with gentler style transfer for the central objects of the content image minimizes distortions of the important details. Among the three proposed importance mask generation approaches, the segmentation-based method showed the highest quality, which may be attributed to more accurate boundary estimation of the central objects of the image.
References
1. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
2. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional
neural networks. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 2414–2423 (2016)
3. Gooch, B., Gooch, A.: Non-photorealistic rendering. AK Peters/CRC Press, Natick
(2001)
4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
5. Adobe Research: Image stylization: history and future. https://research.adobe.com/news/image-stylization-history-and-future/. Accessed 2 July 2019
6. Rosebrock, A.: Segmentation: a SLIC superpixel tutorial using Python. https://www.pyimagesearch.com/2014/07/28/a-slic-superpixel-tutorial-using-python/. Accessed 2 July 2019
7. Rosin, P., Collomosse, J.: Image and video-based artistic stylisation, vol. 42.
Springer, Heidelberg (2012)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
350 A. Schekalev and V. Kitov
1 Introduction
Our version of the neural network approach to solving differential equations has turned out to be quite universal [1–8]. At the same time, it is not devoid of several drawbacks compared to the classical methods of meshes, finite elements, etc.
First, neural network training is a very resource-intensive procedure. Second, the required size of the neural network and the time of its training increase dramatically as the accuracy requirements for the model are strengthened. In this paper, we consider the methods of forming multilayer functional approximations proposed by
Let us consider the Cauchy problem for a system of ordinary differential equations
y′(x) = f(x, y(x)),   y(x_0) = y_0  (1)

y_{k+1} = y_k + F(f, h_k, x_k, y_k).  (2)
is known.
The constant C depends on the estimates of the function f and its derivatives in the region in which the solution is found [11].
More accurate formulas are obtained by applying second-order methods [11], for which the estimate (3) is replaced by the estimate ‖y(x_k) − y_k‖ ≤ C max(h_k)².
One such method is the corrected Euler method, which works according to the formula

F(f, h_k, x_k, y_k) = h_k [ f(x_k, y_k) + (h_k/2) ( f′_x(x_k, y_k) + f′_y(x_k, y_k) f(x_k, y_k) ) ].  (4)
The Construction of the Approximate Solution of the Chemical Reactor 353
For a second-order equation of the form y″(x) = f(x, y), the Störmer method is even more accurate [11].
Quite often in practice the formulation of problem (1) includes parameters:

y′(x) = f(x, y(x), μ),   y(x_0) = y_0(μ).  (6)
Here the vector of these parameters is denoted by μ. In this situation, problem (6) is usually solved numerically for a sufficiently representative set of parameter values. Our approach automatically gives an approximate version of the required dependence, for which y_n(x, μ) is taken.
Another common complication of problem (1) is the boundary value problem, which has the form

y′(x) = f(x, y(x)),   u(x_0) = u_0,   v(x_0 + a) = v_0.
d²y/dx² + δ exp(y) = 0,   (dy/dx)(0) = 0,   y(1) = 0.  (8)
This problem is interesting because we know the exact solution, the domain of existence of the solution, and the parameter values at which the solution of the problem does not exist (δ > δ* ≈ 0.878458).
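Problem (8) is also easy to solve to high accuracy by shooting, which provides a reliable reference against which approximate solutions can be compared. The sketch below integrates y″ = −δ exp(y) with RK4 from the left end and bisects on the unknown value y(0); the bracket y(0) ∈ [0, 1], which captures the lower solution branch for δ well below δ*, is an assumption.

```python
import math

def integrate(delta, y0, n=2000):
    """RK4 for y'' = -delta*exp(y) from x = 0 (y = y0, y' = 0) to x = 1; returns y(1)."""
    h, y, v = 1.0 / n, y0, 0.0
    f = lambda y, v: (v, -delta * math.exp(y))
    for _ in range(n):
        k1 = f(y, v)
        k2 = f(y + h/2*k1[0], v + h/2*k1[1])
        k3 = f(y + h/2*k2[0], v + h/2*k2[1])
        k4 = f(y + h*k3[0],  v + h*k3[1])
        y += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        v += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return y

def shoot(delta, lo=0.0, hi=1.0):
    """Bisection on y(0) so that y(1) = 0 (the lower of the two solution branches)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if integrate(delta, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y0 = shoot(0.5)
print(y0, integrate(0.5, y0))   # the residual y(1) is ~0
```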
3 Calculation
According to the above considerations, at the first step we approximate the exponential from Eq. (8) by the perceptron exp(y) ≈ 4.09 − 3.71 tanh(1.19 − 0.794 y) on the interval [0, 1] (it is known [12] that the sought solution lies on this interval).
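The quality of this one-neuron approximation is easy to verify on a dense grid (a sketch): the maximum deviation of 4.09 − 3.71 tanh(1.19 − 0.794y) from exp(y) on y ∈ [0, 1] is only a few hundredths.

```python
import numpy as np

y = np.linspace(0.0, 1.0, 100001)
approx = 4.09 - 3.71 * np.tanh(1.19 - 0.794 * y)
err = np.max(np.abs(np.exp(y) - approx))
print(err)  # a few hundredths
```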
In constructing the multilayer solution, we used our modification of the corrected Euler method (4) for the first step and our modification of the Störmer method (5) for the subsequent ones. For two layers, we obtain an approximate solution y_2(x, δ).
Here y_0 is the unknown initial value of the desired function at the left end of the interval [0, 1]. To determine the parameter y_0, we use the condition y(1) = 0 at the right end of the interval, acting in one of two ways. The first method is to determine the value of y_0 for fixed values of the parameter δ.
The maximum difference between the exact solution and the approximate solution y_2(x, δ) was 0.00041 at the parameter value δ = 0.1, 0.0046 at δ = 0.5, and 0.14 at δ = 0.8 (Fig. 1).
Fig. 1 The exact solution and the approximate two-layer solution y_2(x, δ) at the parameter value (a) δ = 0.5, (b) δ = 0.8.
The results showed that for small values of δ the approximate solution is close to the exact solution. However, as the parameter δ approaches the value δ*, the accuracy deteriorates significantly.
For three layers, we obtain an approximate solution y_3(x, δ).
The exact solution and the approximate three-layer solution y_3(x, δ) at δ = 0.1 and at δ = 0.5 practically merge, so we do not give the corresponding graphs. The maximum difference between the exact solution and the approximate solution y_3(x, δ) was 0.00037 at the parameter value δ = 0.1, 0.0016 at δ = 0.5, and 0.026 at δ = 0.8.
As the number of layers increases, the accuracy improves, but the formulas become more cumbersome.
The maximum difference between the exact solution and the approximate four-layer solution y_4(x, δ) was 0.00032 at δ = 0.1, 0.00044 at δ = 0.5, and 0.015 at δ = 0.8.
We present graphs of the exact solution and of the approximate three-layer solution y_3(x, δ) and four-layer solution y_4(x, δ) at the parameter value δ = 0.8 in Fig. 2.
Fig. 2 The exact solution and the approximate solution at the parameter value δ = 0.8: (a) three-layer y_3(x, δ), (b) four-layer y_4(x, δ).
∑_{i=1}^{m} y_n^2(1, δ_i).  (9)
356 D. A. Tarkhov and A. N. Vasilyev
Further, we present a result obtained using a three-layer solution. When optimizing the functional (9) for m = 100 and δ_i = i·δ*/m, we obtained the corresponding dependence.
The maximum difference between the exact solution and the approximate solution y*(x, δ) was 0.0055 at δ = 0.1, 0.0069 at δ = 0.5, and 0.014 at δ = 0.8.
To illustrate the accuracy of the obtained solution, we give the following graphs
(Fig. 3).
Fig. 3 The exact solution and the approximate solution y*(x, δ) at the parameter value: (a) δ = 0.5, (b) δ = 0.8.
The maximum difference between the exact and the approximate solution u_3(x, δ) was 0.000035 at the parameter value δ = 0.1, 0.0048 at δ = 0.5, and 0.12 at δ = 0.8.
To illustrate the accuracy of the obtained solution, we give the following graphs
(Fig. 4).
Fig. 4 The exact solution and the approximate solution u_3(x, δ) at the parameter value: (a) δ = 0.5 and (b) δ = 0.8.
As we expected, our method gives a more uniform approximation over the entire interval of variation of the parameter δ.
4 Conclusion
We have studied new methods for constructing approximate neural network solutions of differential equations. The methods do not require resource-intensive training procedures and allow building solutions with guaranteed accuracy. As a test problem, we considered the boundary value problem (8), which simulates the processes in a chemical reactor [12]. As a result, we obtained the above explicit solutions, which are more accurate than the approximate solutions of [3], in which a network with 100 neurons was used.
Acknowledgment. This paper is based on research carried out with the financial support of a grant of the Russian Science Foundation (project No. 18-19-00474).
References
1. Tarkhov, D., Vasilyev, A.: New neural network technique to the numerical solution of mathematical physics problems. I: Simple problems. Opt. Mem. Neural Netw. (Inf. Opt.) 14, 59–72 (2005)
2. Tarkhov, D., Vasilyev, A.: New neural network technique to the numerical solution of mathematical physics problems. II: Complicated and nonstandard problems. Opt. Mem. Neural Netw. (Inf. Opt.) 14, 97–122 (2005)
3. Shemyakina, T.A., Tarkhov, D.A., Vasilyev, A.N.: Neural network technique for processes
modeling in porous catalyst and chemical reactor. In: Cheng, L. et al. (eds.) Advances in
Neural Networks – ISNN 2016. Lecture Notes in Computer Science, vol. 9719, pp. 547–554.
Springer, Cham (2016)
4. Budkina, E.M., Kuznetsov, E.B., Lazovskaya, T.V., Leonov, S.S., Tarkhov, D.A., Vasilyev,
A.N.: Neural network technique in boundary value problems for ordinary differential
equations. In: Cheng, L. et al. (eds.) Advances in Neural Networks – ISNN 2016. Lecture
Notes in Computer Science, vol. 9719, pp. 277–283. Springer, Cham (2016)
5. Lozhkina, O., Lozhkin, V., Nevmerzhitsky, N., Tarkhov, D., Vasilyev, A.: Motor transport
related harmful PM2.5 and PM10: from on-road measurements to the modeling of air
pollution by neural network approach on street and urban level. In: Journal of Physics
Conference Series, vol. 772 (2016). http://iopscience.iop.org/article/10.1088/1742-6596/772/
1/012031
6. Kaverzneva, T., Lazovskaya, T., Tarkhov, D., Vasilyev, A.: Neural network modeling of air
pollution in tunnels according to indirect measurements. In: Journal of Physics Conference
Series, vol. 772 (2016). http://iopscience.iop.org/article/10.1088/1742-6596/772/1/012035
7. Lazovskaya, T.V., Tarkhov, D.A., Vasilyev, A.N.: Parametric Neural Network Modeling in
Engineering. Recent Pat. Eng. 11(1), 10–15 (2017)
8. Antonov, V., Tarkhov, D., Vasilyev, A.: Unified approach to constructing the neural
network models of real objects. Part 1 Math. Models Meth. Appl. Sci. 41(18), 9244–9251
(2018)
9. Lazovskaya, T., Tarkhov, D.: Multilayer neural network models, based on grid methods. In:
IOP Conference Series: Materials Science and Engineering, vol. 158 (2016). http://
iopscience.iop.org/article/10.1088/1757-899X/158/1/01206
10. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117
(2015)
11. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems, xiv, p. 480. Springer, Berlin (1987)
12. Hlavacek, V., Marek, M., Kubicek, M.: Modelling of chemical reactors Part X. Chem. Eng.
Sci. 23 (1968)
13. Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Sig. Process. 7
(3–4), 1–199 (2014)
14. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127
(2009)
Linear Prediction Algorithms for Lossless
Audio Data Compression
Abstract. The paper considers the use of linear prediction algorithms such as LPC, FLPC, and Wise-LPC in lossless audio data compression. In addition to the prediction methods, the problems of best coding and of optimal sampling window selection are investigated. The Wise-LPC algorithm is shown to allow a 1–5% improvement of audio signal compression over the conventional LPC and FLPC approaches. The prediction error has a Laplace distribution, with its variance decreasing smoothly and reaching "saturation" as the window width grows.
1 Introduction
Neural network algorithms provide new tools for different fields of science and technology.
They have recently helped to make breakthroughs in pattern and speech recognition,
text translation, and intellectual games such as Go and chess. On the other hand,
data compression, storage and transmission still use algorithms developed in the
1980s and 1990s, or in the early 2000s at best. These include such well-known lossless
data compression algorithms and data formats as zip, png, flac, and exe, as well as
lossy compression techniques, e.g. mp3, jpeg, mpeg.
Here we would like to elaborate on the FLAC data format [1–3] once again. Today
this format is the most popular for lossless audio data compression. Article [2], which
gives the basics of the algorithm, was taken as a starting point for further consideration.
The FLAC format is the combination of linear predictive coding (LPC) [4] and
Huffman-Golomb coding of the prediction errors [5, 6]. Below we discuss the features of the
prediction and compression algorithms and present the experimental results.
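As an illustration of the error-coding stage, a minimal Rice/Golomb coder for signed residuals might look as follows (our own sketch, not the FLAC reference implementation; real coders choose the parameter k per block):

```python
def rice_encode(e, k):
    """Encode a signed residual: zigzag map, then unary quotient plus k-bit remainder."""
    u = 2 * e if e >= 0 else -2 * e - 1        # zigzag: small |e| -> short codes
    q, r = u >> k, u & ((1 << k) - 1)
    return '1' * q + '0' + (format(r, 'b').zfill(k) if k else '')

def rice_decode(bits, k):
    """Invert rice_encode for a single codeword."""
    q = bits.index('0')                         # unary part ends at the first '0'
    r = int(bits[q + 1:q + 1 + k] or '0', 2)
    u = (q << k) | r
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)
```

For Laplace-distributed errors the optimal k grows with the error variance, which is why keeping the prediction error small directly shortens the stored codewords.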
$$\tilde{x}_t = \sum_{i=1}^{p} a_i x_{t-i} = a_1 x_{t-1} + a_2 x_{t-2} + \ldots + a_p x_{t-p} \qquad (1)$$
$$e_t = x_t - \tilde{x}_t \qquad (2)$$
The nearer to zero the value of the error (2) is, the fewer data bits are needed for
storage. For this reason the unknown coefficients $\{a_i\}_{i=1}^{p}$ are determined by minimizing
the mean square deviation of the estimate from the actual amplitude:
$$E = \sum_{t=0}^{w} \left( x_t - \sum_{i=1}^{p} a_i x_{t-i} \right)^2 \qquad (3)$$
where $x_t$ are the signal amplitudes at moments $t \in [0, w]$ and $w$ is the sample length. Though
the sample length $w$ is not defined strictly and is often a mere standard requirement, the
usual number of readings in a sample is much larger than the order $p$ of the linear
model ($w \gg p$). For example, the standard LPC10 used in speech compression has the
prediction order $p = 10$ and the number of readings $w = 120$.
It can be shown [4] that the minimization of (3) reduces to the set of p linear
equations with a Toeplitz matrix consisting only of the autocorrelation coefficients:
$$R_l = \sum_{t=0}^{w} x_t x_{t-l} \qquad (4)$$
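A minimal sketch of this procedure (our own illustration, not the authors' code): build the Toeplitz system from the autocorrelation coefficients (4), solve it for the coefficients of (1), and predict:

```python
import numpy as np

def lpc_coefficients(x, p):
    """Solve the Toeplitz normal equations built from the autocorrelation
    coefficients (4) for the a_i that minimize the squared error (3)."""
    x = np.asarray(x, dtype=float)
    w = len(x)
    R = np.array([np.dot(x[l:], x[:w - l]) for l in range(p + 1)])
    T = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(T, R[1:p + 1])

def lpc_predict(x, a):
    """Linear prediction (1): x~_t = a_1 x_{t-1} + ... + a_p x_{t-p}."""
    x = np.asarray(x, dtype=float)
    p = len(a)
    xhat = x.copy()                      # first p samples are left as-is
    for t in range(p, len(x)):
        xhat[t] = np.dot(a, x[t - p:t][::-1])
    return xhat
```

On a signal generated by a stable autoregression the solver recovers the generating coefficients and the residual variance drops well below the signal variance.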
The FLPC algorithm has an advantage over the LPC method: it does not require
computing the autocorrelation coefficients (4). Since all coefficients $\{a_i\}_{i=1}^{p}$ are fixed in the
FLPC algorithm, there is no need to code and store anything but the errors.
2.3 Wise-LPC
It can be easily shown that FLPC of the p-th order gives the p-th derivative of the input
signal. We suggest a new algorithm Wise-LPC which is a combination of FLPC and
LPC algorithms. The idea is to determine how many derivatives of the signal (the order
of FLPC) should be taken before the use of the LPC method. The Wise-LPC algorithm
includes three steps:
1. Consecutive differentiation of the signal and computation of the error.
2. If the variance of the error for the n-th derivative is smaller than that for the (n+1)-th
derivative, the process is stopped and the n-th derivative is chosen.
3. The application of the p-th-order LPC to the n-th derivative.
The time complexity remains linear when the Wise-LPC method is used.
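Steps 1–2 above can be sketched as follows (our own reading of the algorithm; `max_n` is a hypothetical safety cap, and the variances of successive finite differences stand in for the FLPC error variances):

```python
import numpy as np

def choose_diff_degree(x, max_n=8):
    """Steps 1-2 of Wise-LPC: keep differencing while the variance of the
    next finite difference still decreases; return the chosen degree n and
    the n-th difference, to which the ordinary LPC (step 3) is applied."""
    d = np.asarray(x, dtype=float)
    n = 0
    while n < max_n:
        d_next = np.diff(d)
        if np.var(d_next) >= np.var(d):   # variance stopped improving
            break
        d, n = d_next, n + 1
    return n, d
```

For white noise, differencing doubles the variance, so the chosen degree is 0; for a smooth low-frequency signal several differences help, matching the behavior reported below for low-frequency signals.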
3 Results
division is always made before processing and compression of the signal. The smaller
the sample length, the simpler the transmission of this portion of the signal and the
lower the risk that it gets distorted or lost in transmission. On the other hand, as men-
tioned in Sect. 2.1, the realization of the LPC algorithm requires that the
sample length not be too small, because it affects the precision with which the
autocorrelation coefficients are determined. As of now there is no mathematically proven
recommendation about which window width is best for which kind of signal.
Figure 1 illustrates the spread of the error for $p = 3$ when the window width varies.
Figure 2a shows the relation between the variance of the error, the degree of
approximation, and the window width $w$ for the same audio signal. It is seen that with $p = 3$
the widening of the window beyond $w = 4096$ makes no sense because it doesn't lead
to notable improvement, i.e. “saturation” sets in.
Fig. 1. The spread of the error with varying window width $w$ and $p = 3$.
Fig. 2. (a) The relation between the variance of the error and the degree of approximation $p$ and
window width $w$; (b) Comparison of audio signal compression using the LPC, FLPC and Wise-LPC
methods. The optimal degree of differentiation is $n = 2$; the order of LPC changes from 0 to 9.
the best differentiation degree in Wise-LPC depends on the signal spectrum: the
higher the upper frequency of the signal, the lower the differentiation degree. The
upper frequency $F_{upper}$ is determined by the 20 dB threshold.
It is seen from Table 1 that for high-frequency signals ($F_{upper} = 12\ldots20$
kHz) the differentiation degree is $n = 1$ or $n = 2$. In the case of low-frequency signals
($F_{upper} = 0\ldots10$ kHz), $n = 3$ or $n = 4$. The compression results for low-frequency
signals are significantly better. In particular, the Wise-LPC algorithm should work well
in speech compression because the human speech frequency spectrum extends from
0.3 to 3.4 kHz.
4 Conclusions
The research allows the following conclusions. The variance of the error falls smoothly
and the width of the Laplace distribution approaches “saturation” as the window
width grows.
The Wise-LPC algorithm permits better compression while retaining the linear time
complexity. On average, the Wise-LPC algorithm improves the compression by 1–5%
for broadband high-frequency signals and 5–10% for low-frequency signals. This
suggests that the algorithm should work well in speech encoding.
The FLAC format involves the combination of linear prediction and Huffman-
Golomb error coding. Note that the division of the compression procedure into two
unrelated stages is a popular trick in compression algorithms: first the extrapolation
364 L. S. Telyatnikov and I. M. Karandashev
algorithm is generated, and then a second algorithm is built that takes the remnants
(prediction errors) and stores them in a compact form. The approach is also popular
in modern neural-net-based compression techniques, where neural nets are usually used
only in the first stage (data prediction) [7]. We hope to soon witness the
advent of end-to-end systems where neural nets are engaged in both stages con-
currently [8]. This kind of system is our next goal.
Acknowledgements. The research was supported by the State Program SRISA RAS No. 0065-
2019-0003 (AAA-A19-119011590090-2).
References
1. FLAC format. https://xiph.org/flac/format.html
2. Robinson, T.: SHORTEN: Simple lossless and near-lossless waveform compression.
Technical Report 156, Cambridge University Engineering Department, Trumpington Street,
Cambridge, CB2 1PZ UK, December 1994
3. Hans, M., Schafer, R.W.: Lossless compression of digital audio. IEEE Sign. Process. Mag. 18
(4), 21–32 (2001). https://doi.org/10.1109/79.939834
4. Collomb, C.: Linear prediction and Levinson-Durbin algorithm (2009). https://www.
academia.edu/8479430/Linear_Prediction_and_Levinson-Durbin_Algorithm_Contents
5. Golomb, S.W.: Run-length Encodings. IEEE Trans. Inf. Theory 12, 399–401 (1966)
6. Rice, R.F.: Some Practical Universal Noiseless Coding Techniques. Technical Report 79/22,
Jet Propulsion Laboratory (1979)
7. Kleijn, W.B., Lim, F.S.C., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q., Walters, T.C.:
Wavenet based low rate speech coding. In: 2018 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) (2018). https://arxiv.org/abs/1712.01120
8. Kankanahalli, S.: End-to-end optimized speech coding with deep neural networks. In: 2018
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
pp. 2521–2525 (2018). https://doi.org/10.1109/icassp.2018.8461487. https://arxiv.org/abs/
1710.09064
Neural Network Theory, Concepts and
Architectures
Approach to Forecasting Behaviour
of Dynamic System Beyond Borders
of Education
1 Introduction
which allows one to describe the behavior of the objects that make up a complex system in the
present and future [4–6]. Thus, it is necessary to create models of systems that make it
possible to predict the behavior of complex technical objects under stable and
changing environments, at rated loads and beyond them.
For complex technical objects, various approaches can be used to create
mathematical models with different degrees of detail [7]:
– creation of nominal functional descriptions of a system (static or dynamic), which
demands an understanding of the processes taking place in the system;
– creation of simulation models on the basis of the known properties and functions of
a system (the nature of the connections between input and output parameters);
– creation of models on the basis of training and the analysis of experimental data
without a known functional connection, which requires a huge number of examples
of system operating states.
The purpose of any variant of modeling is a sufficiently exact description of the
processes taking place in the modeled object to enable prediction of consequences. However,
it should be noted that the nominal regimes are usually well studied, while the emer-
gencies have no full description. This leads to the fact that the formed model of the
object has to provide forecasting of behavior not only within nominal situations but
also beyond their boundaries.
Let us consider possible ways of solving the task of modeling systems based on
training methods that use examples. Among them, one can single out neural
network models [3], which use multiple examples to construct not only the connections
between input and dependent parameters but also, to a certain degree, to estimate the
structure of these connections. Let us review several examples of models of dynamic systems
used to predict their behavior beyond the borders of the training range.
Let us consider as the object of modeling the process of occurrence of such phe-
nomena; in effect, we will be able to predict by the behavior of the model whether the
process of stabilization of the angular speed is successful. Thus we solve a classification
problem of the following type:
– predict, at the distance of the prediction window $(nw)$ from the current time moment
$(t)$, whether the angular velocity is stabilized (i.e., analysis of the behavior of the system at
moments from $t - n$ to $t + nw$);
– the input dataset is the values of the gyroscope state vector at time
$t$ and the $n$ previous states $X = \langle x(t-n), y(t-n), \dot{x}(t-n), \dot{y}(t-n), \ddot{x}(t-n), \ddot{y}(t-n),$
$u_x(t-n), u_y(t-n), \ldots, x(t), y(t), \dot{x}(t), \dot{y}(t), \ddot{x}(t), \ddot{y}(t), u_x(t), u_y(t) \rangle$;
– the dependent variable $T \in \{-1, 1\}$. We take $T = -1$ (class ‘off’ in Fig. 2) for
the lack of stabilization of the angular velocity and $T = 1$ (class ‘on’ in Fig. 2) for
stabilization areas; in fact, the difference between the reference model and the results of the
adaptation and control loop operation is estimated on the interval
$[t, t + nw]$.
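The windowed dataset construction described above can be sketched as follows (our own illustration; the state components per time step and the labels are hypothetical placeholders for the gyroscope data):

```python
import numpy as np

def make_windows(states, labels, n, nw):
    """Build (X, T) pairs: each X row stacks the state vectors from t-n to t;
    T = +1 ('on') only if stabilization holds on the whole interval [t, t+nw]."""
    states = np.asarray(states)
    labels = np.asarray(labels, dtype=bool)
    X, T = [], []
    for t in range(n, len(states) - nw):
        X.append(states[t - n:t + 1].ravel())        # n+1 state vectors
        T.append(1 if labels[t:t + nw + 1].all() else -1)
    return np.array(X), np.array(T)
```

The resulting rows can be fed to any of the classifier families compared below.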
Determination of the signal type belongs to the problems of time series classification
(TSC), for which a wide variety of approaches have been proposed on the basis of
classical feedforward and recurrent networks [4], convolutional networks
[5], and LSTM networks [6]. We train classifiers for the region $\Omega_z = [3, 7]$ rad/s.
As a result of training several types of classifiers, the best results are shown by the model
based on an LSTM network [9]. The decision quality in the region of nominal angular
speed values reached 94% correctly assessed situations on the test set. The application of this
model to the test set is shown in Fig. 1a. Besides, modeling at $\Omega_z \in [7, 10]$ rad/s and
$\Omega_z \in [0, 3]$ rad/s shows high prediction quality (70–75%) for examples beyond the
borders of the $\Omega_z$ range used for training. The generated classifier makes it possible to identify
the stabilization area $nw = 0.004$ s before the stable entrance of the model into this zone.
370 A. A. Brynza and M. O. Korlyakova
The results of modeling are given in Fig. 2(a) for the area inside the training range and in Fig. 2
(b) beyond its borders. Practically all time points in Fig. 2(b) correspond
to the lack of stabilization of the angular speed, $T = -1$ (class ‘off’); the area of type I errors
is highlighted with color. An assessment of the training quality of several types of networks
is given in Table 1 (sample size: 4000 examples).
The efficiency of the LSTM network can be explained by its forming a model of the
temporal behavior, whereas a perceptron-class network produced only a
description of the known part of the data. Thus, training an LSTM yields a dynamic
system model that is more suitable for forming the digital twin.
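The advantage attributed to the LSTM here comes from its gated cell state, which accumulates information across the window. A single-cell forward step in plain NumPy (a textbook sketch, not the authors' network; the weights below are random placeholders):

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step: input/forget/output gates and a candidate update,
    each taken as a slice of the stacked weight matrices."""
    z = W @ x + U @ h + b                      # shape (4*hidden,)
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))               # input gate (sigmoid)
    f = 1 / (1 + np.exp(-z[H:2 * H]))          # forget gate
    o = 1 / (1 + np.exp(-z[2 * H:3 * H]))      # output gate
    g = np.tanh(z[3 * H:])                     # candidate cell update
    c_new = f * c + i * g                      # long-term memory
    h_new = o * np.tanh(c_new)                 # exposed state
    return h_new, c_new
```

Because the cell state is carried across steps, the network can model the temporal behavior of the signal rather than only the static input-output map of a perceptron.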
Fig. 2 Classifier solution ‘on’ (T = 1) / ‘off’ (T = −1) (a) for the nominal area of the model,
(b) beyond the borders of the nominal area.
$$\dot{x}_1 = -x_1 + x_2 x_3 + u_1, \qquad
\dot{x}_2 = -x_2 - x_1 x_3 + c x_3 + u_2, \qquad
\dot{x}_3 = \sigma (x_2 - x_3) - M_H \,\mathrm{sign}(x_3) \qquad (1)$$
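Training trajectories for a system like (1) can be generated by straightforward numerical integration. A minimal classical Runge-Kutta sketch under our reading of the reconstructed equations (the parameter values here are arbitrary placeholders, not the authors'):

```python
import numpy as np

def rhs(x, u1, u2=0.0, c=20.0, sigma=5.5, MH=0.5):
    """Right-hand side of system (1) as reconstructed above."""
    x1, x2, x3 = x
    return np.array([
        -x1 + x2 * x3 + u1,
        -x2 - x1 * x3 + c * x3 + u2,
        sigma * (x2 - x3) - MH * np.sign(x3),
    ])

def integrate(x0, u1, dt=1e-3, steps=5000):
    """Classical fourth-order Runge-Kutta integration."""
    x, traj = np.array(x0, dtype=float), []
    for _ in range(steps):
        k1 = rhs(x, u1)
        k2 = rhs(x + 0.5 * dt * k1, u1)
        k3 = rhs(x + 0.5 * dt * k2, u1)
        k4 = rhs(x + dt * k3, u1)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x.copy())
    return np.array(traj)
```

Sweeping the control parameter u1 over a range produces the kind of trajectory dataset on which the classifiers and forecasters above are trained.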
Fig. 4 The initial trajectory of the system at $u_1 = -6$ and the result of forecasting using
(a) decision trees, (b) the output vector of the neural network, (c) serial output of the neural
network.
Fig. 5 (a) The initial trajectory of the system at $u_1 = -11$ (unstable behavior); (b) the result of
prediction using a feedforward network.
Fig. 6 Error graphs of the considered experiments over the range of values of the control
parameter $u_1 \in [-13, -1]$.
Based on the graph (see Fig. 6), the best forecasting quality is shown by the
network from experiment 3. The network from experiment 2 achieves good prediction
quality only in the vicinity of the control parameter value at which the training was per-
formed. The ensemble of trees is able to predict the pattern of behavior, though not as
smoothly as the other approaches.
3 Conclusions
The formation of digital twins of real objects makes it possible to solve the problem of pre-
dicting the behavior of complex objects, but it should be taken into account that the resulting
dataset cannot reflect all features of the operation of a real system, only a number of key
patterns.
As preliminary experiments with computer modeling showed, the behavior of a
complex technical system can be predicted with good quality on the basis of
preliminary training if the model captures not only the input-output reactions but also
forms a model of the dynamic system. This fact was noted for both
classification and time series forecasting.
Separately, it is worth mentioning the possibility of additional training of the digital
twin in the course of operation, which would eventually allow adjusting the forecasts
of operability.
References
1. Bazhenov, Yu., Kaleno, V.P.: Prediction of residual life of electronic engine control systems.
Gazette SibADI 2(56) (2017)
2. Tonoyan, S., Baldin, A., Eliseev, D.: Forecasting of the technical condition of electronic
systems with adaptive parametric models. Gazette BMSTU, Series “Instrumentation” 6(111)
(2016)
3. Ripp, C., Steinke, F.: Modeling time-dependent CO2 intensities in multi-modal energy
systems with storage. https://arxiv.org/pdf/1806.04003.pdf
4. Katsuba, Yu., Grigorieva, L.: Application of artificial neural networks to predict the
technical condition of products. Int. Res. J. 3(45), 19–21 (2016)
5. Fawaz, H.I., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Transfer learning for
time series classification. In: 2018 IEEE International Conference on Big Data (2018)
6. Rußwurm, M., Körner, M.: Temporal vegetation modelling using long short-term memory
networks for crop identification from medium-resolution multi-spectral satellite images. In:
2017 IEEE Conference On Computer Vision And Pattern Recognition Workshops
(CVPRW) (2017)
7. Chucheva, I.: Models and methods of prediction. In: Mathematical Bureau. Forecasting on
OREM (2011)
8. Myshlyaev, Y., Finoshin, A., Myo, T.Y.: Sliding Mode with Tuning Surface Control for
MEMS Vibratory Gyroscope. 6th International Congress on Ultra Modern Telecommuni-
cations and Control Systems and Workshops (2014)
9. Tai, K.S., et al.: Improved semantic representations from tree-structured long short-term
memory network. arXiv:1503.00075 [cs.CL] (2015)
10. Chu, J., Hu, W.: Control chaos for permanent magnet synchronous motor based on adaptive
backstepping of error compensation. Int. J. Control Autom. 9(3), 163–174 (2016)
Towards Automatic Manipulation of Arbitrary
Structures in Connectivist Paradigm
with Tensor Product Variable Binding
Alexander V. Demidovskij
1 Introduction
For a long period, Artificial Intelligence (AI) community investigates two important
paradigms about computations: symbolic and sub-symbolic or connectionist approa-
ches. Although, those two ideas can be considered drastically different, it is likely for
them to become partners rather than competitors. Symbolic level is defined by methods
that manipulate symbols and explicit representations. Connectionist approach [1, 2] is
built around the idea of massive parallelism and mostly characterized by artificial
neural networks. The potential symbiosis of two paradigms can bring robust and
flexible solutions that produce understandable results that are easy to validate.
Symbolic structures can be encoded in distributed representations by many
means: First-Order Logics (FOLs) [3, 4], Holographic Reduced Representations
(HRRs), Binary Spatter Codes, and so on [5]. One of the key contributions to the field
is the Tensor Product Variable Binding approach proposed by
Smolensky [6] and further applied in Vector Symbolic Architectures (VSA) [7]. Dis-
tributed representations obtained by this method are used in multiple domains, especially
in Natural Language Processing (NLP) [8], where a sentence plays the role of a structure.
In order to describe the task and the proposed solution, it is essential to give several key
definitions of Tensor Product Variable Binding (TPVB).
There are already solutions that can translate simple structures to tensor repre-
sentations and back to symbolic structures [9]. However, there is a gap in performing
operations over structures at the tensor level. Indeed, there are multiple routine
operations over structures: adding or removing nodes, joining structures together, etc. In
this paper the task of joining structures together is considered and thoroughly analyzed.
2 Task Description
Consider the structure S presented in Fig. 1. It contains two levels of nesting (the root is
not counted as a first level). The structure contains three fillers, A, B, C, and only two
elementary roles: r0 (left child) and r1 (right child). Each filler and each role should be
transformed to a vector representation. There is only one strong requirement: the fillers,
defined on a vector space VF, should be linearly independent of each other, as should
the roles, defined on a vector space VR. At the same time, the assignment of vectors to fillers and
roles can be arbitrary as long as the aforementioned condition (2) is satisfied.
$$\psi = \sum_i f_i \otimes r_i = A \otimes r_0 \otimes r_0 + C \otimes r_1 \otimes r_0 + B \otimes r_1 \qquad (3)$$
It is easier to first calculate compound roles (4) and then apply them to (3) in order
to find the corresponding tensor representation (5).
$$r_{00} = r_0 \otimes r_0 = [10\;0] \otimes [10\;0] = \begin{bmatrix} 100 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
r_{10} = r_1 \otimes r_0 = [0\;5] \otimes [10\;0] = \begin{bmatrix} 0 & 0 \\ 50 & 0 \end{bmatrix} \qquad (4)$$

$$\psi = A \otimes r_{00} + C \otimes r_{10} + B \otimes r_1
= [8\;0\;0] \otimes \begin{bmatrix} 100 & 0 \\ 0 & 0 \end{bmatrix}
+ [0\;0\;10] \otimes \begin{bmatrix} 0 & 0 \\ 50 & 0 \end{bmatrix}
+ [0\;15\;0] \otimes [0\;5] \qquad (5)$$

The result is the sum of a rank-3 tensor, whose only nonzero slices are
$\begin{bmatrix} 800 & 0 \\ 0 & 0 \end{bmatrix}$ (for filler A) and $\begin{bmatrix} 0 & 0 \\ 500 & 0 \end{bmatrix}$ (for filler C),
and the rank-2 tensor $\begin{bmatrix} 0 & 0 \\ 0 & 75 \\ 0 & 0 \end{bmatrix}$ contributed by $B \otimes r_1$.
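The calculation in (4)–(5) can be checked mechanically with outer products (our own NumPy sketch; the filler and role vectors are those of the running example):

```python
import numpy as np

# Fillers and roles of the running example (linearly independent vectors)
A = np.array([8.0, 0.0, 0.0])
B = np.array([0.0, 15.0, 0.0])
C = np.array([0.0, 0.0, 10.0])
r0 = np.array([10.0, 0.0])
r1 = np.array([0.0, 5.0])

# Compound roles (4): r00 = r0 (x) r0, r10 = r1 (x) r0
r00 = np.multiply.outer(r0, r0)
r10 = np.multiply.outer(r1, r0)

# Representation (5): the depth-2 and depth-1 components of psi
psi_depth2 = np.multiply.outer(A, r00) + np.multiply.outer(C, r10)
psi_depth1 = np.multiply.outer(B, r1)
```

Note that components of different depth (a rank-3 and a rank-2 tensor) are kept as separate parts of the representation, which is exactly why joining trees is not a plain vector-vector multiplication.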
Fig. 2. Possible stages of building structure from subtrees. (a) There are independent fillers.
(b) A and C are joined as left and right children of root accordingly. B is still an independent
filler. (c) A subtree from (b) is taken as a left subtree and a free filler B is taken as a right subtree.
From Fig. 2 it is clear that building a structure inherently means joining subtrees. In
the case of a binary tree there are one or two subtrees to be joined. Also, it is vital that
at the beginning each filler is considered a separate tree that can participate in the
joining procedure.
This brings us to the formulation of the task. The goal of the current paper is to
propose a robust neural architecture for dynamic construction of the tensor
representation of an arbitrary structure via joining subtrees, and to investigate the engi-
neering aspects of its implementation.
Joining two subtrees as direct children of a new root, thereby constructing a new
tree, is by nature a simple operation that makes a whole subtree play a new role in terms
of Tensor Product Variable Binding. This is clearly seen in Fig. 2b, where instead
of big trees there are only two fillers that play the roles of left and right subtree
correspondingly. In order to achieve the same result on the tensor level it is enough to
perform tensor multiplication of the filler and the corresponding role. Generalizing to the
case when instead of a filler there is a representation of a tree, there is still a need to
perform tensor multiplication of the tree's distributed representation and the assigned
role. The complexity in this case lies in the fact that the tensor representation of the
structure is a multi-component list of tensors of different depth, so it is no longer a
plain vector-vector multiplication.
Definition 5. The joining operation cons(p, q) is an action over two structures (trees) such
that the tree p slides as a whole ‘down to the left’, so that its root moves to the left-
child-of-the-root position, and the tree q slides ‘down to the right’.
The operation cons can be expressed for binary trees as:

$$\mathrm{cons}(p, q) = p \otimes r_0 + q \otimes r_1, \qquad
\mathrm{cons0}(p) \equiv \mathrm{cons}(p, \emptyset), \qquad
\mathrm{cons1}(q) \equiv \mathrm{cons}(\emptyset, q) \qquad (6)$$
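On the tensor level, cons applies the corresponding role to every depth component of each subtree. A sketch of this in NumPy (our own code; a representation is kept as a list of tensors, one per depth, and components of equal rank are summed):

```python
import numpy as np

def cons(p, q, r0, r1):
    """Join subtrees p and q under a new root: every component of p gains
    an extra r0 factor, every component of q gains r1; components of the
    same resulting rank are summed."""
    out = {}
    for rep, role in ((p, r0), (q, r1)):
        for tensor in rep:
            t = np.multiply.outer(np.asarray(tensor), role)
            out[t.ndim] = out.get(t.ndim, 0) + t
    return [out[d] for d in sorted(out)]
```

Passing an empty list for one argument gives cons0 or cons1 from (6); applying cons twice rebuilds the running example (3).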
matrices are constructed in the same manner, so only Wcons0 is considered in this section.
The matrix is computed from the role vector and identity matrices (8),
where d is the depth of the representation, $1_A$ is an identity matrix whose width and height
equal the number of elements in the filler vector, and $1_R$ is an analogous identity matrix
with size depending on the role vector.
The key point in constructing the matrix is to keep the order of tensor multipli-
cations. This is not obvious, because the way the tensor representation is treated in
TPVB is rather unconstrained: TPVB only requires that the resulting tensor
contain all products of the input tensors' elements. However, for Wcons0 it is very
important to keep the role dimensions first. Finally, we get the following matrix for
depth = 2, a role vector with 2 elements, and a filler vector with 3 elements (9).
$$W_{cons0} = \begin{bmatrix} 1_A \otimes r_0 & 0 \\ 0 & 1_A \otimes 1_R \otimes r_0 \end{bmatrix}, \qquad
1_A \otimes r_0 = \begin{bmatrix}
r_{0,0} & 0 & 0 \\
r_{0,1} & 0 & 0 \\
0 & r_{0,0} & 0 \\
0 & r_{0,1} & 0 \\
0 & 0 & r_{0,0} \\
0 & 0 & r_{0,1}
\end{bmatrix} \qquad (9)$$

The second diagonal block $1_A \otimes 1_R \otimes r_0$ is the analogous $12 \times 6$ matrix acting on the flattened depth-2 component.
During the computation phase the matrix is flattened and does not contain the block
structure present in (9); the blocks are shown only for better visualization of the matrix
structure.
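Under the reconstruction above, the flattened Wcons0 can be assembled with Kronecker products and checked against the direct tensor product (our NumPy sketch; dimensions follow the running example, and keeping the role dimension last is a convention of this sketch):

```python
import numpy as np

fdim, rdim, depth = 3, 2, 2
r0 = np.array([10.0, 0.0])
I_A, I_R = np.eye(fdim), np.eye(rdim)

# One diagonal block per depth level: 1_A (x) 1_R^(x)i (x) r0,
# acting on the flattened component of that level.
blocks = []
for i in range(depth):
    left = I_A
    for _ in range(i):
        left = np.kron(left, I_R)
    blocks.append(np.kron(left, r0.reshape(-1, 1)))

# Assemble the block-diagonal W_cons0.
rows = sum(b.shape[0] for b in blocks)
cols = sum(b.shape[1] for b in blocks)
W = np.zeros((rows, cols))
r = c = 0
for b in blocks:
    W[r:r + b.shape[0], c:c + b.shape[1]] = b
    r += b.shape[0]
    c += b.shape[1]
```

Multiplying W by a concatenated, flattened representation reproduces each component tensor-multiplied by r0, which is exactly the role of the ShiftMatrix/MulVec pair in the network below.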
The overall scheme of the proposed neural architecture for joining structures is
demonstrated in Fig. 3. The neural network is designed to accept multiple inputs of
two types, constant and variable ones, which are described later. After that each filler
Reshaping Layers. These layers are part of the subtree flattening branch (Fig. 4) and
exist for input tensors of rank 1 and 2. This is a technical requirement of the imple-
mentation in the Keras1 framework, as the Flatten layer can work only with
tensors of rank greater than two. So, the Reshaping layers expand the dimensions of such
inputs with a fake dimension of 1 to satisfy the Flatten layer requirements.
1
https://keras.io/.
Flattening Layers. These layers are part of the subtree flattening branch (Fig. 4) and
exist for all input tensors. They transform tensors of different rank into a simple
vector format according to the ordinary rules of flattening multi-dimensional tensors.
Concatenate Layers. These layers are part of the subtree flattening branch (Fig. 4).
They join the vectors that correspond to each level of the tensor representation into
one vector. The order is very important here: from the vectors representing the zero depth level
to those of depth N.
Transpose Layers. These layers are part of the subtree flattening branch (Fig. 4). Since
the next operation is a matrix-vector multiplication, it is required to transform
the vector into a column vector. The Transpose layers conclude the subtree flattening branch,
and their output is used in the final part of the network.
ShiftMatrix Layers. These layers are part of the role propagating branch (Fig. 5). The
primary and only purpose of this layer is the production of the shift matrix that was
discussed in the section “Theoretical method of building shift matrix”. In practice it is a
tensor of rank 2, i.e., an ordinary matrix. It is interesting to estimate its dimensions: the width
of the matrix (the shift operator) equals the size of the vector representing the tree that
should be assigned a given role, while its height equals the size of the vector
representing the structure assigned to the new role.
MulVec Layers. These layers are part of the neural network tail (Fig. 3). They
perform an ordinary matrix-vector multiplication; the resulting vector contains the tensor
representation of the current subtree assigned to the new role.
Add Layer. This layer is the output of the network (Fig. 3). All the subtrees are now
assigned to new roles, and it remains to join them together; the sum vector
represents the resulting structure after joining all subtrees on the tensor level.
5 Conclusion
A novel neural architecture that solves the task of joining structures was proposed and
implemented in the Keras framework. The implementation is open-source and available
online2. Several conceptual gaps of the original works devoted to this topic were
closed, in particular the mechanics of building the shift matrix. The elaborated network
is robust and is designed to work with an arbitrary number of roles and with existing tensor
representations of different depth. This result provides an essential brick in the bridge
between the symbolic and sub-symbolic levels of computation.
However, there is still an open question of performing other operations over
arbitrary structures at the tensor level, for example adding or removing nodes or
moving nodes to other positions in the structure. Also, the current proposal requires an initial
definition of the maximum depth of the structure, which can be an obstacle in edge cases, as well
2
https://github.com/demid5111/ldss-tensor-structures.
as constructing the shift matrix depending on the number of roles. So, there is a promising
direction for further development of Tensor Product Variable Binding methods.
References
1. Rumelhart, D.E., Hinton, G.E., McClelland, J.L.: A general framework for parallel
distributed processing. Parallel Distrib. Process. Explor. Microstruct. Cogn. 1, 26 (1986)
2. Rumelhart, D.E., McClelland, J.L.: PDP Research Group: Parallel Distributed Processing,
1st edn, p. 184. MIT press, Cambridge (1988)
3. Serafini, L., Garcez, A.D.A.: Logic tensor networks: deep learning and logical reasoning
from data and knowledge. arXiv preprint. arXiv:1606.04422 (2016)
4. Teso, S., Sebastiani, R., Passerini, A.: Structured learning modulo theories. Artif. Intell. 244,
166–187 (2017)
5. Browne, A., Sun, R.: Connectionist inference models. Neural Netw. 14(10), 1331–1355
(2001)
6. Smolensky, P.: Tensor product variable binding and the representation of symbolic
structures in connectionist systems. Artif. Intell. 46(1), 159–216 (1990)
7. Gallant, S.I., Okaywe, T.W.: Representing objects, relations, and sequences. Neural Comput.
25(8), 2038–2078 (2013)
8. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In:
International Conference on Machine Learning, pp. 1188–1196 (2014)
9. Demidovskij, A.: Considering selected aspects of tensor product variable binding in
connectionist systems. In: Proceedings of the 2019 Intelligent Systems Conference
(IntelliSys), pp. 5–6. Springer, Cham (2019)
10. Smolensky, P., Legendre, G.: The Harmonic Mind: From Neural Computation to
Optimality-Theoretic Grammar (Cognitive Architecture), 1st edn. MIT press, Cambridge
(2006)
Astrocytes Organize Associative Memory
1 Introduction
The functional role of astrocyte calcium signaling in brain information processing has been
intensely debated in recent decades. Astrocytes play crucial roles in brain homeostasis
and are emerging as regulatory elements of neuronal and synaptic physiology,
responding to neurotransmitters with Ca²⁺ elevations and releasing gliotransmitters
that activate neuronal receptors [1]. The characteristic times of calcium signals (1–2 s)
are three orders of magnitude longer than the duration of spikes in neurons (~1 ms). It
has been shown that an astrocyte can act as a temporal and spatial integrator, detecting the
level of spatio-temporal coherence in the activity of the accompanying neuronal network.
A currently actively discussed hypothesis is that the astrocytic calcium activity can
induce spatial synchronization in neuronal circuits defined by the morphological ter-
ritory of the astrocyte [2–4]. In other words, one can draw an analogy with the Hopfield
network: calcium events in astrocytes that induce synchronization in the surrounding
neural ensembles work as a temporal Hopfield network and hence can be interpreted
as an associative memory model.
In this paper, we consider one of the simplest models of the neuron-astrocyte net-
work (NAN), in which we implement a kind of Hopfield network with forgetting.
There are just a few previous works studying the role of astrocytes in learning tasks. Porto-
Pazos and collaborators investigated the performance of an astrocyte-inspired learning
rule for training deep learning networks in data classification and found that the neuron-
astrocyte networks were able to outperform identical networks without astrocytes in all
classification tasks they implemented [5–7]. Those studies took into
account only the temporal features of astrocytic modulation of signal transmission in the
neural network. In contrast to this approach, we concentrate on the local spatial syn-
chronization organized by the astrocyte, which, due to its different time scale, works as a
kind of neural associative memory.
The proposed neuron-astrocyte network consists of two layers: a first layer of neurons with dimensions 40 × 40 and a second layer of astrocytes with dimensions 13 × 13. To focus only on associative learning, the elements within each layer are not interconnected. We consider bidirectional neuron-astrocytic communication between the layers. Each astrocyte interacts with a neuronal ensemble of dimensions 4 × 4, with territories overlapping by one row (see Fig. 1). Experiments show that astrocytes and neurons communicate via a special mechanism modulated by neurotransmitters from both sides. The model is designed so that when the calcium level inside an astrocyte exceeds a threshold, the astrocyte releases a neuromodulator (e.g., glutamate) that may affect the release probability (and thus the synaptic strength) at neighboring connections in a tissue volume. A single astrocyte can regulate the synaptic strength of several neighboring synapses.
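This tiling can be checked numerically: with 4 × 4 territories overlapping by one row or column, the stride between neighboring territories is 3, so 13 astrocytes exactly span the 40-neuron edge. A minimal sketch (the row-major placement of territories is our assumption, not stated in the paper):

```python
import numpy as np

N_NEURONS = 40           # neuron layer is 40 x 40
N_ASTRO = 13             # astrocyte layer is 13 x 13
TERRITORY = 4            # each astrocyte covers a 4 x 4 neuronal ensemble
STRIDE = TERRITORY - 1   # neighboring territories overlap by one row/column

def territory(ai, aj):
    """Slice of the neuron grid belonging to astrocyte (ai, aj)."""
    r0, c0 = ai * STRIDE, aj * STRIDE
    return np.s_[r0:r0 + TERRITORY, c0:c0 + TERRITORY]

# 13 overlapping territories exactly tile the 40-neuron edge
assert (N_ASTRO - 1) * STRIDE + TERRITORY == N_NEURONS

# count how many astrocytes watch each neuron
coverage = np.zeros((N_NEURONS, N_NEURONS), dtype=int)
for ai in range(N_ASTRO):
    for aj in range(N_ASTRO):
        coverage[territory(ai, aj)] += 1
print(coverage.min(), coverage.max())  # overlap rows are shared by neighbors
```

Neurons in the overlap rows and columns are watched by up to four astrocytes at once, which is what couples neighboring territories.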
The membrane potential of a single neuron is described by the Izhikevich model and evolves according to the following equations [8]:
$$\begin{cases}\dfrac{dV}{dt} = 0.04V^2 + 5V + 140 - U + I_{app} + I_{astro},\\[4pt] \dfrac{dU}{dt} = a\,(bV - U),\end{cases}\quad(1)$$

If $V \ge 30\ \mathrm{mV}$, then $V \to c$, $U \to U + d$.
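Eq. (1) is straightforward to integrate with an explicit Euler scheme. The sketch below uses the standard regular-spiking parameter set (a = 0.02, b = 0.2, c = −65, d = 8) from [8] and a constant applied current; these are illustrative choices, not the values tuned in the paper.

```python
# Euler integration of the Izhikevich neuron, Eq. (1).
def izhikevich(T_ms=200.0, dt=0.25, I_app=10.0, I_astro=0.0,
               a=0.02, b=0.2, c=-65.0, d=8.0):
    V, U = c, b * c                 # start at the resting state
    spikes, t = [], 0.0
    while t < T_ms:
        dV = 0.04 * V * V + 5.0 * V + 140.0 - U + I_app + I_astro
        dU = a * (b * V - U)
        V += dt * dV
        U += dt * dU
        if V >= 30.0:               # spike: apply the reset rule of Eq. (1)
            spikes.append(t)
            V, U = c, U + d
        t += dt
    return spikes

print(len(izhikevich()))            # tonic spiking under constant drive
```

With a constant suprathreshold current the quadratic nullclines lose their fixed point, so the model fires repetitively, which is the behavior exploited when the astrocytic current $I_{astro}$ is added.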
$$\begin{cases}\dfrac{dCa}{dt} = I_{er} - I_{pump} + I_{leak},\\[4pt] \dfrac{dH}{dt} = \dfrac{H_{\infty} - H}{\tau_n},\\[4pt] \dfrac{dIP_3}{dt} = \dfrac{IP_3^{s} - IP_3}{\tau_r} + I_{plc} + I_{neuro},\end{cases}\quad(2)$$

where

$$I_{er} = c_1 v_1 \left(\frac{IP_3}{IP_3 + d_1}\right)^{3} \left(\frac{Ca}{Ca + d_5}\right)^{3} H^{3} \left(\frac{c_0 - Ca}{c_1} - Ca\right),$$

$$I_{leak} = c_1 v_2 \left(\frac{c_0 - Ca}{c_1} - Ca\right),$$

$$I_{pump} = \frac{v_3\,Ca^{2}}{Ca^{2} + k_3^{2}},$$

$$H_{\infty} = \frac{Q_2}{Q_2 + Ca},\qquad Q_2 = d_2\,\frac{IP_3 + d_1}{IP_3 + d_3},$$

$$\tau_n = \frac{1}{a_2\,(Q_2 + Ca)},$$

$$I_{plc} = v_4\,\frac{Ca + (1 - \alpha)\,k_4}{Ca + k_4}.$$
The biophysical meaning of all parameters in Eq. (2) and their experimentally determined values can be found in Ref. [6]. For our purposes we use the following parameter values [6]:

$c_0 = 2.0\ \mu M$; $c_1 = 0.185$; $v_1 = 6\ s^{-1}$; $v_2 = 0.11\ s^{-1}$; $v_3 = 2.2\ \mu M\,s^{-1}$; $v_5 = 0.025\ \mu M\,s^{-1}$; $v_6 = 0.2\ \mu M\,s^{-1}$; $k_1 = 0.5\ s^{-1}$; $k_2 = 1.0\ \mu M$; $k_3 = 0.1\ \mu M$; $a_2 = 0.14\ \mu M^{-1}\,s^{-1}$; $d_1 = 0.13\ \mu M$; $d_2 = 1.049\ \mu M$; $d_3 = 0.9434\ \mu M$; $d_5 = 0.082\ \mu M$; $\alpha = 0.8$; $\tau_r = 7.143\ s$; $IP_3^{s} = 0.16\ \mu M$; $k_4 = 1.1\ \mu M$.
The current $I_{neuro}$ describes the production of IP₃ due to the synaptic activity of neighboring neurons. The current $I_{neuro}$ is modeled by a rectangular pulse signal with amplitude 5 µM and duration 60 ms. $I_{neuro} \ne 0$ if more than 50% of the neurons interacting with the given astrocyte are activated.
Note that the time unit in the neuronal model Eq. (1) is 1 ms. Due to a slower time-
scale, in the astrocytic model Eq. (2) all empirical constants are indicated using sec-
onds as time units. When integrating the joint system of differential equations, the
astrocytic model time is rescaled so that the units in both models match up.
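For illustration, Eq. (2) can be integrated with an explicit Euler scheme driven by the rectangular $I_{neuro}$ pulse described above. This is a rough sketch using the listed parameter values; the PLC rate $v_4$ does not appear in that list, so we assume it equals the listed 0.025 µM s⁻¹ rate, and the pulse amplitude is applied directly to the IP₃ equation.

```python
# Euler integration of the astrocyte model, Eq. (2); time in seconds.
# v4 = 0.025 is an assumption (it is absent from the parameter list above).
P = dict(c0=2.0, c1=0.185, v1=6.0, v2=0.11, v3=2.2, k3=0.1,
         a2=0.14, d1=0.13, d2=1.049, d3=0.9434, d5=0.082,
         alpha=0.8, tau_r=7.143, ip3s=0.16, k4=1.1, v4=0.025)

def derivatives(Ca, H, IP3, I_neuro):
    er = (P['c0'] - Ca) / P['c1'] - Ca                 # ER driving term
    m = IP3 / (IP3 + P['d1'])
    n = Ca / (Ca + P['d5'])
    I_er = P['c1'] * P['v1'] * (m * n * H) ** 3 * er
    I_leak = P['c1'] * P['v2'] * er
    I_pump = P['v3'] * Ca ** 2 / (Ca ** 2 + P['k3'] ** 2)
    Q2 = P['d2'] * (IP3 + P['d1']) / (IP3 + P['d3'])
    H_inf = Q2 / (Q2 + Ca)
    tau_n = 1.0 / (P['a2'] * (Q2 + Ca))
    I_plc = P['v4'] * (Ca + (1 - P['alpha']) * P['k4']) / (Ca + P['k4'])
    dIP3 = (P['ip3s'] - IP3) / P['tau_r'] + I_plc + I_neuro
    return I_er - I_pump + I_leak, (H_inf - H) / tau_n, dIP3

# 10 s of simulation; a 60 ms, amplitude-5 I_neuro pulse starts at t = 2 s
dt, Ca, H, IP3 = 1e-3, 0.1, 0.8, 0.16
max_ip3 = IP3
for k in range(10000):
    t = k * dt
    I_neuro = 5.0 if 2.0 <= t < 2.06 else 0.0
    dCa, dH, dIP3 = derivatives(Ca, H, IP3, I_neuro)
    Ca, H, IP3 = Ca + dt * dCa, H + dt * dH, IP3 + dt * dIP3
    max_ip3 = max(max_ip3, IP3)
print(round(Ca, 4), round(max_ip3, 3))
```

The neuronal pulse transiently elevates IP₃, which gates calcium release from the ER; in the coupled model this is what allows the astrocyte to integrate neuronal activity on the slow time scale.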
Astrocytes organize associative memory 387
Fig. 1. A network structure. Input images of 40 × 40 pixels are fed into the neuronal network containing 40 × 40 neurons. Red fields correspond to astrocyte territories, which overlap by a one-neuron-wide layer.
3 Results
We used as input signals black-and-white images of the digits 0 and 1, with size 40 × 40 pixels, as shown in Fig. 2. The training set included 10 samples for each image, with 10% salt-and-pepper noise added to every sample fed into the NAN (see Fig. 3a).
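The training set can be generated as follows; this is only a sketch, and the digit template here is a crude stand-in for the actual images in Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def salt_and_pepper(image, fraction=0.10, rng=rng):
    """Flip a random `fraction` of pixels in a binary image."""
    noisy = image.copy()
    n_flip = int(round(fraction * image.size))
    idx = rng.choice(image.size, size=n_flip, replace=False)
    noisy.flat[idx] = 1 - noisy.flat[idx]
    return noisy

digit = np.zeros((40, 40), dtype=int)
digit[5:35, 18:22] = 1                      # a crude "1" stroke as a stand-in
training_set = [salt_and_pepper(digit) for _ in range(10)]
print((training_set[0] != digit).sum())     # 160 flipped pixels = 10% of 1600
```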
Fig. 3. (a) The training sample with 10% salt-and-pepper noise. (b) The response of the neuronal network; the values of the membrane potentials are shown. (c) The intracellular Ca²⁺ concentrations in the astrocytic layer.
After training, our neuron-astrocyte network remembers the pattern for a period of time that is determined by the duration of the calcium pulse in the astrocyte. The testing sample was presented to the network for 20 ms. While the Ca²⁺ concentration in an astrocyte exceeded the threshold of 0.15 µM and more than 8 neurons were still active, feedback from the astrocytes to the neurons was turned on. This feedback is determined by biophysical mechanisms of astrocytic modulation of synaptic transmission and is modeled as the additional current $I_{astro}$ in Eq. (1). An example of this test is shown in Fig. 5.
Fig. 4. (a–c) Membrane potentials of neurons during and after training. (a) A neuron in the target pattern interacting with an active astrocyte. (b) A neuron not in the target pattern interacting with an active astrocyte. (c) A neuron not in the target pattern interacting with a quiet astrocyte. (d) The intracellular Ca²⁺ concentration in an active astrocyte.
Fig. 5. The testing sample with 40% salt-and-pepper noise. (a) The response of the neuronal network after an input of 4.4 (b) and 11.6 (c) ms duration. (d) The intracellular Ca²⁺ concentrations in the astrocytic layer.
Tests showed that the network can not only clean the noise inside the target pattern (Fig. 5b), as expected, but can also separate the pattern from the surrounding noise in time (Fig. 5c). The latter is due to the fact that the neuronal spiking frequency is proportional to the value of the applied current.
Fig. 6. The dependence of the accuracy on the noise level. The dotted line corresponds to a manually selected accuracy threshold.
390 S. Yu. Gordleeva et al.
Fig. 7. (a) and (d) The training samples with 10% salt-and-pepper noise. (b) and (e) The responses of the neuronal network; the values of the membrane potentials are shown. (c) and (f) The intracellular Ca²⁺ concentrations in the astrocytic layer. (g) The testing sample with 40% salt-and-pepper noise. The response of the neuronal network after the 4.4 (h) and 11.6 (j) ms input.
4 Conclusions
links between cells have been required. Astrocytic modulation of the activity of nearby neurons during an elevation of calcium concentration imitates a temporary Hebbian synapse. In the future, the proposed neuron-astrocyte network will be developed by incorporating a Hebbian learning algorithm.
As we know from working with artificial intelligence algorithms, the flexibility of
learning strongly depends on the complexity of the network. As we have demonstrated, astrocytes increase the complexity of the neural network through the coordination induced by calcium events, and this mechanism alone can lead to the organization of neural associative memory. Without any doubt, it would be extremely interesting to investigate how this learning mechanism would work together with deep learning.
Another important direction of future research will include the identification of conceptual markers of malfunction associated either with age-related diseases or with growth disorders. In both of these situations, the brain loses the ability to learn properly; hence, the question arises whether we could model these processes with a simple conceptual model and, perhaps, shed light on a methodology for identifying pathology markers in real medical applications.
Acknowledgments. This work was supported by the Ministry of Science and Education of
Russian Federation (Grant No. 075-15-2019-871).
References
1. Verkhratsky, A., Butt, A.: Glial Neurobiology. Wiley, Chichester (2007)
2. Bazargani, N., Attwell, D.: Astrocyte calcium signaling: the third wave. Nat. Neurosci. 19(2),
182–189 (2016)
3. Araque, A., Carmignoto, G., Haydon, P.G., Oliet, S.H., Robitaille, R., Volterra, A.:
Gliotransmitters travel in time and space. Neuron 81, 728–739 (2014)
4. Gordleeva, S.Y., Ermolaeva, A.V., Kastalskiy, I.A., Kazantsev, V.B.: Astrocyte as
spatiotemporal integrating detector of neuronal activity. Front. Physiol. 10, 294 (2019)
5. Porto-Pazos, A.B., Veiguela, N., Mesejo, P., Navarrete, M., Alvarellos, A., Ibáñez, O., Pazos,
A., Araque, A.: Artificial astrocytes improve neural network performance. PLoS ONE 6(4),
e19109 (2011)
6. Alvarellos-González, A., Pazos, A., Porto-Pazos, A. B.: Computational models of neuron-
astrocyte interactions lead to improved efficacy in the performance of neural networks.
Computational and Mathematical Methods in Medicine (2012)
7. Mesejo, P., Ibáñez, O., Fernández-Blanco, E., Cedrón, F., Pazos, A., Porto-Pazos, A.B.:
Artificial neuron–glia networks learning approach based on cooperative coevolution. Int.
J. Neural Syst. 25(4), 1550012 (2015)
8. Izhikevich, E.: Simple model of spiking neurons. IEEE Trans. Neural Netw. 14(6), 1569–
1572 (2003)
9. Li, Y.X., Rinzel, J.: Equations for InsP3 receptor-mediated [Ca²⁺]i oscillations derived from a detailed kinetic model: a Hodgkin-Huxley like formalism. J. Theor. Biol. 166(4), 461–473 (1994)
Team of Neural Networks to Detect
the Type of Ignition
1 Introduction
The ship’s premises have different fire hazards. Moreover, inside a single room,
for example, an engine room, a room with electrical equipment, the probabilities
and types of ignition can differ significantly. Means of automatic extinguishing
can most quickly eliminate the fire, especially if they are applied locally. To
use these tools, you need to know what substance is ignited and where the
fire is located. In this case, local application of a suitable fire extinguishing
c Springer Nature Switzerland AG 2020
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 392–397, 2020.
https://doi.org/10.1007/978-3-030-30425-6_46
Team of Neural Networks 393
agent is possible. In the considered multi-sensor fire system, which has sensors
for temperature, CO concentration and smoke concentration, it is possible to
determine the type of fire. With a sufficient number of sensors and their optimal
placement, it is possible to determine the area of ignition. A better result can
be obtained using neural networks or a team of neural networks. Therefore,
the aim of our study is to develop a neural-network algorithm for processing the data of a multisensory fire system, with the goal of the most rapid fire detection, localization, and classification.
2 Simulation of Fire
Consider the following sources of ignition and their respective classes in accordance with the classification given in the NFPA 10 (National Fire Protection Association) standard [1]. Depending on the type of ignition, the readings of the three types of sensors (temperature, carbon monoxide concentration, and smoke concentration) vary with time, as shown in Fig. 1. The data were obtained
using simulations on a supercomputer in the FDS environment [2]. The inertia of
the sensors was taken into account at the preprocessing stage using the impulse
response of each sensor. The analysis of dependencies in Fig. 1 showed that the
source of ignition affects the change in the fire factors received from the sensors.
Fig. 1. Changes in fire factors for five sources of ignition, with the fire starting at time zero. (a) Data received from the temperature sensor. (b) Data obtained from a carbon monoxide concentration sensor. (c) Data obtained from the sensor measuring the concentration of smoke.
394 A. Guseva and G. Malykhina
This distinction can be used to identify ignition sources. The recognition result depends on the number of sensors and their relative locations. Three sensors were selected for the investigated room measuring 5 by 7 m, which corresponds to the standard SP 5.13130.2009 [3]. The locations of the sensors were optimized using a variant of the genetic algorithm proposed by the authors in the articles [4–6]. The importance of temporal dependencies for the recognition problem leads to the application of temporal signal processing using a dynamic ANN with short-term memory. The short-term memory is implemented as a delay line at the input of the ANN. Data from the sensors are received once per second (Table 1).
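The delay-line short-term memory can be sketched as follows. Sensor data arrive once per second; the window depth of 10 samples is an illustrative assumption, not a value from the paper.

```python
from collections import deque

# Delay-line short-term memory at the ANN input: each second the newest
# sample from the three sensors (temperature, CO, smoke) is pushed in,
# and the flattened window is the ANN's input vector.
class DelayLine:
    def __init__(self, n_sensors=3, depth=10):
        self.buf = deque([(0.0,) * n_sensors] * depth, maxlen=depth)

    def push(self, sample):
        self.buf.append(tuple(sample))

    def features(self):
        """Flattened window, oldest sample first."""
        return [x for sample in self.buf for x in sample]

dl = DelayLine()
for t in range(12):                       # 12 s of once-per-second readings
    dl.push((20.0 + t, 0.1 * t, 0.01 * t))
feats = dl.features()
print(len(feats))                         # 3 sensors x 10 delays = 30 inputs
```

The fixed-length deque discards the oldest sample automatically, so the network always sees the most recent 10 s of readings.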
function. The output layer of the Bayesian network represents the probability that one of the five types of fire occurs or that there is no fire at all. The sum of all values of the output vector is equal to one. As a result of training the Bayesian neural network, the resulting accuracy of determining the type of fire was 93.7%. Moreover, the main source of error is the work of the preceding five neural networks. The time required for training is 18 s.
5 Conclusion
The proposed two-tier architecture has several advantages:
References
1. NFPA 10: Standard for Portable Fire Extinguishers. https://www.nfpa.org/cod
es-and-standards/all-codes-and-standards/list-of-codes-and-standards/detail?code
=10
2. McGrattan, K., Hostikka, S., Floyd, J., Baum, H., Rehm, R., Mell, W., McDermott,
R.: Fire Dynamics Simulator (Version 5) Technical Reference Guide. National Insti-
tute of Standards and Technology, Gaithersburg (2010). http://code.google.com/
p/fds-smv
3. SP 5.13130.2009 Fire protection systems: Installation of fire alarm and fire extin-
guishing automatic. Norms and rules of design (with Amendment N 1). http://docs.
cntd.ru/document/1200071148
4. Malykhina, G.F., Guseva, A.I., Militsyn, A.V., Nevelskii, A.S.: Developing an intel-
ligent fire detection system on the ships. In: Sukhomlin, V., Zubareva, E., Shneps-
Shneppe, M. (eds.) The International Scientific Conference on II Convergent Cog-
nitive Information Technologies (Convergent’2017), vol. 2064, pp. 289–296. Russia,
Moscow (2017)
5. Militsyn, A.V., Malykhina, G.F., Guseva, A.I.: Early fire prevention in the plant.
In: International Conference on Industrial Engineering, Applications and Manufac-
turing (ICIEAM), Saint Petersburg, Russia, vol. 2, pp. 1–4. IEEE Explore (2017)
6. Guseva, A.I., Malykhina, G.F., Nevelskiy, A.S.: Neural network based algorithm for the measurements of fire factors processing. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds.) Advances in Neural Computation, Machine Learning, and Cognitive Research II. Studies in Computational Intelligence, vol. 799, pp. 160–166. Springer, Cham (2019)
Chaotic Spiking Neural Network Connectivity
Configuration Leading to Memory Mechanism
Formation
Mikhail Kiselev
1 Introduction
The recently proposed neural network paradigms, such as spiking neural networks (SNN) and convolutional and deep learning networks, are considered by many researchers as a potential basis for the breakthrough IT technologies of the near future. Since SNNs are complex non-linear dynamic systems, their specific application area is the processing of dynamic signals such as video streams, sensory data in robotics, or signals from technological sensors.
The most common form of SNN architecture used to solve this kind of problem is the so-called liquid state machine (LSM) [1]. The LSM is a computational model consisting of two main parts. The first part is a large chaotic spiking neural network. It is chaotic in the sense that it has no predefined structure (layers, etc.). Instead, its connectivity is random: the presence of a synaptic connection between two given neurons, the weight of this connection, and its delay are random variables obeying certain statistical distributions. Input data streams represented in the form of spike sequences (recall that spiking neurons communicate by spikes, short pulses of constant amplitude and negligible duration) are injected into the network via special
afferent synapses. The network responds to stimulation by the complex activity of its neurons, which may depend on the recent history of the input signal. The activity of the neurons (in the form of spike counts in equal time intervals) is monitored by the second part of the LSM, the read-out mechanism. This mechanism implements supervised learning: it learns to use the LSM neuron activity data to classify input stimuli, to make predictions, to recognize exceptional situations, and to perform other data analysis and prediction tasks. The nature of the read-out mechanism may be very diverse. It may be any suitable data mining algorithm (logistic regression, support vector machine, decision tree, naïve Bayesian classifier, or anything else); the only requirements are that it be fast and able to work with very high-dimensional data. It is assumed that valuable predictive features are hidden in the high-dimensional and diverse reaction of the large SNN to the input signal, and the job of the read-out layer is to mine them from the seeming chaos of the SNN activity.
In the original version of LSM, which is used now by the majority of researchers, neurons are not plastic: synaptic plasticity is switched off. However, there are many reasons to believe that it can play a positive role. Indeed, the strong feature of LSM is its randomness. It makes it possible to implement all kinds of computations on input data (provided that the SNN is sufficiently large). But at the same time, randomness is an evident weakness of the LSM concept: the small number of useful circuits in the network is neighbored by plenty of random network subsets performing senseless or trivial operations. Thus, there is a tempting opportunity to preserve the computational generality provided by chaotic connectivity while eliminating senseless circuits in a process of guided self-organization implemented in the form of synaptic plasticity. This leads us to the concept of the self-organizing LSM (SOLSM). Testing this hypothesis and creating a practically usable SOLSM are among the aims of the research project ArNI (Artificial NeuroIntelligence).
The crucial feature of LSM explaining its efficiency for processing dynamic data is its memory ability (the transient working memory is meant here, not to be confused with the constant long-term memory fixed in the synaptic weights). If the spatio-temporal pattern to be recognized spans a significant time interval, the network should memorize its beginning until its final part is presented. This is true for SOLSM as well. However,
the appearance of a memory mechanism in an evolving chaotic SNN is a very poorly explored process. Some earlier works of the author were devoted to this subject [2, 3]. However, structured SNNs were studied in those works. At present, the majority of working memory models in SNNs are based on short-term plasticity, an additional process modifying synaptic weights which acts together with the conventional long-term STDP plasticity (see, for example, [4]). Different researchers include this mechanism in their models in different forms. For example, in the pioneering work of Izhikevich [5], short-term plasticity enables the formation of so-called polychronous neuronal groups (PNG), whose sporadic activation indicates the recent appearance of the stimulus specific to the given PNG. Other approaches utilize the notion of attractors [6], meta-stable states of the network preserving the information expressed by the attractor in time. A further extension of this idea, called continuous attractors, explains how continuous values can be stored in memory [7]. However, most of these approaches cannot be directly applied to SOLSM, because they either cannot be implemented in chaotic networks (like continuous attractors) or use complicated synaptic plasticity models (especially keeping in mind that the original LSM does not use synaptic plasticity at all).
Thus, our aim is to study how working memory can appear in chaotic SNN with
Hebbian long-term synaptic plasticity.
The simplest but functional leaky integrate-and-fire (LIF) neuron model with current-
based excitatory synapses and conductance-based inhibitory synapses was used in this
study. Upon receiving a spike at the moment $t_{ij}^{+}$, the i-th excitatory synapse instantly increments the neuron membrane potential u by a value equal to its weight $w_i^{+}$. The k-th inhibitory synapse, upon receiving a spike, instantly increments the inhibitory membrane conductance c by the value of its weight $w_k^{-}$. In the absence of input spikes, u and c decay to 0 with time constants $\tau_u$ and $\tau_c$, respectively. When u reaches a threshold value, the neuron emits a spike. After that, the neuron cannot emit a new spike during the refractory period $\tau_R$. Values of the membrane potential are scaled such that its resting value equals 0 and its threshold value equals 1. While the value of c is not equal to zero, the membrane potential falls exponentially toward the inhibitory reversal potential $U_I$ (which is negative) with the time constant 1/c. Thus, the neuron model used is described by the following equations:
$$\begin{cases}\dfrac{du}{dt} = -\dfrac{u}{\tau_u} - c\,(u - U_I) + \sum\limits_{i,j} w_i^{+}\,\delta\big(t - t_{ij}^{+}\big),\\[6pt] \dfrac{dc}{dt} = -\dfrac{c}{\tau_c} + \sum\limits_{i,j} w_i^{-}\,\delta\big(t - t_{ij}^{-}\big),\end{cases}\quad(1)$$

with the condition that if $u > 1$ and $t > T_a + \tau_R$, where $T_a$ is the moment when this neuron last fired, then the neuron fires and u is reset to 0.
The plasticity rule used in this work is based on the spike-timing-dependent plasticity (STDP) model. As in our previous works [8, 9], the lower and upper limits ($w_{min}$ and $w_{max}$) on synaptic weight values are set by using the so-called synaptic resource W, whose value depends monotonically on the weight value w in accordance with the following formula:
Now let us describe how the memory ability of the SNN is evaluated. The informational input of the SNN is represented as a certain number of nodes, sources of spikes (in our experiments this number was equal to 600). These nodes emit low-intensity Poissonian noise (mean spike frequency 0.1 Hz). Besides that, every 100 ms, some group of input nodes begins to emit high-intensity (100 Hz) Poissonian noise. This high-frequency signal lasts 40 ms (below, it will also be called a pattern). These groups do not intersect. We used 30 groups (patterns), 20 nodes per group. The order in which these groups became active was random. The task was to predict which group was active during the preceding time interval using the network activity (spike counts of each neuron) measured in the current interval. Successful prediction would mean that the network memorizes properties of the input signal for at least 60 ms and that this memory is sufficiently stable: it is not destroyed immediately by the activity of the next input node group.
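The input protocol above can be sketched directly as Poisson spike-train generation (the random seed is arbitrary):

```python
import random

random.seed(1)
N_NODES, N_GROUPS, GROUP = 600, 30, 20   # 30 disjoint groups of 20 nodes
BG_HZ, PAT_HZ = 0.1, 100.0               # background and pattern rates
EPOCH_MS, PAT_MS = 100, 40               # pattern onset period and duration

def poisson_spikes(rate_hz, t0_ms, t1_ms):
    """Homogeneous Poisson spike times in [t0, t1); times in ms."""
    spikes, t = [], t0_ms
    while True:
        t += random.expovariate(rate_hz / 1000.0)   # mean ISI in ms
        if t >= t1_ms:
            return spikes
        spikes.append(t)

def generate(n_epochs):
    spikes = {node: [] for node in range(N_NODES)}
    labels = []
    for ep in range(n_epochs):
        t0 = ep * EPOCH_MS
        g = random.randrange(N_GROUPS)   # random order of active groups
        labels.append(g)
        for node in range(N_NODES):
            spikes[node] += poisson_spikes(BG_HZ, t0, t0 + EPOCH_MS)
            if g * GROUP <= node < (g + 1) * GROUP:
                spikes[node] += poisson_spikes(PAT_HZ, t0, t0 + PAT_MS)
    return spikes, labels

spikes, labels = generate(5)
print(labels)    # which group (pattern) was active in each 100 ms epoch
```

The labels of the previous epoch are exactly what the read-out mechanism is trained to predict from the current epoch's spike counts.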
The random forest data mining algorithm [10] was chosen as the read-out mechanism because of its speed and stability in the case of very numerous predictors.
In the described series of experiments, the whole simulation lasted 1600 s. It was assumed that during the first 800 s the network reaches a certain equilibrium state. If it really does, then during the last 800 s no significant synaptic weight modifications should be observed. In that case, this second half of the simulation period was used for measurement of the previous-pattern prediction accuracy, as described above.
Interneuron connections have non-zero delays. Inhibitory connections are always
fast (have 1 ms delay).
4 Network Connectivity
Since it is not clear a priori which connectivity configuration could lead to the formation of memory in an SNN, the following three variants were tested:
• Neural gas. All neurons have an identical number of synapses of each kind (excitatory, inhibitory, and afferent; afferent synapses connect a neuron with input nodes and are always excitatory), but the set of presynaptic neurons is selected randomly for every neuron. Synaptic weights and delays are also random and are selected using the same distribution law for all neurons, but different laws for the connection types E → E, E → I, I → E and I → I.
• Bottleneck. The same as above, but only a small fraction of all neurons have afferent links.
• Sphere. Let us imagine that all neurons correspond to randomly selected points on a sphere with radius equal to 1. The synaptic delays of excitatory links are proportional to the lengths of the links. Network connectivity obeys the “small world” law: all neurons have the same numbers of long and short links. Long links are created by the same rule as in the two previous schemas. Postsynaptic neurons for short links are selected using the probability distribution $p(r) \propto \exp\!\big(-(r - a)^2 / 2b^2\big)$, where r is the distance to the postsynaptic neuron, and a and b are constants (for excitatory links a = 0).
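Short-link selection on the sphere can be sketched as follows. Two caveats: the paper does not say which distance is used, so the Euclidean (chord) distance is an assumption; and the illustrative width b = 0.1 is used because the optimized values reported in Sect. 6 (a ≈ 0.0065, b ≈ 0.0032) presuppose a far denser network than this toy example, where they would make every weight underflow to zero.

```python
import math, random

random.seed(7)

def random_sphere_point():
    """Uniform point on the unit sphere (normalized Gaussian vector)."""
    while True:
        v = [random.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 1e-12:
            return tuple(x / n for x in v)

neurons = [random_sphere_point() for _ in range(500)]

def pick_short_link(pre, a=0.0, b=0.1):
    """Draw a postsynaptic neuron with p(r) ~ exp(-(r - a)^2 / 2 b^2)."""
    weights = [0.0 if i == pre else
               math.exp(-(math.dist(neurons[pre], p) - a) ** 2 / (2 * b ** 2))
               for i, p in enumerate(neurons)]
    return random.choices(range(len(neurons)), weights=weights, k=1)[0]

picks = [pick_short_link(0) for _ in range(50)]
avg_r = sum(math.dist(neurons[0], neurons[i]) for i in picks) / len(picks)
print(round(avg_r, 3))   # far below the ~4/3 mean chord distance on a sphere
```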
Thus, three kinds of chaotic SNNs were explored. Each one is characterized by 30+ parameters (constants entering the neuron model and the plasticity rule, and structural properties of the network). The criterion for evaluating their memory ability was described in Sect. 3. Therefore, finding the best SNN is an optimization problem. Problems of this type are solved efficiently by the genetic algorithm (GA), which was selected as the optimization technique in this study.
Optimization was performed for networks of the same size (10000 neurons). The population size in all cases was 300. The mutation probability per individual was 0.5; elitism, 10%. Optimization was stopped when 3 consecutive populations had shown no progress.
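The GA loop with these settings can be sketched on a toy fitness function (the real fitness, the previous-pattern prediction accuracy of a simulated SNN, is far too expensive to inline here, so a simple stand-in is used; the crossover and mutation operators are likewise illustrative assumptions):

```python
import random

random.seed(3)
N_PARAMS, POP, ELITE, P_MUT = 5, 30, 3, 0.5
TARGET = [0.2, -1.0, 0.5, 3.0, 0.0]   # hidden optimum of the toy fitness

def fitness(ind):
    # stand-in for the memory criterion of Sect. 3
    return -sum((x - t) ** 2 for x, t in zip(ind, TARGET))

def mutate(ind):
    if random.random() < P_MUT:        # mutation probability per individual
        i = random.randrange(N_PARAMS)
        ind = ind[:i] + [ind[i] + random.gauss(0.0, 0.3)] + ind[i + 1:]
    return ind

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

pop = [[random.uniform(-4.0, 4.0) for _ in range(N_PARAMS)] for _ in range(POP)]
best, stall = -float("inf"), 0
while stall < 3:                       # stop after 3 generations w/o progress
    pop.sort(key=fitness, reverse=True)
    top = fitness(pop[0])
    stall = 0 if top > best else stall + 1
    best = max(best, top)
    elite = pop[:ELITE]                # elitism: keep the best 10%
    children = [mutate(crossover(*random.sample(pop[:POP // 2], 2)))
                for _ in range(POP - ELITE)]
    pop = elite + children
print(round(-best, 3))                 # remaining squared distance to optimum
```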
6 Results
The GA optimization performed in this study showed that the connectivity configu-
rations “neural gas” and “bottleneck” show almost no signs of emerging memory
mechanism. The best accuracy obtained for “neural gas” was 6.34%, for “bottleneck” –
7.25%. It is too low accuracy, close to the baseline lazy classifier accuracy which
equals approximately to 3.3% for 30 equally frequent patterns. Interestingly, synaptic
plasticity was found to be a definitely positive factor – without it the accuracy fell to
4.29%. At the same time, formation of memory mechanism in a “sphere” SNN was
reliably demonstrated (accuracy 25.7%). The best network is characterized by very
sparse and local connectivity – excitatory neurons have 7 excitatory synapses such that
6 of them are connections with the closest neurons and only 1 link is “far”. Number of
inhibitory synapses is only 3, all inhibitory links are “local” (a = 0.00653,
b = 0.00315). The optimum percent of inhibitory neurons was 7.82%. Another inter-
esting feature of the best network is significant difference of time constant su for
excitatory and inhibitory neurons (14/4 ms).
The dependence of the accuracy on the network size was studied (for fixed optimal values of the other parameters); it is shown in Fig. 1. We see that it is almost linear on a logarithmic scale.
The computations were performed on three GPU servers using the high-performance SNN simulation package ArNI. An SNN consisting of 100000 neurons is simulated at a speed 7 times slower than real time on a powerful PC with 4 NVIDIA TITAN Xp cards provided for this project by Kaspersky Lab.
(Plot: accuracy, %, versus network size, for networks of 6000, 24000, and 96000 neurons.)
Fig. 1. Dependence of the previous-pattern determination accuracy on the SNN size.
7 Conclusion
The results obtained in this work allow us to draw the following conclusions:
• The connectivity scheme used in the traditional LSM is not optimal from the viewpoint of LSM memory characteristics and therefore may limit its ability to produce valuable predictive features from dynamic data. To reach higher performance, the “small world” connectivity scheme described above should be used.
• SOLSM (LSM with plastic neurons) can outperform the traditional LSM due to fuller usage of network resources (restructuring silent or constantly active neuronal groups).
• Network size is very important. It is possible that the power of SOLSM will be unveiled in full only in the case of very large SNNs, still unavailable on commonly used hardware platforms (such as GPU servers).
The type of SNN studied in this work is very hard to explore theoretically and empirically. This scientific problem requires significant research effort. The presented results, while significant and valid, should still be considered preliminary. A systematic study of SOLSM is now being carried out as part of the research project ArNI supported by Kaspersky Lab; its results will be reported in further publications.
Acknowledgements. I would like to thank Andrey Lavrentyev and Artyom Nechiporuk for
valuable discussion. I am grateful to Kaspersky Lab for the powerful GPU computer provided.
References
1. Maass, W.: Liquid state machines: motivation, theory, and applications. In: Computability in
Context: Computation and Logic in the Real World. World Scientific, pp. 275–296 (2011)
2. Kiselev, M.: Self-organization process in large spiking neural networks leading to formation
of working memory mechanism. In: Rojas, I., Joya, G., Cabestany, J. (eds.) Proceedings of
IWANN 2013. LNCS, vol. 7902, Part I, pp. 510–517 (2013)
3. Kiselev, M.: Self-organized short-term memory mechanism in spiking neural network. In:
Proceedings of ICANNGA 2011 Part I, Ljubljana, pp. 120–129 (2011)
4. Fiebig, F., Lansner, A.: A spiking working memory model based on Hebbian short-term
potentiation. J. Neurosci. 37(1), 83–96 (2016)
5. Szatmary, B., Izhikevich, E.: Spike-timing theory of working memory. PLoS Comput. Biol.
6(8), e1000879 (2010)
6. Lansner, A., Marklund, P., Sikström, S., Nilsson, L.-G.: Reactivation in working memory:
an attractor network model of free recall. PLoS ONE 8(8), e73776 (2013). https://doi.org/10.
1371/journal.pone.0073776
7. Seeholzer, A., Deger, M., Gerstner, W.: Stability of working memory in continuous attractor
networks under the control of short-term plasticity. PLoS Comput. Biol. 15(4), e1006928
(2019). https://doi.org/10.1371/journal.pcbi.1006928
8. Kiselev, M.: Rate coding vs. temporal coding – is optimum between? In: Proceedings of
IJCNN-2016, pp. 1355–1359 (2016)
9. Kiselev, M., Lavrentyev, A.: A preprocessing layer in spiking neural networks – structure,
parameters, performance criteria, accepted for publication. In: Proceedings of IJCNN-2019
(2019)
10. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:
1010933404324
The Large-Scale Symmetry Learning Applying
Pavlov Principle
1 Introduction
binary images, corresponding to input vector sizes of 16 and 100, respectively. A Boltzmann machine, trained on a set of randomly selected such images, obtained 98.2% accuracy on the 4 × 4 problem and 90% accuracy on the 10 × 10 problem.
2 The Model
symmetric class. For the symmetry class, we assign a random binary value to each variable of the first subset and the same value to each corresponding variable of the second subset. This guarantees that the data sample is symmetric in the chosen way. If the data sample should belong to the non-symmetric class, each variable is assigned a random value. Then we check the sample for symmetry. If the data sample happens to belong to the symmetric class, we select one variable at random and change its value to the opposite. Since symmetric vectors never have an odd number of 0s or 1s, this procedure is guaranteed to generate a non-symmetric data sample. However, such a precaution is practically unnecessary, since there is only a $2^{-100}$ chance that a randomly generated 100-digit binary vector will accidentally happen to be symmetric.
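The generation procedure can be sketched as follows, assuming mirror symmetry (the second half of the vector is the reverse of the first; the paper does not spell out the pairing):

```python
import random

random.seed(5)
N = 100   # length of the binary input vector

def make_sample(symmetric):
    half = [random.randint(0, 1) for _ in range(N // 2)]
    if symmetric:
        return half + half[::-1]          # mirror the first half
    v = [random.randint(0, 1) for _ in range(N)]
    if v == v[::-1]:                      # the rare accidental symmetry
        i = random.randrange(N)
        v[i] = 1 - v[i]                   # flip one bit to break it
    return v

def is_symmetric(v):
    return v == v[::-1]

pos, neg = make_sample(True), make_sample(False)
print(is_symmetric(pos), is_symmetric(neg))   # → True False
```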
$$Y(t) = S\left(\sum_{i=1}^{N} w_i(t)\,X_i(t) - b\right)\quad(1)$$

Here $X_i(t)$ are the components of the binary input vector X at step t, presented at the i-th synapse, and $Y(t)$ is the corresponding output value. Since the architecture used in our study doesn't imply recurrent connections, the layerwise successive computation of neurons' outputs can be viewed as performed at the same step. S is a binary input-output threshold activation function: it equals 1 if its argument is greater than 0, and it is 0 otherwise. $w_i$ is the weight of the input variable with index i, and b is a threshold value.
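In code, the unit of Eq. (1) is simply (weights and inputs below are illustrative):

```python
# The threshold unit of Eq. (1): output 1 iff the weighted input sum
# exceeds the threshold b.
def neuron(x, w, b):
    s = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if s > 0 else 0

print(neuron([1, 0, 1], [0.5, -0.3, 0.9], 1.0))  # 0.5 + 0.9 - 1.0 > 0 → 1
print(neuron([0, 1, 0], [0.5, -0.3, 0.9], 1.0))  # -0.3 - 1.0 < 0 → 0
```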
The learning rule of a hidden neuron can be formalized as follows:

$$w_i(t+1) = w_i(t) + \varepsilon\, F\left(\frac{1}{K}\sum_{k=1}^{K} E_k(t)\, e_{k,i},\; Y(t),\; X(t)\right)\quad(2)$$

Here, $\varepsilon$ is a learning rate factor which determines the speed of weight change. E is a K-component error vector, where K is the number of output values multiplied by 2. Each component $o_j(t)$ of the output vector corresponds to two error components, $E_{2j}(t)$
408 A. E. Lebedev et al.
and $E_{2j+1}(t)$. They both equal 0 if $o_j(t)$ matches the desired value. $E_{2j}(t)$ equals 1 only if $o_j(t)$ is greater than its desired value (i.e., it equals 1 when 0 is desired), and $E_{2j+1}(t)$ equals 1 only if $o_j(t)$ is less than desired (i.e., it equals 0 when 1 is desired). $e_{k,i}$ is a fixed weight associated with the k-th error component and propagated to the i-th synapse. F sets the learning rule. In most cases in this study we use the following learning formula:

$$F\left(\frac{1}{K}\sum_{k=1}^{K} E_k(t)\, e_{k,i},\; Y(t),\; X(t)\right) = \frac{1}{K}\sum_{k=1}^{K} E_k(t)\, e_{k,i}\,\big(Y(t) - 0.5\big)\big(X(t) - 0.5\big)\cdot 4\quad(3)$$
For training output neurons we use a similar formula, but the $e_{k,i}$ are not selected randomly. Instead, fixed values of $e_{2j,i}$ and $e_{2j+1,i}$ of opposite sign are used for all i, where j is the index of the corresponding output class. The other e-factors are equal to 0, so the total impact of the error is equal to the difference between the desired output value and the actual output. This makes the learning rule of the output neurons similar to the delta rule for the classic perceptron: it increases the weights of inputs that are positively correlated with the desired output and decreases the weights of inputs that are negatively correlated with the desired output.
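A sketch of the hidden-neuron update of Eqs. (2) and (3). The sizes, the ±1 choice for the random factors $e_{k,i}$, and the learning rate are illustrative assumptions:

```python
import random

random.seed(2)
N, K = 8, 4        # N inputs; K = 2 x number of output values (2 outputs here)
EPS = 0.1          # learning rate epsilon
w = [random.uniform(-1.0, 1.0) for _ in range(N)]
# fixed random error-broadcast factors e_{k,i} of the hidden neuron
e = [[random.choice([-1.0, 1.0]) for _ in range(N)] for _ in range(K)]

def update(w, x, y, E):
    """One application of Eq. (2) with the F of Eq. (3)."""
    new_w = []
    for i in range(N):
        err = sum(E[k] * e[k][i] for k in range(K)) / K
        F = err * (y - 0.5) * (x[i] - 0.5) * 4.0      # Eq. (3)
        new_w.append(w[i] + EPS * F)                  # Eq. (2)
    return new_w

x = [1, 0, 1, 1, 0, 0, 1, 0]           # current input of the neuron
w2 = update(w, x, y=1, E=[1.0, 0.0, 0.0, 0.0])
deltas = [b - a for a, b in zip(w, w2)]
print(sorted(set(round(d, 3) for d in deltas)))
```

The factor 4(Y − 0.5)(X − 0.5) is ±1 for binary Y and X, so each weight moves by a fixed step whose sign depends on the input/output correlation and on the broadcast error.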
Fig. 1. The history of the percentage of correct symmetry recognition for different values of the learning rate, and for a perceptron with fixed weights of the neurons in the hidden layer. The vertical axis corresponds to the percentage of correct answers; the horizontal axis corresponds to the number of training steps (in thousands).
The Large-Scale Symmetry Learning Applying Pavlov Principle 409
We tested our neural net on the symmetry detection problem with different settings. In the primary setting we used one hidden layer with 400 neurons and 2 output neurons corresponding to the symmetric and non-symmetric classes. Each neuron was connected to every neuron of the previous layer. In Fig. 1 we present two examples of the evolution of the average percentage of correct answers (averaged over the last 1000 steps) for the symmetry class during training. The learning process lasted 1,000,000 steps in this experiment. Since the learning process stabilizes after 500,000 steps for learning rate 0.01, we reduced the number of steps to 500,000 in the subsequent experiments. The final average percentage of correct answers was 94.80%.
We compared the obtained results with the performance of a classic perceptron whose hidden-layer neurons have fixed random weights. With a similar configuration (400 neurons in one hidden layer, 1,000,000 training steps, learning rate 0.001), its average performance was 59.38% of correct symmetry recognition, which is only slightly better than a random guess. The history of the percentage of correct answers for the symmetry class for the perceptron with fixed hidden-layer weights is also shown in Fig. 1 with a dotted line.
Next we investigated architectures with more than one hidden layer. We tested configurations with 1, 2, 3, and 5 hidden layers. Table 1 shows the percentage of correct answers for the symmetry class obtained after 500,000 steps of training. These percentages were measured during a special test phase with fixed weights that lasted 10,000 steps. The results were averaged over several independent runs. The obtained accuracy decreases as the number of hidden layers increases; however, it was still better than that of a perceptron with fixed hidden-layer weights.
We also investigated the impact of the number of neurons in the hidden layer. We tested configurations with 200, 400, and 800 neurons (with only one hidden layer). As can be seen from Table 2, increasing the number of neurons in the hidden layer improves the performance of the neural network.
We also investigated architectures where the network was not fully connected: each neuron in the hidden layer was randomly connected to a fixed number of neurons in the previous layer, while the output-layer neurons remained connected to all neurons of the previous layer. Table 3 presents the obtained performance for different numbers of connections.
3 Conclusion
References
1. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
2. Sejnowski, T.J., Kienker, P.K., Hinton, G.E.: Learning symmetry groups with hidden units:
beyond the perceptron. Phys. D Nonlinear Phenom. 22(1–3), 260–275 (1986)
3. Lillicrap, T., Cownden, D., Tweed, D.B., Akerman C.J.: Random feedback weights support
learning in deep neural networks. arXiv:1411.0247 (2014)
4. Dunin-Barkowski, W.L., Solovyeva, K.P.: Pavlov principle and brain reverse engineering. In: 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, Saint Louis, Missouri, USA, 30 May–2 June 2018, vol. 37, pp. 1–5 (2018)
5. Nokland, A.: Direct feedback alignment provides learning in deep neural networks. arXiv:1609.01596 (2016)
Bimodal Coalitions and Neural Networks
1 Introduction
In the early 1990s, R. Axelrod and D. Bennett proposed an approach for a formal description of the splitting of a set of interacting agents into two competing groups [1, 2]. Their results have found applications in the social, political, and management sciences. Later, Galam [3] reformulated this approach in terms of the Ising model; afterwards he elaborated the initial scheme and proposed a number of new models (see references in [4]). Further development of this approach led to the emergence of econophysics and sociophysics.
In this paper, we solve the same problem using the ideas and concepts of the discrete dynamics of the Hopfield model. We analytically examine an idealized case of two equally interacting homogeneous groups of agents and construct a phase diagram that completely describes how the decomposition of the agents into two groups depends on the intra-group interactions and the cross-interaction between the groups. Following tradition, a decomposition of the agents into two groups will be called a bimodal coalition.
1. The original setting of the problem. We have $n$ agents that are connected with each other. By $w_i$, $i = 1, \ldots, n$, we denote the weight of the $i$-th agent. The connections of the agents are interpreted in terms of their mutual propensities, which are supposed to be symmetrical:

$$p_{ij} \begin{cases} > 0, & \text{if agents } i \text{ and } j \text{ are prone to cooperate},\\ < 0, & \text{if agents } i \text{ and } j \text{ are prone to conflict}, \end{cases} \qquad p_{ij} = p_{ji}.$$
Two lists $A$ and $\tilde{A}$ define a bimodal coalition $C = (A, \tilde{A})$ or, in other words, a decomposition into two groups. Each of these lists contains the numbers of all agents assigned to the given group:

$$A = \{i_1, i_2, \ldots, i_p\}, \quad \tilde{A} = I \setminus A, \quad \text{where } I = \{1, 2, \ldots, n\} \text{ is the full list.}$$
Each grouping $C = (A, \tilde{A})$ defines a proximity relation $\delta_{ij}$ between the agents:

$$\delta_{ij}(C) = \begin{cases} 1, & \text{if agents } i \text{ and } j \text{ belong to the same list},\\ 0, & \text{if agents } i \text{ and } j \text{ belong to different lists}. \end{cases}$$

The productivity of the grouping $C$ for the $i$-th agent is

$$U_i(C) = \sum_{j=1}^{n} w_j\, p_{ij}\, \delta_{ij}(C).$$
The productivity of the grouping $C$ for the $i$-th agent is maximal if all the agents with which he is prone to cooperate belong to his group, and the group contains no agents with which he is prone to conflict.
In the Axelrod–Bennett approach it is stated that a system of agents tends to the grouping for which the weighted sum of the productivities is maximal:

$$U(C) = \sum_{i=1}^{n} w_i\, U_i(C) \to \max. \qquad (1)$$
In another paper, the same authors used this method to describe alliances of producers of UNIX operating-system standards. Nine companies involved in UNIX production were regarded as agents: AT&T, Sun, Apollo, DEC, HP, Intergraph, SGI, IBM, and Prime.
In the course of cumbersome calculations of the connections $p_{ij}$, some parameters of the problem played the role of weight coefficients. By varying the parameters within reasonable limits, the authors discovered only a weak dependence of the result on their values. They found two decompositions that provided the same global maximum of the functional (1):
• {Sun, DEC, HP} and {AT&T, Apollo, Intergraph, SGI, IBM, Prime};
• {Sun, AT&T, IBM, Prime} and {DEC, HP, Apollo, Intergraph, SGI}.
The second grouping corresponded to the then-existing associations of the companies in UNIX International and the Open Software Foundation; only IBM was identified incorrectly.
3. Ising model. In the second half of the 1990s, Serge Galam recognized that it was convenient to formulate the Axelrod–Bennett model in terms of the Ising model. Let us introduce the matrix

$$J = (J_{ij}), \qquad J_{ij} = w_i w_j\, p_{ij}\, (1 - \delta_{ij}),$$

where $\delta_{ij}$ is the Kronecker delta, so that the diagonal elements of the matrix $J$ are equal to zero. To each bimodal coalition $C$ we assign a configuration vector $s = (s_1, s_2, \ldots, s_n)$:

$$C = (A, \tilde{A}) \;\Leftrightarrow\; s = (s_1, s_2, \ldots, s_n): \quad s_i = +1,\ i \in A; \qquad s_i = -1,\ i \in \tilde{A}.$$
Then the maximization of the sum (1) is equivalent to the determination of the state $s$ corresponding to the global minimum of the energy $E(s)$:

$$E(s) = -(Js, s) = -\sum_{i,j=1}^{n} J_{ij}\, s_i s_j \to \min. \qquad (2)$$

The spins evolve under the local fields $h_i(t) = \sum_{j=1}^{n} J_{ij}\, s_j(t)$:

$$s_i(t+1) = \begin{cases} s_i(t), & \text{when } s_i(t)\, h_i(t) \ge 0,\\ -s_i(t), & \text{when } s_i(t)\, h_i(t) < 0, \end{cases} \qquad \text{i.e. } s_i(t+1) = \operatorname{sign}\bigl(h_i(t)\bigr). \qquad (3)$$
In what follows, an unsatisfied spin is a spin whose sign does not coincide with the sign of the field acting on it. If the state of the $i$-th spin changes, its contribution to the local fields acting on the other spins also changes; as a result, the states of some other spins can change as well, and so on. The evolution of the system consists of successive flips of unsatisfied spins. Each step of the evolution is accompanied by a decrease of the energy, and sooner or later the system reaches a state corresponding to an energy minimum (possibly a local one). At that moment the evolution stops, since all the spins are satisfied. However, according to the setting of the problem, we have to find the global minimum; for this purpose, improved minimization procedures [5, 6] can be used. The formulation of problem (3) in terms of neural networks allows us to illustrate the problem of bimodal coalition formation.

Concluding this section, let us note that all the energies are two-fold degenerate: $E(s) = E(-s)$. To remove the degeneracy we need an external field.
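The relaxation dynamics (3) can be sketched as an asynchronous flipping of unsatisfied spins. The 5-agent matrix below is illustrative, not from the paper: agents {0, 1, 2} cooperate with each other, agents {3, 4} cooperate with each other, and the two groups conflict.

```python
import numpy as np

# Sketch of the dynamics (3): asynchronously flip unsatisfied spins until
# every spin agrees with its local field, i.e. until a (possibly local)
# minimum of E(s) = -(Js, s) is reached.
def energy(J, s):
    return -(s @ J @ s)

def relax(J, s):
    s = s.astype(float).copy()
    while True:
        h = J @ s                              # local fields h_i
        unsatisfied = np.flatnonzero(s * h < 0)
        if unsatisfied.size == 0:
            return s                           # all spins satisfied
        s[unsatisfied[0]] *= -1                # flip one unsatisfied spin

J = np.array([[0, 1, 1, -1, -1],
              [1, 0, 1, -1, -1],
              [1, 1, 0, -1, -1],
              [-1, -1, -1, 0, 1],
              [-1, -1, -1, 1, 0]], dtype=float)
s0 = np.array([1, -1, 1, 1, -1], dtype=float)
s_final = relax(J, s0)
# the agents settle into the two built-in groups (up to a global sign flip)
```

Each flip lowers the energy, so the loop always terminates at a minimum.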
1. One homogeneous group. A homogeneous group is a group in which all the agents interact identically. In this case the interaction matrix has the form

$$J = \begin{pmatrix} 0 & a & \cdots & a\\ a & 0 & \cdots & a\\ \vdots & \vdots & \ddots & \vdots\\ a & a & \cdots & 0 \end{pmatrix}, \qquad a > 0. \qquad (4)$$
The network with such a connection matrix has only one global minimum of the energy, $s_0 = (1, 1, \ldots, 1)$, and there are no other minima. (We do not take into account the second minimum that appears due to the equality $E(s) = E(-s)$.) In other words, the states of all the agents are the same; it can be said that all the agents behave "as one person".

If we turn to Eq. (2), we see that for the system with the connection matrix (4) it is not a bimodal coalition but a consolidation of all the agents into one group that is profitable.
2. Two homogeneous groups. Let us examine a spin system consisting of two homogeneous groups. We suppose that the first group contains $p$ agents and the interactions between these agents are identical and equal to $A$. The interactions between the remaining $q$ agents (which constitute the second group) are also identical and equal to $C$. We suppose that all the interactions between agents from the first and second groups are equal to $B$. We assume that $C$ is positive and larger than $A$ and $B$, and factor out $C$. Then the connection matrix takes the form
416 L. Litinskii and I. Kaganowa
$$J = \begin{pmatrix}
0 & a & \cdots & a & b & b & \cdots & b \\
a & 0 & \cdots & a & b & b & \cdots & b \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
a & a & \cdots & 0 & b & b & \cdots & b \\
b & b & \cdots & b & 0 & 1 & \cdots & 1 \\
b & b & \cdots & b & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
b & b & \cdots & b & 1 & 1 & \cdots & 0
\end{pmatrix}, \qquad a = A/C, \quad b = B/C,$$

where the first $p$ rows and columns correspond to the first group and the last $q$ to the second.
For a neural network with such a connection matrix we describe the dependence of the set of minima on the parameters $a$, $b$, $p$, and $q$, where $p + q = n$. One can show that if a configuration corresponds to a minimum of the energy (2), its last $q$ coordinates must be identical. We denote by $R_k$ the class of configurations whose last $q$ coordinates are equal to $+1$ and exactly $k$ of whose first $p$ coordinates are equal to $-1$; in particular, $R_0$ is the all-plus configuration, and in $R_p$ the first $p$ coordinates are equal to $-1$.
It turns out that for some values of the parameters, all the configurations from a class $R_k$ simultaneously provide minima of the functional (2): for all configurations from $R_k$ the energies (2) are the same. In other words, they are all minima, and if the inequality $0 < k < p$ is fulfilled, there are no other local minima of the functional (2).
In Fig. 1 we show the partition of the $(a, b)$-plane into regions where one or another class of configurations $R_k$ provides a minimum of the functional (2). Below we interpret this diagram in terms of bimodal coalitions.
3. Sensible interpretation. From Eq. (5) for the coordinates of the global minimum it follows that the second group always acts "as one person": the last $q$ spins are equal to $+1$.

First, let us examine the case when the agents of the first homogeneous group are prone to cooperate with each other ($a > 0$). Then they also act "as one person" (see the upper half-plane of the diagram). If both groups are also prone to cooperate with each other, that is, when $b > 0$, then all the agents of the first group are in the same state as the agents of the second group, i.e., the first $p$ coordinates of the vector $R_0$ are equal to $+1$. However, if the groups conflict with each other, that is, $b < 0$, then all the agents of the first group are in the state opposite to that of the agents of the second group.
Let us summarize. When all the agents inside each group are prone to cooperate ($a > 0$), the sign of the cross-interaction defines the state of the whole system. If $b > 0$ and, consequently, the groups are prone to cooperate with each other, it is more profitable for them to be together; in this case the vector $R_0$ provides the global minimum. If $b < 0$ and the groups are in conflict, it is more profitable for the groups to be separate; in this case the global minimum corresponds to $R_p$.
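For small group sizes, these claims can be verified by brute-force enumeration of all spin configurations. The block-matrix helper and parameter values below are illustrative, not from the paper.

```python
import itertools
import numpy as np

# Brute-force check for small p, q: with a > 0, the global minimum of
# E(s) = -(Js, s) is the all-plus state R0 when b > 0, and the split state
# Rp (first p spins equal to -1) when b < 0.
def block_J(p, q, a, b):
    J = np.zeros((p + q, p + q))
    J[:p, :p] = a          # first group, intra-group interaction a
    J[p:, p:] = 1.0        # second group (C factored out)
    J[:p, p:] = b          # cross-interaction b
    J[p:, :p] = b
    np.fill_diagonal(J, 0.0)
    return J

def global_minima(J):
    n = len(J)
    states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
    energies = [-(s @ J @ s) for s in states]
    e_min = min(energies)
    return {tuple(int(v) for v in s)
            for s, e in zip(states, energies) if np.isclose(e, e_min)}

p, q, a = 3, 3, 0.5
R0 = (1,) * (p + q)
Rp = (-1,) * p + (1,) * q
minima_pos = global_minima(block_J(p, q, a, b=0.4))   # cooperation across
minima_neg = global_minima(block_J(p, q, a, b=-0.4))  # conflict across
# each set also contains the sign-flipped twin, since E(s) = E(-s)
```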
Inside the symmetric strip along the ordinate axis, both configurations $R_0$ and $R_p$ are minima simultaneously. This strip is the unique region of the plane where the
functional (2) has both global and local minima simultaneously. To the right of the ordinate axis, where $b > 0$, the vectors $R_0$ and $R_p$ provide the global and local minima, respectively; to the left of it, $R_p$ corresponds to the global minimum and $R_0$ to a local one. It is easy to explain why
such quasi-instability takes place. Indeed, suppose that the cross-interaction between the groups is equal to zero: $b = 0$. In other words, the two groups of agents are completely independent. Then problem (2) has two equivalent solutions, $R_0$ and $R_p$, corresponding to the same value of the energy. When $|b|$ increases slightly, the second configuration at first continues to be a minimum, but now a local one. When $|b|$ becomes sufficiently large, the additional local minimum disappears.
The narrow strip along the ordinate axis is thus the result of removing the accidental degeneracy of the global minimum at the external parameter value $b = 0$. It is interesting to understand whether local minima always appear for this reason, or whether there are other mechanisms for their appearance.
Finally, let us briefly discuss the situation when the agents inside the first group conflict with each other ($a < 0$). The lower half of the phase diagram shows that in this case the first group of agents splits into two opposing subgroups. This conclusion is rather reasonable; other intrinsic interpretations are more speculative.
4 Conclusions
We have shown that, in a system with a great number of interacting binary agents, the well-known problem of the formation of two competing groups, or the bimodal coalition problem, can be formulated in terms of neural networks of the Hopfield type. The neural network dynamics is convenient for describing the influence of the agents on each other. We analyzed theoretically an idealized case of interaction between two homogeneous groups of agents. The obtained results allowed us to present a sensible interpretation of the bimodal coalition problem.
We determined the mechanism of the formation of the local minima for the energy
functional. It is interesting to find out whether there are other possibilities for their
appearance. We think that our analysis is promising and deserves further examination.
Acknowledgement. The work was financially supported by State Program of SRISA RAS
No. 0065-2019-0003 (AAA-A19-119011590090-2).
We are grateful to Ben Rozonoer for his help in preparation of this paper.
References
1. Axelrod, R.M., Bennett, D.S.: A landscape theory of aggregation. Brit. J. Polit. Sci. 23(2),
211–233 (1993)
2. Axelrod, R.M., Mitchell, W., Thomas, R.E., Bennett, D.S., Bruderer, E.: Coalition formation
in standard-setting alliances. Manag. Sci. 41(9), 1493–1508 (1995)
3. Galam, S.: Fragmentation versus stability in bimodal coalitions. Phys. A 230(1–2), 174–188
(1996)
4. Galam, S.: Sociophysics. Springer, New York (2012)
5. Houdayer, J., Martin, O.C.: Renormalization for discrete optimization. Phys. Rev. Lett. 83,
1030–1033 (1999)
6. Karandashev, I.M., Kryzhanovsky, B.V.: Matrix transformation method in quadratic binary
optimization. Opt. Mem. Neural Netw. (Inf. Opt.) 24(2), 67–81 (2015)
Building Neural Network Synapses Based
on Binary Memristors
Mikhail S. Tarkov(&)
Abstract. The design of an analog multilevel memory cell based on the use of
resistors and binary memristors is proposed. This design provides a greater
number of resistance levels with a smaller number of elements than the well-
known multilevel memory devices. The cell is designed to set the synapse
weights in hardware-implemented neural networks. The neuron vector of
weights can be represented by a crossbar of binary memristors and a resistor set.
An algorithm is proposed for mapping the neuron weight to the proposed
multilevel memory cell. The proposed approach is illustrated by the construction
example of a neuron for partitioning a set of vectors into two classes.
1 Introduction
Hardware implementation of neural networks requires a large amount of memory to store the weight matrices of the neuron layers, which is expensive. The solution of this problem is simplified by using a device called the memristor (a resistor with memory) as a memory cell. The memristor was predicted theoretically in 1971 by Leon Chua [1]; the first physical realization was demonstrated in 2008 by the Hewlett-Packard laboratory as a thin-film TiO2 structure [2]. The memristor behaves like a synapse: it "remembers" the total electrical charge that has passed through it. Memory based on memristors can reach an integration degree of 100 Gbit/cm2, several times higher than that of flash memory technology. These unique properties make the memristor a promising device for creating massively parallel neuromorphic systems.
Binary memristors realize two conductivity values, while multilevel memristors realize a set of discrete conductivity levels (the number of levels can reach tens or hundreds). Binary and multilevel memristors [3–8] are based on the filament switching mechanism and are more widespread than analog memristors, whose conductivities can be changed continuously: analog memristor materials are encountered much less often and require a more complex fabrication process. Multilevel memristors are also more robust against statistical fluctuations than analog ones. The use of binary memristors to set the weight coefficients of neural networks therefore makes it important to create multilevel memory cells based on them.
$$m \ll R/2^{n}, \qquad R \ll M/n.$$

For $n$ binary digits, the cell resistance takes $2^{n}$ values, from $R\Big/\sum_{i=0}^{n-1} 2^{i}$ (we neglect the value $m$) up to about $M/n$. For example, we get 32 values using 10 elements (5 memristors and 5 resistors with resistances $R_i = R/2^{i}$, $i = 0, 1, \ldots, n-1$) (Fig. 1). For comparison, in the cell proposed in [9], 27 resistance values were obtained using 15 elements (3 memristors and 12 resistors).
according to this basis. The set of binary memristors forms a crossbar in which the number of rows equals the number $n$ of decomposition digits and the number of columns equals the number of neuron inputs. In the general case, two crossbars are required to realize the weight vector: the first for the positive weights and the second for the negative ones.
In Fig. 2, the circuits designed to set the memristor resistances of the crossbar are not shown; the corresponding scheme is presented in Fig. 3. It allows us to set the resistance of an arbitrary crossbar memristor to the minimum value $m$ or the maximum value $M$, depending on the sign of the voltage fed to the input In, which significantly exceeds the binary memristor's voltage threshold. To set the memristor resistance, the transistor T is opened by the voltage source V; in the crossbar's functioning mode this transistor is closed.
Suppose that the neural network is trained, i.e., the network weights have been calculated. To implement the neuron weights on the basis of the multilevel memory cell, we propose the following algorithm.
1. Among the neuron weight coefficients $w_1, w_2, \ldots, w_L$ ($L$ is the number of weights), choose a coefficient $w_{\min} \ne 0$ such that $|w_{\min}| \le |w_i|$ for all $i = 1, \ldots, L$. Put the coefficient $w_{\min}$ in correspondence with the resistor of minimum conductivity $R^{-1}$, where $m \ll R \ll M$.
2. Normalize the weights: $w_i \leftarrow w_i / |w_{\min}|$, $i = 1, \ldots, L$.
3. Set the number of binary digits $n = 1$.
4. For the normalized weights $w_i$, $i = 1, 2, \ldots, L$, select a set of binary coefficients $k_{ji} \in \{0, 1\}$ providing a minimum of the sum

$$S_n = \sum_{i=1}^{L} \Bigl( |w_i| - \sum_{j=0}^{n-1} k_{ji}\, 2^{j} \Bigr)^{2}.$$
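Since $S_n$ is a sum of independent per-weight terms, each term can be minimized separately by rounding $|w_i|$ to the nearest integer representable with $n$ binary digits. The helper below and its weight values are an illustrative sketch of this step, not the paper's implementation.

```python
import numpy as np

# Sketch of step 4: per-weight rounding minimizes S_n, because the weights
# enter the sum independently; k_ji are then the binary digits of the
# rounded value.
def binary_coefficients(w, n):
    k = np.zeros((n, len(w)), dtype=int)
    for i, wi in enumerate(w):
        target = int(np.clip(round(abs(wi)), 0, 2**n - 1))
        for j in range(n):
            k[j, i] = (target >> j) & 1       # j-th binary digit of target
    return k

w = np.array([3.2, 1.0, 5.9])   # illustrative, already normalized
k = binary_coefficients(w, n=3)
approx = np.array([sum(int(k[j, i]) << j for j in range(3)) for i in range(3)])
# approx reconstructs the nearest 3-bit integers to |w_i|: [3, 1, 6]
```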
$$w = \sum_{i=1}^{3} x_i - \sum_{i=4}^{6} x_i = (3, 1, 1). \qquad (1)$$

Here $x$ is the neuron input vector. The activation function (2) can be implemented on the basis of an operational amplifier operating in comparator mode.
For clarity, only memristors in the "on" state are shown here, that is, memristors with the minimal resistance $m = 100\ \Omega$. Table 1 shows the results of the experiment in the LTspice modeling system [10]. Output voltage values of 3.2 V mean that the vectors $x_1, x_2, x_3$ belong to the first class, and values of $-3.2$ V mean that the vectors $x_4, x_5, x_6$ belong to the second class (supply voltage $V = 5$ V).

In order for the operational amplifier to implement the activation function (2) at the input $x_4 = (0, 0, 0)$, a small negative bias based on the V2 source is added to the circuit. The input value 0 corresponds to zero voltage, and the input value 1 corresponds to a voltage of 0.3 V, which does not change the memristor resistance.
6 Conclusion
An analog multilevel memory cell design based on resistors and binary memristors is
proposed. This design provides a greater number of resistance levels with a smaller
number of elements than the one proposed previously. The cell is designed to set the
neuron synapse weights in the hardware-implemented neural networks.
The neuron weights can be represented by a crossbar of binary memristors and a set
of resistors. The number of resistors used in the neuron weights vector does not depend
on the number of weights.
An algorithm is proposed for mapping the neuron weights to the multilevel memory
cells with binary memristors. The proposed approach is illustrated by the neuron
construction example for partitioning a set of vector patterns into two classes. The
example is implemented in the LTSPICE software simulation environment.
References
1. Chua, L.: Memristor – the missing circuit element. IEEE Trans. Circ. Theor. 18, 507–519
(1971)
2. Strukov, D.B., Snider, G.S., Stewart, D.R., Williams, R.S.: The missing memristor found.
Nature 453, 80–83 (2008)
3. He, W., Sun, H., Zhou, Y., Lu, K., Xue, K., Miao, X.: Customized binary and multi-level
HfO2−x-based memristors tuned by oxidation conditions. Sci. Rep. 7, 10070 (2017)
4. Yu, S., Gao, B., Fang, Z., Yu, H., Kang, J., Wong, H.-S.P.: A low energy oxide-based
electronic synaptic device for neuromorphic visual systems with tolerance to device
variation. Adv. Mater. 25, 1774–1779 (2013)
5. Tarkov, M.S.: Crossbar-based hamming associative memory with binary memristors. In:
Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds.) Advances in Neural Networks – ISNN 2018.
ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham (2018). https://
link.springer.com/chapter/10.1007/978-3-319-92537-0_44. Accessed 25 Apr 2019
6. Truong, S.N., Ham, S.-J., Min, K.-S.: Neuromorphic crossbar circuit with nanoscale
filamentary-switching binary memristors for speech recognition. Nanoscale Res. Lett. 9
(629), 1–9 (2014)
7. Nguyen, T.V., Vo, M.-H.: New binary memristor crossbar architecture based neural
networks for speech recognition. Int. J. Eng. Sci. Invent. 5(5), 1–7 (2016)
8. Yakopcic, C., Taha, T.M., Subramanyam, G., Pino, R.E.: Memristor SPICE model and
crossbar simulation based on devices with nanosecond switching time. In: Proceedings of
International Joint Conference on Neural Networks, Dallas, Texas, USA, 4–9 August,
pp. 158–160, IEEE (2013) https://ieeexplore.ieee.org/abstract/document/6706773. Accessed
25 Apr 2019
9. Irmanova, A., James, A.P.: Neuron inspired data encoding memristive multi-level memory
cell. Analog Integr. Circ. Sign. Process. 95, 429–434 (2018)
10. LTspice XVII. http://www.linear.com/designtools/software/#LTspice
Author Index
A Egorchev, Mikhail, 25
Aleksey, Staroverov, 62 Engel, Ekaterina A., 45
Alexandrov, Yu. I., 138 Engel, Nikita E., 45
Alexandrov, Yuri I., 159 Eroshenkova, Daria A., 295
Andreev, Ark, 71
Andreeva, Olga V., 303 F
Arutyunova, K. R., 138 Farzetdinova, Rimma, 95
Fedorenko, Yuriy S., 207
B Filatov, Nikolay, 214
Bakhshiev, A. V., 221 Fomin, I. S., 221
Bakhshiev, Aleksandr, 214 Fomin, Ivan, 214
Beskhlebnova, Galina A., 124
Bogatyreva, Anastasia A., 263 G
Brynza, A. A., 367 Gai, Vasiliy E., 303
Bulava, Alexandra I., 159 Gapanyuk, Yuriy, 71, 78
Burikov, Sergey, 285, 319 Glyzin, Sergey D., 181
Gorban, Alexander N., 384
C Gordleeva, Susan Yu., 384
Chizhov, Anton V., 165 Gurtovoy, Konstantin, 151
Chumachenko, Sergey I., 295 Guseva, Alena, 392
D I
Dakhtin, Ivan S., 116 Igonin, Dmitry M., 309
Demareva, Valeriia A., 89 Isaev, Igor, 319
Demidovskij, Alexander V., 375 Ivanchenko, Mikhail V., 384
Demin, Vyacheslav, 255
Dick, Olga E., 172 K
Dolenko, Sergey, 285, 319 Kaganowa, Inna, 412
Dolenko, Tatiana, 285, 319 Kapustina, Ekaterina O., 271
Dolzhenko, Alexandr V., 271 Karandashev, I. M., 230, 359
Dunin-Barkowski, Witali L., 405 Kartashov, Sergey I., 144
Kashcheev, Mikhail, 53
E Kazantsev, Victor B., 190
Edeleva, Yu. A., 89 Khayrov, E. M., 230
Efitorov, Alexander, 285 Kholodny, Yuri I., 144
M T
Makarenko, Nikolay, 239 Taran, Maria, 78
Malakhov, Denis G., 144 Tarasov, A. S., 106
Malsagov, M. Yu., 230 Tarkhov, Dmitriy A., 351
Malykhina, Galina, 392 Tarkov, Mikhail S., 420
Matveev, Mikhail, 151 Telyatnikov, L. S., 359
Meilikov, Evgeny, 95 Terekhov, Serge A., 17
Mizginov, Vladimir A., 3 Terekhov, Valeri I., 295
Moshkantsev, Peter V., 3 Tiumentsev, Yury, 25
Moskalenko, Viktor, 246 Tiumentsev, Yury V., 309, 335
Muratov, Y. R., 106 Trofimov, Alexander G., 263
N U
Nekhaev, Dmitry, 255 Ushakov, Vadim L., 144
Nikiforov, M. B., 106
V
O Vasilyev, Alexander N., 351
Ohinko, Timur, 239 Vlasenko, Vladislav, 214
Orekhov, Alexey, 71 Volkov, Sergey V., 159
Orlov, Vyacheslav A., 144 Vvedensky, Victor, 151
Osipov, Grigory, 246
P Y
Palagushkin, Alexandr N., 326 Yakimova, Elena G., 165
Pankratova, Evgeniya V., 190 Yudin, Dmitry A., 271
Panov, Aleksandr I., 62 Yudkin, Fedor A., 326
Pashkov, Anton A., 116
Petrushan, Mikhail, 53 Z
Polyakov, Igor V., 303 Zaikin, Alexey A., 384
Preobrazhenskaia, Margarita M., 181 Zolotykh, Nikolai, 246