Learning Deep Learning
Received on April 01, 2022. Revised on July 21, 2022. Accepted on July 22, 2022.
As a consequence of its capability of creating high-level abstractions from data, deep learning has been effectively employed in a wide range of applications, including physics. Though deep learning can, at first, be simplistically understood in terms of very large neural networks, it also encompasses new concepts and methods. In order to understand and apply deep learning, it is important to become familiar with the respective basic concepts. In this text, after briefly reviewing some works relating physics and deep learning, we introduce and discuss some of the main principles of deep learning as well as some of its principal models. More specifically, we describe the main elements, their use, as well as several of the possible network architectures. A companion tutorial in Python has been prepared in order to complement our approach.
Keywords: Deep learning, Tutorial, Classification.
1. Introduction

In order for humans to interact with their environment, which includes other humans, it is necessary to develop models of the entities in the world (e.g. [1]). These models allow not only the recognition of important objects/actions, but also provide subsidies for making predictions that can have great impact on our lives.

As a consequence of our restricted cognitive abilities, the developed models of world entities need to have some level of abstraction, so as to allow a more effective handling and association of concepts, and also as a means to obtain some degree of generalization in the representations [2].

Interestingly, the ability of abstraction is required from humans as a consequence of the need to prevent a level of detail that would otherwise surpass our memory and/or processing capacity [3]. So, when we derive a category of a real object such as a pear, we leave out a large amount of detailed information (e.g. color variations, small shape variations, etc.) so as to accommodate the almost unlimited instances of this fruit that can be found. Provided that we choose an effective set of features to describe the pear, we will be able to recognize almost any instance of this fruit as being a pear, while generally not being able to distinguish between the pears in a tree.

Ideally, it would be interesting for the recognition to operate at varying levels of detail so that, after recognizing the general type of object, we could proceed to subsequent levels of increased detail and information, therefore achieving a more powerful performance. This is the case with several categories that are particularly important to humans, such as faces, plants and animals, as well as actions, among others. In these cases, subcategories are created, leading to increasing levels of information and detail. However, because of limited memory and processing, the problem becomes increasingly complex (e.g. [4]), and we need to stop this subcategorization at a point that is viable given our needs.

As a consequence of the fundamental importance of pattern recognition for humans, and also of our limitations, interest was progressively invested in developing automated means for performing this ability, leading to areas such as automated pattern recognition, machine learning, and computer vision (e.g. [5]).

Artificial approaches to pattern recognition typically involve two main steps: (a) feature extraction; and (b) classification based on these features (e.g. [1, 6]). Figure 1 illustrates these two stages. While the former was initially human-assisted, efforts were focused on the classification itself. From the very beginning, neural networks were understood to provide a motivation and reference, as a consequence of the impressive power of biological nervous systems (e.g. [7]).

Interestingly, the subject of artificial neural networks (e.g. [8]) received successive waves of attention from the scientific-technologic community (for instance, respective to the Perceptron (e.g. [9]) and Hopfield networks (e.g. [10, 11])). These waves, frequently observed in scientific development, are characterized by a surge of development induced by one or more important advances, until a kind of saturation is reached as a consequence of the necessity of new major conceptual and/or technological progresses. After saturation arises, [...]

* Correspondence email address: [email protected]
[Figure 1: supervised pattern recognition pipeline, in which features such as size, weight, and width are extracted from the data and fed into a classifier (f) that outputs a category.]

[...] including but not limited to mathematics, computer science, biology, and physics. In the case of deep learning, important concepts of physics [14, 15] have often been employed. Here, we briefly and non-exhaustively review some of the concepts of physics that have found their [...]
Figure 2: (a) A highly simplified biological neuron. The main parts of a neuron include: dendrites and synapses; the cellular body; the implantation cone (represented as the dashed region), in which the integration occurs; and the axons that transmit the signal to the next neuron(s). (b) A possible model of a neuron k, k = 1, 2, ..., K: the input data x_i, i = 1, 2, ..., N, come from the input layer at the left-hand side, and each of these values is multiplied by the respective weight w_{i,k}. These values are then summed up, yielding z_k, which is fed into the activation function, producing the output y_k.

[...] to assign the correct class to a given input data, it is necessary to associate the more appropriate weights, by using a given training method. One possibility is to optimize W according to an error function, where the error is updated as follows

w_{i,k}(n) = w_{i,k}(n − 1) + α x_i ε,   (1)

where w_{i,k} ∈ W, w_{i,k}(n − 1) are the current weights, w_{i,k}(n) are the updated weights, α is the learning rate, x_i is the input data, and ε is the error. Interestingly, this simple methodology can classify data from two classes that are linearly separable. Figure 2 presents a highly simplified biological neuron (a) as well as a possible respective model (b).

In order to represent more general regions, sets of neurons have been considered, which are organized as a network [8]. The more straightforward manner is the use of the Multilayer Perceptron (MLP) [8]. In this case, the neurons are organized into three types of layers:

• Input layer: the first layer of the network (data input);

[...] of learning any possible shape [33], which is called the universal approximation theorem. So, theoretically, this number of layers is enough for any problem. However, usually at least two hidden layers are used, because this was found to decrease the learning time and improve the accuracy [34].
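To make the neuron model of Figure 2(b) and the update rule of Eq. (1) more concrete, the following minimal NumPy sketch (a complement to the text, not part of the original companion tutorial) trains a single neuron with a step activation on a toy, linearly separable data set; the data, learning rate, and number of passes are arbitrary choices made for illustration.

import numpy as np

# Toy, linearly separable problem: two classes split by the line x0 + x1 = 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # weights w_{i,k} of a single neuron k
alpha = 0.1       # learning rate

for n in range(10):                                # a few passes over the training set
    for x, target in zip(X, labels):
        y = 1.0 if np.dot(x, w) > 0 else 0.0       # step activation at the "cell body" sum
        error = target - y                         # epsilon in Eq. (1)
        w = w + alpha * x * error                  # w_{i,k}(n) = w_{i,k}(n-1) + alpha x_i epsilon

predictions = (X @ w > 0).astype(float)
print("training accuracy:", (predictions == labels).mean())   # close to 1 for this separable problem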
4. The Deep Learning Framework

One of the main points of deep learning is the capability of the network to extract features directly from the data. While feature extraction and classification are performed as separate steps in standard machine learning methods, in deep learning the network can learn the features from the raw data. Figure 3 illustrates the similarities and differences between a typical neural network and a convolutional deep learning network.

4.1. Optimization

Optimization is one of the key points of deep learning. This step consists of minimizing the loss function during neural network training. The loss function, which measures the quality of the neural network in modeling the data, is used in order to optimize the network weights (W). There are several functions that can be used as loss functions, some examples include: Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Bias Error (MBE), SVM Loss, and Cross Entropy Loss. The chosen function depends on the deep learning network type and the performed task.

The method used for optimization is called the optimizer. This optimization allows the classifier to learn its weights W with respect to the training data. Because it is not possible to know the location of the global minimum of this function, several methods have been considered, including gradient descent, stochastic gradient descent, and Adam [35]. The latter, one of the most often adopted methods, consists of an efficient method for stochastic optimization. As in stochastic gradient descent, Adam also employs random sub-samples, called minibatches. For each of the optimized parameters, one individual adaptive learning rate is used, and the parameters are estimated from the first and second moments of the gradients. This method is indicated for problems involving a significant number of parameters [35]. Another advantage of Adam is that it requires [...]
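As a concrete illustration of the ideas above, the sketch below performs minibatch optimization of an MSE loss with the Adam update rule [35] in plain NumPy. The toy regression data and the batch size are arbitrary; the hyperparameter values follow commonly used defaults and are not taken from the text.

import numpy as np

# Toy regression problem: find w minimizing the MSE loss of the linear model X w.
rng = np.random.default_rng(1)
X = rng.normal(size=(512, 3))
true_w = np.array([0.5, -2.0, 1.0])
t = X @ true_w + 0.1 * rng.normal(size=512)

w = np.zeros(3)
m = np.zeros(3)                                   # first moment of the gradients
v = np.zeros(3)                                   # second moment of the gradients
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

for step in range(1, 2001):
    batch = rng.choice(len(X), size=32, replace=False)     # random minibatch
    Xb, tb = X[batch], t[batch]
    grad = 2 * Xb.T @ (Xb @ w - tb) / len(batch)           # gradient of the MSE loss
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**step)                          # bias-corrected moment estimates
    v_hat = v / (1 - beta2**step)
    w -= alpha * m_hat / (np.sqrt(v_hat) + eps)            # one adaptive learning rate per parameter

print(w)   # should approach true_w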
Figure 3: A simple multilayer perceptron (a), and a convolutional deep learning network (b). A convolutional network involves layers
dedicated to convolution, pooling, and flattening. Each matrix of the convolution layer is associated with a given kernel ki . Often
a feedforward network is adopted for the classifier component.
4.2. GPUs

The Graphics Processing Unit (GPU) was created to [...]

[Figure: schematic comparison between a CPU, with a few cores and relatively large control and memory units, and a GPGPU, with a much larger number of simpler cores.]
Revista Brasileira de Ensino de Física, vol. 44, e20220101, 2022 DOI: https://doi.org/10.1590/1806-9126-RBEF-2022-0101
Arruda et al. e20220101-5
[...] is defined as

f(z) = 1 / (1 + e^{−z}),   (2)

where z is the value of the sum computed at the cell body (at the implantation cone). Another alternative is the hyperbolic tangent [37], which is defined as

f(z) = (e^{z} − e^{−z}) / (e^{z} + e^{−z}).   (3)

In this case, the function returns positive or negative values when the input is positive or negative, respectively, as shown in Figure 5(c). Due to this characteristic, the hyperbolic tangent is typically employed in tasks involving many negative inputs.

Another possibility is the identity function [13], also called the linear function. In this case, the input and output are exactly the same, as can be observed in Figure 5(d). This function is typically employed for regression tasks. In the case of convolutional neural networks (see Section 4.5), the most common activation function is the Rectified Linear Unit (ReLU) [13], which is defined as

f(z) = max(0, z).   (4)

This function is shown in Figure 5(e). In the case of image data, the values of the pixels are given by positive numbers, so the input layer does not have negative values. This function is understood as being easy to optimize and to preserve properties having good generalization potential.

Alternatively, the Leaky Rectified Linear Unit (Leaky ReLU) [38, 39] function can be employed instead of ReLU. The difference is that the Leaky ReLU returns output values different from zero when the inputs are negative. In some situations, the Leaky ReLU was found to reduce the training time. This function is defined as

f(z) = max(αz, z),   (5)

where α is the parameter that controls the negative part of the function. An example of this function is illustrated in Figure 5(f).

The softmax function [13] can be used in the last layer to deal with classification problems in which there are many distinct classes. This function is defined for each neuron k, k = 1, 2, ..., K (see Fig. 2), as follows

f(z_k) = e^{z_k} / Σ_{i=1}^{K} e^{z_i},   (6)

where z_i is the ith input to the respective activation function and K is the number of inputs to that function. Because the sum of the exponential values normalizes this function, it can be understood as a probability distribution.
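The activation functions discussed above (Eqs. (2) to (6)) can be written in a few lines of NumPy; the sketch below is a complement to the text, and the sample input values are arbitrary.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # Eq. (2)

def tanh(z):
    return np.tanh(z)                        # Eq. (3)

def relu(z):
    return np.maximum(0.0, z)                # Eq. (4)

def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)          # Eq. (5)

def softmax(z):
    e = np.exp(z - np.max(z))                # subtracting max(z) improves numerical stability
    return e / e.sum()                       # Eq. (6)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(sigmoid(z), relu(z), leaky_relu(z), softmax(z), sep="\n")
print(softmax(z).sum())                      # 1.0, i.e. a probability distribution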
4.4. Deep learning main aspects

This subsection briefly describes the characteristic aspects of deep learning.

4.4.1. Bias

The concept of bias consists of incorporating a fixed value, b, as input to the neural layer [40]. This value allows the activation function to adapt in order to better fit the data. Biasing can be mathematically represented as

y_k = f(X^{T} · W_k + b),   (7)

where X = [x_i] is the input column vector, W_k = [w_{i,k}] is the column vector k derived from the weight matrix W, f(·) is a given activation function, and y_k is the output of neuron k.
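As a quick illustration of Eq. (7), the snippet below computes the output of a single neuron with and without a bias term; the input values, weights, and bias are arbitrary numbers chosen for this example.

import numpy as np

X = np.array([0.2, -1.0, 0.5])        # input column vector X = [x_i]
W_k = np.array([1.5, 0.3, -0.8])      # weights of neuron k (column k of W)
b = 0.7                               # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y_no_bias = sigmoid(X @ W_k)          # f(X^T . W_k)
y_k = sigmoid(X @ W_k + b)            # Eq. (7): the bias shifts the input of the activation
print(y_no_bias, y_k)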
f (z) = max(αz, z), (5)
the classifier as a ranking. Another solution consists
where α is the parameter that controls the negative part of assigning a separated variable to each category. An
of the function. An example of this function is illustrated example regarding fruits can be found in Figure 6. This
in the Figure 5(f). approach is called one hot encoding.
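A minimal sketch of one hot encoding for the fruit example of Figure 6 follows; the alphabetical column order chosen here is an arbitrary convention and differs from the column order shown in the figure.

import numpy as np

samples = ["Blueberry", "Pear", "Apple", "Apple", "Pear"]
categories = sorted(set(samples))                  # ['Apple', 'Blueberry', 'Pear']
index = {c: i for i, c in enumerate(categories)}   # category -> column

one_hot = np.zeros((len(samples), len(categories)), dtype=int)
for row, fruit in enumerate(samples):
    one_hot[row, index[fruit]] = 1                 # a single 1 per row, no implied ranking

print(one_hot)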
[Figure 6: example of one hot encoding of the categorical feature "Fruit".]

Categorical features    Encodings
Blueberry               1 0 0
Pear                    0 1 0
Apple                   0 0 1
Apple                   0 0 1
Pear                    0 1 0

Figure 7: Example of max pooling, in which the highest number of each window is selected and assigned to the new, reduced matrix.

Input matrix:           Pooled matrix:
0 3 6 1 2 0             9 6 5
9 7 5 6 5 2             8 8 9
4 8 8 5 9 0             7 8 8
1 3 5 4 2 3
7 6 7 8 8 4
1 3 1 2 0 7

Figure 8: Example of overfitting while classifying samples from two classes, represented by blue circles and yellow squares, in the presence of noise. The dashed lines indicate the proper separation between regions, and the black lines indicate the separation found by a classifier. This classification problem can be described by the regions as in (a), but different samplings from this problem can lead to rather different classification curves, as illustrated in (b) and (c), since the curve adheres too much to each of the noisy sampled data sets.
4.4.3. Pooling

This process is used in convolutional neural networks (CNNs), typically after the convolution, for reducing the dimensionality of a given matrix, by first partitioning each matrix in an intermediate layer and then mapping each partition into a single value [42]. There are many possibilities of pooling. For example, the max pooling selects the maximum value from each window; the min pooling considers the minimum value instead, among many other possibilities. See an example in Figure 7.
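A small NumPy sketch of max pooling applied to the matrix of Figure 7 is given below; non-overlapping 2 × 2 windows are assumed, which is consistent with the reduced matrix shown in that figure.

import numpy as np

def max_pool(matrix, size=2):
    n, m = matrix.shape
    pooled = np.zeros((n // size, m // size), dtype=matrix.dtype)
    for i in range(0, n, size):
        for j in range(0, m, size):
            # keep only the largest value of each non-overlapping window
            pooled[i // size, j // size] = matrix[i:i + size, j:j + size].max()
    return pooled

x = np.array([[0, 3, 6, 1, 2, 0],
              [9, 7, 5, 6, 5, 2],
              [4, 8, 8, 5, 9, 0],
              [1, 3, 5, 4, 2, 3],
              [7, 6, 7, 8, 8, 4],
              [1, 3, 1, 2, 0, 7]])

print(max_pool(x))   # [[9 6 5], [8 8 9], [7 8 8]], as in Figure 7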
4.4.4. Flattening

This technique is employed in CNNs to convert a 2D matrix (or a set of matrices) into a 1D vector, as

[x_{1,1} x_{1,2} ··· x_{1,M}; x_{2,1} x_{2,2} ··· x_{2,M}; ··· ; x_{N,1} x_{N,2} ··· x_{N,M}]  −−Flattening−→  [x_{1,1}, x_{1,2}, ..., x_{1,M}, x_{2,1}, x_{2,2}, ..., x_{2,M}, ..., x_{N,M}]^{T},   (8)

where N × M is the dimension of the input matrix X. By considering a matrix set, the resultant vector represents the concatenation of the vectors respective to all of the matrices.
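In NumPy, the flattening of Eq. (8) corresponds to a row-by-row reshape; a short sketch follows, in which the 2 × 3 example matrix is arbitrary.

import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])           # N x M matrix

flat = X.flatten()                  # row-by-row concatenation, as in Eq. (8)
print(flat)                         # [1 2 3 4 5 6]

# For a set of matrices (e.g. the feature maps of a convolutional layer),
# the resulting vectors are concatenated one after the other.
maps = np.stack([X, 10 * X])        # two 2 x 3 matrices
print(maps.flatten())               # [ 1  2  3  4  5  6 10 20 30 40 50 60]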
4.4.5. Overfitting

Overfitting (e.g. [43]) happens when the model fits, in the presence of noise or original category error, many details of the training data at the expense of undermining its generality for classifying different data. Figure 8 illustrates an example of this behavior. Some of the possible approaches to address overfitting are discussed in the following.

4.4.6. Dropout

Dropout was proposed in order to minimize the problem of overfitting [44]. More specifically, the objective of this approach is to avoid excessive detail by replacing a percentage of the values of a given layer with zeros. The percentage of zeros is the parameter of this technique. The success of Dropout derives from the fact that the neurons do not learn too much detail of the instances of the training set. Alternatively, it can be considered that Dropout generates many different versions of a neural network, and each version has to fit the data with good accuracy.
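A minimal sketch of how dropout can be applied to the activations of a layer during training is shown below; the layer values and the dropout rate are arbitrary, and the rescaling by 1/(1 − p), known as inverted dropout, is one common convention assumed here.

import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations                        # dropout is disabled at inference time
    mask = rng.random(activations.shape) >= p     # keep each value with probability 1 - p
    return activations * mask / (1.0 - p)         # rescale so the expected value is preserved

layer_output = rng.normal(size=(4, 6))            # activations of a hypothetical layer
print(dropout(layer_output, p=0.5))               # about half of the values become zero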
4.4.7. Batch normalization

Batch normalization [45] is based on the idea of normalizing the input of each layer to zero mean and unit standard deviation for a given batch of data. The advantages of applying this technique include the reduction of the number of training steps and, consequently, a decrease of the learning time. This effect occurs because it allows the neurons from any layer to learn separately from those in the other layers. Another advantage is that, in [...]
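The core normalization step can be sketched in a few lines of NumPy; the toy batch below and the learnable scale and shift parameters (gamma and beta), which are used in practice on top of the plain normalization, are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(7)
batch = rng.normal(loc=3.0, scale=2.0, size=(32, 8))   # 32 samples, 8 features

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)                  # statistics computed over the batch dimension
    std = x.std(axis=0)
    x_hat = (x - mean) / (std + eps)       # zero mean, unit standard deviation per feature
    return gamma * x_hat + beta            # optional learnable rescaling

normalized = batch_norm(batch)
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(6))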
1 https://github.com/hfarruda/deeplearningtutorial/blob/master/deepLearning_feedforward.ipynb
2 https://github.com/hfarruda/deeplearningtutorial/blob/master/deepLearning_CNN.ipynb
Table 1: Comparison among the models considered in this work. *RBMs are normally employed as a part of deep belief networks. †NLP means Natural Language Processing. The last column presents links to the tutorials elaborated for each model.

Models       Learning         Main Applications                                   Information Flow   Tutorials
Feedforward  Supervised       Classification and Regression.                      Single Direction   Tutorial – 1
CNN          Supervised       Computer Vision.                                    Single Direction   Tutorial – 2
RNN          Supervised       Temporal Series.                                    With Loops         Tutorial – 3
RBM*         Unsupervised     Computer Vision, Recommender Systems,               Undirected         Tutorial – 4
                              Information Retrieval, and Data Compression, etc.
Autoencoder  Unsupervised     Information Retrieval and Data Compression.         Single Direction   Tutorial – 5
GAN          Semi-supervised  Generation of Images, Audio Synthesis,              Single Direction   Tutorial – 6
                              NLP†, and Temporal Series.

Full tutorial available at https://github.com/hfarruda/deeplearningtutorial
[...] particularly important for understanding a wide range of deep learning-related methods. Table 1 summarizes the revised models and some of their respective characteristics. A tutorial in Python has been prepared to serve as a companion to this work, illustrating and complementing the covered material (https://github.com/hfarruda/deeplearningtutorial). It is hoped that the reader will be motivated to probe further into the related literature.

Acknowledgments

Henrique F. de Arruda acknowledges FAPESP for sponsorship (grant no. 2018/10489-0, from 1st February 2019 until 31st May 2021). H. F. de Arruda also thanks Soremartec S.A. and Soremartec Italia, Ferrero Group, for partial financial support (from 1st July 2021). His funders had no role in study design, data collection and analysis, decision to publish, or manuscript preparation. Alexandre Benatti thanks Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. Luciano da F. Costa thanks CNPq (grant no. 307085/2018-0) and FAPESP (proc. 15/22308-2) for sponsorship. César H. Comin thanks FAPESP (grant nos. 2018/09125-4 and 2021/12354-8) for financial support. This work has also been supported by FAPESP grants 11/50761-2 and 15/22308-2.

References

[1] L.F. Costa, Modeling: The human approach to science (cdt-8), available in: https://www.researchgate.net/publication/333389500_Modeling_The_Human_Approach_to_Science_CDT-8, accessed in 06/06/2019.
[2] E.B. Goldstein and J. Brockmole, Sensation and perception (Cengage Learning, Belmont, 2016).
[3] R.G. Cook and J.D. Smith, Psychological Science 17, 1059 (2006).
[4] L.F. Costa, Quantifying complexity (cdt-6), available in: https://www.researchgate.net/publication/332877069_Quantifying_Complexity_CDT-6, accessed in 06/06/2019.
[5] L.F. Costa and R.M. Cesar Jr, Shape analysis and classification: theory and practice (CRC Press, Inc., Boca Raton, 2000).
[6] R.O. Duda, P.E. Hart and D.G. Stork, Pattern classification (John Wiley & Sons, Hoboken, 2012).
[7] F.R. Monte Ferreira, M.I. Nogueira and J. DeFelipe, Frontiers in Neuroanatomy 8, 1 (2014).
[8] S. Haykin, Neural networks and learning machines (Pearson Education, India, 2009), 3 ed., v. 10.
[9] I. Stephen, IEEE Transactions on Neural Networks 50, 179 (1990).
[10] J.D. Keeler, Cognitive Science 12, 299 (1988).
[11] B. Xu, X. Liu and X. Liao, Computers & Mathematics with Applications 45, 1729 (2003).
[12] S. Dutta, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8, e1257 (2018).
[13] I. Goodfellow, Y. Bengio and A. Courville, Deep learning (MIT Press, Cambridge, 2016).
[14] N. Thuerey, P. Holl, M. Mueller, P. Schnell, F. Trost and K. Um, arXiv:2109.05237 (2021).
[15] A. Tanaka, A. Tomiya and K. Hashimoto, Deep Learning and Physics (Springer, Singapore, 2021).
[16] L. Zdeborová, Nature Physics 16, 602 (2020).
[17] D.E. Rumelhart, G.E. Hinton and J.L. McClelland, in: Parallel distributed processing: Explorations in the microstructure of cognition, edited by D.E. Rumelhart and J.L. McClelland (MIT Press, Cambridge, 1986).
[18] R. Salakhutdinov and H. Larochelle, in: Proceedings of the thirteenth international conference on artificial intelligence and statistics (Sardinia, 2010).
[19] M.H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy and R. Melko, Physical Review X 8, 021050 (2018).
[20] N. Srivastava and R.R. Salakhutdinov, Advances in Neural Information Processing Systems 25 (2012).
[21] I. Goodfellow, M. Mirza, A. Courville and Y. Bengio, in: Proceedings of Advances in Neural Information Processing Systems 26 (Lake Tahoe, 2013).
[22] M. Lutter, C. Ritter and J. Peters, arXiv:1907.04490 (2019).
[23] C. Häger and H.D. Pfister, IEEE Journal on Selected Areas in Communications 39, 280 (2020).
[24] P. Sadowski and P. Baldi, in: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, edited by L. Rozonoer, B. Mirkin and I. Muchnik (Springer, Boston, 2018).
[25] M. Erdmann, J. Glombitza, G. Kasieczka and U. Klemradt, Deep Learning for Physics Research (World Scientific, Singapore, 2021).
[26] M. Raissi, The Journal of Machine Learning Research 19, 932 (2018).
[27] D. Guest, K. Cranmer and D. Whiteson, Annual Review of Nuclear and Particle Science 68, 161 (2018).
[28] T.A. Le, A.G. Baydin and F. Wood, in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54 (Fort Lauderdale, 2017).
[29] A.G. Baydin, L. Shao, W. Bhimji, L. Heinrich, L. Meadows, J. Liu, A. Munk, S. Naderiparizi, B. Gram-Hansen, G. Louppe, et al., in: Proceedings of the international conference for high performance computing, networking, storage and analysis (Denver, 2019).
[30] G. Zheng, X. Li, R.H. Zhang and B. Liu, Science Advances 6, eaba1482 (2020).
[31] A. Guillen, A. Bueno, J. Carceller, J. Martínez-Velazquez, G. Rubio, C.T. Peixoto and P. Sanchez-Lucas, Astroparticle Physics 111, 12 (2019).
[32] M. Reichstein, G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, N. Carvalhais and Prabhat, Nature 566, 195 (2019).
[33] R. Hecht-Nielsen, in: Proceedings of the international conference on Neural Networks (New York, 1987).
[34] M.M. Poulton, in: Handbook of Geophysical Exploration: Seismic Exploration (Elsevier, Amsterdam, 2001), v. 30.
[35] D.P. Kingma and J. Ba, arXiv:1412.6980 (2014).
[36] A.K. Jain, J. Mao and K. Mohiuddin, Computer 29, 31 (1996).
[37] P. Sibi, S.A. Jones and P. Siddarth, Journal of Theoretical and Applied Information Technology 47, 1264 (2013).
[38] A.L. Maas, A.Y. Hannun and A.Y. Ng, in: Proceedings of the International Conference on Machine Learning (Atlanta, 2013).
[39] B. Xu, N. Wang, T. Chen and M. Li, arXiv:1505.00853 (2015).
[40] C.M. Bishop and N.M. Nasrabadi, Pattern recognition and machine learning (Springer, Berlin, 2006), v. 4.
[41] A. Deshpande and M. Kumar, Artificial intelligence for big data: Complete guide to automating big data solutions using artificial intelligence techniques (Packt Publishing Ltd, Birmingham, 2018).
[42] M. Cheung, J. Shi, O. Wright, L.Y. Jiang, X. Liu and J.M. Moura, IEEE Signal Processing Magazine 37, 139 (2020).
[43] X. Ying, in: 2018 International Conference on Computer Information Science and Application Technology, v. 1168 (Daqing, 2019).
[44] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever and R.R. Salakhutdinov, arXiv:1207.0580 (2012).
[45] S. Ioffe and C. Szegedy, in: Proceedings of the 32nd International Conference on Machine Learning (Mountain View, 2015).
[46] J. Schmidhuber, Neural Networks 61, 85 (2015).
[47] D.E. Rumelhart, G.E. Hinton and R.J. Williams, Nature 323, 533 (1986).
[48] Y. Bengio, P. Simard and P. Frasconi, IEEE Transactions on Neural Networks 5, 157 (1994).
[49] S. Hochreiter and J. Schmidhuber, Neural Computation 9, 1735 (1997).
[50] G.E. Hinton, S. Osindero and Y.W. Teh, Neural Computation 18, 1527 (2006).
[51] E. Aarts and J. Korst, Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing (John Wiley & Sons, Inc., Hoboken, 1989).
[52] G.E. Hinton, Neural Computation 14, 1771 (2002).
[53] P. Baldi, in: Proceedings of ICML workshop on unsupervised and transfer learning, PMLR 27 (Bellevue, 2012).
[54] L. McInnes, J. Healy and J. Melville, arXiv:1802.03426 (2018).
[55] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, in: Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence and K.Q. Weinberger (NeurIPS Proceedings, Montreal, 2014).
[56] K. Roth, A. Lucchi, S. Nowozin and T. Hofmann, in: Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan and R. Garnett (NeurIPS Proceedings, Montreal, 2014).
[57] L. Metz, B. Poole, D. Pfau and J. Sohl-Dickstein, in: Proceedings of the International Conference on Learning Representations (San Juan, 2016).
[58] G.C. Cawley and N.L. Talbot, Journal of Machine Learning Research 11, 2079 (2010).