A Feedforward Neural Network Framework For Approximating The Solutions To Nonlinear Ordinary Differential Equations
https://doi.org/10.1007/s00521-022-07855-5
ORIGINAL ARTICLE
Received: 25 May 2022 / Accepted: 16 September 2022 / Published online: 1 October 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Abstract
In this paper, we propose a method to approximate the solutions to nonlinear ordinary differential equations (ODEs) using deep feedforward artificial neural networks (ANNs). The efficiency of the proposed unsupervised machine learning method is demonstrated by solving two boundary value problems (BVPs) from quantum mechanics and nanofluid mechanics. The proposed mean-squared loss function is the sum of two terms: the first term enforces the differential equation, while the second term enforces the initial or boundary conditions. The total loss function is minimized using quasi-Newton-type optimization methods to obtain the desired network output. The approximation capability of the proposed method is verified for two sets of boundary value problems: first, a second-order nonlinear ODE and, second, a system of coupled nonlinear third-order ODEs. Point-wise comparison of our approximation shows strong agreement with the available exact solutions and/or Runge–Kutta-based numerical solutions. We remark that the proposed algorithm keeps the number of learnable network parameters low for a given initial or boundary value problem. More importantly, for the coupled system of third-order nonlinear ordinary differential equations, the proposed method does not need any special adjustment for the initial/boundary conditions. Furthermore, the current method does not require any special type of computational mesh. A straightforward minimization of the total loss function yields highly accurate results even with a small number of epochs. Therefore, the proposed framework offers an attractive setting for researchers in the fluid mechanics community who are interested in studying heat and mass transfer problems.
Keywords Machine learning · Artificial neural networks · System of ordinary differential equations · Thomas–Fermi differential equation · Nanofluid mechanics
The approximation capability of ANNs was first reported as the universal approximation theorem in the seminal works of Cybenko and Hornik [20, 21]. Neural networks are machine learning models that work in a manner similar to the neurons in the human brain. Mathematically, they are directed graphs in which each vertex acts as a neuron and an edge connecting two vertices represents a connection between two neurons. The idea of using ANNs as global approximators was proposed in the pioneering works of Lee and Kang [22], Meade and Fernandez [23], Logovski [24], and Lagaris [25]. The essence of these works is to use the output of a single-hidden-layer network (with a sufficient number of neurons) to construct a discrete approximation to the exact solution of a differential equation. The important advantages of ANNs as approximators are as follows: the network approximate solution is smooth and differentiable; the neural network approximation is defined on discrete, unstructured points, so meshing is not an issue; the network output is not affected by the rounding, domain discretization, and approximation errors that typically govern convergence in other numerical methods such as the finite element method; the ANN framework is tunable and can readily be set up as a tool to approximate solutions to ordinary, partial, and integral equations [26–30], and even to approximate Green's functions for nonlinear elliptic operators [31]; the number of hyperparameters required is smaller than in a typical numerical method; and the method is implementable on parallel architectures.

In the last few years, there has been a huge surge in the literature focused on developing new deep learning frameworks, via unsupervised machine learning methods, to approximate the solutions to differential equations. A major portion of this success is attributed to the development of efficient libraries such as TensorFlow and Keras [32]. In the present work, we examine the use of a deep learning feedforward neural network framework for the solution of nonlinear ordinary differential equations. The network architecture is implemented in the Python programming language using the TensorFlow and Keras packages. We have considered two boundary value problems to illustrate the efficiency of our method: the first is a second-order singular nonlinear ordinary differential equation from quantum mechanics; the second is a coupled system of third-order nonlinear ordinary differential equations from nanofluid mechanics. In both problems, the basic idea is to minimize the total loss function to obtain the network output, a collocated approximation to the exact solution, and then to compare the network solution with the available solutions from other numerical techniques in the literature. In the case of the nanofluid model, the ANN framework is simple and straightforward and does not require any special type of finite difference method [33]. The minimization of the total loss function yields a network approximate solution that is in excellent agreement with the numerical solution from classical Runge–Kutta shooting-type methods.

The paper is organized as follows: a detailed introduction to the neural network architecture, including its parameters, activation functions, and the training and testing algorithms, is given in Sect. 2. Two numerical experiments are presented in Sect. 3: first, the Thomas–Fermi model from quantum physics is numerically approximated by the proposed neural network method, with all computational details and results given in Sect. 3.1; second, a Falkner–Skan-type flow model from nanofluid mechanics, along with its boundary layer approximation and the network approximation method, is presented in Sect. 3.2. Finally, conclusions and some directions for future work are given in Sect. 4.

2 Introduction to neural network architecture

Consider a feedforward neural network (FFNN) with $x \in \mathbb{R}^n$ as the input vector connected to a single hidden layer that produces "n" neural network outputs denoted by $N$, as shown in Fig. 1. Let $l_1, l_2, l_3, l_4$ denote the single input layer, the two hidden layers, and the single output layer, respectively. The input layer $l_1$ consists of an input vector $x = \{x_1, x_2, \ldots, x_n\}$ connected to $H$ hidden nodes, which are then connected to the nodes in the hidden layers $l_2$ and $l_3$. Let $w_{ij}$ denote the weights connecting the input nodes to the hidden nodes. The strength of the connection between the hidden layers $l_2$ and $l_3$ is denoted by $\tilde{w}_{ij}$. The last hidden layer $l_3$ is connected to the output layer $l_4$ through the weights $v_{ij}$. Let $y_i$ be the output from each node in the hidden layers and $N(x; v)$ be the neural network output. The neural network architecture also includes an additional input called the bias.

2.1 Neural network parameters

The strength of the connection between the nodes is described by the weights between the neurons across layers. If the weights of all neurons have the same value, the network fails to learn different patterns; hence, to avoid such a situation, we initialize each neuron with randomly generated values [34]. The weights in the network are assigned by an "initializer" algorithm. Since the neural network is highly sensitive to the initial weights, it is very important to choose the right initializer. The initializers used in our neural network
architecture are the Glorot normal and the Glorot uniform [35]. In Glorot uniform, the weight matrix values are a set of random numbers drawn from the uniform distribution, while the Glorot normal algorithm follows the normal distribution with a standard deviation $\tilde{s}$ and a mean $\tilde{m}$. The weights are initialized using the following formulas:

$$W = D(\tilde{m}, \tilde{s}). \qquad (1)$$

In Glorot normal, $\tilde{m} = 0$ and $\tilde{s} = \sqrt{\dfrac{2}{I_{in} + O_{out}}}$. (2)

In Glorot uniform, $\tilde{m} = \sqrt{\dfrac{6}{I_{in} + O_{out}}}$ and $\tilde{s} = \sqrt{\dfrac{6}{I_{in} + O_{out}}}$, (3)

where $I_{in}$ and $O_{out}$ are the number of inputs and outputs, respectively. Also, the symbol $D$ specifies the normal distribution for the Glorot normal and the uniform distribution for the Glorot uniform. The network consists of a special type of input node known as the "bias" $b$ that is constant throughout the layer. The bias vector is not affected by other neurons and also does not accept input from any neurons. These weights and biases are collectively called the "network parameters", denoted by $P$, which need to be adjusted to achieve the desired output.

Definition 1 A function $\sigma : \mathbb{R} \to [0, 1]$ is a squashing or activation function if it is non-decreasing with the following properties [36]:

$$\lim_{\lambda \to \infty} \sigma(\lambda) = 1 \quad \text{and} \quad \lim_{\lambda \to -\infty} \sigma(\lambda) = 0.$$

The choice of the activation function has a large impact on the capability and performance of any neural network architecture. The activation functions used in the current work are Tanh, Sigmoid, and Softmax. Among these, the Sigmoid and Softmax activation functions are explained below, and Fig. 2 depicts these functions together with their derivatives.

Sigmoid activation function: The sigmoid activation function $\sigma_s(x) : \mathbb{R} \to [0, 1]$ is a bounded, differentiable, real-valued function with a positive derivative everywhere. It is monotonically increasing and continuously differentiable everywhere, and its graph is an "S"-shaped curve (as depicted in Fig. 2a). The sigmoid and its derivative are given by

$$\sigma_s(x) = \frac{1}{1 + e^{-x}}, \qquad (4)$$
$$\sigma_s'(x) = \sigma_s(x)\,(1 - \sigma_s(x)). \qquad (5)$$
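For concreteness, the two initializers and the sigmoid activation described above map directly onto standard Keras options. The snippet below is only an illustrative sketch, not the training script used in this work; the layer width of 16 is a placeholder here rather than a value prescribed by this section.

```python
import tensorflow as tf

# A hidden layer whose weight matrix W is drawn with the Glorot normal
# scheme of Eq. (2) and whose outputs pass through the sigmoid of Eq. (4).
hidden_glorot_normal = tf.keras.layers.Dense(
    units=16,
    activation="sigmoid",
    kernel_initializer=tf.keras.initializers.GlorotNormal(),
    bias_initializer="zeros",  # the bias b is a separate trainable parameter
)

# The Glorot uniform variant of Eq. (3) only swaps the initializer.
hidden_glorot_uniform = tf.keras.layers.Dense(
    units=16,
    activation="sigmoid",
    kernel_initializer=tf.keras.initializers.GlorotUniform(),
)
```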
$$\hat{y}(x; P) \approx y(x), \qquad (14)$$

the NN output $\hat{y}(x; P)$ depends on the input vector $x$ and the network's hyperparameters $P$. The goal of the training phase is to learn the parameters $P$ by optimizing the loss function $L(x; P)$, which is written as the sum of the differential equation loss $L_d$ and a loss enforcing the initial/boundary conditions.

In the above algorithm, steps 4 and 5 include the backpropagation process.
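To make the structure of this training loss concrete, the sketch below assembles a total loss with exactly these two ingredients for the toy problem $y' = -y$, $y(0) = 1$. This is a hypothetical example chosen only to illustrate the construction; it is not one of the benchmark problems of Sect. 3, and Adam is used here for brevity in place of the quasi-Newton/AdaMax optimizers discussed elsewhere in the paper.

```python
import tensorflow as tf

# Simple network for the approximation y_hat(x; P): one input, one output.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(1),
])

# Collocation points on which the ODE residual is evaluated.
x = tf.reshape(tf.linspace(0.0, 2.0, 100), (-1, 1))

def total_loss():
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = model(x)
    dy_dx = tape.gradient(y, x)
    # L_d: mean-squared residual of the ODE y' + y = 0 at the collocation points.
    loss_d = tf.reduce_mean(tf.square(dy_dx + y))
    # Second term: squared error in the initial condition y(0) = 1.
    y0 = model(tf.constant([[0.0]]))
    loss_b = tf.square(y0[0, 0] - 1.0)
    return loss_d + loss_b

optimizer = tf.keras.optimizers.Adam(1e-2)
for epoch in range(2000):
    with tf.GradientTape() as tape:
        loss = total_loss()
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```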
Table 1 The absolute error between the neural network output and the reference solution for the Thomas–Fermi differential equation

x     ŷ(x)        y_ref(x)    |ŷ(x) − y_ref(x)|
0     1           1           0
1     0.426335    0.424008    0.002327
2     0.244121    0.243009    0.001113
3     0.157281    0.156633    0.000649
4     0.108716    0.108404    0.000312
5     0.078848    0.078808    4.00E-05
6     0.05934     0.059423    8.27E-05
7     0.045976    0.046098    0.000122
8     0.036382    0.036587    0.000205
9     0.029171    0.029591    0.00042
10    0.023524    0.024314    0.00079
15    0.007164    0.010805    0.003641
20    1.25E-06    0.005785    0.005784

Table 2 Values of the initial slope obtained by varying the number of hidden layers

Layers    Loss           ŷ′(0)
1         2.27 × 10⁻²    −1.49338443
2         2.7 × 10⁻³     −1.55807343
3         2 × 10⁻³       −1.56169785
4         2 × 10⁻³       −1.55733242
5         1.2 × 10⁻³     −1.5653681
6         2 × 10⁻³       −1.56185863
7         2.2 × 10⁻⁴     −1.56720482
8         6.67 × 10⁻⁴    −1.56599946

Table 3 Values of the initial slope obtained by varying the number of hidden neurons

Neurons    Loss      ŷ′(0)
16         0.0227    −1.49338443
32         0.0241    −1.48843538
64         0.0236    −1.48969492
128        0.0239    −1.48865737
256        0.0253    −1.48394417

The boundary conditions for the flow problem are as follows:

$$\text{as } y \to \infty: \quad u = u_e(x), \quad v = 0, \quad T = T_\infty, \quad C = C_\infty. \qquad (42)$$

Our goal in this work is to develop a framework for the numerical solution of the problem described above. To reduce the partial differential equation model (37)–(40) to the corresponding ordinary differential equation system, we first introduce a scalar stream function $\psi(x, y)$ defined as

$$u = \frac{\partial \psi}{\partial y} \quad \text{and} \quad v = -\frac{\partial \psi}{\partial x}, \qquad (43)$$

which satisfies the continuity equation (37) automatically. Then, to study the problem under meaningful boundary layer assumptions, we introduce the following dimensionless variables [43–45]:

$$\psi = \sqrt{\frac{2\,u_e(x)\,\nu\,x}{1+m}}\, f(\eta), \qquad (44a)$$
$$T = T_\infty + (T_w - T_\infty)\,\theta(\eta), \qquad (44b)$$
$$C = C_\infty + (C_w - C_\infty)\,\phi(\eta), \qquad (44c)$$

where $\eta$ is the similarity variable defined as

$$\eta = y\,\sqrt{\frac{(1+m)\,u_e(x)}{2\,\nu\,x}}. \qquad (45)$$

In the current work, we have assumed the free-stream velocity (also known as the potential flow velocity) to be $u_e(x) = a x^m$, where $a$ is a constant. The exponent $m$ is the wedge angle parameter, and it is a function of $\beta$.
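As a quick consistency check on the transformation (43)–(45) above (a standard manipulation under the stated similarity assumptions, not an additional result of the paper), differentiating the stream function (44a) with respect to $y$ recovers the streamwise velocity in similarity form:

$$u = \frac{\partial \psi}{\partial y}
  = \sqrt{\frac{2\,u_e(x)\,\nu\,x}{1+m}}\; f'(\eta)\,\frac{\partial \eta}{\partial y}
  = \sqrt{\frac{2\,u_e(x)\,\nu\,x}{1+m}}\,\sqrt{\frac{(1+m)\,u_e(x)}{2\,\nu\,x}}\; f'(\eta)
  = u_e(x)\, f'(\eta),$$

so the far-field condition $u \to u_e(x)$ in (42) corresponds to $f'(\eta) \to 1$ as $\eta \to \infty$.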
Fig. 5 a The NN solution for the profile f(η) and b velocity profile of the nanofluid
Fig. 6 a Temperature profile of the nanofluid and b Concentration profile of the nanoparticle
Fig. 7 The velocity, temperature and concentration profiles of a nanofluid using a neural network method and b Runge–Kutta method
3.2.1 Results and discussion

The training of the neural network model for $f$ and $\theta$ was done for 7k epochs, and for $\phi$ it was done with 20k epochs. The training and testing data set consists of 100 uniformly spaced points in the computational domain $(0, 3)$. The loss functions $L_f$, $L_\theta$, and $L_\phi$ are minimized using the AdaMax optimizer with the sigmoid activation function applied on 16 hidden neurons. The network weights are initialized using the Glorot normal initializer. All the parameter values used in this problem are taken from [46]. The results obtained from the deep learning network model are shown below.
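A minimal sketch of how a network with these settings might be assembled in Keras is given below (16 sigmoid hidden neurons, Glorot normal initialization, the AdaMax optimizer, and 100 uniformly spaced points in (0, 3)). The coupled ODE-residual losses $L_f$, $L_\theta$, $L_\phi$ themselves are problem-specific and are not reproduced here, and the learning rate shown is a placeholder.

```python
import numpy as np
import tensorflow as tf

# 100 uniformly spaced collocation points in the computational domain (0, 3).
eta = np.linspace(0.0, 3.0, 100, dtype=np.float32).reshape(-1, 1)

# One small network per unknown profile (f, theta, phi): 16 sigmoid hidden
# neurons with Glorot normal initial weights, and a linear output node.
def make_net() -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="sigmoid",
                              kernel_initializer="glorot_normal"),
        tf.keras.layers.Dense(1, kernel_initializer="glorot_normal"),
    ])

f_net, theta_net, phi_net = make_net(), make_net(), make_net()

# AdaMax optimizer used to minimize L_f, L_theta and L_phi.
optimizer = tf.keras.optimizers.Adamax(learning_rate=1e-3)
```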
For $f$, the model training completes within 15 seconds, recording a best loss value of $1.1 \times 10^{-4}$. Fig. 5a and b show the comparison of the neural network solution for the stream function $f$ and the velocity profile for $m = 0.0909$ and $s = 0.5$ with the solutions obtained by the Runge–Kutta method. The parametric values chosen to obtain the NN solution for $\phi$ are $m = 0.0909$, $N_t = N_B = 0.1$, $\delta = 0.2$, $Le = 5$. The model training completes within 25 seconds, recording a best loss value of $7 \times 10^{-2}$. In Fig. 5a, the NN solution for $\phi$ is compared with the Runge–Kutta solution.

Fig. 7 displays the comparison of the neural network solution for all three profiles (velocity, temperature, and concentration) with the standard Runge–Kutta method. The parameter values used are as follows: $m = 0.0909$, $s = 0.5$, $N_t = N_B = 0.1$, $\delta = 0.2$, $Pr = 6.2$, $Le = 5$. Table 5 shows the absolute error of $f$, $\theta$, and $\phi$ between the NN solution and the Runge–Kutta solution.

Table 5 The error between the neural network solution and the reference solution

η      f_ref          f̂            |f_ref − f̂|
0      0.916674306    0.916674306   0
0.3    0.97114453     0.971142      2.17E-06
0.6    1.113634692    1.113624      1.10E-05
0.9    1.317738131    1.317732      6.01E-06
1.2    1.562806969    1.56278       2.75E-05
1.5    1.83341886     1.833363      5.60E-05
1.8    2.118838392    2.118779      5.89E-05
2.1    2.412195059    2.412137      5.81E-05
2.4    2.709473054    2.709389      8.36E-05
2.7    3.008534433    3.008414      0.00012
3      3.308343141    3.308343141   0

4 Conclusions
treatment is needed in the proposed method; a direct minimization of the loss function yields accurate solutions to the higher-order differential equations.

In future work, we plan to explore ANN-based tools for other types of nonlinear systems of differential equations, fractional-order operators, partial differential equations, and other challenging multiphysics problems. There are also many important issues related to the proof of convergence of the ANN method, establishing a relationship between the loss function and the errors, and many other theoretical questions that deserve further investigation and attention.

Acknowledgements The authors would like to thank Research and Innovation and the College of Science & Engineering, Texas A&M University-Corpus Christi (TAMUCC), for supporting this research. The authors also acknowledge the high-performance computing clusters at TAMUCC for providing access to the computing systems, and thank the anonymous reviewers for their valuable comments.

Data Availability Data sharing is not applicable as no new datasets were generated during this study.

Declarations

Conflict of Interest The authors declare that they have no conflict of interest.

References

1. Yoon H, Mallikarjunaiah S (2022) A finite element discretization of some boundary value problems for nonlinear strain-limiting elastic bodies. Math Mech Solids 27(2):281–307. https://doi.org/10.1177/10812865211020789
2. Yoon HC, Vasudeva KK, Mallikarjunaiah SM (2022) Finite element model for a coupled thermo-mechanical system in nonlinear strain-limiting thermoelastic body. Commun Nonlinear Sci Numer Simul 108:106262
3. Lee S, Yoon HC, Mallikarjunaiah SM (2022) Finite element simulation of quasi-static tensile fracture in nonlinear strain-limiting solids with the phase-field approach. J Comput Appl Math 399:113715
4. Gou K, Mallikarjuna M, Rajagopal K, Walton J (2015) Modeling fracture in the context of a strain-limiting theory of elasticity: a single plane-strain crack. Int J Eng Sci 88:73–82
5. Muddamallappa MS (2015) On two theories for brittle fracture: modeling and direct numerical simulations. Ph.D. thesis
6. Mallikarjunaiah S, Walton J (2015) On the direct numerical simulation of plane-strain fracture in a class of strain-limiting anisotropic elastic bodies. Int J Fract 192(2):217–232
7. Ferguson LA, Muddamallappa M, Walton JR (2015) Numerical simulation of mode-III fracture incorporating interfacial mechanics. Int J Fract 192(1):47–56
8. Thomas LH (1927) The calculation of atomic fields. In: Mathematical proceedings of the Cambridge Philosophical Society, vol. 23. Cambridge University Press, pp. 542–548
9. Schrödinger E (1926) An undulatory theory of the mechanics of atoms and molecules. Phys Rev 28(6):1049
10. Falkner V, Skan SW (1931) LXXXV. Solutions of the boundary-layer equations. London, Edinburgh, Dublin Philos Mag J Sci 12(80):865–896
11. Lakshmi KM, Siddheshwar PG, Muddamallappa MS (2020) Study of rotating Bénard–Brinkman convection of Newtonian liquids and nanoliquids in enclosures. Int J Mech Sci 188:105931
12. Muddamallappa MS, Bhatta D, Riahi DN (2009) Numerical investigation on marginal stability and convection with and without magnetic field in a mushy layer. Transp Porous Media 79(2):301–317
13. Bhatta D, Riahi DN, Muddamallappa MS (2012) On nonlinear evolution of convective flow in an active mushy layer. J Eng Math 74(1):73–89
14. Lin S (1976) Oxygen diffusion in a spherical cell with nonlinear oxygen uptake kinetics. J Theor Biol 60(2):449–457
15. Gou K, Muddamallappa MS (2020) An analytic study on nonlinear radius change for hyperelastic tubular organs under volume expansion. Acta Mechanica 231(4):1503–17
16. Butcher JC (2016) Numerical methods for ordinary differential equations. John Wiley & Sons, UK
17. Grossmann C, Roos H, Stynes M (2007) Numerical treatment of partial differential equations, vol. 154. Springer
18. LeVeque RJ (2007) Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM
19. Lambert J (1991) Numerical methods for ordinary differential systems: the initial value problem. John Wiley & Sons Inc, UK
20. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
21. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314
22. Lee H, Kang IS (1990) Neural algorithm for solving differential equations. J Comput Phys 91(1):110–131
23. Meade AJ Jr, Fernandez AA (1994) The numerical solution of linear ordinary differential equations by feedforward neural networks. Math Comput Model 19(12):1–25
24. Logovski AC (1992) Methods for solving of differential equations in neural basis. In: Proceedings 1992 RNNS/IEEE symposium on neuroinformatics and neurocomputers. IEEE, pp. 919–927
25. Lagaris IE, Likas A, Fotiadis DI (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans Neural Netw 9(5):987–1000
26. Ngom M, Marin O (2021) Fourier neural networks as function approximators and differential equation solvers. Stat Anal Data Min: ASA Data Sci J 14(6):647–661
27. Lau LL, Werth D (2020) ODEN: A framework to solve ordinary differential equations using artificial neural networks. arXiv preprint arXiv:2005.14090
28. Rao C, Sun H, Liu Y (2021) Physics-informed deep learning for computational elastodynamics without labeled data. J Eng Mech 147(8):04021043
29. Shi E, Xu C (2021) A comparative investigation of neural networks in solving differential equations. J Algorithm Comput Technol 15:1748302621998605
30. Dockhorn T (2019) A discussion on solving partial differential equations using neural networks. arXiv preprint arXiv:1904.07200
31. Gin CR, Shea DE, Brunton SL, Kutz JN (2021) DeepGreen: deep learning of Green's functions for nonlinear boundary value problems. Sci Rep 11(1):1–14
32. Terra J (2021) Keras vs Tensorflow vs Pytorch: understanding the most popular deep learning frameworks
33. Cebeci T, Bradshaw P (2012) Physical and computational aspects of convective heat transfer. Springer Science & Business Media
34. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
35. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, pp. 249–256
36. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5(4):115–133
37. Nocedal J, Wright SJ (1999) Numerical optimization. Springer, USA
38. Mustapha A, Mohamed L, Ali K (2021) Comparative study of optimization techniques in deep learning: application in the ophthalmology field. J Phys: Conf Ser 1743:012002
39. Rumelhart DE, Durbin R, Golden R, Chauvin Y (1995) Backpropagation: the basic theory. In: Backpropagation: theory, architectures and applications, pp. 1–34
40. Fermi E (1927) Un metodo statistico per la determinazione di alcune proprietà dell'atomo. Rend Accad Naz Lincei 6(602–607):32
41. Jovanovic R, Kais S, Alharbi FH (2014) Spectral method for solving the nonlinear Thomas–Fermi equation based on exponential functions. J Appl Math 2014:168568. https://doi.org/10.1155/2014/168568
42. Parand K, Delkhosh M (2017) Accurate solution of the Thomas–Fermi equation using the fractional order of rational Chebyshev functions. J Comput Appl Math 317:624–642
43. Yacob NA, Ishak A, Pop I (2011) Falkner–Skan problem for a static or moving wedge in nanofluids. Int J Therm Sci 50(2):133–139
44. Mahanthesh B, Mackolil J, Mallikarjunaiah SM (2021) Response surface optimization of heat transfer rate in Falkner–Skan flow of ZnO–EG nanoliquid over a moving wedge: sensitivity analysis. Int Commun Heat Mass Transfer 125:105348
45. Mahabaleshwar US, Vishalakshi AB, Bognar GV, Mallikarjunaiah SM (2022) Effects of thermal radiation on the flow of a Boussinesq couple stress nanofluid over a porous nonlinear stretching sheet. Int J Appl Comput Math 8(4):1–7
46. Kasmani RM, Sivasankaran S, Siri Z (2014) Convective heat transfer of nanofluid past a wedge in the presence of heat generation/absorption with suction/injection. In: AIP conference proceedings, vol. 1605. American Institute of Physics, pp. 506–511

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.