
Neural Computing and Applications (2023) 35:1661–1673

https://doi.org/10.1007/s00521-022-07855-5

ORIGINAL ARTICLE

A feedforward neural network framework for approximating the solutions to nonlinear ordinary differential equations

Pavithra Venkatachalapathy 1,2 · S. M. Mallikarjunaiah 1

Received: 25 May 2022 / Accepted: 16 September 2022 / Published online: 1 October 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022

Abstract
In this paper, we propose a method to approximate the solutions to nonlinear ordinary differential equations (ODEs) using deep learning feedforward artificial neural networks (ANNs). The efficiency of the proposed unsupervised-type machine learning method is shown by solving two boundary value problems (BVPs) from quantum mechanics and nanofluid mechanics. The proposed mean-squared loss function is the sum of two terms: the first term enforces the differential equation, while the second term enforces the initial or boundary conditions. The total loss function is minimized using a general class of quasi-Newton optimization methods to obtain the desired network output. The approximation capability of the proposed method is verified for two boundary value problems: first, a second-order nonlinear ODE and, second, a system of coupled nonlinear third-order ODEs. Point-wise comparison of our approximation shows strong agreement with the available exact solutions and/or Runge–Kutta-based numerical solutions. We remark that the proposed algorithm keeps the number of learnable network hyperparameters small for a given initial or boundary value problem. More importantly, for the coupled system of third-order nonlinear ordinary differential equations, the proposed method does not need any adjustment of the initial/boundary conditions. Also, the current method does not require any special type of computational mesh. A straightforward minimization of the total loss function yields highly accurate results even with a small number of epochs. Therefore, the proposed framework offers an attractive setting for the fluid mechanics community interested in studying heat and mass transfer problems.

Keywords Machine learning · Artificial neural networks · System of ordinary differential equations · Thomas–Fermi differential equation · Nanofluid mechanics

S. M. Mallikarjunaiah
[email protected]

Pavithra Venkatachalapathy
[email protected]

1 Department of Mathematics & Statistics, Texas A&M University-Corpus Christi, Corpus Christi, TX 78412-5825, USA
2 Department of Mathematics & Statistics, Texas Tech University, 1108 Memorial Circle, Lubbock, TX 79409-1042, USA

1 Introduction

Singular and nonlinear differential equations arise frequently in the fields of material science [1-7], quantum mechanics [8, 9], fluid mechanics [10-13], and biology [14, 15]. For some differential equations, there are many techniques available to obtain explicit expressions for the exact solutions, while the solutions to other differential equations are approximated by numerical discretization algorithms. Numerical techniques approximate the unknown solution value at each of the discretized points (known as nodes) in the computational domain. However, devising an appropriate computational algorithm for higher order nonlinear differential equations on complex geometries is always challenging. The literature on numerical methods for the solutions of differential equations is huge; we refer the interested reader to some classic books [16-19].

There is great scope for developing more efficient and unified numerical methods to approximate the solutions to differential equations. Recently, Artificial Neural Networks (ANNs) have shown a tremendous capability of approximating solutions to differential equations.
The approximation capability of ANNs was first reported as the universal approximation theorem in the seminal works of Cybenko and Hornik [20, 21]. Neural networks are machine learning models that work similarly to the neurons in the human brain. Mathematically, they are directed graphs in which each vertex acts like a neuron and an edge connecting two vertices represents a connection between two neurons. The idea of using ANNs as global approximators was proposed in the pioneering works of Lee and Kang [22], Meade and Fernandez [23], Logovski [24], and Lagaris [25]. The essence of these works is to use the output of a single hidden-layer network (with a sufficient number of neurons) to construct a discrete approximation to the exact solution of the differential equation. The important advantages of ANNs as approximators are as follows: the network approximate solution is smooth and differentiable; the neural network approximation lives on discrete unstructured points, so meshing is not an issue; the network output is not affected by the rounding errors, domain discretization errors and approximation errors that typically define convergence in other numerical methods such as the finite element method; the ANN framework is tunable and can readily be set up as a tool to approximate solutions to ordinary, partial and integral equations [26-30], and even to approximate Green's functions for nonlinear elliptic operators [31]; the number of hyperparameters required is smaller than in a typical numerical method; and the method is implementable on parallel architectures.

In the last few years, there has been a huge surge in the literature focusing on the development of new deep learning frameworks via unsupervised machine learning methods to approximate the solutions to differential equations. A major portion of the success is attributed to the development of efficient libraries such as TensorFlow, Keras and others [32]. In the present work, we examine the use of a deep learning feedforward neural network framework for the solutions to nonlinear ordinary differential equations. The network architecture is implemented in the Python programming language using the TensorFlow and Keras packages. We have considered two boundary value problems to illustrate the efficiency of our method: the first is a second-order singular nonlinear ordinary differential equation from quantum mechanics; the second is a coupled system of third-order nonlinear ordinary differential equations from nanofluid mechanics. In both problems, the basic idea is to minimize the total loss function to obtain the network output, a collocated approximation to the exact solution, and then compare the network solution with the available solutions from other numerical techniques in the literature. In the case of the nanofluid model, the ANN framework is simple, straightforward, and without any special type of finite difference method [33]. The minimization of the total loss function yields a network approximate solution that is in excellent agreement with the numerical solution from classical Runge–Kutta shooting-type methods.

The paper is organized as follows: a detailed introduction to the neural network architecture, including its parameters, activation functions, and the training and testing algorithms, is given in Sect. 2. The two numerical experiments are presented in Sect. 3; first, the Thomas–Fermi model from quantum physics is numerically approximated by the proposed neural network method, and all the computational details and results are given in Sect. 3.1; second, a Falkner–Skan type flow model from nanofluid mechanics, along with its boundary layer approximation and network approximation method, is presented in Sect. 3.2. Finally, conclusions and some directions for future work are given in Sect. 4.

2 Introduction to neural network architecture

Consider a Feedforward Neural Network (FFNN) with x \in \mathbb{R}^n as input vector connected to a single hidden layer that produces "n" neural network outputs denoted by N, as shown in Fig. 1. Let l_1, l_2, l_3, l_4 denote the single input layer, the two hidden layers and the single output layer, respectively. The input layer l_1 consists of an input vector x = {x_1, x_2, ..., x_n} connected to H hidden nodes, which are then connected to the nodes in the hidden layers l_2 and l_3. Let w_{ij} denote the weights connecting the input nodes to the hidden nodes. The strength of the connection between the hidden layers l_2 and l_3 is denoted by \tilde{w}_{ij}. The last hidden layer l_3 is connected to the output layer l_4 through the weights v_{ij}. Let y_i be the output from each node in the hidden layers and N(x; v) be the neural network output. The neural network architecture also consists of an additional input called the bias.
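Since the paper's implementation uses Python with TensorFlow and Keras, a minimal sketch of the architecture just described may help fix ideas. The helper name, layer count and width below are illustrative assumptions (the initializer and activation choices anticipate the next two subsections), not the authors' exact configuration.

```python
import tensorflow as tf

# Minimal sketch of the FFNN of Fig. 1: a scalar input x, hidden layers of
# H = 16 sigmoid neurons (the width used later for the Thomas-Fermi problem),
# Glorot-normal weights, and a linear output N(x; P).
def build_ffnn(num_hidden_layers=2, neurons=16):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(1,)))
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(
            neurons,
            activation="sigmoid",
            kernel_initializer="glorot_normal",   # cf. Eqs. (1)-(2) below
            bias_initializer="zeros"))
    model.add(tf.keras.layers.Dense(1, activation=None))
    return model

model = build_ffnn()
model.summary()
```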
Fig. 1 Feedforward neural network architecture with a single hidden layer

2.1 Neural network parameters

The strength of the connection between the nodes is described by the weights between the neurons across layers. If the weights of all neurons have the same value, the network fails to learn different patterns; hence, to avoid such a situation, we initialize each neuron with random values generated by the computer [34]. The weights in the network are assigned by an "initializer" algorithm. Since the neural network is highly sensitive to the initial weights, it is very important to choose the right initializer. The initializers used in our neural network architecture are the Glorot normal and the Glorot uniform [35]. In Glorot uniform, the weight matrix values are a set of random numbers drawn from the uniform distribution, while the Glorot normal algorithm follows the normal distribution with a standard deviation \tilde{s} and a mean \tilde{m}. The weights are initialized using the following formulas:

W = D(\tilde{m}, \tilde{s}).   (1)

In Glorot normal, \tilde{m} = 0 and \tilde{s} = \sqrt{\dfrac{2}{I_{in} + O_{out}}}.   (2)

In Glorot uniform, \tilde{m} = \sqrt{\dfrac{6}{I_{in} + O_{out}}} and \tilde{s} = \sqrt{\dfrac{6}{I_{in} + O_{out}}},   (3)

where I_{in} and O_{out} are the number of inputs and outputs, respectively. Also, the symbol D specifies the normal distribution for the Glorot normal and the uniform distribution for the Glorot uniform. The network consists of a special type of input node known as the "bias" b that is constant throughout the layer. The bias vector is not affected by other neurons and does not accept input from any neurons. These weights and biases are collectively called the "network parameters", denoted by P, that need to be adjusted to achieve the desired output.

2.2 Activation function

The activation function, also known as the squashing function, is applied to each neuron in the hidden and the output layers of the network. The activation function is used to regulate the output of the network model and it can be either linear or nonlinear. The mathematical definition of the activation function is as follows:

Definition 1 A function \sigma : \mathbb{R} \to [0, 1] is a squashing or activation function if it is non-decreasing with the following properties [36]:

\lim_{k \to \infty} \sigma(k) = 1, \quad and \quad \lim_{k \to -\infty} \sigma(k) = 0.

The choice of the activation function has a large impact on the capability and performance of any neural network architecture. The activation functions used in the current work are Tanh, Sigmoid, and Softmax. Among these, the Sigmoid and Softmax activation functions are explained below, and Fig. 2 plots these functions with their derivatives.

Sigmoid activation function: The sigmoid activation function \sigma_s(x) : \mathbb{R} \to [0, 1] is a bounded, differentiable, real-valued function with positive derivative everywhere. It is monotonically increasing and continuously differentiable everywhere. This is an "S"-shaped curve (as depicted in Fig. 2a). The sigmoid and its derivative are given by,

\sigma_s(x) = \dfrac{1}{1 + e^{-x}},   (4)

\sigma_s'(x) = \sigma_s(x)\,(1 - \sigma_s(x)).   (5)

Softmax activation function: It takes an n-dimensional input vector and produces an n-dimensional output vector whose values are between 0 and 1 and add up to 1. The softmax function and its derivative are given by,

\sigma_t(x_i) = \dfrac{e^{x_i}}{\sum_j e^{x_j}} \quad \forall\, i, j \in 1, \dots, N,   (6)
\sigma_t'(x_i) = \begin{cases} \sigma_t(x_i)\,(1 - \sigma_t(x_i)) & i = j, \\ -\sigma_t(x_i)\,\sigma_t(x_j) & i \neq j. \end{cases}   (7)

Fig. 2 Commonly used activation functions with their derivatives

The mathematical operation in the hidden layer is the linear combination of an input with its weights, called the weighted sum and denoted by "g", which is converted to a nonlinear quantity by applying the activation function \sigma(\cdot) at each of the hidden nodes; the output of each hidden layer is then

y_i = \sigma(g_i) = \sigma\!\left( \sum_{j=1}^{n} x_j w_{ij} \right).   (8)

The same operation is applied for each hidden neuron in the network model. The final output, at each output node of the FFNN, is given by:

N_j = \sum_{i=1}^{H} v_{ij}\, y_i.   (9)
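For concreteness, the forward pass of Eqs. (8)-(9) for a single hidden layer can be written out directly. The dimensions and random weights below are illustrative only; they are not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # Eq. (4)

rng = np.random.default_rng(0)
n, H = 3, 16                               # illustrative sizes
x = rng.normal(size=n)                     # input vector
w = rng.normal(size=(H, n))                # input-to-hidden weights w_ij
v = rng.normal(size=H)                     # hidden-to-output weights v_i
b = np.zeros(H)                            # bias, constant across the layer

g = w @ x + b                              # weighted sums g_i
y = sigmoid(g)                             # hidden outputs y_i = sigma(g_i), Eq. (8)
N = v @ y                                  # network output N, Eq. (9)
print(N)
```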
The type of activation function \sigma(g_i) is chosen based on the problem. Let N(x; P) denote the neural network output, where P = {w_{ij}, v_{ij}} is the set of network parameters. The neural network output is then used in the loss function given by,

L(x; P) = f(x, N(x; P)).   (10)

Definition 2 Let \tilde{L} : \mathbb{R}^n \to \mathbb{R} be a continuously differentiable function and x \in \mathbb{R}^n be an n-dimensional vector subjected to certain constraints c. The minimization problem for \tilde{L} is then defined as follows:

\min_{x \in \mathbb{R}^n} \tilde{L}(x) \quad subject to \quad \begin{cases} c_i(x_1, \dots, x_n) = r_i, & i = 1, \dots, m, \\ c_j(x_1, \dots, x_n) \leq s_j, & j = 1, \dots, z. \end{cases}   (11)

Definition 3 A vector of points x^* is called a minimum of an objective function \tilde{L} if,

\tilde{L}(x^*) \leq \tilde{L}(x) \quad \forall\, x \in \mathbb{R}^n.   (12)

Optimization is the process of minimizing or maximizing an objective function \tilde{L}(x) subject to certain constraints on its variables [37]. The minimizers can be found using algorithms called optimizers. Any optimization process begins with an initial guess x and generates a sequence of corrections until it reaches the true value x^*. The role of an optimizer in a neural network model is to find the optimal parameters P that significantly reduce the function \tilde{L}(P) while training the model. Here P is the vector of network parameters. A few of the optimizers used in the FFNN are discussed below, and a comparative study of different optimizers can be found in [38].

The overall process of the NN model has to go through training and testing phases. Initially, we create a training data set that is fed through the input layer to the first hidden layer. The activation function is applied to each hidden node, which produces the output as the activated weighted sum given in Eq. (8). The output obtained from the network model is used to define the loss function, which is considered as the appropriate metric to evaluate the neural network solution.

The training of the network model involves finding the optimized parameters P that minimize the loss function L(x; P). This is done by a gradient descent-based optimization method. The optimizer iteratively updates the NN parameters in the negative gradient direction of L(x; P), ensuring that the network parameters flow toward a minimum. The gradients of L(x; P) are computed by the backpropagation method [39]. This completes the training phase for the network model. In the testing phase, the network model is
tested for the same number of training points on the given domain. The obtained numerical values are treated as the neural network solution for the problem under investigation and are used for the post-processing of the results.

In the remainder of this section, we illustrate the neural network mechanism to approximate the solution to ODEs. Consider the following differential operator defined on the domain \Omega = (a, b):

L[x, y(x), y'(x), y''(x)] = 0 \quad for \quad x \in \Omega := (a, b),   (13)

with the above operator subjected to the boundary conditions y(a) = A and y(b) = B. The neural network-based approximate solution \hat{y}(x) is defined such that:

\hat{y}(x; P) \approx y(x),   (14)

where the NN output \hat{y}(x; P) depends on the input vector x and the network's hyperparameters P. The goal of the training phase is to learn the parameters P by optimizing the loss function L(x; P), written as the sum of the differential equation loss L_d,

L_d = \dfrac{1}{N_{int}} \sum_{i=1}^{n} \big( L[x_i, \hat{y}(x_i), \hat{y}'(x_i), \hat{y}''(x_i)] \big)^2,   (15)

with N_{int} as the number of interior data points, and the boundary condition loss L_{bc},

L_{bc} = (\hat{y}(a) - y(a)) + (\hat{y}(b) - y(b)).   (16)

Then, the mean-squared loss is given by,

L = L_d^2 + L_{bc}^2.   (17)

The overall process of the FFNN consists of a training and a testing phase that can be viewed as an algorithm:

Algorithm 1 Training phase

while L > ε do
  Step 1: Initialize the weights w_ij, w̃_ij, v_ij and initialize the loss value with L = 0.
  Step 2: y_i ← σ(g_i), the output of the hidden layer.
  Step 3: o_k ← v_ij σ(y_i), where k = number of output nodes and o_k is the final output.
  Step 4: Compute the loss gradients ∂L/∂w_ij, ∂L/∂w̃_ij and ∂L/∂v_ij.
  Step 5: The weights are modified using the following gradient descent method:

      w_ij^(n+1) = w_ij^n + ∇w_ij^n,   (18)
      w̃_ij^(n+1) = w̃_ij^n + ∇w̃_ij^n,   (19)
      v_ij^(n+1) = v_ij^n + ∇v_ij^n.    (20)

  Step 6: If L ≤ ε, STOP; otherwise, go to Step 2 and initiate new training.
end while
Step 7: Once the training is completed, the next step is to test the network model for the given testing dataset.

In the above algorithm, Steps 4 and 5 include the backpropagation process.

Algorithm 2 Testing phase

Step 1: Formulate a mean-squared loss function for the boundary value problem under investigation.
Step 2: Choose the neural network model architecture intuitively, depending on the complexity of the problem.
Step 3: Create a one-dimensional testing dataset x by discretizing the given interval (a, b) and then, by a linear interpolation, create a dataset for the dependent variable. Test the overall loss value by adjusting the network hyperparameters.
Step 4: Save the optimized network parameters to test the neural network model.
Step 5: Save the network prediction, which will later be used as a collocated numerical solution for the given BVP for further postprocessing.
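To make Algorithm 1 concrete, the sketch below shows one way the loss of Eqs. (15)-(16) and a single gradient update could be realized in TensorFlow, with \hat{y}' and \hat{y}'' supplied by automatic differentiation. The right-hand side f, the interval, the Adam optimizer and the plain sum of the two loss contributions are assumptions of this sketch, not the paper's exact settings; the problem-specific losses of Sect. 3 follow the same structure with the AdaMax optimizer.

```python
import tensorflow as tf

# Sketch of one training step for a generic second-order BVP
# y'' = f(x, y) on (a, b) with y(a) = A, y(b) = B.
def f(x, y):
    return -y                                        # placeholder right-hand side

a, b, A, B = 0.0, 1.0, 0.0, 1.0
x_int = tf.reshape(tf.linspace(a, b, 100), (-1, 1))  # interior (collocation) points
x_bc = tf.constant([[a], [b]])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(1)])
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step():
    with tf.GradientTape() as tape:                  # d(loss)/d(network parameters)
        with tf.GradientTape() as t2:
            t2.watch(x_int)
            with tf.GradientTape() as t1:
                t1.watch(x_int)
                y = model(x_int)
            dy = t1.gradient(y, x_int)               # y'
        d2y = t2.gradient(dy, x_int)                 # y''
        L_d = tf.reduce_mean(tf.square(d2y - f(x_int, y)))            # Eq. (15)
        y_bc = model(x_bc)
        L_bc = tf.square(y_bc[0, 0] - A) + tf.square(y_bc[1, 0] - B)  # boundary mismatch
        loss = L_d + L_bc                            # total loss (cf. Eq. (17))
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(5000):                            # training loop of Algorithm 1
    current_loss = train_step()
```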
3 Numerical experiments

The proposed feedforward neural network method is illustrated by solving two different types of problems: one from quantum mechanics; the other a boundary value problem from the study of a heat transfer issue in nanofluid mechanics. The point-wise convergence of the network approximate solution is shown for both problems. Both problems considered in this paper do not possess an exact solution; hence, we compare the network output to reference solutions from the relevant literature and to numerical solutions obtained from the Runge–Kutta method. The entire training, testing and optimization procedures were done on the High Performance Computing cluster at Texas A&M University - Corpus Christi, Texas, USA. The whole procedure took several minutes even for more than 30k epochs and with the complex neural network architecture. In all the numerical experiments, we used a learning rate of 0.001 and tested several choices of activation function and optimizers. In the next two subsections, we provide a detailed study of the network approximation method for both BVPs.

3.1 Thomas–Fermi differential equation model

The Thomas–Fermi Differential Equation (TFDE) [8, 40] is a mathematical model for multi-electron atoms that is used to approximate the distribution of electrons in an atom. The nucleus is surrounded by an electron cloud that is represented as a zero temperature, negatively charged, degenerate Fermi–Dirac fluid. The TFDE model treats the N_e electrons as a Fermi-electron fluid in the ground state. The electron fluid surrounding the nucleus is held by the electrostatic forces that come from both the nucleus and the other electrons.

Let N_e electrons be placed in a box of volume V; then the number density is n = N_e/V, and the density of states with respect to the electron wave number k is

\dfrac{d N_e}{dk} = \dfrac{k^2 V}{\pi^2}.   (21)

By integrating the above equation from k = 0 to k = K_F, we obtain an expression for the Fermi wave number,

K_F = (3\pi^2 n)^{1/3}.   (22)

The total kinetic energy of the electrons is:

K.E = \dfrac{\hbar^2 (3\pi^2 N_e)^{5/3}}{10\, m_1 \pi^2 V^{2/3}}.   (23)

The pressure p in terms of the number density n is given by:

p = \dfrac{d(K.E)}{dV} = \dfrac{\hbar^2 (3\pi^2 n)^{5/3}}{15\, m_1 \pi^2}.   (24)

For \rho as the electron charge density, the pressure gradient is

\nabla p = e n \nabla \phi,   (25)

where \phi is the electrostatic potential and e is the electron charge. Taking the gradient of p in Eq. (24), equating with Eq. (25) and integrating, we arrive at the total energy of the electrons,

\dfrac{p_F^2}{2m} - e\phi = -e\phi_0,   (26)

where p_F = \hbar K_F is the Fermi momentum. Let y be the dimensionless dependent variable that depends on the radius r, given by the equation:

y = \dfrac{r \Psi}{Z e},   (27)

where Z is the nuclear charge. Then, the number density in terms of y is expressed as:

n = \dfrac{1}{3\pi^2} \left( \dfrac{2 m_1 Z e^2}{\hbar^2} \right)^{3/2} \left( \dfrac{y}{r} \right)^{3/2},   (28)

r = \tilde{b} x, \quad \tilde{b} = \dfrac{3^{2/3} \pi^{2/3} a_0}{2^{7/3} Z^{1/3}},   (29)

where a_0 is the Bohr radius and x is the dimensionless radial variable. Then, the radial equation for y is

\dfrac{d^2 y}{dx^2} = \dfrac{1}{\sqrt{x}}\, y^{3/2}, \quad 0 < x < \infty,   (30)

y(0) = 1, \quad y(\infty) = 0.   (31)

The differential Eq. (30) together with the boundary conditions (31) constitutes the Thomas–Fermi differential equation model (TFDEM). The main objective of any solution algorithm for the TFDEM is to find the initial slope of the solution curve, i.e., y'(0), which is a key ingredient in the formula for the energy of the neutral atom, given by [41]:

E = \dfrac{6}{7} \left( \dfrac{4\pi}{3} \right)^{2/3} Z^{7/3}\, y'(0).   (32)
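Since the network output is later compared against classical numerical solutions, it is worth noting how such a baseline might be produced. The sketch below (using SciPy's collocation BVP solver, which is not part of the paper's toolchain) is one way to generate a reference profile and initial-slope estimate for (30)-(31) on a truncated domain; the small offset eps, the grid and the initial guess are assumptions of the sketch.

```python
import numpy as np
from scipy.integrate import solve_bvp

# Thomas-Fermi equation y'' = y^(3/2) / sqrt(x) as a first-order system.
def rhs(x, y):
    return np.vstack([y[1], np.clip(y[0], 0.0, None) ** 1.5 / np.sqrt(x)])

def bc(ya, yb):
    return np.array([ya[0] - 1.0, yb[0]])        # y(eps) = 1, y(20) = 0 (truncated domain)

eps = 1e-6                                       # keep away from the singularity at x = 0
x = np.linspace(eps, 20.0, 400)
y_guess = np.vstack([np.exp(-x), -np.exp(-x)])   # rough initial guess
sol = solve_bvp(rhs, bc, x, y_guess, max_nodes=100000, tol=1e-8)
print("estimated initial slope y'(0) ~", sol.sol(eps)[1])
```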
There are numerous solution methods—numerical, ana-
KF ¼ ð3p2 nÞ1=3 : ð22Þ lytical and semi-analytical introduced to solve TFDEM.
Parand and Delkhosh [42] used a transformation based on
The total kinetic energy of the electrons:
the on Chebyshev polynomials to convert the TFDEM into
h2 ð3p2 NeÞ5=3 a sequence of linear ODEs by using fractional order
K:E ¼ : ð23Þ
10m1 p2 V 2=3 rational Chebyshev functions of the first kind. The main
objective in [42] was to compute the value of initial slope
The pressure p in terms of number density n is given by: y0 ð0Þ accurately and the reported value up to 37 decimal
places is

123
y'(0) = -1.5880710226113753127186845094239501095.   (33)

In this paper, we employ a deep learning feedforward neural network to approximate the solution to the TFDEM (30)-(31). Let \hat{y} be the neural network approximation to y. We choose (0, 20) as the computational domain, and a dataset is created by discretizing the domain. Then, the training of the network is set up by taking a linear interpolation as the dependent variable. The differential equation loss function L_d for Eq. (30) is given by:

L_d = \dfrac{1}{N_{int}} \sum_{i=1}^{n} \left( \hat{y}''(x_i) - \dfrac{1}{\sqrt{x_i}} \big(\hat{y}(x_i)\big)^{3/2} \right)^2.   (34)

The boundary condition loss function L_b for Eq. (31) is,

L_b = (\hat{y}(0) - y(0))^2 + (\hat{y}(20) - y(20))^2.   (35)

The total mean-squared loss function is given by,

L = L_d + L_b.   (36)

The total loss function is minimized using the AdaMax optimizer and the corresponding neural network predictions are recorded. To illustrate the efficacy of the proposed neural network tool, a comparison with the numerical solution from [42] will be presented in the next subsection.
3.1.1 Network approximation for the TFDE model

We first describe the procedures involved in the network method. The network parameters are initialized randomly using the Glorot normal initializer. The neural network model is trained with 16 hidden neurons activated by the Sigmoid activation function. To evaluate the training process of the neural network model, we use the mean-squared loss function, which is minimized using the AdaMax optimizer. Once the architecture is set up, the next step is to determine the NN prediction \hat{y}'(0) by computing the optimized NN parameters such that the mean-squared loss value is minimum over all the allowable ranges of the network's hyperparameters. The results obtained from the FFNN model are as follows.

In [42], the initial slope of the solution curve is obtained by taking 300 collocation points in the computational domain (0, 20). We set up the network model for the same computational domain by using a sufficient number of training and testing data points along with a good number of epochs. In our attempt, we used a deep learning model with more than one hidden layer. The comparison of the NN solution with the reference solution is shown in Fig. 3.

Table 1 shows the absolute error between \hat{y} (NN output) and y_ref (reference solution). We select 16 hidden neurons and vary the number of hidden layers to record the value of \hat{y}', as shown in Table 2. The initial slope starts at -1.49338443 and starts to decrease, but fluctuates between 3 and 4 layers. We choose 16 hidden neurons because, while testing the \hat{y}' values for different numbers of neurons, we observe that increasing the number of neurons increases the value of the slope (see Table 3). The best value obtained by our model is \hat{y}'(0) = -1.56720482 with 7 hidden layers of 16 neurons each, with the sigmoid activation function applied to each neuron.

3.2 Falkner–Skan flow of a nanofluid past a moving wedge

Here, we consider a wedge (with an angle \beta\pi) immersed in a nanofluid subjected to suction or injection type boundary conditions and with a constant free-stream flow velocity u_e(x). We introduce a coordinate system by taking the horizontal x-axis to be along the wedge surface and the vertical y-axis to be normal to the wedge surface. Let T_\infty and C_\infty denote the ambient temperature and nanoparticle volume fraction, respectively. Also, let T_w and C_w denote the constant temperature and concentration at the wall of the wedge, respectively. The entire physical flow configuration is depicted in Fig. 4.

The fundamental assumptions to obtain the simplified mathematical equations are as follows: the flow is steady; the ambient fluid is viscous and incompressible; and the flow considered is two-dimensional. The mathematical model, which includes the conservation of mass, momentum and thermal energy incorporating the thermophysical properties of the nanofluids, is written in the Cartesian coordinate system as [43]:

\dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial y} = 0,   (37)

u \dfrac{\partial u}{\partial x} + v \dfrac{\partial u}{\partial y} = \nu \dfrac{\partial^2 u}{\partial y^2} + u_e(x) \dfrac{d u_e(x)}{dx},   (38)

u \dfrac{\partial T}{\partial x} + v \dfrac{\partial T}{\partial y} = \alpha \dfrac{\partial^2 T}{\partial y^2} + \tau \left[ D_B \dfrac{\partial C}{\partial y} \dfrac{\partial T}{\partial y} + \dfrac{D_T}{T_\infty} \left( \dfrac{\partial T}{\partial y} \right)^2 \right] + Q (T - T_\infty),   (39)

u \dfrac{\partial C}{\partial x} + v \dfrac{\partial C}{\partial y} = D_B \dfrac{\partial^2 C}{\partial y^2} + \dfrac{D_T}{T_\infty} \dfrac{\partial^2 T}{\partial y^2},   (40)

where u and v are the velocity components in the x and y directions, \nu is the kinematic viscosity of the nanofluid, T is the nanofluid temperature, C is the volume fraction, \alpha is the nanofluid thermal diffusivity, and \tau is the ratio of the heat capacity of the nanoparticle to the heat capacity of the fluid. The term D_B is the Brownian diffusion coefficient, which defines the Brownian motion. The parameter Q is the heat absorption/generation coefficient. The constant D_T is the thermal diffusion coefficient.
The boundary conditions for the flow problem are as follows:

at y = 0: \quad u = 0, \quad v = v_0, \quad T = T_w, \quad C = C_w;   (41)

as y \to \infty: \quad u = u_e(x), \quad v = 0, \quad T = T_\infty, \quad C = C_\infty.   (42)

Our goal in this work is to develop a framework for the numerical solution of the problem described above. To reduce the partial differential equation model (37)-(40) to the corresponding ordinary differential equation system, we first introduce a scalar stream function \psi(x, y) defined as:

u = \dfrac{\partial \psi}{\partial y}, \quad and \quad v = -\dfrac{\partial \psi}{\partial x},   (43)

which satisfies the continuity Eq. (37) automatically. Then, to study the problem under meaningful boundary layer assumptions, we introduce the following dimensionless variables [43-45]:

\psi = \sqrt{\dfrac{2 u_e(x) \nu x}{1 + m}}\, f(\eta),   (44a)

T = T_\infty + (T_w - T_\infty)\, \theta(\eta),   (44b)

C = C_\infty + (C_w - C_\infty)\, \Phi(\eta),   (44c)

where \eta is the similarity variable defined as

\eta = y \sqrt{\dfrac{(1 + m)\, u_e(x)}{2 \nu x}}.   (45)

In the current work, we have assumed the free-stream velocity (also known as the potential flow velocity) as u_e(x) = a x^m, where a is a constant. The exponent m is a wedge angle parameter and is a function of \beta, the Hartree pressure gradient. The relationship between m and \beta is given by:

\beta = \dfrac{2m}{m + 1}, \quad and \quad \beta = \dfrac{\Omega}{\pi}.   (46)

Table 4 gives the description of the physical parameters used in the above model.

Table 1 The absolute error between the neural network output and the reference solution for the Thomas–Fermi differential equation

x     \hat{y}       y_ref       |y_ref - \hat{y}|
0     1             1           0
1     0.426335      0.424008    0.002327
2     0.244121      0.243009    0.001113
3     0.157281      0.156633    0.000649
4     0.108716      0.108404    0.000312
5     0.078848      0.078808    4.00E-05
6     0.05934       0.059423    8.27E-05
7     0.045976      0.046098    0.000122
8     0.036382      0.036587    0.000205
9     0.029171      0.029591    0.00042
10    0.023524      0.024314    0.00079
15    0.007164      0.010805    0.003641
20    1.25E-06      0.005785    0.005784

Table 2 Values of the initial slope obtained by varying the number of hidden layers

Layers   Loss           \hat{y}'(0)
1        2.27 x 10^-2   -1.49338443
2        2.7 x 10^-3    -1.55807343
3        2 x 10^-3      -1.56169785
4        2 x 10^-3      -1.55733242
5        1.2 x 10^-3    -1.5653681
6        2 x 10^-3      -1.56185863
7        2.2 x 10^-4    -1.56720482
8        6.67 x 10^-4   -1.56599946

Table 3 Values of the initial slope obtained by varying the number of hidden neurons

Neurons   Loss     \hat{y}'(0)
16        0.0227   -1.49338443
32        0.0241   -1.48843538
64        0.0236   -1.48969492
128       0.0239   -1.48865737
256       0.0253   -1.48394417

Fig. 3 Comparison of the neural network solution with the reference solution for the Thomas–Fermi differential equation
Fig. 4 The physical flow configuration of the problem

On substituting the above similarity transformation (44) into the mathematical model (38)-(40), we obtain the following third-order nonlinear coupled system of ordinary differential equations:

f''' + f f'' + \beta (1 - f'^2) = 0,   (47a)

\dfrac{1}{Pr} \theta'' + f \theta' + N_B \Phi' \theta' + N_t \theta'^2 + \beta \delta \theta = 0,   (47b)

\Phi'' + Le\, f \Phi' + \dfrac{N_t}{N_B} \theta'' = 0,   (47c)

subjected to the following set of boundary conditions:

f(0) = \dfrac{2}{m + 1} s, \quad f'(0) = 0, \quad \theta(0) = 1, \quad \Phi(0) = 1,   (48a)

f'(\infty) = 1, \quad \theta(\infty) = 0, \quad \Phi(\infty) = 0,   (48b)

where Pr = \nu/\alpha is the Prandtl number, the Lewis number is given as Le = \nu/D_B, the Brownian motion parameter is N_B = (\rho c)_p D_B (\Phi_w - \Phi_\infty) / ((\rho c)_f \nu), N_t = (\rho c)_p D_T (\theta_w - \theta_\infty) / ((\rho c)_f T_\infty \nu) is the thermophoresis parameter, \delta = Q x / u_e(x) is the heat generation/absorption parameter, and s = v_0 \sqrt{(m + 1) x / (2 \nu u_e(x))} is the suction parameter.

The above set of ODEs is solved numerically by employing the Runge–Kutta–Gill procedure in conjunction with the shooting algorithm to satisfy the conditions at the boundary layer edge, as shown in [46]. In the current work, we use a feedforward neural network framework to approximate the solution variables, namely the velocity, temperature and concentration. The boundary value problem (47)-(48), originally defined on the domain [0, \infty), is scaled down to [0, 3] for computational purposes. The goal of the ANN approximation method is to minimize the total loss functions for each variable. To this end, we define the total loss functions for f, \theta and \Phi as shown below.

The total loss function L_f for the variable f is given by,

L_f = \dfrac{1}{N_{int}} \sum_{i=1}^{n} \left( \hat{f}''' + \hat{f} \hat{f}'' + \beta (1 - \hat{f}'^2) \right)^2 + (\hat{f}(0) - f(0))^2 + (\hat{f}'(0) - f'(0))^2 + (\hat{f}'(3) - f'(3))^2.   (49)

The total loss function L_\theta for the variable \theta is given by,

L_\theta = \dfrac{1}{N_{int}} \sum_{i=1}^{n} \left( \dfrac{1}{Pr} \hat{\theta}'' + \hat{f} \hat{\theta}' + N_B \hat{\Phi}' \hat{\theta}' + N_t \hat{\theta}'^2 + \beta \delta \hat{\theta} \right)^2 + (\hat{\theta}(0) - \theta(0))^2 + (\hat{\theta}(3) - \theta(3))^2.   (50)

The total loss function L_\Phi for the variable \Phi is given by,

L_\Phi = \dfrac{1}{N_{int}} \sum_{i=1}^{n} \left( \hat{\Phi}'' + Le\, \hat{f} \hat{\Phi}' + \dfrac{N_t}{N_B} \hat{\theta}'' \right)^2 + (\hat{\Phi}(0) - \Phi(0))^2 + (\hat{\Phi}(3) - \Phi(3))^2.   (51)

The three total loss functions are minimized using the architecture defined in the two Algorithms 1 and 2. The network output is saved and compared with the available numerical solution from the shooting technique.

Table 4 Description of the parameters used in the mathematical model

\nu = \mu/\rho                              Kinematic viscosity of the nanofluid
v_0                                         Velocity of suction or injection at the wall
Q                                           Heat generation/absorption coefficient
D_B = k_B T / (3\pi \mu d_p)                Brownian diffusion coefficient
k_f, k_p, k_B, \mu, d_p                     Thermal conductivity of the fluid, thermal conductivity of the nanoparticle, Boltzmann's constant, nanofluid viscosity and nanoparticle diameter, respectively
D_T = \tilde{\beta} \mu C / \rho            Thermal diffusion coefficient
\tilde{\beta} = 0.26 k_f / (2 k_f + k_p)    Thermophoretic coefficient
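Before turning to the results, the sketch below shows one way the residuals of Eqs. (47a)-(47c) and the loss L_f of Eq. (49) could be evaluated: one small network per unknown (f, \theta, \Phi), with the \eta-derivatives obtained by nested automatic differentiation. The helper names, network sizes and the derivative routine are illustrative assumptions, not the authors' released code; the parameter values follow those quoted in the text (m = 0.0909, s = 0.5, Pr = 6.2, Le = 5, N_t = N_B = 0.1, \delta = 0.2).

```python
import tensorflow as tf

m, s_param = 0.0909, 0.5
beta = 2.0 * m / (m + 1.0)                        # Eq. (46)
Pr, Le, NB, Nt, delta = 6.2, 5.0, 0.1, 0.1, 0.2

def make_net():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(16, activation="sigmoid",
                              kernel_initializer="glorot_normal"),
        tf.keras.layers.Dense(1)])

f_net, theta_net, phi_net = make_net(), make_net(), make_net()
eta = tf.reshape(tf.linspace(0.0, 3.0, 100), (-1, 1))   # 100 uniform points in (0, 3)

def derivatives(net, x, order):
    """Return [net(x), net'(x), ...] up to the requested derivative order."""
    vals = []
    with tf.GradientTape(persistent=True) as tape:       # persistent: reused for each order
        tape.watch(x)
        y = net(x)
        vals.append(y)
        for _ in range(order):
            y = tape.gradient(y, x)                       # next eta-derivative
            vals.append(y)
    del tape
    return vals

f, fp, fpp, fppp = derivatives(f_net, eta, 3)
th, thp, thpp = derivatives(theta_net, eta, 2)
ph, php, phpp = derivatives(phi_net, eta, 2)

res_f = fppp + f * fpp + beta * (1.0 - fp ** 2)                                   # Eq. (47a)
res_th = thpp / Pr + f * thp + NB * php * thp + Nt * thp ** 2 + beta * delta * th  # Eq. (47b)
res_ph = phpp + Le * f * php + (Nt / NB) * thpp                                    # Eq. (47c)

L_f = tf.reduce_mean(tf.square(res_f)) \
      + tf.square(f[0, 0] - 2.0 * s_param / (m + 1.0)) \
      + tf.square(fp[0, 0]) + tf.square(fp[-1, 0] - 1.0)                           # cf. Eq. (49)
print(float(L_f))
```

The losses L_\theta of Eq. (50) and L_\Phi of Eq. (51) would be assembled from res_th and res_ph in the same way, each augmented with its boundary terms and minimized with AdaMax as described in the text.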
Fig. 5 a The NN solution for the profile f(\eta) and b velocity profile of the nanofluid

Fig. 6 a Temperature profile of the nanofluid and b concentration profile of the nanoparticle

Fig. 7 The velocity, temperature and concentration profiles of a nanofluid using a the neural network method and b the Runge–Kutta method

3.2.1 Results and discussion

The training of the neural network model for f and \theta was done for 7k epochs, and for \Phi it was done with 20k epochs. The training and testing data set consists of 100 uniformly spaced points in the computational domain (0, 3). The loss functions L_f, L_\theta and L_\Phi are minimized using the AdaMax optimizer with sigmoid as the activation function applied on 16 hidden neurons. The network weights are initialized using the Glorot normal initializer. All the parameter values used in this problem are taken from [46]. The results obtained from the deep learning network model are shown below.
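A minimal sketch of this training configuration (grid, optimizer and epoch counts as stated above) is given below; the train_step_* names are placeholder helpers in the spirit of the earlier BVP sketch, not functions from the paper.

```python
import tensorflow as tf

eta = tf.reshape(tf.linspace(0.0, 3.0, 100), (-1, 1))    # 100 uniform points in (0, 3)
adamax = tf.keras.optimizers.Adamax(learning_rate=0.001)  # AdaMax with learning rate 0.001

# Assumed driver loops (helpers not shown):
# for epoch in range(7000):   train_step_f(eta, adamax); train_step_theta(eta, adamax)
# for epoch in range(20000):  train_step_phi(eta, adamax)
```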
For f, the model compiles within 15 seconds, recording a best loss value of 1.1 x 10^-4. Figure 5a and b shows the comparison of the neural network solution for the stream function f and the velocity profile for m = 0.0909 and s = 0.5 with the solutions obtained by the Runge–Kutta method. The parametric values chosen to obtain the NN solution for
\theta are m = 0.0909, N_t = N_B = 0.1, \delta = 0.2, Pr = 6.2. The model compiles within 20 seconds, recording a best loss value of 6 x 10^-4. The reference solution for \theta is obtained by the Runge–Kutta method and is compared with the NN solution as shown in Fig. 6a. The parametric values chosen to obtain the NN solution for \Phi are m = 0.0909, N_t = N_B = 0.1, \delta = 0.2, Le = 5. The model compiles within 25 seconds, recording a best loss value of 7 x 10^-2. In Fig. 5a, the NN solution for \Phi is compared with the Runge–Kutta solution.

Figure 7 displays the comparison of the neural network solution for all three profiles (velocity, temperature and concentration) with the standard Runge–Kutta method. The parameter values used are as follows: m = 0.0909, s = 0.5, N_t = N_B = 0.1, \delta = 0.2, Pr = 6.2, Le = 5. Table 5 shows the absolute error of f, \theta and \Phi between the NN solution and the Runge–Kutta solution.

Table 5 The error table of the neural network solution with the reference solution

\eta    f_ref          \hat{f}        |f_ref - \hat{f}|
0       0.916674306    0.916674306    0
0.3     0.97114453     0.971142       2.17E-06
0.6     1.113634692    1.113624       1.10E-05
0.9     1.317738131    1.317732       6.01E-06
1.2     1.562806969    1.56278        2.75E-05
1.5     1.83341886     1.833363       5.60E-05
1.8     2.118838392    2.118779       5.89E-05
2.1     2.412195059    2.412137       5.81E-05
2.4     2.709473054    2.709389       8.36E-05
2.7     3.008534433    3.008414       0.00012
3       3.308343141    3.308343141    0

\eta    \theta_ref     \hat{\theta}   |\theta_ref - \hat{\theta}|
0       1              1              0
0.3     0.290632       0.300304       0.009672
0.6     0.050367       0.049527       0.00084
0.9     0.005261       0.006465       0.001204
1.2     0.000371       0.001539       0.001168
1.5     4.80E-05       0.000185       0.000137
1.8     3.33E-05       -0.00019       0.000219
2.1     3.13E-05       7.90E-05       4.77E-05
2.4     3.00E-05       0.000491       0.000461
2.7     2.88E-05       0.000664       0.000635
3       2.78E-05       0              2.78E-05

\eta    \Phi_ref       \hat{\Phi}     |\Phi_ref - \hat{\Phi}|
0       1              1              0
0.3     0.5202         0.525036       0.004836
0.6     0.180851       0.186026       0.005175
0.9     0.038099       0.036881       0.001218
1.2     0.004912       0.003803       0.001109
1.5     0.000434       -0.00069       0.00112
1.8     6.95E-05       -0.00133       0.001403
2.1     5.11E-05       -0.00119       0.001243
2.4     5.05E-05       -0.00142       0.00147
2.7     5.04E-05       -0.00174       0.001789
3       5.04E-05       0              5.04E-05

4 Conclusions

In this paper, we have presented a deep learning feedforward artificial neural network framework as an approximation tool for the solutions to nonlinear ordinary differential equations. The mean-squared type loss function is shown to be efficient in approximating the solutions. Further, we have done a comparative study between different activation functions, optimizers, numbers of neurons in each hidden layer and, more importantly, numbers of hidden layers. Our overall approach is straightforward, simple, and hence easy to implement. In our approach, the initial/boundary conditions are part of the total loss function; therefore, there is no need to impose them separately as constraints.

Our method has shown some promising results in approximating the solution to the Thomas–Fermi differential equation model. The comparison with available results from the literature is good. A main objective in this problem is to compute the initial slope of the solution curve, i.e., y'(0). Although the computed value of the slope at x = 0 is close to the value reported by other studies, we feel that one can fine-tune the current framework to achieve even higher accuracy. The best value obtained by a deep learning model with 7 hidden layers of 16 neurons in each layer was \hat{y}'(0) = -1.56720482.

For the problem of third-order coupled nonlinear ordinary differential equations, the proposed method has achieved remarkable results, with the reported total loss less than 10^-4 even with a small number of epochs. It is clear from this example that an artificial neural network-based framework offers an attractive numerical tool for the fluid mechanics community instead of classical Runge–Kutta shooting methods or finite difference methods. In the latter methods, one needs to adjust the initial/boundary conditions, create a special grid to accommodate the higher order derivatives, and apply a linearization to take care of the nonlinearity in the differential equations. No special
treatment is needed in the proposed method; a direct minimization of the loss function yields accurate solutions to the higher order differential equations.

In future works, we plan to explore ANN-based tools for other types of nonlinear systems of differential equations, fractional order operators, partial differential equations, and other challenging multiphysics problems. Also, there are many important issues related to the proof of convergence of the ANN method, establishing a relationship between the loss function and errors, and many other theoretical issues that deserve further investigation and attention.

Acknowledgements The authors would like to thank the support of Research and Innovation and the College of Science & Engineering, Texas A&M University-Corpus Christi (TAMUCC) for this research. Also, the authors would like to acknowledge the high performance computing clusters at TAMUCC for providing access to the computing systems. The authors thank the anonymous reviewers for their valuable comments.

Data Availability Data sharing is not applicable as no new datasets were generated during this study.

Declarations

Conflict of Interest The authors declare that they have no conflict of interest.

References

1. Yoon H, Mallikarjunaiah S (2022) A finite element discretization of some boundary value problems for nonlinear strain-limiting elastic bodies. Math Mech Solids 27(2):281-307. https://doi.org/10.1177/10812865211020789
2. Yoon HC, Vasudeva KK, Mallikarjunaiah SM (2022) Finite element model for a coupled thermo-mechanical system in nonlinear strain-limiting thermoelastic body. Commun Nonlinear Sci Numer Simul 108:106262
3. Lee S, Yoon HC, Mallikarjunaiah SM (2022) Finite element simulation of quasi-static tensile fracture in nonlinear strain-limiting solids with the phase-field approach. J Comput Appl Math 399:113715
4. Gou K, Mallikarjuna M, Rajagopal K, Walton J (2015) Modeling fracture in the context of a strain-limiting theory of elasticity: a single plane-strain crack. Int J Eng Sci 88:73-82
5. Muddamallappa MS (2015) On two theories for brittle fracture: modeling and direct numerical simulations. Ph.D. thesis
6. Mallikarjunaiah S, Walton J (2015) On the direct numerical simulation of plane-strain fracture in a class of strain-limiting anisotropic elastic bodies. Int J Fract 192(2):217-232
7. Ferguson LA, Muddamallappa M, Walton JR (2015) Numerical simulation of mode-III fracture incorporating interfacial mechanics. Int J Fract 192(1):47-56
8. Thomas LH (1927) The calculation of atomic fields. In: Mathematical proceedings of the Cambridge philosophical society, vol. 23. Cambridge University Press, pp. 542-548
9. Schrödinger E (1926) An undulatory theory of the mechanics of atoms and molecules. Phys Rev 28(6):1049
10. Falkner V, Skan SW (1931) LXXXV. Solutions of the boundary-layer equations. London, Edinburgh, Dublin Philos Mag J Sci 12(80):865-896
11. Lakshmi KM, Siddheshwar PG, Muddamallappa MS (2020) Study of rotating Bénard-Brinkman convection of Newtonian liquids and nanoliquids in enclosures. Int J Mech Sci 188:105931
12. Muddamallappa MS, Bhatta D, Riahi DN (2009) Numerical investigation on marginal stability and convection with and without magnetic field in a mushy layer. Transp Porous Media 79(2):301-317
13. Bhatta D, Riahi DN, Muddamallappa MS (2012) On nonlinear evolution of convective flow in an active mushy layer. J Eng Math 74(1):73-89
14. Lin S (1976) Oxygen diffusion in a spherical cell with nonlinear oxygen uptake kinetics. J Theor Biol 60(2):449-457
15. Gou K, Muddamallappa MS (2020) An analytic study on nonlinear radius change for hyperelastic tubular organs under volume expansion. Acta Mechanica 231(4):1503-1517
16. Butcher JC (2016) Numerical methods for ordinary differential equations. John Wiley & Sons, UK
17. Grossmann C, Roos H, Stynes M (2007) Numerical treatment of partial differential equations, vol. 154. Springer
18. LeVeque RJ (2007) Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM
19. Lambert J (1991) Numerical methods for ordinary differential systems: the initial value problem. John Wiley & Sons Inc, UK
20. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359-366
21. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303-314
22. Lee H, Kang IS (1990) Neural algorithm for solving differential equations. J Comput Phys 91(1):110-131
23. Meade AJ Jr, Fernandez AA (1994) The numerical solution of linear ordinary differential equations by feedforward neural networks. Math Comput Model 19(12):1-25
24. Logovski AC (1992) Methods for solving of differential equations in neural basis. In: Proceedings 1992 RNNS/IEEE symposium on neuroinformatics and neurocomputers. IEEE, pp. 919-927
25. Lagaris IE, Likas A, Fotiadis DI (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans Neural Netw 9(5):987-1000
26. Ngom M, Marin O (2021) Fourier neural networks as function approximators and differential equation solvers. Stat Anal Data Min: ASA Data Sci J 14(6):647-661
27. Lau LL, Werth D (2020) ODEN: A framework to solve ordinary differential equations using artificial neural networks. arXiv preprint arXiv:2005.14090
28. Rao C, Sun H, Liu Y (2021) Physics-informed deep learning for computational elastodynamics without labeled data. J Eng Mech 147(8):04021043
29. Shi E, Xu C (2021) A comparative investigation of neural networks in solving differential equations. J Algorithms Comput Technol 15:1748302621998605
30. Dockhorn T (2019) A discussion on solving partial differential equations using neural networks. arXiv preprint arXiv:1904.07200
31. Gin CR, Shea DE, Brunton SL, Kutz JN (2021) DeepGreen: deep learning of Green's functions for nonlinear boundary value problems. Sci Rep 11(1):1-14
32. Terra J (2021) Keras vs Tensorflow vs Pytorch: understanding the most popular deep learning frameworks
33. Cebeci T, Bradshaw P (2012) Physical and computational aspects of convective heat transfer. Springer Science & Business Media
34. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, UK
35. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, pp. 249-256
36. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5(4):115-133
37. Nocedal J, Wright SJ (1999) Numerical optimization. Springer, USA
38. Mustapha A, Mohamed L, Ali K (2021) Comparative study of optimization techniques in deep learning: application in the ophthalmology field. J Phys: Conf Ser 1743:012002
39. Rumelhart DE, Durbin R, Golden R, Chauvin Y (1995) Backpropagation: the basic theory. In: Backpropagation: theory, architectures and applications, pp. 1-34
40. Fermi E (1927) Un metodo statistico per la determinazione di alcune proprietà dell'atomo. Rend Accad Naz Lincei 6(602-607):32
41. Jovanovic R, Kais S, Alharbi FH (2014) Spectral method for solving the nonlinear Thomas-Fermi equation based on exponential functions. J Appl Math 2014:168568. https://doi.org/10.1155/2014/168568
42. Parand K, Delkhosh M (2017) Accurate solution of the Thomas-Fermi equation using the fractional order of rational Chebyshev functions. J Comput Appl Math 317:624-642
43. Yacob NA, Ishak A, Pop I (2011) Falkner-Skan problem for a static or moving wedge in nanofluids. Int J Therm Sci 50(2):133-139
44. Mahanthesh B, Mackolil J, Mallikarjunaiah SM (2021) Response surface optimization of heat transfer rate in Falkner-Skan flow of ZnO-EG nanoliquid over a moving wedge: sensitivity analysis. Int Commun Heat Mass Transfer 125:105348
45. Mahabaleshwar US, Vishalakshi AB, Bognar GV, Mallikarjunaiah SM (2022) Effects of thermal radiation on the flow of a Boussinesq couple stress nanofluid over a porous nonlinear stretching sheet. Int J Appl Comput Math 8(4):1-7
46. Kasmani RM, Sivasankaran S, Siri Z (2014) Convective heat transfer of nanofluid past a wedge in the presence of heat generation/absorption with suction/injection. In: AIP conference proceedings, vol. 1605. American Institute of Physics, pp. 506-511

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
