Abstract The article proposes formulating and codifying a set of applied numerical methods, coined Deep Learning Discrete Calculus (DLDC), that uses knowledge from discrete numerical methods to interpret deep learning algorithms through the lens of applied mathematics. The DLDC methods aim to leverage the flexibility and ever-increasing resources of deep learning and the rich literature of numerical analysis to formulate a general class of numerical methods that can directly use data with uncertainty to predict the behavior of an unknown system, as well as elevate the speed and accuracy of the numerical solution of the governing equations for known systems. The article is structured in two major
sections. In the first section, the building blocks of the DLDC methods are presented and deep learning structures anal-
ogous to traditional numerical methods such as finite difference and finite element methods are constructed with a view
to incorporate these techniques in Science, Technology, Engineering, Mathematics (STEM) syllabus for K-12 students.
The second section builds upon the building blocks of the previous discussion, and proposes new solution schemes for
differential and integral equations pertinent to multiscale mechanics. Each section is accompanied by the mathematical formulation of the numerical methods, the analogous DLDC formulation, and suitable examples.
Keywords Numerical Methods · Discrete Calculus · Kernel Learning · Partial Differential Equation · Convolution
1 Introduction
The problems of engineering and physical science can be categorized into three types [1]: type 1 problems have limited physical understanding but a lot of experimental data; type 2 problems have incomplete physical understanding and some experimental data; and type 3 problems are those for which there is sufficient knowledge about the system but solving for the system response is computationally challenging. An example of a type 1 problem is relating the spatio-temporal variation of temperature to the resulting mechanical properties in metal additive manufacturing [2]; a type 2 problem would be calibrating the heat source and other relevant models for a computational fluid dynamics simulation of the metal additive manufacturing process [3]; and a type 3 problem would be relating the microstructure and fatigue life of a 3D printed metallic part [2, 4]. Although the classification is not a strict one, it points to two facts: a) even with numerical modeling, the aid of data is needed at some level, and b) by combining data science and numerical methods, one can cover the spectrum from solving for completely unknown phenomena to computationally challenging problems. Hence, the question naturally arises: is there a way to come up with a new method that gives us the best of both worlds?
Establishing this connection between numerical methods and deep learning is one of the most sought-after prizes in the current literature. The quest started with solving dynamical systems directly using data science and deep learning methods [5,6,7]. The deep learning methods primarily include deep neural networks [8], recurrent neural networks [9], convolutional neural networks [10], and residual neural networks [11]. Other data science techniques, such as unsupervised learning, are also heavily used to analyze the data and extract meaningful features that may or may not have
explicit physical meaning (sometimes called latent variables) [12, 13]. However, it soon became apparent that directly using deep learning methods results in a lack of generalization [14]. Neural networks are essentially highly non-linear interpolation functions whose parameters are trained on observations [15]. Hence, predicting the behaviour of a system outside the training range becomes challenging. Moreover, the neural networks in such methods may become very complex, and it is often impossible to interpret the inner workings of the network in use. This results in a "black box" method that can be used without proper know-how. Finally, these methods are data-hungry, i.e., they require a large amount of data to implement.
1 Theoretical and Applied Mechanics, Northwestern University, 2145 Sheridan Rd, Evanston, 60208, Illinois, USA.
2 Department of Mechanical Engineering, Northwestern University, 2145 Sheridan Rd, Evanston, 60208, Illinois, USA.
3 Department of Mathematics, Princeton University, Washington Road, Princeton, 08540, New Jersey, USA.
† Corresponding Author: Wing Kam Liu, E-mail: [email protected]
Fig. 1: The key features of the Deep Learning Discrete Calculus (DLDC) method.
As a solution to the lack of generalization, combining the knowledge of mechanics with data science has been proposed as Mechanistic Data Science (MDS) [15]. The knowledge of mechanics can be incorporated in the selection of features and the curation of the data used as features, or governing principles such as conservation of energy can be used directly as optimization constraints. The latter approach is also termed the Physics Informed Neural Network (PINN) [16, 17, 18]. These methods have become extremely popular as they are well suited to solving type 3 problems [19, 20, 21]. Moreover, these methods partially alleviate the requirement for large datasets. However, neither of these approaches solves the problem of interpretability, as the neural network is still a black box. Finally, for type 1 problems (where only data are available), the idea of adding a governing principle does not work.
Data science techniques to discover the underlying governing equation from data have been proposed [22, 23, 24].
Another example is Dimension-Net which discovers the underlying non-dimensional numbers directly from data [25,
26]. While these techniques are exciting, one needs some prior idea about the system behavior to use them, as they are primarily regression techniques working on a library of candidate functions. Therefore, there is still a need for a data science framework that can directly use experimental data to solve for the dynamics of a system while remaining interpretable.
Efforts towards merging numerical methods with deep learning have gained increasing attention in the last few years. The initial efforts were to establish parallels between a subset of numerical methods and a subset of deep learning techniques [27,28]: for example, how a recurrent neural network behaves like an ordinary differential equation [29] or the wave equation [30], or how some finite difference techniques or multi-grid methods have parallels with convolutional neural networks [31]. While these works are important as an interpretation of deep learning techniques, they do not suggest how to improve numerical solvers using data science, or vice-versa. One of the early efforts to leverage deep learning to improve existing numerical methods is the Hierarchical Deep Learning Finite Element Method (HiDeNN-FEM) [28], where custom neural networks are used to obtain different linear and non-linear interpolation shape functions. A recent breakthrough in interpretable, general deep learning methods for solving partial differential equations came in the form of operator learning [32]. Two major methods were proposed concurrently for operator learning, called the Fourier Neural Operator (FNO) [33] and DeepONet [34], and there have been studies comparing these two techniques [35]. Keeping the similarities and differences aside, the philosophy of both methods is the same: mapping functions from an input space to an output space instead of mapping data. Based on the same philosophy, several other structures have been proposed, including the graph kernel network and the non-local kernel network [36,37].
Aside from solving challenging problems in scientific and engineering research, contemporary deep learning meth-
ods have a huge pedagogical potential as well. In order to develop a competent workforce, big tech companies and
entrepreneurs are now focusing on integrating artificial intelligence (AI) in high school curriculum [38]. There is in-
creasing need in industry for a skilled workforce rather than only outstanding academic achievers. Naturally, the question arises of how to train young students in AI while keeping the curriculum interesting for them. If AI is introduced in the traditional way, through computer science instruction, diversity in the student body becomes difficult to attain; depending on their socioeconomic background, direct training in machine learning algorithms may be out of reach for many students. An untested introduction of AI for middle and high school students can therefore create another bias. On the other end of the spectrum, the way mathematics, and especially calculus, is taught in high schools can be too abstract for some students to grasp. Young students are naturally more interested in learning a concept when they have a clear idea of how to implement it. Usually, calculus is taught in the school curriculum through textbook examples that often leave students disinterested in the topic [39]. This lack of interest results in fewer students choosing Science, Technology, Engineering, and Mathematics (STEM) majors. There has been a great deal of research [40,41,42] on how to incorporate data science and AI into high school education. However, most of these studies remain at a strategic level and do not provide a clear pathway to merge the two broad fields of calculus and AI. It is the authors' opinion that introducing AI through deep learning, and interpreting deep learning via elementary calculus, is the most effective way to teach calculus at the high school level. This has a two-fold benefit: first, young students can implement calculus through advanced computational tools with real-life data, which will increase their interest in and understanding of basic calculus; second, the interpretability of deep learning algorithms improves when the algorithms are taught from first principles.
With this broad scope in mind, in this article the authors propose an applied numerical method, called Deep Learning Discrete Calculus (DLDC), that aims to combine mechanistic knowledge, classical numerical methods, and deep learning algorithms (see Figure 1). A formal definition of Deep Learning Discrete Calculus can be given as:
Deep Learning Discrete Calculus is a discipline that integrates the fundamental definitions of calculus with numerical methods using deep learning neural networks.
Deep learning is an extremely powerful tool for feature extraction and prediction, but it requires massive and
cleansed data sets to perform well. For many engineering applications, the collection of so much data is prohibitively ex-
pensive or even impossible, making traditional deep learning approaches nonviable. Incorporating discrete calculus and
numerical method tools into neural network architectures can empower deep learning to address engineering problems
where datasets may be small and noisy. Additionally, numerical method-informed deep learning frameworks sometimes
have the potential to extrapolate beyond available observations, which is something traditional deep learning imple-
mentations are infamously incapable of. The synthesis of numerical analysis ideas with machine learning can reduce
the expenses of collecting data, pre-processing data, and training models, while improving the interpretability, accu-
racy, and generalizability of deep neural networks. Figure 1 shows the key features of the DLDC method. The DLDC
aims to become a general method for solving the three different types of problems discussed before. The article has
two major parts. The first part aims to introduce the concept of DLDC through building blocks of calculus, such as,
differential and integral calculus for STEM education. In later sections, the article discusses how to apply the DLDC
methods to solve differential and integral equations, and how it has advantages over the traditional numerical methods.
Each demonstration is accompanied by an example to further reinforce the ideas presented in the article.
Fig. 2: A neural network model with input, hidden, and output layers.
In this section, the basic mathematical foundations of the DLDC methods are presented. This section will be particularly useful for implementation in STEM education. Before going into too much detail, a brief discussion of neural networks is required; for an extensive discussion, readers can refer to [15]. A neural network has three types of layers: input, hidden, and output. The input variables of interest go into the input layer, and the output variables are obtained through the output layer. The hidden layer(s) take the input variables and pass them through a non-linear function, called the activation function, with optimization parameters. Mathematically, if x is the input variable, A is the activation function, and y is the output variable, the equation for a neural network can be written as,
y = A(W x + b) (1)
where W is the weight, and b is the bias. By varying the weight and bias to minimize the error between predicted output
and training data, one can achieve an approximation of any linear or non-linear functional relationship. A schematic of a neural network with one input, one hidden, and one output layer is shown in Figure 2. The convention of super- and sub-scripts for weights and biases is also shown in Figure 2. Following this convention, the
output can be written as a function of the trainable weights, biases, and given input as
y = A\left( W^{2,3}_{1,1}\, A(W^{1,2}_{1,1} x + b^2_1) + W^{2,3}_{2,1}\, A(W^{1,2}_{1,2} x + b^2_2) \right) \quad (2)
The trainable W and b parameters are varied to minimize the following loss function:
\text{Loss function} = \frac{1}{P} \sum_{n=1}^{P} \left( y_n - y_n^* \right)^2 \quad (3)
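To make the notation concrete, a minimal sketch of the forward pass of Eq. 2 and the loss of Eq. 3 is given below. The activation function, the weight and bias values, and the training pairs are illustrative assumptions, not quantities taken from the article.

```python
import numpy as np

# Minimal sketch of the two-hidden-neuron network of Eq. 2 and the loss of Eq. 3.
# The activation A, the weight/bias values, and the training pairs below are
# illustrative assumptions, not values from the article.

A = np.tanh  # assumed activation function

# Weights follow the W^{l-1,l}_{i,j} convention: layer 1 -> 2 (hidden), layer 2 -> 3 (output)
W12 = np.array([0.5, -0.3])   # W^{1,2}_{1,1}, W^{1,2}_{1,2}
b2  = np.array([0.1, 0.2])    # b^2_1, b^2_2
W23 = np.array([1.2, -0.7])   # W^{2,3}_{1,1}, W^{2,3}_{2,1}

def forward(x):
    """Eq. 2: y = A( W^{2,3}_{1,1} A(W^{1,2}_{1,1} x + b^2_1) + W^{2,3}_{2,1} A(W^{1,2}_{1,2} x + b^2_2) )."""
    hidden = A(W12 * x + b2)          # two hidden neurons
    return A(np.dot(W23, hidden))     # single output neuron

def loss(x_data, y_star):
    """Eq. 3: mean squared error over P training samples."""
    y_pred = np.array([forward(x) for x in x_data])
    return np.mean((y_pred - y_star) ** 2)

# Example usage with made-up training data
x_data = np.linspace(0.0, 1.0, 5)
y_star = np.sin(x_data)
print(loss(x_data, y_star))
```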
Based on this basic definition of neural network, we proceed to show how to build a neural network from scratch to
mimic the finite difference methods. Let us consider a function f(x) as shown in Figure 3(a). If we want an approximation of the first derivative of f(x) at point A, the forward difference method gives [43],
\frac{dy}{dx} = \frac{f(x_{j+1}) - f(x_j)}{x_{j+1} - x_j} \quad (4)
Here, the index j indicates point A, and j + 1 indicates the next point on the function. The central difference method
gives,
\frac{dy}{dx} = \frac{f(x_{j+1}) - f(x_{j-1})}{x_{j+1} - x_{j-1}} \quad (5)
A higher order approximation of the first derivative involving three points looks like,
\frac{dy}{dx} = \frac{-3 f(x_j) + 4 f(x_{j+1}) - f(x_{j+2})}{x_{j+2} - x_j} \quad (6)
Therefore, from observation, we can simply write the general form of these equations as,
\frac{dy}{dx} = \sum_{j=1}^{n} w_j\, f(x_j) \quad (7)
Fig. 3: (a) A schematic diagram showing a continuous function f (x) and discrete sampling data, (b) The DLDC structure
for the first derivative (forward difference method).
This form of numerical differentiation is famously known as the differential quadrature method in numerical analysis [44,45,46]. If we compare the form of Eqn. 7 with Eqn. 1, a clear parallel can be observed: the finite difference formulas can be obtained using linear activation functions with zero bias. Inspired by this observation, a first-order derivative
equivalent neural network is proposed in Figure 3(b). In the figure, the weights and biases are expressed in index notation for convenience. In W^{l-1,l}_{i,j}, the subscripts i and j denote the incoming and outgoing neurons, respectively, and the superscripts identify the layers: l - 1 is the incoming layer and l is the outgoing layer. The network has one hidden layer, linear activation functions, and zero biases. The weights between the input and hidden layers are constrained to be 1, and only the weights between the hidden and output layers (marked red) are optimized. The cost function can be written as,
\frac{1}{N} \sum_{i=1}^{N} \left( \frac{dy^{\text{true}}}{dx} - \frac{dy^{\text{prediction}}}{dx} \right)_i^2 \quad (8)
Here, N is the sample size. It is interesting to note that after optimization, the trainable weights become W^{2,3}_{1,1} = \frac{1}{x_{j+1} - x_j} and W^{2,3}_{2,1} = \frac{-1}{x_{j+1} - x_j}. These values reproduce the forward difference formula. If a neural network like this is trained once against data generated by a known function and its derivative, the trained network can be used to predict the first derivative of any function, provided the sampling remains the same.
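A minimal sketch of this construction is given below: a single linear layer with two inputs, f(x_j) and f(x_{j+1}), no bias, and a linear output is trained against exact derivative data. The training function (a sine on a uniform grid) and the optimizer settings are assumptions; after training, the two weights should approach -1/Δx and 1/Δx, reproducing the forward difference formula.

```python
import torch

# Minimal sketch of the forward-difference DLDC network: a linear layer with two
# inputs, f(x_j) and f(x_{j+1}), no bias, trained against exact derivative data.
# The training function (sine), grid spacing, and optimizer settings are assumptions.

dx = 0.01
x = torch.arange(0.0, 2 * torch.pi, dx)
f = torch.sin(x)                      # known training function
dfdx_true = torch.cos(x)              # its exact derivative

# Input pairs [f(x_j), f(x_{j+1})] and target derivative at x_j
inputs  = torch.stack([f[:-1], f[1:]], dim=1)
targets = dfdx_true[:-1].unsqueeze(1)

model = torch.nn.Linear(2, 1, bias=False)   # two trainable weights, zero bias
optimizer = torch.optim.Adam(model.parameters(), lr=0.2)
loss_fn = torch.nn.MSELoss()

for epoch in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

# The learned weights should approach [-1/dx, 1/dx], reproducing the forward difference.
print(model.weight.detach().numpy(), 1.0 / dx)
```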
This concept has been applied to formulate the first-order derivative of different types of known functions. Some of the results are presented in Figure 4. In the case of Figure 4(a), the neural network is trained on a simple sinusoidal function over the range 0-360 degrees. The trained network is then used to predict the first-order derivative of a cosine function, which it does with very good accuracy. In the next example (see Figure 4(b)), a simple polynomial was used to generate training data for the neural network. Once the network is trained, the same network is applied to predict the derivative of a higher-order polynomial (Figure 4(c)) and a relatively complicated function (Figure 4(d)).
Fig. 4: The DLDC results for first-order derivative. (a) Performance for predicting the first derivative of cosine function.
The training was done with sine function. (b) Training performance for simple polynomial, and (c), (d) performance of
the network to predict derivatives of more complicated functions.
The next building block is to solve an integration problem. Numerically, an integral of a function with respect to a
variable is an estimation of area under the curve representing that function when plotted against the said variable. Based
on this simple concept, numerical methods such as the trapezoidal or Simpson's rule are in use. However, engineers are mostly interested in solving time-dependent ordinary or partial differential equations. The first method that comes to mind for such computation is Euler's method. Suppose we want to solve an ordinary differential equation,
\frac{dy}{dt} = f(y, t) \quad (9)
where y is the dependent variable and t is time. A DLDC-equivalent structure for the explicit Euler method and the associated prediction is shown in Figure 5. The generalized α-form of the Euler method is:
y_{n+1} = y_n + \Delta t \left( [1 - \alpha]\, f_n + [\alpha]\, f_{n+1} \right), \quad (10)
where n is the time step, \Delta t is the step size, and \alpha is a factor that indicates an explicit method if its value is 0 and an implicit method if its value is 1. The structure of the DLDC for Euler integration is exactly the same as the forward difference
method, except the inputs are the values of the integrand and dependent variable at the previous step and the output is
the value of the dependent variable in the next time step. In the neural network shown in Fig. 5 (a), the biases are set to
zero and the activation functions are linear. The black weights are fixed to a value of 1, and only the red weights are allowed to train. The values of the red-marked weights in the figure are their final values after training. It turns out that after training, the weights converge to values that make the output mimic Eq. 10.
Fig. 5: (a) The DLDC structure for the explicit Euler formulation, (b) A comparison between the analytical and the
DLDC prediction for the time integration.
The numerical prediction in the example was made by solving dy/dt = 15t^2 + 8t with the initial condition y = 0. The predicted and analytical solutions diverge slightly near 1 second, but this error comes from the Euler method itself, as the weights of the trained DLDC are exactly the same as those of the explicit Euler method. A similar network can be constructed for the implicit Euler method as well.
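A minimal sketch of the Euler-type DLDC structure is shown below. The weight on y_n is fixed to 1, the single trainable weight multiplies f_n, and after training on data generated from an assumed ODE the weight should approach the step size Δt, recovering the explicit Euler update of Eq. 10 with α = 0. The right-hand side f(y, t) and the optimizer settings are assumptions.

```python
import torch

# Minimal sketch of the explicit-Euler DLDC structure: y_{n+1} = 1*y_n + w*f_n with
# one trainable weight w. Training data come from an assumed ODE dy/dt = f(y, t);
# after training, w should approach the step size dt.

dt = 0.01
t = torch.arange(0.0, 1.0, dt)

def f(y, t):
    return -2.0 * y + torch.sin(t)     # assumed right-hand side for illustration

# Generate a reference trajectory with the explicit Euler method itself
y = torch.zeros_like(t)
y[0] = 1.0
for n in range(len(t) - 1):
    y[n + 1] = y[n] + dt * f(y[n], t[n])

w = torch.tensor(0.5, requires_grad=True)          # trainable weight (plays the role of dt)
optimizer = torch.optim.Adam([w], lr=1e-3)

for epoch in range(3000):
    optimizer.zero_grad()
    y_pred = y[:-1] + w * f(y[:-1], t[:-1])        # network output for the next step
    loss = torch.mean((y_pred - y[1:]) ** 2)
    loss.backward()
    optimizer.step()

print(float(w), dt)    # the trained weight should be close to dt
```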
Fig. 6: The DLDC structure for 2-point Gauss quadrature method. The red marked terms are to be optimized during
training.
The next attempt was to construct a DLDC method for the Gauss quadrature integration method [43]. In brief, the Gauss quadrature method performs integration by converting the integral to one with fixed limits between -1 and 1. The equations are,
\int_a^b f(x)\,dx = \int_{-1}^{1} f\!\left( \frac{b-a}{2}\,\xi + \frac{a+b}{2} \right) \frac{dx}{d\xi}\, d\xi, \quad (11)
\int_{-1}^{1} g(x)\,dx = \sum_{i=1}^{n} w_i\, g(x_i). \quad (12)
Here, n is the number of quadrature points, w_i are the weights assigned to each point, and g(x_i) is the value of the converted function at each quadrature point. The DLDC structure for the 2-point Gauss quadrature is shown in Figure 6. To train the network, a third-order polynomial g(x, a) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 is assumed, and the corresponding integral is \int_{-1}^{1} g(x, a)\,dx = 2a_0 + \frac{2}{3} a_2. The training data are generated for different values of a and the corresponding integrals. In the DLDC neural network, the activation function is a rectified linear unit between the input and first hidden layer and at the last layer, and linear for the second hidden layer. If we train the neural network with respect to x_1, x_2, w^{34}_{11}, w^{34}_{21} with the following cost function,
\frac{1}{N} \sum_{n=1}^{N} \left( \sigma^4_1(x_1, x_2, w^{34}_{11}, w^{34}_{21}; a^n) - \int_{-1}^{1} g(x, a^n)\,dx \right)^2, \quad (13)
where
\sigma^4_1(x_1, x_2, w^{34}_{11}, w^{34}_{21}; a^n) = \sum_{j=1}^{N_{i-1}} w^{i-1,i}_{jk}\, \sigma^{i-1}_{j} + b^{i}_{k} = c_1\, g(x_1; a^n) + c_2\, g(x_2; a^n), \quad (14)
After the training, the quadrature coordinates x_1, x_2 and the weights c_1 and c_2 take exactly the same values as in the theoretical Gauss quadrature method. The structure was extended to 3- and 4-point rules, and in all cases the trained values of the coordinates and weights match the analytical values well (see Figure 7). Just like the Euler method and the finite difference approximation, the Gauss quadrature network can be trained once on a known integral and then used for any other integral without retraining.
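A sketch of the 2-point case is given below: the quadrature coordinates x_1, x_2 and the weights c_1, c_2 are all trainable, and the training data are random cubic polynomials together with their exact integrals 2a_0 + (2/3)a_2, following the construction above. After training, the parameters should approach the analytical values x_{1,2} = ∓1/√3 and c_1 = c_2 = 1. The initial guesses and optimizer settings are assumptions.

```python
import torch

# Minimal sketch of the 2-point Gauss quadrature DLDC network. The quadrature
# points x1, x2 and weights c1, c2 are trainable; training data are random cubic
# polynomials g(x; a) = a0 + a1 x + a2 x^2 + a3 x^3 with exact integral 2 a0 + (2/3) a2.
# Initial guesses and optimizer settings are assumptions.

torch.manual_seed(0)
a = torch.rand(1000, 4) * 2.0 - 1.0                    # random coefficients a0..a3
exact = 2.0 * a[:, 0] + (2.0 / 3.0) * a[:, 2]          # exact integral over [-1, 1]

params = torch.tensor([-0.5, 0.5, 0.8, 0.8], requires_grad=True)   # [x1, x2, c1, c2]
optimizer = torch.optim.Adam([params], lr=0.01)

def g(x, a):
    return a[:, 0] + a[:, 1] * x + a[:, 2] * x**2 + a[:, 3] * x**3

for epoch in range(5000):
    optimizer.zero_grad()
    x1, x2, c1, c2 = params[0], params[1], params[2], params[3]
    pred = c1 * g(x1, a) + c2 * g(x2, a)               # Eq. 14: c1 g(x1; a) + c2 g(x2; a)
    loss = torch.mean((pred - exact) ** 2)             # Eq. 13
    loss.backward()
    optimizer.step()

# Trained values should approach x = -1/sqrt(3), +1/sqrt(3) and weights 1, 1.
print(params.detach().numpy(), 1.0 / 3**0.5)
```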
Fig. 7: A table showing the comparison between different quadrature methods and optimized values for the DLDC.
This section builds upon the ideas of the DLDC formulation for STEM education and proposes discrete calculus-inspired
deep learning methods to solve partial differential equations and integral equations which are at the core of applied
mechanics.
In this section, we use DLDC to formulate a new interpolation theory based on convolution, called Convolution Hierar-
chical Deep-learning Neural Network (C-HiDeNN). C-HiDeNN interpolants are smoother and more accurate than those
of FEM. They are designed to have arbitrary reproducing polynomial orders, resulting in superior convergence behavior
without using higher order elements.
To understand C-HiDeNN, we should review HiDeNN-FEM first [1, 28]. HiDeNN-FEM is a neural network rep-
resentation of FEM. HiDeNN-FEM shape functions are small neural networks that are designed to be mathematically
equivalent to FEM shape functions. Therefore, the weight and bias of HiDeNN-FEM shape functions are represented as
a function of nodal coordinates and nodal field variables. The element-wise shape functions are then gathered to form a
global neural network and finally, a loss function based on the principle of minimum potential energy is minimized.
In HiDeNN-FEM, the solution is obtained by optimizing nodal coordinates and nodal field variables with respect to the potential energy loss function. There are two solution schemes for HiDeNN-FEM. First, the nodal coordinates are kept fixed and only the nodal variables are updated. This makes HiDeNN-FEM mathematically equivalent to FEM, so the solution accuracy and computation time of HiDeNN-FEM are of the same order as FEM. Second, both the nodal
coordinates and nodal variables are updated, thus making it equivalent to r-adaptive FEM. For detailed discussions,
readers may refer to [28].
Fig. 8: C-HiDeNN formulation for 1D Poisson problem. Light blue and orange terms are weights and biases of the neural
network, respectively. If there is no weight or bias assigned for a neuron, it will have fixed negligible weight=1 and
bias=0. Functions inside neurons with blue edges represent activation functions while those with black edges represent
inputs (green color) and outputs (white color) of the neuron. (a) shows nodal coordinates in both physical and parametric
space, focusing on the element of interest eI . Nodal patch domains and other terminologies are defined below. (b)
represents the C-HiDeNN shape function of element eI , which constitutes the hierarchical DeNN layer of the global
neural network (d). (c) is the convolution patch function that can be found in the green dotted box in (b). This figure is
taken from [47].
Since HiDeNN-FEM is still based on finite element interpolations, the improvement of global solution accuracy is
insignificant even with the automatic r-adaptivity. This prompted the idea of convolution to answer the question: "Can we incorporate convolution filters into HiDeNN-FEM shape functions to achieve highly smooth and accurate interpolants?" This idea is written as (in 1D):
u^{h,e}(\xi) = \sum_{i \in A^e} N_i(\xi) \sum_{j \in A_i^s} W^{x_i}_{a,p,j}\!\left( x^{h,e}(\xi) \right) u_j = \sum_{k \in A_e^s} \tilde{N}_k(\xi)\, u_k, \quad (15)
as illustrated in Fig. 8(a), where the integer-valued patch size s refers to the number of element layers surrounding node i. That is, A^{s=2}_{i=x_I} contains node x_{I-2} through node x_{I+2}, while A^{s=2}_{i=x_{I+1}} contains node x_{I-1} through node x_{I+3}.
The general polynomial interpolants N_i(\xi) can be thought of as any finite element shape functions that are compactly supported and satisfy the Kronecker delta and partition of unity properties. The convolution patch functions W^{x_i}_{a,p,j}(x^{h,e}(\xi)) are also compactly supported interpolation functions that satisfy the Kronecker delta and reproducing conditions. Here,
the dilation parameter a determines the size of support domain and p is the reproducing polynomial order. For the convo-
lution patch functions, we borrow from well-developed meshfree techniques. In this study, the radial point interpolation
method is adopted because this method returns stable interpolants that satisfy Kronecker delta property and reproducing
conditions. For details, readers may refer to [48,49].
Finally, the double summation in Eq. 15 is combined into a single summation over the elemental patch domain A_e^s = \cup_{i \in A^e} A_i^s, and the resulting convolution interpolants are \tilde{N}_k(\xi) with k \in A_e^s. The convolution interpolants therefore satisfy compact support, partition of unity, Kronecker delta, and reproducing conditions, making it easy to apply
boundary conditions.
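To illustrate the structure of Eq. 15 in one dimension, the sketch below enriches linear finite element shape functions N_i(ξ) with patch functions W_j defined over the nodal patches, and verifies partition of unity and linear reproduction of the resulting interpolants Ñ_k. For simplicity, Lagrange interpolants over the patch nodes are used as patch functions here (they satisfy the Kronecker delta and reproducing conditions), whereas the article adopts the radial point interpolation method; the mesh and patch size are assumptions.

```python
import numpy as np

# 1D sketch of the convolution interpolant of Eq. 15:
#   u^h(xi) = sum_i N_i(xi) * sum_{j in patch(i)} W_j^{x_i}(x(xi)) * u_j
# Linear FE shape functions N_i are combined with patch functions W_j. Here the
# patch functions are Lagrange interpolants over the patch nodes (Kronecker delta
# and polynomial reproduction hold); the article uses radial point interpolation.

nodes = np.linspace(0.0, 1.0, 11)          # assumed uniform mesh
s = 1                                       # patch size: s element layers around a node

def lagrange_patch(x, patch_nodes):
    """Lagrange basis over the patch nodes evaluated at physical point x."""
    W = np.ones(len(patch_nodes))
    for j, xj in enumerate(patch_nodes):
        for m, xm in enumerate(patch_nodes):
            if m != j:
                W[j] *= (x - xm) / (xj - xm)
    return W

def interpolate(xi, elem, u):
    """Evaluate u^h at parametric coordinate xi in [-1, 1] of element `elem`."""
    xL, xR = nodes[elem], nodes[elem + 1]
    N = np.array([0.5 * (1 - xi), 0.5 * (1 + xi)])        # linear FE shape functions
    x = N[0] * xL + N[1] * xR                              # physical coordinate x(xi)
    uh = 0.0
    for a, i in enumerate((elem, elem + 1)):               # loop over element nodes i
        lo, hi = max(i - s, 0), min(i + s, len(nodes) - 1)
        patch = np.arange(lo, hi + 1)                      # nodal patch A_i^s
        W = lagrange_patch(x, nodes[patch])                # patch functions W_j^{x_i}(x)
        uh += N[a] * np.dot(W, u[patch])
    return uh

# Checks: partition of unity (u = 1) and linear reproduction (u = x)
xi = 0.37
print(interpolate(xi, 4, np.ones_like(nodes)))                 # ~1.0
x_phys = 0.5 * (1 - xi) * nodes[4] + 0.5 * (1 + xi) * nodes[5]
print(interpolate(xi, 4, nodes.copy()), x_phys)                # both equal
```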
In terms of DLDC, the C-HiDeNN shape function can be written as a neural network illustrated in Fig.8(b). The
first two hidden layers refer to the finite element shape functions, and the third hidden layer is for the convolution patch
functions. Note that the number of neurons is determined by the number of nodes in nodal patch domains. That is, larger
patch size s leads to larger number of neurons or higher connectivity. This is analogous to the convolutional neural
network (CNN) kernels.
In C-HiDeNN, a higher reproducing polynomial order p is achieved by setting a larger patch size s. Based on our preliminary study, s ≥ p is required to obtain stable convolution patch functions. In other words, C-HiDeNN can achieve higher order p by increasing the connectivity of nodes while using the same linear elements. This is completely different from FEM,
where higher order elements must be used to build higher order shape functions. Therefore, the global degrees of freedom
(DOFs) of C-HiDeNN are the same as linear FEM, but the bandwidth of the global stiffness matrix becomes larger due
to increased nodal connectivity.
The weights of the third hidden layer in Fig. 8(b) are built from another sub-neural network shown in Fig. 8(c). This
follows the radial point interpolation theory and details are discussed in [47]. Finally, the elementwise shape functions
are assembled to form a global neural network and potential energy loss function shown in Fig.8.
To demonstrate the superior accuracy of C-HiDeNN compared to FEM, a 2-D Poisson problem has been solved:
\nabla \cdot (\nabla u(x)) + b_f(x) = 0 \ \text{in } \Omega, \qquad u = 0 \ \text{on } \Gamma \quad (17)
where b_f(x) is the body force, and a square domain \Omega whose lower left corner is at (0, 0) and upper right corner is at (10, 10) is used. \Gamma is the boundary of the domain \Omega. We set the analytical field variable as:
u(x) = \frac{1}{625} (x^2 - 10x)(y^2 - 10y) \left[ 2 e^{-2((x-3)^2 + (y-3)^2)} + e^{-2((x-7)^2 + (y-7)^2)} \right]. \quad (18)
L2 and H1 error estimators are used:
\|e\|_{L_2} = \|u - u^h\|_{L_2} = \frac{\left( \int_\Omega (u - u^h)^2\, dx \right)^{1/2}}{\left( \int_\Omega u^2\, dx \right)^{1/2}}
\|e\|_{H_1} = \|u - u^h\|_{H_1} = \frac{\left( \int_\Omega (u - u^h)^2\, dx + \int_\Omega \|\nabla u - \nabla u^h\|_2^2\, dx \right)^{1/2}}{\left( \int_\Omega u^2\, dx + \int_\Omega \|\nabla u\|_2^2\, dx \right)^{1/2}} \quad (19)
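For reference, the relative L2 and H1 error norms of Eq. 19 can be evaluated numerically on a grid, as sketched below using the manufactured solution of Eq. 18; the perturbed field standing in for the numerical solution u^h, the grid, and the quadrature rule are assumptions for illustration.

```python
import numpy as np

# Numerical evaluation of the relative L2 and H1 error norms of Eq. 19 on a grid.
# The "numerical" field uh below is a hypothetical perturbation of the exact field
# of Eq. 18, used only to exercise the error estimators.

n = 201
x = np.linspace(0.0, 10.0, n)
y = np.linspace(0.0, 10.0, n)
X, Y = np.meshgrid(x, y, indexing="ij")

u = (1.0 / 625.0) * (X**2 - 10*X) * (Y**2 - 10*Y) * (
        2*np.exp(-2*((X-3)**2 + (Y-3)**2)) + np.exp(-2*((X-7)**2 + (Y-7)**2)))
uh = u + 1e-3 * np.sin(np.pi * X / 10) * np.sin(np.pi * Y / 10)   # assumed u^h

def integrate(field):
    """Composite trapezoidal rule over the square domain."""
    dx, dy = x[1] - x[0], y[1] - y[0]
    wx = np.full(n, dx); wx[0] = wx[-1] = dx / 2
    wy = np.full(n, dy); wy[0] = wy[-1] = dy / 2
    return np.einsum("i,j,ij->", wx, wy, field)

def grad(field):
    gx, gy = np.gradient(field, x, y, edge_order=2)
    return gx, gy

e = u - uh
ex, ey = grad(e)
ux, uy = grad(u)

L2 = np.sqrt(integrate(e**2)) / np.sqrt(integrate(u**2))
H1 = np.sqrt(integrate(e**2) + integrate(ex**2 + ey**2)) / \
     np.sqrt(integrate(u**2) + integrate(ux**2 + uy**2))
print(L2, H1)
```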
Fig.9 shows C-HiDeNN-FEM can have arbitrary convergence rates depending on the reproducing polynomial order
p. That is, for the same p, the convergence rates (slope of Fig.9(a,c)) of FEM and C-HiDeNN are asymptotically
the same. However, their y-intercepts are different. When p = 1, C-HiDeNN is around two orders of magnitude more
accurate than FEM for a given mesh size. The gap decreases as p goes up, and FEM overtakes C-HiDeNN when p = 3 (i.e., the blue dotted curves are lower than the blue solid curves in Fig. 9(a,c)). However, this is not a fair comparison, as higher order FEM uses higher order elements that have more DOFs, while C-HiDeNN still uses linear elements.
Fig. 9: Convergence plot. (a, b) are for L2 norm error and (c, d) are for H1 norm error estimation. (a, c) are the error vs. mesh size plots and (b, d) are the error vs. degrees of freedom plots. Numbers on the graph are the convergence rates (italic font for C-HiDeNN). For C-HiDeNN, patch size s = 3 and dilation parameter a = 30. The graphs are colored by the reproducing polynomial order p. FEM uses dashed lines and C-HiDeNN uses solid lines.
The errors are plotted with respect to DOFs in Fig.9(b,d). These two plots show that C-HiDeNN curves are always
lower than those of FEM for the same p. That is, for the same DOFs, C-HiDeNN is always more accurate than FEM. In
other words, to achieve the same level of error, FEM needs more DOFs than C-HiDeNN for the same p. When p = 3,
for example, FEM requires 5 to 10 times more DOFs than C-HiDeNN-FEM for a given accuracy (see Fig.9(b,d)).
It is important to note that the convolution operations over the local patch domains are the same as those performed in
convolutional neural networks (CNNs). The larger the patch size s, the higher the neuron connectivity. The dilation parameter
a acts as a convolution operator, dictating the feature extraction of the data. Thus, many of the well-developed high-
performance computing algorithms utilizing parallel architectures (such as GPU and TPU) in CNN can be integrated
with C-HiDeNN. [47] discusses in detail on how the parallel programming can accelerate C-HiDeNN computation and
demonstrates that it can be as fast as commercial FEM software running on CPU.
This section proposes a DLDC strategy for solving partial differential equations based on a space-time finite element
formulation [50,51,52]. Space-time finite element methods are numerical discretization schemes to predict the spa-
tiotemporal responses of dynamic systems. When used to approximate solutions to partial differential equations, they
require assumed values of the underlying governing parameters. For structural problems, these parameters may corre-
spond to material properties such as density, stiffness, and damping capacity. For many applications these assumptions
are reasonable, but for many others the distribution of properties throughout a material domain is unknown. For the
latter, experimental observation is required for the accurate calibration of model parameters. Consider an elastic one-
dimensional bar governed by the following equation.
EA \frac{\partial^2 u}{\partial x^2} - \rho A \frac{\partial^2 u}{\partial t^2} + f = 0. \quad (20)
Here, t is time, x is location, u is displacement, E is elastic stiffness, A is cross-sectional area, ρ is density of the
material, and f is external excitation. The discretization of the bar must be done through the space and time dimensions
simultaneously, and here it is done with linear quadrilateral elements. Details of the Galerkin formulation are included
in the Appendix. The nodes are numbered counting upwards through space in the same order for each time instant, be-
ginning with the first time instant. This results in a highly structured space-time stiffness matrix that will be exploited in
the developed DLDC method. For this problem with one spatial dimension, N nodes through space and T nodes through
time will result in a total of N × T degrees of freedom. Besides the enforcement of initial and boundary conditions, the
stiffness matrix retains a diagonal structure such that the N equations corresponding to a given time instant only have 3N
non-zero coefficients. This corresponds to the displacement field at time t being only directly related to the displacement
fields at times t − 1 and t + 1. For a system discretized with linear quadrilateral elements, a piece of the matrix equation’s
larger diagonal structure is depicted below with subscripts denoting sub-matrix size and superscripts denoting time step.
\begin{bmatrix} A_{N\times N} & B_{N\times N} & C_{N\times N} & 0 & 0 \\ 0 & A_{N\times N} & B_{N\times N} & C_{N\times N} & 0 \\ 0 & 0 & A_{N\times N} & B_{N\times N} & C_{N\times N} \end{bmatrix} \begin{bmatrix} u^{t-2}_{N\times 1} \\ u^{t-1}_{N\times 1} \\ u^{t}_{N\times 1} \\ u^{t+1}_{N\times 1} \\ u^{t+2}_{N\times 1} \end{bmatrix} = \begin{bmatrix} f^{t-1}_{N\times 1} \\ f^{t}_{N\times 1} \\ f^{t+1}_{N\times 1} \end{bmatrix} \quad (21)
Here, A, B, and C represent repeated sub-matrices within the global stiffness matrix. For the elastic bar governed
by an equation with no damping term, it is noted that A and C are identical. The solution of this matrix equation can be
done sequentially, solving only N simultaneous equations at a time before progressing to the next time instant. These
computations can be formulated as an autoregressive neural network. Autoregressive neural networks are sequential
feed-forward models used to predict sequence values based on prior observations of the sequence [53, 54]. Generally,
this can be represented by the following equation, with ARNN denoting the autoregressive neural network as a function of prior observation data, and X_t and \varepsilon_t representing the observation and prediction error, respectively, at step t in a sequence of data:
X_t = \mathrm{ARNN}(X_{t-1}, X_{t-2}, \dots) + \varepsilon_t
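The sequential solution amounts to marching the block rows of Eq. 21 forward in time, solving u^{t+1} = C^{-1}(f^t - A u^{t-1} - B u^t) at each step. A minimal sketch is given below; the sub-matrices A, B, C and the load f are random placeholders rather than the actual space-time element matrices.

```python
import numpy as np

# Sequential (autoregressive) solution of the block rows of Eq. 21:
#   A u^{t-1} + B u^t + C u^{t+1} = f^t  =>  u^{t+1} = C^{-1}(f^t - A u^{t-1} - B u^t)
# The sub-matrices A, B, C and the load f below are random placeholders, not the
# actual space-time element matrices.

rng = np.random.default_rng(0)
N, T = 5, 20
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
C = rng.standard_normal((N, N)) + 5.0 * np.eye(N)     # keep C well-conditioned
f = rng.standard_normal((T, N))

u = np.zeros((T, N))
u[0] = rng.standard_normal(N)      # stands in for the initial conditions
u[1] = rng.standard_normal(N)

for t in range(1, T - 1):
    rhs = f[t] - A @ u[t - 1] - B @ u[t]
    u[t + 1] = np.linalg.solve(C, rhs)   # only N equations solved per time instant

print(u.shape)
```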
As a demonstration, consider a spring-mass-damper system governed by
m \frac{\partial^2 u}{\partial t^2} + c \frac{\partial u}{\partial t} + k u = f_0 \sin(\omega t) \quad (25)
Here, t is time, u is displacement, m is mass, c is damping coefficient, k is spring stiffness, and f0 and ω are the
maximum magnitude and frequency of an external sinusoidal force. Training data is generated by solving the equation
using 150 finite elements spanning 3 time units for the following parameter values and initial conditions: m = 1, c =
10, k = 100, f0 = 10, ω = 2π, u(t = 0) = 1, and ∂u
∂t (t = 0) = 1. A STFEM autoregressive neural network is implemented
with m, c, and k as the trainable parameters. The sum of the absolute differences, also known as the L1 loss function,
between the true displacements, ut , and their predicted values are taken as the objective function to be minimized. The
Adam optimization algorithm is used to iteratively update the learnable parameters. A convergence plot of the training
Fig. 10: Training the STFEM Auto-Regressive Neural Network with Exact Data. (a) Exact training data and predicted solution, (b) convergence of the training procedure.
procedure is depicted in Fig. 10(b), which shows that the learnable parameters in the neural network approach the true
values of the equation coefficients as the sum of the absolute errors, |εt |, in the prediction of ut is minimized, resulting in
accurate prediction of the equation solution visualized in Fig. 10(a). The learned values of these coefficients within the
neural network can then be used to make predictions for problems with new initial conditions and boundary conditions,
generalizing and extrapolating beyond training data like a traditional numerical method while remaining a data-driven
machine learning framework. As shown in Fig. 11, the model is able to accurately predict the behavior of the spring-
mass-damper system for a variety of problems with different boundary and initial conditions even though it was only
trained on a single case.
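The parameter-learning idea can be sketched with a simplified stand-in, shown below: instead of the space-time finite element recursion used here, a central-difference time-marching recursion for the spring-mass-damper equation is unrolled with m, c, and k as trainable parameters, an L1 loss on the displacements, and the Adam optimizer. The synthetic observations are generated by the same recursion with the true parameter values; the initial guesses and hyperparameters are assumptions.

```python
import torch

# Simplified stand-in for the STFEM autoregressive network: a central-difference
# recursion for m u'' + c u' + k u = f0 sin(w t), unrolled with trainable m, c, k.
# The article uses the space-time FE recursion; this sketch only illustrates the
# parameter-learning loop. Synthetic observations, initial guesses, and optimizer
# settings are assumptions.

dt, steps = 0.02, 150
t = torch.arange(steps + 1) * dt
f0, w = 10.0, 2.0 * torch.pi

def rollout(m, c, k, u0=1.0, v0=1.0):
    u = [torch.as_tensor(u0), torch.as_tensor(u0 + v0 * dt)]   # first two displacements
    for n in range(1, steps):
        force = f0 * torch.sin(w * t[n])
        # m (u_{n+1} - 2 u_n + u_{n-1})/dt^2 + c (u_{n+1} - u_{n-1})/(2 dt) + k u_n = force
        u_next = (force - k * u[n]
                  + m * (2 * u[n] - u[n - 1]) / dt**2
                  + c * u[n - 1] / (2 * dt)) / (m / dt**2 + c / (2 * dt))
        u.append(u_next)
    return torch.stack(u)

with torch.no_grad():
    u_obs = rollout(torch.tensor(1.0), torch.tensor(10.0), torch.tensor(100.0))

params = torch.tensor([1.5, 5.0, 50.0], requires_grad=True)     # initial guesses for m, c, k
optimizer = torch.optim.Adam([params], lr=0.05)

for epoch in range(2000):
    optimizer.zero_grad()
    u_pred = rollout(params[0], params[1], params[2])
    loss = torch.sum(torch.abs(u_pred - u_obs))                 # L1 loss on displacements
    loss.backward()
    optimizer.step()

print(params.detach().numpy())    # should move toward the true values [1, 10, 100]
```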
Fig. 11: Prediction for New Initial Conditions and Boundary Conditions from Exact Training Data.
Gaussian noise of mean 0 and variance 0.001 is added to a new set of training data for the same problem, which is
now solved with a coarser 50 element mesh spanning 3 time units. The resulting time series is shown in Fig. 12, and Fig.
13 illustrates that even when provided with a very small and noisy training data set the STFEM structured auto-regressive
neural network can accurately capture the system behavior across a variety of new conditions. The ability of the STFEM
neural network to produce accurate predictions for cases beyond the training data set is not typical of traditional neural
networks, which typically only perform well when applied to cases similar to those seen during training.
Fig. 13: Prediction for New Initial Conditions and Boundary Conditions from Noisy Training Data.
Dense and deep neural networks offer great function representation capacity, which makes them powerful tools for
predicting complex phenomena if they are provided with sufficiently large and diverse datasets to learn from. With
limited data, neural networks exhibit overfitting due to their large number of trainable parameters leading to poor inter-
polation capability. Additionally, the extrapolation capability of neural networks to make good predictions beyond the
range of their training data is generally very poor with “black box” implementation. These qualities preclude neural net-
works from solving engineering problems where the collection of data is expensive or cannot span the space of desired
prediction capability. By designing an autoregressive neural network architecture consistent with the space-time finite
element method, the generalizability and interpretability of the numerical method can be preserved while harnessing
the advantage of data-driven machine learning to reduce the need to make potentially incorrect parameter assumptions.
The STFEM neural network may be trained on video data for only a single loading case to produce reasonably accurate
parameter values. This intelligently designed neural network, which captures mesh connectivity and spatiotemporal dis-
cretization, both reduces the cost of model training and increases robustness to data noise. The simple neural network
architecture, with limited learnable parameters, eliminates the need for collecting a massive quantity of data. In fact, with
exact data, only a few time observations are required to eventually converge to the true system parameters, which could
then be used to make predictions across a wide variety of boundary conditions in the same manner as finite element
analysis. Since the functional structure of the neural network is specified as the finite element method, the collection
of additional data only serves to reduce inaccuracies arising due to noise in the observations. This contrasts traditional
deep-learning models which seek to learn a functional structure from scratch, hence their massive data requirement.
This section will discuss how the DLDC methods can be used to solve integral equations. For brevity, and given the underlying context of the article, the focus is on how to solve the Lippmann-Schwinger equation [55] efficiently with the understanding of numerical methods gained through DLDC. This form of integral equation has a myriad of applications, including scattering [56] and the multiscale mechanics of materials [57, 58]. A general integral equation appears as,
f(x) = \int_a^b K(x, y)\, u(y)\, dy, \quad (26)
where u is an unknown function, f is a known function, and K is the kernel function. The solution of the integral equation depends on determining the kernel function. The importance of this kernel function becomes clear in the context of solving a differential equation through a Green's function: for a linear differential equation \mathcal{L} u(x) = f(x), the solution can be written as u(x) = \int G(x, y) f(y)\, dy. Here, G(\cdot) is the Green's function, and the following relationship holds true: \mathcal{L} G(x, y) = \delta(x - y), where \delta is the Dirac delta function, so the Green's function plays the role of the kernel in Eq. 26. In practice, the kernel is rarely known in closed form for complex systems, which motivates learning it from data. To achieve resolution independence, in the Fourier Neural Operator (FNO) the input space of the function variables is taken to the Fourier domain
and trained [33]. Another approach uses graph kernels [59], where a subset of the domain points from the input and output function spaces is used to construct a graph and a convolutional kernel is employed for training. While these networks have been shown to solve complex problems including the Navier-Stokes equations [33], when the solution domain becomes very large, for example in large-scale simulations of 3D microstructures as shown in [60, 61], the number of training parameters may explode and substantial offline training effort is required. In this section, using the concepts of mechanics
and DLDC, the article gives formulations for two methods that can reduce the computational burden when solving
large-scale engineering problems through convolutional kernels.
Fig. 14: The idea of solving a problem with SCA is explained at the top row of the figure. The 2D representative volume
element with 100 × 100 mesh is discretized into 8 clusters. In DLDC method, each cluster is assumed to be a node of
the graph and mechanistic functional spaces are taken as nodal attributes. The interaction matrix is similar to nodal
interaction among the clusters.
As a demonstration, the article resorts to the self-consistent clustering analysis (SCA) [62]. The SCA algorithm has demonstrated considerable success in modeling damage, failure, and the mechanical response of multiscale material systems [63,4]. The idea of SCA is to group similar material points together in a structure based on their elastic response, and to solve the Lippmann-Schwinger equation only for the groups, or clusters, of material points instead of the entire domain. More details can be found in [62]. The following discussion touches on the basics of SCA for convenience.
The homogenization problem for a representative volume element (RVE) can be modeled as the following integral
equation.
\epsilon(x) = \epsilon^0(x) - \int_\Omega G(x, y) : \left[ \sigma(y) - C^0(y) : \epsilon(y) \right] dy, \quad (30)
where \epsilon(x) is the local strain at material point x, \Omega is the physical domain (RVE), \epsilon^0(x) is the background strain, \sigma(y) is the local stress, and C^0 is the reference material stiffness. With SCA, Eq. 30 is not solved for all the material points but rather on a clustered domain. The clustering is done by K-means clustering [64]. At first, a small elastic load is applied to the RVE
and the resulting strain concentration matrix is stored. Based on this strain concentration matrix, the RVE is clustered
(see Figure 14). On this clustered domain, Eq. 30 becomes
\Delta\epsilon^I = \Delta\epsilon^0 - \sum_{J=1}^{k} \left[ \frac{1}{c^I |\Omega|} \int_\Omega \int_\Omega \chi^I(x)\, \chi^J(y)\, G(x, y)\, dx\, dy \right] : \left[ \Delta\sigma^J(y) - C^0(y) : \Delta\epsilon^J(y) \right]. \quad (31)
In the equation, c^I is the volume fraction of cluster I, k is the number of clusters, I and J are the cluster indices, and \chi^I(x) is the cluster characteristic function, which assumes the value 1 when x is in cluster I and zero otherwise. If one compares Eqns. 30 and 31, it is apparent that the Green's function is replaced by a convolution operation, also known as the interaction tensor.
D^{IJ} = \frac{1}{c^I |\Omega|} \int_\Omega \int_\Omega \chi^I(x)\, \chi^J(y)\, G(x, y)\, dx\, dy. \quad (32)
In the conventional SCA method, this interaction tensor is pre-computed as a so-called "offline" database. Based on this pre-computed kernel/interaction tensor, the strain in each cluster is computed from Eq. 31. With the DLDC method, the article shows how to determine this kernel on the cluster centroids so that the number of parameters to be trained can be reduced.
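As a small illustration of the clustering step, the sketch below groups one-dimensional material points by a synthetic strain-concentration value using k-means, mirroring how the RVE of Figure 14 is decomposed into clusters; the strain-concentration data are assumed stand-ins, not results of an actual elastic pre-analysis.

```python
import numpy as np

# Sketch of the SCA clustering step: material points are grouped by their strain
# concentration response using k-means [64]. The "strain concentration" values here
# are synthetic stand-ins for the result of the small elastic pre-analysis.

rng = np.random.default_rng(1)
n_points, k = 1000, 8
x = np.linspace(0.0, 10.0, n_points)
strain_conc = 1.0 / (1.0 + x**2) + 0.01 * rng.standard_normal(n_points)  # assumed data

# Plain k-means (Lloyd's algorithm) on the 1D strain-concentration feature
centroids = strain_conc[rng.choice(n_points, k, replace=False)]
for _ in range(100):
    labels = np.argmin(np.abs(strain_conc[:, None] - centroids[None, :]), axis=1)
    centroids = np.array([strain_conc[labels == j].mean() if np.any(labels == j)
                          else centroids[j] for j in range(k)])

# Each cluster is then treated as a single "material point" in Eq. 31
cluster_coords = np.array([x[labels == j].mean() for j in range(k) if np.any(labels == j)])
print(np.sort(cluster_coords))
```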
Using graph kernel networks, Eq. 31 can be solved on a reduced functional space. The justification comes from the
mathematical form of the graph kernel neural networks. The graph kernel network has the following form for variable
update:
v_{t+1}(x) = \sigma\left( W v_t(x) + \int_\Omega K(x, y, a(x), a(y))\, v_t(y)\, dy \right). \quad (33)
Here, v_{t+1} is the output variable in a transformed functional space at the (t + 1)-th step, \sigma is the activation function, W is a weight, (x, y, a(x), a(y)) is the edge index in the input function space, where x and y are coordinates and a(x) and a(y) are the values of the input function. More details can be found in []. The discretized form of Eq. 33 is:
v_{t+1}(x) = \sigma\left( W v_t(x) + \frac{1}{|N(x)|} \sum_{y \in N(x)} K(x, y, a(x), a(y))\, v_t(y) \right). \quad (34)
Fig. 15: Problem statement for the one-dimensional self-consistent clustering analysis. a) The one-dimensional problem
domain for training the network, b) The cluster centroids after k-means clustering, c) Comparison between analytical
and SCA solution for 33 clusters, d) Evolution of error during training.
In this equation, N(x) defines a neighborhood inside a radius r. Comparing with Eq. 31, it is apparent that the interaction tensor and the summation term in Eq. 34 play similar roles. The only difference is that in the graph kernel network the neighborhood nodes can be selected by changing the radius of influence, whereas the SCA is inherently global when computing the interaction tensor. As an example, let us consider a one-dimensional domain of length 10, as shown in Figure 15(a), discretized into 1000 equally spaced points. The material is linear elastic with stiffness varying as C(x) = \frac{1}{1+x^2}. On this domain, the overall applied strain \epsilon is varied from 0.05 to 0.5, and the domain is clustered into 2 to 128 clusters. For each such discretization, the cluster centroids are taken as the input samples. Essentially, 1000 material points are reduced to 2-128 material points with the assumption that the local strain has a constant value inside each of these clusters.
Fig. 16: Comparison between kernel learning prediction and SCA for 300 clusters.
A distribution of possible cluster centroids with 33 clusters is shown in Figure 15(b). To generate
the data, the solution with SCA is validated against the analytical solution (Figure 15(c)). The training is performed with a modified version of the graph kernel network with 6 iterative layers and a radius of 2 (i.e., the interaction is established between only 2 neighboring clusters). The training performance of the network is shown in Figure 15(d). The figure suggests that the training performance is quite good, as the normalized mean squared error decreases with the number of epochs. A more thorough analysis of the efficacy of such a construction is given in a companion paper [65], which studies the effect of varying the hyperparameters of the neural network, such as the radius of influence r or the number of layers used for training. Apart from the reduction of training parameters and domain points, the method can also extrapolate to some extent. This is often identified as "resolution independence". An example of such extrapolation is shown in Figure 16. The figure shows the prediction of the kernel learning algorithm and SCA for 300 clusters and 0.2 applied strain. This particular case is outside the domain of the training and testing datasets. In spite of that, we can see that the kernel learning algorithm is as good as SCA. However, further testing is required before claiming that the kernel learning method has resolution independence. Nevertheless, the results are promising and show a way to overcome the limitations of the current graph kernel networks when solving three-dimensional engineering problems.
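For completeness, a single discretized kernel update of Eq. 34 on a set of one-dimensional cluster centroids is sketched below: each point interacts with its neighbors within a radius r, and the kernel K is a small fully-connected network acting on the edge features (x, y, a(x), a(y)). The centroid locations, the input function a, the radius, and the network sizes are assumptions for illustration.

```python
import torch

# One discretized graph-kernel update of Eq. 34:
#   v_{t+1}(x) = sigma( W v_t(x) + (1/|N(x)|) * sum_{y in N(x)} K(x, y, a(x), a(y)) v_t(y) )
# on 1D cluster centroids. The centroids, input function a(x), radius r, and the
# small network used for the kernel K are illustrative assumptions.

torch.manual_seed(0)
n, d, r = 33, 4, 2.0                                   # centroids, channel width, radius
xc = torch.sort(torch.rand(n) * 10.0).values           # assumed cluster centroid coordinates
a = 1.0 / (1.0 + xc**2)                                # assumed input function (local stiffness)

kernel_net = torch.nn.Sequential(                      # K: edge features -> d x d matrix
    torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, d * d))
W = torch.nn.Linear(d, d, bias=False)
v = torch.rand(n, d)                                   # v_t, one d-dimensional value per centroid

def kernel_update(v):
    v_new = torch.zeros_like(v)
    for i in range(n):
        nbrs = torch.nonzero(torch.abs(xc - xc[i]) <= r).flatten()   # neighborhood N(x_i)
        edge = torch.stack([xc[i] * torch.ones(len(nbrs)), xc[nbrs],
                            a[i] * torch.ones(len(nbrs)), a[nbrs]], dim=1)  # (x, y, a(x), a(y))
        K = kernel_net(edge).view(len(nbrs), d, d)                    # one matrix per edge
        msg = torch.einsum("ejk,ek->j", K, v[nbrs]) / len(nbrs)       # averaged kernel action
        v_new[i] = torch.relu(W(v[i]) + msg)
    return v_new

v_next = kernel_update(v)                              # v_{t+1}
print(v_next.shape)
```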
4 Future Developments
While the structure of the neural networks proposed in DLDC shows the potential of a general framework for solving fundamental calculus and applied science problems, there are still some limitations that need to be overcome. For example, the forward difference or central difference neural networks presented in the differential calculus section work well only when the same discretization or sampling interval is used; this is apparent from the construction of the neural networks as well. However, in real-life applications, unstructured data or uneven sampling is quite common. The next step is to extend the DLDC structures, similar to the differential quadrature method, to obtain the derivative of unstructured data. The weights can be learned via either a convolutional kernel (similar to C-HiDeNN) or a fully-connected neural network or its variants (such as an autoencoder). While the DLDC structure gives an exact reproduction of the Gauss quadrature method, it remains to be explored whether the flexibility of deep neural networks can be used to solve integrals that are hard to handle with Gauss methods, such as integration around singularities. The authors are currently working to extend C-HiDeNN
into a more general form and to solve multiresolution problems such as [66, 67]. While the proposed integral equation
solver can significantly reduce the number of training parameters for graph kernels, we still need to train the method with
some data. The authors are currently working to extend C-HiDeNN method to solve the integral Lippmann-Schwinger
equation so that the convolutional patch function can capture interaction among the cluster centroids.
5 Conclusions
The proposed DLDC method aims to combine the rich knowledge of numerical methods with modern data science
techniques and create a new perspective on solving challenging problems in engineering and physical sciences. The
building blocks of the DLDC are still in the making. The purpose of building these tools is to provide a general and easy-to-follow process for constructing a deep neural network to solve any governing differential equation. The vision is to combine all the benefits of different numerical methods and propose a unified approach to solving engineering problems. At the same time, the DLDC methods can be used as a teaching tool for calculus in the K-12 education system. These tools will simultaneously introduce students to deep learning and applied calculus, which is expected to increase student participation in STEM programs. Moreover, the DLDC methods avoid repeated training and can be made more flexible by using deep neural networks. The DLDC version of the finite element method, i.e., C-HiDeNN, shows higher accuracy compared to FEM with the same degrees of freedom. The proposed integral equation solver potentially avoids the large number of training parameters that has been the bottleneck for graph kernel networks. However, further research is required to establish the DLDC as a viable method for solving general problems in science and engineering.
Acknowledgements The authors would like to acknowledge the support of National Science Foundation (NSF, USA) grants CMMI-1762035
and CMMI-1934367 and AFOSR, USA grant FA9550-18-1-0381. C. Park and S. Saha would like to thank the Division of Orthopaedic Surgery
and Sports Medicine at Ann and Robert H. Lurie Children’s Hospital for their philanthropic grant. The authors would like to acknowledge
the contribution of Alberto Ciampaglia, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Italy and Visiting
Researcher, Department of Mechanical Engineering, Northwestern University to this article.
6 Appendix 1: Space-Time Finite Element Formulation for an Elastic Bar
\int_0^T \int_L^R \tilde{v} \left( EA \frac{\partial^2 u}{\partial x^2} - \rho A \frac{\partial^2 u}{\partial t^2} \right) \partial x\, \partial t = 0 \quad (36)
\int_0^T \int_L^R \left( EA \frac{\partial}{\partial x}\!\left( \tilde{v} \frac{\partial u}{\partial x} \right) - EA \frac{\partial \tilde{v}}{\partial x}\frac{\partial u}{\partial x} - \rho A \frac{\partial}{\partial t}\!\left( \tilde{v} \frac{\partial u}{\partial t} \right) + \rho A \frac{\partial \tilde{v}}{\partial t}\frac{\partial u}{\partial t} \right) \partial x\, \partial t = 0 \quad (37)
\left[ \int_0^T EA\, \tilde{v} \frac{\partial u}{\partial x}\, \partial t \right]_{x=L}^{x=R} - \int_0^T\!\!\int_L^R EA \frac{\partial \tilde{v}}{\partial x} \frac{\partial u}{\partial x}\, \partial x\, \partial t - \left[ \int_L^R \rho A\, \tilde{v} \frac{\partial u}{\partial t}\, \partial x \right]_{t=0}^{t=T} + \int_0^T\!\!\int_L^R \rho A \frac{\partial \tilde{v}}{\partial t} \frac{\partial u}{\partial t}\, \partial x\, \partial t = 0 \quad (38)
Finite element shape function matrices can be used to discretize the simplified weak form. For an element spanning
x = x_0^e to x = x_1^e through space and t = t_0^e to t = t_1^e through time, the element matrices correspond to the following expressions.
\text{Element Stiffness Matrix } [K]^e = \int_{t_0^e}^{t_1^e} \int_{x_0^e}^{x_1^e} EA \frac{\partial \tilde{v}}{\partial x} \frac{\partial u}{\partial x}\, \partial x\, \partial t \quad (39)
\text{Element Mass Matrix } [M]^e = \int_{t_0^e}^{t_1^e} \int_{x_0^e}^{x_1^e} \rho A \frac{\partial \tilde{v}}{\partial t} \frac{\partial u}{\partial t}\, \partial x\, \partial t \quad (40)
The global boundary conditions are prescribed via the other terms in Eq. 38.
\text{Space Boundary Conditions} = \left[ \int_0^T EA\, \tilde{v} \frac{\partial u}{\partial x}\, \partial t \right]_{x=L}, \quad \left[ \int_0^T EA\, \tilde{v} \frac{\partial u}{\partial x}\, \partial t \right]_{x=R} \quad (41)
\text{Time Boundary Conditions} = \left[ \int_L^R \rho A\, \tilde{v} \frac{\partial u}{\partial t}\, \partial x \right]_{t=0}, \quad \left[ \int_L^R \rho A\, \tilde{v} \frac{\partial u}{\partial t}\, \partial x \right]_{t=T} \quad (42)
The time boundary condition at t = 0 corresponds to the enforcement of initial velocity. The time boundary condition
at the final time, t = T , can be eliminated from the global matrix equation in favor of equations that enforce initial
displacement at t = 0.
7 Appendix 2: Time Finite Element Formulation for a Spring Mass Damper System
The element force matrices account for the force boundary condition due to the external sinusoidal excitation of the
sprung mass. The temporal boundary conditions are prescribed via the other term in Eq. 45.
" # " #
du du
Time Boundary Conditions = mṽ , mṽ (50)
dt 0 dt T
The time boundary condition at t = 0 corresponds to the enforcement of initial velocity. The time boundary condition
at the final time, t = T , can be eliminated from the global matrix equation in favor of equations that enforce initial
displacement at t = 0.
References
1. S. Saha, Z. Gan, L. Cheng, J. Gao, O.L. Kafka, X. Xie, H. Li, M. Tajdari, H.A. Kim, W.K. Liu, Computer Methods in Applied Mechanics
and Engineering 373, 113452 (2021)
2. X. Xie, J. Bennett, S. Saha, Y. Lu, J. Cao, W.K. Liu, Z. Gan, npj Computational Materials 7(1), 1 (2021)
3. Z. Gan, K.K. Jones, Y. Lu, W.K. Liu, Integrating Materials and Manufacturing Innovation 10(2), 177 (2021)
4. O.L. Kafka, K.K. Jones, C. Yu, P. Cheng, W.K. Liu, Journal of the Mechanics and Physics of Solids 150, 104350 (2021)
5. J. Sirignano, K. Spiliopoulos, Journal of computational physics 375, 1339 (2018)
6. M. Raissi, The Journal of Machine Learning Research 19(1), 932 (2018)
7. J. Han, A. Jentzen, et al., Communications in mathematics and statistics 5(4), 349 (2017)
8. B. Yu, et al., Communications in Mathematics and Statistics 6(1), 1 (2018)
9. L. Xiao, B. Liao, S. Li, K. Chen, Neural Networks 98, 102 (2018)
10. M. Zhu, B. Chang, C. Fu, arXiv preprint arXiv:1802.08831 (2018)
11. J.N. Kani, A.H. Elsheikh, arXiv preprint arXiv:1709.00939 (2017)
12. M. Karl, M. Soelch, J. Bayer, P. Van der Smagt, arXiv preprint arXiv:1605.06432 (2016)
13. S. Chakraverty, S. Mall, Artificial neural networks for engineers and scientists: solving ordinary differential equations (CRC Press, 2017)
14. M. Magill, F. Qureshi, H. de Haan, Advances in Neural Information Processing Systems 31 (2018)
15. W.K. Liu, Z. Gan, M. Fleming, Mechanistic Data Science for STEM Education and Applications (Springer, 2021)
16. M. Raissi, P. Perdikaris, G.E. Karniadakis, Journal of Computational physics 378, 686 (2019)
17. G.E. Karniadakis, I.G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Nature Reviews Physics 3(6), 422 (2021)
18. G. Pang, L. Lu, G.E. Karniadakis, SIAM Journal on Scientific Computing 41(4), A2603 (2019)
19. A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, M.W. Mahoney, Advances in Neural Information Processing Systems 34, 26548 (2021)
20. R. Li, E. Lee, T. Luo, Materials Today Physics 19, 100429 (2021)
21. W.T. Leung, G. Lin, Z. Zhang, arXiv preprint arXiv:2108.12942 (2021)
22. S.L. Brunton, J.L. Proctor, J.N. Kutz, Proceedings of the national academy of sciences 113(15), 3932 (2016)
23. E. Kaiser, J.N. Kutz, S.L. Brunton, Proceedings of the Royal Society A 474(2219), 20180335 (2018)
24. K. Kaheman, J.N. Kutz, S.L. Brunton, Proceedings of the Royal Society A 476(2242), 20200279 (2020)
25. Z. Gan, O.L. Kafka, N. Parab, C. Zhao, L. Fang, O. Heinonen, T. Sun, W.K. Liu, Nature communications 12(1), 1 (2021)
26. X. Xie, W.K. Liu, Z. Gan, arXiv preprint arXiv:2111.03583 (2021)
27. R.T. Chen, Y. Rubanova, J. Bettencourt, D.K. Duvenaud, Advances in neural information processing systems 31 (2018)
28. L. Zhang, L. Cheng, H. Li, J. Gao, C. Yu, R. Domel, Y. Yang, S. Tang, W.K. Liu, Computational Mechanics 67(1), 207 (2021)
29. M.Y. Niu, L. Horesh, I. Chuang, arXiv preprint arXiv:1904.12933 (2019)
30. T.W. Hughes, I.A. Williamson, M. Minkov, S. Fan, Science advances 5(12), eaay6946 (2019)
31. J. He, J. Xu, Science china mathematics 62(7), 1331 (2019)
32. N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, A. Anandkumar, arXiv preprint arXiv:2108.08481 (2021)
33. Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, arXiv preprint arXiv:2010.08895 (2020)
34. L. Lu, P. Jin, G.E. Karniadakis, arXiv preprint arXiv:1910.03193 (2019)
35. L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, G.E. Karniadakis, Computer Methods in Applied Mechanics and Engineering
393, 114778 (2022)
36. A. Anandkumar, K. Azizzadenesheli, K. Bhattacharya, N. Kovachki, Z. Li, B. Liu, A. Stuart, in ICLR 2020 Workshop on Integration of
Deep Neural Models and Differential Equations (2020)
37. H. You, Y. Yu, M. D’Elia, T. Gao, S. Silling, arXiv preprint arXiv:2201.02217 (2022)
38. C. Tucker, K. Jackson, J.J. Park, in American Society of Engineering Education (2020)
39. N. Wang, P. Tonko, N. Ragav, M. Chungyoun, J. Plucker, Technology & Innovation (2022)
40. D.S. Touretzky, C. Gardner-McCune, Computational Thinking Education in K-12: Artificial Intelligence Literacy and Physical Computing
pp. 153–180 (2022)
41. D. Touretzky, C. Gardner-McCune, D. Seehorn, International Journal of Artificial Intelligence in Education pp. 1–34 (2022)
42. Y. Yin, Ai4all: Ai education for k-12. Tech. rep., EasyChair (2022)
43. S.C. Chapra, R.P. Canale, et al., Numerical methods for engineers, vol. 1221 (Mcgraw-hill New York, 2011)
44. R. Bellman, J. Casti, Journal of Mathematical Analysis and Applications 34(2), 235 (1971)
45. C.W. Bert, M. Malik, Applied Mechanics Review (1996)
46. C. Shu, Differential quadrature and its application in engineering (Springer Science & Business Media, 2012)
47. C. Park, Y. Lu, S. Saha, T. Xue, J. Guo, S. Mojumder, G. Wagner, W. Liu, Computational Mechanics (2023)
48. G. Liu, Y. Gu, Journal of Sound and vibration 246(1), 29 (2001)
49. G.R. Liu, Y.T. Gu, An introduction to meshfree methods and their programming (Springer Science & Business Media, 2005)
50. T.J. Hughes, G.M. Hulbert, Computer methods in applied mechanics and engineering 66(3), 339 (1988)
51. G.M. Hulbert, T.J. Hughes, Computer methods in applied mechanics and engineering 84(3), 327 (1990)
52. L. Wang, H. Zhong, Applied Mathematical Modelling 41, 445 (2017)
53. T. Taskaya-Temizel, M.C. Casey, Neural Networks 18(5-6), 781 (2005)
54. O. Triebe, N. Laptev, R. Rajagopal, arXiv preprint arXiv:1911.12436 (2018)
55. B.A. Lippmann, J. Schwinger, Physical Review 79(3), 469 (1950)
56. A. Gopal, P.G. Martinsson, Advances in Computational Mathematics 48(4), 1 (2022)
57. M. Zecevic, R.A. Lebensohn, L. Capolungo, Mechanics of Materials 166, 104208 (2022)
58. H. Moulinec, P. Suquet, Computer methods in applied mechanics and engineering 157(1-2), 69 (1998)
59. Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, arXiv preprint arXiv:2003.03485 (2020)
60. S. Saha, O.L. Kafka, Y. Lu, C. Yu, W.K. Liu, Integrating Materials and Manufacturing Innovation 10(3), 360 (2021)
61. S. Saha, O.L. Kafka, Y. Lu, C. Yu, W.K. Liu, Integrating Materials and Manufacturing Innovation 10(2), 142 (2021)
62. Z. Liu, M.A. Bessa, W.K. Liu, Computer Methods in Applied Mechanics and Engineering 306, 319 (2016)
63. C. Yu, O.L. Kafka, W.K. Liu, Computer Methods in Applied Mechanics and Engineering 349, 339 (2019)
64. J.A. Hartigan, M.A. Wong, Journal of the royal statistical society. series c (applied statistics) 28(1), 100 (1979)
65. O. Huang, S. Saha, J. Guo, W.K. Liu, Computational Mechanics (submitted) (2023)
66. C. McVeigh, F. Vernerey, W.K. Liu, L.C. Brinson, Computer Methods in Applied Mechanics and Engineering 195(37-40), 5053 (2006)
67. C. McVeigh, W.K. Liu, Journal of the Mechanics and Physics of Solids 57(2), 244 (2009)