Physics Informed Neural Networks For Numerical Analysis
Robin Pfeiffer
June 2022
Table of Contents
1. Introduction
2. How Neural Networks work
2.1 Activation Function
2.2 More general activation functions
2.3 Setup of Neural Network
2.4 Stochastic Gradient Descent
3. Classification Problem
4. Applications to Numerical Analysis
4.1 Interpolation Problems
5. Physics Informed Neural Networks
6. Optimisation
6.1 Stochastic Gradient Descent
6.2 Number of Iterations
6.3 Choosing η
7. Conclusion
7.1 References
1. Introduction
Multilayered artificial neural networks are becoming a very useful tool in many application fields, including the
numerical solution of differential equations. This is a report on the work I completed over the summer of 2022,
supervised by Niall Madden, exploring the setup of neural networks and some of their uses in the real world.
There is a general form that most networks follow. The network is set up in layers, with some number of neurons per layer.
To go from one layer to the next, we left-multiply our values by a weight matrix and add a bias. Through
this process, we end up with many variables. Training our network consists of making small changes to these
variables over time, until we are happy with the output.
2. How Neural Networks work
2.1 Activation Function
A standard choice of activation function is the sigmoid, σ(x) = 1/(1 + e^(−x)), plotted below.
It is quite a simple function; all we can do is change how quickly it increases from zero to one, and at what
point along the x-axis it does so, which is done by scaling or shifting the input variable, x. However, by chaining
multiple instances of these functions together, we can create some more complex functions, as will be shown in
later sections.
clear;
% Plot the sigmoid sigma(x) = 1/(1+exp(-x)), along with scaled and shifted versions of it.
x = linspace(-20,20,200);
figure;
tiledlayout(2, 2)
nexttile([1 2])
plot(x, 1./(1+exp(-x)), 'r-')
xlim([-20 20]), ylim([-0.5 1.5])
title('\sigma(x)')
nexttile
plot(x, 1./(1+exp(-(5*x))), 'r-')
xlim([-10 10]), ylim([-0.5 1.5])
title('\sigma(5x)')
nexttile
plot(x, 1./(1+exp(-(5*(x-6)))), 'r-');
xlim([-10 10]), ylim([-0.5 1.5])
title('\sigma(5(x-6))')
It is also possible to have a vector-based sigmoid function, by applying σ to each entry: if the input is a vector of
size n, the output will also be a vector of size n.
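For example, a minimal sketch with made-up values: applying the sigmoid to a vector in MATLAB simply applies it to each entry.
% Element-wise sigmoid applied to a vector (illustrative values only).
sigma = @(z) 1./(1 + exp(-z));   % works for scalars, vectors and matrices
z = [-2; 0; 3];                  % input vector of size 3
a = sigma(z)                     % output vector of size 3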
2.2 More general activation functions
The sigmoid is not the only possible activation function. Another common choice is the hyperbolic tangent, tanh(x), which increases from −1 to 1 rather than from 0 to 1.
x = linspace(-10,10,200);
figure
plot(x, (exp(x)-exp(-x))./(exp(x)+exp(-x)), 'r-');
xlim([-10 10]), ylim([-1.5 1.5])
title('tanh(x)')
2.3 Setup of Neural Network
A neural network is built on multiple layers, with multiple nodes in each layer. The first layer is our input, the final
layer is our output, and the middle layers are what we call hidden layers.
To go from one layer to another, we first multiply the previous layer's values by a matrix of weights W and add a
bias b, and then run the result through our activation function.
The size of each weight matrix and bias is determined by the number of neurons in the layers before and after. If
there are n neurons in the first layer and m neurons in the second, the weight will be an m × n matrix. The bias
will be a vector of size m.
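As a minimal sketch of one such layer transition, with illustrative sizes (n = 3 neurons in, m = 2 neurons out) and randomly chosen weights and biases:
% Sketch: one layer transition a_out = sigma(W*a_in + b). Sizes are illustrative.
sigma = @(z) 1./(1 + exp(-z));
a_in = [0.1; 0.5; 0.9];     % output of the previous layer (n x 1)
W    = randn(2, 3);         % weight matrix (m x n)
b    = randn(2, 1);         % bias vector (m x 1)
a_out = sigma(W*a_in + b)   % output of this layer (m x 1)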
We now have a function, F, that symbolises our neural network. We let X be the inputs of our training
data, and Y the required outputs. We can now create a measure of how close our network is to the required
solution. Writing x_j for the j-th training input and y_j for the required output, we let the cost function be
Cost = (1/N) Σ_{j=1}^{N} ‖F(x_j) − y_j‖²,
where N is the number of training points.
Note: This is a function of the weights and biases in the neural network.
We now need to minimise this cost function by changing the weights and biases in our network.
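A minimal sketch of this measure, assuming a stand-in two-layer network and random training data (none of the values below come from this report):
% Sketch: quadratic cost over the training points stored as columns of X and Y.
sigma = @(z) 1./(1 + exp(-z));
W2 = randn(3,2); b2 = randn(3,1);            % illustrative weights and biases
W3 = randn(2,3); b3 = randn(2,1);
F  = @(x) sigma(W3*sigma(W2*x + b2) + b3);   % stand-in network
X  = rand(2, 10);                            % 10 training inputs (columns)
Y  = rand(2, 10);                            % required outputs (columns)
C = 0;
for j = 1:size(X,2)
    C = C + norm(F(X(:,j)) - Y(:,j))^2;      % squared error for point j
end
C = C / size(X,2)                            % cost: a function of the Ws and bs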
2.4 Stochastic Gradient Descent
To illustrate the idea of gradient descent, we first apply it to an example function f of two variables, constructed with the Chebfun toolbox.
This function has a minimum of 0.3088, at the following point:
[m, X]=min2(f)
m = 0.3088
X = 1×2
0.3500 0.4000
To find this minimum numerically, we use gradient descent: if x is an approximation to the minimiser, then x − η∇f(x)
should be an even better one, for some step size η > 0 (which depends on the situation).
Df = grad(f);                 % gradient of f
x0 = [0.5; 0.65];             % starting point
N = 15;
eta = 0.025;                  % step size
x = zeros(2, N);
x(:,1) = x0;
figure;
contour(f); hold on;
for i = 2:N
    x(:,i) = x(:,i-1) - eta*Df(x(1,i-1), x(2,i-1));   % gradient descent step
    dx = x(:,i) - x(:,i-1);
    quiver(x(1,i-1), x(2,i-1), dx(1), dx(2))          % draw the step taken
end
hold off;
xlim([0.00 1.00])
ylim([0.00 1.00])
Each of these arrows moves in the direction of steepest descent, that is, along the negative gradient.
However, it can be very expensive to compute the derivative with respect to all variables, so instead we update just one
variable at a time. The new formula is x_k ← x_k − η ∂f/∂x_k(x), for a randomly chosen index k.
rng(200)
N = 14;
eta = 0.05;
x = zeros(2, N);
x0 = [0.5; 0.65];
x(:,1) = x0;
figure(4);
contour(f); hold on;
for i = 2:N
    k = randi(2);                  % pick a coordinate at random
    x(:,i) = x(:,i-1);
    df = Df(k);                    % k-th component of the gradient
    x(k,i) = x(k,i-1) - eta*df(x(1,i-1), x(2,i-1));   % update that coordinate only
    dx = x(:,i) - x(:,i-1);
    quiver(x(1,i-1), x(2,i-1), dx(1), dx(2), 'linewidth', 2)
end
hold off;
Each step is in only one direction, either along the x-axis or the y-axis.
We can extend this idea to our cost function, which has p variables (the weights and biases) spread across N layers. At each
iteration we do the following: pick a training point at random, and take one gradient descent step on the corresponding cost,
W[i] ← W[i] − η ∂Cost/∂W[i],   b[i] ← b[i] − η ∂Cost/∂b[i],
where W[i] and b[i] are the weight and bias used to jump from layer i−1 to layer i.
After iterating over this many times, you should end up with a suitably small cost. We can optimise this
even further by changing the choice of η; more on this later.
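As a rough sketch of one such step for a tiny [2 3 2] network, using the standard back-propagation formulas (as in Higham and Higham, 2019); the sizes, data and variable names here are illustrative only, not those used in the scripts below.
% Sketch: one stochastic gradient descent step for a [2 3 2] network.
% All sizes, weights and data below are illustrative.
sigma = @(z) 1./(1 + exp(-z));
W2 = 0.5*randn(3,2); b2 = 0.5*randn(3,1);   % layer 1 -> layer 2
W3 = 0.5*randn(2,3); b3 = 0.5*randn(2,1);   % layer 2 -> layer 3
X = rand(2,10); Y = rand(2,10);             % training inputs and outputs (columns)
eta = 0.05;                                 % step size
j = randi(size(X,2));                       % pick one training point at random
x = X(:,j); y = Y(:,j);
a2 = sigma(W2*x + b2);                      % forward pass
a3 = sigma(W3*a2 + b3);
d3 = (a3 - y).*a3.*(1 - a3);                % output-layer error
d2 = (W3'*d3).*a2.*(1 - a2);                % hidden-layer error
W3 = W3 - eta*d3*a2';   b3 = b3 - eta*d3;   % update the weights and biases
W2 = W2 - eta*d2*x';    b2 = b2 - eta*d2;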
3. Classification Problem
The problem is posed as follows. Given a region in ℝ² and two sets of points, such as in the figure below,
train a neural network to split the region in two.
close all;
% Training data: ten points in the unit square; the first five are red, the rest blue.
x1 = [0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7];
x2 = [0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6];
X = [x1;x2];
red_i = false(1,length(X));
red_i(1:5) = true;
blue_i = ~red_i;
% Desired outputs: (1,0)' for red points, (0,1)' for blue points.
Y = zeros(size(X));
Y(1,red_i) = 1;
Y(2,blue_i) = 1;
figure(5);
hold on;
plot(X(1,red_i), X(2,red_i),'ro', ...
X(1,blue_i), X(2,blue_i),...
'bx','MarkerSize',12,'LineWidth',4);
Axis1 = gca;
Axis1.XTick = [0 1]; Axis1.YTick = [0 1];
Axis1.FontWeight = 'Bold'; Axis1.FontSize = 16;
xlim([0,1])
ylim([0,1])
hold off;
We can very easily apply this problem to our neural network from above. We construct our input matrix X
out of the position of each point, i.e., column i of X is (x1(i), x2(i))'. Our desired output for each point is (0,1)'
if it is blue, and (1,0)' if it is red. Using this training data, we can train our network to give the following output.
sgdV01
We now test this with a more complicated example, using the same neural network:
sgdV02([2 2 3 2]);
This clearly didn't go as planned. We can solve this by increasing the number of layers, and the neurons per
layer:
sgdV02([2 5 5 5 2]);
4. Applications to Numerical Analysis
We now move to the idea of a "Physics Informed Neural Network", called a "PINN" for short. At its simplest,
a PINN is just a neural network that is applied to some problem involving a mathematical model, such as a
differential equation, and where the loss function is related to the residual.
Since the solution of any differential equation is a function, we first look at the idea of approximating functions
with neural networks.
4.1 Interpolation Problems
We can use the same setup to approximate functions, given only a few points that lie along them. The only slight
change is that we have one input, the x position, and one output, the y position. Our cost function stays the
same.
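For instance, a minimal sketch of this setup, with an illustrative target function, sample points and a one-hidden-layer network (these choices are not the ones used in the example that follows):
% Sketch: approximating a 1-D function from a few sample points.
% Target function, points and network sizes are illustrative only.
f  = @(x) sin(2*pi*x);
xs = linspace(0, 1, 8);                 % x positions of the training points
ys = f(xs);                             % corresponding y positions
W1 = randn(10,1); b1 = randn(10,1);     % one hidden layer of 10 neurons
W2 = randn(1,10); b2 = randn;
uN = @(x) W2*tanh(W1*x + b1) + b2;      % network: one input x, one output y
cost = mean((arrayfun(uN, xs) - ys).^2) % the same quadratic cost as before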
In the following example, the network will try to reproduce a given function, using only the orange
points as input.
The points in this example are
nlsrun_bvpV02
5. Physics Informed Neural Networks
Let's denote an approximate solution to the differential equation, given by a neural network, as u_N. Then the residual
r_j measures how far u_N is from satisfying the equation at the training point t_j. So, if every r_j were zero, and u_N also
satisfied the initial condition exactly, then we would have solved the problem exactly, at least at the initial point and the
training points. More likely, these quantities are not zero, but they can be used to define the cost function, for example as
the sum of the squared residuals together with the squared error in the initial condition.
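As a rough sketch, suppose the problem were a generic first-order equation u'(t) = g(t, u(t)) with u(0) = u0; the g, u0, training points and network below are illustrative stand-ins, not the actual problem solved in this report. The cost could then be computed as:
% Sketch of a PINN-style cost for u'(t) = g(t, u(t)), u(0) = u0.
% The problem g, u0, the training points and the network are illustrative.
g  = @(t, u) -u;                 % stand-in right-hand side
u0 = 1;                          % stand-in initial condition
ts = linspace(0, 2, 20);         % training points
W1 = randn(10,1); b1 = randn(10,1);
W2 = randn(1,10); b2 = randn;
uN  = @(t) W2*tanh(W1*t + b1) + b2;            % network approximation
duN = @(t) W2*(W1.*sech(W1*t + b1).^2);        % exact derivative of uN wrt t
r = arrayfun(@(t) duN(t) - g(t, uN(t)), ts);   % residual at each training point
cost = mean(r.^2) + (uN(0) - u0)^2             % residuals plus initial condition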
Now we have just another optimisation problem. The results are as follows.
ComputeDerivativeExample
These results look promising, but there is scope for improvement, as we now discuss.
6. Optimisation
Then I looked at how to optimise the training of the network.
6.2 Number of Iterations
The number of iterations is directly linked to the program's runtime. We can significantly reduce the number of
iterations by setting a target cost and stopping as soon as we reach it. In my tests, once the cost dropped below a suitable
target, any further improvements were negligible to the human eye.
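A minimal sketch of this stopping rule, using a simple quadratic cost as a stand-in for the network training problem (the target value, step size and starting point are illustrative):
% Sketch: gradient descent stopping as soon as the cost falls below a target.
cost  = @(x) (x(1)-0.3)^2 + (x(2)-0.7)^2;     % stand-in cost function
dcost = @(x) [2*(x(1)-0.3); 2*(x(2)-0.7)];    % its gradient
x = [0.9; 0.1];  eta = 0.1;
target_cost = 1e-6;  max_iter = 1e5;
for iter = 1:max_iter
    x = x - eta*dcost(x);
    if cost(x) < target_cost                  % good enough: stop early
        break;
    end
end
fprintf('Stopped after %d iterations, cost = %.2e\n', iter, cost(x));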
6.3 Choosing η
The value of the step size η also affects the runtime. If η is too small, it will take many steps to make any real
progress. However, if it is too big, the iterates could end up jumping around the minimum without getting any closer. The
choice of η is problem specific; a fixed value was used in the examples above.
It is also possible to change η throughout the training process. At a certain point the steps will be too big, so you
want to reduce it. I set my process up as follows: I set a goal for the network to reduce its cost by 0.6% every
10,000 iterations, and if it missed the goal three times in a row, η was reduced by 5%.
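A minimal sketch of this schedule, again with a simple quadratic cost standing in for the network training problem; the 0.6%, 10,000-iteration and 5% figures follow the rule described above, while everything else is illustrative.
% Sketch of the adaptive-eta schedule described above, with a stand-in cost.
cost  = @(x) (x(1)-0.3)^2 + (x(2)-0.7)^2;
dcost = @(x) [2*(x(1)-0.3); 2*(x(2)-0.7)];
x = [0.9; 0.1];  eta = 1e-4;
check_every = 10000;  goal = 0.006;           % aim: 0.6% reduction per block
misses = 0;  last_cost = cost(x);
for iter = 1:1e6
    x = x - eta*dcost(x);
    if mod(iter, check_every) == 0
        c = cost(x);
        if c > (1 - goal)*last_cost           % goal missed for this block
            misses = misses + 1;
        else
            misses = 0;
        end
        if misses >= 3                        % missed three times in a row:
            eta = 0.95*eta;                   % reduce eta by 5%
            misses = 0;
        end
        last_cost = c;
    end
end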
7. Conclusion
My work is based on Higham and Higham (2019) and Raissi et al. (2019). This project also allowed me to use the Chebfun
toolbox for MATLAB.
This is an area of maths that is quickly evolving and many new capabilities of neural networks are being
discovered constantly, which is why it was exciting to work on this project.
If I had more time to work on this project, I would have liked to explore some other differential equations such
as boundary value problems, coupled systems, non-linear problems and problems with noisy data. I would have
also worked on a gradient descent solver for the differential equation instead of using a built-in non-linear solver.
I would also have liked to develop my code in Python so it could be shared more easily.
7.1 References
Higham, C. F. and Higham, D. J. (2019). Deep learning: an introduction for applied mathematicians. SIAM
Review, 61(4), pp. 860-891. (doi: 10.1137/18M1165748)
Raissi, M., Perdikaris, P. and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework
for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational
Physics, 378, pp. 686-707. (doi: 10.1016/j.jcp.2018.10.045)