Received 14 July 2023, accepted 1 August 2023, date of publication 7 August 2023, date of current version 17 August 2023. DOI: 10.1109/ACCESS.2023.3302892

This work was supported by the Austrian COMET (Competence Centers for Excellent Technologies) Programme of the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, the Austrian Federal Ministry for Digital and Economic Affairs, and the States of Styria, Upper Austria, Tyrol, and Vienna for the COMET Centers Know-Center and LEC EvoLET, respectively. The COMET Programme is managed by the Austrian Research Promotion Agency (FFG).

Corresponding author: Franz M. Rohrhofer (e-mail: [email protected]).

Data vs. Physics: The Apparent Pareto Front of Physics-informed Neural Networks

FRANZ M. ROHRHOFER$^{1}$, STEFAN POSCH$^{2}$, CLEMENS GößNITZER$^{2}$, and BERNHARD C. GEIGER$^{1}$

$^{1}$Know-Center GmbH, Research Center for Data-Driven Business & Big Data Analytics, Sandgasse 36/4, 8010 Graz, Austria

$^{2}$LEC GmbH, Large Engines Competence Center, Inffeldgasse 19, 8010 Graz, Austria

Abstract

Physics-informed neural networks (PINNs) have emerged as a promising deep learning method, capable of solving forward and inverse problems governed by differential equations. Despite their recent advance, it is widely acknowledged that PINNs are difficult to train and often require a careful tuning of loss weights when data and physics loss functions are combined by scalarization of a multi-objective (MO) problem. In this paper, we aim to understand how parameters of the physical system, such as characteristic length and time scales, the computational domain, and coefficients of differential equations affect MO optimization and the optimal choice of loss weights. Through a theoretical examination of where these system parameters appear in PINN training, we find that they effectively and individually scale the loss residuals, causing imbalances in MO optimization with certain choices of system parameters. The immediate effects of this are reflected in the apparent Pareto front, which we define as the set of loss values achievable with gradient-based training and visualize accordingly. We empirically verify that loss weights can be used successfully to compensate for the scaling of system parameters, and enable the selection of an optimal solution on the apparent Pareto front that aligns well with the physically valid solution. We further demonstrate that by altering the system parameterization, the apparent Pareto front can shift and exhibit locally convex parts, resulting in a wider range of loss weights for which gradient-based training becomes successful. This work explains the effects of system parameters on MO optimization in PINNs, and highlights the utility of proposed loss weighting schemes.

Index Terms:
multi-objective optimization, Pareto front, physics-informed neural networks, system parameters
I Introduction

Recent developments in scientific computing have led to deep learning approaches that can model the dynamics of physical systems governed by differential equations. State-of-the-art methods often infer the dynamics during model training by leveraging data that embodies the fundamental laws of physics [1],[2],[3]. In contrast, physics-informed neural networks (PINNs) directly encode the governing differential equations as soft constraints via a physics loss function [4][5]. PINNs enable a seamless integration of data and physics with their respective losses often considered as multi-objective (MO). The large-scale flexibility of PINNs together with their time-continuous, mesh-independent, and unsupervised encoding of differential equations has propelled PINNs into a vast number of multi-scale and multi-physics applications [6][7]. Today, PINNs are widespread in diverse scientific and engineering disciplines, such as bioengineering [8][9], aerodynamics [10][11], and materials science [12][13].

Setting up a well-working PINN application, however, is not straightforward. In particular, the vanilla implementation of PINNs is known to be prone to training failures that often lead to inaccurate and nonphysical predictions [14]. The discussion on training failures in PINNs is diverse, and each problem setup seems to present its own unique challenges for PINN optimization [15]. This diversity makes it difficult to choose the right remedy in the face of certain optimization issues. In general, any improvement in the robustness and generalizability of PINNs requires an understanding of the optimization complexity of the physics loss function. As a general rule, its complexity increases as the physical system and governing differential equations become more complex. This is true of systems with highly-nonlinear [16], chaotic [17], or multiscale dynamics [18]. To cope with complex systems and geometries, domain decomposition or sequence-to-sequence methods have been developed that divide the original problem into smaller subdomains [16][19]. Each subdomain is then tackled by a separate PINN, resulting in an overall lower optimization complexity. Furthermore, soft attention mechanisms were introduced to focus the physics loss optimization on regions which are typically hard to resolve, such as discontinuities [20] and stiff dynamics [21]. For systems that are slightly more complex than a previously solved problem, curriculum regularization provides a simple starting point for the PINN optimization, which gradually becomes more complex as the PINN is trained [16]. Yet even with simple systems, the optimization may converge to suboptimal solutions that describe trivial solutions to the physics loss function [22]. In this regard, learning in sinusoidal space [25] or methods that respect causality [26] provide a potential remedy.

While all the above-mentioned circumstances and proposed modifications apply in particular to optimization issues related to the physics loss function, a major class of discussions addresses issues related to the MO optimization in PINNs. Loss weighting schemes are arguably the most frequently used modifications to the vanilla PINN framework.

I-A Loss Weighting Schemes

Loss weighting schemes originate from issues related to MO optimization [27]. In PINNs, MO optimization is inherently specified by multiple loss functions which are related to either data or physics. In this particular context, we use the term “data” to specify the use of loss functions with labeled data. Labeled data is typically used in PINNs to encode initial and boundary conditions of forward problems or to impose additional data constraints, e.g., with data coming from experiments. With the term “physics”, we refer to the use of loss functions that encode the governing differential equations. These types of loss functions do not use any labeled data, as further discussed in Section II. Although the total number of loss functions can be reduced, e.g., by using hard constraints [23][24], most PINN applications involve the use of multiple loss functions. The standard approach to minimizing them is gradient-based optimization of a linear scalarized MO problem, given by

\begin{equation}
\min_{\theta}\sum_{i}\alpha_{i}\mathcal{L}_{i}(\theta), \tag{1}
\end{equation}

with $\theta$ denoting the network weights being optimized, $\mathcal{L}_{i}$ the constituent loss functions and $\alpha_{i}$ their respective loss weights. The vanilla PINN framework uses an unweighted scalarization, hence $\alpha=1$ for any given loss function [5]. However, it has been frequently reported that this simple linear combination often leads to optimization failures, which are characterized by a stalled minimization of one loss and high prediction errors. Consequently, it has become a standard procedure in PINNs to adjust loss weights, which are trimmed (often in a trial-and-error procedure) until a sufficiently accurate prediction is obtained. Several adaptive loss weighting schemes have been introduced to overcome the cumbersome tuning of manual loss weights. These schemes focus on what is observed during the PINN training and rely on mean gradient statistics [27][28], inverse average gradient magnitudes [29], or maximum likelihood estimations [30]. Recently, formulations of a constrained optimization problem that use Lagrangian multipliers have also appeared in the literature [31].

I-B Contribution

The use of loss weighting schemes in PINNs has become an essential approach to solving convergence issues related to MO optimization and/or refining the accuracy of final network predictions. Yet when and why the need for loss weighting schemes arises in the first place has rarely been discussed in terms of the investigated physical system and its properties. In this work, we therefore focus on how system parameters, such as characteristic length and time scales, the size of the computational domain, and coefficients of differential equations, affect MO optimization in PINNs. We observe that the choice of system parameters influences the absolute scale of both data and physics residuals. For certain choices of system parameters, this can result in imbalanced scales of loss residuals. This has an immediate effect on MO optimization and the “apparent” Pareto front, which we define as the set of loss values that are achievable with gradient-based training. By analyzing the apparent Pareto front and final PINN predictions, we find that loss weights can be used to compensate for the scaling effects of system parameters, with their optimal choice following the prescribed trend of which loss residuals dominate the MO optimization. Furthermore, we find that the apparent Pareto front of more balanced residuals can exhibit locally convex parts, which results in a wider range of loss weights with which gradient-based training becomes successful. Our results provide valuable insights for MO optimization in PINNs and contribute to a better understanding of the influence of system parameters.

The remainder of this paper is arranged as follows: In Section II, we discuss the fundamental working method of vanilla PINNs and provide the details of MO optimization and the apparent Pareto front. In Section III, we introduce two physical systems on which we perform our analysis. The two systems employ the diffusion equation and Navier-Stokes equations, which are well-known differential equations frequently used in the PINN literature. In Section IV, we theoretically analyze the use of feature scaling to demonstrate how and where system parameters appear in the PINN training and affect the scale of loss residuals. In Section V, we empirically study the apparent Pareto front and trend of optimal loss weights while altering the system parameterization. Section VI provides a discussion of our findings and Section VII a conclusion.

II Physics-Informed Neural Networks

To briefly demonstrate the fundamental working method of PINNs, we take the originally proposed PINN framework [5] and consider a parameterized partial differential equation (PDE) of the general form

\begin{equation}
\frac{\partial}{\partial t}u(t,x)+\mathcal{F}[u;\lambda]=0,\quad x\in\Omega,\; t\in[0,T], \tag{2}
\end{equation}

where $u(t,x)$ is the solution function, $\mathcal{F}[\cdot\,;\lambda]$ represents an arbitrary potentially nonlinear differential operator with parameterization $\lambda$, and $\Omega$ represents the spatial computational domain which is a subset of $\mathbb{R}^{D}$. We proceed by approximating the solution function $u(t,x)$ with a fully-connected neural network $u_{\theta}(t,x)$ with weights $\theta$.

II-A Data and Physics Loss Functions

This paper only considers well-posed problems, i.e., problems with a unique solution function $u$, which is attained by imposing sufficient initial and boundary conditions (IC and BC). The IC and BC define the solution function at the boundary of the computational domain, i.e., at $t=0$ and $x\in\partial\Omega$, respectively, and provide a practically infinite set of labeled training data that can be sampled once prior to network training or anew at each training epoch. As the basis for encoding the IC and BC, this dataset is used by PINNs in the data loss function

\begin{align}
\mathcal{L}_{\mathcal{D}}(\theta) &= \frac{1}{N_{\mathcal{D}}}\sum_{i=1}^{N_{\mathcal{D}}}\left|e_{i}\right|^{2},\quad\mathrm{with} \tag{3a}\\
e_{i} &= u_{\theta}(t_{i},x_{i})-u_{i}, \tag{3b}
\end{align}

where $\mathcal{L}_{\mathcal{D}}$ denotes the mean squared error (MSE) loss function given by the data residuals $e_{i}$ which are determined from the labeled dataset $\{(t_{i},x_{i}),u_{i}\}_{i=1}^{N_{\mathcal{D}}}$. Details of the exact definition of IC and BC in our experiments are given in Section III.

To encode the governing differential equation, PINNs make use of automatic differentiation [32] that retrieves (partial) derivatives of the neural network function at specified coordinates, commonly called collocation points. Derivatives can be evaluated up to the order to which the neural network activation function is differentiable. Activation functions commonly used in PINNs are thus the hyperbolic tangent (tanh), the sinusoidal (sin), or the swish. With the neural network derivatives evaluated at the collocation points, the physics loss function is given by

\begin{align}
\mathcal{L}_{\mathcal{F}}(\theta) &= \frac{1}{N_{\mathcal{F}}}\sum_{i=1}^{N_{\mathcal{F}}}\left|f_{i}\right|^{2},\quad\mathrm{with} \tag{4a}\\
f_{i} &= \frac{\partial}{\partial t}u_{\theta}(t_{i},x_{i})+\mathcal{F}[u_{\theta}(t_{i},x_{i});\lambda], \tag{4b}
\end{align}

where $\mathcal{L}_{\mathcal{F}}$ denotes the MSE loss function given by the physics residuals $f_{i}$ which are determined from the unlabeled dataset $\{(t_{i},x_{i})\}_{i=1}^{N_{\mathcal{F}}}$. In comparison to conventional PDE solvers that use a predefined computational mesh, the choice of collocation points in PINNs is not restricted, i.e., they can be randomly sampled from inside the function domain once or at each epoch, e.g., by using hyper-cube sampling.
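As a rough illustration of (3) and (4), the following sketch evaluates both MSE losses for an arbitrary callable `u` that stands in for the network. It rests on several assumptions that are not part of the framework described above: central finite differences replace automatic differentiation so that only the standard library is needed, the operator is taken to be $\mathcal{F}[u]=-\kappa\,\partial^{2}u/\partial x^{2}$ purely as an example, and all function names are hypothetical.

```python
import math
import random

def data_loss(u, samples):
    """MSE of data residuals e_i = u(t_i, x_i) - u_i, cf. (3a)-(3b)."""
    return sum((u(t, x) - target) ** 2 for (t, x), target in samples) / len(samples)

def physics_loss(u, points, kappa=1.0, h=1e-3):
    """MSE of physics residuals f_i = u_t - kappa * u_xx, cf. (4a)-(4b).

    Assumption of this sketch: derivatives are approximated by central
    finite differences instead of automatic differentiation.
    """
    total = 0.0
    for t, x in points:
        u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
        u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h ** 2
        total += (u_t - kappa * u_xx) ** 2
    return total / len(points)

def u_exact(t, x):
    """Example candidate: exact solution of u_t = u_xx with u(0, x) = sin(pi x)."""
    return math.sin(math.pi * x) * math.exp(-math.pi ** 2 * t)

rng = random.Random(0)
# Unlabeled collocation points sampled uniformly inside the domain.
collocation = [(rng.uniform(0.01, 0.1), rng.uniform(0.0, 1.0)) for _ in range(200)]
# Labeled samples of the initial condition at t = 0.
ic_samples = [((0.0, x), math.sin(math.pi * x))
              for x in (rng.uniform(0.0, 1.0) for _ in range(50))]

loss_d = data_loss(u_exact, ic_samples)
loss_f = physics_loss(u_exact, collocation)
```

For the exact solution both losses are close to zero; the remaining physics loss stems only from the finite-difference approximation. Any other callable with the same signature can be passed in place of `u_exact`.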

II-B The Apparent Pareto Front

Vanilla PINNs and many variants thereof handle data and physics loss functions in a MO manner. The standard approach in PINNs is to simultaneously minimize both by defining a single or total loss function via linear scalarization of the MO problem previously introduced by (1). The particular use of loss weights and their optimal choice differs in the literature (see Section I-A): they depend on the total number of loss functions in use and are tuned manually or in an adaptive weighting scheme.

From a theoretical perspective, any choice of loss weight in a scalarized problem selects a Pareto optimal solution to the MO optimization problem [33]. The Pareto front can then be seen as the set of all possible Pareto optima that are attained by continuous adjustment of the loss weights. In fact, any use of loss weights in a well-posed problem would converge to the same Pareto optimum, namely the true solution of (2), since a perfectly approximated solution function yields zero for all losses. However, finite-sized networks and highly complex loss functions are used in practice, so the gradient descent optimization eventually converges to local optima. Hence these optima do not represent true Pareto optimal solutions but approximations that are found along the optimization path as selected by the particular choice of loss weights. For our analysis, we thus define the “apparent” Pareto front as the set of loss values that are achievable with gradient-based optimization. To empirically obtain the apparent Pareto front, we train several PINN instances with different loss weights and observe where the gradient-based optimization has converged (see Section V for further details).
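Once several PINN instances have been trained, the resulting pairs of converged loss values can be post-processed to see which of them are mutually non-dominated. The sketch below shows one possible filter; the function name and the sample values are hypothetical and not taken from the experiments. Note that the apparent Pareto front as defined above comprises all converged loss values, so the filter only helps to identify which runs would be candidates for Pareto optimality.

```python
def non_dominated(points):
    """Return the points not dominated by any other point (both losses minimized)."""
    front = []
    for i, (d_i, f_i) in enumerate(points):
        dominated = any(
            (d_j <= d_i and f_j <= f_i) and (d_j < d_i or f_j < f_i)
            for j, (d_j, f_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((d_i, f_i))
    return sorted(front)

# Made-up (data loss, physics loss) pairs, for illustration only.
runs = [(1e-1, 1e-6), (1e-3, 1e-4), (1e-2, 1e-3), (1e-5, 1e-1), (1e-3, 1e-2)]
front = non_dominated(runs)
```

In this made-up sample, the runs `(1e-2, 1e-3)` and `(1e-3, 1e-2)` are dominated by `(1e-3, 1e-4)` and are therefore dropped.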

For the sake of simplicity and demonstration purposes, this paper only distinguishes between data and physics loss functions and introduces a single parameter that is manually tuned and simultaneously trades both by

\begin{equation}
\mathcal{L}(\theta;\alpha):=\alpha\mathcal{L}_{\mathcal{D}}(\theta)+(1-\alpha)\mathcal{L}_{\mathcal{F}}(\theta), \tag{5}
\end{equation}

where $\alpha\in(0,1)$. According to this definition, $\alpha\to 1$ favors low data losses, while $\alpha\to 0$ favors low physics losses. A loss weight of $\alpha=0.5$ relates to unweighted MO optimization as used by vanilla PINNs.
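A small numerical sketch may help to see how (5) behaves when the two losses differ by orders of magnitude. The loss values below are hypothetical, and the helper `balancing_alpha`, which picks the weight at which both weighted terms contribute equally, is only an illustrative heuristic and not one of the weighting schemes cited above.

```python
import math

def scalarized_loss(loss_data, loss_physics, alpha):
    """Total loss alpha * L_D + (1 - alpha) * L_F, cf. (5)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    return alpha * loss_data + (1.0 - alpha) * loss_physics

def balancing_alpha(loss_data, loss_physics):
    """Weight for which alpha * L_D equals (1 - alpha) * L_F (illustrative heuristic)."""
    return loss_physics / (loss_data + loss_physics)

# Hypothetical loss values differing by four orders of magnitude.
loss_d, loss_f = 1e-2, 1e+2

# Share of the total loss contributed by the data term for alpha = 0.5.
share_unweighted = 0.5 * loss_d / scalarized_loss(loss_d, loss_f, 0.5)

alpha_bal = balancing_alpha(loss_d, loss_f)
share_balanced = alpha_bal * loss_d / scalarized_loss(loss_d, loss_f, alpha_bal)
```

With $\alpha=0.5$ the data term makes up only about 0.01% of the total in this example, whereas the balancing weight is close to one and yields an equal split.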

III Experimental Setup

Figure 1: Geometrical setup and reference solution for the diffusion example. Representative sample of training data is shown as black crosses (IC and BC) and white circles (collocation).

III-A Diffusion Example

The diffusion equation, along with variants thereof, is among the most widely studied parabolic PDEs with applications in many fields of science, pure mathematics, and engineering. In this work we study the heat equation, a special case of the diffusion equation in the context of engineering, specifically considering the cooling of a one-dimensional rod with an initial temperature distribution and Dirichlet boundary conditions on both rod ends. The dynamics of the system are described by the equation

\begin{equation}
\frac{\partial}{\partial t}u(t,x)=\kappa\frac{\partial^{2}}{\partial x^{2}}u(t,x), \tag{6}
\end{equation}

where the solution function $u$ represents the temperature of the rod at position $x$ and time $t$, and $\kappa$ is the thermal diffusivity. We define the initial (IC) and boundary (BC) conditions as

\begin{align}
\mathrm{IC}:\quad & u(0,x)=\sin\left(\pi\frac{x}{L}\right) && x\in[0,L], \tag{7}\\
\mathrm{BC}:\quad & u(t,0)=u(t,L)=0 && t\in[0,T], \tag{8}
\end{align}

with $L$ denoting the length of the rod and $T$ the simulation time. For our later analysis and the sake of simplicity, we introduce the characteristic (diffusive) time scale

\begin{equation}
\tau:=\frac{L^{2}}{\kappa}, \tag{9}
\end{equation}

and consider the simulation time as multiples of it, i.e., $T=\lambda\tau$ with $\lambda\in\mathbb{R}^{+}_{*}$.

The problem stated above is well-posed and has a unique solution

\begin{equation}
u(t,x)=\sin\left(\pi\frac{x}{L}\right)e^{-\pi^{2}t/\tau}. \tag{10}
\end{equation}

Fixing $\lambda$ while adjusting $L$ and $\kappa$ leaves the intrinsic diffusion process and solution function unaffected. This gives us a controllable setting of different system parameterizations all described by the same solution function (cf. Fig. 1). In an engineering context, this is known as dynamic similitude.
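The similitude argument can be checked numerically with a few lines of code. The sketch below evaluates the analytical solution (10) at the same relative position $x/L$ and relative time $t/T$ for three parameterizations with fixed $\lambda$; the function name and the chosen values of $L$, $\kappa$ and $\lambda$ are arbitrary assumptions made for illustration.

```python
import math

def diffusion_solution(t, x, L, kappa):
    """Analytical solution (10): sin(pi x / L) * exp(-pi^2 t / tau), tau = L^2 / kappa."""
    tau = L ** 2 / kappa
    return math.sin(math.pi * x / L) * math.exp(-math.pi ** 2 * t / tau)

lam = 0.1                 # fixed multiple of the diffusive time scale, T = lam * tau
t_rel, x_rel = 0.3, 0.7   # relative coordinates t / T and x / L

values = []
for L, kappa in [(1.0, 1.0), (10.0, 0.5), (0.01, 3.0)]:
    T = lam * L ** 2 / kappa
    values.append(diffusion_solution(t_rel * T, x_rel * L, L, kappa))
```

All three entries of `values` agree up to floating-point rounding, which reflects that the solution depends on $L$ and $\kappa$ only through $x/L$ and $t/\tau$.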

III-B Navier-Stokes Example

The Navier-Stokes equations are the governing PDEs in the study of fluid flow. Along with the continuity equation, they describe the conservation of momentum and mass of viscous fluids. Their importance in scientific and engineering modeling is indisputable as they are applied to many physical phenomena occurring in weather forecasting, blood flow through vessels, or flow around obstacles. For our study, we limit their application to a two-dimensional incompressible steady-state flow where the solution function to be approximated by the PINN is given by the vector-valued function $(x,y)\mapsto(u,v,p)$. The governing equations for the conservation of momentum and mass, respectively, are

\begin{align}
(\mathbf{u}\cdot\nabla)\mathbf{u} &= -\frac{1}{\rho}\nabla p+\nu\nabla^{2}\mathbf{u} \tag{11a}\\
\nabla\cdot\mathbf{u} &= 0, \tag{11b}
\end{align}

where $\mathbf{u}=(u,v)$ and $p$ denote fluid velocity in the $x$- and $y$-directions and pressure, respectively, and $\rho$ and $\nu$ are the fluid density and viscosity. The continuity equation (11b) can be hard-coded with PINNs (see [5]), which leaves the conservation of momentum (11a) to be encoded in the physics loss function.

Analytical solutions in fluid dynamics are rare. Therefore, we consider the laminar fluid flow in the wake of a two-dimensional grid which has been solved analytically and is better known as Kovasznay flow [34]. With $L$ as the characteristic spacing of the grid, $u_{0}$ the mean velocity in the $x$-direction and assuming constant density $\rho$, the analytical solution to this problem is given by

\begin{align}
u(x,y) &= u_{0}\left(1-e^{\gamma\frac{x}{L}}\cos\left(2\pi\frac{y}{L}\right)\right), \tag{12a}\\
v(x,y) &= \frac{u_{0}\gamma}{2\pi}e^{\gamma\frac{x}{L}}\sin\left(2\pi\frac{y}{L}\right), \tag{12b}\\
p(x,y) &= u_{0}^{2}e^{2\gamma\frac{x}{L}}+C, \tag{12c}
\end{align}

where $C$ is a constant and

\begin{equation}
\gamma=\frac{1}{2\nu}-\sqrt{\frac{1}{4\nu^{2}}+4\pi^{2}}. \tag{13}
\end{equation}

To study this problem with different system parameterizations, we again make use of the concept of dynamic similitude and consider a fixed Reynolds number $\mathrm{Re}=Lu_{0}/\nu$ for the above-mentioned system. Under these conditions, we adjust the mean velocity by $u_{0}=1/L$ (neglecting physical units) so that any selected grid spacing $L$ represents the same fluid flow. To establish geometric similarity, the spatial extension is assumed to be equal in both directions, i.e., $x\in[0,L]$ and $y\in[0,L]$.

In this example, the PINN is trained on boundary data sampled from the analytical solution of the flow velocities (12a) and (12b) at the boundary. Since the flow is steady state, there is no IC. The pressure is entirely learned by the PINN and inferred from the set of governing PDEs (11a).
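As a quick plausibility check of the setup, the velocity field (12a) and (12b) can be evaluated directly and tested against the continuity equation (11b). The sketch below does this with central finite differences and also compares two grid spacings at the same relative position. The value $\nu=1/40$, the evaluation points and all function names are assumptions chosen for illustration; they are not stated in the text above.

```python
import math

def kovasznay_velocity(x, y, L, nu):
    """Velocities (12a)-(12b) with u0 = 1 / L; gamma follows (13)."""
    u0 = 1.0 / L
    gamma = 1.0 / (2.0 * nu) - math.sqrt(1.0 / (4.0 * nu ** 2) + 4.0 * math.pi ** 2)
    decay = math.exp(gamma * x / L)
    u = u0 * (1.0 - decay * math.cos(2.0 * math.pi * y / L))
    v = u0 * gamma / (2.0 * math.pi) * decay * math.sin(2.0 * math.pi * y / L)
    return u, v

def divergence(x, y, L, nu, h):
    """Central-difference estimate of du/dx + dv/dy, cf. (11b)."""
    du_dx = (kovasznay_velocity(x + h, y, L, nu)[0]
             - kovasznay_velocity(x - h, y, L, nu)[0]) / (2.0 * h)
    dv_dy = (kovasznay_velocity(x, y + h, L, nu)[1]
             - kovasznay_velocity(x, y - h, L, nu)[1]) / (2.0 * h)
    return du_dx + dv_dy

nu = 1.0 / 40.0  # assumed viscosity, i.e., Re = 40 when u0 = 1 / L
L = 2.0
div = divergence(0.3 * L, 0.2 * L, L, nu, h=1e-4 * L)

# Same relative position for two grid spacings: u / u0 = u * L should coincide.
u_small = kovasznay_velocity(0.3 * 1.0, 0.2 * 1.0, 1.0, nu)[0] * 1.0
u_large = kovasznay_velocity(0.3 * 5.0, 0.2 * 5.0, 5.0, nu)[0] * 5.0
```

The estimated divergence vanishes up to discretization error, and the normalized velocities match for both grid spacings, in line with the similitude argument.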

Figure 2: Geometrical setup and reference solution for the Navier-Stokes example. Representative sample of training data is shown as black crosses (IC and BC) and white circles (collocation).

IV Effects of System Parameters on Loss Residuals

In this section, we show that the absolute scale of data and physics residuals depends on the underlying system parameters. We discuss the effects of feature scaling, which is a necessary step to bring the input dimensions into a range suitable for gradient descent optimization. Our focus is on min-max feature scaling since its input ranges are well-defined and given by the computational domain.

IV-A Diffusion Example

For the diffusion equation, we consider the input ranges (t,x)[0,T]×[0,L]𝑡𝑥0𝑇0𝐿(t,x)\in[0,T]\times[0,L]( italic_t , italic_x ) ∈ [ 0 , italic_T ] × [ 0 , italic_L ] scaled to the unit interval [0,1]2superscript012[0,1]^{2}[ 0 , 1 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT

t^=tT,andx^=xL,formulae-sequence^𝑡𝑡𝑇and^𝑥𝑥𝐿\hat{t}=\frac{t}{T},\quad\mathrm{and}\quad\hat{x}=\frac{x}{L},over^ start_ARG italic_t end_ARG = divide start_ARG italic_t end_ARG start_ARG italic_T end_ARG , roman_and over^ start_ARG italic_x end_ARG = divide start_ARG italic_x end_ARG start_ARG italic_L end_ARG , (14)

where t^^𝑡\hat{t}over^ start_ARG italic_t end_ARG and x^^𝑥\hat{x}over^ start_ARG italic_x end_ARG denote the scaled input variables. It is noteworthy that applying feature scaling can be seen as scaling the network function to the physical system, thus adapting to characteristic time and length scales as apparent by (14). The physics residuals for the diffusion equation can thus be written in terms of the scaled and characteristic quantities:

fi=1Tt^uθ(t^i,x^i)κL22x^2uθ(t^i,x^i),subscript𝑓𝑖1𝑇^𝑡subscript𝑢𝜃subscript^𝑡𝑖subscript^𝑥𝑖𝜅superscript𝐿2superscript2superscript^𝑥2subscript𝑢𝜃subscript^𝑡𝑖subscript^𝑥𝑖f_{i}=\frac{1}{T}\frac{\partial}{\partial\hat{t}}u_{\theta}(\hat{t}_{i},\hat{x% }_{i})-\frac{\kappa}{L^{2}}\frac{\partial^{2}}{\partial\hat{x}^{2}}u_{\theta}(% \hat{t}_{i},\hat{x}_{i}),italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T end_ARG divide start_ARG ∂ end_ARG start_ARG ∂ over^ start_ARG italic_t end_ARG end_ARG italic_u start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( over^ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - divide start_ARG italic_κ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_u start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( over^ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) , (15)

where the additional pre-factors in the equation appear when the chain rule of derivatives is applied. We again consider the simulation time as a multiple of the characteristic diffusive time scale, i.e., $T=\lambda\tau=\lambda L^{2}/\kappa$, and rearrange the physics residuals according to this definition to yield the following:

\begin{equation}
f_{i}=\frac{\kappa}{L^{2}}\left(\frac{1}{\lambda}\frac{\partial}{\partial\hat{t}}u_{\theta}(\hat{t}_{i},\hat{x}_{i})-\frac{\partial^{2}}{\partial\hat{x}^{2}}u_{\theta}(\hat{t}_{i},\hat{x}_{i})\right). \tag{16}
\end{equation}

It is now apparent that applying feature scaling re-parameterizes the diffusion equation encoded in the physics loss function and effectively scales the physics residuals by $f_{i}\sim\kappa/L^{2}$. In contrast, the data residuals are not affected by those system parameters and stay in the range of $e_{i}\sim 1$ due to the chosen IC (7). Consequently, data and physics residuals scale differently according to the scale ratio given by

\begin{equation}
f_{i}/e_{i}\sim\frac{\kappa}{L^{2}}. \tag{17}
\end{equation}

This shows how a particular choice of $L$ and $\kappa$ unbalances the residual scales. For $\kappa/L^{2}\gg 1$ the scale of the physics residuals is predominant, whereas for $\kappa/L^{2}\ll 1$ the scale of the data residuals is predominant. This scaling in turn directly affects the loss values in (5) and the respective gradients, with measurable consequences for MO optimization [27].
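The equivalence of the physical residual and its rescaled form (16) can be checked numerically. The following is a minimal sketch and is not taken from the paper's code: `u_hat` is an arbitrary smooth test function standing in for the network output $u_{\theta}$ (it is not a solution of the diffusion equation), and central finite differences replace automatic differentiation. All function names are hypothetical.

```python
import math

def u_hat(t_hat, x_hat):
    """Arbitrary smooth test function of the scaled inputs (stand-in for u_theta)."""
    return math.exp(-t_hat) * math.sin(x_hat)

def d1(f, z, h):
    """Central finite-difference approximation of the first derivative."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

def d2(f, z, h):
    """Central finite-difference approximation of the second derivative."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / h**2

def residual_physical(t_hat, x_hat, L, kappa, lam, h=1e-3):
    """f = du/dt - kappa * d2u/dx2, differentiated w.r.t. the physical t and x."""
    T = lam * L**2 / kappa       # simulation time T = lambda * tau
    t, x = t_hat * T, x_hat * L  # map the scaled point back to physical coordinates
    u = lambda t_, x_: u_hat(t_ / T, x_ / L)
    u_t = d1(lambda s: u(s, x), t, h * T)
    u_xx = d2(lambda s: u(t, s), x, h * L)
    return u_t - kappa * u_xx

def residual_scaled(t_hat, x_hat, L, kappa, lam, h=1e-3):
    """Right-hand side of (16): (kappa / L^2) * (u_t_hat / lambda - u_x_hat_x_hat)."""
    u_t = d1(lambda s: u_hat(s, x_hat), t_hat, h)
    u_xx = d2(lambda s: u_hat(t_hat, s), x_hat, h)
    return kappa / L**2 * (u_t / lam - u_xx)

if __name__ == "__main__":
    # Same scaled point, four parameterizations with lambda = 0.5
    for L, kappa in [(5.0, 0.04), (5.0, 1.0), (1.0, 1.0), (1.0, 25.0)]:
        f_phys = residual_physical(0.3, 0.4, L, kappa, lam=0.5)
        f_scal = residual_scaled(0.3, 0.4, L, kappa, lam=0.5)
        print(f"kappa/L^2 = {kappa / L**2:g}: {f_phys:.6e} vs {f_scal:.6e}")
```

For a fixed $\lambda$ and a fixed scaled point, the two evaluations should agree up to rounding, and the residual should change in proportion to $\kappa/L^{2}$ between parameterizations.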

IV-B Navier-Stokes Example

For the Navier-Stokes system, we consider the input ranges $(x,y)\in[-L,L]^{2}$ scaled to the interval $[-1,1]^{2}$:

\begin{equation}
\hat{x}=\frac{x}{L},\quad\quad\hat{y}=\frac{y}{L},\quad\mathrm{thus}\quad\hat{\nabla}=L\nabla. \tag{18}
\end{equation}

This time, the solution function will be affected by the particular choice of $L$, since $u_{0}=1/L$ (see experimental setup III-B). Hence, we consider the velocities and pressure in terms of scaled quantities that are commonly used in the nondimensionalization of the Navier-Stokes equations:

\begin{equation}
\mathbf{\hat{u}}:=\frac{\mathbf{u}}{u_{0}},\quad\mathrm{and}\quad\hat{p}:=\frac{p}{\rho u_{0}^{2}}. \tag{19}
\end{equation}

Rewriting the physics residuals in terms of the scaled quantities yields

\begin{equation}
f_{i}=\frac{u_{0}^{2}}{L}\left(\mathbf{\hat{u}}_{\theta}\cdot\hat{\nabla}\right)\mathbf{\hat{u}}_{\theta}+\frac{u_{0}^{2}}{L}\hat{\nabla}\hat{p}_{\theta}-\nu\frac{u_{0}}{L^{2}}\hat{\nabla}^{2}\mathbf{\hat{u}}_{\theta}, \tag{20}
\end{equation}

where for clarity we have omitted the spatial dependencies $\mathbf{\hat{u}}_{\theta}(\hat{x},\hat{y})$ and $\hat{p}_{\theta}(\hat{x},\hat{y})$. After $u_{0}=1/L$ is substituted and the terms are rearranged,

\begin{equation}
f_{i}=\frac{1}{L^{3}}\left((\mathbf{\hat{u}}_{\theta}\cdot\hat{\nabla})\mathbf{\hat{u}}_{\theta}+\hat{\nabla}\hat{p}_{\theta}-\nu\hat{\nabla}^{2}\mathbf{\hat{u}}_{\theta}\right) \tag{21}
\end{equation}

it is again apparent that the physics residuals are scaled by the factor $f_{i}\sim 1/L^{3}$, while the data residuals are scaled by $e_{i}\sim 1/L$ since $e_{i}=u_{0}(\mathbf{\hat{u}}_{\theta}(\hat{x}_{i},\hat{y}_{i})-\mathbf{\hat{u}}_{i})$. Consequently, data and physics residuals are scaled differently with their scale ratio given by

\begin{equation}
f_{i}/e_{i}\sim\frac{1}{L^{2}}. \tag{22}
\end{equation}

Again, this demonstrates that a particular choice of $L$ unbalances the residual scales, with consequences for MO optimization.
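The step from (20) to (21) uses only $u_{0}=1/L$ in the prefactors. Written out as a short check, together with the resulting ratio (22):

```latex
\begin{align*}
\frac{u_{0}^{2}}{L} &= \frac{1}{L^{2}}\cdot\frac{1}{L} = \frac{1}{L^{3}}, &
\nu\,\frac{u_{0}}{L^{2}} &= \nu\,\frac{1}{L}\cdot\frac{1}{L^{2}} = \frac{\nu}{L^{3}}, &
\frac{f_{i}}{e_{i}} &\sim \frac{1/L^{3}}{1/L} = \frac{1}{L^{2}}.
\end{align*}
```

All three terms of (20) therefore share the common factor $1/L^{3}$, which is what allows it to be pulled out in (21).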


Figure 3: Data vs. physics loss (top row) and test set errors (bottom row) for the diffusion equation. Different system parameterizations are arranged as columns. The system parameters determine the residual scale ratio $f_{i}/e_{i}\sim\kappa/L^{2}$, where for $\kappa/L^{2}\gg 1$ the scale of physics residuals is predominant and for $\kappa/L^{2}\ll 1$ that of data residuals. Final optimization steps (empty circles) in the top row outline the apparent Pareto front.

V Studying the Apparent Pareto Front and Optimal Choice of Loss Weights

In this section, we empirically analyze the scaling effects of system parameters on MO optimization of data and physics. To understand the qualitative trend of optimal loss weights, we analyze the apparent Pareto front with different system parameterizations and study which loss weights achieve a good PINN performance in terms of the relative $\ell^{2}$ error. We use the concept of dynamic similitudes (see experimental setup III) to study one and the same physical system with individual sets of system parameters, which causes different scaling effects captured by the residual scale ratio $f_{i}/e_{i}$.

V-A PINN Setup

We optimize PINNs with four hidden layers and 50 neurons per layer, employing a hyperbolic tangent (tanh) activation function for the hidden layers and a linear activation for the output layer.\footnote{All code is available on GitHub at \url{https://github.com/frohrhofer/PINN_pareto}.} Network weights are initialized using the Glorot uniform [35] initializer. We use Adam [36] to minimize the MO loss (5) with a learning rate of $0.01$. Training is performed for $10^{5}$ epochs to ensure that the optimization has converged to a solution, where measured quantities have stabilized and no longer change substantially. Training data is sampled anew at each epoch with $N_{\mathcal{F}}=1024$ (unlabeled collocation points) and $N=128$ (labeled data points) at each boundary, thus $N_{\mathcal{D}}=384$ for the diffusion example (cf. Fig. 1) and $N_{\mathcal{D}}=512$ for the Navier-Stokes example (cf. Fig. 2).
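The setup above can be collected in a small configuration object. This is an illustrative sketch only, not the authors' implementation: the class and field names are hypothetical, the number of labeled boundary segments (three for diffusion, four for Navier-Stokes) is inferred from $N_{\mathcal{D}}/N$, and the scalarized form $\alpha\mathcal{L}_{\mathcal{D}}+(1-\alpha)\mathcal{L}_{\mathcal{F}}$ is an assumption about (5) that is consistent with how $\alpha$ is used in this section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PinnSetup:
    """Hyperparameters as stated in the text (field names are hypothetical)."""
    hidden_layers: int = 4
    neurons_per_layer: int = 50
    activation: str = "tanh"
    initializer: str = "glorot_uniform"
    optimizer: str = "adam"
    learning_rate: float = 0.01
    epochs: int = 100_000
    n_collocation: int = 1024   # N_F, resampled every epoch
    n_per_boundary: int = 128   # N, labeled points per boundary segment

    def n_data(self, n_boundaries: int) -> int:
        """Total number of labeled points N_D for a given number of boundary segments."""
        return self.n_per_boundary * n_boundaries

def scalarized_loss(loss_data: float, loss_physics: float, alpha: float) -> float:
    """Assumed form of the MO loss: alpha weights data, (1 - alpha) weights physics."""
    return alpha * loss_data + (1.0 - alpha) * loss_physics

if __name__ == "__main__":
    setup = PinnSetup()
    print(setup.n_data(3), setup.n_data(4))  # diffusion, Navier-Stokes
```

With this convention, $\alpha=0.5$ corresponds to the unweighted case up to a constant factor.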

To evaluate the accuracy of predictions, we measure the relative $\ell^{2}$ error on an independent test set, which is randomly sampled inside the computational domain with $N=1024$. The relative $\ell^{2}$ error is given by

\begin{equation}
\epsilon_{u}\equiv\frac{\|u-u_{\theta}\|_{2}}{\|u\|_{2}}, \tag{23}
\end{equation}

where $u$ is obtained from the analytical solutions (10) and (12). For qualitative analysis of the apparent Pareto front, we train PINN instances with loss weights $\alpha\in\{0.001,0.01,0.1,0.5,0.9,0.99,0.999\}$. Test runs are repeated with five unique seeds for data sampling and network initialization.
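As a minimal sketch of (23), the relative $\ell^{2}$ error over a set of test points can be computed as follows. The function name is hypothetical, and plain Python lists stand in for whatever array type the actual implementation uses.

```python
import math

def relative_l2_error(u_true, u_pred):
    """Relative l2 error ||u - u_theta||_2 / ||u||_2 over a set of test points."""
    if len(u_true) != len(u_pred):
        raise ValueError("u_true and u_pred must have the same length")
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_true, u_pred)))
    den = math.sqrt(sum(a ** 2 for a in u_true))
    if den == 0.0:
        raise ValueError("reference solution has zero norm")
    return num / den

if __name__ == "__main__":
    # A prediction that is identically zero has a relative error of one
    print(relative_l2_error([3.0, 4.0], [0.0, 0.0]))
```

Because the error is normalized by $\|u\|_{2}$, its value depends on the magnitude of the reference solution on the test set.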

Note that we have also performed tests with different activation functions (sin, swish), network architectures (2x30, 6x100), learning rates (0.001, 0.0001), and training set sizes, but the results were qualitatively similar to those presented in the following subsections. For the sake of simplicity, the following subsections are concerned with only the setting presented above.

V-B Diffusion Example

The diffusion example is parameterized by $L$, $\kappa$, and $\lambda$, where for a fixed value of $\lambda$ any choice of $L$ and $\kappa$ represents the same underlying system dynamics, with the analytical solution given by (10) (cf. Fig. 1). In this example, the residual scale ratio is determined by the system parameters through $f_{i}/e_{i}\sim\kappa/L^{2}$. To study the effects on the apparent Pareto front and the optimal choice of loss weights, we thus choose $\lambda=0.5$ for our experiment and take $(L,\kappa)\in\{(5,0.04),(5,1),(1,1),(1,25)\}$ to cause a respective scaling by $\kappa/L^{2}\in\{0.0016,0.04,1,25\}$. We have also performed a test with $\lambda\in\{0.1,1,10\}$, but its results revealed qualitative behavior similar to what is presented in this section.

Fig. 3 shows the results of this experiment; the different system parameterizations are arranged by column. The top row displays the history of loss tuples ($\mathcal{L}_{\mathcal{D}}$ and $\mathcal{L}_{\mathcal{F}}$) in each of the trained PINN instances as recorded during the training. For clarity, initial (epoch $0$) and final (epoch $10^{5}$) optimization steps are averaged across the five different PINN runs and highlighted as empty crosses and circles, respectively. It is clear that each PINN instance starts close to the marked initialization step and eventually converges to different regions as determined by the loss weight $\alpha$. The apparent Pareto front can thus be seen as a theoretical interpolation curve of the final loss values indicated by the empty circles. Note that as a result of utilizing a stochastic optimization method, it is possible for those points to have slightly worse values than those found in intermediate optimization steps. The bottom row of the figure shows the respective test set errors ($\epsilon_{u}$) as a function of the optimization step. The test set errors are again averaged over the five PINN instances using different seeds, which are presented by thick lines while individual runs are thin. The black dashed line highlights the unweighted MO, and the empty circles at the end of the test error curves are randomly shifted vertically to reduce overlap.
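When final loss tuples are available for several loss weights, a standard non-dominated filter gives a quick way to outline such a front. The sketch below is a generic tool and not the procedure used in the paper, where the apparent Pareto front is read off from the averaged final optimization steps; the function name and the loss values in the usage example are made up for illustration.

```python
def non_dominated(points):
    """Return the (L_D, L_F) tuples that no other tuple dominates (both minimized).

    A tuple is dominated if another tuple is no worse in both losses and
    strictly better in at least one of them.
    """
    front = []
    for i, (d_i, f_i) in enumerate(points):
        dominated = any(
            (d_j <= d_i and f_j <= f_i) and (d_j < d_i or f_j < f_i)
            for j, (d_j, f_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((d_i, f_i))
    return sorted(front)

if __name__ == "__main__":
    # Hypothetical final (data loss, physics loss) tuples, one per loss weight
    finals = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
    print(non_dominated(finals))
```

In this made-up example the tuple (3.0, 3.0) is dropped because (2.0, 2.0) is better in both losses.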

Looking at the test set error in the bottom row of the figure, we observe a common trend from left to right: in the presence of predominant data residuals (Fig. 3a), a higher weighting of the physics loss ($\alpha\to 0$) yields low test set errors. On the contrary, when predominantly physics residuals are present (Fig. 3d), a higher weighting of the data loss ($\alpha\to 1$) is necessary to achieve comparably low errors. Intermediate scale factors (Fig. 3b and 3c) follow this trend, where the unweighted optimization ($\alpha=0.5$) already achieves comparably low errors due to the more balanced residual scales. Yet when $\kappa/L^{2}=1$, performance is slightly better when a higher weight is given to the data loss.

The apparent Pareto front in the top row provides further insights: in general, we observe the apparent Pareto front substantially changing its shape with different system parameterizations. In Fig. 3b and 3c, we find that the apparent Pareto front in the lower left region locally exhibits convex parts (not explicitly highlighted), where generally a wider range of loss weights achieves similarly accurate results, i.e., MO optimization is less sensitive to the particular choice of the loss weight $\alpha$. In contrast are the apparent Pareto fronts in the presence of unbalanced residual scales (Fig. 3a and 3d), where only a particular choice of $\alpha$, here either $\alpha\to 0$ or $\alpha\to 1$, yields accurate results.


Figure 4: Data vs. physics loss (top row) and test set errors (bottom row) for the Navier-Stokes equations. Different system parameterizations are arranged as columns. The system parameters determine the residual scale ratio $f_{i}/e_{i}\sim 1/L^{2}$, where for $1/L^{2}\gg 1$ the scale of physics residuals is predominant and for $1/L^{2}\ll 1$ that of data residuals. Final optimization steps (empty circles) in the top row outline the apparent Pareto front.

V-C Navier-Stokes Example

The Navier-Stokes setup has a single system parameter $L$ that simultaneously affects both residual scales; their scale ratio is determined by $f_{i}/e_{i}\sim 1/L^{2}$. We thus choose $L\in\{25,5,1,0.2\}$ to cause a respective scaling by $1/L^{2}\in\{0.0016,0.04,1,25\}$.

The history of loss tuples ($\mathcal{L}_{\mathcal{D}}$ and $\mathcal{L}_{\mathcal{F}}$) and the test set errors ($\epsilon_{p}$) for this experiment are shown in Fig. 4. The same explanation of the figure that was provided in the previous subsection also applies here. Note that since the relative $\ell^{2}$ error is dependent on the absolute range of the target value and the range in this example varies with regard to different system parameterizations, a direct quantitative comparison of test errors between different settings of $L$ is not possible. Furthermore, since the velocity errors ($\epsilon_{u}$ and $\epsilon_{v}$) exhibit behavior qualitatively similar to that of the pressure field, their display is omitted for the sake of clarity.

Similar to the previous example, a common trend is observed from left to right: in the presence of predominant data residuals (Fig. 4a), a higher weighting of the physics loss ($\alpha\to 0$) achieves lower prediction errors. In contrast, the setting with predominantly physics residuals (Fig. 4d) requires higher weighting of the data loss ($\alpha\to 1$) to yield accurate results. The trend is similar with intermediate settings (Fig. 4b and 4c), where the unweighted MO already achieves accurate results when $1/L^{2}=0.04$, while a slightly higher weighting of the data loss works better when $1/L^{2}=1$.

Regarding the apparent Pareto front, we once again observe that its shape is substantially influenced by the chosen parameterization. As observed in the previous example, locally convex parts emerge in the lower left corner of the apparent Pareto front (here in Fig. 4a-c), and this coincides with a wider range of loss weights that achieve similar accuracy.

VI Discussion

It has become widely known that the success of PINN training in the sense of obtaining the physically correct solution depends on a myriad of factors, and hence requires a variety of mitigation methods. In general, these factors can be categorized as depending on model parameters and system parameters, though we believe that this separation is less clear than in classical deep learning problems.

This paper has shed more light on the effect of system parameters. Specifically, Section IV shows mathematically with two systems of differential equations that changing the size of the computational domain, e.g., by altering the system parameterization, affects the loss residuals used in MO optimization. Our experiments in Section V are in line with our mathematical analysis and show how varying the system parameters leads to different scaling factors of loss residuals, which in turn requires different loss weights $\alpha$ for PINN training to be successful. Altering the system parameterization thus effectively converts one specific PINN training instance to another that is characterized by different loss weights.\footnote{Additional investigations, which are not reproduced here, have revealed that the new PINN training instance may also be characterized by differently parameterized differential equations.} This insight resonates with and mathematically supports the literature that claims that successful PINN training requires carefully selected loss weights or proposes automatic loss weighting schemes. While proposed methods are usually tailored to specific cases and directly address issues as they arise during training, our results offer a more comprehensive view and focus on potential root causes of issues that may arise from inappropriate system parameterizations.

From the perspective of differential equations, the computational domain is rarely assumed to be fixed. Instead, and especially in computational fluid dynamics, problems are often nondimensionalized or scaled. In other words, the parameters of the system under consideration are reduced to their minimum number, and often normalized with regard to intrinsic quantities (e.g., the spatial domain is normalized with regard to a characteristic length scale). Nondimensionalization therefore suggests a certain set of system parameters and subsequently a set of ideal loss weights. Note that the ideal loss weights still have to be discovered, even in nondimensionalized systems: in Fig. 4c, it can be seen that the best test set error is achieved with a loss weight of $\alpha=0.9$.

From a practical perspective, we believe that the insights from this work can help to improve PINN training. For example, suppose that a parameterization (nondimensionalized or not) and PINN configuration (network architecture, activation function, loss weights, etc.) are known that achieve satisfactory training performance with a given system of differential equations. Such configurations may be taken from existing literature that has successfully applied PINNs to problems of practical relevance. To solve an instance of the same problem with different system parameters, the results presented here can now be taken to study how the individual loss terms of the new problem scale relative to the original problem as a function of the changed system parameters. This can then be used to adjust the loss weights of the new PINN instance. (Yet note that with this loss weight adjustment, the parameters of the differential equation may also change relative to the original and new systems.) Supposing that each differential equation and each dimension of the IC and BC losses is assigned a separate loss weight, a certain number of system parameter changes can be compensated for. If this does not suffice to ensure successful PINN training, one can resort to other methods that directly affect the system parameters (see Section I-A). These include domain decomposition (which either explicitly or indirectly, via adaptive collocation point sampling, affects the effective size of the computational domain), and curriculum learning (which starts PINN training at “simple” parameterizations and slowly changes the parameters to those of the new system). Moreover, when dealing with solution functions that have challenging regions to learn, such as discontinuities or stiff systems, the physics residuals can become comparably larger in these regions and exceed the scaling effects of system parameters. Therefore, it may become necessary to weight individual collocation points to handle such situations effectively. Future research should investigate whether such a general procedure makes PINN training more reliable; this paper has taken an important step in this direction.
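The loss weight adjustment described above can be sketched as a simple heuristic. This is an illustration under explicit assumptions and not a scheme proposed or validated in this paper: it assumes a scalarized loss of the form $\alpha\mathcal{L}_{\mathcal{D}}+(1-\alpha)\mathcal{L}_{\mathcal{F}}$, mean squared residuals (so that the loss ratio scales with the square of the residual scale ratio $f_{i}/e_{i}$), and that preserving the relative weighting of the two terms is desirable. The function name is hypothetical.

```python
def transfer_loss_weight(alpha_ref, ratio_ref, ratio_new):
    """Heuristically carry a known-good loss weight over to a new parameterization.

    alpha_ref: loss weight that worked for the reference system
    ratio_ref: residual scale ratio f_i / e_i of the reference system
    ratio_new: residual scale ratio f_i / e_i of the new system
    Assumes squared residuals, so the physics-to-data loss ratio changes by
    (ratio_new / ratio_ref) ** 2, which is compensated in the odds alpha / (1 - alpha).
    """
    if not 0.0 < alpha_ref < 1.0:
        raise ValueError("alpha_ref must lie strictly between 0 and 1")
    if ratio_ref <= 0.0 or ratio_new <= 0.0:
        raise ValueError("scale ratios must be positive")
    odds_ref = alpha_ref / (1.0 - alpha_ref)
    odds_new = odds_ref * (ratio_new / ratio_ref) ** 2
    return odds_new / (1.0 + odds_new)

if __name__ == "__main__":
    # Reference: unweighted loss at a balanced scale ratio of 1
    for ratio in (0.0016, 0.04, 1.0, 25.0):
        print(ratio, transfer_loss_weight(0.5, 1.0, ratio))
```

The heuristic reproduces the qualitative trend reported in Section V ($\alpha\to 0$ for predominant data residuals and $\alpha\to 1$ for predominant physics residuals), but the experiments also show that the best weight still has to be found empirically, so the result is at most a starting point.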

VII Conclusion

In this paper, we highlighted the importance of understanding the role of system parameters in the training of PINNs. We demonstrated that system parameters such as characteristic length and time scales, the computational domain, and coefficients of differential equations influence the absolute scale of data and physics residuals. We theoretically and empirically verified that this has a consequence for PINN training, especially when data and physics loss functions are minimized through a scalarized MO optimization problem. As measured by the set of loss values that are achievable with gradient-based training and defined as the “apparent” Pareto front, we observed that its shape and the optimal choice of loss weights are determined by the chosen system parameters. Moreover, we showed by altering the system parameterization that the apparent Pareto front can exhibit locally convex parts, resulting in a wider range of loss weights for which gradient-based training is successful. These findings provide important insights into the MO optimization of PINNs and suggest that the common practice of nondimensionalization can have significant implications for PINN training. Ultimately, this work contributes to the development of more effective and efficient loss weighting schemes, which take into account fundamental properties of physical systems governed by differential equations.

References

  • [1] S.M. Udrescu and M. Tegmark. ”AI Feynman: A physics-inspired method for symbolic regression.” Science Advances, 2020, 6. Jg., Nr. 16, S. eaay2631.
  • [2] S.L. Brunton, J.L. Proctor, and J.N. Kutz. ”Discovering governing equations from data by sparse identification of nonlinear dynamical systems.” Proceedings of the national academy of sciences, 2016, 113. Jg., Nr. 15, S. 3932-3937.
  • [3] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. Battaglia. ”Learning to simulate complex physics with graph networks.” International conference on machine learning. PMLR, 2020. S. 8459-8468.
  • [4] I.E. Lagaris, A. Likas, and D.I. Fotiadis. ”Artificial neural networks for solving ordinary and partial differential equations.” IEEE transactions on neural networks, 1998, 9. Jg., Nr. 5, S. 987-1000.
  • [5] M. Raissi, P. Perdikaris, and G.E. Karniadakis. ”Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.” Journal of Computational Physics, 2019, vol. 378, pp. 686-707.
  • [6] M. Raissi, A. Yazdani, and G.E. Karniadakis. ”Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations.” Science, 2020, vol. 367, no. 6481, pp. 1026-1030.
  • [7] Y. Chen, L. Lu, G.E. Karniadakis, and L. Dal Negro. ”Physics-informed neural networks for inverse problems in nano-optics and metamaterials.” Optics Express, 2020, vol. 28, no. 8, pp. 11618-11633.
  • [8] F. Sahli Costabal, Y. Yang, P. Perdikaris, D.E. Hurtado, and E. Kuhl. ”Physics-informed neural networks for cardiac activation mapping.” Frontiers in Physics, 2020, vol. 8, p. 42.
  • [9] G. Kissas, Y. Yang, E. Hwuang, W.R. Witschey, J.A. Detre, and P. Perdikaris. ”Machine learning in cardiovascular flows modeling: Predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks.” Computer Methods in Applied Mechanics and Engineering, 2020, vol. 358, p. 112623.
  • [10] Z. Mao, A.D. Jagtap, and G.E. Karniadakis. ”Physics-informed neural networks for high-speed flows.” Computer Methods in Applied Mechanics and Engineering, 2020, vol. 360, p. 112789.
  • [11] A. Dourado, and F.A. Viana. ”Physics-informed neural networks for missing physics estimation in cumulative damage models: a case study in corrosion fatigue.” Journal of Computing and Information Science in Engineering, 2020, vol. 20, no. 6.
  • [12] Q. He, D. Barajas-Solano, G. Tartakovsky, and A.M. Tartakovsky. ”Physics-informed neural networks for multiphysics data assimilation with application to subsurface transport.” Advances in Water Resources, 2020, vol. 141, p. 103610.
  • [13] M. Yin, X. Zheng, J.D. Humphrey, and G.E. Karniadakis. ”Non-invasive inference of thrombus material properties with physics-informed neural networks.” Computer Methods in Applied Mechanics and Engineering, 2021, vol. 375, p. 113603.
  • [14] A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, and M.W. Mahoney. ”Characterizing possible failure modes in physics-informed neural networks.” Advances in Neural Information Processing Systems, 2021, vol. 34, pp. 26548-26560.
  • [15] S. Monaco, and D. Apiletti. ”Training physics-informed neural networks: One learning to rule them all?” Results in Engineering, 2023, vol. 18, p. 101023.
  • [16] O. Fuks, and H.A. Tchelepi. ”Limitations of physics informed machine learning for nonlinear two-phase transport in porous media.” Journal of Machine Learning for Modeling and Computing, 2020, vol. 1, no. 1.
  • [17] S. Steger, F.M. Rohrhofer, and B.C. Geiger. ”How PINNs cheat: Predicting chaotic motion of a double pendulum.” The Symbiosis of Deep Learning and Differential Equations II.
  • [18] G.E. Karniadakis, I.G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang. ”Physics-informed machine learning.” Nature Reviews Physics, 2021, vol. 3, no. 6, pp. 422-440.
  • [19] A.D. Jagtap, and G.E. Karniadakis. ”Extended Physics-informed Neural Networks (XPINNs): A Generalized Space-Time Domain Decomposition based Deep Learning Framework for Nonlinear Partial Differential Equations.” AAAI Spring Symposium: MLPS, 2021, pp. 2002-2041.
  • [20] C. Wu, M. Zhu, Q. Tan, Y. Kartha, and L. Lu. ”A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks.” Computer Methods in Applied Mechanics and Engineering, 2023, vol. 403, p. 115671.
  • [21] L.D. McClenny, and U.M. Braga-Neto. ”Self-adaptive physics-informed neural networks.” Journal of Computational Physics, 2023, vol. 474, p. 111722.
  • [22] F.M. Rohrhofer, S. Posch, C. Gößnitzer, and B.C. Geiger. ”On the Role of Fixed Points of Dynamical Systems in Training Physics-Informed Neural Networks.” Transactions on Machine Learning Research, 2023.
  • [23] W. Peng, W. Zhou, J. Zhang, and W. Yao. ”Accelerating physics-informed neural network training with prior dictionaries.” arXiv preprint arXiv:2004.08151, 2020.
  • [24] L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, and S.G. Johnson. ”Physics-informed neural networks with hard constraints for inverse design.” SIAM Journal on Scientific Computing, 2021, vol. 43, no. 6, pp. B1105-B1132.
  • [25] J.C. Wong, C. Ooi, A. Gupta, and Y.S. Ong. ”Learning in sinusoidal spaces with physics-informed neural networks.” IEEE Transactions on Artificial Intelligence, 2022.
  • [26] S. Wang, S. Sankaran, and P. Perdikaris. ”Respecting causality is all you need for training physics-informed neural networks.” arXiv preprint arXiv:2203.07404, 2022.
  • [27] S. Wang, Y. Teng, and P. Perdikaris. ”Understanding and mitigating gradient flow pathologies in physics-informed neural networks.” SIAM Journal on Scientific Computing, 2021, vol. 43, no. 5, pp. A3055-A3081.
  • [28] X. Jin, S. Cai, H. Li, and G.E. Karniadakis. ”NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations.” Journal of Computational Physics, 2021, vol. 426, p. 109951.
  • [29] S. Maddu, D. Sturm, C.L. Müller, and I.F. Sbalzarini. ”Inverse Dirichlet weighting enables reliable training of physics informed neural networks.” Machine Learning: Science and Technology, 2022, vol. 3, no. 1, p. 015026.
  • [30] Z. Xiang, W. Peng, X. Liu, and W. Yao. ”Self-adaptive loss balanced Physics-informed neural networks.” Neurocomputing, 2022, vol. 496, pp. 11-34.
  • [31] H. Son, S.W. Cho, and H.J. Hwang. ”AL-PINNs: Augmented Lagrangian relaxation method for physics-informed neural networks.” arXiv preprint arXiv:2205.01059, 2022.
  • [32] A.G. Baydin, B.A. Pearlmutter, A.A. Radul, and J.M. Siskind. ”Automatic differentiation in machine learning: a survey.” Journal of Machine Learning Research, 2018, vol. 18, pp. 1-43.
  • [33] C.L. Hwang, and A.S.M. Masud. ”Multiple objective decision making—methods and applications: a state-of-the-art survey.” Springer Science & Business Media, 2012.
  • [34] L.I.G. Kovasznay. ”Laminar flow behind a two-dimensional grid.” Mathematical Proceedings of the Cambridge Philosophical Society. Cambridge University Press, 1948, pp. 58-62.
  • [35] X. Glorot, and Y. Bengio. ”Understanding the difficulty of training deep feedforward neural networks.” Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010, pp. 249-256.
  • [36] D.P. Kingma, and J. Ba. ”Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980, 2014.
Franz M. Rohrhofer received a B.Sc. in physics from the University of Graz, Austria, in 2016 and a Dipl.-Ing. degree in technical physics (with distinction) from Graz University of Technology, Austria, in 2019. From 2019 to 2020 he worked as a Project Assistant at the Institute of Theoretical and Computational Physics at Graz University of Technology, Austria. In this position, his work included the development of machine learning surrogates for crystal structure prediction with a special focus on hybrid and theory-assisted feature engineering. In 2020 he joined Know-Center GmbH, Graz, Austria, as a Research Scientist and started a Ph.D. programme in computer science at Graz University of Technology, Austria. His current work and research interests comprise the development of physics-informed deep learning approaches for computational fluid mechanics and reactive flow simulations.
Stefan Posch received B.Sc. (2011), M.Sc. (2013), and Ph.D. (2017) degrees in mechanical engineering from Graz University of Technology, Austria. He then worked as a senior engineer at Midea Austria GmbH, where he was responsible for simulation tasks in the field of hermetic compressors. Since 2019 he has worked at the Large Engines Competence Center GmbH in Graz, Austria, as a senior scientist and team leader for system simulation and AI integration. His main research interests include the combination of numerical simulation and data-driven approaches.
Clemens Gößnitzer received his bachelor’s (2014), master’s (2016), and doctoral (2019) degrees in chemical and process engineering from Vienna University of Technology, Austria. After his Ph.D., he joined the Large Engines Competence Center GmbH in Graz, Austria, as a senior engineer, where he works on ignition modeling and combustion simulation in the context of internal combustion engines. Since 2021 he has led the CFD team and is responsible for the physical modeling and simulation of complex combustion phenomena. His research interests are the simulation of reactive flows with conventional and emerging fuels, physical modeling, and the integration of machine learning approaches in physical simulations.
Bernhard C. Geiger (S’07, M’14, SM’19) received the Dipl.-Ing. degree in electrical engineering (with distinction) in 2009 and the Dr. techn. degree in electrical and information engineering (with distinction) in 2014, both from Graz University of Technology, Austria. In 2009 he joined the Signal Processing and Speech Communication Laboratory at Graz University of Technology as a Project Assistant and became a Research and Teaching Associate at the same lab in 2010. He was a Senior Scientist and Erwin Schrödinger Fellow at the Institute for Communications Engineering, Technical University of Munich, Germany, from 2014 to 2017 and a postdoctoral researcher at the Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria, from 2017 to 2018. He is currently a Senior Researcher at Know-Center GmbH, Graz, Austria, where he also leads the Machine Learning Group within the Knowledge Discovery Area. His research interests include information theory for machine learning, theory-assisted machine learning, and information-theoretic model reduction for Markov chains and hidden Markov models.
\EOD