Affiliation: Federal Institute for Materials Research and Testing (BAM), Unter den Eichen 87, 12205 Berlin, Germany

Seifallah Elfetni Reza Darvishi Kamachali

PINNs-MPF: A Physics-Informed Neural Network Framework for Multi-Phase-Field Simulation of Interface Dynamics

Abstract

We present an application of Physics-Informed Neural Networks (PINNs) to Multi-Phase-Field (MPF) simulations of microstructure evolution. We show that such problems require a combination of optimization techniques extended and adapted from the PINNs literature together with specific techniques inspired by the MPF method. The numerical solution is obtained as a multi-variable time-series problem using a fully discrete resolution. Within each time interval, space, time, and phases/grains are treated separately, constituting discrete subdomains. An extended multi-networking concept subdivides the simulation domain into multiple batches, each associated with an independent Neural Network (NN) trained to predict the solution. A Master NN handles the interaction among the multiple networks, across phases/grains and across the spatio-temporal-phasic subdomains, as well as the transfer of learning in different directions. A set of systematic simulations of increasing complexity was performed to benchmark critical aspects of MPF simulations, including different geometries, types of interface dynamics and the evolution of an interfacial triple junction. The training focuses specifically on the interfacial regions through an automatic and dynamic meshing process (introduced as an adaptive mesh-free optimization), which significantly simplifies the tuning of hyper-parameters and serves as a key ingredient for addressing MPF problems using Machine Learning. A pyramidal training approach is proposed to the PINN community as a dual-impact method: it facilitates the initialization of training when dealing with multiple networks, and it unifies the solution through an extended transfer of learning.
The proposed PINNs-MPF framework reproduces the benchmark tests with high fidelity, with Mean Squared Error (MSE) loss values ranging from $10^{-4}$ to $10^{-6}$ compared to ground truth solutions.

Keywords: PINNs, Phase-field method, Parallel training, Machine learning, Neural networks, Adaptive mesh refinement

1 Introduction

Today’s microstructure modeling has grown into an extensive branch of materials research that, on the one hand, reveals the multi-physics of the concurrently evolving microstructural constituents across scales and, on the other hand, bridges these with the process, property and performance of materials. Despite remarkable advances over the last two decades, the rapidly growing complexity of materials’ chemistry and processing constantly outpaces our current microstructure modeling capacity, thus decelerating materials innovation. A potential solution to this growing challenge is to utilize recent developments in machine learning (ML) and deep learning (DL) for microstructure modeling [choudharyrecent2022]. The recently introduced Physics-Informed Neural Networks (PINNs) show promising capabilities in addressing physical phenomena [cuomoscientific2022]. By enabling non-linear approximations of the solutions of Partial Differential Equations (PDEs), a PINN counterpart of conventional approaches, such as phase-field, fluid dynamics and finite-element methods, would allow for a mesh-free modeling framework that is capable of transfer of learning and does not require local approximations or simplifications of the underlying physics [cuomoscientific2022, HAGHIGHAT2021113741, Ramuhalli1528518]. These capabilities open new perspectives in microstructure modeling and motivate the current line of study.

The first applications of PINNs to 1D PDEs (e.g., the heat equation, the wave equation and Burgers’ equation) were quite successful, demonstrating the simplicity and efficiency of the approach [raissiphysicsinformed2019, YANG2019136, ZHANG2019108850]. Since then, more demanding applications have been increasingly targeted in various areas including heat transfer [Sharmaheattransfer2023, ZOBEIRY2021104232], solid mechanics [HAGHIGHAT2021113741, DIAO2023116120, HENKES2022114790, GUO2023113334, bai2023introduction, S0219876223500135, GOSWAMI2020102447, khorrami2023artificial], electromagnetics [HAGHIGHAT2021113741, Ramuhalli1528518, CAO2023123622], turbulence modeling [Shirui2020, HANRAHAN2023109232], electro-chemistry [CHEN2022116918, Hofmann2023] and energy [HAN20233450, PRIYADARSHI2024110231].

However, as the literature on solving PDEs with PINNs has grown, several challenges have been revealed, highlighting the need for optimization techniques tailored to specific physical contexts. Jagtap et al. [Jagtap2020] proposed Extended Physics-Informed Neural Networks (XPINNs) to address moving boundaries. This method subdivides the domain into different mini-batches to locally compute the solution using distinct neural networks. XPINNs were tested on the one-dimensional viscous Burgers equation, employing various optimization techniques such as subdomain decomposition (in time and space), adaptive activation functions, independent hyper-parameter adjustment for each network, and a specific data processing method to characterize the average behavior of predictions along the subdomain interfaces. Kharazmi et al. [KHARAZMI2021113547] introduced hp-Variational Physics-Informed Neural Networks (hp-VPINNs), utilizing domain decomposition and h-refinement with dynamic mesh resolution adjustment to target the advection–diffusion equation (1D) and the Poisson equation (1D and 2D). Meng et al. [MENG2020113250] proposed a Parareal PINNs (PPINN) framework for solving time-dependent PDEs using domain decomposition. Their examples include ordinary differential equations (ODEs) and a 2D nonlinear diffusion-reaction equation. They employed various optimization techniques such as alternating between the Adam and L-BFGS optimizers, Quasi-Monte Carlo sampling for stochastic ODEs, and, principally, a parallel-in-time training approach with a serial update at the subdomain interfaces that uses the fine PINN solutions in a parallel prediction-correction process. Penwarden et al. [PENWARDEN2023112464] further enhanced these techniques by introducing stacked decomposition inspired by the causality literature (time-causality enforcement methods [krishnapriyan2021characterizing, BIHLO2022111024]), window sweeping, alternating optimizers and transfer-of-learning methods, proposed to overcome training challenges, respect causality, and improve scalability by limiting the computation per optimization iteration. The authors considered the 1D convection equation, the 1D Allen-Cahn equation and the 1D Korteweg–de Vries equation. While the reported methods have demonstrated promising results, further testing and extension are necessary to address more complex scenarios, such as two-dimensional configurations involving systems of equations, Multi-Phase-Field problems and three-dimensional domains. These scenarios introduce additional challenges that require innovative approaches and optimization techniques.

A few applications of PINNs to diffuse interfaces were recently presented, especially to study the evolution of multiple interacting phases or components [ROJAS2023100450, ZHANG202364, Haghighat2022, ZHANG2023111919, AMINI2023112323, wight2020solving]. Zheng et al. [Zheng2022] developed a physics-constrained neural network (PCNN) to predict the sequential motions in flow simulations. Due to the challenging nature of multi-phase problems, the applied approach was based on satisfying the mass conservation constraint when encoding the observations of the recurrent NN. A conservative boundedness mapping algorithm (MCBOM) was then called to correct the phase prediction, and was reported to be efficient for handling sequential motions in several 2D benchmarks, e.g., the evolution of three phases in the shear layer and dam break problems. Haghighat et al. [Haghighat2022] proposed a dimensionless form of the PDEs combined with a sequential training strategy based on stress-split algorithms and multi-networking, applied to several problems in poroelasticity. It was found that both the size and structure of the NN, along with the hyper-parameters of the optimizer, such as the learning rate, can significantly influence the quality of PINN solutions. Amini et al. [AMINI2023112323] introduced inverse modeling of nonisothermal multiphase poromechanics using PINNs, successfully applying the method to both single- and multi-phase flow in porous media.

Alongside these remarkable advances, several challenges were also identified that remain to be addressed. A primary concern with PINNs is the gradient pathology in the total loss function, caused by larger gradients in the higher-derivative terms, which affects the accuracy of the predictions [Haghighat2022]. Furthermore, ensuring systematic convergence of training becomes difficult, especially with the high computational cost of multi-networking. A multi-networking approach can lead to differing architectures or training data among the separate NNs, which results in imperfect alignment and thus discontinuities or artifacts along the boundaries. Adjusting the size of the training sets, including initial conditions, collocation points, and boundary conditions, was shown to help resolve this problem, but it also adds to the complexity and computational cost [cuomoscientific2022, ZHU201956]. The assembly and transfer of collective learning from multiple networks (transferring the learning) is another major challenge in PINNs. Tuning hyper-parameters remains challenging even with a single NN: adaptive weighting, while introduced to address feature dependencies, reflects non-systematic learning of solution patterns [CHEN2022116918]. In addition, the profound sensitivity of the solution to the hyper-parameters, including the number of neurons, the number of network layers, the learning rate, and the composition of the training datasets, emerges as a major issue that requires more attention when dealing with PINNs [CHEN2022116918, ARZANI2023111768]. Multi-Phase-Field (MPF) equations involve a coupled system of nonlinear PDEs with multiple phases and several parameters. The presence of higher-order derivative terms and a non-convex potential term contributes additional non-linearities and gives rise to several local minima.
Additionally, the generated solutions often require correction algorithms to satisfy conservation laws, rendering classical PINNs unable to make accurate predictions. Therefore, incorporating conservative correction algorithms and adequate optimization methods becomes necessary [HUANG2021103727].

Multi-phase-field methods (MPFM) for microstructure modeling have proven to be powerful tools for capturing intricate multi-physics phenomena in multi-phase, multi-component materials [chen2002phase, steinbach2009phase, steinbachphasefield2009]. Being based on a generalizable free energy functional, a key feature of the MPF framework is that it can be flexibly expanded to different degrees of complexity specific to the physics of the problem at hand. A representative demonstration of this flexibility is found in the treatment of polycrystalline microstructures: starting from curvature-driven interface kinetics, the MPF framework has gradually expanded to cover chemical [GROSE2022111570], mechanical [SINGH2021107348] and even magnetic and electric contributions [Huo2023]. These extensions led to the accurate investigation of complex phenomena such as precipitation, recrystallization, and phase transformation as well as various chemo-mechanically coupled phenomena [darvishikamachali2013grainPhD, kamachali2015texture, schwarze2018computationally]. Despite successful implementations and applications [kamachali2018numerical, tegeler2017parallel], an intrinsic challenge of the MPF approach is that any further advancement or newly arising scientific question immediately requires extensive programming and optimization efforts, which often extend to redefining key parameters and functions as well as the memory and storage structures in the code. This originates from the fact that any update to the model necessitates changes to the underlying free energy functional and the corresponding equations of motion.
As a result, it becomes difficult for software updates to keep pace with emerging topics in materials research, for instance, the unanswered questions regarding non-equilibrium microstructure evolution in additively manufactured materials [WANG2020101538, PARSAZADEH2023101102] or the complex diffusion-interface couplings in novel high-entropy alloys [QIAO2021160295, Ziyuan2022HEA].

This work presents a comprehensive framework, termed PINNs-MPF, specifically designed to tackle the challenges of solving MPF equations using PINNs. By identifying the limitations of existing PINN methods when applied to MPF equations and recognizing the additional complexities introduced by these non-linear, non-convex, and constrained systems, we propose a unified approach that combines and extends relevant PINN techniques with MPF method strategies. This strategy is inspired by the core development of the OpenPhase software for MPF simulations [darvishikamachali2013grainPhD], which has demonstrated success in studying diffuse interface problems. It aims to overcome the reported limitations by enabling a learnable solution that efficiently handles non-linearities and can integrate more physics.
This framework introduces key novelties through its optimization strategies: (i) An extended domain decomposition approach is implemented that uses a central coordinating network (referred to as the "Master") to manage tasks and ensure continuity among subnetworks (referred to as the "workers") during fully parallel training in space, time and phases. (ii) An extreme mesh refinement approach, with a distinct focus on the phase-field interfacial zones, is applied. (iii) We simplify and reduce the hyper-parameters as elaborated in the following. (iv) A synchronization step is applied to ensure efficient interactions between the evolving phase-fields. Finally, (v) a pyramidal training approach is applied as a robust alternative to adaptive weighting and an efficient solution for multiple-to-multiple and single-to-multiple transfer of learning between networks. Inspired by the MPF benchmark philosophy [darvishikamachali2013grainPhD], we test our PINNs on four core applications of progressively increasing complexity, on which the MPF concept was built: first, we model the motion of a diffuse interface (propagation of a stable phase-field profile); second and third, we study the curvature-driven shrinkage of a grain in the presence and absence of a driving force; and fourth, we investigate the evolution of a triple junction, which demonstrates the core principles of multi-phase evolution. The results are compared against the outputs from OpenPhase and discussed in the context of the hyper-parameters and concepts implemented within our PINNs framework.

2 The PINNs-MPF Framework

The MPF temporal evolution is governed by the set of PDEs as follows:

\begin{equation}
\dot{\phi}_{\alpha}=\frac{\mu\sigma}{N}\sum_{\beta=1}^{N}\left(\nabla^{2}\phi_{\alpha}-\nabla^{2}\phi_{\beta}+\frac{\pi^{2}}{2\eta^{2}}(\phi_{\alpha}-\phi_{\beta})\right)+\frac{\mu}{N}\sum_{\beta=1}^{N}\Delta G\left(\phi_{\alpha},\phi_{\beta}\right), \tag{1}
\end{equation}

with phase-fields $\phi_{\alpha}\in[0,1]$, $\mu$ the interface mobility, $\sigma$ the interface energy and $\eta$ the interface width. The pairwise term $\Delta G$ accounts for any additional driving force (e.g., chemical, elastic, etc.) influencing the interface dynamics. At every point in space and time, the sum of the phase-fields/order parameters reads:

\begin{equation}
\sum_{\alpha=1}^{N}\phi_{\alpha}=1, \tag{2}
\end{equation}

with $N$ the number of phase-fields present. Further details of the MPF model are given in the Methods section 3. The PINNs-MPF framework is targeted to resolve Eqs. (1) and (2) in time and space and for various initializations. In doing so, PINNs aim to minimize a loss function $\mathcal{L}(\theta)$ with respect to the trainable network parameters $\theta$. With the MPF model in hand, this loss is composed of several terms:

\begin{equation}
\theta^{*}=\arg\min_{\theta}\left[\mathcal{L}(\theta)\right]=\arg\min_{\theta}\left[\lambda_{\mathrm{pde}}\mathcal{L}_{\mathrm{PDE}}(\theta)+\lambda_{\mathrm{bc}}\mathcal{L}_{\mathrm{BC}}(\theta)+\lambda_{\mathrm{ic}}\mathcal{L}_{\mathrm{IC}}(\theta)+\lambda_{\sum\phi}\mathcal{L}_{\sum\phi}(\theta)\right] \tag{3}
\end{equation}

where $\mathcal{L}_{\mathrm{PDE}}$ is the loss associated with enforcing the governing physics through the PDEs:

\begin{equation}
\mathcal{L}_{\mathrm{PDE}}(\theta)=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\left[\boldsymbol{r}\left(\boldsymbol{x}_{r}^{i},t_{r}^{i},\theta\right)\right]^{2} \tag{4}
\end{equation}

in which $\boldsymbol{r}$ represents the residual of the PDEs, i.e., the physical loss (Eq. 1), and $N_{r}$ the number of collocation points. $\mathcal{L}_{\mathrm{BC}}$ is the loss associated with enforcing boundary conditions (BC) on the simulation domain:

\begin{equation}
\mathcal{L}_{\mathrm{BC}}(\theta)=\frac{1}{N_{\mathrm{BC}}}\sum_{i=1}^{N_{\mathrm{BC}}}\left[\boldsymbol{u}\left(\boldsymbol{x}_{\mathrm{BC}}^{i},t_{\mathrm{BC}}^{i},\theta\right)-g_{\mathrm{BC}}^{i}\right]^{2} \tag{5}
\end{equation}

with $\boldsymbol{u}$ representing the solution approximated by the neural network with parameters $\theta$ at the boundary points $\boldsymbol{x}_{\mathrm{BC}}^{i}$ and times $t_{\mathrm{BC}}^{i}$, and $g_{\mathrm{BC}}^{i}$ the prescribed or observed value at the corresponding boundary point and time. $\mathcal{L}_{\mathrm{IC}}$ is the loss associated with enforcing initial conditions (IC), i.e., the spatial configuration of the phase-fields:

\begin{equation}
\mathcal{L}_{\mathrm{IC}}(\theta)=\frac{1}{N_{\mathrm{IC}}}\sum_{i=1}^{N_{\mathrm{IC}}}\left[\boldsymbol{u}\left(\boldsymbol{x}_{\mathrm{IC}}^{i},0,\theta\right)-h_{\mathrm{IC}}^{i}\right]^{2} \tag{6}
\end{equation}

which compares the approximated solution at the initial points $\boldsymbol{x}_{\mathrm{IC}}^{i}$ and time $t=0$ against the prescribed or observed initial value $h_{\mathrm{IC}}^{i}$. Finally, $\mathcal{L}_{\sum\phi}$ is the loss associated with the phase-sum constraint of Eq. (2):

\begin{equation}
\mathcal{L}_{\sum\phi}(\theta)=\left|\sum_{\alpha=1}^{N}\phi_{\alpha}\left(\boldsymbol{x},t,\theta\right)-1\right|^{2} \tag{7}
\end{equation}

where $N$ is the total number of phases at the given point. Unlike the other losses, which are computed over a subpart of the domain, the loss on the sum constraint is computed over the entire simulation domain.
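To make the link between Eq. (1) and the sum constraint of Eq. (2) concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 1D grid, $\Delta G = 0$, and a central finite-difference Laplacian standing in for the automatic differentiation a PINN would use; the function names `laplacian_1d`, `mpf_rate` and `sum_constraint_loss` are hypothetical, and averaging Eq. (7) over the sample points is also an assumption. Because the pairwise terms in Eq. (1) are antisymmetric in $\alpha$ and $\beta$, the rates of all phases sum to zero, so an exact solution preserves Eq. (2).

```python
import math

def laplacian_1d(values, dx):
    """Central finite-difference Laplacian; the two end points are left at zero."""
    lap = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        lap[i] = (values[i - 1] - 2.0 * values[i] + values[i + 1]) / dx**2
    return lap

def mpf_rate(phi, dx, mu, sigma, eta):
    """Right-hand side of Eq. (1) with Delta G = 0 (assumption) for N phase-fields."""
    n_phases, n_points = len(phi), len(phi[0])
    lap = [laplacian_1d(p, dx) for p in phi]
    coeff = math.pi**2 / (2.0 * eta**2)
    rates = []
    for a in range(n_phases):
        rate_a = []
        for i in range(n_points):
            pair_sum = sum(
                lap[a][i] - lap[b][i] + coeff * (phi[a][i] - phi[b][i])
                for b in range(n_phases)
            )
            rate_a.append(mu * sigma / n_phases * pair_sum)
        rates.append(rate_a)
    return rates

def sum_constraint_loss(phi):
    """Eq. (7) averaged over all sample points (the averaging is an assumption)."""
    n_points = len(phi[0])
    return sum((sum(p[i] for p in phi) - 1.0) ** 2 for i in range(n_points)) / n_points

# Illustrative three-phase configuration that satisfies Eq. (2) by construction.
dx = 0.1
xs = [-5.0 + dx * i for i in range(101)]
phi1 = [0.5 * (1.0 - math.tanh(x + 1.5)) for x in xs]
phi3 = [0.5 * (1.0 + math.tanh(x - 1.5)) for x in xs]
phi2 = [1.0 - a - c for a, c in zip(phi1, phi3)]
phi = [phi1, phi2, phi3]

rates = mpf_rate(phi, dx, mu=1.0, sigma=1.0, eta=1.0)
max_rate_sum = max(abs(sum(r[i] for r in rates)) for i in range(len(xs)))
```

In this sketch `max_rate_sum` stays at round-off level, while the individual rates are non-zero. A network prediction carries no such guarantee, which is consistent with the sum constraint being enforced through its own loss term.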

The coefficients $\lambda_{\mathrm{pde}}$, $\lambda_{\mathrm{bc}}$, $\lambda_{\mathrm{ic}}$ and $\lambda_{\sum\phi}$ in Eq. (3) are weight factors associated with the different loss terms, balancing the importance of each term. The objective of Eq. (3) is to find the optimal parameters $\theta^{*}$ that minimize the total loss function, resulting in trained neural networks that are then capable of approximating the MPF solution of a problem with the given initial and boundary conditions. We note that since the MPF method is based on a free energy functional, reinforcing terms that account for the energy dissipation [SECCI2024105494] could be directly added to the loss function, as further elaborated in the Discussion (section 5).
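The weighted sum in Eq. (3) can be sketched as follows. This is an illustrative example only: the helper names `mean_square` and `total_loss`, the dictionary keys, and the numerical weights are assumptions made for the sketch, not values taken from the framework. The inputs are the pointwise errors that enter Eqs. (4) to (7), i.e., PDE residuals, boundary and initial mismatches, and the deviation of the phase sum from one.

```python
def mean_square(values):
    """Mean of squared entries, the common form of Eqs. (4)-(6)."""
    return sum(v * v for v in values) / len(values)

def total_loss(residuals, bc_errors, ic_errors, sum_errors, weights):
    """Weighted total loss of Eq. (3); returns the total and the individual terms."""
    terms = {
        "pde": mean_square(residuals),   # Eq. (4)
        "bc": mean_square(bc_errors),    # Eq. (5)
        "ic": mean_square(ic_errors),    # Eq. (6)
        "sum": mean_square(sum_errors),  # Eq. (7), averaged over points (assumption)
    }
    total = sum(weights[name] * value for name, value in terms.items())
    return total, terms

# Hypothetical weights and errors, chosen only to illustrate the bookkeeping.
weights = {"pde": 1.0, "bc": 1.0, "ic": 10.0, "sum": 1.0}
total, terms = total_loss(
    residuals=[0.1, -0.1],
    bc_errors=[0.0, 0.0],
    ic_errors=[0.5, 0.0],
    sum_errors=[0.0, 0.2],
    weights=weights,
)
```

Returning the individual terms alongside the total makes it easy to monitor which contribution dominates during training, which is useful when the weights are tuned by hand.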

The architecture of the current PINNs-MPF framework is modular and composed of various implementations. To better illustrate the PINNs-MPF framework, Figure 1 presents a flowchart of the framework.

(a) Global architecture of the PINNs-MPF framework. A general description of the discrete training subdomain is provided in (a1), while a detailed illustration of the training process within each time interval is shown in (a2).
(b) Pyramidal training and continuity across domains: A comprehensive description of the pyramidal transfer of learning from one subdivision level to another is depicted in (b1), while an illustrative example of this hierarchical approach, followed by addressing continuity between domains, is provided in (b2).
Figure 1: The architecture of the PINNs-MPF framework. Panel (b) adds details to panel (a). The pyramidal training enables continuity and multiple-to-multiple transfer of learning. The continuity of the prediction is ensured by the propagation of boundary predictions. The set of techniques employed in this model is accessible within the code. The user has the flexibility to enable all or a specific subset of the options, based on the simulation requirements and complexities.

Panel (a) in Figure 1 presents the global architecture: similar to the conventional MPF method, fully discrete training in space and time is considered. Our experiments demonstrate that this approach not only eases the comparison to our reference MPF solutions but also maintains a constant demand for resources throughout the entire simulation, in contrast to continuous training, whose resource consumption grows [cuomoscientific2022]. The general control of the training is handled through a master network, referred to as the MASTER-PINN, while the workload is shared among multiple parallel NNs (subdomains), referred to as WORKER PINNs (cf. paragraph 2.2). The subdividing procedure is automated. Within each subdomain, the training is carried out independently, employing the distinctive techniques introduced in the following. These include ‘pyramidal training’, a structured approach designed for handling the transfer of learning along the spatial domain (cf. paragraph 2.3). The term ‘basket of PINNs’ (cf. paragraph 2.7) refers to an ensemble of networks dynamically selected for optimization to enhance efficiency. Indeed, without an efficient strategy, the resource demand can grow exponentially, leading to impractical training times and excessive use of computational resources. Additionally, the concept of the ‘wheel of optimizers’ (cf. paragraph 2.7) is introduced, denoting the interaction between the optimizers.
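As a rough illustration of how a ‘basket of PINNs’ might be filled, the sketch below picks, for each block, the batch with the highest ratio of interfacial points, which is the selection criterion stated in Algorithm 1. Everything else is an assumption made for the example: the interface test `tol < phi < 1 - tol`, the data layout, and the names `interfacial_ratio` and `select_basket` are hypothetical and not taken from the code of the framework.

```python
def interfacial_ratio(phi_values, tol=1e-3):
    """Fraction of sample points with tol < phi < 1 - tol (assumed interface test)."""
    if not phi_values:
        return 0.0
    n_interface = sum(1 for v in phi_values if tol < v < 1.0 - tol)
    return n_interface / len(phi_values)

def select_basket(blocks, tol=1e-3):
    """For each block, return the batch index with the highest interfacial ratio.

    `blocks` maps a block id to {batch index: list of phase-field samples}.
    """
    basket = {}
    for block_id, batches in blocks.items():
        basket[block_id] = max(
            batches, key=lambda idx: interfacial_ratio(batches[idx], tol)
        )
    return basket

# Hypothetical samples of the first phase in two blocks with two batches each.
blocks = {
    "Q1": {0: [0.0, 0.0, 1.0, 1.0], 1: [0.0, 0.2, 0.7, 1.0]},
    "Q2": {2: [1.0, 1.0, 1.0, 1.0], 3: [0.5, 1.0, 1.0, 1.0]},
}
basket = select_basket(blocks)
```

Here batch 1 is selected in block `Q1` and batch 3 in block `Q2`, since these contain the largest share of points inside the diffuse interface. Batches lying entirely in the bulk are not selected by this criterion.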

The training proceeds in two directions. The first is space, where the domain is decomposed into regular boxes (their number is denoted as $N_{\text{batches}}$). The degree of this decomposition depends on the complexity of the problem. The second direction of training concerns the phase-fields. For each of these spatio-temporal blocks, referred to as a batch, a single WORKER PINN, referred to as PINN$_i$, is trained to locally predict the solution. Hereafter, the term "blocks" is employed interchangeably with "quarters," as the spatial domain is consistently divided into four blocks. Each PINN$_i$ handles a given phase in a given batch and, depending on the domain decomposition, interacts along boundaries for spatial continuity.
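The two training directions can be pictured with a small bookkeeping sketch: the spatial domain is split into regular boxes, and one worker record is created per (box, phase) pair. This is only an illustration under stated assumptions: a rectangular 2D domain, the same number of boxes along each axis, and a hypothetical function `decompose` with a dictionary layout invented for the example.

```python
def decompose(x_bounds, y_bounds, n_per_axis, n_phases):
    """Split a rectangle into n_per_axis**2 regular boxes and create one
    worker record per (box, phase) pair."""
    (x0, x1), (y0, y1) = x_bounds, y_bounds
    dx = (x1 - x0) / n_per_axis
    dy = (y1 - y0) / n_per_axis
    workers = []
    for phase in range(n_phases):        # second direction: phase-fields
        for j in range(n_per_axis):      # first direction: space (rows)
            for i in range(n_per_axis):  # first direction: space (columns)
                workers.append({
                    "phase": phase,
                    "batch": j * n_per_axis + i,
                    "x": (x0 + i * dx, x0 + (i + 1) * dx),
                    "y": (y0 + j * dy, y0 + (j + 1) * dy),
                })
    return workers

# Unit square, 2 x 2 boxes, three phase-fields: 4 batches x 3 phases = 12 workers.
workers = decompose((0.0, 1.0), (0.0, 1.0), n_per_axis=2, n_phases=3)
```

Workers that share a `batch` index cover the same box for different phases, while workers that share a `phase` and have adjacent boxes are the ones that would need to exchange boundary predictions for spatial continuity.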

At the beginning of each training (time) step, the batch structure is built separately for each phase-field $\alpha$, i.e., $N_{\text{batches}}$ PINNs are created and inherit the same architecture as the MASTER-PINN. During the training, several optimization techniques can be applied (based on the initial user selection), including pyramidal training, the wheel of optimizers, the basket of PINN$_i$s, and the concept of Quarters-based training. A high-level view of the global algorithm is shown in Algorithm 1, where $N_{\text{batches min}}$ and $scipy_{\text{epoch}}$ denote the minimum number of batches (the coarsest subdivision in the case of pyramidal training) and the epoch period of the scipy optimization, respectively. In the following, we describe the details of the various implementations of the PINNs-MPF framework and the related algorithm.

Input 1: Initialization
Input 2: USER inputs
Initialize the Master PINN $\leftarrow$ USER inputs
Prepare global training data $\leftarrow$ Initialization + USER inputs
for epochs in range $N_{\text{epochs}}$ (total number of epochs) do
    set $t_{min}$ and $t_{max}$    // $\rightarrow$ training subdomain bounds
    while $N_{batches} \geq N_{\text{batches min}}$ do
        initialize the multi-networks/PINNs.
        decompose the subdomain.
        $\rightarrow$ Horizontal subdivision (initialization of PINNis handling first phase).
        + Vertical extrusion (initialization of PINNis handling other phases).
        assign IC training data to each PINNi.
        $\rightarrow$ dynamically generate BC and collocation points for each PINNi.
        divide the subdomain into different training Blocks/Quarters.
        assign each PINNi to a parent Block.
        for each Block do
            for each PINNi in Block do    // Reminder: each PINNi handles one different phase
                if first phase then
                    select a candidate PINNi $\leftarrow$ highest ratio of interfacial points.
                    $\rightarrow$ put "selected PINNi" in the Basket.
                    store information about "selected PINNi" (dictionary). $\rightarrow$ to use if pyramidal training
                else
                    for each phase in current Block do
                        get $\leftarrow$ the NN with the same batch index as the "selected PINNi".
        // $\Rightarrow$ Basket of PINNis ready
        Multi-processing $\rightarrow$ start Parallel Training $\rightarrow$ parallel optimization using Adam.
        if epoch % $scipy_{\text{epoch}}$ then
            if pinn.loss $\geq$ Threshold for pinn in pinns then
                apply L-BFGS-B optimization.
            resample collocation points.
        if PINNi.loss $<$ Threshold for PINNi in PINNis then
            if $N_{batches} > N_{\text{batches min}}$ then
                reduce $N_{batches}$ $\leftarrow$ Eq. 8    // $\rightarrow$ Pyramidal training.
    increase $t_{min}$ and $t_{max}$    // $\rightarrow$ go to the next training subdomain
Algorithm 1 General training scheme for the implemented PINNs-MPF.
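The basket-building step of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the released implementation: the function name `build_basket`, the dictionary layout, and the tie-breaking rule (the first batch encountered wins) are assumptions made here.

```python
from typing import Dict, List, Tuple


def build_basket(
    interfacial_ratio: Dict[int, float],  # batch index -> ratio of interfacial points (first phase)
    block_of_batch: Dict[int, int],       # batch index -> parent Block/Quarter index
    n_phases: int,
) -> List[Tuple[int, int]]:
    """Return the (batch, phase) pairs selected for training."""
    best_batch: Dict[int, int] = {}
    for batch, ratio in interfacial_ratio.items():
        block = block_of_batch[batch]
        # Keep, per Block, the batch with the highest ratio of interfacial points.
        if block not in best_batch or ratio > interfacial_ratio[best_batch[block]]:
            best_batch[block] = batch
    # The other phases reuse the batch index chosen for the first phase.
    return [
        (best_batch[block], phase)
        for block in sorted(best_batch)
        for phase in range(n_phases)
    ]


basket = build_basket(
    interfacial_ratio={0: 0.1, 1: 0.4, 2: 0.0, 3: 0.3},
    block_of_batch={0: 0, 1: 0, 2: 1, 3: 1},
    n_phases=2,
)
```

In this toy case, batches 1 and 3 are selected as references for their respective Blocks, and both phases are trained on those batch indices.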

2.1 Discrete Resolution in Time

We treat the solution of the MPF equations as a multi-variable time-series problem handled by the NNs. This has also been discussed in previous studies [montes2021accelerating, HU2022115128, oommen2022learning, fetni2023capabilities]. The computational domain in the upper-left panel of Figure 1(a) depicts this idea. Here the training process occurs in spatial domains/subdomains at fixed time intervals, analogous to time steps in conventional modeling methods, but with relatively larger intervals to harness the robust linearization capabilities of deep learning (DL). Training proceeds to the subsequent time interval only once a predefined global loss threshold is reached. This discrete resolution ensures a quasi-constant resource consumption, hence preventing any accumulation of computational load along the time axis. To draw an analogy with the discrete resolution in MPF modeling, the notation $\Delta t$ (also denoted $\Delta t^{*}$ for dimensionless resolution) is adopted here for both PINNs and MPF resolutions; in the context of PINNs, this notation refers to the length of the time interval. Within the PINN literature, this approach places our framework in the class of time-marching or temporal PINNs, as discussed in the causality literature [PENWARDEN2023112464, CHEN2024111423, WANG2024116813].
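The time-marching logic described above can be illustrated with the following minimal Python sketch. It assumes a hypothetical callable `train_one_epoch(t_min, t_max)` that returns the current global loss; the epoch cap and the toy trainer are placeholders introduced here for illustration only.

```python
from typing import Callable, List, Tuple


def march_in_time(
    t_start: float,
    t_end: float,
    dt: float,
    train_one_epoch: Callable[[float, float], float],
    loss_threshold: float,
    max_epochs_per_interval: int = 1000,
) -> List[Tuple[float, float, int]]:
    """Train interval by interval; return (t_min, t_max, epochs_used) for each."""
    history = []
    n_intervals = round((t_end - t_start) / dt)
    for k in range(n_intervals):
        t_min = t_start + k * dt
        t_max = t_min + dt
        epochs_used = 0
        for epoch in range(1, max_epochs_per_interval + 1):
            epochs_used = epoch
            # Move on only once the global loss drops below the threshold.
            if train_one_epoch(t_min, t_max) < loss_threshold:
                break
        history.append((t_min, t_max, epochs_used))
    return history


def make_toy_trainer() -> Callable[[float, float], float]:
    """Toy stand-in for training: the loss restarts at 1 and halves at each call."""
    state = {"t_min": None, "loss": 1.0}

    def step(t_min: float, t_max: float) -> float:
        if state["t_min"] != t_min:  # a new interval starts with a high loss
            state["t_min"] = t_min
            state["loss"] = 1.0
        state["loss"] *= 0.5
        return state["loss"]

    return step


history = march_in_time(0.0, 3.0, 1.0, make_toy_trainer(), loss_threshold=0.1)
```

Because only one interval is trained at a time, the memory footprint in this sketch does not grow with the number of intervals, which mirrors the quasi-constant resource consumption noted above.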

2.2 Extended domain decomposition

This technique extends the domain decomposition approach introduced in previous PINN studies [Jagtap2020, SHUKLA2021110683, MENG2020113250] by adding a third dimension of decomposition and parallelization along the phases direction. This extension aims to handle the increased complexity of the target problem more effectively. A centralized network ensures synchronization among the multi-networks, facilitating a coordinated learning process across the decomposed spatial, temporal and phase domains. Each phase, managed by a corresponding Neural Network, requires concurrent access to the interaction terms ($I_{\gamma}$) of other phases (Eq. LABEL:eq:Inter_term), which collectively form dual or higher-order junctions. These interaction terms demand a synchronization method (Algorithm 2). Consequently, parallel implementation is crucial for the precise calculation of interaction terms, as per the NN definitions. A serial approach cannot adequately manage these interactions, so parallelism is indispensable here rather than merely an improvement to be quantitatively compared against serial execution or existing methods, if available.

One challenging aspect of a PINNs-MPF is the heavy computational cost associated with the size of the simulation domain and the number of phase-fields. Here the concept of multi-networking is essential to distribute this computational load efficiently. This involves the parallelization of multiple PINNis and their associated subdomains obtained through a structured domain decomposition. In this context, a MASTER-PINN is designed to distribute tasks and training data, and to collate the learned outcomes from the WORKER-PINNis/PINNis. Each PINNi operates independently and concurrently, utilizing a dedicated thread in the computer. It functions as an autonomous entity responsible for processing a specific batch within the spatial domain and managing a given phase-field within the MPF domain. All PINNis are assumed to have the same architecture and settings (number of layers and neurons per layer, learning rate, etc.). This facilitates complete parallelization of the training process and ensures spatial continuity throughout the training.

As depicted in the flowchart in Figure 1, the MASTER-PINN does not undergo any training but performs several central tasks, including (i) preparing the training data set, (ii) initializing and assigning each WORKER PINNi for training, (iii) initiating and controlling the training process, (iv) facilitating communication among the WORKER PINNs, (v) managing the temporal transition of the training, and (vi) transferring the (ideally) successful learning from the actual training to another simulation. From a technical point of view, it is noteworthy that the current multi-networking approach enables the transfer of learning from multiple networks to a single network. This capability is available within the code through the pyramidal training option, wherein the subdivision can be progressively transformed from fine to coarse, ideally converging to a single network (cf. Section 2.3).

The communication between PINNis can be classified into two categories: a spatial ‘horizontal’ communication, which is required to ensure spatial continuity across the boundary between their subdomains (cf. Section 2.5), and a phase-field ‘vertical’ communication that allows the coupled co-evolution of the phase-fields in the context of the MPF model (cf. Section 2.6). To facilitate the training of the WORKER-PINNis and their efficient management by the MASTER-PINN, the concept of the trinity batch-PINNi-phase is crucial. Each PINNi is assigned a unique identifier by associating it with a specific batch number and phase-field number, as illustrated in Figure 1(a). Consequently, the local neural network takes as input horizontally stacked spatial-temporal coordinates $(x, y, t)$, and its output (prediction) is a single array representing the fraction of the corresponding phase, ranging from 0 to 1. Simplifying the multi-phase problem into discrete single-phase problems in this manner streamlines the computational approach. It also necessitates efficient interactions between the phases, which are treated through the ‘vertical’ communication. Finally, it is preferred to spatially decompose the simulation into $N_{\text{batches}}$ equal batches.

2.3 Pyramidal Training and Block Training

Merging the hyper-parameters of various PINNis (WORKERS) at once and with equal significance can result in overwriting certain features in the domain. To avoid this, we propose a gradual pyramidal training and merging. The concept of pyramidal training draws inspiration from the data augmentation principle in ML [HUTER2020109488, CARPENTER2002183]. It is based on the assumption that adding more data to a pre-trained model serves as a warm starting point, enabling the NN to better capture crucial features before further data processing. We propose to implement a pyramidal training paradigm in conjunction with another concept of ‘training on blocks’. These two complementary concepts are showcased in Figure 1(b) and described as follows: Once the spatial domain is subdivided into multiple batches, each one can be further divided into distinct domains, giving a pyramidal structure. Each of these finer domains in the pyramidal structure is called a block or a quarter. The procedure of subdivision can continue as depicted in Figure 1(b). After the pyramidal subdivision is completed, the training process starts at the finest subdivision level (i.e., the base of the pyramid). However, instead of training all Neural Networks (NNs) within all batches, within each block/quarter only the PINNi with the highest number of interfacial points is selected for training; this selection concerns the first phase, and the selected PINNis serve as a reference for other phases. Similarly, for the other phases (in the vertical direction), training is assigned to the PINNi within the same batch as the reference PINNi within each block/quarter. This selective handling reduces the computational costs and focuses the training on hot spots in the simulation domain. Once target losses are reached for the selected PINNis, the training stops at the finer level and transitions to a coarser level using the same algorithm described above. 
The training on the coarser level begins with a warm-start approach, where each selected PINNi receives learning from the previously selected PINNi at the finer level, within the same quarter and sharing a spatial intersection area, thereby achieving a data augmentation-like training. The transition from one level of the pyramid to another is set by a squared relationship to ensure a pyramid-like subdivision:

\[
N_{\text{batches coarser}} = \left(\sqrt{N_{\text{batches}}} - 2\right)^{2} \tag{8}
\]

where $N_{\text{batches coarser}}$ is the number of batches in the next upper level. The reduction continues until $N_{batches} = N_{\text{batches min}}$ is reached, at which point the training of the PINNis continues on the whole spatial domain. This gradual resolution has two main advantages: On the one hand, it allows a fast identification of the optimal set of hyper-parameters, and therefore key features, without the need for an adaptive weighting strategy. On the other hand, it guarantees the training of the same number of NNs independently of the number of subdivisions ($N_{batches}$). It is therefore recommended to activate this strategy in the early stages of the training and then switch it off to speed up the training.
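For illustration, Eq. (8) can be iterated to list the levels of the pyramid, as in the following minimal Python sketch. The function name `pyramid_levels`, the perfect-square check, and the stopping rule when the next level would fall below the minimum are assumptions made here.

```python
import math
from typing import List


def pyramid_levels(n_batches: int, n_batches_min: int) -> List[int]:
    """List the batch counts visited from the finest level down, following Eq. (8)."""
    levels = [n_batches]
    while levels[-1] > n_batches_min:
        side = math.isqrt(levels[-1])
        if side * side != levels[-1]:
            raise ValueError("the number of batches must be a perfect square")
        coarser = (side - 2) ** 2  # Eq. (8)
        # Stop if the next level would undershoot the requested minimum
        # (e.g. an even grid side cannot reach a single batch).
        if coarser < max(n_batches_min, 1):
            break
        levels.append(coarser)
    return levels


levels = pyramid_levels(64, 4)
```

Starting from an 8 x 8 decomposition with a minimum of 4 batches, the sketch visits 64, 36, 16 and 4 batches. Note that under Eq. (8) a single network is reached only from an odd grid side (for example 25, 9, 1).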

2.4 Adaptive mesh-free optimization

The Adaptive mesh-free optimization (AMFO) is here introduced as a PINN implementation extending the adaptive mesh refinement (AMR) concept, from the phase-field literature, to dynamically adjust collocation point distributions. AMR variants have indeed proven valuable for phase-field and MPF models, allowing efficient capture of evolving interfaces and steep gradients. By incorporating adaptive collocation point sampling, this approach leverages the mesh-free nature of PINNs while retaining AMR benefits like improved accuracy and robustness [LI20127926, GUPTA2022115347, XU2022108891, FREDDI2023100127, bijaya2023multilevel]. The ease of integrating adaptive sampling with PINNs makes this approach attractive for tackling challenges posed by phase-field models, such as complex geometries and localized high-gradient regions. The proposed technique employs adaptive collocation point sampling to concentrate points around interfacial regions, a denoising loss function to reduce noise in non-interfacial areas, and dynamic resampling to continuously adapt the overall collocation point distribution.

2.4.1 Adaptive re-meshing

In the presence of diffuse interfaces, continuity among the phase-fields and across the spatial domains is required. Inspired by the storage concept proposed for OpenPhase [darvishikamachali2013grainPhD], we adopt a mesh structure that focuses mainly on the interfacial regions, neglecting interior grain zones where phase-fields are equal to 0 or 1. This strategy not only reduces the computational costs but also reduces the tendency for errors when enforcing continuity across the domains/batches. This is further promoted by the high fitting and extrapolation capacity of ML, which we elaborate on later in the Discussion (Section 5). To this end, some key hyper-parameters are predefined by the user as follows. First, the minimum and maximum number of IC points per batch, $N_{\text{ini min}}$ and $N_{\text{ini max}}$, are defined. Once the ratios of 0 and 1 phase-values per batch are given, the number of IC points per batch $N_{\text{ini per batch}}$ is set, following:

\[
N_{\text{ini per batch}} = \min\left(\text{int}\left(\frac{N_{\text{ini max}} - N_{\text{ini min}}}{1 + e^{-\xi}} + N_{\text{ini min}}\right),\ N_{\text{ini max}}\right), \tag{9}
\]

in which $\xi$ denotes the percentage of interfacial points per batch. The sigmoid-like function in Eq. (9) allows gradual densification of the mesh depending on the importance of the zone, so that the diffuse interfacial area and triple junctions in a multi-phase configuration are allocated the higher number of IC points. Using Eq. (9), the collocation points are generated dynamically around the IC points (typically within a circle of radius $\eta/2$) through a predefined ratio to get $N_{f} = C_{f} \cdot N_{\text{ini}}$, where $N_{f}$ is the number of collocation points per batch and $C_{f}$ is a user parameter that can be adjusted depending on the nature of the problem. In our benchmark studies below, we varied this coefficient between 20 and 40. Another potential improvement, to reduce the hyper-parameters in Eq. (9) and to increase the weight given to triple junctions, is to define the density of interfacial points dynamically. This would involve computing the percentage of interfacial points per batch dynamically, thereby determining the number of interfacial points per batch $N_{\text{ini per batch}}$ as follows:

\[
N_{\text{ini per batch}} = \left[\frac{|x_{\text{min}} - x_{\text{max}}| \cdot |y_{\text{min}} - y_{\text{max}}|}{\eta^{2}}\right] \cdot \xi \cdot \chi \tag{10}
\]

where $x_{\text{min}}, x_{\text{max}}, y_{\text{min}}, y_{\text{max}}$ represent the limits of each PINNi. Recall that $\xi$ corresponds to the associated percentage of interfacial points and that $\eta$ denotes the interfacial width; both parameters are already available within the code. Additionally, $\chi$ is an introduced hyper-parameter (user input) representing the local density of the mesh, indicating how much the mesh should be densified based on the number of interfacial points inside. With this improvement, the geometric hyper-parameters are reduced to two variables: the number of subdivisions ($N_{\text{batches}}$) and the local density ($\chi$), considering that the $C_{f}$ coefficient could be set at 20 for all numerical experiments. This simplification allows focusing on the architecture of the neural networks. The PINNis sharing the same batch index while handling different phases should use common collocation data (common batches). This introduces an additional stage of optimization. Further details on this can be found in the Supplementary Material (SM), section C.
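The two point-count rules, Eqs. (9) and (10), can be written compactly as below. This is a sketch under stated assumptions: the function names are hypothetical, the square brackets in Eq. (10) are read as plain grouping followed by truncation to an integer, and the scale of $\xi$ (fraction or percent) is left to the caller.

```python
import math


def n_ini_sigmoid(xi: float, n_ini_min: int, n_ini_max: int) -> int:
    """Eq. (9): sigmoid-like interpolation, capped at n_ini_max."""
    value = (n_ini_max - n_ini_min) / (1.0 + math.exp(-xi)) + n_ini_min
    return min(int(value), n_ini_max)


def n_ini_density(
    x_min: float, x_max: float, y_min: float, y_max: float,
    eta: float, xi: float, chi: float,
) -> int:
    """Eq. (10): batch area in units of eta^2, scaled by xi and chi."""
    area_in_cells = abs(x_min - x_max) * abs(y_min - y_max) / eta**2
    return int(area_in_cells * xi * chi)  # truncation is an assumption


def n_collocation(n_ini: int, c_f: int = 20) -> int:
    """N_f = C_f * N_ini, with C_f = 20 as suggested in the text."""
    return c_f * n_ini


n_ini = n_ini_sigmoid(xi=0.0, n_ini_min=100, n_ini_max=500)
n_f = n_collocation(n_ini)
```

With $\xi = 0$ the sigmoid evaluates to 1/2, so Eq. (9) returns the midpoint between the two bounds; for large $\xi$ it saturates at $N_{\text{ini max}}$.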

2.4.2 Denoising Loss

Focusing on the diffuse interfacial zones gives an efficient and precise approximation of the MPF solution, but still does not exclude random prediction away from the interfaces. To address this issue, a denoising loss, functioning as a corrector, is introduced. A batch is typically made of three distinct regions, A, B, and C, where region A corresponds to the interface, region B is the grain-containing area, and region C encompasses areas far from the interface (no grain); see Figure 2. By dynamically identifying areas of interest, as depicted in Figure 2, the additional loss term is applied within unlabeled regions that are free of collocation points, such that each PINNi operates to minimize the prediction of phase-field values in regions identified as ’no grain’ while maximizing predictions in the grain-containing regions. Eq. (11) describes this loss, where the Mean Squared Error (MSE) is calculated based on the prediction $\phi_{\text{pred}}$ and is conditioned on whether the region is identified as ’no grain’ or ’grain’, denoted by $\text{flag}_{\text{no grain}}$ and $\text{flag}_{\text{grain}}$, respectively:

\[
\text{loss}_{\text{denoising}} =
\begin{cases}
\text{MSE}(\phi_{\text{pred}}, 0), & \text{where no grain } (\text{flag}_{\text{no grain}} = 0) \\
+\ \text{MSE}(\phi_{\text{pred}}, 1), & \text{where grain } (\text{flag}_{\text{grain}} = 1)
\end{cases}
\tag{11}
\]
Figure 2: Illustration of the denoising loss, which is computed in the identified areas B and C, far from the interface.
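A minimal Python sketch of the denoising loss of Eq. (11) is given below. The region labels "A", "B" and "C" follow Figure 2; encoding them as strings, and returning zero for an empty region, are assumptions made for this illustration.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: float) -> float:
    """Mean squared error against a constant target (0.0 for an empty region)."""
    if not pred:
        return 0.0
    return sum((p - target) ** 2 for p in pred) / len(pred)


def denoising_loss(phi_pred: Sequence[float], region: Sequence[str]) -> float:
    """Eq. (11): push predictions to 1 in grain areas (B) and to 0 in no-grain areas (C).

    Points labeled "A" (interface) do not contribute to this loss.
    """
    grain = [p for p, r in zip(phi_pred, region) if r == "B"]
    no_grain = [p for p, r in zip(phi_pred, region) if r == "C"]
    return mse(no_grain, 0.0) + mse(grain, 1.0)


loss = denoising_loss([1.0, 0.0, 0.5, 0.5], ["B", "C", "A", "C"])
```

Here the only penalized value is the spurious prediction of 0.5 in a no-grain region; the interface point with the same value is left to the PDE and IC losses.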

2.4.3 Dynamic Resampling

Resampling, applied after each SciPy optimization, is a dynamic process that benefits both converged and non-converged PINNis. The collocation points are resampled automatically. This serves a dual purpose. Converged PINNis are tested against new but relatively similar populations, which safeguards against overfitting and ensures the robustness of the predictions, as confirmed by previous PINN studies [WU2023115671, raissiphysicsinformed2019, JAGTAP2020109136, li2022dynamic, daw2023mitigating]. Simultaneously, the non-converged PINNis are exposed to new, unexplored samples, which enriches their training and gives them a better chance of convergence.
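As an illustration, resampling can reuse the disc-based generation of Section 2.4.1: each call draws $C_{f}$ fresh collocation points within a radius $\eta/2$ of every IC point. In the sketch below, uniform sampling over the disc and over the time interval, as well as the function name and signature, are assumptions made here.

```python
import math
import random
from typing import List, Sequence, Tuple


def resample_collocation(
    ic_points: Sequence[Tuple[float, float]],
    eta: float,
    c_f: int,
    t_min: float,
    t_max: float,
    rng: random.Random,
) -> List[Tuple[float, float, float]]:
    """Draw c_f points (x, y, t) around each IC point, within a disc of radius eta / 2."""
    radius = eta / 2.0
    points = []
    for x0, y0 in ic_points:
        for _ in range(c_f):
            # The square root makes the samples uniform over the disc area.
            r = radius * math.sqrt(rng.random())
            theta = 2.0 * math.pi * rng.random()
            t = rng.uniform(t_min, t_max)
            points.append((x0 + r * math.cos(theta), y0 + r * math.sin(theta), t))
    return points


rng = random.Random(0)
ic = [(0.0, 0.0), (1.0, 1.0)]
first = resample_collocation(ic, eta=0.2, c_f=20, t_min=0.0, t_max=1.0, rng=rng)
second = resample_collocation(ic, eta=0.2, c_f=20, t_min=0.0, t_max=1.0, rng=rng)
```

Two successive calls return different populations with the same size ($N_{f} = C_{f} \cdot N_{\text{ini}}$) and the same spatial support, which is the property exploited to test converged PINNis.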

2.5 Propagation of BC across Sub-domains

As shown in Figure 1(b), the propagation of BC along the neighbors in our decomposed space is required to ensure continuity and therefore a reliable prediction of the MPF solution. To this end, a customized BC loss is introduced that concerns the spatial internal boundaries across subdomains/batches/PINNis handling a given phase-field. Here, each PINNi optimizes its weights by identifying its neighbors while considering the boundary predictions of its neighbors that are used to minimize the additional BC loss. The BC loss function is set as the sum of differences between the prediction of the current PINNi and those of the neighbors, depicted also in Figure 1b. A detail that enhances the handling of the boundary condition is to consider the PINN00xx (in batch 0 for each phase) as an absolute reference after addressing its PDE and IC losses. Subsequently, for a given PINNi, the west and inner neighbors are regarded as temporal references. This leads to a systematic propagation of the boundary conditions, ensuring that all PINNi instances update their weights based on the PINN00xx. These interactions create mutual dependencies among neighboring domains/batches/PINNis, fostering spatial-temporal continuity during the training process.
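A possible form of the internal BC loss is sketched below. The text defines it as a sum of differences between the current PINNi and its neighbors; the use of squared differences averaged per shared edge, the edge names, and the dictionary layout are assumptions made here, not the released implementation.

```python
from typing import Dict, Sequence


def bc_loss(
    own_boundary_pred: Dict[str, Sequence[float]],
    neighbor_boundary_pred: Dict[str, Sequence[float]],
) -> float:
    """Sum, over shared edges, of the mean squared mismatch with the neighbor's prediction."""
    total = 0.0
    for edge, neighbor_values in neighbor_boundary_pred.items():
        own_values = own_boundary_pred[edge]
        mismatch = sum((o - n) ** 2 for o, n in zip(own_values, neighbor_values))
        total += mismatch / len(own_values)
    return total


# Only the west edge has a neighbor acting as a reference in this toy case.
loss_bc = bc_loss(
    own_boundary_pred={"west": [0.5, 0.5], "south": [1.0]},
    neighbor_boundary_pred={"west": [0.5, 0.7]},
)
```

Edges without a reference neighbor simply do not contribute, which is one way to express that references propagate outward from a single starting batch.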

2.6 Handling Multi-Phases (Vertical Optimization)

The fully discrete construction of the current framework in space, time and phase-fields allows instant communication between PINNis, through the MASTER PINN, which is required to perform the multi-phase studies. This is achieved in two steps. In the first step, one must identify the possible interaction(s) between any two phase-fields. This must be done for every PINNi. With every interaction found, an associated PINNi for that phase-field is identified as well. A fundamental assumption here is that two given phase-fields interact when their diffuse interfaces intersect. This results in a batch-phase space. In 2D, a maximum of three phase-fields intersect, forming a triple junction. An example of this idea is illustrated in Figure 3, where we find that out of the 48 possible batch-phase combinations, 8 exhibited no interactions. This approach has a significant impact when handling a large number of phase-fields. Hereafter, the term ’grain’ is also used interchangeably with ’phase’, since both reflect the same entity.

Figure 3: Illustration of the interaction between different phases in different batches for a 2D triple junction case: each phase, in each batch interacts with up to three phases.
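The interaction criterion (two phase-fields interact when their diffuse interfaces intersect) can be sketched as follows for a single batch. The tolerance used to decide whether a value lies strictly between 0 and 1, and the function names, are assumptions introduced for this illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def interface_mask(phi: Sequence[float], tol: float = 1e-3) -> List[bool]:
    """True where the phase-field lies inside its diffuse interface (0 < phi < 1)."""
    return [tol < value < 1.0 - tol for value in phi]


def interactions_in_batch(
    phase_fields: Dict[int, Sequence[float]], tol: float = 1e-3
) -> Dict[int, Set[int]]:
    """Map each phase to the phases whose interface overlaps its own in this batch."""
    masks = {phase: interface_mask(phi, tol) for phase, phi in phase_fields.items()}
    partners: Dict[int, Set[int]] = {phase: set() for phase in phase_fields}
    for a, b in combinations(phase_fields, 2):
        if any(in_a and in_b for in_a, in_b in zip(masks[a], masks[b])):
            partners[a].add(b)
            partners[b].add(a)
    return partners


# Three phases sampled at the same four points of one batch.
partners = interactions_in_batch({
    0: [1.0, 0.6, 0.0, 0.0],
    1: [0.0, 0.4, 0.5, 0.0],
    2: [0.0, 0.0, 0.5, 1.0],
})
```

In this toy batch, phases 0 and 2 never share an interfacial point, so no interaction term needs to be exchanged between them.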

The MASTER-WORKER PINNi structure enables a computationally efficient and memory-efficient calculation of the Laplacian term in Eq. (1). Each PINNi responsible for a specific phase-field communicates its computed Laplacian term instantly to the neighboring PINNis working on the same batch but handling different phase-fields. This avoids the computation of multiple Laplacian terms by each PINNi and enhances the computational efficiency. The numerical implementation of the MPF model (Eq. (1)) is simplified into the parallel computation of the interaction terms for each phase within each batch. Subsequently, a robust communication protocol ensures synchronization of the computation for the correct equation terms, as detailed in Algorithm 2. To ensure synchronization, a given PINNi needs to wait until it has received all the required information from the PINNis with which it interacts. As a reminder (cf. Section 2.4), an optimization detail involves the use of a shared collocation dataset for all PINNis handling different phases within the same batch. For simplification, the collocation dataset corresponding to the first phase in the initialization (phase $\alpha$ in Algorithms 1 and 2) is consistently selected as a reference.

Input 1: PDE equation $\leftarrow$ Eq. 1
Input 2: Interaction infos (dictionary)
if phase 1 then
    get $\leftarrow$ self $batch_{Xf}$
else
    identify the other PINNis in interaction with self.
    get $batch_{Xf}$ of the PINNi handling the phase 1.
compute $\dot{\phi}_{\alpha}$
compute self interaction term ($I_{\alpha}$) $\leftarrow$ Eq. LABEL:eq:Inter_term
for PINNi$_{\beta}$ in PINNis in interaction do    // $\alpha \neq \beta$
    while $I_{\beta}$ is not updated do
        wait
    for PINNi$_{\gamma}$ in PINNis in interaction do    // $\gamma \neq \alpha, \beta$
        while $I_{\gamma}$ is not updated do
            wait
    All interaction terms are computed
    compute the right side of Eq. 1
    compute the PDE loss for the actual batch $\leftarrow$ Eq. 4
    correct the phase predictions; $\phi_{\alpha} \leftarrow \frac{\phi_{\alpha}}{\sum_{i=1}^{N} \phi_{i}}$
    compute the sum constraint loss for the entire grid at $t = t_{max}$ $\leftarrow$ Eq. 7
    return the total loss
Algorithm 2 The vertical optimization along phases to optimize the computing of the PDE loss for a given PINNi (predicting a given phase ($\alpha$) in a given batch). Here "self" denotes the considered PINNi while $batch_{Xf}$ represents the associated dataset of collocation points. The dynamic correction of the phase summations is done within this block.
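The wait-until-updated pattern of Algorithm 2 can be illustrated with Python threads and events. This is only a sketch of the synchronization: the toy interaction term (twice the phase value) and the pairwise combination $\sum_{\beta}(I_{\alpha} - I_{\beta})$ are placeholders chosen here and are not Eq. (1); threads stand in for whatever parallel backend is actually used.

```python
import threading
from typing import Dict, Set


def run_vertical_sync(
    phi: Dict[int, float], interactions: Dict[int, Set[int]]
) -> Dict[int, float]:
    """Each worker publishes its own term, then waits for the terms of its partners."""
    terms: Dict[int, float] = {}
    combined: Dict[int, float] = {}
    ready = {phase: threading.Event() for phase in phi}
    lock = threading.Lock()

    def worker(alpha: int) -> None:
        own_term = 2.0 * phi[alpha]  # toy stand-in for I_alpha
        with lock:
            terms[alpha] = own_term
        ready[alpha].set()  # publish before waiting, so no worker can deadlock
        total = 0.0
        for beta in interactions[alpha]:
            ready[beta].wait()  # "while I_beta is not updated: wait"
            total += own_term - terms[beta]
        with lock:
            combined[alpha] = total

    threads = [threading.Thread(target=worker, args=(phase,)) for phase in phi]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return combined


result = run_vertical_sync(
    phi={0: 0.5, 1: 0.3, 2: 0.2},
    interactions={0: {1, 2}, 1: {0, 2}, 2: {0, 1}},
)
```

Publishing the own term before waiting on the partners is the design choice that keeps the exchange free of circular waits, whatever the order in which the workers start.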

2.7 The Wheel of Optimizers

The framework combines Adam’s stochastic optimization with SciPy’s L-BFGS in a collaborative scheme. Adam operates at each epoch, while the SciPy L-BFGS optimizer is invoked periodically. During these invocations, L-BFGS exploits curvature information through a limited-memory approximation of the inverse Hessian to optimize the weights. When a given PINNi converges, i.e., reaches a loss value below the predetermined threshold, it temporarily steps back and enters a wait state. Meanwhile, other PINNis, selected from a shared ’basket of PINNis,’ proceed with their L-BFGS optimization. This metaphorical basket illustrates the practice of choosing specific PINNs for weight optimization and then reintegrating them in a cyclic process; hence the terminology ’wheel of optimizers’. The purpose of this cyclic process is to ensure that the optimized learning results in synchronized motion between batches, leading to accurate predictions at boundaries, and effectively captures the dynamics across phases, including terms from the PDE.
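The scheduling logic of the wheel can be sketched without any optimizer library, as below. The sketch only decides which optimizer acts on which PINNi at a given epoch; the function name `plan_epoch`, the string labels, and the reading of the periodic condition as `epoch % scipy_epoch == 0` are assumptions made here.

```python
from typing import Dict, List


def plan_epoch(
    losses: Dict[str, float], threshold: float, epoch: int, scipy_epoch: int
) -> Dict[str, List[str]]:
    """Return, for each PINNi, the optimizer steps applied at this epoch."""
    lbfgs_turn = epoch > 0 and epoch % scipy_epoch == 0
    plan = {}
    for name, loss in losses.items():
        steps = ["adam"]  # Adam operates at each epoch
        if lbfgs_turn:
            # Converged PINNis wait; the others receive an L-BFGS-B pass.
            steps.append("l-bfgs-b" if loss >= threshold else "wait")
        plan[name] = steps
    return plan


losses = {"pinn_0": 0.5, "pinn_1": 0.01}
regular = plan_epoch(losses, threshold=0.1, epoch=7, scipy_epoch=10)
periodic = plan_epoch(losses, threshold=0.1, epoch=20, scipy_epoch=10)
```

At a regular epoch both networks take an Adam step only; at a periodic epoch the non-converged `pinn_0` additionally receives an L-BFGS-B pass while `pinn_1` waits.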

2.8 Transfer of Learning

The highly symmetric and parallel construction of the current implementation enables the transfer of learning in multiple directions. Within the same simulation, the learning is transferred spatially through the pyramidal training (between the levels of the pyramid) and across boundary conditions. The transfer also occurs temporally, from one spatial domain to its image in the next time step. This makes it possible to create checkpoints and to continue or restart a simulation without losing the accumulated learning. A particular application of such transfer is to optimize the parallel performance of the worker PINNis. When a worker PINNi fails to converge after a certain number of SciPy optimizations, it is warm-started by receiving the learning from the nearest neighbor that has already converged, which enhances its convergence rate and overall performance. Such a failure can arise from various factors, including insufficient data or training samples, numerical instabilities, singularities or discontinuities in the data, and inadequate regularization. Additionally, the framework allows real-time processing, facilitating easy debugging and dynamic adaptability.
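The warm start of a stalled worker can be illustrated with the short sketch below. It assumes, purely for illustration, that each batch is identified by the coordinates of its center, that "nearest" means Euclidean distance between centers, and that a worker is considered stalled after `max_failures` unsuccessful L-BFGS rounds; the function names and data layout are hypothetical.

```python
import numpy as np


def nearest_converged(idx, centers, converged):
    """Index of the converged batch closest to batch `idx`, or None if there is none."""
    candidates = [j for j, ok in enumerate(converged) if ok and j != idx]
    if not candidates:
        return None
    here = np.asarray(centers[idx], dtype=float)
    dist = [np.linalg.norm(here - np.asarray(centers[j], dtype=float)) for j in candidates]
    return candidates[int(np.argmin(dist))]


def warm_start(weights, centers, converged, failures, max_failures=3):
    """Copy the weights of the nearest converged worker into each stalled worker."""
    for i in range(len(weights)):
        if converged[i] or failures[i] < max_failures:
            continue
        j = nearest_converged(i, centers, converged)
        if j is not None:
            weights[i] = [w.copy() for w in weights[j]]  # independent copy
            failures[i] = 0  # give the warm-started worker a fresh budget
    return weights
```

Copying the arrays (rather than sharing them) keeps the donor network untouched while the receiver continues its own training.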

2.9 Phase correction

As a requirement for handling MPF problems, a dynamic correction mechanism for the phase predictions is introduced. This algorithm conserves the sum of the phase fields of the different phases within interfacial regions. PINNs often fail to predict correct solutions on their own, so that correction algorithms are needed to satisfy conservation laws [HUANG2021103727]. Incorporating conservative correction algorithms and adequate optimization methods is therefore essential for accurate predictions [Zheng2022].
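A minimal sketch of such a correction is given below, following the normalization step of Algorithm 2 ($\phi_{\alpha}\leftarrow\phi_{\alpha}/\sum_{i}\phi_{i}$). The array layout (phases along the first axis), the small guard `eps`, and the mean-squared form of the sum-constraint residual are assumptions made for illustration; the exact loss is the one defined in Eq. 7.

```python
import numpy as np


def correct_phases(phi, eps=1e-12):
    """Rescale the predictions so that the phase fields sum to one at every point.

    phi has shape (n_phases, ...): one predicted field per phase.
    """
    total = phi.sum(axis=0, keepdims=True)
    return phi / np.maximum(total, eps)  # eps avoids division by zero


def sum_constraint_residual(phi):
    """Mean squared deviation of the phase sum from unity (assumed loss form)."""
    return float(np.mean((phi.sum(axis=0) - 1.0) ** 2))


# Usage: four raw phase predictions on an 8 x 8 grid that do not sum to one.
rng = np.random.default_rng(0)
raw = rng.uniform(0.1, 1.0, size=(4, 8, 8))
corrected = correct_phases(raw)
```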

3 Methods

4 Results

4.1 Simulation Results

Analogous to the MPF approach, the reliability and performance of a PINNs-MPF for studying the interface dynamics in polycrystalline materials need to be benchmarked capturing three basic aspects: First, the curvature-driven motion of an interface, second, the correct motion of the interface under any additional driving force and third, the force balance at the junctions of interfaces [darvishikamachali2013grainPhD]. The first two requirements in this set can be generally written as:

\begin{equation}
v_{n}=\frac{\dot{\phi}}{\nabla\phi}=M\left(\sigma\kappa+\Delta g\right), \tag{12}
\end{equation}

with $v_{n}$ the local interface velocity normal to its plane, $\kappa$ the local curvature of the interface, and $\Delta g$ a constant driving force, where we consider $\Delta G=\nabla\phi\,\Delta g$ in Eq. (1). The third requirement, concerning interface junctions, requires the MPF to fulfill Young's law, which relates the angles between interfaces at the junction to the interfacial energies. For a triple junction with isotropic interface energy, the three interfaces meet at equilibrium at angles of $120^{\circ}$. To test the PINNs-MPF framework, simulation scenarios related to the above requirements are studied as listed in Table 1 along with the related physical parameters; the hyper-parameters applied in the PINNs implementation are given in Table 2. We note that for the grain shrinkage scenarios, the number of IC points per batch $N_{\text{ini per batch}}$ serves as an input parameter, and Equation 9 is used for its generation (with $N_{\text{ini min}}$ and $N_{\text{ini max}}$ varied up to 90). For the triple junction scenario, however, this process is automated through Equation 10, where $\chi$ is set to 40, in order to dynamically populate the batches during training depending on the evolution of the solution.
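As a worked illustration of the third requirement, the sketch below evaluates the equilibrium angles at a triple junction from the balance of the three interfacial tensions (Young's law, written here in its law-of-cosines form). The function name and the numbering of grains and interfaces are illustrative assumptions; for equal interfacial energies the three angles come out as $120^{\circ}$.

```python
import math


def junction_angles(s12, s23, s13):
    """Equilibrium angles (degrees) inside grains 1, 2 and 3 at a triple junction.

    s_ij is the energy of the interface between grains i and j. The angle inside
    a grain is the angle between the two interfaces bounding it; balancing the
    three tension vectors gives
        cos(theta) = (s_opp^2 - s_a^2 - s_b^2) / (2 s_a s_b),
    with s_opp the energy of the interface opposite to the grain.
    """
    def angle(opposite, a, b):
        c = (opposite ** 2 - a ** 2 - b ** 2) / (2.0 * a * b)
        if not -1.0 <= c <= 1.0:
            raise ValueError("no equilibrium junction for these energies")
        return math.degrees(math.acos(c))

    theta1 = angle(s23, s12, s13)  # grain 1 is bounded by interfaces 12 and 13
    theta2 = angle(s13, s12, s23)  # grain 2 is bounded by interfaces 12 and 23
    theta3 = angle(s12, s13, s23)  # grain 3 is bounded by interfaces 13 and 23
    return theta1, theta2, theta3


isotropic = junction_angles(1.0, 1.0, 1.0)
```

With unequal energies the angles differ from each other, but they still sum to $360^{\circ}$ and satisfy $\sigma_{23}/\sin\theta_{1}=\sigma_{13}/\sin\theta_{2}=\sigma_{12}/\sin\theta_{3}$.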

Table 1: Numerical and physical parameters of the testing benchmarks. The spatio-temporal parameters (Nx, dx, Ny, dy, Nt) are dimensionless. The units of the interface width, mobility, and interfacial energy are m, $m^{4}J^{-1}s^{-1}$, and $Jm^{-2}$, respectively.
\begin{tabular}{lccccccccc}
\hline
Benchmark & Nx & dx & Ny & dy & Nt & Interface width & Mobility & Interfacial energy & Phases \\
\hline
Traveling wave interface & 100 & 0.01 & 100 & 0.01 & 1000 & 10dx & 1e-4 & 1 & 1 \\
Grain shrinkage under driving force & 64 & .. & 64 & .. & 100 & 7dx & 1e-4 & 1 & 1 \\
Curvature-driven grain shrinkage & 64 & .. & 64 & .. & 150 & 4-9dx & 1e-4 & 1 & 1 \\
Triple junction & 64 & .. & 64 & .. & up to 200 & 6dx & 1e-4 & 1 & 4 \\
\hline
\end{tabular}
Table 2: Hyper-parameters of the implemented PINN for the different testing benchmarks. HL: hidden layers, LR: learning rate, Per.: periodicity (epochs).
\begin{tabular}{lccccccc}
\hline
Benchmark & NNs & HL/NN & Nodes/layer & Adam LR & Activation function & L-BFGS max iter & L-BFGS Per. \\
\hline
Traveling wave interface & 1 & 6 & 32 & 1e-3 & tanh + sigmoid (last layer) & 1000 & 100 \\
Grain shrinkage under driving force & 4 & 6 & 32-128 & 1e-4 & idem & 1000 & 50 \\
Grain shrinkage with natural motion & 4 & 6 & 128 & 1e-4 & idem & 1000 & 50 \\
Triple junction (*) & 16 (**) & 6 & 128 & 1e-5 & idem & 500 & 50 \\
\hline
\end{tabular}
  • (*) For the triple junction, the pyramidal approach was tested for the early stages of the training.

  • (**) When using the pyramidal training, 64 NNs are initialized, while 16 NNs are selected for training using the concept of the basket of PINNs.

To further highlight the effectiveness of multi-networking in conjunction with a combination of optimizers for training PINNs, an additional numerical experiment is proposed: the benchmark of grain shrinkage under constant driving force is reproduced with varying numbers of neurons per layer. Specifically, a fixed number of six hidden layers with the number of neurons ranging from 64 to 512 was applied. This was achieved by employing a cyclic optimization strategy alternating between the Adam optimizer and the L-BFGS optimizer (referred to as the wheel of optimizers). Associated findings and comments are provided in the discussion section.
Additionally, Table 3 lists how the set of optimization techniques grows with the complexity of the tackled scenarios.

Table 3: Enumeration of applied optimization techniques with increasing complexity of target problems, as well as the associated number of trainable model parameters.
\begin{tabular}{lp{8cm}cc}
\hline
Benchmark & Activated techniques & Number of optimization techniques & Number of trainable parameters \\
\hline
Traveling wave interface & Wheel of optimizers (Adam and L-BFGS), resampling, discrete or continuous resolution (optional), transfer of learning & 4 & 3,297 \\
Grain shrinkage under driving force & Wheel of optimizers, discrete resolution, extended domain decomposition, Adaptive Mesh-Free Optimization (AMFO), transfer of learning & 5 & 416,005 \\
Curvature-driven grain shrinkage & Wheel of optimizers, discrete resolution, extended domain decomposition, AMFO, transfer of learning & 5 & 416,005 \\
Triple junction & Wheel of optimizers, discrete resolution, extended domain decomposition, transfer of learning, AMFO, basket of PINNs, handling multi-phases, pyramidal training, correction of phase predictions & 9 & 1,414,417 \\
\hline
\end{tabular}
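The parameter counts of Table 3 can be cross-checked with the standard formula for a fully connected network, in which each layer contributes (fan-in + 1) x fan-out weights and biases. The sketch below is only a consistency check under assumptions that are not stated in the text: three inputs per network, a scalar output, and one additional network besides the worker NNs for the 2D benchmarks. Under these assumptions the two larger totals are reproduced exactly. The traveling-wave total is reproduced with two inputs and four hidden layers of 32 nodes, which differs from the six layers listed in Table 2, so the layer-counting convention there remains an open question.

```python
def mlp_parameter_count(n_inputs, hidden_layers, nodes, n_outputs=1):
    """Weights and biases of a dense network with equally wide hidden layers."""
    sizes = [n_inputs] + [nodes] * hidden_layers + [n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


per_network = mlp_parameter_count(n_inputs=3, hidden_layers=6, nodes=128)
grain_total = (4 + 1) * per_network      # 4 workers + 1 extra network (assumed)
junction_total = (16 + 1) * per_network  # 16 workers + 1 extra network (assumed)
wave_total = mlp_parameter_count(n_inputs=2, hidden_layers=4, nodes=32)
```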

Furthermore, for the scenarios involving grain shrinkage and triple junctions, MPF solutions from OpenPhase calculations are used as reference [darvishikamachali2013grainPhD]. The associated scheme is provided in the SM, section A. All numerical implementations were coded in TensorFlow and run on standard workstations with an AMD Ryzen Threadripper PRO 5975WX 32-core CPU (64 threads).

4.2 Traveling Interface under Constant Driving Forces

As a key starting point, the behavior of a planar interface traveling under a given driving force is explored. In the absence of the curvature, this can be simplified to a 1D problem in which the phase-field profile propagates according to [steinbachphasefield2009, darvishikamachali2013grainPhD]:

\begin{equation}
\phi(x,t)=\left\{\begin{array}{lll}
1 & \text{for} & x<v_{n}t-\frac{\eta}{2}\\
\frac{1}{2}-\frac{1}{2}\sin\left(\frac{\pi}{\eta}\left(x-v_{n}t\right)\right) & \text{for} & v_{n}t-\frac{\eta}{2}\leq x<v_{n}t+\frac{\eta}{2}\\
0 & \text{for} & x\geq v_{n}t+\frac{\eta}{2}
\end{array}\right. \tag{13}
\end{equation}

The results of this campaign are gathered in Figure 4. The traveling wave solution is a useful benchmark because both the shape and the velocity of the phase-field profile are known and can be compared directly with the predictions. Figure 4(a) presents the setup of the BC and IC points, as well as the placement of the PDE collocation points for this problem. As described in paragraph 2.4, the meshing is predominantly focused on the interface regions, where the PINN model is trained to predict the solution. Excellent agreement is obtained between the PINNs-MPF and the theoretical solutions at various time instants, Figure 4(b). Figure 4(c) shows the spatial evolution of the interface resolved by the PINNs-MPF. To trace the interface motion and its velocity, a representative point with a phase-field value $\phi=0.5$ at $(x,t)=(0,0)$ is selected. Figure 4(d) compares the theoretical and predicted velocities, revealing an accuracy approaching unity after 2500 epochs of training, with a total computation time of $\approx$10 minutes.
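The reference profile of Eq. (13) and the tracking of the $\phi=0.5$ point can be sketched as follows. This is an illustrative NumPy version, not the code used for Figure 4: the function names, the linear interpolation of the level crossing, and the least-squares fit of the position over time are assumptions made for this example.

```python
import numpy as np


def travelling_profile(x, t, v_n, eta):
    """Eq. (13): sinusoidal profile of width eta centred at x = v_n * t."""
    s = np.asarray(x, dtype=float) - v_n * t
    inside = 0.5 - 0.5 * np.sin(np.pi * s / eta)
    return np.where(s < -eta / 2, 1.0, np.where(s >= eta / 2, 0.0, inside))


def interface_position(x, phi, level=0.5):
    """Position where phi crosses `level`, by linear interpolation between nodes.

    Assumes phi decreases with x and that the crossing lies inside the grid.
    """
    k = int(np.argmax(phi < level))  # first node with phi below the level
    x0, x1, p0, p1 = x[k - 1], x[k], phi[k - 1], phi[k]
    return x0 + (level - p0) * (x1 - x0) / (p1 - p0)


def estimate_velocity(x, times, profiles):
    """Slope of a straight-line fit of the interface position against time."""
    positions = [interface_position(x, p) for p in profiles]
    return float(np.polyfit(times, positions, 1)[0])
```

Applied to predicted profiles instead of the analytic ones, the same two helpers give the predicted velocity that is compared with the theoretical value.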

For a planar interface (no curvature), the interface velocity is expected to scale linearly with the input driving force (Eq. (12)). Figure 4(e) shows the interface velocities obtained for various driving forces, matching the theoretical expectation. Here the PINNs-MPF was first trained for a single $\Delta g$ value (Figure 4(f)), and the learning was then transferred to predict the interface velocity for various $\Delta g$ values. Such transfer of learning is typically carried out within a significantly reduced number of training epochs (at most 100), since some retraining cannot always be avoided [weinan_deep_2018]. From a technical perspective, this analysis tests the localized meshing, the transfer of learning and the fully discrete resolution. It is worth noting that a single neural network handles this 1D scenario effectively.

Figure 4: PINN solution for the traveling wave interface problem against the theoretical solution. Panels: (a) initial mesh; (b) traveling wave solution; (c) spatial motion of the interface; (d) identification of the interfacial velocity; (e) impact of external driving force; (f) PINN extrapolations.

Table 4: Quantitative comparison between each of the tested architectures and the theoretical solution.
\begin{tabular}{cccccc}
\hline
Configuration & Neural networks & HL & Neurons/HL & MAE & MSE \\
\hline
1 & 4 & 6 & 32 & $4.72\times 10^{-2}$ & $3.06\times 10^{-3}$ \\
2 & 4 & 6 & 64 & $3.12\times 10^{-2}$ & $1.47\times 10^{-3}$ \\
3 & 4 & 6 & 128 & $2.82\times 10^{-3}$ & $1.14\times 10^{-5}$ \\
\hline
\end{tabular}

The driving-force-driven interface kinetics can be directly extended to 2D, where the planar interface is replaced by a circular one. In this scenario, the interface has a curvature, but its effect is negligible under the condition that $\Delta g\gg\sigma\kappa$ (Eq. (12)). Although the linear scaling of the interface velocity still holds, this scenario poses a challenge for the PINNs-MPF, as the problem changes from a Cartesian (planar) to a radially symmetric geometry. The PINNs-MPF predictions are compared with the theoretical and the OpenPhase phase-field solutions in Figure 5. Here, initial trials involving a single NN (as used for the 1D case above) revealed limitations marked by training instabilities and noise (c.f. the SM for a qualitative and quantitative comparison between the PINN predictions and the ground truth solution for a single NN). Stable training was obtained, however, when applying four neural networks. Subsequent refinements improved the prediction accuracy by progressively increasing the number of neurons, starting from 32, resulting in an optimal configuration of 128 neurons and 6 hidden layers, as observed in Figure 5. To quantify the differences between each PINN prediction and the theoretical solution, Table 4 reports the MSE and the Mean Absolute Error (MAE) of each prediction with respect to the ground truth solution. The architecture with 128 neurons not only accurately matched the theoretical solution (with MAE and MSE values of $2.82\times 10^{-3}$ and $1.14\times 10^{-5}$, respectively) but also outperformed the phase-field solution obtained using an explicit scheme. This architecture is therefore kept for the subsequent benchmarks: on the one hand, it was able to deal with the double-well potential term and the non-convex potential term in Eq. (\ref{eq:dual_form}); on the other hand, this fixes the related set of parameters, especially for the multi-phase-field scenario.
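For reference, the two error measures reported in Table 4 can be computed as in the following sketch. The array names are placeholders; in practice `prediction` and `reference` would hold the PINN output and the ground truth solution sampled at the same points.

```python
import numpy as np


def mean_absolute_error(prediction, reference):
    """MAE: average magnitude of the pointwise error."""
    diff = np.asarray(prediction, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(diff)))


def mean_squared_error(prediction, reference):
    """MSE: average of the squared pointwise error; penalizes large deviations more."""
    diff = np.asarray(prediction, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(diff ** 2))


# Tiny usage example with made-up numbers (not data from the paper).
prediction = [0.0, 0.5, 1.0]
reference = [0.0, 1.0, 1.0]
mae = mean_absolute_error(prediction, reference)
mse = mean_squared_error(prediction, reference)
```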

It is worth noting that changing from the 1D to 2D setup, the PINNs-MPF is well capable of capturing and maintaining the correct interface thickness throughout the simulation.

Figure 5: Grain prediction of PINN against theoretical and PF solutions. Panels: (a) initial condition; (b) localized mesh; (c) $\phi$ at time (1 %); (d) $\phi$ at time (51 %); (e) $\phi$ at time (86 %); (f) radius vs. time for different architectures.

4.3 Curvature-driven Interface Motion

Capturing the natural curvature-driven motion of an interface is the most central capability of a phase-field model, encoded in the gradient energy term of the governing free energy functional and in the related Laplacian of the equations of motion. When the Laplacian terms control the interfacial motion, i.e., $\sigma\kappa\gg\Delta g$ in Eq. (12), nonlinear kinetics arise. Here we study the shrinkage of a circular phase (grain) with $\kappa=1/R$, giving $v_{n}=-dR/dt=M\sigma/R$, so that the radius decreases in time. This adds another level of complexity for testing the PINNs-MPF framework. The number of time intervals required to resolve this case was 150, corresponding to $\Delta t^{*}=10^{-3}$, while $\Delta t^{*}=10^{-4}$ is required to resolve the same problem using the PF scheme (c.f. the SM, section A). The results are presented in Figure 6. Here, the PINN solution fits the theoretical one, whereas the linearized phase-field scheme results in a premature shrinkage of the grain; the same result is obtained even with smaller time steps. This case, where the motion is fully governed by the interface and no external driving force is applied, demonstrates that PINNs can efficiently handle non-linearities in the phase-field context. As a reminder, we use the tanh activation function in the hidden layers to capture diverse non-linear patterns in the physics-informed input data. For the output layer, the sigmoid activation bounds the predictions, so that the order parameter remains between 0 and 1. These tailored non-linear activations allow the PINN to effectively handle the grain shrinkage scenario, where the motion is controlled by the interfacial energy and curvature.
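The theoretical curve for this benchmark follows from integrating the shrinkage law: with $dR/dt=-M\sigma/R$ (radius decreasing), one obtains $R(t)^{2}=R_{0}^{2}-2M\sigma t$, and the grain vanishes at $t=R_{0}^{2}/(2M\sigma)$. The sketch below evaluates this expression; the function names are illustrative, $M$ and $\sigma$ are taken from Table 1, and the initial radius $R_{0}=0.2$ is an assumed value used only for the example.

```python
import numpy as np


def grain_radius(t, R0, M, sigma):
    """Analytical radius of a circular grain shrinking by curvature only."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(np.maximum(R0 ** 2 - 2.0 * M * sigma * t, 0.0))


def vanishing_time(R0, M, sigma):
    """Time at which the radius reaches zero."""
    return R0 ** 2 / (2.0 * M * sigma)


M, sigma, R0 = 1e-4, 1.0, 0.2  # R0 is an assumption for illustration
t_end = vanishing_time(R0, M, sigma)
radii = grain_radius(np.linspace(0.0, t_end, 11), R0, M, sigma)
```

The square-root dependence is what makes the kinetics nonlinear: the interface accelerates as the grain shrinks, which is the regime where the explicit scheme needs the smaller time step.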

The pattern of the phase-field solution depends on the interface width $\eta$: for a thin interface ($\eta=4dx$), there is a slight deviation from the theoretical solution, while the results improve with increasing width. This behavior has been systematically investigated in [darvishikamachali2013grainPhD], showing that it arises from the numerical limitations of calculating the Laplacian and, consequently, the interface curvature. This is a feature of diffuse phase-field interfaces. The PINNs-MPF predictions were studied for three interface widths, $\eta=4dx$, $7dx$ and $9dx$. In all configurations, a consistent alignment is evident between the PINNs-MPF and the OpenPhase solutions. This is best visible for the thinnest interface shown in Figure 6b. This set of benchmarks demonstrates that the PINNs-MPF is not only capable of handling the nonlinear interface kinetics but also captures the interface width and its impact on the solution. The latter is crucial for future developments of the model to deal with interfacial phenomena, such as interfacial elasticity and solute segregation [wang2021incorporating, zhou2022revealing].

Figure 6: Grain radius prediction of PINN against theoretical and PF solutions (OpenPhase) for a natural motion. Panels: (a) $\phi$ at time (1 %); (b) radius vs. time for $\eta=7dx$; (c) $\phi$ at time (40 %); (d) radius vs. time for $\eta=4dx$; (e) $\phi$ at time (65 %); (f) radius vs. time for $\eta=9dx$. The field plots correspond to the PINN prediction for an interface width $\eta=7dx$.

4.4 Establishing Equilibrium Triple Junctions

A major capability of MPF is to handle interfacial junctions. This is needed not only to fulfill Young's law but also to ensure an evolutionary path leading to a configuration with minimum energy. To test our PINNs-MPF framework, a system with four phase-fields (grains) and six triple junctions was considered. The initial mesh is visualized in Figure 7. Using our method (c.f. the SM, section C), the IC points are densified within the interfacial region, while some random points are also selected away from the interfaces to help the PINNs learn the bulk values. To streamline computations, the BC are addressed selectively along the interface lines between different batches. It is worth highlighting that an initialization with sharp angles is challenging for PINNs: because the interface dynamics is curvature-driven, starting away from smooth curves/shapes results in complex evolutionary scenarios.

The simulation results are depicted in Figure 8. Here, each grain is shown in its initial and final state of the simulation, compared to the simulation results from OpenPhase. Figure 8(m)-(o) show the total interfacial region in the initial and final state. The comparison between PINNs-MPF and OpenPhase reveals an excellent agreement. The microstructure evolves into an equilibrium state where the angles at the triple junctions approach $120^{\circ}$. The motion of the phases is well synchronized, allowing the global solution to converge to the equilibrium state. Note that the application of boundary conditions on the outer boundary of the simulation box requires the interfaces on both ends to adjust in an energy-minimizing manner.

The difference between the PINNs-MPF solution and OpenPhase lies in the interface thickness. This difference may be explained by how each methodology evaluates gradients: the implementation computes derivatives by automatic differentiation using TensorFlow's tf.GradientTape, which differs from the Laplacian stencil used on the regular grid in MPF calculations. Additional details about the difference in the computation of the Laplacian terms are given in the SM, section A. This difference explains the minor discrepancies related to the interface width. However, the PINNs predictions preserve an interface width closer to that of the initial state than OpenPhase does. We also note that, due to the curvature-driven nature of grain growth problems, handling an initialization with sharp angles ($90^{\circ}$) is challenging. The correct evolution from such a critical initialization supports the use of the framework in increasingly difficult contexts. From a technical standpoint, the sharp angles make the training relatively slow in the beginning; once they are resolved, the training accelerates.
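To give a feeling for the grid side of this comparison, the sketch below measures the truncation error of a standard five-point Laplacian stencil on a smooth test function with a known Laplacian. This is a generic illustration, not the scheme documented in the SM: the stencil, the test function and the grid sizes are assumptions. Derivatives obtained by automatic differentiation of a network output do not involve such a stencil and therefore carry no grid-spacing-dependent truncation error.

```python
import numpy as np


def laplacian_5pt(f, dx):
    """Five-point stencil evaluated on the interior nodes of a uniform 2D grid."""
    return (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2]
            - 4.0 * f[1:-1, 1:-1]) / dx ** 2


def stencil_error(n, k=2.0 * np.pi):
    """Maximum error of the stencil for f = sin(kx) sin(ky) on an n x n grid."""
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    X, Y = np.meshgrid(x, x, indexing="ij")
    f = np.sin(k * X) * np.sin(k * Y)
    exact = -2.0 * k ** 2 * f  # analytical Laplacian of the test function
    return float(np.max(np.abs(laplacian_5pt(f, dx) - exact[1:-1, 1:-1])))


coarse = stencil_error(65)
fine = stencil_error(129)
```

Halving the grid spacing reduces the error by about a factor of four, the expected second-order behavior of this stencil.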

Figure 7: Initial condition points for the different NNs handling the four phases in different batches. Panels: (a) phase indexes; (b) phase 0; (c) phase 1; (d) phase 2; (e) phase 3. The corresponding mesh is given in the SM, section C. Reminder: the number on each batch is the corresponding PINNi index (cf. Figure 1). The IC points in each batch are surrounded by a circle with a common color for checking purposes.

Figure 8: Pure phase-field solution (phase-field $\phi$) against PINN predictions for a triple-junction problem involving four grains/phases in the system. Columns: initialization, MPF solution, PINN predictions. Rows: (a)-(c) grain 0; (d)-(f) grain 1; (g)-(i) grain 2; (j)-(l) grain 3; (m)-(o) sum of interfaces.

5 Discussion

Analyzing the training dynamics through the loss functions

The training of PINNs-MPF involves various loss terms, described through Eqs. 4 to 7. Understanding and optimizing the training process requires a comprehensive analysis of each loss, in terms of both its individual behavior and the collective synchronization.
For the grain shrinkage scenario (paragraph 4.3), the evolution of the mean losses for the four PINNis across epochs is shown in Figure 9. Here, (i) each loss curve corresponds to the average of the four corresponding losses from the four neural networks (NNs) and (ii) the entire training process is technically conducted in a single execution. However, for the sake of clarity in illustrating the evolution of the losses, we present the simulation in two stages: after convergence in the initial time interval, the acquired learning is applied to restart the simulation.

Figure 9(a) shows the evolution of the different losses for the first time interval ($t_{0}$-$t_{1}$). The threshold is set at $5\times 10^{-5}$, marking the transition point to the next time interval. The training initially focuses on the interfacial regions, activating only the PDE and IC losses. Once the solution is computed in these regions, the BC and denoising losses are activated. They establish spatial continuity between batches and eliminate noise within both grain and non-grain areas, respectively. The discrete activation of the BC and denoising losses has proven to be more efficient in enhancing training than continuous activation. We propose to categorize the losses into intrinsic losses, namely the PDE and IC losses, which represent the core elements for the NN, and extrinsic losses, here BC and denoising, which fine-tune the solution and establish continuity with neighboring NNs. The integration of these extrinsic losses could be further optimized.
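The staged activation of intrinsic and extrinsic losses can be summarized in a few lines. The sketch below is a schematic under stated assumptions: the dictionary keys, the unit default weights and the boolean switch `interface_converged` are hypothetical names chosen for this example, and the individual loss values would come from Eqs. 4 to 7.

```python
def total_loss(losses, interface_converged, weights=None):
    """Sum the active loss terms for one PINN_i.

    Intrinsic terms (PDE, IC) are always active; extrinsic terms (BC, denoising)
    are only added once the interfacial solution has converged.
    """
    intrinsic = ("pde", "ic")
    extrinsic = ("bc", "denoise")
    active = intrinsic + (extrinsic if interface_converged else ())
    weights = weights or {}
    return sum(weights.get(name, 1.0) * losses[name] for name in active)


# Usage with placeholder values (not results from the paper).
example = {"pde": 1e-3, "ic": 1e-4, "bc": 1e-2, "denoise": 1e-5}
early = total_loss(example, interface_converged=False)
late = total_loss(example, interface_converged=True)
```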

Figure 9(b) shows the local minima corresponding to this transition, where the maximum losses of all PINNis (as defined in Eq. (3)) fall below the threshold, indicating the onset of training for the next time interval. The gaps between peaks correspond to the training length of each time interval. The simulation can then be divided into two distinct stages. Initially, when the grain radius is large, the training progresses relatively fast. Subsequently, in the second stage with a smaller grain size, the BC loss exhibits an inverse relationship with the grain radius and becomes the most influential term. The seamless transfer of learning between time intervals is evident, as the PDE loss initially tends to decrease and then stabilizes in the later stages of training. Notably, the magnitudes of the losses appear reasonable, with the PDE loss being approximately ten times greater than the IC loss. While the BC loss remained activated throughout the entire simulation here, in the subsequent benchmark (triple junction) it was managed through discrete activation, mirroring the approach adopted during the initial time interval.
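The transition criterion between time intervals can be written compactly: training moves on only when the largest loss of every network is below the threshold, which is stricter than checking the mean. The sketch below assumes that each network reports its losses as a dictionary; the names and numbers are illustrative.

```python
def ready_for_next_interval(per_network_losses, threshold):
    """True only if every loss of every network is below the threshold."""
    return all(max(losses.values()) < threshold for losses in per_network_losses)


def mean_below_threshold(per_network_losses, threshold):
    """Weaker check on the mean loss, shown only for contrast."""
    values = [v for losses in per_network_losses for v in losses.values()]
    return sum(values) / len(values) < threshold


# Placeholder values: one network still has a PDE loss above the threshold.
reported = [
    {"pde": 1e-5, "ic": 1e-6},
    {"pde": 9e-5, "ic": 1e-6},
]
threshold = 5e-5
```

In this example the mean is below the threshold although one network has not converged, which is why the criterion is applied to each network individually.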

Figure 9: Evolution of the different losses (PDE, IC, BC, denoising and mean loss) for the four PINNis over the epochs for a grain shrinkage without driving force (for an interface width $\eta=7dx$). Panels: (a) training on the first time interval $t_{0}$-$t_{1}$; (b) training on the whole time domain. The plots are displayed in a logarithmic scale. In (b), values below $10^{-8}$ and above $10^{-3}$ are excluded from the plot for visualization purposes. Additionally, the denoising loss is omitted from the plot due to its low magnitude and fluctuating nature, which may affect the visualization of the other losses.

For the triple junction scenario (paragraph 4.4), the evolution of the various losses (PDE, IC, BC, and mean loss) for the 16 NNs across epochs is illustrated in Figure 10.
In this context, the threshold is set at $6\times 10^{-4}$.

Similar to the grain shrinkage scenario, the first time interval is addressed first, and the accumulated learning is then used to restart the simulation. For this purpose, two approaches are compared, with and without pyramidal training (Figures 10(a) and 10(b), respectively). The only difference between the two simulations is the number of batches for the fine subdivision $N_{\text{batches min}}$ (c.f. Algorithm 1). Specifically, $N_{\text{batches min}}=16$ and $N_{\text{batches}}=4$ for the pyramidal approach, while $N_{\text{batches min}}=N_{\text{batches}}=4$ for the second configuration. Figure 10(a) showcases the impact of two optimization techniques: the wheel of optimizers and the pyramidal training. The wheel of optimizers is activated with a period of 50 epochs; L-BFGS-B initially decreases the global loss (MSE) from a magnitude of $10^{-1}$ to $10^{-2}$, and subsequent rounds progressively bring it below the threshold of $5\times 10^{-4}$. Then, Adam allows the transfer of learning from level 1 to level 2 of the pyramid, where the whole spatial domain is handled. Even though the global loss initially increases, it quickly falls below the target threshold (the 16 NNs converge together within 850 epochs), demonstrating the model's capacity to generalize the learned solution to the whole domain. Without this strategy, it is hard to enforce the convergence of all the NNs, as seen in Figure 10(b): the global loss stagnates above the threshold even after 6000 epochs of training, using the same set of parameters as in Figure 10(a) except for the deactivated pyramidal training. This comparison was re-executed repeatedly to check repeatability, with the same outcome; the 16 NNs failed to converge simultaneously without pyramidal training during the first time interval. We suggest that this finding is attributable to two main effects of pyramidal training: the progressive data augmentation underlying the concept, and the easier handling of boundary conditions, since accurate boundary predictions are a direct consequence of good convergence within the domain. Additional details about the domain decomposition related to the pyramidal approach are provided in the SM, section D.

The black dashed line in Figure 10(c) corresponds to the total mean loss, whose oscillations dip below the threshold (red line) each time the training moves on to the next time interval. Here, the boundary condition loss is not visualized due to its discrete activation; its magnitude is in the range of $10^{-5}$. Note that all losses of all NNs must fall below this limit (not just the mean values). The previous observations regarding the behavior of the IC and PDE losses remain largely valid in terms of the stabilization of the losses across epochs. However, the decline observed in the previous scenario is not evident here. This discrepancy could be attributed to the complexity of the triple junction evolution and the dynamic interactions between different phases, compared to the single-grain shrinkage scenario. The sum loss (Eq. (\ref{eq:summConstraint})) has the lowest magnitude ($10^{-12}$), and its constant value reflects the algorithm's capacity to dynamically correct the predictions, ensuring adherence to the summation constraint.

Figure 10: Evolution of the different losses for the 16 $\textit{PINN}_i$s over the epochs for the triple junction scenario, plotted on a logarithmic scale. (a) Training on the first time interval $t_0$-$t_1$ using the pyramidal approach. (b) Training on the first time interval $t_0$-$t_1$ without the pyramidal approach. (c) Training on the whole time domain.

Addressing Nonlinearities in Microstructure Prediction

The current implementation showcases the effectiveness of this method in addressing non-linearities such as those in the MPF model, offering good predictions of microstructure evolution in the different scenarios above. The ability to capture both curvature-driven and driving-force-driven interface dynamics is well demonstrated. Typical MPF simulations employ a time step of magnitude up to $10^{-4}$ and a minimum of 1000 steps across the given scenarios in order to ensure computational stability. In contrast, applying PINNs is shown here to allow much larger time steps, with up to 200 time steps per simulation, except for the traveling interface simulation, where computational speed was a focal consideration. This difference stems from the fact that, while the conventional MPF is limited by the spatial resolution, the PINNs-MPF focuses on the overall microstructural patterns, substantiating its ability to capture the underlying physics while maintaining stable computational schemes.

Tuning the model hyper-parameters


One of the crucial points when dealing with PINNs is the tuning of the hyper-parameters. Here, any simplification of the hyper-parameters directly impacts convergence and can reduce the trial-and-error effort. The training dataset is set automatically, which reduces the user inputs to the initialization, physical, and geometrical parameters. In addition, we have adopted major concepts from conventional MPF frameworks, namely OpenPhase [darvishikamachali2013grainPhD]. These include treating the simulation domain heterogeneously, with a focus on the phase-field interfacial region, while handling the individual phase-field variables in parallel. This demonstrates that the PINNs-MPF can successfully inherit optimization techniques already applied in classical computing approaches. In particular, the PINN resolution can then be regarded as multi-variable sequential learning, showcasing its adaptability and integration capabilities. The application of multi-networking and training by blocks demonstrates the feasibility of addressing diffuse interface problems with simplified hyper-parameters.

For specific challenges involving moving boundaries, the implementation of adaptive weights (gradient scaling algorithm) was unavoidable [Haghighat2022]. It is therefore worth noting that the loss terms in the implemented PINNs-MPF are left unweighted by default, i.e. $\lambda_{\mathrm{pde}}=\lambda_{\mathrm{bc}}=\lambda_{\mathrm{ic}}=\lambda_{\sum\phi}=1$ in Eq. 3. This reduces the complexity of the current framework and the need for meticulous tuning, especially considering the diverse nature of the loss terms in Eq. 3. Nevertheless, such an adaptive-weighting algorithm could be useful to integrate in more complex scenarios. Our numerical experiments indicate that the extrinsic losses can be weighted to optimize performance: for instance, reducing the weight of the BC loss expedites training, whereas increasing the weight of the phase summation loss allocates more attention to it. However, the intrinsic losses (IC and PDE) cannot be weighted.
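The weighting scheme described above can be sketched as follows, assuming that Eq. 3 is a weighted sum of the four loss terms. The function `total_loss`, its argument names, and the numerical values are hypothetical. Only the extrinsic terms expose a weight, reflecting the observation that the intrinsic IC and PDE losses cannot be weighted.

```python
def total_loss(l_pde, l_ic, l_bc, l_sum, w_bc=1.0, w_sum=1.0):
    """Weighted sum of the four loss terms.

    Only the extrinsic terms (BC and phase sum) expose a weight; the
    intrinsic terms (PDE and IC) always enter with weight 1.
    """
    return l_pde + l_ic + w_bc * l_bc + w_sum * l_sum


# Hypothetical loss values for one network at one epoch.
losses = dict(l_pde=0.5, l_ic=0.125, l_bc=0.25, l_sum=0.0625)

# Default setting: all weights equal to 1.
default = total_loss(**losses)

# Illustrative re-weighting: less emphasis on BC, more on the phase sum.
reweighted = total_loss(**losses, w_bc=0.5, w_sum=2.0)
```

Keeping the defaults at 1 means no weight has to be tuned unless a specific scenario calls for it.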

Interfacial Attention for a Latent Microstructure Representation


The results of the conducted benchmark studies indicate that the global PINN solution involves a precise computation of solutions along interface regions throughout the entire domain, complemented by an extrapolation method to approximate solutions at the interface region. This hybrid approach proves beneficial for diffuse-interface problems using ML. In the triple junction scenario, the distributed `0 and 1' points, as depicted in Figure 7, were employed to enhance the precision of the solution, particularly for the application of the phase summation (Eq. 2) and the accurate treatment of boundary conditions between neighboring NNs. As in the mono-phase benchmarks, it is possible to focus fully on interfacial regions within multi-phase simulations. First demonstrations of this are provided in Section E of the SM. This opens the way to resolving MPF simulations using NNs independently of the grid size, thus allowing more complex scenarios to be tackled.

Dynamic correction mechanism for phase summation constraint


Within the literature, a major limitation for the application of ML to multi-phase problems is the phase summation criterion: in previous works, a bounding algorithm was required as a correction step applied to the PINN prediction [HUANG2021103727, Zheng2022]. In the current work, by contrast, the applied method introduces a dynamic correction mechanism for phase predictions. This involves dividing the prediction of each neural network (NN) responsible for a specific phase by the sum of predictions from other NNs handling other phases within the same batch. This dynamic correction not only ensures continuous training but also removes the need for an external correction algorithm. Instead, the correction of phase summation is seamlessly integrated as an additional constraint loss term, enhancing the efficiency of the training process.
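A minimal sketch of such a correction at a single collocation point is given below. It assumes the common normalization in which each phase prediction is divided by the sum over all phases at that point; the exact normalization used in the PINNs-MPF repository may differ. The function names, the fallback for a vanishing sum, and the sample values are hypothetical.

```python
def normalize_phases(phi, eps=1e-12):
    """Rescale raw per-phase predictions at one point so that they sum to 1."""
    total = sum(phi)
    if total < eps:
        # Degenerate case (assumption): fall back to equal phase fractions.
        return [1.0 / len(phi)] * len(phi)
    return [p / total for p in phi]


def sum_constraint_loss(phi_points):
    """Mean squared deviation of the phase sum from 1 over all points."""
    return sum((sum(phi) - 1.0) ** 2 for phi in phi_points) / len(phi_points)


# Hypothetical raw predictions of four phase networks at one point.
raw = [0.7, 0.2, 0.2, 0.0]
corrected = normalize_phases(raw)

loss_before = sum_constraint_loss([raw])
loss_after = sum_constraint_loss([corrected])
```

After the rescaling, the corrected values sum to one up to round-off, so the constraint loss drops to a negligible value.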

Exploring model complexity and impact of the optimization strategy


The results of the additional numerical experiment are gathered in Table 5, and the associated graphical illustration is provided in Figure 11. The experiments revealed that, while all architectures started with high and similar MSE values, the MSE decreased significantly after the first round of Adam optimization, with the four models reaching a similar loss magnitude. Subsequent L-BFGS optimization further reduced the MSE for the $6\times 64$ and $6\times 128$ architectures, with the $6\times 128$ model attaining the best overall MSE of $1.29\times 10^{-4}$. Larger architectures such as $6\times 256$ and $6\times 512$ failed to converge within the specified rounds, potentially due to overfitting or optimization challenges arising from their high parameter counts. The $6\times 128$ architecture struck the best balance between model complexity and generalization ability, benefiting from the combined Adam and L-BFGS optimization strategy. Although the larger models had more parameters, they did not necessarily perform better and could be computationally expensive. This numerical experiment reveals a clear relation between the model's parameter count, representing its complexity, and the impact of multi-networking. For instance, the $6\times 256$ configuration, yielding 1,651,205 trainable parameters, is challenging to train when employing four NNs. Conversely, a reduced parameter count of 1,414,417 enables training 16 NNs ($16 \times (6~\text{HL} \times 128~\text{neurons})$) simultaneously to effectively handle four phases (cf. the triple junction benchmark). Moreover, even higher parallelism is feasible through pyramidal training, where 64 NNs are initialized at first, but only 16 are selected for further training using the basket of PINNs concept.
This implies that using moderately sized networks in the multi-networking context (the extended subdomain decomposition technique) makes it possible to alternate optimizers efficiently (the wheel of optimizers), while the transfer of learning facilitates the transition between time intervals. This underscores the rationale behind initially selecting a subset of PINN techniques, while the second subset (AMFO, phase correction, and vertical optimization) has proven crucial in prior MPF method studies and the current work. Pyramidal training and the comprehensive set of PINNs emerge as essential for progressively transferring learning to a unified domain when dealing with massive domain decomposition, thereby making the framework ready for upscaling, as detailed hereafter.

Table 5: Evolution of the MSE and trainable-parameter count for different architectures, optimized with Adam-L-BFGS-Adam cycles. Comparisons are made at epoch 0 and after the first and second rounds.

\begin{tabular}{lrcccc}
\hline
HL $\times$ neurons/HL & Trainable parameters & MSE at epoch 0 & MSE after round 1 & MSE after round 2 & Rounds required to converge$^{(*)}$ \\
\hline
$6 \times 64$ & 105,605 & $1.396\times 10^{1}$ & $1.387\times 10^{-1}$ & $1.270\times 10^{-2}$ & 5 \\
$6 \times 128$ & 416,005 & $1.33\times 10^{1}$ & $2.712\times 10^{-1}$ & $1.29\times 10^{-4}$ & 3 \\
$6 \times 256$ & 1,651,205 & $1.36\times 10^{1}$ & $1.87\times 10^{-1}$ & $1.387\times 10^{-1}$ & $> 10$ \\
$6 \times 512$ & 6,579,205 & $1.351\times 10^{1}$ & $7.517\times 10^{-1}$ & $1.723\times 10^{-1}$ & $> 10$ \\
\hline
\end{tabular}

$^{(*)}$ Convergence means that the global loss of the model has fallen below the threshold and the current time interval has been handled.
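The parameter counts in Table 5 can be cross-checked with simple arithmetic. The sketch below assumes fully connected networks with biases, three inputs (for instance $x$, $y$ and $t$) and one output, and it assumes that each reported count aggregates $N+1$ identical networks (for instance $N$ NNs plus one Master NN). Neither assumption is stated in the text; they are chosen only because they reproduce the reported numbers. The function name `mlp_param_count` is hypothetical.

```python
def mlp_param_count(n_inputs, n_hidden_layers, width, n_outputs=1):
    """Number of weights and biases in a fully connected network."""
    sizes = [n_inputs] + [width] * n_hidden_layers + [n_outputs]
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Parameters of one network with 3 inputs, 6 hidden layers and 1 output.
per_network = {w: mlp_param_count(3, 6, w) for w in (64, 128, 256, 512)}

# Assumption: 4 NNs + 1 Master NN = 5 identical networks (Table 5).
table_counts = {w: 5 * n for w, n in per_network.items()}

# Assumption: 16 NNs + 1 Master NN = 17 identical networks of width 128.
sixteen_nn_count = 17 * per_network[128]
```

Under these assumptions, a single network of width $n$ has $5n^2 + 10n + 1$ parameters, and the sketch reproduces the four counts in Table 5 as well as the value of 1,414,417 quoted for the 16-NN configuration.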

Figure 11: Evolution of the MSE and trainable-parameter count for different architectures, as variants of benchmark 2.

Transfer of learning

In the context of learning transfer, our framework demonstrates the feasibility of hierarchically transferring knowledge within the subdomain decomposition. Through multi-networking and fully parallel training, it employs a pyramidal training approach, categorized as `Multiple to Multiple' transfer. This allows the transfer of learning, for example (but not limited to), from 4 NNs to 4 others, 16 to 16, 16 to 4, etc. From a technical standpoint, `Multiple to Single' transfer of learning is also possible, where knowledge is transferred from multiple NNs to a single NN and vice versa. However, our first trials showed that training a single NN is challenging and leads to optimization failures. The synergy between ML and optimization algorithms is crucial for effectively exploiting patterns, but this task becomes difficult with large NNs. Multi-networking and pyramidal training until a reasonable number of NNs is reached prove efficient in managing such complexities. Nevertheless, the code repository provides a demonstration of this `Multiple to Single' transfer, denoted as `$\textit{PINN}_i$s to Master' transfer, which could be of interest for other physical applications. Additional details are provided in the SM, section F.

Exploring more ways for improvement of the framework

In the current development phase, one aspect to consider is the selection of an appropriate threshold for the transition between time intervals in the fully discrete resolution. This threshold was found to be specific to each simulation and subject to trial and error. Starting from a magnitude of $10^{-5}$ in the single-phase benchmarks, it was slightly raised for the multi-phase-field (MPF) scenario to ensure a compromise between the speed of training and the convergence of the solution. One possibility to deal with this limitation is to integrate an additional loss term related to the variation in the energy of the system. If the energy residual (and its derivative) reaches a steady state, this is a physical indication that the system has reached its equilibrium within this time interval, and the transition can then be made [GUILLENGONZALEZ2014821, WANG2023216]. Theoretical formulations and graphical illustrations related to the utility of integrating an energy loss to enhance training are provided in the SM, section B.
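One way to express such a steady-state test is sketched below. It is only an illustration of the idea, not the formulation given in the SM: the function `energy_is_steady`, the window length, the tolerance, and the sample energy values are all hypothetical.

```python
def energy_is_steady(energy_history, window=5, rel_tol=1e-3):
    """Check whether the last `window` energy changes are all negligible.

    A change is negligible when its magnitude, relative to the latest
    energy value, is below `rel_tol`.
    """
    if len(energy_history) < window + 1:
        return False  # not enough samples to judge
    recent = energy_history[-(window + 1):]
    scale = max(abs(recent[-1]), 1e-30)  # avoid division by zero
    return all(abs(b - a) / scale < rel_tol
               for a, b in zip(recent, recent[1:]))


# Hypothetical energy values recorded during training on one time interval.
relaxing = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
settled = [10.0, 5.0, 2.0, 1.0, 1.0001, 1.0001, 1.00005, 1.00005, 1.00005]

still_relaxing = energy_is_steady(relaxing)
has_settled = energy_is_steady(settled)
```

A test of this kind could complement the loss threshold, so that the transition between time intervals relies on a physical indicator rather than on a value tuned by trial and error.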

A potential point of improvement for the current PINNs-MPF framework is the integration of sequential training. Indeed, the evolution of the losses related to the triple junction motion could be optimized if the model captured the temporal behavior of the solution. The flexibility of the discrete resolution in the current implementation eases the integration of sequential learning through Recurrent Neural Networks. Comparing runtimes with theoretical solutions or the existing literature is beyond the scope of this study, as PINNs are inherently more time-consuming than classical approaches. Enhancing PINN solvers is a work in progress [cuomoscientific2022, chuang2022experience], and studies on the use of PINNs for solving multi-phase problems remain uncommon [Haghighat2022]. Therefore, the first goal of this study is to prioritize solution fidelity, training continuity, and efficient resource distribution. This approach is similar to that of recent studies focusing on multiphase phenomena, particularly in fluid dynamics. For instance, Haghighat et al. [Haghighat2022] have concentrated on the methodology and challenges associated with training PINNs for coupled flow and deformation in porous media. Zheng et al. [Zheng2022] have aimed to predict the sequential motions of flow simulations in a discontinuous manner, alternating between PINN training and MCBOM corrections. Amini et al. [AMINI2023112323] have focused on validating the inverse modeling approach targeting nonisothermal multiphase poromechanics. To conduct runtime studies on these multi-phase-field frameworks, inspiration could be taken from the study of Shukla et al. [SHUKLA2021110683]. Among other proposals, leveraging multiple GPUs and nodes and adopting a hybrid programming model could be beneficial.

6 Conclusion

An interconnected PINNs-MPF framework has been presented to resolve MPF simulations of interface dynamics. Our approach has been to address and benchmark the most central requirements for a reliable reproduction of MPF simulations, namely, curvature- and driving-force-driven interface motions and the evolution of a triple junction. In particular, the nonlinear evolution of curved interfaces and the effect of interface width on the interface kinetics were captured, demonstrating the capability of this framework in dealing with diffuse interfaces. The simulation of the triple junction established that the application of the sum constraint as a loss term, in combination with a parallelization scheme, works precisely and efficiently. The rationale behind the current development has been to demonstrate the feasibility of PINNs in capturing key features of microstructure evolution before increasing the complexity. This ensures a well-prepared transition to more advanced dimensions while maintaining the robustness and reliability of the methodology. All benchmark simulations are conducted in a single execution without the need for post-correction algorithms, such as intercalated phase correction algorithms. Various advanced techniques were encompassed in the PINNs-MPF framework, including training optimization, extended domain decomposition, and boundary condition propagation. These techniques have yielded accurate predictions and efficient training, establishing a robust foundation for future advancements in this research area. Potential avenues for further exploration include developing more unified solutions through sequential learning, conducting comparative studies with other PINN approaches in the literature, and integrating energy-based optimizations, which are a key component of the MPF Method.
As a next step, the vision is to extend the capability of the developed framework to address more intricate scenarios involving a greater number of phases, larger grids, and three dimensions. As the difficulty of the scenarios gradually increases, it is crucial to carefully address related issues when using machine learning to predict solutions, such as managing singularities, mitigating inaccurate predictions, and optimizing computation times.

Acknowledgments

RDK acknowledges financial support from the German Research Foundation (DFG) within projects DA 1655/2-1 (Heisenberg program) and DA 1655/3-1.

Supplementary material

Additional results and discussions as well as a video presentation of the framework are given in the Supplementary Material attached.

Data availability

The PINNs-MPF repository is available on GitHub at the following link:
https://github.com/SFETNI/PINNs_MPF

Competing financial interests

The authors declare no competing financial interests.

\printbibliography

\includepdf[pages=-]{S_M.pdf}