biblio.bib affiliationtext: Federal Institute for Materials Research and Testing (BAM), Unter den Eichen 87, 12205 Berlin, Germany **affiliationtext: [email protected]; ∗∗[email protected]
algorithm
PINNs-MPF: A Physics-Informed Neural Network Framework for Multi-Phase-Field Simulation of Interface Dynamics
Abstract
We present an application of Physics-Informed Neural Networks (PINNs) to handle MultiPhase-Field (MPF) simulations of microstructure evolution. To handle such problems, it has been showcased that a combination of optimization techniques extended and adapted from the PINNs literature, as well as the introduction of specific techniques inspired by the MPF Method background, is required.
The numerical resolution is realized through a multi-variable time-series problem by using fully discrete resolution.
Within each interval, space, time, and phases/grains are treated separately, constituting discrete subdomains.
An extended multi-networking concept is implemented to subdivide the simulation domain into multiple batches, with each batch associated with an independent Neural Network (NN) trained to predict the solution.
To ensure efficient interaction across different phases/grains and in the spatio-temporal-phasic subdomain, a Master NN handles efficient interaction among the multiple networks, as well as the transfer of learning in different directions.
A set of systematic simulations with increasing complexity was performed, that benchmarks various critical aspects of MPF simulations, including different geometries, types of interface dynamics and the evolution of an interfacial triple junction.
A comprehensive approach is adopted to specifically focus the attention on the interfacial regions through an automatic and dynamic meshing process (introduced as an adaptive Mesh-free optimization), significantly simplifying the tuning of hyper-parameters and serving as a fundamental key for addressing MPF problems using Machine Learning. The pyramidal training approach is proposed to the PINN community as a dual-impact method: it facilitates the initialization of training when dealing with multiple networks, and it is proposed as a method to unify the solution through an extended transfer of learning.
The proposed PINNs-MPF framework successfully reproduces benchmark tests with high fidelity and Mean Squared Error (MSE) loss values ranging from 10-4 to 10-6 compared to ground truth solutions.
Keywords: PINNs, Phase-field method, Parallel training, Machine learning, Neural networks, Adaptive mesh refinement
Contents
- 1 Introduction
-
2 The PINNs-MPF Framework
- 2.1 Discrete Resolution in Time
- 2.2 Extended domain decomposition
- 2.3 Pyramidal Training and Block Training
- 2.4 Adaptive mesh-free optimization
- 2.5 Propagation of BC across Sub-domains
- 2.6 Handling Multi-Phases (Vertical Optimization)
- 2.7 The Wheel of Optimizers
- 2.8 Transfer of Learning
- 2.9 Phase correction
- 3 Methods
- 4 Results
- 5 Discussion
- 6 Conclusion
1 Introduction
Today’s microstructure modeling has grown into an extensive branch of materials research, on the one hand, revealing the multi-physics of the concurrently evolving microstructural constituents across scales and, on the other hand, bridging these with the process, property and performance of materials. Despite remarkable advances over the last two decades, the rapidly growing complexities of materials’ chemistry and processing are constantly surpassing our actual microstructure modeling capacity, thus, decelerating materials innovation. A potential solution for this growing challenge is to utilize the recent developments in machine learning (ML) and deep learning (DL) for microstructure modeling [choudharyrecent2022]. The recently introduced Physics-Informed Neural Networks (PINNs) are showing promising capabilities in addressing physical phenomena [cuomoscientific2022]. By enabling non-linear approximations of solutions for Partial Differential Equations (PDEs), the PINNs counterpart of the conventional approaches, such as phase-field, fluid dynamics and finite-element methods, would allow for a mesh-free modeling framework, capable of transfer of the learning and without the necessity of local approximations or simplifications in the underlying physics [cuomoscientific2022, HAGHIGHAT2021113741, Ramuhalli1528518]. These promote new perspectives in microstructure modeling, motivating us to pursue the current line of studies.
First applications of PINNs to solve 1D PDEs (e.g., the heat equation, wave equation and Burgers’ equation) were quite successful, demonstrating its simplicity and efficiency [raissiphysicsinformed2019, YANG2019136, ZHANG2019108850]. Since then, deeper applications have been increasingly targeted in various areas including heat transfer [Sharmaheattransfer2023, ZOBEIRY2021104232], solid mechanics [HAGHIGHAT2021113741, DIAO2023116120, HENKES2022114790, GUO2023113334, bai2023introduction, S0219876223500135, GOSWAMI2020102447, khorrami2023artificial], electromagnetic [HAGHIGHAT2021113741, Ramuhalli1528518, CAO2023123622], turbulence modeling [Shirui2020, HANRAHAN2023109232], electro-chemistry [CHEN2022116918, Hofmann2023] and Energy [HAN20233450, PRIYADARSHI2024110231].
However, as the literature on resolving PDEs using PINNs has grown, several challenges have been revealed, highlighting the necessity for optimization techniques tailored to specific physical contexts. Jagtap et al. [Jagtap2020] proposed Extended Physics-Informed Neural Networks (XPINNs) to address moving boundaries. This method subdivides the domain into different mini-batches to locally compute the solution using distinct neural networks. XPINNs were tested on the one-dimensional viscous Burgers equation, employing various optimization techniques such as subdomain decomposition (in time and space), adaptive activation functions, independent hyper-parameter adjustment for each network, and a specific data processing method to characterize the average behavior of predictions along the subdomain interfaces. Kharazmi et al. [KHARAZMI2021113547] introduced hp-Variational Physics-Informed Neural Networks (hp-VPINNs), utilizing domain decomposition and h-refinement with dynamic mesh resolution adjustment to target the advection–diffusion equation (1D) and Poisson equation (1D-2D). Meng et al. [MENG2020113250] propose a Parareal PINNs (PPINN) framework for solving time-dependent PDEs using domain decomposition. The examples include ordinary differential equations (ODEs) and a 2D nonlinear diffusion-reaction equation. They employ various optimization techniques like alternating between Adam and L-BFGS optimizers, Quasi-Monte Carlo sampling for stochastic ODEs, and principally a parallel-in-time training approach with a serial update at the subdomain interfaces using the fine PINN solutions as a parallel prediction-correction process. Penwarden et al. [PENWARDEN2023112464] further enhanced these techniques by introducing stacked-decomposition inspired by causality literature (time-causality enforcement methods [krishnapriyan2021characterizing, BIHLO2022111024]), window-sweeping, alternance of optimizers and transfer of learning methods proposed to overcome training challenges, respect causality, and improve scalability by limiting computation per optimization iteration. The authors consider the 1D convection equation, 1D Allen-Cahn equation and 1D Korteweg–de Vries. While the reported methods have demonstrated promising results, further testing and extension are necessary to address more complex scenarios, such as two-dimensional configurations involving systems of equations, Multi-Phase Field problems and three-dimensional domains. These scenarios introduce additional challenges that require innovative approaches and optimization techniques.
A few applications of PINNs to handle diffuse interfaces were recently presented, especially to study evolution of multiple interacting phases or components [ROJAS2023100450, ZHANG202364, Haghighat2022, ZHANG2023111919, AMINI2023112323, wight2020solving]. Zheng et al. [Zheng2022] have developed a physics-constrained neural network (PCNN) to predict the sequential motions of flow simulations. Due to the challenging nature of multi-phase problems, the applied approach was based on the satisfaction of the mass conversation constraint when encoding the observation of the recurrent NN. A conservative boundedness mapping algorithm (MCBOM) was then called to correct the phase prediction, reported to be efficient for handling sequential notions in some testing benchmarks in 2D, e.g., the evolution of three phases in the shear layer and dam break problems. Haghighat et al. [Haghighat2022] proposed a dimensionless form of the PDEs combined with a sequential training strategy based on stress-split algorithms and multi-networking, applied for solving several problems in poroelasticity. It was found that both the size and structure of the NN, along with the hyperparameters of the optimizer, such as the learning rate, can significantly influence the quality of PINN solutions. Amini et al. [AMINI2023112323] introduced inverse modeling of nonisothermal multiphase poromechanics using PINNs, successfully applying the method to both single- and multi-phase flow in porous media.
During these remarkable advances, several challenges were also spotted in the field that remain to be addressed. A primary concern with PINNs involves the gradient pathology in the total loss function, due to larger gradients in higher-derivative terms, affecting the accuracy of the predictions [Haghighat2022]. Furthermore, ensuring systematic convergence of training becomes difficult, especially with the high computational cost of multi-networking. The use of a multi-networking approach can generate differing architectures or training data among separate NNs, which leads to imperfect alignment and thus discontinuities or artifacts along boundaries. Adjusting the size of training sets, including initial conditions, collocation points, and boundary conditions, was shown to help resolve this problem, but at the same time, this adds to the complexity and computational costs [cuomoscientific2022, ZHU201956]. The assembly and transfer of collective learning from multiple networks (transferring the learning) is another major challenge in PINNs. Here tuning hyper-parameters, even with a single NN, remains challenging: Adaptive weighting, while introduced to address feature dependencies, reflects non-systematic learning of solution patterns [CHEN2022116918]. In addition to these, the profound sensitivity of the solution to the hyper-parameters, encompassing variables such as the number of neurons, network layers, learning rate, and the composition of training datasets, emerges as a major issue that requires more attention when dealing with PINNs [CHEN2022116918, ARZANI2023111768]. Regarding Multi-phase Field (MPF) equations, they involve a coupled system of nonlinear PDEs with multiple phases and several parameters. The presence of higher-order derivative terms and a non-convex potential term further contributes to additional non-linearities and the emergence of several local minima. Additionally, the generated solutions often require correction algorithms to satisfy conservation laws, rendering classical PINNs unable to make accurate predictions. Therefore, incorporating conservative correction algorithms and adequate optimization methods becomes necessary [HUANG2021103727].
Multi-phase-field methods (MPFM) for microstructure modeling have proven to be powerful tools for capturing intricate multi-physics phenomena in multi-phase, multi-component materials [chen2002phase, steinbach2009phase, steinbachphasefield2009]. Being based on a generalizable free energy functional, a key feature of the MPF framework is its capacity to be flexibly expanded to different degrees of complexity which can be specific to the physics of the problem at hand. A representative demonstration of this flexibility has been explored in dealing with polycrystalline microstructures: Starting from curvature-driven interface kinetics, the MPF framework has gradually expanded to cover the chemical [GROSE2022111570], mechanical [SINGH2021107348] and even the physics of magnetic and electric contributions [Huo2023]. These led to the accurate investigation of complex phenomena such as precipitation, recrystallization, and phase transformation as well as various chemo-mechanically coupled phenomena [darvishikamachali2013grainPhD, kamachali2015texture, schwarze2018computationally]. Despite successful implementations and applications [kamachali2018numerical, tegeler2017parallel], an intrinsic challenge of the MPF approach is that with any further advancement and raising scientific questions, immediate needs for extensive programming and optimization efforts are required, which often spread to deep levels of redefining key parameters and functions as well as memory- and storage-structures in the code. These originate from the fact that any update to the model necessitates changes to the underlying free energy functional and corresponding equations of motion. As a result, it becomes difficult to keep the software updates up with the pace of emerging topics in materials research, for instance, the unanswered questions regarding non-equilibrium microstructure evolution in additively manufactured materials [WANG2020101538, PARSAZADEH2023101102] or the complex diffusion-interface couplings in novel high-entropy alloys [QIAO2021160295, Ziyuan2022HEA].
This work presents a comprehensive framework, termed PINNs-MPF, specifically designed to tackle the challenges of solving MPF equations using PINNs. By identifying the limitations of existing PINN methods when applied to MPF equations and recognizing the additional complexities introduced by these non-linear, convex, and constrained systems, we propose a unified approach that combines and extends relevant PINN techniques with MPF method strategies.
This strategy is inspired by the core development of the OpenPhase software for MPF simulations, which has demonstrated success in studying diffuse interface problems, but aiming to overcome reported limitations, by enabling a learnable solution while efficiently handling non-linearities, to integrate more physics.
[darvishikamachali2013grainPhD].
This framework introduces key novelties through its optimization strategies: (i) A extended domain decomposition approach is implemented that uses a central coordinating network (referred to as the "Master") to manage tasks and ensure continuity among subnetworks (referred to as the "workers") during full parallel training in space, time and phases. (ii) An extreme mesh refinement approach, with a distinct focus on the phase-field interfacial zones, is applied. (iii) We simplify and reduce the hyper-parameters as elaborated in the following. (iv) A synchronization step to ensure efficient interactions between the evolving phase-fields is applied. Finally, (v) a pyramidal training approach is applied as a robust alternative to adaptive weighting and an efficient solution for transferring learning of multiple-to-multiple and single-to-multiple networks.
Inspired by the MPF benchmark philosophy [darvishikamachali2013grainPhD], we test our PINNs across four core applications that built the MPF concept with a progressive complexity: First, we model the motion of a diffuse interface (propagation of a stable phase-field profile), second and third, we study the curvature-driven shrinkage of a grain in the presence and absence of a driving force, and fourth, we investigate the evolution of a triple junction that demonstrates the core principles of a multi-phase evolution. The results are compared against the outputs from OpenPhase and discussed in the context of our hyper-parameters and concepts implemented within our PINNs framework.
2 The PINNs-MPF Framework
The MPF temporal evolution is governed by the set of PDEs as follows:
(1) |
with phase-fields , the interface mobility, the interface energy and the interface width. The pairwise term counts for any additional driving force (e.g. chemical, elastic, etc.) influencing the interface dynamics. At every point in space and time, the summation of phase-fields/order parameter reads:
(2) |
with the number of phase-fields present. Further details of the MPF model are given in the Methods section 3. The PINNs-MPF framework is targeted to resolve Eqs. (1) and (2) in time and space and for various initializations. In doing so, PINNs aim to minimize a loss function with respect to the hyper-parameters . With the MPF model in hand, this is composed of several terms:
(3) |
where is the loss associated with enforcing the governing physics through PDEs:
(4) |
in which, r represents the residual of the PDEs or the physical loss (Eq. 1) and the number of collocation points. is the loss associated with enforcing boundary conditions (BC) on the simulation domain:
(5) |
with representing the approximated solution at boundaries and time using the neural network with parameters and, is the prescribed or observed value at the corresponding boundary and time. is the loss associated with enforcing initial conditions (IC), i.e., the spatial configuration of the phase-fields:
(6) |
that compares the approximated solution at initial state and time against the prescribed or observed value at the corresponding (initial) state . Finally, is the loss associated with the constraint Eq. (2) of the phase summation:
(7) |
where is the total number of phases in the given point. Unlike the other losses which are computed over a subpart of the domain, the loss over the sum constraint is computed for the entire simulation domain.
The coefficients , , and in Eq. (3) are weight factors associated with different loss terms, to balance the importance of each term. The objective of Eq. (3) is to find the optimal parameters that minimize the whole loss function, resulting in learned neural networks that are then capable of approximating the MPF solutions of a problem with the given initialization and boundary conditions. We note that since the MPF method is based on a free energy functional, reinforcing terms to account for the energy dissipation [SECCI2024105494] could be directly added to the loss function, further elaborated in the Discussion (section 5).
The architecture of the current PINNs-MPF framework is modular and composed of various implementations. To better illustrate the PINNs-MPF framework, Figure 1 presents a flowchart of the framework.
Panel (a) in Figure 1 presents the global architecture: Similar to the conventional MPF method, full discrete training in space and time is considered. Our experiments demonstrate that this approach not only eases the comparison to our reference MPF solutions but also allows for maintaining a constant demand for resources throughout the entire simulation, in contrast to continuous training, which demands growing consumption [cuomoscientific2022]. The general control of the training is handled through a master network, referred to as the MASTER-PINN, while the workload is shared among multiple parallel NNs (subdomains), referred to as WORKER PINNs (c.f. paragraph 2.2). The subdividing procedure is automated. Within each subdomain, the training methodology is carried out independently, employing all distinctive techniques as introduced in the following. This includes ‘pyramidal training’, a structured approach designed for handling the transfer of learning along the spatial domain (c.f. paragraph 2.3). The term ‘basket of PINNs’ (c.f. paragraph 2.7) refers to an ensemble of networks dynamically selected for optimization to enhance efficiency. Indeed, Without an efficient strategy, the resource demand can grow exponentially, leading to impractical training times and excessive use of computational resources. Additionally, the concept of the ’wheel of optimizers’ (c.f. paragraph 2.7) is introduced, denoting the interaction between the optimizers.
The training is in two directions: First in space, where the domain is decomposed into regular boxes (denoted as Nbatches). The degree of this decomposition depends on the complexity of the problem. The second direction of training concerns the phase-fields. For each of these spatial-temporal blocks, referred to as a batch, a single WORKER PINN, referred to as PINNi, is trained to locally predict the solution. Hereafter, the term "blocks" is employed interchangeably with "quarters," as the spatial domain is consistently divided into four blocks. Each PINNi handles a given phase in a given batch and, depending on the domain decomposition, interacts along boundaries for spatial continuity.
At the beginning of each training (time) step, the batch structure is separately built for each phase-field , i.e., Nbatches PINNs are created and inherit the same architecture as the MASTER-PINN. During the training, several optimization techniques could be applied (based on the initial user selection), including pyramidal training, the wheel of optimizers, the basket of PINNis, and the concept of Quarters-based training. An upper view of the global algorithm is shown in Algorithm 1, where , denote the minimum number of batches (coarser subdivision in case of pyramidal training) and the periodic epoch for scipy optimization respectively. In the following, we describe the details of various implementations of the PINNs-MPF framework and the related algorithm.
2.1 Discrete Resolution in Time
We consider the MPF equations as solving a multi-variable time-series problem through the NNs. This has also been discussed in previous studies [montes2021accelerating, HU2022115128, oommen2022learning, fetni2023capabilities]. The computational domain in the upper left corner panel in Figure 1(a) depicts this idea. Here the training process occurs in spatial domains/subdomains at fixed time intervals, analogous to time steps in conventional modeling methods, but with relatively larger intervals to harness the robust linearization capabilities of deep learning (DL). In order to proceed to a subsequent time domain, a global loss threshold is predefined to be achieved. This discrete resolution ensures a quasi-constant resource consumption, hence preventing any accumulating computational load along the time axis. To draw an analogy between discrete resolution in MPF modeling, the notation (also denoted for dimensionless resolution) is adopted here for both PINNs and MPF resolutions; in the context of PINNs, this notation refers to the length of the time interval. In the context of the PINN literature, this approach categorizes our framework within the class of time-marching or temporal PINNs, as discussed in the causality literature [PENWARDEN2023112464, CHEN2024111423, WANG2024116813].
2.2 Extended domain decomposition
This technique extends the domain decomposition approach introduced in previous PINN studies [Jagtap2020, SHUKLA2021110683, MENG2020113250] by adding a third dimension of decomposition and parallelization along the phases direction. This extension aims to handle the increased complexity of the target problem more effectively. A centralized network ensures synchronization among the multi-networks, facilitating a coordinated learning process across the decomposed spatial, temporal and phase domains. Each phase, managed by a corresponding Neural Network, requires concurrent access to the interaction terms () of other phases (Eq. LABEL:eq:Inter_term), which collectively form dual or higher-order junctions. These interaction terms demand a synchronization method (Algorithm 2 ). Consequently, parallel implementation is crucial for the precise calculation of interaction terms, as per the NN definitions. A serial approach cannot adequately manage these interactions, rendering parallelism indispensable rather than merely an improvement to be quantitatively compared against serial execution or existing methods, if available.
One challenging aspect of a PINNs-MPF is the heavy computational costs associated with the size of the simulation domain and the number of phase-fields. Here the concept of multi-networking is essential to distribute this computational load efficiently. This involves the parallelization of multiple PINNis and their associated subdomains obtained through a structured domain decomposition. In this context, a MASTER-PINN is designed to distribute tasks, training data, and collate the learned outcomes from the WORKER-PINNis/PINNis. Each PINNi operates independently and concurrently, utilizing a dedicated thread in the computer. It functions indeed as an autonomous entity responsible for processing a specific batch within the spatial domain and managing a given phase-field within the MPF domain. All PINNis are assumed to have the same architecture (number of layers and neurons per layer, learning rate, etc.). This facilitates complete parallelization of the training process and ensures spatial continuity throughout the training.
As depicted in the flowchart in Figure 1, the MASTER-PINN does not undergo any training but performs several central tasks, including (i) preparing the training data set, (ii) initializing and assigning each WORKER PINNi for training, (iii) initiating and controlling the training process, (iv) facilitating communication among the WORKER PINNs, (v) managing the temporal transition of the training, and (vi) transferring the (ideally) successful learning from the actual training to another simulation. From a technical point of view, it is noteworthy that the current multi-networking approach enables the transfer of learning from multiple networks to a single network. This capability is available within the code through the pyramidal training option, wherein the subdivision could be progressively transformed from fine to coarse, ideally converging to a single network (c.f. paragraph 2.3).
The communication between PINNis could be classified into two categories: a spatial ‘horizontal’ communication which is required to ensure spatial continuity across the boundary between their subdomains (c.f paragraph 2.4) and a phase-field ‘vertical’ communication that allows the coupled co-evolution of the phase-fields in the context of the MPF model (c.f paragraph 2.6). To facilitate the training of the WORKER-PINNis and their efficient management by the MASTER-PINN, the concept of the trinity batch-PINNi-phase is crucial. Each PINNi is assigned a unique identifier by associating it with a specific batch number and phase-field number, as illustrated in Figure 1a. Consequently, the local neural network incorporates horizontally stacked spatial-temporal coordinates , the output (prediction) of which is a single array representing the fraction of the corresponding phase, ranging from 0 to 1. By simplifying the multi-phase problem into discrete single-phase problems in this manner, the computational approach is streamlined. This necessitates efficient interactions between the phases, which is treated through the ‘vertical’ communication. Finally, it is preferred to spatially decompose the simulation into Nbatches equal batches.
2.3 Pyramidal Training and Block Training
Merging the hyper-parameters of various PINNis (WORKERS) at once and with equal significance can result in overwriting certain features in the domain. For this purpose, we have thought of a gradual pyramidal training and merging. The concept of pyramidal training draws inspiration from the data augmentation principle in ML [HUTER2020109488, CARPENTER2002183]. It is based on the assumption that adding more data to a pre-trained model serves as a warm starting point, enabling the NN to better capture crucial features before further data processing. We propose to implement a pyramidal training paradigm in conjunction with another concept of ‘training on blocks’. These two complementary concepts are showcased in Figure 1(b) and described as follows: Once the spatial domain is subdivided into multiple batches, each one can be further divided into distinct domains, giving a pyramidal structure. Each of these finer domains in the pyramidal structure is called a block or a quarter. The procedure of subdivision can continue as depicted in 1(b). After the pyramidal subdivision is completed, the training process initiates at the finest subdivision level (i.e., the base of the pyramid). However, instead of training all Neural Networks (NNs) within all batches, within each block/quarter, PINNi with the highest number of interfacial points is selected for training; this selection concerns the first phase and the selected PINNis serve as a reference for other phases. Similarly, for the other phases (in the vertical direction), training is assigned to the PINNi within the same batch as the reference PINNi within each block/quarter. This selective handling reduces the computational costs and makes the training focused on hot spots in the simulation domain. Once target losses are reached for the selected PINNis, the training stops at the finer level and transitions to a coarser level using the same algorithm described above. The training on the coarser level begins with a warm-start approach, where each selected PINNi receives learning from the previously selected PINNi at the finer level, within the same quarter and sharing a spatial intersection area, thereby achieving a data augmentation-like training. The transition from one level of the pyramid to another is set by a squared relationship to ensure a pyramid-like subdivision:
(8) |
where is the number of batches in the next upper level until ( is reached, that means the continuation of training the PINNis on the whole spatial domain. This gradual resolution has two main advantages: On the one hand, it allows a fast identification of the optimal set of hyper-parameters, and therefore key features, without the need for an adaptive weighting strategy. On the other hand, it guarantees the training of the same numbers of NN independently of the number of subdivisions (). For that, it is worth mentioning that such a strategy is highly recommended to be activated for the early stages of the training and then set off to speed up the training.
2.4 Adaptive mesh-free optimization
The Adaptive mesh-free optimization (AMFO) is here introduced as a PINN implementation extending the adaptive mesh refinement (AMR) concept, from the phase-field literature, to dynamically adjust collocation point distributions. AMR variants have indeed proven valuable for phase-field and MPF models, allowing efficient capture of evolving interfaces and steep gradients. By incorporating adaptive collocation point sampling, this approach leverages the mesh-free nature of PINNs while retaining AMR benefits like improved accuracy and robustness [LI20127926, GUPTA2022115347, XU2022108891, FREDDI2023100127, bijaya2023multilevel]. The ease of integrating adaptive sampling with PINNs makes this approach attractive for tackling challenges posed by phase-field models, such as complex geometries and localized high-gradient regions. The proposed technique employs adaptive collocation point sampling to concentrate points around interfacial regions, a denoising loss function to reduce noise in non-interfacial areas, and dynamic resampling to continuously adapt the overall collocation point distribution.
2.4.1 Adaptive re-meshing
In the presence of diffuse interfaces, continuity among the phase-fields and across the spatial domains is required. Inspired by the storage concept proposed for the OpenPhase [darvishikamachali2013grainPhD], we adopt a mesh structure that mainly focuses on the interfacial regions, neglecting interior grain zones where phase-fields are equal to 0 or 1. This strategy not only reduces the computational costs but also reduces the tendency for an error when coping for the continuity across the domains/batches. This is further promoted by the high capacity of fitting and extrapolation of ML which we elaborate on later in the Discussion (Section 5). To this end, some key hyper-parameters are user-predefined as follows. First, the minimum and maximum number of IC points per batch N and N are defined. Once the ratios of 0 and 1 phase-values per batch are given, the number of IC points per batch is set, following:
(9) |
in which donates the percentage of interfacial points per batch. The sigmoid-like function in Eq. (9) allows gradual densification of the mesh depending on the importance of the zone, so that the diffuse interfacial area and triple junctions in a multi-phase configuration should be allocated by the higher number of IC points. Using Eq. (9), the collocation points are generated dynamically around the IC points (typically within a circle of radius ) through a predefined ratio to get , where is the number of collocation points per batch and is a user parameter that could be adjusted depending on the nature of the problem. Experimenting with our benchmark studies below, we varied this coefficient between 20 to 40. Another potential improvement, to reduce the hyper-parameters in Eq. (9) and to increase the significance to triple junctions, is to define the density of interfacial points dynamically. This would involve computing the percentage of interfacial points per batch dynamically, thereby determining the number of interfacial points per batch as follows:
(10) |
where represent the limits of each PINNi. Recall that corresponds to the associated percentage of interfacial points, emphasizing that denotes the interfacial width, and these parameters are already available within the code. Additionally, is an introduced hyper-parameter (user input) representing the local density of the mesh, indicating how much the mesh should be densified based on the number of interfacial points inside. With this improvement, the geometric hyper-parameters are reduced to two variables: the number of subdivisions (N) and the local density (), considering that the coefficient could be set at 20 for all numerical experiments. This simplification allows focusing on the architecture of the neural networks. The PINNis sharing the same batch index while handling different phases should use common collocation data (common batches). This introduces an additional stage of optimization. Further details on this can be found in the Supplementary Material (SM), section C.
2.4.2 Denoising Loss
Focusing on the diffuse interfacial zones gives an efficient and precise approximation of the MPF solution, but still does not exclude random prediction away from the interfaces. To address this issue, a denoising loss, functioning as a corrector, is introduced. A batch is typically made of three distinct regions: A, B, and C, where region A corresponds to the interface, region B is the grain-containing area, and region C encompasses areas far from the interface (no-grain), see Figure 2). By dynamically identifying areas of interest, as depicted in Figure 2), the additional loss term is applied within unlabeled regions that are free of collocation points, such that, each PINNi operates to minimize the prediction of phase-field values in regions identified as ’no grain’ while maximizing predictions in the grain-containing regions. Eq. 11 describes this loss, where the Mean Squared Error (MSE) is calculated based on the prediction and is conditioned on whether the region is identified as ’no grain’ or ’grain,’ denoted by and , respectively:
(11) |
2.4.3 Dynamic Resampling
Resampling, applied after each Scipy optimization, serves as a dynamic process that benefits both converged and non-converged PINNis. The collocation points undergo automatic resampling. This approach offers a dual purpose: converged PINNis are subjected to testing with new but relatively similar populations. This testing mechanism safeguards against overfitting and ensures the robustness of the predictions as confirmed by previous PINN studies [WU2023115671, raissiphysicsinformed2019, JAGTAP2020109136, li2022dynamic, daw2023mitigating]. Simultaneously, the non-converged PINNis are exposed to new, unexplored samples to have a better chance to enrich their training and therefore a better convergence.
2.5 Propagation of BC across Sub-domains
As shown in Figure 1(b), the propagation of BC along the neighbors in our decomposed space is required to ensure continuity and therefore a reliable prediction of the MPF solution. To this end, a customized BC loss is introduced that concerns the spatial internal boundaries across subdomains/batches/PINNis handling a given phase-field. Here, each PINNi optimizes its weights by identifying its neighbors while considering the boundary predictions of its neighbors that are used to minimize the additional BC loss. The BC loss function is set as the sum of differences between the prediction of the current PINNi and those of the neighbors, depicted also in Figure 1b. A detail that enhances the handling of the boundary condition is to consider the PINN00xx (in batch 0 for each phase) as an absolute reference after addressing its PDE and IC losses. Subsequently, for a given PINNi, the west and inner neighbors are regarded as temporal references. This leads to a systematic propagation of the boundary conditions, ensuring that all PINNi instances update their weights based on the PINN00xx. These interactions create mutual dependencies among neighboring domains/batches/PINNis, fostering spatial-temporal continuity during the training process.
2.6 Handling Multi-Phases (Vertical Optimization)
The full discrete construction of the current framework in space, time and phase-fields allows the instant communication between PINNis, through the MASTER PINN, that is required to perform the multi-phase studies. This is achieved in two steps: In the first step, one is entailed to identify the possible interaction(s) between any two phase-fields. This must be done for every PINNi. Obviously, with every interaction found, an associated PINNi for that phase-field is identified as well. A fundamental assumption here is that two given phase-fields interact when their diffuse interfaces intersect. This results in obtaining a batch-phase space. In 2D, a maximum of three phase-fields intersect, forming a triple junction. An example of this idea is illustrated in Figure 3 where we find that out of the 48 possible batch-phasе combinations, 8 exhibited no interactions. This approach has a significant impact when handling a large number of phase-fields. Hereafter, the term ’grain’ is also used interchangeably with ’phase’ once reflects the same entity.
The MASTER-WORKER PINNi structure enables a computational- and memory-efficient calculation of the Laplacian term in Eq. (1). Each PINNi responsible for a specific phasе-field communicatеs its computеd Laplacian tеrm instantly to thе nеighboring PINNis working on the same batch but handling different phase-fields. This is essential for avoiding the computation of multiple Laplacian terms by each PINNi and enhancing the computational efficiency. The numerical implementation of the MPF model (Eq. (1)) is simplified into the parallel computation of the interaction terms for each phase within each batch. Subsequently, a robust communication protocol ensures synchronization of the computation for the correct equation terms, as detailed in Algorithm 2. To ensure synchronization, a given PINNi needs to wait to receive all the required information from the PINNis in interactions. As a reminder (c.f. paragraph 2.4), an optimization detail involves the use of a shared collocation dataset for all PINNis handling different phases within the same batch. For simplification, the collocation dataset corresponding to the first phase in the initialization (phase in algorithms 1 and 2) is consistently selected as a reference.
2.7 The Wheel of Optimizers
The framework effectively combines Adam’s stochastic optimization with Scipy’s L-BFGS, fostering a collaborative approach. Adam operates at each epoch, contributing its stochastic optimization capabilities, while Scipy, notably the L-BFGS optimizer, is periodically selected. During these selections, L-BFGS efficiently leverages information from the Hessian matrix to optimize weights. When a given PINNi successfully converges, reaching a loss value below the predetermined threshold, it temporarily steps back, entering a wait state. Meanwhile, other PINNis, selected from a shared ’basket of PINNis,’ proceed with their L-BFGS optimization. This metaphorical basket illustrates the practice of choosing specific PINNs for weight optimization and then reintegrating them, symbolizing a cyclic process; hence, the terminology ’wheel of optimizers’. The purpose of this cyclic process is to ensure that optimized learning results in synchronized motion between batches, leading to accurate predictions at boundaries, and effectively captures the dynamics across phases, including terms from PDE.
2.8 Transfer of Learning
The highly symmetric and parallel construction of the current implementation enables the transfer of learning in multiple directions. Within the same simulation, we transfer the learning spatially through the pyramidal training (between the levels of pyramids) and across boundary conditions. The transfer also occurs temporally, from one spatial domain to its image in the next time step. These allow us to create checkpoints to continue/restart a simulation without losing the accumulated learning. A particular application of such transfer is to optimize the parallel performance of the worker PINNis. When a worker PINNi fails to converge after a certain number of Scipy optimizations, it is then warm-started by receiving the learning from the nearest neighbor already converged. This is to enhance its convergence rate and overall performance. Such a situation could arise indeed due to various factors, including insufficient data or training samples, numerical instabilities, singularities or discontinuities in the data and inadequate regularization. Additionally, the framework allows real-time processing, facilitating easy debugging and dynamic adaptability.
2.9 Phase correction
As a requirement to handle MPF problems, a dynamic correction mechanism for phase predictions is hereafter introduced. This algorithm conserves the summation of the phase fields of different phases within interfacial regions. PINNs often fail to predict correct solutions, necessitating correction algorithms to satisfy conservation laws [HUANG2021103727]. Therefore, incorporating conservative correction algorithms and adequate optimization methods is essential for accurate predictions [Zheng2022].
3 Methods
4 Results
4.1 Simulation Results
Analogous to the MPF approach, the reliability and performance of a PINNs-MPF for studying the interface dynamics in polycrystalline materials need to be benchmarked capturing three basic aspects: First, the curvature-driven motion of an interface, second, the correct motion of the interface under any additional driving force and third, the force balance at the junctions of interfaces [darvishikamachali2013grainPhD]. The first two requirements in this set can be generally written as:
(12) |
with the local interface velocity normal to its plane, the local curvature of the interface, and a constant driving force, where with consider in Eq. (1). The third requirement concerning interface junctions necessitates the MPF to fulfill Young’s law, which relates the angles between interfaces at the junction to the interfacial energy. For a triple junction and isotropic interface energy assumed, the three interfaces at equilibrium meet at a 120∘ angle. To test the PINNs-MPF framework, certain simulation scenarios related to the above requirements are studied as listed in Table 1, along with related physical parameters and applied hyper-parameters in the PINNs implementation in Table 2. We note that for the grain shrink scenarios, the number of IC points per batch serves as an input parameter, and Equation 9 is used for its generation (with and varied up to 90). However, for the triple junction scenario, this process is automated through Equation 10, where is set to 40 in order to dynamically populate the batches during training, depending on the evolution of the solution.
|
Physical parameters | ||||||||||||
Benchmark | Nx | dx | Ny | dy | Nt |
|
Mobility |
|
Phases | ||||
|
100 | 0.01 | 100 | 0.01 | 1000 | 10dx | 1e-4 | 1 | 1 | ||||
|
64 | .. | 64 | .. | 100 | 7dx | 1e-4 | 1 | 1 | ||||
|
64 | .. | 64 | .. | 150 | 4-9dx | 1e-4 | 1 | 1 | ||||
Triple juncion | 64 | .. | 64 | .. |
|
6dx | 1e-4 | 1 | 4 |
NN Architecture | Optimizers | ||||||||||
Benchmark | NNs | HL/NN | Nodes / layer | Adam | L-BFGS | ||||||
LR |
|
|
Per. | ||||||||
|
1 | 6 | 32 | 1e-3 |
|
1000 | 100 | ||||
|
4 | 6 | 32-128 | 1e-4 | idem | 1000 | 50 | ||||
|
4 | 6 | 128 | 1e-4 | idem | 1000 | 50 | ||||
|
|
6 | 128 | 1e-5 | idem | 500 | 50 |
-
•
(*) For the triple junction, the pyramidal approach was tested for the early stages of the training.
-
•
(**) When using the pyramidal training, 64 NNs are inititalizated, while 16 NNs are selected for training using the the concept of the basket of PINNs.
To further highlight the effectiveness of multi-networking in conjunction with a combination of optimizers for training PINNs, an additional numerical experiment is proposed as follows: reproducing the benchmark of grain shrinkage under constant driving forces with varying numbers of hidden layers and neurons per layer. Specifically, a fixed number of six hidden layers with the number of neurons ranging from 64 to 512 was applied. This was achieved by employing a cyclic optimization strategy alternating between the Adam optimizer and the L-BFGS optimizer (referred to as the wheel of optimizers). Associated findings and comments are provided in the discussion section.
Additionally, we illustrate in Table 3 the progressive increase of the optimization techniques to deal with the induced complexity by the tackled scenarios.
Benchmark | Activated Techniques |
|
|
|||||||
|
|
4 | 3,297 | |||||||
|
|
5 | 416,005 | |||||||
|
|
5 | 416,005 | |||||||
Triple junction |
|
9 | 1,414,417 |
Furthermore, for the scenarios involving grain shrinking and triple-junctions, the MPF solutions from OpenPhase calculations are used [darvishikamachali2013grainPhD]. The associated scheme is provided in the SM, section A. All numerical implementations were coded using TensorFlow and performed on standard workstations with an AMD Ryzen Threadripper PRO 5975WX 32-Cores CPU (64 threads).
4.2 Traveling Interface under Constant Driving Forces
As a key starting point, the behavior of a planar interface traveling under a given driving force is explored. In the absence of the curvature, this can be simplified to a 1D problem in which the phase-field profile propagates according to [steinbachphasefield2009, darvishikamachali2013grainPhD]:
(13) |
The results of this campaign are gathered in Figure 4. The utility of analyzing a traveling wave solution lies in its ability to accurately predict the shape and velocity of the phase-field profile. Figure 4(a) presents the setup of the BC and IC points, as well as the placement of PDE collocation points for this problem. As described in paragraph 2.4, the meshing is predominantly focused within the interface regions, where the PINN model is trained to predict the solution. Excellent agreement is obtained between the PINNs-MPF and the theoretical solutions at various time instants, Figure 4(b). Figure 4(c) shows the spatial evolution of the interface resolved by the PINNs-MPF. To trace the interface motion and its velocity, a representative point with a phase-field value at is selected. Figure 4(d) compares the theoretical and predicted velocities, revealing an accuracy approaching unity after 2500 epochs of training, with a total 10 minutes of computation time.
For a planar interface (no curvature), the interface velocity is expected to linearly scale with the input driving force (Eq. (12)). Figure 4(e) shows the interface velocities obtained for various driving forces, perfectly matching the theoretical expectation. Here the PINNs-MPF were first trained for a single value (Figure 4(f)) and the learning is then transferred to predict the interface velocity for various values. Such transfer of learning is typically carried out within a significantly reduced number of (maximum 100) epochs of training, as the training is not always evitable [weinan_deep_2018]. From a technical perspective, this analysis tests localized meshing, the transfer of learning and full-discrete resolution. It is worth noting that here a single neural network effectively handles the 1D scenario.
Configuration | Neural Networks | HL | Neurons/HL | MAE | MSE |
1 | 4 | 6 | 32 | ||
2 | 4 | 6 | 64 | ||
3 | 4 | 6 | 128 |
One can immediately expand the driving-force-driven interface kinetics to 2D where the planar interface is replaced by a circular interface. In this scenario, the interface has a curvature, but its effect is negligible under the condition that (Eq. (12). Although the linear scaling of the interface velocity holds, this scenario imposes a challenge for the PINNs-MPF going from a cartesian to a spherical coordinate. PINNs-MPF predictions versus theoretical solution from OpenPhase are shown in Figure 5. Here initial trials involving a single NN (used for the 1D case above) revealed limitations marked by training instabilities and noise (c.f. the SM for qualitative and quantitative comparison between PINN predictions and ground truth solution for a single NN). However, stable training was obtained when applying four neural networks. Indeed, subsequent refinements increased the prediction accuracy by progressively increasing the number of neurons starting from 32, resulting in an optimal configuration of 128 neurons and 6 hidden layers, as observed in Figure 5. To quantify the differences between each PINN prediction and the theoretical solution, it is proposed in Table 4 measures of the MSE and Mean Absolute Error (MAE) for each prediction compared to the ground truth solution. The architecture with 128 neurons not only accurately matched the theoretical solution (with MAE and MSE values of and respectively) but also outperformed the phase-field solution obtained using an explicit scheme. Thus, it is hereafter proposed to keep this architecture for subsequent benchmarks as it allowed to deal with the double-well potential term and the non-convex potential term in Eq. LABEL:eq:dual_form on a one hand, and to fix the related set of parameters of this architecture, especially for the multi-phase field scenario.
It is worth noting that changing from the 1D to 2D setup, the PINNs-MPF is well capable of capturing and maintaining the correct interface thickness throughout the simulation.
4.3 Curvature-driven Interface Motion
Capturing the natural curvature-driven motion of an interface is the most central capability of a phase-field model, encoded into the gradient energy term and related Laplacian in the governing free energy functional and equations of motion, respectively. When the Laplacian terms take control over the interfacial motion, i.e., in Eq. (12), nonlinear kinetics arise: Here we study the shrinkage of a circular phase (grain) with , giving . This gives another level of complexity to test the PINNs-MPF framework. The number of time intervals required for the resolution of this case was 150, corresponding to , while is required to resolve the same problem using the PF scheme (c.f. the SM, section A). The results are presented in Figure 6. Here, the PINN solution fits the theoretical one. However, the phase-field linearized scheme results in a premature shrinkage of the grain. The same result is obtained even when trying with smaller time steps. This case, where the motion is fully governed by the interface and no external driving force is applied, demonstrates that PINNs can efficiently handle non-linearities in the phase-field context. Indeed, it is worth reminding that we use the tanh activation function in hidden layers to capture diverse non-linear patterns in physics-informed input data. For the output layer, the sigmoid activation ensures bounded predictions, making it suitable for interpreting outputs. The specific limits (the order parameter) remain between 0 and 1. These tailored non-linear activations allow PINN to effectively handle the grain shrinkage scenario where the motion is controlled by the interfacial energy and curvature.
The pattern of the phase-field solution depends on the interface width ; for a thin interface (), there is a slight deviation from the theoretical solution, while the results improve with increasing the width. This behavior is systematically investigated [darvishikamachali2013grainPhD], showing that it arises from the numerical limitations of calculating the Laplacian and consequently the interface curvature. This is a feature of the diffuse phase-field interfaces. The PINNs-MPF predictions were studied for three interface widths of 4dx, 7dx and 9dx. In all configurations, a consistent alignment is evident between the PINNs-MPF and the OpenPhase solutions. This is best visible for the thinnest interface shown in Figure 6b. This set of benchmarks demonstrates that the PINNs-MPF is not only capable of handling the nonlinear interface kinetics but also has a good sense of the interface width and its impact on the solution. The latter is crucial for future developments of the model to deal with interfacial phenomena, such as interfacial elasticity and solute segregation [wang2021incorporating, zhou2022revealing].
4.4 Establishing Equilibrium Triple Junctions
A major capability of MPF is to handle interfacial junctions. This is not only to fulfill Young’s law but also to ensure the evolutionary path leading to a configuration with minimum energy. To test our PINNs-MPF framework, a system with four phase-fields (grains) and six triple junctions was considered. The initial mesh is visualized in Figure 7. Using our method (c.f. the SM, section C), the IC points are densified within the interfacial region, while some random points were also selected away from the interfaces to allow PINNs to learn better. To streamline computations, BC are selectively addressed along the interface lines between different batches. It is worth highlighting that the initialization having sharp angles presents a challenge for PINNs to produce accurate solutions. This is because of the curvature-driven nature of the interface dynamics, thus, initiating away from smooth curves/shapes results in complex evolutionary scenarios.
The simulation results are visually depicted in Figure 8. Here, each grain is shown in its initial and final state of the simulation, compared to the simulation results from OpenPhase. Figure 8(m)-(o) show the total interfacial region in the initial and final state. The comparison between PINNs-MPF and OpenPhase reveals an excellent agreement. The microstructure evolves into an equilibrium state where the angles at the triple junctions approach 120 ∘. One can see that the motion of the phases is well synchronized, allowing the global solution to converge to the equilibrium state. Note that the application of boundary conditions on the outer boundary of the simulation box requires the interfaces on both ends to adjust in an energy-minimizing manner.
The difference between the PINNs-MPF solution and OpenPhase lies in the interface thickness. This difference may be justified by the gradient topology of each methodology: It is indeed noted that the gradient calculation employed in the implementation, specifically using the "gradient.tape()" module in TensorFlow, differs from the Laplacian used in the regular grid computation for MPF calculations. Additional details about the difference in the computation of the Laplacian terms are given in the SM, section A. This difference explains the minor discrepancies related to the interface width. However, it can be asserted that PINNs predictions exhibit greater fidelity compared to OpenPhase when preserving similar interface widths to the initial state. We also note that, due to the curvature-driven nature of grain growth problems, handling an initialization with sharp angles (90∘) is challenging. This demonstration of the correct evolution of such a critical initialization consolidates the use of such a framework in increasingly difficult contexts. From a technical standpoint, it’s worth noting that this results in relatively slow training in the beginning until addressing the sharp angles, after which the training accelerates.
Initialization MPF Solution PINN Predictions
5 Discussion
Analyzing the training dynamics through the loss functions
The training of PINNs-MPF involves various loss terms, described through Eqs. 4 to 7.
To understand and optimize the training process necessitates a comprehensive analysis of each loss in terms of their individual behavior and collective synchronization.
For the grain shrinking scenario (paragraph 4.3), the evolution of various mean losses for the four PINNis across epochs is shown in Figure 9.
Here, (i) each loss curve corresponds to the average of the four corresponding losses from the four neural networks (NNs) and (ii) the entire training process is technically conducted in a single execution. However, for the sake of clarity in illustrating the evolution of losses, we choose to present the simulation in two stages, such that, after convergence in the initial time interval, the acquired learning is then applied to restart the simulation.
Figure 9a shows the evolution of the different losses for the first time interval (-). The threshold is set at , marking the transition point to the next time interval. The training initially focuses on the interfacial regions, activating only the PDE and IC losses. Once the solution is computed in these regions, BC and denoising losses are subsequently activated. This aims to establish spatial continuity between batches and eliminate noise within both grain and non-grain areas, respectively. The discrete activation of BC and denoising loss has proven to be more efficient in enhancing training compared to continuous activation. It is proposed to categorize the losses into intrinsic losses, including partial PDE and IC losses, representing core elements for the NN. On the other hand, extrinsic losses, here BC and denoising, play a role in fine-tuning the solution and establishing continuity with neighboring NNs. These extrinsic losses could be subjected to optimized integration.
Figure 9(b) shows the local minimums corresponding to this transition, where all maximum losses of PINNis (as defined in Eq. (3)) fall below this threshold, indicating the onset of training for the next time intervals. The gaps between peaks correspond to the length of each time interval. The simulation could be then divided into two distinct phases. Initially, when the grain radius is substantial, the training progresses relatively fast. Subsequently, in the second phase with a smaller grain size, the BC loss exhibits an inverse relationship with the grain radius, becoming the most influential. The seamless transfer of learning between time intervals becomes evident as the PDE loss initially tends to decrease and then stabilizes in the later stages of training. Notably, the magnitudes of losses appear reasonable, with the PDE loss being approximately ten times greater than that of IC. If the BC loss remained activated throughout the entire simulation, in the subsequent benchmark (triple junction), it was managed through discrete activation, mirroring the approach adopted during the initial time interval.
For the triple junction scenario (paragraph 4.4), the evolution of various losses (PDE, IC, BC, and mean loss) for the 16 NNs across epochs is illustrated in Figure 10.
In this context, the threshold is now set at .
Similar to the grain shrinking scenario, the first time interval is addressed, and then the cumulative learning is used to restart the simulation. For this purpose, two approaches are compared, with and without a pyramidal approach (respectively in Figures 10 a and b). It is noted that the only difference between the two simulations is the number of batches for the fine subdivision (c.f. Algorithm 1). Specifically, =16 and =4 for the pyramidal approach, while ==4 for the second configuration. In Figure 10(a), the impact of the optimization techniques is subsequently showcased: wheel of optimizers and the pyramidal training. The wheel of optimizers is activated with a period of 50 epochs; L-BFGS-B initially decreases the global loss from a magnitude of to MSE, then a progressive decrease is achieved with subsequent rounds to below the threshold of MSE. Then, Adam allows the transfer of learning from level 1 to level 2 of the pyramid, where the whole spatial domain is handled. Even if the global loss increases initially, it quickly falls below the target threshold (the 16 NNs converge together within 850 epochs), demonstrating the model’s capacity to generalize the learnable solution to the whole domain. However, the non-adoption of such a strategy makes it hard to enforce the convergence of all the NNs, as seen in Figure 10(b), and the global loss stagnates above the threshold even after 6000 epochs of training, using the same set of parameters as in Figure 10(a) except for the non-activation of the pyramidal training. It is worth noting that such a comparison was subjected to excessive re-execution for repeatability with the same outputs; the 16 NNs failed to converge simultaneously without the activation of pyramidal training during the first time interval. It is also suggested that this finding is also attributed to two main impacts of pyramidal training: the progressive data augmentation basis of this concept, and the facilitation of addressing boundary conditions, as accurate boundary predictions are a direct consequence of good convergence within the domain. Additional details about the domain decomposition related to the pyramidal approach are provided in the SM, section D.
The black dashed line in Figure 10c corresponds to the total mean loss, exhibiting oscillations that dip below the threshold (red line) for the enhancement of the next training interval. Here, the boundary condition loss is not visualized due to its discrete activation; however, its magnitude is noted to be in the range of . It is noteworthy that all losses for all NNs should fall below this limit (not just the mean values). Previous observations regarding the behavior of the IC and PDE losses remain relatively valid in terms of the stabilization of losses across epochs. However, the declining aspect observed in the previous scenario is not evident here. This discrepancy could be attributed to the complexity of the triple junction evolution and dynamic interactions between different phases when compared to the single-grain shrinkage scenario. The sum loss (Eq. (LABEL:eq:summConstraint)) has the lowest magnitude (), and its constant values reflect the algorithm’s capacity to dynamically correct predictions, ensuring adherence to the summation constraint.
Addressing Nonlinearities in Microstructure Prediction
The current implementation showcases the effectiveness of this method in addressing non-linearities such as in the MPF model, offering good predictions of microstructure evolution in different scenarios above. The ability to capture both curvature-driven and driving-force-driven interface dynamics is well demonstrated. Typical MPF simulations employ a time-stepping approach with a magnitude of up to 10-4 and a minimum of 1000 steps across the given scenarios that is to ensure computational stability. In contrast, applying PINNs is shown here to allow for the selection of much larger time-stepping up to 200 time steps, except for the traveling interface simulation where computational speed was a focal consideration. This difference stems from the fact that while the conventional MPF is limited due to the spatial resolution, the PINNs-MPF focuses on the overall microstructural patterns, substantiating its ability to capture underlying physics while maintaining stable computational schemes.
Tuning the model hyper-parameters
One of the crucial points when dealing with PINNs is the tuning of the hyper-parameters. Here, any simplification made for hyper-parameters directly impacted convergence and can reduce the efforts for trial and error. In terms of the training dataset, this is automatically set, simplifying user inputs to the initialization, physical, and geometrical parameters.
On the other hand, we have adopted major concepts from conventional MPF frameworks, namely the OpenPhase [darvishikamachali2013grainPhD]. This includes considering our simulation domain heterogeneously, with a focus on the phase-field interfacial region, while parallelizing the handle of individual phase-field variables. This demonstrates that the PINNs-MPF can successfully inherit optimization techniques already applied in classical computing approaches. Especially, the PINN resolution can be then assimilated as a multi-variable sequential learning, showcasing its adaptability and integration capabilities.
The application of multi-networking and training by blocks demonstrates the feasibility of addressing diffuse interface problems with simplified hyper-parameters.
For specific challenges handling moving boundaries, the implementation of adaptive weights (gradient scaling algorithm) was unavoidable [Haghighat2022]. It is therefore worth noting that the loss terms in the implemented PINNs-MPF are by default left unweighted, i.e. in Eq. 3. This reduces the complexity of the current framework and the need for meticulous tuning, especially when considering the diverse nature of loss terms in Eq. 3. Moreover, this kind of algorithm could be useful to integrate in more complex scenarios. We note that our numerical experiments have indicated that the extrinsic losses can be weighted to optimize performance; for instance, reducing the weights for the BC loss to expedite training, and conversely, increasing the weighting of the phase summation loss to allocate more attention to it. However, the intrinsic losses (IC and PDE) cannot be weighted.
Interfacial Attention for a Latent Microstructure Representation
The results of the conducted benchmark studies indicate that the global PINN solution in simulations involves a precise computation of solutions along interface regions throughout the entire domain, complemented by an extrapolation method to approximate solutions at the interface region. This hybrid approach proves beneficial for diffuse-interface problems using ML.
In the triple junction scenario, the distributed ’0 and 1’ points, as depicted in Figure 7, were employed to enhance the precision of the solution, particularly for the application of phase summation (Eq. 2) and accurate addressing of boundary conditions between neighboring NNs. Similarly to the mono-phase benchmarks, it is possible to fully focus on interfacial regions within multi-phase simulations. First demonstrations of this are provided within Section E of the SM.
This opens the way to resolve MPF simulations using NNs independently of the grid size, thus allowing for tackling more complex scenarios.
Dynamic correction mechanism for phase summation constraint
Within the literature, a major limitation against the application of ML to resolve multi-phase problems is the phase summation criterion:
Indeed, in previous works, a requirement for a bounding algorithm was obligatory as a correction step to the PINN prediction [HUANG2021103727, Zheng2022].
Meanwhile, in the current work, the applied method introduces a dynamic correction mechanism for phase predictions. This involves dividing the prediction of each neural network (NN) responsible for a specific phase by the sum of predictions from other NNs handling other phases within the same batch. This dynamic correction not only ensures continuous training but also suppresses the need for an external correction algorithm. Instead, the correction of phase summation is seamlessly integrated as an additional constraint loss term, enhancing the efficiency of the training process.
Exploring model complexity and impact of the optimization strategy
The results about the additional numeric experiment are gathered in Table 5. Associated graphical illustration is provided in Figure 11.
The experiments revealed that while all architectures started with high and similar MSE values, the MSE decreased significantly after the first round of Adam optimization. The four models reach similar loss magnitude after the first round. Subsequent L-BFGS optimization further reduced the MSE for the and architectures, with the model attaining the best overall MSE of . Larger architectures like and failed to converge within the specified rounds, potentially due to overfitting or optimization challenges arising from their high parameter counts. The architecture struck an optimal balance between model complexity and generalization ability, benefiting from the combined Adam and L-BFGS optimization strategy. While larger models had more parameters, they did not necessarily perform better and could be computationally expensive.
This numerical experiment reveals a solid correlation between the model’s parameter count, representing its complexity, and the impact of multi-networking. For instance, the configuration, yielding 1,651,205 trainable parameters, challenges training when employing four NNs. Conversely, a reduced parameter count of 1,414,417 enables training 16 NNs (16 x (6 HL × 128 neurons)) simultaneously to effectively handle four phases (c.f. the triple junction benchmark). Moreover, even higher parallelism is feasible through pyramidal training, where initially 64 NNs are initialized, but only 16 are selected for further training using the basket of PINNs concept. That implies that using moderately-sized networks through the multi-networking context (the extended subdomain decomposition technique) allows to efficiently alternate optimizers (the wheel of optimizers), while the transfer of learning facilitates the transition between time intervals. This underscores the rationale behind initially selecting a subset of PINN techniques, while the second subset (AMFO, phase correction, and vertical optimization) has proven crucial in prior MPF method studies and the current work. Pyramidal training and the comprehensive set of PINNs emerge as essential for progressively transferring learning to a unified domain when dealing with massive domain decomposition, thereby making the framework ready for up scaling, as hereafter detailed.
|
|
|
|
|
|
||||||||||||||
6 x 64 | 105,605 | 5 | |||||||||||||||||
6 x 128 | 416,005 | 3 | |||||||||||||||||
6 x 256 | 1,651,205 | 10 | |||||||||||||||||
6 x 512 | 6,579,205 | 10 |
It is worth reminding that convergence signifies that the global loss of the model has fallen below the threshold and the current time interval has been managed.
Transfer of learning
In the context of learning transfer, our framework demonstrates the feasibility of hierarchically transferring knowledge within subdomain decomposition. Through multi-networking and full parallel training, it employs a pyramidal training approach, categorized as ’Multiple to Multiple’ transfer. This allows the transfer of learning, for example (but not limited to), from 4 NNs to 4 others, 16 to 16, 16 to 4, etc. From a technical standpoint, achieving ’Multiple to Single’ transfer of learning is possible, where knowledge is transferred from multiple NNs to a single NN and vice versa. However, our first trials showcased that training a single NN revealed challenges, leading to optimization failures. The synergy between ML and optimization algorithms is crucial for effectively exploiting patterns, but this task becomes challenging with large NNs. Multi-networking and pyramidal training until reaching a reasonable number of NNs proves efficient in managing such complexities. Nevertheless, we provide within the actual code repository a demonstration of this ’Multiple to Single’ transfer, denoted as ’ to Master’ transfer, that could be of interest for other physical applications. Additional details are provided in the SM, section F.
Exploring more ways for improvement of the framework
In the current development phase, one aspect to consider is selecting the appropriate threshold to transition between time intervals in the full-discrete resolution. This threshold was found to be strictly related to each simulation and subjected to trial and error. For the magnitude of 10-5 in single-phase benchmarks, it was slightly raised when handling the multiphase field (MPF) scenario to ensure a compromise between the speed of the training and the convergence of the solution. One possibility to deal with this limitation is to integrate an additional loss term related to the variation in the energy of the system. If the energy residual (and its derivative) reaches a steady state, it is then a physical indication that the system reaches its equilibrium within this time interval, and then the transition could be ensured [GUILLENGONZALEZ2014821, WANG2023216]. Theoretical formulations and graphical illustrations related to the utility of integrating energy loss to enhance training are provided in the SM, section B.
A potential point of improvement for the current PINNs-MPF framework is the integration of sequential training. Indeed, optimizing the evolution of losses related to the triple junction motion could be achieved if the model captures the temporal behavior of the solution. The flexibility of discrete resolution under the current implementation eases the integration of sequential learning through Recurrent Neural Networks. Optimizing runtimes against theoretical solutions or existing literature is beyond the scope of this study, as PINNs are inherently more time-consuming than classical approaches. Enhancing PINN solvers is a work in progress [cuomoscientific2022, chuang2022experience] and studies on the use of PINNs for solving multi-phase problems remain uncommon [Haghighat2022] . Therefore, the first goal of this study is to prioritize solution fidelity, training continuity, and efficient resource distribution. This approach is similar to recent studies focusing on multiphase phenomena, particularly in fluid dynamics. For instance, Haghighat et al. [Haghighat2022] have concentrated on the methodology and challenges associated with training PINNs for coupled flow and deformation in porous media. Zheng et al. [Zheng2022] have aimed to predict the sequential motions of flow simulations in a discontinuous manner, alternating between PINN training and MCBOM corrections. Amini et al. [AMINI2023112323] have focused on validating the inverse modeling approach targeting nonisothermal multiphase poromechanics. To conduct running time studies on these multi-phase field frameworks, inspiration could be taken from the study of Shukla et al. [SHUKLA2021110683]. Among other proposals, leveraging multiple GPUs and nodes and adopting a hybrid programming model could be beneficial.
6 Conclusion
An interconnected PINNs-MPF framework has been presented to resolve MPF simulation of interface dynamics. Our approach has been to address and benchmark most central requirements for a reliable reproduction of MPF simulations, namely, curvature- and driving-force-driven interface motions and the evolution of a triple junction. In particular, the nonlinear evolution of curved interfaces and the effect of interface width on the interface kinetic were captured, demonstrating the capability of this framework in dealing with diffuse interfaces. The simulation of the triple junction established that the application of the sum constraint as a loss term and in combination with a parallelization scheme works precisely and efficiently. The rationale behind the current development has been to demonstrate the feasibility of PINNs in capturing key features of microstructure evolution, before increasing complexities. This ensures a well-prepared transition to more advanced dimensions while maintaining the robustness and reliability of the methodology. All benchmark simulations are conducted in a single execution without the need for post-correction algorithms, such as intercalated phase correction algorithms. Various advanced techniques were encompassed in the PINNs-MPF framework, including training optimization, extended domain decomposition, and boundary condition propagation. These techniques have yielded accurate predictions and efficient training, establishing a robust foundation for future advancements in this research area. Potential avenues for further exploration include developing more unified solutions through sequential learning, conducting comparative studies with other PINN approaches in the literature, and integrating energy-based optimizations, which is a key component of the MPF Method. As this progress step, the vision is to extend the capability of the developed framework to address more intricate scenarios involving a greater number of phases, larger grids, and three-dimensional dimensions. As the difficulty of the scenarios gradually increases, it is crucial to carefully address related issues when using machine learning to predict solutions, such as managing singularities, mitigating inaccurate predictions, and optimizing computation times.
Acknowledgments
RDK acknowledges financial support from the German Research Foundation (DFG) within projects DA 1655/2-1 (Heisenberg program) and DA 1655/3-1.
Supplementary material
Additional results and discussions as well as a video presentation of the framework are given in the Supplementary Material attached.
Data availability
The PINNs-MPF repository is available on GitHub at the following link:
https://github.com/SFETNI/PINNs_MPF
Competing financial interests
The authors declare no competing financial interests.
See pages - of S_M.pdf