Machine learning models for atom-diatom reactions across isotopologues

Daniel Julian, Rian Koots, and Jesús Pérez-Ríos. Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York, 11794-1901, United States

(July 1, 2024)

Abstract

This work shows that feed-forward neural networks can predict the final ro-vibrational state distributions of the inelastic and reactive processes in Ca + H₂ collisions, including the reaction Ca + H₂ $\rightarrow$ CaH + H, in the hyperthermal regime, relevant for buffer gas chemistry. Furthermore, these models can be extended to the isotopologues of the reaction involving deuterium and tritium. In addition, we develop a neural network model that can learn across the chemical space spanned by the isotopologues of hydrogen. The model can predict the outcome of a reaction whose reactants it has never seen. This is done by training on the Ca + H₂ and Ca + T₂ reactions and subsequently predicting the Ca + D₂ reaction.


I Introduction

Artificial intelligence and machine learning are becoming ever more prominent in the big-data era, and this is certainly true in the physical and chemical sciences. In the study of physical chemistry, powerful data-driven tools can be used to make predictions about the outcomes of chemical reactions [1, 2, 3, 4, 5], the spectroscopic properties of molecules [6, 7, 8, 9, 10, 11], the discovery of new materials [12, 13, 14], development of quantum technologies [15] and the design of new and efficient experimental protocols [16, 17, 18]. In each scenario, the driving force is the same: finding universal models to ensure reliable predictions without performing costly computations.

Universal machine learned models capable of making predictions for many chemical reactions would revolutionize chemical engineering and manufacturing by unlocking the optimal conditions under which reactions can be produced. Motivated by this possibility, the physical chemistry community is working on developing universal data-driven chemical reaction predictors. However, the task is heroic due to the vastness of the chemical space. On the other hand, it is possible to focus on small, fully controllable systems to unleash the predictive power of machine learning: atom-diatom collisions, which, at low temperatures, find applications in atomic and molecular physics [19, 20]. In that endeavor, there have been several efforts toward predicting atom-diatom collisions in the hyperthermal regime [21, 22, 23, 24, 25, 26, 27, 28, 29]. Specifically, it has been shown that neural networks (NN) can efficiently predict the state-to-state cross sections and the product state distributions. However, most of these studies focused on the reactive pathway, and only recently, a combined study was performed: inelastic plus reactive processes for O + CO reactions have been reported [23], albeit in this reaction, the reactive process only accounts for atom exchange. In addition, and more importantly, the machine learning models developed can only make predictions on previously seen reactions; i.e., the machine learning models are exposed to data for the same reaction to be predicted, which constrains the universality of the models.

In this work, we show that it is possible to predict the product state distributions of an atom-diatom reaction across the isotopologue space via deep learning. Precisely, we predict the final ro-vibrational state distributions of the reaction Ca + H₂ $\rightarrow$ CaH + H in the hyperthermal regime, and those related isotopologues involving the hydrogen isotopes. We show that a reduced featurization is suitable for training independent models capable of individually predicting each isotopologue of the reaction. Furthermore, this featurization can be used for models predicting the final ro-vibrational state distributions of inelastic processes. To move towards a more universal atom-diatom reaction predictor, we develop a model capable of generalizing between the isotopologues of the reaction by taking into account the isotopic effects of the reaction using an extended featurization. The paper is organized as follows: Section II presents the fundamentals of quasi-classical trajectory (QCT) calculations and the main characteristics of the NN models; Section III presents the main results based on our NN models and Section IV contains a summary and main conclusions derived from this work.

II Methodology

We focus on Ca+H ${}_{2}(v_{i},j_{i})$ collisions in the hyperthermal regime (collision energies $\gtrsim 1000$ K), where $(v_{i},j_{i})$ denotes the initial ro-vibrational state of the hydrogen molecule. In this regime, the collision has three possible outcomes:

• Reaction: Ca+H ${}_{2}(v_{i},j_{i})$ $\rightarrow$ CaH $(v_{f},j_{f})$ + H. The CaH diatom is formed, where $(v_{f},j_{f})$ stands for the ro-vibrational state of the product molecule. This reaction could be used to efficiently produce CaH, one of the molecules of interest in cold and ultracold chemistry [30, 31], in a controllable manner, as previous work showed [32].

• Inelastic collision: Ca+H ${}_{2}(v_{i},j_{i})$ $\rightarrow$ Ca+H ${}_{2}(v_{f},j_{f})$ . The reactants and products are the same, but they appear in different internal states.

• Dissociation: Ca+H ${}_{2}(v_{i},j_{i})$ $\rightarrow$ Ca + H + H. Here, all three atoms become unbound and no product diatom is formed.

Moreover, we consider the isotopologues of the scattering process Ca+H ${}_{2}(v_{i},j_{i})$ with the deuterium and tritium molecules replacing hydrogen, as shown in Table 1.

Isotopologue	Reactive Process	Inelastic Process
Hydrogen	H₂( $v_{i}$ , $j_{i}$ ) + Ca $\rightarrow$ CaH( $v_{f}$ , $j_{f}$ ) + H	H₂( $v_{i}$ , $j_{i}$ ) + Ca $\rightarrow$ H₂( $v_{f}$ , $j_{f}$ ) + Ca
Deuterium	D₂( $v_{i}$ , $j_{i}$ ) + Ca $\rightarrow$ CaD( $v_{f}$ , $j_{f}$ ) + D	D₂( $v_{i}$ , $j_{i}$ ) + Ca $\rightarrow$ D₂( $v_{f}$ , $j_{f}$ ) + Ca
Tritium	T₂( $v_{i}$ , $j_{i}$ ) + Ca $\rightarrow$ CaT( $v_{f}$ , $j_{f}$ ) + T	T₂( $v_{i}$ , $j_{i}$ ) + Ca $\rightarrow$ T₂( $v_{f}$ , $j_{f}$ ) + Ca

Table 1: The isotopologues of the reactive collision along with the processes that are investigated. It should be noted that no NN model was developed for the inelastic process of the tritium isotopologue.

The goal of our neural network (NN) models is to learn the final product state distributions of reactive and inelastic collisions, excluding dissociation since this process does not produce molecules. In our case, the training data are generated via QCT calculations, which yield reliable results in this energy regime [21, 23, 26].

II.1 QCT calculations

QCT calculations treat the nuclear dynamics classically on the potential energy surface determined by the electrons of the system. The initial conditions for the classical equations of motion, however, depend on the quantum state of the system via the Bohr-Sommerfeld quantization rule [33]. Similarly, the outcome of the classical evolution can be expressed in terms of quantum numbers using the same principle. QCT calculations accurately describe the dynamics of small systems at high collision energies, where many partial waves contribute to the scattering observables and quantum effects are washed out [34].

In these simulations, the interaction potential between the atoms is specified, and many trajectories are run with randomized initial conditions (generated via the Monte Carlo method) over a range of impact parameters. The molecule is initialized in a ro-vibrational state corresponding to its energy spectrum $(v_{i},j_{i})$ and the atom is initialized beyond the interaction range of the molecule. The system is evolved classically using Hamilton’s equations of motion, and the final ro-vibrational state is assigned semi-classically using the Bohr-Sommerfeld quantization rule. The final rotational number is given by [33]

j_{f}=-\frac{1}{2}+\frac{1}{2}\sqrt{1+4\frac{\vec{J_{f}}\cdot\vec{J_{f}}}{\hbar^{2}}},

(1)

where $\vec{J_{f}}$ is the relative angular momentum of the molecule. The final vibrational number for a molecule is given by [33]

v_{f}=-\frac{1}{2}+\frac{\sqrt{2\mu}}{\pi\hbar}\int_{r_{-}}^{r_{+}}\left[E^{\prime}_{int}-V(r)-\frac{\hbar^{2}j_{f}(j_{f}+1)}{2\mu r^{2}}\right]^{\frac{1}{2}}dr,

(2)

where $\mu$ is the reduced mass of the product molecule, $V(r)$ is the molecular potential energy as a function of $r$ , the interparticle distance, and [ $r_{-},r_{+}$ ] are the inner and outer turning points of the molecule, respectively. Both $v_{f}$ and $j_{f}$ are treated with the histogram binning method, where the above expressions are rounded to the nearest integer.
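As an illustration of Eqs. (1) and (2), the following minimal sketch evaluates both expressions numerically and applies histogram binning. It is not the PyQCAMS implementation. The reduced units ($\hbar=\mu=1$), the Morse parameters used to check it, the radial grid, and the function names are all assumptions made for this example, and the classically allowed region is assumed to be a single interval (internal energy below the dissociation limit).

```python
import numpy as np

HBAR = 1.0  # reduced units, an assumption of this sketch


def rotational_number(J_squared, hbar=HBAR):
    """Eq. (1): continuous j_f from the squared angular momentum J_f . J_f."""
    return -0.5 + 0.5 * np.sqrt(1.0 + 4.0 * J_squared / hbar**2)


def morse(r, De, alpha, re):
    """Morse potential with its zero at the dissociation limit."""
    return De * (1.0 - np.exp(-alpha * (r - re))) ** 2 - De


def vibrational_number(E_int, j, mu, De, alpha, re, hbar=HBAR, n_grid=200001):
    """Eq. (2): continuous v_f from the radial action integral.

    The integrand is clipped to zero outside the turning points, so the
    integral over the whole grid equals the integral from r_- to r_+.
    """
    r = np.linspace(0.05, 20.0, n_grid)  # hypothetical radial grid
    v_eff = morse(r, De, alpha, re) + hbar**2 * j * (j + 1) / (2.0 * mu * r**2)
    integrand = np.sqrt(np.clip(E_int - v_eff, 0.0, None))
    dr = r[1] - r[0]
    # Trapezoidal rule written out explicitly.
    integral = dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return -0.5 + np.sqrt(2.0 * mu) / (np.pi * hbar) * integral


def bin_state(x):
    """Histogram binning: round the continuous value to the nearest integer."""
    return int(np.rint(x))
```

A convenient check is that, for a non-rotating Morse oscillator, the semiclassical rule reproduces the exact levels, so feeding the energy of the level $v=2$ into `vibrational_number` should return a value very close to 2.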

In our calculations we assume a pair-wise interaction potential, in which each pair term is described by a Morse potential: $V(r)=D_{e}\left(1-e^{-\alpha(r-r_{e})}\right)^{2}-D_{e}$ . The potential parameters for H₂ and CaH were obtained from [35, 36] and [37, 38], respectively. We have checked that adding a non-additive interaction does not change the shape of the product state distributions and only rescales them by an overall factor. For a given collision energy, we run 10⁴ trajectories per impact parameter in the range 0 - 3.75 a₀, sampled at equal intervals of 0.25 a₀ (16 impact parameters); i.e., we run 160,000 trajectories per initial condition. The collision energies were in the range 5,000 K - 50,000 K. Vibrational quantum numbers were sampled in the range $v_{i}=$ [0, 9] and rotational quantum numbers in $j_{i}=$ [0, 20]. Only a limited number of initial ro-vibrational states was explored because diatoms in highly excited ro-vibrational states tend to dissociate, and the lightest isotopologue of the reactant diatom, namely H₂, cannot support many ro-vibrational states before it dissociates.
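The pair-wise potential and the trajectory bookkeeping described above can be sketched as follows. The Morse parameters below are hypothetical placeholders chosen only to make the example run; they are not the values of Refs. [35, 36, 37, 38]. The dictionary layout and function names are likewise assumptions of this sketch.

```python
import numpy as np


def morse(r, De, alpha, re):
    """Morse pair potential, V(r) = De * (1 - exp(-alpha (r - re)))^2 - De."""
    return De * (1.0 - np.exp(-alpha * (r - re))) ** 2 - De


# Hypothetical placeholder parameters (NOT the published H2 / CaH values).
PAIR_PARAMS = {
    "HH": dict(De=1.0, alpha=1.0, re=1.0),
    "CaH": dict(De=0.5, alpha=1.0, re=2.0),
}


def pairwise_potential(r_hh, r_cah1, r_cah2, params=PAIR_PARAMS):
    """Sum of the three pair terms of the Ca + H + H system."""
    return (
        morse(r_hh, **params["HH"])
        + morse(r_cah1, **params["CaH"])
        + morse(r_cah2, **params["CaH"])
    )


# Impact parameters 0, 0.25, ..., 3.75 a0 and the resulting trajectory count.
impact_parameters = 0.25 * np.arange(16)
TRAJECTORIES_PER_IMPACT_PARAMETER = 10**4
trajectories_per_initial_condition = (
    TRAJECTORIES_PER_IMPACT_PARAMETER * len(impact_parameters)
)
```

With the calcium atom far from the molecule, `pairwise_potential` reduces to the isolated diatom potential, which is the situation assumed when the atom is initialized beyond the interaction range.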

II.2 Targets

The final product state distributions are discrete probability mass functions that give the distribution of the final rotational and vibrational states with the probability of either the reactive or inelastic process. The distributions are chosen in such a way that each of the rotational and vibrational distributions is marginalized over the other. Each initial state, characterized by the collision energy $E_{c}$ , $v_{i}$ , and $j_{i}$ , produces a final rotational state distribution for $j_{f}$ and a final vibrational state distribution for $v_{f}$ . Furthermore, for a given final rotational or vibrational distribution, the sum of probabilities over all quantum numbers gives the total probability of either the reactive or inelastic process. Thus, the NN models for the product state distributions aim to provide a prediction of the distribution of final states as well as the probability of the reactive or inelastic process for a given initial state.
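To make the definition of the targets concrete, the sketch below builds the two marginal distributions from a list of binned trajectory outcomes for one process. It assumes, for simplicity, that every trajectory carries equal weight, which may differ from the weighting over impact parameters used in the actual QCT analysis; the function name and arguments are hypothetical.

```python
import numpy as np


def product_distributions(v_f, j_f, n_total):
    """Marginal final-state distributions for one process.

    v_f, j_f : binned final quantum numbers of the trajectories that ended
               in the process of interest (reactive or inelastic).
    n_total  : total number of trajectories run for this initial state,
               including those that ended in other processes.
    """
    v_f = np.asarray(v_f, dtype=int)
    j_f = np.asarray(j_f, dtype=int)
    joint = np.zeros((v_f.max() + 1, j_f.max() + 1))
    np.add.at(joint, (v_f, j_f), 1.0)  # count trajectories per (v_f, j_f)
    joint /= n_total
    p_v = joint.sum(axis=1)  # vibrational distribution, marginalized over j_f
    p_j = joint.sum(axis=0)  # rotational distribution, marginalized over v_f
    return p_v, p_j
```

Both marginals sum to the same number, the total probability of the process, which is the property the targets are required to have.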

II.3 Featurization

In this work we develop two main classes of models. The first class consists of models that make predictions for a particular isotopologue and process (reactive or inelastic) whose training data consists solely of that particular process. The second class consists of a model that was trained on two reactive processes, the hydrogen and tritium isotopologues of the reaction, that can make predictions across the isotopologues for the reactive process in Table 1. In other words, the second model predicts a reaction that has never been exposed to the NN before, namely the deuterium isotopologue, thus exploring the chemical space. The first class does not require any information that distinguishes the hydrogen isotopes, whereas the second class has an expanded featurization that aptly differentiates the isotopologues.

The featurization for the models includes information about the initial state and the reactant diatom. The model that makes predictions across isotopologues requires additional features, such as masses and select spectroscopic constants of the reactant diatom. For the first class of models, the featurization is as follows: the collision energy $E_{c}$ in K, the initial rotational quantum number $j_{i}$ , the initial vibrational quantum number $v_{i}$ , the internal energy of the diatom, the angular momentum of the diatom, the rotational energy of the diatom (with $v$ = 0), the vibrational energy of the diatom (with $j$ = 0), the vibrational time period $\tau$ , the relative velocity, and the classical turning points of the diatom (these count as two separate features), following Ref. [25]. For the second class, which makes predictions across isotopologues, the following additional features were included: the reduced mass of the reactants, the mass of the hydrogen isotope, the reduced mass of the product diatom, the harmonic frequency spectroscopic constant $\omega_{e}$ of the reactant diatom, the rotational spectroscopic constant $B_{e}$ of the reactant diatom, and the binding energy spectroscopic constant of the reactant diatom $D_{0}$ . By extending the feature vector of the second class of model with information about the isotopic effects of the collision, the model becomes more general and can learn multiple chemical reactions.
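The isotope-dependent part of the extended feature vector can be sketched as below. This is only an illustration: the atomic masses and the H₂ constants are approximate literature values inserted as assumptions, the isotopic scaling relations $\omega_{e}\propto\mu^{-1/2}$ and $B_{e}\propto\mu^{-1}$ are the standard Born-Oppenheimer ones rather than necessarily the procedure used here, the reduced mass of the reactants is interpreted as that of Ca and the diatom, and $D_{0}$ is omitted because it requires the anharmonic zero-point energy.

```python
import numpy as np

# Approximate atomic masses in u (assumed values for this sketch).
MASS_U = {"H": 1.00783, "D": 2.01410, "T": 3.01605, "Ca": 39.96259}
# Approximate H2 constants in cm^-1 (assumed values for this sketch).
OMEGA_E_H2 = 4401.21
B_E_H2 = 60.853


def reduced_mass(m1, m2):
    return m1 * m2 / (m1 + m2)


def isotope_features(isotope):
    """Isotope-dependent features for Ca + X2, with X in {H, D, T}."""
    m = MASS_U[isotope]
    mu_diatom = reduced_mass(m, m)
    mu_h2 = reduced_mass(MASS_U["H"], MASS_U["H"])
    rho = np.sqrt(mu_h2 / mu_diatom)  # isotopic scaling factor
    return {
        "mu_reactants": reduced_mass(MASS_U["Ca"], 2.0 * m),
        "m_isotope": m,
        "mu_product": reduced_mass(MASS_U["Ca"], m),
        "omega_e": OMEGA_E_H2 * rho,  # omega_e scales as mu^(-1/2)
        "B_e": B_E_H2 * rho**2,       # B_e scales as mu^(-1)
    }
```

Because all of these quantities vary smoothly with the isotope mass, the deuterium values lie between those of hydrogen and tritium, which is what lets a model trained on H₂ and T₂ interpolate to D₂ in feature space.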

II.4 Neural Networks

Neural networks (NN) are machine-learned models that fit a multitude of parameters to learn an underlying relationship present in their training data. In this work, we use the simplest conception of a deep NN: a fully-connected feed-forward neural network with multiple hidden layers. Feed-forward NNs consist of layers, each having a certain number of neurons that act as perceptron units. Each perceptron unit performs a linear transformation on its input followed by a non-linear transformation, also known as the activation, to produce the neuron’s output. First, an input vector consisting of the features of the model is fed to the first layer, and each neuron in the first layer produces an output. All of the outputs of the first layer’s neurons are collected into another vector, which is fed into each neuron of the second layer. This process continues until the final layer is reached; the vector of outputs of the last layer constitutes the prediction, which is compared with the target vector. The fact that each neuron’s output is fed into each neuron of the following layer makes the model fully-connected.

In the learning process, an optimization algorithm, most of the time a variant of gradient descent, is used to fit the parameters of the model, which are the weights and biases in the linear transformations of the neurons. The learning process involves continually updating the model parameters to minimize a loss function, which measures the error in the model’s predictions compared to what is actually present in the training data set. Though the parameters in the linear transformations are the focus of the learning process, the non-linear activation functions are necessary for the model to learn the non-linear relationships that are common in virtually all applications. Aspects of the model, such as its architecture, learning rate, initialization scheme, optimization algorithm, loss function, and number of training iterations, are known as hyper-parameters. Hyper-parameters are not automatically fit and must be chosen, usually empirically, to best allow the model to learn from the training data without overfitting to it.

The training data for the NN models was generated via quasi-classical trajectory (QCT) simulations using the PyQCAMS software package [39]. The results of the simulation for each initial condition were used to generate the final rotational and vibrational state distributions, and the probability amplitudes of these distributions are the targets for the NN model. Training data had collision energies between 5,000 K and 50,000 K at 5,000 K intervals, initial vibrational numbers in [0, 1, 3, 5, 7, 9] and initial rotational numbers in [0, 2, 10, 15, 20]. The probability amplitudes for final rotational numbers are predicted for every other number (e.g. $j_{f}$ = 0, 2, 4, $\ \ldots$ ). The probability amplitudes for intermediate final rotational quantum numbers (e.g. $j_{f}$ = 1, 3, 5, $\ \ldots$ ) can be obtained by linear interpolation since the final state distributions are smooth and roughly locally linear [22]. Since fewer vibrational states were populated, the targets for final vibrational distributions consisted of the probability amplitudes for quantum numbers $v_{f}$ = 0, 1, 2, … m, with m depending on the particular isotopologue and process. Likewise, the maximum $j_{f}$ also varied depending on the isotopologue and process. Each NN model’s output vector contains both the final rotational and vibrational distributions for its particular task, which is the combination of isotopologue and process it makes predictions for.
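The linear interpolation used to recover the intermediate rotational quantum numbers can be sketched with NumPy as follows. The function name and the example values are hypothetical; only the use of linear interpolation on the even-$j_{f}$ grid comes from the text.

```python
import numpy as np


def fill_odd_jf(p_even):
    """Interpolate predictions given at j_f = 0, 2, 4, ... onto every j_f.

    p_even : predicted values at the even final rotational quantum numbers.
    Returns the full grid of j_f values and the interpolated values.
    """
    p_even = np.asarray(p_even, dtype=float)
    j_even = 2 * np.arange(len(p_even))
    j_all = np.arange(j_even[-1] + 1)
    return j_all, np.interp(j_all, j_even, p_even)


# Example with made-up values at j_f = 0, 2, 4.
j_all, p_all = fill_odd_jf([0.0, 0.2, 0.1])
```

Each odd-$j_{f}$ value is then the average of its two even neighbours, which is adequate as long as the distribution is smooth on the scale of two rotational quanta.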

Model	Layers	Neurons	Parameters	Training set size
NNRH	26	666	19277	242
NNRD	26	1027	53433	260
NNRT	26	1047	54153	260
NNIH	26	666	19277	242
NNID	20	546	16832	260
NNIso	22	972	52573	502

Table 2: The model requirements in terms of neural network architecture and training data set size. The required number of trajectories per model is on the order of $10^{7}$ since each training example requires 160,000 trajectories.

Given the need to predict multiple processes (reactive and inelastic) across three isotopologues of the reaction, a total of six models were developed, as shown in Table 2. For each isotopologue, a final state distribution model was made for the reactive process. These three models from now on will be named NNRH, NNRD, and NNRT, standing for Neural Network for the Reactive process of the Hydrogen isotopologue, Neural Network for the Reactive process of the Deuterium isotopologue, and so on. For the hydrogen and deuterium isotopologues, models were developed to predict the inelastic process. These two models are named NNIH and NNID, standing for Neural Network for the Inelastic process of the Hydrogen isotopologue, and so on. Finally, one model was made for the purpose of predicting the reactive process for all three isotopologues simultaneously, and this model is named NNIso, or Neural Network for predicting Isotopic effects.

The machine-learned neural network models used to predict the final ro-vibrational state distributions are feed-forward deep neural networks. The number of layers varies between models, and the various models’ architectures, in terms of aggregates, are provided in Table 2. The number of neurons per layer is gradually increased layer by layer to get from the dimension of the input vector to the final dimension size of the target vector, which was always larger given the need to predict the probability amplitudes over a distribution of final quantum numbers. Indeed, this general pattern of expanding the NN width from the input to the output was found to be effective in previous work [22]. The activation function of all hidden layers except the last was the softplus function, given by

\text{softplus}(x)=\ln(1+e^{x}).

(3)

The final layer of each model, which produces the output vector of probability amplitudes, uses the sigmoid activation function since the targets are all probabilities and they, as well as their square roots, which were used for training, lie in [0, 1]. The Adam optimizer was used to fit the parameters of the NN models [40]. The parameters were initialized using Glorot normal initialization [41]. The learning rate is a hyper-parameter of each individual model and therefore varied between models, but all values were near 0.05. Each model was trained for tens of thousands of epochs, with the total ranging between approximately 45,000 and 75,000; the number of training epochs is another hyper-parameter that was allowed to vary between models. The mean squared deviation (MSD) loss function was used for the training process. In this work, the NN models were implemented using the TensorFlow package with Keras [42, 43].
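The architecture described above can be illustrated with a framework-free NumPy sketch of the forward pass and the loss. It is not the TensorFlow/Keras code used in this work: the linear growth of the layer widths, the output dimension of 40, the number of layers in the example, and the use of an untruncated normal for the Glorot initialization are all assumptions, and the Adam training loop is omitted.

```python
import numpy as np


def softplus(x):
    """Eq. (3), ln(1 + e^x), evaluated in a numerically stable way."""
    return np.logaddexp(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_widths(n_in, n_out, n_layers):
    """Widths growing from the input to the output size (linear growth assumed)."""
    return np.rint(np.linspace(n_in, n_out, n_layers + 1)).astype(int)


def init_params(rng, widths):
    """Glorot-normal-style weights (std = sqrt(2 / (fan_in + fan_out))), zero biases."""
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        std = np.sqrt(2.0 / (fan_in + fan_out))
        params.append((rng.normal(0.0, std, size=(fan_in, fan_out)),
                       np.zeros(fan_out)))
    return params


def forward(params, x):
    """Softplus on every layer except the last, sigmoid on the output layer."""
    for W, b in params[:-1]:
        x = softplus(x @ W + b)
    W, b = params[-1]
    return sigmoid(x @ W + b)


def msd_loss(pred, target):
    """Mean squared deviation between predictions and targets."""
    return np.mean((pred - target) ** 2)


rng = np.random.default_rng(0)
widths = layer_widths(n_in=11, n_out=40, n_layers=5)  # 11 features; 40 is hypothetical
params = init_params(rng, widths)
prediction = forward(params, rng.normal(size=(3, 11)))  # batch of 3 dummy inputs
```

Since the models are trained on the square roots of the probabilities, a network of this kind would be fit to `np.sqrt(p)` and its output squared to recover the predicted probabilities.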

Figure 1: A sample of test results for the prediction of final reactive rotational state distributions for an initial state of $(v_{i},j_{i})$ = (6, 6). The columns are organized by isotopologue; i.e., the first, second and third columns refer to the NNRT, NNRD and NNRH models, respectively. The rows are organized by the collision energy in K. Points in blue indicate the results from QCT simulations and those in red indicate NN predictions.

II.5 Model evaluation

The performance of the NN models was determined using test data generated for initial states that were not included in the training dataset. The performance accuracy of the models on the test data is calculated using the root mean square error (RMSE) for each test example, given as

\text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{i=N}(y_{i}-\hat{y_{i}})^{2}},

(4)

and it is calculated for each target final state distribution by summing over all $N$ NN predictions, $\hat{y_{i}}$ , and comparing to QCT results, $y_{i}$ . For each test example, which is a single final rotational or vibrational state distribution for a given initial state, the RMSE is calculated over all points for which NN predictions are available. In cases where the NN prediction exists but the QCT result is not specified, the QCT result is assumed to be 0 since the outcome in question was not observed in the QCT simulation and thus has an estimated probability of 0. The final reported error is then the ratio of the RMSE to the maximum probability amplitude for the distribution as given by the QCT calculation. The reported error is thus the relative RMSE

\text{Relative \ RMSE}=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{i=N}(y_{i}-\hat{y_{i}})^{2}}}{\text{max}(\text{QCT \ distribution})},

(5)

and represents how large the error is for a given predicted distribution relative to the maximum value available from the QCT result for that distribution.
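A minimal sketch of Eqs. (4) and (5), including the convention that a missing QCT value is treated as 0, is given below. The data layout (a dictionary for the QCT distribution and arrays for the NN predictions) and the function name are assumptions made for this example.

```python
import numpy as np


def relative_rmse(nn_states, nn_probs, qct):
    """Relative RMSE of one predicted distribution, Eq. (5).

    nn_states : quantum numbers at which the NN makes predictions.
    nn_probs  : NN predictions at those quantum numbers.
    qct       : dict mapping quantum number -> QCT probability; states that
                were never observed in the QCT simulation are simply absent.
    """
    nn_probs = np.asarray(nn_probs, dtype=float)
    # Unobserved outcomes are assigned a QCT probability of 0.
    y = np.array([qct.get(int(s), 0.0) for s in nn_states])
    rmse = np.sqrt(np.mean((y - nn_probs) ** 2))  # Eq. (4)
    return rmse / max(qct.values())


# Example with made-up numbers: the QCT run never populated the state 4.
error = relative_rmse([0, 2, 4], [0.1, 0.2, 0.05], {0: 0.1, 2: 0.3})
```

Normalizing by the maximum of the QCT distribution makes errors comparable between distributions whose overall probabilities differ by orders of magnitude, as happens between isotopologues.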

III Results

The NN model predictions for the product state distributions are divided into three categories: first, the NN models for reactive processes trained on a single isotopologue (NNRH, NNRD and NNRT); second, the NN models for inelastic collisions trained on a single isotopologue (NNIH and NNID); and third, the NN model trained on two isotopologues of the reaction (NNIso), namely H₂ + Ca and T₂ + Ca, which then predicts the D₂ + Ca reaction in the testing data set.

III.1 Product state distributions of reactive processes

The three single reaction predictors (NNRH, NNRD, and NNRT) make predictions for the final rotational and vibrational state distributions of the reaction upon which they were trained. The test data for these models consisted of initial states ( $v_{i}$ , $j_{i}$ , $E_{c}$ ) not included in the training data set. Here, we tested the models’ ability to make predictions in the interpolation regime, meaning the initial states in the testing set are combinations of $v_{i}$ , $j_{i}$ , and $E_{c}$ that are within the range of the values of those variables presented in the training set. Given the small training data set size, see Table 2, the models are expected to have only limited extrapolation capability.

The results for the rotational product state distribution for Ca + H ${}_{2}(v_{i},j_{i})\rightarrow$ CaH $(v_{f},j_{f})$ + H, including the isotopic variants with deuterium and tritium, are shown in Fig. 1. Specifically, the initial state is chosen as ( $v_{i}$ , $j_{i}$ ) = (6, 6), with collision energies of 13000 K, 33000 K and 43000 K. The figure shows a comparison between the NN predictions and the QCT calculations: panels (a), (d) and (g) show the performance of the NNRT model; panels (b), (e) and (h) the performance of the NNRD model, whereas panels (c), (f) and (i) correspond to the NNRH model. The NN models agree extremely well with the QCT calculations, both qualitatively and quantitatively, despite the fact that, for deuterium and tritium, the largest probability is $\sim 10^{-3}$ . The NN models are therefore capable of describing the subtleties of the rotational product state distribution.

The results for the vibrational product state distribution are shown in Fig. 2, where the NN predictions and the QCT simulations are in remarkable agreement. For instance, the NN models capture the most relevant feature: highly excited vibrational states are populated at higher collision energies. It is worth emphasizing that for these results, we used the same training set as for the ones in Fig. 1, but in this case the test set is different.

Based on the results displayed in Figs. 1 and 2, even with small training data sets, the models did not overfit to noise and instead learned the underlying relationships. In other words, our NN models capture the physics properly but ignore the possible small noise effects inherent to the Monte Carlo approach employed in the data generation through QCT simulations.

Fig. 3 presents a general assessment of the performance of the single reaction predictors. This figure displays a heat map of the relative RMSE for the product state distributions (see Eq. 5), including vibrational and rotational product state distributions. First, one notices a rather complex pattern in the error distribution. For instance, the model NNRH shows a relative RMSE below 0.2 for initial states $(v_{i},\ j_{i})=(6,\ 6)$ and $(4,\ 12)$ , independently of the collision energy. In contrast, for $(v_{i},\ j_{i})=(2,\ 5)$ the same model shows a relative RMSE of 0.3 at high collision energies. The same applies to the NNRD and NNRT models. It is worth noting the remarkable performance of the NNRT model for $(v_{i},\ j_{i})=(6,\ 6)$ , showing a relative RMSE below 0.06 independent of the collision energy. However, despite the complex pattern of the error distribution, the overall combined error of the model for the rotational and vibrational state distributions is less than 0.15, or 15%.

III.2 Product state distributions of inelastic processes

We have developed two NN models to predict the final rotational and vibrational state distributions for the inelastic collisions between hydrogen or deuterium molecules with calcium, labeled as NNIH and NNID, respectively, as shown in Table 2. These two models each predict the final rotational and vibrational distributions for their respective isotopologues. Additionally, both models use the same featurization as that used for the single reaction predictors.

A sample of test results for the final state rotational distribution of the inelastic collision models is provided in Fig. 4. In this case, as in the case of reactions, the NN models reproduce remarkably well the QCT calculations. As a result, we can conclude that the NN models are learning the underlying chemistry properly. Similarly, in Fig. 5, we present our results for the final vibrational product state distribution. The performance of the NNs is outstanding independent of the collision energy and final vibrational state. Therefore, as with the reactive process predictors, the inelastic predictors demonstrate a capability to make predictions on test data in the interpolation regime that succeed both in terms of numerical accuracy and the shapes of the distributions.

The relative RMSE values for the test results of the inelastic collision models are provided in Fig. 6. First, we notice that the relative errors are smaller than those of the models for reactions. In this case the relative errors do not exceed 15%, and in some cases the relative error is as low as 2.5%. Therefore, the inelastic collision models are capable of predicting the shapes of the final state distributions with high fidelity alongside high numerical accuracy. Since these models use the same featurization as the single reaction predictors, the featurization is suitable for both the reactive and the inelastic processes.

Clearly, the inelastic models outperform their reactive counterparts for nearly all test examples. This can be explained by the fact that reactions are less frequent than inelastic processes for the range of collision energies explored in this work, yielding smaller probabilities in comparison with inelastic processes. Hence, the NN models for reactions have a harder time capturing the underlying reaction mechanism.

The training data set of each inelastic model has the same size as that of its reactive counterpart (same isotopologue), and these training sets cover the same portions of the initial state space. The inelastic models do not capture the final state distributions perfectly, so there is the possibility that increasing the training data set size could improve model performance. However, even with a small data set it is clear that the inelastic process for the hydrogen and deuterium collisions with calcium can be effectively predicted by a simple feed-forward neural network.

III.3 Prediction results for a model trained on multiple reactions

Until now, the training data contained the same chemical species as the final product state distribution; i.e., the model is exposed to the same reaction we want to predict. This is the workhorse of many ideas on machine learning prediction models for atom-molecule collisions [21, 22, 23, 24, 25, 26, 27]. However, to develop a universal predictor, it would be necessary to make predictions of a reaction across the chemical space. In other words, the reaction to be predicted could differ from those in the training data, which is a very demanding requirement. Ca + H₂ collisions serve as a candidate to explore the universality of machine learning models for chemical reactions since they can be easily generalized by including the hydrogen isotopologues, so that the same reaction leads to three different ones. In this case, the hydrogen isotopes enlarge the chemical space only modestly. However, as shown below, the isotopic effects on the reaction are substantial. On the other hand, it is worth mentioning that the predictive power of these models could be compromised by the zero-point energy effects characteristic of a full quantum treatment and absent in a QCT approach.

Our model NNIso is trained on data from Ca + H ${}_{2}(v_{i},j_{i})\rightarrow$ CaH $(v_{f},j_{f})$ + H and Ca + T ${}_{2}(v_{i},j_{i})\rightarrow$ CaT $(v_{f},j_{f})$ + T, but it predicts the product state distribution for Ca + D ${}_{2}(v_{i},j_{i})\rightarrow$ CaD $(v_{f},j_{f})$ + D. The NN model has never been exposed to any deuterium-containing reaction, and still it predicts the outcome of the reaction. To do so, as previously mentioned, this model was given an expanded feature vector that captures variables related to the isotopic differences in the chemical system. With this expanded featurization, this model generalizes between the isotopologues of the reaction. The capability of this model to produce accurate results indicates that it has effectively learned the isotopic effects of the reaction. It should be noted that isotopic variants of the reaction other than those mentioned exist, such as those where the reactant diatom is composed of two atoms of different hydrogen isotopes. Our methodology could be extended to these variants by developing more models or enlarging current models to accommodate two distinguishable reactive channels, one for each possible product diatom. However, this is beyond the scope of this work as our models currently are used to predict a single reactive channel of an atom-diatom collision.
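The train/test protocol for NNIso amounts to a split by isotopologue rather than by initial state. A minimal sketch of such a split is given below; the label array and the function name are hypothetical, and the snippet only illustrates how the deuterium examples are kept entirely out of training.

```python
import numpy as np


def split_by_isotopologue(labels, train=("H", "T"), test=("D",)):
    """Indices of training and test examples, selected by isotopologue label."""
    labels = np.asarray(labels)
    train_idx = np.flatnonzero(np.isin(labels, train))
    test_idx = np.flatnonzero(np.isin(labels, test))
    return train_idx, test_idx


# Example: one label per QCT example (made-up ordering).
labels = ["H", "D", "T", "H", "D"]
train_idx, test_idx = split_by_isotopologue(labels)
```

With the training set sizes of Table 2, the hydrogen and tritium examples together give the 502 training examples listed for NNIso (242 + 260).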

The test performance of this model is shown in Figs. 7 and 8, with the latter providing the relative RMSE values. Fig. 7, which shows the product state distributions, makes clear that the model predictions follow the trends of the QCT results and capture the shapes of the final rotational and vibrational state distributions. The predictions are also numerically accurate, as evidenced by the overlap of the predicted and actual distributions in panels (a)-(c) of Fig. 7. Thus, the model generalizes between isotopologues of the reaction and makes effective predictions of the final state distributions for members of the test data set.
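
For readers who want to reproduce a metric of this kind, the sketch below computes a relative RMSE between a predicted and a reference distribution. The normalization by the mean of the reference values is one common convention and is an assumption here; the definition given where the metric is introduced in this work takes precedence. The function name `relative_rmse` and the two arrays are illustrative placeholders, not results from this work.

```python
import numpy as np


def relative_rmse(predicted, reference):
    """RMSE between two distributions, divided by the mean of the reference."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean((predicted - reference) ** 2))
    return rmse / np.mean(reference)


# Placeholder distributions for illustration only (not data from the paper).
qct = np.array([0.10, 0.30, 0.40, 0.20])  # stands in for a QCT distribution
nn = np.array([0.12, 0.28, 0.38, 0.22])   # stands in for an NN prediction

score = relative_rmse(nn, qct)
```

A relative measure of this type makes errors comparable between distributions of different overall magnitude, which is useful when comparing models across isotopologues.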

Although the model that generalizes between isotopologues succeeds in predicting members of the test set, it is less accurate than its single reaction predictor counterparts. This is so even though NNIso was given more training data than any individual single reaction predictor (NNIso was trained on both the hydrogen and tritium isotopologues). However, the multiple reaction predictor saw the same regions of the initial state space, albeit for two different reactions. So, while more training data were provided to the multiple reaction predictor, still more would be needed to compensate for the additional task of learning the isotopic effects through the extended featurization. If the model were provided with some deuterium training examples, it could generalize better between the isotopologues, but we sought to push the model by having it predict a reaction not included in the training data set. The featurization for this model could also be improved to allow the neural network to learn the isotopic effects better. While the original featurization used for the previous models produces effective predictions, it is redundant (e.g., it includes both angular momentum and rotational energy), and removing this redundancy might allow the model to learn the isotopic effects better. A better extended featurization may also exist to inform the model about the isotopic effects of the reactions.

IV Conclusions

Our work explores the capabilities of deep learning in predicting atom-molecule reaction outcomes. Specifically, we focus on Ca + H₂ collisions in the hyper-thermal regime as a prototypical example, including isotopologue variations. Using a deep feed-forward neural network, we demonstrate it is possible to use a single featurization to learn the final reactive ro-vibrational state distributions across isotopologues of an atom-diatom reactive collision. Additionally, we show that this featurization is general enough to allow a deep neural network to learn either the reactive or inelastic process.

To expand on this, we develop an NN model capable of learning isotope variants of a given reaction; i.e., the model is capable of predicting the outcome of a reaction using information from other reactions. By augmenting the featurization to include information about the masses of the system and select spectroscopic constants of the reactant diatom, we demonstrate it is possible for a single NN model to learn all of the hydrogen isotopic variants of the reaction Ca + H₂ $\rightarrow$ CaH + H. Efforts such as this, aimed at designing neural networks that generalize across more reactions, will move us closer to a universal model for atom-diatom reactive collisions.

Acknowledgements

This work was supported by the United States Air Force Office of Scientific Research [grant number FA9550-23-1-0202]. The authors thank Prof. M. Meuwly for reading the manuscript and for fruitful discussions.

References

Meuwly [2021] M. Meuwly, Machine learning for chemical reactions, Chemical Reviews 121, 10218 (2021).
Jorner et al. [2021] K. Jorner, T. Brinck, P.-O. Norrby, and D. Buttar, Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies, Chem. Sci. 12, 1163 (2021).
Friederich et al. [2020] P. Friederich, G. dos Passos Gomes, R. De Bin, A. Aspuru-Guzik, and D. Balcells, Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex, Chem. Sci. 11, 4584 (2020).
Choi et al. [2018] S. Choi, Y. Kim, J. W. Kim, Z. Kim, and W. Y. Kim, Feasibility of activation energy prediction of gas-phase reactions by machine learning, Chemistry – A European Journal 24, 12354 (2018), https://chemistry-europe.onlinelibrary.wiley.com/doi/pdf/10.1002/chem.201800345.
García-Andrade et al. [2023] X. García-Andrade, P. García Tahoces, J. Pérez-Ríos, and E. Martínez Núñez, Barrier height prediction by machine learning correction of semiempirical calculations, The Journal of Physical Chemistry A 127, 2274 (2023).
Zhang et al. [2022] W. Zhang, L. C. Kasun, Q. J. Wang, Y. Zheng, and Z. Lin, A review of machine learning for near-infrared spectroscopy, Sensors 22, 10.3390/s22249764 (2022).
Meza Ramirez et al. [2021] C. A. Meza Ramirez, M. Greenop, L. Ashton, and I. ur Rehman, Applications of machine learning in spectroscopy, Applied Spectroscopy Reviews 56, 733 (2021), https://doi.org/10.1080/05704928.2020.1859525.
Ghosh et al. [2019] K. Ghosh, A. Stuke, M. Todorović, P. B. Jørgensen, M. N. Schmidt, A. Vehtari, and P. Rinke, Deep learning spectroscopy: Neural networks for molecular excitation spectra, Advanced Science 6, 1801367 (2019), https://onlinelibrary.wiley.com/doi/pdf/10.1002/advs.201801367.
Ibrahim et al. [2024] M. A. E. Ibrahim, X. Liu, and J. Pérez-Ríos, Spectroscopic constants from atomic properties: a machine learning approach, Digital Discovery 3, 34 (2024).
Liu et al. [2021] X. Liu, G. Meijer, and J. Pérez-Ríos, On the relationship between spectroscopic constants of diatomic molecules: a machine learning approach, RSC Adv. 11, 14552 (2021).
Amaral and Mohallem [2020] P. H. R. Amaral and J. R. Mohallem, Machine-learning predictions of positron binding to molecules, Phys. Rev. A 102, 052808 (2020).
Schuetzke et al. [2023] J. Schuetzke, N. J. Szymanski, and M. Reischl, Validating neural networks for spectroscopic classification on a universal synthetic dataset, npj Computational Materials 9, 100 (2023).
Merchant et al. [2023] A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon, and E. D. Cubuk, Scaling deep learning for materials discovery, Nature 624, 80 (2023).
Bartel et al. [2020] C. J. Bartel, A. Trewartha, Q. Wang, A. Dunn, A. Jain, and G. Ceder, A critical examination of compound stability predictions from machine-learned formation energies, npj Computational Materials 6, 97 (2020).
Krenn et al. [2023] M. Krenn, J. Landgraf, T. Foesel, and F. Marquardt, Artificial intelligence and machine learning for quantum technologies, Phys. Rev. A 107, 010101 (2023).
Lode et al. [2021] A. U. J. Lode, R. Lin, M. Büttner, L. Papariello, C. Lévêque, R. Chitra, M. C. Tsatsos, D. Jaksch, and P. Molignini, Optimized observable readout from single-shot images of ultracold atoms via machine learning, Phys. Rev. A 104, L041301 (2021).
da Silva et al. [2021] B. P. da Silva, B. A. D. Marques, R. B. Rodrigues, P. H. S. Ribeiro, and A. Z. Khoury, Machine-learning recognition of light orbital-angular-momentum superpositions, Phys. Rev. A 103, 063704 (2021).
Davletov et al. [2020] E. T. Davletov, V. V. Tsyganok, V. A. Khlebnikov, D. A. Pershin, D. V. Shaykin, and A. V. Akimov, Machine learning for achieving Bose-Einstein condensation of thulium atoms, Phys. Rev. A 102, 011302 (2020).
Tscherbul and Kłos [2021] T. V. Tscherbul and J. Kłos, Universal stereodynamics of cold atom-molecule collisions in electric fields, Phys. Rev. A 103, 062810 (2021).
Morita et al. [2018] M. Morita, M. B. Kosicki, P. S. Żuchowski, and T. V. Tscherbul, Atom-molecule collisions, spin relaxation, and sympathetic cooling in an ultracold spin-polarized $\mathrm{Rb}(^{2}S)-\mathrm{SrF}(^{2}\mathrm{\Sigma}^{+})$ mixture, Phys. Rev. A 98, 042702 (2018).
Arnold et al. [2020] J. Arnold, D. Koner, S. Käser, N. Singh, R. J. Bemish, and M. Meuwly, Machine learning for observables: Reactant to product state distributions for atom–diatom collisions, The Journal of Physical Chemistry A 124, 7177 (2020).
Arnold et al. [2022] J. Arnold, J. C. San Vicente Veliz, D. Koner, N. Singh, R. J. Bemish, and M. Meuwly, Machine learning product state distributions from initial reactant states for a reactive atom–diatom collision system, The Journal of Chemical Physics 156, 034301 (2022), https://pubs.aip.org/aip/jcp/article-pdf/doi/10.1063/5.0078008/16533706/034301_1_online.pdf.
Huang and Cheng [2024] X. Huang and X. Cheng, State-to-state dynamics and machine learning predictions of inelastic and reactive O($^{3}$P) + CO($^{1}\Sigma^{+}$) collisions relevant to hypersonic flows, The Journal of Chemical Physics 160, 174310 (2024), https://pubs.aip.org/aip/jcp/article-pdf/doi/10.1063/5.0195543/19918177/174310_1_5.0195543.pdf.
Huang et al. [2023] X. Huang, K.-M. Gu, C.-M. Guo, and X.-L. Cheng, Dissociation cross sections and rates in O$_{2}$ + N collisions: molecular dynamics simulations combined with machine learning, Phys. Chem. Chem. Phys. 25, 29475 (2023).
Koner et al. [2019] D. Koner, O. T. Unke, K. Boe, R. J. Bemish, and M. Meuwly, Exhaustive state-to-state cross sections for reactive molecular collisions from importance sampling simulation and a neural network representation, The Journal of Chemical Physics 150, 211101 (2019), https://pubs.aip.org/aip/jcp/article-pdf/doi/10.1063/1.5097385/15559489/211101_1_online.pdf.
Priyadarshini et al. [2023] M. S. Priyadarshini, S. Venturi, I. Zanardi, and M. Panesi, Efficient quasi-classical trajectory calculations by means of neural operator architectures, Phys. Chem. Chem. Phys. 25, 13902 (2023).
San Vicente Veliz et al. [2022] J. C. San Vicente Veliz, J. Arnold, R. J. Bemish, and M. Meuwly, Combining machine learning and spectroscopy to model reactive atom + diatom collisions, The Journal of Physical Chemistry A 126, 7971 (2022).
Gu et al. [2023] K.-M. Gu, H. Zhang, and X.-L. Cheng, Exhaustive state-specific dissociation study of the N$_{2}$($^{1}\Sigma_{g}^{+}$) + N($^{4}$S) system using QCT combined with a neural network method, The Journal of Chemical Physics 158, 244302 (2023), https://pubs.aip.org/aip/jcp/article-pdf/doi/10.1063/5.0151331/18013956/244302_1_5.0151331.pdf.
Wang et al. [2024] J. Wang, J. C. S. V. Veliz, and M. Meuwly, High-energy reaction dynamics of N₃ (2024), arXiv:2404.18877 [physics.chem-ph].
Vázquez-Carson et al. [2022] S. F. Vázquez-Carson, Q. Sun, J. Dai, D. Mitra, and T. Zelevinsky, Direct laser cooling of calcium monohydride molecules, New Journal of Physics 24, 083006 (2022).
Weinstein et al. [1998] J. D. Weinstein, R. deCarvalho, T. Guillet, B. Friedrich, and J. M. Doyle, Magnetic trapping of calcium monohydride molecules at millikelvin temperatures, Nature 395, 148 (1998).
Liu et al. [2022] X. Liu, W. Wang, S. C. Wright, M. Doppelbauer, G. Meijer, S. Truppe, and J. Pérez-Ríos, The chemistry of AlF and CaF production in buffer gas sources, The Journal of Chemical Physics 157, 074305 (2022), https://pubs.aip.org/aip/jcp/article-pdf/doi/10.1063/5.0098378/16696962/074305_1_online.pdf.
Truhlar and Muckerman [1979] D. G. Truhlar and J. T. Muckerman, Atom-molecule collision theory: a guide for the experimentalist (Plenum Press, New York, 1979) Chap. Reactive scattering Cross sections III: quasiclassical and semiclassical methods, p. 505–561.
Pérez-Ríos [2020] J. Pérez-Ríos, An Introduction to Cold and Ultracold Chemistry (Springer International Publishing, Cham, 2020).
Yan et al. [1996] Z.-C. Yan, J. F. Babb, A. Dalgarno, and G. W. F. Drake, Variational calculations of dispersion coefficients for interactions among H, He, and Li atoms, Phys. Rev. A 54, 2824 (1996).
Liu et al. [2009] J. Liu, E. J. Salumbides, U. Hollenstein, J. C. J. Koelemeij, K. S. E. Eikema, W. Ubachs, and F. Merkt, Determination of the ionization and dissociation energies of the hydrogen molecule, The Journal of Chemical Physics 130, 174306 (2009), https://doi.org/10.1063/1.3120443.
Shayesteh et al. [2017] A. Shayesteh, S. F. Alavi, M. Rahman, and E. Gharib-Nezhad, Ab initio transition dipole moments and potential energy curves for the low-lying electronic states of CaH, Chemical Physics Letters 667, 345 (2017).
Mitroy and Zhang [2008] J. Mitroy and J.-Y. Zhang, Properties and long range interactions of the calcium atom, The Journal of Chemical Physics 128, 134305 (2008), https://doi.org/10.1063/1.2841470.
Koots and Pérez-Ríos [2024] R. Koots and J. Pérez-Ríos, PyQCAMS: Python quasi-classical atom–molecule scattering, Atoms 12, 10.3390/atoms12050029 (2024).
Kingma and Ba [2014] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
Glorot and Bengio [2010] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 9, edited by Y. W. Teh and M. Titterington (PMLR, Chia Laguna Resort, Sardinia, Italy, 2010) pp. 249–256.
Abadi et al. [2015] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems (2015), software available from tensorflow.org.
Chollet et al. [2015] F. Chollet et al., Keras, https://keras.io (2015).