
This article is licensed under CC-BY 4.0.

pubs.acs.org/JPCL Letter

DiffractGPT: Atomic Structure Determination from X‑ray Diffraction Patterns Using a Generative Pretrained Transformer

Kamal Choudhary*

Cite This: J. Phys. Chem. Lett. 2025, 16, 2110−2119

Supporting Information

ABSTRACT: Crystal structure determination from powder diffraction patterns is a complex challenge in materials science, often requiring extensive expertise and computational resources. This study introduces DiffractGPT, a generative pretrained transformer model designed to predict atomic structures directly from X-ray diffraction (XRD) patterns. By capturing the intricate relationships between diffraction patterns and crystal structures, DiffractGPT enables fast and accurate inverse design. Trained on thousands of atomic structures and their simulated XRD patterns from the JARVIS-DFT data set, the model is evaluated across three scenarios: (1) without chemical information, (2) with a list of elements, and (3) with an explicit chemical formula. The results demonstrate that incorporating chemical information significantly enhances prediction accuracy. Additionally, the training process is straightforward and fast, bridging gaps between the computational, data science, and experimental communities. This work represents a significant advancement in automating crystal structure determination, offering a robust tool for data-driven materials discovery and design.

Since the discovery of X-rays in 1895, they have been widely used in medical imaging, crystallography, and astronomy.1 Numerous experimental techniques in materials science rely on X-rays, including X-ray diffraction (XRD), X-ray fluorescence (XRF), X-ray photoelectron spectroscopy (XPS), small-angle X-ray scattering (SAXS), X-ray tomography (XRT), X-ray reflectometry (XRR), grazing incidence X-ray diffraction (GIXRD), and resonant inelastic X-ray scattering (RIXS).2,3 Among these, XRD plays a crucial role in determining atomic structures and uncovering the mechanisms underlying mechanical strength, electronic properties, optical behavior, and chemical reactivity.4,5 However, crystal structure determination currently involves extensive trial and error as well as expert knowledge. The main challenge lies in the reduction of chemical and three-dimensional structural information into one-dimensional diffraction patterns, which causes the loss of phase information and complicates structure determination. Additionally, the presence of overlapping peaks in the diffraction data of newly discovered compounds, complex materials, or multiphase systems further exacerbates this challenge. Over the past few decades, Rietveld refinement, simulated annealing, and evolutionary algorithms have been developed to address this problem by iteratively fitting data to potential candidate structures.2,3 Several widely used software tools, such as FullProf,6 the General Structure Analysis System (GSAS),7 GenX,8 TOtal Pattern Analysis Solutions (TOPAS),9 and Materials Analysis Using Diffraction (MAUD),10 are available for this purpose. While these methods have been successful, they often require significant domain expertise, computational resources, and manual intervention, particularly when dealing with ambiguous or incomplete data.

In recent years, machine learning has emerged as a powerful tool in materials science, offering the potential to accelerate materials discovery and characterization.11−13 In particular, high-throughput materials design and process modeling, which are key driving forces behind the Materials Genome Initiative and the Creating Helpful Incentives to Produce Semiconductors (CHIPS) Act,14 require a bridge between experiments and multiscale modeling components, where large language models (LLMs) could play a significant role. Moreover, the two recent Nobel Prizes in Physics and Chemistry in 2024, for neural networks and AlphaFold, clearly demonstrate the wide applicability of AI/ML in scientific research.

AI/ML techniques have been successfully used for both forward (structure to property) and inverse (property to structure) tasks in materials design.11 Generating crystal structures from XRD can be considered a generative AI-based inverse design task. Recent advancements in machine learning related to X-ray diffraction15 include works by Park et al.,16 NeuralXRD,17 XRD_is_All_You_Need,18 Crystallography Companion Agent (XCA),19 ARiXD-ML,20 Zaloga et al.,21

Received: October 30, 2024
Revised: January 20, 2025
Accepted: January 21, 2025
Published: February 20, 2025

Not subject to U.S. Copyright. Published 2025 by American Chemical Society. https://doi.org/10.1021/acs.jpclett.4c03137

XTEC,22 Li et al.,23 Maffettone et al.,24 Oviedo et al.,25 and several others.26−28 These works demonstrate the application of ML models for a wide range of tasks, including crystal lattice and space group classification, peak detection, and structure generation. In particular, the application of deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) has demonstrated the ability to generate complex atomic structures based on insights.

The potential of GPT models in natural language processing (NLP), such as ChatGPT, has spurred interest in their applications beyond textual data, particularly in domains such as chemistry and materials science. The success of AtomGPT (Atomistic Generative Pretrained Transformer),29 which demonstrated the capability to generate atomic structures and predict material properties using transformer-based architectures, highlights the power of transformer models in handling materials data. AtomGPT establishes the relationship between atomic configurations, represented as text, and material properties, allowing it to tackle both forward and inverse design problems.

The GPT is a type of LLM originally developed for natural language processing that has demonstrated remarkable success in generating coherent and contextually relevant text.30−32 Models such as ChatGPT33 have been used for code generation, debugging, literature reviews, and numerous other tasks. However, if one attempts to perform forward/inverse materials design tasks with such general-purpose models, the outcomes can be quite poor.34−36 Nevertheless, inspired by its simplicity of use and the massive success of ChatGPT, an alternate model, AtomGPT, was introduced, tailored for forward and inverse materials design.

While AtomGPT enables scalar material properties to be predicted from atomic structures, its application for generating atomic structures from experimental properties, such as XRD, has not yet been explored. Based on these developments, we introduce DiffractGPT (DGPT), a specialized generative model designed to directly predict crystal structures from powder X-ray diffraction (PXRD) patterns. DiffractGPT leverages the powerful architecture of AtomGPT, adapting it to the unique challenges of PXRD-based crystal structure determination. By training on large data sets such as JARVIS-DFT (JDFT), which comprises simulated PXRD patterns alongside their corresponding atomic structures, DiffractGPT learns to map complex diffraction data to accurate crystal structures. This approach enables the direct prediction of atomic arrangements from diffraction data, significantly reducing the need for iterative fitting and manual intervention. We further evaluate various application scenarios for DiffractGPT, such as XRD with no known chemical constituents, with guessed elements, and with explicit chemical formulas. We also provide a web framework and tools to match XRD patterns with existing data, as well as to generate new structures using the generative models. Most importantly, although we apply the models to XRD data, the approach can also be useful for other experiments, such as neutron and electron diffraction and other spectroscopic measurements.

The Joint Automated Repository for Various Integrated Simulations (JARVIS) density functional theory (DFT)37,38 database used in this work contains nearly 80,000 bulk 3D materials and 1,100 2D materials. The JARVIS-DFT project originated about six years ago and has amassed millions of material properties, along with carefully converged atomic structures obtained using tight convergence parameters and various exchange-correlation functionals. JARVIS-DFT encompasses a wide range of material classes, including metallic, semiconducting, insulating, superconducting, high-strength, topological, solar, thermoelectric, piezoelectric, dielectric, two-dimensional, magnetic, porous, defect, and various other types of bulk materials.

In this paper, we describe the architecture and training methodology of DiffractGPT and evaluate its performance on the PXRD data set. DiffractGPT uses a transformer architecture based on the Mistral AI model39 but can be easily adapted to other LLMs as well. We demonstrate that DiffractGPT not only matches the accuracy of traditional methods but also significantly reduces the computational time and expertise required for crystal structure determination. AtomGPT and DiffractGPT are analogous to AlphaFold (mentioned above) in their approach to solving complex structure−property relationships using machine learning. They adapt generative predictive frameworks to tackle fundamental challenges in materials science, mirroring what AlphaFold40 has achieved for biology. The results show the promise of using generative machine learning models for automating the crystal structure determination process, opening up new avenues for materials discovery and design. The code used in this study will be made available on the AtomGPT GitHub page: https://github.com/usnistgov/atomgpt.

The data set used for this work is taken from the JARVIS-DFT database, which includes nearly 80,000 atomic structures and several material properties derived from density functional theory, along with powder X-ray diffraction patterns.37,38,41 From an atomic structure and a given X-ray wavelength (here Cu Kα), the corresponding PXRD pattern can be easily calculated. The PXRD pattern was computed from the atomic structure by first calculating the reciprocal lattice vectors and interplanar spacings d_hkl for each set of Miller indices (hkl). Bragg's law, nλ = 2d_hkl sin θ, was used to convert these d-spacings into scattering angles 2θ. The structure factor F(hkl) for each reflection was then calculated as the sum of atomic scattering contributions from all atoms in the unit cell, taking into account their positions and associated phase shifts. The atomic scattering factor f(θ), which varies with the scattering angle, was used to accurately model the electron density distribution around each atom. The diffraction intensity for each reflection was obtained using the relation I(hkl) ∝ |F(hkl)|². A Gaussian broadening function was also applied to account for experimental resolution effects. The final XRD pattern was generated by summing the corrected intensities over all relevant reflections. All calculations were performed using custom scripts in the JARVIS-Tools package to simulate the diffraction patterns for comparison with experimental data.

Such XRD predictions were carried out for all the data in the JARVIS-DFT (JDFT) data set. The XRD data set was split into a 90:10 ratio for training and testing the DiffractGPT models. This requires fine-tuning LLMs such as Mistral AI,39 which are based on the transformer architecture. Each transformer block contains two main components: a multihead self-attention mechanism and a position-wise feed-forward network. The input to the model is a sequence of tokens, which are first converted into embeddings and then passed through the transformer blocks. The scaled dot-product attention used in a transformer model can be written as

Attention(Q, K, V) = softmax(QK^T / √d_k) V    (1)
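For readers less familiar with transformer internals, eq 1 can be sketched in a few lines of NumPy. The matrix sizes below are arbitrary illustrative values, not the dimensions of the actual model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Eq 1: softmax(Q K^T / sqrt(d_k)) V, with the softmax taken over keys."""
    d_k = K.shape[-1]                               # dimensionality of the key vectors
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # attention-weighted sum of value rows

# Toy example: 4 query tokens attending over 6 key/value tokens, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Multihead attention, as described below, simply runs several such heads in parallel on learned projections of Q, K, and V and concatenates the results.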

Figure 1. Crystal lattice and space group data distribution in the JARVIS-DFT (JDFT) database and comparison of a few simulated XRD patterns with experimental measurements. (a) Crystal lattice and space group distribution in the JDFT atomic structure database. (b) Simulated and experimental PXRD for silicon. The experimental data was taken from the RRUFF database with ID R050145, while the simulated data is from JDFT with ID JVASP-1002. (c) Simulated and experimental PXRD for lanthanum boride. The experimental data was obtained as part of this work, while the simulated data is from JDFT with ID JVASP-15014. (d) Simulated and experimental PXRD for silicon carbide (moissanite). The experimental data was taken from the RRUFF database with ID R061083, while the simulated data is from JDFT with ID JVASP-107. (e) Simulated and experimental PXRD for magnesium boride. The experimental data was obtained as part of this work, while the simulated data is from JDFT with ID JVASP-1151. (f) Simulated and experimental PXRD for hafnium carbide. The experimental data was obtained as part of this work, while the simulated data is from JDFT with ID JVASP-17957.
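As a rough illustration of the pattern-simulation procedure described earlier (Bragg's law to place peaks, Gaussian broadening for resolution effects, and rescaling to the 0−1 range used in Figure 1), the sketch below broadens a set of reflections onto a 180-point, 0.5° 2θ grid like the one used for the model inputs. The d-spacings and intensities are illustrative placeholders, not values from JARVIS-Tools, and the grid origin is an assumption:

```python
import numpy as np

WAVELENGTH = 1.54184  # Cu K-alpha wavelength in Angstrom, as used in the text

def simulate_pxrd(d_spacings, intensities, sigma=0.2):
    """Broaden a list of reflections onto a 180-point grid of 0.5 deg 2-theta steps."""
    two_theta = np.arange(180) * 0.5  # grid origin at 0 deg is an assumption
    pattern = np.zeros_like(two_theta)
    for d, inten in zip(d_spacings, intensities):
        s = WAVELENGTH / (2.0 * d)
        if s > 1.0:
            continue  # reflection not reachable at this wavelength
        peak_pos = 2.0 * np.degrees(np.arcsin(s))  # Bragg's law: lambda = 2 d sin(theta)
        # Gaussian broadening centered on the Bragg angle
        pattern += inten * np.exp(-((two_theta - peak_pos) ** 2) / (2.0 * sigma**2))
    return pattern / pattern.max()  # rescale to [0, 1], as done for Figure 1

# Illustrative reflections only (d in Angstrom, arbitrary relative intensities)
pattern = simulate_pxrd([3.14, 1.92, 1.64], [100.0, 55.0, 30.0])
print(len(pattern))  # 180
```

A real simulation would additionally compute structure factors F(hkl) and intensities I(hkl) ∝ |F(hkl)|² from the atomic positions, as the text describes.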

where Q, K, and V represent the query, key, and value matrices, respectively. Here, d_k is the dimensionality of the key vectors. The multihead attention is obtained by concatenating multiple such attention heads. The multihead self-attention mechanism allows the model to focus on different parts of the input sequence when computing the output for a particular token.

There are thousands of LLMs, especially transformer models, that are publicly available. In particular, we use the Mistral AI 7 billion parameter model,39 which employs Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning (PEFT)42 as adopted from the UnslothAI package.43 Mistral is a powerful model with 7.3 billion parameters and has been shown to outperform the Large Language Model Meta AI (LLaMA) 2 13B,44 LLaMA 1 34B,45 and ChatGPT33 on several publicly available benchmarks. The Mistral 7B model combines efficiency and performance within a 7 billion parameter architecture. It introduces several key innovations, including Grouped-Query Attention for reduced computational complexity, Sliding Window Attention for processing longer sequences, and Rotary Positional Embeddings (RoPE) for improved position encoding. The model features 32 layers, a hidden size of 4096, and 32 attention heads. It employs prenormalization, Swish-Gated Linear Unit (SwiGLU) activation in the feed-forward layers, and various training optimizations. This model was also successfully used in the previous AtomGPT work.29

Now, fine-tuning requires transforming the instructions into a specialized protocol such as Alpaca.46 The Alpaca instructions consist of Python dictionaries with keys for instruction, input, and output texts. The instruction key was set to "Below is a description of a material." The XRD patterns were interpolated on a grid of 180 points, with intervals of 0.5° 2θ, using three-decimal floating-point precision, and then converted to a string with a newline character as separator. A fixed pattern length allows for uniform token lengths for the LLMs, irrespective of different simulation and experimental settings for PXRD data. Note that with decreasing intervals (here 0.5°), the number of tokens increases, and hence the training and inference times will be higher. The input key used was of three types: (1) with no chemical information, (2) with elemental lists only, and (3) with an explicit chemical formula. For the input with no chemical information, the input key was simply "The XRD is ... Generate atomic structure description with lattice lengths, angles, coordinates, and atom types." Similarly, for the second and third cases, the inputs were "The chemical elements are ... The XRD is ... Generate atomic structure description with lattice lengths, angles, coordinates, and atom types." and "The chemical formula is ... The XRD is ... Generate atomic structure description with lattice lengths, angles, coordinates, and atom types.", respectively. Finally, the output key was a string of lattice lengths, angles, and chemical elements along with three fractional coordinates in XYZ format. Two decimal precision

was used for lattice parameters and three decimal precision for coordinates.

As directly fine-tuning such an LLM can be computationally expensive, the PEFT method was used within the Hugging Face ecosystem. Additionally, Transformer Reinforcement Learning (TRL) and RoPE47 were employed to patch the Mistral model with fast LoRA42 weights for reduced-memory training. After obtaining the PEFT model, the corresponding tokenizer, and the Alpaca data set, supervised fine-tuning was carried out with a batch size of 5, using the AdamW 8-bit optimizer and a cross-entropy loss function for 5 epochs. This loss function measures the difference between the predicted probability distribution over the vocabulary and the true distribution (i.e., the one-hot encoded target words). After the model is trained, it is evaluated on the test set with respect to reconstruction/test performance. To further clarify: after training the model on the training set, and while keeping the instruction and input keys in the test set, the trained model is employed to generate outputs. After parsing the outputs to create the corresponding crystal structures, the StructureMatcher algorithm48 is used to find the best match between two structures, considering all invariances of materials. The root-mean-square error (RMS) is averaged over all matched materials. Because the interatomic distances can vary significantly for different materials, the RMS is normalized following the work in ref 49. Note that this is just one of the metrics for generative models for atomic structures, and there can be numerous other types of metrics.

In addition to developing GPT models, convolutional neural network (CNN) and gradient boosting regression tree (GBR) models were developed to predict lattice lengths given XRD patterns, with the same train-test split as for the GPT models. For the GBR and CNN models, the XRD signals are used as inputs and the three lattice constants as outputs. For GBR, we used 1000 estimators, a learning rate of 0.01, and a maximum depth of 3 with a mean absolute error loss function. The CNN model used in this study, referred to as CNNRegressor, is designed to perform regression tasks by extracting features from one-dimensional input data. The architecture begins with two 1D convolutional layers: the first layer has 16 filters and the second layer has 32 filters, both with a kernel size of 3 and padding of 1 to preserve the input size. Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation function to introduce nonlinearity. MaxPooling layers with a kernel size of 2 and a stride of 2 are applied to downsample the feature maps, reducing dimensionality and computational load. After these operations, the output is flattened to a shape of 32 × 45, which feeds into a fully connected layer with 64 neurons. The final output layer contains 3 neurons, corresponding to the three target values predicted by the model. This architecture allows the network to efficiently learn relevant features from the input data for accurate regression. The CNN model was trained for 50 epochs with a batch size of 32.

Finally, XRD measurements were also performed for this work to validate the simulated XRD patterns. The crystal structures were characterized using spatially resolved powder X-ray diffraction with a Bruker D8 Discover. We explored Bragg angles ranging from 10° 2θ to 90° 2θ using Cu Kα radiation (wavelength 1.54184 Å) at 50 kV, with a step size of 0.02° and a scan rate of 6° per minute.

In Figure 1, we show the crystal lattice and space group data distribution in the JDFT database and a comparison of several simulated XRD patterns with experimental measurements. In Figure 1a, we observe that most of the crystals are cubic, while the fewest belong to the triclinic lattice out of the seven crystal systems. Similarly, out of the 230 space groups, space group 225, which belongs to the cubic lattice system, is the most prevalent. Such analysis provides a basic understanding of the predictive limits of the models. For instance, if the model is trained with a sufficiently large cubic data set but not with a triclinic data set, it might generalize well for cubic systems but not for triclinic ones.

There are various proprietary databases that contain PXRD and atomic structure information. However, in this work, we choose to use the publicly available JARVIS-DFT data set for proof of concept. Note that although a simulated PXRD database is used here, it can be easily extended to include experimental data in the future. Analyzing the accuracy of the simulated PXRD compared to experimental results is important. In Figure 1b−f, we present a few such comparisons. The experimental data was either obtained from the RRUFF database or acquired as part of the experimental component of this work.

The simulated and experimental PXRD patterns for silicon, which is undoubtedly the most important material for the semiconductor industry, are shown in Figure 1b. The experimental data was taken from the RRUFF database with ID R050145, while the simulated data is from JDFT with ID JVASP-1002. All the simulation and experimental data were rescaled between 0 and 1 based on the maximum height in each pattern for uniform comparison. We observe close agreement between the simulated (Sim.) and experimental (Exp.) patterns, suggesting high fidelity of the simulated data. We note that the relative peak heights may not be exactly identical for all the peaks, which can be attributed to the collection of crystal planes encountered during PXRD experiments.

Similarly, the simulated and experimental PXRD patterns for lanthanum boride, considered an important reference material for XRD, are shown in Figure 1c. The experimental data was obtained as part of this work, while the simulated data is from JDFT with ID JVASP-15014. Here, we observe excellent agreement in peak positions and peak heights, especially up to 60° 2θ, after which the peak heights begin to differ. The simulated and experimental PXRD patterns for silicon carbide (moissanite) are shown in Figure 1d. The experimental data was taken from the RRUFF database with ID R061083, while the simulated data is from JDFT with ID JVASP-107. Here, we see more peaks in the simulation around 30° 2θ, which can also be attributed to the reasons mentioned above regarding the crystal planes encountered during experiments. PXRD should measure an aggregate of all present crystal planes that diffract X-rays fulfilling the Bragg criterion; however, in experiments, it is possible to miss some of the plane orientations in the powder sample. Finally, the simulated and experimental PXRD patterns for magnesium boride and hafnium carbide are shown in Figure 1e,f. In the case of magnesium boride, a peak is missing around 20° 2θ, as well as peaks after 60° 2θ. We observe excellent agreement in the hafnium carbide case, especially up to 60° 2θ, after which the experimental data shows fewer peaks than the simulated data. After generating such PXRD patterns for all the materials in JDFT, we perform LLM training following the details mentioned above, and the resultant models can be used for fast prediction of crystal structures.

As the first evaluation of the model's performance, the lattice constants in the x, y, and z crystallographic directions are compared for crystals in the test set and those generated using the DGPT models. This test set was never exposed to the model during training. The lattice constants from XRD can also be

predicted using other ML techniques such as gradient boosting regression tree (GBR) and convolutional neural network (CNN) models, as well as the various DiffractGPT (DGPT) models, as shown in Table 1.

Table 1. Performance Measurement in Terms of Mean Absolute Error (MAE) for Predicting Lattice Constants (Å) Using Gradient Boosting Regression (GBR), Convolutional Neural Network (CNN), and Varieties of DiffractGPT (DGPT) Models^a

Prop/MAE | GBR  | CNN  | DGPT-no formula | DGPT-element list | DGPT-formula
a        | 1.03 | 0.28 | 0.25            | 0.18              | 0.17
b        | 0.99 | 0.27 | 0.26            | 0.20              | 0.18
c        | 1.27 | 0.28 | 0.38            | 0.28              | 0.27
RMS-d    | -    | -    | 0.23            | 0.21              | 0.07

^a We also compare the root-mean-square distance (RMS-d) between predicted and target structures for the DGPT models.

The mean absolute errors (MAE) for predicting the a, b, and c lattice constants on the test set for GBR are 1.03 Å, 0.99 Å, and 1.27 Å, respectively. Similarly, for the CNN model, MAEs of 0.28 Å, 0.27 Å, and 0.28 Å are observed, which is a significant improvement compared to GBR. The performance of the three types of DiffractGPT models (without chemical information, with element lists, and with explicit formulas) shows the minimum error for the model with explicit formulas, which is intuitively correct. Specifically, the lowest error in lattice constant predictions was observed for the a-lattice parameter, at 0.17 Å. This value is close to the CNN model predictions. Li et al. performed a similar task for predicting lattice constants and found a mean absolute error (MAE) of 0.48 Å50 and an R2 of 0.80. Although the data sets for these two works are different, an MAE of 0.17 Å suggests promising results. As larger databases are used for DiffractGPT in the future, the MAE may decrease further. Note that DiffractGPT provides not only lattice constants but also full atomic structure information, such as chemical elements and coordinates. Hence, as a second evaluation, we compare the root-mean-square distance (RMS-d) between the predicted and target materials in the test set and find that the lowest error is observed for the DGPT model with explicit formulas. The RMS-d of 0.07 Å is comparable to the AtomGPT value of 0.08 Å for the superconductor design task.29

To illustrate further, we show the predicted lattice constants and volumes for the DiffractGPT chemical formula + XRD pattern model in Figure 2. The color of the dots in the plot represents the different crystal lattice types. The cubic, tetragonal, orthorhombic, hexagonal, trigonal, monoclinic, and triclinic systems are represented by blue, green, red, cyan, magenta, purple, and black colors, respectively. The values that lie on the x = y line represent perfect agreement, while points away from it represent outliers. We barely observe outliers for symmetric lattice systems such as cubic materials. Most of the outliers are red and purple dots, representing the orthorhombic and monoclinic systems. We find a maximum R2 score of 0.85 (for lattice constant b) and a minimum R2 of 0.78 (for lattice constant a).

Now, we present an overview of the usability of the DiffractGPT framework in Figure 3. DiffractGPT can be used to predict the complete crystal structure given a PXRD pattern. A user provides a PXRD pattern as input. These patterns contain background noise, which can be automatically detected and subtracted using scripts available in JARVIS-Tools. As a first option, the spectrum can be matched with structures from atomic structure databases, such as those in JDFT or similar databases, based on simulated XRD patterns using cosine

Figure 2. Performance of the DiffractGPT chemical formula + XRD pattern to atomic structure model for lattice constants in the (a) x-crystallographic direction, (b) y-crystallographic direction, (c) z-crystallographic direction, and (d) volume. The color of the dots in the plot represents the different crystal lattice types. The cubic, tetragonal, orthorhombic, hexagonal, trigonal, monoclinic, and triclinic systems are represented by blue, green, red, cyan, magenta, purple, and black colors, respectively. The values that lie on the x = y line represent perfect agreement, while points away from it represent outliers.
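The MAE and R² scores quoted for Table 1 and Figure 2 are standard regression metrics. A minimal sketch of how they can be computed for predicted versus target lattice constants (the numbers below are made up for illustration, not the paper's data) is:

```python
import numpy as np

def mae(target, pred):
    """Mean absolute error, as reported in Table 1 (Angstrom)."""
    target, pred = np.asarray(target, float), np.asarray(pred, float)
    return float(np.abs(target - pred).mean())

def r2_score(target, pred):
    """Coefficient of determination, as reported for Figure 2."""
    target, pred = np.asarray(target, float), np.asarray(pred, float)
    ss_res = ((target - pred) ** 2).sum()          # residual sum of squares
    ss_tot = ((target - target.mean()) ** 2).sum() # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Made-up lattice constants (Angstrom), for illustration only
a_true = [3.87, 5.43, 4.05, 6.10]
a_pred = [3.90, 5.40, 4.10, 6.00]
print(mae(a_true, a_pred), r2_score(a_true, a_pred))
```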


Figure 3. Schematic overview of crystal structure determination from XRD patterns using the DiffractGPT workflow. It begins with the user providing an XRD pattern as input. Utilizing the scripts available in JARVIS-Tools, background subtraction is automatically performed. First, the spectrum can be matched with structures from atomic structure databases, such as those in JDFT or similar databases, based on simulated XRD patterns using cosine similarity or other metrics. Alternatively, there are multiple scenarios where the user might (1) not know the constituent elements at all, (2) have some idea about the involved elements, or (3) explicitly know the chemical formula. Based on the provided information, the XRD pattern can be converted to strings followed by tokenization, after which one or more pretrained DiffractGPT models can be applied to generate potential crystal structures. Subsequently, further optimization can be performed using a unified GNN force field, such as ALIGNN-FF, to generate additional structure candidates. A tentative application for this workflow is available at the Web site https://jarvis.nist.gov/jxrd.
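The database-matching step in this workflow (cosine similarity between a query pattern and simulated patterns) can be sketched as follows; the three-entry "database" and its keys are invented for illustration, standing in for full 180-point spectra:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two XRD intensity vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query, database):
    """Return the database key whose simulated pattern best matches the query."""
    return max(database, key=lambda key: cosine_similarity(query, database[key]))

# Invented 5-bin patterns and IDs, for illustration only
db = {
    "JVASP-A": [1.0, 0.1, 0.0, 0.3, 0.0],
    "JVASP-B": [0.0, 0.9, 0.2, 0.0, 0.1],
    "JVASP-C": [0.2, 0.0, 1.0, 0.0, 0.4],
}
query = [0.9, 0.1, 0.0, 0.4, 0.0]
print(best_match(query, db))  # JVASP-A
```

Because cosine similarity depends only on peak positions and relative heights, it is insensitive to the overall intensity scale, which is why patterns rescaled to the 0−1 range can be compared directly.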

similarity or other metrics. A web application for this option is available at the JARVIS-XRD Web site (https://jarvis.nist.gov/jxrd). This process can predict the top candidates for the input XRD pattern. However, if the XRD patterns are complex or if the material does not exist in the current databases, the second option can be employed as follows. There are multiple scenarios: the user might (1) not know the constituent chemical elements at all, (2) have some idea about the involved elements, or (3) explicitly know the chemical formula. We have independent DiffractGPT models for all these scenarios. Based on the provided information, we can convert the XRD pattern to strings followed by tokenization, after which one or more pretrained DiffractGPT models can be applied to generate potential crystal structures. Note that transformer architectures allow for fast sampling, which can also be used to generate multiple options for the crystal structure if necessary.

As an optional subsequent step, further optimization of the generated structures can be performed using a unified graph neural network (GNN) force field (FF), such as the atomistic line graph neural network (ALIGNN)-FF,51 to generate additional structure candidates. It was developed for fast crystal structure optimization and to handle chemically and structurally diverse crystalline systems, with the entirety of the JARVIS-DFT data set used for training. This data set contains 4 million energy-force entries for 89 elements of the periodic table, of which 307,113 entries were utilized for training.51 ALIGNN-FF is seamlessly integrated into the DiffractGPT framework.

In Figure 4, we evaluate the performance of the DiffractGPT (DGPT)-formula model with and without ALIGNN-FF (AFF) optimization for a few selected materials. In these examples, the input chemical formula and X-ray diffraction (XRD) pattern are fed into the DGPT model to generate an initial atomic structure. The theoretical XRD pattern of the generated structure is shown, along with the mean absolute error (MAE) when compared to the original input XRD pattern. To further demonstrate the impact of optimization, we apply the ALIGNN-FF (AFF) force field to relax the DGPT-generated structure, and the resulting XRD pattern for the optimized structure is shown along with its corresponding MAE. We observe some of the limitations of the model. For example, the input XRD pattern for silicon has 6 peaks (Figure 4b), while the DGPT model generates a structure whose pattern has 7 peaks (Figure 4c). After applying the ALIGNN-FF optimization, the number of peaks is corrected to 6, as expected (Figure 4d). A similar trend is observed for LaB6, where the input XRD pattern has 13 peaks (Figure 4f), but the DGPT model initially predicts 14 peaks (Figure 4g). This discrepancy is also corrected with ALIGNN-FF optimization. On the other hand, for the HfC case shown in Figure 4j, the predicted XRD pattern consistently matches the correct number of peaks, suggesting that ALIGNN-FF optimization may not be necessary in this case. We further quantify these observations with mean absolute error (MAE) values comparing the target and predicted XRD patterns. The structure with the lower MAE can be considered the better candidate structure for the XRD pattern. Moreover, while for the

Figure 4. Evaluating the performance of the DiffractGPT (DGPT)-formula model with and without ALIGNN-FF (AFF) optimization for a few
example materials. The input chemical formula and XRD pattern are fed into the DGPT model to generate the atomic structure. The theoretical XRD
pattern of the generated structure is shown as DGPT, along with the mean absolute error (MAE) of the XRD pattern in comparison with the input
XRD. The DGPT structure is further optimized with AFF, and the XRD of the optimized structure, along with its MAE, is shown. (a) Silicon atomic
structure, (b) input XRD pattern for Si, (c) XRD pattern of the DGPT-generated structure given the chemical formula and XRD, (d) XRD pattern for
the AFF-optimized DGPT structure. Similar results for LaB6 (e−h) and HfC (i−l) are shown.
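The MAE values quoted in Figure 4 compare two discretized XRD patterns point by point. A minimal sketch, assuming both patterns are normalized on the same 2θ grid (the exact grid and normalization used in the paper may differ):

```python
import numpy as np

def xrd_mae(target, predicted):
    """Mean absolute error between two XRD intensity vectors on a shared 2-theta grid."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if target.shape != predicted.shape:
        raise ValueError("patterns must be discretized on the same grid")
    return float(np.mean(np.abs(target - predicted)))

# Toy patterns: the raw "DGPT" pattern has one spurious extra peak;
# the "AFF"-relaxed pattern removes it, lowering the MAE.
target = np.zeros(90); target[[14, 28, 47]] = 1.0
dgpt = np.zeros(90);   dgpt[[14, 28, 47, 60]] = 1.0
aff = np.zeros(90);    aff[[14, 28, 47]] = 1.0
print(xrd_mae(target, dgpt), xrd_mae(target, aff))
```

The structure whose simulated pattern gives the lower MAE against the input pattern is kept as the better candidate.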

Figure 5. XRD patterns for the perfect and defective two-atom silicon (JVASP-1002) structures, i.e., with and without the displacement of an atom from its equilibrium position. The x-coordinate of the first atom is translated by 0 (panels a−c) and 0.2 (panels d−f), with the 0 translation representing the perfect crystal. After generating the crystals, we predict their simulated patterns. We then use these patterns, along with the chemical formula Si, to generate the DGPT-based atomic structure and its corresponding diffraction pattern. Furthermore, the DGPT-generated structure is optimized using ALIGNN-FF, and the corresponding XRD patterns are also presented.
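The sensitivity of the diffraction pattern to the displaced atom in Figure 5 follows from the structure factor, whose phases depend on the fractional coordinates. The toy one-dimensional model below (unit scattering factors, not the full powder simulation used in the paper) shows that translating one atom of a two-atom basis redistributes the reflection intensities:

```python
import numpy as np

def intensities(frac_coords, orders):
    """|F(h)|^2 for a 1D crystal with unit scattering factors at the given fractional coordinates."""
    coords = np.asarray(frac_coords, dtype=float)
    return np.array([abs(np.sum(np.exp(2j * np.pi * h * coords))) ** 2 for h in orders])

orders = np.arange(1, 6)                       # reflection orders h = 1..5
perfect = intensities([0.0, 0.25], orders)     # two-atom basis, 1D analogue
defective = intensities([0.2, 0.25], orders)   # first atom translated by 0.2
print(np.allclose(perfect, defective))         # False: intensities redistribute
```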

above analysis simulated XRD patterns were used as inputs, the same for experimental patterns is shown in Figure S1. The experimental XRD pattern is scaled between 0 and 1, and peaks with intensities below a threshold of 0.04 are removed to align with the simulated data used for training. Interestingly, we observe excellent agreement for the Si and LaB6 cases, but for the HfC case we observe a noticeable difference.

While the above analysis provides insights into the performance of the model in different scenarios, obtaining deeper physical insights into why these discrepancies occur is a more complex task. Because of their billions of parameters, deep learning models tend to be less explainable, making it difficult to extract detailed physical explanations. However, we plan to explore such investigations in future work to better understand these behaviors.

Furthermore, real-world diffraction patterns can include different types of imperfections, such as defects. An example of a silicon structure with and without a defect (a translated atom) is shown in Figure 5. After constructing a perfect silicon structure with two atoms in the primitive cell, the x-coordinate of the first atom is translated by 0 (panels a−c) and 0.2 (panels d−f), with the 0 translation representing the perfect crystal. After generating the crystals, we predict their simulated patterns. We then use these patterns, along with the chemical formula Si, to generate the DGPT-based atomic structure and its corresponding diffraction pattern. Furthermore, the DGPT-generated structure is optimized using ALIGNN-FF, and the corresponding XRD patterns are also presented. We observe that for the defective structure, the peaks show reasonable agreement before 45° 2θ, but beyond that they begin to differ from the input XRD pattern. This can be attributed to the fact that the current work has primarily focused on perfect materials, with no defective structures explicitly included during training. However, it could be extended to defective materials in the future. Detecting defects, such as vacancies, dislocations, or other imperfections, in materials through X-ray diffraction (XRD) is a challenging task. While XRD is commonly used to study crystalline materials, the presence of defects introduces complexities in the diffraction patterns. Previous studies, such as those utilizing convolutional neural networks52−54 and Long Short-Term Memory (LSTM) networks55 for identifying vacancies and strain in semiconductors, have made progress in this area. Our model, trained on diffraction patterns from ideal structures, can be extended to defective systems by incorporating additional training data from materials with known defects. With such data, the model should be able to generalize and capture the diffraction features associated with defects and dislocations.

In conclusion, this study introduces an efficient approach for determining crystal structures from powder X-ray diffraction patterns. It goes beyond existing generative AI applications focused on scalar properties by facilitating structure generation and demonstrating the potential of using spectral data, such as XRD. The DiffractGPT model is capable of predicting material properties with high accuracy, particularly when the chemical elements of the materials are known. Notably, DiffractGPT outperforms conventional machine learning models, such as gradient boosting and convolutional neural networks, in predicting lattice constants while also providing the option to generate complete crystal structures. Additionally, the training process for DiffractGPT is straightforward, fast, and relatively easy to learn, thereby bridging the gap between the computational, data science, and experimental communities. As a complementary tool, we offer a framework that matches experimental XRD patterns with existing databases, incorporating automated background subtraction. This work represents a significant advancement in the automation of crystal structure determination and provides a robust tool for data-driven materials design, paving the way for enhanced research and development in materials science.

■ ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpclett.4c03137.
Additional examples of evaluating the performance of the DiffractGPT-formula model with experimental XRD patterns as inputs (PDF)

■ AUTHOR INFORMATION
Corresponding Author
Kamal Choudhary − Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States; orcid.org/0000-0001-9737-8074; Email: kamal.choudhary@nist.gov

Complete contact information is available at:
https://pubs.acs.org/10.1021/acs.jpclett.4c03137

Notes
The author declares no competing financial interest.

■ ACKNOWLEDGMENTS
K.C. thanks the National Institute of Standards and Technology (NIST) for computational resources. K.C. thanks Maureen E. Williams, Adam J. Biacchi, and Adam A. Creuziger at NIST for helpful discussions. This work was performed with funding from the CHIPS Metrology Program, part of CHIPS for America, National Institute of Standards and Technology, U.S. Department of Commerce. Certain commercial equipment, instruments, software, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identifications are not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

■ REFERENCES
(1) Als-Nielsen, J.; McMorrow, D. Elements of Modern X-ray Physics; John Wiley & Sons: West Sussex, UK, 2011.
(2) Giacovazzo, C. Fundamentals of Crystallography; Oxford University Press: Oxford, UK, 2002.
(3) Wyon, C. X-ray metrology for advanced microelectronics. European Physical Journal-Applied Physics 2010, 49, 20101.
(4) Holder, C. F.; Schaak, R. E. Tutorial on powder X-ray diffraction for characterizing nanoscale materials. ACS Nano 2019, 13, 7359−7365.
(5) Brown, J. G. X-rays and Their Applications; Springer: New York, 2012.
(6) Rodriguez-Carvajal, J.; Roisnel, T. FullProf.98 and WinPLOTR: Windows 95/NT Applications for Diffraction; Commission for Powder Diffraction, International Union of Crystallography, Newsletter 1998, 20, May−August.
(7) Larson, A. C.; Von Dreele, R. B. General Structure Analysis System (GSAS); Los Alamos National Laboratory Report LAUR: 1985; Vol. 86.
(8) Glavic, A.; Björck, M. GenX 3: the latest generation of an established tool. Journal of Applied Crystallography 2022, 55, 1063−1071.
(9) Coelho, A. A. TOPAS and TOPAS-Academic: an optimization program integrating computer algebra and crystallographic objects written in C++. J. Appl. Crystallogr. 2018, 51, 210−218.


(10) Wenk, H.-R.; Lutterotti, L.; Vogel, S. Rietveld texture analysis from TOF neutron diffraction data. Powder Diffraction 2010, 25, 283−296.
(11) Choudhary, K.; DeCost, B.; Chen, C.; Jain, A.; Tavazza, F.; Cohn, R.; Park, C. W.; Choudhary, A.; Agrawal, A.; Billinge, S. J.; et al. Recent advances and applications of deep learning methods in materials science. npj Computational Materials 2022, 8, 59.
(12) Vasudevan, R. K.; Choudhary, K.; Mehta, A.; Smith, R.; Kusne, G.; Tavazza, F.; Vlcek, L.; Ziatdinov, M.; Kalinin, S. V.; Hattrick-Simpers, J. Materials science in the artificial intelligence age: high-throughput library generation, machine learning, and a pathway from correlations to the underpinning physics. MRS Commun. 2019, 9, 821−838.
(13) Schmidt, J.; Marques, M. R.; Botti, S.; Marques, M. A. Recent advances and applications of machine learning in solid-state materials science. npj Computational Materials 2019, 5, 83.
(14) CHIPS.Gov. https://www.nist.gov/chips [accessed October 10, 2024].
(15) Surdu, V.-A.; Győrgy, R. X-ray diffraction data analysis by machine learning methods: a review. Applied Sciences 2023, 13, 9992.
(16) Park, W. B.; Chung, J.; Jung, J.; Sohn, K.; Singh, S. P.; Pyo, M.; Shin, N.; Sohn, K.-S. Classification of crystal structure using a convolutional neural network. IUCrJ 2017, 4, 486−494.
(17) Zhdanov, M.; Zhdanov, A. Machine learning-assisted close-set X-ray diffraction phase identification of transition metals. arXiv preprint arXiv:2305.15410, 2023 [accessed October 10, 2024].
(18) Lee, B. D.; Lee, J.-W.; Park, W. B.; Park, J.; Cho, M.-Y.; Pal Singh, S.; Pyo, M.; Sohn, K.-S. Powder X-ray diffraction pattern is all you need for machine-learning-based symmetry identification and property prediction. Advanced Intelligent Systems 2022, 4, 2200042.
(19) Banko, L.; Maffettone, P. M.; Naujoks, D.; Olds, D.; Ludwig, A. Deep learning for visualization and novelty detection in large X-ray diffraction datasets. npj Computational Materials 2021, 7, 104.
(20) Yanxon, H.; Weng, J.; Parraga, H.; Xu, W.; Ruett, U.; Schwarz, N. Artifact identification in X-ray diffraction data using machine learning methods. Journal of Synchrotron Radiation 2023, 30, 137−146.
(21) Zaloga, A. N.; Stanovov, V. V.; Bezrukova, O. E.; Dubinin, P. S.; Yakimov, I. S. Crystal symmetry classification from powder X-ray diffraction patterns using a convolutional neural network. Materials Today Communications 2020, 25, 101662.
(22) Venderley, J.; Mallayya, K.; Matty, M.; Krogstad, M.; Ruff, J.; Pleiss, G.; Kishore, V.; Mandrus, D.; Phelan, D.; Poudel, L.; et al. Harnessing interpretable and unsupervised machine learning to address big data from modern X-ray diffraction. Proceedings of the National Academy of Sciences 2022, 119, e2109665119.
(23) Lee, J.-W.; Park, W. B.; Lee, J. H.; Singh, S. P.; Sohn, K.-S. A deep-learning technique for phase identification in multiphase inorganic compounds using synthetic XRD powder patterns. Nat. Commun. 2020, 11, 86.
(24) Maffettone, P. M.; Banko, L.; Cui, P.; Lysogorskiy, Y.; Little, M. A.; Olds, D.; Ludwig, A.; Cooper, A. I. Crystallography companion agent for high-throughput materials discovery. Nature Computational Science 2021, 1, 290−297.
(25) Oviedo, F.; Ren, Z.; Sun, S.; Settens, C.; Liu, Z.; Hartono, N. T. P.; Ramasamy, S.; DeCost, B. L.; Tian, S. I.; Romano, G.; et al. Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks. npj Computational Materials 2019, 5, 60.
(26) Chen, L.; Wang, B.; Zhang, W.; Zheng, S.; Chen, Z.; Zhang, M.; Dong, C.; Pan, F.; Li, S. Crystal Structure Assignment for Unknown Compounds from X-ray Diffraction Patterns with Deep Learning. J. Am. Chem. Soc. 2024, 146, 8098.
(27) Xin, C.; Yin, Y.; Song, B.; Fan, Z.; Song, Y.; Pan, F. Machine learning-accelerated discovery of novel 2D ferromagnetic materials with strong magnetization. Chip 2023, 2, 100071.
(28) Xin, C.; Song, B.; Jin, G.; Song, Y.; Pan, F. Advancements in High-Throughput Screening and Machine Learning Design for 2D Ferromagnetism: A Comprehensive Review. Advanced Theory and Simulations 2023, 6, 2300475.
(29) Choudhary, K. AtomGPT: Atomistic Generative Pretrained Transformer for Forward and Inverse Materials Design. J. Phys. Chem. Lett. 2024, 15, 6909−6917.
(30) Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems; 2017; Vol. 30.
(31) Tunstall, L.; Von Werra, L.; Wolf, T. Natural Language Processing with Transformers; O'Reilly Media, Inc.: 2022.
(32) Rothman, D. Transformers for Natural Language Processing: Build Innovative Deep Neural Network Architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and More; Packt Publishing Ltd: 2021.
(33) Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.-L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica 2023, 10, 1122−1136.
(34) Pimentel, A.; Wagener, A.; da Silveira, E. F.; Picciani, P.; Salles, B.; Follmer, C.; Oliveira Jr, O. N. Challenging ChatGPT with Chemistry-Related Subjects. ChemRxiv preprint 2023, DOI: 10.26434/chemrxiv-2023-xl6w3 [accessed October 10, 2024].
(35) Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Leveraging large language models for predictive chemistry. Nature Machine Intelligence 2024, 6, 161.
(36) Polak, M. P.; Morgan, D. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nat. Commun. 2024, 15, 1569.
(37) Wines, D.; Gurunathan, R.; Garrity, K. F.; DeCost, B.; Biacchi, A. J.; Tavazza, F.; Choudhary, K. Recent progress in the JARVIS infrastructure for next-generation data-driven materials design. Applied Physics Reviews 2023, 10, 041302.
(38) Choudhary, K.; Garrity, K. F.; Reid, A. C.; DeCost, B.; Biacchi, A. J.; Hight Walker, A. R.; Trautt, Z.; Hattrick-Simpers, J.; Kusne, A. G.; Centrone, A.; et al. The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Computational Materials 2020, 6, 173.
(39) Jiang, A. Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D. S.; Casas, D. d. l.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023 [accessed October 10, 2024].
(40) Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583−589.
(41) Choudhary, K.; Zhang, Q.; Reid, A. C.; Chowdhury, S.; Van Nguyen, N.; Trautt, Z.; Newrock, M. W.; Congo, F. Y.; Tavazza, F. Computational screening of high-performance optoelectronic materials using OptB88vdW and TB-mBJ formalisms. Scientific Data 2018, 5, 1−12.
(42) Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021 [accessed October 10, 2024].
(43) Unsloth AI. Unsloth, GitHub repository. https://github.com/unslothai/unsloth [accessed October 10, 2024].
(44) Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023 [accessed October 10, 2024].
(45) Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023 [accessed October 10, 2024].
(46) Taori, R.; Gulrajani, I.; Zhang, T.; Dubois, Y.; Li, X.; Guestrin, C.; Liang, P.; Hashimoto, T. B. Stanford Alpaca: an instruction-following Llama model, 2023. https://github.com/tatsu-lab/stanford_alpaca [accessed October 10, 2024].
(47) Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing 2024, 568, 127063.


(48) Ong, S. P.; Richards, W. D.; Jain, A.; Hautier, G.; Kocher, M.;
Cholia, S.; Gunter, D.; Chevrier, V. L.; Persson, K. A.; Ceder, G. Python
Materials Genomics (pymatgen): A robust, open-source python library
for materials analysis. Comput. Mater. Sci. 2013, 68, 314−319.
(49) Xie, T.; Fu, X.; Ganea, O.-E.; Barzilay, R.; Jaakkola, T. Crystal
diffusion variational autoencoder for periodic material generation. arXiv
preprint arXiv:2110.06197, 2021 [Accessed October 10, 2024].
(50) Li, Y.; Yang, W.; Dong, R.; Hu, J. MLatticeABC: generic lattice constant prediction of crystal materials using machine learning. ACS Omega 2021, 6, 11585−11594.
(51) Choudhary, K.; DeCost, B.; Major, L.; Butler, K.; Thiyagalingam,
J.; Tavazza, F. Unified graph neural network force-field for the periodic
table: solid state applications. Digital Discovery 2023, 2, 346−355.
(52) Lim, B.; Bellec, E.; Dupraz, M.; Leake, S.; Resta, A.; Coati, A.;
Sprung, M.; Almog, E.; Rabkin, E.; Schulli, T.; et al. A convolutional
neural network for defect classification in Bragg coherent X-ray
diffraction. npj Computational Materials 2021, 7, 115.
(53) Boulle, A.; Debelle, A. Convolutional neural network analysis of
x-ray diffraction data: strain profile retrieval in ion beam modified
materials. Machine Learning: Science and Technology 2023, 4, 015002.
(54) Judge, W.; Chan, H.; Sankaranarayanan, S.; Harder, R. J.;
Cabana, J.; Cherukara, M. J. Defect identification in simulated Bragg
coherent diffraction imaging by automated AI. MRS Bull. 2023, 48,
124−133.
(55) Motamedi, M.; Shidpour, R.; Ezoji, M. LSTM-based framework
for predicting point defect percentage in semiconductor materials using
simulated XRD patterns. Sci. Rep. 2024, 14, 24353.

