Gupta Et Al 2023 Mppredictor An Artificial Intelligence Driven Web Tool For Composition Based Material Property
Journal of Chemical Information and Modeling, pubs.acs.org/jcim, Application Note
J. Chem. Inf. Model. 2023, 63, 1865−1871. https://doi.org/10.1021/acs.jcim.3c00307
Received: February 27, 2023. Published: March 27, 2023
© 2023 The Authors. Published by American Chemical Society
■ INTRODUCTION
Traditionally, the fields of materials science and chemistry have involved conducting simulations and hands-on experiments in the lab to understand and discover new materials with desired properties. Over time, the data generated by such simulations and experiments have accumulated to the point that it has become possible to apply data-driven methods to build predictive models for material properties. This led to the emergence of new subfields called materials informatics1−5 and cheminformatics6−8 based on the fourth paradigm of science,9 which tries to unify the first three paradigms of experiment, theory, and simulation. Recent years have seen a surge in research works that utilize data-driven methodologies to predict and optimize material properties.10−18 In order to realize the vision of the development of advanced materials, the U.S. government launched the Materials Genome Initiative (MGI)19 in 2011. The initiative aimed to reduce the time lag between the discovery of materials and their deployment to half, at a fraction of the cost. The Materials Genome Initiative Strategic Plan,20,21 released in 2014 and expanded in 2021, also recognizes data analytics and artificial intelligence (AI) as one of the main tools to integrate advanced modeling, computational and experimental tools, and quantitative data to realize the vision of MGI. It is in the spirit and pursuit of the vision and approach of MGI that we discuss and present in this software article an online informatics tool to predict 41 material properties using the chemical composition as input, including many properties for which large data sets are not available, making it harder to build a highly accurate model by traditional model training methods. The online informatics tool deployed in this work can play a crucial role in the process of materials discovery and design.
■ IMPLEMENTATION
The MPpredictor web tool has two primary components, as described below:
• Front-end: The front-end is created using HTML (hypertext markup language), CSS (cascading style sheets), Bootstrap, and jQuery. HTML is the standard markup language for creating web pages and is used to describe the structure of a web page. CSS is used to style and lay out web pages, and Bootstrap is a free front-end framework for faster and easier web development that includes HTML- and CSS-based design templates. jQuery is a fast, small, and feature-rich JavaScript library that simplifies tasks such as HTML document traversal and manipulation and event handling. We created two basic
webpages using HTML, where the first webpage handles the user input and the second webpage displays the output provided by the API. We provide forms to give a set of inputs to the web tool and a button to predict the desired material properties for a given input. We also use CSS and Bootstrap to create a responsive, user-friendly, and interactive front-end. jQuery is used to send and handle the request to the API when the button is clicked.
• Back-end: The back-end comprises the trained models and an API built with Flask. Flask is a microframework in Python that is extensively used to deploy ML models on the web. The models are trained using a cross-property transfer learning framework22 to build accurate models for 41 material properties. Details of the model training performed to build the deployed models are described in a later section. We create a function to wrap the model so that it can process the request sent from the input webpage as follows: (1) Initialize the model and reload the weights (for deep learning models). (2) Process the input into the composition-based features required by the predictive model. (3) Call the predict function of the model on the preprocessed input. (4) Postprocess the output of the model for display on the webpage. The Flask API is mainly used for communicating with and handling requests from the webpage.
The block diagram of these components of the web tool is shown in Figure 1.
■ USAGE
The web tool developed in this work deploys AI models to predict various properties of materials based on their chemical compositions. A screenshot of the default view of the main page of the deployed material property predictor is shown in Figure 2. The instructions to use the web tool are as follows:
• First, the user selects the material properties they want to predict from a given set of checkboxes. The material property to predict is set to “All” by default. A labeled box with the broad categories to which the properties belong is also provided, which can be used to select all available properties of a particular category if desired.
• Next, the user provides a list of chemical compositions in the text box below the periodic table (with “Enter Your Chemical Composition Here” as the default text) to specify the compounds for which they would like to obtain the predicted material properties. Multiple chemical compositions must be separated by spaces. The elements indicated in red in the periodic table in Figure 2 may not be used. The user can also use the periodic table and the action buttons provided to enter and modify the chemical compositions they want to provide as input to the web tool.
• Finally, the user clicks the Submit button to process the inputs (i.e., the selected material properties and the list of chemical compositions) and get the predicted material properties. The user may see a table specifying “Invalid Inputs” if there is an error in the type of input provided to the web tool. In that case, the user can go back to the previous page (using the web browser or the “Back to Home” button) and redo the process with a correct set of inputs.
The results are presented in the form of a table with columns representing each of the compositions given as input and rows representing the selected material properties from the best model. Depending on the number of material properties selected and the chemical compositions provided, the web tool may take some time (on the order of a few seconds) to generate the predictions. The screenshots of the main and result pages of the material property predictor with an example of predicting selected material properties for a given list of chemical compositions are depicted in Figures 3 and 4. The tool is available online at http://ai.eecs.northwestern.edu/MPpredictor.
■ MODEL DEPLOYMENT
We use the cross-property transfer learning framework,22 composed of supervised learning techniques, to learn predictive models for 41 material properties given the composition of the materials, without the use of structure-related attributes. The training methods and model training performed in this work are described as follows:
Methods. The models trained using the cross-property transfer learning framework consist of a base model, scratch
(SC) models, and transfer learning (TL) models. The base model is a baseline model that uses the average property value of the training data as the predicted value. For SC models, we perform the model training directly on the target data set from scratch using ML/DL models with composition-based elemental fractions23 (EF) and physical attributes14 (PA) as model input, without providing the model with any form of pretrained knowledge. For TL models, we use a model pretrained on the source data set using only EF as the representation for the model input to perform fine-tuning and feature extraction-based TL on the target data set. Fine-tuning uses the weights of the pretrained model as the initial weights for further training on the target data set with the same architecture as the source model. We performed two types of fine-tuning, i.e., traditional fine-tuning and modified fine-tuning. Traditional fine-tuning uses all the weights of the source model without any modification. Modified fine-tuning also uses all the weights of the source model except for the last layer, which is randomly initialized. Feature extraction-based TL uses the activations of the pretrained model from a given layer as the representation for each compound of the target data set, which is then used as an input to the ML/DL models. For example, if we extract the representation from ElemNet’s first layer, each compound will be represented as a 1024-dimensional feature vector, which can then be used as an input to other ML/DL models. The same process can be followed for other layers with different numbers of neurons.
Model Training. For model training, we divide the available data for each property into training and validation sets in a 9:1 ratio without a holdout test set to maximize the data used for model training, since these models are developed for deployment. The prediction accuracies of the best models selected and deployed in the web tool based on validation MAE values for each of the 41 target properties are shown in Table 1. The results on the holdout test sets are provided in the Supporting Information.
Figure 2. Screenshot of the default view of the main page of the deployed material property predictor web tool.
Figure 3. Screenshot of the main page of the deployed material property predictor web tool with an example input.
Figure 4. Screenshot of the result page of the deployed material property predictor web tool with an example output.
Scope and Limitations. In this application note, we describe the development of an online material property predictor (MPpredictor) that takes a list of chemical [...]
is advantageous as structure information can be unavailable or difficult to obtain in many cases, while on the other hand these models cannot distinguish between structure polymorphs of a given composition. We believe that given the huge chemical space, composition-based models such as those deployed in this software tool can be instrumental in first identifying promising composition systems of elemental combinations, which could subsequently be analyzed further with structure-prediction methods and structure-aware property prediction methods.14,22,23 Note that for some input compositions, there may be a difference between the predicted value and the value specified in DFT databases due to the nature of the model training of the machine learning models (the presence of such differences is also well recognized24 even across the DFT databases).
■ CONCLUSION AND SIGNIFICANCE
From a research point of view, this work explores the applicability of predictive modeling techniques to predict material properties in the form of a web tool. Unlike many other existing works which primarily focus on predicting a single or a few material properties,15,16 here we perform material property prediction for a wide range of material properties, comprised of both DFT-computed and experimental data sets. From a practical perspective, we have deployed the most accurate predictive models for 41 material properties in a software tool with a user-friendly and easy-to-access design. The main advantage of this software tool is its ability to accurately predict multiple material properties in a matter of seconds by just using the chemical composition of the material without the need for any structure-related information, which is, in general, hard to obtain and is also required when performing expensive DFT simulations. The deployed software tool is expected to be a valuable resource to assist in the process of searching for better materials with improved properties for researchers and practitioners in materials science, chemistry, and related communities.
■ ASSOCIATED CONTENT
Data Availability Statement
The material property predictor web tool developed in this work is available online at http://ai.eecs.northwestern.edu/MPpredictor. The code used to develop the web tool is also publicly available at https://github.com/GuptaVishu2002/MPpredictor. The code and data required to perform TL used in this study are available at 10.5281/zenodo.5533023.
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c00307.
Results on holdout test set, Table 1, and Figure 1 (PDF)
Table 1. Prediction Performance in Terms of Validation MAE of Best SC and TL Models Selected for Each of the Target Material Properties
Property                                    Data size  Base     Best SC   Best TL
Kpoints Length Unit (Å)                     28,056     18.77    10.69     11.15
Kpoints Array Average (Å)                   28,171     5.232    2.692     2.779
Bandgap Optb88vdw (eV)                      28,163     0.987    0.253     0.228
Formation Energy (eV/atom)                  28,155     0.850    0.127     0.115
Encut (eV)                                  28,108     246.2    76.02     80.04
Ehull (eV/atom)                             27,297     0.130    0.054     0.0471
Magmom Oszicar (μB)                         25,844     1.222    0.408     0.377
Magmom Outcar (μB)                          25,357     1.172    0.385     0.343
Eps Refractive Index (x)                    25,150     3.829    1.259     1.204
P Powerfact (Wm1 K1)                        16,250     650.2    486.3     477.9
N Powerfact (Wm1 K1)                        16,250     657.8    478.1     477.1
P Effective Masses 300 K Average (kg)       16,763     1.917    1.081     1.109
N Effective Masses 300 K Average (kg)       16,760     1.917    1.081     1.120
P Seebeck (μV/K)                            14,439     163.1    56.14     56.53
N Seebeck (μV/K)                            14,144     108.6    48.47     49.73
Meps Refractive Index (x)                   11,349     4.904    1.738     1.726
Max (Phonon) Mode (cm−1)                    10,963     284.6    55.97     62.12
Min (Phonon) Mode (cm−1)                    10,930     40.76    22.89     21.57
Elastic Tensor C11 (GPa)                    10,839     81.56    34.25     32.19
Elastic Tensor C12 (GPa)                    10,759     44.95    16.52     16.49
Elastic Tensor C13 (GPa)                    10,846     42.51    13.86     13.46
Elastic Tensor C22 (GPa)                    10,832     84.01    33.70     31.26
Elastic Tensor C33 (GPa)                    10,856     84.03    35.11     33.16
Elastic Tensor C44 (GPa)                    9986       29.52    14.94     14.51
Elastic Tensor C55 (GPa)                    9755       26.63    12.35     11.25
Elastic Tensor C66 (GPa)                    9739       27.56    13.43     12.68
Bulk Modulus KV (GPa)                       10,743     49.07    11.36     10.52
Shear Modulus GV (GPa)                      10,209     24.21    11.05     10.28
Bandgap MBJ (eV)                            7296       1.910    0.539     0.500
Spillage (Å−1)                              3866       0.499    0.349     0.334
SLME (%)                                    3006       9.442    6.636     5.827
Max Ir Mode (cm−1)                          2302       422.5    83.68     95.71
Min Ir Mode (cm−1)                          2268       66.13    37.39     40.72
Dfpt Piezo Max Dielectric Electronic (ε11)  2126       5.715    2.809     2.520
Dfpt Piezo Max Dielectric (ε11)             2126       6.953    3.437     3.206
Dfpt Piezo Max Dielectric Ionic (ε11)       2126       2.559    0.783     0.691
Dfpt Piezo Max Eij (cm−2)                   1123       0.515    0.373     0.359
Dfpt Piezo Max Dij (cm−2)                   689        44.05    21.51     19.95
Exfoliation Energy (eV/atom)                557        61.29    37.72     35.74
Experimental Formation Energy (eV/atom)     1643       1.033    0.101     0.072
Experimental Bandgap (eV)                   4920       1.205    0.423     0.348
The lowest MAE values in each row are highlighted in bold.
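As a rough illustration of how the columns of Table 1 relate to the deployed models, the sketch below takes a few rows of validation MAE values copied from Table 1 and, for each property, picks whichever of the best SC and best TL models has the lower MAE, together with its fractional MAE reduction relative to the base model. The names `ROWS` and `pick_best` are hypothetical and are not taken from the MPpredictor code; the selection rule only mirrors the statement that the deployed models were selected based on validation MAE, and the tie-breaking toward SC is an assumption.

```python
# Illustrative only: (base, best SC, best TL) validation MAE values copied
# from a few rows of Table 1. All names here are hypothetical, not MPpredictor's.
ROWS = {
    "Formation Energy (eV/atom)": (0.850, 0.127, 0.115),
    "Encut (eV)": (246.2, 76.02, 80.04),
    "Bandgap MBJ (eV)": (1.910, 0.539, 0.500),
    "Max Ir Mode (cm-1)": (422.5, 83.68, 95.71),
}


def pick_best(base_mae, sc_mae, tl_mae):
    """Return (model type, its MAE, fractional MAE reduction vs. the base model).

    The model type with the lower validation MAE wins; ties go to SC,
    which is an arbitrary choice made for this sketch.
    """
    label, mae = ("SC", sc_mae) if sc_mae <= tl_mae else ("TL", tl_mae)
    return label, mae, 1.0 - mae / base_mae


if __name__ == "__main__":
    for prop, (base, sc, tl) in ROWS.items():
        label, mae, gain = pick_best(base, sc, tl)
        print(f"{prop}: deploy {label} (MAE {mae}, {gain:.0%} below base)")
```

Reading the table this way makes it clear that TL is not uniformly better: for some properties in Table 1 (for example Encut and Max Ir Mode) the best SC model has the lower validation MAE.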
Gaithersburg, Maryland 20899, United States; Theiss Research, La Jolla, California 92037, United States; DeepMaterials LLC, Silver Spring, Maryland 20906, United States; orcid.org/0000-0001-9737-8074
Yuwei Mao − ECE Department, Northwestern University, Evanston, Illinois 60208, United States
Kewei Wang − ECE Department, Northwestern University, Evanston, Illinois 60208, United States
Francesca Tavazza − Materials Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
Carelyn Campbell − Materials Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
Wei-keng Liao − ECE Department, Northwestern University, Evanston, Illinois 60208, United States
Alok Choudhary − ECE Department, Northwestern University, Evanston, Illinois 60208, United States
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jcim.3c00307
Notes
The authors declare no competing financial interest.
■ ACKNOWLEDGMENTS
This work was performed under the following financial assistance: Award 70NANB19H005 from the U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD). Partial support is also acknowledged from NSF Award CMMI-2053929 and DOE Awards DE-SC0019358 and DE-SC0021399, and Northwestern Center for Nanocombinatorics.
■ REFERENCES
(1) Agrawal, A.; Choudhary, A. Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science. APL Materials 2016, 4, 053208.
(2) Kalidindi, S. R.; De Graef, M. Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 2015, 45, 171−193.
(3) Rajan, K. Materials informatics: The materials “gene” and big data. Annu. Rev. Mater. Res. 2015, 45, 153−169.
(4) Agrawal, A.; Choudhary, A. Deep materials informatics: Applications of deep learning in materials science. MRS Commun. 2019, 9, 779−792.
(5) Choudhary, K.; DeCost, B.; Chen, C.; Jain, A.; Tavazza, F.; Cohn, R.; Park, C. W.; Choudhary, A.; Agrawal, A.; Billinge, S. J. L.; Holm, E.; Ong, S. P.; Wolverton, C. Recent advances and applications of deep learning methods in materials science. npj Computational Materials 2022, 8, 1−26.
(6) Xu, J.; Hagler, A. Chemoinformatics and drug discovery. Molecules 2002, 7, 566−600.
(7) Hassan, M.; Brown, R. D.; Varma-O’Brien, S.; Rogers, D. Cheminformatics analysis and learning in a data pipelining environment. Molecular diversity 2006, 10, 283−299.
(8) Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug discovery today 2018, 23, 1538−1546.
(9) Hey, A. J.; Tansley, S.; Tolle, K. M. The Fourth Paradigm: Data-Intensive Scientific Discovery; Microsoft Research: Redmond, WA, 2009; Vol. 1.
(10) Gopalakrishnan, K.; Agrawal, A.; Ceylan, H.; Kim, S.; Choudhary, A. Knowledge discovery and data mining in pavement inverse analysis. Transport 2013, 28, 1−10.
(11) Agrawal, A.; Deshpande, P. D.; Cecen, A.; Basavarsu, G. P.; Choudhary, A. N.; Kalidindi, S. R. Exploration of data science techniques to predict fatigue strength of steel from composition and processing parameters. Integrating Materials and Manufacturing Innovation 2014, 3, 90−108.
(12) Liu, R.; Kumar, A.; Chen, Z.; Agrawal, A.; Sundararaghavan, V.; Choudhary, A. A predictive machine learning approach for microstructure optimization and materials design. Sci. Rep. 2015, 5, 1−12.
(13) Liu, R.; Yabansu, Y. C.; Agrawal, A.; Kalidindi, S. R.; Choudhary, A. N. Machine learning approaches for elastic localization linkages in high-contrast composite materials. Integrating Materials and Manufacturing Innovation 2015, 4, 192−208.
(14) Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials 2016, 2, 1−7.
(15) Agrawal, A.; Choudhary, A. A fatigue strength predictor for steels using ensemble data mining: steel fatigue strength predictor. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016; pp 2497−2500.
(16) Agrawal, A.; Meredig, B.; Wolverton, C.; Choudhary, A. A formation energy predictor for crystalline materials using ensemble data mining. In 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016; pp 1276−1279.
(17) Jha, D.; Gupta, V.; Liao, W.-k.; Choudhary, A.; Agrawal, A. Moving closer to experimental level material property prediction using AI. Sci. Rep. 2022, 12, 12.
(18) Gupta, V.; Liao, W.-k.; Choudhary, A.; Agrawal, A. Brnet: Branched residual network for fast and accurate predictive modeling of materials properties. In Proceedings of the 2022 SIAM international conference on data mining (SDM). Society for Industrial and Applied Mathematics 2022, 343−351.
(19) Materials Genome Initiative for Global Competitiveness; National Science and Technology Council, 2011.
(20) Drosback, M. Materials genome initiative: advances and initiatives. JOM 2014, 66, 334.
(21) Initiative, M. G. Strategic Plan, 2021. Materials Genome Initiative. https://www.mgi.gov (accessed 2022−06−14).
(22) Gupta, V.; Choudhary, K.; Tavazza, F.; Campbell, C.; Liao, W.-k.; Choudhary, A.; Agrawal, A. Cross-property deep transfer learning framework for enhanced predictive analytics on small materials data. Nat. Commun. 2021, 12, 1−10.
(23) Jha, D.; Ward, L.; Paul, A.; Liao, W.-k.; Choudhary, A.; Wolverton, C.; Agrawal, A. ElemNet: Deep Learning the Chemistry of Materials From Only Elemental Composition. Sci. Rep. 2018, 8, 17593.
(24) Hegde, V. I.; Borg, C. K.; del Rosario, Z.; Kim, Y.; Hutchinson, M.; Antono, E.; Ling, J.; Saxe, P.; Saal, J. E.; Meredig, B. Reproducibility in high-throughput density functional theory: a comparison of AFLOW, Materials Project, and OQMD. arXiv Preprint, arXiv:2007.01988, 2020.