Predicting Material Properties Using Machine Learning For Accelerated Materials Discovery
Volume 1, Issue 3, 2022
_________________________________________________________________________________________
Abstract
The rapid prediction of material properties has become a pivotal factor in accelerating materials discovery and development, driven by advances in machine learning and data-driven methodologies. This paper presents a novel system for predicting material properties using machine learning techniques, offering a scalable and efficient framework for exploring new materials with optimized properties. The system combines large datasets, feature engineering, and multiple machine learning models, such as Kernel Ridge Regression, Random Forest, and Neural Networks, to predict properties such as thermal conductivity, elastic modulus, and electronic bandgap. By integrating physics-based knowledge into the machine learning models, the proposed system enhances both the accuracy and the interpretability of predictions. The results indicate that the system can significantly reduce the time and cost of materials discovery while delivering high prediction accuracy. This approach has the potential to revolutionize materials science by enabling researchers to identify promising material candidates in silico, paving the way for breakthroughs in energy, electronics, and sustainable materials.
Keywords: Machine Learning, Feature Engineering, Materials Informatics, Data-Driven Discovery, Random Forest, Kernel
Ridge Regression, Neural Networks.
How to Cite: Suryawanshi, N. S. (2022). Predicting Material Properties Using Machine Learning for Accelerated Materials Discovery. International Journal of Scientific Research and Modern Technology, 1(3).
https://doi.org/10.38124/ijsrmt.v1i3.89
At its core, this paradigm can predict material properties, allowing researchers to computationally screen vast numbers of potential materials before engaging in time-consuming and costly experimental procedures [5]. This approach has the potential to drastically reduce the time and expense of materials discovery by identifying high-potential candidates for applications spanning energy storage, electronics, and structural materials [6,7].

The success of this data-driven strategy rests on machine learning algorithms, which can capture and model complex relationships between material composition, structure, and properties [8]. These algorithms are trained on large datasets of existing materials and their known properties, enabling predictions about new materials that have not yet been explored [9]. Machine learning has already demonstrated its potential in several materials science applications, including thermoelectric materials discovery, crystal structure prediction [10], and catalyst optimization [11].

Figure 1 (Machine Learning Methodology) illustrates the workflow. First, the material patterns within a group are converted into numerical fingerprint vectors. Then, a measure of how chemically similar or different they are, called the chemical distance, is used in a learning model (here, kernel ridge regression) to map these distances to the materials' properties [12].

One of the significant challenges in property prediction lies in creating and curating high-quality datasets for model training [13]. Efforts such as the Materials Project, AFLOW, and OQMD have made substantial progress in gathering and organizing materials data and making it available to researchers worldwide [14,15,16]. Combining these datasets with advanced data mining techniques has made it possible to extract valuable insights and patterns from existing knowledge about materials.

An equally important aspect of this predictive process is the development of appropriate descriptors, or features, that effectively capture the essential characteristics of materials [17]. These descriptors, which range from simple elemental composition to more complex electronic structure attributes, strongly influence the accuracy and generalizability of predictive models.

As the field continues to evolve, there is a growing emphasis on integrating physics-based principles with machine learning models to create hybrid approaches that offer both predictive power and interpretability. These methods aim to combine the flexibility of machine learning with the fundamental insights provided by materials science, potentially leading to more accurate models for property prediction. The impact of these advances is far-reaching, with the potential to revolutionize industries such as electronics, energy, aerospace, and healthcare by significantly accelerating materials discovery and optimization [18].

This paper explores advances in predicting material properties to accelerate the discovery of novel materials. We review current methodologies, challenges, and opportunities in this dynamic field, focusing on various machine learning techniques and the essential role of big data in enhancing material property predictions. Additionally, we discuss the integration of computational predictions with experimental validation.

The proposed system employs a hybrid machine learning framework that uses deep neural networks, regression models, and decision trees to analyze large-scale datasets and identify complex patterns in material compositions. By incorporating feature engineering and physics-based knowledge, the system aims to improve the accuracy and interpretability of predictions.

By enhancing predictive capabilities, this approach paves the way for breakthroughs in clean energy, advanced electronics, and sustainable materials, driving innovation across multiple industries at the intersection of materials science, data analytics, and machine learning.

II. LITERATURE REVIEW

Greeley et al. pioneered a method for high-throughput computational screening of surface catalysts, combining density functional theory (DFT) calculations with thermodynamic modeling to predict the stability and activity of binary surface alloys for hydrogen evolution [18]. This foundational work established a framework for using computational techniques to efficiently evaluate specific properties of numerous materials.

Expanding on this concept, Johannesson et al. introduced a genetic algorithm approach to predict stable alloy compositions [19]. Their method coupled DFT calculations with a genetic algorithm, effectively navigating the extensive compositional landscape of multicomponent alloys. This research demonstrated the effectiveness of combining computational methods with optimization algorithms to hasten the discovery of new materials with desired characteristics.

Balachandran et al. were among the early adopters of machine learning in materials science, focusing on property prediction [20]. They developed a support vector machine (SVM) model to predict the formation of specific crystal structures in AB2 intermetallic compounds. This study illustrated the capacity of machine learning techniques to uncover intricate relationships between material composition and structure, facilitating the rapid screening of new compounds.
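The methodology summarized in Figure 1 (fingerprint vectors, a chemical distance, and kernel ridge regression [12]) can be illustrated with a minimal, dependency-free Python sketch. It assumes that the chemical distance is the Euclidean distance between fingerprints and that similarity is a Gaussian kernel of that distance; the class name SimpleKernelRidge, the hyperparameters sigma and lam, and the toy fingerprints are illustrative assumptions, not the authors' implementation or data. The regression weights solve (K + λI)α = y, and a prediction is the kernel-weighted sum of those weights.

```python
import math

def chemical_distance(a, b):
    """Euclidean distance between two fingerprint vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_kernel(a, b, sigma):
    """Similarity that decays with chemical distance: exp(-d^2 / (2 sigma^2))."""
    d = chemical_distance(a, b)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

class SimpleKernelRidge:
    """Kernel ridge regression: alpha = (K + lam*I)^-1 y, f(x) = sum_i alpha_i k(x, x_i)."""

    def __init__(self, sigma=1.0, lam=1e-3):
        self.sigma = sigma  # kernel width (normally tuned by cross-validation)
        self.lam = lam      # regularization strength

    def fit(self, X, y):
        self.X_train = [list(x) for x in X]
        n = len(self.X_train)
        # Kernel (similarity) matrix with the ridge term added on the diagonal.
        K = [
            [gaussian_kernel(self.X_train[i], self.X_train[j], self.sigma)
             + (self.lam if i == j else 0.0) for j in range(n)]
            for i in range(n)
        ]
        self.alpha = solve(K, list(y))
        return self

    def predict(self, X):
        return [
            sum(a * gaussian_kernel(x, xi, self.sigma)
                for a, xi in zip(self.alpha, self.X_train))
            for x in X
        ]

if __name__ == "__main__":
    # Hypothetical 2-D fingerprints and property values (illustrative only, not real data).
    X_train = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    y_train = [1.2, 3.4, 2.3, 2.0]
    model = SimpleKernelRidge(sigma=1.0, lam=1e-3).fit(X_train, y_train)
    print(model.predict([[0.25, 0.75]]))
```

The hand-written solver only makes each step visible; in practice a library estimator such as scikit-learn's KernelRidge with an RBF kernel would normally be used, with sigma and the regularization strength chosen by cross-validation.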
Long et al. advanced the application of machine learning in materials science by creating a neural network model to predict the glass-forming ability of metallic alloys [21]. Their model, trained on a comprehensive dataset of known glass-forming alloys, showcased the ability of machine learning to capture complex, non-linear relationships in materials data.

Rupp et al. contributed significantly by proposing a machine learning model for predicting molecular atomization energies [22]. They introduced a kernel-based regression model that used a descriptor derived from the Coulomb matrix, which effectively captures the atomic arrangement of molecules. This research highlighted how machine learning can accurately predict quantum mechanical properties at a fraction of the computational expense of traditional methods.

Pilania et al. advanced the field by creating a machine learning framework for predicting the bandgaps of double perovskites [23]. By merging high-throughput DFT calculations with statistical learning techniques, they established a predictive model for bandgaps, demonstrating the capability of machine learning to efficiently screen numerous compounds for specific electronic properties.

Ghiringhelli et al. addressed the critical issue of feature selection in materials informatics by introducing a method known as SISSO (sure independence screening and sparsifying operator), aimed at identifying optimal descriptors from a large pool of candidate features [24]. This work underscored the importance of feature engineering in developing accurate and interpretable machine learning models for material property prediction.

Ward et al. built on this work in feature engineering by presenting an extensive set of compositional descriptors for inorganic materials [25]. They developed 145 descriptors based on elemental properties and showed their effectiveness in predicting various material characteristics. This research provided valuable tools for researchers building machine learning models for material property prediction.

In energy materials research, Olivares-Amaya et al. designed a high-throughput computational strategy for screening organic photovoltaic materials [26]. By combining DFT calculations with a genetic algorithm, they explored the vast chemical space of organic molecules and identified promising candidates for solar cell applications. This work illustrated the potential of computational screening to accelerate the discovery of new materials for renewable energy.

Sharma et al. applied machine learning to the challenge of predicting thermoelectric properties [27]. They developed a support vector regression model to estimate the thermoelectric figure of merit (ZT) of half-Heusler compounds. This study demonstrated how machine learning can guide the identification of high-performance thermoelectric materials, which are crucial for waste heat recovery and solid-state cooling technologies.

Lookman et al. provided a thorough review of the challenges and opportunities in materials discovery and design through machine learning [28]. Their discussion encompassed many facets of the field, including data generation and curation, feature engineering, model selection, and the integration of machine learning with physical models. This work emphasized the interdisciplinary nature of materials informatics, highlighting the need for collaboration among materials scientists, computer scientists, and data scientists.

Ramakrishna et al. examined the broader potential of machine learning in materials science, discussing its applications across various subfields and its capacity to expedite materials discovery and development [29]. They stressed the importance of developing interpretable machine learning models and of combining data-driven strategies with domain expertise in materials science.
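Compositional descriptors of the kind discussed by Ward et al. [25] are built largely from statistics of elemental properties weighted by atomic fraction. The Python sketch below computes three such statistics (weighted mean, weighted average deviation, and range). The two-property lookup table (Pauling electronegativity and atomic number), the simple formula parser, and the function names are illustrative assumptions; this is not Ward et al.'s code and covers only a small subset of their 145 attributes.

```python
import re

# Illustrative elemental property table (Pauling electronegativity, atomic number Z).
# Only a few elements are listed; a real descriptor set would cover the periodic table.
ELEMENT_PROPS = {
    "Li": {"electronegativity": 0.98, "Z": 3},
    "O": {"electronegativity": 3.44, "Z": 8},
    "Na": {"electronegativity": 0.93, "Z": 11},
    "Si": {"electronegativity": 1.90, "Z": 14},
    "Cl": {"electronegativity": 3.16, "Z": 17},
    "Fe": {"electronegativity": 1.83, "Z": 26},
}

def parse_formula(formula):
    """Convert a flat formula such as 'Fe2O3' into atomic fractions.

    Parentheses, hydrates, and charges are not handled (simplifying assumption).
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}

def composition_features(formula, props=ELEMENT_PROPS):
    """Return fraction-weighted statistics of each elemental property.

    Raises KeyError if the formula contains an element missing from `props`.
    """
    fractions = parse_formula(formula)
    property_names = next(iter(props.values())).keys()
    features = {}
    for name in property_names:
        values = {el: props[el][name] for el in fractions}
        mean = sum(fractions[el] * values[el] for el in fractions)
        features[f"mean_{name}"] = mean
        # Fraction-weighted mean absolute deviation from the weighted mean.
        features[f"avg_dev_{name}"] = sum(
            fractions[el] * abs(values[el] - mean) for el in fractions
        )
        features[f"range_{name}"] = max(values.values()) - min(values.values())
    return features

if __name__ == "__main__":
    for formula in ("NaCl", "Fe2O3", "Li2O"):
        print(formula, composition_features(formula))
```

For example, Fe2O3 is treated as Fe0.4O0.6, so its weighted mean atomic number is 0.4 × 26 + 0.6 × 8 = 15.2. Feature vectors of this kind could serve as the numerical fingerprints fed to a regression model.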
Ref. | Contribution | Methods | Data | Limitations
[34] | Proposed a framework for computational catalyst design, demonstrating its potential through case studies on ammonia synthesis and hydrogen evolution catalysts. | Density functional theory calculations, microkinetic modeling. | DFT-calculated adsorption energies and reaction barriers. | Simplifications in models may limit accuracy for complex catalytic systems; focused on specific reactions.
[35] | Developed an adaptive design strategy using uncertainties, leading to the discovery of a new class of high-performance piezoelectrics that outperform previous materials. | Bayesian optimization, uncertainty quantification, density functional theory calculations. | PMN-PT piezoelectric composition-property database. | May require significant computational resources for complex material systems; limited to specific material classes.
[36] | Created a data mining approach for ionic substitutions, leading to the discovery of 209 new ternary compounds, with 177 confirmed stable by DFT calculations. | Data mining techniques, ionic substitution rules, density functional theory calculations. | Inorganic Crystal Structure Database (ICSD). | Limited to ionic compounds; may miss non-traditional substitutions or compounds with complex bonding.
[37] | Developed a novel materials representation method using structural and electronic fingerprints, enabling efficient exploration and visualization of materials space. | Machine learning, data mining, electronic structure calculations. | Materials Project database. | Accuracy depends on the quality and diversity of the training data; may struggle with very complex materials.
[47] | Developed a computational screening method for electrocatalysts, identifying promising new materials for hydrogen evolution. | Density functional theory calculations, screening algorithms. | Binary surface alloy adsorption energies. | Simplifications in models may limit accuracy for complex catalytic systems.
[48] | Determined the structural, vibrational, and thermodynamic properties of diamond, graphite, and derivatives using first-principles calculations. | Density functional theory, phonon calculations. | Carbon allotrope properties. | Limited to specific carbon-based materials; accuracy depends on the DFT functionals used.
In addition to improving predictive accuracy and efficiency, the system is expected to generate valuable insights into the underlying factors governing material behavior. By analyzing the intricate relationships between chemical compositions and material properties, researchers will gain a deeper understanding of how to engineer materials with desired characteristics. Such insights will pave the way for innovative material formulations and applications that meet the evolving demands of technology.

Moreover, the integration of data analytics with materials science is likely to foster interdisciplinary collaboration among researchers from diverse fields. This collaborative environment is essential for driving innovation and advancing materials discovery, as it encourages the exchange of knowledge and expertise. Overall, the proposed system is expected to significantly enhance predictive capabilities in materials science, ultimately leading to breakthroughs in renewable energy technologies, electronic materials, and sustainable resource development. By bridging the gap between computational predictions and experimental validation, this system aims to make a meaningful and lasting impact on the future of materials science research.

V. CONCLUSION

The proposed system represents a significant advance in materials science: integrating machine learning models with large materials datasets enables rapid and accurate prediction of material properties. This data-driven approach has the potential to transform the traditional trial-and-error approach to materials discovery into an accelerated, computationally guided process. By leveraging machine learning techniques such as Kernel Ridge Regression, Random Forest, and Neural Networks, combined with physics-based insights, the system is capable of predicting key material properties with high accuracy and reliability. The expected results indicate a substantial reduction in the time and cost of discovering new materials, which has far-reaching implications for industries focused on clean energy, electronics, and sustainable materials development. As the system is further refined and validated, it will contribute to faster innovation cycles and a deeper understanding of material behavior, ultimately driving scientific and technological progress in materials science.

REFERENCES

[1]. Rajan, Krishna. "Materials informatics: The materials “gene” and big data." Annual Review of Materials Research 45.1 (2015): 153-169.
[2]. Agrawal, Ankit, and Alok Choudhary. "Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science." APL Materials 4.5 (2016).
[3]. Curtarolo, Stefano, et al. "The high-throughput highway to computational materials design." Nature Materials 12.3 (2013): 191-201.
[4]. Liu, Yue, et al. "Materials discovery and design using machine learning." Journal of Materiomics 3.3 (2017): 159-177.
[5]. Jain, Anubhav, et al. "Commentary: The Materials Project: A materials genome approach to accelerating materials innovation." APL Materials 1.1 (2013).
[6]. Butler, Keith T., et al. "Machine learning for molecular and materials science." Nature 559.7715 (2018): 547-555.
[7]. Ward, Logan, et al. "A general-purpose machine learning framework for predicting properties of inorganic materials." npj Computational Materials 2.1 (2016): 1-7.
[8]. Meredig, Bryce, et al. "Combinatorial screening for new materials in unconstrained composition space with machine learning." Physical Review B 89.9 (2014): 094104.
[9]. Gaultois, Michael W., et al. "Perspective: Web-based machine learning models for real-time screening of thermoelectric materials properties." APL Materials 4.5 (2016).
[10]. Ulissi, Zachary W., et al. "To address surface reaction network complexity using scaling relations machine learning and DFT calculations." Nature Communications 8.1 (2017): 14621.
[11]. Raccuglia, Paul, et al. "Machine-learning-assisted materials discovery using failed experiments." Nature 533.7601 (2016): 73-76.
[12]. Pilania, Ghanshyam, et al. "Accelerating materials property predictions using machine learning." Scientific Reports 3.1 (2013): 2810.
[13]. Ong, Shyue Ping, et al. "Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis." Computational Materials Science 68 (2013): 314-319.
[14]. Curtarolo, Stefano, et al. "AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations." Computational Materials Science 58 (2012): 227-235.
[15]. Kirklin, Scott, et al. "The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies." npj Computational Materials 1.1 (2015): 1-15.
[16]. Ghiringhelli, Luca M., et al. "Big data of materials science: critical role of the descriptor." Physical Review Letters 114.10 (2015): 105503.
[17]. Ward, Logan, and Chris Wolverton. "Atomistic calculations and materials informatics: A review." Current Opinion in Solid State and Materials Science 21.3 (2017): 167-176.
[18]. Greeley, Jeff, Jens K. Nørskov, and Manos Mavrikakis. "Electronic structure and catalysis on metal surfaces." Annual Review of Physical Chemistry 53.1 (2002): 319-348.
[19]. Johannesson, Gisli Holmar, et al. "Combined electronic structure and evolutionary search approach to materials design." Physical Review Letters 88.25 (2002): 255506.
[20]. Balachandran, Prasanna V., Scott R. Broderick, and Krishna Rajan. "Identifying the ‘inorganic gene’ for high-temperature piezoelectric perovskites through statistical learning." Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 467.2132 (2011): 2271-2290.
[21]. Long, Zhilin, et al. "A new criterion for predicting the glass-forming ability of bulk metallic glasses." Journal of Alloys and Compounds 475.1-2 (2009): 207-219.
[22]. Rupp, Matthias, et al. "Fast and accurate modeling of molecular atomization energies with machine learning." Physical Review Letters 108.5 (2012): 058301.
[23]. Pilania, Ghanshyam, et al. "Accelerating materials property predictions using machine learning." Scientific Reports 3.1 (2013): 2810.
[24]. Ghiringhelli, Luca M., et al. "Big data of materials science: critical role of the descriptor." Physical Review Letters 114.10 (2015): 105503.
[25]. Ward, Logan, et al. "A general-purpose machine learning framework for predicting properties of inorganic materials." npj Computational Materials 2.1 (2016): 1-7.
[26]. Olivares-Amaya, Roberto, et al. "Accelerated computational discovery of high-performance materials for organic photovoltaics by means of cheminformatics." Energy & Environmental Science 4.12 (2011): 4849-4861.
[27]. Sharma, Vinit, Chenchen Wang, Robert G. Lorenzini, Rui Ma, Qiang Zhu, Daniel W. Sinkovits, Ghanshyam Pilania, et al. "Rational design of all organic polymer dielectrics." Nature Communications 5.1 (2014): 4845.
[28]. Lookman, Turab, Francis J. Alexander, and Krishna Rajan, eds. Information Science for Materials Discovery and Design. Vol. 1. Switzerland: Springer International Publishing, 2016.
[29]. Ramakrishna, S., et al. "Materials informatics." Journal of Intelligent Manufacturing 29.6 (2018): 1-20.
[30]. Curtarolo, Stefano, et al. "Predicting crystal structures with data mining of quantum calculations." Physical Review Letters 91.13 (2003): 135503.
[31]. Hautier, Geoffroy, et al. "Finding nature’s missing ternary oxide compounds using machine learning and density functional theory." Chemistry of Materials 22.12 (2010): 3762-3767.
[32]. Behler, Jörg, and Michele Parrinello. "Generalized neural-network representation of high-dimensional potential-energy surfaces." Physical Review Letters 98.14 (2007): 146401.
[33]. Sabin, T. J., C. A. L. Bailer-Jones, and P. J. Withers. "Accelerated learning using Gaussian process models to predict static recrystallization in an Al-Mg alloy." Modelling and Simulation in Materials Science and Engineering 8.5 (2000): 687.
[34]. Nørskov, Jens Kehlet, et al. "Towards the computational design of solid catalysts." Nature Chemistry 1.1 (2009): 37-46.
[35]. Balachandran, Prasanna V., et al. "Adaptive strategies for materials design using uncertainties." Scientific Reports 6.1 (2016): 19660.
[36]. Hautier, Geoffroy, et al. "Data mined ionic substitutions for the discovery of new compounds." Inorganic Chemistry 50.2 (2011): 656-663.
[37]. Isayev, Olexandr, et al. "Materials cartography: representing and mining materials space using structural and electronic fingerprints." Chemistry of Materials 27.3 (2015): 735-743.
[38]. Saal, James E., et al. "Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD)." JOM 65 (2013): 1501-1509.
[39]. Fujimura, Koji, Atsuto Seko, Yukinori Koyama, Akihide Kuwabara, Ippei Kishida, Kazuki Shitara, Craig A. J. Fisher, Hiroki Moriwake, and Isao Tanaka. "Accelerated materials design of lithium superionic conductors based on first-principles calculations and machine learning algorithms." Advanced Energy Materials 3.8 (2013): 980-985.
[40]. Srinivasan, Srikant, et al. "Mapping chemical selection pathways for designing multicomponent alloys: an informatics framework for materials design." Scientific Reports 5.1 (2015): 17960.
[41]. Fischer, Christopher C., Kevin J. Tibbetts, Dane Morgan, and Gerbrand Ceder. "Predicting crystal structure by merging data mining with quantum mechanics." Nature Materials 5.8 (2006): 641-646.
[42]. Bligaard, Thomas, Jens Kehlet Nørskov, Søren Dahl, J. Matthiesen, Claus H. Christensen, and J. Sehested. "The Brønsted–Evans–Polanyi relation and the volcano curve in heterogeneous catalysis." Journal of Catalysis 224.1 (2004): 206-217.
[43]. Klanner, Catharina, David Farrusseng, Laurent Baumes, Mourad Lengliz, Claude Mirodatos, and Ferdi Schüth. "The development of descriptors for solids: teaching “catalytic intuition” to a computer." Angewandte Chemie 116.40 (2004): 5461-5463.
[44]. Saad, Yousef, et al. "Data mining for materials: Computational experiments with AB compounds." Physical Review B 85.10 (2012): 104104.
[45]. Meredig, Bryce, and C. Wolverton. "A hybrid computational–experimental approach for automated crystal structure solution." Nature Materials 12.2 (2013): 123-127.
[46]. Hautier, Geoffroy, et al. "Identification and design principles of low hole effective mass p-type transparent conducting oxides." Nature Communications 4.1 (2013): 2292.
[47]. Greeley, Jeff, et al. "Computational high-throughput screening of electrocatalytic materials for hydrogen evolution." Nature Materials 5.11 (2006): 909-913.
[48]. Mounet, Nicolas, and Nicola Marzari. "First-principles determination of the structural, vibrational and thermodynamic properties of diamond, graphite, and derivatives." Physical Review B 71.20 (2005): 205214.
[49]. Ashby, M. F. "Multi-objective optimization in material design and selection." Acta Materialia 48.1 (2000): 359-369.
[50]. Franceschetti, Alberto, and Alex Zunger. "The inverse band-structure problem of finding an atomic configuration with given electronic properties." Nature 402.6757 (1999): 60-63.