Big Semantic Data Processing in the Materials Design Domain

Patrick Lambrix, Rickard Armiento, Anna Delin, Huanyu Li

Definitions

To speed up progress in the field of materials design, a number of challenges related to big data need to be addressed. This entry discusses these challenges and shows the semantic technologies that alleviate the problems related to Variety, Variability and Veracity.

Overview

Materials design and materials informatics are central for technological progress, not least in the green engineering domain. Many traditional materials contain toxic or critical raw materials, whose use should be avoided or eliminated. Also, there is an urgent need to develop new environmentally friendly energy technology. Presently, relevant examples of materials design challenges include energy storage, solar cells, thermoelectrics, and magnetic transport (Ceder and Persson (2013); Jain et al (2013); Curtarolo et al (2013)).

The space of potentially useful materials yet to be discovered, the so-called 'chemical white space', is immense. The possible combinations of, say, up to six different elements constitute many billions. The space is further extended by possibilities of different phases, low-dimensional systems, nanostructuring, and so forth, which adds several orders of magnitude. This space was traditionally explored by experimental techniques, i.e., materials synthesis and subsequent experimental characterization. Parsing and searching the full space of possibilities this way is, however, hardly practical. Recent advances in condensed matter theory and materials modeling make it possible to generate reliable materials data by means of computer simulations based

on quantum mechanics (Lejaeghere et al (2016)). High-throughput simulations combined with machine learning can speed up progress significantly and also help to break out of local optima in composition space to reveal unexpected solutions and new chemistries (Gaultois et al (2016)).

This development has led to a global effort, known as the Materials Genome Initiative (https://www.mgi.gov/), to assemble and curate databases that combine experimentally known and computationally predicted materials properties. A central idea is that materials design challenges can be addressed by searching these databases for entries with desired combinations of properties. Beyond such searches, these data sources also open the way for materials informatics, i.e., the use of big data methodology and data mining techniques to discover new physics from the data itself. A workflow for such a discovery process can be based on a typical data mining process, where key factors are identified, reduced and extracted from heterogeneous databases, similar materials are identified by modeling and relationship mining, and properties are predicted through evaluation and understanding of the results from the data mining techniques (Agrawal and Alok (2016)). The use of the data in such a workflow requires addressing problems with data integration, provenance, and semantics, which remains an active field of research.

Even when a new material has been invented and synthesized in a lab, much work remains before it can be deployed. Production methods that allow the material to be manufactured at large scale in a cost-effective manner need to be developed, and integration of the material into production must be realized. Furthermore, life-cycle aspects of the material need to be assessed. Today, this post-invention process typically takes about two decades (Mulholland and Paradiso (2016); Jain et al (2013)). Shortening this time is in itself an important strategic goal, which could be realized with the help of an integrated informatics approach (Jain et al (2013), Materials Genome Initiative https://www.mgi.gov/).

To summarize, it is clear that materials data, experimental as well as simulated, has the potential to speed up progress significantly in many steps in the chain from materials discovery all the way to marketable product. However, the data needs to be suitably organized and easily accessible, which in practice is highly nontrivial to achieve. It will require a multidisciplinary effort, and the various conventions and norms in use need to be integrated. Materials data is highly heterogeneous and much of it is currently hidden behind corporate walls (Mulholland and Paradiso (2016)).

Big Data Challenges

To implement the data-driven materials design workflow, we need to deal with several of the big data properties (e.g. Rajan (2015)).

Volume refers to the quantity of the generated and stored data. The size of the data determines the value and potential insight. Although experimental materials science does not generate huge amounts of data, computer simulations with accuracy comparable to
experiments can. Moreover, going from state-of-the-art static simulations at temperature T = 0 K towards realistic descriptions of materials properties at the operating temperatures of devices and tools will raise these amounts as well.

Variety refers to the type and nature of the data. The materials databases are heterogeneous in different ways. They store different kinds of data and in different formats. Some databases contain information about materials crystal structure, some about their thermochemistry, others about mechanical properties. Moreover, different properties may have the same names, while the same information may be represented differently in different databases.

Velocity refers to the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. In computational materials science new data is generated continuously, by a large number of groups all over the world. In principle, one can store summary results and data streams from a specific run as long as one needs (days, weeks, years) and analyze them afterwards. However, storing all the data indefinitely may be a challenge. Some data needs to be removed as the storage capacity is limited.

Variability deals with the consistency of the data. Inconsistency of the data set can hamper processes to handle and manage it. This can occur for single databases as well as for data that was integrated from different sources.

Veracity deals with the quality of the data. This can vary greatly, affecting accurate analysis. The data generated within materials science may contain errors, and it is often noisy. The quality of the data is different in different databases. It may be challenging to have provenance information from which one can derive the data quality. Not all the computed data is confirmed by lab experiments. Some data is generated by machine learning and data mining algorithms.

Sources of data and Semantic Technologies

Although the majority of materials data that has been produced by measurement or through predictive computation has not yet been organized in general easy-to-use databases, several sizable databases and repositories do exist. However, as they are heterogeneous in nature, semantic technologies are important for the selection and integration of the data to be used in the materials design workflow. This is particularly important for dealing with Variety, Variability and Veracity.

Within this field the use of semantic technologies is in its infancy, with ontologies and standards under development. Ontologies aim to define the basic terms and relations of a domain of interest, as well as the rules for combining these terms and relations. They standardize terminology in a domain and are a basis for semantically enriching data, integration of data from different databases (Variety), and reasoning over the data (Variability and Veracity). According to Zhang et al (2015a), in the materials domain ontologies have been used to organize materials knowledge in a formal language, as a global conceptualization for materials information integration (e.g. Cheng et al (2014)), for linked materials data publishing, for inference support for discovering new materials and for semantic query support (e.g., Zhang et al (2015b, 2017)).

Further, standards for exporting data from databases and between tools are being developed. These standards provide a way to exchange data between databases and tools, even if the internal representations of the data in the databases and tools are different. They are a prerequisite for efficient materials data infrastructures that allow for the discovery of new materials (Austin (2016)). In several cases the standards formalize the description of materials knowledge (and thereby create ontological knowledge).

In the remainder of this section a brief overview of databases, ontologies and standards in the field is given.

Databases

The Inorganic Crystal Structure Database (ICSD, https://icsd.fiz-karlsruhe.de/) is a frequently utilized database for completely identified inorganic crystal structures, with nearly 200k entries (Belsky et al (2002); Bergerhoff et al (1983)). The data contained in ICSD serve as an important starting point in many electronic structure calculations. Several other crystallographic information resources are also available (Glasser (2016)). A popular open access resource is the Crystallography Open Database (COD, http://www.crystallography.net/cod/) with nearly 400k entries (Grazulis et al (2012)).

At the International Centre for Diffraction Data (ICDD, http://www.icdd.com/) a number of databases for phase identification are hosted. These databases have been in use by experimentalists for a long time.

Springer Materials (http://materials.springer.com/) contains, among many other data sources, the well-known Landolt-Börnstein database, an extensive data collection from many areas of physical sciences and engineering. Similarly, the Japan National Institute for Materials Science (NIMS) Materials Database MatNavi (http://mits.nims.go.jp/index_en.html) contains a wide collection of mostly experimental but also some computational electronic structure data.

Thermodynamic data, necessary for computing phase diagrams with the CALPHAD method, exist in many different databases (Campbell et al (2014)). Open access databases with relevant data can be found through OpenCalphad (http://www.opencalphad.com/databases.html).

Databases of results from electronic structure calculations have existed in some form for several decades. In 1978, Moruzzi, Janak, and Williams published a book with computed electronic properties, e.g., density of states, bulk modulus and cohesive energy, of all metals (Moruzzi et al (2013)). Only recently, however, has the use of such databases become widespread, and some of these databases have grown to a substantial size.

Among the more recent efforts to make materials properties obtained from electronic structure calculations publicly available, a few prominent examples include the Electronic Structure Project (ESP) (http://materialsgenome.se)
with ca 60k electronic structure results, Aflow (Curtarolo et al (2012), http://aflowlib.org/) with data on over 1.7 million compounds, the Materials Project with data on nearly 70k inorganic compounds (Jain et al (2013), https://materialsproject.org/), the Open Quantum Materials Database (OQMD, http://oqmd.org/) with over 470k entries (Saal et al (2013)), and the NOMAD repository with 44 million electronic structure calculations (https://repository.nomad-coe.eu/). Also available is the Predicted Crystallography Open Database (PCOD, http://www.crystallography.net/pcod/) with over 1 million predicted crystal structures, which is a project closely related to COD.

As the amount of computed data grows, the need for informatics infrastructure also increases. Many of the databases discussed above have made their frameworks available; well-known examples include the ones by Materials Project and OQMD. Other publicly available frameworks used in publications for materials design and informatics include the Automated Interactive Infrastructure and Database for Computational Science (AiiDA, http://www.aiida.net/) (Pizzi et al (2016)), the Atomic Simulation Environment (ASE, https://wiki.fysik.dtu.dk/ase/) (Larsen et al (2017)), and the high-throughput toolkit (httk, http://www.httk.org) (Faber et al (2016)).

Ontologies

We introduce the features of current materials ontologies from a materials perspective (Table 1) and a knowledge representation perspective (Table 2), respectively.

Most ontologies focus on specific sub-domains of the materials field (Domain in Table 1) and have been developed with a specific use in mind (Application Scenario in Table 1). The Materials Ontology in Ashino (2010) was designed for data exchange among thermal property databases. Other ontologies were built to enable knowledge-guided materials design or new materials discovery, such as the PREMΛP ontology (Bhat et al (2013)) for steel mill products, the MatOnto ontology (Cheung et al (2008)) for oxygen ion conducting materials in the fuel cell domain, and the SLACKS ontology (Premkumar et al (2014)), which integrates relevant product life cycle domains consisting of engineering analysis and design, materials selection and manufacturing. The FreeClassOWL ontology (Radinger et al (2013)) is designed for the construction and building materials domain and supports semantic search for construction materials. The MMOY ontology (Zhang et al (2016)) captures metal materials knowledge from Yago. The ontology design pattern in Vardeman et al (2017) models and allows for reasoning about material transformations, with the production of carbon dioxide and sodium acetate by combining baking soda and vinegar as a use case. Some ontologies are generated (Data Source in Table 1) by extracting knowledge from other data resources, such as the Plinius ontology (van der Vet et al (1994)), which is extracted from 300 publication abstracts in the domain of ceramic materials, and MatOWL (Zhang et al (2009)), which is extracted from MatML schema data to enable ontology-based data access. The ontologies may also use other ontologies as a basis; for instance, MatOnto uses DOLCE (Gangemi et al (2002)) and EXPO (Soldatova and King (2006)).

From the knowledge representation perspective (Table 2), the basic terms defined in materials ontologies involve materials, properties, performance, and processing in specific sub-domains. The number of concepts ranges from a few to several thousands. There are relatively few relationships and most ontologies have instances. Almost all ontologies use OWL as a representation language. In terms of organization of materials ontologies, Ashino's Materials Ontology, MatOnto, and the PREMΛP ontology are developed as several ontology components that are integrated in one ontology. In Table 2 this is denoted in the Modularity column.

Standards

There are currently few standards in this domain. Early efforts, including ISO standards and MatML, achieved limited adoption according to Austin (2016). The standard ISO 10303-45 includes an information model for materials properties. It provides schemas for material properties, chemical compositions and measure values (Swindells (2009)). ISO 10303-235 includes an information model for product design and verification. MatML (Kaufman and Begley (2003), https://www.matml.org/) is an XML-based markup language for materials property data which includes schemas for such things as materials properties, composition, heat, and production.

Some other standards that have received more attention are, e.g., ThermoML and CML. ThermoML (Frenkel et al (2006, 2011)) is an XML-based markup language for exchange of thermophysical and thermochemical property data. It covers over 120 properties regarding thermodynamic and transport property data for pure compounds, multicomponent mixtures, and chemical reactions. CML or Chemical Markup Language (Murray-Rust and Rzepa (2011); Murray-Rust et al (2011)) covers chemistry and especially molecules, reactions, solid-state, computation and spectroscopy. It is an extensible language that allows for the creation of sub-domains through the convention construct. Further, the dictionaries construct allows for connecting CML elements to dictionaries (or ontologies). This was inspired by the approach of the Crystallographic Information Framework or CIF (Bernstein et al (2016), http://www.iucr.

The European Committee for Standardization (CEN) organized workshops on standards for materials engineering data (Austin (2016)), the results of which are documented in CEN (2010). The work focuses specifically on ambient temperature tensile testing and developed schemas as well as an ontology (the ELSSI-EMD ontology listed in Table 1).

Another recent approach is connected to the European Centre of Excellence NOMAD (Ghiringhelli et al (2016)). The NOMAD repository's (https://repository.nomad-coe.eu/) metadata structure is formatted to be
independent of the electronic-structure theory or molecular-simulation code that was used to generate the data and can thus be used as an exchange format.

Table 1 Comparison of materials ontologies from a materials perspective

Materials ontology | Data Source | Domain | Application Scenario
--- | --- | --- | ---
Ashino's Materials Ontology, Ashino (2010) | Thermal property databases | Thermal properties | Data exchange, search
Plinius ontology, van der Vet et al (1994) | Publication abstracts | Ceramics | Knowledge extraction
MatOnto, Cheung et al (2008) | DOLCE ontology [1], EXPO ontology [2] | Crystals | New materials discovery
PREMΛP ontology, Bhat et al (2013) | PREMΛP platform | Materials | Knowledge-guided design
FreeClassOWL, Radinger et al (2013) | Eurobau data [3], GoodRelations ontology [4] | Construction and building materials | Semantic query support
MatOWL, Zhang et al (2009) | MatML schema data | Materials | Semantic query support
MMOY, Zhang et al (2016) | Yago data | Metals | Knowledge extraction
ELSSI-EMD ontology, CEN (2010) | Materials testing data from ISO standards | Materials testing, ambient temperature tensile testing | Data interoperability
SLACKS ontology, Premkumar et al (2014) | Ashino's Materials Ontology, MatOnto | Laminated composites | Knowledge-guided design

[1] DOLCE stands for Descriptive Ontology for Linguistic and Cognitive Engineering.
[2] EXPO ontology is used to describe scientific experiments.
[3] Eurobau.com compiles construction materials data from ten European countries.
[4] GoodRelations ontology (Hepp (2008)) is used for e-commerce with concepts such as business entities and prices.

Table 2 Comparison of materials ontologies from a knowledge representation perspective

Materials ontology | Ontology Metrics | Language | Modularity
--- | --- | --- | ---
Ashino's Materials Ontology, Ashino (2010) | 606 concepts, 31 relationships, 488 instances | | yes
Plinius ontology, van der Vet et al (1994) | 17 concepts, 4 relationships, 119 instances [1] | Ontolingua code |
MatOnto, Cheung et al (2008) | 78 concepts, 10 relationships, 24 instances | | yes
PREMΛP ontology, Bhat et al (2013) | 62 concepts | UML | yes
FreeClassOWL, Radinger et al (2013) | 5714 concepts, 225 relationships, 1469 instances | |
MatOWL, Zhang et al (2009) | (not available) | OWL |
MMOY, Zhang et al (2016) | 544 metal concepts, 1781 related concepts, 9 relationships, 318 metal instances, 1420 related instances | OWL |
ELSSI-EMD ontology, CEN (2010) | 35 concepts, 37 relationships, 33 instances | |
SLACKS ontology, Premkumar et al (2014) | at least 34 concepts and 10 relationships [2] | OWL |

[1] 103 instances out of 119 are elements in the periodic system.
[2] The numbers are based on the high-level class diagram and an illustration of instances' integration in SLACKS shown in Premkumar et al (2014).

Conclusion

The use of materials data in a materials design workflow requires addressing several big data problems including Variety, Variability and Veracity. Semantic technologies are a key factor in tackling some of these problems. Efforts to create materials databases, ontologies and standards have started. However, much work remains to be done. To make full use of these resources there is a need for integration of different kinds of resources, and reasoning capabilities should be used, as in the bioinformatics field in the 1990s (Lambrix et al (2009)). Databases could use ontologies to define their schemas and enable ontology-based querying. Integration of databases is enabled by the use of ontologies. However, when databases have used different ontologies, alignments between different ontologies are needed as well (Euzenat and Shvaiko (2007)). Further, more effort should be put into connecting ontologies and standards (as started in the CML, CEN and NOMAD approaches), which may also lead to connections between different standards. Reasoning can be used in different ways. When developing resources, reasoning can help in debugging and completing the resources, leading to higher quality resources (Ivanova and Lambrix (2013)). Reasoning can also
be used during querying of databases as well as in the process of connecting different resources.

References

Agrawal A, Alok C (2016) Perspective: materials informatics and big data: realization of the Fourth paradigm of science in materials science. APL Mater 4:053208:1–10, DOI 10.1063/1.4946894

Ashino T (2010) Materials Ontology: An Infrastructure for Exchanging Materials Information and Knowledge. Data Science Journal 9:54–61, DOI 10.2481/dsj.008-041

Austin T (2016) Towards a digital infrastructure for engineering materials data. Materials Discovery 3:1–12, DOI 10.1016/j.md.2015.12.003

Belsky A, Hellenbrandt M, Karen VL, Luksch P (2002) New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallographica Section B: Structural Science 58(3):364–369, DOI 10.1107/S0108768102006948

Bergerhoff G, Hundt R, Sievers R, Brown ID (1983) The inorganic crystal structure data base. Journal of Chemical Information and Computer Sciences 23(2):66–69, DOI 10.1021/ci00038a003

Bernstein HJ, Bollinger JC, Brown ID, Grazulis S, Hester JR, McMahon B, Spadaccini N, Westbrook JD, Westrip SP (2016) Specification of the crystallographic information file format, version 2.0. J Appl Cryst 49:277–284, DOI 10.1107/S1600576715021871

Bhat M, Shah S, Das P, Reddy S (2013) PREMΛP: knowledge driven design of materials and engineering process. In: ICoRD'13, Springer, pp 1315–1329, DOI 10.1007/978-81-322-1050-4_105

Campbell CE, Kattner UR, Liu ZK (2014) File and data repositories for Next Generation CALPHAD. Scripta Materialia 70(Supplement C):7–11, DOI 10.1016/j.scriptamat.2013.06.013

Ceder G, Persson KA (2013) How Supercomputers Will Yield a Golden Age of Materials Science. Scientific American 309

CEN (2010) A guide to the development and use of standards compliant data formats for engineering materials test data. European Committee for Standardization

Cheng X, Hu C, Li Y (2014) A semantic-driven knowledge representation model for the materials engineering application. Data Science Journal 13:26–44, DOI 10.2481/dsj.

Cheung K, Drennan J, Hunter J (2008) Towards an Ontology for Data-driven Discovery of New Materials. In: McGuinness D, Fox P, Brodaric B (eds) Semantic Scientific Knowledge Integration AAAI/SSS Workshop, pp 9–14

Curtarolo S, Setyawan W, Wang S, Xue J, Yang K, Taylor R, Nelson L, Hart G, Sanvito S, Buongiorno-Nardelli M, Mingo N, Levy O (2012) AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations. Computational Materials Science 58(Supplement C):227–235, DOI 10.1016/j.commatsci.2012.02.002

Curtarolo S, Hart G, Buongiorno-Nardelli M, Mingo N, Sanvito S, Levy O (2013) The high-throughput highway to computational materials design. Nature Materials 12(3):191, DOI 10.1038/nmat3568

Euzenat J, Shvaiko P (2007) Ontology Matching. Springer

Faber F, Lindmaa A, von Lilienfeld A, Armiento R (2016) Machine Learning Energies of 2 Million Elpasolite (ABC2D6) Crystals. Physical Review Letters 117(13):135502, DOI 10.1103/PhysRevLett.117.135502

Frenkel M, Chirico RD, Diky V, Dong Q, Marsh KN, Dymond JH, Wakeham WA, Stein SE, Königsberger E, Goodwin ARH (2006) XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML) (IUPAC Recommendations 2006). Pure Appl Chem 78:541–612, DOI 10.1351/pac200678030541

Frenkel M, Chirico RD, Diky V, Brown PL, Dymond JH, Goldberg RN, Goodwin ARH, Heerklotz H, Königsberger E, Ladbury JE, Marsh KN, Remeta DP, Stein SE, Wakeham WA, Williams PA (2011) Extension of ThermoML: The IUPAC standard for thermodynamic data communications (IUPAC Recommendations 2011). Pure Appl Chem 83:1937–1969, DOI 10.1351/PAC-REC-11-05-01

Gangemi A, Guarino N, Masolo C, Oltramari A, Schneider L (2002) Sweetening ontologies with DOLCE. Knowledge engineering and knowledge management: Ontologies and the semantic Web pp 223–233, DOI 10.1007/3-540-45810-7_18

Gaultois MW, Oliynyk AO, Mar A, Sparks TD, Mulholland GJ, Meredig B (2016) Perspective: Web-based machine learning models for real-time screening of thermoelectric materials properties. APL Materials 4(5):053213, DOI 10.1063/1.4952607

Ghiringhelli LM, Carbogno C, Levchenko S, Mohamed F, Huhs G, Lueders M, Oliveira M, Scheffler M (2016) Towards a Common Format for Computational Materials Science Data. PSI-K Scientific Highlights July

Glasser L (2016) Crystallographic Information Resources. Journal of Chemical Education 93(3):542–549, DOI 10.1021/acs.jchemed.5b00253

Grazulis S, Dazkevic A, Merkys A, Chateigner D, Lutterotti L, Quiros M, Serebryanaya NR, Moeck P, Downs RT, Le Bail A (2012) Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Research 40(Database issue):D420–D427, DOI 10.1093/nar/gkr900

Hepp M (2008) GoodRelations: An ontology for describing products and services offers on the web. Knowledge Engineering: Practice and Patterns pp 329–346, DOI 10.1007/978-3-540-87696-0_29

Ivanova V, Lambrix P (2013) A unified approach for debugging is-a structure and mappings in networked taxonomies. J Biomed Semant 4:10:1–10:19, DOI 10.1186/2041-1480-4-10

Jain A, Ong SP, Hautier G, Chen W, Richards WD, Dacek S, Cholia S, Gunter D, Skinner D, Ceder G, Persson KA (2013) Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 1(1):011002, DOI 10.1063/1.4812323

Kaufman JG, Begley EF (2003) MatML: A Data Interchange Markup Language. Advanced Materials And Processes 161:35–36

Lambrix P, Strömbäck L, Tan H (2009) Information Integration in Bioinformatics with Ontologies and Standards. In: Bry F, Maluszynski J (eds) Semantic Techniques for the Web, pp 343–376, DOI 10.1007/978-3-642-04581-3_8

Larsen AH, Mortensen JJ, Blomqvist J, Castelli IE, Christensen R, Dułak M, Friis J, Groves MN, Hammer B, Hargus C, Hermes ED, Jennings PC, Jensen PB, Kermode J, Kitchin JR, Kolsbjerg EL, Kubal J, Kaasbjerg K, Lysgaard S, Maronsson JB, Maxson T, Olsen T, Pastewka L, Peterson A, Rostgaard C, Schiøtz J, Schütt O, Strange M, Thygesen KS, Vegge T, Vilhelmsen L, Walter M, Zeng Z, Jacobsen KW (2017) The atomic simulation environment - a Python library for working with atoms. Journal of Physics: Condensed Matter 29(27):273002, DOI 10.1088/1361-648X/aa680e

Lejaeghere K, Bihlmayer G, Björkman T, Blaha P, Blügel S, Blum V, Caliste D, Castelli IE, Clark SJ, Corso AD, Gironcoli Sd, Deutsch T, Dewhurst JK, Marco ID, Draxl C, Dułak M, Eriksson O, Flores-Livas JA, Garrity KF, Genovese L, Giannozzi P, Giantomassi M, Goedecker S, Gonze X, Grånäs O, Gross EKU, Gulans A, Gygi F, Hamann DR, Hasnip PJ, Holzwarth NAW, Iuşan D, Jochym DB, Jollet F, Jones D, Kresse G, Koepernik K, Küçükbenli E, Kvashnin YO, Locht ILM, Lubeck S, Marsman M, Marzari N, Nitzsche U, Nordström L, Ozaki T, Paulatto L, Pickard CJ, Poelmans W, Probert MIJ, Refson K, Richter M, Rignanese GM, Saha S, Scheffler M, Schlipf M, Schwarz K, Sharma S, Tavazza F, Thunström P, Tkatchenko A, Torrent M, Vanderbilt D, van Setten MJ, Speybroeck VV, Wills JM, Yates JR, Zhang GX, Cottenier S (2016) Reproducibility in density functional theory calculations of solids. Science 351(6280):aad3000, DOI 10.1126/science.aad3000

Moruzzi VL, Janak JF, Williams AR (2013) Calculated electronic properties of metals. Pergamon Press, New York

Mulholland GJ, Paradiso SP (2016) Perspective: Materials informatics across the product lifecycle: Selection, manufacturing, and certification. APL Materials 4(5):053207, DOI 10.1063/1.4945422

Murray-Rust P, Rzepa HS (2011) CML: Evolution and design. Journal of Cheminformatics 3:44:1–44:15, DOI 10.1186/1758-2946-3-44

Murray-Rust P, Townsend JA, Adams SE, Phadungsukanan W, Thomas J (2011) The semantics of Chemical Markup Language (CML): dictionaries and conventions. Journal of Cheminformatics 3:43, DOI 10.1186/1758-2946-3-43

Pizzi G, Cepellotti A, Sabatini R, Marzari N, Kozinsky B (2016) AiiDA: automated interactive infrastructure and database for computational science. Computational Materials Science 111(Supplement C):218–230, DOI 10.1016/j.commatsci.2015.09.013

Premkumar V, Krishnamurty S, Wileden JC, Grosse IR (2014) A semantic knowledge management system for laminated composites. Advanced Engineering Informatics 28(1):91–101, DOI 10.1016/j.aei.2013.12.004

Radinger A, Rodriguez-Castro B, Stolz A, Hepp M (2013) BauDataWeb: the Austrian building and construction materials market as linked data. In: Proceedings of the 9th International Conference on Semantic Systems, ACM, pp 25–32, DOI 10.1145/2506182.2506186

Rajan K (2015) Materials Informatics: The Materials Gene and Big Data. Annu Rev Mater Res 45:153–169, DOI 10.1146/

Saal JE, Kirklin S, Aykol M, Meredig B, Wolverton C (2013) Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD). JOM 65(11):1501–1509, DOI 10.1007/

Soldatova LN, King RD (2006) An ontology of scientific experiments. J R Soc Interface 3(11):795–803, DOI 10.1098/rsif.2006.

Swindells N (2009) The representation and exchange of material and other engineering properties. Data Science Journal 8:190–200, DOI 10.2481/dsj.008-007

van der Vet P, Speel PH, Mars N (1994) The Plinius ontology of ceramic materials. In: Mars N (ed) Workshop Notes ECAI'94 Workshop Comparison of Implemented Ontologies, pp 187–205

Vardeman C, Krisnadhi A, Cheatham M, Janowicz K, Ferguson H, Hitzler P, Buccellato A (2017) An Ontology Design Pattern and Its Use Case for Modeling Material Transformation. Semantic Web 8:719–731, DOI 10.3233/SW-160231

Zhang X, Hu C, Li H (2009) Semantic query on materials data based on mapping MatML to an OWL ontology. Data Science Journal 8:1–17, DOI 10.2481/dsj.8.1

Zhang X, Zhao C, Wang X (2015a) A survey on knowledge representation in materials science and engineering: An ontological perspective. Computers in Industry 73:8–22, DOI 10.1016/j.compind.2015.07.005

Zhang X, Pan D, Zhao C, Li K (2016) MMOY: Towards deriving a metallic materials ontology from Yago. Advanced Engineering Informatics 30:687–702, DOI 10.1016/j.aei.2016.09.002

Zhang X, Chen H, Ruan Y, Pan D, Zhao C (2017) MATVIZ: a semantic query and visualization approach for metallic materials data. International Journal of Web Information Systems 13:260–280, DOI 10.1108/IJWIS-11-2016-0065

Zhang Y, Luo X, Zhao Y, Zhang HC (2015b) An ontology-based knowledge framework for engineering material selection. Advanced Engineering Informatics 29:985–1000, DOI 10.1016/j.aei.2015.09.002
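
Illustrative sketch

The Variety and Veracity challenges discussed in this entry (the same property stored under different names and units in different databases, and the need to keep provenance) can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the method of any database or ontology cited here: the database names `db_a` and `db_b`, their field names, the shared vocabulary terms and all numerical values are hypothetical. A hand-written alignment table stands in for what an ontology alignment would provide.

```python
from dataclasses import dataclass

# Hypothetical shared vocabulary: term -> canonical unit (illustrative only).
CANONICAL_UNITS = {"bulk_modulus": "GPa", "formation_energy": "eV/atom"}


@dataclass(frozen=True)
class Mapping:
    term: str      # term in the shared vocabulary
    factor: float  # multiply the source value by this to get the canonical unit


# Hypothetical alignment: (database, local field name) -> shared term.
ALIGNMENT = {
    ("db_a", "B"): Mapping("bulk_modulus", 1.0),              # already in GPa
    ("db_b", "bulk_mod_kbar"): Mapping("bulk_modulus", 0.1),  # 1 kbar = 0.1 GPa
    ("db_a", "e_form"): Mapping("formation_energy", 1.0),     # already in eV/atom
    ("db_b", "dHf_meV"): Mapping("formation_energy", 0.001),  # meV/atom -> eV/atom
}


def normalize(source, record):
    """Translate one source record into shared terms and canonical units."""
    out = {"formula": record["formula"], "source": source}
    for field, value in record.items():
        mapping = ALIGNMENT.get((source, field))
        if mapping is not None:  # fields without an alignment are skipped
            out[mapping.term] = value * mapping.factor
    return out


def integrate(sources):
    """Merge normalized records by formula, keeping the source of each value."""
    merged = {}
    for source, records in sources.items():
        for record in records:
            norm = normalize(source, record)
            entry = merged.setdefault(norm["formula"], {})
            for term in CANONICAL_UNITS:
                if term in norm:
                    entry.setdefault(term, []).append((norm[term], source))
    return merged


if __name__ == "__main__":
    # Made-up example values, not real measurements or calculations.
    example = {
        "db_a": [{"formula": "MgO", "B": 160.0, "e_form": -3.0}],
        "db_b": [{"formula": "MgO", "bulk_mod_kbar": 1620.0, "dHf_meV": -2950.0}],
    }
    for formula, properties in integrate(example).items():
        print(formula, properties)
```

Keeping the originating database next to each value, rather than averaging the values, is a deliberate choice in this sketch: it preserves the provenance from which data quality might later be assessed.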

