Document 5
International Organization for Standardization (2012)
X. Hu, P. Balasubramaniam
Xu et al., 2019
Towards a systematic data harmonization to enable AI application in the process
industry
Comput. Chem. Eng., 21 (1997), pp. S71-S76, 10.1016/S0098-1354(97)87481-9
u
InTech (2008), 10.5772/68
It appears that the classes have an uneven distribution, which is unavoidable since
this data set is a representative cross-section of all components in a process
plant. In the context of this work, it is important to investigate to what extent
the unequal distribution of training data will affect the results of the
classification.
k
International Electrotechnical Commission, 2016. IEC 62424, Representation of
process control engineering – requests in P&I diagrams and data exchange between
P&ID tools and PCE-CAE tools. International Electrotechnical Commission, Geneva.
Fig. 4. GUI of the DEXPI-2-graph converter.
node2vec
and CONCAT concatenates the individually calculated aggregations of each node.
To better understand the modeling of plant topology by message passing GNNs, an
example is given in Fig. 7 that relates the aggregation of neighborhood information
to a snippet of a P&ID. The example shows the aggregation by a two-layer neural
network. Since the plant topology is to be learned, we focus in the following on
the equipment information, such as the classes of each component in the P&ID. Thus,
in a first step (k = 1), inferences can be made about the vessel based on the
information from the valve and the heat exchanger. In a second step (k = 2), the
information of a valve and a temperature sensor can be aggregated for the embedding
of the heat exchanger, while the valve's embedding is influenced by the connected
drive and flow control.
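The two aggregation steps can be illustrated by listing which nodes lie within k hops of the vessel. The following is a minimal sketch in plain Python; the adjacency list is an assumed reading of the P&ID snippet described above, and the node names (valve_a, valve_b, etc.) are hypothetical.

```python
from collections import deque

# Assumed topology of the P&ID snippet; node names are hypothetical.
adjacency = {
    "vessel": ["valve_a", "heat_exchanger"],
    "valve_a": ["vessel", "drive", "flow_control"],
    "heat_exchanger": ["vessel", "valve_b", "temperature_sensor"],
    "valve_b": ["heat_exchanger"],
    "temperature_sensor": ["heat_exchanger"],
    "drive": ["valve_a"],
    "flow_control": ["valve_a"],
}

def nodes_within(adjacency, start, k):
    """Return all nodes reachable from `start` in at most k hops (excluding start)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return {n for n, d in dist.items() if d > 0}

one_hop = nodes_within(adjacency, "vessel", 1)
two_hop = nodes_within(adjacency, "vessel", 2)
```

With two message passing layers, the embedding of the vessel can therefore depend on every node in `two_hop`, which includes the drive, the flow control and the temperature sensor.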
D.P. Kingma, J. Ba
Proceedings of the ICLR (2014), p. 2015
Cambridge University Press, Cambridge (2009), 10.1017/CBO9781139644150
The main diagonal shows that most components of each class are correctly
classified. For instance, all separation units are correctly classified. Process
control equipment (PCE, 96%) and piping components (93%) are also almost completely
correctly assigned. With over 87% prediction accuracy, valves and check valves are
also classified sufficiently well, although there is noticeable confusion between
valves and piping equipment and between valves and safety valves. However, at less
than 10%, this is still within tolerable limits. The classification of the
remaining components is much more difficult for the GNN: 39% of safety valves are
identified as piping equipment and 16% as standard valves. This is not surprising
since both classes are usually found in similar positions of a P&ID. Classification
of pumps (67%), vessels (55%) and heat exchangers (75%) is only moderately
satisfactory. All three classes are mainly misclassified as valves or PCEs, which
are particularly strongly represented in the data set according to Table 1. The GNN
models should therefore be further optimized in the future. One option is to
integrate the comparatively underrepresented classes more strongly into the
training by introducing weighting factors. Another is to use a larger k, which
would aggregate more information at the cost of a larger computational effort.
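One common way to obtain such weighting factors is inverse class frequency. The sketch below is an assumption for illustration and not the weighting used by the authors; it is restricted to the three class counts that are legible in this text (pumps 69, heat exchangers 86, safety valves 93), assuming these are the counts from Table 1.

```python
# Class counts legible in this text; assumed to come from Table 1.
counts = {"Pumps": 69, "Heat exchangers": 86, "Safety valves": 93}

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency heuristic: rarer classes receive larger weights.
weights = {name: total / (num_classes * n) for name, n in counts.items()}

for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

The resulting weights could then be passed to a weighted loss function so that errors on rare classes contribute more to the gradient.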
Question answering system for chemistry—A semantic agent extension
Wiedau et al., 2019
Inductive representation learning on large graphs
In the following, several P&IDs in the standardized DEXPI format are used as
training data, which were exported using the program PlantEngineer from the
software vendor X-Visual Technologies GmbH and converted to graphs in GraphML
format (GraphML Project Group, 2017) according to chapter 2.1. In total, 35 P&ID
graphs from third parties (laboratory and industrial plants) with 1641 nodes and
1410 edges are used. The data set contains 92 different equipment classes (valves,
pumps, vessels, instrumentation, etc.) based on the DEXPI specifications (Theißen
and Wiedau, 2021) and has three different classes of edges (pipes, signal lines,
process connection lines). The ratio of nodes to edges shows that, as expected for
P&IDs, these are very linear graphs with rather low connectivity. On closer
inspection, there are usually many single nodes along a pipeline (e.g. valves,
vessels, pumps, heat exchangers, measuring points, etc.), which form a kind of
dead end. Additionally, some P&IDs show inconsistencies in their drawn structures,
which in some cases lead to isolated nodes or several smaller graphs. However,
these inconsistencies were deliberately included in the data set, as the data is
intended to represent the current state of machine-readable P&IDs in the process
industry to obtain representative results. The influence of the inconsistencies on
the results is examined in more detail in chapter 4.
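The structural properties mentioned above (edge-to-node ratio, isolated nodes, separate subgraphs) can be checked with a few lines of networkx. This is a minimal sketch on a hypothetical toy graph, not on the actual data set; the node names are invented for illustration. For the reported data set, the ratio would be 1410 / 1641, i.e. about 0.86 edges per node.

```python
import networkx as nx

# Hypothetical toy P&ID graph with two pipe runs and one unconnected sensor.
G = nx.DiGraph()
G.add_edges_from([("V-1", "XV-2"), ("XV-2", "P-3"), ("HE-4", "XV-5")])
G.add_node("TI-6")  # isolated node, e.g. caused by a drawing inconsistency

edge_node_ratio = G.number_of_edges() / G.number_of_nodes()
isolated = list(nx.isolates(G))
components = list(nx.weakly_connected_components(G))

print(edge_node_ratio, isolated, len(components))
```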
Wiedau et al., 2021
x
into a one-hot vector, i.e., a vector consisting of zeros except for a single entry
that is set to one (Gulli and Pal, 2017). The sum of such vectors makes it possible
to determine exactly which one-hot vectors were used to generate it.
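A minimal sketch of this encoding, assuming NumPy and a hypothetical list of four equipment classes (the real data set uses the DEXPI classes):

```python
import numpy as np

# Hypothetical class list for illustration only.
classes = ["Valve", "Pump", "Vessel", "HeatExchanger"]
index = {name: i for i, name in enumerate(classes)}

def one_hot(name):
    """Return a vector of zeros with a single one at the class position."""
    vec = np.zeros(len(classes), dtype=int)
    vec[index[name]] = 1
    return vec

# Summing one-hot vectors keeps track of which classes were involved.
total = one_hot("Valve") + one_hot("Vessel")
used = [classes[i] for i in np.flatnonzero(total)]
```

Here `used` recovers the classes that contributed to the sum, which is the property the text refers to.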
Zhao et al., 2005
To make the structure of the P&ID available for further processing, the respective
DEXPI file of the P&ID is converted into a graph using Python. The plant topology
of the P&ID, including parameters relevant for modeling, is stored in a directed
graph in the form of a GraphML file (GraphML Project Group, 2017). A directed
graph, by definition, consists of a set N of nodes and a set E of edges. The edges
are directed, meaning that each edge is defined by an ordered pair of nodes (start
and end node) (Turau and Weyer, 2015). The P&ID to be processed is stored in the
DEXPI format, which is based on a Proteus XML schema (Proteus XML, 2017) and
contains three levels of information relevant for topology extraction: the
Equipment, which lists the components present in the P&ID; the
PipingNetworkSystem, which describes the piping system and the interconnections
between the various equipment; and the PipingComponents, which are components
embedded in the piping system, such as valves. The classes of equipment and
components are uniquely defined in DEXPI via the EquipmentClass and ComponentClass.
Furthermore, the InstrumentationFunction provides information about process control
equipment (PCE) and their connections. Fig. 2 shows the procedure used to convert
the P&ID via Python.
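The target data structure of this conversion can be sketched with networkx. The example below is an illustration under assumptions and not the DEXPI2graphML converter itself: the node IDs and the attribute names `equipment_class` and `connection_type` are hypothetical, and no DEXPI file is parsed.

```python
import networkx as nx

# Hypothetical miniature plant: vessel -> valve -> pump, plus one sensor.
G = nx.DiGraph()
G.add_node("V-101", equipment_class="Vessel")
G.add_node("XV-102", equipment_class="Valve")
G.add_node("P-103", equipment_class="Pump")
G.add_node("TI-104", equipment_class="ProcessControlEquipment")

# Each directed edge is an ordered pair (start node, end node).
G.add_edge("V-101", "XV-102", connection_type="pipe")
G.add_edge("XV-102", "P-103", connection_type="pipe")
G.add_edge("TI-104", "V-101", connection_type="process_connection_line")

# Serialize the directed graph, including attributes, as GraphML text.
graphml_text = "\n".join(nx.generate_graphml(G))
```

To store the result on disk, `nx.write_graphml(G, path)` can be used instead of `generate_graphml`.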
Oeing, 2022
The reduction of a graph to canonical form and the algebra which appears therein
The design and engineering of piping and instrumentation diagrams (P&ID) is a very
time-consuming and labor-intensive process. Although P&IDs show common patterns
that could be reused during development, the drawing is usually created manually
and built up from scratch for each process. The aim of this paper is to recognize
these patterns with the help of artificial intelligence (AI) and to make them
available for the development and the drawing process of P&IDs. In order to achieve
this, P&ID data is made accessible for AI applications through the DEXPI format,
which is a machine-readable, manufacturer-independent exchange standard for P&IDs.
It is demonstrated how deep learning models trained with DEXPI P&ID data can
support the engineering as well as drawing of P&IDs and therefore decrease labor
time and costs. This is achieved by assisted prediction of equipment in P&IDs based
on recurrent neural networks as well as consistency checks based on graph neural
networks.
Deep Learning with Keras: Implementing Deep Learning Models and Neural Networks
With the Power of Python
number of nodes
An agent-based environment for operational design
Oeing, J., 2022. DEXPI2graph converter application [WWW Document]. URL
https://github.com/TUDoAD/DEXPI2graphML (accessed 5.18.22).
Algorithmische Graphentheorie, De Gruyter Studium
M. Grabisch, J.-L. Marichal, R. Mesiar, E. Pap
Chem. Ing. Tech., 93 (2021), pp. 2105-2115, 10.1002/cite.202100203
ArXiv ID 1412.6980
Datasets
The workflow of the node classification is shown in Fig. 8. First, all nodes of all
P&ID graphs in the used dataset are divided into a training dataset (80%) and a
test dataset (20%) using a mask. The neural network is then provided with
information about the topology of the graph, as well as attributes of the nodes and
edges, e.g. equipment class, connection type, etc. From this information, the
network generates an embedding for each node and the predicted node class. This is
compared with the real node class and the error is reduced via backpropagation.
After the training is finished, the trained network can be used for node
classification of unseen data (nodes).
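The 80/20 split via a mask can be sketched as follows. This is a minimal NumPy example under assumptions: the random seed is arbitrary, and the node count of 1641 is the total reported for the data set.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
num_nodes = 1641                     # node count reported for the data set

# Shuffle node indices and mark the first 80% as training nodes.
perm = rng.permutation(num_nodes)
n_train = int(0.8 * num_nodes)

train_mask = np.zeros(num_nodes, dtype=bool)
train_mask[perm[:n_train]] = True
test_mask = ~train_mask  # the remaining 20% are held out for testing
```

During training, the loss is evaluated only on nodes where `train_mask` is true, while the whole graph topology remains available for message passing.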
Abstract
arbitrary node u of a graph
StellarGraph 2020
M. Schuster, K.K. Paliwal
2023, arXiv
N.K. Manaswi
Graph neural networks: a review of methods and applications
message of an aggregated neighborhood of a node in a graph
Pumps 69
Deep Learning with Applications Using Python
Schuster and Paliwal, 1997
The results show that RNNs are generally able to learn patterns in sequences from
P&ID graphs. It is noticeable that the SimpleRNN provides the best results with a
validation accuracy of 78.36%. When a prediction counts as correct if the true
equipment is among the five most likely predictions, the accuracy rises to 95.2%.
The BRNN reaches an accuracy of 94.39% under the same top-five criterion. The
LSTM and GRU have slightly lower accuracy, suggesting that the vanishing gradient
does not have a significant effect on the training for the short sequences
involved. At the same time, it should be noted that training the GRU took less
than one-third of the time of a SimpleRNN model. Given the current small
amount of data, this is not a decisive factor with the current setting. However,
should the training of the models be done in the future on large data sets or
continuously, it is recommended to give more attention to this aspect, as the use
of GRUs or LSTMs can save time and resources (Strubell et al., 2019), which should
be considered with respect to a sustainable process development.
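The difference between the plain accuracy and the accuracy over the five most likely predictions can be illustrated with a small top-k helper. This is a sketch with hypothetical probabilities for three classes, not the evaluation code used for the reported numbers.

```python
import numpy as np

def top_k_accuracy(probs, targets, k):
    """Fraction of samples whose true class is among the k most probable classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest entries
    hits = [t in row for t, row in zip(targets, top_k)]
    return float(np.mean(hits))

# Hypothetical predicted class probabilities for three samples.
probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
targets = np.array([0, 2, 2])

top1 = top_k_accuracy(probs, targets, k=1)
top2 = top_k_accuracy(probs, targets, k=2)
```

In this toy case the second sample is wrong under the top-1 criterion but correct under top-2, which mirrors why the top-five accuracy is considerably higher than the plain accuracy.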
Chollet, F., 2020. Keras API - documentation vers. 2.4.0 [WWW Document]. URL
https://keras.io (accessed 2.20.22).
C. Zhao, M. Bhushan, V. Venkatasubramanian
Long short-term memory
stands for the embedding of a node u at iteration step k. UPDATE and AGGREGATE are
arbitrary, differentiable functions, where the aggregation of the neighborhood N(u)
of node u represents the actual "message" m. The parameter k defines the number of
iterations over which the message passing proceeds and thus represents the number
of hidden layers of the GNN. Since the aggregation of the neighborhood information
must be independent of the order of the neighbors, it is important that AGGREGATE
is a permutation-invariant function. Based on the embedding for each iteration step k, a
final embedding for each node u can subsequently be determined using a final layer
(Hamilton, 2020).
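The interplay of AGGREGATE and UPDATE can be made concrete with a small NumPy sketch. It uses sum aggregation and a trivial UPDATE that adds the message to the previous embedding; both choices, the toy graph and the initial embeddings are assumptions for illustration, whereas a trained GNN would use learned weights and a nonlinearity.

```python
import numpy as np

# Hypothetical toy graph as adjacency lists and initial node embeddings.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

def aggregate(h, nbrs):
    """Sum over neighbor embeddings: permutation invariant by construction."""
    return h[nbrs].sum(axis=0)

def update(h_u, m_u):
    """Simplest possible UPDATE (assumption): add the message to the old embedding."""
    return h_u + m_u

def message_passing(h, neighbors, k):
    """Run k rounds of message passing, updating all nodes synchronously."""
    for _ in range(k):
        h = np.stack([update(h[u], aggregate(h, neighbors[u]))
                      for u in sorted(neighbors)])
    return h

h1 = message_passing(h0, neighbors, k=1)
h2 = message_passing(h0, neighbors, k=2)
```

Because the sum does not depend on the order of its arguments, `aggregate(h0, [1, 2])` and `aggregate(h0, [2, 1])` give the same message.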
Manaswi, 2018
Proceedings of the NIPS (2017), p. 17
M. Wiedau, L. von Wedel, H. Temmen, R. Welke, N. Papakonstantinou
Abstract
and applies it to the sum-MLP combination in Eq. (6) (Hamilton, 2020).
Sequential node prediction using recurrent neural networks
Original Article
V. Turau, C. Weyer
Strubell et al., 2019
Hamilton et al., 2017
Data augmentation for machine learning of chemical process flowsheets
AI-based suggestions can be used to speed up the process of drawing P&IDs. A
sequence of drawn and connected components is used to learn their typical order with the
help of an RNN. Recurrent neural networks are neural networks that can model time
series (or in this case linearly structured sequences) in training data based on
their structure (Hu and Balasubramaniam, 2008). Based on the learned correlations,
a node prediction is carried out, which returns the most probable subsequent P&ID
components based on an input sequence. The workflow of the modeling is explained in
more detail in the following.
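As a rough illustration of how a component sequence can be turned into training samples for next-component prediction, the sketch below builds (context, next component) pairs with a sliding window. The window length, the helper name `next_node_samples` and the example path are hypothetical; the sampling actually used for the models is described by the workflow in Fig. 5.

```python
def next_node_samples(sequence, window=3):
    """Build (context, next component) pairs from one component sequence."""
    samples = []
    for i in range(1, len(sequence)):
        context = sequence[max(0, i - window):i]  # at most `window` predecessors
        samples.append((context, sequence[i]))
    return samples

# Hypothetical path through a P&ID graph.
path = ["Vessel", "Valve", "Pump", "Valve", "HeatExchanger"]
samples = next_node_samples(path, window=2)

for context, target in samples:
    print(context, "->", target)
```

Each context would then be encoded numerically (for example as one-hot vectors) before being passed to the RNN.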
GraphML Project Group 2017
Fig. 5. Workflow of an RNN-based model for predicting subsequent equipment in
P&IDs.
Fig 9
The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported in
this paper.
v
S. Fillinger, H. Bonart, W. Welscher, E. Esche, J.-U. Repke
In the following, the different RNN models are trained with the P&ID graphs
generated in chapter 2.2 according to the presented workflow. The implementation
is done in Python using the Keras library (Chollet, 2020). The "Adam" optimizer
(Kingma and Ba, 2014) is used for all training runs, and the loss is calculated
with the "categorical cross entropy" (Murphy, 2012). The prediction
accuracy is used as an evaluation metric and is defined as follows.
Digital Chemical Engineering
Fig 8
Cho et al., 2014
Proceedings of the ICLR (2014), p. 2015
Zhou et al., 2020
Jonas Oeing a, Wolfgang Welscher b, Niclas Krink b,
Lars Jansen a, Fabian Henke a, Norbert Kockmann a
Gulli and Pal, 2017
MIT Press, Cambridge (2012)
Digital Chemical Engineering, Volume 2, 2022, Article 100010
Keywords
As mentioned before, the information from the P&ID is interpreted in the form of a
graph. This makes it possible to store the relationships between components and the
topology in an unambiguous and machine-interpretable way. However, to learn the
graph structure as a whole and to solve tasks such as node classification, edge
classification or link predictions, machine learning methods of graph analysis are
required that can deal with non-Euclidean data structures such as graphs. The
modeling of graph structures is particularly interesting in the field of P&ID
engineering. By learning connections (e.g. piping, signal lines, …) or components
(e.g. valves, equipment, …) based on their neighborhood with the help of AI, it
will be possible in the future to perform consistency checks in P&IDs and detect
errors in P&IDs. This could reduce the amount of time for drawing P&IDs, which will
shorten the time for developing a plant's documentation. To achieve this goal, Graph
Neural Networks (GNN), which have become increasingly important in recent years
(Zhou et al., 2020), can be used for modeling. A GNN is based on a message passing
algorithm that aggregates arbitrary information from the neighborhood of a node,
thereby convolving the graph
(Hamilton, 2020). In general, the message passing of a GNN is analogous to the
Weisfeiler-Lehman algorithm to test the isomorphism of two graphs (Weisfeiler and
Leman, 1968), which was introduced in 1968 and in which information is aggregated
from the neighborhood of each node.
Heat exchangers 86
Fig 3
E. Strubell, A. Ganesh, A. McCallum
IEEE Trans. Signal Process., 45 (1997), pp. 2673-2681, 10.1109/78.650093
Introduction
input
K. Xu, W. Hu, J. Leskovec, S. Jegelka
Grabisch et al., 2009
Hu, 2008
(4)
Acknowledgment
https://doi.org/10.1016/j.dche.2022.100038
GraphML Project Group, 2017. GraphML specification [WWW Document]. URL
http://graphml.graphdrawing.org/specification/dtd.html (accessed 2.10.22).
Mathematically, the message passing of a GNN (Eq. (2)) can be described as follows
(Hamilton, 2020):
International Organization for Standardization
© 2022 The Authors. Published by Elsevier Ltd on behalf of Institution of Chemical
Engineers (IChemE).
arbitrary node in the neighborhood of node u
(7)
International Organization for Standardization, 2013. ISO 15926-2 - industrial
automation systems and integration – integration of life-cycle data for process
plants including oil and gas production facilities – part 2: data model. Beuth
Verlag, Geneva.
List of symbols
The sequential node prediction can be divided into three parts and its workflow is
shown in Fig. 5.
The need for data-driven modeling and optimization is growing in the process
industry due to increasing digitalization of processes and tools. Previous work
already shows an acceleration of process development by agent-based environments,
which can be understood as the first steps of intelligent process development
(Batres et al., 1997). Further work shows the potential for using intelligent
process information models by describing chemical plants and processes in terms of
a machine-readable format (e.g. colored Petri nets), which enables accessibility
for deterministic algorithms (Zhao et al., 2005). This paper aims to develop
applications that accelerate the development of P&IDs using artificial intelligence
(AI). This requires an accelerated development and application of standardized and
machine-readable file exchange formats to ensure a sufficiently large and highly
available database for the application of AI in P&ID development. In the field of
P&IDs, the DEXPI (Data Exchange in Process Industry) standard is becoming
increasingly popular, as it enables the uniform description of P&IDs and ensures
the vendor-independent exchange of information (Theißen and Wiedau, 2021). At the
same time, DEXPI can serve as a platform for digital plant data in the process
industry (Wiedau et al., 2019), which can significantly
reduce the development time of chemical and biotechnological production plants.
Additionally, interoperability increases due to the continuous integration of DEXPI
into existing engineering software (Fillinger et al., 2017).
Fillinger et al., 2017
Fig 5
2022, arXiv
Digital Chemical Engineering, Volume 3, 2022, Article 100028
Elsevier
Chollet, 2020
Safety valves 93
Grover and Leskovec, 2016
Fig 2
output
Neural Comput., 9 (1997), pp. 1735-1780, 10.1162/neco.1997.9.8.1735
(6)
Abbreviations
A. Gulli, S. Pal
Fig. 10. Normalized confusion matrix of the recursive GNN with sum aggregation for
the test data set.
bias
Recurrent Neural Networks
Preprocessing – DEXPI-2-graph
TOWARDS AUTOMATIC GENERATION OF PIPING AND INSTRUMENTATION DIAGRAMS (P&IDS)
WITH ARTIFICIAL INTELLIGENCE
Murphy, 2012
Bahdanau et al., 2014
Fig. 6. Results of the training of following P&ID equipment with different RNN
models