Design, Modeling and Implementation of Digital Twins
4.1.1. Behavioral Models
These kinds of models specify the system behavior based on the physical process that the PO controls. They are mathematical or computational descriptions of how the variables of interest relate to each other, for example, how forces, acceleration, jerk, angular displacement, angular acceleration and other phenomena interact in the physical process. The main approaches to building behavioral models use either control-based or data-based techniques. Control techniques consist of observing a physical phenomenon, developing an understanding of it and expressing it in the form of mathematical equations. Understanding the physical process is hard and may not be possible in complex systems. However, this understanding provides tools to reason about the system behavior, which are useful for generalizing the models to similar problems and for bounding errors.
Data-based models, on the contrary, work as a black box and do not provide these advantages. Their strengths are that they are more flexible and can take new experimental data into account. They may also be more suitable for complex systems where understanding the physics is not possible, or for systems to which physics models cannot be applied, such as 5G networks. However, models obtained from data may be biased by the dataset, and their errors cannot be estimated [71].
Control Models: These are physics-based models, i.e., they use the laws of physics, expressed as mathematical models, and compare simulated results with known information [20,72]. Control is the core function of DTs built for CPS: it aims at maintaining the system at an acceptable level of operation in response to disturbances. In a CPS, the physical processes provide information to the cyber components, and the cyber components control the physical processes. To keep the two consistent, real-time data from the physical process are collected using sensors. These data are communicated to the cyber components, which use them to compute the control output; this output is then sent back to the actuators to correct the physical process.
A behavior model based on control theory can be obtained using transfer functions or state-space modeling. In this way, reality is modeled theoretically through representations that relate each possible input signal to the corresponding output signal. With this technique, the design process starts with the differential equations that model the behavior of the physical process being controlled; the transfer function is then derived from these differential equations. For linear time-invariant systems, transfer functions and state-space models are equivalent representations, i.e., one can be derived from the other and vice versa [73].
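The sense, compute and actuate loop described above can be illustrated with a short simulation. This is a minimal sketch, not a method from the cited works: it assumes a hypothetical first-order plant ($\tau \dot{y} = -y + Ku$, e.g., a heated tank) discretized with the forward Euler method and controlled by a proportional-integral (PI) controller. The function name, gains and plant parameters are illustrative choices.

```python
def simulate_pi_loop(setpoint=1.0, tau=2.0, gain=1.0, kp=2.0, ki=1.0,
                     dt=0.01, steps=3000):
    """Closed loop: sensor reads y, PI controller computes u, actuator applies u.

    Hypothetical plant: tau * dy/dt = -y + gain * u (first-order lag),
    discretized with the forward Euler method.
    """
    y = 0.0          # physical state (e.g., temperature), initially at rest
    integral = 0.0   # accumulated tracking error for the integral term
    history = []
    for _ in range(steps):
        measurement = y                   # sensor: sample the physical process
        error = setpoint - measurement    # cyber side: compare with the target
        integral += error * dt
        u = kp * error + ki * integral    # control output sent to the actuator
        y += dt * (-y + gain * u) / tau   # physical process responds to u
        history.append(y)
    return history


trajectory = simulate_pi_loop()
print(f"final output: {trajectory[-1]:.3f}")
```

With these values the closed-loop characteristic polynomial is $2s^2 + 3s + 1 = (2s + 1)(s + 1)$, so both poles are real and negative and the output settles at the set-point without oscillation; the integral term removes the steady-state error that a purely proportional controller would leave.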
The transfer function $G(s)$ is the ratio of the Laplace transform of the output, $Y(s)$, to the Laplace transform of the input, $U(s)$, expressed in the complex variable $s$ (with zero initial conditions). As shown in Equation (1), it is a ratio of two polynomials: the numerator is formed from the coefficients $b_i$ that multiply the input and its derivatives in the differential equation of the process, and the denominator from the coefficients $a_i$ that multiply the output and its derivatives.

$$G(s) = \frac{Y(s)}{U(s)} = \frac{\sum_{i=0}^{m} b_i s^{m-i}}{\sum_{i=0}^{n} a_i s^{n-i}} \qquad (1)$$

A transfer function with multiple inputs and multiple outputs is usually represented in matrix form, where each entry relates one input to one output of the system.
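To make Equation (1) and its equivalence with the state-space form concrete, the sketch below takes a hypothetical mass-spring-damper $m\ddot{y} + c\dot{y} + ky = u$, whose transfer function is $G(s) = 1/(ms^2 + cs + k)$. It builds the controllable canonical state-space realization from the coefficients $a_i$ and $b_i$ and checks numerically that $C(sI - A)^{-1}B$ reproduces $G(s)$. The numeric values and helper names are illustrative assumptions, and only strictly proper transfer functions ($m < n$) are handled.

```python
def tf_eval(b, a, s):
    """Evaluate Equation (1): G(s) = sum_i b_i s^(m-i) / sum_i a_i s^(n-i)."""
    m, n = len(b) - 1, len(a) - 1
    num = sum(bi * s ** (m - i) for i, bi in enumerate(b))
    den = sum(ai * s ** (n - i) for i, ai in enumerate(a))
    return num / den


def controllable_canonical_form(b, a):
    """State-space realization (A, B, C) of a strictly proper G(s), i.e., m < n."""
    n = len(a) - 1
    if len(b) - 1 >= n:
        raise ValueError("only strictly proper transfer functions are handled")
    alpha = [ai / a[0] for ai in a[1:]]                    # monic denominator
    beta = [0.0] * (n - len(b)) + [bi / a[0] for bi in b]  # numerator padded to n terms
    A = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n - 1)]
    A.append([-alpha[n - 1 - j] for j in range(n)])        # last row: minus normalized a_n ... a_1
    B = [0.0] * (n - 1) + [1.0]
    C = beta[::-1]                                         # numerator coefficients, lowest power first
    return A, B, C


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting (complex-safe)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def ss_eval(A, B, C, s):
    """Evaluate C (sI - A)^-1 B, the transfer function of the state-space model."""
    n = len(A)
    s_i_minus_a = [[(s if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    x = solve(s_i_minus_a, B)
    return sum(ci * xi for ci, xi in zip(C, x))


# Hypothetical mass-spring-damper: m = 2, c = 0.5, k = 4  ->  G(s) = 1 / (2s^2 + 0.5s + 4)
b, a = [1.0], [2.0, 0.5, 4.0]
A, B, C = controllable_canonical_form(b, a)
s = 1 + 2j
print(tf_eval(b, a, s), ss_eval(A, B, C, s))  # the two values should agree
```

The realization is not unique: any similarity transform of $(A, B, C)$ yields the same $G(s)$, so a state-space model determines one transfer function, whereas a transfer function corresponds to a family of state-space models.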
The state-space model expresses the differential equations (or, in discrete time, the difference equations) in matrix form, as shown in Equation (2):

$$x_{k+1} = A x_k + B u_k + w_k, \qquad y_k = C x_k + v_k \qquad (2)$$

where $x_k \in \mathbb{R}^n$ is the vector of state variables at the $k$-th time step, $u_k \in \mathbb{R}^p$ is the control signal and $w_k \in \mathbb{R}^n$ is the process noise, assumed to be zero-mean Gaussian white noise with covariance $Q$, i.e., $w_k \sim \mathcal{N}(0, Q)$. Moreover, $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times p}$ are the state matrix and the input matrix, respectively. The output vector $y_k \in \mathbb{R}^m$ represents the sensor measurements, which are affected by the noise $v_k$, assumed to be zero-mean Gaussian white noise with covariance $R$, i.e., $v_k \sim \mathcal{N}(0, R)$, and $C \in \mathbb{R}^{m \times n}$ is the output matrix that maps the state $x_k$ to the system output.
DTs require replicating the states of the physical process within a CPS in functionally equivalent virtual replicas that mirror the internal behavior of the system. To address this need, Eckhart et al. [74] analyze how to passively replicate the program states of devices to obtain a virtual representation of the CPS during its operation.
They propose an approach that identifies stimuli in the system's specification and then replicates them in a virtual environment. Since the stimuli trigger state transitions, different data sources, such as network traffic or system logs, can be used to identify the stimuli, replay them and keep the DT synchronized with the PO. Similarly, Schellenberger et al. [33] propose a DT to detect attacks in CPSs. Their approach is based on Luenberger state observers [75], which estimate the state of the system from observations of its inputs and outputs together with a mathematical model that describes its dynamics. In other words, an observer is a continuous-time dynamical system that takes the measured input and measured output of the plant as input and produces an estimate of the plant's state as output.
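The observer principle can be illustrated with a discrete-time analogue consistent with Equation (2), with the noise terms omitted for clarity. The observer runs the model in parallel with the plant and corrects its estimate with the output error, $\hat{x}_{k+1} = A\hat{x}_k + Bu_k + L(y_k - C\hat{x}_k)$. The sketch assumes a hypothetical two-state system in which only the first state is measured; the matrices and the gain $L$ are illustrative values, not taken from [33]. The gain is chosen so that $A - LC$ has eigenvalues 0.5 and 0.6.

```python
import math


def mat_vec(M, v):
    """Matrix-vector product for small lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


# Hypothetical plant (noise-free version of Equation (2)).
A = [[1.0, 0.1], [0.0, 0.95]]
B = [0.0, 0.1]
C = [1.0, 0.0]          # only the first state is measured
L = [0.85, 1.575]       # observer gain: eigenvalues of A - L C are 0.5 and 0.6


def run_observer(steps=60):
    x = [1.0, -0.5]      # true state of the plant, unknown to the observer
    x_hat = [0.0, 0.0]   # observer's initial guess
    errors = []
    for k in range(steps):
        u = math.sin(0.1 * k)                           # known input signal
        y = sum(c * xi for c, xi in zip(C, x))          # measured output
        y_hat = sum(c * xi for c, xi in zip(C, x_hat))  # output predicted by the model
        innovation = y - y_hat
        # Plant update.
        ax = mat_vec(A, x)
        x = [ax[i] + B[i] * u for i in range(2)]
        # Observer update: model prediction plus output-error correction.
        ax_hat = mat_vec(A, x_hat)
        x_hat = [ax_hat[i] + B[i] * u + L[i] * innovation for i in range(2)]
        errors.append(math.hypot(x[0] - x_hat[0], x[1] - x_hat[1]))
    return errors


errs = run_observer()
print(f"error after first step: {errs[0]:.3f}, after last step: {errs[-1]:.2e}")
```

Because the estimation error evolves as $e_{k+1} = (A - LC)e_k$, it shrinks geometrically regardless of the input; a DT can exploit this by flagging measurements whose output error does not shrink as the model predicts.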
Data-Dependent Models: These are based on data structures that retain all the variables describing reality at the chosen level of abstraction. With the data supplied by the PO, the VO can be built with the help of open-source Artificial Intelligence (AI) libraries such as TensorFlow [76], PyTorch [77] or OpenCV [78]. This approach assumes that, since data are a manifestation of both known and unknown physics, a data-driven model can account for the full physics [71]. Building this type of model requires a four-stage process: data generation, data collection, data pre-processing and data analysis through AI algorithms. Data generation relies heavily on sensors that collect information from the PO. Multiple tools support real-time data collection, such as Apache Kafka [79]. The collected data are typically voluminous, so they must be pre-processed to ensure quality and completeness, and also compressed and summarized. Pre-processing requires evaluating the relations between variables and detecting noise. This data-engineering process also includes cleaning the data to correct or remove corrupt and inaccurate records, using operations such as filtering, handling missing or erroneous values and removing redundant and duplicate information. Data integration, data transformation and data enrichment are also part of data engineering. Apache Spark [80] is a useful framework for in-memory data processing. Overall, pre-processing increases data accuracy and saves computational cost.
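The pre-processing stage can be sketched with the Python standard library alone. The example assumes a hypothetical list of timestamped sensor readings (in a real deployment these might arrive through a tool such as Apache Kafka). It removes duplicate records, treats missing or out-of-range values as gaps, fills interior gaps by linear interpolation and smooths noise with a moving average. The record format, valid range and window size are illustrative assumptions.

```python
from statistics import mean


def preprocess(records, valid_range=(-50.0, 150.0), window=3):
    """Clean (timestamp, value) sensor readings; a value of None means 'missing'."""
    # 1. Remove duplicates: keep the first reading seen for each timestamp.
    by_time = {}
    for t, v in records:
        by_time.setdefault(t, v)
    times = sorted(by_time)

    # 2. Treat missing or out-of-range (corrupt) values as gaps.
    lo, hi = valid_range
    values = [by_time[t] if by_time[t] is not None and lo <= by_time[t] <= hi else None
              for t in times]

    # 3. Fill interior gaps by linear interpolation between the nearest valid samples.
    known = [i for i, v in enumerate(values) if v is not None]
    for i in range(len(values)):
        if values[i] is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is not None and right is not None:
            frac = (times[i] - times[left]) / (times[right] - times[left])
            values[i] = values[left] + frac * (values[right] - values[left])

    # Gaps at the edges cannot be interpolated and are dropped.
    cleaned = [(t, v) for t, v in zip(times, values) if v is not None]

    # 4. Reduce noise with a trailing moving average over `window` samples.
    smoothed = []
    for i, (t, _) in enumerate(cleaned):
        recent = [v for _, v in cleaned[max(0, i - window + 1): i + 1]]
        smoothed.append((t, mean(recent)))
    return smoothed


# Hypothetical readings: a duplicate, a missing value and a corrupt spike.
raw = [(0, 20.0), (1, 21.0), (1, 21.0), (2, None), (3, 999.0), (4, 24.0)]
print(preprocess(raw))
```

At larger scale the same steps (deduplication, filtering, imputation, windowed aggregation) would be expressed in a distributed engine such as Apache Spark rather than in plain Python loops.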
Hybrid Control–Data Models
As explained previously, both control and data models have advantages and disadvantages. Hybrid models aim to combine the strengths of both design techniques [81]. A control model ensures physical interpretability, which is very useful in engineering systems, whereas machine learning models adapt well to data and are suited to real-time applications [82]. For example, Ref. [83] proposes integrating physics-based models with machine learning to design a DT that predicts structural damage. This strategy uses an interpretable (physics-based) model to build a fast (machine learning) DT that is connected to the PO to support real-time engineering decisions. In addition, Ref. [84] shows how to build a hybrid DT model of a heater in a water process system. The work details the steps for updating the physical model and the process system using data-driven models of the process equipment; by training the ML models on historical data, the DT can be continually improved over time. Chakraborty et al. [85,86] also propose a hybrid control model for linear single-degree-of-freedom structural dynamic systems evolving on two different operational time scales. The approach uses a physics-based model for data processing and response predictions, and a data-driven machine learning model for the time evolution of the system parameters.
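One common way to combine the two approaches is residual learning: a physics-based model provides the baseline prediction, and a data-driven model is fitted to the part of the measurements that the physics does not explain. This is shown only as an illustration, not as the specific method of [83] to [86]. The sketch assumes a hypothetical heater whose ideal model is linear in the input power, while the (synthetic) measurements also contain an unmodeled quadratic heat loss; the "ML" component is deliberately reduced to a one-parameter least-squares fit so the example stays self-contained.

```python
def physics_model(power):
    """Interpretable first-principles baseline: temperature rise = gain * power."""
    return 0.5 * power


def fit_residual(inputs, residuals):
    """Least-squares fit of residual ~ w * power**2 (stand-in for an ML model)."""
    features = [p ** 2 for p in inputs]
    w = sum(f * r for f, r in zip(features, residuals)) / sum(f * f for f in features)
    return lambda power: w * power ** 2


# Hypothetical, noise-free historical data from the PO, including an unmodeled loss.
history_power = [1.0, 2.0, 3.0, 4.0, 5.0]
history_temp = [0.5 * p - 0.02 * p ** 2 for p in history_power]

# Train the data-driven part only on what the physics model gets wrong.
residuals = [t - physics_model(p) for p, t in zip(history_power, history_temp)]
correction = fit_residual(history_power, residuals)


def hybrid_model(power):
    """Physics baseline plus learned correction."""
    return physics_model(power) + correction(power)


print(physics_model(6.0), hybrid_model(6.0))
```

Refitting `correction` as new historical data arrive is one way the DT can be improved over time, while `physics_model` keeps the prediction interpretable and still gives sensible answers outside the range of the training data.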
Other Modeling Techniques
Some modeling techniques rely not on the physics of the system but on the relations among its components. For example, graph models can represent communication models, and knowledge-based models require an expert to analyze the system and manually design a model of its characteristics and behavior. Another example is the ontology-based method proposed by Dai et al. [68] to model as-fabricated parts. They argue that this methodology provides a standardized process to create DTs, through which engineers can evaluate and optimize machining processes. To create the DT, the model encapsulates the physical data and the information relationships with its external environment. The authors adopt model-dependent realism, which is based on the belief that all we can know about reality consists of networks of concepts that explain observations, with rules connecting the concepts to define models. This view also holds that we cannot know reality as it is, but only approximations of it represented by models. In this way, a rational information model can represent critical concepts and their relationships. Additionally, Pylianidis et al. [69] propose creating DT models using simulation-assisted ML algorithms. They integrate process-based models with ML to adapt the resulting model to the input data: the data produced by the process-based model are aggregated to a lower resolution to mimic real situations, and the ML models are developed using a fraction of the process-based model inputs.
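A relation-based model of this kind can be sketched as a set of subject-predicate-object triples, the basic structure underlying ontology languages such as RDF and OWL. The example assumes a hypothetical machining cell; the component names and the `sends_data_to` and `part_of` relations are illustrative and are not taken from [68]. A breadth-first traversal over the communication relation answers a typical question for such a model: which components are affected downstream if one component fails.

```python
from collections import deque

# Hypothetical knowledge base: (subject, predicate, object) triples.
TRIPLES = [
    ("spindle_sensor", "part_of", "milling_machine"),
    ("plc", "part_of", "milling_machine"),
    ("spindle_sensor", "sends_data_to", "plc"),
    ("plc", "sends_data_to", "edge_gateway"),
    ("camera", "sends_data_to", "edge_gateway"),
    ("edge_gateway", "sends_data_to", "digital_twin"),
]


def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def downstream(component, predicate="sends_data_to"):
    """Breadth-first search: every component reachable from `component`."""
    seen, queue = set(), deque([component])
    while queue:
        node = queue.popleft()
        for nxt in objects(node, predicate):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(downstream("plc")))
```

Keeping the model as plain triples means new relation types can be added without changing the query code, which is the main appeal of ontology-style models over fixed data structures.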
4.1.2. Structural Model
This model provides a structured description of the connection and assembly relations among the structures that perform the functions and behaviors. The interrelations among these structures are the foundation for the transfer and transformation of material, energy, information and motion within the system. The structural model usually includes topology definition, layout planning and buffer design [13]. A Physical Model enables the simulation of physical properties and loads and the analysis of phenomena such as deformation, cracking and corrosion [10,70].
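As a small illustration of the load and deformation analysis a physical model supports, the classic Euler-Bernoulli result for a cantilever beam with a point load $F$ at its free end gives the tip deflection $\delta = FL^3/(3EI)$, where $L$ is the beam length, $E$ the Young's modulus and $I$ the second moment of area. The numeric values below (a steel-like beam) are hypothetical; more complex geometries generally call for numerical methods such as finite-element analysis.

```python
def cantilever_tip_deflection(force_n, length_m, youngs_modulus_pa, second_moment_m4):
    """Tip deflection (m) of a cantilever with an end point load (Euler-Bernoulli)."""
    return force_n * length_m ** 3 / (3 * youngs_modulus_pa * second_moment_m4)


# Hypothetical steel beam: E = 200 GPa, I = 8e-6 m^4, length 2 m, end load 1 kN.
delta = cantilever_tip_deflection(1_000.0, 2.0, 200e9, 8e-6)
print(f"tip deflection: {delta * 1000:.3f} mm")
```

Because deflection grows with the cube of the length, doubling the beam length increases the deflection eightfold, the kind of sensitivity a DT can expose when exploring design or load changes.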
A Geometric Model reflects the geometry, shapes, sizes, positions and assembly of machine components, as well as the kinematics, logic and interfaces of the real system [87,88]. For instance, 3D modeling is one of the techniques used to represent system geometry: it is the process of developing a mathematical representation of the surface of an object. 3D models can be constructed by 3D scanning the object or through specialized software using equations, and are ultimately represented in terms of curves and surfaces [71]. Image-based methods also offer a good alternative to scanning techniques for measuring geometry, since they allow the geometry to be reconstructed with image processing algorithms based on digital photogrammetry. In addition, they can be complemented with data describing the internal structure of the object, obtained through classical inspection methods, thermal imaging or radar techniques, which allow a physical structure to be investigated in more depth [37]. Anbalagan et al. [89] explain how to create digital geometry models using Computer-Aided Design (CAD), discussing CAD modeling and manufacturing simulation methodologies in a virtual environment with the objective of creating geometric models useful for DT design.
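The "curves and surfaces" representation mentioned above is commonly built in CAD from parametric primitives such as Bezier and B-spline (NURBS) curves. The sketch below evaluates a Bezier curve with de Casteljau's algorithm, i.e., repeated linear interpolation between neighboring control points. The control points describe a hypothetical part profile, and the same idea extends to surfaces by interpolating in two parameters.

```python
def lerp(p, q, t):
    """Linear interpolation between points p and q (tuples of equal length)."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def de_casteljau(control_points, t):
    """Point at parameter t in [0, 1] on the Bezier curve defined by control_points."""
    points = list(control_points)
    while len(points) > 1:
        points = [lerp(points[i], points[i + 1], t) for i in range(len(points) - 1)]
    return points[0]


# Hypothetical profile of a machined part: a quadratic curve in the x-y plane.
profile = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
samples = [de_casteljau(profile, i / 10) for i in range(11)]
print(samples[5])
```

The curve passes through the first and last control points, while interior control points only pull it toward them; editing a few control points therefore reshapes the whole surface smoothly, which is why this representation suits geometric models that must be updated as the physical part changes.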