Singapore Management University

Institutional Knowledge at Singapore Management University

Research Collection School Of Computing and Information Systems
School of Computing and Information Systems

9-2023

K-ST: A formal executable semantics of the structured text language for PLCs

Kun WANG

Jingyi WANG

Christopher M. POSKITT
Singapore Management University, [email protected]

Xiangxiang CHEN

Jun SUN
Singapore Management University, [email protected]

See next page for additional authors

Follow this and additional works at: https://ink.library.smu.edu.sg/sis_research

Part of the Programming Languages and Compilers Commons, Software Engineering Commons, and
the Theory and Algorithms Commons

Citation
WANG, Kun; WANG, Jingyi; POSKITT, Christopher M.; CHEN, Xiangxiang; SUN, Jun; and CHENG, Peng. K-
ST: A formal executable semantics of the structured text language for PLCs. (2023). IEEE Transactions on
Software Engineering. 49, (10), 4796-4813.
Available at: https://ink.library.smu.edu.sg/sis_research/8199

This Journal Article is brought to you for free and open access by the School of Computing and Information
Systems at Institutional Knowledge at Singapore Management University. It has been accepted for inclusion in
Research Collection School Of Computing and Information Systems by an authorized administrator of Institutional
Knowledge at Singapore Management University. For more information, please email [email protected].
Author
Kun WANG, Jingyi WANG, Christopher M. POSKITT, Xiangxiang CHEN, Jun SUN, and Peng CHENG

This journal article is available at Institutional Knowledge at Singapore Management University: https://ink.library.smu.edu.sg/sis_research/8199

K-ST: A Formal Executable Semantics of the Structured Text Language for PLCs
Kun Wang, Jingyi Wang, Christopher M. Poskitt, Xiangxiang Chen, Jun Sun, and Peng Cheng

Abstract—Programmable Logic Controllers (PLCs) are responsible for automating process control in many industrial systems (e.g. in
manufacturing and public infrastructure), and thus it is critical to ensure that they operate correctly and safely. The majority of PLCs are
programmed in languages such as Structured Text (ST). However, a lack of formal semantics makes it difficult to ascertain the
correctness of their translators and compilers, which vary from vendor-to-vendor. In this work, we develop K-ST, a formal executable
semantics for ST in the K framework. Defined with respect to the IEC 61131-3 standard and PLC vendor manuals, K-ST is a high-level
reference semantics that can be used to evaluate the correctness and consistency of different ST implementations. We validate K-ST
by executing 567 ST programs extracted from GitHub and comparing the results against existing commercial compilers (i.e.,
CODESYS, CX-Programmer, and GX Works2). We then apply K-ST to validate the implementation of the open source OpenPLC
platform, comparing the executions of several test programs to uncover five bugs and nine functional defects in the compiler.

Index Terms—Formal executable semantics, PLC programming, Structured text, K framework, OpenPLC.

• K. Wang, and J. Wang are with the College of Control Science and Engineering, Zhejiang University, Zhejiang 310027, China. E-mail: {kunwang yml, wangjyee}@zju.edu.cn.
• CM. Poskitt is with the School of Computing and Information Systems, Singapore Management University, Singapore. E-mail: [email protected].
• X. Chen is with the College of Control Science and Engineering, Zhejiang University, Zhejiang 310027, China. E-mail: [email protected].
• J. Sun is with the School of Computing and Information Systems, Singapore Management University, Singapore. E-mail: [email protected].
• P. Cheng is with the College of Control Science and Engineering, Zhejiang University, Zhejiang 310027, China. E-mail: [email protected].
(Corresponding authors: Jingyi Wang and Peng Cheng)

1 INTRODUCTION

Programmable Logic Controllers (PLCs) are responsible for automating process control in several modern industrial systems, e.g. in manufacturing and public infrastructure. It is critical to ensure that PLCs are operating correctly, as any functional or security-related defects may lead to serious incidents in the system. This has most famously been demonstrated by the Stuxnet worm [1], while many other less-known safety and security incidents [2], [3], [4] and potential hazards [5], [6] related to PLCs have resulted in significant consequences, with an estimated $350,000 in damage on average [7].

The majority of PLCs are programmed using languages defined in the IEC 61131-3 open international standard [8]. Programs can be written in graphical languages such as Function Block Diagrams (FBD), but the standard also defines Structured Text (ST), a fully textual language based on the idea of organizing code into ‘function blocks’ and designed with a syntax similar to Pascal. ST is a particularly important IEC 61131-3 language given its utility for data processing [9], and the fact that snippets of ST are actually required in FBD and other graphical languages. It is therefore important that translators and compilers for ST are correctly implemented and exhibit only expected behaviors when the code is being run on a PLC.

Fig. 1: High-level workflow of our approach

This has motivated a surge of research on analyzing and verifying PLC programs [7], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], although few works focus on ST implementations/compilers. Zhang et al. [24] propose VetPLC, a temporal context-aware, program-based approach to produce timed event sequences that can be used for automatic safety vetting. McLaughlin et al. [21] propose TSV which translates assembly-level code into an intermediate language (ILIL) to verify safety-critical code executed on PLCs. Mader and Wupper [26] translate Instruction List (IL) code into timed automata [31]. Bauer et al. [25] similarly use timed automata as the formalism for Sequential Function Chart (SFC). In [27], the proposed method transforms IL to Petri-nets [32], and manually builds two additional Petri-nets for modeling the PLC and its environment. Xiong et al. [23] propose an algorithm based on variable state analysis for automatically extracting a Behavior Model (BM) from an ST program. These works attempt to transform PLC programs into an intermediate language or another programming language (i.e., C) which is suitable for verifying or detecting potential issues using existing associated verifiers or checkers. The issue of these approaches is that they lack analysis and proof of equivalence in the conversion process. In
addition, the analyses they perform are often limited (since the existing tools are not designed for PLCs) and do not offer feedback at the level of source code. Canet et al. [22] propose formal semantics for a significant fragment of the IL language, and a direct coding of this semantics into a model checking tool. Huuck [28] develops a formal operational semantics and abstract semantics for IL, which allows approximating program simulation for a set of inputs in one simulation run. Blech et al. [10], [11], [30] attempted to define the formal semantics of the IL and SFC languages in Coq and NuSMV and, based on that, verify the safety properties in the code. However, IL is a low-level assembly-like language that has been deprecated from the IEC 61131-3 standard. Furthermore, these studies mainly concentrate on analyzing the functional aspects of the programs and may overlook potential vulnerabilities and security risks introduced during the compilation process.

While extensive research has been conducted on testing more ‘traditional’ compilers (e.g. vulnerability detection for GCC and Clang [33], [34], [35]), compilers for PLC languages such as ST have received much less attention. The challenges associated with testing the implementation of a compiler arise from the inherent difficulties of ensuring its correctness. One particular challenge stems from the absence of a precise specification of the expected behavior of a compiler. For most popular programming languages, there exist multiple purportedly equivalent implementations of compilers. Compiler testing can take advantage of this by utilizing these implementations as oracles for conducting differential testing [36]. However, in the case of the domain-specific ST language, there is no specific implementation standard, and different vendors often develop their own compilers based on their specific requirements. Another challenge is the semantic complexity of the input and output languages that compilers handle. The fact that different vendors develop their own implementations further exacerbates this issue. Compiler testing methods based on formal semantics [37] have shown advantages in addressing these challenges. With a formal semantics of the ST language, the expected behavior of ST compilers can be precisely and unambiguously defined, which can greatly aid in testing and verifying their correctness.

To the best of our knowledge, a practical and complete semantics for the ST language does not exist, which makes it difficult to ascertain the correctness of ST translators and compilers (e.g. by comparing executions). There are a number of reasons why such a reference semantics is yet to emerge. First, there is insufficient documentation defining or describing the complete features of the ST language [9]. For instance, the official documentation introduces language features by only a few examples, based on which it is difficult for readers to fully understand the behavior of the language. Second, the ST compilers provided by different vendors (e.g. Allen-Bradley, Siemens) can implement the language differently, and their closed source solutions make it difficult to fully assess how they behave systematically (other than through manual observation). For example, CODESYS, CX-Programmer, and GX Works2 all produce negative numbers in the results of negative modulo operations, even though this behavior is undefined according to the official documentation. Furthermore, GX Works2 supports only 10 basic data types, whereas CODESYS supports 17 types. Thus, a formal semantics needs to be ‘concrete’ enough to be useful, but ‘high-level’ enough to be general/extendable to the different nuances of vendors’ compilers. A preliminary attempt at defining a high-level semantics for ST was made by Huang et al. [38]. However, it falls short of a full reference semantics as it misses several important features of the language, e.g. certain data types, and key sentences.

In this work, we develop K-ST, a formal executable reference semantics for ST in the K framework [39]. Our high-level semantics is both executable and machine readable, and can be used by the K framework to generate interpreters, compilers, state-space explorers, model checkers, and deductive program verifiers. Our principal goals for the design of K-ST are as follows:

1) Validated reference semantics. K-ST is designed to cover all the main features of ST, and is validated against hundreds of different real-world ST programs extracted from GitHub.
2) General and extendable. The semantics is high-level (rather than tied to a particular compiler), with the goal of supporting different ST implementations as well as extensions for vendor-specific functions.
3) Analyses of ST compilers. Most importantly, K-ST can be used to check the correctness and consistency of different ST implementations, and thus ensure that a compiler is not introducing an unintended behavior or compile-time threat [40], [41] into a critical industrial system.

Given the absence of complete feature descriptions for the ST language in official documentation, we not only refer to the definitions and code samples in the official documents, but also extensively consult the guidance manuals provided by multiple vendors to better define the semantics of the ST language. For example, there is no specific documentation on how integer overflow is handled in the official documents. Through investigating multiple instruction manuals, we found that existing ST compilers generally use truncation to handle integer overflow without any warning. In defining the semantics, we find that the rewriting rules of the K framework provide a good mechanism for capturing the unique features of ST. For example, we can rewrite REPEAT to WHILE to achieve the execution effect of REPEAT.

We validate K-ST by extracting 567 real-world ST code samples from GitHub and comparing their executions in our semantics against their executions resulting from various commercial compilers (i.e., CODESYS, CX-Programmer, and GX Works2). We find that K-ST is sufficiently complete to support 509 of these programs (consisting of 26,137 lines of code) and executes those programs correctly (i.e., producing the same outputs as the corresponding existing compiler), with the remaining programs only unsupported due to the use of certain vendor-specific or hardware-related functions that we did not yet formalize. Furthermore, to evaluate the utility of K-ST for testing ST compilers, we compared the executions of the 567 programs (and several mutants) under K-ST and OpenPLC [42], a popular open source PLC program compiler. Through this semantics-based testing, we are able to uncover five bugs and nine functional defects in
the OpenPLC compiler, all of which were previously unknown. Fig. 1 summarises the high-level workflow of this process.
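
Two of the vendor behaviours noted earlier in this section, silent truncation on integer overflow and negative results from modulo with negative operands, can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the paper: "truncation" is read as keeping the low-order bits of a two's-complement value, the modulo is read as the remainder of truncated division (sign follows the dividend), and the names `wrap_signed` and `trunc_mod` are hypothetical.

```python
def wrap_signed(value: int, bits: int = 16) -> int:
    """Keep only the low `bits` bits and reinterpret them as two's complement.

    Assumption: this models 'truncation without warning' for a signed
    integer type of the given width (16 bits for an ST INT).
    """
    mask = (1 << bits) - 1
    low = value & mask
    # If the sign bit is set, the bit pattern denotes a negative number.
    if low & (1 << (bits - 1)):
        return low - (1 << bits)
    return low


def trunc_mod(a: int, b: int) -> int:
    """Remainder of truncated division: the result takes the sign of `a`."""
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return a - b * quotient
```

Note that Python's own `%` operator uses floored division, so `(-7) % 3` evaluates to `2`, whereas `trunc_mod(-7, 3)` yields a negative remainder. Differences of this kind are exactly what a reference semantics has to pin down.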
In summary, we make three main contributions:
• We propose an executable formal reference semantics for ST.
• We collect a set of 567 complete ST program samples, and validate the correctness of our executable semantics by running those programs in the semantics and via existing compilers (CODESYS, CX-Programmer, and GX Works2), comparing the results.
• We test OpenPLC, an open source PLC program compiler, using our proposed semantics, and find five bugs and nine functional defects.
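
The comparison step behind the second and third contributions can be sketched as a differential-testing loop. This is only an illustrative sketch, not the authors' tooling: the `Runner` type, the `differential_test` function, and the idea that each runner maps ST source text to a dictionary of final variable values are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Outputs = Dict[str, object]        # variable name -> observed final value
Runner = Callable[[str], Outputs]  # ST source text -> observed outputs


def differential_test(
    programs: Iterable[str], reference: Runner, candidate: Runner
) -> List[Tuple[str, Outputs, Outputs]]:
    """Run each program under both runners and collect the disagreements.

    `reference` stands in for the executable semantics and `candidate`
    for the compiler under test (both hypothetical callables here).
    """
    mismatches = []
    for source in programs:
        expected = reference(source)
        try:
            actual = candidate(source)
        except Exception as exc:
            # A crash of the candidate is recorded as a mismatch as well.
            actual = {"<error>": repr(exc)}
        if actual != expected:
            mismatches.append((source, expected, actual))
    return mismatches
```

Each reported triple still needs manual triage, since a disagreement may point to a defect in the compiler, a gap in the semantics, or behaviour the standard leaves undefined.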
The remaining part of this paper is organized as follows.
Section 2 introduces the background of ST and the K frame-
work. The proposed executable operational semantics of ST
formalized in K is introduced in Section 3. Section 4 shows
some practical applications of our formal semantics. The
evaluation results of the proposed semantics are introduced
in Section 5. Section 6 summarises some related work, and
Section 7 concludes this work.

2 BACKGROUND
In this section, we briefly introduce the background of the
Structured Text (ST) language and the K framework.

2.1 Structured Text

The Programmable Logic Controller, invented in 1969 by Dick Morley, is specially designed for applications in industrial environments, e.g. assembly lines, robotic devices, or public infrastructure. These kinds of applications all require high reliability and ease of programming.

Early PLCs were represented as a series of logic expressions in some kind of Boolean format. With the development of programming terminals and the complexity of existing control procedures, Ladder Diagrams (LD) were developed to program PLCs. As of 1993, the IEC 61131-3 standard developed by the International Electrotechnical Commission (IEC) defined five programming languages, including two textual programming languages (ST and IL) as well as three graphical languages (LD, FBD, and SFC). A simple example in Fig. 2 [43] shows an ST code example which can be used for linear scaling of an analog sensor signal.

Fig. 2: An ST programming example

ST is a high-level PLC programming language which is similar to Pascal [44] (widely used from 1980 to 2000), C/C++ and Java. While it contains common constructs from modern programming languages such as FUNCTION, IF/ELSIF/ELSE and CASE branches, WHILE and FOR loops, it has its own characteristics, such as the lack of recursion, capitalized keywords, the REPEAT statement, and the FUNCTION_BLOCK structure. For instance, FUNCTION_BLOCK is an important part of ST and has its own state. Its main purpose is to modularize and structure a straightforwardly defined portion of the program. It is similar to the class-object manifestation in object-oriented programming. Function blocks exist in two forms: as a type or as an instance, but only the instance can be called. For each function block, the local variables retain their values between each ‘call’. TABLE 1 shows the common elements of ST.

ST, as the only textual programming language supported by the new IEC standard, has a number of advantages compared to other PLC languages. First, being textual, ST programs can be copied relatively easily. Second, compared with the other four languages, it is more convenient for mathematical calculations, formulas and algorithms, and for managing large amounts of data [9]. Third, compared with 20 years ago, PLC solutions are more in demand today and ST can better adapt to this change. Finally, LD, SFC and FBD also require parts of the program to be written in ST anyway [45], [46].

Unfortunately, the absence of documents defining or describing the complete features of the ST language and the differing customizations of vendors can lead to inconsistent implementations of ST. In addition, understanding the semantics of the ST language, and ensuring that it is formally defined, is difficult for end users accustomed to graphical programming. A formal executable semantics of ST not only provides a standard, but also helps PLC engineers verify the completeness and correctness of these implementations.

2.2 The K Framework

K is a formal logic framework based on rewriting logic [47]. It was developed with the overarching goal of pursuing the ideal language framework, where all programming languages have formal semantic definitions and all language tools are automatically derived in a correct-by-construction manner at no additional cost. The K backends, such as the
Isabelle theory generator, the model checker, and the deductive verifier, can be utilized to prove properties based on the semantics and generated verification tools [48]. Several executable semantics in K have been developed for mainstream programming languages, including C [49], Java [50], JavaScript [51], Rust [52], Solidity [53], and IMP [54].

TABLE 1: Common elements of the ST language

Type                         Element
Program Organization Unit    FUNCTION_BLOCK, FUNCTION, PROGRAM
Main Statement               IF, CASE, WHILE, FOR, REPEAT, EXIT, RETURN, ...
User Data Type               ENUM, STRUCT
Built-in Data Type           INT, DINT, SINT, LINT, UINT, UDINT, USINT, ULINT, REAL, LREAL, BOOL, STRING, WSTRING, TIME, DATE, TIME_OF_DAY, DATE_AND_TIME, ARRAY, ...
Declaration Type             VAR_GLOBAL, VAR, VAR_INPUT, VAR_OUTPUT, VAR_IN_OUT, VAR_EXTERNAL, VAR_TEMP, AT, RETAIN, PERSISTENT, CONSTANT, ...
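
Section 2.1 compares function blocks to classes and objects and notes that only instances can be called, with local variables retained between calls. A rough Python analogy is sketched here; the `Accumulator` block and its `total` variable are hypothetical and do not come from the paper or the standard.

```python
class Accumulator:
    """Hypothetical function block *type*.

    Roughly analogous to a FUNCTION_BLOCK with one input (`step`)
    and one local variable (`total`) initialised to 0.
    """

    def __init__(self) -> None:
        # Local variable of the block; it lives as long as the instance.
        self.total = 0

    def __call__(self, step: int) -> int:
        # Calling an *instance* updates and returns its retained state.
        self.total += step
        return self.total


# Two instances of the same type keep independent state.
fb1 = Accumulator()
fb2 = Accumulator()
fb1(2)
fb1(3)   # fb1.total is now 5
fb2(1)   # fb2.total is 1, unaffected by the calls on fb1
```

The analogy is loose: ST function blocks cannot be called recursively and their state is laid out statically, so the sketch only illustrates the state-retention behaviour.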
A language semantics definition in K consists of three parts: the language syntax, the configuration, and a set of semantics constructed based on the syntax and the configuration. Given the semantics definition for a programming language and some source programs, K executes these programs like a translator. For illustration, in the following we take a strict subset of the ST language, i.e., STdemo shown in Fig. 2, as an example to illustrate how to define language semantics in K.

Configuration. The whole configuration cell T of STdemo contains two cells, namely k and state. The cell k is used to store the source program $PGM for execution, and the cell state is used to record the mapping from a variable identifier to its value. The configuration simulates the memory status and environmental changes during runs of the program.

$$\langle\, \langle \$PGM : Pgm \rangle_{k} \;\; \langle .Map \rangle_{state} \,\rangle_{T}$$

With the configuration defined, we present the syntax of STdemo in Fig. 3, which includes some numerical operations, logic operations and commonly used statements. Based on the configuration and the syntax of STdemo, we introduce some basic rules in the semantics. The role of the semantics is to tell K how to execute the source code, where K executes the code and updates the configuration sentence-by-sentence after parsing the source program.

Fig. 3: The syntax of STdemo

Here, we show the semantics of Allocate, Lookup and Assignment in Fig. 4 as they are the most commonly used constructs in programming languages. TABLE 2 describes some common semantic notations. Take Allocate as an example: when K runs to lines 9–13 in Fig. 2, the content in the k cell is $\langle \texttt{VAR}\ a : \texttt{REAL};\ VBs\ \texttt{END\_VAR} \cdots \rangle_{k}$, where $VBs$ stands for b : REAL; Error : BOOL := FALSE;. Then, K will rewrite $\langle \texttt{VAR}\ a : \texttt{REAL};\ VBs\ \texttt{END\_VAR} \cdots \rangle_{k}$ to $\langle \texttt{VAR}\ VBs\ \texttt{END\_VAR} \cdots \rangle_{k}$, which means that a : REAL; has been executed according to rule Variable Allocate. Meanwhile, it adds the mapping between the variable name and the corresponding value ($a \mapsto 0.0$) in the current state cell Rho. In addition, “requires notBool (X in keys (Rho))” guarantees that the variable will not be re-declared. Similarly, variables b and Error will be allocated separately. After that, the content in the k cell is $\langle \texttt{VAR}\ .VarBodys\ \texttt{END\_VAR} \cdots \rangle_{k}$, where $.VarBodys$ represents an empty variable declaration list, that is, no additional variable needs to be allocated. The rule Variable Finish Allocate will be called to convert “VAR .VarBodys END_VAR” in k to “.”, which means that there is no more code to execute in the VAR block and K will continue to execute the subsequent code.

Fig. 4: The partial semantics of STdemo

TABLE 2: Summary of semantic notations

Notation                                    Description
rule                                        The beginning of a semantic rule.
$a \Rightarrow b$                           The symbol $\Rightarrow$ means “rewritten by”, thus $a \Rightarrow b$ denotes that a can be replaced by b.
a requires b                                Execute a when b is true.
$\langle\ \rangle_{k}$                      Stands for the k cell in a configuration.
$\frac{a}{b}$                               Similar to $a \Rightarrow b$, $\frac{a}{b}$ means a will be rewritten by b. However, it can only be used inside $\langle\ \rangle$.
$\langle \cdots a \cdots \rangle_{k}$       $\cdots$ represents the content in the a context.
.                                           Stands for empty.
_                                           Any value.
$a : b$                                     The type of variable a is b.
$a \mapsto b$, $a \leftarrow b$             Mapping from a to b.
$a \curvearrowright b$                      The execution of a, followed by execution of b.

3 FORMAL SEMANTICS OF STRUCTURED TEXT IN THE K FRAMEWORK

In this section, we introduce K-ST, the executable operational semantics of ST formalized in K. Note that in practice the PLC programming environment is provided by specific PLC manufacturers including CODESYS and Siemens’s TIA portal (TIA, Structured Control Language (SCL)). As a consequence, the implementations of different manufacturers can vary and may also include their own unique functions or structures.

Our approach is therefore to focus on the common features, allowing other unique functions of the environment to be implemented by extending the operational semantics. Specifically, the syntax of ST is constructed based on the official IEC 61131-3 standard [46]. The configuration is specifically designed for ST. Based on the syntax and the configuration, we then formalize the semantic rules for the language features with rewriting logic. Next, we present each component of the semantics one by one.

3.1 The Syntax of ST

TABLE 3 presents the syntax of ST defined in K-ST, which covers most of the core syntax. We remark that TABLE 3 only contains the main part of K-ST while omitting others, e.g., some built-in functions (LEN, DELETE and so on) for space reasons. The syntax is specified by a grammar in a dialect of Extended Backus-Naur Form (EBNF) [55], where ∗ means zero or more repetitions. In ST, the top-level grammatical structures include user-defined types (TYPE statements) and three Program Organization Units (POUs): FUNCTION, FUNCTION_BLOCK and PROGRAM. Other syntactical elements are derived within these top-level grammatical structures.

3.2 The Configuration of ST

The execution of an ST program needs to update the following kinds of state: data segment, code segment and stack. Among them, the data segment is used to store global variables, the code segment is used to store program execution code, and the stack is used to store local variables of the program. Note that runtime environment switching caused by function calls is also achieved by the operation of the stack. The overall runtime configuration of ST in K is presented in Fig. 5. We highlight our careful design choices as follows.

Overview. There are 11 main cells in the configuration T, i.e., k, control, allenv, genv, gvenv, store, type, constant, input, output and nextLoc. The value of each cell is initialized according to its specified type. For instance, cells holding a mapping relationship are initialized to Map type, and cells that store a collection are initialized to List type. A ‘.’ followed by any type means an empty set of this type. For instance, .Map in the cell genv represents that genv is initialized with an empty map.

Enumeration type. By default, when an enumeration type is defined in ST, PLC compilers will automatically associate a number (indexed from 0 and incremented by 1 each time) to each variable in the enumeration. For repeated declarations, we use the count cell to record the value of the current enumeration.

Global variables. There are two types of global variables. First, the POUs and customized types that users define. These variables can be accessed anywhere in the program. We store these variables in the genv cell as the basis for program operation. Second, the variables defined in VAR_GLOBAL. These variables cannot be directly accessed in the program unless they are declared with VAR_EXTERNAL. We store these variables in the gvenv cell and provide them on demand.

Program execution. The source code parsed by the syntax SourceUnit, called $PGM, is stored in the cell k for execution. Then the $PGM will be executed unit by unit. If the program terminates normally, there will be a ‘.’ in
6

TABLE 3: The syntax of ST

Syntax Description

Id ::= [a − zA − z ] [a − zA − Z0 − 9 ]∗
Ids ::= Id∗ Identifier
IdV al ::= Id := Expression
EnumStructDeclaration ::= T Y P E EnumDeclarationExp∗ EN D T Y P E
| T Y P E StructDeclarationExp∗ EN D T Y P E
EnumBlock ::= Ids | IdV al∗ Enum and Struct declaration
EnumDeclarationExp ::= Id : (EnumBlock) ; | Id : (EnumBlock) := Id;
StructDeclarationExp ::= Id : ST RU CT V arDeclarationExp∗ EN D ST RU CT
F unction ::= F U N CT ION Id : T ype V arDeclaration∗ Statements EN D F U N CT ION Function declaration
F unctionBlock ::= F U N CT ION BLOCK Id V arDeclaration∗ Statements EN D F U N CT ION Function block declaration
P rogram ::= P ROGRAM Id V arDeclaration∗ Statements EN D P ROGRAM Program declaration
T ype ::= IN T |DIN T |SIN T |LIN T |U IN T |U DIN T |U SIN T |U LIN T |BY T E|W ORD|DW ORD|REAL
|LREAL|ST RIN G|ST RIN G [Expression] |W ST RIN G|W ST RIN G [Expression] |T IM E|DAT E Variable types
|T IM E OF DAY |DAT E AN D T IM E|Id|ARRAY [Expression] OF T ype
V arT ype ::= V AR GLOBAL | V AR | V AR IN P U T | V AR OU T P U T | V AR IN OU T | V AR T EM P
V arDeclarationExp ::= Ids : T ype; | Ids : T ype := Expression; Variable declaration
V arDeclaration ::= V arT ype V arDeclaration EN D V AR
Operation ::= + | − | ∗ | / | ∗ ∗ | M OD | < | > | = | <= | >= | <> | AN D | &
| AN D T HEN | XOR | OR | OR ELSE | ..
Expression ::= Int | F loat | String | Bool | Bit | AllT ime | Id | Expression Operation Expression Expressions
Expression (Expressions) | Expression.Expression | Expression [Expressions] | (Expression)
Expressions ::= Expression∗
Assignment ::= Expression := Expression; Assignment statement
ElseIf Block ::= ELSE Statements | ELSE IF Expression T HEN Statements ElseIf Block∗
If ::= IF Expression T HEN Statements ElseIf Block∗ EN D IF ;
CaseBlock ::= Expression : Statements | Expression .. Expression : Statements Branch statements
Case ::= CASE Expression OF CaseBlock∗ EN D CASE;
| CASE Expression OF CaseBlock∗ ELSE Statements EN D CASE;
W hile ::= W HILE Expression DO Statements EN D W HILE;
F or ::= F OR Expression T O Expression DO Statements EN D F OR;
Loop statements
| F OR Expression T O Expression BY Expression DO Statements EN D F OR;
Repeat ::= REP EAT Statements U N T IL Expression EN D REP EAT ;
Return ::= RET U RN ; Return statement
Exit ::= EXIT ; Exit statement
Statement ::= Expression; | Assignment | If | Case | W hile | F or | Repeat | Return | Exit
Statements
Statements ::= Statement∗

the k cell, denoting that no more units need to be executed. and indexes in the current environment during program ex-
In the preprocessing phase (the first pass of K), the k cell ecution. Furthermore, cells temp and count are used in ENUM
only contains the token execute. Afterwards, K will start and STRUCT, where temp is for temporary mapping and
executing from the MAIN program. count is used as a counting pointer. The cell gvid records
all identifiers of global variables to assist in the generation
Stack operations. The cell control contains seven of global variables. The cell print records variables which
subcells—f stack , env , temp, count, gvid, print and break — need to be output. Finally, break stores the program after
which record the operating environment of the currently the loop in order to support the implementation of the EXIT
running code segment. Specifically, the function stack statement in FOR, WHILE and REPEAT loops.
f stack is a list used to store the environment before exe-
cuting other POUs, including variables in the current envi-
ronment and the subsequent program. Next, the cell env is Execution environment. The allenv cell is used to cache
used to store the mapping relationship between variables the execution environment before function calls (for strict
type checking of parameter passing in function calls¹). The cell genv records the result of the pre-processing (including POUs and custom types) and is copied to env whenever env is refreshed. The last cell related to the environment is called gvenv and is used to index global variables.

1. This is optional but recommended for ST compilers.

Fig. 5: The runtime configuration of ST in K

Memory operation. The store cell simulates memory by recording the mapping between indexes and variable values. The cells input and output realize external inputs and external outputs, respectively. The last cell, nextLoc, ensures that the index of a variable can always be incremented without duplication. The design consideration behind this is that, for complex languages, it is more effective to explicitly manage arbitrarily large memory than to use garbage collection [56].

3.3 Semantics of the Core Features

We implement the executable semantics covering most core features of ST and leave the vendor-specific functionalities as potential extensions. For example, some compilers use additional keywords to distinguish the declaration part from the execution part of the program. In the following, we provide an overview of four core semantic features of ST: 1) data types, 2) main control statements, 3) declarations and calls of POUs and 4) memory operations. Before diving into the details, we present the notations as follows.

3.3.1 Extended Data Types

The K framework supports diverse data types including identifiers (Id), integers (Int), bools (Bool), floats (Float) and strings (String), which cover most of the requirements. However, there are still some unsupported data types needing additional implementation in K-ST, which we call extended data types. These extended data types can be categorized into two kinds: 1) elementary types (TIME, BYTE, WORD, DWORD, TIME_OF_DAY, DATE and DATE_AND_TIME) and 2) compound types (ENUM and STRUCT). We implement these extended data types by composing built-in types and methods in K as follows.

We take TIME_OF_DAY as an example to introduce elementary types. There are two notations for TIME_OF_DAY literals in ST, e.g., TIME_OF_DAY#23:45:56.30 and TOD#23:45:56.30. Fig. 6 shows our implementation of the TIME_OF_DAY type together with its relevant operations. Lines 1 and 2 respectively define the syntax of TIME_OF_DAY and how to parse it (Get TIME_OF_DAY). Line 3 is used to convert Get TIME_OF_DAY to TIME_OF_DAY, which is achieved by two steps, Gtd2Td and Standardization, where Gtd2Td realizes the conversion of the format and Standardization realizes content conversion, e.g., replacing 60 minutes with 1 hour. Lines 4–11 define some arithmetic and relational operations of TIME_OF_DAY.

For compound types, we take STRUCT as an example and show its semantics in Fig. 7, including both STRUCT declaration and instantiation. Declarations are shown in rule Struct Declaration, where we allocate memory for each defined data structure. The instantiation of STRUCT consists of four main steps in rule Struct Instantiation: 1) CreatStruct allocates memory for I1, 2) StructInits generates each variable in turn according to Vds in STRUCT, 3) Set assigns values to the corresponding variables according to Idvs, and finally, 4) Update stores the mapping relationship of variables related to I1 into the memory of I1 to facilitate subsequent use.
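The parsing and Standardization steps described above for TIME_OF_DAY can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the K rules of Fig. 6: the function names `parse_tod` and `standardize_tod` are hypothetical, and wrapping the hour at 24 is an assumption that the text does not confirm.

```python
from decimal import Decimal


def standardize_tod(hours: int, minutes: int, seconds: Decimal):
    """Carry overflowing seconds into minutes and minutes into hours,
    e.g. 60 minutes become 1 hour (the content conversion named in the text)."""
    carry_min, seconds = divmod(seconds, Decimal(60))
    minutes += int(carry_min)
    carry_hr, minutes = divmod(minutes, 60)
    hours += carry_hr
    # Assumption: hours wrap around at 24; K-ST may handle this differently.
    return hours % 24, minutes, seconds


def parse_tod(literal: str):
    """Accept both notations, 'TIME_OF_DAY#hh:mm:ss.ff' and 'TOD#hh:mm:ss.ff'."""
    prefix, _, body = literal.partition("#")
    if prefix.upper() not in ("TOD", "TIME_OF_DAY"):
        raise ValueError(f"not a TIME_OF_DAY literal: {literal!r}")
    h, m, s = body.split(":")
    return standardize_tod(int(h), int(m), Decimal(s))
```

Using `Decimal` for the seconds field avoids binary floating-point rounding on literals such as 56.30; this is a design choice of the sketch rather than something taken from K-ST.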
Fig. 6: Implementation of TIME_OF_DAY in K

Fig. 7: The partial semantics of STRUCT in K

3.3.2 Main Control Statements

Control statements are important in ST for achieving complex program logic (as in most other programming languages). We show the rules for CASE, REPEAT and EXIT in Fig. 8 (as the semantics of IF, WHILE and FOR are typical). A CASE statement can be rewritten as a combination of an IF and a CASE through rule Case. The rule Repeat is implemented as follows. We first store the subsequent statements outside the loop (recorded as K) in cell break to deal with an EXIT statement that may appear, and then rewrite the loop into the form of WHILE for further execution. During the execution of the loop body, once EXIT is executed, all the statements in the current cell k are discarded and rewritten to K (the stored subsequent statements), as shown in rule Exit.

Fig. 8: The partial semantics of ST control statements

3.3.3 The Declaration and Call of POUs

In ST programs, statements are inside Program Organization Units (POUs), i.e., FUNCTION, FUNCTION_BLOCK or PROGRAM. A FUNCTION is a stateless POU type, in contrast to a FUNCTION_BLOCK, which stores its own state after execution. The design of the FUNCTION_BLOCK is similar to the concept of class-object manifestation in object-oriented programming (OOP), which aims to achieve better modularization. FUNCTION_BLOCKs exist in two forms, as a type or as an instance, and only the instance can be called. For a FUNCTION_BLOCK instance, the local variables retain their values between each ‘call’. PROGRAMs are defined by the IEC 61131-3 standard as a “logical assembly of all the programming language elements and constructs necessary for the intended signal processing required for the control of a machine or process by a PLC-system” [46]. Due to space constraints, we show the declaration, call and return operation of FUNCTION_BLOCKs in Fig. 9 as an example for illustration (FUNCTION and PROGRAM are shown in Fig. 10 and explained only when necessary).

Declaration. The declaration of FUNCTION_BLOCK is similar to that of STRUCT. As shown in rule Function Block Declaration, we first assign an index in memory for FUNCTION_BLOCK X, set the type to the built-in FunctionBlock, and convert the entire declaration statement to the built-in type funblambda(X, void, Vds, S) for storage, where void means no return value, and Vds and S are the variable declarations and operations in X, respectively. The purpose of setting const to true is to prevent it from being modified. Note that for FUNCTION and PROGRAM, type and store are set to Function and funblambda(X, T, Vds, S), and to Program and plambda(X, void, Vds, S, .Map), respectively.

Instantiation. The instantiation of FUNCTION_BLOCK is achieved through variable declarations, as shown in rule Function Block Instantiation. However, the value is set to runfunblambda(X, void, Vds, S, .Map) to distinguish it from funblambda, and .Map is designed to store the FUNCTION_BLOCK environment for the next call and for external queries. This is because a FUNCTION_BLOCK can only be called after instantiation, i.e., runfunblambda can be executed but funblambda cannot. Since FUNCTION and PROGRAM have no such restrictions, funlambda and plambda can be directly called and executed.

Call. There are two cases when a FUNCTION_BLOCK
is called. The first case is that the FUNCTION_BLOCK is called for the first time, as shown in rule Function Block Call First. Since there is no initial environment (the last value of runfunblambda is .Map), we first store the current execution environment info in fstack, including the subsequent statements K, the Allenv of the current environment, and the parameters C in cell control. Then, we reset parameters C through renew. After that, K executes the variable declarations Vds (including index application, initialization and assignment) and the statements S in the function block. In addition, Update is used to update the .Map in runfunblambda to record the current environment. Finally, RETURN returns to the calling program and configures the corresponding environment. In other cases (not called for the first time), as shown in rule Function Block Call Others, there is already a mapping between the related variables and values in cell store, and the mapping between identifiers and indexes is also stored in the runfunblambda. Therefore, no new memory is allocated during execution and the existing environment is used. Note that the values of the variables in the FUNCTION_BLOCK are not re-initialized, which means that the execution result for the same input may be different.

Regardless of whether RETURN appears in the FUNCTION_BLOCK, we add a RETURN by default for each FUNCTION_BLOCK as a sign that the FUNCTION_BLOCK has finished running and returned to the calling POUs. Since FUNCTION_BLOCKs and PROGRAMs do not have a return value, we set null as the return value. Note that a FUNCTION has a return value, and the returned value is the value corresponding to the function identifier, so we need to use Clearenv to clean up the memory environment corresponding to the function identifier after calling procedure renew, and add the declaration of the function identifier variable in Vds.

Fig. 9: The partial semantics of FUNCTION_BLOCK

3.3.4 Memory Operations

Here, we present the rules for memory operations on elementary types in ST, such as built-in types and extended elementary types. What elementary types have in common is that they take only one memory slot. For complex types, such as enums, structs and arrays, which are compositions of elementary types, a memory operation can be regarded as a set of memory operations on elementary types. For instance, an assignment to a struct is equivalent to assigning a value to each variable of the struct.

Similar to STdemo, the main memory operations in ST are Allocation, Lookup and Assignment, plus the additional Clearenv. Allocation implements the allocation of memory for variables in the store, Lookup finds variable values in the store cell, Assignment implements the assignment of variables, and Clearenv implements the recovery of memory in the store. However, because the complete ST semantics has a more complex type design, these operations involve more cells in the configuration and are more complicated, as shown in Fig. 11.

Note that HOLE is just a variable, but it has a special meaning in the context of sentences with the ‘heat’ or ‘cool’ attribute. In short, ‘heat’ looks up the content corresponding to the HOLE in the formula, and ‘cool’ puts the result back into the formula. For example, in the expression a + b where a is represented as HOLE, ‘heat’ takes a out of the formula and finds its corresponding value. If it is 3, ‘cool’ puts 3 back into the original formula, and the formula becomes 3 + b.

Let us start with the Assignment operation (we omit Lookup as it is straightforward). The Assignment of ST
divides the Assignment of STdemo into two steps, where context and rule Find Index are used to determine the index L of the assigned variable X in store, and rule Assignment implements the update of the store at index L. The purpose of this division is to make the Assignment operation better applicable to complex types, because in some cases the index of the assigned variable cannot be directly obtained and multiple queries are required. For instance, when assigning a value to A[3, 5, 7], where A is a multi-dimensional array, we need to look up each dimension one by one to finally determine the index. In addition, we refer to the state of X in type and constant during the assignment process. On the one hand, we use Limit to ensure that the assigned value meets the type requirements, and on the other hand, we use constant to ensure that a constant cannot be modified. Although the memory cleaning operation is not necessary for ST in K, a simple Clearenv operation can effectively reduce repetitive code and improve code readability. For rule Clearenv, what needs attention is the operation on cell env: it replaces the index L of variable X with undef, which means null in the map supported by K.

Fig. 10: The partial semantics of FUNCTION and PROGRAM

Fig. 11: The partial semantics of memory operations

ST has relatively complex and strict type definitions; therefore, the rule Allocation of ST involves more cells and operations, such as type and constant for storing variable types and whether they are constants, where Undefined is used to generate the default value of the specified type. In addition, according to the content in TABLE 1, not only VAR is used in the variable declaration process, but also other keywords, such as VAR_INPUT, VAR_IN_OUT, etc. In order to reduce the complexity of the code, we also implement these declarations through VAR declarations. For instance, Fig. 12 shows the implementation of VAR_GLOBAL and CONSTANT. We realize regional changes (from the env cell to the gvenv cell) through letogv, and SetConstant realizes the modification of the value in the const cell.

We remark that K-ST covers 259 core features with 876 rules in total, using 2315 lines of K code. The complete code can be accessed through https://github.com/wkyml/K-ST.
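The memory operations discussed in this subsection (Allocation, Lookup, the two-step Assignment and Clearenv) can be mirrored in a toy Python model. This is a minimal sketch under stated assumptions, not the K rules of Fig. 11: the class `Memory`, its method names and the two tables below are hypothetical, only three types are modelled, and rejecting an out-of-range value with an exception is an assumed reading of what Limit does.

```python
RANGES = {"SINT": (-128, 127), "INT": (-32768, 32767)}  # IEC 61131-3 integer widths
DEFAULTS = {"SINT": 0, "INT": 0, "BOOL": False}          # stands in for Undefined


class Memory:
    """Toy counterpart of the env, store, type and const cells plus nextLoc."""

    def __init__(self):
        self.env = {}      # identifier -> index
        self.store = {}    # index -> value
        self.type = {}     # index -> declared type
        self.const = {}    # index -> True if the variable is a constant
        self.next_loc = 0  # next fresh index, never reused

    def _limit(self, typ, value):
        # Assumption: a value that violates the declared type is rejected.
        if typ == "BOOL":
            if not isinstance(value, bool):
                raise ValueError(f"{value!r} is not a BOOL")
            return value
        lo, hi = RANGES[typ]
        if isinstance(value, bool) or not isinstance(value, int) or not lo <= value <= hi:
            raise ValueError(f"{value!r} does not fit {typ}")
        return value

    def allocate(self, name, typ, value=None, constant=False):
        loc, self.next_loc = self.next_loc, self.next_loc + 1
        self.env[name], self.type[loc], self.const[loc] = loc, typ, constant
        self.store[loc] = DEFAULTS[typ] if value is None else self._limit(typ, value)
        return loc

    def lookup(self, name):
        return self.store[self.env[name]]

    def assign(self, name, value):
        loc = self.env[name]                 # step 1: find the index L of X
        if self.const[loc]:
            raise PermissionError(f"{name} is a constant")
        self.store[loc] = self._limit(self.type[loc], value)  # step 2: update store at L

    def clearenv(self, name):
        self.env[name] = None                # None plays the role of undef
```

For example, after `m = Memory()` and `m.allocate("a", "INT")`, the call `m.assign("a", 42)` succeeds, whereas `m.assign("a", 40000)` is rejected by the range check.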
Fig. 12: The partial semantics of variable declarations

4 TESTING AND ANALYSING ST COMPILERS

In addition to providing formal references for defined languages, our formal semantics also has several applications that use language-independent tools provided by K, such as state space exploration, model checking, symbolic execution and deductive program verification. We omit demonstration of these applications in this paper since they have been well-illustrated in related works [51], [52]. In this work, we introduce the testing of ST implementations/compilers based on our executable semantics, K-ST.

As discussed earlier, because ST compilers are typically provided by vendors, the execution behavior of compilers may be different, and may even be inconsistent with respect to the high-level semantics [37]. One of the main applications of the proposed semantics is to define the ‘reference’ execution behavior of ST, which can help programmers detect bugs in existing ST compilers.

To explore this application (and given the closed nature of commercial compilers), we choose OpenPLC² as our test object, which is open source and supports ST programming. The overall workflow of our testing approach is depicted in Fig. 13. It includes three parts: program variation, program execution and result comparison. First, seed programs are mutated to improve the diversity of test samples. Next, we use the mutated program as input to run OpenPLC and our executable semantics respectively. Finally, the result comparison part compares the consistency of the two execution results. It should be noted that we use a policy similar to [33], that is, the program does not need input, and the result consistency comparison covers the values of all variables in the program. The comparison of results is performed to analyze potential inconsistencies between K-ST and OpenPLC. By comparing the final execution state of the program together with its variable state, we can identify potential inconsistencies. The execution state focuses on determining whether the program has completed its execution or terminates at the same statement. On the other hand, the variable state captures the values of all variables in the program, including input, output, and intermediate variables, after the program has finished running. TABLE 4 shows our measure of consistency, where Q and Q′ represent the values of each variable after the program executes, I and I′ represent the commands corresponding to the exception termination, and ✓ and ✗ represent consistency and inconsistency respectively. As a result, unless K-ST and OpenPLC exhibit identical execution and memory states, their behavior will be deemed inconsistent.

TABLE 4: Measure of K-ST/OpenPLC consistency

| Result of OpenPLC \ Result of K-ST | Successful execution (Q) | Unusual termination (I) |
|---|---|---|
| Successful execution (Q′) | Q = Q′: ✓; Q ≠ Q′: ✗ | ✗ |
| Unusual termination (I′) | ✗ | I = I′ & Q = Q′: ✓; others: ✗ |

In order to better mutate seed programs to improve the diversity of test samples, we propose specific mutation operations in TABLE 5 to generate mutated test samples. These mutation operations can enrich the test samples while minimizing program errors. Our method for generating mutant ST programs is shown in Algorithm 1. Given an ST program Si, the algorithm makes a copy, randomly assigns initial values to all variables at the time of declaration, and applies some applicable mutation operators to randomly selected lines in the program. The test is done by comparing results of these samples in K-ST and OpenPLC. It should be noted that correct and erroneous programs in the test sample are both meaningful for checking the consistency of execution behavior. This is because K-ST and OpenPLC report program errors at the same time, allowing us to verify a stronger notion of consistency. In addition, considering the lag of OpenPLC updates, we also tested it on the latest Beremiz³, which uses the same underlying implementation (MATIEC⁴) as OpenPLC. The specific results of the test are shown in Section 5.

TABLE 5: Mutation operations

| Mutation Operation | Example |
|---|---|
| Variable Random Assignment | a : INT; → a : INT := 3527; |
| Scalar Variable Replacement | a := b; → a := c \| 30; |
| Arithmetic Operator Replacement | a + b → a − b |
| Arithmetic Operator Insertion | a + b → a + b − c |
| Arithmetic Operator Deletion | a + b − c → a + b |
| Relational Operator Replacement | a > b → a <= b |
| Logical Connector Replacement | a AND b → a OR b |
| Logical Connector Insertion | a AND b → a AND b OR c |
| Logical Connector Deletion | a AND b OR c → a AND b |
| “NOT” Mutation | NOT a → a \| a → NOT a |
| Statement Insertion | IF ··· END_IF; |
| Statement Deletion | EXIT; |

2. https://www.openplcproject.com/
3. https://beremiz.org/
4. https://github.com/thiagoralves/OpenPLC_Editor/tree/master/matiec

5 EVALUATION

In order to evaluate the semantics of ST we defined in K, we deployed K-ST on K version 5.1.11 (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz). In the following, we design multiple experiments to systematically answer the following research questions (RQs).
• RQ1: How much of the ST language does K-ST cover? Completeness of the semantics is an important indicator to measure executable formal semantics. The lack of key semantics will seriously affect the usefulness of formal semantics.
• RQ2: Is K-ST correct? Semantic correctness is the basis for ensuring the usability of executable formal semantics, so we need to analyze the correctness of the formal semantics implemented.
• RQ3: Can K-ST be used to discover bugs in a compiler? This is important since a key application of executable formal semantics is to identify compiler bugs.

Fig. 13: Overview of the test process

5.1 Test Sets

For the purpose of evaluating the coverage and the correctness of K-ST, the test data set that we used comes from GitHub. We found 4853 programs on GitHub by searching for keywords of the ST language. Then, we automatically screened out samples containing other programming languages (2516) and XML forms (1542). After that, we manually spliced the remaining programs and removed samples that lack the components required for operation (such as POUs). After screening, 567 complete programs written in pure ST formed our test set. In other words, these 567 samples contain all the components required for operation and do not use other languages, such as C and Python.

With the aim of comprehensively testing the correctness of the execution behavior of OpenPLC, we use two sample sets: test samples collected from GitHub (GitHub set) and test samples obtained through mutation (Mutated set). The GitHub set is the sample set with 567 test samples mentioned before. The Mutated set is generated by Algorithm 1. We selected 30 high-quality samples from the GitHub set as initial mutant seeds. These 30 samples contain all the key features of ST. Then, three rounds of iterative mutation are carried out through Algorithm 1. Each round of iteration produces 10 mutation samples per seed. Except for the initial seeds used in the first round, the seeds of each round of mutation are the result of the previous round of mutation. We get a set containing 33,330 mutation samples (a figure consistent with the 30 seeds plus 300, 3,000 and 30,000 mutants from the three rounds).

5.2 Experiment Results and Analyses

5.2.1 Semantic Completeness (RQ1)

We executed K-ST on 567 test samples collected from GitHub. Among these 567 test samples, K-ST supports the execution of 509 of them. For these 509 tests which K-ST can support, Fig. 14 lists the number of tests for some important features (based on TABLE 1) used in the evaluation. Specifically, there are six kinds of features, namely FUNCTION, FUNCTION_BLOCK, PROGRAM, Declaration types, Data types
and Statements. For Declaration types, we list the number of tests for CONSTANT, VAR_GLOBAL, VAR, VAR_INPUT, VAR_OUTPUT, VAR_IN_OUT, VAR_TEMP and VAR_EXTERNAL. For Data types, we list the number of tests for elementary types signed integer (INT, DINT, SINT, LINT), unsigned integer (UINT, UDINT, USINT, ULINT), float (REAL, LREAL), Boolean (BOOL), byte (BYTE, WORD, DWORD), string (STRING, WSTRING), and time (TIME, DATE, TIME_OF_DAY, DATE_AND_TIME); compound types enum (ENUM) and struct (STRUCT); and finally, the array type ARRAY. For Statements, we list the number of tests for main control statements: IF, CASE, FOR, WHILE, REPEAT, EXIT and RETURN.

Fig. 14: Number of tests for each feature in ST

As indicated in Fig. 14, compared with FUNCTION, the FUNCTION_BLOCK is more favored by ST programmers (PROGRAM is necessary for ST program operation). For Declaration types, the most used is VAR (with a ratio of 470/509), followed by VAR_INPUT (386/509), VAR_OUTPUT (360/509) and VAR_IN_OUT (313/509). Among all the Data types, BOOL is the most used, followed by unsigned integer and ARRAY, constituting 322/509 and 311/509 respectively. In addition, we must remark that we do not count the type of array members. Finally, IF is the most common statement in all the tests considered. This is also in line with the main working scenarios of PLCs.

We remark that we do not consider the vendor-based functions in this experiment as these functions vary not only from vendor to vendor, but even from product to product. In particular, Mitsubishi PLCs provide completely different data types, including Bit, Word[Signed/Unsigned], Double Word[Signed/Unsigned], Bit STRING[16-bit/32-bit], FLOAT, STRING[32] and Time. Siemens PLCs support the keyword BEGIN to represent the end of variable declaration and the beginning of operation instructions. In addition, there are also obvious differences between different products of the same vendor. For example, the S7-1500 and the S7-1200 from Siemens support different type conversion methods⁵, where the former only provides explicit conversions of types, and the latter provides both explicit and implicit conversions.

5. https://support.industry.siemens.com/dl/dl-media/272/109742272/att_918238/v6/93516999691/zh-CHS/index.html#ae443583b99950f7cca0d7237fe81ad4

5.2.2 Semantics Correctness (RQ2)

In order to evaluate the correctness of K-ST, we compared the execution results of K-ST against those of the vendor compilers CODESYS, CX-Programmer and GX Works2. We consider the proposed semantics correct if the execution behaviors of K-ST are consistent with those of the CODESYS, CX-Programmer and GX Works2 compilers. The consistency criteria described in Section 4 are utilized to evaluate the consistency of behavior between K-ST and the compilers provided by vendors. Specifically, if K-ST and these compilers demonstrate identical execution and variable states for the same program, their behavior is deemed consistent. We list the coverage of the K-ST semantics in TABLE 6 from the perspective of each feature specified by the official ST documentation, where FC, C and N mean “Fully Covered and Consistent with Compilers”, “Covered and Consistent with Compilers” and “Not Covered”, respectively.

From TABLE 6, we can see clearly that for POUs, we fully cover the declaration and call. In variable declarations,
AT is related to input and output. We remark, however, that the storage mode of variables in K is very different from that in real PLCs, so we just support simple computer-side input and output. In addition, RETAIN and PERSISTENT are related to the actual situation in the PLC, so they are not implemented. For instance, AT is used to bind the actual point of the PLC; RETAIN and PERSISTENT support the preservation of variable values after a power failure or power loss. Array is the only data type that is covered but not fully covered. Limited by the realization of arrays, it is temporarily impossible to build arrays of enums and structs, or to assign values to multi-dimensional arrays as a whole. In statements, => has been used in K and can be replaced by :=. For built-in functions, we list those we support, including 30 numerical functions, 9 logical functions, 9 string functions and 160 translate functions.

TABLE 6: Coverage of the proposed ST semantics

| Group | Sub-group | Feature | Coverage |
|---|---|---|---|
| POUs (core) | POUs declaration | FUNCTION, FUNCTION_BLOCK, PROGRAM | FC |
| POUs (core) | POUs calls | FUNCTION, FUNCTION_BLOCK, PROGRAM | FC |
| Variable Declaration (core) | | CONSTANT, VAR_GLOBAL, VAR, VAR_INPUT, VAR_OUTPUT, VAR_IN_OUT, VAR_EXTERNAL, VAR_TEMP | FC |
| Variable Declaration (core) | | AT | C |
| Variable Declaration (core) | | RETAIN, PERSISTENT | N |
| Variable Declaration (core) | Typed constant | Type # Data | FC |
| Data types (core) | | SINT, INT, DINT, LINT, USINT, UINT, UDINT, ULINT, REAL, LREAL, BOOL, BYTE, WORD, DWORD, STRING, WSTRING, TIME, DATE, TIME_OF_DAY, DATE_AND_TIME | FC |
| Data types (core) | Enum | Enum declaration, Enum instantiation | FC |
| Data types (core) | Struct | Struct declaration, Struct instantiation | FC |
| Data types (core) | Function block | Function block instantiation | FC |
| Data types (core) | Array | One-dimensional array, Multi-dimensional array | C |
| Statements (core) | Assignment statement | := | FC |
| Statements (core) | Assignment statement | => | N |
| Statements (core) | Branch statement | IF, CASE | FC |
| Statements (core) | Loop statement | WHILE, FOR, REPEAT | FC |
| Statements (core) | Break statement | RETURN, EXIT | FC |
| Built-in function | Numerical function (30) | ADD, SUB, MUL, SQR, INC, DEC, MAX, MIN, MUX, ABS, SQRT, TRUNC, FRAC, FLOOR, LN, LOG, EXP, SIN, COS, TAN, COS, TAN, ASIN, ACOS, ATAN, NEG, EXPT, DIV, MOD, LIMIT | FC |
| Built-in function | Logical function (9) | GT, LT, GE, LE, EQ, NE, AND, OR, SEL | FC |
| Built-in function | String function (9) | CONCAT, INSERT, DELETE, REPLACE, FIND, LEN, LEFT, RIGHT, MID | FC |
| Built-in function | Translate function (160) | | FC |

FC: Fully Covered and Consistent with Compilers (256/262). C: Covered and Consistent with Compilers (3/262). N: Not Covered (3/262).

In the process of comparing with CODESYS, CX-Programmer and GX Works2, the following points need to be explained. Firstly, due to the closed nature of these compilers, they cannot simply be invoked programmatically, so we have to manually enter the code into each compiler in the specified way, compile and run it, and compare the results, which is laborious and tedious work. This also hinders us from testing these commercial compilers on a large scale. Secondly, different vendors have obvious differences in the implementation of compilers, so the source code needs to be adapted to a certain extent. For example, only 10 basic data types (Bit, Word[Signed/Unsigned], Double Word[Signed/Unsigned], Bit STRING[16-bit/32-bit], FLOAT, STRING[32] and Time) are provided in the GX Works2 compiler, so we need to adapt the variable types of the source program.
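The consistency criteria from Section 4 (TABLE 4) that underlie this comparison can be written down as a small Python predicate. This is a sketch for illustration only: the `Outcome` record and the function `consistent` are hypothetical names, and representing the terminating command I as a string is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Outcome:
    variables: Dict[str, Any]        # Q: final value of every variable
    failed_at: Optional[str] = None  # I: command at which execution terminated abnormally


def consistent(kst: Outcome, other: Outcome) -> bool:
    """Return True only for the two cells of TABLE 4 that are marked consistent."""
    if kst.failed_at is None and other.failed_at is None:
        # Both executions succeed: all variable values must agree (Q = Q').
        return kst.variables == other.variables
    if kst.failed_at is not None and other.failed_at is not None:
        # Both terminate abnormally: same command and same variable values.
        return kst.failed_at == other.failed_at and kst.variables == other.variables
    # One side succeeds while the other terminates abnormally.
    return False
```

A pair of runs that both finish but disagree on a single variable, for instance `Outcome({"r": -1})` against `Outcome({"r": 2})`, is classified as inconsistent.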
TABLE 7: The results of K-ST and OpenPLC

| Data Set | | GitHub Set | Mutated Set |
|---|---|---|---|
| Number of samples | | 567 | 31059 (2271) |
| Number of programs run completely | K-ST | 509 | 15850 |
| Number of programs run completely | OpenPLC | 490 | 11581 |
| Inconsistent | KpOf | 30 | 5664 |
| Inconsistent | KfOp | 11 | 1395 |
| Inconsistent | Diff. Result | 0 | 735 |

5.2.3 Finding Bugs in OpenPLC (RQ3)

We execute OpenPLC and K-ST with the GitHub set and Mutated set as input. The execution results of the two data sets are shown in TABLE 7. Here, KpOf is the number of programs that K-ST can execute normally but OpenPLC cannot compile and run; KfOp is the number of programs that K-ST cannot run normally but OpenPLC can.

For the GitHub set, K-ST supports 509 of the programs, and OpenPLC supports 490. Through analysis, we found that the reason for this difference is that OpenPLC has some functional deficiencies. For example, OpenPLC does not support the initialization of variables using formulas at the time of declaration, and numerical calculations on the BYTE, WORD and DWORD types are not supported.

For the Mutated set, there is a big difference between the execution results of K-ST and OpenPLC. First of all, we filter out the 2,271 programs that timed out in both OpenPLC and K-ST under a 10-second time limit. After that, we manually analyzed the samples with inconsistent results to determine the causes. For the large KpOf value, functional deficiencies remain the main reason.

We found an interesting bug in OpenPLC. The bug is a “VAR” parsing exception: if the first operation instruction starts with “VAR”, such as “VAR0 := 1;”, OpenPLC terminates abnormally. Another interesting phenomenon arises when an erroneous statement appears in an unexecuted part of the program, such as after the “RETURN;”: K-ST can execute such a program, but OpenPLC cannot. The main reason for this phenomenon is that K adopts an operation-based detection mechanism. Because the erroneous code will not be executed, it will not lead to the termination of our executable semantics. The case study is shown in APPENDIX A.

After that, by analyzing those programs that have different results on K-ST and OpenPLC, we find that the different results are mainly due to the differences in underlying implementations between K and OpenPLC. For example, for the integer modulo operation −7 MOD 3, the execution result of K-ST is −1, whereas the result for OpenPLC is 2. From a mathematical point of view, both results are correct, but they will have a completely different impact on any following operations. When we ran the program again in CODESYS, the results of CODESYS were the same as those of K-ST.

For those samples that K-ST cannot run normally but OpenPLC can execute normally, our analysis found some bugs in OpenPLC. For example, while OpenPLC can check explicit divide-by-zero operations, it allows the execution of implicit divide-by-zero operations. TABLE 8 details all functional deficiencies and bugs we found in OpenPLC. We show some relevant case studies in APPENDIX B. Considering that Beremiz can be regarded as an updated version of OpenPLC, we have retested the inconsistencies we found in Beremiz. We found that the latest Beremiz fixes some problems, including negative MOD operation results and “VAR” parsing exceptions, but other bugs and shortcomings still exist. We have submitted these problems to the OpenPLC and Beremiz developers and are waiting for their confirmation⁶.

6. https://bitbucket.org/automforge/matiec_git/issues?status=new&status=open

6 RELATED WORK

In this section, we discuss some other PLC program analysis techniques, summarize their characteristics, and distinguish them from our work.

Keliris et al. [19] propose a framework (ICSREF) which can automate the reverse engineering process for PLC binaries. They instantiate ICSREF modules for reversing binaries compiled with CODESYS and getting the complete Control Flow Graph (CFG), and they provide an end-to-end case study of dynamic payload generation and attack deployment. Tychalas et al. [7] analyze the binary files generated by all control system programming languages in CODESYS to understand the differences and even the vulnerabilities introduced during the program compilation process. Based on this analysis, they provide a fuzzing framework (ICSFuzz) to perform security evaluation of the PLC binaries. Our work differs from them because we focus on the source code and do not rely on any specific compilation environment.

Kuzmin et al. [57] propose to use linear-time temporal logic (LTL) to guide program behavior and check whether ST programs satisfy the corresponding temporal logic through Cadence SMV. Darvas et al. [15] propose rule-based reductions and a Cone of Influence (COI) reduction variant for state explosion problems that may be encountered in the formal analysis of ST code, and use the NuSMV model checker to verify temporal logic. After that, they [58] provide a state machine and data-flow-based formal specification method for PLC modules. In addition, they [43] analyze the feasibility of converting between the 5 PLC programming languages provided by Siemens, and point out that the extended SCL (a vendor-defined ST) can be used as the target language for conversion. Adiego et al. [59] propose an intermediate model-based method which can transform PLC programs into the modeling languages of different verification tools to facilitate checking temporal logic. Hailesellasie et al. [60] propose UBIS, which converts ST programs with potential intrusions as well as trusted versions of programs into attributed graphs through UPPAAL, and compares their nodes and edges to detect stealthy code injections. Bohlender et al. [61] apply formal verification and falsification of temporal logic specifications to analyze chemical plant automation systems. Rawlings et al. [62] use the symbolic model checking tools st2smv and SynthSMV to verify and falsify an ST program controlling batch reactor systems. Xiong et al. [23] use the behavior model (BM) to specify the behavior of ST programs, and
16

TABLE 8: The bugs and functional deficiencies of OpenPLC

| Type | Problem | Description |
|---|---|---|
| Bug | “VAR” parsing exception | The first operation instruction starts with “VAR”, and OpenPLC terminates abnormally. |
| Bug | Division by zero | OpenPLC can check explicit division 0 but allow the execution of implicit division 0. |
| Bug | Overflow access | OpenPLC can check explicit overflow access but allow the execution of implicit overflow access. |
| Bug | MOD by zero | OpenPLC provides MOD 0 operation, and the result is 0. |
| Bug | MOD Exception | The divisor of MOD operation can be empty. |
| Functional deficiencies | Numerical calculation defects | OpenPLC does not support normal numerical calculation ∗∗. |
| Functional deficiencies | Numerical calculation defects | Numerical calculations of BYTE, WORD, DWORD types are not supported. |
| Functional deficiencies | Array functions defects | Parentheses are not allowed in array assignments. |
| Functional deficiencies | FUNCTION BLOCK instantiation defects | Multiple instantiation of function blocks in one statement is not allowed. |
| Functional deficiencies | ENUM defects | OpenPLC does not support normal assignment of ENUM type. |
| Functional deficiencies | Variable declaration defects | Some non-keyword strings cannot be used as variable names, such as “ramp”, “LocalVar0 ”, etc. |
| Functional deficiencies | Variable declaration defects | OpenPLC can not support formula and other variables previously declared as initial value. |
| Functional deficiencies | Structural defects | Without operation or variable declarations, OpenPLC cannot compile ST program. |
| Functional deficiencies | Structural defects | Without statements in FOR, WHILE, IF, CASE, REPEAT, OpenPLC cannot compile ST. |
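
The MOD discrepancy reported in the evaluation (−7 MOD 3 giving −1 in K-ST and CODESYS but 2 in OpenPLC) matches the two common remainder conventions for signed integers. The following Python sketch illustrates both. It is only an illustration: the function names are hypothetical, and which convention each tool implements internally is inferred solely from the outputs reported in the text.

```python
def mod_truncated(a: int, b: int) -> int:
    """Remainder under truncated division: the result takes the sign of the dividend.

    Assumption: this is the convention that yields -1 for -7 MOD 3.
    """
    if b == 0:
        raise ZeroDivisionError("MOD by zero")
    q = abs(a) // abs(b)          # magnitude of the quotient
    if (a < 0) != (b < 0):        # quotient is negative when the signs differ
        q = -q
    return a - b * q


def mod_floored(a: int, b: int) -> int:
    """Remainder under floored division: the result takes the sign of the divisor.

    Assumption: this is the convention that yields 2 for -7 MOD 3.
    """
    if b == 0:
        raise ZeroDivisionError("MOD by zero")
    return a - b * (a // b)       # Python's // already rounds toward negative infinity


if __name__ == "__main__":
    print(mod_truncated(-7, 3), mod_floored(-7, 3))
```

Both functions satisfy a = b·q + r with |r| < |b|, which is why both answers can be called mathematically correct; they differ only in how the quotient q is rounded. Unlike the "MOD by zero" row of TABLE 8, both sketches reject a zero divisor instead of returning 0.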

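The "Division by zero" row of TABLE 8 distinguishes explicit from implicit division by zero. One plausible reading, assumed here and not confirmed by the text, is that an explicit case has the literal 0 as divisor, while an implicit case has a divisor that only evaluates to 0 at run time. The sketch below uses a hypothetical toy expression type (`Lit`, `Var`, `Div`) to show why a purely syntactic check misses the implicit case; it does not model OpenPLC's or K-ST's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Div:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Var, Div]


def has_explicit_zero_divisor(e: Expr) -> bool:
    """Syntactic check: flags only divisors written as the literal 0."""
    if isinstance(e, Div):
        if isinstance(e.right, Lit) and e.right.value == 0:
            return True
        return has_explicit_zero_divisor(e.left) or has_explicit_zero_divisor(e.right)
    return False


def evaluate(e: Expr, env: Dict[str, int]) -> int:
    """Evaluation-time check: rejects any divisor whose value is 0."""
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    divisor = evaluate(e.right, env)
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    # Floor division is used for simplicity; the rounding mode is not the point here.
    return evaluate(e.left, env) // divisor


if __name__ == "__main__":
    implicit = Div(Lit(10), Var("y"))           # divisor is a variable, not a literal
    print(has_explicit_zero_divisor(implicit))  # the syntactic check does not flag it
    try:
        evaluate(implicit, {"y": 0})
    except ZeroDivisionError as exc:
        print("caught at evaluation time:", exc)
```

Under this reading, a tool that only runs the syntactic check accepts the implicit case, whereas a semantics that checks the divisor's value at evaluation time rejects it.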
provide a method based on automata theory to verify LTL properties on the BM. Our work differs from the aforementioned works because they attempt to transform PLC programs into intermediate languages or other programming languages that are suitable for verifying or detecting potential issues, but lack analysis of the conversion process. In addition, these methods do not offer feedback at the level of source code.

Huang et al. [38] is the closest work to ours. They first defined the executable semantics of the ST language in K and used it to check some security properties. Our work differs because we cover a more complete ST language, and we can use it to discover errors in ST compilers.

7 CONCLUSION

In this paper, we introduced an executable operational semantics of ST formalized in the K framework. We presented the semantics of the core features of ST, namely data types, memory operations, its main control statements, and function calls. Our experimental results show that the proposed ST semantics has already covered the main core language features and correctly implements 26,137 lines of public ST code on GitHub. Furthermore, the application of the proposed semantics in testing and analyzing PLC compilers is discussed. By comparing and analyzing the execution results of OpenPLC and K-ST, we found five bugs and some functional deficiencies in OpenPLC. In the future, we hope to further extend K-ST to support the programming environments provided by different vendors. For example, vendors may customize keywords (Bit STRING of GX Works2), add additional structures (LABEL of Siemens), or even widely extend ST (ExST of CODESYS).

ACKNOWLEDGMENTS

We thank the reviewers for their constructive feedback. This research is supported by National Key R&D Program of China under grant 2020YFB2010900, NSFC under grants 61833015 and 62293511, Provincial Key R&D Program of Zhejiang under grants 2020C01038 and 2021C01032, and the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study, Grant No. SN-ZJU-SIAS-001.

REFERENCES

[1] R. Langner, “Stuxnet: Dissecting a cyberwarfare weapon,” IEEE Security & Privacy, vol. 9, no. 3, pp. 49–51, 2011.
[2] G. Liang, S. R. Weller, J. Zhao, F. Luo, and Z. Y. Dong, “The 2015 Ukraine blackout: Implications for false data injection attacks,” IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 3317–3318, 2016.
[3] K. Zetter, “The Ukrainian power grid was hacked again,” Motherboard, 2017.
[4] N. Perlroth and C. Krauss, “A cyberattack in Saudi Arabia had a deadly goal,” Experts fear another try, 2018.
[5] D. Tychalas and M. Maniatakos, “Open platform systems under scrutiny: A cybersecurity analysis of the device tree,” in 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE, 2018, pp. 477–480.
[6] A. Nochvay, “Security research: CODESYS runtime, a PLC control framework,” Kaspersky ICS CERT, 2019.
[7] D. Tychalas, H. Benkraouda, and M. Maniatakos, “ICSFuzz: Manipulating I/Os and repurposing binary code to enable instrumented fuzzing in ICS control applications,” in 30th USENIX Security Symposium (USENIX Security 21), 2021.
[8] “Programmable controllers - Part 3: Programming languages,” International Electrotechnical Commission, Standard, 2013.
[9] T. M. Antonsen, PLC Controls with Structured Text (ST), V3: IEC 61131-3 and best practice ST programming. BoD–Books on Demand, 2020.
[10] J. O. Blech and S. O. Biha, “On formal reasoning on the semantics of PLC using Coq,” arXiv preprint arXiv:1301.3047, 2013.
[11] J. O. Blech and S. Ould Biha, “Verification of PLC properties based on formal semantics in Coq,” in International Conference on Software Engineering and Formal Methods. Springer, 2011, pp. 58–73.
[12] T. Ovatman, A. Aral, D. Polat, and A. O. Ünver, “An overview of model checking practices on verification of PLC software,” Software & Systems Modeling, vol. 15, no. 4, pp. 937–960, 2016.
[13] H. Janicke, A. Nicholson, S. Webber, and A. Cau, “Runtime-monitoring for industrial control systems,” Electronics, vol. 4, no. 4, pp. 995–1017, 2015.
[14] L. Garcia, S. Zonouz, D. Wei, and L. P. De Aguiar, “Detecting PLC control corruption via on-device runtime verification,” in 2016 Resilience Week (RWS). IEEE, 2016, pp. 67–72.

[15] D. Darvas, B. F. Adiego, A. Vörös, T. Bartha, E. B. Vinuela, and V. M. G. Suárez, “Formal verification of complex properties on PLC programs,” in International Conference on Formal Techniques for Distributed Objects, Components, and Systems. Springer, 2014, pp. 284–299.
[16] D. Darvas, I. Majzik, and E. B. Viñuela, “Formal verification of safety PLC based control software,” in International Conference on Integrated Formal Methods. Springer, 2016, pp. 508–522.
[17] L. Garcia, F. Brasser, M. H. Cintuglu, A.-R. Sadeghi, O. A. Mohammed, and S. A. Zonouz, “Hey, my malware knows physics! Attacking PLCs with physical model aware rootkit.” in NDSS, 2017.
[18] R. Spenneberg, M. Brüggemann, and H. Schwartke, “PLC-Blaster: A worm living solely in the PLC,” Black Hat Asia, vol. 16, pp. 1–16, 2016.
[19] A. Keliris and M. Maniatakos, “ICSREF: A framework for automated reverse engineering of industrial control systems binaries,” arXiv preprint arXiv:1812.03478, 2018.
[20] S. Guo, M. Wu, and C. Wang, “Symbolic execution of programmable logic controller code,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 326–336.
[21] S. E. McLaughlin, S. A. Zonouz, D. J. Pohly, and P. D. McDaniel, “A trusted safety verifier for process controller code.” in NDSS, vol. 14, 2014.
[22] G. Canet, S. Couffin, J. Lesage, A. Petit, and P. Schnoebelen, “Towards the automatic verification of PLC programs written in instruction list,” in Proceedings of the IEEE International Conference on Systems, Man & Cybernetics: “Cybernetics Evolving to Systems, Humans, Organizations, and their Complex Interactions”. IEEE, 2000, pp. 2449–2454.
[23] J. Xiong, X. Bu, Y. Huang, J. Shi, and W. He, “Safety verification of IEC 61131-3 Structured Text programs,” IEEE Transactions on Industrial Informatics, vol. 17, no. 4, pp. 2632–2640, 2020.
[24] M. Zhang, C.-Y. Chen, B.-C. Kao, Y. Qamsane, Y. Shao, Y. Lin, E. Shi, S. Mohan, K. Barton, J. Moyne et al., “Towards automated safety vetting of PLC code in real-world plants,” in 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019, pp. 522–538.
[25] N. Bauer, S. Engell, R. Huuck, S. Lohmann, B. Lukoschus, M. Remelhe, and O. Stursberg, “Verification of PLC programs given as sequential function charts,” in Integration of software specification techniques for applications in Engineering. Springer, 2004, pp. 517–540.
[26] A. Mader and H. Wupper, “Timed automaton models for simple programmable logic controllers,” in Proceedings of 11th Euromicro Conference on Real-Time Systems. Euromicro RTS’99. IEEE, 1999, pp. 106–113.
[27] T. Mertke and G. Frey, “Formal verification of PLC programs generated from signal interpreted Petri nets,” in 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat. No. 01CH37236), vol. 4. IEEE, 2001, pp. 2700–2705.
[28] R. Huuck, “Semantics and analysis of instruction list programs,” Electronic Notes in Theoretical Computer Science, vol. 115, pp. 3–18, 2005.
[29] J. Sadolewski, “Conversion of ST control programs to ANSI C for verification purposes,” e-Informatica Software Engineering Journal, vol. 5, no. 1, 2011.
[30] B. F. Adiego, D. Darvas, E. B. Viñuela, J.-C. Tournier, V. M. G. Suárez, and J. O. Blech, “Modelling and formal verification of timing aspects in large PLC programs,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 3333–3339, 2014.
[31] O. Maler and S. Yovine, “Hardware timing verification using KRONOS,” in Proceedings of the Seventh Israeli Conference on Computer Systems and Software Engineering. IEEE, 1996, pp. 23–29.
[32] M. Heiner and T. Menzel, “Petri net semantics for the PLC user programming language Instruction List,” Techn. Report BTU Cottbus, I-20/1997, Cottbus December, 1997.
[33] V. Le, M. Afshari, and Z. Su, “Compiler validation via equivalence modulo inputs,” ACM Sigplan Notices, vol. 49, no. 6, pp. 216–226, 2014.
[34] X. Yang, Y. Chen, E. Eide, and J. Regehr, “Finding and understanding bugs in C compilers,” in Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, 2011, pp. 283–294.
[35] J. Chen, J. Patra, M. Pradel, Y. Xiong, H. Zhang, D. Hao, and L. Zhang, “A survey of compiler testing,” ACM Computing Surveys (CSUR), vol. 53, no. 1, pp. 1–36, 2020.
[36] W. M. McKeeman, “Differential testing for software,” Digital Technical Journal, vol. 10, no. 1, pp. 100–107, 1998.
[37] R. Schumi and J. Sun, “SpecTest: Specification-based compiler testing,” Fundamental Approaches to Software Engineering, vol. 12649, p. 269, 2021.
[38] Y. Huang, X. Bu, G. Zhu, X. Ye, X. Zhu, and J. Shi, “KST: Executable formal semantics of IEC 61131-3 structured text for verification,” IEEE Access, vol. 7, pp. 14 593–14 602, 2019.
[39] G. Rosu, “K: A semantic framework for programming languages and formal analysis tools,” Dependable Software Systems Engineering, vol. 50, p. 186, 2017.
[40] M. J. Hohnka, J. A. Miller, K. M. Dacumos, T. J. Fritton, J. D. Erdley, and L. N. Long, “Evaluation of compiler-induced vulnerabilities,” Journal of Aerospace Information Systems, vol. 16, no. 10, pp. 409–426, 2019.
[41] M. Marcozzi, Q. Tang, A. F. Donaldson, and C. Cadar, “Compiler fuzzing: How much does it matter?” Proceedings of the ACM on Programming Languages, vol. 3, no. OOPSLA, pp. 1–29, 2019.
[42] T. R. Alves, M. Buratto, F. M. De Souza, and T. V. Rodrigues, “OpenPLC: An open source alternative to automation,” in IEEE Global Humanitarian Technology Conference (GHTC 2014). IEEE, 2014, pp. 585–589.
[43] D. Darvas, I. Majzik, and E. Blanco Viñuela, “Generic representation of PLC programming languages for formal verification,” in 23rd PhD Mini-Symposium. Budapest University of Technology and Economics, 2016, pp. 6–9.
[44] N. Roos, “Programming PLCs using structured text,” in International Multiconference on Computer Science and Information Technology. Citeseer, 2008, pp. 20–22.
[45] F. Markovic, “Automated test generation for structured text language using uppaal model checker,” 2015.
[46] M. Tiegelkamp and K.-H. John, IEC 61131-3: Programming industrial automation systems. Springer, 2010.
[47] N. Martí-Oliet and J. Meseguer, “Rewriting logic: roadmap and bibliography,” Theoretical Computer Science, vol. 285, no. 2, pp. 121–154, 2002.
[48] A. Stefănescu, D. Park, S. Yuwen, Y. Li, and G. Roşu, “Semantics-based program verifiers for all languages,” ACM SIGPLAN Notices, vol. 51, no. 10, pp. 74–91, 2016.
[49] C. Ellison and G. Rosu, “An executable formal semantics of C with applications,” ACM SIGPLAN Notices, vol. 47, no. 1, pp. 533–544, 2012.
[50] D. Bogdanas and G. Roşu, “K-Java: A complete semantics of Java,” in Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015, pp. 445–456.
[51] D. Park, A. Stefănescu, and G. Roşu, “KJS: A complete formal semantics of JavaScript,” in Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015, pp. 346–356.
[52] F. Wang, F. Song, M. Zhang, X. Zhu, and J. Zhang, “KRust: A formal executable semantics of Rust,” in 2018 International Symposium on Theoretical Aspects of Software Engineering (TASE). IEEE, 2018, pp. 44–51.
[53] J. Jiao, S. Kan, S.-W. Lin, D. Sanan, Y. Liu, and J. Sun, “Semantic understanding of smart contracts: Executable operational semantics of Solidity,” in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 1695–1712.
[54] T. Nipkow and G. Klein, “Imp: A simple imperative language,” in Concrete Semantics. Springer, 2014, pp. 75–94.
[55] D. D. McCracken and E. D. Reilly, “Backus-Naur Form (BNF),” in Encyclopedia of Computer Science, 2003, pp. 129–131.
[56] G. Roşu and T. F. Şerbănuţă, “K overview and simple case study,” Electronic Notes in Theoretical Computer Science, vol. 304, pp. 3–56, 2014.
[57] E. V. Kuzmin, A. Shipov, and D. A. Ryabukhin, “Construction and verification of PLC programs by LTL specification,” in 2013 Tools & Methods of Program Analysis. IEEE, 2013, pp. 15–22.
[58] D. Darvas, E. Blanco Vinuela, and I. Majzik, “A formal specification method for PLC-based applications,” 2015.
[59] B. F. Adiego, D. Darvas, E. B. Viñuela, J.-C. Tournier, S. Bliudze, J. O. Blech, and V. M. G. Suárez, “Applying model checking to industrial-sized PLC programs,” IEEE Transactions on Industrial Informatics, vol. 11, no. 6, pp. 1400–1410, 2015.

[60] M. Hailesellasie and S. R. Hasan, “Intrusion detection in PLC-based industrial control systems using formal verification approach in conjunction with graphs,” Journal of Hardware and Systems Security, vol. 2, no. 1, pp. 1–14, 2018.
[61] D. Bohlender and S. Kowalewski, “Compositional verification of PLC software using horn clauses and mode abstraction,” IFAC-PapersOnLine, vol. 51, no. 7, pp. 428–433, 2018.
[62] B. C. Rawlings, J. M. Wassick, and B. E. Ydstie, “Application of formal verification and falsification to large-scale chemical plant automation systems,” Computers & Chemical Engineering, vol. 114, pp. 211–220, 2018.

Kun Wang received the B.S. degree in information and computing sciences from Chongqing University of Posts and Telecommunications of China, in 2017. He received the M.Eng. degree in Cyberspace Security from Xidian University of China, in 2020. He is currently pursuing his Ph.D degree with State Key Laboratory of Industrial Control Technology, Group of Networked Sensing and Control, Zhejiang University. His research interests include control system security and formal methods.

Jingyi Wang is currently a tenure-track assistant professor at the College of Control Science and Engineering, Zhejiang University, China. He received his Ph.D. from Singapore University of Technology and Design in 2018, and his bachelor’s degree in Information Engineering from Xi’an Jiaotong University in 2013. He was a research fellow at the School of Computing, National University of Singapore during 2019-2020 and at Information Systems Technology and Design Pillar, Singapore University of Technology and Design during 2018-2019. His research interests include formal methods, software engineering, cyber-security and machine learning.

Christopher M. Poskitt is an Associate Professor of Computer Science (Education) at Singapore Management University (SMU), where he is part of the Centre for Research on Intelligent Software Engineering. Prior to SMU, he held postdoctoral research positions at ETH Zürich and SUTD, and obtained his PhD in Computer Science from the University of York (2014). His research broadly addresses the problem of engineering correct and secure software, especially in the context of cyber-physical systems (e.g. industrial control systems, autonomous vehicles). In addition to software engineering, his research interests span formal methods, cybersecurity, and computer science education.

Xiangxiang Chen received the B.Eng. degree in mechanical engineering from Xi’an Jiaotong University, Xi’an, China in 2021. He is working toward the Ph.D degree in Cyberspace Security at the IS2 Lab at School of Control Science and Engineering, Zhejiang University, Hangzhou, China. His research interests include fuzzing and AI system testing.

Jun Sun is currently a tenured professor at the School of Information Systems, Singapore Management University. He received bachelor’s and Ph.D. degrees in computing science from the National University of Singapore (NUS) in 2002 and 2006, respectively. From 2010 to 2019, he was an assistant/associate professor at the Singapore University of Technology and Design. He was a visiting scholar at MIT from 2011 to 2012. His research focuses on software engineering, formal methods, program analysis, and cyber-security. He is the co-founder of the PAT model checker.

Peng Cheng received the B.Sc. degree in automation and the Ph.D. degree in control science and engineering, from Zhejiang University, Hangzhou, China, in 2004 and 2009, respectively. From 2012 to 2013, he worked as Research Fellow in Information System Technology and Design Pillar, Singapore University of Technology and Design. He is currently a Professor with the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. His research interests include networked sensing and control, cyber-physical systems, and control system security.
