Celonis PQL Chapter-Web
Chapter from:
Process Querying Methods
Acknowledgments
This chapter will be published in the upcoming book “Process Querying Methods”
by Artem Polyvyanyy, Springer Nature Switzerland AG, 2020.
Abstract
Process mining studies data-driven methods to discover, enhance and monitor business
processes by gathering knowledge from event logs recorded by modern IT systems. To gain
valuable process insights, it is essential for process mining users to formalize their process
questions as executable queries. For this purpose, we present the Celonis Process Query Language
(Celonis PQL), which is a domain-specific language tailored towards a special process data
model and designed for business users. It translates process-related business questions into
queries and executes them on a custom-built query engine. Celonis PQL covers a broad set
of more than 150 operators, ranging from process-specific functions to machine learning and
mathematical operators. Its syntax is inspired by SQL, but specialized for process-related queries.
In addition, we present practical use cases and real-world applications, which demonstrate the
expressiveness of the language and how business users can apply it to discover, enhance and
monitor business processes. The maturity and feasibility of Celonis PQL is shown by thousands of
users from different industries, who apply it to various process types and huge amounts of event
data every day.
Thomas Vogelgesang
Celonis SE, Munich, Germany
Jessica Kaufmann
Celonis SE, Munich, Germany
David Becher
Celonis SE, Munich, Germany
Robert Seilbeck
Celonis SE, Munich, Germany
Martin Klenk
Celonis SE, Munich, Germany
1 Introduction
Due to their strong ability to provide transparency across complex business processes, process
mining capabilities have been adopted by many software vendors and academic tools. A key
success factor for any process mining tool is the ability to translate business questions into
executable process queries and to make the query results accessible to the user. To this end, we
developed Celonis Process Query Language (Celonis PQL). It takes the input from the user and
executes the queries in a custom-built query engine. This allows the users to analyze all facets
of a business process in detail, as well as to detect and employ process improvements. Celonis
PQL is a comprehensive query language that consists of more than 150 (process) operators. The
language design is strongly inspired by the requirements of business users. As a result, Celonis
PQL has achieved wide adoption by thousands of users across various industries and process types.
This chapter is organized as follows: Section 2 provides the background knowledge, which is
required to understand the specifics of our process query language. Section 3 gives an overview
of the various application scenarios of Celonis PQL. Section 4 presents the query language, its
syntax and operators. Section 5 demonstrates the applicability of Celonis PQL to solve widespread
business problems. Section 6 outlines the implementation of the query language. Section 7
positions Celonis PQL within the Process Querying Framework (PQF). Finally, Section 8 concludes
the chapter.
2 Background
In this section, we introduce the general concept of process mining and how the Celonis software
architecture enables process mining through our query language. We also present the history
of Celonis PQL as well as the design goals that were considered during the development of the
query language.
The timestamp of an event indicates the time when the event took place. A sequence of events,
ordered by their timestamps, that belong to the same case is called a trace. The traces of all the
different cases with the same activity sequence represent a variant. The throughput time between
two events of a case is the time difference between the corresponding timestamps. Accordingly,
the throughput time of a case is equal to the throughput time between the first and the last event
of the corresponding trace.
Figure 1 shows an example event log in procurement. Each case represents a
process instance of one purchase order. In the first case, the order item is created in
the system, an approval for purchasing is requested and approved. After the approval,
the order is sent to the vendor. Two days later, the ordered goods are received, the
invoice is registered and eventually paid. In the second case, an order item is created,
but the approval to actually order it is rejected. Besides the three required attributes
of an event log mentioned above, the example also includes attribute Department,
which specifies the executing department for each event, as well as attribute Item.
Fig. 2: Celonis software architecture: applications (1 to N) send Celonis PQL queries against a
data model, which is evaluated by the Celonis PQL Engine; the data is extracted from the source
systems and prepared through transformations.
Process mining techniques that are applied on an event log to understand and improve
the corresponding process can be assigned to three groups [15]: discovery, conformance and
enhancement. Discovery uses the event log as input and generates a business process model as
output. Conformance takes the event log and an a priori process model to detect discrepancies
between the log data and the a priori model. Enhancement takes the event log and an a priori
model to improve the model with the insights generated from the event log.
Activities
Case  Activity                    Timestamp         Department
1     Create Purchase Order Item  2019-01-23 08:15  D1
1     Request Approval            2019-01-23 08:20  D1
1     Grant Approval              2019-01-23 11:00  D2
1     Send Purchase Order         2019-01-23 11:10  D1
1     Receive Goods               2019-01-25 10:30  D3
1     Scan Invoice                2019-01-25 11:30  D3
1     Clear Invoice               2019-01-28 17:15  D3
2     Create Purchase Order Item  2019-01-23 13:00  D1
2     Request Approval            2019-01-23 15:00  D1
2     Reject                      2019-01-23 18:00  D2

Cases
Case  Item          Quantity  OrderValue  OrderNo
1     Screw         100       50          4711
2     Screw Driver  1         99          4711

Orders
OrderNo  ShippingType  VendorNo
4711     Standard      V10

Vendors
VendorNo  Name         Country
V10       Screws Inc.  DE

Fig. 3: Example data model with four tables, including activity and case tables
Data model.
A data model combines all tables from the source system (or multiple source systems) which
contain the data about a process that a user wants to analyze. In the data model, the foreign
key relationships between the source tables can be defined. This is performed here because
specifying joins is not part of the query language itself. The tables are arranged in a snowflake
schema, which is common for data warehouses, and the schema is centered around explicit
case and activity tables. Other data tables provide additional context. Figure 3 shows an example
data model. It contains the event log of Figure 1 in the Activities table, including the Department
column. It is linked to the Cases table, containing information about each order item. The Item
attribute from the example event log of Figure 1 is contained in the Cases table, as the data
model should contain normalized table schemas. Both order items (i.e. both cases) belong to the
same purchase order which is sent to one vendor. Details about the purchase orders and the
vendors are available in the Orders and Vendors tables of the data model.
Activity table.
The data model always contains an event log, which we call the activity table. The activity table
always contains the three columns of the core event log attributes, while additional columns
may be present. Within one case, the corresponding rows in the activity table are always sorted
based on the timestamp column.
Usually, the activity table is not directly present in the source systems and therefore needs to
be generated depending on the business process being analyzed. Since the source system is
a relational database in most cases, this is usually done in SQL in the so-called transformation
step. The transformation result can be a database view. However, a persisted table is usually
created for performance reasons. This procedure is comparable to the extract, transform, load
(ETL) procedure in data warehouses. Like all the other tables, the resulting activity table is then
imported into the data model. The user can specify the case and activity table in a graphical
user interface (GUI) and mark the corresponding columns of the activity table as the case,
activity and timestamp columns.
Case table.
The case table provides information about each case and acts as the fact table in the
snowflake schema. It always includes the case column, containing all distinct case IDs, while
other columns provide additional information on the cases. There is a 1:N relationship between
the case table and the activity table. If the case table is not specified in the data model, it will
be generated automatically during the data model load. The case table then consists of one
column containing all distinct case IDs from the activity table. This guarantees that a case
table always exists, and the Celonis PQL functions and operators can rely on it.
Celonis PQL Engine is an analytical column-store main memory database system. It evaluates
Celonis PQL queries over a defined data model. Section 6 describes the Celonis PQL Engine in
more detail.
Applications.
Celonis applications provide a variety of tools for the business user to discover, monitor and
enhance business processes. All applications use Celonis PQL to query the required data.
They include easy-to-use GUIs, providing a convenient way for the users to interact with the
process and business data. In the applications, the users can specify custom Celonis PQL
queries. There are also many auto-generated queries sent by the applications to retrieve
various information, which is then presented to the user in the graphical interface. An overview
of the different applications that use Celonis PQL is given in Section 3.
In SQL, different database systems support different SQL dialects. For example, function
names and syntaxes for certain functionalities like date and time calculations are database
specific. However, Celonis PQL was designed to be independent of the SQL dialect of the
underlying database system. If necessary, Celonis PQL functions were mapped to equivalent
SQL functions based on the appropriate dialect.
In October 2018, Celonis Intelligent Business Cloud (IBC) was released. This transition from
an on-premises product to a modern native cloud solution provides easier access to process
mining and, consequently, IBC increased the number of Celonis PQL users significantly. Many
new applications, all using Celonis PQL to query process data, are included in IBC, as described in
Section 3.
The language is continuously extended with new functionalities. This is mostly driven by
customers who use Celonis PQL on a daily basis to explore their data. Due to the rich functionality
and possibility to use it in various Celonis applications, Celonis PQL is used by a high number of
users in many production systems.
Simplicity.
The query language should be easy to use for business users. Providing an easy way to translate
complex process questions into data queries should make process mining accessible for
business users.
Flexibility.
The query language should not include specialized functions. Instead, the goal is to provide a set
of generic functions and operators that can be combined in a wide range of queries. This
flexibility is very important, since the users should be able to formulate all their questions in the query
language, regardless of the processes they address.
Event log-centered.
In contrast to SQL, the language should be designed to support dedicated process mining
functionality. This should be reflected in the query language by process functions, which operate
on the given event log.
Frontend interaction.
To simplify the use of the query language, the user should be able to formulate queries with
support of a GUI. Consequently, the goal is to design a language that provides easy integration
via GUI components. The simple query creation using a GUI is a key factor for the usability of a
product, which results in high acceptance, usage, and adoption by the users.
3 Applications
As a result of emerging technologies, the requirements on tools used for analyzing processes
within different business departments go beyond the simple tracking of performance. For this
reason, process mining at Celonis is evolving into a holistic approach that serves as a performance
accelerator for business processes. Necessary steps that are included within this approach are
the discovery, enhancement and monitoring of processes. Within discovery, process mining can
capture digital footprints from all source systems involved in the process, visualize the respective
processes and understand the root causes of deviations between the as-is process and the to-be
process. Thus, the discovery step serves as the starting point for process improvements. During
enhancement, process mining supports the automation of tasks, proposes intelligent actions
and proactively drives process interventions and improvements. Monitoring allows the user to
continuously track the development of key figures that are defined during the discovery step. This
enables the ongoing benchmarking of processes – internally, as well as externally. Celonis PQL
enables all these activities and tasks, and it is used in all Celonis products as depicted in Figure 4.
Fig. 4: Celonis products mapped to the discovery, enhancement and monitoring steps, including
Process Analytics, Process Conformance, Transformation Center, Action Engine, Process
Automation and the Machine Learning Workbench, all powered by PQL.
• Process Analytics is part of discovery and can be used to visualize and identify the root
causes of issues within a process. Furthermore, it identifies the specific actions that have the
greatest impact on solving the issue. For that purpose, Celonis PQL is used to obtain performance
metrics, such as the average case duration, the change rate in the process or the
degree of process automation. In addition, Celonis PQL enables the user to enrich the event
log data with data related to the problematic cases, such as finding the vendor causing the
issue, or identifying sub-processes that prolong throughput times.
• Process Conformance is, as shown in Figure 4, also part of the discovery step. It is used to
identify deviations from the defined to-be process and to uncover the root causes of process
deviations. In this context, Celonis PQL allows utilizing the calculated process conformance to
obtain metrics leading to the discovery of root causes for deviations [16].
• Action Engine is part of the enhancement step and uses insights gained from the discovery
step to recommend actions to improve process performance [1]. The foundation to generate
these findings is Celonis PQL. The findings can include all the relevant information that impacts
a decision about whether or how an action, like executing a task in the source system or
triggering a bot, is performed. In addition, Celonis PQL assists in prioritizing necessary actions.
• Machine Learning Workbench enables the usage of Jupyter notebooks within Celonis IBC for
creating and using Python predictive models. As part of the enhancement step, it supports
building predictive models on process data to proactively avoid downstream friction like
long running process steps, leading to e.g. late payments in finance. Celonis PQL is crucial for
querying the process data necessary as input for the predictive models.
Fig. 5: Celonis IBC usage statistics: around 2 million queries per day, more than 5,000 data
models, and an average query execution time of around 370 ms.
Celonis IBC enables customers to use the entire product range on-demand, containing all
products described above. As shown in Figure 5, Celonis IBC is currently (as of October 2019)
managing more than 10,000 users and more than 5,000 data models. Two million queries are
executed per day, written in Celonis PQL, and processed by the Celonis PQL Engine with an
average execution time of 370 milliseconds per query. In reality, the execution time per query
is often much lower, but especially complex Celonis PQL statements on huge datasets lead to
outliers in execution time. In around 90% of the cases, the execution time is less than a second.
Nevertheless, we still aim for continuous performance improvements in order to further reduce
the execution time of complex Celonis PQL statements, like heavily nested queries, in the future.
Currently, the supported data types comprise STRING, INT, FLOAT, and DATE. Boolean
values are not directly supported, but can be represented as integers. Each data type can hold
NULL values. In general, Celonis PQL treats NULL values as non-existing and ignores them in
aggregations. Also, row-wise operations, like adding the values of two columns, will return NULL
if one of their inputs is NULL.
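As a sketch of these semantics, consider a row-wise computation over the example data model from Figure 3 (the column names are taken from that figure):

```
TABLE (
    "Cases"."Quantity" * "Cases"."OrderValue"
);
```

For every case in which either Quantity or OrderValue is NULL, the resulting column contains NULL, and an aggregation computed over that result would ignore exactly those rows.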
Currently, Celonis PQL provides more than 150 different operators to process the event data.
Due to space limitations, we cannot sketch the full language. However, we can offer a brief
overview of the major language features before we present selected examples to showcase the
expressiveness of the language. Comprehensive documentation of the Celonis PQL operators
can be accessed via the free process mining platform Celonis Snap.
Query (SQL):

SELECT
    "Cases"."CaseID",
    COUNT ( DISTINCT "Activities"."Department" )
FROM
    "Activities"
LEFT JOIN
    "Cases"
ON
    "Activities"."CaseID" = "Cases"."CaseID"
WHERE
    "Cases"."OrderValue" > 1000
GROUP BY
    "Cases"."CaseID" ;

Query (Celonis PQL), with the aggregation as a KPI, implicit joins from the data model, implicit grouping, and the filter as a separate statement:

TABLE (
    "Cases"."CaseID",
    COUNT ( DISTINCT "Activities"."Department" )
);
FILTER "Cases"."OrderValue" > 1000 ;

Fig. 6: An SQL query and the corresponding Celonis PQL query
Similar to SQL, Celonis PQL enables the user to specify the data columns to retrieve from the
data model. This can either be an aggregation, which we call a KPI, or an unaggregated column,
which we call a dimension. While the data columns are part of the SELECT statement in SQL,
Celonis PQL requires them to be wrapped in the TABLE operator, which combines the specified
columns into a common table.
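For instance, a query combining a dimension with a KPI might look as follows. This is a sketch against the example data model of Figure 3; the exact aggregation name AVG is an assumption:

```
TABLE (
    "Cases"."Item",
    AVG ( "Cases"."OrderValue" )
);
```

Here, "Cases"."Item" is a dimension, and the average order value is a KPI computed per item.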
In contrast to SQL, Celonis PQL does not require the user to define how to join the different tables
within the query. Instead, it implicitly joins the tables according to their foreign key relationships,
which have to be defined only once in the data model. Also, the grouping clause is not needed,
as Celonis PQL implicitly groups the result by all specified dimensions.
Both languages offer the possibility to filter rows. While SQL requires the user to formulate the
filter condition in the WHERE clause of the query, Celonis PQL offers the FILTER statements which
are separated from the TABLE statements but executed together. Splitting the data selection
and the filters into different statements enables the user to define multiple filter statements in
different locations inside an application, which then can be combined into the table statement
to query the data.
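As a sketch of this separation, a filter defined in one place of an application can be combined with a table statement defined elsewhere; both are executed together as one query:

```
FILTER "Cases"."OrderValue" > 1000;

TABLE (
    COUNT ( DISTINCT "Cases"."CaseID" )
);
```

The FILTER statement restricts the cases before the count is evaluated, even though it is defined separately from the TABLE statement.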
Beyond this simple structure, Celonis PQL provides a wide range of different operators which
can be combined to answer complex business questions. The following list gives an overview of
the most important classes of operators.
Aggregations.
Celonis PQL offers a wide range of aggregation functions, from simple standard functions like
count and average, to more advanced aggregations like standard deviation and quantiles.
Most of the aggregation functions are also available as window-based functions computing the
aggregation not over all values but over a user-defined sliding element window.
Data functions.
These are operators like REMAP_VALUES (see Section 4.2) and CASE WHEN (see Section 5.1),
which allow for conditional changes of values.
Datetime functions.
These functions enable the user to modify, project or round a date or time value, e.g. add a day
to a date or extract the month from a timestamp. There are also functions to compute date and
time differences (e.g. between timestamps of events).
Index functions.
Index functions create indices based on columns. The function INDEX_ACTIVITY_LOOP, for
example, returns for each activity how many times it has occurred in direct succession in a case.
This is useful, e.g., for identifying self-loops and computing their cycle lengths.
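A sketch of such a query, following the semantics described above:

```
TABLE (
    "ACTIVITIES"."ACTIVITY",
    INDEX_ACTIVITY_LOOP ( "ACTIVITIES"."ACTIVITY" )
);
```

Given the description above, a trace 'A', 'B', 'B', 'B', 'C' would yield the indices 1, 1, 2, 3, 1, so filtering on index values greater than 1 isolates the repeated occurrences within self-loops.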
Machine learning functions.
There are various machine learning functions available, e.g., to cluster data using the k-means
algorithm or to learn decision trees.
Math functions.
Celonis PQL offers a wide range of mathematical functions, e.g., for arithmetic computations,
rounding float numbers, and computing logarithms.
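For example, a per-unit price could be derived from the example data model of Figure 3. This is a sketch; the ROUND function and its precision argument are assumptions about the concrete operator:

```
TABLE (
    ROUND ( "Cases"."OrderValue" / "Cases"."Quantity", 2 )
);
```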
Process functions.
Process functions comprise all process-specific functions which operate on the activity table
and take its configuration into account. Examples are pattern-based process filters, SOURCE
and TARGET operators (see Section 4.2), and computation of variants (see Section 4.3). There
are also special process mining operators for discovering process models, clustering variants,
and checking the conformance of a process model to the event data (see Section 4.4).
String modification.
These functions enable the user to modify string values, e.g., trimming whitespaces, changing
case, and creating substrings.
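A sketch of such a query; the function names UPPER and TRIM are assumptions about the concrete operator names:

```
TABLE (
    UPPER ( TRIM ( "ACTIVITIES"."ACTIVITY" ) )
);
```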
A major difference between SQL and Celonis PQL is the different language scope. Hence,
Celonis PQL does not support all operators that are available in SQL. This is due to the fact that the
development of the language is driven by customer requirements and only operators that are
needed for the target use cases are implemented. For example, generic set operators like UNION
are not supported, as they have not been required so far.
Another major difference to SQL is the missing support of a data manipulation language
(DML). As all updates in the process mining scenario should come from the source systems,
there is no need to directly manipulate and update the data through the query language. As the
data can be considered to be read-only, this also allows for specific performance optimizations
during implementation (see Section 6).
Furthermore, Celonis PQL does not provide any data definition language (DDL). As the data
model is created by a visual data model editor and stored internally, there has not been any
need for this so far.
In contrast to SQL, Celonis PQL is domain-specific and offers a wide range of process mining
operators which are not available in SQL. Consequently, Celonis PQL seamlessly integrates the
data with the process perspective. In the following, we explain selected process operators like
SOURCE and TARGET (Section 4.2), VARIANT (Section 4.3), and CONFORMANCE (Section 4.4)
in more detail.
TABLE (
SOURCE ( "ACTIVITIES"."ACTIVITY" ),
TARGET ( "ACTIVITIES"."ACTIVITY" ),
MINUTES_BETWEEN ( SOURCE ("ACTIVITIES"."TIMESTAMP" ),
TARGET ("ACTIVITIES"."TIMESTAMP" ) )
);
Input (ACTIVITIES):

CASE_ID  ACTIVITY  TIMESTAMP
1        'A'       2019-01-01 13:00:00
1        'B'       2019-01-01 13:01:00
1        'C'       2019-01-01 13:07:00
1        'D'       2019-01-01 13:09:00

Output (RESULT):

Source  Target  Throughput Time (mins)
'A'     'B'     1                        ①
'B'     'C'     6                        ②
'C'     'D'     2                        ③
Fig. 7: Example of throughput time computation using SOURCE and TARGET operators
Each row of the activity table refers to a single event, so an event cannot directly be related
to its successor within one row. To overcome this issue, Celonis PQL relies on the SOURCE and
TARGET operators. Figure 7 shows an example that illustrates how SOURCE and TARGET can be
used to compute the throughput time between an event and its direct successor. While SOURCE always refers to the
actual event, TARGET refers to its following event. Consequently, SOURCE and TARGET can be
used to combine an event with its following event in the same row of a table. Both operators
accept a column of the activity table as input and return the respective value of the referred
event, as illustrated in Figure 7.
For the first event in the Activities table, SOURCE returns the activity name ‘A’ of the current
event, while TARGET returns the activity name ‘B’ of the following event (refer to ① in Figure 7).
For the second event of the input table, SOURCE returns ‘B’ and TARGET returns ‘C’ (refer to ② in
Figure 7), while they return ‘C’ and ‘D’ for the third event (refer to ③ in Figure 7).
The example also demonstrates how the SOURCE and TARGET operators can be used to
compute the throughput time. Instead of the activity column, we can use the column containing
the timestamp of the events as input. Consequently, SOURCE and TARGET return the timestamps
of the referred events. Then, we can pass the result columns of the SOURCE and TARGET operators
to the MINUTES_BETWEEN operator to compute the difference between the timestamps of an
event and its following event in minutes. In the example of Figure 7, this results in throughput times
of 1 minute from ‘A’ to ‘B’, 6 minutes from ‘B’ to ‘C’, and 2 minutes from ‘C’ to ‘D’.
Syntax 1 shows the syntax of the SOURCE and TARGET operators, which is similar for both
operators. The first parameter is a column of the activity table. Its values are mapped to the
referred events and returned as result column. The result column is stored in a temporary result
table which can be joined with the case table.
To skip certain events, the SOURCE and TARGET operators accept an optional filter column
as a parameter. This column must be of the same size as the activity table. The SOURCE and
TARGET operators ignore all events that have a NULL value in the related entry of the filter
column. Usually, the filter column is created using the REMAP_VALUES operator.
The syntax of the REMAP_VALUES operator is shown in Syntax 2. The first parameter is an input
column of type string that provides the values that should be remapped as input. For creating
a filter column for the SOURCE and TARGET operators, this input column is usually the activity
column of the activity table. However, REMAP_VALUES can be generally applied to any column
of type string. The second parameter is a list of one or more pairs of string values that describe
the mapping. Each occurrence of the first value of the pair will be remapped to the second value
of the pair. Finally, the operator accepts an optional string value that will replace all values that
are not remapped within the mapping. If this optional default replacement value is missing, all
values not considered in the mapping will remain unchanged. As the REMAP_VALUES operator
is only applicable to columns of type string, REMAP_INTS provides a similar functionality for
columns of type integer.
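Following Syntax 2, a sketch using the optional default replacement value could look like this (the mapping values 'Start' and 'Other' are chosen for illustration):

```
TABLE (
    REMAP_VALUES ( "ACTIVITIES"."ACTIVITY", [ 'A', 'Start' ], 'Other' )
);
```

Every occurrence of 'A' is replaced by 'Start', while all activity names not considered in the mapping are replaced by the default value 'Other'.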
Figure 8 shows a simple example of the REMAP_VALUES operator. It takes the activity column
of the activity table as input and maps ‘B’ and ‘C’ to NULL. As the optional replacement value is
not defined, all the other values (‘A’ and ‘D’) remain the same. Figure 9 demonstrates how to use
the result of the REMAP_VALUES as filter column for the SOURCE and TARGET operators by an
example query.
TABLE (
    REMAP_VALUES ( "ACTIVITIES"."ACTIVITY", [ 'B', NULL ], [ 'C', NULL ] )
);
Input (ACTIVITIES):

CASE_ID  ACTIVITY  TIMESTAMP
1        'A'       2019-01-01 13:00:00
1        'B'       2019-01-01 13:01:00
1        'C'       2019-01-01 13:07:00
1        'D'       2019-01-01 13:09:00

Output (RESULT):

Remapped Values
'A'
NULL
NULL
'D'

Fig. 8: Example of the REMAP_VALUES operator mapping activities ‘B’ and ‘C’ to NULL
The query returns the activity names of the source and target events given in the input table
Activities. However, the Result table only shows one row relating ‘A’ to ‘D’ because the activities
‘B’ and ‘C’ are filtered out. This is achieved by passing the result of the REMAP_VALUES operator
as shown in Figure 8 to the SOURCE operator as filter column. As both activities ‘B’ and ‘C’ are
mapped to NULL, the next subsequent activity of ‘A’ is ‘D’ with a throughput time of 9 minutes.
To define which relationships between the events should be considered, the operators offer
the optional edge configuration parameter. Figure 10 illustrates the different edge configu-
ration options. The first option (a) is the default and only considers the direct follow relationships
between the events, while option (b) only considers relationships from the first event to all subse-
quent events. Option (c) is similar to option (b) but also considers self-loops of the first event.
Option (d) is the opposite of option (b) and only considers relationships going from any event to
the last event.
Query
TABLE (
SOURCE ( "ACTIVITIES"."ACTIVITY",
REMAP_VALUES ( "ACTIVITIES"."ACTIVITY", [ 'B', NULL ], [ 'C', NULL ] ) ),
TARGET ( "ACTIVITIES"."ACTIVITY" ),
MINUTES_BETWEEN ( SOURCE ("ACTIVITIES"."TIMESTAMP" ),
TARGET ("ACTIVITIES"."TIMESTAMP" ) )
);
Input (ACTIVITIES):

CASE_ID  ACTIVITY  TIMESTAMP
1        'A'       2019-01-01 13:00:00
1        'B'       2019-01-01 13:01:00
1        'C'       2019-01-01 13:07:00
1        'D'       2019-01-01 13:09:00

Output (RESULT):

Source  Target  Throughput Time (mins)
'A'     'D'     9
Fig. 9: Example for omitting activities ‘B’ and ‘C’ in SOURCE and TARGET operators
Fig. 10: Available edge configuration options of the SOURCE and TARGET operators
Query
TABLE (
SOURCE ( "ACTIVITIES"."ACTIVITY", FIRST_OCCURRENCE[] TO ANY_OCCURRENCE[] ),
TARGET ( "ACTIVITIES"."ACTIVITY" ),
MINUTES_BETWEEN ( SOURCE ("ACTIVITIES"."TIMESTAMP" ),
TARGET ("ACTIVITIES"."TIMESTAMP" ) )
);
Input (ACTIVITIES):

CASE_ID  ACTIVITY  TIMESTAMP
1        'A'       2019-01-01 13:00:00
1        'B'       2019-01-01 13:01:00
1        'C'       2019-01-01 13:07:00
1        'D'       2019-01-01 13:09:00

Output (RESULT):

Source  Target  Throughput Time (mins)
'A'     'B'     1                        ①
'A'     'C'     7                        ②
'A'     'D'     9                        ③
Fig. 11: Example for computing how many minutes after the start of the process
an activity was executed
Finally, option (e) only considers the relationship between the first and the last event. The
different options enable the user to compute KPIs between different activities of the process. For
example, you can use option (b) to compute how many minutes after the start of the process
(indicated by the first activity ‘A’) an activity was executed. This is illustrated in Figure 11 where
SOURCE always refers to the first event of the case (activity ‘A’) while TARGET refers to any other
event (activities ‘B’, ‘C’, and ‘D’). Consequently, MINUTES_BETWEEN computes the minutes
elapsed between the occurrence of ‘A’ and all the other activities of the case. For computing
the remaining process execution time for each activity of the process, you can simply adapt the
edge configuration in the query from Figure 11 to option (d).
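To make these edge semantics concrete, the following Python sketch emulates the edge configuration options on the example case from Figure 11. This is an illustrative reimplementation of the described behavior, not Celonis PQL itself; the function and option names are our own:

```python
from datetime import datetime

def edge_pairs(n, option):
    """Return (source, target) index pairs over n ordered events of one case,
    emulating the edge configuration options of SOURCE and TARGET."""
    if option == "direct":            # (a) direct follow relationships
        return [(i, i + 1) for i in range(n - 1)]
    if option == "first_to_any":      # (b) first event to all subsequent events
        return [(0, i) for i in range(1, n)]
    if option == "any_to_last":       # (d) any event to the last event
        return [(i, n - 1) for i in range(n - 1)]
    if option == "first_to_last":     # (e) first event to last event only
        return [(0, n - 1)]
    raise ValueError(option)

ts = lambda s: datetime.fromisoformat(s)
case = [("A", ts("2019-01-01 13:00:00")), ("B", ts("2019-01-01 13:01:00")),
        ("C", ts("2019-01-01 13:07:00")), ("D", ts("2019-01-01 13:09:00"))]

# MINUTES_BETWEEN(SOURCE(...), TARGET(...)) under option (b):
rows = [(case[s][0], case[t][0], (case[t][1] - case[s][1]).total_seconds() / 60)
        for s, t in edge_pairs(len(case), "first_to_any")]
# rows == [('A', 'B', 1.0), ('A', 'C', 7.0), ('A', 'D', 9.0)]
```

Switching the option string to "any_to_last" yields the remaining execution time per event instead, as described above.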
To simplify the query, the optional edge configuration and the filter column need to be defined
in only one occurrence of SOURCE or TARGET per query. The settings are implicitly propagated
to all other operators in the same query. This can be seen in the query in Figure 9, where the
TARGET operator inherits the filter column from the SOURCE operator.
Besides the computation of custom process KPIs, like the throughput time between certain
activities, SOURCE and TARGET also enable more advanced use cases, like the segregation
of duties, as we will demonstrate in Section 5.3. A concept similar to the SOURCE and TARGET
operators has recently been proposed in [3].
Syntax 3: VARIANT ( input_column )
The syntax of the VARIANT operator is shown in Syntax 3. As input, the operator takes a column
of type string from the activity table. The operator concatenates the string values of the given
column into a single comma-delimited string and adds the result to the row of the related
case. Usually, the activity column is used as input; however, other columns of the activity table,
such as the name of the executing department or user, can be used as well.
Sometimes, different cases may contain self-loops of the same activity but with a different
number of repetitions. Consequently, these cases belong to different variants. However, in
some applications it is not of interest how often an activity is repeated but only whether there
is a self-loop at all. For such cases, the VARIANT operator can be wrapped in the SHORTENED
command, which shortens self-loops to a maximum number of occurrences. In this way, it is
possible to abstract from repeated activities and reduce the number of distinct variants. The limit
for the length of the self-loops can be specified by an optional parameter; the default value for
the maximum cycle length is 2.
Figure 12 shows an example query for the variant computation. The input data consists of an
activity table and a case table which can be joined by the foreign key relationship between the
Case_ID columns of both tables. For each case, the query result shows the variant string (Variant
column) and the variant string with reduced self-loops (Shortened column). Column Variant of
the Result table shows individual variants (with a varying number of ‘B’ activities) for each case,
while column Shortened shows equal variants for the cases 2 and 3 where the third ‘B’ activity of
case 3 is omitted.
TABLE (
"CASES"."CASE_ID",
VARIANT ( "ACTIVITIES"."ACTIVITY" ),
SHORTENED ( VARIANT ( "ACTIVITIES"."ACTIVITY" ) )
);
Input (ACTIVITIES):
CASE_ID  ACTIVITY  TIMESTAMP
1        ‘A’       2019-01-01 13:00:00
1        ‘B’       2019-01-01 13:01:00
1        ‘C’       2019-01-01 13:02:00
2        ‘A’       2019-01-01 13:03:00
2        ‘B’       2019-01-01 13:04:00
2        ‘B’       2019-01-01 13:05:00
2        ‘C’       2019-01-01 13:06:00
3        ‘A’       2019-01-01 13:07:00
3        ‘B’       2019-01-01 13:08:00
3        ‘B’       2019-01-01 13:09:00
3        ‘B’       2019-01-01 13:10:00
3        ‘C’       2019-01-01 13:11:00

Input (CASES):
CASE_ID
1
2
3

Foreign Keys: ACTIVITIES.CASE_ID → CASES.CASE_ID

Output (RESULT):
CASE_ID  Variant           Shortened
1        ‘A, B, C’         ‘A, B, C’
2        ‘A, B, B, C’      ‘A, B, B, C’
3        ‘A, B, B, B, C’   ‘A, B, B, C’
Fig. 12: Example for the VARIANT operator with and without reduced self-loops
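The semantics of VARIANT and SHORTENED on the data from Figure 12 can be sketched in a few lines of Python. This is an illustrative reimplementation of the described behavior, not the engine's code; the helper names are our own:

```python
def variant(activities):
    """Concatenate a case's activity names into a comma-delimited string."""
    return ", ".join(activities)

def shortened(activities, max_cycle=2):
    """Truncate self-loops to at most max_cycle repetitions, emulating the
    SHORTENED command (default maximum cycle length: 2)."""
    out, run = [], 0
    for act in activities:
        run = run + 1 if out and out[-1] == act else 1
        if run <= max_cycle:
            out.append(act)
    return out

cases = {1: ["A", "B", "C"],
         2: ["A", "B", "B", "C"],
         3: ["A", "B", "B", "B", "C"]}
result = {cid: (variant(acts), variant(shortened(acts)))
          for cid, acts in cases.items()}
# Cases 2 and 3 share the shortened variant 'A, B, B, C'
```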
The third part of the model description is a list of flow relations, where each flow relation is
specified as a pair of the source place and the target transition or a pair of the source transition
and the target place, respectively. After that, a list of value pairs defines the mapping of activity
names to the related transitions. The first value in such a pair is an activity name as a string while
the second value is the ID of a transition which must be defined in the list of transitions. The last
two parts of the model description are the lists of start and end places, respectively. Both lists
consist of place IDs which must be specified in the first part of the model description.
The CONFORMANCE operator replays the activity names from the input column on the
process model. As a result, it adds a temporary column of type integer to the activity table. The
value of a row in this new column indicates whether there is a conformance issue; the type
of violation and the related activities in the process model are also encoded in this value.
As the integer encoding is not suitable for the end user, the CONFORMANCE operator can
be wrapped in the READABLE command, which translates the encoding into a message
explaining the violation.
Figure 13 shows an example query that uses the CONFORMANCE operator which takes an
activity table with three different cases as input. The process model that should be related to the
event log is illustrated in Figure 14. It is a simple Petri net consisting of two transitions and three
places forming a trivial sequence of two activities ‘A’ and ‘B’.
The result of the query consists of four columns with Case_ID as the first, and the Activity as
the second column. The third column (Conformance) shows the integer encoded result of the
CONFORMANCE operator, while the fourth column (Readable) shows intuitive messages explaining
the deviations. Even though activity ‘A’ matches the model, the first row is marked as incomplete
because it is the last activity of case 1 which does not reach the end of the process due to the
missing activity ‘B’. For case 2, the first activity (row 2) conforms, but the second activity (‘C’ in row
3) is not part of the process model and, therefore, is marked as an undesired activity. In contrast to
that, case 3 fully conforms, which is indicated for all its activities (rows 4 and 5) of the output.
TABLE (
"ACTIVITIES"."CASE_ID",
"ACTIVITIES"."ACTIVITY",
CONFORMANCE ( "ACTIVITIES"."ACTIVITY", [ "P_0" "P_1" "P_2" ], [ "T_01" "T_12" ],
[ [ "P_0" "T_01" ] [ "T_01" "P_1" ] [ "P_1" "T_12" ] [ "T_12" "P_2" ] ],
[ [ 'A' "T_01" ] [ 'B' "T_12" ] ], [ "P_0" ], [ "P_2" ]
),
READABLE (
CONFORMANCE ( "ACTIVITIES"."ACTIVITY", [ "P_0" "P_1" "P_2" ], [ "T_01" "T_12" ],
[ [ "P_0" "T_01" ] [ "T_01" "P_1" ] [ "P_1" "T_12" ] [ "T_12" "P_2" ] ],
[ [ 'A' "T_01" ] [ 'B' "T_12" ] ], [ "P_0" ], [ "P_2" ]
)
)
);
Input (ACTIVITIES):
CASE_ID  ACTIVITY  TIMESTAMP
1        ‘A’       2019-01-01 13:00:00
2        ‘A’       2019-01-01 13:01:00
2        ‘C’       2019-01-01 13:02:00
3        ‘A’       2019-01-01 13:03:00
3        ‘B’       2019-01-01 13:04:00

Output (RESULT):
CASE_ID  ACTIVITY  Conformance  Readable
1        ‘A’       2147483647   'Incomplete'
2        ‘A’       0            'Conforms'
2        ‘C’       -2           'C is an undesired activity'
3        ‘A’       0            'Conforms'
3        ‘B’       0            'Conforms'
Fig. 13: CONFORMANCE operator example with integer encoding and readable explanation
Fig. 14: Petri net with places P_0, P_1, P_2 and transitions T_01 (activity ‘A’) and T_12 (activity ‘B’), forming a trivial sequence
As the example illustrates, the model description is quite extensive even for such a small
model, which seems to contradict the design goal of keeping the language as simple as possible.
However, the CONFORMANCE operator is usually called from a GUI component. Using this component,
the user can upload a process model, discover one automatically, or model one manually in
Business Process Model and Notation (BPMN) [12]; the model is then automatically translated into
the required string description. The GUI component can also bind the model description string
to a variable. Instead of defining the process model in the query, the user can simply insert the
variable, which makes it much easier to use the CONFORMANCE operator in other GUI components.
For example, the user can apply the CONFORMANCE operator in a filter in order to restrict
a data table or chart to cases that are marked as incomplete.
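The core idea of replaying a trace on such a net can be sketched with a much-simplified token replay in Python. This is only an illustration of the concept under our own function names; the actual CONFORMANCE operator detects more violation types and uses the integer encoding described above:

```python
def replay(trace, places, transitions, flows, act_to_trans, start, end):
    """Replay one case on a Petri net and return a readable verdict per event,
    loosely mirroring READABLE(CONFORMANCE(...)) for the model of Figure 14."""
    pre = {t: [p for p, x in flows if x == t] for t in transitions}
    post = {t: [p for x, p in flows if x == t] for t in transitions}
    marking = {p: (1 if p in start else 0) for p in places}
    verdicts = []
    for act in trace:
        trans = act_to_trans.get(act)
        if trans is None:                      # activity not in the model
            verdicts.append(f"{act} is an undesired activity")
            continue
        if any(marking[p] == 0 for p in pre[trans]):
            verdicts.append(f"{act} violates the control flow")
            continue
        for p in pre[trans]:                   # fire the transition
            marking[p] -= 1
        for p in post[trans]:
            marking[p] += 1
        verdicts.append("Conforms")
    # A fully conforming trace that does not reach the end place is incomplete.
    if verdicts and all(v == "Conforms" for v in verdicts) \
            and any(marking[p] == 0 for p in end):
        verdicts[-1] = "Incomplete"
    return verdicts

net = dict(places=["P_0", "P_1", "P_2"], transitions=["T_01", "T_12"],
           flows=[("P_0", "T_01"), ("T_01", "P_1"),
                  ("P_1", "T_12"), ("T_12", "P_2")],
           act_to_trans={"A": "T_01", "B": "T_12"},
           start=["P_0"], end=["P_2"])
# replay(["A"], **net)      -> ['Incomplete']
# replay(["A", "C"], **net) -> ['Conforms', 'C is an undesired activity']
# replay(["A", "B"], **net) -> ['Conforms', 'Conforms']
```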
5 Use Cases
This section demonstrates the applicability of Celonis PQL for solving real-world problems of
business users. First, we show how Celonis PQL is used to discover working capital optimiza-
tions. In our example, we identify early invoice payments to improve the on-time payment rate
(Section 5.1). Second, we demonstrate how Celonis PQL is used to identify ping-pong-cases in
IT service management processes in order to reduce ticket resolution times (Section 5.2). Third,
we show the application of Celonis PQL for detecting segregation of duties violations to prevent
fraud and errors in procurement (Section 5.3).
Query 1 shows a Celonis PQL statement for the calculation of the early payment ratio per
vendor. Using this query, the user is able to discover the vendors which have the highest ratio of
invoices paid more than three days early.
The distinction whether an invoice was paid more than three days before the due date is
made within the CASE WHEN statement (lines 7–17) by calculating the throughput time with
the CALC_THROUGHPUT function (lines 9–12). The CALC_THROUGHPUT operator takes the
timestamp of the first occurrence of activity ‘Clear Invoice’ and the timestamp of the first occur-
rence of activity ‘Due Date passed’ and calculates the difference. The second parameter, given as
REMAP_TIMESTAMPS operator (line 12), counts time units in the specified interval DAYS based
on the timestamps in the activity table to enable the calculation of the throughput time. As the
CALC_THROUGHPUT operator returns NULL if the end date is before the start date, the result
of the calculation is wrapped in the COALESCE (lines 8–14) operator to return 0 in these cases.
The result of the COALESCE operator is then compared to the specified three days (line 14). If the
result is greater than 3, the CASE WHEN statement returns 1; otherwise 0.
The whole CASE WHEN statement is wrapped in the AVG operator (lines 6–18), which
calculates the ratio of invoices paid more than three days early. By specifying the vendor name
("Invoice"."VendorName") as a dimension in the TABLE statement (lines 4–19), the ratio is calcu-
lated per vendor. To get the vendors with the highest ratio of early invoice payments, the result of
the AVG calculation is sorted in descending order by the ORDER BY statement (line 19).
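The described combination of CALC_THROUGHPUT, COALESCE, CASE WHEN, and AVG can be approximated in Python as follows. This is an illustrative sketch with made-up invoice data and our own helper names, not the original Query 1:

```python
from datetime import date

def early_payment_flag(clear_date, due_date, threshold_days=3):
    """1 if the invoice was cleared more than threshold_days before its due
    date, else 0. A negative span (cleared after the due date) is treated as
    0 days early, mirroring COALESCE(CALC_THROUGHPUT(...), 0)."""
    days_early = max((due_date - clear_date).days, 0)
    return 1 if days_early > threshold_days else 0

invoices = {  # vendor -> [(clear date, due date)]; illustrative data
    "Vendor X": [(date(2019, 1, 1), date(2019, 1, 10)),   # 9 days early
                 (date(2019, 1, 5), date(2019, 1, 6))],   # 1 day early
    "Vendor Y": [(date(2019, 2, 1), date(2019, 1, 20))],  # paid late
}
# AVG over the CASE WHEN flags, grouped by the vendor dimension:
ratio = {v: sum(early_payment_flag(c, d) for c, d in rows) / len(rows)
         for v, rows in invoices.items()}
# ratio == {'Vendor X': 0.5, 'Vendor Y': 0.0}
```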
Customer support within ITSM systems is usually carried out by creating a ticket for each
customer inquiry in the system and solving these tickets. Thus, an important key figure for ITSM
is the resolution time of a ticket. A ticket is ideally resolved without the interference of many
departments or teams. However, in so-called ping-pong-cases, a ticket is repeatedly going back
and forth between departments or teams. This is massively slowing down the resolution time. To
prevent this, the identification of ping-pong-cases is crucial.
Query 2 shows a Celonis PQL query to identify direct ping-pong-cases. A case in this context
is equivalent to a ticket. Direct ping-pong refers to tickets in which the same activity appears (at
least) two times with only one other activity in between, e.g. ‘Change Assigned Group’ directly
followed by ‘Review Ticket’ directly followed by ‘Change Assigned Group’.
The query calculates whether a ticket is a ping-pong-case or not within the CASE WHEN
statement (lines 4–10). If the current activity equals ‘Change Assigned Group’, the second next
activity is equal to the current activity and the next activity is not equal to the current activity, the
ticket is classified as ping-pong-case and the CASE WHEN statement returns the ticket ID.
1 TABLE(
2   "Tickets"."Country",
3   COUNT( DISTINCT
4     CASE
5       WHEN "Activities"."Activity" = 'Change Assigned Group'
6       AND "Activities"."Activity" = ACTIVITY_LEAD ( "Activities"."Activity", 2 )
7       AND "Activities"."Activity" != ACTIVITY_LEAD ( "Activities"."Activity", 1 )
8       THEN "Activities"."TicketId"
9       ELSE NULL
10     END
11   )
12   /
13   COUNT_TABLE ( "Tickets" )
14   AS "DirectPingPongRatio"
15 ) ORDER BY "DirectPingPongRatio" DESC;
The comparison between the current activity, the next activity, and the second next activity is
achieved by using the ACTIVITY_LEAD operator (lines 6 and 7). In general, the ACTIVITY_LEAD
operator returns the activity from the row that follows the current activity by offset number of
rows within a case. As the timestamp column of the activity table is defined in the data model,
the ACTIVITY_LEAD operator can implicitly rely on the correct ordering of events. The CASE WHEN
statement is wrapped in a COUNT operator (lines 3–11) to count the total number of ping-pong-cases.
By adding DISTINCT (line 3), it is guaranteed that a ticket is only counted once as a direct
ping-pong-case. The result is then divided by the total number of tickets, calculated by the
COUNT_TABLE operator (line 13), to obtain the ratio of direct ping-pong-cases per country.
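The semantics of ACTIVITY_LEAD and the resulting direct ping-pong check can be sketched in Python. The helper names and the example ticket are our own illustrations, not the engine's implementation:

```python
def activity_lead(activities, i, offset):
    """Activity `offset` rows after position i within a case, or None past the
    end of the case (emulating ACTIVITY_LEAD)."""
    return activities[i + offset] if i + offset < len(activities) else None

def is_direct_ping_pong(activities, marker="Change Assigned Group"):
    """True if `marker` reappears with exactly one other activity in between,
    matching the CASE WHEN condition of Query 2."""
    return any(act == marker
               and activity_lead(activities, i, 2) == act
               and activity_lead(activities, i, 1) != act
               for i, act in enumerate(activities))

ticket = ["Create Ticket", "Change Assigned Group",
          "Review Ticket", "Change Assigned Group", "Resolve Ticket"]
# is_direct_ping_pong(ticket) -> True
```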
Query 3 shows a Celonis PQL query to identify indirect ping-pong-cases. Indirect ping-pong
refers to tickets in which the activity ‘Change Assigned Group’ appears at least two times with
more than one other activity in between, e.g., ‘Change Assigned Group’, directly followed by
‘Review Ticket’, directly followed by ‘Do some work’, directly followed by ‘Change Assigned Group’.
The query shown in Query 3 calculates whether a ticket is an indirect ping-pong-case or not
by using the operators ACTIVATION_COUNT (line 7) and ACTIVITY_LEAD (lines 8 and 9) within
a CASE WHEN statement (lines 6–12).
ACTIVATION_COUNT returns, for every activity, how many times it has already occurred
(so far) in the current case. Within the CASE WHEN statement, the ticket ID is returned if the
ACTIVATION_COUNT is greater than 1 and the current activity is not equal to the last and the
second last activity. The latter comparison is calculated by the ACTIVITY_LAG operator. In
general, ACTIVITY_LAG returns the activity from the row that precedes the current activity by
offset number of rows within a case.
If one of the expressions in the WHEN-clause (lines 7–9) is FALSE, the CASE WHEN statement
returns NULL. As in the example for direct ping-pong-cases, the CASE WHEN statement is
wrapped in a COUNT operator (lines 5–13) to count the total number of ping-pong-cases. By
adding DISTINCT (line 5) to the COUNT operator, it is guaranteed that a ticket is only counted
once as an indirect ping-pong-case. The result of the COUNT operator is then, again, divided
by the total number of tickets to get the ratio of indirect ping-pong-cases. Thereby, the total
number of tickets is calculated using the COUNT_TABLE operator (line 15). The country
("Tickets"."Country", line 4) is specified as a dimension in the TABLE statement (lines 3–17)
to calculate the ratio per country.
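Analogously, ACTIVATION_COUNT and ACTIVITY_LAG can be emulated to classify indirect ping-pong-cases. Again, this is an illustrative Python sketch under our own names, not Celonis PQL:

```python
def activation_counts(activities):
    """For every event, how often its activity has occurred in the case so far
    (emulating ACTIVATION_COUNT)."""
    seen, counts = {}, []
    for act in activities:
        seen[act] = seen.get(act, 0) + 1
        counts.append(seen[act])
    return counts

def is_indirect_ping_pong(activities, marker="Change Assigned Group"):
    """True if `marker` reoccurs with more than one other activity in between:
    its activation count exceeds 1 while neither of the two preceding events
    (ACTIVITY_LAG with offsets 1 and 2) is `marker` itself."""
    counts = activation_counts(activities)
    for i, act in enumerate(activities):
        lag1 = activities[i - 1] if i >= 1 else None
        lag2 = activities[i - 2] if i >= 2 else None
        if act == marker and counts[i] > 1 and lag1 != act and lag2 != act:
            return True
    return False

indirect = ["Change Assigned Group", "Review Ticket",
            "Do some work", "Change Assigned Group"]
direct = ["Change Assigned Group", "Review Ticket", "Change Assigned Group"]
# is_indirect_ping_pong(indirect) -> True
# is_indirect_ping_pong(direct)   -> False (that case is direct ping-pong)
```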
1 TABLE(
2   "PurchaseOrders"."PurchaseOrganization",
3   AVG(
4     CASE
5       WHEN SOURCE ( "Activities"."Department",
6         REMAP_VALUES ( "Activities"."Activity",
7           [ 'Request Approval', 'Request Approval' ],
8           [ 'Grant Approval', 'Grant Approval' ],
9           NULL
10         )
11       ) = TARGET ( "Activities"."Department" )
12       THEN 1.0
13       ELSE 0.0
14     END
15   ) AS "SoDViolationRatio"
16 ) ORDER BY "SoDViolationRatio" DESC;
Query 4 shows a Celonis PQL query for the calculation of the ratio of purchase orders in which
the SoD for the activities ‘Request Approval’ and ‘Grant Approval’ was violated because the same
department executed both tasks. The ratio is calculated per purchase organization to discover
the ones with the highest violation ratio. Comparing whether the activities ‘Request Approval’
and ‘Grant Approval’ were executed by the same department is done within the CASE WHEN
statement (lines 4–14). The statement contrasts the source event department to the target event
department by using the SOURCE and TARGET operators (lines 5–11). A detailed description of
these operators can be found in Section 4.2. The REMAP_VALUES function (lines 6–10), passed
as a parameter to the SOURCE operator, extracts the activities ‘Request Approval’ and
‘Grant Approval’ by mapping each to its own name while mapping all other activities to
NULL. If the comparison between the source department and the target department returns
true, the CASE WHEN statement returns 1 (line 12); otherwise 0 (line 13).
The AVG operator (lines 3–15), in which the CASE WHEN statement is wrapped,
calculates the ratio of violations of the SoD. By specifying the purchasing organization
("PurchaseOrders"."PurchaseOrganization") as a dimension in the TABLE statement (lines
1–16), the ratio of violations is calculated per purchase organization. The result of the AVG
calculation is then sorted in descending order by the ORDER BY statement (line 16).
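The REMAP_VALUES filter and the SOURCE/TARGET comparison of Query 4 can be approximated in Python for individual purchase orders. The event data and helper names are our own illustrations, not the engine's implementation:

```python
def remap_values(activity, mapping):
    """Map an activity to a new value; unmatched activities yield the default
    NULL (None here), which SOURCE and TARGET then treat as a filter."""
    return mapping.get(activity)

def sod_violated(events, mapping):
    """events: ordered (activity, department) pairs of one purchase order.
    Keep only events whose activity survives the remapping and check whether
    two directly following kept events share the same department."""
    kept = [dept for act, dept in events
            if remap_values(act, mapping) is not None]
    return any(kept[i] == kept[i + 1] for i in range(len(kept) - 1))

mapping = {"Request Approval": "Request Approval",
           "Grant Approval": "Grant Approval"}
violating = [("Create PO", "Dept 1"),
             ("Request Approval", "Dept 2"),
             ("Grant Approval", "Dept 2")]   # same department approves
clean = [("Request Approval", "Dept 1"), ("Grant Approval", "Dept 3")]
ratio = sum(sod_violated(o, mapping) for o in [violating, clean]) / 2
# ratio == 0.5
```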
6 Implementation
Celonis PQL is the basis of a commercial product that promises interactive process mining
and business intelligence using data sets with hundreds of millions of events. For this reason,
the implementation of the language has to fulfill high requirements regarding performance,
scalability, and low latency.
The implementation targets business intelligence and process mining because our
experience is that process mining unfolds its full potential in combination with classic BI. Many
insights into customer data could only be derived by taking into account further dimensional
tables, in addition to the event log. For example, to find the country with the most segregation of
duties violations (see Section 5.3), information about the countries has to be available.
In the past, different types of software addressed two fields: BI and process mining. BI is the
domain of relational database systems. This is also reflected by the TPC-H benchmark [5], which is
the de facto standard benchmark for analytical databases. The benchmark portrays a wholesale
supplier in a data warehouse and focuses on classic BI questions, but it does not consider any
process mining aspects. As a result, databases perform well in answering BI questions, but they
are not optimized to answer process mining questions.
Besides the Celonis PQL implementation, process mining is done on relational databases,
specialized implementations, or graph databases [6]. Graph databases can be considered as
a reasonable choice, because a process instance can be interpreted as a graph. While they
deliver a decent performance for process mining, a graph database is not optimized for business
intelligence. This is why our objective was not to build upon an existing data processing solution.
Instead, we wanted to design a system from scratch which combines techniques from relational
and graph databases.
Like most state-of-the-art database systems, the Celonis PQL implementation is a main
memory database. This means that it uses main memory as primary storage instead of the disk
in order to avoid slow disk access. It is implemented in C++ and Java. C++ is used for all software
modules in which active control over the main memory is necessary, like the storage layer and
the performance-critical process mining algorithms. Java is used for non-performance critical
sections, like the parser, because of its memory safety.
The Celonis PQL Engine uses state-of-the-art techniques from the database research
community, such as just-in-time (JIT) compilation and dictionary encoding. JIT compilation
generates and compiles code at query time to execute a query. It achieves good
cache locality and a low number of CPU instructions, resulting in very high performance, as
shown by Neumann et al. [11]. Dictionary encoding is a standard compression technique to
reduce the memory overhead [10].
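Dictionary encoding itself is simple to illustrate. The following Python sketch shows the principle on an activity column; it is an illustration of the general technique, not the engine's actual implementation:

```python
def dictionary_encode(column):
    """Replace each string by a small integer code plus a shared dictionary.
    For repetitive event-log columns, every distinct string is stored once."""
    dictionary, codes = {}, []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes, dictionary

activity_column = ["Create PO", "Approve", "Create PO", "Approve", "Pay"]
codes, dictionary = dictionary_encode(activity_column)
# codes == [0, 1, 0, 1, 2]
# dictionary == {'Create PO': 0, 'Approve': 1, 'Pay': 2}
```

Since event logs typically contain few distinct activity names relative to the number of events, the integer codes are much smaller than the original strings.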
The Celonis PQL Engine implementation is also focused on scaling with the number of CPU
cores within one server. The challenge here is that the implementation has to be lightweight
enough to run on commodity laptop hardware, while it has to be sophisticated enough to make
use of all the power a high-end server provides. This is achieved by intra-query parallel
execution; refer to Leis et al. [9] for details.
The Celonis PQL Engine implementation, however, is not designed to process queries across
multiple servers. This is a conscious decision because the Celonis PQL Engine needs to provide
results with low latency. Synchronizing multiple machines across the network to execute a Celonis
PQL query adds overhead, which is against the low latency goal. The work of Schüle et al. [14]
supports our single-node approach by demonstrating that a single server can handle even
large scale applications like Wikipedia. To support such applications, lightweight in-memory
compression techniques have to be in place. The Celonis PQL Engine implementation uses an
approach for in-memory compression which is inspired by the work of Lang et al. [7].
[Diagram: the Process Querying Framework with its Model/Record/Correlate, Prepare, Execute, and Interpret parts, instantiated by Celonis PQL components such as caching, indexing, filtering, and optimizing, and by the Process Analytics, Conformance, Action Engine, Transformation Center, Process Automation, and Machine Learning Workbench applications.]
Fig. 15: Celonis PQL in the context of the Process Querying Framework
The query intent of Celonis PQL is limited to create and read. While all supported kinds of
behavioral models can be read, process models and correlation models can also be created
by Celonis PQL queries (e.g., by process discovery and conformance checking). The update and
delete query intents are not included – especially for the event logs – as they should always stem
from the source systems. Therefore, event log updates can be achieved by delta loads which
regularly extract the latest data from the source systems. The process querying instruction is
usually defined by an analyst through a user interface. For example, the user defines the columns
to be shown in a table, which can be considered as the query conditions. The selections from the
user interface are then formalized into a Celonis PQL query.
The Prepare part of the framework focuses on increasing the efficiency of the query processing.
The Celonis PQL Engine – which processes the queries – maintains a cache for query results
(see Section 6). After the application starts, it warms up the cache with the most relevant queries
derived from the Process Querying Statistics to provide fast response times. According to [13],
the Indexing component includes not only classical index structures but also all kinds of data
structures for the efficient retrieval of data records. It is covered by the dictionary encoding of
columns, as discussed in Section 6.
The Execute part of the framework combines an event log with an optional process model
and a Celonis PQL query into a query result which can be either a process model, KPIs, filtered
and processed event log data, or conformance information. The concrete input and output of the
query depend on the selected query intent and the query conditions. The Filtering component
reduces the input data of the query. This can either be achieved by the REMAP_VALUES operator
and the filter column of the SOURCE and TARGET operators, as described in Section 4.2, or by
the general filter statement shown in the example in Figure 6. The Optimizing component uses
basic database technology to rewrite the query and create the Execution Plan, which describes a
directed graph of operator nodes. The Process Querying component then executes the execution
plan on the filtered data. It also retrieves data from the cache to avoid re-computation of either
the full query or certain parts of it which are shared with previous queries.
The Interpret part of the framework communicates the query results to the user and improves
the user’s comprehension of them. The applications in the Celonis IBC platform incorporate
Celonis PQL and make the results accessible to the user. The Process Analytics presents the query
results as process graphs, charts and tables. Beyond pure visualization, it is highly interactive
with dynamic filtering to drill down the processes to specific cases of interest. This interactivity
offered by all GUI components is achieved through the dynamic creation of Celonis PQL queries.
The Transformation Center supports process monitoring. It historicizes the query results to show
how the processes evolved over time. Finally, the Machine Learning Workbench provides a platform
for user-defined machine learning analyses over event logs and retrieves the event data using
Celonis PQL queries.
As Celonis PQL comprises more than 150 different operators to process event data, we could
only provide an overview of the major language features that are currently offered to users and
showcase the expressiveness of the language with a few examples. Besides the description of
the language, we illustrated the application of Celonis PQL within the various products available
in Celonis IBC. The presented statistics show the extensive usage of Celonis PQL within these products.
In addition, we presented the applicability of the query language for solving different real-world
problems customers are facing, such as fraud prevention with segregation of duties and speeding
up service requests by identifying ping-pong-cases. Finally, we described the position of
Celonis PQL within the Process Querying Framework (PQF) [13]. Celonis PQL instantiates all parts
of the PQF, except for the capability to simulate models. Moreover, create and read query intents
are covered.
Future work on Celonis PQL will focus on the implementation of new operators to further
enrich the capabilities and use cases of the query language. Additionally, efforts will be made
to improve query performance. New features will be developed in co-innovation projects with
academic and commercial partners, and with our customers.
Readers can access a wide range of PQL functionalities for free. Business users can use
Celonis PQL in the free Celonis Snap2 version. Academic users can get free access to the full
Celonis IBC technology including the wide range of Celonis PQL capabilities via the Celonis IBC -
Academic Edition3.
2 https://www.celonis.com/snap-signup
3 https://www.celonis.com/academic-signup
JIT.............................................Just-In-Time
SLA...........................................Service-Level Agreement
SoD..........................................Segregation of Duties
This work is protected by copyright and other applicable intellectual property laws and exclu-
sively owned by Celonis SE, its affiliates and licensors. It is provided subject to the Creative
Commons Attribution 4.0 International License as “open access content”. Unauthorized copying,
modification, distribution and other use are prohibited. Nothing herein contains any statement,
warranty or otherwise binding information in relation to Celonis’ products and/or their current
or future development. A publication of the final and entire version of the work will be made by
Springer Nature Switzerland AG.