SAP Signavio Process Intelligence SIGNAL Reference
2024-05-23
2 Tutorial
2.1 Understand the sample process
2.2 Count cases and cities
2.3 Analyze order amounts
2.4 Determine case cycle times
2.5 Investigate events
Query language of SAP Signavio Process Intelligence optimized for performing process mining tasks on large
amounts of event data
SAP Signavio Analytics Language (SIGNAL) is a specialized query language for process analysis.
The language is based on SQL. Like SQL, you use queries to retrieve data and perform calculations on the data.
However, it is not possible to change or delete process data.
The difference from SQL lies in the data model. While you usually query data from multiple tables with SQL, SIGNAL
queries the data from only one table, which contains nested events. In addition, SIGNAL provides numerous
custom functions to work more effectively with this data structure.
SIGNAL is optimized for process mining, for example to determine conformance, cycle times, and rework, and
it supports exploration at scale by all kinds of SAP Signavio Process Intelligence users.
With SIGNAL, you can only retrieve data from processes to which you have access.
Data model
When mining the data of a process, you retrieve the data of a single table. This table contains the case
attributes and their nested events and event attributes. The following table shows this nested structure.
• per case
Each case is treated as one row. The nested events and event attributes are represented as a nested table.
• per event
Each event is treated as one row. The case ID and case attributes are repeated for each event.
Data types
The data type of a column defines which values the column can hold. All data types can occur at case level as
well as at event level (nested).
• Strings
• Numbers stored as double precision floating point
• Timestamps stored with millisecond precision, without time zone information.
• Durations stored with millisecond precision
• Booleans
All of these data types can appear in the source file and in the query result.
Both case and event attributes can be Null, indicating the absence of a value or an unknown value.
Process mining
SIGNAL follows a specific syntax that is described throughout this documentation using a special notation.
The syntax of SIGNAL is based on SQL but enhanced with functions to run in-depth process analysis queries.
All queries always follow this basic structure:
SELECT expressions
FROM table/process
WHERE conditions
Example
This query counts the cases in the declared table for which the condition is true.
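A minimal sketch of a query with this basic structure. The attribute name City and the value 'Boston' are assumptions chosen for illustration, not names from a specific process:

```sql
-- Counts the cases whose City attribute equals 'Boston'.
-- City and 'Boston' are illustrative assumptions.
SELECT COUNT(case_id)
FROM THIS_PROCESS
WHERE City = 'Boston'
```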
Syntax Notation
The following notation is used to describe the syntax of SIGNAL. Note that this notation isn't part of the actual
query:
• Angle brackets indicate a required element. Don't include the angle brackets as part of a query.
Keywords
SIGNAL keywords are case-insensitive, but are by convention written upper-case to distinguish them from
expressions. Read more on keywords in section SIGNAL keywords [page 203].
Attribute Names
Attribute names are case-insensitive. In the following cases, the attribute name must be enclosed in double
quotes:
Attribute Values
Attribute values are, where applicable, case-sensitive. Attribute values of type String, Timestamp, and Duration
must always be enclosed in single quotes.
For SIGNAL, the following pre-defined semantic attributes are always present:
The syntax rules for semantic attributes are the same as for the other attributes.
1.2 Aliases
Learn about aliases in SIGNAL, the process mining query language of SAP Signavio Process Intelligence.
An alias gives a column in the result set a temporary name, which makes the column headings easier to read.
It's common to alias a column when using an aggregate function in a query. Without an alias, a name is
generated based on the column and operations in the expression. Read more in section SIGNAL
functions [page 83].
Syntax:
expression AS alias_name
Example:
No. of cases
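A query producing the column heading shown above might look like the following sketch. The choice of COUNT(case_id) as the aliased expression is an assumption based on the heading:

```sql
-- The alias replaces the generated column name in the result set.
-- Double quotes are used because the alias contains spaces and a period.
SELECT COUNT(case_id) AS "No. of cases"
FROM THIS_PROCESS
```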
Use the SELECT statement to select data from a process and return it in a result set.
Syntax
Clause Description
FROM Specifies the table in your process from which you want to retrieve the data. You
can reference the process by using the explicit Process ID, which can be found
on the API tab in the process settings page. Alternatively you can use the alias
THIS_PROCESS to refer to the default view.
TABLESAMPLE Specifies the absolute or percentage table fraction to be considered for the query.
WHERE Specifies the condition that must be met for cases to be selected. If this clause
isn't provided, then all records are selected.
UNION ALL Concatenates the result sets of two or more SELECT statements.
GROUP BY Collects data across multiple records and groups the results by one or more
columns. The GROUP BY clause requires an index similar to the ORDER BY
clause. You can use one or multiple indices.
ORDER BY Sorts the records in the result set. If more than one index is provided, separate
them with a comma.
ASC sorts the result set in ascending order by expression, DESC sorts it in
descending order.
NULLS FIRST sorts the result set with null values first, NULLS LAST with null
values last.
FILL Fills any gaps inside a time series column by inserting new rows into the result
set, each containing missing timestamps.
OFFSET Specifies the starting point to return rows from a result set.
Column Expressions
The SELECT statement is used to structure the rows of your result set.
Syntax:
Each column expression specifies how to populate the values in that particular column.
For example, you can select a column from the table referred to by tableExpression. Doing so populates
your result set with the values from the specified column in the table.
Example
1 C_1023 Received 1
2 C_1198 Received 3
3 C_1212 Dispatched 2
This query returns a result set with one column containing all the values from the process data's customer
ID column:
Customer ID
C_1023
C_1198
C_1212
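The query described above can be sketched as follows, assuming the column is named "Customer ID" as in the result heading:

```sql
-- Selects a single column; each case contributes one row.
SELECT "Customer ID"
FROM THIS_PROCESS
```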
A column expression can also be a literal value. Such a value is repeated in that column for each row in the
result set.
Example
SELECT case_id, 1
FROM THIS_PROCESS
This query creates a result set with two columns. The first column is populated with the values from
case_id in THIS_PROCESS. The second column is populated with the literal value 1.
case_id 1
1 1
2 1
3 1
You can also create column expressions by combining values, column names, operators or function calls.
Example
In this query, the result set's second column is populated with the values from the "Order Quantity" column
in THIS_PROCESS, but in each case that value is increased by 1.
C_1023 2
C_1198 4
C_1212 3
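A sketch of the query behind this result, assuming the first column is "Customer ID" (inferred from the values shown) and the second is derived from "Order Quantity":

```sql
-- The second column combines a column name, an operator, and a literal.
SELECT "Customer ID", "Order Quantity" + 1
FROM THIS_PROCESS
```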
You can use the * operator in a SELECT statement to select all columns of a table under the following
conditions:
Example
SELECT case_id,
"Customer ID",
"Order Quantity"
FROM (SELECT * FROM THIS_PROCESS) AS sub
This query uses a subquery to select all columns from THIS_PROCESS. From them, the outer query then
selects an explicit subset of columns:
1 C_1023 1
2 C_1198 3
3 C_1212 2
Including Clauses
When building a SIGNAL query, add any of the optional clauses in the order they appear in the syntax.
Example
This query returns the case ID, customer ID and order quantity of the first 2 cases in the table. The clause
ORDER BY 3 DESC orders the result set by the third column in descending order.
2 C_1198 3
3 C_1212 2
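The query described in this example might look like the following sketch. The column names are taken from the earlier examples. Placing LIMIT after ORDER BY is an assumption based on the clause order used elsewhere in this reference:

```sql
-- Sorts by the third selected column (order quantity), largest first,
-- then keeps only the first 2 rows.
SELECT case_id, "Customer ID", "Order Quantity"
FROM THIS_PROCESS
ORDER BY 3 DESC
LIMIT 2
```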
Related Information
Learn about the SELECT (DISTINCT) clause in SIGNAL, the process mining query language of SAP Signavio
Process Intelligence.
The SELECT DISTINCT clause returns unique values of a specified column. If NULL values are present, they are
included. This function is not supported in subqueries.
Parameter Description
table The process table or view from which you want to retrieve
data. It is referenced by explicit Process ID or the alias
THIS_PROCESS.
This query returns all unique city names from the city column in the default process view, sorted in ascending
order. Duplicate values are removed from the result set. Therefore every city name appears only once.
This query returns a unique list of event names in ascending order. Using AS "Event Name:" assigns a column
alias to the result. Using the FLATTEN expression flattens the nested event data in the process, thereby
representing each nested event as a single row containing case and event properties. Duplicate values are then
removed from the result set so that every event name only appears once.
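The two queries described in this section can be sketched as follows. The attribute name city is an assumption. The alias and the use of FLATTEN follow the descriptions above:

```sql
-- Unique city names from the default view, in ascending order.
SELECT DISTINCT city
FROM THIS_PROCESS
ORDER BY 1 ASC
```

```sql
-- Unique event names; FLATTEN turns each nested event into its own row.
SELECT DISTINCT event_name AS "Event Name:"
FROM FLATTEN(THIS_PROCESS)
ORDER BY 1 ASC
```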
Learn about the FROM clause in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
The FROM clause specifies the table in your process from which you want to retrieve the data.
THIS_PROCESS is the alias or temporary name assigned to the table that was set as default view in your
process. By default, the FROM clause fetches data from the default view.
FROM table
Parameter Description
table The process table or view from which you want to retrieve
records. You can reference the process by using the explicit
Process ID, which can be found on the API tab in the
process settings page. Alternatively, you can use the alias
THIS_PROCESS to refer to the default view.
To retrieve the list of unique cities available in your process, run the following query.
Learn about the WHERE clause in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
The WHERE clause is used to filter the data and apply conditions to the SELECT statement. This lets you
retrieve subsets of table rows based on set conditions.
Syntax:
WHERE condition
condition The condition that must be met for records you want to
select.
Symbols
= (equal to)
The following query retrieves the total order amount of the purchases made in the city of Boston. It retrieves
only the Order Amount column values that match the WHERE condition.
To get the list of case IDs of the orders from the city of Miami whose order status is delivered, run the following query.
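The two WHERE examples above might be written as in the following sketches. The attribute names city and "Order Status", the value 'Delivered', and combining conditions with AND are assumptions for illustration:

```sql
-- Total order amount of all cases from Boston.
SELECT SUM("Order Amount")
FROM THIS_PROCESS
WHERE city = 'Boston'
```

```sql
-- Case IDs of delivered orders from Miami.
-- "Order Status" and 'Delivered' are hypothetical names.
SELECT case_id
FROM THIS_PROCESS
WHERE city = 'Miami' AND "Order Status" = 'Delivered'
```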
The FILTER clause is used to filter data inside aggregations. This lets you include or exclude specific cases or
events from aggregations in your query.
Syntax
Parameter Description
Example
The following query finds the duration between two events: first receipt of an order and completion of payment
for that order. It does this by executing a subquery, querying event-level data. In obtaining the timestamp of the
final payment, only events named 'Receive Payment' are included by the filter. Because an order may include
multiple payments, LAST is applied to the resulting aggregation to obtain the latest timestamp.
SELECT
case_id,
(SELECT LAST(end_time) FILTER (WHERE event_name = 'Receive Payment') -
FIRST(end_time) FILTER (WHERE event_name = 'Receive Customer Order')
) AS PaymentTurnaround
FROM THIS_PROCESS
Example output:
The FILTER EVENTS clause evaluates a given boolean expression on the event attributes and removes from the
result set those events for which the expression evaluates to false. This stands in contrast to the WHERE clause,
which removes entire cases from a result set.
Note
Because this clause removes events rather than entire cases, it's possible for a result set to include cases
with empty event lists. Refer to Example 3 [page 22] for a demonstration of how to remove such cases
from a result set.
Parameter Description
Restriction
MATCHES and BEHAVIOR MATCHES expressions can't be nested. Consequently, these expressions can't be
used in the booleanExpression of a FILTER EVENTS clause.
If a query contains both a FILTER EVENTS clause and a WHERE clause, the FILTER EVENTS clause is
evaluated before the WHERE clause. Having this evaluation order enables the elimination of particular events
before cases as a whole are filtered further. Example 4 demonstrates this behavior.
The expressions defined in the FILTER EVENTS and WHERE clauses are evaluated after the FROM clause.
Consequently, using event-level filtering in combination with the FLATTEN operator isn't possible because
flattening transforms all event-level columns to case level.
Tip
As an alternative, you can use event-level filtering in a subquery and then flatten the result.
The booleanExpression can reference both case and event-level attributes. Refer to Example 2 for a
demonstration.
Usage of the FILTER EVENTS clause requires that an event-level attribute appears in at least one of the
following (otherwise, an error is returned):
Example 1
The following query filters out all events where the payment amount is greater than 300.
SELECT
case_id,
"Order Amount",
event_name,
"Payment Amount"
FROM "THIS_PROCESS"
FILTER EVENTS WHERE "Payment Amount" > 300
Output:
Example 2
Using the same process data as the previous example, the following query demonstrates how both case and
event-level attributes can be included in a FILTER EVENTS clause.
SELECT
case_id,
"Order Amount",
event_name,
"Payment Amount"
FROM "THIS_PROCESS"
FILTER EVENTS WHERE "Payment Amount" < "Order Amount"
As the example data shows, all orders were paid for in one complete payment with the exception of case
'00009', which was paid for in two smaller installments. Therefore, this query filters out all events except the
payment-related events of case '00009'.
Example 3
Event-level filtering can leave behind cases with empty event lists. To filter out such cases, combine the FILTER
EVENTS clause with a WHERE clause as in the following query (which adapts the query from Example 2):
SELECT
case_id,
"Order Amount",
event_name,
"Payment Amount"
FROM "THIS_PROCESS"
FILTER EVENTS WHERE "Payment Amount" < "Order Amount"
-- Eliminate empty event lists
WHERE (SELECT COUNT(event_name) > 0)
This method takes advantage of every event automatically having an event_name attribute. Event lists
featuring no event_name can therefore be considered as empty.
Example 4
Again using the process data from Example 1, this query further demonstrates combining the FILTER EVENTS
and WHERE clauses.
SELECT
case_id,
"Order Amount",
event_name,
"Payment Amount"
FROM "THIS_PROCESS"
FILTER EVENTS WHERE event_name IN ('Receive Customer Order', 'Receive Payment',
'Receive Delivery Confirmation')
WHERE event_name MATCHES (^'Receive Customer Order' -> 'Receive Payment' ->
'Receive Delivery Confirmation')
The FILTER EVENTS clause keeps only those events named 'Receive Customer Order', 'Receive Payment', or
'Receive Delivery Confirmation'. The containing cases are filtered further by the WHERE clause, which uses a
matching expression. Only cases whose event lists follow an expected pattern are kept, that pattern being:
1. Receiving an order
2. Receiving a single payment
3. Receiving delivery confirmation
All cases follow this pattern apart from case '00009', which, as earlier examples showed, involves multiple
payments.
Related Information
The GROUP BY clause groups rows by one or more columns and computes aggregate functions for each group.
Columns are referred to by a number, which reflects their position in the SELECT clause. For example, GROUP
BY 2 groups the result by the second expression in the SELECT clause. All rows that share the same values for
the grouped expression are condensed into a single row.
Syntax
Parameter Description
Remember
In a GROUP BY clause, you cannot refer to columns by
their name.
The GROUP BY clause is often used with aggregate functions, such as COUNT, MAX, or AVG. The aggregate
function is computed across all rows of each group and returns a separate value for each group. To specify the
rows to be considered for the aggregation, you can apply the FILTER clause to the aggregate function.
The GROUP BY clause is optional. If the GROUP BY clause is not present, then the following applies:
• If there are aggregate and non-aggregate expressions in the SELECT statement, then the result is
automatically grouped by any non-aggregate expressions.
• If there are only aggregate expressions in the SELECT statement, then the result is a single group
comprising all the selected rows.
• You must group by all expressions in the SELECT statement that are not encapsulated by an aggregate
function. Exception: The ungrouped expression is functionally dependent on a grouped expression (see
Example 2).
Grouping by Events
If a case-level attribute is chosen for the grouping, the result set is grouped according to the values present in
that attribute. If an event-level attribute is chosen, the result set is grouped by the list of identical sequences of
events. This can be used to identify process variants.
Example 1
GROUP BY 1, 2 groups by the expressions in the SELECT statement that are not encapsulated by an
aggregation function. Since this SELECT statement contains two expressions (city and region), the GROUP
BY index must refer to both expressions. GROUP BY 1 or GROUP BY 2 is not valid in this case.
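A sketch of a query matching this description. The aggregate COUNT(case_id) is an assumption, since the text names only the city and region expressions:

```sql
-- Both non-aggregate expressions (positions 1 and 2) must be grouped.
SELECT city, region, COUNT(case_id)
FROM THIS_PROCESS
GROUP BY 1, 2
```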
Example 2
To count the case IDs per order amount, showing both the actual order amount and the order amount multiplied
by 2.50, run the following query.
Since the second expression order_amount*2.50 is functionally dependent on the first column
order_amount, GROUP BY 1 is valid in this case.
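A sketch of such a query, using the expressions named in the explanation. The position of the COUNT column is an assumption:

```sql
-- order_amount * 2.50 depends only on order_amount,
-- so grouping by the first column is sufficient.
SELECT order_amount, order_amount * 2.50, COUNT(case_id)
FROM THIS_PROCESS
GROUP BY 1
```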
Caution
For very large data sets, this operator may require excessive CPU activity, causing long query execution
times.
Syntax
columnIndex The index of a column used to sort the records in the result
set. If more than one expression is provided, separate the
values with a comma.
• The direction of sorting: ASC sorts the column in ascending order. DESC sorts it in descending order. If
neither keyword is specified, the default sorting is ascending.
• The position of NULL values: NULLS FIRST sorts a column so that NULL values are presented first. NULLS
LAST sorts a column so that NULL values are presented last. If neither is specified, the default position is
NULLS FIRST.
Example
This query returns cases in the process sorted by the order amount in descending order.
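One way to write such a query, assuming the order amount is stored in a case attribute named "Order Amount":

```sql
-- Sorts by the second selected column, largest order amount first.
SELECT case_id, "Order Amount"
FROM THIS_PROCESS
ORDER BY 2 DESC
```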
The FILL clause fills any gaps inside a time series column. It does so by inserting new rows into the result set,
each containing missing timestamps.
As part of the fill, the precision must be specified. Assume, for example, you specify a precision level of 'days'. If
the time series column contains the values 'Mon, 4th September 2023' and 'Thu, 7th September 2023' with no
intervening values, then two rows would be inserted containing the missing dates ('Tue, 5th September 2023'
and 'Wed, 6th September 2023') in the time series column.
This precision level is part of a larger fill specification you need to provide, which instructs SIGNAL how to
populate the columns in all inserted rows.
Syntax
Parameter Description
tableExpression The process table or view from which you want to retrieve
records, referenced by explicit Process ID or the alias
THIS_PROCESS.
Note
The number of fill specifications provided must equal
the number of columns selected. Under certain circum-
stances, missing specifications are added automatically.
See Writing Fill Specifications [page 28] for more infor-
mation.
• TIMESERIES(<precision>): Defines the precision of the filling, determining with what regularity missing
entries are filled in the time series. Exactly one fill specification must be a TIMESERIES.
The precision parameter accepts a string literal specifying the precision level. The following levels are
supported:
• 'year'
Each fill specification matches with the column at the same index. In other words, the first specification in the
FILL clause corresponds to the first column in the SELECT clause, the second specification corresponds to the
second column, and so on.
SELECT
col1,
col2,
col3
FROM THIS_PROCESS
FILL
TIMESERIES('day'), -- corresponds to col1
GROUP, -- corresponds to col2
NULL -- corresponds to col3
Each column must have a fill specification. Any specifications you fail to provide are assumed to be of type
NULL and implicitly added to the end of the FILL clause. Therefore, the following query is equivalent to the
previous query:
SELECT
col1,
col2,
col3
FROM THIS_PROCESS
FILL
TIMESERIES('day'),
GROUP
-- NULL added here implicitly
However, if a NULL specification falls between TIMESERIES and GROUP specifications, then the NULL
specification must be added explicitly.
SELECT
col1,
col2,
col3
FROM THIS_PROCESS
FILL
TIMESERIES('day'),
NULL,
GROUP
The FILL clause isn't only useful for filling missing timestamps in the overall result set. It can also group the
values in a specified column and then add rows containing all missing timestamps to each group.
Remember
Example
If we fill the time series with missing dates without specifying a GROUP column, then the missing dates are
added independently of the other columns. The newly added rows would therefore be populated with NULL
values in all other columns.
SELECT
Received,
City,
"Customer Type"
FROM THIS_PROCESS
FILL TIMESERIES('day')
By specifying City as a group, all missing dates are added for each group of cities. Within each group, the
value of City is repeated for each inserted row.
SELECT
Received,
City,
"Customer Type"
FROM THIS_PROCESS
FILL TIMESERIES('day'), GROUP
The GROUP BY clause is applied after the filling. It's not related to the GROUP fill specification.
The FILL clause influences the definition and output of the ORDER BY clause. The FILL clause returns the
filled time series for each group (as defined by the GROUP specification) in ascending order. Consequently, in
the ORDER BY clause, the group column must be followed by the ascending time series column (as defined by
TIMESERIES(<precision>)).
Note
If no ORDER BY is provided, the output is first grouped by the GROUP specified column and then by the
TIMESERIES specified column.
This example builds a query for showing the types of orders received on a daily basis. An overview of every day
should be shown but not every type of order is sold every day, so we use FILL to fill any gaps. There are two
types of customer: 'Standard' and 'Premium'.
SELECT
DATE_TRUNC('day', (SELECT FIRST(end_time))) AS "Date",
"Customer Type"
FROM THIS_PROCESS
GROUP BY 1, 2
Result:
The next step is to add extra rows for each missing order type on each day. Specifically, we group the values in
the Customer Type column into two groups and then add new rows for the missing days within each group. The
value for the Customer Type column in the new rows takes the value of the current group.
To do that, add a FILL clause. TIMESERIES('day') identifies the first column as the time series. GROUP
identifies the Customer Type as the group column.
SELECT
DATE_TRUNC('day', (SELECT FIRST(end_time))) AS "Date",
"Customer Type"
FROM THIS_PROCESS
GROUP BY 1, 2
FILL
TIMESERIES('day'),
GROUP
Finally, we select the number of orders for each group on each date. The corresponding fill specification is
NULL, meaning that column is filled with null values in the new rows.
SELECT
DATE_TRUNC('day', (SELECT FIRST(end_time))) AS "Date",
"Customer Type",
COUNT(case_id)
FROM THIS_PROCESS
GROUP BY 1, 2
FILL
TIMESERIES('day'),
GROUP,
NULL
Related Information
Learn about the LIMIT clause in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
With the LIMIT clause, you can specify the number of rows to return. Normally, you use this clause together
with the ORDER BY clause.
Without a LIMIT clause, the result set is limited to 500 rows by default. By setting the LIMIT clause, you can
decrease or increase the default limit.
Syntax:
SELECT expression
FROM table
LIMIT number
Parameter Description
table The process table or view from which you want to retrieve
records, referenced by explicit Process ID or the alias
THIS_PROCESS.
Example:
This query returns the first 10 rows, with the customer IDs sorted in descending order. It returns only the
customer IDs of orders from the city of Boston.
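A sketch of the query described here. The attribute names "Customer ID" and city are assumptions carried over from earlier examples:

```sql
-- Filters to Boston, sorts customer IDs in descending order,
-- and returns at most 10 rows.
SELECT "Customer ID"
FROM THIS_PROCESS
WHERE city = 'Boston'
ORDER BY 1 DESC
LIMIT 10
```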
Learn about the TABLESAMPLE clause in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
With the TABLESAMPLE clause, you can specify either the absolute or percentage table fraction to be
considered for the query. The table fraction is sampled randomly after the FROM clause is evaluated and
before all other clauses.
Syntax:
SELECT expression
FROM table
TABLESAMPLE sampling method (sampling amount) REPEATABLE (seed)
Parameter Description
table The process table or view from which you want to retrieve
records, referenced by explicit Process ID or the alias
THIS_PROCESS.
Example:
This query returns 10 rows sampled randomly from the specified table.
For TABLESAMPLE EXACT(10 PERCENT) REPEATABLE(123), the query returns 10 percent of the rows
sampled randomly from the specified table.
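A minimal sketch of a complete query using the percentage variant quoted above. The selected column is an illustrative choice:

```sql
-- Samples 10 percent of the rows; the seed 123 makes the sample repeatable.
SELECT case_id
FROM THIS_PROCESS
TABLESAMPLE EXACT(10 PERCENT) REPEATABLE(123)
```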
Learn about the OFFSET clause in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
With the OFFSET clause, you can specify the starting point to return rows from a result set.
Syntax:
SELECT expression
FROM table
LIMIT number
OFFSET offset_number
Parameter Description
table The process table or view from which you want to retrieve
records, referenced by explicit Process ID or the alias
THIS_PROCESS
offset_number The number of rows to skip from the top of the table.
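As an illustration of the syntax above, the following sketch skips the first 20 rows and returns the next 10. The numbers and the ORDER BY column are arbitrary assumptions. Sorting first makes the skipped rows predictable:

```sql
-- Returns rows 21 to 30 of the sorted result set.
SELECT case_id
FROM THIS_PROCESS
ORDER BY 1 ASC
LIMIT 10
OFFSET 20
```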
The FLATTEN operator flattens a table so that each nested event becomes a top-level row. Case attributes are repeated accordingly.
Caution
For very large data sets, this operator may require excessive CPU activity, causing long query execution
times.
The flattened table allows aggregations based on event names or other event attributes.
Remember
Do not combine the FLATTEN operator with the MATCHES operators. The MATCHES operators work only with
nested tables and not with flattened tables.
Syntax
FLATTEN(<tableName>)
Example 1
Example 2
This query returns the total number of cases in which each event occurs.
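A sketch of how such a count might be written. It assumes COUNT(case_id) over the flattened table, so a case in which the same event occurs several times would contribute once per occurrence:

```sql
-- After flattening, each event is one row, so rows can be grouped by event name.
SELECT event_name, COUNT(case_id)
FROM FLATTEN(THIS_PROCESS)
GROUP BY 1
```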
Learn about the UNION ALL operator in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
With the UNION ALL operator, you can combine the result sets of two or more SELECT statements. The
number of columns and the column data types must match for each SELECT statement when using the
UNION ALL operator.
Syntax:
SELECT expression
FROM tables
[WHERE conditions]
TABLESAMPLE sampling method (sampling amount) REPEATABLE (seed)
ORDER BY order_index [, ...] [ASC | DESC] [NULLS FIRST | NULLS LAST]
UNION ALL
SELECT expression
FROM tables
[WHERE conditions]
TABLESAMPLE sampling method (sampling amount) REPEATABLE (seed)
ORDER BY order_index [, ...] [ASC | DESC] [NULLS FIRST | NULLS LAST]
Parameter Description
tables The tables from which you want to retrieve records. At least
one table must be listed in the FROM clause.
conditions (Optional) Conditions that must be met in order for the
records to be selected.
SELECT statements before and after UNION ALL can only have the following clauses:
SELECT expressions
FROM table
[TABLESAMPLE]
[WHERE]
[GROUP BY]
Any other clauses, for example a further UNION ALL, ORDER BY, FILL, LIMIT, and OFFSET, are applied to the result of
UNION ALL.
Example 1:
This query returns a combined result set of the case_ids from tables in a column.
This query returns a combined result set of dates where an event was either open or closed on the given date.
A time-stamp (ts) is used, where 1 means the event is open and 0 means the event is closed on the respective
dates.
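Example 1 above (combining case IDs into one column) might be sketched as follows. Because no second process ID is given in this section, the sketch selects twice from THIS_PROCESS with different filters. The attribute city and its values are assumptions:

```sql
-- Both SELECT statements return one column of the same data type.
SELECT case_id
FROM THIS_PROCESS
WHERE city = 'Boston'
UNION ALL
SELECT case_id
FROM THIS_PROCESS
WHERE city = 'Miami'
```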
There are two types of subquery: general and event-level. They differ based on the level at which they operate,
and what types of data they return.
Related Information
General subqueries operate at case level and return tables. They're used in combination with the FROM clause
and are useful for adapting result sets before selecting from them.
Syntax
The syntax of a general subquery follows that of a standard query. The difference comes from its placement.
Parameter Description
Remember
A general subquery always needs an alias, otherwise the
query is invalid.
A general subquery follows the same syntax. However, the subquery is part of a FROM statement, providing
to its containing query a table from which to select data. As it builds the table, the subquery can adapt the
contents, for example by filtering or applying functions, providing the outer query with a customized data set.
Because a general subquery follows this same syntax, it's possible for a subquery to also select from a further
nested subquery. This recursive behavior allows you to nest subqueries to an arbitrary extent.
Examples
Example
This query uses a general subquery to group orders by city and calculate their number. From the resulting
table, only those rows having a number of cases greater than 500 are selected.
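The query described here might look like the following sketch. The attribute name city and the aliases are illustrative assumptions:

```sql
-- The subquery builds a table of case counts per city;
-- the outer query keeps only cities with more than 500 cases.
SELECT city, "Number of cases"
FROM (
  SELECT city, COUNT(case_id) AS "Number of cases"
  FROM THIS_PROCESS
  GROUP BY 1
) AS sub
WHERE "Number of cases" > 500
```

The alias after the closing parenthesis is needed because a general subquery without an alias is invalid.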
A general subquery can select from either a table, such as THIS_PROCESS, or from another nested subquery.
This nested subquery might in turn select from a further nested subquery and so on. Let's look at an example.
Example
The following query shows the average payment amounts per city:
SELECT
"City",
AVG("Order Amount") AS "avg_payment"
FROM THIS_PROCESS
GROUP BY 1
Output:
Developing this further, the next query shows only the average payment amounts per city above a certain
amount.
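SELECT
"City",
avg_payment
FROM (
-- sub1 begins here
SELECT
"City",
avg_payment
FROM (
-- sub2 begins here
SELECT "City", AVG("Order Amount") AS "avg_payment"
FROM THIS_PROCESS
GROUP BY 1
) AS sub2
WHERE avg_payment > 375
) AS sub1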
The initial query has now become the innermost subquery (sub2). Its containing query, which is itself a
subquery (sub1), adapts the data further by filtering out the cases where average payment is equal to or
below 375. This provides the table for the outermost query to select from.
Output:
Event-level subqueries operate on event data and return scalar values. They're usually used with SELECT or
WHERE statements.
Syntax
Parameter Description
SELECT case_id,
(SELECT LAST(end_date))
FROM THIS_PROCESS
In this example, the second column is chosen using an event-level subquery. For each case in the process, this
subquery drops into event-level and selects from the case's event table. In this example, the column end_date
is selected. Because an event-level subquery returns scalar values, the aggregation function LAST is used to
return only the timestamp of the case's latest event, rather than a whole column of timestamps.
Nesting
• If included, this clause defines a nested event-level subquery for the containing subquery to select from.
• If a subquery doesn't include a FROM clause, it selects data from the table defined in the FROM clause of the
outermost query.
• The innermost subquery doesn't require a FROM clause. It selects data from the table defined in the
outermost query's FROM clause.
• All subqueries require an alias, the exception being the outermost subquery (where an alias is optional).
• A nested subquery doesn't need to use another event-level subquery to access event-level attributes. It
already operates at event-level by virtue of being nested. Example 3 demonstrates this.
Example 1
The following query selects the ID of each case and uses an event-level subquery to access each case's event
data. The subquery must return a scalar value, which in this example is the difference between the latest and
earliest timestamps. Since it's already the innermost subquery, no FROM clause is necessary.
SELECT
case_id,
(SELECT LAST("end_time") - FIRST("end_time")) AS "Turnaround Time"
FROM THIS_PROCESS
Example 2
Event-level subqueries can also be used in combination with the WHERE clause to filter out cases based on
event attributes. In this example, payments are event-level data since several payments can be made per order.
When an order is canceled, the last event in that case is 'Order Canceled'. The query uses two event-level
subqueries:
• The first (in the SELECT clause) selects all payment amounts per case and sums them, yielding the total of
that order.
• The second (in the WHERE clause) selects the last event in each case and filters out those orders that were
canceled.
Output:
An event-level subquery can contain a nested subquery, as demonstrated in this example for finding the lowest
payment received per order.
-- Outermost query
SELECT
case_id,
(
-- Subquery
SELECT MIN("Payment Amount")
FROM (
-- Nested subquery
SELECT "Payment Amount"
WHERE event_name = 'Receive Payment'
) as sub
) AS "Lowest Payment"
FROM THIS_PROCESS
The nested subquery, being the innermost, selects from the table defined by the outermost query
(THIS_PROCESS) and provides a list of payment amounts for the current order. The containing subquery
selects the minimum value from this list, thus finding the lowest payment and meeting its requirement to
return a scalar value.
1.5 Expressions
In SIGNAL, an expression is a combination of values, operators and functions which is evaluated to return a
single value.
Operator Description
+ Add
- Subtract
* Multiply
/ Divide
% Modulo
Type information
expression1 + expression2
Number + Number = Number
expression1 - expression2
Number - Number = Number
expression1 * expression2
Number * Number = Number
expression1 / expression2
Number / Number = Number
expression1 % expression2
Number % Number = Number
Operator Precedence
These operators follow the standard mathematical order of operations. This means:
• Evaluation proceeds from left to right. For example, 9 - 5 + 2 is interpreted as (9 - 5) + 2 = 6 rather than 9 -
(5 + 2) = 2.
• Evaluation is performed in order of precedence (from higher to lower). For example, 5 + 2 * 2 is interpreted
as 5 + (2 * 2) = 9 rather than (5 + 2) * 2 = 14.
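The two rules above can be checked with ordinary arithmetic. The following Python sketch is only an illustration: Python happens to apply the same left-to-right and precedence rules to these operators. The variable names and the value 47 in the modulo line are hypothetical.

```python
# Same-precedence operators are evaluated from left to right.
left_to_right = 9 - 5 + 2      # evaluated as (9 - 5) + 2

# Multiplication binds more tightly than addition.
precedence = 5 + 2 * 2         # evaluated as 5 + (2 * 2)

# Parentheses override the default precedence.
forced = (5 + 2) * 2

# Modulo returns the remainder of a division.
remainder = 47 % 15

print(left_to_right, precedence, forced, remainder)
```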
Example 1
This query subtracts the order quantity from the shipment quantity, calculating any difference between an
order's size and the total number of goods actually dispatched.
Result (canceled orders, where no delivery took place, have no number displayed):
Example 2
This query divides each payment received by the total amount of the order and multiplies by 100. The resulting
figure is the percentage of the full amount each individual payment represents.
SELECT
case_id,
"Payment Amount" / "Order Amount" * 100 AS "Proportion %"
FROM FLATTEN(THIS_PROCESS)
WHERE event_name = 'Receive Payment'
Example 3
This query returns the remainder when dividing the shipment quantity by 15. Assuming goods from an order
are shipped in containers with a capacity of 15, this query calculates how many units were shipped in a non-full
container.
Comparison expressions evaluate to logical true or false by comparing the values of two expressions.
Operator Description
= Equal to
Syntax
Type Information
This table summarizes which combinations of data types are valid with the various comparison operators and
what type of result each combination yields.
Note
Example 1
The following example query uses the comparison operators '>', '<', and '=' to define conditions in the
WHERE clause. It returns the case IDs, along with other details, of delivered orders whose order amount is
greater than 100 and less than 900.
Example 2
The following example query uses the comparison operators '>=', '<=', and '<>' to define conditions in the
WHERE clause. It returns the case IDs, along with other details, of orders that are not delivered and whose
order amount is greater than or equal to 300 and less than or equal to 2000.
Operator Description
AND, OR
Syntax:
Parameters Description
Operator Precedence
These two operators have different operator precedence. Specifically, AND takes precedence over OR. In
compound expressions including both operators, this means that AND operations are evaluated before OR
operations. For example, A OR B AND C is interpreted as A OR (B AND C) by default.
To override this and force OR operations to take precedence, enclose the relevant part of the expression in
parentheses, for example, (A OR B) AND C.
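The difference between the two groupings can be made concrete by listing every combination of truth values. The Python sketch below is an illustration only: Python's and/or follow the same precedence as described above. The function names are hypothetical.

```python
from itertools import product


def default_grouping(a, b, c):
    # Without parentheses, AND binds first: a OR (b AND c).
    return a or b and c


def or_first(a, b, c):
    # Parentheses force the OR to be evaluated first.
    return (a or b) and c


# Collect the truth assignments on which the two groupings disagree.
differences = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if default_grouping(a, b, c) != or_first(a, b, c)
]
print(differences)
```

The two forms disagree only when A is true and C is false, which is exactly the situation in which forgetting the parentheses changes a filter's result.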
NOT
Syntax:
NOT(<expression1>)
Parameters Description
Type Information
This table summarizes which combinations of data types are valid with the various logical operators and what
type of result each combination yields.
Example 1
The following example returns a list of delivered orders in the city of Boston. If the order status is delivered and
the city is Boston, the query returns the case IDs of these orders along with other details.
Example 2
The following example returns orders where the value is less than 50, or where the value is greater than 2000 and
the city is Houston. Because AND takes precedence over OR, the last two conditions are grouped together.
Example 3
The following example returns the total order amount of all purchases, excluding orders from the cities Boston
and Miami.
Matching expressions recognize patterns in event-level values, returning true if a given pattern matches or false
otherwise.
Restriction
The MATCHES expressions work only for nested tables and not for flattened tables. Don't combine a
MATCHES expression with the FLATTEN operator.
MATCHES
The MATCHES expression allows filtering based on patterns that check for a certain sequence of event-level
values, such as event_name.
Syntax
Parameter Description
(A | B) "A" or "B"
Example 1
Match cases where the first event is “Receive Customer Order” and the event “Receive Delivery Confirmation”
occurs at any point later on.
Example 2
Match cases where exactly one event of any kind occurs between a payment being received and the order
being canceled.
BEHAVIOUR MATCHES
The BEHAVIOUR MATCHES expression checks whether complex behavior matches a pattern. Up to eight
behaviors can be combined. For each behavior an alias must be given. Behaviors can be specified using
comparison operators.
Syntax
Parameter Description
(A | B) "A" or "B"
Example
Match cases where a small payment was followed at some point by a larger payment.
Tests whether a value is equal to any value in a specified list. If so, then the whole expression evaluates to true,
otherwise it evaluates to false.
Syntax
Note
The types of the values in the list must match the type of valueExpression.
Examples
You can use IN to select only a subset of those entries, as the following example does.
Output:
SELECT
SUM("Order Quantity") AS "Sold",
IF ("Type of Goods" IN ('Cappy', 'Cappy with Print'), 'Yes', 'No') AS
"Headgear?"
FROM THIS_PROCESS
The query gives each order a label called "Headgear?". If the value of the order's Type of Goods field matches
with either 'Cappy' or 'Cappy with Print', then the label becomes 'Yes'. Otherwise, the label becomes 'No'. The
SUM function totals the items in both categories.
Using LIKE and ILIKE expressions, you can search for a specified string pattern in a column with string data
type. The LIKE expression matches case-sensitive string patterns, whereas the ILIKE expression matches
case-insensitive string patterns.
An optional keyword 'NOT' can be used along with LIKE and ILIKE expressions to search for the non-matching
string patterns.
Syntax
The LIKE and ILIKE expressions take string as input and return boolean as output.
For example:
Expression Output
Special Characters
The LIKE and ILIKE expressions use special characters to compare strings, character-by-character.
The following special characters can be used in conjunction with these expressions:
ILIKE '%to%' Returns true if the matching string pattern contains "to" in
any case and in any position
LIKE '_o%' Returns true if the matching string pattern contains "o" in
the second position
ILIKE 'm__%' Returns true if the matching string pattern starts with "m" or
"M" and is at least 3 characters in length
ILIKE 'b%n' Returns true if the matching string pattern starts with "b" or
"B", and ends with "n" or "N"
You can also use the LIKE and ILIKE expressions within MATCHES and BEHAVIOR MATCHES expressions for
filtering nested data. For example, see Example 3 and Example 4.
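The pattern examples in the table above can be reproduced with a small Python sketch that translates a pattern into a regular expression. This is an illustration only, not the engine's implementation. It assumes that % matches any sequence of characters (including none), that _ matches exactly one character, and that a backslash makes the following character literal. The function name like is hypothetical.

```python
import re


def like(value, pattern, case_insensitive=False):
    """Sketch of LIKE matching (ILIKE when case_insensitive=True)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Assumed escape rule: the next character is taken literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any sequence of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    # The whole value must match the pattern, not just a part of it.
    return re.fullmatch("".join(parts), value, flags) is not None


print(like("Boston", "%to%", case_insensitive=True))
print(like("Boston", "b%n"))
```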
Example 1
The following example displays the use of wildcards in the string pattern.
The query returns the "Type of Goods" values that end with 'with print'.
Example 2
The following example displays the use of the escape character (\) in the string pattern.
The query returns the entries where the case_id value starts with 100, followed by a backslash character,
followed by any characters or numbers.
Example 3
The following example displays the use of ILIKE expression in MATCHES expression.
SELECT count(case_id)
FROM THIS_PROCESS
WHERE event_name MATCHES ('Receive Customer Order' ~> ILIKE '%print%' ~> ILIKE
'%ship%')
The query returns the count of case_ids from THIS_PROCESS where a print action occurred, and where the
order was shipped either as standard or express.
Example 4
The following example displays the use of ILIKE expression in BEHAVIOR MATCHES expression.
SELECT count(case_id)
FROM THIS_PROCESS
WHERE
BEHAVIOR
(event_name = 'Receive Customer Order') as order_received,
(event_name ILIKE '%print%') as order_printed,
(event_name ILIKE '%ship%') as order_shipped
MATCHES (order_received ~> order_printed ~> order_shipped)
The query returns the count of case_ids from THIS_PROCESS where a print action occurred, and where the
order was shipped either as standard or express.
Example 5
SELECT DISTINCT(event_name)
FROM FLATTEN(THIS_PROCESS)
WHERE event_name NOT LIKE '%Purchase%'
The query returns the distinct event names from THIS_PROCESS that do not contain the string 'Purchase'.
CASE WHEN
The CASE WHEN expression evaluates a list of conditions in order and returns the result that belongs to the
first condition that is met.
Each CASE expression must end with END. The ELSE clause is optional and provides a way
to capture values not covered by the WHEN/THEN clauses.
The conditions are evaluated as follows:
• If a match is found, then the corresponding result in the THEN statement is returned, and the evaluation
stops. Any further WHEN statements aren't evaluated.
• If no match is found and an ELSE statement is present in the expression, then the result in the ELSE
statement is returned.
• If no match is found, and no ELSE statement is present, then a NULL value is returned.
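The first-match behavior described above can be sketched in Python. This is an illustration only, not the engine's implementation. The helper names case_when and region are hypothetical, and None stands in for NULL. Note that in this sketch all conditions are computed before the call, so only the selection of the result follows the first-match rule.

```python
def case_when(branches, default=None):
    """Return the result of the first branch whose condition is true.

    branches: list of (condition, result) pairs, checked in order.
    default:  the ELSE result; None stands in for NULL when ELSE is omitted.
    """
    for condition, result in branches:
        if condition:
            return result   # later branches are not considered
    return default


def region(country):
    # Hypothetical mapping of countries to regions.
    return case_when(
        [
            (country == "USA", "North America"),
            (country in ("Germany", "France"), "EU"),
            (country == "South Africa", "Africa"),
        ],
        default="Rest of World",
    )


print(region("France"), region("Japan"))
```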
Syntax
CASE
WHEN <condition> THEN <result>
[WHEN <condition> THEN <result> ...]
[ELSE <defaultResult>]
END
Parameter Description
• SELECT
• WHERE
Operator  Example
<         WHEN "Order Amount" < '1000' THEN 'Approved'
>         WHEN "Order Amount" > '1000' THEN 'Approved'
<=        WHEN "Order Amount" <= '1000' THEN 'Approved'
>=        WHEN "Order Amount" >= '1000' THEN 'Approved'
=         WHEN "Order Amount" = '1000' THEN 'Approved'
<>        WHEN "Order Amount" <> '1000' THEN 'Rejected'
IN        WHEN "Country" IN ('Germany', 'France') THEN 'Europe'
NOT IN    WHEN "Country" NOT IN ('Germany', 'France') THEN 'Rest of the World'
Example 1
The following example selects countries based on regions. When a country is in a specific region (condition is
true), it's added to the region (result). If a country doesn't match any of the conditions, it's added to the ELSE
result "Rest of World".
SELECT DISTINCT
"Country",
CASE WHEN "Country" = 'USA' THEN 'North America'
WHEN "Country" = 'Germany' THEN 'EU'
WHEN "Country" = 'France' THEN 'EU'
WHEN "Country" = 'South Africa' THEN 'Africa'
ELSE 'Rest of World'
END AS "Region"
FROM THIS_PROCESS
The query returns each distinct country together with the region it is assigned to.
Example 2
The following example compares the minimum and maximum values of customer satisfaction ratings (CSAT)
at event level in a CASE WHEN expression.
SELECT "Country",
CASE WHEN (SELECT(MAX("CSAT"))) = (SELECT(MIN("CSAT"))) THEN 'None'
WHEN (SELECT(MAX("CSAT"))) - (SELECT(MIN("CSAT"))) <= 1 THEN 'Low'
WHEN (SELECT(MAX("CSAT"))) - (SELECT(MIN("CSAT"))) <= 3 THEN 'Medium'
ELSE 'High'
END AS "CSAT Delta",
count(case_id)
FROM THIS_PROCESS
GROUP BY 1,2
Result:
The query returns the total number of cases in each CSAT Delta category (None, Low, Medium, and High) and
Country. The CSAT Delta category is determined by the delta between the maximum CSAT and minimum CSAT
value within the case.
The IF expression evaluates a single condition and returns one of two values accordingly.
• If the condition evaluates to true, the expression returns the first value.
• If the condition evaluates to false, the expression returns the second value.
You must provide both values. The values can be literals, column attributes or the results of other expressions,
such as additional IF expressions.
Syntax
Parameters
Parameter Description
Nested IF
IF expressions that contain additional IF expressions as value_if_true or value_if_false are called nested IF
expressions. Either or both of the possible return values, value_if_true and value_if_false, can be another IF
expression. The number of nested IF expressions is limited to 256.
Example - Simple IF
The following example returns a distinct list of cities along with an associated state. The IF expression is used
to compare the City to a literal value 'San Francisco'. If the City name matches the defined condition, the
state value is returned as 'California'. If the city does not match the defined condition, then the state value is
returned as 'Other'.
SELECT
DISTINCT "City",
IF("City" = 'San Francisco', 'California', 'Other') AS "State"
FROM THIS_PROCESS
Example - Nested IF
The following example returns a distinct list of cities along with corresponding states. The first IF expression
compares the City to the literal value 'San Francisco'. If the city name matches, the state value is returned as
'California'. If it does not match, the next nested IF expression is evaluated. If the city does not match the
condition in the last IF expression either, the state value is returned as 'Other'.
SELECT
DISTINCT "City",
IF(
"City" = 'San Francisco',
'California',
IF(
"City" = 'Miami',
'Florida',
IF("City" = 'Houston', 'Texas', 'Other')
)
) AS "State"
FROM THIS_PROCESS
IS NULL
Syntax
Parameter Description
Alternatively, IS NOT NULL returns True if an expression doesn't evaluate to NULL. Otherwise it returns False.
Example - IS NULL
The following example returns the case IDs where the shipment number is null.
SELECT DISTINCT
"case_id" AS "Case ID",
"Shipment Number",
"Shipment Carrier"
FROM FLATTEN(THIS_PROCESS)
WHERE "Shipment Number" IS NULL
LIMIT 10
Example - IS NOT NULL
The following example returns the case IDs where the shipment number is not null.
SELECT DISTINCT
"case_id" AS "Case ID",
"Shipment Number",
"Shipment Carrier"
FROM FLATTEN(THIS_PROCESS)
WHERE "Shipment Number" IS NOT NULL
LIMIT 10
A literal is a fixed value of a certain type. SIGNAL supports literals of several different types.
String Literal
String literals are always enclosed in single quotes. If a string literal contains one or more single quotes, each of
these single quotes must be followed by a second single quote.
Strings are case-sensitive. This means, for example, that the literals 'Smith' and 'smith' would be considered
different.
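When query text is assembled programmatically, the quoting rule above can be applied with a small helper. The Python sketch below is an illustration only. The function name signal_string_literal is hypothetical, and the sketch assumes that doubling the single quote is the only escaping a string literal needs.

```python
def signal_string_literal(text):
    """Wrap text in single quotes, doubling every embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


print(signal_string_literal("Smith"))
print(signal_string_literal("O'Brien"))

# String comparison is case-sensitive, so these two literals differ.
print(signal_string_literal("Smith") == signal_string_literal("smith"))
```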
Number Literal
• 0-9
• 1.5 | .5
• 1e-10 | .5e+1000
Syntax:
DATE 'literal'
TIMESTAMP 'literal'
Parameter Description
DURATION Literal
With the DURATION function you specify a duration literal which is parsed into a time interval.
Syntax:
DURATION 'literal'
Parameter Description
For all duration strings, the singular and plural forms are
supported.
Example:
SELECT COUNT(case_id)
FROM THIS_PROCESS
WHERE (SELECT LAST(EndTime) - FIRST(EndTime)) > DURATION '10days'
This query returns the number of cases in this process with more than 10 days between the start time and the
end time.
Boolean Literal
Example
Assume that orders worth over 500 qualify for a discount. This query adds an attribute to the result set
stating whether a case qualifies.
SELECT
case_id,
"Order Amount",
IF("Order Amount" > 500, TRUE, FALSE) AS "Qualifies for Discount"
FROM THIS_PROCESS
1.6 Functions
Use functions in SIGNAL to carry out specific tasks and calculate values.
With the linear regression functions, you can calculate the relationship between two variables using the least
squares regression method, which is a standard approach for calculating a linear relationship. The least
squares regression line, also called the line of best fit, can be visualized as a straight line drawn through a set of
data points that represents the relationship between them.
For the regression line calculation, you need two parameters, slope and intercept. The slope and intercept
show how two variables are related according to an average rate of change, which works well with scatter plots
because scatter plots show two variables. The regression line is plotted on a scatter plot of the same data to
show the general data trend.
The purpose of the linear regression function is to find the relationship between the explanatory variable, X,
and the dependent variable, Y. It predicts the value of Y when the value of X is known.
• The primary purpose of the linear regression functions is to facilitate the plotting of regression lines in
the Correlation widget.
• You can also use the linear regression function within custom SQL statements to calculate trend or
regression values. This is useful for making predictions that extend beyond the provided data. For example,
suppose that order value is plotted against quantity and you want to know the value of an order size that is
10 times the maximum order size within your dataset.
• regr_slope(Y, X) function: slope of the least-squares-fit linear equation determined by the (X, Y) pairs
• regr_intercept(Y, X) function: Y-intercept of the least-squares-fit linear equation determined by the (X, Y)
pairs
The slope indicates the steepness of the regression line, whereas the intercept indicates the point at which the
line intersects the Y-axis. For each one-unit increase along the X-axis, the value on the Y-axis changes by the
slope.
Syntax
REGR_INTERCEPT(<yAttribute>, <xAttribute>)
REGR_SLOPE(<yAttribute>, <xAttribute>)
Returns: A Number, the intercept of the univariate linear regression line determined by the (X, Y) pairs.
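The least squares calculation behind these two functions can be sketched in Python. This is an illustration of the standard textbook formulas, not the engine's implementation. The lowercase function names and the sample data are hypothetical. The argument order (Y first, then X) follows the syntax above.

```python
def regr_slope(ys, xs):
    """Slope of the least-squares line through the (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def regr_intercept(ys, xs):
    """Y-intercept of the least-squares line through the (x, y) pairs."""
    n = len(xs)
    return sum(ys) / n - regr_slope(ys, xs) * sum(xs) / n


# Hypothetical data that lies exactly on the line amount = 10 * quantity + 2.
quantities = [1, 2, 3, 4]
amounts = [12.0, 22.0, 32.0, 42.0]

slope = regr_slope(amounts, quantities)
intercept = regr_intercept(amounts, quantities)

# Extrapolate to an order size ten times the largest one in the data.
predicted = intercept + slope * (10 * max(quantities))
print(slope, intercept, predicted)
```

The last line shows the prediction use case described above: once slope and intercept are known, a value beyond the observed range is computed as intercept + slope * x.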
Example 1
The following query correlates the order amount with the order quantity and returns values for intercept and slope.
SELECT
REGR_INTERCEPT("Order Amount" , "Order Quantity") as "Intercept",
REGR_SLOPE("Order Amount" , "Order Quantity") as "Slope"
FROM THIS_PROCESS
[Figure: scatter plot generated with the Correlation widget, showing the data points and the regression line for
order amount plotted against order quantity.]
Example 2
The following query calculates the order value along the regression line for an order size that isn't
included in your data set.
Example 3
The following query calculates the order value along the regression line relative to an order size that isn't
included in your data set and grouped by City.
SELECT "City",
Result:
1.6.1.2 AVG
Calculates the average of a collection of numeric values. NULL values are ignored.
Syntax
AVG(<expression>)
Returns: The average as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
AVG(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
1.6.1.3 BOOL_AND
Returns true if the supplied expression evaluates to true for all input rows, otherwise it returns false.
Syntax
BOOL_AND(<expression>)
Returns: A Boolean. Result is true if all input rows evaluate to true for the supplied expression, otherwise false.
BOOL_AND(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
Example
This query returns true if all rows record an approval performed by a manager.
SELECT
(SELECT BOOL_AND(event_name = 'Approve' AND performer = 'Manager'))
FROM THIS_PROCESS
1.6.1.4 BOOL_OR
Returns true if the supplied expression evaluates to true for any input row, otherwise it returns false.
Syntax
BOOL_OR(<expression>)
Returns: A Boolean. Result is true if any input row evaluates to true for the supplied expression, otherwise false.
BOOL_OR(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
This query returns true if any row records an approval performed by a manager.
SELECT
(SELECT BOOL_OR(event_name = 'Approve' AND performer = 'Manager'))
FROM THIS_PROCESS
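The difference between BOOL_AND and BOOL_OR corresponds to the difference between "all rows" and "any row". The Python sketch below illustrates this for the events of a single case. It is an illustration only: the sample events and the helper is_manager_approval are hypothetical, and NULL handling is not modeled.

```python
# Hypothetical events of one case.
events = [
    {"event_name": "Create Order", "performer": "Clerk"},
    {"event_name": "Approve", "performer": "Manager"},
    {"event_name": "Ship Goods", "performer": "Clerk"},
]


def is_manager_approval(event):
    return event["event_name"] == "Approve" and event["performer"] == "Manager"


# BOOL_AND: true only if the condition holds for every row.
bool_and = all(is_manager_approval(event) for event in events)

# BOOL_OR: true if the condition holds for at least one row.
bool_or = any(is_manager_approval(event) for event in events)

print(bool_and, bool_or)
```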
1.6.1.5 COUNT
Counts the number of values in a specified column. NULL values aren't counted.
Syntax
COUNT(<expression>)
COUNT(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
SELECT COUNT(case_id)
FROM THIS_PROCESS
1.6.1.6 COUNT DISTINCT
Counts the number of distinct values in a specified column. If NULL values are present, they are excluded.
In the case of counting event-level columns, since event-level columns are lists, this function counts the
number of distinct lists.
Syntax
COUNT(DISTINCT <expression>)
Parameter: expression
Description: The column whose distinct values are to be counted.
Type: Number, Text, Timestamp, List (i.e. event-level column)
Example 1 (Case-level)
Example 2 (Event-level)
This query counts the number of distinct event sequences per case. Each distinct sequence represents a
process variant, therefore this query returns the number of process variants.
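Counting distinct lists, as described for event-level columns, can be sketched in Python. This is an illustration only. The sample cases are hypothetical, and each case's event names are assumed to be stored in order of occurrence.

```python
# Hypothetical event_name lists, one per case.
cases = {
    "1001": ["Receive Customer Order", "Ship Goods", "Receive Payment"],
    "1002": ["Receive Customer Order", "Ship Goods", "Receive Payment"],
    "1003": ["Receive Customer Order", "Cancel Order"],
}

# Each distinct sequence of events is one process variant. Lists are converted
# to tuples because only hashable values can be placed in a set.
variants = {tuple(event_names) for event_names in cases.values()}
variant_count = len(variants)

print(variant_count)
```

Cases 1001 and 1002 share the same sequence, so they count as one variant.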
1.6.1.7 FIRST
Syntax
FIRST(<expression>)
Parameter: expression
Description: The collection of values from which the first is chosen.
Type: Number, Timestamp, Duration, Text, Boolean
Returns: The first value as a Number, Timestamp, Duration, Text or Boolean. The return type matches the type
of the expression parameter.
FIRST(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
This query returns the name of the first event of each case in this process.
SELECT
(SELECT FIRST(event_name))
FROM THIS_PROCESS
1.6.1.8 LAST
Syntax
LAST(<expression>)
Parameter: expression
Description: The collection of values from which the last is chosen.
Type: Number, Timestamp, Duration, Text, Boolean
Returns: The last value as a Number, Timestamp, Duration, Text or Boolean. The return type matches the type
of the expression parameter.
LAST(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
This query returns the name of the last event of each case in this process.
SELECT
(SELECT LAST(event_name))
FROM THIS_PROCESS
1.6.1.9 MAX
Syntax
MAX(<expression>)
Parameter: expression
Description: The collection from which the maximum value is chosen.
Type: Number, Timestamp, Duration
Returns: The maximum value as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
MAX(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
SELECT MAX(discount)
FROM THIS_PROCESS
1.6.1.10 MEDIAN
A shortcut for the PERCENTILE_CONT function with a fixed percentile of 0.5, that is, the middle value.
Syntax
MEDIAN(<expression>)
Parameter: expression
Description: The column for which you want to determine the value that separates the lower from the upper half.
Type: Number, Timestamp, Duration
Returns: The median value as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
Example
1 65
2 72
3 81
4 95
5 112
6 128
SELECT MEDIAN(A)
FROM THIS_PROCESS
The query determines in a first step the value position = 3. Because the sorted list has an even number of
items, this query calculates the arithmetic mean, that is (81 + 95)/2 = 88.
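The arithmetic of this example can be reproduced with Python's standard library. This is an illustration only and says nothing about how the engine computes the median internally.

```python
import statistics

# The six values of column A from the example table.
values = [65, 72, 81, 95, 112, 128]

# With an even number of values, the median is the arithmetic mean of the two
# middle values of the sorted list: (81 + 95) / 2.
median = statistics.median(values)
print(median)
```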
1.6.1.11 MIN
Syntax
MIN(<expression>)
Parameter: expression
Description: The collection from which the minimum value is chosen.
Type: Number, Timestamp, Duration
Returns: The minimum value as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
MIN(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
SELECT MIN(discount)
FROM THIS_PROCESS
1.6.1.12 PERCENTILE_CONT
Returns a percentile value based on the input column's continuous distribution. If no value lies at the
percentile, the linear interpolation between the closest two values is returned.
NULL values are ignored. If all values are NULL, this function returns NULL.
Parameter: expression
Description: The column for which you want to determine the value that separates the lower from the upper
percentile for the given percentile rank.
Type: Number, Timestamp, Duration
Returns: The calculated continuous percentile as a Number, Timestamp or Duration. The return type matches
the type of the expression parameter.
Example
1 65
2 72
3 81
4 95
5 112
6 128
SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY A)
FROM THIS_PROCESS
This query determines in a first step the value position = 1.5. Because it's a fraction, the result value is the
average of position 1 and 2 = 68.5. Values below 68.5 are in the lower percentile, values above 68.5 are in the
upper percentile.
SELECT
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY A)
FROM THIS_PROCESS
For p = 0.75, the value position = 4.5. The result is the average of position 4 and 5, that is 103.5.
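The two worked examples above can be retraced with a short Python sketch. It is an illustration only: the rule "position = p × number of values, counted from 1" is inferred from the two examples and is an assumption, so the engine's exact interpolation may differ for other inputs. The function name percentile_cont_as_described is hypothetical.

```python
import math


def percentile_cont_as_described(values, p):
    """Interpolate between the two positions that surround p * n (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * n                      # assumed rule, may be fractional
    fraction = position - math.floor(position)
    # Clamp both neighbours to the valid 1-based range [1, n].
    lower = min(max(math.floor(position), 1), n)
    upper = min(max(math.ceil(position), 1), n)
    low_value = ordered[lower - 1]
    high_value = ordered[upper - 1]
    # Linear interpolation; for a fraction of 0.5 this is the plain average.
    return low_value + fraction * (high_value - low_value)


values = [65, 72, 81, 95, 112, 128]
print(percentile_cont_as_described(values, 0.25))
print(percentile_cont_as_described(values, 0.75))
```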
1.6.1.13 PERCENTILE_DISC
Returns a percentile value based on the discrete distribution of values in a column. The returned value has the
smallest distance to the given percentile.
NULL values are ignored. If all values are NULL, this function returns NULL.
Syntax
Parameter: expression
Description: The column for which you want to determine the value that separates the lower from the upper
percentile for the given percentile rank.
Type: Number, Timestamp, Duration
Returns: The calculated discrete percentile. The return type matches the type of the expression parameter.
Example
1 65
2 72
3 81
4 95
5 112
6 128
SELECT
PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY A)
FROM THIS_PROCESS
This query determines the value position = 1.5. Because it's a fraction, the value at the next higher position is
returned, that is 72.
1.6.1.14 STDDEV
The standard deviation describes the average deviation of all measured values from the mean value. A low
standard deviation indicates that the values tend to be close to the mean value. A high standard deviation
indicates that the values are spread out over a wide range.
Syntax
STDDEV(<expression>)
Parameter: expression
Description: The collection of values for which you want to determine the standard deviation.
Type: Number, Timestamp, Duration
Returns: The standard deviation as a Number, Timestamp, or Duration. The return type matches the type of
the expression parameter.
STDDEV(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example 1
SELECT
MIN("Order Amount") AS "Min",
MAX("Order Amount") AS "Max",
AVG("Order Amount") AS "Avg",
STDDEV("Order Amount") AS "StdDev"
FROM THIS_PROCESS
Example 2
This query shows how the cycle time of a process changes week-to-week. Specifically, it calculates both the
average and the standard deviation of the cycle time on a weekly basis.
To smooth out short-term fluctuations in the data, it uses windowed versions of the average (AVG) and
standard deviation (STDDEV) functions. These versions calculate a moving average and moving standard
deviation for the weekly mean cycle time respectively.
In the nested query, sq1, all cases are clustered together by week. The mean cycle time within each cluster is
calculated.
The outer query applies the windowed AVG and STDDEV functions to the weekly mean cycle time. In both cases,
the window includes the three weeks preceding and the three weeks following the current week.
SELECT
"Week",
"Avg Cycle Time",
AVG("Avg Cycle Time") OVER (ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS mean,
STDDEV("Avg Cycle Time") OVER (ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS std
FROM
(
SELECT
DATE_TRUNC('WEEK', (SELECT LAST(end_time))) AS "Week",
AVG((SELECT LAST(end_time) - FIRST(end_time))) AS "Avg Cycle Time"
FROM THIS_PROCESS
ORDER BY 1 ASC NULLS FIRST
FILL timeseries('WEEK')
) as sq1
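The effect of the window frame ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING can be sketched in Python. This is an illustration only. The weekly values and the helper window are hypothetical, and the sketch uses the sample standard deviation, which is an assumption because this section does not state which variant STDDEV computes.

```python
import statistics


def window(values, index, preceding=3, following=3):
    """Rows from `preceding` before to `following` after index, clipped at the edges."""
    start = max(index - preceding, 0)
    end = min(index + following, len(values) - 1)
    return values[start:end + 1]


# Hypothetical weekly mean cycle times, ordered by week.
weekly_cycle_time = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 16.0, 18.0]

# Moving average and moving standard deviation over the surrounding weeks.
moving_mean = [
    statistics.mean(window(weekly_cycle_time, i))
    for i in range(len(weekly_cycle_time))
]
moving_std = [
    statistics.stdev(window(weekly_cycle_time, i))
    for i in range(len(weekly_cycle_time))
]

print(moving_mean)
print(moving_std)
```

Near the first and last weeks the window contains fewer than seven rows, so the smoothing is weaker at the edges of the time series.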
1.6.1.15 SUM
Calculates the sum of all values in a collection of numeric values. NULL values are ignored.
Syntax
SUM(<expression>)
Returns: The sum as a Number or Duration. The return type matches the type of the expression parameter.
SUM(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
ABS
Returns the absolute value of an expression. For an expression, x, the non-negative value of x is returned
regardless of its sign.
Syntax
ABS(<expression>)
Example 1
The resulting table displays the absolute values of the two values:
Example 2
This example finds all changes to order quantities. Since changes can be negative or positive, this query
returns only the magnitude of each change.
SELECT
case_id,
"Order Quantity Changed" AS Change,
ABS("Order Quantity Changed") AS AbsChange
FROM FLATTEN(THIS_PROCESS)
WHERE event_name = 'Change Order Quantity'
Even where order sizes were reduced, those changes can be presented as absolute values.
LOG
Calculates the logarithm of a numeric expression x to a base b, that is, the exponent y in the equation
x = b ^ y, written as log(b, x) = y.
Syntax
LOG(<base>, <expression>)
Returns: A Number, the logarithm of the value calculated to the specified base.
Example
SELECT
LOG(10, 1000),
LOG(3, 27),
LOG(2, 8)
FROM THIS_PROCESS
The table displays the results of the three calculations. In each case, the result is 3 (specifically, 10 ^ 3 = 1000,
3 ^ 3 = 27, and 2 ^ 3 = 8).
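The three results can be verified with a short Python sketch. It is an illustration only. The helper log_base is hypothetical and takes the base as its first argument, matching the order used in the example query.

```python
import math


def log_base(base, x):
    """Return y such that base ** y equals x (change-of-base formula)."""
    return math.log(x) / math.log(base)


results = [log_base(10, 1000), log_base(3, 27), log_base(2, 8)]

# Floating-point arithmetic may be off by a tiny amount, so the values are
# rounded for display.
print([round(r, 10) for r in results])
```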
POW
Syntax
POW(<base>, <exponent>)
Example
SELECT
POW(9, 2),
POW(9, 0.5),
POW(9, 0),
POW(9, 1)
FROM THIS_PROCESS
SIGN
Returns the sign of a numeric expression, x:
• -1 if x < 0
• 0 if x = 0
• 1 if x > 0
Syntax
SIGN(<expression>)
Example
SELECT
SIGN(10) AS Positive,
SIGN(0) AS Zero,
SIGN(-8) AS Negative
FROM THIS_PROCESS
The resulting table displays the signs of the three numeric expressions:
SQRT
Syntax
SQRT(<expression>)
Example 1
This query calculates the square root from the given argument. For expression = 9, the result is 3.
SELECT SQRT(9)
FROM THIS_PROCESS
Example 2
This query calculates the square root from the order amounts.
Use conditional functions to define which operations to execute or which value to return based on a condition.
COALESCE
Accepts an arbitrary number of arguments and returns the first argument that isn't NULL.
Syntax
Returns: The value of the first non-NULL expression. If all arguments are NULL, this function returns NULL.
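The behavior described above can be sketched in Python, with None standing in for NULL. This is an illustration only. The lowercase function name and the sample values are hypothetical.

```python
def coalesce(*args):
    """Return the first argument that is not None, or None if all are None."""
    for arg in args:
        if arg is not None:
            return arg
    return None


print(coalesce(None, "Boston", "USA"))   # first non-NULL argument wins
print(coalesce(None, None, "Unknown"))   # literal fallback as last argument
print(coalesce(None, None))              # all arguments NULL
```

Note that values such as 0 or an empty string are not NULL, so they are returned like any other value.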
Example 1
The following query returns the most precise location available in each case:
Example 2
A useful application of COALESCE is providing a fallback in case an expression evaluates to NULL. The fallback
should be a literal value supplied as the last argument to the function.
Assuming the same process data as in Example 1, the following query displays the location in each case, but
has a fallback value of "Unknown" in cases where the city is NULL.
Output:
Use conversion functions to convert values from one data type to another.
1.6.4.1 DURATION_FROM_DAYS
Converts a number of days into a Duration.
Syntax
DURATION_FROM_DAYS(<expression>)
Returns: A Duration, the length of time represented by the original number of days.
Output:
1.6.4.2 DURATION_FROM_MILLISECONDS
Converts a number of milliseconds into a Duration.
Syntax
DURATION_FROM_MILLISECONDS(<expression>)
Returns: A Duration, the length of time represented by the original number of milliseconds.
Example
This query demonstrates two instances of converting a number of milliseconds to a Duration. In both cases, the
argument has a value of 720,000 and therefore returns a duration of 12 minutes.
SELECT
DURATION_FROM_MILLISECONDS(720000) AS "Duration 1",
DURATION_FROM_MILLISECONDS(12*60*1000) AS "Duration 2"
FROM THIS_PROCESS
Output:
1.6.4.3 DURATION_TO_DAYS
Converts a Duration value into the equivalent number of days.
Syntax
DURATION_TO_DAYS(<expression>)
Returns: A Number, the days in the original duration. The value is a decimal and so may include fractions of a
day.
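The arithmetic behind the duration conversions can be sketched in Python, where timedelta stands in for the Duration type. The helper names are hypothetical and this is not SIGNAL code.

```python
from datetime import timedelta

def duration_to_days(duration):
    """Days in a duration as a decimal number, including fractions of a day."""
    return duration / timedelta(days=1)

def duration_to_milliseconds(duration):
    """Milliseconds in a duration."""
    return duration / timedelta(milliseconds=1)

print(duration_to_days(timedelta(hours=36)))            # 1.5
print(duration_to_milliseconds(timedelta(minutes=12)))  # 720000.0
```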
Example
This query calculates a case's duration by subtracting the first event's timestamp from the last event's
timestamp, which yields a duration. This duration becomes an argument in a call to DURATION_TO_DAYS.
SELECT case_id,
(SELECT FIRST(end_time)) AS "Start",
(SELECT LAST(end_time)) AS "End",
(SELECT LAST(end_time)) - (SELECT FIRST(end_time)) AS "Duration",
DURATION_TO_DAYS( (SELECT LAST(end_time)) - (SELECT FIRST(end_time)) ) AS
"Duration in days"
FROM THIS_PROCESS
Output:
1.6.4.4 DURATION_TO_MILLISECONDS
Converts a Duration value into the equivalent number of milliseconds.
Syntax
DURATION_TO_MILLISECONDS(<expression>)
Example
This query calculates a case's duration by subtracting the first event's timestamp from the last
event's timestamp, which yields a duration. This duration becomes an argument in two calls to
DURATION_TO_MILLISECONDS. The first call returns the number of milliseconds. The second call does the
same but is followed by a conversion of the millisecond value into minutes.
SELECT case_id,
(SELECT FIRST(end_time)) AS "Start",
(SELECT LAST(end_time)) AS "End",
(SELECT LAST(end_time)) - (SELECT FIRST(end_time)) AS "Duration",
DURATION_TO_MILLISECONDS( (SELECT LAST(end_time)) - (SELECT
FIRST(end_time)) ) AS "In ms",
DURATION_TO_MILLISECONDS( (SELECT LAST(end_time)) - (SELECT
FIRST(end_time)) ) / (1000*60) AS "In min"
FROM THIS_PROCESS
Output:
1.6.4.5 TO_NUMBER
Converts a string expression into a number.
Caution
For very large data sets, this function may require excessive CPU activity, causing long query execution
times.
Syntax
TO_NUMBER(<expression>)
Returns: A numeric value, if the value of the string expression can be converted to a number. If expression
can't be converted, then this function returns NULL.
Formatting
The expression argument must strictly follow a numeric format. The only accepted non-numeric character is
a decimal point – commas aren't supported for separating the integer and fractional parts of a number.
Extraneous characters, such as currency symbols or thousands separators, prevent the number from being
converted.
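The formatting rule can be sketched in Python as a strict pattern match. This is an illustration only: the name to_number and the exact pattern (optional minus sign, digits, optional decimal point with digits) are assumptions, and the precise grammar accepted by SIGNAL may differ.

```python
import re

# Assumed grammar: optional sign, digits, optional fractional part
_NUMERIC = re.compile(r"-?\d+(\.\d+)?")

def to_number(text):
    """Return the numeric value of text, or None if it isn't strictly numeric."""
    if text is None or not _NUMERIC.fullmatch(text):
        return None
    return float(text)

for raw in ["100", "1000", "1,500", "$300", "99.45", "2nd", None]:
    print(repr(raw), "->", to_number(raw))
```

Strings containing a thousands separator, a currency symbol, or trailing letters yield None, which mirrors the NULL result described above.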
Example 1
Let's assume we have the following process data (THIS_PROCESS). The days and order amounts are stored as
strings, but aren't formatted consistently.
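| case_id | Day  | Order Amount |
|---------|------|--------------|
| 1       | 1    | 100          |
| 2       | 2nd  | 1000         |
| 3       | 3    | 1,500        |
| 4       | null | $300         |
| 5       | five | 99.45        |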
The following query converts those values into numbers where possible. In the case of the Order Amount, the
query also performs arithmetic on the resulting value.
SELECT
TO_NUMBER("Day") AS "Day",
TO_NUMBER("Order Amount") * 0.9 AS "Discounted Amount"
FROM THIS_PROCESS
Output:
1.6.4.6 TO_STRING
When converting a timestamp to a string, you can optionally provide its format using a timestamp pattern. If a
pattern isn't provided, the timestamp value is formatted according to RFC 3339.
Syntax
TO_STRING(<expression> [, <pattern>] )
Note
This pattern follows a defined syntax. Refer to the Timestamp Pattern topic.
Returns: The value of expression as a string. If expression evaluates to NULL, this function returns NULL.
Example 1
The query in this example shows conversion of numeric and timestamp values.
The order amount is a numeric value. To display the amount with a currency symbol, the value must first be
converted to a string before it's concatenated to the symbol.
The date selected is that of the start event in each case. It's converted to a string without providing a
timestamp pattern, meaning it's displayed using the default pattern.
SELECT
case_id,
"Order Amount",
CONCAT('$', TO_STRING("Order Amount")) AS "Order Amount ($)",
(SELECT FIRST(end_time)) AS "Creation Date",
TO_STRING((SELECT FIRST(end_time))) AS "Date as String"
FROM THIS_PROCESS
Output:
Example 2
The following query demonstrates how the timestamp pattern can be used to format the output of a
timestamp when it's converted to a string.
SELECT
case_id,
(SELECT FIRST(end_time)) AS "Creation Date",
TO_STRING((SELECT FIRST(end_time)), 'DD-MM-YYYY') AS "Date",
TO_STRING((SELECT FIRST(end_time)), 'HH:mm:ss.SSS') AS "24hr Time w/ ms",
TO_STRING((SELECT FIRST(end_time)), 'hh:mm aa') AS "12hr Time"
FROM THIS_PROCESS
Output:
1.6.4.7 TO_TIMESTAMP
You can optionally provide the format of the timestamp using a timestamp pattern. If a pattern isn't provided,
the timestamp value is parsed according to RFC 3339.
Caution
For very large data sets, this function may require excessive CPU activity when including a timestamp
pattern, causing long query execution times.
Syntax
TO_TIMESTAMP(<expression> [, <pattern>] )
Note
This pattern follows a defined syntax. Refer to the Timestamp Pattern topic.
Returns: The expression as a timestamp. If expression evaluates to NULL, this function returns NULL.
Example 1
Let's assume we have the following process data (THIS_PROCESS), where the timestamp and date information
is stored as strings:
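| case_id | timestamp_string              | date_string |
|---------|-------------------------------|-------------|
| 1       | 1970-01-01T00:00:00.000Z      | 01/01/1970  |
| 2       | 1970-01-01T00:00:01.000Z      | 01/01/1970  |
| 3       | null                          | null        |
| 4       | 2023-09-13T18:55:30.000+02:00 | 13/09/2023  |
| 5       | 2023-09-13T16:55:30.000Z      | 13/09/2023  |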
SELECT
case_id,
TO_TIMESTAMP(timestamp_string) AS "Timestamp Value",
TO_TIMESTAMP(date_string, 'DD/MM/YYYY') AS "Date Value"
FROM THIS_PROCESS
Output:
Example 2
Continuing with the data from the previous example, this query demonstrates how TO_STRING and
TO_TIMESTAMP are inverses of one another.
The following query returns the original timestamp string value after conversion to and from a timestamp, but
in the timezone UTC+0.
SELECT
case_id,
TO_STRING(TO_TIMESTAMP(timestamp_string)) AS "Original Timestamp"
FROM THIS_PROCESS
Output:
Related Information
RFC 3339
Timestamp Pattern
TO_STRING
Timestamp patterns specify a format for showing date and time values.
A timestamp pattern is a string used when converting date and time values to strings. Each character in the
pattern specifies which element of a timestamp features in the output and how it appears.
Note
This pattern character is supported only in the
TO_STRING function.
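| Pattern character | Meaning           |
|-------------------|-------------------|
| M                 | Month (1-12)      |
| aa                | am/pm (lowercase) |
| AA                | AM/PM (uppercase) |
| m                 | Minute (0-59)     |
| s                 | Second (0-60)     |
| T                 | Literal T         |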
Remarks
• Both padded and unpadded inputs are accepted. For example, the pattern M/D/YYYY also parses
01/01/1970 and MM/DD/YYYY also parses 1/1/1970.
• Timezones are parsed in any supported format. Z/ZZ/ZZZ parses any of the following timezone offsets:
Z, +00, +0000, +00:00
• The patterns for fractional seconds (SSS / SSSSSS / SSSSSSSSS) expect the exact number of fractional
digits.
Note
1.6.5.1 DATE_ADD
Syntax
• 'year'
• 'quarter'
• 'month'
• 'week'
• 'day'
• 'hour'
• 'minute'
• 'second'
• 'millisecond'
Note
Fractional parts of this value are ignored.
Returns: A new timestamp, the result of adding the specified number of units to the timestamp parameter. If
number or timestamp is NULL, then this function returns NULL.
Months are handled specially. Adding N months increases the month part of the timestamp by N, wrapping the
value around as necessary if the year boundary is passed. If the day part of the resulting timestamp would be
greater than the number of days in the month, then the day value becomes the last day of the month.
For example, simply adding a month to '2023-01-31' would yield '2023-02-31', an invalid date. Therefore,
DATE_ADD('month', 1, DATE '2023-01-31') returns '2023-02-28'.
The special handling of months means that this function isn't associative, meaning that chaining together
calls to DATE_ADD can have different results depending on the order of the calls. Refer to Example 3 for a
demonstration.
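The month rule and its order dependence can be modeled in a few lines of Python. This is a sketch of the behavior described above, not SIGNAL code. The function name add_months and the start date 2023-03-31 are chosen for illustration only.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Shift d by n months, clamping the day to the last day of the target month."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

print(add_months(date(2023, 1, 31), 1))       # 2023-02-28

start = date(2023, 3, 31)
print(add_months(add_months(start, -1), 1))   # 2023-03-28
print(add_months(add_months(start, 1), -1))   # 2023-03-30
```

Neither chain returns the original date, and the two chains disagree with each other, because the day value is clamped in a different intermediate month in each case.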
SELECT start_date,
DATE_ADD('year', 1, start_date) AS "+ 1 year",
DATE_ADD('quarter', 1, start_date) AS "+ 1 quarter",
DATE_ADD('month', 1, start_date) AS "+1 month",
DATE_ADD('day', 1, start_date) AS "+1 day",
DATE_ADD('hour', 1, start_date) AS "+1 hour"
FROM THIS_PROCESS
Example 2
Using the process data from the previous example, the following query demonstrates subtracting time units
from a timestamp:
SELECT start_date,
DATE_ADD('year', -2, start_date) AS "-2 years",
DATE_ADD('quarter', -3, start_date) AS "-3 quarters",
DATE_ADD('month', -4, start_date) AS "-4 months",
DATE_ADD('day', -5, start_date) AS "-5 days",
DATE_ADD('hour', -6, start_date) AS "-6 hours"
FROM THIS_PROCESS
-- Filter out null values (demo'ed in Example 1)
WHERE start_date IS NOT NULL
Example 3
Because DATE_ADD both accepts and returns a timestamp, it's possible to chain together calls to this function.
For example, the following expression is valid:
However, the special handling of months means DATE_ADD isn't associative, so chaining calls to this function
can yield different results depending on the ordering of calls. Sticking with the previous input data, running the
query
SELECT start_date,
-- Column 2: Subtract a month, then add a month
DATE_ADD('month', 1, DATE_ADD('month', -1, start_date)) as "-1 then +1 month",
-- Column 3: Add a month, then subtract a month
DATE_ADD('month', -1, DATE_ADD('month', 1, start_date)) as "+1 then -1 month"
FROM THIS_PROCESS
WHERE start_date IS NOT NULL
In column 2, the inner call to DATE_ADD is executed first and subtracts a month from start_date:
The outer call is then executed and adds a month to the result:
1.6.5.2 DATE_DIFF
Returns the period between two timestamps as a number of date or time units.
Tip
It's also possible to calculate the period between two timestamps as a duration value using one of the
following methods:
Syntax
turned value.
• 'year'
• 'quarter'
• 'month'
• 'week'
• 'day'
• 'hour'
• 'minute'
• 'second'
• 'millisecond'
Returns: A number representing the period between the two timestamps. Note that:
The function doesn't take calendar years, quarters, or weeks into account:
The counting of months is handled specially. One month is considered elapsed each time the calendar month
can be increased without reaching the month of endTimestamp. When the month prior to endTimestamp is
reached, a further month is considered elapsed only if the day and time parts of endTimestamp are greater
than or equal to those of startTimestamp.
For example, consider the dates '2024-01-15' and '2024-04-14' as startTimestamp and endTimestamp
respectively. Beginning with January 2024, we count two increases in the calendar month until reaching March
2024, which is the month prior to April 2024. In this case, the day part of endTimestamp is less than that of
startTimestamp, so a further month isn't considered elapsed. There's therefore a two-month period between
'2024-01-15' and '2024-04-14'.
The special handling of months means that the behavior of DATE_DIFF isn't always consistent with the
behavior of the DATE_ADD function. Using DATE_ADD to add N months to date D doesn't necessarily mean that
DATE_DIFF returns N when comparing the result with D. Refer to Example 2 for a demonstration.
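The month-counting rule, and the way it interacts with month addition, can be modeled in Python. The sketch below is an interpretation of the rules described above, not SIGNAL code. The function names are hypothetical, and months_between assumes that the start timestamp isn't later than the end timestamp.

```python
import calendar
from datetime import datetime

def add_months(ts, n):
    """Model of month addition: clamp the day to the target month's last day."""
    year, month0 = divmod(ts.year * 12 + (ts.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return ts.replace(year=year, month=month0 + 1, day=min(ts.day, last_day))

def months_between(start, end):
    """Model of month counting (assumes start <= end)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # The final month only counts if end's day and time reach those of start
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months

print(months_between(datetime(2024, 1, 15), datetime(2024, 4, 14)))  # 2

start = datetime(2023, 1, 31)
print(months_between(start, add_months(start, 1)))                   # 0

first_of_month = start.replace(day=1)
print(months_between(first_of_month, add_months(first_of_month, 1))) # 1
```

The second result shows the inconsistency: adding one month to 2023-01-31 yields 2023-02-28, whose day part is smaller than 31, so no full month is counted. Truncating to the first day of the month removes the effect.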
SELECT start_date,
end_date,
DATE_DIFF('year', start_date, end_date) AS diff_year,
DATE_DIFF('quarter', start_date, end_date) AS diff_quarter,
DATE_DIFF('month', start_date, end_date) AS diff_month,
DATE_DIFF('day', start_date, end_date) AS diff_day
FROM THIS_PROCESS
Example 2
The special handling of months means that the behavior of the DATE_DIFF function isn’t always consistent
with that of the DATE_ADD function. Sticking with the previous input data, the following query shows three
values in each case: start_date, start_date with one month added to it, and the difference between the
two according to DATE_DIFF:
SELECT start_date,
DATE_ADD('month', 1, start_date) AS start_plus_1_month,
DATE_DIFF('month', start_date, (DATE_ADD('month', 1, start_date))) AS
diff_month
FROM THIS_PROCESS
In the second and third cases, the day part of start_plus_1_month is less than that of start_date, and so
this difference isn't considered a month.
To avoid this behavior, you can use the DATE_TRUNC function to restrict the precision of your timestamps to the
month level, as the following query does:
SELECT start_date,
DATE_TRUNC('month', start_date) AS start_truncated,
DATE_ADD('month', 1, DATE_TRUNC('month', start_date)) AS start_plus_1_month,
DATE_DIFF('month',
DATE_TRUNC('month', start_date),
DATE_ADD('month', 1, DATE_TRUNC('month', start_date))
) AS diff_month
FROM THIS_PROCESS
Output:
1.6.5.3 DATE_PART
Extracts a part of a timestamp and returns it as a number.
For example, you can display the month and week of a timestamp in separate columns or you can display date
information that is not displayed by default, for example the hour.
Syntax
DATE_PART(<precision>, <expression>)
Available values:
• 'year'
• 'quarter'
• 'month'
• 'week' (ISO 8601 week numbering is applied)
• 'day'
• 'day_of_week' (returns 0 – 6, beginning with Sunday = 0)
• 'hour'
• 'minute'
• 'second'
• 'millisecond'
Returns: A Number corresponding to the value of the extracted part of the date.
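The less obvious precision values (quarter, ISO 8601 week, and day of week with Sunday = 0) can be illustrated with Python's datetime module. This is a sketch with a hypothetical helper name, date_part, and hypothetical sample dates. It is not SIGNAL code.

```python
from datetime import datetime

def date_part(precision, ts):
    """Return the requested part of ts as a number."""
    if precision == "quarter":
        return (ts.month - 1) // 3 + 1
    if precision == "week":
        return ts.isocalendar()[1]    # ISO 8601 week number
    if precision == "day_of_week":
        return ts.isoweekday() % 7    # Sunday = 0, Monday = 1, ..., Saturday = 6
    if precision == "millisecond":
        return ts.microsecond // 1000
    return getattr(ts, precision)     # year, month, day, hour, minute, second

ts = datetime(2023, 9, 13, 18, 55, 30)  # a Wednesday
for part in ["year", "quarter", "month", "week", "day", "day_of_week", "hour", "millisecond"]:
    print(part, date_part(part, ts))
```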
Example
SELECT
"case_id",
(SELECT FIRST("end_time")) AS "Timestamp",
DATE_PART('day', (SELECT FIRST("end_time"))) AS "Day",
DATE_PART('day_of_week', (SELECT FIRST("end_time"))) AS "Day of week",
DATE_PART('week', (SELECT FIRST("end_time"))) AS "Week",
DATE_PART('month', (SELECT FIRST("end_time"))) AS "Month",
DATE_PART('year', (SELECT FIRST("end_time"))) AS "Year",
DATE_PART('hour', (SELECT FIRST("end_time"))) AS "Hour",
DATE_PART('millisecond', (SELECT FIRST("end_time"))) AS "Millisecond"
FROM THIS_PROCESS
This query extracts date parts from the timestamp of the cases and displays them in separate columns. If
the underlying data doesn't provide the queried precision, the value is displayed as "0", as in this example for
milliseconds.
Example output:
1.6.5.4 DATE_TRUNC
Truncation in this context means that all precision levels below the truncated date part of the timestamp are
displayed as "01". For example, if you truncate the timestamp with precision level "year", all months and days
are set to "01".
Syntax
DATE_TRUNC(<precision>, <expression>)
Available values:
• 'year'
• 'quarter'
• 'month'
• 'week' (ISO 8601 week numbering is applied)
• 'day'
• 'hour'
• 'minute'
• 'second'
• 'millisecond'
Example
SELECT
DATE_TRUNC('day', (SELECT FIRST(END_TIME))) AS "Truncated (day)",
DATE_TRUNC('month', (SELECT FIRST(END_TIME))) AS "Truncated (month)",
DATE_TRUNC('year', (SELECT FIRST(END_TIME))) AS "Truncated (year)"
FROM THIS_PROCESS
This query returns the given timestamps with truncated date parts:
• DATE_TRUNC('day') returns the timestamps unmodified because precision levels below "day" are not
displayed.
• DATE_TRUNC('month') returns the timestamps with all precision levels below month set to "01", that is,
01/mm/yyyy.
• DATE_TRUNC('year') returns the timestamps with all precision levels below year set to "01", that is,
01/01/yyyy.
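The effect of truncation can be sketched in Python. The helper date_trunc is hypothetical and covers only three precision levels. It assumes that time parts below the truncated level are reset to zero, while month and day are reset to 1.

```python
from datetime import datetime

def date_trunc(precision, ts):
    """Reset every part of ts below the given precision to its lowest value."""
    if precision == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError("precision not covered by this sketch: " + precision)

ts = datetime(2023, 9, 13, 18, 55, 30)
for precision in ["day", "month", "year"]:
    print(precision, date_trunc(precision, ts).isoformat())
```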
Example output:
1.6.5.5 DURATION_BETWEEN
Calculates the duration between two timestamps, counting only working time.
The calculation is done according to a specific weekday calendar of relevant 'working' and 'non-working' time.
The built-in weekday calendars support precision down to the millisecond.
Note
The built-in calendars are defined at the day level. Working time for work days is considered as a full 24
hours.
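The idea of a day-level working calendar can be sketched in Python: time on a working day counts in full, and time on a non-working day is skipped. The function name working_duration and the reading of a calendar as a set of working weekdays are assumptions for illustration. This is not the engine's implementation.

```python
from datetime import datetime, timedelta

def working_duration(start, end, working_weekdays):
    """Sum the time between start and end that falls on working weekdays.

    working_weekdays uses Python's numbering: Monday = 0 ... Sunday = 6.
    """
    total = timedelta(0)
    cursor = start
    while cursor < end:
        next_midnight = datetime.combine(cursor.date() + timedelta(days=1), datetime.min.time())
        segment_end = min(next_midnight, end)
        if cursor.weekday() in working_weekdays:
            total += segment_end - cursor  # a working day counts for its full 24 hours
        cursor = segment_end
    return total

MON_TO_FRI = {0, 1, 2, 3, 4}
MON_TO_SAT = {0, 1, 2, 3, 4, 5}

start = datetime(2023, 9, 15, 12, 0)  # Friday noon
end = datetime(2023, 9, 18, 12, 0)    # Monday noon
print(end - start)                                 # 3 days, 0:00:00
print(working_duration(start, end, MON_TO_FRI))    # 1 day, 0:00:00
print(working_duration(start, end, MON_TO_SAT))    # 2 days, 0:00:00
```

With a Monday-to-Friday calendar, only the 12 hours on Friday and the 12 hours on Monday count. Adding Saturday as a working day adds a further 24 hours.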
Syntax
Example
The query below demonstrates how the choice of weekday calendar affects calculated durations.
Three pairs of timestamps are selected. For each pair, the duration field, calculated without reference to
working time, displays the simple time difference. Each subsequent column shows how the original time
difference changes when taking into account a different weekday calendar.
SELECT case_id,
(SELECT FIRST(end_time)) AS "start",
(SELECT LAST(end_time)) AS "end",
(SELECT LAST(end_time)) - (SELECT FIRST(end_time)) AS "duration",
DURATION_BETWEEN(
(SELECT FIRST(end_time)),
(SELECT LAST(end_time)),
'WEEKDAY_MTWTF'
) AS "MTWTF",
DURATION_BETWEEN(
(SELECT FIRST(end_time)),
(SELECT LAST(end_time)),
'WEEKDAY_MTWTFS'
) AS "MTWTFS",
DURATION_BETWEEN(
(SELECT FIRST(end_time)),
(SELECT LAST(end_time)),
'WEEKDAY_SMTWT'
) AS "SMTWT",
DURATION_BETWEEN(
(SELECT FIRST(end_time)),
(SELECT LAST(end_time)),
'WEEKDAY_SSMTWT'
) AS "SSMTWT",
DURATION_BETWEEN(
(SELECT FIRST(end_time)),
(SELECT LAST(end_time)),
'WEEKDAY_SMTWTF'
) AS "SMTWTF"
FROM THIS_PROCESS
Example output:
1.6.5.6 DURATION_FROM_DAYS
Converts a number of days into a Duration.
Syntax
DURATION_FROM_DAYS(<expression>)
Returns: A Duration, the length of time represented by the original number of days.
Example
Output:
1.6.5.7 DURATION_FROM_MILLISECONDS
Converts a number of milliseconds into a Duration.
Syntax
DURATION_FROM_MILLISECONDS(<expression>)
Returns: A Duration, the length of time represented by the original number of milliseconds.
Example
This query demonstrates two instances of converting a number of milliseconds to a Duration. In both cases, the
argument has a value of 720,000 and therefore returns a duration of 12 minutes.
SELECT
DURATION_FROM_MILLISECONDS(720000) AS "Duration 1",
DURATION_FROM_MILLISECONDS(12*60*1000) AS "Duration 2"
FROM THIS_PROCESS
Output:
1.6.5.8 DURATION_TO_DAYS
Converts a Duration value into the equivalent number of days.
Syntax
DURATION_TO_DAYS(<expression>)
Returns: A Number, the days in the original duration. The value is a decimal and so may include fractions of a
day.
Example
This query calculates a case's duration by subtracting the first event's timestamp from the last event's
timestamp, which yields a duration. This duration becomes an argument in a call to DURATION_TO_DAYS.
SELECT case_id,
(SELECT FIRST(end_time)) AS "Start",
(SELECT LAST(end_time)) AS "End",
(SELECT LAST(end_time)) - (SELECT FIRST(end_time)) AS "Duration",
DURATION_TO_DAYS( (SELECT LAST(end_time)) - (SELECT FIRST(end_time)) ) AS
"Duration in days"
FROM THIS_PROCESS
Output:
1.6.5.9 DURATION_TO_MILLISECONDS
Converts a Duration value into the equivalent number of milliseconds.
Syntax
DURATION_TO_MILLISECONDS(<expression>)
Example
This query calculates a case's duration by subtracting the first event's timestamp from the last
event's timestamp, which yields a duration. This duration becomes an argument in two calls to
DURATION_TO_MILLISECONDS. The first call returns the number of milliseconds. The second call does the
same but is followed by a conversion of the millisecond value into minutes.
SELECT case_id,
(SELECT FIRST(end_time)) AS "Start",
(SELECT LAST(end_time)) AS "End",
(SELECT LAST(end_time)) - (SELECT FIRST(end_time)) AS "Duration",
DURATION_TO_MILLISECONDS( (SELECT LAST(end_time)) - (SELECT
FIRST(end_time)) ) AS "In ms",
DURATION_TO_MILLISECONDS( (SELECT LAST(end_time)) - (SELECT
FIRST(end_time)) ) / (1000*60) AS "In min"
FROM THIS_PROCESS
Output:
What is UTC?
UTC is Coordinated Universal Time, the main standard by which clocks are globally synchronized.
A UTC timestamp represents time measured at 0° longitude. All time zones are offset from UTC to calculate
local time. For example, Central European Time (CET) is UTC+1. A local time of 09:00 CET would be 08:00
UTC.
UTC doesn't change with seasons and isn't affected by daylight saving. Therefore, Central European Summer
Time (CEST) is UTC+2. A local time of 10:00 CEST would be 08:00 UTC.
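The two conversions can be verified with fixed UTC offsets in Python. The calendar dates below are arbitrary and serve only the illustration.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")    # UTC+1
CEST = timezone(timedelta(hours=2), "CEST")  # UTC+2

winter = datetime(2024, 1, 15, 9, 0, tzinfo=CET)
summer = datetime(2024, 7, 15, 10, 0, tzinfo=CEST)

print(winter.astimezone(timezone.utc).strftime("%H:%M"))  # 08:00
print(summer.astimezone(timezone.utc).strftime("%H:%M"))  # 08:00
```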
Syntax
NOW()
Returns: A Timestamp corresponding to the date and time at which the query is executed.
Example
In the following example, the output of the call to NOW becomes the expression parameter value for
DATE_PART. The result displays the current quarter at the time the query is executed.
SELECT
NOW(),
DATE_PART('quarter', NOW()) as "current_quarter"
FROM THIS_PROCESS
1.6.6.1 CHAR_INDEX
Returns the starting position of the first occurrence of a string within a second string.
Syntax
CHAR_INDEX(<string>, <searchString>)
string (String): The string to be searched.
searchString (String): The string to be located in string.
• if searchString appears in string, then this function returns the starting index of the first occurrence.
• if searchString doesn't appear in string, then this function returns 0.
Note
• The indexing is 1-based, meaning the first character in string occupies character index 1.
• The indexing counts Unicode characters.
• The function is case-sensitive. For example, the string 'case' wouldn't be found in the string
'UPPERCASE'.
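These rules (1-based positions, 0 for no match, case sensitivity) can be mirrored in a short Python sketch. The helper name char_index is hypothetical, and None stands in for NULL.

```python
def char_index(string, search_string):
    """1-based position of the first occurrence, 0 if absent, None for None input."""
    if string is None or search_string is None:
        return None
    # str.find is 0-based and returns -1 when nothing is found
    return string.find(search_string) + 1

print(char_index("New York", "York"))      # 5
print(char_index("Mississippi", "iss"))    # 2
print(char_index("Los Angeles", "San"))    # 0
print(char_index("UPPERCASE", "case"))     # 0
print(char_index(None, "Anywhere"))        # None
```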
Example 1
• Finding the only occurrence of a string: In this case, 'York' is searched for within 'New York'. Since the 'Y' in
'York' occupies the fifth position in the string, the function returns 5.
• Finding the first of many occurrences of a string: In this case, 'iss' is searched for within 'Mississippi'. Even
though 'iss' occurs twice, it first appears at index 2, and so the function returns 2.
• Returning 0 when failing to find a string: In this case, the function fails to find 'San' inside 'Los Angeles' and
so returns 0.
• Returning NULL when a parameter is NULL: In this case, 'Anywhere' is searched for within NULL, which can't
be searched, so the function returns NULL.
SELECT
CHAR_INDEX('New York', 'York') AS "New YORK",
CHAR_INDEX('Mississippi', 'iss') AS "MISSissippi",
CHAR_INDEX('Los Angeles', 'San') AS "Los Angeles",
CHAR_INDEX(NULL, 'Anywhere') AS "Nowhere"
FROM THIS_PROCESS
Example 2
In this example, there are six types of goods for sale. We can think of these six types as being three garment
types – cappy, hoody and t-shirt – each available in two varieties: with and without a print. This query outputs
the types and the number sold.
SELECT
"Type of Goods",
COUNT("Type of Goods") AS "Number Sold"
FROM THIS_PROCESS
Output:
Let's say we want to ignore the distinction between varieties and instead count the number sold per garment
type.
The following query removes the first occurrence of a space character along with all text that follows, leaving
only the first word. The condition in the first IF function uses CHAR_INDEX to determine whether the field
includes a space character. If it doesn't, then the "Type of Goods" field is returned as is. Otherwise, a substring
of that field is extracted, specifically between the first character and the index of the first occurring space. The
index is decremented by 1 so that the space character isn't included in the substring.
SELECT
IF(
-- Checks whether a whitespace is present inside the "Type of Goods" field.
Output:
1.6.6.2 CHAR_LENGTH
Syntax
CHAR_LENGTH(<string>)
Returns: The number of Unicode characters in the string. If the string is NULL, the function returns a NULL
value.
Example
This query displays the names of all cities in the data (THIS_PROCESS) along with the number of Unicode
characters in those names, including spaces.
SELECT
"City",
CHAR_LENGTH("City") AS "Length"
FROM THIS_PROCESS
GROUP BY 1
Output:
1.6.6.3 CONCAT
CONCAT can only be applied to non-nested attributes. To use CONCAT on nested attributes, use the FLATTEN
operator. Alternatively, use a nested query that returns a single string attribute as a result.
Syntax
CONCAT(<expression1>, <expression2>)
Example
This query appends a dash to the name of every event whose name contains the string 'ship'. It then uses the
output of that call as a parameter to a second call to CONCAT, appending the carrier name.
SELECT
COUNT(DISTINCT case_id) AS "Case Count",
event_name,
"Shipment Carrier",
CONCAT(CONCAT(event_name, ' - '), "Shipment Carrier") AS "Shipment Type and Provider"
FROM FLATTEN(THIS_PROCESS)
WHERE event_name ILIKE '%ship%'
Output:
This query selects a field from a nested attribute, using a subquery to select the first event name. Using the
alternate syntax for concatenation, it appends to this event name the case ID.
SELECT
Output:
1.6.6.4 LEFT
Syntax
LEFT(<string>, <numberOfCharacters>)
return.
Returns: The specified number of leftmost characters from the string argument. If numberOfCharacters exceeds the
length of string, then this function returns the complete string.
Example
This query displays the names of all cities in the data (THIS_PROCESS) as well as the six leftmost characters of
each name. Names shorter than six characters are displayed in their entirety.
SELECT
"City",
LEFT("City", 6) AS "Leftmost6"
FROM THIS_PROCESS
GROUP BY 1
Output:
1.6.6.5 LOWER
Converts all upper case characters in an input string to lower case characters.
Syntax
LOWER(<string>)
Returns: A new string with the same value as the string parameter but with all upper case characters
converted to lower case.
Example
This query demonstrates converting all upper case characters in the "Type of Goods" attribute to lower case:
1.6.6.6 LTRIM
The definition of a whitespace character follows the Unicode Character Database definition and includes
characters such as:
Syntax
LTRIM(<string>)
Returns: A new string with the same value as the string parameter but with all leading whitespace characters
removed.
Related Functions
Note
The data in this example has been rendered so as to make all whitespace characters explicit.
Output:
| Type of Goods        | LTRIM                | RTRIM                | TRIM                 |
|----------------------|----------------------|----------------------|----------------------|
| "Cappy"              | "Cappy"              | "Cappy"              | "Cappy"              |
| " T-shirt\t"         | "T-shirt\t"          | " T-shirt"           | "T-shirt"            |
| "Cappy with Print"   | "Cappy with Print"   | "Cappy with Print"   | "Cappy with Print"   |
| "Hoody \r\n"         | "Hoody \r\n"         | "Hoody"              | "Hoody"              |
1.6.6.7 REPLACE
Searches a string, replacing all occurrences of a specified substring with an alternative string.
REPLACE can only be applied to non-nested attributes. To use REPLACE on nested attributes, use the
FLATTEN operator. Alternatively, use a nested query that returns a single string attribute as a result.
Returns: A String, the result of replacing in sourceExpression all occurrences of searchExpression with
replacementExpression.
SELECT
DISTINCT "Type of Goods",
REPLACE("Type of Goods", 'Cappy', 'Cap') AS "Single Text Adjustment (Literal): Cappy->Cap",
REPLACE("Type of Goods", 'w', 'W') AS "Multiple Text Adjustment (Literal): w->W"
FROM FLATTEN(THIS_PROCESS)
Output:
SELECT
case_id,
(SELECT FIRST(event_name)) AS "Nested Attribute: Event_Name",
Output:
1.6.6.8 REVERSE
Returns a new string with the order of all characters from the input string reversed.
Syntax
REVERSE(<string>)
Returns: A string containing all characters from the string parameter in reverse order.
This query demonstrates reversing the "Type of Goods" attribute in every row:
Output:
1.6.6.9 RIGHT
Returns the specified number of rightmost characters of a string.
Syntax
RIGHT(<string>, <numberOfCharacters>)
return.
Returns: The specified number of rightmost characters from the string argument. If numberOfCharacters exceeds the
length of string, then this function returns the complete string.
Example
This query displays the names of all cities in the data (THIS_PROCESS) as well as the six rightmost characters
of each name. Names shorter than six characters are displayed in their entirety.
SELECT
"City",
RIGHT("City", 6) AS "Rightmost6"
FROM THIS_PROCESS
GROUP BY 1
1.6.6.10 RTRIM
The definition of a whitespace character follows the Unicode Character Database definition and includes
characters such as:
Syntax
RTRIM(<string>)
Returns: A new string with the same value as the string parameter but with all trailing whitespace characters
removed.
Related Functions
Example
Note
The data in this example has been rendered so as to make all whitespace characters explicit.
Output:
| Type of Goods        | LTRIM                | RTRIM                | TRIM                 |
|----------------------|----------------------|----------------------|----------------------|
| "Cappy"              | "Cappy"              | "Cappy"              | "Cappy"              |
| " T-shirt\t"         | "T-shirt\t"          | " T-shirt"           | "T-shirt"            |
| "Cappy with Print"   | "Cappy with Print"   | "Cappy with Print"   | "Cappy with Print"   |
| "Hoody \r\n"         | "Hoody \r\n"         | "Hoody"              | "Hoody"              |
1.6.6.11 SUBSTRING
Extracts from a string a specified number of Unicode characters beginning from a given start position.
Syntax
Returns: A string containing a subset of characters from the string parameter beginning at index
startPosition and with a length of up to numberOfCharacters. Exceptions to this include:
SIGNAL provides alternative means of obtaining a substring when delimiters are involved.
A common use case regarding substrings is the extraction of characters occurring before or after a delimiter
or separator string. For example, let's say your process data contains a Department column containing labels
of the form "DEPT_IT", "DEPT_HR", "DEPT_FINANCE" and so on. From these labels, you wish to extract the
characters following but not including the underscore.
However, SIGNAL provides a pair of functions making queries like this more convenient to write:
SUBSTRING_BEFORE(<string>, <searchString>)
SUBSTRING_AFTER(<string>, <searchString>)
These functions work by searching for the first occurrence of searchString within string and then
returning all characters before (or after) that occurrence. The example above could instead be written like
so:
SUBSTRING_AFTER("Department", '_')
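The behavior of the two functions can be sketched in Python with str.partition, which splits a string at the first occurrence of a separator. The helper names are hypothetical. Returning None when the search string isn't found is an assumption based on the way the SUBSTRING_BEFORE example in this document falls back with IS NOT NULL.

```python
def substring_before(string, search_string):
    """Characters before the first occurrence of search_string, or None."""
    if string is None or search_string is None:
        return None
    head, separator, _ = string.partition(search_string)
    return head if separator else None  # assumption: no match yields None

def substring_after(string, search_string):
    """Characters after the first occurrence of search_string, or None."""
    if string is None or search_string is None:
        return None
    _, separator, tail = string.partition(search_string)
    return tail if separator else None  # assumption: no match yields None

print(substring_after("DEPT_FINANCE", "_"))           # FINANCE
print(substring_after("DEPT_SALES_MARKETING", "_"))   # SALES_MARKETING
print(substring_before("SALES_MARKETING_DEPT", "_"))  # SALES
print(substring_after("DEPT_HR", "-"))                # None
```

Because only the first occurrence is used, a label containing several delimiters is split once.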
Example
In this example, customer IDs have the format 'C_nnnnn'. This query selects only the numeric part of the ID by
extracting the five characters from index position three onwards.
SELECT
"Customer ID",
SUBSTRING("Customer ID", 3, 5) AS "Number Part"
FROM THIS_PROCESS
Output:
1.6.6.12 SUBSTRING_AFTER
Extracts from a string all Unicode characters occurring after a delimiter string.
Syntax
SUBSTRING_AFTER(<string>, <searchString>)
Note
If searchString occurs multiple times in string, the first occurrence is taken.
Returns: A string containing all characters from the string parameter that occur after searchString.
Exceptions include:
Example 1
SELECT
SUBSTRING_AFTER('DEPT_FINANCE', '_') AS "Finance Dept.",
SUBSTRING_AFTER('DEPT_SALES_MARKETING', '_') AS "Sales & Marketing Dept.",
SUBSTRING_AFTER('DEPT_HR', '-') AS "Search String Not Found",
SUBSTRING_AFTER(NULL, '_') AS "Input String NULL"
FROM THIS_PROCESS
1.6.6.13 SUBSTRING_BEFORE
Extracts from a string all Unicode characters occurring before a delimiter string.
Syntax
SUBSTRING_BEFORE(<string>, <searchString>)
Note
If searchString occurs multiple times in string, the first occurrence is taken.
Returns: A string containing all characters from the string parameter that occur before searchString.
Exceptions include:
Example 1
SELECT
SUBSTRING_BEFORE('FINANCE_DEPT', '_') AS "Finance Dept.",
SUBSTRING_BEFORE('SALES_MARKETING_DEPT', '_') AS "Sales & Marketing Dept.",
SUBSTRING_BEFORE('HR_DEPT', '-') AS "Search String Not Found",
SUBSTRING_BEFORE(NULL, '_') AS "Input String NULL"
FROM THIS_PROCESS
Output:
Example 2
In this example, there are six types of goods for sale. We can think of these six types as being three garment
types – cappy, hoody and t-shirt – each available in two varieties: with and without a print. This query outputs
the types and the number sold.
SELECT
"Type of Goods",
COUNT("Type of Goods") AS "Number Sold"
FROM THIS_PROCESS
Output:
The following query ignores all text following the first space, if a space exists.
SELECT
IF(
SUBSTRING_BEFORE("Type of Goods", ' ') IS NOT NULL,
SUBSTRING_BEFORE("Type of Goods", ' '),
"Type of Goods"
) AS "Garment Type",
COUNT("Type of Goods") AS "Number Sold"
FROM THIS_PROCESS
Output:
1.6.6.14 TRIM
Removes all leading and trailing whitespace characters from the input string.
The definition of a whitespace character follows the Unicode Character Database definition and includes
characters such as:
Syntax
TRIM(<string>)
Returns: A new string with the same value as the string parameter but with all leading and trailing
whitespace characters removed.
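The difference between the three trimming functions can be illustrated with Python's lstrip, rstrip, and strip methods. This is a sketch, not SIGNAL code: the helper name trims is hypothetical, and Python's definition of whitespace may differ slightly from the Unicode Character Database definition used by SIGNAL.

```python
def trims(text):
    """Return the (left-trimmed, right-trimmed, fully trimmed) variants of text."""
    return text.lstrip(), text.rstrip(), text.strip()

for sample in ["Cappy", " T-shirt\t", "Cappy with Print", "Hoody \r\n"]:
    left, right, both = trims(sample)
    # repr() makes leading and trailing whitespace visible
    print(repr(sample), repr(left), repr(right), repr(both))
```

Whitespace inside a string, such as the spaces in "Cappy with Print", is never removed.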
Related Functions
Example
Note
The data in this example has been rendered so as to make all whitespace characters explicit.
Output:
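| Type of Goods        | LTRIM                | RTRIM                | TRIM                 |
|----------------------|----------------------|----------------------|----------------------|
| "Cappy"              | "Cappy"              | "Cappy"              | "Cappy"              |
| " T-shirt\t"         | "T-shirt\t"          | " T-shirt"           | "T-shirt"            |
| "Cappy with Print"   | "Cappy with Print"   | "Cappy with Print"   | "Cappy with Print"   |
| "Hoody \r\n"         | "Hoody \r\n"         | "Hoody"              | "Hoody"              |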
Related Information
1.6.6.15 UPPER
Converts all lower case characters in an input string to upper case characters.
Syntax
UPPER(<string>)
Returns: A new string with the same value as the string parameter but with all lower case characters
converted to upper case.
This query demonstrates converting all lower case characters in the "Type of Goods" attribute to upper case:
Related Information
Learn about window functions in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
This section explains the window functions that you can use in your SIGNAL queries.
Window functions are aggregate-like functions that operate on a selection of the rows returned by a query.
You can use window functions to perform calculations on a set of table rows that are related to the current
row. Unlike regular aggregate functions, window functions don't collapse rows: each row remains separate in
the query output.
Window functions have access to all the rows that are part of the current row's group, which is determined by
the PARTITION BY list of the window function.
Below is the general form of a window function:
{aggregation function} OVER ([PARTITION BY {partition expressions}] [ORDER BY {order expressions}]
[[ROWS | RANGE] BETWEEN {window frame}])
• The {aggregation function} is the function which groups the values of multiple rows to create a single
summary value.
• The PARTITION BY clause is a subclause of the OVER clause and groups a data set into partitions.
• The ORDER BY clause sorts the rows within each partition in ascending or descending order.
• ROWS|RANGE modes define the scope of the {window frame}.
• The {window frame} is the set of rows related to the current row over which the window function
calculates its value. You can define the window frame by using the ROWS and RANGE modes.
Limitations
Below is a list of current limitations that apply while using window functions in SIGNAL queries:
• Window functions can only be used on flat data, not on event level or nested data.
• You can't create an empty window frame when using window functions. The following are examples of
empty window frames:
• ROWS BETWEEN 1 PRECEDING AND 2 PRECEDING
• ROWS BETWEEN 1 FOLLOWING AND 1 PRECEDING
• ROWS BETWEEN 1 FOLLOWING AND CURRENT ROW
Considerations
Functions
Aggregate functions
• SUM
• COUNT
• MIN
• MAX
• AVG
• FIRST
• LAST
• BOOL_OR
• BOOL_AND
Non-aggregate functions
• LAG
• LEAD
• ROW_NUMBER
The ORDER BY clause sorts the rows within each partition in ascending or descending order.
Parameter Description
columnName The column name in your table you want to include in the
function.
Example
Input:
City Value
Berlin 1000
Paris 3000
London 2500
Rome 1500
Output:
City Value
Berlin 1000
Rome 2500
London 5000
Paris 8000
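The two tables above are consistent with a running total, that is, a SUM over a window ordered by Value. The query itself is not shown here, so the following Python sketch is only an illustration of that behavior under this assumption. It is not SIGNAL code, the function name running_sum is hypothetical, and rows with equal values are not treated specially.

```python
# Rows taken from the example input table above.
rows = [("Berlin", 1000), ("Paris", 3000), ("London", 2500), ("Rome", 1500)]

def running_sum(rows):
    """Mimic SUM(value) OVER (ORDER BY value): sort by value, then accumulate."""
    total = 0
    result = []
    for city, value in sorted(rows, key=lambda r: r[1]):
        total += value
        result.append((city, total))
    return result

running_totals = running_sum(rows)
print(running_totals)
```

Each row keeps its own output line, which is the main difference to a plain aggregate that would return a single value.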
The PARTITION BY clause defines the group of rows on which the window function operates. You can add
multiple expressions after PARTITION BY. For example:
SELECT column name, SUM(column name) OVER (PARTITION BY column name ORDER BY column name)
Parameter Description
column name The column name in your table you want to include in the
function.
Examples
Example 1
Input:
City Value
Berlin 1000
Berlin 1800
Paris 3000
London 2500
Paris 1500
London 1200
Berlin 1300
Output:
City Value
Berlin 4100
Berlin 4100
Berlin 4100
Paris 4500
Paris 4500
London 3700
London 3700
Example 2
Input:
City Value
Berlin 1000
Berlin 1800
Paris 3000
London 2500
Paris 1500
London 1200
Berlin 1300
Output:
City Value
Berlin 1000
Berlin 2300
Berlin 4100
London 1200
London 3700
Paris 1500
Paris 4500
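The output of Example 1 matches a SUM partitioned by City, and the output of Example 2 matches the same SUM with an additional ordering by Value inside each partition. The queries are not shown, so this reading is an assumption. The Python sketch below mimics both variants; it is not SIGNAL code and the function names are hypothetical.

```python
from collections import defaultdict

# Rows taken from the example input tables above.
rows = [("Berlin", 1000), ("Berlin", 1800), ("Paris", 3000), ("London", 2500),
        ("Paris", 1500), ("London", 1200), ("Berlin", 1300)]

def partition_totals(rows):
    """Mimic SUM(value) OVER (PARTITION BY city): every row gets its partition's total."""
    totals = defaultdict(int)
    for city, value in rows:
        totals[city] += value
    return [(city, totals[city]) for city, _ in rows]

def partition_running_sums(rows):
    """Mimic SUM(value) OVER (PARTITION BY city ORDER BY value): running total per partition."""
    groups = defaultdict(list)
    for city, value in rows:
        groups[city].append(value)
    result = []
    for city in sorted(groups):            # partitions listed alphabetically for readability
        total = 0
        for value in sorted(groups[city]): # ascending order inside the partition
            total += value
            result.append((city, total))
    return result

totals = partition_totals(rows)
running = partition_running_sums(rows)
print(totals)
print(running)
```

Without an ordering, all rows of a partition share one value; with an ordering, the value grows row by row within the partition.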
The window frame is the set of rows related to the current row over which the window function calculates
its value. You can define the window frame by using the ROWS and RANGE modes.
The window frame contains a frame_start and a frame_end, which mark where the window frame starts and
ends.
• CURRENT ROW
• UNBOUNDED PRECEDING
• offset PRECEDING
• CURRENT ROW
• UNBOUNDED FOLLOWING
• offset FOLLOWING
The frame_start of CURRENT ROW means the frame starts with the current row's first peer row (a row that the
window's ORDER BY clause sorts as equivalent to the current row). The frame_end of CURRENT ROW means
the frame ends with the current row's last peer row.
The UNBOUNDED keyword refers to the first or last row of the partition.
The offset expression's data type depends on the data type of the ordering column. For numeric ordering
columns, the offset has the same type as the ordering column.
If the ordering column is of the type timestamp, you specify the offset as an interval such as '10 days', for
example RANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING. The offset expression must be a
non-null and non-negative value.
ROWS
With the ROWS mode, you can define the start and end of the window frame in terms of rows relative to the
current row. You can define the window frame with the ROWS mode in the following ways:
The UNBOUNDED keyword refers to the first or last row in a column or partition.
If there is no ORDER BY clause, the returned results are undefined and the order in which the rows are
processed isn't uniform.
Syntax:
Parameter Description
column name The column name in your table you want to include in the
function.
Input:
City Value
Berlin 1000
Paris 3000
London 2500
Paris 1500
Output:
City Value
Berlin 4000
Paris 6500
London 7000
Paris 4000
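The numbers in the second table are consistent with a SUM over a frame of ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING, applied to the rows in the order listed. That frame is an inference from the numbers, because the query is not shown. The Python sketch below mimics such a row-based frame; it is not SIGNAL code and rows_frame_sum is a hypothetical helper.

```python
def rows_frame_sum(values, preceding, following):
    """Sum each value with up to `preceding` rows before and `following` rows after it."""
    result = []
    for i in range(len(values)):
        start = max(0, i - preceding)               # frame is cut off at the first row
        end = min(len(values), i + following + 1)   # and at the last row
        result.append(sum(values[start:end]))
    return result

values = [1000, 3000, 2500, 1500]   # Berlin, Paris, London, Paris
frame_sums = rows_frame_sum(values, preceding=1, following=1)
print(frame_sums)
```

The first and last rows have only one neighbor, so their frames contain two rows instead of three.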
RANGE
With the RANGE mode, you define where the window frame starts and ends in terms of values rather than row
positions. When using the RANGE keyword, the ORDER BY clause is required and you must specify exactly one
column by which the window frame is ordered. The RANGE keyword is useful when working with time series and
when there are many gaps or duplicate values in your tables.
With the RANGE keyword, you define the window frame by the maximum difference between the value of
the column in the current row and its value in the preceding or following rows of the scope.
The RANGE mode only works with interval data types, that is, numbers and timestamps.
Choice and Boolean data types aren't supported.
You can define the window frame with the RANGE mode in the following ways:
Syntax:
Parameter Description
column name The column name in your table you want to include in the
function.
interval type A value of an interval data type, such as a number (1.0) or a
timestamp ('10 days').
Example:
Input:
City Value
Berlin 1000
Paris 3000
London 3000
Paris 2000
SELECT city, SUM(value) OVER (ORDER BY value RANGE BETWEEN 1000.0 PRECEDING AND 1000.0 FOLLOWING)
Output (ordered by value):
City Value
Berlin 3000.0
Paris 9000.0
London 8000.0
Paris 8000.0
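To see how the RANGE frame in this example arrives at its numbers, the following Python sketch sums, for each row, all values that lie within 1000 below and 1000 above the current value. It is an illustration of the described behavior and not SIGNAL code; range_frame_sum is a hypothetical helper, and the rows are listed in ascending order of value.

```python
def range_frame_sum(values, preceding, following):
    """For each value v (ascending), sum all values w with v - preceding <= w <= v + following."""
    ordered = sorted(values)
    return [
        (v, sum(w for w in ordered if v - preceding <= w <= v + following))
        for v in ordered
    ]

values = [1000, 3000, 3000, 2000]   # Berlin, Paris, London, Paris
range_sums = range_frame_sum(values, preceding=1000, following=1000)
print(range_sums)
```

Unlike the ROWS mode, duplicate values always end up in the same frame: both rows with the value 3000 receive the same result.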
Some of the standard aggregate functions can be applied to windows. This allows the function to be applied to
defined groups of rows instead of the entire table.
For more general information about windows, refer to the Window Functions overview.
Caution
For very large data sets, these functions may require excessive CPU activity, causing long query execution
times.
Related Information
1.6.7.4.1 AVG
Calculates the average of a collection of numeric values. NULL values are ignored.
Syntax
AVG(<expression>)
Returns: The average as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
AVG(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
Related Information
1.6.7.4.2 BOOL_AND
Returns true if the supplied expression evaluates to true for all input rows, otherwise it returns false.
Syntax
BOOL_AND(<expression>)
Returns: A Boolean. Result is true if all input rows evaluate to true for the supplied expression, otherwise false.
BOOL_AND(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
Example
This query returns true if all rows record an approval performed by a manager.
SELECT
(SELECT BOOL_AND(event_name = 'Approve' AND performer = 'Manager'))
FROM THIS_PROCESS
Related Information
1.6.7.4.3 BOOL_OR
Returns true if the supplied expression evaluates to true for any input row, otherwise it returns false.
Syntax
BOOL_OR(<expression>)
Returns: A Boolean. Result is true if any input row evaluates to true for the supplied expression, otherwise false.
BOOL_OR(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
This query returns true if any row records an approval performed by a manager.
SELECT
(SELECT BOOL_OR(event_name = 'Approve' AND performer = 'Manager'))
FROM THIS_PROCESS
Related Information
1.6.7.4.4 COUNT
Counts the number of values in a specified column. NULL values aren't counted.
Syntax
COUNT(<expression>)
COUNT(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
SELECT COUNT(case_id)
FROM THIS_PROCESS
Related Information
1.6.7.4.5 FIRST
Syntax
FIRST(<expression>)
expression The collection of values from which the first is chosen. Accepted types: Number, Timestamp,
Duration, Text, Boolean.
Returns: The first value as a Number, Timestamp, Duration, Text or Boolean. The return type matches the type
of the expression parameter.
FIRST(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
This query returns the name of the first event in this process.
SELECT
(SELECT FIRST(event_name))
FROM THIS_PROCESS
1.6.7.4.6 LAST
Syntax
LAST(<expression>)
expression The collection of values from which the last is chosen. Accepted types: Number, Timestamp,
Duration, Text, Boolean.
Returns: The last value as a Number, Timestamp, Duration, Text or Boolean. The return type matches the type
of the expression parameter.
LAST(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
This query returns the name of the last event in this process.
SELECT
(SELECT LAST(event_name))
FROM THIS_PROCESS
Related Information
1.6.7.4.7 MAX
Syntax
MAX(<expression>)
expression The collection from which the maximum value is chosen. Accepted types: Number, Timestamp,
Duration.
Returns: The maximum value as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
MAX(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
SELECT MAX(discount)
FROM THIS_PROCESS
Related Information
1.6.7.4.8 MIN
Syntax
MIN(<expression>)
expression The collection from which the minimum value is chosen. Accepted types: Number, Timestamp,
Duration.
Returns: The minimum value as a Number, Timestamp or Duration. The return type matches the type of the
expression parameter.
MIN(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
SELECT MIN(discount)
FROM THIS_PROCESS
Related Information
1.6.7.4.9 STDDEV
The standard deviation describes the average deviation of all measured values from the mean value. A low
standard deviation indicates that the values tend to be close to the mean value. A high standard deviation
indicates that the values are spread out over a wide range.
Syntax
STDDEV(<expression>)
expression The collection of values for which you want to determine the standard deviation. Accepted
types: Number, Timestamp, Duration.
Returns: The standard deviation as a Number, Timestamp, or Duration. The return type matches the type of
the expression parameter.
STDDEV(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example 1
SELECT
MIN("Order Amount") AS "Min",
MAX("Order Amount") AS "Max",
AVG("Order Amount") AS "Avg",
STDDEV("Order Amount") AS "StdDev"
FROM THIS_PROCESS
Output:
Example 2
This query shows how the cycle time of a process changes week-to-week. Specifically, it calculates both the
average and the standard deviation of the cycle time on a weekly basis.
To smooth out short-term fluctuations in the data, it uses windowed versions of the average (AVG) and
standard deviation (STDDEV) functions. These versions calculate a moving average and moving standard
deviation for the weekly mean cycle time respectively.
In the nested query, sq1, all cases are clustered together by week. The mean cycle time within each cluster is
calculated.
The outer query applies the windowed AVG and STDDEV functions to the weekly mean cycle time. In both cases,
the window includes the three weeks preceding and the three weeks following the current week.
SELECT
"Week",
"Avg Cycle Time",
AVG("Avg Cycle Time") OVER (ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS mean,
STDDEV("Avg Cycle Time") OVER (ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS std
FROM
(
SELECT
DATE_TRUNC('WEEK', (SELECT LAST(end_time))) AS "Week",
AVG((SELECT LAST(end_time) - FIRST(end_time))) AS "Avg Cycle Time"
FROM THIS_PROCESS
ORDER BY 1 ASC NULLS FIRST
FILL timeseries('WEEK')
) AS sq1
1.6.7.4.10 SUM
Calculates the sum of all values in a collection of numeric values. NULL values are ignored.
Syntax
SUM(<expression>)
Returns: The sum as a Number or Duration. The return type matches the type of the expression parameter.
SUM(<expression>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ...] ]
[ ORDER BY <orderExpression> [ { ASC | DESC } ] [, <orderExpression>, ...] [ <windowFrame> ] ]
)
For more details about this syntax, refer to the Window Functions overview.
Example
Related Information
The functions in this section do not aggregate collections of values into a single value. They are applied to
individual rows within a window.
For more general information about windows, refer to the Window Functions overview.
Caution
For very large data sets, these functions may require excessive CPU activity, causing long query execution
times.
Related Information
1.6.7.5.1 LAG
Returns a single column value from the preceding row according to the window partition and sort criteria.
When no preceding row exists, a null value is returned.
Syntax
LAG(<columnName>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ... ] ]
ORDER BY <sortExpression>
)
Example
This query partitions data by case ID, sorting each partition chronologically. Each row includes the case ID and
event name along with two additional columns:
The result shows the name of each event that occurred, the name of the preceding event and how much time
separates the two events. For events having no predecessor, a null value is displayed.
SELECT
case_id,
LAG(event_name) OVER (PARTITION BY case_id ORDER BY end_time) AS Predecessor,
event_name AS Event,
end_time - LAG(end_time) OVER (PARTITION BY case_id ORDER BY end_time) AS cycle_time
FROM FLATTEN(THIS_PROCESS)
Example
Example output:
1.6.7.5.2 LEAD
Returns a single column value from the succeeding row according to the window partition and sort criteria.
When no succeeding row exists, a null value is returned.
Syntax
LEAD(<columnName>) OVER (
[ PARTITION BY <partitionExpression> [, <partitionExpression>, ... ] ]
ORDER BY <sortExpression>
)
Parameter Description
Example
This query partitions data by case ID, sorting each partition chronologically. Each row includes the case ID and
event name along with two additional columns:
The result shows the name of each event that occurred, the name of the succeeding event and how much time
separates the two events. For events having no successor, a null value is displayed.
SELECT
case_id,
event_name AS Event,
LEAD(event_name) OVER (PARTITION BY case_id ORDER BY end_time) AS Successor,
LEAD(end_time) OVER (PARTITION BY case_id ORDER BY end_time) - end_time AS CycleTime
FROM FLATTEN(THIS_PROCESS)
Example
Example output:
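As a rough illustration of what LAG and LEAD do, the following Python sketch partitions a small event list by case ID, sorts each partition by end time, and looks one row back and one row ahead. It is not SIGNAL code. The event data, the time unit (minutes) and the function name lag_lead are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical flattened events: (case_id, event_name, end_time in minutes).
events = [
    ("c1", "Create", 0), ("c1", "Approve", 30), ("c1", "Ship", 90),
    ("c2", "Create", 10), ("c2", "Cancel", 25),
]

def lag_lead(events):
    """Return (case_id, predecessor, event, successor, time_since_predecessor) per event."""
    by_case = defaultdict(list)
    for case_id, name, end_time in events:
        by_case[case_id].append((end_time, name))      # PARTITION BY case_id
    result = []
    for case_id, items in by_case.items():
        items.sort()                                   # ORDER BY end_time
        for i, (end_time, name) in enumerate(items):
            has_prev = i > 0
            has_next = i + 1 < len(items)
            predecessor = items[i - 1][1] if has_prev else None   # LAG(event_name)
            successor = items[i + 1][1] if has_next else None     # LEAD(event_name)
            since_prev = end_time - items[i - 1][0] if has_prev else None
            result.append((case_id, predecessor, name, successor, since_prev))
    return result

rows = lag_lead(events)
for row in rows:
    print(row)
```

The first event of each case has no predecessor and the last event has no successor, which corresponds to the null values described above.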
Returns a list of numbers representing a running occurrence count of distinct items in an event list.
The function operates at event level. When applied to an event column, it calculates a running total of
occurrences for each distinct value in the event list. At each step, it adds to a list the number of item
occurrences up to that point.
Syntax
<aggregateFunction> ( <occurrenceAlias> )
FROM ( SELECT OCCURRENCE( <expression> ) AS <occurrenceAlias> ) AS
<subqueryAlias>
Parameter Description
occurrenceAlias An alias for the OCCURRENCE function's output for the ag-
gregate function to use.
Returns: A list of numbers containing the running totals of each distinct item in expression. Note that:
Behavior
To illustrate the function's behavior, let's imagine an event column containing six values: ["A", "B", "A",
"C", "B", "A"]. The following table shows how OCCURRENCE proceeds to calculate a return value for this
column as it encounters each value:
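| Value Number | Value | Remark | Return Value of OCCURRENCE So Far |
|--------------|-------|--------------------------|-----------------------------------|
| 1 | "A" | First occurrence of "A" | [1] |
| 2 | "B" | First occurrence of "B" | [1, 1] |
| 3 | "A" | Second occurrence of "A" | [1, 1, 2] |
| 4 | "C" | First occurrence of "C" | [1, 1, 2, 1] |
| 5 | "B" | Second occurrence of "B" | [1, 1, 2, 1, 2] |
| 6 | "A" | Third occurrence of "A" | [1, 1, 2, 1, 2, 3] |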
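The running count described above can be mimicked in a few lines of Python. This is a sketch of the described behavior only, not SIGNAL code; the function names occurrence and has_rework and the threshold parameter are hypothetical.

```python
from collections import Counter

def occurrence(values):
    """For each value, return how often it has occurred up to and including this position."""
    seen = Counter()
    result = []
    for value in values:
        seen[value] += 1
        result.append(seen[value])
    return result

def has_rework(event_names, threshold):
    """True if any event name occurs at least `threshold` times in the list."""
    counts = occurrence(event_names)
    return bool(counts) and max(counts) >= threshold

running_counts = occurrence(["A", "B", "A", "C", "B", "A"])
print(running_counts)
```

Taking the maximum of the returned list gives the number of times the most repeated item occurred, which is how the function can be used to detect rework.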
Example
The OCCURRENCE function is useful for identifying cases containing rework, indicated by repeated events. The
following query selects the ID and the list of event names for each case.
The result set is filtered by the subquery in the WHERE clause. The OCCURRENCE function is applied to the
event_name column, which returns a list of running occurrences of each event. The MAX function finds the
largest number in this list, indicating the number of times the most repeated event occurred per case. Only
cases containing an event that occurred four or more times are included in the result.
SELECT
case_id,
(
SELECT AVG(occ) FROM (
SELECT OCCURRENCE(event_name) AS occ
) AS sub
) AS "Average"
FROM THIS_PROCESS
WHERE case_id = '00098'
The OCCURRENCE function determines for each value the number of occurrences up to the current point. The
occurrence values for case '00098' are [1, 1, 2, 3, 4, 1, 1]. Consequently, the AVG function calculates
case '00098' as 1.857, the sum of its occurrence values (13) divided by the number of elements (7).
Related Information
The functions in this section are window functions that sort data partitions and assign a rank value for each row
within a result set.
Using these functions, you can filter the results and sort them for consistent stacking in visualizations such as
stacked bar charts.
Caution
For very large data sets, these functions may require excessive CPU activity, causing long query execution
times.
Returns the rank of each row in a result set based on the order defined in the OVER clause for each partition.
The DENSE_RANK function assigns the same rank to rows with identical values and doesn't skip any rank
positions after these identical rows. For example, six rows can have the rank values 1, 2, 2, 2, 3, 4. The next
nonidentical row always receives the immediately succeeding rank.
Syntax
DENSE_RANK() OVER (
[ PARTITION BY <expression> [, <expression>, ...] ]
[ ORDER BY <expression> [ { ASC | DESC } ] [, <expression>, ...] ]
)
Parameters Description
ORDER BY: Sorts each partition based on the defined criteria. Optional. If ORDER BY clause isn't included in
your query, each row will have the same rank value, one.
ASC | DESC: ASC sorts the result set in ascending order and DESC sorts it in descending order based on the
expressions defined in ORDER BY.
Example 1
In the following query, the cities are first partitioned by the type of good and sorted by the count of high value
orders per Type of Goods. The query then returns the cities with order amount greater than 1500 ranked in
descending order.
Example
Result:
The following query partitions the result set by the type of goods and returns the list of cities with order amount
greater than 1500 within a partition. In the result set, the RANK and DENSE_RANK of all the returned rows is
one because the ORDER BY clause isn't included in the query.
Example
Result:
Returns the rank of each row in a result set based on the order defined in the OVER clause for each partition.
The RANK function assigns the same rank to rows with identical values and skips the rank positions that
these identical rows occupy. For example, six rows can have the rank values 1, 2, 2, 2, 5, 6. The rank of the
next nonidentical row depends on the number of rows that share the previous rank: that number determines
how many rank values are skipped.
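The difference between RANK and DENSE_RANK is easiest to see side by side. The following Python sketch computes both for an ascending order. It is not SIGNAL code, and the sample values and function names are made up for this illustration.

```python
def rank(values):
    """RANK-style: 1 plus the number of values that sort strictly before this one."""
    return [1 + sum(1 for other in values if other < value) for value in values]

def dense_rank(values):
    """DENSE_RANK-style: position of the value among the distinct values."""
    distinct = sorted(set(values))
    return [distinct.index(value) + 1 for value in values]

values = [10, 20, 20, 20, 30, 40]   # hypothetical values, already sorted ascending
ranks = rank(values)
dense_ranks = dense_rank(values)
print(ranks)
print(dense_ranks)
```

Three rows share rank 2, so RANK continues with 5 while DENSE_RANK continues with 3.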
Syntax
RANK() OVER (
[ PARTITION BY <expression> [, <expression>, ...] ]
[ ORDER BY <expression> [ { ASC | DESC } ] [, <expression>, ...] ]
)
Parameters Description
ASC | DESC: ASC sorts the result set in ascending order and DESC sorts it in descending order based on the
expressions defined in ORDER BY.
Example
In the following query, the cities are first partitioned by the type of good and sorted by the count of high value
orders per Type of Goods. The query then returns the cities with order amount greater than 1500 ranked in
descending order.
Example
Result:
Returns the calculated row number based on partitioned and sorted set of values.
The ranking happens sequentially based on the order defined in the OVER clause for each partition. The rank
value is different even if the rows contain the same values.
Syntax
ROW_NUMBER() OVER (
[ PARTITION BY <expression> [, <expression>, ...] ]
[ ORDER BY <expression> [ { ASC | DESC } ] [, <expression>, ...] ]
)
Parameters Description
expression The column or argument that determines how the rows are
grouped and how each partition is sorted. Mandatory if PARTITION BY
or ORDER BY clauses are included in the query.
ORDER BY: Sorts each partition based on the defined criteria. Optional.
ASC | DESC: ASC sorts the result set in ascending order and DESC sorts it in descending order based on the
expressions defined in ORDER BY.
Example
In the following query, the order amount is first partitioned based on the city and the type of good. The query
then returns the order amount ranked in descending order in the result set.
Example
Result:
Learn about the BUCKET() function in SIGNAL, the process mining query language of SAP Signavio Process
Intelligence.
The BUCKET() function calculates the indexes of values in a range of values (as buckets) that are equal in
size. The bucket indexes can then be applied to an aggregate function, for example a COUNT() function, which
counts the values within that bucket.
The BUCKET() function uses the following parameters to calculate the bucket indexes as positive integer
numbers:
Syntax:
Parameter Description
min The starting value of the bucket. The values must be either
numeric or duration values.
bucket_width The width of the bucket. The value must be either a numeric
or a duration value that is greater than zero.
To understand the range of values in each bucket, you can first determine the bucket boundaries and then
apply them to the result of the BUCKET() function. To do so, use the following pattern:
SELECT
IF (b = 0, NULL, ((b - 1) * bucket_width) + min) AS bucket_start,
IF (b = #_inlier_buckets + 1, NULL, (b * bucket_width) + min) AS bucket_end,
value
FROM (
SELECT BUCKET(expression, min, bucket_width, #_inlier_buckets) AS b,
COUNT(1) AS value
FROM ...
) AS sub
The bucket boundaries pattern isn't required for the BUCKET() function to work. It provides a useful way to
understand the range of values in the bucket. The pattern defines the minimum and maximum values being
aggregated in the bucket.
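The boundary pattern implies that bucket index 0 collects values below min, indexes 1 to #_inlier_buckets cover equally wide ranges, and index #_inlier_buckets + 1 collects everything above. The following Python sketch reproduces that indexing and the boundary formulas. It is not SIGNAL code, and it assumes that a bucket includes its start value and excludes its end value, which this section does not state explicitly.

```python
import math

def bucket(value, minimum, bucket_width, inlier_buckets):
    """Return the bucket index of value: 0 = below minimum, inlier_buckets + 1 = outlier."""
    if value < minimum:
        return 0
    index = math.floor((value - minimum) / bucket_width) + 1
    return min(index, inlier_buckets + 1)

def bucket_bounds(b, minimum, bucket_width, inlier_buckets):
    """Mirror the boundary pattern: None stands for an open (NULL) boundary."""
    start = None if b == 0 else (b - 1) * bucket_width + minimum
    end = None if b == inlier_buckets + 1 else b * bucket_width + minimum
    return start, end

# Settings as in Example 2 below: min = 1, bucket_width = 100, 10 inlier buckets.
for amount in (0.5, 1, 100, 101, 1000, 1001, 5000):
    b = bucket(amount, 1, 100, 10)
    print(amount, b, bucket_bounds(b, 1, 100, 10))
```

With these settings, the last inlier bucket ends at 1 + 100 x 10 = 1001, and any larger value falls into the outlier bucket 11.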
Example 1:
The following example calculates the bucket boundaries (bucket_start and bucket_end). Then aggregates the
total number of cases based on 'Order Amount'. The cases are bucketed into intervals of 100€ (bucket_width).
The results are then calculated for 100 buckets (#_inlier_buckets) where the 'Order Amount' is greater than 1€
(min) and less than 10001€ (max). The maximum value is calculated with the following formula: min + bucket
width x #_inlier_buckets = 1+ 100 x 100 = 10001.
SELECT
total_cases,
bucket_id,
IF (bucket_id > 100, -1, ((bucket_id-1) * 100) + 1) AS bucket_start,
IF (bucket_id > 100, -1, (bucket_id * 100) + 1) AS bucket_end
FROM (
SELECT
BUCKET("Order Amount", 1, 100, 100) AS bucket_id,
COUNT(case_id) AS total_cases
FROM FLATTEN(THIS_PROCESS)
WHERE "Order Amount" IS NOT NULL
GROUP BY 1
ORDER BY 1
) AS sub
Example 2:
The following example aggregates the total number of cases based on the 'Order Amount'. The cases are
bucketed into intervals of 100€ (bucket_width). The results are calculated for 10 buckets (#_inlier_buckets)
where the 'Order Amount' is greater than 1€ (min) and less than 1001€ (max). The maximum value is
calculated with the following formula: min + bucket width x #_inlier_buckets = 1 + 100 x 10 = 1001
The result is displayed in a breakdown widget. The values on the X axis represent the bucket ID. Only buckets
that contain at least one case are displayed. Cases whose order amount exceeds the maximum value are
added to the outlier bucket, which has a bucket ID of 11.
SELECT
BUCKET("Order Amount", 1, 100,10) as order_amount,
COUNT(case_id)
FROM FLATTEN(THIS_PROCESS)
WHERE "Order Amount" is not null
GROUP BY 1
ORDER BY 1
Result:
Learn about the numeric rounding functions in SIGNAL, the process mining query language of SAP Signavio
Process Intelligence.
The numeric rounding functions simplify calculations by converting floating-point numbers to an approximate
value that is shorter, clearer, and easier to remember.
These functions help group data, set threshold values for comparisons or histogram bucketing, and calculate
complex value expressions such as cost or duration.
CEIL, for example:
• 10.56 is rounded to 11
• -10.56 is rounded to -10
FLOOR, for example:
• 10.56 is rounded to 10
• -10.56 is rounded to -11
ROUND, for example:
• 10.45 is rounded to 10
• 10.56 is rounded to 11
• -10.45 is rounded to -10
• -10.5 is rounded to -11
TRUNC, for example:
• 10.45 is rounded to 10
• -10.5 is rounded to -10
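The four groups of examples above correspond to rounding up, rounding down, rounding to the nearest integer with halves going away from zero, and cutting off the decimal part. The Python sketch below reproduces the listed values. It is not SIGNAL code. The helper round_half_away is written by hand because Python's built-in round() rounds halves to the nearest even number, which would turn -10.5 into -10.

```python
import math

def round_half_away(x):
    """Round to the nearest integer; exact halves move away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

ceil_results = [math.ceil(10.56), math.ceil(-10.56)]
floor_results = [math.floor(10.56), math.floor(-10.56)]
round_results = [round_half_away(v) for v in (10.45, 10.56, -10.45, -10.5)]
trunc_results = [math.trunc(10.45), math.trunc(-10.5)]

print(ceil_results, floor_results, round_results, trunc_results)
```

For positive numbers, truncating and rounding down give the same result; the two only differ for negative numbers.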
Parameter Description
Examples
Example 1
The following query returns the order amount rounded up or down to the nearest integer value based on the
specified function.
SELECT "case_id",
"Order Amount",
CEIL("Order Amount") AS "Order Amount - Ceiling",
FLOOR("Order Amount") AS "Order Amount - Floor",
ROUND("Order Amount") AS "Order Amount - Round",
TRUNC("Order Amount") AS "Order Amount - Truncated"
FROM THIS_PROCESS
LIMIT 10
Result:
The following query shows how each function rounds negative values.
Result:
1.7 Keywords
Keywords are words with special significance in SIGNAL, meaning they cannot be used as identifiers in a query.
If you wish to use a keyword as an identifier, you must enclose it in double quotation marks.
For example, a query that gives a selected column the alias Sum is invalid, because 'SUM', being the name of
a function, is a SIGNAL keyword. (SIGNAL keywords are case-insensitive, so the difference in case is of no
consequence.) To use 'Sum' as an identifier, you must enclose it in double quotation marks, that is, "Sum".
• ABS
• ALL
• ANALYZE
• AND
• ANY
• AS
• ASC
• BARRIER
• BEHAVIOUR
• BETWEEN
• BOOL_AND
• BOOL_OR
• BUCKET
• BY
• CASE
• CASE_ID
• CATEGORY
• CEIL
• CHAR_INDEX
• CHAR_LENGTH
• COALESCE
• CONCAT
• COUNT
• CREATE
• CURRENT
• DATE_ADD
• DATE_DIFF
• DATE_PART
• DATE_TRUNC
• DEFAULT
• DENSE_RANK
• DESC
• DESCRIBE
• DISTINCT
• DROP
• ELSE
• END
• END_TIME
• EVENT_ID
• EVENT_NAME
• EVENTS
• EXACT
• EXPLAIN
• EXTERNAL
• FALSE
• FILL
• FILTER
• FIRST
• FLATTEN
• FLOOR
• FOLLOWING
• FORMAT
• FROM
• GRANT
• GROUP
• HAVING
• IF
• ILIKE
• IN
• INVOKER
• IS
• JOIN
• JSON
• LAG
• LAST
• LEAD
• LEFT
• LIKE
• LIMIT
• LOCATION
• LOG
• MATCHES
• MAX
• MEDIAN
• MIN
• NOT
• NOW
• NULL
• NULLS
• OCCURRENCE
• ODATA
• OFFSET
• ON
• ONLY
• OR
• ORDER
• OUTER
• OVER
• PARQUET
• PARTITION
• PERCENT
• PERCENTILE_CONT
• PERCENTILE_DESC
• PERMISSIONS
• POW
• PRECEDING
• PRIVATE
• PUBLIC
• RANGE
• RANK
• REGR_INTERCEPT
• SECURITY
• SELECT
• SIGN
• SQRT
• START_TIME
• STDDEV
• SUBSTRING
• SUBSTRING_AFTER
• SUBSTRING_BEFORE
• SUM
• TABLE
• TABULAR
• TEXT
• THEN
• TIMESERIES
• TIMESTAMP
• TO
• TO_NUMBER
• TO_STRING
• TO_TIMESTAMP
• TRUE
• TRUNC
• UNBOUNDED
• UNION
• USING
• VIEW
• WHEN
• WHERE
• WITH
• WITHIN
1.8 Performance
The SIGNAL Engine processes queries and can handle very large data sets rapidly, although some factors can
affect performance.
The SIGNAL Engine performs numerous optimizations automatically to maximize query performance. In
general, the engine can comfortably handle event logs consisting of up to 1 billion entries.
For a certain subset of language features, performance can suffer when operating on a data set of very large
size. This can compromise the execution time of the containing query. A function or operator is considered
vulnerable to performance issues if its execution time tends to be greater than five seconds when running
on a data set of 1 billion events. All functions and operators prone to such performance issues are flagged
accordingly in their documentation.
In SIGNAL, the timeout length of a query is 10 minutes. Once a query has run for 10 minutes without result,
that query is terminated and an error is returned.
Tip
If you receive a timeout error, refreshing your browser triggers a restart of the query. A result may be
returned by this subsequent attempt, however it isn't guaranteed.
Learn about using SIGNAL, the process mining query language of SAP Signavio Process Intelligence, in the
following tutorial.
Based on a sample process with test data, this tutorial introduces you to the main principles of SIGNAL,
starting from simple case-attribute based queries to more complex event-based queries.
Count cases and cities
Questions:
• How many cases exist for this process?
• How many different cities are involved in this process?
• How many cases exist for each city?
• How many cases exist for New York and Miami?
Learning goals:
• Count a case attribute
• Count the distinct records of a case attribute
• Count two or more case attributes
• Rename case attribute names with alias names
• Sort the result set
• Filter a result set to include only records that fulfill a specified condition
Keywords: COUNT, COUNT DISTINCT, AS, ORDER BY, WHERE, IN
Analyze order amounts
Questions:
• What is the average order amount of this process?
• What is the average order amount in Houston?
• What is the total order amount in Boston?
• What is the percentage order amount in Boston compared to the total order amount?
Learning goals:
• Determine the average value of a case attribute
• Determine a filtered average case attribute
• Sum up a case attribute
• Determine the percentage value of a case attribute compared to the overall value
• Apply filter condition(s) within a query
Keywords: AVG, SUM, FILTER
Determine process cycle times
Questions:
• How long is the average cycle time of all cases?
• How long is the average cycle time by city?
• What are the maximum / minimum cycle times by city?
Learning goals:
• Calculate cycle times
• Perform subqueries on event attributes
• Determine cycle time by case attribute
• Calculate the largest / smallest values
Keywords: MAX, MIN
Investigate events
Questions:
• How many cases have been closed / canceled?
• What is the drop-out rate?
Learning goals:
• Count the total number of events
• Filter for not true conditions
Keywords: IF, NOT, MATCHES
Note
For a more detailed view of the process flow, check the Process Discovery widget of the sample process.
In this tutorial, the following case and event attributes are used:
• Case attributes: Case ID, City, Order Amount in EUR
• Event attributes: EventName
Example (extract):
Note
For more detailed information about the test data, see Process Settings > Data in the sample process.
2.2 Count cases and cities
Get an overview of the case attributes used in the SIGNAL tutorial.
You want to get an overview of the case attributes of the process: How many cases exist for this process?
How many different cities are involved in this process? How many cases exist for each city? How many cases
exist for New York and Miami?
Example 1: How many cases exist for this process?
Learning success: Count a case attribute.
Query instruction: Count (keyword: COUNT) all cases (expression: case_id) in this process.
SIGNAL syntax:
SELECT
COUNT(case_id)
FROM THIS_PROCESS
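Example 2: How many different cities are involved in this process?
Learning success: Count the distinct records of a case attribute.
Query instruction: Count (keyword: COUNT) all cities (expression: City) in this process. Unlike example 1, only
distinct (keyword: DISTINCT) records are counted.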
SIGNAL syntax:
SELECT
COUNT(DISTINCT City)
FROM THIS_PROCESS
Query result: The total number of distinct cities, displayed in a Value widget:
Example 3: How many cases exist for each city?
Learning success:
• Count two or more case attributes
• Rename case attribute names with alias names
• Sort the result set
Query instruction: Count (keyword: COUNT) the number of cases (expression: case_id) by city (expression:
City). Rename the columns with aliases (keyword: AS) to "Case Numbers" and "Site". This makes
labels for widgets easier to understand. Finally, sort the result set by the number of cases (keywords: ORDER BY 1) in
descending order (keyword: DESC).
The order expression 1 selects the column to sort by, here the first column of the result set. You can sort in
ascending order (keyword: ASC), in descending order (keyword: DESC), with null values first (keyword: NULLS FIRST),
or with null values last (keyword: NULLS LAST).
SIGNAL syntax:
SELECT
COUNT(case_id) AS "Case Numbers", City AS "Site"
FROM THIS_PROCESS
ORDER BY 1 DESC
Query display: The total number of cases by city, displayed in a SIGNAL table widget and in a Breakdown
widget.
Example 4: How many cases exist for New York and Miami?
Learning success: Filter a result set to include only records that fulfill a specified condition.
Query instruction: This query is very similar to example 3, but in this case you query only the cases for
New York and Miami instead of all cases. To filter the result, introduce the filter (keyword: WHERE) and specify the
filter condition (keyword: IN + expression: 'New York', 'Miami'). Sort the result set by the number of cases
(keywords: ORDER BY 1) in descending order (keyword: DESC).
SIGNAL syntax:
SELECT
COUNT(case_id) AS "Case Numbers", "City" AS "Site"
FROM THIS_PROCESS
WHERE City IN('New York', 'Miami')
ORDER BY 1 DESC
Query display: The total number of cases by filtered cities, displayed in a SIGNAL table widget and in a
Breakdown widget.
2.3 Analyze order amounts
You want to get insights into the order amounts: What is the average order amount of this process? What is
the average order amount in Houston? What is the total order amount in Boston? What is the percentage of the
order amount in Boston compared to the total order amount?
Example 1: What is the average order amount of this process?
Learning success: Determine the average value of a case attribute.
Query instruction: Determine the average value (keyword: AVG) of the order amount (expression: "Order
Amount in EUR").
SIGNAL syntax:
SELECT
AVG("Order Amount in EUR")
FROM THIS_PROCESS
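Example 2: What is the average order amount in Houston?
Learning success: Determine a filtered average case attribute.
Query instruction: Determine the average order amount (see example 1). Introduce the filter (keyword:
WHERE) and specify the filter condition (expression: "City"='Houston').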
SIGNAL syntax:
SELECT
AVG("Order Amount in EUR")
FROM THIS_PROCESS
WHERE("City"='Houston')
Example 3: What is the total order amount in Boston?
Learning success: Sum up a case attribute.
Query instruction: Sum up (keyword: SUM) the total order amount (expression: "Order Amount in EUR").
Introduce the filter (keyword: WHERE) and specify the filter condition (expression: "City"='Boston').
SIGNAL syntax:
SELECT
SUM("Order Amount in EUR")
FROM THIS_PROCESS
WHERE("City"='Boston')
Query result: The total order amount in Boston, displayed in a Value widget:
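Example 4: What is the percentage order amount in Boston compared to the total order amount?
Learning success:
• Determine the percentage value of a case attribute compared to the overall value.
• Apply filter condition(s) within a query.
Query instruction: Sum up the order amount in Boston (see example 3). Unlike example 3, you cannot
apply the filter condition with a WHERE clause at the end of the query, because that would also restrict the total
order amount in the denominator. Instead, attach the filter to the first SUM (keyword: FILTER + filter condition:
(WHERE "City"='Boston')), then divide by the unfiltered SUM to calculate the percentage value.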
SIGNAL syntax:
SELECT
SUM("Order Amount in EUR")
FILTER (WHERE "City"='Boston')
/SUM("Order Amount in EUR")
* 100
FROM THIS_PROCESS
Query result: The percentage order amount of Boston, displayed in a Value widget:
2.4 Determine case cycle times
Read about how cycle times are determined in the SIGNAL tutorial.
You want to determine the cycle times of your process: How long is the average cycle time of all cases? How
long is the average cycle time by city? What are the maximum / minimum cycle times by city?
Example 1: How long is the average cycle time of all cases?
Learning success:
• Calculate cycle times
• Perform subqueries on event attributes
Query instruction: You have to calculate the cycle times first and then aggregate them to an average value. The
cycle times are calculated from the event-based timestamps, so you have to perform a subquery (keyword:
SELECT, enclosed in parentheses).
To calculate the cycle time, subtract the first event timestamp (keyword: SELECT FIRST +
expression: end_time) from the last event timestamp (keyword: SELECT LAST + expression: end_time). From
these values, you aggregate the average value (keyword: AVG).
SIGNAL syntax:
SELECT
AVG(
(SELECT LAST(end_time))
-
(SELECT FIRST(end_time)))
FROM THIS_PROCESS
Example 2: How long is the average cycle time by city?
Learning success: Determine cycle time by case attribute.
Query instruction: Determine the average cycle time (see example 1) by city (expression: City). Rename the
calculated value with an alias (keyword: AS) to "Cycle Time".
SIGNAL syntax:
SELECT
AVG(
(SELECT LAST(end_time))
-
(SELECT FIRST(end_time))) AS "Cycle Time", "City"
FROM THIS_PROCESS
Query result: The average cycle times by city, displayed in a SIGNAL table widget:
Example 3: What are the maximum / minimum cycle times by city?
Learning success: Calculate the largest / smallest values.
Query instruction: As in example 1, you calculate the cycle times, this time by city. From these values, you
determine the smallest value (keyword: MIN) and the largest value (keyword: MAX). Rename the values with
aliases (keyword: AS) to "Maximum Cycle Time" and "Minimum Cycle Time". Finally, sort the result set by
city (keywords: ORDER BY 3) in ascending order (keyword: ASC).
SIGNAL syntax:
SELECT
MAX(
(SELECT LAST(end_time))
-
(SELECT FIRST(end_time))) AS "Maximum Cycle Time",
MIN(
(SELECT LAST(end_time))
-
(SELECT FIRST(end_time))) AS "Minimum Cycle Time",
"City"
FROM THIS_PROCESS
ORDER BY 3 ASC
2.5 Investigate events
Read about how events of the process are used in the SIGNAL tutorial.
You want to get an overview of the events of the process: How many cases have been closed / canceled?
What is the drop-out rate? How many cases follow the standard process? How many cases are canceled
although the T-shirt has been sent for printing?
Example 1: How many cases have been closed / canceled?
Query instruction: You perform two subqueries for the last event names and sum up all cases for which the
respective condition is fulfilled. The event names are event-based attributes, so you perform a subquery
(keyword: SELECT, enclosed in parentheses).
Select the last events of the cases (keyword: SELECT LAST + expression: event_name).
If (keyword: IF) the last event name is "Receive Delivery Confirmation" (keyword: IN + expression: ('Receive
Delivery Confirmation')), count 1, otherwise 0 (expression: 1,0). Sum up (keyword: SUM) the values for all cases.
Rename the value with an alias (keyword: AS) to "Closed Cases".
For the canceled cases, use the same condition with the values swapped (expression: 0,1) and the alias
"Canceled Cases".
SIGNAL syntax:
SELECT
SUM
(IF
((SELECT LAST("event_name"))
IN('Receive Delivery Confirmation'),1,0)) AS "Closed Cases",
SUM
(IF
((SELECT LAST("event_name"))
IN('Receive Delivery Confirmation'),0,1)) AS "Canceled Cases"
FROM THIS_PROCESS
Query result: The total number of closed and canceled cases, displayed in a SIGNAL table widget:
Example 2: What is the drop-out rate?
Learning success: Filter for not true conditions.
Query instruction: Select the last events of the cases (keyword: SELECT LAST + expression: event_name).
Filter the result set (keyword: FILTER + WHERE) for event names other than (keyword: NOT) "Receive Delivery
Confirmation" (keyword: IN + expression: ('Receive Delivery Confirmation')) and calculate the percentage.
SIGNAL syntax:
SELECT
(COUNT(case_id) FILTER
(WHERE NOT(SELECT LAST(event_name)
IN('Receive Delivery Confirmation'))))
/
COUNT(case_id)
*100
FROM THIS_PROCESS
Example 3: How many cases follow the standard process?
Query instruction: This query is similar to example 2, but in this case you search not only for the first or last
event but for a certain pattern of events (keyword: MATCHES).
1. The starting event is "Receive Customer Order" (expression: ^'Receive Customer Order').
2. The next event (directly or indirectly following the start event) is "Receive Payment" (expression:
~>'Receive Payment').
3. The next event (directly or indirectly following the preceding event) is either "Ship Goods Standard" or
"Ship Goods Express" (expression: ~>('Ship Goods Standard'|'Ship Goods Express')).
4. The final event (directly or indirectly following the preceding event) is "Receive Delivery Confirmation"
(expression: ~>'Receive Delivery Confirmation'$).
SIGNAL syntax:
SELECT
(COUNT(case_id) FILTER
(WHERE event_name MATCHES
(^ 'Receive Customer Order'
~>'Receive Payment'
~>('Ship Goods Standard'|'Ship Goods Express')
~> 'Receive Delivery Confirmation'$)))
/
COUNT(case_id)
*100
FROM THIS_PROCESS
Example 4: How many cases are canceled although the T-shirt has been sent for printing?
Query instruction: This query is similar to example 3, but for a different type of pattern: You want to determine
how many orders have been canceled after the T-shirt has already been sent for printing.
SIGNAL syntax:
SELECT
(COUNT(case_id) FILTER
(WHERE event_name MATCHES
(^ 'Receive Customer Order'
~>'Receive Payment'
~>'Send T-shirt to Printing'
~>'Order Canceled'$)))
/
COUNT(case_id)
*100
FROM THIS_PROCESS
What is the cookbook, who is it for and what will they get out of it?
Audience
The intended audience of the cookbook is process analysts who already have a basic familiarity with SIGNAL
and knowledge of similar querying languages, like SQL. We recommend that the reader already knows the core
keywords and functions available, and has already used the language to construct simple queries.
Purpose
SIGNAL, the SAP Signavio Analytics Language, is the process mining query language of SAP Signavio Process
Intelligence. It's a powerful tool with numerous features that can be used and combined in many different ways.
This cookbook aims to guide early-stage users who already have some basic familiarity with SIGNAL and help
them to develop their abilities with the language.
Structure
The cookbook's approach is to guide the reader through a series of examples of process analysis using
SIGNAL. You can think of each example as a use case or recipe demonstrating how to use the language to
achieve some nontrivial goal.
They're organized into categories corresponding to typical process analysis activities, such as determining
conformance, variants, or rework. This way, you can use the cookbook not only as a learning resource but also
as a reference guide that you can browse to learn how to achieve specific, commonly occurring
goals.
Learn about different ways of calculating cycle times from your process data.
Cycle time refers to the recorded time between two events in a case. By itself, cycle time generally refers to the
duration of a whole case, namely the time between that case's first and last events. However, cycle time can
also refer to the duration between two arbitrary events of interest to the user.
Goal
Solution
Calculate the complete cycle time for all cases and then find the average value from this collection.
Discussion
Every event has an associated end_time value. We can locate the earliest event in a case by using the FIRST
function, which returns the first element in a collection.
FIRST(end_time)
Similarly, we can locate the latest event in a case using the LAST function.
To calculate the duration of a case, subtract the earliest end_time from the latest end_time. Keep in
mind that end_time is an event-level attribute, so this subtraction is done in a subquery.
This expression returns the cycle times of all cases. To calculate the average of all these values, supply the
expression as input to the AVG function.
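The steps above can be put together as follows. This is a minimal sketch that mirrors the cycle time query from the tutorial. It assumes that the event timestamp attribute is named end_time, as in the tutorial queries, and the alias "Average Cycle Time" is only an illustrative choice.

```sql
-- Duration of one case: timestamp of its latest event minus timestamp of
-- its earliest event, each read in an event-level subquery.
-- AVG then aggregates these per-case durations across all cases.
SELECT
  AVG(
    (SELECT LAST(end_time))
    -
    (SELECT FIRST(end_time))
  ) AS "Average Cycle Time"
FROM THIS_PROCESS
```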
Goal
Discover how long each step of a case took. In other words, we want to know the cycle times between each pair
of consecutive events.
Solution
For each case, order the events chronologically and compare the timestamp of each event to its predecessor
(where it has one).
To get the timestamp of the current row, we can select its end_time attribute.
As part of the solution, we want to compare each row and its immediate neighbor. For comparing a row and its
predecessor, the LAG window function is available.
Tip
Window functions allow you to group your data into partitions and carry out operations within that
partition. You can use a window function to perform calculations on the set of rows in the same partition as
the current row.
The LAG function accesses a column value from the row preceding the current row. Selecting LAG(end_time)
returns the timestamp of the previous row. Bear in mind that the first row in each partition has no predecessor,
in which case LAG returns NULL.
For LAG to find the correct predecessor, its OVER clause has to specify two things:
1. The partition. We're examining all events on a per-case basis, so we partition by case_id.
2. The sorting. We need the events in chronological order, so we sort by end_time.
To get the cycle time between the two consecutive events, we can subtract the timestamp of the previous
(earlier) row from that of the current one.
Selecting this alone would give us only a list of durations. Let's also associate each duration with the names of
the two enclosing events. We can get the event name of the current event by selecting event_name.
To get the name of a previous event, we can again use LAG. We partition and order the rows in the same way as
before, but instead select the event_name.
We have now defined the columns of our result set, which we can select along with the case_id:
SELECT
case_id,
LAG(event_name) OVER (PARTITION BY case_id ORDER BY end_time) AS Predecessor,
event_name AS Event,
end_time - LAG(end_time) OVER (PARTITION BY case_id ORDER BY end_time) AS cycle_time
All that remains is to add a FROM clause. Keep in mind that window functions can't work with nested data, so
the process data table must be flattened.
SELECT
case_id,
LAG(event_name) OVER (PARTITION BY case_id ORDER BY end_time) AS Predecessor,
event_name AS Event,
end_time - LAG(end_time) OVER (PARTITION BY case_id ORDER BY end_time) AS cycle_time
FROM FLATTEN(THIS_PROCESS)
Example Output
3.2 Variants
Goal
Determine which variants are the most prevalent within the process data. We'd like to choose how many
variants to see and also query other data associated with each variant, in this case the average cycle time.
Solution
Find which distinct sequences of events exist in the process data, count them, and order the result numerically.
Also, for each distinct sequence, calculate its average cycle time.
Discussion
As specified by the data model, each case contains a nested table of event data. One of the columns of this
nested table is event_name. Selecting event_name in a query therefore returns a list of event names.
Although this example contains three event sequences, there are only two distinct sequences – notice how
the first two cases ship goods standard, while the third case ships goods express. To show only the distinct
variants, you can group them by their sequence of event names.
To count how many cases follow each variant in this grouped data set, apply COUNT to a case-level attribute.
SELECT
event_name AS "Variant",
COUNT(case_id)
FROM THIS_PROCESS
GROUP BY 1
To rank the variants starting with the most prevalent, sort this column in descending order of value. To choose
only the most prevalent N values, limit the result set to N.
SELECT
event_name AS "Variant",
COUNT(case_id)
FROM THIS_PROCESS
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
Finally, we want to find the average cycle time per variant, which is explained in the Average Cycle Time recipe.
AVG is an aggregate function. When used with a GROUP BY clause, an aggregate function computes values
across all rows of each group and returns a separate value for each group. Since we're grouping by sequence of
event names, this calculates the average of each distinct variant.
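Putting these steps together, a query along the following lines would return the five most prevalent variants with their case counts and average cycle times. This is a sketch rather than a verified solution: it assumes that the cycle time expression from the tutorial can be used as a further column next to COUNT, and the aliases "Cases" and "Average Cycle Time" are illustrative.

```sql
-- One row per distinct sequence of event names (variant).
SELECT
  event_name AS "Variant",
  COUNT(case_id) AS "Cases",
  -- Average case duration within each variant group.
  AVG(
    (SELECT LAST(end_time))
    -
    (SELECT FIRST(end_time))
  ) AS "Average Cycle Time"
FROM THIS_PROCESS
GROUP BY 1
-- Most prevalent variants first; keep only the top 5.
ORDER BY 2 DESC
LIMIT 5
```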
Example Output
Rework within a case is identified by a number of repeated process steps exceeding a given threshold.
Show an arbitrary number of cases containing rework, treating the threshold value of rework as a parameter.
Goal
Solution
In each case, count the number of occurrences of each event. Specify the minimum number of event
occurrences to qualify them as rework and filter out cases whose value is insufficient.
To count occurrences of events, we can use the OCCURRENCE function. This function calculates a running total
of occurrences for a specified event-level attribute. For example, consider a case with the following list of event
names:
case_id event_name
Receive Payment
The OCCURRENCE function would return the following values for event_name: [1, 1, 2, 3, 1, 1, 2, 1]. These values
show that each event occurs between one and three times in that case.
Because our goal is to include only cases above the threshold value, let's use the OCCURRENCE function as part
of a WHERE clause. Let's choose 3 as the threshold value. Since we're operating at event level, the clause must
use an event-level subquery. However, we can't simply use the OCCURRENCE function like so:
That's because a subquery must return an aggregated value instead of a collection of values. Since our goal is
to identify cases meeting a minimum threshold value, let's take the maximum number of occurrences by using
the MAX function. Doing this determines if at least one event meets the threshold.
Finally, we can arbitrarily limit the number of cases returned. Let's take the first 10 cases from the process
data.
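Combining these steps gives a query of roughly the following shape. It is a sketch under stated assumptions: the selected columns are illustrative, OCCURRENCE is assumed to take the event-level attribute as its argument, and the comparison >= treats the threshold value of 3 as inclusive.

```sql
-- List up to 10 cases in which at least one event occurs 3 or more times.
SELECT case_id, event_name
FROM THIS_PROCESS
-- The event-level subquery reduces the running occurrence counts of a case
-- to a single value, its maximum, which is compared to the threshold.
WHERE (SELECT MAX(OCCURRENCE(event_name))) >= 3
LIMIT 10
```

To treat the threshold as a parameter, replace the literal 3 with the value you want to test.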
Compliance measures to what extent a process exhibits specific patterns and behaviors.
Show the compliance rate for only a subset of cases you're interested in. As a demonstration, we'll consider the
subset as sales orders using standard invoicing.
Goal
See what rate of cases receive payment when standard invoicing is applied.
Solution
Calculate a count of cases exhibiting a specific pattern of events, then divide this by the total number of cases.
In this instance, a case is counted if it uses standard invoicing and has received payment.
Discussion
To count cases, we can use the COUNT function. However, this counts all cases, so we need to exclude cases
whose events don't follow the event pattern we're interested in.
We can exclude cases from a count by applying the FILTER clause to the COUNT function. This clause has the
following syntax:
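As a sketch inferred from the FILTER examples in this reference (not a formal grammar), the clause is attached directly to an aggregate function. The names in angle brackets are placeholders, not SIGNAL keywords, and the concrete instance uses an attribute and value from the tutorial data purely for illustration.

```sql
-- General shape (placeholders in angle brackets):
--   <aggregate_function>(<expression>) FILTER (WHERE <condition>)

-- Concrete instance: count only the cases that belong to one city.
COUNT(case_id) FILTER (WHERE "City" = 'Boston')
```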
So, we need to express behavior as a condition. For this, SIGNAL provides matching expressions. These yield
true or false depending on whether event-level attributes follow a specific sequence. In this example, we're
interested in cases following the standard invoicing pattern: a customer order is received and payment follows
later, either directly or indirectly. In our example data, this information is captured in the event name. We can
express this as:
COUNT(1) FILTER (WHERE event_name MATCHES
('Receive Customer Order' ~> 'Receive Payment'))
Since we'd like the compliance rate, we divide the filtered number of cases by the total number of cases and
multiply by 100.
(
COUNT(1) FILTER (WHERE event_name MATCHES
('Receive Customer Order' ~> 'Receive Payment'))
/
COUNT(1)
) * 100
To complete the query, select this expression under an alias and add the FROM clause:
SELECT
(
COUNT(1) FILTER (WHERE event_name MATCHES
('Receive Customer Order' ~> 'Receive Payment'))
/
COUNT(1)
) * 100 AS "Standard Invoicing Compliance Rate"
FROM THIS_PROCESS
Learn how you can specify behavior patterns, using them to find matches or deviations in process data.
A behavior is an expression evaluating case-level or event-level attributes and can be used in a pattern matching
expression.
Find the number of cases whose behavior shows them to be incomplete. As a demonstration, we'll consider
sales invoices, which are considered incomplete if they remain unpaid and haven't been canceled.
Goal
Obtain a count of how many invoices remain unpaid and aren't in any other way resolved.
Solution
Find all cases where an order was received but no payment followed, ignoring canceled orders. Count the
resultant cases.
Discussion
In our example process data, receipt of an order is given the event name 'Receive Customer Order'. Using
pattern matching, we can identify all received orders by matching against this event name.
We need to refine the matching so that, from these cases, only those that are both unpaid and not canceled
are matched. In the case of unpaid orders, receipt events aren't followed by payment events, either directly or
indirectly.
Similarly, in the case of non-canceled orders, receipt events aren't followed by cancellation events, either
directly or indirectly.
For an order to be considered incomplete, the sequence of events in each case must match all three of these
conditions. Therefore, we connect them using the logical connector AND.
Finally, this matching expression can become part of a WHERE clause. Because we're counting matches, we
can simply count the number of rows that match that expression.
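Assembled into one query, the three conditions might look as follows. This is a sketch: the cancellation event name 'Order Canceled' is borrowed from the tutorial data and is an assumption here, so substitute the name used in your own process data. Wrapping each MATCHES expression in NOT ( ... ) is likewise an assumed way of negating it, and the alias is illustrative.

```sql
SELECT COUNT(1) AS "Incomplete Orders"
FROM THIS_PROCESS
-- Condition 1: an order was received.
WHERE event_name MATCHES ('Receive Customer Order')
  -- Condition 2: no payment follows the order, directly or indirectly.
  AND NOT (event_name MATCHES ('Receive Customer Order' ~> 'Receive Payment'))
  -- Condition 3: no cancellation follows the order (event name is an assumption).
  AND NOT (event_name MATCHES ('Receive Customer Order' ~> 'Order Canceled'))
```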
A time series arranges process data into chronologically ordered cases and events.
Filter events to find specific cases of interest and order them chronologically without leaving any gaps in the
time series.
Goal
Identify cases containing a specific pattern of events and put them into a time series, summarizing them at a
particular level of precision. To demonstrate this, we'll write a query that selects only sales orders paid for in
multiple installments and counts them on a per day basis.
Solution
Select the timestamps of the cases in the process data, filtering for cases whose event patterns contain more
than one payment event. Summarize them per day and fill any gaps in the time series, so that days without
such cases are also included in the result set.
First, select the timestamp. Take the date of the case's first event, in other words when the order was created.
Since we're selecting an event, use an event-level subquery.
(SELECT FIRST(end_time))
The timestamps are to be summarized per day, so truncate the timestamp to that level of precision.
DATE_TRUNC(
'day',
(SELECT FIRST(end_time))
) AS "Date"
To count cases per day, use the COUNT function to count case_id in the result set. (Recall that when the
SELECT clause includes both aggregate and non-aggregate expressions, the result is automatically grouped by
the non-aggregate expressions, in this case the timestamp.)
SELECT
DATE_TRUNC(
'day',
(SELECT FIRST(end_time))
) AS "Date",
COUNT(case_id) AS "# Cases"
FROM THIS_PROCESS
ORDER BY 1
At this stage, the query counts all cases and returns the following result on the example data:
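To include only the cases where multiple payments were made, filter using a matching expression. Include only
cases where a payment event was followed at some point by another payment event.
SELECT
DATE_TRUNC('day', (SELECT FIRST(end_time))) AS "Date",
COUNT(case_id) AS "# Cases"
FROM THIS_PROCESS
WHERE event_name MATCHES ('Receive Payment' ~> 'Receive Payment')
ORDER BY 1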
Now that the query filters out certain cases, executing it on the example data leaves gaps in the time series:
To remove the gaps, use the FILL clause to insert rows where dates are missing. The query selects two
columns, so the FILL clause must provide one specification for each:
1. TIMESERIES('day'): The first column of an inserted row is filled with the day missing at that point in the
time series.
2. NULL: The second column of an inserted row is filled with NULL.
SELECT
DATE_TRUNC(
'day',
(SELECT FIRST(end_time))
) AS "Date",
COUNT(case_id) AS "# Cases"
FROM THIS_PROCESS
WHERE event_name MATCHES ('Receive Payment' ~> 'Receive Payment')
ORDER BY 1
FILL TIMESERIES ('day'), NULL
Goal
See a month-by-month overview of how many cases were active in the respective time period. Also, we'd like
running totals of the number of started and finished cases.
Solution
1. On a monthly basis, find which cases were opened and which were closed.
2. Count the found cases, filling any gaps in the time series.
3. Calculate from these figures how many cases were active.
Discussion
We're interested in a month-by-month overview, so let's truncate the precision of the dates to the level of
month.
Next, we label each selected row with a number. This label, started, indicates if the row records an order
being opened (1) or closed (0).
Finally, join these two queries using the UNION ALL clause so that they return one combined result set.
This query serves as a subquery, providing a data source for an outer query that we construct in the following
section.
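Based on the innermost part of the complete query at the end of this recipe, the combined subquery can be sketched as follows. The second branch, including the label 0 AS started, is inferred from the labeling described above rather than copied from the source.

```sql
-- One row per case for the month in which it was opened (first event) ...
SELECT DATE_TRUNC('month', (SELECT FIRST(end_time))) AS "Month",
       1 AS started
FROM THIS_PROCESS
UNION ALL
-- ... and one row per case for the month in which it was closed (last event).
SELECT DATE_TRUNC('month', (SELECT LAST(end_time))) AS "Month",
       0 AS started
FROM THIS_PROCESS
```

Each case therefore contributes two rows, which is what allows the outer query to count started and finished cases separately per month.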
SELECT "Month"
Next, the labels we created become useful. We use them to count how many cases were opened (started =
1) and how many were closed (started = 0) by filtering on those labels before counting.
SELECT "Month",
COUNT(1) FILTER (WHERE started = 1) AS count_started,
COUNT(1) FILTER (WHERE started = 0) AS count_finished
FROM (
-- The subquery from the previous section goes here.
) AS sub1
GROUP BY 1
If a time period happens to contain no rows, it appears as a gap in the time series. It's therefore a good idea
to use a FILL clause, which inserts missing periods into any gaps. In this case, we're querying at the level of
months, so we'll use a specification of FILL TIMESERIES('month') to ensure the regularity of the time series.
SELECT "Month",
COUNT(1) FILTER (WHERE started = 1) AS count_started,
COUNT(1) FILTER (WHERE started = 0) AS count_finished
FROM (
-- The subquery from the previous section goes here.
) AS sub1
GROUP BY 1
FILL TIMESERIES('month')
At this point, running the query produces an output something like this:
Once again, this query will become a subquery, acting as a data source for an outer query that we'll construct in
the following section.
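To turn the monthly counts into running totals, apply SUM as a window function ordered by month.
SELECT "Month",
count_started AS "Started, Current & Previous Months",
count_finished AS "Completed, Current & Previous Months"
FROM (
SELECT "Month",
SUM(count_started) OVER (ORDER BY "Month") AS count_started,
SUM(count_finished) OVER (ORDER BY "Month") AS count_finished
FROM (
-- Subquery from the previous section goes here.
) AS sub2
) AS sub3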
The third column should display the number of active cases for a particular month in time. For any given
month, that number is calculated by taking the number of cases started up to that month and subtracting the
cases finished up to the previous month. Whenever you need to consider the values from the row immediately
before, you can use the LAG function.
Be mindful that if no previous row exists, LAG returns NULL. So, we use the COALESCE function to ensure that a
numeric value is returned in such a case.
We can add this to our SELECT clause as the "Active Cases" column.
SELECT "Month",
count_started AS "Started, Current & Previous Months",
count_finished AS "Completed, Current & Previous Months",
count_started - COALESCE(LAG(count_finished) OVER (ORDER BY "Month"), 0) AS "Active Cases"
FROM (
-- Running-totals subquery (SUM ... OVER applied to sub2) goes here.
) AS sub3
The complete query is as follows:
SELECT "Month",
  count_started AS "Started, Current & Previous Months",
  count_finished AS "Completed, Current & Previous Months",
  count_started - COALESCE(LAG(count_finished) OVER (ORDER BY "Month"), 0) AS "Active Cases"
FROM (
  SELECT "Month",
    SUM(count_started) OVER (ORDER BY "Month") AS count_started,
    SUM(count_finished) OVER (ORDER BY "Month") AS count_finished
  FROM (
    SELECT "Month",
      COUNT(1) FILTER (WHERE started = 1) AS count_started,
      COUNT(1) FILTER (WHERE started = 0) AS count_finished
    FROM (
      SELECT DATE_TRUNC('month', (SELECT FIRST(end_time))) AS "Month",
        1 AS started
      FROM THIS_PROCESS
      UNION ALL
      SELECT DATE_TRUNC('month', (SELECT LAST(end_time))) AS "Month",
        0 AS started
      FROM THIS_PROCESS
    ) AS sub1
    GROUP BY 1
    FILL TIMESERIES('month')
  ) AS sub2
) AS sub3
Example Output