
Vladimír Löffler

Automatic data processing using the KNIME Analytics Platform
1st Edition
Published by
© VLADIMÍR LÖFFLER, Ostrov, 2021
Graphic design of the cover
© PETRA LÖFFLEROVÁ, Ostrov, 2021
Translation: PRESTO – PŘEKLADATELSKÉ CENTRUM s.r.o., 2021
Table of contents
1. Introduction
2. Getting started
Training files
Automation
3. KNIME – variables – introduction
Types of variables
KNIME – global variables
KNIME – variables defined in the Flow Variables tree
Input node variables
Output node variables
KNIME – variables defined in special nodes
Example 1 - variables formed based on a customization table
KNIME – creating variables using the “input” nodes
Example 1 - file selection for the CSV Reader
Example 2 - creating variables of the String type
Example 3 - creating variables of the Integer type
Combining Input fields within a single component
Data input using the List Box Configuration node
KNIME – creating variables using Widgets
4. KNIME – executing workflows in a loop – cycles
KNIME – loop over the table rows
Example – combining sheets of an MS Excel workbook
KNIME – loop according to the value groups
Example – automatic division of a large table according to individual groups
KNIME – loop over an interval
Example – creating a graph of the sin(x) and cos(x) functions
Practical use of loops of the Interval Loop type
KNIME – loop over the table columns
Example – mass data transformation of selected columns
Alternative workflow
KNIME – other loop types
Chunk Loop – loop executed “chunk-by-chunk”
Counting Loop – loop with a given number of repetitions
Generic Loop – loop with a condition at the end
Recursive Loop

5. KNIME – conditional workflow branching


KNIME – branching of the IF type
Workflow nodes – IF Switch – End IF
Example – combining of daily and weekly KPI reports
KNIME – branching of the CASE type
Example – calculation of the total weekly needs of workers
Workflow completion
“Tuning” workflows
6. KNIME – workflow calling automation
Assisted automation – semi-automatic workflow
Launching semi-automatic workflows
Autonomous automation – automatic workflows
Launching automatic workflows

7. KNIME – workflow automation and clarity


Workflow clarity
Node descriptions
Annotations
Metanodes
Components
Meta information

8. KNIME – workflow automation and speed


Speed
Configuration of the given environment
Cache node
Parallel execution
Other workflow performance optimization options
9. KNIME – automation and error resilience
Workflow resilience
Empty table treatment
Catching errors

10. KNIME – other tips and tricks


Using the Knime examples
Data visualization
Power BI
Databases
Connectors to SAP
Machine learning
Python and R
R installation
Python installation
Knime Server
11. KNIME – links to other sources
Links
Knime – official
Knime – community
Training
Social networks
Public datasets worth playing with
The latest versions of the training workflows on the Knime Hub
12. About the author
13. Acknowledgment
1. Introduction
We are at the beginning of the 2020s. A new industrial revolution is already
fully underway, and the pace at which ground-breaking technologies are
introduced will only accelerate in the coming years. Adapting quickly to the
ongoing changes in the world is a key success factor for companies as well
as individuals. The ability to efficiently gather and assess the omnipresent
data is ceasing to be a competitive advantage; it is becoming a necessity for
operating and surviving in today's turbulent markets. This fact dramatically
increases the demands on the labor force and its “data literacy”.
The aspect of expertise and the productive use of human potential
The number of fields where the ability to process data is automatically
expected keeps growing. Adverts like “Looking for a warehouseman -
knowledge of MS Excel necessary” are already a common part of the labor
market. There are even fields and jobs where data processing outweighs the
actual technical work. Planners, accountants, human resource personnel,
controlling staff, suppliers as well as other experts receive their wages
partly for work conducted in their respective fields of expertise and
partly for their “other expertise” - the repeated downloading, transforming,
storing, combining and splitting of data. A significant part of these data
operations is not creative. On the contrary, the work is rather monotonous,
repeating day after day, week after week, month after month...
We believe that there is no reason for monotonous work, including
sophisticated monotonous work with data, in the 21st century. We believe that
the value of employees lies in their ability to conduct creative activities and
that experts should be able to focus on their main areas of expertise. We
believe that working hours should be utilized for the corresponding
professional work, not for “agonizing” over tables. However, the repeated
creation of tables and reports often forms a significant part of the job content
of many experts.
Why should the value of an accountant, quality controller, production
planner, logistic specialist or manager depend on his/her ability to create
tables?
Time aspect
Nevertheless, there are tables, reports and analyses that are very useful. The
information in them can be used to reduce costs, increase revenues or help
the management react to everyday situations in a more flexible and
efficient way. However, they are often available only periodically (weekly,
monthly, annually) because it is difficult to create them every day, let alone
on demand. It is not at all unusual for the preparation of a monthly report,
in which the data are downloaded, combined, cleaned and transformed from
several sources, to take several days.
We believe that the main added value of company reports lies in the fact
that they reflect the current condition of the company as closely as possible.
That allows for flexible responses to changes in the world and thus for
efficient company management. We believe that the necessary data should be
available in an understandable format essentially on demand (and not
only once a week or once a month). We also believe that data processing
should be automated as much as possible, thus eliminating errors and delays
and allowing the experts to focus on their field of expertise. It also
assists managers in their effort to manage their companies well.
The Knime Analytics Platform is a very effective tool for the automation
of data processing (from individual use all the way to multinational corporations).
Knime Analytics Platform
Knime Analytics Platform is an application originally created at the
University of Konstanz. It has been one of the world’s best platforms for data
processing and analyses, machine learning and data science for many years
now.
Knime is an open-source software product. No commercial license is
necessary to use it (i.e., Knime is available free of charge). A commercial
license also exists for more extensive use of Knime: it allows collaboration
within a team, and it includes a web portal for publishing automatic reports
(similar to, for example, Microsoft Power BI) as well as the Knime Server
for the advanced control of automatic workflows.
(for more information see the Knime Open Source Story:
https://www.knime.com/knime-open-source-story)
How does Knime work with data?
Knime uses a graphical interface for creating workflows. Individual workflow
steps, known as nodes, execute the intended data operations. (If you know the
MS Excel environment, you can imagine a Knime workflow as a well-arranged
graphical macro, or as a transformation process in MS Excel Power
Query.) The configuration of the nodes is intuitive and does not require
any programming knowledge. Working with the nodes is interactive and
user-friendly.
Example of a simple Knime workflow

Workflows can be created for individual analytic tasks or for repeated
operations - for assisted or autonomous automation. Individual workflow
steps can be launched gradually (with continuous monitoring of the step
results), or as a whole - by a single click. Workflows can also be called
externally (from another program) or planned using a scheduler with an
automatic start at a selected time.

Analytic path
Knime workflows permit the creation and saving of complete analytic
procedures, which turn the original raw data into organized data processed
exactly according to our needs. This can be very convenient if we execute
an analysis only rarely or if the input information always differs only
slightly. In that case, the visualized analytic path helps us get oriented
quickly and complete our analysis effectively, even though such an analytic
workflow may not be suitable for automation.
What is included in the publication
The publication primarily explores the automation of data processing - by
semi-automatic and autonomous data workflows. You will gradually become
acquainted with the basic building blocks of automation - variables, loops
and branching. We will also show you how to optimize your data workflows,
protect them against errors and launch them automatically.
The individual chapters address theory as well as practice. We have
endeavored to apply a balanced approach. That is why each explored topic is
complemented by an example, which can be practically tested.
Theory

Topic explanation
Usage examples with a detailed analysis of individual operations
Notes and observations from the practice
References to other examples and information sources

Practice
We believe that hands-on practice is the most beneficial way for students to
learn. That is why we have prepared a shared folder from which you can
download a package that includes all the explored workflows (20 fully
functional Knime workflows), including the data used.
Note:
None of the provided examples require the paid version of Knime (i.e., they
do not require the Knime Server).
Used symbols (apply to the printed and PDF versions; the ePub and Mobi
formats contain only text)

Knime and operating systems


The examples presented in the individual chapters were mostly prepared
in Knime 4.2 for MS Windows 10. Knime can also be used in, and is
supported by, the Linux and macOS operating systems.
We tested Knime in Ubuntu 18.04 LTS (Linux) and in macOS Big Sur,
though we went back to the version for MS Windows 10. The reason is that
Knime is stable in the MS Windows 10 environment, which we cannot
honestly say about the Linux and macOS versions.
2. Getting started
The following chapters gradually explore and explain the basic building
blocks of the automatic data workflows, i.e.:
Global and flow variables
Active elements for creating variables (user dialogues)
Various loop types (cycles)
Conditional branching (possibility to choose an alternative data
processing method based on the given situation)
Protection of workflow against errors
Workflow performance
The actual automation (the option to launch workflows
automatically or semi-automatically)

We assume that you are already familiar with the basics of working with the
Knime Analytics Platform. If you are not, we recommend getting familiar
with the basics of working in the Knime environment first.
The following is a link to the updated introduction to the work with Knime:
https://www.knime.com/getting-started-guide
You can also find links to other study materials in the last chapter of the
book.
If you do not have Knime installed on your computer yet, you can use some
of the automatic installation programs (for MS Windows, Linux and macOS)
that can be found on the Knime website.
Link: https://www.knime.com/downloads/download-knime

Training files
The folder with all the explored workflows and the corresponding data files
can be downloaded here (password for the download is Knime2020):
Knime workflow
Download archive – Knime_Automation_book
Data files
Download archive – Knime_Advanced_Data

Alternative link (password for the download is Knime2020):


https://strojove-uceni.eu/downloads/
Upon the download, you should have the following files available:

Where to put the data


Extract the downloaded archive KNIME_Advanced_Data. Save the folder
with the data to C:\ (if your hard drive has a different name or if you prefer a
different folder for the data, adjust the input path or the data storage location
in the corresponding workflow nodes).
Folder with the data

Note: a special Archive folder is always located in the output folder and its
subfolders. It contains the prepared result files (so that you can compare
your outputs with ours).
Import Knime workflow
Save the Knime_Automation_book archive with the training workflows to your
Knime workspace using the “Import KNIME Workflow…” option. Upon
completing the import, the workflows should be available in the tree of the
local workflows.
Import KNIME Workflow

Note: to launch some of the workflows, the installation of appropriate
extension packages is needed. As a part of our presentation, we include
individual explanatory steps for their installation. All the workflows were
created in Knime 4.1 and 4.2 and tested in version 4.3. Knime 4.3
introduced new nodes for reading data (for example, from MS Excel). That
is why some nodes in Knime 4.3 may invite you to use the new version
(the old node is marked as Deprecated in its name). Knime maintains backward
compatibility, which means that the functionality of workflows created in
previous versions will always be preserved. The latest versions of the
workflows are also available on the Knime Hub
(https://hub.knime.com/search?q=lofflerwf).

Automation
What do we mean when we say automation?
We define the automation level in the Knime context in the following
manner:
Automation examples
Knime can be used for automating almost any data task. Just to get an idea,
we present here some typical examples. Particular solutions of the selected
tasks are then described in detail as a part of our explanations.
3. KNIME – variables – introduction
In the Knime environment, we can work with variables – known as Flow
variables. Variables allow us to execute more complex operations within our
workflows. We can continuously save useful values into variables and then
use them in the workflow nodes whenever needed.
A variable is defined by its name, data type and value. For example, the city
variable can be of the String data type (i.e., a text string) and its value can be
Tokyo. Or, the age variable can be of the Integer data type and its value can
be 62.
Table of the basic variable types:
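The basic types include String (a text string, e.g., Tokyo), Integer (a whole
number, e.g., 62) and Double (a decimal number, e.g., 3.14); newer Knime
versions also add further types, such as Boolean, Long and list variables.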

When do we use the Flow variables


Flow variables can be used either for more complex, one-time workflows (for
example, workflows that require branching based on certain rules, or
workflows that include cyclically executed operations), or for dynamic
workflows that are expected to be used repeatedly.
Using Flow variables, you can create semi-automatic workflows (for assisted
automation scenarios) and even workflows that will be fully autonomous
(launched without human intervention).
Static workflows, in which no parameters change and in which it is not
necessary to react to the results arising from individual nodes, do not need
flow variables.

Types of variables
We use two types of variables:

Variables created for the entire workflow – global variables


Variables created in the workflow nodes

Global variables are defined prior to launching the corresponding workflow.
They are visible to all its nodes, including the node that is called first.
Variables created in the workflow nodes are visible only to those nodes that
are subsequently connected to the node in which the variable was created.
Variables can be formed in the workflow nodes in several ways:
Within the frame of a node, defined in the Flow Variables tree
Within the frame of the node configuration, using the button for entering a variable (only some nodes)
Using the Input and Widget nodes – we define a variable whose value is assigned using a user dialogue
Using special nodes for variables – nodes designated for creating variables, either from given table values or pursuant to certain rules
Using Java nodes
KNIME – global variables
Global variables can be created for the entire KNIME workflow before its
launch. Variables created in this manner can be used within the entire
workflow.
We can test the variable in a simple workflow.
Workflow: 001_Variables - global 1

You can define global variables using the “Workflow Variables…” option
(accessible by right-clicking the name of the workflow).
You can add a variable using the Add button. We define the variable name,
type and initial value.
The result looks like this.

We will filter the rows by the content of the City column. We set the v_city
variable (for the “use pattern matching” option), which is available to the
given node, and launch the node.
The filter defined by the content of the v_city variable is functional.
We can manually control the behavior of our workflows using globally
defined variables. For example, we have data that contain all sales for all
countries, but we are only interested in particular countries or a particular
group of products. The corresponding global variable defined for the entire
workflow is then a good helper.
On the other hand, global variables are not convenient for one-time or static
workflows. It is hard to find any advantages of their use for such workflows.
KNIME – variables defined in the Flow Variables tree
Yet another type of Flow variable is a variable that is created as a part of the
node configuration.

Node configurations almost always include the Flow variables tab. The tree
of variables includes fields that are accessible within the frame of a particular
node. Existing variables, which then influence the behavior of the given
node, can be selected in the grey fields (input variables). Variable names
(new or existing) can be entered in the white fields. They will then contain
values from the output of the given node (output variables, which
subsequently form input variables for subsequent nodes).
Grey and white fields for entering variables
Input node variables
Workflow: 001_Variables - global 1
In order to explain the “grey fields” for the Flow Variables, we create a
global workflow variable called v_article, into which we save the string
Article (the name of the table column whose values we will want to
convert using the Number To String node).

In this simple workflow, we upload a file that contains the sales data, with
material numbers that Knime read as the Integer type. However, we
want to convert the material numbers into the String type.
In the Number To String node, we select the variable v_article for the
“included_names” parameter.
We do not manually select anything on the Options tab (the Include area will
be empty) since the selection control was taken over by the v_article
variable.
Conversion result.
We can see the current variable values on the Flow Variables tab.

Output node variables


We use output node variables if we need, for any reason whatsoever, to
preserve the selections we have made within the given node.
To demonstrate this point, we once again use the Number To String node
within the same workflow as in the previous case. This time, however, we
select the Article column manually and enter the names of the new variables
in the “white fields”.
Upon launching the node, the result is identical to the previous case.
We can see the newly formed variables on the Flow Variables tab. We can
use them arbitrarily within the linked workflow nodes.
This way we can, for example, see the newly formed variables in the linked
Rule Engine node. (Variables of the String type (v_f_type, v_city and
v_article) are accessible here. The variables v_inc_names and v_exc_names
are of the list type (list, collection); they would have to be modified prior to
being used in the Rule Engine node. Nevertheless, they can still be used
within the given workflow.)
KNIME – variables defined in special nodes
Knime includes several nodes, the result of which is a single new variable or
multiple new variables.

The principle of such nodes is very similar. Variables are formed based on
the given table values or as a result of a certain rule.
The possibility to create variables in this manner is fundamental for
automatic workflows, where we need to control the workflow
behavior using parameters that can change over time.
Example 1 - variables formed based on a customization table

We create our customization table with the workflow parameters using the
Table Creator node. This is a simplification; in real life, you would use a
CSV definition file, an MS Excel table, a table from an SQL database, etc.

We created the Path, City, Chunk and Filter columns, and we entered the
corresponding values in the row.
In technical documentation, it is certainly recommended to describe the
significance of the individual variables (we recommend doing so and
keeping such descriptions for more extensive workflows).

We connect the Table Row To Variable node.


The node result is represented by a list of variables - the variable name was
obtained from the column name and the value from the first row of the table.

We can then use the variables arbitrarily, for example, when uploading a file,
in the nodes for filtering and converting individual values, etc.
Workflow example
Workflow: 002_Variables - variable nodes 1

In the CSV Reader node, we initially activated the variable ports using the
“Show Flow Variable Ports” option.
Next, we selected suitable variables.
The Path variable for a dynamic file upload.
The Chunk variable for limiting the number of the uploaded rows.
We selected the City variable in the Row Filter node in a similar manner.
In the Number To String node, we used the Filter variable.

Result with the original parameters.


We can change the parameters and restart the workflow at any time.

If we want to have the parameters saved in a customization file, we can
conveniently use the Table Column To Variable node.
Customization file
File: C:\KNIME_Advanced\Input\Customizing\Customizing_variables_1.txt

File upload – with the read row IDs and read column headers parameters.
The workflow can then look, for example, like this.
Workflow: 003_Variables - variable nodes 2
The Table Column to Variable node creates variables according to the
selected columns in our customization table. The values are saved in the
Value column.

The node result is represented by new Flow variables.


Final table upon executing the entire workflow.

Flow Variables in the final table.


Note:
The values adopted from the customization table were of the String type.
However, the CSV Reader node requires a variable of the Integer type for the
number of rows to read. That is why we used the String Manipulation
(Variable) node.

This node allows for manipulations with strings, including data type
conversions. Our conversion for the Chunk variable is from the String type to
the Integer type. We selected the toInt(x) function for the Chunk variable and
checked the Replace Variable option.
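Assuming the variable is named Chunk, as above, the conversion expression
in the String Manipulation (Variable) node would look something like this:

    toInt($${SChunk}$$)

The $${S...}$$ syntax references a String flow variable; with Replace
Variable checked, the node overwrites Chunk with the Integer result.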
KNIME – creating variables using the “input” nodes
Another method for creating variables for your Knime workflows is the use
of the Input nodes.

Input nodes allow for the creation of variables using a user dialogue.
Example 1 - file selection for the CSV Reader
Node configuration

1. We select the Local File Browser Configuration node


2. We configure the node, for example, in the following manner

3. We create the Component

Wrapping in the Component node is necessary in order for the input node to
behave as an interactive input field - with a user dialogue.
4. We complete the final configuration steps of the finished
Component node
5. We name the Flow variable

We do the configuration in the component dialogue - we select “Configure”
from the context menu of the node.
We go to the Flow Variables tab and type the variable name into the path port
(here CSV_in), into which the path to the selected file will be saved.

6. We connect and configure the CSV Reader node

Do not forget to activate the variable ports.


We connect the CSV_in variable in the CSV Reader node.
This was the last step of the configuration. Now we can use the dialogue
window for selecting the file name. You can arbitrarily use the CSV_in Flow
variable within the entire workflow.
Other types of the Input nodes can be configured similarly.
Example 2 - creating variables of the String type
We use the String Configuration node. The other configurations are already
identical.

We configure the node, for example, in the following manner.


Once again, we wrap it as the Component and assign a Flow variable.
We use the variable arbitrarily, for example, as a parameter for the Row
Filter node.
Example 3 - creating variables of the Integer type
We use the Integer Configuration node. The other configurations are
identical again.
We configure the node, for example, in the following manner.
We wrap it in the component and assign a Flow variable.

We then arbitrarily use the Component in the nodes, for which it makes sense
to use our Flow variable, in our case in the Row Filter node again.
The workflow that uses the Input nodes can then look, for example, like this.
Workflow: 004_Active_elements - Input 1
File: C:\KNIME_Advanced\Input\Sales_full.csv
Note – using variables of the Input nodes
If we want to use an Input node variable, we need to set up the connection of
the given variable ports in the output node of the component and include the
selected variable in the Include part of the filter. It is then not necessary to
enter the Flow variable when first calling our Component nodes.
Combining Input fields within a single component
Several Input nodes can be combined within a single component. Single use
of the given component will then allow for entering several variables
simultaneously.
The node prepared in this manner can look, for example, like this.
The use of the Workflow can then look, for example, like this:
Workflow: 005_Active_elements - Input 2
File: C:\KNIME_Advanced\Input\Sales_full.csv

Data input using the List Box Configuration node


This node allows for entering multiple values, which you can then use, for
example, when filtering, etc.

We wrap the node in the component again and select an output variable.
However, the node returns only one row, which consists of all entered values.

In order to be able to practically use the values entered in this manner, it is
convenient to separate them, for example, into rows or columns of the table.
This can be done, for example, in the following way.
Workflow: 006_Active_elements - Input 3
File: C:\KNIME_Advanced\Input\Sales_full.csv
The Variable to Table Row node transfers the variable from the List Box
Configuration node to the given table row.
The Cell Splitter node converts the content of the cell into a list of values
separated by a comma (Output, As list selection).
We can then divide the list of the values into individual rows using the
Ungroup node.

Tables prepared in this manner can already be used for filtering, for example.
Filtering by the table values can be executed using the Reference Row Filter
node.

And this is our result.


KNIME – creating variables using Widgets
The use of Widgets in our workflows is technically very similar to the work
with the Input nodes.
One significant difference is that Widgets allow formatting using CSS. They
also offer a graphic interface – an interactive view. The benefits become
apparent when working with the Web portal, which forms a part of the paid
version (Knime Server).
When working in a local environment, we recommend using the Input nodes
described in the previous chapter (because some Widgets directly require the
Knime Server).
The Widgets in the Node Repository tree are located within the Widgets
branch.
We do not start individual Widgets by double-clicking on them; instead, we
use the “Execute and Open Views” option.

Similar to the Input nodes, Widgets can be wrapped in a Component, upon
which they can be called within a single common interactive view.
To demonstrate how the Widgets work, we will modify the
005_Active_elements - Input 2 workflow so that it uses Widgets instead of
the Input nodes.
Workflow: 007_Active_elements - Widgets 1
We launch the Component using the “Interactive View: Widget multi view”.
We can enter values of the variables in the open interactive view.
Detail of the “Widget multi view” Component.

The links given below include more detailed information about the Widgets
and their use.
All information about the Widgets, including workflow examples, can be
found on NodePit:
https://nodepit.com/category/flowabstraction/widgets
Using Widgets in the environment of the Web Portal and Knime Server:
https://www.knime.com/knime-software/knime-webportal
Formatting Widgets using CSS:
https://docs.knime.com/2019-06/analytics_platform_css_guide/index.html
4. KNIME – executing workflows in a loop – cycles
When we need a part of our workflow to be executed in a loop or cycle, we
use nodes of the Loop type.
A Loop must have a starting node (Loop Start) and an ending node (End
Loop). Any sequence of the nodes (loop body, in our case Action) is then
repeated within the frame of the loop until the loop ends.

A loop is ended when:
the predefined number of repetitions is completed, or
a given condition is fulfilled.

Knime includes several Loop Start and Loop End nodes.


Loop Start nodes
Loop End nodes

We will become familiar with the most commonly used loop types in detail.
KNIME – loop over the table rows
We start the overview of the loops with a very practical loop type - Table
Row To Variable.
The Table Row To Variable Loop Start node gradually passes through the
rows of a table and converts the content of each row into variables. We can
then work with the variables inside the loop body. A loop body can be any
workflow. We close the loop using the Loop End node, which starts the
subsequent loop iterations and, at the same time, gathers the results of the
individual iterations into a table.

The loop can be started as a whole (individual iterations are then executed
in sequence) or step-by-step (you can then execute and monitor individual
loop iterations).
Options of the Loop End node
Example – combining sheets of an MS Excel workbook
Situation: We have a file in MS Excel. It contains data related to bad quality
expenses for cost accounting (reworking, warranty claims and scrapping) for
the individual calendar weeks. The data are arranged in a single file, though
they are divided into individual sheets according to the calendar weeks
(CW01, CW02…). A new sheet with the data for the previous calendar week
is added to the table each week. We need to create a universal workflow that
will combine the data from all the sheets in a single table.

We can prepare, for example, the following workflow:


Workflow: 008_Loops – 1
File: C:\KNIME_Advanced\Input\Cost\Cost_report_2020_CW10.xlsx
We first upload the names of all the sheets from our MS Excel file into the
table using the Read Excel Sheet Names (XLS) node.
Node configuration

Final node table


We thus have a table that contains the current names of all the sheets of our
MS Excel file. We need to supply the sheet names to the Excel Reader (XLS)
node in a loop, so that we can gradually upload and combine all the sheets.
We therefore need to store the sheet name in a variable that changes during
the loop, gradually acquiring the values CW01, CW02…CWxx.
The Table Row To Variable Loop Start node makes it possible to transfer the
values of the table rows into a variable (in the loop).
Node configuration (we used the preset values)
Flow Variables after the first node execution. Sheet names in the Sheet
variable.

We then use the Sheet variable when configuring the Excel Reader (XLS)
node.
Note: notice the connection of the Excel Reader (XLS) node via the variable
port

In the loop body, we convert the values in the Document ID and Cost Center
columns into the String type and we add the CW column, which will contain
the name of the current sheet (it represents the calendar week here; thus, we
do not lose track of which calendar week the data came from).
Configuration of the Number To String node

Configuration of the Constant Value Column node


We finish the loop using the Loop End node. This node ensures the repetition
of the loop body until the last row of the table with the sheet names has been
processed.

Configuration of the End Loop node (the proposed configuration is used)


Result of the Loop End node (rows are added to the final table upon each
iteration)

We can save the final (combined) table into a new MS Excel file.
Other practical examples of the use of the Table Row To Variable Loop can
be found under the following link:
https://nodepit.com/node/org.knime.base.node.flowvariable.variableloophead.LoopStartVa
KNIME – loop according to the value groups
The Group Loop Start node gradually groups the rows of the input table
according to the selected columns and passes through these groups one by
one. Arbitrary workflow steps can then be executed in the loop body with the
groups created in this manner. Again, we close the loop using the Loop End
node, which starts the subsequent loop iterations and, at the same time,
gathers the results of the individual iterations into a table.

Similar to the first example, the Group Loop can be started as a whole
(individual iterations are then executed in sequence) or step-by-step (we can
then execute and monitor individual loop iterations).
Options of the Loop End node

Execute – executes all loop iterations
Step Loop Execution – executes one loop iteration
Reset – deletes the results of all iterations; the loop can then be launched again
Pause Loop Execution – pauses the loop execution at the current iteration
Resume Loop Execution – resumes the loop execution

Example – automatic division of a large table according to individual groups
Situation: We have a file that contains document data related to bad quality
cost accounting. The documents are posted to individual cost centers. We
need to create tables for the cost center managers that will include only
those documents relevant to their respective centers. In other words, we
need to split the common table according to the cost centers and save the
groups created in this manner into several, suitably named new files.
Note: our input will be the table that was the result of the example of a loop
over the table rows.

We can prepare, for example, the following workflow:


Workflow: 009_Loops – 2
File: C:/KNIME_Advanced/Input/Cost/Cost_report_2020_CW10_All.xlsx
First of all, we upload the input file in the usual manner, using the Excel
Reader (XLS) node.
Variables after the node execution
We start the loop itself by the Group Loop Start node.

In the Group Loop Start node, we set the column based on which the groups
of values will gradually be created. In our case, these values will come from
the Cost Center column.
Node result (in our case the result of the last iteration)
Flow variables after the node execution

The String Manipulation (Variable) nodes in the loop body prepare a
suitable file name and path, which we then use for saving the final files.
First of all, we replace the CW10_All string in the Path variable with the
value of the Cost Center variable – in our case 100199.
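A sketch of the expression in the first String Manipulation (Variable) node,
assuming the flow variables are named Path and Cost Center as shown above:

    replace($${SPath}$$, "CW10_All", $${SCost Center}$$)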

Node result – modified variable Path


In the second String Manipulation (Variable) node, we complete the final
form of the path for saving the combined data files – we replace Input with
Output.
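The expression in the second node could then look like this:

    replace($${SPath}$$, "Input", "Output")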

Node result
We will save the final files using the Excel Writer (XLS) node; the path and
the file name will be dynamically controlled by the prepared variable Path.

Final files in the configured target folder


Example of the final file Cost_report_2020_100007

Additional examples of the practical use of the Group Loop can be found
here:
https://nodepit.com/node/org.knime.base.node.meta.looper.group.GroupLoopStartNodeFa
KNIME – loop over an interval
The Interval Loop Start node starts a loop that gradually increases the value
of a variable over the numeric interval we define. In the node, we determine
the start and end values of the interval and also the step, i.e., the value by
which the variable will increase in each iteration.

If we set an interval from 0 to 10 with step 2 for variable X, the variable will
gradually acquire the values 0, 2, 4, 6, 8 and 10.

Example – creating a graph of the sin(x) and cos(x) functions


Situation: we need to display graphs of the given functions for the values
from the numerical interval we defined. (The example is rather academic,
though it is a good demonstration of the principle of the Interval Loop.)
We can prepare, for example, the following workflow:
Workflow: 010_Loops – 3
In the Table Creator node, we create columns for the future values of
variable X, and also columns for the values of functions sin(x) and cos(x).

In the Interval Loop Start node, we set the lower limit, the upper limit and
the interval step. We set the prefix for the variable names to loop_X. We
subsequently use these variables in other loop nodes (X will thus be
represented by the loop_Xvalue variable).
Flow variable after the second iteration.

In the next section of our workflow, we create the necessary data. We will
then repeat this section until we reach the upper limit of the loop (i.e., until
the value of the flow variable loop_Xvalue equals the value of the flow
variable loop_Xto = 6.28).
We use the Constant Value Column node for creating a new row of the table.
The new row will contain a value in column X corresponding to the value of
the loop_Xvalue variable. During the loop, the values in this row will be
overwritten by new values, and the final table (in the Loop End node) will
contain all the rows generated by the individual loop iterations.

Assigning flow variables in the Constant Value Column node.


Final node table after the third iteration

This is followed by two Math Formula nodes for calculating the values of the
sin(x) and cos(x) functions and recording them in the table.
The value of the sin(loop_Xvalue) function will be recorded in the sin(x)
column.
Similarly, the value of the cos(loop_Xvalue) function will be recorded in the
cos(x) column.
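Assuming the loop_X variable prefix set above, the expressions in the two
Math Formula nodes could look like this (each node writing its respective
column):

    sin($${Dloop_Xvalue}$$)
    cos($${Dloop_Xvalue}$$)

The $${D...}$$ syntax references a Double flow variable in the Math
Formula node.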
Final table of the second Math Formula (cos(x)) node after the seventh
iteration.
We configure the End Loop node, for example, in the following manner.

Once the entire loop is executed, the node result will contain 629 rows – the
values of variable X and the corresponding values of the sin(x) and cos(x)
functions. We can then use such a table for drawing the function graphs,
which we do using the Line Plot node.

Configuration of the Line Plot node - selection of the variables


The Line Plot node offers several options for modifying the final form of the
graph. We will only use a simple configuration of the displayed graph name –
Chart title: sin(x) and cos(x).

Once the node is executed, we can view the final graphs – Interactive View
option: Line Plot.
Practical use of loops of the Interval Loop type
Apart from “academic” examples, the Interval Loop offers several practical
usage options: for example, simulating the impact of input material prices on
product profitability (the impact of a price change by 1 EUR, 2 EUR…) or,
similarly, iteratively modelling the impact of tax rate changes, various
margins, or changes in the number of workers. For advanced scenarios, we
can use it, for example, to search for the optimal parameters of machine
learning models (we change the model parameters in the loop and save the
resulting model accuracy in a table, from which we subsequently select the
best model parameters for production deployment).
You can find several practical examples of the use of the Interval Loop here:
https://nodepit.com/node/org.knime.base.node.meta.looper.LoopStartIntervalNodeFactory
KNIME – loop over the table columns
The Column List Loop Start node starts a loop that goes through the selected
table columns and executes the defined workflow nodes over these columns.
The Loop End (Column Append) node ensures the gradual passage through
the individual columns and also gathers the processed columns into the final
table.

Example – mass data transformation of selected columns


Situation: We have a table with the sales data for a certain date. The table
comes from an ERP system and includes codes of organizational units and
other data that resemble whole numbers. We need to modify the data type of
such information to avoid problems during subsequent data processing. For
example, the Shipping area column includes identification codes of
individual distribution locations - 100, 101…121. We do not want to treat
these codes as numbers (we do not want to add them up, determine their
average, etc.). Instead, we want to treat them as categories (we want to
execute filtering, combining and other operations based on them). In order to
make it immediately recognizable that the codes represent categories, we
want to add the "N_" string to every such code; for example, the Packing ID
"2000014378" will look like this after the conversion: "N_2000014378".
We can prepare, for example, the following workflow:

Workflow: 011_Loops – 4
File: C:\KNIME_Advanced\Input\Shipping\Shipping_report.xlsx
Using the Excel Reader (XLS) node, we initially upload the given file, upon
which we create two column groups in the Column Splitter node:

1. columns that we want to modify
2. columns that we want to keep in the existing format
We then connect the Port Top of the Column Splitter node to the Column
List Loop Start node, where we select the columns that we will gradually go
through as a part of the loop.
Result of the Column List Loop Start node after the first iteration.

Observe the flow variables formed within the loop. We use the
currentColumnName variable for the dynamic renaming of the columns (we
need to rename the columns to, for example, tmp in order to allow for
universal data modifications of all columns within the loop – see below).

First of all, we convert the column data type in the loop body (the Number
To String node; the column name is controlled by the currentColumnName
variable), after which we change the texts in the given column using the
String Manipulation node.
In the other nodes in the loop body, we want to add the "N_" string in front of
each code (gradually in all columns), thus making sure that we can
immediately see that they are categories and not numbers.

Since we will change data in the individual columns (there will be several of
them), we need to do a little trick – before the String Manipulation node, we
rename the column to, for example, tmp. Once the manipulation with the
string is completed, we restore the original column name. (Note: should we
only want to change the column data type from a number to a string, without
changing the data, the Number To String node alone would suffice.)
The String Manipulation node and the join() function need to know the name
of the column in which the given data manipulation will be executed. The
picture below shows an example with the Plant column. If we did not
gradually rename all the columns to, for example, tmp, we would need
multiple nodes – one for each column.
When we rename each column to tmp within the loop, the String
Manipulation node will look like this.
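With the column renamed to tmp, a single join() expression covers every
iteration of the loop:

    join("N_", $tmp$)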
We have thus ensured that all the columns of the table will be modified.
We execute the first conversion of the column names using the Column
Rename (Regex) node, with the searchString parameter controlled by the
currentColumnName variable and the Replacement: parameter permanently
set to tmp.
We use the same node type for the second conversion; we just set the
variables the other way round. The searchString parameter will be set to tmp
and the replaceString parameter will be controlled by the
currentColumnName variable (the default text prefix_$1 plays no role here).
We close the loop using the Loop End (Column Append) node.
The result after the entire loop is executed looks like this.

All the columns are of the S = String type and the data in the columns have
been modified as we needed.
In the last step of our workflow, we join the two tables back together using
the Joiner node. The joining column will be the technical Row ID column.
Join result.
Alternative workflow
The example given above explains a loop over table columns. However,
Knime is continuously being developed, and version 4.2 brought a new node
type - String Manipulation (Multi Column). This node makes mass
manipulation of columns a very easy task. A workflow created in Knime
version 4.2 can look, for example, like this.

Workflow: 011_Loops – 4

Configuration of the String Manipulation (Multi Column) node.
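In the String Manipulation (Multi Column) node, the current column is
referenced as $$CURRENTCOLUMN$$, so a single expression of roughly this
form handles all the selected columns:

    join("N_", $$CURRENTCOLUMN$$)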


Final table.
Some workflow examples with a practical use of the Column List Loop can
be found here:
https://nodepit.com/node/org.knime.base.node.meta.looper.columnlist2.ColumnListLoopS
KNIME – other loop types
Knime includes several other loop types. The principle of their use is
identical to the loops described in detail above. Let us briefly introduce
them.
Chunk Loop – loop executed “chunk-by-chunk”
The Chunk Loop Start node divides the initial table into “chunks” of a
selected size and gradually processes them. The criterion for the size of a
“chunk” is the number of table rows. The remaining rows are uploaded as
part of the last step of the loop (for example, with 105,000 rows and chunk =
25,000, the individual iterations will gradually upload 25,000, 25,000,
25,000, 25,000 and 5,000 rows).

Configuration of the Chunk Loop Start node.


Usage examples
Example 1
We have a list of tens of thousands of sentences that we need to translate
from English to Czech. A large number of community translators are working
on the translation. We need to divide and distribute the list evenly.
Solution: we prepare a workflow that first uploads the source file and then,
using the Chunk Loop, divides the source table into chunks of a thousand
sentences each, gradually saving the results into several small files that will
be distributed to the translators.
Example 2
We have a csv file that includes data from production line sensors. The
file consists of almost two million rows and 4,500 columns. The technicians
need to view and analyze the data in MS Excel. However, the file cannot be
opened in MS Excel – all attempts to do so end with the program crashing.
A preview of the csv file in Notepad does not work either.
Solution: we prepare a workflow that uploads the csv file and splits the
table in the Chunk Loop into chunks, each of them consisting of, let us say,
300,000 rows. We then save these chunks into multiple Excel tables
within the framework of the loop.
Note: Knime can easily process csv files even of this size (provided we have
sufficient memory), though analyses executed directly in Knime are not
suitable for everybody.
Many examples of the use of the Chunk Loop can be found here:
https://nodepit.com/node/org.knime.base.node.meta.looper.chunk.LoopStartChunkNodeFa

Counting Loop – loop with a given number of repetitions


The Counting Loop Start node starts the loop with n iterations. The
workflow in the loop body is then executed n times, always with the entire
input table. The Loop End node then again collects results from individual
iterations.
Configuration of the Counting Loop node.

Several examples of the use of the Counting Loop can be found here:
https://nodepit.com/node/org.knime.base.node.meta.looper.LoopStartCountNodeFactory

Generic Loop – loop with a condition at the end


The Generic Loop Start node starts a general loop, from which we can create
a loop with a condition at the end using the terminal Variable Condition
Loop End node. This loop will then execute any workflow in the loop body
until the terminal condition is fulfilled.
Configuration of the Generic Loop Start node.

Configuration of the Variable Condition Loop End node.


The Variable Condition Loop End node has two output ports. The first port
is designated for the data collected in individual loop iterations. The second
port is designated for the table with the values of the controlling variable after
individual loop iterations.

Several examples of workflows with the use of a loop with a condition at the
end can be studied here:
https://nodepit.com/node/org.knime.base.node.meta.looper.condition.LoopStartGenericNo

Recursive Loop
The Recursive Loop Start node initiates a special loop type: it allows the
result of the workflow executed in the loop body to be returned back to the
loop start node via the terminal Recursive Loop End node. The first iteration
of the loop body runs on the data at the input port of the start node; the
second and subsequent iterations run on the table sent back by the Recursive
Loop End node.

The Recursive Loop End node has two input ports. The first port is used for
collecting data from individual iterations (identically to other loop types).
The second port is designated for the data that will be recursively sent back to
the Recursive Loop Start node.
Configuration of the Recursive Loop Start node.

Configuration of the Recursive Loop End node.

Some practical examples of the use of recursive loops can be found here:
https://nodepit.com/node/org.knime.base.node.meta.looper.recursive.RecursiveLoopStartN
5. KNIME – conditional workflow branching
If our workflows reach a situation where we need to execute an alternative
set of steps, we can conveniently use the nodes for conditional branching.
Knime uses two types of branching:
1. Branching of the IF type – we use it for “either/or”, “yes/no”,
etc. cases.
2. Branching of the CASE type – we use it for “black, white or
green”, “0, 1 or 2”, etc. cases.

Branching of the IF type

Branching of the CASE type


KNIME – branching of the IF type
Workflow nodes – IF Switch – End IF
For branching of the IF type, we use the IF Switch node, which is logically
linked to the End IF node. The End IF node does not always have to be used
(we omit it when the alternative workflow branch logically ends).

The IF Switch node includes the PortChoice parameter, which can acquire
the following values: both, bottom or top. This value controls whether the
workflow leaves the IF Switch node via port 1, port 2, or via both ports. If
we, for example, select the top value, the bottom port of the node closes and
only the nodes linked to the upper port are executed.
To make the workflow automatic, the branching needs to be controlled by a
variable.

Example – combining daily and weekly KPI reports

Situation: values of the daily and weekly KPIs are saved in the KPI folder.
Objective: to create an automatic workflow that will combine all the daily
and weekly KPI reports. The output will be represented by two files, which
will be used for reports on KPI development over time.
Input element

We can prepare, for example, the following workflow:


We will upload the names of the files located in the folder into a table, then
gradually go through this table using a loop and decide, according to the file
name, whether the content of the given file should be added to the daily or
the weekly overview.

Workflow: 012_If_Switch - 1
File: C:\KNIME_Advanced\Input\KPI (folder with the files)
The key moment of the workflow is the Rule Engine Variable node. In this
node, we fill the variable we have named switch with the value top, provided
the flow variable Location (which gradually holds the individual file names)
contains the string “daily”. The switch variable will be filled with the value
bottom, provided the flow variable Location contains the string “weekly”.
Setting up a rule for the switch variable:
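A possible pair of rules, assuming the flow variable is named Location as
above (LIKE performs wildcard matching):

    $${SLocation}$$ LIKE "*daily*" => "top"
    $${SLocation}$$ LIKE "*weekly*" => "bottom"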

Flow variables after uploading the given CSV file in the CSV Reader node:
We insert the switch variable in the IF Switch node.

The PortChoice is now controlled by the switch variable.


Situation when port 2 was active - the switch variable in this step of the loop
had the value bottom.
KNIME – branching of the CASE type
For branching of the CASE type, we can use the CASE Switch Data (Start)
and CASE Switch Data (End) nodes. Depending on the value of the
PortIndex parameter (it can acquire the following values: "0", "1" or "2"), the
workflow continues with port 0, 1 or 2. Similarly to the branching of the IF
type, the ports that are not selected are closed and only the nodes that leave
the active port are executed.
While the active port can be selected manually, automatic workflows need a
variable for the port control.
Example – calculation of the total weekly needs of workers
Situation: we have 12 teams, each creating a plan of worker requirements
for three-shift, two-shift and single-shift operations for every working week.
The plan takes the form of a table that includes the corresponding team
identification, calendar week, shift identification and the number of workers
needed for the individual shifts on specific days. Some teams have a five-day
working week, some also work on Saturdays and others work continuously.
The rule for sending a report is that it must include the entire week (even
when no workers are required for Saturdays and Sundays); however, this rule
is not always followed. In reality, some reports do not contain the columns
with the Sunday shift, or the Sunday and Saturday shifts.
Objective: to combine the information from all work teams into a single
table and to modify the data into a format more suitable for further
processing (data for the days of the week in the columns to be converted to
rows).
Input element
Type 1 file - continuous operation

Type 2 file - six-day working week


Type 3 file - five-day working week

A possible solution is, for example, the following workflow:


Using the List Files node, we upload the names of the files in the input folder
into a table. Using a loop, we then gradually go through this table - we open
the individual files, determine how many columns they have and, based on
the determined value, select the file processing manner. File processing here
consists of adding the missing columns and turning the worker counts from
columns to rows.

Workflow: 013_CASE_Switch - 1
File: C:\KNIME_Advanced\Input\Shifts (folder with the files)
The key moment of the workflow is the Rule Engine Variable node, which
fills the Switch variable based on the information about the number of
columns in the table obtained from the Extract Table Dimension node
(variable Number Columns). This variable will control the ports in the CASE
Switch Data (Start) node and thus also the action that will be executed with
the data.
Setting up the rules for the Switch variable according to the Number Columns
value:

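A sketch of what these rules can look like (the column counts are illustrative
assumptions – e.g. a full seven-day table with 10 columns, a six-day table
with 9 and a five-day table with 8):

$${INumber Columns}$$ = 10 => "0"
$${INumber Columns}$$ = 9 => "1"
$${INumber Columns}$$ = 8 => "2"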
Flow variables after the execution of the Extract Table Dimension node (in
one of the loop steps):
In the CASE Switch Data (Start) node, we connect the PortIndex parameter
with the Switch variable.

Based on the value of the Switch variable, the corresponding actions are
executed in the individual metanodes (metanodes are used here for clarity).
Let us look at the details of the executed actions by opening the individual
metanodes.
The 7 days file metanode handles tables that contain columns from Monday
to Sunday. Only “unpivoting” is executed here – the transformation of
columns to rows.

The result of the node is a table in a “database” format.

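For intuition, the same “unpivoting” step expressed in Python with pandas (a
sketch; the column names are illustrative, not the exact ones from the input
files):

import pandas as pd

# Illustrative 7-day input: one row per team/week/shift, days as columns
df = pd.DataFrame({
    'Team': ['T01', 'T01'], 'Week': [34, 34], 'Shift': ['early', 'late'],
    'Mon': [5, 4], 'Tue': [5, 4], 'Wed': [5, 4], 'Thu': [5, 4],
    'Fri': [5, 4], 'Sat': [2, 0], 'Sun': [0, 0],
})

# melt() turns the day columns into rows, just like the Unpivoting node;
# ColumnNames/ColumnValues mirror the generic names the node generates
long = df.melt(id_vars=['Team', 'Week', 'Shift'],
               var_name='ColumnNames', value_name='ColumnValues')
print(long)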

The 6 days file metanode for input tables that are missing the Sunday
column.

We add the missing column using the Rule Engine node (using the condition
Switch = 1, we fill the column with 0 values; to add the column, we use the
option Append Column: Sun).
The remaining nodes are then identical, i.e., we execute the “unpivoting”
using the Unpivoting node and rearrange the order of the columns using the
Column Resorter node.
Result after rearranging the order of the columns.
The 5 days file metanode for tables that are missing the Saturday as well as
Sunday columns.

Should both the Saturday and Sunday columns be missing in a file, we add
them using two Rule Engine nodes (we fill the columns with 0 values using
the conditions Switch = 1 and Switch = 2, respectively; to add the columns,
we use the options Append Column: Sat and Append Column: Sun,
respectively).

The other nodes of this workflow branch are identical to the previous two
cases.
Workflow completion
The workflow can be completed, for example, like this.
The Loop End node aggregates the data from the successively loaded files,
while the Column Auto Type Cast node resolves the unknown data type in
the ColumnValues column. Using the Excel Writer (XLS) node, we save the
final combined file. If we also want to see the corresponding graph, we can,
for example, aggregate the data using the GroupBy node and display them
using the Line Plot node.
Data type “?” in the ColumnValues column (which contains the required
number of workers)

“Tuning” workflows
When we start using a workflow, we usually find parts that could be
improved.
In our case, it seems convenient to move the Unpivoting node behind the
CASE branches, to rename the generic column names ColumnNames and
ColumnValues to clearer ones, such as Day and FTE Count, and to eliminate
the problem with the type identification of the ColumnValues column – and
with it the need for the Column Auto Type Cast node.
The ColumnValues variable at the output of the Loop End node.

Let us briefly address the problem of the unknown type in the ColumnValues
column.
Why is the data type in the ColumnValues column unknown? It is because the
rule for adding the Sat or Sun columns in the Rule Engine nodes uses
quotation marks. By doing so, we notified Knime that we want values of the
“S” (String) type. However, the ColumnValues column in the files that
contain data from the Saturday and Sunday shifts is identified as “I”
(Integer). Knime solves this contradiction by using the “?” data type.

If we remove the quotation marks, the column will contain “I” (Integer)
values.
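The difference lies in a single pair of characters in the rule for the appended
column (a sketch):

$${ISwitch}$$ = 1 => "0"    (quoted outcome – Knime appends a String column)
$${ISwitch}$$ = 1 => 0      (unquoted outcome – Knime appends an Integer column)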
If the ColumnValues column were still of the “?” type, the next step would
be tracing (running the loop step by step) and searching for the source of the
error (a file, a node, etc.), or we would keep the Column Auto Type Cast
node to identify the correct data type, as we did in our case.
6. KNIME – workflow calling
automation
Knime allows for two types of automation:

assisted automation
autonomous automation.

Both of these automation types have a high added value.

Assisted automation – semi-automatic workflow


Assisted automation means the semi-automatic launching of a workflow:
part of the workflow runs automatically, but it stops at a certain point and
waits for a manual intervention from the operator. A manual intervention
can be, for example, entering a value, setting a filter parameter, entering the
name of the input file, etc. Assisted automation also includes workflows
that are launched manually step by step (node by node, metanode by
metanode, or one series of nodes after another) without changing the
configuration of the corresponding nodes.
Launching semi-automatic workflows
We launch semi-automatic workflows from the Knime Analytics Platform.
On every use, we must open the given workflow (prepared in advance for
repeated use), configure the corresponding nodes (file names, filters, etc.)
and launch them step by step.

Autonomous automation – automatic workflows


Autonomous automation applies to workflows built so that their execution
does not require any input from the operator. Such workflows manage their
inputs, data processing and outputs on their own. Autonomous workflows
make heavy use of flow variables, branching and loops, and their behavior
can be controlled by tables of rules and parameters. Creating such
workflows is more difficult, though their added value is very high.
Launching automatic workflows
We can launch automatic workflows in several ways:
(the list may not be complete)

On a one-off basis, using the Windows command line
Using the Windows Task Scheduler
By calling them from an external program
By calling them from another workflow
Using the Knime Server

How does it work?


The knime.exe program (with which we launch Knime) can be called with
parameters. One of these parameters launches a workflow in “batch” mode,
i.e., in the background, without opening the workflow editor itself. To make
sure the knime.exe program can be found, the Windows system variable
PATH must include the path to the knime.exe file (for Knime version 4.2,
the path is C:\Program Files\KNIME_4.2).

Restart Windows after you set the system variable.


Program parameters of knime.exe, which are needed for launching the
workflow “in the background” (without opening Knime and without any
intervention by an operator):
-nosplash
-reset
-application org.knime.product.KNIME_BATCH_APPLICATION
-workflowDir="path to our workflow"

Launching workflow using the command line


Open the cmd.exe program (for example, via the keyboard shortcut Win +
R). On the command line, enter the string that executes the workflow
"Loops_001":
knime -nosplash -reset -application
org.knime.product.KNIME_BATCH_APPLICATION -
workflowDir="C:\Users\laufi\OneDrive\Dokumenty\Knime_příprava_Udemy\KNIME_W

When you need to call a workflow with a global variable, add the following
parameter:
-workflow.variable=<var>,<value>,<type> (e.g. -workflow.variable=package,123,int)

Launching a workflow using the Windows Task Scheduler


The Windows Task Scheduler allows for one-time or repeated launches of a
program, including the parameters with which the program is launched. This
way, you can schedule the automatic launch of a selected Knime workflow,
for example, every Monday at 10:00 a.m.
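For illustration, such a scheduled task can also be created from the command
line using the schtasks.exe utility (the task name and workflow path here are
hypothetical):

schtasks /Create /TN "KNIME_Loops_001" /SC WEEKLY /D MON /ST 10:00 /TR "knime -nosplash -reset -application org.knime.product.KNIME_BATCH_APPLICATION -workflowDir=\"C:\Workflows\Loops_001\""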
Launching workflows using an external program
Knime workflows can also be launched from another program. The example
given below shows how we can call the Knime workflow "Loops_001" from
a program written in Python.
import os

# Build the same batch-mode command as above and run it through the shell;
# the workflowDir value must be the full path to the workflow folder
cmd = ('knime -nosplash -reset -application '
       'org.knime.product.KNIME_BATCH_APPLICATION '
       '-workflowDir="C:\\Users\\laufi\\OneDrive\\Dokumenty\\Knime_příprava_Udemy\\KNIME_W')
os.system(cmd)
Calling workflows from another workflow
To automate more extensive processes, we can use nodes that call other
workflows from within our own workflow. Our workflow can then work
with data that has been prepared by another workflow. We will demonstrate
this convenient option with a simple example.
We will prepare a simple workflow that creates a table of the names of the
text files in a selected folder and then successively loads the content of these
files into the output table.

Workflow: 014_External_WF_Call – 1
File: C:\KNIME_Advanced\Input\KPI_filter (file folder)
Notice the Container Output (Table) node. This node ensures the transfer of
the table when the workflow is called by another workflow. The table is
identified by the Parameter Name setting (here it is named output).
We will call the workflow shown above using the Call Workflow (Table
Based) node. The actual data transfer is governed by the parameter Fill
output table from: = output.

Workflow: 015_External_WF_Call - 2
Launching workflows using the Knime Server
The most effective tool for the automated launching of workflows is the
Knime Server. Apart from automating work with workflows, the Knime
Server can do several other things: it supports teamwork, manages access
rights to data and publishes data in the form of well-arranged dashboards. It
also allows for executing automatic data modifications and transformations
(the main topic of this publication) and forms a platform for data science.
Data scientists can use Knime for creating and tuning machine learning
models, while the Knime Server ensures the productive deployment of these
models.
Knime Server is a paid service. Pricing and more detailed information about
the content and functionality of the Knime Server can be found here:
https://www.knime.com/knime-server.
7. KNIME – workflow automation
and clarity
Workflow automation is not only about variables, loops and branching.
Automatic workflows should also be well arranged, understandable, fast and
resistant to errors.

Workflow clarity
Always keep in mind that you need to create workflows that will run for a
long time without human intervention. When you come back to a certain
workflow after a while, you need to quickly grasp what the workflow does
and how it operates. You can significantly improve workflow clarity by using
the following elements:
Node descriptions (short and long)
Annotations
Metanodes
Components
Meta information

Node descriptions
Short descriptions can be entered directly in the “visible” part of the nodes.

Long descriptions allow for a detailed description of a node. We can enter
longer texts via “Edit Node Description”. The text is then displayed when
you hold the mouse pointer over the given node for approximately one
second.
Hold the mouse pointer over the node for approximately 1 second and the
corresponding node description will be displayed.

Annotations
Annotations allow for the effective documentation of the workflows. Using
annotations, you can comment on and document workflow logic blocks. The
annotation editor is very simple and well-arranged.

Annotation editor.

Workflow: 014_External_WF_Call - 1.1


Metanodes
A metanode is a node that contains part of a workflow. Using metanodes, we
can improve the clarity of our workflows.
Suppose we have a tuned part of an extensive workflow that includes
multiple nodes (for example, nodes for various transformations, file uploads,
uploads of database data, etc.). When this part forms a logical unit that can
be briefly and accurately described, it is convenient to enclose it in a
metanode.
In this workflow, we have a metanode, aptly called Ungroup collections.

When we open this metanode, we see the several nodes that form the logical
unit. The clarity of the workflow has improved significantly, and none of the
enclosed logic is lost.
Workflow: 016_Metanode_example – 1
(adopted from KNIME Example Workflow “Finding Association Rules for
Market Basket Analysis”)
The fastest way to create a metanode is to select consecutive workflow
nodes and choose the “Create Metanode” option from the right-click menu.
Knime creates everything else (except the metanode description)
automatically.

You can also find more information about metanodes here:


https://www.knime.com/metanodes
Components
Components were originally called “Wrapped Metanodes”. A component
can contain part of a workflow, or even a completely independent workflow,
which we can use in multiple locations or share within a team. We already
explored the creation and use of Component nodes in the chapter on
variables.
You can find more detailed information about the nodes of the Component
type under this link: https://docs.knime.com/2020-
07/analytics_platform_components_guide/index.html
Meta information
Meta information is created at the level of the entire workflow.
Select the “Edit Meta Information…” option from the menu of the selected
workflow.

A dialogue window opens (in the right part of the screen, where node
descriptions are usually shown) with a structure for entering meta
information about our workflow.
Now we can start editing.
Well-maintained meta information has added value, particularly because you
can quickly grasp a particular workflow by reading it.
Note that searching workflows by the data and tags in meta information is
not yet fully functional (as of version 4.2).
8. KNIME – workflow automation
and speed
Speed
When it comes to speed, Knime is a relatively well-tuned application.
While we usually do not have to assess performance for manually started,
one-time workflows, performance can be important for automatic workflows
that process large volumes of data.
Let us explore three areas that can significantly influence the performance of
our workflows:
1. Configuration of the given environment
2. Using the “Cache” node
3. Using “Parallel execution”

Configuration of the given environment


A fundamental parameter affecting Knime performance is the amount of
memory that Knime can use.
-Xmx<number of megabytes>m
We initially set the -Xmx parameter in a dialogue box when installing
Knime. Changes to this important parameter can be made in the “knime.ini”
file, which is located in the installation folder (for example:
C:\Program Files\KNIME_4.2.1).
In this case, Knime can use up to 2.5 GB of memory (which is not sufficient
for processing larger volumes of data).
If you have more memory available, I recommend making as much of it as
possible available to Knime (for example, if you have 8 GB, set 6 GB for
Knime; if you have 16 GB, set at least 12 GB for Knime, etc.).
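A sketch of the relevant part of the knime.ini file (only the -Xmx line is
changed; the other lines stay as the installer wrote them):

-vmargs
-Xmx6144m

Here, 6144 MB (6 GB) of memory would be made available to Knime.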

Cache node

The output of the Cache node is the same table as the input table, but it is
newly created in memory, free of any links to previous transformations and
intermediate states. This can significantly speed up subsequent data
processing. For example, if we keep only the 5 columns we are interested in
from a table with a total of 100 columns, the workflow does not delete the
remaining columns, it only hides them. When we use the Cache node, the
workflow continues with just the “clean” five columns and can thus be much
faster.
One of the possible uses of the Cache node - after a transformation of the
table columns.
Workflow: 017_Performance_Speed – 1
File: C:\KNIME_Advanced\Input\Sales\Sales_full.csv
Be careful though: the Cache node has its own overhead. You should
therefore make sure that its use is justified. To verify this, you can use the
helpful Timer Info node, which shows the time consumed by the individual
workflow nodes.
Output example of the Timer Info node.

Other information related to the Cache node and its use can be studied here:
https://nodepit.com/node/org.knime.base.node.util.cache.CacheNodeFactory

Parallel execution
The parallel execution of selected nodes can significantly influence the
processing speed of some workflows.
The nodes that make this possible require an additional extension, which is
installed via "Install KNIME Extensions…".
The extension in question is called KNIME Virtual Nodes.

When the installation is complete, three new nodes are added to the Node
Repository.
Example of the use of the Parallel Chunk Start and Parallel Chunk End
nodes.

Workflow: 018_Performance_Speed – 2
Files:
C:\KNIME_Advanced\Input\Sales\Sales_full.csv
C:\KNIME_Advanced\Input\Sales\Sales_locations.xlsx

Once again, the use of parallel processing should be properly assessed and
tested. These nodes also have their own overhead, and only proper
simulation, testing and measurement will show the real effect of this method
(it can be enormous, but it can also be negative if used unsuitably).
Other information related to virtual nodes and their uses for parallel
workflow operations can be found here:
https://nodepit.com/node/org.knime.core.node.workflow.virtual.parallelchunkstart.Parallel

Other workflow performance optimization options


Additional workflow performance tuning tips can be studied here:
https://www.knime.com/blog/optimizing-knime-workflows-for-performance
9. KNIME – automation and error
resilience
Workflow resilience
Automatic workflows should be as resilient to errors and unexpected states
as possible. A workflow should ideally always provide the expected results
(when everything proceeds as it is supposed to). Should difficulties arise
during execution, the workflow should still run to completion despite the
errors and, instead of the expected result, return information about the event
- for example, in the form of an error log.
We will show two simple examples that demonstrate some options for
improving the resilience of our workflows.

1. Empty table treatment


2. Catching errors

Empty table treatment


A situation can occur during the execution of a workflow in which a node’s
result is an empty table. When this happens, the subsequent nodes have no
data to process.

Knime resolves this situation in an elegant way using the Empty Table
Switch node.

This apt node works very simply. The workflow continues with port 1 when
the input table contains data; when the input table is empty, the workflow
continues with port 2. This allows us, for example, to load the data another
way (from a different source, or from the same source with different
parameters, etc.), or to prepare a corresponding error record – an error log.
The strength of the Empty Table Switch node is demonstrated by this
workflow.

The workflow uploads the files from the stock inventory count (the files
should be saved in a single folder) and combines them into a single file. If the
folder is empty, it produces an error log instead of the expected result.

Workflow: 019_Automation_log_file – 1
File: C:\KNIME_Advanced\Input\Inventory_upload (file folder)
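For intuition, the same decision expressed in plain Python (a sketch; CSV
input is assumed and the output file names are illustrative):

import glob
import pandas as pd

files = glob.glob(r'C:\KNIME_Advanced\Input\Inventory_upload\*.csv')
if files:
    # "port 1": the input is not empty - combine the files into one table
    combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    combined.to_csv('Inventory_combined.csv', index=False)
else:
    # "port 2": the input is empty - write an error log instead
    log = pd.DataFrame({'message': ['No inventory files found in the input folder']})
    log.to_csv('Error_log.csv', index=False)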
Catching errors
Knime allows us to catch and correct error states using two nodes:
Try – we place this node before the operation (a node or a
sequence of nodes) in which an error that we need to
handle can occur
Catch Errors – we place this node after the handled node or
sequence of nodes

These nodes form a part of the Error Handling group of nodes. They exist in
multiple variants.

The Try node starts the block that we want to handle.

The Catch Errors node either continues with the unchanged (expected,
correct) output of the handled node (input port 1), or with the output
prepared for error situations (input port 2). If an error occurs, it is recorded
in the corresponding variables.
We will demonstrate how the “Try – Catch” mechanism works with a simple
example. The workflow either loads data from the SAP system and saves the
result in a data file, or creates a log with the error description. The CSV
Writer node creates either the Data_SAP_download file with the required
data (if the data connection and upload are free of errors), or the
Error_log_SAP_download file with information about the error state (if an
error occurs in the Python Source node).

Workflow: 020_Automation_log_file – 2
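For illustration, the body of the Python Source node could look roughly like
this (a sketch assuming the pyrfc library and hypothetical connection
parameters; the workflow's actual code may differ):

from pyrfc import Connection
import pandas as pd

# Hypothetical connection data - any failure here raises an exception,
# which is exactly what the Try - Catch Errors pair intercepts
conn = Connection(ashost='sap.example.com', sysnr='00',
                  client='100', user='RFC_USER', passwd='***')

# Read a sample of rows via the standard RFC_READ_TABLE function module
result = conn.call('RFC_READ_TABLE', QUERY_TABLE='MARA',
                   DELIMITER='|', ROWCOUNT=100)

rows = [row['WA'].split('|') for row in result['DATA']]
cols = [f['FIELDNAME'] for f in result['FIELDS']]

# The Python Source node returns its result to Knime as output_table
output_table = pd.DataFrame(rows, columns=cols)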

Note: the example with the extraction from SAP was simply close at hand.
The workflow would work the same with other database connectors as well.

This is what the result of the Catch Errors node looks like when the caught
error reports are saved in the prepared “Failing…” variables.
Configuration of the Catch Errors node.

This is what an error log can look like (the error description was shortened).

Note: for the workflow to run without catching any errors, the local
environment must be properly configured – SAP GUI and the SAP NW
libraries – see
https://blogs.sap.com/2020/06/09/connecting-python-with-sap-step-by-step-guide/.
Nevertheless, the main purpose of the example is to demonstrate catching
errors and creating an error log.
You can also study several examples of the practical use of the Try – Catch
nodes here:
https://nodepit.com/node/org.knime.base.node.flowcontrol.trycatch.generictry.VariablePor
10. KNIME – other tips and tricks
Using the Knime examples
There are many examples directly available in the Knime main menu. You
can study them and learn from them.

Data visualization
Knime offers very good data visualization options. Data views are
implemented as interactive views (I already touched on the topic in the
chapter about loops). The visualization capabilities become even more
apparent in combination with the (paid) Knime Server and its Web Portal.
You can find the visualization nodes in basic Knime under Views/
JavaScript:
Examples of the use of the visualization nodes can be found under
EXAMPLES/ 03_Visualization:

Interactive view example (workflow from the examples –
07_Using_the_Sunburst_Chart_for_Titanic):

Additional data visualization options are provided by the Knime extension
Business Intelligence and Reporting Tools (BIRT). This tool enables the
creation of reports based on the data from your workflows. More information
about the BIRT tool can be found here: https://www.knime.com/node/20445.
We would like to explore the extensive and very interesting topic of data
visualization in Knime in detail in another part of our series on the Knime
Analytics Platform (in preparation for 2021-2).

Power BI
A useful extension can be additionally installed in Knime that enables
sending data to Microsoft Power BI from within a workflow. However, you
need at least a Power BI Pro license.

Databases
We will explore databases and database connectors (MS SQL, SAP ERP,
SAP HANA, MySQL, etc.) in the next part of our Knime series. (2021)
Once again, Nodepit is a rich source of information.
https://nodepit.com/category/database
Connectors to SAP
A common topic in medium-sized and large companies is the possibility of a
direct connection between the SAP system and Knime.
This connection is feasible (for classic SAP as well as SAP S/4HANA), and
there are several ways of achieving it. We have tried three different ways:
1. Connection using SAP RFC
a. Configuration of the local environment for SAP
RFC
b. Uploading data from SAP using RFC functions
(from the Python Source node) and their use
directly in a Knime workflow

2. Connection using a Theobald Universal connector


a. The universal Theobald RFC function must be
transported to SAP
b. Connection configuration in the Theobald
environment (very easy)
c. Using the SAP Reader (Theobald) node in the
given Knime workflow
3. Connection using Web Service
a. In SAP, we create a function module that allows
for RFC (in transaction SE37)
b. We create Web Service (in SE37 and
SOAMANAGER) from the function module
c. We use the POST Request node in the Knime
workflow and transfer the acquired XML data to
a table using the XPath node

Links to the SAP topic:


Regarding 1:
https://blogs.sap.com/2020/06/09/connecting-python-with-sap-step-by-step-
guide/

Regarding 2:
https://hub.knime.com/knime/extensions/org.knime.features.sap.theobald/latest/org.knime
Regarding 3:
https://hub.knime.com/knime/extensions/org.knime.features.rest/latest/org.knime.rest.node

Machine learning
The main advantage of the Knime Analytics Platform is its ability to solve
and automate advanced data analytics tasks without writing program code.
This excellent characteristic of Knime is particularly apparent in the areas of
machine learning and artificial intelligence. Knime supports classic machine
learning algorithms as well as deep learning algorithms.
The library of examples, part 04 Analytics, contains several workflows that
demonstrate the use of various algorithm types.

The elegance of machine learning in Knime is well demonstrated by this
example (wage prediction based on demographic data; see the library of
examples, 05_Gradient_Boosted_Trees).
An excellent introduction to the topic of machine learning in the Knime
environment in the form of an online course has been also created by my
colleague Barbora Stetinova on the Udemy.com platform
(https://www.udemy.com/course/data-analyzing-and-machine-learning-
hands-on-with-knime/).

Python and R
Apart from the native Knime and JavaScript-based nodes, we can also
install support for the Python and R programming languages.
R installation
Since I am a fan of the Python language and do not know R well, I have no
practical experience with the installation of R in Knime. Nevertheless,
detailed instructions for working with the “R nodes” can be found here:
https://docs.knime.com/2019-12/r_installation_guide/index.html

Python installation
If you want to work with nodes that support the Python language, you need
to:

Install the KNIME Extension “KNIME Python Integration”


Install Python itself – ideally as the Anaconda distribution
(https://anaconda.org/)
Configure the environment in File -> Preferences -> KNIME -
> Python

Note on the Preferences/ KNIME/ Python configuration: when setting up the
configuration on the “Python” tab, you will be “forced” to set up the
py2_knime and py3_knime environments (Knime creates these
environments, i.e., lists of the Python libraries used, in the Anaconda
Navigator). Once these environments are created, you can return to the
configuration and select your preferred environment – in our case, for
example, the “base” environment with Python 3.8.3.
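If you prefer the command line, an equivalent environment can also be
created manually with conda (a sketch; the package list is illustrative – the
official py3_knime environment contains more libraries):

conda create -n py3_knime python=3.8 pandas numpy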
The Knime - Python environment installation guide can be found here:
https://docs.knime.com/2018-12/python_installation_guide/index.html

Knime Server
Knime Server is a tool that can move your automation to a completely
different level. The server is a paid tool and its price is relatively high.
However, if you have multiple automatic workflows and are considering
automatic business reporting or the implementation of AI scenarios for your
company, the Knime Server definitely represents a solution that you can
count on.
Knime Server is available as a local, “on-premise” installation or as a cloud
solution (AWS and MS Azure).
Knime Server introduction:
https://www.youtube.com/watch?v=NuEhV7TXh1Y
KNIME Server official website:
https://www.knime.com/knime-server
11. KNIME – links to other sources
Links
Links to other sources, including social networks, blogs, in-person and
online courses, as well as a link to real datasets (public datasets to play
with).
Knime – official
Knime – updated basic operation guide with the platform:
https://www.knime.com/getting-started-guide
Knime forum:
https://forum.knime.com/
Knime hub:
https://hub.knime.com/
Knime blog:
https://www.knime.com/blog
Knime – community
Node pit:
https://nodepit.com/
Training
https://www.knime.com/knime-courses
MOOC courses
https://www.udemy.com/course/knime-bootcamp/ (basics, free)
https://www.udemy.com/courses/search/?
src=ukw&q=%C5%A1t%C4%9Btinov%C3%A1 (courses of my colleague B.
Stetinova on the topic of Knime and machine learning)
https://www.udemy.com/courses/search/?src=ukw&q=knime (search link to
other courses on Udemy.com)
https://www.coursera.org/lecture/code-free-data-science/introduction-to-
knime-analytics-platform-YBD5E (introductions to Knime on the
Coursera.org platform)
Social networks
Facebook: https://www.facebook.com/KNIMEanalytics/
Instagram: https://www.instagram.com/knime_official/
LinkedIn: https://ch.linkedin.com/company/knime.com
Youtube: https://www.youtube.com/user/KNIMETV

Public datasets worth playing with


http://archive.ics.uci.edu/ml/index.php

The latest versions of the training workflows on the Knime Hub:
https://hub.knime.com/search?q=lofflerwf
12. About the author

Vladimír Löffler (1971)


Graduated from the University of Economics. Experience with large
database systems in retail (Kaufland) and the automotive industry (Škoda
Auto, WITTE Automotive) since 2001. Knowledge of ERP systems – SAP;
Business Intelligence systems – SAP Business Intelligence, Knime, Power
BI; programming – especially ABAP and Python. Participant in Big Data
projects – machine learning, data analytics, democratization of data science.
Fan of Big Data and data science, athlete (MTB freeride, skiing, martial
arts), traveler, married, two children.
Co-founder of and lecturer at Elderberry Data.

https://cz.linkedin.com/in/vladim%C3%ADr-l%C3%B6ffler-a84a9456
https://www.udemy.com/user/vladimir-loffler/
[email protected]
13. Acknowledgment
I would like to thank my colleagues (particularly Barbora Stetinova) for
their feedback and valuable advice and comments on the content of this
textbook. I would also like to thank my family, who supported me while I
was working on this book.
