Kettle Manual
01. Installing Kettle
You can download PDI from Sourceforge.net. At the time of this writing, the newest released version is 3.0.3, so the file you have to download is Kettle-3.0.3.GA-nnnn.zip.
Prerequisites
Kettle requires the Sun Java Runtime Environment (JRE) version 1.5 (also called 5.0 in some naming schemes) or newer. You can obtain a JRE for free from http://java.sun.com/.
Installation
Unless you download the Windows .exe installer (which needs no specific installation instructions), PDI does not require installation: simply unpack the zip file into a folder of your choice. On Unix-like operating systems, you will need to make the shell scripts executable by using the chmod command:
cd Kettle
chmod +x *.sh
02. Spoon Introduction
Spoon is the graphical tool with which you design and test every PDI process. The other PDI components execute the processes designed with
Spoon, and are executed from a terminal window.
Repository and files
In Spoon, you build Jobs and Transformations. PDI offers two methods to save them:
Database repository
Files
If you choose the repository method, the repository has to be created the first time you execute Spoon. If you choose the files method, the Jobs
are saved in files with the .kjb extension, and the Transformations in files with the .ktr extension. In this tutorial you'll work with the second method.
Starting Spoon
Start Spoon by executing spoon.bat on Windows, or spoon.sh on Unix-like operating systems. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button.
The next thing you'll see is a welcome window. Go to the Edit menu and click Options.... A window will come up that enables you to change various general and visual characteristics. If you change something, it will be necessary to restart Spoon in order to see the changes applied.
03. Hello World Example
Although this will be a simple example, it will introduce you to some of the fundamentals of PDI:
Working with the Spoon tool
Transformations
Steps and Hops
Predefined variables
Previewing and Executing from Spoon
Executing Transformations from a terminal window with the Pan tool.
Overview
Let's suppose that you have a CSV file containing a list of people, and want to create an XML file containing greetings for each of them.
If this were the content of your CSV file:
last_name, name
Suarez,Maria
Guimaraes,Joao
Rush,Jennifer
Ortiz,Camila
Rodriguez,Carmen
da Silva,Zoe
This would be the output in your XML file:
<Rows>
  <row>
    <msg>Hello, Maria!</msg>
  </row>
  <row>
    <msg>Hello, Joao!</msg>
  </row>
  <row>
    <msg>Hello, Jennifer!</msg>
  </row>
  <row>
    <msg>Hello, Camila!</msg>
  </row>
  <row>
    <msg>Hello, Carmen!</msg>
  </row>
  <row>
    <msg>Hello, Zoe!</msg>
  </row>
</Rows>
The creation of the file with greetings from the flat file will be the goal for your first Transformation.
A Transformation is made of Steps linked by Hops. These Steps and Hops form paths through which data flows. Therefore it's said that a Transformation is data-flow oriented.
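The whole pipeline can be sketched in a few lines of plain Python. This is only a conceptual illustration of what the Transformation will do (PDI does this with configured Steps, not hand-written code); the field and file names follow the tutorial:

```python
import csv
import io
from xml.sax.saxutils import escape

# Sample input, as in the tutorial's list.csv
CSV_TEXT = """last_name,name
Suarez,Maria
Guimaraes,Joao
Rush,Jennifer
"""

def read_csv(text):
    # "CSV file input" Step: each row becomes a dict of field name -> value
    return csv.DictReader(io.StringIO(text))

def build_greetings(rows):
    # "Modified JavaScript Value" Step: add a msg field to each row
    for row in rows:
        row["msg"] = "Hello, %s!" % row["name"]
        yield row

def to_xml(rows):
    # "XML Output" Step: keep only the msg field
    parts = ["<Rows>"]
    for row in rows:
        parts.append("  <row>")
        parts.append("    <msg>%s</msg>" % escape(row["msg"]))
        parts.append("  </row>")
    parts.append("</Rows>")
    return "\n".join(parts)

xml = to_xml(build_greetings(read_csv(CSV_TEXT)))
print(xml)
```

Each function stands in for one Step, and the chained calls play the role of the Hops between them.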
Preparing the environment
Before starting a Transformation, create a folder named Tutorial in the installation folder or some other convenient place. There you'll save all the files for this tutorial. Then create a CSV file like the one shown above, and save it in the Tutorial folder as list.csv.
Transformation walkthrough
The proposed task will be accomplished in three subtasks:
Creating the Transformation
Constructing the skeleton of the Transformation using Steps and Hops
Configuring the Steps in order to specify their behavior
Creating the Transformation
1. Click New, then select Transformation. Alternatively you can go to the File menu, then select New, then Transformation. You can also just press Ctrl-N.
2. In the View navigator, click Transformation 1, then click Settings. Or right-click the diagram and click Transformation Settings. Or use the Ctrl+T shortcut.
3. A window appears where you can specify Transformation properties. In this case, just write a name and a description, then click Save.
4. Save the Transformation in the Tutorial folder with the name hello. This will create a hello.ktr file.
Constructing the skeleton of the Transformation using Steps and Hops
A Step is the minimal unit inside a Transformation. A wide variety of Steps are available, grouped into categories like Input and Output, among others. Each Step is designed to accomplish a specific function, such as reading a parameter or normalizing a dataset.
A Hop is a graphical representation of data flowing between two Steps, with an origin and a destination. The data that flows through a Hop constitutes the Output Data of the origin Step, and the Input Data of the destination Step. A Hop has only one origin and one destination, but more than one Hop can leave a Step. When that happens, the Output Data can be copied or distributed to every destination. Likewise, more than one Hop can reach a Step. In those instances, the Step has to have the ability to merge the Input from the different Steps in order to create the Output.
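The copy-versus-distribute choice for multiple outgoing Hops can be pictured with a small sketch (hypothetical helper functions, not PDI's engine):

```python
from itertools import cycle

rows = [{"name": "Maria"}, {"name": "Joao"}, {"name": "Jennifer"}]

def copy_rows(rows, n_hops):
    # Copy: every destination Step receives the full row set
    return [list(rows) for _ in range(n_hops)]

def distribute_rows(rows, n_hops):
    # Distribute: rows are dealt round-robin across the outgoing Hops
    targets = [[] for _ in range(n_hops)]
    for target, row in zip(cycle(targets), rows):
        target.append(row)
    return targets

a, b = copy_rows(rows, 2)        # both destinations see all 3 rows
c, d = distribute_rows(rows, 2)  # destinations see 2 rows and 1 row
```

Copying duplicates the stream for each destination; distributing splits the workload, which is why the two modes give very different results for downstream Steps.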
A Transformation has to do the following:
Read the CSV file
Build the greetings
Save the greetings in the XML file
For each of these items you'll use a different Step, according to the next diagram:
In this example, the correspondence between tasks and Steps is one-to-one because the Transformation is very simple. It isn't always that way,
though.
Here's how to start the Transformation:
1. To the left of the workspace is the Steps Palette. Select the Input category.
2. Drag the CSV file input icon onto the workspace on the right.
3. Select the Scripting category.
4. Drag the Modified JavaScript Value icon to the workspace.
5. Select the Output category.
6. Drag the XML Output icon to the workspace.
Now you will link the CSV file input with the Modified JavaScript Value by creating a Hop:
1. Select the first Step.
2. Hold the Shift key and drag the icon onto the second Step.
3. Link the Modified JavaScript Value with the XML Output via this same process.
Specifying Step behavior
Every Step has a configuration window. These windows vary according to the functionality of the Steps and the category to which they belong. Two properties, however, are common to all of them: Step Name, a representative name inside the Transformation, and Step Description, which allows you to clarify the purpose of the Step.
Configuring the CSV file input Step
1. Double-click the CSV file input Step. The configuration window belonging to this kind of Step will appear. Here you'll indicate the location, format, and content of the input file.
2. Replace the default name with one that is more representative of this Step's function. In this case, type in name list.
3. In the Filename field, type the name and location of the input file.
      Note: Just to the right of the text box is a symbol with a red dollar sign. This means that you can use variables as well
      as plain text in that field. A variable can be written manually as ${name_of_the_variable} or selected from the variable
      window, which you can access by pressing Ctrl-Spacebar. This window shows both predefined and user-defined
      variables, but since you haven't created any variables yet, right now you'll only see the predefined ones. Among those,
      select:
${Internal.Transformation.Filename.Directory}
   After the name of the variable, type a slash and the name of the file you created:
${Internal.Transformation.Filename.Directory}/list.csv
   At runtime the variable will be replaced by its value, which will be the path where the Transformation was saved. The Transformation will search for the file list.csv in that location.
4. Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).
      Note: The Get Fields button is present in most Steps' configuration windows. Its purpose is to load a grid with data from
      external sources or previous Steps. Even when the fields can be written manually, this button gives you a shortcut when
      there are many available fields and you want to use all or almost all of them.
   The grid now has the names of the columns of your file, last_name and name, and should look like this:
5. Switch lazy conversion off.
6. Click Preview to ensure that the file will be read as expected. A window showing data from the file will appear.
7. Click OK to finish defining the CSV file input Step.
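The ${...} variable substitution used in the Filename field can be mimicked with a short sketch. This is a rough approximation, not Kettle's actual resolver (which also supports environment and nested variables); unknown variables are simply left untouched here:

```python
import re

def substitute(text, variables):
    # Replace each ${name} occurrence with its value from the dictionary;
    # names that are not defined stay as literal ${name} text
    def repl(match):
        return str(variables.get(match.group(1), match.group(0)))
    return re.sub(r"\$\{([^}]+)\}", repl, text)

# Assumed value: the folder where the Transformation was saved
env = {"Internal.Transformation.Filename.Directory": "/home/PentahoUser/Tutorial"}
path = substitute("${Internal.Transformation.Filename.Directory}/list.csv", env)
```

With the assumed directory above, `path` resolves to `/home/PentahoUser/Tutorial/list.csv`, which is exactly what PDI does at runtime.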
Configuring the Modified JavaScript Value Step
1. Double-click the Modified JavaScript Value Step. The Step configuration window will appear. This is different from the previous Step's window in that it allows you to write JavaScript code. You will use it to build the message "Hello, " concatenated with each of the names.
2. Name this Step Greetings.
3. The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two input fields: last_name and name. Write the following code:
var msg = 'Hello, ' + name.getString() + "!";
      Note: The text name.getString() can be written manually, or inserted by double-clicking it in the function tree.
4. At the bottom you can type any variable created in the code. In this case, you have created a variable named msg. Since you need to send this message to the output file, you have to write the variable name in the grid. This should be the result:
      Warning: Don't confuse these variables with PDI variables - they are not the same.
      Note: Modified is not an adjective for JavaScript, but for the Step. You are not dealing with a variant of JavaScript - it
      is the Step itself that is modified. It is an enhanced version of the original JavaScript Step found in previous versions of
      PDI.
5. Click OK to finish configuring the Modified JavaScript Value Step.
6. Select the Step you just configured. To check that the new field will leave this Step, you will now inspect the Input and Output Fields. Input Fields are the data columns that reach a Step. Output Fields are the data columns that leave a Step. There are Steps that simply transform the input data; in this case, the input and output fields are usually the same. There are Steps, however, that add fields to the Output - Calculator, for example. There are other Steps that filter or combine data, causing the Output to have fewer fields than the Input - Group by, for example.
7. Right-click the Step to bring up a context menu.
8. Select Show Input Fields. You'll see that the Input Fields are last_name and name, which come from the CSV file input Step.
9. Select Show Output Fields. You'll see that not only do you have the existing fields, but also the new msg field.
Configuring the XML Output Step
1. Double-click the XML Output Step. The configuration window for this kind of Step will appear. Here you're going to set the name and location of the output file, and establish which of the fields you want to include. You may include all or some of the fields that reach the Step.
2. Name the Step File with Greetings.
3. In the File box, write:
${Internal.Transformation.Filename.Directory}/Hello.xml
4. Click Get Fields to fill the grid with the three input fields. In the output file you only want to include the message, so delete name and last_name.
5. Save the Transformation again.
How does it work?
When you execute a Transformation, almost all Steps are executed simultaneously. The Transformation executes asynchronously; the rows of data flow through the Steps at their own pace. Each processed row flows to the next Step without waiting for the others. In real-world Transformations, forgetting this characteristic can be a significant source of unexpected results.
At this point, Hello World is almost completely configured. A Transformation reads the input file, then creates messages for each row via the
JavaScript code, and then the message is sent to the output file. This is a small example with very few rows of names, so it is difficult to notice the
asynchronous execution in action. Keep in mind, however, that it's possible that at the same time a name is being written in the output file,
another is leaving the first Step of the Transformation.
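This row-at-a-time behavior can be sketched with Python generators: each row passes through every stage as soon as it is produced, instead of each stage waiting for the whole file. (Conceptual only; real PDI Steps run in parallel threads.)

```python
trace = []  # records the order in which work happens

def source():
    # stands in for the CSV input: emit one row at a time
    for name in ["Maria", "Joao", "Jennifer"]:
        yield name

def greet(rows):
    # stands in for the JavaScript Step
    for name in rows:
        trace.append("greet:" + name)
        yield "Hello, %s!" % name

def sink(rows):
    # stands in for the XML output
    for msg in rows:
        trace.append("write:" + msg)

sink(greet(source()))
# trace interleaves greet/write per row (greet:Maria, write:Hello, Maria!,
# greet:Joao, ...) rather than greeting every row before writing any
```

The interleaved trace is the generator analogue of a name being written to the output file while another name is still leaving the first Step.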
Verify, preview and execute
1. Before executing the Transformation, check that everything is properly configured by clicking Verify. Spoon will verify that the Transformation is syntactically correct, and look for unreachable Steps and nonexistent connections. If everything is in order (it should be if you followed the instructions), you are ready to preview the output.
2. Select the JavaScript Step and then click the Preview button. The following window will appear:
3. As you can see, Spoon suggests that you preview the selected Step. Click QuickLaunch. After that, you will see a window with a sample of the output of the JavaScript Step. If the output is what you expected, you're ready to execute the Transformation.
4. Click Run. Spoon will show a window where you can set, among other information, the parameters for the execution and the logging level. Click Launch.
5. A new tab will appear in the main window. This is the log tab, which contains a log of the current execution.
The log tab has two sections: An upper part and a lower part.
In the upper side you can see the executed operations for each Step of the Transformation. In particular, pay attention to these:
Read: the number of rows coming from previous Steps.
Written: the number of rows leaving from this Step toward the next.
Input: the number of rows read from a file or table.
Output: the number of rows written to a file or table.
Errors: errors in the execution. If there are errors, the whole row will become red.
In the lower portion of the window, you will see the execution step by step. The detail will depend on the log level established. If you pay attention
to this detail, you will see the asynchronicity of the execution. The last line of the text will be:
Spoon - The transformation has finished!!
If there weren't error messages in the text, open the newly generated file Hello.xml and check its content.
Pan
Pan allows you to execute Transformations from a terminal window. The script is pan.bat on Windows, or pan.sh on other platforms, and it's located in the installation folder. If you run the script without any options, you'll see a description of the pan command with a list of available options.
To execute your Transformation, try the simplest command:
Pan /file <Jobs_path>/Hello.ktr /norep
/norep asks Pan not to connect to the repository.
/file precedes the name of the file that contains the Transformation.
<Jobs_path> is the full path to the Tutorial folder, for example:
C:/Pentaho/Tutorial
or
/home/PentahoUser/Tutorial
The other options take their default values.
After you enter this command, the Transformation will be executed in the same way it was inside Spoon. In this case, the log will be written to the terminal unless you specify a file to write to. The format of the log text will vary a little, but the information will be basically the same as what you saw in the graphical environment.
04. Refining Hello World
Now that the Transformation has been created and executed, the next task is enhancing it.
Overview
These are the improvements that you'll make to your existing Transformation:
You won't look for the input file in the same folder, but in a new one, independent of the folder where the Transformations are saved.
The name of the input file won't be fixed; the Transformation will receive it as a parameter.
You will validate the existence of the input file (exercise: execute the Transformation you created, giving it the name of a file that doesn't exist, and see what happens!).
The name of the output file will depend on the name of the input file.
Here's what happens:
Get the parameter
Check if the parameter is null; if it is, abort
Check if the file exists; if not, abort
Create the output file with greetings
This will be accomplished via a Job, which is a component made of Job Entries linked by Hops. These Entries and Hops are arranged according to the expected order of execution. Therefore it is said that a Job is flow-control oriented.
A Job Entry is a unit of execution inside a Job. Each Job Entry is designed to accomplish a specific function, ranging from verifying the existence of a table to sending an email.
From a Job it is possible to execute a Transformation or another Job, that is, Jobs and Transformations are also Job Entries.
A Hop is a graphical representation that identifies the sequence of execution between two Job Entries.
Although a Hop has only one origin and one destination, a particular Job Entry can be reached by more than one Hop, and more than one Hop can leave any particular Job Entry.
This is the process:
Getting the parameter will be handled by a new Transformation.
The parameter will be verified through the result of that Transformation, which determines the conditional execution of the next Entries.
The file's existence will be verified by a Job Entry.
The main task of the Job will be carried out by a variation of the Transformation you made in the first part of this tutorial.
Graphically it's represented like this:
Preparing the Environment
In this part of the tutorial, the input and output files will be in a new folder called Files - go ahead and create it now. Copy the list.csv file to this new directory.
In order to avoid writing the full path each time you need to reference the folder or the files, it makes sense to create a variable containing this information. To do this, edit the kettle.properties configuration file, located in the C:\Documents and Settings\<username>\.kettle folder on Windows XP/2000, the C:\Profiles\<username>\.kettle folder on Windows Vista, or the ~/.kettle directory on other platforms. Put this line at the end of the file, changing the path to the one specific to the Files directory you just created:
FILES=/home/PentahoUser/Files
Spoon reads this file when it starts, so for this change to take effect, you must restart Spoon.
Now you are ready to start. This process involves three stages:
Create the Transformation
Modify the Transformation
Build the Job
Creating the Transformation
Create a new Transformation the same way you did before. Name this Transformation get_file_name.
Drag the following Steps to the workspace, name them, and link them according to the diagram:
Get System Info (Input category)
Filter Rows (Flow category)
Abort (Flow category)
Set Variable (Job category)
Configure the Steps as explained below:
Configuring the Get System Info Step (Input category)
This Step captures information from sources outside the Transformation, like the system date or parameters entered in the command line. In this
case, you will use the Step to get the first and only parameter. The configuration window of this Step has a grid. In this grid, each row you fill will
become a new column containing system data.
1. Double-click the Step.
2. In the first cell, below the Name column, write my_file.
3. When you click the cell below Type, a window will show up with the available options. Select command line argument 1.
4. Click OK.
Configuring the Filter Rows Step (Flow category)
This Step divides the output in two based upon a condition. Rows for which the condition evaluates to true follow one path in the diagram; the others follow another.
1. Double-click the Step.
2. Write the condition: in Field, select my_file, and replace the = with IS NULL.
3. In the drop-down list next to Send 'true' data to Step, select Abort.
4. In the drop-down list next to Send 'false' data to Step, select Set Variable.
5. Click OK.
Now a NULL parameter will reach the Abort Step, and a NOT NULL parameter will reach the Set Variable Step.
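The routing that Filter Rows performs can be sketched like this (a hypothetical helper, not PDI code; the two output lists stand in for the Abort and Set Variable Steps):

```python
def filter_rows(rows, condition):
    # Split the stream in two: rows matching the condition go one way
    # ("true" -> Abort here), the rest go the other ("false" -> Set Variable)
    true_rows, false_rows = [], []
    for row in rows:
        (true_rows if condition(row) else false_rows).append(row)
    return true_rows, false_rows

rows = [{"my_file": None}, {"my_file": "list"}]
to_abort, to_set_variable = filter_rows(rows, lambda r: r["my_file"] is None)
```

The row with a NULL `my_file` ends up in `to_abort`, the row with a value in `to_set_variable`, mirroring the two Hops you just configured.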
Configuring the Abort Step (Flow category)
You don't have anything to configure in this Step. If a row of data reaches this Step, the Transformation aborts, then fails, and you will use that
result in the main Job.
Configuring the Set Variable Step (Job category)
This Step allows you to create variables and put the content of some of the input fields into them. The configuration window of the Step has a grid.
Each row in this grid is meant to hold a new variable.
Now you'll create a new variable to use later:
1. Double-click the Step.
2. Click Get Fields. The only existing field will appear: my_file. The default variable name is the name of the selected field in upper case: MY_FILE. Leave the default intact.
3. Click OK.
Execution
1. To test the Transformation, click Run.
2. Within the run dialog, you will find a grid titled Arguments on the bottom left. Delete whatever arguments are already there, and instead type list as the first argument value. This will be transferred to the Transformation as the command line argument.
3. Click Launch.
4. In the Logging pane, you'll see a message like this:
Set Variables.0 - Set variable MY_FILE to value [list]
5. Click Run again, and clear the value of the first argument. This time, when you hit Launch you'll see this:
Abort.0 - Row nr 1 causing abort : []
Abort.0 - Aborting after having seen 1 rows.
In the Step Metrics pane, you'll see the Abort Step line highlighted in red, which indicates that an error occurred and that the Transformation failed (as expected).
Modifying the Transformation
Now it's time to modify the Hello Transformation in order to match the names of the files to their corresponding parameters. If the command line argument to the Job were foo, this Transformation should read the file foo.csv and create the file foo_with_greetings.xml. It would also be helpful to add a filter to discard the empty rows in the input file.
1. Open the Transformation Hello.ktr.
2. Open the CSV file input Step configuration window.
3. Delete the content of the Filename text box, and press Ctrl-Spacebar to see the list of existing variables. You should see the FILES variable you added to kettle.properties. Select it and add the name of the variable you created in the previous Transformation. The text becomes:
${FILES}/${MY_FILE}.csv
4. Click OK.
5. Open the XML Output Step configuration window.
6. Replace the content of the Filename text box with this:
${FILES}/${MY_FILE}_with_greetings
7. Click Show Filename(s) to view the projected XML filename. It should replace the FILES variable with your Files directory and look like this (depending on the location specified for FILES):
/home/Pentaho/files/${MY_FILE}_with_greetings.xml
8. Click OK.
9. Drag a Filter Rows Step into the Transformation.
10. Drag the Filter Rows Step onto the Hop leaving CSV file input and reaching Modified JavaScript Value. When you see that the Hop line becomes emphasized (thicker), release the mouse button. You have now linked the new Step into the sequence of existing Steps.
11. Select name for the Field, and IS NOT NULL for the comparator.
12. Leave Send 'true' data to Step and Send 'false' data to Step blank. This way, only the rows that fulfill the condition (rows with non-null names) follow to the next Step.
13. Click OK.
14. Click Save As and name this Transformation Hello_with_parameters.
Executing the Transformation
To test the changes you made, you need to make sure that the MY_FILE variable exists and has a value. Because this Transformation is independent of the Transformation that creates the variable, in order to execute it, you'll have to create the variable manually.
1. In the Edit menu, click Set Environment Variables. A list of variables will appear.
2. At the bottom of the list, type in MY_FILE as the variable name; as the content, type the name of the file without its extension.
3. Click OK.
4. Click Run.
5. In the list of variables, you'll see the one you just created. Click Launch to execute the Transformation.
6. Lastly, verify the existence and content of the output file.
Building the main job
The last task in this part of the tutorial is the construction of the main Job:
1. Create the Job:
   a. Click New, then Job. The Job workspace, where you can drop Job Entries and Hops, will come up.
   b. Click Job, then Settings. A window in which you can specify some Job properties will come up. Type in a name and a description.
   c. Click Save, and save the Job in the Tutorial folder under the name Hello.
2. Build the skeleton of the Job with Job Entries and Hops. To the left of the workspace there is a palette of Job Entries. Now build the Job:
   a. Drag the following entries into the workspace: one General->Start entry, two General->Transformation entries, and one File Exists entry.
   b. Link them in the following order: Start, Transformation, File Exists, Transformation.
   c. Drag two General->Abort entries to the workspace. Link one of them to the first Transformation entry and the other to the File Exists entry. The newly created Hops will turn red.
3. Configure the first Transformation entry:
   a. Double-click the entry. The configuration window will come up.
   b. In the Transformation filename field, type the following:
${Internal.Job.Filename.Directory}/get_file_name.ktr
      This will work since Transformations and Jobs reside in the same folder.
   c. Click OK.
4. Configure the second of the two Transformation entries:
   a. Double-click the entry. The configuration window will come up.
   b. Type the name of the other Transformation in the Transformation filename field:
${Internal.Job.Filename.Directory}/Hello_with_parameters.ktr
   c. Click OK.
5. Configure the File Exists entry:
   a. Double-click the entry to bring up the configuration window.
   b. Put the complete path of the file whose existence you want to verify in the Filename field. The name is the same one that you wrote in the modified Hello Transformation:
${FILES}/${MY_FILE}.csv
      Note: Remember that the variable ${FILES} was defined in the kettle.properties file, and the variable
      ${MY_FILE} is created in the Job Entry that is going to be executed before this one.
6. Configure the Abort entry connected to the get_file_name Transformation entry:
   a. In the Message textbox, write: The file name argument is missing
7. Configure the Abort entry connected to the File Exists entry:
   a. In the Message textbox, write this text:
The file ${FILES}/${MY_FILE}.csv does not exist
      Note: At runtime, the tool will replace the variable names with their values, showing, for example: "The file
      c:/Pentaho/Files/list.csv does not exist". If you place your mouse pointer over the Message textbox, Spoon will
      display a tooltip showing the projected output.
Configuring the Hops
A Job Entry can be executed unconditionally (it is always executed), only when the previous Job Entry was successful, or only when the previous Job Entry failed. This is represented by different colors in the Hops: a black Hop indicates that the following Job Entry is always executed; a green Hop indicates that the following Job Entry is executed only if the previous Job Entry was successful; and a red Hop indicates that the following Job Entry is executed only if the previous Job Entry failed.
As a consequence of the order in which the Job Entries of your Job were created and linked, all of the Hops took the right color; that is, the entries will execute as you need:
The first Transformation entry will always be executed (the Hop that goes from Start toward this entry is black).
If the Transformation that gets the parameter doesn't find a parameter (that is, the Transformation failed), control goes through the red Hop toward the Abort Job entry.
If the Transformation is successful, control goes through the green Hop toward the File Exists entry.
If the file doesn't exist, that is, the verification of its existence fails, control goes through the red Hop toward the second Abort Job entry.
If the verification is successful, control goes through the green Hop toward the main Transformation entry.
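These hop evaluation rules can be sketched as a tiny pseudo-model of the Job (illustrative only, not PDI's engine; entry names follow this tutorial):

```python
def run_job(entries, hops):
    # entries: name -> callable returning True (success) or False (failure)
    # hops: (origin, destination, condition), condition being
    #       "always" (black), "success" (green), or "failure" (red)
    results = {"Start": True}
    executed = ["Start"]
    frontier = ["Start"]
    while frontier:
        origin = frontier.pop(0)
        for src, dst, cond in hops:
            if src != origin:
                continue
            if cond == "always" or \
               (cond == "success" and results[origin]) or \
               (cond == "failure" and not results[origin]):
                results[dst] = entries[dst]()
                executed.append(dst)
                frontier.append(dst)
    return executed

entries = {
    "get_file_name": lambda: False,  # simulate: no parameter was given
    "Abort": lambda: False,
    "File Exists": lambda: True,
    "Hello": lambda: True,
}
hops = [
    ("Start", "get_file_name", "always"),         # black hop
    ("get_file_name", "Abort", "failure"),        # red hop
    ("get_file_name", "File Exists", "success"),  # green hop
    ("File Exists", "Hello", "success"),          # green hop
]
run = run_job(entries, hops)  # only Start, get_file_name, Abort execute
```

With the simulated failure of get_file_name, control follows the red hop to Abort and the green path never runs, just as described above.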
If you wanted to change the condition for the execution of a Job Entry, the steps to follow would be:
1. Select the Hop that reaches the Job Entry.
2. Right-click it to bring up a context menu.
3. Click Evaluation, then select one of the three available conditions.
How it works
When you execute a Job, the execution is tied to the order of the Job Entries, the direction of the Hops, and the condition under which an entry is executed or not. The execution is sequential: a Job Entry cannot begin until all of the Job Entries that precede it have finished.
In real-world situations, a Job can solve problems related to the sequencing of tasks in Transformations. If you need a part of a Transformation to finish before another part begins, a solution could be to divide the Transformation into two independent Transformations and execute them from a Job, one after the other.
Executing the Job
To execute a Job, you first must supply a parameter. Because the only place where the parameter is used is in the get_file_name Transformation (after that, you only use the variable where the parameter is saved), write the parameter as follows:
1. Double-click the get_file_name Transformation entry.
2. The ensuing window has a grid named Arguments. In the first row, type list.
3. Click OK.
4. Click the Run button, or from the menu select Job->Run.
5. A window will appear with general information related to the execution of the Job.
6. Click Launch.
The execution results pane on the bottom should display the execution results.
Within the execution results pane, the Job Metrics tab shows the Job Entries of your Job. For each executed Job Entry, you'll see, among other
data, the result of the execution. The execution of the entries follows a sequence. As a result, if an entry fails, you won't see the entries that follow
because they never start.
In the Logging tab you can see the log detail, including the starting and ending time of the Job Entries. In particular, when an Entry is a
Transformation, the log corresponding to the transformation is also included.
The new file has been created when you see this at the end of the log text:
Spoon - Job has ended.
If the input file was list.csv, then the output file should be list_with_greetings.xml, located in the same folder. Find it and check its content.
Now change the parameter to a nonexistent file name and execute the Job again. You'll see that the Job aborts, and the log shows the following message (where <parameter> is the parameter you supplied):
Abort - The file <parameter> does not exist
Now try deleting the parameter and executing the Job one more time. In this case the Job aborts as well, and in the log you can see this
message, as expected:
Abort - The file name is missing
Kitchen
Kitchen is the tool used to execute Jobs from a terminal window. The script is kitchen.bat on Windows, and kitchen.sh on other platforms, and you'll find it in the installation folder. If you execute it without options, you'll see a description of the command with a list of the available options.
To execute the Job, try the simplest command:
kitchen /file <Jobs_path>/Hello.kjb <par> /norep
/norep asks Kitchen not to connect to the repository.
/file precedes the name of the file corresponding to the Job to be executed.
<Jobs_path> is the full path of the folder Tutorial, for example:
c:/Pentaho/Tutorial (Windows)
or
/home/PentahoUser/Tutorial
<par> is the parameter that the Job is waiting for. Remember that the expected parameter is the name of the input file, without the .csv extension.
The other options (e.g. the log level) take default values.
After you enter this command, the Job will be executed in the same way it was inside Spoon. In this case, the log will be written to the terminal unless you redirect it to a file. The format of the log text will vary a little, but the information will be basically the same as in the graphical environment.
Try to execute the Job without parameters, with an invalid parameter (a nonexistent file), and with a valid parameter, and verify that everything
works as expected. Also experiment with Kitchen, changing some of the options, such as log level.
Pentaho Data Integration (Kettle) Tutorial
Written by María Carina Roldán, Pentaho Community Member, BI consultant (Assert Solutions), Argentina.
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
Introduction
Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:
Migrating data between applications or databases
Exporting data from databases to flat files
Loading data massively into databases
Data cleansing
Integrating applications
PDI is easy to use. Every process is created with a graphical tool where you specify what to do without writing code to indicate how to do it; because of this, you could say that PDI is metadata oriented.
PDI can be used as a standalone application, or it can be used as part of the larger Pentaho Suite. As an ETL tool, it is the most popular open
source tool available. PDI supports a vast array of input and output formats, including text files, data sheets, and commercial and free database
engines. Moreover, the transformation capabilities of PDI allow you to manipulate data with very few limitations.
Through a simple "Hello World" example, this tutorial will show you how easy it is to work with PDI, and get you ready to make your own, more complex Transformations.