Kettle Manual
01. Installing Kettle
You can download PDI from Sourceforge.net. At the time of this writing, the newest released version is 3.0.3, so the file you have to download is Kettle-3.0.3.GA-nnnn.zip.
Prerequisites
Kettle requires the Sun Java Runtime Environment (JRE) version 1.5 (also called 5.0 in some naming schemes) or newer. You can obtain a JRE for free from http://java.sun.com/.
Installation
Unless you download the Windows .exe installer (which needs no specific installation instructions), PDI does not require installation: simply unpack the zip file into a folder of your choice. On Unix-like operating systems, you will need to make the shell scripts executable by using the chmod command:
cd Kettle
chmod +x *.sh
02. Spoon Introduction
Spoon is the graphical tool with which you design and test every PDI process. The other PDI components execute the processes designed with
Spoon, and are executed from a terminal window.
Repository and files
In Spoon, you build Jobs and Transformations. PDI offers two methods to save them:
Database repository
Files
If you choose the repository method, the repository has to be created the first time you execute Spoon. If you choose the files method, the Jobs
are saved in files with the .kjb extension, and the Transformations in files with the .ktr extension. In this tutorial you'll work with the second method.
Starting Spoon
Start Spoon by executing spoon.bat on Windows, or spoon.sh on Unix-like operating systems. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button.
The next thing you'll see is a welcome window. Go to the Edit menu and click Options.... A window will come up that enables you to change various general and visual characteristics. If you change something, it will be necessary to restart Spoon in order to see the changes applied.
03. Hello World Example
Although this will be a simple example, it will introduce you to some of the fundamentals of PDI:
Working with the Spoon tool
Transformations
Steps and Hops
Predefined variables
Previewing and Executing from Spoon
Executing Transformations from a terminal window with the Pan tool.
Overview
Let's suppose that you have a CSV file containing a list of people, and want to create an XML file containing greetings for each of them.
If this were the content of your CSV file:
last_name, name
Suarez,Maria
Guimaraes,Joao
Rush,Jennifer
Ortiz,Camila
Rodriguez,Carmen
da Silva,Zoe
This would be the output in your XML file:
<Rows>
  <row>
    <msg>Hello, Maria!</msg>
  </row>
  <row>
    <msg>Hello, Joao!</msg>
  </row>
  <row>
    <msg>Hello, Jennifer!</msg>
  </row>
  <row>
    <msg>Hello, Camila!</msg>
  </row>
  <row>
    <msg>Hello, Carmen!</msg>
  </row>
  <row>
    <msg>Hello, Zoe!</msg>
  </row>
</Rows>
The creation of the file with greetings from the flat file will be the goal for your first Transformation.
A Transformation is made of Steps linked by Hops. These Steps and Hops form paths through which data flows. Therefore it's said that a Transformation is data-flow oriented.
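The whole pipeline can be sketched in a few lines of plain Python. This is only a conceptual illustration of what the Transformation will do (PDI does this with configured Steps, not hand-written code); the field and file names follow the tutorial:

```python
import csv
import io
from xml.sax.saxutils import escape

# Sample input, as in the tutorial's list.csv
CSV_TEXT = """last_name,name
Suarez,Maria
Guimaraes,Joao
Rush,Jennifer
"""

def read_csv(text):
    # "CSV file input" Step: each row becomes a dict of field name -> value
    return csv.DictReader(io.StringIO(text))

def build_greetings(rows):
    # "Modified JavaScript Value" Step: add a msg field to each row
    for row in rows:
        row["msg"] = "Hello, %s!" % row["name"]
        yield row

def to_xml(rows):
    # "XML Output" Step: keep only the msg field
    parts = ["<Rows>"]
    for row in rows:
        parts.append("  <row>")
        parts.append("    <msg>%s</msg>" % escape(row["msg"]))
        parts.append("  </row>")
    parts.append("</Rows>")
    return "\n".join(parts)

xml = to_xml(build_greetings(read_csv(CSV_TEXT)))
print(xml)
```

Each function stands in for one Step, and the chained calls play the role of the Hops between them.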
Preparing the environment
Before starting a Transformation, create a folder named Tutorial in the installation folder or some other convenient place. There you'll save all the files for this tutorial. Then create a CSV file like the one shown above, and save it in the Tutorial folder as list.csv.
Transformation walkthrough
The proposed task will be accomplished in three subtasks:
Creating the Transformation
Constructing the skeleton of the Transformation using Steps and Hops
Configuring the Steps in order to specify their behavior
Creating the Transformation
1. Click New, then select Transformation. Alternatively you can go to the File menu, then select New, then Transformation. You can also just press Ctrl-N.
2. In the View navigator, click Transformation 1, then click Settings. Or right-click the diagram and click Transformation Settings. Or use the Ctrl+T shortcut.
3. A window appears where you can specify Transformation properties. In this case, just write a name and a description, then click Save.
4. Save the Transformation in the Tutorial folder with the name hello. This will create a hello.ktr file.
Constructing the skeleton of the Transformation using Steps and Hops
A Step is the minimal unit inside a Transformation. A wide variety of Steps are available, grouped into categories like Input and Output, among others. Each Step is designed to accomplish a specific function, such as reading a parameter or normalizing a dataset.
A Hop is a graphical representation of data flowing between two Steps, with an origin and a destination. The data that flows through a Hop constitutes the Output Data of the origin Step, and the Input Data of the destination Step. A Hop has only one origin and one destination, but more than one Hop can leave a Step. When that happens, the Output Data can be copied or distributed to every destination. Likewise, more than one Hop can reach a Step. In those instances, the Step has to have the ability to merge the Input from the different Steps in order to create the Output.
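The copy-versus-distribute choice for multiple outgoing Hops can be pictured with a small sketch (hypothetical helper functions, not PDI's engine):

```python
from itertools import cycle

rows = [{"name": "Maria"}, {"name": "Joao"}, {"name": "Jennifer"}]

def copy_rows(rows, n_hops):
    # Copy: every destination Step receives the full row set
    return [list(rows) for _ in range(n_hops)]

def distribute_rows(rows, n_hops):
    # Distribute: rows are dealt round-robin across the outgoing Hops
    targets = [[] for _ in range(n_hops)]
    for target, row in zip(cycle(targets), rows):
        target.append(row)
    return targets

a, b = copy_rows(rows, 2)        # both destinations see all 3 rows
c, d = distribute_rows(rows, 2)  # destinations see 2 rows and 1 row
```

Copying duplicates the stream for each destination; distributing splits the workload, which is why the two modes give very different results for downstream Steps.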
A Transformation has to do the following:
Read the CSV file
Build the greetings
Save the greetings in the XML file
For each of these items you'll use a different Step, according to the next diagram:
In this example, the correspondence between tasks and Steps is one-to-one because the Transformation is very simple. It isn't always that way,
though.
Here's how to start the Transformation:
1. To the left of the workspace is the Steps Palette. Select the Input category.
2. Drag the CSV file input icon onto the workspace on the right.
3. Select the Scripting category.
4. Drag the Modified JavaScript Value icon to the workspace.
5. Select the Output category.
6. Drag the XML Output icon to the workspace.
Now you will link the CSV file input with the Modified JavaScript Value by creating a Hop:
1. Select the first Step.
2. Hold the Shift key and drag the icon onto the second Step.
3. Link the Modified JavaScript Value with the XML Output via this same process.
Specifying Step behavior
Every Step has a configuration window. These windows vary according to the functionality of the Steps and the category to which they belong. Two properties, however, are common to all of them: Step Name, a representative name inside the Transformation, and Step Description, which allows you to clarify the purpose of the Step.
Configuring the CSV file input Step
1. Double-click the CSV file input Step. The configuration window belonging to this kind of Step will appear. Here you'll indicate the location, format, and content of the input file.
2. Replace the default name with one that is more representative of this Step's function. In this case, type in name list.
3. In the Filename field, type the name and location of the input file.
      Note: Just to the right of the text box is a symbol with a red dollar sign. This means that you can use variables as well
      as plain text in that field. A variable can be written manually as ${name_of_the_variable} or selected from the variable
      window, which you can access by pressing Ctrl-Spacebar. This window shows both predefined and user-defined
      variables, but since you haven't created any variables yet, right now you'll only see the predefined ones. Among those,
      select:
${Internal.Transformation.Filename.Directory}
   After the name of the variable, type a slash and the name of the file you created:
${Internal.Transformation.Filename.Directory}/list.csv
   At runtime the variable will be replaced by its value, which will be the path where the Transformation was saved. The Transformation will search for the file list.csv in that location.
4. Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).
      Note: The Get Fields button is present in most Steps' configuration windows. Its purpose is to load a grid with data from
      external sources or previous Steps. Even when the fields can be written manually, this button gives you a shortcut when
      there are many available fields and you want to use all or almost all of them.
   The grid now has the names of the columns of your file, last_name and name, and should look like this:
5. Switch lazy conversion off.
6. Click Preview to ensure that the file will be read as expected. A window showing data from the file will appear.
7. Click OK to finish defining the CSV file input Step.
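The ${...} variable substitution used in the Filename field can be mimicked with a short sketch. This is a rough approximation, not Kettle's actual resolver (which also supports environment and nested variables); unknown variables are simply left untouched here:

```python
import re

def substitute(text, variables):
    # Replace each ${name} occurrence with its value from the dictionary;
    # names that are not defined stay as literal ${name} text
    def repl(match):
        return str(variables.get(match.group(1), match.group(0)))
    return re.sub(r"\$\{([^}]+)\}", repl, text)

# Assumed value: the folder where the Transformation was saved
env = {"Internal.Transformation.Filename.Directory": "/home/PentahoUser/Tutorial"}
path = substitute("${Internal.Transformation.Filename.Directory}/list.csv", env)
```

With the assumed directory above, `path` resolves to `/home/PentahoUser/Tutorial/list.csv`, which is exactly what PDI does at runtime.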
Configuring the Modified JavaScript Value Step
1. Double-click the Modified JavaScript Value Step. The Step configuration window will appear. This is different from the previous Step's window in that it allows you to write JavaScript code. You will use it to build the message "Hello, " concatenated with each of the names.
2. Name this Step Greetings.
3. The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two input fields: last_name and name. Write the following code:
var msg = 'Hello, ' + name.getString() + "!";
      Note: The text name.getString() can be written manually, or inserted by double-clicking it in the function tree.
4. At the bottom you can type any variable created in the code. In this case, you have created a variable named msg. Since you need to send this message to the output file, you have to write the variable name in the grid. This should be the result:
      Warning: Don't confuse these variables with PDI variables - they are not the same.
      Note: Modified is not an adjective for JavaScript, but for the Step. You are not dealing with a variant of JavaScript - it
      is the Step itself that is modified. It is an enhanced version of the original JavaScript Step found in previous versions of
      PDI.
5. Click OK to finish configuring the Modified JavaScript Value Step.
6. Select the Step you just configured. To check that the new field will leave this Step, you will now inspect the Input and Output Fields. Input Fields are the data columns that reach a Step. Output Fields are the data columns that leave a Step. There are Steps that simply transform the input data; in this case, the input and output fields are usually the same. There are Steps, however, that add fields to the Output - Calculator, for example. There are other Steps that filter or combine data, causing the Output to have fewer fields than the Input - Group by, for example.
7. Right-click the Step to bring up a context menu.
8. Select Show Input Fields. You'll see that the Input Fields are last_name and name, which come from the CSV file input Step.
9. Select Show Output Fields. You'll see that not only do you have the existing fields, but also the new msg field.
Configuring the XML Output Step
1. Double-click the XML Output Step. The configuration window for this kind of Step will appear. Here you're going to set the name and location of the output file, and establish which of the fields you want to include. You may include all or some of the fields that reach the Step.
2. Name the Step File with Greetings.
3. In the File box, write:
${Internal.Transformation.Filename.Directory}/Hello.xml
4. Click Get Fields to fill the grid with the three input fields. In the output file you only want to include the message, so delete name and last_name.
5. Save the Transformation again.
How does it work?
When you execute a Transformation, almost all Steps are executed simultaneously. The Transformation executes asynchronously; the rows of data flow through the Steps at their own pace. Each processed row flows to the next Step without waiting for the others. In real-world Transformations, forgetting this characteristic can be a significant source of unexpected results.
At this point, Hello World is almost completely configured. A Transformation reads the input file, then creates messages for each row via the
JavaScript code, and then the message is sent to the output file. This is a small example with very few rows of names, so it is difficult to notice the
asynchronous execution in action. Keep in mind, however, that it's possible that at the same time a name is being written in the output file,
another is leaving the first Step of the Transformation.
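This row-at-a-time behavior can be sketched with Python generators: each row passes through every stage as soon as it is produced, instead of each stage waiting for the whole file. (Conceptual only; real PDI Steps run in parallel threads.)

```python
trace = []  # records the order in which work happens

def source():
    # stands in for the CSV input: emit one row at a time
    for name in ["Maria", "Joao", "Jennifer"]:
        yield name

def greet(rows):
    # stands in for the JavaScript Step
    for name in rows:
        trace.append("greet:" + name)
        yield "Hello, %s!" % name

def sink(rows):
    # stands in for the XML output
    for msg in rows:
        trace.append("write:" + msg)

sink(greet(source()))
# trace interleaves greet/write per row (greet:Maria, write:Hello, Maria!,
# greet:Joao, ...) rather than greeting every row before writing any
```

The interleaved trace is the generator analogue of a name being written to the output file while another name is still leaving the first Step.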
Verify, preview and execute
1. Before executing the Transformation, check that everything is properly configured by clicking Verify. Spoon will verify that the Transformation is syntactically correct, and look for unreachable Steps and nonexistent connections. If everything is in order (it should be if you followed the instructions), you are ready to preview the output.
2. Select the JavaScript Step and then click the Preview button. The following window will appear:
3. As you can see, Spoon suggests that you preview the selected Step. Click QuickLaunch. After that, you will see a window with a sample of the output of the JavaScript Step. If the output is what you expected, you're ready to execute the Transformation.
4. Click Run. Spoon will show a window where you can set, among other information, the parameters for the execution and the logging level. Click Launch.
5. A new tab will appear in the main window. This is the log tab, which contains a log of the current execution.
The log tab has two sections: An upper part and a lower part.
In the upper side you can see the executed operations for each Step of the Transformation. In particular, pay attention to these:
Read: the number of rows coming from previous Steps.
Written: the number of rows leaving from this Step toward the next.
Input: the number of rows read from a file or table.
Output: the number of rows written to a file or table.
Errors: errors in the execution. If there are errors, the whole row will become red.
In the lower portion of the window, you will see the execution step by step. The detail will depend on the log level established. If you pay attention
to this detail, you will see the asynchronicity of the execution. The last line of the text will be:
Spoon - The transformation has finished!!
If there weren't error messages in the text, open the newly generated file Hello.xml and check its content.
Pan
Pan allows you to execute Transformations from a terminal window. The script is pan.bat on Windows, or pan.sh on other platforms, and it's located in the installation folder. If you run the script without any options, you'll see a description of the pan command with a list of available options.
To execute your Transformation, try the simplest command:
Pan /file <Jobs_path>/Hello.ktr /norep
/norep asks Pan not to connect to the repository.
/file precedes the name of the file that contains the Transformation.
<Jobs_path> is the full path to the Tutorial folder, for example:
C:/Pentaho/Tutorial
or
/home/PentahoUser/Tutorial
The other options take their default values.
After you enter this command, the Transformation will be executed in the same way it was inside Spoon. In this case, the log will be written to the terminal unless you specify a file to write to. The format of the log text will vary a little, but the information will be basically the same as what you saw in the graphical environment.
04. Refining Hello World
Now that the Transformation has been created and executed, the next task is enhancing it.
Overview
These are the improvements that you'll make to your existing Transformation:
You won't look for the input file in the same folder, but in a new one, independent of the folder where the Transformations are saved.
The name of the input file won't be fixed; the Transformation will receive it as a parameter.
You will validate the existence of the input file (exercise: execute the Transformation you created, giving it the name of a file that doesn't exist, and see what happens!).
The name of the output file will depend on the name of the input file.
Here's what happens:
Get the parameter
Check if the parameter is null; if it is, abort
Check if the file exists; if not, abort
Create the output file with greetings
This will be accomplished via a Job, which is a component made of Job Entries linked by Hops. These Entries and Hops are arranged according to the expected order of execution. Therefore it is said that a Job is flow-control oriented.
A Job Entry is a unit of execution inside a Job. Each Job Entry is designed to accomplish a specific function, ranging from verifying the existence of a table to sending an email.
From a Job it is possible to execute a Transformation or another Job, that is, Jobs and Transformations are also Job Entries.
A Hop is a graphical representation that identifies the sequence of execution between two Job Entries.
Although a Hop has only one origin and one destination, a particular Job Entry can be reached by more than one Hop, and more than one Hop can leave any particular Job Entry.
This is the process:
Getting the parameter will be handled by a new Transformation.
The parameter will be verified through the result of that Transformation, which determines the conditional execution of the next Entries.
The file's existence will be verified by a Job Entry.
The main task of the Job will be carried out by a variation of the Transformation you made in the first part of this tutorial.
Graphically it's represented like this:
Preparing the Environment
In this part of the tutorial, the input and output files will be in a new folder called Files - go ahead and create it now. Copy the list.csv file to this new directory.
In order to avoid writing the full path each time you need to reference the folder or the files, it makes sense to create a variable containing this information. To do this, edit the kettle.properties configuration file, located in the C:\Documents and Settings\<username>\.kettle folder on Windows XP/2000, the C:\Profiles\<username>\.kettle folder on Windows Vista, or the ~/.kettle directory on other platforms. Put this line at the end of the file, changing the path to the one specific to the Files directory you just created:
FILES=/home/PentahoUser/Files
Spoon reads this file when it starts, so for this change to take effect, you must restart Spoon.
Now you are ready to start. This process involves three stages:
Create the Transformation
Modify the Transformation
Build the Job
Creating the Transformation
Create a new Transformation the same way you did before. Name this Transformation get_file_name.
Drag the following Steps to the workspace, name them, and link them according to the diagram:
Get System Info (Input category)
Filter Rows (Flow category)
Abort (Flow category)
Set Variable (Job category)
Configure the Steps as explained below:
Configuring the Get System Info Step (Input category)
This Step captures information from sources outside the Transformation, like the system date or parameters entered in the command line. In this
case, you will use the Step to get the first and only parameter. The configuration window of this Step has a grid. In this grid, each row you fill will
become a new column containing system data.
1. Double-click the Step.
2. In the first cell, below the Name column, write my_file.
3. When you click the cell below Type, a window will show up with the available options. Select command line argument 1.
4. Click OK.
Configuring the Filter Rows Step (Flow category)
This Step divides the output in two based upon a condition. Rows for which the condition evaluates to true follow one path in the diagram; the others follow another.
1. Double-click the Step.
2. Write the condition: in Field, select my_file, and replace the = with IS NULL.
3. In the drop-down list next to Send 'true' data to Step, select Abort.
4. In the drop-down list next to Send 'false' data to Step, select Set Variable.
5. Click OK.
Now a NULL parameter will reach the Abort Step, and a NOT NULL parameter will reach the Set Variable Step.
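The routing that Filter Rows performs can be sketched like this (a hypothetical helper, not PDI code; the two output lists stand in for the Abort and Set Variable Steps):

```python
def filter_rows(rows, condition):
    # Split the stream in two: rows matching the condition go one way
    # ("true" -> Abort here), the rest go the other ("false" -> Set Variable)
    true_rows, false_rows = [], []
    for row in rows:
        (true_rows if condition(row) else false_rows).append(row)
    return true_rows, false_rows

rows = [{"my_file": None}, {"my_file": "list"}]
to_abort, to_set_variable = filter_rows(rows, lambda r: r["my_file"] is None)
```

The row with a NULL `my_file` ends up in `to_abort`, the row with a value in `to_set_variable`, mirroring the two Hops you just configured.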
Configuring the Abort Step (Flow category)
You don't have anything to configure in this Step. If a row of data reaches this Step, the Transformation aborts, then fails, and you will use that
result in the main Job.
Configuring the Set Variable Step (Job category)
This Step allows you to create variables and put the content of some of the input fields into them. The configuration window of the Step has a grid.
Each row in this grid is meant to hold a new variable.
Now you'll create a new variable to use later:
1. Double-click the Step.
2. Click Get Fields. The only existing field will appear: my_file. The default variable name is the name of the selected field in upper case: MY_FILE. Leave the default intact.
3. Click OK.
Execution
1. To test the Transformation, click Run.
2. Within the run dialog, you will find a grid titled Arguments on the bottom left. Delete whatever arguments are already there, and instead type list as the first argument value. This will be transferred to the Transformation as the command line argument.
3. Click Launch.
4. In the Logging pane, you'll see a message like this:
Set Variables.0 - Set variable MY_FILE to value [list]
5. Click Run again, and clear the value of the first argument. This time, when you hit Launch you'll see this:
Abort.0 - Row nr 1 causing abort : []
Abort.0 - Aborting after having seen 1 rows.
In the Step Metrics pane, you'll see the Abort Step line highlighted in red, which indicates that an error occurred and that the Transformation failed (as expected).
Modifying the Transformation
Now it's time to modify the Hello Transformation in order to match the names of the files to their corresponding parameters. If the command line argument to the Job were foo, this Transformation should read the file foo.csv and create the file foo_with_greetings.xml. It would also be helpful to add a filter to discard the empty rows in the input file.
1. Open the Transformation Hello.ktr.
2. Open the CSV file input Step configuration window.
3. Delete the content of the Filename text box, and press Ctrl-Spacebar to see the list of existing variables. You should see the FILES variable you added to kettle.properties. Select it and add the name of the variable you created in the previous Transformation. The text becomes:
${FILES}/${MY_FILE}.csv
4. Click OK.
5. Open the XML Output Step configuration window.
6. Replace the content of the Filename text box with this:
${FILES}/${MY_FILE}_with_greetings
7. Click Show Filename(s) to view the projected XML filename. It should replace the FILES variable with your Files directory and look like this (depending on the location specified for FILES):
/home/Pentaho/files/${MY_FILE}_with_greetings.xml
8. Click OK.
9. Drag a Filter Rows Step into the Transformation.
10. Drag the Filter Rows Step onto the Hop leaving CSV file input and reaching Modified JavaScript Value. When you see that the Hop line becomes emphasized (thicker), release the mouse button. You have now linked the new Step into the sequence of existing Steps.
11. Select name for the Field, and IS NOT NULL for the comparator.
12. Leave Send 'true' data to Step and Send 'false' data to Step blank. This way, only the rows that fulfill the condition (rows with non-null names) follow to the next Step.
13. Click OK.
14. Click Save As and name this Transformation Hello_with_parameters.
Executing the Transformation
To test the changes you made, you need to make sure that the MY_FILE variable exists and has a value. Because this Transformation is independent of the Transformation that creates the variable, in order to execute it, you'll have to create the variable manually.
1. In the Edit menu, click Set Environment Variables. A list of variables will appear.
2. At the bottom of the list, type in MY_FILE as the variable name; as the content, type the name of the file without its extension.
3. Click OK.
4. Click Run.
5. In the list of variables, you'll see the one you just created. Click Launch to execute the Transformation.
6. Lastly, verify the existence and content of the output file.
Building the main job
The last task in this part of the tutorial is the construction of the main Job:
1. Create the Job:
   a. Click New, then Job. The Job workspace, where you can drop Job Entries and Hops, will come up.
   b. Click Job, then Settings. A window in which you can specify some Job properties will come up. Type in a name and a description.
   c. Click Save, and save the Job in the Tutorial folder under the name Hello.
2. Build the skeleton of the Job with Job Entries and Hops. To the left of the workspace there is a palette of Job Entries. Now build the Job:
   a. Drag the following entries into the workspace: one General->Start entry, two General->Transformation entries, and one File Exists entry.
   b. Link them in the following order: Start, Transformation, File Exists, Transformation.
   c. Drag two General->Abort entries to the workspace. Link one of them to the first Transformation entry and the other to the File Exists entry. The newly created Hops will turn red.
3. Configure the first Transformation entry:
   a. Double-click the entry. The configuration window will come up.
   b. In the Transformation filename field, type the following:
${Internal.Job.Filename.Directory}/get_file_name.ktr
      This will work since Transformations and Jobs reside in the same folder.
   c. Click OK.
4. Configure the second of the two Transformation entries:
   a. Double-click the entry. The configuration window will come up.
   b. Type the name of the other Transformation in the Transformation filename field:
${Internal.Job.Filename.Directory}/Hello_with_parameters.ktr
   c. Click OK.
5. Configure the File Exists entry:
   a. Double-click the entry to bring up the configuration window.
   b. Put the complete path of the file whose existence you want to verify in the Filename field. The name is the same one that you wrote in the modified Hello Transformation:
${FILES}/${MY_FILE}.csv
      Note: Remember that the variable ${FILES} was defined in the kettle.properties file, and the variable
      ${MY_FILE} is created in the Job Entry that is going to be executed before this one.
6. Configure the Abort entry connected to the get_file_name Transformation entry:
   a. In the Message textbox, write: The file name argument is missing
7. Configure the Abort entry connected to the File Exists entry:
   a. In the Message textbox, write this text:
The file ${FILES}/${MY_FILE}.csv does not exist
      Note: At runtime, the tool will replace the variable names with their values, showing, for example: "The file
      c:/Pentaho/Files/list.csv does not exist". If you place your mouse pointer over the Message textbox, Spoon will
      display a tooltip showing the projected output.
Configuring the Hops
A Job Entry can be executed unconditionally (it is always executed), only when the previous Job Entry was successful, or only when the previous Job Entry failed. This is represented by different colors in the Hops: a black Hop indicates that the following Job Entry is always executed; a green Hop indicates that the following Job Entry is executed only if the previous Job Entry was successful; and a red Hop indicates that the following Job Entry is executed only if the previous Job Entry failed.
As a consequence of the order in which the Job Entries of your Job were created and linked, all of the Hops took the right color; that is, the entries will execute as you need:
The first Transformation entry will always be executed (the Hop that goes from Start toward this entry is black).
If the Transformation that gets the parameter doesn't find a parameter (that is, the Transformation failed), control goes through the red Hop toward the Abort Job entry.
If the Transformation is successful, control goes through the green Hop toward the File Exists entry.
If the file doesn't exist, that is, the verification of its existence fails, control goes through the red Hop toward the second Abort Job entry.
If the verification is successful, control goes through the green Hop toward the main Transformation entry.
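These hop evaluation rules can be sketched as a tiny pseudo-model of the Job (illustrative only, not PDI's engine; entry names follow this tutorial):

```python
def run_job(entries, hops):
    # entries: name -> callable returning True (success) or False (failure)
    # hops: (origin, destination, condition), condition being
    #       "always" (black), "success" (green), or "failure" (red)
    results = {"Start": True}
    executed = ["Start"]
    frontier = ["Start"]
    while frontier:
        origin = frontier.pop(0)
        for src, dst, cond in hops:
            if src != origin:
                continue
            if cond == "always" or \
               (cond == "success" and results[origin]) or \
               (cond == "failure" and not results[origin]):
                results[dst] = entries[dst]()
                executed.append(dst)
                frontier.append(dst)
    return executed

entries = {
    "get_file_name": lambda: False,  # simulate: no parameter was given
    "Abort": lambda: False,
    "File Exists": lambda: True,
    "Hello": lambda: True,
}
hops = [
    ("Start", "get_file_name", "always"),         # black hop
    ("get_file_name", "Abort", "failure"),        # red hop
    ("get_file_name", "File Exists", "success"),  # green hop
    ("File Exists", "Hello", "success"),          # green hop
]
run = run_job(entries, hops)  # only Start, get_file_name, Abort execute
```

With the simulated failure of get_file_name, control follows the red hop to Abort and the green path never runs, just as described above.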
If you wanted to change the condition for the execution of a Job Entry, the steps to follow would be:
1. Select the Hop that reaches the Job Entry.
2. Right-click it to bring up a context menu.
3. Click Evaluation, then select one of the three available conditions.
How it works
When you execute a Job, the execution is tied to the order of the Job Entries, the direction of the Hops, and the condition under which an entry is executed or not. The execution is sequential: a Job Entry cannot begin until all of the Job Entries that precede it have finished.
In real-world situations, a Job can solve problems related to the sequencing of tasks in Transformations. If you need a part of a Transformation to finish before another part begins, a solution could be to divide the Transformation into two independent Transformations and execute them from a Job, one after the other.
Executing the Job
To execute a Job, you first must supply a parameter. Because the only place where the parameter is used is in the get_file_name Transformation (after that, you only use the variable where the parameter is saved), write the parameter as follows:
1. Double-click the get_file_name Transformation entry.
2. The ensuing window has a grid named Arguments. In the first row, type list.
3. Click OK.
4. Click the Run button, or from the menu select Job->Run.
5. A window will appear with general information related to the execution of the Job.
6. Click Launch.
The execution results pane on the bottom should display the execution results.
Within the execution results pane, the Job Metrics tab shows the Job Entries of your Job. For each executed Job Entry, you'll see, among other
data, the result of the execution. The execution of the entries follows a sequence. As a result, if an entry fails, you won't see the entries that follow
because they never start.
In the Logging tab you can see the log detail, including the starting and ending time of the Job Entries. In particular, when an Entry is a
Transformation, the log corresponding to the transformation is also included.
The new file has been created when you see this at the end of the log text:
Spoon - Job has ended.
If the input file was list.csv, then the output file should be list_with_greetings.xml, located in the same folder. Find it and check its content.
Now change the parameter to a nonexistent file name and execute the Job again. You'll see that the Job aborts, and the log shows the following message (where <parameter> is the parameter you supplied):
Abort - The file <parameter> does not exist
Now try deleting the parameter and executing the Job one more time. In this case the Job aborts as well, and in the log you can see this
message, as expected:
Abort - The file name is missing
Kitchen
Kitchen is the tool used to execute Jobs from a terminal window. The script is kitchen.bat on Windows, and kitchen.sh on other platforms, and you'll find it in the installation folder. If you execute it without options, you'll see a description of the command with a list of the available options.
To execute the Job, try the simplest command:
kitchen /file <Jobs_path>/Hello.kjb <par> /norep
/norep asks Kitchen not to connect to the repository.
/file precedes the name of the file corresponding to the Job to be executed.
<Jobs_path> is the full path of the folder Tutorial, for example:
c:/Pentaho/Tutorial (Windows)
or
/home/PentahoUser/Tutorial
<par> is the parameter that the Job is waiting for. Remember that the expected parameter is the name of the input file, without the .csv extension.
The other options (e.g. the log level) take default values.
After you enter this command, the Job will be executed in the same way it was inside Spoon. In this case, the log will be written to the terminal unless you redirect it to a file. The format of the log text will vary a little, but the information will be basically the same as in the graphical environment.
Try to execute the Job without parameters, with an invalid parameter (a nonexistent file), and with a valid parameter, and verify that everything
works as expected. Also experiment with Kitchen, changing some of the options, such as log level.
Pentaho Data Integration (Kettle) Tutorial
Written by María Carina Roldán, Pentaho Community Member, BI consultant (Assert Solutions), Argentina.
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
Introduction
Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:
Migrating data between applications or databases
Exporting data from databases to flat files
Loading data massively into databases
Data cleansing
Integrating applications
PDI is easy to use. Every process is created with a graphical tool where you specify what to do without writing code to indicate how to do it; because of this, you could say that PDI is metadata oriented.
PDI can be used as a standalone application, or it can be used as part of the larger Pentaho Suite. As an ETL tool, it is the most popular open
source tool available. PDI supports a vast array of input and output formats, including text files, data sheets, and commercial and free database
engines. Moreover, the transformation capabilities of PDI allow you to manipulate data with very few limitations.
Through a simple "Hello World" example, this tutorial will show you how easy it is to work with PDI, and get you ready to make your own, more complex Transformations.