Guided Tutorial for Pentaho Data Integration using Oracle

In the data integration exercise, you will use the Pentaho Data Integration tool to transform
two data sources and load data into an Oracle fact table. You will perform transformations to
parse date strings, combine fields, and perform validation checks. Before starting this tutorial,
you need to install necessary software, download data sources, and create tables used in the
tutorial.

1. Tutorial Prerequisites
Before starting this tutorial, you should download and install the server and client for either
Oracle or MySQL server. You can find details in Module 1 about Oracle installation. If you have
access to a remote Oracle server (perhaps through your employer), you do not need to install
the server software on your own machine.

You also need to install Pentaho Data Integration before starting this tutorial. After installing
Pentaho Data Integration, you need to install the Java Database Connectivity (JDBC) driver for
Oracle. Module 1 contains installation instructions about Pentaho Data Integration and JDBC
drivers. This tutorial demonstrates the community edition of the most recent stable version
(5.0.1) of Pentaho Data Integration.

After installing Pentaho Data Integration, you need to obtain the data sources used in the
tutorial from the class website.

o Excel file used in part 1 of the tutorial
o Access database used in part 2 of the tutorial

The tutorial uses the Store Sales data warehouse as depicted in Figure 1. Sales is the fact entity
type surrounded by 1-M relationships with dimension entity types, Item, Customer, Store, and
TimeDim. The schema design has a snowflake for the 1-M relationship from Division to Store. In
the table design, table names have been preceded with the prefix “SS” to avoid conflicts with
other tables. Thus, the fact table is SSSales, not Sales as shown in the ERD of Figure 1.

The class website contains documents for Oracle and MySQL. You need to create and populate
the tables using one of these documents. The Oracle document also contains a statement to
create a sequence object for the SSSales table.
29 July 2020 Guided Tutorial for Pentaho Data Integration using Oracle Page 2

Figure 1: Oracle Snowflake Schema for the Store Sales Data Warehouse

2. Creating your First Transformation


The Data Integration component of Spoon allows you to create transformations and jobs.
Transformations involve data flows such as reading from a source, transforming data and
loading it into a target location. Jobs coordinate transformations such as defining dependencies
among transformations and execution conditions such as, “Is my source file available?” or
“Does a table exist in my database?”

This exercise will step you through building your first transformation with Pentaho Data
Integration introducing common concepts along the way. Follow the instructions below to
create a new transformation.
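
As a rough mental model of the data flow described above, a transformation can be thought of as a chain of steps that pass rows to one another. The sketch below is not how Spoon is implemented; the function names and the derived Margin field are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]

def read_source(rows: Iterable[Row]) -> Iterator[Row]:
    """Input step: emit rows one at a time (stands in for an Excel or table input)."""
    for row in rows:
        yield dict(row)

def add_margin(rows: Iterable[Row]) -> Iterator[Row]:
    """Transform step: derive a new (hypothetical) field from existing ones."""
    for row in rows:
        row["Margin"] = row["SalesDollar"] - row["SalesCost"]
        yield row

def load_target(rows: Iterable[Row], target: List[Row]) -> int:
    """Output step: append rows to the target and report how many were written."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count

# Invented sample rows, not taken from the tutorial's data sources.
source = [
    {"SalesUnits": 2, "SalesDollar": 20.0, "SalesCost": 12.0},
    {"SalesUnits": 1, "SalesDollar": 9.5, "SalesCost": 6.0},
]
target: List[Row] = []

# Each nested call plays the role of a hop between two steps.
written = load_target(add_margin(read_source(source)), target)
```

Rows stream through the chain one at a time, which is close in spirit to how hops carry rows from one step to the next.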

1. After starting Pentaho Data Integration, you will see the opening window (Figure 2) and the
Spoon window (Figure 3).

2. Click (New) in the upper left corner of the Spoon window.

3. Select Transformation from the list of components (Figure 4) displayed after selecting the
New button.

Figure 2: Pentaho Data Integration Welcome Window



Figure 3: Spoon Opening Window

Figure 4: Spoon Transformation List



3. Load the first data source from Excel


Make sure that you have downloaded the Excel input file from the class website. You need to
know the location of this file in Step 4 below.

Step 1 – In the View tab, right click the new transformation 1 and select “settings…”

Step 2 – Set the Transformation name for the new transformation as: SSTORETEST and click OK.

Step 3 – Save the transformation by choosing File > Save. You will see the empty transformation
window in Spoon (Figure 5).

Figure 5: Empty Transformation Window

Step 4 – Create the Excel Input step:

o Under the Design tab, expand the Input node (Figure 6).

Figure 6: New Microsoft Excel Input Node

o Select and drag a Microsoft Excel Input step into the canvas on the right.
o Double Click on the Microsoft Excel Input step. The edit properties dialog box (Figure 7)
associated with the Microsoft Excel Input step appears. In this dialog box, you specify
the properties related to a particular step.

Figure 7: Files Window for Microsoft Excel Input Property Editing

o Set the name of the Excel Input step to SSExcelData and specify the Excel data source path in
the Files tab.
o In the tab named Files, click the “Browse…” button and locate the Excel file that you
downloaded from the class website. Then click “Add” to add the file to the selected
files area.
o In the tab named Sheets, click the “Get sheetname(s)…” button. An Enter List dialog
(Figure 8) appears for choosing sheets. Select Sheet 1 and press “>” to move it into the right
area. Click OK.
o In the tab named Fields, click “Get fields from header row…”. Change the
data types, lengths, and precisions to match the specification in Figure 9.

Figure 8: Sheet Specification Window

Figure 9: Fields Window for Microsoft Excel Input Property Editing

o Click OK at the bottom of the window. The input icon will change to the SSExcel icon
displayed in Figure 10.

Step 5 – In this part of the tutorial, you will add constraint checking for null values and
appropriate data types for the Excel data source.

o Add a Filter Rows step to your transformation. Under the Design tab, go to Flow >
Filter Rows (Figure 10).

Figure 10: Excel Input Node and Filter Node in Spoon

o Create a “hop” between the SSExcelSource (Excel file input) step and the Filter Rows
step. Hops describe the flow of data in your transformation. To create the
hop, click the SSExcelSource (Excel file input) step, then hold down the <SHIFT> key
and draw a line to the Filter Rows step (Figure 11).

Figure 11: Hop Connecting an Excel Input Node to a Filter Node

o Alternatively, you can draw hops by hovering over a step until the hover menu (Figure
12) appears. Drag the hop painter icon from the source step to your target step.

Figure 12: Hover Menu

o Double-click the Filter Rows step. The Filter Rows edit properties dialog box appears
(Figure 13).

Figure 13: Property Edit Window of Filter Node

o The Step Name field is Filter rows by default.


o Under The condition, click <field>. A dialog box that contains the fields you can use to
create your condition appears.
o In the Fields: dialog box (Figure 14) select SalesUnits and click OK.

Figure 14: Condition Fields Selection Window

o Click on the comparison operator (Figure 15) (set to = by default) and select the IS NOT
NULL function and click OK.

Figure 15: Comparison Operator List

o Click the button . A new condition row appears with null = [ ] as a default.
o Click on the expression and add a constraint for the next column, as you
did for “SalesUnits”.

o Click on UP. You will then see both conditions joined by AND.

o Click the button again. Another new condition row appears with null = [ ] as a
default.
o Keep repeating these steps for all fields.
o The final view of filter conditions is shown by Figure 16.

Figure 16: Filter Conditions Window



o Save your transformation.
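
The null checks configured in Step 5 can be pictured as follows. This is a minimal sketch, assuming rows are represented as dictionaries and that a missing value shows up as None; the field list is an illustrative subset of the columns named in this tutorial, and filter_rows is a hypothetical helper, not a Pentaho API.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[object]]

# Illustrative subset of the fields checked in the Filter Rows step.
REQUIRED_FIELDS = ("SalesUnits", "CustID", "StoreID", "ItemID")

def filter_rows(
    rows: Sequence[Row], required: Sequence[str] = REQUIRED_FIELDS
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (true, false) streams.

    A row goes to the "true" stream only if every required field IS NOT NULL,
    which mirrors conditions joined by AND.
    """
    passed: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        if all(row.get(field) is not None for field in required):
            passed.append(row)
        else:
            rejected.append(row)
    return passed, rejected

# Invented sample rows for illustration only.
sample = [
    {"SalesUnits": 3, "CustID": "C1", "StoreID": "S1", "ItemID": "I1"},
    {"SalesUnits": None, "CustID": "C2", "StoreID": "S1", "ItemID": "I2"},
]
true_rows, false_rows = filter_rows(sample)
```

Only the "true" stream continues to the next step, which is why the hop out of the filter asks you to choose a result.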

Step 6 – Create a step to sort the result of the Filter Rows step.

o Under the Design tab, expand the contents of the Transform node.
o Click and drag a Sort Rows step into your transformation; create a hop between the
Filter rows and Sort Rows steps. Select Result is TRUE in the filter results selection list
(Figure 17).

Figure 17: Filter Results Selection List

o Double-click the Sort Rows step to open its edit properties dialog box (Figure 18). Click
“Get Fields” to obtain the fields. Delete all fields except the Day, Month and Year
fields. Then click OK.

Figure 18: Property Edit Window of Sort Rows Node

4. Using a Database Connection to Look Up Columns from Oracle Tables


Pentaho Data Integration allows you to define connections to multiple databases provided by
multiple database vendors (MySQL, Oracle, Postgres, and many more). Pentaho Data
Integration ships with the most suitable JDBC drivers for supported databases and its primary
interface to databases is through JDBC. Vendors write a driver that matches the JDBC
specification and Pentaho Data Integration uses the driver. Unless you require extensive
debugging or have other needs, you won’t ever need to write your own database driver.

When you define a database connection, the connection information (username, password,
port number, and so on) is stored in the Pentaho Enterprise Repository and is available to other
users when they connect to the repository. If you are not using the Pentaho Enterprise
Repository, the database connection information is stored in the XML file associated with a
transformation or job.

Connections that are available for use with a transformation or job are listed under the Database
connections node in the View tab of Spoon.

There are several ways to define a new database connection:



o In Spoon, under View in the navigation tab, right-click Database connections and choose
New.
o In Spoon, under View in the navigation tab, right-click Database connections and choose
New Connection Wizard.
o In the Table input configuration box, click on New.

This part of the tutorial involves looking up the date from the SSTimeDim table to check the
validity of dates in the Excel data source. In addition, you will look up primary key columns from
other Oracle tables to ensure that the loaded data does not contain invalid foreign keys.

Step 1 – Access the SSTimeDim table from the Oracle database.

o Under the Design tab, expand the contents of the Input node.
o Click and drag a Table Input step into your transformation.
o Double-click the Table Input step to open its edit properties dialog box (Figure 19).
o Rename your Table Input step to SSTimeDim.

Figure 19: Property Edit Window of Table Input Node

o Click “New…” next to the connection field. You must create a connection to the
database. The Database connection dialog box appears.
o Before setting the connection information, you should first configure the JDBC driver
according to the instructions described in the installation procedure for Pentaho Data
Integration. You must also have created and populated the tables of the Store Sales data
warehouse.
o Provide the settings for connecting to the database as shown in Figure 20. You have two
options for connection details. If you created and populated the Store Sales tables under
the SYSTEM account, use the first connection details. If you created and populated the
Store Sales tables in an account you created (LocalUser1), use the second connection
details. Note that the host name and port are left blank in both connection details. The
Database Name is only partially shown in Figure 20. You must enter the full value, exactly
as shown in the connection details, into the Database Name field.

Figure 20: Database Connection Window for SYSTEM User

o Connection for Oracle Database Virtual Box Appliance. This connection requires that
you have PDI installed on the Oracle Virtual Box, not Windows. The predefined
connection in SQL Developer uses the privileged account, SYSTEM. You can use ORCL or
CDB1 as the service name in the connection string.
Connection Name: Oracle12cDB
Connection Type: Oracle
Host Name:

Database Name:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)
(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=ORCL)))
Port Number:
User Name: SYSTEM *** or other user name that you created. ***
Password: oracle *** You may have changed this default password for SYSTEM. ***
Access: Native (JDBC)
o Connection for local 12c server using SYSTEM account and SID of ORCL.
Connection Name: Oracle12cDB
Connection Type: Oracle
Host Name:
Database Name:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)
(PORT=1521)))(CONNECT_DATA=(SID=ORCL)))
Port Number:
User Name: SYSTEM
Password: *** use the administrative password that you gave during installation ***
Access: Native (JDBC)
o Alternative connection using a local user and service name of PDBORCL. Note that
PDBORCL must be open and the local user must have been previously created. You
should see instructions in the document about making Oracle connections in the
software installations lesson in module 1.
Connection Name: Oracle12cDB
Connection Type: Oracle
Host Name:
Database Name:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)
(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=PDBORCL)))
Port Number:
User Name: LocalUser1 *** or other user name that you created. ***
Password: *** use the password that you gave for LocalUser1 or other user that you
created ***
Access: Native (JDBC)
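
Because the Database Name value must be typed exactly, it can help to see how the three values above are put together. The sketch below assembles the same connect descriptor text from its parts; oracle_descriptor is a hypothetical helper written for this illustration and is not part of Pentaho or Oracle software.

```python
from typing import Optional

def oracle_descriptor(
    host: str,
    port: int,
    *,
    service_name: Optional[str] = None,
    sid: Optional[str] = None,
) -> str:
    """Build a connect descriptor in the form used for the Database Name field.

    Exactly one of service_name or sid must be given, matching the
    SERVICE_NAME=... and SID=... variants shown in the connection details.
    """
    if (service_name is None) == (sid is None):
        raise ValueError("give exactly one of service_name or sid")
    connect_data = f"SERVICE_NAME={service_name}" if service_name else f"SID={sid}"
    return (
        "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)"
        f"(HOST={host})(PORT={port})))"
        f"(CONNECT_DATA=({connect_data})))"
    )

by_service = oracle_descriptor("localhost", 1521, service_name="PDBORCL")
by_sid = oracle_descriptor("localhost", 1521, sid="ORCL")
```

The value is entered in the dialog as one continuous string; the line break in the listings above comes only from page width.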

o Click “Test” to test the connection. A successful test result is shown in Figure 21.

Figure 21: Database Connection Test

o Type “SELECT * FROM SSTimeDim” in the SQL section (Figure 22). You can click the
Preview button to view the rows returned. Click OK to close the Table Input dialog
box.

Figure 22: SQL Edit Section in Property Window of Table Input Node

o Add another Sort Rows step, Sort rows 2, and a hop connecting it to the SSTimeDim
step. In the field specification (Figure 23), delete all fields except the TIMEDAY,
TIMEMONTH, and TIMEYEAR fields.

Figure 23: Property Edit Window of Sort Rows 2 Node

o Under the Design tab, expand the contents of the Joins node.
o Click and drag a Merge Join step into your transformation; create hops from
Sort rows and Sort rows 2 to the Merge Join step (Figure 24).

Figure 24: Two Sort Rows Nodes Connected to Merge Join Node

o Double-click the Merge Join step to specify its properties (Figure 25). Set First step to
Sort rows, Second step to Sort rows 2, and Join Type to INNER. Click both “Get
key fields” buttons, left and right, to list the possible join fields. In the left table, delete
all fields except the Day, Month and Year fields. In the right table, delete all fields
except the TIMEDAY, TIMEMONTH, and TIMEYEAR fields. Then click OK.

Figure 25: Property Edit Window of Merge Join Node

o You have now finished the inner join between the Excel input and the SSTimeDim table.
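
The reason for the two Sort Rows steps is that a merge join walks both inputs in key order. The following is a minimal sketch of that idea in plain Python, assuming dictionary rows; it is not Pentaho's implementation, and the sample values are invented.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]

def key_of(row: Row, fields: Sequence[str]) -> Tuple[int, ...]:
    """Extract the join key of a row as a tuple."""
    return tuple(row[f] for f in fields)

def sort_rows(rows: Sequence[Row], fields: Sequence[str]) -> List[Row]:
    """Stand-in for a Sort Rows step."""
    return sorted(rows, key=lambda r: key_of(r, fields))

def merge_join_inner(
    left: List[Row], right: List[Row],
    left_keys: Sequence[str], right_keys: Sequence[str],
) -> List[Row]:
    """Inner join of two inputs that are already sorted on their keys."""
    result: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key_of(left[i], left_keys), key_of(right[j], right_keys)
        if lk < rk:
            i += 1          # left row has no partner, skip it
        elif lk > rk:
            j += 1          # right row has no partner, skip it
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key_of(right[j_end], right_keys) == lk:
                j_end += 1
            # Pair every left row with that key against the run.
            while i < len(left) and key_of(left[i], left_keys) == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result

LEFT_KEYS = ("Day", "Month", "Year")
RIGHT_KEYS = ("TIMEDAY", "TIMEMONTH", "TIMEYEAR")

# Invented sample rows; the second sales row has a date not in the time dimension.
sales = [
    {"Day": 31, "Month": 2, "Year": 2013, "SalesUnits": 1},
    {"Day": 15, "Month": 1, "Year": 2013, "SalesUnits": 3},
]
time_dim = [
    {"TIMENO": 1, "TIMEDAY": 15, "TIMEMONTH": 1, "TIMEYEAR": 2013},
    {"TIMENO": 2, "TIMEDAY": 16, "TIMEMONTH": 1, "TIMEYEAR": 2013},
]
joined = merge_join_inner(
    sort_rows(sales, LEFT_KEYS), sort_rows(time_dim, RIGHT_KEYS),
    LEFT_KEYS, RIGHT_KEYS,
)
```

Because the join type is INNER, a sales row whose date has no match in SSTimeDim simply drops out, which is how the date validity check works.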

Step 2 – Inner join the SSItem, SSCustomer, and SSStore tables.

Similar to getting data from the SSTimeDim table in the previous section, inner joining these
tables requires Table Input components. First, we set the connection and query properties for
the SSItem table. Note that these tables should exist in your Oracle schema before these steps.

o Drag and drop another Table Input step (Table Input 2) into the design pane.
o Double-click the newly created step to open its edit properties dialog box. Specify
the connection as shown in the previous figure.
o Use “SSItem” as the step name and “SELECT * FROM SSItem” as the SQL query.
o Create two Sort Rows steps, Sort rows 3 and Sort rows 4, connected to Merge Join
and SSItem respectively. Set the fields to be sorted to ItemID and ITEMID respectively.
o Drag and drop another Merge Join step (Merge Join 2) into the design pane. Connect Sort rows 3 and Sort
rows 4 to Merge Join 2. Set the fields to be joined to ItemID and ITEMID.
o The global view of all nodes and connections after Step 2 is shown in Figure 26.

Figure 26: Global View of All Nodes and Connections after Step 2

Step 3 – Inner join the tables.

o Inner join the tables named SSCustomer and SSStore in your transformation using the
same method described previously.
o For the SSCustomer step, connect the CustID (from Excel file) and CUSTID (from
Database) fields.
o For the SSStore step, connect the StoreID (from Excel file) and STOREID (from Database)
fields.
o The global view of all nodes and connections after Step 3 is shown by Figure 27.

Figure 27: Global View of All Nodes and Connections after Step 3
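
The effect of Steps 2 and 3 is that a row survives only if its ItemID, CustID, and StoreID all exist in the corresponding Oracle tables. The sketch below shows that outcome with set lookups instead of sorted merge joins, which is a different technique chosen here for brevity; it gives the same result only because the looked-up columns are primary keys (unique). The helper name and the sample key values are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]

def keep_rows_with_valid_keys(
    rows: Sequence[Row], lookups: Dict[str, Set[object]]
) -> Tuple[List[Row], List[Row]]:
    """Split rows into those whose foreign keys all exist and those that do not."""
    valid: List[Row] = []
    invalid: List[Row] = []
    for row in rows:
        if all(row[field] in keys for field, keys in lookups.items()):
            valid.append(row)
        else:
            invalid.append(row)
    return valid, invalid

# Invented key values standing in for the SSItem, SSCustomer and SSStore tables.
lookups = {
    "ItemID": {"I1", "I2"},
    "CustID": {"C1"},
    "StoreID": {"S1", "S2"},
}
rows = [
    {"ItemID": "I1", "CustID": "C1", "StoreID": "S2", "SalesUnits": 4},
    {"ItemID": "I9", "CustID": "C1", "StoreID": "S1", "SalesUnits": 2},
]
valid_rows, invalid_rows = keep_rows_with_valid_keys(rows, lookups)
```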

Step 4 – Create and connect an Add Sequence step to generate values for the SalesNo column.

o Under the Design tab, expand the contents of the Transform node.
o Click and drag an Add sequence step into your transformation; create a hop between
the Merge Join 4 and Add sequence steps (Figure 28).
o Double-click the newly created step to open its edit properties dialog box.
o Set SalesNo as the Name of value. Check the box Use DB to get sequence. Select
Oracle12cDB as the connection. Set SSSalesNoSeq as the sequence name (Figure 29).

Figure 28: Global View of All Nodes and Connections after Step 4

Figure 29: Property Edit Window of Add sequence node
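
Conceptually, the Add sequence step appends one new field to every row. In this tutorial the values come from the Oracle sequence SSSalesNoSeq; the sketch below uses a local counter as a stand-in so that it runs without a database, and the starting value 97 is an assumption made purely for the example.

```python
import itertools
from typing import Dict, List, Sequence

Row = Dict[str, object]

def add_sequence(
    rows: Sequence[Row], field: str, start: int = 1, increment: int = 1
) -> List[Row]:
    """Return copies of the rows with a new field holding consecutive values.

    The local counter stands in for fetching the next value of a database sequence.
    """
    counter = itertools.count(start, increment)
    return [{**row, field: next(counter)} for row in rows]

# Invented sample rows.
incoming = [{"SalesUnits": 3}, {"SalesUnits": 1}, {"SalesUnits": 7}]
numbered = add_sequence(incoming, "SalesNo", start=97)
```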

5. Insert data into the SSSales table


o Under the Design tab, expand the contents of the Output node.
o Click and drag an Insert/Update step into your transformation; create a hop between
the Add sequence and Insert/Update steps.

Figure 30 shows the Insert/Update node (SSSales) connected to Add sequence Node.

Figure 30: Connect Insert/Update Node to Last Merge Join Node

o Double-click the Insert/Update step to specify its properties (Figure 31). Set the
step name to SSSales. Select Oracle12cDB as the connection. Type SSSales as the
Target table. DON’T click the “Get fields” button. Instead, select the names from the two
table fields and set the comparator between them to “=”. The final window should look
like Figure 31.

Figure 31: Property Edit Window of Insert/Update Node



o Click the “Get Updated fields” button and then click the “Edit mapping” button to edit
the mapping. The mapping edit window is shown in Figure 32. Select the fields named
SalesUnits, SalesDollar, SaleCost, CustID, StoreID, ItemID, TIMENO and SalesNo into the
mappings field. Pentaho will automatically match the corresponding name in the Target
field. Only the SalesNo field has to be manually matched with the SALESNO field. Then click OK.

Figure 32: Mapping Edit Window
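
The behavior of an Insert/Update step can be summarized as: look up each incoming row by its key; insert it if no match exists, otherwise update the mapped fields if they differ. The sketch below models the target table as a dictionary. It assumes SalesNo is the lookup key (the key fields appear only in Figure 31, so this is an assumption), and insert_update is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

def insert_update(
    table: Dict[object, Row],
    rows: Sequence[Row],
    key_field: str,
    update_fields: Sequence[str],
) -> Tuple[int, int]:
    """Apply rows to an in-memory table; return (inserted, updated) counts."""
    inserted = updated = 0
    for row in rows:
        key = row[key_field]
        if key not in table:
            # No matching key: insert a new row.
            table[key] = {f: row[f] for f in (key_field, *update_fields)}
            inserted += 1
        elif any(table[key].get(f) != row[f] for f in update_fields):
            # Key exists and at least one mapped field differs: update it.
            table[key].update({f: row[f] for f in update_fields})
            updated += 1
        # Key exists and nothing differs: leave the row alone.
    return inserted, updated

# Invented sample data.
ss_sales: Dict[object, Row] = {1: {"SalesNo": 1, "SalesUnits": 5}}
incoming = [
    {"SalesNo": 1, "SalesUnits": 6},   # existing key, changed value
    {"SalesNo": 2, "SalesUnits": 3},   # new key
]
counts = insert_update(ss_sales, incoming, "SalesNo", ["SalesUnits"])
```

Since every SalesNo in this tutorial is freshly generated by the sequence, each incoming row is expected to take the insert path.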

o Select the SSSales step and run a preview by clicking on . In the transformation debug
dialog click on Quick Launch (Figure 33).

Figure 33: Transformation Debug Dialog

o The Examine preview data window is shown in Figure 34.

Figure 34: Execution Report Window

o Connect to your Oracle account (on your PC or remote server) so you can verify the
number of rows in the SSSales table. You should see 104 rows with 8 new rows added to
the 96 rows in the sample data (Figure 35).

Figure 35: Inserted Data in Oracle Database

o If you do not see the extra rows, the Oracle output component had a failure. To see the
error, check the Execution Results section.
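
One way to verify the load is a row count before and after running the transformation. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle so that it runs anywhere; the two-column table and its placeholder rows are assumptions for illustration, but the SELECT COUNT(*) query is the kind of statement you would run against SSSales in your Oracle account.

```python
import sqlite3

# In-memory stand-in for the Oracle schema; only two columns are modeled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SSSales (SalesNo INTEGER PRIMARY KEY, SalesUnits INTEGER)")

# Placeholder rows standing in for the 96 rows of sample data.
conn.executemany(
    "INSERT INTO SSSales (SalesNo, SalesUnits) VALUES (?, ?)",
    [(n, 1) for n in range(1, 97)],
)
before = conn.execute("SELECT COUNT(*) FROM SSSales").fetchone()[0]

# Placeholder rows standing in for the 8 rows loaded by the transformation.
conn.executemany(
    "INSERT INTO SSSales (SalesNo, SalesUnits) VALUES (?, ?)",
    [(n, 1) for n in range(97, 105)],
)
after = conn.execute("SELECT COUNT(*) FROM SSSales").fetchone()[0]
conn.close()
```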

6. Load second data source from Access


The next part of the exercise involves creation of a new transformation to process the Access
data source. Make sure that you have downloaded the Access database file from the class
website and noted its location on your computer. You will begin by loading the data from a
table in this database.

Step 1 – Add the Access Input step.

o Under the Design tab, expand the Input node. Figure 36 shows the Design tab and
Input node.

Figure 36: New Microsoft Access Input Node

o Select and drag a Microsoft Access Input step onto the canvas on the right.
o Double-click the Microsoft Access Input step. The edit properties dialog box associated
with the Microsoft Access Input step appears (Figure 37). In this dialog box, you specify
the properties related to a particular step.

Figure 37: Property Edit Window of Microsoft Access Input Node

o Set the name of the Access Input step to Sales and specify the Access data source path in the Files
tab.
o In the tab named Content, click the “Get tables” button in the table section. A window
appears (Figure 38). Select Sales as the table name and click OK.

Figure 38: Table Selection Window



o In the tab named Fields, click the “Get fields” button. A list appears (Figure 39)
showing the fields in the table named Sales.

Figure 39: Fields Window for Microsoft Access Input Property Editing

o Click the “Preview rows” button to preview the data (Figure 40). When asked for
the number of rows, type 12 and click OK.

Figure 40: Examine Preview Data Window



o Click OK at the bottom of the window. The input icon will change to the shape shown by
Figure 41.

Figure 41: Sales Node Icon

Step 2 – You will add constraint checking for null values using the Filter Rows step.

o Add a Filter Rows step to your transformation. Under the Design tab, go to Flow >
Filter Rows (Figure 42).

Figure 42: Access Input Node and Filter Node in Spoon

o Create a hop between the Sales (Access file input) step and the Filter Rows step. Hops
describe the flow of data in your transformation. To create the hop, click the
Sales (Access file input) step, then hold down the <SHIFT> key and draw a line to the
Filter Rows step.
o Alternatively, you can draw hops by hovering over a step until the hover menu appears.
Drag the hop painter icon from the source step to your target step.
o Double-click the Filter Rows step. The Filter Rows edit properties dialog box appears.
o In the Step Name field, type Filter rows.
o Under The condition, click <field>. A dialog box that contains the fields you can use to
create your condition appears.
o In the Fields: dialog box select SalesUnits and click OK.
o Click on the comparison operator (set to = by default) and select the IS NOT NULL
function and click OK.

o Click the button and add constraints for the other columns (Figure 43).

Figure 43: Filter Conditions Window



o Save your transformation.


o Important: the Send “true” data to step and Send “false” data to step settings
should be specified as the same step before executing the whole transformation.

7. Separate SalesDay fields into Day, Month, Year fields


In this part of the tutorial, you will use the Select Values step to change the format of the
myDate field and the Split Fields step to parse the field into date components.

o Under the Design tab, expand the contents of the Transform node.
o Click and drag a Select values step into your transformation.
o Create a “hop” between the Filter rows step and the Select values step (Figure 44).

Figure 44: True Filter Results Connected to Select Values Node

o Double-click the Select values step to open its edit properties dialog box.
o In the tab named Meta-data, click the “Get fields to change” button to list the fields
(Figure 45). Change the Type of the myDate field to String and its Format
to dd-MM-yyyy. Click OK.

Figure 45: Meta-data Tab of Select Values Property Edit Window

o Under the Design tab, expand the contents of the Transform node.
o Click and drag a Split fields step into your transformation (Figure 46).

Figure 46: Create Split Fields in Spoon

o Create a “hop” between the Select values step and the Split fields step.
o Double-click the Split fields step to open its edit properties dialog box (Figure 47).
o Select myDate as the Field to split and type “-” as the Delimiter. Type Year, Month and
Day in the column named New field, and set their Type to Number.

Figure 47: Property Edit Window of Field Splitter Node

o Click OK.
o Click the preview icon to preview this transformation (Figure 48). Make sure that the Split Fields step is
selected in the left side panel of the transformation debug dialog and click the “Quick
Launch” button.

Figure 48: Examine Preview Data Window
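
The two steps in this section amount to formatting the date as text and then cutting the text at each delimiter. The sketch below shows this with Python's datetime module, where "%d-%m-%Y" is the counterpart of the dd-MM-yyyy format; split_date is a hypothetical helper and the sample date is invented. Note that the pieces are assigned to the new field names by position, so in this sketch the names are listed in the same order as the components of the format string.

```python
from datetime import date
from typing import Dict, Sequence

def split_date(
    value: date,
    fmt: str = "%d-%m-%Y",
    delimiter: str = "-",
    names: Sequence[str] = ("Day", "Month", "Year"),
) -> Dict[str, int]:
    """Format a date as text, split it on the delimiter, and name the pieces."""
    text = value.strftime(fmt)            # e.g. "15-01-2013"
    parts = text.split(delimiter)
    if len(parts) != len(names):
        raise ValueError("number of pieces does not match number of field names")
    # Pieces are matched to names by position and converted to numbers.
    return {name: int(part) for name, part in zip(names, parts)}

fields = split_date(date(2013, 1, 15))
```

When previewing the Split fields step, it is worth checking that each new field holds the component you expect.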

8. Look Up Columns from the Oracle Tables


This part of the exercise involves looking up the date from the SSTimeDim table to check the
validity of dates in the Access data source. In addition, you will look up primary key columns
from other Oracle tables to ensure that the loaded data does not contain invalid foreign keys. This part
of the exercise is similar to Section 4.

Step 1 – Access the SSTimeDim table from the Oracle database.

o Under the Design tab, expand the contents of the Input node.
o Click and drag a Table Input step into your transformation.
o Double-click the Table Input step to open its edit properties dialog box.
o Rename your Table Input step to SSTimeDim.
o Click “New” next to the connection field. You must create a connection to the database.
The Database connection dialog box appears.
o Provide the settings for connecting to the database as shown in Figure 20.
o Connection Name: Oracle12cDB
Connection Type: Oracle
Host Name:
Database Name: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
(HOST=132.194.167.74)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=portdb2.ucdenver.pvt)))
Port Number:
Access: Native (JDBC)

Replace the IP address, port number, and service name with the values for your own server. Also,
use your assigned user name and password. Do not use ISMG6480ClassStudent as the
user name.
o Click “Test” to test the connection.
o Type “SELECT * FROM SSTimeDim” in the SQL section. You can click the Preview button to
view the rows returned. Click OK to close the Table Input dialog box.
o Under the Design tab, expand the contents of the Transform node.
o Click and drag a Sort Rows step into your transformation; create a hop between the
Split fields and Sort Rows steps.
o Double-click the Sort Rows step to open its edit properties dialog box. Click “Get fields”
to obtain the fields. Delete all fields except the Day, Month and Year fields. Then click
OK.
o Add one more Sort Rows step, Sort rows 2, and a hop connecting it to the SSTimeDim
step. In the field specification, delete all fields except the TIMEDAY, TIMEMONTH, and
TIMEYEAR fields.
o Under the Design tab, expand the contents of the Joins node.
o Click and drag a Merge Join step into your transformation; create hops from
Sort rows and Sort rows 2 to the Merge Join step.
o Double-click the Merge Join step to specify its properties. Set First step to Sort rows,
Second step to Sort rows 2, and Join Type to INNER. Click both “Get key fields” buttons,
left and right, to list the possible join fields. In the left table, delete all fields
except the Day, Month and Year fields. In the right table, delete all fields except the
TIMEDAY, TIMEMONTH, and TIMEYEAR fields. Then click OK.
o You have now finished the inner join between the Access table and the SSTimeDim table.
o Figure 49 shows the global view of all nodes and connections after Step 1.

Figure 49: Global View of All Nodes and Connections after Step 1

Step 2 – Inner join SSItem, SSCustomer, and SSStore to Access table.

o Inner join the tables named SSItem, SSCustomer, and SSStore in your transformation
using the same method described before.
o For the SSItem step, connect the ItemID (from Access table) and ITEMID (from Database) fields.
o For the SSCustomer step, connect the CustID (from Access table) and CUSTID (from Database) fields.
o For the SSStore step, connect the StoreID (from Access table) and STOREID (from Database) fields.
o Figure 50 shows the global view of all nodes and connections after Step 2.

Figure 50: Global View of All Nodes and Connections after Step 2

Step 3 – Add SalesNo column.

o Under the Design tab, expand the contents of the Transform node.
o Click and drag an Add sequence step into your transformation; create a hop between the
Merge Join 4 and Add sequence steps (Figure 51).
o Double-click the newly created step to open its edit properties dialog box.
o Set SalesNo as the Name of value. Check the box Use DB to get sequence. Select
Oracle12cDB as the connection. Set SSSalesNoSeq as the sequence name (Figure 52).

Figure 51: Global View of All Nodes and Connections after Step 3

Figure 52: Property Edit Window of Add sequence node



9. Insert data into the SSSales table


o Under the Design tab, expand the contents of the Output node.
o Click and drag an Insert/Update step into your transformation; create a hop between
the Add sequence and Insert/Update steps. Figure 53 shows the connection.
o Double-click the Insert/Update step to specify its properties. Set the step name
to SSSales. Select Oracle12cDB as the connection. Type SSSales as the Target table.
DON’T click the “Get fields” button. Instead, select the names from the two table fields
and set the comparator between them to “=”. The final window should look like Figure
31.
o Click the “Get Updated fields” button and then click the “Edit mapping” button to edit
the mapping. The mapping edit window is shown in Figure 32. Select the fields named
SalesUnits, SalesDollar, SaleCost, CustID, StoreID, ItemID, TIMENO and SalesNo into the
mappings field. Pentaho will automatically match the corresponding name in the Target
field. Only the SalesNo field has to be manually matched with the SALESNO field. Then click OK.

Figure 53: Connect Insert/Update Node to Last Merge Join Node


o Select the SSSales step and run a preview by clicking on . In the transformation debug
dialog click on Quick Launch (Figure 33).
o The Examine preview data window is shown in Figure 34.

Connect to your Oracle account (on your PC or remote server) so you can verify the number of
rows in the SSSales table. You should see 112 rows with 8 new rows added to the 104 rows in
the sample data (Figure 54).

Figure 54: Inserted Data in Oracle Database
