ETL Integrator User Guide
Release 5.0
© 2003 by SeeBeyond Technology Corporation. All Rights Reserved. This work is protected as an unpublished work under the
copyright laws.
This work is confidential and proprietary information of SeeBeyond and must be maintained in strict confidence.
Version 20031015130533.
Contents
Chapter 1
System Description 5
Introduction 5
The eTL Integrator Product 5
The ETL Process 6
eTL Supporting Features 6
Business Integration and the eTL Integrator 8
Supporting Documents 9
Writing Conventions 10
eGate Installation Requirements 10
The SeeBeyond Web Site 11
Chapter 2
Interface to eGate 12
Enterprise Designer Components 12
Menu Bar 13
Enterprise Explorer 14
Project Editor 14
Creating Flat File OTDs 14
Importing Metadata Information for flat files 16
Chapter 3
Chapter 4
Deployment 48
General Instruction about Creating a Deployment Profile 48
Verify the Output Data 49
Create Environment and Activate the Deployment Profile for eTL 49
Scenario 1 50
Scenario 2 55
Glossary 59
ETL Terms 59
Index 62
Chapter 1
System Description
SeeBeyond’s eTL Integrator technology is optimized for very large record sets and for
building data scenarios that are fully integrated with the SeeBeyond ICAN suite
(Integrated Composite Application Network Suite) to unify the domains of eAI
(eBusiness and Application Integration) and ETL. The eTL Integrator can be integrated
into the enterprise business process or used as a classic, standalone ETL process.
1.1 Introduction
Extract, Transform, and Load (ETL) is a data integration technology that extracts data
from several heterogeneous data sources, combines and standardizes the data, and then
presents or stores the data in a uniform format for informational purposes.
ETL is necessary because many legacy system architectures evolved over the
years in environments where data was typically captured, processed and stored by
separate and distinct software applications and databases. As a result, the data residing
in the databases of many companies is typically non-standardized.
Product Description
SeeBeyond’s eTL Integrator technology is optimized for very large record sets and for
building data scenarios that are fully integrated with the SeeBeyond ICAN suite
(Integrated Composite Application Network Suite) to unify the domains of eAI
(eBusiness and Application Integration) and Enterprise Information Integration (EII).
With these unified domains you can build solutions that combine message-based
processing (eGate) and dataset-based processing (eTL) technologies.
The eTL Integrator product provides high runtime performance for high-volume
extraction and loading of tabular data sets, and it reduces eGate Collaboration design time.
The eTL Integrator can be integrated into enterprise business processes or used as a
classic, standalone product.
The ETL Process
In an ETL process, data is extracted from data sources. The data is then transformed (or
processed), using rules, algorithms, concatenations, or filters, into a desired state
suitable for loading into a database or data warehouse. See Figure 1.
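To make the three stages concrete, the following Python sketch runs a tiny extract, transform, and load pass. It is a generic illustration, not eTL Integrator code: the source data, the customers table, and its columns are hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: customer records with inconsistent formatting.
SOURCE = """id,name,country
1,  alice smith ,us
2,BOB JONES,US
3,carol wu,ca
"""

def extract(text):
    """Extract: read raw rows from a delimited source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and standardize names and country codes."""
    return [
        (int(r["id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]

def load(rows, conn):
    """Load: write the uniform rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

Keeping each stage in its own function mirrors the ETL model: the transform step can change without touching how data is read or written.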
Product Usage
The eTL product can be used to acquire a temporary subset of data for reports or other
purposes, or acquire a more permanent data set for the population of a data mart or
data warehouse. The product may also be used for conversion of one database type to
another or for the migration of data from one database or platform to another.
Figure 1 eTL: Extract, Transform, Load (extract data from a source; process it through a series of transformations; load the data into a target warehouse)
ETL Technology
! Batch-oriented operations are typically restricted to batch windows in a regularly
scheduled timeframe.
! Interfaces with data stores (e.g., RDBMS).
! Intended primarily for creating data warehouses.
! Not well suited for online transactions.
! Designed for one-to-one (point-to-point) integration scenarios.
Additional Conventions
Windows Systems
For the purposes of this guide, references to “Windows” will apply to Microsoft
Windows Server 2003, Windows XP, and Windows 2000.
Path Name Separator
This guide uses the backslash (“\”) as the separator within path names. If you are
working on a UNIX system, substitute the forward slash (“/”).
! Logical Host
! Enterprise Designer
Refer to the eGate Integrator Installation Guide for system requirements and installation
instructions.
Chapter 2
Interface to eGate
The Enterprise Designer is the graphical user interface (GUI) used to design and
implement ICAN 5.0 projects. This chapter gives an overview of the features and interface
of the Enterprise Designer window.
This chapter includes the following sections:
! “Enterprise Designer Components” on page 12
! “Menu Bar” on page 13
! “Enterprise Explorer” on page 14
! “Project Editor” on page 14
Figure: Enterprise Designer window, showing the Menu Bar, Toolbar, and Enterprise Explorer
Note: This chapter provides a high-level overview of the Enterprise Designer GUI. Refer to
the eGate Integrator User’s Guide for a more detailed description of the menu bar,
toolbar, Enterprise Explorer, and Enterprise Designer.
Note: Refer to Chapter 7 of the User Guide for information about Collaboration definitions
and using the OTD Wizard.
3 Type a name for your flat file OTD and click Next.
The Select Sample Files for Import window appears.
In the Select Sample Files for Import window (Figure 6), you can browse for a file on your computer or on the network.
4 Click the drop-down arrow to navigate to the file you want to select.
5 Click the Add button to select the file.
The selected files appear in the Selected flat files list.
Note: The file must reside on your computer or on a network location you have permission
to access.
You can select one or more flat files, much as you can include multiple tables
in an Oracle database OTD. Later in this process, the system automatically inspects the
files to assess their structure and read sample data.
1 Enter a Table name and select an encoding scheme. The default encoding is ASCII.
2 Select a File format, Delimited or Fixed-width.
Five criteria tell the system how to parse your selected flat file. The following apply
to the delimited format only:
" Default SQL Type
" Record Delimiter
" Field Delimiter
" Text Qualifier
" First line contains field names?
This example configures the delimited format (Figure 10). The following
two figures show examples of fixed-width formats:
The default SQL type, Figure 10 above, is used for all elements in the flat file OTD
structure unless a different type is specified by the end user in a subsequent panel.
The Record Delimiter, Figure 11 above, allows you to specify how the various records in
the flat file are physically separated from each other.
The Field Delimiter, Figure 12 above, specifies how the various elements (fields) in the
flat file records are physically separated from each other. The following field delimiters
are supported: comma, tab, and pipe (|).
The Text Qualifier, Figure 13 above, explicitly specifies how eGate Integrator detects
text fields. You can select double quote (“), single quote (‘), or none.
The “First line contains...” offers a True or False selection. See Figure 14 above. You
can specify whether the selected flat file includes the names of its fields in its header
row.
! True - the names specified in the header row will be used as element names of the
new OTD.
! False - the eTL Integrator will dynamically assign initial names to the new OTD
elements, which can be changed in the next panel.
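As a rough illustration of how the five delimited-format criteria work together, the following Python sketch parses delimited text with the standard csv module. The DelimitedSpec class, the FIELDn placeholder names, and the VARCHAR default are assumptions for illustration only; they are not part of the eTL Integrator or its OTD Wizard.

```python
import csv
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DelimitedSpec:
    """Hypothetical settings object mirroring the five wizard criteria."""
    default_sql_type: str = "VARCHAR"
    record_delimiter: str = "\n"
    field_delimiter: str = ","
    text_qualifier: Optional[str] = '"'   # None means no text qualifier
    first_line_has_names: bool = True

def parse_delimited(text: str, spec: DelimitedSpec
                    ) -> Tuple[List[Tuple[str, str]], List[List[str]]]:
    """Return (columns, rows); each column is a (name, sql_type) pair."""
    # Split records on the record delimiter. This simple sketch assumes the
    # record delimiter never appears inside a qualified text field.
    records = [r for r in text.split(spec.record_delimiter) if r]
    if spec.text_qualifier is None:
        reader = csv.reader(records, delimiter=spec.field_delimiter,
                            quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(records, delimiter=spec.field_delimiter,
                            quotechar=spec.text_qualifier)
    rows = list(reader)
    if spec.first_line_has_names:
        # True: the header row supplies the element names.
        names, rows = rows[0], rows[1:]
    else:
        # False: generate placeholder names that can be renamed later.
        width = max((len(row) for row in rows), default=0)
        names = [f"FIELD{i}" for i in range(1, width + 1)]
    # Every element starts with the default SQL type.
    columns = [(name, spec.default_sql_type) for name in names]
    return columns, rows

columns, rows = parse_delimited('ID|NAME\n1|"Smith, J"\n2|Lee\n',
                                DelimitedSpec(field_delimiter="|"))
print(columns, rows)
```

Here the pipe is the field delimiter, the double quote is the text qualifier (so the comma inside "Smith, J" is kept as data), and the header row supplies the element names.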
Suggested OTD Record Structure
After the parsing specifications have been set, you are ready to define the record
layout and field properties for your file. The system displays a suggested OTD
record, but you can change the various field properties, including name, length, and
data type.
The suggested OTD record properties are displayed based on your file structure
and your previous selections. In Figure 15, the Length, Column name, and Datatype
fields are editable.
1 Click to highlight the field you want to edit.
2 Double-click to begin editing.
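The record layout amounts to an ordered list of field properties. For a fixed-width file, the Length values determine where each field starts, so editing one length shifts every later field. The following Python sketch assumes a hypothetical layout (FieldSpec, ORDER_ID, CUSTOMER) and a simple data-type-to-converter table; it is not how the OTD Wizard is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping from a field's data type to a Python converter.
CONVERTERS = {"VARCHAR": str, "INTEGER": int}

@dataclass
class FieldSpec:
    """One editable field of a suggested record: name, length, data type."""
    name: str
    length: int
    datatype: str = "VARCHAR"

def parse_fixed_width(line: str, layout: List[FieldSpec]) -> Dict[str, object]:
    """Slice a fixed-width record according to the field lengths in layout."""
    values: Dict[str, object] = {}
    pos = 0
    for field in layout:
        raw = line[pos:pos + field.length].strip()   # drop padding spaces
        values[field.name] = CONVERTERS[field.datatype](raw)
        pos += field.length                          # next field starts here
    return values

layout = [FieldSpec("ORDER_ID", 4, "INTEGER"), FieldSpec("CUSTOMER", 10)]
record = "0042" + "Smith".ljust(10)   # 4 + 10 characters
print(parse_fixed_width(record, layout))
```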
In this final step of the OTD Wizard process, you can select which elements of the new
flat file OTD will be made available for use by ICAN Collaborations. See Figure 16
above.
1 Click Finish to create the new flat file OTD in the ICAN repository.
The new flat file OTD displays.
Chapter 3
The following scenario will guide you through the development of a simple project.
2 Type your Username and Password and then click Login to start the Enterprise
Designer.
Create and name a Project
1 In the Enterprise Explorer pane of the Enterprise Designer, right-click the
Repository name (computer icon) and then click New Project.
2 Type Project_eTL as the name for your project and press Enter.
The Project_eTL structure appears in the Explorer pane on the left side of the
window.
Create a New Object Type Definition
You will be creating definitions for database tables.
1 Right-click Project_eTL.
The New Object Type Definition Wizard appears, displaying a list of tools to
create OTDs.
Note: The Port_ID is not the eGate port but rather is the database port number.
10 Next you will select the following tables (number 5 in the previous Figure 26):
! Orders
! Inventory
! Exceptional_orders
11 After each selection, click the Select button. (It may take a minute before the table
name appears in the name list below.)
12 After you have finished your selections, and they appear in the list (number 6 in the
previous Figure 26), click the OK button.
The Selected Tables window will appear.
3 Click Finish.
The wizard closes and the Enterprise Designer window reappears.
eTL Collaboration
Next you will configure your Collaborations for source and target tables.
1 Right-click Project_eTL.
3 Type Collab_eTL.
4 Click the Next button to select source tables (or click Finish to create a Collaboration
with no source or target tables initially appearing on the designer pane).
Select Source Tables
You can select tables in a multiple table OTD.
5 Highlight your OTD.
6 Click the right-arrow button to complete your selection.
7 Click on the checkbox next to each item to be used as a source table.
8 Click Next (then repeat the previous steps to select the target tables).
9 Click Window on the menu and click Close All.
10 Double-click on Collab_eTL.
11 Right-click on Collab_eTL.
12 Click Select Tables... from the options.
13 Navigate to the Select Source Tables window (Figure 33).
14 Check the table boxes for the source tables, Inventory and Orders_Input.
15 Click Next.
Select Target Table
3 Click on the graphic “handles” and expand the view of the tables.
Place a Join Operator on the eTL Canvas
1 Click on the ‘Join’ icon and drag the join operator to the designer pane.
2 Select the Inventory table and connect it to the ‘left’ property of the join operator.
3 Select the Orders_Input table and connect it to the ‘right’ property of the join
operator.
11 Select the ‘result’ property of the ‘greater than’ operator and connect it to the ‘left’
property of the ‘and’ operator.
12 Select the ‘result’ property of the ‘equal’ operator and connect it to the ‘right’
property of the ‘and’ operator.
13 Select the ‘result property of the ‘and’ operator and connect it to the ‘condition’
property of the ‘join’ operator.
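In SQL terms, the operators wired together above form a join condition. The following Python sketch uses an in-memory SQLite database to show an equivalent query. The column names (PRODUCT_ID, QTY_ON_HAND, ORDER_ID, QTY_ORDERED) and the particular comparisons are hypothetical, because the columns compared in the earlier steps are not shown here; the sketch assumes the condition flags orders whose quantity exceeds the stock on hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (PRODUCT_ID INTEGER, QTY_ON_HAND INTEGER);
    CREATE TABLE Orders_Input (ORDER_ID INTEGER, PRODUCT_ID INTEGER,
                               QTY_ORDERED INTEGER);
    INSERT INTO Inventory VALUES (1, 100), (2, 5);
    INSERT INTO Orders_Input VALUES (10, 1, 20), (11, 2, 8), (12, 2, 3);
""")

# Inventory is the left input and Orders_Input the right input of the join.
# The condition ANDs a greater-than result with an equal result.
query = """
    SELECT o.ORDER_ID, o.PRODUCT_ID, o.QTY_ORDERED, i.QTY_ON_HAND
    FROM Inventory AS i
    JOIN Orders_Input AS o
      ON (o.QTY_ORDERED > i.QTY_ON_HAND)   -- greater than: left input of AND
     AND (o.PRODUCT_ID = i.PRODUCT_ID)     -- equal: right input of AND
    ORDER BY o.ORDER_ID
"""
print(conn.execute(query).fetchall())
```

Only order 11 satisfies both comparisons, so it is the only row the join returns.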
Map Target Table Columns
1 Map the following target table columns from the Orders_Input table.
The following are exceptions:
2 Place a new ‘literal’ operator on the eTL designer pane (click on the icon and
drag).
3 Enter value ‘n’ into the ‘literal’ operator.
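The mapping above corresponds roughly to an INSERT ... SELECT in which most target columns come from Orders_Input and one column receives a constant. The following Python/SQLite sketch assumes hypothetical columns, including a FILLED column that receives the literal 'n'; it illustrates the idea, not the SQL that the eTL Collaboration actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (PRODUCT_ID INTEGER, QTY_ON_HAND INTEGER);
    CREATE TABLE Orders_Input (ORDER_ID INTEGER, PRODUCT_ID INTEGER,
                               QTY_ORDERED INTEGER);
    CREATE TABLE Exceptional_orders (ORDER_ID INTEGER, PRODUCT_ID INTEGER,
                                     QTY_ORDERED INTEGER, FILLED TEXT);
    INSERT INTO Inventory VALUES (1, 100), (2, 5);
    INSERT INTO Orders_Input VALUES (10, 1, 20), (11, 2, 8);
""")

# Target columns are mapped from Orders_Input, except the hypothetical FILLED
# column, which is supplied by a literal operator with the value 'n'.
conn.execute("""
    INSERT INTO Exceptional_orders (ORDER_ID, PRODUCT_ID, QTY_ORDERED, FILLED)
    SELECT o.ORDER_ID, o.PRODUCT_ID, o.QTY_ORDERED, 'n'
    FROM Inventory AS i
    JOIN Orders_Input AS o
      ON o.QTY_ORDERED > i.QTY_ON_HAND AND o.PRODUCT_ID = i.PRODUCT_ID
""")
conn.commit()
print(conn.execute("SELECT * FROM Exceptional_orders").fetchall())
```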
Set Properties
Figure 43 Properties
View the output data in the lower window. Check the Exceptional_orders database table
(use Show Data) to verify that the value supplied by the ‘literal’ operator appears.
Note: The ‘loadOrderETL’ icon (in Figure 45) is replaced by the eTL Collaboration icon
(Figure 46).
Note: You can drag and drop the execute object from the Project Explorer.
2 Right-click the connection between ‘getFile’ and ‘loadOrderETL’ and select the
‘Add Business Rule’ option.
The business rule, represented by the Mapping symbol (referred to as ‘M’ in
eInsight) appears.
3 Click the ‘M’ icon in the eInsight toolbar to invoke the Transformation Designer.
Define Inbound Mapping for the eTL
1 Select the newly created business rule (the ‘M’ on the connection between ‘getFile’
and ‘loadOrderETL’).
2 Select the ‘M’ icon of the eInsight toolbar.
The Transformation Designer appears.
3 Expand the ‘FileClient’ operator in the left pane of the Transformation Designer.
4 Expand the ‘loadOrderETL’ operator in the right pane of the Transformation
Designer.
5 Connect ‘text’ data in the left pane to ‘myFilter’ data in the right pane. See the
following Figure 49.
The eTL Collaboration’s parameter (‘myFilter’ data element) is now supplied by the
contents of the ‘read’ file. This embeds the eTL Collaboration into the business
process.
Chapter 4
Deployment
After a Project has been completed it must be deployed. This section explains that
process.
Activate Environment
6 Click the Activate button. The Activation in Progress message appears.
Activating the Deployment Profile may take a few minutes.
Run the Bootstrap and Management Agent
The Bootstrap process executes your Project and begins polling your input data.
The Bootstrap process is run from a command prompt. Bootstrap picks up the
Deployment Profile the first time it runs; after that, you redeploy to apply changes.
The Bootstrap command is case sensitive on Windows.
To run the Bootstrap
1 Open a Windows command prompt (click Start, click Run, and type cmd).
2 Navigate to the bootstrap\bin directory of your Logical Host installation, for
example eGate50\logicalhost\bootstrap\bin, by typing the following command:
CD \eGate50\logicalhost\bootstrap\bin
3 To start the Bootstrap process, type the following command:
bootstrap -e environment_name -l logicalhost_name
-r repository_URL -i username -p password
environment_name is the name of your environment (for example, TutorialTest),
logicalhost_name is the name of your Logical Host (for example, LogicalHost1),
repository_URL is the full URL of your Repository including the Repository name
(for example, http://labserver:9000/Test),
username is your user name, and
password is your password.
4 Press Enter.
The Bootstrap process takes a few minutes to execute. The Management Agent
starts the components in the Project.
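If you start the Logical Host from a script rather than typing the command, it can help to assemble the arguments in one place. The following Python sketch builds the bootstrap argument list from the flags documented above and prints it with the password masked. The user name and password are placeholders, and whether the launcher named bootstrap resolves to a batch or shell script on your system is an assumption to check. The resulting list could be passed to subprocess.run with cwd set to the bootstrap\bin directory.

```python
from typing import List

def bootstrap_command(environment: str, logicalhost: str, repository_url: str,
                      username: str, password: str) -> List[str]:
    """Assemble the bootstrap command line from the documented flags.

    Values are passed through unchanged because the command is case sensitive.
    """
    return ["bootstrap",
            "-e", environment,
            "-l", logicalhost,
            "-r", repository_url,
            "-i", username,
            "-p", password]

def masked(cmd: List[str]) -> str:
    """Render the command for logging, hiding the value that follows -p."""
    shown = list(cmd)
    if "-p" in shown:
        i = shown.index("-p")
        if i + 1 < len(shown):
            shown[i + 1] = "********"
    return " ".join(shown)

# Placeholder values; substitute your own environment, host, and credentials.
cmd = bootstrap_command("TutorialTest", "LogicalHost1",
                        "http://labserver:9000/Test", "Administrator", "secret")
print(masked(cmd))
```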
4.2.1. Scenario 1
Deploy a Stand-alone Process
This is a continuation of “Using eTL With eInsight” on page 43.
The following examples were created with an Oracle database.
1 Create a new Connectivity Map in Enterprise Designer.
2 Add two Oracle external application icons.
This will implement the database logic as defined in the eTL Collaboration.
2 Connect the Scheduler Service from the left pane of the eTL Collaboration to the
Scheduler icon.
3 Double-click the new connection (line).
Its property sheet displays. There are many options available; this scenario covers
only a few.
Time Zone: LogicalHost Time Zone
Schedule Type: Frequency in Seconds
Seconds: 10
4 Select the time zone in which the Logical Host resides from the drop-down list (see
graphic 1).
5 Select a schedule type (interval type), for example Daily at time, Frequency in
seconds, or Frequency in hours (see graphic 2).
6 Select an interval value, such as midnight, Monday 8 AM PST, every hour, or every
10 seconds (see graphic 3).
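The schedule types differ in how the next run is computed: Frequency in seconds adds a fixed interval to the last run, while Daily at time picks the next occurrence of a wall-clock time in the Logical Host's time zone. The following Python sketch illustrates that arithmetic only; it is not the Scheduler's implementation, and a fixed UTC-8 offset stands in for PST, so daylight saving time is ignored.

```python
from datetime import datetime, time, timedelta, timezone

def next_interval_run(last_run: datetime, seconds: int) -> datetime:
    """Frequency in seconds: fire a fixed number of seconds after the last run."""
    return last_run + timedelta(seconds=seconds)

def next_daily_run(now: datetime, at: time) -> datetime:
    """Daily at time: fire at the given wall-clock time in now's time zone."""
    candidate = now.replace(hour=at.hour, minute=at.minute,
                            second=at.second, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)   # today's slot has already passed
    return candidate

# Fixed UTC-8 offset standing in for the Logical Host's time zone (PST).
PST = timezone(timedelta(hours=-8), "PST")
now = datetime(2003, 10, 15, 9, 0, tzinfo=PST)
print(next_interval_run(now, 10))        # every 10 seconds
print(next_daily_run(now, time(0, 0)))   # daily at midnight
```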
Connect Source and Target
Resume configuring the Connectivity Map.
1 Double-click the eTL Collaboration Service icon, if not already open.
2 Connect the source Oracle OTD Service to the source external Oracle system icon.
3 Connect the target Oracle OTD Service to the target external Oracle system icon.
4 Double-click the connection between the source Oracle OTD Service and the source
external Oracle system icon.
The connection’s property sheet displays.
5 Select the Outbound Oracle eWay option (because this is an outbound operation for
the eTL Collaboration).
6 Click the OK button.
7 Double-click the connection between the target Oracle OTD Service and the target
external Oracle system icon.
The connection’s property sheet displays.
8 Select the Outbound Oracle eWay option.
9 Click the OK button.
The connection property sheet is displayed.
Note: The environment properties of the source database are determined at the time the
solution is activated via a deployment profile; therefore, you don’t have to specify the
actual database name here.
Figure 56 Properties
Create an Environment
This involves the following steps:
1 Switch to the Environment Explorer view and create a new environment.
2 Select the Oracle External system.
3 Enter the property values.
Note: The Oracle External system will be used in the Deployment profile.
4.2.2. Scenario 2
Deploy as an Invoked Service
The example illustrated in this section explains how to invoke a service from an
eInsight business process.
The following examples were created with an Oracle database.
1 Create a new Connectivity Map in Enterprise Designer.
2 Add two external application icons (inbound and outbound).
3 Add two Oracle external application icons.
This will implement the database logic as defined in the eTL Collaboration.
10 Connect the Oracle source OTD to the Oracle source external system.
11 Connect the Oracle target OTD to the Oracle target external system.
Create a Deployment Profile for Scenario 2
Prerequisites:
! Create an environment
! Create at least one logical host
! Create at least one Integration server within the logical host
! Create at least one Oracle external system
! Create at least one File external system
1 From the Project Explorer of the Enterprise Designer, create a new deployment
profile.
2 Drag the eInsight business process Service into the Integration server.
3 Drag the eTL Collaboration Service to an Integration server of a logical host.
4 Drag the Oracle Services to an external Oracle system.
5 Drag the (flat) file (read and write) Services to an external file system.
6 Start your logical host, if not already running.
7 Activate your deployment.
8 Verify the results.
Glossary
The following terms, although not all inclusive, are common to ETL.
Dimension Table Dimension tables describe the business entities of an enterprise; also called lookup or
reference tables.
Dirty Data Dirty data includes, but is not limited to, incorrect data such as spelling and
punctuation errors, incorrect data references, and incomplete, inconsistent, outdated,
or redundant data.
Drill Down To move from summary to more detailed data by “drilling down” to get it. In database
terminology this might mean starting with a general category and drilling down to a
specific field in a record.
ERP Enterprise Resource Planning
ETL Extract, Transform, Load. Extract is the process of reading data from a source database
and extracting the desired subset of data. Transform is the process of converting the
extracted data from its previous form into the desired form. Load is the process of
writing the data into a target database, such as a data warehouse.
Extraction Data are extracted from a source using software tools. This first step in ETL initially
“gets” the data.
Fact Table A fact table typically contains two types of columns: those containing facts and those
that contain foreign keys to dimension tables. Fact tables contain detail facts and/or
summary facts.
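To illustrate how fact and dimension tables are used together, the following Python/SQLite sketch builds a minimal star schema with hypothetical tables (dim_product, fact_sales) and computes a summary fact by joining the fact table to its dimension through the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: describes a business entity (products).
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              name TEXT, category TEXT);
    -- Fact table: a foreign key to the dimension plus a numeric fact.
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product,
                             amount REAL);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")

# Summary fact: total sales per category, via the dimension's attributes.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON f.product_key = d.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
```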
Join Combines records from two tables in a relational database by matching the values of
a common field. Often part of a SELECT query.
Metadata “Data about data.” Metadata describes the “how,” “when,” and “who” of a
particular set of data, including its structure and format. ETL tools are used to generate
and maintain a central metadata repository.
Non-normalized Data Non-normalized data cannot be cross-referenced accurately, if at all, and causes
manageability issues. Non-normalized data may be converted to normalized data.
Normalized Data Normalization is a common database design process used to remove redundant or
incorrect organization and data. The design and normalization of the database will
create a maintainable data set that can be cross-referenced.
Normalized data is not only easier to analyze but also easier to expand. Normalization
involves removing redundancy and correcting incorrect data structure and
organization.
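As a small example of normalization, the following Python sketch splits hypothetical order rows, in which each customer's city is repeated on every order, into a customers table and an orders table that refers to customers by key. Table and field names are illustrative only.

```python
# Hypothetical non-normalized rows: the customer's city is repeated per order.
orders = [
    {"order_id": 1, "customer": "Acme", "city": "Boston", "total": 120},
    {"order_id": 2, "customer": "Acme", "city": "Boston", "total": 75},
    {"order_id": 3, "customer": "Zenith", "city": "Denver", "total": 50},
]

def normalize(rows):
    """Split rows into customer rows and order rows linked by customer_id.

    The first city seen for each customer is kept; later repeats are dropped.
    """
    customers = {}      # customer name -> (customer_id, city)
    order_rows = []
    for r in rows:
        if r["customer"] not in customers:
            customers[r["customer"]] = (len(customers) + 1, r["city"])
        customer_id = customers[r["customer"]][0]
        order_rows.append((r["order_id"], customer_id, r["total"]))
    customer_rows = [(cid, name, city) for name, (cid, city) in customers.items()]
    return customer_rows, order_rows

customers, order_rows = normalize(orders)
print(customers)
print(order_rows)
```

After the split, each city is stored once, so a change of address updates one row instead of every order.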
OLAP Online analytical processing.
Query A request for information from a database. There are three query methods:
Choose – With this easy-to-use method, the database system presents a list of
parameters from which you can choose. This method is not as flexible as other
methods.
Query by example (QBE) – With this method, the system lets you specify fields and
values to define a query.
Query language – With this method, you have the flexibility and power to make
requests for information in the form of a stylized query using a query language. This is
the most complex and powerful method.
Raw Data Data that has not been processed into “information.” Although factual and
“real,” raw data is unorganized.
Relational Database (RDBMS) A database managed by a relational database management system
(RDBMS). Data is stored in related tables, and relational databases can be viewed in many
different ways.
In this system a single database can be spread across several tables. (An RDBMS differs
from a flat-file database, in which each database is self-contained as a single file or table.)
Index
B
Bootstrap
  see also deployment 49
C
Connectivity Map
  defined in a project 14
conventions
  path name separator 10
  Windows 10
D
Deployment
  activate 49
  bootstrap and management agent 49
  deployment profile 48
  eTL deployment 50
  eTL, connect source and target 52
  eTL, deploy a stand-alone process 50
  eTL, deploy an invoked service 55
  general instructions 48
  integration and message servers 48
  run the bootstrap 49
  verify output 49
Deployment Profile Editor
  defined in a project 14
document
  conventions 10
Documents
  supporting documents 9
E
Enterprise Designer
  create and configure components 12
  editor 14
  GUI 13
  menu bar 13
  project 14
  project editor, is in the right pane 14
Enterprise Explorer
  organize components, in the left pane 14
F
Flat files
  see OTD, flat files 14
I
Installation requirements 10
M
Menu Bar 13
O
Operators and transformation tools
  and 37
  equal 36
  greater than 37
  join 35
  literal 39
  touppercase 40
OTD
  creating a flat file 14
  flat file, field delimiter 20
  flat files, selection criteria 16
  suggested record displays, based on your file structure 22
OTD Editor
  defined in a project 14
P
Project
  connect to database 26
  create a new sample project 24
  create new object type definition 25
  Enterprise Designer window 31
  new eTL Collaboration 32
  properties 42
  runtime inputs 41
  runtime outputs 41
  select database objects 27
  select tables for Connectivity Map 33
R
Requirements 10
S
SQL
  default type 19
W
writing conventions 10