Transformation Guide
Informatica PowerCenter
(Version 8.5)
This product includes the following third-party software:
- OSSP UUID, Copyright (c) 2002 Ralf S. Engelschall, Copyright (c) 2002 The OSSP Project, Copyright (c) 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mitlicense.php.
- Software developed by Boost (http://www.boost.org/). Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.
- Software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.
- Software copyright (c) 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.
- The zlib library, copyright (c) 1995-2005 Jean-loup Gailly and Mark Adler.
- Software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html.
- Software licensed under the terms at http://www.bosrup.com/web/overlib/?License.
- Software licensed under the terms at http://www.stlport.org/doc/license.html.
- Software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php).
- Software developed by the Indiana University Extreme! Lab. For further information, please visit http://www.extreme.indiana.edu/.

This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471 and other U.S. Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.
Table of Contents
Preface
    About This Book
        Document Conventions
    Other Informatica Resources
        Visiting Informatica Customer Portal
        Visiting the Informatica Web Site
        Visiting the Informatica Knowledge Base
        Obtaining Customer Support
Entering and Validating Default Values
Configuring Tracing Level in Transformations
Reusable Transformations
    Instances and Inherited Changes
    Mapping Variables in Expressions
    Creating Reusable Transformations
    Promoting Non-Reusable Transformations
    Creating Non-Reusable Instances of Reusable Transformations
    Adding Reusable Transformations to Mappings
    Modifying a Reusable Transformation
Configuring the Complex Data Exchange Repository Folder
Complex Data Transformation Settings
    Input Type
    Output Type
Complex Data Transformation Ports
Complex Data Transformation Components
    Properties Tab
Creating a Mapping
    Creating a Mapping for a Complex Data Engine Parser Service
    Creating a Mapping for the File Output Type
    Splitting XML Output
    Rules and Guidelines
Steps to Configure a Complex Data Transformation
Validating Mappings with Custom Transformations
Working with Procedure Properties
Creating Custom Transformation Procedures
    Step 1. Create the Custom Transformation
    Step 2. Generate the C Files
    Step 3. Fill Out the Code with the Transformation Logic
    Step 4. Build the Module
    Step 5. Create a Mapping
    Step 6. Run the Session in a Workflow
Row Strategy Functions (Row-Based Mode)
Change Default Row Strategy Function
Array-Based API Functions
    Maximum Number of Rows Functions
    Number of Rows Functions
    Is Row Valid Function
    Data Handling Functions (Array-Based Mode)
    Row Strategy Functions (Array-Based Mode)
    Set Input Error Row Functions
Step 6. Run the Session
Distributing External Procedures
    Distributing COM Procedures
    Distributing Informatica Modules
Development Notes
    COM Datatypes
    Row-Level Procedures
    Return Values from Procedures
    Exceptions in Procedure Calls
    Memory Management for Procedures
    Wrapper Classes for Pre-Existing C/C++ Libraries or VB Functions
    Generating Error and Tracing Messages
    Unconnected External Procedure Transformations
    Initializing COM and Informatica Modules
    Other Files Distributed and Used in TX
Service Process Variables in Initialization Properties
External Procedure Interfaces
    Dispatch Function
    External Procedure Function
    Property Access Functions
    Parameter Access Functions
    Code Page Access Functions
    Transformation Name Access Functions
    Procedure Access Functions
    Partition Related Functions
    Tracing Level Function
Processing Subseconds
Compiling a Java Transformation
Fixing Compilation Errors
    Locating the Source of Compilation Errors
    Identifying Compilation Errors
Dropping Transaction Boundaries for Two Pipelines
Creating a Joiner Transformation
Tips
VSAM Normalizer Transformation
    VSAM Normalizer Ports Tab
    VSAM Normalizer Tab
    Steps to Create a VSAM Normalizer Transformation
Pipeline Normalizer Transformation
    Pipeline Normalizer Ports Tab
    Pipeline Normalizer Tab
    Steps to Create a Pipeline Normalizer Transformation
Using a Normalizer Transformation in a Mapping
    Generating Key Values
Troubleshooting
Replacing Missing Values
Sequence Generator Ports
    NEXTVAL
    CURRVAL
Transformation Properties
    Start Value and Cycle
    Increment By
    End Value
    Current Value
    Number of Cached Values
    Reset
Creating a Sequence Generator Transformation
Custom Joins
Heterogeneous Joins
Creating Key Relationships
Adding an SQL Query
Entering a User-Defined Join
Outer Join Support
    Informatica Join Syntax
    Creating an Outer Join
    Common Database Syntax Restrictions
Entering a Source Filter
Using Sorted Ports
Select Distinct
    Overriding Select Distinct in the Session
Adding Pre- and Post-Session SQL Commands
Creating a Source Qualifier Transformation
    Creating a Source Qualifier Transformation By Default
    Creating a Source Qualifier Transformation Manually
    Configuring Source Qualifier Transformation Options
Troubleshooting
Transaction Control
High Availability
SQL Transformation Properties
    Properties Tab
    SQL Settings Tab
    SQL Ports Tab
SQL Statements
Creating an SQL Transformation
Creating a Stored Procedure Transformation
    Importing Stored Procedures
    Manually Creating Stored Procedure Transformations
    Setting Options for the Stored Procedure
    Changing the Stored Procedure
Configuring a Connected Transformation
Configuring an Unconnected Transformation
    Calling a Stored Procedure From an Expression
    Calling a Pre- or Post-Session Stored Procedure
Error Handling
    Pre-Session Errors
    Post-Session Errors
    Session Errors
Supported Databases
    SQL Declaration
    Parameter Types
    Input/Output Port in Mapping
    Type of Return Value Supported
Expression Rules
Tips
Troubleshooting
Index
Preface
Welcome to PowerCenter, the Informatica software product that delivers an open, scalable data integration solution addressing the complete life cycle for all data integration projects including data warehouses, data migration, data synchronization, and information hubs. PowerCenter combines the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable, and efficient manner.

The PowerCenter repository coordinates and drives a variety of core functions, including extracting, transforming, loading, and managing data. The Integration Service can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads. PowerCenter can simplify and accelerate the process of building a comprehensive data warehouse from disparate data sources.
Document Conventions
This guide uses the following formatting conventions:
- italicized text: The word or set of words are especially emphasized.
- boldfaced text: Emphasized subjects.
- italicized monospaced text: This is the variable name for a value you enter as part of an operating system command. This is generic text that should be replaced with user-supplied values.
- Note: The following paragraph provides additional facts.
- Tip: The following paragraph provides suggested uses.
- Warning: The following paragraph notes situations where you can overwrite or corrupt data, unless you follow the specified procedure.
- monospaced text: This is a code example.
- bold monospaced text: This is an operating system command you enter from a prompt to run a task.
You can access the following Informatica resources:
- Informatica Customer Portal
- Informatica web site
- Informatica Knowledge Base
- Informatica Global Customer Support

You can contact Informatica Global Customer Support by email:
- [email protected] for technical inquiries
- [email protected] for general customer service requests

WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America
Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, California 94063
United States

Europe / Middle East / Africa
Informatica Software Ltd.
6 Waltham Park
Waltham Road, White Waltham
Maidenhead, Berkshire SL6 3TN
United Kingdom
Standard Rate:
Belgium: +32 15 281 702
France: +33 1 41 38 92 26
Germany: +49 1805 702 702
Netherlands: +31 306 022 797
United Kingdom: +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd.
Diamond District Tower B, 3rd Floor
150 Airport Road
Bangalore 560 008
India
Toll Free:
Australia: 1 800 151 830
Singapore: 001 800 4632 4357
Standard Rate:
India: +91 80 4112 5738
Chapter 1
Working with Transformations

This chapter includes the following topics:
- Overview
- Creating a Transformation
- Configuring Transformations
- Working with Ports
- Multi-Group Transformations
- Working with Expressions
- Using Local Variables
- Using Default Values for Ports
- Configuring Tracing Level in Transformations
- Reusable Transformations
Overview
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Transformations in a mapping represent the operations the Integration Service performs on the data. Data passes through transformation ports that you link in a mapping or mapplet.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Transformations can be connected to the data flow, or they can be unconnected. An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.

Table 1-1 provides a brief description of each transformation:
Table 1-1. Transformation Descriptions

- Aggregator (Active/Connected): Performs aggregate calculations.
- Application Source Qualifier (Active/Connected): Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session.
- Complex Data (Active or Passive/Connected): Transforms data in unstructured and semi-structured formats.
- Custom (Active or Passive/Connected): Calls a procedure in a shared library or DLL.
- Expression (Passive/Connected): Calculates a value.
- External Procedure (Passive/Connected or Unconnected): Calls a procedure in a shared library or in the COM layer of Windows.
- Filter (Active/Connected): Filters data.
- HTTP (Passive/Connected): Connects to an HTTP server to read or update data.
- Input (Passive/Connected): Defines mapplet input rows. Available in the Mapplet Designer.
- Joiner (Active/Connected): Joins data from different databases or flat file systems.
- Lookup (Passive/Connected or Unconnected): Looks up values.
- Normalizer (Active/Connected): Source qualifier for COBOL sources. Can also use in the pipeline to normalize data from relational or flat file sources.
- Output (Passive/Connected): Defines mapplet output rows. Available in the Mapplet Designer.
- Rank (Active/Connected): Limits records to a top or bottom range.
- Router (Active/Connected): Routes data into multiple transformations based on group conditions.
- Sequence Generator (Passive/Connected): Generates primary keys.
- Sorter (Active/Connected): Sorts data based on a sort key.
- Source Qualifier (Active/Connected): Represents the rows that the Integration Service reads from a relational or flat file source when it runs a session.
- SQL (Active or Passive/Connected): Executes SQL queries against a database.
- Stored Procedure (Passive/Connected or Unconnected): Calls a stored procedure.
- Transaction Control (Active/Connected): Defines commit and rollback transactions.
- Union (Active/Connected): Merges data from different databases or flat file systems.
- Update Strategy (Active/Connected): Determines whether to insert, delete, update, or reject rows.
- XML Generator (Active/Connected): Reads data from one or more input ports and outputs XML through a single output port.
- XML Parser (Active/Connected): Reads XML from one input port and outputs data to one or more output ports.
- XML Source Qualifier (Active/Connected): Represents the rows that the Integration Service reads from an XML source when it runs a session.
When you build a mapping, you add transformations and configure them to handle data according to a business purpose. Complete the following tasks to incorporate a transformation into a mapping:

1. Create the transformation. Create it in the Mapping Designer as part of a mapping, in the Mapplet Designer as part of a mapplet, or in the Transformation Developer as a reusable transformation.
2. Configure the transformation. Each type of transformation has a unique set of options that you can configure.
3. Link the transformation to other transformations and target definitions. Drag one port to another to link them in the mapping or mapplet.
Creating a Transformation
You can create transformations using the following Designer tools:
- Mapping Designer. Create transformations that connect sources to targets. Transformations in a mapping cannot be used in other mappings unless you configure them to be reusable.
- Transformation Developer. Create individual transformations, called reusable transformations, that you use in multiple mappings. For more information, see "Reusable Transformations" later in this chapter.
- Mapplet Designer. Create and configure a set of transformations, called mapplets, that you use in multiple mappings. For more information, see "Mapplets" in the Designer Guide.
Use the same process to create a transformation in the Mapping Designer, Transformation Developer, and Mapplet Designer.
To create a transformation:

1. Open the appropriate Designer tool.
2. In the Mapping Designer, open or create a mapping. In the Mapplet Designer, open or create a mapplet.
3. On the Transformations toolbar, click the button corresponding to the transformation you want to create, or click Transformation > Create and select the type of transformation you want to create.
4. Drag across the portion of the mapping where you want to place the transformation.

The new transformation appears in the workspace. Next, you need to configure the transformation by adding any new ports to it and setting other properties.
Configuring Transformations
After you create a transformation, you can configure it. Every transformation contains the following common tabs:
- Transformation. Name the transformation or add a description.
- Ports. Add and configure ports.
- Properties. Configure properties that are unique to the transformation.
- Metadata Extensions. Extend the metadata in the repository by associating information with individual objects in the repository.
Some transformations might include other tabs, such as the Condition tab, where you enter conditions in a Joiner or Normalizer transformation. When you configure transformations, you might complete the following tasks:
- Add ports. Define the columns of data that move into and out of the transformation.
- Add groups. In some transformations, define input or output groups that define a row of data entering or leaving the transformation.
- Enter expressions. Enter SQL-like expressions in some transformations that transform the data.
- Define local variables. Define local variables in some transformations that temporarily store data.
- Override default values. Configure default values for ports to handle input nulls and output transformation errors.
- Enter tracing levels. Choose the amount of detail the Integration Service writes in the session log about a transformation.
Creating Ports
You can create a new port in the following ways:
- Drag a port from another transformation. When you drag a port from another transformation, the Designer creates a port with the same properties, and it links the two ports. Click Layout > Copy Columns to enable copying ports.
- Click the Add button on the Ports tab. The Designer creates an empty port you can configure.
Configuring Ports
On the Ports tab, you can configure the following properties:
- Port name. The name of the port. Use the following conventions while naming ports (see the example after this list):
  - Begin with a single- or double-byte letter or single- or double-byte underscore (_).
  - Port names can contain any of the following single- or double-byte characters: a letter, number, underscore (_), $, #, or @.
- Datatype, precision, and scale. If you plan to enter an expression or condition, make sure the datatype matches the return value of the expression.
- Port type. Transformations may contain a combination of input, output, input/output, and variable port types.
- Default value. The Designer assigns default values to handle null values and output transformation errors. You can override the default value in some ports.
- Description. A description of the port.
- Other properties. Some transformations have properties specific to that transformation, such as expressions or group by properties.
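For example, the following port names, which are hypothetical illustrations rather than names taken from any sample repository, show how the naming conventions apply:

    TOTAL_SALES      valid: begins with a letter
    _BONUS#2007      valid: begins with an underscore; digits, #, $, and @ may follow
    2ND_QTR_SALES    invalid: begins with a number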
For more information about configuration options, see the appropriate sections in this chapter or in the specific transformation chapters.
Note: The Designer creates some transformations with configured ports. For example, the Designer creates a Lookup transformation with an output port for each column in the table or view used for the lookup. You need to create a port representing a value used to perform a lookup.
Linking Ports
After you add and configure a transformation in a mapping, you link it to targets and other transformations. You link mapping objects through the ports. Data passes into and out of a mapping through the following ports:
- Input ports. Receive data.
- Output ports. Pass data.
- Input/output ports. Receive data and pass it unchanged.
Figure 1-2 shows an example of a transformation with input, output, and input/output ports:
Figure 1-2. Example of Input, Output, and Input/Output Ports
To link ports, drag between ports in different mapping objects. The Designer validates the link and creates the link only when the link meets validation requirements. For more information about connecting mapping objects or about how to link ports, see Mappings in the Designer Guide.
Multi-Group Transformations
Transformations have input and output groups. A group is a set of ports that define a row of incoming or outgoing data; it is analogous to a table in a relational source or target definition and represents a row of data entering or leaving a transformation. Most transformations have one input and one output group. However, some have multiple input groups, multiple output groups, or both.

Table 1-2 lists the transformations with multiple groups:
Table 1-2. Multi-Group Transformations

- Custom: Contains any number of input and output groups.
- HTTP: Contains an input, output, and a header group.
- Joiner: Contains two input groups, the master source and detail source, and one output group.
- Router: Contains one input group and multiple output groups.
- Union: Contains multiple input groups and one output group.
- XML Source Qualifier: Contains multiple input and output groups.
- XML Target Definition: Contains multiple input groups.
- XML Parser: Contains one input group and multiple output groups.
- XML Generator: Contains multiple input groups and one output group.
When you connect transformations in a mapping, you must consider input and output groups. For more information about connecting transformations in a mapping, see Mappings in the Designer Guide. Some multiple input group transformations require the Integration Service to block data at an input group while the Integration Service waits for a row from a different input group. A blocking transformation is a multiple input group transformation that blocks incoming data. The following transformations are blocking transformations:
- Custom transformation with the Inputs May Block property enabled
- Joiner transformation configured for unsorted input
The Designer performs data flow validation when you save or validate a mapping. Some mappings that contain blocking transformations might not be valid. For more information about data flow validation, see Mappings in the Designer Guide. For more information about blocking source data, see Integration Service Architecture in the Administrator Guide.
Working with Expressions

The transformation language includes the following components to create simple or complex transformation expressions:
- Transformation language functions. SQL-like functions designed to handle common expressions.
- User-defined functions. Functions you create in PowerCenter based on transformation language functions.
- Custom functions. Functions you create with the Custom Function API.
For more information about the transformation language and custom functions, see the Transformation Language Reference. For more information about user-defined functions, see Working with User-Defined Functions in the Designer Guide.

Enter an expression in an output port that uses the value of data from an input or input/output port. For example, you have a transformation with an input port IN_SALARY that contains the salaries of all the employees. You might want to use the individual values from the IN_SALARY column later in the mapping, as well as the total and average salaries you calculate through this transformation. For this reason, the Designer requires you to create a separate output port for each calculated value.

Figure 1-3 shows an Aggregator transformation that uses input ports to calculate sums and averages:
Figure 1-3. Sample Input and Output Ports
Table 1-3 lists the transformations in which you can enter expressions:
Table 1-3. Transformations Containing Expressions
- Aggregator. Performs an aggregate calculation based on all data passed through the transformation. Alternatively, you can specify a filter for records in the aggregate calculation to exclude certain kinds of records. For example, you can find the total number and average salary of all employees in a branch office using this transformation. Return value: result of an aggregate calculation for a port.
- Expression. Performs a calculation based on values within a single row. For example, based on the price and quantity of a particular item, you can calculate the total purchase price for that line item in an order. Return value: result of a row-level calculation for a port.
- Filter. Specifies a condition used to filter rows passed through this transformation. For example, if you want to write customer data to the BAD_DEBT table for customers with outstanding balances, you could use the Filter transformation to filter customer data. Return value: TRUE or FALSE, depending on whether a row meets the specified condition. Only rows that return TRUE are passed through this transformation. The transformation applies this value to each row passed through it.
- Rank. Sets the conditions for rows included in a rank. For example, you can rank the top 10 salespeople who are employed with the company. Return value: result of a condition or calculation for a port.
- Router. Routes data into multiple transformations based on a group expression. For example, use this transformation to compare the salaries of employees at three different pay levels by creating three groups in the Router transformation, with one group expression for each salary range. Return value: TRUE or FALSE, depending on whether a row meets the specified group expression. Only rows that return TRUE pass through each user-defined group in this transformation. Rows that return FALSE pass through the default group.
- Update Strategy. Flags a row for update, insert, delete, or reject. You use this transformation when you want to control updates to a target, based on some condition you apply. For example, you might use the Update Strategy transformation to flag all customer rows for update when the mailing address has changed, or flag all employee rows for reject for people who no longer work for the company. Return value: a numeric code for update, insert, delete, or reject. The transformation applies this value to each row passed through it.
- Transaction Control. Specifies a condition used to determine the action the Integration Service performs, either commit, roll back, or no transaction change. You use this transformation when you want to control commit and rollback transactions based on a row or set of rows that pass through the transformation. For example, use this transformation to commit a set of rows based on an order entry date. Return value: one of the following built-in variables, depending on whether or not a row meets the specified condition:
  - TC_CONTINUE_TRANSACTION
  - TC_COMMIT_BEFORE
  - TC_COMMIT_AFTER
  - TC_ROLLBACK_BEFORE
  - TC_ROLLBACK_AFTER
  The Integration Service performs actions based on the return value.
depend on this port in the mapping. For more information, see Mappings in the Designer Guide.
Adding Comments
You can add comments to an expression to give descriptive information about the expression or to specify a valid URL to access business documentation about the expression. You can add comments in one of the following ways:
- To add comments within the expression, use -- or // comment indicators.
- To add comments in the dialog box, click the Comments button.
For examples on adding comments to expressions, see The Transformation Language in the Transformation Language Reference.
For more information about linking to business documentation, see Using the Designer in the Designer Guide.
Validating Expressions
Use the Validate button to validate an expression. If you do not validate an expression, the Designer validates it when you close the Expression Editor. If the expression is invalid, the Designer displays a warning. You can save the invalid expression or modify it. You cannot run a session against a mapping with invalid expressions.
To enter an expression, complete the following steps:
1. In the transformation, select the port and open the Expression Editor.
2. Enter the expression. Use the Functions and Ports tabs and the operator keys.
3. Add comments to the expression using -- or // comment indicators.
4. Validate the expression. Use the Validate button to validate the expression.
For example, to define the expression IIF(color=red,5) in a parameter file, complete the following steps:
1. In the mapping that uses the expression, create a mapping parameter $$Exp. Set IsExprVar to true and set the datatype to String.
2. In the Expression Editor, set the expression to the name of the mapping parameter as follows:
   $$Exp
3. Configure the session or workflow to use a parameter file.
4. In the parameter file, set the value of $$Exp to the expression string as follows:
   $$Exp=IIF(color=red,5)
For more information about defining expression strings in parameter files, see the Designer Guide.
Using Local Variables

Use local variables in the Aggregator, Expression, and Rank transformations to improve performance. You can use local variables to complete the following tasks:
- Temporarily store data.
- Simplify complex expressions.
- Store values from prior rows.
- Capture multiple return values from a stored procedure.
- Compare values.
- Store the results of an unconnected Lookup transformation.
Rather than entering the same arguments for both calculations, you might create a variable port for each condition in this calculation, then modify the expression to use the variables. Table 1-4 shows how to use variables to simplify complex expressions and temporarily store data:
Table 1-4. Variable Usage
Port          Value
V_CONDITION1  JOB_STATUS = 'Full-time'
V_CONDITION2  OFFICE_ID = 1000
Each row contains a state. You need to count the number of rows and return the row count for each state:
California,3
Hawaii,2
New Mexico,3
You can configure an Aggregator transformation to group the source rows by state and count the number of rows in each group. Configure a variable in the Aggregator transformation to store the row count. Define another variable to store the state name from the previous row.
Figure 1-5 shows the Aggregator transformation ports to return the sum of rows by state:
Figure 1-5. Variable Ports That Store Values Across Rows
- State. The name of a state. The source rows are grouped by the state name. The Aggregator transformation returns one row for each state.
- State_Count (variable port). The row count for the current State. When the value of the current State column is the same as the Previous_State column, the Integration Service increments State_Count. Otherwise, it resets State_Count to 1.
- Previous_State (variable port). The value of the State column in the previous row. When the Integration Service processes a row, it moves the State value to Previous_State.
- State_Counter (output port). The number of rows the Aggregator transformation processed for a state. The Integration Service returns State_Counter once for each state.
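The following minimal Python sketch, not PowerCenter code, approximates how the Integration Service might evaluate these ports for each row. It assumes the source rows arrive sorted by state, as in the example above.

# Approximation of the ports above; rows are assumed sorted by state.
def count_rows_by_state(states):
    state_count = 0       # variable port State_Count, initialized to zero
    previous_state = ""   # variable port Previous_State, initialized to an empty string
    totals = {}
    for state in states:  # input port State
        # Variable ports evaluate in display order for each row:
        state_count = state_count + 1 if state == previous_state else 1
        previous_state = state
        totals[state] = state_count  # the last value per group is the final count
    return totals

print(count_rows_by_state(
    ["California", "California", "California", "Hawaii", "Hawaii",
     "New Mexico", "New Mexico", "New Mexico"]))
# {'California': 3, 'Hawaii': 2, 'New Mexico': 3}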
When you configure variable ports, consider the following factors:
- Port order. The Integration Service evaluates ports by dependency. The order of the ports in a transformation must match the order of evaluation: input ports, variable ports, output ports.
- Datatype. The datatype you choose reflects the return value of the expression you enter.
- Variable initialization. The Integration Service sets initial values in variable ports, where you can create counters.
Port Order
The Integration Service evaluates ports in the following order:
1. Input ports. The Integration Service evaluates all input ports first since they do not depend on any other ports. Therefore, you can create input ports in any order. Since they do not reference other ports, the Integration Service does not order input ports.
2. Variable ports. Variable ports can reference input ports and variable ports, but not output ports. Because variable ports can reference input ports, the Integration Service evaluates variable ports after input ports. Likewise, since variables can reference other variables, the display order for variable ports is the same as the order in which the Integration Service evaluates each variable. For example, if you calculate the original value of a building and then adjust for depreciation, you might create the original value calculation as a variable port. This variable port needs to appear before the port that adjusts for depreciation.
3. Output ports. Because output ports can reference input ports and variable ports, the Integration Service evaluates output ports last. The display order for output ports does not matter since output ports cannot reference other output ports. Be sure output ports display at the bottom of the list of ports.
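The following minimal Python sketch illustrates this evaluation order with the depreciation example. The port names and values are hypothetical.

# Hypothetical ports: IN_COST and IN_AGE are inputs, V_ORIGINAL_VALUE and
# V_DEPRECIATED are variables, OUT_CURRENT_VALUE is the output.
def evaluate_row(in_cost, in_age):
    # 1. Input ports evaluate first; they reference no other ports.
    # 2. Variable ports evaluate next, in display order:
    v_original_value = in_cost * 1.10                  # must display before the port that uses it
    v_depreciated = v_original_value * (0.95 ** in_age)
    # 3. Output ports evaluate last; they can reference inputs and variables.
    out_current_value = round(v_depreciated, 2)
    return out_current_value

print(evaluate_row(100000, 5))  # 85115.9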
Datatype
When you configure a port as a variable, you can enter any expression or condition in it. The datatype you choose for this port reflects the return value of the expression you enter. If you specify a condition through the variable port, any numeric datatype returns the values for TRUE (non-zero) and FALSE (zero).
Variable Initialization
The Integration Service does not set the initial value for variables to NULL. Instead, the Integration Service uses the following guidelines to set initial values for variables:
- Zero for numeric ports
- Empty strings for string ports
- 01/01/1753 for Date/Time ports with PMServer 4.0 date handling compatibility disabled
- 01/01/0001 for Date/Time ports with PMServer 4.0 date handling compatibility enabled
Because the initial value is not NULL, you can use variables as counters, which require a starting value. For example, you can create a numeric variable with the following expression:
VAR1 + 1
This expression counts the number of rows in the VAR1 port. If the initial value of the variable were set to NULL, the expression would always evaluate to NULL. This is why the initial value is set to zero.
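As a short analogy, the following Python sketch, in which None stands in for NULL, shows why a NULL initial value would break the counter:

# None stands in for NULL; NULL arithmetic propagates.
def null_add(a, b):
    return None if a is None or b is None else a + b

var1 = None  # what a NULL initial value would do
for _ in range(3):
    var1 = null_add(var1, 1)
print(var1)  # None: the count is lost

var1 = 0  # the actual initial value for a numeric variable port
for _ in range(3):
    var1 = null_add(var1, 1)
print(var1)  # 3: the counter works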
Using Default Values for Ports

Default values determine how the Integration Service handles null input values and output transformation errors for the following port types:
- Input port. The system default value for null input ports is NULL. It displays as a blank in the transformation. If an input value is NULL, the Integration Service leaves it as NULL.
- Output port. The system default value for output transformation errors is ERROR. The default value appears in the transformation as ERROR(transformation error). If a transformation error occurs, the Integration Service skips the row. The Integration Service notes all input rows skipped by the ERROR function in the session log file. The following errors are considered transformation errors:
  - Data conversion errors, such as passing a number to a date function.
  - Expression evaluation errors, such as dividing by zero.
  - Calls to an ERROR function.
- Input/output port. The system default value for null input is the same as input ports, NULL. The system default value appears as a blank in the transformation. The default value for output transformation errors is the same as output ports, but it does not display in the transformation.
Table 1-5 shows the system default values for ports in connected transformations:
Table 1-5. System Default Values and Integration Service Behavior
- Input and input/output ports. Default value: NULL. The Integration Service passes all input null values as NULL. User-defined default values are supported for input and input/output ports.
- Output and input/output ports. Default value: ERROR. The Integration Service calls the ERROR function for output port transformation errors. It skips rows with errors and writes the input data and error message in the session log file. User-defined default values are supported for output ports.
Note: Variable ports do not support default values. The Integration Service initializes variable ports according to the datatype. For more information, see Using Local Variables on page 15.
Figure 1-6 shows that the system default value for input and input/output ports appears as a blank in the transformation:
Figure 1-6. Default Value for Input and Input/Output Ports
Figure 1-7 shows that the system default value for output ports appears as ERROR(transformation error):
Figure 1-7. Default Value for Output Ports
You can override some of the default values to change the Integration Service behavior when it encounters null input values and output transformation errors.
- Input ports. You can enter user-defined default values for input ports if you do not want the Integration Service to treat null values as NULL.
- Output ports. You can enter user-defined default values for output ports if you do not want the Integration Service to skip the row, or if you want the Integration Service to write a specific message with the skipped row to the session log.
- Input/output ports. You can enter user-defined default values to handle null input values for input/output ports in the same way you enter them for input ports. You cannot enter user-defined default values for output transformation errors in an input/output port.
Note: The Integration Service ignores user-defined default values for unconnected transformations. For example, if you call a Lookup or Stored Procedure transformation through an expression, the Integration Service ignores any user-defined default value and uses the system default value only.

Table 1-6 shows the ports for each transformation that support user-defined default values:
Table 1-6. Transformations Supporting User-Defined Default Values
Transformation      Input Values for Input and Input/Output Ports   Output Values for Output Port   Output Values for Input/Output Port
Aggregator          Supported        Not Supported    Not Supported
Custom              Supported        Supported        Not Supported
Expression          Supported        Supported        Not Supported
External Procedure  Supported        Supported        Not Supported
Filter              Supported        Not Supported    Not Supported
HTTP                Supported        Not Supported    Not Supported
Java                Supported        Supported        Supported
Lookup              Supported        Supported        Not Supported
Normalizer          Supported        Supported        Not Supported
Rank                Not Supported    Supported        Not Supported
Router              Supported        Not Supported    Not Supported
SQL                 Supported        Not Supported    Supported
Stored Procedure    Supported        Supported        Not Supported
You can enter the following types of user-defined default values:
- Constant value. Use any constant (numeric or text), including NULL.
- Constant expression. You can include a transformation function with constant parameters.
- ERROR. Generate a transformation error. Write the row and a message in the session log or row error log. The Integration Service writes the row to the session log or row error log based on session configuration.
- ABORT. Abort the session.
You cannot use values from ports within the expression because the Integration Service assigns default values for the entire mapping when it initializes the session. Some invalid default values include the following examples, which incorporate values read from ports:
AVG(IN_SALARY)
IN_PRICE * IN_QUANTITY
:LKP(LKP_DATES, DATE_SHIPPED)
Note: You cannot call a stored procedure or lookup table from a default value expression.
To override null input values, complete one of the following tasks:
- Replace the null value with a constant value or constant expression.
- Skip the null value with an ERROR function.
- Abort the session with the ABORT function.
Table 1-7 summarizes how the Integration Service handles null input for input and input/output ports:

Table 1-7. Default Values for Input and Input/Output Ports
- NULL (displays blank). System default. The Integration Service passes NULL.
- Constant or constant expression. User-defined. The Integration Service replaces the null value with the value of the constant or constant expression.
- ERROR. User-defined. The Integration Service treats the null value as a transformation error. It skips the row and writes a message to the session log.
- ABORT. User-defined. The Integration Service aborts the session when it encounters a null input value.
For example, when you enter the constant value UNKNOWN DEPT as the default, the Integration Service replaces all null values in the DEPT_NAME port with the string UNKNOWN DEPT:
DEPT_NAME    REPLACED VALUE
Housewares   Housewares
NULL         UNKNOWN DEPT
Produce      Produce
Figure 1-9 shows a default value that instructs the Integration Service to skip null values:
Figure 1-9. Using the ERROR Function to Skip Null Input Values
When you use the ERROR function as a default value, the Integration Service skips the row with the null value. The Integration Service writes all rows skipped by the ERROR function into the session log file. It does not write these rows to the session reject file.
DEPT_NAME    RETURN VALUE
Housewares   Housewares
NULL         'Error. DEPT is NULL' (Row is skipped)
Produce      Produce
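The two behaviors resemble the following Python sketch. This is an illustration only; the SKIP sentinel and the function are hypothetical and not part of PowerCenter:

# Illustration only: the SKIP sentinel and function are hypothetical.
SKIP = object()  # stands in for a row skipped by the ERROR function

def apply_default(value, default, skip_on_null=False):
    if value is None:  # None stands in for NULL
        if skip_on_null:
            print("to session log: Error. DEPT is NULL")  # row is not written to the reject file
            return SKIP
        return default  # replace the null value with the constant default
    return value

rows = ["Housewares", None, "Produce"]

# Constant default: the null value is replaced and the row is kept.
print([apply_default(r, "UNKNOWN DEPT") for r in rows])
# ['Housewares', 'UNKNOWN DEPT', 'Produce']

# ERROR-style default: the row with the null value is skipped.
kept = [v for v in (apply_default(r, None, skip_on_null=True) for r in rows) if v is not SKIP]
print(kept)
# ['Housewares', 'Produce']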
The following session log shows where the Integration Service skips the row with the null value:
TE_11019 Port [DEPT_NAME]: Default value is: ERROR(<<Transformation Error>> [error]: Error. DEPT is NULL ... error('Error. DEPT is NULL') ).
CMN_1053 EXPTRANS: : ERROR: NULL input column DEPT_NAME: Current Input data:
CMN_1053 Rowid=2 Input row from SRCTRANS: Rowdata: ( RowType=4 Src Rowid=2 Targ
For more information about the ERROR function, see the Transformation Language Reference.
To override output transformation errors, complete one of the following tasks:
- Replace the error with a constant value or constant expression. The Integration Service does not skip the row.
- Abort the session with the ABORT function.
- Write specific messages in the session log for transformation errors.
You cannot enter user-defined default output values for input/output ports.
Table 1-8 summarizes how the Integration Service handles output port transformation errors and default values in transformations:
Table 1-8. Supported Default Values for Output Ports
- Transformation Error. System default. When a transformation error occurs and you did not override the default value, the Integration Service performs the following tasks:
  - Increases the transformation error count by 1.
  - Skips the row, and writes the error and input row to the session log file or row error log, depending on session configuration. The Integration Service does not write the row to the reject file.
- Constant value or constant expression. User-defined. The Integration Service replaces the error with the default value. It does not increase the error count or write a message to the session log.
- ABORT. User-defined. The session aborts and the Integration Service writes a message to the session log. The Integration Service does not increase the error count or write rows to the reject file.
Replacing Errors
If you do not want the Integration Service to skip a row when a transformation error occurs, use a constant or constant expression as the default value for an output port. For example, if you have a numeric output port called NET_SALARY and you want to use the constant value 9999 when a transformation error occurs, assign the default value 9999 to the NET_SALARY port. If there is any transformation error (such as dividing by zero) while computing the value of NET_SALARY, the Integration Service uses the default value 9999.
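The effect resembles the following Python sketch, a simplified analogy in which a failed calculation falls back to the port default instead of skipping the row. The function and sample values are hypothetical:

# Simplified analogy: a transformation error (divide by zero) falls back
# to the port default 9999 instead of skipping the row.
def net_salary(gross, months_worked, default=9999):
    try:
        return round(gross / months_worked, 2)
    except ZeroDivisionError:  # stands in for a transformation error
        return default

print(net_salary(60000, 12))  # 5000.0
print(net_salary(60000, 0))   # 9999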
The following examples show how user-defined default values may override the ERROR function in the expression:
- Constant value or expression. The constant value or expression overrides the ERROR function in the output port expression. For example, if you enter 0 as the default value, the Integration Service passes the value 0 when it encounters an error. It does not skip the row or write Negative Sale in the session log.
- ABORT. The ABORT function overrides the ERROR function in the output port expression. If you use the ABORT function as the default value, the Integration Service aborts the session when a transformation error occurs.
- ERROR. If you use the ERROR function as the default value, the Integration Service includes the following information in the session log:
  - Error message from the default value
  - Error message indicated in the ERROR function in the output port expression
  - Skipped row
For example, you can override the default value with the following ERROR function:
ERROR('No default value')
The Integration Service skips the row and includes both error messages in the log:
TE_7007 Transformation Evaluation Error; current row skipped...
TE_7007 [<<Transformation Error>> [error]: Negative Sale ... error('Negative Sale') ]
Sun Sep 20 13:57:28 1998
TE_11019 Port [OUT_SALES]: Default value is: ERROR(<<Transformation Error>> [error]: No default value ... error('No default value')
Consider the following rules and guidelines when you enter default values:
- The default value must be either a NULL, a constant value, a constant expression, an ERROR function, or an ABORT function.
- For input/output ports, the Integration Service uses default values to handle null input values. The output default value of input/output ports is always ERROR(Transformation Error).
- Variable ports do not use default values.
- You can assign default values to group by ports in the Aggregator and Rank transformations.
- Not all port types in all transformations allow user-defined default values. If a port does not allow user-defined default values, the default value field is disabled. For more information, see Table 1-6 on page 22.
- If a transformation is not connected to the mapping data flow, the Integration Service ignores user-defined default values.
- If any input port is unconnected, its value is assumed to be NULL and the Integration Service uses the default value for that input port.
- If an input port default value contains the ABORT function and the input value is NULL, the Integration Service immediately stops the session. Use the ABORT function as a default value to restrict null input values. The first null value in an input port stops the session.
- If an output port default value contains the ABORT function and any transformation error occurs for that port, the session immediately stops. Use the ABORT function as a default value to enforce strict rules for transformation errors. The first transformation error for this port stops the session.
- The ABORT function, constant values, and constant expressions override ERROR functions configured in output port expressions.
Figure 1-10 shows the user-defined value for a port and the Validate button:
Figure 1-10. Entering and Validating Default Values
The Designer also validates default values when you save a mapping. If you enter an invalid default value, the Designer marks the mapping invalid.
By default, the tracing level for every transformation is Normal. Change the tracing level to a Verbose setting only when you need to debug a transformation that is not behaving as expected. For a slight performance boost, you can set the tracing level to Terse, which writes the minimum of detail to the session log when you run a workflow containing the transformation. When you configure a session, you can override the tracing levels for individual transformations with a single tracing level for all transformations in the session.
Reusable Transformations
Mappings can contain reusable and non-reusable transformations. Non-reusable transformations exist within a single mapping. Reusable transformations can be used in multiple mappings.

For example, you might create an Expression transformation that calculates value-added tax for sales in Canada, which is useful when you analyze the cost of doing business in that country. Rather than perform the same work every time, you can create a reusable transformation. When you need to incorporate this transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes.

The Designer stores each reusable transformation as metadata separate from any mapping that uses the transformation. If you review the contents of a folder in the Navigator, you see the list of all reusable transformations in that folder. Each reusable transformation falls within a category of transformations available in the Designer. For example, you can create a reusable Aggregator transformation to perform the same aggregate calculations in multiple mappings, or a reusable Stored Procedure transformation to call the same stored procedure in multiple mappings.

You can create most transformations as non-reusable or reusable. However, you can only create the External Procedure transformation as a reusable transformation. When you add instances of a reusable transformation to mappings, you must be careful that changes you make to the transformation do not invalidate the mapping or generate unexpected data.
logs an error. For more information, see Mapping Parameters and Variables in the Designer Guide.
You can create a reusable transformation in the following ways:
- Design it in the Transformation Developer. In the Transformation Developer, you can build new reusable transformations.
- Promote a non-reusable transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of reusable transformation. The transformation designed in the mapping then becomes an instance of a reusable transformation maintained elsewhere in the repository.
If you promote a transformation to reusable status, you cannot demote it. However, you can create a non-reusable instance of it.
Note: Sequence Generator transformations must be reusable in mapplets. You cannot demote reusable Sequence Generator transformations in mapplets.
To create a reusable transformation in the Transformation Developer, complete the following steps:
1. In the Designer, switch to the Transformation Developer.
2. Click the button on the Transformation toolbar corresponding to the type of transformation you want to create.
3. Drag within the workbook to create the transformation.
4. Double-click the transformation title bar to open the dialog displaying its properties.
5. Click the Rename button, enter a descriptive name for the transformation, and click OK.
6. Click the Ports tab, then add any input and output ports you need for this transformation.
7. Set the other properties of the transformation, and click OK. These properties vary according to the transformation you create. For example, if you create an Expression transformation, you need to enter an expression for one or more of the transformation output ports. If you create a Stored Procedure transformation, you need to identify the stored procedure to call.
8. Click Repository > Save.
To promote a non-reusable transformation, complete the following steps:
1. In the Designer, open a mapping and double-click the title bar of the transformation you want to promote.
2. Select the Make Reusable option.
3. When prompted whether you are sure you want to promote the transformation, click Yes.
4. Click OK to return to the mapping.
5. Click Repository > Save.
Now, when you look at the list of reusable transformations in the folder you are working in, the newly promoted transformation appears in this list.
To create a non-reusable instance of a reusable transformation, complete the following steps:
1. In the Designer, open a mapping.
2. In the Navigator, select an existing transformation and drag the transformation into the mapping workspace. Hold down the Ctrl key before you release the transformation. The status bar displays the following message:
   Make a non-reusable copy of this transformation and add it to this mapping.
3. Release the transformation. The Designer creates a non-reusable instance of the existing reusable transformation.
4. Click Repository > Save.
To add a reusable transformation to a mapping, complete the following steps:
1. In the Designer, switch to the Mapping Designer.
2. Open or create a mapping.
3. In the list of repository objects, drill down until you find the reusable transformation you want in the Transformations section of a folder.
4. Drag the transformation from the Navigator into the mapping.
5. Link the new transformation to other transformations or target definitions.
6. Click Repository > Save.
The following changes to a reusable transformation can invalidate mappings that use it:
- When you delete a port or multiple ports in a transformation, you disconnect the instance from part or all of the data flow through the mapping.
- When you change a port datatype, you make it impossible to map data from that port to another port using an incompatible datatype.
- When you change a port name, expressions that refer to the port are no longer valid.
- When you enter an invalid expression in the reusable transformation, mappings that use the transformation are no longer valid. The Integration Service cannot run sessions based on invalid mappings.
Figure 1-11 shows how you can revert to the original properties of the reusable transformation:
Figure 1-11. Reverting to Original Reusable Transformation Properties
Chapter 2
Aggregator Transformation
This chapter includes the following topics:
- Overview, 40
- Components of the Aggregator Transformation, 41
- Configuring Aggregate Caches, 44
- Aggregate Expressions, 45
- Group By Ports, 47
- Using Sorted Input, 50
- Creating an Aggregator Transformation, 52
- Tips, 53
- Troubleshooting, 54
Overview
Transformation type: Active Connected
The Aggregator transformation lets you perform aggregate calculations, such as averages and sums. The Integration Service performs aggregate calculations as it reads and stores necessary data group and row data in an aggregate cache. The Aggregator transformation is unlike the Expression transformation, in that you use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. When you use the transformation language to create aggregate expressions, you can use conditional clauses to filter rows, providing more flexibility than SQL language. After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally. For more information about incremental aggregation, see the Workflow Administration Guide.
Components of the Aggregator Transformation

The Aggregator transformation has the following components and options:
- Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache. For more information, see Configuring Aggregate Caches on page 44.
- Aggregate expression. Enter an expression in an output port. The expression can include non-aggregate expressions and conditional clauses. For more information, see Aggregate Expressions on page 45.
- Group by port. Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified. For more information, see Group By Ports on page 47.
- Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order. For more information, see Using Sorted Input on page 50.
You can configure the Aggregator transformation components and options on the Properties and Ports tabs.
- Index Cache Size. Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can configure the Integration Service to determine the cache size at run time, or you can configure a numeric value. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
- Transformation Scope. Specifies how the Integration Service applies the transformation logic to incoming data:
  - Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
  - All Input. Applies the transformation logic on all incoming data. When you choose All Input, PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
  For more information about transformation scope, see the Workflow Administration Guide.
When you configure an Aggregator transformation, you can complete the following tasks:
- Enter an expression in any output port, using conditional clauses or non-aggregate functions in the port.
- Create multiple aggregate output ports.
- Configure any input, input/output, output, or variable port as a group by port.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, reducing the size of the data cache.
- Use variable ports for local variables.
- Create connections to other transformations as you enter an expression.
The Integration Service uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.
Aggregate Expressions
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function, such as:
MAX( COUNT( ITEM ))
The result of an aggregate expression varies depending on the group by ports used in the transformation. For example, when the Integration Service calculates the following aggregate expression with no group by ports defined, it finds the total quantity of items sold:
SUM( QUANTITY )
However, if you use the same expression, and you group by the ITEM port, the Integration Service returns the total quantity of items sold, by item. You can create an aggregate expression in any output port and use multiple aggregate ports in a transformation. For more information about creating expressions, see Working with Expressions on page 10.
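As a rough illustration, the following Python sketch with hypothetical sample rows shows how the same aggregate collapses to one value without a group key and produces one value per item with one:

# Sketch of SUM( QUANTITY ) with and without a group by port,
# using hypothetical (ITEM, QUANTITY) rows.
from collections import defaultdict

rows = [("battery", 2), ("battery", 4), ("AAA", 2)]

# No group by ports: one total for all input rows.
print(sum(qty for _, qty in rows))  # 8

# Group by ITEM: one total per item.
totals = defaultdict(int)
for item, qty in rows:
    totals[item] += qty
print(dict(totals))  # {'battery': 6, 'AAA': 2}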
Aggregate Functions
Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function. The transformation language includes the following aggregate functions:
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
When you use any of these functions, you must use them in an expression within an Aggregator transformation. For a description of these functions, see Functions in the Transformation Language Reference.
Conditional Clauses
Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE. For example, use the following expression to calculate the total commissions of employees who exceeded their quarterly quota:
SUM( COMMISSION, COMMISSION > QUOTA )
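A rough Python equivalent, with hypothetical sample rows, shows how the conditional clause filters rows inside the aggregation:

# Rough equivalent of SUM( COMMISSION, COMMISSION > QUOTA ):
# only rows that satisfy the condition contribute to the sum.
employees = [
    {"commission": 1200, "quota": 1000},
    {"commission": 800, "quota": 1000},
    {"commission": 1500, "quota": 900},
]
total = sum(e["commission"] for e in employees if e["commission"] > e["quota"])
print(total)  # 2700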
Non-Aggregate Functions
You can also use non-aggregate functions in the aggregate expression. The following expression returns the highest number of items sold for each item (grouped by item). If no items were sold, the expression returns 0.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
Group By Ports
The Aggregator transformation lets you define groups for aggregations, rather than performing the aggregation across all input data. For example, rather than finding the total company sales, you can find the total sales grouped by region.

To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports to create a new group for each unique combination. The Integration Service then performs the defined aggregation for each group.

When you group values, the Integration Service produces one row for each group. If you do not group values, the Integration Service returns one row for all input rows. The Integration Service typically returns the last row of each group (or the last row received) with the result of the aggregation. However, if you specify a particular row to be returned (for example, by using the FIRST function), the Integration Service then returns the specified row.

When selecting multiple group by ports in the Aggregator transformation, the Integration Service uses port order to determine the order by which it groups. Since group order can affect the results, order group by ports to ensure the appropriate grouping. For example, the results of grouping by ITEM_ID then QUANTITY can vary from grouping by QUANTITY then ITEM_ID, because the numeric values for quantity are not necessarily unique.

The following Aggregator transformation groups first by STORE_ID and then by ITEM:
The Integration Service performs the aggregate calculation on the following unique groups:
STORE_ID  ITEM
101       'battery'
101       'AAA'
201       'battery'
301       'battery'
The Integration Service then passes the last row received, along with the results of the aggregation, as follows:
STORE_ID  ITEM       QTY  PRICE  SALES_PER_STORE
101       'battery'  2    2.59   17.34
101       'AAA'      2    2.45   4.90
201       'battery'  4    1.59   8.35
301       'battery'  1    2.45   2.45
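The following Python sketch models this behavior. The input rows are hypothetical but chosen so the output matches the table above; the model simplifies the Integration Service to grouping sorted rows, aggregating QTY * PRICE, and returning the last row of each group:

# Simplified model: group sorted rows, aggregate SUM(QTY * PRICE) as
# SALES_PER_STORE, and return the last row received for each group.
from itertools import groupby

rows = [  # hypothetical (STORE_ID, ITEM, QTY, PRICE), sorted by the group by ports
    (101, "battery", 3, 2.99),
    (101, "battery", 1, 3.19),
    (101, "battery", 2, 2.59),
    (101, "AAA", 2, 2.45),
    (201, "battery", 1, 1.99),
    (201, "battery", 4, 1.59),
    (301, "battery", 1, 2.45),
]

for (store, item), group in groupby(rows, key=lambda r: (r[0], r[1])):
    group = list(group)
    sales_per_store = sum(qty * price for _, _, qty, price in group)
    _, _, qty, price = group[-1]  # the last row received for the group
    print(store, item, qty, price, round(sales_per_store, 2))
# 101 battery 2 2.59 17.34
# 101 AAA 2 2.45 4.9
# 201 battery 4 1.59 8.35
# 301 battery 1 2.45 2.45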
Non-Aggregate Expressions
Use non-aggregate expressions in group by ports to modify or replace groups. For example, if you want to replace AAA battery before grouping, you can create a new group by output port, named CORRECTED_ITEM, using the following expression:
IIF( ITEM = 'AAA battery', 'battery', ITEM )
Default Values
Use default values in the group by port to replace null input values. This allows the Integration Service to include null item groups in the aggregation. For more information about default values, see Using Default Values for Ports on page 20.
For example, if you define a default value of Misc in the ITEM column, the Integration Service replaces null groups with Misc.
Using Sorted Input

You can improve Aggregator transformation performance with the sorted input option. If you use sorted input and do not presort data correctly, you receive unexpected results.

Do not use sorted input if either of the following conditions is true:
- The aggregate expression uses nested aggregate functions.
- The session uses incremental aggregation.

If you use sorted input and do not sort data correctly, the session fails.
Pre-Sorting Data
To use sorted input, you pass sorted data through the Aggregator. Data must be sorted in the following ways:
- By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
- Using the same sort order configured for the session. If data is not in strict ascending or descending order based on the session sort order, the Integration Service fails the session. For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.
For relational and file sources, use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation. You can place the Sorter transformation anywhere in the mapping prior to the Aggregator if no transformation changes the order of the sorted data. Group by columns in the Aggregator transformation must be in the same order as they appear in the Sorter transformation. For information about sorting data using the Sorter transformation, see Sorter Transformation on page 457.

If the session uses relational sources, you can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group by columns must be in the same order in both the Aggregator and Source Qualifier transformations. For information about sorting data in the Source Qualifier, see Using Sorted Ports on page 494.

The following mapping shows a Sorter transformation configured to sort the source data in ascending order by ITEM_NO:
With sorted input, the Aggregator transformation returns the following results:
ITEM_NAME  QTY  PRICE  INCOME_PER_ITEM
Cereal     2    5.25   14.99
Soup       2    3.25   21.25
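The reason sorted input performs better can be pictured as follows: the Integration Service can finish a group as soon as the group key changes, instead of caching all groups until end of data. The Python sketch below is a simplified model with hypothetical rows:

# Sketch of single-pass aggregation over sorted input: when the group key
# changes, the current group is complete and can be emitted immediately,
# so no cache of other groups is needed.
def aggregate_sorted(rows):
    current_key, total = None, 0.0
    for key, income in rows:
        if current_key is not None and key != current_key:
            yield current_key, round(total, 2)  # group finished: emit and reset
            total = 0.0
        current_key = key
        total += income
    if current_key is not None:
        yield current_key, round(total, 2)

sorted_rows = [("Cereal", 7.50), ("Cereal", 7.49), ("Soup", 21.25)]
print(list(aggregate_sorted(sorted_rows)))
# [('Cereal', 14.99), ('Soup', 21.25)]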
Creating an Aggregator Transformation

To create an Aggregator transformation, complete the following steps:
1. In the Mapping Designer, click Transformation > Create. Select the Aggregator transformation.
2. Enter a name for the Aggregator, click Create, and then click Done. The Designer creates the Aggregator transformation.
3. Drag the ports to the Aggregator transformation. The Designer creates input/output ports for each port you include.
4. Double-click the title bar of the transformation to open the Edit Transformations dialog box.
5. Select the Ports tab.
6. Click the group by option for each column you want the Aggregator to use in creating groups. Optionally, enter a default value to replace null groups.
7. Click Add to add an expression port. The expression port must be an output port. Make the port an output port by clearing Input (I). For more information about creating expressions, see Working with Expressions on page 10.
8. Optionally, add default values for specific ports. If the target database does not handle null values and certain ports are likely to contain null values, specify a default value.
9. Configure properties on the Properties tab.
10. Click OK.
11. Click Repository > Save to save changes to the mapping.
Tips
- Use sorted input to decrease the use of aggregate caches. Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.
- Limit connected input/output or output ports. Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
- Filter before aggregating. If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
Troubleshooting
I selected sorted input but the workflow takes the same amount of time as before.

You cannot use sorted input if any of the following conditions are true:
- The aggregate expression contains nested aggregate functions.
- The session uses incremental aggregation.
- Source data is data driven.
When any of these conditions are true, the Integration Service processes the transformation as if you do not use sorted input.

A session using an Aggregator transformation causes slow performance.

The Integration Service may be paging to disk during the workflow. You can increase session performance by increasing the index and data cache sizes in the transformation properties. For more information about caching, see Session Caches in the Workflow Administration Guide.

I entered an override cache directory in the Aggregator transformation, but the Integration Service saves the session incremental aggregation files somewhere else.

You can override the transformation cache directory on a session level. The Integration Service notes the cache directory in the session log. You can also check the session properties for an override cache directory.
Chapter 3

Complex Data Transformation
This chapter includes the following topics:
- Overview, 56
- Configuring the Complex Data Exchange Repository Folder, 58
- Complex Data Transformation Settings, 59
- Complex Data Transformation Ports, 62
- Complex Data Transformation Components, 63
- Creating a Mapping, 65
- Steps to Configure a Complex Data Transformation, 69
Overview
Transformation type: Active/Passive Connected
The Complex Data transformation integrates with Complex Data Exchange to transform data in unstructured and semi-structured file formats. Complex Data Exchange transforms documents of any format, such as Microsoft Word, Excel, HTML, and PDF files. It also transforms data in structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.

When you run a session with the Complex Data transformation, the transformation passes either source data or the source data file path to the Complex Data Exchange Engine. The Complex Data Exchange Engine is the Complex Data Exchange runtime module that executes data transformations. Complex Data Exchange runs a service that transforms the data. A service is a Complex Data Exchange data transformation that you deploy to a repository. The Complex Data Exchange Engine can execute services from the repository. The Complex Data Exchange Engine passes the output back to the Complex Data transformation, or it writes the output directly to an output file.
Complex Data Exchange includes the following types of components to transform data:
- Parser. Converts source documents to XML.
- Serializer. Converts XML to other formats.
- Mapper. Converts XML files to XML files with a different structure.
- Transformer. Modifies the data in any format.
- Streamer. Splits large input documents into segments. The streamer processes documents that have multiple messages or records in them, such as HIPAA or EDI files.
For more information about creating projects with Complex Data Exchange, see Getting Started with Complex Data Exchange.
The Complex Data transformation calls a service that you deploy to the Complex Data Exchange repository. Complete the following steps to transform data with the Complex Data transformation:
1. Install the Unstructured Data Option on the client machine and the machine that runs the Integration Service.
2. Configure a transformation project in Complex Data Exchange Studio.
3. Deploy the project as a Complex Data Exchange service.
4. Create a Complex Data transformation that calls the Complex Data Exchange service. For more information, see Complex Data Transformation Components on page 63.
5. Configure a mapping with the Complex Data transformation. For more information, see Creating a Mapping on page 65.
To configure the repository folder location, open Complex Data Exchange Configuration from the Windows Start menu. The repository location is in the following path in the Complex Data Exchange Configuration:
CM Configuration > CM Repository > File System > Base Path
The Complex Data Exchange Engine runs the services that you deploy to the repository. To run a service in a PowerCenter session, copy the Complex Data Exchange service folder to the Complex Data Exchange repository on the computer that runs the Integration Service. The Unstructured Data Option must be installed on that machine. If the development computer can access the remote file system, you can change the Complex Data Exchange repository to a remote location and deploy services directly from the Complex Data Exchange Studio to the remote computer that runs the Integration Service. For more information about deploying services to remote machines, see the Complex Data Exchange Studio User Guide. If you added any custom files to the Complex Data Exchange autoInclude\user or the externLibs\user directory, copy them to the autoInclude\user or externLibs\user directory on the machine that runs the Integration Service. For information about using these directories, see External Components in the Complex Data Exchange Engine Developer Guide.
After you create the transformation you can change these settings on the CDET Settings tab. Figure 3-2 shows the CDET Settings tab:
Figure 3-2. CDET Settings Tab
Input Type
The input type determines how the Complex Data Exchange Engine receives input data from the Complex Data transformation. The input type determines whether the InputBuffer port contains source data or a source document path. The input type is Buffer or File.
Buffer
When the input type is Buffer, the Complex Data transformation receives source data in the InputBuffer port. The Integration Service passes data from the port to the Complex Data Exchange Engine.
File
When the input type is File, the Complex Data transformation receives the source file path in the InputBuffer port. The Integration Service passes the source file path to the Complex Data Exchange Engine. The Complex Data Exchange Engine opens the source file. You can use the File input type when you need to parse binary files such as Microsoft Excel or Microsoft Word files.
Output Type
The output type determines how the Complex Data Exchange Engine returns output data. The Complex Data transformation with the Buffer or File output type returns one row for each input row. A Complex Data transformation with the Splitting XML output type can generate multiple rows for each input row. Table 3-1 shows the contents of the Complex Data transformation ports by output type:
Table 3-1. Complex Data Transformation Port Contents by Output Type
- File output type. The OutputFileName port (input) receives a file name, and the Complex Data Exchange Engine creates an output file with the name. The OutputBuffer port (output) returns the file name when Complex Data Exchange successfully writes the output.
- Buffer output type. The OutputBuffer port (output) returns transformed data from Complex Data Exchange.
- Splitting output type. The OutputBuffer port (output) returns XML data from Complex Data Exchange. The XML file can be split across multiple rows.
Buffer
The Complex Data Exchange returns transformed data back to the Integration Service. The Complex Data transformation receives the data and writes it to the target from the OutputBuffer port.
File
The Complex Data Exchange Engine writes the output to a file. It does not return the data to the Integration Service. The Complex Data Exchange Engine names the output file based on the file name from the OutputFileName column. The Integration Service writes the output file name to the target for each source row that the Complex Data Exchange Engine transforms. When an error occurs, the Integration Service writes a NULL value to the target and returns a row error. If the output file name is blank, the Integration Service returns a row error. You can choose the File output type when you transform XML to binary data such as a PDF file or a Microsoft Excel file. The Complex Data Exchange Engine writes the output file instead of the Complex Data transformation.
Splitting
The Complex Data transformation splits XML data from the Complex Data Exchange into multiple rows. Configure Splitting output for XML files that are too large for the OutputBuffer port. When you configure Splitting output, pass the XML data to the XML Parser transformation. Configure the XML Parser transformation to process the multiple XML rows as one XML file.
When you choose the File output type, the Designer adds a port for the file name.
When you create a Complex Data transformation, the Designer creates the CDETInput group and the CDETOutput group. Table 3-2 describes the default ports:
Table 3-2. Complex Data Transformation Ports
- InputBuffer (input). Receives source data or a path to the source document.
- OutputFileName (input). File output type only. Contains a name for the output file. If you do not connect OutputFileName to a downstream transformation or target, the mapping is invalid.
- OutputBuffer (output). Returns output from the Complex Data Engine. Returns the output file name when the Complex Data Engine writes the output file.
Complex Data Transformation Components

The Edit Transformations dialog box for a Complex Data transformation includes the following tabs:
- Transformation. Enter the name and description of the transformation. The naming convention for a Complex Data transformation is CD_TransformationName. You can also make the Complex Data transformation reusable.
- Ports. View the transformation ports and attributes. For more information, see Complex Data Transformation Ports on page 62.
- Properties. Configure the Complex Data transformation general properties such as Runtime Location and Tracing Level. For more information, see Properties Tab on page 63.
- CDET Ports. Configure Complex Data transformation ports and attributes. Modify port attributes and add pass-through ports. For more information, see Complex Data Transformation Ports on page 62.
- CDET Settings. Modify Complex Data transformation settings such as Input and Output type. For more information, see Complex Data Transformation Settings on page 59.
Properties Tab
Configure the Complex Data transformation general properties on the Properties tab. Some transformation properties do not apply to the transformation or are not configurable. Figure 3-4 shows the Complex Data transformation Properties tab:
Figure 3-4. Complex Data Transformation Properties Tab
The configurable properties on this tab include Tracing Level, IsPartitionable, and Output is Deterministic.
Creating a Mapping
When you create a mapping, you design the mapping according to the type of Complex Data Exchange project you are going to run. For example, the Complex Data Exchange Parser and Mapper generate XML data. You can link the OutputBuffer port in the Complex Data transformation to an XML Parser transformation.

The Complex Data Exchange Serializer component can generate any output from XML. It can generate HTML or binary files such as Microsoft Word or Microsoft Excel files. When the output is binary data, the Complex Data Exchange Engine can write the output to a file instead of passing it back to the Complex Data transformation.
The mapping includes the following objects:
- Source Qualifier transformation. Passes the Word document file name to the Complex Data transformation. The source file name contains the complete path to the file that contains order information.
- Complex Data transformation. Receives the source file name in the InputBuffer port. It passes the name to the Complex Data Exchange Engine. The Complex Data Exchange Engine runs a parser service to transform the data to XML. The Complex Data transformation returns the XML data to the OutputBuffer port.
- XML Parser transformation. Receives the XML data in the DataInput port. It parses the XML data and returns order header and detail information to relational targets.
The Source Qualifier transformation passes an XML file and a file name to the Complex Data transformation. The source definition contains employee names. The Complex Data transformation receives the XML file in the InputBuffer port and the file name in the OutputFileName port. It passes the XML data and the file name to the Complex Data Engine. The Complex Data Engine runs a serializer service to transform the XML data to a Microsoft Excel file. It writes the Excel file with a file name based on the value of OutputFileName. The Complex Data transformation returns the file name in the OutputBuffer port. The flat file target receives the file name.
When you configure the Complex Data transformation to split XML output, enable the XML Parser transformation to receive the XML data in multiple rows. Otherwise, the XML Parser transformation might receive an incomplete XML file and the session might fail. For more information about the XML Parser transformation, see Midstream XML Transformations in the XML Guide.

The Complex Data transformation returns data in pass-through ports whether the row is successful or not. When the transformation returns XML data in multiple rows, it generates the same pass-through data each time it generates a row. You can use a Filter transformation to remove the duplicate pass-through data before writing it to a target.

To enable the XML Parser transformation to receive XML files over multiple rows, select Enable XML Input Streaming in the transformation session properties. Figure 3-7 shows where to enable the XML Parser transformation to receive split XML data:
Figure 3-7. Enabling Streaming Input on the XML Parser Transformation
Enable streaming XML input when the XML file is in multiple rows.
Use the following rules and guidelines when you create a mapping with a Complex Data transformation:
- If a Complex Data transformation in a mapping has the File output type, you must link the OutputBuffer port to a downstream transformation. Otherwise, the mapping is invalid. The OutputBuffer port contains the output file name.
- Link Splitting XML output from the Complex Data transformation to an XML Parser transformation.
- You must configure a service name for the Complex Data transformation or the mapping is invalid.
Steps to Configure a Complex Data Transformation

To create a Complex Data transformation, complete the following steps:
1. In the Mapping Designer or Transformation Developer, click Transformation > Create.
2. Select Complex Data Transformation as the transformation type.
3. Enter a name for the transformation.
4. Click Create. The Complex Data Transformation dialog box appears.
5. Enter the required settings for the transformation. At a minimum, you must configure the service name.
6. Click OK.
7. Optionally, change the settings on the CDET Settings tab and add pass-through ports on the CDET Ports tab.
Chapter 4
Custom Transformation
Overview, 72
Creating Custom Transformations, 75
Working with Groups and Ports, 77
Working with Port Attributes, 80
Custom Transformation Properties, 82
Working with Transaction Control, 86
Blocking Input Data, 88
Working with Procedure Properties, 90
Creating Custom Transformation Procedures, 91
Overview
Transformation type: Active/Passive
Connected
Custom transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create a Custom transformation and bind it to a procedure that you develop using the functions described in Custom Transformation Functions on page 107. Use the Custom transformation to create transformation applications, such as sorting and aggregation, which require all input rows to be processed before outputting any output rows. To support this process, the input and output functions occur separately in Custom transformations compared to External Procedure transformations. The Integration Service passes the input data to the procedure using an input function. The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all the ports of the transformation. You can also use the Custom transformation to create a transformation that requires multiple input groups, multiple output groups, or both. A group is the representation of a row of data entering or leaving a transformation. For example, you might create a Custom transformation with one input group and multiple output groups that parses XML data. Or, you can create a Custom transformation with two input groups and one output group that merges two streams of input data into one stream of output data.
The following transformations, provided with PowerCenter and PowerExchange products, are built using the Custom transformation:
- Complex Data transformation with PowerCenter
- HTTP transformation with PowerCenter
- Java transformation with PowerCenter
- SQL transformation with PowerCenter
- Union transformation with PowerCenter
- XML Parser transformation with PowerCenter
- XML Generator transformation with PowerCenter
- SAP/ALE_IDoc_Interpreter transformation with PowerExchange for SAP NetWeaver mySAP Option
- SAP/ALE_IDoc_Prepare transformation with PowerExchange for SAP NetWeaver mySAP Option
- BAPI/RFC transformation with PowerExchange for SAP NetWeaver mySAP Option
- Web Service Consumer transformation with PowerExchange for Web Services
The code page of the data the Integration Service passes to the procedure depends on the following factors:
- The Integration Service data movement mode
- The INFA_CTChangeStringMode() function
- The INFA_CTSetDataCodePageID() function
The Custom transformation procedure code page must be two-way compatible with the Integration Service code page. The Integration Service passes data to the procedure in the Custom transformation procedure code page. Also, the data the procedure passes to the Integration Service must be valid characters in the Custom transformation procedure code page. By default, when the Integration Service runs in ASCII mode, the Custom transformation procedure code page is ASCII. Also, when the Integration Service runs in Unicode mode, the Custom transformation procedure code page is UCS-2, but the Integration Service only passes characters that are valid in the Integration Service code page. However, you can use the INFA_CTChangeStringMode() function in the procedure code to request the data in a different format. In addition, when the Integration Service runs in Unicode mode, you can request the data in a different code page using the INFA_CTSetDataCodePageID() function. Changing the format or requesting the data in a different code page changes the Custom transformation procedure code page to the code page the procedure requests:
- ASCII mode. You can write the external procedure code to request the data in UCS-2 format using the INFA_CTChangeStringMode() function. When you use this function, the procedure must pass only ASCII characters in UCS-2 format to the Integration Service. Do not use the INFA_CTSetDataCodePageID() function when the Integration Service runs in ASCII mode.
- Unicode mode. You can write the external procedure code to request the data in MBCS using the INFA_CTChangeStringMode() function. When the external procedure requests the data in MBCS, the Integration Service passes the data in the Integration Service code page. When you use the INFA_CTChangeStringMode() function, you can write the external procedure code to request the data in a different code page from the Integration Service code page using the INFA_CTSetDataCodePageID() function. The code page you specify in the INFA_CTSetDataCodePageID() function must be two-way compatible with the Integration Service code page.
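For example, an external procedure that needs MBCS data when the Integration Service runs in Unicode mode can request it during procedure initialization. The following is a minimal sketch: the p_myproc name is hypothetical, and the eASM_MBCS value and exact argument list of INFA_CTChangeStringMode() are assumptions to verify against the Change String Mode Function reference.

/* Minimal sketch: request string data in MBCS during procedure
 * initialization. The eASM_MBCS value and the argument list of
 * INFA_CTChangeStringMode() are assumptions. */
INFA_STATUS p_myproc_procInit( INFA_CT_PROCEDURE_HANDLE procedure )
{
    if (INFA_CTChangeStringMode( procedure, eASM_MBCS ) != INFA_SUCCESS)
        return INFA_FAILURE;

    /* Optionally request a specific code page with
     * INFA_CTSetDataCodePageID(). The code page you request must be
     * two-way compatible with the Integration Service code page. */
    return INFA_SUCCESS;
}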
Note: You can also use the INFA_CTRebindInputDataType() function to change the format of the input data. For more information, see Rebind Datatype Functions on page 133.
The Designer generates the following files:
- m_<module_name>.c. Defines the module. This file includes an initialization function, m_<module_name>_moduleInit(), that lets you write code you want the Integration Service to run when it loads the module. Similarly, this file includes a deinitialization function, m_<module_name>_moduleDeinit(), that lets you write code you want the Integration Service to run before it unloads the module.
- p_<procedure_name>.c. Defines the procedure in the module. This file contains the code that implements the procedure logic, such as data cleansing or merging data.
- makefile.aix, makefile.aix64, makefile.hp, makefile.hp64, makefile.hpparisc64, makefile.linux, makefile.sol, and makefile.sol64. Make files for the UNIX platforms. Use makefile.aix64 for 64-bit AIX platforms, makefile.sol64 for 64-bit Solaris platforms, and makefile.hp64 for 64-bit HP-UX (Itanium) platforms.
Consider the following rules and guidelines when you work with Custom transformations:
- Custom transformations are connected transformations. You cannot reference a Custom transformation in an expression.
- You can include multiple procedures in one module. For example, you can include an XML writer procedure and an XML parser procedure in the same module.
- You can bind one shared library or DLL to multiple Custom transformation instances if you write the procedure code to handle multiple Custom transformation instances.
- When you write the procedure code, you must make sure it does not violate basic mapping rules. For more information about mappings and mapping validation, see Mappings in the Designer Guide.
- The Custom transformation sends and receives high precision decimals as high precision decimals.
When you create or edit a Custom transformation, you can configure the following tabs:
- Transformation tab. You can rename the transformation and add a description on the Transformation tab.
- Ports tab. You can add and edit ports and groups to a Custom transformation. For more information about creating ports and groups, see Working with Groups and Ports on page 77. You can also define the input ports an output port depends on. For more information about defining port dependencies, see Defining Port Relationships on page 78.
- Port Attribute Definitions tab. You can create user-defined port attributes for Custom transformation ports. For more information about creating and editing port attributes, see Working with Port Attributes on page 80.
- Properties tab. You can define transformation properties such as module and function identifiers, transaction properties, and the runtime location. For more information about defining transformation properties, see Custom Transformation Properties on page 82.
- Initialization Properties tab. You can define properties that the external procedure uses at runtime, such as during initialization. For more information about creating initialization properties, see Working with Procedure Properties on page 90.
- Metadata Extensions tab. You can create metadata extensions to define properties that the procedure uses at runtime, such as during initialization. For more information about using metadata extensions for procedure properties, see Working with Procedure Properties on page 90.
The Ports tab displays the first input group header and the output group header. Use the buttons on the tab to add and delete groups and edit port attributes.
Consider the following rules and guidelines when you work with groups:
- You can change group names by typing in the group header. You can only enter ASCII characters for port and group names.
- Once you create a group, you cannot change the group type. If you need to change the group type, delete the group and add a new group.
- When you delete a group, the Designer deletes all ports of the same type in that group. However, all input/output ports remain in the transformation, belong to the group above them, and change to input ports or output ports, depending on the type of group you delete. For example, an output group contains output ports and input/output ports. You delete the output group. The Designer deletes the output ports. It changes the input/output ports to input ports. Those input ports belong to the input group with the header directly above them.
- To move a group up or down, select the group header and click the Move Port Up or Move Port Down button. The ports above and below the group header remain the same, but the groups to which they belong might change.
Figure 4-2 shows where you create and edit port dependencies:
Figure 4-2. Editing Port Dependencies
Choose an input or input/output port on which the output or input/output port depends.
For example, create an external procedure that parses XML data. You create a Custom transformation with one input group containing one input port and multiple output groups containing multiple output ports. According to the external procedure logic, all output ports depend on the input port. You can define this relationship in the Custom transformation by creating a port dependency for each output port. Define each port dependency so that the output port depends on the one input port.
To create a port dependency:
1. On the Ports tab, click Custom Transformation and choose Port Dependencies.
2. In the Output Port Dependencies dialog box, select an output or input/output port in the Output Port field.
3. In the Input Ports pane, select an input or input/output port on which the output port or input/output port depends.
4. Click Add.
5. Repeat steps 3 to 4 to include more input or input/output ports in the port dependency.
6. To create another port dependency, repeat steps 2 to 5.
7. Click OK.
When you create a port attribute, you define the following properties:
- Name. The name of the port attribute.
- Datatype. The datatype of the port attribute value. You can choose Boolean, Numeric, or String.
- Value. The default value of the port attribute. This property is optional. When you enter a value here, the value applies to all ports in the Custom transformation. You can override the port attribute value for each port on the Ports tab.
You define port attributes for each Custom transformation. You cannot copy a port attribute from one Custom transformation to another.
You can change the port attribute value for a particular port by clicking the Open button. This opens the Edit Port Attribute Default Value dialog box. Or, you can enter a new value by typing directly in the Value column. You can filter the ports listed in the Edit Port Level Attributes dialog box by choosing a group from the Select Group field.
The Properties tab of the Custom transformation includes properties such as Module Identifier, Function Identifier, Runtime Location, Is Active, Generate Transaction, and Output is Deterministic.
Use an active Custom transformation to set the update strategy for a mapping at the following levels:
- Within the procedure. You can write the external procedure code to set the update strategy for output rows. The external procedure can flag rows for insert, update, delete, or reject. For more information about the functions used to set the update strategy, see Row Strategy Functions (Row-Based Mode) on page 146.
- Within the mapping. Use the Custom transformation in a mapping to flag rows for insert, update, delete, or reject. Select the Update Strategy Transformation property for the Custom transformation.
- Within the session. Configure the session to treat the source rows as data driven.
If you do not configure the Custom transformation to define the update strategy, or you do not configure the session as data driven, the Integration Service does not use the external procedure code to flag the output rows. Instead, when the Custom transformation is active, the Integration Service flags the output rows as insert. When the Custom transformation is passive, the Integration Service retains the row type. For example, when a row flagged for update enters a passive Custom transformation, the Integration Service maintains the row type and outputs the row as update.
You can configure the Custom transformation so the Integration Service uses one thread to process the Custom transformation for each partition using the Requires Single Thread Per Partition property. When you configure a Custom transformation to process each partition with one thread, the Integration Service calls the following functions with the same thread for each partition:
- p_<proc_name>_partitionInit()
- p_<proc_name>_partitionDeinit()
- p_<proc_name>_inputRowNotification()
- p_<proc_name>_dataBdryNotification()
- p_<proc_name>_eofNotification()
You can include thread-specific operations in these functions because the Integration Service uses the same thread to process these functions for each partition. For example, you might attach and detach threads to a Java Virtual Machine.
Note: When you configure a Custom transformation to process each partition with one thread, the Workflow Manager adds partition points depending on the mapping configuration. For more information, see the Workflow Administration Guide.
You can configure the following transaction-related properties for the Custom transformation:
- Transformation Scope. Determines how the Integration Service applies the transformation logic to incoming data.
- Generate Transaction. Indicates that the procedure generates transaction rows and outputs them to the output groups.
Transformation Scope
You can configure how the Integration Service applies the transformation logic to incoming data. You can choose one of the following values:
- Row. Applies the transformation logic to one row of data at a time. Choose Row when the results of the procedure depend on a single row of data. For example, you might choose Row when a procedure parses a row containing an XML file.
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when the results of the procedure depend on all rows in the same transaction, but not on rows in other transactions. When you choose Transaction, you must connect all input groups to the same transaction control point. For example, you might choose Transaction when the external procedure performs aggregate calculations on the data in a single transaction.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, the Integration Service drops transaction boundaries. Choose All Input when the results of the procedure depend on all rows of data in the source. For example, you might choose All Input when the external procedure performs aggregate calculations on all incoming data, or when it sorts all incoming data.
For more information about transformation scope, see Understanding Commit Points in the Workflow Administration Guide.
Generate Transaction
You can write the external procedure code to output transactions, such as commit and rollback rows. When the external procedure outputs commit and rollback rows, configure the Custom transformation to generate transactions. Select the Generate Transaction transformation property. You can enable this property for active Custom transformations. For information about the functions you use to generate transactions, see Data Boundary Output Notification Function on page 139. When the external procedure outputs a commit or rollback row, it outputs or rolls back the row for all output groups. When you configure the transformation to generate transactions, the Integration Service treats the Custom transformation like a Transaction Control transformation. Most rules that apply to a Transaction Control transformation in a mapping also apply to the Custom transformation. For example, when you configure a Custom transformation to generate transactions, you cannot concatenate pipelines or pipeline branches containing the transformation. For more information about working with Transaction Control transformations, see Transaction Control Transformation on page 577. When you edit or create a session using a Custom transformation configured to generate transactions, configure it for user-defined commit.
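For example, an external procedure might output the rows of a transaction for a group and then generate a commit row. The following is a minimal sketch: the flushAndCommit helper name is hypothetical, and the argument list of INFA_CTDataBdryOutputNotification() (an output group handle plus eBT_COMMIT or eBT_ROLLBACK) is an assumption to verify against the Data Boundary Output Notification Function reference.

/* Minimal sketch: output the pending rows for a group, then generate a
 * commit row. The Integration Service outputs the commit row for all
 * output groups. */
INFA_ROWSTATUS flushAndCommit( INFA_CT_OUTPUTGROUP_HANDLE outputGroup )
{
    INFA_ROWSTATUS rowStatus = INFA_CTOutputNotification( outputGroup );

    if (rowStatus != INFA_ROWSUCCESS)
        return rowStatus;

    if (INFA_CTDataBdryOutputNotification( outputGroup, eBT_COMMIT )
            != INFA_SUCCESS)
        return INFA_FATALERROR;

    return INFA_ROWSUCCESS;
}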
The procedure code includes two algorithms, one that uses blocking and the other that copies the source data to a buffer allocated by the procedure instead of blocking data. The code checks whether or not the Integration Service allows the Custom transformation to block data. The procedure uses the algorithm with the blocking functions when it can block, and uses the other algorithm when it cannot block. You might want to do this to create a Custom transformation that you use in multiple mapping configurations. For more information about verifying whether the Integration Service allows a Custom transformation to block data, see Validating Mappings with Custom Transformations on page 89.
Note: When the procedure blocks data and you configure the Custom transformation as a non-blocking transformation, the Integration Service might fail the session.
Validating at Runtime
When you run a session, the Integration Service validates the mapping against the procedure code at runtime. When the Integration Service does this, it tracks whether or not it allows the Custom transformations to block data:
- Configure the Custom transformation as a blocking transformation. The Integration Service always allows the Custom transformation to block data.
- Configure the Custom transformation as a non-blocking transformation. The Integration Service allows the Custom transformation to block data depending on the mapping configuration. If the Integration Service can block data at the Custom transformation without blocking all sources in the target load order group simultaneously, it allows the Custom transformation to block data.
You can write the procedure code to check whether or not the Integration Service allows a Custom transformation to block data. Use the INFA_CT_getInternalProperty() function to access the INFA_CT_TRANS_MAY_BLOCK_DATA property ID. The Integration Service returns TRUE when the Custom transformation can block data, and it returns FALSE when the Custom transformation cannot block data. For more information about the INFA_CT_getInternalProperty() function, see Property Functions on page 126.
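For example, procedure code can read this property with INFA_CTGetInternalPropertyBool(), whose syntax appears in Property Functions on page 126. The following minimal sketch assumes the transformation handle was obtained earlier with INFA_CTGetChildrenHandles():

/* Minimal sketch: ask the Integration Service whether this Custom
 * transformation is allowed to block data, so the procedure can choose
 * between the blocking and non-blocking algorithms. */
INFA_BOOLEN canBlock = 0;

if (INFA_CTGetInternalPropertyBool( transformation,
                                    INFA_CT_TRANS_MAY_BLOCK_DATA,
                                    &canBlock ) != INFA_SUCCESS)
{
    return INFA_FAILURE;
}
/* canBlock is TRUE when the Custom transformation can block data */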
You can define procedure properties on the following tabs of the Custom transformation:
- Metadata Extensions. You can specify the property name, datatype, precision, and value. Use metadata extensions for passing information to the procedure. For more information about creating metadata extensions, see Metadata Extensions in the Repository Guide.
- Initialization Properties. You can specify the property name and value.
While you can define properties on both tabs in the Custom transformation, the Metadata Extensions tab lets you provide more detail for the property. Use metadata extensions to pass properties to the procedure. For example, you create a Custom transformation external procedure that sorts data after transforming it. You could create a boolean metadata extension named Sort_Ascending. When you use the Custom transformation in a mapping, you can choose True or False for the metadata extension, depending on how you want the procedure to sort the data. When you define a property in the Custom transformation, use the get all property names functions, such as INFA_CTGetAllPropertyNamesM(), to access the names of all properties defined on the Initialization Properties and Metadata Extensions tabs. Use the get external property functions, such as INFA_CT_getExternalPropertyM(), to access the property name and value of a property ID you specify.
Note: When you define a metadata extension and an initialization property with the same
name, the property functions only return information for the metadata extension.
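For example, initialization code could read the Sort_Ascending metadata extension described above with INFA_CTGetExternalPropertyBoolM(), whose syntax appears in Table 5-10. The following is a minimal sketch; it assumes the transformation handle is the appropriate handle for the property:

/* Minimal sketch: read the Sort_Ascending metadata extension and fail
 * initialization when the property is not defined. */
INFA_BOOLEN bSortAscending = 0;

if (INFA_CTGetExternalPropertyBoolM( transformation,
                                     "Sort_Ascending",
                                     &bSortAscending ) != INFA_SUCCESS)
{
    INFA_CTLogMessageM( eESL_ERROR,
                        "Sort_Ascending is not defined on the transformation" );
    return INFA_FAILURE;
}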
This section includes an example to demonstrate this process. The steps in this section create a Custom transformation that contains two input groups and one output group. The Custom transformation procedure verifies that the Custom transformation uses two input groups and one output group. It also verifies that the number of ports in all groups are equal and that the port datatypes are the same for all groups. The procedure takes rows of data from each input group and outputs all rows to the output group.
1. In the Transformation Developer, click Transformation > Create.
2. In the Create Transformation dialog box, choose Custom transformation, enter a transformation name, and click Create.
   In the Union example, enter CT_Inf_Union as the transformation name.
3. In the Active or Passive dialog box, create the transformation as a passive or active transformation, and click OK.
   In the Union example, choose Active.
4. Click Done to close the Create Transformation dialog box.
5. Open the transformation and click the Ports tab. Create groups and ports. You can edit the groups and ports later, if necessary.
   For more information about creating groups and ports, see Working with Groups and Ports on page 77.
In the Union example, create the groups and ports shown in Figure 4-6:
Figure 4-6. Custom Transformation Ports Tab - Union Example
6. Select the Properties tab and enter a module and function identifier and the runtime location. Edit other transformation properties.
   For more information about Custom transformation properties, see Custom Transformation Properties on page 82.
7. Click the Metadata Extensions tab to enter metadata extensions, such as properties the external procedure might need for initialization. For more information about using metadata extensions for procedure properties, see Working with Procedure Properties on page 90.
   In the Union example, do not create metadata extensions.
8. Click the Port Attribute Definitions tab to create port attributes, if necessary. For more information about creating port attributes, see Working with Port Attributes on page 80.
   In the Union example, do not create port attributes.
After you create the Custom transformation that calls the procedure, the next step is to generate the C files.
1. In the Transformation Developer, select the transformation and click Transformation > Generate Code.
2. Select the procedure you just created. The Designer lists the procedures as <module_name>.<procedure_name>.
   In the Union example, select UnionDemo.Union.
3. Specify the directory where you want to generate the files, and click Generate.
   In the Union example, select <client_installation_directory>/TX.
The Designer creates a subdirectory, <module_name>, in the directory you specified. In the Union example, the Designer creates <client_installation_directory>/TX/UnionDemo. It also creates the following files:
- m_UnionDemo.c
- m_UnionDemo.h
- p_Union.c
- p_Union.h
- makefile.aix (32-bit), makefile.aix64 (64-bit), makefile.hp (32-bit), makefile.hp64 (64-bit), makefile.hpparisc64, makefile.linux (32-bit), and makefile.sol (32-bit).
1. Open p_<procedure_name>.c for the procedure.
   In the Union example, open p_Union.c.
2. Enter the C code for the procedure.
3. Save the modified file.
In the Union example, use the following code:
/************************************************************************** * * Copyright (c) 2005 Informatica Corporation. This file contains * material proprietary to Informatica Corporation and may not be copied * or distributed in any form without the written permission of Informatica * Corporation * **************************************************************************/
/**************************************************************************
 * Custom Transformation p_union Procedure File
 *
 * This file contains the functions that will be called by the main
 * server executable.
 *
 * for more information on these files,
 * see $(INFA_HOME)/ExtProc/include/Readme.txt
 **************************************************************************/
/*
 * INFORMATICA 'UNION DEMO' developed using the API for custom
 * transformations.
 *
 * File Name: p_Union.c
 *
 * An example of a custom transformation ('Union') using PowerCenter8.0
 *
 * The purpose of the 'Union' transformation is to combine pipelines with the
 * same row definition into one pipeline (i.e. union of multiple pipelines).
 * [ Note that it does not correspond to the mathematical definition of union
 * since it does not eliminate duplicate rows.]
 *
 * This example union transformation allows N input pipelines ( each
 * corresponding to an input group) to be combined into one pipeline.
 *
 * To use this transformation in a mapping, the following attributes must be
 * true:
 * a. The transformation must have >= 2 input groups and only one output group.
 * b. In the Properties tab set the following properties:
 *      i.   Module Identifier: UnionDemo
 *      ii.  Function Identifier: Union
 *      iii. Inputs May Block: Unchecked
 *      iv.  Is Active: Checked
 *      v.   Update Strategy Transformation: Unchecked
 *      vi.  Transformation Scope: All Input
 *
 *    This version of the union transformation does not provide code for
 *    changing the update strategy or for generating transactions.
 *
 * c. The input groups and the output group must have the same number of ports
 *    and the same datatypes. This is verified in the initialization of the
 *    module and the session is failed if this is not true.
 * d. This transformation can be used multiple times in a Target
 *    Load Order Group and can also be contained within multiple partitions.
 */
Description: Initialization for the procedure. Returns INFA_SUCCESS if procedure initialization succeeds, else return INFA_FAILURE.
Input: procedure - the handle for the procedure Output: None Remarks: This function will get called once for the session at initialization time. It will be called after the moduleInit function. **************************************************************************/
INFA_STATUS p_union_procInit( INFA_CT_PROCEDURE_HANDLE procedure) { const INFA_CT_TRANSFORMATION_HANDLE* transformation = NULL; const INFA_CT_PARTITION_HANDLE* partition = NULL; size_t nTransformations = 0, nPartitions = 0, i = 0;
/* Log a message indicating beginning of the procedure initialization */ INFA_CTLogMessageM( eESL_LOG, "union_demo: Procedure initialization started ..." );
    /* Get the transformation handles */
    transformation = INFA_CTGetChildrenHandles( procedure,
                                                &nTransformations,
                                                TRANSFORMATIONTYPE );

    /* For each transformation verify that the 0th partition has the correct
     * properties. This does not need to be done for all partitions since rest
     * of the partitions have the same information */
    for (i = 0; i < nTransformations; i++)
    {
        /* Get the partition handle */
        partition = INFA_CTGetChildrenHandles( transformation[i],
                                               &nPartitions,
                                               PARTITIONTYPE );
if (validateProperties(partition) != INFA_SUCCESS) { INFA_CTLogMessageM( eESL_ERROR, "union_demo: Failed to validate attributes of " "the transformation"); return INFA_FAILURE; } }
return INFA_SUCCESS; }
Description: Deinitialization for the procedure. Returns INFA_SUCCESS if procedure deinitialization succeeds, else return INFA_FAILURE.
Input: procedure - the handle for the procedure Output: None Remarks: This function will get called once for the session at deinitialization time. It will be called before the moduleDeinit function. **************************************************************************/
Description: Initialization for the partition. Returns INFA_SUCCESS if partition initialization succeeds, else return INFA_FAILURE.
Input: partition - the handle for the partition Output: None Remarks: This function will get called once for each partition for each transformation in the session. **************************************************************************/
Description: Deinitialization for the partition. Returns INFA_SUCCESS if partition deinitialization succeeds, else return INFA_FAILURE.
Input: partition - the handle for the partition Output: None Remarks: This function will get called once for each partition for each transformation in the session. **************************************************************************/
Description: Notification that a row needs to be processed for an input group in a transformation for the given partition. Returns INFA_ROWSUCCESS if the input row was processed successfully, INFA_ROWFAILURE if the input row was not processed successfully and INFA_FATALERROR if the input row causes the session to fail.
Input: partition - the handle for the partition for the given row group - the handle for the input group for the given row Output: None Remarks: This function is probably where the meat of your code will go, as it is called for every row that gets sent into your transformation. **************************************************************************/
INFA_ROWSTATUS p_union_inputRowNotification( INFA_CT_PARTITION_HANDLE partition,
                                             INFA_CT_INPUTGROUP_HANDLE inputGroup)
{
    const INFA_CT_OUTPUTGROUP_HANDLE* outputGroups = NULL;
    const INFA_CT_INPUTPORT_HANDLE* inputGroupPorts = NULL;
    const INFA_CT_OUTPUTPORT_HANDLE* outputGroupPorts = NULL;
    size_t nNumInputPorts = 0, nNumOutputGroups = 0,
           nNumPortsInOutputGroup = 0, i = 0;
/* Get the output group port handles */ outputGroups = INFA_CTGetChildrenHandles(partition, &nNumOutputGroups, OUTPUTGROUPTYPE);
/* Get the input groups port handles */ inputGroupPorts = INFA_CTGetChildrenHandles(inputGroup, &nNumInputPorts, INPUTPORTTYPE);
    /* Get the output group ports handles */
    outputGroupPorts = INFA_CTGetChildrenHandles( outputGroups[0],
                                                  &nNumPortsInOutputGroup,
                                                  OUTPUTPORTTYPE );

    /* For the union transformation, on receiving a row of input, we need to
     * output that row on the output group. */
    for (i = 0; i < nNumInputPorts; i++)
    {
        INFA_CTSetData( outputGroupPorts[i],
                        INFA_CTGetDataVoid( inputGroupPorts[i] ) );
INFA_CTSetIndicator(outputGroupPorts[i], INFA_CTGetIndicator(inputGroupPorts[i]) );
INFA_CTSetLength(outputGroupPorts[i], INFA_CTGetLength(inputGroupPorts[i]) ); }
/* We know there is only one output group for each partition */ return INFA_CTOutputNotification(outputGroups[0]); }
Description: Notification that the last row for an input group has already been seen. Return INFA_FAILURE if the session should fail as a result of seeing this notification, INFA_SUCCESS otherwise.
Input: partition - the handle for the partition for the notification group - the handle for the input group for the notification Output: None **************************************************************************/
INFA_STATUS p_union_eofNotification( INFA_CT_PARTITION_HANDLE partition, INFA_CT_INPUTGROUP_HANDLE group) { INFA_CTLogMessageM( eESL_LOG, "union_demo: An input group received an EOF notification");
return INFA_SUCCESS; }
Description: Notification that a transaction has ended. The data boundary type can either be commit or rollback. Return INFA_FAILURE if the session should fail as a result of seeing this notification, INFA_SUCCESS otherwise.
Input: partition - the handle for the partition for the notification transactionType - commit or rollback Output: None **************************************************************************/
/* Helper functions */
Description: Validate that the transformation has all properties expected by a union transformation, such as at least two input groups, and only one output group. Return INFA_FAILURE if the session should fail since the transformation was invalid, INFA_SUCCESS otherwise.
Input: partition - the handle for the partition Output: None **************************************************************************/
INFA_STATUS validateProperties(const INFA_CT_PARTITION_HANDLE* partition) { const INFA_CT_INPUTGROUP_HANDLE* inputGroups = NULL; const INFA_CT_OUTPUTGROUP_HANDLE* outputGroups = NULL; size_t nNumInputGroups = 0, nNumOutputGroups = 0; const INFA_CT_INPUTPORT_HANDLE** allInputGroupsPorts = NULL; const INFA_CT_OUTPUTPORT_HANDLE* outputGroupPorts = NULL; size_t nNumPortsInOutputGroup = 0; size_t i = 0, nTempNumInputPorts = 0;
    /* Get the input and output group handles */
    inputGroups = INFA_CTGetChildrenHandles( partition[0],
                                             &nNumInputGroups,
                                             INPUTGROUPTYPE );

    outputGroups = INFA_CTGetChildrenHandles( partition[0],
                                              &nNumOutputGroups,
                                              OUTPUTGROUPTYPE );
    /* 1. Number of input groups must be >= 2 and number of output groups must
     * be equal to one. */
    if ( nNumInputGroups < 2 || nNumOutputGroups != 1 )
    {
        INFA_CTLogMessageM( eESL_ERROR,
                            "UnionDemo: There must be at least two input groups "
                            "and only one output group");
        return INFA_FAILURE;
    }
/* 2. Verify that the same number of ports are in each group (including * output group). */ outputGroupPorts = INFA_CTGetChildrenHandles(outputGroups[0], &nNumPortsInOutputGroup, OUTPUTPORTTYPE);
/* Allocate an array for all input groups ports */ allInputGroupsPorts = malloc(sizeof(INFA_CT_INPUTPORT_HANDLE*) * nNumInputGroups);
    for (i = 0; i < nNumInputGroups; i++)
    {
        /* Get the input ports of this input group */
        allInputGroupsPorts[i] = INFA_CTGetChildrenHandles( inputGroups[i],
                                                            &nTempNumInputPorts,
                                                            INPUTPORTTYPE );

        if ( nNumPortsInOutputGroup != nTempNumInputPorts)
        {
            INFA_CTLogMessageM( eESL_ERROR,
                                "UnionDemo: The number of ports in all input and "
                                "the output group must be the same.");
            return INFA_FAILURE;
        }
    }
free(allInputGroupsPorts);
/* 3. Datatypes of ports in input group 1 must match data types of all other * groups.
TODO:*/
return INFA_SUCCESS; }
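The datatype comparison that the example leaves as a TODO could be sketched as follows. This is a minimal sketch, not part of the generated example: it reuses the handles fetched earlier in validateProperties(), and it assumes that the INFA_CT_PORT_BOUNDDATATYPE property ID (an Integer property) reports a comparable port datatype; verify the property ID against the Property Functions reference before relying on it.

/* Minimal sketch for the TODO above: compare the datatype of each port in
 * every input group with the corresponding port of the output group. */
{
    size_t g = 0, p = 0;

    for (g = 0; g < nNumInputGroups; g++)
    {
        for (p = 0; p < nNumPortsInOutputGroup; p++)
        {
            INFA_INT32 nInType = 0, nOutType = 0;

            INFA_CTGetInternalPropertyInt32( allInputGroupsPorts[g][p],
                                             INFA_CT_PORT_BOUNDDATATYPE,
                                             &nInType );
            INFA_CTGetInternalPropertyInt32( outputGroupPorts[p],
                                             INFA_CT_PORT_BOUNDDATATYPE,
                                             &nOutType );

            if (nInType != nOutType)
            {
                INFA_CTLogMessageM( eESL_ERROR,
                                    "UnionDemo: Port datatypes must match "
                                    "across all groups." );
                return INFA_FAILURE;
            }
        }
    }
}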
To build the module on Windows with Microsoft Visual C++:
1. Start Visual C++.
2. Click File > New.
3. In the New dialog box, click the Projects tab and select the Win32 Dynamic-Link Library option.
4. Enter its location.
   In the Union example, enter <client_installation_directory>/TX/UnionDemo.
5. Enter the name of the project. You must use the module name specified for the Custom transformation as the project name.
   In the Union example, enter UnionDemo.
6. Click OK.
   Visual C++ creates a wizard to help you define the project components.
7. In the wizard, select An empty DLL project and click Finish. Click OK in the New Project Information dialog box.
   Visual C++ creates the project files in the directory you specified.
8. Click Project > Add To Project > Files.
9. Navigate up a directory level. This directory contains the procedure files you created. Select all .c files and click OK.
   In the Union example, add the following files:
   - m_UnionDemo.c
   - p_Union.c
10. Click Project > Settings.
11. Click the C/C++ tab, and select Preprocessor from the Category field.
12. In the Additional Include Directories field, enter the following path and click OK:
    ..; <PowerCenter_install_dir>\extproc\include\ct
13. Click Build > Build <module_name>.dll or press F7 to build the project.
    Visual C++ creates the DLL and places it in the debug or release directory under the project directory.
To build the module on UNIX:
1. Copy all C files and makefiles generated by the Designer to the UNIX machine.
   Note: If you build the shared library on a machine other than the Integration Service machine, you must also copy the files in the following directory to the build machine: <PowerCenter_install_dir>\ExtProc\include\ct
   In the Union example, copy all files in <client_installation_directory>/TX/UnionDemo.
2. Set the environment variable INFA_HOME to the Integration Service installation directory.
Note: If you specify an incorrect directory path for the INFA_HOME environment variable, the build fails because make cannot locate the Informatica header files.
In this mapping, two sources with the same ports and datatypes connect to the two input groups in the Custom transformation. The Custom transformation takes the rows from both sources and outputs them all through its one output group. The output group has the same ports and datatypes as the input groups.
1. In the Workflow Manager, create a workflow.
2. Create a session for this mapping in the workflow.
3. Copy the shared library or DLL to the runtime location directory.
4. Run the workflow containing the session.
When the Integration Service loads a Custom transformation bound to a procedure, it loads the DLL or shared library and calls the procedure you define.
Chapter 5
Custom Transformation Functions
Overview, 108
Function Reference, 110
Working with Rows, 114
Generated Functions, 116
API Functions, 122
Array-Based API Functions, 148
Overview
Custom transformations operate in conjunction with procedures you create outside of the Designer to extend PowerCenter functionality. The Custom transformation functions allow you to develop the transformation logic in a procedure you associate with a Custom transformation. PowerCenter provides two sets of functions called generated and API functions. The Integration Service uses generated functions to interface with the procedure. When you create a Custom transformation and generate the source code files, the Designer includes the generated functions in the files. Use the API functions in the procedure code to develop the transformation logic. When you write the procedure code, you can configure it to receive a block of rows from the Integration Service or a single row at a time. You can increase the procedure performance when it receives and processes a block of rows. For more information about receiving rows from the Integration Service, see Working with Rows on page 114.
The handle hierarchy includes handle types such as INFA_CT_PROC_HANDLE, INFA_CT_INPUTPORT_HANDLE, and INFA_CT_OUTPUTPORT_HANDLE.
Function Reference
The Custom transformation functions include generated and API functions. Table 5-2 lists the Custom transformation generated functions:
Table 5-2. Custom Transformation Generated Functions
- m_<module_name>_moduleInit(). Module initialization function. For more information, see Module Initialization Function on page 116.
- p_<proc_name>_procInit(). Procedure initialization function. For more information, see Procedure Initialization Function on page 117.
- p_<proc_name>_partitionInit(). Partition initialization function. For more information, see Partition Initialization Function on page 117.
- p_<proc_name>_inputRowNotification(). Input row notification function. For more information, see Input Row Notification Function on page 118.
- p_<proc_name>_dataBdryNotification(). Data boundary notification function. For more information, see Data Boundary Notification Function on page 119.
- p_<proc_name>_eofNotification(). End of file notification function. For more information, see End Of File Notification Function on page 119.
- p_<proc_name>_partitionDeinit(). Partition deinitialization function. For more information, see Partition Deinitialization Function on page 120.
- p_<proc_name>_procedureDeinit(). Procedure deinitialization function. For more information, see Procedure Deinitialization Function on page 120.
- m_<module_name>_moduleDeinit(). Module deinitialization function. For more information, see Module Deinitialization Function on page 121.
The Custom transformation API functions include the following:
- INFA_CTGetAllPropertyNamesU()
- INFA_CTGetExternalProperty<datatype>M()
- INFA_CTGetExternalProperty<datatype>U()
- INFA_CTRebindInputDataType()
- INFA_CTRebindOutputDataType()
- INFA_CTGetData<datatype>()
- INFA_CTSetData()
- INFA_CTGetIndicator()
- INFA_CTSetIndicator()
- INFA_CTGetLength()
- INFA_CTSetLength()
- INFA_CTSetPassThruPort()
- INFA_CTOutputNotification()
- INFA_CTDataBdryOutputNotification()
- INFA_CTGetErrorMsgU()
- INFA_CTGetErrorMsgM()
- INFA_CTLogMessageU()
Processing a block of rows can increase session performance for the following reasons:
- You can decrease the number of function calls the Integration Service and procedure make. The Integration Service calls the input row notification function fewer times, and the procedure calls the output notification function fewer times.
- You can increase the locality of memory access space for the data.
- You can write the procedure code to perform an algorithm on a block of data instead of each row of data.
By default, the procedure receives a row of data at a time. To receive a block of rows, you must include the INFA_CTSetDataAccessMode() function to change the data access mode to array-based. When the data access mode is array-based, you must use the array-based data handling and row strategy functions to access and output the data. When the data access mode is row-based, you must use the row-based data handling and row strategy functions to access and output the data. All array-based functions use the prefix INFA_CTA. All other functions use the prefix INFA_CT. For more information about the array-based functions, see Array-Based API Functions on page 148.
Use the following steps to write the procedure code to access a block of rows:
1. Call INFA_CTSetDataAccessMode() during the procedure initialization, to change the data access mode to array-based.
2. When you create a passive Custom transformation, you can also call INFA_CTSetPassThruPort() during procedure initialization to pass through the data for input/output ports.
   When a block of data reaches the Custom transformation procedure, the Integration Service calls p_<proc_name>_inputRowNotification() for each block of data. Perform the rest of the steps inside this function.
3. Call INFA_CTAGetNumRows() using the input group handle in the input row notification function to find the number of rows in the current block.
4. Call one of the INFA_CTAGetData<datatype>() functions using the input port handle to get the data for a particular row in the block.
5. Call INFA_CTASetData to output rows in a block.
6. Before calling INFA_CTOutputNotification(), call INFA_CTASetNumRows() to notify the Integration Service of the number of rows the procedure is outputting in the block.
7. Call INFA_CTOutputNotification().
Consider the following rules when you write the procedure code to process blocks of data, and see the sketch after this list:
- In row-based mode, you can return INFA_ROWERROR in the input row notification function to indicate the function encountered an error for the row of data on input. The Integration Service increments the internal error count.
- In array-based mode, do not return INFA_ROWERROR in the input row notification function. The Integration Service treats that as a fatal error. If you need to indicate a row in a block has an error, call the INFA_CTASetInputErrorRowM() or INFA_CTASetInputErrorRowU() function.
- In row-based mode, the Integration Service only passes valid rows to the procedure.
- In array-based mode, an input block may contain invalid rows, such as dropped, filtered, or error rows. Call INFA_CTAIsRowValid() to determine if a row in a block is valid.
- In array-based mode, do not call INFA_CTASetNumRows() for a passive Custom transformation. You can call this function for active Custom transformations.
- In array-based mode, call INFA_CTOutputNotification() once.
- In array-based mode, you can call INFA_CTSetPassThruPort() only for passive Custom transformations.
- In array-based mode for passive Custom transformations, you must output all rows in an output block, including any error row.
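The following minimal sketch shows the shape of an array-based input row notification implied by these rules. The p_myproc name is hypothetical, and the argument shapes are assumptions to verify against Array-Based API Functions on page 148: it assumes INFA_CTAGetNumRows() takes the input group handle, INFA_CTAIsRowValid() takes the input group handle and a row index, and INFA_CTASetNumRows() takes the output group handle and a row count.

/* Minimal array-based skeleton for an active Custom transformation.
 * Argument shapes are assumptions; see the array-based API reference. */
INFA_ROWSTATUS p_myproc_inputRowNotification( INFA_CT_PARTITION_HANDLE partition,
                                              INFA_CT_INPUTGROUP_HANDLE inputGroup )
{
    const INFA_CT_OUTPUTGROUP_HANDLE* outputGroups = NULL;
    size_t nNumOutputGroups = 0, nRows = 0, i = 0;

    outputGroups = INFA_CTGetChildrenHandles( partition,
                                              &nNumOutputGroups,
                                              OUTPUTGROUPTYPE );

    /* Find the number of rows in the current block */
    nRows = INFA_CTAGetNumRows( inputGroup );

    for (i = 0; i < nRows; i++)
    {
        /* Skip dropped, filtered, or error rows in the block */
        if (!INFA_CTAIsRowValid( inputGroup, i ))
            continue;

        /* ... read row i with INFA_CTAGetData<datatype>() and write the
         * output values with INFA_CTASetData() ... */
    }

    /* Notify the Integration Service of the number of output rows, then
     * output the block once. */
    INFA_CTASetNumRows( outputGroups[0], nRows );
    return INFA_CTOutputNotification( outputGroups[0] );
}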
Generated Functions
When you use the Designer to generate the procedure code, the Designer includes a set of functions called generated functions in the m_<module_name>.c and p_<procedure_name>.c files. The Integration Service uses the generated functions to interface with the procedure. When you run a session, the Integration Service calls these generated functions in the following order for each target load order group in the mapping:
1. Initialization functions
2. Notification functions
3. Deinitialization functions
Initialization Functions
The Integration Service first calls the initialization functions. Use the initialization functions to write processes you want the Integration Service to run before it passes data to the Custom transformation. Writing code in the initialization functions reduces processing overhead because the Integration Service runs these processes only once for a module, procedure, or partition. The Designer generates the following initialization functions:
- m_<module_name>_moduleInit(). For more information, see Module Initialization Function on page 116.
- p_<proc_name>_procInit(). For more information, see Procedure Initialization Function on page 117.
- p_<proc_name>_partitionInit(). For more information, see Partition Initialization Function on page 117.
Use the following syntax:
INFA_STATUS m_<module_name>_moduleInit( INFA_CT_MODULE_HANDLE module );
The module argument (INFA_CT_MODULE_HANDLE) is the module handle.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
Use the following syntax:
INFA_STATUS p_<proc_name>_procInit( INFA_CT_PROCEDURE_HANDLE procedure );
The procedure argument (INFA_CT_PROCEDURE_HANDLE) is the procedure handle.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
Use the following syntax:
INFA_STATUS p_<proc_name>_partitionInit( INFA_CT_PARTITION_HANDLE transformation );
The transformation argument (INFA_CT_PARTITION_HANDLE) is the partition handle.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
Note: When the Custom transformation requires one thread for each partition, you can include thread-specific operations in the partition initialization function. For more information about working with thread-specific procedure code, see Working with Thread-Specific Procedure Code on page 84.
Notification Functions
The Integration Service calls the notification functions when it passes a row of data to the Custom transformation. The Designer generates the following notification functions:
- p_<proc_name>_inputRowNotification(). For more information, see Input Row Notification Function on page 118.
- p_<proc_name>_dataBdryNotification(). For more information, see Data Boundary Notification Function on page 119.
- p_<proc_name>_eofNotification(). For more information, see End Of File Notification Function on page 119.
Note: When the Custom transformation requires one thread for each partition, you can
include thread-specific operations in the notification functions. For more information about working with thread-specific procedure code, see Working with Thread-Specific Procedure Code on page 84.
The datatype of the return value is INFA_ROWSTATUS. Use the following values for the return value:
- INFA_ROWSUCCESS. Indicates the function successfully processed the row of data.
- INFA_ROWERROR. Indicates the function encountered an error for the row of data. The Integration Service increments the internal error count. Only return this value when the data access mode is row. If the input row notification function returns INFA_ROWERROR in array-based mode, the Integration Service treats it as a fatal error. If you need to indicate a row in a block has an error, call the INFA_CTASetInputErrorRowM() or INFA_CTASetInputErrorRowU() function.
- INFA_FATALERROR. Indicates the function encountered a fatal error for the row of data or the block of data. The Integration Service fails the session.
The function receives the partition handle for the notification and the dataBoundaryType parameter. The Integration Service uses one of the following values for the dataBoundaryType parameter:
- eBT_COMMIT
- eBT_ROLLBACK
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
Deinitialization Functions
The Integration Service calls the deinitialization functions after it processes data for the Custom transformation. Use the deinitialization functions to write processes you want the Integration Service to run after it passes all rows of data to the Custom transformation. The Designer generates the following deinitialization functions:
- p_<proc_name>_partitionDeinit(). For more information, see Partition Deinitialization Function on page 120.
- p_<proc_name>_procDeinit(). For more information, see Procedure Deinitialization Function on page 120.
- m_<module_name>_moduleDeinit(). For more information, see Module Deinitialization Function on page 121.
Note: When the Custom transformation requires one thread for each partition, you can
include thread-specific operations in the initialization and deinitialization functions. For more information about working with thread-specific procedure code, see Working with Thread-Specific Procedure Code on page 84.
Use the following syntax:
INFA_STATUS p_<proc_name>_partitionDeinit( INFA_CT_PARTITION_HANDLE partition );
The partition argument (INFA_CT_PARTITION_HANDLE) is the partition handle.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
Note: When the Custom transformation requires one thread for each partition, you can
include thread-specific operations in the partition deinitialization function. For more information about working with thread-specific procedure code, see Working with Thread-Specific Procedure Code on page 84.
The function receives the procedure handle and the sessionStatus parameter. The Integration Service uses one of the following values for the sessionStatus parameter:
- INFA_SUCCESS. Indicates the session succeeded.
- INFA_FAILURE. Indicates the session failed.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
The function receives the module handle and the sessionStatus parameter. The Integration Service uses one of the following values for the sessionStatus parameter:
- INFA_SUCCESS. Indicates the session succeeded.
- INFA_FAILURE. Indicates the session failed.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value. When the function returns INFA_FAILURE, the Integration Service fails the session.
API Functions
PowerCenter provides a set of API functions that you use to develop the transformation logic. When the Designer generates the source code files, it includes the generated functions in the source code. Add API functions to the code to implement the transformation logic. The procedure uses the API functions to interface with the Integration Service. You must code API functions in the procedure C file. Optionally, you can also code API functions in the module C file. Informatica provides the following groups of API functions:
- Set data access mode. See Set Data Access Mode Function on page 122.
- Navigation. See Navigation Functions on page 123.
- Property. See Property Functions on page 126.
- Rebind datatype. See Rebind Datatype Functions on page 133.
- Data handling (row-based mode). See Data Handling Functions (Row-Based Mode) on page 135.
- Set pass-through port. See Set Pass-Through Port Function on page 138.
- Output notification. See Output Notification Function on page 139.
- Data boundary output notification. See Data Boundary Output Notification Function on page 139.
- Error. See Error Functions on page 140.
- Session log message. See Session Log Message Functions on page 141.
- Increment error count. See Increment Error Count Function on page 142.
- Is terminated. See Is Terminated Function on page 142.
- Blocking. See Blocking Functions on page 143.
- Pointer. See Pointer Functions on page 144.
- Change string mode. See Change String Mode Function on page 144.
- Set data code page. See Set Data Code Page Function on page 145.
- Row strategy (row-based mode). See Row Strategy Functions (Row-Based Mode) on page 146.
- Change default row strategy. See Change Default Row Strategy Function on page 147.
Informatica also provides array-based API functions. For more information about array-based API functions, see Array-Based API Functions on page 148.
When you set the data access mode to array-based, you must use the array-based versions of the data handling functions and row strategy functions. When you use a row-based data handling or row strategy function and you switch to array-based mode, you will get unexpected results. For example, the DLL or shared library might crash. You can only use this function in the procedure initialization function. If you do not use this function in the procedure code, the data access mode is row-based. However, when you want the data access mode to be row-based, include this function and set the access mode to row-based. For more information about the array-based functions, see Array-Based API Functions on page 148. Use the following syntax:
INFA_STATUS INFA_CTSetDataAccessMode( INFA_CT_PROCEDURE_HANDLE procedure, INFA_CT_DATA_ACCESS_MODE mode );
The function uses the following arguments:
- procedure (INFA_CT_PROCEDURE_HANDLE). Input. The procedure handle.
- mode (INFA_CT_DATA_ACCESS_MODE). Input. The data access mode. Use the following values for the mode parameter: eDA_ROW, eDA_ARRAY.
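For example, the following minimal sketch switches the procedure to array-based data access during procedure initialization; the p_myproc name is hypothetical.

/* Minimal sketch: choose array-based data access at initialization.
 * After this call, the procedure must use the array-based data handling
 * and row strategy functions. */
INFA_STATUS p_myproc_procInit( INFA_CT_PROCEDURE_HANDLE procedure )
{
    if (INFA_CTSetDataAccessMode( procedure, eDA_ARRAY ) != INFA_SUCCESS)
        return INFA_FAILURE;

    return INFA_SUCCESS;
}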
Navigation Functions
Use the navigation functions when you want the procedure to navigate through the handle hierarchy. For more information about handles, see Working with Handles on page 108. PowerCenter provides the following navigation functions:
- INFA_CTGetAncestorHandle(). For more information, see Get Ancestor Handle Function on page 123.
- INFA_CTGetChildrenHandles(). For more information, see Get Children Handles Function on page 124.
- INFA_CTGetInputPortHandle(). For more information, see Get Port Handle Functions on page 125.
- INFA_CTGetOutputPortHandle(). For more information, see Get Port Handle Functions on page 125.
INFA_CTGetAncestorHandle(). Use this function when you want the procedure to access an ancestor of a given handle. Use the following syntax:
INFA_CT_HANDLE INFA_CTGetAncestorHandle( INFA_CT_HANDLE handle, INFA_CTHandleType returnHandleType );
The handle argument is the handle name. Use the following values for the returnHandleType parameter:
- PROCEDURETYPE
- TRANSFORMATIONTYPE
- PARTITIONTYPE
- INPUTGROUPTYPE
- OUTPUTGROUPTYPE
- INPUTPORTTYPE
- OUTPUTPORTTYPE
The handle parameter specifies the handle whose parent you want the procedure to access. The Integration Service returns INFA_CT_HANDLE if you specify a valid handle in the function. Otherwise, it returns a null value. To avoid compilation errors, you must code the procedure to set a handle name to the return value. For example, you can enter the following code:
INFA_CT_MODULE_HANDLE module = INFA_CTGetAncestorHandle(procedureHandle, INFA_CT_HandleType);
INFA_CTGetChildrenHandles(). Use this function when you want the procedure to access the children of a given handle. Use the following syntax:
INFA_CT_HANDLE* INFA_CTGetChildrenHandles( INFA_CT_HANDLE handle, size_t* pnChildrenHandles, INFA_CTHandleType returnHandleType );
The function uses the following arguments:
- handle (INFA_CT_HANDLE). Input. The handle name.
- pnChildrenHandles (size_t*). Output. The Integration Service returns an array of children handles. The pnChildrenHandles parameter indicates the number of children handles in the array.
- returnHandleType (INFA_CTHandleType). Input. Use the following values for the returnHandleType parameter: PROCEDURETYPE, TRANSFORMATIONTYPE, PARTITIONTYPE, INPUTGROUPTYPE, OUTPUTGROUPTYPE, INPUTPORTTYPE, OUTPUTPORTTYPE.
The handle parameter specifies the handle whose children you want the procedure to access. The Integration Service returns INFA_CT_HANDLE* when you specify a valid handle in the function. Otherwise, it returns a null value. To avoid compilation errors, you must code the procedure to set a handle name to the returned value. For example, you can enter the following code:
INFA_CT_PARTITION_HANDLE partition = INFA_CTGetChildrenHandles(procedureHandle, pnChildrenHandles, INFA_CT_PARTITION_HANDLE_TYPE);
INFA_CTGetInputPortHandle(). Use this function when the procedure knows the output port handle for an input/output port and needs the input port handle. Use the following syntax:
INFA_CT_INPUTPORT_HANDLE INFA_CTGetInputPortHandle( INFA_CT_OUTPUTPORT_HANDLE outputPortHandle );
The outputPortHandle argument (INFA_CT_OUTPUTPORT_HANDLE) is the output port handle.
INFA_CTGetOutputPortHandle(). Use this function when the procedure knows the input port handle for an input/output port and needs the output port handle. Use the following syntax:
INFA_CT_OUTPUTPORT_HANDLE INFA_CTGetOutputPortHandle( INFA_CT_INPUTPORT_HANDLE inputPortHandle );
The inputPortHandle argument (INFA_CT_INPUTPORT_HANDLE) is the input port handle.
The Integration Service returns NULL when you use the get port handle functions with input or output ports.
Property Functions
Use the property functions when you want the procedure to access the Custom transformation properties. The property functions access properties on the following tabs of the Custom transformation:
- Ports
- Properties
- Initialization Properties
- Metadata Extensions
- Port Attribute Definitions
PowerCenter provides the following property functions:
- INFA_CTGetInternalProperty<datatype>(). For more information, see Get Internal Property Function on page 126.
- INFA_CTGetAllPropertyNamesM(). For more information, see Get All External Property Names (MBCS or Unicode) on page 132.
- INFA_CTGetAllPropertyNamesU(). For more information, see Get All External Property Names (MBCS or Unicode) on page 132.
- INFA_CTGetExternalProperty<datatype>M(). For more information, see Get External Properties (MBCS or Unicode) on page 132.
- INFA_CTGetExternalProperty<datatype>U(). For more information, see Get External Properties (MBCS or Unicode) on page 132.
Use the following functions when you want the procedure to access the properties:
INFA_CTGetInternalPropertyStringM(). Accesses a value of type string in MBCS for a given property ID. Use the following syntax:
INFA_STATUS INFA_CTGetInternalPropertyStringM( INFA_CT_HANDLE handle, size_t propId, const char** psPropValue );
INFA_CTGetInternalPropertyStringU(). Accesses a value of type string in Unicode for a given property ID. Use the following syntax:
INFA_STATUS INFA_CTGetInternalPropertyStringU( INFA_CT_HANDLE handle, size_t propId, const INFA_UNICHAR** psPropValue );
INFA_CTGetInternalPropertyInt32(). Accesses a value of type integer for a given property ID. Use the following syntax:
INFA_STATUS INFA_CTGetInternalPropertyInt32( INFA_CT_HANDLE handle, size_t propId, INFA_INT32* pnPropValue );
INFA_CTGetInternalPropertyBool(). Accesses a value of type Boolean for a given property ID. Use the following syntax:
INFA_STATUS INFA_CTGetInternalPropertyBool( INFA_CT_HANDLE handle, size_t propId, INFA_BOOLEN* pbPropValue );
INFA_CTGetInternalPropertyINFA_PTR(). Accesses a pointer to a value for a given property ID. Use the following syntax:
INFA_STATUS INFA_CTGetInternalPropertyINFA_PTR( INFA_CT_HANDLE handle, size_t propId, INFA_PTR* pvPropValue );
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
The internal property IDs include the following property IDs and value datatypes: INFA_CT_TRANS_OUTPUT_IS_REPEATABLE (Integer), INFA_CT_TRANS_FATAL_ERROR (Boolean), INFA_CT_TRANS_NUM_PARTITIONS (Integer), INFA_CT_TRANS_DATACODEPAGE (Integer), INFA_CT_TRANS_TRANSFORM_SCOPE (Integer), INFA_CT_PORT_PRECISION (Integer), INFA_CT_PORT_SCALE (Integer), and INFA_CT_PORT_BOUNDDATATYPE (Integer).
INFA_CTGetAllPropertyNamesM(). Accesses the property names in MBCS. Use the following syntax:
INFA_STATUS INFA_CTGetAllPropertyNamesM(INFA_CT_HANDLE handle, const char*const** paPropertyNames, size_t* pnProperties);
The function takes the following arguments:
- handle (INFA_CT_HANDLE, Input): Specify the handle name.
- paPropertyNames (const char*const**, Output): Specifies the property names. The Integration Service returns an array of property names in MBCS.
- pnProperties (size_t*, Output): Indicates the number of properties in the array.
INFA_CTGetAllPropertyNamesU(). Accesses the property names in Unicode. Use the following syntax:
INFA_STATUS INFA_CTGetAllPropertyNamesU(INFA_CT_HANDLE handle, const INFA_UNICHAR*const** pasPropertyNames, size_t* pnProperties);
The function takes the following arguments:
- handle (INFA_CT_HANDLE, Input): Specify the handle name.
- pasPropertyNames (const INFA_UNICHAR*const**, Output): Specifies the property names. The Integration Service returns an array of property names in Unicode.
- pnProperties (size_t*, Output): Indicates the number of properties in the array.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Use the INFA_CTGetAllPropertyNamesM() and INFA_CTGetAllPropertyNamesU() functions to access property names. For the handle parameter, specify a handle name from the handle hierarchy. The Integration Service fails the session if the handle name is invalid.
Note: If you define an initialization property with the same name as a metadata extension, the
Integration Service returns the metadata extension value. Use the following functions when you want the procedure to access the values of the properties:
INFA_CTGetExternalProperty<datatype>M(). Accesses the value of the property in MBCS. Use the syntax as shown in Table 5-10:
Table 5-10. Property Functions (MBCS)
String:  INFA_STATUS INFA_CTGetExternalPropertyStringM(INFA_CT_HANDLE handle, const char* sPropName, const char** psPropValue);
Integer: INFA_STATUS INFA_CTGetExternalPropertyINT32M(INFA_CT_HANDLE handle, const char* sPropName, INFA_INT32* pnPropValue);
Boolean: INFA_STATUS INFA_CTGetExternalPropertyBoolM(INFA_CT_HANDLE handle, const char* sPropName, INFA_BOOLEN* pbPropValue);
INFA_CTGetExternalProperty<datatype>U(). Accesses the value of the property in Unicode. Use the syntax as shown in Table 5-11:
Table 5-11. Property Functions (Unicode)
String:  INFA_STATUS INFA_CTGetExternalPropertyStringU(INFA_CT_HANDLE handle, INFA_UNICHAR* sPropName, INFA_UNICHAR** psPropValue);
Integer: INFA_STATUS INFA_CTGetExternalPropertyINT32U(INFA_CT_HANDLE handle, INFA_UNICHAR* sPropName, INFA_INT32* pnPropValue);
Boolean: INFA_STATUS INFA_CTGetExternalPropertyBoolU(INFA_CT_HANDLE handle, INFA_UNICHAR* sPropName, INFA_BOOLEN* pbPropValue);
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
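For example, the following sketch reads a string initialization property in MBCS. The property name "DELIMITER" is hypothetical, and handle is assumed to be a valid handle from the hierarchy:

/* Read a string property by name (the property name is hypothetical). */
const char* sDelim = NULL;
if (INFA_CTGetExternalPropertyStringM(handle, "DELIMITER", &sDelim)
        == INFA_SUCCESS && sDelim != NULL)
{
    /* sDelim points to the property value. */
}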
Consider the following rules when you rebind the datatype for an output or input/output port:
- You must use the data handling functions to set the data and the indicator for that port. Use the INFA_CTSetData() and INFA_CTSetIndicator() functions in row-based mode, and use the INFA_CTASetData() function in array-based mode.
- Do not call the INFA_CTSetPassThruPort() function for the output port.
The rebind function takes the following arguments:
- Output port handle.
- The datatype with which you rebind the port. Use the following values for the datatype parameter:
  - eINFA_CTYPE_SHORT
  - eINFA_CTYPE_INT32
  - eINFA_CTYPE_CHAR
  - eINFA_CTYPE_RAW
  - eINFA_CTYPE_UNICHAR
  - eINFA_CTYPE_TIME
  - eINFA_CTYPE_FLOAT
  - eINFA_CTYPE_DOUBLE
  - eINFA_CTYPE_DECIMAL18_FIXED
  - eINFA_CTYPE_DECIMAL28_FIXED
  - eINFA_CTYPE_INFA_CTDATETIME
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
PowerCenter provides the following data handling functions for row-based mode:
- INFA_CTGetData<datatype>(). For more information, see Get Data Functions (Row-Based Mode) on page 136.
- INFA_CTSetData(). For more information, see Set Data Function (Row-Based Mode) on page 136.
- INFA_CTGetIndicator(). For more information, see Indicator Functions (Row-Based Mode) on page 137.
- INFA_CTSetIndicator(). For more information, see Indicator Functions (Row-Based Mode) on page 137.
- INFA_CTGetLength(). For more information, see Length Functions on page 138.
- INFA_CTSetLength(). For more information, see Length Functions on page 138.
char* INFA_CTGetDataStringM(INFA_CT_INPUTPORT_HANDLE dataHandle);
IUNICHAR* INFA_CTGetDataStringU(INFA_CT_INPUTPORT_HANDLE dataHandle);
INFA_INT32 INFA_CTGetDataINT32(INFA_CT_INPUTPORT_HANDLE dataHandle);
double INFA_CTGetDataDouble(INFA_CT_INPUTPORT_HANDLE dataHandle);
INFA_CT_RAWDATE INFA_CTGetDataDate(INFA_CT_INPUTPORT_HANDLE dataHandle);
INFA_CT_RAWDEC18 INFA_CTGetDataRawDec18(INFA_CT_INPUTPORT_HANDLE dataHandle);
INFA_CT_RAWDEC28 INFA_CTGetDataRawDec28(INFA_CT_INPUTPORT_HANDLE dataHandle);
INFA_CT_DATETIME INFA_CTGetDataDateTime(INFA_CT_INPUTPORT_HANDLE dataHandle);
INFA_CTSetData(). Use this function when you want the procedure to pass a value to an output port. The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Note: If you use the INFA_CTSetPassThruPort() function on an input/output port, do not use the set data function for that port.
INFA_CTGetIndicator(). Gets the indicator for an input port. Use the following syntax:
INFA_INDICATOR INFA_CTGetIndicator(INFA_CT_INPUTPORT_HANDLE dataHandle);
The return value datatype is INFA_INDICATOR. Use the following values for INFA_INDICATOR:
- INFA_DATA_VALID. Indicates the data is valid.
- INFA_NULL_DATA. Indicates a null value.
- INFA_DATA_TRUNCATED. Indicates the data has been truncated.
INFA_CTSetIndicator(). Sets the indicator for an output port. Use the following syntax:
INFA_STATUS INFA_CTSetIndicator(INFA_CT_OUTPUTPORT_HANDLE dataHandle, INFA_INDICATOR indicator);
The function takes the following arguments:
- dataHandle (INFA_CT_OUTPUTPORT_HANDLE, Input): Output port handle.
- indicator (INFA_INDICATOR, Input): The indicator value for the output port. Use one of the following values:
  - INFA_DATA_VALID. Indicates the data is valid.
  - INFA_NULL_DATA. Indicates a null value.
  - INFA_DATA_TRUNCATED. Indicates the data has been truncated.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Note: If you use the INFA_CTSetPassThruPort() function on an input/output port, do not use this function to set the indicator for that port.
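For example, a row-based procedure might propagate a null from an input port to the corresponding output port. The following is a minimal sketch; inPort and outPort are assumed to be port handles saved during initialization:

if (INFA_CTGetIndicator(inPort) == INFA_NULL_DATA)
{
    /* Forward the null to the output port. */
    INFA_CTSetIndicator(outPort, INFA_NULL_DATA);
}
else
{
    INFA_CTSetIndicator(outPort, INFA_DATA_VALID);
}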
Length Functions
Use the length functions when you want the procedure to access the length of a string or binary input port, or to set the length of a binary or string output port. Use the following length functions:
INFA_CTGetLength(). Use this function for string and binary ports only. The Integration Service returns the length as the number of characters including trailing spaces. Use the following syntax:
INFA_UINT32 INFA_CTGetLength(INFA_CT_INPUTPORT_HANDLE dataHandle);
The return value datatype is INFA_UINT32. Use a value between zero and 2GB for the return value.
INFA_CTSetLength(). When the Custom transformation contains a binary or string output port, you must use this function to set the length of the data, including trailing spaces. Verify that the length you set for string and binary ports is not greater than the precision for that port. If you set the length greater than the port precision, you get unexpected results. For example, the session may fail. Use the following syntax:
INFA_STATUS INFA_CTSetLength(INFA_CT_OUTPUTPORT_HANDLE dataHandle, IUINT32 length);
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
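The following is a minimal sketch that writes a string value to a string output port in row-based mode. It assumes outPort is a string output port handle and that INFA_CTSetData() takes the output port handle and a pointer to the data; the value is illustrative:

/* Write a string value, then set its length and indicator.
   Requires <string.h> for strlen(). */
const char* sValue = "USD";
INFA_CTSetData(outPort, (void*)sValue);
INFA_CTSetLength(outPort, (IUINT32)strlen(sValue));
INFA_CTSetIndicator(outPort, INFA_DATA_VALID);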
Use the INFA_CTSetPassThruPort() function when you want the Integration Service to pass data from an input port to an output port without modifying the data. Consider the following rules when you use this function:
- Only use this function in an initialization function.
- If the procedure includes this function, do not include the INFA_CTSetData(), INFA_CTSetLength(), INFA_CTSetIndicator(), or INFA_CTASetData() functions to pass data to the output port.
- In row-based mode, you can only include this function when the transformation scope is Row. When the transformation scope is Transaction or All Input, this function returns INFA_FAILURE.
- In row-based mode, when you use this function to output multiple rows for a given input row, every output row contains the data that is passed through from the input port.
- In array-based mode, you can only use this function for passive Custom transformations.
You must verify that the datatype, precision, and scale are the same for the input and output ports. The Integration Service fails the session if the datatype, precision, or scale are not the same for the input and output ports you specify in the INFA_CTSetPassThruPort() function.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Use the INFA_CTOutputNotification() function when you want the procedure to output a row of data to an output group. Only include this function in an input row notification function. If you include it somewhere else, it returns a failure. Use the following syntax:
INFA_ROWSTATUS INFA_CTOutputNotification(INFA_CT_OUTPUTGROUP_HANDLE group);
The function takes the following argument:
- group (INFA_CT_OUTPUTGROUP_HANDLE, Input): Output group handle.
The return value datatype is INFA_ROWSTATUS. Use the following values for the return value:
- INFA_ROWSUCCESS. Indicates the function successfully processed the row of data.
- INFA_ROWERROR. Indicates the function encountered an error for the row of data. The Integration Service increments the internal error count.
- INFA_FATALERROR. Indicates the function encountered a fatal error for the row of data. The Integration Service fails the session.
Note: When the procedure code calls the INFA_CTOutputNotification() function, you must
verify that all pointers in an output port handle point to valid data. When a pointer does not point to valid data, the Integration Service might shut down unexpectedly.
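For example, after the procedure sets the data and indicators for every port in an output group, it can emit the row and propagate the status. This is a sketch; outputGroup is assumed to be a valid output group handle, and the enclosing function is assumed to return INFA_ROWSTATUS:

/* Emit one output row and propagate a non-success status. */
INFA_ROWSTATUS rowStatus = INFA_CTOutputNotification(outputGroup);
if (rowStatus != INFA_ROWSUCCESS)
{
    return rowStatus;
}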
Data Boundary Output Notification Function
Use the INFA_CTDataBdryOutputNotification() function when you want the procedure to output a commit or rollback transaction. The function takes the following arguments:
- handle (Input): Handle name.
- dataBoundaryType (Input): The transaction type. Use the following values for the dataBoundaryType parameter:
  - eBT_COMMIT
  - eBT_ROLLBACK
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Error Functions
Use the error functions to access procedure errors. The Integration Service returns the most recent error. PowerCenter provides the following error functions:
INFA_CTGetErrorMsgM(). Gets the error message in MBCS. Use the following syntax:
const char* INFA_CTGetErrorMsgM();
INFA_CTGetErrorMsgU(). Gets the error message in Unicode. Use the following syntax:
const IUNICHAR* INFA_CTGetErrorMsgU();
INFA_CTLogMessageU(). Writes a message to the session log in Unicode. The function takes the following arguments:
- errorSeverityLevel (INFA_CT_ErrorSeverityLevel, Input): Severity level of the error message that you want the Integration Service to write in the session log. Use the following values for the errorSeverityLevel parameter: eESL_LOG, eESL_DEBUG, eESL_ERROR.
- msg (INFA_UNICHAR*, Input): Enter the text of the message in Unicode in quotes.
INFA_CTLogMessageM(). Writes a message to the session log in MBCS. The function takes the following arguments:
- errorSeverityLevel (INFA_CT_ErrorSeverityLevel, Input): Severity level of the error message that you want the Integration Service to write in the session log. Use the following values for the errorSeverityLevel parameter: eESL_LOG, eESL_DEBUG, eESL_ERROR.
- msg (char*, Input): Enter the text of the message in MBCS in quotes.
Increment Error Count Function
Use the INFA_CTIncrementErrorCount() function when you want the procedure to increase the error count for the session. The function takes the following arguments:
- Partition handle (Input).
- nErrors (Input): The Integration Service increments the error count by nErrors for the given transformation instance.
- pStatus (INFA_STATUS*, Input): The Integration Service uses INFA_FAILURE for the pStatus parameter when the error count exceeds the error threshold, and it fails the session.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Is Terminated Function
Use the INFA_CTIsTerminated() function when you want the procedure to check if the PowerCenter Client has requested the Integration Service to stop the session. You might call this function if the procedure includes a time-consuming process. Use the following syntax:
INFA_CTTerminateType INFA_CTIsTerminated(INFA_CT_PARTITION_HANDLE handle);
The function takes the following argument:
- handle (INFA_CT_PARTITION_HANDLE, Input): Partition handle.
The return value datatype is INFA_CTTerminateType. The Integration Service returns one of the following values:
- eTT_NOTTERMINATED. Indicates the PowerCenter Client has not requested to stop the session.
- eTT_ABORTED. Indicates the Integration Service aborted the session.
- eTT_STOPPED. Indicates the Integration Service stopped the session.
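You might structure a long-running loop around this check. The following is a sketch; partitionHandle is assumed to be a partition handle saved during initialization, and moreWorkToDo is a hypothetical loop condition:

/* Periodically check whether the session is being stopped or aborted. */
while (moreWorkToDo)
{
    if (INFA_CTIsTerminated(partitionHandle) != eTT_NOTTERMINATED)
    {
        break;  /* Stop processing; the session is ending. */
    }
    /* ... process the next unit of work ... */
}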
Blocking Functions
When the Custom transformation contains multiple input groups, you can write code to block the incoming data on an input group. For more information about blocking data, see Blocking Input Data on page 88. Consider the following rules when you use the blocking functions:
- You can block at most n-1 input groups.
- You cannot block an input group that is already blocked.
- You cannot block an input group when it receives data from the same source as another input group.
- You cannot unblock an input group that is already unblocked.
PowerCenter provides the following blocking functions:
INFA_CTBlockInputFlow(). Allows the procedure to block an input group. Use the following syntax:
INFA_STATUS INFA_CTBlockInputFlow(INFA_CT_INPUTGROUP_HANDLE group);
INFA_CTUnblockInputFlow(). Allows the procedure to unblock an input group. Use the following syntax:
INFA_STATUS INFA_CTUnblockInputFlow(INFA_CT_INPUTGROUP_HANDLE group);
The blocking functions take the following argument:
- group (INFA_CT_INPUTGROUP_HANDLE, Input): Input group handle.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Verify Blocking
When you use the INFA_CTBlockInputFlow() and INFA_CTUnblockInputFlow() functions in the procedure code, verify the procedure checks whether or not the Integration Service allows the Custom transformation to block incoming data. To do this, check the value of the INFA_CT_TRANS_MAY_BLOCK_DATA propID using the INFA_CTGetInternalPropertyBool() function. When the value of the INFA_CT_TRANS_MAY_BLOCK_DATA propID is FALSE, the procedure should either not use the blocking functions, or it should return a fatal error and stop the session. If the procedure code uses the blocking functions when the Integration Service does not allow the Custom transformation to block data, the Integration Service might fail the session.
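For example, the procedure can make this check before it blocks a group. The following is a sketch; transHandle and inputGroup are assumed to be valid handles from the hierarchy:

/* Block an input group only when the Integration Service allows it. */
INFA_BOOLEN bMayBlock = 0;
if (INFA_CTGetInternalPropertyBool((INFA_CT_HANDLE)transHandle,
        INFA_CT_TRANS_MAY_BLOCK_DATA, &bMayBlock) == INFA_SUCCESS
    && bMayBlock)
{
    INFA_CTBlockInputFlow(inputGroup);
}
else
{
    /* Either proceed without blocking or return a fatal error. */
}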
Pointer Functions
Use the pointer functions when you want the Integration Service to create and access pointers to an object or a structure. PowerCenter provides the following pointer functions:
INFA_CTGetUserDefinedPtr(). Allows the procedure to access an object or structure during run time. Use the following syntax:
void* INFA_CTGetUserDefinedPtr(INFA_CT_HANDLE handle);
The function takes the following argument:
- handle (INFA_CT_HANDLE, Input): Handle name.
INFA_CTSetUserDefinedPtr(). Allows the procedure to associate an object or a structure with any handle the Integration Service provides. To reduce processing overhead, include this function in the initialization functions. Use the following syntax:
void INFA_CTSetUserDefinedPtr(INFA_CT_HANDLE handle, void* pPtr);
The function takes the following arguments:
- handle (INFA_CT_HANDLE, Input): Handle name.
- pPtr (void*, Input): Pointer to the object or structure.
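For example, a procedure can cache a structure on the partition handle during initialization and retrieve it for each row. This is a sketch; the structure and partitionHandle are hypothetical:

/* In a partition initialization function: attach a procedure-defined
   structure to the handle. Requires <stdlib.h> for malloc(). */
struct RowCache { INFA_INT32 rowCount; };
struct RowCache* pCache = (struct RowCache*)malloc(sizeof(struct RowCache));
pCache->rowCount = 0;
INFA_CTSetUserDefinedPtr((INFA_CT_HANDLE)partitionHandle, pCache);

/* Later, in the row notification function: retrieve and update it. */
struct RowCache* p =
    (struct RowCache*)INFA_CTGetUserDefinedPtr((INFA_CT_HANDLE)partitionHandle);
p->rowCount++;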
Change String Mode Function
Use the INFA_CTChangeStringMode() function when you want to change the string mode the Integration Service uses for the procedure. The function takes the following arguments:
- Procedure handle name (Input).
- stringMode (Input): Specifies the string mode that you want the Integration Service to use. Use the following values for the stringMode parameter:
  - eASM_UNICODE. Use this when the Integration Service runs in ASCII mode and you want the procedure to access data in Unicode.
  - eASM_MBCS. Use this when the Integration Service runs in Unicode mode and you want the procedure to access data in MBCS.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
Set Data Code Page Function
Use the INFA_CTSetDataCodePageID() function when you want to change the code page in which the Integration Service passes data to the Custom transformation. The function takes the following arguments:
- Transformation handle name (Input).
- dataCodePageID (Input): Specifies the code page you want the Integration Service to pass data in. For valid values for the dataCodePageID parameter, see Code Pages in the Administrator Guide.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
INFA_CTGetRowStrategy(). Allows the procedure to get the update strategy for a row. Use the following syntax:
INFA_STATUS INFA_CTGetRowStrategy(INFA_CT_INPUTGROUP_HANDLE group, INFA_CTUpdateStrategy updateStrategy);
The function takes the following arguments:
- group (Input): Input group handle.
- updateStrategy (Input): Update strategy for the input port. The Integration Service uses the following values: eUS_INSERT = 0, eUS_UPDATE = 1, eUS_DELETE = 2, eUS_REJECT = 3.
INFA_CTSetRowStrategy(). Sets the update strategy for each row. This overrides the INFA_CTChangeDefaultRowStrategy() function. Use the following syntax:
INFA_STATUS INFA_CTSetRowStrategy(INFA_CT_OUTPUTGROUP_HANDLE group, INFA_CT_UPDATESTRATEGY updateStrategy);
The function takes the following arguments:
- group (Input): Output group handle.
- updateStrategy (Input): Update strategy you want to set for the output port. Use one of the following values: eUS_INSERT = 0, eUS_UPDATE = 1, eUS_DELETE = 2, eUS_REJECT = 3.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
INFA_CTChangeDefaultRowStrategy(). Changes the default row strategy of the Custom transformation. The function takes the following arguments:
- Transformation handle (Input).
- The row strategy you want the Integration Service to use for the Custom transformation. Use one of the following values:
  - eDUS_PASSTHROUGH. Flags the row for passthrough.
  - eDUS_INSERT. Flags rows for insert.
  - eDUS_UPDATE. Flags rows for update.
  - eDUS_DELETE. Flags rows for delete.
The return value datatype is INFA_STATUS. Use INFA_SUCCESS and INFA_FAILURE for the return value.
PowerCenter provides the following array-based API functions:
- Maximum number of rows. See Maximum Number of Rows Functions on page 148.
- Number of rows. See Number of Rows Functions on page 149.
- Is row valid. See Is Row Valid Function on page 150.
- Data handling (array-based mode). See Data Handling Functions (Array-Based Mode) on page 150.
- Row strategy. See Row Strategy Functions (Array-Based Mode) on page 153.
- Set input error row. See Set Input Error Row Functions on page 154.
INFA_CTAGetInputNumRowsMax(). Use this function to determine the maximum number of rows allowed in an input block. Use the following syntax:
IINT32 INFA_CTAGetInputRowMax( INFA_CT_INPUTGROUP_HANDLE inputgroup );
The function takes the following argument:
- inputgroup (INFA_CT_INPUTGROUP_HANDLE, Input): Input group handle.
INFA_CTAGetOutputNumRowsMax(). Use this function to determine the maximum number of rows allowed in an output block. The function takes the following argument:
- outputgroup (INFA_CT_OUTPUTGROUP_HANDLE, Input): Output group handle.
INFA_CTASetOutputRowMax(). Use this function to set the maximum number of rows allowed in an output block. Use the following syntax:
INFA_STATUS INFA_CTASetOutputRowMax( INFA_CT_OUTPUTGROUP_HANDLE outputgroup, INFA_INT32 nRowMax );
The function takes the following arguments:
- outputgroup (Input): Output group handle.
- nRowMax (Input): Maximum number of rows you want to allow in an output block. You must enter a positive number. The function returns a fatal error when you use a non-positive number, including zero.
INFA_CTAGetNumRows(). Use this function to determine the number of rows in an input block. Use the following syntax:
INFA_INT32 INFA_CTAGetNumRows( INFA_CT_INPUTGROUP_HANDLE inputgroup );
The function takes the following argument:
- inputgroup (INFA_CT_INPUTGROUP_HANDLE, Input): Input group handle.
INFA_CTASetNumRows(). Use this function to set the number of rows in an output block. Call this function before you call the output notification function. The function takes the following arguments:
- Output group handle (Input).
- Number of rows you want to define in the output block (Input). You must enter a positive number. The Integration Service fails the output notification function when you specify a non-positive number.
Use the INFA_CTAIsRowValid() function to determine whether a row in an input block is valid. The function takes the following arguments:
- Input group handle (Input).
- Index number of the row in the block (Input). The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
PowerCenter provides the following data handling functions for the array-based data access mode:
- INFA_CTAGetData<datatype>(). For more information, see Get Data Functions (Array-Based Mode) on page 151.
- INFA_CTAGetIndicator(). For more information, see Get Indicator Function (Array-Based Mode) on page 152.
- INFA_CTASetData(). For more information, see Set Data Function (Array-Based Mode) on page 152.
The array-based get data and get indicator functions take the following arguments:
- dataHandle (Input): Input port handle.
- iRow (Input): Index number of the row in the block. The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
The return value datatype of INFA_CTAGetIndicator() is INFA_INDICATOR. Use the following values for INFA_INDICATOR:
- INFA_DATA_VALID. Indicates the data is valid.
- INFA_NULL_DATA. Indicates a null value.
- INFA_DATA_TRUNCATED. Indicates the data has been truncated.
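For example, a procedure can walk the rows in an input block and skip nulls. This is a sketch; inputGroup and inPort are assumed to be valid handles, and INFA_CTAGetIndicator() is assumed to take the input port handle and the row index:

/* Iterate over the rows in an input block, skipping null values. */
INFA_INT32 nRows = INFA_CTAGetNumRows(inputGroup);
INFA_INT32 i;
for (i = 0; i < nRows; i++)
{
    if (INFA_CTAGetIndicator(inPort, i) == INFA_NULL_DATA)
        continue;  /* Skip rows where the port value is null. */
    /* ... read the value with an INFA_CTAGetData<datatype>() function ... */
}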
INFA_CTASetData(). Sets the data, length, and indicator for a port in a given row in the output block. The function takes the following arguments:
- dataHandle (Input): Output port handle.
- iRow (Input): Index number of the row in the block. The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
- pData (void*, Input): Pointer to the data.
- nLength (INFA_UINT32, Input): Length of the port. Use for string and binary ports only. You must verify the function passes the correct length of the data. If the function passes a different length, the output notification function returns failure for this port. Verify the length you set for string and binary ports is not greater than the precision for the port. If you set the length greater than the port precision, you get unexpected results. For example, the session may fail.
- indicator (INFA_INDICATOR, Input): Indicator value for the output port. Use one of the following values:
  - INFA_DATA_VALID. Indicates the data is valid.
  - INFA_NULL_DATA. Indicates a null value.
  - INFA_DATA_TRUNCATED. Indicates the data has been truncated.
INFA_CTAGetRowStrategy(). Allows the procedure to get the update strategy for a row in a block. Use the following syntax:
INFA_CT_UPDATESTRATEGY INFA_CTAGetRowStrategy( INFA_CT_INPUTGROUP_HANDLE inputgroup, INFA_INT32 iRow);
The function takes the following arguments:
- inputgroup (Input): Input group handle.
- iRow (Input): Index number of the row in the block. The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
INFA_CTASetRowStrategy(). Sets the update strategy for a row in a block. Use the following syntax:
void INFA_CTASetRowStrategy( INFA_CT_OUTPUTGROUP_HANDLE outputgroup, INFA_INT32 iRow, INFA_CT_UPDATESTRATEGY updateStrategy );
The function takes the following arguments:
- outputgroup (Input): Output group handle.
- iRow (Input): Index number of the row in the block. The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
- updateStrategy (INFA_CT_UPDATESTRATEGY, Input): Update strategy for the port. Use one of the following values: eUS_INSERT = 0, eUS_UPDATE = 1, eUS_DELETE = 2, eUS_REJECT = 3.
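For example, a procedure might flag every row in an output block for update. This is a sketch; outputgroup is assumed to be a valid output group handle and nRows the number of rows set for the block:

/* Flag every row in the output block for update. */
INFA_INT32 i;
for (i = 0; i < nRows; i++)
{
    INFA_CTASetRowStrategy(outputgroup, i, eUS_UPDATE);
}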
INFA_CTASetInputErrorRowM(). You can notify the Integration Service that a row in the input block has an error and output an MBCS error message to the session log. Use the following syntax:
INFA_STATUS INFA_CTASetInputErrorRowM( INFA_CT_INPUTGROUP_HANDLE inputGroup, INFA_INT32 iRow, size_t nErrors, INFA_MBCSCHAR* sErrMsg );
The function takes the following arguments:
- inputGroup (Input): Input group handle.
- iRow (Input): Index number of the row in the block. The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
- nErrors (size_t, Input): Use this parameter to specify the number of errors this input row has caused.
- sErrMsg (INFA_MBCSCHAR*, Input): MBCS string containing the error message you want the function to output. You must enter a null-terminated string. This parameter is optional. When you include this argument, the Integration Service prints the message in the session log, even when you enable row error logging.
INFA_CTASetInputErrorRowU(). You can notify the Integration Service that a row in the input block has an error and output a Unicode error message to the session log. Use the following syntax:
INFA_STATUS INFA_CTASetInputErrorRowU( INFA_CT_INPUTGROUP_HANDLE inputGroup, INFA_INT32 iRow, size_t nErrors, INFA_UNICHAR* sErrMsg );
The function takes the following arguments:
- inputGroup (Input): Input group handle.
- iRow (Input): Index number of the row in the block. The index is zero-based. You must verify the procedure only passes an index number that exists in the data block. If you pass an invalid value, the Integration Service shuts down unexpectedly.
- nErrors (size_t, Input): Use this parameter to specify the number of errors this input row has caused.
- sErrMsg (INFA_UNICHAR*, Input): Unicode string containing the error message you want the function to output. You must enter a null-terminated string. This parameter is optional. When you include this argument, the Integration Service prints the message in the session log, even when you enable row error logging.
Chapter 6
Expression Transformation
This chapter includes the following topics:
- Overview, 158
- Expression Transformation Components, 159
- Configuring Ports, 160
- Creating an Expression Transformation, 162
Overview
Transformation type: Passive Connected
Use the Expression transformation to calculate values in a single row. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can also use the Expression transformation to test conditional statements before you pass the results to a target or other transformations. Use the Expression transformation to perform non-aggregate calculations. To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation. Figure 6-1 shows a simple mapping with an Expression transformation used to concatenate the first and last names of employees from the EMPLOYEES table:
Figure 6-1. Sample Mapping with an Expression Transformation
Expression Transformation Components
An Expression transformation contains the following tabs:
- Transformation. Enter the name and description of the transformation. The naming convention for an Expression transformation is EXP_TransformationName. You can also make the transformation reusable.
- Ports. Create and configure ports. For more information, see Configuring Ports on page 160.
- Properties. Configure the tracing level to determine the amount of transaction detail reported in the session log file.
- Metadata Extensions. Specify the extension name, datatype, precision, and value. You can also create reusable metadata extensions. For more information about creating metadata extensions, see Metadata Extensions in the Repository Guide.
Configuring Ports
You can create and modify ports on the Ports tab. Figure 6-3 shows the Ports tab of an Expression transformation:
Figure 6-3. Expression Transformation Ports Tab
Configure the following components on the Ports tab:
- Port name. Name of the port. For more information about naming ports, see Working with Ports on page 7.
- Datatype, precision, and scale. Configure the datatype and set the precision and scale for each port.
- Port type. A port can be input, output, input/output, or variable. The input ports receive data and output ports pass data. The input/output ports pass data unchanged. Variable ports store data temporarily and can store values across the rows. For more information about variable ports, see Using Local Variables on page 15.
- Expression. Use the Expression Editor to enter expressions. Expressions use the transformation language, which includes SQL-like functions, to perform calculations. For more information about entering expressions, see Working with Expressions on page 10.
- Default values and description. Set default values for ports and add a description. For more information about using default values, see Using Default Values for Ports on page 20.
Calculating Values
To calculate values for a single row using the Expression transformation, you must include the following ports:
- Input or input/output ports. Provide values used in a calculation. For example, if you need to calculate the total price for an order, create two input or input/output ports. One port provides the unit price and the other provides the quantity ordered.
- Output ports. Provide the return value of the expression. You enter the expression as a configuration option for the output port. You can also configure a default value for each port.
You can enter multiple expressions in a single Expression transformation by creating an expression for each output port. For example, you might want to calculate different types of withholding taxes from each employee paycheck, such as local and federal income tax, Social Security and Medicare. Since all of these calculations require the employee salary, the withholding category, and may require the corresponding tax rate, you can create input/ output ports for the salary and withholding category and a separate output port for each calculation.
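For example, a LOCAL_TAX output port might use an expression such as the following, where SALARY and LOCAL_TAX_RATE are input/output ports (the port names are illustrative):

SALARY * LOCAL_TAX_RATE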
Creating an Expression Transformation
To create an Expression transformation:
1. In the Mapping Designer, open a mapping.
2. Click Transformation > Create. Select Expression transformation.
3. Enter a name and click Done.
4. Select and drag the ports from the source qualifier or other transformations to add to the Expression transformation. You can also open the transformation and create ports manually.
5. Double-click on the title bar and click the Ports tab. You can create output and variable ports within the transformation.
6. In the Expression section of an output or variable port, open the Expression Editor.
7. Enter an expression. Click Validate to verify the expression syntax. For more information about creating expressions, see Working with Expressions on page 10.
8. Click OK.
9. Assign the port datatype, precision, and scale to match the expression return value.
10. Create reusable transformations on the Transformation tab.
Note: After you make the transformation reusable, you cannot copy ports from the source qualifier or other transformations. You can create ports manually within the transformation.
11. Configure the tracing level on the Properties tab.
12. Add metadata extensions on the Metadata Extensions tab.
13. Click OK.
14. Connect the output ports to a downstream transformation or target.
15. Click Repository > Save.
Chapter 7
External Procedure Transformation
This chapter includes the following topics:
- Overview, 164
- Configuring External Procedure Transformation Properties, 167
- Developing COM Procedures, 170
- Developing Informatica External Procedures, 180
- Distributing External Procedures, 190
- Development Notes, 192
- Service Process Variables in Initialization Properties, 201
- External Procedure Interfaces, 202
Overview
Transformation type: Passive Connected/Unconnected
External Procedure transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. Although the standard transformations provide you with a wide range of options, there are occasions when you might want to extend the functionality provided with PowerCenter. For example, the range of standard transformations, such as Expression and Filter transformations, may not provide the functionality you need. If you are an experienced programmer, you may want to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. Using TX, you can create an Informatica External Procedure transformation and bind it to an external procedure that you have developed. You can bind External Procedure transformations to two kinds of external procedures:
COM external procedures (available on Windows only) Informatica external procedures (available on Windows, AIX, HP-UX, Linux, and Solaris)
To use TX, you must be an experienced C, C++, or Visual Basic programmer. Use multi-threaded code in external procedures.
When you develop Informatica external procedures, the External Procedure transformation provides the information required to generate Informatica external procedure stubs.
Pipeline Partitioning
If you purchase the Partitioning option with PowerCenter, you can increase the number of partitions in a pipeline to improve session performance. Increasing the number of partitions allows the Integration Service to create multiple connections to sources and process partitions of source data concurrently. When you create a session, the Workflow Manager validates each pipeline in the mapping for partitioning. You can specify multiple partitions in a pipeline if the Integration Service can maintain data consistency when it processes the partitioned data.
Use the Is Partitionable property on the Properties tab to specify whether or not you can create multiple partitions in the pipeline. For more information about partitioning External Procedure transformations, see the Workflow Administration Guide.
Configuring External Procedure Transformation Properties
The External Procedure transformation includes the following properties:

Property                         Required/Optional
Module/Programmatic Identifier   Required
Procedure Name                   Required
Tracing Level                    Optional
Is Partitionable                 Required
Output is Deterministic          Optional
Table 7-3 describes the environment variables the Integration Service uses to locate the DLL or shared object on the various platforms for the runtime location:
Table 7-3. Environment Variables
Operating System   Environment Variable
Windows            PATH
AIX                LIBPATH
HP-UX              SHLIB_PATH
Linux              LD_LIBRARY_PATH
Solaris            LD_LIBRARY_PATH
Using Visual C++ to Develop COM Procedures
1. Launch Visual C++ and click File > New.
2. In the dialog box that appears, select the Projects tab.
3. Enter the project name and location. In the BankSoft example, you enter COM_VC_Banksoft as the project name, and c:\COM_VC_Banksoft as the directory.
4. Select the ATL COM AppWizard option in the projects list box and click OK.
5. Set the Server Type to Dynamic Link Library, select the Support MFC option, and click Finish. The final page of the wizard appears.
6. Click OK to return to Visual C++.
7. Add a class to the new project.
8. On the next page of the wizard, click the OK button. The Developer Studio creates the basic project files.
To add an ATL object to the project:
1. In the Workspace window, select the Class View tab, right-click the tree item COM_VC_BankSoft.BSoftFin classes, and choose New ATL Object from the local menu that appears.
2. Highlight the Objects item in the left list box and select Simple Object from the list of object types.
3. Click Next.
4. In the Short Name field, enter a short name for the class you want to create. In the BankSoft example, use the name BSoftFin, since you are developing a financial function for the fictional company BankSoft. As you type into the Short Name field, the wizard fills in suggested names in the other fields.
5. Enter the programmatic identifier for the class. In the BankSoft example, change the ProgID (programmatic identifier) field to COM_VC_BankSoft.BSoftFin. A programmatic identifier, or ProgID, is the human-readable name for a class. Internally, classes are identified by numeric CLSIDs. For example:
{33B17632-1D9F-11D1-8790-0000C044ACF9}
The standard format of a ProgID is Project.Class[.Version]. In the Designer, you refer to COM classes through ProgIDs.
6. Select the Attributes tab and set the threading model to Free, the interface to Dual, and the aggregation setting to No.
7. Click OK.
Now that you have a basic class definition, you can add a method to it:
1. Return to the Class View tab of the Workspace window.
2. Expand the tree view.
3. Right-click the newly-added class. In the BankSoft example, you right-click the IBSoftFin tree item.
4. Click the Add Method menu item and enter the name of the method. In the BankSoft example, you enter FV.
5. In the Parameters field, enter the signature of the method. For FV, enter the following:
[in] double Rate, [in] long nPeriods, [in] double Payment, [in] double PresentValue, [in] long PaymentType, [out, retval] double* FV
This signature is expressed in terms of the Microsoft Interface Description Language (MIDL). For a complete description of MIDL, see the MIDL language reference. Note that:
- [in] indicates that the parameter is an input parameter.
- [out] indicates that the parameter is an output parameter.
- [out, retval] indicates that the parameter is the return value of the method.
Also, note that all [out] parameters are passed by reference. In the BankSoft example, the parameter FV is a double.
6. Click OK. The Developer Studio adds to the project a stub for the method you added.
To fill in the method stub:
1. In the BankSoft example, return to the Class View tab of the Workspace window and expand the COM_VC_BankSoft classes item.
2. Expand the CBSoftFin item.
3. Expand the IBSoftFin item under the above item.
4. Right-click the FV item and choose Go to Definition.
5. Position the cursor in the edit window on the line after the TODO comment and add the following code:
double v = pow((1 + Rate), nPeriods);
*FV = -(
    (PresentValue * v) +
    (Payment * (1 + (Rate * PaymentType))) * ((v - 1) / Rate)
);
Since you refer to the pow function, you have to add the following preprocessor statement after all other include statements at the beginning of the file:
#include <math.h>
The final step is to build the DLL. When you build it, you register the COM procedure with the Windows registry.
1. Pull down the Build menu.
2. Select Rebuild All.
As Developer Studio builds the project, it generates the following output:
------------Configuration: COM_VC_BankSoft - Win32 Debug-------------Performing MIDL step Microsoft (R) MIDL Compiler Version 3.01.75 Copyright (c) Microsoft Corp 1991-1997. All rights reserved. Processing .\COM_VC_BankSoft.idl COM_VC_BankSoft.idl Processing C:\msdev\VC\INCLUDE\oaidl.idl oaidl.idl Processing C:\msdev\VC\INCLUDE\objidl.idl objidl.idl Processing C:\msdev\VC\INCLUDE\unknwn.idl unknwn.idl Processing C:\msdev\VC\INCLUDE\wtypes.idl wtypes.idl Processing C:\msdev\VC\INCLUDE\ocidl.idl ocidl.idl Processing C:\msdev\VC\INCLUDE\oleidl.idl oleidl.idl Compiling resources... Compiling... StdAfx.cpp Compiling... COM_VC_BankSoft.cpp BSoftFin.cpp Generating Code... Linking... Creating library Debug/COM_VC_BankSoft.lib and object Debug/ COM_VC_BankSoft.exp Registering ActiveX Control... RegSvr32: DllRegisterServer in .\Debug\COM_VC_BankSoft.dll succeeded. COM_VC_BankSoft.dll - 0 error(s), 0 warning(s)
Notice that Visual C++ compiles the files in the project, links them into a dynamic link library (DLL) called COM_VC_BankSoft.DLL, and registers the COM (ActiveX) class COM_VC_BankSoft.BSoftFin in the local registry. Once the component is registered, it is accessible to the Integration Service running on that host.
For more information about how to package COM classes for distribution to other Integration Services, see Distributing External Procedures on page 190. For more information about how to use COM external procedures to call functions in a preexisting library of C or C++ functions, see Wrapper Classes for Pre-Existing C/C++ Libraries or VB Functions on page 194. For more information about how to use a class factory to initialize COM objects, see Initializing COM and Informatica Modules on page 196.
1. Open the Transformation Developer.
2. Click Transformation > Import External Procedure. The Import External COM Method dialog box appears.
3. Click the Browse button.
4. Select the COM DLL you created and click OK. In the Banksoft example, select COM_VC_Banksoft.DLL.
5. Under the Select Method tree view, expand the class node (in this example, BSoftFin).
6. Expand Methods.
7. Select the method you want (in this example, FV) and press OK. The Designer creates an External Procedure transformation.
8. Open the External Procedure transformation, and select the Properties tab. Enter ASCII characters in the Module/Programmatic Identifier and Procedure Name fields.
9. Enter ASCII characters in the Port Name fields. For more information about mapping Visual C++ and Visual Basic datatypes to COM datatypes, see COM Datatypes on page 192.
10. Click OK, and then click Repository > Save. The repository now contains the reusable transformation, so you can add instances of this transformation to mappings.
1. Use the Source Analyzer and the Target Designer to import FVInputs and FVOutputs into the same folder as the one in which you created the COM_BSFV transformation.
2. In the Mapping Designer, create a new mapping named Test_BSFV.
3. Drag the source table FVInputs into the mapping.
4. Drag the target table FVOutputs into the mapping.
5. Drag the transformation COM_BSFV into the mapping.
6. Connect the Source Qualifier transformation ports to the External Procedure transformation ports as appropriate.
7. Connect the FV port in the External Procedure transformation to the FVIn_ext_proc target column.
8. Validate and save the mapping.
When the Integration Service runs the session, it performs the following actions:
- Uses the COM runtime facilities to load the DLL and create an instance of the class.
- Uses the COM IDispatch interface to call the external procedure you defined once for every row that passes through the mapping.
Note: Multiple classes, each with multiple methods, can be defined within a single project.
1. In the Workflow Manager, create the session s_Test_BSFV from the Test_BSFV mapping.
2. Create a workflow that contains the session s_Test_BSFV.
3. Run the workflow. The Integration Service searches the registry for the entry for the COM_VC_BankSoft.BSoftFin class. This entry has information that allows the Integration Service to determine the location of the DLL that contains that class. The Integration Service loads the DLL, creates an instance of the class, and invokes the FV function for every row in the source table. When the workflow finishes, the FVOutputs table should contain the following results:
FVIn_ext_proc
2581.403374
12682.503013
82846.246372
2301.401830
Developing COM Procedures with Visual Basic
1. Launch Visual Basic and click File > New Project.
2. In the dialog box that appears, select ActiveX DLL as the project type and click OK. Visual Basic creates a new project named Project1. If the Project window does not display, type Ctrl+R, or click View > Project Explorer. If the Properties window does not display, press F4, or click View > Properties.
3. In the Project Explorer window for the new project, right-click the project and choose Project1 Properties from the menu that appears.
4. Enter the name of the new project. In the Project window, select Project1 and change the name in the Properties window to COM_VB_BankSoft.
To rename the project and class:
1. Inside the Project Explorer, select the Project Project1 item, which should be the root item in the tree control. The project properties display in the Properties window.
2. Select the Alphabetic tab in the Properties window and change the Name property to COM_VB_BankSoft. This renames the root item in the Project Explorer to COM_VB_BankSoft (COM_VB_BankSoft).
3. Expand the COM_VB_BankSoft (COM_VB_BankSoft) item in the Project Explorer.
4. Expand the Class Modules item.
5. Select the Class1 (Class1) item. The properties of the class display in the Properties window.
6. Select the Alphabetic tab in the Properties window and change the Name property to BSoftFin.
By changing the name of the project and class, you specify that the programmatic identifier for the class you create is COM_VB_BankSoft.BSoftFin. Use this ProgID to refer to this class inside the Designer.
This Visual Basic FV function, of course, performs the same operation as the C++ FV function in Developing COM Procedures with Visual Basic on page 177.
1. From the File menu, select Make COM_VB_BankSoft.DLL. A dialog box prompts you for the file location.
2. Enter the file location and click OK.
Visual Basic compiles the source code and creates the COM_VB_BankSoft.DLL in the location you specified. It also registers the class COM_VB_BankSoft.BSoftFin in the local registry. Once the component is registered, it is accessible to the Integration Service running on that host. For more information about how to package Visual Basic COM classes for distribution to other machines hosting the Integration Service, see Distributing External Procedures on page 190. For more information about how to use Visual Basic external procedures to call preexisting Visual Basic functions, see Wrapper Classes for Pre-Existing C/C++ Libraries or VB Functions on page 194. To create the procedure, follow steps 6 to 9 of Using Visual C++ to Develop COM Procedures on page 170.
1. Open the Transformation Developer and create an External Procedure transformation.
2. Open the transformation and enter a name for it. In the BankSoft example, enter EP_extINF_BSFV.
3. Create a port for each argument passed to the procedure you plan to define. Be sure that you use the correct datatypes. To use the FV procedure as an example, you create the following ports. The last port, FV, captures the return value from the procedure.
4. Select the Properties tab and configure the procedure as an Informatica procedure. In the BankSoft example, enter the following:
Note on Module/Programmatic Identifier: The following table describes how the module name determines the name of the DLL or shared object on the various platforms:
Operating System   Module Identifier   Library File Name
Windows            INF_BankSoft        INF_BankSoft.DLL
AIX                INF_BankSoft        libINF_BankSoftshr.a
HP-UX              INF_BankSoft        libINF_BankSoft.sl
Linux              INF_BankSoft        libINF_BankSoft.so
Solaris            INF_BankSoft        libINF_BankSoft.so.1

5. Click OK.
After you create the External Procedure transformation that calls the procedure, the next step is to generate the C++ files.
The generated code uses the following conventions:
- File names. A prefix tx is used for TX module files.
- Module class names. The generated code has class declarations for the module that contains the TX procedures. A prefix Tx is used for TX module classes. For example, if an External Procedure transformation has a module name Mymod, then the class name is TxMymod.
To generate the code:
1. Select the transformation and click Transformation > Generate Code.
2. Select the check box next to the name of the procedure you just created. In the BankSoft example, select INF_BankSoft.FV.
3. Specify the directory where you want to generate the files, and click Generate. The Designer creates a subdirectory, INF_BankSoft, in the directory you specified. Each External Procedure transformation created in the Designer must specify a module and a procedure name. The Designer generates code in a single directory for all transformations sharing a common module name. Building the code in one directory creates a single shared library. The Designer generates the following files:
- tx<moduleName>.h. Defines the external procedure module class. This class is derived from a base class TINFExternalModule60. No data members are defined for this class in the generated code. However, you can add new data members and methods here.
- tx<moduleName>.cpp. Implements the external procedure module class. You can expand the InitDerived() method to include initialization of any new data members you add. The Integration Service calls the derived class InitDerived() method only when it successfully completes the base class Init() method.
This file defines the signatures of all External Procedure transformations in the module. Any modification of these signatures leads to inconsistency with the External Procedure transformations defined in the Designer. Therefore, you should not change the signatures. This file also includes a C function CreateExternalModuleObject, which creates an object of the external procedure module class using the constructor defined in this file. The Integration Service calls CreateExternalModuleObject instead of directly calling the constructor.
- <procedureName>.cpp. The Designer generates one of these files for each external procedure in this module. This file contains the code that implements the procedure logic, such as data cleansing and filtering. For data cleansing, create code to read in values from the input ports and generate values for output ports. For filtering, create code to suppress generation of output rows by returning INF_NO_OUTPUT_ROW.
- stdafx.h. Stub file used for building on UNIX systems. The various *.cpp files include this file. On Windows systems, Visual Studio generates an stdafx.h file, which should be used instead of the Designer-generated file.
- version.cpp. A small file that carries the version number of this implementation. In earlier releases, external procedure implementation was handled differently. This file allows the Integration Service to determine the version of the external procedure module.
- makefile.aix, makefile.aix64, makefile.hp, makefile.hp64, makefile.hpparisc64, makefile.linux, makefile.sol. Make files for UNIX platforms. Use makefile.aix, makefile.hp, makefile.linux, and makefile.sol for 32-bit platforms. Use makefile.aix64 for 64-bit AIX platforms and makefile.hp64 for 64-bit HP-UX (Itanium) platforms.
Example 1
In the BankSoft example, the Designer generates the following files:
- txinf_banksoft.h. Contains declarations for module class TxINF_BankSoft and external procedure FV.
- txinf_banksoft.cpp. Contains code for module class TxINF_BankSoft.
- fv.cpp. Contains code for procedure FV.
- version.cpp. Returns TX version.
- stdafx.h. Required for compilation on UNIX. On Windows, stdafx.h is generated by Visual Studio.
- readme.txt. Contains general help information.
Example 2
If you create two External Procedure transformations with procedure names Myproc1 and Myproc2, both with the module name Mymod, the Designer generates the following files:
- txmymod.h. Contains declarations for module class TxMymod and external procedures Myproc1 and Myproc2.
- txmymod.cpp. Contains code for module class TxMymod.
- myproc1.cpp. Contains code for procedure Myproc1.
- myproc2.cpp. Contains code for procedure Myproc2.
- version.cpp.
- stdafx.h.
- readme.txt.
1. Open the <Procedure_Name>.cpp stub file generated for the procedure. In the BankSoft example, you open fv.cpp to code the TxINF_BankSoft::FV procedure.
2. Enter the C++ code for the procedure. The following code implements the FV procedure:
INF_RESULT TxINF_BankSoft::FV()
{
    // Input port values are mapped to the m_pInParamVector array in
    // the InitParams method. Use m_pInParamVector[i].IsValid() to check
    // if they are valid. Use m_pInParamVector[i].GetLong or GetDouble,
    // etc. to get their value. Generate output data into
    // m_pOutParamVector.
    // TODO: Fill in implementation of the FV method here.
    ostrstream ss;
    char* s;
    INF_Boolean bVal;
    double v;
    TINFParam* Rate = &m_pInParamVector[0];
    TINFParam* nPeriods = &m_pInParamVector[1];
    TINFParam* Payment = &m_pInParamVector[2];
    TINFParam* PresentValue = &m_pInParamVector[3];
    TINFParam* PaymentType = &m_pInParamVector[4];
    TINFParam* FV = &m_pOutParamVector[0];
    bVal = INF_Boolean(Rate->IsValid() &&
                       nPeriods->IsValid() &&
                       Payment->IsValid() &&
                       PresentValue->IsValid() &&
                       PaymentType->IsValid());
    if (bVal == INF_FALSE)
    {
        FV->SetIndicator(INF_SQL_DATA_NULL);
        return INF_SUCCESS;
    }
    v = pow((1 + Rate->GetDouble()), (double)nPeriods->GetLong());
    FV->SetDouble(
        -((PresentValue->GetDouble() * v) +
          (Payment->GetDouble() *
           (1 + (Rate->GetDouble() * PaymentType->GetLong()))) *
          ((v - 1) / Rate->GetDouble())));
    ss << "The calculated future value is: " << FV->GetDouble() << ends;
    s = ss.str();
    (*m_pfnMessageCallback)(E_MSG_TYPE_LOG, 0, s);
    (*m_pfnMessageCallback)(E_MSG_TYPE_ERR, 0, s);
    delete [] s;
    return INF_SUCCESS;
}
The Designer generates the function profile, including the arguments and return value. You need to enter the actual code within the function, as indicated in the comments. Since you referenced the pow function and defined an ostrstream variable, you must also include the following preprocessor statements. On Windows:
#include <math.h>
#include <strstrea.h>
1. Start Visual C++.
2. Click File > New.
3. In the New dialog box, click the Projects tab and select the MFC AppWizard (DLL) option.
4. Enter its location. In the BankSoft example, you enter c:\pmclient\tx\INF_BankSoft, assuming you generated files in c:\pmclient\tx.
5. Enter the name of the project. It must be the same as the module name entered for the External Procedure transformation. In the BankSoft example, it is INF_BankSoft.
6. Click OK. Visual C++ now steps you through a wizard that defines all the components of the project.
7. In the wizard, click MFC Extension DLL (using shared MFC DLL).
8. Click Finish. The wizard generates several files.
9. Click Project > Add To Project > Files.
10. Navigate up a directory level. This directory contains the external procedure files you created. Select all .cpp files. In the BankSoft example, add the following files:
11. Click Project > Settings.
12. Click the C/C++ tab, and select Preprocessor from the Category field.
13. In the Additional Include Directories field, enter ..; <pmserver install dir>\extproc\include.
14. Click the Link tab, and select General from the Category field.
15. Enter <pmserver install dir>\bin\pmtx.lib in the Object/Library Modules field.
16. Click OK.
17. Click Build > Build INF_BankSoft.dll or press F7 to build the project. The compiler now creates the DLL and places it in the debug or release directory under the project directory. For information about running a workflow with the debug version, see Running a Session with the Debug Version of the Module on Windows on page 188.
1. If you cannot access the PowerCenter Client tools directly, copy all the files you need for the shared library to the UNIX machine where you plan to perform the build. For example, in the BankSoft procedure, use ftp or another mechanism to copy everything from the INF_BankSoft directory to the UNIX machine.
2. Enter the command to make the project. The command depends on the version of UNIX, as summarized below:

UNIX Version    Command
AIX (32-bit)    make -f makefile.aix
AIX (64-bit)    make -f makefile.aix64
HP-UX (32-bit)  make -f makefile.hp
HP-UX (64-bit)  make -f makefile.hp64
Linux           make -f makefile.linux
Solaris         make -f makefile.sol
1. In the Workflow Manager, create a workflow.
2. Create a session for this mapping in the workflow.
Tip: Alternatively, you can create a reusable session in the Task Developer and use it in the workflow.
3. Copy the library (DLL) to the Runtime Location directory.
4. Run the workflow containing the session.
1. In the Workflow Manager, create a workflow.
2. Create a session for this mapping in the workflow. Or, you can create a reusable session in the Task Developer and use it in the workflow.
3. Copy the library (DLL) to the Runtime Location directory.
4. To use the debug build of the External Procedure transformation library:
   - Preserve pmtx.dll by renaming it or moving it from the server bin directory.
   - Rename pmtxdbg.dll to pmtx.dll.
5. Run the workflow containing the session.
6. To revert to the release build of the External Procedure transformation library:
   - Rename pmtx.dll back to pmtxdbg.dll.
   - Return or rename the original pmtx.dll file to the server bin directory.
Note: If you run a workflow containing this session with the debug version of the module on
Windows, you must return the original pmtx.dll file to its original name and location before you can run a non-debug session.
1. After you build the DLL, exit Visual Basic and launch the Visual Basic Application Setup wizard.
2. Skip the first panel of the wizard.
3. On the second panel, specify the location of the project and select the Create a Setup Program option.
4. In the third panel, select the method of distribution you plan to use.
5. In the next panel, specify the directory to which you want to write the setup files. For simple ActiveX components, you can continue to the final panel of the wizard. Otherwise, you may need to add more information, depending on the type of file and the method of distribution.
6. Click Finish in the final panel. Visual Basic then creates the setup program for the DLL. Run this setup program on any Windows machine where the Integration Service is running.
1. Copy the DLL to the directory on the new Windows machine where you want it saved.
2. Log in to this Windows machine and open a DOS prompt.
3. Navigate to the directory containing the DLL and execute the following command:
REGSVR32 project_name.DLL
project_name is the name of the DLL you created. In the BankSoft example, the project name is COM_VC_BankSoft.DLL or COM_VB_BankSoft.DLL. This command line program then registers the DLL and any COM classes contained in it.
1. Move the DLL or shared object that contains the external procedure to a directory on a machine that the Integration Service can access.
2. Copy the External Procedure transformation from the original repository to the target repository using the Designer client tool, or export the External Procedure transformation to an XML file and import it in the target repository. For more information, see Exporting and Importing Objects in the Repository Guide.
Development Notes
This section includes some additional guidelines and information about developing COM and Informatica external procedures.
COM Datatypes
When using either Visual C++ or Visual Basic to develop COM procedures, you need to use COM datatypes that correspond to the internal datatypes that the Integration Service uses when reading and transforming data. These datatype matches are important when the Integration Service attempts to map datatypes between ports in an External Procedure transformation and arguments (or return values) from the procedure the transformation calls. Table 7-4 compares Visual C++ and transformation datatypes:
Table 7-4. Visual C++ and Transformation Datatypes
Visual C++ COM Datatype   Transformation Datatype
VT_I4                     Integer
VT_UI4                    Integer
VT_R8                     Double
VT_BSTR                   String
VT_DECIMAL                Decimal
VT_DATE                   Date/Time
If you do not correctly match datatypes, the Integration Service may attempt a conversion. For example, if you assign the Integer datatype to a port, but the datatype for the corresponding argument is BSTR, the Integration Service attempts to convert the Integer value to a BSTR.
Row-Level Procedures
All External Procedure transformations call procedures using values from a single row passed through the transformation. You cannot use values from multiple rows in a single procedure call. For example, you could not code the equivalent of the aggregate functions SUM or AVG into a procedure call. In this sense, all external procedures must be stateless.
An external procedure must return one of the following values:

- INF_SUCCESS. The external procedure processed the row successfully. The Integration Service passes the row to the next transformation in the mapping.
- INF_NO_OUTPUT_ROW. The Integration Service does not write the current row due to external procedure logic. This is not an error. When you use INF_NO_OUTPUT_ROW to filter rows, the External Procedure transformation behaves similarly to the Filter transformation.
Note: When you use INF_NO_OUTPUT_ROW in the external procedure, make sure you
connect the External Procedure transformation to another transformation that receives rows from the External Procedure transformation only.
- INF_ROW_ERROR. Equivalent to a transformation error. The Integration Service discards the current row, but may process the next row unless you configure the session to stop on n errors.
- INF_FATAL_ERROR. Equivalent to an ABORT() function call. The Integration Service aborts the session and does not process any more rows.
When the Integration Service creates an object of type Tx<MODNAME>, it passes to its constructor a pointer to a callback function that can be used to write error or debugging messages to the session log. (The code for the Tx<MODNAME> constructor is in the file Tx<MODNAME>.cpp.) This pointer is stored in the Tx<MODNAME> member variable m_pfnMessageCallback. The type of this pointer is defined in a typedef in the file $PMExtProcDir/include/infemmsg.h:
typedef void (*PFN_MESSAGE_CALLBACK)( enum E_MSG_TYPE eMsgType, unsigned long Code, char* Message );
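For example, a minimal sketch of writing a message to the session log through this pointer from inside a Tx<MODNAME> method (the message code value and text are illustrative):

// Write an informational message to the session log through the callback
// pointer saved by the Tx<MODNAME> constructor. Code 0 and the message
// text are illustrative values.
if (m_pfnMessageCallback)
{
    m_pfnMessageCallback(E_MSG_TYPE_LOG, 0, "MyProc: row processed");
}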
If you specify the eMsgType of the callback function as E_MSG_TYPE_LOG, the callback function writes a log message to the session log. If you specify E_MSG_TYPE_ERR, the callback function writes an error message to the session log. If you specify E_MSG_TYPE_WARNING, the callback function writes a warning message to the session log. Use these messages to provide a simple debugging capability in Informatica external procedures.

To debug COM external procedures, you may use the output facilities available from inside a Visual Basic or C++ class. For example, in Visual Basic use a MsgBox to print out the result of a calculation for each row. Of course, you want to do this only on small samples of data while debugging, and make sure to remove the MsgBox before making a production run.
Note: Before attempting to use any output facilities from inside a Visual Basic or C++ class, you must add the following value to the registry:
1. Add the following entry to the Windows registry:
\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\PowerMart\Parameters\MiscInfo\RunInDebugMode=Yes
This option starts the Integration Service as a regular application, not a service. You can debug the Integration Service without changing the debug privileges for the Integration Service service while it is running.
2. Start the Integration Service from the command line, using the command PMSERVER.EXE.
The Integration Service is now running in debug mode. When you are finished debugging, make sure you remove this entry from the registry or set RunInDebugMode to No. Otherwise, when you attempt to start PowerCenter as a service, it will not start.
To return to normal operation:
1. Stop the Integration Service and change the registry entry you added earlier to the following setting:
\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\PowerMart\Parameters\MiscInfo\RunInDebugMode=No
2. Restart the Integration Service as a Windows service.
external procedure, you will just get NULLs for all the output parameters. The TINFParam class also supports functions for obtaining the metadata for a particular parameter. For a complete description of all the member functions of the TINFParam class, see the infemdef.h include file in the tx/include directory. Note that one of the main advantages of Informatica external procedures over COM external procedures is that Informatica external procedures directly support indicator manipulation. That is, you can check an input parameter to see if it is NULL, and you can set an output parameter to NULL. COM provides no indicator support. Consequently, if a row entering a COM-style external procedure has any NULLs in it, the row cannot be processed. Use the default value facility in the Designer to overcome this shortcoming. However, it is not possible to pass NULLs out of a COM function.
When a row passes through the transformation containing the expression, the Integration Service calls the procedure associated with the External Procedure transformation. The expression captures the return value of the procedure through the External Procedure transformation return port, which should have the Result (R) option checked. For more information about expressions, see Working with Expressions on page 10.
Initialization of Informatica-style external procedures. The Tx<MODNAME> class, which contains the external procedure, also contains the initialization function, Tx<MODNAME>::InitDerived. The signature of this initialization function is well known to the Integration Service and consists of three parameters:
- nInitProps. This parameter tells the initialization function how many initialization properties are being passed to it.
- Properties. This parameter is an array of nInitProps strings representing the names of the initialization properties.
- Values. This parameter is an array of nInitProps strings representing the values of the initialization properties.
The Integration Service creates the Tx<MODNAME> object and then calls the initialization function. It first calls the Init() function in the base class. When the Init() function successfully completes, the base class calls the Tx<MODNAME>::InitDerived() function. It is the responsibility of the external procedure developer to supply the part of the Tx<MODNAME>::InitDerived() function that interprets the initialization properties and uses them to initialize the external procedure. Once the object is created and initialized, the Integration Service can call the external procedure on the object for each row.
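For example, a minimal sketch of an InitDerived() implementation, assuming the property and value arrays are passed as char** and that the module uses a hypothetical initialization property named addFactor stored in a hypothetical member variable m_addFactor:

INF_RESULT Tx<MODNAME>::InitDerived(unsigned long nInitProps, char** Properties, char** Values)
{
    // Scan the property names for the hypothetical addFactor property
    // and store its numeric value in a member variable. Requires
    // <string.h> for strcmp and <stdlib.h> for atoi.
    for (unsigned long i = 0; i < nInitProps; i++)
    {
        if (strcmp(Properties[i], "addFactor") == 0)
            m_addFactor = atoi(Values[i]);
    }
    return INF_SUCCESS;
}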
Initialization of COM-style external procedures. The object that contains the external procedure (or EP object) does not contain an initialization function. Instead, another object (the CF object) serves as a class factory for the EP object. The CF object has a method that can create an EP object. The signature of the CF object method is determined from its type library. The Integration Service creates the CF object, and then calls the method on it to create the EP object, passing this method whatever parameters are required. This requires that the signature of the method consist of a set of input parameters, whose types can be determined from the type library, followed by a single output parameter that is an IUnknown** or an IDispatch** or a VARIANT* pointing to an IUnknown* or IDispatch*. The input parameters hold the values required to initialize the EP object and the output parameter receives the initialized object. The output parameter can have either the [out] or the [out, retval] attributes. That is, the initialized object can be returned either as an output parameter or as the return value of the method. The datatypes supported for the input parameters are:
- VT_UI1
- VT_BOOL
- VT_I2
- VT_UI2
- VT_I4
- VT_UI4
- VT_R4
- VT_R8
- VT_BSTR
- VT_CY
- VT_DATE
- Programmatic Identifier for Class Factory. Enter the programmatic identifier of the class factory.
- Constructor. Specify the method of the class factory that creates the EP object.
Figure 7-3 shows the Initialization Properties tab of a COM-style External Procedure transformation:
Figure 7-3. External Procedure Transformation Initialization Properties
You can enter an unlimited number of initialization properties to pass to the Constructor method for both COM-style and Informatica-style External Procedure transformations. To add a new initialization property, click the Add button. Enter the name of the parameter in the Property column and enter the value of the parameter in the Value column. For example, you can enter the following parameters:
Parameter    Value
Param1       abc
Param2       100
Param3       3.17
Note: You must create a one-to-one relation between the initialization properties you define in
the Designer and the input parameters of the class factory constructor method. For example, if the constructor has n parameters with the last parameter being the output parameter that receives the initialized object, you must define n - 1 initialization properties in the Designer, one for each input parameter in the constructor method. You can also use process variables in initialization properties. For information about process variable support in initialization properties, see Service Process Variables in Initialization Properties on page 201.
Following are the library files located under the path <PMInstallDir> that are needed for linking external procedures and running the session:
- libpmtx.a (AIX)
- libpmtx.sl (HP-UX)
- libpmtx.so (Linux)
- libpmtx.so (Solaris)
- pmtx.dll and pmtx.lib (Windows)
Table 7-6 contains the initialization properties and values for the External Procedure transformation in Figure 7-4:
Table 7-6. External Procedure Initialization Properties
Property      Value                        Expanded Value Passed to the External Procedure Library
mytempdir     $PMTempDir                   /tmp
memorysize    5000000                      5000000
input_file    $PMSourceFileDir/file.in     /data/input/file.in
output_file   $PMTargetFileDir/file.out    /data/output/file.out
extra_var     $some_other_variable         $some_other_variable
When you run the workflow, the Integration Service expands the property list and passes it to the external procedure initialization function. Assuming that the built-in process variables $PMTempDir, $PMSourceFileDir, and $PMTargetFileDir have the values /tmp, /data/input, and /data/output respectively, the last column in Table 7-6 contains the property and expanded value information. Note that the Integration Service does not expand the last property, $some_other_variable, because it is not a built-in process variable.
The external procedure interface includes functions in the following categories:

- Dispatch
- External procedure
- Property access
- Parameter access
- Code page access
- Transformation name access
- Procedure access
- Partition related
- Tracing level
Dispatch Function
The Integration Service calls the dispatch function to pass each input row to the external procedure module. The dispatch function, in turn, calls the external procedure function you specify. External procedures access the ports in the transformation directly using the member variable m_pInParamVector for input ports and m_pOutParamVector for output ports.
Signature
The dispatch function has a fixed signature which includes one index parameter.
virtual INF_RESULT Dispatch(unsigned long ProcedureIndex) = 0
Signature
The external procedure function has no parameters. The input parameter array is already passed through the InitParams() method and stored in the member variable m_pInParamVector. Each entry in the array matches the corresponding IN and IN-OUT ports of the External Procedure transformation, in the same order. The Integration Service fills this vector before calling the dispatch function.
Use the member variable m_pOutParamVector to pass the output row before returning from the Dispatch() function. For the MyExternal Procedure transformation, the external procedure function is the following, where the input parameters are in the member variable m_pInParamVector and the output values are in the member variable m_pOutParamVector:
INF_RESULT Tx<ModuleName>::MyFunc()
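A minimal sketch of such a function, assuming one double IN port and one double OUT port (the port positions, the doubling, and the skip-on-NULL behavior are illustrative choices):

INF_RESULT Tx<MODNAME>::MyFunc()
{
    // Filter out rows whose first input port is NULL (illustrative choice).
    if (m_pInParamVector[0].IsNULL())
        return INF_NO_OUTPUT_ROW;

    // Read the first IN port, compute, and set the first OUT port.
    double dIn = m_pInParamVector[0].GetDouble();
    m_pOutParamVector[0].SetDouble(dIn * 2.0);

    return INF_SUCCESS;
}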
Signature
Informatica provides the following functions in the base class:
TINFConfigEntriesList* TINFBaseExternalModule60::accessConfigEntriesList();
const char* GetConfigEntry(const char* LHS);
Use the accessConfigEntriesList() and GetConfigEntryValue() property access functions to access the initialization property names and values. You can call these functions from a TX program. The TX program then converts this string value into a number, for example by using atoi or sscanf. In the following example, addFactor is an initialization property. accessConfigEntriesList() is a member function of the TX base class and does not need to be defined.
const char* addFactorStr = accessConfigEntriesList()->GetConfigEntryValue("addFactor");
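The returned string can then be converted to a number, for example (the default value of 0 is an illustrative choice):

int addFactor = (addFactorStr != NULL) ? atoi(addFactorStr) : 0;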
Signature
A parameter passed to an external procedure is a pointer to an object of the TINFParam class. This fixed-signature function is a method of that class and returns the parameter datatype as an enum value. The valid datatypes are:

- INF_DATATYPE_LONG
- INF_DATATYPE_STRING
- INF_DATATYPE_DOUBLE
- INF_DATATYPE_RAW
- INF_DATATYPE_TIME

Table 7-7 lists a brief description of some parameter access functions:
Table 7-7. Descriptions of Parameter Access Functions
INF_DATATYPE GetDataType(void);
  Gets the datatype of a parameter. Use the parameter datatype to determine which datatype-specific function to use when accessing parameter values.
INF_Boolean IsValid(void);
  Verifies that input data is valid. Returns FALSE if the parameter contains truncated data and is a string.
INF_Boolean IsNULL(void);
  Verifies that input data is NULL.
INF_Boolean IsInputMapped(void);
  Verifies that the input port passing data to this parameter is connected to a transformation.
INF_Boolean IsOutputMapped(void);
  Verifies that the output port receiving data from this parameter is connected to a transformation.
INF_Boolean IsInput(void);
  Verifies that the parameter corresponds to an input port.
INF_Boolean IsOutput(void);
  Verifies that the parameter corresponds to an output port.
INF_Boolean GetName(void);
  Gets the name of the parameter.
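For example, a sketch that branches on the parameter datatype before reading a value (port position 0 and the handled cases are illustrative):

double dVal = 0.0;
const char* sVal = NULL;
switch (m_pInParamVector[0].GetDataType())
{
case INF_DATATYPE_DOUBLE:
    dVal = m_pInParamVector[0].GetDouble();
    break;
case INF_DATATYPE_STRING:
    sVal = m_pInParamVector[0].GetString();
    break;
default:
    break;  // remaining datatypes omitted for brevity
}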
The following functions get and set parameter values by datatype:

Get functions:
- double GetDouble(void);
- char* GetString(void);
- char* GetRaw(void);

Set functions:
- void SetLong(long lVal);
- void SetDouble(double dblVal);
- void SetString(char* sVal);
- void SetRaw(char* rVal, size_t ActualDataLen);
- void SetTime(TINFTime timeVal);
Only use the SetInt32 or GetInt32 function when you run the external procedure on a 64-bit Integration Service. Do not use any of the following functions:
Table 7-8 lists the member variables of the external procedure base class:
Table 7-8. Member Variables of the External Procedure Base Class

Variable             Description
m_nInParamCount      Number of input parameters.
m_pInParamVector     Actual input parameter array.
m_nOutParamCount     Number of output parameters.
m_pOutParamVector    Actual output parameter array.
Signature
Use the following functions to obtain the Integration Service code page through the external procedure program. Both functions return equivalent information.
int GetServerCodePageID() const; const char* GetServerCodePageName() const;
Use the following functions to obtain the code page of the data the external procedure processes through the external procedure program. Both functions return equivalent information.
int GetDataCodePageID(); // returns 0 in case of error const char* GetDataCodePageName() const; // returns NULL in case of error
Signature
The char* returned by the transformation name access functions is an MBCS string in the code page of the Integration Service. It is not in the data code page.
206 Chapter 7: External Procedure Transformation
Signature
Use the following function to get the name of the external procedure associated with the External Procedure transformation:
const char* GetProcedureName() const;
Use the following function to get the index of the external procedure associated with the External Procedure transformation:
inline unsigned long GetProcedureIndex() const;
Signature
Use the following function to obtain the number of partitions in a session:
unsigned long GetNumberOfPartitions();
Use the following function to obtain the index of the partition that called this external procedure:
unsigned long GetPartitionIndex();
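For example, a sketch that builds a log message identifying the calling partition (sprintf requires <stdio.h>; the message text is illustrative):

char msg[64];
sprintf(msg, "called from partition %lu of %lu",
    GetPartitionIndex(), GetNumberOfPartitions());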
Signature
Use the following function to return the session trace level:
TracingLevelType GetSessionTraceLevel();
Chapter 8
Filter Transformation
Overview, 210
Filter Transformation Components, 211
Filter Condition, 214
Steps to Create a Filter Transformation, 215
Tips, 216
Overview
Transformation type: Active, Connected
Use the Filter transformation to filter out rows in a mapping. As an active transformation, the Filter transformation may change the number of rows passed through it. The Filter transformation allows rows that meet the specified filter condition to pass through. It drops rows that do not meet the condition. You can filter data based on one or more conditions.

A filter condition returns TRUE or FALSE for each row that the Integration Service evaluates, depending on whether a row meets the specified condition. The Integration Service passes each row that returns TRUE through the transformation. It drops each row that returns FALSE and writes a message to the session log.

The mapping in Figure 8-1 passes the rows from a human resources table that contains employee data through a Filter transformation. The filter allows rows through only for employees who make salaries of $30,000 or higher.
Figure 8-1. Sample Mapping with a Filter Transformation
You cannot concatenate ports from more than one transformation into the Filter transformation. The input ports for the filter must come from a single transformation.
Tip: Place the Filter transformation as close to the sources in the mapping as possible to
maximize session performance. Rather than passing rows you plan to discard through the mapping, you can filter out unwanted data early in the flow of data from sources to targets.
A Filter transformation has the following components:

- Transformation. Enter the name and description of the transformation. The naming convention for a Filter transformation is FIL_TransformationName. You can also make the transformation reusable.
- Ports. Create and configure ports. For more information, see Configuring Filter Transformation Ports on page 211.
- Properties. Configure the filter condition to filter rows. Use the Expression Editor to enter the filter condition. For more information about filter conditions, see Filter Condition on page 214. You can also configure the tracing level to determine the amount of transaction detail reported in the session log file.
- Metadata Extensions. Create a non-reusable metadata extension to extend the metadata of the Filter transformation. Configure the extension name, datatype, precision, and value. You can also promote metadata extensions to reusable extensions if you want to make them available to all Filter transformations. For more information about creating metadata extensions, see Metadata Extensions in the Repository Guide.
You can configure the following port properties:

- Port name. Name of the port. For more information about naming ports, see Working with Ports on page 7.
- Datatype, precision, and scale. Configure the datatype and set the precision and scale for each port.
- Port type. All ports are input/output ports. The input ports receive data and output ports pass data.
- Default values and description. Set default values for ports and add a description. For more information about using default values, see Using Default Values for Ports on page 20.
Filter Condition
The filter condition is an expression that returns TRUE or FALSE. Enter conditions using the Expression Editor available on the Properties tab. Any expression that returns a single value can be used as a filter. For example, if you want to filter out rows for employees whose salary is less than $30,000, you enter the following condition:
SALARY > 30000
You can specify multiple components of the condition, using the AND and OR logical operators. If you want to filter out employees who make less than $30,000 or more than $100,000, you enter the following condition:
SALARY > 30000 AND SALARY < 100000
You can also enter a constant for the filter condition. The numeric equivalent of FALSE is zero (0). Any non-zero value is the equivalent of TRUE. For example, the transformation contains a port named NUMBER_OF_UNITS with a numeric datatype. You configure a filter condition to return FALSE if the value of NUMBER_OF_UNITS equals zero. Otherwise, the condition returns TRUE. You do not need to specify TRUE or FALSE as values in the expression. TRUE and FALSE are implicit return values from any condition you set. If the filter condition evaluates to NULL, the row is treated as FALSE.
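For example, the NUMBER_OF_UNITS condition described above can be written as follows:

IIF(NUMBER_OF_UNITS > 0, TRUE, FALSE)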
Note: The filter condition is case sensitive.
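For example, you might test the FIRST_NAME port with the following condition:

IIF(ISNULL(FIRST_NAME), FALSE, TRUE)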
This condition states that if the FIRST_NAME port is NULL, the return value is FALSE and the row should be discarded. Otherwise, the row passes through to the next transformation. For more information about the ISNULL and IS_SPACES functions, see the Transformation Language Reference.
To create a Filter transformation:
1. In the Mapping Designer, open a mapping.
2. Click Transformation > Create. Select Filter transformation.
3. Enter a name for the transformation. Click Create and then click Done.
4. Select and drag all the ports from a source qualifier or other transformation to add them to the Filter transformation.
5. Double-click on the title bar and click the Ports tab. You can also manually create ports within the transformation.
6. Click the Properties tab to configure the filter condition and tracing level.
7. In the Value section of the filter condition, open the Expression Editor.
8. Enter the filter condition you want to apply. The default condition returns TRUE. Use values from one of the input ports in the transformation as part of this condition. However, you can also use values from output ports in other transformations.
9. Enter an expression. Click Validate to verify the syntax of the conditions you entered.
10. Select the tracing level.
11. Add metadata extensions on the Metadata Extensions tab.
Tips
- Use the Filter transformation early in the mapping. To maximize session performance, keep the Filter transformation as close as possible to the sources in the mapping. Rather than passing rows that you plan to discard through the mapping, you can filter out unwanted data early in the flow of data from sources to targets.
- Use the Source Qualifier transformation to filter. The Source Qualifier transformation provides an alternate way to filter rows. Rather than filtering rows from within a mapping, the Source Qualifier transformation filters rows when read from a source. The main difference is that the source qualifier limits the row set extracted from a source, while the Filter transformation limits the row set sent to a target. Since a source qualifier reduces the number of rows used throughout the mapping, it provides better performance. However, the Source Qualifier transformation only lets you filter rows from relational sources, while the Filter transformation filters rows from any type of source. Also, note that since it runs in the database, you must make sure that the filter condition in the Source Qualifier transformation only uses standard SQL. The Filter transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value. For more information about setting a filter for a Source Qualifier transformation, see Source Qualifier Transformation on page 467.
Chapter 9
HTTP Transformation
Overview, 218
Creating an HTTP Transformation, 220
Configuring the Properties Tab, 222
Configuring the HTTP Tab, 224
Examples, 229
Overview
Transformation type: Passive, Connected
The HTTP transformation enables you to connect to an HTTP server to use its services and applications. When you run a session with an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or update data on the HTTP server, depending on how you configure the transformation:
- Read data from an HTTP server. When the Integration Service reads data from an HTTP server, it retrieves the data from the HTTP server and passes the data to the target or a downstream transformation in the mapping. For example, you can connect to an HTTP server to read current inventory data, perform calculations on the data during the PowerCenter session, and pass the data to the target.
- Update data on the HTTP server. When the Integration Service writes to an HTTP server, it posts data to the HTTP server and passes HTTP server responses to the target or a downstream transformation in the mapping. For example, you can post data providing scheduling information from upstream transformations to the HTTP server during a session.
Figure 9-1 shows how the Integration Service processes an HTTP transformation:
Figure 9-1. HTTP Transformation Processing
The Integration Service passes data from upstream transformations or the source to the HTTP transformation, reads a URL configured in the HTTP transformation or application connection, and sends an HTTP request to the HTTP server to either read or update data. Requests contain header information and may contain body information. The header contains information such as authentication parameters, commands to activate programs or web services residing on the HTTP server, and other information that applies to the entire HTTP request. The body contains the data the Integration Service sends to the HTTP server.
When the Integration Service sends a request to read data, the HTTP server sends back an HTTP response with the requested data. The Integration Service sends the requested data to downstream transformations or the target. When the Integration Service sends a request to update data, the HTTP server writes the data it receives and sends back an HTTP response that the update succeeded. The HTTP transformation considers response codes 200 and 202 as a success. It considers all other response codes as failures. The session log displays an error when an HTTP server passes a response code that is considered a failure to the HTTP transformation. The Integration Service then sends the HTTP response to downstream transformations or the target. You can configure the HTTP transformation for the headers of HTTP responses. HTTP response body data passes through the HTTPOUT output port.
Authentication
The HTTP transformation uses the following forms of authentication:
- Basic. Based on a non-encrypted user name and password.
- Digest. Based on an encrypted user name and password.
- NTLM. Based on encrypted user name, password, and domain.
Configure an HTTP application connection in the following circumstances:

- The HTTP server requires authentication.
- You want to configure the connection timeout.
- You want to override the base URL in the HTTP transformation.
For information about configuring the HTTP connection object, see the Workflow Administration Guide.
An HTTP transformation has the following tabs:

- Transformation. Configure the name and description for the transformation.
- Ports. View input and output ports for the transformation. You cannot add or edit ports on the Ports tab. The Designer creates ports on the Ports tab when you add ports to the header group on the HTTP tab. For more information, see Configuring Groups and Ports on page 225.
- Properties. Configure properties for the HTTP transformation on the Properties tab. For more information, see Configuring the Properties Tab on page 222.
- Initialization Properties. You can define properties that the external procedure uses at run time, such as during initialization. For more information about creating initialization properties, see Working with Procedure Properties on page 90.
- Metadata Extensions. You can specify the property name, datatype, precision, and value. Use metadata extensions for passing information to the procedure. For more information about creating metadata extensions, see Metadata Extensions in the Repository Guide.
- Port Attribute Definitions. You can view port attributes for HTTP transformation ports. You cannot edit port attribute definitions.
- HTTP. Configure the method, ports, and URL on the HTTP tab. For more information, see Configuring the HTTP Tab on page 224.
To create an HTTP transformation:
1. In the Transformation Developer or Mapping Designer, click Transformation > Create.
2. Select HTTP transformation.
3. Enter a name for the transformation.
4. Click Create. The HTTP transformation displays in the workspace.
5. Click Done.
Table 9-1 describes the HTTP transformation properties that you can configure:
Table 9-1. HTTP Transformation Properties
Runtime Location
  Location that contains the DLL or shared library. Default is $PMExtProcDir. Enter a path relative to the Integration Service machine that runs the session using the HTTP transformation. If you make this property blank, the Integration Service uses the environment variable defined on the Integration Service machine to locate the DLL or shared library. You must copy all DLLs or shared libraries to the runtime location or to the environment variable defined on the Integration Service machine. The Integration Service fails to load the procedure when it cannot locate the DLL, shared library, or a referenced file.
Tracing Level
  Amount of detail displayed in the session log for this transformation. Default is Normal.
On the HTTP tab, you can perform the following tasks:

- Select the method. Select GET, POST, or SIMPLE POST method based on whether you want to read data from or write data to an HTTP server. For more information, see Selecting a Method on page 224.
- Configure groups and ports. Manage HTTP request/response body and header details by configuring input and output ports. You can also configure port names with special characters. For more information, see Configuring Groups and Ports on page 225.
- Configure a base URL. Configure the base URL for the HTTP server you want to connect to. For more information, see Configuring a URL on page 227.
Selecting a Method
The groups and ports you define in a transformation depend on the method you select. To read data from an HTTP server, select the GET method. To write data to an HTTP server, select the POST or SIMPLE POST method.
To define the metadata for the HTTP request, you must configure input and output ports based on the method you select:
- GET method. Use the input group to add input ports that the Designer uses to construct the final URL for the HTTP server.
- POST or SIMPLE POST method. Use the input group for the data that defines the body of the HTTP request.
For all methods, use the header group for the HTTP request header information.
The HTTP transformation contains the following groups:

- Output. Contains body data for the HTTP response. Passes responses from the HTTP server to downstream transformations or the target. By default, contains one output port, HTTPOUT. You cannot add ports to the output group. You can modify the precision for the HTTPOUT output port.
- Input. Contains body data for the HTTP request. Also contains metadata the Designer uses to construct the final URL to connect to the HTTP server. To write data to an HTTP server, the input group passes body information to the HTTP server. By default, contains one input port.
- Header. Contains header data for the request and response. Passes header information to the HTTP server when the Integration Service sends an HTTP request. Ports you add to the header group pass data for HTTP headers. When you add ports to the header group, the Designer adds ports to the input and output groups on the Ports tab. By default, contains no ports.
Note: The data that passes through an HTTP transformation must be of the String datatype.
String data includes any markup language common in HTTP communication, such as HTML and XML.
Table 9-3 describes the groups and ports for the GET method:
Table 9-3. GET Method Groups and Ports
REQUEST, Input group: The Designer uses the names and values of the input ports to construct the final URL.
REQUEST, Header group: You can configure input and input/output ports for HTTP requests. The Designer adds ports to the input and output groups based on the ports you add to the header group:
- Input group. Creates input ports based on input and input/output ports from the header group.
- Output group. Creates output ports based on input/output ports from the header group.
RESPONSE, Header group: You can configure output and input/output ports for HTTP responses. The Designer adds ports to the input and output groups based on the ports you add to the header group:
- Input group. Creates input ports based on input/output ports from the header group.
- Output group. Creates output ports based on output and input/output ports from the header group.
RESPONSE, Output group: All body data for an HTTP response passes through the HTTPOUT output port.
Table 9-5 describes the ports for the SIMPLE POST method:
Table 9-5. SIMPLE POST Method Groups and Ports
REQUEST, Input group: You can add one input port. Body data for an HTTP request can pass through one input port.
REQUEST, Header group: You can configure input and input/output ports for HTTP requests. The Designer adds ports to the input and output groups based on the ports you add to the header group:
- Input group. Creates input ports based on input and input/output ports from the header group.
- Output group. Creates output ports based on input/output ports from the header group.
RESPONSE, Header group: You can configure output and input/output ports for HTTP responses. The Designer adds ports to the input and output groups based on the ports you add to the header group:
- Input group. Creates input ports based on input/output ports from the header group.
- Output group. Creates output ports based on output and input/output ports from the header group.
RESPONSE, Output group: All body data for an HTTP response passes through the HTTPOUT output port.
Configuring a URL
After you select a method and configure input and output ports, you must configure a URL. Enter a base URL, and the Designer constructs the final URL. If you select the GET method, the final URL contains the base URL and parameters based on the port names in the input group. If you select the POST or SIMPLE POST methods, the final URL is the same as the base URL. You can also specify a URL when you configure an HTTP application connection. The base URL specified in the HTTP application connection overrides the base URL specified in the HTTP transformation.
Note: An HTTP server can redirect an HTTP request to another HTTP server. When this
occurs, the HTTP server sends a URL back to the Integration Service, which then establishes a connection to the other HTTP server. The Integration Service can establish a maximum of five additional connections.
For a GET request, the final URL contains the base URL, a question mark (?), followed by name/value pairs. The Designer appends the question mark and the name/value pairs that correspond to the names and values of the input ports you add to the input group. When you select the GET method and add input ports to the input group, the Designer appends the following group and port information to the base URL to construct the final URL:
?<input group input port 1 name> = $<input group input port 1 value>
For each input port following the first input group input port, the Designer appends the following group and port information:
& <input group input port n name> = $<input group input port n value>
where n represents the input port. For example, if you enter www.company.com for the base URL and add the input ports ID, EmpName, and Department to the input group, the Designer constructs the following final URL:
www.company.com?ID=$ID&EmpName=$EmpName&Department=$Department
You can edit the final URL to modify or add operators, variables, or other arguments. For more information about HTTP requests and query string, see http://www.w3c.org.
Examples
This section contains examples for each type of method:
GET Example
The source file used with this example contains the following data:
78576
78577
78578
Figure 9-5 shows the HTTP tab of the HTTP transformation for the GET example:
Figure 9-5. HTTP Tab for a GET Example
The Designer appends a question mark (?), the input group input port name, a dollar sign ($), and the input group input port name again to the base URL to construct the final URL:
http://www.informatica.com?CR=$CR
The Integration Service sends the source file values to the CR input port of the HTTP transformation and sends the following HTTP requests to the HTTP server:
http://www.informatica.com?CR=78576 http://www.informatica.com?CR=78577 http://www.informatica.com?CR=78578
The HTTP server sends an HTTP response back to the Integration Service, which sends the data through the output port of the HTTP transformation to the target.
POST Example
The source file used with this example contains the following data:
33,44,1
44,55,2
100,66,0
Figure 9-6 shows that each field in the source file has a corresponding input port:
Figure 9-6. HTTP Tab for a POST Example
The Integration Service sends the values of the three fields for each row through the input ports of the HTTP transformation and sends the HTTP request to the HTTP server specified in the final URL.
Figure 9-7 shows the HTTP tab of the HTTP transformation for the SIMPLE POST example:
Figure 9-7. HTTP Tab for a SIMPLE POST Example
The Integration Service sends the body of the source file through the input port and sends the HTTP request to the HTTP server specified in the final URL.
Chapter 10
Java Transformation
Overview, 234
Using the Java Code Tab, 237
Configuring Ports, 239
Configuring Java Transformation Properties, 241
Developing Java Code, 245
Configuring Java Transformation Settings, 250
Compiling a Java Transformation, 253
Fixing Compilation Errors, 254
Overview
Transformation type: Active/Passive, Connected
You can extend PowerCenter functionality with the Java transformation. The Java transformation provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment. For example, you can define transformation logic to loop through input rows and generate multiple output rows based on a specific condition. You can also use expressions, user-defined functions, unconnected transformations, and mapping variables in the Java code.

You create Java transformations by writing Java code snippets that define transformation logic. You can use Java transformation API methods and standard Java language constructs. For example, you can use static code and variables, instance variables, and Java methods. You can use third-party Java APIs, built-in Java packages, or custom Java packages. You can also define and use Java expressions to call expressions from within a Java transformation. For more information about the Java transformation API methods, see Java Transformation API Reference on page 259. For more information about using Java expressions, see Java Expressions on page 273.

The PowerCenter Client uses the Java Development Kit (JDK) to compile the Java code and generate byte code for the transformation. The Integration Service uses the Java Runtime Environment (JRE) to execute the generated byte code at run time. When you run a session with a Java transformation, the Integration Service uses the JRE to execute the byte code, process input rows, and generate output rows.

You can define transformation behavior for a Java transformation based on the following events:
- The transformation receives an input row
- The transformation has processed all input rows
- The transformation receives a transaction notification such as commit or rollback
3. Configure the transformation properties. For more information, see Configuring Java Transformation Properties on page 241.
4. Use the code entry tabs in the transformation to write and compile the Java code for the transformation. For more information, see Developing Java Code on page 245 and Compiling a Java Transformation on page 253.
5. Locate and fix compilation errors in the Java code for the transformation. For more information, see Fixing Compilation Errors on page 254.
Datatype Mapping
The Java transformation maps PowerCenter datatypes to Java primitives, based on the Java transformation port type. The Java transformation maps input port datatypes to Java primitives when it reads input rows, and it maps Java primitives to output port datatypes when it writes output rows. For example, if an input port in a Java transformation has an Integer datatype, the Java transformation maps it to an integer primitive. The transformation treats the value of the input port as Integer in the transformation, and maps the Integer primitive to an integer datatype when the transformation generates the output row. Table 10-1 shows the mapping between PowerCenter datatypes and Java primitives by a Java transformation:
Table 10-1. Mapping from PowerCenter Datatypes to Java Datatypes
PowerCenter Datatype    Java Datatype
CHAR                    String
BINARY                  byte[]
LONG (INT32)            int
* For more information about configuring the Java datatype for PowerCenter Decimal datatypes, see Enabling High Precision on page 251.
** For more information about configuring the Java datatype for PowerCenter Date/Time datatypes, see Processing Subseconds on page 252.

String and byte[] are object datatypes in Java. int, double, and long are primitive datatypes.
The Java Code tab contains the following components:

- Navigator. Add input or output ports or APIs to a code snippet. The Navigator lists the input and output ports for the transformation, the available Java transformation APIs, and a description of the port or API function. For input and output ports, the description includes the port name, type, datatype, precision, and scale. For API functions, the description includes the syntax and use of the API function. The Navigator disables any port or API function that is unavailable for the code entry tab. For example, you cannot add ports or call API functions from the Import Packages code entry tab.
For more information about using the Navigator when you develop Java code, see Developing Java Code on page 245.
- Code window. Develop Java code for the transformation. The code window uses basic Java syntax highlighting. For more information, see Developing Java Code on page 245.
- Code entry tabs. Define transformation behavior. Each code entry tab has an associated Code window. To enter Java code for a code entry tab, click the tab and write Java code in the Code window. For more information about the code entry tabs, see Developing Java Code on page 245.
- Define Expression link. Launches the Define Expression dialog box that you use to create Java expressions. For more information about creating and using Java expressions, see Java Expressions on page 273.
- Settings link. Launches the Settings dialog box that you use to set the classpath for third-party and custom Java packages and to enable high precision for Decimal datatypes. For more information, see Configuring Java Transformation Settings on page 250.
- Compile link. Compiles the Java code for the transformation. Output from the Java compiler, including error and informational messages, appears in the Output window. For more information about compiling Java transformations, see Compiling a Java Transformation on page 253.
- Full Code link. Opens the Full Code window to display the complete class code for the Java transformation. The complete code for the transformation includes the Java code from the code entry tabs added to the Java transformation class template. For more information about using the Full Code window, see Fixing Compilation Errors on page 254.
- Output window. Displays the compilation results for the Java transformation class. You can right-click an error message in the Output window to locate the error in the snippet code or the full code for the Java transformation class in the Full Code window. You can also double-click an error in the Output window to locate the source of the error. For more information about using the Output window to troubleshoot compilation errors, see Fixing Compilation Errors on page 254.
Configuring Ports
A Java transformation can have input ports, output ports, and input/output ports. You create and edit groups and ports on the Ports tab. You can specify default values for ports. After you add ports to a transformation, use the port names as variables in Java code snippets. Figure 10-2 shows the Ports tab for a Java transformation with one input group and one output group:
Figure 10-2. Java Transformation Ports Tab
The Java transformation initializes port variables according to the port datatype:

- Simple datatypes. If you define a default value for the port, the transformation initializes the value of the port variable to the default value. Otherwise, it initializes the value of the port variable to 0.
- Complex datatypes. If you provide a default value for the port, the transformation creates a new String or byte[] object, and initializes the object to the default value. Otherwise, the transformation initializes the port variable to NULL. Input ports with a NULL value generate a NullPointerException if you access the value of the port variable in the Java code.
Input/Output Ports
The Java transformation treats input/output ports as pass-through ports. If you do not set a value for the port in the Java code for the transformation, the output value is the same as the input value. The Java transformation initializes the value of an input/output port in the same way as an input port. If you set the value of a port variable for an input/output port in the Java code, the Java transformation uses this value when it generates an output row. If you do not set the value of an input/output port, the Java transformation sets the value of the port variable to 0 for simple datatypes and NULL for complex datatypes when it generates an output row.
The following properties are specific to the Java transformation:

- Transformation Scope. Determines how the Integration Service applies the transformation logic to incoming data.
- Generate Transaction. Indicates that the Java code for the transformation generates transaction rows and outputs them to the output group.
Transformation Scope
You can configure how the Integration Service applies the transformation logic to incoming data. You can choose one of the following values:
- Row. Applies the transformation logic to one row of data at a time. Choose Row when the results of the transformation depend on a single row of data. You must choose Row for passive transformations.
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when the results of the transformation depend on all rows in the same transaction, but not on rows in other transactions. For example, you might choose Transaction when the Java code performs aggregate calculations on the data in a single transaction.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, the Integration Service drops transaction boundaries. Choose All Input when the results of the transformation depend on all rows of data in the source. For example, you might choose All Input when the Java code for the transformation sorts all incoming data.
For more information about transformation scope, see Understanding Commit Points in the Workflow Administration Guide.
Generate Transaction
You can define Java code in an active Java transformation to generate transaction rows, such as commit and rollback rows. If the transformation generates commit and rollback rows, configure the Java transformation to generate transactions with the Generate Transaction transformation property. For more information about Java transformation API methods to generate transaction rows, see commit on page 261 and rollBack on page 269. When you configure the transformation to generate transaction rows, the Integration Service treats the Java transformation like a Transaction Control transformation. Most rules that apply to a Transaction Control transformation in a mapping also apply to the Java transformation. For example, when you configure a Java transformation to generate transaction rows, you cannot concatenate pipelines or pipeline branches containing the transformation. For more information about working with Transaction Control transformations, see Transaction Control Transformation on page 577. When you edit or create a session using a Java transformation configured to generate transaction rows, configure it for user-defined commit.
You can define the update strategy at the following levels:

- Within the Java code. You can write the Java code to set the update strategy for output rows. The Java code can flag rows for insert, update, delete, or reject. For more information about setting the update strategy, see setOutRowType on page 271.
- Within the mapping. Use the Java transformation in a mapping to flag rows for insert, update, delete, or reject. Select the Update Strategy Transformation property for the Java transformation.
- Within the session. Configure the session to treat the source rows as data driven.
If you do not configure the Java transformation to define the update strategy, or you do not configure the session as data driven, the Integration Service does not use the Java code to flag the output rows. Instead, the Integration Service flags the output rows as insert.
The Java Code tab contains the following code entry tabs:

- Import Packages. Import third-party Java packages, built-in Java packages, or custom Java packages. For more information, see Importing Java Packages on page 246.
- Helper Code. Define variables and methods available to all tabs except Import Packages. For more information, see Defining Helper Code on page 246.
- On Input Row. Define transformation behavior when it receives an input row. For more information, see On Input Row Tab on page 247.
- On End of Data. Define transformation behavior when it has processed all input data. For more information, see On End of Data Tab on page 248.
- On Receiving Transaction. Define transformation behavior when it receives a transaction notification. Use with active Java transformations. For more information, see On Receiving Transaction Tab on page 248.
- Java Expressions. Define Java expressions to call PowerCenter expressions. You can use Java expressions in the Helper Code, On Input Row, On End of Data, and On Transaction code entry tabs. For more information about Java expressions, see Java Expressions on page 273.
Access input data and set output data on the On Input Row tab. For active transformations, you can also set output data on the On End of Data and On Receiving Transaction tabs.
To enter Java code for a code entry tab:
1. Click the appropriate code entry tab.
2. To access input or output column variables in the snippet, double-click the name of the port in the Navigator.
3. To call a Java transformation API in the snippet, double-click the name of the API in the Navigator. If necessary, configure the appropriate API input values.
4. Write appropriate Java code, depending on the code snippet.
The Full Code window displays the full class code for the Java transformation.
When you import non-standard Java packages, add the package or class to the classpath. For more information about setting the classpath, see Configuring Java Transformation Settings on page 250. When you export or import metadata that contains a Java transformation in the PowerCenter Client, the JAR files or classes that contain the third-party or custom packages required by the Java transformation are not included. If you import metadata that contains a Java transformation, copy the JAR files or classes that contain the required third-party or custom packages to the PowerCenter Client and Integration Service machines.
Static code and static variables. You can declare static variables and static code within a static block. All instances of a reusable Java transformation in a mapping and all partitions in a session share static code and variables. Static code executes before any other code in a Java transformation. For example, the following code declares a static variable to store the error threshold for all instances of a Java transformation in a mapping:
static int errorThreshold;
Use this variable to store the error threshold for the transformation and access it from all instances of the Java transformation in a mapping and from any partition in a session.
Note: You must synchronize static variables in a multiple partition session or in a reusable
Java transformation.
Instance variables. You can declare partition-level instance variables. Multiple instances of a reusable Java transformation in a mapping or multiple partitions in a session do not share instance variables. Declare instance variables with a prefix to avoid conflicts and initialize non-primitive instance variables.
For example, the following code uses a boolean variable to decide whether to generate an output row:
// boolean to decide whether to generate an output row // based on validity of input private boolean generateRow;
User-defined methods. Create user-defined static or instance methods to extend the functionality of the Java transformation. Java methods declared in the Helper Code tab can use or modify output variables or locally declared instance variables. You cannot access input variables from Java methods in the Helper Code tab. For example, use the following code in the Helper Code tab to declare a function that adds two integers:
private int myTXAdd (int num1,int num2) { return num1+num2; }
Input port and output port variables. You can access input and output port data as a variable by using the name of the port as the name of the variable. For example, if in_int is an Integer input port, you can access the data for this port by referring to the variable in_int with the Java primitive datatype int. You do not need to declare input and output ports as variables. Do not assign a value to an input port variable. If you assign a value to an input variable in the On Input Row tab, you cannot get the input data for the corresponding port in the current row.
Instance variables and user-defined methods. Use any instance or static variable or userdefined method you declared in the Helper Code tab. For example, an active Java transformation has two input ports, BASE_SALARY and BONUSES, with an integer datatype, and a single output port, TOTAL_COMP, with an integer datatype. You create a user-defined method in the Helper Code tab, myTXAdd, that adds two integers and returns the result. Use the following Java code in the On Input Row tab to assign the total values for the input ports to the output port and generate an output row:
TOTAL_COMP = myTXAdd (BASE_SALARY,BONUSES); generateRow();
When the Java transformation receives an input row, it adds the values of the BASE_SALARY and BONUSES input ports, assigns the value to the TOTAL_COMP output port, and generates an output row.
Java transformation API methods. You can call API methods provided by the Java transformation. For more information about Java transformation API methods, see Java Transformation API Reference on page 259.
- Output port variables. Use the names of output ports as variables to access or set output data for active Java transformations.
- Instance variables and user-defined methods. Use any instance variables or user-defined methods you declared in the Helper Code tab.
- Java transformation API methods. Call API methods provided by the Java transformation. Use the commit and rollBack API methods to generate a transaction.

For example, use the following Java code to write information to the session log when the end of data is reached:
logInfo("Number of null rows for partition is: " + partCountNullRows);
For more information about API methods, see Java Transformation API Reference on page 259.
- Output port variables. Use the names of output ports as variables to access or set output data.
- Instance variables and user-defined methods. Use any instance variables or user-defined methods you declared in the Helper Code tab.
- Java transformation API methods. Call API methods provided by the Java transformation. Use the commit and rollBack API methods to generate a transaction.

For example, use the following Java code to generate a transaction after the transformation receives a transaction:
commit();
For more information about API methods, see Java Transformation API Reference on page 259.
You can set the classpath for the Integration Service in the following ways:

- Configure the Java Classpath session property. Set the classpath using the Java Classpath session property. This classpath applies to the session. For more information about Java Classpath, see the Workflow Administration Guide.
- Configure the Java SDK Classpath. Configure the Java SDK Classpath on the Processes tab of the Integration Service properties in the Administration Console. This setting applies to all sessions run on the Integration Service.
Configure the CLASSPATH environment variable. Set the CLASSPATH environment variable on the Integration Service machine. Restart the Integration Service after you set the environment variable. This applies to all sessions run on the Integration Service.
Enter the environment variable CLASSPATH, and set the value to the default classpath. For information about setting environment variables on Windows, consult Microsoft documentation.
Configure the CLASSPATH environment variable. Set the CLASSPATH environment variable on the PowerCenter Client machine. This applies to all Java processes run on the machine.
Configure the Java transformation settings. Set the classpath in the Java transformation settings. This applies to sessions that include this Java transformation. The PowerCenter Client adds the required files to the classpath when you compile the Java code.
1. On the Java Code tab, click the Settings link.
   The Settings dialog box appears.
2. Click Browse under Add Classpath to select the JAR file or class file for the imported package. Click OK.
3. Click Add.
   The JAR or class file appears in the list of JAR and class files for the transformation.
4. To remove a JAR file or class file, select the JAR or class file and click Remove.
When you enable high precision, you can process Decimal ports with precision less than 28 as BigDecimal. The Java transformation converts decimal data with a precision greater than 28 to the Double datatype.

Enabling high precision does not affect how the Integration Service processes bigint data. Java transformation expressions process binary, integer, double, and string data. Java transformation expressions cannot process bigint data.

For example, a Java transformation has an input port of type Decimal that receives a value of 40012030304957666903. If you enable high precision, the value of the port is treated as it appears. If you do not enable high precision, the value of the port is 4.00120303049577 x 10^19.
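For example, the following sketch assumes high precision is enabled, a hypothetical Decimal input port named DEC_IN with precision less than 28, and that java.math.BigDecimal is imported on the Import Packages tab:

// DEC_IN arrives as a java.math.BigDecimal when high precision is enabled
// and the port precision is less than 28. DEC_IN is a hypothetical port.
if (!isNull("DEC_IN"))
{
    BigDecimal rounded = DEC_IN.setScale(2, BigDecimal.ROUND_HALF_UP);
    logInfo("Rounded decimal value: " + rounded.toString());
}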
Processing Subseconds
You can process subsecond data up to nanoseconds in the Java code. When you configure the settings to use nanoseconds in datetime values, the generated Java code converts the transformation Date/Time datatype to the Java BigDecimal datatype, which has precision to the nanosecond. By default, the generated Java code converts the transformation Date/Time datatype to the Java Long datatype, which has precision to the millisecond.
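For example, with the default millisecond setting, a Date/Time port arrives as a Java long. The following sketch, which assumes a hypothetical Date/Time input port named DATE_IN, converts the value to a java.util.Date:

// DATE_IN is a hypothetical Date/Time input port, available as a long
// (milliseconds since the epoch) when the nanosecond setting is disabled.
if (!isNull("DATE_IN"))
{
    java.util.Date rowDate = new java.util.Date(DATE_IN);
    logInfo("Row date: " + rowDate.toString());
}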
Locate the source of the error. You can locate the source of the error in the Java snippet code or in the full class code for the transformation.
Identify the type of error. Use the results of the compilation in the output window and the location of the error to identify the type of error.
After you identify the source and type of error, fix the Java code in the code entry tab and compile the transformation again.
However, if you use the same variable name in the On Input Row tab, the Java compiler issues an error for a redeclaration of a variable. You must rename the variable in the On Input Row code entry tab to fix the error.
When you compile the transformation, the PowerCenter Client adds the code from the On Input Row code entry tab to the full class code for the transformation. When the Java compiler compiles the Java code, the unmatched brace causes a method in the full class code to terminate prematurely, and the Java compiler issues an error.
Chapter 11
Java Transformation API Reference
The Java transformation provides the following API methods:
commit. Generates a transaction. For more information, see commit on page 261.
failSession. Throws an exception with an error message and fails the session. For more information, see failSession on page 262.
generateRow. Generates an output row for active Java transformations. For more information, see generateRow on page 263.
getInRowType. Returns the input type of the current row in the transformation. For more information, see getInRowType on page 264.
incrementErrorCount. Increases the error count for the session. For more information, see incrementErrorCount on page 265.
isNull. Checks the value of an input column for a null value. For more information, see isNull on page 266.
logError. Writes an error message to the session log. For more information, see logError on page 268.
logInfo. Writes an informational message to the session log. For more information, see logInfo on page 267.
rollBack. Generates a rollback transaction. For more information, see rollBack on page 269.
setNull. Sets the value of an output column in an active or passive Java transformation to NULL. For more information, see setNull on page 270.
setOutRowType. Sets the update strategy for output rows. For more information, see setOutRowType on page 271.
You can add any API method to a code entry tab by double-clicking the name of the API method in the Navigator, dragging the method from the Navigator into the Java code snippet, or manually typing the API method in the Java code snippet. You can also use the defineJExpression and invokeJExpression API methods to create and invoke Java expressions. For more information about using the API methods with Java expressions, see Java Expressions on page 273.
commit
Generates a transaction. Use commit in any tab except the Import Packages or Java Expressions code entry tabs. You can only use commit in active transformations configured to generate transactions. If you use commit in an active transformation not configured to generate transactions, the Integration Service throws an error and fails the session.
Syntax
Use the following syntax:
commit();
Example
Use the following Java code to generate a transaction for every 100 rows processed by a Java transformation and then set the rowsProcessed counter to 0:
if (rowsProcessed == 100)
{
    commit();
    rowsProcessed = 0;
}
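This sketch assumes rowsProcessed is a counter you declare on the Helper Code tab and increment on the On Input Row tab for every row, for example:

// Hypothetical counter for rows processed since the last commit; declared
// on the Helper Code tab and incremented on the On Input Row tab.
private int rowsProcessed = 0;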
failSession
Throws an exception with an error message and fails the session. Use failSession to terminate the session. Do not use failSession in a try/catch block in a code entry tab. Use failSession in any tab except the Import Packages or Java Expressions code entry tabs.
Syntax
Use the following syntax:
failSession(String errorMessage);

Argument       Datatype   Input/Output   Description
errorMessage   String     Input          Error message to write to the session log.
Example
Use the following Java code to test the input port input1 for a null value and fail the session if input1 is NULL:
if (isNull("input1"))
{
    failSession("Cannot process a null value for port input1.");
}
generateRow
Generates an output row for active Java transformations. When you call generateRow, the Java transformation generates an output row using the current value of the output port variables. If you want to generate multiple rows corresponding to an input row, you can call generateRow more than once for each input row. If you do not use generateRow in an active Java transformation, the transformation does not generate output rows. Use generateRow in any code entry tab except the Import Packages or Java Expressions code entry tabs. You can use generateRow with active transformations only. If you use generateRow in a passive transformation, the session generates an error.
Syntax
Use the following syntax:
generateRow();
Example
Use the following Java code to generate one output row, modify the values of the output ports, and generate another output row:
// Generate multiple rows.
if (!isNull("input1") && !isNull("input2"))
{
    output1 = input1 + input2;
    output2 = input1 - input2;
}
generateRow();
// Generate another row with modified values.
output1 = output1 * 2;
output2 = output2 * 2;
generateRow();
getInRowType
Returns the input type of the current row in the transformation. The method returns a value of insert, update, delete, or reject. You can only use getInRowType in the On Input Row code entry tab. You can only use the getInRowType method in active transformations configured to set the update strategy. If you use this method in an active transformation not configured to set the update strategy, the session generates an error.
Syntax
Use the following syntax:
rowType getInRowType();

Argument   Datatype   Input/Output   Description
rowType    String     Output         Update strategy type. Value can be INSERT, UPDATE, DELETE, or REJECT.
Example
Use the following Java code to propagate the input type of the current row if the row type is UPDATE or INSERT and the value of the input port input1 is 100 or less, or to set the output type to DELETE if the value of input1 is greater than 100:
// Set the value of the output port.
output1 = input1;
// Get and set the row type.
String rowType = getInRowType();
setOutRowType(rowType);
// Set the row type to DELETE if the output port value is > 100.
if (input1 > 100)
    setOutRowType("DELETE");
incrementErrorCount
Increases the error count for the session. If the error count reaches the error threshold for the session, the session fails. Use incrementErrorCount in any tab except the Import Packages or Java Expressions code entry tabs.
Syntax
Use the following syntax:
incrementErrorCount(int nErrors);

Argument   Datatype   Input/Output   Description
nErrors    Integer    Input          Number of errors to increment the error count for the session.
Example
Use the following Java code to increment the error count if an input port for a transformation has a null value:
// Check if the input employee id and name is null.
if (isNull("EMP_ID_INP") || isNull("EMP_NAME_INP"))
{
    incrementErrorCount(1);
    // If the input employee id and/or name is null, do not generate an
    // output row for this input row.
    generateRow = false;
}
isNull
Checks the value of an input column for a null value. Use isNull to check if data of an input column is NULL before using the column as a value. You can use the isNull method in the On Input Row code entry tab only.
Syntax
Use the following syntax:
Boolean isNull(String strColName);

Argument     Datatype   Input/Output   Description
strColName   String     Input          Name of an input column to check for a null value.
Example
Use the following Java code to check the value of the SALARY input column before adding it to the instance variable totalSalaries:
// If the value of SALARY is not null,
if (!isNull("SALARY"))
{
    // add to totalSalaries.
    TOTAL_SALARIES += SALARY;
}
or
// If the value of SALARY is not null,
String strColName = "SALARY";
if (!isNull(strColName))
{
    // add to totalSalaries.
    TOTAL_SALARIES += SALARY;
}
logInfo
Writes an informational message to the session log. Use logInfo in any tab except the Import Packages or Java Expressions tabs.
Syntax
Use the following syntax:
logInfo(String logMessage);

Argument     Datatype   Input/Output   Description
logMessage   String     Input          Informational message to write to the session log.
Example
Use the following Java code to write a message to the session log after the Java transformation processes a message threshold of 1,000 rows:
if (numRowsProcessed == messageThreshold)
{
    logInfo("Processed " + messageThreshold + " rows.");
}
logError
Writes an error message to the session log. Use logError in any tab except the Import Packages or Java Expressions code entry tabs.
Syntax
Use the following syntax:
logError(String errorMessage);

Argument       Datatype   Input/Output   Description
errorMessage   String     Input          Error message to write to the session log.
Example
Use the following Java code to log an error if the input port is null:
// Check BASE_SALARY.
if (isNull("BASE_SALARY"))
{
    logError("Cannot process a null salary field.");
}
rollBack
Generates a rollback transaction. Use rollBack in any tab except the Import Packages or Java Expressions code entry tabs. You can only use rollBack in active transformations configured to generate transactions. If you use rollBack in an active transformation not configured to generate transactions, the Integration Service generates an error and fails the session.
Syntax
Use the following syntax:
rollBack();
Example
Use the following code to generate a rollback transaction and fail the session if an input row has an illegal condition, or to generate a transaction if the number of rows processed is 100:
// If the row is not legal, roll back and fail the session.
if (!isRowLegal())
{
    rollBack();
    failSession("Cannot process illegal row.");
}
else if (rowsProcessed == 100)
{
    commit();
    rowsProcessed = 0;
}
setNull
Sets the value of an output column in an active or passive Java transformation to NULL. Once you set an output column to NULL, you cannot modify the value until you generate an output row. Use setNull in any tab except the Import Packages or Java Expressions code entry tabs.
Syntax
Use the following syntax:
setNull(String strColName);

Argument     Datatype   Input/Output   Description
strColName   String     Input          Name of an output column to set to NULL.
Example
Use the following Java code to check the value of an input column and set the corresponding value of an output column to null:
// Check the value of the Q3RESULTS input column.
if (isNull("Q3RESULTS"))
{
    // Set the value of the output column to null.
    setNull("RESULTS");
}
or
// Check the value of the Q3RESULTS input column.
String strColName = "Q3RESULTS";
if (isNull(strColName))
{
    // Set the value of the output column to null.
    setNull(strColName);
}
setOutRowType
Sets the update strategy for output rows. The setOutRowType method can flag rows for insert, update, or delete. You can only use setOutRowType in the On Input Row code entry tab. You can only use setOutRowType in active transformations configured to set the update strategy. If you use setOutRowType in an active transformation not configured to set the update strategy, the session generates an error and fails.
Syntax
Use the following syntax:
setOutRowType(String rowType);

Argument   Datatype   Input/Output   Description
rowType    String     Input          Update strategy type. Value can be INSERT, UPDATE, or DELETE.
Example
Use the following Java code to propagate the input type of the current row if the row type is UPDATE or INSERT and the value of the input port input1 is 100 or less, or to set the output type to DELETE if the value of input1 is greater than 100:
// Set the value of the output port.
output1 = input1;
// Get and set the row type.
String rowType = getInRowType();
setOutRowType(rowType);
// Set the row type to DELETE if the output port value is > 100.
if (input1 > 100)
    setOutRowType("DELETE");
Chapter 12
Java Expressions
Overview, 274
Using the Define Expression Dialog Box, 276
Working with the Simple Interface, 281
Working with the Advanced Interface, 283
JExpression API Reference, 289
Overview
You can invoke PowerCenter expressions in a Java transformation with the Java programming language. Use expressions to extend the functionality of a Java transformation. For example, you can invoke an expression in a Java transformation to look up the values of input or output ports or look up the values of Java transformation variables. To invoke an expression, you generate the Java code for the expression or use the Java transformation API methods to write the code, and then invoke the expression and use its result in the appropriate code entry tab. Use the following methods to create and invoke expressions in a Java transformation:
Use the Define Expression dialog box. Create an expression and generate the code for an expression. For more information, see Using the Define Expression Dialog Box on page 276.
Use the simple interface. Use a single method to invoke an expression and get the result of the expression. For more information, see Working with the Simple Interface on page 281.
Use the advanced interface. Use the advanced interface to define the expression, invoke the expression, and use the result of the expression. For more information, see Working with the Advanced Interface on page 283.
You can invoke expressions in a Java transformation without advanced knowledge of the Java programming language. You can invoke expressions using the simple interface, which requires only a single method to invoke an expression. If you are familiar with object-oriented programming and want more control over invoking the expression, you can use the advanced interface.
You can use the following elements in an expression:
Transformation language functions. SQL-like functions designed to handle common expressions.
User-defined functions. Functions you create in PowerCenter based on transformation language functions.
Custom functions. Functions you create with the Custom Function API.
Unconnected transformations. You can use unconnected transformations in expressions. For example, you can use an unconnected lookup transformation in an expression.
You can also use built-in variables, user-defined mapping and workflow variables, and predefined workflow variables such as $Session.status in expressions. For more information about the transformation language and custom functions, see the Transformation Language Reference. For more information about user-defined functions, see Working with User-Defined Functions in the Designer Guide.
After you generate the Java code, call the generated function in the appropriate code entry tab to invoke an expression or get a JExpression object, depending on whether you use the simple or advanced interface.
Note: To validate an expression when you create the expression, you must use the Define Expression dialog box.
Use the following rules and guidelines when you configure the function and expression:
Use a unique function name that does not conflict with an existing Java function in the transformation or reserved Java keywords.
You must configure the parameter name, Java datatype, precision, and scale. The input parameters are the values you pass when you call the function in the Java code for the transformation.
To pass a Date datatype to an expression, use a String datatype for the input parameter. If an expression returns a Date datatype, you can use the return value as a String datatype in the simple interface and a String or long datatype in the advanced interface.
For more information about the mapping between PowerCenter datatypes and Java datatypes, see Datatype Mapping on page 235.
Figure 12-1 shows the Define Expression dialog box where you configure the function and the expression for a Java transformation:
Figure 12-1. Define Expression Dialog Box
Figure 12-2 shows the Java Expressions code entry tab and generated Java code for an expression in the advanced interface:
Figure 12-2. Java Expressions Code Entry Tab
To create an expression and generate the code for the expression, complete the following steps:
1. In the Transformation Developer, open a Java transformation or create a new Java transformation.
2. Click the Java Code tab.
3. Click the Define Expression link.
   The Define Expression dialog box appears.
4. Enter a function name.
5. Optionally, enter a description for the expression.
   You can enter up to 2,000 characters.
6. Create the parameters for the expression.
   When you create the parameters, configure the parameter name, datatype, precision, and scale.
7. Click Launch Editor to create an expression with the parameters you created in step 6.
8. Click Validate to validate the expression.
9. Optionally, you can enter the expression in the Expression field and click Validate to validate the expression.
10. If you want to generate Java code using the advanced interface, select Generate advanced code.
11. Click Generate.
    The Designer generates the function to invoke the expression in the Java Expressions code entry tab.
The following example shows the template for a Java expression generated using the advanced interface:
JExpression function_name() throws SDKException
{
    JExprParamMetadata params[] = new JExprParamMetadata[number of parameters];
    params[0] = new JExprParamMetadata (
        EDataType.STRING,   // data type
        20,                 // precision
        0                   // scale
    );
    ...
    params[number of parameters - 1] = new JExprParamMetadata (
        EDataType.STRING,   // data type
        20,                 // precision
        0                   // scale
    );
    ...
    return defineJExpression(String expression, params);
}
invokeJExpression
Invokes an expression and returns the value for the expression. Input parameters for invokeJExpression are a string value that represents the expression and an array of objects that contain the expression input parameters. Use the following rules and guidelines when you use invokeJExpression:
Return datatype. The return type of invokeJExpression is an object. You must cast the return value of the function with the appropriate datatype. You can return values with Integer, Double, String, and byte[] datatypes.
Row type. The row type for return values from invokeJExpression is INSERT. If you want to use a different row type for the return value, use the advanced interface. For more information, see invoke on page 289.
Null values. If you pass a null value as a parameter or the return value for invokeJExpression is NULL, the value is treated as a null indicator. For example, if the return value of an expression is NULL and the return datatype is String, a string is returned with a value of null.
Date datatype. You must convert input parameters with a Date datatype to String. To use the string in an expression as a Date datatype, use the to_date() function to convert the string to a Date datatype. Also, you must cast the return type of any expression that returns a Date datatype as a String.
Use the following syntax:

(datatype)invokeJExpression(String expression, Object[] paramMetadataArray);

Argument             Datatype   Input/Output   Description
expression           String     Input          String that represents the expression.
paramMetadataArray   Object[]   Input          Array of objects that contain the input parameters for the expression.
The following example concatenates the two strings John and Smith and returns John Smith:
(String)invokeJExpression("concat(x1,x2)", new Object [] { "John ", "Smith" });
Note: The parameters passed to the expression must be numbered consecutively and start with the letter x. For example, to pass three parameters to an expression, name the parameters x1, x2, and x3.
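For example, the following sketch passes three parameters named x1, x2, and x3 to a hypothetical arithmetic expression, assuming the expression result is an Integer:

// Pass three consecutively numbered parameters and cast the result.
Integer total = (Integer)invokeJExpression(
    "x1 + x2 + x3",
    new Object [] { new Integer(10), new Integer(20), new Integer(30) });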
The advanced interface includes the following classes and API methods:
EDataType class. Enumerates the datatypes for an expression. For more information, see EDataType Class on page 284.
JExprParamMetadata class. Contains the metadata for each parameter in an expression. Parameter metadata includes datatype, precision, and scale. For more information, see JExprParamMetadata Class on page 284.
defineJExpression API. Defines the expression. Includes the PowerCenter expression string and parameters. For more information, see defineJExpression on page 285.
JExpression class. Contains the methods to create and invoke an expression, get the metadata and the expression result, and check the return datatype. For more information, see JExpression API Reference on page 289.
Null values. If you pass a null value as a parameter or if the result of an expression is null, the value is treated as a null indicator. For example, if the result of an expression is null and the return datatype is String, a string is returned with a value of null. You can check the result of an expression using isResultNull. For more information, see isResultNull on page 290.
Date datatype. You must convert input parameters with a Date datatype to a String before you can use them in an expression. To use the string in an expression as a Date datatype, use the to_date() function to convert the string to a Date datatype. You can get the result of an expression that returns a Date datatype as a String or long datatype. For more information, see getStringBuffer on page 292 and getLong on page 291.
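For example, the following sketch, which uses a hypothetical expression and date format, converts a date string with to_date and reads the result with getLong:

// Define a hypothetical expression that converts a string to a Date.
JExprParamMetadata params[] = new JExprParamMetadata[1];
params[0] = new JExprParamMetadata(EDataType.STRING, 30, 0);
JExpression dateExpr = defineJExpression(
    "to_date(x1, 'MM/DD/YYYY HH24:MI:SS')", params);
// Invoke with a date string and read the result as a long.
dateExpr.invoke(new Object [] { "12/31/2007 23:59:59" }, ERowType.INSERT);
if (!dateExpr.isResultNull())
{
    long dateAsLong = dateExpr.getLong();
}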
EDataType Class
Enumerates the Java datatypes used in expressions. You can use the EDataType class to get the return datatype of an expression or assign the datatype for a parameter in a JExprParamMetadata object. You do not need to instantiate the EDataType class. Table 12-1 lists the enumerated values for Java datatypes in expressions:
Table 12-1. Enumerated Java Datatypes
Datatype       Enumerated Value
INT            1
DOUBLE         2
STRING         3
BYTE_ARRAY     4
DATE_AS_LONG   5
The following example shows how to use the EDataType class to assign a datatype of String to a JExprParamMetadata object:
JExprParamMetadata params[] = new JExprParamMetadata[2];
params[0] = new JExprParamMetadata (
    EDataType.STRING,   // data type
    20,                 // precision
    0                   // scale
);
...
JExprParamMetadata Class
Instantiates an object that represents the parameters for an expression and sets the metadata for the parameters. You use an array of JExprParamMetadata objects as input to defineJExpression to set the metadata for the input parameters. You can create an instance of the JExprParamMetadata object in the Java Expressions code entry tab or in defineJExpression.
The JExprParamMetadata constructor takes the following arguments:

Argument    Datatype    Input/Output   Description
dataType    EDataType   Input          Datatype of the parameter.
precision   Integer     Input          Precision of the parameter.
scale       Integer     Input          Scale of the parameter.
For example, use the following Java code to instantiate an array of two JExprParamMetadata objects with String datatypes, precision of 20, and scale of 0:
JExprParamMetadata params[] = new JExprParamMetadata[2];
params[0] = new JExprParamMetadata(EDataType.STRING, 20, 0);
params[1] = new JExprParamMetadata(EDataType.STRING, 20, 0);
return defineJExpression(":LKP.LKP_addresslookup(X1,X2)", params);
defineJExpression
Defines the expression, including the expression string and input parameters. Arguments for defineJExpression include a JExprParamMetadata object that contains the input parameters and a string value that defines the expression syntax. To use defineJExpression, you must instantiate an array of JExprParamMetadata objects that represent the input parameters for the expression. You set the metadata values for the parameters and pass the array as an argument to defineJExpression.
285
Use the following syntax:

defineJExpression(String expression, JExprParamMetadata[] params);

Argument     Datatype               Input/Output   Description
expression   String                 Input          String that represents the expression.
params       JExprParamMetadata[]   Input          Array of JExprParamMetadata objects that contain the input parameters for the expression.
For example, use the following Java code to create an expression to perform a lookup on two strings:
JExprParamMetadata params[] = new JExprParamMetadata[2];
params[0] = new JExprParamMetadata(EDataType.STRING, 20, 0);
params[1] = new JExprParamMetadata(EDataType.STRING, 20, 0);
defineJExpression(":lkp.mylookup(x1,x2)", params);
Note: The parameters passed to the expression must be numbered consecutively and start with the letter x. For example, to pass three parameters to an expression, name the parameters x1, x2, and x3.
JExpression Class
The JExpression class contains the methods to create and invoke an expression, return the value of an expression, and check the return datatype. Table 12-2 lists the JExpression API methods:
Table 12-2. JExpression API Methods
Method Name         Description
invoke              Invokes an expression.
getResultDataType   Returns the datatype of the expression result.
getResultMetadata   Returns the metadata of the expression result.
isResultNull        Checks the result value of an expression result.
getInt              Returns the value of an expression result as an Integer datatype.
getDouble           Returns the value of an expression result as a Double datatype.
getLong             Returns the value of an expression result as a Long datatype.
getStringBuffer     Returns the value of an expression result as a String datatype.
getBytes            Returns the value of an expression result as a byte[] datatype.
For more information about the JExpression class, including syntax, usage, and examples, see JExpression API Reference on page 289.
For example, you can define and invoke an expression that performs a lookup with an unconnected Lookup transformation called LKP_addresslookup. Use the following Java code in the Helper Code tab of the Transformation Developer:
JExpression lookup = null;
boolean isJExprObjCreated = false;

JExpression addressLookup() throws SDKException
{
    JExprParamMetadata params[] = new JExprParamMetadata[2];
    params[0] = new JExprParamMetadata (
        EDataType.STRING,   // data type
        50,                 // precision
        0                   // scale
    );
    params[1] = new JExprParamMetadata (
        EDataType.STRING,   // data type
        50,                 // precision
        0                   // scale
    );
    return defineJExpression(":LKP.LKP_addresslookup(X1,X2)", params);
}
Use the following Java code in the On Input Row tab to invoke the expression and return the value of the ADDRESS port:
...
if (!isJExprObjCreated)
{
    lookup = addressLookup();
    isJExprObjCreated = true;
}
lookup.invoke(new Object [] { NAME, COMPANY }, ERowType.INSERT);
EDataType addressDataType = lookup.getResultDataType();
if (addressDataType == EDataType.STRING)
{
    ADDRESS = (lookup.getStringBuffer()).toString();
}
else
{
    logError("Expression result datatype is incorrect.");
}
...
invoke
Invokes an expression. Arguments for invoke include an object that defines the input parameters and the row type. You must instantiate a JExpression object before you use invoke. You can use ERowType.INSERT, ERowType.DELETE, and ERowType.UPDATE for the row type. Use the following syntax:
objectName.invoke(new Object[] { param1[, ... paramN ]}, rowType);

Argument     Datatype      Input/Output   Description
objectName   JExpression   Input          JExpression object name.
parameters   Object[]      Input          Object array that contains the input values for the expression.
rowType      ERowType      Input          Row type for the return value: ERowType.INSERT, ERowType.DELETE, or ERowType.UPDATE.
For example, you create a function in the Java Expressions code entry tab named address_lookup() that returns an JExpression object that represents the expression. Use the following code to invoke the expression that uses input ports NAME and COMPANY:
JExpression myObject = address_lookup();
myObject.invoke(new Object[] { NAME, COMPANY }, ERowType.INSERT);
getResultDataType
Returns the datatype of an expression result. getResultDataType returns a value of EDataType. For more information about the EDataType enumerated class, see EDataType Class on page 284. Use the following syntax:
objectName.getResultDataType();
For example, use the following code to invoke an expression and assign the datatype of the result to the variable dataType:
myObject.invoke(new Object[] { NAME, COMPANY }, ERowType.INSERT);
EDataType dataType = myObject.getResultDataType();
getResultMetadata
Returns the metadata for an expression result. For example, you can use getResultMetadata to get the precision, scale, and datatype of an expression result. You can assign the metadata of the return value from an expression to a JExprParamMetadata object. Use the getScale, getPrecision, and getDataType object methods to retrieve the result metadata. Use the following syntax:
objectName.getResultMetadata();
For example, use the following Java code to assign the scale, precision, and datatype of the return value of myObject to variables:
JExprParamMetadata myMetadata = myObject.getResultMetadata(); int scale = myMetadata.getScale(); int prec = myMetadata.getPrecision(); int datatype = myMetadata.getDataType();
Note: The getDataType object method returns the integer value of the datatype, as enumerated in EDataType. For more information about the EDataType class, see EDataType Class on page 284.
isResultNull
Checks the value of an expression result. Use the following syntax:
objectName.isResultNull();
For example, use the following Java code to invoke an expression and assign the return value of the expression to the variable address if the return value is not null:
JExpression myObject = address_lookup();
myObject.invoke(new Object[] { NAME, COMPANY }, ERowType.INSERT);
if (!myObject.isResultNull())
{
    String address = myObject.getStringBuffer();
}
getInt
Returns the value of an expression result as an Integer datatype. Use the following syntax:
objectName.getInt();
For example, use the following Java code to get the result of an expression that returns an employee ID number as an integer, where findEmpID is a JExpression object:
int empID = findEmpID.getInt();
getDouble
Returns the value of an expression result as a Double datatype. Use the following syntax:
objectName.getDouble();
For example, use the following Java code to get the result of an expression that returns a salary value as a double, where JExprSalary is a JExpression object:
double salary = JExprSalary.getDouble();
getLong
Returns the value of an expression result as a Long datatype. You can use getLong to get the result of an expression that uses a Date datatype. Use the following syntax:
objectName.getLong();
For example, use the following Java code to get the result of an expression that returns a Date value as a Long datatype, where JExprCurrentDate is a JExpression object:
long currDate = JExprCurrentDate.getLong();
getStringBuffer
Returns the value of an expression result as a String datatype. Use the following syntax:
objectName.getStringBuffer();
For example, use the following Java code to get the result of an expression that returns two concatenated strings, where JExprConcat is a JExpression object:
String result = JExprConcat.getStringBuffer();
getBytes
Returns the value of an expression result as a byte[] datatype. For example, you can use getBytes to get the result of an expression that encrypts data with the AES_ENCRYPT function. Use the following syntax:
objectName.getBytes();
For example, use the following Java code to get the result of an expression that encrypts binary data using the AES_ENCRYPT function, where JExprEncryptData is a JExpression object:
byte[] newBytes = JExprEncryptData.getBytes();
Chapter 13
Java Transformation Example
Overview, 294
Step 1. Import the Mapping, 295
Step 2. Create Transformation and Configure Ports, 296
Step 3. Enter Java Code, 298
Step 4. Compile the Java Code, 303
Step 5. Create a Session and Workflow, 304
Overview
You can use the Java code in this example to create and compile an active Java transformation. You import a sample mapping and create and compile the Java transformation. You can then create and run a session and workflow that contains the mapping.

The Java transformation processes employee data for a fictional company. It reads input rows from a flat file source and writes output rows to a flat file target. The source file contains employee data, including the employee identification number, name, job title, and the manager identification number.

The transformation finds the manager name for a given employee based on the manager identification number and generates output rows that contain employee data. The output data includes the employee identification number, name, job title, and the name of the employee's manager. If the employee has no manager in the source data, the transformation assumes the employee is at the top of the hierarchy in the company organizational chart.
Note: The transformation logic assumes the employee job titles are arranged in descending order in the source file.

Complete the following steps to import the sample mapping, create and compile a Java transformation, and create a session and workflow that contains the mapping:
1. Import the sample mapping. For more information, see Step 1. Import the Mapping on page 295.
2. Create the Java transformation and configure the Java transformation ports. For more information, see Step 2. Create Transformation and Configure Ports on page 296.
3. Enter the Java code for the transformation in the appropriate code entry tabs. For more information, see Step 3. Enter Java Code on page 298.
4. Compile the Java code. For more information, see Step 4. Compile the Java Code on page 303.
5. Create and run a session and workflow. For more information, see Step 5. Create a Session and Workflow on page 304.
For a sample source and target file for the session, see Sample Data on page 304. The PowerCenter Client installation contains a mapping, m_jtx_hier_useCase.xml, and flat file source, hier_input, that you can use with this example.
The mapping contains the following components:
Source definition and Source Qualifier transformation. Flat file source definition, hier_input, that defines the source data for the transformation.
Target definition. Flat file target definition, hier_data, that receives the output data from the transformation.
You can import the metadata for the mapping from the following location:
<PowerCenter Client installation directory>\client\bin\m_jtx_hier_useCase.xml
Create the Java transformation in the mapping and configure the input and output ports. Table 13-1 shows the input and output ports for the transformation:
Table 13-1. Input and Output Ports
Port Name            Port Type   Datatype   Precision   Scale
EMP_ID_INP           Input       Integer    10          0
EMP_NAME_INP         Input       String     100         0
EMP_AGE              Input       Integer    10          0
EMP_DESC_INP         Input       String     100         0
EMP_PARENT_EMPID     Input       Integer    10          0
EMP_ID_OUT           Output      Integer    10          0
EMP_NAME_OUT         Output      String     100         0
EMP_DESC_OUT         Output      String     100         0
EMP_PARENT_EMPNAME   Output      String     100         0
Figure 13-2 shows the Ports tab in the Transformation Developer after you create the ports:
Figure 13-2. Java Transformation Example - Ports Tab
Enter Java code for the transformation on the following code entry tabs:
Import Packages. Imports the java.util.Map and java.util.HashMap packages. For more information, see Import Packages Tab on page 298.
Helper Code. Contains a Map object, lock object, and boolean variables used to track the state of data in the Java transformation. For more information, see Helper Code Tab on page 299.
On Input Row. Contains the Java code that processes each input row in the transformation. For more information, see On Input Row Tab on page 300.
For more information about using the code entry tabs to develop Java code, see Developing Java Code on page 245.
The Designer adds the import statements to the Java code for the transformation.
The Helper Code tab declares the following variables:
empMap. Map object that stores the identification number and employee name from the source.
lock. Lock object used to synchronize the access to empMap across partitions.
generateRow. Boolean variable used to determine if an output row should be generated for the current input row.
isRoot. Boolean variable used to determine if an employee is at the top of the company organizational chart (root).
// Static Map object to store the employee id and name from the source.
private static Map empMap = new HashMap();

// Static lock object used to synchronize access to empMap across partitions.
private static Object lock = new Object();

// Boolean to track whether to generate an output row based on validity
// of the input data.
private boolean generateRow;

// Boolean to track whether the employee is root.
private boolean isRoot;
The On Input Row tab processes each input row with the following Java code:

// Generate an output row for this input row by default.
generateRow = true;
isRoot = false;
// Check if the input employee id and name is null.
if (isNull ("EMP_ID_INP") || isNull ("EMP_NAME_INP"))
{
    incrementErrorCount(1);
    // If the input employee id and/or name is null, do not generate an output
    // row for this input row.
    generateRow = false;
}
else
{
    // Set the output port values.
    EMP_ID_OUT = EMP_ID_INP;
    EMP_NAME_OUT = EMP_NAME_INP;
}
if (isNull ("EMP_DESC_INP"))
{
    setNull("EMP_DESC_OUT");
}
else
{
    EMP_DESC_OUT = EMP_DESC_INP;
}
boolean isParentEmpIdNull = isNull("EMP_PARENT_EMPID");
if (isParentEmpIdNull)
{
    // This employee is the root for the hierarchy.
    isRoot = true;
    logInfo("This is the root for this hierarchy.");
    setNull("EMP_PARENT_EMPNAME");
}
synchronized (lock)
{
    // If the employee is not the root for this hierarchy, get the
    // corresponding parent id.
    if (!isParentEmpIdNull)
        EMP_PARENT_EMPNAME = (String) (empMap.get(new Integer (EMP_PARENT_EMPID)));
    // Add the employee to the map for future reference.
    empMap.put (new Integer(EMP_ID_INP), EMP_NAME_INP);
}
// Generate the row if generateRow is true.
if (generateRow)
    generateRow();
Sample Data
The following data is an excerpt from the sample source file:
1,James Davis,50,CEO,
4,Elaine Masters,40,Vice President - Sales,1
5,Naresh Thiagarajan,40,Vice President - HR,1
6,Jeanne Williams,40,Vice President - Software,1
9,Geetha Manjunath,34,Senior HR Manager,5
10,Dan Thomas,32,Senior Software Manager,6
14,Shankar Rahul,34,Senior Software Manager,6
20,Juan Cardenas,32,Technical Lead,10
21,Pramodh Rahman,36,Lead Engineer,14
22,Sandra Patterson,24,Software Engineer,10
23,Tom Kelly,32,Lead Engineer,10
35,Betty Johnson,27,Lead Engineer,14
50,Dave Chu,26,Software Engineer,23
70,Srihari Giran,23,Software Engineer,35
71,Frank Smalls,24,Software Engineer,35
Chapter 14
Joiner Transformation
Overview, 306
Joiner Transformation Properties, 308
Defining a Join Condition, 310
Defining the Join Type, 311
Using Sorted Input, 314
Joining Data from a Single Source, 318
Blocking the Source Pipelines, 321
Working with Transactions, 322
Creating a Joiner Transformation, 325
Tips, 328
Overview
Transformation type: Active, Connected
Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources. The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target. Figure 14-1 shows the master and detail pipelines in a mapping with a Joiner transformation:
Figure 14-1. Mapping with Master and Detail Pipelines
To join more than two sources in a mapping, join the output from the Joiner transformation with another source pipeline. Add Joiner transformations to the mapping until you have joined all the source pipelines. The Joiner transformation accepts input from most transformations. However, consider the following limitations on the pipelines you connect to the Joiner transformation:
You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
You can configure the transformation scope to control how the Integration Service applies transformation logic. To work with the Joiner transformation, complete the following tasks:
Configure the Joiner transformation properties. Properties for the Joiner transformation identify the location of the cache directory, how the Integration Service processes the transformation, and how the Integration Service handles caching. For more information, see Joiner Transformation Properties on page 308.
Configure the join condition. The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row. For more information, see Defining a Join Condition on page 310.
Configure the join type. A join is a relational operator that combines data from multiple tables in different databases or flat files into a single result set. You can configure the Joiner transformation to use a Normal, Master Outer, Detail Outer, or Full Outer join type. For more information, see Defining the Join Type on page 311.
Configure the session for sorted or unsorted input. You can improve session performance by configuring the Joiner transformation to use sorted input. To configure a mapping to use sorted data, you establish and maintain a sort order in the mapping so that the Integration Service can use the sorted data when it processes the Joiner transformation. For more information about configuring the Joiner transformation for sorted input, see Using Sorted Input on page 314.
Configure the transaction scope. When the Integration Service processes a Joiner transformation, it can apply transformation logic to all data in a transaction, all incoming data, or one row of data at a time. For more information about configuring how the Integration Service applies transformation logic, see Working with Transactions on page 322.
If you have the partitioning option in PowerCenter, you can increase the number of partitions in a pipeline to improve session performance. For information about partitioning with the Joiner transformation, see the Workflow Administration Guide.
When you create a mapping, you specify the properties for each Joiner transformation. When you create a session, you can override some properties, such as the index and data cache size for each transformation. Table 14-1 describes the Joiner transformation properties:
Table 14-1. Joiner Transformation Properties
Option                             Description
Case-Sensitive String Comparison   If selected, the Integration Service uses case-sensitive string comparisons when performing joins on string columns.
Cache Directory                    Specifies the directory used to cache master or detail rows and the index to these rows. By default, the cache files are created in a directory specified by the process variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive.
Join Type                          Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Null Ordering in Master/Detail     Not applicable for this transformation type.
Sorted Input                       Specifies that the input data is sorted. You can improve session performance by configuring the Joiner transformation to use sorted input.
Transformation Scope               Specifies how the Integration Service applies the transformation logic to incoming data: to all data in a transaction, to all incoming data, or to one row of data at a time.
Use one or more ports from the input sources of a Joiner transformation in the join condition. Additional ports increase the time necessary to join two sources. The order of the ports in the condition can impact the performance of the Joiner transformation. If you use multiple ports in the join condition, the Integration Service compares the ports in the order you specify. The Designer validates datatypes in a condition. Both ports in a condition must have the same datatype. If you need to use two ports in the condition with non-matching datatypes, convert the datatypes so they match. If you join Char and Varchar datatypes, the Integration Service counts any spaces that pad Char values as part of the string:
Char(40) = "abcd"
Varchar(40) = "abcd"
The Char value is abcd padded with 36 blank spaces, and the Integration Service does not join the two fields because the Char field contains trailing spaces.
Note: The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows. To join rows with null values, replace null input with default values, and then join on the default values. For more information about default values, see Using Default Values for Ports on page 20.
Note: A normal or master outer join performs faster than a full outer or detail outer join.
If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If you know that a field will return a NULL and you do not want to insert NULLs in the target, you can set a default value on the Ports tab for the corresponding port.
Normal Join
With a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the condition. For example, you might have two sources of data for auto parts called PARTS_SIZE and PARTS_COLOR with the following data:
PARTS_SIZE (master source)
PART_ID1   DESCRIPTION   SIZE
1          Seat Cover    Large
2          Ash Tray      Small
3          Floor Mat     Medium

PARTS_COLOR (detail source)
PART_ID2   DESCRIPTION   COLOR
1          Seat Cover    Blue
3          Floor Mat     Black
4          Fuzzy Dice    Yellow
To join the two tables by matching the PART_IDs in both sources, you set the condition as follows:
PART_ID1 = PART_ID2
When you join these tables with a normal join, the result set includes the following data:
PART_ID   DESCRIPTION   SIZE     COLOR
1         Seat Cover    Large    Blue
3         Floor Mat     Medium   Black
Master Outer Join

With a master outer join, the Integration Service keeps all rows from the detail source and discards the unmatched rows from the master source. In the result set for the sample tables, because no size is specified for the Fuzzy Dice, the Integration Service populates the field with a NULL. The following example shows the equivalent SQL statement:
SELECT * FROM PARTS_SIZE RIGHT OUTER JOIN PARTS_COLOR ON (PARTS_SIZE.PART_ID1 = PARTS_COLOR.PART_ID2)
Detail Outer Join

With a detail outer join, the Integration Service keeps all rows from the master source and discards the unmatched rows from the detail source. In the result set for the sample tables, because no color is specified for the Ash Tray, the Integration Service populates the field with a NULL.
Full Outer Join

With a full outer join, the Integration Service keeps all rows of data from both the master and detail sources. In the result set for the sample tables, because no color is specified for the Ash Tray and no size is specified for the Fuzzy Dice, the Integration Service populates the fields with NULL. The following example shows the equivalent SQL statement:
SELECT * FROM PARTS_SIZE FULL OUTER JOIN PARTS_COLOR ON (PARTS_SIZE.PART_ID1 = PARTS_COLOR.PART_ID2)
To configure the Joiner transformation to use sorted input, complete the following tasks:
Configure the sort order. Configure the sort order of the data you want to join. You can join sorted flat files, or you can sort relational data using a Source Qualifier transformation. You can also use a Sorter transformation.
Add transformations. Use transformations that maintain the order of the sorted data.
Configure the Joiner transformation. Configure the Joiner transformation to use sorted data and configure the join condition to use the sort origin ports. The sort origin represents the source of the sorted data.
When you configure the sort order in a session, you can select a sort order associated with the Integration Service code page. When you run the Integration Service in Unicode mode, it uses the selected session sort order to sort character data. When you run the Integration Service in ASCII mode, it sorts all character data using a binary sort order. To ensure that data is sorted as the Integration Service requires, the database sort order must be the same as the user-defined session sort order. When you join sorted data from partitioned pipelines, you must configure the partitions to maintain the order of sorted data. For more information about joining data from partitioned pipelines, see Working with Partition Points in the Workflow Administration Guide.
Use one of the following methods to sort the data:
Use sorted flat files. When the flat files contain sorted data, verify that the order of the sort columns match in each source file.
Use sorted relational data. Use sorted ports in the Source Qualifier transformation to sort columns from the source database. Configure the order of the sorted ports the same in each Source Qualifier transformation. For more information about using sorted ports, see Using Sorted Ports on page 494.
Use Sorter transformations. Use a Sorter transformation to sort relational or flat file data. Place a Sorter transformation in the master and detail pipelines. Configure each Sorter transformation to use the same order of the sort key ports and the sort order direction.
For more information about using the Sorter transformation, see Creating a Sorter Transformation on page 465. If you pass unsorted or incorrectly sorted data to a Joiner transformation configured to use sorted data, the session fails and the Integration Service logs the error in the session log file.
Do not place any of the following transformations between the sort origin and the Joiner transformation:
Custom
Unsorted Aggregator
Normalizer
Rank
Union transformation
XML Parser transformation
XML Generator transformation
Mapplet, if it contains one of the above transformations
You can place a sorted Aggregator transformation between the sort origin and the Joiner transformation if you use the following guidelines:
Configure the Aggregator transformation for sorted input using the guidelines in Using Sorted Input on page 50.
Use the same ports for the group by columns in the Aggregator transformation as the ports at the sort origin.
The group by ports must be in the same order as the ports at the sort origin.
When you join the result set of a Joiner transformation with another pipeline, verify that the data output from the first Joiner transformation is sorted.
Tip: You can place the Joiner transformation directly after the sort origin to maintain sorted data.
To configure the Joiner transformation to use sorted data, complete the following tasks:
Enable Sorted Input on the Properties tab.
Define the join condition to receive sorted data in the same order as the sort origin.
The ports you use in the join condition must match the ports at the sort origin. When you configure multiple join conditions, the ports in the first join condition must match the first ports at the sort origin. When you configure multiple conditions, the order of the conditions must match the order of the ports at the sort origin, and you must not skip any ports. The number of sorted ports in the sort origin can be greater than or equal to the number of ports at the join condition.
For example, the sort origin sorts data on the following ports:
ITEM_NO
ITEM_NAME
PRICE

You must use ITEM_NO in the first join condition. If you add a second join condition, you must use ITEM_NAME. If you want to use PRICE in a join condition, you must also use ITEM_NAME in the second join condition.
When you configure the join condition, use the following guidelines to maintain sort order:
If you skip ITEM_NAME and join on ITEM_NO and PRICE, you lose the sort order and the Integration Service fails the session.
Figure 14-3 shows a mapping configured to sort and join on the ports ITEM_NO, ITEM_NAME, and PRICE:
Figure 14-3. Mapping Configured to Join Data from Two Pipelines
The master and detail Sorter transformations sort on the same ports in the same order.
When you use the Joiner transformation to join the master and detail pipelines, you can configure any one of the following join conditions:
ITEM_NO = ITEM_NO1
or
ITEM_NO = ITEM_NO1 ITEM_NAME = ITEM_NAME1
or
ITEM_NO = ITEM_NO1 ITEM_NAME = ITEM_NAME1 PRICE = PRICE1
You can join data from the same source in the following ways:
Join two branches of the same pipeline.
Join two instances of the same source.
In the target, you want to view the employees who generated sales that were greater than the average sales for their departments. To do this, you create a mapping with the following transformations:
Sorter transformation. Sorts the data.
Sorted Aggregator transformation. Averages the sales data and groups it by department. When you perform this aggregation, you lose the data for individual employees. To maintain employee data, you must pass a branch of the pipeline to the Aggregator transformation and pass a branch with the same data to the Joiner transformation to maintain the original data. When you join both branches of the pipeline, you join the aggregated data with the original data.
Sorted Joiner transformation. Uses a sorted Joiner transformation to join the sorted aggregated data with the original data.
Filter transformation. Compares the average sales data against the sales data for each employee and filters out employees with below-average sales.
Note: You can also join data from output groups of the same transformation, such as the Custom transformation or XML Source Qualifier transformation. Place a Sorter transformation between each output group and the Joiner transformation and configure the Joiner transformation to receive sorted input.

Joining two branches might impact performance if the Joiner transformation receives data from one branch much later than the other branch. The Joiner transformation caches all the data from the first branch, and writes the cache to disk if the cache fills. The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.
Note: When you join data using this method, the Integration Service reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.
Guidelines
Use the following guidelines when deciding whether to join branches of a pipeline or join two instances of a source:
Join two branches of a pipeline when you have a large source or if you can read the source data only once. For example, you can only read source data from a message queue once.
Join two branches of a pipeline when you use sorted data. If the source data is unsorted and you use a Sorter transformation to sort the data, branch the pipeline after you sort the data.
Join two instances of a source when you need to add a blocking transformation to the pipeline between the source and the Joiner transformation.
Join two instances of a source if one pipeline may process slower than the other pipeline.
Join two instances of a source if you need to join unsorted data.
Configure the transformation scope based on how you join the data:
You join two branches of the same source pipeline. Use the Transaction transformation scope to preserve transaction boundaries. For information about preserving transaction boundaries for a single source, see Preserving Transaction Boundaries for a Single Pipeline on page 323.
You join two sources, and you want to preserve transaction boundaries for the detail source. Use the Row transformation scope to preserve transaction boundaries in the detail pipeline. For more information about preserving transaction boundaries for the detail source, see Preserving Transaction Boundaries in the Detail Pipeline on page 323.
You join two sources or two branches and you want to drop transaction boundaries. Use the All Input transformation scope to apply the transformation logic to all incoming data and drop transaction boundaries for both pipelines. For more information about dropping transaction boundaries for two pipelines, see Dropping Transaction Boundaries for Two Pipelines on page 324.
You can drop transaction boundaries when you join two sources or when you join two branches of the same pipeline.
Table 14-2 summarizes how to preserve transaction boundaries using transformation scopes with the Joiner transformation:
Table 14-2. Integration Service Behavior with Transformation Scopes for the Joiner Transformation
Transformation Scope   Input Type           Integration Service Behavior
Row                    Unsorted             Preserves transaction boundaries in the detail pipeline.
Row                    Sorted               Session fails.
Transaction*           Sorted               Preserves transaction boundaries when master and detail originate from the same transaction generator. Session fails when master and detail do not originate from the same transaction generator.
Transaction*           Unsorted             Session fails.
All Input*             Sorted or Unsorted   Drops transaction boundaries.
*Sessions fail if you use real-time data with All Input or Transaction transformation scopes.
For more information about transformation scope and transaction boundaries, see Understanding Commit Points in the Workflow Administration Guide.
When the master and detail pipeline branches originate from the same transaction, the Integration Service joins the pipeline branches and preserves transaction boundaries.
To create a Joiner transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Joiner transformation. Enter a name, and click OK. The naming convention for Joiner transformations is JNR_TransformationName. Enter a description for the transformation. The Designer creates the Joiner transformation.
2. Drag all the input/output ports from the first source into the Joiner transformation. The Designer creates input/output ports for the source fields in the Joiner transformation as detail fields by default. You can edit this property later.
3. Select and drag all the input/output ports from the second source into the Joiner transformation. The Designer configures the second set of source fields as master fields by default.
4. Double-click the title bar of the Joiner transformation to open the transformation.
5. Click the Ports tab.
6. Click any box in the M column to switch the master/detail relationship for the sources.
Tip: To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source. To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.
7. Add default values for specific ports. Some ports are likely to contain null values, since the fields in one of the sources may be empty. You can specify a default value if the target database does not handle NULLs.
8. Click the Condition tab.
9. Click the Add button to add a condition. You can add multiple conditions. The master and detail ports must have matching datatypes. The Joiner transformation only supports equivalent (=) joins. For more information about defining the join condition, see Defining a Join Condition on page 310.
10. Click the Properties tab and configure properties for the transformation.
Note: You can edit the join condition from the Condition tab. The keyword AND separates multiple conditions. For more information about defining the properties, see Joiner Transformation Properties on page 308.
11. Click OK.
12. Click the Metadata Extensions tab to configure metadata extensions. For information about working with metadata extensions, see Metadata Extensions in the Repository Guide.
13. Click OK.
Tips
Perform joins in a database when possible. Performing a join in a database is faster than performing a join in the session. In some cases, this is not possible, such as joining tables from two different databases or flat file systems. If you want to perform a join in a database, use one of the following options (an example Source Qualifier override appears after these tips):
- Create a pre-session stored procedure to join the tables in a database.
- Use the Source Qualifier transformation to perform the join. For more information, see Joining Source Data on page 476.

Join sorted data when possible. You can improve session performance by configuring the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets. For more information, see Using Sorted Input on page 314.

For an unsorted Joiner transformation, designate the source with fewer rows as the master source. For optimal performance and disk storage, designate the source with fewer rows as the master source. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.

For a sorted Joiner transformation, designate the source with fewer duplicate key values as the master source. For optimal performance and disk storage, designate the source with fewer duplicate key values as the master source. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
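To illustrate the first tip, the following sketch pushes the join into the source database with a Source Qualifier SQL override. The ORDERS and CUSTOMERS tables and their columns are invented for this example, not taken from a sample mapping in this guide:

SELECT ORDERS.ORDER_ID, ORDERS.ORDER_DATE, CUSTOMERS.CUST_NAME
FROM ORDERS, CUSTOMERS
WHERE ORDERS.CUST_ID = CUSTOMERS.CUST_ID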
Chapter 15
Lookup Transformation
Overview, 330
Connected and Unconnected Lookups, 331
Relational and Flat File Lookups, 333
Lookup Components, 335
Lookup Properties, 338
Lookup Query, 345
Lookup Condition, 349
Lookup Caches, 351
Configuring Unconnected Lookup Transformations, 352
Creating a Lookup Transformation, 356
Tips, 358
Overview
Transformation type: Passive Connected/Unconnected
Use a Lookup transformation in a mapping to look up data in a flat file or a relational table, view, or synonym. You can import a lookup definition from any flat file or relational database to which both the PowerCenter Client and Integration Service can connect. Use multiple Lookup transformations in a mapping. The Integration Service queries the lookup source based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup source column values based on the lookup condition. Pass the result of the lookup to other transformations and a target. Use the Lookup transformation to perform many tasks, including:
- Get a related value. For example, the source includes employee ID, but you want to include the employee name in the target table to make the summary data easier to read.
- Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
- Update slowly changing dimension tables. Use a Lookup transformation to determine whether rows already exist in the target.

You can configure the Lookup transformation to complete the following types of lookups:
- Connected or unconnected. Connected and unconnected transformations receive input and send output in different ways.
- Relational or flat file lookup. When you create a Lookup transformation, you can choose to perform a lookup on a flat file or a relational table. When you create a Lookup transformation using a relational table as the lookup source, you can connect to the lookup source using ODBC and import the table definition as the structure for the Lookup transformation. When you create a Lookup transformation using a flat file as a lookup source, the Designer invokes the Flat File Wizard. For more information about using the Flat File Wizard, see Working with Flat Files in the Designer Guide.
- Cached or uncached. Sometimes you can improve session performance by caching the lookup table. If you cache the lookup, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Integration Service inserts or updates rows in the cache during the session. When you cache the target table as the lookup, you can look up values in the target and insert them if they do not exist, or update them if they do.
An unconnected Lookup transformation has the following characteristics:
- Designate one return port (R). The lookup returns one column from each row. If there is a match for the lookup condition, the Integration Service returns the result of the lookup condition into the return port. If there is no match, the Integration Service returns NULL.
- Pass one output value to another transformation. The lookup/output/return port passes the value to the transformation calling the :LKP expression.
- Does not support user-defined default values.

If the transformation uses a dynamic cache, the Integration Service inserts the row into the cache when it does not find the row in the cache. When the Integration Service finds the row in the cache, it updates the row in the cache or leaves it unchanged. It flags the row as insert, update, or no change. The Integration Service then passes return values from the query to the next transformation. If the transformation uses a dynamic cache, you can pass rows to a Filter or Router transformation to filter new rows to the target.
Note: This chapter discusses connected Lookup transformations unless otherwise specified.
For more information about unconnected Lookup transformations, see Configuring Unconnected Lookup Transformations on page 352.
Relational Lookups
When you create a Lookup transformation using a relational table as a lookup source, you can connect to the lookup source using ODBC and import the table definition as the structure for the Lookup transformation. You can override the default SQL statement to add a WHERE clause or to query multiple tables.
Flat File Lookups
Use the following options with flat file lookups:
- Use indirect files as lookup sources by specifying a file list as the lookup file name (an example file list appears after this list).
- Use sorted input for the lookup.
- Sort null data high or low. With relational lookups, this is based on the database support.
- Use case-sensitive string comparison with flat file lookups. With relational lookups, the case-sensitive comparison is based on the database support.
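For reference, a file list is a plain text file that names one data file per line; all listed files must share the same properties. The paths below are invented for illustration:

C:\lookup_data\items_east.txt
C:\lookup_data\items_west.txt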
In the following flat file lookup source, the keys are grouped, but not sorted. The Integration Service can cache the data, but performance may not be optimal.
OrderID   CustID   ItemNo.   ItemDesc
1001      CA502    F895S     Flashlight
1001      CA501    C530S     Compass

Key data is grouped, but not sorted. CustID is out of order within OrderID.
The keys are not grouped in the following flat file lookup source. The Integration Service cannot cache the data and fails the session.
OrderID   CustID   ItemNo.   ItemDesc
1001      CA501    T552T     Tent
1001      CA501    C530S     Compass
1005      OK503    S104E     Safety Knife
1003      TN601    R938M     Regulator System
1003      CA500    F304T     First Aid Kit
1001      CA502    F895S     Flashlight

Key data for CustID is not grouped.
If you choose sorted input for indirect files, the range of data must not overlap in the files.
Lookup Components
Define the following components when you configure a Lookup transformation in a mapping:
Lookup Source
Use a flat file or a relational table for a lookup source. When you create a Lookup transformation, you can import the lookup source from the following locations:
- Any relational source or target definition in the repository
- Any flat file source or target definition in the repository
- Any table or file that both the Integration Service and PowerCenter Client machine can connect to
The lookup table can be a single table, or you can join multiple tables in the same database using a lookup SQL override. The Integration Service queries the lookup table or an in-memory cache of the table for all incoming rows into the Lookup transformation. The Integration Service can connect to a lookup table using a native database driver or an ODBC driver. However, the native database drivers improve session performance.
If you have privileges to modify the database containing the lookup table, you can improve lookup performance by indexing columns in the lookup source:
- Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY clause.
- Uncached lookups. Because the Integration Service issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.
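For example, if a lookup condition compares ITEM_ID and the session log shows an ORDER BY on ITEM_ID and PRICE, a composite index such as the following could serve both the cached and uncached cases. The index name is invented, and the right column list depends on the actual ORDER BY in your session log:

CREATE INDEX IDX_ITEMS_DIM_LKP ON ITEMS_DIM (ITEM_ID, PRICE);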
Lookup Ports
The Ports tab contains options similar to other transformations, such as port name, datatype, and scale. In addition to input and output ports, the Lookup transformation includes a
lookup port type that represents columns of data in the lookup source. An unconnected Lookup transformation also includes a return port type that represents the return value. Table 15-2 describes the port types in a Lookup transformation:
Table 15-2. Lookup Transformation Port Types
I (input port). Connected or unconnected lookups; minimum of one. Create an input port for each lookup port you want to use in the lookup condition. You must have at least one input or input/output port in each Lookup transformation.
O (output port). Connected or unconnected lookups; minimum of one. Create an output port for each lookup port you want to link to another transformation. You can designate both input and lookup ports as output ports. For connected lookups, you must have at least one output port. For unconnected lookups, use a lookup/output port as a return port (R) to designate a return value.
L (lookup port). Connected or unconnected lookups; minimum of one. The Designer designates each column in the lookup source as a lookup (L) and output port (O).
R (return port). Unconnected lookups only; one only. Use only in unconnected Lookup transformations. Designates the column of data you want to return based on the lookup condition. You can designate one lookup/output port as the return port.
The Lookup transformation also enables an associated ports property that you configure when you use a dynamic cache. Use the following guidelines to configure lookup ports:
- If you delete lookup ports from a flat file session, the session fails.
- You can delete lookup ports from a relational lookup if you are certain the mapping does not use the lookup port. This reduces the amount of memory the Integration Service uses to run the session.
- To ensure datatypes match when you add an input port, copy the existing lookup ports.
Lookup Properties
On the Properties tab, you can configure properties, such as an SQL override for relational lookups, the lookup source name, and tracing level for the transformation. You can also configure caching properties on the Properties tab. For more information about lookup properties, see Lookup Properties on page 338.
Lookup Condition
On the Condition tab, you can enter the condition or conditions you want the Integration Service to use to determine whether input data qualifies values in the lookup source or cache. For more information about the lookup condition, see Lookup Condition on page 349.
Metadata Extensions
You can extend the metadata stored in the repository by associating information with repository objects, such as Lookup transformations. For example, when you create a Lookup transformation, you may want to store your name and the creation date with the Lookup transformation. You associate information with repository metadata using metadata extensions. For more information, see Metadata Extensions in the Repository Guide.
Lookup Properties
Properties for the Lookup transformation identify the database source, how the Integration Service processes the transformation, and how it handles caching and multiple matches. When you create a mapping, you specify the properties for each Lookup transformation. When you create a session, you can override some properties, such as the index and data cache size, for each transformation in the session properties. Table 15-3 describes the Lookup transformation properties:
Table 15-3. Lookup Transformation Properties
Lookup SQL Override (relational lookups). Overrides the default SQL statement to query the lookup table. Specifies the SQL statement you want the Integration Service to use for querying lookup values. Use only with the lookup cache enabled. For more information, see Lookup Query on page 345.
Lookup Table Name (relational lookups). Specifies the name of the table from which the transformation looks up and caches values. You can import a table, view, or synonym from another database by selecting the Import button on the dialog box that appears when you first create a Lookup transformation. If you enter a lookup SQL override, you do not need to add an entry for this option.
Lookup Caching Enabled (flat file and relational lookups). Indicates whether the Integration Service caches lookup values during the session. When you enable lookup caching, the Integration Service queries the lookup source once, caches the values, and looks up values in the cache during the session. This can improve session performance. When you disable caching, each time a row passes into the transformation, the Integration Service issues a select statement to the lookup source for lookup values. Note: The Integration Service always caches flat file lookups.
Lookup Policy on Multiple Match (flat file and relational lookups). Determines what happens when the Lookup transformation finds multiple rows that match the lookup condition. You can select the first or last row returned from the cache or lookup source, or report an error. Or, you can allow the Lookup transformation to use any value. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It creates an index based on the key ports rather than all Lookup transformation ports. If you do not enable the Output Old Value On Update option, the Lookup Policy On Multiple Match option is set to Report Error for dynamic lookups. For more information about lookup caches, see Lookup Caches on page 359.
Lookup Condition (flat file and relational lookups). Displays the lookup condition you set in the Condition tab.
Table 15-3 also includes the following options: Datetime Format (flat file), Thousand Separator (flat file), Decimal Separator (flat file), Null Ordering (flat file), Sorted Input (flat file), and Subsecond Precision (relational).
When you configure a session, configure the following information depending on the lookup type:
- Flat file lookups. Configure location information, such as the file directory, file name, and the file type.
- Relational lookups. You can define $Source and $Target variables in the session properties. You can also override connection information to use the $DBConnectionName or $AppConnectionName session parameter.
Table 15-4 describes the session properties you configure for flat file lookups:
Table 15-4. Session Properties for Flat File Lookups
Lookup Source File Directory: Enter the directory name. By default, the Integration Service looks in the process variable directory, $PMLookupFileDir, for lookup files. You can enter the full path and file name. If you specify both the directory and file name in the Lookup Source Filename field, clear this field. The Integration Service concatenates this field with the Lookup Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file name. For more information about session parameters, see the Workflow Administration Guide.
Lookup Source Filename: Name of the lookup file. If you use an indirect file, specify the name of the indirect file you want the Integration Service to read. You can also use the lookup file parameter, $LookupFileName, to change the name of the lookup file a session uses. If you specify both the directory and file name in the Source File Directory field, clear this field. The Integration Service concatenates this field with the Lookup Source File Directory field when it runs the session. For example, if you have C:\lookup_data\ in the Lookup Source File Directory field, then enter filename.txt in the Lookup Source Filename field. When the Integration Service begins the session, it looks for C:\lookup_data\filename.txt. For more information, see the Workflow Administration Guide.
Lookup Source Filetype: Indicates whether the lookup source file contains the source data or a list of files with the same file properties. Choose Direct if the lookup source file contains the source data. Choose Indirect if the lookup source file contains a list of files. When you select Indirect, the Integration Service creates one cache for all files. If you use sorted input with indirect files, verify that the range of data in the files does not overlap. If the range of data overlaps, the Integration Service processes the lookup as if you did not configure for sorted input.
For relational lookups, configure the connection in one of the following ways:
- Choose a relational or application connection.
- Specify a database connection using the $Source or $Target connection variable.
- Use the session parameter $DBConnectionName or $AppConnectionName, and define it in the parameter file.
For more information about configuring session connections, see the Workflow Administration Guide.
Lookup Query
The Integration Service queries the lookup based on the ports and properties you configure in the Lookup transformation. The Integration Service runs a default SQL statement when the first row enters the Lookup transformation. If you use a relational lookup, you can customize the default query with the Lookup SQL Override property.
The default query contains two statements:
- SELECT. The SELECT statement includes all the lookup ports in the mapping. You can view the SELECT statement by generating SQL using the Lookup SQL Override property. Do not add or delete any columns from the default SQL statement.
- ORDER BY. The ORDER BY clause orders the columns in the same order they appear in the Lookup transformation. The Integration Service generates the ORDER BY clause. You cannot view this when you generate the default SQL using the Lookup SQL Override property.
Override the lookup query in the following circumstances:
- Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation. For more information, see Overriding the ORDER BY Clause on page 346. Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with comment notation.
- A lookup table name or column name contains a reserved word. If the table name or any column name in the lookup query contains a reserved word, you must ensure that all reserved words are enclosed in quotes. For more information, see Reserved Words on page 347.
- Use parameters and variables. Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or you can use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file (an example parameter file entry appears after this list).
The Designer cannot expand parameters and variables in the query override and does not validate it when you use a parameter or variable. The Integration Service expands the parameters and variables when you run the session. For more information about using mapping parameters and variables in expressions, see the Designer Guide. For more information about parameter files, see the Workflow Administration Guide.
- A lookup column name contains a slash (/) character. When generating the default lookup query, the Designer and Integration Service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double quotes.
- Add a WHERE statement. Use a lookup SQL override to add a WHERE statement to the default SQL statement. You might want to use this to reduce the number of rows included in the cache (an example appears after this list). When you add a WHERE statement to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation. This ensures the Integration Service only inserts rows into the dynamic cache and target table that match the WHERE clause. For more information, see Using the WHERE Clause with a Dynamic Cache on page 379.
Note: The session fails if you include large object ports in a WHERE clause.
- Other. Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.
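As an illustration of the WHERE option, the following hedged sketch restricts the rows the Integration Service caches. It reuses the ITEMS_DIM table from the ORDER BY example later in this section, and the filter itself is invented:

SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM WHERE ITEMS_DIM.PRICE > 0

If you supply the override through the $ParamMyLkpOverride session parameter mentioned above, the parameter file entry might look like the following. The folder, workflow, and session names are invented; see the Workflow Administration Guide for the full parameter file syntax:

[MyFolder.WF:wf_LoadItems.ST:s_LoadItems]
$ParamMyLkpOverride=SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM WHERE ITEMS_DIM.PRICE > 0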
Overriding the ORDER BY Clause
The Integration Service always generates an ORDER BY clause, even if you enter one in the override. Place two dashes -- after the ORDER BY override to suppress the generated ORDER BY clause. For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and PRICE. When you enter the ORDER BY clause, enter the columns in the same order as the ports in the lookup condition. You must also enclose all database reserved words in quotes. Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
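For comparison, the default query that the Designer generates for these three lookup ports orders by every lookup port in transformation port order, roughly as follows. Treat this as a sketch; the exact text the Designer generates can differ:

SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID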
To override the default ORDER BY clause for a relational lookup, complete the following steps:
1. Generate the lookup query in the Lookup transformation.
2. Enter an ORDER BY clause that contains the condition ports in the same order they appear in the Lookup condition.
3. Place two dashes -- as a comment notation after the ORDER BY clause to suppress the ORDER BY clause that the Integration Service generates.
If you override the lookup query with an ORDER BY clause without adding comment notation, the lookup fails.
Note: Sybase has a 16-column ORDER BY limitation. If the Lookup transformation has more than 16 lookup/output ports (including the ports in the lookup condition), you might want to override the ORDER BY clause or use multiple Lookup transformations to query the lookup table.
Reserved Words
If any lookup name or column name contains a database reserved word, such as MONTH or YEAR, the session fails with database errors when the Integration Service executes SQL against the database. You can create and maintain a reserved words file, reswords.txt, in the Integration Service installation directory. When the Integration Service initializes a session, it searches for reswords.txt. If the file exists, the Integration Service places quotes around matching reserved words when it executes SQL against the database. You may need to enable some databases, such as Microsoft SQL Server and Sybase, to use SQL-92 standards regarding quoted identifiers. Use connection environment SQL to issue the command. For example, with Microsoft SQL Server, use the following command:
SET QUOTED_IDENTIFIER ON
Note: The reserved words file, reswords.txt, is a file that you create and maintain in the
Integration Service installation directory. The Integration Service searches this file and places quotes around reserved words when it executes SQL against source, target, and lookup databases. For more information about reswords.txt, see the Workflow Administration Guide.
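The file groups reserved words by database type, with one word per line. The following fragment is a sketch of the layout; verify the exact section names against the sample in the Workflow Administration Guide for your version:

[Teradata]
MONTH
DATE
INTERVAL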
Use the following guidelines when you override the lookup SQL query:
- You can only override the lookup SQL query for relational lookups.
- Configure the Lookup transformation for caching. If you do not enable caching, the Integration Service does not recognize the override.
- Generate the default query, and then configure the override. This helps ensure that all the lookup/output ports are included in the query. If you add or subtract ports from the SELECT statement, the session fails.
- Use a Filter transformation before a Lookup transformation using a dynamic cache when you add a WHERE clause to the lookup SQL override. This ensures the Integration Service only inserts rows in the dynamic cache and target table that match the WHERE clause. For more information, see Using the WHERE Clause with a Dynamic Cache on page 379.
- If you want to share the cache, use the same lookup SQL override for each Lookup transformation.
- If you override the ORDER BY clause, the session fails if the ORDER BY clause does not contain the condition ports in the same order they appear in the Lookup condition or if you do not suppress the generated ORDER BY clause with the comment notation.
- If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with comment notation.
- If the table name or any column name in the lookup query contains a reserved word, you must enclose all reserved words in quotes.
To override the default lookup query:
1. On the Properties tab, open the SQL Editor from within the Lookup SQL Override field.
2. Click Generate SQL to generate the default SELECT statement.
3. Enter the lookup SQL override.
4. Connect to a database, and then click Validate to test the lookup SQL override.
5. Click OK to return to the Properties tab.
Lookup Condition
The Integration Service uses the lookup condition to test incoming values. It is similar to the WHERE clause in an SQL query. When you configure a lookup condition for the transformation, you compare transformation input values with values in the lookup source or cache, represented by lookup ports. When you run a workflow, the Integration Service queries the lookup source or cache for all incoming values based on the condition. You must enter a lookup condition in all Lookup transformations. Some guidelines for the lookup condition apply for all Lookup transformations, and some guidelines vary depending on how you configure the transformation. Use the following guidelines when you enter a condition for a Lookup transformation:
- The datatypes in a condition must match.
- Use one input port for each lookup port used in the condition. Use the same input port in more than one condition in a transformation.
- When you enter multiple conditions, the Integration Service evaluates each condition as an AND, not an OR. The Integration Service returns only rows that match all the conditions you specify.
- The Integration Service matches null values. For example, if an input lookup condition column is NULL, the Integration Service evaluates the NULL equal to a NULL in the lookup.
- If you configure a flat file lookup for sorted input, the Integration Service fails the session if the condition columns are not grouped. If the columns are grouped, but not sorted, the Integration Service processes the lookup as if you did not configure sorted input. For more information about sorted input, see Flat File Lookups on page 333.
The lookup condition guidelines and the way the Integration Service processes matches can vary, depending on whether you configure the transformation for a dynamic cache or an uncached or static cache. For more information about lookup caches, see Lookup Caches on page 359.
Use the following operators when you create the lookup condition:
=, >, <, >=, <=, !=
If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
1. Equal to (=)
2. Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
3. Not equal to (!=)
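For example, a condition set entered in this order might look like the following. ITEM_ID, PRICE, and ITEM_NAME match the example ports used later in this chapter, though the third condition itself is invented for illustration:

ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
ITEM_NAME != IN_ITEM_NAME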
Uncached or Static Cache
The input value must meet all conditions for the lookup to return a value.
The condition can match equivalent values or supply a threshold condition. For example, you might look for customers who do not live in California, or employees whose salary is greater than $30,000. Depending on the nature of the source and condition, the lookup might return multiple values.
Dynamic Cache
If you configure a Lookup transformation to use a dynamic cache, you can only use the equality operator (=) in the lookup condition.
Handling Multiple Matches
When the Lookup transformation finds multiple rows that match the lookup condition, you can configure it to handle the matches in the following ways:
- Return the first matching value, or return the last matching value. You can configure the transformation to return the first matching value or the last matching value. The first and last values are the first value and last value found in the lookup cache that match the lookup condition. When you cache the lookup source, the Integration Service generates an ORDER BY clause for each column in the lookup cache to determine the first and last row in the cache. The Integration Service then sorts each lookup source column in ascending order. The Integration Service sorts numeric columns in ascending numeric order (such as 0 to 10), date/time columns from January to December and from the first of the month to the end of the month, and string columns based on the sort order configured for the session.
- Return any matching value. You can configure the Lookup transformation to return any value that matches the lookup condition. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It creates an index based on the key ports rather than all Lookup transformation ports. When you use any matching value, performance can improve because the process of indexing rows is simplified.
- Return an error. When the Lookup transformation uses a static cache or no cache, the Integration Service marks the row as an error, writes the row to the session log by default, and increases the error count by one. When the Lookup transformation uses a dynamic cache, the Integration Service fails the session when it encounters multiple matches either while caching the lookup table or looking up values in the cache that contain duplicate keys. Also, if you configure the Lookup transformation to output old values on updates, the Lookup transformation returns an error when it encounters multiple matches.
Lookup Caches
You can configure a Lookup transformation to cache the lookup file or table. The Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Integration Service stores condition values in the index cache and output values in the data cache. The Integration Service queries the cache for each row that enters the transformation. The Integration Service also creates cache files by default in the $PMCacheDir. If the data does not fit in the memory cache, the Integration Service stores the overflow values in the cache files. When the session completes, the Integration Service releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache. When configuring a lookup cache, you can specify any of the following options:
- Persistent cache
- Recache from lookup source
- Static cache
- Dynamic cache
- Shared cache
Note: You can use a dynamic cache for relational or flat file lookups.
For more information about working with lookup caches, see Lookup Caches on page 359.
You might call an unconnected Lookup transformation when you complete the following tasks:
- Testing the results of a lookup in an expression
- Filtering rows based on the lookup results
- Marking rows for update based on the result of a lookup, such as updating slowly changing dimension tables
- Calling the same lookup multiple times in one mapping

Complete the following steps when you configure an unconnected Lookup transformation:
1. Add input ports.
2. Add the lookup condition.
3. Designate a return value.
4. Call the lookup from another transformation.
For example, a retail store uses an unconnected lookup to determine whether to update or delete item rows based on price:
1. Create a lookup condition that compares the ITEM_ID in the source with the ITEM_ID in the target.
2. Compare the PRICE for each item in the source with the price in the target table.
If the item exists in the target table and the item price in the source is less than or equal to the price in the target table, you want to delete the row. If the price in the source is greater than the item price in the target table, you want to update the row.
Create an input port (IN_ITEM_ID) with datatype Decimal (37,0) to match the ITEM_ID and an IN_PRICE input port with Decimal (10,2) to match the PRICE lookup port.
If the item exists in the mapping source and lookup source and the mapping source price is less than or equal to the lookup price, the condition is true and the lookup returns the values designated by the Return port. If the lookup condition is false, the lookup returns NULL. Therefore, when you write the update strategy expression, use ISNULL nested in an IIF to test for null values.
To continue the update strategy example, you can define the ITEM_ID port as the return port. The update strategy expression checks for null values returned. If the lookup condition is true, the Integration Service returns the ITEM_ID. If the condition is false, the Integration Service returns NULL. Figure 15-2 shows a return port in a Lookup transformation:
Figure 15-2. Return Port in a Lookup Transformation
To continue the example about the retail store, when you write the update strategy expression, the order of ports in the expression must match the order in the lookup condition. In this case, the ITEM_ID condition is the first lookup condition, and therefore, it is the first argument in the update strategy expression.
IIF(ISNULL(:LKP.lkpITEMS_DIM(ITEM_ID, PRICE)), DD_UPDATE, DD_REJECT)
Use the following guidelines to write an expression that calls an unconnected Lookup transformation:
- The order in which you list each argument must match the order of the lookup conditions in the Lookup transformation.
- The datatypes for the ports in the expression must match the datatypes for the input ports in the Lookup transformation. The Designer does not validate the expression if the datatypes do not match.
- If one port in the lookup condition is not a lookup/output port, the Designer does not validate the expression.
- The arguments (ports) in the expression must be in the same order as the input ports in the lookup condition.
- If you use incorrect :LKP syntax, the Designer marks the mapping invalid.
- If you call a connected Lookup transformation in a :LKP expression, the Designer marks the mapping invalid.
Tip: Avoid syntax errors when you enter expressions by using the point-and-click method to select functions and ports.
To create a Lookup transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Lookup transformation. Enter a name for the transformation. The naming convention for Lookup transformations is LKP_TransformationName. Click OK.
2. In the Select Lookup Table dialog box, choose one of the following options:
- Choose an existing table or file definition.
- Choose to import a definition from a relational table or file.
- Skip to create a manual definition.
3. Define input ports for each lookup condition you want to define.
4. For an unconnected Lookup transformation, create a return port for the value you want to return from the lookup.
5. Define output ports for the values you want to pass to another transformation.
6. For Lookup transformations that use a dynamic lookup cache, associate an input port or sequence ID with each lookup port.
7. Add the lookup conditions. If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
- Equal to (=)
- Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
- Not equal to (!=)
For information about lookup conditions, see Lookup Condition on page 349.
8. On the Properties tab, set the properties for the Lookup transformation, and click OK. For a list of properties, see Lookup Properties on page 338.
9. For unconnected Lookup transformations, write an expression in another transformation using :LKP to call the unconnected Lookup transformation.
Tips
Add an index to the columns used in a lookup condition. If you have privileges to modify the database containing a lookup table, you can improve performance for both cached and uncached lookups. This is important for very large lookup tables. Since the Integration Service needs to query, sort, and compare values in these columns, the index needs to include every column used in a lookup condition.

Place conditions with an equality operator (=) first. If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
- Equal to (=)
- Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
- Not equal to (!=)
Cache small lookup tables. Improve session performance by caching small lookup tables. The result of the lookup query and processing is the same, whether or not you cache the lookup table.

Join tables in the database. If the lookup table is on the same database as the source table in the mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.

Use a persistent lookup cache for static lookups. If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The Integration Service then saves and reuses cache files from session to session, eliminating the time required to read the lookup source.

Call unconnected Lookup transformations with the :LKP reference qualifier. When you write an expression using the :LKP reference qualifier, you call unconnected Lookup transformations only. If you try to call a connected Lookup transformation, the Designer displays an error and marks the mapping invalid.
Chapter 16
Lookup Caches
Overview, 360
Building Connected Lookup Caches, 362
Using a Persistent Lookup Cache, 364
Working with an Uncached Lookup or Static Cache, 366
Working with a Dynamic Lookup Cache, 367
Sharing the Lookup Cache, 384
Tips, 390
Overview
You can configure a Lookup transformation to cache the lookup table. The Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Integration Service stores condition values in the index cache and output values in the data cache. The Integration Service queries the cache for each row that enters the transformation. The Integration Service also creates cache files by default in the $PMCacheDir. If the data does not fit in the memory cache, the Integration Service stores the overflow values in the cache files. When the session completes, the Integration Service releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache. If you use a flat file lookup, the Integration Service always caches the lookup source. If you configure a flat file lookup for sorted input, the Integration Service cannot cache the lookup if the condition columns are not grouped. If the columns are grouped, but not sorted, the Integration Service processes the lookup as if you did not configure sorted input. For more information, see Flat File Lookups on page 333. When you configure a lookup cache, you can configure the following cache settings:
- Building caches. You can configure the session to build caches sequentially or concurrently. When you build sequential caches, the Integration Service creates caches as the source rows enter the Lookup transformation. When you configure the session to build concurrent caches, the Integration Service does not wait for the first row to enter the Lookup transformation before it creates caches. Instead, it builds multiple caches concurrently. For more information, see Building Connected Lookup Caches on page 362.
- Persistent cache. You can save the lookup cache files and reuse them the next time the Integration Service processes a Lookup transformation configured to use the cache. For more information, see Using a Persistent Lookup Cache on page 364.
- Recache from source. If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache. For more information, see Building Connected Lookup Caches on page 362.
- Static cache. You can configure a static, or read-only, cache for any lookup source. By default, the Integration Service creates a static cache. It caches the lookup file or table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the Integration Service returns a value from the lookup cache. The Integration Service does not update the cache while it processes the Lookup transformation. For more information, see Working with an Uncached Lookup or Static Cache on page 366.
- Dynamic cache. To cache a target table or flat file source and insert new rows or update existing rows in the cache, use a Lookup transformation with a dynamic cache. The Integration Service dynamically inserts or updates data in the lookup cache and passes data to the target. For more information, see Working with a Dynamic Lookup Cache on page 367.
- Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings. For more information, see Sharing the Lookup Cache on page 384.
When you do not configure the Lookup transformation for caching, the Integration Service queries the lookup table for each input row. The result of the Lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase session performance. Optimize performance by caching the lookup table when the source table is large. For more information about caching properties, see Lookup Properties on page 338. For information about configuring the cache size, see Session Caches in the Workflow Administration Guide.
Note: The Integration Service uses the same transformation logic to process a Lookup
transformation whether you configure it to use a static cache or no cache. However, when you configure the transformation to use no cache, the Integration Service queries the lookup table instead of the lookup cache.
Cache Comparison
Table 16-1 compares the differences between an uncached lookup, a static cache, and a dynamic cache:
Table 16-1. Lookup Caching Comparison
Uncached:
- You cannot insert or update the cache.
- You cannot use a flat file lookup.
- When the condition is true, the Integration Service returns a value from the lookup table or cache. When the condition is not true, the Integration Service returns the default value for connected transformations and NULL for unconnected transformations. For more information, see Working with an Uncached Lookup or Static Cache on page 366.

Static cache:
- You cannot insert or update the cache.
- Use a relational or a flat file lookup.
- When the condition is true, the Integration Service returns a value from the lookup table or cache. When the condition is not true, the Integration Service returns the default value for connected transformations and NULL for unconnected transformations. For more information, see Working with an Uncached Lookup or Static Cache on page 366.

Dynamic cache:
- You can insert or update rows in the cache as you pass rows to the target.
- Use a relational or a flat file lookup.
- When the condition is true, the Integration Service either updates rows in the cache or leaves the cache unchanged, depending on the row type. This indicates that the row is in the cache and target table. You can pass updated rows to a target.
- When the condition is not true, the Integration Service either inserts rows into the cache or leaves the cache unchanged, depending on the row type. This indicates that the row is not in the cache or target. You can pass inserted rows to a target table. For more information, see Updating the Dynamic Lookup Cache on page 377.
Building Connected Lookup Caches
The Integration Service can build lookup caches for connected Lookup transformations in the following ways:
- Sequential caches. The Integration Service builds lookup caches sequentially. The Integration Service builds the cache in memory when it processes the first row of the data in a cached Lookup transformation. For more information, see Sequential Caches on page 362.
- Concurrent caches. The Integration Service builds lookup caches concurrently. It does not need to wait for data to reach the Lookup transformation. For more information, see Concurrent Caches on page 363.
Note: The Integration Service builds caches for unconnected Lookup transformations
sequentially regardless of how you configure cache building. If you configure the session to build concurrent caches for an unconnected Lookup transformation, the Integration Service ignores this setting and builds unconnected Lookup transformation caches sequentially.
Sequential Caches
By default, the Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. The Integration Service creates each lookup cache in the pipeline sequentially. The Integration Service waits for any upstream active transformation to complete processing before it starts processing the rows in the Lookup transformation. The Integration Service does not build caches for a downstream Lookup transformation until an upstream Lookup transformation completes building a cache. For example, the following mapping contains an unsorted Aggregator transformation followed by two Lookup transformations. Figure 16-1 shows a mapping that contains multiple Lookup transformations:
Figure 16-1. Building Lookup Caches Sequentially
The Integration Service processes all the rows for the unsorted Aggregator transformation and begins processing the first Lookup transformation after the unsorted Aggregator
transformation completes. When it processes the first input row, the Integration Service begins building the first lookup cache. After the Integration Service finishes building the first lookup cache, it can begin processing the lookup data. The Integration Service begins building the next lookup cache when the first row of data reaches the Lookup transformation. You might want to process lookup caches sequentially if the Lookup transformation may not process row data. The Lookup transformation may not process row data if the transformation logic is configured to route data to different pipelines based on a condition. Configuring sequential caching may allow you to avoid building lookup caches unnecessarily. For example, a Router transformation might route data to one pipeline if a condition resolves to true, and it might route data to another pipeline if the condition resolves to false. In this case, a Lookup transformation might not receive data at all.
Concurrent Caches
You can configure the Integration Service to create lookup caches concurrently. You may be able to improve session performance using concurrent caches. Performance may especially improve when the pipeline contains an active transformation upstream of the Lookup transformation. You may want to configure the session to create concurrent caches if you are certain that you will need to build caches for each of the Lookup transformations in the session. When you configure the Lookup transformation to create concurrent caches, it does not wait for upstream transformations to complete before it creates lookup caches, and it does not need to finish building a lookup cache before it can begin building other lookup caches. For example, you configure the session shown in Figure 16-1 on page 362 for concurrent cache creation. Figure 16-2 shows lookup transformation caches built concurrently:
Figure 16-2. Building Lookup Caches Concurrently
When you run the session, the Integration Service builds the Lookup caches concurrently. It does not wait for upstream transformations to complete, and it does not wait for other Lookup transformations to complete cache building.
Note: You cannot process caches for unconnected Lookup transformations concurrently.
To configure the session to create concurrent caches, configure a value for the session configuration attribute, Additional Concurrent Pipelines for Lookup Cache Creation.
Table 16-2 summarizes how the Integration Service handles persistent caching for named and unnamed caches:
Table 16-2. Integration Service Handling of Persistent Caches
- Integration Service cannot locate cache files. Named cache: rebuilds cache. Unnamed cache: rebuilds cache.
- Enable or disable the Enable High Precision option in session properties. Named cache: fails session. Unnamed cache: rebuilds cache.
- Edit the transformation in the Mapping Designer, Mapplet Designer, or Reusable Transformation Developer.* Named cache: fails session. Unnamed cache: rebuilds cache.
- Edit the mapping (excluding Lookup transformation). Named cache: reuses cache. Unnamed cache: rebuilds cache.
- Change database connection or the file location used to access the lookup table. Named cache: fails session. Unnamed cache: rebuilds cache.
- Change the Integration Service data movement mode. Named cache: fails session. Unnamed cache: rebuilds cache.
- Change the sort order in Unicode mode. Named cache: fails session. Unnamed cache: rebuilds cache.
- Change the Integration Service code page to a compatible code page. Named cache: reuses cache. Unnamed cache: reuses cache.
- Change the Integration Service code page to an incompatible code page. Named cache: fails session. Unnamed cache: rebuilds cache.
*Editing properties such as transformation description or port description does not affect persistent cache handling.
When it processes rows, the Integration Service performs one of the following actions on the dynamic lookup cache:
- Inserts the row into the cache. The row is not in the cache and you specified to insert rows into the cache. You can configure the transformation to insert rows into the cache based on input ports or generated sequence IDs. The Integration Service flags the row as insert.
- Updates the row in the cache. The row exists in the cache and you specified to update rows in the cache. The Integration Service flags the row as update. The Integration Service updates the row in the cache based on the input ports.
- Makes no change to the cache. The row exists in the cache and you specified to insert new rows only. Or, the row is not in the cache and you specified to update existing rows only. Or, the row is in the cache, but based on the lookup condition, nothing changes. The Integration Service flags the row as unchanged.
The Integration Service either inserts or updates the cache or makes no change to the cache, based on the results of the lookup query, the row type, and the Lookup transformation properties you define. For more information, see Updating the Dynamic Lookup Cache on page 377. The following list describes some situations when you use a dynamic lookup cache:
- Updating a master customer table with new and updated customer information. You want to load new and updated customer information into a master customer table. Use a Lookup transformation that performs a lookup on the target table to determine if a customer exists or not. Use a dynamic lookup cache that inserts and updates rows in the cache as it passes rows to the target.
- Loading data into a slowly changing dimension table and a fact table. You want to load data into a slowly changing dimension table and a fact table. Create two pipelines and use a Lookup transformation that performs a lookup on the dimension table. Use a dynamic lookup cache to load data to the dimension table. Use a static lookup cache to load data to the fact table, making sure you specify the name of the dynamic cache from the first pipeline. For more information, see Example Using a Dynamic Lookup Cache on page 381.
- Reading a flat file that is an export from a relational table. You want to read data from a Teradata table, but the ODBC connection is slow. You can export the Teradata table
contents to a flat file and use the file as a lookup source. You can pass the lookup cache changes back to the Teradata table if you configure the Teradata table as a relational target in the mapping. Use a Router or Filter transformation with the dynamic Lookup transformation to route inserted or updated rows to the cached target table. You can route unchanged rows to another target table or flat file, or you can drop them.

When you create multiple partitions in a pipeline that use a dynamic lookup cache, the Integration Service creates one memory cache and one disk cache for each transformation. However, if you add a partition point at the Lookup transformation, the Integration Service creates one memory cache for each partition. For more information, see Session Caches in the Workflow Administration Guide.

Figure 16-3 shows a mapping with a Lookup transformation that uses a dynamic lookup cache:
Figure 16-3. Mapping with a Dynamic Lookup Cache
A Lookup transformation that uses a dynamic cache has the following properties:
- NewLookupRow. The Designer adds this port to a Lookup transformation configured to use a dynamic cache. Indicates with a numeric value whether the Integration Service inserts or updates the row in the cache, or makes no change to the cache. To keep the lookup cache and the target table synchronized, you pass rows to the target when the NewLookupRow value is equal to 1 or 2. For more information, see Using the NewLookupRow Port on page 369.
- Associated Port. Associate lookup ports with either an input/output port or a sequence ID. The Integration Service uses the data in the associated ports to insert or update rows in the lookup cache. If you associate a sequence ID, the Integration Service generates a primary key for inserted rows in the lookup cache. For more information, see Using the Associated Input Port on page 370.
- Ignore Null Inputs for Updates. The Designer activates this port property for lookup/output ports when you configure the Lookup transformation to use a dynamic cache. Select this property when you do not want the Integration Service to update the column in the cache when the data in this column contains a null value. For more information, see Using the Ignore Null Property on page 374.
- Ignore in Comparison. The Designer activates this port property for lookup/output ports not used in the lookup condition when you configure the Lookup transformation to use a dynamic cache. The Integration Service compares the values in all lookup ports with the values in their associated input ports by default. Select this property if you want the Integration Service to ignore the port when it compares values before updating a row. For more information, see Using the Ignore in Comparison Property on page 375.
Figure 16-4 shows the output port properties unique to a dynamic Lookup transformation:
Figure 16-4. Dynamic Lookup Transformation Ports Tab
When the Integration Service reads a row, it changes the lookup cache depending on the results of the lookup query and the Lookup transformation properties you define. It assigns the value 0, 1, or 2 to the NewLookupRow port to indicate if it inserts or updates the row in the cache, or makes no change. For information about how the Integration Service determines to update the cache, see Updating the Dynamic Lookup Cache on page 377. The NewLookupRow value indicates how the Integration Service changes the lookup cache. It does not change the row type. Therefore, use a Filter or Router transformation and an Update Strategy transformation to help keep the target table and lookup cache synchronized. Configure the Filter transformation to pass new and updated rows to the Update Strategy transformation before passing them to the cached target. Use the Update Strategy transformation to change the row type of each row to insert or update, depending on the NewLookupRow value. You can drop the rows that do not change the cache, or you can pass them to another target. For more information, see Using Update Strategy Transformations with a Dynamic Cache on page 375. Define the filter condition in the Filter transformation based on the value of NewLookupRow. For example, use the following condition to pass both inserted and updated rows to the cached target:
NewLookupRow != 0
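If you use a Router transformation instead of a Filter transformation, you might define one user-defined group for each action. A sketch of the group filter conditions, based on the NewLookupRow values described above:

NewLookupRow = 1
NewLookupRow = 2

The first group routes inserted rows and the second routes updated rows to the cached target; rows with NewLookupRow = 0 fall into the default group, which you can drop or route to another target.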
When you select Sequence-ID in the Associated Port column, the Integration Service generates a key when it inserts a row into the lookup cache. The Integration Service uses the following process to generate sequence IDs:
1. When the Integration Service creates the dynamic lookup cache, it tracks the range of values in the cache associated with any port using a sequence ID.
2. When the Integration Service inserts a new row of data into the cache, it generates a key for a port by incrementing the greatest existing sequence ID value by one.
3. When the Integration Service reaches the maximum number for a generated sequence ID, it starts over at one. It then increments each sequence ID by one until it reaches the smallest existing value minus one. If the Integration Service runs out of unique sequence ID numbers, the session fails.
The Integration Service generates a sequence ID for each row it inserts into the cache.
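The following Python sketch illustrates the three numbered rules above. The in-memory set of cached IDs and the maximum ID value are illustrative assumptions, not PowerCenter internals:

def sequence_ids(cached_ids, max_id=2**31 - 1):
    # Yield generated keys: increment past the greatest cached value,
    # start over at one when max_id is reached, and fail when no
    # unique IDs remain.
    floor = min(cached_ids) if cached_ids else None
    n = max(cached_ids) if cached_ids else 0
    while True:
        n += 1
        if n > max_id:
            n = 1
        if floor is not None and n == floor:
            raise RuntimeError("session fails: sequence IDs exhausted")
        cached_ids.add(n)
        yield n

# Example: IDs 3 and 7 already exist in the cache; the next keys are 8 and 9.
gen = sequence_ids({3, 7})
print(next(gen), next(gen))  # 8 9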
When you associate an input/output port or a sequence ID with a lookup/output port, the following values match by default:
- Input value. Value the Integration Service passes into the transformation.
- Lookup value. Value that the Integration Service inserts into the cache.
- Input/output port output value. Value that the Integration Service passes out of the input/output port.
The lookup/output port output value depends on whether you choose to output old or new values when the Integration Service updates a row:
- Output old values on update. The Integration Service outputs the value that existed in the cache before it updated the row.
- Output new values on update. The Integration Service outputs the updated value that it writes in the cache. The lookup/output port value matches the input/output port value.
Note: You configure whether to output old or new values using the Output Old Value On Update transformation property. For more information about this property, see Lookup Properties on page 338.

For example, the following Lookup transformation uses a dynamic lookup cache:
By default, the row type of all rows entering the Lookup transformation is insert. To perform both inserts and updates in the cache and target table, you select the Insert Else Update property in the Lookup transformation. The following sections describe the values of the rows in the cache, the input rows, lookup rows, and output rows as you run the session.
Input Values
The source contains rows that exist and rows that do not exist in the target table. The following rows pass into the Lookup transformation from the Source Qualifier transformation:
SQ_CUST_ID  SQ_CUST_NAME   SQ_ADDRESS
80001       Marion Atkins  100 Main St.
80002       Laura Gomez    510 Broadway Ave.
99001       Jon Freeman    555 6th Ave.
Note: The input values always match the values the Integration Service outputs out of the input/output ports.
Lookup Values
The Integration Service looks up values in the cache based on the lookup condition. It updates rows in the cache for existing customer IDs 80001 and 80002. It inserts a row into the cache for customer ID 99001. The Integration Service generates a new key (PK_PRIMARYKEY) for the new row.
PK_PRIMARYKEY  CUST_ID  CUST_NAME      ADDRESS
100001         80001    Marion Atkins  100 Main St.
100002         80002    Laura Gomez    510 Broadway Ave.
100004         99001    Jon Freeman    555 6th Ave.
Output Values
The Integration Service flags the rows in the Lookup transformation based on the inserts and updates it performs on the dynamic cache. These rows pass through an Expression transformation to a Router transformation that filters and passes on the inserted and updated rows to an Update Strategy transformation. The Update Strategy transformation flags the rows based on the value of the NewLookupRow port.

The output values of the lookup/output and input/output ports depend on whether you choose to output old or new values when the Integration Service updates a row. However, the output values of the NewLookupRow port and any lookup/output port that uses the Sequence-ID are the same for new and updated rows. When you choose to output new values, the lookup/output ports output the following values:
NewLookupRow  PK_PRIMARYKEY  CUST_ID  CUST_NAME      ADDRESS
2             100001         80001    Marion Atkins  100 Main St.
2             100002         80002    Laura Gomez    510 Broadway Ave.
1             100004         99001    Jon Freeman    555 6th Ave.
When you choose to output old values, the lookup/output ports output the following values:
NewLookupRow  PK_PRIMARYKEY  CUST_ID  CUST_NAME     ADDRESS
2             100001         80001    Marion James  100 Main St.
2             100002         80002    Laura Jones   510 Broadway Ave.
1             100004         99001    Jon Freeman   555 6th Ave.
Note that when the Integration Service updates existing rows in the lookup cache and when it passes rows to the lookup/output ports, it always uses the existing primary key (PK_PRIMARYKEY) values for rows that exist in the cache and target table. The Integration Service uses the sequence ID to generate a new primary key for the customer that it does not find in the cache. The Integration Service inserts the new primary key value into the lookup cache and outputs it to the lookup/output port.

The Integration Service outputs values from the input/output ports that match the input values. For those values, see Input Values on page 372.
Note: If the input value is NULL and you select the Ignore Null property for the associated input port, the input value does not equal the lookup value or the value out of the input/output port. When you select the Ignore Null property, the lookup cache and the target table might become unsynchronized if you pass null values to the target. You must verify that you do not pass null values to the target. For more information, see Using the Ignore Null Property on page 374.
373
When the Integration Service updates the dynamic lookup cache and target table, the source data might contain null values. The Integration Service can handle the null values in the following ways:
- Insert null values. The Integration Service uses null values from the source and updates the lookup cache and target table using all values from the source.
- Ignore null values. The Integration Service ignores the null values in the source and updates the lookup cache and target table using only the not null values from the source.
If you know the source data contains null values, and you do not want the Integration Service to update the lookup cache or target with null values, select the Ignore Null property for the corresponding lookup/output port.

For example, you want to update the master customer table. The source contains new customers and current customers whose last names have changed. The source contains the customer IDs and names of customers whose names have changed, but it contains null values for the address columns. You want to insert new customers and update the current customer names while retaining the current address information in a master customer table. For example, the master customer table contains the following data:
PRIMARYKEY  CUST_ID  CUST_NAME     ADDRESS            CITY      STATE  ZIP
100001      80001    Marion James  100 Main St.       Mt. View  CA     94040
100002      80002    Laura Jones   510 Broadway Ave.  Raleigh   NC     27601
100003      80003    Shelley Lau   220 Burnside Ave.  Portland  OR     97210
Select Insert Else Update in the Lookup transformation in the mapping. Select the Ignore Null option for all lookup/output ports in the Lookup transformation. When you run a session, the Integration Service ignores null values in the source data and updates the lookup cache and the target table with not null values:
PRIMARYKEY  CUST_ID  CUST_NAME      ADDRESS            CITY      STATE  ZIP
100001      80001    Marion Atkins  100 Main St.       Mt. View  CA     94040
100002      80002    Laura Gomez    510 Broadway Ave.  Raleigh   NC     27601
100003      80003    Shelley Lau    220 Burnside Ave.  Portland  OR     97210
100004      99001    Jon Freeman    555 6th Ave.       San Jose  CA     95051
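The following Python sketch illustrates the Ignore Null update rule in this example; the dictionary row format is an illustrative assumption:

def merge_ignore_nulls(cached_row, source_row):
    # Only not-null source values overwrite the cached values.
    return {col: (val if val is not None else cached_row[col])
            for col, val in source_row.items()}

cache = {"CUST_NAME": "Marion James", "ADDRESS": "100 Main St."}
source = {"CUST_NAME": "Marion Atkins", "ADDRESS": None}
print(merge_ignore_nulls(cache, source))
# {'CUST_NAME': 'Marion Atkins', 'ADDRESS': '100 Main St.'}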
Note: When you choose to ignore NULLs, you must verify that you output the same values to the target that the Integration Service writes to the lookup cache. When you choose to ignore NULLs, the lookup cache and the target table might become unsynchronized if you pass null input values to the target.

Configure the mapping based on the value you want the Integration Service to output from the lookup/output ports when it updates a row in the cache:
- New values. Connect only lookup/output ports from the Lookup transformation to the target.
- Old values. Add an Expression transformation after the Lookup transformation and before the Filter or Router transformation. Add output ports in the Expression transformation for each port in the target table and create expressions to ensure you do not output null input values to the target, as shown in the sketch after this list.
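For the old values case, each output port expression in the Expression transformation might substitute the cached lookup value when the input value is null. A sketch using illustrative port names:

IIF( ISNULL(IN_CUST_NAME), LKP_CUST_NAME, IN_CUST_NAME )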
The Integration Service fails the session when you configure the Lookup transformation to ignore all ports in the comparison. You must compare at least one port.
When you use a dynamic lookup cache, define the row type of the following rows:
- Rows entering the Lookup transformation. By default, the row type of all rows entering a Lookup transformation is insert. However, use an Update Strategy transformation before a Lookup transformation to define all rows as update, or some as update and some as insert.
- Rows leaving the Lookup transformation. The NewLookupRow value indicates how the Integration Service changed the lookup cache, but it does not change the row type. Use a Filter or Router transformation after the Lookup transformation to direct rows leaving the Lookup transformation based on the NewLookupRow value. Use Update Strategy transformations after the Filter or Router transformation to flag rows for insert or update before the target definition in the mapping.
Note: If you want to drop the unchanged rows, do not connect rows from the Filter or Router transformation with the NewLookupRow equal to 0 to the target definition.

When you define the row type as insert for rows entering a Lookup transformation, use the Insert Else Update property in the Lookup transformation. When you define the row type as update for rows entering a Lookup transformation, use the Update Else Insert property in the Lookup transformation. If you define some rows entering a Lookup transformation as update and some as insert, use either the Update Else Insert or Insert Else Update property, or use both properties. For more information, see Updating the Dynamic Lookup Cache on page 377.

Figure 16-5 shows a mapping with multiple Update Strategy transformations and a Lookup transformation using a dynamic cache:
Figure 16-5. Using Update Strategy Transformations with a Lookup Transformation
In the figure, one Update Strategy transformation marks rows as update before the Lookup transformation, and another inserts new rows into the target.
In this case, the Update Strategy transformation before the Lookup transformation flags all rows as update. Select the Update Else Insert property in the Lookup transformation. The Router transformation sends the inserted rows to the Insert_New Update Strategy transformation and sends the updated rows to the Update_Existing Update Strategy transformation. The two Update Strategy transformations to the right of the Lookup transformation flag the rows for insert or update for the target.
You must also define the following update strategy target table options:
- Select Insert.
- Select Update as Update.
These update strategy target table options ensure that the Integration Service updates rows marked for update and inserts rows marked for insert. If you do not choose Data Driven, the Integration Service flags all rows for the row type you specify in the Treat Source Rows As option and does not use the Update Strategy transformations in the mapping to flag the rows. The Integration Service does not insert and update the correct rows. If you do not choose Update as Update, the Integration Service does not correctly update the rows flagged for update in the target table. As a result, the lookup cache and target table might become unsynchronized. For more information, see Setting the Update Strategy for a Session on page 602. For more information about configuring target session properties, see Working with Targets in the Workflow Administration Guide.
Use the following Lookup transformation properties to define how the Integration Service updates the dynamic lookup cache:
- Insert Else Update. Applies to rows entering the Lookup transformation with the row type of insert.
- Update Else Insert. Applies to rows entering the Lookup transformation with the row type of update.
Note: You can select either the Insert Else Update or Update Else Insert property, or you can select both properties or neither property. The Insert Else Update property only affects rows entering the Lookup transformation with the row type of insert. The Update Else Insert property only affects rows entering the Lookup transformation with the row type of update.
When a row of any other row type, such as update, enters the Lookup transformation, the Insert Else Update property has no effect on how the Integration Service handles the row.

When you select Insert Else Update and the row type entering the Lookup transformation is insert, the Integration Service inserts the row into the cache if it is new. If the row exists in the index cache but the data cache is different than the current row, the Integration Service updates the row in the data cache.

If you do not select Insert Else Update and the row type entering the Lookup transformation is insert, the Integration Service inserts the row into the cache if it is new, and makes no change to the cache if the row exists.

Table 16-4 describes how the Integration Service changes the lookup cache when the row type of the rows entering the Lookup transformation is insert:
Table 16-4. Dynamic Lookup Cache Behavior for Insert Row Type
Insert Else Update Option  Row Found in Cache  Data Cache is Different  Lookup Cache Result  NewLookupRow Value
Cleared (insert only)      Yes                 n/a                      No change            0
Cleared (insert only)      No                  n/a                      Insert               1
Selected                   Yes                 Yes                      Update               2*
Selected                   Yes                 No                       No change            0
Selected                   No                  n/a                      Insert               1
*If you select Ignore Null for all lookup ports not in the lookup condition and if all those ports contain null values, the Integration Service does not change the cache and the NewLookupRow value equals 0. For more information, see Using the Ignore Null Property on page 374.
Table 16-5 describes how the Integration Service changes the lookup cache when the row type of the rows entering the Lookup transformation is update:
Table 16-5. Dynamic Lookup Cache Behavior for Update Row Type
Update Else Insert Option  Row Found in Cache  Data Cache is Different  Lookup Cache Result  NewLookupRow Value
Cleared (update only)      Yes                 Yes                      Update               2*
Cleared (update only)      Yes                 No                       No change            0
Cleared (update only)      No                  n/a                      No change            0
Selected                   Yes                 Yes                      Update               2*
Selected                   Yes                 No                       No change            0
Selected                   No                  n/a                      Insert               1
*If you select Ignore Null for all lookup ports not in the lookup condition and if all those ports contain null values, the Integration Service does not change the cache and the NewLookupRow value equals 0. For more information, see Using the Ignore Null Property on page 374.
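The following Python sketch consolidates the decisions in Table 16-4 and Table 16-5. It returns the NewLookupRow value and does not model the Ignore Null footnote cases:

def new_lookup_row(row_type, found_in_cache, data_cache_differs,
                   insert_else_update=False, update_else_insert=False):
    # Returns 0 (no change), 1 (insert into cache), or 2 (update the cache).
    if row_type == "insert":
        if not found_in_cache:
            return 1
        if insert_else_update and data_cache_differs:
            return 2
        return 0
    if row_type == "update":
        if found_in_cache:
            return 2 if data_cache_differs else 0
        return 1 if update_else_insert else 0
    raise ValueError("row type must be insert or update")

# Example: an update row not found in the cache, with Update Else Insert
# selected, is inserted into the cache (last row of Table 16-5).
print(new_lookup_row("update", False, False, update_else_insert=True))  # 1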
When you first run the session, the Integration Service builds the lookup cache from the target table based on the lookup SQL override. Therefore, all rows in the cache match the condition in the WHERE clause, EMP_STATUS = 4.

Suppose the Integration Service reads a source row whose EMP_ID exists in the target table, but whose EMP_STATUS value is 2. Because the cache contains only rows where EMP_STATUS = 4, the Integration Service does not find the row in the cache, so it inserts the row into the cache and passes the row to the target table. When this happens, not all rows in the cache match the condition in the WHERE clause. When the Integration Service tries to insert this row in the target table, you might get inconsistent data if the row already exists there.
To verify that you only insert rows into the cache that match the WHERE clause, add a Filter transformation before the Lookup transformation and define the filter condition as the condition in the WHERE clause in the lookup SQL override. For the example above, enter the following filter condition:
EMP_STATUS = 4
For more information about the lookup SQL override, see Overriding the Lookup Query on page 345.
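For example, a lookup SQL override that applies this WHERE clause might look like the following sketch; the table name and the column list are illustrative assumptions:

SELECT EMP_ID, EMP_STATUS FROM EMPLOYEE WHERE EMP_STATUS = 4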
Use the following guidelines to keep the lookup cache and the target table synchronized:
- Use a Router transformation to pass rows to the cached target when the NewLookupRow value equals one or two. Use the Router transformation to drop rows when the NewLookupRow value equals zero, or you can output those rows to a different target.
- Use Update Strategy transformations after the Lookup transformation to flag rows for insert or update into the target.
- Set the error threshold to one when you run a session. When you set the error threshold to one, the session fails when it encounters the first error. The Integration Service does not write the new cache files to disk. Instead, it restores the original cache files, if they exist. You must also restore the pre-session target table to the target database. For more information about setting the error threshold, see the Workflow Administration Guide.
- Verify that you output the same values to the target that the Integration Service writes to the lookup cache. When you choose to output new values on update, connect only lookup/output ports to the target table instead of input/output ports. When you choose to output old values on update, add an Expression transformation after the Lookup transformation and before the Router transformation. Add output ports in the Expression transformation for each port in the target table and create expressions to ensure you do not output null input values to the target.
- Set the Treat Source Rows As property to Data Driven in the session properties.
- Select Insert and Update as Update when you define the update strategy target table options in the session properties. This ensures that the Integration Service updates rows marked for update and inserts rows marked for insert. Select these options in the Transformations View on the Mapping tab in the session properties.
If the row does not exist in the lookup cache, the Integration Service inserts the row in the cache and passes it to the target table. If the row does exist in the lookup cache, the Integration Service does not update the row in the cache or target table.
Note: If the source data contains null values in the lookup condition columns, set the error threshold to one. This ensures that the lookup cache and table remain synchronized if the Integration Service inserts a row in the cache, but the database rejects the row due to a Not Null constraint.
The Lookup transformation uses a dynamic lookup cache. When the session starts, the Integration Service builds the lookup cache from the target table. When the Integration Service reads a row that is not in the lookup cache, it inserts the row in the cache and then passes the row out of the Lookup transformation. The Router transformation directs the row to the UPD_Insert_New Update Strategy transformation. The Update Strategy transformation marks the row as insert before passing it to the target.

The target table changes as the session runs, and the Integration Service inserts new rows and updates existing rows in the lookup cache. The Integration Service keeps the lookup cache and target table synchronized.

To generate keys for the target, use Sequence-ID in the associated port. The sequence ID generates primary keys for new rows the Integration Service inserts into the target table.

Without the dynamic lookup cache, you need to use two Lookup transformations in the mapping. Use the first Lookup transformation to insert rows in the target. Use the second Lookup transformation to recache the target table and update rows in the target table.
You increase session performance when you use a dynamic lookup cache because you only need to build the cache from the database once. You can continue to use the lookup cache even though the data in the target table changes.
Use the following guidelines when you use a dynamic lookup cache:
- You can create a dynamic lookup cache from a relational or flat file source.
- The Lookup transformation must be a connected transformation.
- Use a persistent or a non-persistent cache. If the dynamic cache is not persistent, the Integration Service always rebuilds the cache from the database, even if you do not enable Recache from Lookup Source.
- You cannot share the cache between a dynamic Lookup transformation and a static Lookup transformation in the same target load order group.
- You can only create an equality lookup condition. You cannot look up a range of data.
- Associate each lookup port that is not in the lookup condition with an input port or a sequence ID.
- Use a Router transformation to pass rows to the cached target when the NewLookupRow value equals one or two. Use the Router transformation to drop rows when the NewLookupRow value equals zero, or you can output those rows to a different target.
- Verify that you output the same values to the target that the Integration Service writes to the lookup cache. When you choose to output new values on update, connect only lookup/output ports to the target table instead of input/output ports. When you choose to output old values on update, add an Expression transformation after the Lookup transformation and before the Router transformation. Add output ports in the Expression transformation for each port in the target table and create expressions to ensure you do not output null input values to the target.
- When you use a lookup SQL override, make sure you map the correct columns to the appropriate targets for lookup.
- When you add a WHERE clause to the lookup SQL override, use a Filter transformation before the Lookup transformation. This ensures the Integration Service only inserts rows in the dynamic cache and target table that match the WHERE clause. For more information, see Using the WHERE Clause with a Dynamic Cache on page 379.
- When you configure a reusable Lookup transformation to use a dynamic cache, you cannot edit the condition or disable the Dynamic Lookup Cache property in a mapping.
- Use Update Strategy transformations after the Lookup transformation to flag the rows for insert or update for the target.
- Use an Update Strategy transformation before the Lookup transformation to define some or all rows as update if you want to use the Update Else Insert property in the Lookup transformation.
- Set the row type to Data Driven in the session properties.
- Select Insert and Update as Update for the target table options in the session properties.
You can share the lookup cache between multiple transformations. You can share an unnamed cache or a named cache:
- Unnamed cache. When Lookup transformations in a mapping have compatible caching structures, the Integration Service shares the cache by default. You can only share static unnamed caches.
- Named cache. Use a persistent named cache when you want to share a cache file across mappings or share a dynamic and a static cache. The caching structures must match or be compatible with a named cache. You can share static and dynamic named caches.
When the Integration Service shares a lookup cache, it writes a message in the session log.
Use the following guidelines when you share an unnamed lookup cache:
- You can share static unnamed caches.
- Shared transformations must use the same ports in the lookup condition. The conditions can use different operators, but the ports must be the same.
- You must configure some of the transformation properties to enable unnamed cache sharing. For more information, see Table 16-6 on page 385.
- The structure of the cache for the shared transformations must be compatible:
  - If you use hash auto-keys partitioning, the lookup/output ports for each transformation must match.
  - If you do not use hash auto-keys partitioning, the lookup/output ports for the first shared transformation must match or be a superset of the lookup/output ports for subsequent transformations.
- If the Lookup transformations with hash auto-keys partitioning are in different target load order groups, you must configure the same number of partitions for each group.
- If you do not use hash auto-keys partitioning, you can configure a different number of partitions for each target load order group.
Table 16-6 shows when you can share an unnamed static and dynamic cache:
Table 16-6. Location for Sharing Unnamed Cache
Shared Cache          Location of Transformations
Static with Static    Anywhere in the mapping.
Dynamic with Dynamic  Cannot share.
Dynamic with Static   Cannot share.
Table 16-7 describes the guidelines to follow when you configure Lookup transformations to share an unnamed cache:
Table 16-7. Properties for Sharing Unnamed Cache
- Lookup SQL Override: If you use the Lookup SQL Override property, you must use the same override in all shared transformations.
- Lookup Table Name: Must match.
- Lookup Caching Enabled: Must be enabled.
- Lookup Policy on Multiple Match: n/a
- Lookup Condition: Shared transformations must use the same ports in the lookup condition. The conditions can use different operators, but the ports must be the same.
- Connection Information: The connection must be the same. When you configure the sessions, the database connection must match.
- Source Type: Must match.
- Tracing Level: n/a
- Lookup Cache Directory Name: Does not need to match.
- Lookup Cache Persistent: Optional. You can share persistent and non-persistent.
- Lookup Data Cache Size: Integration Service allocates memory for the first shared transformation in each pipeline stage. It does not allocate additional memory for subsequent shared transformations in the same pipeline stage.
- Lookup Index Cache Size: Integration Service allocates memory for the first shared transformation in each pipeline stage. It does not allocate additional memory for subsequent shared transformations in the same pipeline stage.
- Dynamic Lookup Cache: You cannot share an unnamed dynamic cache.
For information about pipeline stages, see the Workflow Administration Guide.
Table 16-7 also lists requirements for the following properties: Lookup/Output Ports, Insert Else Update, Update Else Insert, Datetime Format, Thousand Separator, Decimal Separator, Case-Sensitive String Comparison, Null Ordering, and Sorted Input.
The Integration Service saves the cache files to disk after it processes each target load order group. The Integration Service uses the following rules to process the second Lookup transformation with the same cache file name prefix:
- The Integration Service uses the memory cache if the transformations are in the same target load order group.
- The Integration Service rebuilds the memory cache from the persisted files if the transformations are in different target load order groups.
- The Integration Service rebuilds the cache from the database if you configure the transformation to recache from source and the first transformation is in a different target load order group.
- The Integration Service fails the session if you configure subsequent Lookup transformations to recache from source, but not the first one in the same target load order group.
- If the cache structures do not match, the Integration Service fails the session.
If you run two sessions simultaneously that share a lookup cache, the Integration Service uses the following rules to share the cache files:
- The Integration Service processes multiple sessions simultaneously when the Lookup transformations only need to read the cache files.
- The Integration Service fails the session if one session updates a cache file while another session attempts to read or update the cache file. For example, Lookup transformations update the cache file if they are configured to use a dynamic cache or recache from source.
Use the following guidelines when you share a named lookup cache:
- You can share any combination of dynamic and static caches, but you must follow the guidelines for location. For more information, see Table 16-8 on page 388.
- You must configure some of the transformation properties to enable named cache sharing. For more information, see Table 16-9 on page 388.
- A dynamic lookup cannot share the cache if the named cache has duplicate rows.
- A named cache created by a dynamic Lookup transformation with a lookup policy of error on multiple match can be shared by a static or dynamic Lookup transformation with any lookup policy.
- A named cache created by a dynamic Lookup transformation with a lookup policy of use first or use last can be shared by a Lookup transformation with the same lookup policy.
- Shared transformations must use the same output ports in the mapping. The criteria and result columns for the cache must match the cache files.
The Integration Service might use the memory cache, or it might build the memory cache from the file, depending on the type and location of the Lookup transformations.
Table 16-8 shows when you can share a static and dynamic named cache:
Table 16-8. Location for Sharing Named Cache
Static with Static:
- Same target load order group: Integration Service uses memory cache.
- Separate target load order groups: Integration Service uses memory cache.
- Separate mappings: Integration Service builds memory cache from file.
Dynamic with Dynamic:
- Separate target load order groups: Integration Service uses memory cache.
- Separate mappings: Integration Service builds memory cache from file.
Dynamic with Static:
- Separate target load order groups: Integration Service builds memory cache from file.
- Separate mappings: Integration Service builds memory cache from file.
For more information about target load order groups, see Mappings in the Designer Guide. Table 16-9 describes the guidelines to follow when you configure Lookup transformations to share a named cache:
Table 16-9. Properties for Sharing Named Cache
- Lookup SQL Override: If you use the Lookup SQL Override property, you must use the same override in all shared transformations.
- Lookup Table Name: Must match.
- Lookup Caching Enabled: Must be enabled.
- Lookup Policy on Multiple Match: A named cache created by a dynamic Lookup transformation with a lookup policy of error on multiple match can be shared by a static or dynamic Lookup transformation with any lookup policy. A named cache created by a dynamic Lookup transformation with a lookup policy of use first or use last can be shared by a Lookup transformation with the same lookup policy.
- Lookup Condition: Shared transformations must use the same ports in the lookup condition. The conditions can use different operators, but the ports must be the same.
- Connection Information: The connection must be the same. When you configure the sessions, the database connection must match.
- Source Type: Must match.
- Tracing Level: n/a
- Lookup Cache Directory Name: Must match.
- Lookup Cache Persistent: Must be enabled.
- Lookup Data Cache Size: When transformations within the same mapping share a cache, the Integration Service allocates memory for the first shared transformation in each pipeline stage. It does not allocate additional memory for subsequent shared transformations in the same pipeline stage. For information about pipeline stages, see the Workflow Administration Guide.

Table 16-9 also lists requirements for the following properties: Dynamic Lookup Cache, Output Old Value on Update, Cache File Name Prefix, Recache from Source, Lookup/Output Ports, Insert Else Update, Update Else Insert, Thousand Separator, Decimal Separator, Case-Sensitive String Comparison, Null Ordering, and Sorted Input.
Note: You cannot share a lookup cache created on a different operating system. For example, only an Integration Service on UNIX can read a lookup cache created on an Integration Service on UNIX, and only an Integration Service on Windows can read a lookup cache created on an Integration Service on Windows.
Tips
Cache small lookup tables.
Improve session performance by caching small lookup tables. The result of the lookup query and processing is the same, whether or not you cache the lookup table.

Use a persistent lookup cache for static lookup tables.
If the lookup table does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The Integration Service then saves and reuses cache files from session to session, eliminating the time required to read the lookup table.
Chapter 17
Normalizer Transformation
This chapter includes the following topics:
Overview, 392
Normalizer Transformation Components, 394
Normalizer Transformation Generated Keys, 399
VSAM Normalizer Transformation, 401
Pipeline Normalizer Transformation, 408
Using a Normalizer Transformation in a Mapping, 415
Troubleshooting, 420
Overview
Transformation type: Active, Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. The transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row. The Normalizer transformation parses multiple-occurring columns from COBOL sources, relational tables, or other sources. It can process multiple record types from a COBOL source that contains a REDEFINES clause.

For example, you might have a relational table that stores four quarters of sales by store. You need to create a row for each sales occurrence. You can configure a Normalizer transformation to return a separate row for each quarter. The following source rows contain four quarters of sales by store:
Store1 100 300 500 700
Store2 250 450 650 850
The Normalizer returns a row for each store and sales combination. It also returns an index that identifies the quarter number:
Store1 100 1
Store1 300 2
Store1 500 3
Store1 700 4
Store2 250 1
Store2 450 2
Store2 650 3
Store2 850 4
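The following Python sketch mimics this behavior; the tuple layout of the source rows is an illustrative assumption:

def normalize(rows, occurs=4):
    # Return one output row per occurrence of the multiple-occurring
    # column, paired with its occurrence index (the generated column ID).
    for store, *sales in rows:
        for i in range(occurs):
            yield store, sales[i], i + 1

source = [("Store1", 100, 300, 500, 700),
          ("Store2", 250, 450, 650, 850)]
for row in normalize(source):
    print(*row)  # Store1 100 1 ... Store2 850 4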
The Normalizer transformation generates a key for each source row. The Integration Service increments the generated key sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence. Each row contains the same generated key value. When the Normalizer returns multiple rows from a source row, it returns duplicate data for single-occurring source columns. For example, Store1 and Store2 repeat for each instance of sales. You can create a VSAM Normalizer transformation or a pipeline Normalizer transformation:
- VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The Mapping Designer creates VSAM Normalizer columns from a COBOL source in a mapping. The column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source column through one input port. For more information, see VSAM Normalizer Transformation on page 401.
- Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. You create the columns manually and edit them in the Transformation Developer or Mapping Designer. The pipeline Normalizer transformation represents multiple-occurring columns with one input port for each source column occurrence. For more information about the Pipeline Normalizer transformation, see Pipeline Normalizer Transformation on page 408.
You configure a Normalizer transformation on the following tabs:
- Transformation. Enter the name and description of the transformation. The naming convention for a Normalizer transformation is NRM_TransformationName. You can also make the pipeline Normalizer transformation reusable.
- Ports. View the transformation ports and attributes. For more information, see Ports Tab on page 394.
- Properties. Configure the tracing level to determine the amount of transaction detail reported in the session log file. Choose to reset or restart the generated key sequence value in the next session. For more information, see Properties Tab on page 396.
- Normalizer. Define the structure of the source data. The Normalizer tab defines source data as columns and groups of columns. For more information, see Normalizer Tab on page 397.
- Metadata Extensions. Configure the extension name, datatype, precision, and value. You can also create reusable metadata extensions. For more information about creating metadata extensions, see Metadata Extensions in the Repository Guide.
Ports Tab
When you define a Normalizer transformation, you configure the columns in the Normalizer tab. The Designer creates the ports. You can view the Normalizer ports and attributes on the Ports tab.
Pipeline and VSAM Normalizer transformations represent multiple-occurring source columns differently. A VSAM Normalizer transformation has one input port for a multiple-occurring column. A pipeline Normalizer transformation has multiple input ports for a multiple-occurring column.

The Normalizer transformation has one output port for each single-occurring input port. When a source column is multiple-occurring, the pipeline and VSAM Normalizer transformations have one output port for the column. The transformation returns a row for each source column occurrence.

The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column. The generated column ID is an index for the instance of the multiple-occurring data. For example, if a column occurs four times in a source record, the Normalizer returns a value of 1, 2, 3, or 4 in the generated column ID based on which instance of the multiple-occurring data occurs in the row. The naming convention for the Normalizer generated column ID is GCID_<occurring_field_name>.

The Normalizer transformation has at least one generated key port. The Integration Service increments the generated key sequence number each time it processes a source row.

Figure 17-2 shows the Normalizer transformation Ports tab:
Figure 17-2. Normalizer Ports Tab
In Figure 17-2, Sales_By_Quarter is multiple-occurring in the source. The Normalizer transformation has one output port for Sales_By_Quarter and returns four rows for each source row. The Ports tab also shows the generated key start value.
You can change the ports on a pipeline Normalizer transformation by editing the columns on the Normalizer tab. To change a VSAM Normalizer transformation, you need to change the COBOL source and recreate the transformation. You can change the generated key start values on the Ports tab. You can enter different values for each generated key. When you change a start value, the generated key value resets to the start value the next time you run a session. For more information about generated keys, see Normalizer Transformation Generated Keys on page 399.
For more information about the VSAM Normalizer Ports tab, see VSAM Normalizer Ports Tab on page 403. For more information about the pipeline Normalizer Ports tab, see Pipeline Normalizer Ports Tab on page 409.
Properties Tab
Configure the Normalizer transformation general properties on the Properties tab. Figure 17-3 shows the Normalizer transformation Properties tab:
Figure 17-3. Normalizer Transformation Properties Tab
The general properties include the Reset and Restart options and the Tracing Level setting.
Normalizer Tab
The Normalizer tab defines the structure of the source data. The Normalizer tab defines source data as columns and groups of columns. A group of columns might define a record in a COBOL source or it might define a group of multiple-occurring fields in the source.

The column level number identifies groups of columns in the data. Level numbers define a data hierarchy. Columns in a group have the same level number and display sequentially below a group-level column. A group-level column has a lower level number, and it contains no data.

In Figure 17-4, Quarterly_Data is a group-level column. It is Level 1. The Quarterly_Data group occurs four times in each row. Sales_by_Quarter and Returns_by_Quarter are Level 2 columns and belong to the group. Figure 17-4 shows the Normalizer tab of a pipeline Normalizer transformation:
Figure 17-4. Normalizer Tab
Each column has an Occurs attribute. The Occurs attribute identifies columns or groups of columns that occur more than once in a source row. When you create a pipeline Normalizer transformation, you can edit the columns. When you create a VSAM Normalizer transformation, the Normalizer tab is read-only.
Table 17-2 describes the Normalizer tab attributes that are common to the VSAM and pipeline Normalizer transformations:
Table 17-2. Normalizer Tab Columns
Attribute    Description
Column Name  Name of the source column.
Level        Group columns. Columns in the same group occur beneath a column with a lower level number. When each column is the same level, the transformation contains no column groups.
Occurs       The number of instances of a column or group of columns in the source row.
Datatype     The transformation column datatype can be String, Nstring, or Number.
Prec         Precision. Length of the column.
Scale        Number of decimal positions for a numeric column.
The Normalizer tab for a VSAM Normalizer transformation contains the same attributes as the pipeline Normalizer transformation, but it includes attributes unique to a COBOL source definition. For more information about the Normalizer tab for a VSAM Normalizer transformation, see VSAM Normalizer Tab on page 404. For more information about the Normalizer tab for the pipeline Normalizer transformation, see Pipeline Normalizer Tab on page 411.
The Integration Service might pass duplicate keys to the target when you reset a generated key sequence to a value that exists in the target. You can change the generated key values in the following ways:
- Modify the generated key sequence value. You can modify the generated key sequence value on the Ports tab of the Normalizer transformation. The Integration Service assigns the sequence value to the first generated key it creates for that column.
- Reset the generated key sequence. Reset the generated key sequence on the Normalizer transformation Properties tab. When you reset the generated key sequence, the Integration Service resets the generated key start value back to the value it was before the session. Reset the generated key sequence when you want to create the same generated key values each time you run the session.
- Restart the generated key sequence. Restart the generated key sequence on the Normalizer transformation Properties tab. When you restart the generated key sequence, the Integration Service starts the generated key sequence at 1 the next time it runs a session. When you restart the generated key sequence, the generated key start value does not change in the Normalizer transformation until you run a session. When you run the session, the Integration Service overrides the sequence number value on the Ports tab.
When you reset or restart the generated key sequence, the reset or restart affects the generated key sequence values the next time you run a session. You do not change the current generated key sequence values in the Normalizer transformation. When you reset or restart the generated key sequence, the option is enabled for every session until you disable the option.
03  STORE_DATA.
    05  STORE_NAME    PIC X(30).
    05  STORE_ADDR1   PIC X(30).
    05  STORE_CITY    PIC X(30).
03  DETAIL_DATA REDEFINES STORE_DATA.
The sales file can contain two types of sales records. Store_Data defines a store and Detail_Data defines merchandise sold in the store. The REDEFINES clause indicates that Detail_Data fields might occur in a record instead of Store_Data fields.

The first three characters of each sales record make up the header. The header includes a record type and a store ID. The value of Hdr_Rec_Type defines whether the rest of the record contains store information or merchandise information. For example, when Hdr_Rec_Type is S, the record contains store data. When Hdr_Rec_Type is D, the record contains detail data. When the record contains detail data, it includes the Supplier_Info fields. The OCCURS clause defines four suppliers in each Detail_Data record.

For more information about COBOL source definitions, see the Designer Guide.
Figure 17-5 shows the Sales_File COBOL source definition that you might create from the COBOL copybook:
Figure 17-5. COBOL Source Definition Example
Group level columns identify groups of columns in a COBOL source definition. Group level columns do not contain data.
The Sales_Rec, Hdr_Data, Store_Data, Detail_Data, and Supplier_Info columns are group-level columns that identify groups of lower level data. Group-level columns have a length of zero because they contain no data. None of these columns are output ports in the source definition. The Supplier_Info group contains Supplier_Code and Supplier_Name columns. The Supplier_Info group occurs four times in each Detail_Data record.

When you create a VSAM Normalizer transformation from the COBOL source definition, the Mapping Designer creates the input/output ports in the Normalizer transformation based on the COBOL source definition. The Normalizer transformation contains at least one generated key output port. When the COBOL source has multiple-occurring columns, the Normalizer transformation has a generated column ID output port. For more information about the generated column ID, see Ports Tab on page 394.

Figure 17-6 shows the Normalizer transformation ports the Mapping Designer creates from the source definition:
Figure 17-6. Sales File VSAM Normalizer Transformation
The Normalizer transformation has a generated key and a generated column ID.
In Figure 17-5 on page 402, the Supplier_Info group of columns occurs four times in each COBOL source row. The COBOL source row might contain the following data:
Item1 ItemDesc 100 25 A Supplier1 B Supplier2 C Supplier3 D Supplier4
The Normalizer transformation returns a row for each occurrence of the Supplier_Code and Supplier_Name columns. Each output row contains the same item, description, price, and quantity values. The Normalizer returns the following detail data rows from the COBOL source row:
Item1 ItemDesc 100 25 A Supplier1 1 1
Item1 ItemDesc 100 25 B Supplier2 1 2
Item1 ItemDesc 100 25 C Supplier3 1 3
Item1 ItemDesc 100 25 D Supplier4 1 4
Each output row contains a generated key and a column ID. The Integration Service updates the generated key value when it processes a new source row. In the detail data rows, the generated key value is 1. The column ID defines the Supplier_Info column occurrence number. The Integration Service updates the column ID for each occurrence of the Supplier_Info. The column ID values are 1, 2, 3, 4 in the detail data rows.
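A Python sketch of how the generated key and generated column ID relate to these rows; the tuple layout is an illustrative assumption:

def supplier_rows(source_rows):
    # One generated key (gk) per source row; one column ID (gcid)
    # per Supplier_Info occurrence within the row.
    for gk, (item, desc, price, qty, suppliers) in enumerate(source_rows, 1):
        for gcid, (code, name) in enumerate(suppliers, 1):
            yield item, desc, price, qty, code, name, gk, gcid

rows = [("Item1", "ItemDesc", 100, 25,
         [("A", "Supplier1"), ("B", "Supplier2"),
          ("C", "Supplier3"), ("D", "Supplier4")])]
for r in supplier_rows(rows):
    print(*r)  # Item1 ItemDesc 100 25 A Supplier1 1 1 ... D Supplier4 1 4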
Supplier_Code and Supplier_Name occur four times in the COBOL source. The Ports tab shows one Supplier_Code port and one Supplier_Name port.
The VSAM Normalizer tab also includes the following COBOL attributes: Signed (S), Trailing Sign (T), Included Sign (I), Real Decimal Point (R), Redefines, and Business Name.
To create a VSAM Normalizer transformation:
1. In the Mapping Designer, create a new mapping or open an existing mapping.
2. Drag a COBOL source definition into the mapping.
   The Designer adds a Normalizer transformation and connects it to the COBOL source definition. If you have not enabled the option to create a source qualifier by default, the Create Normalizer Transformation dialog box appears:
For more information about the option to create a source qualifier by default, see Using the Designer in the Designer Guide.
3. If the Create Normalizer Transformation dialog box appears, choose from the following options:
   - VSAM Source. Create a transformation from the COBOL source definition in the mapping.
   - Pipeline. Create a transformation, but do not define columns from a COBOL source. Define the columns manually on the Normalizer tab. You might choose this option when you want to process multiple-occurring data from another transformation in the mapping.
   To create the VSAM Normalizer transformation, select the VSAM Normalizer transformation option. The dialog box displays the name of the COBOL source definition in the mapping. Select the COBOL source definition and click OK.
4. Open the Normalizer transformation.
5. Select the Ports tab to view the ports in the Normalizer transformation.
   The Designer creates the ports from the COBOL source definition by default.
6. Click the Normalizer tab to review the source column organization.
   The Normalizer tab contains the same information as the Columns tab of the COBOL source. However, you cannot modify column attributes in the Normalizer transformation. To change column attributes, change the COBOL copybook, import the COBOL source, and recreate the Normalizer transformation.
7. Select the Properties tab to set the tracing level.
   You can also configure the transformation to reset the generated key sequence numbers at the start of the next session. For more information about changing generated key values, see Changing the Generated Key Values on page 399.
Each source row has a StoreName column and four instances of Sales_By_Quarter.
Figure 17-10 shows the ports that the Designer creates from the columns in the Normalizer transformation:
Figure 17-10. Pipeline Normalizer Ports
A pipeline Normalizer transformation has an input port for each instance of a multiple-occurring column. The transformation returns one instance of the multiple-occurring column in each output row.
The Normalizer transformation returns one row for each instance of the multiple-occurring column:
Dellmark 100 1 1
Dellmark 450 1 2
Dellmark 650 1 3
Dellmark 780 1 4
Tonys 666 2 1
Tonys 333 2 2
Tonys 444 2 3
Tonys 555 2 4
The Integration Service increments the generated key sequence number each time it processes a source row. The generated key links each quarter's sales to the same store. In this example, the generated key for the Dellmark row is 1. The generated key for the Tonys store is 2. The transformation returns a generated column ID (GCID) for each instance of a multiple-occurring field. The GCID_Sales_by_Quarter value is always 1, 2, 3, or 4 in this example. For more information about the generated key, see Normalizer Transformation Generated Keys on page 399.
The Designer creates an input port for each occurrence of a multiple-occurring column. You can change the generated key sequence number.
To change the ports in a pipeline Normalizer transformation, modify the columns in the Normalizer tab. When you add a column occurrence, the Designer adds an input port. The Designer creates ports for the lowest level columns. It does not create ports for group level columns.
The level number on the Normalizer tab identifies a hierarchy of columns. Group level columns identify groups of columns. The group level column has a lower level number than columns in the group. Columns in the same group have the same level number and display sequentially below the group level column on the Normalizer tab. Figure 17-13 shows a group of multiple-occurring columns in the Normalizer tab:
Figure 17-13. Grouping Repeated Columns on the Normalizer Tab
The NEWRECORD column contains no data. It is a Level 1 group column. The group occurs four times in each source row. Store_Number and Store_Name are Level 2 columns. They belong to the NEWRECORD group.
For more information about creating columns and groups, see Steps to Create a Pipeline Normalizer Transformation on page 412.
To create a pipeline Normalizer transformation:
1. In the Transformation Developer or the Mapping Designer, click Transformation > Create. Select Normalizer transformation. Enter a name for the Normalizer transformation.
   The naming convention for Normalizer transformations is NRM_TransformationName.
2. Click Create and click Done.
3. Open the Normalizer transformation and click the Normalizer tab.
4. Click Add to add a new column.
The Designer creates a new column with default attributes. You can change the name, datatype, precision, and scale.
5. To create a multiple-occurring column, enter the number of occurrences in the Occurs column.
6. To create a group of multiple-occurring columns, enter at least one of the columns on the Normalizer tab. Select the column. Click Level.
All columns are the same level by default. The Level defines columns that are grouped together.
The Designer adds a NEWRECORD group level column above the selected column. NEWRECORD becomes Level 1. The selected column becomes Level 2. You can rename the NEWRECORD column.
7. You can change the column level for other columns to add them to the same group. Select a column and click Level to change it to the same level as the column above it.
   Columns in the same group must appear sequentially in the Normalizer tab.
Figure 17-14 shows the NEWRECORD column that groups the Store_Number and Store_Name columns:
Figure 17-14. Group-Level Column on the Normalizer Tab
The NEWRECORD column is a level one group column. Store_Number and Store_Name are level two columns.
8. Change the occurrence at the group level to make the group of columns multiple-occurring.
9. Click Apply to save the columns and create input and output ports.
   The Designer creates the Normalizer transformation input and output ports. In addition, the Designer creates the generated key columns and a column ID for each multiple-occurring column or group of columns.
10. Select the Properties tab to change the tracing level or reset the generated key sequence numbers after the next session.
    For more information about changing generated key values, see Changing the Generated Key Values on page 399.
The COBOL source definition and the Normalizer transformation have columns that represent fields in both types of records. You need to filter the store rows from the item rows and pass them to different targets. Figure 17-15 shows the Sales_File COBOL source:
Figure 17-15. Sales File COBOL Source
The source record might contain the Store_Data information or Detail_Data information with four occurrences of Supplier_Info.
The Hdr_Rec_Type defines whether the record contains store or merchandise data. When the Hdr_Rec_Type value is S, the record contains Store_Data. When the Hdr_Rec_Type is D, the record contains Detail_Data. Detail_Data always includes four occurrences of Supplier_Info fields. To filter data, connect the Normalizer output rows to a Router transformation to route the store, item, and supplier data to different targets. You can filter rows in the Router transformation based on the value of Hdr_Rec_Type.
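For example, the Router transformation might use group filter conditions similar to the following sketch, based on the Hdr_Rec_Type values above:

Hdr_Rec_Type = 'S'
Hdr_Rec_Type = 'D'

The first user-defined group passes store rows, and the second passes merchandise item rows.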
Figure 17-16 shows the mapping that routes Sales_File records to different targets:
Figure 17-16. Multiple Record Types Routed to Different Targets
The mapping filters multiple record types from the COBOL source to relational targets. The multiple-occurring source columns are mapped to a separate relational table. Each row is indexed by occurrence in the source row. The mapping contains the following transformations:
- Normalizer transformation. The Normalizer transformation returns multiple rows when the source contains multiple-occurring Detail_Data. It also processes different record types from the same source.
- Router transformation. The Router transformation routes data to targets based on the value of Hdr_Rec_Type. The Router transformation contains one user-defined group for the store data and one user-defined group for the merchandise items.
- Aggregator transformation. The Aggregator transformation removes duplicate Detail_Data rows that occur with each Supplier_Info occurrence.

The mapping processes the data in the following steps:
1. The Normalizer transformation passes the header record type and header store number columns to the Sales_Header target. Each Sales_Header record has a generated key that links the Sales_Header row to a Store_Data or Detail_Data target row. The Normalizer returns Hdr_Data and Store_Data once per row.
2. The Normalizer transformation passes all columns to the Router transformation. It passes Detail_Data data four times per row, once for each occurrence of the Supplier_Info columns. The Detail_Data columns contain duplicate data, except for the Supplier_Info columns.
3. The Router transformation passes the store name, address, city, and generated key to Store_Data when the Hdr_Rec_Type is S. The generated key links Store_Data rows to Sales_Header rows.
4. The Router transformation passes the item, item description, price, quantity, and Detail_Data generated keys to an Aggregator transformation when the Hdr_Rec_Type is D.
5. The Router transformation passes the supplier code, name, and column ID to the Suppliers target when the Hdr_Rec_Type is D. It passes the generated key that links the Suppliers row to the Detail_Data row.
6. The Aggregator transformation removes the duplicate Detail_Data columns. The Aggregator passes one instance of the item, description, price, quantity, and generated key to Detail_Data. The Detail_Data generated key links the Detail_Data rows to the Suppliers rows. Detail_Data also has a key that links the Detail_Data rows to Sales_Header rows.

Figure 17-17 shows the user-defined groups and the filter conditions in the Router transformation:
Figure 17-17. Router Transformation User-Defined Groups
The Router transformation passes store data or item data based on the record type.
Figure 17-18 shows a COBOL source definition that contains a multiple-occurring group of columns:
Figure 17-18. COBOL Source with A Multiple-Occurring Group of Columns
The Normalizer transformation generates a GK_Detail_Sales key for each source row. The GK_Detail_Sales key represents one Detail_Record source row. Figure 17-19 shows the primary-foreign key relationships between the targets:
Figure 17-19. Generated Keys in Target Tables
Multiple-occurring Detail_Supplier rows have a foreign key linking them to the same Detail_Sales row.
Figure 17-20 shows the GK_Detail_Sales generated key connected to primary and foreign keys in the target:
Figure 17-20. Generated Keys Mapped to Target Keys
Pass GK_Detail_Sales to the primary key of Detail_Sales and the foreign key of Detail_Suppliers.
- Detail_Sales_Target. Pass the Detail_Item, Detail_Desc, Detail_Price, and Detail_Qty columns to a Detail_Sales target. Pass the GK_Detail_Sales key to the Detail_Sales primary key.
- Aggregator Transformation. Pass each Detail_Sales row through an Aggregator transformation to remove duplicate rows. The Normalizer returns duplicate Detail_Sales columns for each occurrence of Detail_Suppliers.
- Detail_Suppliers. Pass each instance of the Detail_Suppliers columns to the Detail_Suppliers target. Pass the GK_Detail_Sales key to the Detail_Suppliers foreign key. Each instance of the Detail_Suppliers columns has a foreign key that relates the Detail_Suppliers row to the Detail_Sales row.
Troubleshooting
I cannot edit the ports in my Normalizer transformation when using a relational source.
When you create ports manually, add them on the Normalizer tab in the transformation, not the Ports tab.

Importing a COBOL file failed with numerous errors. What should I do?
Verify that the COBOL program follows the COBOL standard, including spaces, tabs, and end of line characters. The COBOL file headings should be similar to the following text:
identification division.
program-id. mead.
environment division.
select file-one assign to "fname".
data division.
file section.
fd FILE-ONE.
The Designer does not read hidden characters in the COBOL program. Use a text-only editor to make changes to the COBOL file. Do not use Word or Wordpad. Remove extra spaces.

A session that reads binary data completed, but the information in the target table is incorrect.
Edit the session in the Workflow Manager and verify that the source file format is set correctly. The file format might be EBCDIC or ASCII. The number of bytes to skip between records must be set to 0.

I have a COBOL field description that uses a non-IBM COMP type. How should I import the source?
In the source definition, clear the IBM COMP option.

In my mapping, I use one Expression transformation and one Lookup transformation to modify two output ports from the Normalizer transformation. The mapping concatenates them into a single transformation. All the ports are under the same level. When I check the data loaded in the target, it is incorrect. Why is that?
You can only concatenate ports from level one. Remove the concatenation.
Chapter 18
Rank Transformation
Overview, 422 Ports in a Rank Transformation, 424 Defining Groups, 425 Creating a Rank Transformation, 426
Overview
Transformation type: Active Connected
You can select only the top or bottom rank of data with the Rank transformation. Use a Rank transformation to return the largest or smallest numeric value in a port or group. You can also use a Rank transformation to return the strings at the top or the bottom of a session sort order. During the session, the Integration Service caches input data until it can perform the rank calculations.

The Rank transformation differs from the transformation functions MAX and MIN in that it lets you select a group of top or bottom values, not just one value. For example, use Rank to select the top 10 salespersons in a given territory. Or, to generate a financial report, you might also use a Rank transformation to identify the three departments with the lowest expenses in salaries and overhead. While the SQL language provides many functions designed to handle groups of data, identifying top or bottom strata within a set of rows is not possible using standard SQL functions.

You connect all ports representing the same row set to the transformation. Only the rows that fall within that rank, based on some measure you set when you configure the transformation, pass through the Rank transformation. You can also write expressions to transform data or perform calculations.

Figure 18-1 shows a mapping that passes employee data from a human resources table through a Rank transformation. The Rank transformation only passes the rows for the top 10 highest paid employees to the next transformation.
Figure 18-1. Sample Mapping with a Rank Transformation
As an active transformation, the Rank transformation might change the number of rows passed through it. You might pass 100 rows to the Rank transformation, but select to rank only the top 10 rows, which pass from the Rank transformation to another transformation. You can connect ports from only one transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.
Rank Caches
During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If you configure the Rank transformation to rank across multiple groups, the Integration Service ranks incrementally for each group it finds. The Integration Service stores group information in an index cache and row data in a data cache. If you create multiple partitions in a pipeline, the Integration Service creates separate caches for each partition. For more information about caching, see Session Caches in the Workflow Administration Guide.
When you create a Rank transformation, you can configure the following properties:
- Enter a cache directory.
- Select the top or bottom rank.
- Select the input/output port that contains values used to determine the rank. You can select only one port to define a rank.
- Select the number of rows falling within a rank.
- Define groups for ranks, such as the 10 least expensive products for each manufacturer.
Ports in a Rank Transformation
Rank Index
The Designer creates a RANKINDEX port for each Rank transformation. The Integration Service uses the Rank Index port to store the ranking position for each row in a group. For example, if you create a Rank transformation that ranks the top five salespersons for each quarter, the rank index numbers the salespeople from 1 to 5:
RANKINDEX  SALES_PERSON  SALES
1          Sam           10,000
2          Mary          9,000
3          Alice         8,000
4          Ron           7,000
5          Alex          6,000
The RANKINDEX is an output port only. You can pass the rank index to another transformation in the mapping or directly to a target.
Defining Groups
Like the Aggregator transformation, the Rank transformation lets you group information. For example, if you want to select the 10 most expensive items by manufacturer, you would first define a group for each manufacturer. When you configure the Rank transformation, you can set one of its input/output ports as a group by port. For each unique value in the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank). Therefore, the Rank transformation changes the number of rows in two different ways. By filtering all but the rows falling within a top or bottom rank, you reduce the number of rows that pass through the transformation. By defining groups, you create one set of ranked rows for each group. For example, you might create a Rank transformation to identify the 50 highest paid employees in the company. In this case, you would identify the SALARY column as the input/output port used to measure the ranks, and configure the transformation to filter out all rows except the top 50. After the Rank transformation identifies all rows that belong to a top or bottom rank, it then assigns rank index values. In the case of the top 50 employees, measured by salary, the highest paid employee receives a rank index of 1. The next highest-paid employee receives a rank index of 2, and so on. When measuring a bottom rank, such as the 10 lowest priced products in the inventory, the Rank transformation assigns a rank index from lowest to highest. Therefore, the least expensive item would receive a rank index of 1. If two rank values match, they receive the same value in the rank index and the transformation skips the next value. For example, if you want to see the top five retail stores in the country and two stores have the same sales, the return data might look similar to the following:
RANKINDEX  SALES    STORE
1          100000   Orange
1          100000   Brea
3          90000    Los Angeles
4          80000    Ventura
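The tie-handling rule above is standard competition ranking: tied values share a rank and the next rank is skipped. The following Python sketch models it outside of PowerCenter, assuming a top rank on a single measure; the function and data names are illustrative only, not part of the product:

def rank_rows(rows, key, top_n):
    """Return (rank, row) pairs for the top_n ranks; ties share a rank."""
    ordered = sorted(rows, key=key, reverse=True)   # top rank: highest value first
    ranked, prev_value, rank = [], None, 0
    for position, row in enumerate(ordered, start=1):
        value = key(row)
        if value != prev_value:
            rank = position          # the rank after a tie is skipped
            prev_value = value
        ranked.append((rank, row))
    return [(r, row) for r, row in ranked if r <= top_n]

stores = [("Orange", 100000), ("Brea", 100000),
          ("Los Angeles", 90000), ("Ventura", 80000), ("Fresno", 70000)]
for rank, (store, sales) in rank_rows(stores, key=lambda s: s[1], top_n=5):
    print(rank, store, sales)   # Orange and Brea both print rank 1; rank 2 is skipped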
Defining Groups
425
Creating a Rank Transformation

To create a Rank transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Rank transformation. Enter a name for the Rank. The naming convention for Rank transformations is RNK_TransformationName. Enter a description for the transformation. This description appears in the Repository Manager.
2. Click Create, and then click Done. The Designer creates the Rank transformation.
3. Link columns from an input transformation to the Rank transformation.
4. Click the Ports tab, and then select the Rank (R) option for the port used to measure ranks. If you want to create groups for ranked rows, select Group By for the port that defines the group.
5. Click the Properties tab and select whether you want the top or bottom rank.
6. For the Number of Ranks option, enter the number of rows you want to select for the rank.
7. Change the other Rank transformation properties, if necessary. Table 18-2 describes the Rank transformation properties:

Table 18-2. Rank Transformation Properties

Cache Directory: Local directory where the Integration Service creates the index and data cache files. By default, the Integration Service uses the directory entered in the Workflow Manager for the process variable $PMCacheDir. If you enter a new directory, make sure the directory exists and contains enough disk space for the cache files.

Top/Bottom: Specifies whether you want the top or bottom ranking for a column.

Number of Ranks: Number of rows you want to rank.

Case-Sensitive String Comparison: When running in Unicode mode, the Integration Service ranks strings based on the sort order selected for the session. If the session sort order is case sensitive, select this option to enable case-sensitive string comparisons, and clear this option to have the Integration Service ignore case for strings. If the sort order is not case sensitive, the Integration Service ignores this setting. By default, this option is selected.

Tracing Level: Determines the amount of information the Integration Service writes to the session log about data passing through this transformation in a session.

Transformation Scope: Specifies how the Integration Service applies the transformation logic to incoming data.
8. Click OK.
9. Click Repository > Save.
Chapter 19
Router Transformation
Overview, 430 Working with Groups, 432 Working with Ports, 436 Connecting Router Transformations in a Mapping, 438 Creating a Router Transformation, 440
Overview
Transformation type: Active Connected
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient. For example, to test data based on three conditions, you need only one Router transformation instead of three Filter transformations to perform this task. Likewise, when you use a Router transformation in a mapping, the Integration Service processes the incoming data only once. When you use multiple Filter transformations in a mapping, the Integration Service processes the incoming data for each transformation. Figure 19-1 shows two mappings that perform the same task. Mapping A uses three Filter transformations while Mapping B produces the same result with one Router transformation:
Figure 19-1. Comparing Router and Filter Transformations
A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that you configure in the Designer.
Working with Groups

A Router transformation has the following types of groups:
- Input
- Output
Input Group
The Designer copies property information from the input ports of the input group to create a set of output ports for each output group.
Output Groups
There are two types of output groups:
- User-defined groups
- The default group
User-Defined Groups
You create a user-defined group to test a condition based on incoming data. A user-defined group consists of output ports and a group filter condition. You can create and edit user-defined groups on the Groups tab with the Designer. Create one user-defined group for each condition that you want to specify. The Integration Service uses the condition to evaluate each row of incoming data. It tests the conditions of each user-defined group before processing the default group. The Integration Service determines the order of evaluation for each condition based on the order of the connected output groups. The Integration Service processes user-defined groups that are connected to a transformation or a target in a mapping. The Integration Service only processes user-defined groups that are not connected in a mapping if the default group is connected to a transformation or a target. If a row meets more than one group filter condition, the Integration Service passes this row multiple times.
The Default Group

The Designer creates the default group when you create the first user-defined group. If you want the Integration Service to drop all rows in the default group, do not connect it to a transformation or a target in a mapping. The Designer deletes the default group when you delete the last user-defined group from the list.
Since you want to perform multiple calculations based on the data from three different countries, create three user-defined groups and specify three group filter conditions on the Groups tab. Figure 19-4 shows specifying group filter conditions in a Router transformation to filter customer data:
Figure 19-4. Specifying Group Filter Conditions
In the session, the Integration Service passes the rows of data that evaluate to TRUE to each transformation or target that is associated with each user-defined group, such as Japan, France, and USA. The Integration Service passes the row to the default group if all of the conditions evaluate to FALSE. If this happens, the Integration Service passes the data of the other six countries to the transformation or target that is associated with the default group. If you want the Integration Service to drop all rows in the default group, do not connect it to a transformation or a target in a mapping.
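As a rough model of these routing rules, the Python sketch below tests each row against every user-defined condition, emits a copy per matching group, and sends unmatched rows to a default group. The group names and conditions are illustrative only, not part of the product:

groups = {
    "Japan":  lambda row: row["country"] == "Japan",
    "France": lambda row: row["country"] == "France",
    "USA":    lambda row: row["country"] == "USA",
}

def route(rows):
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):              # a row can match more than one group
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)   # drop these by leaving DEFAULT unconnected
    return routed

rows = [{"country": "Japan"}, {"country": "Brazil"}]
print(route(rows)["DEFAULT"])   # [{'country': 'Brazil'}]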
Adding Groups
Adding a group is similar to adding a port in other transformations. The Designer copies property information from the input ports to the output ports. For more information, see Working with Groups on page 432.
To add a group to a Router transformation:
1. Click the Groups tab.
2. Click the Add button.
3. Enter a name for the new group in the Group Name section.
4. Click the Group Filter Condition field and open the Expression Editor.
5. Enter the group filter condition.
6. Click Validate to check the syntax of the condition.
7. Click OK.
Working with Ports

A Router transformation has input ports and output ports. The Designer creates output ports by copying the following properties from the input ports:
- Port name
- Datatype
- Precision
- Scale
- Default value
When you make changes to the input ports, the Designer updates the output ports to reflect these changes. You cannot edit or delete output ports. The output ports display in the Normal view of the Router transformation. The Designer creates output port names based on the input port names. For each input port, the Designer creates a corresponding output port in each output group.
Figure 19-6 shows the output port names of a Router transformation that correspond to the input port names:
Figure 19-6. Input Port Name and Corresponding Output Port Names
Connecting Router Transformations in a Mapping

Use the following guidelines when you connect a Router transformation in a mapping:
- You can connect one output port in a group to multiple transformations or targets.
- You can connect multiple output ports in one group to multiple transformations or targets.
- You cannot connect more than one group to one target or a single input group transformation.
- You can connect more than one group to a multiple input group transformation, except for Joiner transformations, when you connect each output group to a different input group.
Creating a Router Transformation

To create a Router transformation:
1. In the Mapping Designer, open a mapping.
2. Click Transformation > Create. Select Router transformation, and enter the name of the new transformation. The naming convention for the Router transformation is RTR_TransformationName. Click Create, and then click Done.
3. Select and drag all the ports from a transformation to add them to the Router transformation, or you can manually create input ports on the Ports tab.
4. Double-click the title bar of the Router transformation to edit transformation properties.
5. Click the Transformation tab and configure transformation properties.
6. Click the Properties tab and configure tracing levels. For more information about configuring tracing levels, see Configuring Tracing Level in Transformations on page 32.
7. Click the Groups tab, and then click the Add button to create a user-defined group. The Designer creates the default group when you create the first user-defined group.
8. Click the Group Filter Condition field to open the Expression Editor.
9. Enter a group filter condition.
10. Click Validate to check the syntax of the conditions you entered.
11. Click OK.
12. Connect group output ports to transformations or targets.
13. Click Repository > Save.
Chapter 20

Sequence Generator Transformation
Overview, 442 Common Uses, 443 Sequence Generator Ports, 444 Transformation Properties, 448 Creating a Sequence Generator Transformation, 454
Overview
Transformation type: Passive Connected
The Sequence Generator transformation generates numeric values. Use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. The Sequence Generator transformation is a connected transformation. It contains two output ports that you can connect to one or more transformations. The Integration Service generates a block of sequence numbers each time a block of rows enters a connected transformation. If you connect CURRVAL, the Integration Service processes one row in each block. When NEXTVAL is connected to the input port of another transformation, the Integration Service generates a sequence of numbers. When CURRVAL is connected to the input port of another transformation, the Integration Service generates the NEXTVAL value plus the Increment By value. You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target. For example, if you have a large input file that you separate into three sessions running in parallel, use a Sequence Generator to generate primary key values. If you use different Sequence Generators, the Integration Service might generate duplicate key values. Instead, use the reusable Sequence Generator for all three sessions to provide a unique value for each target row.
Common Uses
You can complete the following tasks with a Sequence Generator transformation:
- Create keys.
- Replace missing values.
- Cycle through a sequential range of numbers.
Creating Keys
You can create primary or foreign key values with the Sequence Generator transformation by connecting the NEXTVAL port to a target or downstream transformation. You can use a range of values from 1 to 9,223,372,036,854,775,807 with the smallest interval of 1. When you create primary or foreign keys, use the Cycle option to prevent the Integration Service from creating duplicate primary keys. You might do this by selecting the Truncate Target Table option in the session properties or by creating composite keys. To create a composite key, you can configure the Integration Service to cycle through a smaller set of values. For example, if you have three stores generating order numbers, you might have a Sequence Generator cycling through values from 1 to 3, incrementing by 1. When you pass the following set of foreign keys, the generated values then create unique composite keys:
COMPOSITE_KEY  ORDER_NO
1              12345
2              12345
3              12345
1              12346
2              12346
3              12346
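The cycling behavior behind this composite-key example can be sketched in a few lines of Python, with itertools.cycle standing in for a Sequence Generator configured to cycle from 1 to 3; the variable names are illustrative:

from itertools import cycle

order_numbers = [12345, 12345, 12345, 12346, 12346, 12346]
store_sequence = cycle([1, 2, 3])      # Start Value 1, End Value 3, Cycle enabled

composite_keys = [(next(store_sequence), order) for order in order_numbers]
print(composite_keys)
# [(1, 12345), (2, 12345), (3, 12345), (1, 12346), (2, 12346), (3, 12346)]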
Sequence Generator Ports

The Sequence Generator transformation has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports.

NEXTVAL
Connect NEXTVAL to multiple transformations to generate unique values for each row in each transformation. Use the NEXTVAL port to generate sequence numbers by connecting it to a downstream transformation or target. You connect the NEXTVAL port to generate the sequence based on the Current Value and Increment By properties. The maximum value of sequence numbers NEXTVAL generates is the maximum value of the Bigint datatype, or 9,223,372,036,854,775,807. If a primary key in the target is of Integer datatype, then the maximum value of sequence numbers NEXTVAL generates is the maximum value of the Integer datatype, or 2,147,483,647. For example, you might connect NEXTVAL to two targets in a mapping to generate unique primary key values. The Integration Service creates a column of unique primary key values for each target table. The column of unique primary key values is sent to one target table as a block of sequence numbers. The other target receives a block of sequence numbers from the Sequence Generator transformation after the first target receives the block of sequence numbers. Figure 20-1 shows connecting NEXTVAL to two target tables in a mapping:
Figure 20-1. Connecting NEXTVAL to Two Target Tables in a Mapping
For example, you configure the Sequence Generator transformation as follows: Current Value = 1, Increment By = 1. The Integration Service generates the following primary key values for the T_ORDERS_PRIMARY and T_ORDERS_FOREIGN target tables:
T_ORDERS_PRIMARY TABLE: PRIMARY KEY    T_ORDERS_FOREIGN TABLE: PRIMARY KEY
1                                      6
2                                      7
3                                      8
4                                      9
5                                      10
If you want the same values to go to more than one target that receives data from a single transformation, you can connect a Sequence Generator transformation to that preceding transformation. The Integration Service processes the values into a block of sequence numbers. This allows the Integration Service to pass unique values to the transformation, and then route rows from the transformation to targets. Figure 20-2 shows a mapping with a Sequence Generator that passes unique values to the Expression transformation. The Expression transformation then populates both targets with identical primary key values.
Figure 20-2. Mapping with a Sequence Generator and an Expression Transformation
For example, you configure the Sequence Generator transformation as follows: Current Value = 1, Increment By = 1. The Integration Service generates the following primary key values for the T_ORDERS_PRIMARY and T_ORDERS_FOREIGN target tables:
T_ORDERS_PRIMARY TABLE: PRIMARY KEY    T_ORDERS_FOREIGN TABLE: PRIMARY KEY
1                                      1
2                                      2
3                                      3
4                                      4
5                                      5
Note: When you run a partitioned session on a grid, the Sequence Generator transformation might skip values depending on the number of rows in each partition.
CURRVAL
CURRVAL is NEXTVAL plus the Increment By value. You typically only connect the CURRVAL port when the NEXTVAL port is already connected to a downstream transformation. When a row enters a transformation connected to the CURRVAL port, the Integration Service passes the last created NEXTVAL value plus one. The maximum value of sequence numbers that CURRVAL can generate is 9,223,372,036,854,775,806. If a primary key in the target is of Integer datatype, then the maximum value of sequence numbers CURRVAL can generate is 2,147,483,646. For information about the Increment By value, see Increment By on page 449. Figure 20-3 shows connecting CURRVAL and NEXTVAL ports to a target:
Figure 20-3. Connecting CURRVAL and NEXTVAL Ports to a Target
For example, you configure the Sequence Generator transformation as follows: Current Value = 1, Increment By = 1. The Integration Service generates the following values for NEXTVAL and CURRVAL:
NEXTVAL  CURRVAL
1        2
2        3
3        4
4        5
5        6
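The following Python sketch simulates the NEXTVAL and CURRVAL behavior described above under the stated assumptions: CURRVAL is NEXTVAL plus the Increment By value, and the sequence wraps to the start value when Cycle is enabled. It is an illustration, not the Integration Service implementation:

class SequenceGenerator:
    def __init__(self, current_value=1, increment_by=1,
                 end_value=9_223_372_036_854_775_807,
                 start_value=0, cycle=False):
        self.value = current_value
        self.increment_by = increment_by
        self.end_value = end_value
        self.start_value = start_value
        self.cycle = cycle

    def nextval(self):
        if self.value > self.end_value:
            if not self.cycle:
                # mirrors the documented overflow failure
                raise OverflowError("TT_11009 Sequence Generator Transformation: Overflow error.")
            self.value = self.start_value      # wrap around to the start value
        out = self.value
        self.value += self.increment_by
        return out

    def currval(self, nextval):
        return nextval + self.increment_by     # CURRVAL = NEXTVAL + Increment By

seq = SequenceGenerator(current_value=1, increment_by=1)
for _ in range(5):
    n = seq.nextval()
    print(n, seq.currval(n))    # prints the NEXTVAL/CURRVAL pairs shown above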
If you connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row. When you connect the CURRVAL port in a Sequence Generator transformation, the Integration Service processes one row in each block. You can optimize performance by connecting only the NEXTVAL port in a mapping.
Note: When you run a partitioned session on a grid, the Sequence Generator transformation might skip values depending on the number of rows in each partition.
Transformation Properties
The Sequence Generator transformation is unique among all transformations because you cannot add, edit, or delete the default ports, NEXTVAL and CURRVAL. Table 20-1 lists the Sequence Generator transformation properties you can configure:
Table 20-1. Sequence Generator Transformation Properties
Start Value (Required): Start value of the generated sequence that you want the Integration Service to use if you use the Cycle option. If you select Cycle, the Integration Service cycles back to this value when it reaches the end value. Default is 0. Maximum value is 9,223,372,036,854,775,806.

Increment By (Required): Difference between two consecutive values from the NEXTVAL port. Default is 1. Maximum value is 2,147,483,647. If the Increment By value is greater than the maximum value, the client displays a message that states the value of Increment By must be less than or equal to 2,147,483,647.

End Value (Optional): Maximum value the Integration Service generates. If the Integration Service reaches this value during the session and the sequence is not configured to cycle, the session fails. Maximum value is 9,223,372,036,854,775,807.

Current Value (Optional): Current value of the sequence. Enter the value you want the Integration Service to use as the first value in the sequence. If you want to cycle through a series of values, the value must be greater than or equal to the start value and less than the end value. If the Number of Cached Values is set to 0, the Integration Service updates the current value to reflect the last-generated value for the session plus one, and then uses the updated current value as the basis for the next time you run this session. However, if you use the Reset option, the Integration Service resets this value to its original value after each session. Note: If you edit this setting, you reset the sequence to the new setting. If you reset Current Value to 10, and the increment is 1, the next time you use the session, the Integration Service generates a first value of 10. Maximum value is 9,223,372,036,854,775,806. The Integration Service sets the value to NULL if the current value exceeds the maximum value.

Cycle (Optional): If enabled, the Integration Service cycles through the sequence range. If disabled, the Integration Service stops the sequence at the configured end value. The Integration Service fails the session with overflow errors if it reaches the end value and still has rows to process.

Number of Cached Values (Optional): Number of sequential values the Integration Service caches at a time. Use this option when multiple sessions use the same reusable Sequence Generator transformation at the same time. Default is 0 for non-reusable Sequence Generator transformations and 1,000 for reusable Sequence Generator transformations.

Reset (Optional): If enabled, the Integration Service generates values based on the original current value each time it starts a session. Reset is disabled for reusable Sequence Generator transformations.

Tracing Level (Optional): Level of detail about the transformation that the Integration Service writes to the session log.
Start Value and Cycle

To cycle through a repeating sequence of values:
1. Enter the lowest value in the sequence that you want the Integration Service to use for the Start Value.
2. Enter the highest value to be used for End Value.
3. Select Cycle.

When the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
Increment By
The Integration Service generates a sequence (NEXTVAL) based on the Current Value and Increment By properties in the Sequence Generator transformation. The Current Value property is the value at which the Integration Service starts creating the sequence for each session. Increment By is the integer the Integration Service adds to the existing value to create the new value in the sequence. By default, the Current Value is set to 1, and Increment By is set to 1. For example, you might create a Sequence Generator transformation with a current value of 1,000 and an increment of 10. If you pass three rows through the mapping, the Integration Service generates the following set of values:

1000
1010
1020
End Value
End Value is the maximum value you want the Integration Service to generate. If the Integration Service reaches the end value and the Sequence Generator is not configured to cycle through the sequence, the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
If the Integration Service writes primary or foreign key bigint values to an Oracle target of Integer datatype, it drops the row and writes a message in the session log. To avoid dropping rows, change the Oracle Integer datatype to the Number datatype that can hold bigint data. The Integration Service converts the transformation Bigint datatype to the native Bigint datatype for IBM DB2, Microsoft SQL Server, Informix, Sybase IQ, and Teradata Parallel Transporter.
Note: Set the end value to any integer between 1 and 9,223,372,036,854,775,807.
Current Value
The Integration Service uses the current value as the basis for generated values for each session. To indicate which value you want the Integration Service to use the first time it uses the Sequence Generator transformation, you must enter that value as the current value. If you want to use the Sequence Generator transformation to cycle through a series of values, the current value must be greater than or equal to Start Value and less than the end value.

At the end of each session, the Integration Service updates the current value to the last value generated for the session plus one if the Sequence Generator Number of Cached Values is 0. For example, if the Integration Service ends a session with a generated value of 101, it updates the Sequence Generator current value to 102 in the repository. The next time the Sequence Generator is used, the Integration Service uses 102 as the basis for the next generated value. If the Sequence Generator Increment By is 1, when the Integration Service starts another session using the Sequence Generator, the first generated value is 102.

If you have multiple versions of a Sequence Generator transformation, the Integration Service updates the current value across all versions when it runs a session. The Integration Service updates the current value across versions regardless of whether you have checked out the Sequence Generator transformation or the parent mapping. The updated current value overrides an edited current value for a Sequence Generator transformation if the two values are different. For example, User 1 creates a Sequence Generator transformation and checks it in, saving a current value of 10 to Sequence Generator version 1. Then User 1 checks out the Sequence Generator transformation and enters a new current value of 100 to Sequence Generator version 2. User 1 keeps the Sequence Generator transformation checked out. Meanwhile,
User 2 runs a session that uses the Sequence Generator transformation version 1. The Integration Service uses the checked-in value of 10 as the current value when User 2 runs the session. When the session completes, the current value is 150. The Integration Service updates the current value to 150 for version 1 and version 2 of the Sequence Generator transformation even though User 1 has the Sequence Generator transformation checked out. If you open the mapping after you run the session, the current value displays the last value generated for the session plus one. Since the Integration Service uses the current value to determine the first value for each session, you should edit the current value only when you want to reset the sequence. If you have multiple versions of the Sequence Generator transformation, and you want to reset the sequence, you must check in the mapping or reusable Sequence Generator transformation after you modify the current value.
Note: If you configure the Sequence Generator to Reset, the Integration Service uses the
current value as the basis for the first generated value for each session.
Number of Cached Values

When you run a session on a grid, the master DTM process caches values for the Sequence Generator transformation. This reduces the communication required between the master and worker DTM processes and the repository.
Setting Number of Cached Values greater than 0 causes the Integration Service to access the repository during the session. It also causes sections of skipped values since unused cached values are discarded at the end of each session. For example, you configure a Sequence Generator transformation as follows: Number of Cached Values = 50, Current Value = 1, Increment By = 1. When the Integration Service starts the session, it caches 50 values for the session and updates the current value to 50 in the repository. The Integration Service uses values 1 to 39 for the session and discards the unused values, 40 to 49. When the Integration Service runs the session again, it checks the repository for the current value, which is 50. It then caches the next 50 values and updates the current value to 100. During the session, it uses values 50 to 98. The values generated for the two sessions are 1 to 39 and 50 to 98.
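The skipped-value arithmetic in this example can be sketched as follows. The sketch assumes the repository current value advances by the full cached block at the start of each run, so the exact block boundaries may differ by one from the example above; the point is the gap left by discarded values:

def run_session(repo_current, cached, rows_used, increment=1):
    """Return (values generated this run, new repository current value)."""
    first = repo_current
    repo_current += cached * increment          # current value advances up front
    used = [first + i * increment for i in range(rows_used)]
    return used, repo_current                   # unused cached values are discarded

current = 1
run1, current = run_session(current, cached=50, rows_used=39)
run2, current = run_session(current, cached=50, rows_used=49)
print(run1[0], run1[-1])   # 1 39
print(run2[0], run2[-1])   # 51 99 (the gap after 39 holds the discarded values)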
Reset
If you select Reset for a non-reusable Sequence Generator transformation, the Integration Service generates values based on the original current value each time it starts the session. Otherwise, the Integration Service updates the current value to reflect the last-generated value plus one, and then uses the updated value the next time it uses the Sequence Generator transformation. For example, you might configure a Sequence Generator transformation to create values from 1 to 1,000 with an increment of 1, and a current value of 1 and choose Reset. During the first
session run, the Integration Service generates numbers 1 through 234. Each subsequent time the session runs, the Integration Service again generates numbers beginning with the current value of 1. If you do not select Reset, the Integration Service updates the current value to 235 at the end of the first session run. The next time it uses the Sequence Generator transformation, the first value generated is 235.
Note: Reset is disabled for reusable Sequence Generator transformations.
Creating a Sequence Generator Transformation

To create a Sequence Generator transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Sequence Generator transformation. The naming convention for Sequence Generator transformations is SEQ_TransformationName.
2. Enter a name for the Sequence Generator, and click Create. Click Done. The Designer creates the Sequence Generator transformation.
3. Double-click the title bar of the transformation. Enter a description for the transformation.
4. Select the Properties tab.
5. Enter settings. For a list of transformation properties, see Table 20-1 on page 448.
Note: You cannot override the Sequence Generator transformation properties at the session level. This protects the integrity of the sequence values generated.
6. Click OK.
7. To generate new sequences during a session, connect the NEXTVAL port to at least one transformation in the mapping. Use the NEXTVAL or CURRVAL ports in an expression in other transformations.
8. Click Repository > Save.
Chapter 21
Sorter Transformation
Overview, 458 Sorting Data, 459 Sorter Transformation Properties, 461 Creating a Sorter Transformation, 465
Overview
Transformation type: Active Connected
You can sort data with the Sorter transformation. You can sort data in ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct. The Sorter transformation is an active transformation. It must be connected to the data flow. You can sort data from relational or flat file sources. You can also use the Sorter transformation to sort data passing through an Aggregator transformation configured to use sorted input. When you create a Sorter transformation in a mapping, you specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order. You also configure sort criteria the Integration Service applies to all sort key ports and the system resources it allocates to perform the sort operation. Figure 21-1 shows a simple mapping that uses a Sorter transformation. The mapping passes rows from a sales table containing order information through a Sorter transformation before loading to the target.
Figure 21-1. Sample Mapping with a Sorter Transformation
Sorting Data
The Sorter transformation contains only input/output ports. All data passing through the Sorter transformation is sorted according to a sort key. The sort key is one or more ports that you want to use as the sort criteria. You can specify more than one port as part of the sort key. When you specify multiple ports for the sort key, the Integration Service sorts each port sequentially. The order the ports appear in the Ports tab determines the succession of sort operations. The Sorter transformation treats the data passing through each successive sort key port as a secondary sort of the previous port. At session run time, the Integration Service sorts data according to the sort order specified in the session properties. The sort order determines the sorting criteria for special characters and symbols. Figure 21-2 shows the Ports tab configuration for the Sorter transformation sorting the data in ascending order by order ID and item ID:
Figure 21-2. Sample Sorter Transformation Ports Configuration
At session run time, the Integration Service passes the following rows into the Sorter transformation:
ORDER_ID  ITEM_ID  QUANTITY  DISCOUNT
45        123456   3         3.04
45        456789   2         12.02
43        000246   6         34.55
41        000468   5         .56
After sorting the data, the Integration Service passes the following rows out of the Sorter transformation:
ORDER_ID  ITEM_ID  QUANTITY  DISCOUNT
41        000468   5         .56
43        000246   6         34.55
45        123456   3         3.04
45        456789   2         12.02
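This two-level sort behaves like an ordinary stable sort on a composite key. A minimal Python illustration, with the tuple key mirroring the port order on the Ports tab; the row layout is illustrative only:

rows = [
    (45, "123456", 3, 3.04),
    (45, "456789", 2, 12.02),
    (43, "000246", 6, 34.55),
    (41, "000468", 5, 0.56),
]

# Sort ascending by ORDER_ID first, then by ITEM_ID within each order.
for row in sorted(rows, key=lambda r: (r[0], r[1])):
    print(row)
# (41, '000468', 5, 0.56)
# (43, '000246', 6, 34.55)
# (45, '123456', 3, 3.04)
# (45, '456789', 2, 12.02)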
Sorter Transformation Properties

Sorter Cache Size

The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation.
If it cannot allocate enough memory, the Integration Service fails the session. For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service machine. Allocate at least 8 MB (8,388,608 bytes) of physical memory to sort data using the Sorter transformation. Sorter cache size is set to 8,388,608 bytes by default. If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Integration Service may require much more than twice the amount of disk space available to the work directory. Use the following formula to determine the size of incoming data:
number_of_input_rows * [(sum of column sizes) + 16]
Table 21-1 gives the individual column size values by datatype for Sorter data calculations:
Table 21-1. Column Sizes for Sorter Data Calculations
Datatype / Column Size
Binary: precision + 8, rounded to the nearest multiple of 8
Date/Time: 29
Decimal, high precision off (all precision): 16
Decimal, high precision on (precision <= 18): 24
Decimal, high precision on (precision > 18, <= 28): 32
Decimal, high precision on (precision > 28): 16
Decimal, high precision on (negative scale): 16
Double: 16
Real: 16
Integer: 16
Small integer: 16
Bigint: 64
NString, NText, String, Text: Unicode mode: 2*(precision + 5); ASCII mode: precision + 9
The column sizes include the bytes required for a null indicator. To increase performance for the sort operation, the Integration Service aligns all data for the Sorter transformation memory on an 8-byte boundary. Each Sorter column includes rounding to the nearest multiple of eight.
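A small helper can estimate the incoming data size from the formula and column sizes above. This is a rough estimate under the stated assumptions (8-byte alignment per column, 16 bytes of row overhead), not the engine's exact accounting:

def align8(n):
    return (n + 7) // 8 * 8                # round up to the nearest multiple of 8

def string_size(precision, unicode_mode=True):
    raw = 2 * (precision + 5) if unicode_mode else precision + 9
    return align8(raw)

def sorter_data_size(num_rows, column_sizes):
    row_size = sum(align8(size) for size in column_sizes) + 16
    return num_rows * row_size

# Example: 1,000,000 rows with an integer, a double, and a 20-character string.
columns = [16, 16, string_size(20)]        # Integer = 16, Double = 16 per Table 21-1
print(sorter_data_size(1_000_000, columns))  # approximate bytes of incoming data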
The Integration Service also writes the row size and amount of memory the Sorter transformation uses to the session log when you configure the Sorter transformation tracing level to Normal. For more information about Sorter transformation tracing levels, see Tracing Level on page 463.
Case Sensitive
The Case Sensitive property determines whether the Integration Service considers case when sorting data. When you enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.
Work Directory
You must specify a work directory the Integration Service uses to create temporary files while it sorts data. After the Integration Service sorts the data, it deletes the temporary files. You can specify any directory on the Integration Service machine to use as a work directory. By default, the Integration Service uses the value specified for the $PMTempDir process variable. When you partition a session with a Sorter transformation, you can specify a different work directory for each partition in the pipeline. To increase session performance, specify work directories on physically separate disks on the Integration Service system.
Tracing Level
Configure the Sorter transformation tracing level to control the number and type of Sorter error and status messages the Integration Service writes to the session log. At Normal tracing level, the Integration Service writes the size of the row passed to the Sorter transformation and the amount of memory the Sorter transformation allocates for the sort operation. The Integration Service also writes the time and date when it passes the first and last input rows to the Sorter transformation. If you configure the Sorter transformation tracing level to Verbose Data, the Integration Service writes the time the Sorter transformation finishes passing all data to the next transformation in the pipeline. The Integration Service also writes the time to the session log when the Sorter transformation releases memory resources and removes temporary files from the work directory. For more information about configuring tracing levels for transformations, see Configuring Tracing Level in Transformations on page 32.
Transformation Scope
The transformation scope specifies how the Integration Service applies the transformation logic to incoming data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic on all incoming data. When you choose All Input, PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
For more information about transformation scope, see Understanding Commit Points in the Workflow Administration Guide.
Creating a Sorter Transformation

To create a Sorter transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Sorter transformation. The naming convention for Sorter transformations is SRT_TransformationName. Enter a description for the transformation. This description appears in the Repository Manager, making it easier to understand what the transformation does.
2. Enter a name for the Sorter and click Create. The Designer creates the Sorter transformation.
3. Click Done.
4. Drag the ports you want to sort into the Sorter transformation. The Designer creates the input/output ports for each port you include.
5. Double-click the title bar of the transformation to open the Edit Transformations dialog box.
6. Select the Ports tab.
7. Select the ports you want to use as the sort key.
8. For each port selected as part of the sort key, specify whether you want the Integration Service to sort data in ascending or descending order.
9. Select the Properties tab. Modify the Sorter transformation properties. For information about Sorter transformation properties, see Sorter Transformation Properties on page 461.
10. Select the Metadata Extensions tab. Create or edit metadata extensions for the Sorter transformation. For more information about metadata extensions, see Metadata Extensions in the Repository Guide.
11. Click OK.
12. Click Repository > Save to save changes to the mapping.
Chapter 22

Source Qualifier Transformation
Overview, 468 Source Qualifier Transformation Properties, 471 Default Query, 473 Joining Source Data, 476 Adding an SQL Query, 480 Entering a User-Defined Join, 482 Outer Join Support, 484 Entering a Source Filter, 492 Using Sorted Ports, 494 Select Distinct, 496 Adding Pre- and Post-Session SQL Commands, 497 Creating a Source Qualifier Transformation, 498 Troubleshooting, 500
467
Overview
Transformation type: Active Connected
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the rows that the Integration Service reads when it runs a session. Use the Source Qualifier transformation to complete the following tasks:
- Join data originating from the same source database. You can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
- Filter rows when the Integration Service reads source data. If you include a filter condition, the Integration Service adds a WHERE clause to the default query.
- Specify an outer join rather than the default inner join. If you include a user-defined join, the Integration Service replaces the join information specified by the metadata in the SQL query.
- Specify sorted ports. If you specify a number for sorted ports, the Integration Service adds an ORDER BY clause to the default SQL query.
- Select only distinct values from the source. If you choose Select Distinct, the Integration Service adds a SELECT DISTINCT statement to the default SQL query.
- Create a custom query to issue a special SELECT statement for the Integration Service to read source data. For example, you might use a custom query to perform aggregate calculations.
Transformation Datatypes
The Source Qualifier transformation displays the transformation datatypes. The transformation datatypes determine how the source database binds data when the Integration Service reads it. Do not alter the datatypes in the Source Qualifier transformation. If the datatypes in the source definition and Source Qualifier transformation do not match, the Designer marks the mapping invalid when you save it.
If one Source Qualifier transformation provides data for multiple targets, you can enable constraint-based loading in a session to have the Integration Service load data based on target table primary and foreign key relationships. For more information, see Mappings in the Designer Guide.
Some databases require you to identify datetime values with additional punctuation, such as single quotation marks or database specific functions. For example, to convert the
$$$SessStartTime value for an Oracle source, use the following Oracle function in the SQL override:
to_date ('$$$SessStartTime', 'mm/dd/yyyy hh24:mi:ss')
For Informix, use the following Informix function in the SQL override to convert the $$$SessStartTime value:
DATETIME ($$$SessStartTime) YEAR TO SECOND
For more information about SQL override, see Overriding the Default Query on page 474. For information about database specific functions, see the database documentation.
Tip: To ensure the format of a datetime parameter or variable matches that used by the source,
validate the SQL query. For information about mapping parameters and variables, see Mapping Parameters and Variables in the Designer Guide.
Source Qualifier Transformation Properties

Configure the following Source Qualifier transformation settings on the Properties tab: SQL Query, User-Defined Join, Source Filter, Number of Sorted Ports, Tracing Level, Select Distinct, Pre-SQL, Post-SQL, Output is Deterministic, and Output is Repeatable.
Default Query
For relational sources, the Integration Service generates a query for each Source Qualifier transformation when it runs a session. The default query is a SELECT statement for each source column used in the mapping. In other words, the Integration Service reads only the columns that are connected to another transformation. Figure 22-1 shows a single source definition connected to a Source Qualifier transformation:
Figure 22-1. Source Definition Connected to a Source Qualifier Transformation
Although there are many columns in the source definition, only three columns are connected to another transformation. In this case, the Integration Service generates a default query that selects only those three columns:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME FROM CUSTOMERS
If any table name or column name contains a database reserved word, you can create and maintain a file, reswords.txt, containing reserved words. When the Integration Service initializes a session, it searches for reswords.txt in the Integration Service installation directory. If the file exists, the Integration Service places quotes around matching reserved words when it executes SQL against the database. If you override the SQL, you must enclose any reserved word in quotes. For more information about the reserved words file, see the Workflow Administration Guide. When generating the default query, the Designer delimits table and field names containing the following characters with double quotes:
/ + - = ~ ` ! % ^ & * ( ) [ ] { } ' ; ? , < > \ | <space>
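As an illustration of the reserved-words idea, the sketch below quotes any identifier found in a reswords.txt-style list before it is placed in a generated query. The file format and matching rules here are simplified assumptions, not the exact Integration Service behavior:

def load_reserved_words(path="reswords.txt"):
    """Read one reserved word per line, ignoring blank lines."""
    with open(path) as f:
        return {line.strip().upper() for line in f if line.strip()}

def quote_identifier(name, reserved):
    # Enclose matching reserved words in double quotes, as the service does.
    return f'"{name}"' if name.upper() in reserved else name

reserved = {"ORDER", "SELECT", "USER"}      # stands in for the file contents
columns = ["CUSTOMER_ID", "ORDER", "USER"]
print(", ".join(quote_identifier(c, reserved) for c in columns))
# CUSTOMER_ID, "ORDER", "USER"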
Viewing the Default Query

To view the default query:
1. From the Properties tab, select SQL Query. The SQL Editor displays the default query the Integration Service uses to select source data.
2. Click Generate SQL.
3. Click Cancel to exit.

Note: If you do not cancel the SQL query, the Integration Service overrides the default query with the custom SQL query.

Do not connect to the source database. You only connect to the source database when you enter an SQL query that overrides the default query. You must connect the columns in the Source Qualifier transformation to another transformation or target before you can generate the default query.
Overriding the Default Query

You can alter or override the default query in the Source Qualifier transformation. If you enter an SQL query, the Integration Service uses only the defined SQL statement. The SQL Query overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation.

Note: When you override the default SQL query, you must enclose all database reserved words in quotes.
Joining Source Data

Default Join
When you join related tables in one Source Qualifier transformation, the Integration Service joins the tables based on the related keys in each table. This default join is an inner equijoin, using the following syntax in the WHERE clause:
Source1.column_name = Source2.column_name
For example, you might see all the orders for the month, including order number, order amount, and customer name. The ORDERS table includes the order number and amount of each order, but not the customer name. To include the customer name, you need to join the ORDERS and CUSTOMERS tables. Both tables include a customer ID, so you can join the tables in one Source Qualifier transformation.
Figure 22-2 shows joining two tables with one Source Qualifier transformation:
Figure 22-2. Joining Two Tables with One Source Qualifier Transformation
When you include multiple tables, the Integration Service generates a SELECT statement for all columns used in the mapping. In this case, the SELECT statement looks similar to the following statement:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME, CUSTOMERS.ADDRESS1, CUSTOMERS.ADDRESS2, CUSTOMERS.CITY, CUSTOMERS.STATE, CUSTOMERS.POSTAL_CODE, CUSTOMERS.PHONE, CUSTOMERS.EMAIL, ORDERS.ORDER_ID, ORDERS.DATE_ENTERED, ORDERS.DATE_PROMISED, ORDERS.DATE_SHIPPED, ORDERS.EMPLOYEE_ID, ORDERS.CUSTOMER_ID, ORDERS.SALES_TAX_RATE, ORDERS.STORE_ID FROM CUSTOMERS, ORDERS WHERE CUSTOMERS.CUSTOMER_ID=ORDERS.CUSTOMER_ID
The WHERE clause is an equijoin that includes the CUSTOMER_ID from the ORDERS and CUSTOMER tables.
Custom Joins
If you need to override the default join, you can enter contents of the WHERE clause that specifies the join in the custom query. If the query performs an outer join, the Integration Service may insert the join syntax in the WHERE clause or the FROM clause, depending on the database syntax. You might need to override the default join under the following circumstances:
- The datatypes of columns used for the join do not match.
- You want to specify a different type of join, such as an outer join.
For more information about custom joins and queries, see Entering a User-Defined Join on page 482.
Heterogeneous Joins
To perform a heterogeneous join, use the Joiner transformation. Use the Joiner transformation when you need to join the following types of sources:
- Join data from different source databases
- Join data from different flat file systems
- Join relational sources and flat files
Creating Key Relationships

You can create primary key-foreign key relationships in the Source Analyzer between tables that do not have existing key relationships. If the source table has more than 1,000 rows, you can increase performance by indexing the primary key-foreign keys. If the source table has fewer than 1,000 rows, you might decrease performance if you index the primary key-foreign keys.

For example, the corporate office for a retail chain wants to extract payments received based on orders. The ORDERS and PAYMENTS tables do not share primary and foreign keys. Both tables, however, include a DATE_SHIPPED column. You can create a primary key-foreign key relationship in the metadata in the Source Analyzer. Note that the two tables are not linked. Therefore, the Designer does not recognize the relationship on the DATE_SHIPPED columns. You create a relationship between the ORDERS and PAYMENTS tables by linking the DATE_SHIPPED columns. The Designer adds primary and foreign keys to the DATE_SHIPPED columns in the ORDERS and PAYMENTS table definitions.
If you do not connect the columns, the Designer does not recognize the relationships. The primary key-foreign key relationships exist in the metadata only. You do not need to generate SQL or alter the source tables. Once the key relationships exist, use a Source Qualifier transformation to join the two tables. The default join is based on DATE_SHIPPED.
Adding an SQL Query

To add an SQL query:
1. Open the Source Qualifier transformation, and click the Properties tab.
2. Click the Open button in the SQL Query field. The SQL Editor dialog box appears.
3. Click Generate SQL. The Designer displays the default query it generates when querying rows from all sources included in the Source Qualifier transformation.
4. Enter a query in the space where the default query appears. Every column name must be qualified by the name of the table, view, or synonym in which it appears. For example, if you want to include the ORDER_ID column from the ORDERS table, enter ORDERS.ORDER_ID. You can double-click column names appearing in the Ports window to avoid typing the name of every column. You can use a parameter or variable as the query, or you can include parameters and variables in the query. Enclose string mapping parameters and variables in string identifiers. Alter the date format for datetime mapping parameters and variables when necessary.
5. Select the ODBC data source containing the sources included in the query.
6. Enter the user name and password to connect to this database.
7. Click Validate. The Designer runs the query and reports whether its syntax was correct.
8. Click OK to return to the Edit Transformations dialog box. Click OK again to return to the Designer.
9. Click Repository > Save.
Tip: You can resize the Expression Editor. Expand the dialog box by dragging from the
borders. The Designer saves the new size for the dialog box as a client setting.
Entering a User-Defined Join

To create a user-defined join:
1. Create a Source Qualifier transformation containing data from multiple sources or associated sources.
2. Open the Source Qualifier transformation, and click the Properties tab.
3. Click the Open button in the User Defined Join field. The SQL Editor dialog box appears.
4. Enter the syntax for the join. Do not enter the keyword WHERE at the beginning of the join. The Integration Service adds this keyword when it queries rows. Enclose string mapping parameters and variables in string identifiers. Alter the date format for datetime mapping parameters and variables when necessary.
5. Click OK to return to the Edit Transformations dialog box, and then click OK to return to the Designer.
6. Click Repository > Save.
Outer Join Support

The Integration Service supports two kinds of outer joins:
- Left. The Integration Service returns all rows for the table to the left of the join syntax and the rows from both tables that meet the join condition.
- Right. The Integration Service returns all rows for the table to the right of the join syntax and the rows from both tables that meet the join condition.
Note: Use outer joins in nested query statements when you override the default query.
When you use Informatica join syntax, enclose the entire join statement in braces ({Informatica syntax}). When you use database syntax, enter syntax supported by the source database without braces. When using Informatica join syntax, use table names to prefix column names. For example, if you have a column named FIRST_NAME in the REG_CUSTOMER table, enter REG_CUSTOMER.FIRST_NAME in the join syntax. Also, when using an alias for a table name, use the alias within the Informatica join syntax to ensure the Integration Service recognizes the alias.
Table 22-3 lists the join syntax you can enter, in different locations for different Source Qualifier transformations, when you create an outer join:
Table 22-3. Locations for Entering Outer Join Syntax
Source Qualifier transformation, User-Defined Join setting: Create a join override. The Integration Service appends the join override to the WHERE or FROM clause of the default query.
Source Qualifier transformation, SQL Query setting: Enter join syntax immediately after the WHERE in the default query.
Application Source Qualifier transformation, Join Override setting: Create a join override. The Integration Service appends the join override to the WHERE clause of the default query.
Application Source Qualifier transformation, Extract Override setting: Enter join syntax immediately after the WHERE in the default query.
You can combine left outer and right outer joins with normal joins in a single Source Qualifier transformation. Use multiple normal joins and multiple left outer joins. When you combine joins, enter them in the following order:
1. Normal
2. Left outer
3. Right outer
Note: Some databases limit you to using one right outer join.
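For example, a join override that combines a normal join with a left outer join, in that order, over the REG_CUSTOMER, PURCHASES, and RETURNS tables used in this chapter might look like the following sketch (an illustration, not taken from the original guide):

{ REG_CUSTOMER INNER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID LEFT OUTER JOIN RETURNS on REG_CUSTOMER.CUST_ID = RETURNS.CUST_ID }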
Table 22-4 displays the syntax for normal joins in a join override:

Table 22-4. Syntax for Normal Joins in a Join Override

Syntax | Description
source1 | Source table name. The Integration Service returns rows from this table that match the join condition.
source2 | Source table name. The Integration Service returns rows from this table that match the join condition.
join_condition | Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.
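In Informatica join syntax, a normal join presumably takes the following general form, consistent with the example that follows:

{ source1 INNER JOIN source2 on join_condition }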
For example, you have a REG_CUSTOMER table with data for registered customers:
CUST_ID | FIRST_NAME | LAST_NAME
00001 | Marvin | Chi
00002 | Dinah | Jones
00003 | John | Bowden
00004 | J. | Marks
To return rows displaying customer names for each transaction in the month of June, use the following syntax:
{ REG_CUSTOMER INNER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Integration Service returns rows with matching customer IDs. It does not include customers who made no purchases in June. It also does not include purchases made by non-registered customers.
Table 22-5 displays syntax for left outer joins in a join override:
Table 22-5. Syntax for Left Outer Joins in a Join Override
Syntax | Description
source1 | Source table name. With a left outer join, the Integration Service returns all rows in this table.
source2 | Source table name. The Integration Service returns rows from this table that match the join condition.
join_condition | Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.
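By the same pattern, a left outer join presumably takes the following general form, consistent with the example that follows:

{ source1 LEFT OUTER JOIN source2 on join_condition }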
For example, using the same REG_CUSTOMER and PURCHASES tables described in Normal Join Syntax on page 485, you can determine how many customers bought something in June with the following join override:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Integration Service returns all registered customers in the REG_CUSTOMER table, using null values for customers who made no purchases in June. It does not include purchases made by non-registered customers. Use multiple join conditions to determine how many registered customers spent more than $100.00 in a single purchase in June:
{REG_CUSTOMER LEFT OUTER JOIN PURCHASES on (REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID AND PURCHASES.AMOUNT > 100.00) }
You might use multiple left outer joins if you want to incorporate information about returns during the same time period. For example, the RETURNS table contains the following data:
CUST_ID | DATE | RETURN
00002 | 6/10/2000 | 55.79
00002 | 6/21/2000 | 104.45
To determine how many customers made purchases and returns for the month of June, use two left outer joins:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID LEFT OUTER JOIN RETURNS on REG_CUSTOMER.CUST_ID = RETURNS.CUST_ID }
Table 22-6 displays syntax for right outer joins in a join override:
Table 22-6. Syntax for Right Outer Joins in a Join Override
Syntax | Description
source1 | Source table name. The Integration Service returns rows from this table that match the join condition.
source2 | Source table name. With a right outer join, the Integration Service returns all rows in this table.
join_condition | Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.
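The corresponding general form for a right outer join, consistent with the table above, would be:

{ source1 RIGHT OUTER JOIN source2 on join_condition }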
You might use a right outer join with a left outer join to join and return all data from both tables, simulating a full outer join. For example, you can extract all registered customers and all purchases for the month of June with the following join override:
{REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID RIGHT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
To create an outer join as a join override:
1. Open the Source Qualifier transformation, and click the Properties tab.
2. In a Source Qualifier transformation, click the button in the User Defined Join field. In an Application Source Qualifier transformation, click the button in the Join Override field.
3. Enter the syntax for the join.
   Do not enter WHERE at the beginning of the join. The Integration Service adds this when querying rows. Enclose Informatica join syntax in braces ( { } ). When using an alias for a table and the Informatica join syntax, use the alias within the Informatica join syntax. Use table names to prefix column names, for example, table.column. Use join conditions supported by the source database. When entering multiple joins, group joins together by type, and then list them in the following order: normal, left outer, right outer. Include only one right outer join per nested query. Select port names from the Ports tab to ensure accuracy.
4. Click OK.
To create an outer join as an extract override:
1. After connecting the input and output ports for the Application Source Qualifier transformation, double-click the title bar of the transformation and select the Properties tab.
2. In the Application Source Qualifier transformation, click the button in the Extract Override field.
3. Click Generate SQL.
4. Enter the syntax for the join in the WHERE clause, immediately after the WHERE.
   Enclose Informatica join syntax in braces ( { } ). When using an alias for a table and the Informatica join syntax, use the alias within the Informatica join syntax. Use table names to prefix column names, for example, table.column. Use join conditions supported by the source database. When entering multiple joins, group joins together by type, and then list them in the following order: normal, left outer, right outer. Include only one right outer join per nested query. Select port names from the Ports tab to ensure accuracy.
5. Click OK.
- Do not combine join conditions with the OR operator in the ON clause of outer join syntax.
- Do not use the IN operator to compare columns in the ON clause of outer join syntax.
- Do not compare a column to a subquery in the ON clause of outer join syntax.
- When combining two or more outer joins, do not use the same table as the inner table of more than one outer join. For example, do not use either of the following outer joins:
{ TABLE1 LEFT OUTER JOIN TABLE2 ON TABLE1.COLUMNA = TABLE2.COLUMNA TABLE3 LEFT OUTER JOIN TABLE2 ON TABLE3.COLUMNB = TABLE2.COLUMNB } { TABLE1 LEFT OUTER JOIN TABLE2 ON TABLE1.COLUMNA = TABLE2.COLUMNA TABLE2 RIGHT OUTER JOIN TABLE3 ON TABLE2.COLUMNB = TABLE3.COLUMNB}
- Do not use both tables of an outer join in a regular join condition. For example, do not use the following join condition:
{ TABLE1 LEFT OUTER JOIN TABLE2 ON TABLE1.COLUMNA = TABLE2.COLUMNA WHERE TABLE1.COLUMNB = TABLE2.COLUMNC}
Note: Entering a condition in the ON clause might return different results from entering the same condition in the WHERE clause.
When using an alias for a table, use the alias to prefix columns in the table. For example, if you call the REG_CUSTOMER table C, when referencing the column FIRST_NAME, use C.FIRST_NAME.
To enter a source filter:
1. In the Mapping Designer, open a Source Qualifier transformation.
   The Edit Transformations dialog box appears.
2. Select the Properties tab.
3. Click the Open button in the Source Filter field.
4. In the SQL Editor dialog box, enter the filter.
   Include the table name and port name. Do not include the keyword WHERE in the filter. Enclose string mapping parameters and variables in string identifiers. Alter the date format for datetime mapping parameters and variables when necessary.
5. Click OK.
- Aggregator. When you configure an Aggregator transformation for sorted input, you can send sorted data by using sorted ports. The group by ports in the Aggregator transformation must match the order of the sorted ports in the Source Qualifier transformation. For more information about using a sorted Aggregator transformation, see Using Sorted Input on page 50.
- Joiner. When you configure a Joiner transformation for sorted input, you can send sorted data by using sorted ports. Configure the order of the sorted ports the same in each Source Qualifier transformation. For more information about using a sorted Joiner transformation, see Using Sorted Input on page 314.
Note: You can also use the Sorter transformation to sort relational and flat file data before Aggregator and Joiner transformations. For more information about sorting data using the Sorter transformation, see Sorter Transformation on page 457.

Use sorted ports for relational sources only. When using sorted ports, the sort order of the source database must match the sort order configured for the session. The Integration Service creates the SQL query used to extract source data, including the ORDER BY clause for sorted ports. The database server performs the query and passes the resulting data to the Integration Service. To ensure data is sorted as the Integration Service requires, the database sort order must be the same as the user-defined session sort order.

When you configure the Integration Service for data code page validation and run a workflow in Unicode data movement mode, the Integration Service uses the selected sort order to sort character data. When you configure the Integration Service for relaxed data code page validation, the Integration Service uses the selected sort order to sort all character data that falls in the language range of the selected sort order. The Integration Service sorts all character data outside the language range of the selected sort order according to standard Unicode sort ordering. When the Integration Service runs in ASCII mode, it ignores this setting and sorts all character data using a binary sort order. The default sort order depends on the code page of the Integration Service.

The Source Qualifier transformation includes the number of sorted ports in the default SQL query. However, if you modify the default query after choosing the Number of Sorted Ports, the Integration Service uses only the query defined in the SQL Query property.
To use sorted ports:
1. In the Mapping Designer, open a Source Qualifier transformation, and click the Properties tab.
2. Click in Number of Sorted Ports and enter the number of ports you want to sort.
   The Integration Service adds the configured number of columns to an ORDER BY clause, starting from the top of the Source Qualifier transformation. The source database sort order must correspond to the session sort order.
   Tip: Sybase supports a maximum of 16 columns in an ORDER BY clause. If the source is Sybase, do not sort more than 16 columns.
3. Click OK.
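For example, if you set Number of Sorted Ports to 2 and CUSTOMER_ID and COMPANY are the top two ports in the transformation, the default query would presumably end with an ORDER BY clause like the following sketch, based on the Select Distinct example in the next section:

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, ...
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.CUSTOMER_ID=ORDERS.CUSTOMER_ID
ORDER BY CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY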
Select Distinct
If you want the Integration Service to select unique values from a source, use the Select Distinct option. You might use this feature to extract unique customer IDs from a table listing total sales. Using Select Distinct filters out unnecessary data earlier in the data flow, which might improve performance. By default, the Designer generates a SELECT statement. If you choose Select Distinct, the Source Qualifier transformation includes the setting in the default SQL query. For example, in the Source Qualifier transformation in Figure 22-2 on page 477, you enable the Select Distinct option. The Designer adds SELECT DISTINCT to the default query as follows:
SELECT DISTINCT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME,
CUSTOMERS.LAST_NAME, CUSTOMERS.ADDRESS1, CUSTOMERS.ADDRESS2, CUSTOMERS.CITY,
CUSTOMERS.STATE, CUSTOMERS.POSTAL_CODE, CUSTOMERS.EMAIL,
ORDERS.ORDER_ID, ORDERS.DATE_ENTERED, ORDERS.DATE_PROMISED, ORDERS.DATE_SHIPPED,
ORDERS.EMPLOYEE_ID, ORDERS.CUSTOMER_ID, ORDERS.SALES_TAX_RATE, ORDERS.STORE_ID
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.CUSTOMER_ID=ORDERS.CUSTOMER_ID
However, if you modify the default query after choosing Select Distinct, the Integration Service uses only the query defined in the SQL Query property. In other words, the SQL Query overrides the Select Distinct setting.
To use Select Distinct:
1. Open the Source Qualifier transformation in the mapping, and click the Properties tab.
2. Check Select Distinct, and click OK.

To override Select Distinct in the session:
1. In the Workflow Manager, open the Session task, and click the Mapping tab.
2. Click the Transformations view, and click the Source Qualifier transformation under the Sources node.
3. In the Properties settings, enable Select Distinct, and click OK.
- Use any command that is valid for the database type. However, the Integration Service does not allow nested comments, even though the database might.
- You can use parameters and variables in source pre- and post-session SQL commands, or you can use a parameter or variable as the command. Use any parameter or variable type that you can define in the parameter file. For information about using parameter files, see the Workflow Administration Guide.
- Use a semicolon (;) to separate multiple statements. The Integration Service issues a commit after each statement.
- The Integration Service ignores semicolons within /*...*/ comments. If you need to use a semicolon outside of comments, you can escape it with a backslash (\). When you escape the semicolon, the Integration Service ignores the backslash, and it does not use the semicolon as a statement separator (see the example after this list).
- The Designer does not validate the SQL.
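For example, a hypothetical pre-session SQL command might separate two statements with a semicolon while escaping a semicolon inside a string literal. The table and message text here are illustrative only:

DELETE FROM STG_ORDERS;
INSERT INTO LOAD_AUDIT (MSG) VALUES ('stage cleared\; load starting')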
Note: You can also enter pre- and post-session SQL commands on the Properties tab of the target instance in a mapping.
To create a Source Qualifier transformation automatically when you add a source to a mapping:
1. In the Designer, click Tools > Options.
2. Select the Format tab.
3. In the Tools options, select Mapping Designer.
4. Select Create Source Qualifier When Opening Sources.
For more information about configuring Designer options, see Using the Designer in the Designer Guide.
To create a Source Qualifier transformation manually:
1. In the Mapping Designer, click Transformation > Create.
2. Enter a name for the transformation, and click Create.
3. Select a source, and click OK.
4. Click Done.
To configure a Source Qualifier transformation:
1. In the Designer, open a mapping.
2. Double-click the title bar of the Source Qualifier transformation.
3. In the Edit Transformations dialog box, click Rename, enter a descriptive name for the transformation, and click OK.
   The naming convention for Source Qualifier transformations is SQ_TransformationName, such as SQ_AllSources.
4. Click the Properties tab.
5. Enter the Source Qualifier transformation properties. For information about the Source Qualifier transformation properties, see Source Qualifier Transformation Properties on page 471.
6. Click the Sources tab and indicate any associated source definitions you want to define for this transformation.
   Identify associated sources only when you need to join data from multiple databases or flat file systems.
7. Click OK to return to the Designer.
Troubleshooting
I cannot perform a drag and drop operation, such as connecting ports.
Review the error message on the status bar for details.

I cannot connect a source definition to a target definition.
You cannot directly connect sources to targets. Instead, you need to connect them through a Source Qualifier transformation for relational and flat file sources, or through a Normalizer transformation for COBOL sources.

I cannot connect multiple sources to one target.
The Designer does not allow you to connect multiple Source Qualifier transformations to a single target. There are two workarounds:
- Reuse targets. Since target definitions are reusable, you can add the same target to the mapping multiple times. Then connect each Source Qualifier transformation to each target.
- Join the sources in a Source Qualifier transformation. Then remove the WHERE clause from the SQL query.

I entered a custom query, but it is not working when I run the workflow containing the session.
Be sure to test this setting for the Source Qualifier transformation before you run the workflow. Return to the Source Qualifier transformation and reopen the dialog box in which you entered the custom query. You can connect to a database and click the Validate button to test the SQL. The Designer displays any errors. Review the session log file if you need further information.
The most common reason a session fails is that the database login in both the session and the Source Qualifier transformation is not the table owner. You need to specify the table owner in the session and when you generate the SQL query in the Source Qualifier transformation. You can test the SQL query by cutting and pasting it into a database client tool (such as Oracle SQL*Plus) to see if it returns an error.

I used a mapping variable in a source filter and now the session fails.
Try testing the query by generating and validating the SQL in the Source Qualifier transformation. If the variable or parameter is a string, you probably need to enclose it in single quotes. If it is a datetime variable or parameter, you might need to change its format for the source system.
Chapter 23
SQL Transformation
This chapter includes the following topics:
- Overview, 502
- Script Mode, 503
- Query Mode, 506
- Connecting to Databases, 513
- Session Processing, 517
- SQL Transformation Properties, 523
- SQL Statements, 529
- Creating an SQL Transformation, 530
Overview
Transformation type: Active/Passive Connected
The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors. For example, you might need to create database tables before adding new transactions. You can create an SQL transformation to create the tables in a workflow. The SQL transformation returns database errors in an output port. You can configure another workflow to run if the SQL transformation returns no errors. When you create an SQL transformation, you configure the following options:
- Script mode. The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row. For more information about script mode, see Script Mode on page 503.
- Query mode. The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement. For more information about query mode, see Query Mode on page 506.
- Database type. The type of database the SQL transformation connects to.
- Connection type. Pass database connection information to the SQL transformation or use a connection object. For more information about connection types, see Connecting to Databases on page 513.
Script Mode
An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation ScriptName port. The script file name contains the complete path to the script file. When you configure the transformation to run in script mode, you create a passive transformation. The transformation returns one row for each input row. The output row contains results of the query and any database error. When the SQL transformation runs in script mode, the query statement and query data do not change. When you need to run different queries in script mode, you pass the scripts in the source data. Use script mode to run data definition queries such as creating or dropping tables. When you configure an SQL transformation to run in script mode, the Designer adds the ScriptName input port to the transformation. When you create a mapping, you connect the ScriptName port to a port that contains the name of a script to execute for each row. You can execute a different SQL script for each input row. The Designer creates default ports that return information about query results. Figure 23-1 shows the default ports for an SQL transformation configured to run in script mode:
Figure 23-1. SQL Transformation Script Mode Ports
An SQL transformation configured for script mode has the following default ports:
Port | Type | Description
ScriptName | Input | Receives the name of the script to execute for the current row.
ScriptResult | Output | Returns PASSED if the script execution succeeds for the row. Otherwise contains FAILED.
ScriptError | Output | Returns errors that occur when a script fails for a row.
Example
You need to create order and inventory tables before adding new data to the tables. You can create an SQL script to create the tables and configure an SQL transformation to run the script. You create a file called create_order_inventory.txt that contains the SQL statements to create the tables. The following mapping shows how to pass the script name to the SQL transformation:
The Integration Service reads a row from the source. The source row contains the SQL script file name and path:
C:\81\server\shared\SrcFiles\create_order_inventory.txt
The transformation receives the file name in the ScriptName port. The Integration Service locates the script file and parses the script. It creates an SQL procedure and sends it to the database to process. The database validates the SQL and executes the query. The SQL transformation returns the ScriptResults and ScriptError. If the script executes successfully, the ScriptResult output port returns PASSED. Otherwise, the ScriptResult port returns FAILED. When the ScriptResult is FAILED, the SQL transformation returns error messages in the ScriptError port. The SQL transformation returns one row for each input row it receives.
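For illustration only, the create_order_inventory.txt script might contain two CREATE TABLE statements separated by a semicolon. The guide does not show the script itself, so the table and column definitions below are assumptions:

CREATE TABLE ORDERS_NEW (ORDER_ID NUMBER, PRODUCT_ID NUMBER, QUANTITY NUMBER);
CREATE TABLE INVENTORY_NEW (PRODUCT_ID NUMBER, QUANTITY_ON_HAND NUMBER)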
You can use a static or dynamic database connection with script mode. For more information about configuring database connections with the SQL transformation, see Connecting to Databases on page 513.
- To include multiple query statements in a script, you can separate them with a semicolon.
- You can use mapping variables or parameters in the script file name.
- The script code page defaults to the locale of the operating system. You can change the locale of the script. For more information about the script locale property, see SQL Settings Tab on page 526.
- The script file must be accessible by the Integration Service. The Integration Service must have read permissions on the directory that contains the script. If the Integration Service uses operating system profiles, the operating system user of the operating system profile must have read permissions on the directory that contains the script.
- The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row.
- You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.
- You cannot use nested scripts where the SQL script calls another SQL script.
- A script cannot accept run-time arguments.
Query Mode
When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation. You pass strings or parameters to the query from the transformation input ports to change the query statement or the query data. When you configure the SQL transformation to run in query mode, you create an active transformation. The transformation can return multiple rows for each input row. Create queries in the SQL transformation SQL Editor. To create a query, type the query statement in the SQL Editor main window. The SQL Editor provides a list of the transformation ports that you can reference in the query. Figure 23-2 shows an SQL query in the SQL Editor:
Figure 23-2. SQL Editor for an SQL Transformation Query
When you create a query, the SQL Editor validates the port names in the query. It also verifies that the ports you use for string substitution are string datatypes. The SQL Editor does not validate the syntax of the SQL query. You can create the following types of SQL queries in the SQL transformation:
- Static SQL query. The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows. For more information about static queries, see Using Static SQL Queries on page 507.
- Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a query for each input row. For more information about dynamic queries, see Using Dynamic SQL Queries on page 508.
When you create a static query, the Integration Service prepares the SQL procedure once and executes it for each row. When you create a dynamic query, the Integration Service prepares the SQL for each input row. You can optimize performance by creating static queries.
The following static SQL query has query parameters that bind to the Employee_ID and Dept input ports of an SQL transformation:
SELECT Name, Address FROM Employees WHERE Employee_Num = ?Employee_ID? and Dept = ?Dept?
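The source rows themselves are not shown in this copy of the guide. Working backward from the generated statements below, the Employee_ID and Dept input ports presumably receive values like these:

Employee_ID | Dept
100 | Products
123 | HR
130 | Accounting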
The Integration Service generates the following query statements from the rows:

SELECT Name, Address FROM Employees WHERE Employee_Num = 100 and Dept = 'Products'
SELECT Name, Address FROM Employees WHERE Employee_Num = 123 and Dept = 'HR'
SELECT Name, Address FROM Employees WHERE Employee_Num = 130 and Dept = 'Accounting'
When you configure output ports for database columns, you need to configure the datatype of each database column you select. Select a native datatype from the list. When you select the native datatype, the Designer configures the transformation datatype for you. The native datatype in the transformation must match the database column datatype. The Integration Service matches the column datatype in the database with the native database type in the transformation at run time. If the datatypes do not match, the Integration Service generates a row error. The Integration Service processes bigint and subsecond data values depending on the type of database used. For more information about transformation datatypes you can use for each database, see Datatype Reference in the Designer Guide. Figure 23-3 shows the ports in the transformation configured to run in query mode:
Figure 23-3. SQL Transformation Static Query Mode Ports
The input ports receive the data in the WHERE clause. The output ports return the columns from the SELECT statement.
The SQL query selects name and address from the employees table. The SQL transformation writes a row to the target for each database row it retrieves.
To change a query statement, configure a string variable in the query for the portion of the query you want to change. To configure the string variable, identify an input port by name in the query and enclose the name with the tilde (~). The query changes based on the value of the data in the port. The transformation input port that contains the query parameter must be a string datatype. You can use string substitution to change the query statement and the query data. When you create a dynamic SQL query, the Integration Service prepares a query for each input row. You can pass the full query or pass part of the query in an input port:
- Full query. You can substitute the entire SQL query with query statements from source data. For more information, see Passing the Full Query on page 509.
- Partial query. You can substitute a portion of the query statement, such as the table name. For more information, see Substituting the Table Name in a String on page 510.
The transformation receives the query in the Query_Port input port. Figure 23-4 shows ports in the SQL transformation:
Figure 23-4. SQL Transformation Ports to Pass a Full Dynamic Query
Query_Port receives the query statements. SQLError returns any database error.
The Integration Service replaces the ~Query_Port~ variable in the dynamic query with the SQL statements from the source. It prepares the query and sends it to the database to process. The database executes the query. The SQL transformation returns database errors to the SQLError port. The following mapping shows how to pass the query to the SQL transformation:
When you pass the full query, you can pass more than one query statement for each input row. For example, the source might contain the following rows:
DELETE FROM Person WHERE LastName = 'Jones'; INSERT INTO Person (LastName, Address) VALUES ('Smith', '38 Summit Drive')
DELETE FROM Person WHERE LastName = 'Jones'; INSERT INTO Person (LastName, Address) VALUES ('Smith', '38 Summit Drive')
DELETE FROM Person WHERE LastName = 'Russell';
You can pass any type of query in the source data. When you configure SELECT statements in the query, you must configure output ports for the database columns you retrieve from the database. When you mix SELECT statements and other types of queries, the output ports that represent database columns contain null values when no database columns are retrieved.
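The next example substitutes only the table name in the query. The query itself does not survive in this copy of the guide; judging from the generated statements below, it presumably has the following form, with the Table_Port input port enclosed in tildes as a string variable:

SELECT Emp_ID, Address from ~Table_Port~ where Dept = 'HR'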
The source might pass the following values to the Table_Port column:
Table_Port
Employees_USA
Employees_England
Employees_Australia
The Integration Service replaces the ~Table_Port~ variable with the table name in the input port:
SELECT Emp_ID, Address from Employees_USA where Dept = 'HR'
SELECT Emp_ID, Address from Employees_England where Dept = 'HR'
SELECT Emp_ID, Address from Employees_Australia where Dept = 'HR'
For more information about how to use parameter variables in the SQL transformation, see Dynamic Update Example on page 535.
- The number and the order of the output ports must match the number and order of the fields in the query SELECT clause.
- The native datatype of an output port in the transformation must match the datatype of the corresponding column in the database. The Integration Service generates a row error when the datatypes do not match.
- When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQLError port, the pass-through ports, and the NumRowsAffected port when it is enabled. If you add output ports, you can configure default values for them. Otherwise the ports receive NULL data values.
- When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the database output ports.
- You cannot add the "_output" suffix to output port names that you create.
- You cannot use the pass-through port to return data from a SELECT query.
- When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL or default value in a session.
- When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error.
- You can use string substitution instead of parameter binding in a query. However, the input ports must be string datatypes.
Connecting to Databases
You can use a static database connection or you can pass database connection information to the SQL transformation at run time. Use one of the following types of connections to connect the SQL transformation to a database:
- Static connection. Configure the connection object in the session. You must first create the connection object in the Workflow Manager.
- Logical connection. Pass a connection name to the SQL transformation as input data at run time. You must first create the connection object in the Workflow Manager.
- Full database connection. Pass the connect string, user name, password, and other connection information to SQL transformation input ports at run time.
Table 23-1 describes the ports that the Designer creates when you configure an SQL transformation to connect to a database with a full connection:

Table 23-1. Full Database Connection Information

Port | Required/Optional | Description
ConnectString | Required | Contains the database name and database server name. For information about passing the connect string, see Passing the Connect String on page 514.
DBUser | Required | Name of the user with permissions to read and write from the database.
DBPasswd | Required | DBUser password.
CodePage | Optional | Code page the Integration Service uses to read from or write to the database. Use the ISO code page name, such as ISO-8859-6. The code page name is not case sensitive. For a list of supported code page names, see Code Pages in the Administrator Guide.
AdvancedOptions | Optional | Connection attributes. Pass the attributes as name-value pairs. Delimit each attribute from another with a semicolon. Attribute names are not case sensitive. For more information about advanced options, see Passing Advanced Options on page 514.

Note: For Sybase ASE, servername is the name of the Adaptive Server from the interfaces file. Use Teradata ODBC drivers to connect to source and target databases.
For example, you might pass the following string to configure connection options:
Use Trusted Connection = 1; Connection Retry Period = 5
You need the PowerCenter license key to connect to different database types. A session fails if PowerCenter is not licensed to connect to the database.
- To improve performance, use a static database connection. When you configure a dynamic connection, the Integration Service establishes a new connection for each input row.
- When you have a limited number of connections to use in a session, you can configure multiple SQL transformations. Configure each SQL transformation to use a different static connection. Use a Router transformation to route rows to an SQL transformation based on connectivity information in the row.
- When you configure the SQL transformation to use full connection data, the database password is plain text. You can pass logical connections when you have a limited number of connections you need to use in a session. A logical connection provides the same functionality as the full connection, and the database password is secure.
- When you pass logical database connections to the SQL transformation, the Integration Service accesses the repository to retrieve the connection information for each input row. When you have many rows to process, passing logical database connections might have a performance impact.
Session Processing
When the Integration Service processes an SQL transformation, it runs SQL queries midstream in the pipeline. When a SELECT query retrieves database rows, the SQL transformation returns the database columns in the output ports. For other types of queries, the SQL transformation returns query results, pass-through data, or database errors in output ports. The SQL transformation configured to run in script mode always returns one row for each input row. An SQL transformation that runs in query mode can return a different number of rows for each input row. The number of rows the SQL transformation returns is based on the type of query it runs and the success of the query. For more information, see Input Row to Output Row Cardinality on page 517.

You can use transaction control with the SQL transformation when you configure the transformation to use a static database connection. You can also issue commit and rollback statements in the query. For more information, see Transaction Control on page 522.

The SQL transformation provides database connection resiliency. However, you cannot recover an SQL transformation with a resume from last checkpoint recovery strategy. For more information, see High Availability on page 522.
The number of output rows the SQL transformation generates for each input row depends on the following factors:
- Query statement processing. When the query contains a SELECT statement, the Integration Service can retrieve multiple output rows.
- NumRowsAffected port configuration. The NumRowsAffected output port contains the total number of rows affected by updates, inserts, or deletes for one input row.
- Pass-through ports. When the SQL transformation contains pass-through ports, the transformation returns the column data at least once for each source row.
- Query results. When a SELECT query is successful, the SQL transformation might retrieve multiple rows. When the query contains other statements, the Integration Service might generate a row that contains SQL errors or the number of rows affected.
- The maximum row count configuration. The Max Output Row Count limits the number of rows the SQL transformation returns from SELECT queries.
Table 23-3 lists the output rows the SQL transformation generates for different types of query statements when no errors occur in query mode:
Table 23-3. Output Rows By Query Statement - Query Mode
Query Statement | Output Rows
UPDATE, INSERT, DELETE only | Zero rows.
One or more SELECT statements | Total number of database rows retrieved.
DDL queries such as CREATE, DROP, TRUNCATE | Zero rows.
In script mode, NumRowsAffected is always NULL. Table 23-4 lists the output rows the SQL transformation generates when you enable NumRowsAffected in query mode:
Table 23-4. NumRowsAffected Rows by Query Statement - Query Mode
Query Statement | Output Rows
UPDATE, INSERT, DELETE only | One row with the NumRowsAffected total.
One or more SELECT statements | Total number of database rows retrieved. NumRowsAffected is zero in each row.
DDL queries such as CREATE, DROP, TRUNCATE | One row with zero NumRowsAffected.
When the SQL transformation runs in query mode and a query contains multiple statements, the Integration Service returns the NumRowsAffected sum in the last row it returns for an input row. NumRowsAffected contains the sum of the rows affected by all INSERT, UPDATE, and DELETE statements from an input row. For example, a query contains the following statements:
DELETE from Employees WHERE Employee_ID = 101;
SELECT Employee_ID, LastName from Employees WHERE Employee_ID = 103;
INSERT into Employees (Employee_ID, LastName, Address) VALUES (102, 'Gein', '38 Beach Rd')
The DELETE statement affects one row. The SELECT statement does not affect any row. The INSERT statement affects one row. The value of NumRowsAffected is two. The Integration Service returns this value in the last output row it returns for the input row. The Integration Service returns no output rows from the DELETE statement. It returns one row from the SELECT statement. It returns one row from the INSERT statement that contains the NumRowsAffected total. The NumRowsAffected port returns zero when all of the following conditions are true:
- The database is Informix.
- The transformation is running in query mode.
- The query contains no parameters.
The Max Output Row Count property limits the number of rows the SQL transformation returns from SELECT queries. For example, if you set Max Output Row Count to 100, and the first SELECT statement in a query returns 200 rows and the second SELECT statement returns 50 rows, the SQL transformation returns 100 rows from the first SELECT statement. It returns no rows from the second statement. To configure unlimited output rows, set Max Output Row Count to zero.
The SQL transformation returns database errors in one of the following ports, depending on the mode:
- SQLError. Returns database errors when the SQL transformation runs in query mode.
- ScriptError. Returns database errors when the SQL transformation runs in script mode.
When the SQL query contains syntax errors, the error port contains the error text from the database. For example, the following SQL query generates a row error from an Oracle database:
SELECT Product_ID FROM Employees
The Employees table does not contain Product_ID. The Integration Service generates one row. The SQLError port contains the error text in one line:
ORA-00904: Product_ID: invalid identifier Database driver error... Function Name: Execute SQL Stmt: SELECT Product_ID from Employees Oracle Fatal Error
When a query contains multiple statements, and you configure the SQL transformation to continue on SQL error, the SQL transformation might return rows from the database for one query statement, but return database errors for another query statement. The SQL transformation returns any database error in a separate row. For more information about continuing on SQL errors, see Continuing on SQL Error on page 521. When you configure a pass-through port or the NumRowsAffected port, the SQL transformation returns at least one row for each source row. When a query returns no data, the SQL transformation returns the pass-through data and the NumRowsAffected values, but it returns null values in the database output ports. You can remove rows with null values by passing the output rows through a Filter transformation. The following tables describe the output rows the SQL transformation returns based on the type of query statements. Table 23-5 describes the rows the SQL transformation generates for UPDATE, INSERT, or DELETE query statements:
Table 23-5. Output Rows by Query Statement - UPDATE, INSERT, or DELETE Queries
SQLError | NumRowsAffected and/or Pass-Through Port | Rows Output
No | No ports configured. | Zero rows.
No | Either port configured. | One row with the NumRowsAffected and/or the pass-through column data.
Yes | No ports configured. | One row with the error in the SQLError port.
Yes | Either port configured. | One row with the error in the SQLError port, the NumRowsAffected column, and/or the pass-through column data.
Table 23-6 describes the number of output rows the SQL transformation generates for SELECT statements:
Table 23-6. Output Rows by Query Statement - SELECT Statement
SQLError | NumRowsAffected and/or Pass-Through Port | Rows Output
No | No ports configured. | Zero or more rows, based on what rows are returned from each SELECT statement.
No | Either port configured. | One or more rows, based on what rows are returned for each SELECT statement. If NumRowsAffected is enabled, each row contains a NumRowsAffected column with a value zero. If a pass-through port is configured, each row contains the pass-through column data. When the query returns multiple rows, the pass-through column data is duplicated in each row.
Table 23-7 describes the number of output rows the SQL transformation generates for DDL queries such as CREATE, DROP, or TRUNCATE:
Table 23-7. Output Rows by Query Statement - DDL Queries
SQLError | NumRowsAffected and/or Pass-Through Port | Rows Output
No | No ports configured. | Zero rows.
No | Either port configured. | One row that includes the NumRowsAffected column with value zero and/or the pass-through column data.
Yes | No ports configured. | One row that contains the error in the SQLError port.
Yes | Either port configured. | One row with the error in the SQLError port, the NumRowsAffected column with value zero, and/or the pass-through column data.
When you enable Continue on SQL Error within row, the Integration Service continues running the remaining statements in a query after a statement fails. For example, if a query contains a DELETE statement followed by an INSERT statement and the DELETE statement fails, the SQL transformation returns an error message from the database, and the Integration Service continues processing the INSERT statement.
Tip: Disable the Continue on SQL Error option to debug database errors. Otherwise, you might not be able to associate errors with the query statements that caused them.
Transaction Control
An SQL transformation that runs in script mode drops any incoming transaction boundary from an upstream source or transaction generator. The Integration Service issues a commit after executing the script for each input row in the SQL transformation. The transaction contains the set of rows affected by the script. An SQL transformation that runs in query mode commits transactions at different points based on the database connection type:
- Dynamic database connection. The Integration Service issues a commit after executing the SQL for each input row. The transaction is the set of rows affected by the SQL. You cannot use a Transaction Control transformation with dynamic connections in query mode. For more information about how to use a dynamic database connection, see Dynamic Connection Example on page 541.
- Static connection. The Integration Service issues a commit after processing all the input rows. The transaction includes all the database rows to update. You can override the default behavior by using a Transaction Control transformation to control the transaction, or by using commit and rollback statements in the SQL query. When you configure an SQL statement to commit or rollback rows, configure the SQL transformation to generate transactions with the Generate Transaction transformation property. Configure the session for user-defined commit.
For more information about transaction control, see Understanding Commit Points in the Workflow Administration Guide. The following transaction control SQL statements are not valid with the SQL transformation:
- SAVEPOINT. Identifies a rollback point in the transaction.
- SET TRANSACTION. Changes transaction options.
High Availability
When you have high availability, the SQL transformation provides database connection resiliency for static and dynamic connections. When the Integration Service fails to connect to the database, it retries the connection. You can configure the connection retry period for a connection. When the Integration Service cannot connect to the database in the time period that you configure, it generates a row error for a dynamic connection or fails the session for a static connection. The Integration Service is not resilient to temporary network failures or relational database unavailability when the Integration Service retrieves data from a database for the SQL transformation. When a connection fails during processing, the session fails. The Integration Service cannot reconnect and continue processing from the last commit point. You cannot resume a session from the last checkpoint when it contains the SQL transformation. When the Integration Service resumes a session, the recovery session must produce the same data as the original session. The SQL transformation cannot produce repeatable data between session runs. You can recover the workflow when you configure the session to restart.
SQL Transformation Properties
Configure the SQL transformation on the following tabs:
- Ports. Displays the transformation ports and attributes that you create on the SQL Ports tab.
- Properties. SQL transformation general properties. For more information, see Properties Tab on page 523.
- Initialization Properties. Run-time properties that the transformation uses during initialization. For more information about creating initialization properties, see Working with Procedure Properties on page 90.
- Metadata Extensions. Property name and value pairs you can use with a procedure when the Integration Service runs the procedure. For more information about creating metadata extensions, see Metadata Extensions in the Repository Guide.
- Port Attribute Definitions. User-defined port attributes that apply to all ports in the transformation. For more information about port attributes, see Working with Port Attributes on page 80.
- SQL Settings. Attributes unique to the SQL transformation. For more information, see SQL Settings Tab on page 526.
- SQL Ports. SQL transformation ports and attributes. For more information, see SQL Ports Tab on page 527.
Note: You cannot update the columns on the Ports tab. When you define ports on the SQL Ports tab, they display on the Ports tab.
Properties Tab
Configure the SQL transformation general properties on the Properties tab. Some transformation properties do not apply to the SQL transformation or are not configurable.
Among the properties you can configure on the Properties tab are Tracing Level and Output is Deterministic.
Table 23-9 lists the attributes you can configure on the SQL Settings tab:
Table 23-9. SQL Settings Tab Attributes
Option | Description
Continue on SQL Error within row | Continues processing the remaining SQL statements in a query after an SQL error occurs.
Add Statistic Output Port | Adds a NumRowsAffected output port. The port returns the total number of database rows affected by INSERT, DELETE, and UPDATE query statements for an input row.
Max Output Row Count | Defines the maximum number of rows the SQL transformation can output from a SELECT query. To configure unlimited rows, set Max Output Row Count to zero.
Script Locale | Identifies the code page for an SQL script. Choose the code page from the list. Default is the operating system locale.
SQL Statements
Table 23-11 lists the statements you can use with the SQL transformation:
Table 23-11. Standard SQL Statements
Statement Type | Statement | Description
Data Definition | ALTER | Modifies the structure of the database.
Data Definition | COMMENT | Adds comments to the data dictionary.
Data Definition | CREATE | Creates a database, table, or index.
Data Definition | DROP | Deletes an index, table, or database.
Data Definition | RENAME | Renames a database object.
Data Definition | TRUNCATE | Removes all rows from a table.
Data Manipulation | CALL | Calls a PL/SQL or Java subprogram.
Data Manipulation | DELETE | Deletes rows from a table.
Data Manipulation | EXPLAIN PLAN | Writes the access plan for a statement into the database Explain tables.
Data Manipulation | INSERT | Inserts rows into a table.
Data Manipulation | LOCK TABLE | Prevents concurrent application processes from using or changing a table.
Data Manipulation | MERGE | Updates a table with source data.
Data Manipulation | SELECT | Retrieves data from the database.
Data Manipulation | UPDATE | Updates the values of rows of a table.
Data Control Language | GRANT | Grants privileges to a database user.
Data Control Language | REVOKE | Removes access privileges for a database user.
Transaction Control | COMMIT | Saves a unit of work and performs the database changes for that unit of work.
Transaction Control | ROLLBACK | Reverses changes to the database since the last COMMIT.
To create an SQL transformation:
1. Click Transformation > Create. Select the SQL transformation.
2. Enter a name for the transformation.
   The naming convention for an SQL transformation is SQL_TransformationName.
3. Enter a description for the transformation and click Create.
4. Select the execution mode:
   - Query mode. Configure an active transformation that executes dynamic SQL queries.
   - Script mode. Configure a passive transformation that executes external SQL scripts.
5. Configure the database type that the SQL transformation connects to. Choose the database type from the list.
6. Configure the SQL transformation connection options.
7. Click OK to configure the transformation.
   The Designer creates default ports in the transformation depending on the options you choose. You cannot change the configuration except for the database type.
8. Click the Ports tab to add ports to the transformation.
   Add pass-through ports after database ports. For more information about the SQL transformation input and output ports, see SQL Ports Tab on page 527.
Chapter 24
This chapter includes the following topics:
- Overview, 534
- Dynamic Update Example, 535
- Dynamic Connection Example, 541
Overview
The SQL transformation processes SQL queries midstream in a pipeline. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. You can pass the database connection information to the SQL transformation as input data at run time. This chapter provides two examples that illustrate SQL transformation functionality. You use the examples in this chapter to create and execute dynamic SQL queries and to connect dynamically to databases. The chapter provides sample data and descriptions of the transformations that you can include in mappings. The chapter provides the following examples:
- Creating a dynamic SQL query to update a database. The dynamic query update example shows how to update product prices in a table based on a price code received from a source file. For more information, see Dynamic Update Example on page 535.
- Configuring a dynamic database connection. The dynamic connection example shows how to connect to different databases based on the value of a customer location in a source row. For more information, see Dynamic Connection Example on page 541.
The Expression transformation passes column names to the SQL transformation in the UnitPrice_Query and PkgPrice_Query ports.
- PPrices source definition. The PPrices flat file contains a product ID, package price, unit price, and price code. The price code defines whether the package price and unit price are wholesale, retail, or manufactured prices. For more information about the PPrices source definition, see Defining the Source File on page 536.
- Error_File flat file target definition. The target contains the Datastring field that receives database errors from the SQL transformation. For more information about Error_File, see Creating a Target Definition on page 537.
- Exp_Dynamic_Expression transformation. The Expression transformation defines which Prod_Cost column names to update based on the value of the PriceCode column. It returns the column names in the UnitPrice_Query and PkgPrice_Query ports. For more information about Exp_Dynamic_Expression, see Configuring the Expression Transformation on page 538.
- SQL_Dynamic_Query transformation. The SQL transformation has a dynamic SQL query to update a UnitPrice column and a PkgPrice column in the Prod_Cost table. It updates the columns named in the UnitPrice_Query and PkgPrice_Query columns. For more information about the SQL_Dynamic_Query transformation, see Defining the SQL Transformation on page 544.
Note: The mapping does not contain a relational table definition for the Prod_Cost table. The SQL transformation has a static connection to the database that contains the Prod_Cost table. The transformation generates the SQL statements to update the unit prices and package prices in the table. For more information about dynamic queries, see Using Dynamic SQL Queries on page 508.
You can import the PPrices.dat file to create the PPrices source definition in the repository. The PPrices file contains the following columns:
Column | Datatype | Precision | Description
ProductID | String | 10 | A unique number that identifies the product to update.
PriceCode | String | 2 | M, W, or R. Defines whether the prices are Manufactured, Wholesale, or Retail prices.
UnitPrice | Number | 10 | The price for each unit of the product.
PkgPrice | Number | 10 | The price for a package of the product.
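For example, the PPrices file might contain rows like the following; these two rows are taken from the scenarios later in this example:

100,M,100,110
100,W,120,200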
The following SQL statements create the Prod_Cost table and three product rows on an Oracle database:
Create table Prod_Cost (ProductId varchar(10), WUnitPrice number, WPkgPrice number, RUnitPrice number, RPkgPrice number, MUnitPrice number, MPkgPrice number);
insert into Prod_Cost values ('100',0,0,0,0,0,0);
insert into Prod_Cost values ('200',0,0,0,0,0,0);
insert into Prod_Cost values ('300',0,0,0,0,0,0);
commit;
UnitPrice_Query and PkgPrice_Query ports pass column names to the SQL transformation based on expression results.
The SQL transformation has the following columns that contain the results of expressions:
- UnitPrice_Query. Returns the column name MUnitPrice, RUnitPrice, or WUnitPrice, based on whether the price code is M, R, or W.
DECODE(PriceCode,'M', 'MUnitPrice','R', 'RUnitPrice','W', 'WUnitPrice')
- PkgPrice_Query. Returns the column name MPkgPrice, RPkgPrice, or WPkgPrice, based on whether the price code is M, R, or W.
DECODE(PriceCode,'M', 'MPkgPrice','R', 'RPkgPrice','W', 'WPkgPrice')
The SQL transformation is configured with the following options:
- Query Mode. The SQL transformation executes dynamic SQL queries.
- Static Connection. The SQL transformation connects once to the database with the connection object you define in the Workflow Manager.
For more information about defining the SQL transformation, see Creating an SQL Transformation on page 530. Figure 24-3 shows the SQL transformation Ports tab:
Figure 24-3. Dynamic Query SQL Transformation Ports Tab
The SQL transformation has a dynamic SQL query that updates one of the UnitPrice columns and one of the PkgPrice columns in the Prod_Cost table based on the column names it receives in the UnitPrice_Query and the PkgPrice_Query ports. The SQL transformation has the following query:
Update Prod_Cost set ~UnitPrice_Query~= ?UnitPrice?, ~PkgPrice_Query~ = ?PkgPrice? where ProductId = ?ProductId?;
The SQL transformation substitutes the UnitPrice_Query and PkgPrice_Query string variables with the column names to update. The SQL transformation binds the ProductId, UnitPrice and PkgPrice parameters in the query with data that it receives in the corresponding ports. For example, the following source row contains a unit price and a package price for product 100:
100,M,100,110
When the PriceCode is M, the prices are manufacturing prices. The Expression transformation passes MUnitprice and MPkgPrice column names to the SQL transformation to update. The SQL transformation executes the following query:
Update Prod_Cost set MUnitPrice = 100, MPkgPrice = 110 where ProductId = '100';
The following source row contains wholesale prices for product 100:
100,W,120,200
The Expression transformation passes WUnitprice and WPkgPrice column names to the SQL transformation. The SQL transformation executes the following query:
Update Prod_Cost set WUnitPrice = 120, WPkgPrice = 200 where ProductId = '100';
If the database returns any errors to the SQL transformation, the Error_File target contains the error text.
- Customer source definition. A flat file source definition that includes customer information. The customer location determines which database the SQL transformation connects to when it inserts the customer data. For more information about the source definition, see Defining the Source File on page 542.
- Error_File target definition. The target contains a Datastring field that receives database errors from the SQL transformation. For more information about the Error_File target definition, see Creating a Target Definition on page 542.
- Exp_Dynamic_Connection transformation. The Expression transformation defines which database to connect to based on the value of the Location column. The Expression transformation returns the connection object name in the Connection port. The connection object is a database connection defined in the Workflow Manager. For more information about the Exp_Dynamic_Connection transformation, see Configuring the Expression Transformation on page 543.
- SQL_Dynamic_Connection transformation. The SQL transformation receives a connection object name in the LogicalConnectionPort. It connects to the database and inserts the customer data in the database. For more information about the SQL_Dynamic_Connection transformation, see Defining the SQL Transformation on page 544.
You can create a Customer.dat file in Srcfiles that contains the following rows:
1,John Smith,6502345677,[email protected],US
2,Nigel Whitman,5123456754,[email protected],UK
3,Girish Goyal,5674325321,[email protected],CAN
4,Robert Gregson,5423123453,[email protected],US
You can import the Customer.dat file to create the Customer source definition in the repository.
Note: This example includes three databases to illustrate dynamic database connections. If the
tables are in the same database, use a static database connection to improve performance.
For more information about creating database connections, see Managing Connection Objects in the Workflow Administration Guide.
The expression returns the name of a connection object based on the customer location.
The expression returns a connection object name based on whether the location is US, UK, or CAN. When the location does not match a location in the expression, the connection object name defaults to DBORA_US. For example, the following source row contains customer information for a customer from the United States:
1,John Smith,6502345677,[email protected],US
When the customer location is US, the Expression transformation returns the DBORA_US connection object name. The Expression transformation passes DBORA_US to the SQL transformation LogicalConnectionObject port.
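The figure that shows the expression is not reproduced here. A minimal sketch of such an expression, assuming the Workflow Manager defines connection objects named DBORA_US, DBORA_UK, and DBORA_CAN (only DBORA_US is confirmed by the text; the other two names are illustrative), might look like the following:

DECODE(LOCATION,
       'US', 'DBORA_US',
       'UK', 'DBORA_UK',
       'CAN', 'DBORA_CAN',
       'DBORA_US')

The final argument supplies the DBORA_US default when the location does not match any listed value.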
- Query Mode. The SQL transformation executes dynamic SQL queries.
- Dynamic Connection. The SQL transformation connects to databases depending on connection information you pass to the transformation in a mapping.
- Connection Object. The SQL transformation has a LogicalConnectionObject port that receives the connection object name. The connection object must be defined in the Workflow Manager connections.
For more information about defining the SQL transformation, see Creating an SQL Transformation on page 530.
SQL Query
The SQL transformation receives the connection object name in the LogicalConnectionObject port. It connects to the database with the connection object name each time it processes a row. The transformation has the following dynamic SQL query to insert the customer data into a CUST table:
INSERT INTO CUST VALUES (?CustomerId?,?CustomerName?,?PhoneNumber?,?Email?);
The SQL transformation substitutes parameters in the query with customer data from the input ports of the transformation. For example, the following source row contains customer information for customer number 1:
1,John Smith,6502345677,[email protected],US
The SQL transformation connects to the database with the DBORA_US connection object. It executes the following SQL query:
INSERT INTO CUST VALUES (1,'John Smith','6502345677','[email protected]');
Note: Do not configure a database connection for the session. The SQL transformation can connect to the database with the connection object it receives in the LogicalConnectionObject port.
If the database returns any errors to the SQL transformation, the Error_File target contains the error text.
Chapter 25
Stored Procedure Transformation
Overview, 548
Using a Stored Procedure in a Mapping, 552
Writing a Stored Procedure, 553
Creating a Stored Procedure Transformation, 556
Configuring a Connected Transformation, 562
Configuring an Unconnected Transformation, 563
Error Handling, 569
Supported Databases, 571
Expression Rules, 573
Tips, 574
Troubleshooting, 575
Overview
Transformation type: Passive Connected/Unconnected
A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate tasks that are too complicated for standard SQL statements. A stored procedure is a precompiled collection of Transact-SQL, PL/SQL, or other database procedural statements and optional flow control statements, similar to an executable script. Stored procedures are stored and run within the database. You can run a stored procedure with the EXECUTE SQL statement in a database client tool, just as you can run SQL statements. Unlike standard SQL, however, stored procedures allow user-defined variables, conditional statements, and other powerful programming features. Not all databases support stored procedures, and stored procedure syntax varies depending on the database. You might use stored procedures to complete the following tasks:
- Check the status of a target database before loading data into it.
- Determine if enough space exists in a database.
- Perform a specialized calculation.
- Drop and recreate indexes.
Database developers and programmers use stored procedures for various tasks within databases, since stored procedures allow greater flexibility than SQL statements. Stored procedures also provide error handling and logging necessary for critical tasks. Developers create stored procedures in the database using the client tools provided with the database. The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Integration Service. You might use a stored procedure to perform a query or calculation that you would otherwise make part of a mapping. For example, if you already have a well-tested stored procedure for calculating sales tax, you can perform that calculation through the stored procedure instead of recreating the same calculation in an Expression transformation.
Some limitations exist on passing data, depending on the database implementation, which are discussed throughout this chapter. Additionally, not all stored procedures send and receive data. For example, if you write a stored procedure to rebuild a database index at the end of a session, you cannot receive data, since the session has already finished.
Input/Output Parameters
For many stored procedures, you provide a value and receive a value in return. These values are known as input and output parameters. For example, a sales tax calculation stored procedure can take a single input parameter, such as the price of an item. After performing the calculation, the stored procedure returns two output parameters, the amount of tax, and the total cost of the item including the tax. The Stored Procedure transformation sends and receives input and output parameters using ports, variables, or by entering a value in an expression, such as 10 or SALES.
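As an illustration of input and output parameters, the following is a minimal Oracle-style sketch of the sales tax procedure described above. The procedure name, the flat 8.25 percent rate, and all identifiers are assumptions for this example, not part of the guide:

CREATE OR REPLACE PROCEDURE SP_SALES_TAX (
    price IN  NUMBER,    -- input parameter: item price
    tax   OUT NUMBER,    -- output parameter: tax amount
    total OUT NUMBER)    -- output parameter: price plus tax
IS
BEGIN
    tax   := price * 0.0825;  -- assumed flat tax rate for illustration
    total := price + tax;
END;
/

A Stored Procedure transformation built on such a procedure would have one input port for the price and two output ports for the tax and the total.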
Return Values
Most databases provide a return value after running a stored procedure. Depending on the database implementation, this value can either be user-definable, which means that it can act similar to a single output parameter, or it may only return an integer value. The Stored Procedure transformation captures return values in a similar manner as input/ output parameters, depending on the method that the input/output parameters are captured. In some instances, only a parameter or a return value can be captured. If a stored procedure returns a result set rather than a single return value, the Stored Procedure transformation takes only the first value returned from the procedure.
Note: An Oracle stored function is similar to an Oracle stored procedure, except that the
stored function supports output parameters or return values. In this chapter, any statements regarding stored procedures also apply to stored functions, unless otherwise noted.
Status Codes
Status codes provide error handling for the Integration Service during a workflow. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. You cannot see this value. The Integration Service uses it to determine whether to continue running the session or stop. You configure options in the Workflow Manager to continue or stop the session in the event of a stored procedure error.
- Connected. The flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.
- Unconnected. The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
For more information, see Configuring a Connected Transformation on page 562 and Configuring an Unconnected Transformation on page 563.
- Normal. The stored procedure runs where the transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
- Pre-load of the Source. Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
- Post-load of the Source. After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.
- Pre-load of the Target. Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
- Post-load of the Target. After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database.
You can run more than one Stored Procedure transformation in different modes in the same mapping. For example, a pre-load source stored procedure can check table integrity, a normal stored procedure can populate the table, and a post-load stored procedure can rebuild indexes in the database. However, you cannot run the same instance of a Stored Procedure transformation in both connected and unconnected mode in a mapping. You must create different instances of the transformation.

If the mapping calls more than one source or target pre- or post-load stored procedure, the Integration Service executes the stored procedures in the execution order that you specify in the mapping.

The Integration Service executes each stored procedure using the database connection you specify in the transformation properties. The Integration Service opens the database connection when it encounters the first stored procedure. The database connection remains open until the Integration Service finishes processing all stored procedures for that connection. The Integration Service closes the database connections and opens a new one when it encounters a stored procedure using a different database connection.

To run multiple stored procedures that use the same database connection, set these stored procedures to run consecutively. If you do not set them to run consecutively, you might have unexpected results in the target. For example, you have two stored procedures: Stored Procedure A and Stored Procedure B. Stored Procedure A begins a transaction, and Stored Procedure B commits the transaction. If you run Stored Procedure C before Stored Procedure B, using another database connection, Stored Procedure B cannot commit the transaction because the Integration Service closes the database connection when it runs Stored Procedure C.

Use the following guidelines to run multiple stored procedures within a database connection:
- The stored procedures use the same database connect string defined in the stored procedure properties.
- You set the stored procedures to run in consecutive order.
- The stored procedures have the same stored procedure type: source pre-load, source post-load, target pre-load, or target post-load.
Writing a Stored Procedure
The stored procedure receives the employee ID 101 as an input parameter, and returns the name Bill Takash. Depending on how the mapping calls this stored procedure, any or all of the IDs may be passed to the stored procedure. Since the syntax varies between databases, the SQL statements to create this stored procedure may vary. The client tools used to pass the SQL statements to the database also vary. Most databases provide a set of client tools, including a standard SQL editor. Some databases, such as Microsoft SQL Server, provide tools that create some of the initial SQL statements. In all cases, consult the database documentation for more detailed descriptions and examples.
Note: The Integration Service fails sessions that contain stored procedure arguments with large
objects.
Informix
In Informix, the syntax for declaring an output parameter differs from other databases. With most databases, you declare variables using IN or OUT to specify if the variable acts as an input or output parameter. Informix uses the keyword RETURNING, making it difficult to distinguish input/output parameters from return values. For example, you use the RETURN command to return one or more output parameters:
CREATE PROCEDURE GET_NAME_USING_ID (nID integer)
RETURNING varchar(20);

define outVAR as varchar(20);

SELECT FIRST_NAME INTO outVAR FROM CONTACT WHERE ID = nID;

return outVAR;

END PROCEDURE;
Notice that in this case, the RETURN statement passes the value of outVAR. Unlike other databases, however, outVAR is not a return value, but an output parameter. Multiple output parameters would be returned in the following manner:
return outVAR1, outVAR2, outVAR3
Informix does pass a return value, but the value is not user-defined; the database generates it as an error-checking value. In the Stored Procedure transformation, the R column must be selected for the port that captures this value.
Oracle
In Oracle, any stored procedure that returns a value is called a stored function. Rather than using the CREATE PROCEDURE statement to make a new stored procedure based on the example, you use the CREATE FUNCTION statement. In this sample, the variables are declared as IN and OUT, but Oracle also supports an INOUT parameter type, which lets you pass in a parameter, modify it, and return the modified value:
CREATE OR REPLACE FUNCTION GET_NAME_USING_ID (
    nID    IN  NUMBER,
    outVAR OUT VARCHAR2)
RETURN VARCHAR2
IS
    RETURN_VAR varchar2(100);
BEGIN
    SELECT FIRST_NAME INTO outVAR FROM CONTACT WHERE ID = nID;
    RETURN_VAR := 'Success';
    RETURN (RETURN_VAR);
END;
/
Notice that the return value is a string value (Success) with the datatype VARCHAR2. Oracle is the only database to allow return values with string datatypes.
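The guide then refers to a second Oracle example whose listing does not survive here. The following sketch reconstructs it from the surrounding description; the numeric return type and the body are inferred, and only the behavior, returning 0 when the SELECT succeeds, is stated in the text:

CREATE OR REPLACE FUNCTION GET_NAME_USING_ID (
    nID    IN  NUMBER,
    outVAR OUT VARCHAR2)
RETURN NUMBER
IS
BEGIN
    SELECT FIRST_NAME INTO outVAR FROM CONTACT WHERE ID = nID;
    RETURN (0);  -- literal return value; no variable required
END;
/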
Notice that the return value does not need to be a variable. In this case, if the SELECT statement is successful, a 0 is returned as the return value.
IBM DB2
The following text is an example of an SQL stored procedure on IBM DB2:
CREATE PROCEDURE get_name_using_id (
    IN  id_in int,
    OUT emp_out char(18),
    OUT sqlcode_out int)
LANGUAGE SQL
P1: BEGIN
    -- Declare variables
    DECLARE SQLCODE INT DEFAULT 0;
    DECLARE emp_TMP char(18) DEFAULT ' ';
    -- Declare handler
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
        SET sqlcode_out = SQLCODE;
    SELECT employee INTO emp_TMP FROM doc_employee WHERE id = id_in;
    SET emp_out = emp_TMP;
    SET sqlcode_out = SQLCODE;
END P1
Teradata
The following text is an example of an SQL stored procedure on Teradata. It takes an employee ID number as an input parameter and returns the employee name as an output parameter:
CREATE PROCEDURE GET_NAME_USING_ID (
    IN  nID integer,
    OUT outVAR varchar(40))
BEGIN
    SELECT FIRST_NAME INTO :outVAR FROM CONTACT WHERE ID = :nID;
END;
Use one of the following methods to configure the stored procedure ports:
- Use the Import Stored Procedure dialog box to configure the ports used by the stored procedure.
- Configure the transformation manually, creating the appropriate ports for any input or output parameters.
Stored Procedure transformations are created as Normal type by default, which means that they run during the mapping, not before or after the session. New Stored Procedure transformations are not created as reusable transformations. To create a reusable transformation, click Make Reusable in the Transformation properties after creating the transformation.
Note: Configure the properties of reusable transformations in the Transformation Developer,
not the Mapping Designer, to make changes globally for the transformation.
Use one of the following methods to add a Stored Procedure transformation to a mapping:
- Select the stored procedure icon and add a Stored Procedure transformation.
- Click Transformation > Import Stored Procedure.
- Click Transformation > Create, and then select Stored Procedure.
When you import a stored procedure containing a period (.) in the stored procedure name, the Designer substitutes an underscore (_) for the period in the Stored Procedure transformation name.
To import a stored procedure:
1. In the Mapping Designer, click Transformation > Import Stored Procedure.
2. Select the database that contains the stored procedure from the list of ODBC sources. Enter the user name, owner name, and password to connect to the database and click Connect.
Notice the folder in the dialog box displays FUNCTIONS. The stored procedures listed in this folder contain input parameters, output parameters, or a return value. If stored procedures exist in the database that do not contain parameters or return values, they appear in a folder called PROCEDURES. This applies primarily to Oracle stored procedures. For a normal connected Stored Procedure to appear in the functions list, it requires at least one input and one output port.
Tip: You can select Skip to add a Stored Procedure transformation without importing the
stored procedure. In this case, you need to manually add the ports and connect information within the transformation. For more information, see Manually Creating Stored Procedure Transformations on page 558.
3. Select the procedure to import and click OK.
The Stored Procedure transformation appears in the mapping. The Stored Procedure transformation name is the same as the stored procedure you selected. If the stored procedure contains input parameters, output parameters, or a return value, you see the appropriate ports that match each parameter or return value in the Stored Procedure transformation.
In this Stored Procedure transformation, you can see that the stored procedure contains the following value and parameters:
- An integer return value, called RETURN_VALUE, with an output port.
- A string input parameter, called nNAME, with an input port.
- An integer output parameter, called outVar, with an input and output port.
Note: If you change the transformation name, you need to configure the name of the
stored procedure in the transformation properties. If you have multiple instances of the same stored procedure in a mapping, you must also configure the name of the stored procedure.
4. Open the transformation, and click the Properties tab. Select the database where the stored procedure exists from the Connection Information row. If you changed the name of the Stored Procedure transformation to something other than the name of the stored procedure, enter the Stored Procedure Name.
5. Click OK.
6. Click Repository > Save to save changes to the mapping.
To manually create a Stored Procedure transformation:
1. In the Mapping Designer, click Transformation > Create, and then select Stored Procedure. The naming convention for a Stored Procedure transformation is the name of the stored procedure, which happens automatically. If you change the transformation name, then you need to configure the name of the stored procedure in the Transformation Properties. If you have multiple instances of the same stored procedure in a mapping, you must perform this step.
2. Click Skip. The Stored Procedure transformation appears in the Mapping Designer.
3. Open the transformation, and click the Ports tab. You must create ports based on the input parameters, output parameters, and return values in the stored procedure. Create a port in the Stored Procedure transformation for each input parameter, output parameter, and return value in the stored procedure.
For the integer input parameter, you would create an integer input port. The parameter and the port must be the same datatype and precision. Repeat this for the output parameter and the return value. Select the R column for the output port that represents the return value. For stored procedures with multiple parameters, you must list the ports in the same order that they appear in the stored procedure.
4. Click the Properties tab. Enter the name of the stored procedure in the Stored Procedure Name row, and select the database where the stored procedure exists from the Connection Information row.
5. Click OK.
6. Click Repository > Save to save changes to the mapping.
Although the repository validates and saves the mapping, the Designer does not validate the manually entered Stored Procedure transformation. No checks are completed to verify that the proper parameters or return value exist in the stored procedure. If the Stored Procedure transformation is not configured properly, the session fails.
[Table: Stored Procedure transformation properties: Connection Information, Tracing Level, Execution Order, Subsecond Precision, Output is Repeatable, Output is Deterministic]
The Designer does not verify the Stored Procedure transformation each time you open the mapping. After you import or create the transformation, the Designer does not validate the stored procedure. The session fails if the stored procedure does not match the transformation.
Although not required, almost all connected Stored Procedure transformations contain input and output parameters. Required input parameters are specified as the input ports of the Stored Procedure transformation. Output parameters appear as output ports in the transformation. A return value is also an output port, and has the R value selected in the transformation Ports configuration. Output parameters and return values from the stored procedure are used as any other output port in a transformation. You can link these ports to another transformation or target.
To configure a connected Stored Procedure transformation:
1. Create the Stored Procedure transformation in the mapping. For more information, see Creating a Stored Procedure Transformation on page 556.
2. Drag ports from upstream transformations to connect to any available input ports.
3. Drag the output ports of the Stored Procedure to other transformations or targets.
4. Open the Stored Procedure transformation, and select the Properties tab. Select the appropriate database in the Connection Information if you did not select it when creating the transformation. Select the Tracing level for the transformation. If you are testing the mapping, select the Verbose Initialization option to provide the most information in the event that the transformation fails. Click OK.
You can run an unconnected Stored Procedure transformation in the following ways:
- From an expression. Called from an expression written in the Expression Editor within another transformation in the mapping.
- Pre- or post-session. Runs before or after a session.
The sections below explain how you can run an unconnected Stored Procedure transformation.
However, just like a connected mapping, you can apply the stored procedure to the flow of data through the mapping. In fact, you have greater flexibility since you use an expression to call the stored procedure, which means you can select the data that you pass to the stored procedure as an input parameter. When using an unconnected Stored Procedure transformation in an expression, you need a method of returning the value of output parameters to a port. Use one of the following methods to capture the output values:
- Assign the output value to a local variable.
- Assign the output value to the system variable PROC_RESULT.
By using PROC_RESULT, you assign the value of the return parameter directly to an output port, which can apply directly to a target. You can also combine the two options by assigning one output parameter as PROC_RESULT, and the other parameter as a variable. Use PROC_RESULT only within an expression. If you do not use PROC_RESULT or a variable, the port containing the expression captures a NULL. You cannot use PROC_RESULT in a connected Lookup transformation or within the Call Text for a Stored Procedure transformation. If you require nested stored procedures, where the output parameter of one stored procedure passes to another stored procedure, use PROC_RESULT to pass the value.

The Integration Service calls the unconnected Stored Procedure transformation from the Expression transformation. Notice that the Stored Procedure transformation has two input ports and one output port. All three ports are string datatypes.
To call a stored procedure from within an expression:
1. Create the Stored Procedure transformation in the mapping. For more information, see Creating a Stored Procedure Transformation on page 556.
2. In any transformation that supports output and variable ports, create a new output port in the transformation that calls the stored procedure. Name the output port.
The output port that calls the stored procedure must support expressions. Depending on how the expression is configured, the output port contains the value of the output parameter or the return value.
3. Open the Expression Editor for the port. The value for the new port is set up in the Expression Editor as a call to the stored procedure using the :SP keyword in the Transformation Language. The easiest way to set this up properly is to select the Stored Procedures node in the Expression Editor, and click the name of the Stored Procedure transformation listed.
The stored procedure appears in the Expression Editor with a pair of empty parentheses. The necessary input and/or output parameters are displayed in the lower left corner of the Expression Editor.
4. Configure the expression to send input parameters and capture output parameters or return values. You must know whether the parameters shown in the Expression Editor are input or output parameters. You insert variables or port names between the parentheses in the order that they appear in the stored procedure. The datatypes of the ports and variables must match those of the parameters passed to the stored procedure. For example, when you click the stored procedure, something similar to the following appears:
:SP.GET_NAME_FROM_ID()
This particular stored procedure requires an integer value as an input parameter and returns a string value as an output parameter. How the output parameter or return value is captured depends on the number of output parameters and whether the return value needs to be captured. If the stored procedure returns a single output parameter or a return value (but not both), you should use the reserved variable PROC_RESULT as the output variable. In the previous example, the expression would appear as:
:SP.GET_NAME_FROM_ID(inID, PROC_RESULT)
inID can be either an input port for the transformation or a variable in the transformation. The value of PROC_RESULT is applied to the output port for the expression.
If the stored procedure returns multiple output parameters, you must create variables for each output parameter. For example, if you create a port called varOUTPUT2 for the stored procedure expression, and a variable called varOUTPUT1, the expression appears as:
:SP.GET_NAME_FROM_ID(inID, varOUTPUT1, PROC_RESULT)
The value of the second output port is applied to the output port for the expression, and the value of the first output port is applied to varOUTPUT1. The output parameters are returned in the order they are declared in the stored procedure. With all these expressions, the datatypes for the ports and variables must match the datatypes for the input/output variables and return value.
5. Click Validate to verify the expression, and then click OK to close the Expression Editor. Validating the expression ensures that the datatypes for parameters in the stored procedure match those entered in the expression.
6. Click OK.
When you save the mapping, the Designer does not validate the stored procedure expression. If the stored procedure expression is not configured properly, the session fails. When testing a mapping using a stored procedure, set the Override Tracing session option to a verbose mode and configure the On Stored Procedure session option to stop running if the stored procedure fails. Configure these session options in the Error Handling settings of the Config Object tab in the session properties.
The stored procedure in the expression entered for a port does not have to affect all values that pass through the port. Using the IIF statement, for example, you can pass only certain values, such as ID numbers that begin with 5, to the stored procedure and skip all other values. You can also set up nested stored procedures so the return value of one stored procedure becomes an input parameter for a second stored procedure. For more information about configuring the stored procedure expression, see Expression Rules on page 573.
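A sketch of such a conditional call follows, reusing the GET_NAME_FROM_ID transformation from the earlier example; the CUST_ID port, the leading-5 test, and the 'n/a' default are illustrative assumptions:

IIF(SUBSTR(TO_CHAR(CUST_ID), 1, 1) = '5',
    :SP.GET_NAME_FROM_ID(CUST_ID, PROC_RESULT),
    'n/a')

Rows whose ID does not begin with 5 receive the literal default, and the Integration Service never calls the stored procedure for them.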
To configure a pre- or post-session stored procedure:
1. Create the Stored Procedure transformation in the mapping. For more information, see Creating a Stored Procedure Transformation on page 556.
2. Double-click the Stored Procedure transformation, and select the Properties tab.
3. Enter the name of the stored procedure. If you imported the stored procedure, this should be set correctly. If you manually set up the stored procedure, enter the name of the stored procedure.
4. Select the database that contains the stored procedure in Connection Information.
5. Enter the call text of the stored procedure. This is the name of the stored procedure, followed by all applicable input parameters in parentheses. If there are no input parameters, you must include an empty pair of parentheses, or the call to the stored procedure fails. You do not need to include the SQL statement EXEC, nor do you need to use the :SP keyword. For example, to call a stored procedure called check_disk_space, enter the following text:
check_disk_space()
To pass a string input parameter, enter it without quotes. If the string has spaces in it, enclose the parameter in double quotes. For example, if the stored procedure check_disk_space required a machine name as an input parameter, enter the following text:
check_disk_space(oracle_db)
You must enter values for the input parameters, since pre- and post-session procedures cannot pass variables. When passing a datetime value through a pre- or post-session stored procedure, the value must be in the Informatica default date format and enclosed in double quotes as follows:
SP("12/31/2000 11:45:59")
You can use PowerCenter parameters and variables in the call text. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or
variable within the call text, or you can use a parameter or variable as the call text. For example, you can use a session parameter, $ParamMyCallText, as the call text, and set $ParamMyCallText to the call text in a parameter file. For more information, see Parameter Files in the Workflow Administration Guide.
6. Select the stored procedure type. The options for stored procedure type include:
- Source Pre-load. Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
- Source Post-load. After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.
- Target Pre-load. Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
- Target Post-load. After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database.
7. Select Execution Order, and click the Up or Down arrow to change the order, if necessary. If you have added several stored procedures that execute at the same point in a session (such as two procedures that both run at Source Post-load), you can set a stored procedure execution plan to determine the order in which the Integration Service calls these stored procedures. You need to repeat this step for each stored procedure you wish to change.
8. Click OK.
Although the repository validates and saves the mapping, the Designer does not validate whether the stored procedure expression runs without an error. If the stored procedure expression is not configured properly, the session fails. When testing a mapping using a stored procedure, set the Override Tracing session option to a verbose mode and configure the On Stored Procedure session option to stop running if the stored procedure fails. Configure these session options on the Error Handling settings of the Config Object tab in the session properties. You lose output parameters or return values called during pre- or post-session stored procedures, since there is no place to capture the values. If you need to capture values, you might want to configure the stored procedure to save the value in a table in the database.
Error Handling
Sometimes a stored procedure returns a database error, such as divide by zero or no more rows. The final result of a database error during a stored procedure depends on when the stored procedure takes place and how the session is configured. You can configure the session to either stop or continue running the session upon encountering a pre- or post-session stored procedure error. By default, the Integration Service stops a session when a pre- or post-session stored procedure database error occurs. Figure 25-3 shows the properties you can configure for stored procedures and error handling:
Figure 25-3. Stored Procedure Error Handling
Pre-Session Errors
Pre-read and pre-load stored procedures are considered pre-session stored procedures. Both run before the Integration Service begins reading source data. If a database error occurs during a pre-session stored procedure, the Integration Service performs a different action depending on the session configuration.
- If you configure the session to stop upon stored procedure error, the Integration Service fails the session.
- If you configure the session to continue upon stored procedure error, the Integration Service continues with the session.
Post-Session Errors
Post-read and post-load stored procedures are considered post-session stored procedures. Both run after the Integration Service commits all data to the database. If a database error occurs during a post-session stored procedure, the Integration Service performs a different action depending on the session configuration.
- If you configure the session to stop upon stored procedure error, the Integration Service fails the session. However, the Integration Service has already committed all data to session targets.
- If you configure the session to continue upon stored procedure error, the Integration Service continues with the session.
Session Errors
Connected or unconnected stored procedure errors occurring during the session are not affected by the session error handling option. If the database returns an error for a particular row, the Integration Service skips the row and continues to the next row. As with other row transformation errors, the skipped row appears in the session log.
Supported Databases
The supported options for Oracle and other databases, such as Informix, Microsoft SQL Server, and Sybase, are described below. For more information about database differences, see Writing a Stored Procedure on page 553. For more information about supported features, see the database documentation.
SQL Declaration
In the database, the statement that creates a stored procedure appears similar to the following Oracle stored procedure:
create or replace procedure sp_combine_str (
    str1_inout IN OUT varchar2,
    str2_inout IN OUT varchar2,
    str_out OUT varchar2)
is
begin
    str1_inout := UPPER(str1_inout);
    str2_inout := UPPER(str2_inout);
    str_out := str1_inout || ' ' || str2_inout;
end;
In this case, the Oracle statement begins with CREATE OR REPLACE PROCEDURE. Since Oracle supports both stored procedures and stored functions, only Oracle uses the optional CREATE FUNCTION statement.
Parameter Types
There are three possible parameter types in stored procedures:
- IN. Defines the parameter as something that must be passed to the stored procedure.
- OUT. Defines the parameter as a returned value from the stored procedure.
- INOUT. Defines the parameter as both input and output. Only Oracle supports this parameter type.
Expression Rules
Unconnected Stored Procedure transformations can be called from an expression in another transformation. Use the following rules and guidelines when configuring the expression:
- A single output parameter is returned using the variable PROC_RESULT.
- When you use a stored procedure in an expression, use the :SP reference qualifier. To avoid typing errors, select the Stored Procedure node in the Expression Editor, and double-click the name of the stored procedure.
- The same instance of a Stored Procedure transformation cannot run in both connected and unconnected mode in a mapping. You must create different instances of the transformation.
- The input/output parameters in the expression must match the input/output ports in the Stored Procedure transformation. If the stored procedure has an input parameter, there must also be an input port in the Stored Procedure transformation.
- When you write an expression that includes a stored procedure, list the parameters in the same order that they appear in the stored procedure and the Stored Procedure transformation.
- The parameters in the expression must include all of the parameters in the Stored Procedure transformation. You cannot leave out an input parameter. If necessary, pass a dummy variable to the stored procedure.
- The arguments in the expression must be the same datatype and precision as those in the Stored Procedure transformation.
- Use PROC_RESULT to apply the output parameter of a stored procedure expression directly to a target. You cannot use a variable for the output parameter to pass the results directly to a target. Use a local variable to pass the results to an output port within the same transformation.
- Nested stored procedures allow passing the return value of one stored procedure as the input parameter of another stored procedure. For example, you might have the following two stored procedures:
  - get_employee_id (employee_name)
  - get_employee_salary (employee_id)
  When the return value for get_employee_id is an employee ID number, the syntax for a nested stored procedure is:
:sp.get_employee_salary (:sp.get_employee_id (employee_name))
- Do not use single quotes around string parameters. If the input parameter does not contain spaces, do not use any quotes. If the input parameter contains spaces, use double quotes.
Tips
Do not run unnecessary instances of stored procedures. Each time a stored procedure runs during a mapping, the session must wait for the stored procedure to complete in the database. You have two possible options to avoid this:
- Reduce the row count. Use an active transformation prior to the Stored Procedure transformation to reduce the number of rows that must be passed to the stored procedure. Or, create an expression that tests each value before passing it to the stored procedure, and skip values that do not need to be passed.
- Create an expression. Most of the logic used in stored procedures can be easily replicated using expressions in the Designer.
Troubleshooting
I get the error "stored procedure not found" in the session log file.
Make sure the stored procedure is being run in the correct database. By default, the Stored Procedure transformation uses the target database to run the stored procedure. Double-click the transformation in the mapping, select the Properties tab, and check which database is selected in Connection Information.

My output parameter was not returned using a Microsoft SQL Server stored procedure.
Check if the parameter to hold the return value is declared as OUTPUT in the stored procedure. With Microsoft SQL Server, OUTPUT implies input/output. In the mapping, you probably have checked both the I and O boxes for the port. Clear the input port.

The session did not have errors before, but now it fails on the stored procedure.
The most common reason for problems with a Stored Procedure transformation results from changes made to the stored procedure in the database. If the input/output parameters or return value changes in a stored procedure, the Stored Procedure transformation becomes invalid. You must either import the stored procedure again, or manually configure the stored procedure to add, remove, or modify the appropriate ports.

The session has been invalidated since I last edited the mapping. Why?
Any changes you make to the Stored Procedure transformation may invalidate the session. The most common reason is that you have changed the type of stored procedure, such as from a Normal to a Post-load Source type.
Chapter 26
Transaction Control Transformation
Overview, 578
Transaction Control Transformation Properties, 579
Using Transaction Control Transformations in Mappings, 582
Mapping Guidelines and Validation, 586
Creating a Transaction Control Transformation, 587
Overview
Transformation type: Active Connected
PowerCenter lets you control commit and roll back transactions based on a set of rows that pass through a Transaction Control transformation. A transaction is the set of rows bound by commit or roll back rows. You can define a transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date. In PowerCenter, you define transaction control at two levels:
- Within a mapping. Within a mapping, you use the Transaction Control transformation to define a transaction. You define transactions using an expression in a Transaction Control transformation. Based on the return value of the expression, you can choose to commit, roll back, or continue without any transaction changes.
- Within a session. When you configure a session, you configure it for user-defined commit. You can choose to commit or roll back a transaction if the Integration Service fails to transform or write any row to the target.
When you run the session, the Integration Service evaluates the expression for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target or targets. When the Integration Service evaluates a roll back row, it rolls back all rows in the transaction from the target or targets.

If the mapping has a flat file target, you can generate an output file each time the Integration Service starts a new transaction. You can dynamically name each target flat file. For more information about creating target files by transaction, see the PowerCenter Designer Guide.
Note: You can also use the transformation scope in other transformation properties to define
transactions. For more information, see Understanding Commit Points in the Workflow Administration Guide.
Transaction Control Transformation Properties
You can configure a Transaction Control transformation on the following tabs:
- Transformation tab. You can rename the transformation and add a description on the Transformation tab.
- Ports tab. You can add input/output ports to a Transaction Control transformation.
- Properties tab. You can define the transaction control expression, which flags transactions for commit, roll back, or no action.
- Metadata Extensions tab. You can extend the metadata stored in the repository by associating information with the Transaction Control transformation. For more information, see Metadata Extensions in the Repository Guide.
Properties Tab
On the Properties tab, you can configure the transaction control expression and the tracing level.
Enter the transaction control expression in the Transaction Control Condition field. The transaction control expression uses the IIF function to test each row against the condition. Use the following syntax for the expression:
IIF (condition, value1, value2)
The expression contains values that represent actions the Integration Service performs based on the return value of the condition. The Integration Service evaluates the condition on a row-by-row basis. The return value determines whether the Integration Service commits, rolls back, or makes no transaction changes to the row. When the Integration Service issues a commit or roll back based on the return value of the expression, it begins a new transaction. Use the following built-in variables in the Expression Editor when you create a transaction control expression:
- TC_CONTINUE_TRANSACTION. The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
- TC_COMMIT_BEFORE. The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
- TC_COMMIT_AFTER. The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
- TC_ROLLBACK_BEFORE. The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
- TC_ROLLBACK_AFTER. The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.
If the transaction control expression evaluates to a value other than commit, roll back, or continue, the Integration Service fails the session.
Example
You want to use transaction control to write order information based on the order entry date. You want to ensure that all orders entered on any given date are committed to the target in the same transaction. To accomplish this, you can create a mapping with the following transformations:
- Sorter transformation. Sort the source data by order entry date.
- Expression transformation. Use local variables to determine whether the date entered is a new date.
[Table: Expression transformation ports: PREV_DATE, DATE_ENTERED, DATE_OUT, NEW_DATE]
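The port expressions from the original table are not reproduced above. The sketch below shows one way to implement the logic with variable ports; the v_-prefixed names and the exact port layout are assumptions, and only the port names in the placeholder come from the guide. A variable port referenced before it is assigned still holds its value from the previous row:

DATE_ENTERED  (input)
v_NEW_DATE    (variable) = IIF(DATE_ENTERED = v_PREV_DATE, 0, 1)
v_PREV_DATE   (variable) = DATE_ENTERED
NEW_DATE      (output)   = v_NEW_DATE
DATE_OUT      (output)   = DATE_ENTERED

Because v_NEW_DATE is evaluated before v_PREV_DATE is reassigned, it compares the current row's date with the previous row's date, and NEW_DATE carries the 0 or 1 flag to the Transaction Control transformation.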
Note: The Integration Service evaluates ports by dependency. The order in which ports display in a transformation must match the order of evaluation: input ports, variable ports, output ports.
- Transaction Control transformation. Create the following transaction control expression to commit data when the Integration Service encounters a new order entry date:
IIF(NEW_DATE = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
The following transformation configurations drop or redefine incoming transaction boundaries, which can make an upstream Transaction Control transformation ineffective for downstream targets:
- Aggregator transformation with the All Input level transformation scope
- Joiner transformation with the All Input level transformation scope
- Rank transformation with the All Input level transformation scope
- Sorter transformation with the All Input level transformation scope
- Custom transformation with the All Input level transformation scope
- Custom transformation configured to generate transactions
- Transaction Control transformation
- A multiple input group transformation, such as a Custom transformation, connected to multiple upstream transaction control points
For more information about working with transaction control, see Understanding Commit Points in the Workflow Administration Guide. Mappings with Transaction Control transformations that are ineffective for targets may be valid or invalid. When you save or validate the mapping, the Designer displays a message indicating which Transaction Control transformations are ineffective for targets.
Figure 26-3 shows a valid mapping with both effective and ineffective Transaction Control transformations:
Figure 26-3. Effective and Ineffective Transaction Control Transformations
Effective Transaction Control Transformation
Transformation Scope property is All Input. Aggregator drops transaction boundaries defined by TransactionControl1.
Although a Transaction Control transformation may be ineffective for a target, it can be effective for downstream transformations. Downstream transformations with the Transaction level transformation scope can use the transaction boundaries defined by an upstream Transaction Control transformation. Figure 26-4 shows a valid mapping with a Transaction Control transformation that is effective for a Sorter transformation, but ineffective for the target:
Figure 26-4. Transaction Control Transformation Effective for a Transformation
Effective Transaction Control Transformation for Target
Transformation Scope property is All Input. Aggregator drops transaction boundaries defined by TCT1. Transformation Scope property is Transaction. Sorter uses the transaction boundaries defined by TCT1.
Figure 26-5 shows a valid mapping with both an ineffective and an effective Transaction Control transformation:
Figure 26-5. Valid Mapping with Transaction Control Transformations
Active Source for Target1 Effective for Target1, Ineffective for Target2
The Integration Service processes TransactionControl1, evaluates the transaction control expression, and creates transaction boundaries. The mapping does not include any transformation that drops transaction boundaries between TransactionControl1 and Target1, making TransactionControl1 effective for Target1. The Integration Service uses the transaction boundaries defined by TransactionControl1 for Target1. However, the mapping includes a transformation that drops transaction boundaries between TransactionControl1 and Target2, making TransactionControl1 ineffective for Target2. When the Integration Service processes Aggregator2, it drops the transaction boundaries defined by TransactionControl1 and outputs all rows in an open transaction. Then the Integration Service evaluates TransactionControl2, creates transaction boundaries, and uses them for Target2. If a roll back occurs in TransactionControl1, the Integration Service rolls back only rows from Target1. It does not roll back any rows from Target2.
Figure 26-6 shows an invalid mapping with both an ineffective and an effective Transaction Control transformation:
Figure 26-6. Invalid Mapping with Transaction Control Transformations
Mapplet contains Transaction Control transformation. Ineffective for Target1 and Target2 Transformation Scope property is All Input. Active Source for Target1
The mapping is invalid because Target1 is not connected to an effective Transaction Control transformation.
Consider the following guidelines when you use Transaction Control transformations in a mapping:
- If the mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point.
- Transaction Control transformations connected to any target other than relational, XML, or dynamic MQSeries targets are ineffective for those targets.
- You must connect each target instance to a Transaction Control transformation.
- You can connect multiple targets to a single Transaction Control transformation.
- You can connect only one effective Transaction Control transformation to a target.
- You cannot place a Transaction Control transformation in a pipeline branch that starts with a Sequence Generator transformation.
- If you use a dynamic Lookup transformation and a Transaction Control transformation in the same mapping, a rolled-back transaction might result in unsynchronized target data.
- A Transaction Control transformation may be effective for one target and ineffective for another target. If each target is connected to an effective Transaction Control transformation, the mapping is valid. See Figure 26-5 on page 584 for an example of a valid mapping with an ineffective Transaction Control transformation.
- Either all targets or none of the targets in the mapping should be connected to an effective Transaction Control transformation. See Figure 26-6 on page 585 for an example of an invalid mapping where one target has an effective Transaction Control transformation and one target has an ineffective Transaction Control transformation.
To create a Transaction Control transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Transaction Control transformation.
2. Enter a name for the transformation. The naming convention for Transaction Control transformations is TC_TransformationName.
3. Enter a description for the transformation. This description appears when you view transformation details in the Repository Manager, making it easier to understand what the transformation does.
4. Click Create. The Designer creates the Transaction Control transformation.
5. Click Done.
6. Drag the ports into the transformation. The Designer creates the input/output ports for each port you include.
7. Open the Edit Transformations dialog box, and select the Ports tab. You can add ports, edit port names, add port descriptions, and enter default values.
8. Select the Properties tab. Enter the transaction control expression that defines the commit and roll back behavior.
9. Select the Metadata Extensions tab. Create or edit metadata extensions for the Transaction Control transformation. For more information about metadata extensions, see Metadata Extensions in the Repository Guide.
10. Click OK.
11. Click Repository > Save to save changes to the mapping.
Chapter 27
Union Transformation
Overview, 590
Working with Groups and Ports, 592
Creating a Union Transformation, 594
Using a Union Transformation in Mappings, 595
Overview
Transformation type: Active Connected
The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources, similar to the UNION ALL SQL statement that combines the results of two or more SQL statements. Like the UNION ALL statement, the Union transformation does not remove duplicate rows.

The Integration Service processes all input groups in parallel. The Integration Service concurrently reads sources connected to the Union transformation, and pushes blocks of data into the input groups of the transformation. The Union transformation processes the blocks of data based on the order it receives the blocks from the Integration Service.

You can connect heterogeneous sources to a Union transformation. The Union transformation merges sources with matching ports and outputs the data from one output group with the same ports as the input groups. The Union transformation is developed using the Custom transformation.
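For reference, this is how UNION ALL behaves in SQL: it stacks the row sets of two or more SELECT statements without removing duplicates. The table and column names below are hypothetical:

SELECT customer_id, customer_name FROM customers_east
UNION ALL
SELECT customer_id, customer_name FROM customers_west;

The Union transformation produces the equivalent result inside a mapping, with each input group playing the role of one SELECT.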
Consider the following rules and guidelines when you work with a Union transformation:
- You can create multiple input groups, but only one output group.
- All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
- The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation such as a Router or Filter transformation.
- You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
- The Union transformation does not generate transactions.
- Transformation tab. You can rename the transformation and add a description.
- Properties tab. You can specify the tracing level.
- Groups tab. You can create and delete input groups. The Designer displays groups you create on the Ports tab.
- Group Ports tab. You can create and delete ports for the input groups. The Designer displays ports you create on the Ports tab.
You cannot modify the Ports, Initialization Properties, Metadata Extensions, or Port Attribute Definitions tabs in a Union transformation.
Working with Groups and Ports
You can create ports by copying ports from a transformation, or you can create ports manually. When you create ports on the Group Ports tab, the Designer creates input ports in each input group and output ports in the output group. The Designer uses the port names you specify on the Group Ports tab for each input and output port, and it appends a number to make each port name in the transformation unique. It also uses the same metadata for each port, such as datatype, precision, and scale.
The Ports tab displays the groups and ports you create. You cannot edit group and port information on the Ports tab. Use the Groups and Group Ports tab to edit groups and ports. Figure 27-3 shows the Union transformation Ports tab with the groups and ports defined in Figure 27-1 and Figure 27-2:
Figure 27-3. Union Transformation Ports Tab
To create a Union transformation:
1. In the Mapping Designer, click Transformation > Create.
2. Select Union Transformation and enter the name of the transformation. The naming convention for Union transformations is UN_TransformationName.
3. Enter a description for the transformation.
4. Click Create, and then click Done.
5. Click the Groups tab. Add an input group for each pipeline or pipeline branch you want to merge. The Designer assigns a default name for each group, but you can rename the groups.
6. Click the Group Ports tab.
7. Add a new port for each row of data you want to merge.
8. Enter port properties, such as name and datatype.
9. Click the Properties tab to configure the tracing level.
10. Click OK.
11. Click Repository > Save to save changes.
When a Union transformation in a mapping receives data from a single transaction generator, the Integration Service propagates transaction boundaries. When the transformation receives data from multiple transaction generators, the Integration Service drops all incoming transaction boundaries and outputs rows in an open transaction. For more information about working with transactions, see Understanding Commit Points in the Workflow Administration Guide.
Chapter 28
Update Strategy Transformation
This chapter includes the following topics:
Overview, 598
Flagging Rows Within a Mapping, 599
Setting the Update Strategy for a Session, 602
Update Strategy Checklist, 605
Overview
Transformation type: Active, Connected
When you design a data warehouse, you need to decide what type of information to store in targets. As part of the target table design, you need to determine whether to maintain all the historic data or just the most recent changes.

For example, you might have a target table, T_CUSTOMERS, that contains customer data. When a customer address changes, you may want to save the original address in the table instead of updating that portion of the customer row. In this case, you would create a new row containing the updated address and preserve the original row with the old customer address. This is one way to store historical information in a target table. However, if you want the T_CUSTOMERS table to be a snapshot of current customer data, you would update the existing customer row and lose the original address.

The model you choose determines how you handle changes to existing rows (a SQL sketch of both models follows the list below). In PowerCenter, you set the update strategy at two different levels:
- Within a session. When you configure a session, you can instruct the Integration Service to either treat all rows in the same way (for example, treat all rows as inserts) or use instructions coded into the session mapping to flag rows for different database operations.
- Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.
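To make the two models concrete, the following SQL sketches how each one might handle a changed address in T_CUSTOMERS. The CUSTOMER_ID, ADDRESS, and ENTRY_DATE columns are hypothetical:

    -- Historical model: preserve the original row and add a new one
    INSERT INTO T_CUSTOMERS (CUSTOMER_ID, ADDRESS, ENTRY_DATE)
    VALUES (1001, '12 New Street', CURRENT_DATE);

    -- Snapshot model: overwrite the address in the existing row
    UPDATE T_CUSTOMERS
    SET ADDRESS = '12 New Street'
    WHERE CUSTOMER_ID = 1001;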
Note: You can also use the Custom transformation to flag rows for insert, delete, update, or reject. For more information about using the Custom transformation to set the update strategy, see Setting the Update Strategy on page 84.
Flagging Rows Within a Mapping

Add an Update Strategy transformation to a mapping to flag individual rows for insert, update, delete, or reject. In the update strategy expression, flag each row with one of the constants DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2), or DD_REJECT (3). The Integration Service treats any other value as an insert. For information about these constants and their use, see the Transformation Language Reference.
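For example, the following update strategy expression uses IIF to flag a row for reject when a hypothetical ENTRY_DATE port holds a later date than the APPLY_DATE port, and otherwise flags the row for update; both port names are illustrative:

    IIF( ENTRY_DATE > APPLY_DATE, DD_REJECT, DD_UPDATE )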
For more information about the IIF and DECODE functions, see Functions in the Transformation Language Reference.
To create an Update Strategy transformation, complete the following steps:
1. In the Mapping Designer, add an Update Strategy transformation to a mapping.
2. Click Layout > Link Columns.
3. Drag all ports from another transformation representing data you want to pass through the Update Strategy transformation. In the Update Strategy transformation, the Designer creates a copy of each port you drag and connects the new port to the original port. Each port in the Update Strategy transformation is a combination input/output port. Normally, you select all of the columns destined for a particular target. After they pass through the Update Strategy transformation, this information is flagged for update, insert, delete, or reject.
4. Open the Update Strategy transformation and rename it. The naming convention for Update Strategy transformations is UPD_TransformationName.
5. Click the Properties tab.
6. Click the button in the Update Strategy Expression field. The Expression Editor appears.
7. Enter an update strategy expression to flag rows as inserts, deletes, updates, or rejects (see the sketch after these steps).
8. Validate the expression and click OK.
9. Click OK to save the changes.
10. Connect the ports in the Update Strategy transformation to another transformation or a target instance.
11. Click Repository > Save.
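As an illustration of step 7, a DECODE expression is convenient when the source carries an operation code. The following sketch assumes a hypothetical CHANGE_FLAG port that holds 'I', 'U', or 'D':

    DECODE( CHANGE_FLAG,
            'I', DD_INSERT,
            'U', DD_UPDATE,
            'D', DD_DELETE,
            DD_REJECT )

Rows with any other CHANGE_FLAG value are flagged for reject.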
Aggregator and Update Strategy Transformations

When you connect Aggregator and Update Strategy transformations in the same pipeline, their relative order changes the result:
- Position the Aggregator before the Update Strategy transformation. In this case, you perform the aggregate calculation, and then use the Update Strategy transformation to flag rows that contain the results of this calculation for insert, delete, or update.
- Position the Aggregator after the Update Strategy transformation. Here, you flag rows for insert, delete, update, or reject before you perform the aggregate calculation. How you flag a particular row determines how the Aggregator transformation treats any values in that row used in the calculation. For example, if you flag a row for delete and then later use the row to calculate the sum, the Integration Service subtracts the value appearing in that row. If the row had been flagged for insert, the Integration Service would add its value to the sum.
Lookup and Update Strategy Transformations

When a mapping contains a Lookup transformation that uses a dynamic lookup cache, you must define certain session properties: select Data Driven for the Treat Source Rows As property, and select Insert and Update as Update for the target table options. These update strategy target table options ensure that the Integration Service updates rows marked for update and inserts rows marked for insert.

If you do not choose Data Driven, the Integration Service flags all rows for the database operation you specify in the Treat Source Rows As option and does not use the Update Strategy transformations in the mapping to flag the rows. As a result, the Integration Service does not insert and update the correct rows. If you do not choose Update as Update, the Integration Service does not correctly update the rows flagged for update in the target table, and the lookup cache and target table might become unsynchronized.

For more information, see Setting the Update Strategy for a Session on page 602. For more information about using Update Strategy transformations with the Lookup transformation, see Using Update Strategy Transformations with a Dynamic Cache on page 375. For more information about configuring target session properties, see Working with Targets in the Workflow Administration Guide.
Setting the Update Strategy for a Session

When you configure a session, you select how the Integration Service treats all source rows with the Treat Source Rows As setting. Table 28-2 displays the options for the Treat Source Rows As setting:
Table 28-2. Specifying an Operation for All Rows
Insert. Treat all rows as inserts. If inserting the row violates a primary or foreign key constraint in the database, the Integration Service rejects the row.
Delete. Treat all rows as deletes. For each row, if the Integration Service finds a corresponding row in the target table (based on the primary key value), the Integration Service deletes it. Note that the primary key constraint must exist in the target definition in the repository.
Update. Treat all rows as updates. For each row, the Integration Service looks up the matching primary key value in the target table and updates the row. As with Delete, the primary key constraint must exist in the target definition in the repository.
Data Driven. The Integration Service follows the instructions coded into Update Strategy and Custom transformations within the session mapping to determine how to flag rows for insert, delete, update, or reject. If the mapping contains an Update Strategy transformation, this is the default setting.
Specifying Operations for Individual Target Tables

Figure 28-1 displays the update strategy options in the Transformations view on the Mapping tab of the session properties:
Figure 28-1. Specifying Operations for Individual Target Tables
You can set the following update strategy options for each target table:
- Insert. Select this option to insert a row into a target table.
- Delete. Select this option to delete a row from a table.
- Update. You have the following options in this situation:
  - Update as Update. Update each row flagged for update if it exists in the target table.
  - Update as Insert. Insert each row flagged for update.
  - Update else Insert. Update the row if it exists. Otherwise, insert it.
- Truncate table. Select this option to truncate the target table before loading data.
Chapter 29
XML Transformations
This chapter includes the following topics:
XML Source Qualifier Transformation, 608
XML Parser Transformation, 609
XML Generator Transformation, 610
XML Source Qualifier Transformation

You can add an XML Source Qualifier transformation to a mapping by dragging an XML source definition to the Mapping Designer workspace or by creating one manually. When you add an XML source definition to a mapping, you need to connect it to an XML Source Qualifier transformation. The XML Source Qualifier transformation defines the data elements that the Integration Service reads when it runs a session. It determines how PowerCenter reads the source data.

An XML Source Qualifier transformation always has one input or output port for every column in the XML source. When you create an XML Source Qualifier transformation for a source definition, the Designer links each port in the XML source definition to a port in the XML Source Qualifier transformation. You cannot remove or edit any of these links. If you remove an XML source definition from a mapping, the Designer also removes the corresponding XML Source Qualifier transformation.

You can link one XML source definition to one XML Source Qualifier transformation. You can link ports of one XML Source Qualifier group to ports of different transformations to form separate data flows. However, you cannot link ports from more than one group in an XML Source Qualifier transformation to ports in the same target transformation.

You can edit some of the properties and add metadata extensions to an XML Source Qualifier transformation. For more information about using an XML Source Qualifier transformation, see the XML Guide.
XML Parser Transformation

Use an XML Parser transformation to extract XML inside a pipeline. The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline. For example, you might want to extract XML data from a TIBCO source and pass the data to relational targets.

The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports. For more information about the XML Parser transformation, see the XML Guide.
XML Generator Transformation

Use an XML Generator transformation to create XML inside a pipeline. The XML Generator transformation lets you read data from messaging systems, such as TIBCO and MQ Series, or from other sources, such as files or databases. The XML Generator transformation functionality is similar to the XML target functionality, except it generates the XML in the pipeline. For example, you might want to extract data from relational sources and pass XML data to targets.

The XML Generator transformation accepts data from multiple ports and writes XML through a single output port. For more information about the XML Generator transformation, see the XML Guide.
Index
A
ABORT function See also Transformation Language Reference using 24 active transformations See also transformations Aggregator 40 Custom 72 Filter 210 Java 235 Joiner 306 Normalizer 392 overview 2 Rank 422 Router 430 Sorter 458 Source Qualifier 468, 502 Transaction Control 579 Union 590 Update Strategy 598 XML Generator 610 XML Parser 609 XML Source Qualifier 608 add statistic output port SQL transformation option 526 adding comments to expressions 12 groups 434
advanced interface EDataType class 284 example 287 invoking Java expressions 283 Java expressions 283 JExpression API Reference 289 JExpression class 286 JExprParaMetadata class 284 advanced options SQL transformation 514 aggregate functions See also Transformation Language Reference list of 45 null values 46 overview 45 using in expressions 45 Aggregator transformation compared to Expression transformation 40 components 41 conditional clause example 46 creating 52 functions list 45 group by ports 47 nested aggregation 46 non-aggregate function example 46 null values 46 optimizing performance 53 overview 40 ports 43
sorted ports 50 STDDEV (standard deviation) function 45 tracing levels 42 troubleshooting 54 Update Strategy combination 600 using variables 15 using with the Joiner transformation 316 VARIANCE function 45 All Input transformation scope behavior in Joiner transformation 322 API functions Custom transformation 122 API methods Java transformation 259 array-based functions data handling 150 is row valid 150 maximum number of rows 148 number of rows 149 overview 148 row strategy 153 set input error row 154 ASCII Custom transformation 73 External Procedure transformation 164 ASCII mode configuring sort order for Joiner transformation 314 associated ports Lookup transformation 370 sequence ID 370 averages See Aggregator transformation
C
C/C++ See also Visual C++ linking to Integration Service 194 Cache directory Joiner transformation property 308 cache file name prefix overview 384 caches concurrent 362, 363 dynamic lookup cache 367 Joiner transformation 321 Lookup transformation 360 named persistent lookup 384 sequential 362 sharing lookup 384 static lookup cache 366 caching master rows in Joiner transformation 321 calculations aggregate 40 using the Expression transformation 158 using variables with 16 Call Text (Property) Stored Procedure transformation 560 case-sensitive string comparison Joiner transformation property 308 Char datatypes Java transformation 235 Class Name property Java transformation 241 CLASSPATH Java transformation, configuring 250 COBOL VSAM Normalizer transformation 401 COBOL source definitions creating a Normalizer transformation 406 OCCURS statement 401 code pages See also Administrator Guide access functions 206 Custom transformation 73 External Procedure transformation 164 code snippets creating for Java transformation 245 COM external procedures adding to repository 174 compared to Informatica external procedures 166
B
BankSoft example Informatica external procedure 180 overview 166 BigDecimal datatype Java transformation 235, 251 binary datatypes Java transformation 235 blocking detail rows in Joiner transformation 321 blocking data Custom transformation 88 Custom transformation functions 143 Joiner transformation 321 Buffer input type Complex Data transformation 60 Buffer output type
creating 170 creating a source 176 creating a target 176 datatypes 192 debugging 194 developing in Visual Basic 177 developing in Visual C++ 170, 175 development notes 192 distributing 190 exception handling 193 initializing 196 memory management 194 overview 170 registering with repositories 174 return values 193 row-level procedures 193 server type 170 unconnected 196 COM servers type for COM external procedures 170 comments adding to expressions 12 commit example for Java transformation 261 Java transformation API method 261 syntax for Java transformation 261 compilation errors identifying in Java transformation 256 compiling Custom transformation procedures 103 DLLs on Windows systems 186 Java transformation 253 Complex Data Exchange repository location 58 Complex Data transformation components 63 configuring 69 InputBuffer port 62 OutputBuffer port 62 OutputFileName port 62 ports 62 processing pass-through ports 67 properties 63 splitting output 61 composite key creating with Sequence Generator transformation 443 concurrent caches See caches conditions Filter transformation 214 Joiner transformation 310
Lookup transformation 349, 353 Router transformation 433 configuring ports 6, 7 connect string syntax 514 connected lookups See also Lookup transformation creating 356 description 331 overview 331 connected transformations Aggregator 40 Custom 72 Expression 158 External Procedure 164 Filter 210 Joiner 306 Lookup 330 Normalizer 392 Rank 422 Router 430 Sequence Generator 442 Source Qualifier 468 SQL 502 Stored Procedure 548 Update Strategy 598 XML Generator 610 XML Parser 609 XML Source Qualifier 608 connecting to databases SQL transformation 513 Connection Information (Property) Lookup transformation 339 Stored Procedure transformation 559 connection objects configuring in Lookup transformations 339 configuring in Stored Procedure transformations 559 connection settings SQL transformation 513 connection variables using in Lookup transformations 339 using in Stored Procedure transformations 559 connections SQL transformation 513 connectivity connect string examples 514 constants replacing null values with 23 continue on SQL error SQL transformation 526
creating Aggregator transformation 52 COM external procedures 170 connected Lookup transformation 356 Custom transformation 75, 91 Expression transformation 162 Filter transformation 215 Informatica external procedures 180 Joiner transformation 325 non-reusable instance of reusable transformation 35 ports 7 Rank transformation 426 reusable transformations 34 Router transformation 440 Sequence Generator transformation 454 Stored Procedure transformation 556 transformations 5 Union transformation 594 Update Strategy transformation 599 Current Value property Sequence Generator transformation 448, 450 CURRVAL port Sequence Generator transformation 446 custom functions using with Java expressions 274 Custom transformation blocking data 88 building the module 103 code pages 73 compiling procedures 103 components 76 creating 75, 91 creating groups 77 creating ports 77 creating procedures 91 defining port relationships 78 distributing 74 functions 108 Generate Transaction property 86 generating code files 75, 93 initialization properties 90 Inputs May Block property 88 Is Partitionable property 83 metadata extensions 90 overview 72 passing rows to procedure 114 port attributes 80 procedure properties 90 properties 82 property IDs 127 Requires Single Thread property 84
rules and guidelines 75 setting the update strategy 84 threads 84 thread-specific code 84 transaction boundaries 87 transaction control 86 Transformation Scope property 86 Update Strategy property 84 Custom transformation functions API 122 array-based 148 blocking logic 143 change default row strategy 147 change string mode 144 data boundary output 139 data handling (array-based) 150 data handling (row-based) 135 deinitialization 120 error 140 generated 116 increment error count 142 initialization 116 is row valid 150 is terminated 142 maximum number of rows 148 navigation 123 notification 118 number of rows 149 output notification 139 pointer 144 property 126 rebind datatype 133 row strategy (array-based) 153 row strategy (row-based) 146 session log 141 set data access mode 122 set data code page 145 set input error row 154 set pass-through port 138 working with handles 108, 123 Custom transformation procedures creating 91 example 94 generating code files 93 thread-specific 84 working with rows 114 Cycle Sequence Generator transformation property 448, 449
D
data joining 306 pre-sorting 50 rejecting through Update Strategy transformation 605 selecting distinct 496 storing temporary 15 data driven overview 603 data handling functions array-based 150 row-based 135 database resilience SQL transformation 522 databases See also Configuration Guide See also specific database vendors, such as Oracle joining data from different 306 options supported 571 datatypes COM 192 Java transformation 235 Source Qualifier 468 transformation 192 Date/Time datatypes Java transformation 235 DB2 See IBM DB2 debugging external procedures 194 Java transformation 254 default groups Router transformation 432 default join Source Qualifier 476 default query methods for overriding 474 overriding using Source Qualifier 480 overview 473 viewing 473 default values Aggregator group by ports 48 entering 30 input ports 20 input/output ports 20 output ports 20, 22 overview 20 rules for 30 validating 30 defineJExpression
Java expression API method 285 defining port dependencies in Custom transformation 78 deinitialization functions Custom transformation 120 dependencies ports in Custom transformations 78 detail outer join description 312 detail rows blocking in Joiner transformation 321 processing in sorted Joiner transformation 321 processing in unsorted Joiner transformation 321 developing COM external procedures 170 Informatica external procedures 180 dispatch function description 202 distributing Custom transformation procedures 74 external procedures 190 DLLs (dynamic linked libraries) compiling external procedures 186 double datatype Java transformation 235, 251 dynamic connections performance considerations 516 SQL transformation 513 SQL transformation example 541 dynamic linked libraries See DLLs dynamic lookup cache error threshold 380 filtering rows 369 overview 367 reject loading 380 synchronizing with target 380 using flat file sources 367 dynamic Lookup transformation output ports 369 dynamic SQL queries SQL transformation 508 SQL transformation example 535
E
EDataType class Java expressions 284 effective Transaction Control transformation definition 582 End Value property
Sequence Generator transformation 448, 450 entering expressions 10, 12 source filters 492 SQL query override 480 user-defined joins 482 environment variables setting for Java packages 250 error count incrementing for Java transformation 265 ERROR function See also Transformation Language Reference using 24 error handling for stored procedures 569 with dynamic lookup cache 380 error messages See also Troubleshooting Guide for external procedures 194 tracing for external procedures 194 errors handling 27 increasing threshold in Java transformation 265 validating in Expression Editor 13 with dynamic lookup cache 380 exceptions from external procedures 193 Execution Order (Property) Stored Procedure transformation 560 Expression Editor overview 12 syntax colors 13 using with Java expressions 277 validating expressions using 13 Expression transformation creating 162 overview 158 routing data 162 using variables 15 expressions See also Transformation Language Reference Aggregator transformation 45 calling lookups 354 calling stored procedure from 563 entering 10, 12 Filter condition 214 non-aggregate 48 return values 11 rules for Stored Procedure transformation 573 simplifying 15 update strategy 599
using with Java transformation 274 validating 13 External Procedure transformation See also COM external procedures See also Informatica external procedures ATL objects 171 BankSoft example 166 building libraries for C++ external procedures 173 building libraries for Informatica external procedures 186 building libraries for Visual Basic external procedures 179 code page access function 206 COM datatypes 192 COM external procedures 170 COM vs. Informatica types 166 creating in Designer 180 debugging 194 description 165 development notes 192 dispatch function 202 exception handling 193 external procedure function 202 files needed 200 IDispatch interface 170 Informatica external procedure using BankSoft example 180 Informatica external procedures 180 initializing 196 interface functions 202 Is Partitionable (property) 168 member variables 206 memory management 194 MFC AppWizard 186 Module (property) 167 multi-threaded code 164 Output is Deterministic (property) 169 Output is Repeatable (property) 169 overview 164 parameter access functions 204 partition related functions 207 pipeline partitioning 165 process variable support 201 Programmatic Identifier (property) 167 properties 165, 167 property access function 203 return values 193 row-level procedure 193 Runtime Location (property) 168 session 177 64-bit 205
Tracing Level (property) 168 tracing level function 208 unconnected 196 using in a mapping 176 Visual Basic 177 Visual C++ 170 wrapper classes 194 external procedures See also External Procedure transformation debugging 194 development notes 192 distributing 190 distributing Informatica external procedures 191 interface functions 202 linking to 164
full outer join definition 313 functions See also Transformation Language Reference aggregate 45 non-aggregate 46
G
generate Java code Java expressions 277 generate output row Java transformation 263 generate rollback row Java transformation 269 generate transaction Java transformation 243 Generate Transaction property Java transformation 243 generated column ID Normalizer transformation 395 generated functions Custom transformation 116 generated keys Normalizer transformation 399 generateRow example for Java transformation 263 Java transformation API methods 263 syntax for Java transformation 263 generating transactions Custom transformation 86 Java transformation 244, 261 getBytes Java expression API method 292 getDouble Java expression API method 291 getInRowType example for Java transformation 264 Java transformation API method 264 syntax for Java transformation 264 getInt Java expression API method 291 getLong Java expression API method 291 getResultMetadata Java expression API method 290 getStringBuffer Java expression API method 292 group by ports Aggregator transformation 47 non-aggregate expression 48
F
failing sessions Java transformation 262 failSession example for Java transformation 262 Java transformation API method 262 syntax for Java transformation 262 File input type Complex Data transformation 60 File output type Complex Data transformation 61, 66 Filter transformation condition 214 creating 215 example 210 overview 210 performance tips 216 tips for developing 216 filtering rows Source Qualifier as filter 216 transformation for 210, 458 flat file lookups description 333 sorted input 333 flat files joining data 306 lookups 333 foreign key creating with Sequence Generator transformation 443 Forwarding Rejected Rows configuring 599 option 599 full database connection passing to SQL transformation 513
using default values 48 group filter condition Router transformation 433 groups adding 434 Custom transformation 77 Custom transformation rules 78 HTTP transformation 225 Java transformation 239 Router transformation 432 Union transformation 592 user-defined 432
H
handles Custom transformation 108 Helper Code tab example 299 Java transformation 246 heterogeneous joins See Joiner transformation high precision enabling for Java transformation 251 HTTP transformation authentication 219 configuring groups and ports 225 configuring HTTP tab 224 configuring properties 224 creating 220 examples 229 groups 225 Is Partitionable property 223 Requires Single Thread per Partition property 223 response codes 219 thread-specific code 223
I
IBM DB2 connect string syntax 514 IDispatch interface defining a class 170 IIF function replacing missing keys with Sequence Generator transformation 443 Import Packages tab example 298 Java transformation 246 Increment by
Sequence Generator transformation property 448, 449 incrementErrorCount example for Java transformation 265 Java transformation API method 265 syntax for Java transformation 265 incrementing setting sequence interval 449 indexes lookup conditions 358 lookup table 335, 358 ineffective Transaction Control transformation definition 582 Informatica external procedures compared to COM 166 debugging 194 developing 180 development notes 192 distributing 191 exception handling 193 generating C++ code 182 initializing 196 memory management 194 return values 193 row-level procedures 193 unconnected 196 Informix connect string syntax 514 stored procedure notes 553 initialization functions Custom transformation 116 initializing Custom transformation procedures 90 external procedures 196 Integration Service variable support for 201 variables 18 input parameters stored procedures 549 input ports default values 20 overview 8 using as variables 247 input row getting row type 264 Input type property Complex Data transformation 60 input/output ports default values 20 overview 8 InputBuffer port Complex Data transformation 62 Inputs Must Block property
Java transformation 242 instance variable Java transformation 246, 247, 248 instances creating reusable transformations 33 Integration Service aggregating data 47 datatypes 192 error handling of stored procedures 569 running in debug mode 194 transaction boundaries 87 variable support 201 invoke Java expression API method 289 invokeJExpression API method 281 Is Active property Java transformation 242 Is Partitionable (property) See also Workflow Administration Guide External Procedure transformation 168 Is Partitionable property Custom transformation 83 HTTP transformation 223 Java transformation 242 isNull example for Java transformation 266 Java transformation API method 266 syntax for Java transformation 266 isResultNull Java expression API method 290
J
Java Classpath session property 250 Java code snippets creating for Java transformation 245 example 298 Java Code tab using 237 Java expression API methods defineJExpression 285 getBytes 292 getDouble 291 getInt 291 getLong 291 getResultMetadata 290 getStringBuffer 292 invoke 289 isResultNull 290
Java expressions advanced interface 283 advanced interface example 287 configuring 276 configuring functions 276 creating 277 EDataType class 284 expression function types 274 generate Java code 277 generating 276 invokeJExpression API method 281 invoking with advanced interface 283 invoking with simple interface 281 Java Expressions tab 278 JExpression API reference 289 JExpression class 286 JExprParaMetadata class 284 rules and guidelines 281, 283 simple interface 281 simple interface example 282 steps to create 278 using custom functions 274 using transformation language functions 274 using user-defined functions 274 using with Java transformation 274 Java Expressions tab Java expressions 278 Java packages importing 246 Java transformation active 235 API methods 259 checking null values 266 Class Name property 241 compilation errors 254 compiling 253 creating code 245 creating groups 239 creating Java code 245 creating ports 239 datatype mapping 235 debugging 254 default port values 240 example 294 failing session 262 Generate Transaction property 243, 244 getting input row type 264 Helper Code tab 246 Import Package tab 246 Is Partitionable property 242 Java Code tab 237
Language property 241 locating errors 254 On End of Data tab 248 On Input Row tab 247 On Receiving Transaction tab 248 Output is Deterministic property 243 Output is Ordered property 243 passive 235 primitive datatypes 235 properties 241 Requires Single Thread Per Partition property 243 session level classpath 250 session log 267, 268 setting CLASSPATH 250 setting null values 270 setting output row type 271 setting the update strategy 244 Tracing Level property 242 transaction control 243 transformation level classpath 250 Transformation Scope property 242, 243 Update Strategy property 244 Java transformation API methods commit 261 failSession 262 generateRow 263 getInRowType 264 incrementErrorCount 265 isNull 266 logError 268 logInfo 267 rollback 269 setNull 270 setOutRowType 271 JDK Java transformation 234 JExpression API reference Java expressions 289 JExpression class Java expressions 286 JExprParaMetadata class Java expressions 284 join condition defining 316 overview 310 using sort origin ports 314 join override left outer join syntax 487 normal join syntax 485 right outer join syntax 489 join syntax
left outer join 486 normal join 485 right outer join 488 join type detail outer join 312 full outer join 313 Joiner properties 311 left outer join 484 master outer join 312 normal join 311 right outer join 484 Source Qualifier transformation 484 Join Type property Joiner transformation 308 joiner cache Joiner transformation 321 Joiner data cache size Joiner transformation property 309 Joiner index cache size Joiner transformation property 309 Joiner transformation All Input transformation scope 322 behavior with All Input transformation scope 322 behavior with Row transformation scope 322 behavior with Transaction transformation scope 322 blocking source data 321 caches 321 conditions 310 configuring join condition to use sort origin ports 314 configuring sort order 314 configuring sort order in ASCII mode 314 configuring sort order in Unicode mode 314 creating 325 detail pipeline 306 dropping transaction boundaries 324 join types 311 joining data from the same source 318 joining multiple databases 306 master pipeline 306 overview 306 performance tips 328 preserving transaction boundaries 323 processing real-time data 323 properties 308 real-time data 322 Row transformation scope 322 rules for input 306 Transaction transformation scope 322 transactions 322 transformation scope 322 using with Sorter transformation 314
joining sorted data configuring to optimize join performance 314 using sorted flat files 314 using sorted relational data 314 using Sorter transformation 314 joins creating key relationships for 478 custom 477 default for Source Qualifier 476 Informatica syntax 484 user-defined 482 JRE Java transformation 234
K
keys creating for joins 478 creating with Sequence Generator transformation 443 source definitions 478
L
Language property Java transformation 241 left outer join creating 486 syntax 486 Level attribute pipeline Normalizer transformation 411 VSAM Normalizer transformation 405 libraries for C++ external procedures 173 for Informatica external procedures 186 for VB external procedures 179 load order Source Qualifier 468 load types stored procedures 568 local variables See variables log files See session logs logError example for Java transformation 268 Java transformation API method 268 syntax for Java transformation 268 logical database connection passing to SQL transformation 513 performance considerations 516
LogicalConnectionObject description 513 SQL transformation example 544 logInfo example for Java transformation 267 Java transformation API method 267 syntax for Java transformation 267 long datatype Java transformation 235 lookup caches definition 360 dynamic 367 dynamic, error threshold 380 dynamic, synchronizing with target 380 dynamic, WHERE clause 379 handling first and last values 350 named persistent caches 384 overriding ORDER BY 346 overview 360 partitioning guidelines with unnamed caches 384 persistent 364 recache from database 362 reject loading 380 sharing 384 sharing unnamed lookups 384 static 366 lookup condition definition 337 overview 349 lookup ports definition 335 NewLookupRow 369 overview 335 lookup properties configuring in a session 342 lookup query description dynamic cache 379 ORDER BY 345 overriding 345 overview 345 reserved words 347 Sybase ORDER BY limitation 347 WHERE clause 379 Lookup SQL Override option dynamic caches, using with 379 mapping parameters and variables 345 reducing cache size 346 lookup table indexes 335, 358 Lookup transformation
See also Workflow Administration Guide associated input port 370 cache sharing 384 caches 360 components of 335 condition 349, 353 connected 331 Connection Information (Property) 339 creating connected lookup 356 default query 345 dynamic cache 367 entering custom queries 348 error threshold 380 expressions 354 filtering rows 369 flat file lookups 330, 333 lookup sources 330 mapping parameters and variables 345 multiple matches 350 named persistent cache 384 NewLookupRow port 369 overriding the default query 345 overview 330 performance tips 358 persistent cache 364 ports 335 properties 338 recache from database 362 reject loading 380 return values 353 sequence ID 370 synchronizing dynamic cache with target 380 unconnected 331, 352 Update Strategy combination 601 lookups See lookup query 345
M
mapper Complex Data Exchange 56 Mapping Designer adding reusable transformation 35 creating ports 7 mapping parameters in lookup SQL override 345 in Source Qualifier transformations 469 mapping variables in lookup SQL override 345 in Source Qualifier transformations 469 reusable transformations 33
mappings adding a COBOL source 401 adding reusable transformations 35 adding transformations 5 affected by stored procedures 549 configuring connected Stored Procedure transformation 562 configuring unconnected Stored Procedure transformation 563 flagging rows for update 599 lookup components 335 modifying reusable transformations 36 using an External Procedure transformation 176 using Router transformations 438 master outer join description 312 preserving transaction boundaries 323 master rows caching 321 processing in sorted Joiner transformation 321 processing in unsorted Joiner transformation 321 max output row count SQL transformation 526 memory management for external procedures 194 metadata extensions in Custom transformations 90 methods Java transformation 247, 248 Java transformation API 259 MFC AppWizard overview 186 Microsoft SQL Server connect string syntax 514 stored procedure notes 554 missing values replacing with Sequence Generator 443 Module (property) External Procedure transformation 167 multi-group transformations 9 multiple matches Lookup transformation 350
N
named cache persistent 364 recache from database 364 sharing 386 named persistent lookup cache
overview 384 sharing 386 native datatypes SQL transformation 507 NewLookupRow output port overview 369 NEXTVAL port Sequence Generator 444 non-aggregate expressions overview 48 non-aggregate functions example 46 normal join creating 485 definition 311 preserving transaction boundaries 323 syntax 485 Normal tracing levels overview 32 Normalizer transformation creating a pipeline Normalizer transformation 412 creating a VSAM Normalizer transformation 406 example 415 generated column ID 395 generated key 399 Level attribute 405, 411 mapping example 416 Normalizer tab 397 Occurs attribute 397, 398 overview 392 pipeline Normalizer 408 Ports tab 394 Properties tab 396 troubleshooting 420 VSAM Normalizer 401 notification functions Custom transformation 118 Null ordering in detail Joiner transformation property 309 Null ordering in master Joiner transformation property 308 null values aggregate functions 46 checking in Java transformation 266 filtering 214 replacing using aggregate functions 48 replacing with a constant 23 setting for Java transformation 270 skipping 26 number of cached values Sequence Generator property value 449
Sequence Generator transformation property 451 NumRowsAffected port SQL transformation 518
O
object datatype Java transformation 235 Occurs attribute Normalizer transformation 397, 398 OCCURS statement COBOL example 401 On End of Data tab Java transformation 248 On Input Row tab example 300 Java transformation 247 On Receiving Transaction tab Java transformation 248 operators See also Transformation Language Reference lookup condition 349 Oracle connect string syntax 514 stored procedure notes 554 ORDER BY lookup query 345 outer join See also join type creating 489 creating as a join override 490 creating as an extract override 490 Integration Service supported types 484 Output is Deterministic (Property) Stored Procedure transformation 560 Output is Deterministic (property) See also Workflow Administration Guide External Procedure transformation 169 Output is Deterministic property Java transformation 243 Output is Ordered property Java transformation 243 Output is Repeatable (Property) Stored Procedure transformation 560 Output is Repeatable (property) See also Workflow Administration Guide External Procedure transformation 169 output parameters stored procedures 549 output ports default values 20
dynamic Lookup transformation 369 error handling 20 NewLookupRow in Lookup transformation 369 overview 8 required for Expression transformation 161 using as variables 247 output row setting row type in Java transformation 271 Output type property Complex Data transformation 61 OutputBuffer port Complex Data transformation 62 OutputFileName port Complex Data transformation 62 overriding default Source Qualifier SQL query 480 lookup query
P
parameter access functions 64-bit 205 description 204 parameter binding SQL transformation queries 507 parser Complex Data Exchange 56 Complex Data Exchange output 65 partition related functions description 207 partitioned pipelines joining sorted data 314 partitioning See also Workflow Administration Guide passive transformations Expression 158 External Procedure transformation 164 Java 235 Lookup 330 overview 2 Sequence Generator 442 Stored Procedure 548 pass-through ports adding to SQL transformation 511 Complex Data transformation processing 67 percentile See Aggregator transformation See also Transformation Language Reference performance Aggregator transformation 50 improving filter 216
Joiner transformation 328 logical database connections 516 Lookup transformation 358 static database connections 516 using variables to improve 15 persistent lookup cache named and unnamed 364 named files 384 overview 364 recache from database 362 sharing 384 pipeline Normalizer transformation creating 412 description 408 Normalizer tab 411 Ports tab 409 pipeline partitioning Custom transformation 83 External Procedure transformation 165 HTTP transformation 223 pipelines Joiner transformation 306 merging with Union transformation 590 port attributes editing 81 overview 80 port dependencies Custom transformation 78 port values Java transformation 240 ports Aggregator transformation 43 Complex Data transformation 62 configuring 6 creating 6, 7 Custom transformation 77 default values overview 20 evaluation order 18 group by 47 HTTP transformation 225 Java transformation 239 Lookup transformation 335 NewLookup Row in Lookup transformation 369 Rank transformation 424 Router transformation 436 Sequence Generator transformation 444 sorted 50, 494 sorted ports option 494 Source Qualifier 494 Union transformation 592 variables ports 15
post-session errors 570 stored procedures 566 pre- and post-session SQL Source Qualifier transformation 497 pre-session errors 569 stored procedures 566 primary key creating with Sequence Generator transformation 443 primitive datatypes Java transformation 235 Programmatic Identifier (property) External Procedure transformation 167 promoting non-reusable transformations 34 property access function description 203 property functions Custom transformation 126 property IDs Custom transformation 127
Q
queries Lookup transformation 345 overriding lookup 345 Source Qualifier transformation 473, 480 query See lookup query query mode rules and guidelines 511 SQL transformation 506 quoted identifiers reserved words 473
R
Rank transformation creating 426 defining groups for 425 options 423 overview 422 ports 424 RANKINDEX port 424 using variables 15 ranking groups of data 425 string values 423
real-time data processing with Joiner transformation 323 with Joiner transformation 322 recache from database named cache 364 overview 362 unnamed cache 364 registering COM procedures with repositories 174 reinitializing lookup cache See recache from database reject file update strategies 599 relational databases joining 306 repositories COM external procedures 174 registering COM procedures with 174 Requires Single Thread per Partition property HTTP transformation 223 Java transformation 243 Requires Single Thread property Custom transformation 84 reserved words generating SQL with 473 lookup query 347 resword.txt 473 Reset property Sequence Generator transformation 449, 452 return port Lookup transformation 336, 353 return values from external procedures 193 Lookup transformation 353 Stored Procedure transformation 549 reusable transformations adding to mappings 35 changing 36 creating 34 creating a non-reusable instance 35 mapping variables 33 overview 33 reverting to original 37 Revert button in reusable transformations 37 right outer join creating 488 syntax 488 rollBack example for Java transformation 269 Java transformation API method 269
syntax for Java transformation 269 rolling back data Java transformation 269 Router transformation connecting in mappings 438 creating 440 example 434 filtering Normalizer data 415 group filter condition 433 groups 432 overview 430 ports 436 routing rows transformation for 430 row strategy functions array-based 153 row-based 146 Row transformation scope behavior in Joiner transformation 322 row-based functions data handling 135 row strategy 146 rows deleting 605 flagging for update 599 rules default values 30 Runtime Location (property) External Procedure transformation 168
S
script mode rules and guidelines 504 SQL transformation 503 ScriptError port description 504 ScriptName port SQL transformation 504 ScriptResults port SQL transformation 504 scripts locale option SQL transformation 526 select distinct overriding in sessions 496 Source Qualifier option 496 Sequence Generator transformation creating 454 creating composite key 443 creating primary and foreign keys 443 Current Value 450
CURRVAL port 446 cycle 449 End Value property 450 Increment By property 449 NEXTVAL port 444 non-reusable 451 number of cached values 451 overview 442 ports 444 properties 448 replacing missing values 443 reset 452 reusable 452 start value 449 using IIF function to replace missing keys 443 sequence ID Lookup transformation 370 sequential caches See caches serializer Complex Data Exchange 56 services Complex Data Exchange 56 session logs Java transformation 267, 268 tracing levels 32 session recovery See also Workflow Administration Guide sessions $$$SessStartTime 469 configuring to handle stored procedure errors 569 External Procedure transformation 177 improving performance through variables 15 incremental aggregation 40 overriding select distinct 496 running pre- and post-stored procedures 566 setting update strategy 602 skipping errors 24 Stored Procedure transformation 549 setNull example for Java transformation 270 Java transformation API method 270 syntax for Java transformation 270 setOutRowType example for Java transformation 271 Java transformation API method 271 syntax for Java transformation 271 sharing named lookup caches 386 unnamed lookup caches 384 simple interface
API methods 281 example 282 Java expressions 281 sort order Aggregator transformation 50 configuring for Joiner transformation 314 Source Qualifier transformation 494 sort origin configuring the join condition to use 314 definition 314 sorted data joining from partitioned pipelines 314 using in Joiner transformation 314 sorted flat files using in Joiner transformation 314 sorted input flat file lookups 333 Sorted Input property Aggregator transformation 50 Joiner transformation 309 sorted ports Aggregator transformation 50 caching requirements 44 pre-sorting data 50 reasons not to use 50 sort order 494 Source Qualifier 494 sorted relational data using in Joiner transformation 314 Sorter transformation configuring 461 configuring Sorter Cache Size 461 creating 465 overview 458 properties 461 using with Joiner transformation 314 $Source Lookup transformations 339 Stored Procedure transformations 559 Source Analyzer creating key relationships 478 source filters adding to Source Qualifier 492 Source Qualifier transformation $$$SessStartTime 469 configuring 498 creating key relationships 478 custom joins 477 datatypes 468 default join 476 default query 473
entering source filter 492 entering user-defined join 482 joining source data 476 joins 478 mapping parameters and variables 469 Number of Sorted Ports option 494 outer join support 484 overriding default query 474, 480 overview 468 pre- and post-session SQL 497 properties 471, 499 Select Distinct option 496 sort order with Aggregator 51 SQL override 480 target load order 468 troubleshooting 500 viewing default query 474 XML Source Qualifier 608 sources joining 306 joining data from the same source 318 joining multiple 306 merging 590 Splitting output property Complex Data transformation 61 SQL adding custom query 480 overriding default query 474, 480 viewing default query 474 SQL Ports tab SQL transformation 527 SQL query adding custom query 480 dynamic connection example 541 dynamic update example 535 overriding default query 474, 480 viewing default query 474 SQL settings tab SQL transformation 526 SQL statements supported by SQL transformation 529 SQL transformation advanced options 514 configuring connections 513 database resilience 522 description 502 dynamic connection example 541 dynamic query example 535 dynamic SQL queries 508 native datatype column 507 NumRowsAffected 518
passing full connection information 513 pass-through ports 511 properties 523, 524 query mode 506 script mode 503 ScriptError port 504 ScriptName port 504 ScriptResults port 504 SELECT queries 507 setting SQL attributes 526 setting Verbose Data 524 SQL ports description 527 static query ports 508 static SQL queries 507 supported SQL statements 529 transaction control 522 using string substitution 509 standard deviation See Aggregator transformation See also Transformation Language Reference Start Value property Sequence Generator transformation 448, 449 static code Java transformation 246 static database connection description 513 performance considerations 516 static lookup cache overview 366 static SQL queries configuring ports 508 SQL transformation 507 static variable Java transformation 246 status codes Stored Procedure transformation 549 Stored Procedure transformation Call Text (Property) 560 configuring 552 configuring connected stored procedure 562 configuring unconnected stored procedure 563 connected 549 Connection Information (Property) 559 creating by importing 556, 557 creating manually 558, 559 Execution Order (Property) 560 expression rules 573 importing stored procedure 556 input data 548 input/output parameters 549 modifying 560
output data 548 Output is Deterministic (Property) 560 Output is Repeatable (Property) 560 overview 548 performance tips 574 pre- and post-session 566 properties 559 return values 549 running pre- or post-session 566 setting options 559 specifying session runtime 550 specifying when run 550 status codes 549 Stored Procedure Type (Property) 560 Subsecond Precision (Property) 560 Tracing Level (Property) 560 troubleshooting 575 unconnected 549, 563 Stored Procedure Type (Property) Stored Procedure transformation 560 stored procedures See also Stored Procedure transformation changing parameters 560 creating sessions for pre or post-session run 566 database-specific syntax notes 553 definition 548 error handling 569 IBM DB2 example 555 importing 557 Informix example 553 load types 568 Microsoft example 554 Oracle example 554 post-session errors 570 pre-session errors 569 session errors 570 setting type of 560 specifying order of processing 550 supported databases 571 Sybase example 554 Teradata example 555 writing 553 writing to variables 17 streamer component Complex Data Exchange 56 string substitution SQL transformation queries 509 strings ranking 423 Subsecond Precision (Property) Stored Procedure transformation 560
Sybase ASE connect string syntax 514 ORDER BY limitation 347 stored procedure notes 554 syntax common database restrictions 491 creating left outer joins 486 creating normal joins 485 creating right outer joins 488
T
tables creating key relationships 478 $Target Lookup transformations 339 Stored Procedure transformations 559 target load order Source Qualifier 468 target tables deleting rows 605 inserts 605 setting update strategy for 603 targets updating 598 TC_COMMIT_AFTER constant description 580 TC_COMMIT_BEFORE constant description 580 TC_CONTINUE_TRANSACTION constant description 580 TC_ROLLBACK_AFTER constant description 580 TC_ROLLBACK_BEFORE constant description 580 Teradata connect string syntax 514 Terse tracing level defined 32 threads Custom transformation 84 thread-specific operations Custom transformations 84 HTTP transformations 223 TINFParam parameter type definition 195 Tracing Level (Property) Stored Procedure transformation 560 Tracing Level (property) External Procedure transformation 168 tracing level function
description 208 Tracing Level property Java transformation 242 tracing levels Joiner transformation property 309 Normal 32 overriding 32 overview 32 Sequence Generator transformation property 449 session 32 session properties 42 Terse 32 Verbose Data 32 Verbose Initialization 32 tracing messages for external procedures 194 transaction definition 579 generating 86, 244, 261 working with in Joiner transformation 322 transaction boundaries Custom transformation 87 transaction boundary dropping in Joiner transformation 324 preserving in Joiner transformation 323 transaction control Custom transformation 86 example 580 expression 580 Java transformation 243 overview 578 SQL transformation 522 transformation 579 Transaction Control transformation creating 587 effective 582 in mappings 582 ineffective 582 mapping validation 586 overview 579 properties 579 Transaction transformation scope behavior in Joiner transformation 322 Transformation Exchange (TX) definition 164 transformation language aggregate functions 45 using with Java expressions 274 transformation scope All Input transformation scope with Joiner transformation 322
Custom transformation 86 defining for Joiner transformation 322 Java transformation 243 Joiner transformation property 309 Row transformation scope with Joiner transformation 322 Transaction transformation scope with Joiner transformation 322 Transformation Scope property Java transformation 242 transformations See also active transformations See also connected transformations See also passive transformations See also unconnected transformations active and passive 2 adding to mappings 5 Aggregator 40 connected 2 creating 5 Custom 72 definition 2 descriptions 2 Expression 158 External Procedure 164 Filter 210 handling errors 27 Joiner 306 Lookup 330 making reusable 34 multi-group 9 Normalizer 392 overview 2 promoting to reusable 34 Rank 422 reusable transformations 33 Router 430 Sequence Generator 442 Source Qualifier 468 SQL 502 Stored Procedure 548 tracing levels 32 types that allow for expressions 11 unconnected 2 Union 590 Update Strategy 598 XML Generator 610 XML Parser 609 XML Source Qualifier 608 Transformer component Complex Data Exchange 56
Treat Source Rows As update strategy 602 troubleshooting Aggregator transformation 54 Java transformation 254 Normalizer transformation 420 Source Qualifier transformation 500 Stored Procedure transformation 575 TX-prefixed files external procedures 182
U
unconnected Lookup transformation input ports 352 return port 353 unconnected lookups See also Lookup transformation adding lookup conditions 353 calling through expressions 354 description 331 designating return values 353 overview 352 unconnected transformations External Procedure transformation 164, 196 Lookup 330 Lookup transformation 352 Stored Procedure transformation 548 Unicode mode See also Administrator Guide configuring sort order for Joiner transformation 314 Custom transformation 73 External Procedure Transformation 164 Union transformation components 590 creating 594 groups 592 guidelines 590 overview 590 ports 592 unnamed cache persistent 364 recache from database 364 sharing 384 unsorted Joiner transformation processing detail rows 321 processing master rows 321 update strategy setting with a Custom transformation 84 setting with a Java transformation 244 Update Strategy transformation
Aggregator combination 600 checklist 605 creating 599 entering expressions 599 forwarding rejected rows 599 Lookup combination 601 overview 598 setting options for sessions 602, 603 steps to configure 598 Update Strategy Transformation property Java transformation 242 URL adding through business documentation links 12 user-defined functions using with Java expressions 274 user-defined group Router transformation 432 user-defined joins entering 482 user-defined methods Java transformation 247, 248
wrapper classes for 194 Visual C++ adding libraries to Integration Service 194 COM datatypes 192 developing COM external procedures 170 distributing procedures manually 191 wrapper classes for 194 VSAM Normalizer transformation creating 406 description 401 Normalizer tab 404
W
web links adding to expressions 12 Windows systems compiling DLLs on 186 wizards ATL COM AppWizard 170 MFC AppWizard 186 Visual Basic Application Setup Wizard 190 wrapper classes for pre-existing libraries or functions 194
V
Validate button transformations 30 validating default values 30 expressions 13 values calculating with Expression transformation 161 variable ports overview 15 variables capturing stored procedure results 17 initializations 18 Java transformation 246, 247, 248 overview 15 port evaluation order 18 Verbose Data tracing level overview 32 SQL transformation 524 Verbose Initialization tracing level overview 32 Visual Basic adding functions to Integration Service 194 Application Setup Wizard 190 code for external procedures 165 COM datatypes 192 developing COM external procedures 177 distributing procedures manually 191
X
XML Generator transformation overview 610 XML Parser transformation overview 609 XML transformations See also XML Guide Source Qualifier 608 XML Generator 610 XML Parser 609