XML Guide
Informatica PowerCenter®
(Version 7.1.1)
Informatica PowerCenter XML User Guide
Version 7.1.1
August 2004
This software and documentation contain proprietary information of Informatica Corporation. They are provided under a license agreement
containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No
part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)
without prior consent of Informatica Corporation.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software
license agreement as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR
12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to
us in writing. Informatica Corporation does not warrant that this documentation is error free.
Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX, and SuperGlue are trademarks or registered trademarks
of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be
trade names or trademarks of their respective owners.
Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington
University and University of California, Irvine, Copyright (c) 1993-2002, all rights reserved.
Portions of this software contain copyrighted material from The JBoss Group, LLC. Your right to use such materials is set forth in the GNU
Lesser General Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials
are provided free of charge by Informatica, “as-is”, without warranty of any kind, either express or implied, including but not limited to the
implied warranties of merchantability and fitness for a particular purpose.
Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration® is a registered trademark
of Meta Integration Technology, Inc.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/).
The Apache Software is Copyright (c) 1999-2004 The Apache Software Foundation. All rights reserved.
DISCLAIMER: Informatica Corporation provides this documentation “as is” without warranty of any kind, either express or implied,
including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information
provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or
changes in the products described in this documentation at any time without notice.
Table of Contents
List of Figures
List of Tables
Preface
New Features and Enhancements
PowerCenter 7.1.1
PowerCenter 7.1
PowerCenter 7.0
About Informatica Documentation
About this Book
Document Conventions
Other Informatica Resources
Visiting Informatica Customer Portal
Visiting the Informatica Webzine
Visiting the Informatica Web Site
Visiting the Informatica Developer Network
Obtaining Technical Support
Simple Types
Complex Types
Component Groups
Element and Attribute Groups
Substitution Groups
XML Path
Code Pages
Generating Hierarchy Relationships
Creating Custom XML Views
Selecting Root Elements
Reducing Metadata Explosion
Synchronizing XML Definitions
Editing XML Source Definition Properties
Creating XML Definitions from Repository Definitions
Troubleshooting XML Sources
Validating XML Definitions
Setting XML View Options
All Hierarchy Foreign Keys
Non-Recursive Row Option
Hierarchy Relationship Row Option
Force Row Option
Type Relationship Row Option
Troubleshooting XML Editing
Chapter 7: Midstream XML Transformations
Overview
XML Parser Transformation
XML Generator Transformation
Creating a Midstream XML Transformation
Editing Midstream XML Transformation Properties
Midstream XML Parser Tab
Midstream XML Generator Tab
Generating Pass-Through Ports
Index
List of Figures
Figure 5-1. Filename Column in a Mapping
Figure 6-1. XML Source Qualifier Transformation Ports
Figure 6-2. Linking XML Source Qualifier Transformations to One Input Group
Figure 6-3. Linking XML Source Qualifier to Multiple Input Group Transformations
Figure 6-4. Sample XML File StoreInfo.xml
Figure 6-5. Invalid use of XML Source Qualifier Transformation in Aggregator Mapping
Figure 6-6. Using a Denormalized Group in a Mapping
Figure 6-7. Using an XML Source Definition Twice in a Mapping
Figure 7-1. XML Parser Transformation
Figure 7-2. XML Generator Transformation
Figure 7-3. Sample XML Generator Transformation
Figure 7-4. Midstream XML Parser Tab
Figure 7-5. Midstream XML Generator Tab
Figure 7-6. Pass-Through Ports
Figure 8-1. Properties Settings for an XML Source
Figure 8-2. Properties Settings for an XML Writer
Figure 8-3. Mapping Data to an XML Root
Figure 8-4. Properties Settings for an XML Generator Transformation
Figure 8-5. Properties Settings for an XML Parser Transformation
List of Tables
Table 1-1. Cardinality of Elements in XML
Table 3-1. Create XML Views Options
Table 6-1. XML Source Qualifier Properties
Table 7-1. Midstream XML Parser Settings
Table 7-2. Midstream XML Generator Settings
Table 8-1. XML Reader Options
Table 8-2. XML Source Qualifier Options for a Session
Table 8-3. XML Writer Options
Table 8-4. Null and Empty String Output for XML Targets
Table 8-5. XML Generator Transformation Session Options
Table 8-6. XML Parser Transformation Session Options
Table A-1. XML and Transformation Datatypes
Preface
Welcome to PowerCenter, Informatica’s software product that delivers an open, scalable data
integration solution addressing the complete life cycle for all data integration projects
including data warehouses and data marts, data migration, data synchronization, and
information hubs. PowerCenter combines the latest technology enhancements for reliably
managing data repositories and delivering information resources in a timely, usable, and
efficient manner.
The PowerCenter metadata repository coordinates and drives a variety of core functions,
including extracting, transforming, loading, and managing data. The PowerCenter Server can
extract large volumes of data from multiple platforms, handle complex transformations on the
data, and support high-speed loads. PowerCenter can simplify and accelerate the process of
moving data warehouses from development to test to production.
New Features and Enhancements
This section describes new features and enhancements to PowerCenter 7.1.1, 7.1, and 7.0.
PowerCenter 7.1.1
This section describes new features and enhancements to PowerCenter 7.1.1.
Data Profiling
♦ Data sampling. You can create a data profile for a sample of source data instead of the
entire source. You can profile a random sample of data, a specified percentage
of data, or a specified number of rows starting with the first row.
♦ Verbose data enhancements. You can specify the type of verbose data you want the
PowerCenter Server to write to the Data Profiling warehouse. The PowerCenter Server can
write all rows, the rows that meet the business rule, or the rows that do not meet the
business rule.
♦ Session enhancement. You can save sessions that you create from the Profile Manager to
the repository.
♦ Domain Inference function tuning. You can configure the Data Profiling Wizard to filter
the Domain Inference function results. You can configure a maximum number of patterns
and a minimum pattern frequency. You may want to narrow the scope of patterns returned
to view only the primary domains, or you may want to widen the scope of patterns
returned to view exception data.
♦ Row Uniqueness function. You can determine unique rows for a source based on a
selection of columns for the specified source.
♦ Define mapping, session, and workflow prefixes. You can define default mapping,
session, and workflow prefixes for the mappings, sessions, and workflows generated when
you create a data profile.
♦ Profile mapping display in the Designer. The Designer displays profile mappings under a
profile mappings node in the Navigator.
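The three sampling modes named in the data sampling item above can be pictured with a small sketch. This is only an illustration under stated assumptions: the function names are hypothetical, and the guide does not say how the Data Profiling option selects rows internally, so the selection logic here is a generic stand-in.

```python
import random

def sample_random(rows, count, seed=None):
    """Return `count` rows chosen at random without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(count, len(rows)))

def sample_percentage(rows, percent, seed=None):
    """Keep each row with probability percent/100, so the result size is approximate."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]

def sample_first_n(rows, n):
    """Return the first n rows in source order."""
    return rows[:n]

rows = list(range(1, 101))  # stand-in for source rows
print(len(sample_random(rows, 10, seed=7)))
print(len(sample_percentage(rows, 25, seed=7)))
print(sample_first_n(rows, 5))
```

A first-N sample is cheap but biased toward whatever order the source returns rows in, which is one reason a random or percentage sample may give a more representative profile.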
PowerCenter Server
♦ Code page. PowerCenter supports additional Japanese language code pages, such as JIPSE-
kana, JEF-kana, and MELCOM-kana.
♦ Flat file partitioning. When you create multiple partitions for a flat file source session, you
can configure the session to create multiple threads to read the flat file source.
♦ pmcmd. You can use parameter files that reside on a local machine with the Startworkflow
command in the pmcmd program. When you use a local parameter file, pmcmd passes
variables and values in the file to the PowerCenter Server.
♦ SuSE Linux support. The PowerCenter Server runs on SuSE Linux. On SuSE Linux, you
can connect to IBM DB2, Oracle, and Sybase sources, targets, and repositories using
native drivers. Use ODBC drivers to access other sources and targets.
♦ Reserved word support. If any source, target, or lookup table name or column name
contains a database reserved word, you can create and maintain a file, reswords.txt,
containing reserved words. When the PowerCenter Server initializes a session, it searches
for reswords.txt in the PowerCenter Server installation directory. If the file exists, the
PowerCenter Server places quotes around matching reserved words when it executes SQL
against the database.
♦ Teradata external loader. When you load to Teradata using an external loader, you can
now override the control file. Depending on the loader you use, you can also override the
error, log, and work table names by specifying different tables on the same or different
Teradata database.
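The reserved word item above describes quoting table and column names that match entries in reswords.txt. The sketch below illustrates the idea only. The one-word-per-line layout, the case-insensitive match, the double-quote character, and both function names are assumptions; the guide does not describe the file format or the matching rules here.

```python
def load_reserved_words(lines):
    """Collect reserved words, assuming one word per line (layout is an assumption)."""
    return {line.strip().upper() for line in lines if line.strip()}

def quote_reserved(identifier, reserved, quote='"'):
    """Wrap the identifier in quotes when it matches a reserved word."""
    if identifier.upper() in reserved:
        return f"{quote}{identifier}{quote}"
    return identifier

# Stand-in for the contents of a reserved words file
reserved = load_reserved_words(["MONTH", "DATE", "", "user"])

columns = ["id", "month", "amount", "User"]
select_list = ", ".join(quote_reserved(c, reserved) for c in columns)
sql = f"SELECT {select_list} FROM sales"
print(sql)  # SELECT id, "month", amount, "User" FROM sales
```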
Repository
♦ Exchange metadata with other tools. You can exchange source and target metadata with
other BI or data modeling tools, such as Business Objects Designer. You can export or
import multiple objects at a time. When you export metadata, the PowerCenter Client
creates a file format recognized by the target tool.
Repository Server
♦ pmrep. You can use pmrep to perform the following functions:
− Remove repositories from the Repository Server cache entry list.
− Enable enhanced security when you create a relational source or target connection in the
repository.
− Update a connection attribute value when you update the connection.
♦ SuSE Linux support. The Repository Server runs on SuSE Linux. On SuSE Linux, you
can connect to IBM DB2, Oracle, and Sybase repositories.
Security
♦ Oracle OS Authentication. You can now use Oracle OS Authentication to authenticate
database users. Oracle OS Authentication allows you to log on to an Oracle database if you
have a logon to the operating system. You do not need to know a database user name and
password. PowerCenter uses Oracle OS Authentication when the user name for an Oracle
connection is PmNullUser.
♦ Pipeline partitioning. You can create multiple partitions in a session containing web
service source and target definitions. The PowerCenter Server creates a connection to the
Web Services Hub based on the number of sources, targets, and partitions in the session.
XML
♦ Multi-level pivoting. You can now pivot more than one multiple-occurring element in an
XML view. You can also pivot the view row.
PowerCenter 7.1
This section describes new features and enhancements to PowerCenter 7.1.
Data Profiling
♦ Data Profiling for VSAM sources. You can now create a data profile for VSAM sources.
♦ Support for verbose mode for source-level functions. You can now create data profiles
with source-level functions and write data to the Data Profiling warehouse in verbose
mode.
♦ Aggregator function in auto profiles. Auto profiles now include the Aggregator function.
♦ Creating auto profile enhancements. You can now select the columns or groups you want
to include in an auto profile and enable verbose mode for the Distinct Value Count
function.
♦ Purging data from the Data Profiling warehouse. You can now purge data from the Data
Profiling warehouse.
♦ Source View in the Profile Manager. You can now view data profiles by source definition
in the Profile Manager.
♦ PowerCenter Data Profiling report enhancements. You can now view PowerCenter Data
Profiling reports in a separate browser window, resize columns in a report, and view
verbose data for Distinct Value Count functions.
♦ Prepackaged domains. Informatica provides a set of prepackaged domains that you can
include in a Domain Validation function in a data profile.
Documentation
♦ Web Services Provider Guide. This is a new book that describes the functionality of Real-time
Web Services. It also includes information from the version 7.0 Web Services Hub Guide.
♦ XML User Guide. This book consolidates XML information previously documented in the
Designer Guide, Workflow Administration Guide, and Transformation Guide.
Licensing
Informatica provides licenses for each CPU and each repository rather than for each
installation. Informatica provides licenses for product, connectivity, and options. You store
the license keys in a license key file. You can manage the license files using the Repository
Server Administration Console, the PowerCenter Server Setup, and the command line
program, pmlic.
PowerCenter Server
♦ 64-bit support. You can now run 64-bit PowerCenter Servers on AIX and HP-UX
(Itanium).
♦ Partitioning enhancements. If you have the Partitioning option, you can define up to 64
partitions at any partition point in a pipeline that supports multiple partitions.
♦ PowerCenter Server processing enhancements. The PowerCenter Server now reads a
block of rows at a time. This improves processing performance for most sessions.
♦ CLOB/BLOB datatype support. You can now read and write CLOB/BLOB datatypes.
Repository Server
♦ Updating repository statistics. PowerCenter now identifies and updates statistics for all
repository tables and indexes when you copy, upgrade, and restore repositories. This
improves performance when PowerCenter accesses the repository.
♦ Increased repository performance. You can increase repository performance by skipping
information when you copy, back up, or restore a repository. You can choose to skip MX
data, workflow and session log history, and deploy group history.
♦ pmrep. You can use pmrep to back up, disable, or enable a repository, delete a relational
connection from a repository, delete repository details, truncate log files, and run multiple
pmrep commands sequentially. You can also use pmrep to create, modify, and delete a
folder.
Repository
♦ Exchange metadata with business intelligence tools. You can export metadata to and
import metadata from other business intelligence tools, such as Cognos ReportNet and
Business Objects.
♦ Object import and export enhancements. You can compare objects in an XML file to
objects in the target repository when you import objects.
♦ MX views. MX views have been added to help you analyze metadata stored in the
repository. REP_SERVER_NET and REP_SERVER_NET_REF views allow you to see
information about server grids. REP_VERSION_PROPS allows you to see the version
history of all objects in a PowerCenter repository.
Transformations
♦ Flat file lookup. You can now perform lookups on flat files. When you create a Lookup
transformation using a flat file as a lookup source, the Designer invokes the Flat File
Wizard. You can also use a lookup file parameter if you want to change the name or
location of a lookup between session runs.
♦ Dynamic lookup cache enhancements. When you use a dynamic lookup cache, the
PowerCenter Server can ignore some ports when it compares values in lookup and input
ports before it updates a row in the cache. Also, you can choose whether the PowerCenter
Server outputs old or new values from the lookup/output ports when it updates a row. You
might want to output old values from lookup/output ports when you use the Lookup
transformation in a mapping that updates slowly changing dimension tables.
♦ Union transformation. You can use the Union transformation to merge multiple sources
into a single pipeline. The Union transformation is similar to using the UNION ALL SQL
statement to combine the results from two or more SQL statements.
♦ Custom transformation API enhancements. The Custom transformation API includes
new array-based functions that allow you to create procedure code that receives and
outputs a block of rows at a time. Use these functions to take advantage of the
PowerCenter Server processing enhancements.
♦ Midstream XML transformations. You can now create an XML Parser transformation or
an XML Generator transformation to parse or generate XML inside a pipeline. The XML
transformations enable you to extract XML data stored in relational tables, such as data
stored in a CLOB column. You can also extract data from messaging systems, such as
TIBCO or IBM MQSeries.
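The Union transformation item above compares the transformation to the UNION ALL SQL statement. As a reminder of what UNION ALL does, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for the example, and it shows SQL behavior only, not the transformation itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_east (order_id INTEGER, amount REAL);
    CREATE TABLE orders_west (order_id INTEGER, amount REAL);
    INSERT INTO orders_east VALUES (1, 10.0), (2, 20.0);
    INSERT INTO orders_west VALUES (2, 20.0), (3, 30.0);
""")

# UNION ALL concatenates both result sets and keeps duplicate rows.
rows = conn.execute("""
    SELECT order_id, amount FROM orders_east
    UNION ALL
    SELECT order_id, amount FROM orders_west
    ORDER BY order_id
""").fetchall()
conn.close()

print(rows)  # [(1, 10.0), (2, 20.0), (2, 20.0), (3, 30.0)]
```

The row (2, 20.0) appears twice because UNION ALL, unlike UNION, does not remove duplicates.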
Usability
♦ Viewing active folders. The Designer and the Workflow Manager highlight the active
folder in the Navigator.
♦ Enhanced printing. The print quality of the workspace has improved.
Version Control
You can run object queries that return shortcut objects. You can also run object queries based
on the latest status of an object. The query can return local objects that are checked out, the
latest version of checked in objects, or a collection of all older versions of objects.
Note: PowerCenter Connect for Web Services allows you to create sources, targets, and
transformations to call web services hosted by other providers. For more information, see the
PowerCenter Connect for Web Services User and Administrator Guide.
Workflow Monitor
The Workflow Monitor includes the following performance and usability enhancements:
♦ When you connect to the PowerCenter Server, you no longer distinguish between online
and offline mode.
♦ You can open multiple instances of the Workflow Monitor on one machine.
♦ You can simultaneously monitor multiple PowerCenter Servers registered to the same
repository.
♦ The Workflow Monitor includes improved options for filtering tasks by start and end
time.
♦ The Workflow Monitor displays workflow runs in Task view chronologically with the most
recent run at the top. It displays folders alphabetically.
♦ You can remove the Navigator and Output window.
XML Support
PowerCenter XML support now includes the following features:
♦ Enhanced datatype support. You can use XML schemas that contain simple and complex
datatypes.
♦ Additional options for XML definitions. When you import XML definitions, you can
choose how you want the Designer to represent the metadata associated with the imported
files. You can choose to generate XML views using hierarchy or entity relationships. In a
view with hierarchy relationships, the Designer expands each element and reference under
its parent element. When you create views with entity relationships, the Designer creates
separate entities for references and multiple-occurring elements.
♦ Synchronizing XML definitions. You can synchronize one or more XML definitions when
the underlying schema changes. You can synchronize an XML definition with any
repository definition or file used to create the XML definition, including relational sources
or targets, XML files, DTD files, or schema files.
♦ XML workspace. You can edit XML views and relationships between views in the
workspace. You can create views, add or delete columns from views, and define
relationships between views.
♦ Midstream XML transformations. You can now create an XML Parser transformation or
an XML Generator transformation to parse or generate XML inside a pipeline. The XML
transformations enable you to extract XML data stored in relational tables, such as data
stored in a CLOB column. You can also extract data from messaging systems, such as
TIBCO or IBM MQSeries.
♦ Support for circular references. Circular references occur when an element is a direct or
indirect child of itself. PowerCenter now supports XML files, DTD files, and XML
schemas that use circular definitions.
♦ Increased performance for large XML targets. You can create XML files of several
gigabytes in a PowerCenter 7.1 XML session by using the following enhancements:
− Spill to disk. You can specify the size of the cache used to store the XML tree. If the size
of the tree exceeds the cache size, the XML data spills to disk in order to free up
memory.
− User-defined commits. You can define commits to trigger flushes for XML target files.
− Support for multiple XML output files. You can output XML data to multiple XML
targets. You can also define the file names for XML output files in the mapping.
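The midstream XML transformations item above mentions parsing XML that arrives inside a pipeline, for example from a CLOB column. The sketch below shows the general idea of turning an XML payload carried in a row into flat output rows. It uses Python's standard xml.etree.ElementTree module rather than anything from PowerCenter, and the row layout, element names, and the parse_order_xml helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source rows: a key plus an XML document stored as text,
# as it might be read from a CLOB column.
source_rows = [
    (101, "<order><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"),
    (102, "<order><item sku='C3' qty='5'/></order>"),
]

def parse_order_xml(order_id, xml_text):
    """Parse one XML payload and emit one flat row per <item> element."""
    root = ET.fromstring(xml_text)
    return [
        (order_id, item.get("sku"), int(item.get("qty")))
        for item in root.iter("item")
    ]

parsed = [
    row
    for order_id, xml_text in source_rows
    for row in parse_order_xml(order_id, xml_text)
]
print(parsed)  # [(101, 'A1', 2), (101, 'B7', 1), (102, 'C3', 5)]
```

Carrying the key (order_id) through to each output row plays a role loosely comparable to passing a column through the parser so that downstream steps can relate the parsed rows back to their source row.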
PowerCenter 7.0
This section describes new features and enhancements to PowerCenter 7.0.
Data Profiling
If you have the Data Profiling option, you can profile source data to evaluate source data and
detect patterns and exceptions. For example, you can determine implicit data type, suggest
candidate keys, detect data patterns, and evaluate join criteria. After you create a profiling
warehouse, you can create profiling mappings and run sessions. Then you can view reports
based on the profile data in the profiling warehouse.
The PowerCenter Client provides a Profile Manager and a Profile Wizard to complete these
tasks.
Documentation
♦ Glossary. The Installation and Configuration Guide contains a glossary of new PowerCenter
terms.
♦ Installation and Configuration Guide. The connectivity information in the Installation
and Configuration Guide is consolidated into two chapters. This book now contains
chapters titled “Connecting to Databases from Windows” and “Connecting to Databases
from UNIX.”
♦ Upgrading metadata. The Installation and Configuration Guide now contains a chapter
titled “Upgrading Repository Metadata.” This chapter describes changes to repository
objects impacted by the upgrade process. The change in functionality for existing objects
depends on the version of the existing objects. Consult the upgrade information in this
chapter for each upgraded object to determine whether the upgrade applies to your current
version of PowerCenter.
Functions
♦ Soundex. The Soundex function encodes a string value into a four-character string.
SOUNDEX works for characters in the English alphabet (A-Z). It uses the first character
of the input string as the first character in the return value and encodes the remaining
three unique consonants as numbers.
♦ Metaphone. The Metaphone function encodes string values. You can specify the length of
the string that you want to encode. METAPHONE encodes characters of the English
language alphabet (A-Z). It encodes both uppercase and lowercase letters in uppercase.
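To make the Soundex description above concrete, here is a minimal sketch of the classic American Soundex algorithm. It is an assumption that the PowerCenter SOUNDEX function follows these exact rules; the guide only states that the result has four characters, keeps the first letter, and encodes the following consonants as numbers.

```python
# Classic Soundex digit groups; letters not listed (vowels, H, W, Y) have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(value):
    """Return a four-character Soundex code, or '' if the input has no A-Z letters."""
    letters = [c for c in value.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return result.ljust(4, "0")  # pad short codes with zeros

print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"), soundex("Lee"))
# R163 R163 A261 L000
```

Names that sound alike, such as Robert and Rupert, map to the same code, which is what makes the encoding useful for fuzzy matching.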
Installation
♦ Remote PowerCenter Client installation. You can create a control file containing
installation information, and distribute it to other users to install the PowerCenter Client.
You access the Informatica installation CD from the command line to create the control
file and install the product.
PowerCenter Server
♦ DB2 bulk loading. You can enable bulk loading when you load to IBM DB2 8.1.
♦ Distributed processing. If you purchase the Server Grid option, you can group
PowerCenter Servers registered to the same repository into a server grid. In a server grid,
PowerCenter Servers balance the workload among all the servers in the grid.
♦ Row error logging. The session configuration object has new properties that allow you to
define error logging. You can choose to log row errors in a central location to help
understand the cause and source of errors.
♦ External loading enhancements. When using external loaders on Windows, you can now
choose to load from a named pipe. When using external loaders on UNIX, you can now
choose to load from staged files.
♦ External loading using Teradata Warehouse Builder. You can use Teradata Warehouse
Builder to load to Teradata. You can choose to insert, update, upsert, or delete data.
Additionally, Teradata Warehouse Builder can simultaneously read from multiple sources
and load data into one or more tables.
♦ Mixed mode processing for Teradata external loaders. You can now use data driven load
mode with Teradata external loaders. When you select data driven loading, the
PowerCenter Server flags rows for insert, delete, or update. It writes a column in the target
file or named pipe to indicate the update strategy. The control file uses these values to
determine how to load data to the target.
♦ Concurrent processing. The PowerCenter Server now reads data concurrently from
sources within a target load order group. This enables more efficient joins with minimal
usage of memory and disk cache.
♦ Real time processing enhancements. You can now use real-time processing in sessions that
also process active transformations, such as the Aggregator transformation. You can apply
the transformation logic to rows defined by transaction boundaries.
Repository Server
♦ Object export and import enhancements. You can now export and import objects using
the Repository Manager and pmrep. You can export and import multiple objects and
object types. You can export and import objects with or without their dependent objects.
You can also export objects from a query result or object history.
♦ pmrep commands. You can use pmrep to perform change management tasks, such as
maintaining deployment groups and labels, checking in, deploying, importing, exporting,
and listing objects. You can also use pmrep to run queries. The deployment and object
import commands require you to use a control file to define options and resolve conflicts.
♦ Trusted connections. You can now use a Microsoft SQL Server trusted connection to
connect to the repository.
Security
♦ LDAP user authentication. You can now use default repository user authentication or
Lightweight Directory Access Protocol (LDAP) to authenticate users. If you use LDAP, the
repository maintains an association between your repository user name and your external
login name. When you log in to the repository, the security module passes your login name
to the external directory for authentication. The repository maintains a status for each
user. You can now enable or disable users from accessing the repository by changing the
status. You do not have to delete user names from the repository.
♦ Use Repository Manager privilege. The Use Repository Manager privilege allows you to
perform tasks in the Repository Manager, such as copying objects, maintaining labels, and changing
object status. You can perform the same tasks in the Designer and Workflow Manager if
you have the Use Designer and Use Workflow Manager privileges.
♦ Audit trail. You can track changes to repository users, groups, privileges, and permissions
through the Repository Server Administration Console. The Repository Agent logs
security changes to a log file stored in the Repository Server installation directory. The
audit trail log contains information, such as changes to folder properties, adding or
removing a user or group, and adding or removing privileges.
Transformations
♦ Custom transformation. Custom transformations operate in conjunction with procedures
you create outside of the Designer interface to extend PowerCenter functionality. The
Custom transformation replaces the Advanced External Procedure transformation. You can
create Custom transformations with multiple input and output groups, and you can
compile the procedure with any C compiler.
You can create templates that customize the appearance and available properties of a
Custom transformation you develop. You can specify the icons used for transformation,
the colors, and the properties a mapping developer can modify. When you create a Custom
transformation template, distribute the template with the DLL or shared library you
develop.
♦ Joiner transformation. You can use the Joiner transformation to join two data streams that
originate from the same source.
Version Control
The PowerCenter Client and repository introduce features that allow you to create and
manage multiple versions of objects in the repository. Version control allows you to maintain
multiple versions of an object, control development on the object, track changes, and use
deployment groups to copy specific groups of objects from one repository to another. Version
control in PowerCenter includes the following features:
♦ Object versioning. Individual objects in the repository are now versioned. This allows you
to store multiple copies of a given object during the development cycle. Each version is a
separate object with unique properties.
♦ Check out and check in versioned objects. You can check out and reserve an object you
want to edit, and check in the object when you are ready to create a new version of the
object in the repository.
♦ Compare objects. The Repository Manager and Workflow Manager allow you to compare
two repository objects of the same type to identify differences between them. You can
compare Designer objects and Workflow Manager objects in the Repository Manager. You
can compare tasks, sessions, worklets, and workflows in the Workflow Manager. The
PowerCenter Client tools allow you to compare objects across open folders and
repositories. You can also compare different versions of the same object.
♦ Delete or purge a version. You can delete an object from view and continue to store it in
the repository. You can recover or undelete deleted objects. If you want to permanently
remove an object version, you can purge it from the repository.
♦ Deployment. Unlike copying a folder, copying a deployment group allows you to copy a
select number of objects from multiple folders in the source repository to multiple folders
in the target repository. This gives you greater control over the specific objects copied from
one repository to another.
♦ Deployment groups. You can create a deployment group that contains references to
objects from multiple folders across the repository. You can create a static deployment
group that you manually add objects to, or create a dynamic deployment group that uses a
query to populate the group.
♦ Labels. A label is an object that you can apply to versioned objects in the repository. This
allows you to associate multiple objects in groups defined by the label. You can use labels
to track versioned objects during development, improve query results, and organize groups
of objects for deployment or export and import.
♦ Queries. You can create a query that specifies conditions to search for objects in the
repository. You can save queries for later use. You can make a private query, or you can
share it with all users in the repository.
♦ Track changes to an object. You can view a history that includes all versions of an object
and compare any version of the object in the history to any other version. This allows you
to see the changes made to an object over time.
XML Support
PowerCenter contains XML features that allow you to validate an XML file against an XML
schema, declare multiple namespaces, use XPath to locate XML nodes, increase performance
for large XML files, format your XML file output for increased readability, and parse or
generate XML data from various sources. XML support in PowerCenter includes the
following features:
♦ XML schema. You can use an XML schema to validate an XML file and to generate source
and target definitions. XML schemas allow you to declare multiple namespaces so you can
use prefixes for elements and attributes. XML schemas also allow you to define some
complex datatypes.
♦ XPath support. The XML wizard allows you to view the structure of an XML schema. You
can use XPath to locate XML nodes.
♦ Increased performance for large XML files. When you process an XML file or stream, you
can set commits and periodically flush XML data to the target instead of writing all the
output at the end of the session. You can choose to append the data to the same target file
or create a new target file after each flush.
♦ XML target enhancements. You can format the XML target file so that you can easily view
the XML file in a text editor. You can also configure the PowerCenter Server to not output
empty elements to the XML target.
Usability
♦ Copying objects. You can now copy objects from all the PowerCenter Client tools using
the copy wizard to resolve conflicts. You can copy objects within folders, to other folders,
and to different repositories. Within the Designer, you can also copy segments of
mappings to a workspace in a new folder or repository.
♦ Comparing objects. You can compare workflows and tasks from the Workflow Manager.
You can also compare all objects from within the Repository Manager.
♦ Change propagation. When you edit a port in a mapping, you can choose to propagate
changed attributes throughout the mapping. The Designer propagates ports, expressions,
and conditions based on the direction that you propagate and the attributes you choose to
propagate.
♦ Enhanced partitioning interface. The Session Wizard is enhanced to provide a graphical
depiction of a mapping when you configure partitioning.
♦ Revert to saved. You can now revert to the last saved version of an object in the Workflow
Manager. When you do this, the Workflow Manager accesses the repository to retrieve the
last-saved version of the object.
♦ Enhanced validation messages. The PowerCenter Client writes messages in the Output
window that describe why it invalidates a mapping or workflow when you modify a
dependent object.
♦ Validate multiple objects. You can validate multiple objects in the repository without
fetching them into the workspace. You can save and optionally check in objects that
change from invalid to valid status as a result of the validation. You can validate sessions,
mappings, mapplets, workflows, and worklets.
♦ View dependencies. Before you edit or delete versioned objects, such as sources, targets,
mappings, or workflows, you can view dependencies to see the impact on other objects.
You can view parent and child dependencies and global shortcuts across repositories.
Viewing dependencies helps you modify objects and composite objects without breaking
dependencies.
♦ Refresh session mappings. In the Workflow Manager, you can refresh a session mapping.
About Informatica Documentation
The complete set of documentation for PowerCenter includes the following books:
♦ Data Profiling Guide. Provides information about how to profile PowerCenter sources to
evaluate source data and detect patterns and exceptions.
♦ Designer Guide. Provides information needed to use the Designer. Includes information to
help you create mappings, mapplets, and transformations. Also includes a description of
the transformation datatypes used to process and transform source data.
♦ Getting Started. Provides basic tutorials for getting started.
♦ Installation and Configuration Guide. Provides information needed to install and
configure the PowerCenter tools, including details on environment variables and database
connections.
♦ PowerCenter Connect® for JMS® User and Administrator Guide. Provides information
to install PowerCenter Connect for JMS, build mappings, extract data from JMS messages,
and load data into JMS messages.
♦ Repository Guide. Provides information needed to administer the repository using the
Repository Manager or the pmrep command line program. Includes details on
functionality available in the Repository Manager and Administration Console, such as
creating and maintaining repositories, folders, users, groups, and permissions and
privileges.
♦ Transformation Language Reference. Provides syntax descriptions and examples for each
transformation function provided with PowerCenter.
♦ Transformation Guide. Provides information on how to create and configure each type of
transformation in the Designer.
♦ Troubleshooting Guide. Lists error messages that you might encounter while using
PowerCenter. Each error message includes one or more possible causes and actions that
you can take to correct the condition.
♦ Web Services Provider Guide. Provides information you need to install and configure the Web
Services Hub. This guide also provides information about how to use the web services that the
Web Services Hub hosts. The Web Services Hub hosts Real-time Web Services, Batch Web
Services, and Metadata Web Services.
♦ Workflow Administration Guide. Provides information to help you create and run
workflows in the Workflow Manager, as well as monitor workflows in the Workflow
Monitor. Also contains information on administering the PowerCenter Server and
performance tuning.
♦ XML User Guide. Provides information you need to create XML definitions from XML,
XSD, or DTD files, and relational or other XML definitions. Includes information on
running sessions with XML data. Also includes details on using the midstream XML
transformations to parse or generate XML data within a pipeline.
About this Book
The XML User Guide is written for IS developers and software engineers responsible for
working with XML in a data warehouse environment. Before you use the XML User Guide,
ensure that you have a solid understanding of XML concepts, your operating systems, and the
flat files or mainframe systems in your environment. Also, ensure that you are familiar with the
interface requirements for your supporting applications.
The material in this book is available for online use.
Document Conventions
This guide uses the following formatting conventions:
♦ italicized monospaced text. This is the variable name for a value you enter as part of an
operating system command. This is generic text that should be replaced with user-supplied
values.
♦ Warning: The following paragraph notes situations where you can overwrite or corrupt
data, unless you follow the specified procedure.
♦ bold monospaced text. This is an operating system command you enter from a prompt to
run a task.
Other Informatica Resources
In addition to the product manuals, Informatica provides these other resources:
♦ Informatica Customer Portal
♦ Informatica Webzine
♦ Informatica web site
♦ Informatica Developer Network
♦ Informatica Technical Support
The site contains information on how to create, market, and support customer-oriented add-on
solutions based on Informatica’s interoperability interfaces.
Belgium
Phone: +32 15 281 702
Hours: 9 a.m. - 5:30 p.m. (local time)
France
Phone: +33 1 41 38 92 26
Hours: 9 a.m. - 5:30 p.m. (local time)
Germany
Phone: +49 1805 702 702
Hours: 9 a.m. - 5:30 p.m. (local time)
Netherlands
Phone: +31 306 082 089
Hours: 9 a.m. - 5:30 p.m. (local time)
Singapore
Phone: +65 322 8589
Hours: 9 a.m. - 5 p.m. (local time)
Switzerland
Phone: +41 800 81 80 70
Hours: 8 a.m. - 5 p.m. (local time)
Chapter 1
XML Concepts
Overview
XML (Extensible Markup Language) is a flexible way to create common information formats
and share both the format and the data on the world wide web, intranets, and between
applications.
XML is similar to HTML because XML and HTML contain markup symbols to describe the
contents of a page or file. XML describes the content in terms of the data it holds. HTML, however,
describes how the content of a web page, such as the text and graphics, is displayed.
You can import XML definitions into PowerCenter from XML files, DTD files, and XML
schema files.
♦ An XML file contains data, and it can reference a Document Type Definition (DTD) or an
XML schema to describe the data.
♦ A Document Type Definition (DTD) file defines the element types, attributes, and
entities in an XML file. It provides some constraints on the XML document structure but
it does not contain any data.
♦ An XML schema file is called an XML Schema Definition (XSD). A schema file defines
elements and attributes, and it contains a description of the type of elements and attributes
in the associated XML file. Schemas contain simple and complex types. A simple type is an
XML element or attribute that can contain only text. A complex type is an XML element
that contains other elements and attributes.
In XML schemas, you can create element and attribute groups that you can reference
throughout a schema. You can also create substitution groups to substitute one element with
another element in the XML instance document.
<chapter>
<heading>Using DTD Files</heading>
<heading>Fun with Schemas</heading>
</chapter>
</book>
Book is the root element and it contains the title and chapter elements. Book is the parent
element of title and chapter, and chapter is the parent of heading. Title and chapter are sibling
elements because they have the same parent.
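The parent, child, and sibling relationships described above can be checked programmatically. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. The title text is hypothetical sample data added to make the document complete, and the helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the book example. The <title> text is
# hypothetical and is not taken from the guide.
BOOK_XML = """
<book>
  <title>Sample Title</title>
  <chapter>
    <heading>Using DTD Files</heading>
    <heading>Fun with Schemas</heading>
  </chapter>
</book>
"""

root = ET.fromstring(BOOK_XML.strip())

# Map every element to its parent so relationships can be looked up.
parent_of = {child: parent for parent in root.iter() for child in parent}


def siblings(element):
    """Return the tags of elements that share a parent with `element`."""
    parent = parent_of.get(element)
    if parent is None:
        return []
    return [e.tag for e in parent if e is not element]


title = root.find("title")
chapter = root.find("chapter")
headings = chapter.findall("heading")

root_tag = root.tag
title_parent = parent_of[title].tag
heading_parent = parent_of[headings[0]].tag
title_siblings = siblings(title)
```

ElementTree does not store parent links, which is why the sketch builds the parent_of lookup before asking about siblings.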
An element can have attributes that provide additional information about the element. In the
example below, the attribute graphic_type describes the content of file:
<file graphic_type="gif">computer.gif</file>
Figure 1-1 shows the structure, elements, and attributes in an XML file:
[Figure 1-1 callouts: enclosure element, element tags, element data, attribute value, attribute tag]
An XML file is a hierarchical structure. The XML hierarchy may contain the following
elements:
♦ Child element. An element contained within another element.
♦ Enclosure element. An element that contains other elements but does not contain data. It
can include other enclosure elements.
♦ Global element. An element that is a direct child of the root element. You can reference
global elements throughout an XML schema.
♦ Leaf element. An element that does not contain other elements. The lowest level element
in the XML hierarchy.
♦ Local element. An element that is nested in another element. You cannot reuse local
elements outside of the context of the parent element.
♦ Multiple-occurring element. An element that occurs more than once within its parent
element. Enclosure elements can be multiple-occurring elements.
The figure callouts illustrate these elements in a sample Store hierarchy:
♦ Enclosure element. Element Address encloses elements StreetAddress, City, State, and Zip.
Element Address is also a parent element.
♦ Leaf element. Element Zip, along with all its sibling elements, is the lowest level element
within element Address.
♦ Multiple-occurring element. Element Sales occurs more than once within element Product.
♦ Single-occurring element. Element PName occurs only once within element Product.
♦ Child element. Element PName is a child of Product, which is a descendant of Store.
♦ Parent chain. Element YTDSales is a child of element Sales, which is a child of element
Product, which is a child of root element Store. All these elements belong in the same
parent chain.
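The element categories can be derived mechanically from a document. The sketch below, in Python with the standard xml.etree.ElementTree module, classifies the elements of a small Store document. Only the element names come from the callouts; the data values and the classify function are hypothetical, and the rules are a simplified reading of the definitions in this section.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data; only the element names come from the figure callouts.
STORE_XML = """
<Store>
  <Address>
    <StreetAddress>1 Main Street</StreetAddress>
    <City>Springfield</City>
    <State>CA</State>
    <Zip>90000</Zip>
  </Address>
  <Product>
    <PName>Widget</PName>
    <Sales><YTDSales>100</YTDSales></Sales>
    <Sales><YTDSales>250</YTDSales></Sales>
  </Product>
</Store>
"""

root = ET.fromstring(STORE_XML.strip())


def classify(root):
    """Group element names into leaf, enclosure, and multiple-occurring sets."""
    leaf, enclosure, multiple = set(), set(), set()
    for element in root.iter():
        children = list(element)
        if children:
            # Contains other elements; counted as an enclosure element
            # when it carries no text data of its own.
            if not (element.text or "").strip():
                enclosure.add(element.tag)
        else:
            leaf.add(element.tag)
        # A child name that repeats under the same parent is multiple-occurring.
        counts = Counter(child.tag for child in children)
        multiple.update(tag for tag, n in counts.items() if n > 1)
    return leaf, enclosure, multiple


leaf, enclosure, multiple = classify(root)
```

Note that an instance document can only show that an element did repeat. Whether an element is allowed to repeat is stated in the DTD or schema, not in the data.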
A valid XML file conforms to the structure defined in an associated DTD or schema file. The
DOCTYPE declaration in an XML file references the location and name of a DTD file. It
also names the root element for the XML file. For example, the following DOCTYPE
declaration specifies the note.dtd file:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM
"http://www.w3schools.com/dtd/note.dtd">
<note>
<body>XML Data</body>
</note>
The schemaLocation declaration references the location and name of a schema. The following
XML file references an external schema:
<?xml version="1.0"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
<body>XML Data</body>
</note>
XML files contain an encoding declaration that indicates the code page in the file. The most
common code pages in XML are UTF-8 and UTF-16.
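As a small illustration of the encoding declaration, the following Python sketch serializes the same note document in UTF-8 and in UTF-16 and parses both with the standard xml.etree.ElementTree module. The sample text and the build helper are hypothetical. The sketch shows only how a general-purpose XML parser uses the declaration, not how the PowerCenter Server handles code pages.

```python
import xml.etree.ElementTree as ET

TEXT = "Café"  # hypothetical non-ASCII sample data


def build(encoding):
    """Serialize a small note document as bytes in the given encoding."""
    document = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        f"<note><body>{TEXT}</body></note>"
    )
    return document.encode(encoding)


utf8_bytes = build("UTF-8")
utf16_bytes = build("UTF-16")

# The parser uses the declaration (and the byte order mark for UTF-16)
# to decode the bytes, so both documents yield the same text.
body_from_utf8 = ET.fromstring(utf8_bytes).findtext("body")
body_from_utf16 = ET.fromstring(utf16_bytes).findtext("body")
```

The two byte strings differ on disk, but the parsed content is identical because each file states its own encoding.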
DTD Elements
The following syntax describes a simple element in a DTD file:
<!ELEMENT product (#PCDATA)>
This DTD description defines the XML tag <product>. The description (#PCDATA) stands
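for parsed character data. Parsed character data is plain text without child elements. Character
data (CDATA) is text that is not parsed for child elements. In a DTD file, CDATA is declared as an
attribute type, not as element content, so (#CDATA) is not a valid content model.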
An element that contains child elements has the following syntax in a DTD file:
<!ELEMENT boat (brand, type) >
<!ELEMENT brand (#PCDATA) >
<!ELEMENT type (#PCDATA) >
The boat element has two child elements: brand and type. Each child element can contain
characters. In this example, brand and type must each occur exactly once, in that order, inside the
element boat. To change the number of possible occurrences, use one of the following occurrence
indicators:
♦ + must occur one or more times
♦ * may occur zero or more times
♦ ? may occur once or not at all
For example, to specify that color must occur one or more times for a boat:
<!ELEMENT boat (color+) >
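One way to see how the occurrence indicators behave is to translate a content model into a regular expression over the child element names. The Python sketch below does this for element-only content models such as (brand, type) or (color+). The function names are hypothetical, the sketch does not handle #PCDATA or mixed content, and it is not how PowerCenter or a validating parser is implemented.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a regular expression.

    Each element name becomes a group that matches "name;", so the DTD
    occurrence indicators (+, *, ?) and the choice bar (|) keep their meaning.
    """
    compact = re.sub(r"\s+", "", model)
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    # Commas only separate the items of a sequence, so drop them.
    return re.compile(pattern.replace(",", ""))


def children_match(model, child_names):
    """Return True if the ordered child names satisfy the content model."""
    encoded = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None
```

For example, children_match("(color+)", ["color", "color"]) is true, while an empty child list fails because + requires at least one occurrence.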
DTD Attributes
The following syntax describes an attribute in a DTD file:
<!ATTLIST element_name attribute_name attribute_type "default_value">
− #FIXED. The XML document must use the default value from the DTD file. A valid
XML file can contain the same attribute value as the DTD, or it can have no attribute
value. You must specify a default value with this option. For example:
<!ATTLIST product product_name CDATA #FIXED "vacuum">
[Figure callouts: element, element list, element occurrence, attribute name, element name, attribute,
attribute type and null constraint, element datatype, element data, element list and occurrence]
When you use a schema to define an XML file, you can restrict data, define data formats, and
convert data between datatypes.
Namespace
An XML namespace identifies groups of elements. It can identify elements and attributes
combined from different XML documents into one file. For example, you can distinguish
meanings for the element table by declaring different namespaces, such as math:table and
furniture:table.
A namespace contains a Uniform Resource Identifier (URI) to identify schema location. A
URI is a string of characters that identifies an internet resource. It is an abstraction of a URL.
A URL locates a resource, but a URI identifies a resource.
The DTD or schema file does not have to exist at the location you specify in the URI. The
URI distinguishes between elements with the same name from different locations.
You can declare a namespace at the root level of an XML instance document, or you can
declare a namespace inside any element in an XML structure. A namespace declaration
appears in an instance document as an attribute that starts with xmlns. XML is case-sensitive,
so the name Math:table is different from the name math:table.
When you declare multiple namespaces in the same instance document, you use prefixes to
associate an element to a namespace. Prefixes follow the namespace attribute and are declared
as xmlns:<prefix>. You can create a prefix name of any length, however, short prefixes are more
<math:table> </math:table>
<furniture:table> </furniture:table>
</example>
If you associate a prefix with an attribute, the PowerCenter Server associates the attribute with
a namespace. If you do not associate a prefix, the PowerCenter Server uses the default
namespace. An element cannot have two attributes with the same name unless the attributes
have different qualified names.
Qualified names are names that contain a namespace name. You create a qualified name by
using a prefix that is mapped to a namespace or by declaring a default namespace for an
element.
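The effect of prefixes and qualified names can be seen by parsing a small instance document. The following Python sketch uses the standard xml.etree.ElementTree module. Both namespace URIs are hypothetical placeholders, and the short prefixes used for the lookup are arbitrary, which illustrates that the URI, not the prefix, identifies the namespace.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical placeholders. Nothing has to
# exist at the location that a namespace URI names.
EXAMPLE_XML = """
<example xmlns:math="http://example.com/math"
         xmlns:furniture="http://example.com/furniture">
  <math:table>rows and columns</math:table>
  <furniture:table>four legs</furniture:table>
</example>
"""

root = ET.fromstring(EXAMPLE_XML.strip())

# ElementTree expands each prefix to its URI, so a qualified name is
# reported as "{namespace-uri}local-name".
qualified_names = [child.tag for child in root]

# Look up each table by namespace URI; the prefixes here are local aliases.
namespaces = {
    "m": "http://example.com/math",
    "f": "http://example.com/furniture",
}
math_table = root.find("m:table", namespaces)
furniture_table = root.find("f:table", namespaces)
```

The two table elements share a local name but resolve to different qualified names, so they can appear in the same document without conflict.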
Each schema in the repository contains the default prefix for elements and attributes. For one
XML source or XML target, the list of prefixes and their namespaces should be unique. If a
duplicate prefix appears in the same instance document, the Designer makes the prefix unique
before storing it in the repository.
The following is an example of a common schema declaration:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3XML.com"
xmlns="http://www.w3XML.com"
elementFormDefault="qualified">...
...</xs:schema>
The fragment
targetNamespace="http://www.w3XML.com"
describes the namespace for the schema itself. The simple and complex datatypes that the
schema defines come from the "http://www.w3XML.com" namespace.
The fragment
xmlns:xs="http://www.w3.org/2001/XMLSchema"
indicates that the native XML schema elements and datatypes the schema uses come from the
“http://www.w3.org/2001/XMLSchema” namespace. The elements and datatypes that come
from the “http://www.w3.org/2001/XMLSchema” namespace have a prefix of xs:.
The fragment
xmlns="http://www.w3XML.com"
indicates that the default namespace is "http://www.w3XML.com".
The fragment
elementFormDefault="qualified"
indicates that any element that this schema declares must be namespace qualified in an XML
instance document.
Name
In an XML file, each tag is the name of an element or attribute. In a DTD file, the tag
<!ELEMENT> indicates the name of an element and the tag <!ATTLIST> indicates the set of
attributes for an element. In a schema file, <element name> indicates the name of an element
and <attribute name> indicates the name of an attribute.
When you import XML definitions into PowerCenter, the element tags become the column
names in the PowerCenter definition by default.
Datatype
The XML schema language has over 40 built-in datatypes, including numeric, string, time,
XML, and binary. These datatypes are called simple types. They contain text but no other
elements and attributes. You can derive new simple types from the basic XML simple types.
For more information about XML datatypes, see the W3C specifications for XML datatypes
at http://www.w3.org/TR/xmlschema-2.
You can create complex XML datatypes. A complex datatype is a datatype that can contain
more than one simple type. It can also contain other complex types and attributes. For
more information about simple and complex datatypes, see “Complex Types” on page 19.
XML files and DTD files do not store datatypes. When you import a source or target
definition from an XML file without an associated DTD or an XML schema, the Designer
assigns either a numeric or a string datatype to the elements. For information on how
PowerCenter uses XML datatypes, see “XML and Transformation Datatypes” on page 170.
Hierarchy
An XML document models a hierarchical database rather than a relational database. The
position of an element in an XML hierarchy represents its relationship to other elements. For
example, an element can contain child elements, and elements can inherit characteristics from
other elements.
When you import an XML definition in PowerCenter, the Designer creates a schema in the
repository. The schema models the hierarchy of the file you import. The Designer validates
the XML definition you create against the hierarchy of this schema.
Absolute Cardinality
The absolute cardinality of an element is the number of times an element occurs within its
parent element in an XML hierarchy. DTD and XML schema files explicitly describe the
absolute cardinality of elements within the hierarchy. A DTD file uses symbols, and an XML
schema file uses the minOccurs and maxOccurs attributes to describe the absolute
cardinality of an element.
For example, an element has an absolute cardinality of Once(1) if it occurs once within its
parent element. However, it might occur many times within an XML hierarchy if its parent
element has a cardinality of one or more(+).
Table 1-1 describes the way DTD and XML schema files represent cardinality:
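As a rough illustration of that correspondence, the Python sketch below pairs each DTD occurrence symbol with the equivalent minOccurs and maxOccurs values. The dictionary and function names are hypothetical. The value pairs follow the DTD and XML schema specifications rather than any PowerCenter interface.

```python
# DTD occurrence symbol -> (minOccurs, maxOccurs) in an XML schema.
# An empty string means the element name carries no symbol in the DTD.
DTD_TO_SCHEMA = {
    "": (1, 1),             # exactly once
    "?": (0, 1),            # zero or one time
    "*": (0, "unbounded"),  # zero or more times
    "+": (1, "unbounded"),  # one or more times
}


def schema_occurrence(dtd_symbol):
    """Return the schema attributes that match a DTD occurrence symbol."""
    min_occurs, max_occurs = DTD_TO_SCHEMA[dtd_symbol]
    return f'minOccurs="{min_occurs}" maxOccurs="{max_occurs}"'
```

A schema can also express ranges that a DTD symbol cannot, such as minOccurs="1" with maxOccurs="25", so the mapping runs cleanly in one direction only.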
Relative Cardinality
Relative cardinality is the relationship of an element to another element in the XML
hierarchy. An element can have a one-to-one, one-to-many, or many-to-many relationship to
another element in the hierarchy.
An element has a one-to-one relationship with another element if for every occurrence of one
element there is one occurrence of the other element. For example, an employee element can
have only one social security number element. Employee and social security number have a
one-to-one relationship.
An element has a one-to-many relationship with another element if for every occurrence of
one element, there can be multiple occurrences of the other element. For example, an
employee element might have multiple email elements.
An element has a many-to-many relationship with another element if an XML file can have
multiple occurrences of both elements. For example, an employee might have multiple email
addresses and multiple street addresses.
The figure callouts give an example of each relationship:
♦ One-to-many relationship. For every occurrence of SNAME, there can be many occurrences
of ADDRESS and, therefore, many occurrences of CITY.
♦ Many-to-many relationship. There can be multiple occurrences of STATE and multiple
occurrences of YTDSALES.
♦ One-to-one relationship. For every occurrence of PNAME, there is also one occurrence of
PPRICE.
Null Constraint
The absolute cardinality of an element determines its null constraint. An element that has an
absolute cardinality of one or more(+) cannot have null values, but an element with a
cardinality of zero or more(*) can have null values. An attribute marked as fixed or required in
an XML schema or DTD file cannot have null values, but an implied attribute can have null
values. For more information about default, fixed, and implied attributes, see “DTD
Attributes” on page 7.
When you import an XML definition, the Designer sets the null constraints for the columns
depending on the absolute cardinality of the element or attribute the column points to. For
more information about absolute cardinality, see “Cardinality” on page 14.
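The null constraint rules above reduce to a short lookup. The Python sketch below is an assumption-laden summary, not Designer logic: it generalizes the element rule to "nullable when the minimum number of occurrences is zero", and the names are hypothetical.

```python
def element_allows_null(min_occurs):
    """An element can be null only if it is allowed to be absent.

    Assumption: this generalizes the text, which names one or more (+)
    as not nullable and zero or more (*) as nullable.
    """
    return min_occurs == 0


# Attribute null constraints as described in the text. Attributes declared
# with a plain default value are not covered there, so they are left out
# rather than guessed.
ATTRIBUTE_ALLOWS_NULL = {
    "#REQUIRED": False,
    "#FIXED": False,
    "#IMPLIED": True,
}
```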
Simple Types
A simple type is an XML element or attribute that can contain only text. A simple type is
indivisible and it forms a column in an XML definition. Simple types cannot have attributes,
but attributes are simple types.
PowerCenter supports the following simple types:
♦ Atomic types
♦ Lists
♦ Unions
Atomic Types
An atomic datatype is a basic datatype in an XML schema definition such as a boolean, string,
integer, decimal, or date. You can define custom atomic datatypes by adding restrictions to an
atomic datatype in order to limit the content.
A facet is a specification of the values that are allowed or not allowed in a restriction. Facets
can specify minimum or maximum data values, a list of the legal values, or a data pattern. The
base attribute for the element specifies the datatype that the facet restricts.
The pattern facet restricts an element to an expression. The following example restricts a
string to one lowercase letter from a to z:
<xs:element name="letter">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[a-z]"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
The enumeration facet lists all allowed values for an element. The following example restricts
a string to “a”, “b”, or “c”.
<xs:element name="letter">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="a"/>
<xs:enumeration value="b"/>
<xs:enumeration value="c"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
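The two facets can be mimicked in a few lines of Python to show what each one accepts. This is only a sketch with hypothetical function names. Python regular expression syntax is not identical to the XML schema pattern syntax, although the two agree for a simple character class such as [a-z].

```python
import re


def matches_pattern_facet(value, pattern=r"[a-z]"):
    """Pattern facet: the whole value must match the expression."""
    return re.fullmatch(pattern, value) is not None


def matches_enumeration_facet(value, allowed=("a", "b", "c")):
    """Enumeration facet: the value must be one of the listed values."""
    return value in allowed
```

The sketch uses a full match rather than a search because a schema pattern must match the entire value, so "ab" fails the [a-z] pattern even though it contains lowercase letters.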
PowerCenter stores a list as one string that contains all array elements. It does not parse the
respective simple types from the string.
Unions
A union type is a combination of one or more atomic or list types that map to one simple type
in an instance document. When you define a union type, you specify what types to combine.
For example, you might create a type called Size. Size can include string data, such as S, M,
and L, or size might contain decimal sizes, such as 30, 32, and 34. If you define a union type
element, the XML file can include a sizename type for string sizes, and a sizenum type for
numeric sizes.
Sizename is a restricted
string type.
The union defines sizenames and sizenums as member types. Sizenames defines a list of string
values. Sizenums defines a list of decimal values. When you import an XML schema
containing this union, the Designer creates the port using a compatible datatype between the
types. In this case, the Designer creates the port as a string datatype because it can import
both strings and decimals as strings.
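The idea of a union can be sketched as "a value is valid if any member type accepts it". The Python example below models the two member types for the Size example. The function names and the set of size names are hypothetical, and the check is far simpler than real schema validation.

```python
from decimal import Decimal, InvalidOperation

SIZE_NAMES = {"S", "M", "L"}  # sample string sizes from the text


def is_size_name(value):
    """Member type for named sizes: a restricted string."""
    return value in SIZE_NAMES


def is_size_number(value):
    """Member type for numeric sizes: any finite decimal number."""
    try:
        return Decimal(value).is_finite()
    except InvalidOperation:
        return False


def is_valid_size(value):
    """Union type: the value is valid if any member type accepts it."""
    return is_size_name(value) or is_size_number(value)
```

Every input to is_valid_size is a string, which mirrors the reason a string port is the compatible choice: both "M" and "32" can be carried as strings, but "M" cannot be carried as a decimal.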
Complex Types
Complex types aggregate collections of simple types into a logical unit. For example, the
Customer type might include the customer number, name, street address, town, city, and zip
code. Complex types can also reference other complex types or element and attribute groups.
XML supports complex type inheritance. If you define a complex type, you can create other
complex types that inherit the components of the base type. In a type relationship, the base
type is the complex type from which you derive another type. A derived type is a complex
type that inherits elements from the base type.
[Figure callouts: extended complex type, restricted complex type, element reference]
The base type is PublicationType. BookType extends the PublicationType and includes the
ISBN and Publisher elements.
Publication_Minimum restricts PublicationType. Publication_Minimum requires between 1
and 25 Authors, and restricts the date to just the year.
If you substitute the ANY type element for a complex type containing childnames, the XML
file could contain the following data:
<person>
<firstname>Danny</firstname>
<lastname>Russell</lastname>
<children>
<childname>Cissy</childname>
<childname>Cory</childname>
</children>
</person>
Abstract Elements
The abstract attribute specifies that an element cannot occur directly in an XML document. If
you set the abstract attribute to true, an XML instance document must contain a derived
element instead of the abstract type. The abstract attribute default value is false.
For example, the following schema specifies PublicationType as abstract. In an XML
document that references this schema, you must use the derived type, BookType. BookType
inherits the elements in PublicationType, but also includes ISBN and Publisher elements.
<xsd:complexType name="PublicationType" abstract="true">
<xsd:sequence>
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="Date" type="xsd:gYear"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="BookType">
<xsd:complexContent>
<xsd:extension base="PublicationType">
<xsd:sequence>
<xsd:element name="ISBN" type="xsd:string"/>
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
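As an illustration of what "use the derived type" can look like in an instance document, the following Python sketch parses a small document that names BookType through the standard xsi:type attribute. The Publication element name and all data values are assumptions made for this example; the guide does not show the element declaration. The script only reads the attribute and does not validate against the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document: the Publication element name and the values are invented.
INSTANCE = f"""
<Publication xmlns:xsi="{XSI}" xsi:type="BookType">
  <Title>XML Basics</Title>
  <Date>2004</Date>
  <ISBN>0000000000</ISBN>
  <Publisher>Example Press</Publisher>
</Publication>
"""

publication = ET.fromstring(INSTANCE)

# ElementTree exposes namespaced attributes as "{namespace}localname".
declared_type = publication.get(f"{{{XSI}}}type")
child_names = [child.tag for child in publication]

print(declared_type)
print(child_names)
```

The inherited Title and Date elements come first, followed by the ISBN and Publisher elements that BookType adds.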
The following example shows the schema syntax for an attribute group:
<xs:attributeGroup name="Songs">
<xs:attribute name="songTitle" type="xs:string" />
<xs:attribute name="artist" type="xs:string" />
<xs:attribute name="publisher" type="xs:string" />
</xs:attributeGroup>
You can create the following element and attribute groups that have constraints:
♦ Sequence group. All elements in an XML file must occur in the order that the schema lists
them. For example, OrderHeader requires the customerName first, then orderNumber,
and then orderDate:
<xs:group name="OrderHeader">
<xs:sequence>
<xs:element name="customerName" type="xs:string" />
<xs:element name="orderNumber" type="xs:integer" />
<xs:element name="orderDate" type="xs:date" />
</xs:sequence>
</xs:group>
♦ Choice group. Only one element in the group can occur in an XML document. For
example, only one element in the CustomerInfo group can occur in an XML document:
<xs:group name="CustomerInfo">
<xs:choice>
<xs:element name="customerName" type="xs:string" />
<xs:element name="customerID" type="xs:integer" />
<xs:element name="customerNumber" type="xs:integer" />
</xs:choice>
</xs:group>
♦ All group. All elements must occur in the XML document or none at all. The elements can
occur in any order. For example, CustomerInfo requires all or none of the three elements:
<xs:group name="CustomerInfo">
<xs:all>
<xs:element name="customerName" type="xs:string" />
<xs:element name="customerAddress" type="xs:string" />
<xs:element name="customerPhone" type="xs:string" />
</xs:all>
</xs:group>
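The three constraints above can be pictured with a few simplified checks. The following Python sketch mirrors the descriptions in this section only: it compares a list of child element names against the declared names, ignores minOccurs, maxOccurs, and nested groups, and is not a schema validator. The function and constant names are hypothetical.

```python
# Declared element names, taken from the three group examples above.
ORDER_HEADER = ["customerName", "orderNumber", "orderDate"]           # sequence group
CUSTOMER_CHOICE = ["customerName", "customerID", "customerNumber"]    # choice group
CUSTOMER_ALL = ["customerName", "customerAddress", "customerPhone"]   # all group

def satisfies_sequence(children, declared):
    """Every declared element appears once, in the declared order."""
    return list(children) == list(declared)

def satisfies_choice(children, declared):
    """Exactly one of the declared elements appears."""
    return len(children) == 1 and children[0] in declared

def satisfies_all(children, declared):
    """All declared elements appear once each in any order, or none appear."""
    return len(children) == 0 or sorted(children) == sorted(declared)

print(satisfies_sequence(["customerName", "orderNumber", "orderDate"], ORDER_HEADER))
print(satisfies_sequence(["orderNumber", "customerName", "orderDate"], ORDER_HEADER))
print(satisfies_choice(["customerID"], CUSTOMER_CHOICE))
print(satisfies_all(["customerPhone", "customerName", "customerAddress"], CUSTOMER_ALL))
print(satisfies_all(["customerName"], CUSTOMER_ALL))
```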
Substitution Groups
Substitution groups allow you to replace one element with another in an XML instance
document. For example, if you have addresses from Canada and the United States, you might
want to create an address type for Canada and another type for the United States. You can
then create a substitution group that accepts either type of address.
The following schema section shows an Address base type and the derived types
CAN_Address and USA_Address. CAN_Address adds Province and PostalCode, and
USA_Address adds State and ZIP. The MailAddress substitution group includes both address
types.
<xs:complexType name="Address">
<xs:sequence>
<xs:element name="Name" type="xs:string" />
<xs:element name="Street" type="xs:string"
minOccurs="1" maxOccurs="3" />
<xs:element name="City" type="xs:string" />
</xs:sequence>
</xs:complexType>
<xs:complexType name="CAN_Address">
<xs:complexContent>
<xs:extension base="Address">
<xs:sequence>
<xs:element name="Province" type="xs:string" />
<xs:element name="PostalCode" type="CAN_PostalCode"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="USA_Address">
<xs:complexContent>
<xs:extension base="Address">
<xs:sequence>
<xs:element name="State" type="USPS_StateCode" />
<xs:element name="ZIP" type="USPS_ZIP"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
For more information about using Substitution Groups with PowerCenter, see “Using
Substitution Groups in an XML Definition” on page 45.
XML Path
The XML Path (XPath) represents the position of an element or attribute in the XML
hierarchy. In an XML definition, the XPath lists the path from the root to an element or
attribute. Except for generated key columns, each column in a group has an XPath and refers
to an attribute or an element in the XML hierarchy.
The XPath ensures integrity in the group column definition. PowerCenter uses a slash (/) to
depict the XPath of a column.
Figure 1-9 shows the XPath for different elements and attributes in a hierarchy:
STORE
STORE/SNAME
STORE/ADDRESS/STREETADDRESS
STORE/PRODUCT/PNAME
STORE/PRODUCT/SALES/YTDSALES
STORE/PRODUCT/@PID
STORE/PRODUCT/SALES/@REGION
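The notation in this list can be reproduced with a short script. The following Python sketch walks a small, hypothetical STORE document and collects a slash-delimited path for each distinct element and attribute, marking attributes with @. Only the element and attribute names come from the list above; the data values are invented, and the sketch illustrates the notation rather than how the Designer derives XPaths.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the names are taken from the XPath list above.
SAMPLE = """
<STORE>
  <SNAME>Main Street Store</SNAME>
  <ADDRESS><STREETADDRESS>1 Main St</STREETADDRESS></ADDRESS>
  <PRODUCT PID="p1">
    <PNAME>Widget</PNAME>
    <SALES REGION="West"><YTDSALES>100</YTDSALES></SALES>
  </PRODUCT>
</STORE>
"""

def list_xpaths(element, parent_path=""):
    """Return the distinct paths of all elements and attributes, in document order."""
    path = f"{parent_path}/{element.tag}" if parent_path else element.tag
    paths = [path]
    # Attributes use the @ prefix, as in STORE/PRODUCT/@PID.
    paths += [f"{path}/@{name}" for name in element.attrib]
    for child in element:
        for child_path in list_xpaths(child, path):
            if child_path not in paths:
                paths.append(child_path)
    return paths

xpaths = list_xpaths(ET.fromstring(SAMPLE))
for xpath in xpaths:
    print(xpath)
```

The script also lists intermediate elements such as STORE/ADDRESS and STORE/PRODUCT, which the list above omits.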
Chapter 2
Overview
You can create an XML definition in PowerCenter from an XML file, DTD file, XML
schema, flat file definition, or relational table definition. When you create an XML
definition, the Designer extracts XML metadata from the file or definition you choose and it
creates a schema in the repository. The schema provides the structure from which you create
and validate the XML definition.
An XML definition can contain multiple groups. In an XML definition, groups are called
views. The relationship between elements in the XML hierarchy defines the relationship
between the views. When you generate an XML definition, the Designer creates views for
multiple-occurring elements and complex types in a schema. The relative cardinality of
elements in an XML hierarchy affects how PowerCenter creates views in an XML definition.
Relative cardinality determines if elements can be part of the same view.
The Designer relates the views in an XML definition by keys. Source definitions do not
require keys, but target views must have them. Each view has a primary key, which can
contain an XML element or a generated key. The Designer defines relationships between
views with foreign keys. A foreign key in a view points to the primary key of the other view in
the relationship.
When you generate an XML definition, you can create a hierarchical model or an entity
relationship model. If you create a hierarchical model, you create normalized or denormalized
views. A normalized hierarchy contains separate views for multiple-occurring elements. A
denormalized hierarchy has one group with duplicate data for multiple-occurring elements.
If you create an entity model, the Designer creates views for multiple-occurring and complex
types. You can use simple types such as lists, unions, and substitution groups. You can also
model inheritance and circular relationships in PowerCenter definitions.
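The difference between normalized and denormalized views can be sketched outside PowerCenter. The following Python example reads a hypothetical STORE document and builds both shapes as lists of rows. The key column names (PK_PRODUCT, FK_PRODUCT) and the data values are assumptions for illustration; the Designer chooses its own key names and the PowerCenter Server performs the actual extraction.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names follow the STORE/PRODUCT/SALES hierarchy.
SAMPLE = """
<STORE>
  <SNAME>Main Street Store</SNAME>
  <PRODUCT><PNAME>Widget</PNAME>
    <SALES><YTDSALES>100</YTDSALES></SALES>
    <SALES><YTDSALES>250</YTDSALES></SALES>
  </PRODUCT>
  <PRODUCT><PNAME>Gadget</PNAME>
    <SALES><YTDSALES>75</YTDSALES></SALES>
  </PRODUCT>
</STORE>
"""
store = ET.fromstring(SAMPLE)

# Normalized: one list of rows per view, related by generated keys.
product_view, sales_view = [], []
for pk, product in enumerate(store.findall("PRODUCT"), start=1):
    product_view.append({"PK_PRODUCT": pk, "PNAME": product.findtext("PNAME")})
    for sales in product.findall("SALES"):
        sales_view.append({"FK_PRODUCT": pk, "YTDSALES": sales.findtext("YTDSALES")})

# Denormalized: one group; parent values repeat on every SALES row.
denormalized = [
    {"SNAME": store.findtext("SNAME"),
     "PNAME": product.findtext("PNAME"),
     "YTDSALES": sales.findtext("YTDSALES")}
    for product in store.findall("PRODUCT")
    for sales in product.findall("SALES")
]

print(product_view)
print(sales_view)
print(denormalized)
```

In the denormalized rows, SNAME and PNAME are duplicated for each SALES occurrence, which is why that group needs no keys.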
Figure 2-2. The Root Element and XML Views in an XML Definition
Figure callouts: Root Element; Multiple-Occurring Elements
When you import an XML file, you do not need all of the XML data to create an XML
definition. You need enough data to accurately show the hierarchy of the XML file.
Informatica recommends that you import an XML file that is smaller than 100 KB. You can
reduce the size of a large XML file by deleting duplicate data elements.
The Designer can create an XML definition from an XML file that references an internal or
external DTD or an XML schema. The XML file uses a uniform resource identifier (URI) to
refer to the address of an external DTD or an XML schema. If an XML file has a reference to
a DTD or an XML schema on another machine, the machine that hosts the PowerCenter
Client must have access to the machine where the schema resides so that the Designer can
read it.
Figure 2-3. XML Definition From an XML File Referencing a DTD File
Address is a Base Type.
CAN_Postal_Code restricts a string to a pattern.
CAN_address extends Address.
The MailAddress element is an Address type. A derived type, Can_Address, inherits the
Name, City, and Street from the Address type, and it extends Address by adding a Province
and PostalCode. PostalCode is a simple type called CAN_Postal_Code.
Figure 2-5 shows an example XML definition that you might create from the schema if you
choose to import the schema with the default options:
The Can_Address group contains only the elements that are unique for its type. The group
does not contain the Name, Street, and City that it inherits from MailAddress.
STORE: Root
ADDRESS+: Multiple-occurring
PRODUCT*: Multiple-occurring
EMPLOYEE+: Multiple-occurring
SALES*: Multiple-occurring
STORE View
ADDRESS View
PRODUCT View
SALES View
EMPLOYEE View
Figure 2-9 shows a data preview for each view in the source definition:
ADDRESS View
PRODUCT View
SALES View
EMPLOYEE View
STORE: Root
PRODUCT*: Multiple-occurring
SALES*: Multiple-occurring
Because the multiple-occurring elements have a one-to-many relationship, the Designer can
create a single denormalized group that includes all elements.
Figure 2-11 shows the denormalized group for ProdAndSales.dtd in a source definition. The
group does not need a primary or foreign key.
The Designer creates a single group for all the elements in the
ProdAndSales hierarchy. Because a DTD file does not define
datatypes, the Designer assigns a datatype of string to all columns.
For an example of a denormalized group in a mapping, see “Adding an XML Source Qualifier
to a Mapping” on page 123.
For more information about generating hierarchy relationships using the XML Wizard, see
“Generating Hierarchy Relationships” on page 64.
<xsd:complexType name="BookType">
<xsd:complexContent>
<xsd:extension base="PublicationType">
<xsd:sequence>
<xsd:element name="ISBN" type="xsd:string"/>
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="MagazineType">
<xsd:complexContent>
<xsd:extension base="PublicationType">
<xsd:sequence>
<xsd:element name="Volume" type="xsd:string"/>
<xsd:element name="Edition" type="xsd:string"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
</xsd:schema>
When you generate XML views as entities in an XML definition, the Title and Date metadata
from PublicationType does not repeat in BookType or MagazineType by default. Instead,
these views contain only the metadata that distinguishes them from the PublicationType.
They have foreign keys that link them to the base type.
Author is a multiple-occurring element in a Publication. It becomes an XML view. This
example uses reduced metadata explosion. None of the elements in the base type repeat in the
derived types.
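The reduced metadata explosion described here can be pictured as separate row sets linked by keys. The following Python sketch uses a hypothetical document with Book and Magazine elements (the element names, key column names PK and FK, and all values are assumptions for this example) and builds one row set per view. Title and Date appear only in the base publication rows; the derived rows carry just their own columns and a foreign key.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names and values are invented for illustration.
SAMPLE = """
<Publications>
  <Book>
    <Title>XML Basics</Title><Author>A. Writer</Author><Author>B. Editor</Author>
    <Date>2004</Date><ISBN>0000000000</ISBN><Publisher>Example Press</Publisher>
  </Book>
  <Magazine>
    <Title>Data Monthly</Title><Date>2004</Date>
    <Volume>7</Volume><Edition>3</Edition>
  </Magazine>
</Publications>
"""

publication_view, book_view, magazine_view, author_view = [], [], [], []
for pk, pub in enumerate(ET.fromstring(SAMPLE), start=1):
    # Base type columns are stored once, in the publication view.
    publication_view.append(
        {"PK": pk, "Title": pub.findtext("Title"), "Date": pub.findtext("Date")})
    # Author is multiple-occurring, so it gets its own view.
    for author in pub.findall("Author"):
        author_view.append({"FK": pk, "Author": author.text})
    # Derived type views hold only the columns that distinguish them.
    if pub.tag == "Book":
        book_view.append({"FK": pk, "ISBN": pub.findtext("ISBN"),
                          "Publisher": pub.findtext("Publisher")})
    elif pub.tag == "Magazine":
        magazine_view.append({"FK": pk, "Volume": pub.findtext("Volume"),
                              "Edition": pub.findtext("Edition")})

print(publication_view)
print(book_view)
print(magazine_view)
print(author_view)
```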
Figure 2-14 shows a sample XML file containing a Publication, a Magazine, and two Books:
♦ BookType view contains the ISBN and Publisher. It contains a foreign key to
PublicationType:
♦ MagazineType view contains Volume and Edition. It also contains a foreign key to the
PublicationType:
♦ The Author view contains authors for all the publications. The Designer generates a
separate view for Author, because Author is a multiple-occurring element. Each
publication can contain multiple authors:
For more information about generating entity relationships using the XML Wizard, see
“Generating Entity Relationships” on page 63.
For more information about using the XML Editor to create relationships between XML
views, see “Creating Relationships Between Views” on page 89.
Figure callout: Substitution Group Members
You might use the Part XML definition to read the following sample XML file in a session:
<Part>
<ID>1</ID>
<Name>Big Part</Name>
<Type>L</Type>
<Part>
<ID>1.A</ID>
<Name>Middle Part</Name>
<Type>M</Type>
<Part>
<ID>1.A.B</ID>
<Name>Small Part</Name>
<Type>S</Type>
</Part>
</Part>
</Part>
In this sample file, Part 1 contains Part 1.A, and Part 1.A contains Part 1.A.B.
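The containment in this sample file can be checked with a short recursive walk. The following Python sketch parses the Part document shown above and returns one row per Part with the ID of the Part that contains it. The function name and the row layout are hypothetical and are not PowerCenter output.

```python
import xml.etree.ElementTree as ET

PART_XML = """
<Part>
  <ID>1</ID><Name>Big Part</Name><Type>L</Type>
  <Part>
    <ID>1.A</ID><Name>Middle Part</Name><Type>M</Type>
    <Part>
      <ID>1.A.B</ID><Name>Small Part</Name><Type>S</Type>
    </Part>
  </Part>
</Part>
"""

def containment(part, parent_id=None):
    """Return (ID, Name, parent ID) for this Part and every Part nested inside it."""
    part_id = part.findtext("ID")
    rows = [(part_id, part.findtext("Name"), parent_id)]
    for child in part.findall("Part"):  # direct Part children only
        rows += containment(child, part_id)
    return rows

rows = containment(ET.fromstring(PART_XML))
for row in rows:
    print(row)
```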
extract ./Name/Firstname/Lastname
Employee, Address, and Email are multiple-occurring elements. You could create a view that
contains the following elements:
EMPLOYEE
ADDRESS
NAME
If you set the view row as Address, the PowerCenter Server extracts a Name for every
Employee/Address in the XML data. You cannot add Email to this view because you would
create a many-to-many relationship between Address and Email.
You can add a pivoted multiple-occurring column to the view. For example, you can add one
instance of Email as a pivoted column to the Employee view. The view would contain the
following elements:
EMPLOYEE
ADDRESS
NAME
EMAIL[1]
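A pivoted column such as EMAIL[1] can be pictured as selecting one fixed occurrence while the view row still drives the number of output rows. The following Python sketch assumes a hypothetical layout in which NAME, ADDRESS, and EMAIL are direct children of EMPLOYEE; the data values and the EMAIL_1 column name are also invented. It produces one row per ADDRESS and attaches only the first EMAIL.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure and values, used only to illustrate pivoting.
SAMPLE = """
<EMPLOYEES>
  <EMPLOYEE>
    <NAME>Ann</NAME>
    <ADDRESS>1 Main St</ADDRESS>
    <ADDRESS>9 Office Park</ADDRESS>
    <EMAIL>ann@example.com</EMAIL>
    <EMAIL>ann@work.example.com</EMAIL>
  </EMPLOYEE>
</EMPLOYEES>
"""

rows = []
for employee in ET.fromstring(SAMPLE).findall("EMPLOYEE"):
    # Pivoted column: a single, fixed occurrence of the multiple-occurring element.
    first_email = employee.findtext("EMAIL[1]")
    # View row: one output row for every ADDRESS occurrence.
    for address in employee.findall("ADDRESS"):
        rows.append({"NAME": employee.findtext("NAME"),
                     "ADDRESS": address.text,
                     "EMAIL_1": first_email})

for row in rows:
    print(row)
```

Because only one EMAIL occurrence is selected, the view avoids a many-to-many relationship between Address and Email.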
A view row can contain a pivoted column. For example, a view might have the view row,
EMPLOYEE/ADDRESS[1]. The PowerCenter Server extracts data for just the first instance of
Employee/Address. For more information about pivoting, see “Pivoting Columns” on
page 51.
♦ An effective view row for a view is the path of view rows from the top of a hierarchy
relationship down to the view row in the view. A view can have multiple effective view
rows because it can have multiple hierarchy relationships in the XML definition.
You can specify options in the XML Editor that affect how view rows and effective view rows
affect data output. For more information about setting row options, see “Setting XML View
Options” on page 100.
Figure 2-19 shows the ADDRESS element of the StoreInfo XML file pivoted into two sets of
address columns:
Second occurrence of Address pivoted to office address columns with prefix OFC_.
The first and second occurrences of Address represented as columns in the group:
If you pivot a view row, any column in the XML view that occurs below the view row must
have an XPath that matches the XPath of the view row.
For example, a view might have the following view row:
Transaction/Trade[1]
The following columns have the same occurrence of Trade in the XPath:
Transaction/Trade[1]/Date
Transaction/Trade[1]/Price
Transaction/Trade[1]/Person[1]/Firstname
You cannot create a column with the following XPath in the view:
Transaction/Trade[2]/Date
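The rule for columns below a pivoted view row amounts to a prefix check on the XPath. The following Python sketch applies that check to the XPaths listed above. The function name is hypothetical, and the check covers only columns in the Trade subtree; it is a simplified reading of the rule, not the validation that the XML Editor performs.

```python
VIEW_ROW = "Transaction/Trade[1]"

def is_under_view_row(column_xpath, view_row=VIEW_ROW):
    """True when the column is the view row itself or lies below the same occurrence."""
    # Appending "/" keeps Trade[1] from matching a longer step such as Trade[10].
    return column_xpath == view_row or column_xpath.startswith(view_row + "/")

candidates = [
    "Transaction/Trade[1]/Date",
    "Transaction/Trade[1]/Price",
    "Transaction/Trade[1]/Person[1]/Firstname",
    "Transaction/Trade[2]/Date",
]
allowed = [xpath for xpath in candidates if is_under_view_row(xpath)]
rejected = [xpath for xpath in candidates if not is_under_view_row(xpath)]

print(allowed)
print(rejected)
```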
Limitations
PowerCenter does not support the following functions:
♦ Concatenated columns. A column cannot be a concatenation of two elements. For
example, you cannot create a column FULLNAME that refers to a concatenation of two
elements FIRSTNAME and LASTNAME.
♦ Composite keys. A key cannot be a concatenation of two elements. For example, you
cannot create a key CUSTOMERID that refers to a concatenation of two elements
LASTNAME and PHONENUMBER.
♦ Parsed Lists. PowerCenter stores a list type as one string that contains all array elements. It
does not parse the respective simple types from the string.
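To show what the parsed lists limitation means in practice, the following Python sketch takes a list value as a single string, which is how this section says PowerCenter stores it, and splits it into typed items in a separate step. XML Schema list items are separated by white space. The sample value and variable names are assumptions, and the code runs outside PowerCenter.

```python
# A schema list value delivered as one string; the items are separated by white space.
list_value = "10 20 30"

# Parsing the simple type of each item is left to a later step.
items = [int(token) for token in list_value.split()]

print(items)
```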
Overview
The Designer provides an XML Wizard that enables you to create XML definitions in the
repository. You can import files from a URL or a local machine. You can also import relational
or flat file definitions from a PowerCenter repository. You can create XML definitions from
the following types of files:
♦ XML files
♦ XML schema files
♦ DTD files
♦ Relational definitions
♦ Flat file definitions
Note: Informatica recommends you use an XML schema file to create XML source definitions.
When you create XML definitions, you import files with the XML Wizard and organize
metadata into XML views. XML views are groups of columns containing the elements and
attributes in the XML file. The wizard can generate views for you, or you can create custom
views. For more information, see “Working with XML Views” on page 61.
You use the XML Wizard to create relationships between views. You can create hierarchy
relationships or entity relationships. For more information about creating hierarchy
relationships, see “Generating Hierarchy Relationships” on page 64. For more information
about creating entity relationships, see “Generating Entity Relationships” on page 63.
You can use the XML Wizard to synchronize an XML definition against an XML schema file
if the structure of the schema changes. For more information, see “Synchronizing XML
Definitions” on page 68.
3. Use Advanced Options to specify the way the Designer creates and names XML views.
You can change the following options:
Option: Description
Override all infinite lengths: You can specify a default length for components with
  undefined lengths, such as strings. If you do not set a default length, the precision
  for these components is set to infinite. This can cause DTM buffer size errors when you
  run a session with large files.
Analyze elements/attributes in standalone XML as global declarations: Choose this option
  to create global declarations of standalone XML elements or attributes. Otherwise, they
  become local declarations.
Create an XML view for an enclosure element: If the schema has an enclosure element that
  can occur more than once and the enclosure element has child elements that can occur
  more than once, you can create a separate view for it. An enclosure element is an
  element that has no text content or attributes but has child elements.
Pivot elements into columns: You can pivot leaf elements if they have an occurrence
  limit. You can pivot elements in source definitions only.
Ignore fixed element and attribute values: You can ignore fixed values in a schema and
  allow other element values in the data.
Ignore prohibited attributes: In an XML schema or file, you can declare an attribute as
  prohibited. This allows you to restrict a complex type by prohibiting existing
  attributes. When you import the schema or file, you can choose to ignore the prohibited
  attributes.
Generate names for XML columns: You can choose to name XML columns by a sequence number
  or from the element or attribute name in the schema. If you use names, you can add the
  XML view as a prefix to each column, and you can add the element name as a prefix to
  all the attributes.
You can choose from the following options to create XML views:
♦ Generate entity relationships. If you create entity relationships, the XML Wizard
generates views for multiple-occurring or referenced elements and complex types. For
information about generating entity relationships, see “Generating Entity Relationships”
on page 63.
♦ Generate hierarchy relationships. When you create hierarchical relationships, each
reference to a component expands under its parent element. You can generate normalized
or denormalized XML views in a hierarchy relationship.
− Normalized XML views. When you generate a normalized XML view, elements and
attributes appear only once. Multiple-occurring elements, or elements in one-to-many
relationships appear in different views related by keys. For more information about how
the Designer generates normalized views, see “Generating Normalized Views” on
page 38.
− Denormalized XML views. When you generate a denormalized XML view, all elements
and attributes appear in one view. The Designer does not model many-to-many
relationships between elements and attributes in an XML definition. For more
information about how the Designer generates denormalized views, see “Generating a
Denormalized View” on page 40.
1. In the Source Analyzer, select Sources-Import XML Definition. The XML Wizard opens.
2. Navigate to the source you want to import and click Open.
3. Enter a name for the file and click Next.
4. Select Entity Relationships and click Finish.
The XML Wizard generates an XML definition that uses entity relationships.
1. In the Source Analyzer, select Sources-Import XML Definition. The XML Wizard opens.
2. Navigate to the source you want to import and click Open.
3. Enter a name for the file and click Next.
4. Select Hierarchy Relationships.
5. Select Normalized XML Views or Denormalized XML Views and click Finish.
The XML Wizard generates XML groups using hierarchy relationships.
Bookstore element selected as root.
Book123 element cleared as root element.
When you reduce metadata explosion, the Designer creates entity relationships between the
XML views it generates.
1. In the Source Analyzer, select Sources-Import XML Definition. The XML Wizard opens.
2. Navigate to the repository definition or file that you used to create a source or target
definition, and click Open.
3. In Step 1 of the Wizard, click Next. The Wizard ignores any change you make to the
name.
4. In Step 2 of the XML Wizard, choose to synchronize the XML definition and click Next.
The XML Wizard synchronizes the source with the selected definition.
You can use this method to synchronize XML target definitions. If you modify an XML
source definition, you may also need to synchronize the target definition.
Note: Verify that you synchronize the XML definition with the source you used to create the
definition. If you synchronize an XML definition with a source that you did not use to create
the definition, the Designer cannot synchronize the definitions and the definition loses its
metadata. Choose Edit-Revert to Saved to restore the XML definition.
Rename button: Edit the name of the source definition and enter a business name.
Business Name: Descriptive name for the source definition. You can edit it by clicking the
  Rename button.
Description: Optional description of the source. Character limit is 2,000 bytes/K, where K
  is the maximum number of bytes for each character in the repository code page. Enter
  links to business documentation.
Code Page: Indicates the repository code page. The XML source definition uses the code
  page of the repository, not the code page defined in the XML source file.
3. On the Columns tab, you can view information about the columns in the definition. To
change column names or values, use the XML Editor.
Key Type: When the Designer creates default views for XML sources, it generates the
  primary and foreign keys.
XPath: Indicates the element referenced by the current column in the XML hierarchy.
  XPath does not display for generated primary or foreign keys.
Business Name: User-defined descriptive name for the column. If it is not visible in the
  window, scroll to the right to view or modify the column.
4. Click the Metadata Extensions tab to create, edit, and delete user-defined metadata
extensions. For more information about metadata extensions, see the Repository Guide.
5. Click OK.
6. Choose Repository-Save to save changes to the repository.
Note: You cannot edit the Properties tab for an XML source definition.
3. Select a definition from the list of sources or targets. Click the Arrow button to add the
definition to the selected source list.
You can select more than one input, and more than one input type. If the definitions are
related through primary and foreign keys, the XML Wizard uses the keys to relate groups
when it generates the hierarchy.
4. Click Open. The XML Wizard displays.
5. Enter the Optional XML Rootname to have the XML Wizard create a separate group for
the root element and relate all other groups to it. The root name defaults to XRoot.
Remove the root name if you want to use one of the other groups as the root. The XML
Wizard creates a group for each input source or target definition you select and generates
Although in this example EMAIL and PHONE belong to the same parent element, they do
not belong in the same parent chain. Ordinarily, you cannot put them in the same
denormalized view.
To put all the elements of employee in one view, you need to pivot one of the
multiple-occurring elements. For example, you can create an EMPLOYEE view and follow these
steps to add all the elements:
1. Add the EID and EMAIL elements to the EMPLOYEE view.
2. Pivot the number of occurrences of EMAIL that you want to include. If you get warnings
while pivoting any of the occurrences, confirm that you want to proceed.
3. Add the PHONE element.
Pivoting in this case turns an element into a single occurring element in a view. After you
pivot EMAIL, the Groups At element for the EMPLOYEE view is PHONE because it
becomes the only multiple-occurring element in the view.
How can I match the EMPNO and SALARY in the same view?
This example of a DTD element definition is ambiguous. It is equivalent to the following:
<!ELEMENT EMPLOYEE (EMPNO+, SALARY+)>
In this definition, there appears to be an EMPNO and a SALARY for each employee.
However, when the definition is written this way, the number of occurrences for EMPNO is
separate from and may not match the number of occurrences of SALARY.
You can try either of the following solutions:
♦ Rewrite the element definition to make it unambiguous.
In most cases, the EMPLOYEE element is more correctly defined in this way:
<!ELEMENT EMPLOYEES (EMPLOYEE+)>
<!ELEMENT EMPLOYEE (EMPNO, SALARY)>
Redefined this way, there is one EMPNO and one SALARY for each EMPLOYEE, and
both elements go into the same EMPLOYEE view.
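The difference between the two definitions shows up in the data they allow. The following Python sketch parses one hypothetical document for each definition and compares the results. ElementTree does not validate against a DTD, so the documents are simply written to conform, and all values are invented.

```python
import xml.etree.ElementTree as ET

# Conforms to <!ELEMENT EMPLOYEE (EMPNO+, SALARY+)>: the counts need not match.
AMBIGUOUS = "<EMPLOYEE><EMPNO>1</EMPNO><EMPNO>2</EMPNO><SALARY>500</SALARY></EMPLOYEE>"

# Conforms to the redefined DTD: each EMPLOYEE holds exactly one EMPNO and one SALARY.
REDEFINED = """
<EMPLOYEES>
  <EMPLOYEE><EMPNO>1</EMPNO><SALARY>500</SALARY></EMPLOYEE>
  <EMPLOYEE><EMPNO>2</EMPNO><SALARY>650</SALARY></EMPLOYEE>
</EMPLOYEES>
"""

ambiguous = ET.fromstring(AMBIGUOUS)
empno_count = len(ambiguous.findall("EMPNO"))
salary_count = len(ambiguous.findall("SALARY"))

# With the redefined structure, each EMPNO pairs with the SALARY in the same EMPLOYEE.
pairs = [(employee.findtext("EMPNO"), employee.findtext("SALARY"))
         for employee in ET.fromstring(REDEFINED).findall("EMPLOYEE")]

print(empno_count, salary_count)
print(pairs)
```

In the first document there is no way to tell which EMPNO the single SALARY belongs to, which is the ambiguity described above.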
<Book>Book Name</Book>
<ISBN>051022906630</ISBN>
</Bookstore>
When I import this file, the Designer drops the ISBN element. Why does this happen?
How can I get the Designer to include the ISBN element?
♦ Use the schema to import the XML definition. When you use an XML file to import an
XML definition, the Designer reads the first element as simple content because it has no
child elements. Based on this, the Designer determines that the second instance of Book is
also simple content and discards the child element, ISBN. If you use a schema to import
the definition, the Designer uses the structure defined in the schema to determine how to
read XML data.
♦ Ensure that the XML instance document accurately represents the associated schema. If
you use an XML file to import a source definition, ensure that the document is an accurate
representation of the structure in the corresponding XML schema.
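The effect described in this answer can be illustrated by comparing two ways of inferring structure from an instance document. The following Python sketch uses a hypothetical Bookstore file in which only the second Book has an ISBN child. It is an illustration of the behavior described above, not the Designer's import logic, and the function names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical file: the first Book has text only, the second has an ISBN child.
SAMPLE = """
<Bookstore>
  <Book>Book Name</Book>
  <Book><ISBN>051022906630</ISBN></Book>
</Bookstore>
"""
root = ET.fromstring(SAMPLE)

def children_seen_in_first(root, tag):
    """Infer child elements from the first occurrence of the tag only."""
    first = root.find(tag)
    return sorted({child.tag for child in first})

def children_seen_in_all(root, tag):
    """Infer child elements from every occurrence of the tag."""
    return sorted({child.tag for element in root.findall(tag) for child in element})

print(children_seen_in_first(root, "Book"))
print(children_seen_in_all(root, "Book"))
```

A schema removes the guesswork because it declares the ISBN child regardless of which instance appears first.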
Overview
When you import XML definitions with the Designer, you can create XML definitions using
entity or hierarchy relationships, custom XML views, or no views. After you create a
definition, you must use the XML Editor to make changes to it.
Use the XML Editor to create views, modify components, add columns, and maintain view
relationships in the workspace. When you update an XML definition, the Designer
propagates the changes to any mapping that includes the source. Some changes to XML
definitions can invalidate mappings.
Note: If you made significant changes to the source you used to create an XML definition, you
can synchronize it to the XML definition rather than editing the definition manually. For
information about synchronizing XML source definitions, see “Synchronizing XML
Definitions” on page 68.
Figure 4-1 shows the XML Editor:
Figure callouts: Navigator; XML Workspace; Columns Window
Properties Tab
The Properties tab displays information about a component you select in the Navigator. If the
component is a complex element, you can view element properties in the schema, such as
namespace, type, and content. When you view a simple element or attribute, the Properties
tab shows the type and length. The Properties tab also displays any annotations for schema
components.
If you import the definition from an XML file, you can edit the datatypes and cardinality
from the Properties tab. If you create the definition from a DTD file, you can edit the
component type.
You can change the namespace prefix and location for an element if the schema uses a
namespace. The prefix identifies the element or attribute declarations that belong to a
namespace.
Actions Tab
The Actions tab lists options that you can use to see more information about a selected
component. It also provides a utility that enables you to reverse changes you make to
components in the Navigator.
The following options might display on the Actions tab, depending on the properties of the
component you select:
♦ ComplexType references. Displays the references to a selected complex type.
♦ ComplexType hierarchy. Displays the complex types derived from the selected
component.
♦ Element references. Displays the components that reference the selected element.
♦ Child components. Displays the global schema components that the selected component
uses.
♦ Revert simpleType. Changes the type, length, and precision values back to the original
value if you have changed them.
♦ XML view references. Displays all the XML views and columns that reference the selected
component.
Workspace Window
The XML Workspace window displays a graphical representation of the XML views and the
relationships between the views. You can create XML views in the workspace and define
relationships between views.
The XML workspace toolbar provides shortcuts to most of the functions that you can do in
the workspace.
You can modify the size of the XML workspace in the following ways:
♦ Hide the Columns window. Click View-Columns Properties.
♦ Hide the Navigator. Click View-Navigator.
♦ Reduce the workspace. Click the Zoom button on the Workspace toolbar.
Columns Window
The Columns window displays the columns for a view that you select in the workspace. You
can use the Columns window to name columns that you add. If you use pivoted columns, you
use the Columns window to select and rename occurrences of multiple-occurring elements.
You can also specify options such as the Not Null option to prevent null data in instance
documents, and the Force Row, Hierarchy or Type Relationship Row, and Non-Recursive
Row options. These options affect how the PowerCenter Server writes data to XML targets.
For more information about the Columns window, see “Setting XML View Options” on
page 100.
4. Set the Column Mode in the XPath Navigator to View Row Mode in order to add the
view row.
5. Select the element in the Navigator and drag it to the view in the workspace. The XML
Editor highlights the view row in blue.
The first time you add a column to a view, the Designer verifies the column can be a view
row. This occurs even if you do not specify to add a view row.
6. To change the view row to another column, right-click the appropriate row in the view
and select Set As view row.
Mode Button
4. Click the Mode button and choose whether to add a column or a view row. Select Advanced
to add a pivoted column if you have a multiple-occurring element that you want to pivot into
separate columns in the view. Advanced Mode also allows you to add normal columns.
Note: You cannot create pivoted columns in XML target definitions.
5. Drag a column from the XPath Navigator into the appropriate view in the XML
workspace. You can select multiple columns at a time.
The XML Editor validates the column you add. If the column is invalid for the view, a
message displays in the status bar while you are dragging the column. As you add new
columns to views, they display in the Columns window.
Figure 4-4 shows the ANY content element in the Schema Navigator:
Add Type
ANY Content
substituted type
For more information about the FileName column, see “Naming XML Files Dynamically” on
page 116.
Generate a view.
Display child components.
Exclude a child component.
6. To display a child component, select a shared element or complex type and click the
name.
7. To exclude a child component, clear the element in the Exclude Child Components pane.
To generate a new view, select the element or complex type. When you create the new
entity relationships, you generate a view with that element as a view root.
To sort components:
Navigating to Components
If you have large XML definitions, you can quickly find components by using the Navigate
option. To use this option, select a component to navigate from and select a navigation
option. For example, if you click a foreign key, you can navigate to the associated primary key
or to the column in the Columns window. You can navigate between components in the
workspace, the Columns window, and the Navigator.
To navigate to components:
Click Advanced Options to search by component properties.
Click a search result to view it in the Properties window.
1. If you want to view the metadata as a sample XML document, choose a global
component in the Navigator.
2. Click View-XML Metadata.
The View XML Metadata dialog box displays:
If you are working with an XML source definition, the Columns window contains XML View
Options. These options give you flexibility about when you generate rows or foreign keys in a
session. You can select the following options:
♦ All hierarchy foreign keys
♦ Non-recursive row
♦ Hierarchy relationship row
♦ Force row
♦ Type relationship row
I cannot find the DTD or XML schema file that I created when I viewed XML metadata.
The DTD or XML schema file that you can view is a temporary file that the Designer creates
only for viewing. If you want to use the file for other purposes, save it with another name and
directory when you view it.
When I add columns to my XML source views, the hierarchy in my source XML file
remains the same.
When you add columns to XML source views, you do not add elements to the underlying
hierarchy. The XML hierarchy that you import remains the same no matter how you create
the views or how you map the columns in a view to the elements in the hierarchy. You can
modify the datatypes and the cardinality of the elements, but you cannot modify the structure
of the hierarchy.
Overview
You can create XML target definitions in the following ways:
♦ Import the definition from an XML file. Create a target definition from an XML, DTD,
or XML schema file. You can import XML file definitions from a URL or a local machine.
If you import an XML file with an associated DTD, the XML Wizard uses the DTD to
generate the XML document. For more information about importing XML files, see
“Importing an XML Target Definition from XML Files” on page 105.
♦ Create an XML target definition based on an XML source definition. Drag an existing
XML source definition into the Warehouse Designer. If you create an XML target
definition, the Designer creates a target definition based on the hierarchy of the XML
definition. For more information about creating an XML target from other definitions, see
“Creating a Target from an XML Source Definition” on page 106.
♦ Create an XML target based on a relational or flat file definition. You can import an XML
target definition from a relational or flat file repository definition. For more information
about importing XML definitions from relational or flat file definitions, see “Creating
XML Definitions from Repository Definitions” on page 72.
In addition to creating XML target definitions, you can perform the following tasks with
XML targets in the Warehouse Designer:
♦ Edit target properties. Edit an XML target definition to add comments or to update the
definition to reflect changed target XML, DTD, or XML schema files. For information on editing XML
target definitions, see “Editing XML Target Definition Properties” on page 107.
♦ Synchronize target definitions. You can synchronize your target XML definition to an
updated schema if you need to make changes. Synchronizing enables you to update the
XML definition instead of reimporting the definition when the schema changes. For more
information about synchronizing XML definitions, see “Synchronizing XML Definitions”
on page 68.
1. In the Warehouse Designer, select Targets-Import XML Definition. The Import XML
Definitions window opens. By default, it displays schema files in a local folder.
2. Click Local File or URL to browse for XML files.
3. To browse for DTD or XML files, select the appropriate file extension from the Files of
Type list.
For more information about using the XML Wizard see “Importing an XML Source
Definition” on page 57.
1. Drag an XML source definition from the Navigator into the Warehouse Designer
workspace.
The XML Export dialog box appears:
Rename button Edit the name of the target definition and enter a business name.
Business Name Descriptive name for the target table. Edit the Business Name using the Rename
button.
Description Optional description of target table. Character limit is 2,000 bytes/K, where K is
the maximum number of bytes for each character in the repository code page.
Enter links to business documentation.
Code Page Select the code page to use in the target definition. For more information about
XML code pages, see “Code Pages” on page 27.
Keywords Allows you to keep track of your targets. As development and maintenance work
continues, the number of targets increases. While all of these targets may
appear in the same folder, they may all serve different purposes. Keywords can
help you find related targets. Keywords can include developer names, mappings,
or the associated schema.
You can use keywords to perform searches in the Repository Manager. For
details on keyword searches in the Repository Manager, see “Using the
Repository Manager” in the Repository Guide.
Select Table Displays the target definition you are editing. To choose a different definition to
edit, select one from the list of definitions you have available in the workspace.
Precision Size of column. You can change precision only for some datatypes, such as
string.
Key Type The type of key the XML Wizard generates to link the views.
XPath The path through the XML document hierarchy that enables you to locate an
item.
5. On the Properties tab, you can modify the transformation attributes of the target
definition. If you are using a source-based commit session or Transaction Control
transformation with the XML target, you can define how you want to flush data to the
target. For more information, see “Working with XML Targets in a Session” on page 155.
Select Table Displays the source definition you are editing. To choose a different source
definition to edit, select it from the list.
Duplicate Group Row Handling Choose one of these options to handle duplicate rows in the target:
- First Row. The PowerCenter Server passes the first duplicate row to the
target. Rows that follow with the same primary key are rejected.
- Last Row. The PowerCenter Server passes the last duplicate row to the
target.
- Error. The PowerCenter Server passes the first row to the target. Rows with
duplicate primary keys increment the error count. The session fails when the
error count reaches the error threshold.
For more information about duplicate group row handling, see “Handling
Duplicate Group Rows” on page 159.
DTD Reference DTD or XML schema file name for the target XML file. The PowerCenter
Server adds the document type declaration to the XML file when you create it.
For more information about using the DTD or schema file name, see “DTD and
Schema Reference” on page 160.
On Commit The PowerCenter Server can generate multiple XML documents or append to
one XML document after a commit. You can use one of the following options:
- Ignore Commit. The PowerCenter Server creates an XML document and
writes to it at end of file only.
- Create New Document. Creates a new XML document at each commit.
- Append to Document. Writes to the same XML document after each commit.
For more information about flushing XML on commits, see “Flushing XML on
Commits” on page 160.
Cache Directory The directory for the XML target cache files. The default is the $PMCacheDir
server variable. For more information about working with caches, see “XML
Caching Properties” on page 162.
Cache Size The total size in bytes for the XML target cache. The default is 10,000,000
bytes.
6. On the Metadata Extensions tab, you can create, modify, delete, and promote non-
reusable metadata extensions, as well as update their values. You can also update the
values of reusable metadata extensions.
7. Click OK.
8. Choose Repository-Save.
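The three Duplicate Group Row Handling options on the Properties tab can be pictured with a short sketch. The following Python function is an illustration only, not PowerCenter code. The function name, the row layout, and the error_threshold parameter are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def handle_duplicates(rows: Iterable[Row], key: str, mode: str,
                      error_threshold: int = 1) -> Tuple[List[Row], int]:
    """Return (rows passed to the target, error count) for one group.

    mode is "first", "last", or "error" (hypothetical names for the
    First Row, Last Row, and Error options).
    """
    if mode not in ("first", "last", "error"):
        raise ValueError(f"unknown mode: {mode}")
    passed: Dict[str, Row] = {}
    errors = 0
    for row in rows:
        pk = row[key]
        if pk not in passed:
            passed[pk] = row      # the first row for a key always passes
        elif mode == "first":
            continue              # later duplicates are rejected
        elif mode == "last":
            passed[pk] = row      # the latest duplicate replaces earlier rows
        else:                     # mode == "error"
            errors += 1           # each duplicate increments the error count
            if errors >= error_threshold:
                raise RuntimeError("error threshold reached; session fails")
    return list(passed.values()), errors
```

For example, with rows keyed 1, 2, 1, the "first" mode keeps the first row for key 1, the "last" mode keeps the third row in its place, and the "error" mode keeps the first row and counts one error.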
Inheritance Validation
You can create two types of inheritance relationships with XML views:
♦ View-to-view inheritance. A view is a derived type of another view. Both views must have
global complex view roots.
A view can have an inheritance relationship to another view only if its view root is a
complex type derived from the view root type of the other view.
A view can be a parent in multiple inheritance relationships, but a view can be a child in
only one inheritance relationship.
♦ Column-to-view inheritance. The column is an element of a local complex type, Type1,
and the view is rooted at a global complex type, Type2. Type1 is derived from Type2.
A column in a view can have an inheritance relationship to another view if the column is a
local complex type and the type is derived from the view root type of the other view.
If a column in a view, V1, has an inheritance relationship to a view, V2, you cannot put
the content of V2 into view V1.
Active Sources
An active source is an active transformation the PowerCenter Server uses to generate rows.
The PowerCenter Server can load data from different active sources to an XML target.
However, all target ports within a single group must receive data from the same active source.
The following transformations are active sources:
♦ Aggregator
♦ Application Source Qualifier
♦ Custom, configured as an active transformation
♦ Joiner
♦ MQ Source Qualifier
♦ Normalizer (VSAM or pipeline)
♦ Rank
♦ Sorter
♦ Source Qualifier
♦ XML Source Qualifier
♦ Mapplet, if it contains one of the above transformations
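The rule that all target ports within a single group must receive data from the same active source can be checked mechanically. The following Python sketch assumes you have already written down, for each target group, which active source feeds each port. The data layout and the names in it are hypothetical and are not read from a PowerCenter repository.

```python
from typing import Dict, List


def groups_with_mixed_sources(port_sources: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the target groups whose ports are fed by more than one active source.

    port_sources maps a group name to {target port name: active source name}.
    """
    return [group for group, ports in port_sources.items()
            if len(set(ports.values())) > 1]


# Hypothetical mapping: the Sales group mixes two active sources.
mapping = {
    "Product": {"ProductID": "XSQ_Store", "Name": "XSQ_Store"},
    "Sales": {"ProductID": "XSQ_Store", "Amount": "AGG_Sales"},
}
invalid_groups = groups_with_mixed_sources(mapping)
```

In this example only the Sales group is reported, because its two ports receive data from different active sources.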
1. Right-click the target definition in the Mapping Designer and select Edit.
Example
The following example shows a mapping containing an XML target with a FileName column.
The Expression transformation generates a file name from the Country XML element and
passes the value to the FileName column. The mapping passes a country to the target root,
If you do not create an XML view for an enclosure element in the source definition, you do
not create the Contactinfo element in the source.
The XML Wizard creates the following source and target definitions:
The XML target created from my relational sources contains all elements, but no attributes.
How can I modify the target hierarchy so that I can mark certain data as attributes?
You cannot modify the component types that the wizard creates from relational tables.
However, you can view a DTD or an XML schema file of the target XML hierarchy. Save the
DTD or XML schema file to your own directory and filename. Open this new file and modify
the hierarchy, setting the attributes and elements as needed. Then, you can use this file to
import a target definition with a new hierarchy. For more information about viewing XML
definitions, see “Viewing XML Metadata” on page 97.
Chapter 6
Overview
Transformation type:
Active
Connected
When you add an XML source definition to a mapping, you need to connect it to an XML
Source Qualifier transformation. The XML Source Qualifier transformation defines the data
elements that the PowerCenter Server reads when it runs a session. It determines how the
PowerCenter Server reads the source data.
You can add an XML Source Qualifier transformation manually, or the Designer can create one
by default when you add a source definition to a mapping. For more
information about adding XML Source Qualifier transformations, see “Adding an XML
Source Qualifier to a Mapping” on page 123.
You can edit some of the properties and add metadata extensions to an XML Source Qualifier
transformation. For more information about editing an XML Source Qualifier, see “Editing
an XML Source Qualifier Transformation” on page 125.
When you connect an XML Source Qualifier transformation in a mapping, you must follow
some rules to create a valid mapping. For more information about using an XML Source
Qualifier transformation in a mapping, see “Using the XML Source Qualifier in a Mapping”
on page 129.
3. Select XML Source Qualifier transformation, and type a name for the new
transformation.
The naming convention for XML Source Qualifier transformations is
XSQ_TransformationName.
4. Click Create.
The Designer lists all the XML source definitions in the mapping with no corresponding
XML Source Qualifier transformations.
Generated Key Sequence Numbers Use the Sequence column to set start values for generated keys in XML groups. You can
enter different values for each generated key. Whenever you change these values, the
sequence numbers restart the next time you run a session using the transformation.
4. Click the Properties tab to configure properties that affect how the PowerCenter Server
runs the mapping during a session.
Select Transformation Displays the transformation you are editing. To choose a different transformation to
edit, select it from the list.
Tracing Level Determines the amount of information about this transformation the PowerCenter
Server writes to the session log when it runs the workflow. You can override this
tracing level when you configure a session.
Reset At the end of a session, the PowerCenter Server resets the start values to the start
values for the current session. For more information, see “Setting Sequence Numbers
for Generated Keys” on page 128.
Restart At the beginning of a session, the PowerCenter Server starts the generated key
sequence for all groups at one. For more information, see “Setting Sequence Numbers
for Generated Keys” on page 128.
5. Click the Metadata Extensions tab to create, edit, and delete user-defined metadata
extensions.
Add a metadata extension.
Delete a metadata extension.
You can create, modify, delete, and promote non-reusable metadata extensions, as well as
update their values. You can also update the values of reusable metadata extensions. For
more information, see “Metadata Extensions” in the Repository Guide.
6. Click OK.
7. Choose Repository-Save to save changes.
Figure 6-2. Linking XML Source Qualifier Transformations to One Input Group
[Figure: an XML Source Qualifier transformation with Group 1 (Column11, Column12) and Group 2 (Column21, Column22) beside a single input group transformation with Column1 through Column4. The links from Group 2 are marked with an X.]
♦ You can link ports from one group in an XML Source Qualifier transformation to ports
in more than one transformation. Each group in an XML Source Qualifier transformation
can be a source of data for more than one pipeline branch. Data can pass from one group
to several different transformations.
♦ You can link multiple groups from one XML Source Qualifier transformation to
different input groups in a transformation. You can link multiple groups from one XML
Source Qualifier transformation to different input groups in most multiple input group
transformations, such as a Joiner or Custom transformation. However, you can only link
multiple groups from one XML Source Qualifier transformation to one Joiner
transformation if the Joiner has sorted input. To connect two XML Source Qualifier
transformation groups to a Joiner transformation with unsorted input, you must create
two instances of the same XML source. For an example on connecting two XML Source
Qualifier transformations to Joiner transformations, see “Joining Two XML Source
Qualifier Transformation Groups” on page 133.
Figure 6-3. Linking XML Source Qualifier to Multiple Input Group Transformations
[Figure: an XML Source Qualifier transformation with Group 1 (Column11, Column12) and Group 2 (Column21, Column22) beside a Joiner transformation configured for sorted input, with Column1 and Column2 as master ports and Column3 and Column4 as detail ports.]
You might want to calculate the total YTD sales for each product in the StoreInfo.xml file,
regardless of region. Besides sales, you also want the names and prices of each product. To do
this, you need both product and sales information in the same transformation. However,
when you import the StoreInfo.xml file, the default groups that the Designer creates include a
Product group for the product information and a Sales group for the sales information.
Figure 6-5. Invalid use of XML Source Qualifier Transformation in Aggregator Mapping
Since you cannot link both the Product and the Sales groups to the same single input group
transformation, you can create the mapping in one of the following ways:
♦ Use a denormalized group containing all required information.
♦ Join the data from the two groups using a Joiner transformation.
To create the denormalized group, edit the source definition in the Source Analyzer. You can
either create a new group or modify an existing group. Add to the group all the product and
sales columns you need for the sales calculation in the Aggregator transformation. You can use
the XML Editor to create the group and validate it.
For more information about denormalized groups, see “Generating a Denormalized View” on
page 40.
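The calculation that both approaches aim for can be sketched outside PowerCenter. The following Python example joins hypothetical Product and Sales rows on a product key and totals the YTD sales for each product regardless of region. The column names and sample values are assumptions for illustration and are not taken from StoreInfo.xml.

```python
from collections import defaultdict
from typing import Dict, List


def total_ytd_sales(products: List[Dict], sales: List[Dict]) -> List[Dict]:
    """Combine product rows with the summed YTD sales of their sales rows."""
    totals = defaultdict(float)
    for sale in sales:
        # Aggregate across all regions for each product key.
        totals[sale["product_id"]] += sale["ytd_sales"]
    return [
        {"name": p["name"], "price": p["price"],
         "total_ytd_sales": totals[p["product_id"]]}
        for p in products
    ]


# Hypothetical rows from the Product group and the Sales group.
products = [{"product_id": "p1", "name": "Widget", "price": 2.5}]
sales = [
    {"product_id": "p1", "region": "east", "ytd_sales": 100.0},
    {"product_id": "p1", "region": "west", "ytd_sales": 50.0},
]
report = total_ytd_sales(products, sales)
```

The result has one row per product that carries the name, the price, and the total, which is the information a denormalized group or a Joiner transformation would place in a single pipeline.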
I cannot break the link between the XML source definition and its source qualifier.
The XML Source Qualifier transformation columns match the corresponding XML source
definition columns exactly. You cannot remove or modify the links between an XML source
definition and its XML Source Qualifier transformation. When you remove an XML source
definition, the Designer automatically removes its XML Source Qualifier transformation.
Chapter 7
Midstream XML Transformations
This chapter includes the following topics:
♦ Overview, 138
♦ XML Parser Transformation, 139
♦ XML Generator Transformation, 141
♦ Creating a Midstream XML Transformation, 143
♦ Editing Midstream XML Transformation Properties, 144
♦ Generating Pass-Through Ports, 147
Overview
You can use XML definitions to read or create XML data. However, sometimes you need to
extract or generate XML inside a pipeline. For example, you might want to send a message to
a TIBCO target containing an XML document as the data field. In this case, you need to
generate an XML document before sending the message to TIBCO. You can use a midstream
XML transformation to generate the XML.
You can create the following types of midstream XML transformations:
♦ XML Parser transformation. The XML Parser transformation reads XML from one input
port and outputs data to one or more groups.
♦ XML Generator transformation. The XML Generator transformation reads data from one
or more sources and generates XML. It has a single output port.
Use a midstream XML transformation to extract XML data from messaging systems, such as
TIBCO or MQSeries, or from other sources, such as files or databases. The XML
transformation functionality is similar to the XML source and target functionality, except it
parses the XML or generates the document in the pipeline.
Midstream XML transformations support the same XML schema components that the XML
Wizard and Editor support. In addition, XML transformations support the following
functionality:
♦ Pass-through ports. You can use pass-through ports to pass non-XML data through the
midstream transformation. These fields are not part of the XML schema definition, but
you can use them to generate denormalized XML groups. You use these fields in the same
manner as top-level XML elements. You can also use a pass-through field as a primary key
for the top-level group in your XML definition. For more information, see “Generating
Pass-Through Ports” on page 147.
♦ Real-time processing. You can use a midstream XML transformation to process data as
BLOBs from messaging systems.
♦ Support for multiple partitions. You can generate different XML documents for each
partition.
The XML Parser transformation is similar to an XML source definition. When the
PowerCenter Server processes an XML Parser transformation, it reads a row of XML data,
parses the XML, and passes data through output groups. It also can pass non-XML data.
The XML Parser transformation has one input group, and one or more output groups. The
input group has one input port, “DataInput,” which accepts a binary or string data BLOB as
an XML document.
When you create a midstream XML Parser transformation, you use the XML Wizard to
import an XML, DTD, or XML schema file. For example, you can import the following
Employee DTD file:
<!ELEMENT EMPLOYEES (EMPLOYEE+)>
<!ELEMENT EMPLOYEE (LASTNAME, FIRSTNAME, ADDRESS, PHONE+, EMAIL*,
EMPLOYMENT)>
<!ATTLIST EMPLOYEE EMPID CDATA #REQUIRED
DEPTID CDATA #REQUIRED>
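To picture how one XML input becomes rows in several output groups, the following Python sketch parses a document that follows the Employee DTD and splits it into three row sets. The sample document, the group names, and the use of EMPID as the linking key are assumptions for illustration. The groups that the XML Wizard creates for this DTD may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document that follows the Employee DTD.
SAMPLE = """<EMPLOYEES>
  <EMPLOYEE EMPID="e1" DEPTID="d7">
    <LASTNAME>Chen</LASTNAME>
    <FIRSTNAME>Ana</FIRSTNAME>
    <ADDRESS>1 Main St</ADDRESS>
    <PHONE>555-0100</PHONE>
    <PHONE>555-0101</PHONE>
    <EMPLOYMENT>Full-time</EMPLOYMENT>
  </EMPLOYEE>
</EMPLOYEES>"""


def parse_groups(xml_text):
    """Split one XML document into row sets, one per assumed output group."""
    root = ET.fromstring(xml_text)
    employees, phones, emails = [], [], []
    for emp in root.findall("EMPLOYEE"):
        emp_id = emp.get("EMPID")
        employees.append({
            "EMPID": emp_id,
            "DEPTID": emp.get("DEPTID"),
            "LASTNAME": emp.findtext("LASTNAME"),
            "FIRSTNAME": emp.findtext("FIRSTNAME"),
        })
        # Multiple-occurring elements become child rows keyed by EMPID.
        phones.extend({"EMPID": emp_id, "PHONE": p.text} for p in emp.findall("PHONE"))
        emails.extend({"EMPID": emp_id, "EMAIL": e.text} for e in emp.findall("EMAIL"))
    return {"EMPLOYEE": employees, "PHONE": phones, "EMAIL": emails}


groups = parse_groups(SAMPLE)
```

Here the single employee produces one EMPLOYEE row and two PHONE rows. The EMAIL group is empty because EMAIL is optional in the DTD.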
You can use an XML Generator transformation to combine input that comes from several
sources to create an XML document. For example, you can use the transformation to combine
the XML data from two TIBCO sources into one TIBCO target. One source might contain
employee and salary information, and the other might have employee phone and email
information.
The XML Generator transformation is similar to an XML target definition. When the
PowerCenter Server processes an XML Generator transformation, it writes rows of XML data.
The PowerCenter Server can also process pass-through fields containing non-XML data in the
transformation.
Figure 7-2 shows the XML Generator transformation:
[Figure 7-2 labels: TIBCO Source 1, TIBCO Source 2, and input groups Group 1 through Group 4.]
The XML Generator transformation has one or more input groups and one output group.
The output group has one port, “DataOutput,” which allows a binary or string data BLOB as
an XML document. This group also contains the output port when you create pass-through
fields.
3. Select the Midstream XML Parser or Midstream XML Generator transformation type.
4. Enter a transformation name, and click Create.
The Import XML Definition dialog box displays.
5. Choose a file to import, and click Open.
The XML Wizard displays.
6. Create the XML definitions using the XML Wizard. For more information about the
XML Wizard, see “Importing an XML Source Definition” on page 57.
7. Click Finish in the XML Wizard.
The midstream XML transformation displays in the workspace.
8. To edit the midstream XML transformation properties, double-click the transformation
in the workspace.
Reset key sequence numbers to beginning values after the session completes.
Always start key sequence numbers at one.
Table 7-1 shows the options you can change on the Midstream XML Parser tab:
Restart Always start the generated key sequence for all groups at one.
Reset At the end of a session, reset the value sequence for all generated
keys in all groups. This resets the sequence number back to where
it was previously.
Note: The options in Table 7-1 affect the generated key numbers. If you do not choose either
option, the sequence numbers in the generated keys increase from session to session. If you
select the Restart or Reset option, it updates the Restart or Reset property that displays on the
Initialization Properties tab. You cannot change these options from the Initialization
Properties tab, however.
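The difference between the default behavior, Restart, and Reset can be sketched in a few lines. The following Python class is an illustration of the behavior described in the note, not PowerCenter code. The class name, the parameters, and the idea of a stored next_value are assumptions.

```python
class KeySequence:
    """Models a generated key sequence that persists between sessions."""

    def __init__(self, start=1, restart=False, reset=False):
        self.next_value = start   # value the next session begins with
        self.restart = restart    # Restart: always begin at one
        self.reset = reset        # Reset: return to the session start value

    def run_session(self, row_count):
        if self.restart:
            self.next_value = 1
        session_start = self.next_value
        keys = list(range(session_start, session_start + row_count))
        # By default the sequence continues where this session ended.
        self.next_value = session_start + row_count
        if self.reset:
            # Put the sequence back to where it was before the session.
            self.next_value = session_start
        return keys


default_seq = KeySequence(start=5)
first_run = default_seq.run_session(3)
second_run = default_seq.run_session(2)
```

With neither option selected, the second session continues after the last key of the first session. With Restart, every session begins at one. With Reset, every session begins at the same start value.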
Table 7-2 shows the options you can change on the XML Generator transformation tab:
Note: The Designer sets the transformation scope to all input when you choose to ignore
commits. It sets the transformation scope to the transaction level if you set On Commit to
Create New Doc.
The dialog box lists the pass-through ports you added in the transformation.
9. Select the pass-through port that will correspond to the new reference port in the view
and click OK.
The corresponding output reference port displays in the view. You can rename the port to
a more meaningful name in the Columns window.
[Figure labels: Output Reference Port, Input Pass-Through Port]
Working with XML Sources in a Session
When you create a session to read data from an XML source, you can configure source
properties for that session. For example, you might want to override the source file name and
location in the session properties.
Figure 8-1 shows the Mapping tab in session properties:
Table 8-1 describes the properties you can override for XML readers in a session:
Source File Directory Optional Location of the XML file. By default, the PowerCenter Server looks in the
server variable directory, $PMSourceFileDir.
You can enter the full path and file name. If you specify both the directory and
file name in the Source Filename field, clear this field. The PowerCenter
Server concatenates this field with the Source Filename field when it runs the
session.
You can also use the $InputFileName session parameter to specify the file
directory.
For details on session parameters, see “Session Parameters” in the Workflow
Administration Guide.
Source Filename Required Enter the file name, or file name and path. Optionally use the $InputFileName
session parameter for the file name.
If you specify both the directory and file name in the Source File Directory
field, clear this field. The PowerCenter Server concatenates this field with the
Source File Directory field when it runs the session. For example, if you have
“C:\XMLdata\” in the Source File Directory field, then enter “filename.xml” in
the Source Filename field. When the PowerCenter Server begins the session,
it looks for “C:\XMLdata\filename.xml”.
For details on session parameters, see “Session Parameters” in the Workflow
Administration Guide.
Source Filetype Required The source filetype option enables you to configure multiple file sources by
using a file list. Choose Direct or Indirect. The option indicates whether the
source file contains the source data, or whether it contains a list of files with
the same file properties. Choose Direct if the source file contains the source
data. Choose Indirect if the source file contains a list of files.
When you select Indirect, the PowerCenter Server finds the file list and reads
each listed file when it runs the session. For details on file lists, see the
Workflow Administration Guide.
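The way the Source File Directory, Source Filename, and Source Filetype properties combine can be sketched as follows. This Python example is an illustration under stated assumptions: the function names are hypothetical, the directory and file name are joined by plain concatenation as the table describes, and the file list is assumed to contain one file name per line.

```python
from typing import Callable, Iterable, List


def _read_file(path: str) -> Iterable[str]:
    """Default reader: return the lines of a file list from disk."""
    with open(path, encoding="utf-8") as handle:
        return handle.readlines()


def files_to_read(directory: str, filename: str, filetype: str,
                  read_lines: Callable[[str], Iterable[str]] = _read_file) -> List[str]:
    """Return the XML files a session would read for the given properties."""
    path = directory + filename          # the two fields are concatenated
    if filetype == "Direct":
        return [path]                    # the file itself holds the source data
    if filetype == "Indirect":
        # The file is a list of files; skip blank lines.
        return [line.strip() for line in read_lines(path) if line.strip()]
    raise ValueError(f"unknown filetype: {filetype}")


direct = files_to_read("C:\\XMLdata\\", "filename.xml", "Direct")
```

Because the fields are concatenated, a directory without a trailing separator followed by a bare file name would not produce a valid path in this sketch.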
Table 8-2 describes the properties you can override for an XML Source Qualifier in a session:
Validate XML Source Required The Validate XML Source option provides flexibility for validating an XML
source against a schema or DTD file. Select Do Not Validate to skip
validation, even if the instance document has an associated DTD or schema
reference. Select Validate Only if DTD is Present to validate only when the
XML source has a corresponding DTD or schema file. The session fails if the
instance document specifies a DTD or schema and one is not present. Select
Always Validate to always validate the XML file. The session fails if the DTD
or schema does not exist or your data is invalid.
Partitionable Optional Allows you to create multiple partitions for the source pipeline.
Output File Directory Optional Enter the directory name in this field. By default, the PowerCenter Server writes
output files in the server variable directory, $PMTargetFileDir.
You can enter the full path and file name. If you specify both the directory and
file name in the Output Filename field, clear this field. The PowerCenter Server
concatenates this field with the Output Filename field when it runs the session.
You can also use the $OutputFileName session parameter to specify the file
directory.
For details on session parameters, see “Session Parameters” in the Workflow
Administration Guide.
Output Filename Required Enter the file name, or file name and path. By default, the Workflow Manager
names the target file based on the target definition used in the mapping:
target_name.xml.
If the target definition name contains a slash character, the Workflow Manager
replaces the slash character with an underscore.
Optionally use the $OutputFileName session parameter for the file name.
If you specify both the directory and file name in the Output File Directory field,
clear this field. The PowerCenter Server concatenates this field with the Output
File Directory field when it runs the session.
For details on session parameters, see “Session Parameters” in the Workflow
Administration Guide.
Note: If you specify an absolute path file name when using FTP, the
PowerCenter Server ignores the Default Remote Directory specified in the FTP
connection. When you specify an absolute path file name, do not use single or
double quotes.
Validate Target Optional Validate XML target data against the schema.
Format Output Optional Format the XML target file so the XML elements and attributes indent.
Otherwise, each line of the XML file starts in the same position.
XML Datetime Format Required Choose local time, local time with time zone (the difference in hours between the
server time zone and Greenwich Mean Time), or UTC (Greenwich Mean Time).
Null Content Representation Required Choose how to represent null content in the target. For more information, see
“Null and Empty String” on page 158.
Empty String Content Representation Required Choose how to represent empty string content in the target. For more
information, see “Null and Empty String” on page 158.
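The three XML Datetime Format choices differ only in how the time zone is expressed. The following Python sketch shows the idea with ISO 8601 strings. The exact text that the PowerCenter Server writes is not specified here, so the output format, the function name, and the option names are assumptions.

```python
from datetime import datetime, timedelta, timezone


def render_datetime(moment: datetime, fmt: str) -> str:
    """Render a time zone aware datetime in one of three assumed styles."""
    if fmt == "local":
        # Local time with no time zone information.
        return moment.replace(tzinfo=None).isoformat()
    if fmt == "local_with_zone":
        # Local time followed by the offset from Greenwich Mean Time.
        return moment.isoformat()
    if fmt == "utc":
        # The same instant expressed in UTC.
        return moment.astimezone(timezone.utc).isoformat()
    raise ValueError(f"unknown format: {fmt}")


# Hypothetical server clock eight hours behind Greenwich Mean Time.
server_time = datetime(2003, 11, 5, 8, 1, 35, tzinfo=timezone(timedelta(hours=-8)))
local_text = render_datetime(server_time, "local")
zoned_text = render_datetime(server_time, "local_with_zone")
utc_text = render_datetime(server_time, "utc")
```

All three strings describe the same instant. Only the local style loses the information needed to compare values written by servers in different time zones.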
Character Set
You can configure the PowerCenter Server to run sessions with XML targets in either ASCII
or Unicode data movement mode. XML files contain an encoding declaration that indicates
the code page used in the file. The most commonly used code pages are UTF-8 and UTF-16.
PowerCenter supports UTF-8 code pages for XML targets only.
PowerCenter supports the same set of code pages for XML files that it supports for relational
databases and other files. For details on code page compatibility, see “Globalization
Overview” in the Installation and Configuration Guide. For a list of supported code pages, see
“Code Pages” in the Installation and Configuration Guide.
Special Characters
The PowerCenter Server adds escape characters to the following special characters in XML
targets:
< & > ”
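Escaping these characters means replacing each one with an entity reference so that the data is not read as markup. The following Python sketch uses the standard library to show the idea. It assumes the predefined XML entities. The exact sequences that the PowerCenter Server writes are not listed in this section.

```python
from xml.sax.saxutils import escape


def escape_for_xml(value: str) -> str:
    """Escape <, &, and > plus the double quotation mark."""
    # escape() handles &, <, and > by default; the extra mapping adds the quote.
    return escape(value, {'"': "&quot;"})


escaped = escape_for_xml('a < b & "c" > d')
```

The ampersand is replaced first, so the entity references produced for the other characters are not escaped a second time.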
Table 8-4. Null and Empty String Output for XML Targets
The PowerCenter Server does not check that the file you specify exists or that it is valid. The
PowerCenter Server does not validate the target XML file against the DTD or schema file you
specify.
Note: An XML instance document must refer to the full relative path of a schema if a
midstream XML transformation is processing the file. Otherwise, the full path is not required.
Ignoring Commit
You can choose to generate the XML document after the session has read all the source
records. This option causes a session to store all of the XML data in cache as it processes, so
you should use this option when you are not processing a lot of data.
The base file name is a local file name for regular files and a remote file name for FTP files.
Each XML document that the PowerCenter Server creates contains the base file name and a
number, starting with 1. The PowerCenter Server generates a file list after writing all the
output files for the XML target. The file list contains all the output file names. For local files,
the file list contains a list of all fully qualified file names, one per line in server code page. For
FTP files, the file list contains file names with no path component.
Each time you run a session, the PowerCenter Server overwrites the target XML files and
generates a new file list. The PowerCenter Server places the file list into the XML target
directory.
Note: The first file that the PowerCenter Server generates does not contain a number. It is the
default base file. This file is included in the list file if the server writes data to it. If, however,
you use a FileName column to create file names, the server deletes the default file when
processing completes.
If the base file name does not have an extension, the PowerCenter Server appends a period
and the number at the end of the name. For example, “abc” becomes “abc.1”. If the
extension of a base file name is not “.xml”, the PowerCenter Server retains the extension, but
adds a period and a number before the extension. For example, “abc.txt” becomes
“abc.1.txt”.
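The naming rule for numbered output files can be sketched in Python. The function name is hypothetical. The sketch covers the two cases described above and, as an assumption, applies the same rule to base names that end in “.xml”.

```python
import os


def numbered_name(base_name: str, number: int) -> str:
    """Insert a period and the document number before any extension."""
    stem, ext = os.path.splitext(base_name)
    # With no extension, ext is empty and the number lands at the end.
    return f"{stem}.{number}{ext}"


no_extension = numbered_name("abc", 1)
with_extension = numbered_name("abc.txt", 1)
```

Keeping the extension last means that tools which select files by extension still recognize the numbered documents.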
For example, the following session log excerpt records the PowerCenter Server loading a target
table to the group DEPARTMENT in the target EMP_SALARY:
WRITER_1_1_1> WRT_8167 Start loading table [EMP_SALARY::DEPARTMENT] at:
Wed Nov 05 08:01:35 2003
Example
The following example includes a mapping that contains a flat file source of country names,
regions, and revenue dollars per region. The target is an XML file. The root view contains the
primary key, XPK_COL_0, which is a string.
Figure 8-3 shows data mapped to the root of the XML definition:
Each time the PowerCenter Server passes a new country name to the root view it generates a
new target file. Each target XML file contains country name, region, and revenue data for one
country.
The PowerCenter Server passes the following rows to the XML target:
Country,Region,Revenue
USA,region1,1000
Canada,region1,100
USA,region2,200
USA,region3,300
USA,region4,400
France,region1,10
France,region2,20
France,region3,30
France,region4,40
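The effect of these rows can be sketched outside PowerCenter: each new country name starts a new document, and later rows for the same country are added to that document. In the following Python example, the element names, the attribute names, and the output file names are hypothetical. They are not the names in the target definition.

```python
import csv
import io
import xml.etree.ElementTree as ET

ROWS = """Country,Region,Revenue
USA,region1,1000
Canada,region1,100
USA,region2,200
USA,region3,300
USA,region4,400
France,region1,10
France,region2,20
France,region3,30
France,region4,40
"""


def documents_by_country(csv_text):
    """Return {file name: XML text}, one document per country."""
    roots = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = row["Country"]
        if country not in roots:
            # A new country name starts a new root and a new document.
            roots[country] = ET.Element("Country", name=country)
        region = ET.SubElement(roots[country], "Region", name=row["Region"])
        region.text = row["Revenue"]
    return {f"{name}.xml": ET.tostring(root, encoding="unicode")
            for name, root in roots.items()}


documents = documents_by_country(ROWS)
```

The sample rows produce three documents. The USA rows are not adjacent in the input, but they all land in the same document because the grouping is by country name.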
If the data has multiple root rows with circular references, but none of the root rows has a null
foreign key, the PowerCenter Server cannot find a root row. It outputs the following message
in the session log file:
XMLW_31108 Error: An appropriate start row was not found for XML root
group [B] with circular reference. No output was generated.
You can add a FileName column to XML targets in order to name XML output documents
based on data values.
Table 8-5 describes the properties you define in the XML Generator transformation:
Validate Output Optional Validate XML target data against the schema.
Format Output Optional Format XML output so the XML elements and attributes indent. Otherwise,
each line of the XML file starts in the same position.
XML Datetime Format Required Select local time, local time with time zone, or UTC. Local time with time zone
is the difference in hours between the server time zone and Greenwich Mean
Time. UTC is Greenwich Mean Time.
Null Content Representation Required Select No Tag or Tag with Empty Content. For more information, see “Null and
Empty String” on page 158.
Empty String Content Representation Required Select No Tag or Tag with Empty Content. For more information, see “Null and
Empty String” on page 158.
Duplicate Group Row Handling Required Select First Row, Last Row, or Error. For more information, see “Handling
Duplicate Group Rows” on page 159.
Orphan Row Handling Required Orphan rows are child rows that are missing parent data. Select Ignore to
continue the session and ignore the orphan rows. Select Error to abort the
session when orphan rows occur.
DTD Reference Optional Associated DTD or XML schema file name to add to the XML file the
transformation creates. You must fully qualify this file name when you use it
with XML Generator transformations. For more information about using the
DTD or schema file name, see “DTD and Schema Reference” on page 160.
Cache Size Required The total size in bytes for the cache memory used by the transformation. The
default is 10,000,000 bytes.
Cache Directory Required The directory for the XML cache files. The default is the $PMCacheDir server
variable. For more information about working with caches, see “XML Caching
Properties” on page 162.
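As an illustration of the two Orphan Row Handling choices, the following Python sketch filters child rows against a set of parent keys. The row layout (pk and fk fields) and the function name are hypothetical. The sketch only mirrors the behavior described in the table: Ignore drops orphan rows and continues, Error stops at the first orphan row.

```python
def apply_orphan_row_handling(parents, children, mode="Ignore"):
    """Return the child rows that have a parent, handling orphans per `mode`."""
    if mode not in ("Ignore", "Error"):
        raise ValueError("mode must be 'Ignore' or 'Error'")
    parent_keys = {parent["pk"] for parent in parents}
    kept = []
    for child in children:
        if child["fk"] in parent_keys:
            kept.append(child)
        elif mode == "Error":
            # Stands in for aborting the session.
            raise RuntimeError("orphan row found: %r" % (child,))
        # mode == "Ignore": skip the orphan row and continue.
    return kept


# Hypothetical sample data: the second child refers to a missing parent.
parents = [{"pk": 1}]
children = [{"fk": 1, "value": "a"}, {"fk": 2, "value": "b"}]

print(apply_orphan_row_handling(parents, children, mode="Ignore"))
```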
Table 8-6 describes the properties you define in the XML Parser transformation:
Attribute                             Required/Optional  Description
Validate XML Source                   Optional           Validate XML source data against the schema.
Treat Empty Content as Null           Required           Select No Tag or Tag with Empty Content. For more information
                                                         see “Null and Empty String” on page 158.
XML and Transformation Datatypes
PowerCenter supports all XML datatypes specified in the W3C May 2, 2001
Recommendation. Table A-1 lists the XML datatypes and compares them to the
transformation datatypes that display in the XML Source Qualifier transformation. For
details on XML datatypes, see the W3C specifications for XML datatypes at
http://www.w3.org/TR/xmlschema-2.
For more information about using transformation expressions and functions to convert
datatypes, see “Functions” in the Transformation Language Reference.
When you pass data to the target, make sure that it is in the correct format so that the
PowerCenter Server writes the data correctly in the target XML file.
You can change XML datatypes in XML definitions and in midstream XML transformations
if you import an XML file to create the definition. You cannot change XML datatypes when
you import them from an XML schema, and you cannot change the transformation datatypes
for XML sources within a mapping.
Table A-1 shows the XML and corresponding Transformation datatypes:
XML Datatype  Transformation Datatype  Range
date          Date/Time                Jan 1, 1753 AD to Dec 31, 9999 AD (precision to the second)
dateTime      Date/Time                Jan 1, 1753 AD to Dec 31, 9999 AD (precision to the second)
gMonthDay     Date/Time                Jan 1, 1753 AD to Dec 31, 9999 AD (precision to the second)
gYearMonth    Date/Time                Jan 1, 1753 AD to Dec 31, 9999 AD (precision to the second)
time          Date/Time                Jan 1, 1753 AD to Dec 31, 9999 AD (precision to the second)
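The Date/Time range and precision listed above can be checked before data reaches a mapping. The Python sketch below is one possible pre-check, not part of PowerCenter. The function name is hypothetical, and truncating fractional seconds (rather than rounding or rejecting them) is an assumption made for the example.

```python
from datetime import datetime

# Range stated for the Date/Time transformation datatype.
MIN_DATETIME = datetime(1753, 1, 1, 0, 0, 0)
MAX_DATETIME = datetime(9999, 12, 31, 23, 59, 59)


def to_transformation_datetime(value):
    """Truncate to whole seconds and verify the value is within range."""
    truncated = value.replace(microsecond=0)  # assumption: drop sub-second part
    if not (MIN_DATETIME <= truncated <= MAX_DATETIME):
        raise ValueError("value is outside the Date/Time range: %s" % value)
    return truncated


print(to_transformation_datetime(datetime(2003, 11, 5, 8, 1, 35, 123456)))
```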
Unsupported Datatypes
PowerCenter does not support the following XML datatypes:
♦ binary
You can use this format or any portion of this format if it conforms to the XML schema
specifications. For example, an element of type date, time, or datetime may use either of the
following formats within a session:
CCYY-MM
or
CCYY-MM-DD/Thh
Once an element uses one of these formats, all subsequent values for the same element must
follow the same format.
If the PowerCenter Server reads in a value for the same date, time, or datetime element that
has a different format, even if it is a valid dateTime format, it rejects the row.
In this example, if the PowerCenter Server reads in a subsequent value with a different format,
such as
CCYY-MM-DD
then it rejects the row even if the new format is also a valid datetime format.
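The consistency rule can be illustrated with a small Python sketch. It is not PowerCenter code: the two regular expressions cover only the CCYY-MM and CCYY-MM-DD formats mentioned in the text, the function names are hypothetical, and the sketch assumes that the first recognized value fixes the format for the element.

```python
import re

# Only two of the possible formats are modeled here, for illustration.
FORMATS = {
    "CCYY-MM": re.compile(r"\d{4}-\d{2}"),
    "CCYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def classify(value):
    """Return the name of the format that matches `value`, or None."""
    for name, pattern in FORMATS.items():
        if pattern.fullmatch(value):
            return name
    return None


def split_by_first_format(values):
    """Accept values that match the first recognized format; reject the rest."""
    expected = None
    accepted, rejected = [], []
    for value in values:
        fmt = classify(value)
        if fmt is not None and expected is None:
            expected = fmt  # the first recognized value sets the format
        if fmt is not None and fmt == expected:
            accepted.append(value)
        else:
            rejected.append(value)
    return accepted, rejected


accepted, rejected = split_by_first_format(["2003-11", "2003-12", "2003-11-05"])
print("accepted:", accepted)
print("rejected:", rejected)
```

Here the third value is a valid date, but it is rejected because the element's earlier values use CCYY-MM.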