Importing Sequential Files in DataStage
Table of Contents:
Introduction
Objective
Working with Sequential Files
Data Files
Metadata Workbench Data Lineage Analysis
Importing Sequential Files From the DataStage and QualityStage Designer
Publish the Sequential File Definition From the DataStage and QualityStage Designer
Modify the Data File From the DataStage and QualityStage Designer
Deleting Data Files - From the Metadata Asset Manager
Synchronizing Published Sequential Files
Summary
Introduction
Data is at the core of our business, so managing and understanding it properly is essential.
Data also remains at the hub of the IBM Information Server. Whether it is the ETL processes of
DataStage and QualityStage, profiling by Information Analyzer, or definition and understanding by Business
Glossary and Metadata Workbench, all require and reference a data source.
For purposes of identification and re-use, care must be taken in how data structures are
imported into the IBM Information Server. Several import mechanisms exist; in all cases, the
imported structure can be accessed and used by any of the IBM Information Server applications.
Objective
Learn how to import sequential files or complex flat file structures, and how to display and re-use those
structures within the IBM Information Server applications.
Sequential Files are imported from the DataStage and QualityStage Designer, by invoking the Sequential File
Definitions import or the ODBC Connector. When complete, this import process creates a Table Definition
representing the structure of the Sequential File, including Column Definitions and their data types.
Publishing the created Table Definition as a Shared Table is recommended. This allows the IBM
Metadata Workbench analysis services to report on the data dependencies of the Sequential File, including
searching on the File and viewing the DataStage Jobs which read from or write to the File.
Importing Sequential Files is not required for Data Lineage analysis within the Metadata
Workbench. The Metadata Workbench links DataStage File Stages from different Jobs when
one Stage reads from and the other writes to an identical File. An identical File is determined by the
file name and location defined in the DataStage Stage.
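The linking rule described above can be illustrated with a minimal Python sketch. It is not the Metadata Workbench implementation; it only assumes that two Stages refer to an identical File when their defined file name and location match exactly. The FileStage class and the job, stage and path names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStage:
    job: str    # hypothetical Job name
    stage: str  # hypothetical Stage name
    path: str   # file name and location as defined on the Stage
    mode: str   # "read" or "write"


def link_stages(stages):
    """Pair each writing Stage with each reading Stage of the same File.

    Assumption: Files are identical when the defined paths match exactly.
    """
    writers, readers = {}, {}
    for s in stages:
        bucket = writers if s.mode == "write" else readers
        bucket.setdefault(s.path, []).append(s)
    links = []
    for path, writing in writers.items():
        for w in writing:
            for r in readers.get(path, []):
                if w.job != r.job:  # only link Stages from different Jobs
                    links.append((w.job, path, r.job))
    return links


stages = [
    FileStage("LoadCustomers", "Seq_Out", "/data/out/customers.txt", "write"),
    FileStage("ReportCustomers", "Seq_In", "/data/out/customers.txt", "read"),
    FileStage("ReportOrders", "Seq_In", "/data/out/orders.txt", "read"),
]
print(link_stages(stages))
```

Only the two Jobs that share /data/out/customers.txt are linked; the Job reading orders.txt has no writer and therefore no lineage link in this sketch.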
Data Files
Data Files may also be assigned a Business Term, Business Label or Data Steward via InfoSphere
Business Glossary or InfoSphere Metadata Workbench, and their Description or Business Name
may be authored there.
Data Files and Relational Databases are collectively referred to as Implemented Data Resources within the
Information Server.
When published, the following components and relationships are captured and defined:
Data File Structure    The component of the File, defined during publication
Data File Field        The fields of the File, defined during import
The Data File created by the publication method must reflect the fully qualified file name and location
defined within the DataStage File Stages.
When the file name or location includes Job Parameters or Environment Variables, those are replaced with
their Default Values when evaluating Design Metadata, and with their Runtime Values when evaluating
Operational Metadata.
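As an illustration of the substitution described above, the following Python sketch resolves one path template twice: once with Default Values (Design Metadata) and once with Runtime Values (Operational Metadata). It assumes parameters appear in the path as #Name# tokens; the parameter names and values are hypothetical, and the function is a sketch rather than the product's own logic.

```python
import re

# Assumption: Job Parameters appear in the file path as #Name# tokens.
TOKEN = re.compile(r"#([^#]+)#")


def resolve_path(template, values):
    """Replace each #Name# token with its value; unknown tokens are left as-is."""
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)


template = "#TargetDir#/customers_#RunDate#.txt"

# Hypothetical Default Values, as used for Design Metadata.
defaults = {"TargetDir": "/data/out", "RunDate": "00000000"}
# Hypothetical Runtime Values of one job run, as used for Operational Metadata.
runtime = {"TargetDir": "/data/prod", "RunDate": "20240131"}

design_path = resolve_path(template, defaults)
operational_path = resolve_path(template, runtime)
print(design_path)
print(operational_path)
```

The published Data File would need to match the design-time result for Design Metadata to link to it, which is why the Default Values of the parameters matter.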
For more information please refer to the Metadata Workbench Administration Guide.
Importing Sequential Files From the DataStage and QualityStage Designer
o Select the Directory containing the Sequential File or Complex Flat File to import.
o Select the File, from the list of displayed files, to be imported.
o Set the DataStage Project Folder to contain the Table Definition to be created by the import
process.
o Click Import. The Define Sequential MetaData dialog appears.
o Click the Define Tab. Set the Column Names, SQL Type, Length, Description and other
properties as appropriate.
o Click OK to complete the import process. The import process creates a Table Definition
within the current DataStage Project. Table Definitions are specific to the DataStage Project
and are not included in the display of Data Files from the Metadata Workbench or Business
Glossary.
Select another File for import, or click Close to close the Import MetaData dialog.
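To make the result of the import steps more concrete, the sketch below models a Table Definition as a list of columns with a name, SQL Type, Length and Description, and derives a first guess from a delimited sample. This is an assumption-laden illustration, not how the Designer derives metadata: the class names, the folder name and the type-guessing rule are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    sql_type: str = "VarChar"
    length: int = 0
    description: str = ""


@dataclass
class TableDefinition:
    name: str
    folder: str  # hypothetical DataStage Project Folder
    columns: list = field(default_factory=list)


def sketch_table_definition(name, folder, text, delimiter=","):
    """Guess column properties from a delimited sample with a header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for i, col_name in enumerate(header):
        values = [row[i] for row in data if i < len(row)]
        # Simplistic rule: all-digit values suggest Integer, otherwise VarChar.
        is_int = bool(values) and all(v.lstrip("-").isdigit() for v in values)
        columns.append(Column(
            name=col_name,
            sql_type="Integer" if is_int else "VarChar",
            length=max((len(v) for v in values), default=0),
        ))
    return TableDefinition(name, folder, columns)


sample = "id,name,city\n1,Ada,London\n22,Grace,New York\n"
td = sketch_table_definition("customers.txt", "Table Definitions/Sequential", sample)
for c in td.columns:
    print(c.name, c.sql_type, c.length)
```

In the Designer these guessed values correspond to what would be reviewed and corrected on the Define tab before clicking OK.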
The Table Definition which has been created is identified as a Sequential File. It may be necessary to
view or edit the Table Definition properties to ensure the Locator Table Type indicates Sequential.
Publish the Sequential File Definition From the DataStage and QualityStage Designer
Use the Shared Table Creation Wizard to publish a Table Definition as a Data File:
From the DataStage Repository viewer, select and right-click the Table Definition. Select Shared
Table Creation Wizard from the menu. The Shared Table Creation Wizard dialog appears.
Modify the Data File From the DataStage and QualityStage Designer
Metadata Management
Host Systems and Data Files may be removed using the IBM InfoSphere Metadata Asset Manager application.
Select Delete from the toolbar menu to remove the selected Asset. Click Yes to confirm the
removal of the selected Asset. Deleting a Data File also removes the Structure and Fields it
contains.
Optional: Select More Actions from the toolbar menu to view the Asset within the IBM InfoSphere
Metadata Workbench.
Introduction
As Databases or Files and their structures are developed and changed, there will come a time when
those changes need to be synchronized with existing Physical Data Sources previously imported into the
IBM Information Server. This synchronization should be seamless, identifying current Information
Assets and any changed content.
Synchronization
Synchronization requires re-importing the Physical Data Sources. Data that has changed is deleted
and imported anew, which causes any alterations to that data, such as Definitions or Classifications, to be
lost. Data that has remained the same is not affected.
For example, changing a Field name will cause only the corresponding Field to be imported anew.
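The effect of a re-import can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Information Server's algorithm: a Field is treated as unchanged only when its name and type both match, and the field names, types and annotations are hypothetical.

```python
def synchronize(existing, reimported):
    """Model a re-import of a File's Fields.

    existing:   field name -> {"type": str, "annotations": dict}
    reimported: field name -> type
    Unchanged Fields keep their annotations; changed or new Fields are
    created anew with none; Fields absent from the re-import are dropped.
    """
    result = {}
    for name, sql_type in reimported.items():
        old = existing.get(name)
        if old is not None and old["type"] == sql_type:
            result[name] = old  # unchanged: not affected
        else:
            result[name] = {"type": sql_type, "annotations": {}}  # imported anew
    return result


existing = {
    "cust_id": {"type": "Integer", "annotations": {"steward": "J. Smith"}},
    "cust_name": {"type": "VarChar", "annotations": {"definition": "Full name"}},
}
# The Field cust_name has been renamed to customer_name in the source File.
reimported = {"cust_id": "Integer", "customer_name": "VarChar"}

synced = synchronize(existing, reimported)
print(synced)
```

Here cust_id keeps its Data Steward, while the renamed Field arrives without the Definition that had been authored on cust_name.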
Upon re-importing a Sequential File from within DataStage, please keep the following in mind:
When re-importing the File, the user will be warned that the Shared Data File, which has
previously been published, will be disconnected.
After re-importing the File, the changed Table Definition must be re-published as a Shared Data
File.
When re-publishing the File, the identical Host and Data File previously associated with the File
should be selected.
Summary
It is good practice to import the data structures of all sources into the IBM Information Server. This provides
a single point of reference for governance, development, definition and reporting. ETL Developers can
reference the same Data Source that has been classified within Business Glossary, analyzed within
Information Analyzer or depicted within a Data Lineage report from the Metadata Workbench, enriching
their understanding.