ETL Standards Document
ETL Standards Document
ETL Standards Document
Arbitron, Inc.
Page 1
12/12/2013
Revision History
Date December 21, 2004 January 6, 2005 January 11, 2005 Version 1.0 1.1 1.2 Description Initial Outline for Review Altered Format and Details Revised Metadata Extension Names Added Backup and Down Times March 27, 2007 1.3 Added System Test environment information Patricia Khan Author Bhuwan Joshi Jack Vorsteg Mike Ficca
Arbitron, Inc.
Page 2
12/12/2013
Table of Contents
1. Introduction ................................................................................................................................ 6 1.0 Informatica PowerCenter ................................................................................................. 6 1.1 Scope ................................................................................................................................ 6 1.2 Audience ........................................................................................................................... 6 1.3 Terms ................................................................................................................................ 6 ETL Life Cycle ............................................................................................................................. 7 The Informatica PowerCenter Environments ......................................................................... 11 3.0 Development ................................................................................................................... 11 3.0.1 The DWDEVL1 Repository ............................................................................. 11 3.1 Test / UAT ....................................................................................................................... 11 3.1.1 The DWTEST1 Repository ............................................................................. 12 3.2 Production ...................................................................................................................... 13 3.2.1 The DWPROD1 Repository ............................................................................ 13 3.3 General Environment Settings ...................................................................................... 13 3.3.1 UNIX Environment Variables ......................................................................... 14 3.3.2 Server Variables / Directories ........................................................................ 14 PowerCenter Repositories....................................................................................................... 15 4.0 Privileges ........................................................................................................................ 15 4.0.1 Groups ............................................................................................................. 16 4.0.2 Users................................................................................................................ 16 4.0.3 Folders............................................................................................................. 16 4.1 The DWDEVL1 Repository............................................................................................. 16 4.1.1 Developer Folders .......................................................................................... 16 4.1.2 MASTER_SRC_TRGT_OBJS Shared Folder ................................................ 17 4.1.3 Project Folders................................................................................................ 18 4.2 The DWTEST1 Repository ............................................................................................. 19 4.2.1 Developer Folders .......................................................................................... 20 4.2.2 MASTER_SRC_TRGT_OBJS Shared Folder ................................................ 20 4.2.3 Project Folders................................................................................................ 20 4.3 The DWPROD1 Repository ............................................................................................ 20 4.3.1 Developer Folders .......................................................................................... 20 4.3.2 MASTER_SRC_TRGT_OBJS Shared Folder ................................................ 20 4.3.3 Project Folders................................................................................................ 21 PowerCenter Standards ........................................................................................................... 21 5.0 PowerCenter Repository Manager Standards ............................................................. 21 5.0.1 Repository Server ........................................................................................... 21 5.0.2 Repository Name ............................................................................................ 21 5.0.3 Repository Schema Owner ............................................................................ 21 5.0.4 Repository Folder ........................................................................................... 21 5.0.5 Repository Group Name ................................................................................ 21 5.0.6 Repository User Name ................................................................................... 22 Page 3 12/12/2013
2. 3.
4.
5.
Arbitron, Inc.
ETL Standards Document 5.1 PowerCenter Designer Standards ................................................................................ 22 5.1.1 Mapping Standards ........................................................................................ 22 5.1.2 Mapplet Standards.......................................................................................... 22 5.1.3 Source Object Standards ............................................................................... 23 5.1.4 Target Object Standards ................................................................................ 23 5.1.5 Source Qualifier Standards ........................................................................... 24 5.1.6 Advanced External Procedure Standards .................................................... 24 5.1.7 Aggregator Standards .................................................................................... 24 5.1.8 Expression Standards .................................................................................... 24 5.1.9 Filter Standards .............................................................................................. 24 5.1.10 Joiner Transformation Standards ................................................................. 25 5.1.11 Normalizer Transformation Standards ......................................................... 25 5.1.12 Ranker Transformation Standards ................................................................ 25 5.1.13 Router Transformation Standards ................................................................ 25 5.1.14 Sequence Generator Transformation Standards ......................................... 25 5.1.15 Stored Procedure Transformation Standards .............................................. 25 5.1.16 Update Strategy Transformation Standards ................................................ 26 5.1.17 Lookup Transformation Standards ............................................................... 26 5.1.18 Other Transformation Standards .................................................................. 27 5.1.19 Miscellaneous Standards ............................................................................... 27 PowerCenter Workflow Manager Standards ................................................................ 27 5.2.1 Informatica Power Center Server .................................................................. 27 5.2.2 Workflow Standards ....................................................................................... 27 5.2.3 Worklet Standards .......................................................................................... 28 5.2.4 Session Standards ......................................................................................... 28 5.2.5 Miscellaneous Workflow Standards.............................................................. 28 Power Centers Web Services Standards ..................................................................... 29
5.2
5.3
Appendix A: Code Review................................................................................................................ 30 Mapping Checks ....................................................................................................................... 30 Data and Unit Testing............................................................................................................... 32 Reviewers Additional Comments ........................................................................................... 32 Appendix B: Mapping Specification Templates ............................................................................. 33 Mapping Specification Template .................................................................................... 33 High Level Process Overview ......................................................................................... 34 Processing Description (Detail) ...................................................................................... 34 Stored Procedures ........................................................................................................... 34 External Procedures ........................................................................................................ 35 Mapplets ........................................................................................................................... 35 Aggregators ..................................................................................................................... 35 Ranks ................................................................................................................................ 36 Router ............................................................................................................................... 36 Joiner ................................................................................................................................ 36 Comments ........................................................................................................................ 37 [Note: Please copy and paste your comments from your Informatica Object here] .. 37 Normalizer ........................................................................................................................ 37 Expressions ..................................................................................................................... 37 Arbitron, Inc. Page 4 12/12/2013
ETL Standards Document ETL Estimations ................................................................................................................................ 38 Source to Staging, ODS or 3NF Model ................................................................................... 38 Staging to ODS or 3NF Model ................................................................................................. 38 PSA to MDA or Mart ................................................................................................................. 38
Arbitron, Inc.
Page 5
12/12/2013
1.
Introduction
1.0 Informatica PowerCenter Current Version: 7.1.1 Description: Informaticas ETL Software product used to address the complete lifecycle of all data integration and delivery needs. 1.1 Scope The scope of this document is to provide the standards and to describe the development lifecycle that properly trained individual must follow to develop high performance ETL Mappings and Workflows. Deviations from this document must be approved on a case by case by both an ETL Administrator and the Enterprise Informatica Owner. 1.2 Audience This document is intended for: 1.3 Terms PowerCenter: The Data Integration Platform software produced by Informatica. It has many pieces that are referred to within this document: Data Integration Engine, Repository Server, a Repository Database, and various Client Tools. The client tools are: Designer, Workflow Manager, Workflow Monitor, Repository Manager, and Repository Server Administration Console. Shareable: Refers to objects that are found in a shareable folder. These objects can be used in any folder within a repository by simply dragging and dropping into the target folder. Shareable objects can be sources, targets, mapplets, mappings, and transformations in the Designer, as well as commands, emails, sessions, and worklets in the Workflow Manager. Note: When deciding on using shared objects, keep in mind that if a change is made to the object, everything using that object is affected. Lets put this in perspective. If Project A and B are using a shared object, and Project A must change that object for their new development effort, then Project B must incorporate the change as well, whether they are ready for it or not. If Project B refuses to implement the change, then Project A might be late on their deliverable. PowerCenter Administrator PowerCenter Development Team Others with a good understanding of Informatica PowerCenter
Arbitron, Inc.
Page 6
12/12/2013
ETL Standards Document Reusable: Refers to transformations that are found in a regular folder. These transformations are restricted to the folder in which they were created. Other folders do not have access to them in the same manner as the shareable transformations, unless they are copied. Transformations can be made reusable and will appear under the Transformations list in the folder. Mapplets, sources, targets, and mappings are automatically created as reusable. ETL Administrator or ETL Admin: Refers to the PowerCenter Repository user id that has all the rights and privileges to work within the Informatica environment. It does not refer to any specific individual or individuals on the Data Management team. ETL Developer: Refers to the PowerCenter Repository user id that has all the rights and privileges to work within their own Informatica Folder. The Users will also have execute permission within folders of projects they are assigned to. Data Architects or DA: Refers to the Arbitron employee assigned to the project and creator of either a source or target data model (or both). This user is responsible for the source to target mappings documentation as well as the schema implementation. Database Administrator or DBA: Refers to the Arbitron employee assigned as the database administrator of the project.
2.
Arbitron, Inc.
Page 7
12/12/2013
STEP 2
IMPLEMENTATION/MODEL REVIEW The ETL Developer will meet with appropriate DA or persons to review the Data model and/or SRC and discuss the implementation strategy . This discussion shall include the methods to be developed in the event of a required historical backfill of data. NOTE: At this time there is no formal document required. The ETL Developer should be prepared to discuss the solution.
STEP 3
ETL IMPLEMENTATION REVIEW The Implementation Strategy developed during step 2 shall be discussed with at least one ETL Administrator. NOTE: At this time there is no formal document required. The ETL Developer is expected to provide enough detail so that the ETL Administrator is aware of the overall development effort.
STEP 4
DEVELOPMENT/ UNIT TESTING The ETL Developer shall Develop and Unit Test all mappings and Workflows in their own folder in the DWDEVL1 repository. All Source, Targets and Reusable Objects used in mappings shall be links to the MASTER_SRC_TGT_OBJ folder. The ETL Developers are dependent upon the DA to have the Schema in place in the correct development environment. The ETL Administrator is responsible for Importing all the necessary Source and Targets necessary for the ETL Developer to create their mappings. It is the ETL Administrators responsibility to perform a impact analysis in the event that a Source or Target definition is being updated.
Arbitron, Inc.
Page 8
12/12/2013
Issue: Go to Step 4
STEP 6
DEVELOPER-to-PROJECT FOLDER MIGRATION DEVELOPMENT SYSTEM INTEGRATION TESTING The ETL Administrator shall be responsible form the migration of the developers mappings and workflows from the Developers personal folder to the project folder. The Developer will then be responsible for a complete system integration test of the mappings and workflows in the overall environment. The Developer shall be responsible for scheduling the workflows in their correct order and verifying their execution.
Issue: Go to Step 4
STEP 8
DA/ETL DEVELOPER REVIEW The ETL Developer and the DA assigned to the project shall conduct a brief data analysis to review the data against the source to target mapping requirements. This review is not meant to identify errors in the mappings, but rather to help verify the data is loaded as intended
Issue: Go to Step 4
STEP 9
SYSTEM TEST MIGRATION/SYSTEM TEST The ETL Developer shall notify the ETL Admin when mappings and/or workflows are ready to be migrated to the DWSTEN1 repository. It shall be the responsibility of the DA to be certain that the Source and Target definitions are in place prior to the scheduling of the Mappings. The TESTER assigned to the project will be responsible for running the workflows and following all standards outlined by ARBITRONS BEST PRACTICES for TESTING. It is the responsibility of the ETL Developer, DA and TESTER and User Community to decide on the best methods for System Testing.
Arbitron, Inc.
Page 9
12/12/2013
Issue: Go to Step 4
STEP 10
TEST MIGRATION/TESTING/UAT The ETL Developer shall notify the ETL Admin when mappings and/or workflows are ready to be migrated to the DWTEST1 repository. It shall be the responsibility of the DA to be certain that the Source and Target definitions are in place prior to the scheduling of the Mappings. The TESTER assigned to the project will be responsible for running the workflows and following all standards outlined by ARBITRONS BEST PRACTICES for TESTING. It is the responsibility of the ETL Developer, DA and TESTER and user Community to decide on the best methods for UAT.
Issue: Go to Step 4
STEP 11
PRODUCTION MIGRATION The ETL Developer shall notify the ETL Admin when mappings and/or workflows are ready to be migrated to the DWPROD11 repository after the final SPRINT REVIEW and/or User Acceptance. It shall be the responsibility of the DA to be certain that the Source and Target definitions are in place prior to the scheduling of the Mappings. The ETL Admin will migrate the associated mapping based on the standards outlined in the ETL Migration Strategy Document. The ETL developer and DA are responsible for communicating the scheduling requirements and entering this information into the METADATA schema if applicable.
STEP 12
MAINTENANCE
STEP 1
Arbitron, Inc.
Page 10
12/12/2013
3.
Arbitron, Inc.
Page 11
12/12/2013
ETL Standards Document Server Scripts: /app/sten/informatica/powercenter/server/scripts Web Services Server Name: diuetl01_ws_sten Web Services Server Port: 5556 Web Services Server Home: /app/sten/informatica/powercenter/webserviceshub 3.1.1 The DWSTEN1 Repository Database: Oracle 9i Schema Owner: PM_REPO_STEN1 Instance: DWSTEN1 Backup: Monday Friday 6AM and 12noon Scheduled Downtime: Sunday 6PM Sunday 10PM The dwsten1 repository is where System Testing will take place. At this point all workflow problems should have been identified during the Unit Test phase. If a problem is discovered, developers will correct the problem in the dwdevl1 repository in their own folders. Once the complete Unit Testing cycle has been completed, the Administrator will re-migrate the corrected object(s) into dwsten1 repository. General developers, testers and users may have select permission upon request to Views to query the repository. 3.2 User Acceptance Test UNIX Box: ARBETL1 Solaris 5.8 SPARC Repository Server Name: DWTEST1 Repository Server Port: 5004 Repository Server Home: /app/informatica/powercenter/repositoryserver Server Name: arbetl1_TEST1 Server Port: 4004 Server Home: /app/informatica/powercenter/server Server Scripts: /app/informatica/powercenter/server/scripts Web Services Server Name: arbetl1_ws_test Web Services Server Port: 5555 Web Services Server Home: /app/informatica/powercenter/webserviceshub 3.2.1 The DWTEST1 Repository Database: Oracle 9i Schema Owner: PM_REPO_TEST1 Instance: DWTEST1 Backup: Monday Friday 6AM and 12noon Scheduled Downtime: Sunday 6PM Monday Midnight The dwtest1 repository is where User Acceptance Testing will take place. At this point all workflow problems should have been identified during the System Test phase. If a Arbitron, Inc. Page 12 12/12/2013
ETL Standards Document problem is discovered, developers will correct the problem in the dwdevl1 repository in their own folders. Once the complete Unit Testing cycle has been completed, the Administrator will re-migrate the corrected object(s) into the dwsten1 respository for System Testing. After System Testing is completed successfully, the Administrator will re-migrate the corrected object(s) into the dwtest1 repository. General developers, testers and users may have select permission upon request to Views to query the repository. 3.3 Production UNIX Box: PIUETL01 Solaris 5.8 SPARC Repository Server Name: DWPROD1 Repository Server Port: 5004 Repository Server Home: /app/informatica/powercenter/repositoryserver Server Name: piuetl01_PROD1 Server Port: 4004 Server Home: /app/informatica/powercenter/server Server Scripts: /app/informatica/powercenter/server/scripts Web Services Server Name: piuetl01_ws_prod Web Services Server Port: 5555 Web Services Server Home: /app/informatica/powercenter/webserviceshub 3.3.1 The DWPROD1 Repository Database: Oracle 9i Schema Owner: PM_REPO_PROD1 Instance: DWPROD1 Backup: Monday Saturday 11:59PM Scheduled Downtime: Sunday 3AM Sunday 3:05AM
The dwprod1 repository is where the Production run takes place. Since all the development and testing will be done in development and test environments, this repository will have no write permissions other than the Administrator who has full access. There will be a user id created which has only enough permissions and privileges to execute the workflows via scripts, but the actual password will be encrypted. The Developers will have select permission to this environment, and will be responsible for confirming all scheduled workflows.
3.4
General Environment Settings The following information is consistent across all environments. Substitute above values as necessary.
Arbitron, Inc.
Page 13
12/12/2013
ETL Standards Document 3.4.1 UNIX Environment Variables repo1_name=DWTEST1 - The Current Environment Repository repo_host=arbetl1 - The Current UNIX Repository Host repo_port=5004 - The Current Repository Port runpm_password=ND87NKGOFKM:N - The runpm user password runpm_user=runpm - The general pmcmd user for scripts server_host=arbetl1 - The Current UNIX Server Host server_port=4004 - The Current Unix Server Port 3.4.2 Server Variables / Directories $PMWorkflowLogDir /data/informatica/powercenter/WorkflowLogs $PMWorkflowLogCount 0 $PMLookupFileDir /data/informatica/powercenter/LkpFiles $PMRootDir /app/informatica/powercenter/server $PMSessionLogDir /data/informatica/powercenter/SessLogs $PMBadFileDir /data/informatica/powercenter/BadFiles $PMCacheDir /data/informatica/powercenter/Cache $PMTargetFileDir /data/informatica/powercenter/TgtFiles $PMSourceFileDir /data/informatica/powercenter/SrcFiles $PMExtProcDir $PMRootDir/ExtProc $PMTempDir $PMRootDir/Temp $PMSuccessEmailUser [email protected] $PMFailureEmailUser [email protected] $PMSessionLogCount 2 $PMSessionErrorThreshold 1 Where: $PMRootDir = /app/informatica/powercenter/server Because development and System Test share the same UNIX server, the System Test server variables and directories are as follows: $PMWorkflowLogDir /data/sten/informatica/powercenter/WorkflowLogs $PMWorkflowLogCount 0 $PMLookupFileDir /data/sten/informatica/powercenter/LkpFiles $PMRootDir /app/sten/informatica/powercenter/server $PMSessionLogDir /data/sten/informatica/powercenter/SessLogs $PMBadFileDir /data/sten/informatica/powercenter/BadFiles $PMCacheDir /data/sten/informatica/powercenter/Cache $PMTargetFileDir /data/sten/informatica/powercenter/TgtFiles $PMSourceFileDir /data/sten/informatica/powercenter/SrcFiles $PMExtProcDir $PMRootDir/ExtProc $PMTempDir $PMRootDir/Temp $PMSuccessEmailUser [email protected] $PMFailureEmailUser [email protected] $PMSessionLogCount 2 $PMSessionErrorThreshold 1 Arbitron, Inc. Page 14 12/12/2013
4.
PowerCenter Repositories
The Repository Manager is used to configure security at the folder, group and user level for each folder in a repository. This is not to be confused with the connection security defined in the Workflow Manager. Although it appears to be at the repository level, security is applied at the folder level. The permissions and privileges given to a user and group will determine what a user can do and see in the repository. Before a folder is created, users and groups are defined. Once that is done, a folder can be created and given ownership to a user and a group that the user belongs to. Remember that a user can be associated to multiple groups but from a folder perspective only 1 group can be specified for any given user. This is important when assigning group permissions to the folder. 4.0 Privileges Privileges provide the base levels of a given security structure because they are assigned to Users and Groups. New Privileges may not be created. A good understanding of these will help when creating your Groups and Users. Privileges are assigned in the Repository Manager in Security | Manage Privileges. The Administrator or members of the Administrators group can change privileges as they have the Administer Repository privilege. The Table below should provide you with a high-level picture of the available privileges. For a more in depth description of each privilege, please refer to the Informatica PowerCenter Documentation. Privilege Use Designer Browse Repository Use Workflow Manager Brief Description Connect to Repository using the Designer tool. Folder permissions then dictate how much one may access there. Connect to Repository using the Repository Manager Tool. You may run reports from here. Connect to Repository using the Workflow Manager tool. You can Monitor sessions/workflows through Workflow Monitor tool. Folder permissions then dictate how much one may access. Connect to the Informatica Server engine. Start jobs where Folder permissions have been granted.
Workflow Operator
Arbitron, Inc.
Page 15
12/12/2013
ETL Standards Document Administer Repository Administer Server Super User Manage Users, Groups, and Privileges. Upgrade, Restore, and Backup Repository. Manipulate Folders where appropriate permissions apply. Start / Stop the Informatica Server Engine The sky is the limit. Table 1 Background of Privileges 4.0.1 Groups Groups are created and maintained in the Repository Manager in the Security menu item. The Administrator or members of the Administrators group can create or maintain groups as they have the Administer Repository privilege. It is advisable that each repository contains only the users and groups that it needs. If all the repositories contain the same users and groups a security breach could ensue. 4.0.2 Users Users are created and maintained in the Repository Manager in the Security menu item. The Administrator or members of the Administrators group can create or maintain groups and users as they have the Administer Repository privilege. 4.0.3 Folders Folders are created and maintained in the Repository Manager in the Folder menu option. The Administrator or members of the Administrators group can create or maintain folders as they have the Administer Repository privilege. Since you can only have one Owner, and one Group assigned to a Folder, it is very important how these are assigned to insure the proper security. Please also note that there are no Sub-Folders. 4.1 The DWDEVL1 Repository 1. All Developers will be assigned to Project Groups 2. All Testers will only be assigned to the PUBLIC Group 4.1.1 Developer Folders Each developer will have his/her own folder with FULL read/write and execute permission. All development and Unit Testing should happen within this folder. This Developer shall be assigned to the DEVELOPER group and this group will have read permissions only to the Folder. The repository public users will also
Arbitron, Inc.
Page 16
12/12/2013
ETL Standards Document have read permission to individual developers folders. The following image is an example of general developers folders permissions.
4.1.2
MASTER_SRC_TRGT_OBJS Shared Folder The MASTER_SRC_TRGT_OBJS is a centralized shared folder owned by the user REPO_ADMIN that contains all the Source, Target and Reusable Transformations used by every Project and/or Developers folders. Every Source and Target in other folders MUST be a shortcut to this folder. This will aid the repository administrators when determining impact analysis of changes as well as migrations. Reusable Transformation transformations will be considered on a case-by-case basis, but it is not recommended at this time to reuse these transformations across folders. Administrators and Senior Developers assigned to the group MASTER_SRC_TRGT_OBJS, will have permission to write to this folder. In the event of an update, it will be their responsibility to take care of the other projects affected. (Notice that Allow Shortcut is selected).
Arbitron, Inc.
Page 17
12/12/2013
4.1.3 Project Folders The individual project folders are owned by the REPO_ADMIN user and contain all mappings and workflows associated with a project. A group will be created that contains all the individual developers associated with the project. All source and targets defined in these folders must be shortcuts to the centralized shared folder MASTER_SRC_TRGT_OBJS. No transformations may be shared from these project folders to other folders for any reason. Administrators will have permission to write to this folder after the mappings and workflows have been Unit Tested and reviewed for standards compliance. Once mappings and workflows are migrated to the Project Folders, all Developers assigned to the folders group will have both read and execute permission in order to perform system integration testing.
Arbitron, Inc.
Page 18
12/12/2013
4.2
The DWSTEN1 Repository 1. The Developers will only be assigned to the Public Group 2. Testers will be assigned to Project Groups 4.2.1 Developer Folders Developer folders will not exist in the dwsten1 repository. Developers will have read only permission in this repository. 4.2.2 MASTER_SRC_TRGT_OBJS Shared Folder The MASTER_SRC_TRGT_OBJS folder is owned by the user REPO_ADMIN in the DWSTEN1 repository. Only ETL Administrators will have permission to write to this folder. All other Users, Developers and Testers will have read-only permissions.
Arbitron, Inc.
Page 19
12/12/2013
ETL Standards Document 4.2.3 Project Folders All Project Folders are owned by the user REPO_ADMIN in the DWSTEN1 repository. Only ETL Administrators will have permission to write to these folders. Testers assigned to the Project Folder will be granted Execute permission to perform System Testing. 4.3 The DWTEST1 Repository 1. The Developers will only be assigned to the Public Group 2. Testers will be assigned to Project Groups 4.3.1 Developer Folders Developer folders will not exist in the dwtest1 repository. Developer will have read only permission in this repository. 4.3.2 MASTER_SRC_TRGT_OBJS Shared Folder The MASTER_SRC_TRGT_OBJS folder is owned by the user REPO_ADMIN in the DWTEST1 repository. Only ETL Administrators will have permission to write to this folder. All other Users, Developers and Testers will have read-only permissions. 4.3.3 Project Folders All Project Folders are owned by the user REPO_ADMIN in the DWTEST1 repository. Only ETL Administrators will have permission to write to these folders. Testers assigned to the Project Folder will be granted Execute permission to assist in UAT testing. There is no plan in place to allow users to directly execute any Power Center workflow from the Informatica Client applications. 4.4 The DWPROD1 Repository 1. The Developers will only be assigned to the Public Group. 2. Testers will only be assigned to the Public Group. 4.4.1 Developer Folders Developer folders will not exist in the dwprod1 repository. Developer will have read only permission in this repository. 4.4.2 MASTER_SRC_TRGT_OBJS Shared Folder The MASTER_SRC_TRGT_OBJS folder is owned by the user REPO_ADMIN in the DWPROD1 repository. Only ETL Administrators will have permission to
Arbitron, Inc.
Page 20
12/12/2013
ETL Standards Document write to this folder. All other Users, Developers and Testers will have read-only permissions. 4.4.3 Project Folders All Project Folders are owned by the user REPO_ADMIN in the DWPROD1 repository. Only ETL Administrators will have permission to write, schedule and execute within these folders on a normal basis. Execute permission will be granted on a case-by-case basis when deemed necessary. The general runpm user will be used for normal script and web services execution as necessary.
5.
PowerCenter Standards
5.0 PowerCenter Repository Manager Standards When the ETL Administrator is creating new objects associated with a repository, the following standards and conventions should be adhered to. 5.0.1 Repository Server Name is same as the UNIX Host name This name is determined by system Services. Ex. DIUETL01 The Repository name is the same as the Oracle Instance in which it resides. Ex. DWDEVL1 PM_REPO_(ENVIRONMENT)# The name based upon the environment as well as the number of repositories for that environment. Ex. PM_REPO_DEVL1 Any abbreviated Arbitron system or a unique value agreed upon by the ETL Administrator. Project Folder names shall be in Upper Case Developers Folder names shall be in Lower Case Ex. EDD, EDW, ODS Each Repository Folder will have an associated Group with the exact same name. Page 21 12/12/2013
Arbitron, Inc.
ETL Standards Document 5.0.6 Repository User Name 5.1 Each User name will consist of the users first initial and complete last name Ex. jvorsteg ETL Administrators may create system users that do not follow the developer naming convention
PowerCenter Designer Standards When the ETL Developer is creating new mappings or editing a mapping, the following standards and conventions will be adhered to. 5.1.1 Mapping Standards Ex. M_GDR001_LOAD_ODS_SRVY_FROM_GDR_PROD All Upper Case Name should start with M_ and then followed by abbreviated source database name then followed by numeric representation (3 numbers e.g. 001,002). Source database representation should be followed by the operation e.g. Insert/Update/Delete/Truncate or Load (generic). Operation should be followed by target table name. In case of multiple targets use the major target name. Target table name may be followed by the optional description of the mapping. All the above components of mapping name should be separated by _. Mapping name will not exceed 80 characters. Mappings will cover only one data flow. Mappings can have multiple sources and/or multiple targets but they can not have two entirely separate data streams in the same mapping. All mappings will have comments starting with date and initials of the person writing comments. These comments should be added in the mapping itself. Ex. 01/01/2002 JBV COMMENT. If a mapping is modified, comments with the name of the transformations being modified must be appended to the original comments. The actual transformations modified shall contain the date, initials and a brief description in the transformations AUDIT metadata extension. The Mapplet will reside either in the respective project folder as reusable or in the Common Transformations folder. Mapplets will follow the same standards as mappings where applicable. Mapplets names will begin with MPL_ Page 22 12/12/2013
ETL Standards Document Mapplets will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Database Source names will remain the same as the value imported from data base. Flat file source names will have the same name as the flat file Source. Add Three/Four characters for the abbreviated source system name if needed for clarification When creating a mapping that uses new Sources you will first need to have the ETL ADMIN import these into the MASTER_SRC_TRGT_OBJS Shared Folder. You then create a Shortcut to that Object in your own development folder in order to use it in the Mapping. Make sure to first create a shortcut for the sources in your Source Analyzer work space before they are used in your mapping. This will allow you to edit the Name. You shall remove the 'Shortcut To_' from the name and save. Source Transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. The $Source server variable should be used (wherever applicable) for transformations that connect to the database. Database Target names will remain the same as the value imported from data base. Flat file Target names will have the same name as the flat file Target. Add Three/Four characters for the abbreviated source system name if needed for clarification When creating a mapping that uses new Targets you will first need to have the ETL ADMIN import these into the MASTER_SRC_TRGT_OBJS Shared Folder. You then create a Shortcut to that Object in your own development folder in order to use it in the Mapping. Make sure to first create a shortcut for the targets in your Warehouse Designer work space before they are used in your mapping. This will allow you to edit the Name. You shall remove the 'Shortcut To_' from the name and save. Target Transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Target Names should be unique in the mapping. When multiple instances of the same target table exist they will be distinguished by operation. The $Target server variable should be used (wherever applicable) for transformations that connect to the database. Page 23 12/12/2013
5.1.4
Arbitron, Inc.
ETL Standards Document If the Target Update Override is used, then the target name must begin with OVR_ Target Transformations will have a UPD_OVR metadata extension with a default value of N. If an Update override is created, then the update SQL shall be copied to the metadata extension and along with all the future changes. Source Qualifier names will begin with SQ_ followed by a unique description. Source Qualifier names will begin with SQ_OVR_ if the SQL is over written. Import all the tables which are used in the source qualifier. Even if no column has been selected from the table, still import the table and drag any column to SQ. This will avoid hiding any table in query override. Source Qualifiers will have a SQL_OVR metadata extension with a default value of N. If a SQL override is created, then the SQL shall be copied to the metadata extension and along with all the future changes. Advanced External Proc names will begin with AEP_ followed by a unique description. Advanced External Procedure Transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Aggregator names will begin with AGG_ followed by a unique description. Aggregator names expecting sorted input data begin with AGG_SI_. Aggregators will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Expression names will begin with EXP_ followed by a unique description. Expression transformations will have an UNCONNECTED_LKP metadata extension to hold the name of the Lookup. The extension will have a default value of N. Expression transformations will have an UNCONNECTED_SP metadata extension to hold the name of the Stored Procedure. The extension will have a default value of N. Expression transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Filter names will begin with FIL_ followed by a unique description. Page 24 12/12/2013
ETL Standards Document Filter transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Joiner names will begin with JNR_ followed by the name of the type of join, then the Master Table, then 2 underscores __, and then the detail table. Ex. JNR_OUT_MDA_MARKET_DIM__MDA_SOSO_FACT If the Joiner expects filtered inputs, then the name shall begin with JNR_SI_. Joiner transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Join Type abbreviations are IN, OUT. Normalizer names will begin with NMR_ followed by a unique description. Normalizer transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Rank names will begin with RNK_ followed by a unique description. Rank transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Router names will begin with RTR_ followed by a unique description. Router transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Router groups must have a valid condition name. Sequence Generator names will begin with SG_ followed by a unique description that preferably matches the table the sequence is used for. Sequence Generator transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Sequence Generators will reset to 0 upon session initialization to avoid sequence contention between environments. Stored Procedure names will begin with SP_ followed by the Schema (Without the environment) and the actual Procedure. Ex. SP_GDR_PPMRE1001P
Arbitron, Inc.
Page 25
12/12/2013
ETL Standards Document Stored Procedure transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Changes captured here must include a reference to the procedure in the event that only the stored procedure has changed. Stored Procedure transformations will have an RDBMS_TYPE metadata extension with a default value of Oracle. Update Strategy names will begin with UPD_ followed by the operation and finally the target name. Ex. UPD_INSERT_GDR_MKT_AREA Operations shall include INSERT, UPDATE, DELETE and REJECT. The UPDATE strategy data driven command shall be based on the PowerCenter variables dd_insert, dd_update, dd_delete and dd_reject. The data driven command shall not be performed based upon the numeric value. Update strategy will not be used for mappings only performing inserts. Lookup Transformations names will begin with LKP_ followed by the table name. Lookup Transformations that contain a SQL Override shall be named starting with LKP_OVR_ followed by the Primary Lookup table. Lookup Transformations that are unconnected names will begin with UN_LKP_. Lookup transformations will only have the ports to and from the transformation necessary to complete the lookup. Default Lookup policy is Report Error. Do not make dynamic Lookups reusable unless explicitly directed to be the ETL Administrator. Dynamic lookup names will begin with DLKP_. Lookup Transformations will have a SQL_OVR metadata extension with a default value of N. If a SQL override is created, then the SQL shall be copied to the metadata extension and along with all the future changes. Lookup Transformations will have a SQL_OVR_TABLES metadata extension to store the current SQL Override statements tables. The tables shall be listed as a continuous string with the different tables delimited by double pipes ||. This will have a default value N. Lookup Transformations will have an AUDIT metadata extension. Audits will include the Developers Name, Date and change description. Lookup Transformations will have an CONNECTED_LKP metadata extension with a default value Y.
Arbitron, Inc.
Page 26
12/12/2013
ETL Standards Document Unconnected lookups shall be used with caution. As a rule of thumb, if more than 10% of rows use the lookup, the lookup should be connected. As other transformation are introduced by Informatica and used by developers, standards will be determined during Step 3 of the PowerCenter lifecycle and added to the document as necessary. Input Ports shall be named based on the connect source when possible. A reasonable name is acceptable. At no time is any default name acceptable. Output Ports shall be named based on the connect source when possible. A reasonable name is acceptable. At no time is any default name acceptable. Variable ports shall be named beginning with v_ followed by a reasonable name. Ports based upon an unconnected Lookup shall begin with lkp_. Mapping Parameters names shall begin with $$. Mapping Variables names shall begin with $$V_.
PowerCenter Workflow Manager Standards When the ETL Developer is creating new workflow or editing a workflow, the following standards and conventions will be adhered to. 5.2.1 Informatica Power Center Server The Informatica Server name is based on the UNIX machine where it is located followed by a _, followed by the environment of the server. Ex. piuetl01_prod1 If a workflow is scheduled its name shall start with SCHED_. If a workflow is web service enabled its name shall start with WS_. The Workflow name shall include the Project name and any other reasonable details. Default Error Handling shall be Suspend on Error. If any database connections are required other than the standard ones, create a temporary one with prefix as your initials. This will help removing them later. E.g. If a database connection is required by Naresh for testing from production then the database connection name should start with Naresh_. ETL Administrators will create and destroy all database connections as necessary.
Arbitron, Inc.
Page 27
12/12/2013
ETL Standards Document 5.2.3 Worklet Standards Worklet names shall start with WL followed by numeric representation (2 characters) e.g. WL01, followed by a reasonable name associating it to a project. For worklets within a worklet, the numeric representation should be followed by an alphabet. E.g. If a worklet is within a worklet starting with WL01 then the worklet under this should start with WL01A. If more than two levels of hierarchical arrangement are required for worklets, then alternate the alphabet representation with the numeric representation. E.g. WL01 then WL01A then WL01A01 then WL01A01A and so on. Sessions names shall begin with S_ followed by a name that clearly represents the mapping associated to it. Sessions should have a session log named exactly the same as the session name. Bad file names should be unique. Historical loads (or other ad-hoc) may use multiple sessions per workflow. If multiple sessions for same mapping are needed, use a reusable session. Session instance names should be unique and descriptive with matching session log names. By default sessions should use standard db connections previously created. Developers should submit a request for a new connection (if needed) to an administrator. By default sessions should have target load type defined as normal. Target properties should only have db operations checked that are needed. I.e. Do not check update box if the job is insert-only. Whenever possible, server variables such as $source, $target, $PMRoot should be used. The SQL_OVR metadata extension shall be used if the SQL is modified at session level. The default value is N. Default error handling shall be set as Fail Parent If Task Fails. The DTM buffer size shall be set to 24M. The block buffer size shall be set to 128K. If a session calls a UNIX script, the metadata extension UNIX_SCRIPT shall include the name and full path of the UNIX script with a brief description. The default value is N.
Command Tasks shall be named CMD_ followed by the name of the UNIX
5.3 script or any other reasonable name. Relational Connections shall be named at the discretion of the ETL Administrators. Queue Connection TBD. FTP Connections shall be named FTP_ followed by the name of the targeted server. Application Connection TBD. Loader Connection shall be named as the discretion of the ETL Administrators. Workflow Parameters shall be named $$W_ followed by any reasonable name. Workflow Variables shall be named starting with $$WV_ followed by any reasonable name. Workflow Parameter Files shall be named starting with W_PARAM followed by the workflow it belongs to. Session Parameter Files shall be named starting with S_PARAM followed by the session it belongs to.
Arbitron, Inc.
Page 29
12/12/2013
Source file name (s) Target table name(s) Mapping name Workflow name Session name(s) Folder mapping is located in Developer name Date mapping completed Code Reviewed by Date Review Completed
Checks and Balances Mapping Checks Did you follow all naming standards? Did you add mapping comments including your initials and changes to the change log? Did you put comments into key transformations, and are the Arbitron, Inc. Page 30
Completed
Comments
ETL Standards Document transformations named in a descriptive and identifying way? Please pay special attention to any SQL Overrides, as the WHERE and JOIN clauses should be copied into the comments section of the Source Qualifier. Did you remove the Shortcut_to for all Sources and Targets Did you connect only the necessary ports from the lookup? Did you use $Source and $Target for Location Information Property for Lookup? Did you use an Update Strategy only for updated targets, not for inserts? Did you use DD_UPDATE in the Update Strategy Expression? Did you only connect the ports being updated? Does your mapping have a SQL override in the source qualifier when it is not needed (i.e. you could just use the source filter or user defined join)? Does your mapping use unique target names distinguished by operation? Workflow / Session Checks Did you assign the Workflow to the correct Server? Is the name of the Session Log the same as the name of the Session? Are all flat file options selected correctly? Example: Fixed Width files that use Spaces to signify NULL should have the following options checked: 1) Change Null Character to Space, 2) Repeat Null Character, 3) Strip Trailing Blanks 4) Line Sequential Format Are you only using the Standard Database Connection names? Is the session using Normal or Bulk loading appropriately? For a given Target Instance, did you only check the operations that will be done to that particular Target Instance? Does the bad file name make sense? Is there more than 1 mapping loading a target table at the same time? If yes, have you made the bad files unique? If your session uses a parameter file, did you test reading from that parameter file successfully?
YES YES
YES
YES
YES
Arbitron, Inc.
Page 31
12/12/2013
Data and Unit Testing When you viewed the target data, did it look clean? Examples: 1) Are you consistently getting NO data values in a particular column in the target table? 2) Did any string data get truncated? 3) If a Number passed into a String column, are there any decimals in the value, such as 1.000000? (Hint: there should not be.) When the session completed, did you open the session log and look for any records that did not insert due to transformation errors? Did the session log show that any fields are being truncated coming from the source? YES
YES
YES
Arbitron, Inc.
Page 32
12/12/2013
Sources
Tables Table Name Schema/Owner Selection/Filter
Targets
Tables Table Name Schema Owner Update Delete Insert Unique Key
Lookups
Lookup Name Table Match Condition(s) Location
Arbitron, Inc.
Page 33
12/12/2013
Source
Target
Processing Description (Detail) <Describe processing logic contained in mapping> Templates For Reusable Transformations Lookups
Folder Name Lookup Name Lookup Type Table Match Condition(s) Outputs Filter/SQL Override Comments Created by : Modified by : Location
Stored Procedures
Folder Name
Stored Procedure
Arbitron, Inc.
Page 34
12/12/2013
External Procedures
Folder Name
External Procedure Trans. Name Inputs Return Value Module / Programmic Identifier Procedure Name Runtime Location Is_Partitionable Comments
I
Mapplets
Folder Name
Mapplet Name
Aggregators
Folder Name Aggregator Name Groups
Group By Size
Arbitron, Inc.
Page 35
12/12/2013
Ranks
Folder Name Rank Name Number of Ranks Total Output Column Size Comments
[Note: Please copy and paste your comments from your Informatica Object here] Number of Groups
Router
Folder Name Router Name Group Names
Number of Groups Including Default
Group 1 Condition Group 2 Condition Group 3 Condition Default Group Utilized (Yes, No) Output Comments
[Note: Please copy and paste your comments from your Informatica Object here]
Joiner
Folder Name Joiner Name Master Source Detail Source Join Type Join Condition
Arbitron, Inc.
Page 36
12/12/2013
Commen ts Normalizer
Folder Name Normalizer Name Occurs Field Occurs Field Occurs Field Comments
[Note: Please copy and paste your comments from your Informatica Object here]
Number of Occurs Number of Occurs Number of Occurs [Note: Please copy and paste your comments from your Informatica Object here]
Expressions
Folder Name
Expression Name
Arbitron, Inc.
Page 37
12/12/2013
ETL Estimations
This section provides the typical work estimates (in man hours) for average ETL work. These estimates may vary significantly depending on the nature of the work. Source to Staging, ODS or 3NF Model Standard Mappings (One source to One target) Additional sources or targets (same mapping) Special Case (One source to One Target) Non-database to database Processing Flat Files Standard mapping UNIX Standard mapping LAN Building mechanism for retrieving files Staging to ODS or 3NF Model Standard mapping (One source to One target) Additional sources or targets PSA to MDA or Mart Mapping (assumes many sources to a single target) Performance tuning 8 hours 8 hours 4 hours 2 hours per additional table 4 hours 2 hours per additional table 8 hours
These estimations include defining the table in Informatica and some query tuning. All special cases outside of these guidelines need written requirements and analysis by ETL developer. It is assumed that all DBMS objects will be created by the DA and/or DBA assigned to the task with their appropriate sizing where applicable and that all mapping documentations will be entered in Designer prior to the start of development efforts.
Arbitron, Inc.
Page 38
12/12/2013